Libgdx goes HTML5

update #2: and another port, basilisk ported his game bumble, see below. F’ing A!
update: twbompo already ported his game QB :D, see below

I wrote a GWT backend for libgdx over the weekend and ported Super Jumper. Click on the screenshot below to try it out. You need Firefox/Safari/Chrome/Opera 11 with WebGL support. You can also run it in Opera Mobile on Android or on the Xperia Play browser. If nothing shows up, please check if WebGL works on your browser/OS/GPU combination, e.g. by trying some of the WebGL experiments. If nothing helps, i’m afraid your browser/OS/GPU isn’t cool enough. Use your mouse to navigate through the “menus”, use the left and right arrow keys to move the dude once you clicked on “Ready?”.

Stefan Wagner, alias twbompo, already ported his game QB. Click the image below:

And here’s another quick port by basilisk from Molded Bits.

This means we can now compile libgdx apps to Javascript and have them run natively in the browser. The code runs totally unmodified (save for 2 fixes due to Super Jumper formerly being a GLES 1.0 game: i just removed glEnable(GL_TEXTURE_2D) and cam.apply(gl10). Check the SVN if you don’t trust my word, the gwt project for Super Jumper is in there as well :)). The port took 10 minutes, setting up the GWT project in Eclipse took most of that time.

What’s done

  • Graphics fully implemented, full GL ES 2.0 capabilities. There’s some space for optimizations.
  • Input 90% implemented, need a solution for Input#getTextInput()
  • Files 80% implemented, files are preloaded before the app starts. They will be fetched from the browser cache on second startup. Files are accessed via Gdx.files.internal, so your code can stay the same. Currently you can’t read binary files, i’ll try to work around that with a few tiny tricks, no promises.
  • Most of the graphics classes like Mesh, ShaderProgram, SpriteBatch, TextureAtlas, BitmapFont and so on are fully working
  • All of the math classes are fully supported.
  • Almost all collections in utils/
  • More stuff i forgot.

You can check out this file to see which libgdx classes you can currently use. I have to gradually include more classes and check if everything still works.

What’s left to do

  • Audio, most likely leveraging Sound Manager 2 which falls back to Flash if the audio tag doesn’t work
  • Preferences and local files, most likely implemented via the Web Storage APIs.
  • Binary file support. This one is a bit harder to do with regards to caching, but it should be possible. XmlHttpRequest to the rescue (screw IE)
  • Making sure the rest of libgdx compiles and works (scene2d, some stuff in the packages utils and g3d).
  • Figuring out a good way to embed GWT apps into any web page

What i’d like to integrate eventually

  • Web Sockets. The GWT Quake 2 port did it, so can we. This would mean the inclusion of a new module for libgdx that provides networking capabilities. So far this was unnecessary as all backends have access to the standard Java networking classes.
  • Mobile HTML5 apis, e.g. anything that PhoneGap exposes (multi-touch, accelerometer etc).
  • App cache support, so you can play offline.

What you won’t be able to do with the GWT backend

  • Native code, obviously. That also means no Box2D, unless you use JBox2D or the cleaned up version by the PlayN guys.
  • Concurrency, Javascript is single threaded, there’s no way to do anything fancy. However, we can easily abstract things like http requests via callbacks. Web workers seem to be in their infancy at the moment, and communication is done via message passing. A Java-compatible version of that is unlikely (e.g. concurrency primitives you usually work with won’t be available)
  • All the caveats of GWT apply. Reflection is not available, which is a pain because our awesome Json class relies heavily on reflection. A lot of standard Java classes aren’t available in GWT either. I did my best to port over a few crucial ones, like most of the and java.nio package from Harmony/Avian/gwt-quake-2. Big shout out to those projects, and Stefan Haustein in particular, who did a lot of ground-work with gwt-quake. Standing on the shoulders of giants here 🙂
  • Local file storage will be limited. I’m new to this html5 stuff, but if i’m correct, then you can store just a few mb on the client side.
  • Other things i overlooked so far, e.g. integration of in-app purchases and so on. Given that we use GWT, the simple interface approach you already use on Android/desktop should work here as well.
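The “simple interface approach” mentioned above can be sketched roughly like this: the shared game code declares an interface, and each backend plugs in its own implementation. Everything here (PurchaseService, PurchaseListener, FakePurchaseService, Shop) is a hypothetical name made up for illustration, not part of libgdx. The result is delivered through a callback rather than a return value, on the assumption that a browser backend cannot block while it waits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical platform service, declared in the shared game code. */
interface PurchaseService {
    /** Asynchronous on purpose: a browser backend cannot block while waiting. */
    void buy(String itemId, PurchaseListener listener);
}

/** Callback invoked by the backend once the purchase attempt has finished. */
interface PurchaseListener {
    void onResult(String itemId, boolean success);
}

/** Stand-in backend for desktop testing; every purchase succeeds right away. */
class FakePurchaseService implements PurchaseService {
    @Override
    public void buy(String itemId, PurchaseListener listener) {
        listener.onResult(itemId, true);
    }
}

/** Shared game code: it only ever sees the interface, never a concrete backend. */
class Shop {
    private final PurchaseService purchases;
    private final List<String> ownedItems = new ArrayList<String>();

    Shop(PurchaseService purchases) {
        this.purchases = purchases;
    }

    void buy(String itemId) {
        // Anonymous class instead of a lambda, to stay within older Java source levels.
        purchases.buy(itemId, new PurchaseListener() {
            @Override
            public void onResult(String id, boolean success) {
                if (success) ownedItems.add(id);
            }
        });
    }

    List<String> ownedItems() {
        return ownedItems;
    }
}
```

Each platform project would then construct Shop with its own PurchaseService, e.g. new Shop(new FakePurchaseService()) on the desktop, and the game code stays the same.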

Where’s the code?

In SVN. Check out the gdx-tests-gwt project and the superjumper projects. If you want to play around with things, check out the SVN trunk, import everything into Eclipse, make sure GWT is installed and away you go. I’ll post a full guide how to setup a desktop/android/gwt project once i’m done with the above list. Note that running GWT projects in hosted mode is terribly slow and not representative of the real performance. Think Android emulator…

What about PlayN?

After Google I/O last year i contacted the guys. They were not really interested in a collaboration (yeah, i was kinda disappoint). They were aware of libgdx at the time, which is unsurprising given both frameworks have a very similar architecture and target audience (it’s a pretty obvious design choice :)). Here’s why i think that libgdx might be more valuable in the end. It is not meant as an attack on the project, i sincerely like what the guys pulled off (especially their non-Google contributors, who seem to keep the project going). The work Ray Cromwell did on the GWT backend is fantastic, he’s a beast 🙂 Take the following as a justification for me to invest time in the GWT backend.

PlayN doesn’t expose OpenGL directly, as they fall back to Canvas or a DOM renderer if WebGL is not available, a good choice if you want to stay as compatible as possible. This might also be a wise decision should WebGL not get the adoption it deserves. With the GWT backend i do not aim for maximum compatibility. WebGL is a requirement, so IE is out (unless you use Google Chrome Frame maybe). I place my bet on WebGL, hence the full exposure of the GLES 2.0 like API in the backend, which enables you to do anything graphically. update: Stefan Haustein just sent me a mail, he’s now working on PlayN (a good thing ™) and will add a GL ES 2.0 interface.

The Android port seems to have a few performance problems, which given the age of the framework is more than forgivable. This is something that can be solved over time. Libgdx does not sacrifice anything in that regard, it’s probably as fast as can be given its Dalvik/JVM heritage.

The Flash port is basically dead, which is kinda sad. In my opinion this was the big point for PlayN, Flash is everywhere, HTML5 not yet.

There’s currently an iOS port in the works, based on IKVM and MonoTouch (sounds familiar?). I’m not sure if this will allow debugging the transformed Java/CLI code in MonoDevelop, but if it does it’s a pretty cool thing. What’s less cool is that MonoTouch costs 399$ in its most basic version. My work on the Avian based iOS backend is currently on ice, i can’t stem all of the work i’d like to do at the moment. Avian would mean no debugging support (unless i can add a JDWP bridge, which is unlikely, Avian is a complex beast). It currently also means no FPU support, the ARM backend emits software float code. An ARM emitter with FPU support is in the works it seems, not sure about its current state. There are a lot of drawbacks to using Avian, but in the end it is fun to do something a tad bit more complex from time to time, and it would mean i don’t have to pay for MonoTouch. PlayN has an easier time targeting other platforms like consoles should the Mono port come to fruition. Avian can be made to work there as well, however, the dependency on OpenGL is a major drawback.

The build system of PlayN seems to be based on Maven with all its benefits and complexities. At the moment it seems to put off beginners, eventually that might be a big plus for PlayN. The more platforms you support, the harder it can be to set up a project. Whether Maven is the way to go in terms of dependency management and project setup is another story. At the moment it seems to be hard to escape the claws of Maven.

Documentation is a big issue, for both projects. I like to think that we are a tad bit ahead in that regard, i might be totally wrong. Our forums seem to be filled with a lot of knowledgeable people that actually help out, we have a few video tutorials to help absolute newcomers and mostly complete java docs. For libgdx, i think we are on the right track. We still lack a unified tutorial/dev guide system. PlayN is still in the process of gathering a community. Again, the age of the project probably plays a major role in that regard, and given time this can and will improve.

In the end it’s a matter of taste and targets. Choose your poison 🙂

Reducing CPU and Battery Usage in OpenGL Apps on Android

ninja edit: the stats below are for an Asus Transformer. Readings will heavily differ on other devices. Grain of salt etc. Also, these are tips for the continuous rendering case. I’ll try to incorporate non-continuous rendering if i can find the time.

Long ass title, but i like things to be descriptive. I’m currently working on a non-gaming app for Android that lets you browse reddit in a more visual style. Here’s a little screenshot:

The tiles are actually decals rendered via DecalBatch, text beneath each tile is rendered via BitmapFontCache and a SpriteBatch, the UI at the top is a Stage with UI Actors (FlickScrollPane, Image, TextButton). Usually you browse a subreddit, displaying 50+ entries, through which you can scroll vertically. When you click on a tile, it expands into a new view, depending on its type. Here’s how an image is displayed:

You can freely pinch zoom/pan within the image view. There are also views for videos and html previews. In any case, there’s a lot going on in terms of blending, submitting geometry and so on.

At this point there are hardly any optimizations on the rendering side of things, i invested most of my time getting the heavy threading for media retrieval correct and working.

I’m worried about battery usage, as the application should be a competitor to browsing reddit in the browser or in one of the many reddit apps on the market. Being an OpenGL application with lots of animations and transitions, it’s rather hard to only render dirty regions. The only way for me to reduce battery usage is to decrease CPU usage as much as possible, and having an overall dark theme so as to not make the display shine the energy of a thousand suns into your face. Here’s what i did.

The first step was a simple analysis of the CPU usage of the app. The easiest way to do this is to use adb from the console as follows:

This will fire up top on the connected device and output the top 10 cpu consumers in 2-3 second intervals. I additionally grep for my application’s name, or rather, the part of the package name which will also be the process name reported by top. I then let the app run and interact with it as usual. What i get is a coarse grained trace of CPU activity which is sufficient for an initial analysis. Here’s the output of the unoptimized app after a bit of playing around with it on my Transformer:

27174 1 22% S 11 449116K 36916K fg app_126 com.badlogic.reddittv
27174 1 20% S 11 449032K 36820K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36824K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36832K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36844K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36856K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36880K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36888K fg app_126 com.badlogic.reddittv
27174 0 18% S 11 449032K 36892K fg app_126 com.badlogic.reddittv
27174 0 19% S 11 449032K 36892K fg app_126 com.badlogic.reddittv

First, notice how we don’t take up 100% of the CPU. That can be attributed to the vsync we perform each frame. This will put our rendering thread to sleep for quite a bit, allowing our thread to execute at most 60 times a second. The application runs at 60fps at this point. Now, let’s see how other applications perform, namely the browser. I fired up went to and then I panned/zoomed and idled on both pages. It’s not a super fair comparison of course, but it’s a nice set of measurements i can compare my app with.

25573 1 0% S 29 597684K 157716K fg app_54
25573 1 0% S 29 596908K 155840K fg app_54
25573 1 0% S 29 596908K 155848K fg app_54
25573 1 13% S 29 597024K 156172K fg app_54
25573 1 8% S 29 597100K 156260K fg app_54
25573 0 11% S 29 597100K 156260K fg app_54
25573 0 12% S 29 597176K 156380K fg app_54
25573 0 9% S 29 597252K 156456K fg app_54
25573 0 0% S 29 597252K 156456K fg app_54
25573 1 5% S 29 597252K 156456K fg app_54
25573 0 5% S 29 597328K 156532K fg app_54
25573 0 6% S 29 597328K 156532K fg app_54
25573 1 14% S 30 603436K 158600K fg app_54
25573 0 37% S 30 609352K 162308K fg app_54
25573 0 32% S 29 610868K 162268K fg app_54
25573 1 12% S 29 608724K 160964K fg app_54
25573 0 32% S 29 602000K 160056K fg app_54
25573 0 18% S 29 602000K 160068K fg app_54
25573 0 20% S 29 602352K 160420K fg app_54
25573 1 20% S 29 602000K 160080K fg app_54

The 0% usage stats for were achieved when i did not interact with the site at all. There are no animated components, so nothing needs to be redrawn. The browser goes totally to sleep in that case. The 5-12% measurements were taken while scrolling around, causing the browser to redraw parts of the screen.

The case is a bit more interesting. I did not interact with the site at all. However, there’s a text input field on that site that gets focus automatically. This field contains a blinking cursor, which triggers redraws. Those redraws are costly it seems. It could also be the interaction with the UI toolkit, so it’s likely not pure Webkit rendering. In any case, redraws will bring up CPU usage of the browser. These happen whenever you interact with the site (panning/zooming) or when animated elements, javascript etc. are at work. I cannot (or can hardly) bring my application’s CPU usage down to 0% when it’s not being interacted with and not animating, due to the architecture, which does not keep track of dirty rectangles. However, i can try to reduce the average CPU usage as much as possible, so that it evens out with the browser’s average CPU usage under normal user interaction scenarios. For reddit that means quite a bit of panning and thus redrawing.

What reduces CPU usage? Executing less code per frame. SpriteBatch and DecalBatch perform vertex generation on the CPU as it is faster to submit a CPU crafted vertex array to the GPU containing all sprites/decals than drawing individual (sub-)meshes and resetting transformation matrices each frame. To bring my CPU usage down, i have to reduce the number of sprites/decals i render per pass. This can be easily achieved via simple culling. Let’s see what culling decals means for CPU usage:

27288 0 20% S 11 449036K 36932K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 36944K fg app_126 com.badlogic.reddittv
27288 1 20% S 11 449036K 37000K fg app_126 com.badlogic.reddittv
27288 0 19% S 11 449036K 37008K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 37012K fg app_126 com.badlogic.reddittv

Well, that didn’t do much. Out of 50 decals in total, i only render 15, yet the CPU usage stays pretty much the same. It’s no surprise really, as 50 decals aren’t that big of a deal in terms of calculations, especially if they don’t change. All it amounts to is copying vertices to the GPU, binding the shader with the respective textures & matrices and issuing the drawing command. Note that that would change a bit if the decals were animated, as they are during transitions. Still, transforming the 4 vertices of each of the 50 decals is really not a big deal.
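To get a feeling for how little work that is, here is a minimal sketch of CPU-side vertex generation for a bunch of quads. It is not the actual SpriteBatch or DecalBatch code: QuadBatchSketch and buildVertices are made-up names, and only 2D corner positions are generated (a real batcher also writes colors and texture coordinates). For 50 quads this fills 50 × 8 = 400 floats per frame.

```java
/** Builds one vertex array for many axis-aligned quads, the way a sprite batcher does on the CPU. */
class QuadBatchSketch {
    static final int FLOATS_PER_QUAD = 4 * 2; // 4 corners, x and y each

    /**
     * @param rects quads given as {x, y, width, height}
     * @return corner positions of all quads, packed back to back
     */
    static float[] buildVertices(float[][] rects) {
        float[] vertices = new float[rects.length * FLOATS_PER_QUAD];
        int i = 0;
        for (float[] r : rects) {
            float x = r[0], y = r[1], x2 = r[0] + r[2], y2 = r[1] + r[3];
            vertices[i++] = x;  vertices[i++] = y;   // bottom left
            vertices[i++] = x2; vertices[i++] = y;   // bottom right
            vertices[i++] = x2; vertices[i++] = y2;  // top right
            vertices[i++] = x;  vertices[i++] = y2;  // top left
        }
        return vertices;
    }
}
```

The resulting array is what would be handed to the GPU in one go, instead of issuing one draw call and one matrix update per quad.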

The next candidate is the text. As stated above, i use a BitmapFontCache for each tile label in conjunction with a SpriteBatch. This means that i’m also just copying vertices to the GPU, as the cache of a label won’t ever change after its construction. Let’s see what culling does for us:

27364 0 10% S 12 449504K 36268K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449504K 36276K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449696K 36452K fg app_126 com.badlogic.reddittv
27364 1 12% S 12 449696K 36504K fg app_126 com.badlogic.reddittv
27364 1 10% S 12 449696K 36504K fg app_126 com.badlogic.reddittv

Oi, not bad! We cut the CPU usage roughly in half. You might wonder how much code was necessary for this. Here it is in all its glory:

As you can see, the culling is stupidly simple. Everything is done in screen coordinates (yes, the 3D decals are actually using pixel coordinates, hurray for pixel perfect perspective projection :)). We could also exploit the fact that the overlays Array has an order, from the top left overlay to the bottom right overlay. As soon as an overlay is beneath the bottom screen edge we could jump out of the loop and not bother culling the rest of the overlays, which we know would be invisible anyways. I added this optimization for a future feature where you can load more result pages. If you have hundreds of results it will make a difference to bail out early.
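A minimal sketch of this kind of screen-space culling with the early bail-out could look as follows. It is not the app’s actual code: the Overlay class, its fields and the visible method are hypothetical, and the sketch assumes y grows downwards and the list is ordered top to bottom.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical label overlay; only the vertical extent matters for this sketch. */
class Overlay {
    final float y;      // top edge in screen pixels, y grows downwards (assumption)
    final float height;

    Overlay(float y, float height) {
        this.y = y;
        this.height = height;
    }
}

class OverlayCulling {
    /**
     * Returns the overlays that intersect the screen vertically.
     * Assumes the list is sorted top to bottom, so the loop can stop at the
     * first overlay that starts below the bottom screen edge.
     */
    static List<Overlay> visible(List<Overlay> sorted, float screenHeight) {
        List<Overlay> result = new ArrayList<Overlay>();
        for (Overlay o : sorted) {
            if (o.y >= screenHeight) break;      // below the screen, and so is everything after it
            if (o.y + o.height <= 0) continue;   // above the screen, skip and keep looking
            result.add(o);
        }
        return result;
    }
}
```

With a handful of overlays the break hardly matters, with hundreds of results it saves looking at most of the list.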

We are down from 20% to 10% with only 4 lines of code, not bad. Note that before the culling the application already ran at 60fps. If you only use that as your metric, you’ll miss battery saving opportunities!

Could we do better? The overlays could actually be put into a mesh, as their world position and orientation do not change (the camera moves, not the overlays). This would get rid of the copying of vertex data to the GPU each frame, which has quite a bit of impact on the CPU usage if there’s a lot of text. I did that, and it seems to help a tiny little bit:

27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 8% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv

Since i essentially have a single draw call (indexing into a mesh, starting at the first glyph of the first visible overlay, to the last glyph of the last visible overlay) there’s pretty much no CPU side code executed. It’s almost as if we don’t render anything at all :p.
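The bookkeeping for that single draw call boils down to an index offset and an index count. Here is a rough sketch, assuming every glyph is a quad made of two triangles (6 indices) and the glyphs are stored in the mesh in overlay order. GlyphRange and drawRange are hypothetical names, not libgdx API.

```java
/** Computes which slice of a static text mesh to draw for a run of visible overlays. */
class GlyphRange {
    static final int INDICES_PER_GLYPH = 6; // two triangles per glyph quad (assumption)

    /**
     * @param glyphCounts  number of glyphs per overlay, in mesh order
     * @param firstVisible index of the first visible overlay
     * @param lastVisible  index of the last visible overlay (inclusive)
     * @return {index offset, index count} for a single indexed draw call
     */
    static int[] drawRange(int[] glyphCounts, int firstVisible, int lastVisible) {
        int offset = 0;
        for (int i = 0; i < firstVisible; i++) offset += glyphCounts[i];
        int count = 0;
        for (int i = firstVisible; i <= lastVisible; i++) count += glyphCounts[i];
        return new int[] { offset * INDICES_PER_GLYPH, count * INDICES_PER_GLYPH };
    }
}
```

The two numbers would then go into whatever draw call accepts an offset and a count. In a real app the per-overlay offsets would probably be precomputed once when the mesh is built, so nothing needs to be summed per frame.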

What else could we do? Once there’s nothing animating anymore, we could draw our scene to an FBO, so we can render that in a single drawcall for as long as there’s nothing changing (user input, new animations). This is highly dependent on the application’s design of course, and might not be worth the often horribly big effort. Luckily my application lends itself to a pattern like this, games usually don’t fit that bill well. Apart from boring card games that is.
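The control flow of that idea is just a dirty flag. Below is a minimal sketch with no GL calls in it: the Scene interface stands in for the real drawing code, and all names (CachedSceneRenderer, renderFull, drawCached, invalidate) are made up for illustration.

```java
/** Re-renders the expensive scene only when something changed, otherwise reuses the cached result. */
class CachedSceneRenderer {
    /** Stand-in for the real drawing code; hypothetical, no GL calls here. */
    interface Scene {
        void renderFull();   // expensive: all decals, text and UI, drawn into the offscreen buffer
        void drawCached();   // cheap: one textured quad showing the offscreen buffer
    }

    private final Scene scene;
    private boolean dirty = true; // nothing cached yet

    CachedSceneRenderer(Scene scene) {
        this.scene = scene;
    }

    /** Call on user input or while an animation is running. */
    void invalidate() {
        dirty = true;
    }

    /** Call once per frame. */
    void render() {
        if (dirty) {
            scene.renderFull();
            dirty = false;
        }
        scene.drawCached();
    }
}
```

The hard part in practice is not this class but making sure every input event and animation reliably calls invalidate().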

For reference, here’s the CPU usage for the app without clearing the screen, but performing all logic and input processing:

27544 1 2% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 2% S 13 451728K 43608K fg app_126 com.badlogic.reddittv

This indicates that there’s an opportunity for improvement with the FBO approach i just described. In any case, i can get the app’s CPU and thus battery consumption down to a level where it can be competitive with the browser. In the usual usage scenario, where you do a lot of panning, my app already outperforms the browser. Granted, the amount of work the browser has to do to redraw dirty regions is huge compared to what my simple app does.

Take aways:

  • Care for your CPU consumption, even if your app runs at 60fps
  • Use top or similar tools to monitor the CPU usage of your app during its lifetime
  • Cache computations, e.g. use SpriteCache, BitmapFontCache or put things into a static Mesh to get rid of data transfer completely
  • Culling, broad phase and narrow phase
  • If your app’s architecture allows it, render the scene to an FBO when nothing changed, and use the FBO while nothing changes.
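To compare runs, it can help to boil a top trace like the ones above down to a single average. Here is a small sketch that does so; TopTrace and averageCpuPercent are made-up names, and the parsing assumes the CPU value is the first whitespace-separated token ending in '%', which holds for the lines shown in this post but not necessarily for every top version.

```java
import java.util.List;

/** Averages the CPU column of top output lines. */
class TopTrace {
    /** Assumes the CPU value is the first whitespace-separated token ending in '%'. */
    static double averageCpuPercent(List<String> lines) {
        int sum = 0;
        int samples = 0;
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.endsWith("%")) {
                    sum += Integer.parseInt(token.substring(0, token.length() - 1));
                    samples++;
                    break; // only the first percentage per line counts
                }
            }
        }
        return samples == 0 ? 0.0 : (double) sum / samples;
    }
}
```

Feeding it the captured lines of two runs gives two numbers that are easier to compare than eyeballing the columns.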

Wow, that got longer than i planned. Do you guys still read this? Shall i continue with these entries again? Been a while, not sure if there’s still an audience.