Non-continuous rendering in Libgdx

After the feedback on my last post I had to implement non-continuous rendering. It was easier than expected. Thanks to Romain and P.T. for subtly pointing out that the proposed “solution” for fake non-continuous rendering is pretty terrible. I will repent 🙂

Here’s how it works. The Graphics interface has three new methods:
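The original snippet got lost along the way; from memory, the three methods look like this (check the Graphics Javadocs of your libgdx version for the authoritative signatures). The FakeGraphics class below is just a stand-in to illustrate the contract, not libgdx code:

```java
// Sketch of the new methods on the Graphics interface (names from memory,
// see the actual Javadocs). FakeGraphics mimics the intended behavior.
public class NonContinuousDemo {
    interface Graphics {
        /** Turn continuous rendering on or off. */
        void setContinuousRendering(boolean isContinuous);
        /** @return whether rendering is continuous. */
        boolean isContinuousRendering();
        /** Request a single render() call while continuous rendering is off. */
        void requestRendering();
    }

    /** Minimal stand-in, NOT libgdx code; models the contract only. */
    static class FakeGraphics implements Graphics {
        private boolean continuous = true;
        private boolean renderRequested;

        public void setContinuousRendering(boolean isContinuous) { continuous = isContinuous; }
        public boolean isContinuousRendering() { return continuous; }
        public void requestRendering() { renderRequested = true; }

        /** Called by the game loop: should render() run this iteration? */
        boolean shouldRender() {
            boolean result = continuous || renderRequested;
            renderRequested = false;
            return result;
        }
    }

    public static void main(String[] args) {
        FakeGraphics g = new FakeGraphics();
        g.setContinuousRendering(false);      // stop rendering every frame
        System.out.println(g.shouldRender()); // false: nothing requested
        g.requestRendering();                 // e.g. after an input event
        System.out.println(g.shouldRender()); // true: one frame is rendered
        System.out.println(g.shouldRender()); // false again
    }
}
```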

Not sure this needs more explanation. Read the Javadocs 🙂

Caveat: This is currently a NOP in the Lwjgl backend; I already added it to the Jogl backend. It will take a bit before the other backends are up to speed. I hope to finish that off today.

Try it out and let me know of any issues you have.

Reducing CPU and Battery Usage in OpenGL Apps on Android

ninja edit: the stats below are for an Asus Transformer. Readings will differ heavily on other devices. Grain of salt etc. Also, these are tips for the continuous rendering case. I’ll try to incorporate non-continuous rendering if I can find the time.

Long-ass title, but I like things to be descriptive. I’m currently working on a non-gaming app for Android that lets you browse reddit in a more visual style. Here’s a little screenshot:

The tiles are actually decals rendered via DecalBatch, the text beneath each tile is rendered via BitmapFontCache and a SpriteBatch, and the UI at the top is a Stage with UI Actors (FlickScrollPane, Image, TextButton). Usually you browse a subreddit, displaying 50+ entries, through which you can scroll vertically. When you click on a tile, it expands into a new view, depending on its type. Here’s how an image is displayed:

You can freely pinch zoom/pan within the image view. There are also views for videos and html previews. In any case, there’s a lot going on in terms of blending, submitting geometry and so on.

At this point there are hardly any optimizations on the rendering side of things; I invested most of my time in getting the heavy threading for media retrieval correct and working.

I’m worried about battery usage, as the application should be a competitor to browsing reddit in the browser or in one of the many reddit apps on the market. Being an OpenGL application with lots of animations and transitions, it’s rather hard to only render dirty regions. The only options left for me are to decrease CPU usage as much as possible, and to use an overall dark theme so the display doesn’t shine the energy of a thousand suns into your face. Here’s what I did.

The first step was a simple analysis of the CPU usage of the app. The easiest way to do this is to use adb from the console as follows:
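The exact command fell victim to the blog migration; it was something along these lines (the `-m`/`-d` flags limit the task count and set the refresh delay, and "reddittv" is the part of my package name to grep for; substitute your own):

```shell
# Show the top 10 CPU consumers every 2 seconds on the connected device,
# filtered down to my app's process name.
adb shell top -m 10 -d 2 | grep reddittv
```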

This will fire up top on the connected device and output the top 10 CPU consumers in 2-3 second intervals. I additionally grep for my application’s name, or rather the part of the package name that will also be the process name reported by top. I then let the app run and interact with it as usual. What I get is a coarse-grained trace of CPU activity, which is sufficient for an initial analysis. Here’s the output of the unoptimized app after a bit of playing around with it on my Transformer:


27174 1 22% S 11 449116K 36916K fg app_126 com.badlogic.reddittv
27174 1 20% S 11 449032K 36820K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36824K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36832K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36844K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36856K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36880K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36888K fg app_126 com.badlogic.reddittv
27174 0 18% S 11 449032K 36892K fg app_126 com.badlogic.reddittv
27174 0 19% S 11 449032K 36892K fg app_126 com.badlogic.reddittv

First, notice how we don’t take up 100% of the CPU. That can be attributed to the vsync we perform each frame, which puts our rendering thread to sleep for quite a bit, allowing it to execute at most 60 times a second. The application runs at 60fps at this point. Now, let’s see how other applications perform, namely the browser. I fired it up, went to reddit.com and then google.com, and panned/zoomed and idled on both pages. It’s not a super fair comparison of course, but it’s a nice set of measurements I can compare my app with.


reddit.com
25573 1 0% S 29 597684K 157716K fg app_54 com.android.browser
25573 1 0% S 29 596908K 155840K fg app_54 com.android.browser
25573 1 0% S 29 596908K 155848K fg app_54 com.android.browser
25573 1 13% S 29 597024K 156172K fg app_54 com.android.browser
25573 1 8% S 29 597100K 156260K fg app_54 com.android.browser
25573 0 11% S 29 597100K 156260K fg app_54 com.android.browser
25573 0 12% S 29 597176K 156380K fg app_54 com.android.browser
25573 0 9% S 29 597252K 156456K fg app_54 com.android.browser
25573 0 0% S 29 597252K 156456K fg app_54 com.android.browser
25573 1 5% S 29 597252K 156456K fg app_54 com.android.browser
25573 0 5% S 29 597328K 156532K fg app_54 com.android.browser
25573 0 6% S 29 597328K 156532K fg app_54 com.android.browser
google.com
25573 1 14% S 30 603436K 158600K fg app_54 com.android.browser
25573 0 37% S 30 609352K 162308K fg app_54 com.android.browser
25573 0 32% S 29 610868K 162268K fg app_54 com.android.browser
25573 1 12% S 29 608724K 160964K fg app_54 com.android.browser
25573 0 32% S 29 602000K 160056K fg app_54 com.android.browser
25573 0 18% S 29 602000K 160068K fg app_54 com.android.browser
25573 0 20% S 29 602352K 160420K fg app_54 com.android.browser
25573 1 20% S 29 602000K 160080K fg app_54 com.android.browser

The 0% usage stats for reddit.com were achieved when I did not interact with the site at all. There are no animated components, so nothing needs to be redrawn. The browser goes totally to sleep in that case. The 5-12% measurements were taken while scrolling around, causing the browser to redraw parts of the screen.

The google.com case is a bit more interesting. I did not interact with the site at all. However, there’s a text input field on that site that gets focus automatically. This field contains a blinking cursor, which triggers redraws. Those redraws seem to be costly. It could also be the interaction with the UI toolkit, so it’s likely not pure Webkit rendering. In any case, redraws will bring up the browser’s CPU usage; they happen whenever you interact with the site (panning/zooming), and with animated elements, javascript etc. Due to my application’s architecture, which does not keep track of dirty rectangles, I can’t (or can hardly) bring its CPU usage down to 0% when it’s not being interacted with and nothing is animating. However, I can try to reduce the average CPU usage as much as possible, so that it evens out with the browser’s average CPU usage under normal user interaction scenarios. For reddit that means quite a bit of panning and thus redrawing.

What reduces CPU usage? Executing less code per frame. SpriteBatch and DecalBatch perform vertex generation on the CPU, as it is faster to submit a single CPU-crafted vertex array containing all sprites/decals to the GPU than to draw individual (sub-)meshes and reset transformation matrices each frame. To bring my CPU usage down, I have to reduce the number of sprites/decals I render per pass. This can easily be achieved via simple culling. Let’s see what culling decals means for CPU usage:


27288 0 20% S 11 449036K 36932K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 36944K fg app_126 com.badlogic.reddittv
27288 1 20% S 11 449036K 37000K fg app_126 com.badlogic.reddittv
27288 0 19% S 11 449036K 37008K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 37012K fg app_126 com.badlogic.reddittv

Well, that didn’t do much. Out of 50 decals in total, I only render 15, yet the CPU usage stays pretty much the same. It’s no surprise really, as 50 decals aren’t that big of a deal in terms of calculations, especially if they don’t change. All it amounts to is copying vertices to the GPU, binding the shader with the respective textures & matrices and issuing the drawing command. Note that this would change a bit if the decals were animated, as they are during transitions. Still, transforming the 4 vertices of each of the 50 decals is really not a big deal.

The next candidate is the text. As stated above, I use a BitmapFontCache for each tile label in conjunction with a SpriteBatch. This means that I’m also just copying vertices to the GPU, as the cache of a label won’t ever change after its construction. Let’s see what culling does for us:


27364 0 10% S 12 449504K 36268K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449504K 36276K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449696K 36452K fg app_126 com.badlogic.reddittv
27364 1 12% S 12 449696K 36504K fg app_126 com.badlogic.reddittv
27364 1 10% S 12 449696K 36504K fg app_126 com.badlogic.reddittv

Oi, not bad! We roughly halved the CPU usage, from 20% down to around 10%. You might wonder how much code was necessary for this. Here it is in all its glory:
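The original snippet got lost in the blog migration, but it boiled down to a bounds check in screen coordinates with an early bail-out, something like this (Overlay, camY and the concrete numbers are stand-ins for the real classes and fields in the app):

```java
// Plain-Java sketch of the culling; Overlay and camY are stand-ins for the
// actual app classes. Overlays are sorted top-left to bottom-right, so once
// one is below the bottom screen edge we can bail out of the loop entirely.
import java.util.ArrayList;
import java.util.List;

public class CullingSketch {
    static class Overlay {
        float y, height; // screen coordinates, y grows downwards
        Overlay(float y, float height) { this.y = y; this.height = height; }
    }

    /** Returns the overlays whose bounds intersect the visible screen area. */
    static List<Overlay> cull(List<Overlay> overlays, float camY, float screenHeight) {
        List<Overlay> visible = new ArrayList<Overlay>();
        for (Overlay o : overlays) {
            if (o.y + o.height < camY) continue;  // entirely above the top edge
            if (o.y > camY + screenHeight) break; // below the bottom edge: all
                                                  // later overlays are too (sorted)
            visible.add(o);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Overlay> overlays = new ArrayList<Overlay>();
        for (int i = 0; i < 50; i++) overlays.add(new Overlay(i * 100, 90));
        System.out.println(cull(overlays, 500, 800).size()); // prints 9
    }
}
```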

As you can see, the culling is stupidly simple. Everything is done in screen coordinates (yes, the 3D decals are actually using pixel coordinates, hurray for pixel-perfect perspective projection :)). We can also exploit the fact that the overlays Array has an order, from the top left overlay to the bottom right overlay. As soon as an overlay is beneath the bottom screen edge we can jump out of the loop and not bother culling the rest of the overlays, which we know would be invisible anyway. I added this optimization for a future feature where you can load more result pages. If you have hundreds of results it will make a difference to bail out early.

We are down from 20% to 10% with only 4 lines of code, not bad. Note that before the culling the application already ran at 60fps. If you only use that as your metric, you’ll miss battery saving opportunities!

Could we do better? The overlays could actually be put into a static mesh, as their world position and orientation do not change (the camera moves, not the overlays). This would get rid of copying the vertex data to the GPU each frame, which has quite a bit of impact on CPU usage if there’s a lot of text. I did that, and it seems to help a tiny little bit:


27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 8% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv

Since I essentially have a single draw call (indexing into the mesh, starting at the first glyph of the first visible overlay and ending at the last glyph of the last visible overlay), there’s pretty much no CPU-side code executed. It’s almost as if we don’t render anything at all :p.
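In plain-Java terms, the per-frame work shrinks to computing a contiguous vertex range to hand to the draw call; the glyph counts and method names below are stand-ins for the real code, not the actual app:

```java
// Sketch: with all glyphs baked into one static mesh, the per-frame work is
// just computing which contiguous vertex range to draw. firstVisible and
// lastVisible come from the culling step; vertsPerGlyph is 4 for quads.
public class DrawRangeSketch {
    /** Prefix sums: starts[i] = index of overlay i's first glyph in the mesh. */
    static int[] glyphStarts(int[] glyphCounts) {
        int[] starts = new int[glyphCounts.length + 1];
        for (int i = 0; i < glyphCounts.length; i++)
            starts[i + 1] = starts[i] + glyphCounts[i];
        return starts;
    }

    /** Vertex offset and count for drawing overlays [firstVisible, lastVisible]. */
    static int[] drawRange(int[] starts, int firstVisible, int lastVisible, int vertsPerGlyph) {
        int offset = starts[firstVisible] * vertsPerGlyph;
        int count = (starts[lastVisible + 1] - starts[firstVisible]) * vertsPerGlyph;
        return new int[] { offset, count };
    }

    public static void main(String[] args) {
        int[] starts = glyphStarts(new int[] { 10, 12, 8, 20 }); // glyphs per label
        int[] range = drawRange(starts, 1, 2, 4); // overlays 1..2 are visible
        // the real app then does something like mesh.render(..., range[0], range[1])
        System.out.println(range[0] + " " + range[1]); // prints "40 80"
    }
}
```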

What else could we do? Once nothing is animating anymore, we could draw our scene to an FBO and then render that in a single draw call for as long as nothing changes (user input, new animations). This is highly dependent on the application’s design of course, and might not be worth the often horribly big effort. Luckily my application lends itself to a pattern like this; games usually don’t fit that bill well. Apart from boring card games, that is.
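The pattern itself is simple; here’s a sketch in plain Java with a dirty flag and an array standing in for the FBO (the libgdx calls in the comments are only hints at where the real FBO code would go):

```java
// Sketch of the caching pattern: do the expensive full scene render only
// when something changed, otherwise reuse the cached result. A plain int[]
// stands in for the FBO's color buffer here.
public class SceneCacheSketch {
    static int renderCalls = 0;
    static int[] cache;          // stand-in for the FBO contents
    static boolean dirty = true; // set on input, animations, new data

    /** The expensive full scene render. */
    static int[] renderScene() {
        renderCalls++;
        return new int[] { 1, 2, 3 };
    }

    /** Per frame: full render only when dirty, cheap reuse otherwise. */
    static int[] frame() {
        if (dirty) {
            cache = renderScene(); // real app: fbo.begin(); drawScene(); fbo.end();
            dirty = false;
        }
        return cache;              // real app: draw the FBO's color texture
    }

    public static void main(String[] args) {
        frame(); frame(); frame();       // three frames, nothing changes
        System.out.println(renderCalls); // prints 1
        dirty = true;                    // e.g. the user touched the screen
        frame();
        System.out.println(renderCalls); // prints 2
    }
}
```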

For reference, here’s the CPU usage for the app without clearing the screen, but performing all logic and input processing:


27544 1 2% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 2% S 13 451728K 43608K fg app_126 com.badlogic.reddittv

This indicates that there’s an opportunity for improvement with the FBO approach I just described. In any case, I can bring the app’s CPU and thus battery consumption down to a level where it is competitive with the browser. In the usual usage scenario, where you do a lot of panning, my app already outperforms the browser. Granted, the amount of work the browser has to do to redraw dirty regions is huge compared to what my simple app does.

Takeaways:

  • Care about your CPU consumption, even if your app runs at 60fps
  • Use top or similar tools to monitor your app’s CPU usage over its lifetime
  • Cache computations, e.g. use SpriteCache, BitmapFontCache, or put things into a static Mesh to get rid of the data transfer completely
  • Cull, both broad phase and narrow phase
  • If your app’s architecture allows it, render the scene to an FBO when nothing has changed, and draw that FBO while nothing changes

Wow, that got longer than I planned. Do you guys still read this? Shall I continue with these entries again? Been a while, not sure if there’s still an audience.

EGL Context loss and GLES 2.0 performance

I just finished off a couple of new improvements.

  • If your app runs on a device with Android 3.0+, the EGL context will be preserved as long as there’s not too much resource pressure from other apps. You don’t have to do anything special.
  • SpriteBatch/DecalBatch and all things that use those funky classes (Stage, etc.) got a performance boost when using GLES 2.0. On my test devices, GLES 2.0 is now always faster than GLES 1.x.

Those improvements were added as I discovered a few performance issues with an app I’m working on. There might be more to come 😀