Reducing CPU and Battery Usage in OpenGL Apps on Android

ninja edit: the stats below are for an Asus Transformer. Readings will heavily differ on other devices. Grain of salt etc. Also, these are tips for the continuous rendering case. I’ll try to incorporate non-continuous rendering if i can find the time.

Long ass title, but i like things to be descriptive. I’m currently working on a non-gaming app for Android that lets you browse reddit in a more visual style. Here’s a little screenshot:

The tiles are actually decals rendered via DecalBatch, text beneath each tile is rendered via BitmapFontCache and a SpriteBatch, the UI at the top is a Stage with UI Actors (FlickScrollPane, Image, TextButton). Usually you browse a subreddit, displaying 50+ entries, through which you can scroll vertically. When you click on a tile, it expands into a new view, depending on its type. Here’s how an image is displayed:

You can freely pinch zoom/pan within the image view. There are also views for videos and html previews. In any case, there’s a lot going on in terms of blending, submitting geometry and so on.

At this point there are hardly any optimizations on the rendering side of things, i invested most of my time getting the heavy threading for media retrieval correct and working.

I’m worried about battery usage, as the application should be a competitor to browsing reddit in the browser or in one of the many reddit apps on the market. Being an OpenGL application with lots of animations and transitions, it’s rather hard to only render dirty regions. The only way for me to reduce battery usage is to decrease CPU usage as much as possible, and to use an overall dark theme so as to not make the display shine the energy of a thousand suns into your face. Here’s what i did.

The first step was a simple analysis of the CPU usage of the app. The easiest way to do this is to use adb from the console as follows:

~/adb shell top -m 10 | grep reddittv

This will fire up top on the connected device and output the top 10 CPU consumers in 2-3 second intervals. I additionally grep for my application’s name, or rather, the part of the package name that also shows up in the process name reported by top. I then let the app run and interact with it as usual. What i get is a coarse-grained trace of CPU activity, which is sufficient for an initial analysis. Here’s the output of the unoptimized app after a bit of playing around with it on my Transformer:

27174 1 22% S 11 449116K 36916K fg app_126 com.badlogic.reddittv
27174 1 20% S 11 449032K 36820K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36824K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36832K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36844K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36856K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36880K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36888K fg app_126 com.badlogic.reddittv
27174 0 18% S 11 449032K 36892K fg app_126 com.badlogic.reddittv
27174 0 19% S 11 449032K 36892K fg app_126 com.badlogic.reddittv
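
As an aside, if you want a single number per run instead of eyeballing the column, the captured lines can be averaged. This is a minimal sketch in plain Java, assuming the CPU column is the only token of the form "digits followed by %", as in the output above. The class and method names are made up for illustration.

```java
/** Minimal sketch: average the CPU column of captured `top` lines. */
class TopCpuAverage {
    /** Returns the CPU percentage found in one line, or -1 if there is none. */
    static int parseCpuPercent(String line) {
        for (String token : line.trim().split("\\s+")) {
            // Assumption: the CPU column is the only token shaped like "<digits>%".
            if (token.matches("\\d+%")) {
                return Integer.parseInt(token.substring(0, token.length() - 1));
            }
        }
        return -1;
    }

    /** Averages the CPU percentage over all lines that contain one. */
    static double averageCpuPercent(String[] lines) {
        int sum = 0;
        int count = 0;
        for (String line : lines) {
            int cpu = parseCpuPercent(line);
            if (cpu >= 0) {
                sum += cpu;
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        // Three of the lines from the trace above.
        String[] sample = {
            "27174 1 22% S 11 449116K 36916K fg app_126 com.badlogic.reddittv",
            "27174 1 20% S 11 449032K 36820K fg app_126 com.badlogic.reddittv",
            "27174 0 18% S 11 449032K 36892K fg app_126 com.badlogic.reddittv"
        };
        System.out.println(averageCpuPercent(sample));
    }
}
```

An average hides spikes, so it is only useful for comparing runs where you interact with the app in roughly the same way.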

First, notice how we don’t take up 100% of the CPU. That can be attributed to the vsync we perform each frame. This puts our rendering thread to sleep for quite a bit, allowing it to execute at most 60 times a second. The application runs at 60fps at this point. Now, let’s see how other applications perform, namely the browser. I fired up went to and then I panned/zoomed and idled on both pages. It’s not a super fair comparison of course, but it’s a nice set of measurements i can compare my app with.

25573 1 0% S 29 597684K 157716K fg app_54
25573 1 0% S 29 596908K 155840K fg app_54
25573 1 0% S 29 596908K 155848K fg app_54
25573 1 13% S 29 597024K 156172K fg app_54
25573 1 8% S 29 597100K 156260K fg app_54
25573 0 11% S 29 597100K 156260K fg app_54
25573 0 12% S 29 597176K 156380K fg app_54
25573 0 9% S 29 597252K 156456K fg app_54
25573 0 0% S 29 597252K 156456K fg app_54
25573 1 5% S 29 597252K 156456K fg app_54
25573 0 5% S 29 597328K 156532K fg app_54
25573 0 6% S 29 597328K 156532K fg app_54
25573 1 14% S 30 603436K 158600K fg app_54
25573 0 37% S 30 609352K 162308K fg app_54
25573 0 32% S 29 610868K 162268K fg app_54
25573 1 12% S 29 608724K 160964K fg app_54
25573 0 32% S 29 602000K 160056K fg app_54
25573 0 18% S 29 602000K 160068K fg app_54
25573 0 20% S 29 602352K 160420K fg app_54
25573 1 20% S 29 602000K 160080K fg app_54

The 0% usage stats for were achieved when i did not interact with the site at all. There are no animated components, so nothing needs to be redrawn. The browser goes totally to sleep in that case. The 5-12% measurements were taken while scrolling around, causing the browser to redraw parts of the screen.

The case is a bit more interesting. I did not interact with the site at all. However, there’s a text input field on that site that gets focus automatically. This field contains a blinking cursor, which triggers redraws. Those redraws are costly it seems. It could also be the interaction with the UI toolkit, so it’s likely not pure Webkit rendering. In any case, redraws will bring up the CPU usage of the browser. They happen whenever you interact with the site (panning/zooming), or when there are animated elements, javascript etc. I cannot (or can hardly) bring my application’s CPU usage down to 0% when it’s neither being interacted with nor animating, due to the architecture, which does not keep track of dirty rectangles. However, i can try to reduce the average CPU usage as much as possible, so that it evens out with the browser’s average CPU usage under normal user interaction scenarios. For reddit that means quite a bit of panning and thus redrawing.

What reduces CPU usage? Executing less code per frame. SpriteBatch and DecalBatch perform vertex generation on the CPU, as it is faster to submit a CPU-crafted vertex array containing all sprites/decals to the GPU than to draw individual (sub-)meshes and reset transformation matrices each frame. To bring my CPU usage down, i have to reduce the number of sprites/decals i render per pass. This can be easily achieved via simple culling. Let’s see what culling decals means for CPU usage:

27288 0 20% S 11 449036K 36932K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 36944K fg app_126 com.badlogic.reddittv
27288 1 20% S 11 449036K 37000K fg app_126 com.badlogic.reddittv
27288 0 19% S 11 449036K 37008K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 37012K fg app_126 com.badlogic.reddittv

Well, that didn’t do much. Out of 50 decals in total, i only render 15, yet the CPU usage stays pretty much the same. It’s no surprise really, as 50 decals aren’t that big of a deal in terms of calculations, especially if they don’t change. All it amounts to is copying vertices to the GPU, binding the shader with the respective textures & matrices and issuing the drawing command. Note that that would change a bit if the decals were animated, as they are during transitions. Still, transforming the 4 vertices of each of the 50 decals is really not a big deal.

The next candidate is the text. As stated above, i use a BitmapFontCache for each tile label in conjunction with a SpriteBatch. This means that i’m also just copying vertices to the GPU, as the cache of a label won’t ever change after its construction. Let’s see what culling does for us:

27364 0 10% S 12 449504K 36268K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449504K 36276K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449696K 36452K fg app_126 com.badlogic.reddittv
27364 1 12% S 12 449696K 36504K fg app_126 com.badlogic.reddittv
27364 1 10% S 12 449696K 36504K fg app_126 com.badlogic.reddittv

Oi, not bad! We brought the CPU usage down from about 20% to about 10%, cutting it roughly in half. You might wonder how much code was necessary for this. Here it is in all its glory:

private void drawOverlay () {
	orthoCamera.position.x = camera.position.x;
	orthoCamera.position.y = camera.position.y;
	float upperY = camera.position.y + * 0.5f;
	float lowerY = camera.position.y - * 0.5f;
	for(int i = 0; i < overlays.size; i++) {
		Overlay overlay = overlays.get(i);
		float halfHeight = overlay.getHeight() * 0.5f;
		if(overlay.getPosition().y + halfHeight < lowerY || overlay.getPosition().y - halfHeight > upperY) continue;

As you can see, the culling is stupidly simple. Everything is done in screen coordinates (yes, the 3D decals are actually using pixel coordinates, hurray for pixel perfect perspective projection :)). We could also exploit the fact that the overlays Array has an order, from the top left overlay to the bottom right overlay. As soon as an overlay is beneath the bottom screen edge we could jump out of the loop and not bother culling the rest of the overlays, which we know would be invisible anyway. I added this optimization for a future feature where you can load more result pages. If you have hundreds of results it will make a difference to bail out early.
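
The early bail-out could look roughly like the following. This is a minimal sketch in plain Java without any libgdx types, so it is not the app’s actual code. The class, method and parameter names (including viewportHeight) are hypothetical, and it assumes y points up and the overlays are ordered so that their top edges never increase.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of vertical culling with an early exit; all names are hypothetical. */
class OverlayCulling {
    /**
     * centerY[i] and height[i] describe overlay i. Assumption: overlays are ordered
     * from top to bottom with y pointing up, so their top edges never increase.
     * Returns the indices of the overlays that intersect the visible range.
     */
    static List<Integer> visibleIndices(float[] centerY, float[] height,
                                        float cameraY, float viewportHeight) {
        float upperY = cameraY + viewportHeight * 0.5f;
        float lowerY = cameraY - viewportHeight * 0.5f;
        List<Integer> visible = new ArrayList<>();
        for (int i = 0; i < centerY.length; i++) {
            float halfHeight = height[i] * 0.5f;
            // Entirely below the bottom edge: everything after it is too, so stop.
            if (centerY[i] + halfHeight < lowerY) break;
            // Entirely above the top edge: skip it, later overlays may still be visible.
            if (centerY[i] - halfHeight > upperY) continue;
            visible.add(i);
        }
        return visible;
    }
}
```

The break only pays off for long lists, and it is only correct as long as the ordering assumption holds. If overlays of very different heights are mixed, the plain continue-only version is the safer choice.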

We are down from 20% to 10% with only 4 lines of code, not bad. Note that before the culling the application already ran at 60fps. If you only use that as your metric, you’ll miss battery saving opportunities!

Could we do better? The overlays could actually be put into a mesh, as their world position and orientation do not change (the camera moves, not the overlays). This would get rid of the copying of vertex data to the GPU each frame, which has quite a bit of impact on the CPU usage if there’s a lot of text. I did that, and it seems to help a tiny little bit:

27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 8% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv

Since i essentially have a single draw call (indexing into a mesh, starting at the first glyph of the first visible overlay, to the last glyph of the last visible overlay) there’s pretty much no CPU side code executed. It’s almost as if we don’t render anything at all :p.
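
Working out which part of the mesh to draw is just a bit of bookkeeping. Here is a minimal sketch in plain Java, assuming each glyph is a quad made of two triangles (6 indices) and the glyphs of all overlays are stored back to back in overlay order. The names are hypothetical, not taken from the app.

```java
/** Minimal sketch: index range covering the glyphs of the visible overlays. */
class GlyphRange {
    // Assumption: one glyph = one quad = two triangles = 6 indices.
    static final int INDICES_PER_GLYPH = 6;

    /**
     * glyphCounts[i] is the number of glyphs of overlay i, stored back to back in the mesh.
     * Returns {offset, count}, both in indices, for overlays firstVisible..lastVisible (inclusive).
     */
    static int[] indexRange(int[] glyphCounts, int firstVisible, int lastVisible) {
        int glyphsBefore = 0;
        for (int i = 0; i < firstVisible; i++) {
            glyphsBefore += glyphCounts[i];
        }
        int glyphsVisible = 0;
        for (int i = firstVisible; i <= lastVisible; i++) {
            glyphsVisible += glyphCounts[i];
        }
        return new int[] {glyphsBefore * INDICES_PER_GLYPH, glyphsVisible * INDICES_PER_GLYPH};
    }
}
```

The two numbers would then go in as the offset and count of whatever indexed draw call you use. With hundreds of overlays you could precompute the running sums once instead of looping every frame.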

What else could we do? Once there’s nothing animating anymore, we could draw our scene to an FBO, so we can render that in a single draw call for as long as there’s nothing changing (user input, new animations). This is highly dependent on the application’s design of course, and might not be worth the often horribly big effort. Luckily my application lends itself to a pattern like this; games usually don’t fit that bill well. Apart from boring card games that is.
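
The control flow of that idea boils down to a dirty flag. The following is a minimal sketch in plain Java, with the actual GL work hidden behind two Runnables. The class and its methods are hypothetical and do not exist in libgdx.

```java
/** Minimal sketch of "render once, reuse while nothing changes"; all names are hypothetical. */
class CachedSceneRenderer {
    private final Runnable renderSceneToCache; // e.g. draw the full scene into an FBO
    private final Runnable drawCache;          // e.g. draw the FBO texture as a single quad
    private boolean dirty = true;              // the first frame always needs a full render

    CachedSceneRenderer(Runnable renderSceneToCache, Runnable drawCache) {
        this.renderSceneToCache = renderSceneToCache;
        this.drawCache = drawCache;
    }

    /** Call on user input, or each frame while an animation is running. */
    void markDirty() {
        dirty = true;
    }

    /** Call once per frame. */
    void render() {
        if (dirty) {
            renderSceneToCache.run();
            dirty = false;
        }
        drawCache.run();
    }
}
```

While something is animating, this does strictly more work than rendering directly, since every frame goes through the cache. A real implementation would probably bypass the cache in that case and only switch to it once things settle.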

For reference, here’s the CPU usage for the app without clearing the screen, but performing all logic and input processing:

27544 1 2% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 2% S 13 451728K 43608K fg app_126 com.badlogic.reddittv

This indicates that there’s an opportunity for improvement with the FBO approach i just described. In any case, i can get the app’s CPU and thus battery consumption down to a level where it can be competitive with the browser. In the usual usage scenario, where you do a lot of panning, my app already outperforms the browser. Granted, the amount of work the browser has to do to redraw dirty regions is huge compared to what my simple app does.

Takeaways:

  • Care for your CPU consumption, even if your app runs at 60fps
  • Use top or similar tools to monitor the CPU usage of your app during its lifetime
  • Cache computations, e.g. use SpriteCache, BitmapFontCache or put things into a static Mesh to get rid of data transfer completely
  • Culling, broad phase and narrow phase
  • If your app’s architecture allows it, render the scene to an FBO once nothing changes anymore, and reuse the FBO for as long as nothing changes.

Wow, that got longer than i planned. Do you guys still read this? Shall i continue with these entries again? Been a while, not sure if there’s still an audience.

  • ChvyVele

    Great post! I’ll have to start using ‘top’.

  • Romain Guy

    Why the FBO? It’s much simpler to stop drawing when nothing changes. That’s why the browser has a 0% CPU usage when a page is still.

  • P.T.

    Great post. The libgdx architecture (run render() as fast as possible) always seemed like it would lead to unnecessary CPU usage and battery drain, so it’s nice to see that this problem isn’t too hard to tackle. The specific tips on tracking the usage and solutions to look for are really handy.

    That said, having a way for a libgdx app to ‘block until input or timeout’ would be handy (especially for menus or other times when the screen is basically static) and would probably get as much of the savings? (Of course, you have actual code and numbers and I have mere words here, so I’m probably missing something …)

    I’m not sure how you’d measure it, but it would be interesting to see for each frame render how often the entire screen is 100% the same as last time….

  • cypher

    bring it on!

  • magnesus

    Please, continue, it’s a very interesting read! “Apart from boring card games that is.” – I will think about it for my Solitaires. Although I think in my case it would be more useful to draw only the visible parts of cards when they are stacked. This shouldn’t be difficult to implement, and I love doing optimisations like that. :) I started having performance problems in solitaires with 100 and more cards on screen (they need to be linearly scaled for them to be readable and my textures are quite big so the cards look great on tablets), simply removing 9/10 cards from tight stacks (almost no visible change, because it’s basically 50 cards on top of each other with 0.1 pixel shift) helped a lot, but I still don’t have 60FPS in one of the solitaires I’m working on. FBO could help with semitransparent elements, I’ll have to check it. Also I just realised that semitransparent bottoms of stacks are most of the time invisible, so I don’t have to draw them then.

  • Mando

    Very interesting stuff

  • Mario

    Good point. That assumes that we have a mechanism to prevent buffer swapping, which at the moment, is not supported by libgdx. The workaround would consist of drawing to the FBO, then sleeping heavily on the rendering thread, reusing the FBO to update the buffer as necessary. That’s of course more than suboptimal, but a working solution until i have time to add support for non-continuous rendering.

  • Rik

    Very interesting read, thank you.

  • esak

    Interesting read, keep it up!

  • Mario

    “block until input or timeout” sounds like a good plan. The threading model of libgdx on Android is something i’m not particularly proud of. Its “design” prohibits me from adding that easily. I’ll try to find some time to resolve this issue.

  • Badly Drawn Rod

    Posts like this are always interesting as it is always fascinating and enlightening to see someone else’s approach to a problem. And, as you can see from the comments that you’re getting, it allows other people to contribute too.

  • Simon

    Great blog post :) what other commands than top are there for measuring cpu performance and what other factors should be mentioned. what about gpu workload measuring?

  • Erik

    One of my favorite blogs! I learn a lot 😀

  • Mike Leahy

    Keep the posts coming.. :)

    I like the “tip from the top”.

    @Simon for GPU workload measuring perhaps check out the following GPU vendor specific tools:

    I’ve yet to fire up the above, but saw a demo of the NVidia and Qualcomm tools at AnDevCon and they are amazing.

    While I’m still approaching GL GUI / scene support with my efforts this post certainly got some wheels turning here on how I’ll have to modify clocking to support non-continuous rendering.

  • Mika

    Nice post, especially the trick with top to measure CPU time. I was able to shave off quite a bit of CPU time from my game project (not using libgdx, but still) by a metric other than FPS count. Good stuff! :)

  • Carlos

    please continue with these entries, they are not only delightful but absolutely useful