Non-continuous rendering in Libgdx

After the feedback on my last post i had to implement non-continuous rendering. It was easier than expected. Thanks to Romain and P.T. for subtly pointing out that the proposed “solution” for fake non-continuous rendering is pretty terrible. I will repent :)

Here’s how it works. The Graphics interface has three new methods:

interface Graphics {
	/**
	 * Sets whether to render continuously. In case rendering is performed non-continuously, the
	 * following events will trigger a redraw:
	 * 
	 * <ul>
	 * <li>A call to {@link #requestRendering()}</li>
	 * <li>Input events from the touch screen/mouse or keyboard</li>
	 * <li>A {@link Runnable} is posted to the rendering thread via {@link Application#postRunnable(Runnable)}</li>
	 * </ul>
	 * 
	 * Life-cycle events will also be reported as usual, see {@link ApplicationListener}. 
	 * This method can be called from any thread.
	 * 
	 * @param isContinuous whether the rendering should be continuous or not.
	 */
	public void setContinuousRendering(boolean isContinuous);
 
	/**
	 * @return whether rendering is continuous.
	 */
	public boolean isContinuousRendering();
 
	/**
	 * Requests a new frame to be rendered if the rendering mode is non-continuous. This method
	 * can be called from any thread.
	 */
	public void requestRendering();
}

Not sure this needs more explanation. Read the Javadocs :)
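For those who'd rather see it than read it, here's a rough, self-contained model of the rule the Javadoc describes. This is not the backend code: the class RenderGate and its shouldRenderFrame() method are made up for illustration, and only the three method names from the interface above are real.

```java
/** Hypothetical model of the render-on-demand rule; not actual libgdx backend code. */
final class RenderGate {
	private boolean continuous = true;
	private boolean requested = false;

	public synchronized void setContinuousRendering (boolean isContinuous) {
		continuous = isContinuous;
	}

	public synchronized boolean isContinuousRendering () {
		return continuous;
	}

	/** Safe to call from any thread: it only sets a flag under the lock. */
	public synchronized void requestRendering () {
		requested = true;
	}

	/** Assumed to be polled by the rendering thread once per loop iteration. */
	public synchronized boolean shouldRenderFrame () {
		boolean wasRequested = requested;
		requested = false; // a request is good for exactly one frame
		return continuous || wasRequested;
	}
}
```

In an actual app you'd call Gdx.graphics.setContinuousRendering(false) once, and Gdx.graphics.requestRendering() whenever something changed that the listed events don't already cover, e.g. from a worker thread that just finished a download.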

Caveat: This is currently a NOP in the Lwjgl backend, I added it to the Jogl backend already. It will take a bit before those are up to speed. I hope to finish that today.

Try it out and let me know of any issues you have.

Reducing CPU and Battery Usage in OpenGL Apps on Android

ninja edit: the stats below are for an Asus Transformer. Readings will heavily differ on other devices. Grain of salt etc. Also, these are tips for the continuous rendering case. I’ll try to incorporate non-continuous rendering if i can find the time.

Long ass title, but i like things to be descriptive. I’m currently working on a non-gaming app for Android that lets you browse reddit in a more visual style. Here’s a little screenshot:

The tiles are actually decals rendered via DecalBatch, text beneath each tile is rendered via BitmapFontCache and a SpriteBatch, the UI at the top is a Stage with UI Actors (FlickScrollPane, Image, TextButton). Usually you browse a subreddit, displaying 50+ entries, through which you can scroll vertically. When you click on a tile, it expands into a new view, depending on its type. Here’s how an image is displayed:

You can freely pinch zoom/pan within the image view. There are also views for videos and html previews. In any case, there’s a lot going on in terms of blending, submitting geometry and so on.

At this point there are hardly any optimizations on the rendering side of things, i invested most of my time getting the heavy threading for media retrieval correct and working.

I’m worried about battery usage, as the application should be a competitor to browsing reddit in the browser or in one of the many reddit apps on the market. Being an OpenGL application with lots of animations and transitions, it’s rather hard to only render dirty regions. The only way for me to reduce battery usage is to decrease CPU usage as much as possible, and having an overall dark theme so as to not make the display shine the energy of a thousand suns into your face. Here’s what i did.

The first step was a simple analysis of the CPU usage of the app. The easiest way to do this is to use adb from the console as follows:

~/adb shell top -m 10 | grep reddittv

This will fire up top on the connected device and output the top 10 CPU consumers in 2-3 second intervals. I additionally grep for my application’s name, or rather, the part of the package name which will also be the process name reported by top. I then let the app run and interact with it as usual. What i get is a coarse-grained trace of CPU activity which is sufficient for an initial analysis. Here’s the output of the unoptimized app after a bit of playing around with it on my Transformer:


27174 1 22% S 11 449116K 36916K fg app_126 com.badlogic.reddittv
27174 1 20% S 11 449032K 36820K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36824K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36832K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36840K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36844K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36856K fg app_126 com.badlogic.reddittv
27174 0 21% S 11 449032K 36880K fg app_126 com.badlogic.reddittv
27174 0 20% S 11 449032K 36888K fg app_126 com.badlogic.reddittv
27174 0 18% S 11 449032K 36892K fg app_126 com.badlogic.reddittv
27174 0 19% S 11 449032K 36892K fg app_126 com.badlogic.reddittv

First, notice how we don’t take up 100% of the CPU. That can be attributed to the vsync we perform each frame. This will put our rendering thread to sleep for quite a bit, allowing our thread to execute at most 60 times a second. The application runs at 60fps at this point. Now, let’s see how other applications perform, namely the browser. I fired up the browser and went to reddit.com and then google.com. I panned/zoomed and idled on both pages. It’s not a super fair comparison of course, but it’s a nice set of measurements i can compare my app with.


reddit.com
25573 1 0% S 29 597684K 157716K fg app_54 com.android.browser
25573 1 0% S 29 596908K 155840K fg app_54 com.android.browser
25573 1 0% S 29 596908K 155848K fg app_54 com.android.browser
25573 1 13% S 29 597024K 156172K fg app_54 com.android.browser
25573 1 8% S 29 597100K 156260K fg app_54 com.android.browser
25573 0 11% S 29 597100K 156260K fg app_54 com.android.browser
25573 0 12% S 29 597176K 156380K fg app_54 com.android.browser
25573 0 9% S 29 597252K 156456K fg app_54 com.android.browser
25573 0 0% S 29 597252K 156456K fg app_54 com.android.browser
25573 1 5% S 29 597252K 156456K fg app_54 com.android.browser
25573 0 5% S 29 597328K 156532K fg app_54 com.android.browser
25573 0 6% S 29 597328K 156532K fg app_54 com.android.browser
google.com
25573 1 14% S 30 603436K 158600K fg app_54 com.android.browser
25573 0 37% S 30 609352K 162308K fg app_54 com.android.browser
25573 0 32% S 29 610868K 162268K fg app_54 com.android.browser
25573 1 12% S 29 608724K 160964K fg app_54 com.android.browser
25573 0 32% S 29 602000K 160056K fg app_54 com.android.browser
25573 0 18% S 29 602000K 160068K fg app_54 com.android.browser
25573 0 20% S 29 602352K 160420K fg app_54 com.android.browser
25573 1 20% S 29 602000K 160080K fg app_54 com.android.browser

The 0% usage stats for reddit.com were achieved when i did not interact with the site at all. There are no animated components, so nothing needs to be redrawn. The browser goes totally to sleep in that case. The 5-12% measurements were taken while scrolling around, causing the browser to redraw parts of the screen.

The google.com case is a bit more interesting. I did not interact with the site at all. However, there’s a text input field on that site that gets focus automatically. This field contains a blinking cursor, which triggers redraws. Those redraws are costly it seems. It could also be the interaction with the UI toolkit, so it’s likely not pure Webkit rendering. In any case, redraws will bring up the CPU usage of the browser. These happen whenever you interact with the site (panning/zooming), or when there are animated elements, javascript etc. I can not (or hardly) bring my application’s CPU usage down to 0% when it’s not being interacted with and not animating, due to the architecture, which does not keep track of dirty rectangles. However, i can try to reduce the average CPU usage as much as possible, so that it evens out with the browser’s average CPU usage under normal user interaction scenarios. For reddit that means quite a bit of panning and thus redrawing.
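Eyeballing averages from top output gets old quickly. Here's a minimal sketch for averaging the CPU column of captured lines. TopLog is a made-up helper, and it assumes the column layout shown in the traces above, where the third whitespace-separated field is the percentage. Other devices or top versions may order the columns differently.

```java
/** Hypothetical helper for summarising captured `top` lines; not part of adb or libgdx. */
final class TopLog {
	/** Returns the mean of the CPU column (third field, e.g. "22%"), or NaN if no line parses. */
	static double averageCpuPercent (String[] lines) {
		int sum = 0;
		int count = 0;
		for (String line : lines) {
			String[] fields = line.trim().split("\\s+");
			if (fields.length < 3 || !fields[2].endsWith("%")) continue; // skip labels and blank lines
			try {
				sum += Integer.parseInt(fields[2].substring(0, fields[2].length() - 1));
				count++;
			} catch (NumberFormatException ignored) {
				// not a sample line after all
			}
		}
		return count == 0 ? Double.NaN : (double)sum / count;
	}
}
```

Feed it the lines you redirected from adb into a file and compare the numbers for your app and the browser over the same kind of interaction.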

What reduces CPU usage? Executing less code per frame. SpriteBatch and DecalBatch perform vertex generation on the CPU as it is faster to submit a CPU crafted vertex array to the GPU containing all sprites/decals than drawing individual (sub-)meshes and resetting transformation matrices each frame. To bring my CPU usage down, i have to reduce the number of sprites/decals i render per pass. This can be easily achieved via simple culling. Let’s see what culling decals means for CPU usage:


27288 0 20% S 11 449036K 36932K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 36944K fg app_126 com.badlogic.reddittv
27288 1 20% S 11 449036K 37000K fg app_126 com.badlogic.reddittv
27288 0 19% S 11 449036K 37008K fg app_126 com.badlogic.reddittv
27288 0 20% S 11 449036K 37012K fg app_126 com.badlogic.reddittv

Well, that didn’t do much. Out of 50 decals in total, i only render 15, yet the CPU usage stays pretty much the same. It’s no surprise really, as 50 decals aren’t that big of a deal in terms of calculations, especially if they don’t change. All it amounts to is copying vertices to the GPU, binding the shader with the respective textures & matrices and issuing the drawing command. Note that that would change a bit if the decals were animated, as they are during transitions. Still, transforming the 4 vertices of each of the 50 decals is really not a big deal.
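To give a feeling for how little work that is, here's a simplified model of CPU-side vertex generation for quads. It is not SpriteBatch's or DecalBatch's actual code: QuadBatchModel, the vertex layout (x, y, u, v) and the method names are assumptions made for the sketch.

```java
/** Simplified, hypothetical model of CPU-side vertex generation; not SpriteBatch's actual code. */
final class QuadBatchModel {
	static final int FLOATS_PER_VERTEX = 4; // x, y, u, v (a real batch also stores a colour)
	static final int FLOATS_PER_QUAD = 4 * FLOATS_PER_VERTEX;

	/** Writes the four corners of an axis-aligned quad at `idx` and returns the next write index. */
	static int putQuad (float[] vertices, int idx, float x, float y, float width, float height) {
		idx = putVertex(vertices, idx, x, y, 0, 1); // bottom left
		idx = putVertex(vertices, idx, x, y + height, 0, 0); // top left
		idx = putVertex(vertices, idx, x + width, y + height, 1, 0); // top right
		idx = putVertex(vertices, idx, x + width, y, 1, 1); // bottom right
		return idx;
	}

	private static int putVertex (float[] vertices, int idx, float x, float y, float u, float v) {
		vertices[idx++] = x;
		vertices[idx++] = y;
		vertices[idx++] = u;
		vertices[idx++] = v;
		return idx;
	}
}
```

With this layout, 50 quads come to 50 * 16 = 800 float writes per frame, which is why culling 35 of them barely registers in top.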

The next candidate is the text. As stated above, i use a BitmapFontCache for each tile label in conjunction with a SpriteBatch. This means that i’m also just copying vertices to the GPU, as the cache of a label won’t ever change after its construction. Let’s see what culling does for us:


27364 0 10% S 12 449504K 36268K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449504K 36276K fg app_126 com.badlogic.reddittv
27364 1 11% S 12 449696K 36452K fg app_126 com.badlogic.reddittv
27364 1 12% S 12 449696K 36504K fg app_126 com.badlogic.reddittv
27364 1 10% S 12 449696K 36504K fg app_126 com.badlogic.reddittv

Oi, not bad! We cut the CPU usage roughly in half, from about 20% to about 10%. You might wonder how much code was necessary for this. Here it is in all its glory:

private void drawOverlay () {
	// Keep the 2D overlay camera aligned with the scene camera.
	orthoCamera.position.x = camera.position.x;
	orthoCamera.position.y = camera.position.y;
	orthoCamera.update(false);
	sbatch.setProjectionMatrix(orthoCamera.combined);
	sbatch.begin();
	// Vertical extent of the visible area, in the same pixel coordinates as the overlays.
	float upperY = camera.position.y + Gdx.graphics.getHeight() * 0.5f;
	float lowerY = camera.position.y - Gdx.graphics.getHeight() * 0.5f;
	for (int i = 0; i < overlays.size; i++) {
		Overlay overlay = overlays.get(i);
		float halfHeight = overlay.getHeight() * 0.5f;
		// Cull overlays that are entirely below or above the visible area.
		if (overlay.getPosition().y + halfHeight < lowerY || overlay.getPosition().y - halfHeight > upperY) continue;
		overlay.draw(sbatch);
	}
	sbatch.end();
}

As you can see, the culling is stupidly simple. Everything is done in screen coordinates (yes, the 3D decals are actually using pixel coordinates, hurray for pixel perfect perspective projection :) ). We could also exploit the fact that the overlays Array has an order, from the top left overlay to the bottom right overlay. As soon as an overlay is beneath the bottom screen edge we could jump out of the loop and not bother culling the rest of the overlays, which we know would be invisible anyways. I added this optimization for a future feature where you can load more result pages. If you have hundreds of results it will make a difference to bail out early.
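The early bail-out could look roughly like this. It's a self-contained sketch, not the app's code: OverlayCulling and Box are stand-ins for the real overlay classes, and it assumes y points up, the array is sorted top to bottom, and all overlays have the same height.

```java
/** Hypothetical stand-ins for the overlay list; positions are centre coordinates, y pointing up. */
final class OverlayCulling {
	static final class Box {
		final float centerY, height;

		Box (float centerY, float height) {
			this.centerY = centerY;
			this.height = height;
		}
	}

	/**
	 * Counts the overlays that would be drawn. Assumes `boxes` is sorted top to bottom
	 * (non-increasing centerY) and that all boxes share one height, so the first box that
	 * lies entirely below the view means every later one does too.
	 */
	static int countVisible (Box[] boxes, float cameraY, float viewportHeight) {
		float upperY = cameraY + viewportHeight * 0.5f;
		float lowerY = cameraY - viewportHeight * 0.5f;
		int drawn = 0;
		for (Box box : boxes) {
			float half = box.height * 0.5f;
			if (box.centerY + half < lowerY) break; // below the bottom edge: bail out early
			if (box.centerY - half > upperY) continue; // above the top edge: skip, keep scanning
			drawn++; // the real loop would call overlay.draw(sbatch) here
		}
		return drawn;
	}
}
```

If the overlays can have different heights, the break is only safe when the list is sorted by top edge rather than by centre.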

We are down from 20% to 10% with only 4 lines of code, not bad. Note that before the culling the application already ran at 60fps. If you only use that as your metric, you’ll miss battery saving opportunities!

Could we do better? The overlays could actually be put into a mesh, as their world position and orientation do not change (the camera moves, not the overlays). This would get rid of the copying of vertex data to the GPU each frame, which has quite a bit of impact on the CPU usage if there’s a lot of text. I did that, and it seems to help a tiny little bit:


27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 8% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv
27467 1 9% S 12 452476K 46728K fg app_126 com.badlogic.reddittv

Since i essentially have a single draw call (indexing into a mesh, starting at the first glyph of the first visible overlay, to the last glyph of the last visible overlay) there’s pretty much no CPU side code executed. It’s almost as if we don’t render anything at all :p.
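The only per-frame work left on the CPU is figuring out which slice of the mesh to draw. A minimal sketch of that bookkeeping, assuming each glyph is a quad made of two triangles (6 indices) and the glyphs of all overlays sit back to back in the index buffer. GlyphRange and its parameters are hypothetical names, not the app's code.

```java
/** Hypothetical helper: maps a range of visible overlays to a range of indices in one big mesh. */
final class GlyphRange {
	static final int INDICES_PER_GLYPH = 6; // assumes two triangles per glyph quad

	/**
	 * @param glyphCounts number of glyphs in each overlay, in mesh order
	 * @param firstVisible index of the first visible overlay (inclusive)
	 * @param lastVisible index of the last visible overlay (inclusive)
	 * @return {offset, count} measured in indices, for a ranged draw call
	 */
	static int[] indexRange (int[] glyphCounts, int firstVisible, int lastVisible) {
		int offset = 0;
		for (int i = 0; i < firstVisible; i++)
			offset += glyphCounts[i] * INDICES_PER_GLYPH;
		int count = 0;
		for (int i = firstVisible; i <= lastVisible; i++)
			count += glyphCounts[i] * INDICES_PER_GLYPH;
		return new int[] {offset, count};
	}
}
```

The resulting pair is what you'd hand to a ranged draw call on the mesh. With hundreds of overlays you'd probably precompute the prefix sums once instead of looping every frame.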

What else could we do? Once there’s nothing animating anymore, we could draw our scene to an FBO, so we can render that in a single drawcall for as long as there’s nothing changing (user input, new animations). This is highly dependent on the application’s design of course, and might not be worth the often horribly big effort. Luckily my application lends itself to a pattern like this, games usually don’t fit that bill well. Apart from boring card games that is.
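The control flow for that is just a dirty flag. Here's a sketch with the actual GL work abstracted into two callbacks, so it says nothing about how the FBO is created or drawn. CachedScene and both Runnable names are made up for illustration.

```java
/** Hypothetical sketch of the "draw once, reuse until dirty" pattern; the FBO itself is abstracted away. */
final class CachedScene {
	private final Runnable renderSceneToCache; // expensive: would bind the FBO and draw everything
	private final Runnable drawCache; // cheap: would draw the FBO's texture as a single quad
	private boolean dirty = true; // nothing has been cached yet

	CachedScene (Runnable renderSceneToCache, Runnable drawCache) {
		this.renderSceneToCache = renderSceneToCache;
		this.drawCache = drawCache;
	}

	/** Call on user input, when an animation starts or advances, when new content arrives, etc. */
	void markDirty () {
		dirty = true;
	}

	/** Call once per frame from the rendering thread. */
	void render () {
		if (dirty) {
			renderSceneToCache.run();
			dirty = false;
		}
		drawCache.run();
	}
}
```

While something animates every frame this costs an extra pass, so a real version would probably bypass the cache during animations and only switch to it once things have settled.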

For reference, here’s the CPU usage for the app without clearing the screen, but performing all logic and input processing:


27544 1 2% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 1% S 13 451728K 43600K fg app_126 com.badlogic.reddittv
27544 1 2% S 13 451728K 43608K fg app_126 com.badlogic.reddittv

This indicates that there’s an opportunity for improvement with the FBO approach i just described. In any case, i can get the app’s CPU and thus battery consumption down to a level where it can be competitive with the browser. In the usual usage scenario, where you do a lot of panning, my app already outperforms the browser. Granted, the amount of work the browser has to do to redraw dirty regions is huge compared to what my simple app does.

Take aways:

  • Care for your CPU consumption, even if your app runs at 60fps
  • Use top or similar tools to monitor the CPU usage of your app during its lifetime
  • Cache computations, e.g. use SpriteCache, BitmapFontCache or put things into a static Mesh to get rid of data transfer completely
  • Culling, broad phase and narrow phase
  • If your app’s architecture allows it, render the scene to an FBO once nothing is changing anymore, and draw the FBO for as long as nothing changes.

Wow, that got longer than i planned. Do you guys still read this? Shall i continue with these entries again? Been a while, not sure if there’s still an audience.

EGL Context loss and GLES 2.0 performance

I just finished off a couple of new improvements.

  • If your app runs on a device with Android 3.0+, the EGL context will be preserved when there’s not too much resource pressure from other apps. You don’t have to do anything special.
  • SpriteBatch/DecalBatch and all things that use those funky classes (Stage, etc.) got a performance boost when using GLES 2.0. On my test devices, GLES 2.0 is now always faster than GLES 1.x.

Those improvements were added as i discovered a few performance issues with an app i’m working on. There might be more to come :D

gdx-audio

I just finished off gdx-audio, our newest extension. Features:

  • Decoders for mp3, ogg Vorbis and wav, using Mpg123, Xiph Tremor
  • KissFFT and Java FFT by Damien Di Fede for comparison
  • SoundTouch for pitch shifting, time stretching and playback rate modification

To use the extension add the gdx-audio.jar and gdx-audio-natives.jar to your desktop project. For your android project add the gdx-audio.jar and copy the libgdx-audio.so files to your libs/armeabi and libs/armeabi-v7a folders.

For usage examples see:

  • Mpg123Test, shows how to decode an mp3 with the Mpg123Decoder class
  • VorbisTest, shows how to decode an ogg with the VorbisDecoder class
  • WavTest, shows how to decode a wav with the WavDecoder class
  • SoundTouchTest, shows how to apply pitch shifting to a PCM stream

Caveat: the vorbis and mp3 decoders can only decode files stored on the external storage. I might be able to work around that limitation in the future. For most practical purposes it shouldn’t be too limiting.

gdx-jnigen: a stupid idea that might just work

Since i’m too lazy to type, i made a demo video. All the native code in libgdx uses this now. I explain the reasons in the video. Use at your own risk. Also, sorry for the crappy audio, my headset died.

edit: as Riven kindly pointed out, there’s a bug in the add method. Look at the offset, now back to me, now back at numElements, now back to me. Sadly, i never learned how to iterate over arrays given offsets :D
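For the record, this is the general shape of iterating over a sub-range given an offset and a count. It's a minimal sketch, not the code from the video: ArrayOps and the exact signature of add are assumptions. A typical way to get this wrong is to use numElements alone as the upper bound instead of offset + numElements.

```java
/** Hypothetical illustration of offset-based iteration; not the code shown in the video. */
final class ArrayOps {
	/** Adds `value` to `numElements` entries of `data`, starting at index `offset`. */
	static void add (float[] data, int offset, int numElements, float value) {
		if (offset < 0 || numElements < 0 || offset > data.length - numElements)
			throw new IndexOutOfBoundsException("offset=" + offset + ", numElements=" + numElements);
		// The upper bound is offset + numElements, not numElements on its own.
		for (int i = offset; i < offset + numElements; i++)
			data[i] += value;
	}
}
```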

Beginning Android 4 Games Development

preface: this is solely my view of things, Robert has nothing to do with this post.

http://www.amazon.com/Beginning-Android-Development-Apress-ebook/product-reviews/B006LPJXZ6/ref=sr_1_3_cm_cr_acr_txt?ie=UTF8&showViewpoints=1

So, i was asked by my publisher to write a second edition of “Beginning Android Games”, updated to ICS. At the time i was asked, ICS was not out yet, there were also just rumours and nobody really knew when it would drop. On top of that, i didn’t have the intent of writing a second edition, as i had just finished the first one and was loaded with a ton of other work that had to take priority.

I declined the offer. I was informed that they would hire another person (any other person really) to do the job and that i would lose my authorship (that is, my name wouldn’t be on the book even if it was 100% my writing). The contract i signed for the first edition gives the publisher the full right to do that, for better or worse. For obvious reasons i didn’t want this to happen. So i asked Robert Green from Battery Powered Games if he wanted to take over. I could take on the part of a technical reviewer and he’d write the additions.

The end result is an updated version of the original book, with bug and typo fixes and a few additions concerning Honeycomb and ICS. I did not invest any time apart from going over Robert’s changes.

If anything, it should have been called “Beginning Android Games, Second Edition”. Putting 4 in the title suggests it’s full of ICS-related material.

I can not add more at this point due to legal reasons. Suffice it to say that i’m not happy with it either and as with any purchase, you have to evaluate whether it’s worth it for you or not.

Here’s the second edition’s TOC on Amazon for your convenience.

Update to Lwjgl 2.8.2

I just updated libgdx to use Lwjgl 2.8.2 on the desktop. Please test it on your favorite platform and let me know of any problems. I tested it on win32/64 and lin32/64 (Ubuntu in VirtualBox with HW acceleration enabled) without any kind of problems.

The new version also allows us to have the Lwjgl window be resizable. To enable this feature just set the respective field in LwjglApplicationConfiguration to true:

LwjglApplicationConfiguration config = new LwjglApplicationConfiguration();
config.resizable = true;
new LwjglApplication(new MyListener(), config);

Nightlies Moved!

I moved everything over to our own servers now. I pay 40€/month for all of this. Yes, i complain :D

In any case, you can find the latest nightlies here: http://libgdx.badlogicgames.com/nightlies

New Build System

I reworked the build system over the last couple of days. It’s now composed of the following:

  • build.xml: the master Ant build script. Responsible for building the core, all backends, extensions and docs
  • build-template.xml: a template Ant script used to build all projects. Invoked by the master build script with appropriate parameters for a project’s classpath, jars that should go into the output jar and so on.

That’s about it. As a result, the release/nightly builds now have the following structure:


docs/
extensions/
    armeabi/
    armeabi-v7a/
    sources/
        gdx-$extensionname-sources.jar
    gdx-$extensionname.jar
    gdx-$extensionname-natives.jar
armeabi/
armeabi-v7a/
sources/
    gdx-$projectname-sources.jar
gdx-$projectname.jar
gdx-$projectname-natives.jar

The core and backend jars are in the root folder of the distribution. The armeabi/ and armeabi-v7a/ folders contain the natives for Android, the gdx-$projectname-natives.jar contains the natives for the respective desktop project.

The extensions folder has the same setup for extensions (jars + desktop natives jars + arm binaries).

In other news: i removed the audio analysis and decoder classes from the core, they are now in gdx-audio. I’ll have those finished by the end of the day, including the mpg123 decoder. Stb TrueType has been moved to its own extension as well (gdx-stb-truetype).

You can now easily build your own distribution by invoking:


ant -f build-new.xml

This will create a dist/ directory containing all the jars and natives for both core and extension projects as outlined above. Only requirement: ant needs to be on your $PATH, JAVA_HOME needs to point to your JDK installation.

You can also build individual projects via:


ant -f build-new.xml gdx-$projectname

More things to come :D

Oh god, it’s 6am and…

comments should work again!

i hate writing JNI bindings, e.g. for bullet. So i spent the last couple of hours on this:

I’ll explain it tomorrow. Not sure what to call it, abomination is probably the most fitting.