Matrix4 goodness in libgdx

I worked on the Matrix4 class yesterday and thought I’d share what has been improved. First of all, I reworked some of the Java methods, namely Matrix4.inv(), which did more divisions than necessary. The net result is a slightly faster Matrix4.inv() method, who’d have thought.
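The post doesn't say which divisions were removed, so the following is only a guess at the general idea: a common way to cut divisions in an inverse is to compute the reciprocal of the determinant once and multiply every cofactor by it. To keep it short, this sketch uses a 2×2 column-major matrix; the class and method names (`InverseSketch`, `inv2x2`) are made up for illustration and are not part of libgdx.

```java
/** Illustration only: a hypothetical helper, not libgdx code. */
final class InverseSketch {
    private InverseSketch() {}

    /**
     * Inverts a 2x2 matrix stored column-major as {m00, m10, m01, m11}, in place.
     * Returns false (leaving m untouched) if the matrix is singular.
     */
    static boolean inv2x2(float[] m) {
        float a = m[0], b = m[1], c = m[2], d = m[3];
        float det = a * d - c * b;
        if (det == 0f) {
            return false;
        }
        // One division up front instead of one division per element.
        float invDet = 1f / det;
        m[0] = d * invDet;
        m[1] = -b * invDet;
        m[2] = -c * invDet;
        m[3] = a * invDet;
        return true;
    }

    public static void main(String[] args) {
        float[] m = {1f, 2f, 3f, 4f};
        boolean ok = inv2x2(m);
        System.out.println(ok + " " + java.util.Arrays.toString(m));
    }
}
```

The same pattern scales to the 4×4 case: sixteen multiplications by `invDet` replace sixteen divisions by `det`.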

The bigger addition is the shiny new native-code-based Matrix4 static methods!

We have native methods for matrix/matrix multiplication, matrix/vector(s) multiplication, matrix/vector(s) multiplication with w-division, matrix/vector(s) multiplication using only the upper 3×3 sub-matrix, and inverse and determinant calculation. The methods are static and work directly on float[] arrays to trim down the work necessary on the JNI side (fetching classes/methods/fields in JNI is a pain and slow). The methods work exactly like their non-static Java counterparts, but with benefits. I will eventually replace the Java methods with these suckers so you don’t have to decide which version to use (they produce the exact same result, unless you use strictfp…).
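To make the "static methods on float[] arrays" idea concrete, here is a rough pure-Java sketch of two of those operations: matrix/matrix multiplication and bulk vector multiplication with w-division. It is not the native implementation; the class name `Matrix4Sketch`, the method names and the parameter lists are assumptions for illustration, and it assumes a column-major 16-float layout as OpenGL expects.

```java
/** Illustration only: a hypothetical pure-Java stand-in, not libgdx's native code. */
final class Matrix4Sketch {
    private Matrix4Sketch() {}

    /** Computes a * b and stores the result in a. Both are column-major float[16]. */
    static void mul(float[] a, float[] b) {
        float[] tmp = new float[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    // Element (row, col) lives at index col * 4 + row.
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                tmp[col * 4 + row] = sum;
            }
        }
        System.arraycopy(tmp, 0, a, 0, 16);
    }

    /**
     * Transforms numVecs xyz vectors in place, treating each as (x, y, z, 1)
     * and dividing the result by w. Vectors start at offset and are stride floats apart.
     */
    static void prj(float[] mat, float[] vecs, int offset, int numVecs, int stride) {
        for (int i = 0; i < numVecs; i++) {
            int p = offset + i * stride;
            float x = vecs[p], y = vecs[p + 1], z = vecs[p + 2];
            float invW = 1f / (x * mat[3] + y * mat[7] + z * mat[11] + mat[15]);
            vecs[p]     = (x * mat[0] + y * mat[4] + z * mat[8]  + mat[12]) * invW;
            vecs[p + 1] = (x * mat[1] + y * mat[5] + z * mat[9]  + mat[13]) * invW;
            vecs[p + 2] = (x * mat[2] + y * mat[6] + z * mat[10] + mat[14]) * invW;
        }
    }
}
```

Working on plain arrays means a native version only has to pin the array and run the loop; it never needs to look up fields or methods on a Java object.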

So how well do those methods perform? For this I set up a little micro-benchmark, without warm-up (’cause I’m lazy).
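The benchmark code itself isn't shown here, so the following is only a minimal sketch of what such a timing harness could look like: time a fixed number of iterations with `System.nanoTime()`, repeat the run a few times and average, with no warm-up phase. The names `MicroBench`, `timeNanos` and `averageNanos` are hypothetical, and the workload in `main` is a placeholder.

```java
/** Illustration only: a hypothetical timing harness, not the benchmark used for the numbers below. */
final class MicroBench {
    private MicroBench() {}

    /** Runs the task `iterations` times and returns the elapsed wall-clock nanoseconds. */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    /** Repeats the timed run `runs` times and returns the mean in nanoseconds. No warm-up. */
    static double averageNanos(Runnable task, int iterations, int runs) {
        long total = 0;
        for (int r = 0; r < runs; r++) {
            total += timeNanos(task, iterations);
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        final float[] vec = {1f, 2f, 3f};
        // Placeholder workload; swap in the Java and native matrix calls to compare them.
        Runnable work = () -> {
            vec[0] = vec[0] * 0.5f + 1f;
        };
        double avg = averageNanos(work, 100_000, 10);
        System.out.println("average ns per run of 100000 iterations: " + avg);
    }
}
```

Skipping the warm-up mostly matters on JIT-enabled VMs, where the first iterations run interpreted and drag the Java numbers down.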

Here are the results on my 4 test devices:

Hero (1.5)

Matrix/matrix multiplication is ~2x as fast, bulk matrix/vector multiplication is also about 2x as fast. Taking the inverse of a matrix is 3x as fast. Not bad, but also not mind-blowing. The Hero has an MSM720xa chip which does not sport an FPU, so that kind of explains it. Kudos to the 1.5 Dalvik VM I guess 🙂

Droid (2.1.1)

Matrix/matrix multiplication is 5x as fast! Bulk matrix/vector multiplication is 7x as fast and taking the inverse is roughly 10x as fast as the pure Java version. Not bad at all! The Droid has an FPU, so the benefit is clearly visible. Android 2.1 is still interpreting the (dex) bytecode so floating point operations are software-emulated. Still a pretty good result for the Dalvik VM, I have to say.

HTC Desire HD (2.2)

Matrix/matrix multiplication is 2x as fast, bulk matrix/vector multiplication is 4x as fast and taking the inverse is 9x as fast. The JIT introduced in 2.2 does some great things and lowers the difference between the native and Java version of matrix/matrix and matrix/vector multiplication. The inverse is still a lot faster in native code though, even with the additional JNI overhead.

Nexus One (2.3.3)

Matrix/matrix multiplications are 2x faster, bulk matrix/vector multiplications are ~9x faster. That’s rather surprising given that the 2.2 JIT seems to perform better. I have no clue what causes this difference, maybe the test case is not so well suited for the 2.3 JIT. Matrix inversion is also 9x faster, just like in the 2.2 case.

Conclusion:
As with all micro-benchmarks, this one has to be taken with a grain of salt. I repeated each run 10 times and averaged the outcome. While the Dalvik JIT produces really great results given its youth, it still pays off to write some native code in some cases. I’m a little surprised about the matrix/vector result on 2.3, I guess I hit a worst-case scenario there.

I’ll replace the Java methods with the native methods asap. If I find the time I might add VFP and NEON support at some point. Gotta figure out how to keep those methods in a single shared library for armeabi-v7a.

3 thoughts on “Matrix4 goodness in libgdx”

  1. I love this kind of optimization! Even if it’s only a tenth faster, it pays off in many cases when game logic gets more complex. Good job!

  2. Hello!

    I have a question. If I do all my rendering in Java it’s very slow (I am using VBOs, culling, blending, etc.).

    I get 30 FPS on a live wallpaper (Samsung Galaxy S i9000) that has only 80 textured vertices (4 different textures, max 128×128 pixels).

    On my HTC Legend it gets 10-20 FPS, which is very, very slow :(

    How can I speed up my rendering?
    If I use native OpenGL calls from C, will it be faster?
    By how much?

    Thanks for the reply, Lacroix
