Follow up on Java vs C for MD5 vertex skinning

So I was really puzzled by the results I got earlier today and double-checked everything. The function call and array locking overhead for the JNI function is really insignificant. To make sure that -O2 doesn’t kill the array locking when I comment out the actual calculation code, I also checked the generated assembler output. Here it is:

That’s GCC’s way of telling me two things: 1) I’m smarter than you 2) I inlined your function calls, lol :p. I didn’t just check the assembler output of the array-locking-only native method but also the assembler code for the full-blown version. To get myself comfortable with x86 assembler and AT&T syntax, I first practiced decoding the assembler output of the unoptimized version. Here it is with some annotations. Compare it to the C code of the last article:

Here’s the stack frame layout:

I usually visualize the stack bottom-up, as that’s also the way local variables and arguments get laid out by gcc (at least relative to the base pointer…). That took me like an hour to decode and I was pretty proud of myself that I still “have it”. What’s immediately clear is that there’s a shitload of loads and stores to and from the stack in the unoptimized version. One could alter the array indexing in the C code to get rid of some of them. Then I turned on -O2, feeling confident that I would see what’s wrong in the version that’s actually used.
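To make the remark about array indexing a bit more concrete, here is a minimal sketch of the kind of change I mean. It is not the code from the last article: the flat layout (4 floats per weight: x, y, z, bias; 3 floats per output vertex) and both function names are made up for illustration. The first version recomputes every index on each access, the second walks a pointer and accumulates in locals so the output array is only written once.

```c
#include <stddef.h>

/* Hypothetical layout, assumed for this sketch only:
 * weights: 4 floats per weight (x, y, z, bias)
 * out:     3 floats per vertex (x, y, z) */

/* Indexed version: each access recomputes base + offset and
 * reads/writes the output array inside the loop. */
void sum_weights_indexed(const float *weights, size_t first, size_t count,
                         float *out, size_t v)
{
    out[v * 3 + 0] = 0.0f;
    out[v * 3 + 1] = 0.0f;
    out[v * 3 + 2] = 0.0f;
    for (size_t i = 0; i < count; i++) {
        out[v * 3 + 0] += weights[(first + i) * 4 + 0] * weights[(first + i) * 4 + 3];
        out[v * 3 + 1] += weights[(first + i) * 4 + 1] * weights[(first + i) * 4 + 3];
        out[v * 3 + 2] += weights[(first + i) * 4 + 2] * weights[(first + i) * 4 + 3];
    }
}

/* Pointer version: the index arithmetic is done once up front,
 * the sums live in locals and are stored once after the loop. */
void sum_weights_pointer(const float *weights, size_t first, size_t count,
                         float *out, size_t v)
{
    const float *w = weights + first * 4;
    float x = 0.0f, y = 0.0f, z = 0.0f;
    for (size_t i = 0; i < count; i++, w += 4) {
        float bias = w[3];
        x += w[0] * bias;
        y += w[1] * bias;
        z += w[2] * bias;
    }
    float *o = out + v * 3;
    o[0] = x;
    o[1] = y;
    o[2] = z;
}
```

Both functions compute the same sums. With optimization turned on, gcc will most likely do this kind of rewriting on its own, so the difference should mostly show up in the unoptimized output, and even there it is something to measure rather than assume.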

Now, I do not pretend that I can decipher this. I can see the basic program structure reflected, but keeping track of where each local variable and argument is stored is impossible for me.

So I spent an hour diving into x86 code just to discover that I didn’t discover anything 🙂

UPDATE: I think I figured it out. It seems like the Sun HotSpot VM JIT compiles to SSE code in some circumstances, and my code seems to trigger this. Find out more at Yes, they do that since 1.4.2! The gcc code does not use any SIMD, so it is at a disadvantage. Awesome sauce! The C version now takes 9 seconds with SSE enabled on my netbook (10,000 calls, the netbook is slow). With the 32-bit Client JVM it takes 11 seconds in Java. Sun’s JVM is indeed nice. Gotta test out JRockit and the IBM VM at some time.
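For anyone who wants to try the SSE comparison themselves, here is a rough sketch of how one could set it up. The flags in the comment are real GCC options, but which ones were used for the numbers above is not recorded here, so treat the build lines as an assumption. `scale_add` and `time_calls` are hypothetical stand-ins, not the skinning code from the last article.

```c
#include <stddef.h>
#include <time.h>

/* Possible build lines on x86 (assumed, not the original setup):
 *   gcc -O2 -mfpmath=387 bench.c                          x87 floating point
 *   gcc -O2 -msse2 -mfpmath=sse bench.c                   SSE floating point
 *   gcc -O2 -msse2 -mfpmath=sse -ftree-vectorize bench.c  also request auto-vectorization
 */

/* Hypothetical float-heavy loop standing in for the skinning code:
 * out[i] = a[i] * s + b[i] */
void scale_add(const float *a, const float *b, float s, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * s + b[i];
    }
}

/* Runs scale_add `calls` times and returns the CPU time used, in seconds. */
double time_calls(int calls, const float *a, const float *b, float *out, size_t n)
{
    volatile float sink = 0.0f; /* keeps the compiler from dropping the work */
    clock_t start = clock();
    for (int c = 0; c < calls; c++) {
        scale_add(a, b, 2.0f, out, n);
        if (n > 0) {
            sink += out[0];
        }
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Calling `time_calls(10000, ...)` once per build and comparing the returned seconds gives a first impression. Checking the assembler output with `gcc -S` is still the only way to be sure which instructions the compiler actually emitted.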
