Jack, or what i’ve been up to lately

I’ve not been super mega active in the libgdx repository over the last 2-3 weeks, apart from fixing issues. The reason is a new side-project i’m working on called Jack (among other things).

Jack is a silly attempt to create a Java bytecode to C++ compiler. Yes, crazy, stupid, futile. But let a man have some fun in his free time :)

I want to document this effort, even if it fails horribly (which is very likely). I’ll split this up into a couple of posts which detail how i’m trying to solve specific issues. Let’s start with something more important, motivation. I’m gonna ramble on about a few things, feel free to skip to the last header :)

The Failure of the JVM (for a specific use case :))

small edit: i’m aware of this. I’ll believe it when i see it.
The Sun JVM is a very cool piece of software. It’s pretty much a given that a one-man show can not even start to compete with the many many man years that have been invested in Sun’s VM (yeah, i still like to pretend it’s Sun…). However…

The JVM is not the portable platform it once was/pretended to be. For whatever reason, Oracle has completely missed the mobile revolution in 2007, starting with the iPhone. The same is true for the console market, where indies can publish on Microsoft’s, Sony’s and Nintendo’s consoles. Many would argue that using a VM based language on any of those systems is a rather stupid idea.

However, one only has to look at Unity to see that that’s not necessarily the case. Unity uses C/C++ internally to implement its core functionality, and exposes it to a range of scripting languages, the most commonly used one being C#. The C# code is run on Mono, a free implementation of the .NET platform. Unity is arguably very popular among hobbyists and indies, and even big game studios adopt it. Nobody seems to be afraid of the fact that their games (partially) run on a VM, even on resource limited mobile devices.

Why not ditch Java/the JVM?

First i’d like to state that i’m not a particular Java fan. I like it for what it is, an unsophisticated language that allows large teams with a variety of skill sets to work together. From a programmer’s stand-point there are better alternatives on the JVM, but Java is the lowest common denominator, with an extremely low barrier of entry for pretty much anyone.

I could have written libgdx in Scala, Clojure, Kotlin (if it were available back then), JRuby, or other, more esoteric JVM languages. But by going with Java, i knew i’d have an easier time on Android, as other JVM languages are a pain to get running on there. I also knew that it’d be accessible to a LOT more people, especially hobbyists or students that just start to learn programming in school or university in Java (not necessarily the best language to start with in my opinion, but the industry needs Java monkeys i guess…). By using Java i also make sure that interoperability with other JVM languages is simple. Had i written it in idiomatic Scala, i’d have severely limited the target audience and potential platforms it could run on (looking at you GWT).

That explains why libgdx is written in Java. It does not explain why it’s not written in C#, Lua or C/C++. My answer to that would be tooling. While Visual Studio and MonoDevelop are workable IDEs for C#, i’m not particularly fond of them. Compared to IDEs from the JVM world (Eclipse, Intellij Idea), they feel rather lacking (yes, i know and use Resharper). Similar things are true for C/C++, and don’t even get me started on XCode. And for all you Vim/Emacs guys, please go back to changing your terminal background color to transparent black. Just kidding. Those two beasts just don’t fit my workflow.

One might think that the strong ecosystem of Java, with its thousands of third party libraries would be an argument for Java. I think that’s only partially relevant in this context. Most of them are hard to use on Android, and getting anything complex to work on GWT is pretty much futile. However, many games have server backends, and being able to share code between your client and server is pretty valuable imo (depending on the complexity of your protocol of course).

.NET/Mono is comparable in that regard, with one simple drawback: it’s not free. Running your C#/MonoGame/whathaveyou game on iOS costs you 400$, running it on Android another 400$. For a small team/company that’s of course totally affordable, but for hobbyists and youngsters that might be too much.

I got into programming due to games. I started out with QBasic, arguably a horrible horrible language. But it was good enough to keep me interested and even forced me to learn 386/387 assembler. And all of that was available to me for free (minus the machine of course). I want children and teenagers to be able to code for their mobile devices and show things off to their friends at school. And i want that experience to be free. I’m not saying that libgdx is the best option for them to start out, though compared to what we had to endure during the good old DOS days, it’s a walk in the park :)

Concluding this section: Java’s low barrier of entry, accessibility, free tooling and ecosystem make it a nice choice in my opinion.

Searching for alternatives

So, Sun’s JVM isn’t an option on the platforms i like to target (iOS, consoles, maybe windows phone). There must be alternatives, right? There are, but all come with a bunch of caveats.

The first alternative that comes to mind is Avian by Joel Dice. It’s a fantastic effort in my opinion, and the amount of great engineering that went into it is simply amazing. On top of that, Joel is a really nice guy, and i had the pleasure of harassing him on his mailing list when i tried to get it running on iOS. Avian can compile Java bytecode to machine code, either at runtime via JIT compilation, or ahead of time, via a fancy trick that saves out a memory snapshot that’s converted to an object file against which you can link. The latter makes it possible to run Avian on iOS, which doesn’t allow JIT compilation. Memory pages can’t be made executable, something that other things like LuaJIT or browsers and their Javascript VMs aren’t too fond of either. Thanks Apple!

It also has a minimal runtime library implementation. I even used parts of that in our GWT backend. Alternatively one can link against the OpenJDK runtime or GNU classpath. It has a few drawbacks though. The JIT is limited: on ARM it used software floating point emulation the last time i checked (i think there’s work on hardware floats now). It’s rather hard to debug, and doing ahead of time compilation takes quite a bit of time. So much time that quickly iterating is hard.

Another “VM” i looked at was XMLVM. They compile Java bytecode to XML, then use XSL transforms to compile the XML to another target language, e.g. Objective-C or Javascript. I’m not commenting on the use of XML/XSL here, the craziness should be obvious. What irked me a bit about XMLVM when i looked at it first was the lack of consideration of any Java language features and JVM semantics. The first iteration used auto-pools (read: reference counting) instead of real garbage collection. It also used to translate the Java bytecode, which is basically “running” on a stack machine, directly to Objective-C. That means tons of loads and stores. Both things seem to have been fixed in the latest versions, they now use Dex (register machine bytecode used on Android) as well as the Boehm GC (conservative garbage collector). On the runtime library side they now seem to use Harmony. While there are instructions on how to use XMLVM, it’s really hard to get a feeling for which language features are implemented and to what extent (GC, exceptions, memory model, etc.). Eventually i decided that i don’t want to waste time on finding out by inspecting random segfaults. Overall, it’s an interesting/crazy approach and implementation, but i find it hard to build confidence in this tool.

Next, there is Mono. Yes, you read that correctly. Michael Bayne of ThreeRings, who’s also a contributor (the only one?) to PlayN, got IKVM running with MonoTouch after adding some serious magic. IKVM takes Java bytecode and translates it to .NET bytecode. The Java runtime library is emulated on top of the .NET library. I added a tiny bit of code to Michael’s IKVM fork that allows us to call into JNI code as well, and lo and behold, libgdx runs on iOS. The workflow is bearable (edit in Eclipse, switch to MonoDevelop and hit build/run), however, one can’t debug into Java code (maybe there’s a way, one can debug into system assemblies, just haven’t found a way to enable that for custom assemblies). This is a workable solution, but it comes at the price of 400$ for a MonoTouch license. Now, i think Xamarin, the makers of Mono and MonoTouch, deserve that money. But as i stated above, it’s a barrier of entry for quite a few folks. I like my tools to be free. Note: this is the most likely way for us to get libgdx running on iOS, so i’ve not given up on this, i’ll finish it asap. Only audio and some screen orientation work is left.

Then there are a few old efforts that can only run Java 1.4 but are fully functional otherwise, e.g. JCVM. I like it quite a bit and the code is pretty readable as well. However, getting this to work with newer Java versions and porting it to iOS is something i don’t want to try.

In summary: out of the available options, only IKVM + MonoTouch is working for me. However, i got a MonoTouch license. Not being able to debug the IKVM compiled Java code sucks as well.

Jack

Not satisfied with the available options, i started to ponder whether i should give writing a kind of JVM a try. Now, doing that is a massive undertaking, especially if you want to run Eclipse on that thing. But, as we saw with GWT, there are ways to cut corners and still have a viable solution. Also, JCVM showed me a few nice tricks that made it seem “easier” to tackle the problem. The first thing i did was define a few goals, a minimally viable product so to speak. The “VM” i’ll try to create should support the following things:

  • Portable to platforms where no JVM is available
  • Ahead of time compiled
  • Debuggable
  • JVM bytecode as input (1.6 compatible)
  • Preserve Java class/interface hierarchies
  • Minimal runtime library implementation so all of libgdx works
  • Garbage collection
  • Exceptions
  • Threading
  • JNI
  • Custom native interface to speed up writing the minimal classpath

Here are the non-goals:

  • Java memory model support. Explicit locking/message passing is a viable alternative.
  • Full Java runtime library support (AWT/Swing, all the EE crap)
  • Loading bytecode at runtime (would need at least an interpreter, making everything a lot more complicated)

These constraints allow me to cut a few corners, just like GWT can cut a few corners (which is not to say that GWT isn’t complex, it’s fucking complex). Next i had to decide on how to implement this.

I took some inspiration from JCVM, which compiles Java bytecode to C. Using C as the target means that a lot of the Java language features like inheritance/polymorphism have to be manually created. Given my time budget, i made C++ the target, as it will take care of vtables and exceptions for me (more on that in a later post). While not as portable (looking at you Visual C++), C++ is still a good target when one tries to port to multiple platforms.
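To make the "C++ takes care of vtables and exceptions" point a bit more concrete, here is a minimal hand-written sketch. It is not Jack's actual output; the `Shape`/`Rect`/`Square` hierarchy, the `ArithmeticException` stand-in and all function names are made up for illustration. Java virtual methods map to C++ virtual functions, and a Java try/catch maps to a C++ try/catch, so no manual dispatch tables or unwinding code are needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical Java source being mirrored:
//   abstract class Shape { abstract int area(); }
//   class Rect extends Shape { int w, h; int area() { return w * h; } }
//   class Square extends Rect { Square(int s) { super(s, s); } }
struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;  // abstract method -> pure virtual function
};

struct Rect : Shape {
    int w, h;
    Rect(int w, int h) : w(w), h(h) {}
    int area() const override { return w * h; }
};

struct Square : Rect {
    explicit Square(int s) : Rect(s, s) {}
};

// The call is dispatched through the compiler-generated vtable.
int area_of(const Shape& s) { return s.area(); }

// Stand-in for java.lang.ArithmeticException (illustrative only).
struct ArithmeticException {
    std::string message;
};

int divide(int a, int b) {
    if (b == 0) throw ArithmeticException{"/ by zero"};
    return a / b;
}

// Mirrors: try { return divide(a, b); } catch (ArithmeticException e) { return fallback; }
int divide_or(int a, int b, int fallback) {
    try {
        return divide(a, b);
    } catch (const ArithmeticException&) {
        return fallback;
    }
}

// Small usage example; call it from main().
void self_check() {
    assert(area_of(Square(3)) == 9);
    assert(divide_or(1, 0, -1) == -1);
}
```

With C as the target, the virtual call and the exception unwinding would each need hand-rolled machinery (function pointer tables, setjmp/longjmp or similar).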

Once i decided on the target, i researched how to best translate the Java bytecode to C++. The first issue was that the JVM instruction set is huge (200+ instructions). Also, Java bytecode targets a stack machine: all operands are pushed onto a stack, and operations act on elements on that stack. Translating that to C++ directly is neither super readable nor super performant (unless one employs some tricks).
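As an illustration of what a naive, instruction-by-instruction translation looks like, take the Java method `static int f(int a, int b, int c) { return a + b * c; }`. The sketch below is hand-written for this post, not generated by Jack or any other tool, and the name `f_stack` is made up. It models the operand stack explicitly, one C++ statement group per bytecode instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical naive translation of
//   static int f(int a, int b, int c) { return a + b * c; }
// whose bytecode is: iload_0, iload_1, iload_2, imul, iadd, ireturn.
// Java's wrap-around semantics on int overflow are ignored in this sketch.
int32_t f_stack(int32_t a, int32_t b, int32_t c) {
    int32_t stack[3];  // operand stack; this method never needs more than 3 slots
    int sp = 0;        // index of the next free slot

    stack[sp++] = a;   // iload_0
    stack[sp++] = b;   // iload_1
    stack[sp++] = c;   // iload_2

    // imul: pop two operands, push their product
    int32_t rhs = stack[--sp];
    int32_t lhs = stack[--sp];
    stack[sp++] = lhs * rhs;

    // iadd: pop two operands, push their sum
    rhs = stack[--sp];
    lhs = stack[--sp];
    stack[sp++] = lhs + rhs;

    assert(sp == 1);     // only the return value is left on the stack
    return stack[--sp];  // ireturn
}
```

Every value takes a round trip through the stack array, which is where all the extra loads and stores come from.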

Luckily, i found Soot (through JCVM), a Java bytecode analysis/optimization framework. It translates stack-based Java bytecode to another intermediate representation called Jimple, which is register-based. C/C++ compilers eat such code, which is basically 3-address code, for breakfast. Superfluous loads and stores can be easily eliminated. Jimple is also easier to read than bytecode.
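For comparison, here is a hand-written sketch of what 3-address code for the Java method `static int f(int a, int b, int c) { return a + b * c; }` might look like once expressed in C++. The Jimple-style statements in the comments are approximate and written from memory, not copied from Soot's output, and the names `f_tac` and `check_f_tac` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Each statement has at most one operator and works on named locals,
// so there is no operand stack left to simulate.
// Java's wrap-around semantics on int overflow are ignored in this sketch.
int32_t f_tac(int32_t i0, int32_t i1, int32_t i2) {
    int32_t t3 = i1 * i2;  // roughly: $i3 = i1 * i2
    int32_t t4 = i0 + t3;  // roughly: $i4 = i0 + $i3
    return t4;             // roughly: return $i4
}

// Small usage example; call it from main().
void check_f_tac() {
    assert(f_tac(1, 2, 3) == 7);
}
```

A C++ compiler can keep `t3` and `t4` in registers, which is much harder to guarantee for values shuffled through an explicit stack array.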

There are of course many details that need to be figured out when translating Jimple to C++ (class hierarchy, exceptions, threading, reflection, JNI, GC, etc.). That’s what i’m going to go into detail about in the next few blog posts.

Finally there was the question of what to use as the basis for the runtime library implementation. I’m currently using a version of Avian’s classpath implementation that i augmented with classes from Harmony when i worked on Avian/iOS.

And that’s the plan. Over the next few weeks i’ll see how far i can get with Jack. I already got quite a few things to work, and i have a pretty good idea of how to go about all the issues i could think of so far. Riven from JGO helped me resolve many conceptual issues and even wrote a garbage collector implementation that i’d like to integrate at some point. Big props to him.

You can follow Jack at Github, the README goes into detail about many things, the code is heavily commented and hopefully trivial to follow. I’m aware that a lot of this is very blue eyed, and i’m also aware that this is going to fail horribly. But as i said in the beginning, everybody needs an engaging hobby :)