It’s been quite a while since i last posted on this blog. I’m under constant stress doing a lot of stuff and can’t find my inner zen at the moment to sit down and do anything Android or game development related. So here’s a roundup of what i’ve been up to.
I’m currently working on something very interesting at work which i’ll try to explain in as little text as possible: Given free form text (e.g. a news article) we in the information extraction community want to extract useful data from it. Examples would be instances of locations, organisations or persons, which we call named entities. There are two ways to find such named entities: via a statistical machine learning approach or via a set of rules. The machine learning approach is (mostly) awesome if you happen to have a shitload of ground truth data (text with the named entities marked by a human annotator for example) to learn from. The rule-based approach is tedious to work with and it’s quite easy to come up with a mess of rules with all kinds of silly side-effects. Rules let you bootstrap your statistical learners though, so they are still very useful. Both types of extraction systems output something called annotations, which are basically segments within a text that encode some information like the name of a person. Other annotation types could be more fundamental, like tokens, which mark every word in a text and denote some features of those words, like their part-of-speech tag and so on.
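To make the idea of an annotation a bit more concrete, here is a minimal Python sketch. The `Annotation` class, its field names and the sample sentence are all made up for illustration; they are not taken from Gate, Jape or any other framework.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed segment of text: [start, end) character offsets plus features.

    Hypothetical structure, for illustration only.
    """
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)

    def covered_text(self, text: str) -> str:
        # Return the slice of the document this annotation spans.
        return text[self.start:self.end]


text = "Dr. Jane Smith visited Zurich."

# Fundamental annotations: one token per word, with a part-of-speech feature.
tokens = [
    Annotation("Token", 0, 3, {"pos": "NNP"}),
    Annotation("Token", 4, 8, {"pos": "NNP"}),
    Annotation("Token", 9, 14, {"pos": "NNP"}),
    Annotation("Token", 15, 22, {"pos": "VBD"}),
    Annotation("Token", 23, 29, {"pos": "NNP"}),
]

# Higher-level annotations: named entities spanning one or more tokens.
person = Annotation("Person", 0, 14)
location = Annotation("Location", 23, 29)
```

The point is only that named entities and tokens share one shape: a type, a span and a bag of features.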
Now, there are two ways to write a rule-based information extractor: the hard and the harder way. The harder way means coming up with certain patterns you want to match and turn into annotations, e.g. “(TITLE)? (GIVEN_NAME)? SURNAME”, a very simplistic pattern which will not work like this in the real world :p. Once you’ve formulated your patterns on paper you continue by writing them out as a form of hand-written regular expression matcher which operates on annotations instead of characters. Yes, very confusing. The thing is that writing those matchers by hand sucks. For character matching we have our beloved and funny looking regular expressions, which express the patterns to be matched in a very concise way. Why not create a regular expression matcher for annotations?
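For a taste of the harder way, here is a rough Python sketch of a hand-written matcher for that one toy pattern. It assumes the text has already been reduced to a list of annotation type names; the function names are invented for this example.

```python
def match_person(types, i):
    """Try to match (TITLE)? (GIVEN_NAME)? SURNAME starting at index i.

    Returns the end index (exclusive) on success, None otherwise.
    """
    j = i
    # Optional TITLE.
    if j < len(types) and types[j] == "TITLE":
        j += 1
    # Optional GIVEN_NAME.
    if j < len(types) and types[j] == "GIVEN_NAME":
        j += 1
    # Mandatory SURNAME.
    if j < len(types) and types[j] == "SURNAME":
        return j + 1
    return None


def find_persons(types):
    """Scan left to right and collect non-overlapping (start, end) matches."""
    spans = []
    i = 0
    while i < len(types):
        end = match_person(types, i)
        if end is None:
            i += 1
        else:
            spans.append((i, end))
            i = end
    return spans


types = ["TITLE", "GIVEN_NAME", "SURNAME", "OTHER", "SURNAME", "TITLE", "OTHER"]
spans = find_persons(types)
```

Every new pattern needs another function like `match_person`, and the simple greedy handling of the optional parts only works here because the three types are distinct. That is exactly the kind of code you don’t want to write by hand.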
Indeed that’s what others have thought before, and they came up with things like the Java Annotation Pattern Engine (Jape), which is part of Gate, or Sprout by the DFKI. The former is open source but tightly coupled to the Gate internals. The latter offers academic licenses for free but has a cost attached to it when you want to use it in a commercial project.
So what will any good engineer do in this case? Fucking reinvent the wheel! My task in the near future will be to implement a backend-independent version of Jape, completely open source with no strings attached, that anyone can use in their own information extraction framework. The parsing and creation of the AST is actually already complete (that part only took a few hours today) so i’m left with implementing the actual matcher based on the AST. For this i’ll gladly stand on the shoulders of giants. In this case the giant is called Thompson, who first described how to efficiently execute a regular expression matcher in O(nm) for an input of length n and a pattern of size m (yes, that’s worst case). Russ Cox wrote three awesome articles on the matter, explaining in (extremely well written) detail how one would go about it. He actually wrote the regular expression support for Google Code Search so the man must know what he’s talking about.
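Here is a small Python sketch of what a Thompson-style matcher over annotation types could look like. It is not the actual implementation and it does not use Jape syntax: the tuple-based AST (`"sym"`, `"seq"`, `"alt"`, `"opt"`, `"star"`) and all function names are assumptions made up for this example. The AST is compiled into an NFA with epsilon transitions, which is then simulated by tracking the set of active states, so each input symbol costs at most one pass over the states.

```python
from itertools import count


def compile_nfa(node):
    """Compile a (hypothetical) tuple AST into (start, accept, sym, eps)."""
    ids = count()
    sym = {}  # state -> (annotation type, next state)
    eps = {}  # state -> list of states reachable without consuming input

    def build(n):
        kind = n[0]
        if kind == "sym":
            s, a = next(ids), next(ids)
            sym[s] = (n[1], a)
            return s, a
        if kind == "seq":  # assumes at least one child
            first, prev_accept = build(n[1][0])
            for child in n[1][1:]:
                s, a = build(child)
                eps.setdefault(prev_accept, []).append(s)
                prev_accept = a
            return first, prev_accept
        if kind == "alt":
            s, a = next(ids), next(ids)
            for child in n[1]:
                cs, ca = build(child)
                eps.setdefault(s, []).append(cs)
                eps.setdefault(ca, []).append(a)
            return s, a
        if kind in ("opt", "star"):
            s, a = next(ids), next(ids)
            cs, ca = build(n[1])
            eps.setdefault(s, []).extend([cs, a])  # enter the child or skip it
            eps.setdefault(ca, []).append(a)
            if kind == "star":
                eps[ca].append(cs)  # loop back for another repetition
            return s, a
        raise ValueError("unknown node kind: %r" % (kind,))

    start, accept = build(node)
    return start, accept, sym, eps


def closure(states, eps):
    """All states reachable from `states` via epsilon transitions."""
    seen = set(states)
    stack = list(states)
    while stack:
        for t in eps.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def matches(nfa, types):
    """True if the whole sequence of annotation types matches the pattern."""
    start, accept, sym, eps = nfa
    current = closure({start}, eps)
    for t in types:
        stepped = {sym[s][1] for s in current if s in sym and sym[s][0] == t}
        current = closure(stepped, eps)
        if not current:
            return False
    return accept in current


# (TITLE)? (GIVEN_NAME)? SURNAME
pattern = ("seq", [("opt", ("sym", "TITLE")),
                   ("opt", ("sym", "GIVEN_NAME")),
                   ("sym", "SURNAME")])
nfa = compile_nfa(pattern)
```

Calling `matches(nfa, ["TITLE", "SURNAME"])` accepts, while `matches(nfa, ["GIVEN_NAME"])` does not. A real engine would also have to find matches inside a longer stream and test annotation features, not just type names.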
Speaking of which, i was contacted by a “big bad company” and asked if i’d be interested in working for them. “Of course!” i replied and sent over my CV and project portfolio. A nice lady then called me on the phone and asked me questions about my unfinished studies and whether i could imagine living in Zürich or London. Said company is known for the long and technical interviews in their hiring process, so i’m kind of pessimistic about whether i would make it. I’m a bit rusty on my complexity analysis. In any case i’ll make the best of it. Fun times.
I was also approached by Apress and asked whether i’d want to write them a book called “Beginning Android Game Development”. I just received the contract draft as well as the proposal form and will meditate on this over the weekend. I’ll most likely accept the offer. Nothing is set in stone yet, they’ll also have to accept my proposal. We’ll see about that.
I also started maintaining, fixing and improving the Android app FaceIT by Lior Vankin. Fun times, as the original code base was horrible. We haven’t had any automatic crash reports since the last update, so that gives me hope that future maintenance work will get down to a minimum and i can concentrate on adding new features every now and then.
Finally i had a talk with Robert from Battery Powered Games, our most beloved competitor. We plan on adopting the Cal3D format for skeletal animation in the very near future. Some of the code will be integrated into libgdx in place of the md2 format, which just plain sucks. I should also get my ass up and start implementing multi-touch this weekend. Everything is set up, i just need to copy and paste 50 lines of code from an old project. No promises at the moment though!
There’s actually a lot more that happened in the last couple of weeks. Starcraft 2 for example… GOD DAMN YOU BLIZZARD!