I have added a lightweight XML parser to the utils package. Yes, I realize writing a new XML parser is a little crazy, but I had a hunch I could come up with something small and fast, and it was a good exercise of my Ragel skills.
The Xml class parses using SAX-like events and builds a DOM by default. It supports a subset of XML features: elements, attributes, text, predefined entities, CDATA, mixed content. Namespaces are parsed as part of the element or attribute name. Prologs and doctypes are ignored. Only 8-bit character encodings are supported. Input is assumed to be well formed.
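To make the "SAX-like events build a DOM" idea concrete, here is a minimal sketch of how a handful of parser events could assemble a tree. The names `Element`, `DomBuilder`, `open`, `attribute`, `text`, and `close` are hypothetical and chosen only for illustration; they are not necessarily the names the Xml class uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventDomSketch {
    /** Hypothetical DOM node: a name, attributes, optional text, and children. */
    static class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Element> children = new ArrayList<>();
        String text;

        Element(String name) {
            this.name = name;
        }
    }

    /** Hypothetical event sink whose default behaviour is to build a tree. */
    static class DomBuilder {
        private final Deque<Element> open = new ArrayDeque<>(); // currently open elements
        Element root;

        void open(String name) {
            Element e = new Element(name);
            if (open.isEmpty()) root = e;
            else open.peek().children.add(e);
            open.push(e);
        }

        void attribute(String name, String value) {
            open.peek().attributes.put(name, value);
        }

        void text(String text) {
            // Mixed content: append to any text already seen on this element.
            Element e = open.peek();
            e.text = (e.text == null) ? text : e.text + text;
        }

        void close() {
            open.pop();
        }
    }

    public static void main(String[] args) {
        // Events a parser might emit for: <a:root id="1"><item>hi</item></a:root>
        DomBuilder b = new DomBuilder();
        b.open("a:root"); // the namespace prefix stays part of the name
        b.attribute("id", "1");
        b.open("item");
        b.text("hi");
        b.close();
        b.close();
        System.out.println(b.root.name + " " + b.root.attributes + " " + b.root.children.get(0).text);
    }
}
```

Overriding the event methods instead of building nodes would give plain SAX-style streaming, which is presumably why the DOM is only the default.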
So, how does it perform? I compared it to javax.xml.parsers.DocumentBuilder on both the desktop and Android. On the desktop, it is slightly faster for small (1MB) XML documents, and slightly slower for very large (100MB) documents. On Android it is much faster on small documents. Android choked on the 100MB file, so I couldn’t test it. Here are the benchmark results, in seconds:
The entire XML parser was written in 251 lines of code using Ragel, an FSM compiler. Here is the core of the grammar:
(('\'' ^'\''* >buffer %attribute '\'') | ('"' ^'"'* >buffer %attribute '"'));
element = '<' space* ^(space | '/' | '>')+ >buffer %elementStart (space+ attribute)*
:>> (space* ('/' %elementEndSingle)? space* '>' @element);
elementBody := space* <: ((^'<'+ >buffer %text) <: space*)? element? :>> ('<' space* '/' ^'>'+ '>' @elementEnd);
main := space* element space*;
It’s similar to writing a regex, except that names like “buffer” and “attribute” refer to snippets of Java code that run on particular FSM state transitions. These snippets collect pieces of input, build the DOM, etc.
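In Ragel, `>` embeds an action that runs on entering a machine and `%` one that runs on leaving it. The sketch below imitates that pairing by hand for text runs, roughly what `>buffer %text` does in the grammar: one snippet marks where a piece of input starts, the other slices it out. This is a hand-rolled illustration, not Ragel output, and `ActionSketch`, `textRuns`, and the variable names are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ActionSketch {
    /** Collects the text runs between tags, the way a "buffer" and a "text" action cooperate. */
    static List<String> textRuns(String input) {
        char[] data = input.toCharArray();
        List<String> out = new ArrayList<>();
        boolean inTag = false;  // state: currently inside <...>
        boolean inText = false; // state: currently inside a text run
        int s = 0;              // start index recorded by the "buffer" action

        for (int p = 0; p < data.length; p++) {
            char c = data[p];
            if (inTag) {
                if (c == '>') inTag = false;
            } else if (c == '<') {
                if (inText) {
                    // Leaving action (like %text): slice the buffered range.
                    out.add(new String(data, s, p - s));
                    inText = false;
                }
                inTag = true;
            } else if (!inText) {
                // Entering action (like >buffer): remember where the piece starts.
                s = p;
                inText = true;
            }
        }
        // Flush a trailing text run that was never closed by a tag.
        if (inText) out.add(new String(data, s, data.length - s));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textRuns("<a>hello<b>world</b></a>")); // prints [hello, world]
    }
}
```

The appeal of Ragel is that it generates this kind of state bookkeeping from the grammar, so only the action snippets need to be written by hand.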
Ragel is really cool! Now that I’ve used it for TableLayout and this XML parser, I feel comfortable using it to quickly whip up DSLs.