Personally, I have a penchant for writing my own pull parsers. It's a mind-expanding exercise.<p>The neat thing about Go is that functions are first-class values, so a parser state can be a function that consumes the next token and returns the function to run next. Rob Pike has an excellent video about this: <a href="http://www.youtube.com/watch?v=HxaD_trXwRE" rel="nofollow">http://www.youtube.com/watch?v=HxaD_trXwRE</a>
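For anyone who hasn't watched the talk, here is a rough sketch of the state-function idea. It is not code from the video: the `token`, `lexer`, `lexText` and `lexTag` names are made up for illustration, and the "grammar" is just tags and text, nowhere near real XML.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical token type for this sketch.
type token struct {
	kind string // "tag", "text", or "error"
	val  string
}

// lexer holds the input, the current scan position, and the tokens so far.
type lexer struct {
	input  string
	pos    int
	tokens []token
}

// stateFn consumes some input and returns the state that handles what
// comes next. A nil stateFn means the lexer is finished.
type stateFn func(*lexer) stateFn

func (l *lexer) emit(kind, val string) {
	l.tokens = append(l.tokens, token{kind, val})
}

// lexText consumes character data up to the next '<'.
func lexText(l *lexer) stateFn {
	i := strings.IndexByte(l.input[l.pos:], '<')
	if i < 0 {
		// No more tags: emit any trailing text and stop.
		if l.pos < len(l.input) {
			l.emit("text", l.input[l.pos:])
		}
		l.pos = len(l.input)
		return nil
	}
	if i > 0 {
		l.emit("text", l.input[l.pos:l.pos+i])
	}
	l.pos += i
	return lexTag
}

// lexTag consumes one tag, from '<' through the matching '>'.
func lexTag(l *lexer) stateFn {
	i := strings.IndexByte(l.input[l.pos:], '>')
	if i < 0 {
		l.emit("error", "unterminated tag")
		return nil
	}
	l.emit("tag", l.input[l.pos+1:l.pos+i])
	l.pos += i + 1
	return lexText
}

// lex runs the state machine: each state returns the next one.
func lex(input string) []token {
	l := &lexer{input: input}
	for state := stateFn(lexText); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	for _, t := range lex("<page>hello</page>") {
		fmt.Printf("%s %q\n", t.kind, t.val)
	}
}
```

The whole driver is the loop in `lex`. There is no switch on a state enum, because each state decides for itself what comes next.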
Streaming parsers are key when dealing with XML files this big. I used to have a C# parser that would parse about 1 TB of XML per day; the biggest files were over 200 GB.<p>It was impossible without rewriting everything to use a SAX-style parser.
I had to do a similar task: parsing the huge Wikipedia dump and rewriting its XML (I had to add a couple of other tags to the main "page" tag). I used a SAX parser in Python to rewrite the dump, and found SAX parsers very simple to use on huge XML streams.