TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Parsing huge XML files with Go

44 points by dps almost 13 years ago

5 comments

willvarfar almost 13 years ago
Personally, I have a penchant for writing my own pull parsers. It's a mind-expanding exercise.

The neat thing about Go is that parsers can return functions that consume the next token. Rob Pike has an excellent video about this: http://www.youtube.com/watch?v=HxaD_trXwRE
Comment #4131143 not loaded
Comment #4130951 not loaded
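The state-function idea willvarfar mentions (from Rob Pike's lexer talk) can be sketched in Go roughly as follows. This is an illustrative toy, not code from the thread: the `lexer`, `lexText`, and `lexTag` names are made up, and it tokenizes only trivially well-formed input.

```go
package main

import "fmt"

// stateFn is the state-function pattern: each state does some work
// and returns the next state to run, or nil when the input is done.
type stateFn func(*lexer) stateFn

type lexer struct {
	input string
	pos   int
	items []string
}

// lexText scans character data until it hits a '<'.
func lexText(l *lexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) {
		if l.input[l.pos] == '<' {
			if l.pos > start {
				l.items = append(l.items, "TEXT:"+l.input[start:l.pos])
			}
			return lexTag
		}
		l.pos++
	}
	if l.pos > start {
		l.items = append(l.items, "TEXT:"+l.input[start:l.pos])
	}
	return nil // end of input
}

// lexTag scans a tag up to and including the closing '>'.
func lexTag(l *lexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) {
		if l.input[l.pos] == '>' {
			l.pos++
			l.items = append(l.items, "TAG:"+l.input[start:l.pos])
			return lexText
		}
		l.pos++
	}
	return nil // unterminated tag; a real lexer would report an error here
}

// lex drives the state machine to completion.
func lex(input string) []string {
	l := &lexer{input: input}
	for state := stateFn(lexText); state != nil; {
		state = state(l)
	}
	return l.items
}

func main() {
	fmt.Println(lex("<a>hi</a>")) // [TAG:<a> TEXT:hi TAG:</a>]
}
```

The appeal of the pattern is that lexing state lives in which function runs next, rather than in a pile of flags, so each state reads as straight-line code.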
fleitz almost 13 years ago
Streaming parsers are key when dealing with XML files this big. I used to have a C# parser that would parse about 1 TB of XML per day; the biggest files were > 200 GB.

It would have been impossible without rewriting everything to use a SAX-style parser.
Comment #4130650 not loaded
Comment #4131075 not loaded
duaneb almost 13 years ago
As much as I like hearing about Go, SAX parsers are not exactly new.
评论 #4130805 未加载
exim almost 13 years ago
In the first place, why should you have huge XML files? (Except those Wikipedia dump files :))
Comment #4131147 not loaded
Comment #4130784 not loaded
Comment #4131043 not loaded
Comment #4131365 not loaded
pradeepprabakar almost 13 years ago
I had a similar task: parsing the huge Wikipedia dump and rewriting the Wikipedia XML (I had to add a couple of other tags inside the main "page" tag). I used a SAX parser in Python to rewrite the dump. I found SAX parsers very simple to use with huge XML streams.