TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Parsing huge XML files with Go

44 points by dps almost 13 years ago

5 comments

willvarfar almost 13 years ago
Personally, I have a penchant for writing my own pull parsers. It's a mind-expanding exercise.

The neat thing about Go is that parsers can return functions that consume the next token. Rob Pike has an excellent video about this: http://www.youtube.com/watch?v=HxaD_trXwRE
Comment #4131143 not loaded
Comment #4130951 not loaded
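The state-function idea willvarfar mentions (from Rob Pike's lexer talk) can be sketched in Go roughly as follows. This is an illustrative toy, not code from the thread: the `lexer`, `lexText`, and `lexTag` names are made up, and it tokenizes only trivially well-formed input.

```go
package main

import "fmt"

// stateFn is the state-function pattern: each state does some work
// and returns the next state to run, or nil when the input is done.
type stateFn func(*lexer) stateFn

type lexer struct {
	input string
	pos   int
	items []string
}

// lexText scans character data until it hits a '<'.
func lexText(l *lexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) {
		if l.input[l.pos] == '<' {
			if l.pos > start {
				l.items = append(l.items, "TEXT:"+l.input[start:l.pos])
			}
			return lexTag
		}
		l.pos++
	}
	if l.pos > start {
		l.items = append(l.items, "TEXT:"+l.input[start:l.pos])
	}
	return nil // end of input
}

// lexTag scans a tag up to and including the closing '>'.
func lexTag(l *lexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) {
		if l.input[l.pos] == '>' {
			l.pos++
			l.items = append(l.items, "TAG:"+l.input[start:l.pos])
			return lexText
		}
		l.pos++
	}
	return nil // unterminated tag; a real lexer would report an error here
}

// lex drives the state machine to completion.
func lex(input string) []string {
	l := &lexer{input: input}
	for state := stateFn(lexText); state != nil; {
		state = state(l)
	}
	return l.items
}

func main() {
	fmt.Println(lex("<a>hi</a>")) // [TAG:<a> TEXT:hi TAG:</a>]
}
```

The appeal of the pattern is that lexing state lives in which function runs next, rather than in a pile of flags, so each state reads as straight-line code.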
fleitz almost 13 years ago
Streaming parsers are key when dealing with XML files this big. I used to have a C# parser that would parse about 1 TB of XML per day; the biggest files were > 200 GB.

It would have been impossible without rewriting everything to use a SAX-style parser.
Comment #4130650 not loaded
Comment #4131075 not loaded
duaneb almost 13 years ago
As much as I like hearing about Go, SAX parsers are not exactly new.
评论 #4130805 未加载
exim almost 13 years ago
In the first place, why should you have huge XML files? (Except those Wikipedia dump files :))
Comment #4131147 not loaded
Comment #4130784 not loaded
Comment #4131043 not loaded
Comment #4131365 not loaded
pradeepprabakar almost 13 years ago
I had a similar task: parsing the huge Wikipedia dump and rewriting the Wikipedia XML (I had to add a couple of other tags inside the main "page" tag). I used a SAX parser in Python to rewrite the dump. I found SAX parsers very simple to use with huge XML streams.