Hey everybody, fauigerzigerk sort of gets into this, but I downloaded the dump yesterday expecting a relatively straightforward way to parse and search it with Python, then extract and process articles of interest with NLTK.

I'm not sure what I was expecting exactly, but it sure wasn't a single 40 GB XML file that I can't even open in Notepad++.

Is my only real option (for parsing and data mining this thing) to set up a clone of Wikipedia's system and then screen-scrape localhost?
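For context, the closest I've gotten is a streaming parse along these lines, using the stdlib's xml.etree.ElementTree.iterparse so the whole file never has to fit in memory. It's a rough sketch, not a working pipeline: the file name is an assumption, and the namespace URI varies by dump version, so check the <mediawiki> root element of your own copy.

    import bz2
    import xml.etree.ElementTree as ET

    # Assumed file name; reading the .bz2 directly avoids
    # decompressing 40 GB of XML to disk first.
    DUMP = "enwiki-latest-pages-articles.xml.bz2"

    # MediaWiki export namespace -- the version suffix differs between
    # dumps, so check the <mediawiki> root element of your file.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    def iter_articles(path):
        """Yield (title, wikitext) pairs without loading the whole file."""
        with bz2.open(path, "rb") as f:
            context = ET.iterparse(f, events=("start", "end"))
            _, root = next(context)  # grab the root so we can prune it
            for event, elem in context:
                if event == "end" and elem.tag == NS + "page":
                    title = elem.findtext(NS + "title")
                    text = elem.findtext(NS + "revision/" + NS + "text") or ""
                    yield title, text
                    root.clear()  # drop finished pages to keep memory flat

    for title, text in iter_articles(DUMP):
        if "linguistics" in title.lower():
            print(title, len(text))

That at least keeps memory usage flat, and the yielded wikitext could be fed straight into NLTK, but I have no idea whether it's a sane approach at 40 GB scale or whether I'm better off importing the dump into a real database.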