Impressive... I wonder how big a content-only snapshot is, i.e. no article histories and no meta-material like talk pages or WP:xxx pages, just the user-facing content.<p>I was also sort of hoping to see from the stats what proportion of content is public-facing vs devoted to arguments between Wikipedians... if you look at the stats for 'most edited articles' (accessible from the top link), it's interesting that of the top 50 most edited articles, only one, 'George W. Bush', is user-facing - and I suspect that only made it in because of persistent vandalism.<p>Still, with history and all included, there is some fabulous data-mining potential here, and room to do some really innovative work. I'd hazard a guess that the size of Wikipedia already exceeds that of existing language corpora like the US Code...<p><i>/retreats into corner muttering about semantic engines and link-free concepts of total hypertext as necessary AI boot conditions</i>