My hunch is that 90-99% of all Jeopardy questions can be answered with information in Wikipedia/Wiktionary, properly understood.<p>So I'd start with Wikipedia: ~30GB of uncompressed full article text. Break it into chunks, canonicalize the phrasings to be more declarative, and add synonym/hypernym/hyponym phrasings (via something like WordNet), so that various 'cluesy' ways of saying things still bring up the same candidate answers.<p>Because it's free, compact and well-structured, throw in Freebase, too.<p>Jeopardy goes back to certain topics/answers again and again. So I'd scrape the full "J! Archive" of 200K+ clues, and use it as both source and testing material (though of course never testing the system on rounds it already has in memory).<p>And I'd add special interpretation rules for commonly recurring category types: X-letter words, before-and-after, quasi-multiple-choice, words-in-quotes.<p>I think such a system might get half or more of the questions in a typical round correct, and in a matter of seconds, even on a single machine.
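<p>To make the idea concrete, here is a rough Python sketch of the pipeline described above: chunks scored by term overlap after synonym expansion, plus two of the category rules (X-letter words and before-and-after). Everything in it is an assumption for illustration: the three-entry `CHUNKS` list stands in for Wikipedia, the `SYNONYMS` table stands in for WordNet lookups, and the function names are made up. A real system would need a proper index and much better scoring.

```python
import re
from collections import Counter

# Hypothetical toy data: (article title, text chunk) pairs standing in
# for chunked Wikipedia articles.
CHUNKS = [
    ("Paris", "Paris is the capital and largest city of France."),
    ("Lyon", "Lyon is a large city in France known for its cuisine."),
    ("Berlin", "Berlin is the capital and largest city of Germany."),
]

# Hypothetical hand-written table standing in for WordNet synonym lookups.
SYNONYMS = {"metropolis": {"city"}, "biggest": {"largest"}}


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand(words):
    """Add synonyms so 'cluesy' wording still matches plain article text."""
    out = set(words)
    for w in words:
        out |= SYNONYMS.get(w, set())
    return out


def candidates(clue):
    """Rank chunk titles by how many expanded clue terms their text shares."""
    terms = expand(tokens(clue))
    scores = Counter()
    for title, text in CHUNKS:
        scores[title] += len(terms & set(tokens(text)))
    return [title for title, score in scores.most_common() if score > 0]


def letter_count_rule(category):
    """Parse categories like '5-LETTER WORDS' into a length, else None."""
    m = re.search(r"(\d+)-letter", category.lower())
    return int(m.group(1)) if m else None


def before_and_after(first, second):
    """Join two phrases that share a word at the seam, else return None."""
    a, b = first.split(), second.split()
    if a and b and a[-1].lower() == b[0].lower():
        return " ".join(a + b[1:])
    return None


def answer(category, clue):
    """Pick the top-ranked candidate that satisfies the category rule."""
    ranked = candidates(clue)
    n = letter_count_rule(category)
    if n is not None:
        ranked = [title for title in ranked if len(title) == n]
    return ranked[0] if ranked else None
```

<p>With this toy data, `answer("5-LETTER WORDS", "This metropolis is the biggest in France and its capital")` picks "Paris", while the same clue under "4-LETTER WORDS" falls through to "Lyon" because the length rule filters out the higher-scoring candidates. That is the point of the category rules: they prune an already ranked list rather than replace the retrieval step.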