That reminds me of a project I've always wanted to do but keep putting on the back burner: for a while now, I've been wanting to scrape the Unofficial Handbook of Marvel Comics Creators (UHBMCC) [0] and turn it into an actual database that I can run arbitrary queries on, instead of just a collection of flat files (it's a very well-put-together collection of flat files, though). Fortunately, they provide an offline version (in the form of CAB files, of all things!), so I'd be able to do everything without hammering their server.<p>IMO, when it comes to Marvel comics, it's a much more useful resource than even the GCD, and part of that is its UI: it may be dated, but putting all the information for a series on one page is ten times better than the GCD's issue-by-issue interface.<p>(Several years ago I did do some scraping of a very old version of the UHBMCC, but that data is really outdated, I used some awful scraping tools, and I just stored everything in pickled Python objects rather than a database... If I were to start the project up again, I'd want to do it right from the start.)<p>[0] <a href="http://maelmill-insi.de/UHBMCC/" rel="nofollow">http://maelmill-insi.de/UHBMCC/</a>