I've been tempted to look into API-based HN access, having scraped the front-page archive about two years ago.<p>One of the advantages of comments is that there's simply <i>so much more text</i> to work with. For the front page, there are <i>up to</i> 80 characters of context per story title (often deliberately obtuse), plus metadata (date, story position, votes, site, submitter).<p>I'd initially embarked on the project to find out which cities were mentioned most often on HN (in front-page titles), though it turned out to be a much more interesting project than I'd anticipated.<p>(I've somewhat neglected it for a while, though I'll occasionally spin it up to check on questions or ideas.)
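<p>For the city question, a minimal sketch of that kind of count might look like the following. It assumes plain case-insensitive word-boundary matching against a hand-picked city list and counts each city at most once per title. The sample titles, the `CITIES` list and `count_city_mentions` are all hypothetical, for illustration only.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for titles from a scraped front-page archive.
TITLES = [
    "Show HN: A transit map of Berlin",
    "Why San Francisco rents keep rising",
    "Berlin's startup scene in 2015",
    "New York City subway data released",
    "Life in San Francisco without a car",
]

# Hypothetical city list; a real run would need a much larger gazetteer.
CITIES = ["San Francisco", "New York", "Berlin", "London"]


def count_city_mentions(titles, cities):
    """Return a Counter of how many titles mention each city (once per title)."""
    counts = Counter()
    for city in cities:
        # \b anchors stop "Berlin" matching inside a longer word; re.escape
        # guards against punctuation in city names.
        pattern = re.compile(r"\b" + re.escape(city) + r"\b", re.IGNORECASE)
        counts[city] = sum(1 for title in titles if pattern.search(title))
    return counts


if __name__ == "__main__":
    print(count_city_mentions(TITLES, CITIES).most_common())
```

Naive matching like this will miscount ambiguous names (Paris, Texas vs. Paris, France, or "Portland" with no state), so a real pass would need some disambiguation on top.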