I was wondering if anyone has prepared some sort of data archive for Hacker News. Something fairly simple, with the attributes id, title, url, user, score, the time the score last changed, etc.?<p>I think it would be an extremely interesting and valuable dataset.<p>I have a couple of sample ideas that I would like to try:<p>1. Is there a best time to post on HN? (I know this is very SEO, but I think it's an interesting question nonetheless.)<p>2. It might be fun to cluster the data (perhaps all articles with score > 5) and see the top X articles in every cluster. I think that would give you a wide variety of extremely good articles to read.<p>I know that this isn't the most enlightening or groundbreaking work, but I'm sure that if we had the dataset, we could find some interesting ways to analyze it and get some nice results. (In fact, if anyone can think of other interesting ways to analyze the dataset, please post them anyway; I'd like to hear them.)<p>I was actually putting together a little script that scrapes HN and puts the data into a MySQL database, but that doesn't seem like a good idea, since it would hit the servers unnecessarily. Also, I'm not sure people would like me doing that.
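<p>To make idea 1 a bit more concrete, here is a rough sketch of how the "best time to post" question could be approached once such a dataset exists. Everything in it is an assumption on my part: the rows are made-up sample data, and the `posted_at` field (submission time as Unix seconds, UTC) is hypothetical, since the attribute list above only mentions the time the score last changed. It simply groups stories by hour of day and compares average scores.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# 2010-01-01 00:00:00 UTC, used only to build readable sample timestamps.
BASE = 1262304000

# Made-up sample rows: (id, title, score, posted_at).
# posted_at is a hypothetical field holding the submission time in Unix seconds.
SAMPLE_STORIES = [
    (1, "Story A", 10, BASE + 9 * 3600),
    (2, "Story B", 30, BASE + 9 * 3600 + 86400),  # same hour, next day
    (3, "Story C", 5, BASE + 15 * 3600),
    (4, "Story D", 7, BASE + 15 * 3600 + 1800),
]


def mean_score_by_hour(stories):
    """Return {hour of day (UTC): mean score} for the given stories."""
    scores = defaultdict(list)
    for _id, _title, score, posted_at in stories:
        hour = datetime.fromtimestamp(posted_at, tz=timezone.utc).hour
        scores[hour].append(score)
    return {hour: mean(values) for hour, values in sorted(scores.items())}


def best_hour(stories):
    """Return the hour of day (UTC) with the highest mean score."""
    averages = mean_score_by_hour(stories)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    for hour, avg in mean_score_by_hour(SAMPLE_STORIES).items():
        print(f"{hour:02d}:00 UTC  mean score {avg}")
    print("best hour:", best_hour(SAMPLE_STORIES))
```

<p>A plain mean is probably too crude for real data, since a handful of very high-scoring stories would dominate an hour; a median, or the fraction of stories above some score threshold, might be a fairer comparison.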