I’m working on a little text indexing side project, and I think the content posted to HN would be a good dataset to work with. What is the best way to get a dump of all the URLs that have been submitted to HN? Asking for ideas before firing up a crawler. Are there existing dumps? APIs?
There is a BigQuery dataset: https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news
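If you go the BigQuery route, something like this sketch should pull the submitted URLs (it assumes the public `bigquery-public-data.hacker_news.full` table and its `type`/`url` columns are still the current schema, so check the table in the console before running):

    # Hedged sketch, not from the thread: query submitted story URLs from the
    # public Hacker News dataset using the google-cloud-bigquery client.
    # Table and column names are assumptions about the dataset's schema.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default GCP project and credentials

    query = """
        SELECT id, title, url
        FROM `bigquery-public-data.hacker_news.full`
        WHERE type = 'story' AND url IS NOT NULL
        LIMIT 1000
    """

    # Run the query and iterate over the result rows
    for row in client.query(query).result():
        print(row.id, row.url)

Drop the LIMIT once you've confirmed the schema and costs; the full table is large, so you'd probably export the result to GCS/Parquet rather than stream it row by row for an indexing project.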