I’m working on a little text indexing side project, and I think the content posted to HN would be a good dataset to work with. What is the best way to get a dump of all the URLs that have been submitted to HN? Asking for ideas before firing up a crawler. Are there existing dumps? APIs?
There is a BigQuery dataset: https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news
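If you go the BigQuery route, something like this sketch should pull the submitted URLs (it assumes the public `bigquery-public-data.hacker_news.full` table and its `type`/`url` columns are still the current schema, so check the table in the console before running):

    # Hedged sketch, not from the thread: query submitted story URLs from the
    # public Hacker News dataset using the google-cloud-bigquery client.
    # Table and column names are assumptions about the dataset's schema.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default GCP project and credentials

    query = """
        SELECT id, title, url
        FROM `bigquery-public-data.hacker_news.full`
        WHERE type = 'story' AND url IS NOT NULL
        LIMIT 1000
    """

    # Run the query and iterate over the result rows
    for row in client.query(query).result():
        print(row.id, row.url)

Drop the LIMIT once you've confirmed the schema and costs; the full table is large, so you'd probably export the result to GCS/Parquet rather than stream it row by row for an indexing project.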