A few comments about Hacker News data (i.e. why I haven't played with the data in a while):<p>1) The algorithm changed recently. This post uses >40pts as a proxy for front-page status. That's too conservative; even my 10pt threshold back then was conservative. Since the recent algorithm changes to Hacker News (<1 yr ago), I've seen posts with <i>3pts</i> get into the Top 10 for whatever reason, which breaks predictive analysis.<p>2) The dataset (and this submission) only includes submissions and submission scores; comment scores were removed from the API, which is disappointing.<p>3) Given that HN titles/links can be edited by moderators (and they do a good job), it's harder to judge the initial submission from the final result.<p>4) Slight edge case in the article, but link shorteners are auto-killed, which is why youtu.be/goo.gl links are not prominent.