Using reinforcement learning and $4.80 of GPU time to find the best HN post

217 points | by kcorbitt | 7 months ago

25 comments

jerjerjer | 7 months ago
> In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.

> Even if the model gets extremely good at predicting final_score_if_it_hits_front_page, there's still the inherent randomness of probability_of_hitting_front_page that is fundamentally unpredictable.

In addition to the date, you might want to include three fields:

- day of week (categorical)
- is weekend/holiday (boolean)
- hour or time of day (categorical; 24 buckets, or coarser ones like morning/afternoon/etc.)

The probability of a post hitting the front page is usually affected by these things, so they can really help the model.
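A minimal sketch of deriving those fields with pandas (the posts frame and created_at column are hypothetical, just to show the idea):

    import pandas as pd

    # Hypothetical posts table with a UTC submission timestamp.
    posts = pd.DataFrame({
        "created_at": pd.to_datetime(["2024-03-01 14:30", "2024-03-03 08:00"]),
    })

    posts["day_of_week"] = posts["created_at"].dt.day_name()      # categorical
    posts["is_weekend"]  = posts["created_at"].dt.dayofweek >= 5  # boolean
    posts["hour_of_day"] = posts["created_at"].dt.hour            # 24 categories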
kelnos | 7 months ago
I don't get the conclusion the author is trying to draw. If you look at the data presented, it seems the model was actually pretty bad at guessing the real-world behavior of the posts listed. Out of the top ten it picked:

* 1 had a score reasonably close (8.4%) to what the model predicted
* 4 had scores wildly lower than the model predicted
* 2 had scores wildly higher than the model predicted
* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)

Then there's a list of 10 submissions that the model predicted would have scores ranging from 33 to 135, but they all only received a score of 1 in reality.

The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.
youoy | 7 months ago
Thanks for sharing! Very interesting.

> The correlation is actually not bad (0.53), but our model is very consistently over-estimating the score at the low end, and underestimating it at the high end. This is surprising; some variation on any given data point is expected, but such a consistent mis-estimation trend isn't what we'd expect.

This is a consequence of the model objective. When the model can't know what will really happen, a good way to reduce the overall error is to do exactly that: if it instead tried to exactly predict the very highs and very lows, it would incur very large errors on those points, resulting in a bigger overall error.

Apart from that, I want to comment on AI alignment here. For me, the objective of "most upvotes" is not fully correlated with where I get the most value on HN. Most of the time, I would have found the most upvoted stories anyway on other platforms; it's the middle range that I really like. So be careful implementing this algorithm at scale: it could turn the website into another platform with shitty AI recommendations.
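A toy simulation (mine, not from the post) shows why squared error forces this compression: once luck is part of the final score, even a model that predicts E[score | signal] perfectly will over-shoot the low bins and under-shoot the high ones.

    import numpy as np

    rng = np.random.default_rng(0)
    signal = rng.normal(0, 1, 100_000)          # what the model can actually see
    score = signal + rng.normal(0, 1, 100_000)  # actual score = signal + luck

    pred = signal  # squared-error-optimal prediction: E[score | signal] = signal

    # Bin by actual score: predictions are too high at the low end, too low at the high end.
    for lo, hi in [(-4.0, -2.0), (-1.0, 1.0), (2.0, 4.0)]:
        m = (score > lo) & (score < hi)
        print(f"actual in ({lo}, {hi}): mean pred {pred[m].mean():+.2f} "
              f"vs mean actual {score[m].mean():+.2f}")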
oli5679 | 7 months ago
If you withhold a small amount of data, or even retrain on a sample of your training data, isotonic regression is a good way to solve many calibration problems.

https://scikit-learn.org/dev/modules/generated/sklearn.isotonic.IsotonicRegression.html

I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
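A minimal sketch of that calibration step with scikit-learn (the numbers are toy; the held-out calibration split is assumed):

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    # Uncalibrated model outputs and true scores on a withheld calibration split.
    raw_pred = np.array([2.0, 5.0, 9.0, 20.0, 60.0, 120.0])
    actual   = np.array([1.0, 1.0, 4.0, 30.0, 90.0, 300.0])

    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(raw_pred, actual)

    # Monotone remapping of fresh predictions onto the calibrated scale.
    print(iso.predict(np.array([10.0, 100.0])))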
swyx | 7 months ago
> This query took 17 seconds to load the dataset into RAM and then aggregating by type was almost instant. It is absolutely incredible to me that I can load every HN post and comment ever into RAM in a few seconds on my (admittedly beefy) dev laptop, and analyze them at will. What an age of abundance!

https://motherduck.com/blog/big-data-is-dead/
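The query being quoted would look something like this in DuckDB's Python API (the file name and schema are my assumptions, not the post's actual code):

    import duckdb

    con = duckdb.connect()
    # Aggregate every HN item by type, straight off a hypothetical local dump.
    print(con.sql("""
        SELECT type, count(*) AS n, avg(score) AS avg_score
        FROM 'hn_items.parquet'
        GROUP BY type
        ORDER BY n DESC
    """))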
Arctic_fly | 7 months ago
> But in 2015 there is a stark discontinuity, where the number of stories (with text) shoots up by >10x, and the average score drops by 5x! Is this some kind of eternal September?

Based on the later analysis in the post (which I agree with), the total score of a story is disproportionately tied to whether it hits the front page, and of course how long it stays there. Regardless of the quality of the average post starting in 2015, the sheer quantity would make it impossible for all but a few to stay on the front page for very long. Hacker News got more popular, so each story got less prime time.
kcorbitt | 7 months ago
Hey all, this project was a labor of love I worked on in my spare time over the last couple of weeks. Happy to answer any questions!
sdflhasjd | 7 months ago
It's interesting that service complaints are so popular on HN. I always feel a bit bad that my most popular HN contribution was me complaining about a popular service.
pclmulqdq | 7 months ago
There is a timing factor that you need to consider, too. Anecdotally, Sunday morning is the best time to get onto the front page, while Tuesday or Wednesday morning gets you the most views.
manx | 7 months ago
Very interesting! Identifying great new content is a big unsolved problem for HN, IMHO. Unfortunately, scores are not a good metric to predict, because they are not comparable (see https://felx.me/2021/08/29/improving-the-hacker-news-ranking-algorithm.html). A better metric might be "upvoterate", defined as how much more or less likely users are to upvote a story compared to the average story. More about that here: https://github.com/social-protocols/quality-news?tab=readme-ov-file
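A back-of-the-envelope version of that metric (my sketch of the idea, not the linked implementation): normalize a story's upvotes by the upvotes an average story would have collected from the same exposure.

    # expected_upvotes: what the average story would earn from the same
    # impressions at the same ranks (estimated from site-wide history).
    def upvoterate(upvotes: int, expected_upvotes: float) -> float:
        """> 1.0 means users upvoted this story more often than average."""
        return upvotes / expected_upvotes

    print(upvoterate(upvotes=120, expected_upvotes=40.0))  # 3.0x the average rate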
Nevermark | 7 months ago
> It's super important that your training inputs includes all the information your model will need to make predictions. In this case, I included the post title, author, date, and content. All of those factors could be relevant to the chance a story gets voted up.

You would do better to leave out dates and authors.

Do you really want the model to home in on dates & authors? If you just trained on those, would it create anything useful?

It can't for dates, since it isn't getting any future-date examples to prepare it for future dates. I suppose you could argue that month & day matter, but surely that would be a much lower-quality discriminator than forcing the model to stay focused on title & content.

Similarly with authors: you can find out which authors produce the most-upvoted content with a simple calculation (see the sketch below).

But again, is that the discriminator you want the model to use, or the title & content? Because it will use the easiest discriminator it can.
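That "simple calculation" is a one-liner (pandas sketch; the posts frame is hypothetical):

    import pandas as pd

    posts = pd.DataFrame({
        "author": ["alice", "bob", "alice", "carol"],
        "score":  [120, 3, 45, 7],
    })

    # Mean score per author, highest first; no learned model required.
    print(posts.groupby("author")["score"].mean().sort_values(ascending=False))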
gavin_gee | 7 months ago
Take note HN, this is what great content marketing looks like.
6gvONxR4sf7o | 7 months ago
Why use RL for this instead of plain old supervised learning?
Havoc | 7 months ago
Nice write-up.

Did you ever figure out what happened in 2016?
1024core | 7 months ago
Am I right in understanding that the reward model is also similar to an LLM, with the difference being that it predicts a score instead of the next token?
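That is the usual shape of a reward model: a transformer backbone with the language-model head swapped for a scalar regression head. A hedged PyTorch sketch (the base model and pooling choice are my assumptions, not the post's actual setup):

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class RewardModel(nn.Module):
        def __init__(self, base: str = "bert-base-uncased"):
            super().__init__()
            self.backbone = AutoModel.from_pretrained(base)
            self.score_head = nn.Linear(self.backbone.config.hidden_size, 1)

        def forward(self, **inputs) -> torch.Tensor:
            hidden = self.backbone(**inputs).last_hidden_state  # (batch, seq, dim)
            pooled = hidden[:, 0]                               # CLS-token pooling
            return self.score_head(pooled).squeeze(-1)          # one score per post

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = RewardModel()
    batch = tok(["Show HN: my side project"], return_tensors="pt")
    print(model(**batch))  # untrained, so the scalar is meaningless until fine-tuned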
hnburnsy | 7 months ago
A suggestion: try to correlate posting time with how likely a post is to get noticed. A good post won't catch fire if it doesn't overcome the initial low visibility; I've posted items that were later posted by others and gained traction.

Maybe the reputation of the poster is also a factor?
metalman | 7 months ago
Now do it again, and this time see where your post on ranking posts ranks. Personally, I find lauding the dead, and the dead past, somehow objectionable. Though I suppose that is the business of our so-called AI: mining the dead past, hoping to come up with something better than Frankenstein's zombie corpse. It is an insurmountable limitation, and dangerous too, I think. The past is that ultimately perfect thing, in its absolute immutability and totality; it is all there, and to pick and choose from such a thing is brazen indeed. I can't help but imagine a picture of your $4.80 actually being consumed in a bed of fluidised coal, which in fact it was.
eugenekolo | 7 months ago
What does the model say about this post?
hn_throwaway_99 | 7 months ago
> And in follow-up posts in this series, we'll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!

Well, thanks HN, you were good while it lasted...
suyash | 7 months ago
Very interesting project. I would love to read a more technical write-up on how the model was architected and trained; any pointers?
octocop | 7 months ago
Even the AIs don't read the content before up/downvoting.
floobertoober | 7 months ago
Maybe it would help to apply a Box-Cox transform to the score distribution?
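That is a one-liner with SciPy (toy scores below; Box-Cox needs strictly positive input, which HN scores satisfy since they start at 1):

    import numpy as np
    from scipy.stats import boxcox

    scores = np.array([1, 1, 2, 3, 5, 12, 40, 250, 900])  # heavy right tail
    transformed, lam = boxcox(scores)  # lam is the fitted lambda
    print(round(lam, 3), transformed.round(2))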
chx | 7 months ago
> That's not much time for a model that (hopefully) understands all of HN!

This is dangerous talk. It doesn't understand anything at all.

Reminder: we are more prone to anthropomorphizing LLMs than to humanizing suffering humans.
ChrisArchitect | 7 months ago
First problem with the submissions that supposedly "would do well on HN": other than the Ask HNs, they misuse the submission form by putting the link in a text post instead of sharing it directly as a link post. And sketchy new/inactive accounts. C'mon. Not gonna keep reading a grifty post after that opening.
ivanovm | 7 months ago
This is very cool. Have you tried DPO?