How Hacker News ranking algorithm works

311 points by cristoperb, over 14 years ago

18 comments

pg, over 14 years ago
That's close to the current version, but a little out of date. Here's the code running now:

    (= gravity* 1.8 timebase* 120 front-threshold* 1
       nourl-factor* .4 lightweight-factor* .17 gag-factor* .1)

    (def frontpage-rank (s (o scorefn realscore) (o gravity gravity*))
      (* (/ (let base (- (scorefn s) 1)
              (if (> base 0) (expt base .8) base))
            (expt (/ (+ (item-age s) timebase*) 60) gravity))
         (if (no (in s!type 'story 'poll))  .8
             (blank s!url)                  nourl-factor*
             (mem 'bury s!keys)             .001
                                            (* (contro-factor s)
                                               (if (mem 'gag s!keys)
                                                    gag-factor*
                                                   (lightweight s)
                                                    lightweight-factor*
                                                   1)))))
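For readers who don't read Arc, here is a rough Python transliteration of the snippet above. It is only a sketch under assumptions: the `Story` class and its fields are hypothetical stand-ins for `realscore`, `item-age`, `lightweight` and `contro-factor`, whose definitions are not shown in the thread, and `item-age` is assumed to be in minutes because the code divides by 60.

```python
from dataclasses import dataclass, field

# Constants copied from the Arc snippet.
GRAVITY = 1.8
TIMEBASE = 120            # assumed to be minutes, like item-age
NOURL_FACTOR = 0.4
LIGHTWEIGHT_FACTOR = 0.17
GAG_FACTOR = 0.1


@dataclass
class Story:
    """Hypothetical container; field names are illustrative, not HN's."""
    score: int                  # stand-in for (realscore s)
    age_minutes: float          # stand-in for (item-age s)
    type: str = "story"
    url: str = ""
    keys: set = field(default_factory=set)
    lightweight: bool = False   # stand-in for (lightweight s)
    contro_factor: float = 1.0  # stand-in for (contro-factor s)


def frontpage_rank(s: Story, gravity: float = GRAVITY) -> float:
    # Score term: drop the submitter's own point, then dampen with ^0.8.
    base = s.score - 1
    if base > 0:
        base = base ** 0.8
    # Age term: hours since submission (plus a two-hour head start) ^ gravity.
    decay = ((s.age_minutes + TIMEBASE) / 60) ** gravity

    # Penalty factor: the first matching condition wins, as in Arc's `if`.
    if s.type not in ("story", "poll"):
        factor = 0.8
    elif not s.url.strip():
        factor = NOURL_FACTOR
    elif "bury" in s.keys:
        factor = 0.001
    else:
        if "gag" in s.keys:
            inner = GAG_FACTOR
        elif s.lightweight:
            inner = LIGHTWEIGHT_FACTOR
        else:
            inner = 1.0
        factor = s.contro_factor * inner
    return base / decay * factor
```

Calling `frontpage_rank(Story(score=13, age_minutes=540, url="http://example.com"))` gives the rank of a 13-point story nine hours after submission; the same story without a URL scores 0.4 times that.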
gojomo, over 14 years ago
Some weaknesses of this algorithm are:

(1) Wall-clock hours penalize an article even if no one is reading (overnight, for example). A time denominated in ticks of actual activity (such as views of the 'new' page, or even upvotes-to-all-submissions) might address this.

(2) An article that misses its audience first time through -- perhaps due to (1) or a bad headline -- may never recover, even with a later flurry of votes far beyond what new submissions are getting.

Without checking the exact numbers, consider a contrived example: Article A is submitted at midnight and 3 votes trickle in until 8am. Then at 8am article B is submitted. Over the next hour, B gets 6 votes and A gets 9 votes. (Perhaps many of those are duplicate-submissions that get turned into upvotes.) A has double the total votes, and 50% more votes even in the shared hour, but still may never rank above B, because of the drag of its first 8 hours.

(I think you'd need to timestamp each vote for an improved decay function.)
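The contrived A-versus-B example can be checked numerically. The sketch below plugs the numbers into the score and age terms of the formula pg quoted, ignoring the penalty factors. It assumes each article starts with one point for the submitter, so that votes received equals points minus one; that is an assumption, not something stated in the thread.

```python
GRAVITY = 1.8
TIMEBASE_MIN = 120  # minutes, from the quoted Arc constants


def rank(points: int, age_minutes: float) -> float:
    """Score and age terms only; penalty factors are left out."""
    base = points - 1
    if base > 0:
        base = base ** 0.8
    return base / ((age_minutes + TIMEBASE_MIN) / 60) ** GRAVITY


# Snapshot at 9am.
# A: submitted at midnight, 3 + 9 votes, plus the assumed submitter point.
rank_a = rank(points=1 + 3 + 9, age_minutes=9 * 60)
# B: submitted at 8am, 6 votes, plus the assumed submitter point.
rank_b = rank(points=1 + 6, age_minutes=60)

print(f"A: {rank_a:.3f}  B: {rank_b:.3f}")
```

Under these assumptions B comes out several times higher than A at 9am, even though A has twice the votes, which is the effect the comment describes.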
antirez, over 14 years ago
When I built oknotizie.virgilio.it many years ago, more or less at the same time Reddit was created, I used the same base algorithm, that is: RANK = SCORE / AGE^ALPHA, where ALPHA is the obsolescence factor.

This is a pretty obvious algorithm, but the devil is in the details. First, since oknotizie is based in Italy, AGE is calculated in a special way so that night hours count differently (every hour should be taken into account in proportion to the traffic in that hour).

Second, there is a lot of filtering to do. Oknotizie is completely built out of anti-spamming: statistical analysis of users' voting patterns, cycle detection, an algorithm penalizing similarities in general on the home page, and so forth.

Running a simple HN-style site is simple as long as the community is not trying hard to game it. Otherwise it starts to become a much more complex (and sad) affair.
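One way to read the traffic-proportional AGE idea is sketched below. Everything concrete here is an assumption for illustration: the `HOURLY_WEIGHT` table, the value of `alpha` (borrowed from the HN gravity of 1.8), and the `+ 1` offset that keeps a brand-new item from dividing by zero. The comment does not say how oknotizie actually implements this.

```python
# Hypothetical traffic profile: each hour of the day gets a weight
# proportional to how busy the site is. Night hours (00:00-06:59) count
# for a fifth of a daytime hour in this made-up table.
HOURLY_WEIGHT = [0.2] * 7 + [1.0] * 17


def traffic_weighted_age(submit_hour: int, hours_elapsed: int) -> float:
    """Sum the weights of each whole hour elapsed since submission."""
    return sum(
        HOURLY_WEIGHT[(submit_hour + h) % 24] for h in range(hours_elapsed)
    )


def rank(score: float, age: float, alpha: float = 1.8) -> float:
    """RANK = SCORE / AGE^ALPHA, with an assumed +1 offset on AGE."""
    return score / (age + 1) ** alpha


# A story submitted at midnight, looked at 8 hours later.
weighted = traffic_weighted_age(submit_hour=0, hours_elapsed=8)
print(rank(10, weighted), rank(10, 8))
```

With this profile the overnight story has aged 2.4 "traffic hours" rather than 8 wall-clock hours, so it keeps a higher rank until readers arrive.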
barrkel, over 14 years ago
A problem (IMHO) with the HN ranking algorithm is that once a post fails to get traction (perhaps because things were busy at the time it was submitted), it won't really be able to get traction later, even if it's re-discovered 6 hours or 2 days later. Seems to me like velocity ought to be taken into account a little more for items that have otherwise languished.
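A minimal sketch of one way to let velocity count, combining this complaint with the suggestion elsewhere in the thread to timestamp each vote: every vote decays by its own age instead of the story's age. This is not how HN works; the function name, the `+ 2` hour offset and the reuse of gravity 1.8 are all assumptions.

```python
GRAVITY = 1.8


def per_vote_score(vote_times_h, now_h, gravity=GRAVITY):
    """Sum of votes, each discounted by its own age in hours.

    A late flurry of votes counts almost at full weight even when the
    story itself is old.
    """
    return sum(
        1.0 / (now_h - t + 2) ** gravity
        for t in vote_times_h
        if t <= now_h
    )


# The A/B example from the thread, with hours measured from midnight.
# A: 3 votes trickle in overnight, then 9 votes between 8am and 9am.
votes_a = [2.0, 4.0, 6.0] + [8 + i / 9 for i in range(9)]
# B: submitted at 8am, 6 votes between 8am and 9am.
votes_b = [8 + i / 6 for i in range(6)]

score_a = per_vote_score(votes_a, now_h=9.0)
score_b = per_vote_score(votes_b, now_h=9.0)
print(score_a > score_b)
```

Under this scheme A outranks B at 9am, because its nine recent votes outweigh B's six regardless of when A was submitted. The cost is storing a timestamp per vote and recomputing the sum as time passes.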
yesbabyyes, over 14 years ago
Here's an explanation of other ranking algorithms, including Bayesian average, Wilson score and the ones used on HN, Reddit, StumbleUpon and Del.icio.us:

http://blog.linkibol.com/2010/05/07/how-to-build-a-popularity-algorithm-you-can-be-proud-of/
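As an illustration of one of the algorithms named above, here is a sketch of the lower bound of the Wilson score interval, which ranks items by a conservative estimate of their true upvote ratio. It applies to sites with both upvotes and downvotes, so it is shown only for comparison with the time-decay formula discussed in this thread. The function name and the default `z` (1.96, roughly 95% confidence) are choices made for this example, not taken from the linked article.

```python
import math


def wilson_lower_bound(upvotes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for an upvote ratio."""
    if total == 0:
        return 0.0
    p = upvotes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# One lone upvote is weaker evidence than 90 upvotes out of 100 votes.
print(wilson_lower_bound(1, 1) < wilson_lower_bound(90, 100))
```

The point of the bound is that a small sample cannot score highly on ratio alone: a single upvote has a 100% ratio but a low lower bound.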
jacquesm, over 14 years ago
This is not the 'hacker news' ranking algorithm, this is the ranking algorithm distributed with 'ARC', which is the basis for the HN algorithm, but definitely not equal to it.

The biggest missing ingredients are flagged posts dropping off quicker and posts that contain no URL dropping off quicker but there are quite a few other subtle tweaks.

The (very good) reason why the ARC sources do not give out the real ranking algorithm is to make it a bit harder to game the system.
bergie, over 14 years ago
I built a reasonably similar ranking system a few years ago, but also taking social media interaction (blog links, delicious bookmarks, etc) with the content items being ranked into account: http://bergie.iki.fi/blog/calculating_news_item_relevance/

You can see it in action on maemo.org: http://maemo.org/news/

PHP sources: http://trac.midgard-project.org/browser/branches/ragnaroek/midcom/org.maemo.socialnews
qeorge, over 14 years ago
I made an HN filter for myself that's basically:

points / comments

It works shockingly well. It's here if anyone would like to check it out: http://www.upthread.com/
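A minimal sketch of that filter, assuming each story is a dict with `points` and `comments` keys (hypothetical field names). The comment does not say how stories with zero comments are handled; treating zero as one is an assumption made here to avoid dividing by zero.

```python
def filter_score(points: int, comments: int) -> float:
    """points / comments, treating zero comments as one (an assumption)."""
    return points / max(comments, 1)


# Made-up sample data for illustration.
stories = [
    {"title": "flamewar", "points": 100, "comments": 200},
    {"title": "solid article", "points": 50, "comments": 10},
    {"title": "quiet gem", "points": 8, "comments": 0},
]

ranked = sorted(
    stories,
    key=lambda s: filter_score(s["points"], s["comments"]),
    reverse=True,
)
print([s["title"] for s in ranked])
```

The ratio pushes down threads that attract far more argument than votes, which may be why it works as a filter.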
DeusExMachina, over 14 years ago
Does anybody care to explain the strange indentation I see in the Arc code? I know Lisp a little (mostly Clojure), but I don't get the indentation of the code at the bottom of the algorithm where it looks like it's branching in two parts. Is this peculiar to Arc? Or to other Lisps as well?
kens, over 14 years ago
For more details on the algorithm, see my article "Inside the news.yc ranking formula" from last year: http://www.arcfn.com/2009/06/how-does-newsyc-ranking-work.html
gsivil, over 14 years ago
Nice post. Do you know what the algorithm for ranking comments is? I think it would be interesting to write about that too.
johns, over 14 years ago
The home page algorithm seems to have changed recently, but the RSS feed hasn't and is much noisier. It would be nice to see the RSS feed updated to reflect the home page changes (or confirmation that I'm just perceiving a difference that doesn't actually exist).
callmeed, over 14 years ago
So, if you're implementing this in a framework like Django or Rails, can you get a result set in this order directly from a query? Or do you have to query then sort?
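One possible answer, sketched with Python's built-in `sqlite3` rather than Django or Rails so the example stays self-contained: the rank expression can go directly in `ORDER BY`. The table layout, column names and the `front_page` helper are hypothetical, and the expression is a simplified form of the formula quoted earlier in the thread (no 0.8 exponent, no penalty factors). `power` is registered from Python because not every SQLite build ships math functions.

```python
import math
import sqlite3

GRAVITY = 1.8

conn = sqlite3.connect(":memory:")
# Register power(x, y) so the query does not depend on SQLite build options.
conn.create_function("power", 2, math.pow)
conn.execute(
    "CREATE TABLE stories (id INTEGER PRIMARY KEY, points INTEGER, submitted_h REAL)"
)
# Made-up rows: (id, points, submission time in hours since midnight).
conn.executemany(
    "INSERT INTO stories VALUES (?, ?, ?)",
    [(1, 13, 0.0), (2, 7, 8.0), (3, 2, 8.9)],
)


def front_page(conn, now_h, limit=30):
    """Return story ids ordered by (points - 1) / (age_hours + 2)^gravity."""
    rows = conn.execute(
        """
        SELECT id FROM stories
        ORDER BY (points - 1) / power(:now - submitted_h + 2, :gravity) DESC
        LIMIT :limit
        """,
        {"now": now_h, "gravity": GRAVITY, "limit": limit},
    )
    return [row[0] for row in rows]


print(front_page(conn, now_h=9.0))
```

Because the rank depends on the current time, an index cannot serve it, so the database still evaluates the expression for every candidate row. Querying a recent window of stories and sorting in application code is a reasonable alternative.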
tamersalama, over 14 years ago
A basic question: If this was performed on page load, and in-memory, how would the first db fetch occur? Unless this is pushed 'somehow' to the database.
brianbreslin, over 14 years ago
So this means you'd be best served to submit something at or near peak hours?

What are the peak use hours on HN? Since everyone is a hacker, I'd assume evening hours on the US east and west coasts carry the heaviest load, not 9-5 ET?
b_emery, over 14 years ago
Looks like there is a built-in max lifetime of about 5-10 hrs.
d0m, over 14 years ago
I thought the "secret sauce" was hidden from Arc sources..?
10smom, over 14 years ago
Thanks for the info! Now I need to get my son to explain it to me. :)