Hacker News Ranking Algorithm

181 points by vignesh_warar about 2 years ago

22 comments

tbonza about 2 years ago

Using PageRank as suggested by OP would create weird incentives: frequent, high-volume commenting could rank a person higher than occasional comments further down the thread. If OP is interested in the most influential commenters, people who write frequently and have a posse, then PR would be a good way to do it. What would happen to a helpful comment from a throwaway account?

If there's more of a ranking algorithm, identifying who is most likely to have actually read the article could be neat.

didgetmaster about 2 years ago

I am often faced with a dilemma when trying to decide whether to vote on a particular post or comment. There are times when it is very informative, well written, and you can tell a lot of thought went into it, but I happen to disagree with the author's point of view. The same dilemma arises when I agree with the author's sentiment, but the comment itself is poorly written and not well argued.

There is only a single up/down vote. You can't upvote a thoughtful comment for its quality while also downvoting its conclusion. Sometimes I have strong opinions about something, but I also try to educate myself about the other side of the issue by reading thoughtful articles and comments from people with different points of view.

It would be nice if you could choose between two different 'modes' in the interface: one where the highest-quality comments are near the top, the other where the most popular viewpoints are prioritized.

Scubabear68 about 2 years ago

I think the article ranking is probably good enough.

Where I have more trouble is with comment ranking within an article. The interface makes it hard to understand conversational flow, and of course that flow changes over time according to whatever the algorithm does. This makes it very hard to catch up later on an article's comments and see what has changed since my last visit. Maybe there is a way to do this via the interface, but if so it's not obvious.

phailhaus about 2 years ago

> why don't we apply PageRank for every user based on the upvotes they receive for the comments they leave on any post

Absolutely not, think about the incentives this creates! Imagine yourself to be an adversary: how would you game this system? Easy: spam low-effort, highly agreeable comments across the site. The more the better; they're basically guaranteed to get some upvotes and boost your karma. Now, with your newfound power, you can shape the HN front page to your will.

Instead of the front page being a signal of quality, it would be correlated with the posters' karma. And because the front page is a limited resource, you basically get a crappier version of Twitter where relatively low-karma accounts have practically no chance to get on the front page even with high-quality posts, and high-karma accounts will be able to roll their front-page posts into even more karma. This is pretty much one of the biggest flaws of major social media websites today: they concentrate power in a few users with massive influence.

prepend about 2 years ago

> I can't find the official page on why Paul Graham created HN, but to me, HN is the place to find the most interesting links and have a healthy discussion about various topics. In fact, discussions are my favorite part, except for the useless comments.

Is there an official reason? I've wondered this as well and sort of assumed it was because hackers like having some community aspect and are novelty-seeking, so YC created this for the YC group thanks to the largesse of PG, and someone said "why not just make it public so applicants and future YCers can participate." Job postings were added on to generate some revenue so it wasn't a pure cost center.

This headcanon reinforces my belief that the best community sites are accessories to something else and that you can't make money from a community without diminishing its value (since the incentives for monetizing conversation are not the same as, and I'd guess 90% at odds with, those for good conversation, mainly because bad things draw attention).

toastal about 2 years ago

I'd like to see a negative multiplier for Substack and Medium posts because of the pop-up spam.

slushh about 2 years ago

HN is not the place for new ranking algorithms since its current one is more than good enough. However, the other social networks could benefit from better sorting. Would it be possible to create a YouTube client that only shows the interesting comments?

bitshiftfaced about 2 years ago

We may be getting to the point where we can directly predict the quality, interestingness, and originality of a comment before it has a chance to receive viewer feedback. This would be similar to how we can now rank historical chess players according to how chess bots evaluate their moves. A forum could then start comments off where it predicts their rank will end up, and then potentially let viewer feedback take over from there.

What's more: if you can't stop bots from dominating your signal, then directly predicting things like quality may be one of the only good options left.

bluelightning2k about 2 years ago

Two obvious points: why not machine learning, and a more explicit mix of exploration and exploitation?

Meaning: give more posts a chance, i.e. a certain number of views to determine the upvote rate.

Conventional wisdom would be to apply a spam filter, boost new users (e.g. first post), and detect blatant manipulation (or more sophisticated manipulation, using clustering).
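One way to read the exploration/exploitation idea is an epsilon-greedy pick. The sketch below is only an illustration under assumptions: the `Post` fields, `epsilon`, and `min_views` are hypothetical names and values, not anything HN is known to use.

```python
import random
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    views: int = 0
    upvotes: int = 0

    @property
    def upvote_rate(self) -> float:
        # Upvotes per view; zero until the post has been seen at least once.
        return self.upvotes / self.views if self.views else 0.0


def pick_post(posts, epsilon=0.1, min_views=50, rng=random):
    """Pick one post to show (hypothetical helper).

    With probability epsilon, explore: show a post that has not yet had
    min_views views. Otherwise exploit: show the best upvote rate so far.
    """
    under_explored = [p for p in posts if p.views < min_views]
    if under_explored and rng.random() < epsilon:
        return rng.choice(under_explored)
    return max(posts, key=lambda p: p.upvote_rate)


posts = [Post("a", 100, 10), Post("b", 100, 30), Post("c", 5, 0)]
shown = pick_post(posts)
```

Raising `epsilon` gives new posts more exposure at the cost of showing proven posts less often.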

kens about 2 years ago

The article mentions a couple of formulas for the ranking algorithm. I'd like to point out my data-driven reverse-engineering of the HN ranking algorithm from 2013. The basic ranking formula uses votes and age with various exponents. But in my analysis, I found that "penalties" had a large effect on the ranking. Some penalties are applied automatically based on title words or the domain. Other penalties would be applied later. Too many comments would trigger the "controversy" penalty, which would cause a sudden and drastic ranking drop.

Link: https://www.righto.com/2013/11/how-hacker-news-ranking-really-works.html
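A minimal sketch of a score of that shape: votes raised to one exponent, divided by age raised to another, times a penalty factor. The function name, the default exponents (0.8 and 1.8), and the two-hour offset are assumptions for illustration; the comment above does not state them.

```python
def rank_score(votes: int, age_hours: float, penalty: float = 1.0,
               vote_exp: float = 0.8, gravity: float = 1.8) -> float:
    """Hypothetical ranking score: votes^vote_exp / (age + 2)^gravity * penalty.

    penalty is a multiplier in (0, 1]; 1.0 means no penalty applied.
    All default constants are illustrative assumptions.
    """
    base = max(votes - 1, 0) ** vote_exp  # discount the submitter's own vote
    return base / (age_hours + 2) ** gravity * penalty


fresh = rank_score(votes=101, age_hours=1)
stale = rank_score(votes=101, age_hours=24)
penalized = rank_score(votes=101, age_hours=1, penalty=0.2)
```

With these defaults, a penalty multiplier well below 1 can push a fresh, heavily upvoted story below much older ones, which would look like the sudden drop described above.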

cinntaile about 2 years ago

The actual algorithm seems to take into account post rate in some way. I think the assumption is that a high early post rate implies a controversial topic. If the ratio is too high, the story seems to get penalized pretty hard. Maybe downvotes and flagging of posts are factors too?

manx about 2 years ago

Great to see more people thinking about this stuff! Our group, "Social Protocols", recently published a concrete idea on how to improve the ranking algorithm by introducing a new metric: https://github.com/social-protocols/news/blob/master/README.md

The metric in action: https://news.social-protocols.org/

HN Discussion: https://news.ycombinator.com/item?id=35183317

sideproject about 2 years ago

I recently launched a tool called HN+: https://hn.plus

It's a tool where you can create your own Hacker News clone. Obviously, while working on this, I had to emulate the HN ranking algorithm. But what I found is that the algorithm doesn't actually seem to work well for "young" forums.

I had to tweak it so that, on the front page, relatively recent posts rise to the top over older posts with more votes. I know recency and votes are already considered in the original algorithm, but I had to adjust how they are weighted.

Which made me think that the algorithm should take into account the age of the community and how active it is.

EFruit about 2 years ago

Perhaps I'm in the minority here, but I would like to see the entirety of HN's codebase posted as-is, no redactions, cleanup, etc.

The entire site is officially moderated by ~2 people, and autonomously by the community. Whatever secret sauce they're using, that should be proof positive that it works. Keeping it a secret deprives countless communities of best-in-class tools and knowledge.

Would it be detrimental to HN? In the short term, possibly (gaming the voting ring detector comes to mind), but it's hard to say what kind of impact there would be without knowing what goes on under the surface.

AraceliHarker about 2 years ago

I always read Hacker News through Feedly because the text on the web page is too small and hard to read, so I have hardly had a chance to see the ranking.

nebulous1 about 2 years ago

I don't like this idea. If Hacker News comments were so numerous that the best comments by the "best" commenters always got lost in the maelstrom, then there could be some sense to it. However, this has not been my experience of Hacker News, and this shift towards a reputation-based system seems unwarranted.

walterbell about 2 years ago

https://hnrankings.info graphs the history of every HN story, e.g. https://hnrankings.info/35510413/

karmakaze about 2 years ago

Downvotes on posts are another way. They can require a minimum karma level, and the value of each downvote can decrease with use to prevent abuse by individuals.
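A rough sketch of such a rule, assuming a karma threshold and a simple hyperbolic decay. The function name and the constants (`min_karma`, `decay`) are hypothetical and only illustrate the shape of the idea.

```python
def downvote_weight(karma: int, downvotes_used: int,
                    min_karma: int = 500, decay: float = 0.1) -> float:
    """Weight of a user's next downvote on a post (hypothetical rule).

    Users below min_karma cannot downvote at all. For everyone else the
    weight shrinks as they spend more downvotes, so one account cannot
    bury many posts at full strength.
    """
    if karma < min_karma:
        return 0.0
    return 1.0 / (1.0 + decay * downvotes_used)


first = downvote_weight(karma=1000, downvotes_used=0)
later = downvote_weight(karma=1000, downvotes_used=10)
```

Whether `downvotes_used` should reset over time (per day, say) is a separate design choice the comment leaves open.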

pluralistic about 2 years ago

Last month, "update to kagi search pricing" ranked 275 with the metadata "116 points by exist 8 hours ago" and 132 comments.

Its cohort, ranked 270 to 282 (per my screenshot), were all more than a day old with ten or more comments.

It's weird that a post which is new, popular and engaging sank to position 275 so fast, within hours.

iamsanteri about 2 years ago

Is HN also built with RoR or is it something custom?

lonk11 about 2 years ago

1. The article suggests running PageRank on a graph with users as nodes and "has user A upvoted a comment of user B" as edges:

""" Since it's likely that one user may upvote multiple comments from the same user, we check whether a user has already upvoted a comment from that specific user before considering their upvote. In other words, we treat user profiles as nodes and upvotes for comments as edges. """

This is a very lossy conversion of the actual data:

a. It does not distinguish whether users C and D upvoted the same comment of user B or not. Maybe one comment was good and the other was bad. But when you convert it to the above graph you only learn that C and D upvoted some comment of user B.

b. It does not account for the number of comments user B left: 1000 or 5? This incentivizes spam because there is no upside to not posting.

c. It ignores users that do not comment but upvote valuable comments themselves.

Then the PageRank is used to weigh upvotes of users. This is backwards. The PageRank values should capture the value of each user's past upvotes in order to use it as a prediction of how valuable their future upvotes will be. But the suggested algorithm uses the value of the user's past comments as a weight for their future upvotes.

To fix this I think the graph needs to be changed to a bipartite graph with users and comments as nodes and upvotes and flags as directed edges (when the author posts a comment, this should be represented as an implicit upvote of the comment). Then you can calculate how valuable each user's upvotes (and flags) are.

2. The "gameability" of PageRank stems from the fact that the random walk algorithm treats each user equally as a starting point. It means that you can create a ton of fake users and upvote the comments of a target user whose upvote power you want to artificially boost. Each time the random walk starts at one of those fake users, the walk will end up at the target user, increasing their PageRank score.

My proposal to solve the "gameability" problem is to start each walk from you, the user viewing Hacker News. It means that your past upvotes become the starting step of the random walk, and so the resulting PageRank will be personalized for you. Instead of a single PageRank reputation score (which captures how useful user A's contributions to HN have been to all users), there is a set of personalized scores that capture how useful other users have been to you.

I'm building https://linklonk.com which uses this kind of algorithm to rank both links and comments. The details of the ranking algorithm are here: https://linklonk.com/item/3292763817660940288
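The personalization idea in point 2 can be sketched with a small power-iteration PageRank whose random walk always restarts at the viewer. This is a simplified illustration, not LinkLonk's actual algorithm: it uses plain "upvoter -> author" edges rather than the bipartite user/comment graph proposed in point 1, and the function name, `damping`, and `iterations` are assumptions.

```python
def personalized_pagerank(edges, source, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration (illustrative sketch).

    edges:  iterable of (upvoter, author) pairs.
    source: the viewing user; every restart of the walk returns here.
    """
    nodes = {source}
    out = {}
    for upvoter, author in edges:
        nodes.update((upvoter, author))
        out.setdefault(upvoter, []).append(author)

    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0  # all probability mass starts at the viewer
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in rank.items():
            targets = out.get(node)
            if targets:
                share = damping * mass / len(targets)
                for t in targets:
                    nxt[t] += share
                nxt[source] += (1 - damping) * mass  # restart at the viewer
            else:
                nxt[source] += mass  # dead end: walk restarts at the viewer
        rank = nxt
    return rank


edges = [
    ("me", "alice"), ("alice", "bob"),
    # Fake accounts upvoting a target they want to boost:
    ("fake1", "target"), ("fake2", "target"), ("fake3", "target"),
]
scores = personalized_pagerank(edges, source="me")
```

Because no walk ever starts at the fake accounts, `target` receives no score from the viewer's perspective unless someone the viewer already trusts upvotes it.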

jonathanstrange about 2 years ago

I don't understand this blog post. The author says that HN is great for discussions. Then they go on to suggest a different ranking algorithm? Why? To answer the author's question: I would change absolutely nothing about HN. It's great for discussions as is.

Edit: I suppose there is some insinuation that HN doesn't really use the original algorithm, and this is the author's attempt to reconstruct what's going on. Is that what I was missing?
