Show HN: HN Domain Leaderboard

266 points by refrigerator about 7 years ago

17 comments

minimaxir about 7 years ago
So that others can play with the data, here's a reverse-engineered version of the BigQuery query OP used to create the leaderboard:

    #standardSQL
    -- The 'www.' pattern is anchored and escaped here so that only a leading "www." is stripped.
    SELECT
      domain,
      COUNT(*) AS num_posts,
      perc_75,
      AVG(score) AS avg_score,
      (AVG(score) + 2*perc_75) * LOG(COUNT(*)) AS calc_score
    FROM (
      SELECT
        REGEXP_REPLACE(NET.HOST(url), r'^www\.', '') AS domain,
        score,
        PERCENTILE_CONT(score, 0.75) OVER (PARTITION BY REGEXP_REPLACE(NET.HOST(url), r'^www\.', '')) AS perc_75
      FROM `bigquery-public-data.hacker_news.full`
      WHERE type = 'story' AND url IS NOT NULL
    )
    GROUP BY domain, perc_75
    ORDER BY calc_score DESC

Top 10000 results: https://docs.google.com/spreadsheets/d/1Z9atmizTAPkgFiBte2eQiQgxEAiMyMB7Q99fzMzfIJs/edit?usp=sharing

(It's apparently not a perfect match, since there appears to be a minimum number of posts required for a domain [e.g. without that requirement, https://news.ycombinator.com/from?site=pardonsnowden.org is #3]. That requirement should be added to the description of the leaderboard.)
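For readers who want to check the scoring formula from the query above without BigQuery, here is a minimal Python sketch. It assumes that PERCENTILE_CONT interpolates linearly between neighbouring ranks and that LOG with one argument is the natural logarithm; the function names and the sample scores are made up for illustration.

```python
import math


def percentile_cont(values, fraction):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * fraction
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def calc_score(scores):
    """(average score + 2 * 75th percentile) * ln(number of posts)."""
    avg_score = sum(scores) / len(scores)
    perc_75 = percentile_cont(scores, 0.75)
    return (avg_score + 2 * perc_75) * math.log(len(scores))


if __name__ == "__main__":
    # Hypothetical scores for one domain, not taken from the real dataset.
    print(calc_score([1, 2, 3, 4, 100]))
```

Under this formula a domain with exactly one post always gets a score of 0, because ln(1) is 0, no matter how well that single post did.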
fooba about 7 years ago
Very cool, thanks for sharing! I did a somewhat similar analysis a while back [1], and I found that many of the top domains either had a YC affiliation or corresponded to extremely well-known companies or organizations. This made me interested in finding lesser-known blogs that also produce high-quality content. I tried to identify these by putting a limit on the number of unique users who had submitted content from each domain. My thinking here was that something like the GitHub blog would have submissions from many users, while smaller personal blogs would probably be mostly self-promoted. Using this approach, I was able to turn up some pretty interesting blogs that I had never heard of before.

I think it could really increase the usefulness of HN Domain Leaderboard if you added some additional filtering capabilities. Filtering based on the category would probably be pretty easy because you have that information there already, but perhaps also consider some measure of how broadly promoted each domain is. The time range option is already pretty cool, and I'll bet that a few more options would make it even more fun to play around with.

[1] - https://intoli.com/blog/pareto-optimal-blogs/
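The unique-submitter filter described above could look roughly like the following Python sketch. It is not the code from the linked analysis; the function name, the threshold, and the example domains and usernames are all hypothetical.

```python
from collections import defaultdict


def domains_with_few_submitters(stories, max_submitters):
    """Return domains whose stories were submitted by at most max_submitters distinct users.

    stories is an iterable of (domain, username) pairs, one per submission.
    """
    submitters = defaultdict(set)
    for domain, user in stories:
        submitters[domain].add(user)
    return sorted(
        domain for domain, users in submitters.items()
        if len(users) <= max_submitters
    )


if __name__ == "__main__":
    # Made-up submissions: one widely shared domain, one mostly self-submitted blog.
    sample = [
        ("bigco.example", "alice"),
        ("bigco.example", "bob"),
        ("bigco.example", "carol"),
        ("smallblog.example", "dave"),
        ("smallblog.example", "dave"),
    ]
    print(domains_with_few_submitters(sample, max_submitters=2))
```

The threshold is a judgment call: a low value favours self-submitted personal blogs, while a higher one lets widely shared domains back in.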
aphextron about 7 years ago
I'd really like to see the opposite of this: domains that have been flagged multiple times and have a high submissions-to-upvotes ratio so that I can filter them out.
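A submissions-to-upvotes ratio of the kind requested above could be sketched as follows. This covers only the ratio, not flags, and it treats a story's score as a stand-in for upvotes, which is an assumption; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def submissions_per_upvote(stories):
    """Map each domain to (number of submissions) / (total score).

    stories is an iterable of (domain, score) pairs, one per submission.
    A high value means many submissions that earned few points.
    """
    counts = defaultdict(int)
    points = defaultdict(int)
    for domain, score in stories:
        counts[domain] += 1
        points[domain] += score
    # max(..., 1) avoids dividing by zero for a domain whose total score is 0.
    return {domain: counts[domain] / max(points[domain], 1) for domain in counts}


if __name__ == "__main__":
    # Made-up data for illustration only.
    sample = [
        ("spam.example", 1),
        ("spam.example", 1),
        ("spam.example", 2),
        ("good.example", 200),
    ]
    print(submissions_per_upvote(sample))
```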
ghayes about 7 years ago
It would be great if you could add top posts from each of these domains. I am really interested to see the top content I may have missed from a few of these domains.
aaronhoffman about 7 years ago
This is a little out of date but may be of interest here. This is a visualization of the top 10,000 HN posts: https://www.sizzleanalytics.com/Boards/sizzle/Hacker-News-Top-Posts-All-Time/dfb2af8e-67fa-47a7-892c-435de6321378
mzzter about 7 years ago
I would have thought bravenewgeek.com would make it onto the leaderboard since his posts [1] are typically high quality.

[1] https://news.ycombinator.com/from?site=bravenewgeek.com
sli about 7 years ago
Kind of amazing that the Rust blog, something relatively new, is the top domain of all time.
glaberficken about 7 years ago
Ah! I was searching around for exactly this just a week ago and gave up. Could you add more granular date filters (past month, past week, etc.)? Thanks for doing it!
bhhaskin about 7 years ago
Interesting that there are no news-related domains in the list. I wonder if that is due to the number of posts those domains have that never gain any traction.
raymondgh about 7 years ago
How did you determine the domain categories?
alexchamberlain about 7 years ago
Is the mean a valid statistic for this dataset?

I suspect that the score a link gets is highly variable and doesn't follow a known distribution, so taking a straight mean may not be a valid thing to do, or at the very least the result may be very skewed.

That being said, cool idea, well executed.
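The concern about the mean can be illustrated with a small Python sketch that compares it with the median on a skewed list of scores. The numbers are invented for illustration and are not drawn from the HN dataset; the function name is hypothetical.

```python
from statistics import mean, median


def summarize(scores):
    """Return the mean and median of a list of scores."""
    return {"mean": mean(scores), "median": median(scores)}


if __name__ == "__main__":
    # Seven low-scoring posts and one runaway hit (made-up values).
    skewed_scores = [1, 2, 2, 3, 3, 4, 5, 900]
    print(summarize(skewed_scores))
```

With these values a single outlier pulls the mean far above what a typical post scored, while the median stays near the bulk of the data.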
downandout about 7 years ago
Interesting that so many of the top sites are "individual". I always thought that self-promotion was shunned on places like HN, but I guess if you do it in the "right" way, it can be a successful tactic.
ninjakeyboard about 7 years ago
Aphyr needs more upvotes :)

EDIT: never mind - on the three-year view he's in the top 10
matte_black about 7 years ago
I thought this would be a leaderboard of which users get the most votes for comments on different topics.
dsacco about 7 years ago
Karpathy got into the top 20 most upvoted domain submissions? I don't even remember that many.
ecesena about 7 years ago
nit: blog.pinboard.in is classified as individual
skullum about 7 years ago
hnleaderboard insecure connection rip