Show HN: I Scraped Hacker News for TLD Popularity

68 points by klausbreyer, about 2 years ago

15 comments

zX41ZdbW, about 2 years ago
This will save you time:

https://play.clickhouse.com/play?user=play#U0VMRUNUIHRvcExldmVsRG9tYWluKHVybCkgQVMgZG9tYWluLCBjb3VudCgpIEFTIGMsIHN1bShzY29yZSksIHVuaXEoYnkpLCByb3VuZChhdmcoc2NvcmUpLCAzKSBGUk9NIGhhY2tlcm5ld3MgV0hFUkUgdHlwZSA9ICdzdG9yeScgQU5EIGRvbWFpbiAhPSAnJyBHUk9VUCBCWSAxIE9SREVSIEJZIDIgREVTQw==

Takeaways:

- .org is 50% better than .com;
- .edu and .gov are really nice;
- .io is cool, and much better than .uk.

PS. I don't remember any rate limits on the API. Here is how I downloaded the data: https://github.com/ClickHouse/ClickHouse/issues/29693
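
The part of the ClickHouse Playground link after `#` appears to be the SQL query itself, Base64-encoded. A minimal sketch for reading it without opening the link, using only the Python standard library (the variable names are illustrative, not from the comment):

```python
import base64

# Fragment copied verbatim from the playground URL (everything after '#').
fragment = (
    "U0VMRUNUIHRvcExldmVsRG9tYWluKHVybCkgQVMgZG9tYWluLCBjb3VudCgpIEFT"
    "IGMsIHN1bShzY29yZSksIHVuaXEoYnkpLCByb3VuZChhdmcoc2NvcmUpLCAzKSBG"
    "Uk9NIGhhY2tlcm5ld3MgV0hFUkUgdHlwZSA9ICdzdG9yeScgQU5EIGRvbWFpbiAh"
    "PSAnJyBHUk9VUCBCWSAxIE9SREVSIEJZIDIgREVTQw=="
)

# Decode the Base64 text back into the SQL string.
query = base64.b64decode(fragment).decode("utf-8")
print(query)
```

The decoded text should be a single `SELECT` over a `hackernews` table that groups stories by `topLevelDomain(url)` and orders them by submission count.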

gryn, about 2 years ago
You don't need to scrape HN; there's a public dataset in Google BigQuery. I don't know if it's still updated regularly.

Edit: here's the link: https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news

They seem to have stopped updating it around the later part of 2022; I don't know why.

skizm, about 2 years ago
Dumb question: I thought the ".io" TLD was owned by a really sketchy group/organization, or that ownership was being contested (or similar?), and that having a domain there was semi-risky. Is that still the case? I've avoided the .io TLD because of this vague notion, but never really knew the specifics.

ryan29, about 2 years ago
This is great! It's something I've wondered about for a while. I was surprised to see the decline of .com is fairly linear. Before I looked I was expecting the use of alternate TLDs to be accelerating a bit.

Are the absolute values the running totals? If so, why do they decline from 2021 to 2022?

I think a graph for unique counts would be cool to see too. For example, the ClickHouse query posted earlier in this thread shows:

    domain  count   unique
    -------------------------
    .org    349414  58226
    .net    114499  31129

So the submissions using .org are 3x .net, but the unique domains seen using .org are less than 2x .net. I'm not sure if there's any significance there, but it would be interesting to see the difference.

In the same context, I think it would be interesting to see the top 50 domains on each TLD.

Anyway, it's very cool info to see. Thanks for sharing it!
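
The two ratios mentioned above can be checked directly from the quoted figures. A small sketch, taking the numbers exactly as they appear in the table (the comment does not say which column of the query output "unique" corresponds to, so the label is kept as-is):

```python
# (count, unique) per TLD, copied from the table in the comment.
rows = {".org": (349414, 58226), ".net": (114499, 31129)}

# Ratio of .org to .net for each column.
count_ratio = rows[".org"][0] / rows[".net"][0]
unique_ratio = rows[".org"][1] / rows[".net"][1]

print(f"count  .org/.net: {count_ratio:.2f}")   # 3.05
print(f"unique .org/.net: {unique_ratio:.2f}")  # 1.87
```

So "3x" and "less than 2x" both hold for these figures.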

ktpsns, about 2 years ago
The interpretation you never asked for: the data exhibit strong preferential attachment [1] behaviour, i.e. you can draw a straight line in a log-log plot (even though only a semilog plot is shown). This is typical of real-world data.

[1] https://en.wikipedia.org/wiki/Preferential_attachment
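
A straight line on a log-log plot corresponds to a power law, y = c * x^k, where the slope of the line is the exponent k. A minimal sketch of how one might estimate that slope with a least-squares fit in log space. The data here is synthetic and generated inside the snippet; it is not the TLD data from the post, and `loglog_slope` is a hypothetical helper name:

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic rank/frequency pairs following y = 1000 * rank**-1.5 (assumed, not measured).
ranks = list(range(1, 21))
freqs = [1000 * r ** -1.5 for r in ranks]

slope = loglog_slope(ranks, freqs)
print(round(slope, 3))  # recovers the exponent used to generate the data
```

On real counts the fit would be noisy, and a roughly constant slope over several orders of magnitude is what would support the power-law reading.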

jamiemau, about 2 years ago
Speaking of the .io TLD: looks like the territory is going through something (the last paragraphs of https://en.wikipedia.org/wiki/.io#History mention that the ".io domain could also be extinguished").

Should domain owners be worried?

thomasjb, about 2 years ago
Is there any information within .it for the Italian provincial second-level domains? I find the idea of these domains fascinating, although implementing them could be problematic, given that countries can lose or gain territory over time (suppose the Kingdom of Naples secedes from Italy?).

linux2647, about 2 years ago
> Wrong links: http://blog.plover.com./prog/lib.html

That’s technically not incorrect. Host names, as far as DNS is concerned, always have a trailing “.”, and my browser resolves the URL just fine.
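
For TLD counting, a trailing-dot host like this can be normalized rather than discarded. A minimal sketch using Python's `urllib.parse`; `tld_of` is a hypothetical helper, not the approach used in the original post, and it naively takes the last label (so it does not handle IP addresses or multi-part suffixes such as `.co.uk`):

```python
from urllib.parse import urlsplit

def tld_of(url):
    """Return the last DNS label of the URL's host, ignoring a trailing root dot."""
    host = urlsplit(url).hostname or ""
    host = host.rstrip(".")          # "blog.plover.com." -> "blog.plover.com"
    if "." not in host:
        return ""                    # no dot left, so no TLD to report
    return host.rsplit(".", 1)[-1]

print(tld_of("http://blog.plover.com./prog/lib.html"))  # com
print(tld_of("https://example.org/"))                   # org
```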

c7b, about 2 years ago
One remark: .io looks like it's #3 in the time series, but it seems to be missing from the bar plot.

Beefin, about 2 years ago
Here's what I've been using to automate domain name discovery:

https://github.com/esteininger/domain-name-checker/blob/main/main.py

RomanPushkin, about 2 years ago
> I had a database full of HN Stories since the very beginning, which accumulated to ~1GB.

Just curious, is there any way to download that?

mthoms, about 2 years ago
This is interesting, thanks.

If someone wanted to dig deeper, does anyone know if Google makes the .dev zone file public?

jossclimb, about 2 years ago
I can't figure out what they did. Web-scraped Hacker News for domain ideas?

thrdbndndn, about 2 years ago
Why does the stack chart randomly change colors just from filtering/ordering?

testernews, about 2 years ago
domain.com./whatever is valid tho?