How Akka Streams can be used to process the Wikidata dump in parallel

108 points by ArturSoler, almost 10 years ago

5 comments

mtrn, almost 10 years ago
On a related note: when I indexed the whole English Wikipedia last year, I was surprised that it was possible to have a JSON version of it indexed [1] and searchable within half an hour on my laptop.

[1] Using a parallel bulk indexer for ES: https://github.com/miku/esbulk
frik, almost 10 years ago
> Process the whole Wikidata in 7 minutes with your laptop

Wikidata is several orders of magnitude smaller than Freebase (closed by Google in May) and it won't fit in your RAM (laptop).
cristianpascu, almost 10 years ago
From their video: The presenter: "Why would you (the assistant lady) be interested in cars?" The assistant: "I'm the perfect chick to be into Maserati."

It's a bit disturbing to see an employee presenting her personal life, kids, interests, and whatnot. Good job, IntentHQ!

The video: https://www.intenthq.com/resources/interest-fingerprint/
jimbokun, almost 10 years ago
Could this example have been accomplished with awk and xargs just as fast, with the same or less memory usage, in fewer lines of code?

Seems so to me after skimming the article, but maybe I missed an important advantage of using Akka Streams for this task?
MrDosu, almost 10 years ago
Are streaming JSON parsers that rare?