Aleph: A suite of data analysis tools for investigators

240 points by salzig, about 5 years ago

10 comments

capableweb, about 5 years ago
The GitHub readme/repository doesn't give a fair overview of what this project really covers. It seems really ambitious and well made, at least from a quick glance. This page gives a better overview: https://docs.alephdata.org/how-aleph-is-used

Some problems they aim to solve:

> Easy data search for both structured and unstructured information (ie. documents and databases).

> Cross-referencing between different datasets ("Who are all the politicians in my country that are mentioned in this leak?")

> Access control and data compartmentalisation, but also flexible sharing within cross-border teams.

> Continuous crawling of hundreds of public data sources as background material for research.

> Visual exploration of investigative analysis.
divbzero, about 5 years ago
I trialed Aleph recently and was impressed by its progress against an ambitious goal. My impressions as a user were as follows:

1. Aleph is excellent out-of-the-box for its
   - OCR, via Tesseract or Google's Vision API
   - Full text search, via Elasticsearch
   - Browser based UI, via React

2. Aleph does an okay job but has room for improvement with
   - Entity extraction
   - Language detection

   where "okay" means it's accurate enough to be useful for filtering by names, emails, languages, etc., but you'll probably encounter occasional errors.

I also noticed search latency in my deployment and would love to try the Elasticsearch tips from the HN thread last week [1]. This latency does not appear in the production deployment by the Aleph team.

[1]: https://news.ycombinator.com/item?id=22396918

Again, props to the Aleph team for their success so far.
ssutch3, about 5 years ago
A LOL from the docs:

> Can I run Aleph without using Docker?

> Can Britain leave the European Union? Yes, it's possible; but complicated and will probably not make your life better in the way that you're expecting.
salzig, about 5 years ago
As a side note, I stumbled on this because German public television seems to be working on it too. I found that quite interesting to see, in addition to finding this project:

https://github.com/NorddeutscherRundfunk/aleph
adultSwim, about 5 years ago
https://www.icij.org/blog/2016/04/data-tech-team-icij/

ICIJ put together a great platform to investigate the Panama and Paradise Papers.
Jugurtha, about 5 years ago
Note on the name: in addition to the origin story, Aleph is also the first letter of the Arabic and Hebrew alphabets (א, ا).
DyslexicAtheist, about 5 years ago
I've been using this for some time to find info on companies, CEOs and other characters that appear in my news feeds.

Here is a working example: https://aleph.occrp.org/

It also has a great client API which allows you to index a large volume of PDFs all at once:

    $> alephclient crawldir --foreign-id <id> directory_with_pdf/
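
For anyone who wants to script the `crawldir` call above over several folders, here is a minimal Python sketch. It only assembles and optionally runs the exact command quoted in the comment. The helper names `build_crawldir_command` and `crawl_directories`, the `dry_run` switch, and the example ID and folder names are illustrative assumptions, not part of alephclient, and actually running it assumes alephclient is installed and already configured for your Aleph instance.

```python
import shlex
import subprocess
from typing import List


def build_crawldir_command(foreign_id: str, directory: str) -> List[str]:
    """Build the argv list for the `alephclient crawldir` call quoted above.

    Hypothetical helper: it mirrors the command from the comment and adds
    no flags beyond --foreign-id.
    """
    if not foreign_id:
        raise ValueError("foreign_id must not be empty")
    return ["alephclient", "crawldir", "--foreign-id", foreign_id, directory]


def crawl_directories(
    foreign_id: str, directories: List[str], dry_run: bool = True
) -> List[str]:
    """Render one crawldir command per directory, running it unless dry_run."""
    rendered = []
    for directory in directories:
        command = build_crawldir_command(foreign_id, directory)
        rendered.append(shlex.join(command))
        if not dry_run:
            # Assumes alephclient is on PATH and configured for your instance.
            subprocess.run(command, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in crawl_directories("my_collection", ["pdfs_2019/", "pdfs_2020/"]):
        print(line)
```

With `dry_run=True` (the default here) nothing is executed, so you can check the rendered commands before pointing them at a real instance.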
monkeydust, about 5 years ago
Looks interesting for personal or company wide search across multiple document types.
traverseda, about 5 years ago
Looks like a great alternative to open semantic search.
OliverJones, about 5 years ago
Dear HN colleagues: let's be careful about swamping those Aleph folks with traffic. They probably have enemies around the web that would exploit any overload and outage. Slashdotting can definitely turn into an unintended DDoS attack.

Better yet: maybe somebody with access to some kind of attack-resistant CDN provider could help them migrate.

If they haven't already.