I trialed Aleph recently and was impressed by its progress toward an ambitious goal. My impressions as a user were as follows:<p>1. Aleph is excellent out of the box for:<p>– OCR, via Tesseract or Google’s Vision API<p>– Full-text search, via Elasticsearch<p>– A browser-based UI, via React<p>2. Aleph does an okay job, but has room for improvement, with:<p>– Entity extraction<p>– Language detection<p>Here “okay” means it’s accurate enough to be useful for filtering by names, emails, languages, <i>etc.</i>, but you’ll probably encounter occasional errors.<p>I also noticed search latency in my own deployment and would love to try the Elasticsearch tips from last week’s HN thread [1]. The Aleph team’s production deployment doesn’t show this latency.<p>[1]: <a href="https://news.ycombinator.com/item?id=22396918" rel="nofollow">https://news.ycombinator.com/item?id=22396918</a><p>Again, props to the Aleph team for their success so far.