Show HN: Scalable reverse image search built on Kubernetes and Elasticsearch

139 points by alexkern about 9 years ago

6 comments

rhsimplex about 9 years ago
Hi everyone, I'm the author of the underlying image matching library ( https://github.com/ascribe/image-match ). First, thank you Alex for your contribution -- it makes image-match much more useful for the typical user.

Just to answer a couple of questions in the comments:

Goldberg's algorithm (the one used in the image-match library) is not very robust against arbitrary rotation -- around +/- 5 degrees should be ok. 90 degree rotations, mirror images, and color inversion are handled with the `all_orientations` parameter, but under the hood this is just manipulating the image array and searching multiple times.

Even though the hash is a vector, and the similarity is a vector distance, when it's time to search for an image, we don't compute every distance. The hash vector is binned into integer "words" and we look up against these columns. Only if there is a hit is the full signature computed. You can find more details in the Goldberg paper ( http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps ).

Our original use case was a web image crawler, and hopefully we can release that code someday too. In the meantime, if you decide to roll your own crawler, be sure to use Elasticsearch's bulk API ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ) for the crawlers so as not to burden the Elasticsearch cluster too much. We were able to get well over 10k inserts/s on a 5-node Elasticsearch cluster (I don't remember how many worker nodes... the whole thing is IO limited waiting for images to download for processing, so there is even more optimization to be had there).

Thanks again, Alex!
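For readers who want to see the "words" lookup in concrete terms, here is a minimal in-memory sketch of the idea described above. It is not image-match's actual code: the class `SignatureIndex`, the helpers, the word length, the number of words, the base-3 packing, the normalized L2 distance, and the cutoff value are all assumptions chosen for illustration. A signature is cut into a few short slices, each slice is packed into one integer, and the full distance is computed only for stored images that share at least one word with the query.

```python
import math
from collections import defaultdict


def pack_word(values):
    """Pack one slice of the signature into a single integer (base 3)."""
    word = 0
    for v in values:
        digit = max(-1, min(1, v)) + 1  # clip to {-1, 0, 1}, shift to {0, 1, 2}
        word = word * 3 + digit
    return word


def signature_to_words(signature, word_len=4, n_words=3):
    """Take n_words evenly spaced slices of length word_len and pack each one."""
    last_start = len(signature) - word_len
    if n_words == 1:
        starts = [0]
    else:
        starts = [round(i * last_start / (n_words - 1)) for i in range(n_words)]
    return [pack_word(signature[s:s + word_len]) for s in starts]


def normalized_distance(a, b):
    """L2 distance between two signatures, scaled by the sum of their norms."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / norm if norm else 0.0


class SignatureIndex:
    """Hypothetical stand-in for the search index, kept in plain dicts."""

    def __init__(self, word_len=4, n_words=3):
        self.word_len = word_len
        self.n_words = n_words
        self.signatures = {}
        self.postings = defaultdict(set)  # (word position, word value) -> image ids

    def add(self, image_id, signature):
        self.signatures[image_id] = signature
        words = signature_to_words(signature, self.word_len, self.n_words)
        for pos, word in enumerate(words):
            self.postings[(pos, word)].add(image_id)

    def search(self, signature, cutoff=0.45):
        # Step 1: cheap lookup, collect ids sharing at least one word with the query.
        candidates = set()
        words = signature_to_words(signature, self.word_len, self.n_words)
        for pos, word in enumerate(words):
            candidates |= self.postings.get((pos, word), set())
        # Step 2: compute the full distance only for those candidates.
        scored = ((normalized_distance(signature, self.signatures[i]), i)
                  for i in candidates)
        return sorted((d, i) for d, i in scored if d <= cutoff)
```

In a real deployment the postings would live in the search backend rather than a Python dict, but the two-step shape (word match first, full distance second) is the same.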
teraflop about 9 years ago
This is cool!

I'm curious about which perceptual hashing algorithm is being used. The README says it's "invariant to scaling and rotation" but the approach described in the linked paper is highly rotation-sensitive. (EDIT: it looks like the implementation can handle multiples of 90 degrees, which is a bit better than I thought at first)
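As a rough illustration of how multiples of 90 degrees can be covered by "manipulating the image array and searching multiple times", the sketch below enumerates rotated, mirrored, and inverted copies of an image and runs a search for each. This is an assumption about the general approach, not the library's implementation: the image is a plain list of pixel rows, and `orientation_variants`, `search_all_orientations`, and the `search_fn` callback are hypothetical names.

```python
from itertools import product


def rotate90(img):
    """Rotate a list-of-rows image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def invert(img, max_value=255):
    """Invert pixel intensities, assuming values in 0..max_value."""
    return [[max_value - px for px in row] for row in img]


def orientation_variants(img):
    """Yield (label, image) for each rotation, mirror, and inversion combination."""
    rotated = img
    for quarter_turns in range(4):
        for do_mirror, do_invert in product((False, True), repeat=2):
            variant = mirror(rotated) if do_mirror else rotated
            variant = invert(variant) if do_invert else variant
            yield (quarter_turns * 90, do_mirror, do_invert), variant
        rotated = rotate90(rotated)


def search_all_orientations(img, search_fn):
    """Run search_fn once per variant and tag each hit with the variant's label."""
    results = []
    for label, variant in orientation_variants(img):
        results.extend((label, hit) for hit in search_fn(variant))
    return results
```

The cost is one search per variant (16 in this sketch), which is why this handles only a fixed set of transformations and does nothing for arbitrary rotation angles.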
rjvir about 9 years ago
Would be useful if they put up a simple site with an image uploader that demos the search
aub3bhat about 9 years ago
Is it built for exact reverse image search or is it able to retrieve semantically similar objects?
impostervt about 9 years ago
How about a scalable web crawler/image indexer? ;)
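For anyone rolling their own indexer in the meantime, the bulk API advice from rhsimplex's comment can be sketched as follows. This only builds the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts; the index name `images`, the document fields, and the helper names `bulk_index_body` and `batched` are assumptions for illustration, and sending the request is left out.

```python
import json


def bulk_index_body(index_name, docs):
    """Build an NDJSON body for the _bulk endpoint from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Each document takes two lines: an action line, then the document itself.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def batched(items, size):
    """Split a list into chunks so each bulk request stays a manageable size."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each body would then be sent as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header, one request per batch instead of one request per image.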
justinsayarath about 9 years ago
wow super cool.