Show HN: Scalable reverse image search built on Kubernetes and Elasticsearch

139 points by alexkern about 9 years ago

6 comments

rhsimplex about 9 years ago
Hi everyone, I'm the author of the underlying image matching library ( https://github.com/ascribe/image-match ). First, thank you Alex for your contribution -- it makes image-match much more useful for the typical user.

Just to answer a couple of questions in the comments:

Goldberg's algorithm (the one used in the image-match library) is not very robust against arbitrary rotation -- around +/- 5 degrees should be ok. 90 degree rotations, mirror images, and color inversion are handled with the `all_orientations` parameter, but under the hood this is just manipulating the image array and searching multiple times.

Even though the hash is a vector, and the similarity is a vector distance, when it's time to search for an image, we don't compute every distance. The hash vector is binned into integer "words" and we look up against these columns. Only if there is a hit is the full signature computed. You can find more details in the Goldberg paper ( http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps ).

Our original use case was a web image crawler, and hopefully we can release that code someday too. In the meantime, if you decide to roll your own crawler, be sure to use Elasticsearch's bulk API ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ) for the crawlers so as not to burden the Elasticsearch cluster too much. We were able to get well over 10k inserts/s on a 5-node Elasticsearch cluster (I don't remember how many worker nodes... the whole thing is IO limited waiting for images to download for processing, so there is even more optimization to be had there).

Thanks again, Alex!
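For readers who want to see the word-lookup idea from the comment above in miniature, here is a rough in-memory sketch. It is not the image-match code: the names `to_words`, `WordIndex`, and `WORD_LEN`, the non-overlapping chunking, the base-5 encoding, and the cutoff value are all assumptions made for illustration. The signature is assumed to be a list of integers in -2..2, and the normalized distance is the one I believe the Goldberg paper uses.

```python
from collections import defaultdict
from math import sqrt

WORD_LEN = 4  # hypothetical: number of signature elements packed into one word


def to_words(sig, word_len=WORD_LEN):
    """Bin a signature (ints in -2..2) into integer words, one per chunk."""
    result = []
    for start in range(0, len(sig), word_len):
        value = 0
        for x in sig[start:start + word_len]:
            value = value * 5 + (x + 2)  # map -2..2 to a base-5 digit 0..4
        result.append(value)
    return result


def distance(a, b):
    """Normalized distance ||a - b|| / (||a|| + ||b||); 0.0 for two zero vectors."""
    diff = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norm = sqrt(sum(x * x for x in a)) + sqrt(sum(y * y for y in b))
    return diff / norm if norm else 0.0


class WordIndex:
    """Toy stand-in for the search backend: word columns plus stored signatures."""

    def __init__(self):
        self.signatures = {}
        self.postings = defaultdict(set)  # (position, word) -> image ids

    def add(self, image_id, sig):
        self.signatures[image_id] = sig
        for pos, word in enumerate(to_words(sig)):
            self.postings[(pos, word)].add(image_id)

    def search(self, sig, cutoff=0.45):  # cutoff value is an assumption
        # Step 1: cheap lookup, keep only images sharing a word at the same position.
        candidates = set()
        for pos, word in enumerate(to_words(sig)):
            candidates |= self.postings.get((pos, word), set())
        # Step 2: full vector distance, computed for the candidates only.
        hits = [(distance(sig, self.signatures[c]), c) for c in candidates]
        return sorted((d, c) for d, c in hits if d <= cutoff)
```

An image that shares no word with the query is never compared at all, which is what keeps the search from computing every distance.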
teraflop about 9 years ago
This is cool!

I'm curious about which perceptual hashing algorithm is being used. The README says it's "invariant to scaling and rotation" but the approach described in the linked paper is highly rotation-sensitive. (EDIT: it looks like the implementation can handle multiples of 90 degrees, which is a bit better than I thought at first)
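As a rough illustration of how multiples of 90 degrees can be covered by "manipulating the image array and searching multiple times", the sketch below enumerates rotated, mirrored, and color-inverted variants of an image and runs a search on each. It is an assumption-laden sketch, not the library's code: the helper names, the nested-list image representation, the 8-bit pixel range, and the exact set of 16 variants are all hypothetical.

```python
def rotate90(img):
    """Rotate a row-major 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def invert(img, max_value=255):
    """Invert pixel values, assuming an 8-bit grayscale image."""
    return [[max_value - p for p in row] for row in img]


def orientations(img):
    """Yield 16 variants: 4 rotations, each plain and mirrored,
    each with and without color inversion."""
    current = img
    for _ in range(4):
        for flipped in (current, mirror(current)):
            yield flipped
            yield invert(flipped)
        current = rotate90(current)


def search_all_orientations(img, search):
    """Run a caller-supplied search(img) on every variant and merge the hits."""
    results = []
    for variant in orientations(img):
        results.extend(search(variant))
    return results
```

Each variant costs one extra search, so the query gets slower in exchange for the added coverage, and arbitrary rotation angles are still not handled.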
rjvir about 9 years ago
Would be useful if they put up a simple site with an image uploader that demos the search
aub3bhat about 9 years ago
Is it built for exact reverse image search or is it able to retrieve semantically similar objects?
impostervt about 9 years ago
How about a scalable web crawler/image indexer? ;)
justinsayarath about 9 years ago
wow super cool.