Hi everyone, I'm the author of the underlying image matching library ( <a href="https://github.com/ascribe/image-match" rel="nofollow">https://github.com/ascribe/image-match</a> ). First, thank you Alex for your contribution -- it makes image-match much more useful for the typical user.<p>Just to answer a couple of questions in the comments:<p>Goldberg's algorithm (the one used in the image-match library) is not very robust against arbitrary rotation -- around +/- 5 degrees should be ok. 90 degree rotations, mirror images, and color inversion are handled with the `all_orientations` parameter, but under the hood this just manipulates the image array and searches multiple times.<p>Even though the hash is a vector, and the similarity is a vector distance, when it's time to search for an image, we don't compute every distance. The hash vector is binned into integer "words" and we look up against these columns. Only if there is a hit is the full signature distance computed. You can find more details in the Goldberg paper ( <a href="http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps" rel="nofollow">http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps</a> ).<p>Our original use case was a web image crawler, and hopefully we can release that code someday too. In the meantime, if you decide to roll your own crawler, be sure to use Elasticsearch's bulk API ( <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html" rel="nofollow">https://www.elastic.co/guide/en/elasticsearch/reference/curr...</a> ) for the crawlers so as not to burden the Elasticsearch cluster too much. We were able to get well over 10k inserts/s on a 5-node Elasticsearch cluster (I don't remember how many worker nodes...the whole thing is I/O limited waiting for images to download for processing, so there is even more optimization to be had there).<p>Thanks again, Alex!
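<p>To make the "words" idea concrete, here is a minimal sketch of binning a signature vector into small integers. This is an illustration of the technique, not image-match's actual implementation -- the word count, word length, and encoding here are made-up parameters:

```python
import numpy as np

# Hypothetical parameters for illustration; image-match's defaults differ.
N_WORDS = 8   # how many words to extract per signature
WORD_LEN = 6  # signature positions covered by each word

def words_from_signature(sig, n_words=N_WORDS, word_len=WORD_LEN):
    """Bin a ternary signature vector (values in {-1, 0, 1}) into integer 'words'.

    Each word encodes a contiguous slice of the signature as a base-3
    integer, so it can live in an indexed column and be matched with a
    cheap equality lookup instead of a full distance computation.
    """
    sig = np.asarray(sig)
    words = []
    for i in range(n_words):
        chunk = sig[i * word_len:(i + 1) * word_len]
        code = 0
        for v in chunk + 1:  # shift {-1, 0, 1} to {0, 1, 2}
            code = code * 3 + int(v)
        words.append(code)
    return words

sig = np.random.default_rng(0).integers(-1, 2, size=N_WORDS * WORD_LEN)
print(words_from_signature(sig))
```

At search time, any candidate sharing at least one word with the query is a "hit", and only those candidates get the full vector-distance comparison.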
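<p>For the bulk-insert advice, the gist is to batch many documents per request rather than issuing one index call per image. A rough sketch of preparing such a batch -- the index and field names are hypothetical, and in a real crawler you would hand these actions to the official Python client's `elasticsearch.helpers.bulk(client, actions)`:

```python
# Sketch: building actions for Elasticsearch's bulk API.
# Index name and document fields below are made up for illustration.

def bulk_actions(signatures, index="images"):
    """Yield one bulk-index action per image signature record."""
    for sig in signatures:
        yield {
            "_index": index,
            "_source": sig,  # e.g. {"path": ..., "signature": ..., "words": ...}
        }

# Sending hundreds of these per HTTP request is what keeps insert
# throughput high compared with one request per image.
actions = list(bulk_actions([{"path": "a.jpg"}, {"path": "b.jpg"}]))
print(actions)
```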