TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Locality-sensitive hashes are designed to cause collisions and are useful

37 点作者 gauravsc大约 13 年前

5 条评论

tylerneylon大约 13 年前
I created a useful LSH algorithm for real-valued vector data. Its strong suit is that it is deterministic. In other words, it is deterministically guaranteed to find all close-enough neighbors, and omit all far-enough neighbors. Most other LSH algorithms do not provide both of these guarantees deterministically -- instead they strive toward those goals probabilistically. It's also fast, conceptually simple, and built on top of a very cool non-obvious tessellation of arbitrarily high dimensions.<p>Here are some slides on it:<p><a href="https://files.pbworks.com/download/dJjN51z5uR/hackerdojo/27189819/lsh_hacker_dojo_talk.pdf" rel="nofollow">https://files.pbworks.com/download/dJjN51z5uR/hackerdojo/271...</a><p>This work was also published in SODA 10:<p><a href="https://www.siam.org/proceedings/soda/2010/SODA10_094_neylont.pdf" rel="nofollow">https://www.siam.org/proceedings/soda/2010/SODA10_094_neylon...</a>
marcusf大约 13 年前
I just spent an inordinate amount of time reading up on LSH for images. In the system we develop, people tend to upload images from their desktop time and again instead of searching in the system. Part of it is a UI challenge, we have to make search better. But in the case of them uploading the same image repeatedly it would be good to be able to do some kind of LSH and see if the image is already in our database (like TinEye). I started out with LSH via random projection [1] and got further down the rabbit hole from there.<p>What stumped me was creating a good enough feature vector, balancing size with information. We have a hack day tomorrow, might pick it up again.<p>[1] <a href="https://engineering.purdue.edu/~malcolm/yahoo/Slaney2008(LSHTutorialDraft).pdf" rel="nofollow">https://engineering.purdue.edu/~malcolm/yahoo/Slaney2008(LSH...</a>
gms大约 13 年前
For anyone looking to use this, please note that the benefits of LSH rapidly diminish if your nearest neighbours are in fact far away.
评论 #3969637 未加载
评论 #3969688 未加载
lclarkmichalek大约 13 年前
Kind of related, just recently I wrote a blog post on how to do fuzzy location aware matchmaking using redis and geohash. It exploits similar properties, namely that as you reduce the precision of the geohash, you get more collisions. You can find it here: <a href="http://www.generictestdomain.net/Redis/2012/05/07/location-aware-matchmaking-with-redis/" rel="nofollow">http://www.generictestdomain.net/Redis/2012/05/07/location-a...</a>
ndl大约 13 年前
I recently had some ideas about how to use the concept of locality-sensitive hashes with thread/worker pools that share locking resources. Basically, it's useful when want things which are going to take write locks on the same resources to end up on the same thread, since otherwise you are just needlessly blocking up extra workers waiting for other workers to finish. Also, in the case of having multiple actors with independent task queues, you can send tasks using a particular resource to the same queue, so that they will be processed in the same order they were received.<p>There are probably better ways to do this in most cases, but I thought it an interesting idea.