TechEcho
Ask HN: Vector-space database (as a service)?

5 points by kleebeesh over 7 years ago
Methods like collaborative filtering via matrix factorization, Word2vec, Doc2vec, etc. map large, sparse matrices into a low-dimensional vector space while enforcing similarity constraints. There are extensions for vectorizing various modalities (users, items, documents, audio, images, etc.) into one vector space for similarity search and recommendation ([1], [2], [3]). There is extensive research on approximate nearest-neighbor searches ([4]).

For example: it's possible to map users, songs, and artists into a common vector space ([1]). Two users who listen to similar songs have high similarity. Songs are recommended based on vector similarity to users. This pattern extends to many domains as long as there is a way to enforce similarity (likes, co-occurrences, etc.) to "train" the vectors.

In my experience, training the vectors is simpler than the engineering to efficiently query them (e.g. "select the 10 nearest neighbors to vector with ID 123"). This becomes expensive for large datasets, and correctly using the approximate nearest neighbor libraries is non-trivial.

I can't find any database to insert vectors as they're computed and then run queries against them. It seems often companies build a custom API on top of one of the approximate nearest neighbors libraries. Though the interesting queries seem pretty homogeneous.

Any ideas as to why none of the big DB players have an offering for this use-case? Like Algolia, but for vectors instead of text? Any recommendations for such a product?

[1] IHeartRadio queries various modalities of data from the same vector space: https://youtu.be/jjO1gOH-BW4?t=5m39s
[2] Using a convnet to map new (cold-start) songs into an existing vector space: http://benanne.github.io/2014/08/05/spotify-cnns.html
[3] Flickr similarity search: http://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/
[4] Benchmarks for approximate nearest neighbor libs: https://github.com/erikbern/ann-benchmarks
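To make the "insert vectors as they're computed, then query them" idea concrete, here is a rough sketch of the interface in plain Python. It is an exact brute-force scan over cosine similarity, not an approximate nearest-neighbor index, so it only illustrates the query shape and would not scale to large datasets. The `VectorStore` class, its method names, and the sample IDs are all made up for illustration and do not come from any existing product or library.

```python
import heapq
import math


class VectorStore:
    """Toy in-memory store (hypothetical API): insert vectors by ID, query neighbors."""

    def __init__(self):
        # Maps ID -> unit-normalized vector, so a dot product equals cosine similarity.
        self._vectors = {}

    def insert(self, vec_id, vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        self._vectors[vec_id] = tuple(x / norm for x in vector)

    def nearest(self, vec_id, k=10):
        """Return up to k (id, cosine similarity) pairs, most similar first."""
        query = self._vectors[vec_id]
        scored = (
            (other_id, sum(a * b for a, b in zip(query, other)))
            for other_id, other in self._vectors.items()
            if other_id != vec_id  # skip the query vector itself
        )
        # Exact O(n) scan per query; an ANN library would replace this step.
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Users, songs, and artists share one vector space (made-up 2-d vectors).
store = VectorStore()
store.insert(123, [1.0, 0.0])
store.insert("song_a", [0.9, 0.1])
store.insert("song_b", [0.0, 1.0])
store.insert("artist_x", [-1.0, 0.0])

neighbors = store.nearest(123, k=2)
```

Every call to `nearest` touches every stored vector, which is the cost the approximate nearest-neighbor libraries in [4] exist to avoid.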

2 comments

PaulHoule over 7 years ago
Hyperdimensional nearest-neighbor search is a tough problem; there are index algorithms such as ball trees that work, but they don't deliver the big wins that b-trees give in 1-d space, quadtrees in 2-d space, etc.

In many "as a service" offerings computational costs are not a big deal. For this one it would be, thus making the pricing work right for everybody would be a toughie.
billconan over 7 years ago
I thought about word2vec as a service. I gave up because I think customers could easily cache (pirate) my data.