Ask HN: Vector-space database (as a service)?

5 points by kleebeesh, over 7 years ago

Methods like collaborative filtering via matrix factorization, Word2vec, Doc2vec, etc. map large, sparse matrices into a low-dimensional vector space while enforcing similarity constraints. There are extensions for vectorizing various modalities (users, items, documents, audio, images, etc.) into one vector space for similarity search and recommendation ([1], [2], [3]). There is extensive research on approximate nearest-neighbor search ([4]).

For example: it's possible to map users, songs, and artists into a common vector space ([1]). Two users who listen to similar songs have high similarity. Songs are recommended based on vector similarity to users. This pattern extends to many domains as long as there is a way to enforce similarity (likes, co-occurrences, etc.) to "train" the vectors.

In my experience, training the vectors is simpler than the engineering needed to query them efficiently (e.g. "select the 10 nearest neighbors to vector with ID 123"). This becomes expensive for large datasets, and correctly using the approximate nearest-neighbor libraries is non-trivial.

I can't find any database that lets me insert vectors as they're computed and then run queries against them. It seems companies often build a custom API on top of one of the approximate nearest-neighbor libraries, even though the interesting queries seem pretty homogeneous.

Any ideas as to why none of the big DB players have an offering for this use case? Like Algolia, but for vectors instead of text? Any recommendations for such a product?

[1] IHeartRadio queries various modalities of data from the same vector space: https://youtu.be/jjO1gOH-BW4?t=5m39s
[2] Using a convnet to map new (cold-start) songs into an existing vector space: http://benanne.github.io/2014/08/05/spotify-cnns.html
[3] Flickr similarity search: http://code.flickr.net/2017/03/07/introducing-similarity-search-at-flickr/
[4] Benchmarks for approximate nearest-neighbor libs: https://github.com/erikbern/ann-benchmarks
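
The query the post describes ("select the 10 nearest neighbors to vector with ID 123") can be sketched as an exact brute-force scan. This is only a minimal illustration, not a product or any library's API: `VectorStore`, `insert`, and `nearest` are hypothetical names, cosine similarity is an assumed metric, and the linear scan stands in for the approximate nearest-neighbor index a real service would likely need at scale.

```python
import heapq
import math


class VectorStore:
    """Toy in-memory store (hypothetical): insert vectors by ID, query neighbors."""

    def __init__(self):
        self._vectors = {}  # id -> unit-normalized vector (tuple of floats)
        self._dim = None    # dimensionality, fixed by the first insert

    def insert(self, vec_id, vector):
        if self._dim is None:
            self._dim = len(vector)
        elif len(vector) != self._dim:
            raise ValueError("all vectors must have the same dimension")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0.0:
            raise ValueError("cannot index a zero vector")
        # Store unit vectors so a dot product equals cosine similarity.
        self._vectors[vec_id] = tuple(x / norm for x in vector)

    def nearest(self, vec_id, k=10):
        """Return up to k (id, cosine_similarity) pairs, most similar first."""
        query = self._vectors[vec_id]
        scored = (
            (other_id, sum(a * b for a, b in zip(query, other)))
            for other_id, other in self._vectors.items()
            if other_id != vec_id  # exclude the query vector itself
        )
        # Exact linear scan over every stored vector (assumption: no ANN index).
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


store = VectorStore()
store.insert(123, [1.0, 0.0])          # e.g. a user vector
store.insert("song_a", [0.9, 0.1])
store.insert("song_b", [0.0, 1.0])
store.insert("artist_c", [-1.0, 0.0])
neighbors = store.nearest(123, k=2)
```

Each query here touches every stored vector, so the cost grows linearly with both the number of vectors and their dimension, which is the expense the post refers to for large datasets.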

2 comments

PaulHoule, over 7 years ago
Hyperdimensional nearest-neighbor search is a tough problem; there are index algorithms such as ball trees that work, but they don't deliver the big wins that b-trees give in 1-d space, quadtrees in 2-d space, etc.

In many "as a service" offerings computational costs are not a big deal. For this one they would be, so making the pricing work for everybody would be a toughie.
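
The comment does not say why tree indexes gain less in high dimensions. One commonly cited factor is distance concentration: for random points, the nearest and farthest neighbors of a query end up at nearly the same distance as the dimension grows, so a tree has little to prune. The sketch below illustrates that effect on uniformly random data; `distance_contrast` is a hypothetical helper, and uniform random points are an assumption that real embeddings may not match.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of farthest to nearest Euclidean distance from one random query."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return max(dists) / min(dists)


low = distance_contrast(dim=2)     # large ratio: near and far are well separated
high = distance_contrast(dim=512)  # ratio close to 1: distances concentrate
```

A ratio near 1 means almost every point is a plausible nearest neighbor, so an index has to inspect most of the data anyway.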
billconan, over 7 years ago
I thought about word2vec as a service. I gave up because I think customers could easily cache (pirate) my data.