Tech Echo (科技回声)
© 2025 Tech Echo (科技回声). All rights reserved.

Show HN: Embeddinghub: A vector database built for Machine Learning embeddings

118 points by cyrusthegreat, over 3 years ago

11 comments

cyrusthegreat, over 3 years ago
Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I'm excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

Store embeddings durably and with high availability
Allow for approximate nearest neighbor operations
Enable other operations like partitioning, sub-indices, and averaging
Manage versioning, access control, and rollbacks painlessly

It's still in the early stages, and before we committed more dev time to it we wanted to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-embeddings
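The post lists averaging among the planned operations. As a rough illustration only (the helper names below are hypothetical and not Embeddinghub's API), averaging usually means taking the element-wise mean of several embeddings, for example to build a user vector from the items that user interacted with, and then re-normalizing the result so cosine similarity stays comparable:

```python
import math
from typing import List, Sequence


def average_embeddings(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length embedding vectors (hypothetical helper)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity reduces to a dot product."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Example: a "user" embedding built from two (made-up) item embeddings.
item_vectors = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
user_vector = l2_normalize(average_embeddings(item_vectors))
```

A plain mean weights every item equally; weighted or recency-decayed averages are common variations of the same idea.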
barefeg, over 3 years ago
Where can I find documentation on versioning? My first use case would be to version different embeddings and use it more as a storage backend than for KNN search. Would it be possible to skip creating the NN graph and just use it for versioned storage? We currently use Open Distro, and it nicely allows pre- and post-filtering based on other document fields (other than the embedding). Therefore, I think this could never be a full replacement without figuring out how to combine it with the rest of the document structure.
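The "versioned storage without an NN graph" use case amounts to exact-key lookups against immutable snapshots. A minimal in-memory sketch, assuming a whole embedding table (for example, one model's output) is committed as one version; `VersionedEmbeddingStore` and its methods are hypothetical and do not describe how Embeddinghub implements versioning:

```python
from typing import Dict, List, Optional


class VersionedEmbeddingStore:
    """Hypothetical versioned key-value store for embeddings; no nearest-neighbor index is built."""

    def __init__(self) -> None:
        self._versions: List[Dict[str, List[float]]] = []

    def commit(self, embeddings: Dict[str, List[float]]) -> int:
        """Store an immutable copy of a whole embedding table and return its version number."""
        snapshot = {key: list(vec) for key, vec in embeddings.items()}
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def get(self, key: str, version: Optional[int] = None) -> List[float]:
        """Exact-key lookup in the latest version, or in a specific one if given."""
        if not self._versions:
            raise KeyError("store is empty")
        if version is None:
            version = len(self._versions) - 1
        if not 0 <= version < len(self._versions):
            raise KeyError(f"unknown version {version}")
        return list(self._versions[version][key])

    def rollback(self) -> int:
        """Discard the latest version; return the now-current version number (-1 if none remain)."""
        if not self._versions:
            raise IndexError("nothing to roll back")
        self._versions.pop()
        return len(self._versions) - 1


# Usage: two model versions of the same document embedding.
store = VersionedEmbeddingStore()
v0 = store.commit({"doc-1": [0.1, 0.2]})
v1 = store.commit({"doc-1": [0.3, 0.4]})
latest = store.get("doc-1")
older = store.get("doc-1", version=v0)
```

Copying vectors on commit and on read keeps old versions immutable, which is what makes rollback trivial here; a real system would persist snapshots rather than hold them in memory.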
shabbyjoon, over 3 years ago
How is this different from Pinecone, Milvus, and Faiss?
nelsondev, over 3 years ago
Cool! Nice work! Do you have any performance numbers you could share?

Specifically around nearest neighbor computation latency, regular get-embedding latency, and the read/write rate achieved on a single machine?
kevin948, over 3 years ago
This is really great! It speaks very much to my use case (building user embeddings and serving them both to analysts and to other ML models).

I was wondering if there was a reasonable way to store raw data next to the embeddings such that:
1. Analysts can run queries to filter down to a space they understand (the raw data).
2. Nearest neighbors can be run on top of their selection in the embedding space.

Our main use case is segmentation, so giving analysts access to the raw feature space is very important.
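The two-step workflow described here (filter on raw fields, then nearest neighbors within the selection) is usually called pre-filtering. A small sketch with exact cosine search, assuming each record is a dict holding raw fields plus an `embedding` list; the record layout, field names, and `filtered_knn` are hypothetical and not part of Embeddinghub:

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]  # hypothetical row: raw features plus "id" and "embedding"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_knn(records: Sequence[Record],
                 predicate: Callable[[Record], bool],
                 query: Sequence[float],
                 k: int) -> List[Tuple[str, float]]:
    """Pre-filter on raw fields, then rank the survivors by cosine similarity (exact search)."""
    candidates = [r for r in records if predicate(r)]
    scored = [(r["id"], cosine(r["embedding"], query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


users = [
    {"id": "u1", "country": "DE", "age": 34, "embedding": [1.0, 0.0]},
    {"id": "u2", "country": "DE", "age": 51, "embedding": [0.6, 0.8]},
    {"id": "u3", "country": "US", "age": 29, "embedding": [1.0, 0.1]},
    {"id": "u4", "country": "DE", "age": 42, "embedding": [0.0, 1.0]},
]
segment = filtered_knn(users, lambda r: r["country"] == "DE", query=[1.0, 0.0], k=2)
```

Pre-filtering followed by exact search always returns the true top-k within the selection, but its cost grows with the size of the selection; post-filtering an approximate index's results is cheaper but can return fewer than k matches when the filter is selective.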
elephantum, over 3 years ago
Nice, are there any benchmarks?

It would be interesting to see how it compares to Postgres or LevelDB for reading and writing exact values.

And how it compares to Faiss/Annoy for KNN.
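Comparisons against Faiss or Annoy for KNN are commonly reported as recall at k (how many of the true top-k neighbors an approximate search returns) next to query latency. A minimal sketch of the recall side, assuming you already have per-query result id lists from the approximate index under test and from an exact brute-force search; no actual benchmark numbers are implied:

```python
from typing import Sequence


def recall_at_k(approx_results: Sequence[Sequence[int]],
                exact_results: Sequence[Sequence[int]],
                k: int) -> float:
    """Mean over queries of (overlap of approx and exact top-k) / (size of exact top-k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not exact_results or len(approx_results) != len(exact_results):
        raise ValueError("need one result list per query from each search")
    total = 0.0
    for approx, exact in zip(approx_results, exact_results):
        truth = set(exact[:k])
        if not truth:
            raise ValueError("exact results must not be empty")
        total += len(truth.intersection(approx[:k])) / len(truth)
    return total / len(exact_results)


# Two queries: the approximate search misses one true neighbor on the first.
exact = [[1, 2, 3], [4, 5, 6]]
approx = [[1, 3, 9], [4, 5, 6]]
score = recall_at_k(approx, exact, k=3)  # (2/3 + 3/3) / 2
```

Latency is usually measured separately (for example with `time.perf_counter` around each query batch), and both numbers only mean something when reported together with the index parameters used.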
tourist_on_road, over 3 years ago
Great work! It looks like you are using HNSWLIB. From what I understand, the HNSW graph-based approach can be memory-intensive compared to a PQ-code-based approach. FAISS supports both HNSW and PQ codes. Any plans to extend your work to support a PQ-code-based index in the future?
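The memory gap can be estimated with back-of-the-envelope arithmetic. This sketch assumes float32 vectors, an HNSW base layer storing up to 2·M four-byte neighbor ids per element (in line with hnswlib's parameter notes, which put link memory at roughly M·8 to 10 bytes per element), and plain product quantization with 8-bit codes; upper graph layers, ids, and allocator overhead are ignored, and the function names are hypothetical:

```python
def hnsw_bytes_per_vector(dim: int, m: int) -> int:
    """Rough HNSW estimate: the float32 vector itself plus up to 2*M four-byte
    neighbor ids on the base layer (upper layers and bookkeeping are ignored)."""
    return dim * 4 + 2 * m * 4


def pq_bytes_per_vector(dim: int, num_subquantizers: int, bits_per_code: int = 8) -> int:
    """PQ code size: one code of `bits_per_code` bits per subvector, rounded up to whole bytes."""
    if dim % num_subquantizers != 0:
        raise ValueError("dim must be divisible by the number of subquantizers")
    return (num_subquantizers * bits_per_code + 7) // 8


def pq_codebook_bytes(dim: int, bits_per_code: int = 8) -> int:
    """Shared codebooks: 2**bits centroids per subspace, each with dim/m float32 values,
    which totals 2**bits * dim floats regardless of the number of subspaces."""
    return (2 ** bits_per_code) * dim * 4


# One million 768-dimensional embeddings.
n, dim = 1_000_000, 768
hnsw_total = n * hnsw_bytes_per_vector(dim, m=16)
pq_total = n * pq_bytes_per_vector(dim, num_subquantizers=96) + pq_codebook_bytes(dim)
```

Under these assumptions HNSW needs about 3.2 GB while PQ needs under 100 MB, but PQ only approximates distances, so the savings come at the cost of lower recall unless results are re-ranked with the original vectors.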
planetsprite, over 3 years ago
What makes this different from something like gensim? They have vector search for doc2vec embeddings.
deploy, over 3 years ago
This looks awesome - psyched to try! Embeddings are a bitch, nice to see some new tools for managing them :)
sathergate, over 3 years ago
Which search algorithm does it use?
andreawangahead, over 3 years ago
Keep up the good work!