Show HN: Embeddinghub: A vector database built for Machine Learning embeddings

118 points by cyrusthegreat over 3 years ago

11 comments

cyrusthegreat over 3 years ago
Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I'm excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

Store embeddings durably and with high availability

Allow for approximate nearest neighbor operations

Enable other operations like partitioning, sub-indices, and averaging

Manage versioning, access control, and rollbacks painlessly

It's still in the early stages, and before we committed more dev time to it we wanted to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-embeddings
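For readers new to the operations named in the goals above, here is a minimal sketch of exact (brute-force) nearest neighbor search and averaging over an in-memory store. It is not Embeddinghub's API: the function names and the dict-based store are hypothetical, and an approximate index would trade some accuracy to avoid scanning every vector the way this does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(store, query, k):
    """Return the keys of the k stored vectors most similar to query.

    store is a hypothetical {key: vector} dict; every vector is scanned,
    so the cost grows linearly with the number of stored embeddings.
    """
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(item[1], query),
        reverse=True,
    )
    return [key for key, _ in ranked[:k]]


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Calling `nearest_neighbors(store, query, k)` on a small dict of lists is enough to sanity-check results from an approximate index against the exact answer.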
barefeg over 3 years ago
Where can I find documentation on versioning? My first use case would be to version different embeddings and use it more like a storage backend than to search for KNN. Would it be possible to not create the NN graph and just use it for versioned storage? We currently use opendistro, and it nicely allows doing pre- and post-filtering based on other document fields (other than the embedding). Therefore I think this could never be a full replacement without figuring out how to combine the rest of the document structure.
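The "versioned storage without an NN graph" use case described above can be sketched as a plain key-value store keyed by version. This is only an illustration of the idea under the assumption that versions are named snapshots; the class and method names are hypothetical and say nothing about how Embeddinghub handles versioning.

```python
class VersionedEmbeddingStore:
    """In-memory store that keeps one {key: vector} table per version.

    No nearest neighbor index is built; lookups are exact gets by key.
    """

    def __init__(self):
        self._versions = {}  # version name -> {key: vector}

    def put(self, version, key, vector):
        """Store a copy of vector under key in the given version."""
        self._versions.setdefault(version, {})[key] = list(vector)

    def get(self, version, key):
        """Return the vector for key in version; raises KeyError if absent."""
        return self._versions[version][key]

    def versions(self):
        """Return all known version names in sorted order."""
        return sorted(self._versions)
```

Writing the same key under two versions keeps both vectors retrievable, which is the property a rollback would rely on.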
shabbyjoon over 3 years ago
How is this different from Pinecone, Milvus, and Faiss?
nelsondev over 3 years ago
Cool! Nice work! Do you have any performance numbers you could share?

Specifically: nearest neighbor computation latency, regular get-embedding latency, and the read/write rate achieved on a machine?
kevin948 over 3 years ago
This is really great! It speaks very much to my use-case (building user embeddings and serving them both to analysts + other ML models).

I was wondering if there was a reasonable way to store raw data next to the embeddings such that:
1. Analysts can run queries to filter down to a space they understand (the raw data).
2. Nearest neighbors can be run on top of their selection on the embedding space.

Our main use case is segmentation, so giving analysts access to the raw feature space is very important.
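The two-step workflow asked about above (filter on raw data, then run nearest neighbors on the selection) can be sketched as pre-filtering followed by an exact search. The record layout with `id`, `raw`, and `embedding` fields and the function names are assumptions made for illustration, not anything Embeddinghub provides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_nearest(records, predicate, query, k):
    """Pre-filter records on their raw data, then rank the survivors.

    records: list of dicts with hypothetical keys "id", "raw", "embedding".
    predicate: function applied to record["raw"]; keeps the record if true.
    Returns the ids of the k closest remaining records to query.
    """
    candidates = [r for r in records if predicate(r["raw"])]
    candidates.sort(key=lambda r: squared_distance(r["embedding"], query))
    return [r["id"] for r in candidates[:k]]
```

Because the filter runs first, the result always contains up to k matches from the selection; a post-filtering design would instead search everything and may return fewer than k after discarding non-matching neighbors.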
elephantum over 3 years ago
Nice, are there any benchmarks?

Would be interesting to see how it compares to Postgres or LevelDB for read/write of exact values

And how it compares to Faiss/Annoy for KNN
tourist_on_road over 3 years ago
Great work! Looks like you are using HNSWLIB. From what I understand, the HNSW graph-based approach can be memory intensive compared to a PQ code-based approach. FAISS has support for both HNSW and PQ codes. Any plans on extending your work to support a PQ code-based index in the future?
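The memory comparison raised above can be made concrete with a back-of-the-envelope estimate. The formulas below are simplifying assumptions: the HNSW figure counts full float32 vectors plus roughly 2*M four-byte neighbor ids per element on the base layer and ignores upper layers and bookkeeping, while the PQ figure counts only the compressed codes and the codebooks. Real libraries will differ.

```python
def hnsw_bytes(n, dim, m_links):
    """Rough HNSW footprint: raw float32 vectors plus base-layer links.

    Assumes about 2 * m_links neighbor ids of 4 bytes each per element.
    """
    return n * (dim * 4 + 2 * m_links * 4)


def pq_bytes(n, dim, subquantizers, bits=8):
    """Rough product quantization footprint: codes plus codebooks.

    Each vector is stored as `subquantizers` codes of `bits` bits.
    The codebooks hold 2**bits centroids per subquantizer, which adds
    up to 2**bits * dim float32 values in total.
    """
    codes = n * subquantizers * bits // 8
    codebooks = (2 ** bits) * dim * 4
    return codes + codebooks
```

Under these assumptions, one million 128-dimensional vectors with M = 16 come to about 640 MB for HNSW versus about 16 MB for PQ with 16 one-byte codes per vector, which is the gap the comment is pointing at. PQ pays for that saving with lossy compression of the vectors.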
planetsprite over 3 years ago
What makes this different from something like gensim? They have vector search for doc2vec embeddings.
deploy over 3 years ago
This looks awesome - psyched to try! Embeddings are a bitch, nice to see some new tools for managing them :)
sathergate over 3 years ago
Which search algorithm does it use?
andreawangahead over 3 years ago
Keep up the good work!