Hi everyone!<p>Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I’m excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:<p>Store embeddings durably and with high availability<p>Allow for approximate nearest neighbor operations<p>Enable other operations like partitioning, sub-indices, and averaging<p>Manage versioning, access control, and rollbacks painlessly<p>It's still in the early stages, and before we commit more dev time to it, we wanted to get your feedback. Let us know what you think and what you'd like to see!<p>Repo: <a href="https://github.com/featureform/embeddinghub" rel="nofollow">https://github.com/featureform/embeddinghub</a><p>Docs: <a href="https://docs.featureform.com/" rel="nofollow">https://docs.featureform.com/</a><p>What's an Embedding? The Definitive Guide to Embeddings: <a href="https://www.featureform.com/post/the-definitive-guide-to-embeddings" rel="nofollow">https://www.featureform.com/post/the-definitive-guide-to-emb...</a>
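To make the "approximate nearest neighbor operations" goal concrete, here's a minimal, brute-force sketch of the lookup such a store answers, in plain Python. This is not Embeddinghub's API (see the docs for that); it's an O(n·d) illustration of the query that an index like HNSW answers approximately in roughly logarithmic time.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(store, query_key, num):
    """Return the `num` keys whose embeddings are most similar to
    `query_key`'s. Brute force: scans every vector in the store."""
    query = store[query_key]
    scored = [(k, cosine_similarity(query, v))
              for k, v in store.items() if k != query_key]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return [k for k, _ in scored[:num]]

# Toy store: key -> embedding (made-up 3-d vectors for illustration)
store = {
    "apple": [1.0, 0.1, 0.0],
    "pear":  [0.9, 0.2, 0.1],
    "car":   [0.0, 1.0, 0.9],
}
print(nearest_neighbors(store, "apple", 1))  # → ['pear']
```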
Where can I find documentation on versioning? My first use case would be to version different embeddings and use it more like a storage backend than to search for KNN. Would it be possible to skip building the NN graph and just use it for versioned storage? We currently use Opendistro, which nicely allows pre- and post-filtering based on other document fields (other than the embedding). So I think this could never be a full replacement without figuring out how to combine it with the rest of the document structure.
Cool! Nice work! Do you have any performance numbers you could share?<p>Specifically around nearest-neighbor computation latency, plain embedding-lookup latency, and the read/write rate achieved on a single machine?
This is really great! It speaks very much to my use-case (building user embeddings and serving them both to analysts + other ML models).<p>I was wondering if there was a reasonable way to store raw data next to the embeddings such that:
1. Analysts can run queries to filter down to a space they understand (the raw data).
2. Nearest neighbors can be run on top of their selection on the embedding space.<p>Our main use case is segmentation, so giving analysts access to the raw feature space is very important.
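For what it's worth, until a store supports this natively, the two-step flow above can be done in application code by keeping the raw fields next to each embedding (field names and data below are hypothetical). A sketch: pre-filter on the raw fields, then run exact KNN inside the surviving subset.

```python
# Each record carries raw fields analysts understand, plus an embedding.
# (Hypothetical example data.)
records = [
    {"id": 1, "country": "US", "age": 34, "emb": [0.9, 0.1]},
    {"id": 2, "country": "US", "age": 29, "emb": [0.8, 0.3]},
    {"id": 3, "country": "DE", "age": 41, "emb": [0.1, 0.9]},
    {"id": 4, "country": "US", "age": 52, "emb": [0.2, 0.8]},
]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def filtered_knn(records, predicate, query, k):
    """Pre-filter on raw fields, then exact KNN on the subset."""
    candidates = [r for r in records if predicate(r)]
    candidates.sort(key=lambda r: sq_dist(r["emb"], query))
    return [r["id"] for r in candidates[:k]]

# Analysts filter to US users; KNN runs only inside that segment.
print(filtered_knn(records, lambda r: r["country"] == "US", [1.0, 0.0], 2))
# → [1, 2]
```

The tradeoff: exact KNN after a filter is fine for small segments, but an ANN index (HNSW etc.) is typically built over the whole collection, which is why combining filtering with ANN is a genuinely hard feature and worth asking for.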
Nice, are there any benchmarks?<p>Would be interesting to see how it compares to Postgres or LevelDB for read/write of exact values<p>And how it compares to Faiss/Annoy for KNN
Great work! Looks like you are using HNSWLIB. From what I understand, the HNSW graph-based approach can be memory-intensive compared to the PQ-code-based approach. FAISS supports both HNSW and PQ codes. Any plans to extend your work to support a PQ-code-based index in the future?
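To illustrate why PQ is so memory-friendly, here is a toy product quantizer in plain Python. Real PQ (e.g. FAISS's IndexPQ) trains each sub-codebook with k-means; this sketch just uses random centroids to show the encode/decode mechanics and the compression: each subvector collapses to one small centroid index, so an 8-float vector (32 bytes as float32) becomes 4 codes of 2 bits each, at the cost of approximate reconstruction. HNSW, by contrast, keeps full vectors plus graph links per node.

```python
import random

random.seed(0)
d, m = 8, 4          # vector dim, number of subquantizers
sub = d // m         # dims per subvector
k = 4                # centroids per sub-codebook -> 2 bits per code

# Toy codebooks: random centroids. In practice these are learned
# with k-means over a training sample of real vectors.
codebooks = [[[random.uniform(-1, 1) for _ in range(sub)]
              for _ in range(k)]
             for _ in range(m)]

def encode(v):
    """Replace each subvector by the index of its nearest centroid."""
    codes = []
    for i in range(m):
        part = v[i * sub:(i + 1) * sub]
        dists = [sum((x - c) ** 2 for x, c in zip(part, cent))
                 for cent in codebooks[i]]
        codes.append(dists.index(min(dists)))
    return codes

def decode(codes):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [x for i, c in enumerate(codes) for x in codebooks[i][c]]

v = [random.uniform(-1, 1) for _ in range(d)]
codes = encode(v)
print(codes)           # m small integers instead of d floats
print(len(decode(codes)))  # reconstruction has the original dimension
```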