科技回声

11 comments

Hi everyone!

Over the years, I've found myself building hacky solutions to serve and manage my embeddings. I'm excited to share Embeddinghub, an open-source vector database for ML embeddings. It is built with four goals in mind:

- Store embeddings durably and with high availability
- Allow for approximate nearest neighbor operations
- Enable other operations like partitioning, sub-indices, and averaging
- Manage versioning, access control, and rollbacks painlessly

It's still in the early stages, and before we committed more dev time to it we wanted to get your feedback. Let us know what you think and what you'd like to see!

Repo: https://github.com/featureform/embeddinghub

Docs: https://docs.featureform.com/

What's an Embedding? The Definitive Guide to Embeddings: https://www.featureform.com/post/the-definitive-guide-to-embeddings

barefeg, over 3 years ago

Where can I find documentation on versioning? My first use case would be to version different embeddings and use it more as a storage backend than for KNN search. Would it be possible to skip building the NN graph and just use it for versioned storage? We currently use opendistro, which nicely allows pre- and post-filtering based on other document fields (other than the embedding). So I think this could never be a full replacement without figuring out how to combine it with the rest of the document structure.
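
To make the "versioned storage without an NN graph" idea concrete, here is a minimal in-memory sketch. It is not Embeddinghub's API: the class `VersionedEmbeddingStore` and its methods `commit`, `get`, and `rollback` are hypothetical names chosen for illustration, and a real system would persist snapshots rather than hold them in a Python list.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class VersionedEmbeddingStore:
    """Toy store (hypothetical): every commit is a full snapshot, no NN index."""

    def __init__(self) -> None:
        # Version 0 is the empty snapshot.
        self._versions: List[Dict[str, Tuple[float, ...]]] = [{}]

    @property
    def latest(self) -> int:
        return len(self._versions) - 1

    def commit(self, updates: Dict[str, Sequence[float]]) -> int:
        """Create a new version: the previous snapshot plus the given updates."""
        snapshot = dict(self._versions[-1])
        snapshot.update({key: tuple(vec) for key, vec in updates.items()})
        self._versions.append(snapshot)
        return self.latest

    def get(self, key: str, version: Optional[int] = None) -> Tuple[float, ...]:
        """Read a key from a specific version (default: the latest one)."""
        v = self.latest if version is None else version
        return self._versions[v][key]

    def rollback(self, version: int) -> int:
        """Re-commit an old snapshot as the newest version; history is kept."""
        self._versions.append(dict(self._versions[version]))
        return self.latest


store = VersionedEmbeddingStore()
v1 = store.commit({"user_1": [0.1, 0.2]})
v2 = store.commit({"user_1": [0.9, 0.8]})
v3 = store.rollback(v1)  # latest now matches v1 again; v2 stays readable
```

Copying the whole snapshot on each commit keeps the sketch short; it trades memory for simplicity and would not scale to large embedding tables.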

shabbyjoon, over 3 years ago

How is this different from Pinecone, Milvus, and Faiss?

nelsondev, over 3 years ago

Cool! Nice work! Do you have any performance numbers you could share? Specifically, nearest-neighbor computation latency, regular get-embedding latency, and the read/write rate achieved on a machine?

kevin948, over 3 years ago

This is really great! It speaks very much to my use case (building user embeddings and serving them to both analysts and other ML models).

I was wondering if there is a reasonable way to store raw data next to the embeddings such that:

1. Analysts can run queries to filter down to a space they understand (the raw data).
2. Nearest neighbors can be run on top of their selection in the embedding space.

Our main use case is segmentation, so giving analysts access to the raw feature space is very important.
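
The two-step workflow described above (filter on raw data, then run nearest neighbors on the selection) can be sketched with exact search over a small in-memory table. This is an illustration only, not Embeddinghub functionality: the record layout and the helper `filtered_nearest` are assumptions, and brute-force cosine similarity stands in for whatever index a real system would use.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_nearest(
    records: Dict[str, dict],
    query: Sequence[float],
    predicate: Callable[[dict], bool],
    k: int,
) -> List[Tuple[str, float]]:
    """Pre-filter on raw fields, then rank the survivors by cosine similarity."""
    candidates = [
        (key, cosine_similarity(rec["embedding"], query))
        for key, rec in records.items()
        if predicate(rec["raw"])  # step 1: analyst-defined filter on raw data
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)  # step 2: exact KNN
    return candidates[:k]


# Hypothetical user table: raw features stored next to each embedding.
records = {
    "u1": {"raw": {"country": "DE", "age": 31}, "embedding": [1.0, 0.0]},
    "u2": {"raw": {"country": "DE", "age": 45}, "embedding": [0.8, 0.6]},
    "u3": {"raw": {"country": "US", "age": 29}, "embedding": [1.0, 0.1]},
    "u4": {"raw": {"country": "DE", "age": 52}, "embedding": [0.0, 1.0]},
}

result = filtered_nearest(
    records, [1.0, 0.0], lambda raw: raw["country"] == "DE", k=2
)
print([key for key, _ in result])
```

Here `u3` is close to the query but is excluded by the filter, which is the behavior pre-filtering is meant to give. With an approximate index, filtering after the search instead can return fewer than `k` results, which is one reason the order of the two steps matters.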

elephantum, over 3 years ago

Nice, are there any benchmarks? It would be interesting to see how it compares to Postgres or LevelDB for reads/writes of exact values, and how it compares to Faiss/Annoy for KNN.

tourist_on_road, over 3 years ago

Great work! Looks like you are using HNSWLIB. From what I understand, the HNSW graph-based approach can be memory-intensive compared to a PQ-code-based approach. FAISS supports both HNSW and PQ codes. Any plans to extend your work to support a PQ-code-based index in the future?
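
A rough back-of-envelope comparison shows why the memory question comes up. The sketch below is an estimate under stated assumptions, not a measurement of HNSWLIB or FAISS: it assumes float32 vectors, 4-byte neighbor ids, up to `2 * M` links per element on the bottom HNSW layer (upper layers ignored), and a product quantizer with `m` sub-quantizers of `nbits` bits each. The function names and default parameters are chosen for illustration.

```python
def hnsw_bytes(n: int, dim: int, M: int = 16) -> int:
    """Approximate HNSW footprint: full vectors plus bottom-layer links."""
    vector_bytes = dim * 4   # float32 components, stored uncompressed
    link_bytes = 2 * M * 4   # up to 2*M neighbor ids on layer 0, 4 bytes each
    return n * (vector_bytes + link_bytes)


def pq_bytes(n: int, dim: int, m: int = 16, nbits: int = 8) -> int:
    """Approximate PQ footprint: compact codes plus the shared codebooks."""
    code_bytes = m * nbits // 8             # one nbits-wide code per sub-vector
    codebook_bytes = (2 ** nbits) * dim * 4  # 2^nbits float32 centroids per sub-space
    return n * code_bytes + codebook_bytes


n, dim = 1_000_000, 128
hnsw = hnsw_bytes(n, dim)
pq = pq_bytes(n, dim)
print(f"HNSW ~ {hnsw / 1e6:.0f} MB, PQ ~ {pq / 1e6:.0f} MB")
```

Under these assumptions, one million 128-dimensional vectors take about 640 MB with HNSW and about 16 MB with PQ codes. The saving is not free: PQ codes are lossy, so recall at a given latency is typically lower than with uncompressed vectors.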

planetsprite, over 3 years ago

What makes this different from something like gensim? They have vector search for doc2vec embeddings.

deploy, over 3 years ago

This looks awesome - psyched to try! Embeddings are a bitch, nice to see some new tools for managing them :)

sathergate, over 3 years ago

Which search algorithm does it use?

andreawangahead, over 3 years ago

Keep up the good work!

cyrusthegreat, over 3 years ago

Show HN: Embeddinghub: A vector database built for Machine Learning embeddings