科技回声

11 comments

bradhe, over 6 years ago

This is awesome! My company, Reflect, was looking at building a set of features similar to this, but on top of existing data in people's systems, right before we were acquired.

There's a big market of people who are looking to do simple data science and machine learning but don't know how to get started, don't have much expertise to implement the algorithms, and find the required ETL really daunting. You could put all of this on rails by integrating with existing systems.

enisberk, over 6 years ago

Reddit post by the author: https://www.reddit.com/r/MachineLearning/comments/9yhsbu/p_euclidesdb_a_multimodel_machine_learning/?ref=readnext

Rafuino, over 6 years ago

Very cool. I've been doing some research on DBs and how deep learning researchers connect them to their data pipelines, but this is the first open source one I've seen explicitly designed around that purpose. One thing I noticed is that LMDB is quite widely used, at least according to my research. At least one paper I read said the following: "LMDB database files . . . are predominant in the deep learning community" [1]. What makes EuclidesDB different from LMDB?

[1] https://www.mcs.anl.gov/papers/P7075-0717.pdf

alienreborn, over 6 years ago

Looks interesting. Are you the creator? If so, any reason why you chose LevelDB?

I was actually looking at clipper.ai for model serving at my work. I know it's not a 1-to-1 comparison, as Clipper is much more generic whereas this seems to tie in closely with PyTorch. Can it support models created using other libraries?

btown, over 6 years ago

https://clarifai.com/developer/guide/search#search does something very similar as a service; it allows you to ingest numerous images, then feed them through constantly evolving models and have any number of model-based indices over those images that can answer similarity queries based on previously unseen inputs. Great to see that there's open-source competition, and that they're focusing on developer productivity (via the tight coupling with Torch) rather than prematurely adding layers of abstraction.

eggie5, over 6 years ago

I gave a talk on the theory behind image-to-image search, if anyone is interested. Image search is essentially what this backend is well suited for, and it is what the graphic on their home page shows: http://www.eggie5.com/126-semantic-image-search-video

eggie5, over 6 years ago

If you've ever tried to deploy a deep-learning-based image-to-image search product, you will know the engineering challenges, especially with the approximate nearest neighbors infrastructure. This is good progress in abstracting out that step!

sandGorgon, over 6 years ago

Has anyone done this serialization with a relational DB like Postgres, which has hugely performant key-value storage in the form of hstore or JSONB?

This coupling to PyTorch is very cool, but basing this on a production-capable database like Postgres (which has incredible hosted solutions like Google Cloud SQL, AWS RDS, Azure, etc.) would be much more useful.

pilooch, over 6 years ago

Interesting DB for feature storage, and LSH is a good choice, I believe. I'm wondering why the tight link to PyTorch C++ tensors (under refactoring, actually), but I haven't looked at the EuclidesDB code yet. Thanks for sharing!

Those interested can also find an open source integration of LMDB + Annoy here: https://github.com/jolibrain/deepdetect/blob/master/src/simsearch.cc#L188

This is the underlying support for similarity search based on embeddings, including image and object similarity search; see https://github.com/jolibrain/deepdetect/tree/master/demo/objsearch

This is running for apps such as a Shazam for art, faster annotation tooling and text similarity search.

Annoy only supports indexing once, while hnswlib supports incremental indexing, something I'm looking at.

bratao, over 6 years ago

Fantastic project! Vespa.ai is also an alternative, more focused on NLP.

devj, over 6 years ago

Noob question: Which data serialisation format is used to represent models? Are there any standardisation efforts being undertaken by the community?

EuclidesDB: a multi-model machine learning feature database
