EuclidesDB: a multi-model machine learning feature database

159 points by drewvolpe, over 6 years ago

11 comments

bradhe, over 6 years ago
This is awesome! My company, Reflect, was looking at building a set of features similar to this, but on top of existing data in people's systems, right before we were acquired.

There's a big market of people who are looking to do simple data science and machine learning but don't know how to get started, don't have a lot of expertise to implement the algorithms, and find the required ETL really daunting. You could put all of this on rails by integrating with existing systems.
enisberk, over 6 years ago
Reddit post by the author: https://www.reddit.com/r/MachineLearning/comments/9yhsbu/p_euclidesdb_a_multimodel_machine_learning/?ref=readnext
Rafuino, over 6 years ago
Very cool. I've been doing some research on DBs and how deep learning researchers connect them to their data pipelines, but this is the first open source one I've seen explicitly designed around that purpose. One thing I noticed is that LMDB is quite widely used, at least according to my research. At least one paper I read said the following: "LMDB database files . . . are predominant in the deep learning community" [1]. What makes EuclidesDB different from LMDB?

[1] https://www.mcs.anl.gov/papers/P7075-0717.pdf
alienreborn, over 6 years ago
Looks interesting. Are you the creator? If so:

Any reason why you chose LevelDB?

I was actually looking at clipper.ai for model serving at my work. I know it's not a 1-to-1 comparison, as Clipper is much more generic whereas this seems to tie in closely with PyTorch. Can it support models created using other libraries?
btown, over 6 years ago
https://clarifai.com/developer/guide/search#search does something very similar as a service; it allows you to ingest numerous images, then feed them through constantly evolving models and have any number of model-based indices over those images that can answer similarity queries based on previously unseen inputs. Great to see that there's open-source competition, and that they're focusing on developer productivity (via the tight coupling with Torch) rather than prematurely adding layers of abstraction.
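As a rough illustration of the "model-based indices answering similarity queries" idea described in the comment above, here is a minimal sketch in plain Python. It is not how Clarifai or EuclidesDB work internally; `FeatureIndex`, its methods, and the model name used in the example are hypothetical, and the search is an exact brute-force scan, not an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FeatureIndex:
    """Hypothetical in-memory store: one feature vector per (model, item)."""

    def __init__(self):
        # model name -> {item id -> feature vector}
        self._features = {}

    def add(self, model, item_id, vector):
        """Store the feature vector that `model` produced for `item_id`."""
        self._features.setdefault(model, {})[item_id] = list(vector)

    def query(self, model, vector, top_k=3):
        """Return up to top_k item ids, most similar first, by exhaustive scan."""
        scored = [
            (cosine_similarity(vector, stored), item_id)
            for item_id, stored in self._features.get(model, {}).items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:top_k]]
```

In use, the feature vectors would come from a real model's embedding layer; here they are assumed to be supplied by the caller. Each model keeps its own index, so the same item can be searched under several feature spaces.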
eggie5, over 6 years ago
I gave a talk on the theory behind image-to-image search, if anyone is interested. Image search is essentially what this backend is well suited for and what the graphic on their home page uses:

http://www.eggie5.com/126-semantic-image-search-video
eggie5, over 6 years ago
If you've ever tried to deploy a deep-learning-based image-to-image search product, you will know the engineering challenges, especially with the Approximate Nearest Neighbors infrastructure. This is good progress in abstracting out that step!
sandGorgon, over 6 years ago
Has anyone done this serialization with a relational DB like Postgres, which has hugely performant key-value stores in hstore and JSONB?

This coupling to PyTorch is very cool, but basing this on a production-capable database like Postgres (which has incredible hosted solutions like Google Cloud SQL, AWS RDS, Azure, etc.) would be much more useful.
pilooch, over 6 years ago
Interesting DB for feature storage, and LSH is a good choice, I believe. I'm wondering why the tight link to PyTorch C++ tensors (under refactoring, actually), but I haven't looked at the EuclidesDB code yet. Thanks for sharing!

Those interested can also find an open source integration of LMDB + Annoy here: https://github.com/jolibrain/deepdetect/blob/master/src/simsearch.cc#L188

This is the underlying support for similarity search based on embeddings, including image and object similarity search; see https://github.com/jolibrain/deepdetect/tree/master/demo/objsearch

This is running for apps such as a Shazam for art, faster annotation tooling, and text similarity search.

Annoy only supports indexing once, while hnswlib supports incremental indexing, something I'm looking at.
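For readers unfamiliar with LSH, the following is a minimal sketch of one common variant, random-hyperplane hashing, in plain Python. It is an assumption-laden illustration, not the EuclidesDB, Annoy, or hnswlib implementation; the class name `HyperplaneLSH` and its parameters are hypothetical. Each vector is hashed to a tuple of sign bits, one per random hyperplane, and vectors pointing in similar directions tend to share a bucket. Items can be added at any time, which is the incremental-indexing property the comment above contrasts with build-once indices.

```python
import random


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index (illustrative sketch only)."""

    def __init__(self, dim, num_bits=8, seed=0):
        rng = random.Random(seed)
        # One random Gaussian normal vector per signature bit.
        self._planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)
        ]
        # signature -> list of item ids that hashed to it
        self._buckets = {}

    def _signature(self, vector):
        """Tuple of booleans: which side of each hyperplane the vector falls on."""
        return tuple(
            sum(p * x for p, x in zip(plane, vector)) >= 0.0
            for plane in self._planes
        )

    def add(self, item_id, vector):
        """Insert an item; no rebuild is needed, so indexing is incremental."""
        self._buckets.setdefault(self._signature(vector), []).append(item_id)

    def candidates(self, vector):
        """Return the ids sharing the query's bucket (approximate neighbours)."""
        return list(self._buckets.get(self._signature(vector), []))
```

The candidates are only a shortlist: a real system would typically re-rank them by exact distance and use several hash tables to reduce misses.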
bratao, over 6 years ago
Fantastic project! Vespa.ai is also an alternative, more focused on NLP.
devj, over 6 years ago
Noob question: Which data serialisation format is used to represent models? Are there any standardisation efforts being undertaken by the community?