A novel approach to entity resolution using serverless technology

63 points, posted by Major_Grooves almost 4 years ago

6 comments

Major_Grooves, almost 4 years ago
Hi, I'm one of the (prospective) co-founders of TiloDB, a serverless "entity-resolution" technology.

We built TiloDB as the tech team at a European consumer credit bureau, where we faced the technical challenge of assembling hundreds of millions of data sets about tens of millions of people in a way that is scalable and allows fast searching, without breaking the bank.

We tried various technologies, such as graph databases, but none of them gave us satisfactory performance.

So we turned to serverless technology (AWS specifically) to build a new type of entity resolution technology.

In this article we write about the technology breakthroughs that led to TiloDB. There is also an interactive demo where you can submit data, see it linked, and see other people submitting data in real time.

We want to spin the tech out into a new company and release it as OSS, so we are keen to hear about potential use cases you might have.
Major_Grooves, almost 4 years ago
This blog post from the CTO of VMware and SpringSource gives a pretty good summary of the entity-resolution field: https://blog.acolyer.org/2020/12/14/entity-resolution/
lmeyerov, almost 4 years ago
ER for identity graphs is a great use case! We see teams do this a lot and with not-great tools. (Ex: users/IPs in splunk/elastic, which are better for simpler matches.)

For one Graphistry project, we run a single node neo4j with 0.5b nodes/edges, so something in the description isn't adding up for me here wrt perf. Maybe an open benchmark would help?

I do agree indexing matters, as that was night/day for our use cases. For ML workloads, we are looking at vector indexes, which graph DBs do not currently support. The ones in this article are on text and take > 100ms, so I'm curious..
Major_Grooves, almost 4 years ago
I'd be very interested to hear people's thoughts on OSS licences. We are rather new to that world, so we are very rapidly learning about the differences between Open Core, Elastic 2.0, the Apache Licence, etc.

What licence should a new company like us adopt when we want to build a community but also want to commercialise the technology, especially when we already have an "enterprise ready" version of the tech?
trhway, almost 4 years ago
From experience with a similar product (where we had a similar-sounding way of doing entity resolution, based on rule-based fuzzy indexes and fuzzy matching, and it was working for tens of millions of entities on a regular, though beefy, RDBMS more than a decade ago): the issue isn't so much technological. It is that each customer/client has custom everything when it comes to ER, and thus scaling that business is extremely hard (that specific business collapsed primarily for that reason).
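For readers unfamiliar with the general approach mentioned in the comment above, here is a minimal sketch of rule-based blocking followed by fuzzy matching. It is an illustration only and is not the code of TiloDB or of the product trhway describes. The record fields (`first_name`, `surname`, `birth_year`), the blocking rule, and the 0.85 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(record):
    # Hypothetical rule: surname initial plus birth year. Only records
    # sharing a key are compared, which avoids an all-pairs comparison.
    return (record["surname"][:1].lower(), record["birth_year"])


def similar(a, b, threshold):
    # Fuzzy string comparison; ratio() returns a score in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def full_name(record):
    return f'{record["first_name"]} {record["surname"]}'


def resolve(records, threshold=0.85):
    """Group record indices that are assumed to describe the same entity."""
    parent = list(range(len(records)))  # union-find forest over indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 1: bucket records by blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    # Step 2: fuzzy-compare pairs inside each bucket and merge matches.
    for members in blocks.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if similar(full_name(records[i]), full_name(records[j]), threshold):
                    parent[find(j)] = find(i)

    # Step 3: collect the merged groups.
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


records = [
    {"first_name": "Jon", "surname": "Smith", "birth_year": 1980},
    {"first_name": "John", "surname": "Smith", "birth_year": 1980},
    {"first_name": "Jane", "surname": "Doe", "birth_year": 1975},
    {"first_name": "John", "surname": "Smith", "birth_year": 1991},
]
print(resolve(records))
```

The blocking rule and the threshold are exactly the kind of per-customer customisation the comment refers to: each data source tends to need its own keys and its own notion of "close enough".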
mkhnews, almost 4 years ago
Ever look into HPCC Systems?