Stream Processing and Probabilistic Methods: Data at Scale

25 points by tylertreat, more than 10 years ago

2 comments

xaa, more than 10 years ago
It's good that the "Big Data" community is finally shifting the paradigm (back) to stream processing. I mean, it is the abstraction behind pipes, which were invented when data we now consider small was Big. Now if only someone will take the UNIX pipe and make it transparently multi-machine instead of writing an ungodly large Java framework to emulate them badly, slowly, and verbosely...

However, I was a little disappointed by the "probabilistic" methods. I was thinking of things like approximate kNN, online regression, that sort of thing, in which you actually trade speed and streamability for accuracy. Bloom filters don't actually lose any accuracy in the example given, since there is a fallback to a database in the case of a false positive. Instead they are an optimization technique.

The more interesting probabilistic methods to me are the ones that say: we are willing to give up the accuracy of the traditional technique, but are hoping to make up for it by being able to process more data. But of course "probabilistic method" is a broad and context-dependent term.
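
To make the Bloom filter point concrete, here is a minimal sketch of a filter sitting in front of a slower store. Everything in it is an assumption for illustration, not code from the article: the `BloomFilter` and `FilteredStore` classes are hypothetical, a plain Python `set` stands in for the database, and the sizes (`m_bits`, `k_hashes`) are arbitrary. A negative answer from the filter is definite, while a positive answer is only "maybe" and gets confirmed against the store, so the final result is always exact.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


class FilteredStore:
    """Hypothetical store: a set stands in for the slow database."""

    def __init__(self):
        self.db = set()
        self.bloom = BloomFilter()
        self.db_lookups = 0  # counts how often the "database" is consulted

    def add(self, key):
        self.db.add(key)
        self.bloom.add(key)

    def contains(self, key):
        if not self.bloom.might_contain(key):
            return False  # definite miss: the database is never touched
        self.db_lookups += 1  # possible hit: confirm against the database
        return key in self.db


store = FilteredStore()
for name in ("alice", "bob"):
    store.add(name)

present = store.contains("alice")  # confirmed by the database
absent = store.contains("carol")   # exact even if the filter says "maybe"
```

A false positive in the filter only costs an extra database lookup; it never changes the answer. That is why the filter here acts as an optimization and not as an accuracy trade-off.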
rawnlq, more than 10 years ago
Another pretty good article on this topic: https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/