TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Stream Processing and Probabilistic Methods: Data at Scale

25 points by tylertreat over 10 years ago

2 comments

xaa over 10 years ago
It's good that the "Big Data" community is finally shifting the paradigm (back) to stream processing. I mean, it is the abstraction behind pipes, which were invented when data we now consider small was Big. Now if only someone will take the UNIX pipe and make it transparently multi-machine instead of writing an ungodly large Java framework to emulate them badly, slowly, and verbosely...

However, I was a little disappointed by the "probabilistic" methods. I was thinking of things like approximate kNN, online regression, that sort of thing, in which you actually trade speed and streamability for accuracy. Bloom filters don't actually lose any accuracy in the example given, since there is a fallback to a database in the case of a false positive. Instead they are an optimization technique.

The more interesting probabilistic methods to me are the ones that say: we are willing to give up the accuracy of the traditional technique, but are hoping to make up for it by being able to process more data. But of course "probabilistic method" is a broad and context-dependent term.
Comment #9048883 not loaded.
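The Bloom-filter-with-database-fallback pattern the comment describes can be sketched as follows. This is a minimal illustrative implementation, not code from the article; the class, the in-memory `database` set, and the parameter sizes are all assumptions chosen for clarity:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

# The filter sits in front of a slow authoritative store (a hypothetical
# in-memory stand-in here). A negative answer skips the store entirely;
# a positive answer falls back to the real lookup, so a false positive
# costs time but never correctness -- the point the comment makes.
database = {"alice", "bob"}
bloom = BloomFilter()
for key in database:
    bloom.add(key)

def exists(key):
    if not bloom.might_contain(key):   # definite "no": skip the expensive check
        return False
    return key in database             # confirm to rule out a false positive
```

Because every positive is re-checked against the store, accuracy is unchanged; the filter only saves lookups for keys it can prove absent.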
rawnlq over 10 years ago
Another pretty good article on this topic: https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/