Tech Echo
A tech news platform built with Next.js, offering tech news and discussion from around the world.


© 2025 Tech Echo. All rights reserved.

The world beyond batch: Streaming 101

71 points by dpmehta02 over 9 years ago

3 comments

alephnil over 9 years ago
Interesting article, but the author often seems to get carried away, as in this passage:

> Quite honestly, I’d take things a step further. I would argue that well-designed streaming systems actually provide a strict superset of batch functionality.

At least from a computer science perspective, this is strictly wrong. For problems like bin packing, it has been shown that there are cases where streaming is provably worse than batch processing. In the batch case you always have all the data available, whereas in the streaming case you need to make decisions before you have seen all the data, which can lead to suboptimal results. From this perspective it is the other way around: batch processing is a strict superset of streaming.

The reason streaming is often preferred to batch processing is that you get results sooner and don't have to wait until all the data is in before you have an answer. Such early answers are often much more valuable than the accurate answer you get at the end.
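The following small Python sketch makes the bin-packing point concrete. It does not come from the comment. Both sides use the same First Fit rule. The batch version can sort the whole input before placing anything (First Fit Decreasing), while the streaming version must place items in arrival order. The input was chosen to expose the gap. It illustrates the comment's argument but does not prove the general lower bounds for online bin packing.

```python
from typing import Iterable, List

CAPACITY = 10  # integer sizes avoid floating-point rounding at the bin edge


def first_fit(items: Iterable[int], capacity: int = CAPACITY) -> List[int]:
    """Put each item into the first open bin with room; return the bin loads."""
    loads: List[int] = []
    for size in items:
        for i, load in enumerate(loads):
            if load + size <= capacity:
                loads[i] = load + size
                break
        else:  # no open bin had room, so open a new one
            loads.append(size)
    return loads


def online_first_fit(stream: Iterable[int]) -> List[int]:
    # Streaming: each item is placed as it arrives, with no knowledge of later items.
    return first_fit(stream)


def offline_first_fit_decreasing(batch: List[int]) -> List[int]:
    # Batch: all items are known up front, so they can be sorted largest-first.
    return first_fit(sorted(batch, reverse=True))


arrivals = [3, 3, 3, 7, 7, 7]  # small items happen to arrive first
print(len(online_first_fit(arrivals)))              # 4
print(len(offline_first_fit_decreasing(arrivals)))  # 3 (optimal: total size 30 / capacity 10)
```

After seeing only the three 3s, the streaming version reasonably packs them into one bin. When the 7s arrive, none of them fits alongside a 3, and the earlier decision cannot be undone.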
chollida1 over 9 years ago
A few weeks ago I made a comment here (https://news.ycombinator.com/item?id=10459992) pointing out that hedge funds are now spending a lot of their processing and research budgets on consuming many different types of data streams.

Complex Event Processing (CEP) has been a mainstay of algorithmic traders for the past 8-10 years and a key tool for amalgamating these streams of real-time data. The author didn't define CEP as it relates to stream processing, but this Quora answer does a good job of trying to:

https://www.quora.com/How-is-stream-processing-and-complex-event-processing-CEP-different

So instead of the database model, where you store your data and then ask questions about it, you ask the question first, push your streams of data through the library, and have it call you back when the question you asked becomes true.

As a practical example of how this is used in algorithmic trading, you might have streams for market data, Twitter sentiment, the sentiment of a real-time news feed, and the futures market, plus a stream indicating whether any Fed reports are coming up.

You might then create a streaming query that says: notify me when a gold stock makes 4 new daily highs within a 5-minute window, gold futures are within 10% of their daily highs, no negative news sentiment for this stock has been seen in the past 30 minutes, and no Fed reports are due today.

If this query reports back that the event has occurred, you could then translate it into an order to be sent to the market.

You give up some performance to the CEP overhead, but it makes maintaining the logic much easier than having each algo manually track the state of each of those streams.

CEP libraries also make backtesting and unit testing easier by letting you step time forward in discrete intervals. If your application needs to be notified 20 milliseconds after an event, you can step time forward in non-real time, probably faster than real time when unit testing, to verify that your callback is called correctly.
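Here is a minimal Python sketch of the "ask the question first" model, applied to a simplified version of the gold example. Every name in it is hypothetical and invented for illustration, and none of it is the API of any real CEP library. This includes `MiniCEP`, `gold_breakout`, the stream names and `FED_REPORT_DUE_TODAY`. Several simplifying assumptions apply:

- Each `"highs"` event marks one new daily high for the stock.
- The `"futures"` stream carries the futures price as a fraction of its daily high.
- The Twitter-sentiment stream is left out.
- `advance` steps a virtual clock forward, like the discrete time stepping described above for testing.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple

Predicate = Callable[["MiniCEP"], bool]
Callback = Callable[[float], None]


@dataclass
class Query:
    predicate: Predicate
    callback: Callback
    was_true: bool = False  # used to fire only on a false -> true transition


class MiniCEP:
    """Hypothetical toy CEP engine: register the question, then push events through it."""

    def __init__(self, retention: float = 30 * 60) -> None:
        self.now = 0.0                # virtual clock, in seconds
        self.retention = retention    # longest window any query looks back over
        self.events: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)
        self.queries: List[Query] = []

    def register(self, predicate: Predicate, callback: Callback) -> None:
        self.queries.append(Query(predicate, callback))

    def window(self, stream: str, seconds: float) -> List[float]:
        """Values seen on `stream` during the last `seconds` of virtual time."""
        return [v for t, v in self.events[stream] if t > self.now - seconds]

    def push(self, stream: str, value: float) -> None:
        self.events[stream].append((self.now, value))
        self._evaluate()

    def advance(self, seconds: float) -> None:
        """Step the clock forward in a discrete jump instead of waiting in real time."""
        self.now += seconds
        self._evaluate()

    def _evaluate(self) -> None:
        # Drop events too old for any window, then re-check every registered query.
        for q in self.events.values():
            while q and q[0][0] <= self.now - self.retention:
                q.popleft()
        for query in self.queries:
            is_true = query.predicate(self)
            if is_true and not query.was_true:
                query.callback(self.now)
            query.was_true = is_true


FED_REPORT_DUE_TODAY = False  # hypothetical flag, e.g. from an economic calendar


def gold_breakout(engine: MiniCEP) -> bool:
    futures = engine.window("futures", engine.retention)  # futures price / daily high
    return (
        len(engine.window("highs", 5 * 60)) >= 4                   # 4 new highs in 5 min
        and bool(futures) and futures[-1] >= 0.90                   # within 10% of daily high
        and not any(s < 0 for s in engine.window("news", 30 * 60))  # no negative news, 30 min
        and not FED_REPORT_DUE_TODAY
    )


fired: List[float] = []
engine = MiniCEP()
engine.register(gold_breakout, fired.append)

engine.push("news", -0.4)         # t=0: negative headline about the stock
engine.advance(25 * 60)           # jump straight to t=1500 s
engine.push("futures", 0.95)      # futures at 95% of their daily high
for _ in range(4):                # four new daily highs, one per minute
    engine.advance(60)
    engine.push("highs", 1.0)
print(fired)                      # [] because the t=0 headline is still within 30 minutes
engine.advance(60)                # t=1800: the headline ages out of the window
print(fired)                      # [1800.0]
```

The callback fires only when the query turns from false to true, so it is not re-invoked while the condition stays true. The test steps 30 simulated minutes without any real waiting. A production engine would index its windows instead of rescanning them on every event.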
127001brewer over 9 years ago
This might be too much of a newbie question, but how do these types of data stores (such as "data lakes") get populated?

For example, and again I might be too uninformed here to ask the right question, do you use ETL-type tools to get and store the data? Or is it usually scripts that pull in (and process) data from various sources?