Analytics data integrity is tough

49 points by pragmacoders, about 7 years ago

3 comments

teej, about 7 years ago
I'm surprised that there isn't a mention of Google Dataflow aka Apache Beam. The Beam programming model is specifically designed to solve nearly all of the problems this post is addressing.

> It is likely that one day, I'll need to shard the data and distribute the processing amongst multiple servers. But no company I've used this with currently has enough data flowing through its analytics system, or intense amounts of real-time processing, to warrant such a complexity.

This solution is so over-engineered for low data volume. You could capture all of the business value for 10% of the engineering effort by just dumping this data into a database meant for analytics. And then you'd at least have an answer for things like fixing broken data, making full-history business logic changes, merging events, etc.

If you're in AWS, sending your events from snowplow into S3 and then into Redshift/Athena/Presto/PostgreSQL is the way to go.
sturgill, about 7 years ago
I’d recommend the event producer send a UUID that is then the primary key on the events table. The producer should also send the timestamp at which the event occurred.

I could be missing something, but that seems to solve both the duplicate event firing (an upsert command based on the UUID makes duplicate event writing a non-issue) and the timing issues.

Though I’m still incredibly skeptical of “real-time analytics.” The number of business cases that require actual real-time analysis is pretty limited. High frequency trading and...?
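
A minimal sketch of the idea in this comment, assuming SQLite as a stand-in database. The table name `events`, its columns, and the helpers `make_event`, `create_events_table`, `record_event`, and `count_events` are all hypothetical; none of them come from the article or the comment. The producer attaches a UUID and the time the event occurred, and the writer uses a conflict clause on the primary key so that a second delivery of the same event changes nothing.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def make_event(name):
    """Producer side: attach a UUID and the time the event occurred."""
    return {
        "event_id": str(uuid.uuid4()),
        "name": name,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }


def create_events_table(conn):
    """Hypothetical schema: the producer-supplied UUID is the primary key."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            event_id    TEXT PRIMARY KEY,
            name        TEXT NOT NULL,
            occurred_at TEXT NOT NULL
        )
        """
    )


def record_event(conn, event):
    """Writer side: a repeat delivery of the same event_id is a no-op."""
    conn.execute(
        """
        INSERT INTO events (event_id, name, occurred_at)
        VALUES (:event_id, :name, :occurred_at)
        ON CONFLICT(event_id) DO NOTHING
        """,
        event,
    )
    conn.commit()


def count_events(conn):
    """Return the number of rows currently stored in the events table."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Calling `record_event` twice with the same event dictionary leaves a single row, while the stored `occurred_at` stays the producer's timestamp rather than the time of arrival. The `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or later; PostgreSQL accepts the same clause.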
slap_shot, about 7 years ago
I have a lot of issues with this project and I’m on my phone so I can’t outline them all. At a quick glance, a few comments have already addressed some of these concerns.

But my biggest question is WHY did this person feel it necessary to do this project? From first glance, there is no way Crystal is producing the traffic required to roll this solution. There are dozens of companies that can solve this problem for a few hundred dollars a month and have handled all the problems discussed in this article at serious scale for their customers.

The most irritating part is that the developer states in the beginning why he did this: because it’s fun.

Disclosure: I’m a founder of a company whose core product is a real-time analytics platform for web and mobile. Of course I’m going to recommend “Buy” for a small company like this in a Build vs Buy analysis. But when “professional engineers” say they’re building projects “because they are fun”, a lot of people suffer.