Effective Web App Analytics with Redis

64 points by hdeshev about 13 years ago

5 comments

trun about 13 years ago
Great article. I've built a number of systems very similar to this and have found Redis to be a fantastic platform, both in terms of reliability and flexibility. I'll share a few tips that I think complement your approach.

- When you want to compute metrics for multiple intervals (hour / day / month / etc.), Redis' MULTI / EXEC constructs make transactional updates to multiple keys a snap. Additionally, batching (which is supported by most Redis clients) can dramatically improve performance.

- You can use Redis sets for computing uniques in realtime. You can also use set operations like SUNION to compute uniques across multiple time periods relatively quickly. For example, SUNION the 24 hourly intervals to get the total uniques for the day. You just have to be careful: large numbers of uniques eat up your available memory very quickly. EXPIREAT helps ensure things get cleaned up automatically.

- Using a Redis list as an event queue is a great way to further ensure atomicity. Use RPOPLPUSH to move events to an 'uncommitted' queue while processing a batch of events. If you have to roll back, just pop them back onto the original list.
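A rough sketch of how those three tips could fit together with the redis-py client; this is not code from the article, and the key names, the handle() helper, and the 31-day expiry window are illustrative assumptions.

```python
import time
import redis

r = redis.Redis()

def record_hit(page, user_id, ts=None):
    ts = ts or int(time.time())
    hour = time.strftime("%Y%m%d%H", time.gmtime(ts))
    day, month = hour[:8], hour[:6]

    # MULTI/EXEC via a transactional pipeline: all interval counters are
    # updated atomically, and the commands go out in one batched round trip.
    pipe = r.pipeline(transaction=True)
    for period in (hour, day, month):
        pipe.hincrby(f"hits:{page}", period, 1)
    # Hourly uniques live in a set; EXPIREAT trims them automatically.
    pipe.sadd(f"uniq:{page}:{hour}", user_id)
    pipe.expireat(f"uniq:{page}:{hour}", ts + 31 * 24 * 3600)
    pipe.execute()

def daily_uniques(page, day_hours):
    # SUNION across the 24 hourly sets gives uniques for the whole day.
    keys = [f"uniq:{page}:{h}" for h in day_hours]
    return len(r.sunion(*keys))

def process_events():
    # RPOPLPUSH parks each event on an 'uncommitted' list while it is
    # handled; on failure, push it back onto the original queue.
    while (event := r.rpoplpush("events", "events:uncommitted")) is not None:
        try:
            handle(event)  # hypothetical event handler
            r.lrem("events:uncommitted", 1, event)
        except Exception:
            r.lpush("events", event)
            r.lrem("events:uncommitted", 1, event)
```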
thibaut_barrere about 13 years ago
First, thanks for sharing! Then a comment on this:

"I've done implementations of the above using SQL databases (MySQL) and it wasn't fun at all. The storage mechanism is awkward - put all your values in a single table and have them keyed according to stats name and period. That makes querying for the data weird too. That is not a showstopper though - I could do it. The real problem is hitting your main DB a couple of times in a web request, and that is definitely a no-no."

This is not a SQL vs NoSQL issue: decoupling the reporting system from your main (production/transaction) system is a widely advised practice in "business intelligence". Use a different instance, with a schema designed for reporting.

You can use Redis for that (and I actually do!), but you can also use MySQL or any other RDBMS.

It's fairly easy to implement: one row for each fact, then foreign keys to a date dimension and an hour dimension (see [1]); then you can sum over date ranges and hour ranges, drill down, etc., on many different metrics.

[1] https://github.com/activewarehouse/activewarehouse-etl-sample/tree/master/etl
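For concreteness, here is a minimal sketch of that reporting-schema idea, using SQLite as a stand-in for the separate reporting instance; the table and column names are assumptions, not taken from the linked activewarehouse-etl sample.

```python
import sqlite3

# A separate reporting database: one fact table keyed to date and hour
# dimensions, kept away from the production/transactional instance.
conn = sqlite3.connect("reporting.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS date_dim (date_key INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE IF NOT EXISTS hour_dim (hour_key INTEGER PRIMARY KEY, hour INTEGER);
CREATE TABLE IF NOT EXISTS pageview_facts (
    date_key INTEGER REFERENCES date_dim(date_key),
    hour_key INTEGER REFERENCES hour_dim(hour_key),
    page TEXT,
    views INTEGER
);
""")

# Sum a metric over a date range, drilling down by hour.
rows = conn.execute("""
    SELECT d.date, h.hour, SUM(f.views)
    FROM pageview_facts f
    JOIN date_dim d ON d.date_key = f.date_key
    JOIN hour_dim h ON h.hour_key = f.hour_key
    WHERE d.date BETWEEN '2012-03-01' AND '2012-03-31'
    GROUP BY d.date, h.hour
""").fetchall()
```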
ihsw about 13 years ago
> The above mechanism needs some finishing touches. The first is data expiration. If you don't need daily data for more than 30 days back, you need to delete it yourself. The same goes for expiring monthly data - in our case stuff older than 12 months. We do it in a cron job that runs once a day. We just loop over all series and trim the expired elements from the hashes.

Rather than iterating over the entire list of series and checking for expired elements, you can use a sorted set and assign a time-based score. The cron job can still run once a day, but it can simply find the members of that sorted set whose score falls below a certain threshold, which will almost certainly be faster.

Naturally this will increase memory usage (which may be undesired), but it's food for thought. Eventually the looping and trimming of expired hashes could be coded using Lua server-side scripting in redis-2.6, which is interesting in a different way and has its own challenges.
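A small sketch of that sorted-set expiration idea, under the assumption of a plain redis-py client and made-up key names: each member is scored with its timestamp, so the daily cron can delete everything older than the cutoff in a single range call.

```python
import time
import redis

r = redis.Redis()

def record(series, member, ts=None):
    ts = ts or int(time.time())
    # Score is the event timestamp, so the set stays ordered by time.
    r.zadd(f"series:{series}", {member: ts})

def expire_old(series, max_age_days=30):
    cutoff = time.time() - max_age_days * 24 * 3600
    # One range delete replaces looping over every element of the series.
    r.zremrangebyscore(f"series:{series}", "-inf", cutoff)
```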
tptacek about 13 years ago
There's a blog post from Salvatore somewhere talking about how he marshalled time series data into strings, which made me think the naive/straightforward approach was suboptimal. I always thought ZSETs indexed by time_ts would be a good fit for this.
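A brief sketch of the ZSET-indexed-by-timestamp idea: the score is the sample's timestamp, so a time window becomes a ZRANGEBYSCORE call. The "ts:value" member encoding is my own assumption, not something from Salvatore's post.

```python
import redis

r = redis.Redis()

def record_point(metric, ts, value):
    # Members must be unique, so the timestamp is embedded in the member.
    r.zadd(f"ts:{metric}", {f"{ts}:{value}": ts})

def points_between(metric, start_ts, end_ts):
    # Range query over the score (timestamp) returns the window's samples.
    members = r.zrangebyscore(f"ts:{metric}", start_ts, end_ts)
    return [tuple(m.decode().split(":", 1)) for m in members]
```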
bradleyland about 13 years ago
This is cool, but if you're looking to work with time series data, you should definitely have a look at RRD. A lot of the operations you'd want to perform on time series data are available internally with RRD. RRD can also do some cool stuff like generate graphs.