View Counting at Reddit

489 points by strzalek almost 8 years ago

16 comments

haburka almost 8 years ago
I love the article on HyperLogLog! It's a really good read even if you're not interested in algorithms. I've always liked number theory, and I find it very interesting that you can estimate how many uniques there are from the longest run of zeroes you have seen in a hash.

I suppose this could be broken by injecting a unique visitor ID that hashes to something with an absurd number of zeroes? That's assuming the user has control over their user ID and that I'm understanding the algorithm correctly.
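
For readers who want to see the idea haburka describes in code, here is a minimal HyperLogLog sketch in Python. It is an illustration only, not Reddit's or Redis's implementation: the class name `HyperLogLog`, the SHA-1-based `_hash64` helper and the precision `p = 12` (4096 registers) are assumptions chosen for readability. One detail matters for the attack question: the estimate is not driven by a single longest run of zeroes. The first `p` bits of each hash pick one of `2**p` registers, each register keeps the longest run it has seen, and the count comes from a harmonic mean over all registers, so one crafted ID can raise only one register.

```python
import hashlib
import math


def _hash64(item: str) -> int:
    """Hypothetical helper: 64-bit hash taken from the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (no sparse mode, no bias tables)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        x = _hash64(item)
        w = 64 - self.p
        idx = x >> w                      # first p bits choose a register
        rest = x & ((1 << w) - 1)         # remaining w bits
        rank = w - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank    # keep the longest run this register has seen

    def count(self) -> float:
        # alpha * m * (harmonic mean of 2**r over the m registers)
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on the empty registers.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"user-{i}")
    for i in range(50_000):    # repeat visits leave every register unchanged
        hll.add(f"user-{i}")
    print(round(hll.count()))  # close to 100000, but not exact
```

With `p = 12` the typical relative error is about 1.04/√4096 ≈ 1.6%. Pinning a single register to its maximum value moves the estimate only marginally; an attacker who can submit many IDs crafted against a known, non-cryptographic hash could still push many registers up at once.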
nyar almost 8 years ago
"We want to better communicate the scale of Reddit to our users."

If that's true, why did they hide vote numbers on comments and posts? It used to say "xxx upvotes xxx downvotes"; now it just gives a single number and hides the breakdown.
mxmxm almost 8 years ago
Counting views/impressions in combination with Apache Kafka sounds like the ideal use case for a stream processor like Apache Flink. It supports very large state, which can be managed off-heap. This should let you count the exact number of unique views in real time with exactly-once semantics. Here is a blog post on large-scale counting with more details. It also includes a comparison with other streaming technologies like Samza and Spark: https://data-artisans.com/blog/counting-in-streams-a-hierarchy-of-needs

Also check out this blog post by a Twitter engineer on counting ad impressions: https://data-artisans.com/blog/extending-the-yahoo-streaming-benchmark
noamhacker almost 8 years ago
How do you test a system like this for accuracy? Is this done by simulating millions of unique requests?
alzaeem almost 8 years ago
So how do they determine whether a user has already viewed a post? I would think unique counting is done with the HyperLogLog counter, but the article says this decision is made by the Nazar system, which doesn't use the HyperLogLog counter in Redis.
stoicking almost 8 years ago
Given how much simpler it is to count total views than unique user views, why is it more valuable to count unique user views?
tudorconstantin almost 8 years ago
Wouldn't it have been easier to simply increment a counter for each visit and then set a short-lived cookie in the browser for that post? And put the spam detection system before the counter increment.
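
tudorconstantin's alternative can be sketched server-side as below. This is a hypothetical illustration: `CookieStyleViewCounter`, `window_seconds` and the `is_spam` callback are made-up names, and a per-(viewer, post) expiry map stands in for the browser cookie. A real cookie lives on the client, so anyone who clears or refuses cookies would be counted again on every visit.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CookieStyleViewCounter:
    """Hypothetical sketch: plain counter plus a short-lived 'already viewed' marker.

    The seen_until map stands in for a browser cookie scoped to one post.
    """

    def __init__(self, window_seconds: float,
                 is_spam: Callable[[str], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window_seconds
        self.is_spam = is_spam
        self.clock = clock
        self.counts: Dict[str, int] = {}
        self.seen_until: Dict[Tuple[str, str], float] = {}

    def record_view(self, viewer_id: str, post_id: str) -> bool:
        """Count the view unless it is spam or the 'cookie' is still valid."""
        if self.is_spam(viewer_id):      # spam detection runs before the increment
            return False
        now = self.clock()
        key = (viewer_id, post_id)
        expiry: Optional[float] = self.seen_until.get(key)
        if expiry is not None and expiry > now:
            return False                 # viewed recently: not counted again
        self.seen_until[key] = now + self.window  # "set the cookie"
        self.counts[post_id] = self.counts.get(post_id, 0) + 1
        return True
```

Passing a fake `clock` makes the window easy to test. Note that this server-side stand-in keeps one entry per (viewer, post) pair, which is the kind of per-post state that grows with traffic; the cookie version moves that state to the client at the cost of trusting it.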
tsukaisute almost 8 years ago
A weird thing I've been seeing on Reddit is comment upvotes being off by one periodically on page refreshes. Reload, you get 3. Reload again, you get 4. Again, you get 3. Seems like a replication issue?
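
The replication guess can be made concrete with a toy model. Everything here is hypothetical (`ToyCluster`, manual `replicate` calls, round-robin reads across a primary and one lagging replica) and says nothing about how Reddit actually serves vote counts; Reddit has also said it deliberately fuzzes displayed vote counts as an anti-spam measure, which is another possible cause.

```python
import itertools
from typing import Dict


class Node:
    """One copy of the vote data."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}


class ToyCluster:
    """Hypothetical: writes hit the primary, replication is applied by hand,
    and reads rotate round-robin across all nodes."""

    def __init__(self, n_replicas: int) -> None:
        self.primary = Node()
        self.replicas = [Node() for _ in range(n_replicas)]
        self._reader = itertools.cycle([self.primary] + self.replicas)

    def upvote(self, comment_id: str) -> None:
        self.primary.data[comment_id] = self.primary.data.get(comment_id, 0) + 1

    def replicate(self, replica_index: int) -> None:
        # Copy the primary's current state to one replica.
        self.replicas[replica_index].data = dict(self.primary.data)

    def read_score(self, comment_id: str) -> int:
        # Each "page refresh" is served by the next node in the rotation.
        return next(self._reader).data.get(comment_id, 0)
```

With three replicated upvotes and a fourth that has not reached the replica yet, successive reads alternate between 4 and 3 until replication catches up.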
theomega almost 8 years ago
Very interesting article, thanks for publishing.

I have two related questions:

1. I assume the process that reads from Cassandra and writes back to Redis is parallelized, if not distributed. How do you ensure correctness? Implementing 2PC seems like extreme overhead. Or do you lock in Redis?
2. What database is used to actually store the view counts? Cassandra's counters are, AFAIK, not very reliable...
ronalbarbaren almost 8 years ago
Thanks, Reddit guys. I hope YouTube's engineers will post a similar article. I'm still curious how YouTube counts views.
hellbanner almost 8 years ago
Slightly OT, but I wish Reddit would use traditional forum-style replies to push threads up, instead of the positive feedback loop of votes, where opinions that agree with the majority get upvotes, which bring views, which bring proportionally more upvotes.
federicoponzi almost 8 years ago
Probably a noob question, but:

>> Nazar will then alter the event, adding a Boolean flag indicating whether or not it should be counted, before sending the event back to Kafka.

Why don't they just discard it instead of putting the event back into Kafka?
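
One plausible reason to flag rather than drop, assumed here and not confirmed by the article, is that other consumers of the same Kafka topic may still want the rejected events, for example for abuse analysis or for debugging the counting rules. The sketch below uses plain Python lists in place of Kafka topics; `ViewEvent`, `should_count`, `annotate`, `count_views`, `rejected_by_user` and the blocklist rule in the usage are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ViewEvent:
    user_id: str
    post_id: str
    should_count: bool = True  # hypothetical field mirroring Nazar's Boolean flag


def annotate(events: Iterable[ViewEvent],
             rule: Callable[[ViewEvent], bool]) -> List[ViewEvent]:
    """Stand-in for the filtering stage: flag every event instead of dropping any."""
    return [replace(e, should_count=rule(e)) for e in events]


def count_views(events: Iterable[ViewEvent]) -> Dict[str, int]:
    """A counting consumer: only events flagged should_count are counted."""
    counts: Dict[str, int] = {}
    for e in events:
        if e.should_count:
            counts[e.post_id] = counts.get(e.post_id, 0) + 1
    return counts


def rejected_by_user(events: Iterable[ViewEvent]) -> Dict[str, int]:
    """A second consumer that only cares about the events the rule rejected."""
    counts: Dict[str, int] = {}
    for e in events:
        if not e.should_count:
            counts[e.user_id] = counts.get(e.user_id, 0) + 1
    return counts
```

For instance, annotating three events with `rule=lambda e: e.user_id != "bot"` keeps all three on the stream: the counter sees two counted views, while the second consumer can still see that `bot` was rejected once.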
golergka almost 8 years ago
A beautiful example of how a feature that seems so easy to an end user can be complex at scale.
fiatjaf almost 8 years ago
At https://trackingco.de/ we store events in Redis and compile them daily into a reduced string format, storing these in CouchDB.
ugh123 almost 8 years ago
Forgive my ignorance, but isn't this what Google Analytics is for?
qrbLPHiKpiux almost 8 years ago
Not applied to /r/the_donald, however.