TechEcho

16 comments

haburka almost 8 years ago

I love the article on HyperLogLog! It's a really good read even if you're not interested in algorithms. I've always liked number theory, and I think it's very interesting that you can estimate how many uniques there are by looking at the longest run of leading zeroes in the hashes you've seen. I suppose this could be broken by injecting a unique visitor ID that hashes to something with an absurd number of zeroes? That assumes the user has control over their user ID and that I'm understanding the algorithm correctly.
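The intuition in this comment can be sketched in a few lines of Python. This is a hypothetical single-register estimator in the spirit of Flajolet-Martin and HyperLogLog, not Reddit's or Redis's implementation: it remembers the longest run of leading zero bits seen in any 64-bit hash (SHA-256 truncated to 8 bytes, an arbitrary choice here) and reports 2 raised to that run as a rough order-of-magnitude guess. The `forge_id` helper, also hypothetical, brute-forces an ID whose hash starts with at least 16 zero bits, showing how one crafted ID can inflate a single-register estimate.

```python
import hashlib


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


def leading_zeros64(h: int) -> int:
    """Number of leading zero bits in a 64-bit value (64 when h == 0)."""
    return 64 - h.bit_length()


class LongestRunEstimator:
    """Remember the longest run of leading zeros seen so far.

    With uniformly distributed hashes, a run of k leading zeros shows up
    about once every 2**k distinct items, so 2**k is a (very noisy) guess
    at the number of distinct items.
    """

    def __init__(self) -> None:
        self.max_zeros = 0

    def add(self, item: str) -> None:
        # Duplicates hash identically, so re-adding an item changes nothing.
        self.max_zeros = max(self.max_zeros, leading_zeros64(hash64(item)))

    def estimate(self) -> int:
        return 2 ** self.max_zeros


def forge_id(min_zeros: int, prefix: str = "attacker") -> str:
    """Brute-force an ID whose hash starts with at least min_zeros zero bits.

    Expected cost is about 2**min_zeros hash evaluations.
    """
    i = 0
    while True:
        candidate = f"{prefix}-{i}"
        if leading_zeros64(hash64(candidate)) >= min_zeros:
            return candidate
        i += 1


if __name__ == "__main__":
    est = LongestRunEstimator()
    for i in range(1000):
        est.add(f"user-{i}")
    print("honest estimate:", est.estimate())
    est.add(forge_id(16))
    print("after one forged ID:", est.estimate())  # at least 2**16
```

HyperLogLog itself splits hashes across many registers and combines them with a harmonic mean, so one forged ID moves only a single register. That limits the damage from a single crafted ID, although an attacker who controls many IDs can still skew the estimate.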

nyar almost 8 years ago

"We want to better communicate the scale of Reddit to our users." If that's true, why did they hide vote numbers on comments and posts? It used to say "xxx upvotes, xxx downvotes"; now it just gives a single number and hides the breakdown.

mxmxm almost 8 years ago

Counting views/impressions in combination with Apache Kafka sounds like the ideal use case for a stream processor like Apache Flink. It supports very large state, which can be managed off-heap. This should enable you to count the exact number of unique views in real time with exactly-once semantics. Here is a blog post on large-scale counting with more details; it also includes a comparison with other streaming technologies like Samza and Spark: https://data-artisans.com/blog/counting-in-streams-a-hierarchy-of-needs

Also check out this blog post by a Twitter engineer on counting ad impressions: https://data-artisans.com/blog/extending-the-yahoo-streaming-benchmark

noamhacker almost 8 years ago

How do you test a system like this for accuracy? Is this done by simulating millions of unique requests?
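One straightforward approach is exactly what the comment suggests: feed an estimator a known number of synthetic unique IDs and compare its answer with the true count. Below is a minimal Python sketch, assuming a textbook HyperLogLog (2**p registers, harmonic-mean estimate, linear-counting correction for small cardinalities) rather than the Redis or Reddit implementation; `relative_error`, the `seed` parameter, and the `sim-user-N` IDs are hypothetical. The HyperLogLog paper gives a typical relative error of about 1.04/sqrt(m) for m registers, which the harness prints for comparison.

```python
import hashlib
import math


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Textbook HyperLogLog with m = 2**p registers (illustrative; assumes p >= 7)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        h = hash64(item)
        idx = h >> (64 - self.p)               # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rho: 1-based position of the leftmost 1-bit in `rest`.
        rho = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rho)

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        empty = self.registers.count(0)
        if raw <= 2.5 * self.m and empty:
            # Small-range correction: linear counting on empty registers.
            return self.m * math.log(self.m / empty)
        return raw


def relative_error(n: int, p: int = 12, seed: str = "sim") -> float:
    """Feed n distinct synthetic IDs into a fresh sketch; return |estimate - n| / n."""
    hll = HyperLogLog(p)
    for i in range(n):
        hll.add(f"{seed}-user-{i}")
    return abs(hll.estimate() - n) / n


if __name__ == "__main__":
    p = 12
    typical = 1.04 / math.sqrt(1 << p)  # about 1.6% for p = 12
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7}  error={relative_error(n, p):.4f}  typical~{typical:.4f}")
```

Scaling `n` into the millions only changes the loop count, and repeating the run with different `seed` values gives a distribution of errors rather than a single sample.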

alzaeem almost 8 years ago

So how do they determine whether a user has already viewed a post? I would think unique counting is done with the HyperLogLog counter, but the article says this decision is made by the Nazar system, which doesn't use the HyperLogLog counter in Redis.

stoicking almost 8 years ago

Given how much simpler it is to count total views than unique user views, why is it more valuable to count unique user views?

tudorconstantin almost 8 years ago

Wouldn't it have been easier to simply increment a counter for each visit and then set a short-lived cookie in the browser for that post? And put the spam detection system before the counter increment.
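The cookie-based scheme proposed here might look like the following minimal Python sketch. Everything in it is hypothetical: the `viewed_<post_id>` cookie name, the 30-minute lifetime, the in-memory `Counter`, and the placeholder `is_spam` check standing in for a real spam filter that runs before the increment. It uses only the standard library's `http.cookies` module.

```python
from collections import Counter
from http.cookies import SimpleCookie

# Hypothetical "short-lived" window during which repeat visits are not counted.
VIEW_COOKIE_TTL_SECONDS = 30 * 60

# In-memory stand-in for a real counter store.
view_counts: Counter = Counter()


def is_spam(request_meta: dict) -> bool:
    """Placeholder spam check; a real system would use far richer signals."""
    return request_meta.get("user_agent", "").lower().startswith("bot")


def handle_view(post_id: str, cookie_header: str, request_meta: dict) -> str:
    """Count a view unless it looks like spam or the browser already has a
    cookie for this post. Returns a Set-Cookie value, or "" if none is needed.

    Assumes post_id contains only characters legal in a cookie name.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    name = f"viewed_{post_id}"

    # Spam detection runs before the counter increment, as the comment suggests.
    if is_spam(request_meta) or name in incoming:
        return ""

    view_counts[post_id] += 1

    outgoing = SimpleCookie()
    outgoing[name] = "1"
    outgoing[name]["max-age"] = VIEW_COOKIE_TTL_SECONDS
    outgoing[name]["path"] = "/"
    return outgoing[name].OutputString()


if __name__ == "__main__":
    print(handle_view("abc123", "", {"user_agent": "Mozilla/5.0"}))
    handle_view("abc123", "viewed_abc123=1", {"user_agent": "Mozilla/5.0"})
    print(view_counts["abc123"])  # 1: the repeat visit carried the cookie
```

One trade-off of this design: uniqueness is per browser rather than per user, so clearing cookies, opening a private window, or switching devices produces a new count, and a client that never stores cookies is counted on every request that passes the spam check.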

tsukaisute almost 8 years ago

A weird thing I've been seeing on Reddit is comment upvote counts being off by one intermittently on page refresh. Reload, you get 3. Reload again, you get 4. Again, you get 3. Seems like a replication issue?

theomega almost 8 years ago

Very interesting article, thanks for publishing. I have two related questions:

1. I assume the process that reads from Cassandra and writes back to Redis is parallelized, if not distributed. How do you ensure correctness? Implementing 2PC seems like extreme overhead. Or do you lock in Redis?

2. What database is used to actually store the view counts? Cassandra's counters are, AFAIK, not very reliable...

ronalbarbaren almost 8 years ago

Thanks, Reddit folks. I hope YouTube's engineers will post a similar article; I'm still curious how YouTube counts views.

hellbanner almost 8 years ago

Slightly OT, but I wish Reddit would use traditional forum-style replies to bump threads, instead of the positive feedback loop where opinions that agree with the majority get upvotes, which bring more views, which in turn bring proportionally more upvotes.

federicoponzi almost 8 years ago

Probably a noob question, but:

> Nazar will then alter the event, adding a Boolean flag indicating whether or not it should be counted, before sending the event back to Kafka.

Why don't they just discard it instead of putting the event back into Kafka?
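The difference between flagging and discarding can be shown with a small Python sketch. The filtering rule is hypothetical and is not the article's Nazar logic: a repeat view of the same post by the same user within a 10-minute window is marked `should_count=False`. `ViewEvent`, `annotate`, and the window length are illustrative names and values only.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ViewEvent:
    post_id: str
    user_id: str
    timestamp: float                     # seconds since epoch
    should_count: Optional[bool] = None  # filled in by the annotator


def annotate(events: Iterable[ViewEvent], window_seconds: float = 600.0) -> List[ViewEvent]:
    """Return copies of the events with should_count set, dropping none of them.

    Hypothetical rule: a (user, post) view is counted only if no counted view
    of the same pair happened within the previous window_seconds.
    Assumes events arrive in timestamp order.
    """
    last_counted: Dict[Tuple[str, str], float] = {}
    annotated = []
    for event in events:
        key = (event.user_id, event.post_id)
        prev = last_counted.get(key)
        count_it = prev is None or event.timestamp - prev >= window_seconds
        if count_it:
            last_counted[key] = event.timestamp
        annotated.append(replace(event, should_count=count_it))
    return annotated


if __name__ == "__main__":
    stream = [
        ViewEvent("p1", "alice", 0.0),
        ViewEvent("p1", "alice", 30.0),   # repeat within the window
        ViewEvent("p1", "bob", 45.0),
        ViewEvent("p1", "alice", 900.0),  # outside the window again
    ]
    out = annotate(stream)
    print("raw events:", len(out))                             # 4
    print("counted views:", sum(e.should_count for e in out))  # 3
```

One consequence of flagging rather than discarding: every event stays in the stream, so a downstream consumer can compute both the filtered count and the raw event count from the same topic, whereas dropped events would be unrecoverable downstream.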

golergka almost 8 years ago

A beautiful example of how a feature that seems so easy to an end user can be complex at scale.

fiatjaf almost 8 years ago

At https://trackingco.de/ we store events in Redis and compile them daily into a reduced string format, which we store in CouchDB.

ugh123 almost 8 years ago

Forgive my ignorance, but isn't this what Google Analytics is for?

qrbLPHiKpiux almost 8 years ago

Not applied to /r/the_donald, however.

View Counting at Reddit