Normalizing Ratings

51 points by Symmetry 10 days ago

18 comments

nlh 9 days ago

Similarly - one of my biggest complaints about almost every rating system in production is how absolutely lazy they are. By that I mean everyone seems to think "the object's collective rating is an average of all the individual ratings" is good enough. It's not.

Take any given Yelp / Google / Amazon page and you'll see some distribution like this:

User 1: "5 stars. Everything was great!"

User 2: "5 stars. I'd go here again!"

User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"

Yelp: 3.6 stars average rating.

One thing I always liked about FourSquare was that they did NOT use this lazy method. Their score was actually intelligent - it checked things like how often someone would return, how much time they spent there, etc., and weighted a review accordingly.

tibbar 9 days ago

One of my favorite algorithms for this is Expectation Maximization [0].

You would start by estimating each driver's rating as the average of their ratings, and then estimate the bias of each rider by comparing the average rating they give to the estimated score of their drivers. Then you repeat the process iteratively until you see both scores (driver rating and rider bias) converge.

[0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
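The loop described in this comment can be sketched roughly as below. This is only a minimal illustration, assuming an additive model (rating = driver score + rider bias); the function name `fit_scores_and_biases`, the rider and driver labels, and the toy data are all hypothetical. As written it is plain alternating estimation in the spirit of EM rather than a full EM implementation with explicit latent-variable posteriors.

```python
from collections import defaultdict

def fit_scores_and_biases(ratings, n_iters=100, tol=1e-9):
    """ratings: iterable of (rider, driver, stars) tuples.

    Assumes stars ~= driver_score[driver] + rider_bias[rider].
    Returns (driver_score, rider_bias) as dicts.
    """
    by_driver = defaultdict(list)
    by_rider = defaultdict(list)
    for rider, driver, stars in ratings:
        by_driver[driver].append((rider, stars))
        by_rider[rider].append((driver, stars))

    driver_score = {d: 0.0 for d in by_driver}
    rider_bias = {r: 0.0 for r in by_rider}

    for _ in range(n_iters):
        # Step 1: driver score = mean rating after removing each rider's bias.
        # On the first pass all biases are 0, so this is the plain average.
        new_score = {
            d: sum(s - rider_bias[r] for r, s in obs) / len(obs)
            for d, obs in by_driver.items()
        }
        # Step 2: rider bias = mean gap between the rating a rider gave
        # and the current estimated score of the driver they rated.
        new_bias = {
            r: sum(s - new_score[d] for d, s in obs) / len(obs)
            for r, obs in by_rider.items()
        }
        # Scores and biases are only identifiable up to a shared constant,
        # so keep the biases centered on zero.
        mean_bias = sum(new_bias.values()) / len(new_bias)
        new_bias = {r: b - mean_bias for r, b in new_bias.items()}

        # Largest change in any estimate since the previous pass.
        delta = max(
            max(abs(new_score[d] - driver_score[d]) for d in new_score),
            max(abs(new_bias[r] - rider_bias[r]) for r in new_bias),
        )
        driver_score, rider_bias = new_score, new_bias
        if delta < tol:
            break
    return driver_score, rider_bias

# Toy data: one harsh rider and one generous rider share driver A.
ratings = [
    ("harsh", "A", 3), ("harsh", "B", 3),
    ("generous", "A", 5), ("generous", "C", 5),
]
scores, biases = fit_scores_and_biases(ratings)
```

On this toy data the raw averages are 4, 3 and 5 for drivers A, B and C, but the estimates settle at roughly 4 for all three, with the harsh rider's bias near -1 and the generous rider's near +1. The shared driver A is what lets the two riders' habits be compared at all; with no overlap between riders, the biases would not be recoverable.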

stevage 9 days ago

I like rating systems from -2 to +2 for this reason.

The big rating problem I have is with sites like boardgamegeek, where ratings are treated by different people as either an objective rating of how good the game is within its category, or subjectively how much they like (or approve of) the game. They're two very different things and it makes the ratings much less useful than they could be.

They also suffer a similar problem in that most games score 7 out of 10. 8 is exceptional, 6 is bad, and 5 is disastrous.

homeonthemtn 9 days ago

I'd rather we just did a 3-point rating:

1. Bad
2. Fine
3. Great

On a 5-point scale, 2 and 4 are irrelevant and/or a wild guess or user defined/specific.

Most of the time our rating systems devolve into roughly this state anyway. E.g.:

5 is excellent
4.x is fine
<4 is problematic

And then there's a subdomain in the area between 4 and 5 where a 4.1 is questionable, 4.5 is fine and 4.7+ is excellent.

In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....

Let's just do 3 stars (no decimal) and call it a day.

Retr0id 9 days ago

> I'm genuinely mystified why its not applied anywhere I can see.

I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.

If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.

mzmzmzm 9 days ago

A problem with accounting for "above average" service is sometimes I don't want it. If a driver goes above and beyond, offering a water bottle or something else exceptional, occasionally I would rather be left alone during a quiet, impersonal ride.

parrit 9 days ago

For Uber you don't need a rating at all. The tracking system knows if they were late, if they took a good route and if they dropped you off at the wrong location.

Anything really bad can be dealt with via a complaint system.

Anything exceptional could be asked for in a free-text field when giving a tip.

Who is going to read all those text fields and classify them? AI!

pbronez 9 days ago

One formal measure of this is inter-rater reliability: https://en.wikipedia.org/wiki/Inter-rater_reliability
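Inter-rater reliability covers several statistics. As one hedged illustration, here is a minimal sketch of Cohen's kappa, a common choice for two raters assigning categorical labels to the same items: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the star ratings below are made up for the example, and treating stars as unordered categories is a simplifying assumption (a weighted kappa may suit ordinal scales better).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items, in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same label
    # if each drew independently from their own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical star ratings from two riders for the same four drivers.
kappa = cohens_kappa([5, 5, 4, 3], [5, 4, 4, 3])
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance, so a low kappa across riders would suggest the star scale means different things to different people.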

rossdavidh 9 days ago

I have often had the same thought, and I have to believe the reason is that the companies' bottom line is not impacted the tiniest bit by their rating systems. It wouldn't be that hard to do better, but anything that takes a non-zero amount of attention and effort to improve has to compete with all of those other priorities. As far as I can tell, they just don't care at all about how useful their rating system is.

Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.

adrmtu 9 days ago

Isn't this basically a de-biasing problem? Treat each rider's ratings as a random variable with its own mean μᵤ and variance σᵤ², then normalize. Basically compute z = (r – μᵤ)/σᵤ, then remap z back onto a 1–5 scale so "normal" always centers around ~3. You could also add a time decay to weight recent rides higher, to adapt when someone's rating habits drift.

Has anyone seen a live system (Uber, Goodreads, etc.) implement per-user z-score normalization?
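A minimal sketch of the per-user z-score idea, for illustration only. The function name, the choice of one star per standard deviation (`spread`), the clamping to the 1–5 range, and the handling of a rider who always gives the same score are all assumptions not spelled out in the comment; the time-decay variant is left out.

```python
from statistics import mean, pstdev

def normalize_ratings(user_ratings, center=3.0, spread=1.0, lo=1.0, hi=5.0):
    """Remap one rider's raw ratings so their personal average lands on `center`.

    spread = how many stars one standard deviation is worth (assumed value).
    """
    mu = mean(user_ratings)
    sigma = pstdev(user_ratings)  # population std dev of this rider's ratings
    normalized = []
    for r in user_ratings:
        # A rider with zero variance carries no relative signal: treat as z = 0.
        z = 0.0 if sigma == 0 else (r - mu) / sigma
        # Map z back onto the star scale and clamp to the valid range.
        normalized.append(min(hi, max(lo, center + spread * z)))
    return normalized

always_five = normalize_ratings([5, 5, 5, 5])   # every ride becomes "normal"
mostly_five = normalize_ratings([5, 5, 5, 4])   # the lone 4 stands out as low
```

Under these assumptions a rider who always gives 5 contributes a neutral 3 each time, while a rider who gives 5, 5, 5, 4 has the single 4 pushed well below 3. One caveat with the sketch: with only a handful of ratings per rider, the estimated mean and standard deviation are noisy, so a real system would probably need some shrinkage toward a global prior.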

parrit 9 days ago

https://xkcd.com/1098/

https://xkcd.com/937/

nmstoker 9 days ago

Does anyone else get that survey rating effect where you start off thinking the company is reasonable, so you give a 4 or 5, then the next page asks why you chose this, and as you think it through you realise more and more shitty things they did, so you go back and bring them down to a 2 or 3? Effectively, by asking in detail they undermine your perception of them.

enaaem 9 days ago

Check the bad reviews. If the 1-2 star reviews are mostly about the rude owner, then you know the food is good.

lordnacho 9 days ago

Has anyone done a forced ranking rating?

"Here's your last 5 drivers, please rank them"

xnx 9 days ago

I don't understand why letter grades aren't more popular for rating things in the US.

"A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.

JSR_FDED 9 days ago

A++++ article!

jonstewart 9 days ago

I give five stars always because I’m not a rat.

User23 9 days ago

Same for peer reviews. Giving anything less than a four is saying fire this person. And even too many fours is PIP territory.
