Yelp (or Qype/Citysearch/TrustedPlaces/Tipped etc.) are not deliberately misleading their users, but fundamentally they're all broken: the foundations on which they're built are unsound.
I like Amazon's 5-star rating system because it also tells you how many people gave the product each rating. There's a big difference between 20 people giving a product 3 stars and 10 giving it 5 stars while another 10 give it 1 star. Amazon's system makes this difference visible; most others don't.
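To make that concrete, here's a quick sketch (hypothetical numbers, plain Python) showing that the two distributions share a mean but differ wildly in spread:

```python
# Two rating distributions with the same mean but very different shapes.
from statistics import mean, pstdev

consensus = [3] * 20             # 20 people gave 3 stars
polarized = [5] * 10 + [1] * 10  # 10 gave 5 stars, 10 gave 1 star

for name, ratings in [("consensus", consensus), ("polarized", polarized)]:
    print(f"{name}: mean={mean(ratings):.1f}, stdev={pstdev(ratings):.1f}")

# consensus: mean=3.0, stdev=0.0
# polarized: mean=3.0, stdev=2.0
```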
This article seems kinda troll-y to me.

Why? Because even Yelp understands the "man who drowned in the river whose average depth was 6 inches" problem, which is exactly why they offer both a histogram view of ratings and an average-over-time graph. I *frequently* use both of those tools.

Now don't get me wrong, there are all kinds of problems with 1-5 rating schemes, and there are probably better schemes out there. But what Yelp does is as good as anything I've seen at combating the problem, which is why I enjoy using it.

As someone who took four semesters of business statistics in college, my personal favorite alternative scheme is paired comparison, or better yet, the related (and newer) MaxDiff algorithm [http://en.wikipedia.org/wiki/MaxDiff]. The only problem with these schemes is that they require much more user input, which frankly is a lot to ask on a site like Yelp.
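For illustration, a minimal sketch of plain paired-comparison scoring by win rate (the votes are hypothetical, and this is not the full MaxDiff procedure, which also asks respondents for the *worst* item in each set):

```python
# A minimal sketch of paired-comparison scoring (not full MaxDiff):
# each vote names a preferred and a less-preferred restaurant, and
# items are ranked by the share of comparisons they won.
from collections import defaultdict

votes = [("A", "B"), ("A", "C"), ("B", "C")]  # hypothetical (winner, loser) pairs

wins = defaultdict(int)
appearances = defaultdict(int)
for winner, loser in votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

ranking = sorted(appearances, key=lambda item: wins[item] / appearances[item],
                 reverse=True)
print(ranking)  # ['A', 'B', 'C'] -- A won every comparison it appeared in
```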
While a simple scale rating system is just that, simple, anything more complex quickly exceeds the amount of effort both raters and searchers are willing to put into understanding it. "Rate this restaurant on a scale of 1-10" is much simpler to understand than a massive questionnaire (even ratings on 3 or 4 axes may be too many/too much effort). It also has the advantage of being ambiguous enough that ANY opinion can fit into it -- you would be hard pressed to find the right kind of questions that fit all the different things that can be rated. (What if you liked a place but didn't experience any of the specific things being asked about? Would you rate a movie theater on service if you never used the concession stand?) A site that didn't allow a reviewer to put in prose to explain their rating would be useless; I believe all the sites the OP talked about do. The rating scale is just meant to be a summary: each one summarizes the thing being reviewed, and in aggregate they summarize the community's feel for it.

When flipping channels looking at movies that are on, the star rating system is largely worthless because 1) it was most likely done by some paid movie buff who has inherently different motivations and likes/dislikes than I do, and 2) there is no explanation of the reasoning behind the rating. It's all subjective measurement; it can't be objective. The whole point of rating systems is opinion. It's just like asking a friend if a place is worth going to.

Oftentimes, the use case is "find something good with a minimal amount of hassle". This comes down to things like "places people liked that are within a mile of where I am right now". I have five minutes to make a decision; the single data point of a rating scale helps me make it quickly and (perceptually) accurately (it may not actually be accurate, but I feel like I'm making a good decision). These sites are also designed for repeat users and for users being contributors. You learn the way other people rate things on the site over time, and become better able to decide what "1 star" vs. "5 stars" means in context.

I think a larger problem is getting people to *want* to expend the effort to review a place they aren't extremely excited about (either positively or negatively). I know I don't bother to review places where I had a so-so experience, but if I had excellent service or really bad service, I make it a point to rate them. In some ways this skews the results, but it most likely isn't that big a deal.

I don't know what the OP has in store for his next post, but if it's truly revolutionary, he should start his own competing review service.
I've always thought 5-star rating systems were fundamentally flawed because they're based on the assumption that the same experience, had by two different people, will get the same star rating. What might be a four-star experience for me could be a two-star experience for you.

Maybe it's just me, but I'd prefer a probability system such as up and down votes. If I see a restaurant has 100 "likes" and 10 "dislikes", it's probable that I'll also enjoy the place (given that I already know they serve food and drinks I would normally enjoy).
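As a toy illustration of that heuristic, using the hypothetical counts above:

```python
# Naive "probability I'll like it" estimate: fraction of positive votes.
likes, dislikes = 100, 10
p_enjoy = likes / (likes + dislikes)
print(f"Chance I'd enjoy it (naive estimate): {p_enjoy:.0%}")  # 91%
```

Note this naive ratio is unreliable for small vote counts, which is what the Wilson interval approach mentioned elsewhere in this thread addresses.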
Nothing about his analysis is limited to local reviews -- the mean is a limited measure regardless, and the problem of reviewers reviewing at different times is a function of the number of reviews per unit time, not locality. (They're correlated, obviously, but far enough out in the "long tail", even international reviews are going to be thin.)
What about the Wilson score confidence interval mentioned here?

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
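For reference, a small sketch of the lower-bound scoring that article describes (the standard Wilson formula; the vote counts below are made up):

```python
# Score an item by the lower end of the 95% Wilson confidence interval
# for its true positive-vote fraction, so small samples get penalized.
import math

def wilson_lower_bound(pos: int, n: int, z: float = 1.96) -> float:
    if n == 0:
        return 0.0
    phat = pos / n
    denom = 1 + z * z / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / denom

# 100 likes / 10 dislikes beats 3 likes / 0 dislikes, despite the
# latter's perfect ratio:
print(wilson_lower_bound(100, 110))  # ~0.84
print(wilson_lower_bound(3, 3))      # ~0.44
```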
There's nothing wrong with *asking* users to rate things out of five. However, just seeing an out-of-five score is indeed pointless when you're talking about products that can vary per person and over time.

I'd suggest rendering the score as a sparkline of aggregate-score-over-time, with a surrounding colored band whose width is the deviation for that aggregate sample point. Thus you could see at a glance whether a five-star restaurant used to be a three-star one, and see how many people disagree with the current rating through a simple visual geometric comparison.
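Something like this rough matplotlib sketch (the ratings are invented, and the band is one standard deviation of a trailing window):

```python
# Rolling mean of ratings over time, with a +/- one-stdev band showing
# how much reviewers disagreed at each point.
import numpy as np
import matplotlib.pyplot as plt

ratings = np.array([3, 4, 3, 2, 3, 4, 5, 5, 4, 5, 5, 4, 5])  # oldest first
window = 5

means = np.array([ratings[max(0, i - window + 1): i + 1].mean()
                  for i in range(len(ratings))])
stds = np.array([ratings[max(0, i - window + 1): i + 1].std()
                 for i in range(len(ratings))])

x = np.arange(len(ratings))
plt.fill_between(x, means - stds, means + stds, alpha=0.3)  # disagreement band
plt.plot(x, means)                                          # the sparkline itself
plt.ylim(1, 5)
plt.show()
```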
Right, but that's not the whole story. The 5-star system is probably much better suited to the sorts of things Amazon sells (music, DVDs, electronics, etc.). Those ratings are not as time-sensitive as, say, a bar or restaurant rating, which could change entirely within a very short period of time.
This type of problem is very common when surveying people. A 6- or 4-star system would be much better: people then have to decide if it was above or below "average", because there is no middle star.
*That's* why you read the comments.

The best rating systems I've seen looked something like this:

```
# of stars (1-5): __
If not 5, the #1 thing that would get more stars:
_________
```

or

```
If not 5, the top 3 things that would get more stars:
_______________
_______________
_______________
```