Show HN: We just launched a Bayesian-Based Sentiment Tracker

42 points by spxdcz over 14 years ago

11 comments

toast76 over 14 years ago
Call me crazy, but things going UP is generally good. I don't see how you can logically describe something as having an "increase in worsening satisfaction".

"AT&T is ranked 1st out of 244 brands"... they must be AWESOME... oh wait, no they're not.

It says it's a customer satisfaction index, when it's actually a customer DISsatisfaction index.

Also, I don't mean to sink the boots in, but... "that is awesome that you got SVU to discuss the boycott of the pedophile book on Amazon! I cannot wait to see how it goes!" Is that a good comment or a bad comment?
random42 over 14 years ago
This is not how sentiment analysis works (or should work). I worked on something similar (a Naive Bayes based sentiment analyzer: https://github.com/mohitranka/TwitterSentiment). I also work for a company in the same space as groubalcsi.com (brand/product opinion mining).

Sentiment analysis is not a classification problem (like spam detection) but an identification problem, because sentiments are always associated with an entity (and an attribute, if specified).

For example, a tweet saying "Dell is not as good as apple" requires identifying the entities (Dell and Apple) and associating sentiments with them (negative and positive, respectively). It is incorrect to try to associate a sentiment (whatever it may be) with the tweet itself.
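To make the entity-versus-tweet distinction concrete, here is a minimal illustrative sketch (not random42's TwitterSentiment code; the word lists and the fixed context window are toy assumptions) of assigning a sentiment to each brand mentioned rather than to the whole tweet:

    # Toy word lists and a fixed context window -- illustrative only.
    NEGATORS = {"not", "never", "no"}
    POSITIVE = {"good", "great", "love", "awesome"}
    NEGATIVE = {"bad", "terrible", "hate", "broken"}

    def entity_sentiments(tweet, brands, window=4):
        """Crude per-entity sentiment: score only the words near each brand mention."""
        tokens = tweet.lower().split()
        scores = {}
        for i, tok in enumerate(tokens):
            if tok in brands:
                context = tokens[max(0, i - window): i + window + 1]
                score = sum(w in POSITIVE for w in context) - sum(w in NEGATIVE for w in context)
                if any(w in NEGATORS for w in context):
                    score = -score  # "not as good" flips the polarity of the window
                scores[tok] = score
        return scores

    print(entity_sentiments("Dell is not as good as apple", {"dell", "apple"}))
    # prints {'dell': -1, 'apple': -1}: Dell comes out right, but Apple is
    # wrongly flipped negative too.

Even this toy version mislabels Apple, which is exactly the gap between classifying a tweet as a whole and identifying which entity each sentiment belongs to.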
rayval over 14 years ago
Interesting but possibly flawed exercise. It would be good to show the entire set of brands sorted from bottom (i.e., good) to top (i.e., bad).

I sorted the data and present two groups here:

1. A sample of the supposedly most satisfying, from the best on down (er, up): TGI Fridays, Best Western, Zenith Electronics, JVC, Chili's, Denny's, Hampton Inn, Olive Garden, Applebee's, Sams Club, Yahoo, AOL.

2. By contrast, a sample of some of the worst, listed from the top (high dissatisfaction) on down: Wikipedia, Apple, Nokia, Facebook, Volkswagen, YouTube, Amazon, Nike, Sony, Ikea, Range Rover, Rolex, Porsche, Google, Netflix, Louis Vuitton, CNN, American Express, Wall Street Journal, Intel.

Groups 1 and 2 do not overlap in their scores, meaning that Intel (the best of the worst) is at 404, with a higher dissatisfaction rating than AOL (the worst of the best).

This grouping does not make sense to me, because if you showed me the two lists above and asked which set had better satisfaction scores, I would have picked Group 2 over Group 1.

What could explain this? Perhaps there is demographic skew, in that down-market brands (Denny's, Sams Club, Zenith) are not talked about as much among upscale social media people, who would rather complain about Apple, Sony, and Porsche.

Or perhaps there is a mismatch of expectations. People expect the premium brands to deliver more, and complain loudly when they fall short in the slightest. Conversely, perhaps people expect a mediocre experience with downmarket brands.
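As a rough illustration of the sort-and-compare step rayval describes, here is a hypothetical sketch; the scores below are invented for illustration, except Intel's 404, which the comment quotes:

    # Hypothetical dissatisfaction scores: lower = "better", higher = "worse".
    # Only Intel's 404 comes from the comment; the rest are made-up placeholders.
    group1 = {"TGI Fridays": 150, "Sams Club": 320, "Yahoo": 360, "AOL": 390}   # "most satisfying" sample
    group2 = {"Intel": 404, "Google": 610, "Apple": 820, "Wikipedia": 950}      # "least satisfying" sample

    for name, grp in (("Group 1", group1), ("Group 2", group2)):
        ranked = sorted(grp.items(), key=lambda kv: kv[1])  # best (lowest) first
        print(name, ranked)

    # rayval's non-overlap observation: the best score in Group 2 is still
    # worse (higher) than the worst score in Group 1.
    assert min(group2.values()) > max(group1.values())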
tel over 14 years ago
What are the units of dissatisfaction used throughout the page? How do they map to the y-axis of the dissatisfaction graph? What sense of scale do I need to have to understand the units? Is a 945 bad? How bad? Is hate linear? Since AAPL scores roughly half as much as AT&T, does that mean the average Twitterer hates AAPL half as much? What happens if someone scores a perfect 1000? Can they be hated no further?

What time zone is the next update measured in? What makes your classifier 'Bayesian' besides just using something called a 'Naive Bayes Classifier'? What is the 90% accuracy determined from? Why should I care? Is a 24-hour improvement in customer satisfaction a significant thing? How quickly does hate fluctuate? What is your uncertainty in each of these measurements? Is there an overall brand hate level that I can compare these things to? How are they affected by overall sentiment toward companies?

------

It's an interesting complementary site to your primary interest in Groubal. I'm just a skeptic of methods in sentiment analysis in general. Analyzing data properly is very hard. Applying tools to observe what happens is still interesting, though.

But I'm not sure I learned a whole lot from seeing graphs proclaiming that Twitterers dislike AT&T, Time Warner, Banks, Internet Providers, and Zynga. Tylenol and Enterprise were interesting to find, though I have no idea what it means for Tylenol to be 100 units less hated.

So perhaps what you should tune your ML stuff to seek out is not just some difficult-to-quantify measure of dissatisfaction, but instead things like Tylenol and Enterprise, where people might not expect themselves to have such trouble with the brand. In such a case, it becomes automatic, insightful rabble-rousing instead of methodologically sparse hate-ranking.
spxdcz over 14 years ago
If anyone's interested, we're using the Google Graph API for all the graphs (the sparklines and the big transparent ones at the top), and the Bayesian stuff is based on the PHP work I wrote up here: http://danzambonini.com/self-improving-bayesian-sentiment-analysis-for-twitter/

EDIT: Also, we're not really using it yet, but I thought it was interesting how you can also easily calculate the 'agreement' on sentiment by using the MySQL STDDEV function (or similar) to work out the variance in sentiment (see the sketch below).
Comment #1923739 not loaded
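As a rough sketch of the 'agreement' idea spxdcz mentions: treat each brand's per-tweet sentiment scores as a sample and use their spread as the agreement measure. The brand names and scores below are hypothetical, and the SQL in the comment is only an approximation of what a MySQL STDDEV query might look like, not the site's actual schema:

    # Hypothetical per-tweet sentiment scores in [-1, +1] from a classifier.
    # In MySQL this could be approximated by something like
    #   SELECT brand, STDDEV(sentiment) FROM tweet_sentiments GROUP BY brand;
    # (table and column names are made up). A plain-Python equivalent:
    from statistics import mean, pstdev

    tweet_scores = {
        "BrandA": [-0.8, -0.7, -0.9, -0.6],  # consistently negative: high agreement
        "BrandB": [-0.9, 0.8, -0.7, 0.9],    # polarising: low agreement
    }

    for brand, scores in tweet_scores.items():
        print(f"{brand}: mean={mean(scores):+.2f}, stddev={pstdev(scores):.2f} (lower stddev = more agreement)")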
physcab over 14 years ago
Pretty cool. I would change high meaning bad and low meaning good, unless it's a rank out of the total. It's a bit counter-intuitive. Why do you place an emphasis just on dissatisfaction instead of giving the option to look at both?
Comment #1923712 not loaded
jhamburger over 14 years ago
Something came to mind that will skew this heavily. People will mention a company by name for mainly one of two reasons, either to complain or to tell people about some cool new thing. If someone mentions a ubiquitous company like Google, Verizon, etc, it's usually to complain. They're probably not telling the world about the wonders of Google search. On the other hand, if someone mentions a smaller company it's probably the cool-new-thing factor.
Comment #1924096 not loaded
DeusExMachina over 14 years ago
Is the time span so short just because of an initial lack of data? If not, I think it would be useful to extend the span of the graph to more than a week, to understand the long-term trend. For some of the lines I see high fluctuations, so the graph is not very meaningful.
jhamburger over 14 years ago
I like how "iHop" is capitalized as if it were Steve Jobs' take on the pogo stick.
rhizome over 14 years ago
I don't know if it's intentional, but I would expand your scope beyond complaints. If your math is good you could have a very nice reputation tracker and analytics package in general.
Comment #1923748 not loaded
rorrr over 14 years ago
Google is the #7 worst web company? http://www.groubalcsi.com/sector/web-based-services

BMW is the worst in motor vehicles? http://www.groubalcsi.com/sector/motor-vehicles
Comment #1923805 not loaded