Why are people being so critical of this work? Sure, the blog post paints a simplified picture of what the system is actually capable of, but it's still helpful for a non-ML audience to get a better understanding of the high-level motivation behind the work. The OpenAI folks are trying to educate the broader public as well, not just ML/AI researchers.<p>Imagine if this discovery had been made by some undergraduate student who had little experience in the traditions of how ML benchmark experiments are done, or who was just starting out in her ML career. Would we be just as critical?<p>As a researcher, I like seeing shorter communications like these, as they illuminate the researcher's thinking process. Read ML papers for the ideas, not the results :)<p>I personally don't mind blog posts that have a bit of hyped-up publicity. It's thanks to groups like DeepMind and OpenAI capturing the public imagination on the subject that interest from prospective students in studying ML + AI + robotics has accelerated. If the hype is indeed unjustified, then it'll become irrelevant in the long term. One caveat is that researchers should be very careful not to mislead reporters who are looking for the next "killer robots" story. But that doesn't really apply here.
I don't know, but this seems a bit hyped in places.<p>They start with:<p>> Our L1-regularized model matches multichannel CNN performance with only 11 labeled examples, and state-of-the-art CT-LSTM Ensembles with 232 examples.<p>Hmm, that sounds pretty impressive. But then later you read:<p>> We first trained a multiplicative LSTM with 4,096 units on a corpus of 82 million Amazon reviews to predict the next character in a chunk of text. Training took one month across four NVIDIA Pascal GPUs<p>Wait, what? How did "232 examples" transform into "82 million"??<p>OK, I get it: they pretrained the network on the 82M reviews, and then trained the last layer to do the sentiment analysis. But you can't honestly claim that you did great with just 232 examples!
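For readers unfamiliar with the two-stage recipe being described, here is a minimal sketch of the idea (not OpenAI's actual code): a model pretrained on the huge unlabeled corpus provides feature vectors, and only the small L1-regularized linear classifier on top ever sees the handful of labeled examples. The `extract_features` helper below is a hypothetical stand-in for running the pretrained mLSTM.

```python
# Minimal sketch of the two-stage setup described above (not the actual OpenAI
# code). `extract_features` is a hypothetical stand-in for running the char-level
# mLSTM pretrained on the 82M unlabeled reviews and reading out its hidden state.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(texts):
    # Placeholder: really this would be the pretrained language model's
    # final hidden state for each review (4096-dimensional in the paper).
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 4096))

labeled_texts = ["Loved it, works perfectly.", "Broke after two days."]  # imagine ~232 of these
labels = [1, 0]

X = extract_features(labeled_texts)   # unsupervised pretraining did the heavy lifting
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, labels)                    # only this step sees the small labeled set
```

Seen this way, the "232 examples" figure only counts the labels used in the final `fit` call, which is exactly the commenter's point.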
If you are interested in looking at the model in more detail, we (@harvardnlp) have uploaded the model features to LSTMVis [1]. We ran their code on Amazon reviews and are showing a subset of the learned features. We haven't had a chance to look further yet, but it is interesting to play with.<p>[1] <a href="http://lstm.seas.harvard.edu/client/pattern_finder.html?data_set=32sentiment&source=states::states&pos=110&brush=28,31&queried=true&ex_cells=" rel="nofollow">http://lstm.seas.harvard.edu/client/pattern_finder.html?data...</a>
The synthetic text they generated was surprisingly realistic, despite being generic.<p>If I were perusing a dozen reviews I probably wouldn't have spotted the AI-generated ones in the crowd.
So char-by-char models are the next Word2Vec then. Pretty impressive results.<p>It would be interesting to see how it performs on other NLP tasks. I'd be pretty interested to see how many neurons it uses to attempt something like stance detection.<p><i>Data-parallelism was used across 4 Pascal Titan X gpus to speed up training and increase effective memory size. Training took approximately one month.</i><p>Every time I look at something like this I find a line like that and go: "ok, that's nice... I'll wait for the trained model".
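For anyone wondering what "char-by-char" modelling looks like mechanically, here is a toy sketch: a tiny PyTorch LSTM trained to predict the next byte. Everything here (sizes, data) is illustrative only and nothing like the 4,096-unit mLSTM that took a month across four GPUs.

```python
# Toy character-level language model: predict the next byte at every position.
# Purely illustrative; sizes are nothing like the 4096-unit mLSTM in the paper.
import torch
import torch.nn as nn

class CharLM(nn.Module):
    def __init__(self, vocab_size=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x):            # x: (batch, seq) of byte ids
        h, _ = self.lstm(self.embed(x))
        return self.head(h)          # next-byte logits at every position

model = CharLM()
chunk = torch.tensor([list(b"this product was grea")])
logits = model(chunk)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256),
                                   chunk[:, 1:].reshape(-1))
```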
It's very difficult to understand what the contributions are here. From what I've read so far, this feels more like a proposal for future research or a press release than an advance in the state of the art.<p>* Using large models trained on lots of data as the foundation for sample-efficient smaller models is common.<p>* Transfer learning, fine-tuning, and character RNNs are common.<p>Were there any insights learned that give a deeper understanding of these phenomena?<p>Not knowing too much about the sentiment space, it's hard to tell how significant the resulting model is.
(Apologies for the slightly incoherent post below)<p>I've been noticing a lot of work that digs into ML model internals (as they've done here to find the sentiment neuron) to understand why they work, or to use them to do something. Let me recall some interesting instances of this:<p>1. Sander Dieleman's blog post about using CNNs at Spotify to do content-based recommendations for music. He didn't write about the system's performance but collected playlists that maximally activated each of the CNN filters (early-layer filters picked up on primitive audio features, later ones picked up on more abstract features). The filters were essentially learning the musical elements specific to various subgenres.<p>2. The ELI5 - Explain Like I'm Five - Python library. It explains the outputs of many linear classifiers. I've used it to explain why a text classifier made a certain prediction: it highlights features to show how much or how little they contribute to the prediction (dark red for a negative contribution, dark green for a positive one). (Rough sketch below.)<p>3. FairML: auditing black-box models. Inspecting the model to find which features are important. With privacy and security concerns too!<p>Since deep learning/machine learning is very empirical at this stage, I think improvements in instrumentation can lead to ML/DL being adopted for more kinds of problems. For example: chemical/biological data. I'd be highly curious about what new ways of inspecting such kinds of data would be insightful (we can play the audio input that maximally activates filters for a music-related network, we can visualize what filters are learning in an object detection network, etc.)
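As a concrete illustration of item 2 above, here is roughly what the ELI5 workflow looks like on a tiny text classifier. The function names are recalled from memory of the library's API, so treat the exact signatures as approximate.

```python
# Rough sketch of ELI5 explaining one prediction of a linear text classifier.
# API names recalled from memory; treat exact signatures as approximate.
import eli5
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great battery life", "terrible screen, returned it",
         "love the keyboard", "broke after a week"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# Which n-gram features pushed this document toward positive or negative?
explanation = eli5.explain_prediction(clf, "battery life is great", vec=vec)
print(eli5.format_as_text(explanation))
```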
"The selected model reaches 1.12 bits per byte." (<a href="https://arxiv.org/pdf/1704.01444.pdf" rel="nofollow">https://arxiv.org/pdf/1704.01444.pdf</a>)<p>For context, Claude Shannon found that humans could model English text with an entropy of 0.6 to 1.3 bits per character (<a href="http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf" rel="nofollow">http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf</a>)
I would imagine stuff like sarcasm is still out of reach, though. It seems hard even for humans to pick it up in text-based communication. Also, anything outside the standard sentiment patterns might throw it off: "This product is as good as <product x>" (where product x is known to perform badly). I am just trying to think of scenarios where a sentiment model would fail.<p>The sentiment neuron sounds fascinating too. I didn't realize individual neurons could be talked about or understood outside the context of the NN as a whole; I am thinking of the "black box" it's often referred to as in some articles.<p>Since one of the research goals for OpenAI is to train a language model on jokes [0], I wonder how this neuron would perform on a joke corpus.<p>----------------------------<p>[0] <a href="https://openai.com/requests-for-research/#funnybot" rel="nofollow">https://openai.com/requests-for-research/#funnybot</a>
I'm trying to understand this statement:<p>"The sentiment neuron within our model can classify reviews as negative or positive, even though the model is trained only to predict the next character in the text."<p>If you look closely at the colorized paragraph in their paper/website, you can see that the major sentiment jumps (e.g. from green to light-green and from light-orangish to red) occur with period characters. Perhaps the insight is that periods delineate the boundary of sentiment. For example:<p>I like this movie.
I liked this movie, but not that much.
I initially hated the movie, but ended up loving it.<p>The period tells the model that the thought has ended.<p>My question for the team: How well does the model perform if you remove periods?
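A sketch of how that experiment could be run, assuming access to some function that returns the sentiment unit's activation after each character. The `neuron_trace` below is a dummy placeholder (it just returns zeros), not OpenAI's API.

```python
# Hypothetical experiment: compare where the biggest activation jumps occur
# with and without periods. `neuron_trace` is a dummy stand-in for running the
# pretrained model and recording the sentiment unit's value after each character.
def neuron_trace(text):
    return [0.0 for _ in text]   # placeholder per-character activations

original = "I initially hated the movie, but ended up loving it."
stripped = original.replace(".", "")

for label, text in [("with periods", original), ("without periods", stripped)]:
    trace = neuron_trace(text)
    jumps = [abs(b - a) for a, b in zip(trace, trace[1:])]
    print(label, max(jumps, default=0.0))
```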
Can someone explain what is "unsupervised" about this? I'm guessing this is what confuses me most.<p>I think this work is interesting, although when you think about it, it's kind of normal that the model converges to a point where there is a neuron that indicates whether the review is positive or negative. There are probably a lot of other traits that can be found in the "features" layer as well.<p>There are probably neurons that can predict the geographical location of the author, based on the words they use.<p>There are probably neurons that can predict that the author favors short sentences over long explanations.<p>But what makes this "unsupervised"?
Machine learning has become more and more like archaeology: people say "empirically" more and more, yet only provide a single dataset or a limited set of datasets.
I think it's fair to criticize this blog post for being unclear on what exactly is novel here; pre-training is a straightforward and old idea, but the blog post does not even mention this. Having accessible write-ups for AI work is great, but surely they should not be confusing to domain experts or written in a way that exacerbates the rampant oversimplification and misreporting about AI in the popular press. Still, it is a cool, mostly experimental/empirical result, and it's good that these blog posts exist these days.<p>For what it's worth, the paper predictably does a better job of covering the previous work and stating the motivation: "The experimental and evaluation protocols may be underestimating the quality of unsupervised representation learning for sentences and documents due to certain seemingly insignificant design decisions. Hill et al. (2016) also raises concern about current evaluation tasks in their recent work which provides a thorough survey of architectures and objectives for learning unsupervised sentence representations - including the above mentioned skip-thoughts. In this work, we test whether this is the case. We focus in on the task of sentiment analysis and attempt to learn an unsupervised representation that accurately contains this concept. Mikolov et al. (2013) showed that word-level recurrent language modelling supports the learning of useful word vectors and we are interested in pushing this line of work. As an approach, we consider the popular research benchmark of byte (character) level language modelling due to its further simplicity and generality. We are also interested in evaluating this approach as it is not immediately clear whether such a low-level training objective supports the learning of high-level representations." So, they question some built-in assumptions from the past by training on lower-level data (characters), with a bigger dataset and more varied evaluation.<p>The interesting result they highlight is that a single model unit is able to perform so well with their representation: "It is an open question why our model recovers the concept of sentiment in such a precise, disentangled, interpretable, and manipulable way. It is possible that sentiment as a conditioning feature has strong predictive capability for language modelling. This is likely since sentiment is such an important component of a review", which I tend to agree with... train on a whole lot of reviews and it's only natural to end up training a regressor for review sentiment.
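A toy illustration of how a single "sentiment unit" can fall out of the linear probe: if one hidden coordinate carries most of the label information, an L1-penalized logistic regression tends to concentrate its weight there. The data below is synthetic and the unit index is arbitrary; this is not the paper's code.

```python
# Toy demonstration: when one hidden unit carries the label information, an
# L1-penalized logistic regression concentrates its weight on that unit.
# Synthetic data only; the unit index is arbitrary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden = rng.normal(size=(500, 4096))           # pretend LSTM hidden states
labels = (hidden[:, 2388] > 0).astype(int)      # pretend unit 2388 encodes sentiment

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.25).fit(hidden, labels)
top_unit = int(np.argmax(np.abs(clf.coef_[0])))
print(top_unit)                                 # should pick out 2388 in this toy setup
```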
I think one of the most amazing parts of this is how accessible the hardware is right now. You can get world-class AI results for less than the cost of most used cars. In addition, with so many resources freely available through open source, the barrier to getting started is very low.
> The model struggles the more the input text diverges from review data<p>This is where I fear the results will fail to scale. The ability to represent 'sentiment' as one neuron, and its ground truth as uni-dimensional, seems most true of corpora of online reviews, where the entire point is to communicate whether you're happy with the thing that came out of the box. Most other forms of writing communicate sentiment in a more multi-dimensional way, and the subject of the sentiment is more varied than a single item shipped in a box.<p>In other words, the unreasonable simplicity of modelling a complex feature like sentiment with this method is something of an artifact of this dataset.
This is a great name for a band :-). That said, I found the paper really interesting. I tend to think about LSTM systems as series expansions, and using that analogy I don't find it unusual that you can figure out the dominant (or first) coefficient of the expansion, or that it has a really strong impact on the output.
What they have done is semi-supervised learning (Char-RNN) + supervised training of sentiment.
Another way to do it is semi-supervised learning (Word2Vec) + supervised training of sentiment.
If the first approach works better, does it imply that character-level learning is more performant than word-level learning?
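For comparison, the word-level recipe mentioned above might look roughly like this: unsupervised Word2Vec features averaged per document, plus a small supervised classifier on top. The corpus is a toy stand-in for the unlabeled review text, and the gensim 4.x API is assumed.

```python
# Rough sketch of the word-level recipe: unsupervised Word2Vec features plus a
# small supervised classifier. Toy corpus; gensim >= 4.0 API assumed.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

unlabeled = [["great", "battery", "life"], ["screen", "died", "after", "a", "week"]]
w2v = Word2Vec(sentences=unlabeled, vector_size=50, min_count=1, epochs=20)

def doc_vector(tokens):
    # Average the word vectors of the tokens we have embeddings for.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

labeled = [(["great", "battery"], 1), (["screen", "died"], 0)]
X = np.stack([doc_vector(toks) for toks, _ in labeled])
y = [label for _, label in labeled]
clf = LogisticRegression().fit(X, y)   # only this step uses the labels
```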
As far as I understand, it means that there must be a relation between a character's sentiment and what the next character can (/should) be for the neural network to use this as a feature, am I right?<p>Does this mean we have unconsciously developed a language that exposes such relations?
It's impressive what abstraction NNs can achieve from just character prediction. Do the other systems they compare to also use the 82M Amazon reviews for training? It seems disingenuous to claim "state-of-the-art" and "less data" if they don't.
Training on a character-by-character basis is really incredible: it's quite the opposite of human intuition about language, but it seems a brilliant idea, and OpenAI tried it out. Great!
Why did they do this character by character? Would word by word make sense? Other than punctuation, I'm not seeing why specific characters are meaningful units.
Why is the linear combination used to train the sentiment classifier? Why does its result get taken into account?<p>Is this linear combination between 2 different strings?