Wittgenstein’s theories are the basis of all modern NLP

175 points by ghosthamlet, over 6 years ago

14 comments

idoubtit, over 6 years ago
In what sense are these theories a "basis" of NLP? Did they have any influence? Do they bring any practical contributions? I suspect a slight similarity between two popular domains (Wittgenstein and NLP) was contrived into an article that seems very light on the W part.

The "Wittgenstein's theories" that appear here amount to just this: "the meaning of a word is its use in the language". If such a plain concept were all of Wittgenstein's theories, he would be long forgotten.

For centuries, dictionaries have presented words through one or several explanations as well as quotes and examples. 150 years ago, Émile Littré wrote a wonderful French dictionary that contains 80,000 words and about 300,000 literary quotes. He knew that no word has a simple and permanent meaning, and that one needs to know many real-world contexts to get a fine view of a word.
mlucy, over 6 years ago
It's really difficult to overstate how important embeddings are going to be for ML.

Word embeddings have already transformed NLP. Most people I know, when they sit down to work on an NLP task, start by using an off-the-shelf library to turn the text into a sequence of embedded tokens. They don't even think about it; it's just the natural first step, because it makes everything so much easier.

In the last couple of years, embeddings for other data types (images, whole sentences, audio, etc.) have started to enter mainstream practice too. You can get near-state-of-the-art image classification with a pretrained image embedding, a few thousand examples, and a logistic regression trained on your laptop CPU. It's astonishing.

(Note: I work on https://www.basilica.ai , an embeddings-as-a-service company, so I'm definitely a little bit biased.)
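A minimal sketch of that "embed first, model second" workflow, assuming gensim and scikit-learn; the pretrained model name, texts, and labels below are illustrative, not from the comment:

    # Pretrained word vectors plus a simple classifier on top.
    import numpy as np
    import gensim.downloader as api
    from sklearn.linear_model import LogisticRegression

    wv = api.load("glove-wiki-gigaword-100")  # 100-dim GloVe word embeddings

    def embed(sentence):
        # Crude sentence embedding: average the vectors of in-vocabulary tokens.
        vecs = [wv[t] for t in sentence.lower().split() if t in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

    texts = ["the loan was approved quickly", "the reef was full of fish"]
    labels = [0, 1]  # hypothetical labels
    clf = LogisticRegression().fit([embed(t) for t in texts], labels)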
akozak, over 6 years ago
Figuring out how to process context is important for NLP, no question.

But I think this is probably wrong on Wittgenstein. I'm pretty sure his entire point in the Philosophical Investigations was that "meaning" is exactly NOT probabilities of symbol co-occurrence, or just names of objects in the world. Symbols acquire meanings from their use by humans. Accounting for context in NLP via probabilities of occurrence might be useful in better reproducing language, but we should be careful not to say that this is the essence of meaning and language.
jeromebaek, over 6 years ago
The author has seriously misunderstood Wittgenstein's contributions to the philosophy of language.

> And it's now quite clear where the Wittgenstein's theories jump in: context is crucial to learn the embeddings as it's crucial in his theories to attach meaning.

Yes, Wittgenstein said context is important for meaning, but that is hardly his unique or even most important contribution to the philosophy of language. Wittgenstein's real contribution is in showing that meaning cannot be pinned down like butterflies under glass -- that meaning spontaneously arises in each playthrough of a language-game, and that any effort to find a "canonical", "authoritative" definition is grasping at an illusion.

But word embeddings try to do almost exactly what Wittgenstein says is an illusion -- trying to pin down a canonical n-dimensional vector for each word. To correspond with Wittgenstein's theory, there could not exist any fixed mapping from a word to a vector. Perhaps each vector could change dynamically in a way that is in principle uncomputable. But to get there we are going to need a lot more advances than the current state of the art in NLP.
mlthoughts2018, over 6 years ago
One interesting concept I read in Wittgenstein was the idea of decomposing a word into its constituent parts. I'll use the term "broom" for it because that was the classic example and also the motivation for David Foster Wallace's novel "The Broom of the System."

So you take "broom" and you could decompose it into "handle" and "bristles". But then you could decompose it further, by recursively decomposing "handle" into "grains of wood" and "bristles" into "pieces of fiber" (or whatever).

You keep doing this ad infinitum, I guess on down to the summation of a bunch of quarks or whatever.

The question of interest to Wittgenstein was where this process bottoms out. What would it mean, either physically or semantically, to have a word identifying a concept that could not be broken down into further constituent parts?

Wittgenstein was interested in this for the philosophy of language. But I got interested in it by thinking about the decomposition as a mathematical operator,

D("broom") = {"handle", "bristles"}

and then asking what it could mean if this operator D had an "eigenvector" with an "eigenvalue" of 1, so that Dx = x for some non-decomposable word x.

In some ways, you can see how it could relate to things like word2vec and embedding representations if you could represent a decomposition operator, and define a hierarchical relationship of words as an ordering of how to more or less specifically decompose a word's representation.
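A toy rendering of that operator in code (my own illustration, with stipulated entries): a word x with D(x) = {x} is the fixed point, i.e. the non-decomposable atom the comment asks about.

    # Toy decomposition operator; the entries are made up for illustration.
    D = {
        "broom": {"handle", "bristles"},
        "handle": {"grains of wood"},
        "bristles": {"pieces of fiber"},
        "grains of wood": {"grains of wood"},    # stipulated atom
        "pieces of fiber": {"pieces of fiber"},  # stipulated atom
    }

    def fixed_points(decompose):
        # Words x with decompose(x) == {x}, the analogue of Dx = x.
        return [w for w, parts in decompose.items() if parts == {w}]

    print(fixed_points(D))  # ['grains of wood', 'pieces of fiber']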
atrudeau, over 6 years ago
These older word embedding models (word2vec, GloVe, LexVec, fastText) are being superseded by contextual embeddings ( https://allennlp.org/elmo ) and fine-tuned language models ( https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html ). These contextual models can infer that "bank" in "I spent two hours at the bank trying to get a loan" is very different from "The ocean bank is where most fish species proliferate."
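A minimal sketch of that disambiguation, assuming the Hugging Face transformers library; the model choice and the cosine-similarity check are illustrative:

    # Contextual models give "bank" a different vector in each sentence.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def bank_vector(sentence):
        # Return the contextual vector of the token "bank" in this sentence.
        enc = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]
        idx = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids("bank"))
        return hidden[idx]

    v1 = bank_vector("I spent two hours at the bank trying to get a loan")
    v2 = bank_vector("The ocean bank is where most fish species proliferate")
    print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0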
nostrademons, over 6 years ago
It's interesting how different this is from 10 years ago, when Chomsky's theories were the basis of all modern NLP, or even 5 years ago, when most NLP used a hybrid of formal grammars + embeddings. I remember attending a tech talk on part-of-speech tagging in 2011; the state of the art then was a probabilistic shift-reduce parser where the decision to shift vs. reduce at each node was made by a machine-learned classifier.
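A toy skeleton of that shift-reduce loop (my own sketch; the systems described used trained classifiers and richer parse actions):

    def parse(tokens, decide):
        # decide(stack, buffer) -> "shift" or "reduce"; a learned classifier in practice.
        stack, buffer = [], list(tokens)
        while buffer or len(stack) > 1:
            if buffer and (len(stack) < 2 or decide(stack, buffer) == "shift"):
                stack.append(buffer.pop(0))
            else:
                left, right = stack.pop(-2), stack.pop(-1)
                stack.append((left, right))  # attach as a binary subtree
        return stack[0]

    print(parse("the dog barks".split(), lambda s, b: "shift"))
    # ('the', ('dog', 'barks'))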
lettergram, over 6 years ago
For those interested, I recently wrote a guide on using neural networks for NLP [1].

I wrote the guide with the explicit goal of helping people understand NLP (sentence classification) without the need to understand the math.

I cover word embeddings:

https://austingwalters.com/word-embedding-and-data-splitting/

As well as FastText:

https://austingwalters.com/fasttext-for-sentence-classification/

Hope someone finds it useful.

[1] https://github.com/lettergram/sentence-classification
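For a sense of what FastText sentence classification looks like in code (not the linked guide's code; this assumes the fasttext package and a labelled file in its __label__ format):

    import fasttext

    # train.txt lines look like: "__label__statement the dog chased the cat"
    model = fasttext.train_supervised(input="train.txt", epoch=10, wordNgrams=2)
    print(model.predict("how do word embeddings work"))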
kolbe, over 6 years ago
I am really struggling to find where Wittgenstein fits into any of this at all.

> And it's now quite clear where the Wittgenstein's theories jump in: context is crucial to learn the embeddings as it's crucial in his theories to attach meaning.

That's not at all clear to me. The crucial part of W's tome is that two sentient beings are knowingly engaging in a game where they have 'agreed' on meanings. My guess from reading Philosophical Investigations is that W would think NLP only possible in formal settings like law, where all players of the game know the rules quite well, and the program could be trained as if it were a player in that game.
andybak, over 6 years ago
I really wish NLP didn't have two common meanings.
libertas, over 6 years ago
I would think that the Tractatus would be more useful to an AI. But Wittgenstein's remarkable ability to shift the paradigm and overextend into a meta level of analysis seems similar to the way AlphaZero and Leela play chess. The tools W uses to understand perception have a more probabilistic and irrational nature than the tools he uses in his previous work. It is as if he realized that human communication cannot be considered a closed and finite system, which is why I cannot see how his ideas are implemented in these applications, yet.
southerndrift, over 6 years ago
> As human beings speaking English it is quite trivial to understand that a "dog" is an "animal" and that is more similar to a "cat" than to a "dolphin" but this task is far from easy to be solved in a systematic way.

Are they? A dog can be trained like a dolphin, unlike a cat. In the context of training, dogs are more similar to dolphins.
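The claim is easy to probe against off-the-shelf vectors (gensim, with an illustrative model name); whether the numbers match intuition depends entirely on the training corpus, which is essentially the commenter's point:

    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-100")
    print(wv.similarity("dog", "cat"))      # typically higher...
    print(wv.similarity("dog", "dolphin"))  # ...than this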
perfmode, over 6 years ago
Can someone ELI5 the term "embedding"?
KasianFranks, over 6 years ago
Inaccurate. This is absurd. Epigraphy is the basis of all modern NLP/NLU. Add computational epigraphy, neuroscience, linguistics and cognition. Ref: word2vec is based on an approach from Lawrence Berkeley National Lab.

"Google silently did something revolutionary on Thursday. It open sourced a tool called word2vec, prepackaged deep-learning software designed to understand the relationships between words with no human guidance. Just input a textual data set and let underlying predictive models get to work learning."

"This is a really, really, really big deal," said Jeremy Howard, president and chief scientist of data-science competition platform Kaggle. "... It's going to enable whole new classes of products that have never existed before." https://gigaom.com/2013/08/16/were-on-the-cusp-of-deep-learning-for-the-masses-you-can-thank-google-later/

Spotify seems to be using it now: http://www.slideshare.net/AndySloane/machine-learning-spotify-madison-big-data-meetup (p. 34)

But here's the interesting part:

Lawrence Berkeley National Lab had been working on an approach more detailed than word2vec (in terms of how the vectors are structured) since 2005, judging by the bottom of their patent: http://www.google.com/patents/US7987191 The Berkeley Lab method also seems much more exhaustive, using a Fibonacci-based distance decay for proximity between words, such that vectors contain up to thousands of scored and ranked feature attributes beyond the bag-of-words approach. They also use filters to control the context of the output. It was also made part of search/knowledge-discovery tech that won a 2008 R&D 100 award: http://newscenter.lbl.gov/news-releases/2008/07/09/berkeley-lab-wins-four-2008-rd-100-awards/ & http://www2.lbl.gov/Science-Articles/Archive/sabl/2005/March/06-genopharm.html

A search company that competed with Google called "seeqpod" was spun out of Berkeley Lab using the tech, but was then sued for billions by Steve Jobs https://medium.com/startup-study-group/steve-jobs-made-warner-music-sue-my-startup-9a81c5a21d68#.jw76fu1vo and a few media companies http://goo.gl/dzwpFq

We might combine these approaches, as there seems to be something fairly important happening in this area. Recommendations and sentiment analysis seem to be driving the bottom lines of companies today, including Amazon, Google, Netflix, Apple et al.
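A toy sketch of the distance-decayed co-occurrence idea (the 1/distance weighting here is purely illustrative, not the Berkeley Lab method described in the patent):

    from collections import defaultdict

    def cooccurrence_vectors(tokens, window=5):
        # Weight each co-occurring word by 1/distance within the window.
        vecs = defaultdict(lambda: defaultdict(float))
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    vecs[w][tokens[j]] += 1.0 / abs(i - j)
        return vecs

    vecs = cooccurrence_vectors("the dog chased the cat across the yard".split())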