
Recent Advances in Natural Language Processing

278 points, by saadalem, almost 5 years ago

18 comments

rvense, almost 5 years ago
I think the point about language being a model of reality was interesting. I have an MA in linguistics, including some NLP, from about a decade ago, and was looking at a career in academic NLP. I ultimately left to become a programmer because (of life circumstances and the fact that) I didn't see much of a future for the field, precisely because it was ignoring the (to me) obvious issues of written-language bias, ignorance of multi-modality and situatedness, etc. that are brought up in this post.

All of these results are very interesting, but I'm not really feeling like we've been proved wrong yet. There is a big question of scalability here, at least as far as the goal of AGI goes, which the author also admits:

> Of course everyday language stands in a woolier relation to sheep, pine cones, desire and quarks than the formal language of chess moves stands in relation to chess moves, and the patterns are far more complex. Modality, uncertainty, vagueness and other complexities enter but the isomorphism between world and language is there, even if inexact.

This woolly relation between language and reality is well known. It has been studied in various ways in linguistics and the philosophy of language, for instance by Frege and not least Foucault and everything after. I also think many modern linguistic schools take a very different view of "uncertainty and vagueness" than I sense in the author here, but they are obviously writing for a non-specialist audience and trying not to dwell on the subject.

My point is, when making and evaluating these NLP methods and the tools they are used to construct, it is extremely important to understand that language models social realities rather than any single physical one. It seems to me all too easy, coming from formal grammar or pure stats or computer science, to rush into these things with naive assumptions about what words are or how they mean things to people. I dread to think what will happen if we base our future society on tools made in that way.
FiberBundle, almost 5 years ago
I found the science exam results interesting and skimmed the paper [1]. They report an accuracy of >90% on the questions. What I found puzzling is that they have a section in the experimental results where they test the robustness of the results using adversarial answer options; more specifically, they used a simple heuristic to choose 4 additional answer options, drawn from the other questions, which maximized 'confusion' for the model. This resulted in a drop of more than 40 percentage points in the accuracy of the model. I find this extremely puzzling: what do these models actually learn? Clearly they don't actually learn any scientific principles.

[1] https://arxiv.org/pdf/1909.01958.pdf
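To make the 'maximize confusion' idea concrete, here is a minimal sketch of how such adversarial distractors could be mined. The score_option function is a hypothetical stand-in for the model's answer-scoring pass; the paper's actual heuristic may differ:

    # Sketch: pick the wrong answers a QA model finds most plausible.
    # score_option(question, option) -> float is a hypothetical stand-in
    # for the model's confidence that `option` answers `question`.

    def mine_adversarial_distractors(question, correct_answer,
                                     candidate_pool, score_option, k=4):
        """Return the k wrong options the model scores highest."""
        candidates = [o for o in candidate_pool if o != correct_answer]
        # The most "confusing" wrong answers are the ones the model
        # itself likes best, so sort by its confidence, descending.
        candidates.sort(key=lambda o: score_option(question, o),
                        reverse=True)
        return candidates[:k]

Appending options mined this way to each question is the kind of setup behind the 40-point drop the comment describes.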
rland, almost 5 years ago
> Models are transitive: if x models y, and y models z, then x models z. The upshot of these facts is that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world.

This right here is a great way of putting the success of GPT-3 into context. We *think* GPT is smart, because when it says something eerily human-like we apply *our* model of the world onto what it is saying. A conversation like this:

> Me: So, what happened when you fell off the balance beam?

> GPT: It hurt.

> Me: Why'd it hurt so bad?

> GPT: The beam was high up and I fell awkwardly.

> Me: Wow, that sounds awful.

In this conversation, one of us is thinking far harder than the other. GPT can have conversations like this now, which is impressive. But only I can model the beam, the fall, and the physical reality. When I say "that sounds awful," I actually run a miniature physics simulation in my head, imagining losing my balance and falling off a high beam, landing, the physical pain, etc. GPT does none of that. In either case, whether it asks the question or answers it, it is entirely ignorant of this sort of "shadow" model that's being constructed.

Generalizing a bit, our "shadow" model of reality in every single domain is far more powerful than language's approximation. That's why we won't be able to use GPT to do a medical diagnosis or create a piece of architecture or whatever else people are saying it's going to do now.
mqus, almost 5 years ago
Not a single mention of whether this is only applicable to English or also to other natural languages. Afaict this mostly lists advancements in ELP (English language processing); especially the Winograd schema (or at least the given example) seems to be heavily focused on English.

Relevant article for this problem: https://news.ycombinator.com/item?id=24026511
skybrian, almost 5 years ago
Darn, based on the title, I was hoping for an overview of recent research.

Lots of people are having fun playing with GPT-3 or AI Dungeon, myself included, but it seems like there is other interesting research going on, like the REALM paper [1], [2]. What should I be reading? Why aren't people talking about REALM more? I'm no expert, but it seems like keeping the knowledge base outside the language model has a lot going for it?

[1] https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html
[2] https://arxiv.org/abs/2002.08909
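For readers who haven't seen REALM: the rough shape is 'retrieve relevant passages, then read them to answer', with the knowledge living in a corpus rather than in the model's weights. A toy sketch of that shape, assuming hypothetical embed and read models (REALM itself trains the retriever end-to-end, which this does not):

    import numpy as np

    # Toy retrieve-then-read pipeline. `embed` maps text to a vector;
    # `read` generates an answer from the query plus retrieved evidence.
    # Both are hypothetical stand-ins for learned models.

    def retrieve(query, corpus, embed, k=3):
        """Return the k passages whose embeddings best match the query."""
        q = embed(query)
        scores = [float(np.dot(q, embed(p))) for p in corpus]
        top = np.argsort(scores)[::-1][:k]
        return [corpus[i] for i in top]

    def answer(query, corpus, embed, read):
        """Condition the reader on retrieved text, not weights alone."""
        return read(query, retrieve(query, corpus, embed))

The appeal the commenter points at is that the corpus can be inspected or updated without retraining the reader.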
YeGoblynQueenne, almost 5 years ago
Separate comment because separate point.

To my mind, all this benchmark-beating is just pointless faffing about, and it's not a little frustrating to see that this is where the current state of the art is. NLP systems keep getting better and better at beating benchmarks, and yet we still don't have artificial agents that can hold an intelligent conversation, even in a restricted context.

Or rather, we have had such programs since the late 1960s, with none other than Terry Winograd's own doctoral thesis system, SHRDLU [1], but it seems that in the race to build powerful classifiers, we forgot how to build systems that can interact intelligently with a human.

Here's a small excerpt from SHRDLU (taken from the Wikipedia page on SHRDLU, originally in an appendix of Terry Winograd's thesis):

    Person: Is there a large block behind a pyramid?
    Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE GREEN CUBE, AND THE BLUE ONE.
    Person: Put a small one onto the green cube which supports a pyramid.
    Computer: OK.
    Person: Put the littlest pyramid on top of it.
    Computer: OK.
    Person: Does the shortest thing the tallest pyramid's support supports support anything green?
    Computer: YES, THE GREEN PYRAMID.
    Person: What color is it?
    Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS.
    Computer: RED

Note that this is not just mindless text generation. The conversation is held in the context of a "blocks world" where a robot with a single hand and eye ("Computer" in the excerpt above) moves blocks of various shapes and colours around, as directed by a human user in free-form natural language. When the Computer says "OK" after it's directed to "put the littlest pyramid on top of it", it's because it really has grabbed the smallest pyramid in the blocks world and placed it on top of the small block from an earlier sentence, as the Person asked. The program has a memory module to keep track of what anaphora like "it", "one", etc. refer to throughout the conversation.

SHRDLU was a traditional program hand-crafted by a single PhD student: no machine learning, no statistical techniques. It included, among other things, a context-free grammar (!) of natural English and a planner (to control the robot's hand), all written in Lisp and PLANNER. In its limited domain, it was smarter than anything ever created with statistical NLP methods.

______________________

[1] https://en.wikipedia.org/wiki/SHRDLU
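The 'memory module' idea is easy to sketch: keep a record of recently mentioned entities and resolve pronouns against it. A toy illustration, nothing like SHRDLU's actual Lisp/PLANNER machinery:

    # Toy anaphora resolution: "it" resolves to the most recently
    # mentioned entity. A caricature of SHRDLU's memory module.

    class DiscourseMemory:
        def __init__(self):
            self.mentions = []  # most recent entity last

        def mention(self, entity):
            self.mentions.append(entity)

        def resolve(self, pronoun):
            if pronoun == "it" and self.mentions:
                return self.mentions[-1]
            raise ValueError(f"cannot resolve {pronoun!r}")

    memory = DiscourseMemory()
    memory.mention("the littlest pyramid")
    memory.mention("the green cube which supports a pyramid")
    print(memory.resolve("it"))  # -> the green cube which supports a pyramid

SHRDLU's real resolver was far richer (it could, as in the excerpt, explain which antecedent it chose), but the principle of a running discourse model is the same.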
YeGoblynQueenne, almost 5 years ago
>> The Winograd schema test was originally intended to be a more rigorous replacement for the Turing test, because it seems to require deep knowledge of how things fit together in the world, and the ability to reason about that knowledge in a linguistic context. Recent advances in NLP have allowed computers to achieve near human scores (https://gluebenchmark.com/leaderboard/).

The "Winograd schema" in GLUE/SuperGLUE refers to the Winograd-NLI benchmark, which is simplified with respect to the original Winograd Schema Challenge [1], on which the state of the art still significantly lags human performance:

"The Winograd Schema Challenge is a dataset for common sense reasoning. It employs Winograd Schema questions that require the resolution of anaphora: the system must identify the antecedent of an ambiguous pronoun in a statement. Models are evaluated based on accuracy.

WNLI is a relaxation of the Winograd Schema Challenge proposed as part of the GLUE benchmark and a conversion to the natural language inference (NLI) format. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. While the training set is balanced between two classes (entailment and not entailment), the test set is imbalanced between them (35% entailment, 65% not entailment). The majority baseline is thus 65%, while for the Winograd Schema Challenge it is 50% (Liu et al., 2017). The latter is more challenging."

https://nlpprogress.com/english/common_sense.html

There is also a more recent adversarial version of the Winograd Schema Challenge called WinoGrande. I can't say I'm on top of the various results, so I don't know the state of the art, but it's not yet "near human", not without caveats (for example, Wikipedia reports 70% accuracy on 70 problems manually selected from the original WSC).

__________

[1] https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492
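To illustrate the WNLI conversion the quoted passage describes, here is Levesque's classic trophy/suitcase schema as a worked example (labels follow the entailment convention described above):

    # A classic Winograd schema converted to WNLI-style entailment pairs.
    premise = ("The trophy doesn't fit in the brown suitcase "
               "because it is too big.")

    # Substitute the ambiguous pronoun with each candidate antecedent:
    hyp_a = "The trophy is too big."    # entailment
    hyp_b = "The suitcase is too big."  # not entailment

    # Swapping a single word flips the correct antecedent:
    premise2 = ("The trophy doesn't fit in the brown suitcase "
                "because it is too small.")
    hyp_c = "The suitcase is too small."  # entailment

The one-word swap is what makes the original challenge hard to game with surface statistics: "big" favours the trophy, "small" favours the suitcase.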
bloaf, almost 5 years ago
I know that there are allegedly NLP algorithms for generating things like articles about sports games. I assume they have something more like the type signature (timeline of events) -> (narrative about said events).

What this article is about is more (question/prompt) -> (answer/continuation of prompt).

Does anyone know if there is progress in the (timeline of events) -> (narrative about said events) space?
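As a minimal sketch of the two signatures being contrasted (hypothetical types, just to pin the distinction down):

    from typing import List, NamedTuple

    class Event(NamedTuple):
        time: str
        description: str

    # Data-to-text generation, e.g. automated sports recaps:
    def narrate(timeline: List[Event]) -> str: ...

    # What the article covers: free-form prompt continuation.
    def continue_prompt(prompt: str) -> str: ...

The first problem is usually called data-to-text (or concept-to-text) generation in the literature.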
walleeeealmost 5 years ago
> A lot of the power of the thought experiment hinges on the fact that the room solves questions using a lookup table; this stacks the deck. Perhaps we would be more willing to say that the room as a whole understood language if it formed an (implicit) model of how things are, and of the current context, and used those models to answer questions.

Some define intelligence (entirely separately from consciousness) precisely as the ability to develop an internal model. Coupled to regulatory feedback, the system can then modify itself in response to some set of internal and/or external conditions (Joscha Bach, for instance, suggests consciousness is a consequence of extremely complex *self*-models).
ragebol, almost 5 years ago
> In my head, and maybe this was naive, I had thought that, in order to attempt these sorts of tasks with any facility, it wouldn't be sufficient to simply feed a computer lots of text.

(Tasks here referring to questions in the New York Regents' science exam.)

Same for me.

But it makes sense, of course, that learning from text only is entirely possible. I certainly have not directly observed the answer to, e.g., 'Which process in an apple tree primarily results from cell division? (1) growth (2) photosynthesis (3) gas exchange (4) waste removal'; I was taught, from textbooks, what the answer should be.

I do have a much better grounding of what growth is, and what apples and apple trees are, though.
_emacsomancer_, almost 5 years ago
A bit I found rather strange, on the language side:

> This is to say the patterns in language use mirror the patterns of how things are (1).

> (1) Strictly of course only the patterns in true sentences mirror, or are isomorphic to, the arrangement of the world, but most sentences people utter are at least approximately true.

Presumably this should really say something like "...but most sentences people utter are at least approximately true *of their mental representation of the world*."
ascavalcante80, almost 5 years ago
NLP is great for many things, but, from my own experience as an NLP developer, machines are not even close to understanding human language. They can interpret some kinds of written speech well, but they struggle to grasp two humans speaking to each other. The progress we are making on building chatbots and voice assistants is mainly due to the fact that we are learning how to speak to the machines, and not the contrary.
laurieg, almost 5 years ago
I find it a little bit strange that there is an unspoken assumption in almost all natural language processing: that speech and text are perfectly equivalent.

All of the examples in the article work on English text, not spoken English. I would consider spoken English to be a much better "gold standard" of natural language.

I'm really looking forward to machine translation operating purely on a speech-in/speech-out basis, instead of converting to text as an intermediate step.
rllin, almost 5 years ago
The thing is, humans have already most efficiently encoded reality, in detail, in text; humans already highlight what is worth encoding about reality.

For example, you could fine-tune GPT-2 to have some idea of sexual biology by having it read erotica, just as you could have a model learn the same by watching porn. But it is much more efficient to read the text, since there is far less information that is "useless".
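For anyone curious what 'fine-tune GPT-2' amounts to mechanically, here is a bare-bones sketch using the HuggingFace transformers library; the corpus file and hyperparameters are placeholders, and a real run would want batching, shuffling, and evaluation:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    text = open("corpus.txt").read()  # hypothetical domain corpus
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    chunks = [ids[i:i + 512] for i in range(0, len(ids) - 512, 512)]

    model.train()
    for chunk in chunks:
        batch = chunk.unsqueeze(0)  # shape (1, 512)
        # Causal LM objective: predict each token from its left context.
        loss = model(input_ids=batch, labels=batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

All the 'knowledge' the comment describes enters through that single next-token objective.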
p1esk, almost 5 years ago
Note this is pre-GPT-3. In fact, I expect GPT-4 will be where interesting things start happening in NLP.
benibela, almost 5 years ago
Rather than a generator, I could use a good verifier, i.e., an accurate grammar checker.
narag, almost 5 years ago
Has it ever happened that a "thought experiment" became a real experiment?
jvanderbot, almost 5 years ago
I'd go one step further: humans themselves don't understand anything; we are just good at constructing logical-sounding (plausible, testable) stories about things. These are mental models, and they are the only way we can make reasonable predictions within the error tolerances of our day-to-day experience, but they are flat-out lies and stories we tell ourselves, not based on a high-fidelity understanding of anything.

Rumination, deep thinking, etc. are simply actor-critic learning of these mental models for storytelling.