Zero-Shot Translation with Multilingual Neural Machine Translation System

218 points by wwilson over 8 years ago

10 comments

Smerity over 8 years ago

If people are interested in the underlying architecture of Google's Neural Machine Translation (GNMT) system, I wrote an article that builds it up piece by piece. While it's intended for people who are likely to implement GNMT or similar architectures, the article is descriptive enough that it should be possible to follow along even if you're not well versed in deep learning.

http://smerity.com/articles/2016/google_nmt_arch.html

The GNMT architecture is used almost as is for the zero-shot MT experiments. We're likely to see the GNMT architecture used extensively by Google for a variety of projects, as they spent a great deal of time and effort ensuring it scales to quite large datasets. Training a neural machine translation system on a single language pair is difficult; training it on multiple pairs, especially all using the same set of weights, is insanely challenging!

As an example, the GNMT architecture was used as the basis of "Generating Long and Diverse Responses with Neural Conversation Models", which trains on the entirety of Reddit (1.7 billion messages) as well as various other datasets.

https://openreview.net/forum?id=HJDdiT9gl
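On the "multiple pairs, same set of weights" point: Google's multilingual NMT paper describes prepending an artificial token to each source sentence that names the desired target language (for example `<2es>` for Spanish), and otherwise training one shared model on the merged data. Below is a minimal Python sketch of that data-preparation step only, not of the model itself; the function names and toy sentences are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

# (source sentence, target sentence)
Pair = Tuple[str, str]


def tag_source(source: str, target_lang: str) -> str:
    """Prepend an artificial token naming the language the model should output."""
    return f"<2{target_lang}> {source}"


def build_multilingual_corpus(pairs_by_target: Dict[str, List[Pair]]) -> List[Pair]:
    """Merge every language pair into one training set for a single set of weights.

    pairs_by_target maps a target-language code to its (source, target) pairs.
    Only the desired output language is marked; the source language is not.
    """
    corpus: List[Pair] = []
    for target_lang, pairs in pairs_by_target.items():
        for source, target in pairs:
            corpus.append((tag_source(source, target_lang), target))
    return corpus


if __name__ == "__main__":
    # Hypothetical toy data: English->Spanish and Japanese->English pairs.
    corpus = build_multilingual_corpus({
        "es": [("Hello, how are you?", "Hola, ¿cómo estás?")],
        "en": [("こんにちは", "Hello")],
    })
    for example in corpus:
        print(example)
    # Zero-shot request: Japanese in, Korean out, a direction absent from the corpus.
    print(tag_source("こんにちは", "ko"))
```

Because the source language is never marked, the same tagging lets you request a direction (here Japanese to Korean) that never appears in the training data, which is the zero-shot case the post describes.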

sgentle over 8 years ago

This reminds me of Searle's Chinese Room Argument [0]: imagine you have a particularly dreary job where you sit in a room filled with boxes of symbols written on paper. Every now and again someone comes in and hands you some new symbols. You look through your rulebook and, depending on what it says, hand them some symbols back. It turns out that these rules actually implement a conversational program in Chinese. And if you can implement those rules without understanding Chinese, why would you think a computer program, implementing its own rules, could understand anything?

The common "Systems Reply" to this is that you're looking at the wrong layer of abstraction. The computer hardware (or the person in the room) doesn't understand Chinese; the computer plus the rules plus the data forms a system that understands Chinese. Searle's answer to this is: well, what if you memorised the rules and the database? You might know all the rules, and you might be able to follow them, but you still wouldn't understand Chinese.

What I think is fascinating about this is that it's vulnerable to Bayesian Judo [1]: if you have a strong belief that computers aren't capable of true understanding because of the Chinese Room Argument, then building an actual Chinese Room-style computer and having it show understanding should be a fairly strong blow to that belief.

Now, it's easy to quibble about what true understanding actually means, but one version (used in Searle's answer) is this: "[..] he would not know the meaning of the Chinese word for hamburger. He still cannot get semantics from syntax." But this news is exactly that! A computer translates the same semantic concept from one syntax to another without ever having been taught the rules connecting them. In other words, this is semantics from syntax, implemented by nothing but a computer, a database, and a set of rules.

So, by the reverse Chinese Room Argument, I would say this system exhibits a kind of understanding. Not a very sophisticated kind, mind you, but something that should still spook you if you believe computers are categorically incapable of thinking like us.

[0] http://plato.stanford.edu/entries/chinese-room/
[1] http://lesswrong.com/lw/i5/bayesian_judo/

hota_mazi over 8 years ago

Fascinating. Maybe the next step will be to extract the tokenized interlingua language that's emerged in the neural network and map it to real words, and blam, we reinvent Esperanto!

ChuckMcM over 8 years ago

Nice piece of work, and it counts as "implementing Star Trek in the present". Now I just need a nice pair of noise-cancelling over-the-ear headphones that let me hear English spoken no matter where I am :-)

honkhonkpants over 8 years ago

Pretty impressive, but even more amazingly their paper is in a single-column format that I can actually read on my computer, instead of pretending that I am reading printed and bound conference proceedings. Truly a giant leap for the field.

YeGoblynQueenne over 8 years ago

>> We call this “zero-shot” translation, shown by the yellow dotted lines in the animation. To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation.

I think it was last year when a friend was telling me how Google translates the Greek word for "swallow" (the bird) into French. Back then, the translation was the French word for "to swallow" (the verb). The bird and the action don't sound remotely alike in Greek, nor are they spelled alike (the bird is "χελιδόνι", the action is "καταπίνω"; Google Translate will at least give their correct pronunciation). My friend figured Google can't find enough examples between the two languages, so it goes via English... where the two words are homonyms.

I think that was last year, and certainly before September.

So I gave it a try again today, and this is still what I get:

    Greek        French
    χελιδόνι     avaler
    chelidóni

If you omit the accent on the "o" you don't even get the mistranslation; you get only the phonetic transcription of the Greek word in Latin characters.

Obviously the important thing here is not the one word that Google Translate gets wrong, but the fact that this "new" system doesn't really look all that new, or look like it does anything all that different from the previous one, or indeed like it improves things at all.
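A toy sketch of the friend's hypothesis, assuming word-level lookup through English as a pivot. The tables below are hypothetical two-entry dictionaries, not how Google Translate works internally; they only show how a sense distinction that exists in Greek can vanish when the intermediate language collapses both senses into one homonym.

```python
from typing import Dict, List

# Hypothetical toy tables: each word maps to its candidate translations.
GREEK_TO_ENGLISH: Dict[str, List[str]] = {
    "χελιδόνι": ["swallow"],  # the bird
    "καταπίνω": ["swallow"],  # the verb "to swallow"
}
ENGLISH_TO_FRENCH: Dict[str, List[str]] = {
    # "swallow" is ambiguous in English: verb listed first, bird second.
    "swallow": ["avaler", "hirondelle"],
}


def pivot_translate(greek_word: str) -> str:
    """Greek -> English -> French, keeping only the first candidate at each hop."""
    english = GREEK_TO_ENGLISH[greek_word][0]
    # At this point the bird/verb distinction is already gone.
    return ENGLISH_TO_FRENCH[english][0]


for word in GREEK_TO_ENGLISH:
    print(word, "->", pivot_translate(word))
```

Both Greek words come out as "avaler", even though "hirondelle" (the bird) is sitting in the French table: once the English hop has merged the two senses, the second hop has nothing left to disambiguate with.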

YeGoblynQueenne over 8 years ago

Google, and also Microsoft btw, absolutely need to be called out on this. They keep claiming that their translation systems work well, because they have reasonably good results between some language pairs, like English/French or English/Spanish, that a) are close linguistically, b) have a lot of examples of translated documents and, more importantly, c) have many speakers who might use Google Translate.

For languages where none of the above holds, however, the results continue to be completely ridiculous, no matter what "new" technique Google (or MS) advertises. Since those languages are not spoken by as many people as English or Spanish, though, it's very hard for users to figure out how atrociously bad the automatic translations are.

Here's an example from my native Greek; this is a bit of news text from yesterday [1]:

Λανθασμένη χαρακτηρίζει ο Κύπριος κυβερνητικός εκπρόσωπος, Νίκος Χριστοδουλίδης, την προσέγγιση, να μπαίνουν στο «ίδιο καλάθι» η Ελλάδα με την Τουρκία σε σχέση με το κυπριακό.

And here's Google's translation:

Incorrect characterizes Cypriot government spokesman Nikos Christodoulides, the approach to be put in "one basket" by Greece and Turkey in relation to Cyprus.

So, the Cypriot government spokesman (well done) is put in a basket by Greece and Turkey (wait wut). Hey, maybe the guy wanted to be put in two baskets? [2]

That's very typical of the way Google translates between Greek and English. For Google, it's Neural Networks leading us to a bright future where language barriers are eliminated thanks to Scienz! For Greeks, it's comedy gold.

And it's the same for Russians, Poles, Finns, Swedes, Indians, Chinese, Hungarians...

Still, Google keeps including those languages in the count of languages it "covers", because it's good advertising and who can really dispute them anyway?

_____________________

[1] http://www.kathimerini.gr/884845/article/epikairothta/politikh/xristodoylidhs-gia-kypriako-den-yphr3e-akraia-8esh-apo-thn-a8hna

[2] What's being said is more like: "The Cypriot government spokesman said that it's a mistake to treat Greece and Turkey in the same manner with regards to Cyprus".

vurpo over 8 years ago

I wonder how much memory this translation via an intermediate representation of a sentence takes. It seems like representing the semantic meaning of a sentence in a language-independent way would take a huge amount of data.
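A back-of-the-envelope sketch, assuming an attention-based encoder that keeps one hidden vector per source token (as GNMT-style models do), a hidden size of 1024, and 32-bit floats. All three numbers are assumptions you can change; the function name is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def encoder_state_bytes(num_tokens: int, hidden_size: int = 1024,
                        bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes to hold one hidden vector per source token (assumed sizes)."""
    return num_tokens * hidden_size * bytes_per_value


if __name__ == "__main__":
    tokens = 20  # assumed sentence length in tokens
    size = encoder_state_bytes(tokens)
    print(f"{tokens} tokens -> {size} bytes ({size // 1024} KiB)")
```

Under these assumptions a 20-token sentence needs 20 × 1024 × 4 = 81,920 bytes (80 KiB), which is small next to the model's shared weights.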

glandium over 8 years ago

The mentioned Japanese->English->Korean combination is one of the worst possible things to do. Both Korean and Japanese are very different from English, while similar to each other to some extent. Direct translation from one to the other would actually give much better results than translating into Korean the (likely broken) English you get from Japanese.

Edit: I do realize they're not talking about successive translations, but that's essentially how the training ends up happening, isn't it?

A better example, IMHO, would have been three very different languages, like English, Japanese and Russian.

spynxic over 8 years ago

This post seems to exaggerate a commonly known mathematical property.

Suppose I have languages X, Y, and Z. My machine currently knows how to translate between X and Y and between X and Z. The goal is to turn Y into Z without direct training. The process would be to translate Y into X, then X into Z: effectively Y->Z.

This isn't really transfer learning so much as it is logical induction...

Or am I missing something?
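A minimal sketch of the pivot procedure described in this comment, written as function composition, together with a count of how many trained directions each setup needs if every ordered language pair must be covered. The translators are hypothetical stand-ins (string wrappers), and the function names are illustrative only.

```python
from typing import Callable

# A translator maps a sentence in one language to a sentence in another.
Translator = Callable[[str], str]


def compose(first: Translator, second: Translator) -> Translator:
    """Pivot translation: run first (Y->X), then feed its output to second (X->Z)."""
    return lambda sentence: second(first(sentence))


def directions_needed(num_languages: int, via_hub: bool) -> int:
    """Trained directions needed to cover every ordered pair of languages."""
    n = num_languages
    if via_hub:
        # Each non-hub language needs one model into the hub and one out of it.
        return 2 * (n - 1)
    # Otherwise, one dedicated model per ordered pair.
    return n * (n - 1)


if __name__ == "__main__":
    # Hypothetical stand-ins that just record which hop was applied.
    y_to_x: Translator = lambda s: f"X({s})"
    x_to_z: Translator = lambda s: f"Z({s})"
    y_to_z = compose(y_to_x, x_to_z)
    print(y_to_z("hello"))  # Z(X(hello))
    print(directions_needed(10, via_hub=False), directions_needed(10, via_hub=True))
```

With 10 languages, dedicated models for every ordered pair need 90 directions, while pivoting through one hub needs 18; the post's approach instead trains a single shared model.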
