I tried some Chinese news sites, and more than one paragraph was translated to English perfectly. Very impressive. But my Chinese wife asked me to put in some text from Weibo, China's Twitter clone, and the translations were nearly incomprehensible. She insisted that the samples we chose were not slang but were everyday colloquial Chinese that is easily understood by anyone. My guess is that Google's training set is mainly Chinese news sites, which are a formal type of Chinese that is quite different from spoken Chinese. I wonder if they can scrape Weibo messages to improve their translations.
I can confirm: I took some of the most difficult text I could find (some articles from lesswrong.com) and translated it from English to German with Google... the German translation is very close to perfect now, comparable to any manual translation I could do on my own, being fluent in both languages.
Wow I looked at this between Korean and English -- it's very impressive. Amazing, in fact, b/c Korean and Japanese seem to be the hardest to get right(?). There were inaccuracies but in the past even getting the gist of something was difficult. I then tried translating newspaper text from Korean to French, but that was making far more mistakes... Also, going from English to Korean is better but, for example, "I'll go nuts" turns into "I'll bear fruit" (나는 열매 맺을거야) and so on. And of course the social / honorific stuff can't be conveyed yet. But it's head and shoulders over the previous versions. Amazing. A bit frightening.<p>Just reading Korean is really hard for me b/c I'm not Korean... so this should help. It might not help my Korean language skill, though... or will it? Of course it also tends to devalue my skill of reading in Korean... or does it?
If you are interested in how neural translation systems like this work and how they are different than the previous statistical systems, check out <a href="https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa" rel="nofollow">https://medium.com/@ageitgey/machine-learning-is-fun-part-5-...</a>
Let's try (Brazilian) Portuguese:<p>"Google e Facebook declararam guerra aos sites de Internet que difundem notícias falsas, que o buscador e a rede social vão impedir que se beneficiem de seus serviços de publicidade."<p>"Google and Facebook have declared war on Internet sites that broadcast fake news that the search engine and social network will prevent them from benefiting from their advertising services."<p>_____<p>After the comma (which Google misses in its English translation), the word "que" (that) should be translated as "which" in this case. Also, the reflexive "beneficiar-se" (to benefit oneself, used here in the imperative/command tense) seems to have confused Google, likely due to having missed the comma earlier. I took out the middle part of the second clause and translated only "que vão impedir que se beneficiem" and Google got it right, translating it as "which will prevent them from benefiting".<p>I have experience translating from BR-PT to EN (and even vice-versa for my own testing), and BR-PT native speakers have a habit of writing long-winded, run-on sentences in all sorts of published literature. I'm curious to see Google understand that aspect, which even trips me up once in a blue moon.
I find it really interesting that Google Cloud Platform customers get access to this immediately.<p>It bodes well for Google Cloud: putting out their latest and greatest eases my doubts as to whether it's a first-class citizen within Google. (I know the head of the Cloud unit is on Google's board, which was a major sign of taking 'cloud' seriously.)
Japanese translation has improved recently, but it still often generates pretty silly results. Then again, this language is one big Winograd schema, so it would be hard to improve without strong AI. Shameless plug for my own Japanese translation service <a href="http://ichi.moe" rel="nofollow">http://ichi.moe</a>, which doesn't even attempt to build a sentence and relies on the user to resolve ambiguity.
Since they listed Japanese in the list of languages, I went ahead and took the first news on yahoo japan, which was translated to:<p>"Government and ruling parties will raise the upper limit of the annual income (under 1,300,000 yen) of spouses subject to deduction to 1.3 million yen or 1.5 million yen, over the review of spousal deduction, which is the focus of the tax reform debate focused on in the 2017 tax reform debate I entered the adjustment with the plan. If the annual income of each husband exceeds 13.2 million yen (11 million yen for "income" minus the amount deemed necessary expenses for work), 11.2 million yen (9 million yen same), it is excluded from the system. The ruling party taxation study committee will review these two plans and aim to include it in the tax reform outline of FY2005"<p>Seems like there's still a long way to go.<p>(Copy/pasted original text:
2017年度の税制改正議論で焦点となっている配偶者控除の見直しを巡り、政府・与党は、控除対象となる配偶者の年収上限(103万円以下)を130万円か150万円まで引き上げる案で調整に入った。それぞれ夫の年収が1320万円(仕事の必要経費とみなされる額を差し引いた「所得」では1100万円)、1120万円(同900万円)を超える場合は制度の対象外とする。与党税制調査会はこの2案を軸に検討し、17年度税制改正大綱に盛り込むことを目指す)<p>I wonder how it went from 103万円以下 to "under 1,300,000 yen"
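For reference, the arithmetic behind that figure: 万 is a unit of 10,000, so 103万円 is 1,030,000 yen, while 1,300,000 yen would be 130万円, which is the next amount in the sentence. This does not explain why the system produced it, only which source figure the output matches. A minimal sketch of the conversion (the helper name `man_yen_to_int` is made up for illustration):

```python
import re

MAN = 10_000  # 万 = ten thousand


def man_yen_to_int(text: str) -> int:
    """Convert a figure written like '103万円' into a plain yen amount."""
    match = re.fullmatch(r"(\d+)万円", text)
    if match is None:
        raise ValueError(f"not a simple 万円 figure: {text!r}")
    return int(match.group(1)) * MAN


# Figures taken from the pasted article text.
for figure in ["103万円", "130万円", "150万円", "1320万円"]:
    print(figure, "=", f"{man_yen_to_int(figure):,}", "yen")
```

So the "1.3 million", "1.5 million" and "13.2 million" in the English output are right; only the 103万円 figure came out as the 130万円 value.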
It's still a ways from becoming self aware. Here's a piece of an old family letter in German:<p>Meine lieben Kinder!
Sveben brachte Frl. Moldelen die Rarte
von Herrn Thomass mit du schoene Nachricht,
dass R. doch endlich gut in Br. ankam. Was
bin ich froh daruber! Und nun hoffe ich doch
schr, dess Roesi Samstag mittag in R. ankem,
sich schrubben u aus schlafen konnte. Dickes,
hast Du Dir nichts gehalt bei der Rums Scheru?
Mittwoch ging ich nach Tisch zur Stadt, be-
sorgte Einiges u wollte im Hansahed in der
Wilm. Str. haden. Musste aber 2 1/2 St. werde,
dann war es aber sehr schoen. Ich mechte
dann das Abendessen u erst um 8 h ging
ich rauf ins Zimmer zum Tisch decken de
fand ich den Zettel von Frl. B. mit Frl. Mol.
dehns grusse von Dir! Meinen Schrucke koemmt
Ihr Euch denken.<p>My dear children!
Sveben brought Ms. Moldelen the Rarte
From Mr. Thomass, with a nice news,
That R. finally got well in Br. What
I'm glad about it And now I hope
Schr, dess Roesi Saturday noon in R. ankem,
Could scrub u from sleeping. Thick,
You have nothing to do with the Rums Scheru?
Wednesday, I went to the city,
Caused some u wanted in the Hansahed in the
Wilm. Str. Had to be 2 1/2 St.,
Then it was very beautiful. I want to
Then the dinner u went around 8 h
I rise up into the room to cover the table
I found the note of Miss B. with Miss Mol.
Dehns greetings from you! My shrine
You think.
So for German:<p>Our baby is due in January.<p>goes to<p>Unser Baby ist im Januar fällig<p>My German colleagues assure me Google's neural network needs a bit more training on that one. I often use Google Translate to go back from the German I have (badly) created to English, as a further check that it's somewhat understandable. In terms of it replacing asking real humans for help... I think it's still a long way away, but good to see Google investing in it.
This might create a new problem: Without obvious mistakes like word salad, you can no longer evaluate the probable quality of a translation. You might read a translated news article and be oblivious to missing or mistranslated facts, because the text flows well and sounds convincing. The old method had telltale signs of breakdown and the resulting text was always clumsy enough to warn anyone off trusting it too much. Hmmm. Interesting times.
If they manage to reach 99% translation accuracy of German texts, I'd say they have achieved a remarkable feat.<p>I know that Modern Standard Arabic is not supported yet with the NMT system, but I just went and tried the translation for a small excerpt from an article on DW [1]<p>"بالرغم من عدم وجود تأكيدات رسمية منها على نيتها للترشح مجددا، قال قيادي بارز في حزبها إن المستشارة ميركل ستترشح لولاية رابعة. جاء ذلك على لسان المسؤول عن لجنة العلاقات الخارجية في البرلمان الألماني نوربرت روتغن. "<p>"Despite the lack of official confirmation, including the intention to run again, a senior leader of her party said that Chancellor Merkel will stand for a fourth term. This came on the tongue in charge of the Foreign Relations Committee in the German Parliament Norbert Rongn."<p>Of course, the translation is not perfect, but it is good enough. However, I believe they could do better by working on their Arabic text-to-speech synthesizer and adding a toggle option for diacritics, which would definitely help the synthesizer: many words are pronounced wrong, some very wrong, which is disappointing.<p>All in all, great work by the people at Google Translate.<p>[1]: <a href="http://www.dw.com/ar/%D8%B3%D9%8A%D8%A7%D8%B3%D9%8A-%D8%A8%D8%A7%D8%B1%D8%B2-%D9%81%D9%8A-%D8%AD%D8%B2%D8%A8-%D8%A7%D9%84%D9%85%D8%B3%D8%AA%D8%B4%D8%A7%D8%B1%D8%A9-%D9%85%D9%8A%D8%B1%D9%83%D9%84-%D8%B3%D8%AA%D8%AA%D8%B1%D8%B4%D8%AD-%D9%84%D9%88%D9%84%D8%A7%D9%8A%D8%A9-%D8%B1%D8%A7%D8%A8%D8%B9%D8%A9/a-36404882" rel="nofollow">http://www.dw.com/ar/%D8%B3%D9%8A%D8%A7%D8%B3%D9%8A-%D8%A8%D...</a>
German to English works amazingly well... I was able to fool it the other way round though. Here are some crazy German sentences and their English versions:<p>"Das altbacken emotionale Muster einer zerstoerten Ehe aehnelt dem Neutrino-sturm eines sterbenden Gasgiganten"
-"The old-fashioned emotional pattern of a ruined marriage resembles the neutrino storm of a dying gas giant"<p>(In the above I tried to confuse it using archaic words mixed with completely disconnected topics in the same sentence while still being grammatically correct.)<p>"Das verrueckte an der Sache ist der enorme Unterschied zwischen digitalem Denkmuster und analogem Sachverstand" -
"The crazy thing about this is the enormous difference between digital thought patterns and analogous expertise"<p>Beautiful! [Disclaimer: studied linguistics]
When I read something about Google Translate (related to Latvian or not), there is a certain phrase I love testing - "Hard rock fan".<p>There were times when it was translated literally - "a fan, manufactured from solid rock" (cieto iežu ventilators). Then for a brief moment it was translated as originally intended. Now, however, it translates to nonsense ("hard rock ventilators"), which may loosely be translated back to English as "Hard rock fan", where "fan" refers to that thing which moves air around.<p>However, an article from a Latvian news site was translated into English unexpectedly well. Which was not the case for English to Latvian translation, sadly. But it makes some kind of point if we consider English as the lingua franca.
So what does this mean for language learning?<p>In one sense, calculators and computers haven't made learning arithmetic and other math less important. On the contrary.<p>Maybe in a similar way, by increasing the amount of communication between speakers and writers of different languages, tools like this might actually make language learning _more_ important? Or is that an interesting thought but completely wrong? Perhaps speaking and listening will gain in importance while writing and reading will decrease? Or is it not worth the time and trouble to learn another language anymore?<p>A penny for your thoughts. (OK, not a real penny ;)
I'm amazed with the improvement in Turkish. Here's an example:<p><a href="http://i.imgur.com/NQvJ6bK.png" rel="nofollow">http://i.imgur.com/NQvJ6bK.png</a><p>Left side is human translation, right side is Google:<p><a href="http://i.imgur.com/bYJDHhs.png" rel="nofollow">http://i.imgur.com/bYJDHhs.png</a><p>The loss of information is minimal, and mistakes are very tolerable. Funnily, the biggest ones are already underlined by Chrome spell checker.<p>update: I'm trying it with my commit logs (English -> Turkish and German) and the results are amazing.
I may sound overly optimistic, but I feel this is very good for people who are interested in learning new languages. It will surely help people learn many languages with relative ease and at low cost. Anyone can try out various sentences from a new target language (e.g. German) and at least get a near-enough meaning from Google. I can try many variations on simple sentences and get a good start, without having to rely on human help, which is very costly in terms of money and (more importantly) time. A human teacher will get bored with me asking for hundreds of variations to be translated, but a computer won't. This is great, at least for me.<p>Some people fear that this means there is no longer any need for language learning. But I see it differently: it's like how Wikipedia and the Internet opened the doors of knowledge to all people who are "interested" in knowledge. Now with this tool, we have a door opened to learning another language right from within our home.<p>The only nagging feeling is that all this is Google, and with Google becoming more and more evil, this is scary.
I translated this line from the New York Times - "The dismissals followed the abrupt firing on Friday of Gov. Chris Christie" to Tamil. It translated "firing" as துப்பாக்கி சூடு (Gunfire). Maybe it needs to infer the contextual meaning from the earlier word "dismissals"?
My German (3rd language) is rather horrid, but I thought I'd try with "Es gibt mich Schadenfreude". In return I got "It gives me pleasure". Well, yes, but.. :-)<p>I then moved on to translating between Norwegian and English, both primary languages of mine (well, the latter for some 16 years), and was thoroughly impressed by the results as long as I stayed away from idioms - well, some. Try something a bit Aussie like "Up sh*t creek in a barbed wire canoe", and it'd fall flat on its face. Then, however it successfully mastered "Bedre med en fugl i hånden enn ti på taket" => "A bird in the hand is worth two in the bush".<p>Overall, that's really quite amazing work that the team has put in.
I'm not a fan of translating the 'meaning' instead of the actual wording. You lose idiom, phrasing, even the poetry. It's as if I won't understand what they said, so it gets dumbed down. Like talking to a child.<p>My pet peeve: translators that tell you 'what they meant' instead of what they said.
It seems that production versions of these cutting-edge techniques come out very soon after the paper is released (in this case just a few months). I wonder how long the internal development process for these big ML efforts is.
The French translation is pretty bad. The funny thing to try is to double translate English => French => English and compare both texts. If the results are way off, you know the translation is incorrect.
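A rough sketch of that round-trip check in Python. The translator functions are passed in as plain callables, since no particular translation API is assumed here; the two `fake_*` functions are stand-ins that replay the "I'll go nuts" mistranslation mentioned elsewhere in this thread (the French string is invented for the demo). Word-level similarity via `difflib` is only one possible way to compare the texts.

```python
from difflib import SequenceMatcher
from typing import Callable, Tuple

Translator = Callable[[str], str]


def round_trip_similarity(
    text: str, forward: Translator, backward: Translator
) -> Tuple[str, float]:
    """Translate text forward and back again, then score how much survived.

    Returns the round-tripped text and a word-level similarity in [0, 1].
    """
    round_tripped = backward(forward(text))
    original_words = text.lower().split()
    returned_words = round_tripped.lower().split()
    score = SequenceMatcher(None, original_words, returned_words).ratio()
    return round_tripped, score


# Hypothetical stand-ins for real translation calls (assumption, not a real API).
def fake_en_to_fr(s: str) -> str:
    return {"I'll go nuts": "Je vais porter des fruits"}.get(s, s)


def fake_fr_to_en(s: str) -> str:
    return {"Je vais porter des fruits": "I'll bear fruit"}.get(s, s)


result, score = round_trip_similarity("I'll go nuts", fake_en_to_fr, fake_fr_to_en)
print(result, round(score, 2))
```

A low score only flags that something changed on the way; a high score does not prove the intermediate translation was good, since two mistakes can cancel out on the way back.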
I'm excited to see if these changes trickle down to the Latin model, which has seen improvement over the years but at a pace slower than non-dead/historical languages.
Even though I appreciate the quality of the new translation, in the translation example in the image with the two phones, I find the old translation more insightful (even if not grammatically correct) than the new one.
I.e. I prefer: "No problem can be solved from the same consciousness that they have arisen" to "Problems can never be solved with the same way of thinking that caused them"