
A novel approach to neural machine translation

283 points by snippyhollow about 8 years ago

13 comments

CGamesPlay about 8 years ago
I'm relatively novice to machine learning but here's my best attempt to summarize what's going on in layman's terms. Please correct me if I'm wrong.

- Encode the words in the source (aka embedding, section 3.1)

- Feed every run of k words into a convolutional layer producing an output; repeat this process 6 layers deep (section 3.2).

- Decide which input word is most important for the "current" output word (aka attention, section 3.3).

- The most important word is decoded into the target language (section 3.1 again).

You repeat this process with every word as the "current" word. The critical insight of using this mechanism over an RNN is that you can do this repetition in parallel, because each "current" word does not depend on any of the previous ones.

Am I on the right track?
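Here is a minimal PyTorch sketch of the steps described above. It is not fairseq's actual code: every name, dimension, and the choice of ReLU nonlinearity is illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyConvEncoder(nn.Module):
        def __init__(self, vocab_size, dim=256, k=3, layers=6):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)    # step 1: embed source words (sec 3.1)
            self.convs = nn.ModuleList([
                nn.Conv1d(dim, dim, k, padding=k // 2)    # step 2: each output sees a run of k words
                for _ in range(layers)                    # ...stacked 6 layers deep (sec 3.2)
            ])

        def forward(self, tokens):                        # tokens: (batch, src_len) of word ids
            x = self.embed(tokens).transpose(1, 2)        # -> (batch, dim, src_len)
            for conv in self.convs:
                x = torch.relu(conv(x))                   # every position computed in parallel
            return x.transpose(1, 2)                      # -> (batch, src_len, dim)

    def attend(decoder_state, encoder_out):
        # step 3 (sec 3.3): weight source positions against the "current" target word
        # decoder_state: (batch, dim); encoder_out: (batch, src_len, dim)
        scores = torch.bmm(encoder_out, decoder_state.unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)                # importance of each input word
        return torch.bmm(weights.unsqueeze(1), encoder_out).squeeze(1)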
forgotmyhnacc about 8 years ago
I really like that Facebook open-sources both code and model along with the paper. Most companies don't: e.g. Google, DeepMind, Baidu.
gavinpc about 8 years ago
> Facebook's mission of making the world more open

That's a rather strong statement, for a company that has become one of the world's most complained-about black boxes.

But yes, they have done a lot of good in the computer science space.
snippyhollow about 8 years ago
paper: https://s3.amazonaws.com/fairseq/papers/convolutional-sequence-to-sequence-learning.pdf

code: https://github.com/facebookresearch/fairseq

pre-trained models: https://github.com/facebookresearch/fairseq#evaluating-pre-trained-models
pwaivers about 8 years ago
As far as I understood it, Facebook put lots of research into optimizing a certain type of neural network (CNN), while everyone else is using another type called RNN. Up until now, CNN was faster but less accurate. However, FB has progressed CNN to the point where it can compete in accuracy, particularly in speech recognition. And most importantly, they are releasing the source code and papers. Does that sound right?

Can anyone else give us an ELI5?
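A minimal PyTorch sketch of that speed intuition (illustrative only; the module sizes are made up): the RNN has to run its timesteps one after another, while the convolution computes every position at once.

    import torch
    import torch.nn as nn

    seq = torch.randn(1, 100, 32)              # (batch, time, features)

    # RNN: step t needs the hidden state from step t-1, so the 100
    # steps have to run sequentially.
    rnn = nn.RNN(32, 32, batch_first=True)
    out_rnn, _ = rnn(seq)

    # CNN: each output position depends only on a fixed window of the
    # input, so all 100 positions are computed in one parallel pass.
    conv = nn.Conv1d(32, 32, kernel_size=3, padding=1)
    out_cnn = conv(seq.transpose(1, 2)).transpose(1, 2)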
deepnotderp about 8 years ago
As far as I understand, only the use of the attention mechanism with ConvNets is novel, right? Convolutional encoders have been done before.
mrdrozdov about 8 years ago
In this work, Convolutional Neural Nets (spatial models that have a weakly ordered context, as opposed to Recurrent Neural Nets, which are sequential models that have a strongly ordered context) are demonstrated to achieve state-of-the-art results in machine translation.

It seems the combination of gated linear units / residual connections / attention (sketched below) was the key to bringing this architecture to state of the art.

It's worth noting that the QRNN and ByteNet architectures have previously used Convolutional Neural Nets for machine translation as well. IIRC, those models performed well on small tasks but were not able to best SotA performance on larger benchmark tasks.

I believe it is almost always more desirable to encode a sequence using a CNN if possible, as many operations are embarrassingly parallel!

The BLEU scores in this work were the following, as task (previous baseline): new score:

WMT’16 English-Romanian (28.1): 29.88
WMT’14 English-German (24.61): 25.16
WMT’14 English-French (39.92): 40.46
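A minimal PyTorch sketch of the gated-linear-unit-plus-residual block mentioned above (illustrative; the paper's exact parameterization and scaling differ):

    import torch
    import torch.nn as nn

    class GLUBlock(nn.Module):
        def __init__(self, dim, kernel=3):
            super().__init__()
            # project to 2*dim channels: half become values, half become gates
            self.conv = nn.Conv1d(dim, 2 * dim, kernel, padding=kernel // 2)

        def forward(self, x):                    # x: (batch, dim, length)
            a, b = self.conv(x).chunk(2, dim=1)  # split into value / gate halves
            return a * torch.sigmoid(b) + x      # gated linear unit + residual shortcut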
londons_explore about 8 years ago
This smells of "we built custom silicon to do fast image processing using CNNs and fully connected networks, and now we want to use that same silicon for translations."
shriphani about 8 years ago
I wonder if they can combine this with ByteNet (dilated convolutions in place of vanilla convs), which gives you a larger FOV; add in attention and then you probably have a new SOTA.
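A minimal PyTorch sketch of how dilation grows the receptive field (illustrative numbers, not ByteNet's actual configuration):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 64)  # (batch, channels, length)

    # vanilla kernel-3 conv: each output sees 3 neighboring inputs
    vanilla = nn.Conv1d(16, 16, kernel_size=3, padding=1)

    # dilation=4 spreads the same 3 taps over 1 + (3 - 1) * 4 = 9 input
    # positions; stacking layers with dilations 1, 2, 4, 8, ... grows the
    # receptive field exponentially with depth
    dilated = nn.Conv1d(16, 16, kernel_size=3, padding=4, dilation=4)

    assert vanilla(x).shape == dilated(x).shape == x.shape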
pama about 8 years ago
This is a very cool development. Has anyone written a PyTorch or Keras version of the architecture?
m00x about 8 years ago
Does this mean that we're close to being able to use CNNs for text-to-speech?
esMazer about 8 years ago
no demo?
danielvf about 8 years ago
TLDR: Cutting-edge accuracy, nine times faster than previous state of the art, published models and source code.

But go read the article: nice animated diagrams in there.