I'm relatively new to machine learning, but here's my best attempt to summarize what's going on in layman's terms. Please correct me if I'm wrong.<p>- Encode the words in the source (aka embedding, section 3.1)<p>- Feed every run of k words into a convolutional layer to produce an output, and repeat this process 6 layers deep (section 3.2).<p>- Decide which input word is most important for the "current" output word (aka attention, section 3.3).<p>- Decode the most important word into the target language (section 3.1 again).<p>You repeat this process with every word as the "current" word. The critical insight of using this mechanism over an RNN is that you can do the repetition in parallel, because each "current" word does not depend on any of the previous ones.<p>Am I on the right track?
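The steps above can be sketched in toy code. This is a pure-Python illustration of the shape of the pipeline (embed, stack convolutions over k-word windows, attend per output position), not the paper's actual model — the tables, dimensions, and averaging "convolution" here are all made up for clarity:

```python
import math

def embed(words, table):
    """Section 3.1: look up a vector for each source word (toy lookup table)."""
    return [table[w] for w in words]

def conv_layer(vectors, k=3):
    """One 'convolutional layer': each output position mixes a k-wide window.
    The real model uses learned filters plus gated linear units; this just
    averages the window, which is enough to show the data flow."""
    pad = k // 2
    padded = [vectors[0]] * pad + vectors + [vectors[-1]] * pad
    dim = len(vectors[0])
    return [[sum(v[d] for v in padded[i:i + k]) / k for d in range(dim)]
            for i in range(len(vectors))]

def attention(query, keys):
    """Section 3.3: softmax of dot-product scores between the current target
    state (query) and every encoded source position (keys)."""
    scores = [sum(q * kk for q, kk in zip(query, key)) for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

table = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}
enc = embed(["the", "cat", "sat"], table)
for _ in range(2):            # stack layers (the paper stacks many more)
    enc = conv_layer(enc)
weights = attention([0.0, 1.0], enc)
print(max(range(len(weights)), key=weights.__getitem__))  # most-attended source position
```

Note that every call above works on whole positions at once and nothing feeds an earlier output into a later step, which is the parallelism point made in the comment.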
> Facebook's mission of making the world more open<p>That's a rather strong statement, for a company that has become one of the world's most complained-about black boxes.<p>But yes, they have done a lot of good in the computer science space.
As far as I understood it, Facebook put a lot of research into optimizing a certain type of neural network (the CNN), while everyone else is using another type called the RNN. Up until now, CNNs were faster but less accurate. However, FB has advanced CNNs to the point where they can compete on accuracy, in this case for machine translation. And most importantly, they are releasing the source code and papers. Does that sound right?<p>Can anyone else give us an ELI5?
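The speed difference comes from dependency structure, and a toy sketch makes it concrete (these are not real models, just two functions with the same input): an RNN's state at each step depends on the previous step's state, so the positions must be computed in order, while a convolution's output at each position depends only on a fixed window of the *input*, so every position can be computed independently, i.e. in parallel.

```python
def rnn_outputs(xs):
    """Inherently sequential: h at step i depends on h at step i-1."""
    h, out = 0.0, []
    for x in xs:
        h = 0.5 * h + x       # must wait for the previous h
        out.append(h)
    return out

def conv_outputs(xs, k=2):
    """Each position reads only a window of the raw input,
    so all positions could run at the same time."""
    return [sum(xs[max(0, i - k + 1):i + 1]) for i in range(len(xs))]

print(rnn_outputs([1.0, 1.0, 1.0]))   # [1.0, 1.5, 1.75]
print(conv_outputs([1.0, 1.0, 1.0]))  # [1.0, 2.0, 2.0]
```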
In this work, Convolutional Neural Nets (spatial models with a weakly ordered context, as opposed to Recurrent Neural Nets, which are sequential models with a strongly ordered context) are demonstrated to achieve state-of-the-art results in machine translation.<p>It seems the combination of gated linear units, residual connections, and attention was the key to bringing this architecture to state of the art.<p>It's worth noting that the QRNN and ByteNet architectures have previously used Convolutional Neural Nets for machine translation as well. IIRC, those models performed well on small tasks but were not able to best SotA performance on larger benchmark tasks.<p>I believe it is almost always more desirable to encode a sequence using a CNN if possible, as many of the operations are embarrassingly parallel!<p>The BLEU scores in this work were the following:<p>Task (previous best): this work<p>WMT’16 English-Romanian (28.1): 29.88
WMT’14 English-German (24.61): 25.16
WMT’14 English-French (39.92): 40.46
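Two of the ingredients named above are simple enough to sketch in a few lines. This is a minimal pure-Python illustration of a gated linear unit and a residual connection operating on plain lists; in the paper these operate on the output channels of learned convolutions, which are omitted here:

```python
import math

def glu(values, gates):
    """Gated linear unit: elementwise values * sigmoid(gates).
    In the paper, the two arguments are the two halves of a conv
    layer's output channels; here they're just plain lists."""
    return [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(values, gates)]

def residual(layer_input, layer_output):
    """Residual connection: add the layer's input back onto its output,
    which makes deep stacks of layers easier to train."""
    return [x + y for x, y in zip(layer_input, layer_output)]

h = [1.0, 2.0]
h = residual(h, glu([0.3, -0.1], [0.0, 0.0]))  # one gated block, conv omitted
print(h)  # [1.15, 1.95]
```

The gate lets the network decide per-dimension how much of each value to pass through, and the residual path keeps gradients flowing through a deep stack.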
This smells of "we built custom silicon to do fast image processing using CNNs and fully connected networks, and now we want to use that same silicon for translations."
I wonder if they can combine this with ByteNet (dilated convolutions in place of vanilla convs), which gives you a larger field of view (receptive field); add in attention, and then you probably have a new SOTA.
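The receptive-field advantage of dilation is easy to quantify. A standard result for stacked 1-D convolutions is that the receptive field is 1 + (k - 1) * sum(dilations); this toy calculation contrasts six vanilla layers with six ByteNet-style layers whose dilation doubles each layer:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1-D convolutions with a dilation per layer."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Vanilla convs: dilation 1 everywhere -> linear growth in depth.
print(receptive_field(3, [1] * 6))               # 13
# Dilations doubling per layer -> exponential growth in depth.
print(receptive_field(3, [1, 2, 4, 8, 16, 32]))  # 127
```

Same depth and kernel size, roughly ten times the context per output position, which is why dilation is attractive for long sequences.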
TLDR: Cutting-edge accuracy, nine times faster than the previous state of the art, published models and source code.<p>But go read the article: there are nice animated diagrams in there.