Very interesting approach; intuitively it makes sense to treat language less as a sequence of words over time and more as a collection of words/tokens whose meaning lies in their relative ordering.

Now I'm wondering what would happen if a model like this were applied to other kinds of text generation, such as chatbots. Maybe we could build genuinely useful bots if they could attend over the entire conversation so far, plus additional metadata. Think customer-service bots with access to customer data that learn to interpret questions, associate them with the customer's account information through the attention mechanism, and generate useful responses.
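A rough sketch of the attention step I have in mind, in plain NumPy; this is scaled dot-product attention as in the paper, with made-up token counts and embedding sizes standing in for a real conversation history:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Each query position attends to every key position; the softmax
        # weights sum to 1 per query, giving a weighted sum of the values.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    n_tokens, d_model = 6, 8                     # pretend: 6 tokens of conversation so far
    X = rng.normal(size=(n_tokens, d_model))     # stand-in token embeddings
    out = scaled_dot_product_attention(X, X, X)  # self-attention: every token sees every token
    print(out.shape)                             # (6, 8)

In principle the account metadata could just be encoded as more tokens in X, which is what makes the idea attractive for the customer-service case.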
DeepL (which was on HN earlier this week) also uses an attention-based mechanism like this, or at least one with the same intention and effect. They didn't really talk about it publicly, but the founder mentioned it to me. The two teams seem to have pursued the technique independently, perhaps from some shared ancestor such as a paper that inspired both.
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
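For concreteness, a minimal sketch of the stacking, assuming the structure described in the paper: each encoder layer is self-attention plus a position-wise feed-forward network (no convolutions), with residual connections. Layer norm is omitted for brevity, and all sizes, layer counts, and random weights here are arbitrary illustration rather than learned parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, n_layers, n_tokens = 8, 32, 6, 5  # arbitrary illustrative sizes

    def self_attention(X):
        # Scaled dot-product self-attention; in a real model Wq/Wk/Wv are
        # learned parameters, not fresh random draws.
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(d_model)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # softmax over positions
        return w @ V

    def feed_forward(X):
        # Position-wise MLP applied independently to each token.
        W1 = rng.normal(size=(d_model, d_ff)) * 0.1
        W2 = rng.normal(size=(d_ff, d_model)) * 0.1
        return np.maximum(0, X @ W1) @ W2            # ReLU

    def encoder_layer(X):
        X = X + self_attention(X)                    # residual connection
        return X + feed_forward(X)                   # residual connection

    X = rng.normal(size=(n_tokens, d_model))         # stand-in embeddings
    for _ in range(n_layers):                        # the "stacked encoders"
        X = encoder_layer(X)
    print(X.shape)                                   # (5, 8)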
Would something like this work well on mixed/pidgin languages, e.g. Hinglish, which is a mixture of Hindi and English used in daily vernacular?