科技回声 (Tech Echo)

A tech news platform built with Next.js, offering global tech news and discussion.


© 2025 科技回声 (Tech Echo). All rights reserved.

Transformer: A Novel Neural Network Architecture for Language Understanding

280 points | by andrew3726 | over 7 years ago

7 comments

emeijer, over 7 years ago
Very interesting approach, and intuitively it makes sense to treat language less as a sequence of words over time and more as a collection of words/tokens with meaning in their relative ordering.

Now I'm wondering what would happen if a model like this were applied to different kinds of text generation, like chat bots. Maybe we could build actually useful bots if they can have attention on the entire conversation so far plus additional metadata. Think customer service bots with access to customer data that can learn to interpret questions, associate them with the customer's account information through the attention model, and generate useful responses.
devindotcom, over 7 years ago
DeepL (which was on HN earlier this week) also uses an attention-based mechanism like this (or at least one with the same intention and effect). They didn't really talk about it, but the founder mentioned it to me. The two seem to have pursued the technique independently, perhaps from some shared ancestor, like a paper that inspired them both.
rayuela, over 7 years ago
The key to this paper is the "Multi-Head Attention", which looks a lot like a convolutional layer to me.
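For readers weighing that comparison, here is a minimal NumPy sketch of the multi-head attention mechanism the comment refers to (an editor's illustration of the idea in the paper, not code from it; the shapes, weight names, and sizes are chosen for the example):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention computed in n_heads parallel heads."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv  # each (seq, d_model)

    # Split each projection into heads: (n_heads, seq, d_head).
    def split(M):
        return M.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Attention weights per head: (n_heads, seq, seq).
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ Vh  # (n_heads, seq, d_head)
    # Concatenate the heads and project back to d_model.
    concat = out.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo

# Tiny illustrative example (random weights).
rng = np.random.default_rng(0)
seq, d_model, heads = 4, 8, 2
X = rng.normal(size=(seq, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
Y = multi_head_attention(X, *W, n_heads=heads)
print(Y.shape)  # (4, 8)
```

Like a convolution, each head applies the same learned projection at every position; unlike a convolution, the mixing pattern (the softmax weights) is computed from the content rather than being a fixed local window.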
jatsign, over 7 years ago
Has anyone come across good ML for Arabic–English? There seems to be a complete lack of decent training data.
mykeliu, over 7 years ago
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
sandGorgon, over 7 years ago
Would something like this work well on mixed/pidgin languages? For example Hinglish, which is a mixture of Hindi and English used in daily vernacular.
bra-ket, over 7 years ago
Does it mean we don't need gradient descent after all to achieve the same result?