Very interesting approach; intuitively it makes sense to treat language less as a sequence of words over time and more as a collection of words/tokens whose meaning lies in their relative ordering.

Now I'm wondering what would happen if a model like this were applied to other kinds of text generation, such as chatbots. Maybe we could build genuinely useful bots if they could attend over the entire conversation so far, plus additional metadata. Think customer-service bots with access to customer data that learn to interpret questions, associate them with the customer's account information through the attention mechanism, and generate useful responses.
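A rough sketch of the attention step I have in mind, in plain NumPy; this is scaled dot-product attention as in the paper, with made-up token counts and embedding sizes standing in for a real conversation history:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Each query position attends to every key position; the softmax
        # weights sum to 1 per query, giving a weighted sum of the values.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    n_tokens, d_model = 6, 8                     # pretend: 6 tokens of conversation so far
    X = rng.normal(size=(n_tokens, d_model))     # stand-in token embeddings
    out = scaled_dot_product_attention(X, X, X)  # self-attention: every token sees every token
    print(out.shape)                             # (6, 8)

In principle the account metadata could just be encoded as more tokens in X, which is what makes the idea attractive for the customer-service case.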
DeepL (which was on HN earlier this week) also uses an attention-based mechanism like this, or at least one with the same intention and effect. They didn't really talk about it publicly, but the founder mentioned it to me. The two teams seem to have pursued the technique independently, perhaps from some shared ancestor such as a paper that inspired both.
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
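For concreteness, a minimal sketch of the stacking, assuming the structure described in the paper: each encoder layer is self-attention plus a position-wise feed-forward network (no convolutions), with residual connections. Layer norm is omitted for brevity, and all sizes, layer counts, and random weights here are arbitrary illustration rather than learned parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, n_layers, n_tokens = 8, 32, 6, 5  # arbitrary illustrative sizes

    def self_attention(X):
        # Scaled dot-product self-attention; in a real model Wq/Wk/Wv are
        # learned parameters, not fresh random draws.
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(d_model)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # softmax over positions
        return w @ V

    def feed_forward(X):
        # Position-wise MLP applied independently to each token.
        W1 = rng.normal(size=(d_model, d_ff)) * 0.1
        W2 = rng.normal(size=(d_ff, d_model)) * 0.1
        return np.maximum(0, X @ W1) @ W2            # ReLU

    def encoder_layer(X):
        X = X + self_attention(X)                    # residual connection
        return X + feed_forward(X)                   # residual connection

    X = rng.normal(size=(n_tokens, d_model))         # stand-in embeddings
    for _ in range(n_layers):                        # the "stacked encoders"
        X = encoder_layer(X)
    print(X.shape)                                   # (5, 8)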
Would something like this work well on mixed/pidgin languages, e.g. Hinglish, which is a mixture of Hindi and English used in daily vernacular?