
Transformer: A Novel Neural Network Architecture for Language Understanding

280 points by andrew3726 over 7 years ago

7 comments

emeijer over 7 years ago
Very interesting approach, and intuitively it makes sense to treat language less as a sequence of words over time and more as a collection of words/tokens with meaning in their relative ordering.

Now I'm wondering what would happen if a model like this were applied to other kinds of text generation, like chat bots. Maybe we could build actually useful bots if they can have attention on the entire conversation so far plus additional metadata. Think customer service bots with access to customer data that can learn to interpret questions, associate them with the customer's account information through the attention model, and generate useful responses.
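For anyone who wants to see the mechanism the thread is discussing: below is a minimal NumPy sketch of the paper's scaled dot-product attention. The names and shapes are illustrative, not taken from the paper's code; the point is that every token scores every other token directly, with no notion of sequential distance.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d_k). Every token scores every other token,
        # so relating two distant words costs no more than adjacent ones.
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
        weights = softmax(scores, axis=-1)        # each row sums to 1
        return weights @ V                        # weighted mix of the values

    # Toy self-attention: 5 tokens with 8-dimensional representations.
    x = np.random.default_rng(0).normal(size=(5, 8))
    print(attention(x, x, x).shape)   # (5, 8)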
devindotcom over 7 years ago
DeepL (which was on HN earlier this week) also uses an attention-based mechanism like this (or at least, one with the same intention and effect). They didn't really talk about it, but the founder mentioned it to me. The two seem to have pursued the technique independently, perhaps from some shared ancestor, like a paper that inspired them both.
rayuela over 7 years ago
The key to this paper is the "Multi-Head Attention", which looks a lot like a convolutional layer to me.
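To make that comparison concrete, here is a rough, self-contained sketch of multi-head attention. The random projection matrices stand in for learned weights; running several independently projected heads in parallel and concatenating their outputs is what resembles a bank of convolutional filters.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        return softmax(scores) @ V

    def multi_head_attention(x, num_heads=2, d_head=4, seed=0):
        rng = np.random.default_rng(seed)
        heads = []
        for _ in range(num_heads):
            # Each head gets its own projections -- random stand-ins here
            # for what would be learned weight matrices.
            Wq, Wk, Wv = (rng.normal(size=(x.shape[-1], d_head)) for _ in range(3))
            heads.append(attention(x @ Wq, x @ Wk, x @ Wv))
        # Concatenating per-head outputs is the "multi-head" part, much
        # like stacking the feature maps of several conv filters.
        return np.concatenate(heads, axis=-1)   # (seq_len, num_heads * d_head)

    x = np.random.default_rng(1).normal(size=(5, 8))   # 5 tokens, d_model = 8
    print(multi_head_attention(x).shape)               # (5, 8)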
jatsign over 7 years ago
Has anyone come across good ML for Arabic-English translation? There seems to be a complete lack of decent training data.
mykeliu over 7 years ago
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
sandGorgon over 7 years ago
Would something like this work well on mixed/pidgin languages, e.g. Hinglish, which is a mixture of Hindi and English used in the daily vernacular?
bra-ket over 7 years ago
Does it mean we don't need gradient descent after all to achieve the same result?