科技回声 (Tech Echo)

A tech news platform built with Next.js, offering global tech news and discussion.


© 2025 科技回声 (Tech Echo). All rights reserved.

Transformer: A Novel Neural Network Architecture for Language Understanding

280 points | by andrew3726 | over 7 years ago

7 comments

emeijer, over 7 years ago
Very interesting approach, and intuitively it makes sense to treat language less as a sequence of words over time and more as a collection of words/tokens with meaning in their relative ordering.

Now I'm wondering what would happen if a model like this were applied to different kinds of text generation, like chat bots. Maybe we could build actually useful bots if they can have attention on the entire conversation so far plus additional metadata. Think customer service bots with access to customer data that can learn to interpret questions, associate them with the customer's account information through the attention model, and generate useful responses.
devindotcom, over 7 years ago
DeepL (which was on HN earlier this week) also uses an attention-based mechanism like this (or at least one with the same intention and effect). They didn't really talk about it, but the founder mentioned it to me. The two seem to have pursued the technique independently, perhaps from some shared ancestor, like a paper that inspired them both.
rayuela, over 7 years ago
The key to this paper is the "Multi-Head Attention", which looks a lot like a convolutional layer to me.
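For readers weighing that comparison, here is a minimal NumPy sketch of the multi-head attention mechanism the comment refers to (an editor's illustration of the idea in the paper, not code from it; the shapes, weight names, and sizes are chosen for the example):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention computed in n_heads parallel heads."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv  # each (seq, d_model)

    # Split each projection into heads: (n_heads, seq, d_head).
    def split(M):
        return M.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    # Attention weights per head: (n_heads, seq, seq).
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ Vh  # (n_heads, seq, d_head)
    # Concatenate the heads and project back to d_model.
    concat = out.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo

# Tiny illustrative example (random weights).
rng = np.random.default_rng(0)
seq, d_model, heads = 4, 8, 2
X = rng.normal(size=(seq, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
Y = multi_head_attention(X, *W, n_heads=heads)
print(Y.shape)  # (4, 8)
```

Like a convolution, each head applies the same learned projection at every position; unlike a convolution, the mixing pattern (the softmax weights) is computed from the content rather than being a fixed local window.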
jatsign, over 7 years ago
Has anyone come across good ML for Arabic–English? There seems to be a complete lack of decent training data.
mykeliu, over 7 years ago
I'm a novice when it comes to neural network models, but would I be correct in interpreting this as a convolutional network architecture with multiple stacked encoders and decoders?
sandGorgon, over 7 years ago
Would something like this work well on mixed/pidgin languages? For example Hinglish, which is a mixture of Hindi and English used in daily vernacular.
bra-ket, over 7 years ago
Does it mean we don't need gradient descent after all to achieve the same result?