
Ask HN: Is a transformer a function from words to words?

2 points by trivialmath almost 2 years ago
For example, the attention mechanism in the phrase: argument one is dog, argument two is cat, the function is concat, the result is dogcat.

So the query is the function, the keys are the arguments to the function, and the values are the result of applying the function to the arguments. Here the result of the function is memorized from the training set rather than computed.
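To make the query/key/value analogy concrete, here is a minimal NumPy sketch of scaled dot-product attention (the variable names, shapes, and toy data are illustrative assumptions, not from the post). Note that the output is a soft, similarity-weighted blend of the values rather than a single hard lookup:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted blend of values, not a hard fetch

# Toy example: one query ("the function") attends over two key/value pairs
# ("the arguments" and their stored results), all randomly initialized here.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4))
K = rng.normal(size=(2, 4))
V = rng.normal(size=(2, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (1, 8)
```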

4 comments

PaulHoule almost 2 years ago
Basically yes. "Memorized" is wrong in the sense that neural networks, when they work well, learn an approximation of a function and often give the right answer for cases they haven't seen before. There is a danger, though, that a network will "overfit": memorize examples and fail to generalize to ones it hasn't seen.

The argument that "chatbots can't create anything new" is completely bogus (and is often tied up in a fetishization of creativity); there is no fundamental reason one can't attempt a literary task like "Write a play in the style of Shakespeare set in Russia's October Revolution". On the other hand, it can't (correctly) make up factual information it hasn't been trained on: ChatGPT on its own resources can't talk about who won the Super Bowl or the Premier League this year because it hasn't seen any documents about it.

Note it doesn't have to be words; the same strategy works amazingly well for images and audio, see

https://en.wikipedia.org/wiki/Vision_transformer
proc0 almost 2 years ago
It's not really a function, because it doesn't take the entire input and generate the whole output. It's more of a stream of inputs and a stream of outputs, where the outputs go back in as inputs. It's basically an app, with layers doing different things: transforming words into tokens, processing them with an attention algorithm, and then feeding them through a relatively simple neural net (compared to architectures before GPT).
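As a rough illustration of the "outputs fed back in as inputs" point, here is a sketch of a greedy autoregressive decoding loop. The `model` callable and the token conventions are hypothetical placeholders, not any particular library's API:

```python
def generate(model, prompt_tokens, max_new_tokens=50, eos_token=0):
    """Greedy decoding: repeatedly predict one token and append it to the input."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)             # one forward pass over the sequence so far
        next_token = int(logits.argmax())  # greedy pick; sampling is also common
        tokens.append(next_token)          # the output becomes part of the next input
        if next_token == eos_token:
            break
    return tokens
```

In that framing, the core network is still a function applied once per step; the streaming behavior comes from this outer loop.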
trivialmath almost 2 years ago
The idea is that an LLM trained on a programming language should use the functions as queries, learn to detect the keys (the arguments to those functions), and the value matrix could correspond to a memorized version of the function computed with those arguments. So the learning sequence in transformers is something like: 1) query = is this a function, and which function? 2) keys = where and what are its arguments? 3) values = embed the memorized result of computing the function with the given arguments.
BWStearns almost 2 years ago
The most intuitive description I've heard is to imagine it like querying a continuous database. That is, it can give you responses from the space _between_ the data that it has memorized/stored/incorporated.

Caveat: I'm not super on top of AI stuff, but that description struck a chord with me.
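A tiny toy sketch of that "continuous database" picture (my own illustration, not from the comment): a similarity-weighted soft lookup over stored key/value pairs can return values in between anything actually stored:

```python
import numpy as np

keys   = np.array([[0.0], [1.0]])       # two "memorized" entries
values = np.array([[10.0], [20.0]])     # their associated values

def soft_lookup(query, temperature=0.1):
    """Blend stored values by similarity to the query instead of picking one row."""
    dists = np.linalg.norm(keys - query, axis=1)
    w = np.exp(-dists / temperature)
    w /= w.sum()
    return w @ values                    # interpolated result

print(soft_lookup(np.array([0.5])))      # ~[15.], a value never stored explicitly
```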