Ask HN: GPT-4 has 1.7T parameters. What's a parameter?

13 points by herodoturtle over 1 year ago
Hi folks, AI newbie here.

Can those of you that are well versed in AI please help me understand what exactly constitutes a parameter?

I often see LLMs being compared / ranked based on their number of parameters, so I'm hoping to better understand this metric.

I have a rudimentary understanding of artificial neural networks (in terms of inputs -> functions -> outputs), and by extension, a very basic understanding of deep learning.

But when it comes to defining a parameter, my google searches thus far have led me to concepts such as 'AI model behaviour' and 'adjustable settings' which, whilst interesting, are still too complex for me to distil into simple terms.

If I were to explain what a parameter is to my child, what might I say?

Thanks!

5 comments

ash-ishh over 1 year ago
Check out this tweet by Yann LeCun: https://twitter.com/ylecun/status/1706545305762582580

"Parameters are coefficients inside the model that are adjusted by the training procedure. The dataset is what you train the model on. Language models are trained with tokens that are subword units (e.g. prefix, root, suffix)."

His comment on GPT-4's parameter count:

"Also: a model with more parameters is not necessarily better. It's generally more expensive to run and requires more RAM than a single GPU card can have. GPT-4 is rumored to be a "mixture of experts", i.e. a neural net consisting of multiple specialized modules, only one of which is run on any particular prompt. So the effective number of parameters used at any one time is smaller than the total number."
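To make the "effective number of parameters" point concrete, here is a rough sketch of the arithmetic. The function name and every number in it are hypothetical, chosen only for illustration; nothing here reflects GPT-4's actual (unpublished) architecture.

```python
def moe_param_counts(shared, per_expert, n_experts, experts_per_input=1):
    """Return (total, active) parameter counts for a toy mixture-of-experts model.

    shared            -- parameters every input passes through (hypothetical)
    per_expert        -- parameters inside one specialized module (hypothetical)
    n_experts         -- how many specialized modules exist
    experts_per_input -- how many modules actually run for one input
    """
    total = shared + n_experts * per_expert
    active = shared + experts_per_input * per_expert
    return total, active


# Made-up sizes in arbitrary units, not real model sizes.
total, active = moe_param_counts(shared=10, per_expert=100, n_experts=8)
print(total, active)  # 810 110
```

All 810 units have to be stored, but only 110 of them are used for any single input, which is why the total count overstates the work done per prompt.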
ps256 over 1 year ago
f(x) = ax^2 + bx + c

a, b and c are parameters. So this is a model with 3 parameters. Keep adding parameters and chaining various operations on large inputs, f(g(h(...))), until you reach 1.7T.
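A minimal sketch of that three-parameter model, including what "adjusted by training" means in practice. The target values (2, -3, 1), the learning rate, and the step count are assumptions picked for the demo, and plain gradient descent on squared error stands in for whatever a real training setup would use. The parameters start at zero rather than random values so the run is reproducible.

```python
def f(x, params):
    """The model: a quadratic with three parameters a, b, c."""
    a, b, c = params
    return a * x**2 + b * x + c


def train(xs, ys, steps=5000, lr=0.1):
    """Nudge a, b, c downhill on the mean squared error, one step at a time."""
    params = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            err = f(x, params) - y
            grads[0] += 2 * err * x**2 / n  # d(loss)/da
            grads[1] += 2 * err * x / n     # d(loss)/db
            grads[2] += 2 * err / n         # d(loss)/dc
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Training data generated from a "hidden" quadratic: 2x^2 - 3x + 1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2 * x**2 - 3 * x + 1 for x in xs]

params = train(xs, ys)
print(len(params), [round(p, 3) for p in params])  # 3 parameters, close to 2, -3, 1
```

The code never gets told the values 2, -3 and 1; it recovers them from the examples. A large language model does the same kind of thing with vastly more numbers.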
IceHegel over 1 year ago
A parameter refers to any trained value in the model. If you initialize it to a random number at the start of training, it's a parameter.

In the context of transformer language models, that includes the weights and biases in the feed-forward layers, as well as the input embeddings, positional encodings, and the transformer's query, key, and value matrices.
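Adding those pieces up for a small model gives a feel for where the count comes from. This is a sketch under assumptions: the helper name is hypothetical, and the layout (learned positional table, biases everywhere, an attention output projection, two layer norms per block plus a final one, output layer sharing the input embeddings) is a GPT-2-style guess, not something stated in this thread. The sizes below are in the range of the small GPT-2 configuration.

```python
def transformer_param_count(vocab, ctx, d_model, n_layers, d_ff):
    """Count trained values in a GPT-2-style decoder (assumed layout)."""
    embeddings = vocab * d_model   # one vector per token in the vocabulary
    positions = ctx * d_model      # one learned vector per position

    # Query, key, value and output projections: weight matrix plus bias each.
    attention = 4 * (d_model * d_model + d_model)

    # Feed-forward: expand to d_ff, then project back to d_model, with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two layer norms per block, each with a scale and a shift vector.
    norms = 2 * (2 * d_model)

    per_layer = attention + feed_forward + norms
    final_norm = 2 * d_model
    return embeddings + positions + n_layers * per_layer + final_norm


count = transformer_param_count(
    vocab=50257, ctx=1024, d_model=768, n_layers=12, d_ff=3072
)
print(f"{count:,}")  # 124,439,808
```

Roughly two thirds of that total sits in the repeated blocks, so stacking more and wider layers is what pushes counts into the billions.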
sbierwagen over 1 year ago
Number of connections between neurons, basically.

> GPT-4 has 1.7T parameters

That's a rumored parameter count. OpenAI has not released technical details of any of their newer models.
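For a plain fully connected network, the "connections" picture can be counted directly. A small sketch, assuming every neuron in one layer connects to every neuron in the next and each non-input neuron also has one bias (the function name and layer sizes are made up for illustration):

```python
def mlp_param_count(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the number of neurons per layer, input first.
    """
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input.
    biases = sum(layer_sizes[1:])
    return weights, biases


weights, biases = mlp_param_count([3, 4, 2])
print(weights, biases, weights + biases)  # 20 6 26
```

So the connections account for 20 of the 26 parameters here; the biases are the "basically" part.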
gardenhedge over 1 year ago
Think of it like 1.7T different settings. For the model you're using, they're set in a specific way.