Hi folks, AI newbie here.<p>Can those of you who are well versed in AI please help me understand what exactly constitutes a parameter?<p>I often see LLMs being compared / ranked based on their number of parameters, so I'm hoping to better understand this metric.<p>I have a rudimentary understanding of artificial neural networks (in terms of inputs -> functions -> outputs), and by extension, a very basic understanding of deep learning.<p>But when it comes to defining a parameter, my Google searches thus far have led me to concepts such as 'AI model behaviour' and 'adjustable settings' which, whilst interesting, are still too complex for me to distil into simple terms.<p>If I were to explain what a parameter is to my child, what might I say?<p>Thanks!
Check out this tweet <a href="https://twitter.com/ylecun/status/1706545305762582580" rel="nofollow noreferrer">https://twitter.com/ylecun/status/1706545305762582580</a> by Yann LeCun:<p>"Parameters are coefficients inside the model that are adjusted by the training procedure. The dataset is what you train the model on. Language models are trained with tokens that are subword units (e.g. prefix, root, suffix)."<p>His comment on GPT-4's parameter count:<p>"Also: a model with more parameters is not necessarily better. It's generally more expensive to run and requires more RAM than a single GPU card can have.
GPT-4 is rumored to be a "mixture of experts", i.e. a neural net consisting of multiple specialized modules, only one of which is run on any particular prompt. So the effective number of parameters used at any one time is smaller than the total number."
f(x) = ax^2 + bx + c<p>a, b and c are parameters, so this is a model with 3 parameters. Keep adding parameters and chaining operations like f(g(h(...))) on large inputs until you reach 1.7T.
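To make that concrete, here is a minimal sketch (my own illustration, not anything specific to an LLM) of fitting those three parameters a, b, c to data with gradient descent, which is the same kind of procedure that tunes an LLM's billions of parameters:

```python
# Fit f(x) = a*x^2 + b*x + c to data by gradient descent.
# The three entries of `params` are the model's parameters:
# values that start arbitrary and are adjusted by training.

def predict(params, x):
    a, b, c = params
    return a * x**2 + b * x + c

def train(data, lr=0.01, steps=5000):
    params = [0.0, 0.0, 0.0]  # arbitrary start; training adjusts these
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for x, y in data:
            err = predict(params, x) - y  # prediction error
            # partial derivatives of the squared error w.r.t. a, b, c
            grads[0] += 2 * err * x**2
            grads[1] += 2 * err * x
            grads[2] += 2 * err
        # nudge each parameter against its gradient
        params = [p - lr * g / len(data) for p, g in zip(params, grads)]
    return params

# Data generated from a=1, b=-2, c=3; training should recover
# approximately those values.
data = [(x, x**2 - 2 * x + 3) for x in [-2, -1, 0, 1, 2]]
print(train(data))
```

An LLM works the same way in spirit, just with ~10^9–10^12 such adjustable numbers instead of 3.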
A parameter refers to any trained value in the model. If you initialize it to a random number at the start of training, it's a parameter.<p>In the context of transformer language models, that includes the weights and biases in the feed-forward layers, as well as the input embeddings, positional encodings, and the transformer's query, key, and value matrices.
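You can see how those trained values add up to billions with some back-of-the-envelope arithmetic. This is an illustrative sketch with made-up but plausible dimensions, not the exact architecture of any real model:

```python
# Rough parameter count for a hypothetical transformer language model.
d_model = 4096       # hidden size
d_ff = 4 * d_model   # feed-forward inner size
n_layers = 32
vocab = 50000        # vocabulary size

# attention: query, key, value, and output projection matrices,
# each d_model x d_model
attn = 4 * d_model * d_model

# feed-forward: two weight matrices plus their bias vectors
ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model

per_layer = attn + ffn
embeddings = vocab * d_model  # input token embeddings

total = n_layers * per_layer + embeddings
print(f"{total:,} parameters")  # every one of these values is trained
```

With these dimensions the total lands around 6.6 billion; scale the widths and layer count up and you quickly reach hundreds of billions.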
Number of connections between neurons, basically.<p>>GPT-4 has 1.7T parameters<p>That's a rumored parameter count. OpenAI has not released technical details of any of their newer models.