Fun post! Back during the holidays we wrote one where we abused temperature AND structured output to approximate a random selection: https://bits.logic.inc/p/all-i-want-for-christmas-is-a-random
Wouldn’t any randomness (for a fixed combination of hardware and weights) be a result of the temperature and any randomness inserted at inference time?

Otherwise, a heads/tails comparison is just a proxy for the underlying token probabilities and the temperature configuration (plus hardware differences for a remote-hosted model).
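To make that concrete, here is a minimal sketch of how temperature reshapes token probabilities before sampling. The logits are made-up values for a hypothetical two-token choice, not anything a real model produced:

    import numpy as np

    # Made-up next-token logits for ["heads", "tails"]; real models
    # produce logits over the whole vocabulary.
    logits = np.array([2.0, 1.2])

    def token_probs(logits, temperature):
        # Temperature divides the logits before the softmax: values
        # below 1 sharpen the distribution, values above 1 flatten it.
        scaled = logits / temperature
        exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
        return exp / exp.sum()

    for t in (0.1, 1.0, 2.0):
        print(t, token_probs(logits, t))

At low temperature the model all but always emits the higher-logit token; as temperature rises the split approaches 50/50. Which is exactly why any "randomness" measured this way is really a measurement of the logits plus the temperature setting.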
One thing to consider: we don’t know if these LLMs are wrapped with server-side logic that injects randomness (e.g. using actual code or external RNG). The outputs might not come purely from the model's token probabilities, but from some opaque post-processing layer. That’s a major blind spot in this kind of testing.
Is randomness even possible?
You can't technically prove randomness, only observe it; the best you can say is that the output is likely close to random. They talk a little about this at https://www.random.org/#learn
Author here. I know 0-10 contains one extra even number (six evens vs. five odds). I also just did this for fun, so don't take the statistical-significance aspect of it very seriously. To do this more rigorously you'd also need to run it many times across multiple temperature and top_p values.
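For anyone who does want the rigor: a quick sketch of the test I'd reach for, SciPy's exact binomial test. The counts here are placeholders, not the post's actual data:

    from scipy.stats import binomtest

    # Hypothetical tally: 61 heads out of 100 flips.
    result = binomtest(k=61, n=100, p=0.5)
    print(result.pvalue)  # chance of a deviation at least this large from a fair coin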
Oh, surprising that Claude can do heads/tails.

In a project last year, I combined LLMs with a list of random numbers from a quantum computer. Random numbers are the only useful thing quantum computers can produce, and they are one thing LLMs are terrible at.
During my tenure at NVIDIA I met a guy who was working on special versions of the kernels that would make them deterministic.

Otherwise, parallel floating-point computations like these are not going to be perfectly deterministic, due to a combination of two factors. First, the order of some operations will vary due to all sorts of environmental conditions such as temperature variations. Second, floating-point operations like addition are not associative, which surprises people unfamiliar with how they work.

That is before we even talk about the temperature setting on LLMs.
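The non-associativity is easy to see in any language with IEEE 754 doubles, for instance:

    # Floating-point addition is not associative: grouping changes the result.
    a, b, c = 1e20, -1e20, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0: the 1.0 is lost to rounding when added to -1e20

So if a parallel reduction sums its partial results in a different order between runs, the final bits can differ.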
What I find more important is the ability to get reproducible results for testing.

I don't know about other LLMs, but Cohere allows setting a seed value. With the same seed it will always give you the same result for a given prompt (unless, of course, the LLM gets an update).

OTOH I would guess that they normally just generate a random seed server-side when processing a prompt, and how random that really is depends on their random number generator.
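As a rough sketch of what that looks like, assuming the current Cohere Python SDK (the exact parameter names and response shape may differ by version):

    import cohere

    co = cohere.ClientV2(api_key="YOUR_API_KEY")  # placeholder key

    def flip(seed):
        resp = co.chat(
            model="command-r",  # assumed model name
            messages=[{"role": "user", "content": "Flip a coin: heads or tails?"}],
            seed=seed,  # same seed should reproduce the same sampling path
        )
        return resp.message.content[0].text

    print(flip(42))
    print(flip(42))  # expected to match the first call, barring a model update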
I would suggest they repeat the experiment including answer sets for both "choose heads or tails" AND "choose tails or heads", and likewise for the numbers. Better yet, rephrase the question so it doesn't present a "choice" at all: ask for "a random integer" rather than "choose from 0 to 9". (Incidentally, they're asking to choose from 0 to 10 inclusive, which is inherently flawed, as the even subset is bigger in that range.)
Is the LLM reset between each event?

If LLMs are anything like people, I would expect a different result depending on that. The idea that random events are independent is very unintuitive to us, resulting in what we call the Gambler's Fallacy. LLMs' attempts at randomness are very likely to be just as biased, if not more.
They should measure at different temperatures: at 0 the output will be the same every time, but it would be interesting to see how the results change for temperatures from 0.01 to 2. That said, I'm not sure temperature is implemented the same way in all LLMs.
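A sweep like that is only a few lines against any chat API. A hedged sketch using the OpenAI Python client, where the model name and prompt are just placeholders:

    from collections import Counter
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def tally(temperature, trials=50):
        counts = Counter()
        for _ in range(trials):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{"role": "user",
                           "content": "Pick heads or tails. Answer with one word."}],
                temperature=temperature,
            )
            counts[resp.choices[0].message.content.strip().lower()] += 1
        return counts

    for t in (0.01, 0.5, 1.0, 1.5, 2.0):
        print(t, tally(t))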
I'd be interested to see the bias in random character generation. That's something closer to the home domain of LLMs, seeing that they're 'next word generators' (based on probability).

How cryptographically secure would an LLM-based RNG seed generator be?
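Measuring that bias is straightforward once you have a transcript of single-character answers. A sketch using a Pearson chi-square statistic against a uniform distribution (the sample string is placeholder data, not real model output, and a real run would want many samples per letter for the approximation to hold):

    from collections import Counter

    # Placeholder transcript of answers to "pick a random lowercase letter".
    samples = list("qzjrskmmnbvtyqpzxjkwq")

    counts = Counter(samples)
    n = len(samples)
    expected = n / 26  # uniform expectation over a-z

    # Pearson chi-square against uniform; compare with 25 degrees of freedom.
    chi2 = sum((counts.get(c, 0) - expected) ** 2 / expected
               for c in "abcdefghijklmnopqrstuvwxyz")
    print(chi2)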
LLMs are acting like humans, and I believe humans have biases too if you ask them to do random things :)

On a more serious note, you could always raise the temperature so they behave more randomly.
This is silly. Behind an LLM sits a deterministic algorithm. So no, it is not possible without inserting randomness into the algorithm by other means, for example by sampling with a nonzero temperature.

Why are all these posts and news about LLMs so uninformed? This is human-built technology. You can actually read up on how these things work. And yet they are treated as if they were an alien species that must be examined by sociological means and methods where it is not necessary. Grinds my gears every time :D