Given a trained LLM with fixed weights, why does the same prompt yield different responses? Or is it the case that some type of RL takes place?
It's deliberately nondeterministic. The model's output layer turns its raw scores into a probability distribution over possible next tokens using the softmax function, and the next token is then sampled from that distribution rather than always taking the single most likely one, so repeated runs diverge.

https://en.wikipedia.org/wiki/Softmax_function

I'd say mainly in order to avoid boring its users.
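To make that concrete, here's a rough sketch (Python with NumPy; the function names and the toy 4-token vocabulary are my own, not anything from a real LLM stack) of temperature sampling from softmax probabilities, which is where the run-to-run randomness comes from:

    import numpy as np

    def softmax(logits, temperature=1.0):
        # Convert raw logits into a probability distribution.
        # Higher temperature flattens it; lower temperature sharpens it.
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()  # subtract max for numerical stability
        exps = np.exp(scaled)
        return exps / exps.sum()

    def sample_next_token(logits, temperature=1.0, rng=None):
        # Draw a token index at random according to the softmax probabilities.
        # This random draw is what makes repeated runs differ.
        rng = rng or np.random.default_rng()
        probs = softmax(logits, temperature)
        return rng.choice(len(probs), p=probs)

    # Hypothetical logits for a 4-token vocabulary
    logits = [2.0, 1.0, 0.5, -1.0]
    print([sample_next_token(logits, temperature=0.8) for _ in range(5)])
    # e.g. [0, 0, 1, 0, 2] -- varies between runs; greedy decoding
    # (always taking the argmax) would return token 0 every time
    # and be fully deterministic

If you instead decode greedily (or set the API's temperature near zero), the same prompt with fixed weights gives essentially the same answer every time; no RL or weight update is involved.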