Not sure if useful - but if you know the initial biases of the outputs, you can recalibrate them yourself, provided you have output probabilities for all tokens (or at least all the non-negligible ones).

Say the model outputs n tokens, the prior (bias) baked into the model is m = (m_1, m_2, ..., m_n), and the new prior you want is q = (q_1, q_2, ..., q_n).

Then if the model outputs prediction p = (p_1, ..., p_n) over those tokens, the recalibrated output you are looking for is

bias_shift(p) = softmax(logit(p) + log(q) - log(m))

where logit(p) means the (pre-softmax) logits of p, i.e. log(p) up to a constant shift that softmax ignores. You can prove this with Bayes' rule plus a bit of algebra: the posterior is proportional to likelihood times prior, so dividing out the old prior and multiplying in the new one gives p'_i proportional to p_i * q_i / m_i. Most ML people don't seem to know this trick, but it's super useful for domain adaptation / class upsampling, where the class balance in your training set is different from the one you want to predict on.
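
A minimal numpy sketch of the correction, assuming you have the full probability vector (the function name and the toy class balances are just for illustration):

    import numpy as np

    def bias_shift(p, old_prior, new_prior, eps=1e-12):
        # Bayes rule: p_i is proportional to likelihood * old_prior_i, so divide
        # out the old prior and multiply in the new one, then renormalise.
        logits = np.log(p + eps) + np.log(new_prior + eps) - np.log(old_prior + eps)
        logits -= logits.max()          # softmax is shift-invariant; this just avoids overflow
        q = np.exp(logits)
        return q / q.sum()

    # toy example: model trained on a 90/10 class balance, deployed where it's 50/50
    p = np.array([0.7, 0.3])            # model's predicted probabilities
    m = np.array([0.9, 0.1])            # prior baked into the training data
    n = np.array([0.5, 0.5])            # prior you actually want
    print(bias_shift(p, m, n))          # ~[0.21, 0.79]: the underrepresented class gets boosted

Same idea works for logits straight out of a network: add log(new_prior) - log(old_prior) to them before the softmax.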