Very nice! How about this, for more than 2 classes:<p>Let p_k be the probability of being in class k. We assume log p_k = f_k(x) + C(x), where x is the feature vector and C(x) is a normalisation term, the same for every class, that makes the probabilities sum to 1.<p>Equivalently, p_k is proportional to exp(f_k(x)), so p_k = exp(f_k(x)) / sum_j exp(f_j(x)).<p>Because of the normalisation, adding the same function of x to every f_k leaves the probabilities unchanged, so we may assume without loss of generality that f_0(x) = 0. Then if we have 2 classes (0 and 1) and f_1(x) is linear, p_1 = 1 / (1 + exp(-f_1(x))) and we get logistic regression.
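<p>A quick numerical check of that last step, as a rough sketch rather than anything definitive. The names softmax and sigmoid and the values of w, b and x are made up for illustration; the only thing taken from the argument above is p_k = exp(f_k(x)) / sum_j exp(f_j(x)) with f_0(x) = 0.

```python
import math

def softmax(scores):
    """Return p_k = exp(f_k) / sum_j exp(f_j) for a list of scores f_k(x)."""
    # Subtracting the max shifts every f_k by the same amount, which changes
    # C(x) but not the probabilities, and avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear score f_1(x) = w . x + b; f_0(x) is pinned to 0.
w, b = [0.5, -1.25], 0.1
x = [2.0, 1.0]
f1 = sum(wi * xi for wi, xi in zip(w, x)) + b

p = softmax([0.0, f1])          # two classes: scores f_0 = 0 and f_1
shifted = softmax([3.0, f1 + 3.0])  # same scores plus a common offset

print(p[1], sigmoid(f1))        # the two values should agree
```

With two classes, p[1] from the softmax matches the logistic function applied to f_1(x), and adding a common offset to both scores leaves the probabilities the same, which is why pinning f_0(x) = 0 loses nothing.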