科技回声

7 comments

imbusy111, nearly 6 years ago

If you fit a linear model for the coffee-making problem, one of the parameters is temperature, and the coefficient for temperature in the linear model is positive, does that mean that if you keep increasing the temperature without limit, the probability of making a good cup of coffee keeps increasing as well?

In reality, the temperature needs to fall within a certain range.
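
A minimal sketch of the behaviour being asked about, assuming a one-feature logistic model. The coefficients `BETA_0` and `BETA_TEMP` are hypothetical values chosen only for illustration; they do not come from the post. With a positive coefficient the predicted probability rises monotonically with temperature and saturates below 1, so the model never learns that "too hot" is bad.

```python
import math


def sigmoid(z):
    """Logistic function: maps log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
BETA_0 = -19.0    # intercept
BETA_TEMP = 0.1   # positive coefficient on temperature (degrees F)


def p_good_linear(temp_f):
    """P(good coffee | temperature) when the log odds are linear in temperature."""
    return sigmoid(BETA_0 + BETA_TEMP * temp_f)


temps = [150, 170, 190, 210, 250, 400]
probs = [p_good_linear(t) for t in temps]

for t, p in zip(temps, probs):
    # The probability only ever goes up as the temperature goes up.
    print(f"{t:>4} F -> {p:.6f}")
```

The probability is bounded by 1, so it cannot grow without limit, but it also never comes back down, which is the mismatch with reality that the comment points out.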

debbiedowner, nearly 6 years ago

It would be nice to hear about the optimization method, with convergence guarantees etc. Introducing the model is nice, but you need to show the quality and ease of fit. You could maybe do this earlier, since you rely on the idea of learning the parameters somehow to motivate the model.

You can relate it to NNs for free, since it is a linear layer with a sigmoid activation.

You can stress that it is linear in the sense that your decision boundary is linear.

I don't like how capitalized letters are not random variables but observations.

You can give some examples of what the conditional PDFs P(H=1 | D) look like and what you can model. In your case, if the ideal temperature for coffee is 190F and the coffee is bad at +/- 10 or more, then you would hope that (temp - 190)^2 is an input feature.

Congrats on the book deal!
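
A rough sketch of the (temp - 190)^2 feature suggested above. The 190F ideal comes from the comment; the coefficients `BETA_0` and `BETA_SQ` are hypothetical and picked only so that the probability drops to about one half at +/- 10F. The model is still linear in its parameters; only the input feature is nonlinear in temperature.

```python
import math


def sigmoid(z):
    """Logistic function: maps log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


IDEAL_TEMP_F = 190.0  # ideal temperature mentioned in the comment
BETA_0 = 2.0          # hypothetical intercept
BETA_SQ = -0.02       # hypothetical coefficient; negative so distance from 190F hurts


def p_good_quadratic(temp_f):
    """P(good coffee | temperature) using (temp - 190)^2 as the only feature."""
    feature = (temp_f - IDEAL_TEMP_F) ** 2
    return sigmoid(BETA_0 + BETA_SQ * feature)


for t in [150, 170, 180, 190, 200, 210, 230]:
    # The probability peaks at 190F and falls off symmetrically on both sides.
    print(f"{t:>4} F -> {p_good_quadratic(t):.4f}")
```

In the original temperature variable the "good" region is now an interval around 190F, even though the decision boundary is still linear in the feature space.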

jules, nearly 6 years ago

Very nice! How about this, for more than 2 classes:

Let p_k be the probability of being in class k. We assume log p_k = f_k(x) + C(x), where x is the feature vector and C(x) is the normalisation that makes the probabilities sum to 1.

Equivalently, p_k is proportional to exp(f_k(x)), so p_k = exp(f_k(x)) / sum_j exp(f_j(x)).

Because of the normalisation we may assume without loss of generality that f_0(x) = 0. Then if we have 2 classes and f_1(x) is linear, we get logistic regression.
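
The construction above can be checked numerically with a small sketch. The function names and the example scores are assumptions made for illustration. It shows that adding the same constant to every f_k leaves the probabilities unchanged (which is why f_0 = 0 loses no generality), and that with two classes and f_0 = 0 the probability of class 1 equals the logistic function of f_1.

```python
import math


def sigmoid(z):
    """Logistic function: maps log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """p_k = exp(f_k) / sum_j exp(f_j), computed stably by subtracting the max."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three classes with arbitrary example scores f_0, f_1, f_2.
three_class = softmax([0.0, 1.5, -0.3])

# Shifting every score by the same constant does not change the result,
# so one score can be fixed to 0 without loss of generality.
shifted = softmax([10.0, 11.5, 9.7])

# Two classes with f_0 = 0: the class-1 probability is sigmoid(f_1).
f_1 = 0.7
two_class_p1 = softmax([0.0, f_1])[1]

print(three_class, shifted)
print(two_class_p1, sigmoid(f_1))
```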

doomrobo, nearly 6 years ago

This was a really neat exposition! I have a few questions:

1. Is D a binary random variable? If so, what exactly does it mean to say beta*D + beta_0 is an approximation for the log odds? Doesn't this formula only take on 2 possible values?

2. Could you provide intuition for why a linear function of D would be a good approximation for the log odds mentioned?
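
One standard case where the log odds are exactly linear in D, offered as a sketch rather than as the post's own argument. The first step is Bayes' theorem applied to both classes, where the P(D) terms cancel. The second step assumes, purely for illustration, that D is a continuous scalar whose class-conditional densities are Gaussian with means \mu_0, \mu_1 and a shared variance \sigma^2; these symbols are not in the original post.

```latex
\begin{align*}
\log\frac{P(H=1 \mid D)}{P(H=0 \mid D)}
  &= \log\frac{P(D \mid H=1)\,P(H=1)}{P(D \mid H=0)\,P(H=0)} \\
  &= \log\frac{P(D \mid H=1)}{P(D \mid H=0)} + \log\frac{P(H=1)}{P(H=0)}
\end{align*}

% Assumption (illustrative only): D | H=k ~ N(\mu_k, \sigma^2), same \sigma^2 for both classes.
\begin{align*}
\log\frac{P(D \mid H=1)}{P(D \mid H=0)}
  &= -\frac{(D-\mu_1)^2}{2\sigma^2} + \frac{(D-\mu_0)^2}{2\sigma^2} \\
  &= \frac{\mu_1-\mu_0}{\sigma^2}\,D - \frac{\mu_1^2-\mu_0^2}{2\sigma^2}
\end{align*}

% Hence the log odds equal \beta D + \beta_0 with
\begin{align*}
\beta = \frac{\mu_1-\mu_0}{\sigma^2},
\qquad
\beta_0 = -\frac{\mu_1^2-\mu_0^2}{2\sigma^2} + \log\frac{P(H=1)}{P(H=0)}
\end{align*}
```

Under that assumption the quadratic terms in D cancel, so the linear form is exact; with unequal variances a D^2 term survives and a linear function is only an approximation.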

blackbear_, nearly 6 years ago

NB: this post uses D for the input x and H for the output y. This confused me quite a bit since usually in ML we use D for the data (pairs of x and y) and H for the model (in most cases the parameters, the betas in this example).

PopularBoard, nearly 6 years ago

I'm a little confused: how technical is this approach? I can't understand the meaning of P(D), for example. Does it make sense in strict mathematics?

s_Hogg, nearly 6 years ago

I realise this is pedantry but it's definitely "Bayes' theorem" not "Baye's theorem" dammit.

Sorry about that.

Logistic Regression from Bayes’ Theorem
