Some important context missing from this post (IMO) is that the data set presented is probably not a very good fit for linear regression, or really most classical models: you can see that there's far more variance at one end of the data set than at the other (the spread isn't constant, i.e. heteroscedasticity). So even if we find the best model for the data, one that looks great in the gradient-descent-style visualization, it might not have much predictive power. One common trick for data sets like this is to map the data into another space where the spread is more even and then build a model in <i>that</i> space. You can then make predictions for the original data set by applying the inverse mapping to the model's outputs.
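To make that transform trick concrete, here's a rough NumPy sketch (entirely my own toy example; the log-log mapping is just one possible choice, and the right transform depends on the data):

    import numpy as np

    # Toy data where the spread grows with x, roughly the situation described above.
    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 200)
    y = 2.0 * x + rng.normal(scale=0.3 * x)        # noise scale grows with x

    # Map to log-log space, where the spread is more even (this assumes x, y > 0),
    # and fit an ordinary least-squares line there.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)

    # Predictions for the original data come from the inverse mapping (exp).
    y_pred = np.exp(intercept) * x ** slope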
One interesting property of least squares regression is that the predictions are the conditional expectation (mean) of the target variable given the right-hand-side variables. So in the OP example, we're predicting the average price of houses of a given size.<p>The notion of predicting the mean can be extended to other properties of the conditional distribution of the target variable, such as the median or other quantiles [0]. This comes with interesting implications, such as the well-known properties of the median being more robust to outliers than the mean. In fact, the absolute loss function mentioned in the article can be shown to give a conditional median prediction (using the mid-point in case of non-uniqueness). So in the OP example, if the data set is known to contain outliers like properties that have extremely high or low value due to idiosyncratic reasons (e.g. former celebrity homes or contaminated land) then the absolute loss could be a wiser choice than least squares (of course, there are other ways to deal with this as well).<p>Worth mentioning here I think because the OP seems to be holding a particular grudge against the absolute loss function. It's not perfect, but it has its virtues and some advantages over least squares. It's a trade-off, like so many things.<p>[0] <a href="https://en.wikipedia.org/wiki/Quantile_regression" rel="nofollow">https://en.wikipedia.org/wiki/Quantile_regression</a>
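A tiny numerical illustration of that mean-vs-median point (toy numbers I made up, not from the article): for an intercept-only model, the squared-loss minimizer sits at the mean and the absolute-loss minimizer at the median, so a single outlier drags the former far more than the latter.

    import numpy as np

    # Hypothetical house prices (in $1000s) with one extreme outlier.
    prices = np.array([250.0, 255.0, 258.0, 260.0, 265.0, 270.0, 2500.0])

    # Evaluate both losses over a grid of constant predictions.
    candidates = np.linspace(prices.min(), prices.max(), 100001)
    sq_loss = ((prices[:, None] - candidates) ** 2).sum(axis=0)
    abs_loss = np.abs(prices[:, None] - candidates).sum(axis=0)

    print(candidates[sq_loss.argmin()], prices.mean())       # both ~580, pulled up by the outlier
    print(candidates[abs_loss.argmin()], np.median(prices))  # both ~260, barely affected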
This is very light and approachable, but it stops short of building the statistical intuition you want here. The author fixates on the smoothness of squared errors without connecting it to the Gaussian noise model or establishing how that relates to predictive power on the kinds of data you see in practice.
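For readers who want that missing step spelled out, here is the standard connection (my sketch, not something the article derives): with i.i.d. Gaussian noise, maximizing the likelihood is the same as minimizing the sum of squared errors, and the loss is smooth because the Gaussian log-density is quadratic.

    y_i = f(x_i; \theta) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

    -\log L(\theta) = \frac{1}{2\sigma^2} \sum_i \bigl(y_i - f(x_i; \theta)\bigr)^2 + \frac{n}{2} \log(2\pi\sigma^2)

so \arg\max_\theta L(\theta) = \arg\min_\theta \sum_i (y_i - f(x_i; \theta))^2.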
I really recommend this explorable explanation: <a href="https://setosa.io/ev/ordinary-least-squares-regression/" rel="nofollow">https://setosa.io/ev/ordinary-least-squares-regression/</a><p>And for actual gradient descent code, here is an older example of mine in PyTorch: <a href="https://github.com/stared/thinking-in-tensors-writing-in-pytorch/blob/master/3%20Linear%20regression.ipynb">https://github.com/stared/thinking-in-tensors-writing-in-pyt...</a>
The main practical reason squared error is minimized in ordinary linear regression is that it has an analytical solution, which makes it a somewhat odd example for gradient descent.<p>There are plenty of error formulations that give a smooth loss function, and many even a convex one, but most don't have analytical solutions, so they are solved via numerical optimization like GD.<p>The main message is IMHO correct though: squared error (and its implicit Gaussian noise assumption) is all too often used purely out of convenience and tradition.
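To make the comparison concrete, here's a NumPy sketch on made-up data (nothing from the article): the normal equations give the exact least-squares fit directly, while gradient descent needs many iterations to reach essentially the same answer.

    import numpy as np

    # Toy data: y roughly linear in x.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 1.0 + rng.normal(scale=1.0, size=100)
    X = np.column_stack([x, np.ones_like(x)])   # design matrix with an intercept column

    # Analytical solution via the normal equations: (X^T X) theta = X^T y.
    theta_exact = np.linalg.solve(X.T @ X, X.T @ y)

    # The same fit via gradient descent on the mean squared error.
    theta = np.zeros(2)
    lr = 0.001
    for _ in range(20000):
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad

    print(theta_exact, theta)   # both close to [3, 1]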
I built a small static web app [0] (with Svelte and TensorFlow.js) that shows gradient descent. It has two kinds of problems: wave (the default) and linear. In the first case, the algorithm learns y = cos(ax + b); in the second, y = ax + b (there's a rough sketch of the wave case after the links below).
The training data is generated from these functions with some noise.<p>I spent some time making it work with interpolation so that the transitions are smooth.<p>Then I expanded it into another version that includes a small neural network (NN) [1].<p>And finally, for the two functions that have a 2D parameter space, I included a visualization of the loss [2]. You can click on the 2D space to get a new initial point for the descent and see the trajectory.<p>I never really finished it, though I did write a blog post about it [3].<p>[0] <a href="https://gradfront.pages.dev/" rel="nofollow">https://gradfront.pages.dev/</a><p>[1] <a href="https://f36dfeb7.gradfront.pages.dev/" rel="nofollow">https://f36dfeb7.gradfront.pages.dev/</a><p>[2] <a href="https://deploy-preview-1--gradient-descent.netlify.app/" rel="nofollow">https://deploy-preview-1--gradient-descent.netlify.app/</a><p>[3] <a href="https://blog.horaceg.xyz/posts/need-for-speed/" rel="nofollow">https://blog.horaceg.xyz/posts/need-for-speed/</a>
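Not the app's actual code (that uses TensorFlow.js), but here is a plain-NumPy sketch of the wave case, fitting a and b in y = cos(ax + b) with gradient descent:

    import numpy as np

    # Noisy training data from the "wave" function.
    rng = np.random.default_rng(0)
    a_true, b_true = 2.0, 0.5
    x = rng.uniform(-3, 3, size=200)
    y = np.cos(a_true * x + b_true) + rng.normal(scale=0.1, size=200)

    # Gradient descent on the mean squared error. The loss is non-convex in (a, b),
    # so the starting point matters; that's what clicking around the 2D loss viz shows.
    a, b = 1.9, 0.4
    lr = 0.1
    for _ in range(2000):
        r = np.cos(a * x + b) - y
        grad_a = np.mean(2 * r * -np.sin(a * x + b) * x)
        grad_b = np.mean(2 * r * -np.sin(a * x + b))
        a, b = a - lr * grad_a, b - lr * grad_b

    print(a, b)   # lands near (2.0, 0.5) from this close-enough start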
> When using least squares, a zero derivative always marks a minimum. But that's not true in general ... To tell the difference between a minimum and a maximum, you'd need to look at the second derivative.<p>It's interesting to continue the analysis into higher dimensions, where there are stationary points whose classification requires looking at the matrix properties of a specific kind of second-order derivative, the Hessian: <a href="https://en.wikipedia.org/wiki/Saddle_point" rel="nofollow">https://en.wikipedia.org/wiki/Saddle_point</a><p>In general, it's super powerful to convert data problems like linear regression into geometric considerations.
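A tiny illustration of that test (my own example, not from the article): f(x, y) = x^2 - y^2 has zero gradient at the origin, but its Hessian has eigenvalues of both signs, so the origin is a saddle rather than a minimum. For least squares the Hessian is 2 X^T X, which is positive semidefinite, so a zero gradient really does mark a minimum.

    import numpy as np

    # Hessian of f(x, y) = x**2 - y**2 at the origin (where the gradient is zero).
    hessian = np.array([[2.0, 0.0],
                        [0.0, -2.0]])
    print(np.linalg.eigvalsh(hessian))   # [-2.  2.]: mixed signs -> saddle point

    # Compare with a least-squares Hessian, 2 * X.T @ X, for some design matrix X.
    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    print(np.linalg.eigvalsh(2 * X.T @ X))   # all eigenvalues >= 0 -> stationary point is a minimum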
I don't have anything useful to say, but how the hell is that a "12 min read"?<p>I always find that those counters greatly overestimate reading speed, but for a technical article like this it's outright insulting, to be honest.
Nice, thanks for sharing! I shared this with my HS calculus teacher :) (My model is that his students should be motivated to get machine learning engineering jobs, so they should be motivated to learn calculus, but who knows.)
In the same vein, Karpathy's video series "Neural Networks: Zero to Hero" [0] touches on a lot of this and the underlying intuitions as well. It's one of the best introductory series (even if you ignore the neural-net part of it) and covers gradients, differentiation, and what they mean intuitively.<p>[0] <a href="https://youtu.be/VMj-3S1tku0?si=jq1cCSn5si17KK1o" rel="nofollow">https://youtu.be/VMj-3S1tku0?si=jq1cCSn5si17KK1o</a>
See another interactive article explaining linear regression and gradient descent: <a href="https://mlu-explain.github.io/linear-regression/" rel="nofollow">https://mlu-explain.github.io/linear-regression/</a>
All that's wrong with the modern world:<p><a href="https://www.ibm.com/think/topics/linear-regression" rel="nofollow">https://www.ibm.com/think/topics/linear-regression</a><p>A proven way to scientifically and reliably predict the future<p>Business and organizational leaders can make better decisions by using linear regression techniques. Organizations collect masses of data, and linear regression helps them use that data to better manage reality, instead of relying on experience and intuition. You can take large amounts of raw data and transform it into actionable information.<p>You can also use linear regression to provide better insights by uncovering patterns and relationships that your business colleagues might have previously seen and thought they already understood.<p>For example, performing an analysis of sales and purchase data can help you uncover specific purchasing patterns on particular days or at certain times. Insights gathered from regression analysis can help business leaders anticipate times when their company’s products will be in high demand.
The number of em dashes in this makes it look very AI-written. That doesn't make it a bad piece, but it does make me check every sentence more carefully for errors.