Despite what the article claims, normality is not actually an assumption of linear regression. It is "required" for the F-tests (the F-distribution being derived from the normal distribution), but it is not required for showing that the regression coefficient estimates are consistent.
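A minimal sketch of the point (my own toy example, not from the article or the comment): OLS on data with heavily skewed, non-normal errors still recovers the true coefficients.

```python
# Toy illustration (assumed data): OLS still recovers the true coefficients
# when the errors are strongly non-normal.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-2, 2, size=n)
errors = rng.exponential(scale=1.0, size=n) - 1.0   # skewed, mean-zero, not Gaussian
y = 3.0 + 2.0 * x + errors                          # true intercept 3, true slope 2

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares
print(beta_hat)                                     # close to [3.0, 2.0]
```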
> Assumptions of linear regression: There must be a linear relation between independent and dependent variables.

That's not wrong, but it's a strong way to word it. If linear regression were only suitable when the variables were perfectly linearly related, it would get a lot less use. Practically, linear regression can be used when the relationship is linear-ish, at least in the interval of interest. In other words, you can choose to declare linearity as an assumption (and take responsibility for what that choice entails, and for the error it might introduce into your analysis).
A tool that I've found myself reaching for more and more often is Gaussian Process Regression [1] [2]:

* It allows you to model essentially arbitrary functions. The main model assumption is your choice of kernel, which defines the local correlation between nearby points.

* You can draw samples from the distribution of all possible functions that fit your data.

* You can quantify which regions of the function you have more or less certainty about.

* Imagine this situation: you want to discover the functional relationship between the inputs and outputs of a long-running process. You can test any input you want, but it's not practical to exhaustively grid-search the input space. A Gaussian Process model can tell you which inputs to test next so as to gain the most information, which makes it perfect for optimising complex simulations. Used in this way, it's one means of implementing "Bayesian Optimisation" [3].

[1] https://en.wikipedia.org/wiki/Gaussian_process

[2] http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor

[3] https://en.wikipedia.org/wiki/Bayesian_optimization
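A rough sketch of the scikit-learn estimator linked in [2]; the toy function, kernel choice, and "test the most uncertain input next" rule are my own illustrative assumptions, not a full Bayesian-optimisation acquisition function.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(15, 1))          # sparse, expensive-to-obtain observations
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(15)

# The kernel encodes the assumed local correlation between nearby points.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

X_test = np.linspace(0, 10, 200).reshape(-1, 1)
mean, std = gpr.predict(X_test, return_std=True)    # posterior mean and per-point uncertainty
samples = gpr.sample_y(X_test, n_samples=3)         # draws from the posterior over functions
next_x = X_test[np.argmax(std)]                     # naive "which input to test next" heuristic
```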
Now this is a topic I desperately need. Can anyone here explain why one would choose predictors in multilinear regression that are NOT correlated with the target? I am having trouble understanding a paper [1] in which the authors avoid using predictors that are correlated with the target. The target is the ozone concentration reported by a reference instrument, and the predictors are low-cost sensor outputs.

[1] https://www.sciencedirect.com/science/article/pii/S092540051500355X (Section 4.1, about ozone predictors)
This article is obviously a jumping-off-point kind of article. Most people using linear regression have never even heard of things like ridge regression. So I like the article.

However, there are at least two types of regression I'd add to the list, and a suggestion (a rough sketch of 2 and 3 follows below):

1. Multivariate Distance Matrix Regression (MDMR; Anderson, 2001; McArdle & Anderson, 2001).

2. Regression with splines.

3. On polynomial regression, add a mention of orthogonal polynomials.
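A hedged sketch of items 2 and 3 using scikit-learn and numpy (SplineTransformer needs a reasonably recent scikit-learn; the toy data are made up):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(200)

# Item 2: cubic B-spline basis expansion followed by ordinary least squares.
spline_model = make_pipeline(SplineTransformer(degree=3, n_knots=8), LinearRegression())
spline_model.fit(X, y)
print(spline_model.predict([[2.5], [7.5]]))

# Item 3: numpy can fit polynomials in an orthogonal (here Chebyshev) basis,
# which avoids the ill-conditioning of raw powers of x.
cheb = np.polynomial.Chebyshev.fit(X.ravel(), y, deg=5)
print(cheb(np.array([2.5, 7.5])))
```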
Why did the article cover a basic term like "outlier" under "Terminologies related to regression" but omit information about how to evaluate a regression model? I liked that there was some information at the bottom about "How to choose a regression model" that mentioned "you can select the final model based on Adjusted r-square, RMSE, AIC and BIC", but providing a little more context would make this post even better. Perhaps a link to a future blog post on the topic?
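For anyone wanting that missing context, a quick sketch (made-up data) of how those selection metrics fall out of a statsmodels fit:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, 0.0, -1.5]) + rng.normal(scale=0.5, size=200)

result = sm.OLS(y, sm.add_constant(X)).fit()
rmse = np.sqrt(np.mean(result.resid ** 2))            # in-sample RMSE
print(result.rsquared_adj, rmse, result.aic, result.bic)
# Lower AIC/BIC and RMSE (and higher adjusted R-squared) favour a candidate model,
# though the criteria can disagree; that's the context the article skips.
```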
Are there any ML APIs or web services that accept a vector and run various regression scenarios to identify the optimal fit?

I suppose vectors for both training and testing would be required.

Would gladly pay $1-$5 per batch for a service to do this.
Logistic regression is doing classification, not regression. That is, it's assigning/predicting categories of data points instead of predicting some continuous value on an interval. Maybe this is splitting hairs, but the way you evaluate a classification model is totally different from the way you evaluate a regression one.
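A small sketch of that last point (toy data, my own assumptions): a fitted logistic regression is scored with classification metrics such as accuracy and log loss, not with RMSE-style regression metrics.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(accuracy_score(y, clf.predict(X)))      # evaluated on hard class labels
print(log_loss(y, clf.predict_proba(X)))      # evaluated on predicted probabilities
```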
Don’t forget to put RANSAC on your list:
https://en.m.wikipedia.org/wiki/Random_sample_consensus
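A quick sketch with scikit-learn's RANSACRegressor (the line-plus-gross-outliers toy data are my own illustration; by default it wraps an ordinary linear regression):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(scale=0.3, size=100)
y[:10] += 30.0                                   # inject gross outliers

ransac = RANSACRegressor(residual_threshold=2.0).fit(X, y)
print(ransac.estimator_.coef_, ransac.estimator_.intercept_)   # fit largely ignores the outliers
print(ransac.inlier_mask_.sum(), "points kept as inliers")
```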
I was hoping for one *interesting* chart per regression analysis type. That didn't happen, and I felt lost at sea. Please improve the post on such an amazing topic.
> In simple words, regression analysis is used to model the relationship between a dependent variable and one or more independent variables.

“model” isn’t a simple word.
This is just horrible quality material. What in the heck is this?

> It is to be kept in mind that the coefficients which we get in quantile regression for a particular quantile should differ significantly from those we obtain from linear regression. If it is not so then our usage of quantile regression isn't justifiable. This can be done by observing the confidence intervals of regression coefficients of the estimates obtained from both the regressions.
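For what it's worth, here is my guess (a hedged statsmodels sketch on made-up data) at the comparison that passage is fumbling toward: fit OLS and a quantile regression, then check whether the coefficient confidence intervals actually separate.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0 + 0.5 * x)   # heteroscedastic noise, so quantile slopes differ
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
q90 = sm.QuantReg(y, X).fit(q=0.9)
print(ols.conf_int())   # confidence intervals for the OLS intercept and slope
print(q90.conf_int())   # intervals for the 0.9-quantile fit; compare overlap with the OLS ones
```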