Regression Algorithms
What is Regression?
Regression is a statistical method used to determine the strength and character of the relationship between one dependent variable and one or more independent variables.
Suppose you have data about electricity consumption in an area based on the number of houses in that area, and you are asked to estimate the consumption for a larger area with more houses. You could simply take the average consumption per house and extrapolate, and you would not be completely wrong. Regression goes a step further and improves this guess by finding a relationship between the number of houses and electricity consumption. When that relationship is modeled as a straight line, the method is called Linear Regression.
Mathematically, it is represented by the equation y = mx + c, where y is the electricity consumed, x is the number of houses, m is the slope of the line, and c is its intercept. If we find the right m and c, we have found the relation between y and x. It is that simple.
If you observe the hypothetical graph here, you will see that given 100 houses in an area, the line lets you predict that the consumption may be around 20 to 21 kWh.
Here x is called the independent or predictor variable, and y is called the dependent or target variable.
In the above example, the number of houses in an area is the independent or predictor variable and the electricity consumed by that area is the dependent or target variable.
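To make the search for m and c concrete, here is a minimal sketch of the ordinary least-squares fit for a single predictor. The function name fit_line and the data values are assumptions made up for illustration; they are not measurements from the article.

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (not divided by n; the ratio is the same)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept
    return m, c


# Hypothetical data: number of houses vs. electricity consumed (kWh)
houses = [20, 40, 60, 80]
consumption = [4.5, 8.5, 12.5, 16.5]

m, c = fit_line(houses, consumption)
predicted = m * 100 + c  # estimate for an area with 100 houses
print(m, c, predicted)
```

With these made-up numbers the fitted line predicts about 20.5 kWh for 100 houses, which is in line with the 20 to 21 kWh read off the hypothetical graph.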
What are the Types of Regression Algorithms?
Conceptually, there are 2 types:
Simple Linear Regression
Multi-Linear Regression
Simple Linear Regression (SLR):
If there is only ONE independent variable, then it is a simple linear regression. In the above example, if only the "number of houses" determines the electricity consumption, then it is an SLR problem.
Multi-Linear Regression (MLR):
If more than one predictor determines the target variable, it becomes an MLR problem.
In the above example, in reality, the size of the house, the number of electrical appliances, the number of residents, the location of the houses, the external temperature, and many more factors influence the electricity consumption, and hence this becomes an MLR problem.
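With several predictors the equation becomes y = c + b1*x1 + b2*x2 + ..., and the coefficients can be found by solving the normal equations. The sketch below is one possible way to do this in plain Python; the function name fit_mlr, the choice of predictors (number of houses and average residents per house), and the data are all assumptions for illustration. It also assumes the predictors are not perfectly correlated, otherwise the system has no unique solution.

```python
def fit_mlr(rows, ys):
    """Least-squares coefficients [c, b1, b2, ...] via the normal equations."""
    # Prepend 1.0 to each row so the first coefficient is the intercept c.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented system (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: (number of houses, average residents per house)
rows = [(20, 2), (40, 3), (60, 2), (80, 4), (100, 3)]
ys = [6.5, 11.5, 14.5, 20.5, 23.5]  # made-up consumption values

coeffs = fit_mlr(rows, ys)
print(coeffs)  # [intercept, coefficient for houses, coefficient for residents]
```

In practice a numerical library would normally be used for this step; the hand-written solver is only meant to show what "finding the coefficients" involves.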
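A continuous regression output can also be obtained from various other algorithms, some of which extend linear regression:
Polynomial Regression
Logistic Regression (despite its name, this is typically used for classification, predicting a category rather than a continuous value)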
Support Vector Regression
Decision Tree Regression
Random Forest Regression
Ridge Regression
Lasso Regression
Generalized Linear Regression
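As one illustration of how these variants differ from plain linear regression, the sketch below shows ridge regression for a single predictor. It assumes the common formulation in which a penalty lam * m^2 is added to the squared-error cost and the intercept is left unpenalized; the function name fit_ridge, the penalty values, and the data are hypothetical.

```python
def fit_ridge(xs, ys, lam):
    """Ridge fit of y = m*x + c with penalty lam * m**2 (intercept unpenalized)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / (sxx + lam)      # the penalty shrinks the slope toward zero
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data: number of houses vs. electricity consumed (kWh)
houses = [20, 40, 60, 80]
consumption = [4.5, 8.5, 12.5, 16.5]

m_plain, c_plain = fit_ridge(houses, consumption, 0.0)    # lam = 0 is ordinary least squares
m_ridge, c_ridge = fit_ridge(houses, consumption, 500.0)  # larger lam, smaller slope
print(m_plain, c_plain)
print(m_ridge, c_ridge)
```

Setting lam to 0 recovers the ordinary least-squares line, while a larger lam trades some fit on the training data for a smaller slope.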
When do you use Regression Algorithms?
There are three main uses of regression analysis:
Predictions - of a continuous dependent variable based on the predictors
Trend Forecasting - the best fit line also helps in understanding the trend
Determining the strength of the predictors - if there is no strong relationship between the target and the predictors, it becomes visible through regression analysis
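The third use can be sketched with R-Squared, the fraction of the variation in the target that the fitted line explains. The function name r_squared and both data sets below are assumptions made up for illustration: one target follows the predictor closely, the other does not.

```python
def r_squared(xs, ys):
    """R-Squared of the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Residual sum of squares vs. total sum of squares around the mean
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


houses = [20, 40, 60, 80, 100]
strong_target = [4.4, 8.7, 12.3, 16.8, 20.3]  # hypothetical: tracks the number of houses
weak_target = [12, 5, 14, 6, 12]              # hypothetical: unrelated to the number of houses

r2_strong = r_squared(houses, strong_target)
r2_weak = r_squared(houses, weak_target)
print(r2_strong, r2_weak)
```

A value close to 1 suggests the predictor explains most of the variation in the target, while a value close to 0 suggests there is no strong linear relationship.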
Associated Concepts to go deeper into Linear Regression:
Data Preparation and cleansing
Concept of Cost Functions
Optimization of Cost Functions
Assumptions of Simple Linear Regression
Hypothesis Testing
p-values of coefficients
Residual Analysis
Various statistics like R-Squared, Adjusted R-Squared, F-Statistic
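As a starting point for the cost function topics above, here is a minimal sketch that uses mean squared error as the cost and plain gradient descent as the optimizer. The function names, learning rate, step count, and data are assumptions for illustration; x is expressed in hundreds of houses so that a single learning rate works for both m and c.

```python
def mse(m, c, xs, ys):
    """Cost function: mean squared error of the line y = m*x + c."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.5, steps=5000):
    """Minimize the MSE cost by repeatedly stepping against its gradient."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        grad_m = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # d(cost)/dm
        grad_c = 2 * sum(errors) / n                             # d(cost)/dc
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


# Hypothetical data: hundreds of houses vs. electricity consumed (kWh)
xs = [0.2, 0.4, 0.6, 0.8]
ys = [4.5, 8.5, 12.5, 16.5]

start_cost = mse(0.0, 0.0, xs, ys)
m, c = gradient_descent(xs, ys)
final_cost = mse(m, c, xs, ys)
print(m, c, start_cost, final_cost)
```

For linear regression the same m and c can be obtained directly from a closed-form formula; gradient descent is shown here because the idea carries over to models where no such formula exists.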
Conclusion
Regression is one of the most fundamental models and works well in a large number of prediction and trend-forecasting use cases. Multi-Linear Regression (MLR) in particular is used in innumerable scenarios and comes in handy for any Data Science Manager or ML developer.
It also quantifies the strength of the relationship between the target and the predictors.