Is your data a werewolf? Is your data affected by the phases of the moon, or by other factors inside or outside your organization? Linear regression models may be the silver bullet that helps you better understand which factors have an effect on your data. These models can be very telling in predictive analytics: a linear regression estimates how much one variable changes as other variables change.
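For the simplest case of one independent variable $x$ and one dependent variable $y$, the straight-line model can be written in standard textbook notation (the symbols are conventional and are not taken from any particular tool). Here $\beta_0$ is the intercept, $\beta_1$ is the slope, $e_i$ are the differences between the actual points and the line, and $\bar{y}$ is the mean of $y$:

$$
\begin{aligned}
\hat{y}_i &= \beta_0 + \beta_1 x_i \\
e_i &= y_i - \hat{y}_i \\
(\beta_0, \beta_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2 \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
\end{aligned}
$$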
- Step one: Determine the dependent and independent variables.
- Dependent variable: the variable in the data set that you are trying to predict.
- Independent variables: the variables in the data set that may or may not affect the dependent variable. These variables can come from inside or outside your organization.
- Step two: Fit a linear regression, which models the relationship between the dependent and independent variables as a straight line.
- The “best fit” line is the one that minimizes the sum of the squared differences between the actual data points and the line. These differences (called residuals) are recorded.
- The residuals are used to calculate the R-squared value.
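- Step three: Use the R-squared value to judge whether the independent variable appears to have any effect on the dependent variable. For a least-squares line with an intercept, R-squared falls between 0 and 1 and measures the share of the variation in the dependent variable that the line explains.
- A high R-squared value indicates a strong linear relationship between the two variables.
- A low R-squared value indicates that a strong linear relationship probably does not exist between the two variables (a non-linear relationship may still be present).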
- Step four: Does a relationship exist?
- If yes, incorporate the predictions or known values for the independent variables into your prediction for the dependent variable to obtain a more accurate prediction.
- If no, set that variable aside and test other independent variables against the dependent variable.
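The four steps can be sketched in Python for the simplest case of a single independent variable. This is a minimal, standard-library-only sketch of ordinary least squares rather than a production tool. The function names and the 0.5 cutoff in `relationship_exists` are hypothetical choices for illustration; there is no universal threshold for a "high" R-squared, so pick one that suits your data. The sample numbers are made up.

```python
from statistics import mean


def fit_line(x, y):
    """Step two: ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x never changes, so no slope can be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Step two: differences between each actual point and the best-fit line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Step three: share of the variation in y explained by the line."""
    y_bar = mean(y)
    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y never changes, so R-squared is undefined")
    return 1 - ss_res / ss_tot


def relationship_exists(r2, threshold=0.5):
    """Step four: HYPOTHETICAL cutoff; choose a threshold that suits your data."""
    return r2 >= threshold


def predict(intercept, slope, x_new):
    """Step four (yes branch): use the line to predict the dependent variable."""
    return intercept + slope * x_new


if __name__ == "__main__":
    # Made-up data: y rises exactly in step with x.
    x, y = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
    b0, b1 = fit_line(x, y)
    r2 = r_squared(y, residuals(x, y, b0, b1))
    print(b0, b1, r2)              # 0.0 2.0 1.0
    if relationship_exists(r2):
        print(predict(b0, b1, 6))  # 12.0

    # Made-up data: y goes up and back down with no linear trend.
    x2, y2 = [1, 2, 3, 4], [1, 3, 3, 1]
    c0, c1 = fit_line(x2, y2)
    print(r_squared(y2, residuals(x2, y2, c0, c1)))  # 0.0
```

In the first data set every point lies on the line, so R-squared is 1 and the line is used for a prediction. In the second, y rises and falls with no linear trend, so both the slope and R-squared come out as 0 and the variable would be set aside.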
I understand that this is a very simplistic view of what a linear regression model can be. It is meant to serve as a starting point for gaining a basic understanding of the model. Many sites dive deeper into linear regression and its uses. A few that I have found very helpful are: