Linear Regression for Machine Learning

Farzana Orin
Mar 24, 2021 · 8 min read

For anyone who wants to learn ML algorithms but doesn't know where to start, the answer is linear regression. We shall therefore start there, as it provides a base to build on before learning other ML algorithms. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. But before diving into linear regression, we should take a look at regression itself. So what is regression? Regression is a method of modelling a target value based on independent predictors. It is mostly used for forecasting and for finding cause-and-effect relationships between variables. Regression techniques differ mostly in the number of independent variables and in the type of relationship between the independent and dependent variables.

[Image: types-of-regression.png]

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other a dependent variable. For example, a modeller might want to relate the weights of individuals to their heights using a linear regression model. The red line in the graph is referred to as the best-fit straight line. The line can be modelled using the linear equation shown below.

[Image: linear-regression-hypothesis.jpg]

Before moving on to the algorithm, let's look at two important concepts you must know to better understand linear regression.
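As a minimal sketch in code (the coefficients here are made up purely for illustration), the hypothesis of simple linear regression is:

```python
import numpy as np

def predict(x, theta1, theta2):
    # Hypothesis of simple linear regression: y = theta1 + theta2 * x,
    # where theta1 is the intercept and theta2 is the slope.
    return theta1 + theta2 * x

x = np.array([1.0, 2.0, 3.0])
print(predict(x, 0.5, 2.0))  # [2.5 4.5 6.5]
```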

Cost Function

The cost function helps us to figure out the best possible values for θ1 and θ2 which would provide the best fit line for the data points. Since we want the best values for θ1 and θ2, we convert this search problem into a minimization problem where we would like to minimize the error between the predicted value and the actual value.

[Image: LR-cost-function-1.jpg]

[Image: LR-cost-function-2.jpg]

We have chosen the above function to minimize. The difference between the predicted values and the ground truth is the error. We square this error, sum over all data points, and divide by the total number of data points, which gives the average squared error over all the data points. This cost function is therefore also known as the Mean Squared Error (MSE) function. Now, using this MSE function, we are going to change the values of θ1 and θ2 so that the MSE value settles at the minima.
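A minimal sketch of that MSE computation (the data points here are made up for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # Square the error difference, sum over all data points,
    # and divide by the number of data points.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 7.5])
print(mse(y_true, y_pred))  # 1/6 ≈ 0.1667
```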

Gradient Descent

When there are one or more inputs, you can optimize the values of the coefficients by iteratively minimizing the error of the model on your training data. This operation is called Gradient Descent. It works by starting with random values for each coefficient. The sum of the squared errors is calculated for each pair of input and output values. A learning rate is used as a scale factor, and the coefficients are updated in the direction that minimizes the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible. Choosing the correct learning rate is very important, as it ensures that Gradient Descent converges in a reasonable time:
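One update step of this procedure can be sketched as follows (θ1 is the intercept, θ2 the slope; the gradient formulas come from differentiating the MSE cost):

```python
import numpy as np

def gradient_step(x, y, theta1, theta2, lr):
    # One gradient-descent update for the MSE cost.
    y_pred = theta1 + theta2 * x
    error = y_pred - y
    grad1 = 2 * np.mean(error)      # partial derivative w.r.t. theta1
    grad2 = 2 * np.mean(error * x)  # partial derivative w.r.t. theta2
    # Move against the gradient, scaled by the learning rate lr.
    return theta1 - lr * grad1, theta2 - lr * grad2
```

Repeating this step drives θ1 and θ2 toward the values that minimize the MSE.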

If we choose the learning rate to be very large, Gradient Descent can overshoot the minimum. It may fail to converge, or even diverge.

[Image: big-learning.jpg]

If we choose the learning rate to be very small, Gradient Descent takes small steps toward the minima and therefore takes a long time to reach them.

[Image: small-learning.jpg]

Sometimes the cost function can be a non-convex function, where you could settle at a local minima, but for linear regression it is always a convex function.

[Image: 89f9bddacf547661dfc209d4b31c2c12.png]

You may be wondering how to use gradient descent to update θ1 and θ2. To update them, we take gradients from the cost function by computing partial derivatives with respect to θ1 and θ2. Understanding how these partial derivatives are found requires some calculus, but if you don't know it, that is alright; you can take them as given.

[Image: Cost-Function.jpg]

[Image: gradiant_descent.jpg]

The partial derivatives are the gradients, and they are used to update the values of θ1 and θ2. Alpha is the learning rate, a hyperparameter that you must specify. A smaller learning rate could get you closer to the minima but takes more time; a larger learning rate converges sooner, but there is a chance you could overshoot the minima.

Now we can start the main part of linear regression: the code. We have two choices: we can either use the scikit-learn library to import the linear regression model and use it directly, or we can write our own regression model based on the equations above. In the latter case, we first need to open a Jupyter notebook and get a dataset. As an example, I am using linear regression to predict the price of a house with 2050 square feet and 3 bedrooms. There are many datasets available online for linear regression. I attach my GitHub link here . Let's visualize the training and testing data.
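The scikit-learn route looks roughly like this (with a tiny made-up dataset following y = 2x + 1 standing in for the house-price data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: y = 2x + 1; the real example uses house features instead.
x = np.array([[1.0], [2.0], [3.0], [4.0]])  # features as a column matrix
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(x, y)
print(model.intercept_, model.coef_[0])  # close to 1.0 and 2.0
print(model.predict([[5.0]]))            # close to [11.]
```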

[Image: Capture.PNG]

[Image: Capture1.PNG]

We use the matplotlib and numpy libraries to read the train and test files. We retrieve the independent (x) and dependent (y) variables, and since we have only one feature (x), we reshape them so that we can feed them into our linear regression model.
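The reshaping step can be sketched like this (the array below is a stand-in for the contents of the loaded train file):

```python
import numpy as np

# Stand-in for the loaded training file: columns are x (feature) and y (target).
raw = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
x_train = raw[:, 0].reshape(-1, 1)  # single feature -> column vector, shape (n, 1)
y_train = raw[:, 1].reshape(-1, 1)
print(x_train.shape, y_train.shape)  # (3, 1) (3, 1)
```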

[Image: Capture2.PNG]

Now, let’s build our own linear regression model from the equations above.

[Image: Capture3.PNG]

We initialize a_0 and a_1 to 0.0. For 1000 epochs we calculate the cost; using the cost we calculate the gradients, and using the gradients we update the values of a_0 and a_1. After 1000 epochs we will have obtained the best values for a_0 and a_1, and hence we can formulate the best-fit straight line.
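That training loop can be sketched as follows, on synthetic data with a known slope and intercept (the learning rate here is illustrative, not necessarily the article's value):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0   # synthetic data: true slope 2, true intercept 1

a_0, a_1 = 0.0, 0.0  # intercept and slope, initialised to 0.0
lr = 0.05
for _ in range(1000):
    y_pred = a_0 + a_1 * x
    error = y_pred - y
    # Gradients of the MSE cost with respect to a_0 and a_1.
    a_0 -= lr * 2 * np.mean(error)
    a_1 -= lr * 2 * np.mean(error * x)

print(a_0, a_1)  # close to 1.0 and 2.0
```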

[Image: Capture4.PNG]

So finally we got the required result: the cost of a house with 2050 sq ft and 4 bedrooms is $340027.75766666.

[Image: Capture5.PNG]

Conclusion

Linear Regression is an algorithm that every Machine Learning enthusiast must know, and it is also the right place to start for people who want to learn Machine Learning. It is a simple but useful algorithm. I hope this article was helpful to you.