Lasso & Ridge Regression in R 

by Chesta Dhingra

Code SGT · Jul 8, 2019

Lasso and Ridge regression are extended forms of the Ordinary Least Squares (OLS) approach, which in general is called Linear Regression and, when used for classification, is known as Logistic Regression.

The main drawbacks of least squares regression are, first, that it cannot be fit when the number of predictors exceeds the number of observations, and second, that it cannot differentiate the “important” variables from the “less important” ones. This leads to overfitting on the training data and produces high variance on the testing data, i.e. predictions with low accuracy on new or test data. Least squares also has issues when dealing with “multicollinearity”.

'Multicollinearity occurs when independent variables in a regression model are correlated. This correlation is a problem because independent variables should be independent. If the degree of correlation between variables is high enough, it can cause problems when we try to fit the model and interpret the results.'
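
To make this concrete, here is a small, self-contained sketch (not part of the breast cancer analysis, using made-up data) that simulates two highly correlated predictors and checks their Variance Inflation Factors with car::vif(); VIF values well above 5–10 are usually read as a sign of multicollinearity.

# A small simulated example of multicollinearity (illustrative only)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is almost a copy of x1
y  <- 2 * x1 + rnorm(n)
toy_model <- lm(y ~ x1 + x2)
cor(x1, x2)                     # correlation close to 1
car::vif(toy_model)             # very large VIF values for x1 and x2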

In that scenario regularization comes into play. It helps to avoid overfitting by penalizing high-valued regression coefficients, which means it shrinks the parameters and simplifies the model. It works by biasing the estimates towards particular values (such as small values near zero), and this bias is achieved by adding a penalty term, controlled by a tuning parameter, that encourages those values:

L1 regularization adds an L1 penalty, or bias, equal to the absolute value of the magnitude of the coefficients. In other words, it limits the size of the coefficients. L1 can yield sparse models: some coefficients become exactly zero and are eliminated, so it helps remove some features altogether and works well for feature selection when there is a huge number of independent variables in the data set. Lasso Regression (Least Absolute Shrinkage and Selection Operator) uses this method. In lasso regression, larger penalties result in coefficient values closer to zero, which is ideal for producing simpler models.

L2 regularization adds an L2 penalty, or bias, equal to the sum of the squares of the magnitudes of the coefficients. It does not yield sparse models: all the coefficients are shrunk by the same factor and none are eliminated. Ridge Regression and SVMs use this method. Ridge regression adds just enough bias to make the estimates reasonably reliable approximations of the true population values. It uses a type of shrinkage estimator called a “ridge estimator”. A shrinkage estimator produces new estimates that are shrunk closer to the “true” population parameters, which improves on the least-squares estimate when multicollinearity is present.
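
As a rough sketch with made-up numbers, the two penalty terms that get added to the loss can be computed like this (using glmnet's convention, referenced later in the post, where alpha = 1 gives the L1/lasso penalty and alpha = 0 gives the L2/ridge penalty):

# Illustrative only: the penalty terms added to the loss for a given lambda
beta   <- c(0.8, -0.3, 0, 1.5)          # some example coefficients
lambda <- 0.1                           # tuning parameter
l1_penalty <- lambda * sum(abs(beta))   # lasso / L1: lambda * sum(|beta_j|)
l2_penalty <- lambda * sum(beta^2)      # ridge / L2: lambda * sum(beta_j^2)
l1_penalty
l2_penalty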

To illustrate the importance of Ridge and Lasso regression, the data used here is the Breast Cancer data set, and the task is to classify whether a tumour is “Malignant” or “Benign”.

Reading and looking at the structure of the data


Doing the basic descriptive statistics on the data

# Reading and analyzing the structure of the data
BreastCancer_Data <- read.csv(file = "BreastCancerData.csv", sep = ",", na.strings = "")
str(BreastCancer_Data)
# Doing the basic descriptive statistics of the data
summary(BreastCancer_Data)
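
Before modelling it is also worth glancing at the class balance of the outcome. A small optional check (this assumes the outcome column is named diagnosis, as it is in the modelling code further down):

# Optional: how many benign (B) and malignant (M) cases are in the data?
table(BreastCancer_Data$diagnosis)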


# Checking the correlation among the independent variables and creating the graph using the corrplot library
library(corrplot)
Correlation_Data <- cor(BreastCancer_Data[,-c(1,2)])
corrplot(Correlation_Data, method = "circle")
# The results show that there is high correlation among the predictor variables, which leads to multicollinearity in the data. We can check this further by computing the Variance Inflation Factor (VIF) on the logistic model that we create in the next part of the code.
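
If you want the correlated pairs as a list rather than a plot, an optional sketch like the following (the 0.9 cut-off is an arbitrary choice for illustration) pulls out the predictor pairs with very high absolute correlation:

# Optional: list predictor pairs with |correlation| above 0.9
high_cor <- which(abs(Correlation_Data) > 0.9 & upper.tri(Correlation_Data), arr.ind = TRUE)
data.frame(var1 = rownames(Correlation_Data)[high_cor[, 1]],
           var2 = colnames(Correlation_Data)[high_cor[, 2]],
           cor  = Correlation_Data[high_cor])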

Logistic Regression

# Doing the Logistic Regression on the data to do the classification
# First we divide the data into training and testing sets: 70% will be the training data and 30% the testing data
BreastCancer_Data1 <- BreastCancer_Data[,-1]
# read.csv() no longer converts strings to factors by default (R >= 4.0), so make the outcome a factor before modelling
BreastCancer_Data1$diagnosis <- as.factor(BreastCancer_Data1$diagnosis)
library(caret)
library(dplyr)    # provides the %>% pipe used below
set.seed(123)     # for a reproducible train/test split
training.sample <- BreastCancer_Data1$diagnosis %>% createDataPartition(p = 0.7, list = FALSE)
train.dat <- BreastCancer_Data1[training.sample,]
test.dat <- BreastCancer_Data1[-training.sample,]
# building the model using the training data set
model1 <- glm(diagnosis ~ ., data = train.dat, family = "binomial")
summary(model1)

Checking for the multicollinearity in the data using the vif() function from the “car” package

# Detecting the multicollinearity in the model
car::vif(model1)


# Fitting the model to the test data to predict values, then checking accuracy by comparing the predicted classes with the observed ones and by creating a confusion matrix table
#make predictions
probabilities3 <- model1 %>% predict(test.dat, type = "response")
predictedclasses3 <- ifelse(probabilities3 >0.5, "M", "B")
#model accuracy
observed.classes3 <- test.dat$diagnosis
mean(predictedclasses3 == observed.classes3)
#to get the confusion matrix
table(pred = predictedclasses3, true = test.dat$diagnosis)
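
Since caret is already loaded, its confusionMatrix() function gives the same table along with sensitivity, specificity and other summary statistics. An optional sketch using the objects created above:

# Optional: a richer summary of the same predictions using caret
confusionMatrix(factor(predictedclasses3, levels = c("B", "M")),
                factor(observed.classes3, levels = c("B", "M")),
                positive = "M")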

Lasso Regression

# Now doing lasso regression, which penalizes the values of the coefficients. This helps in selecting the best features from the predictors by shrinking some coefficients to zero and eliminating those predictors, and it also reduces the multicollinearity in the data.
# In lasso regression the value of alpha = 1, and for ridge regression alpha = 0.
# We will first find the value of lambda using the cv.glmnet() function available in the glmnet library. It uses cross-validation to get the optimal value of lambda.
library(glmnet)
x <- model.matrix(diagnosis ~ ., train.dat)[,-1]   # drop the intercept column; glmnet fits its own intercept
y <- ifelse(train.dat$diagnosis == "M", 1,0)
set.seed(123)
cv.lasso <- cv.glmnet(x,y, alpha = 1, family = "binomial")
plot(cv.lasso)
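
To see the feature selection in action, you can inspect the coefficients at the chosen lambda; entries printed as “.” have been shrunk exactly to zero and dropped from the model. A quick optional check:

# Optional check: coefficients at lambda.min; "." entries are exactly zero
coef(cv.lasso, s = "lambda.min")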


# Building the model on the training data using the value of lambda stored in cv.lasso. Here we take the minimum value of lambda (cv.lasso$lambda.min) to get accurate predictions on the test data.
model <- glmnet(x, y, alpha = 1, family = "binomial", lambda = cv.lasso$lambda.min)
# final model with lambda.min on the testing data (predicted model)
x.test <- model.matrix(diagnosis ~ ., test.dat)[,-1]
probabilities <- model %>% predict(newx = x.test, type = "response")   # type = "response" gives predicted probabilities
predicted.classes <- ifelse(probabilities > 0.5, "M", "B")
#model accuracy
observed.classes <- test.dat$diagnosis
mean(predicted.classes == observed.classes)
table(pred = predicted.classes, true = test.dat$diagnosis)
# Here we are getting higher accuracy than that of the logistic model.
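
As a side note, cv.glmnet() also stores lambda.1se, the largest lambda whose cross-validated error is within one standard error of the minimum. Refitting with it is a common way to get an even sparser model, possibly at a small cost in accuracy. An optional sketch reusing the objects above:

# Optional: a simpler lasso model using the more conservative lambda.1se
model_1se <- glmnet(x, y, alpha = 1, family = "binomial", lambda = cv.lasso$lambda.1se)
probabilities_1se <- predict(model_1se, newx = x.test, type = "response")
predicted.classes_1se <- ifelse(probabilities_1se > 0.5, "M", "B")
mean(predicted.classes_1se == observed.classes)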

Ridge Regression

# In ridge regression the alpha value is set to 0 (alpha = 0). It reduces the multicollinearity by shrinking the coefficients by the same factor, without eliminating any of the predictor variables.
cv.ridge <- cv.glmnet(x,y, alpha = 0, family = "binomial")
#fitting the ridge model to the training data
ridge_model <- glmnet(x,y, alpha = 0, family = "binomial", lambda = cv.ridge$lambda.min)
#make predictions for the ridge regression on the test data
x_ridge.test <- model.matrix(diagnosis ~ ., test.dat)[,-1]
ridge_probabilities <- ridge_model %>% predict(newx = x_ridge.test, type = "response")
ridge_predictclass <- ifelse(ridge_probabilities > 0.5, "M", "B")
# checking for the accuracy we are getting from ridge regression
ridge_obsvclasses <- test.dat$diagnosis
mean(ridge_predictclass == ridge_obsvclasses)
# get the confusion matrix
table(pred = ridge_predictclass, true = test.dat$diagnosis)
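
To confirm the contrast with the lasso, you can look at the ridge coefficients at the chosen lambda: they are all shrunk towards zero, but none of them are exactly zero. An optional check:

# Optional check: ridge shrinks every coefficient but eliminates none
coef(ridge_model)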

Cover image credit: Mika Baumeister on Unsplash