Ridge Regression

Ridge regression is a variation of linear regression. We use ridge regression to tackle the multicollinearity problem. Multicollinearity causes a very large variance in the least-squares estimates of the model, so to reduce this variance a degree of bias is added to the regression estimates.

Things You Will Master

  1. Overview - Ridge Regression
  2. Training Ridge Regression Model
  3. Choosing Optimal Lambda Value - k-fold Cross-Validation
  4. Bias and variance of ridge regression
  5. Assumptions of Ridge Regression

Overview - Ridge Regression

Ridge regression is a parsimonious model that performs L2 regularization. L2 regularization adds a penalty equivalent to the square of the magnitude of the regression coefficients and tries to minimize them. The equation of ridge regression is given below.

  LS Obj + λ × (sum of squared coefficients)

Here, the effect of λ is as follows (a quick numerical check is shown after the list):

  1. If λ = 0, the output is the same as in simple linear regression.

  2. If λ is very large, the coefficients are shrunk toward zero.
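
As a minimal sketch of these two limiting cases (using the same mtcars variables that we build later in this chapter), we can compare glmnet with λ = 0 against an ordinary least-squares fit, and then look at a very large λ:

# A quick check of the two limiting cases using the mtcars data
library(glmnet)
x_var <- data.matrix(mtcars[, c("hp", "wt", "drat")])
y_var <- mtcars[, "mpg"]

# lambda = 0: coefficients are close to the ordinary least-squares fit
coef(glmnet(x_var, y_var, alpha = 0, lambda = 0))
coef(lm(mpg ~ hp + wt + drat, data = mtcars))

# Very large lambda: coefficients are shrunk almost to zero
coef(glmnet(x_var, y_var, alpha = 0, lambda = 1e6))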

The following diagram gives a visual comparison of OLS and ridge regression.

Geometric representation of OLS vs Ridge Regression

Training Ridge Regression Model

To build a ridge regression model in R, we use the glmnet() function from the glmnet package. Let's use ridge regression to predict the mileage of a car using the mtcars dataset.

# Loading the library
library(glmnet)
# Getting the independent variable
x_var <- data.matrix(mtcars[, c("hp", "wt", "drat")])
# Getting the dependent variable
y_var <- mtcars[, "mpg"]

# Setting the range of lambda values
lambda_seq <- 10^seq(2, -2, by = -.1)
# Using glmnet function to build the ridge regression model
fit <- glmnet(x_var, y_var, alpha = 0, lambda  = lambda_seq)
# Checking the model
summary(fit)
# Output
          Length Class     Mode   
a0         41    -none-    numeric
beta      123    dgCMatrix S4     
df         41    -none-    numeric
dim         2    -none-    numeric
lambda     41    -none-    numeric
dev.ratio  41    -none-    numeric
nulldev     1    -none-    numeric
npasses     1    -none-    numeric
jerr        1    -none-    numeric
offset      1    -none-    logical
call        5    -none-    call   
nobs        1    -none-    numeric
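
The summary above only lists the components of the fitted object. One optional way to see how the coefficients shrink as lambda grows is to plot the fitted object with glmnet's plot method, for example:

# Plotting the coefficient paths against log(lambda)
plot(fit, xvar = "lambda", label = TRUE)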

Choosing Optimal Lambda Value

The glmnet function trains the model multiple times, once for each of the lambda values we pass as a vector to the lambda argument of the glmnet function. The next task is to identify the optimal value of lambda that results in the minimum error. This can be achieved automatically by using the cv.glmnet() function.

# Using cross validation glmnet
ridge_cv <- cv.glmnet(x_var, y_var, alpha = 0, lambda = lambda_seq)
# Best lambda value
best_lambda <- ridge_cv$lambda.min
best_lambda
# Output
[1] 79.43000
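
If you want to see how the cross-validation error behaves across the whole lambda sequence, you can also plot the cv.glmnet object (optional):

# Cross-validation error curve against log(lambda); lambda.min marks the minimum
plot(ridge_cv)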

Extracting the best model using k-fold cross-validation

The best model can be extracted by calling the glmnet.fit object from the cross-validation object. Once you have that, we can rebuild the model by passing lambda as 79.43000.

best_fit <- ridge_cv$glmnet.fit
# Inspecting the fitted models (only the first few rows of the output are shown)
print(best_fit)
# Output
      Df   %Dev    Lambda
 [1,]  3 0.1798 100.00000
 [2,]  3 0.2167  79.43000
 [3,]  3 0.2589  63.10000
 [4,]  3 0.3060  50.12000
 [5,]  3 0.3574  39.81000
 [6,]  3 0.4120  31.62000
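
As a side note, you do not strictly have to refit: the coefficients at the chosen lambda can also be pulled straight from the cross-validation object (a small optional sketch):

# Coefficients at the lambda that minimized the cross-validation error
coef(ridge_cv, s = "lambda.min")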

Building the final model

# Rebuilding the model with optimal lambda value
best_ridge <- glmnet(x_var, y_var, alpha = 0, lambda = 79.43000)

Checking the coefficients

coef(best_ridge)
# Output
4 x 1 sparse Matrix of class "dgCMatrix"
                      s0
(Intercept) 20.099502946
hp          -0.004398609
wt          -0.344175261
drat         0.484807607

The next task is to use the predict function and compute the R-squared value for both the train and test datasets. In this example, we did not create a train and test split, so I am only providing sample code. You can read the linear regression chapter to understand this step in detail.

# here x and test refer to a held-out test dataset (not created in this chapter)
pred <- predict(best_ridge, s = best_lambda, newx = x)

# R-squared formula
actual <- test$Price
preds <- test$PredictedPrice
rss <- sum((preds - actual) ^ 2)
tss <- sum((actual - mean(actual)) ^ 2)
rsq <- 1 - rss/tss
rsq
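
Since this chapter has no test split, here is a minimal worked version of the same calculation on the training data itself (an in-sample R-squared, for illustration only; the names pred_train, rss, and tss are just local helpers):

# In-sample R-squared on the mtcars training data (illustration only)
pred_train <- predict(best_ridge, s = best_lambda, newx = x_var)
rss <- sum((pred_train - y_var) ^ 2)
tss <- sum((y_var - mean(y_var)) ^ 2)
1 - rss / tss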

Bias and variance of ridge regression

The bias-variance trade-off is generally a complicated matter when building ridge regression models on a real dataset. However, there is a general trend that I would like to highlight here (a small resampling sketch follows the list):

  1. The bias increases as λ increases.
  2. The variance decreases as λ increases.
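
As a rough illustration of the variance side, one can use simple bootstrap resampling of mtcars (an assumption of this sketch, not part of the original workflow) and compare how much a single coefficient estimate varies across resamples at a small versus a large lambda (the lambda values 0.01 and 100 are illustrative choices):

# Rough sketch: variability of the "wt" coefficient across bootstrap resamples
set.seed(123)
boot_wt_coef <- function(lambda) {
  replicate(200, {
    idx <- sample(nrow(mtcars), replace = TRUE)
    fit_b <- glmnet(x_var[idx, ], y_var[idx], alpha = 0, lambda = lambda)
    as.numeric(coef(fit_b)["wt", ])
  })
}
sd(boot_wt_coef(0.01))  # larger spread (higher variance) at small lambda
sd(boot_wt_coef(100))   # smaller spread (lower variance) at large lambda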

Assumptions of Ridge Regressions

The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, as ridge regression does not provide confidence limits, the errors need not be assumed to be normally distributed.

Closing Note

In this chapter, we learned about ridge regression in R using functions from the glmnet package. We also saw how to use cross-validation to get the best model. In the next chapter, we will learn how to build lasso regression models.
Last updated on 6 Jan 2019 / Published on 17 Oct 2017