# Lasso Regression

LASSO stands for Least Absolute Shrinkage and Selection Operator. Like ridge regression, the algorithm is a variation of linear regression. We use lasso regression when we have a large number of predictor variables.

### Things You Will Master

- Overview - Lasso Regression
- Training and Predicting Lasso Regression Model
- Getting the list of important variables

## Overview - Lasso Regression

Lasso regression is a parsimonious model that performs L1 regularization. The L1 regularization adds a penalty equivalent to the sum of the absolute values of the regression coefficients and tries to minimize them. The objective of lasso is similar to that of ridge regression and is given below.

```
LS Obj + λ (sum of the absolute values of coefficients)
```

Here the objective behaves as follows: if λ = 0, we get the same coefficients as linear regression; if λ is very large, all coefficients are shrunk towards zero.

The two models, lasso and ridge regression, are very similar to each other. However, in lasso the coefficients that are responsible for large variance are converted to exactly *zero*. In ridge regression, on the other hand, coefficients are only shrunk but are never made zero.
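This zeroing behavior can be seen directly with a quick sketch. The `glmnet` package used later in this tutorial and the built-in `mtcars` dataset are assumed here purely for illustration:

```
# Illustrative sketch: lasso can zero out coefficients, ridge only shrinks them
library(glmnet)

x <- model.matrix(mpg ~ ., mtcars)[, -1]
y <- mtcars$mpg

# Fit both models at the same, fairly large lambda
coef(glmnet(x, y, alpha = 1, lambda = 1))  # lasso: some coefficients print as ".", i.e. exactly zero
coef(glmnet(x, y, alpha = 0, lambda = 1))  # ridge: coefficients shrink but remain nonzero
```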

**Lasso regression is therefore also used for variable selection, as the model forces the coefficients of some variables to shrink to zero.**

### What does large number of variables mean?

- A large number here means enough variables that the model tends to over-fit. Theoretically, as few as ten variables can cause an overfitting problem.
- It can also mean facing computational challenges due to the sheer number of variables. Given today's processing power, this situation arises rarely.

*The following diagram is the visual interpretation comparing OLS and lasso regression*.

### Important Note

The LASSO does not handle correlated variables very well: when several predictors are highly correlated, it tends to pick one of them somewhat arbitrarily, and its selections can therefore be unstable.

## Training Lasso Regression Model

Training a lasso regression model is exactly the same as training a ridge regression model: we need to identify the optimal lambda value and then use that value to train the model. To achieve this, we can use the same `glmnet()` function with the argument `alpha = 1`. When we pass `alpha = 0`, `glmnet()` runs a ridge regression, and when we pass a value in between, such as `alpha = 0.5`, `glmnet()` runs an **elastic net** model, which is a combination of ridge and lasso regression.
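A minimal sketch of the three settings (here `x` and `y` stand for any predictor matrix and response vector; they are placeholders, not objects defined in this tutorial):

```
# alpha controls the penalty mix in glmnet
fit_lasso <- glmnet(x, y, alpha = 1)    # pure lasso (L1 penalty)
fit_ridge <- glmnet(x, y, alpha = 0)    # pure ridge (L2 penalty)
fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net (blend of L1 and L2)
```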

- Use the `cv.glmnet()` function to identify the optimal lambda value
- Extract the best lambda and best model
- Rebuild the model using the `glmnet()` function
- Use the `predict()` function to predict the values on future data

For this example we will use the `swiss` dataset to predict fertility based upon socioeconomic indicators for the year 1888.

```
# Loading the library
library(glmnet)
# Loading the data
data(swiss)
x_vars <- model.matrix(Fertility ~ ., swiss)[, -1]
y_var <- swiss$Fertility
lambda_seq <- 10^seq(2, -2, by = -.1)
# Splitting the data into test and train
set.seed(86)
train <- sample(1:nrow(x_vars), nrow(x_vars) / 2)
test <- (-train)
y_test <- y_var[test]
# Cross-validating to find the optimal lambda
cv_output <- cv.glmnet(x_vars[train, ], y_var[train],
                       alpha = 1, lambda = lambda_seq)
# Identifying the best lambda
best_lam <- cv_output$lambda.min
best_lam
```

```
# Output
[1] 1.995262
```
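Beyond `lambda.min`, the `cv.glmnet()` result also carries the full cross-validation curve; a short, optional sketch of inspecting it (run after the code above):

```
# Visual check of cross-validated error across the lambda sequence
plot(cv_output)       # CV error vs log(lambda)
cv_output$lambda.1se  # largest lambda within one standard error of the minimum
```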

*Using this value, let us train the lasso model again*.

```
# Rebuilding the model with the best lambda value identified
lasso_best <- glmnet(x_vars[train, ], y_var[train], alpha = 1, lambda = best_lam)
pred <- predict(lasso_best, s = best_lam, newx = x_vars[test, ])
```

Finally, we combine the predicted values and actual values to see them side by side, and then you can use the R-squared formula to check the model performance. Note: you must calculate the R-squared values for both the train and test datasets.

```
# Combining actual and predicted values
final <- cbind(y_var[test], pred)
colnames(final) <- c("Actual", "Pred")
# Checking the first six obs
head(final)
```

```
# Output
Actual Pred
Courtelary 80.2 69.92666
Delemont 83.1 76.15793
Franches-Mnt 92.5 75.16697
Moutier 85.8 70.33981
Glane 92.4 76.61480
Veveyse 87.1 76.34404
```

#### Sharing the R Squared formula

*The function provided below is just indicative and you must provide the actual and predicted values based upon your dataset*.

```
# Indicative only: supply the actual and predicted values from your own dataset
actual <- test$actual
preds <- test$predicted
rss <- sum((preds - actual) ^ 2)        # residual sum of squares
tss <- sum((actual - mean(actual)) ^ 2) # total sum of squares
rsq <- 1 - rss/tss
rsq
```
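Applied to this tutorial's test set (assuming `pred`, `y_var`, and `test` from the code above are in the workspace), the same calculation looks like this:

```
# R-squared on the swiss test set
actual <- y_var[test]
preds <- as.vector(pred)
rss <- sum((preds - actual) ^ 2)
tss <- sum((actual - mean(actual)) ^ 2)
1 - rss / tss
```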

## Getting the list of important variables

To get the list of important variables, we just need to investigate the beta coefficients of the final, best model.

```
# Inspecting beta coefficients
coef(lasso_best)
```

```
# Output
6 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) 55.16706057
Agriculture .
Examination -0.30124968
Education .
Catholic 0.04700893
Infant.Mortality 0.84730322
```

*The model indicates that the coefficients of Agriculture and Education have been shrunk to zero. Thus we are left with three variables: Examination, Catholic, and Infant.Mortality.*
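If you want that list programmatically rather than by reading the printout, one possible sketch (using the `lasso_best` model fitted above):

```
# Extracting the names of variables with nonzero coefficients
betas <- coef(lasso_best)                     # sparse coefficient matrix
selected <- rownames(betas)[betas[, 1] != 0]  # keep nonzero rows
setdiff(selected, "(Intercept)")              # drop the intercept from the list
```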