STA 141A Final Project
In this final project, you will learn and apply a key machine learning algorithm: the ridge regression model, which generalizes the ordinary linear regression model by introducing a regularization term.
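For orientation, the ridge estimate can be written as a penalized least-squares problem. The notation below (n observations, p covariates, intercept β₀) follows the usual textbook convention and is an assumption, not something fixed by this project:

```latex
$$
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_j^{2},
  \qquad \lambda \ge 0 .
$$
```

The first sum is the ordinary least-squares criterion; the second is the regularization term, whose weight is set by λ.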
Reading
● The conceptual part is in Section 6.2.1, Ridge Regression, of the book An Introduction to Statistical Learning (ISL).
● The coding session is in Section 6.5.2, Ridge Regression and the Lasso, of the same book.
Instructions
● Clean the given data set.
● Plot the standardized ridge regression coefficients against the hyperparameter λ (refer to Figure 6.4 (left) in the ISL book).
■ Note that "standardized" means that you need to standardize the covariates before fitting the model.
● Answer the following discussion questions.
Grading (20 pts total)
● Data cleaning: 5 points (≥ 4 issues)
● Modeling: 5 points (Ridge Regression and Linear Regression)
● Plotting: 5 points (Visualizations must be correct, clearly labeled, and aesthetically clean)
● Discussion: 5 points
● Readability (deduction)
■ Code should be well-commented and clear.
■ Up to 2 points deduction for poor readability (e.g., unexplained code, no comments, hard to follow).
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import any packages you want to use below
Data Cleaning
Clean the given dataset first.
● Indicate the potential problems (hint: >= 4 issues).
● Apply a reasonable method to address each problem.
In [ ]: # add more cells when needed
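As a starting point, the sketch below shows a few common diagnostic checks on a toy table. The column names, the values, and the problems planted in them are hypothetical; the issues in the actual dataset may be different, and dropping rows is only one of several reasonable remedies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and problems are invented for illustration.
raw = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 4.0],
    "x2": ["3.5", "4.1", "5.0", "bad", "bad"],
    "y":  [10.0, 12.0, 11.0, 15.0, 15.0],
})

# Diagnose: count missing values, count duplicated rows, and inspect dtypes.
n_missing = raw.isna().sum()
n_duplicates = int(raw.duplicated().sum())
print(n_missing, n_duplicates, raw.dtypes, sep="\n")

# Address: drop exact duplicates, coerce text to numbers, then drop incomplete rows.
clean = raw.drop_duplicates().copy()
clean["x2"] = pd.to_numeric(clean["x2"], errors="coerce")  # unparseable text becomes NaN
clean = clean.dropna().reset_index(drop=True)
```

Whichever remedy is chosen (dropping, imputing, recoding), a short comment explaining the choice helps with the readability criterion.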
Plotting
Make the plots below
In [ ]: # add more cells when needed
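A minimal sketch of a coefficient-path plot follows. It uses synthetic stand-in data and the closed-form ridge solution in NumPy rather than any particular library routine; the helper `ridge_coefficients`, the λ grid, and the variable names are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption); replace with the cleaned covariates and response.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p)) * np.array([1.0, 5.0, 0.5, 10.0, 2.0])  # mixed scales
y = X @ np.array([2.0, -0.3, 4.0, 0.1, 0.0]) + rng.normal(size=n)

# Standardize covariates to mean 0 and standard deviation 1; center the response
# so that no intercept term is needed.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

def ridge_coefficients(Xs, yc, lam):
    """Closed-form ridge solution (Xs'Xs + lam * I)^{-1} Xs'yc (hypothetical helper)."""
    k = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ yc)

# One row of coefficients per value of lambda, on a log-spaced grid.
lambdas = np.logspace(-2, 5, 100)
path = np.array([ridge_coefficients(Xs, yc, lam) for lam in lambdas])

fig, ax = plt.subplots(figsize=(6, 4))
for j in range(p):
    ax.plot(lambdas, path[:, j], label=f"x{j + 1}")
ax.set_xscale("log")
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel("Standardized coefficients")
ax.set_title("Ridge coefficient path")
ax.legend()
```

Software packages scale the penalty differently, so the λ axis here may not line up numerically with the figure in the book; the qualitative shape (coefficients shrinking toward zero as λ grows) is what to compare.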
Discussion
1. What is the connection between the linear regression model and the ridge regression model? (Hint: think about the additional term in ridge regression.)
2. How should the parameter λ be understood? (Hint: think about how the model changes when the value of λ changes.)
3. Why are we interested in the standardized coefficients? (Hint: think about what happens when the covariates are not standardized.)
4. Interpret your coefficient for x6 when λ = 0. Is it the same as the linear regression coefficient? (You need to fit a linear regression model on the same data and compare the two.) Explain why or why not.
In [ ]:
# run your linear regression model here
# add more cells when needed
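One way to set up the comparison is sketched below on synthetic stand-in data, assuming a closed-form ridge fit on standardized covariates and an ordinary least-squares fit on the raw covariates. All names and values are illustrative, and the interpretation of the output is left to the discussion.

```python
import numpy as np

# Synthetic stand-in data (assumption); the real project uses the cleaned dataset.
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 0.1])
y = 1.5 + X @ np.array([2.0, -0.5, 30.0]) + rng.normal(size=n)

# Standardize the covariates and center the response.
sd = X.std(axis=0)
Xs = (X - X.mean(axis=0)) / sd
yc = y - y.mean()

# Ridge at lambda = 0 on the standardized covariates (closed form).
lam = 0.0
beta_ridge = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Ordinary least squares on the raw covariates, with an intercept column;
# [1:] drops the intercept so only the slopes are compared.
design = np.column_stack([np.ones(n), X])
beta_ols_raw = np.linalg.lstsq(design, y, rcond=None)[0][1:]

print("ridge, lambda = 0 (standardized):", beta_ridge)
print("OLS (raw scale):                 ", beta_ols_raw)
print("OLS rescaled by column sd:       ", beta_ols_raw * sd)
```

Printing the raw and rescaled OLS slopes side by side makes it easier to see whether any difference from the ridge fit comes from the penalty or only from the scale of the covariates.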