Suppose \(y_i\) is a count; then a very common model is the Poisson distribution: \[ P(Y=y \;|\; \lambda) = \frac{e^{-\lambda} \, \lambda^y}{y!}, \; y = 0,1,2,\ldots \]
Given \(Y_1,\ldots,Y_n \sim Poisson(\lambda)\) iid, with observed values \(Y_i = y_i\), what is the MLE of \(\lambda\)?
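As a check on your answer, here is a sketch of the standard derivation via the log-likelihood of an iid Poisson sample: \[ \ell(\lambda) = \sum_{i=1}^n \left( -\lambda + y_i \log\lambda - \log y_i! \right), \qquad \ell'(\lambda) = -n + \frac{\sum_{i=1}^n y_i}{\lambda} = 0 \;\Rightarrow\; \hat{\lambda} = \bar{y}. \] So the MLE is just the sample mean of the counts.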
Let \[ f(x) = (x_1 - a_1)^2 + (x_2 - a_2)^2, \;\; g(x_1,x_2) = x_1^2 + x_2^2 - 1. \]
Minimize \(f(x)\) subject to the constraint that \(g(x) \leq 0\).
First draw simple pictures to make the solution obvious.
Then check that the Lagrange multiplier first order condition conforms with your intuition.
How does the norm of \((a_1,a_2)\) affect the solution?
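As a reminder (a sketch to check your picture against, not the full solution), the first order (KKT) conditions for this inequality-constrained problem are \[ \nabla f(x) + \mu \, \nabla g(x) = 0, \qquad \mu \ge 0, \qquad \mu \, g(x) = 0. \] With our \(f\) and \(g\) the stationarity condition is \(2(x_j - a_j) + 2\mu x_j = 0\), i.e. \(x_j = a_j/(1+\mu)\), so the solution is \(a\) scaled back toward the origin; whether \(\mu = 0\) (constraint slack) or \(\mu > 0\) (constraint binding, \(\|x\| = 1\)) should match what your picture says about where \((a_1,a_2)\) sits relative to the unit disk.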
A basic idea in nonlinear regression is to use polynomial terms.
With one \(x\) variable, this means we consider the models: \[ Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \ldots + \beta_p x_i^p + \epsilon_i \]
Using the simple used cars data (with \(n = 1{,}000\)) with \(Y =\) price and \(x =\) mileage, find the best choice of \(p\).
Fit your chosen polynomial model using all the data and plot the fit on top of the data. Do you like it? Also plot the fit for a \(p\) that is “way too big”. What is wrong with it?
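One way to choose \(p\) is out-of-sample error on a random train/test split. Below is a hedged sketch, not the official solution: it assumes the simple cars data is in a data frame `scd` with columns `price` and `mileage` (adjust the name and the read-in to your copy of the data).

``` r
## hedged sketch: pick polynomial degree p by out-of-sample RMSE
## assumes scd is the simple cars data frame (n = 1,000)
## with columns price and mileage
set.seed(99)
n = nrow(scd)
ii = sample(1:n, floor(0.75 * n))            # random train/test split
train = scd[ii, ]; test = scd[-ii, ]
pvec = 1:10
rmse = rep(0, length(pvec))
for (p in pvec) {
  fit = lm(price ~ poly(mileage, p), data = train)
  yhat = predict(fit, newdata = test)
  rmse[p] = sqrt(mean((test$price - yhat)^2))  # out-of-sample RMSE
}
pbest = pvec[which.min(rmse)]
plot(pvec, rmse, type = "b", xlab = "degree p", ylab = "out-of-sample RMSE")
```

`poly(mileage, p)` uses orthogonal polynomials, which is numerically safer than raw powers of mileage when \(p\) gets large.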
Let’s try ridge and LASSO on the car price data.
cd = read.csv("http://www.rob-mcculloch.org/data/usedcars.csv")
print(dim(cd))
## [1] 20063 11
Note that this version of the cars data has 20 thousand observations and 11 variables.
In addition, many of the x variables are categorical, so you will have to dummy them up.
sapply(cd,is.numeric)
## price trim isOneOwner mileage year color
## TRUE FALSE FALSE TRUE TRUE FALSE
## displacement fuel region soundSystem wheelType
## TRUE FALSE FALSE FALSE FALSE
displacement is actually categorical.
table(cd$displacement)
##
## 3 3.2 3.5 3.7 4.2 4.3 4.6 5 5.4 5.5 5.8 6 6.3
## 204 274 227 141 239 2787 2794 2661 356 9561 112 213 494
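A simple way to dummy up the categorical features is `model.matrix`, which builds the dummy columns automatically from factors. A sketch (the resulting numeric matrix is what glmnet expects):

``` r
## hedged sketch: treat displacement as categorical, then build the
## numeric design matrix (one dummy per non-baseline factor level)
cd$displacement = as.factor(cd$displacement)
x = model.matrix(price ~ ., data = cd)[, -1]  # drop the intercept column
dim(x)
```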
Use the LASSO to relate log of price to the features.
Use ridge regression to relate log of price to the features.
Note that in R, you use glmnet for LASSO and ridge.
Here is the glmnet help on the parameter alpha.
alpha :
The elasticnet mixing parameter, with 0 <= alpha <= 1 . The penalty is defined as
(1-alpha)/2||beta||_2^2+alpha||beta||_1.
alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.
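Putting it together, here is a minimal sketch with `cv.glmnet`, which picks the penalty weight lambda by cross-validation (assumes `cd` as loaded above, with the categorical features dummied up via `model.matrix`):

``` r
## hedged sketch: lasso and ridge on log price with glmnet
library(glmnet)
x = model.matrix(price ~ ., data = cd)[, -1]  # numeric feature matrix
y = log(cd$price)
set.seed(14)
cv.lasso = cv.glmnet(x, y, alpha = 1)  # alpha = 1: lasso penalty
cv.ridge = cv.glmnet(x, y, alpha = 0)  # alpha = 0: ridge penalty
plot(cv.lasso); plot(cv.ridge)         # cv error vs log(lambda)
coef(cv.lasso, s = "lambda.min")       # coefficients at the best lambda
```

Comparing the `lambda.min` coefficients from the two fits shows the key difference: the lasso sets some coefficients exactly to zero, while ridge only shrinks them.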