Problem 1

Here is the R code chunk we used to create a \(\Sigma\) matrix with ones on the diagonal and 0.8 everywhere else.

p = 5
rho = .8
Sigma = matrix(rep(rho,p^2),ncol=p)
diag(Sigma) = 1.0

Vary \(p\) over some range.
We want to understand the relationship between \(p\) and the time it takes to invert \(\Sigma\), compared with the time taken by a method that exploits its structure (e.g., the Cholesky decomposition).

When I say “invert” I mean without taking advantage of the fact that \(\Sigma\) is symmetric and positive definite.

Plot the two times versus \(p\).

Regress the times on \((1,p,p^2,p^3,p^4)\).

You will have to figure out how to time things and pick a range for \(p\) that makes the plot look nice.
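A minimal timing sketch, assuming `system.time` for the timing and `chol2inv(chol(.))` as the structure-exploiting inverse (the helper name `time_invert` and the choice of methods are mine, not prescribed by the problem):

```r
# Time a generic inverse (solve) against a Cholesky-based inverse
# (chol2inv), which exploits symmetric positive-definiteness.
time_invert = function(p, rho = .8) {
   Sigma = matrix(rep(rho, p^2), ncol = p)
   diag(Sigma) = 1.0
   t_solve = unname(system.time(solve(Sigma))["elapsed"])
   t_chol  = unname(system.time(chol2inv(chol(Sigma)))["elapsed"])
   c(solve = t_solve, chol = t_chol)
}
time_invert(500)
```

Calling `time_invert` for a grid of \(p\) values gives the two time series to plot and regress on \((1,p,p^2,p^3,p^4)\).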

Problem 2

Again use

p = 5
rho = .8
Sigma = matrix(rep(rho,p^2),ncol=p)
diag(Sigma) = 1.0

But now just fix \(p=5\).

Let \[ Y \sim N(0,\Sigma). \]

(a)

What is the marginal distribution of \((Y_1,Y_2)\)?

(b)

What is the conditional distribution of \((Y_1,Y_2) \,|\, Y_3=.23,\, Y_4=-.65,\, Y_5=-.3\)?

(c)

What is \(L\) in \(\Sigma = LL'\), \(L\) lower triangular?

(d)

What is \(L^{-1}\)?

(e)

What is \(A\) in \(A = PD^{1/2}\), where \(\Sigma = PDP'\) is the spectral (eigen) decomposition of \(\Sigma\)?

(f)

Simulate 10,000 observations, iid, \(N(0,\Sigma)\). Compute the mle of \(\mu\) and \(\Sigma\). Are they close to the true values?

(g)

Simulate 50 observations, iid, \(N(0,\Sigma)\). Compute the mle of \(\mu\) and \(\Sigma\). Are they close to the true values?

(h)

What is the mle of \(L\) where \(\Sigma = LL'\), \(L\) lower triangular?

Problem 3

This code is supposed to evaluate the log likelihood for the iid multivariate normal model.

logL = function(Y,mu,Sigma) {
   n = nrow(Y); p=ncol(Y)
   dS = det(Sigma)
   retval = -.5*n*log(dS)
   Si = solve(Sigma)
   for(i in 1:n) {
      yi = matrix(Y[i,]-mu,ncol=1)
      retval = retval - .5 * t(yi) %*% Si %*% yi
   }
   return(retval)
}

Is the code correct?

This code will be slow for large \(n\) because of the loop over observations.

“Vectorize” the code by writing everything using matrix operations.
Check that your code is faster than the simple code above.
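One common way to vectorize the quadratic-form sum is to center the data matrix once and replace the loop with a single matrix product plus an elementwise sum (a sketch; the name `logL_vec` is mine, and it computes the same quantity as the loop version of `logL` above):

```r
# Vectorized log likelihood sketch: the loop over observations is
# replaced by one matrix product plus an elementwise sum, using
# sum_i y_i' Si y_i = sum((Yc %*% Si) * Yc) for centered rows Yc.
logL_vec = function(Y, mu, Sigma) {
   n = nrow(Y)
   Yc = sweep(Y, 2, mu)              # subtract mu from each row
   Si = solve(Sigma)
   quad = sum((Yc %*% Si) * Yc)      # sum_i (y_i - mu)' Si (y_i - mu)
   -.5 * n * log(det(Sigma)) - .5 * quad
}
```

Comparing `logL_vec` against the loop version on simulated data (same inputs, same output, shorter runtime for large \(n\)) is one way to do the speed check asked for above.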

Again simulate data and then plot the log likelihood versus each of the parameters in \(\mu\) and \(\Sigma\), one at a time, keeping all the other parameters fixed at their mle values.

You want to see that the max is indeed obtained at the mle.

Note that when you vary an element of \(\hat{\Sigma}\), keeping the others fixed, you need to limit the range so that \(\Sigma\) stays positive definite.
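One way to check that a candidate \(\Sigma\) is still positive definite is to look at its smallest eigenvalue (the helper name `is_pd` and the tolerance are my own choices; any equivalent test works):

```r
# A symmetric matrix is positive definite iff all its eigenvalues
# are strictly positive.
is_pd = function(S, tol = 1e-8) {
   min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) > tol
}
```

When varying one element of \(\hat{\Sigma}\) over a grid, you can drop the grid points where `is_pd` returns `FALSE`.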

Problem 4

n = 100
x = seq(from=-2,to=2,length.out=n)
fx = x^3
plot(x,fx)
ii = 76:85
points(x[ii],fx[ii],col="red",pch=16)

Suppose you did not know the function \(f\). You observe \(f(x_i)\) for \(i\) in \(1:75 \; \cup \; 86:100 = Io\).

What do you know about \(f(x_i)\) for \(i\) in \(76:85 = Iu\)?

We let \[ F = (f(x_1), f(x_2), \ldots, f(x_{100})) \sim N(\mu,\Sigma). \]

We then let \(X = F[Io]\) and \(Y = F[Iu]\) and compute \(Y \;|\; X\).

The key question is: what are \(\mu\) and \(\Sigma\)?

In this example, let's set \(\mu=0\).

\[ \Sigma_{ij} = \sigma^2_f \, \exp\!\left(-\frac{(x_i - x_j)^2}{2 l^2}\right) \] See equation 15.10 of “Machine Learning” by Kevin Murphy.
This is the “squared exponential kernel”.
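A sketch of building \(\Sigma\) from the squared exponential kernel (the function name `se_kernel` and the default values \(\sigma_f = 1\), \(l = .5\) are my own illustrative choices, not values the problem prescribes):

```r
# Squared exponential kernel:
#   Sigma[i,j] = sf^2 * exp(-(x_i - x_j)^2 / (2 * l^2))
se_kernel = function(x, sf = 1, l = .5) {
   D2 = outer(x, x, function(a, b) (a - b)^2)  # squared distances
   sf^2 * exp(-D2 / (2 * l^2))
}
x = seq(from = -2, to = 2, length.out = 100)
Sigma = se_kernel(x)
```

With \(\Sigma\) in hand, the conditional \(Y \,|\, X\) follows from the usual multivariate normal conditioning formulas applied to the blocks indexed by \(Iu\) and \(Io\).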

Choose reasonable values for \((\sigma_f, l)\) and display \(Y \,|\, X\).

How does \(Y \,|\, X\) compare to \(Y\)?