Title: Sparse Estimation of High-Dimensional Covariance Matrices
Co-Chairs: Associate Professor Elizaveta Levina, Associate Professor Ji Zhu
Committee Members: Associate Professor Kerby Shedden, Associate Professor Bin Nan (Biostatistics)
Abstract: This thesis develops methodology and asymptotic analysis for sparse estimators of the covariance matrix and the inverse covariance (concentration) matrix in high-dimensional settings. We propose estimators that are invariant to the ordering of the variables and estimators that exploit variable ordering. For the order-invariant estimators, estimation is based on both a lasso-type penalized normal likelihood and a newly proposed class of generalized thresholding operators, which combine thresholding with shrinkage and are applied to the entries of the sample covariance matrix. For both approaches we obtain explicit convergence rates in matrix norms that show the trade-off among the sparsity of the true model, the dimension, and the sample size. In addition, we show that the generalized thresholding approach estimates true zeros as zeros with probability tending to 1 and is sign consistent for non-zero elements. We also derive a fast iterative algorithm for computing the penalized likelihood estimator. To exploit a natural ordering of the variables when estimating the covariance matrix, we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well-known regression interpretation of the Cholesky factor of the inverse covariance; this leads to a new class of regularized covariance estimators suitable for high-dimensional problems. We also establish theoretical connections between banding the Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint. These covariance estimators are compared with other estimators on simulated data and on real data examples from gene microarray experiments and remote sensing. Lastly, we propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables.
An efficient optimization algorithm and a fast approximation are developed, and we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns.
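
Illustration: the idea of combining thresholding with shrinkage on the entries of the sample covariance matrix can be sketched with soft thresholding, a standard rule of this kind. This is only a minimal sketch, not the thesis's implementation. The function name `soft_threshold_covariance`, the threshold `lam`, and the choice to leave the diagonal unpenalized are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance.

    X   : (n, p) data matrix, rows are observations (illustrative input).
    lam : non-negative threshold (hypothetical tuning parameter).
    """
    S = np.cov(X, rowvar=False)  # p x p sample covariance matrix
    # Entries with |s_ij| <= lam are set to zero; the rest are shrunk
    # toward zero by lam while keeping their sign.
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    # Assumption: variances on the diagonal are kept as estimated.
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
Sigma_hat = soft_threshold_covariance(X, lam=0.1)
```

In practice `lam` would be chosen from the data, for example by cross-validation. The resulting matrix is symmetric but not guaranteed to be positive definite.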