Title: Estimation and Inference in High Dimensional Networks, with Applications to Biological Systems
Chair: Professor George Michailidis
Committee Members: Professor Susan Murphy, Associate Professor Kerby Shedden, Professor Naisyin Wang, Professor Mike Boehnke (Biostatistics), Professor Jeremy Taylor (Biostatistics)
Abstract: This work discusses several aspects of estimation and inference for high dimensional networks, and is divided into three main parts.

First, to assess the significance of arbitrary subnetworks (e.g. biological pathways), we propose a latent variable model that directly incorporates the network information. By formulating the problem as a (generalized) linear mixed model, we introduce a general inference procedure for testing the significance of subnetworks that can detect changes both in the expression levels of the corresponding nodes (e.g. genes) and in the structure of the network. The framework is then extended to the analysis of data obtained from complex experimental designs. We also study the effect of noise in the network information, both theoretically and empirically, and show that the proposed inference procedure is robust to the presence of random noise.

In the second part, we consider the problem of estimating directed graphs from observed data. The general estimation problem is computationally NP-hard, and the direction of interactions may not be identifiable from observational data alone. We consider a special case of this problem, in which the nodes (i.e. variables) admit a natural ordering, and propose efficient penalized likelihood methods for estimating the underlying network structure. Consistency of the estimators in the high dimensional setting (more variables than observations) is established. We also propose an extension of the lasso penalty that results in improved estimation of graphical Granger causality from time-course observations.

The last part of the dissertation is devoted to issues of dimension reduction and efficient computation in networks. We propose a dimension reduction algorithm for networks using Laplacian eigenmaps, discuss the connections of this method to principal component analysis, and formulate the inference problem using a group lasso penalty.
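To illustrate the second part, the following is a minimal sketch (not the dissertation's actual estimator) of penalized likelihood estimation of a directed graph when the variables admit a known ordering: each variable is regressed on its predecessors with an l1 penalty, and the nonzero coefficients define the estimated directed edges. The lasso solver, toy data, and tuning parameter below are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent with soft-thresholding,
    minimizing 0.5*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]  # partial residual
            z = X[:, j] @ r_j
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_norm2[j]
    return beta

def estimate_ordered_dag(X, lam):
    """Columns of X are assumed to be in topological order: regress each
    variable on its predecessors; nonzero coefficients give the estimated
    directed edges j -> k (with j < k), so the adjacency is upper triangular."""
    n, p = X.shape
    A = np.zeros((p, p))  # A[j, k] = weight of edge j -> k
    for k in range(1, p):
        A[:k, k] = lasso_cd(X[:, :k], X[:, k], lam)
    return A

# Toy example: a chain 0 -> 1 -> 2
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
x2 = -1.5 * x1 + rng.normal(size=500)
X = np.column_stack([x0, x1, x2])
A = estimate_ordered_dag(X, lam=20.0)
# A[0, 1] is close to 2 and A[1, 2] close to -1.5; entries on or
# below the diagonal are exactly zero by construction.
```

Knowing the ordering removes both obstacles noted in the abstract: the search over orderings (the source of NP-hardness) disappears, and edge directions are fixed a priori, so estimation reduces to p separate penalized regressions.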
We also address computational aspects of estimation in networks by proposing a distributed algorithm based on block-relaxation, and derive conditions required for convergence of the algorithm to the maximum likelihood estimate. Finally, we propose an extension of the block-relaxation algorithm, called approximate block-relaxation, that facilitates the use of iterative algorithms in optimization problems with complex objective functions.
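The block-relaxation idea can be sketched on a simple surrogate problem: cyclically minimize the objective over one block of parameters with the other blocks held fixed, and under suitable conditions the iterates converge to the joint optimum. The ridge-regression objective, block partition, and data below are illustrative assumptions, not the dissertation's network likelihood.

```python
import numpy as np

def block_relaxation_ridge(X, y, lam, blocks, n_cycles=100):
    """Block relaxation for ridge regression: cyclically minimize
    0.5*||y - X b||^2 + 0.5*lam*||b||^2 over one coefficient block
    at a time, holding the other blocks fixed. Each block update is
    a small closed-form ridge solve."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_cycles):
        for b in blocks:
            others = np.setdiff1d(np.arange(p), b)
            r = y - X[:, others] @ beta[others]  # residual w.r.t. fixed blocks
            Xb = X[:, b]
            beta[b] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(len(b)), Xb.T @ r)
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ np.array([1.0, -2.0, 0.0, 3.0, 0.0, 1.0]) + rng.normal(size=200)
lam = 1.0
blocks = [np.arange(0, 3), np.arange(3, 6)]

beta_br = block_relaxation_ridge(X, y, lam, blocks)
beta_full = np.linalg.solve(X.T @ X + lam * np.eye(6), X.T @ y)  # joint solution
# beta_br matches beta_full: for this strictly convex objective the
# block iterates converge to the joint minimizer.
```

Because each update touches only one block, the per-block solves can be carried out on separate machines that exchange residuals, which is what makes a distributed implementation natural; the convergence conditions derived in the dissertation play the role that strict convexity plays in this toy example.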