Large sample covariance matrices and high-dimensional data analysis pdf

Large sample covariance matrices and high-dimensional data analysis: high-dimensional data appear in many fields, and their analysis. Regularized estimation of large covariance matrices. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a. An overview on the estimation of large covariance and. We may say that the main success of multiparametric statistics is based on methods of spectral theory of large sample covariance matrices and their limit spectra. It teaches basic theoretical skills for the analysis. Minimax rates of convergence for estimating several classes of structured covariance and precision matrices, including bandable, Toeplitz, and sparse covariance matrices as well as sparse precision matrices, are given under the spectral norm loss. The latter renders formulations of test procedures and power analysis, as. The sample covariance matrix is regarded as a poor estimator, since it is not consistent w. Covariance matrix: an overview (ScienceDirect Topics). Spectral analysis of high-dimensional sample covariance matrices.

This book deals with the analysis of covariance matrices under two different assumptions. On spectral properties of large dimensional correlation. High-dimensional covariance estimation, Mohsen Pourahmadi. Global testing and large-scale multiple testing for high. Spectrum estimation for large dimensional covariance.

High-dimensional covariance estimation provides accessible and comprehensive. However, it has long been observed that several well-known methods in multivariate analysis. Concerning statistical inference for high-dimensional data, furthermore, there are many available research methods based on sample covariance matrices such as Johnstone (2001). Robust estimation of high-dimensional covariance and. Techniques and asymptotic theory for high-dimensional covariance matrix estimates are quite different from the low-dimensional ones. Sample covariance matrices are widely used in multivariate statistical analysis. Estimating covariance matrices has always been an important part of multivariate analysis, and estimating large covariance matrices where the dimension of the data p is comparable to or larger than the sample size n has gained particular attention recently, since high-dimensional data are so common in. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Generalized thresholding of large covariance matrices.

Large sample approximations for variance-covariance matrices of. Compressed covariance estimation with automated dimension. One test is on the whole variance-covariance matrices, and the other is on off-diagonal submatrices which define the covariance between two nonoverlapping segments of the high-dimensional random vectors. The abundance of high-dimensional data is one reason for the interest in the problem. High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. High-dimensional covariance estimation provides accessible and comprehensive coverage of. Estimating structured high-dimensional covariance and. Sample covariance matrices and high-dimensional data analysis, revised draft, April 2019. This is a revision of the book published by Cambridge University Press in 2015, ISBN. Cambridge Series in Statistical and Probabilistic Mathematics 39. High-dimensional variance-covariance matrices play a crucial role in those areas, since they provide information on the dependence of the coordinates (2nd order). Two-sample tests for high-dimensional covariance matrices, Jun Li and Song Xi Chen, Department of Statistics, Iowa State University. Massive data analyses and statistical learning in many real applications require a careful understanding of the high-dimensional covariance structure.

It is well known that the sample covariance based on the observed data is singular when the dimension is larger than the sample size. Large sample covariance matrices and high-dimensional data analysis, Bai, Zhidong; Yao, Jianfeng; Zheng, Shurong. While the former approach is the classical framework for deriving asymptotics, the latter has received increasing attention due to its applications in the emerging field of big data. Sparse estimation of high-dimensional covariance matrices. Estimation of large covariance matrices, particularly in situations where the data dimension p is comparable to or larger than the sample size n, has attracted a lot of attention recently.
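The singularity claim above can be checked numerically. The following is a minimal sketch, assuming i.i.d. standard normal data and hypothetical sizes n = 20 and p = 50; the helper name `sample_covariance` is illustrative and not taken from any of the cited works. After centering, the sample covariance has rank at most n - 1, so it cannot be inverted when p is larger than that.

```python
import numpy as np

def sample_covariance(X):
    """Unbiased sample covariance of an (n, p) data matrix; rows are observations."""
    Xc = X - X.mean(axis=0)              # center each column (variable)
    return Xc.T @ Xc / (X.shape[0] - 1)  # (p, p) matrix

rng = np.random.default_rng(0)
n, p = 20, 50                            # assumed sizes with p > n
X = rng.standard_normal((n, p))

S = sample_covariance(X)
rank_S = np.linalg.matrix_rank(S)

# Centering removes one degree of freedom, so rank(S) <= n - 1 < p.
print(f"p = {p}, rank of S = {rank_S}")
```

Because the rank falls short of p, any method that needs the inverse of the sample covariance requires regularization or a structural assumption in this regime.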

We also show that generalized thresholding is, in the terminology of Lam and Fan (2007), sparsistent, meaning that in addition to being consistent it estimates true zeros as. Those based on the other norm, which is especially the case when considering the limiting distribution of the test statistics. The problems arise from statistical analysis of large panel economics. These include estimation of and testing for large covariance matrices, volatility matrices, correlation matrices, precision matrices, Gaussian graphical models. Two-sample tests for high-dimensional covariance matrices. However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension p. High-dimensional probability is an area of probability theory that studies random objects in R^n where the dimension n can be very large. The central limit theorems (CLTs) for linear spectral statistics of high-dimensional noncentralized sample covariance matrices have received considerable attention in random matrix theory and have been applied to many high-dimensional statistical problems. As high-dimensional data becomes ubiquitous, standard estimators of the population covariance matrix become difficult to use. Sample covariance matrices and high-dimensional data analysis.
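As an illustration of the thresholding idea mentioned above, here is a minimal sketch of soft thresholding, which is one member of the generalized thresholding family. The function name `soft_threshold_covariance`, the choice to leave the diagonal untouched, and the threshold value are assumptions made for this example, not details taken from the source.

```python
import numpy as np

def soft_threshold_covariance(S, lam):
    """Soft-threshold the off-diagonal entries of a covariance matrix S.

    Each off-diagonal entry is shrunk toward zero by lam and set to exactly
    zero when its magnitude is at most lam. The diagonal (the variances) is
    kept as is, which is an assumption of this sketch.
    """
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))      # restore the original variances
    return T

# Small hand-made example matrix (illustrative values only).
S = np.array([[1.00, 0.05, 0.40],
              [0.05, 1.00, -0.30],
              [0.40, -0.30, 1.00]])
T = soft_threshold_covariance(S, lam=0.1)
print(T)
```

Entries with magnitude below the threshold become exact zeros, which is how a thresholded estimator can recover a sparse pattern.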

Analysis of high-dimensional data, whose dimension p can be much larger than the sample. How to analyze high-dimensional highly correlated vector time series. Global testing and testing for high-dimensional covariance. In the central limit theorem of linear spectral statistics for sample covariance matrices, the theoretical mean and covariance are computed numerically. Shrinkage estimators for high-dimensional covariance matrices, Brian Williamson. Abstract. A less developed theory nonparametric estimation of sparse means y i. Exact separation of eigenvalues of large dimensional sample covariance matrices, Bai, Z. Tests for high-dimensional covariance matrices, statistics and.

An overview on the estimation of large covariance and precision matrices, Jianqing Fan. Many problems in statistical pattern recognition and analysis require the classi. Department of Statistics, University of California, Berkeley. Abstract: we place ourselves in the setting of high-dimensional statistical inference, where the. In the high-dimensional settings, these methods either do not perform well or are no longer applicable. The tests are applicable (i) when the data dimension is much larger than the sample.

For inference on high-dimensional covariance matrices, there has been an array of works on the convergence of the sample covariance matrices based on the spectral analysis of large-dimensional random matrices (Bai and Yin 1993). We study high-dimensional sample covariance matrices based on. Large sample covariance matrices and high-dimensional data analysis, Jianfeng Yao, Shurong Zheng and Zhidong Bai, excerpt. However, covariance estimation for high-dimensional vectors is a classically dif.

Estimating high-dimensional covariance matrices and its applications. High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Corrections to LRT on large-dimensional covariance matrix by RMT, Bai, Zhidong; Jiang, Dandan; Yao, Jianfeng; Zheng, Shurong, The Annals of Statistics, 2009. Testing high-dimensional covariance matrices: settings here include all the cases where the dimension ppn. This book places particular emphasis on random vectors, random matrices, and random projections. Based on the random matrix theory, a unified numerical approach is developed for power calculation in the general framework of hypothesis testing with high-dimensional covariance matrices. University of California, Berkeley. Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics. Bickel and Elizaveta Levina, University of California, Berkeley and University of Michigan. This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix. Largest entries of sample correlation matrices from equicorrelated normal populations. Power computation for hypothesis testing with high.
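Banding the sample covariance matrix, as mentioned above, can be sketched in a few lines: entries more than k positions away from the main diagonal are set to zero. This is a minimal illustration that assumes the variables have a natural ordering; the function name `band_covariance` and the bandwidth k = 1 are hypothetical choices for the example.

```python
import numpy as np

def band_covariance(S, k):
    """Keep entries S[i, j] with |i - j| <= k and zero out the rest."""
    p = S.shape[0]
    i, j = np.indices((p, p))            # row and column index grids
    return np.where(np.abs(i - j) <= k, S, 0.0)

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 6))         # assumed toy data: n = 30, p = 6
S = np.cov(X, rowvar=False)              # (6, 6) sample covariance
B = band_covariance(S, k=1)              # tridiagonal banded estimate
print(np.round(B, 2))
```

Tapering follows the same pattern but multiplies entries by weights that decay smoothly with |i - j| instead of applying a hard cutoff.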

The limitations of the sample covariance matrix are discussed. Two-sample tests for high-dimensional covariance matrices. Abstract. Title of dissertation: Tests for covariance matrices with high-dimensional data. Author: Ms. Saowapha Chaipitak. Degree: Doctor of Philosophy (Statistics). Year: 2012. In multivariate statistical analysis, it is necessary to know the facts regarding the covariance matrix of the data in hand before applying any further analysis. Sparse estimation of large covariance matrices via a. A large covariance matrix typically plays a role through either its quadratic and spectral functionals or a structure of low-rank plus sparse components.

Gaussian fluctuations for linear spectral statistics of large random covariance matrices, Najim, Jamal and Yao, Jianfeng, The Annals of Applied Probability, 2016. More recently, his research group has developed new statistical methods for high-dimensional data analysis. We propose two tests for the equality of covariance matrices between two high-dimensional populations. Optimal hypothesis testing for high-dimensional covariance. Covariance estimation for high-dimensional data vectors. Methods for estimating sparse and large covariance matrices. Covariance and correlation matrices play fundamental roles in every aspect of the analysis of multivariate data collected from a variety of fields including business and economics, health care, engineering, and environmental and physical sciences. Estimating covariance matrices is an important part of portfolio selection, risk management, and asset pricing. A two-sample test for high-dimensional data with applications to gene-set testing, Chen, Song Xi and Qin, Yingli, The Annals of Statistics, 2010.
