EBookClubs

Read Books & Download eBooks Full Online

Book Tests of Hypotheses on Regression Coefficients in High Dimensional Regression Models

Download or read book Tests of Hypotheses on Regression Coefficients in High Dimensional Regression Models written by Ye Alex Zhao and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical inference in high-dimensional settings has become an important area of research due to the increased production of high-dimensional data in a wide variety of areas. However, few approaches to simultaneous hypothesis testing of high-dimensional regression coefficients have been proposed. In the first project of this dissertation, we introduce a new method for simultaneous tests of the coefficients in a high-dimensional linear regression model. Our new test statistic is based on the sum-of-squares of the score function mean with an additional power-enhancement term. The asymptotic distribution and power of the test statistic are derived, and Monte Carlo simulations demonstrate performance improvements over existing methods. We also apply the testing procedure to a real data example. In the second project, we propose a test statistic for regression coefficients in a high-dimensional setting that applies to generalized linear models. Building on previous work on testing procedures for high-dimensional linear regression models, we extend this approach to create a new testing methodology for GLMs, with specific illustrations for the Poisson and logistic regression scenarios. The asymptotic distribution of the test statistic is established, and both simulation results and a real data analysis illustrate the performance of our proposed method. The final project of this dissertation introduces two new approaches for testing high-dimensional regression coefficients in the partial linear model setting and, more generally, for linear hypothesis tests in linear models.
Our proposed statistic is motivated by the profile least squares method and decorrelation score method for high-dimensional inference, which we show to be equivalent in these particular cases. We outline the empirical performance of the new test statistic with simulation studies and real data examples. These results indicate generally satisfactory performance under a wide range of settings and applicability to real world data problems.
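The sum-of-squares-of-the-score-mean idea can be illustrated with a toy computation. The sketch below is only a generic caricature of that family of statistics: it omits the studentization and the power-enhancement term that the dissertation's actual statistic includes, and all names and simulation settings are invented for illustration.

```python
import numpy as np

def score_sum_of_squares(X, y):
    """Sum-of-squares of the score-function mean for H0: beta = 0.

    Under H0 (with centered data), the score mean is s = X^T y / n; the
    statistic aggregates its squared entries across all p coordinates.
    A generic sketch only, not the power-enhanced statistic of the text.
    """
    n = X.shape[0]
    s = X.T @ y / n            # p-vector of score means
    return float(np.sum(s ** 2))

rng = np.random.default_rng(0)
n, p = 100, 200                # p > n: the high-dimensional regime
X = rng.standard_normal((n, p))

y_null = rng.standard_normal(n)          # no signal at all
beta = np.zeros(p)
beta[:10] = 5.0                          # strong sparse signal
y_alt = X @ beta + rng.standard_normal(n)

t0 = score_sum_of_squares(X, y_null)
t1 = score_sum_of_squares(X, y_alt)
```

Even without calibration, the statistic separates null from alternative: under the null it concentrates near p/n times the noise variance, while a strong signal inflates it by roughly the squared signal size.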

Book Testing a Single Regression Coefficient in High Dimensional Regression Model

Download or read book Testing a Single Regression Coefficient in High Dimensional Regression Model written by Wei Lan and published by . This book was released on 2016 with total page 46 pages. Available in PDF, EPUB and Kindle. Book excerpt: In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.
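The screening-then-OLS recipe described above can be sketched in a few lines. The snippet below is a simplified stand-in for the Correlated Predictors Screening idea (a plain top-k correlation screen and homoscedastic variance estimate, where the paper's actual CPS rule and heteroscedasticity-robust theory differ); all simulation settings are invented.

```python
import math
import numpy as np

def cps_z_test(X, y, target, k=10):
    """z-test for H0: beta_target = 0 after controlling for the k predictors
    most correlated with the target covariate (a simplified screening sketch)."""
    n, p = X.shape
    xc = X - X.mean(axis=0)
    norms = np.linalg.norm(xc, axis=0)
    cors = np.abs(xc.T @ xc[:, target]) / (norms * norms[target] + 1e-12)
    cors[target] = -np.inf                      # never select the target itself
    controls = np.argsort(cors)[-k:]            # top-k correlated predictors
    Z = np.column_stack([np.ones(n), X[:, target], X[:, controls]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (n - Z.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    z = coef[1] / se
    pval = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    return z, pval

rng = np.random.default_rng(1)
n, p = 200, 500                                 # p far exceeds n
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] + rng.standard_normal(n)      # strong signal on covariate 3
z, pval = cps_z_test(X, y, target=3)
```

Because only the target plus a small screened set enters the regression, ordinary least squares remains well-posed even though p greatly exceeds n.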

Book Inference Methods for High Dimensional Data

Download or read book Inference Methods for High Dimensional Data written by Zhe Zhang and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation aims to develop new statistical inference procedures for high-dimensional regression models, focusing on three fundamental problems: (a) individual hypothesis testing without specification of a high-dimensional regression model, (b) high-dimensional linear hypothesis testing in the linear regression model, and (c) individual hypothesis testing in the partial linear model. In Chapter 3, we propose an effective model-free inference procedure for high-dimensional regression models. We first reformulate the hypothesis testing problem via a sufficient dimension reduction framework. With the aid of this reformulation, we propose a new test statistic and show that its asymptotic distribution is a $\chi^2$ distribution whose degrees of freedom do not depend on the unknown population distribution. We further conduct power analysis under local alternative hypotheses. In addition, we study how to control the false discovery rate of the proposed chi-squared tests, which are correlated, to identify important predictors under a model-free framework. To this end, we propose a multiple testing procedure and establish its theoretical guarantees. Monte Carlo simulation studies are conducted to assess the performance of the proposed tests, and an empirical analysis of a real-world data set illustrates the proposed methodology. In Chapter 4, we present a novel transformation-based inference method for conducting linear hypothesis tests in high-dimensional linear regression models. Our method uses score functions to construct a new random vector and links high-dimensional coefficient tests to high-dimensional one-sample mean tests. We provide a formulation for a U-statistic with a kernel of order two and demonstrate its asymptotic normality.
The presence of high-dimensional nuisance parameters poses a significant challenge in our model setting; however, we show that their impact can be disregarded asymptotically under mild conditions. Additionally, we study the influence of the power enhancement term on power performance through both theoretical analysis and simulations. The results indicate that the enhancement term does not affect the type-I error rate and can improve power in scenarios where the U-statistic may not perform well. In Chapter 5, we consider testing the treatment effect in high-dimensional partial linear models. Due to the slow convergence rate of the unknown nuisance function estimator from some machine learning algorithms, we cannot directly estimate and plug in the nuisance function on the same data. To overcome this limitation, we update the estimate of the nuisance function recursively. This leads to an explicit expression for the estimators of the parameters of interest. Our estimator is shown to be asymptotically normal, and we assess its finite sample performance through simulations. The results indicate that our statistic offers higher power, including in cases of model misspecification.

Book Testing Covariates in High Dimensional Regression

Download or read book Testing Covariates in High Dimensional Regression written by Wei Lan and published by . This book was released on 2013 with total page 34 pages. Available in PDF, EPUB and Kindle. Book excerpt: In a high dimensional linear regression model, we propose a new procedure for testing statistical significance of a subset of regression coefficients. Specifically, we employ the partial covariances between the response variable and the tested covariates to obtain a test statistic. The resulting test is applicable even if the predictor dimension is much larger than the sample size. Under the null hypothesis, together with boundedness and moment conditions on the predictors, we show that the proposed test statistic is asymptotically standard normal, which is further supported by Monte Carlo experiments. A similar test can be extended to generalized linear models. The practical usefulness of the test is illustrated via an empirical example on paid search advertising.
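The partial-covariance construction can be illustrated numerically. The sketch below only residualizes both the response and the tested block against a low-dimensional control block and aggregates the squared partial covariances; the paper's standardization that yields an asymptotically standard normal statistic is omitted, and all simulation settings are invented.

```python
import numpy as np

def partial_covariance_stat(X_tested, X_control, y):
    """Sum of squared partial covariances between y and the tested covariates,
    after projecting the control block out of both (unstandardized sketch)."""
    n = len(y)
    Q, _ = np.linalg.qr(X_control)
    y_res = y - Q @ (Q.T @ y)                    # residualize the response
    Xt_res = X_tested - Q @ (Q.T @ X_tested)     # residualize the tested block
    pcov = Xt_res.T @ y_res / n                  # partial covariances
    return float(np.sum(pcov ** 2))

rng = np.random.default_rng(2)
n = 150
X_control = rng.standard_normal((n, 3))
X_tested = rng.standard_normal((n, 400))         # tested dimension far exceeds n
y_null = X_control @ np.ones(3) + rng.standard_normal(n)
y_alt = y_null + X_tested[:, :5] @ (3.0 * np.ones(5))

s0 = partial_covariance_stat(X_tested, X_control, y_null)
s1 = partial_covariance_stat(X_tested, X_control, y_alt)
```

Note that the test never inverts a p-by-p matrix, which is why the approach remains applicable when the predictor dimension greatly exceeds the sample size.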

Book High dimensional Regression Models with Structured Coefficients

Download or read book High dimensional Regression Models with Structured Coefficients written by Yuan Li and published by . This book was released on 2018 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Regression models are very common for statistical inference, especially linear regression models with Gaussian noise. But in many modern scientific applications with large-scale datasets, the number of samples is small relative to the number of model parameters, which is the so-called high- dimensional setting. Directly applying classical linear regression models to high-dimensional data is ill-posed. Thus it is necessary to impose additional assumptions for regression coefficients to make high-dimensional statistical analysis possible. Regularization methods with sparsity assumptions have received substantial attention over the past two decades. But there are still some open questions regarding high-dimensional statistical analysis. Firstly, most literature provides statistical analysis for high-dimensional linear models with Gaussian noise, it is unclear whether similar results still hold if we are no longer in the Gaussian setting. To answer this question under Poisson setting, we study the minimax rates and provide an implementable convex algorithm for high-dimensional Poisson inverse problems under weak sparsity assumption and physical constraints. Secondly, much of the theory and methodology for high-dimensional linear regression models are based on the assumption that independent variables are independent of each other or have weak correlations. But it is possible that this assumption is not satisfied that some features are highly correlated with each other. It is natural to ask whether it is still possible to make high-dimensional statistical inference with high-correlated designs. Thus we provide a graph-based regularization method for high-dimensional regression models with high-correlated designs along with theoretical guarantees.

Book Bayesian Hypothesis Testing and Variable Selection in High Dimensional Regression

Download or read book Bayesian Hypothesis Testing and Variable Selection in High Dimensional Regression written by Min Wang and published by . This book was released on 2013. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: This dissertation consists of three distinct but related research projects. First of all, we study the Bayesian approach to model selection in the class of normal regression models. We propose an explicit closed-form expression of the Bayes factor with the use of Zellner's g-prior and the beta-prime prior for g. Noting that linear models with a growing number of unknown parameters have recently gained increasing popularity in practice, such as the spline problem, we are particularly interested in studying the model selection consistency of the Bayes factor under the scenario in which the dimension of the parameter space increases with the sample size. Our results show that the proposed Bayes factor is always consistent under the null model and is consistent under the alternative model except for a small set of alternative models which can be characterized. It is noteworthy that these results can be applied to the analysis of variance (ANOVA) model, which has been widely used in many areas of science, such as ecology, psychology, and behavioral research. For the one-way unbalanced ANOVA model, we propose an explicit closed-form expression of the Bayes factor which is thus easy to compute. In addition, its model selection consistency has been investigated under different asymptotic situations. For one-way random effects models, we also propose a closed-form Bayes factor without integral representation which has reasonable model selection consistency under different asymptotic scenarios. Moreover, the performance of the proposed Bayes factor is examined by numerical studies.
The second project deals with intrinsic Bayesian inference for the correlation coefficient between the disturbances in a system of two seemingly unrelated regression equations. This work was inspired by the observation that considerable attention has been paid to improved estimation of the regression coefficients of each model, whereas little attention has been paid to inference on the correlation coefficient, even though most of the improved estimators of the regression coefficients depend on it. We propose an objective Bayesian solution to the problems of hypothesis testing and point estimation for the correlation coefficient based on the combined use of an invariant loss function and an objective prior distribution for the unknown model parameters. This new solution possesses an invariance property under monotonic reparameterization of the quantity of interest. Some simulation studies and one real-data example are given for illustrative purposes. In the third project, we propose a new Bayesian strength of evidence built on divergence measures for testing point null hypotheses. Our proposed approach can be viewed as an objective and automatic approach to the problem of testing a point null hypothesis. It is shown that the new evidence successfully reconciles the disagreement between frequentists and Bayesians in many classical examples in which Lindley's paradox often occurs. In particular, the proposed Bayesian approach under the noninformative prior often recovers the frequentist P-values. From a Bayesian decision-theoretic viewpoint, the new evidence is justified as a formal Bayes test for some specific loss functions. The performance of the proposed approach is illustrated through several numerical examples. Possible applications of the new evidence to a variety of point null hypothesis testing problems are also briefly discussed.
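For context, the classical fixed-g version of the Zellner g-prior Bayes factor (full model against the intercept-only model) has the well-known closed form coded below. This is the standard textbook formula, not the beta-prime mixture over g that the dissertation develops, so treat it only as a baseline sketch.

```python
def gprior_bayes_factor(n, p, r2, g):
    """Bayes factor for the full normal linear model vs. the intercept-only
    model under Zellner's g-prior with a fixed g.

    n  : sample size
    p  : number of non-intercept covariates in the full model
    r2 : usual coefficient of determination R^2 of the full model
    g  : fixed g-prior scale

    Classical closed form: (1+g)^((n-p-1)/2) * (1 + g*(1-R^2))^(-(n-1)/2).
    """
    return (1.0 + g) ** ((n - p - 1) / 2.0) * (1.0 + g * (1.0 - r2)) ** (-(n - 1) / 2.0)
```

Two sanity checks follow directly from the formula: with R^2 = 0 the factor reduces to (1+g)^(-p/2) < 1, penalizing the larger model, and the factor is increasing in R^2, so better-fitting models earn more evidence.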

Book Partially Linear Models

    Book Details:
  • Author : Wolfgang Härdle
  • Publisher : Springer Science & Business Media
  • Release : 2012-12-06
  • ISBN : 3642577008
  • Pages : 210 pages

Download or read book Partially Linear Models written by Wolfgang Härdle and published by Springer Science & Business Media. This book was released on 2012-12-06 with total page 210 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the last ten years, there has been increasing interest and activity in the general area of partially linear regression smoothing in statistics. Many methods and techniques have been proposed and studied. This monograph hopes to bring an up-to-date presentation of the state of the art of partially linear regression techniques. The emphasis is on methodologies rather than on the theory, with a particular focus on applications of partially linear regression techniques to various statistical problems. These problems include least squares regression, asymptotically efficient estimation, bootstrap resampling, censored data analysis, linear measurement error models, nonlinear measurement models, nonlinear and nonparametric time series models.

Book Some Inference Problems in High Dimensional Linear Models

Download or read book Some Inference Problems in High Dimensional Linear Models written by Miles Edward Lopes and published by . This book was released on 2015 with total page 124 pages. Available in PDF, EPUB and Kindle. Book excerpt: During the past two decades, technological advances have led to a proliferation of high-dimensional problems in data analysis. The characteristic feature of such problems is that they involve large numbers of unknown parameters and relatively few observations. As the study of high-dimensional statistical models has developed, linear models have taken on a special status for their widespread application and extensive theory. Even so, much of the theoretical research on high-dimensional linear models has been concentrated on the problems of prediction and estimation, and many inferential questions regarding hypothesis tests and confidence intervals remain open. In this dissertation, we explore two sets of inferential questions arising in high-dimensional linear models. The first set deals with the residual bootstrap (RB) method and the distributional approximation of regression contrasts. The second set addresses the issue of unknown sparsity in the signal processing framework of compressed sensing. Although these topics involve distinct methods and applications, the dissertation is unified by an overall focus on the interplay between model structure and inference. Specifically, our work is motivated by an interest in using inferential methods to confirm the existence of model structure, and in developing new inferential methods that have minimal reliance on structural assumptions. The residual bootstrap method is a general approach to approximating the sampling distribution of statistics derived from estimated regression coefficients. 
When the number of regression coefficients p is small compared to the number of observations n, classical results show that RB consistently approximates the laws of contrasts obtained from least-squares coefficients. However, when p/n is close to 1, it is known that there exist contrasts for which RB fails when applied to least-squares residuals. As a remedy, we propose an alternative method that is tailored to regression models involving near low-rank design matrices. In this situation, we prove that resampling the residuals of a ridge regression estimator can alleviate some of the problems that occur for least-squares residuals. Notably, our approach does not depend on sparsity in the true regression coefficients. Furthermore, the assumption of a near low-rank design is one that is satisfied in many applications and can be inspected directly in practice. In the second portion of the dissertation, we turn our attention to the subject of compressed sensing, which deals with the recovery of sparse high-dimensional signals from a limited number of linear measurements. Although the theory of compressed sensing offers strong recovery guarantees, many of its basic results depend on prior knowledge of the signal's sparsity level -- a parameter that is rarely known in practice. Towards a resolution of this issue, we introduce a generalized family of sparsity parameters that can be estimated in a way that is free of structural assumptions. We show that our estimator is ratio-consistent with a dimension-free rate of convergence, and also derive the estimator's limiting distribution. In turn, these results make it possible to set confidence intervals for the sparsity level and to test the hypothesis of sparsity in a precise sense.
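The ridge-based residual bootstrap loop described above can be sketched as follows. This is only a minimal illustration of the general recipe (fit ridge, center the residuals, resample, refit, collect contrast draws); the dissertation's choice of penalty, centering, and theory for near low-rank designs go well beyond it, and all names and settings here are invented.

```python
import numpy as np

def ridge_residual_bootstrap(X, y, c, lam=1.0, B=300, seed=0):
    """Bootstrap draws of a contrast c^T beta_hat by resampling centered
    ridge residuals (a sketch of the residual-bootstrap-with-ridge idea)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # y -> ridge coefficients
    beta_hat = G @ y
    resid = y - X @ beta_hat
    resid = resid - resid.mean()                         # center before resampling
    draws = np.empty(B)
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        draws[b] = c @ (G @ y_star)                      # refit on bootstrap sample
    return float(c @ beta_hat), draws

rng = np.random.default_rng(3)
n, p = 80, 60                                            # p/n close to 1
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 1.0
y = X @ beta + rng.standard_normal(n)
c = np.zeros(p)
c[0] = 1.0                                               # contrast: first coefficient
est, draws = ridge_residual_bootstrap(X, y, c)
```

The empirical distribution of `draws` then approximates the sampling law of the contrast, from which percentile intervals can be read off.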

Book Hypothesis Testing with High dimensional Data

Download or read book Hypothesis Testing with High dimensional Data written by Sen Zhao and published by . This book was released on 2017 with total page 185 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the past two decades, vast high-dimensional biomedical datasets have become a mainstay in various biomedical applications from genomics to neuroscience. These high-dimensional data enable researchers to answer scientific questions that are impossible to answer with classical, low-dimensional datasets. However, due to the "curse of dimensionality", such high-dimensional datasets also pose serious statistical challenges. Motivated by these emerging applications, statisticians have devoted much effort to developing estimation methods for high-dimensional linear models and graphical models. However, there is still little progress on quantifying the uncertainty of the estimates, e.g., obtaining p-values and confidence intervals, which are crucial for drawing scientific conclusions. While encouraging advances have been made in this area over the past couple of years, the majority of existing high-dimensional hypothesis testing methods still suffer from low statistical power or high computational intensity. In this dissertation, we focus on developing hypothesis testing methods for high-dimensional linear and graphical models. In Chapter 2, we investigate a naive and simple two-step hypothesis testing procedure for linear models. We show that, under appropriate conditions, such a simple procedure controls the type-I error rate, and is closely connected to more complicated alternatives. We also show in numerical studies that such a simple procedure achieves similar performance as procedures that are computationally more intense. In Chapter 3, we consider hypothesis testing for linear regression that incorporates external information about the relationship between variables represented by a graph, such as the gene regulatory network.
We show in theory and numerical studies that by incorporating informative external information, our proposal is substantially more powerful than existing methods that ignore such information. We also propose a more robust procedure for settings where the external information is potentially inaccurate or imprecise. This robust procedure could adaptively choose the amount of external information to be incorporated based on the data. In Chapter 4, we shift our focus to Gaussian graphical models. We propose a novel procedure to test whether two Gaussian graphical models share the same edge set, while controlling the false positive rate. In the case that two networks are different, our proposals could identify specific nodes and edges that show differential connectivity. In this chapter, we also demonstrate that when the goal is to identify differentially connected nodes and edges, the results from our proposal are more interpretable than existing procedures based on covariance or precision matrices. We finish the dissertation with a discussion in Chapter 5, in which we present viable future research directions, and discuss a possible extension of our proposals to vector autoregression models for time series.

Book Testing Research Hypotheses Using Multiple Linear Regression

Download or read book Testing Research Hypotheses Using Multiple Linear Regression written by Keith A. McNeil and published by . This book was released on 1975 with total page 616 pages. Available in PDF, EPUB and Kindle. Book excerpt: Multiple regression is becoming more widely used as the statistical technique for answering research hypotheses. This is so for several reasons: 1) the technique is extremely versatile; 2) the computer has made the technique more available to researchers; and 3) texts such as the authors' earlier work are making the technique more available to researchers. The statistical technique of multiple regression allows the inclusion of numerous continuous (quantitative) and categorical (qualitative) variables in the prediction of some criterion. Appendixes contain a multiple regression computer program and data on which the problems are based; a discussion of the similarities and differences between analysis of variance and multiple regression; and a computer program providing the regression solution to natural language research hypotheses.
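The core device in regression-based hypothesis testing of this kind is the comparison of a full model against a restricted model via an F statistic. The snippet below is a standard textbook computation, not code from the book; the simulated data and variable names are invented for illustration.

```python
import numpy as np

def nested_f_test(X_full, X_restricted, y):
    """F statistic for H0: the extra columns of the full model have zero
    coefficients, via the classical full-vs-restricted RSS comparison."""
    n = len(y)

    def rss_and_k(Z):
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ coef
        return float(r @ r), Z.shape[1]

    rss1, k_full = rss_and_k(X_full)
    rss0, k_res = rss_and_k(X_restricted)
    q = k_full - k_res                               # number of restrictions
    return ((rss0 - rss1) / q) / (rss1 / (n - k_full))

rng = np.random.default_rng(4)
n = 100
X_res = np.column_stack([np.ones(n), rng.standard_normal(n)])  # restricted model
extra = rng.standard_normal((n, 2))                            # candidate predictors
X_full = np.column_stack([X_res, extra])
y = X_res @ np.array([1.0, 0.5]) + extra @ np.array([2.0, 0.0]) + rng.standard_normal(n)

F = nested_f_test(X_full, X_res, y)
```

Under the null the statistic follows an F distribution with q and n - k_full degrees of freedom; here the strong coefficient on one extra predictor drives F far above any conventional critical value.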

Book High dimensional Econometrics And Identification

Download or read book High dimensional Econometrics And Identification written by Kao Chihwa and published by World Scientific. This book was released on 2019-04-10 with total page 180 pages. Available in PDF, EPUB and Kindle. Book excerpt: In many applications of econometrics and economics, a large proportion of the questions of interest concern identification. An economist may be interested in uncovering the true signal when the data could be very noisy, as in time-series spurious regression and weak instruments problems, to name a few. In this book, High-Dimensional Econometrics and Identification, we illustrate that the true signal, and hence identification, can be recovered even with noisy high-dimensional data, e.g., large panels. High-dimensional data in econometrics is the rule rather than the exception. One of the tools to analyze large, high-dimensional data is the panel data model. High-Dimensional Econometrics and Identification grew out of our collaborative research on identification and high-dimensional econometrics over the years, and it aims to provide an up-to-date presentation of the issues of identification and high-dimensional econometrics, as well as insights into the use of these results in empirical studies. This book is designed for high-level graduate courses in econometrics and statistics, and also serves as a reference for researchers.

Book Resampling Based Multiple Testing

Download or read book Resampling Based Multiple Testing written by Peter H. Westfall and published by John Wiley & Sons. This book was released on 1993-01-12 with total page 382 pages. Available in PDF, EPUB and Kindle. Book excerpt: Combines recent developments in resampling technology (including the bootstrap) with new methods for multiple testing that are easy to use, convenient to report and widely applicable. Software from SAS Institute is available to execute many of the methods and programming is straightforward for other applications. Explains how to summarize results using adjusted p-values which do not necessitate cumbersome table look-ups. Demonstrates how to incorporate logical constraints among hypotheses, further improving power.
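The adjusted p-values that the book advocates can be sketched with the single-step maxT construction: resample under the complete null, record the maximum test statistic in each resample, and report for each hypothesis the fraction of resamples whose maximum exceeds its observed statistic. The code below uses sign-flipping for m one-sample tests as a minimal illustration; the book develops step-down refinements and many other resampling schemes, and the simulated data are invented.

```python
import numpy as np

def maxt_adjusted_pvalues(Z, B=500, seed=0):
    """Single-step maxT adjusted p-values (Westfall-Young style) via
    sign-flipping, for m simultaneous tests of H0: mean = 0 on the
    columns of Z. A minimal sketch of the idea only."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape

    def tstats(A):
        return np.abs(A.mean(0) / (A.std(0, ddof=1) / np.sqrt(n)))

    t_obs = tstats(Z)
    exceed = np.zeros(m)
    for _ in range(B):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))   # resample under the complete null
        exceed += tstats(signs * Z).max() >= t_obs
    return (exceed + 1) / (B + 1)                      # adjusted p-values in (0, 1]

rng = np.random.default_rng(5)
n, m = 40, 20
Z = rng.standard_normal((n, m))
Z[:, 0] += 2.0                                         # one truly nonzero mean
adj_p = maxt_adjusted_pvalues(Z)
```

Because every hypothesis is compared against the same max-statistic distribution, the adjusted p-values are directly reportable without table look-ups and are monotone in the observed statistics.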

Book Linear Models in Statistics

Download or read book Linear Models in Statistics written by Alvin C. Rencher and published by John Wiley & Sons. This book was released on 2008-01-07 with total page 690 pages. Available in PDF, EPUB and Kindle. Book excerpt: The essential introduction to the theory and application of linear models—now in a valuable new edition Since most advanced statistical tools are generalizations of the linear model, it is necessary to first master the linear model in order to move forward to more advanced concepts. The linear model remains the main tool of the applied statistician and is central to the training of any statistician regardless of whether the focus is applied or theoretical. This completely revised and updated new edition successfully develops the basic theory of linear models for regression, analysis of variance, analysis of covariance, and linear mixed models. Recent advances in the methodology related to linear mixed models, generalized linear models, and the Bayesian linear model are also addressed. Linear Models in Statistics, Second Edition includes full coverage of advanced topics, such as mixed and generalized linear models, Bayesian linear models, two-way models with empty cells, geometry of least squares, vector-matrix calculus, simultaneous inference, and logistic and nonlinear regression. Algebraic, geometrical, frequentist, and Bayesian approaches to both the inference of linear models and the analysis of variance are also illustrated. Through the expansion of relevant material and the inclusion of the latest technological developments in the field, this book provides readers with the theoretical foundation to correctly interpret computer software output as well as effectively use, customize, and understand linear models.
This modern Second Edition features:
  • New chapters on Bayesian linear models as well as random and mixed linear models
  • Expanded discussion of two-way models with empty cells
  • Additional sections on the geometry of least squares
  • Updated coverage of simultaneous inference
The book is complemented with easy-to-read proofs, real data sets, and an extensive bibliography. A thorough review of the requisite matrix algebra has been added for transitional purposes, and numerous theoretical and applied problems have been incorporated with selected answers provided at the end of the book. A related Web site includes additional data sets and SAS® code for all numerical examples. Linear Models in Statistics, Second Edition is a must-have book for courses in statistics, biostatistics, and mathematics at the upper-undergraduate and graduate levels. It is also an invaluable reference for researchers who need to gain a better understanding of regression and analysis of variance.

Book Statistical Foundations of Data Science

Download or read book Statistical Foundations of Data Science written by Jianqing Fan and published by CRC Press. This book was released on 2020-09-21 with total page 942 pages. Available in PDF, EPUB and Kindle. Book excerpt: Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models, contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises that involve both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity explorations and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation, learning latent factors and hidden structures, and their applications to statistical estimation, inference, prediction and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.

Book Sparse Graphical Modeling for High Dimensional Data

Download or read book Sparse Graphical Modeling for High Dimensional Data written by Faming Liang and published by CRC Press. This book was released on 2023-08-02 with total page 151 pages. Available in PDF, EPUB and Kindle. Book excerpt:
  • A general framework for learning sparse graphical models with conditional independence tests
  • Complete treatments for different types of data: Gaussian, Poisson, multinomial, and mixed data
  • Unified treatments for data integration, network comparison, and covariate adjustment
  • Unified treatments for missing data and heterogeneous data
  • Efficient methods for joint estimation of multiple graphical models
  • Effective methods of high-dimensional variable selection
  • Effective methods of high-dimensional inference