EBookClubs

Read Books & Download eBooks Full Online

Book Regularized Regression Methods for Variable Selection and Estimation

Download or read book Regularized Regression Methods for Variable Selection and Estimation written by Lee Herbrandson Dicker. This book was released in 2010 with a total of 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: We make two contributions to the body of work on the variable selection and estimation problem. First, we propose a new penalized likelihood procedure--the seamless-L0 (SELO) method--which utilizes a continuous penalty function that closely approximates the discontinuous L0 penalty. The SELO penalized likelihood procedure consistently selects the correct variables and is asymptotically normal, provided the number of variables grows slower than the number of observations. The SELO method is efficiently implemented using a coordinate descent algorithm. Tuning parameter selection is crucial to the performance of the SELO procedure. We propose a BIC-like tuning parameter selection method for SELO which consistently identifies the correct model, even if the number of variables diverges. Simulation results show that the SELO procedure with BIC tuning parameter selection performs very well in a variety of settings--outperforming other popular penalized likelihood procedures by a substantial margin. Using SELO, we analyze a publicly available HIV drug resistance and mutation dataset and obtain interpretable results.
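For readers who want the exact form, the per-coefficient SELO penalty as reported in the associated SELO literature is sketched below, where λ is the tuning parameter and τ > 0 is a small constant controlling how closely the penalty tracks L0:

$$p_{\mathrm{SELO}}(\beta_j) = \frac{\lambda}{\log 2}\,\log\!\left(1 + \frac{|\beta_j|}{|\beta_j| + \tau}\right)$$

The penalty is 0 at β_j = 0 and approaches λ as |β_j| grows, mimicking the flat per-variable cost of the L0 penalty while remaining continuous.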

Book Regularized Regression in Generalized Linear Measurement Error Models with Instrumental Variables: Variable Selection and Parameter Estimation

Download or read book Regularized Regression in Generalized Linear Measurement Error Models with Instrumental Variables: Variable Selection and Parameter Estimation written by Lin Xue. This book was released in 2020. Available in PDF, EPUB and Kindle. Book excerpt: Regularization is a commonly used technique in high-dimensional data analysis. With a properly chosen tuning parameter for certain penalty functions, the resulting estimator is consistent in both variable selection and parameter estimation. Most regularization methods assume that the data can be observed and precisely measured. However, it is well known that measurement error (ME) is ubiquitous in real-world datasets. In many situations some or all covariates cannot be observed directly or are measured with error. For example, in studies of cardiovascular disease, the goal is to identify important risk factors such as blood pressure, cholesterol level and body mass index, which cannot be measured precisely; the corresponding proxies are employed for analysis instead. If the ME is ignored in regularized regression, the resulting naive estimator can have substantial selection and estimation bias: important covariates are falsely dropped from the model and redundant covariates are incorrectly retained. We illustrate how ME affects variable selection and parameter estimation through theoretical analysis and several numerical examples. To correct for the ME effects, we propose an instrumental-variable-assisted regularization method for linear and generalized linear models. We show that the proposed estimator has the oracle property, so it is consistent in both variable selection and parameter estimation, and we derive its asymptotic distribution. In addition, we show that the implementation of the proposed method is equivalent to the plug-in approach under linear models, and that the asymptotic variance-covariance matrix has a compact form. Extensive simulation studies in linear, logistic and Poisson log-linear regression show that the proposed estimator outperforms the naive estimator in both linear and generalized linear models. Although the focus of this study is classical ME, we also discuss variable selection and estimation in the setting of Berkson ME. In particular, our finite-sample simulation studies show that, in contrast to estimation in linear regression, Berkson ME may cause bias in variable selection and estimation. Finally, the proposed method is applied to real datasets from a diabetes study and the Framingham Heart Study.
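A minimal simulation sketch of the naive-estimator problem described above, contrasting a lasso fit on error-prone proxies with one on the true covariates. All settings (sample size, penalty level, error scale) are illustrative assumptions, and this is not the thesis's instrumental-variable method:

```python
# Illustrative sketch only: shows how classical measurement error can
# distort lasso variable selection. Settings are arbitrary assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))                      # true covariates
beta = np.array([2.0, 0.0, 1.5] + [0.0] * 7)     # sparse true signal
y = X @ beta + rng.normal(size=n)

W = X + rng.normal(scale=1.0, size=(n, p))       # error-prone proxies

naive = Lasso(alpha=0.1).fit(W, y)               # ignores the measurement error
oracle = Lasso(alpha=0.1).fit(X, y)              # uses the unobservable truth

print("naive selected: ", np.flatnonzero(naive.coef_))
print("oracle selected:", np.flatnonzero(oracle.coef_))
```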

Book Variable Selection by Regularization Methods for Generalized Mixed Models

Download or read book Variable Selection by Regularization Methods for Generalized Mixed Models written by Andreas Groll and published by Cuvillier Verlag. This book was released on 2011-12-13 with a total of 175 pages. Available in PDF, EPUB and Kindle. Book excerpt: A regression analysis describes the dependency of random variables in the form of a functional relationship, distinguishing between the dependent response variable and one or more independent influence variables. There is a variety of model classes and inference methods available, ranging from the conventional linear regression model to recent non- and semiparametric regression models. The so-called generalized regression models form a methodically consistent framework incorporating many regression approaches with response variables that are not necessarily normally distributed, and they include the conventional linear regression model, based on the normality assumption, as a special case. When repeated measurements are modeled, random effects or random coefficients can be included in addition to fixed effects; such models are known as random effects models or mixed models. As a consequence, regression procedures are extremely versatile and can address very different problems. In this dissertation, regularization techniques for generalized mixed models are developed that are able to perform variable selection. These techniques are especially appropriate when many potential influence variables are present and existing approaches tend to fail. First, a componentwise boosting technique for generalized linear mixed models is presented which is based on the likelihood function and works by iteratively fitting the residuals using weak learners. The complexity of the resulting estimator is determined by information criteria. For the estimation of variance components two approaches are considered: an estimator resulting from maximizing the profile likelihood, and an estimator which can be calculated using an approximate EM algorithm. The boosting concept is then extended to mixed models with ordinal response variables. Two different types of ordered models are considered: the threshold model, also known as the cumulative model, and the sequential model. Both are based on the assumption that the observed response variable results from a categorized version of a latent metric variable. Later in the thesis, the boosting approach is extended to additive predictors. The unknown functions to be estimated are expanded in B-spline basis functions, whose smoothness is controlled by penalty terms. Finally, a suitable L1-regularization technique for generalized linear models is presented, based on a combination of Fisher scoring and gradient optimization. Extensive simulation studies and numerous applications illustrate the competitiveness of the methods developed in this thesis compared to conventional approaches. Standard errors are calculated using bootstrap methods.
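The componentwise boosting idea is easiest to see stripped down to a plain linear model. The sketch below illustrates only that core idea (one coefficient updated per step against the current residuals, with a small step length ν); it is not the book's GLMM algorithm:

```python
# A minimal sketch of componentwise L2-boosting for a linear model.
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    X = X - X.mean(axis=0)                 # center so the intercept separates
    intercept = y.mean()
    resid = y - intercept
    coef = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # least-squares fit of each single covariate to the residuals
        b = X.T @ resid / (X ** 2).sum(axis=0)
        # the weak learner is the covariate that best explains the residuals
        losses = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = losses.argmin()
        coef[j] += nu * b[j]               # shrunken update of one coefficient
        resid -= nu * b[j] * X[:, j]
    return intercept, coef
```

Early stopping (here a fixed number of steps, chosen in practice by an information criterion) is what performs variable selection: covariates never picked keep a coefficient of exactly zero.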

Book A Regularization Approach for Estimation and Variable Selection in High Dimensional Regression

Download or read book A Regularization Approach for Estimation and Variable Selection in High Dimensional Regression written by Yiannis Dendramis. This book was released in 2019 with a total of 49 pages. Available in PDF, EPUB and Kindle. Book excerpt: Model selection and estimation are important topics in econometric analysis which can become considerably complicated in high dimensional settings, where the set of possible regressors can become larger than the set of available observations. For large-scale problems, penalized regression methods (e.g. the Lasso) have become the de facto benchmark that can effectively trade off parsimony and fit. In this paper we introduce a regularized estimation and model selection approach based on sparse large covariance matrix estimation, introduced by Bickel and Levina (2008) and extended by Dendramis, Giraitis, and Kapetanios (2018). We provide asymptotic and small sample results indicating that our approach can be an important alternative to penalized regression. Moreover, we introduce a number of extensions that can improve the asymptotic and small sample performance of the proposed method. The usefulness of what we propose is illustrated via Monte Carlo exercises and an empirical application in macroeconomic forecasting.
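The Bickel and Levina (2008) building block mentioned above is entrywise hard thresholding of the sample covariance matrix. A minimal sketch follows; the threshold value is an assumption for illustration, not the paper's rule:

```python
# Sketch of hard-thresholding a sample covariance matrix.
import numpy as np

def threshold_cov(X, t):
    """Zero out small off-diagonal entries of the sample covariance of X."""
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= t, S, 0.0)   # keep only large entries
    np.fill_diagonal(T, np.diag(S))        # always keep the variances
    return T
```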

Book Statistical Learning with Sparsity

Download or read book Statistical Learning with Sparsity written by Trevor Hastie and published by CRC Press. This book was released on 2015-05-07 with a total of 354 pages. Available in PDF, EPUB and Kindle. Book excerpt: Discover new methods for dealing with high-dimensional data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data.
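The lasso at the center of the book solves a penalized least-squares problem; in standard notation,

$$\hat\beta = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,$$

where increasing the tuning parameter λ drives more coefficients exactly to zero, producing the sparse models the excerpt describes.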

Book Tuning Parameter Selection in L1 Regularized Logistic Regression

Download or read book Tuning Parameter Selection in L1 Regularized Logistic Regression written by Shujing Shi. This book was released in 2012. Available in PDF, EPUB and Kindle. Book excerpt: Variable selection is an important topic in regression analysis and is intended to select the best subset of predictors. The least absolute shrinkage and selection operator (Lasso) was introduced by Tibshirani in 1996. This method can serve as a tool for variable selection because it shrinks some coefficients to exactly zero by placing a constraint on the sum of the absolute values of the regression coefficients. For logistic regression, the Lasso modifies the traditional maximum likelihood estimation by adding the L1 norm of the parameters to the negative log-likelihood function, turning a maximization problem into a minimization one. To solve this problem, we first need to specify the weight on the L1 norm, called the tuning parameter. Since the tuning parameter affects coefficient estimation and variable selection, we want to find its optimal value so as to obtain the most accurate coefficient estimates and the best subset of predictors in the L1 regularized regression model. There are two popular methods for selecting the tuning parameter value that yields the best subset of predictors: the Bayesian information criterion (BIC) and cross validation (CV). The objective of this paper is to evaluate and compare these two methods, in terms of coefficient estimation accuracy and variable selection, through simulation studies.
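A hedged sketch of the BIC route compared in this thesis, written here with scikit-learn for concreteness; the grid, solver and degrees-of-freedom convention are illustrative assumptions:

```python
# Select the L1 tuning parameter for logistic regression by minimizing BIC.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bic_select(X, y, Cs=np.logspace(-2, 2, 25)):
    n = len(y)
    best_bic, best_model = np.inf, None
    for C in Cs:                            # C is the inverse penalty strength
        m = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
        p = np.clip(m.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
        loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        df = np.count_nonzero(m.coef_) + 1  # selected predictors + intercept
        bic = -2 * loglik + df * np.log(n)
        if bic < best_bic:
            best_bic, best_model = bic, m
    return best_model
```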

Book Penalty, Shrinkage and Pretest Strategies

Download or read book Penalty, Shrinkage and Pretest Strategies written by S. Ejaz Ahmed and published by Springer Science & Business Media. This book was released on 2013-12-11 with a total of 122 pages. Available in PDF, EPUB and Kindle. Book excerpt: The objective of this book is to compare the statistical properties of penalty and non-penalty estimation strategies for some popular models. Specifically, it considers the full model, submodel, penalty, pretest and shrinkage estimation techniques for three regression models before presenting the asymptotic properties of the non-penalty estimators and their asymptotic distributional efficiency comparisons. Further, the risk properties of the non-penalty estimators and penalty estimators are explored through a Monte Carlo simulation study. Showcasing examples based on real datasets, the book will be useful for students and applied researchers in a host of applied fields. The book’s level of presentation and style make it accessible to a broad audience. It offers clear, succinct expositions of each estimation strategy. More importantly, it clearly describes how to use each estimation strategy for the problem at hand. The book is largely self-contained, as are the individual chapters, so that anyone interested in a particular topic or area of application may read only that specific chapter. The book is specially designed for graduate students who want to understand the foundations and concepts underlying penalty and non-penalty estimation and its applications. It is well-suited as a textbook for senior undergraduate and graduate courses surveying penalty and non-penalty estimation strategies, and can also be used as a reference book for a host of related subjects, including courses on meta-analysis. Professional statisticians will find this book to be a valuable reference work, since nearly all chapters are self-contained.

Book Developing a Protocol for Observational Comparative Effectiveness Research: A User's Guide

Download or read book Developing a Protocol for Observational Comparative Effectiveness Research: A User's Guide written by the Agency for Healthcare Research and Quality (U.S.) and published by Government Printing Office. This book was released on 2013-02-21 with a total of 236 pages. Available in PDF, EPUB and Kindle. Book excerpt: This User’s Guide is a resource for investigators and stakeholders who develop and review observational comparative effectiveness research protocols. It explains how to (1) identify key considerations and best practices for research design; (2) build a protocol based on these standards and best practices; and (3) judge the adequacy and completeness of a protocol. Eleven chapters cover all aspects of research design, including: developing study objectives, defining and refining study questions, addressing the heterogeneity of treatment effect, characterizing exposure, selecting a comparator, defining and measuring outcomes, and identifying optimal data sources. Checklists of guidance and key considerations for protocols are provided at the end of each chapter. The User’s Guide was created by researchers affiliated with AHRQ’s Effective Health Care Program, particularly those who participated in AHRQ’s DEcIDE (Developing Evidence to Inform Decisions About Effectiveness) program. Chapters were subject to multiple internal and external independent reviews. For more information, please consult the Agency website: www.effectivehealthcare.ahrq.gov

Book The Australian Temperament Project

Download or read book The Australian Temperament Project written by Suzanne Vassallo. This book was released in 2013 with a total of 26 pages. Available in PDF, EPUB and Kindle. Book excerpt: This report highlights some of the key lessons about human development from the Australian Temperament Project (ATP) - a groundbreaking longitudinal study that, to date, has followed a large group of Victorians from birth to age 30. The ATP is a joint project between the Australian Institute of Family Studies, the Royal Children's Hospital, the University of Melbourne and Deakin University, and is one of only a few studies in the world with information on three generations of study members - the young people, their parents, and now the young people's own children.

Book Variable Selection via Penalized Regression and the Genetic Algorithm Using Information Complexity, with Applications for High-dimensional Omics Data

Download or read book Variable Selection via Penalized Regression and the Genetic Algorithm Using Information Complexity, with Applications for High-dimensional Omics Data written by Tyler J. Massaro. This book was released in 2016 with a total of 360 pages. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that is used throughout the remaining chapters, on topics including, but not limited to, information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting. In Chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models; a toy version of this idea is sketched after this excerpt. We present the advantages of this approach over stepwise selection and elastic net regularized regression in selecting variables from a classical set of ICU data. We further compare these results to an entirely new procedure for variable selection developed explicitly for this dissertation, called the post hoc adjustment of measured effects (PHAME). In Chapter 5, we reproduce many of the same results from Chapter 4 for the first time in a multinomial logistic regression setting. The utility and convenience of the PHAME procedure is demonstrated on a set of cancer genomic data. Chapter 6 marks a departure from supervised learning problems as we shift our focus to unsupervised problems involving mixture distributions of count data from epidemiologic fields. We begin by reintroducing minimum Hellinger distance estimation, alongside model selection techniques, as a worthy alternative to the EM algorithm for generating mixtures of Poisson distributions. We also create, for the first time, a GA that derives mixtures of negative binomial distributions. The work from Chapter 6 is incorporated into Chapters 7 and 8, where we conclude the dissertation with a novel analysis of mixtures of count data regression models. We provide algorithms, based on single- and multi-target genetic algorithms, which solve the mixture of penalized count data regression models problem, and we demonstrate the usefulness of this technique on HIV count data that were used in a previous study published by Gray, Massaro et al. (2015), as well as on time-to-event data taken from the cancer genomic data sets introduced earlier.
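To make the GA subsetting idea concrete, here is a toy sketch with a BIC-style fitness on an ordinary least-squares fit. The dissertation's GA uses information complexity (ICOMP) and richer operators, so everything below is an illustrative simplification:

```python
# Toy genetic algorithm for variable subsetting (BIC-style fitness).
import numpy as np

def fitness(mask, X, y):
    """Negative BIC of an OLS fit on the selected columns (higher is better)."""
    if not mask.any():
        return -np.inf
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = ((y - Xs @ beta) ** 2).sum()
    n, k = len(y), int(mask.sum())
    return -(n * np.log(rss / n) + k * np.log(n))

def ga_select(X, y, pop_size=40, gens=50, mut=0.05, seed=2):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.3                # random initial subsets
    for _ in range(gens):
        fit = np.array([fitness(m, X, y) for m in pop])
        elite = pop[np.argsort(fit)[::-1][: pop_size // 2]]  # keep fitter half
        # uniform crossover between randomly paired elite parents
        pa = elite[rng.integers(0, len(elite), pop_size - len(elite))]
        pb = elite[rng.integers(0, len(elite), pop_size - len(elite))]
        kids = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        kids ^= rng.random(kids.shape) < mut             # bit-flip mutation
        pop = np.vstack([elite, kids])
    fit = np.array([fitness(m, X, y) for m in pop])
    return pop[fit.argmax()]                             # best subset found
```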

Book Variable Selection and Parameter Estimation for Normal Linear Regression Models

Download or read book Variable Selection and Parameter Estimation for Normal Linear Regression Models written by Peter J. Kempthorne. This book was released in 1985 with a total of 159 pages. Available in PDF, EPUB and Kindle.

Book Automatic Smoothing and Variable Selection Via Regularization

Download or read book Automatic Smoothing and Variable Selection Via Regularization written by Ming Yuan. This book was released in 2004 with a total of 112 pages. Available in PDF, EPUB and Kindle.

Book Computational Genomics with R

Download or read book Computational Genomics with R written by Altuna Akalin and published by CRC Press. This book was released on 2020-12-16 with a total of 462 pages. Available in PDF, EPUB and Kindle. Book excerpt: Computational Genomics with R provides a starting point for beginners in genomic data analysis and also guides more advanced practitioners to sophisticated data analysis techniques in genomics. The book covers topics from R programming, to machine learning and statistics, to the latest genomic data analysis techniques. The text provides accessible information and explanations, always with the genomics context in the background. It also contains practical and well-documented examples in R, so readers can analyze their data by simply reusing the code presented. As the field of computational genomics is interdisciplinary, it requires different starting points for people with different backgrounds. For example, a biologist might skip sections on basic genome biology and start with R programming, whereas a computer scientist might want to start with genome biology. After reading, you will:

- have the basics of R and be able to dive right into specialized uses of R for computational genomics, such as using Bioconductor packages;
- be familiar with statistics and with supervised and unsupervised learning techniques that are important in data modeling and in exploratory analysis of high-dimensional data;
- understand genomic intervals and the operations on them that are used for tasks such as aligned read counting and genomic feature annotation;
- know the basics of processing and quality checking high-throughput sequencing data;
- be able to do sequence analysis, such as calculating GC content for parts of a genome or finding transcription factor binding sites;
- know about visualization techniques used in genomics, such as heatmaps, meta-gene plots, and genomic track visualization;
- be familiar with the analysis of different high-throughput sequencing data sets, such as RNA-seq, ChIP-seq, and BS-seq;
- know basic techniques for integrating and interpreting multi-omics datasets.

Altuna Akalin is a group leader and head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute of Medical Systems Biology, Max Delbrück Center, Berlin. He has been developing computational methods for analyzing and integrating large-scale genomics data sets since 2002 and has published an extensive body of work in this area. The framework for this book grew out of the yearly computational genomics courses he has been organizing and teaching since 2015.

Book Variable Selection and Parameter Estimation for Normal Linear Regression Models

Download or read book Variable Selection and Parameter Estimation for Normal Linear Regression Models written by Peter James Kempthorne. This book was released in 1986 with a total of 159 pages. Available in PDF, EPUB and Kindle.

Book Cross-validation and Regression Analysis in High-dimensional Sparse Linear Models

Download or read book Cross-validation and Regression Analysis in High-dimensional Sparse Linear Models written by Feng Zhang and published by Stanford University. This book was released in 2011 with a total of 91 pages. Available in PDF, EPUB and Kindle. Book excerpt: Modern scientific research often involves experiments with at most hundreds of subjects but with tens of thousands of variables for every subject. The challenge of high dimensionality has reshaped statistical thinking and modeling. Variable selection plays a pivotal role in high-dimensional data analysis, and the combination of sparsity and accuracy is crucial for statistical theory and practical applications. Regularization methods are attractive for tackling these sparsity and accuracy issues. The first part of this thesis studies two regularization methods. First, we consider the orthogonal greedy algorithm (OGA) used in conjunction with a high-dimensional information criterion introduced by Ing and Lai (2011). Although it has been shown to have excellent performance for weakly sparse regression models, one does not know a priori in practice that the actual model is weakly sparse, and we address this problem by developing a new cross-validation approach. OGA can be viewed as L0 regularization for weakly sparse regression models. When such sparsity fails, as revealed by the cross-validation analysis, we propose a new way to combine L1 and L2 penalties, which we show to have important advantages over previous regularization methods. The second part of the thesis develops a Monte Carlo cross-validation (MCCV) method to estimate the distribution of out-of-sample prediction errors when a training sample is used to build a regression model for prediction. Asymptotic theory and simulation studies show that the proposed MCCV method mimics the actual (but unknown) prediction error distribution even when the number of regressors exceeds the sample size. MCCV therefore provides a useful tool for comparing the predictive performance of different regularization methods on real (rather than simulated) data sets.
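The orthogonal greedy algorithm is known in the signal-processing literature as orthogonal matching pursuit, so a quick illustration is possible with scikit-learn; the data and the sparsity level below are made-up assumptions:

```python
# Greedy L0-style selection via orthogonal matching pursuit.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n, p = 100, 500                       # far more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[3, 7, 42]] = [1.5, -2.0, 1.0]   # three relevant variables
y = X @ beta + 0.5 * rng.normal(size=n)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
print(np.flatnonzero(omp.coef_))      # indices picked greedily, one per step
```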

Book The Solution Path of the Generalized Lasso

Download or read book The Solution Path of the Generalized Lasso written by Ryan Joseph Tibshirani and published by Stanford University. This book was released in 2011 with a total of 95 pages. Available in PDF, EPUB and Kindle. Book excerpt: We present a path algorithm for the generalized lasso problem. This problem penalizes the L1 norm of a matrix D times the coefficient vector, and it has a wide range of applications dictated by the choice of D. Our algorithm is based on solving the dual of the generalized lasso, which facilitates computation and conceptual understanding of the path. For D = I (the usual lasso), we draw a connection between our approach and the well-known LARS algorithm. For an arbitrary D, we derive an unbiased estimate of the degrees of freedom of the generalized lasso fit. This estimate turns out to be quite intuitive in many applications.
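In symbols, the generalized lasso problem the excerpt refers to is

$$\hat\beta = \arg\min_{\beta}\; \frac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert D\beta \rVert_1,$$

where the choice of the penalty matrix D determines the application: D = I recovers the usual lasso, while a first-difference matrix D yields the fused lasso and related trend-filtering problems.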

Book On Bayesian Regression Regularization Methods

Download or read book On Bayesian Regression Regularization Methods written by Qing Li. This book was released in 2010 with a total of 88 pages. Available in PDF, EPUB and Kindle. Book excerpt: Regression regularization methods are drawing increasing attention from statisticians as high-dimensional problems appear more and more frequently. Regression regularization achieves simultaneous parameter estimation and variable selection by penalizing the model parameters. In the first part of this thesis, we focus on the elastic net, a flexible regularization and variable selection method that uses a mixture of L1 and L2 penalties. It is particularly useful when there are many more predictors than observations. We propose a Bayesian method that solves the elastic net model using a Gibbs sampler. While the marginal posterior mode of the regression coefficients is equivalent to the estimates given by the non-Bayesian elastic net, the Bayesian elastic net has two major advantages. First, as a Bayesian method, it provides straightforward distributional results on the estimates, making statistical inference easier. Second, it chooses the two penalty parameters simultaneously, avoiding the "double shrinkage problem" of the elastic net method. Real data examples and simulation studies show that the Bayesian elastic net is comparable in prediction accuracy but performs better in variable selection. The second part of this thesis investigates Bayesian regularization in quantile regression. Quantile regression models the relationship between the response variable and covariates through the population quantiles of the response variable. By proposing a hierarchical model framework, we give a generic treatment to a set of regularization approaches, including the lasso, the elastic net and the group lasso. Gibbs samplers are derived for all cases. This is the first work to discuss regularized quantile regression with the elastic net penalty and the group lasso penalty. Both simulated and real data examples show that Bayesian regularized quantile regression methods often outperform both quantile regression without regularization and their non-Bayesian counterparts with regularization.
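For reference, the (non-Bayesian) elastic net criterion underlying the first part of the thesis is

$$\hat\beta = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2,$$

and the two penalty parameters λ1 and λ2 are exactly what the Bayesian formulation selects simultaneously via the Gibbs sampler.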