EBookClubs

Read Books & Download eBooks Full Online

Book Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models

Download or read book Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models written by Arun Krishna (publisher not listed). This book was released in 2009 with a total of 93 pages. Available in PDF, EPUB and Kindle. Book excerpt: Keywords: shrinkage techniques, powered correlation prior, Zellner's prior, mixed models.

Book Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models

Download or read book Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models written by Arun Krishna (publisher not listed). This book was released in 2004; the page count is not listed. Available in PDF, EPUB and Kindle. Book excerpt: KRISHNA, ARUN. Shrinkage-Based Variable Selection Methods for Linear Regression and Mixed-Effects Models. (Under the direction of Professors H.D. Bondell and S.K. Ghosh). In this dissertation we propose two new shrinkage-based variable selection approaches. We first propose a Bayesian selection technique for linear regression models, which allows highly correlated predictors to enter or exit the model simultaneously. The second variable selection method proposed is for linear mixed-effects models, where we develop a new technique to jointly select the important fixed and random effects parameters. We briefly summarize each of these methods below. The problem of selecting the correct subset of predictors within a linear model has received much attention in recent literature. Within the Bayesian framework, a popular choice of prior has been Zellner's g-prior, which is based on the inverse of the empirical covariance matrix of the predictors. We propose an extension of Zellner's g-prior which allows for a power parameter on the empirical covariance of the predictors. The power parameter helps control the degree to which correlated predictors are smoothed towards or away from one another. In addition, the empirical covariance of the predictors is used to obtain suitable priors over model space. In this manner, the power parameter also helps to determine whether models containing highly collinear predictors are preferred or avoided. The proposed power parameter can be chosen via an empirical Bayes method, which leads to a data-adaptive choice of prior. Simulation studies and a real data example are presented to show how the power parameter is well determined by the degree of cross-correlation within the predictors. The proposed modification compares favorably to the standard use of Zellner's prior and an intrinsic prior in these examples. We propose a new method of simultaneously identifying the important predictors that correspond to both the fixed and random effects.
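
The powered-covariance idea lends itself to a short illustration. Below is a minimal sketch, assuming a parameterization in which the prior covariance of the regression coefficients is proportional to g(X'X)^lam, so that lam = -1 recovers the usual Zellner g-prior and lam = 0 gives an independence (ridge-like) prior; the function name and defaults are illustrative assumptions, not the dissertation's code.

import numpy as np

def powered_correlation_prior_cov(X, lam, g=1.0):
    # Prior covariance for beta proportional to g * (X'X)^lam, computed
    # through the eigendecomposition of X'X. Under this assumed
    # parameterization, lam = -1 gives Zellner's g-prior and lam = 0 an
    # independence prior; intermediate powers control how strongly
    # correlated predictors are smoothed toward or away from each other.
    XtX = X.T @ X
    w, V = np.linalg.eigh(XtX)
    w = np.clip(w, 1e-12, None)          # guard against zero eigenvalues
    return g * (V @ np.diag(w ** lam) @ V.T)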

Book Methods for Interquantile Shrinkage and Variable Selection in Linear Regression Models

Download or read book Methods for Interquantile Shrinkage and Variable Selection in Linear Regression Models written by Liewen Jiang (publisher not listed). This book was released in 2012 with a total of 87 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Variable Selection by Regularization Methods for Generalized Mixed Models

Download or read book Variable Selection by Regularization Methods for Generalized Mixed Models written by Andreas Groll and published by Cuvillier Verlag. This book was released on 2011-12-13 with a total of 175 pages. Available in PDF, EPUB and Kindle. Book excerpt: A regression analysis describes the dependency of random variables in the form of a functional relationship. One distinguishes between the dependent response variable and one or more independent explanatory variables. There is a variety of model classes and inference methods available, ranging from the conventional linear regression model to recent non- and semiparametric regression models. The so-called generalized regression models form a methodically consistent framework incorporating many regression approaches with response variables that are not necessarily normally distributed, including the conventional linear regression model based on the normal distribution assumption as a special case. When repeated measurements are modeled, random effects or random coefficients can be included in addition to fixed effects; such models are known as random effects models or mixed models. As a consequence, regression procedures are extremely versatile and apply to very different problems. In this dissertation regularization techniques for generalized mixed models are developed that are able to perform variable selection. These techniques are especially appropriate when many potential predictor variables are present and existing approaches tend to fail. First, a componentwise boosting technique for generalized linear mixed models is presented which is based on the likelihood function and works by iteratively fitting the residuals using weak learners. The complexity of the resulting estimator is determined by information criteria. For the estimation of variance components two approaches are considered: an estimator resulting from maximizing the profile likelihood, and an estimator which can be calculated using an approximate EM algorithm. Then the boosting concept is extended to mixed models with ordinal response variables. Two different types of ordered models are considered: the threshold model, also known as the cumulative model, and the sequential model. Both are based on the assumption that the observed response variable results from a categorized version of a latent metric variable. Later in the thesis the boosting approach is extended to additive predictors. The unknown functions to be estimated are expanded in B-spline basis functions, whose smoothness is controlled by penalty terms. Finally, a suitable L1-regularization technique for generalized linear models is presented, which is based on a combination of Fisher scoring and gradient optimization. Extensive simulation studies and numerous applications illustrate the competitiveness of the methods constructed in this thesis compared to conventional approaches. For the calculation of standard errors, bootstrap methods are used.
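
The componentwise boosting step can be illustrated with a small sketch. The following is a simplified version for a plain linear predictor with squared-error loss and no random effects; the function name, step size, and fixed step count are illustrative assumptions, not the book's implementation.

import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    # Componentwise L2-boosting: at each step, the single predictor that
    # best fits the current residuals is updated by a small step nu.
    # Predictors never selected keep a zero coefficient, which is what
    # makes boosting perform variable selection.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        r = y - X @ beta                               # current residuals
        coefs = X.T @ r / np.sum(X**2, axis=0)         # per-column LS fits
        sse = [np.sum((r - X[:, j] * coefs[j])**2) for j in range(p)]
        j = int(np.argmin(sse))                        # best weak learner
        beta[j] += nu * coefs[j]                       # shrunken update
    return beta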

Book Methods and Applications of Longitudinal Data Analysis

Download or read book Methods and Applications of Longitudinal Data Analysis written by Xian Liu and published by Elsevier. This book was released on 2015-09-01 with a total of 531 pages. Available in PDF, EPUB and Kindle. Book excerpt: Methods and Applications of Longitudinal Data Analysis describes methods for the analysis of longitudinal data in the medical, biological and behavioral sciences. It introduces basic concepts and functions including a variety of regression models, and their practical applications across many areas of research. Statistical procedures featured within the text include: descriptive methods for delineating trends over time; linear mixed regression models with both fixed and random effects; covariance pattern models on correlated errors; generalized estimating equations; nonlinear regression models for categorical repeated measurements; and techniques for analyzing longitudinal data with non-ignorable missing observations. Emphasis is given to applications of these methods, using substantial empirical illustrations, designed to help users of statistics better analyze and understand longitudinal data. Methods and Applications of Longitudinal Data Analysis equips both graduate students and professionals to confidently apply longitudinal data analysis to their particular discipline. It also provides a valuable reference source for applied statisticians, demographers and other quantitative methodologists. From novice to professional: this book starts with the introduction of basic models and ends with the description of some of the most advanced models in longitudinal data analysis. It enables students to select the correct statistical methods to apply to their longitudinal data and avoid the pitfalls associated with incorrect selection, and it identifies the limitations of classical repeated measures models and describes newly developed techniques, along with real-world examples.
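
As a concrete illustration of the linear mixed models featured here, the following minimal sketch fits a random-intercept model to synthetic longitudinal data with statsmodels; the data-generating values and column names are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical longitudinal data: 50 subjects, 5 visits each.
rng = np.random.default_rng(0)
n_sub, n_vis = 50, 5
subject = np.repeat(np.arange(n_sub), n_vis)
time = np.tile(np.arange(n_vis), n_sub)
u = rng.normal(0, 1, n_sub)                       # random intercept per subject
y = 1.0 + 0.5 * time + u[subject] + rng.normal(0, 1, len(subject))
df = pd.DataFrame({"y": y, "time": time, "subject": subject})

# Linear mixed model: fixed effect for time, random intercept by subject.
fit = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit()
print(fit.summary())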

Book Subset Selection in Regression

Download or read book Subset Selection in Regression written by Alan Miller and published by CRC Press. This book was released on 2002-04-15 with a total of 258 pages. Available in PDF, EPUB and Kindle. Book excerpt: Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author ha...

Book Linear Mixed Model Selection Via Minimum Approximated Information Criterion

Download or read book Linear Mixed Model Selection Via Minimum Approximated Information Criterion written by Olivia Abena Atutey (publisher not listed). This book was released in 2020 with a total of 110 pages. Available in PDF, EPUB and Kindle. Book excerpt: The analyses of correlated, repeated measures, or multilevel data with a Gaussian response are often based on models known as linear mixed models (LMMs). LMMs are modeled using both fixed effects and random effects. The random intercepts (RI) and random intercepts and slopes (RIS) models are two special cases of LMMs considered here. Our primary focus in this dissertation is to propose an approach for simultaneous selection and estimation of fixed effects only in LMMs. This dissertation, inspired by recent research on methods and criteria for model selection, aims to extend a variable selection procedure referred to as the minimum approximated information criterion (MIC) of Su et al. (2018). Our contribution presents further use of the MIC for variable selection and sparse estimation in LMMs. Thus, we design a penalized log-likelihood procedure referred to as the minimum approximated information criterion for LMMs (lmmMAIC), which is used to find a parsimonious model that better generalizes data with a group structure. Our proposed lmmMAIC method enforces variable selection and sparse estimation simultaneously by adding a penalty term to the negative log-likelihood of the linear mixed model. The method differs from existing regularized methods mainly in the penalty parameter and the penalty function. With regard to the penalty function, the lmmMAIC mimics the traditional Bayesian information criterion (BIC)-based best subset selection (BSS) method but requires a continuous or smooth approximation to the L0 norm penalty of BSS. In this context, lmmMAIC performs sparse estimation by optimizing an approximated information criterion, which requires approximating the L0 norm penalty of BSS with a continuous unit dent function. A unit dent function, motivated by bump functions called mollifiers (Friedrichs, 1944), is an even continuous function with range [0, 1]. Among several unit dent functions, the hyperbolic tangent function is preferred. The approximation replaces the discrete L0 norm penalty of BSS with a continuous, smooth one, making our method less computationally expensive. Moreover, the hyperbolic tangent function has a simple form, and its derivatives are easy to compute. This shrinkage-based method fits a linear mixed model containing all p predictors instead of comparing and selecting a correct sub-model across 2^p candidate models. On this account, the lmmMAIC is feasible for high-dimensional data. The replacement, however, does not enforce sparsity, since the hyperbolic tangent function is not singular at the origin. To better handle this issue, a reparameterization trick on the regression coefficients is needed to achieve sparsity. For a finite number of parameters, numerical investigations by Shi and Tsai (2002) show that a traditional information criterion (IC)-based procedure like BIC can consistently identify the true model. Following these suggestions of consistent variable selection and computational efficiency, we maintain the fixed BIC penalty parameter.
Thus, our newly proposed procedure avoids frequently applied practices such as generalized cross-validation (GCV) for selecting an optimal penalty parameter in our penalized likelihood framework. The lmmMAIC enjoys less computational time compared to other regularization methods. We formulate the lmmMAIC procedure as a smooth optimization problem and solve for the fixed effects parameters by minimizing the penalized log-likelihood function. The implementation of the lmmMAIC involves an initial step of using the simulated annealing algorithm to obtain estimates. We then use these estimates as starting values for the modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is run until convergence. After this step, we plug the estimates obtained from the modified BFGS into the reparameterized hyperbolic tangent function to obtain our fixed effects estimates. Alternatively, the optimization of the penalized log-likelihood can be solved using generalized simulated annealing. Our research explores the performance and asymptotic properties of the lmmMAIC method by conducting extensive simulation studies with different model settings. The numerical results of our simulations for the proposed variable selection and estimation method are compared to standard LMM shrinkage-based methods such as the lasso, ridge, and elastic net. The results provide evidence that lmmMAIC is more consistent and efficient than the existing shrinkage-based methods under study. Furthermore, two applications with real-life examples are presented to evaluate the effectiveness of the lmmMAIC method.
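
The hyperbolic tangent surrogate is easy to sketch. Below is a minimal illustration of the general idea, assuming a sharpness constant a; the names and the simplified BIC-style objective are illustrative, not the dissertation's lmmMAIC code.

import numpy as np

def l0_tanh_penalty(beta, a=50.0):
    # Smooth surrogate for the L0 norm: tanh(a*|b|) rises from 0 to ~1,
    # so summing it over coefficients approximates the count of nonzeros.
    # Larger a gives a sharper approximation to the discrete L0 penalty.
    return np.sum(np.tanh(a * np.abs(beta)))

def maic_style_objective(beta, neg_loglik, n, a=50.0):
    # BIC-style penalized objective: -2*loglik + log(n) * smoothed model
    # size. The fixed BIC weight log(n) plays the role of the penalty
    # parameter, so no tuning by cross-validation is needed.
    return 2.0 * neg_loglik(beta) + np.log(n) * l0_tanh_penalty(beta, a)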

Book Two-Stage SCAD Lasso for Linear Mixed Model Selection

Download or read book Two-Stage SCAD Lasso for Linear Mixed Model Selection written by Mohammed A. Yousef (publisher not listed). This book was released in 2019 with a total of 116 pages. Available in PDF, EPUB and Kindle. Book excerpt: The linear regression model is the classical approach to explaining the relationship between a response variable (dependent) and predictors (independent). However, as the number of predictors in the data increases, the likelihood of correlation between predictors also increases, which is problematic. To address this, the linear mixed-effects model was proposed, consisting of a fixed effects term and a random effects term. The fixed effects term represents the traditional linear regression coefficients, and the random effects term represents values that are drawn randomly from the population. Thus, the linear mixed model allows us to represent the mean as well as the covariance structure of the data in a single model. When the fixed and random effects terms grow in dimension, selecting an appropriate model, i.e., the optimum fit, becomes increasingly difficult. Due to this natural complexity inherent in the linear mixed model, in this dissertation we propose a two-stage method for selecting the fixed and random effects terms. In the first stage, we select the most significant fixed effects in the model based on the conditional distribution of the response variable given the random effects. This is achieved by minimizing the penalized least squares estimator with a SCAD lasso penalty term. We use the Newton-Raphson optimization algorithm for the parameter estimation. In this process, the coefficients of the unimportant predictors shrink to exactly zero, eliminating noise from the model. Subsequently, in the second stage we choose the most important random effects by maximizing the penalized profile log-likelihood function. This maximization is again achieved using the Newton-Raphson optimization algorithm, and as in the first stage, the appended penalty term is the SCAD lasso. Unlike the fixed effects, the random effects are drawn randomly from the population; hence, they need to be predicted. This prediction is done by estimating the diagonal elements (variances) of the covariance structure of the random effects. Note that during this step, the variance components corresponding to all unimportant random effects shrink to exactly zero (similar to the shrinking of the fixed effects parameters in the first stage). This is how noise is eliminated from the model while retaining only significant effects, completing the selection of the random effects. In both stages of the proposed approach, it is shown that the selection of the effects through elimination is done with probability tending to one, indicating that the proposed method identifies all true effects, fixed as well as random. It is also shown that the proposed method satisfies the oracle properties, namely asymptotic normality and sparsity. At the end of these two stages, we have an optimal linear mixed model which can be readily applied to correlated data. To test the overall effectiveness of the proposed approach, four simulation studies are conducted; each scenario has a different number of subjects, a different number of observations per subject, and a different covariance structure under which the data are generated.
The simulation results illustrate that the proposed method can effectively select the fixed and random effects in the linear mixed model. In the simulations, the proposed method is also compared with other model selection methods, and the results show that it performs better in choosing the true model. Subsequently, two applications, the Amsterdam growth and health study data (Kemper, 1995) and the Messier 69 data from astronomy (Husband, 2017), are used to investigate how the proposed approach behaves with real-life data. In both applications, the proposed method is compared with other methods and proves more effective than its counterparts in identifying the appropriate mixed model.
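
For reference, the standard SCAD penalty of Fan and Li (2001), on which the first-stage criterion builds, can be written down in a few lines. This is the textbook penalty as a sketch, not the dissertation's exact "SCAD lasso" formulation.

import numpy as np

def scad_penalty(beta, lam, a=3.7):
    # SCAD penalty, applied coordinate-wise: behaves like the lasso near
    # zero, then tapers off so large coefficients are not over-shrunk.
    # a = 3.7 is the usual default tuning constant.
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    pen = np.where(small, lam * b,
          np.where(mid, (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
                   lam**2 * (a + 1) / 2))
    return np.sum(pen)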

Book Variable Selection Procedures for Generalized Linear Mixed Models in Longitudinal Data Analysis

Download or read book Variable Selection Procedures for Generalized Linear Mixed Models in Longitudinal Data Analysis written by an unlisted author (publisher not listed). This book was released in 2004; the page count is not listed. Available in PDF, EPUB and Kindle. Book excerpt: Model selection is important for longitudinal data analysis. But to date little work has been done on variable selection for generalized linear mixed models (GLMM). In this paper we propose and study a class of variable selection methods. A full likelihood (FL) approach is proposed for simultaneous model selection and parameter estimation. Due to the intensive computation involved in the FL approach, a Penalized Quasi-Likelihood (PQL) procedure is developed so that model selection in GLMMs can proceed in the framework of linear mixed models. Since the PQL approach produces biased parameter estimates for sparse binary longitudinal data, a Two-stage Penalized Quasi-Likelihood (TPQL) approach is proposed to bias-correct PQL in terms of estimation: use PQL to do model selection at the first stage and existing software to do parameter estimation at the second stage. A marginal approach for some special types of data is also developed. A robust estimator of the standard error for the fitted parameters is derived based on a sandwich formula. A bias correction is proposed to improve the estimation accuracy of PQL for binary data. The sampling performance of the four proposed procedures is evaluated through extensive simulations and their application to real data analysis. In terms of model selection, all of them perform closely. As for parameter estimation, FL, AML and TPQL yield similar results. Compared with FL, the other procedures greatly reduce the computational load. The proposed procedures can be extended to longitudinal data analysis involving missing data, and the shrinkage penalty based approach allows them to work even when the number of observations n is less than the number of parameters d.
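
The sandwich-formula idea behind the robust standard errors can be illustrated generically. The sketch below computes cluster-robust standard errors for a working-independence linear fit; this is the standard construction under those assumptions, not necessarily the paper's exact formula.

import numpy as np

def cluster_sandwich_se(X, resid, groups):
    # Sandwich variance V = B^{-1} M B^{-1} for clustered/longitudinal
    # data: bread B = X'X, meat M = sum over clusters of X_i' r_i r_i' X_i.
    bread = np.linalg.inv(X.T @ X)
    p = X.shape[1]
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        idx = groups == g
        s = X[idx].T @ resid[idx]        # cluster-level score contribution
        meat += np.outer(s, s)
    V = bread @ meat @ bread
    return np.sqrt(np.diag(V))           # robust standard errors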

Book Regression Modeling Strategies

Download or read book Regression Modeling Strategies written by Frank E. Harrell and published by Springer Science & Business Media. This book was released on 2013-03-09 with a total of 583 pages. Available in PDF, EPUB and Kindle. Book excerpt: Many texts are excellent sources of knowledge about individual statistical tools, but the art of data analysis is about choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for dealing with nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. This text realistically deals with model uncertainty and its effects on inference to achieve "safe data mining".

Book Multivariate Statistical Modelling Based on Generalized Linear Models

Download or read book Multivariate Statistical Modelling Based on Generalized Linear Models written by Ludwig Fahrmeir and published by Springer Science & Business Media. This book was released on 2013-11-11 with a total of 440 pages. Available in PDF, EPUB and Kindle. Book excerpt: Concerned with the use of generalised linear models for univariate and multivariate regression analysis, this is a detailed introductory survey of the subject, based on the analysis of real data drawn from a variety of subjects such as the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account.

Book Objets d'art et d'ameublement, Bijoux, Argenterie, Dessins anciens et gravures, Tableaux anciens et modernes, Mobiliers des XVIIIe et XIXe siècles, Tapisseries, Tapis

Download or read book Objets d'art et d'ameublement, Bijoux, Argenterie, Dessins anciens et gravures, Tableaux anciens et modernes, Mobiliers des XVIIIe et XIXe siècles, Tapisseries, Tapis written by an unlisted author (publisher not listed). This book was released in 1988 with a total of 89 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Joint Variable Selection for Data Envelopment Analysis Via Group Sparsity

Download or read book Joint Variable Selection for Data Envelopment Analysis Via Group Sparsity written by Zhiwei Qin (publisher not listed). This book was released in 2014 with a total of 28 pages. Available in PDF, EPUB and Kindle. Book excerpt: This study develops a data-driven group variable selection method for data envelopment analysis (DEA), a non-parametric linear programming approach to the estimation of production frontiers. The proposed method extends the group Lasso (least absolute shrinkage and selection operator), designed for variable selection on (often predefined) groups of variables in linear regression models, to DEA models. In particular, a special constrained version of the group Lasso with a loss function suited to variable selection in DEA models is derived and solved by a new tailored algorithm based on the alternating direction method of multipliers (ADMM). This study further conducts a thorough evaluation of the proposed method against two widely used variable selection methods -- the efficiency contribution measure (ECM) method and the regression-based (RB) test -- in DEA via Monte Carlo simulations. The simulation results show that our method provides more favorable performance compared with its benchmarks.
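
The group-sparsity machinery can be illustrated by its key primitive. The sketch below is the proximal operator of the group-lasso penalty (block soft-thresholding), the update an ADMM solver applies at each iteration; it is the generic operator, not the paper's tailored algorithm.

import numpy as np

def group_soft_threshold(v, groups, t):
    # Proximal operator of the group-lasso penalty: each coefficient
    # group is shrunk toward zero by t in Euclidean norm and zeroed out
    # entirely once its norm falls below t -- the group analogue of
    # scalar soft-thresholding.
    z = np.zeros_like(v)
    for g in groups:                      # groups: list of index arrays
        norm = np.linalg.norm(v[g])
        if norm > t:
            z[g] = (1 - t / norm) * v[g]
    return z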

Book Shrinkage Parameter Selection in Generalized Linear and Mixed Models

Download or read book Shrinkage Parameter Selection in Generalized Linear and Mixed Models written by Erin K. Melcon (publisher not listed). This book was released in 2014; the page count is not listed. Available in PDF, EPUB and Kindle. Book excerpt: Penalized likelihood methods such as the lasso, adaptive lasso, and SCAD have been highly utilized in linear models. Selection of the penalty parameter is an important step in modeling with penalized techniques. Traditionally, information criteria or cross-validation are used to select the penalty parameter. Although methods of selecting this parameter have been evaluated in linear models, generalized linear models and linear mixed models have not been so thoroughly explored. This dissertation introduces a data-driven bootstrap (Empirical Optimal Selection, or EOS) approach for selecting the penalty parameter with a focus on model selection. We implement EOS for selecting the penalty parameter in the case of the lasso and adaptive lasso. In generalized linear models we introduce the method, show simulations comparing EOS to information criteria and cross-validation, and give theoretical justification for this approach. We also consider a practical upper bound for the penalty parameter, with theoretical justification. In linear mixed models, we use EOS with two different objective functions: the traditional log-likelihood approach (which requires an EM algorithm), and a predictive approach. In both of these cases, we compare selecting the penalty parameter with EOS to selection with information criteria. Theoretical justification for both objective functions and a practical upper bound for the penalty parameter in the log-likelihood case are given. We also applied our technique to two datasets: the South African heart data (logistic regression) and the Yale infant data (a linear mixed model). For the South African data, we compare the final models obtained using EOS and information criteria via the mean squared prediction error (MSPE). For the Yale infant data, we compare our results to those obtained by Ibrahim et al. (2011).
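
The general flavor of bootstrap-based penalty selection is easy to sketch. The code below picks a lasso penalty by out-of-bag prediction error over bootstrap resamples; this is a generic illustration in the spirit of a data-driven bootstrap search, not the EOS objective from the dissertation, and the function name is hypothetical.

import numpy as np
from sklearn.linear_model import Lasso

def bootstrap_select_alpha(X, y, alphas, n_boot=100, seed=0):
    # For each candidate alpha, fit the lasso on bootstrap resamples and
    # score on the out-of-bag rows; return the alpha with the lowest
    # average out-of-bag squared error.
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = np.zeros(len(alphas))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)        # out-of-bag rows
        if oob.size == 0:
            continue
        for k, a in enumerate(alphas):
            fit = Lasso(alpha=a).fit(X[idx], y[idx])
            errs[k] += np.mean((y[oob] - fit.predict(X[oob]))**2)
    return alphas[int(np.argmin(errs))]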

Book Boosting Methods for Variable Selection in High Dimensional Sparse Models

Download or read book Boosting Methods for Variable Selection in High Dimensional Sparse Models written by an unlisted author (publisher not listed). This book was released in 2004; the page count is not listed. Available in PDF, EPUB and Kindle. Book excerpt: Firstly, we propose new variable selection techniques for regression in high dimensional linear models based on forward selection versions of the LASSO, adaptive LASSO, and elastic net, to be called the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST, and elastic FIRST, respectively. These methods seem to work better for an extremely sparse high dimensional linear regression model. We exploit the fact that the LASSO, adaptive LASSO and elastic net have closed-form solutions when the predictor is one-dimensional. The explicit formula is then repeatedly used in an iterative fashion until convergence occurs. By carefully considering the relationship between estimators at successive stages, we develop fast algorithms to compute our estimators. The performance of our new estimators is compared with commonly used estimators in terms of predictive accuracy and errors in variable selection. It is observed that our approach has better prediction performance for highly sparse high dimensional linear regression models. Secondly, we propose a new variable selection technique for binary classification in high dimensional models based on a forward selection version of the squared support vector machines or one-norm support vector machines, to be called the forward iterative selection and classification algorithm (FISCAL). This method seems to work better for a highly sparse high dimensional binary classification model. We suggest the squared support vector machines using the 1-norm and 2-norm simultaneously. The squared support vector machines are convex and differentiable except at zero when the predictor is one-dimensional. An iterative forward selection approach is then applied along with the squared support vector machines until a stopping rule is satisfied. Also, we develop a recursive algorithm for the FISCAL to save computational burden. We apply the processes to the original one-norm support vector machines. We compare the FISCAL with other widely used methods.
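
The closed-form one-dimensional solution that FIRST exploits is the scalar soft-thresholding rule. A minimal sketch, with illustrative function names:

import numpy as np

def soft_threshold(z, t):
    # Closed-form solution of the one-dimensional lasso problem
    #   argmin_b 0.5*(b - z)^2 + t*|b|.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def univariate_lasso(x, r, lam):
    # Lasso coefficient for a single predictor x against residual r:
    # the building block FIRST applies one predictor at a time,
    # iterating over coordinates until convergence.
    xx = x @ x
    return soft_threshold(x @ r / xx, lam / xx)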

Book Essays on Robust Model Selection and Model Averaging for Linear Models

Download or read book Essays on Robust Model Selection and Model Averaging for Linear Models written by Le Chang (publisher not listed). This book was released in 2017; the page count is not listed. Available in PDF, EPUB and Kindle. Book excerpt: Model selection is central to all applied statistical work. Selecting the variables for use in a regression model is one important example of model selection. This thesis is a collection of essays on robust model selection procedures and model averaging for linear regression models. In the first essay, we propose robust Akaike information criteria (AIC) for MM-estimation and an adjusted robust scale based AIC for M- and MM-estimation. Our proposed model selection criteria can maintain their robust properties in the presence of a high proportion of outliers and of outliers in the covariates. We compare our proposed criteria with other robust model selection criteria discussed in previous literature. Our simulation studies demonstrate that robust AIC based on MM-estimation significantly outperforms the alternatives in the presence of outliers in the covariates, and the real data example likewise shows better performance for robust AIC based on MM-estimation. The second essay focuses on robust versions of the "Least Absolute Shrinkage and Selection Operator" (lasso). The adaptive lasso is a method for performing simultaneous parameter estimation and variable selection. The adaptive weights used in its penalty term mean that the adaptive lasso achieves the oracle property. In this essay, we propose an extension of the adaptive lasso named the Tukey-lasso. By using Tukey's biweight criterion, instead of squared loss, the Tukey-lasso is resistant to outliers in both the response and covariates. Importantly, we demonstrate that the Tukey-lasso also enjoys the oracle property. A fast accelerated proximal gradient (APG) algorithm is proposed and implemented for computing the Tukey-lasso. Our extensive simulations show that the Tukey-lasso, implemented with the APG algorithm, achieves very reliable results, including for high-dimensional data where p>n. In the presence of outliers, the Tukey-lasso is shown to offer substantial improvements in performance compared to the adaptive lasso and other robust implementations of the lasso. Real data examples further demonstrate the utility of the Tukey-lasso. In many statistical analyses, a single model is used for statistical inference, ignoring the process that leads to the model being selected. To account for this model uncertainty, many model averaging procedures have been proposed. In the last essay, we propose an extension of a bootstrap model averaging approach, called bootstrap lasso averaging (BLA). BLA utilizes the lasso for model selection. This is in contrast to other forms of bootstrap model averaging that use AIC or the Bayesian information criterion (BIC). The use of the lasso improves the computation speed and allows BLA to be applied even when the number of variables p is larger than the sample size n. Extensive simulations confirm that BLA has outstanding finite sample performance, in terms of both variable selection and prediction accuracy, compared with traditional model selection and model averaging methods. Several real data examples further demonstrate an improved out-of-sample predictive performance of BLA.
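
Tukey's biweight criterion mentioned above has a compact form. Below is a minimal sketch of the loss with the common tuning constant; it shows the bounded-loss idea behind the Tukey-lasso, not the thesis code.

import numpy as np

def tukey_biweight_loss(r, c=4.685):
    # Tukey's biweight (bisquare) loss: quadratic-like near zero but
    # bounded at c^2/6 beyond |r| = c, so large residuals (outliers)
    # stop influencing the fit. c = 4.685 is the usual constant giving
    # 95% efficiency under Gaussian errors. Replacing squared loss with
    # this inside a lasso objective gives the Tukey-lasso idea.
    u = np.clip(np.abs(r) / c, None, 1.0)   # cap at 1 beyond the cutoff
    return np.sum((c**2 / 6) * (1 - (1 - u**2)**3))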