Statistical Methods for Large-Scale Multiple Testing Problems

Genetics

Statistical Methods for Large-scale Multiple Testing Problems

Book Details:

Author : Yu Gao
Publisher :
Release : 2019
ISBN :
Pages : 100 pages

Book excerpt: A large-scale multiple testing problem tests thousands or even millions of null hypotheses simultaneously, and such problems arise in many fields, for example genetics and astronomy. An error rate serves as a measure of the performance of a testing procedure. Controlling the family-wise error rate accommodates any dependence between hypotheses, but it is often overly conservative and has limited detection power. The false discovery rate is more powerful, but it is not as widely used, owing to its requirement of independence and other reasons. In this thesis, we develop statistical methods for large-scale multiple testing problems in pharmacovigilance and genetic studies, and adopt the false discovery rate to improve detection power by tackling mixed challenges.
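To make the contrast between family-wise error rate (FWER) and false discovery rate (FDR) control concrete, here is a minimal Python sketch, not code from the thesis. It compares the Bonferroni correction, which controls the FWER under arbitrary dependence, with the Benjamini–Hochberg (BH) step-up procedure, which controls the FDR under independence. The function names and the ten p-values are hypothetical, chosen only for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (FWER control under any dependence)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: find the largest k with p_(k) <= k * alpha / m and
    reject the k hypotheses with the smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))  # 1 2
```

On these made-up p-values, Bonferroni (threshold 0.005) rejects one hypothesis while BH rejects two, a small instance of the power gap the excerpt describes.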

Mathematics

Handbook of Multiple Comparisons

Book Details:

Author : Xinping Cui
Publisher : CRC Press
Release : 2021-11-18
ISBN : 0429633882
Pages : 418 pages

Book excerpt: Written by experts who include originators of some key ideas, the chapters of this handbook cover multiple comparison problems big and small, with guidance toward error rate control and insights on how principles developed earlier can be applied to current and emerging problems. Some highlights of the coverage are as follows. Error rate control is useful for controlling the rate of incorrect decisions. Chapter 1 introduces Tukey's original multiple comparison error rates and points to how they have been applied and adapted to modern multiple comparison problems, as discussed in later chapters. Principles endure. While the closed testing principle is more familiar, Chapter 4 shows that the partitioning principle can derive confidence sets for multiple tests, which may become important as the profession goes beyond making decisions based on p-values. Multiple comparisons of treatment efficacy often involve multiple doses and endpoints. Chapter 12, on multiple endpoints, explains how different choices of endpoint types lead to different multiplicity adjustment strategies, while Chapter 11, on the MCP-Mod approach, is particularly useful for dose finding. To assess efficacy in clinical trials with multiple doses and multiple endpoints, the reader can see the traditional approach in Chapter 2, the graphical approach in Chapter 5, and the multivariate approach in Chapter 3. Personalized/precision medicine based on targeted therapies, already a reality, naturally leads to analysis of efficacy in subgroups. Chapter 13 draws attention to subtle logical issues in inferences on subgroups and their mixtures, with a principled solution that resolves these issues. This chapter has implications for meeting the ICH E9(R1) estimands requirement.
Beyond the multiple testing methodology itself, the handbook also covers related topics such as the statistical task of model selection in Chapter 7 and the estimation of the proportion of true null hypotheses (or, equivalently, the signal prevalence) in Chapter 8. It also contains decision-theoretic considerations regarding the admissibility of multiple tests in Chapter 6. The issue of selective inference is addressed in Chapter 9. Comparison of responses can involve millions of voxels in medical imaging or SNPs in genome-wide association studies (GWAS). Chapters 14 and 15 provide state-of-the-art methods for large-scale simultaneous inference in these settings.
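The closed testing principle mentioned in this excerpt has a well-known shortcut: Holm's step-down procedure, which is what closed testing reduces to when every intersection hypothesis is tested with a Bonferroni test. The sketch below computes Holm-adjusted p-values in Python. It illustrates that standard construction and is not material from the handbook; the function name and inputs are hypothetical.

```python
def holm_adjusted(pvals):
    """Holm step-down adjusted p-values.

    Holm's procedure is the shortcut for closed testing when every
    intersection hypothesis is tested with a Bonferroni test.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank runs 0..m-1
        candidate = min(1.0, (m - rank) * pvals[i])  # multipliers m, m-1, ..., 1
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adj = holm_adjusted(pvals)
print([round(a, 6) for a in adj])  # [0.03, 0.06, 0.06, 0.02]
print([a <= 0.05 for a in adj])    # [True, False, False, True]
```

Rejecting every hypothesis whose adjusted p-value is at most 0.05 controls the FWER at 0.05; here the first and fourth hypotheses are rejected.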

Electronic dissertations

Large-scale Multiple Hypothesis Testing with Complex Data Structure

Book Details:

Author : Xiaoyu Dai
Publisher :
Release : 2018
ISBN :
Pages : 104 pages

Book excerpt: In the last decade, motivated by a variety of applications in medicine, bioinformatics, genomics, brain imaging, and other areas, a growing amount of statistical research has been devoted to large-scale multiple testing, where thousands or more tests are conducted simultaneously. However, because of the complexity of real data sets, the assumptions of many existing multiple testing procedures, e.g. that tests are independent and have continuous null distributions of p-values, may not hold. This limits their performance, leading to low detection power or an inflated false discovery rate (FDR). In this dissertation, we study how to better handle multiple testing problems under complex data structures. In Chapter 2, we study multiple testing with discrete test statistics. In Chapter 3, we study discrete multiple testing that incorporates prior ordering information. In Chapter 4, we study multiple testing under a complex dependency structure. We propose novel procedures for each scenario, based on the marginal critical functions (MCFs) of randomized tests, the conditional random field (CRF), or the deep neural network (DNN). The theoretical properties of our procedures are carefully studied, and their performance is evaluated through various simulations and real-data applications analyzing genetic data from next-generation sequencing (NGS) experiments.

Methods in Multiple Testing and Meta-analysis with Applications to the Analysis of Genomic Data

Book Details:

Author : Yihan Li
Publisher :
Release : 2014
ISBN :
Pages : 160 pages


Large-scale Simultaneous Hypothesis Testing

Book Details:

Author : Bradley Efron
Publisher :
Release : 2003
ISBN :
Pages : 22 pages


Mathematics

A Multiple Testing Approach to the Multivariate Behrens–Fisher Problem

Book Details:

Author : Tejas Desai
Publisher : Springer Science & Business Media
Release : 2013-02-26
ISBN : 1461464439
Pages : 60 pages

Book excerpt: In statistics, the Behrens–Fisher problem is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples. In his 1935 paper, Fisher outlined an approach to the Behrens–Fisher problem. Since high-speed computers were not available in Fisher’s time, this approach was not implementable and was soon forgotten. Fortunately, now that high-speed computers are available, this approach can easily be implemented on an ordinary desktop or laptop computer. Furthermore, Fisher’s approach was proposed for univariate samples, but it can also be generalized to the multivariate case. In this monograph, we present the solution to the aforementioned multivariate generalization of the Behrens–Fisher problem. We start by presenting a test of multivariate normality, proceed to test(s) of equality of covariance matrices, and end with our solution to the multivariate Behrens–Fisher problem. All methods proposed in this monograph cover both the randomly incomplete data case and the complete data case. Moreover, all methods considered in this monograph are tested using both simulations and examples.
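Fisher's fiducial solution and the monograph's multivariate method are not reproduced here. As a hedged point of reference for the univariate problem only, the sketch below computes Welch's t statistic with the Welch–Satterthwaite approximate degrees of freedom, which is a different, widely used approximate solution to the Behrens–Fisher problem. The function name and sample data are illustrative assumptions.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    independent samples whose variances are not assumed equal."""
    n1, n2 = len(x), len(y)
    v1 = statistics.variance(x) / n1  # estimated variance of mean(x)
    v2 = statistics.variance(y) / n2  # estimated variance of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
t, df = welch_t(x, y)
print(round(t, 3), round(df, 3))  # -1.897 5.882
```

The non-integer degrees of freedom reflect the unequal sample variances; the t statistic would then be compared with a t distribution on about 5.88 degrees of freedom.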

Mathematics

Resampling-Based Multiple Testing

Book Details:

Author : Peter H. Westfall
Publisher : John Wiley & Sons
Release : 1993-01-12
ISBN : 9780471557616
Pages : 382 pages

Book excerpt: Combines recent developments in resampling technology (including the bootstrap) with new methods for multiple testing that are easy to use, convenient to report and widely applicable. Software from SAS Institute is available to execute many of the methods and programming is straightforward for other applications. Explains how to summarize results using adjusted p-values which do not necessitate cumbersome table look-ups. Demonstrates how to incorporate logical constraints among hypotheses, further improving power.
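As a rough illustration of resampling-based adjusted p-values, the sketch below implements a single-step max-statistic permutation adjustment: each observed statistic is compared with the permutation distribution of the maximum statistic over all hypotheses, so dependence among the hypotheses is carried along by permuting labels jointly. It is a simplified stand-in, not the book's SAS-based methods. The test statistic (absolute difference in group means), the number of permutations, the seed, and the data are all assumptions made for illustration.

```python
import random


def abs_mean_diff(data, labels):
    """Absolute difference in group means (group 1 vs group 0) for each row."""
    stats = []
    for row in data:
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        g0 = [v for v, lab in zip(row, labels) if lab == 0]
        stats.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return stats


def max_stat_adjusted_pvalues(data, labels, n_perm=2000, seed=0):
    """Single-step max-statistic permutation adjustment.

    Adjusted p_j is the share of permutations whose maximum statistic over
    all rows is at least the observed statistic of row j; the +1 terms count
    the observed labeling once.
    """
    rng = random.Random(seed)
    observed = abs_mean_diff(data, labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # one shuffle is applied to every row, keeping row dependence
        max_stat = max(abs_mean_diff(data, perm))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]


labels = [0] * 5 + [1] * 5
data = [
    [0.1, -0.2, 0.3, 0.0, -0.1, 3.1, 2.9, 3.2, 2.8, 3.0],   # clear group shift
    [0.5, -0.4, 0.2, -0.1, 0.3, 0.1, -0.3, 0.4, 0.0, -0.2],  # noise
    [-0.2, 0.1, 0.0, 0.3, -0.1, 0.2, -0.2, 0.1, 0.0, -0.3],  # noise
]
adjusted = max_stat_adjusted_pvalues(data, labels)
print([round(p, 3) for p in adjusted])
```

With this toy data the shifted variable gets an adjusted p-value well below 0.05, while the two noise variables get 1.0, because the shifted variable dominates the maximum in every permutation.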

Large-Scale Global and Simultaneous Inference

Book Details:

Author : Tony Cai
Publisher :
Release : 2017
ISBN :
Pages :

Book excerpt: Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
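For the global testing problem described in this excerpt, one classical tool is Simes' test, which combines all m p-values into a single p-value for the global null that every hypothesis is null: with sorted p-values p_(1) <= ... <= p_(m), the Simes p-value is the minimum over i of m * p_(i) / i. It is valid under independence and certain forms of positive dependence. The sketch is illustrative only; the review may emphasize other global tests, for example ones tailored to sparse signals, and the function names and inputs are hypothetical.

```python
def simes_pvalue(pvals):
    """Simes' combined p-value for the global null hypothesis."""
    m = len(pvals)
    ordered = sorted(pvals)
    # The i = m term equals p_(m) <= 1, so no explicit cap at 1 is needed.
    return min(m * p / i for i, p in enumerate(ordered, start=1))


def bonferroni_global_pvalue(pvals):
    """Bonferroni global p-value: m times the smallest p-value, capped at 1."""
    return min(1.0, len(pvals) * min(pvals))


pvals = [0.02, 0.021, 0.022, 0.023]
print(round(simes_pvalue(pvals), 3), round(bonferroni_global_pvalue(pvals), 3))  # 0.023 0.08
```

When several p-values are moderately small, as in this made-up example, Simes' p-value (0.023) is noticeably smaller than the Bonferroni global p-value (0.08).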

A New Approach for Large-Scale Multiple Testing with Application to FDR Control for Graphically Structured Hypotheses

Book Details:

Author : Wenge Guo
Publisher :
Release : 2018
ISBN :
Pages : 37 pages


Mathematics

Large-Scale Inference

Book Details:

Author : Bradley Efron
Publisher : Cambridge University Press
Release : 2012-11-29
ISBN : 1139492136
Pages :

Book excerpt: We live in a new age for statistical inference, where modern scientific technology such as microarrays and fMRI machines routinely produce thousands and sometimes millions of parallel data sets, each with its own estimation or testing problem. Doing thousands of problems at once is more than repeated application of classical methods. Taking an empirical Bayes approach, Bradley Efron, inventor of the bootstrap, shows how information accrues across problems in a way that combines Bayesian and frequentist ideas. Estimation, testing and prediction blend in this framework, producing opportunities for new methodologies of increased power. New difficulties also arise, easily leading to flawed inferences. This book takes a careful look at both the promise and pitfalls of large-scale statistical inference, with particular attention to false discovery rates, the most successful of the new statistical techniques. Emphasis is on the inferential ideas underlying technical developments, illustrated using a large number of real examples.
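The empirical Bayes view of false discovery rates is usually phrased through the two-groups model: each z-value comes from a null density f0 with probability π0 or from a non-null density f1 otherwise, and the local false discovery rate is fdr(z) = π0 f0(z) / f(z), where f = π0 f0 + (1 - π0) f1. In practice f is estimated from all the z-values at once. The minimal sketch below instead assumes that f0 = N(0, 1), f1 = N(3, 1) and π0 = 0.9 are known, purely to show how the formula behaves; these values and the function names are hypothetical.

```python
import math


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def local_fdr(z, pi0=0.9, mu_alt=3.0):
    """Two-groups local fdr with KNOWN components (an illustrative assumption):
    null N(0, 1) with weight pi0, alternative N(mu_alt, 1) with weight 1 - pi0."""
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, mu=mu_alt)
    f = pi0 * f0 + (1 - pi0) * f1  # mixture density of all z-values
    return pi0 * f0 / f


print(round(local_fdr(0.0), 4), round(local_fdr(4.0), 4))  # 0.9988 0.005
```

A z-value near 0 is almost certainly null under these assumptions, while z = 4 has a local fdr of about 0.005.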

Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures

Book Details:

Author : Tony Cai
Publisher :
Release : 2017
ISBN :
Pages :

Book excerpt: Driven by a wide range of contemporary applications, statistical inference for covariance structures has been an active area of current research in high-dimensional statistics. This review provides a selective survey of some recent developments in hypothesis testing for high-dimensional covariance structures, including global testing for the overall pattern of the covariance structures and simultaneous testing of a large collection of hypotheses on the local covariance structures with false discovery proportion and false discovery rate control. Both one-sample and two-sample settings are considered. The specific testing problems discussed include global testing for the covariance, correlation, and precision matrices, and multiple testing for the correlations, Gaussian graphical models, and differential networks.

Large-Scale Multiple Testing for Data with Spatial Signals

Book Details:

Author : Yunda Zhong
Publisher :
Release : 2013
ISBN : 9781303005732
Pages : 107 pages

Book excerpt: This thesis consists of three projects. The abstracts for each project are listed below.

Multiple Testing in the Presence of Correlations

Book Details:

Author : Bhramori Banerjee
Publisher :
Release : 2011
ISBN :
Pages : 99 pages

Book excerpt: Simultaneous testing of multiple null hypotheses has now become an integral part of the statistical analysis of data arising from modern scientific investigations. Often the test statistics in such multiple testing problems are correlated. The research in this dissertation is motivated by the scope for improving or extending existing methods to incorporate correlation in the data. Sarkar (2008) proposes controlling the pairwise false discovery rate (Pairwise-FDR), which inherently takes into account the dependence among the p-values, thereby making it more robust, less conservative, and more powerful under dependence than the usual notion of FDR. In this dissertation, we further investigate the performance of Pairwise-FDR under a dependent mixture model. In particular, we consider a step-up method to control the Pairwise-FDR under this model, assuming that the correlation between any two p-values is the same (exchangeable). We also suggest improving this method by incorporating an estimate of the number of pairs of true null hypotheses developed under this model. Efron (2007, Journal of the American Statistical Association 102, 93-103) proposed a novel approach to incorporate dependence among the null p-values into a multiple testing method controlling false discoveries. In this dissertation, we investigate the scope for utilizing this approach by proposing alternative versions of the adaptive Bonferroni and BH methods that estimate the number of true null hypotheses from the empirical null distribution introduced by Efron. These newer adaptive procedures have been shown numerically to perform better than existing adaptive Bonferroni or BH methods over a wider range of dependence.
A gene expression microarray data set is used to highlight the difference in results obtained by applying the proposed and other adaptive BH methods. Another approach to addressing the presence of correlation is motivated by the scope for utilizing the dependence structure of the data to further improve some multiple testing methods while maintaining control of some error rate. The dependence structure of the data is incorporated using pairwise weights. In this dissertation, we propose a weighted version of the pairwise FDR (Sarkar, 2008) using pairwise weights, and a method controlling the weighted pairwise FDR. We discuss the application of such weighted procedures and suggest some weighting schemes that generate pairwise weights.
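The adaptive idea discussed in this excerpt, running BH at a level inflated by an estimate of the proportion of true nulls, can be sketched as below. Note the substitution: the dissertation estimates the number of true nulls from Efron's empirical null distribution, whereas this sketch uses the simpler λ-based estimator associated with Storey, in a "+1" finite-sample form. The tuning value λ = 0.5, the function names and the p-values are assumptions for illustration.

```python
def bh_reject(pvals, level):
    """Benjamini-Hochberg step-up at the given level; returns booleans."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * level / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


def storey_pi0(pvals, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam,
    which come mostly from nulls; the +1 keeps the estimate strictly positive."""
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, (n_above + 1) / (m * (1 - lam)))


def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """BH run at the inflated level alpha / pi0_hat."""
    return bh_reject(pvals, alpha / storey_pi0(pvals, lam))


pvals = [0.001, 0.004, 0.006, 0.01, 0.02, 0.028, 0.04, 0.6, 0.7, 0.9]
print(sum(bh_reject(pvals, 0.05)), storey_pi0(pvals), sum(adaptive_bh(pvals)))  # 6 0.8 7
```

Because the estimated null proportion is 0.8 rather than 1, the adaptive version runs BH at level 0.0625 and picks up one more rejection than plain BH at 0.05.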

Some New Developments on Multiple Testing Procedures

Book Details:

Author : Lilun Du
Publisher :
Release : 2015
ISBN :
Pages :

Book excerpt: In the context of large-scale multiple testing, hypotheses are often accompanied by certain prior information. In chapter 2, we present a single-index modulated multiple testing procedure, which maintains control of the false discovery rate while incorporating prior information, by assuming the availability of a bivariate p-value for each hypothesis. To find the optimal rejection region for the bivariate p-value, we propose a criterion based on the ratio of the probability density functions of the bivariate p-value under the true null and the non-null. In the bivariate normal setting, this criterion further motivates us to project the bivariate p-value to a single-index p-value, for a wide range of directions. The true null distribution of the single-index p-value is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction, we propose a new approach based on power comparison, which is further shown to be consistent under some mild conditions. Multiple testing based on chi-squared test statistics is commonly used in many scientific fields such as genomics research and brain imaging studies. However, the challenges associated with designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. In chapter 3, we propose a Factor Connected procedure to fill this gap. We first adopt a latent factor structure to construct a testing framework for approximating the false discovery proportion (FDP) for a large number of highly correlated chi-squared test statistics with finite degrees of freedom k.
The testing framework is then connected to simultaneously testing k linear constraints in a large-dimensional linear factor model involving some observable and unobservable common factors, resulting in a consistent estimator of the FDP based on the associated unadjusted p-values.

Computers

Phenotypes and Genotypes

Book Details:

Author : Florian Frommlet
Publisher : Springer
Release : 2016-02-12
ISBN : 1447153103
Pages : 232 pages

Book excerpt: This timely text presents a comprehensive guide to genetic association, a new and rapidly expanding field that aims to elucidate how our genetic code (genotypes) influences the traits we possess (phenotypes). The book provides a detailed review of methods of gene mapping used in association with experimental crosses, as well as genome-wide association studies. Emphasis is placed on model selection procedures for analyzing data from large-scale genome scans based on specifically designed modifications of the Bayesian information criterion. Features: presents a thorough introduction to the theoretical background to studies of genetic association (both genetic and statistical); reviews the latest advances in the field; illustrates the properties of methods for mapping quantitative trait loci using computer simulations and the analysis of real data; discusses open challenges; includes an extensive statistical appendix as a reference for those who are not totally familiar with the fundamentals of statistics.

Mathematics

Statistics for High-Dimensional Data

Book Details:

Author : Peter Bühlmann
Publisher : Springer Science & Business Media
Release : 2011-06-08
ISBN : 364220192X
Pages : 568 pages

Book excerpt: Modern statistics deals with large and complex data sets, and consequently with models containing a large number of parameters. This book presents a detailed account of recently developed approaches, including the Lasso and versions of it for various models, boosting methods, undirected graphical modeling, and procedures controlling false positive selections. A special characteristic of the book is that it contains comprehensive mathematical theory on high-dimensional statistics combined with methodology, algorithms and illustrations with real data examples. This in-depth approach highlights the methods’ great potential and practical applicability in a variety of settings. As such, it is a valuable resource for researchers, graduate students and experts in statistics, applied mathematics and computer science.

Improved Tools for Large-scale Hypothesis Testing

Book Details:

Author : Zihao Zheng
Publisher :
Release : 2022
ISBN :
Pages :

Book excerpt: Large-scale hypothesis testing, one of the key statistical tools, has been widely studied and applied to high-throughput bioinformatics experiments, such as high-density peptide array studies and brain imaging data sets. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by this, in this dissertation I develop several improved statistical and computational tools for large-scale hypothesis testing. The first method, MixTwice, advances an empirical Bayes tool that computes local false discovery rate statistics when provided with data on estimated effects and estimated standard errors. I also extend this method from two-group comparison problems to multiple-group comparison settings and develop a generalized method called MixTwice-ANOVA. The second method, GraphicalT, calculates local FDRs semiparametrically using available graph-associated information. In more detail, MixTwice introduces an empirical Bayes approach that involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Provided with the estimated effect sizes and estimated errors, MixTwice estimates the mixing distributions and calculates the local false discovery rates via nonparametric MLE and constrained optimization with a unimodal shape constraint on the effect distribution. Numerical experiments show that MixTwice can accurately estimate generative parameters and has good testing operating characteristics.
Applied to a high-density peptide array, it powerfully identifies non-null peptides to recover meaningful peptide markers when the underlying signal is weak, and has strong reproducibility properties when the underlying signal is strong. The second contribution of this dissertation generalizes MixTwice from scenarios comparing two conditions to scenarios comparing multiple groups. Like MixTwice, MixTwice-ANOVA takes the numerator and denominator statistics of the F test to estimate two underlying mixing distributions. In numerical experiments, MixTwice-ANOVA shows better power and FDR control than other large-scale testing tools for one-way ANOVA settings. Applied to a peptide array study comparing multiple Sjögren's disease (SjD) populations, the proposed approach discovers meaningful epitope structure and novel scientific findings on Sjögren's disease. Numerical experiments support evaluation among testing tools. Besides the methodological contribution of MixTwice to large-scale testing, I also discuss generalized evaluation and computational aspects. For the former, I propose an evaluation metric called reproducibility, in addition to FDR control, power, etc., to provide a practical guide for choosing among testing tools. For the latter, I borrow the idea of the pool-adjacent-violators algorithm (PAVA) and advance a computational algorithm called EM-PAVA to solve nonparametric MLE with an isotonic partial order constraint. This algorithm is examined in terms of theoretical guarantees and computational performance. The last contribution of this dissertation deals with large-scale testing problems with graph-associated data. Unlike many studies that incorporate graph-associated information through detailed modeling specifications, GraphicalT provides a semiparametric way to calculate local false discovery rates using an available auxiliary data graph.
The method shows good performance in synthetic examples and in a brain-imaging problem from the study of Alzheimer's disease.
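The pool-adjacent-violators algorithm (PAVA) that EM-PAVA borrows from solves least-squares regression under a monotone (isotonic) order constraint: it scans the data, and whenever two adjacent blocks violate the order it merges them and replaces them with their weighted mean. The minimal sketch below is the classical algorithm for a total, nondecreasing order only. It is not the dissertation's EM-PAVA, which handles nonparametric MLE under a partial order, and the function name and data are illustrative.

```python
def pava(y, w=None):
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)  # every point in a block gets the block mean
    return fitted


print(pava([1, 3, 2, 4, 3, 5]))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each violating pair (3, 2) and (4, 3) is pooled into its mean, and the result is the closest nondecreasing sequence in squared error.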