Statistical Methods for Genome Wide Association Studies on Biobank Data

Book Details:

Author : Christopher Austin German
Publisher :
Release : 2021
ISBN :
Pages : 162 pages

Download or read book Statistical Methods for Genome Wide Association Studies on Biobank Data written by Christopher Austin German and published by . This book was released on 2021 with total page 162 pages. Available in PDF, EPUB and Kindle. Book excerpt: Genome-Wide Association Studies (GWAS) encompass an important area of statistical genetics. They seek to identify single-nucleotide polymorphisms (SNPs) that are associated with a trait of interest. It is becoming more common for large-scale resources of patient data such as biobanks to become available to researchers that include both genetic data and phenotype data from electronic health records (EHR). New techniques for GWAS are necessary to handle both the large sample sizes and the types of complex data generated from these resources. The first chapter aims to tackle both of these issues by establishing an efficient method of conducting a genome-wide scan of SNPs associated with ordinal traits, which commonly occur from phenotyping algorithms for complex diseases. Chapter two focuses on estimating the effects of covariates on intra-individual variances in a framework that can scale to big longitudinal data. Within-subject variances of traits such as blood pressure have been found to be risk factors, independent of mean levels, for a variety of conditions such as cardiovascular disease. We develop a weighted method of moments (MoM) framework for fitting a mixed effects location-scale model that is robust to distributional assumptions and is computationally tractable for biobank-sized data sets. The third chapter uses the framework from the second chapter to develop and conduct large-scale GWAS, identifying variants associated with intra-individual variability of longitudinal traits. In all of these projects, a main focus is ensuring that the methods can scale to the large sample sizes common in biobank data sets.

Science

Statistical Methods Computing and Resources for Genome Wide Association Studies

Book Details:

Author : Riyan Cheng
Publisher : Frontiers Media SA
Release : 2021-08-24
ISBN : 2889712125
Pages : 148 pages

Medical

Methods in Statistical Genomics

Book Details:

Author : Philip Chester Cooley
Publisher : RTI Press
Release : 2016-08-29
ISBN : 1934831166
Pages : 163 pages

The objective of this book is to describe procedures for analyzing genome-wide association studies (GWAS). Some of the material is unpublished and contains commentary and unpublished research; other chapters (Chapters 4 through 7) have been published in other journals. Each previously published chapter investigates a different genomics model, but all focus on identifying the strengths and limitations of various statistical procedures that have been applied to different GWAS scenarios.

Statistical Methods for Genome wide Association Studies and Personalized Medicine

Book Details:

Author :
Publisher :
Release : 2014
ISBN :
Pages : 172 pages

In genome-wide association studies (GWAS), researchers analyze the genetic variation across the entire human genome, searching for variations that are associated with observable traits or certain diseases. There are several inference challenges, including the huge number of genetic markers to test, the weak association between truly associated markers and the traits, and the correlation structure between the genetic markers. We discuss the problem of high dimensional statistical inference, especially capturing the dependence among multiple hypotheses. Chapter 3 proposes a feature selection approach based on a unique graphical model which can leverage correlation structure among the markers. This graphical model-based feature selection approach significantly outperforms the conventional feature selection methods used in GWAS. Chapter 4 reformulates this feature selection approach as a multiple testing procedure that has many elegant properties, including controlling false discovery rate at a specified level and significantly improving the power of the tests. In order to relax the parametric assumption within the model, Chapter 5 further proposes a semiparametric graphical model which estimates f1 adaptively. These statistical methods are based on graphical models, and their parameter learning is difficult due to the intractable normalization constant. Capturing the hidden patterns and heterogeneity within the parameters is even harder. Chapters 6 and 7 discuss the problem of learning large-scale graphical models, especially dealing with issues of heterogeneous parameters and latently-grouped parameters.
Chapter 6 proposes a nonparametric approach which can adaptively integrate background knowledge about how the different parts of the graph can vary. For learning latently-grouped parameters in undirected graphical models, Chapter 7 imposes Dirichlet process priors over the parameters and estimates the parameters in a Bayesian framework. Chapter 8 explores the potential translation of GWAS discoveries to clinical breast cancer diagnosis. We discovered that, using SNPs known to be associated with breast cancer, we can better stratify patients and thereby significantly reduce false positives during breast cancer diagnosis, alleviating the risk of overdiagnosis. This result suggests that when radiologists are making medical decisions from mammograms (such as suggesting follow-up biopsies), they can consider these risky SNPs for more accurate decisions if the patients' genotype data are available.

Medical

Design Analysis and Interpretation of Genome Wide Association Scans

Book Details:

Author : Daniel O. Stram
Publisher : Springer Science & Business Media
Release : 2013-11-23
ISBN : 1461494435
Pages : 344 pages

This book presents the statistical aspects of designing, analyzing and interpreting the results of genome-wide association scans (GWAS) for genetic causes of disease using unrelated subjects. Particular detail is given to the practical aspects of employing the bioinformatics and data handling methods necessary to prepare data for statistical analysis. The goal in writing this book is to give statisticians, epidemiologists, and students in these fields the tools to design a powerful genome-wide study based on current technology. It also shows readers how to analyze the resulting study. Design and Analysis of Genome-Wide Association Studies provides a compendium of well-established statistical methods based upon single SNP associations. It also provides an introduction to more advanced statistical methods and issues. Because technology such as large scale SNP arrays is changing quickly, this text also offers lessons for future use with sequencing data. Its emphasis on statistical concepts that apply to the problem of finding disease associations, irrespective of the technology, ensures its future applicability. The author includes current bioinformatics tools while outlining additional issues and needs arising from the extensive databases from future large scale sequencing projects.

Medical

The Fundamentals of Modern Statistical Genetics

Book Details:

Author : Nan M. Laird
Publisher : Springer Science & Business Media
Release : 2010-12-13
ISBN : 1441973389
Pages : 226 pages

This book covers the statistical models and methods that are used to understand human genetics, following the historical and recent developments of human genetics. From Mendel’s first experiments to genome-wide association studies, the book describes how genetic information can be incorporated into statistical models to discover disease genes. All commonly used approaches in statistical genetics (e.g., aggregation analysis, segregation, linkage analysis) are covered, but the focus of the book is modern approaches to association analysis. Numerous examples of both Mendelian and complex genetic disorders illustrate key points throughout the text. The intended audience is statisticians, biostatisticians, epidemiologists and quantitatively oriented geneticists and health scientists wanting to learn about statistical methods for genetic analysis, whether to better analyze genetic data or to pursue research in methodology. A background in intermediate level statistical methods is required. The authors include few mathematical derivations, and the exercises provide problems for students with a broad range of skill levels. No background in genetics is assumed.

STATISTICAL METHOD of GENETIC ASSOCIATION STUDIES

Book Details:

Author :
Publisher :
Release : 2022
ISBN :
Pages : 0 pages

In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes in association studies produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In our research, we develop two novel methods to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. In the first method, we group the phenotypes into clusters using a hierarchical clustering method, then we merge the phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p-value of an association test between the merged phenotype and a SNP, which eliminates the inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p-value for all clusters to test the association between multiple phenotypes and a SNP. In the second method, we first construct a Multi-Layer Network (MLN) using all individuals with at least one case status among all phenotypes. Then, we introduce a computationally efficient community detection method to group phenotypes into disjoint clusters based on the MLN. The phenotypes in the same cluster are merged into a single phenotype, which largely eliminates the inflated type I error rate of the test for extremely unbalanced binary phenotypes. Finally, to test the association between all phenotypes and a SNP, we use the score test statistic to test the association between each merged phenotype and a SNP and then use the Omnibus test to obtain an overall p-value (MLN-O).
Extensive simulation studies reveal that the newly proposed approaches control type I error rates and are more powerful than the methods we compared them with. The real data analyses also show that our methods outperform those comparison methods.
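
The last step of the first method, combining the per-cluster p-values with the Cauchy combination method, can be sketched as follows. This is only a minimal illustration of the standard Cauchy combination statistic, not the authors' code: the function name `cauchy_combination`, the equal default weights, and the example p-values are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    Hypothetical helper for illustration; weights default to equal.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1

    # Each p-value is mapped to a standard Cauchy variate under the null.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )
    # Upper tail of the standard Cauchy distribution at the statistic.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Made-up per-cluster p-values, one per phenotype cluster.
    print(cauchy_combination([0.01, 0.5, 0.2]))
```

A single very small p-value dominates the statistic, which is why this kind of combination is often used when only a few clusters are expected to carry signal.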

Science

Assessing Gene Environment Interactions in Genome Wide Association Studies Statistical Approaches

Book Details:

Author : Philip C. Cooley
Publisher : RTI Press
Release : 2014-05-14
ISBN :
Pages : 24 pages

In this report, we address a scenario that uses synthetic genotype case-control data that is influenced by environmental factors in a genome-wide association study (GWAS) context. The precise way the environmental influence contributes to a given phenotype is typically unknown. Therefore, our study evaluates how to approach a GWAS that may have an environmental component. Specifically, we assess different statistical models in the context of a GWAS to make association predictions when the form of the environmental influence is questionable. We used a simulation approach to generate synthetic data corresponding to a variety of possible environmental-genetic models, including a “main effects only” model as well as a “main effects with interactions” model. Our method takes into account the strength of the association between phenotype and both genotype and environmental factors, but we focus on low-risk genetic and environmental risks that necessitate using large sample sizes (N = 10,000 and 200,000) to predict associations with high levels of confidence. We also simulated different Mendelian gene models, and we analyzed how the collection of factors influences statistical power in the context of a GWAS. Using simulated data provides a “truth set” of known outcomes such that the association-affecting factors can be unambiguously determined. We also test different statistical methods to determine their performance properties. Our results suggest that the chance of predicting an association in a GWAS is reduced if an environmental effect is present and the statistical model does not adjust for that effect.
This is especially true if the environmental effect and genetic marker do not have an interaction effect. The functional form of the statistical model also matters. The more accurately the form of the environmental influence is portrayed by the statistical model, the more accurate the prediction will be. Finally, even with very large sample sizes, association predictions involving recessive markers with low risk can be poor.
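
The kind of synthetic case-control data described above can be sketched in a few lines. The example below is a hedged illustration, not the report's simulation code: it assumes an additive genotype coding (0, 1 or 2 copies of the risk allele), a binary environmental exposure, and a logistic risk model. The function name `simulate_case_control` and every parameter value are invented for illustration. Setting `beta_ge` to zero gives a "main effects only" model; a non-zero value adds an interaction.

```python
import math
import random


def simulate_case_control(n, maf=0.3, exposure_rate=0.4,
                          beta0=-2.0, beta_g=0.2, beta_e=0.3, beta_ge=0.0,
                          seed=0):
    """Return n tuples (genotype, exposure, phenotype).

    Hypothetical sketch: all default values are illustrative assumptions.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    rows = []
    for _ in range(n):
        # Two independent allele draws, i.e. Hardy-Weinberg proportions.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        e = int(rng.random() < exposure_rate)
        # Logistic model with main effects and an optional G x E term.
        logit = beta0 + beta_g * g + beta_e * e + beta_ge * g * e
        risk = 1.0 / (1.0 + math.exp(-logit))
        y = int(rng.random() < risk)
        rows.append((g, e, y))
    return rows


if __name__ == "__main__":
    main_only = simulate_case_control(10_000, beta_ge=0.0, seed=1)
    with_interaction = simulate_case_control(10_000, beta_ge=0.3, seed=1)
    print(sum(y for _, _, y in main_only),
          sum(y for _, _, y in with_interaction))
```

Because the generating coefficients are known, data produced this way act as the "truth set" against which a fitted model can be judged.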

Statistical Methods for Aggregating Trans Ancestry Genome Wide Association Summary Statistics Its Applications to Nicotine Alcohol Addiction Phenotypes

Book Details:

Author : Xingyan Wang
Release : 2022

Large-scale biobank data is available due to the decreasing cost of sequencing in the past decades. This also allows studies to generate summary association statistics at the cohort level that can be shared with researchers while protecting study participants' privacy. Moreover, by aggregating effects across different studies, we can also increase the power of detecting novel genetic associations between variants and phenotypes of interest. Genome-wide association studies (GWAS) have also started to include samples of non-European ancestries. Genetic effects can differ between ancestries, which demands meta-analysis methods that better account for genetic effect heterogeneity in trans-ancestry meta-analysis. To address the trans-ancestry setting, we propose (1) a trans-ancestry meta-analysis method that uses ancestry effects derived from allele frequency to increase power; (2) a trans-ancestry fine-mapping method that can narrow down potential causal variants via statistical methods and quantify ancestry heterogeneity at each locus; (3) a local ancestry proportion estimation method based on summary statistics for the admixed cohort. We apply our methods to the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) study with four smoking-related traits and one drinking-related trait. We show that our methods outperform other methods through simulations and applied data analysis. We further extend our methods to investigate the cross-trait genetic architecture, which partitions genetic effects into a component invariant across ancestries, components that vary with ancestry, and a component that varies independently of ancestry.
We show that the genetic effect component invariant across ancestries shows the strongest cross-trait genetic correlations, demonstrating that pervasive pleiotropic effects are more likely to be shared across ancestries. We also apply the trans-ancestry TWAS method with GTEx data to discover novel associations at the gene level.

Science

Handbook of Statistical Genomics

Book Details:

Author : David J. Balding
Publisher : John Wiley & Sons
Release : 2019-09-10
ISBN : 1119429145
Pages : 1223 pages

A timely update of a highly popular handbook on statistical genomics. This new, two-volume edition of a classic text provides a thorough introduction to statistical genomics, a vital resource for advanced graduate students, early-career researchers and new entrants to the field. It introduces new and updated information on developments that have occurred since the 3rd edition. Widely regarded as the reference work in the field, it features new chapters focusing on statistical aspects of data generated by new sequencing technologies, including sequence-based functional assays. It expands on previous coverage of the many processes between genotype and phenotype, including gene expression and epigenetics, as well as metabolomics. It also examines population genetics and evolutionary models and inference, with new chapters on the multi-species coalescent, admixture and ancient DNA, as well as genetic association studies including causal analyses and variant interpretation. The Handbook of Statistical Genomics focuses on explaining the main ideas, analysis methods and algorithms, citing key recent and historic literature for further details and references. It also includes a glossary of terms, acronyms and abbreviations, and features extensive cross-referencing between chapters, tying the different areas together. With heavy use of up-to-date examples and references to web-based resources, this continues to be a must-have reference in a vital area of research.

Provides much-needed, timely coverage of new developments in this expanding area of study
Numerous, brand new chapters, for example covering bacterial genomics, microbiome and metagenomics
Detailed coverage of application areas, with chapters on plant breeding, conservation and forensic genetics
Extensive coverage of human genetic epidemiology, including ethical aspects
Edited by one of the leading experts in the field along with rising stars as his co-editors
Chapter authors are world-renowned experts in the field, and newly emerging leaders

The Handbook of Statistical Genomics is an excellent introductory text for advanced graduate students and early-career researchers involved in statistical genetics.

Medical

Analysis of Complex Disease Association Studies

Book Details:

Author : Eleftheria Zeggini
Publisher : Academic Press
Release : 2010-11-17
ISBN : 0123751438
Pages : 353 pages

According to the National Institutes of Health, a genome-wide association study is defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease or condition. Whole genome information, when combined with clinical and other phenotype data, offers the potential for increased understanding of basic biological processes affecting human health, improvement in the prediction of disease and patient care, and ultimately the realization of the promise of personalized medicine. In addition, rapid advances in understanding the patterns of human genetic variation and maturing high-throughput, cost-effective methods for genotyping are providing powerful research tools for identifying genetic variants that contribute to health and disease. This burgeoning science merges the principles of statistics and genetics to make sense of the vast amounts of information available with the mapping of genomes. In order to make the most of the information available, statistical tools must be tailored and translated for the analytical issues that are unique to large-scale association studies. Analysis of Complex Disease Association Studies will provide researchers who have advanced biological knowledge and are entering the field of genome-wide association studies with the groundwork to apply statistical analysis tools appropriately and effectively. With the use of consistent examples throughout the work, chapters will provide readers with best practice for getting started (design), analyzing, and interpreting data according to their research interests.
Frequently used tests will be highlighted, and a critical analysis of their advantages and disadvantages, complemented by case studies for each, will provide readers with the information they need to make the right choice for their research. Additional tools including links to analysis tools, tutorials, and references will be available electronically to ensure the latest information is available.

Easy access to key information including advantages and disadvantages of tests for particular applications, identification of databases, languages and their capabilities, data management risks, frequently used tests
Extensive list of references including links to tutorial websites
Case studies and Tips and Tricks

STATISTICAL METHODS FOR JOINT ANALYSIS OF MULTIPLE PHENOTYPES AND THEIR APPLICATIONS FOR PHEWAS

Book Details:

Author :
Publisher :
Release : 2019
ISBN :
Pages : pages

Genome-wide association studies (GWAS) have successfully detected tens of thousands of robust SNP-trait associations. Earlier research has primarily focused on association studies of genetic variants and some well-defined functions or phenotypic traits. Emerging evidence suggests that pleiotropy, the phenomenon of one genetic variant affecting multiple phenotypes, is widespread, especially in complex human diseases. Therefore, individual phenotype analyses may lose statistical power to identify the underlying genetic mechanism. In contrast to single phenotype analyses, joint analysis of multiple phenotypes exploits the correlations between phenotypes and aggregates multiple weak marginal effects, and is therefore likely to provide new insights into the functional consequences of genetic variations. This dissertation includes two papers, corresponding to two primary research projects I have done during my Ph.D. study, each presented in its own chapter. Chapter 1 proposes an innovative method, referred to as HC-CLC, for joint analysis of multiple phenotypes using a Hierarchical Clustering (HC) approach followed by a Clustering Linear Combination (CLC) method. The HC step partitions phenotypes into clusters. The CLC method is then used to test the association between the genetic variant and all phenotypes, which is done by combining individual test statistics while taking full advantage of the clustering information in the HC step. Extensive simulations together with the COPDGene data analysis have been used to assess the Type I error rates and the power of our proposed method.
Our simulation results demonstrate that the Type I error rates of HC-CLC are effectively controlled in different realistic settings. HC-CLC either outperforms all other methods or has statistical power that is very close to the most powerful alternative method with which it has been compared. In addition, our real data analysis shows that HC-CLC is an appropriate method for GWAS. Chapter 2 redesigns PheCLC (a phenome-wide association study that uses the CLC method), which was previously developed by our research group. The refined method is then applied to the UK Biobank data, a large cohort study across the United Kingdom, to test the validity and understand the limitations of the proposed method. We have named our new method UKB-PheCLC. The UKB-PheCLC method is an EHR-based PheWAS. In the first step, it classifies the whole phenome into different phenotypic categories according to the UK Biobank ICD codes. In the second step, the CLC method is applied to each phenotypic category to derive a CLC-based p-value for testing the association between the genetic variant of interest and all phenotypes in that category. In the third step, the CLC-based p-values of all categories are combined using a strategy resembling that of the Adaptive Fisher's Combination (AFC) method. Overall, UKB-PheCLC harnesses the powerful resource of the UK Biobank and considers the possibility that phenotypes can be grouped into different phenotypic categories, which is very common in EHR-based PheWAS. Moreover, UKB-PheCLC can handle both qualitative and quantitative phenotypes, and it does not require raw phenotype information. The real data analysis results confirm that UKB-PheCLC is more powerful than the existing methods we have compared it with. Thus, UKB-PheCLC can serve as a compelling method for phenome-wide association studies.
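
The third step combines one p-value per phenotypic category into an overall p-value. As a point of reference only, the sketch below implements the classical, non-adaptive Fisher combination, which assumes the p-values are independent. The adaptive variant that the abstract refers to is not reproduced here, and the function name `fisher_combination` and the example inputs are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination of independent p-values.

    Hypothetical helper for illustration; not the AFC procedure.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Under the null, -2 * sum(log p) follows a chi-square with 2k d.o.f.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for even degrees of freedom (2k):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    tail = 1.0
    for i in range(1, k):
        term *= half / i
        tail += term
    return math.exp(-half) * tail


if __name__ == "__main__":
    # Made-up category-level p-values.
    print(fisher_combination([0.04, 0.20, 0.65]))
```

The closed-form tail sum avoids a dependency on a statistics library, at the cost of only handling the even degrees of freedom that Fisher's statistic always has.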

Medical

Genetic Dissection of Complex Traits

Book Details:

Author : D.C. Rao
Publisher : Academic Press
Release : 2008-04-23
ISBN : 0080569110
Pages : 788 pages

The field of genetics is rapidly evolving and new medical breakthroughs are occurring as a result of advances in knowledge of genetics. This series continually publishes important reviews of the broadest interest to geneticists and their colleagues in affiliated disciplines.

Five sections on the latest advances in complex traits
Methods for testing with ethical, legal, and social implications
Hot topics include discussions on systems biology approach to drug discovery; using comparative genomics for detecting human disease genes; computationally intensive challenges, and more

Modeling Biological Processes in Genome wide Association Studies Using Regularized Regression

Book Details:

Author : Gabriel Hoffman
Publisher :
Release : 2013
ISBN :
Pages : 336 pages

Genome-wide association studies (GWAS) have become a widely adopted approach to identify genetic variation that produces variation in complex phenotypes. Standard statistical methods are able to identify strong associations in these datasets, but more sophisticated statistical methods that model complex aspects of the biological data can identify weaker associations and further elucidate the underlying molecular biology. We develop and apply statistical methods that explicitly model two aspects of GWAS data using two complementary forms of regularized regression. First, we model the polygenic architecture of complex phenotypes using feature selection methods in a penalized regression framework. We propose novel algorithmic, computational and heuristic approaches in order to produce a method that scales to high dimensional GWAS data and increases power to detect weak associations that are not detectable by standard tests. Second, we model the covariance between individuals due to kinship and population structure using a linear mixed model that regularizes the statistical contribution of a metric of ancestry. Linear mixed models have been widely adopted for analysis of GWAS data, but their theoretical properties have not been examined in this context. We formalize the statistical properties of the linear mixed model, develop a novel interpretation in relation to population genetics, and propose a novel low rank linear mixed model that learns the dimensionality of the correction for kinship and population structure from the data. Finally, we combine these two complementary regularized regression models into a penalized linear mixed model.
We develop a unified model incorporating a novel algorithm with novel approaches to tuning nonconvex penalties and determining the optimal stopping point in the regularization path. Leveraging recent work on assessing significance of selected features, we produce a well-principled and scalable statistical method applicable to feature selection, hypothesis testing and prediction in many contexts.

Statistical Methods for Genome Wide Association Studies

Book Details:

Author : Malin Östensson
Publisher :
Release : 2012
ISBN : 9789173857420
Pages : 181 pages

Electronic books

Statistical Methods for Integrative Analysis of Genomic Data

Book Details:

Author : Jingsi Ming
Publisher :
Release : 2018
ISBN :
Pages : 141 pages

Download or read book Statistical Methods for Integrative Analysis of Genomic Data written by Jingsi Ming and published by . This book was released on 2018 with total page 141 pages. Available in PDF, EPUB and Kindle. Book excerpt: Thousands of risk variants underlying complex phenotypes (quantitative traits and diseases) have been identified in genome-wide association studies (GWAS). However, there are still several challenges towards deepening our understanding of the genetic architectures of complex phenotypes. First, the majority of GWAS hits are in non-coding region and their biological interpretation is still unclear. Second, most complex traits are suggested to be highly polygenic, i.e., they are affected by a vast number of risk variants with individually small or moderate effects, whereas a large proportion of risk variants with small effects remain unknown. Third, accumulating evidence from GWAS suggests the pervasiveness of pleiotropy, a phenomenon that some genetic variants can be associated with multiple traits, but there is a lack of unified framework which is scalable to reveal relationship among a large number of traits and prioritize genetic variants simultaneously with functional annotations integrated. In this thesis, we propose two statistical methods to address these challenges using integrative analysis of summary statistics from GWASs and functional annotations. In the first part, we propose a latent sparse mixed model (LSMM) to integrate functional annotations with GWAS data. Not only does it increase the statistical power of identifying risk variants, but also offers more biological insights by detecting relevant functional annotations. To allow LSMM scalable to millions of variants and hundreds of functional annotations, we developed an efficient variational expectation-maximization (EM) algorithm for model parameter estimation and statistical inference. 
We first conducted comprehensive simulation studies to evaluate the performance of LSMM. Then we applied it to analyze 30 GWASs of complex phenotypes integrated with nine genic category annotations and 127 cell-type-specific functional annotations from the Roadmap project. The results demonstrate that our method has more statistical power than conventional methods and can help researchers achieve a deeper understanding of the genetic architecture of these complex phenotypes. In the second part, we propose a latent probit model (LPM), which combines summary statistics from multiple GWASs with functional annotations to characterize relationships among traits and increase the statistical power to identify risk variants. LPM can also perform hypothesis tests for pleiotropy and annotation enrichment. To keep LPM scalable as the number of GWASs increases, we developed an efficient parameter-expanded EM (PX-EM) algorithm that can run in parallel. We first validated the performance of LPM through comprehensive simulations, then applied it to analyze 44 GWASs with nine genic category annotations. The results demonstrate the benefits of LPM and can offer new insights into disease etiology.

Assessing Gene-Environment Interactions in Genome-Wide Association Studies

Book Details:

Author : Philip Chester Cooley
Publisher :
Release : 2014
ISBN :
Pages : 20 pages

Download or read book Assessing Gene-Environment Interactions in Genome-Wide Association Studies written by Philip Chester Cooley. This book was released in 2014 and has 20 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this report, we address a scenario that uses synthetic genotype case-control data influenced by environmental factors in a genome-wide association study (GWAS) context. The precise way the environmental influence contributes to a given phenotype is typically unknown. Therefore, our study evaluates how to approach a GWAS that may have an environmental component. Specifically, we assess different statistical models in the context of a GWAS to make association predictions when the form of the environmental influence is uncertain. We used a simulation approach to generate synthetic data corresponding to a variety of possible environmental-genetic models, including a "main effects only" model as well as a "main effects with interactions" model. Our method takes into account the strength of the association between phenotype and both genotype and environmental factors, but we focus on low-risk genetic and environmental factors, which necessitate large sample sizes (N = 10,000 and 200,000) to predict associations with high levels of confidence. We also simulated different Mendelian gene models, and we analyzed how the collection of factors influences statistical power in the context of a GWAS. Using simulated data provides a "truth set" of known outcomes such that the association-affecting factors can be unambiguously determined. We also test different statistical methods to determine their performance properties. Our results suggest that the chances of predicting an association in a GWAS are reduced if an environmental effect is present and the statistical model does not adjust for that effect. This is especially true if the environmental effect and genetic marker do not have an interaction effect.
The functional form of the statistical model also matters. The more accurately the form of the environmental influence is portrayed by the statistical model, the more accurate the prediction will be. Finally, even with very large sample sizes, association predictions involving recessive markers with low risk can be poor.
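
The kind of synthetic case-control data described in this excerpt can be illustrated with a minimal sketch. This is not the report's simulator: the function name `simulate_case_control`, the logistic form of the risk model, the additive genotype coding, the 50% exposure frequency, and all parameter names are assumptions chosen for illustration only. Setting `beta_ge` to zero gives a "main effects only" model; a nonzero value gives a "main effects with interactions" model.

```python
import math
import random


def simulate_case_control(n, maf, beta_g, beta_e, beta_ge,
                          intercept=-2.0, seed=0):
    """Return n (genotype, exposure, phenotype) triples.

    Hypothetical illustration: risk follows a logistic model with a
    genetic main effect, an environmental main effect, and an optional
    gene-environment interaction term.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Genotype = number of minor alleles (0, 1, or 2), drawn as two
        # independent alleles, i.e. under Hardy-Weinberg equilibrium.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        # Binary environmental exposure; the 0.5 frequency is an assumption.
        e = 1 if rng.random() < 0.5 else 0
        # Log-odds of being a case.
        logit = intercept + beta_g * g + beta_e * e + beta_ge * g * e
        p_case = 1.0 / (1.0 + math.exp(-logit))
        y = 1 if rng.random() < p_case else 0
        rows.append((g, e, y))
    return rows


# "Main effects only" model: the interaction coefficient is zero.
main_only = simulate_case_control(10_000, maf=0.3, beta_g=0.2,
                                  beta_e=0.5, beta_ge=0.0, seed=1)

# "Main effects with interactions" model: nonzero interaction coefficient.
with_interaction = simulate_case_control(10_000, maf=0.3, beta_g=0.2,
                                         beta_e=0.5, beta_ge=0.3, seed=1)
```

Because the generating coefficients are known, data produced this way act as a "truth set": any fitted model can be checked against the values that were actually used to create the phenotypes.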