EBookClubs

Read Books & Download eBooks Full Online

Book Computational and Statistical Methods for Extracting Biological Signal from High Dimensional Microbiome Data

Download or read book Computational and Statistical Methods for Extracting Biological Signal from High Dimensional Microbiome Data written by Gibraan Rahman and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Next-generation sequencing (NGS) has effected an explosion of research into the relationship between genetic information and a variety of biological conditions. One of the most exciting areas of study is how the trillions of microbial species that we share this Earth with affect our health. However, the process of extracting useful biological insights from this breadth of data is far from trivial. There are numerous statistical and computational considerations in addition to the already complex and messy biological problems. In this thesis, I describe my work on developing and implementing software to tackle the complex world of statistical microbiome analysis. In the first part of this thesis, we review the applications and challenges of performing dimensionality reduction on microbiome data comprising thousands of microbial taxa. When dealing with this high dimensionality, it is imperative to be able to get an overview of the community structure in a lower dimensional space that can be both visualized and interpreted. We review the statistical considerations for dimensionality reduction and the existing tools and algorithms that can and cannot address them. This includes discussions about sparsity, compositionality, and phylogenetic signal. We also make recommendations about tools and algorithms to consider for different use-cases. In the second part of this thesis, we present a new software, Evident, designed to assist researchers with statistical analysis of microbiome effect sizes and power analysis. Effect sizes of statistical tests are not widely reported in microbiome datasets, limiting the interpretability of community differences such as alpha and beta diversity. 
As more large microbiome studies are produced, researchers have the opportunity to mine existing datasets to get a sense of the effect size for different biological conditions. These, in turn, can be used to perform power analysis prior to designing an experiment, allowing researchers to better allocate resources. We show how Evident is scalable to dozens of datasets and provides easy calculation and exploration of effect sizes and power analysis from existing data. In the third part of this thesis, we describe a novel investigation into the joint microbiome and metabolome axis in colorectal cancer. In most cases of sporadic colorectal cancers (CRC), tumorigenesis is a multistep process driven by genomic alterations in concert with dietary influences. In addition, mounting evidence has implicated the gut microbiome as an effector in the development and progression of CRC. While large meta-analyses have provided mechanistic insight into disease progression in CRC patients, study heterogeneity has limited causal associations. To address this limitation, multi-omics studies on genetically controlled cohorts of mice were performed to distinguish genetic and dietary influences. Diet was identified as the major driver of microbial and metabolomic differences, with reductions in alpha diversity and widespread changes in cecal metabolites seen in HFD-fed mice. Similarly, the levels of non-classic amino acid conjugated forms of the bile acid cholic acid (AA-CAs) increased with HFD. We show that these AA-CAs signal through the nuclear receptor FXR and membrane receptor TGR5 to functionally impact intestinal stem cell growth. In addition, the poor intestinal permeability of these AA-CAs supports their localization in the gut. Moreover, two cryptic microbial strains, Ileibacterium valens and Ruminococcus gnavus, were shown to have the capacity to synthesize these AA-CAs. 
This multi-omics dataset from CRC mouse models supports diet-induced shifts in the microbiome and metabolome in disease progression with potential utility in directing future diagnostic and therapeutic developments. In the fourth chapter, we demonstrate a new framework for performing differential abundance analysis using customized statistical modeling. As we learn more about the relationship between the microbiome and biological conditions, experimental protocols are becoming increasingly complex. For example, meta-analyses, interventions, and longitudinal studies are being used to better understand the dynamic nature of the microbiome. However, statistical methods to analyze these relationships are lacking, especially in the field of differential abundance. Finding biomarkers associated with conditions of interest must be performed with statistical care when dealing with these kinds of experimental designs. We present BIRDMAn, a software package integrating probabilistic programming with Stan to build custom models for analyzing microbiome data. We show that, on both simulated and real datasets, BIRDMAn is able to extract novel biological signals that are missed by existing methods. These chapters, taken together, advance our knowledge of statistical analysis of microbiome data and provide tools and references for researchers looking to perform analysis on their own data.

Book Statistical Analysis of Microbiome Data

Download or read book Statistical Analysis of Microbiome Data written by Somnath Datta and published by Springer Nature. This book was released on 2021-10-27 with total page 349 pages. Available in PDF, EPUB and Kindle. Book excerpt: Microbiome research has focused on microorganisms that live within the human body and their effects on health. During the last few years, the quantification of microbiome composition in different environments has been facilitated by the advent of high throughput sequencing technologies. The statistical challenges include computational difficulties due to the high volume of data; normalization and quantification of metabolic abundances, relative taxa and bacterial genes; high-dimensionality; multivariate analysis; the inherently compositional nature of the data; and the proper utilization of complementary phylogenetic information. This has resulted in an explosion of statistical approaches aimed at tackling the unique opportunities and challenges presented by microbiome data. This book provides a comprehensive overview of the state of the art in statistical and informatics technologies for microbiome research. In addition to reviewing demonstrably successful cutting-edge methods, particular emphasis is placed on examples in R that rely on available statistical packages for microbiome data. With its wide-ranging approach, the book benefits not only trained statisticians in academia and industry involved in microbiome research, but also other scientists working in microbiomics and in related fields.

Book Statistical and Computational Methods for Microbiome Multi Omics Data

Download or read book Statistical and Computational Methods for Microbiome Multi Omics Data written by Himel Mallick and published by Frontiers Media SA. This book was released on 2020-11-19 with total page 170 pages. Available in PDF, EPUB and Kindle. Book excerpt: This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Book High Dimensional Methods to Model Biological Signal in Genome Wide Studies

Download or read book High Dimensional Methods to Model Biological Signal in Genome Wide Studies written by Andrew J. Bass and published by . This book was released on 2021 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Recent advancements in sequencing technology have substantially increased the quality and quantity of data in genomics, presenting novel analytical challenges for biological discovery. In particular, foundational ideas developed in statistics over the past century are not easily extended to these high-dimensional datasets. Therefore, creating novel methodologies to analyze this data is a key challenge faced in statistics, and more generally, biology and computational science. Here I focus on building statistical methods for genome-wide analysis that are statistically rigorous, computationally fast, and easy to implement. In particular, I develop four methods that improve statistical inference of high-dimensional biological data. The first focuses on differential expression analysis where I extend the optimal discovery procedure (ODP) to complex study designs and RNA-seq studies. I find that the extended ODP leverages shared biological signal to substantially improve the statistical power compared to other commonly used testing procedures. The second aims to model the functional relationship between sequencing depth and statistical power in RNA-seq differential expression studies. The resulting model, superSeq, accurately predicts the improvement in statistical power when sequencing additional reads in a completed study. Thus superSeq can guide researchers in choosing a sufficient sequencing depth to maximize statistical power while avoiding unnecessary sequencing costs. The third method estimates the posterior distribution of false discovery rate (FDR) quantities, such as local FDRs and q-values, using a Bayesian nonparametric approach. 
Specifically, I implement an approximation to these posterior distributions that is scalable to genome-wide datasets using variational inference. These estimated posterior distributions are informative in a significance analysis as they capture the uncertainty of FDR quantities in reported results. Finally, I develop a likelihood-based approach to estimating unobserved population structure on the canonical parameter scale. I demonstrate that this framework can flexibly capture arbitrary structure and provide accurate allele frequency estimates while being computationally fast for large population genetic studies. Therefore, this framework is useful for many applications in population genetics, such as accounting for structure in the genome-wide association testing procedure GCATest. Collectively, these four methods address problems typically encountered in a biological analysis and can thus help improve downstream inferences in high-dimensional settings.

Book Computational Methods for the Analysis of Genomic Data and Biological Processes

Download or read book Computational Methods for the Analysis of Genomic Data and Biological Processes written by Francisco A. Gómez Vela and published by MDPI. This book was released on 2021-02-05 with total page 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

Book Statistical Methods for High Dimensional Data in Microbiome Research

Download or read book Statistical Methods for High Dimensional Data in Microbiome Research written by Sven Kleine Bardenhorst and published by . This book was released on 2024 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Applied Microbiome Statistics

Download or read book Applied Microbiome Statistics written by Yinglin Xia and published by CRC Press. This book was released on 2024-07-22 with total page 457 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book officially defines microbiome statistics as a specific new field of statistics and addresses the statistical analysis of correlation, association, interaction, and composition in microbiome research. It also defines the study of the microbiome as a hypothesis-driven experimental science and describes two microbiome research themes and six unique characteristics of microbiome data, as well as investigating challenges for statistical analysis of microbiome data using the standard statistical methods. This book is useful for researchers in biostatistics and ecology, and for data analysts.
  • Presents a thorough overview of statistical methods in microbiome statistics of parametric and nonparametric correlation, association, interaction, and composition adopted from classical statistics and ecology and specifically designed for microbiome research.
  • Performs step-by-step statistical analysis of correlation, association, interaction, and composition in microbiome data.
  • Discusses the issues of statistical analysis of microbiome data: high dimensionality, compositionality, sparsity, overdispersion, zero-inflation, and heterogeneity.
  • Investigates statistical methods on multiple comparisons and multiple hypothesis testing and applications to microbiome data.
  • Introduces a series of exploratory tools to visualize composition and correlation of microbial taxa by barplot, heatmap, and correlation plot.
  • Employs the Kruskal–Wallis rank-sum test to perform model selection for further multi-omics data integration.
  • Offers R code and the datasets from the authors’ real microbiome research and publicly available data for the analysis used.
  • Remarks on the advantages and disadvantages of each of the methods used.

Book Computational Methods for Microbiome Analysis

Download or read book Computational Methods for Microbiome Analysis written by Joao Carlos Setubal and published by Frontiers Media SA. This book was released on 2021-02-02 with total page 170 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Statistical Methods for Human Microbiome Data Analysis

Download or read book Statistical Methods for Human Microbiome Data Analysis written by Jun Chen and published by . This book was released on 2012 with total page 107 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Bioinformatic and Statistical Analysis of Microbiome Data

Download or read book Bioinformatic and Statistical Analysis of Microbiome Data written by Yinglin Xia and published by Springer Nature. This book was released on 2023-06-16 with total page 717 pages. Available in PDF, EPUB and Kindle. Book excerpt: This unique book addresses the bioinformatic and statistical modelling and also the analysis of microbiome data using cutting-edge QIIME 2 and R software. It covers core analysis topics in both bioinformatics and statistics, which provides a complete workflow for microbiome data analysis: from raw sequencing reads to community analysis and statistical hypothesis testing. It includes real-world data from the authors’ research and from the public domain, and discusses the implementation of QIIME 2 and R for data analysis step-by-step. The data as well as QIIME 2 and R computer programs are publicly available, allowing readers to replicate the model development and data analysis presented in each chapter so that these new methods can be readily applied in their own research. Bioinformatic and Statistical Analysis of Microbiome Data is an ideal book for advanced graduate students and researchers in the clinical, biomedical, agricultural, and environmental fields, as well as those studying bioinformatics, statistics, and big data analysis.

Book Statistical Methods for High Dimensional Count and Compositional Data with Applications to Microbiome Studies

Download or read book Statistical Methods for High Dimensional Count and Compositional Data with Applications to Microbiome Studies written by Yuanpei Cao and published by . This book was released on 2016 with total page 202 pages. Available in PDF, EPUB and Kindle. Book excerpt: Next generation sequencing (NGS) technologies make very large-scale studies of microbiomes possible without cultivation in vitro. One approach to sequencing-based microbiome studies is to sequence specific genes (often the 16S rRNA gene) to produce a profile of the diversity of bacterial taxa. Alternatively, the NGS-based sequencing strategy, also called shotgun metagenomics, provides further insights at the molecular level, such as species/strain quantification, gene function analysis and association studies. Such studies generate large-scale high-dimensional count and compositional data, which are the focus of this dissertation.

Book Statistical Methods for the Analysis of Microbiome Data

Download or read book Statistical Methods for the Analysis of Microbiome Data written by Anna M. Plantinga and published by . This book was released on 2018 with total page 128 pages. Available in PDF, EPUB and Kindle. Book excerpt: The human microbiome plays a vital role in maintaining health, and imbalances in the microbiome are associated with a wide variety of diseases. Understanding whether and how the microbiome is associated with particular health conditions is a focus of many modern microbiome studies, with the hope that a deeper understanding of these associations may lead to more effective prevention and treatment regimens. However, how best to analyze data from microbiome profiling studies remains unclear. The high dimensionality, compositional nature, intrinsic biological structure, and limited availability of samples pose substantial statistical challenges. To face these challenges, we propose novel analytic approaches based on sparse penalized regression strategies and distance-based global association analysis. Most distance-based methods for global microbiome association analysis are restricted to simple dichotomous or quantitative outcomes, but more complex outcomes are increasingly common in microbiome studies. In the first part of this dissertation, we introduce two distance-based methods for the analysis of entire microbial communities in modern microbiome studies. We develop a kernel machine regression-based score test for association between the microbiome and censored time-to-event outcomes. We then propose a novel longitudinal measure of dissimilarity that summarizes changes in the microbiome across time and compares these changes between subjects. Since this dissimilarity may be incorporated into any distance-based analysis framework, it is a highly flexible tool for applying a wide variety of distance-based analyses in longitudinal studies. 
Identification of associated taxa and detection of predictive microbial signatures are key to translation of microbiome studies. In the second part of this dissertation, we present two penalized regression methods for estimation and prediction with high-dimensional compositional data. Because phylogenetic similarity between bacteria often corresponds to shared functions, our first contribution is to incorporate phylogenetic structure into a penalized regression model for constrained data. We then propose a model that exploits phylogenetic structure to use partial information in the setting of differing feature sets between model-building and prediction datasets. We evaluate the performance of these methods through extensive simulation studies and apply them to studies investigating the association of graft-versus-host disease or body mass index with the gut microbiome.

Book Capturing Hidden Signals From High Dimensional Data and Applications to Genomics

Download or read book Capturing Hidden Signals From High Dimensional Data and Applications to Genomics written by Elior Rahmani and published by . This book was released on 2020 with total page 223 pages. Available in PDF, EPUB and Kindle. Book excerpt: The analysis of high-dimensional data, albeit challenging owing to various computational and statistical aspects, often provides opportunities to uncover hidden signals by leveraging inherent structure in the data. In the context of genomics, where molecular markers are probed at ever-increasing resolution and throughput, large sets of features that follow specific patterns, in conjunction with large sample sizes, allow us to implement richer and more sophisticated models than before in attempt to extract signal that is not immediately evident from the data. Particularly, genomic markers are often affected by multiple genetic and environmental factors, they may differ in their regulation and presentation in different tissues, cell types, conditions, or over time, and some markers may affect multiple biological processes; unveiling those signals is likely to be pivotal in advancing our understanding of complex biology and disease. This dissertation introduces novel computational methodologies and theory that address several key challenges faced in the analysis of high-dimensional genomic data coming from heterogeneous sources ("bulk" genomics) with a particular focus on DNA methylation data. Through a range of simulations and the analysis of multiple data sets, we demonstrate that our proposed methods provide opportunities to conduct powerful and statistically sound population-level studies at an unprecedented resolution and scale.

Book Statistical and Computational Methods for Analyzing High Throughput Genomic Data

Download or read book Statistical and Computational Methods for Analyzing High Throughput Genomic Data written by Jingyi Li and published by . This book was released on 2013 with total page 226 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the burgeoning field of genomics, high-throughput technologies (e.g. microarrays, next-generation sequencing and label-free mass spectrometry) have enabled biologists to perform global analysis on thousands of genes, mRNAs and proteins simultaneously. Extracting useful information from enormous amounts of high-throughput genomic data is an increasingly pressing challenge to statistical and computational science. In this thesis, I will address three problems in which statistical and computational methods were used to analyze high-throughput genomic data to answer important biological questions. The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with existing deterministic isoform assembly algorithms, SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms. 
Another advantage of SLIDE is its flexibility of incorporating other transcriptomic data into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected. This is a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage that mRNA levels explain in the variance of protein abundances. While the percentages estimated from the two approaches vary on different sets of genes, all suggest that previous protein-wide surveys have significantly underestimated the importance of transcription. 
In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. Understanding the similarity of gene expression patterns throughout their developmental time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells. 
Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells.

Book Big Data in Omics and Imaging

    Book Details:
  • Author : Momiao Xiong
  • Publisher : CRC Press
  • Release : 2021-06-30
  • ISBN : 9781032095981
  • Pages : 668 pages

Download or read book Big Data in Omics and Imaging written by Momiao Xiong and published by CRC Press. This book was released on 2021-06-30 with total page 668 pages. Available in PDF, EPUB and Kindle. Book excerpt: Big Data in Omics and Imaging: Association Analysis addresses the recent development of association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in that it presents both hypothesis testing and a data mining approach to holistically dissecting the genetic structure of complex traits and to designing efficient strategies for precision medicine. The general frameworks for association analysis and machine learning, developed in the text, can be applied to genomic, epigenomic and imaging data.
    Features:
  • Bridges the gap between the traditional statistical methods and computational tools for small genetic and epigenetic data analysis and the modern advanced statistical methods for big data
  • Provides tools for high dimensional data reduction
  • Discusses searching algorithms for model and variable selection including randomization algorithms, proximal methods and matrix subset selection
  • Provides real-world examples and case studies
  • Will have an accompanying website with R code
The book is designed for graduate students and researchers in genomics, bioinformatics, and data science. It represents the paradigm shift of genetic studies of complex diseases: from shallow to deep genomic analysis, from low-dimensional to high dimensional, multivariate to functional data analysis with next-generation sequencing (NGS) data, and from homogeneous populations to heterogeneous population and pedigree data analysis.
Topics covered are: advanced matrix theory, convex optimization algorithms, generalized low rank models, functional data analysis techniques, deep learning principle and machine learning methods for modern association, interaction, pathway and network analysis of rare and common variants, biomarker identification, disease risk and drug response prediction.

Book Microbiome Analysis

Download or read book Microbiome Analysis written by Robert G. Beiko and published by . This book was released on 2018 with total page 324 pages. Available in PDF, EPUB and Kindle. Book excerpt: