Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data

Book Details:

Author : Dylan Maxwell Cable
Publisher :
Release : 2020
ISBN :
Pages : 39 pages

Download or read book Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data written by Dylan Maxwell Cable and published by . This book was released on 2020 with total page 39 pages. Available in PDF, EPUB and Kindle. Book excerpt: Spatial transcriptomic technologies measure gene expression at increasing spatial resolution, approaching individual cells. One limitation of current technologies is that spatial measurements may contain contributions from multiple cells, hindering the discovery of cell type-specific spatial patterns of localization and expression. In this thesis, I will explore the development of Robust Cell Type Decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA sequencing data to decompose mixtures, such as those observed in spatial transcriptomic technologies. Our RCTD approach accounts for platform effects introduced by systematic technical variability inherent to different sequencing modalities. We demonstrate RCTD provides substantial improvement in cell type assignment in Slide-seq data by accurately reproducing known cell type and subtype localization patterns in the cerebellum and hippocampus. We further show the advantages of RCTD by its ability to detect mixtures and identify cell types on an assessment dataset. Finally, we show how RCTD’s recovery of cell type localization uniquely enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD has the potential to enable the definition of spatial components of cellular identity, uncovering new principles of cellular organization in biological tissue.
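
The idea of decomposing a spatial measurement into contributions from known cell type profiles can be illustrated with a small sketch. This is not RCTD itself: the excerpt does not describe its statistical model, so the code below swaps in plain non-negative least squares, solved by projected gradient descent, as an assumed stand-in. The function name `decompose_spot` and the toy profiles are hypothetical.

```python
def decompose_spot(profiles, spot, n_iter=2000):
    """Estimate cell type proportions for one spatial measurement.

    profiles: list of K lists, each holding the mean expression of G genes
              for one cell type (e.g. learned from single-cell data).
    spot:     list of G observed expression values for one spot.
    Returns K non-negative weights normalized to sum to 1.
    Assumption: simple least squares, not the model used by RCTD.
    """
    n_types = len(profiles)
    n_genes = len(spot)
    # The sum of squared profile entries bounds the largest eigenvalue of
    # P^T P, so its reciprocal is a safe gradient step size.
    step = 1.0 / sum(v * v for p in profiles for v in p)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        fitted = [sum(weights[k] * profiles[k][g] for k in range(n_types))
                  for g in range(n_genes)]
        residual = [fitted[g] - spot[g] for g in range(n_genes)]
        grad = [sum(profiles[k][g] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, weights[k] - step * grad[k]) for k in range(n_types)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy example: a spot that is a 70/30 mixture of two hypothetical cell types.
type_a = [10.0, 0.0, 1.0]
type_b = [0.0, 8.0, 1.0]
mixed_spot = [7.0, 2.4, 1.0]
proportions = decompose_spot([type_a, type_b], mixed_spot)
print([round(w, 3) for w in proportions])  # approximately [0.7, 0.3]
```

A real analysis would also need to handle count noise and the platform effects mentioned in the excerpt, which this sketch ignores.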

Mathematics

Spatial Analysis Along Networks

Book Details:

Author : Atsuyuki Okabe
Publisher : John Wiley & Sons
Release : 2012-08-13
ISBN : 0470770813
Pages : 0 pages

Download or read book Spatial Analysis Along Networks written by Atsuyuki Okabe and published by John Wiley & Sons. This book was released on 2012-08-13 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the real world, there are numerous and various events that occur on and alongside networks, including the occurrence of traffic accidents on highways, the location of stores alongside roads, the incidence of crime on streets and the contamination along rivers. In order to carry out analyses of those events, the researcher needs to be familiar with a range of specific techniques. Spatial Analysis Along Networks provides a practical guide to the necessary statistical techniques and their computational implementation. Each chapter illustrates a specific technique, from Stochastic Point Processes on a Network and Network Voronoi Diagrams, to Network K-function and Point Density Estimation Methods, and the Network Huff Model. The authors also discuss and illustrate the undertaking of the statistical tests described in a Geographical Information System (GIS) environment as well as demonstrating the user-friendly free software package SANET. Spatial Analysis Along Networks: Presents a much-needed practical guide to statistical spatial analysis of events on and alongside a network, in a logical, user-friendly order. Introduces the preliminary methods involved, before detailing the advanced, computational methods, enabling the readers a complete understanding of the advanced topics. Dedicates a separate chapter to each of the major techniques involved. Demonstrates the practicalities of undertaking the tests described in the book, using a GIS. Is supported by a supplementary website, providing readers with a link to the free software package SANET, so they can execute the statistical methods described in the book. 
Students and researchers studying spatial statistics, spatial analysis, geography, GIS, OR, traffic accident analysis, criminology, retail marketing, facility management and ecology will benefit from this book.

Statistical and Computational Methods for Analyzing High Throughput Genomic Data

Book Details:

Author : Jingyi Li
Publisher :
Release : 2013
ISBN :
Pages : 226 pages

Download or read book Statistical and Computational Methods for Analyzing High Throughput Genomic Data written by Jingyi Li and published by . This book was released on 2013 with total page 226 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the burgeoning field of genomics, high-throughput technologies (e.g. microarrays, next-generation sequencing and label-free mass spectrometry) have enabled biologists to perform global analysis on thousands of genes, mRNAs and proteins simultaneously. Extracting useful information from enormous amounts of high-throughput genomic data is an increasingly pressing challenge to statistical and computational science. In this thesis, I will address three problems in which statistical and computational methods were used to analyze high-throughput genomic data to answer important biological questions. The first part of this thesis focuses on addressing an important question in genomics: how to identify and quantify mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data? We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with existing deterministic isoform assembly algorithms, SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has increased power in detecting more novel isoforms. 
Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. Besides isoform discovery, SLIDE sequentially uses the same linear model to estimate the abundance of discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected. This is a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage of the variance in protein abundances that mRNA levels explain. While the percentages estimated from the two approaches vary on different sets of genes, all suggest that previous proteome-wide surveys have significantly underestimated the importance of transcription. 
In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. To understand the similarity of gene expression patterns throughout their development time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells. 
Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells.
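
The hypergeometric test described in this excerpt can be sketched in a few lines. The code below computes the upper-tail probability of seeing at least the observed number of shared orthologous gene pairs if the two sets of associated genes were independent. The function name, argument names and the toy numbers are hypothetical; the thesis may define the gene universe and the test details differently.

```python
from math import comb


def hypergeom_upper_tail(overlap, n_pairs, n_assoc_a, n_assoc_b):
    """P(X >= overlap) under the hypergeometric null.

    n_pairs:   total number of orthologous gene pairs considered.
    n_assoc_a: pairs whose gene is associated with the stage in organism A.
    n_assoc_b: pairs whose gene is associated with the stage in organism B.
    overlap:   observed number of pairs associated with both stages.
    """
    max_overlap = min(n_assoc_a, n_assoc_b)
    total_ways = comb(n_pairs, n_assoc_b)
    tail_ways = sum(
        comb(n_assoc_a, k) * comb(n_pairs - n_assoc_a, n_assoc_b - k)
        for k in range(max(overlap, 0), max_overlap + 1)
    )
    return tail_ways / total_ways


# Toy example with made-up counts: 10 ortholog pairs, 4 associated with the
# fly stage, 3 with the worm stage, and all 3 of those shared.
p_value = hypergeom_upper_tail(overlap=3, n_pairs=10, n_assoc_a=4, n_assoc_b=3)
print(p_value)  # 4 / 120, about 0.0333
```

A small p-value suggests that the two stages share more associated orthologs than independence would predict, which is the sense in which the excerpt "aligns" them.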

Mathematics

Spatial Analysis with R

Book Details:

Author : Tonny J. Oyana
Publisher : CRC Press
Release : 2020-08-31
ISBN : 100017347X
Pages : 281 pages

Download or read book Spatial Analysis with R written by Tonny J. Oyana and published by CRC Press. This book was released on 2020-08-31 with total page 281 pages. Available in PDF, EPUB and Kindle. Book excerpt: In the five years since the publication of the first edition of Spatial Analysis: Statistics, Visualization, and Computational Methods, many new developments have taken shape regarding the implementation of new tools and methods for spatial analysis with R. The use and growth of artificial intelligence, machine learning and deep learning algorithms with a spatial perspective, and the interdisciplinary use of spatial analysis are all covered in this second edition along with traditional statistical methods and algorithms to provide a concept-based problem-solving learning approach to mastering practical spatial analysis. Spatial Analysis with R: Statistics, Visualization, and Computational Methods, Second Edition provides a balance between concepts and practicums of spatial statistics with a comprehensive coverage of the most important approaches to understand spatial data, analyze spatial relationships and patterns, and predict spatial processes. New in the Second Edition: includes new practical exercises and worked-out examples using R; presents a wide range of hands-on spatial analysis worktables and lab exercises; all chapters are revised and include new illustrations of different concepts using data from environmental and social sciences; expanded material on spatiotemporal methods, visual analytics methods, data science, and computational methods; explains big data, data management, and data mining. This second edition of an established textbook, with new datasets, insights, excellent illustrations, and numerous examples with R, is perfect for senior undergraduate and first-year graduate students in geography and the geosciences.

Science

Computational Methods for Single Cell Data Analysis

Book Details:

Author : Guo-Cheng Yuan
Publisher : Humana Press
Release : 2019-02-14
ISBN : 9781493990566
Pages : 271 pages

Download or read book Computational Methods for Single Cell Data Analysis written by Guo-Cheng Yuan and published by Humana Press. This book was released on 2019-02-14 with total page 271 pages. Available in PDF, EPUB and Kindle. Book excerpt: This detailed book provides state-of-art computational approaches to further explore the exciting opportunities presented by single-cell technologies. Chapters each detail a computational toolbox aimed to overcome a specific challenge in single-cell analysis, such as data normalization, rare cell-type identification, and spatial transcriptomics analysis, all with a focus on hands-on implementation of computational methods for analyzing experimental data. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Computational Methods for Single-Cell Data Analysis aims to cover a wide range of tasks and serves as a vital handbook for single-cell data analysis.

Statistical Simulation and Analysis of Single cell RNA seq Data

Book Details:

Author : Tianyi Sun
Publisher :
Release : 2023
ISBN :
Pages : 0 pages

Download or read book Statistical Simulation and Analysis of Single cell RNA seq Data written by Tianyi Sun and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: The recent development of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies by revealing the genome-wide gene expression levels within individual cells. In contrast to bulk RNA sequencing, scRNA-seq technology captures cell-specific transcriptome landscapes, which can reveal crucial information about cell-to-cell heterogeneity across different tissues, organs, and systems and enable the discovery of novel cell types and new transient cell states. According to search results from PubMed, from 2009-2023, over 5,000 published studies have generated datasets using this technology. Such large volumes of data call for high-quality statistical methods for their analysis. In the three projects of this dissertation, I have explored and developed statistical methods to model the marginal and joint gene expression distributions and determine the latent structure type for scRNA-seq data. In all three projects, synthetic data simulation plays a crucial role. My first project focuses on the exploration of the Beta-Poisson hierarchical model for the marginal gene expression distribution of scRNA-seq data. This model is a simplified mechanistic model with biological interpretations. Through data simulation, I demonstrate three typical behaviors of this model under different parameter combinations, one of which can be interpreted as one source of the sparsity and zero inflation that is often observed in scRNA-seq datasets. Further, I discuss parameter estimation methods of this model and its other applications in the analysis of scRNA-seq data. My second project focuses on the development of a statistical simulator, scDesign2, to generate realistic synthetic scRNA-seq data. 
Although dozens of simulators have been developed before, they lack the capacity to simultaneously achieve the following three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill in this gap, scDesign2 is developed as a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple scRNA-seq protocols and other single-cell gene expression count-based technologies. Compared with existing simulators, scDesign2 is advantageous in its transparent use of probabilistic models and is unique in its ability to capture gene correlations via copula. We verify that scDesign2 generates more realistic synthetic data for four scRNA-seq protocols (10x Genomics, CEL-Seq2, Fluidigm C1, and Smart-Seq2) and two single-cell spatial transcriptomics protocols (MERFISH and pciSeq) than existing simulators do. Under two typical computational tasks, cell clustering and rare cell type detection, we demonstrate that scDesign2 provides informative guidance on deciding the optimal sequencing depth and cell number in single-cell RNA-seq experimental design, and that scDesign2 can effectively benchmark computational methods under varying sequencing depths and cell numbers. With these advantages, scDesign2 is a powerful tool for single-cell researchers to design experiments, develop computational methods, and choose appropriate methods for specific data analysis needs. My third project focuses on deciding latent structure types for scRNA-seq datasets. Clustering and trajectory inference are two important data analysis tasks that can be performed for scRNA-seq datasets and will lead to different interpretations. However, as of now, there is no principled way to tell which one of these two types of analysis results is more suitable to describe a given dataset. In this project, we propose two computational approaches that aim to distinguish cluster-type vs. 
trajectory-type scRNA-seq datasets. The first approach is based on building a classifier using eigenvalue features of the gene expression covariance matrix, drawing inspiration from random matrix theory (RMT). The second approach is based on comparing the similarity of real data and simulated data generated by assuming the cell latent structure as clusters or a trajectory. While both approaches have limitations, we show that the second approach gives more promising results and has room for further improvements.
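
To make the Beta-Poisson hierarchical model more concrete, here is a minimal simulation sketch. It assumes one common parameterization, in which each cell draws a rate fraction p from Beta(alpha, beta) and then a count from Poisson(scale * p); the excerpt does not state the exact form used in the dissertation, and the function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small means used here; slow for very large means.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_beta_poisson(n_cells, alpha, beta, scale, seed=0):
    """Simulate one gene's counts across cells under an assumed Beta-Poisson model."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        fraction = rng.betavariate(alpha, beta)  # cell-specific rate fraction
        counts.append(sample_poisson(rng, scale * fraction))
    return counts


# With alpha < 1 much of the Beta mass sits near zero, so many cells get a
# near-zero rate and the gene shows far more zeros than a Poisson with the
# same mean would.
counts = simulate_beta_poisson(n_cells=5000, alpha=0.5, beta=2.0, scale=20.0)
mean_count = sum(counts) / len(counts)
zero_fraction = counts.count(0) / len(counts)
print(mean_count, zero_fraction)
```

Under this parameterization the expected count is scale * alpha / (alpha + beta), which is 4 for the values above, while a plain Poisson with mean 4 would produce zeros in under 2% of cells.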

Statistical and Computational Methods for Single cell Transcriptome Sequencing and Metagenomics

Book Details:

Author : Fanny Perraudeau
Publisher :
Release : 2018
ISBN :
Pages : 246 pages

Download or read book Statistical and Computational Methods for Single cell Transcriptome Sequencing and Metagenomics written by Fanny Perraudeau and published by . This book was released on 2018 with total page 246 pages. Available in PDF, EPUB and Kindle. Book excerpt: I propose statistical methods and software for the analysis of single-cell transcriptome sequencing (scRNA-seq) and metagenomics data. Specifically, I present a general and flexible zero-inflated negative binomial-based wanted variation extraction (ZINB-WaVE) method, which extracts low-dimensional signal from scRNA-seq read counts, accounting for zero inflation (dropouts), over-dispersion, and the discrete nature of the data. Additionally, I introduce an application of the ZINB-WaVE method that identifies excess zero counts and generates gene and cell-specific weights to unlock bulk RNA-seq differential expression pipelines for zero-inflated data, boosting performance for scRNA-seq analysis. Finally, I present a method to estimate bacterial abundances in human metagenomes using full-length 16S sequencing reads.
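
The zero-inflated negative binomial distribution at the core of ZINB-WaVE can be written down directly. The sketch below gives only the probability mass function for a single count, using a mean/dispersion parameterization that is assumed here; it is not the ZINB-WaVE software, which fits these parameters through a regression model that the excerpt does not detail. The function and parameter names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(y, mu, theta):
    """Negative binomial probability of count y with mean mu > 0 and size theta > 0."""
    log_p = (
        lgamma(y + theta) - lgamma(theta) - lgamma(y + 1)
        + theta * log(theta / (theta + mu))
        + y * log(mu / (theta + mu))
    )
    return exp(log_p)


def zinb_pmf(y, mu, theta, pi_zero):
    """Zero-inflated negative binomial probability of count y.

    pi_zero is the probability of an excess zero (a "dropout"); with
    probability 1 - pi_zero the count comes from the negative binomial.
    """
    base = (1.0 - pi_zero) * nb_pmf(y, mu, theta)
    return pi_zero + base if y == 0 else base


# Zero inflation raises the probability of observing a zero count.
print(nb_pmf(0, mu=3.0, theta=0.5))
print(zinb_pmf(0, mu=3.0, theta=0.5, pi_zero=0.2))
```

Smaller values of theta correspond to stronger over-dispersion, and pi_zero = 0 recovers the ordinary negative binomial.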

Science

Statistical Genomics

Book Details:

Author : Brooke Fridley
Publisher : Springer Nature
Release : 2023-03-16
ISBN : 1071629867
Pages : 377 pages

Download or read book Statistical Genomics written by Brooke Fridley and published by Springer Nature. This book was released on 2023-03-16 with total page 377 pages. Available in PDF, EPUB and Kindle. Book excerpt: This volume provides a collection of protocols from researchers in the statistical genomics field. Chapters focus on integrating genomics with other “omics” data, such as transcriptomics, epigenomics, proteomics, metabolomics, and metagenomics. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Cutting-edge and thorough, Statistical Genomics hopes that by covering these diverse and timely topics researchers are provided insights into future directions and priorities of pan-omics and the precision medicine era.

Computational Methods for Spatial Statistics and Image Data

Book Details:

Author : Nancy McMillan
Publisher :
Release : 1993
ISBN :
Pages : 372 pages

Download or read book Computational Methods for Spatial Statistics and Image Data written by Nancy McMillan and published by . This book was released on 1993 with total page 372 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Computational Methods for Analysis of Spatial Transcriptomics Data

Book Details:

Author : Alma Andersson
Publisher :
Release : 2022
ISBN : 9789180401425
Pages : pages

Download or read book Computational Methods for Analysis of Spatial Transcriptomics Data written by Alma Andersson and published by . This book was released on 2022 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt:

Statistical Methods for RNA sequencing Data

Book Details:

Author : Rhonda Bacher
Publisher :
Release : 2017
ISBN :
Pages : 0 pages

Download or read book Statistical Methods for RNA sequencing Data written by Rhonda Bacher and published by . This book was released on 2017 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Major methodological and technological advances in sequencing have inspired ambitious biological questions that were previously elusive. Addressing such questions with novel and complex data requires statistically rigorous tools. In this dissertation, I develop, evaluate, and apply statistical and computational methods for analysis of high-throughput sequencing data. A unifying theme of this work is that all these methods are aimed at RNA-seq data. The first method focuses on characterizing gene expression in RNA-seq experiments with ordered conditions. The second focuses on single-cell RNA-seq data, where we develop a method for normalization to account for a previously unknown technical artifact in the data. Finally, we develop a simulation in order to recapitulate the source of the artifact in silico.

Computers

Gene Expression Data Analysis

Book Details:

Author : Pankaj Barah
Publisher : Chapman & Hall/CRC
Release : 2021-08
ISBN : 9780429322655
Pages : 360 pages

Download or read book Gene Expression Data Analysis written by Pankaj Barah and published by Chapman & Hall/CRC. This book was released on 2021-08 with total page 360 pages. Available in PDF, EPUB and Kindle. Book excerpt: Development of high-throughput technologies in molecular biology during the last two decades has contributed to the production of tremendous amounts of data. Microarray and RNA sequencing are two such widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. Data produced from such experiments are voluminous (both in dimensionality and numbers of instances) and evolving in nature. Analysis of huge amounts of data toward the identification of interesting patterns that are relevant for a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge. Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real life or in a simulated environment. This book discusses a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge in statistical and machine-learning-based methods for analyzing gene expression data. 
Key Features: an introduction to the Central Dogma of molecular biology and information flow in biological systems; a systematic overview of the methods for generating gene expression data; background knowledge on statistical modeling and machine learning techniques; detailed methodology of analyzing gene expression data with an example case study; clustering methods for finding co-expression patterns from microarray, bulk RNA, and scRNA data; a large number of practical tools, systems, and repositories that are useful for computational biologists to create, analyze, and validate biologically relevant gene expression patterns; suitable for multidisciplinary researchers and practitioners in computer science and the biological sciences.

Statistical and Computational Methods for Studying Genomic Spatial Structure and Properties

Book Details:

Author : Shay Ben-Elazar
Publisher :
Release : 2019
ISBN :
Pages : 125 pages

Download or read book Statistical and Computational Methods for Studying Genomic Spatial Structure and Properties written by Shay Ben-Elazar and published by . This book was released on 2019 with total page 125 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Bayesian statistical decision theory

Bayesian Models for High Throughput Spatial Transcriptomics

Book Details:

Author : Carter Allen
Publisher :
Release : 2022
ISBN :
Pages : 0 pages

Download or read book Bayesian Models for High Throughput Spatial Transcriptomics written by Carter Allen and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: High throughput spatial transcriptomics (HST) is a rapidly emerging class of experimental technologies that allow for profiling gene expression in tissue samples at or near single-cell resolution while retaining the spatial location of each sequencing unit within the tissue sample. Through analyzing HST data, we seek to identify sub-populations of cells within a tissue sample that may inform biological phenomena such as disease status, treatment response, sex bias, et cetera. However, computational approaches for discerning sub-populations in HST data are still limited in that they (i) are unable to directly model normalized gene expression features to achieve more biologically interpretable sub-populations; (ii) fail to accommodate multi-sample experimental designs, thereby precluding the study of group effects such as treatment or disease status; or (iii) consider sub-populations as static entities, thus ignoring the interactive nature of cells within and between sub-populations. This dissertation seeks to address these gaps through development of various Bayesian statistical models and software. In Chapter 1, we introduce HST data and discuss germane features, such as spatial autocorrelation, skewness, and batch effects. In Chapter 2 we develop SPRUCE: a Bayesian spatial mixture model capable of achieving state of the art identification of cell sub-populations relative to manual expert annotations. An R package, spruce, is available through The Comprehensive R Archive Network (CRAN). In Chapter 3, we present MAPLE: the first HST analysis tool capable of differential abundance analysis (DAA) in multi-sample HST data. 
Further, we introduce uncertainty quantification to HST data analysis to account for the inherent uncertainty in sub-population labels that is ignored by existing computational methods. An R package, maple, is available through CRAN. Finally, in Chapter 4 we introduce analysis of community connectivity (ACC) to HST data. Through ACC, we seek to not only label biologically informative sub-populations in a tissue sample, but describe the similarity among groups of cells within and between sub-populations. We achieve ACC through the development of a novel multi-layer stochastic block model, which jointly models the inter-relationships among cells in terms of spatial information and gene expression patterns. We provide an R package, banyan, for implementation of ACC. Taken together, this dissertation utilizes Bayesian statistical modeling to enhance the available methodology for HST data analysis. In doing so, this work expands the range of biological insights available from HST data.

Statistical and Computational Methods for Comparing High Throughput Data from Two Conditions

Book Details:

Author : Xinzhou Ge
Publisher :
Release : 2021
ISBN :
Pages : 186 pages

Download or read book Statistical and Computational Methods for Comparing High Throughput Data from Two Conditions written by Xinzhou Ge and published by . This book was released on 2021 with total page 186 pages. Available in PDF, EPUB and Kindle. Book excerpt: The development of high-throughput biological technologies has enabled researchers to simultaneously perform analysis on thousands of features (e.g., genes, genomic regions, and proteins). The most common goal of analyzing high-throughput data is to contrast two conditions and identify "interesting" features whose values differ between the two conditions. How to contrast the features from two conditions to extract useful information from high-throughput data, and how to ensure the reliability of identified features, are two increasingly pressing challenges to statistical and computational science. This dissertation aims to address these two problems in analyzing high-throughput data from two conditions. My first project focuses on false discovery rate (FDR) control in high-throughput data analysis from two conditions. FDR is defined as the expected proportion of uninteresting features among the identified ones. It is the most widely used criterion to ensure the reliability of the interesting features identified. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. We propose Clipper, a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. 
In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, and differentially expressed gene identification from bulk or single-cell RNA-seq data. Our results demonstrate Clipper's flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis. My second project focuses on alignment of multi-track epigenomic signals from different samples or conditions. The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign can also detect common chromatin state patterns across multiple epigenomes from conditions, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
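
One way to control the FDR without p-values is to threshold per-feature contrast scores, using the count of strongly negative scores as an estimate of how many strongly positive scores are false. The sketch below implements a Barber-Candès-style threshold of that kind. It is an assumed illustration only: the excerpt does not spell out Clipper's procedure, and the function name, the contrast scores and the target level here are hypothetical.

```python
def select_by_contrast(contrast_scores, target_fdr):
    """Pick features whose contrast score exceeds a data-driven threshold.

    contrast_scores: one number per feature, large and positive when the
                     feature looks interesting (e.g. condition A minus B),
                     and roughly symmetric around zero when it is not.
    target_fdr:      desired false discovery rate, e.g. 0.05.
    Returns the indices of the selected features (possibly empty).
    """
    candidates = sorted({abs(c) for c in contrast_scores if c != 0})
    for threshold in candidates:
        n_negative = sum(1 for c in contrast_scores if c <= -threshold)
        n_positive = sum(1 for c in contrast_scores if c >= threshold)
        # (1 + n_negative) estimates the false discoveries among n_positive.
        if (1 + n_negative) / max(n_positive, 1) <= target_fdr:
            return [i for i, c in enumerate(contrast_scores) if c >= threshold]
    return []


# Made-up contrast scores: ten clearly positive features plus noise near zero.
scores = [5.0, 4.2, 3.9, 3.5, 3.1, 2.8, 2.5, 2.2, 2.0, 1.8,
          0.4, -0.3, 0.2, -0.5, 0.1]
print(select_by_contrast(scores, target_fdr=0.2))
```

Because the estimate adds 1 to the negative count, a target level of q can never be met with fewer than 1/q positive scores; with only 15 features, a target of 0.05 selects nothing in this toy example.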

efficient statistical and computational methods for large scale sequencing data

Book Details:

Author :
Publisher :
Release :
ISBN :
Pages : 0 pages

Download or read book efficient statistical and computational methods for large scale sequencing data written by and published by . This book was released on with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Statistical Methods for Genomics and Genetics Data Analysis

Book Details:

Author : Ziyue Wang
Publisher :
Release : 2020
ISBN :
Pages : 0 pages

Download or read book Statistical Methods for Genomics and Genetics Data Analysis written by Ziyue Wang and published by . This book was released on 2020 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Over the past decades, genome research has led to major technological advances in sequencing, genotyping, and phenotyping. Identifying the genetic basis of disease, as well as the relationships and functions of genes, has become a central problem in a number of biological endeavors, as it is essential for understanding disease mechanisms. In this dissertation, I develop, implement, evaluate and apply statistical and computational methods for analysis of various types of genomic data. A unifying theme in my work is developing methods to address problems that arise in the labs of my collaborators, focusing on the right balance between computational simplicity and impact. The first method I developed focuses on integrating genetic information across the mouse and human genomes to uncover important disease-related genetic signals. Namely, I developed the cross-species-integration (CSI) pipeline with two modules: an iterative mapping procedure to narrow down QTL regions of interest and a concordance test to improve functional inference in GWAS. The second method focuses on developing a gene association network model to learn relationships among genes. Specifically, I developed scNBN, a Negative Binomial based graphical model combined with a neighborhood selection algorithm for recovering gene association networks using scRNA-seq data. The final part introduces the R packages that implement the above methods, which are beneficial for both statisticians and scientists who are interested in performing these analyses.