EBookClubs

Read Books & Download eBooks Full Online

Book Computational Methods for Integrative Annotation of the Human Regulatory Genome

Download or read book Computational Methods for Integrative Annotation of the Human Regulatory Genome written by Tevfik Umut Dincer. This book was released in 2023. Available in PDF, EPUB and Kindle. Book excerpt: Deciphering the complex regulatory programs controlling gene expression is key to gaining insight into countless biological processes. However, a comprehensive characterization of the regulatory elements controlling expression across diverse cell types remains elusive. Analysis of DNA sequence provides insights into potential regulatory regions but cannot, on its own, provide functional evidence of regulation. Biochemical assays like ChIP-seq and ATAC-seq map epigenetic marks and regions of open chromatin associated with regulatory activity across the genome in a wide variety of cell and tissue types, but they do not directly measure regulatory activity. Functional characterization assays like massively parallel reporter assays or CRISPR interference screens offer more direct evidence of regulatory activity but may have limited genomic coverage and cell type availability. Computational methods integrating these diverse data types can enable the prediction and interpretation of regulatory elements across the genome. Here, I present integrative modeling approaches that combine epigenomic, functional, and DNA sequence data for the comprehensive annotation of the human regulatory genome. First, we introduce ChromActivity, a computational method for annotating the regulatory genome across hundreds of cell and tissue types. ChromActivity integrates epigenomic data from over a hundred human cell and tissue types with a diverse set of functional characterization datasets to generate genome-wide annotations of regulatory activity. ChromActivity provides annotations featuring discrete states that reflect combinatorial activity patterns, as well as continuous activity scores that reflect predicted regulatory element activities.
Next, we present SHARPR-seq, a computational method for integrating DNA sequence information to extend the Sharpr-MPRA high-resolution regulatory activity mapping framework. SHARPR-seq improves upon the SHARPR method on multiple evaluation metrics, enabling improved functional dissection of regulatory elements controlling gene expression. These integrative modeling approaches demonstrate the utility of combining complementary data types to provide a more comprehensive understanding of the human regulatory landscape.

Book Computational Methods for the Analysis of Genomic Data and Biological Processes

Download or read book Computational Methods for the Analysis of Genomic Data and Biological Processes written by Francisco A. Gómez Vela and published by MDPI. This book was released on 2021-02-05 and has 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that analyze these data reliably and efficiently. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality.

Book Computational Methods for Integrative Inference of Genome scale Gene Regulatory Networks

Download or read book Computational Methods for Integrative Inference of Genome scale Gene Regulatory Networks written by Alireza Fotuhi Siahpirani. This book was released in 2019 and has 156 pages. Available in PDF, EPUB and Kindle. Book excerpt: Inference of transcriptional regulatory networks is an important field of research in systems biology, and many computational methods have been developed to infer regulatory networks from different types of genomic data. One of the most popular classes of computational network inference methods is expression-based network inference. Given the mRNA levels of genes, these methods reconstruct a network between regulatory genes (called transcription factors) and potential target genes that best explains the input data. However, it has been shown that networks inferred from expression alone have low agreement with experimentally validated physical regulatory interactions. In recent years, many methods have been developed to improve the accuracy of these computational methods by incorporating additional data types. In this dissertation, we describe our contributions towards advancing the state of the art in this field. Our first contribution is a prior-based network inference method, MERLIN-P. MERLIN-P uses both gene expression and prior knowledge of interactions between regulatory genes and their potential targets, and infers a network that is supported by both. Using a logistic function, MERLIN-P can incorporate and combine multiple sources of prior knowledge. The networks it infers in yeast outperform state-of-the-art expression-based network inference methods, and perform better than or on par with the state-of-the-art prior-based method. Our second contribution is NCA+LASSO, a method to estimate transcription factor activity from a noisy prior network.
Network Component Analysis (NCA) is a computational method that, given the expression of target genes and a (potentially incomplete and noisy) network structure describing the connections of regulatory genes to these target genes, estimates the unobserved activity of the regulators (transcription factor activities, TFA). It has been shown that using TFA can improve the quality of inferred networks. However, our prior knowledge in new contexts could be incomplete and noisy, and we do not know to what extent noise in the input network affects the quality of the estimated TFA. We first show how noise in the input prior network can decrease the quality of the estimated TFA, and then show that adding a regularization term improves the quality of the estimated TFA. We show that using estimated TFA instead of TF expression alone in network inference improves the agreement of inferred networks with experimentally validated physical interactions for all state-of-the-art methods, including MERLIN-P. Our final contribution is a multi-task inference method, Dynamic Regulatory Module Network (DRMN), that simultaneously infers regulatory networks for related cell lines while taking into account their expected similarity. Many biological contexts are hierarchically related, and leveraging the similarity of these contexts could help us infer more accurate regulatory programs in each context. However, the small number of measurements in each context makes the inference of regulatory networks challenging. By inferring regulatory programs at the module level (groups of co-expressed genes), DRMN is able to handle the small number of measurements, while multi-task learning allows it to incorporate the hierarchical relationship of contexts.
DRMN first infers modules of co-expressed genes in each cell line, then infers a regulatory network for each module, and iteratively updates the inferred modules to reflect both co-expression and co-regulation, and the inferred networks to reflect the updated modules. We assess the accuracy of the inferred networks by predicting expression on held-out genes, and show that the resulting modules and networks provide insight into the process of differentiation between these related cell lines. For all the developed methods, we validate our results by comparing them to known, experimentally validated networks, and show that our results provide useful insight into the biological processes under consideration. Specifically, in chapter 2, we evaluated our inferred networks based on both network structure and predictive power, identified TFs whose target sets all tested methods fail to recover, and explored potential reasons for this failure. Additionally, we used our method to infer stress-specific networks, and evaluated predictions using stress-specific knock-down experiments. In chapter 3, we evaluated our inferred networks based on both network structure and predictive power, and furthermore used them to identify potential regulators that could be important for the pluripotency state in mESCs. We tested the effect of these regulators using shRNA experiments, and experimentally validated some of their predicted targets. Finally, in chapter 4, we evaluated our inferred models based on their predictive power and ability to predict gene expression in held-out data.
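
The NCA idea summarized in this excerpt, that target-gene expression is approximately a prior-network-shaped connectivity matrix times hidden transcription factor activities, can be illustrated as a least-squares problem. This is a minimal sketch under stated assumptions, not the dissertation's NCA+LASSO method: it assumes the connectivity strengths in `A` are already known (full NCA also estimates the nonzero entries of `A`), the function name `estimate_tfa` is hypothetical, and the optional `ridge` argument is an L2 penalty used only to show where a regularization term enters. LASSO denotes an L1 penalty, which this sketch does not implement.

```python
import numpy as np

def estimate_tfa(E, A, ridge=0.0):
    """Hypothetical helper: estimate TF activities P from E ~= A @ P.

    E: (genes x samples) expression of target genes.
    A: (genes x TFs) connectivity matrix; zeros encode edges absent
       from the prior network (assumed known here, unlike full NCA).
    ridge: optional L2 penalty (illustrative only, not LASSO).
    """
    n_tfs = A.shape[1]
    # Solve the (optionally penalized) normal equations:
    # (A^T A + ridge * I) P = A^T E
    lhs = A.T @ A + ridge * np.eye(n_tfs)
    rhs = A.T @ E
    return np.linalg.solve(lhs, rhs)

# Toy example with made-up numbers: 4 target genes, 2 TFs, 3 samples.
A = np.array([[1.0, 0.0],
              [0.5, 0.0],
              [0.0, 2.0],
              [0.8, 1.0]])
P_true = np.array([[1.0, 2.0, 0.5],
                   [0.0, 1.0, 3.0]])
E = A @ P_true                 # noise-free expression for the toy case
P_hat = estimate_tfa(E, A)     # recovers P_true when A has full column rank
print(np.round(P_hat, 6))
```

With noise-free data and a correct prior, the unpenalized estimate recovers the true activities; perturbing entries of `A` (for example, zeroing a true edge) is one simple way to see how an inaccurate prior network degrades the estimate, which is the concern the excerpt raises.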

Book Detection, Annotation and Prioritization of Human Regulatory Variants in the Genetics Study

Download or read book Detection, Annotation and Prioritization of Human Regulatory Variants in the Genetics Study written by Jun Mulin Li. This book was released on 2017-01-26. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation, "Detection, Annotation and Prioritization of Human Regulatory Variants in the Genetics Study" by Jun, Mulin, Li, 李俊, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author. Abstract: Interpreting human regulatory variants in noncoding genomic regions is critical to understanding the regulatory mechanisms of disease pathogenesis and to promoting personalized medicine. Recent studies showed that the associated SNPs detected by genome-wide association studies (GWAS) are significantly enriched in regions that harbor functional elements, such as transcription factor binding sites (TFBSs), chromatin with histone modifications, DNase I hypersensitive sites (DHSs), expression quantitative trait loci (eQTLs) and microRNA (miRNA) binding sites. With the accumulation of functional genomics data, computational methods have been developed to annotate, predict and prioritize noncoding regulatory variants in different biological processes. However, evaluating the regulatory effect of genetic variants requires systematic consideration at both the transcriptional and post-transcriptional levels. In this dissertation, we designed a set of computational methods to predict and prioritize regulatory variants that affect gene regulation, with comprehensive evaluations.
We first constructed an integrative database that collects all disease-associated variants from genome-wide association studies (GWAS). Given the GWAS variants for a particular disease or trait, we developed a pipeline, GWAS3D, to systematically analyze the probability of genetic variants affecting regulatory pathways and underlying disease associations by integrating chromatin state, long-range chromosome interaction, sequence motif, and conservation information. We demonstrated that GWAS3D can identify a functional regulatory variant that was experimentally validated to affect enhancer function. Detection and prioritization of regulatory variants in a particular cell or tissue is challenging and requires systematic consideration of chromatin states under the corresponding condition. Prediction based on cell type-specific functional genomic data can improve the chance and accuracy of regulatory variant discovery. By combining results from multiple methods and epigenome profiles, we developed a Bayesian approach to measure the regulatory potential of genetic variants in a cell type-specific manner. This model can also measure the ensemble effect of chromatin marks around a variant locus and estimate the regulatory probability of a genetic variant in a specific cellular environment. We showed that this integrative and condition-dependent strategy significantly improves the prediction performance for functional regulatory variants. Last, we sought to investigate whether genetic variants in miRNA binding sites can affect the function of competing endogenous RNAs (ceRNAs) and subsequent disease development. Using RNA-seq data on human individuals from different populations, we revealed genome-wide associations between DNA polymorphisms and ceRNA regulation. We found that regulatory variants can simultaneously affect gene expression in both cis and trans through the ceRNA mechanism.
We prioritized these variants with their associated ceRNAs according to different criteria and evaluated their collective effect on the ceRNA regulatory network. DOI: 10.5353/th_b5689295 Subjects: Human genetics - Variation Genomics - Data processing

Book Computational Methods in Genome Research

Download or read book Computational Methods in Genome Research written by Sándor Suhai and published by Springer Science & Business Media. This book was released on 2012-12-06 and has 230 pages. Available in PDF, EPUB and Kindle. Book excerpt: The application of computational methods to solve scientific and practical problems in genome research created a new interdisciplinary area that transcends boundaries traditionally separating genetics, biology, mathematics, physics, and computer science. Computers have, of course, been used intensively for many years in the life sciences, even before genome research started, to store and analyze DNA or protein sequences, to explore and model the three-dimensional structure, dynamics and function of biopolymers, to compute genetic linkage or evolutionary processes, etc. The rapid development of new molecular and genetic technologies, combined with ambitious goals to explore the structure and function of genomes of higher organisms, has, however, generated not only a huge and burgeoning body of data but also a new class of scientific questions. The nature and complexity of these questions will require, beyond a new kind of alliance between experimental and theoretical disciplines, the development of new generations of both computer software and hardware technologies. New theoretical procedures, combined with powerful computational facilities, will substantially extend the horizon of problems that genome research can attack with success. Many of us still feel that computational models rationalizing experimental findings in genome research fulfil their promises more slowly than desired. There is also uncertainty concerning the real position of a 'theoretical genome research' in the network of established disciplines integrating their efforts in this field.

Book Integrative Approaches for Mining High throughput Genomic Data

Download or read book Integrative Approaches for Mining High throughput Genomic Data written by Kenneth Daily. This book was released in 2011 and has 115 pages. Available in PDF, EPUB and Kindle. Book excerpt: The study of transcriptional regulation encompasses many fields in molecular, cell, evolutionary, and computational biology. For any given genome, only a small fraction of the regulatory elements embedded in the DNA sequence have been characterized, and there is great interest in developing computational methods to systematically discover and map all these elements. High-throughput techniques have made genome-wide assays standard for the analysis of mechanisms of regulation, and the amount of data available for analysis is increasing exponentially. Computational techniques have been developed in tandem to process, synthesize, index, and store these datasets. We describe here results from various levels of the study of transcriptional regulation and the methods developed to facilitate analysis. First, we develop and improve a pipeline (termed MotifMap) for the search, storage, and integration of transcription factor binding sites across multiple model organisms. We employ a phylogenetic footprinting approach to reduce the number of false positive sites reported, and evaluate performance using high-throughput sequencing datasets for a number of transcription factors. Next, we employ this pipeline in conjunction with high-throughput sequencing data in a study to annotate retrotransposon insertion sites across the yeast genome. Specific elements are observed proximal to these insertion sites, and their identification is aided by the MotifMap pipeline. Lastly, we describe techniques to compress and store high-throughput sequencing data. Our algorithm's performance is comparable to standard compression techniques, while maintaining the ability to use the data for analysis.

Book Theoretical and Computational Methods in Genome Research

Download or read book Theoretical and Computational Methods in Genome Research written by Sándor Suhai and published by Springer Science & Business Media. This book was released in 1997 and has 352 pages. Available in PDF, EPUB and Kindle. Book excerpt: Contains plenary lectures presented at the March 1996 International Symposium on Theoretical and Computational Genome Research, held in Heidelberg, Germany. Topics include the feasibility of whole human genome sequencing, analysis of gene functions by the metabolic pathway database, error analysis o

Book Computational Methods for Single Cell Data Analysis

Download or read book Computational Methods for Single Cell Data Analysis written by Guo-Cheng Yuan and published by Humana Press. This book was released on 2019-02-14 and has 271 pages. Available in PDF, EPUB and Kindle. Book excerpt: This detailed book provides state-of-the-art computational approaches to further explore the exciting opportunities presented by single-cell technologies. Chapters each detail a computational toolbox aimed at overcoming a specific challenge in single-cell analysis, such as data normalization, rare cell-type identification, and spatial transcriptomics analysis, all with a focus on hands-on implementation of computational methods for analyzing experimental data. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Computational Methods for Single-Cell Data Analysis aims to cover a wide range of tasks and serves as a vital handbook for single-cell data analysis.

Book Integrative Computational Genomics Based Approaches to Uncover the Tissue Specific Regulatory Networks in Development and Disease

Download or read book Integrative Computational Genomics Based Approaches to Uncover the Tissue Specific Regulatory Networks in Development and Disease written by Rajneesh Srivastava. This book was released in 2020 and has 400 pages. Available in PDF, EPUB and Kindle. Book excerpt: Regulatory protein families such as transcription factors (TFs) and RNA-binding proteins (RBPs) are increasingly appreciated for their role in regulating their targeted genomic and transcriptomic elements, giving rise to dynamic transcriptional (TRNs) and post-transcriptional regulatory networks (PTRNs) in higher eukaryotes. A mechanistic understanding of these two regulatory network types requires high-resolution, tissue-specific functional annotation of both the proteins and their target sites. This dissertation addresses the need to uncover tissue-specific regulatory networks in development and disease. This work establishes multiple computational genomics-based approaches to further enhance our understanding of regulatory circuits and decipher the associated mechanisms at several layers of biological processes. This study potentially contributes to the research community by providing valuable resources, including novel methods, web interfaces and software, that transform our ability to build high-quality regulatory binding maps of RBPs and TFs in a tissue-specific manner using multi-omics datasets. The study deciphered the broad spectrum of temporal and evolutionary dynamics of the transcriptome and its regulation at the transcriptional and post-transcriptional levels. It also advances our ability to functionally annotate hundreds of RBPs and their RNA binding sites across tissues in the human genome, which helps decode the role of RBPs in the context of disease phenotypes, networks, and pathways.
The approaches developed in this dissertation are scalable and adaptable to further investigate tissue-specific regulators in any biological system. Overall, this study contributes towards accelerating progress in molecular diagnostics and drug target identification using regulatory network analysis methods in disease and pathophysiology.

Book Computational Methods for Identifying and Characterizing the Human Gene Regulatory Regions and Cis elements

Download or read book Computational Methods for Identifying and Characterizing the Human Gene Regulatory Regions and Cis elements. This book was released in 2004. Available in PDF, EPUB and Kindle. Book excerpt: The identification of functional regulatory regions and cis-elements is a preliminary step toward the reconstruction of gene regulatory networks. Comparative genomics has been demonstrated to be a powerful approach for motif discovery. However, the accurate alignment of complex genomic sequences, especially mammalian sequences, remains a computational challenge. In chapter 2, we propose a novel pairwise alignment system, ACANA, to improve the alignment quality of genomic sequences. Compared with top competing alignment tools, ACANA achieves better alignment quality on divergent sequences in both local and global alignments. When applied to the upstream sequences of human-mouse orthologs, ACANA is able to reliably detect the conserved functional regions containing most cis-elements. Statistical motif modeling is another fundamental computational approach for motif prediction in large genomic sequences. In chapter 3, we introduce a mixture of optimized Markov models to reduce the false motif discovery rate in large genomic sequences. Our model is not only able to incorporate most dependency information within a motif by optimizing the arrangement of motif positions, but is also flexible in adjusting model complexity to the limits set by the size of the training data. We implement the mixture model in our OMiMa system. Using OMiMa, we demonstrate that our model can improve motif prediction accuracy. Although the reconstruction of complete human gene regulatory networks remains, at present, a distant hope, it is still possible to infer some distinct features of the networks from the available data.
In chapter 4, we present an example of inferring major evolutionary features of human gene regulatory networks by combining information from both gene sequence data and functional annotations. We systematically analyze the association between gene function and upstream region conservation for human-rodent orthologs. Our study shows that upstream regulatory regions of developmental tran.

Book Computational Methods for Analysis and Modeling of Time course Gene Expression Data

Download or read book Computational Methods for Analysis and Modeling of Time course Gene Expression Data. This book was released in 2004. Available in PDF, EPUB and Kindle. Book excerpt: Genes encode proteins, some of which in turn regulate other genes. Such interactions make up gene regulatory relationships or (dynamic) gene regulatory networks. With advances in the measurement technology for gene expression and in genome sequencing, it has become possible to measure the expression level of thousands of genes simultaneously in a cell at a series of time points over a specific biological process. Such time-course gene expression data may provide a snapshot of most (if not all) of the interesting genes and may lead to a better understanding of gene regulatory relationships and networks. However, inferring either gene regulatory relationships or networks places high demands on computational methods, which must be capable of mining large quantities of time-course gene expression data while reducing the complexity of the data to make them comprehensible. This dissertation presents several computational methods for inferring gene regulatory relationships and gene regulatory networks from time-course gene expression data. These methods are the result of the author's doctoral study. Cluster analysis plays an important role in inferring gene regulatory relationships, for example, in uncovering new regulons (sets of co-regulated genes) and their putative cis-regulatory elements. Two dynamic model-based clustering methods, namely Markov chain model (MCM)-based clustering and autoregressive model (ARM)-based clustering, are developed for time-course gene expression data. However, gene regulatory relationships based on cluster analysis are static and thus do not describe the dynamic evolution of gene expression over an observation period. The gene regulatory network is believed to be a time-varying system.
Consequently, a state-space model for dynamic gene regulatory networks from time-course gene expression data is developed. To account for the complex time-delayed relationships in gene regulatory networks, the state-space model is extended to.
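
To make the Markov chain model (MCM) idea concrete, the sketch below discretizes each time course into down/flat/up steps, fits a first-order transition matrix per cluster with pseudocounts, and assigns each gene to the cluster whose chain gives its step sequence the highest log-likelihood. This is a hypothetical simplification, not the dissertation's MCM-based clustering: the three-state discretization, the threshold `tol`, the pseudocount, the function names, and the single hard-assignment step from a given initial grouping (rather than a full iterative clustering procedure) are all assumptions for illustration.

```python
import numpy as np

STATES = 3  # 0 = down, 1 = flat, 2 = up (assumed discretization)

def discretize(series, tol=0.1):
    """Map consecutive differences of a time course to down/flat/up states."""
    diffs = np.diff(np.asarray(series, dtype=float))
    states = np.ones(len(diffs), dtype=int)   # default: flat
    states[diffs > tol] = 2                   # up
    states[diffs < -tol] = 0                  # down
    return states

def fit_transitions(state_seqs, pseudocount=1.0):
    """Estimate a first-order transition matrix from several state sequences."""
    counts = np.full((STATES, STATES), pseudocount)
    for seq in state_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    # Normalize each row so outgoing probabilities sum to 1.
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, T):
    """Log-probability of a state sequence's transitions under matrix T."""
    return float(sum(np.log(T[a, b]) for a, b in zip(seq[:-1], seq[1:])))

def assign(series_list, cluster_members, tol=0.1):
    """One hard-assignment step: each series goes to its best-fitting chain."""
    seqs = [discretize(s, tol) for s in series_list]
    mats = [fit_transitions([seqs[i] for i in members])
            for members in cluster_members]
    return [int(np.argmax([log_likelihood(s, T) for T in mats])) for s in seqs]

# Toy data (made up): two steadily rising genes and two oscillating genes.
genes = [
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 0.5, 1.2, 2.0, 2.6, 3.5],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
]
labels = assign(genes, cluster_members=[[0, 1], [2, 3]])
print(labels)
```

Because the model scores transitions rather than raw values, genes with the same dynamic pattern (steady rise versus oscillation) group together even when their absolute expression levels differ, which is the motivation for dynamic model-based clustering of time-course data.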

Book Computational Methods for Studying Gene Regulation and Genome Organization Using High throughput DNA Sequencing

Download or read book Computational Methods for Studying Gene Regulation and Genome Organization Using High throughput DNA Sequencing written by Giancarlo A. Bonora. This book was released in 2015 and has 308 pages. Available in PDF, EPUB and Kindle. Book excerpt: The full sequencing of the human genome ushered in the genomics era and laid the foundation for a more comprehensive understanding of gene regulation and development. But since the DNA sequence represents only one aspect of the genomic information housed within the nucleus, the question of exactly how it is utilized to direct developmental programs and tissue-specific gene expression is still an open one. However, rapid advances in high-throughput DNA sequencing (HTS) technologies over the past decade have allowed biologists to begin to tackle the question on a genomic scale. HTS has been coupled to bisulfite conversion of DNA for assessing cytosine methylation (bisulfite sequencing), to chromatin immunoprecipitation for ascertaining genomic locations bound by specific factors or found in a particular chromatin state (ChIP-seq), to the isolation of transcripts for the measurement of gene expression (RNA-seq), and to methods of chromosome conformation capture for the identification of genome-wide DNA-DNA interactions (4C-seq and Hi-C). The focus of my doctoral research has been the development of novel bioinformatics approaches to analyze the data produced by these technologies in order to shed light on how distinct cell identities are established and maintained. Here, I present highlights of this work in six chapters. Chapter 1 presents a study investigating DNA methylation changes going from the differentiated to the pluripotent state, which shows that changes predominantly occur late in the process and are strongly associated with changes to chromatin state.
Chapter 2 introduces methylation-sensitive restriction enzyme bisulfite sequencing (MREBS) as a method for assessing precise differential DNA methylation at a cost comparable to reduced representation bisulfite sequencing (RRBS), while providing additional information over a coverage area more comparable to whole-genome bisulfite sequencing (WGBS). Chapter 3 presents a study showing that inhibition of ribonucleotide reductase decreased DNA methylation genome-wide by enhancing the incorporation of a cytidine analog into DNA. Chapter 4 describes a study showing that, for genes important to leaf senescence, temporal changes in expression closely matched changes to two histone modifications. Chapter 5 reviews cutting-edge research exploring the link between regulatory networks and genome organization. Chapter 6 describes a study showing that regulators responsible for cell identity contribute to cell type-specific genome organization.

Book Genome Annotation

    Book Details:
  • Author : Jung Soh
  • Publisher : CRC Press
  • Release : 2016-04-19
  • ISBN : 1439841187
  • Pages : 255 pages

Download or read book Genome Annotation written by Jung Soh and published by CRC Press. This book was released on 2016-04-19 and has 255 pages. Available in PDF, EPUB and Kindle. Book excerpt: The success of individualized medicine, advanced crops, and new and sustainable energy sources requires thoroughly annotated genomic information and the integration of this information into a coherent model. A thorough overview of this field, Genome Annotation explores automated genome analysis and annotation from its origins to the challenges of next-generation sequencing data analysis. The book first takes you through the 16 years since the sequencing of the first complete microbial genome. It explains how current analysis strategies were developed, including sequencing strategies, statistical models, and early annotation systems. The authors then present visualization techniques for displaying integrated results as well as state-of-the-art annotation tools, including MAGPIE, Ensembl, Bluejay, and Galaxy. They also discuss the pipelines for the analysis and annotation of complex, next-generation DNA sequencing data. Each chapter includes references and pointers to relevant tools. As very few existing genome annotation pipelines are capable of dealing with the staggering amount of DNA sequence information, new strategies must be developed to accommodate the needs of today’s genome researchers. Covering this topic in detail, Genome Annotation provides you with the foundation and tools to tackle this challenging and evolving area. Suitable for both students new to the field and professionals who deal with genomic information in their work, the book offers two genome annotation systems on an accompanying CD-ROM.

Book Computational Methods for Analyzing and Modeling Gene Regulation and 3D Genome Organization

Download or read book Computational Methods for Analyzing and Modeling Gene Regulation and 3D Genome Organization written by Anastasiya Belyaeva. This book was released in 2021. Available in PDF, EPUB and Kindle. Book excerpt: Biological processes from differentiation to disease progression are governed by gene regulatory mechanisms. Large-scale omics and imaging data sets are currently being collected to characterize gene regulation at every level. Such data sets present new opportunities and challenges for extracting biological insights and elucidating the gene regulatory logic of cells. In this thesis, I present computational methods for the analysis and integration of various data types used for cell profiling. Specifically, I focus on analyzing gene expression and linking it with the 3D organization of the genome. First, I describe methodologies for elucidating gene regulatory mechanisms by considering multiple data modalities. I design a computational framework for identifying colocalized and coregulated chromosome regions by integrating gene expression and epigenetic marks with 3D interactions using network analysis. Then, I provide a general framework for data integration using autoencoders and apply it to the integration of, and translation between, gene expression and chromatin images of naive T cells. Second, I describe methods for analyzing single modalities, such as contact frequency data (which measure the spatial organization of the genome) and gene expression data. Given the important role of 3D genome organization in gene regulation, I present a methodology for reconstructing the 3D diploid conformation of the genome from contact frequency data. Given the ubiquity of gene expression data, recent advances in single-cell RNA-sequencing technologies, and the need for causal modeling of gene regulatory mechanisms, I then describe an algorithm and a software tool, difference causal inference (DCI), for learning causal gene regulatory networks from gene expression data. DCI addresses the problem of directly learning the differences between causal gene regulatory networks given gene expression data from two related conditions. Finally, I shift my focus from basic biology to drug discovery. Motivated by the COVID-19 pandemic, I present a computational drug repurposing platform that enables the identification of FDA-approved compounds for drug repurposing and the investigation of potential causal drug mechanisms. This framework identifies drugs that reverse the signature of the infection in the space learned by an autoencoder and then uses causal inference to identify putative drug mechanisms.

Book Computational Methods for Analyzing and Modeling Gene Regulation Dynamics

Download or read book Computational Methods for Analyzing and Modeling Gene Regulation Dynamics written by Jason Ernst. This book was released in 2008 and has 174 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "Gene regulation is a central biological process whose disruption can lead to many diseases. This process is largely controlled by a dynamic network of transcription factors interacting with specific genes to control their expression. Time series microarray gene expression experiments have become a widely used technique to study the dynamics of this process. This thesis introduces new computational methods designed to better utilize data from these experiments and to integrate this data with static transcription factor-gene interaction data to analyze and model the dynamics of gene regulation. The first method, STEM (Short Time-series Expression Miner), is a clustering algorithm and software specifically designed for short time series expression experiments, which represent the substantial majority of experiments in this domain. The second method, DREM (Dynamic Regulatory Events Miner), integrates transcription factor-gene interactions with time series expression data to model regulatory networks while taking into account their dynamic nature. The method uses an Input-Output Hidden Markov Model to identify bifurcation points in the time series expression data. While the method can be readily applied to some species, the coverage of experimentally determined transcription factor-gene interactions in most species is limited. To address this, we introduce two methods to improve the computational predictions of these interactions. The first of these methods, SEREND (SEmi-supervised REgulatory Network Discoverer), motivated by the species E. coli, is a semi-supervised learning method that uses verified transcription factor-gene interactions, DNA sequence binding motifs, and gene expression data to predict new interactions. We also present a method, motivated by human genomic data, that combines motif information with a probabilistic prior on transcription factor binding at each location in the organism's genome, which it infers based on a diverse set of genomic properties. We applied these methods to yeast, E. coli, and human cells. Our methods successfully predicted interactions and pathways, many of which have been experimentally validated. Our results indicate that by explicitly addressing the temporal nature of regulatory networks we can obtain accurate models of dynamic interaction networks in the cell."

Book Integrative Machine Learning and Network Mining Models for the Inference of Regulatory Elements and Interactions in Human Cells

Download or read book Integrative Machine Learning and Network Mining Models for the Inference of Regulatory Elements and Interactions in Human Cells written by Asa Thibodeau. This book was released in 2018 and has 86 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the increase in diverse genome profiling technologies and publicly available ontology databases, ranging from open chromatin profiles to the 3D structure of the genome, it is imperative to build novel computational methods that take full advantage of these diverse datasets to uncover the regulatory mechanisms behind cellular functions. Integrating these datasets offers the opportunity to identify regulatory elements (i.e., promoters, enhancers, etc.) and interactions critical for cell-type-specific functions. Here, the goal is twofold: 1) inference of regulatory interactions and networks from 3D chromatin interaction datasets and 2) inference of cell-specific and non-specific regulatory elements such as enhancers (regulatory elements that target gene promoters and regulate their expression). To address the first goal, two software tools were developed: (1) a web-accessible application, Querying and visualizing chromatin Interaction Network (QuIN), and (2) a pathway analysis prioritization tool, Triangulation of Perturbation Origins and Identification of Non-Coding Targets (TriPOINT). QuIN enables users to easily mine chromatin interaction datasets and integrate them with other sources such as SNPs and epigenetic marks. The resulting networks can be queried and visualized in downstream analyses and used to prioritize genomic loci (i.e., disease-causing variants). Similarly, TriPOINT uses pathways in conjunction with chromatin interaction networks to identify perturbed genes in treatment vs. control cases. It implements pathway-topology-based approaches for identifying inconsistencies in pathways and incorporates the capabilities of QuIN to integrate non-coding regulators that target genes in these pathways through chromatin interaction data. The second goal was achieved using two approaches. First, features obtained from network mining were used to train support vector machines to assess their predictive power in identifying cell-type-specific promoters (broad domains) and enhancers (super enhancers) from chromatin interaction networks. Network signatures were mined in three cell lines (MCF-7, K562, and GM12878) using QuIN across multiple chromatin interaction assays (ChIA-PET, Hi-C, and HiChIP). Network-related features were found to effectively discriminate typical promoters and enhancers from cell-type-specific ones. Second, features from Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data were used in neural network models to identify enhancers from accessible chromatin. The models were highly predictive of enhancers, making them useful in individual-specific and clinical sample settings.