Biologically Interpretable Machine Learning Methods to Understand Gene Regulation for Disease Phenotypes

Book Details:

Author : Ting Jin
Publisher :
Release : 2023
ISBN :
Pages : 0 pages

Download or read book Biologically Interpretable Machine Learning Methods to Understand Gene Regulation for Disease Phenotypes, written by Ting Jin. This book was released in 2023. Available in PDF, EPUB and Kindle. Book excerpt: Gene expression and regulation is a key molecular mechanism driving the development of human diseases, particularly at the cell-type level, but it remains elusive. For example, in many brain diseases, such as Alzheimer's disease (AD), understanding how cell-type gene expression and regulation change across multiple stages of AD progression is still challenging. Moreover, interindividual variability of gene expression and regulation is a known characteristic of the human brain and brain diseases. However, it is still unclear how interindividual variability affects personalized gene regulation in brain diseases including AD, thereby contributing to their heterogeneity. Recent technological advances have enabled the detection of gene regulation activities through multi-omics (i.e., genomics, transcriptomics, epigenomics, proteomics). In particular, emerging single-cell sequencing technologies (e.g., scRNA-seq, scATAC-seq) allow us to study functional genomics and gene regulation at the cell-type level. Moreover, these multi-omics data of populations (e.g., human individuals) provide a unique opportunity to study the underlying regulatory mechanisms occurring in brain disease progression and clinical phenotypes. For instance, PsychAD is a large project generating single-cell multi-omics data including many neuronal and glial cell types, aiming to understand the molecular mechanisms of neuropsychiatric symptoms of multiple brain diseases (e.g., AD, SCZ, ASD, Bipolar) from over 1,000 individuals. However, analyzing and integrating large-scale multi-omics data at the population level, as well as understanding the mechanisms of gene regulation, remains a challenge.
Machine learning is a powerful and emerging tool to decode the unique complexities and heterogeneity of human diseases. For instance, Beebe-Wang, Nicosia, et al. developed MD-AD, a multi-task neural network model to predict various disease phenotypes in AD patients using RNA-seq. Additionally, advances in graph neural networks offer enhanced capabilities to represent sophisticated gene network structures, such as the gene regulatory networks that control gene expression. Efforts have also been made to capture the gene regulation heterogeneity of brain diseases. For instance, Kim SY has applied graph convolutional networks to offer personalized diagnostic insights through population graphs that correspond with disease progression. However, many existing machine learning methods are limited to constructing accurate models for disease phenotype prediction and frequently lack biological interpretability or personalized insights, especially in gene regulation. Therefore, to address these challenges, my Ph.D. work has developed three machine learning methods designed to decode the gene regulation mechanisms of human diseases. First, in this dissertation, I will present scGRNom, a computational pipeline that integrates multi-omic data to construct cell-type gene regulatory networks (GRNs) linking non-coding regulatory elements. Next, I will introduce i-BrainMap, an interpretable knowledge-guided graph neural network model to prioritize personalized cell-type disease genes, regulatory linkages, and modules. Third, I will introduce ECMaker, a semi-restricted Boltzmann machine (semi-RBM) method for identifying gene networks to predict diseases and clinical phenotypes. Overall, our interpretable machine learning models improve phenotype prediction, prioritize key genes and networks associated with disease phenotypes, and are further aimed at enhancing our understanding of gene regulatory mechanisms driving disease progression and clinical phenotypes.

Interpretable Machine Learning Methods for Regulatory and Disease Genomics

Book Details:

Author : Peyton Greis Greenside
Publisher :
Release : 2018
ISBN :
Pages :

Download or read book Interpretable Machine Learning Methods for Regulatory and Disease Genomics, written by Peyton Greis Greenside. This book was released in 2018. Available in PDF, EPUB and Kindle. Book excerpt: It is an incredible feat of nature that the same genome contains the code to every cell in each living organism. From this same genome, each unique cell type gains a different program of gene expression that enables the development and function of an organism throughout its lifespan. The non-coding genome - the ~98% of the genome that does not code directly for proteins - serves an important role in generating the diverse programs of gene expression turned on in each unique cell state. A complex network of proteins binds specific regulatory elements in the non-coding genome to regulate the expression of nearby genes. While basic principles of gene regulation are understood, the regulatory code of which factors bind together at which genomic elements to turn on which genes remains to be revealed. Further, we do not understand how disruptions in gene regulation, such as from mutations that fall in non-coding regions, ultimately lead to disease or other changes in cell state. In this work we present several methods developed and applied to learn the regulatory code, or the rules that govern non-coding regions of the genome and how they regulate nearby genes. We first formulate the problem as one of learning pairs of sequence motifs and expressed regulator proteins that jointly predict the state of the cell, such as the cell-type-specific gene expression or chromatin accessibility. Using pre-engineered sequence features and known expression, we use a paired-feature boosting approach to build an interpretable model of how the non-coding genome contributes to cell state.
We also demonstrate a novel improvement to this method that takes into account similarities between closely related cell types by using a hierarchy imposed on all of the predicted cell states. We apply this method to discover validated regulators of tadpole tail regeneration and to predict protein-ligand binding interactions. Recognizing the need for improved sequence features and stronger predictive performance, we then move to a deep learning modeling framework to predict epigenomic phenotypes such as chromatin accessibility from just underlying DNA sequence. We use deep learning models, specifically multi-task convolutional neural networks, to learn a featurization of sequences over several kilobases long and their mapping to a functional phenotype. We develop novel architectures that encode principles of genomics in models typically designed for computer vision, such as incorporating reverse complementation and the 3D structure of the genome. We also develop methods to interpret traditionally "black box" neural networks by 1) assigning importance scores to each input sequence to the model, 2) summarizing non-redundant patterns learned by the model that are predictive in each cell type, and 3) discovering interactions learned by the model that provide indications as to how different non-coding sequence features depend on each other. We apply these methods in the system of hematopoiesis to interpret chromatin dynamics across differentiation of blood cell types, to understand immune stimulation, and to interpret immune disease-associated variants that fall in non-coding regions. We demonstrate strong performance of our boosting and deep learning models and demonstrate improved performance of these machine learning frameworks when taking into account existing knowledge about the biological system being modeled. We benchmark our interpretation methods using gold standard systems and existing experimental data where available.
We confirm existing knowledge surrounding essential factors in hematopoiesis, and also generate novel hypotheses surrounding how factors interact to regulate differentiation. Ultimately our work provides a set of tools for researchers to probe and understand the non-coding genome and its role in controlling gene expression as well as a set of novel insights surrounding how hematopoiesis is controlled on many scales from global quantification of regulatory sequence to interpretation of individual variants.

Technology & Engineering

Handbook of Machine Learning Applications for Genomics

Book Details:

Author : Sanjiban Sekhar Roy
Publisher : Springer Nature
Release : 2022-06-23
ISBN : 9811691584
Pages : 222 pages

Download or read book Handbook of Machine Learning Applications for Genomics, written by Sanjiban Sekhar Roy and published by Springer Nature. This book was released on 2022-06-23 with a total of 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: Currently, machine learning is playing a pivotal role in the progress of genomics. The applications of machine learning are helping everyone understand the emerging trends and the future scope of genomics. This book provides comprehensive coverage of machine learning applications such as DNN, CNN, and RNN, for predicting the sequence of DNA and RNA binding proteins, expression of the gene, and splicing control. In addition, the book addresses the effect of multiomics data analysis of cancers using tensor decomposition, machine learning techniques for protein engineering, CNN applications on genomics, challenges of long noncoding RNAs in human disease diagnosis, and how machine learning can be used as a tool to shape the future of medicine. More importantly, it gives a comparative analysis and validates the outcomes of machine learning methods on genomic data against functional laboratory tests or formal clinical assessment. The topics of this book will interest academicians and practitioners working in functional genomics and machine learning. The book will also serve as a comprehensive guide for graduate, postgraduate, and Ph.D. scholars working in these fields.

Computers

Gene Expression Data Analysis

Book Details:

Author : Pankaj Barah
Publisher : CRC Press
Release : 2021-11-21
ISBN : 1000425738
Pages : 379 pages

Download or read book Gene Expression Data Analysis, written by Pankaj Barah and published by CRC Press. This book was released on 2021-11-21 with a total of 379 pages. Available in PDF, EPUB and Kindle. Book excerpt: Development of high-throughput technologies in molecular biology during the last two decades has contributed to the production of tremendous amounts of data. Microarray and RNA sequencing are two such widely used high-throughput technologies for simultaneously monitoring the expression patterns of thousands of genes. Data produced from such experiments are voluminous (both in dimensionality and numbers of instances) and evolving in nature. Analysis of huge amounts of data toward the identification of interesting patterns that are relevant for a given biological question requires high-performance computational infrastructure as well as efficient machine learning algorithms. Cross-communication of ideas between biologists and computer scientists remains a big challenge. Gene Expression Data Analysis: A Statistical and Machine Learning Perspective has been written with a multidisciplinary audience in mind. The book discusses gene expression data analysis from molecular biology, machine learning, and statistical perspectives. Readers will be able to acquire both theoretical and practical knowledge of methods for identifying novel patterns of high biological significance. To measure the effectiveness of such algorithms, we discuss statistical and biological performance metrics that can be used in real life or in a simulated environment. This book discusses a large number of benchmark algorithms, tools, systems, and repositories that are commonly used in analyzing gene expression data and validating results. This book will benefit students, researchers, and practitioners in biology, medicine, and computer science by enabling them to acquire in-depth knowledge in statistical and machine-learning-based methods for analyzing gene expression data.
Key Features:
An introduction to the Central Dogma of molecular biology and information flow in biological systems
A systematic overview of the methods for generating gene expression data
Background knowledge on statistical modeling and machine learning techniques
Detailed methodology of analyzing gene expression data with an example case study
Clustering methods for finding co-expression patterns from microarray, bulkRNA, and scRNA data
A large number of practical tools, systems, and repositories that are useful for computational biologists to create, analyze, and validate biologically relevant gene expression patterns
Suitable for multidisciplinary researchers and practitioners in computer science and biological sciences

Electronic dissertations

Interpretable Machine Learning in Plant Genomes

Book Details:

Author : Christina Brady Azodi
Publisher :
Release : 2019
ISBN : 9781392717943
Pages : 217 pages

Download or read book Interpretable Machine Learning in Plant Genomes, written by Christina Brady Azodi. This book was released in 2019 with a total of 217 pages. Available in PDF, EPUB and Kindle. Book excerpt: Complex systems are ubiquitous in genetics and genomics. From the regulation of gene expression to the genetic basis of complex traits, we see that complex networks of diverse cellular molecules underpin the natural world. Driven by technological advances, today's researchers have access to large amounts of omics data from diverse species. At the same time, improvements in computer processing and algorithms have produced more powerful computational tools. Taken together, these advances mean that those working at the interface of data science and biology are poised to better model and understand complex biological systems. The research in this dissertation demonstrates how a data-driven approach can be used to better understand three complex systems: (1) transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana, (2) the genetic basis of flowering time, a complex trait, in Zea mays, and (3) the social basis for opinions and beliefs about biotechnology products. To study the first system, we generated models of the cis-regulatory code from information about DNA sequence and additional omics levels using both classic machine learning and deep learning algorithms. We identified 1,061 putative cis-regulatory elements associated with different patterns of response to single and combined heat and drought stress and found that information about additional levels of regulation, especially chromatin accessibility and known transcription factor binding, improved our models of the cis-regulatory code. To study the second system, we generated phenotype prediction models for flowering time, height, and yield based on either genetic markers or transcript levels at the seedling stage.
We found that, while genetic marker-based models performed better than transcript level-based models, models that integrated both types of data performed best. Furthermore, transcript-based models were more useful for finding genes known to be associated with flowering time, highlighting how using additional levels of omics data can improve our ability to understand the genetic basis of complex traits. Finally, to study the third system, we integrated 29 characteristics about a person (e.g. age, political ideology, education, values, environmental beliefs) into a machine learning model that would predict an individual's beliefs and opinions about five different types of biotechnology products (e.g. biofortification, biopharmaceuticals). While this approach was particularly useful for identifying individuals that were broadly supportive of biotechnology, finding characteristics of individuals with negative or conditional (i.e. support product A, but not B) opinions was more challenging, highlighting the complexity of public opinions about biotechnology.

Electronic dissertations

Machine Learning for the Study of Gene Regulation and Complex Traits

Book Details:

Author : Anne Sonnenschein
Publisher :
Release : 2017
ISBN : 9780355189353
Pages : 151 pages

Download or read book Machine Learning for the Study of Gene Regulation and Complex Traits, written by Anne Sonnenschein. This book was released in 2017 with a total of 151 pages. Available in PDF, EPUB and Kindle.

Machine Learning Methods in Construction of Transcriptional Regulatory Networks

Book Details:

Author : Yue Fan
Publisher :
Release : 2012
ISBN :
Pages : 360 pages

Download or read book Machine Learning Methods in Construction of Transcriptional Regulatory Networks, written by Yue Fan. This book was released in 2012 with a total of 360 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: The transcriptional regulatory network is a biological network that captures the interactions between transcription factor genes (TF-genes) and their regulatory gene targets. Regulation of transcription controls the level of gene expression and thus governs many characteristics of cells. The primary mechanism of transcriptional regulation is through DNA binding, that is, a transcription factor is usually bound to a DNA binding site which is sometimes located in the promoter region of a target gene. The construction of the regulatory network is a problem which can be decomposed into the sub-problems of identifying, for every known gene which produces a TF, its target genes, its binding motif (common sequence pattern of its DNA binding sites) and its DNA binding sites themselves (nucleotide-level binding locations). Many tools have been developed in the last decade to solve these problems. This thesis presents a series of machine learning-based algorithms, making use of support vector machines (SVMs), which can be used to construct the transcriptional regulatory network. This has also established a framework which enables other machine learning algorithms to be applied to this field. The connection between new machine learning methods and traditional methods for solving the above problems also suggests that the machine learning methods introduced have the potential to identify optimal solutions based on the use of training examples of binding motifs, binding sites, and target genes of a given TF. Based on the insights of a pilot project (TFSVM), we first develop a motif discovery tool (SVMotif) to discover binding motifs out of a set of pre-identified potential binding sequences.
This tool, tested on the yeast genome, validates many previously identified motifs and also discovers novel ones. Besides identifying primary binding motifs, this tool also successfully identifies 20 secondary motifs at the p = 0.15 significance level. In order to leverage the advantage of different motif discovery algorithms, an ensemble algorithm is then developed to integrate information from multiple position weight matrices (PWM) produced by 5 commonly used motif discovery algorithms. A connection between the SVM-based methods and traditional PWM-based methods is described, which becomes the basis of integrating multiple PWMs by considering them as SVM-based weak learners. This ensemble method is tested in solving the three above-mentioned identification problems--it outperforms its 5 components on all tasks. Finally, a machine framework is proposed and implemented to utilize network information to denoise gene expression feature vectors used for diagnosis and prognosis in biological and biomedical problems. Several local smoothing techniques from statistics are generalized to the graphs/networks obtained from the above and other network construction methods. We then applied the algorithm to denoising gene expression profiles--the resulting smoothed gene expression features improve the accuracy of biological phenotype classification significantly.

Computers

Graph Representation Learning

Book Details:

Author : William L. Hamilton
Publisher : Springer Nature
Release : 2022-06-01
ISBN : 3031015886
Pages : 141 pages

Download or read book Graph Representation Learning, written by William L. Hamilton and published by Springer Nature. This book was released on 2022-06-01 with a total of 141 pages. Available in PDF, EPUB and Kindle. Book excerpt: Graph-structured data is ubiquitous throughout the natural and social sciences, from telecommunication networks to quantum chemistry. Building relational inductive biases into deep learning architectures is crucial for creating systems that can learn, reason, and generalize from this kind of data. Recent years have seen a surge in research on graph representation learning, including techniques for deep graph embeddings, generalizations of convolutional neural networks to graph-structured data, and neural message-passing approaches inspired by belief propagation. These advances in graph representation learning have led to new state-of-the-art results in numerous domains, including chemical synthesis, 3D vision, recommender systems, question answering, and social network analysis. This book provides a synthesis and overview of graph representation learning. It begins with a discussion of the goals of graph representation learning as well as key methodological foundations in graph theory and network analysis. Following this, the book introduces and reviews methods for learning node embeddings, including random-walk-based methods and applications to knowledge graphs. It then provides a technical synthesis and introduction to the highly successful graph neural network (GNN) formalism, which has become a dominant and fast-growing paradigm for deep learning with graph data. The book concludes with a synthesis of recent advancements in deep generative models for graphs, a nascent but quickly growing subset of graph representation learning.

Computers

Kernel Methods in Computational Biology

Book Details:

Author : Bernhard Schölkopf
Publisher : MIT Press
Release : 2004
ISBN : 9780262195096
Pages : 428 pages

Download or read book Kernel Methods in Computational Biology, written by Bernhard Schölkopf and published by MIT Press. This book was released in 2004 with a total of 428 pages. Available in PDF, EPUB and Kindle. Book excerpt: A detailed overview of current research in kernel methods and their application to computational biology.

Computers

Research in Computational Molecular Biology

Book Details:

Author : Lenore J. Cowen
Publisher : Springer
Release : 2019-04-15
ISBN : 3030170837
Pages : 337 pages

Download or read book Research in Computational Molecular Biology, written by Lenore J. Cowen and published by Springer. This book was released on 2019-04-15 with a total of 337 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the proceedings of the 23rd Annual Conference on Research in Computational Molecular Biology, RECOMB 2019, held in Washington, DC, USA, in April 2019. The 17 extended and 20 short abstracts presented were carefully reviewed and selected from 175 submissions. The short abstracts are included in the back matter of the volume. The papers report on original research in all areas of computational molecular biology and bioinformatics.

Computers

Intelligent Computing Methodologies

Book Details:

Author : De-Shuang Huang
Publisher : Springer
Release : 2019-07-30
ISBN : 3030267660
Pages : 833 pages

Download or read book Intelligent Computing Methodologies, written by De-Shuang Huang and published by Springer. This book was released on 2019-07-30 with a total of 833 pages. Available in PDF, EPUB and Kindle. Book excerpt: This two-volume set of LNCS 11643 and LNCS 11644 constitutes - in conjunction with the volume LNAI 11645 - the refereed proceedings of the 15th International Conference on Intelligent Computing, ICIC 2019, held in Nanchang, China, in August 2019. The 217 full papers of the three proceedings volumes were carefully reviewed and selected from 609 submissions. The ICIC theme unifies the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. The theme for this conference is “Advanced Intelligent Computing Methodologies and Applications.” Papers related to this theme are especially solicited, including theories, methodologies, and applications in science and technology.

Technology & Engineering

Genomics Assisted Crop Improvement

Book Details:

Author : R.K. Varshney
Publisher : Springer Science & Business Media
Release : 2007-12-12
ISBN : 1402062958
Pages : 405 pages

Download or read book Genomics Assisted Crop Improvement, written by R.K. Varshney and published by Springer Science & Business Media. This book was released on 2007-12-12 with a total of 405 pages. Available in PDF, EPUB and Kindle. Book excerpt: This superb volume provides a critical assessment of genomics tools and approaches for crop breeding. Volume 1 presents the status and availability of genomic resources and platforms, and also devises strategies and approaches for effectively exploiting genomics research. Volume 2 goes into detail on a number of case studies of several important crop and plant species that summarize both the achievements and limitations of genomics research for crop improvement.

Artificial intelligence

Deep Learning in Biology and Medicine

Book Details:

Author : Davide Bacciu
Publisher : World Scientific Publishing Europe Limited
Release : 2021
ISBN : 9781800610934
Pages : 0 pages

Download or read book Deep Learning in Biology and Medicine, written by Davide Bacciu and published by World Scientific Publishing Europe Limited. This book was released in 2021. Available in PDF, EPUB and Kindle. Book excerpt: Biology, medicine and biochemistry have become data-centric fields for which Deep Learning methods are delivering groundbreaking results. Addressing high impact challenges, Deep Learning in Biology and Medicine provides an accessible and organic collection of Deep Learning essays on bioinformatics and medicine. It caters for a wide readership, ranging from machine learning practitioners and data scientists seeking methodological knowledge to address biomedical applications, to life science specialists in search of a gentle reference for advanced data analytics. With contributions from internationally renowned experts, the book covers foundational methodologies in a wide spectrum of life sciences applications, including electronic health record processing, diagnostic imaging, text processing, as well as omics-data processing. This survey of consolidated problems is complemented by a selection of advanced applications, including cheminformatics and biomedical interaction network analysis. A modern and mindful approach to the use of data-driven methodologies in the life sciences also requires careful consideration of the associated societal, ethical, legal and transparency challenges, which are covered in the concluding chapters of this book.

Computers

Elements of Causal Inference

Book Details:

Author : Jonas Peters
Publisher : MIT Press
Release : 2017-11-29
ISBN : 0262037319
Pages : 289 pages

Download or read book Elements of Causal Inference, written by Jonas Peters and published by MIT Press. This book was released on 2017-11-29 with a total of 289 pages. Available in PDF, EPUB and Kindle. Book excerpt: A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning. The mathematization of causality is a relatively recent development, and has become increasingly important in data science and machine learning. This book offers a self-contained and concise introduction to causal models and how to learn them from data. After explaining the need for causal models and discussing some of the principles underlying causal inference, the book teaches readers how to use causal models: how to compute intervention distributions, how to infer causal models from observational and interventional data, and how causal ideas could be exploited for classical machine learning problems. All of these topics are discussed first in terms of two variables and then in the more general multivariate case. The bivariate case turns out to be a particularly hard problem for causal learning because there are no conditional independences as used by classical methods for solving multivariate cases. The authors consider analyzing statistical asymmetries between cause and effect to be highly instructive, and they report on their decade of intensive research into this problem. The book is accessible to readers with a background in machine learning or statistics, and can be used in graduate courses or as a reference for researchers. The text includes code snippets that can be copied and pasted, exercises, and an appendix with a summary of the most important technical concepts.

Medical

Machine Learning in Radiation Oncology

Book Details:

Author : Issam El Naqa
Publisher : Springer
Release : 2015-06-19
ISBN : 3319183052
Pages : 336 pages

Download or read book Machine Learning in Radiation Oncology, written by Issam El Naqa and published by Springer. This book was released on 2015-06-19 with a total of 336 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book provides a complete overview of the role of machine learning in radiation oncology and medical physics, covering basic theory, methods, and a variety of applications in medical physics and radiotherapy. An introductory section explains machine learning, reviews supervised and unsupervised learning methods, discusses performance evaluation, and summarizes potential applications in radiation oncology. Detailed individual sections are then devoted to the use of machine learning in quality assurance; computer-aided detection, including treatment planning and contouring; image-guided radiotherapy; respiratory motion management; and treatment response modeling and outcome prediction. The book will be invaluable for students and residents in medical physics and radiation oncology and will also appeal to more experienced practitioners and researchers and members of applied machine learning communities.

Computers

Automatic Text Processing

Book Details:

Author : Gerard Salton
Publisher : Addison Wesley Publishing Company
Release : 1989
ISBN :
Pages : 552 pages

Download or read book Automatic Text Processing, written by Gerard Salton and published by Addison Wesley Publishing Company. This book was released in 1989 with a total of 552 pages. Available in PDF, EPUB and Kindle.

Medical

Molecular Epidemiology

Book Details:

Author : Paul A. Schulte
Publisher : Academic Press
Release : 2012-12-02
ISBN : 0323138578
Pages : 609 pages

Download or read book Molecular Epidemiology, written by Paul A. Schulte and published by Academic Press. This book was released on 2012-12-02 with a total of 609 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book will serve as a primer for both laboratory and field scientists who are shaping the emerging field of molecular epidemiology. Molecular epidemiology utilizes the same paradigm as traditional epidemiology but uses biological markers to identify exposure, disease or susceptibility. Schulte and Perera present the epidemiologic methods pertinent to biological markers. The book is also designed to enumerate the considerations necessary for valid field research and provide a resource on the salient and subtle features of biological indicators.