[EBOOK] Machine Learning For The Study Of Gene Regulation And Complex Traits PDF Download

Electronic dissertations

Machine Learning for the Study of Gene Regulation and Complex Traits

Book Details:

Author : Anne Sonnenschein
Publisher :
Release : 2017
ISBN : 9780355189353
Pages : 151 pages

Electronic dissertations

Interpretable Machine Learning in Plant Genomes

Book Details:

Author : Christina Brady Azodi
Publisher :
Release : 2019
ISBN : 9781392717943
Pages : 217 pages

Download or read book Interpretable Machine Learning in Plant Genomes written by Christina Brady Azodi and published by . This book was released on 2019 with total page 217 pages. Available in PDF, EPUB and Kindle. Book excerpt: Complex systems are ubiquitous in genetics and genomics. From the regulation of gene expression to the genetic basis of complex traits, we see that complex networks of diverse cellular molecules underpin the natural world. Driven by technological advances, today's researchers have access to large amounts of omics data from diverse species. At the same time, improvements in computer processing and algorithms have produced more powerful computational tools. Taken together, these advances mean that those working at the interface of data science and biology are poised to better model and understand complex biological systems. The research in this dissertation demonstrates how a data-driven approach can be used to better understand three complex systems: (1) transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana, (2) the genetic basis of flowering time, a complex trait, in Zea mays, and (3) the social basis for opinions and beliefs about biotechnology products. To study the first system, we generated models of the cis-regulatory code from information about DNA sequence and additional omics levels using both classic machine learning and deep learning algorithms. We identified 1,061 putative cis-regulatory elements associated with different patterns of response to single and combined heat and drought stress and found that information about additional levels of regulation, especially chromatin accessibility and known transcription factor binding, improved our models of the cis-regulatory code. To study the second system, we generated phenotype prediction models for flowering time, height, and yield based on either genetic markers or transcript levels at the seedling stage.
We found that, while genetic marker-based models performed better than transcript level-based models, models that integrated both types of data performed best. Furthermore, transcript-based models were more useful for finding genes known to be associated with flowering time, highlighting how using additional levels of omics data can improve our ability to understand the genetic basis of complex traits. Finally, to study the third system, we integrated 29 characteristics about a person (e.g. age, political ideology, education, values, environmental beliefs) into a machine learning model that would predict an individual's beliefs and opinions about five different types of biotechnology products (e.g. biofortification, biopharmaceuticals). While this approach was particularly useful for identifying individuals that were broadly supportive of biotechnology, finding characteristics of individuals with negative or conditional (i.e. support product A, but not B) opinions was more challenging, highlighting the complexity of public opinions about biotechnology.

Developing Machine Learning and Statistical Methods for the Analysis of Genetics and Genomics

Book Details:

Author : Jiajin Li
Publisher :
Release : 2021
ISBN :
Pages : 154 pages

Download or read book Developing Machine Learning and Statistical Methods for the Analysis of Genetics and Genomics written by Jiajin Li and published by . This book was released on 2021 with total page 154 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the development of next-generation sequencing (NGS) technologies, we have been able to detect numerous genetic variants associated with many diseases or complex traits over the past decades. Genome-wide association studies (GWAS) have been one of the most effective methods to identify those variants. They discover disease-associated variants by comparing the genetic information between controls and cases. This approach is simple and effective and has been used by many studies. Before performing GWAS, we need to detect the genetic variants of the sample population. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. Here, I will present ForestQC, an efficient statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach, which outperforms widely used methods by considerably improving the quality of variants to be included in the analysis. Once an association is identified, the next step is to understand the genetic mechanisms by which the variants influence diseases, especially whether and how rare variants regulate gene expression, as they may affect diseases through gene regulation. However, it is challenging to identify the regulatory effects of rare variants because doing so often requires large sample sizes and the existing statistical approaches are not optimized for it.
To improve statistical power, I will introduce a new approach, LRT-q, based on a likelihood ratio test that combines effects of multiple rare variants in a nonlinear manner and has higher power than previous approaches. I apply LRT-q to the GTEx dataset and find many novel biological insights. Recent studies have shown that omics data can be used for automatic disease diagnosis with machine learning algorithms. I will introduce an accurate and automated machine learning pipeline for the diagnosis of atopic dermatitis (AD) based on transcriptome and microbiota data. I will demonstrate that this classifier can accurately differentiate subjects with AD and healthy individuals. It also identifies a set of genes and microorganisms that are predictive for AD. I will show that they are directly or indirectly associated with AD.

Science

Machine Learning in Genome Wide Association Studies

Book Details:

Author : Ting Hu
Publisher : Frontiers Media SA
Release : 2020-12-15
ISBN : 2889662292
Pages : 74 pages

Download or read book Machine Learning in Genome Wide Association Studies written by Ting Hu and published by Frontiers Media SA. This book was released on 2020-12-15 with total page 74 pages. Available in PDF, EPUB and Kindle. Book excerpt: This eBook is a collection of articles from a Frontiers Research Topic. Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: frontiersin.org/about/contact.

Science

Machine Learning and Network Driven Integrative Genomics

Book Details:

Author : Mehdi Pirooznia
Publisher : Frontiers Media SA
Release : 2021-04-29
ISBN : 2889667251
Pages : 143 pages

MACHINE LEARNING AND DEEP LEARNING APPROACHES FOR GENE REGULATORY NETWORK INFERENCE IN PLANT SPECIES

Book Details:

Author :
Publisher :
Release : 2023
ISBN :
Pages : 0 pages

Download or read book MACHINE LEARNING AND DEEP LEARNING APPROACHES FOR GENE REGULATORY NETWORK INFERENCE IN PLANT SPECIES written by and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract : The construction of gene regulatory networks (GRNs) is vital for understanding the regulation of metabolic pathways, biological processes, and complex traits during plant growth and responses to environmental cues and stresses. The increasing availability of public databases has facilitated the development of numerous methods for inferring gene regulatory relationships between transcription factors and their targets. However, there is limited research on supervised learning techniques that utilize available regulatory relationships of plant species in public databases. This study investigates the potential of machine learning (ML), deep learning (DL), and hybrid approaches for constructing GRNs in plant species, specifically Arabidopsis thaliana, poplar, and maize. Challenges arise due to limited training data for gene regulatory pairs, especially in less-studied species such as poplar and maize. Nonetheless, our results demonstrate that hybrid models integrating ML and artificial neural network (ANN) techniques significantly outperformed traditional methods in predicting gene regulatory relationships. The best-performing hybrid models achieved over 95% accuracy on holdout test datasets, surpassing traditional ML and ANN models and also showed good accuracy on lignin biosynthesis pathway analysis. Employing transfer learning techniques, this study has also successfully transferred the known knowledge of gene regulation from one species to another, substantially improving performance and manifesting the viability of cross-species learning using deep learning-based approaches. 
This study contributes to the growing body of knowledge and methodology for GRN prediction and construction in plant species, highlighting the value of adopting hybrid models and transfer learning techniques. The study and its results will help pave the way for future research on how to learn from the known to the unknown and will be conducive to the advancement of modern genomics and bioinformatics.

Business & Economics

Neural Networks in Finance and Investing

Book Details:

Author : Robert R. Trippi
Publisher : Irwin Professional Publishing
Release : 1996
ISBN :
Pages : 872 pages

Download or read book Neural Networks in Finance and Investing written by Robert R. Trippi and published by Irwin Professional Publishing. This book was released on 1996 with total page 872 pages. Available in PDF, EPUB and Kindle. Book excerpt: This completely updated version of the classic first edition offers a wealth of new material reflecting the latest developments in the field. For investment professionals seeking to maximize this exciting new technology, this handbook is the definitive information source.

Machine Learning Techniques on Gene Function Prediction

Book Details:

Author : Quan Zou
Publisher : Frontiers Media SA
Release : 2019-12-04
ISBN : 2889632148
Pages : 485 pages

Computers

Advanced Lectures on Machine Learning

Book Details:

Author : Olivier Bousquet
Publisher : Springer
Release : 2011-03-22
ISBN : 3540286500
Pages : 249 pages

Download or read book Advanced Lectures on Machine Learning written by Olivier Bousquet and published by Springer. This book was released on 2011-03-22 with total page 249 pages. Available in PDF, EPUB and Kindle. Book excerpt: Machine Learning has become a key enabling technology for many engineering applications, investigating scientific questions and theoretical problems alike. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600. This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in Tübingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references. Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.

Computers

Machine Learning and Knowledge Discovery in Databases

Book Details:

Author : Wray Buntine
Publisher : Springer Science & Business Media
Release : 2009-09-03
ISBN : 3642041736
Pages : 787 pages

Download or read book Machine Learning and Knowledge Discovery in Databases written by Wray Buntine and published by Springer Science & Business Media. This book was released on 2009-09-03 with total page 787 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the joint conference on Machine Learning and Knowledge Discovery in Databases: ECML PKDD 2009, held in Bled, Slovenia, in September 2009. The 106 papers presented in two volumes, together with 5 invited talks, were carefully reviewed and selected from 422 paper submissions. In addition to the regular papers the volume contains 14 abstracts of papers appearing in full version in the Machine Learning Journal and the Knowledge Discovery and Databases Journal of Springer. The conference intends to provide an international forum for the discussion of the latest high quality research results in all areas related to machine learning and knowledge discovery in databases. The topics addressed are application of machine learning and data mining methods to real-world problems, particularly exploratory research that describes novel learning and mining tasks and applications requiring non-standard techniques.

Interpretable Machine Learning Methods for Regulatory and Disease Genomics

Book Details:

Author : Peyton Greis Greenside
Publisher :
Release : 2018
ISBN :
Pages : pages

Download or read book Interpretable Machine Learning Methods for Regulatory and Disease Genomics written by Peyton Greis Greenside and published by . This book was released on 2018 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: It is an incredible feat of nature that the same genome contains the code to every cell in each living organism. From this same genome, each unique cell type gains a different program of gene expression that enables the development and function of an organism throughout its lifespan. The non-coding genome - the ~98% of the genome that does not code directly for proteins - serves an important role in generating the diverse programs of gene expression turned on in each unique cell state. A complex network of proteins binds specific regulatory elements in the non-coding genome to regulate the expression of nearby genes. While basic principles of gene regulation are understood, the regulatory code of which factors bind together at which genomic elements to turn on which genes remains to be revealed. Further, we do not understand how disruptions in gene regulation, such as from mutations that fall in non-coding regions, ultimately lead to disease or other changes in cell state. In this work we present several methods developed and applied to learn the regulatory code or the rules that govern non-coding regions of the genome and how they regulate nearby genes. We first formulate the problem as one of learning pairs of sequence motifs and expressed regulator proteins that jointly predict the state of the cell, such as the cell type specific gene expression or chromatin accessibility. Using pre-engineered sequence features and known expression, we use a paired-feature boosting approach to build an interpretable model of how the non-coding genome contributes to cell state.
We also demonstrate a novel improvement to this method that takes into account similarities between closely related cell types by using a hierarchy imposed on all of the predicted cell states. We apply this method to discover validated regulators of tadpole tail regeneration and to predict protein-ligand binding interactions. Recognizing the need for improved sequence features and stronger predictive performance, we then move to a deep learning modeling framework to predict epigenomic phenotypes such as chromatin accessibility from just underlying DNA sequence. We use deep learning models, specifically multi-task convolutional neural networks, to learn a featurization of sequences over several kilobases long and their mapping to a functional phenotype. We develop novel architectures that encode principles of genomics in models typically designed for computer vision, such as incorporating reverse complementation and the 3D structure of the genome. We also develop methods to interpret traditionally "black box" neural networks by 1) assigning importance scores to each input sequence to the model, 2) summarizing non-redundant patterns learned by the model that are predictive in each cell type, and 3) discovering interactions learned by the model that provide indications as to how different non-coding sequence features depend on each other. We apply these methods in the system of hematopoiesis to interpret chromatin dynamics across differentiation of blood cell types, to understand immune stimulation, and to interpret immune disease-associated variants that fall in non-coding regions. We demonstrate strong performance of our boosting and deep learning models and demonstrate improved performance of these machine learning frameworks when taking into account existing knowledge about the biological system being modeled. We benchmark our interpretation methods using gold standard systems and existing experimental data where available.
We confirm existing knowledge surrounding essential factors in hematopoiesis, and also generate novel hypotheses surrounding how factors interact to regulate differentiation. Ultimately our work provides a set of tools for researchers to probe and understand the non-coding genome and its role in controlling gene expression as well as a set of novel insights surrounding how hematopoiesis is controlled on many scales from global quantification of regulatory sequence to interpretation of individual variants.

Biologically Interpretable Machine Learning Methods to Understand Gene Regulation for Disease Phenotypes

Book Details:

Author : Ting Jin
Publisher :
Release : 2023
ISBN :
Pages : 0 pages

Download or read book Biologically Interpretable Machine Learning Methods to Understand Gene Regulation for Disease Phenotypes written by Ting Jin and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Gene expression and regulation is a key molecular mechanism driving the development of human diseases, particularly at the cell type level, but it remains elusive. For example in many brain diseases, such as Alzheimer's disease (AD), understanding how cell-type gene expression and regulation change across multiple stages of AD progression is still challenging. Moreover, interindividual variability of gene expression and regulation is a known characteristic of the human brain and brain diseases. However, it is still unclear how interindividual variability affects personalized gene regulation in brain diseases including AD, thereby contributing to their heterogeneity. Recent technological advances have enabled the detection of gene regulation activities through multi-omics (i.e., genomics, transcriptomics, epigenomics, proteomics). In particular, emerging single-cell sequencing technologies (e.g., scRNA-seq, scATAC-seq) allow us to study functional genomics and gene regulation at the cell-type level. Moreover, these multi-omics data of populations (e.g., human individuals) provide a unique opportunity to study the underlying regulatory mechanisms occurring in brain disease progression and clinical phenotypes. For instance, PsychAD is a large project generating single-cell multi-omics data including many neuronal and glial cell types, aiming to understand the molecular mechanisms of neuropsychiatric symptoms of multiple brain diseases (e.g., AD, SCZ, ASD, Bipolar) from over 1,000 individuals. However, analyzing and integrating large-scale multi-omics data at the population level, as well as understanding the mechanisms of gene regulation, also remains a challenge. 
Machine learning is a powerful and emerging tool to decode the unique complexities and heterogeneity of human diseases. For instance, Beebe-Wang, Nicosia, et al. developed MD-AD, a multi-task neural network model to predict various disease phenotypes in AD patients using RNA-seq. Additionally, advancements in graph neural networks have brought enhanced capabilities to represent sophisticated gene network structures, such as the gene regulatory networks that control gene expression. Efforts have also been made to capture the gene regulation heterogeneity of brain diseases. For instance, Kim SY has applied graph convolutional networks to offer personalized diagnostic insights through population graphs that correspond with disease progression. However, many existing machine learning methods are limited to constructing accurate models for disease phenotype prediction and frequently lack biological interpretability or personalized insights, especially in gene regulation. Therefore, to address these challenges, my Ph.D. work has developed three machine learning methods designed to decode the gene regulation mechanisms of human diseases. First, in this dissertation, I will present scGRNom, a computational pipeline that integrates multi-omic data to construct cell-type gene regulatory networks (GRNs) linking non-coding regulatory elements. Next, I will introduce i-BrainMap, an interpretable knowledge-guided graph neural network model, to prioritize personalized cell type disease genes, regulatory linkages, and modules. Third, I will introduce ECMaker, a semi-restricted Boltzmann machine (semi-RBM) method for identifying gene networks to predict diseases and clinical phenotypes. Overall, our interpretable machine learning models improve phenotype prediction, prioritize key genes and networks associated with disease phenotypes, and are further aimed at enhancing our understanding of gene regulatory mechanisms driving disease progression and clinical phenotypes.

Science

Agricultural Bioinformatics

Book Details:

Author : Kavi Kishor P.B.
Publisher : Springer
Release : 2014-07-14
ISBN : 8132218809
Pages : 296 pages

Download or read book Agricultural Bioinformatics written by Kavi Kishor P.B. and published by Springer. This book was released on 2014-07-14 with total page 296 pages. Available in PDF, EPUB and Kindle. Book excerpt: A common approach to understanding the functional repertoire of a genome is through functional genomics. With systems biology burgeoning, bioinformatics has grown to a larger extent for plant genomes where several applications in the form of protein-protein interactions (PPI) are used to predict the function of proteins. With plant genes evolutionarily conserved, the science of bioinformatics in agriculture has caught interest with myriad of applications taken from bench side to in silico studies. A multitude of technologies in the form of gene analysis, biochemical pathways and molecular techniques have been exploited to an extent that they consume less time and have been cost-effective to use. As genomes are being sequenced, there is an increased amount of expression data being generated from time to time matching the need to link the expression profiles and phenotypic variation to the underlying genomic variation. This would allow us to identify candidate genes and understand the molecular basis/phenotypic variation of traits. While many bioinformatics methods like expression and whole genome sequence data of organisms in biological databases have been used in plants, we felt a common reference showcasing the reviews for such analysis is wanting. We envisage that this dearth would be facilitated in the form of this Springer book on Agricultural Bioinformatics. We thank all the authors and the publishers Springer, Germany for providing us an opportunity to review the bioinformatics works that the authors have carried in the recent past and hope the readers would find this book attention grabbing.

Gene expression

Gene Expression Data Analysis

Book Details:

Author : Pankaj Barah
Publisher : Chapman & Hall/CRC
Release : 2021-08
ISBN : 9781032055756
Pages : pages

Download or read book Gene Expression Data Analysis written by Pankaj Barah and published by Chapman & Hall/CRC. This book was released on 2021-08 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: "The book introduces phenomenal growth of data generated by increasing numbers of genome sequencing projects and other throughput technology-led experimental efforts. It provides information about various sources of gene expression data, and pre-processing, analysis, and validation of such data"--

Machine Learning Approaches to Understanding the Genetic Basis of Complex Traits

Book Details:

Author : Su-In Lee
Publisher :
Release : 2008
ISBN :
Pages : 348 pages

Computers

Elements of Causal Inference

Book Details:

Author : Jonas Peters
Publisher : MIT Press
Release : 2017-11-29
ISBN : 0262037319
Pages : 289 pages

Download or read book Elements of Causal Inference written by Jonas Peters and published by MIT Press. This book was released on 2017-11-29 with total page 289 pages. Available in PDF, EPUB and Kindle. Book excerpt: A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning. The mathematization of causality is a relatively recent development, and has become increasingly important in data science and machine learning. This book offers a self-contained and concise introduction to causal models and how to learn them from data. After explaining the need for causal models and discussing some of the principles underlying causal inference, the book teaches readers how to use causal models: how to compute intervention distributions, how to infer causal models from observational and interventional data, and how causal ideas could be exploited for classical machine learning problems. All of these topics are discussed first in terms of two variables and then in the more general multivariate case. The bivariate case turns out to be a particularly hard problem for causal learning because there are no conditional independences as used by classical methods for solving multivariate cases. The authors consider analyzing statistical asymmetries between cause and effect to be highly instructive, and they report on their decade of intensive research into this problem. The book is accessible to readers with a background in machine learning or statistics, and can be used in graduate courses or as a reference for researchers. The text includes code snippets that can be copied and pasted, exercises, and an appendix with a summary of the most important technical concepts.

Machine Learning for the Analysis of Transcriptional Regulation and Biological Systems

Book Details:

Author : Dustin T. Holloway
Publisher :
Release : 2007
ISBN :
Pages : 572 pages

Download or read book Machine Learning for the Analysis of Transcriptional Regulation and Biological Systems written by Dustin T. Holloway and published by . This book was released on 2007 with total page 572 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: Many factors influence the regulation of genes and their protein products within the cell. The primary mode of regulatory control is the association of transcription factors (TFs) with their binding sites in DNA. These binding sites occur most often in a gene's promoter regions. The network of interactions between transcription factors and the genes they regulate governs many of the behaviors and responses of cells. One of the central goals of modern computational biology is the ability to predict the targets of transcription factors, thereby revealing the genetic program of the cell. Combining various kinds of data (e.g., sequence, gene expression) in an optimal way to make these predictions is also a central theme of regulatory analysis. This thesis presents a data mining methodology making use of support vector machines (SVMs) and other machine learning techniques to predict new targets for transcription factors using a variety of genomic information. The employed methods allow extraction of detailed biological information from the datasets under study. Extracting this information allows us to generate hypotheses about TF function such as candidate sequences that a TF may bind or the experimental conditions under which it may act. We first develop a Bayesian approach to combine heterogeneous data. Then, an SVM method is implemented which greatly improves on the data integration. The SVM methods are applied to 163 yeast and 153 human TFs, generating thousands of high confidence predictions. These predictions are analyzed extensively, and two case studies are presented. Specifically, new roles for the yeast regulator Swi6 are discussed along with the role of Wt1 in human cancer.
Along those lines, SVMs are also applied to DNA microarrays to discover biomarkers for renal cell carcinoma which can accurately differentiate normal from cancerous tissue. In addition, several machine learning algorithms are combined to assign functions to unannotated genes in the yeast genome by integrating large genomic datasets. Finally, a new motif discovery method (SVMotif) which leverages the power of kernel methods is proposed. The work concludes with thoughts on future directions and perspectives on future extensions of this work.