EBookClubs

Read Books & Download eBooks Full Online

Book Improved Prediction of Gene Expression of Epigenomics Data of Lung Cancer Using Machine Learning and Deep Learning Models

Download or read book Improved Prediction of Gene Expression of Epigenomics Data of Lung Cancer Using Machine Learning and Deep Learning Models written by ZhengXin Shi. This book was released in 2020. Available in PDF, EPUB and Kindle. Book excerpt: Epigenetics is the study of biological mechanisms that switch genes on and off; its alterations are deeply involved in changes of gene expression in various diseases, including cancers. Machine learning is frequently used in cancer diagnosis and detection. In this research, four types of data are used towards the correct prediction of lung cancer: DNA Methylation data, Histone data, Human Genome data, and RNA-Seq data. Four feature selection methods - ReliefF, Gain Ratio (GR), Principal Component Analysis (PCA), and Correlation-based Feature Selection (CFS) - and seven different classifiers - Random Forest (RF), Support Vector Machine (SVM) with Gaussian kernel function and linear kernel function, Logistic Regression (LR), Naive Bayes (NB), Artificial Neural Network, and Convolutional Neural Network (CNN) - were implemented in this study. The data sets were processed using a custom R script. The tools used for feature selection and classification in the presented work are Weka 3 and Python. With the help of machine learning and deep learning methods, we were able to improve the accuracy and area under the curve (AUC) of the lung cancer prediction over an earlier published work. It was observed that the CNN model outperformed the other six classification methods.

Book Optimized Feature Selection for Enhancing Lung Cancer Prediction Using Machine Learning Techniques

Download or read book Optimized Feature Selection for Enhancing Lung Cancer Prediction Using Machine Learning Techniques written by Shanthi S and published by Ary Publisher. This book was released on 2023-02-25. Available in PDF, EPUB and Kindle. Book excerpt: Lung cancer is a major cause of cancer-related deaths worldwide. Machine learning techniques have shown promising results in the early detection and prediction of lung cancer. However, high-dimensional data, such as gene expression profiles, can introduce noise and decrease the classification accuracy of machine learning models. Feature selection techniques can alleviate this issue by identifying the most relevant and informative features, leading to better model performance. Optimized feature selection techniques can enhance the prediction accuracy of lung cancer using machine learning algorithms. Support vector machines, random forest, and artificial neural networks are commonly used algorithms for lung cancer prediction. By optimizing feature selection, these models can be trained with the most informative features, reducing overfitting and improving classification accuracy. Cross-validation techniques can also be used to evaluate the performance of feature selection and machine learning algorithms. The integration of optimized feature selection with machine learning techniques can provide an accurate and reliable lung cancer prediction model, which has the potential to improve early detection and precision medicine for lung cancer patients. Overall, optimized feature selection for enhancing lung cancer prediction using machine learning techniques is a promising approach to improving patient outcomes and reducing the global burden of lung cancer.

Book Deep learning to disease prediction on next generation sequencing and biomedical imaging data

Download or read book Deep learning to disease prediction on next generation sequencing and biomedical imaging data written by Saurav Mallik and published by Frontiers Media SA. This book was released on 2023-08-31 with total page 144 pages. Available in PDF, EPUB and Kindle.

Book Handbook of Machine Learning Applications for Genomics

Download or read book Handbook of Machine Learning Applications for Genomics written by Sanjiban Sekhar Roy and published by Springer Nature. This book was released on 2022-06-23 with total page 222 pages. Available in PDF, EPUB and Kindle. Book excerpt: Currently, machine learning is playing a pivotal role in the progress of genomics. The applications of machine learning are helping researchers understand the emerging trends and the future scope of genomics. This book provides comprehensive coverage of machine learning applications such as DNN, CNN, and RNN for predicting the sequence of DNA and RNA binding proteins, gene expression, and splicing control. In addition, the book addresses the effect of multiomics data analysis of cancers using tensor decomposition, machine learning techniques for protein engineering, CNN applications in genomics, challenges of long noncoding RNAs in human disease diagnosis, and how machine learning can be used as a tool to shape the future of medicine. More importantly, it gives a comparative analysis and validates the outcomes of machine learning methods on genomic data against functional laboratory tests or formal clinical assessment. The topics of this book will interest academicians and practitioners working in the fields of functional genomics and machine learning. This book will also serve as a comprehensive guide for graduates, postgraduates, and Ph.D. scholars working in these fields.

Book Cancer Prediction for Industrial IoT 4.0

Download or read book Cancer Prediction for Industrial IoT 4.0 written by Meenu Gupta and published by CRC Press. This book was released on 2021-12-31 with total page 202 pages. Available in PDF, EPUB and Kindle. Book excerpt: Cancer Prediction for Industrial IoT 4.0: A Machine Learning Perspective explores various cancers using Artificial Intelligence techniques. It presents the rapid advancement in the existing prediction models by applying Machine Learning techniques. Several applications of Machine Learning in different cancer prediction and treatment options are discussed, including specific ideas, tools and practices most applicable to product/service development and innovation opportunities. The wide variety of topics covered offers readers multiple perspectives on various disciplines.
Features:
• Covers the fundamentals, history, reality and challenges of cancer
• Presents concepts and analysis of different cancers in humans
• Discusses Machine Learning-based deep learning and data mining concepts in the prediction of cancer
• Offers real-world examples of cancer prediction
• Reviews strategies and tools used in cancer prediction
• Explores the future prospects in cancer prediction and treatment
Readers will learn the fundamental concepts and analysis of cancer prediction and treatment, including how to put emerging technologies such as Machine Learning into practice to tackle challenges in domains/fields of cancer with real-world scenarios. Hands-on chapters contributed by academicians and other professionals from reputed organizations provide and describe frameworks, applications, best practices and case studies on emerging cancer treatment and predictions. This book will be a vital resource to graduate students, data scientists, Machine Learning researchers, medical professionals and analytics managers.

Book Machine Learning Techniques on Gene Function Prediction

Download or read book Machine Learning Techniques on Gene Function Prediction written by Quan Zou and published by Frontiers Media SA. This book was released on 2019-12-04 with total page 485 pages. Available in PDF, EPUB and Kindle.

Book Enhancing Gene Expression Signatures in Cancer Prediction Models

Download or read book Enhancing Gene Expression Signatures in Cancer Prediction Models written by Vidya P. Kamath. This book was released in 2010. Available in PDF, EPUB and Kindle. Book excerpt: ABSTRACT: Cancer can develop through a series of genetic events in combination with external influential factors that alter the progression of the disease. Gene expression studies are designed to provide an enhanced understanding of the progression of cancer and to develop clinically relevant biomarkers of disease, prognosis and response to treatment. One of the main aims of microarray gene expression analyses is to develop signatures that are highly predictive of specific biological states, such as the molecular stage of cancer. This dissertation analyzes the classification complexity inherent in gene expression studies, proposing both techniques for measuring complexity and algorithms for reducing this complexity. Classifier algorithms that generate predictive signatures of cancer models must generalize to independent datasets for successful translation to clinical practice. The predictive performance of classifier models is shown to be dependent on the inherent complexity of the gene expression data. Three specific quantitative measures of classification complexity are proposed and one measure (Phi) is shown to correlate highly (R² = 0.82) with classifier accuracy in experimental data. Three quantization methods are proposed to enhance contrast in gene expression data and reduce classification complexity. The accuracy for cancer prognosis prediction is shown to improve using quantization in two datasets studied: from 67% to 90% in lung cancer and from 56% to 68% in colorectal cancer. A corresponding reduction in classification complexity is also observed. 
A random subspace based multivariable feature selection approach using cost-sensitive analysis is proposed to model the underlying heterogeneous cancer biology and address complexity due to multiple molecular pathways and unbalanced distribution of samples into classes. The technique is shown to be more accurate than the univariate t-test method. The classifier accuracy improves from 56% to 68% for colorectal cancer prognosis prediction. A published gene expression signature to predict radiosensitivity of tumor cells is augmented with clinical indicators to enhance modeling of the data and represent the underlying biology more closely. Statistical tests and experiments indicate that the improvement in the model fit is a result of modeling the underlying biology rather than statistical over-fitting of the data, thereby accommodating classification complexity through the use of additional variables.

Book Advanced Machine Learning Approaches in Cancer Prognosis

Download or read book Advanced Machine Learning Approaches in Cancer Prognosis written by Janmenjoy Nayak and published by Springer Nature. This book was released on 2021-05-29 with total page 461 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book introduces a variety of advanced machine learning approaches covering the areas of neural networks, fuzzy logic, and hybrid intelligent systems for the determination and diagnosis of cancer. Moreover, machine learning has proved its significance across a wide range of problems and has provided novel solutions in the medical field for the diagnosis of disease. This book also explores the distinct deep learning approaches that are capable of yielding more accurate outcomes for the diagnosis of cancer. In addition to providing an overview of the emerging machine and deep learning approaches, it also offers insight into how to evaluate the efficiency and appropriateness of such techniques and the analysis of cancer data used in cancer diagnosis. Therefore, this book focuses on the recent advancements in the machine learning and deep learning approaches used in the diagnosis of different types of cancer, along with their research challenges and future directions, for a targeted audience including scientists, experts, Ph.D. students, postdocs, and anyone interested in the subjects discussed.

Book Machine and Deep Learning in Oncology, Medical Physics and Radiology

Download or read book Machine and Deep Learning in Oncology, Medical Physics and Radiology written by Issam El Naqa and published by Springer Nature. This book was released on 2022-02-02 with total page 514 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book, now in an extensively revised and updated second edition, provides a comprehensive overview of both machine learning and deep learning and their role in oncology, medical physics, and radiology. Readers will find thorough coverage of basic theory, methods, and demonstrative applications in these fields. An introductory section explains machine and deep learning, reviews learning methods, discusses performance evaluation, and examines software tools and data protection. Detailed individual sections are then devoted to the use of machine and deep learning for medical image analysis, treatment planning and delivery, and outcomes modeling and decision support. Resources for varying applications are provided in each chapter, and software code is embedded as appropriate for illustrative purposes. The book will be invaluable for students and residents in medical physics, radiology, and oncology and will also appeal to more experienced practitioners and researchers and members of applied machine learning communities.

Book Interpretable Machine Learning Methods for Regulatory and Disease Genomics

Download or read book Interpretable Machine Learning Methods for Regulatory and Disease Genomics written by Peyton Greis Greenside. This book was released in 2018. Available in PDF, EPUB and Kindle. Book excerpt: It is an incredible feat of nature that the same genome contains the code to every cell in each living organism. From this same genome, each unique cell type gains a different program of gene expression that enables the development and function of an organism throughout its lifespan. The non-coding genome - the ~98% of the genome that does not code directly for proteins - serves an important role in generating the diverse programs of gene expression turned on in each unique cell state. A complex network of proteins binds specific regulatory elements in the non-coding genome to regulate the expression of nearby genes. While basic principles of gene regulation are understood, the regulatory code of which factors bind together at which genomic elements to turn on which genes remains to be revealed. Further, we do not understand how disruptions in gene regulation, such as from mutations that fall in non-coding regions, ultimately lead to disease or other changes in cell state. In this work we present several methods developed and applied to learn the regulatory code or the rules that govern non-coding regions of the genome and how they regulate nearby genes. We first formulate the problem as one of learning pairs of sequence motifs and expressed regulator proteins that jointly predict the state of the cell, such as the cell type specific gene expression or chromatin accessibility. Using pre-engineered sequence features and known expression, we use a paired-feature boosting approach to build an interpretable model of how the non-coding genome contributes to cell state. 
We also demonstrate a novel improvement to this method that takes into account similarities between closely related cell types by using a hierarchy imposed on all of the predicted cell states. We apply this method to discover validated regulators of tadpole tail regeneration and to predict protein-ligand binding interactions. Recognizing the need for improved sequence features and stronger predictive performance, we then move to a deep learning modeling framework to predict epigenomic phenotypes such as chromatin accessibility from just underlying DNA sequence. We use deep learning models, specifically multi-task convolutional neural networks, to learn a featurization of sequences several kilobases long and their mapping to a functional phenotype. We develop novel architectures that encode principles of genomics in models typically designed for computer vision, such as incorporating reverse complementation and the 3D structure of the genome. We also develop methods to interpret traditionally "black box" neural networks by 1) assigning importance scores to each input sequence to the model, 2) summarizing non-redundant patterns learned by the model that are predictive in each cell type, and 3) discovering interactions learned by the model that provide indications as to how different non-coding sequence features depend on each other. We apply these methods in the system of hematopoiesis to interpret chromatin dynamics across differentiation of blood cell types, to understand immune stimulation, and to interpret immune disease-associated variants that fall in non-coding regions. We demonstrate strong performance of our boosting and deep learning models and demonstrate improved performance of these machine learning frameworks when taking into account existing knowledge about the biological system being modeled. We benchmark our interpretation methods using gold standard systems and existing experimental data where available. 
We confirm existing knowledge surrounding essential factors in hematopoiesis, and also generate novel hypotheses surrounding how factors interact to regulate differentiation. Ultimately our work provides a set of tools for researchers to probe and understand the non-coding genome and its role in controlling gene expression as well as a set of novel insights surrounding how hematopoiesis is controlled on many scales from global quantification of regulatory sequence to interpretation of individual variants.

Book Machine Intelligence, Big Data Analytics, and IoT in Image Processing

Download or read book Machine Intelligence, Big Data Analytics, and IoT in Image Processing written by Ashok Kumar and published by John Wiley & Sons. This book was released on 2023-02-14 with total page 516 pages. Available in PDF, EPUB and Kindle. Book excerpt: Machine Intelligence, Big Data Analytics, and IoT in Image Processing discusses both theoretical and practical aspects of how to harness advanced technologies to develop practical applications such as drone-based surveillance, smart transportation, healthcare, farming solutions, and robotics used in automation. The concepts of machine intelligence, big data analytics, and the Internet of Things (IoT) continue to improve our lives through various cutting-edge applications such as disease detection in real-time, crop yield prediction, smart parking, and so forth. The transformative effects of these technologies are life-changing because they play an important role in demystifying smart healthcare, plant pathology, and smart city/village planning, design and development. This book presents a cross-disciplinary perspective on the practical applications of machine intelligence, big data analytics, and IoT by compiling cutting-edge research and insights from researchers, academicians, and practitioners worldwide. It identifies and discusses various advanced technologies, such as artificial intelligence, machine learning, IoT, image processing, network security, cloud computing, and sensors, to provide effective solutions to the lifestyle challenges faced by humankind. Machine Intelligence, Big Data Analytics, and IoT in Image Processing is a significant addition to the body of knowledge on practical applications emerging from machine intelligence, big data analytics, and IoT. The chapters deal with specific areas of applications of these technologies. 
This deliberate choice of covering a diversity of fields emphasizes the applications of these technologies in almost every contemporary aspect of real life, to assist those working in different sectors in understanding and exploiting the strategic opportunities these technologies offer. Audience: The book will be of interest to a range of researchers and scientists in artificial intelligence who work on practical applications using machine learning, big data analytics, natural language processing, pattern recognition, and IoT by analyzing images. Software developers, industry specialists, and policymakers in medicine, agriculture, smart cities development, transportation, etc. will find this book exceedingly useful.

Book Prediction of Cancer from Gene Expression Data

Download or read book Prediction of Cancer from Gene Expression Data written by Ansuman Kumar. This book was released on 2022-09-13. Available in PDF, EPUB and Kindle. Book excerpt: Cancer is a dangerous disease caused by abnormal division and uncontrolled exponential growth of cells. Cancer cells usually behave differently from normal cells and can spread to other parts of the body. This spreading process is called metastasis [1]. Cancer arises from the conversion of normal cells into cancerous cells in a multistage process that generally progresses from pre-cancerous cells to a malignant tumor. Cancer is the second-leading cause of death worldwide, and approximately 9.6 million people die every year from cancer according to the Union for International Cancer Control (UICC), Switzerland (https://www.worldcancerday.org/what-cancer). Early classification of cancer sub-type classes is of great importance in providing better diagnosis to patients. Therefore, prediction of cancer sub-types (classes) at an initial stage has become a vital area of research in the field of machine learning and medical science for researchers and scientists worldwide. There exist different clinical approaches to the diagnosis of cancer, which are described. Apart from the clinical approaches to predicting cancer, computational biologists suggest a complementary and relatively inexpensive solution for cancer prediction and primary (early) diagnosis, applying modern technology such as machine learning [3] and soft computing [4] to microarray gene expression data [5]. Machine learning [3] provides a set of computer models that automatically learn from data and experience, whereas soft computing [6] is a collection of methodologies which exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness, and low solution cost. 
Microarray technology [5] records thousands of genes simultaneously. The number of genes present in microarray data is normally very large compared to the number of samples [7]. Also, clinically labeled samples are very few. Moreover, the cancer subtypes existing in microarray gene expression data are often vague, indiscernible, ambiguous, and overlapping in nature [8]. Therefore, it is important to construct robust classifiers in this complex (vague, indiscernible, ambiguous) scenario that achieve high accuracy in classifying cancerous samples [9] in the presence of limited training samples. A detailed description of machine learning, soft computing and microarray technology is provided.

Book Deep Learning Techniques for Analyzing Clinical Lung Cancer Data

Download or read book Deep Learning Techniques for Analyzing Clinical Lung Cancer Data written by Haoze Du. This book was released in 2019 with total page 104 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the continued public concerns about cancer identification in patients, many methods have been implemented to analyze clinical records to gain actionable information and make a meaningful prediction of cancer patients' outcomes. It is necessary to accurately predict the efficacy of a specific therapy or identify a combination of actionable treatments in clinical practice based on clinical datasets. While conventional machine learning methods such as artificial neural networks and support vector machines have shown promise, they clearly have significant room for improvement. In this thesis, we attempted to train and optimize an innovative deep learning method called cascade forest, which is inspired by artificial neural networks, as well as a number of traditional machine learning methods and deep neural networks. Cutting edge machine learning tools such as TensorFlow and Scikit-learn on the GPU platform, which allows parallel computation to enhance their performance, were used to improve the time efficiency. The outcomes of this thesis include: 1) predicting the outcomes of a cancer patient based on clinical data from the publicly available SEER database; 2) evaluating the patient outcomes by comparing the models based on different datasets; 3) attempting to increase the accuracy and reduce the execution time for model training by optimizing machine learning models.

Book Artificial Intelligence and Precision Oncology

Download or read book Artificial Intelligence and Precision Oncology written by Zodwa Dlamini and published by Springer Nature. This book was released on 2023-01-21 with total page 317 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book highlights the use of artificial intelligence (AI), big data and precision oncology for medical decision making in cancer screening, diagnosis, prognosis and treatment. Precision oncology has long been thought of as ideal for the management and treatment of cancer. This strategy promises to revolutionize the treatment, control, and prevention of cancer by tailoring tests, treatments and predictions to specific individuals or population groups. In order to accomplish these goals, vast amounts of patient or population group specific data needs to be integrated and analysed to be able to identify key patterns or features which can be used to define or characterize the disease or the response to the disease in these individuals. These patterns or features can be as varied as molecular patterns or features in medical images. This level of data analysis and integration can only be achieved through the use of AI. The book is divided into three parts starting with a section on the use of artificial intelligence for screening, diagnosis and monitoring in precision oncology. The second part: Artificial intelligence and Omics in precision oncology, highlights the use of AI and epigenetics, metabolomics, microbiomics in precision oncology. The third part covers artificial intelligence in cancer therapy and its clinical applications. It also highlights the use of AI tools for risk prediction, early detection, diagnosis and accurate prognosis. This book, written by experts in the field from academia and industry, will appeal to cancer researchers, clinical oncologists, pathologists, medical students, academic teaching staff and medical residents interested in cancer research as well as those specialising as clinical oncologists.

Book Lung Nodule Malignancy Prediction from Computed Tomography Images Using Deep Learning

Download or read book Lung Nodule Malignancy Prediction from Computed Tomography Images Using Deep Learning written by Rahul Paul. This book was released in 2020 with total page 135 pages. Available in PDF, EPUB and Kindle. Book excerpt: Lung cancer has a high incidence and mortality rate. The five-year relative survival rate for all lung cancers is 18%. Due to the high mortality and incidence rate of lung cancer worldwide, early detection is essential. Low dose Computed Tomography (CT) is a commonly used technique for screening, diagnosis, and prognosis of non-small cell lung cancer (NSCLC). The National Lung Screening Trial (NLST) compared low-dose helical computed tomography (LDCT) and standard chest radiography (CXR) for three annual screens and reported a 20% relative reduction in lung cancer mortality for LDCT compared to CXR. As such, LDCT screening for lung cancer is an effective way of mitigating lung cancer mortality and is the only imaging option for those at high risk. Lung cancer screening for high-risk patients often detects a large number of indeterminate pulmonary nodules, of which only a subset will be identified as cancer. As such, reliable and reproducible biomarkers determining which indeterminate pulmonary nodules will be identified as cancer would have significant translational implications as a therapeutic method to enhance lung cancer screening for nodule detection. Radiomics is an approach to extract high-dimensional quantitative features from medical images, which can be used individually or merged with clinical data for predictive and diagnostic analysis. Quantitative radiomics features (size, shape, and texture) extracted from lung CT scans have been shown to predict cancer incidence and prognosis. Deep learning is an emerging machine learning approach, which has been applied to the classification and analysis of various cancers and tumors. To generate generic features (blobs, edges, etc.) 
from an image, different convolutional kernels are applied over the input image, and then those generic feature-based images are passed through some fully connected neural layers. This category of neural network is called a convolutional neural network (CNN), which has achieved high accuracy on image data. With the advancement of deep learning and convolutional neural networks (CNNs), deep features can be utilized to analyze lung CTs for prognosis prediction and diagnosis. In this dissertation, deep learning-based approaches were presented for lung nodule malignancy prediction. A subset of cases from the NLST was chosen as a dataset in our study. We experimented with three different pre-trained CNNs for extracting deep features and used five different classifiers. Experiments were also conducted with deep features from different color channels of a pre-trained CNN. Selected deep features were combined with radiomics features. Three CNNs were designed and trained. Combinations of features from pre-trained CNNs, CNNs trained on NLST data, and classical radiomics were used to build classifiers. The best accuracy (76.79%) was obtained using feature combinations. An area under the receiver operating characteristic curve of 0.87 was obtained using a CNN trained on an augmented NLST data cohort. After that, each of the three CNNs was trained using seven different seeds to create the initial weights. This enabled variability in the CNN models, which were combined to generate a robust, more accurate ensemble model. Augmenting images using only rotation and flipping and training with images from T0 yielded the best accuracy to predict lung cancer incidence at T2 from a separate test cohort (Accuracy = 90.29%; AUC = 0.96) based on an ensemble of 21 models. From this research, five conclusions were obtained, which will be utilized in future research. First, we proposed a simple and effective CNN architecture with a small number of parameters useful for smaller (medical) datasets. 
Second, we showed that features obtained using transfer learning with all the channels of a pre-trained CNN performed better than features extracted using any single channel, and we also constructed a new feature set by fusing quantitative features with deep features, which in turn enhanced classification performance. Third, ensemble learning with deep neural networks was a compelling approach that accurately predicted lung cancer incidence at the second screening after the baseline screen, mostly two years later. Fourth, we proposed a method for deep features to have a recognizable definition via semantic or quantitative features. Fifth, deep features were dependent on the scanner parameters, and the dependency was removed using pixel size based normalization.
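As a rough illustration of the ensemble idea in this excerpt, the Python sketch below averages the predicted malignancy probabilities of several models and thresholds the mean. The excerpt does not say how the 21 models were combined, so the averaging rule (soft voting), the function name `ensemble_predict`, and the 0.5 threshold are assumptions, and the numbers are toy values, not results from the dissertation.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-sample probabilities across models (soft voting).

    prob_lists: one list of probabilities per model, all of equal length.
    Returns (mean probabilities, 0/1 labels). Hypothetical helper for
    illustration only; the combination rule is an assumption.
    """
    if not prob_lists:
        raise ValueError("need at least one model")
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same samples")
    # Mean probability per sample over all models.
    mean_probs = [
        sum(model[k] for model in prob_lists) / n_models
        for k in range(n_samples)
    ]
    # A sample is labelled positive when the mean reaches the threshold.
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels


# Toy example: three models scoring two nodules.
toy_probs = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.3]]
mean_probs, labels = ensemble_predict(toy_probs)
```

Training the same architecture from different random seeds, as the excerpt describes, gives models whose errors are only partly correlated, which is why averaging them can be more stable than any single model.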

Book Machine Learning for Large-scale Genomics

Download or read book Machine Learning for Large-scale Genomics written by Yifei Chen. This book was released in 2014 with total page 125 pages. Available in PDF, EPUB and Kindle. Book excerpt: Genomic malformations are believed to be the driving factors of many diseases. Therefore, understanding the intrinsic mechanisms underlying the genome and informing clinical practices have become two important missions of large-scale genomic research. Recently, high-throughput molecular data have provided abundant information about the whole genome, and have popularized computational tools in genomics. However, traditional machine learning methodologies often suffer from strong limitations when dealing with high-throughput genomic data, because the latter are usually very high dimensional, highly heterogeneous, and can show complicated nonlinear effects. In this thesis, we present five new algorithms or models to address these challenges, each of which is applied to a specific genomic problem. Project 1 focuses on model selection in cancer diagnosis. We develop an efficient algorithm (ADMM-ENSVM) for the Elastic Net Support Vector Machine, which achieves simultaneous variable selection and max-margin classification. On a colon cancer diagnosis dataset, ADMM-ENSVM shows advantages over other SVM algorithms in terms of diagnostic accuracy, feature selection ability, and computational efficiency. Project 2 focuses on model selection in gene correlation analysis. We develop an efficient algorithm (SBLVGG), using a methodology similar to that of ADMM-ENSVM, for the Latent Variable Gaussian Graphical Model (LVGG). LVGG models the marginal concentration matrix of observed variables as a combination of a sparse matrix and a low rank one. Evaluated on a microarray dataset containing 6,316 genes, SBLVGG is notably faster than the state-of-the-art LVGG solver, and shows that most of the correlation among genes can be effectively explained by only tens of latent factors. 
Project 3 focuses on ensemble learning in cancer survival analysis. We develop a gradient boosting model (GBMCI), which does not explicitly assume particular forms of hazard functions, but trains an ensemble of regression trees to approximately optimize the concordance index. We benchmark the performance of GBMCI against several popular survival models on a large-scale breast cancer prognosis dataset. GBMCI consistently outperforms other methods across a number of feature representations, which are heterogeneous and contain missing values. Project 4 focuses on deep learning in gene expression inference (GEIDN). GEIDN is a large-scale neural network, which can infer ~21k target genes jointly from ~1k landmark genes and can naturally capture hierarchical nonlinear interactions among genes. We deploy deep learning techniques (dropout, momentum training, GPU computing, etc.) to train GEIDN. On a dataset of ~129k complete human transcriptomes, GEIDN outperforms both k-nearest neighbor regression and linear regression in predicting >99.96% of the target genes. Moreover, increased network scales help to improve GEIDN, while increased training data benefits GEIDN more than other methods. Project 5 focuses on deep learning in annotating coding and noncoding genetic variants (DANN). DANN is a neural network that differentiates evolutionarily derived alleles from simulated ones using 949 highly heterogeneous features. It can capture nonlinear relationships among features. We train DANN with the same deep learning techniques as for GEIDN. DANN achieves an 18.90% relative reduction in the error rate and a 14.52% relative increase in the area under the curve over CADD, a state-of-the-art algorithm to annotate genetic variants based on the linear SVM.
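The concordance index that Project 3 optimizes can be illustrated with a short sketch. This is not the GBMCI code from the thesis; it is a plain pairwise computation of the standard (Harrell-style) concordance index under right censoring, and the function name, argument names, and example values are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times       -- observed follow-up time for each patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean a shorter survival time
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the earlier event: correctly ordered
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. Because this quantity is a count over pairs and therefore not differentiable, a boosting method has to work with a smooth surrogate, which is consistent with the excerpt's wording "approximately optimize".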

Book Computational Prediction of Chemopreventive and Therapeutic Options in Cancer Using Whole genome Gene Expression Studies

Download or read book Computational Prediction of Chemopreventive and Therapeutic Options in Cancer Using Whole genome Gene Expression Studies written by Adam Matthew Gustafson. This book was released in 2009 with a total of 350 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: Cancer is a leading cause of death worldwide, accounting for over 7.9 million deaths annually. In this dissertation, computational methodologies utilizing whole-genome gene expression data were used to identify preventive and therapeutic opportunities to combat this disease. Three important aspects of cancer progression were studied: host response to carcinogens, early stages of tumorigenesis, and deregulated pathways in the primary tumor. First, host response to cigarette smoke in the bronchial airway of healthy current and never smokers was studied to elucidate what may be key regulatory relationships in the metabolism of carcinogens. MicroRNAs, which are short, non-coding RNAs involved in post-transcriptional gene regulation, were found to be primarily down-regulated in the airway of smokers. By integrating microRNA and mRNA airway expression data, mir-218 was identified as putatively inhibiting the transcription factor MAFG, which is predicted to regulate genes involved in the response to tobacco exposure. Second, early events in the development of lung cancer were studied using computational modeling approaches that incorporate gene expression signatures defined by in vitro perturbation of specific oncogenic pathways. When analyzing the cytologically normal bronchial airway of smokers with lung cancer and high-risk smokers with dysplastic airway lesions, the PI3K pathway had heightened activity throughout the respiratory tract prior to oncogenesis.
This has significant implications regarding preventive opportunities, as we found that a chemopreventive agent for lung cancer, myo-inositol, previously shown to cause regression of dysplasia, inhibits PI3K in vitro and in vivo. Finally, personalized treatment of primary breast cancer was explored by training models on gene expression data from in vitro drug response experiments to predict the responsiveness of a tumor to a drug. Ten compounds were studied, and predicted drug responsiveness was significantly linked to survival rates, highlighting their biological/clinical relevance. Some compounds showed synergy with conventional breast cancer subtypes, while others had autonomous patterns of sensitivity. Drug sensitivity predictions were validated in two mouse xenograft models, suggesting the computational methodology is pertinent and accurate. In summary, clinically relevant therapeutic information regarding deregulated pathways can be uncovered in gene expression data and used to improve our understanding of tumorigenesis and guide treatment of cancer.