EBookClubs

Read Books & Download eBooks Full Online

Book High Dimensional Methodologies for Sufficient Dimension Reduction, Discriminant Analysis, and Tensor Data

Download or read book High Dimensional Methodologies for Sufficient Dimension Reduction, Discriminant Analysis, and Tensor Data written by Jing Zeng and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Thanks to advances in data-collecting technology in brain imaging, genomics, financial econometrics, and machine learning, scientific data tend to grow in both size and structural complexity, making them unamenable to traditional statistical analysis. In this dissertation, we develop novel high-dimensional methodologies for dimension reduction, discriminant analysis, and tensor data. In the first chapter, we propose a unified framework, called subspace estimation with automatic dimension and variable selection (SEAS), to extend many existing low-dimensional sufficient dimension reduction (SDR) methods to the high-dimensional setting. The flexibility of SEAS considerably widens the application scope of many SDR methods. Our proposal relies only on a double-penalized convex formulation, which can be solved efficiently. On the theoretical side, we establish a convergence rate for our proposal that is optimal in a minimax sense. In the second chapter, we establish a population model for the reduced-rank linear discriminant analysis (LDA) problem, which arises naturally in many scenarios. We also develop an efficient algorithm and derive non-asymptotic results in the high-dimensional setting. In the last chapter, we study how two data modalities associate and interact with each other given a third modality, a crucial problem in multimodal integrative analysis for which no statistical solution was previously available. We formulate this problem as a tensor decomposition problem and propose a novel generalized liquid association analysis (GLAA) method, together with a high-order orthogonal iteration algorithm. Furthermore, we establish non-asymptotic results for the proposed estimators.
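
The low-dimensional SDR machinery that SEAS lifts to high dimensions can be made concrete with sliced inverse regression (SIR), the classic SDR estimator. Below is a minimal numpy sketch of SIR, not of SEAS itself; the function name, slice count, and direction count are illustrative choices, and the sketch assumes n > p so the covariance is invertible.

```python
import numpy as np

def sir(X, y, n_slices=10, n_directions=2):
    """Sliced inverse regression: estimate a basis of the central subspace.

    A low-dimensional sketch of the classic SDR method; high-dimensional
    frameworks such as SEAS replace the final eigendecomposition with a
    double-penalized convex program.
    """
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the response into bins of roughly equal size.
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the within-slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Top eigenvectors of M, mapped back to the original X scale.
    _, vecs = np.linalg.eigh(M)
    directions = inv_sqrt @ vecs[:, -n_directions:]
    return directions / np.linalg.norm(directions, axis=0)
```

When p exceeds n the eigendecomposition above breaks down, which is precisely the regime the penalized convex formulation is designed for.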

Book Dimension Reduction and Regression for Tensor Data and Mixture Models

Download or read book Dimension Reduction and Regression for Tensor Data and Mixture Models written by Ning Wang and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: In modern statistics, many data sets have complex structure, including but not limited to high dimensionality, higher order, and heterogeneity. Recently, there has been growing interest in developing valid and efficient statistical methods for such data. In this thesis, we study three types of data complexity: (1) tensor data (a.k.a. array-valued random objects); (2) heavy-tailed data; (3) data from heterogeneous subpopulations. We address these three challenges by developing novel methodologies and efficient algorithms. Specifically, we propose likelihood-based dimension folding methods for tensor data, study robust tensor regression via a proposed tensor t-distribution, and develop an algorithm and theory for high-dimensional mixture linear regression. The work on these three topics is elaborated as follows. In recent years, traditional multivariate analysis tools, such as multivariate regression and discriminant analysis, have been generalized from modeling random vectors and matrices to higher-order random tensors (a.k.a. array-valued random objects). Equipped with tensor algebra and high-dimensional computation techniques, concise and interpretable statistical models and estimation procedures prevail in various applications. One challenge for tensor data analysis is the large dimension of the tensor: many statistical methods, such as linear discriminant analysis and quadratic discriminant analysis, are inapplicable or unstable when the dimension exceeds the sample size. Sufficient dimension reduction methods are flexible tools for data visualization and exploratory analysis, typically in a regression of a univariate response on a multivariate predictor. For regressions with tensor predictors, a general framework of dimension folding and several moment-based estimation procedures have been proposed in the literature. In this essay, we propose two likelihood-based dimension folding methods motivated by quadratic discriminant analysis for tensor data: the maximum likelihood estimators are derived under a general covariance setting and a structured envelope covariance setting. We study the asymptotic properties of both estimators and show, using simulation studies and a real-data analysis, that they are more accurate than existing moment-based estimators. Another challenge for statistical tensor models is the non-Gaussian nature of many real-world data. Unfortunately, existing approaches are either restricted to normality or implicitly use least-squares-type objective functions that are computationally efficient but sensitive to data contamination. Motivated by this, we adopt a simple tensor t-distribution that, unlike the commonly used matrix t-distributions, is compatible with tensor operators and reshaping of the data. We study tensor response regression with tensor t-error, and develop penalized likelihood-based estimation as well as a novel one-step estimation. We study the asymptotic relative efficiency of various estimators and establish the one-step estimator's oracle properties and near-optimal asymptotic efficiency. We further propose a high-dimensional modification of the one-step estimation procedure and show that it attains the minimax optimal rate in estimation. Numerical studies show the excellent performance of the one-step estimator. In the last chapter, we consider high-dimensional mixture linear regression. The expectation-maximization (EM) algorithm and its variants are widely used in statistics. In high-dimensional mixture linear regression, the model is assumed to be a finite mixture of linear regressions and the number of predictors is much larger than the sample size. The standard EM algorithm, which attempts to find the maximum likelihood estimator, becomes infeasible. We devise a penalized EM algorithm and study its statistical properties. Existing theoretical results for regularized EM algorithms often rely on dividing the sample into many independent batches and employing a fresh batch in each iteration of the algorithm. Our algorithm and theoretical analysis do not require sample-splitting. The proposed method also performs encouragingly in simulation studies and a real data example.
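
To make the EM iterations described above concrete, here is a toy sketch of EM for a two-component mixture of linear regressions with Gaussian errors, where a soft-thresholding step stands in for the penalty. This is a low-dimensional illustration, not the dissertation's penalized algorithm; all names and defaults are illustrative.

```python
import numpy as np

def mixture_lm_em(X, y, n_iter=100, lam=0.0, seed=0):
    """EM for a two-component mixture of linear regressions.

    A toy sketch; the high-dimensional version in the thesis penalizes the
    M-step and analyzes the iterations without sample-splitting.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = rng.normal(size=(2, p))
    pi, sigma2 = 0.5, float(np.var(y))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component per point.
        res = y[None, :] - betas @ X.T                  # (2, n) residuals
        logw = -0.5 * res ** 2 / sigma2
        logw += np.log(np.array([pi, 1.0 - pi]))[:, None]
        w = np.exp(logw - logw.max(axis=0))
        w /= w.sum(axis=0)                              # responsibilities
        # M-step: weighted least squares per component.
        for k in range(2):
            W = w[k]
            G = X.T @ (W[:, None] * X) + 1e-8 * np.eye(p)
            betas[k] = np.linalg.solve(G, X.T @ (W * y))
            if lam > 0:  # soft-threshold stand-in for an l1 penalty
                betas[k] = np.sign(betas[k]) * np.maximum(
                    np.abs(betas[k]) - lam, 0.0)
        pi = float(w[0].mean())
        res = y[None, :] - betas @ X.T                  # updated residuals
        sigma2 = float((w * res ** 2).sum() / n)
    return betas, pi, sigma2
```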

Book Dimension Reduction

    Book Details:
  • Author : Christopher J. C. Burges
  • Publisher : Now Publishers Inc
  • Release : 2010
  • ISBN : 1601983786
  • Pages : 104 pages

Download or read book Dimension Reduction written by Christopher J. C. Burges and published by Now Publishers Inc. This book was released on 2010 with total page 104 pages. Available in PDF, EPUB and Kindle. Book excerpt: We give a tutorial overview of several foundational methods for dimension reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, canonical correlation analysis (CCA), kernel CCA, Fisher discriminant analysis, oriented PCA, and several techniques for sufficient dimension reduction. For the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps, and spectral clustering. Although the review focuses on foundations, we also provide pointers to some more modern techniques. We also describe the correlation dimension as one method for estimating the intrinsic dimension, and we point out that the notion of dimension can be a scale-dependent quantity. The Nyström method, which links several of the manifold algorithms, is also reviewed. We use a publicly available dataset to illustrate some of the methods. The goal is to provide a self-contained overview of key concepts underlying many of these algorithms, and to give pointers for further reading.
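
As a taste of the projective methods the tutorial opens with, here is a minimal PCA-via-SVD sketch; the function name is illustrative and the routine assumes the usual centered-data formulation.

```python
import numpy as np

def pca(X, n_components):
    """Principal component analysis via the thin SVD of centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                    # principal directions
    scores = U[:, :n_components] * S[:n_components]   # projected data
    explained = S[:n_components] ** 2 / (len(X) - 1)  # component variances
    return scores, components, explained
```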

Book Multilinear Subspace Learning

Download or read book Multilinear Subspace Learning written by Haiping Lu and published by CRC Press. This book was released on 2013-12-11 with total page 298 pages. Available in PDF, EPUB and Kindle. Book excerpt: Due to advances in sensor, storage, and networking technologies, data is being generated on a daily basis at an ever-increasing pace in a wide range of applications, including cloud computing, mobile Internet, and medical imaging. Such large multidimensional data require more efficient dimensionality reduction schemes than traditional techniques provide. Addressing this need, multilinear subspace learning (MSL) reduces the dimensionality of big data directly from its natural multidimensional representation, a tensor. Multilinear Subspace Learning: Dimensionality Reduction of Multidimensional Data gives a comprehensive introduction to both theoretical and practical aspects of MSL for the dimensionality reduction of multidimensional data based on tensors. It covers the fundamentals, algorithms, and applications of MSL. Emphasizing essential concepts and system-level perspectives, the authors provide a foundation for solving many of today’s most interesting and challenging problems in big multidimensional data processing. They trace the history of MSL, detail recent advances, and explore future developments and emerging applications. The book follows a unifying MSL framework formulation to systematically derive representative MSL algorithms. It describes various applications of the algorithms, along with their pseudocode. Implementation tips help practitioners in further development, evaluation, and application. The book also provides researchers with useful theoretical information on big multidimensional data in machine learning and pattern recognition. MATLAB® source code, data, and other materials are available at www.comp.hkbu.edu.hk/~haiping/MSL.html
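
The flavor of MSL can be suggested with a one-pass, HOSVD-style sketch that learns one projection matrix per tensor mode. This is a generic illustration in Python rather than the book's MATLAB, not a specific algorithm from the text, and the names are ours.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multilinear_pca(tensors, ranks):
    """One projection matrix per mode from a list of sample tensors.

    A one-pass HOSVD-style sketch; full MPCA-type methods alternate over
    modes until the captured variation converges.
    """
    mean = sum(tensors) / len(tensors)
    projections = []
    for mode, r in enumerate(ranks):
        # Scatter matrix of the mode-n unfoldings of the centered samples.
        S = sum(mode_unfold(T - mean, mode) @ mode_unfold(T - mean, mode).T
                for T in tensors)
        _, vecs = np.linalg.eigh(S)
        projections.append(vecs[:, -r:])    # top-r eigenvectors
    return mean, projections

def project(T, mean, projections):
    """Map a sample tensor into the learned low-dimensional subspace."""
    core = T - mean
    for mode, U in enumerate(projections):
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core
```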

Book Modern Dimension Reduction

Download or read book Modern Dimension Reduction written by Philip D. Waggoner and published by Cambridge University Press. This book was released on 2021-08-05 with total page 98 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data are not only ubiquitous in society, but are increasingly complex both in size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code, to efficiently represent the original high dimensional data space in a simplified, lower dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection, self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of high dimensional data so common in modern society. All code is publicly accessible on GitHub.
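
The Element's walkthroughs use R; for readers working in Python, an analogous call for one of the covered techniques, t-SNE via scikit-learn, might look like this (dataset and parameters are illustrative, not drawn from the book).

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed the 64-dimensional digits data into 2-D for visualization.
X, _ = load_digits(return_X_y=True)
embedding = TSNE(n_components=2, perplexity=30.0,
                 init="pca", random_state=0).fit_transform(X)
print(embedding.shape)  # (1797, 2)
```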

Book Elements of Dimensionality Reduction and Manifold Learning

Download or read book Elements of Dimensionality Reduction and Manifold Learning written by Benyamin Ghojogh and published by Springer Nature. This book was released on 2023-02-02 with total page 617 pages. Available in PDF, EPUB and Kindle. Book excerpt: Dimensionality reduction, also known as manifold learning, is an area of machine learning used for extracting informative features from data for better representation of data or separation between classes. This book presents a cohesive review of linear and nonlinear dimensionality reduction and manifold learning. Three main aspects of dimensionality reduction are covered: spectral dimensionality reduction, probabilistic dimensionality reduction, and neural network-based dimensionality reduction, which take geometric, probabilistic, and information-theoretic points of view on dimensionality reduction, respectively. The necessary background and preliminaries on linear algebra, optimization, and kernels are also explained to ensure a comprehensive understanding of the algorithms. The tools introduced in this book can be applied in various areas involving feature extraction, image processing, computer vision, and signal processing. This book is intended for a wide audience that would like to acquire a deep understanding of the various ways to extract, transform, and understand the structure of data. The intended audiences are academics, students, and industry professionals. Academic researchers and students can use this book as a textbook for machine learning and dimensionality reduction. Data scientists, machine learning scientists, computer vision scientists, and computer scientists can use this book as a reference. It can also be helpful to statisticians in the field of statistical learning and applied mathematicians in the fields of manifolds and subspace analysis. Industry professionals, including applied engineers, data engineers, and engineers in various fields of science dealing with machine learning, can use this as a guidebook for feature extraction from their data, as the raw data in industry often require preprocessing. The book is grounded in theory but provides thorough explanations and diverse examples to improve the reader’s comprehension of the advanced topics. Advanced methods are explained in a step-by-step manner so that readers of all levels can follow the reasoning and come to a deep understanding of the concepts. This book does not assume advanced theoretical background in machine learning and provides necessary background, although an undergraduate-level background in linear algebra and calculus is recommended.
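
To ground one of the spectral methods covered, here is a small dense-matrix sketch of Laplacian eigenmaps under the unnormalized-Laplacian simplification (the generalized problem L v = lambda D v is also common); the neighbor count and function name are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, n_components=2, n_neighbors=10):
    """Embed X via the bottom nontrivial eigenvectors of the graph Laplacian."""
    # Symmetrized k-nearest-neighbor adjacency with binary weights.
    W = kneighbors_graph(X, n_neighbors=n_neighbors).toarray()
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    # Skip the constant eigenvector associated with eigenvalue 0.
    return evecs[:, 1:n_components + 1]
```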

Book Tensor Dimension Reduction Methods for Modeling High Dimensional Spatio-temporal Data

Download or read book Tensor Dimension Reduction Methods for Modeling High Dimensional Spatio-temporal Data written by Rukayya Sani Ibrahim and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Data observed simultaneously in both space and time are becoming increasingly prevalent, with applications in diverse areas from ecology to financial econometrics. The datasets are massive, with several variables observed at varying locations and times, and are often accompanied by irregularities. There is therefore a need for models that can efficiently handle the size and all dependencies of massive datasets while predicting and forecasting well. In this work, we propose a new model for matrix-valued spatio-temporal data that applies the classic vector autoregressive (VAR) model to each column (location) of the matrix. This allows us to present the coefficient matrices in a unified format. To achieve dimension reduction, we decompose the folded coefficient matrix using tensor decomposition, which reduces the dimension in four directions and thereby not only significantly reduces the number of model parameters but also achieves substantial efficiency gains. We propose an alternating least squares algorithm to estimate the parameters of interest and derive the asymptotic properties of the proposed estimators in the low-dimensional setting. For the high-dimensional setting, we propose regularized estimation with sparsity-inducing norms and present a corresponding alternating least squares algorithm. We present simulation results and a real data analysis to demonstrate the superiority of our estimators.
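
The alternating least squares idea applied to the decomposed coefficient tensor can be illustrated on a generic rank-R CP decomposition of a 3-way tensor. This sketch is not the dissertation's estimator; the einsum calls compute the matricized-tensor-times-Khatri-Rao products, and the names and defaults are illustrative.

```python
import numpy as np

def cp_als(T, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating LS."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.normal(size=(d, rank)) for d in (I, J, K))
    for _ in range(n_iter):
        # Each factor update is a linear least-squares problem.
        A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv(
            (B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv(
            (A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv(
            (A.T @ A) * (B.T @ B))
    return A, B, C

# Reconstruction: T_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
# T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
```

The sparsity-inducing variant described in the abstract would add a penalized (e.g. soft-thresholded) step inside each factor update.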

Book Supervised Dimension Reduction Techniques for High-dimensional Data

Download or read book Supervised Dimension Reduction Techniques for High-dimensional Data written by Dylan Molho and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: The data sets arising in modern science and engineering are often extremely large, befitting the era of big data. They are large not only in the number of samples but also in the number of features, placing each data point in a high-dimensional space. Unique problems arise when the dimension of the data is of the same or even greater order than the sample size, a scenario known in statistics as the High Dimension, Low Sample Size (HDLSS) problem. In this paradigm, many standard statistical estimators perform sub-optimally and in some cases cannot be computed at all. To overcome the barriers found in HDLSS scenarios, one must make additional assumptions about the data, either with explicit formulations or with implicit beliefs about its behavior. The first type of research leads to structural assumptions placed on the probability model that generates the data, which allow classical methods to be altered into theoretically optimal estimators for well-defined tasks. The second type of research instead makes general assumptions, usually based on the causal nature of the chosen real-world data application, in which the data are assumed to have dependencies between the parameters. This dissertation develops two novel algorithms that successfully operate in the HDLSS paradigm. We first propose the Generalized Eigenvalue (GEV) estimator, a unified sparse projection regression framework for estimating generalized eigenvector problems. Unlike existing work, we reformulate a sequence of computationally intractable non-convex generalized Rayleigh quotient optimization problems into a computationally efficient simultaneous linear regression problem, augmented with a sparse penalty to deal with high-dimensional predictors. We showcase the applications of our method by considering three iconic problems in statistics: sliced inverse regression (SIR), linear discriminant analysis (LDA), and canonical correlation analysis (CCA). We show that the reformulated linear regression problem recovers the same projection space obtained by the original generalized eigenvalue problem. Statistically, we establish non-asymptotic error bounds for the proposed estimator in the applications to SIR and LDA, and prove these rates are minimax optimal. We present how the GEV is applied to the CCA problem, and adapt the method to a robust Huber-loss-based formulation for noisy data. We test our framework on both synthetic and real datasets and demonstrate its superior performance compared with other state-of-the-art methods in high-dimensional statistics. The second algorithm is scJEGNN, a graph neural network (GNN) tailored to the task of data integration for HDLSS single-cell sequencing data. We show that with its unique model, the GNN is able to leverage structural information about the biological data relations in order to perform a joint embedding of multiple modalities of single-cell gene expression data. The model is applied to data from the NeurIPS 2021 competition for Open Problems in Single-Cell Analysis, and we demonstrate that it outperforms top teams from the joint embedding task.
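
In low dimensions, the generalized eigenvalue problems that the GEV estimator reformulates, Fisher LDA's among them, can be solved directly. The baseline sketch below is that classical route, not the sparse regression reformulation; the names are illustrative and a small ridge is added for stability.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_lda_directions(X, y, n_directions=1, ridge=1e-6):
    """Discriminant directions from the generalized eigenproblem S_b v = lam S_w v.

    The low-dimensional baseline; the GEV estimator instead recovers the
    same subspace through a sparse, penalized linear regression when p >> n.
    """
    classes = np.unique(y)
    p = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((p, p))   # within-class scatter
    Sb = np.zeros((p, p))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        d = Xc.mean(axis=0) - mu
        Sb += len(Xc) * np.outer(d, d)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    Sw += ridge * np.eye(p)             # keep S_w positive definite
    evals, evecs = eigh(Sb, Sw)         # generalized symmetric eigenproblem
    return evecs[:, -n_directions:]     # top discriminant directions
```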

Book Tensor Data Analysis in High Dimensions

Download or read book Tensor Data Analysis in High Dimensions written by Keqian Min and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: A large number of tensor datasets have been appearing in modern scientific research, attracting much attention to their analysis. Tensor data often have high dimensionality and a tensor structure that carries extra information. Handling the high dimensionality and utilizing the structural information are essential for analyzing tensor data. Feature screening is a popular method for dealing with high dimensionality. In the first part of this dissertation, we study the smoothness structure of tensors and propose a general framework for tensor screening called smoothed tensor screening (STS). We establish the SURE screening property for STS under mild conditions. In the second part, we study the tensor Gaussian graphical model, which reveals the conditional independence structure within tensor data. With normally distributed M-way tensors, the key to high-dimensional tensor graphical models becomes the sparse estimation of the M inverse covariance matrices. To overcome the high computational cost of existing cyclic approaches, we propose a separable and parallel estimation scheme, and we provide numerical studies to demonstrate its performance. In the third part, we study the optimality theory of tensor discriminant analysis (TDA) in high dimensions. We provide a systematic investigation of the theoretical properties of TDA, obtaining minimax lower bounds for both coefficient estimation and misclassification risk. We further show that one existing high-dimensional tensor discriminant analysis estimator is minimax optimal.
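
The per-mode subproblem in a separable scheme resembles a standard sparse inverse-covariance fit, so scikit-learn's graphical lasso can stand in for a single-mode illustration. This generic sketch with simulated stand-in data is not the proposed parallel estimator.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sparse precision (inverse covariance) estimate for one tensor mode,
# treating rows of a mode-n unfolding as pseudo-samples. Illustrative only:
# the separable scheme in the dissertation handles all M modes in parallel.
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 10))   # stand-in for unfolded mode data
model = GraphicalLasso(alpha=0.1).fit(samples)
precision = model.precision_           # zeros encode conditional independence
print(np.round(precision, 2))
```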

Book Multimodal and Tensor Data Analytics for Industrial Systems Improvement

Download or read book Multimodal and Tensor Data Analytics for Industrial Systems Improvement written by Nathan Gaw and published by Springer Nature. This book was released with total page 388 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Tensor Networks for Dimensionality Reduction and Large Scale Optimization

Download or read book Tensor Networks for Dimensionality Reduction and Large Scale Optimization written by Andrzej Cichocki and published by . This book was released on 2016-12-19 with total page 196 pages. Available in PDF, EPUB and Kindle. Book excerpt: This monograph provides a systematic and example-rich guide to the basic properties and applications of tensor network methodologies and demonstrates their promise as a tool for the analysis of extreme-scale multidimensional data, showing that tensor networks can provide linearly or even super-linearly scalable solutions.
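
A canonical tensor-network construction is the tensor-train decomposition via sequential SVDs (TT-SVD). The compact sketch below, with a fixed rank cap and illustrative names, is one plausible rendering of the idea rather than an algorithm from the monograph.

```python
import numpy as np

def tt_svd(T, max_rank):
    """Decompose a d-way tensor into tensor-train (TT) cores via repeated SVD."""
    dims = T.shape
    cores, r = [], 1
    C = T.reshape(r * dims[0], -1)
    for k, n in enumerate(dims[:-1]):
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r_new = min(max_rank, len(S))
        cores.append(U[:, :r_new].reshape(r, n, r_new))
        # Carry the remainder forward, folded for the next mode.
        C = (S[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        r = r_new
    cores.append(C.reshape(r, dims[-1], 1))
    return cores

# Each core has shape (r_{k-1}, n_k, r_k); contracting the shared rank
# indices in sequence reconstructs an approximation of T.
```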

Book Dimension Reduction and Sufficient Graphical Models

Download or read book Dimension Reduction and Sufficient Graphical Models written by Kyongwon Kim and published by . This book was released on 2020 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: The methods I develop in my thesis are based on linear or nonlinear sufficient dimension reduction. The basic principle of linear sufficient dimension reduction is to extract a small number of linear combinations of the predictor variables that can represent the original predictors without loss of information on the conditional distribution of the response given the predictors. Nonlinear sufficient dimension reduction generalizes this principle to the nonlinear context. I focus on applying sufficient dimension reduction methods in two areas: regression modeling and graphical models. The first project concerns statistical inference in the regression context after sufficient dimension reduction. In the second, I apply nonlinear sufficient dimension reduction to well-known statistical graphical models in machine learning. The projects share a common thread: discovering areas where sufficient dimension reduction can be applied and establishing the statistical theory behind those applications. My first project is about post-sufficient-dimension-reduction statistical inference. The methodologies of sufficient dimension reduction have undergone extensive development in the past three decades. However, there has been a lack of systematic and rigorous development of post-dimension-reduction inference, which has seriously hindered its applications. The current common practice is to treat the estimated sufficient predictors as the true predictors and use them as the starting point of downstream statistical inference. This naive approach grossly overestimates the confidence level of an interval, or the power of a test, leading to distorted results. In this project, we develop a general and comprehensive framework for post-dimension-reduction inference, which can accommodate any dimension reduction method and any model building method, as long as their corresponding influence functions are available. Within this general framework, we derive the influence functions and present explicit post-reduction formulas for numerous combinations of dimension reduction and model building methods. We then develop post-reduction inference methods for both confidence intervals and hypothesis testing, and investigate the finite-sample performance of our procedures by simulations and a real data analysis. My second project applies nonlinear dimension reduction to graphical models. We introduce the Sufficient Graphical Model by applying recently developed nonlinear sufficient dimension reduction techniques to the evaluation of conditional independence. The model is nonparametric in nature, as it does not make distributional assumptions such as the Gaussian or copula Gaussian assumptions. However, unlike a fully nonparametric graphical model, which relies on a high-dimensional kernel to characterize conditional independence, our graphical model is based on conditional independence given a set of sufficient predictors of substantially reduced dimension. In this way, we avoid the curse of dimensionality that comes with a high-dimensional kernel. We develop the population-level properties, convergence rate, and consistency of our estimate. By simulation comparisons and an analysis of the DREAM 4 Challenge data set, we demonstrate that our method outperforms existing methods when the Gaussian or copula Gaussian assumptions are violated, and that its performance remains excellent in the high-dimensional setting.
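
In the simplest linear-Gaussian case, the conditional-independence evaluation at the heart of the Sufficient Graphical Model can be mimicked by a partial-correlation test. The sketch below is that classical stand-in, not the kernel-based nonlinear SDR estimator of the thesis; the function name is illustrative.

```python
import numpy as np
from scipy.stats import norm

def partial_corr_test(x, y, Z):
    """Fisher-z test of x independent of y given the columns of Z.

    A linear-Gaussian stand-in for the nonparametric conditional
    independence evaluation used by the Sufficient Graphical Model.
    """
    n = len(x)
    # Residualize x and y on Z (with an intercept).
    Z1 = np.column_stack([np.ones(n), Z])
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    # Fisher z-transform; approximate null variance 1 / (n - |Z| - 3).
    z = np.arctanh(r) * np.sqrt(n - Z.shape[1] - 3)
    return 2 * norm.sf(abs(z))             # two-sided p-value
```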

Book Dimensionality Reduction with Unsupervised Nearest Neighbors

Download or read book Dimensionality Reduction with Unsupervised Nearest Neighbors written by Oliver Kramer and published by Springer Science & Business Media. This book was released on 2013-05-30 with total page 137 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book is devoted to a novel approach to dimensionality reduction based on the well-known nearest neighbor method, a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as an efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, ranging from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies, on artificial test data sets as well as real-world data, demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustrate the introduced concepts and to highlight the experimental results.
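
The "simple iterative strategy for discrete latent spaces" can be caricatured in a few lines: each pattern is placed, in turn, at the free grid position whose nearest latent neighbors best reconstruct it in data space. This toy sketch is our reading of the idea, not an algorithm from the book, and every name and default is illustrative.

```python
import numpy as np

def unn_embed(X, n_positions=None, k=2):
    """Toy unsupervised nearest neighbors: greedy 1-D discrete embedding.

    Each pattern tries every free latent grid position and keeps the one
    minimizing the data-space error of k-nearest-neighbor reconstruction.
    """
    n = len(X)
    grid = np.arange(n_positions or n, dtype=float)
    latent = {}                              # pattern index -> grid position
    for i in range(n):
        if len(latent) < k:                  # seed the first few arbitrarily
            latent[i] = grid[len(latent)]
            continue
        placed = list(latent)
        best_pos, best_err = None, np.inf
        for pos in grid:
            if pos in latent.values():
                continue
            # k nearest already-placed neighbors in latent space.
            d = np.abs(np.array([latent[j] for j in placed]) - pos)
            nn = [placed[t] for t in np.argsort(d)[:k]]
            err = np.linalg.norm(X[i] - X[nn].mean(axis=0))
            if err < best_err:
                best_pos, best_err = pos, err
        latent[i] = best_pos
    return np.array([latent[i] for i in range(n)])
```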

Book Advances in Forward Sufficient Dimension Reduction Methods for Statistical Learning

Download or read book Advances in Forward Sufficient Dimension Reduction Methods for Statistical Learning written by Harris Quach and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Modern information collection methods continue to generate an abundance of data from which practitioners, in fields ranging from the social sciences and humanities to the natural and biomedical sciences, attempt to draw new insights and discoveries through statistical analyses. Against this backdrop, dimension reduction methods have come to prominence as a tool for extracting important or informative features from high dimensional data sets. In particular, Sufficient Dimension Reduction (SDR) methods have become a popular and effective tool for supervised dimension reduction. Since the seminal paper by K.-C. Li (1991), Sufficient Dimension Reduction has advanced at a rapid pace, with inverse regression methods undergoing substantial development and becoming widely applicable. Alternative approaches to sufficient dimension reduction have received relatively less attention, despite being less restrictive and more effective in many scenarios. The objective of this thesis is to advance the forward regression approach to sufficient dimension reduction by developing methods that are more effective for categorical and ordinal responses, and methods that apply to functional data.

Book Discovery of Latent Factors in High-dimensional Data Using Tensor Methods

Download or read book Discovery of Latent Factors in High-dimensional Data Using Tensor Methods written by Furong Huang and published by . This book was released on 2016 with total page 261 pages. Available in PDF, EPUB and Kindle. Book excerpt: Unsupervised learning aims at the discovery of the hidden structure that drives observations in the real world. It is essential for success in modern machine learning and artificial intelligence. Latent variable models are versatile in unsupervised learning and have applications in almost every domain, e.g., social network analysis, natural language processing, computer vision, and computational biology. Training latent variable models is challenging due to the non-convexity of the likelihood objective function. An alternative method is based on the spectral decomposition of low-order moment matrices and tensors. This versatile framework is guaranteed to estimate the correct model consistently. My thesis spans both theoretical analysis of the tensor decomposition framework and practical implementation of various applications. This thesis presents theoretical results on convergence to the globally optimal solution of tensor decomposition using stochastic gradient descent, despite the non-convexity of the objective. This is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. This thesis also presents large-scale deployment of spectral methods (matrix and tensor decomposition) carried out on CPU, GPU, and Spark platforms. Dimensionality reduction techniques such as random projection are incorporated for a highly parallel and scalable tensor decomposition algorithm. We obtain gains of several orders of magnitude in both accuracy and running time compared with state-of-the-art variational methods. To solve real-world problems, more advanced models and learning algorithms are proposed. After introducing the tensor decomposition framework under the latent Dirichlet allocation (LDA) model, this thesis discusses generalizations of the LDA model: the mixed membership stochastic block model for learning hidden user commonalities or communities in social networks, a convolutional dictionary model for learning phrase templates and word-sequence embeddings, hierarchical tensor decomposition and a latent tree structure model for learning disease hierarchies in healthcare analytics, and a spatial point process mixture model for detecting cell types in neuroscience.
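
The spectral framework described above rests on decomposing moment tensors; for an orthogonally decomposable symmetric third-order tensor, the classical tensor power iteration recovers one component at a time. The sketch below is that textbook routine (whitening and deflation omitted), not the thesis's stochastic gradient algorithm, and the names are illustrative.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Recover one component of a symmetric 3-way tensor by power iteration.

    For T = sum_r w_r a_r (x) a_r (x) a_r with orthonormal a_r, the map
    v <- T(I, v, v) / ||T(I, v, v)|| converges to one of the a_r.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)   # the multilinear map T(I, v, v)
        v /= np.linalg.norm(v)
    weight = np.einsum('ijk,i,j,k->', T, v, v, v)   # recovered w_r
    return v, weight
```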

Book Transactions on Rough Sets XXIII

Download or read book Transactions on Rough Sets XXIII written by James F. Peters and published by Springer Nature. This book was released on 2023-01-01 with total page 513 pages. Available in PDF, EPUB and Kindle. Book excerpt: The LNCS journal Transactions on Rough Sets is devoted to the entire spectrum of rough sets related issues, from logical and mathematical foundations, through all aspects of rough set theory and its applications, such as data mining, knowledge discovery, and intelligent information processing, to relations between rough sets and other approaches to uncertainty, vagueness, and incompleteness, such as fuzzy sets and theory of evidence. Volume XXIII in the series is a continuation of a number of research streams that have grown out of the seminal work of Zdzislaw Pawlak during the first decade of the 21st century.