EBookClubs

Read Books & Download eBooks Full Online

Book Exploiting Structure in Large scale Optimization for Machine Learning

Download or read book Exploiting Structure in Large scale Optimization for Machine Learning written by Cho-Jui Hsieh and published by . This book was released on 2015 with total page 288 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the immense growth of data, there is a great need for solving large-scale machine learning problems. Classical optimization algorithms usually cannot scale up due to the huge amount of data and/or model parameters. In this thesis, we show that these scalability issues can often be resolved by exploiting three types of structure in machine learning problems: problem structure, model structure, and data distribution. This central idea applies to many machine learning problems. In this thesis, we describe in detail how to exploit structure for kernel classification and regression, matrix factorization for recommender systems, and structure learning for graphical models. We further provide a comprehensive theoretical analysis of the proposed algorithms, establishing both local and global convergence rates for a family of inexact first-order and second-order optimization methods.

Book Optimization Methods for Structured Machine Learning Problems

Download or read book Optimization Methods for Structured Machine Learning Problems written by Nikolaos Tsipinakis and published by . This book was released on 2019 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Solving large-scale optimization problems lies at the core of modern machine learning applications. Unfortunately, obtaining a sufficiently accurate solution quickly is a difficult task. However, the problems we consider in many machine learning applications exhibit a particular structure. In this thesis we study optimization methods and improve their convergence behavior by taking advantage of such structures. In particular, this thesis consists of two parts: In the first part of the thesis, we consider the Temporal Difference learning (TD) problem in off-line Reinforcement Learning (RL). In off-line RL, it is typically the case that the number of samples is small compared to the number of features. Therefore, recent advances have focused on efficient algorithms that incorporate feature selection via ℓ1-regularization, which effectively avoids over-fitting. Unfortunately, the TD optimization problem reduces to a fixed-point problem where convexity of the objective function cannot be assumed. Further, it remains unclear whether existing algorithms can offer good approximations for the task of policy evaluation and improvement (either they are non-convergent or they do not solve the fixed-point problem). In this part of the thesis, we attempt to solve the ℓ1-regularized fixed-point problem with the help of the Alternating Direction Method of Multipliers (ADMM), and we argue that the proposed method is well suited to the structure of the aforementioned fixed-point problem. In the second part of the thesis, we study multilevel methods for large-scale optimization and extend their theoretical analysis to self-concordant functions.
In particular, we address the following issues that arise in the analysis of second-order optimization methods based on sampling, randomization, or sketching: (a) the analysis of the iterates is not scale-invariant, and (b) fast global convergence rates are lacking without restrictive assumptions. We argue that, with the analysis undertaken in this part of the thesis, the analysis of randomized second-order methods can be considered on par with that of the classical Newton method. Further, we demonstrate how our proposed method can exploit typical spectral structures of the Hessian that arise in machine learning applications to further improve the convergence rates.
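The pairing of ℓ1-regularization with ADMM described in the excerpt above can be illustrated on a much simpler problem than the TD fixed point: a standard lasso. The sketch below is illustrative only; the least-squares fidelity term stands in for the more involved fixed-point operator, and all names and parameter values are our own choices, not the thesis's.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: elementwise shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, b, lam=0.1, rho=1.0, iters=200):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by ADMM with the
    standard splitting x = z: a ridge-like x-update, a shrinkage
    z-update, and a scaled dual (multiplier) update."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    AtA = A.T @ A + rho * np.eye(n)    # system matrix, formed once
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))   # x-update
        z = soft_threshold(x + u, lam / rho)            # z-update
        u = u + x - z                                   # dual update
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true                        # noiseless sparse regression
x_hat = lasso_admm(A, b)
```

On this noiseless instance the shrinkage step recovers the three-coordinate support almost exactly, which is the feature-selection effect the excerpt attributes to ℓ1-regularization.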

Book Essays in Large Scale Optimization Algorithm and Its Application in Revenue Management

Download or read book Essays in Large Scale Optimization Algorithm and Its Application in Revenue Management written by Mingxi Zhu (Researcher in optimization algorithms) and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation focuses on large-scale optimization algorithms and their application in revenue management. It comprises three chapters. Chapter 1, Managing Randomization in the Multi-Block Alternating Direction Method of Multipliers for Quadratic Optimization, provides theoretical foundations for managing randomization in the multi-block alternating direction method of multipliers (ADMM) for quadratic optimization. Chapter 2, How a Small Amount of Data Sharing Benefits Distributed Optimization and Learning, presents both theoretical and practical evidence that sharing a small amount of data can greatly benefit distributed optimization and learning. Chapter 3, Dynamic Exploration and Exploitation: The Case of Online Lending, studies exploration/exploitation trade-offs and the value of dynamically extracting information in the context of online lending. The first chapter is joint work with Kresimir Mihic and Yinyu Ye. The Alternating Direction Method of Multipliers (ADMM) has gained a lot of attention for solving large-scale, objective-separable constrained optimization problems. However, the two-block variable structure of ADMM still limits the method's practical computational efficiency, because at least one large matrix factorization is needed even for linear and convex quadratic programming. This drawback may be overcome by enforcing a multi-block structure on the decision variables in the original optimization problem. Unfortunately, multi-block ADMM, with more than two blocks, is not guaranteed to converge.
On the other hand, two positive developments have been made: first, if in each cyclic loop one randomly permutes the updating order of the multiple blocks, then the method converges in expectation for solving any system of linear equations with any number of blocks. Second, such a randomly permuted ADMM also works for equality-constrained convex quadratic programming even when the objective function is not separable. The goal of this paper is twofold. First, we add more randomness to ADMM by developing a randomly assembled cyclic ADMM (RAC-ADMM), in which the decision variables in each block are randomly assembled. We discuss the theoretical properties of RAC-ADMM, show when random assembling helps and when it hurts, and develop a criterion that guarantees almost-sure convergence. Second, guided by this theory, we conduct multiple numerical tests on solving both randomly generated and large-scale benchmark quadratic optimization problems, including continuous and binary graph-partition and quadratic-assignment problems as well as selected machine learning problems. Our numerical tests show that RAC-ADMM, with a variable-grouping strategy, can significantly improve computational efficiency on most quadratic optimization problems. The second chapter is joint work with Yinyu Ye. Distributed optimization algorithms have been widely used in machine learning and statistical estimation, especially in settings where multiple decentralized data centers exist and the decision maker must perform collaborative learning across them. While distributed optimization algorithms have merits in parallel processing and protecting local data security, they often suffer from slow convergence compared with centralized optimization algorithms. This paper focuses on how a small amount of data sharing can benefit distributed optimization and learning for more advanced optimization algorithms.
Specifically, we consider how data sharing can benefit the distributed multi-block alternating direction method of multipliers (ADMM) and the preconditioned conjugate gradient method (PCG), with applications to the machine learning tasks of linear and logistic regression. These algorithms are commonly viewed as lying between first- and second-order methods, and we show that data sharing can greatly boost the convergence speed of this class of algorithms. Theoretically, we prove that a small amount of data sharing leads to an improvement from a near-worst to a near-optimal convergence rate when applying ADMM and PCG to machine learning tasks. A theoretical by-product is a tight upper bound on the linear convergence rate of distributed ADMM applied to linear regression. We further propose a meta randomized data-sharing scheme and provide its tailored applications in multi-block ADMM and PCG, so as to enjoy both the benefit of data sharing and the efficiency of distributed computing. The numerical evidence convinces us that our algorithms provide good-quality estimators for both least-squares and logistic regression in far fewer iterations by sharing only 5% of pre-fixed data, while purely distributed optimization algorithms may take hundreds of times more iterations to converge. We hope that the findings of this paper will encourage even a small amount of data sharing among different regions to tackle difficult global learning problems. The third chapter is joint work with Haim Mendelson. This paper studies exploration and exploitation trade-offs in the context of online lending. Unlike traditional contexts where the cost of exploration is an opportunity cost of lost revenue or some other implicit cost, in unsecured online lending the lender effectively gives away money in order to learn about the borrower's ability to repay.
In our model, the lender maximizes the expected net present value of the cash flow she receives by dynamically adjusting the loan amounts and the interest (discount) rate as she learns about the borrower's unknown income. The lender has to carefully balance the trade-off between earning more interest when she lends more and the risk of default, and we provide the optimal dynamic policy for the lender. The optimal policy supports classic "lean experimentation" in certain regimes, while challenging that concept in others. When the demand elasticity is zero (the discount rate is set exogenously), or the elasticity is a decreasing function of the discount rate, the optimal policy is characterized by a large number of small experiments with increasing repayment amounts. When the demand elasticity is constant, or when it is an increasing function of the discount rate, we obtain a two-step optimal policy: the lender performs a single experiment and then, if the borrower repays the loan, offers the same loan amount and discount rate in each subsequent period without any further experimentation. This result sheds light on how to account for market churn, as measured by elasticity, in dynamic experiment design under uncertainty. We further derive implications of the optimal policies, including the impact of income variability, the value of information, and consumer segmentation. Lastly, we extend the methodology to analyze the Buy-Now-Pay-Later business model and provide policy suggestions.
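The random-permutation idea behind RAC-ADMM in the first chapter above can be conveyed with a toy analogue: exact minimization over one block of coordinates at a time, visiting the blocks in a freshly shuffled order on every sweep. This is only a sketch of the permutation mechanism under simplifying assumptions (an unconstrained convex quadratic, no dual updates, an arbitrary fixed block partition), not the RAC-ADMM algorithm itself:

```python
import numpy as np

def rp_block_minimize(Q, c, blocks, sweeps=200, rng=None):
    """Minimize 0.5*x^T Q x - c^T x by exact minimization over one
    block of coordinates at a time, visiting the blocks in a fresh
    random order on every sweep (the random-permutation idea)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(c)
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in rng.permutation(len(blocks)):   # reshuffle each cycle
            idx = blocks[i]
            rest = np.setdiff1d(np.arange(n), idx)
            # Exact block minimization: solve the block's optimality
            # condition Q[idx,idx] x[idx] = c[idx] - Q[idx,rest] x[rest].
            rhs = c[idx] - Q[np.ix_(idx, rest)] @ x[rest]
            x[idx] = np.linalg.solve(Q[np.ix_(idx, idx)], rhs)
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
Q = M @ M.T + 8.0 * np.eye(8)        # well-conditioned SPD matrix
c = rng.standard_normal(8)
blocks = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 8)]
x_hat = rp_block_minimize(Q, c, blocks)
x_star = np.linalg.solve(Q, c)       # exact minimizer of the quadratic
```

Each inner solve only factors a small block, never the full matrix, which mirrors the motivation the excerpt gives for moving beyond two-block ADMM.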

Book Convex Optimization Algorithms and Statistical Bounds for Learning Structured Models

Download or read book Convex Optimization Algorithms and Statistical Bounds for Learning Structured Models written by Amin Jalali and published by . This book was released on 2016 with total page 178 pages. Available in PDF, EPUB and Kindle. Book excerpt: Design and analysis of tractable methods for estimation of structured models from massive high-dimensional datasets has been a topic of research in statistics, machine learning and engineering for many years. Regularization, the act of simultaneously optimizing a data fidelity term and a structure-promoting term, is a widely used approach in different machine learning and signal processing tasks. Appropriate regularizers, with efficient optimization techniques, can help in exploiting the prior structural information on the underlying model. This dissertation is focused on exploring new structures, devising efficient convex relaxations for exploiting them, and studying the statistical performance of such estimators. We address three problems under this framework on which we elaborate below. In many applications, we aim to reconstruct models that are known to have more than one structure at the same time. Having a rich literature on exploiting common structures like sparsity and low rank at hand, one could pose similar questions about simultaneously structured models with several low-dimensional structures. Using the respective known convex penalties for the involved structures, we show that multi-objective optimization with these penalties can do no better, order-wise, than exploiting only one of the present structures. This suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation, not one that combines the convex relaxations for each structure. This work, while applicable for general structures, yields interesting results for the case of sparse and low-rank matrices which arise in applications such as sparse phase retrieval and quadratic compressed sensing. 
We then turn our attention to the design and efficient optimization of convex penalties for structured learning. We introduce a general class of semidefinite representable penalties, called variational Gram functions (VGF), and provide a list of optimization tools for solving regularized estimation problems involving VGFs. Exploiting the variational structure in VGFs, as well as the variational structure in many common loss functions, enables us to devise efficient optimization techniques as well as to provide guarantees on the solutions of many regularized loss minimization problems. Finally, we explore the statistical and computational trade-offs in the community detection problem. We study recovery regimes and algorithms for community detection in sparse graphs generated under a heterogeneous stochastic block model in its most general form. In this quest, we were able to expand the applicability of semidefinite programs (in exact community detection) to some new and important network configurations, which provides us with a better understanding of the ability of semidefinite programs in reaching statistical identifiability limits.

Book Optimization for Machine Learning

Download or read book Optimization for Machine Learning written by Suvrit Sra and published by MIT Press. This book was released on 2012 with total page 509 pages. Available in PDF, EPUB and Kindle. Book excerpt: An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.

Book First Order Methods for Large Scale Convex Optimization

Download or read book First Order Methods for Large Scale Convex Optimization written by Zi Wang and published by . This book was released on 2016 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: The revolution in storage technology over the past few decades has made it possible to gather tremendous amounts of data, from demand and sales records to web user behavior, customer ratings, software logs, and patient data in healthcare. Recognizing patterns and discovering knowledge in large amounts of data has become more and more important, and has attracted significant attention in the fields of operations research (OR), statistics, and computer science. Mathematical programming is an essential tool within these fields, especially for data mining and machine learning, and it plays a significant role in data-driven prediction, decision making, and pattern recognition. The major challenge in solving these large-scale optimization problems is processing large data sets within practically tolerable run-times. This is where the advantages of first-order algorithms become clearly apparent. These methods use only gradient information and are particularly good at computing medium-accuracy solutions. In contrast, interior-point method computations that exploit second-order information quickly become intractable, even for moderate-size problems, since the complexity of each factorization of an n × n matrix in interior-point methods is O(n^3). The memory required by second-order methods can also be an issue in practice for problems with dense data matrices, due to limited RAM.
Another benefit of first-order methods is that one can exploit additional structural information of the problem to further improve the efficiency of these algorithms. In this dissertation, we study convex regression and multi-agent consensus optimization problems, and develop new fast first-order iterative algorithms to efficiently compute ε-optimal and ε-feasible solutions to these large-scale optimization problems in parallel, distributed, or asynchronous computation settings while carefully managing memory usage. The proposed algorithms take advantage of the structural information of the specific problems considered in this dissertation and scale well to large problems. Our numerical results show the advantages of the proposed methods over traditional methods in terms of speed, memory usage, and, especially for the distributed methods, communication requirements.
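The contrast the excerpt draws between first-order and factorization-based methods can be seen in a few lines: gradient descent on a least-squares problem touches the data only through matrix-vector products and never forms or factors an n × n system. This is a generic illustration, not the dissertation's algorithms; the 1/L step size and iteration count are our own choices.

```python
import numpy as np

def least_squares_gd(A, b, iters=500):
    """Minimize 0.5*||Ax - b||^2 with plain gradient descent.
    Each iteration costs two matrix-vector products; no n x n
    factorization is ever formed, in contrast to interior-point
    or Newton-type methods."""
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = A.T @ (A @ x - b)       # O(mn) work per iteration
        x -= grad / L                  # fixed step size 1/L
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
x_gd = least_squares_gd(A, b)
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
```

On this well-conditioned instance the gradient iterates match the direct least-squares solution to high accuracy, while the per-iteration cost stays linear in the size of the data matrix.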

Book Tractability

    Book Details:
  • Author : Lucas Bordeaux
  • Publisher : Cambridge University Press
  • Release : 2014-02-06
  • ISBN : 110772922X
  • Pages : 401 pages

Download or read book Tractability written by Lucas Bordeaux and published by Cambridge University Press. This book was released on 2014-02-06 with total page 401 pages. Available in PDF, EPUB and Kindle. Book excerpt: Classical computer science textbooks tell us that some problems are 'hard'. Yet many areas, from machine learning and computer vision to theorem proving and software verification, have defined their own set of tools for effectively solving complex problems. Tractability provides an overview of these different techniques, and of the fundamental concepts and properties used to tame intractability. This book will help you understand what to do when facing a hard computational problem. Can the problem be modelled by convex, or submodular functions? Will the instances arising in practice be of low treewidth, or exhibit another specific graph structure that makes them easy? Is it acceptable to use scalable, but approximate algorithms? A wide range of approaches is presented through self-contained chapters written by authoritative researchers on each topic. As a reference on a core problem in computer science, this book will appeal to theoreticians and practitioners alike.

Book Evolutionary Large Scale Multi Objective Optimization and Applications

Download or read book Evolutionary Large Scale Multi Objective Optimization and Applications written by Xingyi Zhang and published by John Wiley & Sons. This book was released on 2024-09-11 with total page 358 pages. Available in PDF, EPUB and Kindle. Book excerpt: Tackle the most challenging problems in science and engineering with these cutting-edge algorithms Multi-objective optimization problems (MOPs) are those in which more than one objective needs to be optimized simultaneously. As a ubiquitous component of research and engineering projects, these problems are notoriously challenging. In recent years, evolutionary algorithms (EAs) have shown significant promise in their ability to solve MOPs, but challenges remain at the level of large-scale multi-objective optimization problems (LSMOPs), where the number of variables increases and the optimized solution is correspondingly harder to reach. Evolutionary Large-Scale Multi-Objective Optimization and Applications constitutes a systematic overview of EAs and their capacity to tackle LSMOPs. It offers an introduction to both the problem class and the algorithms before delving into some of the cutting-edge algorithms which have been specifically adapted to solving LSMOPs. Deeply engaged with specific applications and alert to the latest developments in the field, it’s a must-read for students and researchers facing these famously complex but crucial optimization problems. The book’s readers will also find: Analysis of multi-optimization problems in fields such as machine learning, network science, vehicle routing, and more Discussion of benchmark problems and performance indicators for LSMOPs Presentation of a new taxonomy of algorithms in the field Evolutionary Large-Scale Multi-Objective Optimization and Applications is ideal for advanced students, researchers, and scientists and engineers facing complex optimization problems.

Book Large Scale Optimization Methods for Machine Learning

Download or read book Large Scale Optimization Methods for Machine Learning written by Shuai Zheng and published by . This book was released on 2019 with total page 264 pages. Available in PDF, EPUB and Kindle. Book excerpt:

Book Deep Learning Approach to Large Scale Systems

Download or read book Deep Learning Approach to Large Scale Systems written by Abdulelah Lafi Altamimi and published by . This book was released on 2023 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: The significance of large-scale systems has increased recently due to the growth in data and the number of users. The computational cost of analyzing these high-dimensional systems due to the curse of dimensionality raises the urge for developing efficient approaches. Deep learning methods have the capability and scalability to process high-volume data with significantly lower computational complexity. In this work, deep learning algorithms are utilized to solve large-scale systems in different applications. We design and solve high-dimensional systems using tractable algorithms. In particular, the deep reinforcement learning method and deep neural network are employed in our work in maximizing problems and classification problems, respectively. Comparisons with conventional algorithms are performed for validation purposes. Moreover, this work proposes an approach to exploiting the knowledge of the physical structure of plants inspired by deep learning algorithms. An application in the forest management field considered in this work is a large-scale forest model for wildfire mitigation. A high-dimensional forest model is designed in the Markov decision process framework. The model includes the probability of wildfire occurrence in a large number of stands. The probability of wildfire in each stand is a function of wind direction, flammability, and the stand's timber volume. Wildfire reduction is achieved by maximizing the timber volume in the forest through management actions. A deep reinforcement learning approach, i.e., the actor-critic algorithm, is used to solve the Markov decision process and propose management policies. 
Furthermore, the performances of conventional Markov decision process solutions, i.e., the value iteration algorithm and the genetic algorithm, are compared to the proposed approach. It outperforms these algorithms in terms of the value of the timber volume and the computational cost. Another interesting application considered in this thesis is fast stochastic predictive control. In the proposed approach, the computational complexity of solving stochastic predictive control is significantly reduced using deep learning. In particular, the number of constraints in the sampled method is reduced to the minimal set required to solve the optimization problem. Determining these constraints, i.e., the policies, is considered a classification problem to be solved using a neural network. The small number of constraints and the solvable quadratic optimization problem introduced by the sampled method result in a fast stochastic model predictive control. In this thesis, we also propose an approach to exploiting the prior knowledge of the physically interconnected systems in the parameter estimation domain. Unlike the physics-informed neural network, the proposed approach can estimate the parameters for every system in the interconnection. It has a general form that can be applied to any system as well as an objective function. We also combine the case of prior knowledge of system function with the case of the unavailability of this information. The Fourier series approximation method is used when knowledge of system functions is not available. The first-order gradient descent algorithm is considered to minimize the estimation error in the objective function. For that, we provide a systematic way to compute the gradients of the objective function. Using several versions of the gradient descent algorithm, the proposed solution shows promising results in the estimation of the system parameters.

Book Optimal Stochastic and Distributed Algorithms for Machine Learning

Download or read book Optimal Stochastic and Distributed Algorithms for Machine Learning written by Hua Ouyang and published by . This book was released on 2013 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Stochastic and data-distributed optimization algorithms have received a lot of attention from the machine learning community, owing to the tremendous demand from large-scale learning and big-data-related optimization. Many stochastic and deterministic learning algorithms have been proposed recently under various application scenarios. Nevertheless, many of these algorithms are based on heuristics, and their optimality in terms of generalization error is not sufficiently justified. In this talk, I will explain the concept of an optimal learning algorithm and show that, given a time budget and a proper hypothesis space, only those algorithms achieving the lower bounds on the estimation error and the optimization error are optimal. Guided by this concept, we investigated the stochastic minimization of nonsmooth convex loss functions, a central problem in machine learning. We proposed a novel algorithm named Accelerated Nonsmooth Stochastic Gradient Descent, which exploits the structure of common nonsmooth loss functions to achieve optimal convergence rates for a class of problems including SVMs. It is the first stochastic algorithm to achieve the optimal O(1/t) rate for minimizing nonsmooth loss functions. The fast rates are confirmed by empirical comparisons with state-of-the-art algorithms, including averaged SGD. The Alternating Direction Method of Multipliers (ADMM) is another flexible method for exploiting function structure. In the second part we proposed a stochastic ADMM that can be applied to a general class of convex and nonsmooth functions, beyond the smooth and separable least-squares loss used in the lasso.
We also establish rates of convergence for our algorithm under various structural assumptions on the stochastic function: O(1/√t) for convex functions and O(log t / t) for strongly convex functions. A novel application, named Graph-Guided SVM, is proposed to demonstrate the usefulness of our algorithm. We also extend the scalability of stochastic algorithms to nonlinear kernel machines, where the problem is formulated as a constrained dual quadratic optimization. The simplex constraint can be handled by the classic Frank-Wolfe method. The proposed stochastic Frank-Wolfe methods achieve comparable or even better accuracy than state-of-the-art batch and online kernel SVM solvers, and are significantly faster. The last part investigates the problem of data-distributed learning. We formulate it as a consensus-constrained optimization problem and solve it with ADMM. It turns out that the underlying communication topology is a key factor in balancing a fast learning rate against computation-resource consumption. We analyze the linear convergence behavior of consensus ADMM to characterize the interplay between the communication topology and the penalty parameters used in ADMM. We observe that, given optimal parameters, the complete bipartite and master-slave graphs exhibit the fastest convergence, followed by bi-regular graphs.
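The Frank-Wolfe step mentioned above, handling a simplex constraint by repeatedly moving toward the best vertex, can be sketched on a small quadratic. This is a generic deterministic illustration with the classic 2/(t+2) step size, not the stochastic kernel-SVM variant from the thesis; the toy instance is our own.

```python
import numpy as np

def frank_wolfe_simplex(Q, c, iters=500):
    """Minimize 0.5*x^T Q x + c^T x over the probability simplex
    {x : x >= 0, sum(x) = 1}. The linear subproblem over the simplex
    is solved in closed form: the minimizing vertex e_i is the
    coordinate with the most negative gradient entry."""
    n = len(c)
    x = np.ones(n) / n                 # start at the simplex center
    for t in range(iters):
        grad = Q @ x + c
        i = int(np.argmin(grad))       # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)        # classic Frank-Wolfe step size
        x = (1.0 - gamma) * x          # shrink the current iterate...
        x[i] += gamma                  # ...then move toward vertex e_i
    return x

# Toy instance: minimize 0.5*||x||^2 over the simplex; the optimum
# is the uniform distribution (1/4, 1/4, 1/4, 1/4).
x_hat = frank_wolfe_simplex(np.eye(4), np.zeros(4))
```

Note that every iterate is a convex combination of simplex vertices, so feasibility is maintained for free, which is exactly why the method suits the constrained dual formulation described in the excerpt.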

Book Optimization Algorithms for Machine Learning

Download or read book Optimization Algorithms for Machine Learning written by Anant Raj and published by . This book was released on 2020 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: With the advent of massive datasets and increasingly complex tasks, modern machine learning systems pose several new challenges in terms of scalability to high-dimensional data as well as to large datasets. In this thesis, we study scalable descent methods, such as coordinate descent and stochastic coordinate descent, which are based on stochastic approximations of the full gradient. In the first part of the thesis, we propose faster, scalable coordinate-based optimization that scales to high-dimensional problems. As a first step toward scalable coordinate-based descent approaches, we propose a new framework for deriving screening rules for convex optimization problems based on the duality gap, covering a large class of constrained and penalized optimization formulations. In later stages, we develop a new approximately greedy coordinate-selection strategy for coordinate descent in large-scale optimization. This novel coordinate-selection strategy provably works better than uniformly random selection and can reach the efficiency of steepest coordinate descent (SCD) in the best case, which may enable an acceleration by a factor of up to n, the number of coordinates. With a similar objective in mind, we further propose an adaptive sampling strategy for stochastic-gradient-based optimization. The proposed safe sampling scheme provably achieves faster convergence than any fixed deterministic sampling scheme for coordinate descent and stochastic gradient descent methods.
Exploiting the connection between matching pursuit, where a more general notion of directions is considered, and greedy coordinate descent, where all the moving directions are orthogonal, we also propose a unified analysis of both approaches and extend it to obtain accelerated rates. In the second part of this thesis, we focus on providing provably faster and scalable mini-batch stochastic gradient descent (SGD) algorithms. Variance-reduced SGD methods converge significantly faster than the vanilla SGD counterpart. We propose a variance-reduced algorithm, k-SVRG, that addresses issues of SVRG [98] and SAGA [54] by making the best use of the available memory and minimizing stalling phases without progress. In a later part of the work, we provide a simple framework that uses optimistic updates to obtain accelerated stochastic algorithms. We obtain an accelerated variance-reduced algorithm as well as an accelerated universal algorithm as direct consequences of this simple framework. Going further, we also employ local-sensitivity-based importance sampling in an iterative optimization method and analyze its convergence while optimizing over the selected subset. In the final part of the thesis, we connect the dots between the coordinate descent method and the stochastic gradient descent method in the interpolation regime. We show that better stochastic-gradient-based dual algorithms with fast convergence rates can be obtained to optimize the convex objective in the interpolation regime.
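The variance-reduction mechanism that k-SVRG builds on can be shown in its basic SVRG form: each epoch stores a snapshot and its full gradient, and every stochastic step is corrected by the same sample's gradient at the snapshot. The sketch below is plain SVRG on a noiseless least-squares problem (an interpolation-regime example), not k-SVRG; step size, epoch count, and the data are illustrative choices of ours.

```python
import numpy as np

def svrg_least_squares(A, b, step=0.05, epochs=40, rng=None):
    """SVRG on f(x) = (1/m) * sum_i 0.5*(a_i^T x - b_i)^2. Once per
    epoch a snapshot x_ref and its full gradient are computed; every
    inner stochastic step is corrected by the same sample's gradient
    at the snapshot, which drives the variance to zero at the optimum."""
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = A.T @ (A @ x_ref - b) / m          # snapshot gradient
        for _ in range(m):
            i = rng.integers(m)
            g_i = A[i] * (A[i] @ x - b[i])             # sample grad at x
            g_ref = A[i] * (A[i] @ x_ref - b[i])       # same sample at x_ref
            x = x - step * (g_i - g_ref + full_grad)   # variance-reduced step
    return x

rng = np.random.default_rng(1)
A = rng.uniform(-1.0, 1.0, size=(200, 5))  # bounded rows keep steps stable
x_true = rng.standard_normal(5)
b = A @ x_true                             # noiseless: interpolation regime
x_hat = svrg_least_squares(A, b)
```

Because the correction term vanishes at the optimum in the noiseless case, the method converges with a constant step size, in contrast to vanilla SGD, which would need a decaying one outside the interpolation regime.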

Book Optimization Methods for Large Scale Problems and Applications to Machine Learning

Download or read book Optimization Methods for Large Scale Problems and Applications to Machine Learning written by Luca Bravi and published by . This book was released in 2016. Available in PDF, EPUB and Kindle. Book excerpt:

Book Evolutionary Multi Task Optimization

Download or read book Evolutionary Multi Task Optimization written by Liang Feng and published by Springer Nature. This book was released on 2023-03-29 with total page 220 pages. Available in PDF, EPUB and Kindle. Book excerpt: A remarkable facet of the human brain is its ability to manage multiple tasks with apparent simultaneity. Knowledge learned from one task can then be used to enhance problem-solving in other related tasks. In machine learning, the idea of leveraging relevant information across related tasks as inductive biases to enhance learning performance has attracted significant interest. In contrast, attempts to emulate the human brain’s ability to generalize in optimization – particularly in population-based evolutionary algorithms – have received little attention to date. Recently, a novel evolutionary search paradigm, Evolutionary Multi-Task (EMT) optimization, has been proposed in the realm of evolutionary computation. In contrast to traditional evolutionary searches, which solve a single task in a single run, an evolutionary multi-tasking algorithm conducts searches concurrently on multiple search spaces corresponding to different tasks or optimization problems, each possessing a unique function landscape. By exploiting the latent synergies among distinct problems, the superior search performance of EMT optimization in terms of solution quality and convergence speed has been demonstrated in a variety of continuous, discrete, and hybrid (mixture of continuous and discrete) tasks. This book discusses the foundations and methodologies of developing evolutionary multi-tasking algorithms for complex optimization, including in domains characterized by factors such as multiple objectives of interest, high-dimensional search spaces and NP-hardness.
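The core EMT mechanism, searching several tasks concurrently while occasionally transferring genetic material between them, can be sketched on two toy 1-D tasks. This is a hypothetical minimal setup for illustration only, far simpler than the algorithms the book covers; the tasks, transfer probability, and mutation scale are all assumptions.

```python
import random

# Two toy tasks over a unified search space [0, 1]: task 0 minimizes
# (x - 0.2)^2 and task 1 minimizes (x - 0.8)^2. With probability
# transfer_prob, a child's parent is borrowed from the other task's
# population (cross-task knowledge transfer).
def emt(generations=150, pop_size=20, transfer_prob=0.3, seed=0):
    rng = random.Random(seed)
    tasks = [lambda x: (x - 0.2) ** 2, lambda x: (x - 0.8) ** 2]
    pops = [[rng.random() for _ in range(pop_size)] for _ in tasks]
    for _ in range(generations):
        for t, task in enumerate(tasks):
            donor = pops[1 - t] if rng.random() < transfer_prob else pops[t]
            child = rng.choice(donor) + rng.gauss(0, 0.05)  # mutate a parent
            child = min(max(child, 0.0), 1.0)
            # Elitist replacement: keep the pop_size best for this task.
            pops[t] = sorted(pops[t] + [child], key=task)[:pop_size]
    return [min(task(x) for x in pop) for task, pop in zip(tasks, pops)]
```

In a single run both populations converge toward their own optima, and transferred individuals act as extra exploration, which is the synergy EMT exploits when tasks are related.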

Book Large scale Optimization Methods for Data science Applications

Download or read book Large scale Optimization Methods for Data science Applications written by Haihao Lu (Ph.D.) and published by . This book was released on 2019 with total page 211 pages. Available in PDF, EPUB and Kindle. Book excerpt: In this thesis, we present several contributions to large-scale optimization methods with applications in data science and machine learning. In the first part, we present new computational methods and associated computational guarantees for solving convex optimization problems using first-order methods. We consider a general convex optimization problem in which we presume knowledge of a strict lower bound (as arises, for example, in empirical risk minimization in machine learning). We introduce a new functional measure called the growth constant of the convex objective function, which measures how quickly the level sets grow relative to the function value and plays a fundamental role in the complexity analysis. Based on this measure, we present new computational guarantees for both smooth and non-smooth convex optimization that improve existing computational guarantees in several ways, most notably when the initial iterate is far from the optimal solution set. The usual approach to developing and analyzing first-order methods for convex optimization assumes that either the gradient of the objective function is uniformly continuous (in the smooth setting) or the objective function itself is uniformly continuous. However, in many settings, especially in machine learning applications, the convex function satisfies neither condition; examples include the Poisson linear inverse model, the D-optimal design problem, and the support vector machine problem.
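One classical way a known lower bound on the optimal value enters a first-order method is the Polyak-type step size, where the step adapts to the current gap. This is a hedged illustration of that general idea, not necessarily the scheme analyzed in the thesis; the quadratic test function is an assumption.

```python
# Gradient descent with a Polyak-type step: step = (f(x) - f_lb) / ||grad||^2,
# so the method moves aggressively when the gap to the lower bound is large
# and slows down as it closes.
def lower_bound_gradient_descent(f, grad, x0, f_lb, steps=100):
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        if g_norm_sq == 0.0:
            break  # stationary point reached
        step = (f(x) - f_lb) / g_norm_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

On f(x) = ||x||^2 with f_lb = 0, each step halves the iterate, so the method converges geometrically without any tuned step size, precisely because the lower bound is known.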
In the second part, we develop notions of relative smoothness, relative continuity, and relative strong convexity, each defined relative to a user-specified "reference function" (which should be computationally tractable for algorithms), and we show that many differentiable convex functions are relatively smooth or relatively continuous with respect to a correspondingly fairly simple reference function. We extend the mirror descent algorithm to this new setting, with associated computational guarantees. The Gradient Boosting Machine (GBM) introduced by Friedman is an extremely powerful supervised learning algorithm that is widely used in practice -- it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In the third part, we propose the Randomized Gradient Boosting Machine (RGBM) and the Accelerated Gradient Boosting Machine (AGBM). RGBM leads to significant computational gains over GBM by using a randomization scheme to reduce the search in the space of weak learners. AGBM incorporates Nesterov's acceleration techniques into the design of GBM, and it is the first GBM-type algorithm with a theoretically justified accelerated convergence rate. We demonstrate the effectiveness of RGBM and AGBM over GBM in obtaining models with good training and/or testing data fidelity.
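The randomization idea behind RGBM, searching only a random subset of weak learners each round and greedily picking the best within it, can be sketched in a simplified linear setting where coordinates stand in for weak learners. This is an assumption made for illustration; the thesis treats general weak-learner classes.

```python
import random

# Boosting-style coordinate updates for least squares ||Ax - b||^2: each
# round, sample a random subset of "weak learners" (coordinates), pick the
# one with the largest gradient magnitude within the subset, and take a
# shrunken exact line-search step along it.
def randomized_boosting(A, b, rounds=200, subset=2, lr=0.5, seed=0):
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(rounds):
        residual = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(d)]
        candidates = rng.sample(range(d), subset)        # random subset of learners
        j = max(candidates, key=lambda k: abs(grad[k]))  # greedy pick within it
        x[j] -= lr * grad[j] / sum(A[i][j] ** 2 for i in range(n))
    return x
```

The computational gain comes from scoring only `subset` candidates per round instead of all d weak learners, while the greedy pick within the subset preserves much of the progress of fully greedy boosting.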

Book Machine Learning to Scale Up Combinatorial Applications

Download or read book Machine Learning to Scale Up Combinatorial Applications written by F. A. Rezaur Rahman Chowdhury and published by . This book was released on 2020 with total page 190 pages. Available in PDF, EPUB and Kindle. Book excerpt: Combinatorial optimization problems arise in many scientific and engineering domains, including graph analytics, computational biology, natural language processing, and computer vision. Existing methods to solve these combinatorial problems can be classified into three categories: exact tractable algorithms that exploit the structure of these problems; approximation algorithms; and heuristic methods. In many real-world applications, we repeatedly solve a particular type of combinatorial optimization problem on different problem instances -- for example, processing different input queries over a graph database. However, existing combinatorial optimization solvers don't exploit the availability of this large set of input problem instances to improve their effectiveness. In this thesis, we propose a machine learning based search framework to automatically scale up combinatorial optimization solvers using training data generated from a distribution of input problem instances. This research is inspired by the ability of humans to improve the speed of their reasoning processes with experience. For example, as a child learns to read or play chess, the reasoning processes involved become more automatic and perform better per unit time. The key idea is to define an effective time-bounded search procedure to solve the underlying combinatorial optimization problem and to learn search control knowledge that improves speed and/or accuracy using supervised training data. We instantiate this framework for three important real-world combinatorial problems. First, we improve the effectiveness of processing graph queries over a large-scale graph database, studying two qualitatively different approaches towards this goal.
Second, we present a linear-time machine learning based folding system for RNA secondary structure prediction. Third, we develop learning methods to improve the speed and accuracy of solving structured prediction tasks arising in natural language processing and computer vision (e.g., producing part-of-speech tag sequences for an input sequence of words) using randomized greedy search procedures.
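A randomized greedy search procedure of the kind used for structured prediction can be sketched for sequence labeling. In the thesis the scoring function is learned; here it is assumed to be given, and the tiny tag set is an illustrative assumption.

```python
import random

# Randomized greedy search for sequence labeling: start from a random tag
# sequence, greedily change one position at a time while the score improves,
# and take the best result over several random restarts.
def randomized_greedy(score, length, labels, restarts=5, seed=0):
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(restarts):
        seq = [rng.choice(labels) for _ in range(length)]
        improved = True
        while improved:
            improved = False
            for pos in range(length):
                for lab in labels:
                    cand = seq[:pos] + [lab] + seq[pos + 1:]
                    if score(cand) > score(seq):  # greedy single-position move
                        seq, improved = cand, True
        val = score(seq)
        if val > best_val:
            best, best_val = seq, val
    return best, best_val
```

With a position-wise separable score, greedy moves reach the global optimum from any start; for real structured scores with interactions between positions, the random restarts are what protect against local optima.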

Book Data driven Optimization Under Uncertainty in the Era of Big Data and Deep Learning

Download or read book Data driven Optimization Under Uncertainty in the Era of Big Data and Deep Learning written by Chao Ning and published by . This book was released on 2020 with total page 270 pages. Available in PDF, EPUB and Kindle. Book excerpt: This dissertation deals with the development of fundamental data-driven optimization under uncertainty, including its modeling frameworks, solution algorithms, and a wide variety of applications. Specifically, three research aims are proposed: data-driven distributionally robust optimization for hedging against distributional uncertainties in energy systems, online learning based receding-horizon optimization that accommodates real-time uncertainty data, and an efficient solution algorithm for solving large-scale data-driven multistage robust optimization problems. There are two distinct research projects under the first research aim. In the first project, we propose a novel data-driven Wasserstein distributionally robust mixed-integer nonlinear programming model for optimal biomass and agricultural waste-to-energy network design under uncertainty. A data-driven uncertainty set of feedstock price distributions is devised using the Wasserstein metric. To address computational challenges, we propose a reformulation-based branch-and-refine algorithm. In the second project, we develop a novel deep learning based distributionally robust joint chance constrained economic dispatch optimization framework for high penetration of renewable energy. By leveraging a deep generative adversarial network (GAN), an f-divergence-based ambiguity set of wind power distributions is constructed as a ball in the probability space centered at the distribution induced by a generator neural network. To facilitate the solution process, the resulting distributionally robust chance constraints are equivalently reformulated as ambiguity-free chance constraints, which are further tackled using a scenario approach.
Additionally, we derive an a priori bound on the required number of synthetic wind power data points generated by the f-GAN to guarantee a predefined risk level. To facilitate large-scale applications, we further develop a prescreening technique that increases computational and memory efficiency by exploiting problem structure. The second research aim addresses the online learning of real-time uncertainty data for receding-horizon optimization-based control. In the related project, data-driven stochastic model predictive control is proposed for linear time-invariant systems under additive stochastic disturbance, whose probability distribution is unknown but can be partially inferred from real-time disturbance data. The conditional value-at-risk constraints on system states are required to hold for an ambiguity set of disturbance distributions. By leveraging a Dirichlet process mixture model, the first- and second-order moment information of each mixture component is incorporated into the ambiguity set. As more data are gathered during the runtime of the controller, the ambiguity set is updated based on real-time data. We then develop a novel constraint tightening strategy based on an equivalent reformulation of distributionally robust constraints over the proposed ambiguity set. Additionally, we establish theoretical guarantees on the recursive feasibility and closed-loop stability of the proposed model predictive control. The third research aim focuses on algorithm development for data-driven multistage adaptive robust mixed-integer linear programs. In the related project, we propose a multi-to-two transformation theory and develop a novel transformation-proximal bundle algorithm. By partitioning recourse decisions into state and control decisions, affine decision rules are applied exclusively to the state decisions.
In this way, the original multistage robust optimization problem is shown to be transformed into an equivalent two-stage robust optimization problem, which is further addressed using a proximal bundle method. The finite convergence of the proposed solution algorithm is guaranteed for the multistage robust optimization problem with a generic uncertainty set. To quantitatively assess solution quality, we further develop a scenario-tree-based lower bounding technique. The effectiveness and advantages of the proposed algorithm are fully demonstrated in inventory control and process network planning.
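The scenario approach used in the dissertation's chance-constraint reformulation can be illustrated in miniature: enforce the uncertain constraint on every sampled scenario, which yields a deterministic problem whose solution satisfies the constraint on all observed data. This is a drastic simplification for illustration; the toy one-variable problem and synthetic disturbance data are assumptions.

```python
import random

# Maximize x subject to w * x <= 1 for every sampled disturbance w > 0.
# The feasible set is the intersection of x <= 1 / w over the samples, so
# the scenario solution is the tightest such bound.
def scenario_solution(samples):
    return min(1.0 / w for w in samples if w > 0)

rng = random.Random(0)
wind = [rng.uniform(0.5, 1.5) for _ in range(100)]  # synthetic disturbance data
x = scenario_solution(wind)
```

With more scenarios the solution becomes more conservative but the probability of violating the true chance constraint shrinks, which is the trade-off the dissertation's a priori sample-size bounds quantify.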