[EBOOK] Efficient Reinforcement Learning With Value Function Generalization PDF Download

Efficient Reinforcement Learning with Value Function Generalization

Book Details:

Author : Zheng Wen
Publisher :
Release : 2014
ISBN :
Pages : pages

Download or read book Efficient Reinforcement Learning with Value Function Generalization written by Zheng Wen and published by . This book was released on 2014 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning (RL) is concerned with how an agent should learn to make decisions over time while interacting with an environment. A growing body of work has produced RL algorithms with sample and computational efficiency guarantees. However, most of this work focuses on "tabula rasa" learning; i.e. algorithms aim to learn with little or no prior knowledge about the environment. Such algorithms exhibit sample complexities that grow at least linearly in the number of states, and they are of limited practical import since state spaces in most relevant contexts are enormous. There is a need for algorithms that generalize in order to learn how to make effective decisions at states beyond the scope of past experience. This dissertation focuses on the open issue of developing efficient RL algorithms that leverage value function generalization (VFG). It consists of two parts. In the first part, we present sample complexity results for two classes of RL problems -- deterministic systems with general forms of VFG and Markov decision processes (MDPs) with a finite hypothesis class. The results provide upper bounds that are independent of state and action space cardinalities and polynomial in other problem parameters. In the second part, building on insights from our sample complexity analyses, we propose randomized least-square value iteration (RLSVI), a RL algorithm for MDPs with VFG via linear hypothesis classes. The algorithm is based on a new notion of randomized value function exploration. We compare through computational studies the performance of RLSVI against least-square value iterations (LSVI) with Boltzmann exploration or epsilon-greedy exploration, which are widely used in RL with VFG. Results demonstrate that RLSVI is orders of magnitude more efficient.

Computers

Reinforcement Learning second edition

Book Details:

Author : Richard S. Sutton
Publisher : MIT Press
Release : 2018-11-13
ISBN : 0262352702
Pages : 549 pages

Download or read book Reinforcement Learning second edition written by Richard S. Sutton and published by MIT Press. This book was released on 2018-11-13 with total page 549 pages. Available in PDF, EPUB and Kindle. Book excerpt: The significantly expanded and updated new edition of a widely used text on reinforcement learning, one of the most active research areas in artificial intelligence. Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has been significantly expanded and updated, presenting new topics and updating coverage of other topics. Like the first edition, this second edition focuses on core online learning algorithms, with the more mathematical material set off in shaded boxes. Part I covers as much of reinforcement learning as possible without going beyond the tabular case for which exact solutions can be found. Many algorithms presented in this part are new to the second edition, including UCB, Expected Sarsa, and Double Learning. Part II extends these ideas to function approximation, with new sections on such topics as artificial neural networks and the Fourier basis, and offers expanded treatment of off-policy learning and policy-gradient methods. Part III has new chapters on reinforcement learning's relationships to psychology and neuroscience, as well as an updated case-studies chapter including AlphaGo and AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.

Hardware efficient Scalable Reinforcement Learning Systems

Book Details:

Author :
Publisher :
Release : 2007
ISBN :
Pages : 88 pages

Download or read book Hardware efficient Scalable Reinforcement Learning Systems written by and published by . This book was released on 2007 with total page 88 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement Learning (RL) is a machine learning discipline in which an agent learns by interacting with its environment. In this paradigm, the agent is required to perceive its state and take actions accordingly. Upon taking each action, a numerical reward is provided by the environment. The goal of the agent is thus to maximize the aggregate rewards it receives over time. Over the past two decades, a large variety of algorithms have been proposed to select actions in order to explore the environment and gradually construct an effective strategy that maximizes the rewards. These RL techniques have been successfully applied to numerous real-world, complex applications including board games and motor control tasks. Almost all RL algorithms involve the estimation of a value function, which indicates how good it is for the agent to be in a given state, in terms of the total expected reward in the long run. Alternatively, the value function may reflect on the impact of taking a particular action at a given state. The most fundamental approach for constructing such a value function consists of updating a table that contains a value for each state (or each state-action pair). However, this approach is impractical for large scale problems, in which the state and/or action spaces are large. In order to deal with such problems, it is necessary to exploit the generalization capabilities of non-linear function approximators, such as artificial neural networks. This dissertation focuses on practical methodologies for solving reinforcement learning problems with large state and/or action spaces. In particular, the work addresses scenarios in which an agent does not have full knowledge of its state, but rather receives partial information about its environment via sensory-based observations. In order to address such intricate problems, novel solutions for both tabular and function-approximation based RL frameworks are proposed. A resource-efficient recurrent neural network algorithm is presented, which exploits adaptive step-size techniques to improve learning characteristics. Moreover, a consolidated actor-critic network is introduced, which omits the modeling redundancy found in typical actor-critic systems. Pivotal concerns are the scalability and speed of the learning algorithms, for which we devise architectures that map efficiently to hardware. As a result, a high degree of parallelism can be achieved. Simulation results that correspond to relevant testbench problems clearly demonstrate the solid performance attributes of the proposed solutions.

Technology & Engineering

TEXPLORE Temporal Difference Reinforcement Learning for Robots and Time Constrained Domains

Book Details:

Author : Todd Hester
Publisher : Springer
Release : 2013-06-22
ISBN : 3319011685
Pages : 170 pages

Download or read book TEXPLORE Temporal Difference Reinforcement Learning for Robots and Time Constrained Domains written by Todd Hester and published by Springer. This book was released on 2013-06-22 with total page 170 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book presents and develops new reinforcement learning methods that enable fast and robust learning on robots in real-time. Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve the problems of learning and adaptation on robots. This book identifies four key challenges that must be addressed for an RL algorithm to be practical for robotic control tasks. These RL for Robotics Challenges are: 1) it must learn in very few samples; 2) it must learn in domains with continuous state features; 3) it must handle sensor and/or actuator delays; and 4) it should continually select actions in real time. This book focuses on addressing all four of these challenges. In particular, this book is focused on time-constrained domains where the first challenge is critically important. In these domains, the agent’s lifetime is not long enough for it to explore the domains thoroughly, and it must learn in very few samples.

Computers

Algorithms for Reinforcement Learning

Book Details:

Author : Csaba Grossi
Publisher : Springer Nature
Release : 2022-05-31
ISBN : 3031015517
Pages : 89 pages

Download or read book Algorithms for Reinforcement Learning written by Csaba Grossi and published by Springer Nature. This book was released on 2022-05-31 with total page 89 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations. Table of Contents: Markov Decision Processes / Value Prediction Problems / Control / For Further Exploration

Computers

Algorithms for Reinforcement Learning

Book Details:

Author : Csaba Szepesvari
Publisher : Morgan & Claypool Publishers
Release : 2010
ISBN : 1608454924
Pages : 89 pages

Download or read book Algorithms for Reinforcement Learning written by Csaba Szepesvari and published by Morgan & Claypool Publishers. This book was released on 2010 with total page 89 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming.We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations.

Electronic computers. Computer science

Efficient Reinforcement Learning Using Gaussian Processes

Book Details:

Author : Marc Peter Deisenroth
Publisher : KIT Scientific Publishing
Release : 2010
ISBN : 3866445695
Pages : 226 pages

Download or read book Efficient Reinforcement Learning Using Gaussian Processes written by Marc Peter Deisenroth and published by KIT Scientific Publishing. This book was released on 2010 with total page 226 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book examines Gaussian processes in both model-based reinforcement learning (RL) and inference in nonlinear dynamic systems.First, we introduce PILCO, a fully Bayesian approach for efficient RL in continuous-valued state and action spaces when no expert knowledge is available. PILCO takes model uncertainties consistently into account during long-term planning to reduce model bias. Second, we propose principled algorithms for robust filtering and smoothing in GP dynamic systems.

Science

Model Based Reinforcement Learning

Book Details:

Author : Milad Farsi
Publisher : John Wiley & Sons
Release : 2022-12-02
ISBN : 1119808596
Pages : 276 pages

Download or read book Model Based Reinforcement Learning written by Milad Farsi and published by John Wiley & Sons. This book was released on 2022-12-02 with total page 276 pages. Available in PDF, EPUB and Kindle. Book excerpt: Model-Based Reinforcement Learning Explore a comprehensive and practical approach to reinforcement learning Reinforcement learning is an essential paradigm of machine learning, wherein an intelligent agent performs actions that ensure optimal behavior from devices. While this paradigm of machine learning has gained tremendous success and popularity in recent years, previous scholarship has focused either on theory—optimal control and dynamic programming – or on algorithms—most of which are simulation-based. Model-Based Reinforcement Learning provides a model-based framework to bridge these two aspects, thereby creating a holistic treatment of the topic of model-based online learning control. In doing so, the authors seek to develop a model-based framework for data-driven control that bridges the topics of systems identification from data, model-based reinforcement learning, and optimal control, as well as the applications of each. This new technique for assessing classical results will allow for a more efficient reinforcement learning system. At its heart, this book is focused on providing an end-to-end framework—from design to application—of a more tractable model-based reinforcement learning technique. Model-Based Reinforcement Learning readers will also find: A useful textbook to use in graduate courses on data-driven and learning-based control that emphasizes modeling and control of dynamical systems from data Detailed comparisons of the impact of different techniques, such as basic linear quadratic controller, learning-based model predictive control, model-free reinforcement learning, and structured online learning Applications and case studies on ground vehicles with nonholonomic dynamics and another on quadrator helicopters An online, Python-based toolbox that accompanies the contents covered in the book, as well as the necessary code and data Model-Based Reinforcement Learning is a useful reference for senior undergraduate students, graduate students, research assistants, professors, process control engineers, and roboticists.

Efficient Reinforcement Learning Via Singular Value Decomposition End to end Model based Methods and Reward Shaping

Book Details:

Author : Clement Gehring
Publisher :
Release : 2022
ISBN :
Pages : 0 pages

Download or read book Efficient Reinforcement Learning Via Singular Value Decomposition End to end Model based Methods and Reward Shaping written by Clement Gehring and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning (RL) provides a general framework for data-driven decision making. However, the very same generality that makes this approach applicable to a wide range of problems is also responsible for its well-known inefficiencies. In this thesis, we consider different properties which are shared by interesting classes of decision making which can be leveraged to design learning algorithms that are both computationally and data efficient. Specifically, this work examines the low-rank structure found in various aspects of decision making problems and the sparsity of effects of classical deterministic planning, as well as the properties that end-to-end model-based methods depend on to perform well. We start by showing how low-rank structure in the successor representation enables the design of an efficient on-line learning algorithm. Similarly, we show how this same structure can be found in the Bellman operator which we use to formulate an efficient variant of the least-squares temporal difference learning algorithm. We further explore low-rank structure in state features to learn efficient transition models which allow for efficient planning entirely in a low dimensional space. We then take a closer look at end-to-end model-based methods in to better understand their properties. We do this by examining this type of approach through the lens of constrained optimization and implicit differentiation. Through the implicit perspective, we derive properties of these methods which allow us to identify conditions under which they perform well. We conclude this thesis by exploring how the sparsity of effects of classical planning problems can used to define general domain-independent heuristics which we can be used to greatly accelerate learning of domain-dependent heuristics through the use of potential-based reward shaping and lifted function approximation.

Mathematics

Dynamic Programming and Optimal Control

Book Details:

Author : Dimitri Bertsekas
Publisher : Athena Scientific
Release :
ISBN : 1886529434
Pages : 613 pages

Download or read book Dynamic Programming and Optimal Control written by Dimitri Bertsekas and published by Athena Scientific. This book was released on with total page 613 pages. Available in PDF, EPUB and Kindle. Book excerpt: This is the leading and most up-to-date textbook on the far-ranging algorithmic methododogy of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The treatment focuses on basic unifying themes, and conceptual foundations. It illustrates the versatility, power, and generality of the method with many examples and applications from engineering, operations research, and other fields. It also addresses extensively the practical application of the methodology, possibly through the use of approximations, and provides an extensive treatment of the far-reaching methodology of Neuro-Dynamic Programming/Reinforcement Learning. Among its special features, the book 1) provides a unifying framework for sequential decision making, 2) treats simultaneously deterministic and stochastic control problems popular in modern control theory and Markovian decision popular in operations research, 3) develops the theory of deterministic optimal control problems including the Pontryagin Minimum Principle, 4) introduces recent suboptimal control and simulation-based approximation techniques (neuro-dynamic programming), which allow the practical application of dynamic programming to complex problems that involve the dual curse of large dimension and lack of an accurate mathematical model, 5) provides a comprehensive treatment of infinite horizon problems in the second volume, and an introductory treatment in the first volume The electronic version of the book includes 29 theoretical problems, with high-quality solutions, which enhance the range of coverage of the book.

Business & Economics

The Elements of Joint Learning and Optimization in Operations Management

Book Details:

Author : Xi Chen
Publisher : Springer Nature
Release : 2022-09-20
ISBN : 3031019261
Pages : 444 pages

Download or read book The Elements of Joint Learning and Optimization in Operations Management written by Xi Chen and published by Springer Nature. This book was released on 2022-09-20 with total page 444 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book examines recent developments in Operations Management, and focuses on four major application areas: dynamic pricing, assortment optimization, supply chain and inventory management, and healthcare operations. Data-driven optimization in which real-time input of data is being used to simultaneously learn the (true) underlying model of a system and optimize its performance, is becoming increasingly important in the last few years, especially with the rise of Big Data.

Computers

Reinforcement Learning and Dynamic Programming Using Function Approximators

Book Details:

Author : Lucian Busoniu
Publisher : CRC Press
Release : 2017-07-28
ISBN : 1351833820
Pages : 277 pages

Download or read book Reinforcement Learning and Dynamic Programming Using Function Approximators written by Lucian Busoniu and published by CRC Press. This book was released on 2017-07-28 with total page 277 pages. Available in PDF, EPUB and Kindle. Book excerpt: From household appliances to applications in robotics, engineered systems involving complex dynamics can only be as effective as the algorithms that control them. While Dynamic Programming (DP) has provided researchers with a way to optimally solve decision and control problems involving complex dynamic systems, its practical value was limited by algorithms that lacked the capacity to scale up to realistic problems. However, in recent years, dramatic developments in Reinforcement Learning (RL), the model-free counterpart of DP, changed our understanding of what is possible. Those developments led to the creation of reliable methods that can be applied even when a mathematical model of the system is unavailable, allowing researchers to solve challenging control problems in engineering, as well as in a variety of other disciplines, including economics, medicine, and artificial intelligence. Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP. With a focus on continuous-variable problems, this seminal text details essential developments that have substantially altered the field over the past decade. In its pages, pioneering experts provide a concise introduction to classical RL and DP, followed by an extensive presentation of the state-of-the-art and novel methods in RL and DP with approximation. Combining algorithm development with theoretical guarantees, they elaborate on their work with illustrative examples and insightful comparisons. Three individual chapters are dedicated to representative algorithms from each of the major classes of techniques: value iteration, policy iteration, and policy search. The features and performance of these algorithms are highlighted in extensive experimental studies on a range of control applications. The recent development of applications involving complex systems has led to a surge of interest in RL and DP methods and the subsequent need for a quality resource on the subject. For graduate students and others new to the field, this book offers a thorough introduction to both the basics and emerging methods. And for those researchers and practitioners working in the fields of optimal and adaptive control, machine learning, artificial intelligence, and operations research, this resource offers a combination of practical algorithms, theoretical analysis, and comprehensive examples that they will be able to adapt and apply to their own work. Access the authors' website at www.dcsc.tudelft.nl/rlbook/ for additional material, including computer code used in the studies and information concerning new developments.

Reinforcement learning

Design and Analysis of Efficient Reinforcement Learning Algorithms

Book Details:

Author : Claude-Nicolas Fiechter
Publisher :
Release : 1997
ISBN :
Pages : 125 pages

Download or read book Design and Analysis of Efficient Reinforcement Learning Algorithms written by Claude-Nicolas Fiechter and published by . This book was released on 1997 with total page 125 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning considers the problem of learning a task or behavior by interacting with one's environment. The learning agent is not explicitly told how the task is to be achieved and has to learn by trial-and-error, using only the rewards and punishments that it receives in response to the actions it takes. In the last ten years there has been a rapidly growing interest in reinforcement learning techniques as a base for intelligent control architectures. Many methods have been proposed and a number of very successful applications have been developed. This dissertation contributes to a theoretical foundation for the study of reinforcement learning by applying some of the methods and tools of computational learning theory to the problem. We propose a formal model of efficient reinforcement learning based on Valiant's Probably Approximately Correct (PAC) learning framework, and use it to design reinforcement learning algorithms and to analyze their performance. We describe the first polynomial-time PAC algorithm for the general finite-state reinforcement learning problem and show that an active and directed exploration of its environment by the learning agent is necessary and sufficient to obtain efficient learning for that problem. We consider the trade-off between exploration and exploitation in reinforcement learning algorithms and show how in general an off-line PAC algorithm can be converted into an on-line algorithm that efficiently balances exploration and exploitation. We also consider the problem of generalization in reinforcement learning and show how in some cases the underlying structure of the environment can be exploited to achieve faster learning. We describe a PAC algorithm for the associative reinforcement learning problem that uses a form of decision lists to represent the policies in a compact way and generalize across different inputs. In addition, we describe a PAC algorithm for a special case of reinforcement learning where the environment can be modeled by a linear system. This particular reinforcement learning problem corresponds to the so-called linear quadratic regulator which is extensively studied and used in automatic and adaptive control.

Computers

Understanding Machine Learning

Book Details:

Author : Shai Shalev-Shwartz
Publisher : Cambridge University Press
Release : 2014-05-19
ISBN : 1107057132
Pages : 415 pages

Download or read book Understanding Machine Learning written by Shai Shalev-Shwartz and published by Cambridge University Press. This book was released on 2014-05-19 with total page 415 pages. Available in PDF, EPUB and Kindle. Book excerpt: Introduces machine learning and its algorithmic paradigms, explaining the principles behind automated learning approaches and the considerations underlying their usage.

Technology & Engineering

Reinforcement Learning

Book Details:

Author : Marco Wiering
Publisher : Springer Science & Business Media
Release : 2012-03-05
ISBN : 3642276458
Pages : 653 pages

Download or read book Reinforcement Learning written by Marco Wiering and published by Springer Science & Business Media. This book was released on 2012-03-05 with total page 653 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning encompasses both a science of adaptive behavior of rational beings in uncertain environments and a computational methodology for finding optimal behaviors for challenging problems in control, optimization and adaptive behavior of intelligent agents. As a field, reinforcement learning has progressed tremendously in the past decade. The main goal of this book is to present an up-to-date series of survey articles on the main contemporary sub-fields of reinforcement learning. This includes surveys on partially observable environments, hierarchical task decompositions, relational knowledge representation and predictive state representations. Furthermore, topics such as transfer, evolutionary methods and continuous spaces in reinforcement learning are surveyed. In addition, several chapters review reinforcement learning methods in robotics, in games, and in computational neuroscience. In total seventeen different subfields are presented by mostly young experts in those areas, and together they truly represent a state-of-the-art of current reinforcement learning research. Marco Wiering works at the artificial intelligence department of the University of Groningen in the Netherlands. He has published extensively on various reinforcement learning topics. Martijn van Otterlo works in the cognitive artificial intelligence group at the Radboud University Nijmegen in The Netherlands. He has mainly focused on expressive knowledge representation in reinforcement learning settings.

Technology & Engineering

Reinforcement Learning Algorithms Analysis and Applications

Book Details:

Author : Boris Belousov
Publisher : Springer Nature
Release : 2021-01-02
ISBN : 3030411885
Pages : 197 pages

Download or read book Reinforcement Learning Algorithms Analysis and Applications written by Boris Belousov and published by Springer Nature. This book was released on 2021-01-02 with total page 197 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book reviews research developments in diverse areas of reinforcement learning such as model-free actor-critic methods, model-based learning and control, information geometry of policy searches, reward design, and exploration in biology and the behavioral sciences. Special emphasis is placed on advanced ideas, algorithms, methods, and applications. The contributed papers gathered here grew out of a lecture course on reinforcement learning held by Prof. Jan Peters in the winter semester 2018/2019 at Technische Universität Darmstadt. The book is intended for reinforcement learning students and researchers with a firm grasp of linear algebra, statistics, and optimization. Nevertheless, all key concepts are introduced in each chapter, making the content self-contained and accessible to a broader audience.

Computer algorithms

Towards the Understanding of Sample Efficient Reinforcement Learning Algorithms

Book Details:

Author : Tengyu Xu
Publisher :
Release : 2022
ISBN :
Pages : 0 pages

Download or read book Towards the Understanding of Sample Efficient Reinforcement Learning Algorithms written by Tengyu Xu and published by . This book was released on 2022 with total page 0 pages. Available in PDF, EPUB and Kindle. Book excerpt: Reinforcement learning (RL), which aims at designing a suitable policy for an agent via interacting with an unknown environment, has achieved remarkable success in the recent past. Despite its great potential to solve complex tasks, current RL algorithms suffer from requiring a large amount of interaction data, which could result in significant cost in real world applications. Thus, the goal of this thesis is to study the sample complexity of fundamental RL algorithms, and then, to propose new RL algorithms to solve real-world problems with provable efficiency. To achieve this goal, this thesis makes the contributions along the following three main directions: 1. For policy evaluation, we proposed a new on-policy algorithm called variance reduce TD (VRTD) and established the state-of-the-art sample complexity result for off-policy two-timescale TD learning algorithms. 2. For policy optimization, we established improved sample complexity bounds for on-policy actor-critic (AC) type algorithms and proposed the first doubly robust off-policy AC algorithm with provable efficiency guarantee. 3. We proposed three new algorithms: GenTD, CRPO and PARTED to address challenging practical problems of general value function evaluation, safe RL and trajectory-wise reward RL, respectively, with provable efficiency.