
-
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
The goal of policy-based reinforcement learning (RL) is to search the ma...
-
Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sa...
-
FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control
In recent years significant progress has been made in dealing with chall...
-
Policy Optimization with Stochastic Mirror Descent
Stochastic mirror descent (SMD) keeps the advantages of simplicity of im...
-
Expected Sarsa(λ) with Control Variate for Variance Reduction
Off-policy learning is powerful for reinforcement learning. However, the...
-
TBQ(σ): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning
Off-policy reinforcement learning with eligibility traces is challenging...
-
Beetle Swarm Optimization Algorithm: Theory and Application
In this paper, a new meta-heuristic algorithm, called beetle swarm optim...
-
Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network
In this paper, we focus on policy discrepancy in return-based deep Q-net...
-
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Recently, a new multi-step temporal learning algorithm, called Q(σ), uni...