StarCraft II is one of the most challenging simulated reinforcement lear...
Off-policy learning enables a reinforcement learning (RL) agent to reaso...
Monte Carlo (MC) methods are the most widely used methods to estimate th...
SARSA, a classical on-policy control algorithm for reinforcement learnin...
In this paper, we establish the global optimality and convergence rate o...
Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...
Off-policy sampling and experience replay are key for improving sample e...
The deadly triad refers to the instability of a reinforcement learning a...
We consider off-policy policy evaluation with function approximation (FA...
We investigate the discounting mismatch in actor-critic algorithm implem...
We present a Reverse Reinforcement Learning (Reverse RL) approach for re...
We present a new per-step reward perspective for risk-averse control in ...
We present GradientDICE for estimating the density ratio between the sta...
We present the first provably convergent off-policy actor-critic algorit...
In distributional reinforcement learning (RL), the estimated distributio...
Intrinsic rewards are introduced to simulate how human intelligence work...
We revisit residual algorithms in both model-free and model-based reinfo...
We reformulate the option framework as two parallel augmented MDPs. Unde...
We propose a new objective, the counterfactual objective, unifying exist...
In this paper, we propose an actor ensemble algorithm, named ACE, for co...
In this paper, we propose the Quantile Option Architecture (QUOTA) for e...
Experience replay plays an important role in the success of deep reinfor...
Reinforcement learning and evolutionary strategy are two major approache...
Representations are fundamental to artificial intelligence. The performa...