
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
Model-agnostic meta-reinforcement learning requires estimating the Hessi...

Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for e...

Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Off-policy multi-step reinforcement learning algorithms consist of conse...

Unlocking Pixels for Reinforcement Learning via Implicit Attention
There has recently been significant interest in training reinforcement l...

ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning
We introduce ES-ENAS, a simple neural architecture search (NAS) algorith...

Monte-Carlo Tree Search as Regularized Policy Optimization
The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...

Online Hyperparameter Tuning in Off-policy Learning via Evolutionary Strategies
Off-policy learning algorithms have been known to be sensitive to the ch...

Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning
We propose a graphical model framework for goal-conditioned RL, with an ...

Self-Imitation Learning via Generalized Lower Bound Q-learning
Self-imitation learning motivated by lower-bound Q-learning is a novel a...

Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...

Discrete Action On-Policy Learning with Action-Value Critic
Reinforcement learning (RL) in discrete action space is ubiquitous in re...

ES-MAML: Simple Hessian-Free Meta Learning
We introduce ES-MAML, a new framework for solving the model agnostic met...

Reinforcement Learning with Chromatic Networks
We present a new algorithm for finding compact neural networks encoding ...

Reinforcement Learning for Integer Programming: Learning to Cut
Integer programming (IP) is a general optimization framework widely appl...

Wasserstein Reinforcement Learning
We propose behavior-driven optimization via Wasserstein distances (WDs) ...

Variance Reduction for Evolution Strategies via Structured Control Variates
Evolution Strategies (ES) are a powerful class of blackbox optimization ...

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes
We propose a new class of structured methods for Monte Carlo (MC) sampli...

Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy
Due to the high variance of policy gradients, on-policy optimization alg...

Orthogonal Estimation of Wasserstein Distances
Wasserstein distances are increasingly used in a wide variety of applica...

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces
We present a new algorithm ASEBO for conducting optimization of high-dim...

Discretizing Continuous Action Space for On-Policy Optimization
In this work, we show that discretizing action space for continuous cont...

Boosting Trust Region Policy Optimization by Normalizing Flows Policy
We propose to improve trust region policy search with normalizing flows ...

Implicit Policy for Reinforcement Learning
We introduce Implicit Policy, a general class of expressive policies tha...

Exploration by Distributional Reinforcement Learning
We propose a framework based on distributional reinforcement learning an...

Variational Deep Q Network
We propose a framework that directly tackles the probability distributio...