Two-Sample Testing in Reinforcement Learning

01/20/2022
by Martin Waltz, et al.

Value-based reinforcement learning algorithms have shown strong performance in games, robotics, and other real-world applications. The most popular sample-based method is Q-Learning. A Q-value is the expected return for a state-action pair when following a particular policy, and the algorithm iteratively updates the current Q-value towards the observed reward plus the discounted maximum of the Q-values of the next state. This procedure introduces maximization bias, which approaches such as Double Q-Learning attempt to address. We frame the bias problem statistically and treat it as an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the T-Estimator (TE), which is based on two-sample testing for the mean. The TE flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. A generalization, termed the K-Estimator (KE), obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function. Using the TE and the KE, we introduce modifications of Q-Learning and of its neural network analog, the Deep Q-Network. The proposed estimators and algorithms are tested and validated on a diverse set of tasks and environments, illustrating the performance potential of the TE and KE.
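
To make the abstract's two main ingredients concrete, the following is a minimal Python sketch, not the paper's exact definitions. It shows a tabular Q-Learning update and a test-based MEV estimator that averages the sample means a one-sided two-sample test does not find significantly smaller than the largest one. The function names (`q_learning_update`, `max_estimator`, `t_style_estimator`), the normal-approximation z-test, and the default parameters are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def q_learning_update(q, state, action, reward, next_state, lr=0.1, gamma=0.99):
    """One tabular Q-Learning step on a nested list q[state][action].

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    The max over next-state Q-values is the source of maximization bias.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]


def max_estimator(samples):
    """Maximum of the sample means; tends to overestimate the MEV."""
    return max(mean(s) for s in samples)


def t_style_estimator(samples, alpha=0.05):
    """Hypothetical test-based MEV estimator (illustrative assumption).

    Averages the sample means that a one-sided two-sample z-test at level
    alpha does not declare significantly smaller than the largest sample
    mean. Each sample needs at least two observations.
    """
    means = [mean(s) for s in samples]
    sq_errs = [variance(s) / len(s) for s in samples]  # squared standard errors
    best = max(range(len(means)), key=means.__getitem__)
    z_alpha = NormalDist().inv_cdf(alpha)  # lower alpha-quantile, negative for alpha < 0.5

    selected = []
    for i, m in enumerate(means):
        denom = sqrt(sq_errs[i] + sq_errs[best])
        if i == best:
            keep = True  # the largest mean is always retained
        elif denom == 0.0:
            keep = m == means[best]  # degenerate case: no sampling noise
        else:
            keep = (m - means[best]) / denom >= z_alpha
        if keep:
            selected.append(m)
    return mean(selected)
```

In this sketch, `alpha=0.5` keeps only the largest sample mean and so recovers the maximum estimator, while smaller values of `alpha` admit more variables into the average and pull the estimate downward. That is one way to read the interpolation between over- and underestimation described above.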

