Beyond Value: CheckList for Testing Inferences in Planning-Based RL

06/04/2022
by Kin-Ho Lam, et al.

Reinforcement learning (RL) agents are commonly evaluated via their expected value over a distribution of test scenarios. Unfortunately, this evaluation approach provides limited evidence for post-deployment generalization beyond the test distribution. In this paper, we address this limitation by extending the recent CheckList testing methodology from natural language processing to planning-based RL. Specifically, we consider testing RL agents that make decisions via online tree search using a learned transition model and value function. The key idea is to improve the assessment of future performance via a CheckList approach for exploring and assessing the agent's inferences during tree search. The approach provides the user with an interface and a general query-rule mechanism for identifying potential inference flaws and validating expected inference invariances. We present a user study in which knowledgeable AI researchers use the approach to evaluate an agent trained to play a complex real-time strategy game. The results show that the approach is effective in allowing users to identify previously unknown flaws in the agent's reasoning. In addition, our analysis provides insight into how AI experts use this type of testing approach, which may help improve future instantiations.
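
To make the query-rule idea concrete, here is a minimal Python sketch of one way such a check could look. It is not the paper's interface: the `Node` structure, the `enemy_health` feature, the `attack` action, and the function names are all hypothetical, chosen only for illustration. A query selects the tree edges of interest, and a rule states what the agent's inferences should satisfy on those edges; any selected edge that fails the rule is reported as a potential flaw.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One node of the agent's search tree (hypothetical structure)."""
    features: Dict[str, float]      # state predicted by the learned transition model
    value: float                    # estimate from the learned value function
    action: Optional[str] = None    # action that led to this node; None at the root
    children: List["Node"] = field(default_factory=list)


Edge = Tuple[Node, Node]
Predicate = Callable[[Node, Node], bool]


def edges(root: Node) -> Iterator[Edge]:
    """Yield every (parent, child) pair in the tree."""
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            yield parent, child
            stack.append(child)


def check(root: Node, query: Predicate, rule: Predicate) -> List[Edge]:
    """Return the edges selected by `query` that violate `rule`."""
    return [(p, c) for p, c in edges(root) if query(p, c) and not rule(p, c)]


# Query: only look at edges produced by the (assumed) "attack" action.
def is_attack(parent: Node, child: Node) -> bool:
    return child.action == "attack"


# Rule: attacking should never raise the predicted enemy health.
def enemy_health_not_increased(parent: Node, child: Node) -> bool:
    return child.features["enemy_health"] <= parent.features["enemy_health"]


root = Node({"enemy_health": 100.0}, value=0.2, children=[
    Node({"enemy_health": 80.0}, value=0.4, action="attack"),
    Node({"enemy_health": 120.0}, value=0.1, action="attack"),  # suspicious prediction
    Node({"enemy_health": 100.0}, value=0.2, action="wait"),
])

violations = check(root, is_attack, enemy_health_not_increased)
for parent, child in violations:
    print("possible flaw:", parent.features, "->", child.features)
```

Keeping the query separate from the rule means the same rule can be reused across different slices of the tree, and an empty result serves as evidence that the expected invariance holds on the inspected searches.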

