Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments

07/13/2022
by   John Tan Chong Min, et al.

Traditional reinforcement learning (RL) environments are typically the same for both the training and testing phases. Hence, current RL methods largely do not generalize to a test environment that is conceptually similar to, but different from, what the method has been trained on, which we term the novel test environment. In an effort to push RL research towards algorithms that can generalize to novel test environments, we introduce the Brick Tic-Tac-Toe (BTTT) test bed, where the brick position in the test environment is different from that in the training environment. Using a round-robin tournament on the BTTT environment, we show that traditional RL state-search approaches such as Monte Carlo Tree Search (MCTS) and Minimax are more generalizable to novel test environments than AlphaZero is. This is surprising because AlphaZero has been shown to achieve superhuman performance in environments such as Go, Chess and Shogi, which may lead one to expect it to perform well in novel test environments. Our results show that BTTT, though simple, is rich enough to explore the generalizability of AlphaZero. We find that merely increasing MCTS lookahead iterations is insufficient for AlphaZero to generalize to some novel test environments. Rather, increasing the variety of training environments progressively improves generalizability across all possible starting brick configurations.
