Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games

05/26/2020
by   Hao Chen, et al.

In Markov games, playing against non-stationary opponents that can learn remains challenging for reinforcement learning (RL) agents, because the opponents evolve their policies concurrently. This increases the complexity of the learning task and slows the learning of the RL agents. This paper proposes an efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm for zero-sum Markov games that efficiently learns explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model within an eXtended classifier system (XCS). A neural network models the opponent from its observed behavior, and the corresponding policy is inferred for action selection and rule evolution. When multiple heuristic policies are available, we introduce the concept of Pareto optimality for action selection. In addition, by taking advantage of the condition representation and matching mechanism of XCS, the heuristic policies and the opponent model can provide guidance in situations with similar feature representations. Furthermore, we introduce an accuracy-based eligibility trace mechanism to speed up rule evolution, i.e., classifiers that match the historical traces are reinforced according to their accuracy. We demonstrate the advantages of the proposed algorithm over several benchmark algorithms in a soccer scenario and a thief-and-hunter scenario.
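
As a rough illustration of the Pareto-optimality idea mentioned above, the following minimal sketch filters out actions that are dominated under several heuristic scores. It is not the paper's algorithm: the function names, the action labels, and the score values are all hypothetical, and it assumes each heuristic assigns every action a numeric score where higher is better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every heuristic and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_actions(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the actions whose score vectors are not dominated by any other action."""
    return [
        action
        for action, vec in scores.items()
        if not any(
            dominates(other, vec)
            for name, other in scores.items()
            if name != action
        )
    ]


# Hypothetical scores: each action is rated by two heuristics (higher is better).
heuristic_scores = {
    "advance": (0.9, 0.2),
    "pass": (0.6, 0.6),
    "retreat": (0.1, 0.8),
    "hold": (0.5, 0.5),  # dominated by "pass" on both heuristics
}

candidates = pareto_optimal_actions(heuristic_scores)
print(candidates)
```

In this sketch the non-dominated set only narrows the candidates; how a single action is then chosen from that set (for example, using the opponent model or the classifier predictions) is left open here.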


