Learning classifier systems with memory condition to solve non-Markov problems

11/02/2012
by Zhaoxiang Zang, et al.

In the family of Learning Classifier Systems, the classifier system XCS has been used successfully in many applications. However, standard XCS has no memory mechanism and can only learn an optimal policy in Markov environments, where the optimal action is determined solely by the current sensory input. In practice, most environments are only partially observable to the agent's sensors; these are known as non-Markov environments. In such environments, memoryless XCS either fails or develops only a suboptimal policy. In this work, we develop a new classifier system based on XCS to tackle this problem. It adds an internal message list to XCS that serves as a memory of the input sensation history, and extends a small number of classifiers with memory conditions. A classifier's memory condition, acting as a foothold for disambiguating non-Markov states, matches against a specified element of the memory list. In addition, a detection method is employed to recognize non-Markov states in the environment, so that memory conditions are used only where such states require disambiguation. The proposed system was tested on four sets of complex maze environments. Experimental results show that it is among the best techniques for solving partially observable environments, compared with several well-known classifier systems proposed for them.
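To make the mechanism concrete, here is a minimal sketch (not the authors' implementation) of how a classifier extended with a memory condition could disambiguate an aliased sensation by also matching against an element of the memory list. The class name `Classifier` and the fields `mem_condition` and `mem_index` are illustrative assumptions; only the ternary matching alphabet {0, 1, #} is standard XCS.

```python
def matches(condition: str, state: str) -> bool:
    """Standard XCS ternary matching: '#' is a wildcard ("don't care")."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

class Classifier:
    def __init__(self, condition, action, mem_condition=None, mem_index=0):
        self.condition = condition          # matches the current sensory input
        self.action = action
        self.mem_condition = mem_condition  # optional condition over past input
        self.mem_index = mem_index          # which memory-list element to inspect

    def is_match(self, state, memory_list):
        if not matches(self.condition, state):
            return False
        if self.mem_condition is None:      # plain XCS classifier, no memory
            return True
        if self.mem_index >= len(memory_list):
            return False                    # not enough history recorded yet
        return matches(self.mem_condition, memory_list[self.mem_index])

# Two classifiers both match the aliased sensation '0110', but the previous
# sensation stored in the memory list disambiguates which one applies.
cl_a = Classifier('0110', 'left',  mem_condition='1###', mem_index=0)
cl_b = Classifier('0110', 'right', mem_condition='0###', mem_index=0)

memory = ['1010']                           # most recent past sensation first
print(cl_a.is_match('0110', memory))        # True  -> act 'left'
print(cl_b.is_match('0110', memory))        # False
```

In this sketch only classifiers carrying a `mem_condition` consult the memory list, mirroring the paper's idea that just a small number of classifiers, covering detected non-Markov states, need the extra condition.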

