Learning to Design Games: Strategic Environments in Deep Reinforcement Learning

07/05/2017
by   Haifeng Zhang, et al.

In typical reinforcement learning (RL), the environment is assumed to be given, and the goal of learning is to identify an optimal policy for an agent that acts through its interactions with the environment. In this paper, we extend this setting by considering an environment that is not given but is controllable and learnable through its interaction with the agent. Theoretically, we find a Markov decision process (MDP) w.r.t. the environment that is dual to the MDP w.r.t. the agent, and solving the dual MDP-policy pair yields a policy gradient solution for optimizing the parametrized environment. Furthermore, environments with discontinuous parameters are addressed by a proposed general generative framework. The idea is illustrated by an extended two-agent rock-paper-scissors game, and our experiments on a Maze game design task show the effectiveness of the proposed algorithm in generating diverse and challenging Mazes against different agents with various settings.
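To make the idea of a policy gradient over a parametrized environment concrete, here is a minimal toy sketch. It is not the paper's algorithm: it assumes a one-step interaction in which a fixed agent picks an action, the environment then picks an outcome from a softmax over its own parameters, and the environment is trained to lower the agent's expected return. The names `AGENT_POLICY`, `REWARD`, `env_gradient_step`, and `train_environment` are hypothetical, and the gradient is computed exactly in expectation rather than estimated from samples.

```python
import math

# Hypothetical toy setup: 2 agent actions, 3 environment outcomes per action.
AGENT_POLICY = [0.7, 0.3]              # fixed agent policy pi(a)
REWARD = [[1.0, 0.0, 0.5],             # agent reward R[a][o]
          [0.2, 0.8, 0.0]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_agent_return(theta, agent_policy, reward):
    """Agent's expected return when the environment samples o ~ softmax(theta[a])."""
    total = 0.0
    for a, pa in enumerate(agent_policy):
        probs = softmax(theta[a])
        total += pa * sum(p * r for p, r in zip(probs, reward[a]))
    return total


def env_gradient_step(theta, agent_policy, reward, lr):
    """One exact score-function gradient step on the environment parameters.

    For a softmax parametrization,
        dJ/dtheta[a][o] = pi(a) * p(o|a) * (R[a][o] - baseline_a),
    where J is the agent's expected return and baseline_a = E_o[R[a][o]].
    The environment descends this gradient, i.e. it tries to be challenging.
    """
    new_theta = []
    for a, pa in enumerate(agent_policy):
        probs = softmax(theta[a])
        baseline = sum(p * r for p, r in zip(probs, reward[a]))
        new_theta.append([
            t - lr * pa * p * (r - baseline)
            for t, p, r in zip(theta[a], probs, reward[a])
        ])
    return new_theta


def train_environment(steps=500, lr=1.0):
    """Run repeated gradient steps from uniform environment parameters."""
    theta = [[0.0, 0.0, 0.0] for _ in AGENT_POLICY]
    for _ in range(steps):
        theta = env_gradient_step(theta, AGENT_POLICY, REWARD, lr)
    return theta


if __name__ == "__main__":
    theta = train_environment()
    print("agent return after training the environment:",
          expected_agent_return(theta, AGENT_POLICY, REWARD))
    for a, row in enumerate(theta):
        print("action", a, "outcome probabilities", softmax(row))
```

In this sketch the environment shifts probability toward the outcomes that give the fixed agent the least reward. In the setting the abstract describes, the agent would also be learning, and the environment's objective need not be purely adversarial.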


research
05/11/2023

On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

We study a robust reinforcement learning (RL) with model uncertainty. Gi...
research
02/28/2023

Policy Dispersion in Non-Markovian Environment

Markov Decision Process (MDP) presents a mathematical framework to formu...
research
09/24/2018

EpiRL: A Reinforcement Learning Agent to Facilitate Epistasis Detection

Epistasis (gene-gene interaction) is crucial to predicting genetic disea...
research
11/03/2022

Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration

Given an environment (e.g., a simulator) for evaluating samples in a spe...
research
03/28/2022

REPTILE: A Proactive Real-Time Deep Reinforcement Learning Self-adaptive Framework

In this work a general framework is proposed to support the development ...
research
11/17/2022

AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process

Reinforcement learning has recently been used to approach well-known NP-...
research
03/31/2022

Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks

We present Mask Atari, a new benchmark to help solve partially observabl...
