Partially Observed, Multi-objective Markov Games

04/16/2014
by Yanling Chang, et al.

The intent of this research is to generate a set of non-dominated policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite-horizon, partially observed Markov game (POMG). At each decision epoch, each agent knows its own past and present states, its own past actions, and noise-corrupted observations of the other agent's past and present states, and selects its action on the basis of these data. The leader considers multiple objectives in selecting its policy. The follower considers a single objective, selecting its policy with complete knowledge of, and in response to, the policy selected by the leader. This leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process (POMDP), which is used to determine the follower's best-response policy. A multi-objective genetic algorithm (MOGA) then creates each new generation of leader policies from the fitness measures of the policies in the current generation. Computing the fitness of a leader policy requires a value-determination calculation, given that policy and the follower's best response to it. The policies from which the leader can select a most preferred policy are the non-dominated policies of the final generation created by the MOGA. An example illustrates how these results can support the manager of a liquid egg production process (the leader) in selecting a sequence of actions to best control the process over time, given an attacker (the follower) who seeks to contaminate it with a chemical or biological toxin.
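
To make the interplay of these steps concrete, below is a minimal Python sketch of the overall loop: each candidate leader policy is evaluated against the follower's best response to it, a value-determination step produces a vector of objective values, and the non-dominated policies seed the next generation. Everything here is a toy stand-in, not the paper's method: real leader policies map observation histories to actions, and the follower's best response requires solving the structured POMDP described above. Every name, parameter, and objective in the sketch is hypothetical.

    # Illustrative sketch of the leader-follower MOGA loop; all quantities are toys.
    import random

    random.seed(0)

    N_PARAMS = 6      # toy leader policy: a parameter vector in [0, 1]^6
    POP_SIZE = 20
    GENERATIONS = 50

    def follower_best_response(leader_policy):
        """Placeholder for solving the follower's POMDP given the leader policy."""
        # Toy response: the follower targets the leader's weakest parameter.
        return min(range(N_PARAMS), key=lambda i: leader_policy[i])

    def value_determination(leader_policy, follower_action):
        """Placeholder value-determination step: two toy leader objectives."""
        throughput = sum(leader_policy) / N_PARAMS   # objective 1 (maximize)
        security = leader_policy[follower_action]    # objective 2 (maximize)
        return (throughput, security)

    def dominates(a, b):
        """Pareto dominance for maximization of all objectives."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def non_dominated(pop_with_fitness):
        return [(p, f) for p, f in pop_with_fitness
                if not any(dominates(g, f) for _, g in pop_with_fitness)]

    def crossover(p1, p2):
        cut = random.randrange(1, N_PARAMS)
        return p1[:cut] + p2[cut:]

    def mutate(p, rate=0.1):
        return [min(1.0, max(0.0, x + random.gauss(0, 0.1)))
                if random.random() < rate else x for x in p]

    population = [[random.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]

    for _ in range(GENERATIONS):
        # Fitness of each leader policy is computed against the follower's
        # best response to that policy (the leader-follower assumption).
        scored = [(p, value_determination(p, follower_best_response(p)))
                  for p in population]
        parents = [p for p, _ in non_dominated(scored)]
        children = []
        while len(children) < POP_SIZE - len(parents):
            a, b = random.sample(population, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children

    # The non-dominated set of the final generation is what the leader
    # would choose a most preferred policy from.
    pareto = non_dominated([(p, value_determination(p, follower_best_response(p)))
                            for p in population])
    for policy, fitness in pareto:
        print([round(x, 2) for x in policy], tuple(round(f, 3) for f in fitness))

Carrying every non-dominated policy into the next generation is a simplification of a full MOGA (which would also use ranking-based selection and diversity preservation), but it mirrors the paper's endpoint: the set handed to the leader is the Pareto front of the final generation.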
