Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

10/06/2020
by Jalal Arabneydi, et al.

In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures: deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through the deep state, which refers to a set of orthogonal linear regressions of the states. We consider a homogeneous linear regression for Markov chain models (i.e., the empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., weighted averages of states). Planning algorithms are developed for the case in which the model is known, and reinforcement learning algorithms are proposed for the case in which the model is not completely known. Convergence is established for two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models. The results are then applied to a smart grid.
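To make the two notions of deep state mentioned in the abstract concrete, here is a minimal sketch in Python. It is an illustration, not the authors' implementation: the function names `empirical_distribution` and `weighted_average` are hypothetical, states are assumed to be integers in a finite set for the Markov chain case and scalars for the linear quadratic case, and the 1/n normalization of the weighted average is an assumption, since the abstract does not state the exact scaling.

```python
from collections import Counter
from typing import List, Sequence


def empirical_distribution(states: Sequence[int], num_states: int) -> List[float]:
    """Fraction of agents in each state of the finite set {0, ..., num_states - 1}.

    Illustrates the Markov chain case, where the deep state is the
    empirical distribution of the agents' states.
    """
    n = len(states)
    if n == 0:
        raise ValueError("need at least one agent")
    counts = Counter(states)
    return [counts.get(s, 0) / n for s in range(num_states)]


def weighted_average(states: Sequence[float], weights: Sequence[float]) -> float:
    """Compute (1/n) * sum_i weights[i] * states[i] for scalar states.

    Illustrates the linear quadratic case, where the deep state is a
    weighted average of the agents' states. The 1/n scaling is an
    assumption made for this sketch.
    """
    if not states or len(states) != len(weights):
        raise ValueError("states and weights must be non-empty and of equal length")
    n = len(states)
    return sum(w * x for w, x in zip(weights, states)) / n


if __name__ == "__main__":
    # Four agents, each in one of three discrete states.
    print(empirical_distribution([0, 1, 1, 2], num_states=3))
    # Four agents with scalar states and equal weights.
    print(weighted_average([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```

In both cases the aggregate has a fixed dimension regardless of the number of agents, which is what allows agents to be coupled through it rather than through the full joint state.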


11/29/2020

Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods

In this paper, we study the global convergence of model-based and model-...
01/28/2019

Markov-Modulated Linear Regression

Classical linear regression is considered for a case when regression par...
10/23/2021

Deep Structured Teams in Arbitrary-Size Linear Networks: Decentralized Estimation, Optimal Control and Separation Principle

In this article, we introduce decentralized Kalman filters for linear qu...
06/13/2012

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

We consider the problem of efficiently learning optimal control policies...
11/09/2020

Thompson sampling for linear quadratic mean-field teams

We consider optimal control of an unknown multi-agent linear quadratic (...
11/03/2020

Online Observer-Based Inverse Reinforcement Learning

In this paper, a novel approach to the output-feedback inverse reinforce...
10/17/2021

A Markov Chain Model for COVID19 in Mexico City

This paper presents a model for COVID19 in Mexico City. The data analyze...