Reinforcement Learning in Deep Structured Teams: Initial Results with Finite and Infinite Valued Features

by Jalal Arabneydi, et al.

In this paper, we consider Markov chain and linear quadratic models for deep structured teams with discounted and time-average cost functions under two non-classical information structures: deep state sharing and no sharing. In deep structured teams, agents are coupled in dynamics and cost functions through the deep state, which refers to a set of orthogonal linear regressions of the states. We consider a homogeneous linear regression for Markov chain models (i.e., the empirical distribution of states) and a few orthonormal linear regressions for linear quadratic models (i.e., a weighted average of states). Planning algorithms are developed for the case when the model is known, and reinforcement learning algorithms are proposed for the case when the model is not completely known. The convergence of two model-free (reinforcement learning) algorithms, one for Markov chain models and one for linear quadratic models, is established. The results are then applied to a smart grid.
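To make the two notions of deep state concrete, the following is a minimal sketch, not the authors' implementation. It assumes finite local states labeled 0, ..., num_states - 1 for the Markov chain case, and scalar states with a 1/n normalization for the linear quadratic case. The function names, the weights, and the normalization are illustrative assumptions that do not appear in the abstract.

```python
from collections import Counter
from typing import List, Sequence


def empirical_distribution(states: Sequence[int], num_states: int) -> List[float]:
    """Deep state for a Markov chain model (assumed form):
    the fraction of agents currently in each local state."""
    n = len(states)
    counts = Counter(states)
    return [counts.get(s, 0) / n for s in range(num_states)]


def weighted_average(states: Sequence[float], weights: Sequence[float]) -> float:
    """Deep state for a linear quadratic model (assumed form):
    (1/n) * sum_i weights[i] * states[i]."""
    n = len(states)
    return sum(w * x for w, x in zip(weights, states)) / n


# Four agents with local states in {0, 1, 2}.
print(empirical_distribution([0, 1, 1, 2], num_states=3))

# Three agents with scalar states and hypothetical unit weights.
print(weighted_average([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

In both cases the deep state is an aggregate that is linear in the agents' states (in the indicator vectors of the states, for the Markov chain model), which is what lets each agent be coupled to the rest of the team through a single low-dimensional quantity.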




