
# Learning in Networked Control Systems

We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel. We propose Upper Confidence Bounds for Networked Control Systems (UCB-NCS), a learning rule that maintains confidence intervals for the estimates of the plant parameters $(A_{\star},B_{\star})$ and the channel reliability $p_{\star}$, and utilizes the principle of optimism in the face of uncertainty while making control decisions. We provide non-asymptotic performance guarantees for UCB-NCS by analyzing its "regret", i.e., its performance gap from the scenario in which $(A_{\star},B_{\star},p_{\star})$ are known to the controller. We show that with high probability the regret can be upper-bounded as $\tilde{O}(C\sqrt{T})$, where $T$ is the operating time horizon of the system, and $C$ is a problem-dependent constant.


## I Introduction

Though adaptive control [5] of unknown Linear Quadratic Gaussian (LQG) systems [10] is by now a well-studied topic [4, 7, 6, 2], existing algorithms cannot be utilized for controlling an unknown NCS in which both plant and network parameters are unknown. In departure from traditional adaptive controllers for LQG systems, an algorithm now also needs to continually estimate the unknown network behaviour besides simultaneously learning and controlling the plant in an online manner. An important concern is that, in general, it is not optimal to design and operate the network estimator independently of the process controller. Thus, the optimal controls should utilize the information gained about network quality in addition to the information gained about the plant parameters. Similarly, decisions made by the network scheduler should also "aid" the controller in "learning" the unknown plant parameters.

This work addresses the problem of adaptive control of a simple NCS in which data packets from the controller to the plant are communicated over an unreliable channel. We model the plant as an LQG system. We propose a learning rule that maintains estimates and confidence sets for both a) the (unknown) plant parameters $(A_{\star},B_{\star})$, and b) the (unknown) channel reliability $p_{\star}$. Controls are then generated using the principle of optimism in the face of uncertainty [9], and depend upon both a) and b). We call our algorithm Upper Confidence Bounds for Networked Control Systems (UCB-NCS).

We show that UCB-NCS yields the same asymptotic performance as the optimal controller that has knowledge of the system and network parameters. We also quantify its finite-time performance by providing upper bounds on its "regret" [3]. Regret scales as $\tilde{O}(C\sqrt{T})$, where $T$ is the operating time horizon and $C$ is a problem-dependent constant. It also depends on the channel reliability through a certain quantity $\eta$ which we call the "margin of stability" (14). A larger value of $\eta$ means that the learning algorithm has a lower regret.

UCB-NCS has many appealing properties. For instance, the network estimator needs to communicate the value of its optimistic estimate of the network reliability to the controller only occasionally; the controller then uses this value to generate controls.

## II System Model

We assume that the system of interest is linear, and evolves as follows,

$$x(t+1)=\begin{cases} A_{\star}x(t)+B_{\star}u(t)+w(t) & \text{if } \ell(t)=1,\\ A_{\star}x(t)+w(t) & \text{if } \ell(t)=0,\end{cases} \qquad (1)$$

where $A_{\star},B_{\star}$ are the system matrices, $\ell(t)\in\{0,1\}$ is the instantaneous state of the wireless channel, and $x(t),u(t)$ are the system state and control input at time $t$, respectively. The channel states $\{\ell(t)\}$ are Bernoulli i.i.d. with mean value $p_{\star}$. $w(t)$ is the process noise, and is assumed to be i.i.d. with

$$\mathbb{E}\left(w(t)w^{T}(t)\right)=\sigma^{2}_{w},\qquad \forall t\in[1,T].$$

The objective is to minimize the operating cost

$$\mathbb{E}\left[\sum_{t=1}^{T-1}\left(x^{T}(t)Qx(t)+u^{T}(t)Ru(t)\right)+x^{T}(T)Qx(T)\right].\qquad (2)$$

We let $\theta_{\star}:=(A_{\star},B_{\star},p_{\star})$ denote the system parameters. $\theta_{\star}$ is not known to the controller. We assume that the system is scalar, i.e., $x(t),u(t)\in\mathbb{R}$.
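To make the model concrete, the following is a minimal simulation of the dynamics (1). The parameter values ($A_{\star}=1.2$, $B_{\star}=1$, $p_{\star}=0.8$, $\sigma_w=0.1$) and the fixed feedback gain $K=-1$ are illustrative assumptions, not values taken from the paper: when the Bernoulli channel is ON, the control packet reaches the plant, otherwise the plant runs open loop.

```python
import numpy as np

# Minimal simulation of the NCS dynamics (1); all numbers are illustrative.
rng = np.random.default_rng(0)
A_star, B_star, p_star, sigma_w = 1.2, 1.0, 0.8, 0.1
K, T = -1.0, 200                          # fixed (not optimal) gain, horizon

x = np.zeros(T + 1)
x[0] = 1.0
for t in range(T):
    ell = rng.random() < p_star           # channel state l(t) ~ Bernoulli(p*)
    u = K * x[t]                          # control packet sent to the plant
    w = sigma_w * rng.standard_normal()   # process noise w(t)
    x[t + 1] = A_star * x[t] + (B_star * u if ell else 0.0) + w

print(np.max(np.abs(x)))
```

Even though the open-loop mode is unstable ($A_{\star}>1$), the trajectory stays bounded here because packet deliveries are frequent enough that the switched system contracts on average.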

## III Preliminaries on Jump Markov Linear Systems

Note that (1) is a Jump Markov Linear System (JMLS), and if the system parameter $\theta_{\star}$ is known, the optimal controls can be obtained by using Dynamic Programming [8].

There are gain matrices $K_{\theta}(\ell)$ such that the optimal control at time $t$ is given by $u(t)=K_{\theta}(\ell(t))\,x(t)$. We let $K_{\theta_{\star}}(\cdot)$ denote the optimal gains when the system parameter is equal to $\theta_{\star}$.

We let $V_{\theta}(x,\ell)$ denote the "cost-to-go" when the system state is equal to $x$, the channel state is $\ell$, and the system dynamics are described by $\theta$. The value function is quadratic in $x$ for each channel state, and we let $P_{\theta}(\ell)$ denote the corresponding matrices. We also let $J_{\theta}$ be the optimal operating cost.
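To illustrate these preliminaries, the sketch below computes mode-dependent value-function coefficients $P_{\theta}(\ell)$ (so that $V_{\theta}(x,\ell)=P_{\theta}(\ell)x^{2}$) and the gain applied when the channel is ON, by value iteration for a scalar JMLS. The recursion and the numbers `(a, b, p, q, r)` are illustrative assumptions; the paper only asserts that such matrices exist.

```python
# Value-iteration sketch for the scalar JMLS (1): all numbers illustrative.
def jmls_value_iteration(a, b, p, q, r, iters=2000):
    P1 = P0 = q                           # P(l=1), P(l=0), initialized at Q
    for _ in range(iters):
        Pbar = p * P1 + (1 - p) * P0      # expectation over next channel state
        # channel ON: minimize R u^2 + Pbar (a x + b u)^2 over u
        P1 = q + Pbar * a**2 - (Pbar * a * b) ** 2 / (r + Pbar * b**2)
        # channel OFF: the packet is lost and the plant runs open loop
        P0 = q + Pbar * a**2
    Pbar = p * P1 + (1 - p) * P0
    K1 = -Pbar * a * b / (r + Pbar * b**2)   # gain used when l(t) = 1
    return P0, P1, K1

P0, P1, K1 = jmls_value_iteration(a=1.2, b=1.0, p=0.8, q=1.0, r=1.0)
print(P0, P1, K1)
```

The fixed point exists here because $(1-p)a^{2}<1$, i.e., the plant is mean-square stabilizable despite packet drops; $P(0)>P(1)$ since being in the OFF mode is more expensive.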

Notation: For a random variable (r.v.) $X$ and a sigma-algebra $\mathcal{F}$, we let $\langle X\rangle_{\mathcal{F}}$ denote its projection onto the space of $\mathcal{F}$-measurable functions, i.e., its conditional expectation w.r.t. the sigma-algebra $\mathcal{F}$. For $a,b\in\mathbb{Z}$ with $a\le b$ ($\mathbb{Z}$ denotes the set of integers), we let $[a,b]:=\{a,a+1,\ldots,b\}$. For a set of r.v.s $\mathcal{X}$, we let $\sigma(\mathcal{X})$ denote the smallest sigma-algebra with respect to which each r.v. in $\mathcal{X}$ is measurable. For functions $f,g$, we say $f=O(g)$ if there is a constant $c$ such that $|f(x)|\le c\,g(x)$. For a set $A$, we let $A^{c}$ denote its complement.

## IV Upper Confidence Bounds for NCS (UCB-NCS)

Let $\mathcal{F}_t$ denote the sigma-algebra generated by the system history up to time $t$. A learning policy, or an adaptive controller, is a collection of maps under which each control $u(t)$ is $\mathcal{F}_t$-measurable. Let $(\hat{A}(t),\hat{B}(t),\hat{p}(t))$ denote the estimates of $(A_{\star},B_{\star},p_{\star})$ at time $t$, defined as follows. Let $z(s):=x(s+1)$, and let $\lambda>0$ be a regularization parameter.

$$\hat{p}(t)=\frac{1}{t}\sum_{s=1}^{t}\ell(s),\qquad \hat{A}(t)\in\arg\min_{A}\ \frac{1}{2}\left[\lambda A^{2}+\sum_{s=1}^{t-1}\big(z(s)-Ax(s)\big)^{2}\big(1-\ell(s)\big)\right],$$
$$\hat{B}(t)\in\arg\min_{B}\ \frac{1}{2}\left[\lambda B^{2}+\sum_{s=1}^{t-1}\big(z(s)-\hat{A}(t)x(s)-Bu(s)\big)^{2}\ell(s)\right].\qquad (3)$$

Define

$$V_1(t):=\lambda+\sum_{s=1}^{t-1}x^{2}(s)\big(1-\ell(s)\big),\qquad V_2(t):=\lambda+\sum_{s=1}^{t-1}u^{2}(s)\ell(s),$$
$$\gamma_i(\delta,t):=\sqrt{\log\big(\lambda V_i(t)/\delta\big)},\quad i=1,2.\qquad (4)$$

Let $\mathcal{C}(t):=\mathcal{C}_1(t)\times\mathcal{C}_2(t)\times\mathcal{C}_3(t)$ be the confidence intervals associated with the estimates at time $t$, defined as follows,

$$\mathcal{C}_1(t):=\left\{A:\big|A-\hat{A}(t)\big|\le\beta_1(t)\right\},\qquad (5)$$
$$\mathcal{C}_2(t):=\left\{B:\big|B-\hat{B}(t)\big|\le\beta_2(t)\right\},\qquad \mathcal{C}_3(t):=\left\{p:\big|p-\hat{p}(t)\big|\le\beta_3(t)\right\},\qquad (6)$$

where

$$\beta_1(t):=\frac{\gamma_1(\delta,t)+\lambda^{1/2}}{\sqrt{V_1(t)}},\qquad \beta_2(t):=\frac{\gamma_2(\delta,t)+\lambda^{1/2}}{\sqrt{V_2(t)}}+K_{\max}\,\frac{\gamma_1(\delta,t)+\lambda^{1/2}}{\sqrt{V_1(t)}},\qquad \beta_3(t):=\sqrt{\frac{\log(1/\delta)}{t}}.$$
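The sketch below computes the estimates (3) and the confidence radii (4)-(6) on synthetic data. All numerical values ($\lambda$, $\delta$, $K_{\max}$, the plant parameters, and the fixed gain) are illustrative assumptions, and the form of $\beta_2$ mirrors $\beta_1$'s as reconstructed above.

```python
import numpy as np

# Regularized least-squares estimates (3) and radii (4)-(6), synthetic data.
rng = np.random.default_rng(1)
A_star, B_star, p_star, sigma_w = 1.2, 1.0, 0.8, 0.1
lam, delta, K_max, T = 1.0, 0.05, 1.0, 2000

x = np.zeros(T + 1); ell = np.zeros(T); u = np.zeros(T)
for t in range(T):
    ell[t] = rng.random() < p_star
    u[t] = -1.0 * x[t]                       # fixed illustrative gain
    w = sigma_w * rng.standard_normal()
    x[t + 1] = A_star * x[t] + (B_star * u[t] if ell[t] else 0.0) + w

z = x[1:]                                    # z(s) := x(s+1)
p_hat = ell.mean()                           # channel reliability estimate
V1 = lam + np.sum(x[:-1]**2 * (1 - ell))     # (4)
V2 = lam + np.sum(u**2 * ell)
# closed-form ridge-regression solutions of the problems in (3)
A_hat = np.sum(z * x[:-1] * (1 - ell)) / V1
B_hat = np.sum((z - A_hat * x[:-1]) * u * ell) / V2
g1 = np.sqrt(np.log(lam * V1 / delta))
g2 = np.sqrt(np.log(lam * V2 / delta))
beta1 = (g1 + np.sqrt(lam)) / np.sqrt(V1)
beta2 = (g2 + np.sqrt(lam)) / np.sqrt(V2) + K_max * (g1 + np.sqrt(lam)) / np.sqrt(V1)
beta3 = np.sqrt(np.log(1 / delta) / T)
print(p_hat, A_hat, B_hat, beta1, beta2, beta3)
```

Note that $\hat{A}(t)$ is fit only on slots where the packet was dropped (the plant then runs open loop), while $\hat{B}(t)$ is fit only on delivery slots.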

The learning rule decomposes the cumulative time into episodes, and implements a single stationary controller within each episode that chooses the control as a function of the current state $x(t)$ and channel state $\ell(t)$. Let $\tau_k$ denote the starting time of the $k$-th episode. The controller implemented within episode $k$ is obtained at time $\tau_k$ by solving the following optimization problem,

$$\min_{\theta\in\mathcal{C}(\tau_k)\cap\Theta} J_{\theta},\qquad (7)$$

where $\Theta$ is the set of "allowable" parameters. Let $\theta(\tau_k)$ denote a solution to the above problem. The learning rule implements the optimal controller corresponding to the case when the true system parameter is equal to $\theta(\tau_k)$. Thus, $u(t)=K_{\theta(\tau_k)}(\ell(t))\,x(t)$ for $t\in[\tau_k,\tau_{k+1}-1]$.

A new episode begins when either $V_1(t)$ or $V_2(t)$ doubles, or the operating time spent in the current episode becomes equal to the length of the previous episode. The learning rule also ensures that the durations of episodes are at least $L$ time-slots, i.e., $\tau_{k+1}-\tau_k\ge L$. We set

$$\theta(t):=\theta(\tau_k),\qquad\forall t\in[\tau_k,\tau_{k+1}-1],$$

i.e., $\theta(t)$ is the current value of the UCB estimate of $\theta_{\star}$. UCB-NCS is summarized in Algorithm 1.
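The episode schedule above can be sketched as follows. The $V_1/V_2$ increments are synthetic stand-ins for the $x^{2}$ and $u^{2}$ terms in (4), and the values of $L$ and $T$ are illustrative assumptions; solving (7) at each episode start is indicated only by a comment.

```python
import numpy as np

# Skeleton of the UCB-NCS episode schedule; increments are synthetic.
rng = np.random.default_rng(2)
T, L, lam = 5000, 10, 1.0

V1 = V2 = lam
episodes = [1]                   # tau_k: episode start times
V1_ref, V2_ref = V1, V2          # V1, V2 at the start of the episode
prev_len = L
for t in range(2, T + 1):
    V1 += rng.random() * 0.1     # stand-in for x^2(t) (1 - l(t))
    V2 += rng.random() * 0.1     # stand-in for u^2(t) l(t)
    in_episode = t - episodes[-1]
    doubled = V1 >= 2 * V1_ref or V2 >= 2 * V2_ref
    if in_episode >= L and (doubled or in_episode > prev_len):
        prev_len = in_episode    # episode lengths are non-decreasing
        episodes.append(t)       # theta(tau_k) would be re-solved via (7) here
        V1_ref, V2_ref = V1, V2

print(len(episodes), episodes[:5])
```

Because episodes end only on doubling or length growth, the number of controller updates grows much more slowly than $T$, which is what makes the "lazy" episodic structure cheap to implement.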

## V Large Deviation Bounds on Estimation Errors

We now analyze the estimation errors $e_1(t):=\hat{A}(t)-A_{\star}$, $e_2(t):=\hat{B}(t)-B_{\star}$, and $e_3(t):=\hat{p}(t)-p_{\star}$.

###### Lemma 1

Define

$$E:=\left\{\omega:\theta_{\star}=(A_{\star},B_{\star},p_{\star})\in\mathcal{C}(t),\ \forall t\in[1,T]\right\}.$$

We then have that

$$\mathbb{P}\left(E^{c}\right)\le 3\delta.$$

It can be shown that

$$e_1(t)=-\frac{\lambda A_{\star}}{V_1(t)}+\frac{\sum_{s=1}^{t-1}w(s)x(s)\big(1-\ell(s)\big)}{V_1(t)}.\qquad (8)$$

Note that $\{w(s)x(s)(1-\ell(s))\}_s$ is a martingale difference sequence w.r.t. the filtration $\{\mathcal{F}_s\}$, while $\{x(s)(1-\ell(s))\}_s$ is adapted to it. Thus, the bound on $e_1(t)$ follows by using self-normalized bounds on martingales from Corollary 1 of [1].

To analyze $e_2(t)$, we observe that

$$e_2(t)=\left(\frac{\sum_{s=1}^{t-1}w(s)u(s)\ell(s)}{V_2(t)}-\frac{\lambda B_{\star}}{V_2(t)}\right)+\big[A_{\star}-\hat{A}(t)\big]\,\frac{\sum_{s=1}^{t-1}x(s)u(s)\ell(s)}{V_2(t)}.\qquad (9)$$

The first term within braces is bounded using Corollary 2 of [1]. To bound the second term, we observe that it is upper-bounded by $|A_{\star}-\hat{A}(t)|\sum_{s=1}^{t-1}|x(s)u(s)|\ell(s)/V_2(t)$. We then use the bounds on $e_1(t)$ to bound it. The bound on the estimation error of $\hat{p}(t)$ is obtained using the Azuma-Hoeffding inequality.

## VI Large Deviation Bounds on the System State |x(t)|

We now bound $|x(t)|$ under UCB-NCS. The system evolution under UCB-NCS is given by

$$x(t+1)=A_{sw}(t)x(t)+w(t),\qquad t\in[1,T-1],$$

where

$$A_{sw}(t):=\big(A_{\star}+B_{\star}K_{\theta(t)}(\ell(t))\big)\ell(t)+A_{\star}\big(1-\ell(t)\big).$$

Thus,

$$x(t)=x(0)G(0,t)+\sum_{s=1}^{t-1}w(s)G(s,t-1),\qquad (10)$$

where

$$G(s_1,s_2):=\begin{cases}\prod_{m=s_1}^{s_2}A_{sw}(m) & \text{if } s_2>s_1,\\ 1 & \text{if } s_1=s_2.\end{cases}$$

Consider the deviations

$$\Delta(t_1,t_2):=\sum_{s=t_1}^{t_2}\ell(s)-p_{\star}(t_2-t_1),$$

and the events,

$$J_{t_1,t_2}:=\left\{\omega:|\Delta(t_1,t_2)|\le\sqrt{2\alpha\sigma^{2}_{p_{\star}}(t_2-t_1)\log(t_2-t_1)}\right\},\qquad (11)$$

where $\alpha>0$ is a constant, and $\sigma^{2}_{p_{\star}}$ denotes the variance of the Bernoulli channel states. It follows from the Azuma-Hoeffding inequality that

$$\mathbb{P}\left(J^{c}_{t_1,t_2}\right)\le\frac{1}{(t_2-t_1)^{\alpha}},\qquad \forall t_1,t_2\in[1,T].\qquad (12)$$

Fix a sufficiently large $L$, and define

$$J:=\bigcap_{t_1,t_2:\,t_2\ge t_1+L}J_{t_1,t_2}.\qquad (13)$$

The following result is obtained by combining the union bound with the bound (12).

###### Lemma 2
$$\mathbb{P}\left(J^{c}\right)\le T^{2}/L^{\alpha}.$$
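A quick Monte Carlo check of the events $J_{t_1,t_2}$ in (11): for i.i.d. Bernoulli channel states, the deviation $|\Delta|$ should exceed the Azuma-style threshold only rarely. The values of $p$, $\alpha$, and $\sigma^{2}=p(1-p)$ are illustrative assumptions.

```python
import numpy as np

# Empirical frequency of the event J_{t1,t2} failing, for a window of n slots.
rng = np.random.default_rng(3)
p, alpha, n, trials = 0.8, 2.0, 500, 2000
sigma2 = p * (1 - p)
thresh = np.sqrt(2 * alpha * sigma2 * n * np.log(n))

ell = rng.random((trials, n)) < p           # channel states l(s)
delta = np.abs(ell.sum(axis=1) - p * n)     # |Delta(t1, t2)| with t2 - t1 = n
viol = np.mean(delta > thresh)              # empirical violation frequency
print(viol, thresh)
```

With these numbers the threshold sits several standard deviations above the typical deviation, so violations are essentially never observed, consistent with the polynomial tail in (12).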

We now focus on upper-bounding $|x(t)|$.

Throughout, we assume that the true system parameter $\theta_{\star}$, and the set $\Theta$ used by UCB-NCS, satisfy the following.

###### Assumption 1

Define

$$\Lambda(\theta):=\mathbb{E}\left(\log\big|A_{sw}(t)\big|\ \middle|\ \theta(t)=\theta\right).$$

Let $\eta,\epsilon>0$. Then,

$$\Lambda(\theta)<-\eta-\epsilon<0,\qquad \forall\theta\in\Theta.\qquad (14)$$

We call $\eta$ the "margin of stability" of the NCS. Note that $\Lambda(\theta)$ depends upon a) the plant and controller parameters, and b) the channel reliability $p_{\star}$.

Consider an element $\omega$ of $J$, and assume there are $K$ episodes during the time period $[s,t]$. Let $\theta_k$ denote the UCB estimate of $\theta_{\star}$ during the $k$-th episode, and let $D_k$ denote the duration of the $k$-th episode. We have the following,

$$|G(s,t)|=\prod_{m=s}^{t}\big|A_{sw}(m)\big|\le\prod_{k=1}^{K}\exp\big(D_k\Lambda(\theta_k)\big)\exp\left(\sqrt{2\alpha\sigma^{2}_{p_{\star}}D_k\log D_k}\right)\le\exp\big(-\eta(t-s)\big),\qquad (15)$$

where the first inequality follows from the definition of $J$ (13), while the second follows from Assumption 1.

Let

$$H:=\left\{\omega:\max_{t\in[1,T]}|w(t)|\le\log^{1/2}(T/\delta)\right\}.$$

The following is easily proved.

###### Lemma 3

We have

$$\mathbb{P}\left(H^{c}\right)\le\delta.$$
###### Lemma 4

Define

$$g(\delta,T):=|x(0)|+\frac{\log^{1/2}(T/\delta)}{1-\exp(-\eta)}.\qquad (16)$$

Under Assumption 1, we have the following on $H\cap J$:

$$|x(t)|\le g(\delta,T),\qquad \forall t\in[1,T].$$

Note that we have suppressed the dependence of the function $g$ upon $\eta$.

The proof follows by substituting in (10) the bound (15) on $G(s,t)$ and the bound on $|w(t)|$ that holds on the set $H$.

## VII Regret Analysis of UCB-NCS

Define $R(T)$, the regret incurred by UCB-NCS until time $T$, as follows,

$$R(T):=\sum_{t=1}^{T}c(t)-T\,J_{\theta_{\star}},\quad\text{where } c(t):=Qx^{2}(t)+Ru^{2}(t).\qquad (17)$$
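The definition (17) can be evaluated empirically along a trajectory. In the sketch below, all parameter values are illustrative assumptions; the gain $-0.86$ plays the role of a near-optimal controller and $J$ is estimated by simulation rather than taken from the paper, so the computed quantity is only an empirical analogue of $R(T)$.

```python
import numpy as np

# Empirical version of the cost sum in (17); all numbers illustrative.
rng = np.random.default_rng(4)
A_star, B_star, p_star, sigma_w = 1.2, 1.0, 0.8, 0.1
Q, R, T = 1.0, 1.0, 20000

def avg_cost(T, K, rng):
    """Average of c(t) = Q x^2(t) + R u^2(t) along one closed-loop run."""
    x, cost = 0.0, 0.0
    for _ in range(T):
        ell = rng.random() < p_star      # channel state l(t)
        u = K * x
        cost += Q * x**2 + R * u**2      # running cost c(t)
        w = sigma_w * rng.standard_normal()
        x = A_star * x + (B_star * u if ell else 0.0) + w
    return cost / T

J_hat = avg_cost(T, -0.86, rng)          # simulation proxy for J_{theta*}
cum_cost = avg_cost(T, -0.5, rng) * T    # total cost of an inferior fixed gain
regret = cum_cost - T * J_hat            # empirical analogue of R(T) in (17)
print(J_hat, regret)
```

As expected, a controller built from the wrong parameters pays a per-step cost premium, so its cumulative regret against the (estimated) optimal cost grows linearly; a learning rule such as UCB-NCS instead aims for sublinear growth.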

For a parameter $\theta=(A,B,p)$, define

$$x_{\theta}(t+1;u)=Ax(t)+Bu+w(t).$$

Similarly, let $\ell_{\theta}(t+1)$ be drawn i.i.d. according to Bernoulli$(p)$.

###### Lemma 5

On the set $E$, $R(T)$ can be upper-bounded as follows,

$$R(T)\le R_1+R_2,$$

where,

$$R_1:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\star}(t+1)\big)\right\rangle_{\mathcal{F}_t}-V_{\theta(t)}\big(x(t),\ell(t)\big),$$
$$R_2:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t)}\big(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t}-\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\star}(t+1)\big)\right\rangle_{\mathcal{F}_t}.$$

Consider the Bellman optimality equation at time $t$ when the true system parameter is assumed equal to $\theta(t)$,

$$J_{\theta(t)}+V_{\theta(t)}\big(x(t),\ell(t)\big)=Qx^{2}(t)+\min_{u\in\mathbb{R}}\left[Ru^{2}+\left\langle V_{\theta(t)}\big(x_{\theta(t)}(t+1;u),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t}\right]$$
$$=Qx^{2}(t)+Ru^{2}(t)+\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\star}(t+1)\big)\right\rangle_{\mathcal{F}_t}+\left\langle V_{\theta(t)}\big(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t}-\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\star}(t+1)\big)\right\rangle_{\mathcal{F}_t},\qquad (18)$$

where the second equality follows since the learning rule applies controls by assuming that $\theta(t)$ is the true system parameter. Note that on $E$, $J_{\theta(t)}$ serves as a lower bound on the optimal cost $J_{\theta_{\star}}$, so that $\sum_{t=1}^{T}c(t)-\sum_{t=1}^{T}J_{\theta(t)}$ serves as an upper-bound on $R(T)$. The proof is completed by re-arranging the terms in (18), and summing them from $t=1$ to $T-1$. We now bound the terms $R_1$ and $R_2$.

### VII-A Bounding R1

We decompose $R_1$ as follows: $R_1=T_1+T_2$, where,

$$T_1:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t-1)}\big(x_{\theta_{\star}}(t;u(t-1)),\ell_{\star}(t)\big)\right\rangle_{\mathcal{F}_{t-1}}-V_{\theta(t)}\big(x(t),\ell(t)\big),$$
$$T_2:=\left\langle V_{\theta(T-1)}\big(x_{\theta_{\star}}(T;u(T-1)),\ell_{\star}(T)\big)\right\rangle_{\mathcal{F}_{T-1}}-V_{\theta(1)}\big(x(1),\ell(1)\big).$$

We further decompose $T_1$ as follows,

$$T_1=T_3+T_4,$$

where,

$$T_3:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t-1)}\big(x_{\theta_{\star}}(t;u(t-1)),\ell_{\star}(t)\big)\right\rangle_{\mathcal{F}_{t-1}}-V_{\theta(t-1)}\big(x(t),\ell(t)\big),$$
$$T_4:=\sum_{t=1}^{T-1}V_{\theta(t-1)}\big(x(t),\ell(t)\big)-V_{\theta(t)}\big(x(t),\ell(t)\big).$$
###### Lemma 6
$$\mathbb{P}\left(T_3>\sqrt{T}\,g(\delta,T)\log(T/\delta)\right)\le\delta+\mathbb{P}\left([H\cap J]^{c}\right),$$

where $g(\delta,T)$ is as in (16).

$T_3$ is a martingale, though its increments are not bounded. However, its increments are upper-bounded by a constant multiple of $\max_{t\in[1,T]}x^{2}(t)$. It follows from Lemma 4 that its increments are upper-bounded by a constant multiple of $g^{2}(\delta,T)$ on $H\cap J$. The proof then follows from Proposition 34 of [11]. Henceforth denote

$$G:=\left\{\omega:T_3<\sqrt{T}\,g(\delta,T)\log(T/\delta)\right\}.$$

We obtain the following bound on $R_1$ by combining the results of Lemma 6 and Lemma 14.

###### Lemma 7 (Bounding R1)

Let

$$U_1:=\sqrt{T}\,g(\delta,T)\log(T/\delta)+2P_{\max}\,g^{2}(\delta,T)+P_{\max}\,f(\delta,T)\,g(\delta,T),\qquad (19)$$

where $g(\delta,T)$ and $f(\delta,T)$ are as in (16), (13). On $G\cap H\cap J$ we have $R_1\le U_1$.

### Vii-B Bounding R2

We decompose $R_2$ as follows,

$$R_2=T_5+T_6,\qquad (20)$$

where

$$T_5:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t)}\big(x_{\theta(t)}(t+1;u(t)),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t}-\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t},$$
$$T_6:=\sum_{t=1}^{T-1}\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\theta(t)}(t+1)\big)\right\rangle_{\mathcal{F}_t}-\left\langle V_{\theta(t)}\big(x_{\theta_{\star}}(t+1;u(t)),\ell_{\star}(t+1)\big)\right\rangle_{\mathcal{F}_t}.$$

Note that under UCB-NCS, we have $u(t)=K_{\theta(t)}(\ell(t))\,x(t)$, so that $|u(t)|\le K_{\max}|x(t)|$. Let

$$K_{\max}:=\sup_{\theta\in\Theta,\,\ell\in\{0,1\}}\big|K_{\theta}(\ell)\big|,\qquad P_{\max}:=\sup_{\theta\in\Theta,\,\ell\in\{0,1\}}P_{\theta}(\ell).\qquad (21)$$

After performing simple algebraic manipulations, we can show that

$$T_5\le P_{\max}\sum_{t=1}^{T-1}\Big|\big(A_{\theta(t)}x(t)+B_{\theta(t)}u(t)\big)^{2}-\big(A_{\star}x(t)+B_{\star}u(t)\big)^{2}\Big|\le P_{\max}\,T_7^{1/2}\,T_8^{1/2},\qquad (22)$$

where

$$T_7:=\sum_{t=1}^{T-1}\Big|A_{\theta(t)}x(t)-A_{\star}x(t)+B_{\theta(t)}u(t)-B_{\star}u(t)\Big|^{2},\qquad T_8:=\sum_{t=1}^{T}\Big|A_{\theta(t)}x(t)+B_{\theta(t)}u(t)+A_{\star}x(t)+B_{\star}u(t)\Big|^{2},$$

and the last inequality in (22) follows from the Cauchy-Schwarz inequality. The terms $T_7,T_8$ are bounded in Lemma 10 and Lemma 11 in the Appendix. We substitute these bounds in (22) and obtain the following result.

###### Lemma 8

On $E\cap H\cap J$, we have

$$T_5\le C_1\sqrt{T\log\big(V_1(T)/\lambda\big)}\,\big(\gamma_1(\delta,T)+\gamma_2(\delta,T)+2\lambda^{1/2}\big)\times\sqrt{h(\delta,T)}\,g^{3/2}(\delta,T),\ \text{where},\qquad (23)$$
$$C_1:=2\sqrt{2}\,P_{\max}\big(1+K_{\max}\big)G_{cl,\max}/\sqrt{\lambda}.\qquad (24)$$

It remains to bound in order to bound . This is done in Lemma 12 of Appendix.

###### Lemma 9

Let

$$U_2:=C_1\sqrt{T\log\big(V_1(T)/\lambda\big)}\,\big(\gamma_1(\delta,T)+\gamma_2(\delta,T)+2\lambda^{1/2}\big)\times\sqrt{h(\delta,T)}\,g^{3/2}(\delta,T)+P_{\max}\big(G^{2}_{cl,\max}\,g(\delta,T)+\sigma^{2}\big)\sqrt{\alpha T\log T}.$$

On $E\cap H\cap J$, we have $R_2\le U_2$.

This follows by substituting the bounds on $T_5$ and $T_6$ from Lemma 8 and Lemma 12 into (20).

## VIII Main Result

###### Theorem 1 (Bound on Regret)

Consider the NCS operating under UCB-NCS described in Algorithm 1. Under Assumption 1, we have $R(T)\le U_1+U_2$ with probability at least $1-\left(5\delta+T^{2}/L^{\alpha}\right)$. The terms $U_1,U_2$ are defined in (19) and Lemma 9, respectively. Upon ignoring terms and factors that are polylogarithmic in $T$, this bound simplifies to

$$\tilde{O}\left(\sqrt{T}\left(\log^{1/4}(1/\delta)+\sqrt{\alpha}\,P_{\max}G^{2}_{cl,\max}\log^{1/2}(1/\delta)+C_1\right)\right).$$

It follows from Lemma 5 that $R(T)\le R_1+R_2$ on $E$. The proof then follows by substituting the upper-bounds from Lemma 7 and Lemma 9, and using the union bound to lower-bound the probability of the event $E\cap G\cap H\cap J$.

## IX Conclusion and Future Work

We propose UCB-NCS, an adaptive control law, or learning rule, for NCS, and provide its finite-time performance guarantees. We show that with a high probability, its regret scales as $\tilde{O}(\sqrt{T})$ up to constant factors. We identify a certain quantity $\eta$ which we call the margin of stability of the NCS. Regret increases as the margin becomes smaller, which indicates a low network quality.

Results in this work can be extended in various directions. So far we have considered only scalar systems; a natural extension is to the case of vector systems. Another direction is to derive lower bounds on the expected regret that can be achieved under any admissible control policy.

###### Lemma 10 (Bounding T7)

On $E\cap H\cap J$, we have

$$T_7\le(1+K_{\max})^{2}\big(\gamma_1(\delta,T)+\gamma_2(\delta,T)+2\lambda^{1/2}\big)^{2}\times 2\left\{2\sqrt{h(\delta,T)}\right\}^{2}\cdot g^{2}(\delta,T)\cdot\frac{1}{\lambda}\log\left(\frac{V_1(T)}{\lambda}\right).$$

Let $\tau$ be the time step at which the latest episode begins. Since under UCB-NCS we have $\theta(t)=\theta(\tau)\in\mathcal{C}(\tau)$, it can be shown that

$$\Big|\big(A_{\theta(t)}x(t)-A_{\star}x(t)\big)+\big(B_{\theta(t)}u(t)-B_{\star}u(t)\big)\Big|\le\Big|A_{\theta(t)}-A_{\star}\Big|\,|x(t)|+\Big|B_{\theta(t)}-B_{\star}\Big|\,K_{\max}|x(t)|.\qquad (25)$$

Now consider the following inequality,

$$\Big|A_{\theta(t)}-A_{\star}\Big|\le\Big|A_{\theta(t)}-\hat{A}(t)\Big|+\Big|\hat{A}(t)-A_{\star}\Big|.\qquad (26)$$

For $A\in\mathcal{C}_1(\tau)$, we have,

$$\big|A-\hat{A}(\tau)\big|\,|x(t)|\le\sqrt{V_1(\tau)}\,\big|A-\hat{A}(\tau)\big|\,\frac{|x(t)|}{\sqrt{V_1(t)}}\sqrt{h(\delta,T)}\le\big(\gamma_1(\delta,\tau)+\lambda^{1/2}\big)\frac{|x(t)|}{\sqrt{V_1(t)}}\sqrt{h(\delta,T)},\qquad (27)$$

where the first inequality follows from Lemma 14, and the second inequality follows from the size of the confidence intervals (5). On $E$, we have $A_{\theta(t)}\in\mathcal{C}_1(\tau)$, and also $A_{\star}\in\mathcal{C}_1(\tau)$; so we use inequality (27) with $A$ set equal to each of $A_{\theta(t)}$ and $A_{\star}$, and combine the resulting inequalities with (26) in order to obtain the following,

$$\Big|A_{\theta(t)}-A_{\star}\Big|\,|x(t)|\le 2\sqrt{h(\delta,T)}\,\big(\gamma_1(\delta,t)+\lambda^{1/2}\big)\frac{|x(t)|}{\sqrt{V_1(t)}}.\qquad (28)$$

A similar bound can be obtained for $\big|B_{\theta(t)}-B_{\star}\big|\,|u(t)|$ also. The remaining proof consists of substituting these bounds into (25) and performing algebraic manipulations. We also utilize Lemma 10 of [2] in order to bound the resulting summation.

###### Lemma 11 (Bounding T8)

On $H\cap J$, we have

$$T_8\le G^{2}_{cl,\max}\,T\,g(\delta,T),$$

where

$$G_{cl,\max}:=\sup_{\theta\in\Theta,\,\ell\in\{0,1\}}\max\Big\{\big|A_{\theta}+B_{\theta}K_{\theta}(\ell)\big|,\ \big|A_{\star}+B_{\star}K_{\theta}(\ell)\big|\Big\}.$$

Follows from Lemma 4.

###### Lemma 12

On $E\cap H\cap J$ we have

$$T_6\le P_{\max}\big(G^{2}_{cl,\max}\,g(\delta,T)+\sigma^{2}\big)\sqrt{\alpha T\log T}.$$

We have

$$T_6\le\sum_{t=1}^{T-1}\big|p_{\theta(t)}-p_{\star}\big|\,P_{\max}\big(G^{2}_{cl,\max}\,x^{2}(t)+\sigma^{2}\big)\le P_{\max}\Big(G^{2}_{cl,\max}\max_{t\in[1,T]}x^{2}(t)+\sigma^{2}\Big)\sum_{t=1}^{T-1}\big|p_{\theta(t)}-p_{\star}\big|,$$

and the claim follows by bounding $\max_{t}x^{2}(t)$ via Lemma 4, and bounding $\sum_{t}\big|p_{\theta(t)}-p_{\star}\big|$ by $\sqrt{\alpha T\log T}$ using the confidence intervals (6) for $p_{\star}$.