Identification and Adaptation with Binary-Valued Observations under Non-Persistent Excitation Condition

Dynamical systems with binary-valued observations are widely used in the information industry, biological pharmacy, and other fields. Although much effort has been devoted to the identification of such systems, most previous investigations are based on first-order gradient algorithms, which usually converge much more slowly than Quasi-Newton algorithms. Moreover, in the existing literature, persistence of excitation (PE) conditions are usually required to guarantee consistent parameter estimates, and such conditions are hard to verify or guarantee for feedback control systems. In this paper, we propose an online projected Quasi-Newton type algorithm for parameter estimation of stochastic regression models with binary-valued observations and varying thresholds. By using both stochastic Lyapunov functions and martingale estimation methods, we establish the strong consistency of the estimation algorithm and provide its convergence rate, under a signal condition that is considerably weaker than the traditional PE condition and coincides with the weakest possible excitation known for the classical least-squares algorithm of stochastic regression models. Convergence of adaptive predictors and their applications in adaptive control are also discussed.

11/04/2021

1 Introduction

1.1 Background

The purpose of this paper is to study parameter estimation and adaptation for stochastic systems, in which the system output cannot be measured accurately, and the only available information is whether or not the output belongs to some set (Wang et al. (2003)). Specifically, consider the following standard stochastic regression model:

 yk+1=ϕTkθ+vk+1,k=0,1,2,… (1)

where y_{k+1} ∈ ℝ, v_{k+1} ∈ ℝ and ϕ_k ∈ ℝ^p represent the system output, random noise and regression vector, respectively, and θ ∈ ℝ^p is an unknown parameter vector to be estimated. The system output can only be observed with binary-valued measurements:

 sk+1=I(yk+1≥ck)={1,yk+1≥ck;0,otherwise, (2)

where {c_k} denotes a given threshold sequence, and I(·) is the indicator function.
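To make the setup concrete, the following short simulation sketches model (1)-(2) for a hypothetical two-parameter system with standard Gaussian noise and a zero threshold; all numerical choices here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical instance of model (1)-(2): p = 2 unknown parameters,
# bounded i.i.d. regressors, N(0,1) noise, and fixed thresholds c_k = 0.
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5])              # unknown parameter vector theta
n = 1000
phi = rng.uniform(-1.0, 1.0, size=(n, 2))  # regression vectors phi_k
v = rng.normal(size=n)                     # noise v_{k+1}
c = np.zeros(n)                            # threshold sequence c_k

y = phi @ theta + v                        # (1): y_{k+1} = phi_k^T theta + v_{k+1}
s = (y >= c).astype(int)                   # (2): binary observation s_{k+1}
```

Only the binary sequence s (together with phi and c) is available to the estimator; the real-valued outputs y are never observed.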

Observations of this type arise widely in practical systems. One example comes from neuron systems (Ghysen (2003)): instead of measuring the exact internal potential, the systems only provide information about states (excitation or inhibition). When the potential is smaller than the potential threshold, the neuron is in the inhibition state; otherwise it is in the excitation state. The objective of classification via neural networks is to learn the system parameters based on the states of neurons only. Another example comes from sensor networks (Zhang et al. (2019)): in set-valued sensor networks, the information from each sensor is a quantized observation with a finite number of bits, or even 1 bit. Specifically, each sensor only provides information on whether the measured value is larger than a designed threshold or not. Usually, such sensors are more cost effective than regular sensors. Besides, there are numerous other examples, such as ATM ABR traffic control, gas content sensors in the gas and oil industry, and switching sensors for shift-by-wire in automotive applications (Wang et al. (2003)).

1.2 Related works

Due to the wide use of systems with quantized observations, a number of basic problems concerning identification and control emerge, which need theoretical investigation. In fact, the estimation of quantized-output systems has generated a vast literature in recent years. Wang et al. (2003); Wang and Yin (2007); Wang et al. (2006); Zhao et al. (2007) gave strongly consistent identification algorithms under periodic signals. Jafari et al. (2012) proposed a recursive identification algorithm for FIR systems with binary-valued observations, and proved its convergence under a mixing property of the signals. Later, Guo and Zhao (2013) proposed a recursive projection algorithm for FIR systems with binary-valued observations and fixed thresholds, and established its convergence rate under a strongly persistent excitation condition. Besides, Marelli et al. (2013) considered ARMA systems with a time-varying K-level scalar quantizer and random packet dropouts, and gave consistency results under independent and identically distributed (i.i.d.) conditions. An adaptive quantizer is considered in You (2015) for FIR systems, provided that the signals satisfy i.i.d. conditions. Moreover, under some persistent excitation conditions, Zhao et al. (2016) introduced an EM-type algorithm, which is robust and easy to implement, and proved that the maximum likelihood criterion can be achieved as the number of iterations goes to infinity. Song (2018) presented a strongly consistent estimate and obtained the convergence rate for ARMA systems with binary sensors and unknown threshold under i.i.d. Gaussian inputs. Zhang et al. (2019) considered a Quasi-Newton type algorithm under a persistent excitation (PE) condition on the minimal eigenvalue of the information matrix. Numerical simulations in Zhang et al. (2019) demonstrated that their Quasi-Newton type algorithm has equivalent convergence properties for first-order FIR systems and high-order systems, and strong consistency and asymptotic efficiency for first-order FIR systems are also established there. For other methods, we refer the readers to Bottegal et al. (2017) and Wang and Zhang (2014) for kernel-based and quadratic-programming-based methods, among others.

However, almost all of the existing investigations on identification suffer from some fundamental limitations.

Firstly, for systems with regular output sensors, i.e., when the output y_{k+1} itself can be observed, substantial progress has been made in the area of adaptive estimation and control (e.g., Chen and Guo (1991)), and the excitation condition for consistency of parameter estimates need not be persistent. For example, it is widely known that the weakest possible excitation condition for strong consistency of the least-squares estimate for stochastic regression models is the following (Lai and Wei (1982)):

 λ_min(n)/log λ_max(n) → ∞, a.s., as n → ∞, (3)

where λ_max(n) and λ_min(n) denote the maximum and minimum eigenvalues of the information matrix ∑_{i=0}^n ϕ_i ϕ_i^τ. This is actually a decaying excitation condition: it is much weaker than the classical PE condition, and can be applied in adaptive feedback control (see, e.g., Chen and Guo (1991)). However, as mentioned above, for the identification of systems with binary-valued sensors, almost all of the existing literature requires PE conditions on the signals for strong consistency, and in fact, most require i.i.d. or periodic signal assumptions. Though these conditions may be satisfied in some open-loop or off-line identification settings, they are much more difficult to satisfy or verify for closed-loop system identification, since the input and output data of such systems are generally determined by nonlinear stochastic dynamic equations (Chen and Guo (1991)). Consequently, the problem of whether or not the PE condition can be essentially weakened, e.g. to (3), for the identification of stochastic systems with binary-valued sensors remains open.
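As a quick numerical illustration of condition (3), consider an assumed setting of bounded i.i.d. regressors (chosen here only for demonstration): the minimum eigenvalue of the information matrix grows linearly in n, while the logarithm of the maximum eigenvalue grows only logarithmically, so the Lai-Wei ratio tends to zero without any careful excitation design.

```python
import numpy as np

# Check the decaying-excitation ratio log(lambda_max)/lambda_min for the
# information matrix sum_k phi_k phi_k^T with bounded i.i.d. regressors.
rng = np.random.default_rng(1)
n = 5000
phi = rng.uniform(-1.0, 1.0, size=(n, 3))
G = phi.T @ phi                      # sum of phi_k phi_k^T
lam = np.linalg.eigvalsh(G)          # eigenvalues in ascending order
ratio = np.log(lam[-1]) / lam[0]     # Lai-Wei ratio; small when (3) holds
```

Here lam[0] is of order n while np.log(lam[-1]) is of order log n, so ratio is already tiny at n = 5000.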

Secondly, to the best of the authors’ knowledge, almost all of the existing estimation algorithms for stochastic systems with binary-valued observations and given thresholds are designed with first-order gradients. Such algorithms take the same step size for each coordinate, which may simplify the convergence analysis, but sacrifices the convergence rate (Ljung and Söderström (1983)). To improve the convergence properties, it is necessary to consider estimation algorithms whose adaptation gain is a matrix (e.g., the Hessian matrix or its modifications), rather than a scalar.

Thirdly, there are only a few results on adaptive control with binary-valued observations in the existing literature (c.f.,e.g., Guo et al. (2011), Zhao et al. (2013)), where some kinds of FIR control systems are considered and consistency of parameter estimates is needed for the optimality of adaptive control systems.

The goal of this paper is to show that the above mentioned limitations can be considerably relaxed or removed.

1.3 Contributions

Inspired by the method in Guo (1995), this paper proposes a new recursive projected Quasi-Newton type algorithm, which can be viewed as a natural extension of the classical linear least-squares algorithm with a projection operator. The main contributions of this paper can be summarized as follows:

• We propose a projected recursive Quasi-Newton type algorithm for stochastic regression systems with binary-valued observations. In the area of identification with binary-valued observations and given fixed thresholds, this paper appears to be the first to establish almost sure convergence for Quasi-Newton type estimation algorithms where the adaptation gains are matrices.

• The weakest possible excitation condition known for strong consistency of the classical least-squares algorithm, is proven to be sufficient for strong consistency of the proposed new estimation algorithm in the current binary-valued observation case. This appears to be the first time to achieve such a strong result in the literature of system identification with binary-valued observations.

• We also obtain a result on the asymptotic order of the accumulated regret of adaptive prediction, namely ∑_{k=0}^n R_k = O(log n), a.s., which does not need any excitation condition and can be conveniently used in adaptive control to give better results than the existing ones in the literature.

The remainder of this paper is organized as follows. In Section 2, we give the main results of this paper, including the assumptions, proposed algorithms and main theorems; Section 3 presents the proofs of the main results together with some key lemmas. Some numerical examples are provided in Section 4. Finally, we conclude the paper with some remarks.

2 The main results

Consider the stochastic regression model (1)-(2) with binary-valued observations. The objectives of this paper are to propose a strongly consistent estimator for the unknown parameter vector under a non-PE condition, and to give an asymptotically optimal adaptive predictor for the regular output, together with its applications in adaptive tracking.

2.1 Notations and assumptions

For our purpose, we introduce some notations and assumptions first.

Notations. By ∥·∥, we denote the Euclidean norm of vectors or matrices. The spectrum of a symmetric matrix M is denoted by {λ_j(M)}, where the maximum and minimum eigenvalues are denoted by λ_max(M) and λ_min(M), respectively. Moreover, by det(M) or |M| we mean the determinant of the matrix M.

Assumption 1.

Let {F_k, k ≥ 0} be a non-decreasing sequence of σ-algebras such that ϕ_k is F_k-measurable, with a known upper bound M:

 sup_{k≥1} ∥ϕ_k∥ = M < ∞, a.s. (4)

where M may be a random variable.

Assumption 2.

The true parameter θ belongs to a bounded convex set D, and we denote

 sup_{x∈D} ∥x∥ = L < ∞, a.s. (5)
Assumption 3.

The given threshold sequence {c_k} is an adapted sequence, with a known upper bound C:

 sup_{k≥0} |c_k| = C < ∞, a.s. (6)

where C may be a random variable.

Assumption 4.

The noise v_{k+1} is integrable and F_{k+1}-measurable. For any k ≥ 0, the conditional probability density function of v_{k+1} given F_k, denoted by f_{k+1}(·), is known and satisfies

 inf_{|x|≤LM+C} {f_{k+1}(x)} > 0, k=0,1,⋯, a.s. (7)

where L, M and C are defined by (5), (4) and (6), respectively.

Remark 1.

It can be easily seen that if the threshold c_k is fixed, then Assumption 3 is satisfied automatically. Moreover, if the noise is independent of the σ-algebras {F_k} and identically normally distributed, as assumed previously (see, e.g., Guo and Zhao (2013), Zhang et al. (2019)), then the condition in Assumption 4 will be satisfied.

2.2 Recursive algorithm and adaptive predictor

To construct a Quasi-Newton type identification algorithm, we need to introduce a suitable projection operator as follows.

Definition 1.

For the linear space ℝ^p, the weighted norm ∥·∥_Q associated with a positive definite matrix Q is defined by

 ∥x∥_Q = √(x^τ Q x), ∀x ∈ ℝ^p. (8)
Definition 2.

For a given convex compact set Ω ⊂ ℝ^p and a positive definite matrix Q, the projection operator Π_Q is defined as

 Π_Q(x) = argmin_{ω∈Ω} ∥x−ω∥_Q, ∀x ∈ ℝ^p. (9)
Remark 2.

The well-posedness of Π_Q is ensured by the positive definiteness of the matrix Q and the convexity and compactness of Ω (Cheney, 2001).
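For intuition, the projection (9) has no closed form for a general Ω, but for the special case Ω = {ω : ∥ω∥ ≤ L} it reduces, via the KKT conditions, to ω = (Q + μI)^{-1}Qx with a scalar multiplier μ ≥ 0 that can be found by bisection. The sketch below assumes this ball-shaped Ω and is illustrative only.

```python
import numpy as np

def project_ball(x, Q, L, tol=1e-10):
    """Pi_Q(x) for Omega = {w : ||w|| <= L}: minimize (x-w)^T Q (x-w).

    KKT: Q(w - x) + mu*w = 0  =>  w(mu) = (Q + mu*I)^{-1} Q x,
    and ||w(mu)|| is decreasing in mu, so bisection finds mu >= 0
    with ||w(mu)|| = L whenever x lies outside the ball.
    """
    if np.linalg.norm(x) <= L:
        return x.copy()                  # x is already in Omega
    I = np.eye(len(x))
    lo, hi = 0.0, 1.0
    while np.linalg.norm(np.linalg.solve(Q + hi * I, Q @ x)) > L:
        hi *= 2.0                        # grow hi until w(hi) is inside the ball
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(np.linalg.solve(Q + mid * I, Q @ x)) > L:
            lo = mid
        else:
            hi = mid
    return np.linalg.solve(Q + hi * I, Q @ x)

Q = np.array([[2.0, 0.3], [0.3, 1.0]])   # positive definite weight matrix
x = np.array([3.0, 4.0])
w = project_ball(x, Q, L=1.0)
```

Note that the minimizer generally differs from simple radial rescaling x·L/∥x∥ unless Q is a multiple of the identity.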

Our recursive identification algorithm is a kind of Quasi-Newton algorithm, defined as follows:

 ^θk+1=ΠP−1k+1{^θk+akβkPkϕkek+1}, (10)
 Pk+1=Pk−β2kakPkϕkϕτkPk, (11)
 ek+1=sk+1−1+Fk+1(ck−ϕτk^θk), (12)
 ak=11+β2kϕτkPkϕk, (13)
 0<βk+1≤min{βk,inf|x|≤LM+Cfk+2(x)}, (14)

where ^θ_k is the estimate of θ at time k; Π_{P_{k+1}^{-1}} is the projection operator defined as in Definition 2, with Q = P_{k+1}^{-1} and Ω = D; F_{k+1}(·) is the conditional probability distribution function of v_{k+1} given the σ-algebra F_k; the initial value ^θ_0 can be chosen arbitrarily in D, where D is given in Assumption 2; β_0 can be chosen arbitrarily from the interval (0, inf_{|x|≤LM+C} f_1(x)]; and the initial matrix P_0 > 0 can also be chosen arbitrarily.

Note that by (11) and the well-known matrix inversion formula (see, e.g., Guo (2020), Theorem 1.1.17), the inverse of P_{k+1} can be recursively rewritten as

 P_{k+1}^{-1} = P_k^{-1} + β_k² ϕ_k ϕ_k^τ, k=0,1,⋯. (15)

Thus, P_{k+1}^{-1} is positive definite since the initial matrix P_0 > 0, which ensures the well-posedness of the projection operator in Algorithm 2.2.
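Putting the pieces together, the following sketch runs the recursion (10)-(14), with (15) implicit, on a hypothetical first-order example: standard Gaussian noise (so F_{k+1} is the standard normal CDF), fixed threshold c_k = 0, constant β_k, and D a Euclidean ball. For brevity, the exact weighted projection is replaced here by simple radial rescaling onto the ball; this is a stated simplification for illustration, not the operator of Definition 2.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def norm_cdf(x):
    # standard normal CDF, written via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(2)
theta = np.array([1.0, -0.5])            # true parameter (unknown to estimator)
p, n = 2, 20000
L_BOUND = 2.0                            # radius of D, with ||theta|| < L_BOUND
M = sqrt(p)                              # bound on ||phi_k|| for uniform(-1,1) entries
# lower bound of the N(0,1) density on |x| <= L*M + C (C = 0), used as beta_k
beta = exp(-(L_BOUND * M) ** 2 / 2.0) / sqrt(2.0 * pi)

theta_hat = np.zeros(p)
P = 100.0 * np.eye(p)                    # P_0 > 0
for k in range(n):
    phi = rng.uniform(-1.0, 1.0, size=p)
    y = phi @ theta + rng.normal()
    s = 1.0 if y >= 0.0 else 0.0                      # binary observation, c_k = 0
    a = 1.0 / (1.0 + beta**2 * phi @ P @ phi)         # (13)
    e = s - 1.0 + norm_cdf(-phi @ theta_hat)          # (12), F_{k+1} = normal CDF
    theta_hat = theta_hat + a * beta * (P @ phi) * e  # (10), before projection
    if np.linalg.norm(theta_hat) > L_BOUND:           # crude stand-in for (9)
        theta_hat *= L_BOUND / np.linalg.norm(theta_hat)
    P = P - beta**2 * a * np.outer(P @ phi, phi) @ P  # (11)

err = np.linalg.norm(theta_hat - theta)
```

Despite observing only one bit per step, the estimate drifts toward θ, since the conditional mean of e is F(ϕ^τθ) − F(ϕ^τ^θ), which has the sign of the prediction error.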

Moreover, since both ^θ_k and ϕ_k are F_k-measurable, we have

 E(y_{k+1}∣F_k) = θ^τ ϕ_k + E(v_{k+1}∣F_k), (16)

which is the best prediction of y_{k+1} given F_k in the mean square sense. Note that E(v_{k+1}∣F_k) can be obtained from the known conditional probability density function f_{k+1}(·) in Assumption 4. Replacing the unknown parameter θ in (16) by its estimate ^θ_k, we obtain a natural adaptive predictor of y_{k+1} as follows:

 ^y_{k+1} = ^θ_k^τ ϕ_k + E(v_{k+1}∣F_k). (17)

The difference between the best prediction and the adaptive prediction can be regarded as a regret, denoted R_k, i.e.,

 R_k = [E(y_{k+1}∣F_k) − ^y_{k+1}]² = (~θ_k^τ ϕ_k)², (18)

where ~θ_k = θ − ^θ_k. One may naturally expect this regret to be small in some sense, which will be useful in adaptive control. Details will be discussed in the subsequent section.

Throughout the sequel, for convenience, let us introduce the following notations:

 γ_k = 1/β_k, (19)
 ω_{k+1} = s_{k+1} − 1 + F_{k+1}(c_k − θ^τ ϕ_k), (20)
 ψ_k = F_{k+1}(c_k − ^θ_k^τ ϕ_k) − F_{k+1}(c_k − θ^τ ϕ_k). (21)

2.3 Global convergence results

The following three theorems are the main results of this paper. Without requiring any excitation condition, we establish asymptotic upper bounds for the parameter estimation error, the accumulated regret of adaptive prediction, and the tracking error of adaptive control.

Theorem 1.

Under Assumptions 1-4, the estimation error produced by the estimation Algorithm 2.2 has the following upper bound:

 ∥∥~θn+1∥∥2= O(log(λmax{P−1n+1})λmin{P−1n+1}),a.s. (22)

where ~θ_{n+1} = θ − ^θ_{n+1}.

The detailed proof of Theorem 1 is supplied in the next section.

Corollary 1.

Let the conditions of Theorem 1 hold, and let the conditional probability density function of the noise sequence have a uniformly positive lower bound:

 inf|x|≤LM+C,k≥0{fk(x)}>0,a.s. (23)

Then

 ∥∥~θn+1∥∥2=O(lognλmin{P−10+∑ni=1ϕiϕτi}),a.s. (24)
Remark 3.

Let the noise {v_k} be independent of the σ-algebras {F_k}, and normally distributed with zero mean and variance σ_k², k ≥ 0. Then condition (23) will be satisfied if {σ_k} has a finite upper bound and a positive lower bound.

Remark 4.

From (24), we know that if

 λ_min{P_0^{-1} + ∑_{i=1}^n ϕ_i ϕ_i^τ}/log n → ∞ (25)

as n → ∞, then the estimates given by Algorithm 2.2 will be strongly consistent, i.e., ^θ_n → θ, a.s. The condition (25) is much weaker than the traditional persistent excitation condition, which requires that λ_min{∑_{i=1}^n ϕ_i ϕ_i^τ} grow at the same rate as n. Also, the condition (25) coincides with the Lai-Wei excitation condition (3) for the classical least-squares algorithm with regular output sensors.

Theorem 2.

Consider the estimation Algorithm 2.2 under Assumptions 1-4. The sample paths of the accumulated regrets will have the following upper bound:

 n∑k=0Rk=O(γ2nlog|P−1n+1|),a.s. (26)

where γ_n and R_k are defined by (19) and (18), respectively.

The proof of Theorem 2 is given in Section 3.

According to Theorem 2, one can directly deduce the following corollary.

Corollary 2.

Let the conditions of Theorem 2 hold, and let f_{k+1}(·) be the conditional probability density function of the noise sequence as defined in Assumption 4. Then we have the following two basic results on the accumulated regret of adaptive prediction:

• If f_{k+1}(·) has a uniformly positive lower bound, i.e.,

 inf|x|≤LM+C,k≥0{fk(x)}>0,a.s. (27)

then

 n∑k=0Rk=O(logn),a.s. (28)
• If f_{k+1}(·) does not have a uniformly positive lower bound but satisfies

 √logkk=o(inf|x|≤LM+C{fk(x)}),a.s. (29)

then

 n∑k=0Rk=o(n),a.s. (30)
Remark 5.

Let the noise sequence {v_k} be independent and normally distributed with zero mean and variance σ_k². Then condition (27) will be satisfied if {σ_k} has both upper and positive lower bounds; condition (29) will be satisfied if, e.g., σ_k is bounded below by a positive constant and σ_k = O(k^δ) for some δ ∈ (0, 1/2).

Remark 6.

The result (28) in Corollary 2 is similar to the corresponding result for the classical LS algorithm for linear stochastic regression models with regular sensors, where the order O(log n) for the accumulated regret is the best possible among all adaptive predictors (see Lai (1986)).

As in the regular observation case (see Guo (1995)), an important application of Theorem 2 is in adaptive control of stochastic systems with binary-valued observations, as stated in the following theorem:

Theorem 3.

Let the conditions of Theorem 2 hold, and let the conditional probability density functions satisfy (27) and

 sup_k E[|v_k|^α ∣ F_{k−1}] < ∞, a.s., (31)

for some α > 2. If the regression vector ϕ_k can be influenced by an input signal u_k, such that for a given bounded sequence of reference signals {y*_{k+1}}, the following equation can be satisfied by choosing u_k:

 ^θ_k^τ ϕ_k + E(v_{k+1}∣F_k) = y*_{k+1}, (32)

then the averaged tracking error J_n, defined by

 Jn=1nn−1∑k=0(yk+1−y∗k+1)2, (33)

will approach its minimum value with the following best possible almost sure convergence rate:

 (34)


The detailed proof of Theorem 3 is given in the next section.

3 Proofs of the main results

To prove the main results, we first introduce several lemmas.

Lemma 1.

(Cheney, 2001). The projection operator given by Definition 2 satisfies

 ∥Π_Q(x) − Π_Q(y)∥_Q ≤ ∥x − y∥_Q, ∀x, y ∈ ℝ^p. (35)
Lemma 2.

(Chen and Guo, 1991). Let {ω_n, F_n} be a martingale difference sequence and {f_n, F_n} an adapted sequence. If

 sup_n E[|ω_{n+1}|^α ∣ F_n] < ∞, a.s. (36)

for some α ∈ (0, 2], then as n → ∞:

 ∑_{i=0}^n f_i ω_{i+1} = O(s_n(α) log^{1/α+η}(s_n^α(α) + e)), a.s., ∀η > 0, (37)

where

 s_n(α) = (∑_{i=0}^n |f_i|^α)^{1/α}. (38)
Lemma 3.

(Lai and Wei, 1982). Let {X_k} be a sequence of vectors in ℝ^p and let A_n = A_0 + ∑_{k=0}^n X_k X_k^τ. Let |A_n| denote the determinant of A_n. Assume that A_0 is nonsingular; then, as n → ∞,

 ∑_{k=0}^n (X_k^τ A_k^{-1} X_k)/(1 + X_k^τ A_k^{-1} X_k) = O(log |A_n|). (39)
Lemma 4.

(Guo, 1995). Let {X_k} be any bounded sequence of vectors in ℝ^p. Denote A_k = A_0 + ∑_{i=0}^k X_i X_i^τ with A_0 > 0; then we have

 ∑_{k=0}^∞ (X_k^τ A_k^{-1} X_k)² < ∞. (40)

Finally, the proofs of Theorems 1-3 will immediately follow from the following Lemma 5, which can be proven by using Lemmas 1-4.

Lemma 5.

Let Assumptions 1-4 be satisfied. Then the parameter estimate given by Algorithm 2.2 has the following property as n → ∞:

 ~θτn+1P−1n+1~θn+1+β2nn∑k=0(~θτkϕk)2=O(log|P−1n+1|). (41)

Proof. By Assumptions 1-4 and (10)-(14), ψ_k is F_k-measurable and satisfies

 |ψ_k| ≤ 1, k=0,1,⋯. (42)

Moreover, by (2) and (20),

 E(ω_{k+1}∣F_k) = 0, (43)

which means that {ω_{k+1}, F_{k+1}} is a martingale difference sequence.

Following the ideas used in the analysis of the classical least-squares algorithm for linear stochastic regression models (see, e.g., Moore (1978), Lai and Wei (1982), Guo (1995)), we consider the following stochastic Lyapunov function:

 Vk=~θτkP−1k~θk.

By Lemma 1 and Algorithm 2.2, we know that

 Vk+1=~θτk+1P−1k+1~θk+1 (44) ≤ {~θk−akPkβkϕk[sk+1−1+Fk+1(ck−ϕτk^θk)]}τP−1k+1⋅ {~θk−akPkβkϕk[sk+1−1+Fk+1(ck−ϕτk^θk)]} = {~θk−akPkβkϕk[ψk+ωk+1]}τP−1k+1⋅ {~θk−akPkβkϕk[ψk+ωk+1]} = ~θτkP−1k+1~θk−2akβk~θτkP−1k+1Pkϕkψk +a2kβ2kψ2kϕτkPkP−1k+1Pkϕk +2a2kβ2kψkϕτkPkP−1k+1Pkϕkωk+1 −2akβkϕτkPkP−1k+1~θkωk+1 +a2kβ2kϕτkPkP−1k+1Pkϕkω2k+1.

Let us now analyze the right-hand side (RHS) of (44) term by term. From (15), we know that

 ~θτkP−1k+1~θk=~θτkP−1k~θk+β2k(~θτkϕk)2. (45)

Moreover, by (15) again, we know that

 akP−1k+1Pkϕk (46) = ak(I+β2kϕkϕτkPk)ϕk = akϕk(1+β2kϕτkPkϕk) = ϕk.

Hence, we have

 2akβk~θτkP−1k+1Pkϕkψk (47) = 2βk~θτkϕkψk=2βk|~θτkϕk|⋅|ψk| ≥ 2β2k(~θτkϕk)2,

where the second equality holds since F_{k+1}(·) is non-decreasing, so that ψ_k and ~θ_k^τϕ_k share the same sign, and the last inequality holds by (14) and

 |ψ_k| = |∫_{c_k−θ^τϕ_k}^{c_k−^θ_k^τϕ_k} f_{k+1}(x)dx| ≥ β_k|~θ_k^τϕ_k|. (48)

Similarly, by (46),

 a2kβ2kψ2kϕτkPkP−1k+1Pkϕk (49) = akβ2kψ2kϕτkPkϕk≤akβ2kϕτkPkϕk

where we have used the fact that |ψ_k| ≤ 1.

Now, substituting (45), (47) and (49) into (44), we get

 Vk+1≤ ~θτkP−1k~θk−β2k(~θτkϕk)2+akβ2kϕτkPkϕk (50) +2a2kβ2kψkϕτkPkP−1k+1Pkϕkωk+1 −2akβkϕτkPkP−1k+1~θkωk+1 +a2kβ2kϕτkPkP−1k+1Pkϕkω2k+1,

Summing up both sides of (50) from 0 to n, we have

 Vn+1≤ ~θτ0P−10~θ0+n∑k=0akβ2kϕτkPkϕk (51) −β2kn∑k=0(~θτkϕk)2 +2n∑k=0a2kβ2kψkϕτkPkP−1k+1Pkϕkωk+1 −2n∑k=0akβkϕτkPkP−1k+1~θkωk+1 +n∑k=0a2kβ2kϕτkPkP−1k+1Pkϕkω2k+1

We now analyze the last three terms in (51), which are related to the martingale difference sequence {ω_{k+1}}.

Letting X_k = β_kϕ_k and A_k = P_{k+1}^{-1} in Lemma 3 and Lemma 4, we get

 n∑k=0akβ2kϕτkPkϕk=O(log∣∣P−1n+1∣∣), (52)
 n∑k=0(β2kϕτkPkϕk)2=O(1), (53)

respectively. Moreover, since |ω_{k+1}| ≤ 1, we have

 supkE[|ωk+1|2∣Fk]<∞,a.s. (54)

Denote

 ~Sn= ⎷n∑k=0(akβ2kψkϕτkPkϕk)2. (55)

By (53) and the boundedness of {ψ_k} and {a_k}, we know that ~S_n = O(1), a.s. Consequently, by (54) and Lemma 2, we have

 n∑k=0a2kβ2kψkϕτkPkP−1k+1Pkϕkωk+1 (56) = n∑k=0akβ2kψkϕτkPkϕkωk+1 = O(~Snlog12+η(~S2n+e))=O(1), a.s. ∀η>0.

Also, by Lemma 2 and (46) again, we know that

 n∑k=0akβkϕτkPkP−1k+1~θkωk+1 (57) = n∑k=0βkϕτk~θkωk+1 = O(n∑k=0(βk~θτkϕk)2)12+η = o(n∑k=0(βk~θτkϕk)2)+O(1) a.s. ∀η>0

As for the last term on the right side of (51), since |ω_{k+1}| ≤ 1, we have

 supkE[∣∣ω2k+1−E[ω2k+1∣Fk]∣∣2∣Fk]≤1, a.s. , (58)

Denote Λ_n = √(∑_{k=0}^n (a_kβ_k²ϕ_k^τP_kϕ_k)²). Then by Lemma 2 (with α = 2), we get

 ∑_{k=0}^n a_k²β_k²ϕ_k^τP_kP_{k+1}^{-1}P_kϕ_k{ω_{k+1}² − E[ω_{k+1}²∣F_k]} (59)
 = ∑_{k=0}^n a_kβ_k²ϕ_k^τP_kϕ_k{ω_{k+1}² − E[ω_{k+1}²∣F_k]}
 = O(Λ_n log^{1/2+η}(Λ_n² + e)) = O(1), a.s. ∀η > 0,

where the last equality follows from (53) and (58). Hence, from (59) and (52),

 ∑_{k=0}^n a_kβ_k²ϕ_k^τP_kϕ_kω_{k+1}² (60)
 ≤ ∑_{k=0}^n a_kβ_k²ϕ_k^τP_kϕ_k{ω_{k+1}² − E[ω_{k+1}²∣F_k]} + sup_k E[ω_{k+1}²∣F_k](∑_{k=0}^n a_kβ_k²ϕ_k^τP_kϕ_k)
 = O(log|P_{n+1}^{-1}|), a.s.

Combining (51), (52), (56), (57) and (60), we thus have

 ~θτn+1P−1n+1~θn+1+n∑k=0(βk~θτkϕk)2=O(log|P−1n+1|), a.s. . (61)

Noting that {β_k} is a non-increasing sequence, we finally obtain (41). ∎

Proofs of Theorems 1 and 2. We note that

 λmin{P−1n+1}∥~θn+1∥2≤~θτn+1P−1n+1~θn+1. (62)

Then Theorem 1 follows from Lemma 5 immediately. Moreover, noting that β_n ≤ β_k for all k ≤ n, Theorem 2 also follows from Lemma 5. ∎

Proof of Theorem 3. By the definitions of J_n and R_k and equation (32), we know that

 J_n = (1/n)∑_{k=0}^{n−1}[y_{k+1} − y*_{k+1}]² (63)
 = (1/n)∑_{k=0}^{n−1}[y_{k+1} − ϕ_k^τ^θ_k − E(v_{k+1}∣F_k)]²
 = (1/n)∑_{k=0}^{n−1}(ϕ_k^τ~θ_k)² + (1/n)∑_{k=0}^{n−1}[v_{k+1} − E(v_{k+1}∣F_k)]² + (1/n)∑_{k=0}^{n−1}2(ϕ_k^τ~θ_k)[v_{k+1} − E(v_{k+1}∣F_k)],

We now estimate the RHS of the above equation term by term. First, the first term equals (1/n)∑_{k=0}^{n−1}R_k, which is o(1), a.s., by Corollary 2. For the last two terms of (63), by Lemma 2, we have

 n−