The purpose of this paper is to study parameter estimation and adaptation for stochastic systems, in which the system output cannot be measured accurately, and the only available information is whether or not the output belongs to some set (Wang et al. (2003)). Specifically, consider the following standard stochastic regression model:
$$y_{k+1} = \phi_k^\top \theta + w_{k+1},\qquad k\ge 0,$$
where $y_{k+1}$, $w_{k+1}$ and $\phi_k$ represent the system output, random noise and regression vector, respectively, and $\theta$ is an unknown parameter vector to be estimated. The system output can only be observed with binary-valued measurements:
$$s_{k+1} = I\{y_{k+1} \le c_k\},$$
where $\{c_k\}$ denotes a given threshold sequence and $I\{\cdot\}$ is the indicator function.
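For concreteness, the binary observation mechanism can be simulated as follows. This is a minimal sketch: the dimension, the true parameter, the Gaussian regressors and noise, and the zero thresholds are all illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 2, 200                        # dimension and horizon (illustrative)
theta = np.array([1.0, -0.5])        # "unknown" true parameter, fixed for simulation
phi = rng.normal(size=(n, d))        # regression vectors
w = rng.normal(size=n)               # random noise
c = np.zeros(n)                      # threshold sequence (fixed at 0 here)

y = phi @ theta + w                  # regular output of the stochastic regression model
s = (y <= c).astype(int)             # binary-valued measurement: indicator of y <= threshold

print(s[:10])                        # only these 0/1 values are available to the estimator
```

Note that the estimator never sees `y`; all identification must be carried out from the 0/1 sequence `s` alone.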
This type of observation arises widely in practical systems. One example comes from neuron systems (Ghysen (2003)), where, instead of measuring the exact internal potential, the system only provides information about the state (excitation or inhibition): when the potential is smaller than the potential threshold, the neuron is in the inhibition state; otherwise it is in the excitation state. The objective of classification via neural networks is to learn the system parameters based on the states of the neurons only. Another example comes from sensor networks (Zhang et al. (2019)): in set-valued sensor networks, the information from each sensor is a quantized observation with a finite number of bits, or even 1 bit. Specifically, each sensor only provides information on whether or not the measured value is larger than a designed threshold. Such sensors are usually more cost-effective than regular sensors. There are also numerous other examples, such as ATM ABR traffic control, gas content sensors in the gas and oil industry, and switching sensors for shift-by-wire in automotive applications (Wang et al. (2003)).
1.2 Related works
Due to the wide use of systems with quantized observations, a number of basic problems concerning identification and control emerge, which call for theoretical investigation. In fact, the estimation of quantized-output systems has generated a vast literature in recent years. Wang et al. (2003); Wang and Yin (2007); Wang et al. (2006); Zhao et al. (2007) gave strongly consistent identification algorithms under periodic signals. Jafari et al. (2012) proposed a recursive identification algorithm for FIR systems with binary-valued observations, and proved its convergence under a mixing property of the signals. Later, Guo and Zhao (2013) proposed a recursive projection algorithm for FIR systems with binary-valued observations and fixed thresholds, and established the convergence rate under a strongly persistent excitation condition. Besides, Marelli et al. (2013) considered ARMA systems with a time-varying K-level scalar quantizer as well as random packet dropouts, and gave consistency results under independent and identically distributed (i.i.d.) conditions. An adaptive quantizer was considered in You (2015) for FIR systems, provided that the signals satisfy i.i.d. conditions. Moreover, under some persistent excitation conditions, Zhao et al. (2016) introduced an EM-type algorithm, which is robust and easy to implement, and proved that the maximum likelihood criterion can be achieved as the number of iterations goes to infinity. Song (2018) presented a strongly consistent estimate and obtained the convergence rate for ARMA systems with binary sensors and unknown thresholds under i.i.d. Gaussian inputs. Zhang et al. (2019) considered a Quasi-Newton type algorithm under a persistent excitation (PE) condition formulated in terms of the minimal eigenvalue of the associated information matrix. Numerical simulations in Zhang et al. (2019) demonstrated that their Quasi-Newton type algorithm has comparable convergence behavior for first-order FIR systems and high-order systems, and strong consistency and asymptotic efficiency were established for first-order FIR systems. For other methods, we refer the readers to Bottegal et al. (2017) and Wang and Zhang (2014) for the kernel-based method and the quadratic programming-based method, among others.
However, almost all of the existing investigations on identification suffer from some fundamental limitations.
Firstly, for systems with regular output sensors, i.e., where the output in (1) is measured exactly, substantial progress has been made in the area of adaptive estimation and control (e.g., Chen and Guo (1991)), and the excitation condition for consistency of the parameter estimates need not be persistent. For example, it is widely known that the weakest possible excitation condition for strong consistency of the least-squares estimate for stochastic regression models is the following (Lai and Wei (1982)):
$$\lambda_{\min}(n) \to \infty \quad \text{and} \quad \log \lambda_{\max}(n) = o\big(\lambda_{\min}(n)\big) \quad \text{a.s.}, \tag{3}$$
where $\lambda_{\max}(n)$ and $\lambda_{\min}(n)$ denote the maximum and minimum eigenvalues of the information matrix. This is actually a decaying excitation condition, which is much weaker than the classical PE condition and can be applied in adaptive feedback control (see, e.g., Chen and Guo (1991)). However, as mentioned above, for identification of systems with binary-valued sensors, almost all of the existing literature requires PE conditions on the signals for strong consistency, and, in fact, most works need i.i.d. or periodic signal assumptions. Though these conditions may be satisfied in some open-loop or off-line identification settings, they are much more difficult to satisfy or verify in closed-loop system identification, since the input and output data of such systems are generally determined by nonlinear stochastic dynamic equations (Chen and Guo (1991)). Consequently, whether or not the PE condition can be essentially weakened to, e.g., (3) for identification of stochastic systems with binary-valued sensors remains an open problem.
Secondly, to the best of the authors’ knowledge, almost all of the existing estimation algorithms for stochastic systems with binary-valued observations and given thresholds are designed with first-order gradients. Such algorithms take the same step-size for each coordinate, which may ease the convergence analysis but sacrifices the convergence rate (Ljung and Söderström (1983)). To improve the convergence properties, it is necessary to consider estimation algorithms whose adaptation gain is a matrix (e.g., the Hessian matrix or its modifications) rather than a scalar.
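This point can be illustrated by a toy comparison (an illustrative sketch, not the algorithm of this paper): on an ill-conditioned quadratic loss, a common scalar step-size must be tuned to the largest eigenvalue, which stalls progress along the flat direction, whereas a matrix gain adapts per direction.

```python
import numpy as np

# Quadratic loss L(x) = 0.5 * x' H x with condition number 100 (illustrative)
H = np.diag([100.0, 1.0])
x_gd = np.array([1.0, 1.0])          # iterate for scalar step-size (first-order gradient)
x_newton = np.array([1.0, 1.0])      # iterate for matrix adaptation gain

eta = 1.0 / 100.0                    # largest stable common step-size ~ 1 / lambda_max
for _ in range(100):
    x_gd = x_gd - eta * (H @ x_gd)   # the same step-size for every coordinate

# A single step with matrix gain H^{-1} (Newton) reaches the minimizer exactly
x_newton = x_newton - np.linalg.inv(H) @ (H @ x_newton)

print(np.linalg.norm(x_gd), np.linalg.norm(x_newton))
```

After 100 scalar-step iterations the error along the flat coordinate is still of order $0.99^{100}\approx 0.37$, while the matrix-gain step converges in one iteration.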
Thirdly, there are only a few results on adaptive control with binary-valued observations in the existing literature (cf., e.g., Guo et al. (2011), Zhao et al. (2013)), where certain classes of FIR control systems are considered and consistency of the parameter estimates is needed for the optimality of the adaptive control systems.
The goal of this paper is to show that the above mentioned limitations can be considerably relaxed or removed.
Inspired by the method in Guo (1995), this paper proposes a new recursive projected Quasi-Newton type algorithm, which can be viewed as a natural extension of the classical linear least-squares algorithm with a projection operator. The main contributions of this paper can be summarized as follows:
We propose a projected recursive Quasi-Newton type algorithm for stochastic regression systems with binary-valued observations. In the area of identification with binary-valued observations and given fixed thresholds, this paper appears to be the first to establish almost sure convergence for Quasi-Newton type estimation algorithms where the adaptation gains are matrices.
The weakest possible excitation condition known for strong consistency of the classical least-squares algorithm is proven to be sufficient for strong consistency of the proposed new estimation algorithm in the current binary-valued observation case. This appears to be the first time such a strong result has been achieved in the literature on system identification with binary-valued observations.
We also establish a result on the asymptotic order of the accumulated regret of adaptive prediction, which does not need any excitation condition and can be conveniently used in adaptive control to give better results than the existing ones in the literature.
The remainder of this paper is organized as follows. In Section 2, we give the main results of this paper, including the assumptions, proposed algorithms and main theorems; Section 3 presents the proofs of the main results together with some key lemmas. Some numerical examples are provided in Section 4. Finally, we conclude the paper with some remarks.
2 The main results
Consider the stochastic regression model (1)-(2) with binary-valued observations. The objectives of this paper are to propose a strongly consistent estimator for the unknown parameter vector under a non-PE condition, and to give an asymptotically optimal adaptive predictor for the regular output together with its applications in adaptive tracking.
2.1 Notations and assumptions
For our purpose, we introduce some notations and assumptions first.
Notations. By $\|\cdot\|$ we denote the Euclidean norm of vectors or matrices. The spectrum of a symmetric matrix is denoted by $\{\lambda_i\}$, where the maximum and minimum eigenvalues are denoted by $\lambda_{\max}\{\cdot\}$ and $\lambda_{\min}\{\cdot\}$, respectively. Moreover, by $\det(\cdot)$ or $|\cdot|$ we mean the determinant of the matrix in question.
Let $\{\mathcal{F}_k\}$ be a non-decreasing sequence of $\sigma$-algebras such that the regression vector is $\mathcal{F}_k$-measurable with a known upper bound on its norm, where the bound may be a random variable.
The true parameter belongs to a bounded convex set, and we denote
The given threshold sequence is adapted, with a known upper bound that may be a random variable.
The noise is integrable and measurable. For any $k$, the conditional probability density function of the noise given $\mathcal{F}_k$, denoted by $f_k(\cdot)$, is known and satisfies
where , and are defined by , and .
It can easily be seen that if the threshold is fixed, then Assumption 3 is satisfied automatically. Moreover, if the noise is independent of the $\sigma$-algebras and identically normally distributed, as assumed previously (see, e.g., Guo and Zhao (2013), Zhang et al. (2019)), then the condition in Assumption 4 will be satisfied.
2.2 Recursive algorithm and adaptive predictor
To construct a Quasi-Newton type identification algorithm, we first need to introduce a projection operator, defined as follows.
For the Euclidean space $\mathbb{R}^m$, the weighted norm $\|\cdot\|_Q$ associated with a positive definite matrix $Q$ is defined by $\|x\|_Q^2 = x^\top Q x$. For a given convex compact set $D \subset \mathbb{R}^m$ and a positive definite matrix $Q$, the projection operator $\Pi_Q(\cdot)$ is defined as $\Pi_Q(x) = \operatorname*{arg\,min}_{\omega \in D} \|x - \omega\|_Q$.
The well-posedness of the projection operator is ensured by the positive definiteness of the weighting matrix and the convexity of the set (Cheney, 2001).
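As a simple sketch of this operator, assume for illustration a box-shaped constraint set and a diagonal weighting matrix; in that separable case the weighted projection reduces to coordinate-wise clipping, regardless of the positive diagonal weights. (For a general positive definite weighting matrix the projection is a small quadratic program; the diagonal case suffices to illustrate well-posedness.)

```python
import numpy as np

def project_box_diag_q(x, lo, hi, q_diag):
    """Q-weighted projection of x onto the box [lo, hi]^d with Q = diag(q_diag) > 0.

    The objective (x - w)' Q (x - w) is separable across coordinates, so the
    minimizer is coordinate-wise clipping, independently of the (positive)
    diagonal weights.
    """
    assert np.all(np.asarray(q_diag) > 0), "Q must be positive definite"
    return np.clip(x, lo, hi)

x = np.array([2.0, -3.0, 0.5])
p = project_box_diag_q(x, -1.0, 1.0, np.array([4.0, 1.0, 9.0]))
print(p)   # out-of-range coordinates are clipped to the box, 0.5 is unchanged
```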
Our recursive identification algorithm is a kind of Quasi-Newton algorithm, defined as follows:
where $\hat{\theta}_k$ is the estimate of $\theta$ at time $k$; $\Pi$ is the projection operator defined as in Definition 2; $F_k(\cdot)$ is the conditional probability distribution function of the noise given the $\sigma$-algebra $\mathcal{F}_k$; the initial value $\hat{\theta}_0$ can be chosen arbitrarily in the set given in Assumption 2; the scalar coefficient can be chosen arbitrarily from the prescribed interval; and the initial matrix $P_0 > 0$ can also be chosen arbitrarily.
Note that by the algorithm and the well-known matrix inversion formula (see, e.g., Guo (2020), Theorem 1.1.17), the inverse of the adaptation-gain matrix can be recursively computed as
Thus, the adaptation-gain matrix is positive definite for any positive definite initial condition, which ensures the well-posedness of the projection operator in Algorithm 2.2.
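The recursion above is a rank-one update via the matrix inversion (Sherman–Morrison) formula, and can be sketched and checked numerically as follows. This is an illustrative sketch: the scalar coefficient `a` stands in for the algorithm's step coefficient, whose exact form is given in Algorithm 2.2.

```python
import numpy as np

def rank_one_inverse_update(P, phi, a):
    """Update P = M^{-1} to (M + a * phi phi')^{-1} by the matrix inversion
    (Sherman-Morrison) formula, avoiding any explicit matrix inversion."""
    P_phi = P @ phi
    return P - a * np.outer(P_phi, P_phi) / (1.0 + a * phi @ P_phi)

rng = np.random.default_rng(1)
P = np.eye(3)                        # positive definite initial condition
phi = rng.normal(size=3)             # current regression vector
a = 0.5                              # illustrative step coefficient

P_new = rank_one_inverse_update(P, phi, a)
P_check = np.linalg.inv(np.linalg.inv(P) + a * np.outer(phi, phi))
print(np.allclose(P_new, P_check))   # True: the two computations agree
```

The update costs $O(m^2)$ per step instead of the $O(m^3)$ of a direct inversion, and preserves positive definiteness.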
Moreover, since both and are measurable, we have
which is the best prediction, in the mean-square sense, given the available information. Note that it can be computed from the known conditional probability density function in Assumption 4. Replacing the unknown parameter by its estimate, we obtain a natural adaptive predictor as follows:
The difference between the best prediction and adaptive prediction can be regarded as regret, denoted as , i.e.,
where . One may naturally expect that the regret be small in some sense, which will be useful in adaptive control. Details will be discussed in the subsequent section.
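Under the additional illustrative assumption of standard Gaussian noise, the best predictor and its plug-in adaptive counterpart take the following concrete form. This is a sketch only: the numbers, the current estimate, and the squared-difference form of the one-step regret are assumptions made for illustration.

```python
import math
import numpy as np

def std_normal_cdf(z):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta = np.array([1.0, -0.5])        # true parameter (known only to "nature")
theta_hat = np.array([0.8, -0.4])    # current estimate (assumed)
phi = np.array([0.3, 1.2])           # current regression vector
c = 0.0                              # current threshold

# Best mean-square predictor of the binary output I{y <= c}:
# E[s | F] = P(phi' theta + w <= c) = F(c - phi' theta)
best = std_normal_cdf(c - phi @ theta)
adaptive = std_normal_cdf(c - phi @ theta_hat)   # plug in the estimate
regret = (best - adaptive) ** 2                  # one-step regret (squared difference)

print(best, adaptive, regret)
```

As the estimate approaches the true parameter, the adaptive prediction approaches the best prediction and the one-step regret vanishes.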
Throughout the sequel, for convenience, let us introduce the following notations:
2.3 Global convergence results
The following three theorems are the main results of this paper. Without any excitation conditions, we establish asymptotic upper bounds for the parameter estimation error, the accumulated regret of adaptive prediction, and the tracking error of adaptive control.
The detailed proof of Theorem 1 is supplied in the next section.
Let the conditions of Theorem 1 hold, and let the conditional probability density function of the noise sequence have a uniformly positive lower bound:
Let the noise be independent of the $\sigma$-algebras and normally distributed with zero mean and possibly time-varying variance. Then the condition will be satisfied if the variance sequence has a finite upper bound and a positive lower bound.
From the theorem we know that if the displayed condition holds as $n \to \infty$, then the estimates given by Algorithm 2.2 will be strongly consistent. This condition is much weaker than the traditional persistent excitation condition. Moreover, it coincides with the Lai–Wei excitation condition for the classical least-squares algorithm with regular output sensors.
According to Theorem 2, one can directly deduce the following corollary.
Let the conditions of Theorem 2 hold, and let be the conditional probability density function of the noise sequence as defined in Assumption 4. Then we have the following two basic results for the accumulated regret of adaptive prediction:
If has a uniformly positive lower bound, i.e.
If does not have a uniformly positive lower bound but satisfies
Let the noise sequence be independent and normally distributed with zero mean and variance . Then the condition will be satisfied if has both upper and lower positive bounds; the conditions will be satisfied if and .
As in the regular observation case (see Guo (1995)), an important application of Theorem 2 is in adaptive control of stochastic systems with binary-valued observations, as stated in the following theorem:
Let the conditions of Theorem 2 hold, and let the conditional probability density function satisfy the additional bounds for some constant. If the regression vectors can be influenced by an input signal such that, for a given bounded sequence of reference signals, the following equation can be satisfied by choosing the input:
Then the averaged tracking error , defined by
will approach its minimum value with the following best possible almost sure convergence rate:
The detailed proof of Theorem 3 is given in the next section.
3 Proofs of the main results
To prove the main results, we first introduce several lemmas.
(Chen and Guo, 1991). Let be a martingale difference sequence and an adapted sequence. If
for some , then as :
(Lai and Wei, 1982). Let be a sequence of vectors in and let . Let denote the determinant of . Assume that is nonsingular, then as
(Guo, 1995). Let be any bounded sequence of vectors in . Denote with , then we have
Moreover, by and
which means is a martingale difference sequence.
Following the analysis ideas of the classical least-squares theory for linear stochastic regression models (see, e.g., Moore (1978), Lai and Wei (1982), Guo (1995)), we consider the following stochastic Lyapunov function:
Let us now analyze the right-hand side (RHS) of the above equation term by term. From (15), we know that
Moreover, by (15) again, we know that
Hence, we have
where the second equality holds since the function in question is increasing, and the last inequality holds by (14) and
Similarly, by ,
where we have used the fact that .
Now, substituting , and into we get
Summing up both sides of from 0 to , we have
We now analyze, respectively, the last three terms above, which are related to the martingale difference sequence. Moreover, we have
By and the boundedness of and , we know that . Consequently, by and Lemma 2, we have
Also, by Lemma 2 and again, we know that
As for the last term on the right-hand side, we have
Denote , by Lemma 2 and letting , we get
where the last equality is from and . Hence, from and
Combining the preceding relations, we thus have
Noting that the sequence in question is non-increasing, we finally obtain the desired conclusion. ∎