1 Introduction
1.1 Background
The purpose of this paper is to study parameter estimation and adaptation for stochastic systems, in which the system output cannot be measured accurately, and the only available information is whether or not the output belongs to some set (Wang et al. (2003)). Specifically, consider the following standard stochastic regression model:
(1) $y_{k+1} = \phi_k^\top \theta + w_{k+1}$,
where $y_{k+1} \in \mathbb{R}$, $w_{k+1} \in \mathbb{R}$ and $\phi_k \in \mathbb{R}^d$ represent the system output, random noise and regression vector, respectively, and $\theta \in \mathbb{R}^d$ is an unknown parameter vector to be estimated. The system output can only be observed through binary-valued measurements:
(2) $s_{k+1} = I_{\{y_{k+1} \le c_k\}}$,
where $\{c_k\}$ denotes a given threshold sequence, and $I_{\{\cdot\}}$ is the indicator function.
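To fix ideas, the following minimal Python sketch simulates the observation structure of (1)-(2) under the standard form assumed above (linear regression output observed only through an indicator); the dimension, parameter values, noise distribution and thresholds are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of model (1)-(2): y_{k+1} = phi_k^T theta + w_{k+1},
# observed only as the binary value s_{k+1} = I{y_{k+1} <= c_k}.
theta = np.array([1.0, -0.5])      # unknown parameter (ground truth)
n = 1000
phi = rng.normal(size=(n, 2))      # regression vectors
w = rng.normal(size=n)             # i.i.d. standard Gaussian noise
c = np.zeros(n)                    # fixed threshold c_k = 0

y = phi @ theta + w                # regular (unobservable) output
s = (y <= c).astype(int)           # binary-valued measurements

# Only s (never y) is available to the identification algorithm.
print(s[:10], s.mean())
```

Note that each sample carries at most one bit of information about the output, which is the key difficulty addressed by the paper.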
This type of observation arises widely in practical systems with the development of science and technology. One example comes from neuron systems (Ghysen (2003)) where, instead of measuring the exact internal potential, the system only provides information about states (excitation or inhibition). When the potential is smaller than the potential threshold, the neuron shows the inhibition state; otherwise it shows the excitation state. The objective of classification via neural networks is to learn the system parameters based on the states of neurons only. Another example comes from sensor networks (Zhang et al. (2019)) where, for set-valued sensor networks, the information from each sensor turns out to be quantized observations with a finite number of bits or even 1 bit. Specifically, each sensor only provides the information of whether the measured value is larger than a designed threshold or not. Usually, such sensors are more cost-effective than regular sensors. Besides, there are numerous other examples, such as ATM ABR traffic control, gas content sensors in the gas and oil industry, and switching sensors for shift-by-wire in automotive applications (Wang et al. (2003)).
1.2 Related works
Due to the widespread use of systems with quantized observations, a number of basic problems concerning identification and control emerge, which need theoretical investigation. In fact, the estimation of quantized-output systems has generated a vast literature in recent years. Wang et al. (2003); Wang and Yin (2007); Wang et al. (2006); Zhao et al. (2007) gave strongly consistent identification algorithms under periodic signals. Jafari et al. (2012) proposed a recursive identification algorithm for FIR systems with binary-valued observations, and proved its convergence under a mixing property of the signals. Later, Guo and Zhao (2013) proposed a recursive projection algorithm for FIR systems with binary-valued observations and fixed thresholds, and established its convergence rate under a strongly persistent excitation condition. Besides, Marelli et al. (2013) considered ARMA systems with a time-varying K-level scalar quantizer as well as random packet dropouts, and gave a consistency result under independent and identically distributed (i.i.d.) conditions. An adaptive quantizer is considered in You (2015) for FIR systems, provided that the signals satisfy i.i.d. conditions. Moreover, under some persistent excitation conditions, Zhao et al. (2016) introduced an EM-type algorithm, which is robust and easy to implement, and proved that the maximum likelihood criterion can be achieved as the number of iterations goes to infinity. Song (2018) presented a strongly consistent estimate and obtained the convergence rate for ARMA systems with binary sensors and unknown threshold under i.i.d. Gaussian inputs. Zhang et al. (2019) considered a Quasi-Newton type algorithm under a persistent excitation (PE) condition formulated in terms of the minimal eigenvalue of the information matrix. Numerical simulations in Zhang et al. (2019) demonstrated that their Quasi-Newton type algorithm has comparable convergence behavior for first-order FIR systems and high-order systems, and strong consistency and asymptotic efficiency are also established there for first-order FIR systems. For other methods, we refer the reader to Bottegal et al. (2017) and Wang and Zhang (2014) on kernel-based and quadratic-programming-based methods, among others.
However, almost all of the existing investigations on identification suffer from some fundamental limitations.
Firstly, for systems with regular output sensors, substantial progress has been made in the area of adaptive estimation and control (e.g., Chen and Guo (1991)), and the excitation condition for consistency of parameter estimates need not be persistent. For example, it is widely known that the weakest possible excitation condition for strong consistency of the least-squares estimate for stochastic regression models is the following (Lai and Wei (1982)):
(3) $\lambda_{\min}(n) \to \infty$ and $\log \lambda_{\max}(n) = o(\lambda_{\min}(n))$, a.s., where $\lambda_{\max}(n)$ and $\lambda_{\min}(n)$ denote the maximum and minimum eigenvalues of the matrix $\sum_{k=0}^{n} \phi_k \phi_k^\top$,
which is actually a decaying excitation condition, is much weaker than the classical PE condition, and can be applied in adaptive feedback control (see, e.g., Chen and Guo (1991)). However, as mentioned above, for identification of systems with binary-valued sensors, almost all of the existing literature requires PE conditions on the signals for strong consistency, and in fact, most works need i.i.d. or periodic signal assumptions. Though these conditions may be satisfied in some open-loop or offline identification settings, they are much more difficult to satisfy or verify in closed-loop system identification, since the input and output data of such systems are generally determined by nonlinear stochastic dynamic equations (Chen and Guo (1991)). Consequently, whether the PE condition can be essentially weakened, e.g., to (3), for identification of stochastic systems with binary-valued sensors remains an open problem.
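To make the contrast concrete, the following small numerical experiment (an illustration, not part of the paper) checks the Lai-Wei-type decaying excitation behavior for i.i.d. Gaussian regressors: the minimum eigenvalue of the information matrix grows roughly linearly in $n$, so the ratio $\log \lambda_{\max}(n)/\lambda_{\min}(n)$ decays toward zero as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
ratios = []
for n in [100, 1000, 10000]:
    phi = rng.normal(size=(n, d))    # i.i.d. Gaussian regressors
    G = phi.T @ phi                  # information matrix sum_k phi_k phi_k^T
    lam = np.linalg.eigvalsh(G)      # eigenvalues in ascending order
    # Lai-Wei decaying excitation: log(lam_max)/lam_min -> 0 as n grows
    ratios.append(np.log(lam[-1]) / lam[0])

print(ratios)
```

Signals generated inside a feedback loop need not be this well behaved, which is precisely why weakening the PE requirement matters for closed-loop identification.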
Secondly, to the best of the authors' knowledge, almost all existing estimation algorithms for stochastic systems with binary-valued observations and given thresholds are designed with first-order gradients. Such algorithms take the same step-size for each coordinate, which may alleviate the complexity of the convergence analysis, but sacrifices the convergence rate (Ljung and Söderström (1983)). To improve the convergence properties, it is necessary to consider estimation algorithms whose adaptation gain is a matrix (e.g., the Hessian matrix or its modifications) rather than a scalar.
Thirdly, there are only a few results on adaptive control with binary-valued observations in the existing literature (cf., e.g., Guo et al. (2011); Zhao et al. (2013)), where certain FIR control systems are considered and consistency of the parameter estimates is needed for the optimality of the adaptive control systems.
The goal of this paper is to show that the above-mentioned limitations can be considerably relaxed or removed.
1.3 Contributions
Inspired by the method in Guo (1995), this paper proposes a new recursive projected Quasi-Newton type algorithm, which can be viewed as a natural extension of the classical linear least-squares algorithm with a projection operator. The main contributions of this paper can be summarized as follows:

We propose a projected recursive Quasi-Newton type algorithm for stochastic regression systems with binary-valued observations. In the area of identification with binary-valued observations and given fixed thresholds, this paper appears to be the first to establish almost sure convergence for Quasi-Newton type estimation algorithms where the adaptation gains are matrices.

The weakest possible excitation condition known for strong consistency of the classical least-squares algorithm is proven to be sufficient for strong consistency of the proposed estimation algorithm in the current binary-valued observation case. This appears to be the first time such a result has been achieved in the literature on system identification with binary-valued observations.

We also obtain a result on the asymptotic order of the accumulated regret of adaptive prediction, which does not need any excitation condition and can be conveniently used in adaptive control to give better results than the existing ones in the literature.
The remainder of this paper is organized as follows. In Section 2, we give the main results of this paper, including the assumptions, proposed algorithms and main theorems; Section 3 presents the proofs of the main results together with some key lemmas. Some numerical examples are provided in Section 4. Finally, we conclude the paper with some remarks.
2 The main results
Consider the stochastic regression model (1)-(2) with binary-valued observations. The objectives of this paper are to propose a strongly consistent estimator for the unknown parameter vector under a non-PE condition, and to give an asymptotically optimal adaptive predictor for the regular output, together with its applications in adaptive tracking.
2.1 Notations and assumptions
For our purpose, we introduce some notations and assumptions first.
Notations. By $\|\cdot\|$, we denote the Euclidean norm of vectors or matrices. The spectrum of a symmetric matrix $M$ is denoted by $\{\lambda_i(M)\}$, where the maximum and minimum eigenvalues are denoted by $\lambda_{\max}(M)$ and $\lambda_{\min}(M)$, respectively. Moreover, by $\det(M)$ or $|M|$ we mean the determinant of the matrix $M$.
Assumption 1.
Let $\{\mathcal{F}_k\}$ be a non-decreasing sequence of $\sigma$-algebras such that $\{\phi_k\}$ is $\mathcal{F}_k$-measurable with a known upper bound:
(4) $\sup_{k \ge 0} \|\phi_k\| \le M$, a.s.,
where $M$ may be a random variable.
Assumption 2.
The true parameter $\theta$ belongs to a bounded convex set $D$, and we denote
(5) 
Assumption 3.
The given threshold sequence $\{c_k\}$ is an adapted sequence with a known upper bound:
(6) $\sup_{k \ge 0} |c_k| \le C$, a.s.,
where $C$ may be a random variable.
Assumption 4.
The noise sequence is integrable and measurable. For any $k \ge 0$, the conditional probability density function of the noise $w_{k+1}$ given $\mathcal{F}_k$, denoted by $f_k(\cdot)$, is known and satisfies
(7) 
where the constants involved are defined in terms of the bounds introduced in Assumptions 1-3.
Remark 1.
It can be easily seen that if the threshold is fixed, then Assumption 3 is satisfied automatically. Moreover, if the noise is independent of the $\sigma$-algebra $\mathcal{F}_k$ and is normally distributed as assumed previously (see, e.g., Guo and Zhao (2013); Zhang et al. (2019)), then the condition in Assumption 4 will be satisfied.
2.2 Recursive algorithm and adaptive predictor
To construct a Quasi-Newton type identification algorithm, we first need to introduce a projection operator on the convex set $D$, as follows.
Definition 1.
For the Euclidean space $\mathbb{R}^d$, the weighted norm $\|\cdot\|_Q$ associated with a positive definite matrix $Q$ is defined as
(8) $\|x\|_Q = \sqrt{x^\top Q x}, \quad x \in \mathbb{R}^d.$
Definition 2.
For a given convex compact set $D \subset \mathbb{R}^d$ and a positive definite matrix $Q$, the projection operator $\Pi_Q(\cdot)$ is defined as
(9) $\Pi_Q(x) = \operatorname*{argmin}_{y \in D} \|y - x\|_Q, \quad x \in \mathbb{R}^d.$
Remark 2.
The well-posedness of $\Pi_Q(\cdot)$ is ensured by the positive definiteness of the matrix $Q$ and the convexity and compactness of the set $D$ (Cheney, 2001).
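As a concrete illustration (not taken from the paper), the following Python sketch computes such a weighted projection numerically, taking the convex compact set to be a box and assuming the standard definition $\Pi_Q(x) = \operatorname{argmin}_{y \in D} \|y - x\|_Q$; the projected-gradient iteration and all numerical values are hypothetical choices.

```python
import numpy as np

def project_Q(z, Q, lo, hi, iters=500):
    """Projected gradient descent for argmin_{x in box} (x-z)^T Q (x-z),
    i.e. the Q-weighted projection of z onto the box [lo, hi]^d
    (the box stands in for the paper's generic convex compact set D)."""
    x = np.clip(z, lo, hi)                            # feasible starting point
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1])    # 1/L, L = 2*lam_max(Q)
    for _ in range(iters):
        grad = 2.0 * Q @ (x - z)                      # gradient of (x-z)^T Q (x-z)
        x = np.clip(x - step * grad, lo, hi)          # gradient step, then box clip
    return x

Q = np.array([[2.0, 0.5], [0.5, 1.0]])                # positive definite weight
z = np.array([3.0, 0.5])                              # point outside the box
x_star = project_Q(z, Q, lo=-1.0, hi=1.0)
print(x_star)   # approx [1. 1.]; plain coordinate-wise clipping would give [1. 0.5]
```

The example also shows why the weighting matters: the $Q$-projection can differ from naive coordinate-wise clipping whenever $Q$ has off-diagonal entries.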
Our recursive identification algorithm is a kind of Quasi-Newton algorithm, defined as follows:
(10) 
(11) 
(12) 
(13) 
(14) 
where $\hat\theta_k$ is the estimate of $\theta$ at time $k$; the projection operator is defined as in Definition 2; $F_k(\cdot)$ is the conditional probability distribution function of the noise $w_{k+1}$ given the $\sigma$-algebra $\mathcal{F}_k$; the initial value $\hat\theta_0$ can be chosen arbitrarily in $D$, where $D$ is given in Assumption 2; a scalar design parameter can be chosen arbitrarily from the prescribed interval; and the initial matrix can also be chosen arbitrarily. Note that by (14) and the well-known matrix inversion formula (see, e.g., Guo (2020), Theorem 1.1.17), the inverse of the matrix defined in (14) can be recursively rewritten as
(15) 
Thus, the adaptation gain matrix is positive definite since the initial condition is positive definite, which ensures the well-posedness of the projection operator in Algorithm 2.2.
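The rank-one update structure behind such a recursion can be checked numerically. The sketch below (illustrative only; the scalar gain coefficient is a hypothetical stand-in for the paper's coefficient) verifies that the Sherman-Morrison matrix inversion formula reproduces the directly computed inverse and preserves positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
P = np.eye(d)    # current gain matrix, initialized positive definite
G = np.eye(d)    # its inverse accumulated directly: P_0^{-1} + sum_k b*phi_k*phi_k^T

for k in range(50):
    phi = rng.normal(size=d)
    b = 0.5      # hypothetical positive scalar coefficient of the rank-one term
    G += b * np.outer(phi, phi)
    # Sherman-Morrison: inverse of (P^{-1} + b*phi*phi^T), no explicit inversion
    Pphi = P @ phi
    P = P - (b * np.outer(Pphi, Pphi)) / (1.0 + b * (phi @ Pphi))

print(np.abs(P - np.linalg.inv(G)).max())   # tiny: the recursion tracks inv(G)
```

Updating the inverse recursively in this way costs $O(d^2)$ per step instead of the $O(d^3)$ of a direct inversion, which is what makes matrix-gain (Quasi-Newton) recursions practical.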
Moreover, since both $\phi_k$ and $c_k$ are $\mathcal{F}_k$-measurable, we have
(16) $E[s_{k+1} \mid \mathcal{F}_k] = F_k(c_k - \phi_k^\top \theta)$,
which is the best prediction for $s_{k+1}$ given $\mathcal{F}_k$ in the mean square sense. Note that $F_k(\cdot)$ can be obtained from the known conditional probability density function $f_k(\cdot)$ in Assumption 4. Replacing the unknown parameter $\theta$ in (16) by its estimate $\hat\theta_k$, we obtain a natural adaptive predictor of $s_{k+1}$ as follows:
(17) $\hat s_{k+1} = F_k(c_k - \phi_k^\top \hat\theta_k)$.
The difference between the best prediction and the adaptive prediction can be regarded as a regret, i.e.,
(18) 
where . One may naturally expect that the regret be small in some sense, which will be useful in adaptive control. Details will be discussed in the subsequent section.
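To make the regret notion concrete, the following Python sketch compares the best prediction with the adaptive prediction for a hypothetical configuration; the standard Gaussian noise, the fixed threshold, the predictor form $F(c - \phi^\top \theta)$ and the squared-difference regret are all illustrative assumptions consistent with the setup rather than the paper's exact definitions.

```python
import numpy as np
from math import erf, sqrt

def F(x):
    """CDF of standard Gaussian noise (an assumed noise distribution)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(3)
theta = np.array([1.0, -0.5])        # true parameter (unknown to the predictor)
theta_hat = np.array([0.9, -0.4])    # a hypothetical current estimate
c = 0.0                              # fixed threshold

regrets = []
for k in range(1000):
    phi = rng.normal(size=2)
    best = F(c - phi @ theta)          # best mean-square prediction of s_{k+1}
    adaptive = F(c - phi @ theta_hat)  # same formula with theta replaced by estimate
    regrets.append((best - adaptive) ** 2)

print(np.mean(regrets))   # small when theta_hat is close to theta
```

As the estimate approaches the true parameter, the per-step regret shrinks, which is why bounds on the accumulated regret are useful for adaptive prediction and control.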
Throughout the sequel, for convenience, we introduce the following notations:
(19) 
(20) 
(21) 
2.3 Global convergence results
The following three theorems are the main results of this paper. Without requiring any excitation condition, we establish asymptotic upper bounds for the parameter estimation error, the accumulated regret of adaptive prediction, and the tracking error of adaptive control.
Theorem 1.
The detailed proof of Theorem 1 is supplied in the next section.
Corollary 1.
Let the conditions of Theorem 1 hold, and let the conditional probability density function of the noise sequence have a uniformly positive lower bound:
(23) 
Then
(24) 
Remark 3.
Let the noise be independent of the $\sigma$-algebra $\mathcal{F}_k$ and normally distributed with zero mean. Then condition (23) will be satisfied if the variance sequence has a finite upper bound and a positive lower bound.
Remark 4.
From the above results we know that if
(25) 
holds as $n \to \infty$, then the estimates given by Algorithm 2.2 will be strongly consistent, i.e., $\hat\theta_n \to \theta$, a.s. The condition (25) is much weaker than the traditional persistent excitation condition, which requires that the minimum eigenvalue of the information matrix grow at least linearly in $n$. Moreover, (25) coincides with the Lai-Wei excitation condition (3) for the classical least-squares algorithm with regular output sensors.
Theorem 2.
According to Theorem 2, one can directly deduce the following corollary.
Corollary 2.
Let the conditions of Theorem 2 hold, and let $f_k(\cdot)$ be the conditional probability density function of the noise sequence as defined in Assumption 4. Then we have the following two basic results for the accumulated regret of adaptive prediction:

If $f_k(\cdot)$ has a uniformly positive lower bound, i.e.,
(27) then
(28) 
If $f_k(\cdot)$ does not have a uniformly positive lower bound but satisfies
(29) then
(30)
Remark 5.
Let the noise sequence be independent and normally distributed with zero mean. Then condition (27) will be satisfied if the variance sequence has both a finite upper bound and a positive lower bound; condition (29) will be satisfied under analogous conditions on the variance sequence.
Remark 6.
As in the regular observation case (see Guo (1995)), an important application of Theorem 2 is in adaptive control of stochastic systems with binaryvalued observations, as stated in the following theorem:
Theorem 3.
Let the conditions of Theorem 2 hold, and let the conditional probability density function additionally satisfy
(31) 
for some constant. If the regression vectors can be influenced by an input signal such that, for a given bounded sequence of reference signals, the following equation can be satisfied by the choice of the input:
(32) 
Then the averaged tracking error , defined by
(33) 
will approach its minimum value with the following best possible almost sure convergence rate:
(34) 
where .
The detailed proof of Theorem 3 is given in the next section.
3 Proofs of the main results
To prove the main results, we first introduce several lemmas.
Lemma 2.
(Chen and Guo, 1991). Let be a martingale difference sequence and an adapted sequence. If
(36) 
for some , then as :
(37) 
where
(38) 
Lemma 3.
(Lai and Wei, 1982). Let be a sequence of vectors in and let . Let denote the determinant of . Assume that is nonsingular, then as
(39) 
Lemma 4.
(Guo, 1995). Let be any bounded sequence of vectors in . Denote with , then we have
(40) 
Finally, the proofs of Theorems 1-3 will follow immediately from the following Lemma 5, which can be proven by using Lemmas 1-4.
Lemma 5.
Proof. By Assumptions 1-4 and (10)-(14), $\hat\theta_k$ is $\mathcal{F}_k$-measurable, and satisfies
(42) 
Moreover, by and
(43) 
which means is a martingale difference sequence.
Following the analysis ideas of classical least-squares for linear stochastic regression models (see, e.g., Moore (1978), Lai and Wei (1982), Guo (1995)), we consider the following stochastic Lyapunov function:
By Lemma 1 and Algorithm 2.2 , we know that
(44)  
Let us now analyze the right-hand side (RHS) of (44) term by term. From (15), we know that
(45) 
Moreover, by (15) again, we know that
(46)  
Hence, we have
(47)  
where the second equality holds since $F_k(\cdot)$ is an increasing function, and the last inequality holds by (14) and
(48) 
Similarly, by ,
(49)  
where we have used the fact that .
Now, substituting , and into we get
(50)  
Summing up both sides of from 0 to , we have
(51)  
We now analyze the last three terms in (51), which are related to the martingale difference sequence.
By Lemma 3 and Lemma 4, we get
(52) 
(53) 
respectively. Moreover, since , we have
(54) 
Denote
(55) 
By and the boundedness of and , we know that . Consequently, by and Lemma 2, we have
(56)  
Also, by Lemma 2 and again, we know that
(57)  
As for the last term on the right-hand side of (51), since , we have
(58) 
Denote , by Lemma 2 and letting , we get
(59)  
where the last equality is from and . Hence, from and
(60)  
Combining the above estimates, we thus have
(61) 
Noting that the sequence involved is non-increasing, we finally obtain the desired result. ∎