I Introduction
The goal of distributed estimation is to estimate the system parameters across a network based on the data observed at all nodes. Diffusion adaptation has been studied intensively over the last decades [1, 2, 3, 4]. Most algorithms assume that the information exchange among neighbouring nodes is noiseless, which, however, may not hold in real applications. There has been considerable research on the performance of diffusion adaptation with noisy links [5, 6, 7]. However, most studies focus only on analyzing the impact of noise on the diffusion process and offer no way to improve the performance of the algorithms.
Recently, some improved diffusion adaptation algorithms have been proposed to alleviate the impact of noisy communications [5, 8, 9]. By modifying the diffusion strategies of the typical diffusion least mean squares (DLMS) algorithm, these algorithms can work well under noisy links. They use the mean squared error (MSE) as the cost function, which, however, may lead to biased estimation. In particular, MSE yields unbiased estimation under the standard regression assumption that the input data are noiseless, whereas with noisy communications all data are perturbed by noise during the information exchange. This recalls the errors-in-variables (EIV) model, which accounts for noise on both input and output data. Several diffusion algorithms under the EIV model have been proposed, such as the bias-compensated DLMS (BC-DLMS) algorithm [10] and the diffusion gradient-descent-based total least squares (D-GDTLS) algorithm [11, 12]. Both achieve better performance than DLMS when input and output data are noisy.

Robustness is also an important issue in real applications. BC-DLMS and D-GDTLS are both based on quadratic costs, which perform well under Gaussian noise. However, due to impulsive perturbations, the noise distribution may contain large outliers, and the performance of quadratic-cost-based algorithms may then degrade. Recently, an information theoretic learning (ITL) based criterion called maximum total correntropy (MTC) has been proposed [13].
Correntropy is a nonlinear similarity measure which involves all the even moments of the difference between two random variables and is insensitive to outliers [14]. As an extension of the maximum correntropy criterion (MCC) [14, 15, 16, 17, 18, 19] and total least squares (TLS) [20], MTC addresses the robust EIV regression problem. MTC-based adaptive filtering has shown accurate and robust performance when both input and output noise contain large outliers [13].

In this paper we develop a new algorithm, the diffusion maximum total correntropy (DMTC) algorithm, for noisy links. Using the EIV model, DMTC can theoretically produce unbiased estimation under Gaussian noise, and by taking advantage of correntropy it can efficiently handle large outliers. Theoretical analyses of the mean and mean-square performance are also given. Moreover, to reduce the negative influence of noise in the combination step, an adaptive combination rule is utilized. Simulation results confirm the desired performance of the proposed algorithm, showing that it can greatly improve both the accuracy and robustness of diffusion adaptation with noisy links.
II Proposed Algorithm
II-A System Model and Problem Statement
Consider a connected network with $N$ nodes. For each node $k$, suppose we are given a data sequence $\{d_k^o(i), \mathbf{u}_{k,i}\}$ with the following linear relation

$$d_k^o(i) = \mathbf{u}_{k,i}^{\mathsf T}\mathbf{w}^o \qquad (1)$$

where $\mathbf{u}_{k,i} \in \mathbb{R}^{M}$ is the input vector of node $k$ at time $i$, $d_k^o(i)$ is the corresponding output, and $\mathbf{w}^o \in \mathbb{R}^{M}$ is the target weight vector. Since the output data may be perturbed by measurement noise, the observed data can be described as

$$d_k(i) = \mathbf{u}_{k,i}^{\mathsf T}\mathbf{w}^o + v_k(i) \qquad (2)$$
where $v_k(i)$ is the output noise of node $k$ with variance $\sigma_{v,k}^2$. The goal of distributed estimation is to collaboratively estimate $\mathbf{w}^o$ based on the data sequences $\{d_k(i), \mathbf{u}_{k,i}\}$.

Diffusion adaptation strategies have been widely used in distributed estimation; in particular, DLMS is the most widely used algorithm. The weight update of DLMS is
$$\begin{aligned}
\boldsymbol{\psi}_{k,i} &= \mathbf{w}_{k,i-1} + \mu_k \sum_{l\in\mathcal{N}_k} c_{lk}\,\mathbf{u}_{l,i}\big(d_l(i) - \mathbf{u}_{l,i}^{\mathsf T}\mathbf{w}_{k,i-1}\big) \\
\mathbf{w}_{k,i} &= \sum_{l\in\mathcal{N}_k} a_{lk}\,\boldsymbol{\psi}_{l,i}
\end{aligned} \qquad (3)$$

where $\boldsymbol{\psi}_{k,i}$ is the intermediate estimate of node $k$ at time $i$, $\mathcal{N}_k$ is the neighbourhood of node $k$ (including $k$ itself), $\mu_k$ is the corresponding step size, and $c_{lk}$ and $a_{lk}$ are the $(l,k)$-th entries of matrices $C$ and $A$, respectively. In particular, $C$ and $A$ should satisfy $\sum_{l=1}^{N} c_{lk} = 1$, $\sum_{l=1}^{N} a_{lk} = 1$, and $c_{lk} = a_{lk} = 0$ if $l \notin \mathcal{N}_k$.
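As a concrete reference point, the standard adapt-then-combine form of the DLMS update in Eq. (3) can be sketched in NumPy. This is a minimal illustration under our own variable names, not the authors' code.

```python
import numpy as np

def dlms_atc_step(w, U, d, mu, C, A, neighbors):
    """One adapt-then-combine diffusion-LMS iteration over all nodes.

    w: (N, M) current estimates; U: (N, M) current input vectors;
    d: (N,) observed outputs; mu: (N,) step sizes;
    C, A: (N, N) combination matrices whose columns sum to one;
    neighbors: list of index arrays, neighbors[k] includes k itself.
    """
    N, M = w.shape
    psi = np.empty_like(w)
    # Adaptation: each node fuses the gradients shared by its neighbors.
    for k in range(N):
        grad = np.zeros(M)
        for l in neighbors[k]:
            grad += C[l, k] * U[l] * (d[l] - U[l] @ w[k])
        psi[k] = w[k] + mu[k] * grad
    # Combination: each node averages its neighbors' intermediate estimates.
    w_new = np.empty_like(w)
    for k in range(N):
        w_new[k] = sum(A[l, k] * psi[l] for l in neighbors[k])
    return w_new
```

On a small fully connected network with uniform weights, repeated calls drive every node's estimate toward $\mathbf{w}^o$.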
Eq. (3) contains two communication procedures. For node $k$ at each iteration $i$, the data $\{d_l(i), \mathbf{u}_{l,i}\}$ are shared by the neighbour nodes for gradient estimation, and the intermediate estimates $\boldsymbol{\psi}_{l,i}$ are then received from the neighbour nodes for combination. When link noise is taken into consideration, the received data are represented as
$$\begin{aligned}
\bar d_{lk}(i) &= d_l(i) + v^{(d)}_{lk}(i) \\
\bar{\mathbf{u}}_{lk,i} &= \mathbf{u}_{l,i} + \mathbf{v}^{(u)}_{lk,i} \\
\bar{\boldsymbol{\psi}}_{lk,i} &= \boldsymbol{\psi}_{l,i} + \mathbf{v}^{(\psi)}_{lk,i}
\end{aligned} \qquad (4)$$

where $\bar d_{lk}(i)$, $\bar{\mathbf{u}}_{lk,i}$, $\bar{\boldsymbol{\psi}}_{lk,i}$ are the noisy data received by node $k$ from its neighbour $l$, and $v^{(d)}_{lk}(i)$, $\mathbf{v}^{(u)}_{lk,i}$, $\mathbf{v}^{(\psi)}_{lk,i}$ are the corresponding link noises with variance $\sigma^2_{d,lk}$ and covariance matrices $\mathbf{R}^{(u)}_{lk}$, $\mathbf{R}^{(\psi)}_{lk}$, respectively. Note that the link-noise terms vanish for $l = k$, since the self weight update does not pass through a noisy link. In the following, for simplicity, the subscripts $lk$ and $l,k$ denote the same notation.
II-B DMTC Algorithm
It is known that DLMS utilizes the following approximation of the global cost function at node $k$ [1, 4]

$$J_k^{glob}(\mathbf{w}) \approx \sum_{l\in\mathcal{N}_k} c_{lk}\, J_l(\mathbf{w}) \qquad (5)$$

where $c_{lk}$ can be obtained from $C$, and the local cost function $J_l(\mathbf{w})$ at node $l$ is the MSE-based cost

$$J_l(\mathbf{w}) = \mathbb{E}\!\left[\big(d_l(i) - \mathbf{u}_{l,i}^{\mathsf T}\mathbf{w}\big)^2\right] \qquad (6)$$
When the information exchange contains noise, using the MSE as the cost function may lead to biased estimation. In particular, during data sharing from node $l$ to node $k$, the input data and output data are perturbed by the noise vector $\mathbf{v}^{(u)}_{lk,i}$ and the noise $v^{(d)}_{lk}(i)$, respectively. Therefore, it is more reasonable to treat the regression model for $l \neq k$ as an EIV model in which all variables are assumed to be perturbed by noise. For EIV regression in distributed networks, the D-GDTLS algorithm has been proposed [11]. Directly applying D-GDTLS to the present problem leads to the following cost function
$$J_{lk}(\mathbf{w}) = \frac{\mathbb{E}\!\left[\big(\bar d_{lk}(i) - \bar{\mathbf{u}}_{lk,i}^{\mathsf T}\mathbf{w}\big)^2\right]}{1 + \|\mathbf{w}\|^2} \qquad (7)$$

where $\|\cdot\|$ denotes the Euclidean norm. Here we use the notation $J_{lk}$ instead of $J_l$ in Eq. (5), since the cost functions for each link may no longer be the same.
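For illustration, the empirical version of the normalized cost in Eq. (7) can be written directly; this is a sketch under our own naming, not the D-GDTLS implementation.

```python
import numpy as np

def gdtls_cost(w, U, d):
    """TLS-style cost of Eq. (7): mean squared residual normalized by
    1 + ||w||^2, which accounts for noise on both inputs and outputs."""
    e = d - U @ w
    return np.mean(e**2) / (1.0 + w @ w)
```

For noiseless, consistent data the cost vanishes at the true weight vector and is larger elsewhere.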
D-GDTLS uses a quadratic cost and thus may not perform well when the link noise contains large outliers. In adaptive filtering, to improve the robustness of EIV regression, an ITL-based criterion called maximum total correntropy (MTC) has been developed [13]. Replacing the quadratic norm in Eq. (7) with the correntropy measure yields the MTC-based utility functions
$$J_{lk}(\mathbf{w}) = \mathbb{E}\!\left[\exp\!\left(-\frac{\big(\bar d_{lk}(i) - \bar{\mathbf{u}}_{lk,i}^{\mathsf T}\mathbf{w}\big)^2}{2\sigma^2\big(1 + \|\mathbf{w}\|^2\big)}\right)\right] \qquad (8)$$

where $\sigma$ is the kernel parameter and the expectation is calculated without outliers.
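For intuition, the empirical version of Eq. (8) can be evaluated directly. The sketch below (our naming, a single shared $\sigma$) shows why the measure is insensitive to a large outlier while the squared error is not.

```python
import numpy as np

def mtc_cost(w, U, d, sigma):
    """Empirical MTC utility of Eq. (8): Gaussian correntropy of the
    TLS-normalized residual; larger is better."""
    e = d - U @ w
    rho = 1.0 + w @ w  # 1 + ||w||^2
    return np.mean(np.exp(-e**2 / (2.0 * sigma**2 * rho)))
```

A single huge outlier contributes at most one vanishing exponential term to the average, so the utility barely moves, whereas the MSE of the same data explodes.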
For the self weight update (i.e. $l = k$), the MSE in Eq. (6) is still used as the cost function. Therefore, based on Eq. (6) and Eq. (8), we derive the diffusion maximum total correntropy (DMTC) algorithm for noisy communications as
$$\begin{aligned}
\boldsymbol{\psi}_{k,i} &= \mathbf{w}_{k,i-1} + \mu_k \sum_{l\in\mathcal{N}_k} c_{lk}\,\mathbf{g}_{lk,i} \\
\mathbf{w}_{k,i} &= \sum_{l\in\mathcal{N}_k} a_{lk}\,\bar{\boldsymbol{\psi}}_{lk,i}
\end{aligned} \qquad (9)$$

where

$$\mathbf{g}_{lk,i} = \begin{cases}
\mathbf{u}_{k,i}\big(d_k(i) - \mathbf{u}_{k,i}^{\mathsf T}\mathbf{w}_{k,i-1}\big), & l = k \\[4pt]
\exp\!\left(-\dfrac{\bar e_{lk}^2(i)}{2\sigma^2\rho_{k,i-1}}\right)\!\left(\dfrac{\bar e_{lk}(i)\,\bar{\mathbf{u}}_{lk,i}}{\rho_{k,i-1}} + \dfrac{\bar e_{lk}^2(i)\,\mathbf{w}_{k,i-1}}{\rho_{k,i-1}^2}\right), & l \neq k
\end{cases}$$

with $\bar e_{lk}(i) = \bar d_{lk}(i) - \bar{\mathbf{u}}_{lk,i}^{\mathsf T}\mathbf{w}_{k,i-1}$ and $\rho_{k,i-1} = 1 + \|\mathbf{w}_{k,i-1}\|^2$. Note that the coefficient $1/\sigma^2$ arising in the derivation of the gradient of Eq. (8) is absorbed into $\mu_k$.
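To make the $l \neq k$ branch concrete, the stochastic ascent direction we read off from differentiating Eq. (8) can be coded as below. This is our sketch (names and factoring are ours), with the $1/\sigma^2$ coefficient absorbed into the step size as noted above; the test checks it against a finite-difference gradient of the single-sample utility.

```python
import numpy as np

def mtc_grad(w, u, d, sigma):
    """Stochastic ascent direction of the MTC utility of Eq. (8) for one
    noisy sample (u, d), with the 1/sigma^2 factor absorbed elsewhere."""
    e = d - u @ w
    rho = 1.0 + w @ w
    weight = np.exp(-e**2 / (2.0 * sigma**2 * rho))  # correntropy weight
    # Outliers make `weight` vanish, so they barely move the estimate.
    return weight * (e * u / rho + e**2 * w / rho**2)
```

The exponential weight is what suppresses outliers: for a wildly wrong $d$ the returned direction is numerically zero.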
II-C Adaptive Combination Rule
To further improve the performance and robustness, we follow the adaptive combination rule of [5, 9] and propose the adaptive combination DMTC (AC-DMTC) algorithm. In particular, the coefficient $a_{lk}$ is replaced by
$$a_{lk}(i) = \frac{\hat\gamma_{lk}^{-2}(i)}{\sum_{m\in\mathcal{N}_k}\hat\gamma_{mk}^{-2}(i)} \qquad (10)$$

where

$$\hat\gamma^2_{lk}(i) = (1-\nu_k)\,\hat\gamma^2_{lk}(i-1) + \nu_k \big\|\bar{\boldsymbol{\psi}}_{lk,i} - \boldsymbol{\phi}_{k,i}\big\|^2 \qquad (11)$$

is a smoothed estimate of the link deviation with a forgetting factor $\nu_k$, and $\boldsymbol{\phi}_{k,i}$ is a local one-step approximation of $\mathbf{w}^o$ defined as [9]

$$\boldsymbol{\phi}_{k,i} = \mathbf{w}_{k,i-1} + \epsilon\,\mathbf{u}_{k,i}\big(d_k(i) - \mathbf{u}_{k,i}^{\mathsf T}\mathbf{w}_{k,i-1}\big) \qquad (12)$$

where $\epsilon$ is a sufficiently small positive value.
One should note that this adaptive combination rule is inherently robust to outliers. In particular, the smoothed estimate in Eq. (11) reduces the negative impact of occasional large outliers. Moreover, the assignment in Eq. (10) also guarantees robustness, since a relatively large $\hat\gamma^2_{lk}(i)$ results in a small $a_{lk}(i)$.
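The two ingredients of the rule can be sketched as follows; the naming is ours, and `gamma2` maps each neighbour index to its smoothed deviation.

```python
import numpy as np

def adaptive_weights(gamma2, neighbors_k):
    """Combination weights in the spirit of Eq. (10) for one node:
    inversely proportional to the smoothed link deviations."""
    inv = {l: 1.0 / gamma2[l] for l in neighbors_k}
    total = sum(inv.values())
    return {l: inv[l] / total for l in neighbors_k}

def update_gamma2(gamma2_prev, psi_bar, phi, nu):
    """Smoothed deviation of Eq. (11) with forgetting factor nu."""
    return (1.0 - nu) * gamma2_prev + nu * np.sum((psi_bar - phi)**2)
```

Weights always sum to one, and a link whose received estimates drift far from the local approximation is automatically down-weighted.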
III Local Convergence Analysis
In this section, we carry out the local convergence analysis of the proposed DMTC algorithm. The analysis is based on the following assumption:
Assumption 1: For an arbitrary node $k$, the elements of the input vector $\mathbf{u}_{k,i}$ are zero-mean i.i.d. Gaussian random processes, and the noise terms $v_k(i)$, $v^{(d)}_{lk}(i)$, $\mathbf{v}^{(u)}_{lk,i}$, $\mathbf{v}^{(\psi)}_{lk,i}$ are zero-mean and independent of $\mathbf{u}_{k,i}$ and of each other.
The above assumption is commonly used in the analysis of diffusion algorithms [1, 2, 3, 4, 5]. Since the calculation under non-Gaussian noise is very hard, the strict analysis of the proposed algorithm is based on the Gaussian noise assumption. The performance under non-Gaussian link noise is briefly discussed and will be verified by simulations. Moreover, similar to [5, 9], during the analysis we regard $A$ as a fixed matrix.
III-A Mean Stability
First we analyze the mean stability of the proposed algorithm. Subtracting both sides of Eq. (9) from $\mathbf{w}^o$, we obtain

$$\begin{aligned}
\tilde{\boldsymbol{\psi}}_{k,i} &= \tilde{\mathbf{w}}_{k,i-1} - \mu_k \sum_{l\in\mathcal{N}_k} c_{lk}\,\mathbf{g}_{lk,i} \\
\tilde{\mathbf{w}}_{k,i} &= \sum_{l\in\mathcal{N}_k} a_{lk}\big(\tilde{\boldsymbol{\psi}}_{l,i} - \mathbf{v}^{(\psi)}_{lk,i}\big)
\end{aligned} \qquad (13)$$

where $\tilde{\mathbf{w}}_{k,i} = \mathbf{w}^o - \mathbf{w}_{k,i}$ and $\tilde{\boldsymbol{\psi}}_{k,i} = \mathbf{w}^o - \boldsymbol{\psi}_{k,i}$ are error vectors. For further analysis, we define the instantaneous gradient error vector as

$$\mathbf{s}_{lk,i} = \mathbf{g}_{lk,i} - \nabla_{\mathbf{w}} J_{lk}(\mathbf{w}_{k,i-1}) \qquad (14)$$
where $\nabla_{\mathbf{w}} J_{lk}$ denotes the gradient of the cost function of link $(l,k)$. Since $J_{lk}$ is twice continuously differentiable in a neighborhood of the line segment between the points $\mathbf{w}_{k,i-1}$ and $\mathbf{w}^o$, we can use Theorem 1.2.1 in [21] and approximate $\nabla_{\mathbf{w}} J_{lk}(\mathbf{w}_{k,i-1})$ as [1]

$$\nabla_{\mathbf{w}} J_{lk}(\mathbf{w}_{k,i-1}) \approx \nabla_{\mathbf{w}} J_{lk}(\mathbf{w}^o) - \mathbf{H}_{lk}\,\tilde{\mathbf{w}}_{k,i-1} \qquad (15)$$

where $\mathbf{H}_{lk}$ is the Hessian matrix of $J_{lk}$ at $\mathbf{w}^o$.
For a Gaussian noise distribution, using the integral method to compute the expectations, one can obtain [13]

$$\nabla_{\mathbf{w}} J_{lk}(\mathbf{w}^o) = \mathbf{0} \qquad (16)$$

$$\mathbf{H}_{lk} = -\theta_{lk}\big(\mathbf{R}_l + \mathbf{R}^{(u)}_{lk}\big) \qquad (17)$$

where $\mathbf{R}_l = \mathbb{E}\big[\mathbf{u}_{l,i}\mathbf{u}_{l,i}^{\mathsf T}\big]$ is the input covariance matrix of node $l$ and $\theta_{lk} > 0$ is a scalar determined by the kernel parameter and the noise statistics. Then, by defining $\tilde{\mathbf{w}}_i = \mathrm{col}\{\tilde{\mathbf{w}}_{1,i},\ldots,\tilde{\mathbf{w}}_{N,i}\}$, $\mathcal{A} = A\otimes\mathbf{I}_M$, $\mathcal{M} = \mathrm{diag}\{\mu_1\mathbf{I}_M,\ldots,\mu_N\mathbf{I}_M\}$, and $\mathcal{D} = \mathrm{diag}\{\mathcal{D}_1,\ldots,\mathcal{D}_N\}$ with $\mathcal{D}_k = \sum_{l\in\mathcal{N}_k} c_{lk}(-\mathbf{H}_{lk})$, and substituting Eq. (14)-(17) into Eq. (13) yields
$$\tilde{\mathbf{w}}_i = \mathcal{A}^{\mathsf T}\big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big)\tilde{\mathbf{w}}_{i-1} - \mathcal{A}^{\mathsf T}\mathcal{M}\,\mathbf{s}_i - \bar{\mathbf{v}}_i \qquad (18)$$

where $\mathbf{s}_i = \mathrm{col}\big\{\sum_{l\in\mathcal{N}_k} c_{lk}\,\mathbf{s}_{lk,i}\big\}_{k=1}^N$ and $\bar{\mathbf{v}}_i = \mathrm{col}\big\{\sum_{l\in\mathcal{N}_k} a_{lk}\,\mathbf{v}^{(\psi)}_{lk,i}\big\}_{k=1}^N$.
Under Assumption 1, taking the expectation of both sides and noting that the gradient-error and link-noise terms are zero-mean, we get

$$\mathbb{E}\big[\tilde{\mathbf{w}}_i\big] = \mathcal{A}^{\mathsf T}\big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big)\,\mathbb{E}\big[\tilde{\mathbf{w}}_{i-1}\big] \qquad (19)$$
Therefore, to ensure stability, the matrix $\mathcal{A}^{\mathsf T}(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D})$ should be stable, i.e. $\rho\big(\mathcal{A}^{\mathsf T}(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D})\big) < 1$, where $\rho(\cdot)$ is the spectral radius operator. Since the block maximum norm of $\mathcal{A}^{\mathsf T}$ equals one, we obtain $\rho\big(\mathcal{A}^{\mathsf T}(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D})\big) \le \max_k \rho\big(\mathbf{I}_M - \mu_k\mathcal{D}_k\big)$. Thus the magnitudes of all the eigenvalues of the matrices $\mathbf{I}_M - \mu_k\mathcal{D}_k$ must be less than unity to ensure the local convergence. After some algebra, one can conclude that the step size should satisfy the following condition to ensure convergence:

$$0 < \mu_k < \frac{2}{\lambda_{\max}(\mathcal{D}_k)} \qquad (20)$$
When Eq. (20) is satisfied, $\mathbb{E}[\tilde{\mathbf{w}}_i]$ tends to zero as $i \to \infty$; that is, under Gaussian noise, the estimate of the proposed DMTC algorithm is asymptotically unbiased.
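The step-size condition can be sanity-checked numerically. The sketch below uses synthetic data under our own notation (a random left-stochastic matrix and random positive-definite curvature blocks, not the paper's quantities): picking step sizes inside the bound makes the mean-recursion matrix stable.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 3

# Left-stochastic combination matrix: nonnegative, columns sum to one.
A = rng.random((N, N))
A /= A.sum(axis=0, keepdims=True)

# Positive-definite blocks standing in for the weighted Hessian sums D_k.
D_blocks = []
for _ in range(N):
    X = rng.standard_normal((M, M))
    D_blocks.append(X @ X.T + np.eye(M))

# Step sizes inside the bound mu_k < 2 / lambda_max(D_k).
mu = [1.0 / np.linalg.eigvalsh(Dk)[-1] for Dk in D_blocks]

# Assemble A^T (I - M D) and compute its spectral radius.
calA = np.kron(A.T, np.eye(M))
calMD = np.zeros((N * M, N * M))
for k in range(N):
    calMD[k * M:(k + 1) * M, k * M:(k + 1) * M] = mu[k] * D_blocks[k]
B = calA @ (np.eye(N * M) - calMD)
rho = max(abs(np.linalg.eigvals(B)))
```

Here `rho < 1` holds because the block maximum norm of the left-stochastic factor is one while each block $\mathbf{I}_M - \mu_k\mathcal{D}_k$ is a contraction.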
Remark 1: When the Gaussian noise is contaminated with occasional large outliers, the Gaussian function ensures that large outliers have little influence on the local-maximum property of the local cost function $J_{lk}$. Therefore, DMTC can efficiently alleviate the bad influence of large outliers. Moreover, since the MTC solution is only a local maximum, to guarantee a globally optimal solution one can first train the adaptive filter with D-GDTLS, or with DMTC using a sufficiently large $\sigma$, so that the solution starts near the global optimum [13].
III-B Mean-Square Stability and Steady-State Performance
In the vicinity of the local maximum, the gradient error in Eq. (14) can be approximated as

$$\mathbf{s}_{lk,i} \approx \mathbf{g}_{lk,i}\big|_{\mathbf{w}_{k,i-1}=\mathbf{w}^o} \qquad (21)$$

i.e. it is evaluated at $\mathbf{w}^o$ and is thus independent of $\tilde{\mathbf{w}}_{k,i-1}$. Then, squaring both sides of Eq. (18) gives the following mean energy conservation relation [1]

$$\mathbb{E}\|\tilde{\mathbf{w}}_i\|^2_{\boldsymbol{\Sigma}} = \mathbb{E}\|\tilde{\mathbf{w}}_{i-1}\|^2_{\boldsymbol{\Sigma}'} + \mathbb{E}\big[\mathbf{s}_i^{\mathsf T}\mathcal{M}\mathcal{A}\boldsymbol{\Sigma}\mathcal{A}^{\mathsf T}\mathcal{M}\,\mathbf{s}_i\big] + \mathbb{E}\big[\bar{\mathbf{v}}_i^{\mathsf T}\boldsymbol{\Sigma}\,\bar{\mathbf{v}}_i\big] \qquad (22)$$

where $\boldsymbol{\Sigma}$ is an arbitrary positive symmetric matrix, $\|\mathbf{x}\|^2_{\boldsymbol{\Sigma}}$ denotes $\mathbf{x}^{\mathsf T}\boldsymbol{\Sigma}\mathbf{x}$ for an arbitrary matrix $\boldsymbol{\Sigma}$, and

$$\boldsymbol{\Sigma}' = \big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big)^{\mathsf T}\mathcal{A}\boldsymbol{\Sigma}\mathcal{A}^{\mathsf T}\big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big) \qquad (23)$$
After some algebra, one can obtain the following mean-square relation

$$\mathbb{E}\|\tilde{\mathbf{w}}_i\|^2_{\boldsymbol{\sigma}} = \mathbb{E}\|\tilde{\mathbf{w}}_{i-1}\|^2_{\mathcal{F}\boldsymbol{\sigma}} + \mathbf{r}^{\mathsf T}\boldsymbol{\sigma} \qquad (24)$$

where $\boldsymbol{\sigma} = \mathrm{vec}(\boldsymbol{\Sigma})$, $\mathbf{r} = \mathrm{vec}\big(\mathcal{A}^{\mathsf T}\mathcal{M}\,\mathbb{E}[\mathbf{s}_i\mathbf{s}_i^{\mathsf T}]\,\mathcal{M}\mathcal{A} + \mathbb{E}[\bar{\mathbf{v}}_i\bar{\mathbf{v}}_i^{\mathsf T}]\big)$, and

$$\mathcal{F} = \big[\big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big)^{\mathsf T}\mathcal{A}\big] \otimes \big[\big(\mathbf{I}_{NM} - \mathcal{M}\mathcal{D}\big)^{\mathsf T}\mathcal{A}\big] \qquad (25)$$

where $\mathrm{vec}(\cdot)$ is the matrix vectorization operator, and $\|\cdot\|^2_{\boldsymbol{\sigma}}$ and $\|\cdot\|^2_{\boldsymbol{\Sigma}}$ denote the same quantity. When Eq. (20) is satisfied, all eigenvalues of $\mathcal{F}$ are also less than one in magnitude, and the mean-square stability of the recursion in Eq. (24) is guaranteed. Thus the step-size condition in Eq. (20) guarantees both the mean and mean-square stability.
When the iteration number is large enough ($i \to \infty$), we can obtain the steady-state MSD as

$$\mathrm{MSD} = \frac{1}{N}\,\mathbf{r}^{\mathsf T}\big(\mathbf{I} - \mathcal{F}\big)^{-1}\mathrm{vec}\big(\mathbf{I}_{NM}\big) \qquad (26)$$
Remark 2: One can observe that the formulation of Eq. (26) is similar to the MSD derived for DLMS without data sharing [5]. Thus, similar to the analysis in [5], one should minimize the upper bound of the MSD to achieve better performance. This fact theoretically justifies using the adaptive combination rule for DMTC.
IV Simulation Results
In this section, we present simulation results to illustrate the performance of the proposed algorithm. Consider a distributed network with 20 nodes and a fixed desired weight vector $\mathbf{w}^o$, where each node is linked to 3 nodes on average. Each element of the input vectors is zero-mean Gaussian with unit variance. The Gaussian mixture model (GMM) is used as the distribution of the link noise, with probability density function (PDF)

$$p_v(x) = (1-c)\,\mathcal{N}\big(x; 0, \sigma_1^2\big) + c\,\mathcal{N}\big(x; 0, \sigma_2^2\big)$$

where $\sigma_1^2$ stands for the variance of the original noise, and $\sigma_2^2$ is set to a large value to represent the distribution of large outliers. The parameter $c$ controls the occurrence probability of large outliers.

Several algorithms are used for comparison, including DLMS-based diffusion algorithms (DLMS, AC-DLMS, and AC-DLMS without data sharing) and DMTC-based diffusion algorithms (AC-DMTC, and DMTC with only data sharing). Non-cooperative LMS and diffusion MCC (DMCC) [19] are also added for comparison. For the adaptive combination rule, the forgetting factor is set to 0.05 and $\epsilon$ is set to a small positive value. For the DMTC-based diffusion algorithms, unless otherwise mentioned, the kernel parameters $\sigma$ are set to fixed values for the Gaussian and the GMM noise cases, respectively. The kernel widths of DMCC are chosen so that it achieves a similar measurement of the residuals to DMTC at steady state. The MSD is averaged over 1000 Monte Carlo runs. For all algorithms, Metropolis weights [3] are used in both the adaptation and combination steps. The initial weight vectors are zero vectors.
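The GMM link noise described above can be generated as follows; the parameter names (`sigma1`, `sigma2`, `p`) are ours, and the values in the usage below are illustrative rather than the paper's settings.

```python
import numpy as np

def gmm_noise(size, sigma1, sigma2, p, rng):
    """Gaussian-mixture link noise: with probability 1-p draw from the
    nominal N(0, sigma1^2), with probability p from the large-variance
    outlier component N(0, sigma2^2)."""
    outlier = rng.random(size) < p
    std = np.where(outlier, sigma2, sigma1)
    return std * rng.standard_normal(size)
```

The resulting variance is $(1-p)\sigma_1^2 + p\sigma_2^2$, but the samples are heavy-tailed: most draws stay near zero while a small fraction are large outliers.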
First we compare the convergence performance of the algorithms. The observation noise variance is set to 0.1 for all nodes, and all link noises follow the same distribution. In the first 2000 iterations, Gaussian link noise is used, while afterwards GMM link noise is added. The step sizes of the algorithms are adjusted so that all algorithms have almost the same initial convergence speed. The average learning curves of the different algorithms are shown in Fig. 1. One can observe that the DMTC-based algorithms achieve better results than the others under both Gaussian and non-Gaussian noise. Note that when large outliers occur, the performance of all diffusion methods degrades.
Second, we investigate the performance under different link noise settings. The algorithm settings are the same as in the previous simulation. The MSD curves for different noise variances are shown in Fig. 2. One can see that the proposed AC-DMTC algorithm achieves the best performance in all noise environments. In particular, Fig. 2(a) shows that using the EIV model achieves better performance, especially when the link noise variance is large; Fig. 2(b) confirms the robustness of correntropy; and Fig. 2(c) illustrates that the adaptive combination rule can efficiently alleviate the bad influence of large outliers.
Finally, the sensitivity to the kernel parameter $\sigma$ is investigated. The GMM noise is set the same as in the first simulation, and the kernel parameters for the shared links are all set to the same value $\sigma$. The MSD curves versus $\sigma$ are shown in Fig. 3. It can be seen that the MSD increases as $\sigma$ grows large. Moreover, a small $\sigma$ slows down the convergence, so one should select a proper value of $\sigma$ to trade off convergence speed against steady-state MSD.
V Conclusion
In this work we propose a new algorithm, diffusion maximum total correntropy (DMTC), which aims to improve both the accuracy and robustness of distributed estimation over networks with noisy links. Taking advantage of correntropy and the EIV model, the proposed algorithm is theoretically unbiased under Gaussian noise and can efficiently handle noise containing large outliers. Further, an adaptive combination rule is also utilized. Simulation results confirm that the proposed algorithm achieves excellent performance under both Gaussian and non-Gaussian noise.
References
- [1] J. Chen and A. H. Sayed, “Diffusion adaptation strategies for distributed optimization and learning over networks,” IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4289–4305, 2012.
- [2] A. H. Sayed, “Diffusion adaptation over networks,” Academic Press Library in Signal Processing, vol. 3, pp. 323–454, 2013.
- [3] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3122–3136, 2008.
- [4] F. S. Cattivelli and A. H. Sayed, “Diffusion LMS strategies for distributed estimation,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1035–1048, 2010.
- [5] X. Zhao, S. Y. Tu, and A. H. Sayed, “Diffusion adaptation over networks under imperfect information exchange and non-stationary data,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3460–3475, 2012.
- [6] A. Khalili, A. Rastegarnia, and J. A. Chambers, “Steady-state analysis of diffusion LMS adaptive networks with noisy links,” IEEE Transactions on Signal Processing, vol. 60, no. 2, pp. 974–979, 2012.
- [7] R. Abdolee and B. Champagne, “Diffusion LMS algorithms for sensor networks over non-ideal inter-sensor wireless channels,” in International Conference on Distributed Computing in Sensor Systems and Workshops, 2011, pp. 1–6.
- [8] X. Zhao and A. H. Sayed, “Clustering via diffusion adaptation over networks,” in International Workshop on Cognitive Information Processing, 2012, pp. 1–6.
- [9] R. Nassif, C. Richard, J. Chen, A. Ferrari, and A. H. Sayed, “Diffusion LMS over multitask networks with noisy links,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2016, pp. 4583–4587.
- [10] R. Abdolee and B. Champagne, “Diffusion LMS strategies in sensor networks with noisy input data,” IEEE/ACM Transactions on Networking, vol. 24, no. 1, pp. 3–14, 2016.
- [11] R. Arablouei, S. Werner, and K. Doğançay, “Diffusion-based distributed adaptive estimation utilizing gradient-descent total least-squares,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 5308–5312.
- [12] S. Huang and C. Li, “Distributed sparse total least-squares over networks,” IEEE Transactions on Signal Processing, vol. 63, no. 11, pp. 2986–2998, 2015.
- [13] F. Wang, Y. He, S. Wang, and B. Chen, “Maximum total correntropy adaptive filtering against heavy-tailed noises,” Signal Processing, 2017.
- [14] W. Liu, P. P. Pokharel, and J. C. Príncipe, “Correntropy: properties and applications in non-Gaussian signal processing,” IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286–5298, 2007.
- [15] B. Chen, L. Xing, J. Liang, N. Zheng, and J. C. Principe, “Steady-state mean-square error analysis for adaptive filtering under the maximum correntropy criterion,” IEEE Signal Processing Letters, vol. 21, no. 7, pp. 880–884, 2014.
- [16] B. Chen and J. C. Príncipe, “Maximum correntropy estimation is a smoothed MAP estimation,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 491–494, 2012.
- [17] Z. Wu, S. Peng, B. Chen, and H. Zhao, “Robust Hammerstein adaptive filtering under maximum correntropy criterion,” Entropy, vol. 17, no. 10, pp. 7149–7166, 2015.
- [18] B. Chen, J. Wang, H. Zhao, and N. Zheng, “Convergence of a fixed-point algorithm under maximum correntropy criterion,” IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1723–1727, 2015.
- [19] W. Ma, B. Chen, J. Duan, and H. Zhao, “Diffusion maximum correntropy criterion algorithms for robust distributed estimation,” Digital Signal Processing, vol. 58, pp. 10–19, 2016.
- [20] I. Markovsky and S. V. Huffel, “Overview of total least-squares methods,” Signal Processing, vol. 87, no. 10, pp. 2283–2302, 2007.
- [21] C. T. Kelley, “Iterative methods for optimization,” Frontiers in Applied Mathematics, vol. 41, no. 9, pp. 878–878, 2010.