# A Complete Transient Analysis for the Incremental LMS Algorithm

The incremental least mean square (ILMS) algorithm was presented in <cit.>. The article included a theoretical analysis of the algorithm along with simulation results under different scenarios; however, the transient analysis was left incomplete. This work presents the complete transient analysis, including the learning behavior. The analysis is verified through several experiments.


## 1 Introduction

Over the last decade, several algorithms have been proposed for distributed estimation over wireless sensor networks. Different algorithms target different goals. However, the most significant distinguishing feature among these algorithms has been the basic distribution scheme used to develop them: nearly all are based on either the incremental scheme or the diffusion scheme [1].

The incremental scheme was first introduced in [2]. The network is arranged as a Hamiltonian cycle, with each node connected to two other nodes in the cycle. The resulting algorithm converges more quickly than the diffusion scheme and also gives lower misadjustment. The diffusion scheme, however, offers one significant advantage over the incremental scheme: its theoretical analysis is more tractable [3]. Since the analysis of the diffusion scheme is easier to perform, most subsequent algorithms have been based on it [1].

The analysis of the incremental scheme is not easy to perform, and the analysis presented in [2] is incomplete. The transient analysis in [2] proceeds only to a certain point before the focus shifts to steady-state results, because the learning behavior of the incremental scheme is not straightforward. As a result, subsequent algorithms based on the ILMS algorithm of [2] follow the same process and also leave the transient analysis incomplete. This work presents the complete transient analysis for the ILMS algorithm, including the learning behavior. The findings are validated through simulations designed to test all aspects of the analysis.

The rest of the paper is organized as follows. Section 2 presents the ILMS algorithm. Section 3 presents the mean-square analysis in detail, including the transient and steady-state analysis. Experimental results are presented in Section 4 and Section 5 gives the conclusion.

## 2 The ILMS Algorithm

A collection of sensor nodes spread over a geographical area is considered. The nodes are connected in a Hamiltonian cycle [2]. Fig. 1 shows an illustration of a possible adaptive wireless sensor network. Connecting the nodes in this cyclic way results in the incremental strategy.

The unknown parameters are modeled as a vector $w^o$ of size $(M \times 1)$. The input to node $k$ at any given time instant $i$ is a $(1 \times M)$ regressor vector $u_k(i)$, where $k$ is the node index. The resulting observed output for the node is a noise-corrupted scalar $d_k(i)$, given by

$$d_k(i) = u_k(i) w^o + v_k(i), \qquad (1)$$

where $v_k(i)$ is the zero-mean additive noise with variance $\sigma^2_{v,k}$.
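For illustration, the data model (1) can be sketched in Python/NumPy; the filter length, random seed, and noise variance below are assumed example values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, illustrative only
M = 4                                # filter length (example value)
w_o = rng.standard_normal((M, 1))    # unknown parameter vector w^o

def observe(sigma_v2=1e-3):
    """Draw one (regressor, observation) pair per the model d = u w^o + v."""
    u = rng.standard_normal((1, M))                 # 1 x M regressor u_k(i)
    v = np.sqrt(sigma_v2) * rng.standard_normal()   # zero-mean noise v_k(i)
    d = float(u @ w_o) + v                          # Eq. (1)
    return u, d
```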

The incremental least mean square (ILMS) algorithm is given by [2]

$$w_k(i) = w_{k-1}(i) + \mu_k e_k(i) u_k^T(i), \qquad (2)$$
$$w(i) = w_N(i), \qquad (3)$$

where $w_k(i)$ is the estimate of the unknown vector at time instant $i$ for node $k$, $\mu_k$ is the step size for node $k$, $e_k(i) = d_k(i) - u_k(i) w_{k-1}(i)$ is the instantaneous error, $w(i)$ is the final estimate for iteration $i$ and $(\cdot)^T$ is the transpose operator.
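The update (2)-(3) amounts to passing the estimate once around the ring per time instant. The following is a minimal NumPy sketch, with illustrative network size and step size, and noiseless data so that convergence to $w^o$ is visible:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, mu = 4, 10, 0.05               # filter length, nodes, step size (examples)
w_o = rng.standard_normal((M, 1))    # unknown vector to estimate

def ilms_cycle(w, data, mu=mu):
    """One incremental pass: node k refines node (k-1)'s estimate, Eq. (2)."""
    for u_k, d_k in data:
        e_k = d_k - float(u_k @ w)   # instantaneous error e_k(i)
        w = w + mu * e_k * u_k.T     # w_k(i) = w_{k-1}(i) + mu e_k(i) u_k^T(i)
    return w                         # w(i) = w_N(i), Eq. (3)

# noiseless data: the cyclic estimate converges to w^o
w = np.zeros((M, 1))
for i in range(300):
    data = []
    for _ in range(N):
        u = rng.standard_normal((1, M))
        data.append((u, float(u @ w_o)))   # noiseless observations
    w = ilms_cycle(w, data)
```

With noise present, the estimate would instead fluctuate around $w^o$ with a misadjustment governed by the analysis in Section 3.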

## 3 Proposed analysis

Let the weight-error vector be given by

$$\tilde{w}(i) = w^o - w(i). \qquad (4)$$

Using (4) in (2) and simplifying gives

$$\tilde{w}_k(i) = \left[I_M - \mu_k u_k^T(i) u_k(i)\right] \tilde{w}_{k-1}(i) - \mu_k u_k^T(i) v_k(i), \qquad (5)$$

where $I_M$ is an identity matrix of size $M$. The autocorrelation matrix of the input regressor vector is given by $R_{u,k} = E\left[u_k^T(i) u_k(i)\right]$, where $E[\cdot]$ is the expectation operator.

$R_{u,k}$ can be decomposed into its component matrices of eigenvalues and eigenvectors. Thus, $R_{u,k} = H_k \Lambda_k H_k^T$, where $H_k$ is the matrix of eigenvectors such that $H_k H_k^T = H_k^T H_k = I_M$ and $\Lambda_k$ is a diagonal matrix containing the eigenvalues. Using the matrix $H_k$, the following transformations are made:

$$\bar{w}_k(i) = H_k^T \tilde{w}_k(i), \qquad \bar{u}_k(i) = u_k(i) H_k.$$

The weight-error update equation thus becomes

$$\bar{w}_k(i) = \left[I_M - \mu_k \bar{u}_k^T(i) \bar{u}_k(i)\right] \bar{w}_{k-1}(i) - \mu_k \bar{u}_k^T(i) v_k(i). \qquad (6)$$

### 3.1 Mean Analysis

Applying the expectation operator to (6) gives

$$E\left[\bar{w}_k(i)\right] = E\left[\left\{I_M - \mu_k \bar{u}_k^T(i)\bar{u}_k(i)\right\}\bar{w}_{k-1}(i) - \mu_k \bar{u}_k^T(i) v_k(i)\right] = \left\{I_M - \mu_k \Lambda_k\right\} E\left[\bar{w}_{k-1}(i)\right], \qquad (7)$$

where the data independence assumption separates $\bar{u}_k(i)$ from the rest of the variables. The second term is zero since the additive noise is independent and zero-mean, and $E\left[\bar{u}_k^T(i)\bar{u}_k(i)\right] = \Lambda_k$. The sufficient condition for stability is evaluated from (7) and is given by

$$0 < \mu_k < \frac{2}{\beta_{k,\max}}, \qquad (8)$$

where $\beta_{k,\max}$ is the maximum eigenvalue of $\Lambda_k$.
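As a brief numerical illustration (not from the paper), the bound (8) can be checked for an assumed input covariance; the exponentially correlated matrix below is an example choice:

```python
import numpy as np

M, rho = 4, 0.5
# example input covariance with exponential correlation (illustrative choice)
R = rho ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))

lam = np.linalg.eigvalsh(R)          # eigenvalues, i.e. the diagonal of Lambda_k
beta_max = lam.max()
mu_bound = 2.0 / beta_max            # Eq. (8): 0 < mu_k < 2 / beta_{k,max}

# any step size inside the bound makes the mean recursion (7) a contraction
mu = 0.5 * mu_bound
assert np.all(np.abs(1.0 - mu * lam) < 1.0)
```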

### 3.2 Mean-Square Analysis

For the mean-square analysis, the approach of [2] is followed. However, the analysis in [2] was not completed to give the learning behavior of the algorithm; here, the learning behavior is also given in closed form. Taking the squared weighted norm of (6) and applying the expectation operator yields

$$\begin{aligned} E\left[\left\|\bar{w}_k(i)\right\|^2_{\Sigma_k}\right] &= E\left[\bar{w}_{k-1}^T(i)\,\Sigma'_k\,\bar{w}_{k-1}(i)\right] + E\left[\mu_k^2 v_k^2(i)\,\bar{u}_k(i)\Sigma_k\bar{u}_k^T(i)\right] \\ &\quad - E\left[\mu_k v_k(i)\,\bar{u}_k(i)\Sigma_k\left\{I_M - \mu_k\bar{u}_k^T(i)\bar{u}_k(i)\right\}\bar{w}_{k-1}(i)\right] \\ &\quad - E\left[\bar{w}_{k-1}^T(i)\left\{I_M - \mu_k\bar{u}_k^T(i)\bar{u}_k(i)\right\}^T\Sigma_k\,\mu_k v_k(i)\bar{u}_k^T(i)\right], \end{aligned} \qquad (9)$$

where $\|\cdot\|^2_{\Sigma}$ is the squared weighted norm and $\Sigma_k$ is a weighting matrix. The weighting matrix $\Sigma'_k$ is given by

$$\begin{aligned} \Sigma'_k &= \left\{I_M - \mu_k\bar{u}_k^T(i)\bar{u}_k(i)\right\}^T \Sigma_k \left\{I_M - \mu_k\bar{u}_k^T(i)\bar{u}_k(i)\right\} \\ &= \Sigma_k - \mu_k\bar{u}_k^T(i)\bar{u}_k(i)\Sigma_k - \mu_k\Sigma_k\bar{u}_k^T(i)\bar{u}_k(i) + \mu_k^2\bar{u}_k^T(i)\bar{u}_k(i)\Sigma_k\bar{u}_k^T(i)\bar{u}_k(i). \end{aligned} \qquad (10)$$

The last two terms in (9) are zero since the additive noise is independent. Using the data independence assumption, the remaining two terms are simplified as

$$E\left[\left\|\bar{w}_k(i)\right\|^2_{\Sigma_k}\right] = E\left[\left\|\bar{w}_{k-1}(i)\right\|^2_{\Sigma'_k}\right] + \sigma^2_{v,k}\mu_k^2\,\mathrm{Tr}\left\{\Sigma_k\Lambda_k\right\}, \qquad (11)$$

where $\sigma^2_{v,k}$ is the additive noise variance at node $k$, $\mathrm{Tr}\{\cdot\}$ is the trace operator and $\Lambda_k = E\left[\bar{u}_k^T(i)\bar{u}_k(i)\right]$.

The expectation operator in the first term of (11) applies to the weighting matrix independently as well, since the data is assumed to be independent [4]. Thus, after simplification,

$$\Sigma'_k = \Sigma_k - 2\mu_k\Lambda_k\Sigma_k + \mu_k^2\Lambda_k\,\mathrm{Tr}\left[\Sigma_k\Lambda_k\right] + \mu_k^2\Lambda_k\Sigma_k\Lambda_k. \qquad (12)$$

Using the $\mathrm{diag}\{\cdot\}$ operator, (11) is further simplified to

$$E\left[\left\|\bar{w}_k(i)\right\|^2_{\sigma_k}\right] = E\left[\left\|\bar{w}_{k-1}(i)\right\|^2_{F_k\sigma_k}\right] + \sigma^2_{v,k}\mu_k^2\lambda_k^T\sigma_k, \qquad (13)$$

where $\sigma_k = \mathrm{diag}\{\Sigma_k\}$, $\lambda_k = \mathrm{diag}\{\Lambda_k\}$ and $\sigma'_k = F_k\sigma_k$, with $F_k$ given by

$$F_k = I_M - 2\mu_k\Lambda_k + \mu_k^2\left[\Lambda_k^2 + \lambda_k\lambda_k^T\right]. \qquad (14)$$
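As a quick sanity check, $F_k$ from (14) can be assembled directly in NumPy; the step size and eigenvalues below are illustrative values, not from the paper:

```python
import numpy as np

def make_F(mu_k, lam_k):
    """F_k = I_M - 2 mu_k Lambda_k + mu_k^2 (Lambda_k^2 + lam_k lam_k^T), Eq. (14)."""
    M = lam_k.size
    Lam = np.diag(lam_k)
    return np.eye(M) - 2.0 * mu_k * Lam + mu_k**2 * (Lam @ Lam + np.outer(lam_k, lam_k))

lam_k = np.array([1.0, 0.5, 0.25])   # example eigenvalues of Lambda_k
F_k = make_F(0.1, lam_k)             # fixed for all iterations at this node
```

Note that $F_k$ depends only on $\mu_k$ and $\Lambda_k$, which is why it stays constant over iterations while varying from node to node.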

Now, it can be seen from (14) that $F_k$ remains fixed at every iteration, even though it varies from node to node depending on the individual values of the parameters. Thus, using (13), and assuming the weight estimate is initialized to zero, the analysis is initialized as

$$E\left[\left\|\bar{w}(0)\right\|^2_{\sigma}\right] = \left\|w^o\right\|^2_{\sigma}.$$

The first iterative update, for node 1, is given by

$$E\left[\left\|\bar{w}_1(1)\right\|^2_{\sigma_1}\right] = E\left[\left\|\bar{w}(0)\right\|^2_{F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 = \left\|w^o\right\|^2_{F_1\sigma_1} + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1.$$

Similarly, the first update for node 2 is given by

$$E\left[\left\|\bar{w}_2(1)\right\|^2_{\sigma_2}\right] = E\left[\left\|\bar{w}_1(1)\right\|^2_{F_2\sigma_2}\right] + \sigma^2_{v,2}\mu_2^2\lambda_2^T\sigma_2 = \left\|w^o\right\|^2_{F_1F_2\sigma_2} + \sigma^2_{v,1}\mu_1^2\lambda_1^T F_2\sigma_2 + \sigma^2_{v,2}\mu_2^2\lambda_2^T\sigma_2.$$

Continuing for node 3 gives

$$\begin{aligned} E\left[\left\|\bar{w}_3(1)\right\|^2_{\sigma_3}\right] &= E\left[\left\|\bar{w}_2(1)\right\|^2_{F_3\sigma_3}\right] + \sigma^2_{v,3}\mu_3^2\lambda_3^T\sigma_3 \\ &= \left\|w^o\right\|^2_{F_1F_2F_3\sigma_3} + \sigma^2_{v,1}\mu_1^2\lambda_1^T F_2F_3\sigma_3 + \sigma^2_{v,2}\mu_2^2\lambda_2^T F_3\sigma_3 + \sigma^2_{v,3}\mu_3^2\lambda_3^T\sigma_3 \\ &= \left\|w^o\right\|^2_{\bar{F}_3\sigma_3} + B_3\sigma_3, \end{aligned}$$

where

$$\bar{F}_3 = F_1F_2F_3 = \prod_{m=1}^{3} F_m, \qquad B_3 = \sigma^2_{v,1}\mu_1^2\lambda_1^T F_2F_3 + \sigma^2_{v,2}\mu_2^2\lambda_2^T F_3 + \sigma^2_{v,3}\mu_3^2\lambda_3^T I_M = \sum_{m=1}^{2}\sigma^2_{v,m}\mu_m^2\lambda_m^T\left\{\prod_{n=m+1}^{3} F_n\right\} + \sigma^2_{v,3}\mu_3^2\lambda_3^T I_M.$$

Thus, for node $k$, the first update is

$$E\left[\left\|\bar{w}_k(1)\right\|^2_{\sigma_k}\right] = \left\|w^o\right\|^2_{\bar{F}_k\sigma_k} + B_k\sigma_k,$$

where

$$\bar{F}_k = \prod_{m=1}^{k} F_m, \qquad B_k = \sum_{m=1}^{k-1}\sigma^2_{v,m}\mu_m^2\lambda_m^T\left\{\prod_{n=m+1}^{k} F_n\right\} + \sigma^2_{v,k}\mu_k^2\lambda_k^T I_M.$$

Finally, the first iteration ends with node $N$ and the final update is given by

$$E\left[\left\|\bar{w}_N(1)\right\|^2_{\sigma_N}\right] = \left\|w^o\right\|^2_{\bar{F}_N\sigma_N} + B_N\sigma_N.$$

Moving to iteration 2, the update for node 1 is given by

$$E\left[\left\|\bar{w}_1(2)\right\|^2_{\sigma_1}\right] = E\left[\left\|\bar{w}_N(1)\right\|^2_{F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 = \left\|w^o\right\|^2_{\bar{F}_N F_1\sigma_1} + B_N F_1\sigma_1 + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1.$$

For node 2, we have

$$E\left[\left\|\bar{w}_2(2)\right\|^2_{\sigma_2}\right] = E\left[\left\|\bar{w}_1(2)\right\|^2_{F_2\sigma_2}\right] + \sigma^2_{v,2}\mu_2^2\lambda_2^T\sigma_2 = \left\|w^o\right\|^2_{\bar{F}_N\bar{F}_2\sigma_2} + \left[B_N\bar{F}_2 + B_2\right]\sigma_2.$$

Continuing for node $k$, we get

$$E\left[\left\|\bar{w}_k(2)\right\|^2_{\sigma_k}\right] = \left\|w^o\right\|^2_{\bar{F}_N\bar{F}_k\sigma_k} + \left[B_N\bar{F}_k + B_k\right]\sigma_k.$$

The final update for iteration 2 is given by

$$E\left[\left\|\bar{w}_N(2)\right\|^2_{\sigma_N}\right] = \left\|w^o\right\|^2_{\bar{F}_N^2\sigma_N} + B_N\left[\bar{F}_N + I_M\right]\sigma_N.$$

Before moving on, we need to link the updates of iteration 2 back to the first update of node $N$. Beginning with node 1,

$$\begin{aligned} E\left[\left\|\bar{w}_1(2)\right\|^2_{\sigma_1}\right] &= E\left[\left\|\bar{w}_N(1)\right\|^2_{F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 \\ &= E\left[\left\|\bar{w}_{N-1}(1)\right\|^2_{F_N F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 + \sigma^2_{v,N}\mu_N^2\lambda_N^T F_1\sigma_1 \\ &= E\left[\left\|\bar{w}_{N-2}(1)\right\|^2_{F_{N-1}F_N F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 + \sigma^2_{v,N}\mu_N^2\lambda_N^T F_1\sigma_1 + \sigma^2_{v,N-1}\mu_{N-1}^2\lambda_{N-1}^T F_N F_1\sigma_1 \\ &\;\;\vdots \\ &= E\left[\left\|\bar{w}_1(1)\right\|^2_{F_2\cdots F_N F_1\sigma_1}\right] + \sigma^2_{v,1}\mu_1^2\lambda_1^T\sigma_1 + \sigma^2_{v,N}\mu_N^2\lambda_N^T F_1\sigma_1 + \cdots + \sigma^2_{v,2}\mu_2^2\lambda_2^T\left\{\prod_{m=3}^{N} F_m\right\}F_1\sigma_1 \\ &= \left\|w^o\right\|^2_{\bar{F}_N F_1\sigma_1} + \left(B_N F_1 + \sigma^2_{v,1}\mu_1^2\lambda_1^T I_M\right)\sigma_1. \end{aligned}$$

Similarly, for node 2, we get, after simplification,

$$E\left[\left\|\bar{w}_2(2)\right\|^2_{\sigma_2}\right] = \left\|w^o\right\|^2_{\bar{F}_N\bar{F}_2\sigma_2} + \left(B_N\bar{F}_2 + B_2\right)\sigma_2.$$

Generalizing for node $k$ gives

$$E\left[\left\|\bar{w}_k(2)\right\|^2_{\sigma_k}\right] = \left\|w^o\right\|^2_{\bar{F}_N\bar{F}_k\sigma_k} + \left(B_N\bar{F}_k + B_k\right)\sigma_k.$$

The final update for iteration 2 is thus given by

$$E\left[\left\|\bar{w}_N(2)\right\|^2_{\sigma_N}\right] = \left\|w^o\right\|^2_{\bar{F}_N^2\sigma_N} + B_N\left(\bar{F}_N + I_M\right)\sigma_N.$$

Since the final update for each iteration is given by the update of node $N$, we focus on node $N$ only for now. Continuing for node $N$, the update for iteration $i$ is given by

$$E\left[\left\|\bar{w}_N(i)\right\|^2_{\sigma_N}\right] = \left\|w^o\right\|^2_{\bar{F}_N^i\sigma_N} + B_N\left(\sum_{m=1}^{i-1}\bar{F}_N^m + I_M\right)\sigma_N. \qquad (15)$$

Similarly, for iteration $i+1$, we have

$$E\left[\left\|\bar{w}_N(i+1)\right\|^2_{\sigma_N}\right] = \left\|w^o\right\|^2_{\bar{F}_N^{i+1}\sigma_N} + B_N\left(\sum_{m=1}^{i}\bar{F}_N^m + I_M\right)\sigma_N. \qquad (16)$$

Subtracting (15) from (16), rearranging and simplifying gives

$$\begin{aligned} E\left[\left\|\bar{w}_N(i+1)\right\|^2_{\sigma_N}\right] &= E\left[\left\|\bar{w}_N(i)\right\|^2_{\sigma_N}\right] + \left\|w^o\right\|^2_{\bar{F}_N^{i+1}\sigma_N} - \left\|w^o\right\|^2_{\bar{F}_N^i\sigma_N} + B_N\left(\sum_{m=1}^{i}\bar{F}_N^m + I_M\right)\sigma_N - B_N\left(\sum_{m=1}^{i-1}\bar{F}_N^m + I_M\right)\sigma_N \\ &= E\left[\left\|\bar{w}_N(i)\right\|^2_{\sigma_N}\right] + B_N\bar{F}_N^i\sigma_N + \left\|w^o\right\|^2_{\bar{F}_N^i\left[\bar{F}_N - I_M\right]\sigma_N}. \end{aligned} \qquad (17)$$

The term $\bar{F}_N^i$ can be written in an iterative way as follows:

$$A_{N,i} = \bar{F}_N^i = A_{N,i-1}\bar{F}_N. \qquad (18)$$

Inserting (18) into (17) gives the final update recursion

$$E\left[\left\|\bar{w}_N(i+1)\right\|^2_{\sigma_N}\right] = E\left[\left\|\bar{w}_N(i)\right\|^2_{\sigma_N}\right] + B_N A_{N,i}\sigma_N + \left\|w^o\right\|^2_{A_{N,i}\left[\bar{F}_N - I_M\right]\sigma_N}. \qquad (19)$$
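The recursion (19) yields the theoretical learning curve directly. The sketch below assumes identical nodes with white unit-power inputs (so $\Lambda_k = I_M$ and $\bar{F}_N = F^N$); all parameter values are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, mu, sv2 = 4, 10, 0.05, 1e-3      # illustrative parameters
lam = np.ones(M)                       # white inputs: Lambda_k = I_M (assumption)
w_o = rng.standard_normal((M, 1))

# F_k of Eq. (14); identical at every node under these assumptions
F = np.eye(M) - 2*mu*np.diag(lam) + mu**2*(np.diag(lam)**2 + np.outer(lam, lam))
F_bar = np.linalg.matrix_power(F, N)   # F_bar_N = F^N for identical nodes

# B_N as a 1 x M row vector: sum of sv2*mu^2*lam^T times trailing F-products
B = np.zeros((1, M))
for m in range(N):
    B += sv2 * mu**2 * lam.reshape(1, -1) @ np.linalg.matrix_power(F, N - 1 - m)

sigma = np.ones((M, 1))                # identity weighting -> MSD
msd = [float((w_o**2).T @ sigma)]      # i = 0: ||w^o||^2_sigma
A = np.eye(M)                          # A_{N,0} = I_M
for i in range(400):                   # Eq. (19), one step per iteration
    msd.append(msd[-1] + float(B @ A @ sigma)
                       + float((w_o**2).T @ (A @ (F_bar - np.eye(M)) @ sigma)))
    A = A @ F_bar                      # A_{N,i} = A_{N,i-1} F_bar, Eq. (18)
```

The curve starts at $\|w^o\|^2$ and decays to the steady-state value predicted by (22).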

Taking the weighting vector in (19) as $\sigma_N = \mathrm{diag}\{I_M\}$ results in the mean-square deviation (MSD), while $\sigma_N = \lambda_N$ gives the excess mean-square error (EMSE).

For the steady state, we revisit the relation between the first and second updates for node $N$. We do this in an iterative way as follows:

$$\begin{aligned} E\left[\left\|\bar{w}_N(2)\right\|^2_{\sigma_N}\right] &= E\left[\left\|\bar{w}_{N-1}(2)\right\|^2_{F_N\sigma_N}\right] + \sigma^2_{v,N}\mu_N^2\lambda_N^T\sigma_N \\ &= E\left[\left\|\bar{w}_{N-2}(2)\right\|^2_{F_{N-1}F_N\sigma_N}\right] + \sigma^2_{v,N}\mu_N^2\lambda_N^T\sigma_N + \sigma^2_{v,N-1}\mu_{N-1}^2\lambda_{N-1}^T F_N\sigma_N \\ &= E\left[\left\|\bar{w}_{N-3}(2)\right\|^2_{F_{N-2}F_{N-1}F_N\sigma_N}\right] + \sigma^2_{v,N}\mu_N^2\lambda_N^T\sigma_N + \sigma^2_{v,N-1}\mu_{N-1}^2\lambda_{N-1}^T F_N\sigma_N + \sigma^2_{v,N-2}\mu_{N-2}^2\lambda_{N-2}^T F_{N-1}F_N\sigma_N \\ &\;\;\vdots \\ &= E\left[\left\|\bar{w}_N(1)\right\|^2_{\bar{F}_N\sigma_N}\right] + B_N\sigma_N, \end{aligned}$$

where the last step is obtained by further successive iterations and simplifying. Generalizing for iteration $i$ gives

$$E\left[\left\|\bar{w}_N(i+1)\right\|^2_{\sigma_N}\right] = E\left[\left\|\bar{w}_N(i)\right\|^2_{\bar{F}_N\sigma_N}\right] + B_N\sigma_N. \qquad (20)$$

At steady state, i.e. as $i \to \infty$, both sides of (20) are evaluated at the same limiting value, so

$$E\left[\left\|\bar{w}_N(\infty)\right\|^2_{\sigma_N}\right] = E\left[\left\|\bar{w}_N(\infty)\right\|^2_{\bar{F}_N\sigma_N}\right] + B_N\sigma_N. \qquad (21)$$

Rearranging and simplifying (21) gives the steady-state equation

$$E\left[\left\|\bar{w}_N(\infty)\right\|^2_{\sigma_N}\right] = B_N\left[I_M - \bar{F}_N\right]^{-1}\sigma_N. \qquad (22)$$

Using $\sigma_N = \mathrm{diag}\{I_M\}$ in (22) results in the steady-state mean-square deviation (MSD), while $\sigma_N = \lambda_N$ gives the steady-state excess mean-square error (EMSE).
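As a sketch, (22) reduces the steady-state evaluation to a single linear solve; the identical-node, white-unit-power-input parameter values below are again illustrative assumptions:

```python
import numpy as np

M, N, mu, sv2 = 4, 10, 0.05, 1e-3      # illustrative parameters
lam = np.ones(M)                       # white unit-power inputs (assumption)

# F_k of Eq. (14), identical at every node under these assumptions
F = np.eye(M) - 2*mu*np.diag(lam) + mu**2*(np.diag(lam)**2 + np.outer(lam, lam))
F_bar = np.linalg.matrix_power(F, N)
B = sum(sv2 * mu**2 * lam.reshape(1, -1) @ np.linalg.matrix_power(F, j)
        for j in range(N))             # B_N row vector for identical nodes

sigma_msd = np.ones((M, 1))            # diag of I_M -> steady-state MSD
msd_inf = float(B @ np.linalg.solve(np.eye(M) - F_bar, sigma_msd))   # Eq. (22)

sigma_emse = lam.reshape(-1, 1)        # diag of Lambda_N -> steady-state EMSE
emse_inf = float(B @ np.linalg.solve(np.eye(M) - F_bar, sigma_emse))
```

Using `solve` rather than forming the explicit inverse of $I_M - \bar{F}_N$ is the standard numerically preferred choice.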

## 4 Results and Discussion

This section compares the theoretical findings presented above with simulation results. A network of $N$ nodes is used for the simulations, with an unknown vector of length $M$. Results are shown for three SNR values. The noise power values at each node for the different SNR values are shown in Fig. 2.

In the first experiment, the input data is assumed to be white. The convergence speed is varied in order to compare both fast and slow convergence; accordingly, two different step-size values are used. The results are shown in Figs. 3 and 4. As can be seen, there is an excellent match between the theory and simulation curves for all SNR values. For comparison, the steady-state results obtained using (22) are also shown.

In the second experiment, the two scenarios are exactly the same except that the input data is now correlated. The results are shown in Figs. 5 and 6. There is a slight discrepancy between the curves during the transient phase; however, the discrepancy is very small and the two curves remain closely matched.

For further comparison, the steady-state results from the above experiments are listed below in Table 1. The aim of this table is to show the comparison between the results from (19) and (22). As can be seen, the results from both equations yield the same steady-state values.

## 5 Conclusion

This work presents the complete transient analysis for the incremental LMS algorithm, first presented in [2], including its learning behavior. The results show that the steady-state values obtained from the transient analysis exactly match those obtained from the steady-state analysis. Furthermore, the simulation results closely match the theoretical results across the different scenarios considered.

## References

• [1] Sayed, A. H.: Adaptive networks. Proceedings of the IEEE. 102, 460-497 (2014).
• [2] Lopes, C. G.; Sayed, A. H.: Incremental adaptive strategies over distributed networks. IEEE Trans. Signal Process. 55, 4064-4077 (2007).
• [3] Lopes, C. G.; Sayed, A. H.: Diffusion least-mean squares over adaptive networks: Formulation and performance analysis. IEEE Trans. Signal Process. 56, 3122-3136 (2008).
• [4] Sayed, A. H.: Fundamentals of Adaptive Filtering. Wiley, New York (2003).