1 Introduction
Kernel adaptive filtering [1]
is an attractive approach for nonlinear estimation problems based on the theory of reproducing kernel Hilbert space (RKHS), and a number of kernel adaptive filtering algorithms have been proposed
[2, 3, 4, 5, 6, 7, 8]. The existing kernel adaptive filtering algorithms are classified into two general categories according to the space in which optimization is performed [6]: (i) the RKHS approach (e.g., [2, 5, 7]) and (ii) the parameter-space approach (e.g., [4, 6, 9]). The kernel normalized least mean square (KNLMS) algorithm is a representative example of the parameter-space approach, and its stochastic behavior analyses have been presented in [10, 11, 12]. These analyses have clarified the transient and steady-state performance in terms of the mean squared error (MSE). The stochastic restricted-gradient descent algorithm studied in the present work is an RKHS counterpart of the KNLMS algorithm. We call it the natural kernel least mean square (Natural KLMS) algorithm to distinguish it from the KLMS algorithm proposed in [13]. A primitive question is whether it is possible to give the same analyses as in [10, 11, 12] for the stochastic restricted-gradient descent algorithm. If this is possible, it will provide a theoretical basis for comparing the performances of KNLMS and Natural KLMS, eventually giving a new insight into the relationship between the two classes of kernel adaptive filtering algorithms.

To clarify the position of the Natural KLMS algorithm in kernel adaptive filtering research, let us give a short note on the RKHS approach. Dictionary sparsification is a common issue in kernel adaptive filtering [14, 3, 4, 1]. The KLMS algorithm [13] updates the filter only when the current input datum is added into the dictionary, and this can cause severe performance degradation. A systematic scheme which eliminates this limitation has been proposed in [15] under the name of hyperplane projection along affine subspace (HYPASS)
. The HYPASS algorithm updates the filter using the projection onto the zero-instantaneous-error hyperplane along the so-called dictionary subspace $\mathcal{M}$, the subspace spanned by the dictionary elements. This is achieved by projecting the gradient direction onto $\mathcal{M}$. In a nutshell, HYPASS is the NLMS algorithm operated in the dictionary subspace $\mathcal{M}$. Natural KLMS is actually an LMS counterpart of HYPASS, and we consider this LMS-based algorithm to make the analysis feasible. In [16] and [7], the mean square convergence analysis and the theoretical steady-state MSE have been presented for the KLMS and Quantized KLMS algorithms, respectively. However, transient performance analyses have not yet been reported due to the difficulty in treating the growing number of dictionary elements.

In this paper, we present a stochastic behavior analysis of the Natural KLMS algorithm with a Gaussian kernel under i.i.d. random inputs based on the framework presented in [12]. Natural KLMS is derived by using the restricted gradient, which gives the steepest ascent direction within the dictionary subspace $\mathcal{M}$. The analysis provides theoretical MSEs during the transient phase as well as at the steady state. We also derive stability conditions in the mean and mean-square sense. The key ingredients of the analysis are the restricted gradient and the isomorphism between the dictionary subspace and a Euclidean space; these were also the key when the first and second authors developed a sparse version of HYPASS in [17]. The validity of the analysis is illustrated by simulations.
2 Preliminaries
We address an adaptive estimation problem of a nonlinear system with sequentially arriving input signals $u_n \in \mathcal{U} \subset \mathbb{R}^L$ ($n \in \mathbb{N}$) and its noisy output $d_n := \psi(u_n) + z_n \in \mathbb{R}$, where $u_n$ is assumed to be an i.i.d. random vector and $z_n$ is a zero-mean additive noise uncorrelated with any other signals. The unknown function $\psi : \mathcal{U} \to \mathbb{R}$ is modeled as an element of the RKHS $\mathcal{H}$ associated with a Gaussian kernel $\kappa(x, y) := \exp(-\|x - y\|^2 / (2\xi^2))$, $x, y \in \mathcal{U}$, where $\xi > 0$ is the kernel parameter. We denote by $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and $\|\cdot\|_{\mathcal{H}}$ the canonical inner product and the norm defined in $\mathcal{H}$, respectively, and by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ those in a Euclidean space. A kernel adaptive filter is given as a finite order filter:

(1) $\varphi_n(u) := \sum_{j=1}^{r} h_{j,n} \kappa(u, \bar{u}_j), \quad u \in \mathcal{U},$

where $h_{j,n} \in \mathbb{R}$ are the filter coefficients and $\{\bar{u}_j\}_{j=1}^{r}$ indicates the dictionary $\mathcal{D} := \{\kappa(\cdot, \bar{u}_j)\}_{j=1}^{r}$; $n$ is the time index. Without loss of generality, we assume that the dictionary is a linearly independent set so that it spans an $r$-dimensional subspace

(2) $\mathcal{M} := \operatorname{span} \mathcal{D} = \Big\{ \sum_{j=1}^{r} \beta_j \kappa(\cdot, \bar{u}_j) : \beta_j \in \mathbb{R} \Big\} \subset \mathcal{H},$

which is called the dictionary subspace. Although the dictionary is typically updated during the learning process, we assume that the dictionary is fixed to make the analysis tractable.
The instantaneous error at time instant $n$ is defined as $e_n := d_n - \boldsymbol{\kappa}_n^\top \mathbf{h}_n$, where $\boldsymbol{\kappa}_n := [\kappa(u_n, \bar{u}_1), \dots, \kappa(u_n, \bar{u}_r)]^\top \in \mathbb{R}^r$ is the vector of the kernelized input and $\mathbf{h}_n := [h_{1,n}, \dots, h_{r,n}]^\top \in \mathbb{R}^r$ is the coefficient vector. The MSE cost function, with respect to the coefficient vector $\mathbf{h} \in \mathbb{R}^r$, is given by

(3) $J(\mathbf{h}) := E[e_n^2] = E[d_n^2] - 2\mathbf{p}^\top \mathbf{h} + \mathbf{h}^\top \mathbf{R} \mathbf{h},$

where $\mathbf{R} := E[\boldsymbol{\kappa}_n \boldsymbol{\kappa}_n^\top]$ is the autocorrelation matrix of the kernelized input and $\mathbf{p} := E[d_n \boldsymbol{\kappa}_n]$ is the cross-correlation vector between $d_n$ and $\boldsymbol{\kappa}_n$. With the optimization in the RKHS in mind, the MSE, with respect to $\varphi \in \mathcal{H}$, is given by

(4) $J_{\mathcal{H}}(\varphi) := E[(d_n - \varphi(u_n))^2].$
While the KNLMS algorithm optimizes (3) in the Euclidean space $\mathbb{R}^r$, the Natural KLMS algorithm presented in the following section optimizes (4) in the RKHS $\mathcal{H}$ under the restriction to the dictionary subspace $\mathcal{M}$, or in short, it optimizes in $\mathcal{M}$. Referring to [2], the stochastic gradient descent method for $J_{\mathcal{H}}$ in $\mathcal{H}$ updates the filter along the 'line' (one-dimensional subspace) spanned by the singleton $\{\kappa(\cdot, u_n)\}$. This implies that the filter is updated only when $\kappa(\cdot, u_n)$ is added into the dictionary, because otherwise $\varphi_n + \alpha \kappa(\cdot, u_n) \notin \mathcal{M}$ for any $\alpha \neq 0$. We thus present the restricted gradient, which was initially introduced in [17], and derive the Natural KLMS algorithm in the following section.

3 The Natural KLMS algorithm
The ordinary gradient of $J$ in $\mathbb{R}^r$ is given by $\nabla J(\mathbf{h}) = 2(\mathbf{R}\mathbf{h} - \mathbf{p})$. Given any positive definite matrix $\mathbf{Q} \in \mathbb{R}^{r \times r}$, $\langle \mathbf{x}, \mathbf{y} \rangle_{\mathbf{Q}} := \mathbf{x}^\top \mathbf{Q} \mathbf{y}$ and $\|\mathbf{x}\|_{\mathbf{Q}} := \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle_{\mathbf{Q}}}$ define an inner product and its induced norm, respectively. The gradient of (3) with respect to the inner product $\langle \cdot, \cdot \rangle_{\mathbf{K}}$ is defined as [17]

(5) $\nabla^{(\mathbf{K})} J(\mathbf{h}) := \mathbf{K}^{-1} \nabla J(\mathbf{h}),$

where $[\mathbf{K}]_{i,j} := \kappa(\bar{u}_i, \bar{u}_j)$ for $1 \le i, j \le r$ is the Gram matrix, which is ensured to be positive definite due to the assumption that the elements of the dictionary are linearly independent. The definition of the gradient is validated by observing that $\langle \nabla^{(\mathbf{K})} J(\mathbf{h}), \mathbf{z} \rangle_{\mathbf{K}} = \langle \nabla J(\mathbf{h}), \mathbf{z} \rangle$ for any $\mathbf{z} \in \mathbb{R}^r$.
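The validating identity $\langle \mathbf{Q}^{-1} \nabla J, \mathbf{z} \rangle_{\mathbf{Q}} = \langle \nabla J, \mathbf{z} \rangle$ can be checked numerically; a minimal sketch (NumPy; the positive definite matrix and the vectors are random placeholders standing in for the Gram matrix and the gradient):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 5
# Random symmetric positive definite Q (plays the role of the Gram matrix K).
A = rng.standard_normal((r, r))
Q = A @ A.T + r * np.eye(r)
g = rng.standard_normal(r)      # plays the role of nabla J(h)
g_Q = np.linalg.solve(Q, g)     # Q-metric gradient: Q^{-1} nabla J(h), cf. (5)
z = rng.standard_normal(r)
lhs = g_Q @ Q @ z               # <Q-gradient, z>_Q
rhs = g @ z                     # <ordinary gradient, z>
print(lhs, rhs)                 # equal up to roundoff
```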
The functional Hilbert space $\mathcal{M}$ of dimension $r$ is isomorphic to the Hilbert space $(\mathbb{R}^r, \langle \cdot, \cdot \rangle_{\mathbf{K}})$ under the correspondence (see Fig. 1)

(6) $\mathcal{M} \ni \varphi = \sum_{j=1}^{r} h_j \kappa(\cdot, \bar{u}_j) \; \leftrightarrow \; \mathbf{h} = [h_1, \dots, h_r]^\top \in \mathbb{R}^r.$

Note here that the isomorphism as Hilbert spaces includes, in addition to the one-to-one correspondence between the elements, the preservation of the inner product; i.e., $\langle \varphi, \phi \rangle_{\mathcal{H}} = \langle \mathbf{h}, \mathbf{g} \rangle_{\mathbf{K}}$ for any $\varphi \leftrightarrow \mathbf{h}$ and $\phi \leftrightarrow \mathbf{g}$. Under the correspondence in (6), the restricted gradient $\nabla_{\mathcal{M}} J_{\mathcal{H}}(\varphi)$ is defined, through the gradient in $(\mathbb{R}^r, \langle \cdot, \cdot \rangle_{\mathbf{K}})$, as follows [17]:

(7) $\nabla_{\mathcal{M}} J_{\mathcal{H}}(\varphi) \; \leftrightarrow \; \nabla^{(\mathbf{K})} J(\mathbf{h}) = \mathbf{K}^{-1} \nabla J(\mathbf{h}).$
The restricted gradient gives the steepest ascent direction, within the dictionary subspace $\mathcal{M}$, on the tangent plane of the functional (4) at the point $\varphi$; see [17] for the derivation of the restricted gradient. An instantaneous approximation of the restricted gradient is given by $-2 e_n \mathbf{K}^{-1} \boldsymbol{\kappa}_n$, where $e_n = d_n - \boldsymbol{\kappa}_n^\top \mathbf{h}_n$. Hence, for an initial vector $\mathbf{h}_0 \in \mathbb{R}^r$, the stochastic restricted-gradient descent method, which we call the Natural KLMS algorithm, is given by

(8) $\mathbf{h}_{n+1} = \mathbf{h}_n + \mu e_n \mathbf{K}^{-1} \boldsymbol{\kappa}_n, \quad n \in \mathbb{N},$

where $\mu > 0$ is the step size. The Natural KLMS algorithm (8) requires $O(r^2)$ complexity for each time update (for the multiplication by $\mathbf{K}^{-1}$ with a precomputed inverse), and this would make a significant impact on the overall complexity of the algorithm. In [15, 18], a simple selective-updating idea for complexity reduction without serious performance degradation has been presented; it will be shown in Section 5 that selective updating works well.
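A minimal sketch of the Natural KLMS recursion (8) (NumPy; the toy target system, noise level, and all parameter values are hypothetical, and the dictionary is fixed and linearly independent so that the Gram matrix is invertible):

```python
import numpy as np

def kernel_vec(u, D, xi):
    """Kernelized input: [kappa(u, u_bar_1), ..., kappa(u, u_bar_r)]."""
    return np.exp(-np.sum((D - u) ** 2, axis=1) / (2 * xi ** 2))

def natural_klms(inputs, outputs, D, xi, mu):
    """Natural KLMS (8): h <- h + mu * e_n * K^{-1} kappa_n."""
    sq = np.sum((D[:, None, :] - D[None, :, :]) ** 2, axis=-1)
    K_inv = np.linalg.inv(np.exp(-sq / (2 * xi ** 2)))  # Gram inverse, computed once
    h = np.zeros(len(D))
    for u, d in zip(inputs, outputs):
        kap = kernel_vec(u, D, xi)
        e = d - kap @ h                  # instantaneous error e_n
        h = h + mu * e * (K_inv @ kap)   # restricted-gradient step
    return h

# Hypothetical toy system: psi(u) = sin(3u), scalar inputs, small output noise.
rng = np.random.default_rng(0)
xi, mu = 0.3, 0.2
D = np.linspace(-1.0, 1.0, 11)[:, None]            # fixed dictionary centers
u_seq = rng.uniform(-1.0, 1.0, size=(2000, 1))
d_seq = np.sin(3.0 * u_seq[:, 0]) + 0.01 * rng.standard_normal(2000)
h = natural_klms(u_seq, d_seq, D, xi, mu)
```

Precomputing $\mathbf{K}^{-1}$ is what confines the per-update cost to $O(r^2)$; the selective update of [15, 18] reduces this further by modifying only a subset of the coefficients.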
4 Performance analysis
4.1 Key idea and assumption
We derive a theoretical MSE and stability conditions for the Natural KLMS algorithm given by (8) with the Gaussian kernel, given the dictionary $\mathcal{D}$. Left-multiplying both sides of (8) by the square root $\mathbf{K}^{1/2}$ of $\mathbf{K}$ (for any positive semidefinite matrix $\mathbf{A}$, there exists a unique positive semidefinite square root $\mathbf{A}^{1/2}$ satisfying $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$) yields

(9) $\tilde{\mathbf{h}}_{n+1} = \tilde{\mathbf{h}}_n + \mu e_n \tilde{\boldsymbol{\kappa}}_n,$

where $\tilde{\mathbf{h}}_n := \mathbf{K}^{1/2} \mathbf{h}_n$ and $\tilde{\boldsymbol{\kappa}}_n := \mathbf{K}^{-1/2} \boldsymbol{\kappa}_n$. The cost function in (3) can be rewritten as

(10) $\tilde{J}(\tilde{\mathbf{h}}) := E[d_n^2] - 2\tilde{\mathbf{p}}^\top \tilde{\mathbf{h}} + \tilde{\mathbf{h}}^\top \tilde{\mathbf{R}} \tilde{\mathbf{h}},$

as a function of $\tilde{\mathbf{h}} := \mathbf{K}^{1/2} \mathbf{h}$, and (9) can be regarded as a stochastic gradient descent method for this cost function $\tilde{J}$. Here,

(11) $\tilde{\mathbf{R}} := E[\tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top] = \mathbf{K}^{-1/2} \mathbf{R} \mathbf{K}^{-1/2}$

and

(12) $\tilde{\mathbf{p}} := E[d_n \tilde{\boldsymbol{\kappa}}_n] = \mathbf{K}^{-1/2} \mathbf{p}$

are the autocorrelation matrix and the cross-correlation vector for the modified vector $\tilde{\boldsymbol{\kappa}}_n$, respectively.
As $\tilde{\mathbf{R}}$ is positive definite [10], the optimum weight vector is given by

(13) $\tilde{\mathbf{h}}_{\mathrm{opt}} := \tilde{\mathbf{R}}^{-1} \tilde{\mathbf{p}},$

and with $\tilde{\mathbf{h}}_n$, we define the weight error vector

(14) $\mathbf{v}_n := \tilde{\mathbf{h}}_n - \tilde{\mathbf{h}}_{\mathrm{opt}}.$
In the present analysis, $\mathbf{v}_n$ needs to be independent of $\tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top$, which is guaranteed by making the following conditioned modified independence assumption (CMIA) [12].

Assumption 1
$\tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top$ is independent of $\mathbf{v}_n$.
4.2 Mean weight error analysis
The estimation error $e_n$ can be expressed as

(15) $e_n = e_{\mathrm{opt},n} - \tilde{\boldsymbol{\kappa}}_n^\top \mathbf{v}_n,$

where $e_{\mathrm{opt},n} := d_n - \tilde{\boldsymbol{\kappa}}_n^\top \tilde{\mathbf{h}}_{\mathrm{opt}}$ is the estimation error of the optimum filter. Substituting (15) into (9), we obtain the recursive expression for $\mathbf{v}_n$:

(16) $\mathbf{v}_{n+1} = (\mathbf{I}_r - \mu \tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top) \mathbf{v}_n + \mu e_{\mathrm{opt},n} \tilde{\boldsymbol{\kappa}}_n.$

Using CMIA, and noting that $E[e_{\mathrm{opt},n} \tilde{\boldsymbol{\kappa}}_n] = \mathbf{0}$ by the orthogonality principle, we obtain the mean weight error model

(17) $E[\mathbf{v}_{n+1}] = (\mathbf{I}_r - \mu \tilde{\mathbf{R}}) E[\mathbf{v}_n],$

where $\mathbf{I}_m$ denotes the identity matrix for any positive integer $m$. Let the input $u_n$ be a random vector following a Gaussian distribution with zero mean and covariance matrix $\mathbf{R}_u := E[u_n u_n^\top]$. Then, the $(i,j)$ component ($1 \le i, j \le r$) of the autocorrelation matrix $\mathbf{R}$ of the kernelized input is given by [12]:

$[\mathbf{R}]_{i,j} = \dfrac{\exp\left( -\dfrac{\|\bar{u}_i\|^2 + \|\bar{u}_j\|^2}{2\xi^2} + \dfrac{1}{2\xi^4}\, \bar{\mathbf{u}}_{ij}^\top \left( \mathbf{R}_u^{-1} + \dfrac{2}{\xi^2} \mathbf{I}_L \right)^{-1} \bar{\mathbf{u}}_{ij} \right)}{\sqrt{\det\left( \mathbf{I}_L + \dfrac{2}{\xi^2} \mathbf{R}_u \right)}},$

where $\bar{\mathbf{u}}_{ij} := \bar{u}_i + \bar{u}_j$, $L$ is the input dimension, and $\det$ stands for determinant; the matrix $\tilde{\mathbf{R}} = \mathbf{K}^{-1/2} \mathbf{R} \mathbf{K}^{-1/2}$ in (17) then follows immediately.
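This closed-form second-order moment $E[\kappa(u, \bar{u}_i)\,\kappa(u, \bar{u}_j)]$ for a zero-mean Gaussian input (obtainable by a standard Gaussian integral) can be checked against a Monte Carlo estimate; a sketch (NumPy; the dictionary points and input covariance are hypothetical placeholders):

```python
import numpy as np

def r_entry(ui, uj, Ru, xi):
    """Closed-form E[kappa(u, ui) kappa(u, uj)] for u ~ N(0, Ru), Gaussian kernel."""
    L = len(ui)
    s = ui + uj
    M = np.linalg.inv(np.linalg.inv(Ru) + (2.0 / xi ** 2) * np.eye(L))
    num = np.exp(-(ui @ ui + uj @ uj) / (2 * xi ** 2) + (s @ M @ s) / (2 * xi ** 4))
    den = np.sqrt(np.linalg.det(np.eye(L) + (2.0 / xi ** 2) * Ru))
    return num / den

rng = np.random.default_rng(1)
xi = 1.0
Ru = np.array([[1.0, 0.3], [0.3, 0.5]])                 # hypothetical input covariance
ui, uj = np.array([0.2, -0.1]), np.array([-0.4, 0.3])   # hypothetical dictionary points
u = rng.multivariate_normal(np.zeros(2), Ru, size=200000)
kap_i = np.exp(-np.sum((u - ui) ** 2, axis=1) / (2 * xi ** 2))
kap_j = np.exp(-np.sum((u - uj) ** 2, axis=1) / (2 * xi ** 2))
mc = np.mean(kap_i * kap_j)
print(r_entry(ui, uj, Ru, xi), mc)   # the two values should agree closely
```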
From the recursion in (17), we obtain the mean stability condition of the Natural KLMS algorithm as follows.
Theorem 1 (Stability in the mean)
Assume that CMIA holds. Then, for any initial condition, given the dictionary $\mathcal{D}$, the Natural KLMS algorithm asymptotically converges in the mean if the step size is chosen to satisfy

(18) $0 < \mu < \dfrac{2}{\lambda_{\max}(\tilde{\mathbf{R}})},$

where $\lambda_{\max}(\cdot)$ denotes the maximum eigenvalue of the matrix.
Proof: This follows directly from the well-known mean stability results (see, e.g., [19]).
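The bound (18) can be evaluated numerically for a given dictionary; a sketch (NumPy; the dictionary, kernel parameter, and i.i.d. Gaussian input distribution are hypothetical) that estimates $\tilde{\mathbf{R}} = \mathbf{K}^{-1/2} \mathbf{R} \mathbf{K}^{-1/2}$ by Monte Carlo:

```python
import numpy as np

def inv_sqrt(K):
    """K^{-1/2} via eigendecomposition (K symmetric positive definite)."""
    w, V = np.linalg.eigh(K)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(2)
xi = 0.5
centers = np.linspace(-1.0, 1.0, 8)[:, None]           # hypothetical dictionary in R^1
K = np.exp(-(centers - centers.T) ** 2 / (2 * xi ** 2))  # Gram matrix
u = rng.standard_normal((100000, 1))                    # i.i.d. Gaussian inputs
kap = np.exp(-(u - centers.T) ** 2 / (2 * xi ** 2))     # (N, r) kernelized inputs
R = kap.T @ kap / len(u)                                # Monte Carlo estimate of R
R_tilde = inv_sqrt(K) @ R @ inv_sqrt(K)
lam_max = np.linalg.eigvalsh(R_tilde).max()
print("mean stability for 0 < mu <", 2.0 / lam_max)     # bound (18)
```

Note that $\tilde{\boldsymbol{\kappa}}_n^\top \tilde{\boldsymbol{\kappa}}_n = \boldsymbol{\kappa}_n^\top \mathbf{K}^{-1} \boldsymbol{\kappa}_n$ is the squared RKHS norm of the projection of $\kappa(\cdot, u_n)$ onto $\mathcal{M}$, which never exceeds $\kappa(u_n, u_n) = 1$; the estimated $\lambda_{\max}(\tilde{\mathbf{R}})$ is therefore small.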
4.3 Meansquare error analysis
Squaring (15) and taking its expectation under CMIA, the MSE (10) of Natural KLMS can be rewritten as

(19) $J_n := E[e_n^2] = J_{\min} + \operatorname{tr}(\tilde{\mathbf{R}} \mathbf{C}_n),$

where $\mathbf{C}_n := E[\mathbf{v}_n \mathbf{v}_n^\top]$ is the correlation matrix of $\mathbf{v}_n$ and $J_{\min} := E[e_{\mathrm{opt},n}^2]$ is the minimum MSE. We assume that the optimum filter is sufficiently close to the optimal solution of the infinite order model so that $e_{\mathrm{opt},n}$ reduces approximately to the additive noise, and that $e_{\mathrm{opt},n}$ and $\tilde{\boldsymbol{\kappa}}_n$ are uncorrelated. Following the arguments in [10, Section III-D] with $\boldsymbol{\kappa}_n$ and $\mathbf{R}$ replaced respectively by $\tilde{\boldsymbol{\kappa}}_n$ and $\tilde{\mathbf{R}}$, we arrive, with simple manipulations, at the following recursion:

(20) $\mathbf{C}_{n+1} = \mathbf{C}_n - \mu (\tilde{\mathbf{R}} \mathbf{C}_n + \mathbf{C}_n \tilde{\mathbf{R}}) + \mu^2 \mathbf{T}_n + \mu^2 J_{\min} \tilde{\mathbf{R}},$

where $\mathbf{T}_n := E[\tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top \mathbf{v}_n \mathbf{v}_n^\top \tilde{\boldsymbol{\kappa}}_n \tilde{\boldsymbol{\kappa}}_n^\top]$ and its $(i,j)$ component can be approximated as

(21) $[\mathbf{T}_n]_{i,j} \approx \sum_{l=1}^{r} \sum_{m=1}^{r} [\mathbf{C}_n]_{l,m}\, \tilde{\mu}_{ijlm}.$

Here, the fourth-order moment $\tilde{\mu}_{ijlm}$ ($1 \le i, j, l, m \le r$) of the modified kernelized input is defined as

(22) $\tilde{\mu}_{ijlm} := E[(\mathbf{q}_i^\top \boldsymbol{\kappa}_n)(\mathbf{q}_j^\top \boldsymbol{\kappa}_n)(\mathbf{q}_l^\top \boldsymbol{\kappa}_n)(\mathbf{q}_m^\top \boldsymbol{\kappa}_n)],$

where $\mathbf{q}_i$ ($1 \le i \le r$) is the $i$th column vector of $\mathbf{K}^{-1/2}$. The approximation in (21) can be developed by following the arguments in [12, Section 3.3] with $\boldsymbol{\kappa}_n$ and $\mathbf{R}$ replaced by $\tilde{\boldsymbol{\kappa}}_n$ and $\tilde{\mathbf{R}}$, respectively. Finally, $\tilde{\mu}_{ijlm}$ can be written as

(23) $\tilde{\mu}_{ijlm} = \sum_{a=1}^{r} \sum_{b=1}^{r} \sum_{c=1}^{r} \sum_{d=1}^{r} [\mathbf{q}_i]_a [\mathbf{q}_j]_b [\mathbf{q}_l]_c [\mathbf{q}_m]_d\, \rho_{abcd},$

where $\rho_{abcd} := E\big[[\boldsymbol{\kappa}_n]_a [\boldsymbol{\kappa}_n]_b [\boldsymbol{\kappa}_n]_c [\boldsymbol{\kappa}_n]_d\big]$ can be computed by [12, Eq. (35)].
Let us now establish the mean-square stability condition and derive the steady-state MSE. Due to the presence of $\mathbf{T}_n$ in (20), we exploit the lexicographic representation of $\mathbf{C}_n$, i.e., the columns of the matrix are stacked on top of each other into a vector. The recursion (20) can then be rewritten as

(24) $\mathbf{c}_{n+1} = \mathbf{F} \mathbf{c}_n + \mu^2 J_{\min} \tilde{\mathbf{r}},$

where $\mathbf{c}_n \in \mathbb{R}^{r^2}$ and $\tilde{\mathbf{r}} \in \mathbb{R}^{r^2}$ are the lexicographic forms of $\mathbf{C}_n$ and $\tilde{\mathbf{R}}$, respectively, and

(25) $\mathbf{F} := \mathbf{I}_{r^2} - \mu (\mathbf{I}_r \otimes \tilde{\mathbf{R}} + \tilde{\mathbf{R}} \otimes \mathbf{I}_r) + \mu^2 \mathbf{G},$

where $\mathbf{G}$ is an $r^2 \times r^2$ matrix whose entries are the fourth-order moments $\tilde{\mu}_{ijlm} := E[[\tilde{\boldsymbol{\kappa}}_n]_i [\tilde{\boldsymbol{\kappa}}_n]_j [\tilde{\boldsymbol{\kappa}}_n]_l [\tilde{\boldsymbol{\kappa}}_n]_m]$ arranged consistently with the lexicographic ordering, i.e., $[\mathbf{G}]_{i+(j-1)r,\, l+(m-1)r} := \tilde{\mu}_{ijlm}$ with $1 \le i, j, l, m \le r$. Here, $\otimes$ denotes the Kronecker product. By (24) and (25), we obtain the following results.
Theorem 2 (Meansquare stability)
Assume that CMIA holds. For any initial condition and any step size $\mu$ satisfying (18), given a dictionary $\mathcal{D}$, the Natural KLMS algorithm with the Gaussian kernel is mean-square stable if the matrix $\mathbf{F}$ is stable (i.e., the spectral radius of $\mathbf{F}$ is less than one).
Proof: The algorithm is said to be mean-square stable if, and only if, the state vector $\mathbf{c}_n$ remains bounded and tends to a steady-state value regardless of the initial condition [19]. To complete the proof, it is sufficient to show that the weighted square error $E[\|\mathbf{v}_n\|_{\mathbf{S}}^2]$ remains bounded and tends to a steady-state value, where $\mathbf{S}$ is a diagonal positive definite matrix. This is verified by the fact that $\mathbf{c}_n$ is bounded and tends to a steady-state value if the matrix $\mathbf{F}$ is stable.
Theorem 3 (MSE in the steady state)
Consider a sufficiently small step size $\mu$ which ensures the mean and mean-square stability. The steady-state MSE is given by (19) with the lexicographic representation $\mathbf{c}_\infty$ of $\mathbf{C}_\infty := \lim_{n \to \infty} \mathbf{C}_n$ given by

(26) $\mathbf{c}_\infty = \mu^2 J_{\min} (\mathbf{I}_{r^2} - \mathbf{F})^{-1} \tilde{\mathbf{r}},$

provided that $\mathbf{I}_{r^2} - \mathbf{F}$ is invertible.
We remark on Theorem 3 that the invertibility of $\mathbf{I}_{r^2} - \mathbf{F}$ is actually ensured by the stability of $\mathbf{F}$.
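The steady-state expression (26) is precisely the fixed point of the recursion (24), since a stable $\mathbf{F}$ makes the iteration a contraction toward it. A small numerical sketch (NumPy) with a synthetic stable $\mathbf{F}$ and hypothetical values of $\mu$, $J_{\min}$, and $\tilde{\mathbf{r}}$, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 9                                # r^2 with r = 3, say
B = rng.standard_normal((m, m))
F = 0.5 * B / np.linalg.norm(B, 2)   # synthetic F with spectral norm 0.5 (hence stable)
mu, J_min = 0.1, 0.01                # hypothetical step size and minimum MSE
r_vec = rng.standard_normal(m)       # stands in for the lexicographic form of R_tilde

# Iterate c_{n+1} = F c_n + mu^2 J_min r_vec from an arbitrary start, cf. (24).
c = rng.standard_normal(m)
for _ in range(200):
    c = F @ c + mu ** 2 * J_min * r_vec

# Steady state (26): c_inf = mu^2 J_min (I - F)^{-1} r_vec.
c_inf = mu ** 2 * J_min * np.linalg.solve(np.eye(m) - F, r_vec)
print(np.max(np.abs(c - c_inf)))     # nearly zero after convergence
```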
5 Simulation results
We shall compare simulated learning curves with the analytic models to validate the analysis. We conduct two experiments under the same settings as in [12]. In the first experiment, the input sequence is generated by the first-order AR model

(27) $u_n = \rho\, u_{n-1} + \sigma_u \sqrt{1 - \rho^2}\, w_n,$

where $w_n$ is the noise following the i.i.d. standard normal distribution, $\rho$ is the input correlation parameter, and $\sigma_u$ is the standard deviation of the input. The nonlinear system is defined as follows:

(28)

where $z_n$ is an additive zero-mean Gaussian noise whose standard deviation is set as in [12]. The input vector is constructed from the input sequence as in [12]. The step size, the standard deviation of the input, the input correlation parameter, the kernel parameter, and the dictionary size are also set as in [12]. The dictionary consists of samples taken on a uniform grid over the input range.

Fig. 2 depicts the results: the simulated learning curves, the theoretical transient MSE curve, and the theoretical steady-state MSE line are presented in blue, red, and green (dotted line), respectively. The simulated curve is obtained by averaging over 300 Monte Carlo runs. The theoretical MSE is computed by (19) with $\mathbf{C}_n$ evaluated recursively by (20). The steady-state MSE is computed by Theorem 3. Although the input is correlated, the theoretical MSE presented in this paper represents the behavior of the Natural KLMS algorithm well.
In the second experiment, the fluid-flow control problem is considered [20]:

(29)

where the input is generated again by (27), with its parameters, the standard deviation of the additive Gaussian noise, and the kernel parameter set as in [12]. The input vector is constructed from the input sequence as in [12]. 31 dictionary elements are selected from the inputs in advance based on the coherence criterion [4]. The step size is also set as in [12]. The simulated curves are obtained by averaging over 300 Monte Carlo runs, and the same theoretical model as in the first experiment is used. Fig. 3 depicts the results. Again, the simulation results show the validity of the analysis. Table 1 summarizes the overall per-iteration complexity (the number of real multiplications) of the Natural KLMS algorithm with the full update and the selective update (see [15, 18]), and Fig. 4 illustrates the complexity as a function of the dictionary size $r$. With the selective update considered here, only one coefficient is updated at each iteration, and hence the complexity is reduced drastically. Figs. 2 and 3 also depict the MSE learning curves of the Natural KLMS algorithm with the full update and the selective update. It can be seen that the Natural KLMS algorithm with the selective update exhibits a steady-state MSE comparable to the full-update case with drastically lower complexity.
6 Conclusion
This paper presented a stochastic behavior analysis of the Natural KLMS algorithm, which is a stochastic restricted-gradient descent method. The analysis provided the transient and steady-state MSEs of the algorithm. We also derived stability conditions in the mean and mean-square sense. Simulation results showed that the theoretical MSE curves given by the analysis agree well with the simulated MSE curves. The outcomes of this study will serve as a theoretical basis to compare the performances of KNLMS and Natural KLMS.
References
 [1] W. Liu, J. Príncipe, and S. Haykin, Kernel Adaptive Filtering. New Jersey: Wiley, 2010.
 [2] J. Kivinen, A. J. Smola, and R. C. Williamson, “Online learning with kernels,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2165–2176, Aug. 2004.
 [3] Y. Engel, S. Mannor, and R. Meir, “The kernel recursive least-squares algorithm,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2275–2285, Aug. 2004.
 [4] C. Richard, J.-C. M. Bermudez, and P. Honeine, “Online prediction of time series data with kernels,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 1058–1067, Mar. 2009.
 [5] K. Slavakis, S. Theodoridis, and I. Yamada, “Adaptive constrained learning in reproducing kernel Hilbert spaces: the robust beamforming case,” IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4744–4764, Dec. 2009.
 [6] M. Yukawa, “Multikernel adaptive filtering,” IEEE Trans. Signal Processing, vol. 60, no. 9, pp. 4672–4682, Sep. 2012.

 [7] B. Chen, S. Zhao, P. Zhu, and J. C. Príncipe, “Quantized kernel least mean square algorithm,” IEEE Trans. Neural Networks and Learning Systems, vol. 23, no. 1, pp. 22–32, Jan. 2012.  [8] S. Van Vaerenbergh, M. Lázaro-Gredilla, and I. Santamaría, “Kernel recursive least-squares tracker for time-varying regression,” IEEE Trans. Neural Networks and Learning Systems, vol. 23, no. 8, pp. 1313–1326, Aug. 2012.
 [9] W. Gao, J. Chen, C. Richard, and J. Huang, “Online dictionary learning for kernel LMS,” IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2765–2777, June 2014.
 [10] W. D. Parreira, J.-C. M. Bermudez, C. Richard, and J.-Y. Tourneret, “Stochastic behavior analysis of the Gaussian kernel least-mean-square algorithm,” IEEE Trans. Signal Processing, vol. 60, no. 5, pp. 2208–2222, May 2012.
 [11] C. Richard and J.-C. M. Bermudez, “Closed-form conditions for convergence of the Gaussian kernel-least-mean-square algorithm,” in Proc. Asilomar, Pacific Grove, CA, USA, Nov. 2012, pp. 1797–1801.
 [12] J. Chen, W. Gao, C. Richard, and J.-C. M. Bermudez, “Convergence analysis of kernel LMS algorithm with pre-tuned dictionary,” in Proc. IEEE ICASSP, 2014, pp. 7243–7247.
 [13] W. Liu, P. P. Pokharel, and J. C. Príncipe, “The kernel least-mean-square algorithm,” IEEE Trans. Signal Process., vol. 56, no. 2, pp. 543–554, Feb. 2008.

 [14] J. Platt, “A resource-allocating network for function interpolation,” Neural Computation, vol. 3, no. 2, pp. 213–225, 1991.  [15] M. Yukawa and R. Ishii, “An efficient kernel adaptive filtering algorithm using hyperplane projection along affine subspace,” in Proc. EUSIPCO, 2012, pp. 2183–2187.
 [16] B. Chen, S. Zhao, P. Zhu, and J. C. Príncipe, “Mean square convergence analysis for kernel least mean square algorithm,” Signal Processing, vol. 92, pp. 2624–2632, 2012.
 [17] M. Takizawa and M. Yukawa, “An efficient sparse kernel adaptive filtering algorithm based on isomorphism between functional subspace and Euclidean space,” in Proc. IEEE ICASSP, 2014, pp. 4508–4512.
 [18] ——, “Adaptive nonlinear estimation based on parallel projection along affine subspaces in reproducing kernel Hilbert space,” IEEE Trans. Signal Processing, submitted for publication.
 [19] A. H. Sayed, Adaptive Filters. John Wiley & Sons, 2008.
 [20] H. Al-Duwaish, M. N. Karim, and V. Chandrasekar, “Use of multilayer feedforward neural networks in identification and control of Wiener model,” in IEE Proc.-Control Theory Appl., vol. 143, 1996, pp. 255–258.