1 Introduction
A nonlinear system is usually called a Hammerstein system when it is composed of two blocks in cascade: a static nonlinearity followed by a linear time-invariant (LTI) dynamic system [ljung1999system].
Hammerstein systems find applications in several areas of science and engineering, see e.g. [hunter1986identification], [westwick2001separable], [bai2009identification]. For this reason, in recent years Hammerstein system identification has become a popular and rather active research topic [schoukens2011parametric], [han2012hammerstein].
Several approaches have been proposed for Hammerstein system identification. For instance, in [greblicki1986identification] a kernel-based regression method is described, [greblicki2002stochastic] proposes an identification approach based on stochastic approximation, while [goethals2005subspace] focuses on subspace methods. In [westwick2001separable], [bai2004convergence] and [rangan1995new], iterative methods based on least squares are studied.
An interesting approach was proposed by Er-Wei Bai in [bai1998optimal]. Here, the static nonlinearity is modeled as a linear combination of basis functions, while the LTI system is assumed to be a finite impulse response (FIR) system with
coefficients. The Hammerstein system is then modeled as a linear regression, where the regressor vector is
dimensional. Since it contains all the combinations of the nonlinearity coefficients and the FIR coefficients, this vector is usually called the overparameterized vector. Its estimate is obtained via least squares and is then decomposed in order to obtain the nonlinearity coefficients and the impulse response. Although proven to be asymptotically consistent, the whole procedure suffers from two main drawbacks. First, since it relies on a least-squares estimate of a possibly very high-dimensional vector, the final estimates may suffer from high variance [ljung1999system]. Second, the procedure does not guarantee that the estimated overparameterized vector can be exactly decomposed to obtain the nonlinearity coefficients and the FIR system, and thus approximations are required.

In this paper, we propose a regularization technique to curb the variance of the estimates of the overparameterized vector. Similarly to [bai1998optimal], we model the Hammerstein system dynamics using the aforementioned overparameterized vector; we then solve the regression problem relying on a kernel-based approach. To this end, we introduce a novel kernel, called the Kronecker overparameterized (KOP) kernel, which is the composition of a rank-one positive semidefinite matrix and the so-called first-order stable spline kernel (see [pillonetto2010new], [pillonetto2011prediction], [bottegal2013regularized], and [pillonetto2014kernel] for details). The structure of this kernel depends on a few parameters (also called hyperparameters in this context), which we need to estimate from data. This task is addressed by an empirical Bayes approach [maritz1989empirical], that is, by maximizing the marginal likelihood (ML) of the output. Once the kernel parameters are fixed, the overparameterized vector is estimated via regularized least squares [pillonetto2014kernel]. Equivalently, we can think of the overparameterized vector as a Gaussian random vector with zero mean and covariance matrix given by the KOP kernel.
With this interpretation, the estimate corresponds to the minimum mean square error estimate in the Bayesian sense [wahba1990spline].
A contribution of this paper is to reveal some interesting properties of the estimated overparameterized vector provided by the proposed method. We prove that, as opposed to [bai1998optimal], this estimate can be decomposed exactly in order to obtain the nonlinearity coefficients and the LTI system impulse response, with no loss of information due to approximations. The concept of exact decomposition will be made clear throughout the paper. We also demonstrate strong connections with our recently proposed method [risuleo2015kernel], effectively proving that, although the two approaches are inherently different, the estimates obtained with the two methods are equivalent. Finally, we show, through several numerical experiments, that the proposed method outperforms both the algorithm proposed in [bai1998optimal] and the standard MATLAB System Identification Toolbox function for Hammerstein system identification.
The paper is organized as follows. In the next section, we formulate the Hammerstein system identification problem. In Section 3, we describe the modeling approach based on overparameterized vectors. In Section 4, we introduce the proposed identification scheme, and we give some theoretical background in Section 5. Numerical experiments are illustrated in Section 6. Some conclusions end the paper.
2 Problem formulation
We consider a stable single-input single-output discrete-time system described by the following time-domain relations (see Figure 1)
(1) 
In the above equation, represents a (static) nonlinear function transforming the measurable input into the unavailable signal , which in turn feeds a strictly causal stable LTI system, described by the impulse response . The output measurements of the system are corrupted by white Gaussian noise, denoted by , which has unknown variance . Following a standard approach in Hammerstein system identification (see e.g. [bai1998optimal]), we assume that can be modeled as a combination of known basis functions , namely
(2) 
where the coefficients are unknown.
We assume that input-output samples are collected, and denote them by , . For notational convenience, we also assume null initial conditions. Then, the system identification problem we discuss in this paper is the problem of estimating samples of the impulse response, say (where is large enough to capture the system dynamics), as well as the coefficients characterizing the static nonlinearity .
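To fix ideas, the data-generating mechanism (1)-(2) can be sketched numerically. The basis functions, impulse response, and noise level below are hypothetical choices for illustration, not the ones used later in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical static nonlinearity: linear combination of 3 polynomial basis functions
a = np.array([1.0, 0.5, -0.2])                    # nonlinearity coefficients
basis = [lambda u: u, lambda u: u**2, lambda u: u**3]

# Hypothetical FIR impulse response (stable, decaying), normalized to unit gain
n = 20
g = 0.7 ** np.arange(n)
g /= g.sum()                                      # Assumption 1: unitary gain

# Simulate the Hammerstein system: u -> f(u) -> LTI -> noisy output
N = 200
u = rng.standard_normal(N)                        # measurable input
w = sum(ai * fi(u) for ai, fi in zip(a, basis))   # unavailable intermediate signal
y0 = np.convolve(g, w)[:N]                        # noiseless output, zero initial conditions
sigma = 0.1 * y0.std()                            # noise level chosen for the demo
y = y0 + sigma * rng.standard_normal(N)           # measured output
```

Any other choice of basis functions and impulse response fits the same template.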
2.1 Non-uniqueness of the identified system
It is well-known (see e.g. [bai2004convergence]) that the two components of a Hammerstein system can be determined only up to a scaling factor. In fact, for any , every pair , describes the input-output relation equally well. As suggested in [bai2004convergence], we circumvent this non-uniqueness issue by introducing the following assumption:
Assumption 1.
The impulse response has unitary gain, i.e. , and the sign of its first nonzero element is positive.
2.2 Notation and preliminaries
Given a sequence of scalars , we denote by its vector representation, i.e.
We reserve the symbol ⊗ to indicate the Kronecker product of two matrices (or vectors). We will make use of the bilinear property (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),
where A, B, C, and D are matrices of compatible dimensions. Denoting by vec(A) the column-wise vectorization of a matrix A, we recall that, for any two vectors u and v, vec(uv^T) = v ⊗ u. Given a vector , we introduce its reshape as
Given , the symbol denotes the Toeplitz matrix whose entries are elements of , namely
(3) 
Let
(4) 
and
(5) 
We have the following result, which will be used throughout the paper.
Lemma 1.
Let and be as in (3). Then
(6) 
Proof.
Note that
(7) 
which proves the statement. ∎
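Since the precise quantities in (4)-(6) are elided in this version of the text, the following sketch checks the convolution identity that, under our reading, underlies Lemma 1: lower-triangular Toeplitz operators built from two equal-length vectors commute, and both products equal the truncated convolution.

```python
import numpy as np

def T(v):
    """Lower-triangular Toeplitz matrix with first column v, as in (3)."""
    n = len(v)
    M = np.zeros((n, n))
    for k in range(n):
        M[k:, k] = v[: n - k]      # k-th column is v shifted down by k positions
    return M

rng = np.random.default_rng(1)
g = rng.standard_normal(8)
u = rng.standard_normal(8)

# Convolution is commutative, so the two Toeplitz operators agree on vectors,
# and both coincide with the truncated convolution of g and u
lhs = T(g) @ u
assert np.allclose(lhs, T(u) @ g)
assert np.allclose(lhs, np.convolve(g, u)[:8])
```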
Based on the equality stated by Lemma 1, we extend the Toeplitz notation to matrices, that is, given we write
(8) 
3 Identification via overparameterized models
In this paper, we deal with overparameterized approaches to Hammerstein system identification. To this end, we construct the matrices
(9) 
and
(10) 
Then, we can express the Hammerstein system dynamics by means of the linear regression model (see also [bai1998optimal])
(11) 
where
(12) 
This vector contains the unknown parameters of the Hammerstein model. Thus, it constitutes an overparameterization with respect to the original parameters and . A desirable property of any estimate of is that it should be expressible as in (12), namely as a Kronecker product of two vectors. We formalize this concept in the following definition.
Definition 1.
Let . We say that is a Kronecker overparameterized (KOP) vector if there exist and such that (12) holds.
The following lemma gives a property of KOP vectors.
Lemma 2.
Let be a KOP vector. Then and thus is a rank-one matrix.
Proof.
Follows from the identity . ∎
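Lemma 2 can be checked numerically. The ordering theta = a ⊗ g, so that the reshape equals the outer product of the coefficient vector and the impulse response, is our assumption about (12):

```python
import numpy as np

m, n = 3, 5
rng = np.random.default_rng(2)
a = rng.standard_normal(m)      # nonlinearity coefficients
g = rng.standard_normal(n)      # FIR impulse response

theta = np.kron(a, g)           # KOP vector, assumed ordering a (kron) g

# Reshaping theta row-wise into an m x n matrix yields the outer product a g^T,
# which has rank one (Lemma 2)
G = theta.reshape(m, n)
assert np.allclose(G, np.outer(a, g))
assert np.linalg.matrix_rank(G) == 1
```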
Under Assumption 1, an dimensional KOP vector can be uniquely decomposed into the  and  dimensional vectors and .
Proposition 1.
Let be a KOP vector. Let be the th row and and the th column of , define
(13) 
Then , and .
Proof.
From (13) we have that , so is the th element of hence . In addition
(14) 
and
(15) 
which completes the proof. ∎
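Since the exact expressions in (13) are elided here, the sketch below performs an equivalent decomposition under Assumption 1: because the impulse response has unit gain, the row sums of the reshaped KOP vector return the nonlinearity coefficients, and any row with a nonzero coefficient returns the impulse response.

```python
import numpy as np

m, n = 3, 5
rng = np.random.default_rng(3)
a = rng.standard_normal(m)           # nonlinearity coefficients
g = np.abs(rng.standard_normal(n))
g /= g.sum()                         # Assumption 1: unit gain, positive first sample

G = np.outer(a, g)                   # reshaped KOP vector (rank one)

# Each row of G is a_i * g; since g sums to one, the row sums recover a ...
a_hat = G.sum(axis=1)
# ... and dividing any row with a nonzero coefficient recovers g
i = int(np.argmax(np.abs(a_hat)))
g_hat = G[i] / a_hat[i]

assert np.allclose(a_hat, a)
assert np.allclose(g_hat, g)
```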
3.1 A review of an overparameterized method for Hammerstein system identification
In this section we review the identification procedure proposed in [bai1998optimal], which constitutes the starting point of our regularized kernel-based method. Given the model (11), consistent estimates of and can be obtained with the following steps (see [bai1998optimal] for details about consistency). First, we compute the least-squares estimate
(16) 
Then, since we know that is a KOP vector, that is, the reshaping of into an matrix must be rank-one (Lemma 2), we approximate to a KOP vector by approximating to a rank-one matrix. This can be done by solving the problem
(17) 
where denotes the Frobenius norm. Expressing
by means of its singular value decomposition, i.e.
(18)  
we find that the solution of (17) is . Then (since we have assumed ) and .
Note that, since in general is not a KOP vector, generally and the truncation required by the approximation (17) introduces a bias in the estimates and that degrades performance (see [hjalmarsson2004direct]). Another drawback of this method is that it requires the least-squares estimate of the possibly high-dimensional vector . Hence, despite its consistency property, the procedure can suffer from high variance in the estimates when is not (very) large.
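The two steps of the method, least squares followed by a rank-one truncation, can be sketched as follows. The regression matrix here is a random placeholder rather than the structured matrix built from (9)-(10), and the ordering of the Kronecker product is an assumption:

```python
import numpy as np

def lsop(Phi, y, m, n):
    """Sketch of the overparameterized least-squares method of [bai1998optimal]:
    LS estimate of the mn-dimensional vector, then a rank-one (SVD) truncation.
    The ordering theta = a (kron) g is an assumption."""
    theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    G = theta_ls.reshape(m, n)
    U, s, Vt = np.linalg.svd(G)          # best rank-one factors (Eckart-Young)
    a_hat, g_hat = U[:, 0] * s[0], Vt[0]
    scale = g_hat.sum()                  # enforce Assumption 1 on g_hat
    return a_hat * scale, g_hat / scale

# Demo on noiseless data with a random placeholder regression matrix
rng = np.random.default_rng(4)
m, n, N = 3, 10, 100
a = rng.standard_normal(m)
g = np.abs(rng.standard_normal(n))
g /= g.sum()
Phi = rng.standard_normal((N, m * n))
y = Phi @ np.kron(a, g)
a_hat, g_hat = lsop(Phi, y, m, n)
assert np.allclose(a_hat, a) and np.allclose(g_hat, g)
```

With noisy data, the least-squares estimate is no longer an exact Kronecker product and the truncation step introduces the bias discussed above.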
4 A regularized overparameterization method for Hammerstein system identification
In the previous section, we have seen that the estimator proposed in [bai1998optimal] suffers from high variance and from a bias that degrades performance. To control the variance of the estimate, we can use regularization (for a full treatment, see [bishop2006pattern], [hastie2005elements]); this means we have to select some properties we want to enforce on the estimated vector. As we have pointed out in the previous section, a vector is a good candidate estimate of the unknown vector if it satisfies the following properties:

is a KOP vector, so that it can be decomposed as in (12);

The mean square error of is low, so that the estimated values and are close to the true values.
A natural approach to incorporate (at least) the second property is based on regularization or, equivalently, on the Gaussian regression framework [rasmussen2006gaussian]. Thus, we model as a Gaussian random vector, namely
(19) 
where the covariance matrix (also called a kernel) is parameterized by the vector . The structure of determines the properties of the realizations from (19) and, consequently, of the estimates of . In the next subsection, we focus on designing a kernel suitable for Hammerstein system identification that incorporates also the first property.
4.1 The KOP kernel
We first recall the kernel-based identification approach for LTI systems proposed in [pillonetto2010regularized], [pillonetto2011prediction], and we model also as a realization of a zero-mean dimensional Gaussian process. Then we have
(20) 
where the kernel corresponds to the so-called first-order stable spline kernel (or TC kernel in [chen2012estimation]). It is defined as
(21) K_β(i, j) = β^max(i, j)
where the hyperparameter β is a scalar in the interval [0, 1). The choice of this kernel is motivated by the fact that it promotes BIBO-stable and smooth realizations. The decay rate of these realizations is regulated by β. Typical formulations of the stable spline kernel (see e.g. [pillonetto2014kernel]) include a scaling factor multiplying the kernel, in order to capture the amplitude of the unknown impulse response. Here such a hyperparameter is redundant, as we are working under Assumption 1.
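The first-order stable spline (TC) kernel admits a one-line implementation; here beta plays the role of the decay hyperparameter:

```python
import numpy as np

def tc_kernel(n, beta):
    """First-order stable spline (TC) kernel: K[i, j] = beta ** max(i, j),
    with 1-based indices and 0 <= beta < 1."""
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

K = tc_kernel(5, 0.6)
# Symmetric, positive definite, with geometrically decaying diagonal:
# realizations drawn from N(0, K) are smooth and decay to zero
assert np.allclose(K, K.T)
assert np.all(np.linalg.eigvalsh(K) > 0)
```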
To reconcile (20) with (19), we need to ensure that the transformation is a Gaussian vector, when is Gaussian. This is possible if is a (deterministic) dimensional vector. In this case is an dimensional Gaussian random vector with covariance matrix
(22) 
which is parameterized by the vector . In this way, we have defined a new kernel for system identification based on overparameterized vector regression. We formalize this in the following definition.
Definition 2.
Note that is rank-deficient, its rank being equal to . Rank-deficient kernels for system identification have also been studied in [chen2013rank].
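A sketch of the KOP kernel construction, under the assumption that the overparameterized vector is the Kronecker product of a deterministic coefficient vector and a Gaussian impulse response with stable spline covariance; the exact parameterization in (22) may differ:

```python
import numpy as np

def tc_kernel(n, beta):
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

m, n, beta = 3, 6, 0.7
rng = np.random.default_rng(5)
a = rng.standard_normal(m)          # deterministic coefficient vector

# Covariance of theta = a (kron) g when g ~ N(0, K_beta):
# a rank-one factor composed with the stable spline kernel
P = np.kron(np.outer(a, a), tc_kernel(n, beta))

# P is rank deficient: rank(a a^T) * rank(K_beta) = 1 * n = n < m * n
assert np.linalg.matrix_rank(P) == n
```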
4.2 Estimation of the overparameterized vector
We now derive the estimation procedure for the vector . Recalling that the noise distribution is Gaussian and given the Gaussian description (19), the joint distribution of and is Gaussian. Hence, we can write
(24)
where and
(25) 
In (24) we have highlighted the dependence of the joint distribution on the vector and the noise variance . Assume these quantities are given; then the minimum mean square error estimate of can be computed as (see e.g. [anderson2012optimal])
(26)  
To be able to compute (24) we first need to determine and . This can be done by maximizing the ML of the output data (see e.g. [pillonetto2014tuning]). Then we have
(27) 
The resulting estimation procedure for can be summarized by the following two steps.
Having obtained , it remains to establish how to decompose it in order to obtain the estimates and . In the next section we shall see that, using the proposed approach, such an operation becomes natural.
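For fixed hyperparameters, the second step of the procedure is a standard Gaussian-regression computation; a sketch, with the marginal-likelihood tuning of the first step omitted:

```python
import numpy as np

def kop_estimate(Phi, y, P, sigma2):
    """Posterior mean of theta ~ N(0, P) given y = Phi theta + e, e ~ N(0, sigma2 I).
    This is the regularized least-squares estimate of step two; the
    marginal-likelihood tuning of (P, sigma2) in step one is assumed done."""
    S = Phi @ P @ Phi.T + sigma2 * np.eye(len(y))   # output covariance matrix
    return P @ Phi.T @ np.linalg.solve(S, y)

# Sanity check: with P = lam * I the estimate reduces to ordinary ridge regression
rng = np.random.default_rng(6)
N, d = 30, 5
Phi = rng.standard_normal((N, d))
y = Phi @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)
lam, sigma2 = 2.0, 0.01
theta_hat = kop_estimate(Phi, y, lam * np.eye(d), sigma2)
ridge = np.linalg.solve(Phi.T @ Phi + (sigma2 / lam) * np.eye(d), Phi.T @ y)
assert np.allclose(theta_hat, ridge)
```

Note that the formula remains valid when P is rank deficient, as is the case for the KOP kernel, since no inverse of P is required.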
5 Properties of the estimated overparameterized vector
In this section, we analyze some properties of the regularized overparameterization estimate of . In particular, we show that the estimates produced by (26) are KOP vectors. Then, we show that this procedure leads to exactly the same estimator as the one we proposed in [risuleo2015kernel], where the coefficients of the nonlinearity were considered as model parameters and not included among the kernel hyperparameters.
To further specify the equivalence, we first briefly review the Hammerstein system identification approach proposed in [risuleo2015kernel] which is based on a different Gaussian process assumption.
5.1 A review of the method proposed in [risuleo2015kernel]
Let . Then we can model the measurements with the linear relation
(28) 
Modeling as a Gaussian random vector with covariance given by the stable spline kernel (21), we notice that a joint Gaussian description holds between and . Hence we can write
(29) 
where and . Note that (29) depends on the parameters , and . These parameters are estimated via ML maximization, that is, by solving
(30) 
The minimum mean square error estimate of is then computed as
(31) 
In the next section, we point out the strong connection between (31) and the estimate (26) produced by the KOP kernel-based regression approach.
5.2 The estimate (26) is a KOP vector
In this section we prove that, when the KOP kernel-based method is used, the resulting estimates (26) can be decomposed as Kronecker products of lower-dimensional vectors and thus they are KOP vectors. Before arriving at this result, we show the equivalence between the output measurement models (11) and (28).
Lemma 3.
Proof.
Recalling the bilinear property of the Kronecker product and Lemma 1, we see that
which proves the result. ∎
Theorem 1.
Proof.
Let
(33)  
(34) 
be the marginal likelihoods of the two models. We find that
Using Lemma 3, we have that , hence and are equivalent. The same immediately holds for their ML maximizers and . ∎
We are now in a position to prove that the estimate is a KOP vector.
Theorem 2.
Proof.
Corollary 1.
The estimate given in (26) is a KOP vector and is rankone.
Proof.
We can make an interesting observation that further links the KOP estimate to our previous kernel-based estimator:
Corollary 2.
We have established that the estimate produced using the procedure detailed in Section 4 is a KOP vector. So, the estimates of the impulse response and the nonlinearity coefficients can be retrieved using (13). The whole procedure is summarized in Algorithm 1.
The outlined regularization procedure applied to the overparameterized vector, with a suitable rank-deficient kernel, yields estimates that are equivalent to those provided by the procedure of [risuleo2015kernel].
6 Numerical Experiments
We evaluate the proposed algorithm with numerical simulations of Hammerstein systems. We set up 4 experiments in order to test different experimental conditions. The experiments consist of 200 independent Monte Carlo runs each. At every Monte Carlo run, we generate random data and Hammerstein systems, according to the following specifications.

The linear subsystem model is of output error type:
(38) generated by picking poles and zeros at random. The poles and zeros were sampled in conjugate pairs with uniform in and uniform in .

The input nonlinearity is a polynomial of fourth order. It is a linear combination of Legendre polynomial basis functions, defined as
(39) where . The coefficients are chosen uniformly in .

The input to the system is Gaussian white noise with unit variance.

The experimental data consist of pairs of input-output samples, simulated from zero initial conditions.

The measurement noise is Gaussian and white. Its variance is a fraction of the noiseless output variance, i.e.
(40) where depends on the experiment.
Every experiment is carried out in a different signal-to-noise ratio (SNR) condition, see Table I.
Experiment:   1    2    3    4
SNR:         10   20   50  100
We aim to estimate samples of the impulse responses of the LTI systems (which are such that and with the sign of the first sample positive) and the coefficients of the nonlinear block. We test the following estimation methods.

KOP: This is the method described in this paper. The ML optimization problem is solved using the function fminsearch available in MATLAB. The search is initialized with the elements of uniformly sampled in , , and equal to the sample variance of the residuals of .

LSOP: This estimator implements the least-squares overparameterization-based method proposed in [bai1998optimal] and briefly reviewed in Section 3.1. Note that, under the working experimental conditions, this method has to perform a least-squares estimate of a dimensional vector.

NLHW: This is the MATLAB function nlhw, which uses the prediction error method to identify the linear block in the system (see [ljung2009developments] for details). To get the best performance from this method, we equip it with an oracle that knows the true order of the LTI system generating the measurements and the order of the polynomial input nonlinearity.
Note that all the methods have the same amount of prior information available, namely the order of the input polynomial. The order of the linear block is known only to NLHW, which makes use of a parametric description of the linear system. Furthermore, we note that, due to the Gaussianity of the noise, the least-squares procedure in LSOP is optimal in the Gauss-Markov sense.
We assess the accuracy of the estimated models using two performance indices. The first is the fit of the system impulse response, defined as
(41) 
where is the system generated at the th run of each experiment, its estimate and its mean. The second is the fit of the static nonlinear function, given by
(42) 
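Assuming (41)-(42) follow the usual normalized-fit convention (100 means a perfect estimate, 0 means no better than the sample mean), the indices can be computed as:

```python
import numpy as np

def fit_index(x_true, x_hat):
    """Normalized fit in percent: 100 * (1 - ||x - x_hat|| / ||x - mean(x)||).
    100 is a perfect fit; 0 is no better than predicting the sample mean.
    This convention is an assumption about (41)-(42)."""
    num = np.linalg.norm(x_true - x_hat)
    den = np.linalg.norm(x_true - x_true.mean())
    return 100.0 * (1.0 - num / den)

g = np.array([0.5, 0.3, 0.15, 0.05])
assert fit_index(g, g) == 100.0
assert abs(fit_index(g, np.full(4, g.mean()))) < 1e-9
```

The same function applies to both the impulse response and the static nonlinearity evaluated on a grid.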
Figure 1 shows the outcomes of the 4 experiments. The box plots compare the results of KOP, LSOP and NLHW for the considered signal-to-noise ratios. We can see that, for high SNR, all the estimators perform well, especially in identifying the nonlinearity coefficients. For lower SNR, however, the proposed method KOP performs substantially better than the others. This is mainly due to the regularizing effect of the KOP kernel, which reduces the variance of the estimates. Notice also that the proposed approach enforces the rank deficiency in the reshaped version of , so it circumvents the errors introduced by the rank-one approximation made by LSOP. The main drawback of NLHW is that it relies on a high-dimensional nonlinear optimization, as it needs to estimate all the parameters in the model. The proposed method is instead nonparametric, and does not rely on knowledge of the order of the LTI system.
7 Conclusions
Regularization is an effective technique to control the variance of least-squares estimates. In this paper we have studied how to improve popular overparameterization methods for Hammerstein system identification using Gaussian process regression with a suitable prior. To this end, starting from the stable spline kernel, we have introduced the KOP kernel, which we believe to be a novel concept in Hammerstein system identification. Using the KOP kernel, we have designed a regularized least-squares estimator which provides an estimate of the overparameterized vector. The impulse response of the LTI system and the coefficients of the static nonlinearity are then retrieved by suitably decomposing the estimated vector. In contrast with the original overparameterization method, this decomposition involves no approximation. An important contribution is showing that the estimate produced by this procedure is equivalent to our recently proposed kernel-based method [risuleo2015kernel]. Using simulations, we have shown that the proposed method compares very favorably with current state-of-the-art algorithms for Hammerstein system identification.
The introduction of the KOP kernel opens up possibilities for new effective system identification methods based on the combination of overparameterized vectors and regularization techniques. In fact, we believe that Hammerstein system identification is not the only problem where KOP kernels could find application. Another possible extension of the proposed method is the design of new kernels merging a kernel for the static nonlinearity with the stable spline kernel. The main issue with this approach is that, at least in theory, the Gaussian description of the resulting overparameterized vector would be lost.