A new kernel-based approach for overparameterized Hammerstein system identification

In this paper we propose a new identification scheme for Hammerstein systems, which are dynamic systems consisting of a static nonlinearity and a linear time-invariant dynamic system in cascade. We assume that the nonlinear function can be described as a linear combination of p basis functions. We reconstruct the p coefficients of the nonlinearity together with the first n samples of the impulse response of the linear system by estimating an np-dimensional overparameterized vector, which contains all the combinations of the unknown variables. To avoid high variance in these estimates, we adopt a regularized kernel-based approach and, in particular, we introduce a new kernel tailored for Hammerstein system identification. We show that the resulting scheme provides an estimate of the overparameterized vector that can be uniquely decomposed as the combination of an impulse response and p coefficients of the static nonlinearity. We also show, through several numerical experiments, that the proposed method compares very favorably with two standard methods for Hammerstein system identification.



There are no comments yet.


page 1

page 2

page 3

page 4


Blind system identification using kernel-based methods

We propose a new method for blind system identification. Resorting to a ...

Identification of complex systems in the basis of wavelets

In this paper is proposed the method of the identification of complex dy...

Outlier robust system identification: a Bayesian kernel-based approach

In this paper, we propose an outlier-robust regularized kernel-based met...

A novel Multiplicative Polynomial Kernel for Volterra series identification

Volterra series are especially useful for nonlinear system identificatio...

A new kernel-based approach to system identification with quantized output data

In this paper we introduce a novel method for linear system identificati...

Estimation of sparse linear dynamic networks using the stable spline horseshoe prior

Identification of the so-called dynamic networks is one of the most chal...

Robust EM kernel-based methods for linear system identification

Recent developments in system identification have brought attention to r...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

A nonlinear system is usually called an Hammerstein system when it is composed of two blocks in cascade, the first being a static nonlinearity and the second a linear time-invariant (LTI) dynamic system [ljung1999system].

There are several areas in science and engineering where Hammerstein systems find applications, see e.g. [hunter1986identification][westwick2001separable][bai2009identification]. For this reason, in recent years Hammerstein system identification has become a popular and rather active research topic [schoukens2011parametric][han2012hammerstein].

Several approaches have been proposed for Hammerstein system identification. For instance, in [greblicki1986identification] a kernel-based regression method is described, [greblicki2002stochastic] proposes an identification approach based on a stochastic approximation, while [goethals2005subspace] focuses on subspace methods. In [westwick2001separable][bai2004convergence] and [rangan1995new] iterative methods based on least-squares are studied.

An interesting approach was proposed by Er-Wei Bai in [bai1998optimal]. Here, the static nonlinearity is modeled as the linear combination of basis functions, while the LTI system is assumed to be a finite impulse response (FIR) with

coefficients. The Hammerstein system is then modeled as a linear regression, where the regressor vector is

-dimensional. Since it contains all the combinations of the nonlinearity coefficients and the FIR coefficients, this vector is usually called overparameterized vector. Its estimate is obtained via least-squares and then it is decomposed in order to obtain the nonlinearity coefficients and the impulse response. Albeit proven to be asymptotically consistent, the whole procedure suffers of two main drawbacks. First, since it relies on a least-squares estimation of a possibly very high-dimensional vector, the final estimates may suffer from high variance [ljung1999system]. Second, the procedure does not guarantee that the estimated -dimensional vector can be exactly decomposed to obtain the nonlinearity coefficients and the FIR system, and thus approximations are required.

In this paper, we propose a regularization technique to curb the variance of the estimates of the overparameterized vector. Similarly to [bai1998optimal], we model the Hammerstein system dynamics using the aforementioned overparameterized vector, then we solve the regression problem relying on a kernel-based approach. To this end, we introduce a novel kernel, called the Kronecker overparameterized (KOP) kernel, which is the composition of a rank-one positive semi-definite matrix and the so-called first-order stable spline kernel (see [pillonetto2010new][pillonetto2011prediction][bottegal2013regularized], and [pillonetto2014kernel] for details). The structure of this kernel depends on a few parameters (also called hyperparameters in this context), which we need to estimate from data. This task is addressed by an empirical Bayes approach [maritz1989empirical], that is to say by maximizing the marginal likelihood (ML) of the output. Once the kernel parameters are fixed, the overparameterized vector is estimated via regularized least squares [pillonetto2014kernel]. Equivalently, we can think of the overparameterized vector as a Gaussian random vector with zero-mean and covariance matrix given by the KOP kernel. With this interpretation, the estimate corresponds to the minimum mean square error estimate in the Bayesian sense [wahba1990spline].

A contribution of this paper is to reveal some interesting properties of the estimated overparameterized vector provided by the proposed method. We prove that, as opposed to [bai1998optimal], this estimate can be decomposed exactly in order to obtain the nonlinearity coefficients and the LTI system impulse response, with no loss of information due to approximations. The concept of exact decomposition will be made clear throughout the paper. We also demonstrate strong connections with our recently proposed method [risuleo2015kernel], effectively proving that, although the two approaches are inherently different, the estimates obtained with the two methods are equivalent. Finally, we show, through several numerical experiments, that the proposed method outperforms both the algorithm proposed in [bai1998optimal] and the standard matlab system identification toolbox function for Hammerstein system identification.

The paper is organized as follows. In the next section, we formulate the Hammerstein system identification problem. In Section 3, we describe the modeling approach based on overparameterized vectors. In Section 4, we introduce the proposed identification scheme, and we give some theoretical background in Section 5. Numerical experiments are illustrated in Section 6. Some conclusions end the paper.

2 Problem formulation

We consider a stable single input single output discrete-time system described by the following time-domain relations (see Figure 1)


In the above equation, represents a (static) nonlinear function transforming the measurable input into the unavailable signal , which in turn feeds a strictly causal stable LTI system, described by the impulse response . The output measurements of the system are corrupted by white Gaussian noise, denoted by , which has unknown variance . Following a standard approach in Hammerstein system identification (see e.g. [bai1998optimal]), we assume that can be modeled as a combination of known basis functions , namely


where the coefficients are unknown.

Figure 1: Block scheme of the Hammerstein system.

We assume that input-output samples are collected, and denote them by , . For notational convenience, we also assume null initial conditions. Then, the system identification problem we discuss in this paper is the problem of estimating samples of the impulse response, say (where is large enough to capture the system dynamics), as well as the coefficients characterizing the static nonlinearity .

2.1 Non-uniqueness of the identified system

It is well-known (see e.g. [bai2004convergence]) that the two components of a Hammerstein system can be determined up to a scaling factor. In fact, for any , every pair , describes the input-output relation equally well. As suggested in [bai2004convergence], we will circumvent this non-uniqueness issue by introducing the following assumption:

Assumption 1.

The impulse response has unitary gain, i.e. , and the sign of its first non-zero element is positive.

2.2 Notation and preliminaries

Given a sequence of scalars , we denote by its vector representation, i.e.

We reserve the symbol to indicate the Kronecker product of two matrices (or vectors). We will make use of the bilinear property

where , , , and have proper dimensions. Denoting by the columnwise vectorization of a matrix , we recall that, for any two vectors and , . Given a vector , we introduce its reshape as

Given , The symbol denotes the Toeplitz matrix whose entries are elements of , namely






We have the following result, which will be used throughout the paper.

Lemma 1.

Let and be as in (3). Then


Note that


which proves the statement. ∎

Based on the equality stated by Lemma 1, we extend the Toeplitz notation to matrices, that is, given we write


3 Identification via overparameterized models

In this paper, we deal with overparameterized approaches to Hammerstein system identification. To this end, we construct the matrices




Then, we can express the Hammerstein system dynamics problem by means of the linear regression model (see also [bai1998optimal])




This vector contains the unknown parameters of the Hammerstein model. Thus, it constitutes an overparameterization with respect to the original parameters and . A desirable property of any estimate of is that it should be expressible as (12), namely as a Kronecker product of two vectors. We formalize this concept in the following definition.

Definition 1.

Let . We say that is a Kronecker overparameterized (KOP) vector if there exist and such that (12) holds.

The following lemma gives a property of KOP vectors.

Lemma 2.

Let be a KOP vector. Then and thus is a rank-one matrix.


Follows from the identity . ∎

Under Assumption 1, an -dimensional KOP vector can be uniquely decomposed into the - and - dimensional vectors and .

Proposition 1.

Let be a KOP vector. Let be the th row and and the th column of , define


Then , and .


From (13) we have that , so is the th element of hence . In addition




which completes the proof. ∎

3.1 A review of an overparameterized method for Hammerstein system identification

In this section we review the identification procedure proposed in [bai1998optimal], which constitutes the starting point of our regularized kernel-based method. Given the model (11), consistent estimates of and can be obtained with the following steps (see [bai1998optimal] for details about consistency). First, we compute the least-squares estimate


Then, since we know that is a KOP vector, that is, the reshaping of into an matrix must be rank-one (Lemma 2), we approximate to a KOP vector by approximating to a rank-one matrix. This can be done by solving the problem


where denotes the Frobenius norm. Expressing

by means of its singular value decomposition, i.e.


we find that the solution of (17) is . Then (since we have assumed ) and .

Note that, since in general in not a KOP vector, generally and the truncation required by the approximation (17) introduces a bias in the estimates and that degrades performance (see [hjalmarsson2004direct]). Another drawback of this method is that it requires the least-squares estimate of the possibly high-dimensional vector . Hence, despite its consistency property, the procedure can suffer from high variance in the estimates when is not (very) large.

4 A regularized overparameterization method for Hammerstein system identification

In the previous section, we have seen that the estimator proposed in [bai1998optimal] suffers from high variance and from a bias that degrades performance. To control the variance of the estimate, we can use regularization (for a full treatment, see [bishop2006pattern][hastie2005elements]); this means we have to select some properties we want to enforce on the estimated vector. As we have pointed out in the previous section, a vector is a good candidate estimate of the unknown vector if it satisfies the following properties:

  1. is a KOP vector, so that it can be decomposed as in (12);

  2. The mean square error of is low, so that the estimated values and are close to the true values.

A natural approach to incorporate (at least) the second property is based on regularization or, equivalently, on the Gaussian regression framework [rasmussen2006gaussian]. Thus, we model as a Gaussian random vector, namely


where the covariance matrix (also called a kernel) is parameterized by the vector . The structure of determines the properties of the realizations from (19) and, consequently, of the estimates of . In the next subsection, we focus on designing a kernel suitable for Hammerstein system identification that incorporates also the first property.

4.1 The KOP kernel

We first recall the kernel-based identification approach for LTI systems proposed in [pillonetto2010regularized][pillonetto2011prediction], and we model also as a realization of a zero-mean -dimensional Gaussian process. Then we have


where the kernel corresponds to the so-called first-order stable spline kernel (or TC kernel in [chen2012estimation]). It is defined as


where the hyperparameter is a scalar in the interval . The choice of this kernel is motivated by the fact that it promotes BIBO stable and smooth realizations. The decay velocity of these realizations is regulated by . Typical formulations of the stable spline kernel (see e.g. [pillonetto2014kernel]) include a scaling factor multiplying the kernel, in order to capture the amplitude of the unknown impulse response. Here such an hyperparameter is redundant, as we are working under Assumption 1.

To reconcile (20) with (19), we need to ensure that the transformation is a Gaussian vector, when is Gaussian. This is possible if is a (deterministic) -dimensional vector. In this case is an -dimensional Gaussian random vector with covariance matrix


which is parameterized by the vector . In this way, we have defined a new kernel for system identification based on overparameterized vector regression. We formalize this in the following definition.

Definition 2.

We the define the Kronecker overparameterized (KOP) kernel as


where is as in (21).

Note that is rank-deficient, its rank being equal to . Rank-deficient kernels for system identification have also been studied in [chen2013rank].

4.2 Estimation of the overparameterized vector

We now derive the estimation procedure for the vector . Recalling that the noise distribution is Gaussian and given the Gaussian description of  (19

), the joint distribution of

and is Gaussian. Hence, we can write


where and


In (24) we have highlighted the dependence of the joint distribution on the vector and the noise variance . Assume these quantities are given; then the minimum mean square error estimate of can be computed as (see e.g. [anderson2012optimal])


To be able to compute (24) we first need to determine and . This can be done by maximizing the ML of the output data (see e.g. [pillonetto2014tuning]). Then we have


The resulting estimation procedure for can be summarized by the following two steps.

  1. Solve (4.2) to obtain .

  2. Compute (26) using the estimated parameters.

Having obtained , it remains to establish how to decompose it in order to obtain the estimates and . In the next section we shall see that, using the proposed approach, such an operation becomes natural.

5 Properties of the estimated overparameterized vector

In this section, we analyze some properties of the regularized overparameterization estimate of . In particular, we show that the estimates produced by (26) are KOP vectors. Then, we show that this procedure leads to exactly the same estimator as the one we proposed in [risuleo2015kernel]; where the coefficients of the nonlinearity were considered as model parameters and not included among the kernel hyperparameters.

To further specify the equivalence, we first briefly review the Hammerstein system identification approach proposed in [risuleo2015kernel] which is based on a different Gaussian process assumption.

5.1 A review of the method proposed in  [risuleo2015kernel]

Let . Then we can model the measurements with the linear relation


Modeling as a Gaussian random vector with covariance given by the stable spline kernel (21), we notice that a joint Gaussian description holds between and . Hence we can write


where and . Note that (29) depends on the parameters , and . These parameters are estimated via ML maximization, that is by solving


The minimum mean square estimate of is then computed as


In the next section, we point out the strong connection between (31) and the estimate (26), produced by the KOP kernel-based regression approach.

5.2 The estimate (26) is a KOP vector

In this section we prove that, when the KOP kernel-based method is used to estimate (26), the resulting estimates can be decomposed as Kronecker products of lower-dimensional vectors and thus they are KOP vectors. Before arriving at this result we show the equivalence between the output measurement models (11) and (28).

Lemma 3.

Let and as in (10). Then


where and are the KOP and the stable spline kernels.


Recalling the bilinear property of the Kronecker product and Lemma 1, we see that

which proves the result. ∎

Theorem 1.

Consider the output measurement models (11) and (28). Then:

  1. The marginal likelihoods of obtained from the two models are equivalent;

  2. The parameter estimates obtained from (4.2) and (30) are the same.




be the marginal likelihoods of the two models. We find that

Using Lemma 3, we have that , hence and are equivalent. The same promptly holds for their ML maximizers and . ∎

We are now in the position to prove that the estimate is a KOP vector

Theorem 2.

Assume that and are estimated using the ML approach (4.2) (or, equivalently, (30)). Then, the minimum variance estimate of in (26) is such that


where is the minimum variance estimate of in (31) and is the ML estimate of .


Using (26) and recalling the bilinear property of the Kronecker product and Lemma 1, we have


From Theorem 1 we know that ; thus, recalling (31) we have


so that (35) is obtained. ∎

Corollary 1.

The estimate given in (26) is a KOP vector and is rank-one.


Since from (35) we have , is a KOP vector. The second part of the statement follows directly from Lemma 2. ∎

We can make an interesting observation, that further links the KOP estimate to our previous kernel based estimator:

Corollary 2.

The estimates of the nonlinearity coefficients , found maximizing (4.2) and those resulting from decomposing as in (35) are the same.


Follows directly from (5.2) and Theorem 1. ∎

We have established that the estimate produced using the procedure detailed in Section 4 is a KOP vector. So, the estimates of the impulse response and the nonlinearity coefficients can be retrieved using (13). The whole procedure is summarized in Algorithm 1.

Figure 2: Results of the Monte Carlo experiments for different SNR. Top: Fit in percent of the linear system impulse response. Bottom: Fit in percent of the nonlinear transformation.

Algorithm 1: KOP kernel-based Hammerstein system identification
Input: ,
Output: ,

  1. Obtain , solving (4.2)

  2. Estimate using (26)

  3. Find , normalizing by (Proposition 1).

The result of the outlined regularization procedure applied to the overparameterized vector, with a suitable rank deficient kernel, yields estimates that are equivalent to the ones provided by the procedure outlined in [risuleo2015kernel].

6 Numerical Experiments

We evaluate the proposed algorithm with numerical simulations of Hammerstein systems. We set up 4 experiments in order to test different experimental conditions. The experiments consist of 200 independent Monte Carlo runs each. At every Monte Carlo run, we generate random data and Hammerstein systems, according to the following specifics.

  • The linear subsystem model is of output error type:


    generated by picking poles and zeros at random. The poles and zeros were sampled in conjugate pairs with uniform in and uniform in .

  • The input nonlinearity is a polynomial of fourth order. It is a linear combination of Legendre polynomial basis functions, defined as


    where . The coefficients are chosen uniformly in .

  • The input to the system is Gaussian white noise with unit variance.

  • The experimental data consists in pairs of input-output samples, simulated from zero initial conditions.

  • The measurement noise is Gaussian and white. Its variance is a fraction of the noiseless output variance, i.e.


    where depends on the experiment.

Every experiment is carried out in a different signal to noise ratio (SNR) condition, see Table I.

Experiment 1 2 3 4
SNR 10 20 50 100
Table 1: SNR considered in the experiments

We aim at estimating samples of the system impulse response of the LTI systems (which are such that and with the sign of the first sample positive) and the coefficients of the nonlinear block. We test the following estimation methods.

  • KOP: This is the method described in this paper. The ML optimization problem is solved using the function fminsearch available in matlab. The search was initialized with the elements of uniformly sampled in , , and equal to the sample variance of the residuals of .

  • LS-OP: This estimator implements the least-squares overparameterization-based method proposed in [bai1998optimal] and briefly reviewed in Section 3.1. Note that, under the working experimental conditions, this method has to perform a least-squares estimate of a -dimensional vector.

  • NLHW: This is the matlab function nlhw that uses the prediction error method to identify the linear block in the system (see [ljung2009developments] for details). To get the best performance from this method, we equip it with an oracle that knows the true order of the LTI system generating the measurements and knows the order of the polynomial input nonlinearity.

Note that all the methods have available the same amount of prior information, namely the orders of the input polynomial. The knowledge of th order of the linear block is known only to NLHW, which makes use of a parametric description of the linear system. Furthermore, we note that, due to the Gaussianity of the noise, the least-squares procedure in LS-OP is optimal in the Gauss-Markov sense.

We assess the accuracy of the estimated models using two performance indices. The first is the fit of the system impulse response, defined as


where is the system generated at the -th run of each experiment, its estimate and its mean. The second is the fit of the static nonlinear function, given by


Figure 1 shows the results of the outcomes of the 4 experiments. The box plots compare the results of KOP, LS-OP and NLHW for the considered signal to noise ratios. We can see that, for high SNR, all the estimators perform well, especially in identifying the nonlinearity coefficients. For lower SNR , however, the proposed method KOP performs substantially better than the others. This is mainly because of the regularizing effect of the KOP kernel that reduces the variance of the estimates. Notice also that the proposed approach enforces the rank deficiency in the reshaped version of , so it circumvents the errors introduced by the rank-one approximation made by LS-OP. The main drawback of NLHW is that it relies on a high dimensional nonlinear optimization, as it needs to estimate all the parameters in the model. The proposed method is instead nonparametric, and does not rely on the knowledge of the order of the LTI system.

7 Conclusions

Regularization is an effective technique to control the variance of least squares estimates. In this paper we have studied how to improve popular overparameterization methods for Hammerstein system identification using Gaussian process regression with a suitable prior. To this end, starting from the stable spline kernel, we have introduced the KOP kernel, which we believe to be a novel concept in Hammerstein system identification. Using the KOP kernel, we have designed a regularized least-squares estimator which provides an estimate of the overparameterized vector. The impulse response of the LTI system and the coefficients of the static nonlinearity are then retrieved by suitably decomposing the estimated vector. In contrast with the original overparameterization method, this decomposition involves no approximation. An important contribution is showing that this procedure estimate is equivalent to our recently proposed kernel-based method [risuleo2015kernel]. Using simulations, we have shown that the proposed method compares very favorably with the current state-of-the-art algorithms for Hammerstein system identification.

The introduction of the KOP kernel possibly opens up for new effective system identification methods based on the combination of overparameterized vectors and regularization techniques. In fact, we believe that Hammerstein system identification is not the only problem where KOP kernels could find application. Another possible extension of the proposed method is the design of new kernels merging a kernel for the static nonlinearity and the stable spline kernel. The main issue with this approach is that, at least theoretically, the Gaussian description of the resulting overparameterization vector would be lost.