# A prediction perspective on the Wiener-Hopf equations for discrete time series

The Wiener-Hopf equations are a Toeplitz system of linear equations that have several applications in time series. These include the update and prediction step of the stationary Kalman filter equations and the prediction of bivariate time series. The Wiener-Hopf technique is the classical tool for solving the equations, and is based on a comparison of coefficients in a Fourier series expansion. The purpose of this note is to revisit the (discrete) Wiener-Hopf equations and obtain an alternative expression for the solution that is more in the spirit of time series analysis. Specifically, we propose a solution to the Wiener-Hopf equations that combines linear prediction with deconvolution. The solution of the Wiener-Hopf equations requires one to obtain the spectral factorization of the underlying spectral density function. For general spectral density functions this is infeasible. Therefore, it is usually assumed that the spectral density is rational, which allows one to obtain a computationally tractable solution. This leads to an approximation error when the underlying spectral density is not a rational function. We use the proposed solution together with Baxter's inequality to derive an error bound for the rational spectral density approximation.

## Authors

• 4 publications
• 4 publications
03/01/2021

### Posterior consistency for the spectral density of non-Gaussian stationary time series

Various nonparametric approaches for Bayesian spectral density estimatio...
07/01/2020

### Spectral methods for small sample time series: A complete periodogram approach

The periodogram is a widely used tool to analyze second order stationary...
01/12/2018

### A note on Herglotz's theorem for time series on function spaces

01/18/2022

We show that solution to the Hermite-Padé type I approximation problem l...
05/17/2019

### Functional Lagged Regression with Sparse Noisy Observations

A (lagged) time series regression model involves the regression of scala...
12/11/2019

### The Wasserstein-Fourier Distance for Stationary Time Series

We introduce a novel framework for analysing stationary time series base...
01/16/2020

### The Widely Linear Complex Ornstein-Uhlenbeck Process with Application to Polar Motion

Complex-valued and widely linear modelling of time series signals are wi...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The Wiener-Hopf technique (Wiener and Hopf, 1931; Hopf, 1934) was first proposed in the 1930s as a method for solving an integral equation of the form

 g(τ)=∫∞0h(t)c(τ−t)dtforτ∈[0,∞)

in terms of , where is a known difference kernel and is a specified function. The above integral equation and the Weiner-Hopf technique have been used widely in many applications in applied mathematics and engineering (see Lawrie and Abrahams (2007) for a review). In the 1940s, Wiener (1949) reformulated the problem within discrete “time”, which is commonly referred to as the Wiener (causal) filter. The discretization elegantly encapsulates several problems in time series analysis. For example, the best fitting finite order autoregressive parameters fall under this framework. The autoregressive parameters can be expressed as a finite interval Wiener-Hopf equations (commonly referred to as the FIR Wiener filter), for which Levinson (1947) and Durbin (1960) proposed a method for solving these equations. More broadly, the best linear predictor of a causal stationary time series naturally gives rise to the Wiener filter. For example, the prediction of hidden states based on the observed states in a Kalman filter model. The purpose of this paper is to revisit the discrete-time Wiener-Hopf equations (it is precisely defined in (1.2)) and derive an alternative solution using the tools of linear prediction. Below we briefly review some classical results on the Wiener filter.

Suppose that is a real-valued, zero mean bivariate weakly stationary time series where and

are defined on the same probability space

. Let and be the autocovariance and cross-covariance function respectively. Let denote the real Hilbert space in spanned by . The inner product and norm on is and respectively. For , let be the closed subspace of spanned by and is the orthogonal projection of onto . The orthogonal projection of onto is

 PH0(Y0)=∞∑j=0hjX−j, (1.1)

were . To evaluate , we rewrite (1.1) as a system of linear equations. By using that is an orthogonal projection of onto it is easily shown that (1.1) leads to the system of normal equations

 cYX(ℓ)=∞∑j=0hjc(ℓ−j)forℓ=0,1,.... (1.2)

The above set of equations is typically referred to as the Wiener-Hopf equations (or semi-infinite Toeplitz equations). There are two well-known methods for solving this equation in the frequency domain; the Wiener-Hopf technique (sometimes called the gapped function, see

Wiener (1949)) and the prewhitening method proposed in Bode and Shannon (1950) and Zadeh and Ragazzini (1950). Both solutions solve for (see Kailath (1974), Kailath (1980) and Orfanidis (2018), Sections 11.3-11.8). The Wiener-Hopf technique is based on the spectral factorization and a comparison of Fourier coefficients corresponding to negative and positive frequencies. The prewhitening method, as the name suggests, is more in the spirit of time series where the time series is whitened using an autoregressive filter. We assume the spectral density satisfies the condition . Then, admits an infinite order MA and AR representation (see Pourahmadi (2001), Sections 5-6 and Krampe et al. (2018), page 706)

 Xt=εt+∞∑j=1ψjεt−j,Xt−∞∑j=1ϕjXt−j=εtt∈Z (1.3)

where , , and

is a white noise process with

.

From (1.3), we immediately obtain the spectral factorization , where . Given , we use the notation and . Both the Wiener-Hopf technique and prewhitening method yield the solution

 H(ω)=σ−2ϕ(ω)[ϕ(ω)∗fYX(ω)]+, (1.4)

where and is a complex conjugate of .

The normal equations in (1.2) belong to the general class of Wiener-Hopf equations of the form

 gℓ=∞∑j=0hjc(ℓ−j)forℓ≥0, (1.5)

where is a symmetric, positive definite sequence. The Wiener-Hopf technique yields the solution

 H(ω)=σ−2ϕ(ω)[ϕ(ω)∗G+(ω)]+, (1.6)

where (the derivation is well-known, but for completeness we give a short proof in Section 2.3). An alternative method for solving for is within the time domain. This is done by representing (1.5) as the semi-infinite Toeplitz system

 g+=T(f)h+, (1.7)

where and are semi-infinite (column) sequences and is a Toeplitz matrix of the form . Let (setting ) denote the autoregressive coefficients corresponding to defined as in (1.3),

be its Fourier transform. By letting

for , we define the lower triangular Toeplitz matrix . Provided that , it is well-known that is invertible on , and the inverse is (see, for example, Theorem III of Widom (1960)), thus the time domain solution to (1.5) is .

In this paper, we study the Wiener-Hopf equations from a time series perspective, combining linear prediction methods developed in the time domain with the deconvolution method in the frequency domain. Observe that (1.5) is semi-infinite convolution equations (since the equations only hold for non-negative index ), thus the standard deconvolution approach is not possible. In Subba Rao and Yang (2021), we used the tools of linear prediction to rewrite the Gaussian likelihood of a stationary time series within the frequency domain. We transfer some of these ideas to solving the Wiener-Hopf equations. In Section 2.2, we show that we can circumvent the constraint , by using linear prediction to yield the normal equations in (1.2) for all . In Section 2.3, we show that there exists a stationary time series

where and induce the general Wiener-Hopf equations of the form (1.5). This allows us to use the aforementioned technique to reformulate the Wiener-Hopf equations as a bi-infinite Toeplitz system, and thus obtain a solution to as a deconvolution. The same technique is used to obtain an expression for entries of the inverse Toeplitz matrix .

In practice, evaluating in (1.4) for a general spectral density is infeasible. Typically, it is assumed that the spectral density is rational, which allows one to obtain a computationally tractable solution for . Of course, this leads to an approximation error in when the underlying spectral density is not a rational function. In Section 3 we show that Baxter’s inequality can be utilized to obtain a bound between and its approximation based on a rational approximation of the general spectral density. Lastly, proofs of results in Sections 2 and 3 can be found in the Appendix.

## 2 A prediction approach

### 2.1 Notation and Assumptions

In this section, we collect together the notation introduced in Section 1 and some additional notation necessary for the paper.

Let be the space of all square integral complex functions on and is a space of all bi-infinite complex sequences where . Similarly, we denote , a space of all semi-infinite square summable sequences. To connect the time and frequency domain through an isomorphism, we define the Fourier transform and its adjoint

 F(v)(ω)=∑j∈Zvjeijωand[F∗(g)]j=12π∫2π0g(λ)e−ijλdλ.

We define the semi- and bi-infinite Toeplitz matrices (operators) and on and respectively. We use the following assumptions.

###### Assumption 2.1

Let be a symmetric positive definite sequence and be its Fourier transform. Then,

• .

• For some we have .

Under Assumption 2.1(i), we have the unique factorization

 f(ω)=σ2|ψ(ω)|2=σ2|ϕ(ω)|−2, (2.1)

where and . We note that the characteristic polynomials and do not have zeroes in thus the AR parameters are causal or have minimum phase (see Szegö (1921) and Inoue (2000), pages 68-69).

We mention that Assumption 2.1(i) is used in all the results in this paper, whereas, Assumption 2.1(ii) is only required in the approximation theorem in Section 3.

### 2.2 Bivariate time series and the Wiener-Hopf equations

We now give an alternative formulation for the solution of (1.2) and (1.5), which utilizes properties of linear prediction to solve it using a standard deconvolution method. To integrate our derivation within the Wiener causal filter framework, we start with the classical Wiener filter. For and , let

 PH0(Y)=∞∑j=0hjX−j. (2.2)

We observe that by construction, (2.2) gives rise to the normal equations

 cov(Y,X−ℓ)=∞∑j=0hjc(ℓ−j)ℓ≥0. (2.3)

Since (2.3) only holds for positive , this prevents one using deconvolution to solve for . Instead, we define a “proxy” set of variables for such that (2.3) is valid for . By using the property of orthogonal projections, we have

 cov(Y,PH0(X−ℓ))=cov(PH0(Y),X−ℓ)forℓ<0.

This gives

 (2.4)

Equations (2.3) and (2.4) allow us to represent the solution of as a deconvolution. We define the semi- and bi-infinite sequences , , and . Taking the Fourier transform of gives and thus

 H(ω)=F(c±)(ω)f(ω)=∑∞ℓ=0cov(Y,X−ℓ)eiℓω+∑∞ℓ=1cov(Y,PH0(Xℓ))e−iℓωf(ω). (2.5)

This forms the key to the following theorem.

###### Theorem 2.1

Suppose is a stationary time series whose spectral density satisfies Assumption 2.1(i) and . Let , then , and

 H(ω)=∑∞ℓ=0cYX(ℓ)(eiℓω+ψ(ω)∗ϕℓ(ω)∗)f(ω), (2.6)

where and are defined in (2.1) and for .

PROOF. See Appendix A.

###### Remark 2.1

It is clear that is not a well-defined random variable. However, it is interesting to note that under Assumption 2.1(ii) (for ) is a well defined random variable, where and

 ∞∑ℓ=1PH0(Xℓ)e−iℓω=ψ(ω)∞∑j=0X−jϕj(ω). (2.7)

### 2.3 General Wiener-Hopf equations

We now generalize the above prediction approach to general Wiener-Hopf linear equations which satisfy

 gℓ=∞∑j=0hjc(ℓ−j)ℓ≥0, (2.8)

where and (which is assumed to be a symmetric, positive definite sequence) are known. We will obtain a solution similar to (2.6) but for the normal equations in (2.8). We first describe the classical Wiener-Hopf method to solve (2.8). Since is known for all , we extend (2.8) to the negative index , and define as

 gℓ=∞∑j=0hjc(ℓ−j)forℓ<0. (2.9)

Note that is not given, however it is completely determined by and . The Wiener-Hopf technique evaluates the Fourier transform of the above and isolates the positive frequencies to yield the solution for . In particular, evaluating the Fourier transform of (2.8) and (2.9) gives

 H(ω)f(ω)=G−(ω)+G+(ω) (2.10)

where and . Replacing with and dividing the above with yields

 H(ω)ψ(ω)=G−(ω)σ2ψ(ω)∗+G+(ω)σ2ψ(ω)∗=σ−2ϕ(ω)∗G−(ω)+σ−2ϕ(ω)∗G+(ω). (2.11)

Isolating the positive frequencies in (2.11) gives the solution

 H(ω)=σ−2ϕ(ω)[ϕ(ω)∗G+(ω)]+, (2.12)

this proves the result stated in (1.6). Similarly, by isolating the negative frequencies, we obtain in terms of and

 G−(ω)=−1∑ℓ=−∞gℓeiℓω=−ψ(ω)∗[ϕ(ω)∗G+(ω)]−.

The Wiener-Hopf technique yields an explicit solution for and . However, from a time series perspective, it is difficult to interpret these formulas. We now obtain an alternative expression for these solutions based on a linear prediction of random variables.

We consider the matrix representation, , in (1.7). We solve by embedding the semi-infinite Toeplitz matrix on into the bi-infinite Toeplitz system on . We divide the bi-infinite Toeplitz matrix into four sub-matrices , , , and . We observe that . Further, we let and where . Then, we obtain the following bi-infinite Toeplitz system on

 T±(f)h±=(C00C01C10C11)(0h+)=(C01h+C11h+)=(C01C−111g+g+)=(g−g+)=g±. (2.13)

We note that the positive indices in the sequence are , but for the negative indices, where , it is which is identical to defined in (2.9). The Fourier transform on both sides in (2.13) gives the deconvolution , which is identical to (2.10). We now reformulate the above equation through the lens of prediction. To do this we define a stationary process and a random variable on the same probability space which yields (2.8) as their normal equations.

We first note that since is a symmetric, positive definite sequence, there exists a stationary time series with is its autocovariance function (see Brockwell and Davis (2006), Theorem 1.5.1). Using this we define the random variable

 Y=∞∑j=0hjX−j. (2.14)

Provided that , then and thus belongs to the Hilbert space spanned by (we show in Theorem 2.2 that this is true if ). By (2.8), we observe that for all . We now show that for

 cov(Y,X−ℓ)=[C01C−111g+]ℓ=gℓ.

First, since , then . Further, for , the th row (where we start the enumeration of the rows from the bottom) of contains the coefficients of the best linear predictor of given i.e.

 PH0(X−ℓ)=∞∑j=0[C01C−111]ℓ,jX−jℓ<0.

Using the above, we evaluate for

 cov(Y,PH0(X−ℓ)) = cov(Y,∞∑j=0[C01C−111]ℓ,jX−j) = ∞∑j=0[C01C−111]ℓ,jcov(Y,X−j)(from (???), gj=cov(Y,X−j)) = ∞∑j=0[C01C−111]ℓ,jgj=[C01C−111g+]ℓ=gℓ.

Thus the entries of are indeed the covariances: and . This allows us to use Theorem 2.1 to solve general Wiener-Hopf equations. Further, it gives an intuitive meaning to (2.9) and (2.13).

###### Theorem 2.2

Suppose that is a symmetric, positive definite sequence and its Fourier transform satisfies Assumption 2.1(i). We define the (semi) infinite system of equations

 gℓ=∞∑j=0hjc(ℓ−j)ℓ≥0,

where . Then, and

 H(ω)=∑∞ℓ=0gℓ(eiℓω+ψ(ω)∗ϕℓ(ω)∗)f(ω). (2.15)

PROOF. See Appendix A.

The solution for given in (2.12) was obtained by comparing the frequencies in a Fourier transform. Whereas the solution in Theorem 2.2 was obtained using linear prediction. The two solutions are algebraically different. We now show that they are the same by direct verification. Comparing the solutions (2.12) and (2.15) we have two different expressions for

 H(ω)=σ−2ϕ(ω)[ϕ(ω)∗G+(ω)]+=F(g±)(ω)f(ω)=σ−2F(g±)(ω)|ϕ(ω)|2.

Therefore, the above are equivalent if

 [ϕ(ω)∗G+(ω)]+=F(g±)(ω)ϕ(ω)∗. (2.16)

We now prove the above is true by direct verification.

###### Lemma 2.1

Suppose the same set of assumptions and notations in Theorem 2.2 hold. Then

 [ϕ(ω)∗G+(ω)]+=ϕ(ω)∗∞∑ℓ=0gℓ(eiℓω+ψ(ω)∗ϕℓ(ω)∗)

where .

As mentioned in Section 1, it is well-known that . We show below that an alternative expression for the entries of can be deduced using the deconvolution method described in Theorem 2.2.

###### Corollary 2.1

Suppose the same set of assumptions and notations in Theorem 2.2 holds. Let denote the th row of . Then, for all and the Fourier transfrom is

 Dk(ω)=eikω+ψ(ω)∗ϕk(ω)∗f(ω)k≥0.

Therefore,

 dj,k=12π∫2π0(eikω+ψ(ω)∗ϕk(ω)∗f(ω))e−ijωdωj,k≥0.

PROOF. See Appendix A.

###### Remark 2.2 (Multivariate extension)

The case that the (autocovariance) sequence is made up of -dimensions, has not been considered in this paper. However, if

is a positive definite matrix with Vector MA

and Vector AR representations, then it is may be possible to extend the above results to the multivariate setting. A sufficient condition for this to hold is that there exists a such that for all

 c−1Id≤Σ(ω)≤cId,

(see Rozanov (1967), pages 77-78), where indicates that is positive semi-definite.

## 3 Finite order autoregressive approximations

In many applications it is often assumed the spectral density is rational (Cadzow (1982); Ahlén and Sternad (1991), and Ge and Kerrigan (2016)). Obtaining the spectral factorization of a rational spectral density (such as that given in (2.1)) is straightforward, and is one of the reasons that rational spectral densities are widely used. However, a rational spectral density is usually only an approximation of the underlying spectral density. In this section, we obtain a bound for the approximation when the rational spectral density corresponds to a finite order autoregressive process. We mention that using the expression in (2.15) easily lends itself to obtaining a rational approximation and for bounding the difference using Baxter’s inequality.

We now use the expression in (2.15) to obtain an approximation of in terms of a best fitting AR coefficients. In particular, using that , we replace the infinite order AR coefficients in

 H(ω)=∑∞ℓ=0gℓ(eiℓω+[ϕ(ω)∗]−1ϕℓ(ω)∗)f(ω) (3.1)

with the best fitting AR coefficients. More precisely, suppose that are the best fitting AR coefficients in the sense that it minimizes the mean squared error

 (ϕp,1,...,ϕp,p)′=argmina∥X0−p∑j=1ajX−j∥2=argmina12π∫2π0|1−p∑j=1ajeijω|2f(ω)dω (3.2)

where . Let and where . We note that the zeros of lie outside the unit circle. Then, we define the approximation of as

 Hp(ω)=∑∞ℓ=0gℓ(eiℓω+[ϕp(ω)∗]−1ϕp,ℓ(ω)∗)fp(ω), (3.3)

where for and if . We observe that the Fourier coefficients of are the solution of where with . Thus and are approximations of and . Observe that by using Lemma 2.1 and (2.12) we can show that

 Hp(ω)=σ−2ϕp(ω)[ϕp(ω)∗G+(ω)]+ (3.4)

Below we obtain a bound for . It is worth noting that to obtain the bound we use the expressions for and given in (3.1) and (3.3), as these expression are easier to study than their equivalent expressions in (2.12) in (3.4).

###### Theorem 3.1 (Approximation theorem)

Suppose that is a symmetric, positive definite sequence that satisfies Assumption 2.1(ii) and its Fourier transform satisfies Assumption 2.1(i). We define the (semi) infinite system of equations

 gℓ=∞∑j=0hjc(ℓ−j)ℓ≥0,

where . Let and be defined as in (3.1) and (3.3). Then

 ∣∣H(ω)−Hp(ω)∣∣≤C[p−K+1sups|gs|+p−K|G+(ω)|],

where .

PROOF. See Appendix A.

### Acknowledgements

SSR and JY gratefully acknowledge the partial support of the National Science Foundation (grant DMS-1812054).

## Appendix A Proofs

The purpose of this appendix is to give the technical details behind the results stated in the main section.

PROOF of Theorem 2.1   To prove that , we note that since , then is a well defined random variable where with

 var[PH0(Y)]=⟨h+,T(f)h+⟩.

Furthermore, where and . Since , this implies .

To prove that , we recall that (2.2) leads to the matrix equation where . We define the operator norm and use the result that since , then . Thus , as required.

From (2.5), we have . We next express in terms of an infinite order AR and MA coefficients of . To do this we observe

 (A.1)

The second term on the right hand side of (A.1) looks quite wieldy. However, we show below that it can be expressed in terms of the AR coefficients associated with . It is well-known that the -step ahead forecast () has the representation , where for , the -step prediction coefficients are

 ϕj(ℓ)=ℓ∑s=