The Lévy State Space Model

In this paper we introduce a new class of state space models based on shot-noise simulation representations of non-Gaussian Lévy-driven linear systems, represented as stochastic differential equations. In particular a conditionally Gaussian version of the models is proposed that is able to capture heavy-tailed non-Gaussianity while retaining tractability for inference procedures. We focus on a canonical class of such processes, the α-stable Lévy processes, which retain important properties such as self-similarity and heavy-tails, while emphasizing that broader classes of non-Gaussian Lévy processes may be handled by similar methodology. An important feature is that we are able to marginalise both the skewness and the scale parameters of these challenging models from posterior probability distributions. The models are posed in continuous time and so are able to deal with irregular data arrival times. Example modelling and inference procedures are provided using Rao-Blackwellised sequential Monte Carlo applied to a two-dimensional Langevin model, and this is tested on real exchange rate data.


I Introduction

Heavy-tailed non-Gaussian processes play a significant role in many real-world scenarios, exhibiting extreme values much more frequently than a Gaussian model. Examples of such abrupt changes include variations in stock prices or insurance gains/losses in financial applications, as studied extensively since the seminal works [Mandelbrot1963] and [Fama1965a]. Further applications can be found in various fields of engineering, such as communications (see [Azzaoui2010] for statistical modelling of channels, [Fahs2012, FreitasEganClavierEtAl2017] for capacity bounds, [LiebeherrBurchardCiucu2012] for delay bounds in networks with α-stable noise, and [ShevlyakovKim2006, WarrenThomas1991] for signal detection), signal processing [Nikias1995, Lombardi2006, GodsillRayner1998, GodsillKuruoglu1999, Godsill_2000a], and image analysis [Achim2001, Achim2006].

In many of these situations, the random phenomena considered can still be thought of as emerging from the combination of many independent perturbations. According to the generalized CLT [Gnedenko1968, p. 162][Feller1966, p. 576], whenever the sum of independent identically distributed (i.i.d.) random variables (RVs) converges in distribution, it converges to a member of the class of α-stable distributions; this class is central to this paper. The Gaussian is a special member of this class, the only one with finite variance. Hence, using non-Gaussian α-stable processes offers a natural way to extend beyond Gaussian processes to heavy-tailed cases. A principal reason for the attention that α-stable laws have received in applications (see the extensive bibliography listed in [nolan_web]) is their role in the generalized CLT, the modelling flexibility offered by this class of distributions, and the useful and unique properties of α-stable Lévy processes [Samorodnitsky_Taqqu_1994], such as self-similarity and infinite divisibility. We will thus focus on the α-stable class of models, but will also describe how our models and methods can be applied to a much broader class of heavy-tailed processes.
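The heavy tails that motivate the α-stable class are easy to visualise numerically. The following sketch (using SciPy's `levy_stable`; the parameter values are illustrative assumptions, not taken from the paper) compares tail exceedance frequencies of α-stable and Gaussian samples:

```python
import numpy as np
from scipy.stats import levy_stable, norm

rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.5, 0.0   # illustrative tail and skewness parameters

stable_draws = levy_stable.rvs(alpha, beta, size=n, random_state=rng)
gauss_draws = norm.rvs(size=n, random_state=rng)

# The alpha-stable sample exhibits extreme values far more often:
# its tail probability decays polynomially, not like exp(-x**2 / 2).
print(np.mean(np.abs(stable_draws) > 10))
print(np.mean(np.abs(gauss_draws) > 10))
```

For α = 1.5 a noticeable fraction of the stable draws exceed 10 in magnitude, while the Gaussian sample essentially never does.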

II Problem formulation

We study in particular inference methods for linear stochastic differential equations (sdes) driven by non-Gaussian Lévy processes [rogers_williams_2000, Oksendal2003, Bertoin_1997]:

dX(t) = A X(t) dt + h dW(t)   (1)

where W(t) is a non-Gaussian Lévy process; the solution is obtained by stochastic integration:

X(t) = exp(At) X(0) + ∫_0^t exp(A(t − u)) h dW(u).

The characteristic function for a general Lévy process having no drift or Brownian motion part is given by [Kallenberg_2002], Corollary 15.8, [Bertoin_1997], as

E[exp(iu W(t))] = exp(t ψ(u))   (2)

where

ψ(u) = ∫_{ℝ∖{0}} ( exp(iuw) − 1 − iuw 1(|w| ≤ 1) ) Q(dw)   (3)

where 1(·) is the indicator function and Q is a Lévy measure on ℝ∖{0}. We will be particularly concerned with the so-called infinite activity processes having Q(ℝ) = ∞, and hence an almost surely infinite number of jumps within any finite time interval.

We will consider principally the α-stable Lévy process, later highlighting how to broaden the analysis to other classes. The focus will be on vector sdes driven by scalar Lévy processes, although in principle our methods extend also to vector-valued Lévy processes. Take, then, W(t) to be an α-stable Lévy process, defined to have, for 0 ≤ s < t ≤ T [Samorodnitsky_Taqqu_1994],

  • W(0) = 0 a.s.

  • Independent, stationary α-stable increments W(t) − W(s) ∼ S_α(σ(t − s)^{1/α}, β, μ(t − s)),

where ∼ denotes having the same probability distribution, and S_α(σ, β, μ) is the α-stable law, having scale parameter σ, skewness β, location μ, and tail parameter α [Feller1966, Samorodnitsky_Taqqu_1994, Nikias1995]. Such Lévy processes are pure jump processes (μ = 0), or pure jump plus drift (μ ≠ 0), possessing almost surely an infinite number of jumps in finite time intervals, and they are in general highly intractable for inference.¹

¹ We explicitly exclude the special case α = 1 for notational reasons, but this could in principle be included in future studies.

It will be convenient to define the integral, over the time interval (0, T],²

I(f_t) = ∫_0^t f_t(u) dW(u),  with  f_t(u) = exp(A(t − u)) h.

² We take the time axis up to T by convention, although the processes can be extended to arbitrarily large time axes in practice.

In the univariate sde case, the problem is solved using results based on [Samorodnitsky_Taqqu_1994], Section 3.2.2; see our previous work [Lemke2015, Lemke_Godsill_2015]. In the vector sde case, we could consider for example a direct Euler-type discretisation of the integral, solving approximately using increments of the process over a regular grid of step size Δt, and then using the conditionally Gaussian auxiliary variable models of [Lemke2015, godsill2006bayesian]. However, there are issues about the level of approximation, choice of discretisation step-size, etc., that make this approach unsatisfactory as a general solution. A pertinent question remains as to whether the limit as Δt → 0 can be explicitly calculated, hence avoiding the need to simulate auxiliary states on the Δt scale. We have been unable to do so, except for the scalar case (see [Lemke_Godsill_2015]), and hence we propose very accurate approximate models suitable for likelihood and Bayesian inference procedures.
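For comparison only, the Euler-type discretisation discussed above can be sketched as follows. This is the baseline approach the paper argues against, not the proposed method; the drift matrix, noise vector, and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import levy_stable

def euler_levy_sde(A, h, x0, T, n_steps, alpha, beta, rng):
    """Euler-type discretisation of dX(t) = A X(t) dt + h dW(t), with W
    an alpha-stable Levy process.  By self-similarity, increments over a
    step of length dt scale as dt**(1/alpha) (zero-location increments
    assumed in this sketch)."""
    dt = T / n_steps
    # Draw all alpha-stable increments in one vectorised call.
    dW = dt ** (1.0 / alpha) * levy_stable.rvs(
        alpha, beta, size=n_steps, random_state=rng)
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    for k in range(n_steps):
        x = x + (A @ x) * dt + h * dW[k]
        path[k + 1] = x
    return path

rng = np.random.default_rng(1)
A = np.array([[0.0, 1.0], [0.0, -0.5]])   # illustrative Langevin-type drift
h = np.array([0.0, 1.0])
path = euler_levy_sde(A, h, [0.0, 0.0], T=1.0, n_steps=200,
                      alpha=1.6, beta=0.0, rng=rng)
print(path.shape)
```

The discretisation error and the choice of step size Δt = T/n_steps are exactly the issues the shot-noise approach below avoids.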

III Summary of Methods and Contribution

The starting point for the Lévy state space model is the generalised shot noise representation of W(t); see [Rosinski_2001]. A principal contribution here is to adopt a non-centered conditionally Gaussian form for the shot noise representation, which has been largely overlooked in the stochastic processes literature, but which is of significant benefit in generating tractable and effective inference procedures. This special form enables both tractable conditional Gaussian inference and the possibility of modelling asymmetric Lévy processes. In addition, we show that the residual error committed by truncation can be approximated by a Gaussian-driven sde that exactly matches the first and second moments of the true residual error. We have established central limit theorems and sharp rates of convergence for the approximating underlying Gaussian residual Lévy process. Finally, based on this representation, we develop transition densities for the Lévy state space model in the conditionally Gaussian form, and show how to perform marginal inference in the model using Kalman-filtering recursions built into a Bayesian (or potentially likelihood-based) Monte Carlo framework. Results have been obtained for simulated data and real exchange rate data.

Other relevant work includes [clement2019], who propose theorems and estimating function methods for stable Lévy-driven sdes; [Fournier_20011], who prove error bounds between Lévy-driven sdes with small jumps and their Gaussian approximations; [Jasra2019], who provide Bayesian inference procedures using a quasi-likelihood approach and MCMC; [Dereich2011, Dereich2016], who provide a multi-level Monte Carlo approach for evaluation of expectations, using coupled Euler approximations; and [Ferreiro-Castilla2016], who propose Euler schemes under Poisson arrival times. In contrast to these approaches, we tackle the problem without resort to Euler approximations or pseudo-likelihood functions, and our methods are applicable to low- or high-frequency observations, but are currently limited to linear sdes. Our approach focuses on a class of Lévy processes that may be expressed as a conditionally Gaussian shot noise process, which includes for example all α-stable Lévy processes and many truncated and modified variants of these processes; however, at the expense of more computational effort, more general non-Gaussian versions may also be incorporated into our framework.

IV Generalised Shot Noise Processes

We will use a generalised shot noise formulation studied in [Rosinski_2001] and references therein, in which the Lévy process is expressed on a time axis [0, T] as

W(t) = Σ_{i=1}^∞ ( h(Γ_i, V_i) 1(u_i ≤ t) − t b_i ).   (4)

Here Γ_1 < Γ_2 < … are the epochs of a unit rate Poisson process; the V_i are mutually independent random variables with a certain distribution F_V; and the u_i are independent uniform random variables on [0, T]. The centering terms b_i are sometimes needed and sometimes not, as discussed below. h(Γ, V) is generally taken to be non-increasing in its first argument Γ for fixed V. A point process N on the expanded space [0, T] × ℝ, with mean measure given by the product of the (normalised) Lebesgue measure in time and the Lévy measure Q, may be constructed as [Rosinski_2001]:

N = Σ_{i=1}^∞ δ_{(u_i, h(Γ_i, V_i))}

where the i.i.d. uniform variables u_i give the times of arrival of jumps, the h(Γ_i, V_i) give the sizes of the jumps, and δ_{(u, w)} is the Dirac point mass located at (u, w). This leads to a direct expression for W(t) [Rosinski_2001] (by the Lévy–Itô integral representation of W, expressed in terms of the augmented point process N):

W(t) = ∫_{[0,t]×ℝ} w N(du, dw) − (centering terms)   (5)

which, on substitution for N, leads to the shot noise representation (4). The almost sure convergence of this series to W(t) is proven in [Rosinski_2001]. In this paper we will apply, in particular, random Poisson truncations of the form {i : Γ_i ≤ c}, which leads to approximations, neglecting small jumps, of the form

N_c = Σ_{i: Γ_i ≤ c} δ_{(u_i, h(Γ_i, V_i))}   (6)

and the corresponding truncated shot noise series

W_c(t) = Σ_{i: Γ_i ≤ c} ( h(Γ_i, V_i) 1(u_i ≤ t) − t b_i ).   (7)

The corresponding Lévy measure for W_c can then be shown to be [Rosinski_2001]

Q_c(dw) = ∫_0^c F_γ(dw) dγ   (8)

with F_γ the probability law of h(γ, V), and where V ∼ F_V.
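The mechanics of the Poisson truncation can be sketched as follows: the epochs Γ_i of a unit-rate Poisson process are generated as cumulative sums of unit exponentials and retained while Γ_i ≤ c, each paired with a uniform jump time on [0, T]. The jump function and the V_i distribution below are illustrative placeholders, and the centering terms are omitted:

```python
import numpy as np

def truncated_shot_noise(h, T, c, rng, v_sampler):
    """Simulate the jumps of a truncated shot-noise series on [0, T].

    h(gamma, v): jump-size function, non-increasing in gamma.
    c: truncation level -- only Poisson epochs Gamma_i <= c are kept,
       i.e. small jumps (large Gamma_i) are neglected.
    Returns jump times u_i and jump sizes h(Gamma_i, V_i)."""
    gammas = []
    g = 0.0
    while True:
        g += rng.exponential(1.0)   # unit-rate Poisson process epochs
        if g > c:
            break
        gammas.append(g)
    gammas = np.array(gammas)
    us = rng.uniform(0.0, T, size=gammas.size)   # uniform jump times
    jumps = h(gammas, v_sampler(gammas.size))
    return us, jumps

def W_c(t, us, jumps):
    """Truncated process value at time t (centering terms omitted)."""
    return jumps[us <= t].sum()

rng = np.random.default_rng(2)
us, jumps = truncated_shot_noise(
    h=lambda g, v: np.exp(-g) * v,   # illustrative non-increasing h
    T=1.0, c=50.0, rng=rng,
    v_sampler=lambda n: rng.normal(0.0, 1.0, size=n))
print(W_c(1.0, us, jumps))
```

Because the epochs are kept in increasing order, the retained terms are exactly the c largest-magnitude jump factors on average, matching the "neglect small jumps" interpretation above.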

V The conditionally Gaussian Lévy process

One of the principal contributions of this paper is in methods for tractable inference about states and parameters of Lévy-driven sdes. The starting point for this is the generalised shot noise representation (4). We propose to use a special form of this in which we define:

h(Γ, V) = h(Γ) V,

where h(·) is a non-increasing function. If we furthermore assume a non-centered Gaussian form, V_i ∼ N(μ_W, σ_W²), then special properties result, notably tractable conditional Gaussian inference and the possibility of modelling asymmetric Lévy processes. Note that in the centered case, μ_W = 0, such processes are well known; see e.g. the processes of type G [Rosinski_1991] and the Condition S processes described in [Samorodnitsky_Taqqu_1994], but in the general case they are commonly overlooked. A well known example of this form for h is the α-stable Lévy process, which can be represented with [Samorodnitsky_Taqqu_1994]:

h(Γ) = Γ^{−1/α},

where α is the stable law tail parameter. The case α = 1, the Cauchy process, is specifically excluded from now on, as this forms a singular special case of the family [Samorodnitsky_Taqqu_1994]. Essentially any distribution for the V_i may be adopted, subject to finite absolute α-th moments E|V|^α < ∞, which is well known to be satisfied by our non-centred normal distribution. Following the procedure in the previous section, it is relatively straightforward to verify that the correct Lévy measure for the stable law is obtained under this function h and the non-centered Gaussian form (a proof will be given in subsequent publications).
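Under the conditionally Gaussian form, skewness enters purely through the mean of the Gaussian jump factors. A minimal sketch (α < 1 assumed so that no centering is required; the parameter names mu_w, sigma_w and all values are illustrative):

```python
import numpy as np

def stable_shot_noise(alpha, mu_w, sigma_w, c, rng):
    """One draw of W(1) under the conditionally Gaussian alpha-stable
    series: jump sizes Gamma_i**(-1/alpha) * V_i, V_i ~ N(mu_w, sigma_w^2),
    truncated at Poisson epoch level c (small jumps discarded).
    Centering terms, needed when alpha >= 1, are omitted: this sketch
    assumes alpha < 1."""
    gammas = np.cumsum(rng.exponential(1.0, size=int(10 * c)))
    gammas = gammas[gammas <= c]          # Poisson truncation
    vs = rng.normal(mu_w, sigma_w, size=gammas.size)
    return np.sum(gammas ** (-1.0 / alpha) * vs)

rng = np.random.default_rng(3)
samples = np.array(
    [stable_shot_noise(0.8, 1.0, 0.5, 100.0, rng) for _ in range(2000)])
# A non-zero mu_w biases the jumps positive, producing a skewed marginal.
print(np.mean(samples > 0))
```

With mu_w = 0 the same code produces a symmetric (type G / Condition S) law; the non-centered case is exactly the generalisation advocated in this section.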

There are many possible choices for the centering function, and we follow a standard procedure [Samorodnitsky_Taqqu_1994] in which no centering is required for α < 1 and a modified centering can be calculated in closed form for α > 1:

and it can be shown that this choice of centering is also equal to the expected value of the truncated process:

There is extensive earlier work on the analysis of the convergence rate of truncated series of this class to the corresponding stable laws; see, e.g., [JanickiWeron1994, janicki1992computer, JanickiKokoszka1992, LedouxPaulauskas1996, Bentkus1996, Bentkus2001]. In [Riabiz2018a] we reviewed these results and derived new bounds for the symmetric stable laws under our conditionally Gaussian representation. Here, though, we are able to provide the following more general result, which applies to the asymmetric general case for the first time:

Theorem 1

If the approximating truncated process W_c is as in (7) and W is the full untruncated Lévy process, then:

where C is a constant that does not depend on the truncation level c.

Proof: The proof follows a similar structure to Theorem 3.1 of [Asmussen2001] and will be presented in a future publication.

Remarks:

  1. We notice that the rate of convergence with respect to c matches that in [Asmussen2001], Th. 3.1, which corresponds to a different type of truncation, on the absolute size of the jumps. We comment, however, that it is not as good as the convergence rate we previously obtained for the symmetric case [Riabiz2018a]. However, our current result is significantly more general in that it covers both symmetric and non-symmetric (skewed) cases. We postulate that a more subtle future development will prove convergence rates lying between the two as we move from fully skewed to symmetric models.

  2. We have computed this convergence result for the particular case of the stable law, h(Γ) = Γ^{−1/α}, but other conditionally Gaussian Lévy processes are likely to be amenable to similar analysis, and we will study these in future work. Similarly, non-Gaussian versions may also be tractable, provided convenient moment expressions are available for the chosen jump distributions.

V-A The Lévy-driven sde

Recall that the required sde solution is given in Eq. (1), and in particular we require a representation of the integral:

I(f_t) = ∫_0^t f_t(u) dW(u).

The proposed series representation of the integral is based on the generalised shot noise representation of the previous section,

W(t) = Σ_{i=1}^∞ ( h(Γ_i) V_i 1(u_i ≤ t) − t b_i )   (9)

and to get the basic result we may (informally) substitute this directly into (1) to obtain

X(t) = exp(At) X(0) + Σ_{i=1}^∞ h(Γ_i) V_i f_t(u_i) 1(u_i ≤ t) + (centering terms)   (10)

where as before:

  • Γ_i are event times of a unit-rate Poisson process

  • u_i are i.i.d. uniform on the relevant time interval,

  • V_i are i.i.d. such that V_i ∼ N(μ_W, σ_W²),

  • the centering terms are constants, non-zero only if α ≥ 1,

and f_t(u) is as follows:

f_t(u) = exp(A(t − u)) h.

A similar form has been proved to converge to the correct α-stable sde in [Samorodnitsky_Taqqu_1994], in the scalar sde case, using a discrete Rademacher random variable for V_i. In [Lemke_Godsill_2015] the scalar case was also proven with Gaussian V_i, and a special case was also given for symmetric Gaussians in [Samorodnitsky_Taqqu_1994] (the so-called Condition S). Here we are extending to the vector sde case, with non-centered Gaussian V_i, for which the full proof will be presented in a future publication. Some insights may be gained by the following informal interpretation (see also [Samorodnitsky_Taqqu_1994]):

  • u_i are jump arrival times

  • h(Γ_i) V_i is the size of the jump at u_i

  • f_t(u_i) is the effect at time t of passing a unit jump at u_i through the linear ODE

  • the centering is a drift term, only non-zero for α ≥ 1

In particular, for practical implementations we here introduce the randomly truncated version, with truncation Γ_i ≤ c for the retained terms,

which has zero mean for finite c and T, by construction.

We now analyse the error due to this truncation, given by the difference between the full series and its truncated version. The following result gives an exact characterisation of the covariance of the error:

Theorem 2

Proof: The proof relies on calculation of the covariance of the point process corresponding to the residual terms, and will be presented in a future publication.

Remark: The expectation is obtained for our sde as:

and we can recognise this as the covariance function of a linear Gaussian sde [Oksendal2003]

where B(t) is a unidimensional Brownian motion. Hence the covariance of the truncation error is exactly matched by that of a linear Gaussian sde:

We propose then to approximate the exact integral in four possible ways, each involving a different truncation/Gaussian approximation to the residual, as explained in the next section.

VI Lévy state space model

We are now in a position to specify the stable Lévy state-space model. This requires the specification of the transition density between times s and t > s, p(x(t) | x(s)), expressed as

(11)

This may be obtained fairly straightforwardly from (10) and the properties of the matrix exponential and the point process on the sub-interval (s, t] as

X(t) = exp(A(t − s)) X(s) + Σ_{i: u_i ∈ (s, t]} h(Γ_i) V_i f_t(u_i) + (centering terms)   (12)

where all terms are defined as before, except that now the u_i are uniforms on the sub-interval (s, t], drawn independently of the point process in any non-overlapping time intervals. Thus, for each successive time interval a new set of {Γ_i, V_i, u_i}s needs to be independently generated afresh in order to forward simulate the process for times t > s. We have constructed the model in this way so that a causal forward simulation may be carried out without requiring a prior simulation of all the {Γ_i, V_i, u_i} for the entire time axis as in (10).

The randomly truncated version can be obtained similarly,

X_c(t) = exp(A(t − s)) X(s) + Σ_{i: Γ_i ≤ c} h(Γ_i) V_i f_t(u_i) + (centering terms)   (13)

with the u_i uniform on (s, t] and the Γ_i the epochs of a unit-rate Poisson process. We can here interpret the truncation parameter c as the expected number of jumps per unit time. Note that, conditional upon {Γ_i, u_i}, the jump term is Gaussian,

with mean

m = μ_W Σ_{i: Γ_i ≤ c} h(Γ_i) f_t(u_i)   (14)

and covariance

S = σ_W² Σ_{i: Γ_i ≤ c} h(Γ_i)² f_t(u_i) f_t(u_i)ᵀ.   (15)

To complete the state space model, recall that the residual terms of the series, those with Γ_i > c, are zero mean with covariance matrix:

leading to a decomposition of the exact state space model as:

We can now propose several approximations to this exact representation by either removing or approximating the residual term. In each case the residual is approximated with an independent zero mean Gaussian term whose covariance is a scaled version of the residual covariance, with a scaling constant that leads to the different approximations, each corresponding to a conditionally Gaussian form for the approximated Lévy-driven sde's transition density:

(16)

In each case the conditioning random variables are defined such that {Γ_i, u_i} are generated independently of all other time intervals that do not overlap with (s, t] as:

u_i ∼ i.i.d. U(s, t),   Γ_i = Γ_{i−1} + E_i,   E_i ∼ i.i.d. Exp(1),

where Exp(1) denotes the exponential distribution with unit mean.

This conditional Gaussian transition density is the building block for all of the forward simulation and inference methods described subsequently. The four cases of approximation considered are:

  1. Truncated series. The first approximation neglects the residual altogether, hence leading to the truncated version of the process, (13). This approximation of the process has some computational benefits, as will be considered shortly, but is most likely the least accurate of our three approximations.

  2. Gaussian residual approximation. In this more sophisticated approximation the residual is replaced by a zero mean Gaussian random variable with covariance matched to that of the true residual. This is most likely the most accurate representation of the process that we have, but it has some less desirable computational properties, as will be described shortly. Note however that for the important symmetric case, μ_W = 0, the transition density is identical to the following partial Gaussian approximation, acquiring its computational advantages (principally the ability to marginalise σ_W).

  3. Partial Gaussian residual approximation. Here a halfway house may be considered in which the residual is replaced with the solution of a linear sde whose covariance matches only part of the residual covariance. This version retains the computational benefits of the truncated series, but does not fully account for the covariance of the residual, and so is likely to sit in between 1) and 2) in terms of approximation accuracy.

  4. Joint Gaussian residual approximation. A final, more complex version of the truncation attempts to approximate the relevant residual series jointly. This has the same computational benefits as 1) and 3), but is a little more complex to compute and is not exposed here; see [Riabiz2019] for full details.

VI-A Matrix-vector form for Lévy state space model

We are now in a position to state the model in standard linear (conditionally upon {Γ_i, u_i} and σ_W) Gaussian state-space form, for the approximated process, which may be set equal to the truncated series of case 1) above, the Gaussian residual approximation of case 2) above, or the partial Gaussian residual approximation of case 3) above. In cases 1) and 3) it may be written using an extended state vector:

(17)

with

where the system matrices do not depend upon σ_W. For case 2), owing to the more complex covariance structure of the Gaussian residual approximation, the mean may not be directly included as a state variable and we have

(18)

VII Computation of likelihoods and state-conditionals

Equations (17) and (18) can now form the basis for forward simulation of data from the Lévy state space model, or for inference about its states and parameters. Likelihoods and state conditional distributions may be computed using standard Bayesian recursions, implemented by the Kalman filter, all conditioned upon the latent variables {Γ_i, u_i}. This means that Monte Carlo likelihood-based and Bayesian inference procedures such as Monte Carlo EM (MCEM), Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) may routinely be implemented.
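Conditionally on the latent variables, the recursions above reduce to standard Kalman predict/update steps. A minimal sketch in generic notation (F, Q, H, R are placeholder names, not the paper's; in the Lévy state space model the per-interval F and Q would be built from the latent jump variables of that interval):

```python
import numpy as np

def kalman_step(m, P, F, Q, y, H, R):
    """One predict/update step for the conditionally Gaussian model
    x_t = F x_{t-1} + w_t, w_t ~ N(0, Q);  y_t = H x_t + v_t, v_t ~ N(0, R).
    Returns the posterior mean/covariance and the log predictive density
    of y, whose accumulation gives the conditional marginal likelihood."""
    # Predict
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    resid = y - H @ m_pred                # innovation
    m_post = m_pred + K @ resid
    P_post = (np.eye(len(m)) - K @ H) @ P_pred
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * S)
    loglik = -0.5 * (logdet + resid @ np.linalg.solve(S, resid))
    return m_post, P_post, loglik

# Toy usage: 2-D state, first component observed.
m_post, P_post, loglik = kalman_step(
    m=np.zeros(2), P=np.eye(2),
    F=np.eye(2), Q=0.1 * np.eye(2),
    y=np.array([1.0]), H=np.array([[1.0, 0.0]]), R=np.array([[0.5]]))
print(m_post, loglik)
```

Running one such step per inter-observation interval, with fresh latent variables each time, is exactly the structure exploited by the SMC and MCMC schemes mentioned above.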

For a concrete example, take the case 1) or 3) approximated models and suppose we partially observe X(t) at times t_1, …, t_N, arranged in ascending order,

For example, if the observation matrix selects the first state, we are fully observing the first component of X(t). Then, initialise the Kalman filter with a Gaussian prior on the initial state. Note that the covariance of the observation noise and the prior covariance of the initial state are both scaled relative to σ_W². This is to ensure the greatest analytical tractability of posterior densities computed using Kalman filtering. A flat prior on the initial state is obtained in the limit of a large prior scaling constant, and a fully observed case with no observation noise is obtained with zero observation noise scale. Other more general prior structures can of course be incorporated, but this will be at the cost of a convenient conjugate posterior density for σ_W.

In order to obtain the closed form result, follow a scheme similar to that in, for example, [Harvey1990]. In this, redefine the dynamical noise as

where the rescaled noise covariance does not depend on σ_W, by construction, in cases 1) and 3). Then, the Kalman filter computes [Harvey1990], for each n,

where the predictive means and covariances are the Kalman filter output variables under the definition of the modified dynamical noise distribution. From this the marginal likelihood is obtained as

where

and where d_y is the dimension of the observation vector y.

The conjugate prior for this form of likelihood is the inverted-gamma distribution for σ_W². Completing the (standard) conjugate analysis [Bernardo_Smith_1994], we have:

(19)

and we have, remarkably, all the tools to carry out marginal analysis of the data conditioned on the auxiliary variables {Γ_i, u_i}, and also joint conditional analysis for the states and σ_W². We stress that this full analysis is only available with the chosen prior structures, in particular the normal-inverse-gamma conjugate priors and the special scaled form of the noise covariance matrices. Without these structures we will still be able to use the standard Kalman filter to marginalise and infer the states, but we will lose the closed form marginal expressions for σ_W² and/or the marginal likelihood. A minor modification applies for the joint Gaussian residual approximation, case 4), in which once again the full Kalman filter calculations can be implemented with a modified noise covariance (not detailed here). Finally, for the Gaussian residual approximation, case 2), the conjugate likelihood structure is lost owing to the additional term in the transition density's noise covariance: the standard Kalman filter can be used to infer/marginalise the states, but not σ_W² or the marginal likelihood.

VIII Example: Langevin model

In order to give a concrete application case, we work through the analysis for a 2-dimensional Langevin model with similar structure to that used in [Christensen2012], with mean-reverting velocity parameter θ < 0, and work with a state vector:

X(t) = (x(t), ẋ(t))ᵀ.

The required system matrices for this model are

A = [[0, 1], [0, θ]],   h = (0, 1)ᵀ.

Then,

exp(At) = [[1, (exp(θt) − 1)/θ], [0, exp(θt)]]

and

f_t(u) = exp(A(t − u)) h = ( (exp(θ(t − u)) − 1)/θ, exp(θ(t − u)) )ᵀ,

which renders the calculation of m and S (see (14) and (15)) fairly straightforward:

(20)

and

(21)

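For this drift structure the matrix exponential is available in closed form, which is what makes f_t(u) cheap to evaluate. A small check of the closed form against a numerical matrix exponential, assuming the standard 2-D Langevin drift A = [[0, 1], [0, θ]] (θ value illustrative):

```python
import numpy as np
from scipy.linalg import expm

def expA(theta, t):
    """Closed-form matrix exponential for the Langevin drift
    A = [[0, 1], [0, theta]] (theta < 0 for mean reversion).
    Derived from the power series: A**n = [[0, theta**(n-1)],
    [0, theta**n]] for n >= 1."""
    e = np.exp(theta * t)
    return np.array([[1.0, (e - 1.0) / theta],
                     [0.0, e]])

theta, t = -0.7, 1.3
A = np.array([[0.0, 1.0], [0.0, theta]])
assert np.allclose(expA(theta, t), expm(A * t))
```

The same closed form gives f_t(u) = expA(theta, t − u) @ h directly, avoiding any numerical matrix exponentials inside the per-jump sums (20) and (21).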

VIII-A Forward simulation example

Equation (17) may be used directly to simulate a discrete 'skeleton' of the process, and this in itself may be of use in the study of these intractable processes. An example simulation is given in Fig. 1 for an asymmetric Langevin model. A much more heavy-tailed example is given in Fig. 2 for the symmetric case.

VIII-B Marginal Monte Carlo Filter

A sequential Monte Carlo (SMC) method is applied to infer the posterior distribution of the states of the Langevin model. This is implemented in the form of a standard bootstrap particle filter, proposing the variables according to (20) and (21) at each time interval and computing marginal weights according to (19). This is a fairly standard implementation; see e.g. [Cappe2007], and we omit further details here, leaving this to a subsequent publication. In all cases the position state x(t) is observed and the derivative ẋ(t) is unobserved. An example run of the filter, showing the Monte Carlo filter trajectories, is displayed in Fig. 3. Note that ẋ(t) is quite well inferred in the bottom panel, with appropriate posterior uncertainty about the trajectory. In Fig. 4 a small segment of the high-frequency tick-level Euro-Dollar exchange rate from 2006 is modelled and tracked over a period of a few minutes with the filter. Here we found that a very heavy-tailed model was successfully able to capture the apparent rapid jumps in the trend (the unobserved derivative) of the process. Full simulation results and details of implementation will be provided in subsequent publications.
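A bootstrap filter of the kind described needs, beyond the conditionally Gaussian weight computation, only a resampling routine on the marginal (Rao-Blackwellised) weights. A minimal systematic-resampling sketch (generic, not tied to the paper's implementation):

```python
import numpy as np

def systematic_resample(log_weights, rng):
    """Systematic resampling for a marginal (Rao-Blackwellised) particle
    filter.  Returns ancestor indices drawn with probabilities
    proportional to exp(log_weights), using a single uniform draw."""
    w = np.exp(log_weights - np.max(log_weights))  # stabilise then normalise
    w /= w.sum()
    n = len(w)
    positions = (rng.uniform() + np.arange(n)) / n  # stratified grid
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(4)
idx = systematic_resample(np.log(np.array([0.1, 0.2, 0.7])), rng)
print(idx)  # indices concentrated on the heaviest particle
```

Systematic resampling has lower variance than multinomial resampling and is the usual default in bootstrap implementations such as those surveyed in [Cappe2007].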

Fig. 1: Example Langevin model data set, positively skewed
Fig. 2: Example Langevin model data set, symmetric heavy-tailed case