I Introduction
Heavy-tailed non-Gaussian processes play a significant role in many real-world scenarios, exhibiting extreme values much more frequently than a Gaussian model would. Examples of such abrupt changes include variations presented by stock prices or insurance gains/losses in financial applications, as studied extensively since the seminal works [Mandelbrot1963] and [Fama1965a]. Further applications can be found in various fields of engineering, such as communications (see [Azzaoui2010] for statistical modelling of channels, [Fahs2012, FreitasEganClavierEtAl2017] for capacity bounds, [LiebeherrBurchardCiucu2012] for delay bounds in networks with stable noise, and [ShevlyakovKim2006, WarrenThomas1991] for signal detection), signal processing [Nikias1995, Lombardi2006, GodsillRayner1998, GodsillKuruoglu1999, Godsill_2000a], and image analysis [Achim2001, Achim2006].
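The contrast in tail behaviour can be illustrated numerically; the following sketch compares Gaussian and symmetric stable tail probabilities, where the choice $\alpha = 1.5$ and the threshold $10$ are illustrative values, not settings from this paper:

```python
from scipy.stats import levy_stable, norm

# Tail probability P(X > 10) under a standard Gaussian and under a
# symmetric alpha-stable law with alpha = 1.5, beta = 0 (illustrative values).
p_gauss = norm.sf(10.0)
p_stable = levy_stable.sf(10.0, 1.5, 0.0)
print(p_gauss, p_stable)  # the stable tail is many orders of magnitude heavier
```

The stable survival function decays polynomially (roughly like $x^{-\alpha}$), whereas the Gaussian tail decays like $e^{-x^2/2}$, which is why extreme values occur so much more often under the stable model.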
In many of these situations, the random phenomena considered can still be thought of as emerging from the combination of many independent perturbations. According to the generalized CLT [Gnedenko1968, p. 162], [Feller1966, p. 576], whenever the sum of independent identically distributed (i.i.d.) random variables (RVs) converges in distribution, it converges to a member of the class of stable distributions; this class is central to this paper. The Gaussian is a special member of this class, the only one with finite variance. Hence, using non-Gaussian stable processes offers a natural way to extend beyond Gaussian processes to heavy-tailed cases. A principal reason for the attention that stable laws have received in applications (see the extensive bibliography listed in [nolan_web]) is their role in the generalized CLT, the modelling flexibility offered by this class of distributions, as well as the useful and unique properties of stable Lévy processes [Samorodnitsky_Taqqu_1994] such as self-similarity and infinite divisibility. We will thus focus on the stable class of models, but will also describe how our models and methods can be applied to a much broader class of heavy-tailed processes.

II Problem formulation
We study in particular inference methods for linear stochastic differential equations (sdes) driven by non-Gaussian Lévy processes [rogers_williams_2000, Oksendal2003, Bertoin_1997]:

(1) $dX(t) = A\,X(t)\,dt + h\,dW(t),$

where $W(t)$ is a non-Gaussian Lévy process, $A$ is a square matrix and $h$ is a vector; the solution is obtained by stochastic integration:

$X(t) = e^{At} X(0) + \int_0^t e^{A(t-u)}\, h\, dW(u).$
The characteristic function for a general Lévy process having no drift or Brownian motion part is given by [Kallenberg_2002], Corollary 15.8, [Bertoin_1997], as

(2) $\mathbb{E}\left[e^{iuW(t)}\right] = e^{t\psi(u)},$

where

(3) $\psi(u) = \int_{\mathbb{R}\setminus\{0\}} \left( e^{iux} - 1 - iux\,\mathbb{1}_{|x|\le 1} \right) Q(dx),$

where $\mathbb{1}$ is the indicator function and $Q$ is a Lévy measure on $\mathbb{R}\setminus\{0\}$. We will be particularly concerned with the so-called infinite activity processes having $Q(\mathbb{R}\setminus\{0\}) = \infty$, and hence an almost surely infinite number of jumps within any finite time interval.
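A concrete instance of such an infinite-activity Lévy measure, anticipating the $\alpha$-stable processes considered next, can be written (with standard tail constants $C_\pm$ determined by the stable parameters) as:

```latex
% alpha-stable Levy measure (standard form)
Q(dx) = \Big( C_{+}\, x^{-1-\alpha}\, \mathbb{1}_{\{x>0\}}
        + C_{-}\, |x|^{-1-\alpha}\, \mathbb{1}_{\{x<0\}} \Big)\, dx,
\qquad 0 < \alpha < 2,
% infinite activity: infinitely many small jumps,
\int_{|x|\le 1} Q(dx) = \infty,
% yet the truncated second moment remains finite:
\int_{|x|\le 1} x^{2}\, Q(dx) = \frac{C_{+} + C_{-}}{2-\alpha} < \infty .
```

The divergence of the first integral is what produces almost surely infinitely many (small) jumps in any finite interval, while the finite truncated second moment is what later allows the small-jump residual to be approximated by a Gaussian.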
We will consider principally the $\alpha$-stable Lévy process, later highlighting how to broaden the analysis to other classes. The focus will be on vector sdes driven by scalar Lévy processes, although in principle our methods extend also to vector-valued Lévy processes. Take, then, $W(t)$ to be an $\alpha$-stable Lévy process, defined to have, for $0 < \alpha < 2$ [Samorodnitsky_Taqqu_1994]:

- $W(0) = 0$ a.s.;
- independent, stationary $\alpha$-stable increments, $W(t) - W(s) \sim S_\alpha\big((t-s)^{1/\alpha}\sigma, \beta, \mu(t-s)\big)$ for $t > s$,

where $\sim$ denotes having the same probability distribution, and $S_\alpha(\sigma, \beta, \mu)$ is the $\alpha$-stable law, having scale parameter $\sigma$, skewness $\beta$, location $\mu$, and tail parameter $\alpha$ [Feller1966, Samorodnitsky_Taqqu_1994, Nikias1995]. Such Lévy processes are pure jump processes, or pure jump plus drift, possessing almost surely an infinite number of jumps in finite time intervals, and they are in general highly intractable for inference. (We explicitly exclude the special case $\alpha = 1$ for notational reasons, but this could in principle be included in future studies.)
It will be convenient to define the integral over the time interval $(0, T]$ (we take the time axis up to $T$ by convention, although the processes can be extended to arbitrarily large time axes in practice),

$I = \int_0^T f(t)\, dW(t),$

with $f$ a suitably integrable deterministic function.
In the scalar case (univariate sde), the problem is solved using results based on [Samorodnitsky_Taqqu_1994], Section 3.2.2; see our previous work [Lemke2015, Lemke_Godsill_2015]. In the vector sde case, we could consider for example a direct Euler-type discretisation of the integral, solving approximately using increments of the process on a fine time grid, and then using the conditionally Gaussian auxiliary variable models of [Lemke2015, godsill2006bayesian]. However, there are issues about the level of approximation, choice of discretisation step-size, etc. that make this approach unsatisfactory as a general solution. A pertinent question remains as to whether the limit as the discretisation step tends to zero can be explicitly calculated, hence avoiding the need to simulate auxiliary states on the fine scale. We have been unable to do so, except for the scalar case (see [Lemke_Godsill_2015]), and hence we propose very accurate approximate models suitable for likelihood and Bayesian inference procedures.
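For reference, the Euler-type discretisation discussed above can be sketched as follows; the matrix $A$, vector $h$ and parameter values are illustrative placeholders, not the paper's settings:

```python
import numpy as np
from scipy.stats import levy_stable

def euler_stable_sde(A, h, alpha, x0, T=1.0, n_steps=1000, seed=None):
    """Euler-type scheme for dX(t) = A X(t) dt + h dW(t), with W a
    symmetric (beta = 0) alpha-stable Levy process, for illustration.
    The stable increment over a step dt has scale dt**(1/alpha)."""
    dt = T / n_steps
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    # i.i.d. stable increments on the fine grid
    dW = levy_stable.rvs(alpha, 0.0, scale=dt ** (1.0 / alpha),
                         size=n_steps, random_state=seed)
    for k in range(n_steps):
        x[k + 1] = x[k] + A @ x[k] * dt + h * dW[k]
    return x

# Illustrative 2-d Langevin-type system (theta = -0.5 is a placeholder)
A = np.array([[0.0, 1.0], [0.0, -0.5]])
h = np.array([0.0, 1.0])
path = euler_stable_sde(A, h, alpha=1.5, x0=np.zeros(2), seed=0)
```

The need to choose `n_steps` (and the error incurred for any finite choice) is precisely the drawback the text refers to.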
III Summary of Methods and Contribution
The starting point for the Lévy state space model is the generalised shot noise representation of $W(t)$; see [Rosinski_2001]. A principal contribution here is to adopt a non-centred conditionally Gaussian form for the shot noise representation, which has been largely overlooked in the stochastic processes literature, but which is of significant benefit in generating tractable and effective inference procedures. This special form enables both tractable conditional Gaussian inference and the possibility of modelling asymmetric Lévy processes. In addition, we show that the residual error committed by truncation can be approximated by a Gaussian-driven sde that exactly matches the first and second moments of the true residual error. We have established central limit theorems and sharp rates of convergence for the approximating underlying Gaussian residual Lévy process. Finally, based on this representation, we develop transition densities for the Lévy state space model in the conditionally Gaussian form, and show how to perform marginal inference in the model using Kalman filtering recursions built into a Bayesian (or potentially likelihood-based) Monte Carlo framework. Results have been obtained for simulated data and real exchange rate data.
Other relevant work includes [clement2019], who propose limit theorems and estimating function methods for stable Lévy-driven sdes; [Fournier_20011], who prove error bounds between Lévy-driven sdes with small jumps and their Gaussian approximations; [Jasra2019], who provide Bayesian inference procedures using a quasi-likelihood approach and MCMC; [Dereich2011, Dereich2016], who provide a Multilevel Monte Carlo approach for evaluation of expectations, using coupled Euler approximations; and [FerreiroCastilla2016], who propose Euler schemes under Poisson arrival times. In contrast to these approaches, we tackle the problem without resort to Euler approximations or pseudo-likelihood functions, and our methods are applicable to low- or high-frequency observations, but are currently limited to linear sdes. Our approach focuses on a class of Lévy processes that may be expressed as a conditionally Gaussian shot noise process, which includes for example all stable Lévy processes and many truncated and modified variants of these processes; however, at the expense of more computational effort, more general non-Gaussian versions may also be incorporated into our framework.

IV Generalised Shot Noise Processes
We will use a generalised shot noise formulation studied in [Rosinski_2001] and references therein, in which the Lévy process is expressed on a time axis $t \in [0, T]$ as

(4) $W(t) = \sum_{i=1}^{\infty} \left( H(\Gamma_i, U_i)\, \mathbb{1}(V_i \le t) - \frac{t}{T}\, c_i \right).$

Here $\{\Gamma_i\}$ are the epochs of a unit rate Poisson process, $\{U_i\}$ are mutually independent random variables with a certain distribution $F$, and $\{V_i\}$ are independent uniform random variables on $[0, T]$. The centering terms $c_i$ are sometimes needed and sometimes not, as discussed below. $H(\gamma, u)$ is generally taken to be non-increasing in its argument $\gamma$ for fixed $u$. A point process on the expanded space $[0,T] \times \mathbb{R}$, with mean measure determined by the uniform arrival times and the jump-size distribution, may be constructed as [Rosinski_2001]:

$N = \sum_{i=1}^{\infty} \delta_{(V_i,\, H(\Gamma_i, U_i))},$

where the i.i.d. uniform variables $V_i$ give the times of arrival of jumps, $H(\Gamma_i, U_i)$ give the sizes of the jumps, and $\delta_{(v,x)}$ is the Dirac point mass located at $(v, x)$. This leads to a direct expression for $W(t)$ [Rosinski_2001] (by the Lévy–Ito integral representation of $W$, expressed in terms of the augmented point process $N$):

(5) $W(t) = \int_{[0,T] \times \mathbb{R}} x\, \mathbb{1}(v \le t)\, N(dv, dx)\; -\; \text{centering terms},$
which, on substitution for $N$, leads to the shot noise representation (4). The almost sure convergence of this series to $W(t)$ is proven in [Rosinski_2001]. In this paper we will apply, in particular, random Poisson truncations of the form $\{\Gamma_i \le c\}$, which leads to approximations, neglecting small jumps, of the form

(6) $N_c = \sum_{i:\, \Gamma_i \le c} \delta_{(V_i,\, H(\Gamma_i, U_i))},$

and the corresponding truncated shot noise series

(7) $W^{(c)}(t) = \sum_{i:\, \Gamma_i \le c} H(\Gamma_i, U_i)\, \mathbb{1}(V_i \le t)\; -\; \text{centering terms}.$

The corresponding Lévy measure for $W^{(c)}$ can then be shown to be [Rosinski_2001]

(8) $Q_c(B) = \frac{1}{T} \int_0^c \mathbb{P}\big(H(\gamma, U) \in B\big)\, d\gamma,$

for Borel sets $B$, where $U \sim F$.
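The truncated series (7) is straightforward to simulate. As a minimal illustration, the following sketch uses the one-sided choice $H(\Gamma) = \Gamma^{-1/\alpha}$ with $\alpha < 1$, so that no centering terms are required; all parameter values are illustrative:

```python
import numpy as np

def truncated_shot_noise(alpha, c, T=1.0, n_grid=101, seed=None):
    """Truncated generalised shot noise series for a one-sided
    alpha-stable-type Levy process, alpha < 1 (no centering needed).
    Jumps H(Gamma_i) = Gamma_i**(-1/alpha) for Poisson epochs Gamma_i <= c,
    placed at uniform times V_i on [0, T]."""
    rng = np.random.default_rng(seed)
    t_grid = np.linspace(0.0, T, n_grid)
    # Epochs of a unit-rate Poisson process: cumulative sums of Exp(1)
    gammas = []
    g = 0.0
    while True:
        g += rng.exponential(1.0)
        if g > c:
            break
        gammas.append(g)
    gammas = np.array(gammas)
    jumps = gammas ** (-1.0 / alpha)            # jump sizes H(Gamma_i)
    v = rng.uniform(0.0, T, size=len(gammas))   # jump times V_i
    # W(t) = sum of the jumps that have arrived by time t
    W = np.array([(jumps * (v <= t)).sum() for t in t_grid])
    return t_grid, W

t, W = truncated_shot_noise(alpha=0.7, c=50.0, seed=1)
```

Increasing the truncation level `c` adds ever smaller jumps (since $H$ is non-increasing in $\Gamma$), which is exactly the sense in which the truncation "neglects small jumps".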
V The conditionally Gaussian Lévy process
One of the principal contributions of this paper is in methods for tractable inference about states and parameters of Lévy-driven sdes. The starting point for this is the generalised shot noise representation (4). We propose to use a special form of this in which we define:

$H(\Gamma_i, U_i) = h(\Gamma_i)\, U_i,$

where $h(\cdot)$ is a non-increasing function. If we furthermore assume a non-centred Gaussian form $U_i \sim \mathcal{N}(\mu_W, \sigma_W^2)$ then special properties result, notably tractable conditional Gaussian inference, and the possibility of modelling asymmetric Lévy processes. Note that in the centered case, $\mu_W = 0$, such processes are well known; see e.g. the processes of type G [Rosinski_1991], and the Condition S processes described in [Samorodnitsky_Taqqu_1994], but in the general case they are commonly overlooked. A well known example of this form for $h$ is the $\alpha$-stable Lévy process, which can be represented with the form [Samorodnitsky_Taqqu_1994]:

$h(\gamma) = \gamma^{-1/\alpha},$

where $\alpha$ is the stable law tail parameter. The case $\alpha = 1$, the Cauchy process, is specifically excluded from now on, as this forms a singular special case of the family [Samorodnitsky_Taqqu_1994]. Essentially any distribution for the $U_i$ may be adopted, subject to finite absolute $\alpha$th moments $\mathbb{E}|U_i|^\alpha < \infty$, which is well known to be satisfied by our non-centred normal $U_i \sim \mathcal{N}(\mu_W, \sigma_W^2)$. Following the procedure in the previous section, it is relatively straightforward to verify that the correct Lévy measure for the $\alpha$-stable law is obtained under this function $h$ and the non-centred Gaussian form (a proof will be given in subsequent publications). There are many possible choices for the centering and we follow a standard procedure [Samorodnitsky_Taqqu_1994] in which no centering is required for $0 < \alpha < 1$, while for $1 < \alpha < 2$ a modified centering can be calculated in closed form; it can be shown that this choice of centering is also equal to the expected value of the truncated process.
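Conditional on the Poisson epochs, the truncated series value is an explicit Gaussian, which the following sketch checks empirically; $\alpha$, $\mu_W$, $\sigma_W$ and the truncation level are illustrative values, and centering terms are ignored for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, mu_W, sigma_W, c = 1.5, 0.5, 1.0, 30.0

# Fix one set of Poisson epochs Gamma_i <= c (cumulative Exp(1) sums)
e = rng.exponential(1.0, size=200)
gammas = np.cumsum(e)
gammas = gammas[gammas <= c]
h = gammas ** (-1.0 / alpha)   # h(Gamma_i) = Gamma_i**(-1/alpha)

# Conditional on the epochs, W(T) = sum_i h_i U_i with U_i ~ N(mu_W, sigma_W^2)
# (centering ignored here), hence exactly Gaussian with the moments below.
m = mu_W * h.sum()
v = sigma_W ** 2 * (h ** 2).sum()

# Monte Carlo check of the conditional moments
N = 20000
U = rng.normal(mu_W, sigma_W, size=(N, len(h)))
samples = U @ h
```

The conditional mean is proportional to $\mu_W$ and the conditional variance to $\sigma_W^2$, which is the structural fact exploited by the Kalman-filter-based inference later in the paper.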
There is extensive earlier work on the analysis of the convergence rate of truncated series of this class to the corresponding stable laws; see, e.g., [JanickiWeron1994, janicki1992computer, JanickiKokoszka1992, LedouxPaulauskas1996, Bentkus1996, Bentkus2001]. In [Riabiz2018a] we reviewed these results, and derived new bounds for the symmetric stable laws under our conditionally Gaussian representation. Here though we are able to provide the following more general result that applies to the asymmetric general case for the first time:
Theorem 1
If the approximating truncated process $W^{(c)}$ is as in (7) and $W$ is the full untruncated Lévy process, then a central limit theorem with an explicit rate bound holds for the residual $W - W^{(c)}$ as $c \to \infty$, where the constant in the bound does not depend on the truncation level $c$.
Proof: The proof follows a similar structure to Theorem 3.1 of [Asmussen2001] and will be presented in a future publication.
Remarks:

We notice that the rate of convergence with respect to $c$ matches that in [Asmussen2001], Th. 3.1, which corresponds to a different type of truncation, on the absolute size of the jumps. We comment however that it is not as good as the convergence rate we have previously obtained for the symmetric case [Riabiz2018a]. However, we note that our current result is significantly more general in that it covers both symmetric and non-symmetric (skewed) cases. We can postulate that a more subtle future development will prove convergence rates interpolating between the two as we move from fully skewed to symmetric models.

We have computed this convergence result for the particular case of the $\alpha$-stable law, but other conditionally Gaussian Lévy processes are likely to be amenable to similar analysis and we will study these in future work. Similarly, non-Gaussian versions may also be amenable to similar analysis provided convenient moment expressions are available for certain distributions.
V-A The Lévy-driven sde
Recall that the required sde solution is given in Eq. (1) and in particular we require a representation of the integral:

$I = \int_0^t e^{A(t-u)}\, h\, dW(u).$

The proposed series representation of the integral is based on the generalised shot noise representation of the previous section,

(9) $\int_0^t f(u)\, dW(u) = \sum_{i=1}^{\infty} f(V_i)\, h(\Gamma_i)\, U_i\, \mathbb{1}(V_i \le t)\; -\; \text{centering terms},$

and to get the basic result we may (informally) substitute this directly into (1) to obtain

(10) $X(t) = e^{At} X(0) + \sum_{i=1}^{\infty} e^{A(t - V_i)}\, h\, \Gamma_i^{-1/\alpha}\, U_i\, \mathbb{1}(V_i \le t)\; -\; \sum_i b_i,$

where as before:

- $\Gamma_i$ are event times of a unit-rate Poisson process,
- $V_i$ are i.i.d. uniform on $[0, T]$,
- $U_i$ are i.i.d. $\mathcal{N}(\mu_W, \sigma_W^2)$,
- $b_i$ are centering constants, non-zero only if $1 < \alpha < 2$,

and $f$ is given by $f(u) = e^{A(t-u)}\, h$.
A similar form has been proved to converge to the correct stable sde in [Samorodnitsky_Taqqu_1994], in the scalar sde case, using a discrete Rademacher random variable for $U_i$. In [Lemke_Godsill_2015] the scalar case was also proven with Gaussian $U_i$, and a special case was also given for symmetric Gaussians in [Samorodnitsky_Taqqu_1994] (the so-called Condition S). Here we are extending to the vector sde case, with non-centred Gaussian $U_i$, for which the full proof will be presented in a future publication. Some insights may be gained by the following informal interpretation (see also [Samorodnitsky_Taqqu_1994]):

- $V_i$ are jump arrival times;
- $\Gamma_i^{-1/\alpha} U_i$ is the size of the jump at $V_i$;
- $e^{A(t - V_i)} h$ is the effect at time $t$ of passing a unit jump at $V_i$ through the linear ODE;
- $b_i$ is a drift term, only non-zero for $1 < \alpha < 2$.
In particular, for practical implementations we here introduce the randomly truncated version, with truncation $\{\Gamma_i \le c\}$ as before,

$X^{(c)}(t) = e^{At} X(0) + \sum_{i:\, \Gamma_i \le c} e^{A(t - V_i)}\, h\, \Gamma_i^{-1/\alpha}\, U_i\, \mathbb{1}(V_i \le t)\; -\; \text{centering terms},$

whose truncation error has zero mean by construction. We now analyse the error due to this truncation, given by $R(t) = X(t) - X^{(c)}(t)$. The following result gives an exact characterisation of the covariance of the error:
Theorem 2
Proof: The proof relies on calculation of the covariance of the point process corresponding to the residual and will be presented in a future publication.

Remark: The expectation is obtained for our sde as a covariance function of the form

$\mathbb{E}\big[R(t)\, R(t)^T\big] \propto \int_0^t e^{A(t-u)}\, h\, h^T\, e^{A^T(t-u)}\, du,$

and we can recognise this as the covariance function of a linear Gaussian sde [Oksendal2003]

$d\tilde{X}(t) = A\, \tilde{X}(t)\, dt + \sigma_c\, h\, dB(t),$

where $B(t)$ is the unidimensional Brownian motion and $\sigma_c$ is an appropriate constant. Hence the covariance of $R(t)$ is exactly matched by that of a linear Gaussian sde. We propose then to approximate the exact integral in four possible ways, each involving a different truncation/Gaussian approximation to the residual, as explained in the next section.
VI Lévy state space model
We are now in a position to specify the stable Lévy state-space model. This requires the specification of the transition density between times $t_{k-1}$ and $t_k$, expressed as

(11) $p\big(x(t_k)\, \big|\, x(t_{k-1})\big).$

This may be obtained fairly straightforwardly from (10) and the properties of the matrix exponential and the point process on the subinterval $(t_{k-1}, t_k]$ of $[0, T]$ as

(12) $x(t_k) = e^{A(t_k - t_{k-1})}\, x(t_{k-1}) + \sum_{i:\, V_i \in (t_{k-1}, t_k]} e^{A(t_k - V_i)}\, h\, \Gamma_i^{-1/\alpha}\, U_i\; -\; \text{centering terms},$

where all terms are defined as before, except that now the $V_i$ are uniforms on the subinterval $(t_{k-1}, t_k]$ and are drawn independently of the point process in any non-overlapping time intervals. Thus, for each successive time interval a new set of $\{\Gamma_i, V_i, U_i\}$ needs to be independently generated afresh in order to forward simulate the process for times $t > t_{k-1}$. We have constructed the model in this way so that a causal forward simulation may be carried out without requiring a prior simulation of all the $\{\Gamma_i, V_i, U_i\}$ for the entire time axis as in (10).
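A forward simulation of one transition interval under this construction can be sketched as follows; the matrices and parameter values are illustrative, and centering terms are omitted (as for a symmetric driving process with zero location):

```python
import numpy as np
from scipy.linalg import expm

def simulate_transition(x_prev, A, h, alpha, sigma_W, c, dt, seed=None):
    """Simulate x(t_k) given x(t_{k-1}) for the truncated Levy sde,
    symmetric case mu_W = 0 (so no centering terms are required).
    Jumps have Gamma_i <= c*dt, i.e. on average c jumps per unit time."""
    rng = np.random.default_rng(seed)
    # Poisson epochs of unit rate, truncated at c*dt
    gammas = []
    g = 0.0
    while True:
        g += rng.exponential(1.0)
        if g > c * dt:
            break
        gammas.append(g)
    x = expm(A * dt) @ x_prev
    for gma in gammas:
        v = rng.uniform(0.0, dt)        # jump time within the interval
        u = rng.normal(0.0, sigma_W)    # conditionally Gaussian mark
        x = x + expm(A * (dt - v)) @ h * gma ** (-1.0 / alpha) * u
    return x

A = np.array([[0.0, 1.0], [0.0, -0.5]])  # illustrative Langevin-type matrix
h = np.array([0.0, 1.0])
x1 = simulate_transition(np.zeros(2), A, h, alpha=1.5,
                         sigma_W=1.0, c=20.0, dt=0.1, seed=2)
```

Because only the jumps landing inside $(t_{k-1}, t_k]$ are needed, the simulation is causal: each interval draws its own fresh epochs, times and marks.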
The randomly truncated version can be obtained similarly,

(13) $x^{(c)}(t_k) = e^{A \Delta t_k}\, x(t_{k-1}) + \sum_{i:\, \Gamma_i \le c \Delta t_k,\; V_i \in (t_{k-1}, t_k]} e^{A(t_k - V_i)}\, h\, \Gamma_i^{-1/\alpha}\, U_i\; -\; \text{centering terms},$

with $\Delta t_k = t_k - t_{k-1}$. We can here interpret the truncation parameter $c$ as the expected number of jumps per unit time. Note that, conditional upon $\{\Gamma_i, V_i\}$, the jump term is Gaussian, with mean

(14) $m = \mu_W \sum_i e^{A(t_k - V_i)}\, h\, \Gamma_i^{-1/\alpha}$

and covariance

(15) $S = \sigma_W^2 \sum_i e^{A(t_k - V_i)}\, h\, h^T\, e^{A^T(t_k - V_i)}\, \Gamma_i^{-2/\alpha}.$
To complete the state space model, recall that the residual terms of the series, those with $\Gamma_i > c \Delta t_k$, are zero mean with the covariance matrix characterised in Theorem 2, leading to a decomposition of the exact state space model into the truncated series (13) plus a zero mean residual term.
We can now propose several approximations to this exact representation by either removing or approximating the residual term. In each case the residual is approximated with an independent zero mean Gaussian term having covariance $\lambda \Sigma_c$, where $\Sigma_c$ is the residual covariance and $\lambda$ is a constant, with different choices of $\lambda$ leading to different approximations, each corresponding to a conditionally Gaussian form for the approximated Lévy-driven sde’s transition density:

(16) $p\big(x(t_k)\, \big|\, x(t_{k-1}), \{\Gamma_i, V_i\}\big) = \mathcal{N}\big(x(t_k);\; e^{A \Delta t_k}\, x(t_{k-1}) + m,\; S + \lambda \Sigma_c\big).$

In each case the conditioning random variables are defined such that they are generated independently of all other time intervals that do not overlap with $(t_{k-1}, t_k]$ as:

$\Gamma_i = \Gamma_{i-1} + E_i, \quad E_i \sim \mathcal{E}(1), \qquad V_i \sim \mathcal{U}(t_{k-1}, t_k],$

where $\mathcal{E}(1)$ denotes the exponential distribution with unit mean.
This conditional Gaussian transition density is the building block for all of the forward simulation and inference methods described subsequently. The four cases of approximation considered are:

1) Truncated series. The first approximation neglects the residual altogether, hence leading to the truncated version of the process, (13), and corresponding to $\lambda = 0$. This approximation of the process has some computational benefits, as will be considered shortly, but is most likely the least accurate of our three approximations.

2) Gaussian residual approximation. In this more sophisticated approximation the residual is replaced with a zero mean Gaussian random variable with covariance matched to that of the true residual. This is most likely the most accurate representation of the process that we have, but it has some less desirable computational properties, as will be described shortly. Note however that for the important symmetric case the transition density is identical to the following partial Gaussian approximation, acquiring its computational advantages (principally the ability to marginalise scale parameters).

3) Partial Gaussian residual approximation. Here a halfway house may be considered in which the residual is replaced with the solution of a linear Gaussian SDE whose covariance matches the expected residual covariance. This version retains the computational benefits of 1), but does not fully account for the covariance of the residual, and so is likely to sit in between 1) and 2) in terms of approximation accuracy.

4) Joint Gaussian residual approximation. A final, more complex version of the truncation attempts to approximate the residual series of the state and the driving process jointly. This has the same computational benefits as 1) and 3), but is a little more complex to compute and is not exposed here; see [Riabiz2019] for full details.
VI-A Matrix-vector form for the Lévy state space model
We are now in a position to state the model in standard linear (conditionally upon $\{\Gamma_i, V_i\}$) Gaussian state-space form for the approximated process, corresponding to case 1) above with $\lambda = 0$, case 2) above with the covariance-matched residual, or case 3) above with the sde-based residual. In cases 1) and 3) it may be written using an extended state vector that appends $\mu_W$ to the state:

(17) $\begin{pmatrix} x(t_k) \\ \mu_W \end{pmatrix} = \begin{pmatrix} e^{A \Delta t_k} & \bar{m}_k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x(t_{k-1}) \\ \mu_W \end{pmatrix} + e_k, \qquad e_k \sim \mathcal{N}(0, C_k),$

with $\bar{m}_k = \sum_i e^{A(t_k - V_i)}\, h\, \Gamma_i^{-1/\alpha}$ (the conditional mean (14) with $\mu_W$ factored out), where $C_k$ is proportional to $\sigma_W^2$ (and does not depend upon $\mu_W$). For case 2), owing to the more complex covariance structure of the Gaussian residual approximation, the mean may not be directly included as a state variable and we have

(18) $x(t_k) = e^{A \Delta t_k}\, x(t_{k-1}) + \mu_W\, \bar{m}_k + e_k, \qquad e_k \sim \mathcal{N}(0, C_k'),$

where $C_k'$ now involves $\mu_W^2$ as well as $\sigma_W^2$, through the matched residual covariance.
VII Computation of likelihoods and state conditionals
Equations (17) and (18) can now form the basis for forward simulation of data from the Lévy state space model, or for inference about its states and parameters. Likelihoods and state conditional distributions may be computed using standard Bayesian recursions, implemented by the Kalman filter, all conditioned upon the latent variables $\{\Gamma_i, V_i\}$. This means that Monte Carlo likelihood-based and Bayesian inference procedures such as Monte Carlo EM (MCEM), Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) may routinely be implemented.
For a concrete example, take the case 1) or 3) approximated models and suppose we partially observe the state through linear Gaussian observations $y_n = G\, x(t_n) + w_n$, $w_n \sim \mathcal{N}(0, \sigma_W^2 R_w)$, at times $t_1, \ldots, t_N$, arranged in ascending order. For example, if $G = (1, 0, \ldots, 0)$ and $R_w = 0$ we are fully observing the first component of $x(t)$. Then, initialise the Kalman filter with prior mean $a_0$ and prior covariance $\sigma_W^2 \kappa P_0$, where $\kappa$ is a constant. Note that the covariance of the noise and the prior covariance of the initial state are both scaled relative to $\sigma_W^2$. This is to ensure greatest analytical tractability of posterior densities computed using Kalman filtering. A flat prior on the initial state is obtained as $\kappa \to \infty$, and a fully observed case with no observation noise is obtained with $R_w = 0$. Other more general prior structures can of course be incorporated, but this will be at the cost of a convenient conjugate posterior density for $\sigma_W^2$.
In order to obtain the closed form result, follow a scheme similar to that in, for example, [Harvey1990]. In this, redefine the dynamical noise as

$\tilde{e}_k = e_k / \sigma_W,$

whose covariance does not depend on $\sigma_W^2$ by construction in cases 1) and 3). Then, the Kalman filter computes [Harvey1990], for each $n$, the one-step predictive mean $\hat{y}_{n|n-1}$ and scaled predictive covariance $\sigma_W^2 S_{n|n-1}$ of $y_n$, where $\hat{y}_{n|n-1}$, $S_{n|n-1}$, etc. are the Kalman filter output variables under the definition of the modified dynamical noise distribution $\tilde{e}_k$. From this the likelihood is obtained as

$p\big(y_1, \ldots, y_N \,\big|\, \sigma_W^2, \{\Gamma_i, V_i\}\big) = \prod_{n=1}^{N} \mathcal{N}\big(y_n;\; \hat{y}_{n|n-1},\; \sigma_W^2 S_{n|n-1}\big).$

The conjugate prior for this form of likelihood is the inverted gamma distribution, $\sigma_W^2 \sim \mathcal{IG}(\alpha_0, \beta_0)$. Completing the (standard) conjugate analysis [Bernardo_Smith_1994], we have:

(19) $\sigma_W^2 \,\big|\, y_1, \ldots, y_N, \{\Gamma_i, V_i\} \sim \mathcal{IG}\left( \alpha_0 + \frac{N d}{2},\; \beta_0 + \frac{1}{2} \sum_{n=1}^{N} (y_n - \hat{y}_{n|n-1})^T S_{n|n-1}^{-1} (y_n - \hat{y}_{n|n-1}) \right),$

where $d$ is the dimension of the observation vector $y_n$, and we have, remarkably, all the tools to carry out marginal analysis of the data conditioned on the auxiliary variables $\{\Gamma_i, V_i\}$, and also joint conditional analysis for the states. We stress that this full analysis is only available with the chosen prior structures, in particular the normal-inverse gamma conjugate priors for $(\mu_W, \sigma_W^2)$ and the special scaled form of the noise covariance matrices. Without these structures we will still be able to use the standard Kalman filter model to marginalise and infer the states, but we will lose the closed form marginal expressions for $\mu_W$ and/or $\sigma_W^2$. A minor modification applies for the joint Gaussian residual approximation, case 4), in which once again the full Kalman filter calculations can be implemented with a modified noise covariance (not detailed here). Finally, for the Gaussian residual approximation, case 2), the conjugate likelihood structure is lost owing to the $\mu_W^2$ term in the transition density’s noise covariance. The standard Kalman filter can be used to infer/marginalise the states, but not $\mu_W$ or $\sigma_W^2$.
VIII Example: Langevin model
In order to give a concrete application case, we work through the analysis for a 2-dimensional Langevin model with similar structure to that used in [Christensen2012],

$dx(t) = \begin{pmatrix} 0 & 1 \\ 0 & \theta \end{pmatrix} x(t)\, dt + \begin{pmatrix} 0 \\ 1 \end{pmatrix} dW(t),$

and work with a state vector containing position and trend (velocity) components. The required system matrices for this model are

$A = \begin{pmatrix} 0 & 1 \\ 0 & \theta \end{pmatrix}, \qquad h = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$

Then,

$e^{At} = \begin{pmatrix} 1 & (e^{\theta t} - 1)/\theta \\ 0 & e^{\theta t} \end{pmatrix}$

and

$e^{At}\, h = \begin{pmatrix} (e^{\theta t} - 1)/\theta \\ e^{\theta t} \end{pmatrix},$

which renders the calculation of the conditional mean and covariance (see (14) and (15)) fairly straightforward:

(20) $m = \mu_W \sum_i \Gamma_i^{-1/\alpha}\, v_i$

and

(21) $S = \sigma_W^2 \sum_i \Gamma_i^{-2/\alpha}\, v_i\, v_i^T, \qquad v_i = \begin{pmatrix} (e^{\theta(t_k - V_i)} - 1)/\theta \\ e^{\theta(t_k - V_i)} \end{pmatrix}.$

The residual covariance quantities required for the Gaussian approximations of Section VI follow similarly in closed form from the same matrix exponential expressions.
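The matrix exponential of the Langevin system matrix $A = \begin{pmatrix} 0 & 1 \\ 0 & \theta \end{pmatrix}$ has a simple closed form, which is easy to verify numerically; $\theta = -0.8$ and $t = 0.5$ here are arbitrary illustrative values:

```python
import numpy as np
from scipy.linalg import expm

theta, t = -0.8, 0.5
A = np.array([[0.0, 1.0], [0.0, theta]])
# Closed form: e^{At} = [[1, (e^{theta t} - 1)/theta], [0, e^{theta t}]]
eAt_closed = np.array([[1.0, (np.exp(theta * t) - 1.0) / theta],
                       [0.0, np.exp(theta * t)]])
eAt_numeric = expm(A * t)
assert np.allclose(eAt_closed, eAt_numeric)
```

With $h = (0, 1)^T$, the product $e^{At} h$ is simply the second column of this matrix, which is what makes the per-jump terms in (20) and (21) so cheap to evaluate.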
VIII-A Forward simulation example
Equation (17) may be used directly to simulate a discrete ‘skeleton’ of the process, and this in itself may be of use in the study of these intractable processes. An example simulation is given in Fig. 1 for an asymmetric Langevin model. A much more heavy-tailed example is given in Fig. 2 for the symmetric case.
VIII-B Marginal Monte Carlo Filter
A sequential Monte Carlo (SMC) method is applied to infer the posterior distribution of the states of the Langevin model. This is implemented in the form of a standard bootstrap particle filter, proposing the variables according to (20) and (21) at each time interval and computing marginal weights according to (19). This is a fairly standard implementation, see e.g. [Cappe2007], and we omit further details here, leaving this to a subsequent publication. In all cases the position component of the state is observed and its derivative (the trend) is unobserved. An example run of the filter, showing the Monte Carlo filter trajectories, is displayed in Fig. 3. Note that the unobserved trend is quite well inferred in the bottom panel, with appropriate posterior uncertainty about the trajectory. In Fig. 4 a small segment of the high-frequency tick-level Euro-Dollar exchange rate from 2006 is modelled and tracked over a period of a few minutes with the filter. Here we found that a very heavy-tailed model was successfully able to capture the apparent rapid jumps in the trend of the process. Full simulation results and details of implementation will be provided in subsequent publications.
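The propose/weight/resample cycle of a bootstrap particle filter can be sketched generically as follows; this toy version runs on a simple scalar AR(1) model rather than the marginalised Lévy state-space proposal of (20)-(21), purely to illustrate the structure:

```python
import numpy as np

def bootstrap_pf(ys, propagate, loglik, x0_sampler, n_particles=500, seed=None):
    """Generic bootstrap particle filter with multinomial resampling.
    propagate(x, rng): samples x_k | x_{k-1} for an array of particles.
    loglik(y, x): log p(y_k | x_k), evaluated per particle."""
    rng = np.random.default_rng(seed)
    x = x0_sampler(n_particles, rng)
    est = []
    for y in ys:
        x = propagate(x, rng)            # propose from the transition prior
        logw = loglik(y, x)              # weight by the likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est.append((w * x).sum())        # filtered mean estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = x[idx]                       # multinomial resampling
    return np.array(est)

# Toy scalar AR(1) model with Gaussian observation noise
rng = np.random.default_rng(4)
true_x, ys = 0.0, []
for _ in range(30):
    true_x = 0.9 * true_x + rng.normal(0, 0.5)
    ys.append(true_x + rng.normal(0, 0.2))

est = bootstrap_pf(
    ys,
    propagate=lambda x, r: 0.9 * x + r.normal(0, 0.5, size=x.shape),
    loglik=lambda y, x: -0.5 * ((y - x) / 0.2) ** 2,
    x0_sampler=lambda n, r: r.normal(0, 1, size=n),
    seed=5,
)
```

In the marginal filter of the text, each "particle" would instead carry a set of latent jump variables $\{\Gamma_i, V_i\}$, with the Gaussian state marginalised exactly by the Kalman recursions and the weights given by the closed-form marginal likelihood (19).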