Lévy processes have become very useful recently in several scientific disciplines. A non-exhaustive list includes physics, in the study of turbulence and quantum field theory; economics, for continuous time-series models; insurance mathematics, for computation of insurance and risk, and mathematical finance, for pricing path dependent options. Earlier application of Lévy processes in modeling financial instruments dates back in 
where a variance gamma process is used to model market returns.
A typical computational problem in mathematical finance is the computation of the quantity , where is the time solution of a stochastic differential equation driven by a Lévy process and , a bounded Borel measurable function on . For instance can be a payoff function. Typically one uses the Black-Scholes model, in which the underlying price process is lognormal. However, often the asset price exhibits big jumps over the time horizon. The inconsistency of the assumptions of the Black-Scholes model for market data has lead to the development of more realistic models for these data in the literature. General Lévy processes offer a promising alternative to describe the observed reality of financial market data, as compared to models that are based on standard Brownian motions.
In the application of standard and multilevel particle filter methods to SDEs driven by general Lévy processes, in addition to pricing path dependent options, we will consider filtering of partially-observed Lévy process with discrete-time observations. In the latter context, we will assume that the partially-observed data are regularly spaced observations , where is a realization of and ) has density given by , where is the time scale. Real S&P stock price data will be used to illustrate our proposed methods as well as the standard particle filter. We will show how both of these problems can be formulated as general Feynman-Kac type problems , with time-dependent potential functions modifying the Lévy path measure.
The multilevel Monte Carlo (MLMC) methodology was introduced in  and first applied to the simulation of SDE driven by Brownian motion in . Recently,  provided a detailed analysis of the application of MLMC to a Lévy-driven SDE. This first work was extended in  to a method with a Gaussian correction term which can substantially improve the rate for pure jump processes . The authors in  use the MLMC method for general Lévy processes based on Wiener-Hopf decomposition. We extend the methodology described in  to a particle filtering framework. This is challenging due to the following reasons. First, one must choose a suitable weighting function to prevent the weights in the particle filter being zero (or infinite). Next, one must control the jump part of the underlying Lévy process such that the path of the filter does not blow up as the time parameter increases. In pricing path dependent options, for example knock out barrier options, we adopt the same strategy described in [16, 17] for the computation of the expectation of the functionals of the SDE driven by general Lévy processes.
The rest of the paper is organised as follows. In Section 2, we briefly review the construction of general Lévy processes, the numerical approximation of Lévy-driven SDEs, the MLMC method, and finally the construction of a coupled kernel for Lévy-driven SDEs which will allow MLMC to be used. Section 3 introduces both the standard and multilevel particle filter methods and their application to Lévy-driven SDEs. Section 4 features numerical examples of pricing barrier options and filtering of partially observed Lévy processes. The computational savings of the multilevel particle filter over the standard particle filter is illustrated in this section.
2 Approximating SDE driven by Lévy Processes
In this section, we briefly describe the construction and approximation of a general -dimensional Lévy process , and the solution of a -dimensional SDE driven by . Consider a stochastic differential equation given by
where , and the initial value is (assumed known). In particular, in the present work we are interested in computing the expectation of bounded and measurable functions , that is .
2.1 Lévy Processes
For a general detailed description of the Lévy processes and analysis of SDEs driven by Lévy processes, we shall refer the reader to the monographs of [3, 27] and [1, 24]. Lévy processes are stochastic processes with stationary and independent increments, which begin almost surely from the origin and are stochastically continuous. Two important fundamental tools available to study the richness of the class of Lévy processes are the Lévy-Khintchine formula and the Lévy-Itô decomposition. They respectively characterize the distributional properties and the structure of sample paths of the Lévy process. Important examples of Lévy processes include Poisson processes, compound Poisson processes and Brownian motions.
There is a strong interplay between Lévy processes and infinitely divisible distributions such that, for any the distribution of is infinitely divisible. Conversely, if is an infinitely divisible distribution then there exists a Lévy process such that the distribution of is given by . This conclusion is the result of Lévy-Khintchine formula for Lévy processes we describe below. Let be a Lévy process with a triplet , , where is a measure satisfying and , such that
with the probability law of , where
The measure is called the Lévy measure of . The triplet of Lévy characteristics is simply called Lévy triplet. Note that in general, the Lévy measure can be finite or infinite. If , then almost all paths of the Lévy process have a finite number of jumps on every compact interval and it can be represented as a compensated compound Poisson process. On the other hand, if , then the process has an infinite number of jumps on every compact interval almost surely. Even in this case the third term in the integrand ensures that the integral is finite, and hence so is the characteristic exponent.
2.2 Simulation of Lévy Processes
The law of increments of many Lévy processes is not known explicitly. This makes it more difficult to simulate a path of a general Lévy process than for instance standard Brownian motion. For a few Lévy processes where the distribution of the process is known explicitly, [4, 28] provided methods for exact simulation of such processes, which are applicable in financial modelling. For our purposes, the simulation of the path of a general Lévy process will be based on the Lévy-Itô decomposition and we briefly describe the construction below. An alternative construction is based on Wiener-Hopf decomposition. This is used in .
The Lévy-Itô decomposition reveals much about the structure of the paths of a Lévy process. We can split the Lévy exponent, or the characteristic exponent of in , into three parts
The first term corresponds to a deterministic drift process with parameter , the second term to a Wiener process with covariance , where denotes the symmetric square-root, and the last part corresponds to a Lévy process which is a square integrable martingale. This term may either be a compensated compound Poisson process or the limit of such processes, and it is the hardest to handle when it arises from such a limit.
Thus, any Lévy process can be decomposed into three independent Lévy processes thanks to the Lévy-Itô decomposition theorem. In particular, let denote a Wiener process independent of the process . A Lévy process can be constructed as follows
The Lévy-Itô decomposition guarantees that every square integrable Lévy process has a representation as . We will assume that one cannot sample from the law of , hence of , and rather we must numerically approximate the process with finite resolution. Such numerical methods have been studied extensively, for example in [15, 25].
norm, for vectors, and induced operator norm for matrices.
There exists a such that
, and for all ;
Item (i) provides continuity of the forward map, while (ii) controls the variance of the jumps, and (iii) controls the diffusion and drift components and is trivially satisfied. These assumptions are the same as in the paper , with the exception of the second part of (i), which was not required there. As in that paper we refer to the following general references on Lévy processes for further details[1, 3].
2.3 Numerical Approximation of a Lévy Process and Lévy-driven SDE
Recall and . Consider the evolution of discretized Lévy process and hence the Lévy-driven SDE over the time interval .
In order to describe the Euler discretization of the two processes for a given accuracy parameter , we need some definitions. The meaning of the subscript will become clear in the next section. Let denote a jump threshold parameter in the sense that jumps which are smaller than will be ignored. Let . Define , that is the Lévy measure outside of the ball of radius . We assume that the Lévy component of the process is nontrivial so that . First will be chosen and then the parameter will be chosen such that the step-size of the time-stepping method is
. The jump time increments are exponentially distributed with parameterso that the number of jumps before time is a Poisson process with intensity . The jump times will be denoted by . The jump heights are distributed according to
The expected number of jumps on an interval of length is , and the compensated compound Poisson process defined by
The Euler discretization of the Lévy process and the Lévy driven SDE is given by Algorithm 1. Appropriate refinement of the original jump times to new jump times is necessary to control the discretization error arising from the Brownian motion component, the original drift process, and the drift component of the compound Poisson process. Note that the is non-zero only when corresponds with for some , as a consequence of the construction presented above.
The numerical approximation of the Lévy process described in Algorithm 1 gives rise to an approximation of the Lévy-driven SDE as follows. Given , for
where is given by (5). In particular the recursion in (6) gives rise to a transition kernel, denoted by , between observation times . This kernel is the measure of given initial condition . Observe that the initial condition for is irrelevant for simulation of , since only the increments are required, which are simulated independently by adding a realization of to .
2.4 Multilevel Monte Carlo Method
Suppose one aims to approximate the expectation of functionals of the solution of the Lévy-driven SDE in at time , that is , where is a bounded and measurable function. Typically, one is interested in the expectation w.r.t. the law of exact solution of SDE , but this is not always possible in practice. Suppose that the law associated with with no discretization is . Since we cannot sample from , we use a biased version associated with a given level of discretization of SDE at time . Given , define , the expectation with respect to the density associated with the Euler discretization at level . The standard Monte Carlo (MC) approximation at time consists in obtaining i.i.d. samples from the density and approximating by its empirical average
The mean square error of the estimator is
Since the MC estimator
is an unbiased estimator for, this can further be decomposed into
The first term in the right hand side of the decomposition is the variance of MC simulation and the second term is the bias arising from discretization. If we want (7) to be , then it is clearly necessary to choose , and then the total cost is , where it is assumed that for some is the cost to ensure the bias is .
Now, in the multilevel Monte Carlo (MLMC) settings, one can observe that the expectation of the finest approximation can be written as a telescopic sum starting from a coarser approximation , and the intermediate ones:
Now it is our hope that the variance of the increments decays with , which is reasonable in the present scenario where they are finite resolution approximations of a limiting process. The idea of the MLMC method is to approximate the multilevel (ML) identity by independently computing each of the expectations in the telescopic sum by a standard MC method. This is possible by obtaining i.i.d. pairs of samples for each , from a suitably coupled joint measure with the appropriate marginals and , for example generated from a coupled simulation the Euler discretization of SDE at successive refinements. The construction of such a coupled kernel is detailed in Section 2.5. Suppose it is possible to obtain such coupled samples at time . Then for , one has independent MC estimates. Let
where . Analogously to the single level Monte Carlo method, the mean square error for the multilevel estimator can be expanded to obtain
with the convention that . It is observed that the bias term remains the same; that is we have not introduced any additional bias. However, by an optimal choice of , one can possibly reduce the computational cost for any pre-selected tolerance of the variance of the estimator, or conversely reduce the variance of the estimator for a given computational effort.
In particular, for a given user specified error tolerance measured in the root mean square error, the highest level and the replication numbers are derived as follows. We make the following assumptions about the bias, variance and computational cost based on the observation that there is an exponential decay of bias and variance as increases.
Suppose that there exist some constants and an accuracy parameter associated with the discretization of SDE at level such that
where are related to the particular choice of the discretization method and cost is the computational effort to obtain one sample . For example, the Euler-Maruyama discretization method for the solution of SDEs driven by Brownian motion gives orders . The accuracy parameter typically takes the form for some integer . Such estimates can be obtained for Lévy driven SDE and this point will be revisited in detail below. For the time being we take this as an assumption.
The key observation from the mean-square error of the multilevel estimator is that the bias is given by the finest level, while the variance is decomposed into a sum of variances of the increments. Thus the total variance is of the form and by condition above, the variance of the increment is of the form . The total computational cost takes the form . In order to minimize the effort to obtain a given mean square error (MSE), one must balance the terms in . Based on the condition above, a bias error proportional to will require the highest level
In order to obtain optimal allocation of resources , one needs to solve a constrained optimization problem: minimize the total cost for a given fixed total variance or vice versa. Based on the conditions and above, one obtains via the Lagrange multiplier method the optimal allocation .
Now targetting an error of size , one sets , where is chosen to control the total error for increasing . Thus, for the multilevel estimator we obtained:
One then sets in order to have variance of . We can identify three distinct cases
If , which corresponds to the Euler-Maruyama scheme, then . One can clearly see from the expression in that . Then the total cost is compared with single level .
If , which correspond to the Milstein scheme, then , and hence the optimal computational cost is .
If , which is the worst case scenario, then it is sufficient to choose . In this scenario, one can easily deduce that the total cost is , where , using the fact that .
One of the defining features of the multilevel method is that the realizations for a given increment must be sufficiently coupled in order to obtain decaying variances . It is clear how to accomplish this in the context of stochastic differential equations driven by Brownian motion introduced in  (see also ), where coarse icrements are obtained by summing the fine increments, but it is non-trivial how to proceed in the context of SDEs purely driven by general Lévy processes. A technique based on Poisson thinning has been suggested by  for pure-jump diffusion and by  for general Lévy processes. In the next section, we explain an alternative construction of a coupled kernel based on the Lévy-Ito decomposition, in the same spirit as in .
2.5 Coupled Sampling for Levy-driven SDEs
The ML methodology described in Section 2.4 works by obtaining samples from some coupled-kernel associated with discretization of . We now describe how one can construct such a kernel associated with the discretization of the Lévy-driven SDE. Let . Define a kernel, , where denotes the -algebra of measurable subsets, such that for
The coupled kernel can be constructed using the following strategy. Using the same definitions in Section 2.3, let and be user specified jump-thresholds for the fine and coarse approximation, respectively. Define
The objective is to generate a coupled pair given , with . The parameter will be chosen such that , and these determine the value of in (14), for . We now describe the construction of the coupled kernel and thus obtain the coupled pair in Algorithm 2, which is the same as the one presented in .
Generate fine process: Use parts (A) to (C) of Algorithm 1 to generate fine process yielding and
Generate coarse jump times and heights: for ,
If , then and ; ;
Refine jump times: Set and ,
If , set ; else and Go to (i).
If , set , and redefine for ;
Else and Go to (ii).
Recursion of the process: sample (noting );
Let , , , and ;
The construction of the coupled kernel outlined in Algorithm 2 ensures that the paths of fine and coarse processes are correlated enough to ensure that the optimal convergence rate of the multilevel algorithm is achieved.
3 Multilevel Particle Filter for Lévy-driven SDEs
In this section, the multilevel particle filter will be discussed for sampling from certain types of measures which have a density with respect to a Lévy process. We will begin by briefly reviewing the general framework and standard particle filter, and then we will extend these ideas into the multilevel particle filtering framework.
3.1 Filtering and Normalizing Constant Estimation for Lévy-driven SDEs
Recall the Lévy-driven SDE (1). We will use the following notation here . It will be assumed that the general probability density of interest is of the form for , for some given
where is the transition density of the process as a function of , i.e. the density of solution at observational time point given initial condition . It is assumed that is the conditional density (given ) of an observation at discrete time , so observations (which are omitted from our notations) are regularly observed at times . Note that the formulation discussed here, that is for , also allows one to consider general Feynman-Kac models (of the form (17)), rather than just the filters that are focussed upon in this section. The following assumptions will be made on the likelihood functions . Note these assumptions are needed for our later mathematical results and do not preclude the application of the algorithm to be described.
There are and , such that for all , and , satisfies
In practice, as discussed earlier on is typically analytically intractable (and we further suppose is not currently known up-to a non-negative unbiased estimate). As a result, we will focus upon targets associated to a discretization, i.e. of the type
for , where is defined by iterates of the recursion in (6). Note that we will use as the notation for measure and density, with the use clear from the context, where .
The objective is to compute the expectation of functionals with respect to this measure, particularly at the last co-ordinate. For any bounded and measurable function , , we will use the notation
Often of interest is the computation of the un-normalized measure. That is, for any bounded and measurable function define, for
In the context of the model under study, is the marginal likelihood.
Henceforth will be used to denote a draw from . The vanilla case described earlier can be viewed as the special example in which for all
. Following standard practice, realizations of random variables will be denoted with small letters. So, after drawing, then the notation will be used for later references to the realized value. The randomness of the samples will be recalled again for MSE calculations, over potential realizations.
3.2 Particle Filtering
We will describe the particle filter that is capable of exactly approximating, that is as the Monte Carlo samples go to infinity, terms of the form (19) and (20), for any fixed . The particle filter has been studied and used extensively (see for example [5, 8]) in many practical applications of interest.
For a given level , algorithm 5 gives the standard particle filter. The weights are defined as for
with the convention that . Note that the abbreviation stands for effective sample size which measures the variability of weights at time of the algorithm (other more efficient procedures are also possible, but not considered). In the analysis to follow in algorithm 5 (or rather it’s extension in the next section), but this is not the case in our numerical implementations.
3.3 Multilevel Particle Filter
We now describe the multilevel particle filter of  for the context considered here. The basic idea is to run independent algorithms, the first a particle filter as in the previous section and the remaining, coupled particle filters. The particle filter will sequentially (in time) approximate and the coupled filters will sequentially approximate the couples