In this paper, to the best of our knowledge for the first time in the literature, we study the problem of nonparametric Bayesian inference for infinite activity subordinators, i.e., Lévy processes with non-decreasing sample paths. In the last two decades, Lévy processes have received a lot of attention, mainly due to their numerous applications in mathematical finance and insurance, but also in natural sciences; see, e.g., Barndorff-Nielsen et al. (2001). As a matter of fact, thanks to their ability to reproduce stylised features of financial time series distributions, Lévy processes have become a fundamental building block for modelling asset prices with jumps, see Cont and Tankov (2004). By the Lévy-Khintchine formula, the law of a Lévy process is uniquely determined by the so-called Lévy triplet, which hence encodes all the probabilistic information on the process. Since the Lévy triplet involves an infinite-dimensional object, the Lévy measure of the process, this provides natural motivation for studying nonparametric inference procedures for Lévy processes, where the objects of inference are elements of some function spaces.
We term the class of increasing infinite activity Lévy processes that we study -subordinators. Our model generalises the well-known Gamma process, which is a popular risk model, see Dufresne et al. (1991)
, and also forms a building block for more general Lévy models, like the Variance-Gamma (VG) process, that finds many applications in finance, see, e.g.,Madan and Seneta (1990). The family of -subordinators also overlaps with the class of self-decomposable Lévy processes, that likewise have important applications in finance, see, e.g., Carr et al. (2007).
We specifically concentrate on estimation of the Lévy triplet of a -subordinator. On the computational side, our Bayesian procedure circumvents the problem of the intractable likelihood for -subordinators via the data augmentation device, which relies on bridge process sampling via Gamma process bridges, and also employs an infinite-dimensional form of the reversible jump algorithm. On the theoretical side, we establish that our procedure is consistent: as the sample size grows to infinity, the posterior asymptotically concentrates around the parameters of the Lévy processes under which the data have been generated. We test our algorithm on simulated and real data examples. In particular we fit a -subordinator to a benchmark dataset in insurance theory, large fire losses in Denmark, and study the question whether a risk model based on a Gamma process is adequate for modelling this dataset.
1.1 Literature overview
To provide further motivation for a nonparametric Bayesian approach to inference in Lévy processes and to highlight some associated challenges, in this subsection we supply an overview of the literature on the subject.
The problem of nonparametric inference for Lévy processes has a long history, going back to Rubin and Tucker (1959) and Basawa and Brockwell (1982). Revival of interest in it dates around the year 2003, with contributions Buchmann and Grübel (2003), Buchmann and Grübel (2004) and van Es et al. (2007), as well as numerous later publications; see also Ilhe et al. (2015) for a further extension. Very recent works Coca (2017) and Duval and Mariucci (2017) provide an extensive list of references.
In general, there are two major strands of mathematical statistics literature dealing with inference for Lévy processes, or more generally semimartingales. The first considers the so-called high frequency setup where asymptotic properties of the corresponding estimators are studied under the assumption that observations are made at an increasing frequency in time. In the second strand of the literature, times between successive observations are assumed to be fixed (the so-called low frequency setup) and the asymptotic analysis is done under the premise that the observational horizon tends to infinity.
The last decade witnessed a tremendous advance in the area of statistics for high frequency financial data, due to the development of new mathematical methods to analyse these data, as well as increasing availability of such data. We refer to the recent book Aït-Sahalia and Jacod (2014) for a comprehensive treatment of modern statistical methods for high frequency data. At the same time, progress was achieved also in statistical inference for Lévy-driven models based on low frequency data, see, e.g., Belomestny et al. (2015) for an overview and references. The latter situation is more challenging, as e.g. it becomes quite difficult to distinguish between small jumps of a Lévy process and the Brownian increments. This often leads to rather slow, logarithmic convergence rates for resulting estimators, see, e.g., Belomestny and Reiß (2006), Gugushvili (2009) and Gugushvili (2012). Hence accurate nonparametric inference for Lévy processes typically requires very large amounts of data, which may not always be available in practice. Fortunately, in many cases there is additional (prior) information about the structure of the parameters, which can be used to improve the estimation quality under limited data. To account for this prior information, the Bayesian estimation framework is quite appealing. Furthermore, the Bayesian approach provides automatic uncertainty quantification in parameter estimates through the spread of the posterior distribution of the parameters. Also, in some fields, such as e.g. climate and weather science, Bayesian approaches are thought to be default (see, e.g., Berliner et al. (1999)), and studying them would go together with common practices in those fields. On the other hand, there are also some formidable challenges in applying the nonparametric Bayesian methodology to inference in Lévy processes. Firstly, the underlying process is usually observed at discrete time instances, while Lévy models are formulated in continuous time. This gives rise to complications that are typical in inference for discretely observed continuous time stochastic processes. Secondly, Bayesian estimation in its simplest, pristine form requires knowledge of the likelihood of observations, and hence of marginal densities of the underlying Lévy process; these, however, are rarely available in closed form. Thirdly, devising valid MCMC algorithms in infinite-dimensional settings is a highly non-trivial task. Cf. recent works on nonparametric Bayesian inference in diffusion models, such as Beskos et al. (2008) and van der Meulen et al. (2014).
The literature on nonparametric Bayesian inference for Lévy processes is very recent and also rather scarce, the only available works being Nickl and Söhl (2017a), Gugushvili et al. (2015) and Gugushvili et al. (2018). These deal with a particular case of compound Poisson processes, concentrate exclusively on theoretical aspects (with the exception of the latter paper), and do not appear to admit an obvious extension to other classes of Lévy processes. In fact, compound Poisson processes are rather special among Lévy processes, and are of limited applicability in many practically relevant cases. Hence there is space for improvement. On the positive side, the practical results we obtained in this work demonstrate great potential of Bayesian methods for inference in Lévy processes. Our approach is aimed at developing an applicable statistical methodology, which we substantiate by theoretical results, and also test via challenging examples. At the same time, we admit there remain several unresolved theoretical and practical issues, such as derivation of posterior contraction rates or practical fine-tuning of the prior we use. However, upon careful reading this should come as no surprise given the sheer complexity of our undertaking, where several topics would have merited to be subjects of independent research projects. We view our work as the first substantial step made in the direction of studying inference problems for Lévy processes via nonparametric Bayesian methods. It is our hope that our contribution will generate additional interest in this statistically and mathematically fascinating topic.
1.2 Structure of the paper
The rest of the paper is organised as follows: in Section 2 we describe in detail the statistical problem we are dealing with and our nonparametric Bayesian approach to it. Posterior inference in our setting is performed through MCMC sampling, and Section 3 provides a detailed exposition of our sampling algorithm. In Section 4 we establish the fact that our approach is consistent in the frequentist sense: asymptotically, as the sample size , the posterior measure concentrates around the Lévy triplet under which the data used in the estimation procedure has been generated. In Section 5 we test the practical performance of our method via simulation on a challenging example. In Section 6 we further generalise our basic model from Section 2 and detail changes and extensions this requires in designing an MCMC sampler in comparison to the one from Section 3. This new sampler is tested in simulations in Section 7. In Section 8 we apply our methodology on an insurance dataset. Possible extensions of our inferential approach to more general Lévy models are discussed in Section 9. Finally, in Appendices A and B we state and prove some technical results used in the main body of the paper, while in Appendix C we provide some additional analyses to substantiate our modelling approach in Section 8.
2 Statistical problem and approach
In this section we introduce in detail the statistical problem we are dealing with and describe our approach to tackle it.
2.1 Statistical problem
Consider a univariate Lévy process with generating Lévy triplet , where is finite and
Hence has no Gaussian component and the law of is entirely determined by . By the Lévy-Khintchine formula, see Theorem 8.1 in Sato (1999)
, the characteristic functionof admits the unique representation of the type
We also assume that the Lévy measure admits the representation
where and are parameters to be estimated, while is a known or unknown parameter. It follows that is a pure jump process with non-decreasing sample paths, or put another way, a subordinator with zero drift, cf. Sections 2.6.1–2.6.2 in Kyprianou (2006). One may call this class of Lévy processes Gamma-type subordinators, because is a Gamma process when , but we prefer to simply refer to it as -subordinators.
Assume that the process is observed at discrete time instances so our observations are . Our aim is nonparametric Bayesian estimation for the parameter triple . This requires specification of the likelihood and the prior in our model, that are next combined via Bayes’ formula to form the posterior distribution. This latter encodes all the necessary inferential information within the Bayesian setup. By Theorem 27.7 in Sato (1999), marginal distributions of possess densities with respect to the Lebesgue measure. With denoting the density of an increment , the likelihood
is in general intractable, as the marginal densities of are not known in closed form, except some special cases. This complicates a computational approach to Bayesian inference. We will circumvent this obstacle by employing the concept of data augmentation, see Tanner and Wong (1987). Specifically, we will propose a suitable nonparametric prior distribution on the parameter triple , and derive a Metropolis-Hastings algorithm relying on data augmentation to sample from the posterior distribution. Details of our approach are given in the following subsections.
We first consider the problem where is known and fixed. All processes and their laws in this section are restricted to the time interval for a fixed . Note that for any two Lévy measures and given by (2.2) with parameters and respectively, provided and both functions and are Lipschitz in some neighbourhood of zero, we have
where is the Hellinger distance between two (infinite) measures. By assumption (2.1) and property (2.3), together with Theorem 33.1 in Sato (1999), it follows that the laws and of are equivalent. Furthermore, Theorem 33.2 in Sato (1999) implies that a.s.
We can also write the log-likelihood ratio as
where the jump measure is defined by
for any Borel subset of . We can view as the dominating measure for . From the inferential point of view the specific choice of the dominating measure is immaterial. A convenient choice of for the theoretical development in Section 4 is to actually take to be the ‘true’ Lévy measure with parameters and (recall that is fixed and assumed to be known).
2.3 Gamma processes
We temporarily specialise to the case of a Gamma process. A Gamma process is an example of a pure jump Lévy process with non-decreasing sample paths. Its Lévy triplet is given by where
see Example 8.10 in Sato (1999). Making the dependence on parameters explicit, we also refer to as a process. The distribution of is gamma with rate parameter and shape parameter so that
where denotes the gamma function.
2.4 Data augmentation and bridge sampling
By using the data augmentation technique, we can utilise existence of a closed-form likelihood for a continuously observed Lévy path, see Subsection 2.2, to define a Metropolis-Hastings algorithm to sample from the posterior given the discrete observations . This treats the unobserved path segments between two consecutive observation times as missing data and augments the state space of the algorithm to sample from the joint posterior of missing data and unknown parameters. Specifically, this requires the ability to sample from the conditional distribution of the missing data given the parameters and the observations.
Consider again the Lévy process with fixed parameters , , , and denote the corresponding law by . Conditional on the observations and and the parameters, by the independent increments property of a Lévy process, the process can be sampled on each time interval independently. Samples from the conditional distribution on these intervals connect the observations in the form of so-called bridges. It suffices to describe the construction for a single bridge from to . A Gamma process shares with the Wiener process a remarkable property that samples from the conditional distribution can be obtained through a simple transformation of the unconditional path, see Yor (2007). For the Wiener process conditional on for a number , this transformation takes the form
For the Gamma process, the corresponding transformation takes a multiplicative form: define for a path a map by
Then , where denotes the law of , defines a factorisation of the conditional distribution of under the law given . This result in combination with a Metropolis-Hastings step can be used to sample from the conditional distribution of a -subordinator given the observations and parameters.
Analogously, we denote by the conditional distribution of under the law given . Here and later we use a superscript star to denote the conditional distributions, suppress the dependence on in the notation and write for example for integration with respect to the conditional distribution. By conditioning,
where and are the densities of under and , respectively. Note that is the continuous-time likelihood, which is known in closed form. Hence is also known in closed form up to an unknown proportionality constant , and the ratio of Radon-Nikodym derivatives , with denoting a proposal in the MCMC algorithm, is given by formula (2.11) below. This allows us to use samples distributed according to , i.e. bridges, as proposals for the augmented segment that follows the intractable conditional distribution .
To define the prior, we consider a subclass of processes defined in (2.2), where the parameter in the Lévy measure has the following form. Fix a sequence
set for convenience and and define bins by
Given bins assume the function is piecewise linear, i.e.,
where and Together with the parameter determines the slope of the function on the bin while gives the intercept. The process with the law can be viewed as a Gamma process with rate parameter and shape parameter , subjected to local deviations in the behaviour of jumps of sizes falling in bins compared to what of a Gamma process. The parameters quantify the extent of these local deviations on the bin .
We equip with independent priors. Note that these priors on implicitly define a prior on the Lévy measure as well. The specific form of the prior is not crucial for many arguments that follow, but is convenient computationally. In fact, theoretical results in Section 4 can be derived for other series priors as well. However, the local linear structure in (2.7) (which also means that the prior could be rewritten as series prior with basis functions with compact support) is important to derive some simple update formulae below.
For a realisation from the implicit prior on as above in the present section, let us work out the integral
which enters the expression for the likelihood in Subsection 2.2. To that end remember the definition of the exponential integral, see, e.g., §15.09 in Jeffreys and Swirles (1999) for its basic properties. Then a change of the integration variable gives
Observe that Similar to the case of ,
Also here remark that For future reference in Subsection 2.6, note that for any
which follows from the formula for Frullani’s integral, see §12.16 in Jeffreys and Swirles (1999).
2.6 Likelihood expressions for parameter updates
The following expressions will be used in Section 3 to construct the Metropolis-Hastings algorithm to sample from the posterior of
Define random variables
that for each give the number of jumps of , whose sizes fall into the bin Consider two laws and , where the Lévy measure is given by (2.2) and (2.7), while is given by (2.2) with coefficients instead of the coefficients , . The two laws and are equivalent, since each is equivalent to . We have the following expression for the log-likelihood,
Finally, for the ratio of Radon-Nikodym derivatives with respect to the law of a Gamma process with the same parameter we have
for and with , where is defined analogously to using instead . Note that in this situation the righthand side is independent of the choice of the parameter of the Gamma process measure used as the dominating measure.
3 Sampling the posterior
Using the usual convention in Bayesian statistics, denote the prior density of the parametersby , and use a similar generic notation for the density of the corresponding (joint) proposal kernel evaluated in , e.g. for a random move from to . We first describe the Metropolis–Hastings algorithm to sample from the posterior in continuous time and next make a remark about the discretisation below.
Initialise the parameters , , , , , , with their starting values. Initialise the segments with bridges connecting observations and , , using (2.5).
Repeat the following steps:
Independently, for each :
Sample bridge proposals connecting observations and using (2.5).
Sample . If
set to on , otherwise keep on .
Independently of step (i), propose and let denote the corresponding Lévy measure. Sample . If
replace by , otherwise retain .
Note that Step (i)(b) is the accept-reject step based on (2.11). Note that while we formulate the
The Metropolis-Hastings algorithm described above assumes one can sample the various processes and their bridges in continuous time. In practice it is possible to simulate the relevant processes only on a discrete grid of time points, which, however, can be made arbitrarily fine. In general it is preferable to work with a finite-dimensional approximation of a valid MCMC algorithm with infinite-dimensional state space instead of just an MCMC algorithm targeting a finite-dimensional approximation of the (joint) posterior, because the latter approach might have a singularity (resulting e.g. in vanishing acceptance probabilities) with growing dimension; seeBeskos et al. (2008) for an extended perspective. We now outline how our original algorithm can be discretised. Consider a discrete time grid (and ) for , . Formula (2.5) remains valid also for discretised Gamma processes, and those are readily obtained by sampling from the distribution of their increments. On the other hand, in the likelihood expressions of Subsection 2.6 and in (3.12) we approximate the sum of jumps of the process with sizes in , , by the sum of the increments of falling in ,
4 Posterior consistency
In this section we study asymptotic frequentist properties of our nonparametric Bayesian procedure. The only comparable works for Lévy processes available in the literature are Gugushvili et al. (2015), Gugushvili et al. (2018) and Nickl and Söhl (2017a), but they deal with the class of compound Poisson processes, which is quite different from the class of -subordinators considered in this work. Arguments in favour of studying frequentist asymptotics for Bayesian procedures have been already given in the literature many times, and will not be repeated here; see, e.g., Wasserman (1998). Our main result in this section is that under suitable regularity conditions, with growing sample size, our nonparametric Bayesian approach consistently recovers the parameters of interest. Thereby it stands on a solid theoretical ground.
4.1 Main results
Recall the setup of Section 2, which is complemented as follows. In this section we assume that the process is observed at equidistant times , . Without loss of generality we assume that our observations are This assumption, which we did not require in earlier sections, implies that the increments of the process are independent and identically distributed. This way we can develop our arguments without the additional technical burden caused by non-i.i.d. increments. We denote the increments by , where and assume that under the true Lévy density . In general, will stand for the law of the increment under the Lévy density Furthermore, we introduce the law of under the true Lévy density The law of this path under the Lévy density will be denoted by For our asymptotic results, we will let the number of bins depend on the sample size , and write instead. The prior below will be defined on a special class of Lévy densities, . These are the densities that on the bins , , , , , have the form , with , with the special choice and . So, with the above notation,
Below we present our first condition, and we comment on it and give additional explanations after it, as well as a few further comments after Condition 2.
Let the function have a compact support on the interval where the boundary points are known, , and suppose is -Hölder continuous, (, ). Suppose also that with known boundary points . Finally, assume that the parameter is known and, without loss of generality, equal to .
The assumption of known requires some further comments. As we already remarked elsewhere, the parameter plays a role similar to the dispersion coefficient of a stochastic differential equation driven by a Wiener process. Derivation of nonparametric Bayesian asymptotics for the latter class of processes (all of which is a recent work) historically proceeded from the assumption of a known to the one where is unknown and has to be estimated; see van der Meulen and van Zanten (2013), Gugushvili and Spreij (2014) and Nickl and Söhl (2017b). In that sense the fact that at this stage we assume is known does not appear unexpected or unnatural. This assumption assists in derivation of useful bounds on the Kullback-Leibler and Hellinger distances between marginals of -subordinators under different Lévy triplets, which in general is the key to establishing consistency properties of nonparametric Bayesian procedures. We achieve this by reducing some of the intractable computations for these marginals to calculations involving laws of continuously observed -subordinators, for which we need precisely to assume that the parameter is known; otherwise the corresponding laws are singular, which would yield only trivial and useless bounds.
The coefficients are equipped with independent uniform priors on the known interval , . Likewise, the coefficients are independent uniform on the interval whereas is uniform on , .We assume that all priors are independent. Implicitly, this defines a prior on the class of Lévy densities , which are realisations from the prior.
The assumption in Condition 2 that various priors are uniform can be relaxed to the assumption that they are supported on compacts and have densities bounded away from zero there. In fact, other assumptions in Conditions 1 and 2 can be relaxed at the cost of extra technical arguments in the proofs, but we do not strive for full generality in this work: a clean, readable presentation of our results and conciseness in the proofs is our primary goal.
Theorem 4.1 is our first main result in this section. Said shortly, it implies that our Bayesian procedure is consistent in probability; this in turn implies the existence of consistent Bayesian point estimates, see, e.g., Ghosal et al. (2000), pp. 506–507. We use the notation for the posterior measure. Also, denotes the law of the sample under the true Lévy density and denotes the law of the infinite sample under the true Lévy density .
Before proceeding further, we recall the definition of the Kullback-Leibler divergenceand the discrepancy for two probability measures :
Here stands for the square of the natural logarithm.
We will treat the numerator and denominator separately. We start with the denominator. Define the set
where is a fixed number. Let be a restriction of the prior to the set normalised to have the total mass We can write
of -probability tending to as
for , where for two sequences and of positive real numbers the notation indicates that there exists a constant that is independent of such that . We also used the fact that For future use remember that can be made arbitrarily small by choosing small. This finishes bounding the term Now we turn to By Lemma A, on the sequence of events
of -probability tending to as we have
which proves the theorem.
The theorem has the following corollary that we will use in the proof of Theorem 4.1: a fixed can be replaced with a sufficiently slowly decaying
For every fixed there exists a sequence , possibly depending on such that
The result follows from Lemma on p. 181 in Pollard (2002).
The metric for , in which posterior convergence occurs in Theorem 4.1, is defined indirectly, in terms of the distance between the corresponding laws However, we will show that the theorem implies posterior consistency also in another and perhaps more natural metric for . Let denote weak convergence of finite Borel measures and be the Dirac measure at zero. The following proposition holds, as a consequence of Theorem 2 in Gnedenko (1939), see Appendix A for its proof. Note that in our setting the first component of the Lévy triplet is completely determined by the Lévy density, cf. (2.1). Define for Lévy triplets , finite Borel measures
where we assume and are on and and are finite. Then if and only if
The following is our second main theoretical result, in which the metric for posterior contraction is defined directly for the Lévy density (equivalently, Lévy measure
). As the Lévy density uniquely determines the corresponding Lévy measure, in the theorem below as well as in its proof we will somewhat abuse the notation by considering posterior probabilities of certain sets of Lévy measures.
Let be any distance that metrises weak convergence of finite (signed) Borel measures. Then, for any fixed
Since the Lévy measures we consider are infinite in any neighbourhood of zero, using some weight function to convert them into finite measures does not appear to be an unnatural idea, cf. Comte and Genon-Catalot (2011) for a similar approach.
Proof of Theorem 4.1. Note that Hellinger consistency in Theorem 4.1 also holds when we replace with there, since Hellinger consistency implies consistency in any metric metrising weak convergence. The proof of the theorem is by contradiction. Assume that the statement of the theorem fails, so that there exist such that
Take Then the elementary relation
for all large enough has -probability at least In formula,
Let now and suppose Then by the same argument as above, for the realisation the intersection of two sets
must have posterior mass at least for all large enough. Note that by this fact it also holds that
for all large enough. However, by Proposition 2 the intersection is an empty set for , so that
But then, as
This contradicts (4.18). The proof is completed.
5 Example: Sum of two Gamma processes
Insurance theory, operational loss models, or more generally risk processes furnish a natural field of application for subordinators. In particular, a risk model based on Gamma process was extensively studied from a probabilistic point of view in the widely cited work Dufresne et al. (1991)
. On the other hand, a given risk process may itself be a result of conflation of several heterogeneous factors, for instance due to population heterogeneity. We may assume that individual risk processes can be modelled through independent Gamma processes. This is conceptually similar to using convolutions of gamma distributions in, e.g., storage models; seeMathai (1982). The cumulative risk process is again a Lévy process, though not necessarily gamma, as sums of independent Gamma processes are not necessarily Gamma. However, such sums can be closely approximated through -subordinators, as we will now demonstrate. It is enough to consider the particular case of a sum of two independent Gamma processes, the general case being only notationally more complex. Thus, let and be two independent Gamma processes with parameters and Let the process be their sum, Its Lévy density is given by
The process can be viewed as a mixture of phenomena happening at different time scales (slow and fast). For , the behaviour of is determined by and . On the hand, consider the equation
where will be chosen later on. Solving for we get