# Perfect Sampling of Multivariate Hawkes Process

As an extension of the self-exciting Hawkes process, the multivariate Hawkes process models counting processes of different types of random events with mutual excitation. In this paper, we present a perfect sampling algorithm that can generate i.i.d. stationary sample paths of the multivariate Hawkes process without any transient bias. In addition, we derive an explicit expression for the algorithm's complexity in terms of the model and algorithm parameters, and we provide numerical schemes to find the optimal parameter set that minimizes the complexity of the perfect sampling algorithm.


## 1 Introduction

As an extension of the self-exciting Hawkes process, the multivariate Hawkes process has been widely used in the literature to model counting processes of different types of random events because of its ability to capture mutual excitation. In finance, the multivariate Hawkes process is used to examine interactions between market returns and investor sentiment [15] and to predict cyber-attack frequency [2]. In medical science, it is used to fit clinical events in patient treatment trajectories [5]. In population dynamics, it is adopted to model lapse risk in life insurance [1]. In social media, retweet cascades [13] and the popularity evolution of mobile apps [12] can be well predicted using this process.

There are some previous works on the simulation of multivariate Hawkes processes. The early works mostly rely on intensity-based thinning methods such as [11] and [9]. Improved methods are proposed in [6] and [4] that can sample inter-arrival times directly without generating intensity paths. However, these works do not target the stationary distribution of the Hawkes process and cannot be directly applied to generate unbiased samples from its steady state. To the best of our knowledge, ours is the first perfect sampling algorithm for the multivariate Hawkes process, i.e. it samples from the exact stationary distribution without approximation. In this paper, we extend the perfect sampling algorithm for the univariate Hawkes process in [3] to multivariate Hawkes processes with mutual excitation. The key component of our algorithm is to represent the Hawkes process as a Poisson-cluster process in which each cluster follows a multivariate branching process. In addition, we derive an explicit expression for the algorithm complexity as a function of the model and algorithm parameters. In particular, we show that the complexity function is convex in the algorithm parameters, and hence there is a unique optimal choice of algorithm parameters that minimizes the algorithm complexity, which can be computed numerically using standard convex optimization solvers.

The rest of the paper is organized as follows. In Section 2, we introduce the cluster representation of multivariate Hawkes processes and our assumptions. In Section 3, we introduce the perfect sampling algorithm and derive the complexity function. In Section 4, we implement the perfect sampling algorithm and test the performance of the simulation algorithm.

## 2 Model and Assumptions

### 2.1 Hawkes Process

A $d$-variate Hawkes process is a $d$-dimensional counting process $N(t)=(N_1(t),\dots,N_d(t))$ that satisfies

$$P(N_i(t+\Delta t)-N_i(t)=m \mid \mathcal{F}(t))=\begin{cases}\lambda_i(t)\Delta t+o(\Delta t), & m=1,\\ o(\Delta t), & m>1,\\ 1-\lambda_i(t)\Delta t+o(\Delta t), & m=0,\end{cases}$$

as $\Delta t\to 0$, where $\mathcal{F}(t)$ is the associated filtration and $\lambda_i(t)$ is called the conditional intensity such that

$$\lambda_i(t)=\lambda_{0,i}+\sum_{j=1}^{d}\sum_{\{k:\,t_{jk}<t\}}h_{ji}(t-t_{jk}), \tag{1}$$

where for each $j$, $\{t_{jk}\}$ are the arrival times in direction $j$ and the constant $\lambda_{0,i}\geq 0$ is called the background intensity in direction $i$, and for all $1\leq i,j\leq d$, the function $h_{ij}(\cdot)\geq 0$ is integrable and called the excitation function; $h_{ij}$ describes the excitation that an event in direction $i$ exerts on direction $j$.

From the conditional intensity function (1), we see that an arrival in direction $i$ will increase the future intensity $\lambda_j$ in every direction $j$ through the non-negative excitation function $h_{ij}$. Exponential excitation functions of the form $h_{ij}(t)=\alpha_{ij}e^{-\beta_{ij}t}$ with $\alpha_{ij},\beta_{ij}>0$ are widely used in practice. Therefore, arrivals of a multivariate Hawkes process are mutually exciting. According to [7], for the linear Hawkes process with non-negative excitation functions, the counting process can be represented as a sum of independent processes, or clusters. We now introduce an equivalent definition (or construction) of the Hawkes process that represents it as clusters of branching processes.
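As a concrete illustration of (1) (a hypothetical two-variate instance with exponential kernels, not an example from the paper), the conditional intensity can be evaluated directly from past arrival times:

```python
import math

# hypothetical 2-variate example: h[j][i](s) = alpha[j][i] * exp(-beta[j][i] * s)
# is the excitation that an arrival in direction j exerts on direction i
lambda0 = [0.5, 0.3]
alpha = [[0.2, 0.1], [0.1, 0.2]]
beta = [[1.0, 1.0], [1.0, 1.0]]

def intensity(i, t, arrivals):
    """Conditional intensity lambda_i(t) of equation (1): the background
    rate plus the accumulated excitation of all past arrivals.
    arrivals[j] lists the past arrival times in direction j."""
    lam = lambda0[i]
    for j, times in enumerate(arrivals):
        for t_jk in times:
            if t_jk < t:
                lam += alpha[j][i] * math.exp(-beta[j][i] * (t - t_jk))
    return lam

arrivals = [[0.0], [0.5]]    # one past arrival in each direction
print(intensity(0, 1.0, arrivals))
```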

###### Definition 1.

(Cluster Representation of Multivariate Hawkes) Consider a time horizon $[0,T]$ with $T$ possibly infinite and define a sequence of events in each direction $j$, $1\leq j\leq d$, according to the following procedure:

1. For each direction $i$, a set of immigrant events arrives according to a Poisson process with rate $\lambda_{0,i}$ on $[0,T]$.

2. For each immigrant event $m$, define a cluster $C_{im}$, which is the set of events brought by immigrant $m$. We index the events in $C_{im}$ by $k$ and represent each event by a tuple $e_{im,k}=(k,d_{im,k},p_{im,k},t_{im,k})$, where $d_{im,k}$ is the direction in which the event occurs, $p_{im,k}$ is the index of its parent event, and $t_{im,k}$ is its arrival time. Following this representation, we denote the immigrant event as $e_{im,1}=(1,i,0,\tau_{im})$, where $\tau_{im}$ is its arrival time.

3. Each cluster is generated following a branching process. In particular, upon the arrival of each event (say, in direction $l$ at time $s$), its direct offspring in direction $j$ will start to arrive according to an inhomogeneous Poisson process with rate function $h_{lj}(t-s)$. (We will provide the complete procedure to generate a cluster in Algorithm 1 at the end of this section.)

4. Collect the sequence of events $\cup_{i=1}^{d}\cup_{m}C_{im}$.

Then, in each direction $j$, the counting process $N_j(t)$ corresponding to the event sequence is equivalent to the Hawkes process defined by the conditional intensity function (1).

Now we introduce more random variables related to the Hawkes process that will be useful in our simulation algorithm and briefly analyze their distributional properties.

• We define for each cluster $C_{im}$ its cluster length $L_{im}$ as the amount of time elapsed between the immigrant event and the last event in this cluster, i.e.

$$L_{im}\triangleq\max_{1\leq k\leq|C_{im}|}t_{im,k}-\tau_{im}. \tag{2}$$

We call $\tau_{im}$ the arrival time of the cluster $C_{im}$.

• Let $\bar h_{ij}\triangleq\int_0^{\infty}h_{ij}(t)\,dt$. According to Definition 1, the multivariate Hawkes process has a self-similarity property. That is, upon its arrival, an immigrant in direction $i$ will bring new arrivals in direction $j$ according to a time-inhomogeneous Poisson process with rate $h_{ij}(t-\tau_{im})$ for $t>\tau_{im}$, and each new arrival will bring other new arrivals following the same pattern. Consequently, $\bar h_{ij}$ is the expected number of next-generation events in direction $j$ brought by each event in direction $i$.

• For any event $e_{im,k}$ in $C_{im}$ that is not an immigrant, with parent index $p_{im,k}$ and parent arrival time $t_{im,p_{im,k}}$ following our notation, we define the birth time of event $e_{im,k}$ as

$$b_{im,k}\triangleq t_{im,k}-t_{im,p_{im,k}}. \tag{3}$$

Following the properties of the inhomogeneous Poisson process, the birth times of the events in direction $j$ whose parents are in direction $i$ are independent, identically distributed random variables that, conditional on the number of events, follow the probability density function $f_{ij}(t)=h_{ij}(t)/\bar h_{ij}$ for $t>0$.

• Given the clear meaning of $\bar h$ and $f$ in the cluster representation of the Hawkes process, in the rest of the paper we shall denote by $(\lambda_0,\bar h,f)$ the parameters that determine the distribution of a $d$-variate Hawkes process, where $\bar h$ is a $d\times d$ matrix, $f$ is a $d\times d$ matrix-valued function, and $\lambda_0$ is a vector in $\mathbb{R}_+^d$. We can now provide the complete procedure for Step 3 of Definition 1 in Algorithm 1.

###### Remark 1.

The iteration in Algorithm 1 will stop in finite time almost surely as long as the Hawkes process is stationary (i.e. Assumption 1 holds), and thus the corresponding branching process will become extinct in finite time.

### 2.2 Stationary Hawkes Process

For the multivariate Hawkes process with parameters $(\lambda_0,\bar h,f)$ to be stable in the long term, intuitively, each cluster should contain a finite number of events on average. Following [8], we shall assume that the following condition holds throughout the paper:

$$\max\{|\lambda|:\lambda\text{ is an eigenvalue of }\bar h\}<1. \tag{4}$$
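Condition (4) says that the spectral radius of $\bar h$ is below one. Since $\bar h$ is entrywise non-negative, its spectral radius can be approximated by power iteration; a stdlib-only sketch with hypothetical numbers:

```python
def spectral_radius(H, iters=500):
    """Approximate the spectral radius of an entrywise non-negative matrix H
    by power iteration, renormalizing with the max-norm at each step."""
    d = len(H)
    v = [1.0] * d
    r = 0.0
    for _ in range(iters):
        w = [sum(H[i][j] * v[j] for j in range(d)) for i in range(d)]
        r = max(w)
        if r == 0.0:
            return 0.0      # no excitation at all
        v = [x / r for x in w]
    return r

H = [[0.5, 0.2], [0.1, 0.4]]            # a hypothetical hbar matrix
rho = spectral_radius(H)
print(rho, rho < 1.0)                   # stability condition (4)
```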

Indeed, we can directly construct a stationary Hawkes process using the cluster representation as follows. First, in each direction $i$, we extend the homogeneous Poisson process of the immigrants, or equivalently, the cluster arrivals, to the time interval $(-\infty,\infty)$. For this two-ended Poisson process, we index the sequence of immigrant arrival times by $m\in\mathbb{Z}\setminus\{0\}$ such that $\cdots<\tau_{i,-1}<0\leq\tau_{i,1}<\cdots$ and generate the clusters independently for each immigrant following the procedure in Definition 1. Then, the events that arrive after time $0$ form a stationary sample path of a Hawkes process on $[0,\infty)$. In detail, for each $1\leq j\leq d$, let

$$N_j(t)\triangleq\Big|\bigcup_{i=1}^{d}\bigcup_{m=-\infty}^{\infty}\{(k,d_{im,k},p_{im,k},t_{im,k}):k=1,2,\dots,|C_{im}|,\ d_{im,k}=j,\ 0\leq t_{im,k}\leq t\}\Big|;$$

then $N(t)=(N_1(t),\dots,N_d(t))$ is a stationary multivariate Hawkes process.

Our goal is to simulate the stationary multivariate Hawkes process efficiently, and we close this section by summarizing our technical assumptions on the model parameters $(\lambda_0,\bar h,f)$:

###### Assumption 1.

The stability condition (4) holds.

###### Assumption 2.

There exists $\eta>0$ such that $\int_0^{\infty}e^{\eta t}h_{ij}(t)\,dt<\infty$ for all $1\leq i,j\leq d$.

###### Assumption 3.

For at least one $\eta>0$ such that Assumption 2 holds, we can sample from the exponentially tilted probability density function $f_{ij,\eta}(t)\propto e^{\eta t}f_{ij}(t)$ for all $1\leq i,j\leq d$.

###### Remark 2.

Assumption 1 is natural. Assumption 2 basically requires that the mutual-excitation effects have short memory, i.e. each excitation function is exponentially bounded. This assumption is satisfied in most of the application papers cited in the Introduction. Assumption 3 is made to guarantee that the algorithm can be implemented. Since the main focus of our algorithm is the design of the importance sampling and acceptance-rejection scheme, we simply assume an oracle is available to simulate from the tilted distribution. In real applications, users can resort to a variety of simulation methods to generate samples from the tilted distribution.
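For instance, when $f_{ij}$ is the exponential density arising from an exponential excitation function, the tilted density of Assumption 3 is again exponential and can be sampled in closed form. A hypothetical concrete case (not the paper's oracle):

```python
import random

def sample_tilted_exponential(beta, eta, rng):
    """Draw from f_eta(t) proportional to exp(eta*t) * beta*exp(-beta*t), t > 0.
    Tilting an Exp(beta) density yields an Exp(beta - eta) density, which
    requires eta < beta (Assumption 2 for this kernel)."""
    assert 0.0 <= eta < beta
    return rng.expovariate(beta - eta)

rng = random.Random(42)
beta, eta = 2.0, 0.5
draws = [sample_tilted_exponential(beta, eta, rng) for _ in range(200_000)]
print(sum(draws) / len(draws))          # close to 1 / (beta - eta)
```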

## 3 Perfect Sampling of Hawkes Process

Following [10] and [3], we can decompose the counting process of the stationary Hawkes process in each direction $j$ as

$$\begin{aligned}
N_j(t)&\triangleq\Big|\bigcup_{i=1}^{d}\bigcup_{m=-\infty}^{\infty}\{e_{im,k}=(k,d_{im,k},p_{im,k},t_{im,k}):k=1,2,\dots,|C_{im}|,\ d_{im,k}=j,\ 0\leq t_{im,k}\leq t\}\Big| \qquad (5)\\
&=\sum_{i=1}^{d}\Big|\bigcup_{m=-\infty}^{-1}\{e_{im,k}\in C_{im}:d_{im,k}=j,\ 0\leq t_{im,k}\leq t\}\Big|+\sum_{i=1}^{d}\Big|\bigcup_{m=1}^{\infty}\{e_{im,k}\in C_{im}:d_{im,k}=j,\ 0\leq t_{im,k}\leq t\}\Big|\\
&\triangleq N_{0,j}(t)+N_{1,j}(t),
\end{aligned}$$

where the events in $N_{0,j}(t)$ come from clusters that arrive before time $0$ and last beyond time $0$, while those in $N_{1,j}(t)$ are brought by immigrants arriving on $[0,\infty)$. Since the immigrants arrive on $[0,\infty)$ according to homogeneous Poisson processes, $N_{1,j}(t)$ can be simulated directly following the procedure described in Definition 1. So the key step of the perfect sampling algorithm is to simulate the clusters that arrive before time $0$ and last beyond time $0$. With a slight abuse of notation, we shall denote the set of such clusters by $\mathcal{C}_0$ and let $\mathcal{C}_0=\cup_{i=1}^{d}\mathcal{C}_{0,i}$, where $\mathcal{C}_{0,i}$ is the set of clusters brought by immigrants in direction $i$ arriving before time $0$ and lasting beyond time $0$.

For each $1\leq i\leq d$ and $t<0$, let $\gamma_i(t)\triangleq\lambda_{0,i}P(L_i>-t)$, where $P(L_i>s)$ is the probability that a cluster brought by an immigrant in direction $i$ lasts for more than $s$ units of time. Then, by the Poisson thinning theorem, the clusters in $\mathcal{C}_{0,i}$ arrive on the time horizon $(-\infty,0)$ according to an inhomogeneous Poisson process with intensity function $\gamma_i(t)$ for $t<0$. Besides, a cluster in $\mathcal{C}_{0,i}$ with arrival time $\tau_{im}$ should follow the conditional distribution of the branching process described in Definition 1 on the event that $L_{im}>-\tau_{im}$.

Following Definition 1, the clusters in $\mathcal{C}_0$ are independent and can be simulated separately. In our new algorithm, we extend Algorithm 1 proposed in [3] to sample, for each cluster, from the conditional distribution of $C_{im}$ given $L_{im}>-\tau_{im}$ without evaluating the function $\gamma_i(\cdot)$. Also, we shall improve the efficiency of simulating this conditional distribution by importance sampling and acceptance-rejection methods. In particular, we define the total birth time as

$$B_{im}\triangleq\sum_{k=2}^{|C_{im}|}b_{im,k},$$

where $b_{im,k}$ is the birth time of event $k$ in cluster $C_{im}$ as defined in (3), and $b_{im,1}=0$ for the immigrant event. We design a change of measure based on exponential tilting of the cluster distribution with respect to $B_{im}$, which yields the so-called importance distribution. To do this, let us characterize this importance distribution by analyzing the cumulant generating function (c.g.f.) of $B_{im}$.

###### Proposition 1.

For all $1\leq i\leq d$, consider a cluster $C_{im}$ in direction $i$ with parameters $(\lambda_0,\bar h,f)$. Let $L_{im}$ and $B_{im}$ be its cluster length and total birth time, respectively. Then, the following statements are true:

1. $L_{im}\leq B_{im}$ for all $m$.

2. Under Assumption 2, there exists $\bar\eta>0$ such that for any $\theta\leq\bar\eta$, the cumulant generating function (c.g.f.) of $B_{im}$ is well-defined, i.e. $\psi_{B_i}(\theta)\triangleq\log E[\exp(\theta B_{im})]<\infty$. Besides, for $\theta\leq\bar\eta$, the vector $(\psi_{B_1}(\theta),\dots,\psi_{B_d}(\theta))$ satisfies the system of equations

$$\psi_{B_i}(\theta)=\sum_{j=1}^{d}\bar h_{ij}\big(\exp(\psi_{f_{ij}}(\theta)+\psi_{B_j}(\theta))-1\big),\quad\forall 1\leq i\leq d, \tag{6}$$

with $\psi_{f_{ij}}(\cdot)$ being the c.g.f. of the birth time as defined in (3) for all $1\leq i,j\leq d$.

3. Let $P$ be the probability distribution of a cluster $C_{im}$. Let $Q$ be the importance distribution of the cluster under exponential tilting by parameter $\eta\leq\bar\eta$ with respect to the total birth time $B_{im}$, i.e.

$$dQ(C_{im})=\exp(\eta B_{im}-\psi_{B_i}(\eta))\cdot dP(C_{im}).$$

Then, sampling a cluster from the importance distribution $Q$ is equivalent to sampling a cluster in direction $i$ with parameters $(\tilde h(\eta),f_{\eta})$, where $\tilde h_{ij}(\eta)\triangleq\bar h_{ij}\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))$ and $f_{ij,\eta}(t)\triangleq e^{\eta t-\psi_{f_{ij}}(\eta)}f_{ij}(t)$.

Proof of Proposition 1.

We prove the three statements of Proposition 1 one by one.

(1). Given $C_{im}$, to see $L_{im}\leq B_{im}$, it is intuitive to represent the cluster as a tree, ignoring the direction of each event. Let the immigrant event be the root node and link each event to its parent event by an edge of length $b_{im,k}$. Then $B_{im}$ is equal to the total length of all edges in the tree, while $L_{im}$ is equal to the length of the longest path from the root node to a leaf node. Therefore, $L_{im}\leq B_{im}$.

(2). By definition, for each $k\geq 2$, $b_{im,k}$ follows the probability density $f_{lj}$ for some $1\leq l,j\leq d$. Let

$$\tilde b_k=\sum_{1\leq l,j\leq d}\tilde b_{lj,k},\quad\text{with }\tilde b_{lj,k}\overset{\text{i.i.d.}}{\sim}f_{lj}(t),\ \forall 1\leq l,j\leq d.$$

Then we have

$$E[\exp(\theta B_{im})]\leq E\Big[\exp\Big(\theta\sum_{k=2}^{|C_{im}|}\tilde b_k\Big)\Big].$$

Note that $\sum_{k=2}^{|C_{im}|}\tilde b_k$ is a sum of a random number of i.i.d. random variables. On the one hand, it is known in the literature [14] that the c.g.f. of $|C_{im}|$ is finite in a neighbourhood around $0$. On the other hand, following Assumption 2,

$$E[\exp(\theta\tilde b_k)]=\prod_{1\leq l,j\leq d}\int_0^{\infty}e^{\theta t}f_{lj}(t)\,dt<\infty,\quad\forall\theta\leq\eta.$$

As a consequence, we can conclude that the c.g.f. of $\sum_{k=2}^{|C_{im}|}\tilde b_k$ is finite in a neighbourhood around the origin, and so is that of $B_{im}$. As this is true for all $1\leq i\leq d$, we can find $\bar\eta>0$ such that the c.g.f.'s of $B_{im}$ for all $1\leq i\leq d$ are finite for all $\theta\leq\bar\eta$.

Now, for fixed $\theta\leq\bar\eta$, let us characterize $\psi_{B_i}(\theta)$. For a cluster in direction $i$, we denote by $K_{ij}$ the number of direct offspring in direction $j$ brought by the immigrant in direction $i$. Then, $K_{ij}$ is a Poisson random variable with mean $\bar h_{ij}$. Based on the self-similarity property of the branching process, a direct child event in direction $j$ of the immigrant will bring a sub-cluster following the same distribution as a cluster in direction $j$, and thus the total birth time of this sub-cluster follows the same distribution as $B_{jm}$. For all $1\leq j\leq d$, let $b_{ij}(k)$, $k=1,2,\dots$, be i.i.d. copies of the birth time following the probability density $f_{ij}$ and let $B_j(k)$, $k=1,2,\dots$, be i.i.d. copies of the total birth time $B_{jm}$. Then, we have

$$\begin{aligned}
\psi_{B_i}(\theta)&=\log(E[\exp(\theta B_{im})])=\log\Big(E\Big[\exp\Big(\theta\sum_{k=1}^{|C_{im}|}b_{im,k}\Big)\Big]\Big)\\
&=\log\Big(E\Big[E\Big[\exp\Big(\theta\sum_{j=1}^{d}\sum_{k=1}^{K_{ij}}(b_{ij}(k)+B_j(k))\Big)\,\Big|\,K_{i1},K_{i2},\dots,K_{id}\Big]\Big]\Big)\\
&=\log\Big(E\Big[\exp\Big(\sum_{j=1}^{d}K_{ij}(\psi_{f_{ij}}(\theta)+\psi_{B_j}(\theta))\Big)\Big]\Big)\\
&=\sum_{j=1}^{d}\bar h_{ij}\big[\exp(\psi_{f_{ij}}(\theta)+\psi_{B_j}(\theta))-1\big].
\end{aligned}$$

(3). Under $Q$, the c.g.f. of $B_{im}$ becomes $\psi_{B_i,\eta}(\theta)\triangleq\psi_{B_i}(\theta+\eta)-\psi_{B_i}(\eta)$. We can compute

$$\begin{aligned}
\psi_{B_i,\eta}(\theta)&=\sum_{j=1}^{d}\bar h_{ij}\big[\exp(\psi_{f_{ij}}(\theta+\eta)+\psi_{B_j}(\theta+\eta))-1\big]-\sum_{j=1}^{d}\bar h_{ij}\big[\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))-1\big]\\
&=\sum_{j=1}^{d}\bar h_{ij}\big[\exp(\psi_{f_{ij}}(\theta+\eta)+\psi_{B_j}(\theta+\eta))-\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))\big]\\
&=\sum_{j=1}^{d}\bar h_{ij}\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))\big[\exp(\psi_{f_{ij},\eta}(\theta)+\psi_{B_j,\eta}(\theta))-1\big],
\end{aligned}$$

with $\psi_{f_{ij},\eta}(\theta)\triangleq\psi_{f_{ij}}(\theta+\eta)-\psi_{f_{ij}}(\eta)$ being the c.g.f. corresponding to the tilted probability density function $f_{ij,\eta}$. The above calculation shows that $\psi_{B_i,\eta}$ equals exactly the c.g.f. of the total birth time of a cluster with parameters $(\tilde h(\eta),f_{\eta})$. Therefore, simulating the exponentially tilted distribution of $C_{im}$ with parameter $\eta$ is equivalent to generating a cluster such that the number of direct child events in direction $j$ of an event in direction $i$ is Poisson with mean $\tilde h_{ij}(\eta)=\bar h_{ij}\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))$, and their birth times follow the density function $f_{ij,\eta}$. ∎

Given Proposition 1, we know how to simulate the importance distribution of $C_{im}$. Now we provide our perfect sampling algorithm as follows.

Algorithm 2 actually contains two importance-sampling steps. When simulating the immigrants, i.e. the arrivals of clusters, it simulates an inhomogeneous Poisson process with a larger intensity function $\tilde\gamma_i(t)\triangleq\lambda_{0,i}\exp(\eta_i t+\psi_{B_i}(\eta_i))\geq\gamma_i(t)$ for $t<0$; when simulating the content of each proposed cluster, it samples from the tilted distribution $Q$ and then applies an acceptance-rejection step. Now we show that the output of Algorithm 2 follows exactly the distribution of $\mathcal{C}_0$, and provide an explicit expression of the algorithm complexity as a function of the model and algorithm parameters.
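The proposed cluster arrival times on $(-\infty,0)$ under the dominating intensity $\tilde\gamma_i$ can be drawn in closed form, since the density of each point is proportional to $e^{\eta_i t}$. A stdlib-only sketch with hypothetical parameter values:

```python
import math, random

def poisson(mean, rng):
    """Stdlib-only Poisson sample via unit-rate exponential gaps."""
    n, acc = 0, rng.expovariate(1.0)
    while acc < mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n

def sample_proposed_arrivals(lam0, eta, psi_B_eta, rng):
    """Sample all points of the inhomogeneous Poisson process on (-inf, 0)
    with intensity lam0 * exp(eta * t + psi_B_eta).
    Total mass: lam0 * exp(psi_B_eta) / eta; each point has density
    proportional to exp(eta * t), so inverse-CDF gives t = log(U) / eta."""
    mass = lam0 * math.exp(psi_B_eta) / eta
    n = poisson(mass, rng)
    return sorted(math.log(1.0 - rng.random()) / eta for _ in range(n))

rng = random.Random(0)
taus = sample_proposed_arrivals(lam0=1.0, eta=0.8, psi_B_eta=0.2, rng=rng)
print(len(taus), all(t <= 0.0 for t in taus))
```

Each proposed arrival would then be completed into a cluster drawn from $Q$ and accepted or rejected as in the proof of Theorem 1 below.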

###### Theorem 1.

The list of clusters generated by Algorithm 2 exactly follows the distribution of $\mathcal{C}_0$. In particular, for each direction $1\leq i\leq d$,

1. The arrival times of the clusters follow an inhomogeneous Poisson process with intensity $\gamma_i(t)=\lambda_{0,i}P(L_i>-t)$ for $t<0$.

2. For each cluster in the list, given its arrival time $\tau_{im}$, it follows the conditional distribution of a cluster given that the cluster length $L_{im}>-\tau_{im}$.

Besides, the expected total number of random variables generated by Algorithm 2 before termination is a convex function in $\eta=(\eta_1,\dots,\eta_d)$ with the explicit expression

$$X(\eta)=\sum_{i=1}^{d}\frac{\lambda_{0,i}\exp(\psi_{B_i}(\eta_i))}{\eta_i}\big(1+\tilde S_i(\eta_i)\big), \tag{7}$$

where $\tilde S_i(\eta_i)$ is the $i$-th row sum of the matrix $(I-\tilde h(\eta_i))^{-1}$ and $\tilde h(\eta)=(\tilde h_{ij}(\eta))$ with $\tilde h_{ij}(\eta)=\bar h_{ij}\exp(\psi_{f_{ij}}(\eta)+\psi_{B_j}(\eta))$ for all $1\leq i,j\leq d$.
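To illustrate how (7) can be minimized, the following stdlib-only sketch evaluates $X(\eta)$ for a hypothetical univariate ($d=1$) model with exponential birth times and minimizes it by ternary search, which is justified by the convexity of $X$ (all parameter values are illustrative, not from the paper's experiments):

```python
import math

lam0, hbar, beta = 1.0, 0.5, 2.0    # hypothetical univariate (d = 1) instance

def psi_f(th):                       # c.g.f. of an Exp(beta) birth time, th < beta
    return math.log(beta / (beta - th))

def psi_B(th, iters=2000):           # scalar version of the fixed-point system (6)
    psi = 0.0
    for _ in range(iters):
        psi = hbar * (math.exp(psi_f(th) + psi) - 1.0)
    return psi

def X(eta):                          # complexity (7) for d = 1
    p = psi_B(eta)
    h_tilde = hbar * math.exp(psi_f(eta) + p)    # tilted branching ratio
    S_tilde = 1.0 / (1.0 - h_tilde)              # "row sum" of (I - h~)^{-1}
    return lam0 * math.exp(p) / eta * (1.0 + S_tilde)

# X is convex in eta, so ternary search over a bracket finds the minimizer
lo, hi = 0.01, 0.30
for _ in range(100):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if X(m1) < X(m2):
        hi = m2
    else:
        lo = m1
eta_star = (lo + hi) / 2
print(eta_star, X(eta_star))
```

In a $d$-variate model the same one-dimensional search can be run coordinate-wise, or replaced by any standard convex optimization routine.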

Proof of Theorem 1. To prove Statement (1) for each direction $i$, by the Poisson thinning theorem, it suffices to show that, for each $m$, the acceptance probability of the cluster $C_{im}$ equals

$$\frac{\gamma_i(\tau_{im})}{\tilde\gamma_i(\tau_{im})}=P(L_{im}>-\tau_{im})\exp(-\eta\tau_{im})/\exp(\psi_{B_i}(\eta)).$$

According to Proposition 1, the importance distribution $Q$ and the target distribution $P$ satisfy

$$dQ(B_{im}=x)=dP(B_{im}=x)\cdot\frac{\exp(\eta x)}{\exp(\psi_{B_i}(\eta))}.$$

Therefore, the probability for cluster $C_{im}$ to be accepted in Step 3 is

$$\begin{aligned}
E_Q\big[1\big(L_{im}>-\tau_{im}\text{ and }U_m\leq\exp(-\eta(B_{im}+\tau_{im}))\big)\big]&=\int 1(L_{im}>-\tau_{im})\exp(-\eta(B_{im}+\tau_{im}))\,dQ\\
&=\int 1(L_{im}>-\tau_{im})\exp(-\eta(B_{im}+\tau_{im}))\cdot\frac{\exp(\eta B_{im})}{\exp(\psi_{B_i}(\eta))}\,dP\\
&=\int\frac{\exp(-\eta\tau_{im})}{\exp(\psi_{B_i}(\eta))}1(L_{im}>-\tau_{im})\,dP\\
&=P(L_{im}>-\tau_{im})\exp(-\eta\tau_{im})/\exp(\psi_{B_i}(\eta)),
\end{aligned}$$

where $U_m$ is the uniform random variable used in the acceptance step.

Therefore, we obtain Statement (1).

From the above calculation, we can also see that, given the arrival time $\tau_{im}$ and any measurable set $A$ of clusters, the joint probability satisfies

$$P(C_{im}\in A,\ C_{im}\text{ is accepted})=\int\frac{\exp(-\eta\tau_{im})}{\exp(\psi_{B_i}(\eta))}1(C_{im}\in A,\ L_{im}>-\tau_{im})\,dP\propto P(L_{im}>-\tau_{im},\ C_{im}\in A).$$

Therefore, the accepted sample of $C_{im}$ indeed follows the conditional distribution of $C_{im}$ given $L_{im}>-\tau_{im}$, and we obtain Statement (2).

To check (7), we first note that the expected total number of random variables generated by Algorithm 2 can be expressed as $\sum_{i=1}^{d}M_iS_i$, where $M_i$ is the expected total number of proposed clusters in direction $i$ and $S_i$ is the average number of random variables generated per cluster. Following Algorithm 2, the number of proposed clusters in direction $i$ is a Poisson random variable with mean

$$M_i=\int_0^{\infty}\tilde\gamma_i(-t)\,dt=\frac{\lambda_{0,i}\exp(\psi_{B_i}(\eta_i))}{\eta_i}.$$

Now we compute the average number of events in a cluster brought by an immigrant in direction $i$. Note that in Algorithm 2, we use the importance distribution with parameters $(\tilde h(\eta_i),f_{\eta_i})$ to generate the clusters in direction $i$. Let $S_{lj}$ be the expected number of events in direction $j$ brought by an event in direction $l$ under this set of parameters. Then, by the self-similarity of the branching process, we have

$$S_{lj}=1(l=j)+\sum_{k=1}^{d}\tilde h_{lk}(\eta_i)S_{kj},\quad\forall 1\leq l,j\leq d, \tag{8}$$

with $\tilde h_{lk}(\eta_i)=\bar h_{lk}\exp(\psi_{f_{lk}}(\eta_i)+\psi_{B_k}(\eta_i))$. As $\eta_i\leq\bar\eta$, the moment generating function of the clusters under the importance distribution is finite in a neighbourhood around $0$ and the corresponding branching processes are all stable. Then the determinant $\det(I-\tilde h(\eta_i))\neq 0$, so the inverse matrix $(I-\tilde h(\eta_i))^{-1}$ is well defined. Therefore, the expected number of events in a cluster in direction $i$ is

$$\tilde S_i(\eta_i)\triangleq\sum_{j=1}^{d}\tilde S_{ij}(\eta_i),\ \text{the }i\text{-th row sum of the matrix }(I-\tilde h(\eta_i))^{-1},$$

and, counting one additional random variable per proposed cluster, $S_i=1+\tilde S_i(\eta_i)$.

As a consequence, the expected total number of random variables is

$$\sum_{i=1}^{d}M_iS_i=\sum_{i=1}^{d}\frac{\lambda_{0,i}\exp(\psi_{B_i}(\eta_i))}{\eta_i}\big(1+\tilde S_i(\eta_i)\big).$$

Finally, what remains to be proved is that the function $X(\eta)$ is convex. It suffices to show that, for each $1\leq i\leq d$, the summand $x(\eta_i)\triangleq\lambda_{0,i}\exp(\psi_{B_i}(\eta_i))(1+\tilde S_i(\eta_i))/\eta_i$ is convex. In the rest of the proof, for convenience, we write $g(\eta_i)\triangleq\lambda_{0,i}\exp(\psi_{B_i}(\eta_i))/\eta_i$ and denote the pointwise product of functions by $(g*\tilde h_{ij})(\eta_i)\triangleq g(\eta_i)\tilde h_{ij}(\eta_i)$. As the eigenvalues of the matrix $\tilde h(\eta_i)$ are all less than $1$ in modulus, we can expand $(I-\tilde h(\eta_i))^{-1}=\sum_{m=0}^{\infty}\tilde h(\eta_i)^m$ and write

$$x(\eta_i)=g(\eta_i)\Big(1+\sum_{j=1}^{d}\tilde S_{ij}(\eta_i)\Big)=2g(\eta_i)+\sum_{m=1}^{\infty}\sum_{k_1=1}^{d}\sum_{k_2=1}^{d}\cdots\sum_{k_m=1}^{d}\big(g*\tilde h_{ik_1}*\cdots*\tilde h_{k_{m-1}k_m}\big)(\eta_i).$$

For any $m\geq 1$ and $1\leq k_1,\dots,k_m\leq d$, consider one term of this sum. Then, we have

$$\frac{\partial}{\partial\eta_i}\big(g*\tilde h_{ik_1}*\cdots*\tilde h_{k_{m-1}k_m}\big)(\eta_i)=