As an extension of the self-exciting Hawkes process, the multivariate Hawkes process has been widely used in the literature to model the counting processes of different types of random events because of its ability to capture mutual excitation. In finance, the multivariate Hawkes process is used to examine the interaction between market returns and investor sentiment  and to predict the frequency of cyber-attacks . In medical science, it is used to fit clinical events in patient treatment trajectories . In population dynamics, it is adopted to model the lapse risk in life insurance . In social media, retweet cascades  and the popularity evolution of mobile apps  can be well predicted using this process.
There is some previous work on the simulation of multivariate Hawkes processes. The early works mostly rely on intensity-based thinning methods such as  and . Improved methods are proposed in  and  that sample the inter-arrival times directly without generating intensity paths. However, these works do not focus on the stationary distribution of the Hawkes process and cannot be directly applied to generate unbiased samples from its steady state. To the best of our knowledge, ours is the first perfect sampling algorithm for the multivariate Hawkes process, i.e., it samples from the exact stationary distribution without approximation. In this paper, we extend the perfect sampling algorithm for the univariate Hawkes process in  to multivariate Hawkes processes with mutual excitation. The key component of our algorithm is to represent the Hawkes process as a Poisson cluster process in which each cluster follows a multivariate branching process. In addition, we derive an explicit expression for the algorithm complexity as a function of the model and algorithm parameters. In particular, we show that the complexity function is convex in the algorithm parameters, and hence there is a unique optimal choice of algorithm parameters that minimizes the algorithm complexity; this choice can be computed numerically using standard convex optimization solvers.
The rest of the paper is organized as follows. In Section 2, we introduce the cluster representation of multivariate Hawkes processes and our assumptions. In Section 3, we introduce the perfect sampling algorithm and derive the complexity function. In Section 4, we implement the perfect sampling algorithm and test the performance of the simulation algorithm.
2 Model and Assumptions
2.1 Hawkes Process
A $d$-variate Hawkes process is a $d$-dimensional counting process $N(t)=(N_1(t),\dots,N_d(t))$ that satisfies
\[
\mathbb{P}\left(N_j(t+\delta)-N_j(t)=1\,\middle|\,\mathcal{F}_t\right)=\lambda_j(t)\delta+o(\delta)
\]
as $\delta\downarrow 0$, where $\mathcal{F}_t$ is the associated filtration and $\lambda_j(t)$ is called the conditional intensity such that
\[
\lambda_j(t)=\mu_j+\sum_{i=1}^{d}\sum_{k:\,t_{i,k}<t}h_{ij}(t-t_{i,k}), \tag{1}
\]
where for each $i$, $\{t_{i,k}\}$ are the arrival times in direction $i$, the constant $\mu_j\geq 0$ is called the background intensity in direction $j$, and, for all $i,j$, the function $h_{ij}:[0,\infty)\to[0,\infty)$ is integrable and is called the excitation function.
From the conditional intensity function (1), we see that an arrival in any direction $i$ will increase the future intensity functions in all directions $j$ through the non-negative excitation function $h_{ij}$. For example, the exponential excitation function $h_{ij}(t)=\alpha_{ij}e^{-\beta_{ij}t}$ with $\alpha_{ij}\geq 0$ and $\beta_{ij}>0$ is widely used in practice. Therefore, the arrivals of a multivariate Hawkes process are mutually exciting. According to , for the linear Hawkes process, when the excitation functions are non-negative, the counting process can be represented as a sum of independent processes, or clusters. We now introduce another equivalent definition (or construction) of the Hawkes process that represents it as clusters of branching processes.
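As a concrete illustration, the conditional intensity (1) with the exponential kernel can be evaluated directly from past arrivals. The sketch below is univariate for brevity, and all names and parameter values are ours for illustration, not the paper's:

```python
import numpy as np

def intensity(t, mu, alpha, beta, arrivals):
    """Conditional intensity of a univariate Hawkes process with
    exponential excitation h(s) = alpha * exp(-beta * s);
    `arrivals` holds the past event times."""
    past = np.asarray([s for s in arrivals if s < t])
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

# one past event at time 1.0 raises the intensity at time 2.0 above mu
lam = intensity(2.0, mu=0.5, alpha=0.8, beta=1.2, arrivals=[1.0])
```

Each past arrival contributes an exponentially decaying term, which is exactly the mutual-excitation effect described above.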
(Cluster Representation of Multivariate Hawkes) Consider a (possibly infinite) time horizon $T>0$ and define a sequence of events in each direction according to the following procedure:
For each direction $i$, a set of immigrant events arrives according to a Poisson process with rate $\mu_i$ on $[0,T)$.
For each immigrant event, define a cluster, which is the set of events brought by this immigrant. We index the events in the cluster and represent each event by a tuple whose coordinates record the direction at which the event occurs, the index of its parent event, and its arrival time. Following this representation, we denote the immigrant as the root event of its cluster.
Each cluster is generated following a branching process. In particular, upon the arrival of each event (say, in direction $i$), its direct offspring in direction $j$ will start to arrive according to an inhomogeneous Poisson process with rate function $h_{ij}(\cdot)$. (We will provide the complete procedure to generate a cluster in Algorithm 1 at the end of this section.)
Collect the sequence of events .
Then, in each direction, the counting process corresponding to the event sequence is equivalent to the Hawkes process defined by the conditional intensity function (1).
Now we introduce more random variables related to the Hawkes process that will be useful in our simulation algorithm and briefly analyze their distributional properties.
We define for each cluster its cluster length $L$ as the amount of time elapsed between the immigrant event and the last event in the cluster, and we call the arrival time of the immigrant the arrival time of the cluster.
Let $B_{ij}=\int_0^\infty h_{ij}(t)\,dt$. According to Definition 1, the multivariate Hawkes process has a self-similarity property. That is, upon its arrival, an immigrant in direction $i$ will bring new arrivals in direction $j$ according to a time-inhomogeneous Poisson process with rate $h_{ij}(t)$ for $t>0$, and each such arrival will bring further new arrivals following the same pattern. Here $B_{ij}$ is the expected number of next-generation events in direction $j$ brought by each event in direction $i$.
For any event in a cluster that is not an immigrant, with parent arrival time $t_p$ following our notation, we define the birth time of the event with arrival time $t$ as
\[
\tau = t - t_p,
\]
the time elapsed since the arrival of its parent. Following the properties of the inhomogeneous Poisson process, the birth times of the events in direction $j$ whose parent is in direction $i$ are, conditional on the number of events, independent and identically distributed random variables that follow the probability density function
\[
p_{ij}(t)=\frac{h_{ij}(t)}{B_{ij}}, \qquad t>0.
\]
Given the clear meaning of $B=(B_{ij})$ and $p=(p_{ij})$ in the cluster representation of the Hawkes process, in the rest of the paper we shall denote by $(\mu,B,p)$ the parameters that determine the distribution of a $d$-variate Hawkes process, where $B$ is a $d\times d$ matrix, $p$ is a matrix-valued density function, and $\mu$ is a vector in $\mathbb{R}_+^d$. Now we can provide the complete procedure for Step 3 of Definition 1 in Algorithm 1.
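As a sketch of how such a cluster generator might look under the common exponential kernel $h_{ij}(t)=\alpha_{ij}e^{-\beta_{ij}t}$ (so that the branching matrix is $B_{ij}=\alpha_{ij}/\beta_{ij}$ and the birth-time density is again exponential), the branching procedure can be coded as follows; all names and numerical values are ours for illustration, not the paper's Algorithm 1:

```python
import numpy as np

def sample_cluster(i0, alpha, beta, rng, max_events=10_000):
    """Generate one cluster of the multivariate Hawkes process as a
    branching process, assuming exponential excitation
    h_ij(t) = alpha[i,j] * exp(-beta[i,j] * t).
    Returns a list of (direction, parent_index, arrival_time) tuples;
    the immigrant is event 0 with arrival time 0 and parent -1."""
    d = alpha.shape[0]
    B = alpha / beta                 # B[i,j]: mean offspring in j per event in i
    events = [(i0, -1, 0.0)]
    queue = [0]                      # events whose offspring are still pending
    while queue and len(events) < max_events:
        k = queue.pop()
        i, _, t = events[k]
        for j in range(d):
            n = rng.poisson(B[i, j])             # direct offspring count in j
            # birth times are i.i.d. with density h_ij/B_ij, here Exp(beta[i,j])
            for w in rng.exponential(1.0 / beta[i, j], size=n):
                events.append((j, k, t + w))
                queue.append(len(events) - 1)
    return events

rng = np.random.default_rng(0)
alpha = np.array([[0.3, 0.1], [0.2, 0.4]])
beta = np.array([[1.0, 1.0], [1.0, 1.0]])
cluster = sample_cluster(0, alpha, beta, rng)
```

Since the branching matrix above has spectral radius below one, the recursion terminates with a finite cluster almost surely; the `max_events` guard is only a safety cap.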
2.2 Stationary Hawkes Process
For the multivariate Hawkes process with parameters $(\mu,B,p)$ to be stable in the long run, intuitively, each cluster should contain a finite number of events on average. According to , we shall assume that the following condition holds throughout the paper:
\[
\text{the spectral radius of the matrix } B \text{ is strictly less than } 1. \tag{4}
\]
Indeed, we can directly construct a stationary Hawkes process using the cluster representation as follows. First, in each direction $i$, we extend the homogeneous Poisson process of the immigrants, or equivalently, of the cluster arrivals, to the time interval $(-\infty,\infty)$. For this two-ended Poisson process, we index the sequence of immigrant arrival times so that they increase with the index, and we generate the clusters independently for each immigrant following the procedure in Definition 1. Then, the events that arrive after time 0 form a stationary sample path of a Hawkes process on $[0,\infty)$. In detail, for each direction $j$ and $t\geq 0$, let $N_j(t)$ count the events in direction $j$ that arrive in $(0,t]$; then $N(t)=(N_1(t),\dots,N_d(t))$ is a stationary multivariate Hawkes process.
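As a quick numerical companion to this construction, one can check the stability condition and the classical formula for the stationary mean intensity of a linear Hawkes process, $\bar\lambda=(I-B^\top)^{-1}\mu$ (a standard fact; the matrix values below are ours for illustration):

```python
import numpy as np

# branching matrix B[i,j] = integral of h_ij, and background rates mu
B = np.array([[0.3, 0.1],
              [0.2, 0.4]])
mu = np.array([0.5, 1.0])

# stability condition (4): spectral radius of B strictly below 1
spectral_radius = max(abs(np.linalg.eigvals(B)))
assert spectral_radius < 1.0

# stationary mean intensity solves lam = mu + B^T lam (standard fact
# for linear Hawkes processes with our row-to-column offspring convention)
lam_bar = np.linalg.solve(np.eye(2) - B.T, mu)
```

The fixed-point equation simply says that the long-run rate in each direction equals the background rate plus the rates induced by all parents exciting that direction.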
Our goal is to simulate the stationary multivariate Hawkes process efficiently. We close this section by summarizing our technical assumptions on the model parameters:
The stability condition (4) holds.
There exists $\theta_0>0$ such that $\int_0^\infty e^{\theta_0 t}h_{ij}(t)\,dt<\infty$ for all $i,j$.
For at least one $\theta_0$ such that Assumption 2 holds, we can sample from the corresponding exponentially tilted probability density functions for all $i,j$.
Assumption 1 is natural. Assumption 2 essentially requires that the mutual-excitation effects have short memory, i.e., the excitation functions are exponentially bounded. This assumption is satisfied in most of the application papers cited in the Introduction. Assumption 3 guarantees that the algorithm can be implemented. Since the main focus of our algorithm is on the design of the importance sampling and the acceptance-rejection scheme, we simply assume an oracle is available to simulate from the tilted distribution. In real applications, users can resort to a variety of simulation methods to generate samples from the tilted distribution.
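For the exponential excitation function, the oracle of Assumption 3 is straightforward: tilting an Exp($\beta$) birth-time density by $e^{\theta t}$ yields an Exp($\beta-\theta$) density, valid whenever $\theta<\beta$. A minimal sketch (function and parameter names are ours):

```python
import numpy as np

def sample_tilted_birth_time(beta, theta, rng, size=1):
    """Sample from the exponentially tilted birth-time density
    p_theta(t) proportional to exp(theta * t) * p(t), assuming the
    untilted density p is Exp(beta), i.e. excitation
    h(t) = alpha * exp(-beta * t).  The tilted density is again
    exponential, with rate beta - theta, so it exists iff theta < beta."""
    if not theta < beta:
        raise ValueError("need theta < beta for the tilted density to exist")
    return rng.exponential(1.0 / (beta - theta), size=size)

rng = np.random.default_rng(1)
draws = sample_tilted_birth_time(beta=2.0, theta=0.5, rng=rng, size=10_000)
```

The constraint $\theta<\beta$ is exactly the exponential boundedness required by Assumption 2 in this special case.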
3 Perfect Sampling of Hawkes Process
By the construction in Section 2.2, the events of the stationary Hawkes process after time 0 fall into two groups: the first group corresponds to those events that come from clusters that arrived before time 0 and depart after time 0, while the second group is brought by immigrants arriving after time 0. Since the immigrants arrive according to homogeneous Poisson processes, the second group can be simulated directly following the procedure described in Definition 1. So the key step of the perfect sampling algorithm is to simulate the clusters that arrive before time 0 and depart after time 0. With a slight abuse of notation, we shall denote the set of such clusters by $\mathcal{A}=\cup_{i=1}^d\mathcal{A}_i$, where $\mathcal{A}_i$ is the set of clusters brought by immigrants in direction $i$ that arrive before time 0 and last past time 0.
For each $i$, let $\ell_i(t)$ be the probability that a cluster brought by an immigrant in direction $i$ lasts for more than $t$ units of time. Then, by the Poisson thinning theorem, the clusters in $\mathcal{A}_i$ arrive on the time horizon $(-\infty,0)$ according to an inhomogeneous Poisson process with intensity function $\mu_i\ell_i(-t)$ for $t<0$. Besides, a cluster in $\mathcal{A}_i$ arriving at time $t<0$ should follow the conditional distribution of the branching process described in Definition 1 on the event that its cluster length exceeds $-t$.
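Such inhomogeneous Poisson arrivals can be generated by standard Lewis–Shedler thinning against a constant dominating rate, since the survival probability is at most one and hence the intensity is bounded by the background rate. A generic sketch (the specific rate function below is a toy stand-in, not the paper's):

```python
import numpy as np

def thinning_inhomogeneous(rate_fn, rate_bound, T, rng):
    """Lewis-Shedler thinning: sample an inhomogeneous Poisson process
    with intensity rate_fn(t) <= rate_bound on (0, T].  Candidates come
    from a homogeneous Poisson process at the dominating rate and are
    kept with probability rate_fn(t) / rate_bound."""
    points, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate_bound)      # next candidate point
        if t > T:
            return np.array(points)
        if rng.uniform() * rate_bound <= rate_fn(t):  # accept w.p. rate_fn(t)/bound
            points.append(t)

rng = np.random.default_rng(2)
# toy decaying intensity, bounded above by 2.0 on (0, 5]
pts = thinning_inhomogeneous(lambda t: 2.0 * np.exp(-t), rate_bound=2.0, T=5.0, rng=rng)
```

In the setting above one would take the rate function to be the background rate times the cluster-survival probability, evaluated at the time before 0.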
Following Definition 1, the clusters are independent and can be simulated separately. In our new algorithm, we extend Algorithm 1 proposed in  to sample, for each cluster, from the conditional distribution of the cluster given its length, without evaluating the survival probability explicitly. We also improve the efficiency of simulating this conditional distribution by importance sampling and acceptance-rejection methods. In particular, we define the total birth time as
where each summand is the birth time of an event in the cluster as defined in (3), with birth time zero for the immigrant event. We design a change of measure based on exponential tilting with respect to the total birth time, which yields what we call the importance distribution. To do this, let us characterize this importance distribution by analyzing the cumulant generating function (c.g.f.) of the total birth time.
For all $i$, consider a cluster in direction $i$ with parameters $(B,p)$. Let $L$ and $S$ be its cluster length and total birth time, respectively. Then, the following statements are true:
$L\leq S$ for all realizations of the cluster, and there exists $\theta_0>0$ such that the c.g.f. of $S$ is finite for all $\theta\leq\theta_0$.
Let $\mathbb{P}$ be the probability distribution of a cluster, and let $\tilde{\mathbb{P}}_\theta$ be the importance distribution of the cluster under exponential tilting by parameter $\theta$ with respect to the total birth time $S$, i.e.
\[
\frac{d\tilde{\mathbb{P}}_\theta}{d\mathbb{P}}=\frac{e^{\theta S}}{\mathbb{E}\left[e^{\theta S}\right]}.
\]
Then, sampling a cluster from the importance distribution $\tilde{\mathbb{P}}_\theta$ is equivalent to sampling a cluster in direction $i$ with tilted parameters in which the offspring means and the birth-time densities are tilted by $\theta$.
Proof of Proposition 1.
We prove the three statements of Proposition 1 one by one.
(1). To see $L\leq S$, it is intuitive to represent the cluster as a tree, ignoring the direction of each event. Let the immigrant event be the root node and link each event to its parent event by an edge whose length is the event's birth time. Then $S$ is equal to the total length of all edges in the tree, while $L$ is equal to the length of the longest path from the root node to a leaf node. Therefore, $L\leq S$.
(2). By definition, for each , , where follows the probability density for some . Let
Then we have
As a consequence, we can conclude that this c.g.f. is finite in a neighbourhood of the origin, and so is that of the total birth time. As this is true for all directions, we can find $\theta_0>0$ such that the c.g.f.'s of the total birth times in all directions are finite for all $\theta\leq\theta_0$.
Now, for fixed , let us characterize . For a cluster in direction , we denote by the number of direct offspring in direction brought by the immigrant in direction . Then, is a Poisson random variable with mean . Based on the self-similarity property of the branching process, a direct child event in direction of the immigrant will bring a sub-cluster following the same distribution as a cluster , and thus the total birth time of this sub-cluster follows the same distribution as . For all , let be i.i.d. copies of the birth time following the probability density and be i.i.d. copies of the total birth time , for . Then, we have
(3). Under , the c.g.f. of becomes . We can compute,
where is the c.g.f. corresponding to the probability density function . The above calculation shows that is exactly the c.g.f. of the total birth time of a cluster with parameter . Therefore, simulating the exponentially tilted distribution of with parameter is equivalent to generating a cluster such that the number of direct child events in direction of an event in direction is Poisson with mean , and their birth times follow the density function .
Given Proposition 1, we know how to simulate the importance distribution of . Now we provide our perfect sampling algorithm as follows.
Algorithm 2 contains two importance-sampling steps. When simulating the immigrants, i.e., the arrivals of clusters, it simulates an inhomogeneous Poisson process with a larger intensity function. Now we show that the output of Algorithm 2 follows exactly the desired distribution, and we provide an explicit expression for the algorithm complexity as a function of the model and algorithm parameters.
The list of clusters generated by Algorithm 2 exactly follows the distribution of . In particular, for each direction ,
The arrival times of the clusters follow an inhomogeneous Poisson process with intensity  for .
For each cluster in the list, given its arrival time , it follows the conditional distribution of a cluster given that the cluster length .
Besides, the expected total number of random variables generated by Algorithm 2 before termination is a convex function of the algorithm parameters, with the following explicit expression.
where is the -th row sum of the matrix and with for all .
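Since the complexity function is convex in the algorithm parameters, the optimal tilting parameter can be located with any standard convex solver. The sketch below minimizes a hypothetical one-dimensional stand-in objective with the same qualitative shape (finite inside a feasible interval, blowing up at its ends), not the paper's expression (7):

```python
def complexity(theta, beta=2.0):
    """Hypothetical stand-in for the expected-work expression (7):
    convex on (0, beta), exploding at both ends of the feasible interval."""
    if theta <= 0.0 or theta >= beta:
        return float("inf")
    return 1.0 / theta + 1.0 / (beta - theta)

def minimize_convex(f, lo, hi, tol=1e-9):
    """Ternary search: locates the minimizer of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2       # minimizer lies in [lo, m2]
        else:
            lo = m1       # minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

theta_star = minimize_convex(complexity, 1e-6, 2.0 - 1e-6)
```

In practice one would replace the stand-in objective with the paper's complexity function and, for multi-dimensional tilting parameters, hand it to an off-the-shelf convex optimization solver as suggested in the Introduction.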
Proof of Theorem 1. To prove Statement (1) for each direction , by the Poisson thinning theorem, it suffices to show that, for each , the acceptance probability of the cluster equals
According to Proposition 1, the importance distribution and the target distribution satisfy
Therefore, the probability for cluster to be accepted in Step 3 is
Thus, we obtain Statement (1).
From the above calculation, we can also see that, given and , and any event , the joint probability
Therefore, the accepted sample of indeed follows the conditional distribution of given , and we obtain Statement (2).
To check (7), we first note that the expected total number of random variables generated by Algorithm 2 can be expressed as , where is the expected total number of clusters in direction , and is the average number of events in each cluster. Following Algorithm 2, the number of those clusters in direction is a Poisson random variable with mean
Now we compute the average number of events in one cluster brought by an immigrant in direction . Note that in Algorithm 2, we use the importance distribution with parameters to generate the clusters in direction . Let be the expected number of events in direction brought by an immigrant in direction under this set of parameters. Then, by the self-similarity of the branching process, we have
with . As , the moment generating functions of the clusters under the importance distribution are finite in a neighbourhood of 0, and the corresponding branching processes are all stable. Then the determinant is nonzero, so the inverse matrix is well defined. Therefore,
As a consequence, the expected total number of random variables is
Finally, it remains to prove that the complexity function is convex. It suffices to show that, for each , the summand is convex. In the rest of the proof, for convenience, we write , and denote the product of functions as . Since is the eigenvalue of the matrix , we can write,
For any , let . Then, we have