1 Introduction
Motivation.
Cascading chains of events often arise in nature and society: the economy has witnessed that financial meltdowns are often epidemic. For example, the Asian financial crisis swept across Thailand and quickly engulfed South Africa, Eastern Europe and even Brazil. Similarly, criminological research (Bernasco and Nieuwbeerta, 2005) has shown that crime can spread through local environments very rapidly, where burglars repeatedly attack nearby targets because local susceptibilities are well known to thieves. As another example, in genetic analysis, Reynaud-Bouret and Schbath (2010) studied the likelihood of occurrences of a particular event along the DNA sequence, where 'an event' could be any biological signal occurring along the genome; such signals tend to cluster together.
The defining characteristic of these examples is that the occurrence of one event often triggers a series of similar events. The Hawkes process, also known as the self-exciting process, is an extension of the Poisson process that aims to explain such excitatory interactions (Hawkes, 1971). What makes the term self-excitation worthy of its name is typically not the occurrence of the initial event, but the intensification of further events. We seek to characterize this amplification magnitude, which we call the contagion parameters, the levels of self-excitation, or simply the levels of excitation.
The use of Hawkes processes is not an attempt to describe all features of self-excitation in their correct proportions. Probabilistic modeling inevitably exaggerates some aspects while disregarding others, and an accurate model is one that captures the significant aspects and abandons the less important details. Thus, there are often two fairly contradictory streams in modeling: on the one hand, the model ought to mimic excitatory relationships in a real-world application, which pulls toward specifying wide families of processes; on the other hand, the model should be manageable and tractable, which pulls in the direction of identifying simpler processes in which inference and parameter estimation are feasible.
Adopting the latter view of establishing simpler processes, we present a version of self-exciting processes that permits the levels of excitation to be modulated by a stochastic differential equation (SDE). SDEs are natural tools for describing the rate of change of the excitation levels. Put differently, we attach some indeterminacy to these quantities by modeling them as random values over time, rather than as constants as in the classical Hawkes setting (Hawkes, 1971; Ozaki, 1979). Our formulation implies that the contagion parameters are random processes, thus inheriting tractable covariance structures, in contrast to the setup initiated by Brémaud and Massoulié (2002) and Dassios and Zhao (2011), where the contagion levels are independent and identically distributed (iid) random variables.
Contributions.
We present a model that generalizes the classical Hawkes process together with new insights on inference. Our noteworthy contributions are as follows: (1) We propose a fully Bayesian framework to model excitatory relationships in which the contagion parameters are stochastic processes satisfying an SDE. This new feature enables the control of the varying contagion levels through periods of excitation. (2) With $N$ denoting the number of events, we design a sampling procedure that scales with $O(N)$ complexity, compared to a naïve implementation of Ogata's modified thinning algorithm (Ogata, 1981), which needs $O(N^2)$ steps. (3) A hybrid of MCMC algorithms that provides significant flexibility for parameter estimation in our self-exciting model is presented. In addition, we describe how to construct two SDEs over periods of unequal lengths and introduce general procedures for inference. (4) We conclude by making explicit deductive connections to two related areas in machine learning: (i) the 'E-step' of the expectation–maximization (EM) algorithm for Hawkes processes
(Veen and Schoenberg, 2008; Ogata, 1981) and (ii) modeling the volatility clustering phenomenon.
2 Our Model: Stochastic Hawkes
2.1 Review of Poisson and Classical Hawkes Processes
This section recapitulates some pieces of counting process theory needed in what follows. The Poisson process is frequently used as a model for counting events occurring one at a time. Formally, the Poisson process with constant intensity $\lambda > 0$ is a process $N = (N_t)_{t \ge 0}$ taking values in $\{0, 1, 2, \dots\}$ such that: (a) if $s \le t$ then $N_s \le N_t$; (b) $\mathbb{P}(N_{t+h} - N_t = m \mid N_t)$ takes the values $\lambda h + o(h)$ if $m = 1$, $o(h)$ if $m > 1$, and $1 - \lambda h + o(h)$ if $m = 0$, where $o(h)$ denotes any function that satisfies $o(h)/h \to 0$ as $h \to 0$. In addition, if $s < t$, the number of events $N_t - N_s$ in the interval $(s, t]$ is independent of the times of events during $[0, s]$. We speak of $N_t$ as the number of 'arrivals' or 'events' of the process by time $t$. However, events usually do not arrive in evenly spaced intervals but naturally arrive clustered in time. The classical Hawkes process aims at explaining such phenomena. It is a point process whose intensity depends on the path traced by the point process over time. Precisely, the point process is determined by $\lambda(t)$ through the following relations: $\mathbb{P}(N_{t+h} - N_t = m \mid \mathcal{F}_t)$ takes the values $\lambda(t) h + o(h)$ if $m = 1$, $o(h)$ if $m > 1$, and $1 - \lambda(t) h + o(h)$ if $m = 0$, where $\lambda(t) = \lambda_0 + \alpha \sum_{i:\, T_i < t} e^{-\delta (t - T_i)}$ for positive constants $\lambda_0$, $\alpha$ and $\delta$.
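For concreteness, the homogeneous Poisson process can be simulated directly through its iid exponential inter-arrival times. The following minimal Python sketch is ours, with an arbitrary rate and horizon:

```python
import random

def simulate_poisson(lam, horizon, rng):
    """Homogeneous Poisson process on [0, horizon]: iid Exponential(lam) inter-arrival times."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)   # Exponential(rate=lam) gap to the next event
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(0)
times = simulate_poisson(lam=2.0, horizon=100.0, rng=rng)
# On average we expect lam * horizon = 200 events.
```

The clustered arrivals of the Hawkes process are obtained by replacing the constant rate with the path-dependent intensity described above.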
2.2 Proposed Model Specification and Interpretation
We define our model as a linear self-exciting process endowed with a non-negative $\mathcal{F}_t$-stochastic intensity function $\lambda(t)$:

(1) $\lambda(t) = \mu(t) + \sum_{i:\, T_i < t} Y_{T_i}\, \kappa(t - T_i),$

where $\mu(t)$ is a deterministic base intensity, $Y = (Y_t)_{t \ge 0}$ is a stochastic process, and $\kappa$ conveys the positive influence of the past events $T_i$ on the current value of the intensity process. We write $\lambda_t$ for $\lambda(t)$ and $\mu_t$ for $\mu(t)$ to ease notation, with $\mathcal{F}_t$ being the history of the process, which contains the list of times of events up to and including $t$, i.e. $\{T_1, T_2, \dots, T_{N_t}\}$. Figure 1 illustrates the different components of our model, which we explain in the following subsections.
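As an illustration of Equation (1), the intensity can be evaluated from a list of (event time, contagion level) pairs. The helper below and its parameter values are our own, with a hypothetical exponentially decaying base intensity:

```python
import math

def stochastic_hawkes_intensity(t, events, mu, delta):
    """lambda(t) = mu(t) + sum over (T_i, Y_i) with T_i < t of Y_i * exp(-delta * (t - T_i)).
    `events` is a list of (event_time, contagion_level) pairs; `mu` is a callable base intensity."""
    return mu(t) + sum(y * math.exp(-delta * (t - ti)) for ti, y in events if ti < t)

# Hypothetical base intensity of the form a + (lam0 - a) * exp(-delta * t)
mu = lambda t: 0.1 + (0.5 - 0.1) * math.exp(-1.2 * t)
events = [(1.0, 0.3), (2.0, 1.5)]   # the second event carries a much larger excitation level
lam = stochastic_hawkes_intensity(2.1, events, mu, delta=1.2)
```

The only difference from the classical Hawkes intensity is that each event contributes with its own level $Y_{T_i}$ rather than a shared constant.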
2.2.1 Base Intensity, $\mu(t)$
This parameter is the base or background intensity describing the arrival of externally originating events in the absence of the influence of any previous events. These events are also known as exogenous events. By way of analogy, the base rate is referred to as the 'immigrant intensity' in ecological applications (Law et al., 2009), where it describes the rate with which new organisms are expected to arrive from other territories and colonies. In our case, $\mu(t)$ is a function of time and takes the form $\mu(t) = a + (\lambda_0 - a) e^{-\delta t}$, where $\lambda_0 > 0$ is the initial intensity jump at time $t = 0$, $a \ge 0$ is a constant parameter, and $\delta > 0$ is the constant rate of exponential decay.
2.2.2 The Contagion Process, $Y$
The levels of excitation measure the impact of clustering or contagion of the event times. To see this, observe in Equation (1) that whenever $Y_{T_i}$ is high and of positive value, it imposes a greater value on the intensity $\lambda(t)$, thus increasing the probability of generating an event in a shorter period of time and thereby causing the clustering phenomenon.
We use differential equations to describe the evolution of the levels of excitation. Translating the evolution of contagiousness into the language of mathematics means setting up an equation containing a derivative (or an integral), discussed further below. The changes in the contagion are assumed to satisfy the stochastic differential equation

$dY_t = f(Y_t)\, dt + g(Y_t)\, dB_t,$

where $B = (B_t)_{t \ge 0}$ is a standard Brownian motion and where $Y_0 = y_0$. Different settings of the functionals $f$ and $g$ lead to different versions of SDEs.
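A generic way to simulate such an SDE on a grid is the Euler–Maruyama scheme. The sketch below is illustrative, with the GBM couple $f(y) = 0.05y$, $g(y) = 0.2y$ chosen arbitrarily:

```python
import math, random

def euler_maruyama(f, g, y0, dt, n_steps, rng):
    """Simulate dY = f(Y) dt + g(Y) dB on a regular grid via the Euler-Maruyama scheme."""
    path = [y0]
    for _ in range(n_steps):
        y = path[-1]
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        path.append(y + f(y) * dt + g(y) * dB)
    return path

rng = random.Random(1)
# One instance of the (f, g) couple: geometric Brownian motion with drift 0.05 and volatility 0.2
path = euler_maruyama(lambda y: 0.05 * y, lambda y: 0.2 * y, y0=1.0, dt=0.01, n_steps=500, rng=rng)
```

In the sequel we work with exact transition laws rather than this generic discretization, but the scheme makes clear how the couple $(f, g)$ determines the dynamics.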
An important criterion for selecting appropriate choices of the couple $(f, g)$ essentially boils down to how we decide to model the levels of excitation within Stochastic Hawkes. A standing assumption is that the contagion process has to be positive, that is,
Assumption 1
The contagion parameters satisfy $Y_t > 0$ for all $t \ge 0$.
This is necessary as the levels of excitation act as a parameter that scales the magnitude of the influence of each past event and subsequently contributes to the excitation sum in Equation (1), which is non-negative.
Some notable examples of the couple $(f, g)$ are the Geometric Brownian Motion (GBM): $f(y) = \alpha y$, $g(y) = \sigma y$ (Kloeden and Platen, 1999; Zammit-Mangion et al., 2012); the Square-Root Processes: $f(y) = \theta(m - y)$, $g(y) = \sigma \sqrt{y}$ (Archambeau et al., 2007; Opper et al., 2010); the Langevin equation: $f(y) = \theta(m - y)$, $g(y) = \sigma$; and their variants (Stimberg et al., 2011; Welling and Teh, 2011; Liptser and Shiryaev, 1978).
Whilst the positivity of $Y$ is guaranteed for the GBM, this may not be true for other candidates such as the Langevin dynamics or the Square-Root Processes. This is because nothing inherently prevents them from going negative, and thus they may not be suitable choices to model the levels of excitation. Specifically, Square-Root Processes can be negative if the Feller condition is not satisfied (Feller, 1951; Liptser and Shiryaev, 1978). In real-life applications, this condition may not be respected, thus violating Assumption 1.
To that end, we focus on two specifications of the SDEs: the GBM, and a tilted Langevin dynamics obtained by exponentiation so that the positivity of $Y$ is ensured (Black and Karasinski, 1991):

Geometric Brownian Motion (GBM):
(2) $dY_t = \alpha Y_t\, dt + \sigma Y_t\, dB_t$, where $\alpha \in \mathbb{R}$ and $\sigma > 0$.

Exponential Langevin:
(3) $Y_t = e^{X_t}$ with $dX_t = \theta(m - X_t)\, dt + \sigma\, dB_t$, where $\theta > 0$ and $\sigma > 0$.
The parameter $\theta$ for the exponential Langevin denotes the decay or growth rate, and it signifies how strongly the level of excitation reacts to being pulled toward the asymptotic mean $m$. For fixed $m$ and $\sigma$, a small value of $\theta$ implies that $X_t$ is not oscillating about the mean.
2.2.3 The Sum Product, $Y_{T_i}\, \kappa(t - T_i)$
The product $Y_{T_i}\, \kappa(t - T_i)$ describes the impact on the current intensity of a previous event that took place at time $T_i$. We take $\kappa$ to be the exponential kernel of the form $\kappa(t) = e^{-\delta t}$. Note that $\delta$ is shared between the base intensity and the kernel to ensure that the process has memoryless properties (see Hawkes and Oakes, 1974; Ozaki, 1979). The memoryless property states the following: given the present, the future is independent of the past. We exploit this property to design our sampling algorithm so that we do not need to track all of the past history; only the most recent event time matters. This would not have been possible if we were to use a power kernel (Ogata, 1998), say. In addition, choosing this kernel enables us to derive Gibbs sampling procedures to facilitate efficient inference.
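The memoryless property of the exponential kernel yields a one-step recursion for the excitation sum at successive event times, so that it can be evaluated in a single $O(N)$ pass instead of an $O(N^2)$ double sum. A sketch with arbitrary parameter values:

```python
import math

def excitation_at_events(events, delta):
    """Value of sum_{j < i} Y_j exp(-delta * (T_i - T_j)) at each event time, via the O(N)
    recursion A_i = exp(-delta * (T_i - T_{i-1})) * (A_{i-1} + Y_{i-1}), which holds because
    the exponential kernel factorizes over the gap. `events` is a time-sorted list of (T_i, Y_i)."""
    A = [0.0]
    for i in range(1, len(events)):
        dt = events[i][0] - events[i - 1][0]
        A.append(math.exp(-delta * dt) * (A[-1] + events[i - 1][1]))
    return A

events = [(1.0, 0.5), (2.0, 1.0), (4.0, 0.2)]
fast = excitation_at_events(events, delta=0.7)
# Brute-force check: the direct O(N^2) double sum
slow = [sum(y * math.exp(-0.7 * (ti - tj)) for tj, y in events[:i])
        for i, (ti, _) in enumerate(events)]
```

The recursion is exactly what a power-law kernel would not permit, which is the point made above.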
Summarizing, the intensity of our model becomes:

(4) $\lambda(t) = a + (\lambda_0 - a) e^{-\delta t} + \sum_{i:\, T_i < t} Y_{T_i}\, e^{-\delta (t - T_i)}.$
If we set $Y$ to be a constant, we retrieve the model proposed by Hawkes (1971). In addition, setting $Y = 0$ returns the inhomogeneous Poisson process. Furthermore, letting $Y = 0$ and $a = \lambda_0$ simplifies it to the homogeneous Poisson process.
2.3 The Branching Structure for Stochastic Hawkes
This section presents a generative view of our model that permits a systematic treatment of situations where each of the observed event times can be separated into immigrants and offspring, terminology that we shall define shortly. This is integral to deriving efficient inference algorithms.
We call an event time $T_i$ an immigrant if it is generated from the base intensity $\mu(t)$; otherwise, we say $T_i$ is an offspring. This is known as the branching structure. It is therefore natural to introduce a variable that describes the specific process to which each event time $T_i$ corresponds. We do that by introducing the random variables $Z = (Z_{ij})$, where $Z_{i0} = 1$ if event $i$ is an immigrant, and $Z_{ij} = 1$ if event $i$ is an offspring of event $j$. We illustrate and elucidate the branching structure through Figure 1. For further details on the classification of Hawkes processes via the branching structure, refer to Rasmussen (2013) and Daley and Vere-Jones (2003).

2.4 Advantages of Stochastic Contagion
Before we dive into the technical details of simulating samples from our point process and performing parameter inference, we illustrate the effect of model choice. The levels of contagion can come in different flavors: a constant, as in the standard classical Hawkes process; a sequence of iid random variables; or a process satisfying an SDE.
First we generate levels of contagion following a GBM; see Figure 2(a). Performing inference by learning $Y$ as a GBM leads to good estimation, as indicated in Figure 2(b). However, if we let $Y$ be Gamma distributed, we are less able to reproduce the properties of the ground truth, as is evident from Figure 2(c). This is fairly intuitive, as the Gamma distribution does not inherit any serial correlation between the samples, since they are iid, whereas the ground truth possesses the correlation structure of a Geometric Brownian Motion. We proceed further, this time with the ground truth consisting of iid Gamma random variables, as illustrated in Figure 2(d). Learning $Y$ as Gamma in our model leads to good inference, as illustrated in Figure 2(e). Further, letting $Y$ be a GBM enables us to learn some properties of the ground truth this time round; see Figure 2(f). Comparing Figures 2(a) & (c) against Figures 2(d) & (f), we conclude that a fairly general formulation for the level of contagion, such as the GBM, is advantageous in recovering stylized facts of iid random levels of contagion, but not vice versa.
2.5 Likelihood Function
This section explicates the likelihood in the presence of an SDE and the branching structure within Stochastic Hawkes. The derivation is new and merits discussion here. A key result is that the integrated intensity function can be derived explicitly, that is,

$\int_0^T \lambda(s)\, ds = a T + (\lambda_0 - a) \frac{1 - e^{-\delta T}}{\delta} + \sum_{i:\, T_i \le T} \frac{Y_{T_i}}{\delta} \left( 1 - e^{-\delta (T - T_i)} \right),$

where the derivation rewrites the excitation term as a Riemann integral over a stochastic integral with respect to the counting process, applies Fubini's Theorem to interchange the order of integration, and uses the equivalence between the stochastic integral and the sum over event times. Consequently, we get the following:
Proposition 1
Let be , and for respectively. Assume that no events have occurred before time 0. Further set . Then the likelihood function is given by
This generalizes the likelihood function of Lewis and Mohler (2011). It can be viewed as an alternative version of the likelihood function found in Daley and Vere-Jones (2003) and Rubin (1972), with the presence of the branching structure coupled with a stochastic process $Y$. The proof of this result can be found in the Supplemental Materials (Section B).
3 Simulation of Stochastic Hawkes
We present a sampling procedure for our Stochastic Hawkes model in Algorithm 1. This algorithm scales with $O(N)$ complexity, compared to a naïve implementation of Ogata's modified thinning algorithm (Ogata, 1981), which requires $O(N^2)$ steps, with $N$ denoting the number of events. Similarly to Ozaki (1979) but differently from Ogata's method, it is noteworthy that our algorithm does not require the stationarity condition for the intensity dynamics as long as $\delta > 0$.
We follow closely the outline of Dassios and Zhao (2013) for simulating Hawkes processes and adapt it to the present setting. The idea of their algorithm is to decompose the inter-arrival event times into two independent, simpler random variables that can be sampled conveniently. Note, however, that we also need to sample the levels of self-excitation $Y$, which form a stochastic process, in contrast to the iid sequences of Dassios and Zhao (2013).
We seek laws that describe the GBM and exponential Langevin dynamics between event times. Applying Itô's formula (see Liptser and Shiryaev, 1978, and Section C in the Supplemental Materials) to $\log Y_t$ and performing discretization for the GBM yields

(5) $Y_{T_{i+1}} = Y_{T_i} \exp\left( \left( \alpha - \tfrac{\sigma^2}{2} \right) \Delta_i + \sigma \sqrt{\Delta_i}\, Z_i \right),$

where $\Delta_i$ is introduced as a shorthand for $T_{i+1} - T_i$ and $Z_i$ is standard normal. Similarly for the exponential Langevin, the discretization scheme returns

$X_{T_{i+1}} = m + (X_{T_i} - m)\, e^{-\theta \Delta_i} + \sigma \sqrt{\frac{1 - e^{-2\theta \Delta_i}}{2\theta}}\, Z_i,$

where we define $Y_{T_{i+1}} = e^{X_{T_{i+1}}}$, $Z_i$ is standard normal, and $X_{T_i}$ is known. Both these expressions now allow us to sample $Y_{T_i}$ for all $i$. We state the following:
Proposition 2
The simulation algorithm for a sample path of Stochastic Hawkes process is presented in Algorithm 1.
The proof of this algorithm is presented in the Supplemental Materials (Section A).
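The GBM update over unequal inter-event gaps can be sketched as follows. The function below is our own illustration of the exact GBM transition (with the drift and volatility symbols $\alpha$, $\sigma$ as used above); the grid of times is arbitrary:

```python
import math, random

def sample_gbm_at_times(y0, alpha, sigma, times, rng):
    """Sample a GBM exactly at an increasing, unequally spaced grid of times starting from 0:
    Y_{t+D} = Y_t * exp((alpha - sigma^2 / 2) * D + sigma * sqrt(D) * Z),  Z ~ N(0, 1)."""
    ys, y, prev = [], y0, 0.0
    for t in times:
        d = t - prev
        z = rng.gauss(0.0, 1.0)
        y = y * math.exp((alpha - 0.5 * sigma**2) * d + sigma * math.sqrt(d) * z)
        ys.append(y)
        prev = t
    return ys

rng = random.Random(3)
ys = sample_gbm_at_times(y0=1.0, alpha=0.1, sigma=0.3, times=[0.4, 0.9, 2.0, 2.05], rng=rng)
```

Because the transition is exact, no time-discretization error accumulates regardless of how irregular the event-time grid is.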
4 Parameter Inference from Observed Data
We present a hybrid of MCMC algorithms that updates the parameters one at a time, either by direct draws using Gibbs sampling or through the Metropolis–Hastings (MH) algorithm. A hybrid algorithm (Robert and Casella, 2005) combines the features of the Gibbs sampler and the MH algorithm, thereby providing significant flexibility in designing the inference thereof for the parameters within our model.
To see the mechanics of this, consider a two-dimensional parameterization as an illustration. Let $\phi_1$ and $\phi_2$ be the parameters of interest. If the posterior $p(\phi_2 \mid \phi_1, \text{data})$ is of a known distribution, we can perform inference directly using the Gibbs sampler. On the other hand, suppose $p(\phi_1 \mid \phi_2, \text{data})$ can only be evaluated but not directly sampled; then we resort to an MH algorithm to update $\phi_1$ given $\phi_2$. The MH step samples from a proposal distribution $q$, which means that we draw a candidate $\phi_1^* \sim q(\cdot \mid \phi_1)$, and the criterion to accept or reject the proposed candidate is based on the acceptance probability, denoted by $\beta$:

(6) $\beta(\phi_1, \phi_1^*) = \min\left\{ 1,\ \frac{p(\phi_1^* \mid \phi_2, \text{data})\, q(\phi_1 \mid \phi_1^*)}{p(\phi_1 \mid \phi_2, \text{data})\, q(\phi_1^* \mid \phi_1)} \right\}.$
The hybrid algorithm is as follows: given initial values $(\phi_1^{(0)}, \phi_2^{(0)})$, for iterations $n = 1, 2, \dots$:

Sample $\phi_1^{(n)}$ from the proposal and accept or reject it based on Equation (6).

Sample $\phi_2^{(n)}$ with Gibbs sampling.
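To make the mechanics concrete, here is a minimal MH-within-Gibbs sketch on a toy Normal model rather than our Hawkes posterior: the mean gets a direct Gibbs draw (flat prior, conjugate Normal conditional), while the precision gets a random-walk MH step. All names and parameter values are illustrative:

```python
import math, random

def hybrid_mcmc(data, n_iter, rng):
    """MH-within-Gibbs for a toy Normal(mu, 1/tau) model with flat priors:
    mu | tau is sampled exactly (Gibbs); tau is updated by random-walk MH."""
    n, xbar = len(data), sum(data) / len(data)
    mu, tau = 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        # Gibbs step: mu | tau, data ~ Normal(xbar, 1 / (n * tau))
        mu = rng.gauss(xbar, 1.0 / math.sqrt(n * tau))
        # MH step: symmetric proposal for tau, so the q terms in Equation-(6)-style ratio cancel
        prop = tau + rng.gauss(0.0, 0.2)
        if prop > 0:
            ss = sum((x - mu) ** 2 for x in data)
            # log posterior ratio: (n/2) log(prop/tau) - (prop - tau) * ss / 2
            log_ratio = 0.5 * n * (math.log(prop) - math.log(tau)) - 0.5 * (prop - tau) * ss
            if math.log(rng.random()) < log_ratio:
                tau = prop
        draws.append((mu, tau))
    return draws

rng = random.Random(4)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]
draws = hybrid_mcmc(data, n_iter=2000, rng=rng)
mu_hat = sum(m for m, _ in draws[500:]) / 1500   # posterior mean after burn-in
```

The same alternation, with our Hawkes posteriors in place of the toy conditionals, is what Algorithm 2 performs.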
We proceed by explaining the inference for a simple motivating example in Section 4.1, namely the case when the contagion parameters are iid random elements. The main inference procedures for $Y$ being a stochastic process can be found in Sections 4.2 and 4.3.
We summarize our MCMC algorithm in Algorithm 2.
4.1 Example: Levels of Excitation are iid Random
The focus here is on $Y$ being iid random elements with a common distribution function. To form a suitable model for the problem under consideration, we propose to model $Y$ as a sequence of iid Gamma random variables. This is a slight generalization of the Exponential distribution suggested by Rasmussen (2013), as the Gamma distribution contains an additional shape parameter that helps to improve the fitting performance. The $Y_i$ are assumed to follow an iid Gamma distribution with given shape and scale parameters, and we fix Gamma priors with suitable hyperparameters. Since every branching structure is equally likely a priori, the prior over $Z$ is uniform. The posterior for the scale parameter follows a Gamma distribution, which can easily be sampled. Turning to the shape parameter, a Gamma prior again yields a Gamma posterior, and the sampled value is accepted with the MH acceptance probability of Equation (6).

4.2 Gibbs Sampling
4.2.1 Sampling the Branching Structure
The posterior of $Z$ follows a Multinomial distribution: for each event $i$, the parent indicator is drawn with probabilities $(p_{ij})$, a probability matrix (each row sums to 1) satisfying

(7) $p_{i0} = \frac{\mu(T_i)}{\Lambda_i}, \qquad p_{ij} = \frac{Y_{T_j}\, e^{-\delta (T_i - T_j)}}{\Lambda_i} \quad \text{for } j < i,$

where

(8) $\Lambda_i = \mu(T_i) + \sum_{j < i} Y_{T_j}\, e^{-\delta (T_i - T_j)}$

is a normalizing constant. In the Gibbs sampler, we sample a new $Z$ directly from its posterior.
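One sweep of this Gibbs step can be sketched as follows: the immigrant probability is proportional to the base intensity and the offspring probabilities are proportional to the kernel contributions, as in Equations (7) and (8). The helper names and parameter values below are ours:

```python
import math, random

def branching_probs(events, mu, delta):
    """Row i of the branching probability matrix: entry 0 is proportional to mu(T_i)
    (immigrant), entry j >= 1 is proportional to Y_j * exp(-delta * (T_i - T_j)) for the
    j-th earlier event, normalized to sum to one. `events` is a time-sorted list of (T_i, Y_i)."""
    rows = []
    for i, (ti, _) in enumerate(events):
        w = [mu(ti)] + [y * math.exp(-delta * (ti - tj)) for tj, y in events[:i]]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows

def sample_parents(rows, rng):
    """Draw one branching structure: parent 0 means immigrant, parent j means event j-1."""
    return [rng.choices(range(len(r)), weights=r)[0] for r in rows]

events = [(0.5, 0.4), (1.0, 0.9), (3.0, 0.2)]
rows = branching_probs(events, mu=lambda t: 0.3, delta=1.0)
parents = sample_parents(rows, random.Random(5))
```

Note that the first event has no earlier candidates, so it is always classified as an immigrant.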
4.2.2 Sampling the Contagion Parameters
Geometric Brownian Motion.
Let $\alpha$ and $\sigma^2$ denote the drift and squared volatility of the GBM. For the joint posterior, we take two independent conjugate priors: a Normal prior on $\alpha$ and an inverse Gamma prior on $\sigma^2$. Standard calculations yield Normal and inverse Gamma posterior distributions, respectively, with posterior parameters given by

(9)
and
(10) 
Exponential Langevin.
We take similar priors as in the GBM case, together with a further distributional assumption on the mean level. The resulting posterior distributions are of standard families, with
(11) 
as well as
(12)  
(13) 
and also
(14) 
Recall that the parameter $\theta$ expresses the wildness of fluctuation about the mean level $m$. A small value of $\theta$ translates to a volatile $Y$. If we believe that the levels of self-excitation are erratic, which is the case of particular interest, then we would want a small value for $\theta$. This implies that expanding the exponential function $e^{-\theta \Delta}$ as a power series to first order is sufficient. This is similar in spirit to the Milstein scheme (Kloeden and Platen, 1999), where higher orders of the quadratic variation vanish. For exact sampling of $\theta$, one needs to resort to an MH scheme. In most applications, we can even set $\theta$ to a constant and not perform inference for it. Proceeding, we obtain
(15)  
(16) 
where we have used a number of shorthands throughout the calculations.
4.3 Metropolis–Hastings
For the case of $Y$ following the GBM, we propose a symmetric proposal for the remaining parameter; since the proposal is symmetric, the proposal densities in Equation (6) cancel and the acceptance probability reduces to a ratio of posteriors. The second remaining parameter is treated analogously with a symmetric Normal proposal, and its acceptance probability is again a ratio of posteriors.
For the inference of the remaining parameters of the exponential Langevin, the acceptance probabilities are shown in Table 1 in the Supplemental Materials.
5 Discussion and Related Work
Reduction to EM.
We show that careful selection of specific priors yields posterior probabilities that coincide with the distribution taken under the E-step of the EM (expectation–maximization) methodology developed by Veen and Schoenberg (2008). They utilized the branching structure as a strategy for obtaining the maximum likelihood estimates of a classical Hawkes process, whose intensity is as in Equation (4) with $Y$ being a constant. As in their paper, we define the variables $Z_{ij}$ associated with the $i$th event time: $Z_{ij} = 1$ if event $i$ is caused by event $j$, and $Z_{i0} = 1$ if event $i$ is an immigrant event. The unobserved branching structure is treated as the missing data and used to construct an EM algorithm. The conditional expected value of the complete-data log-likelihood can be written accordingly. The following probabilities are used to find an expression for the conditional expected complete-data log-likelihood:
taking value (a) when event $i$ is triggered by a previous event $j$, and value (b) when event $i$ is an immigrant.
Observe that these probabilities coincide with Equations (7) and (8) in Section 4.2.1 when $Y$ is set to a constant and the kernel of Veen and Schoenberg (2008) is used in place of ours. These probabilities are analogous to the probabilities used to perform the thinning in the modified simulation algorithm (see Ogata, 1981; Farajtabar et al., 2014; Valera and Gomez-Rodriguez, 2015).
The Renewal Equation Governing $\mathbb{E}[\lambda(t)]$.
We provide an expression for $\mathbb{E}[\lambda(t)]$ when $Y$ follows an SDE:

where step 1 rewrites the excitation sum as a stochastic integral, step 2 follows from subtracting and adding the mean value process of $Y$, and step 3 propagates the expectation through the equation.
Other Point Processes.
Simma and Jordan (2010) proposed an EM inference algorithm for Hawkes processes and applied it to large social network datasets. Inspired by their latent variable setup, we adapted some of their hidden variable formulation within the marked point process framework into our fully Bayesian inference setting. We have leveraged ideas from previous work on self-exciting processes to treat the levels of excitation as random processes.
Linderman and Adams (2014) introduced a multivariate point process combining self-exciting (Hawkes) and external (Cox) flavors to study latent networks in the data. These processes have also been proposed and applied in analyzing topic diffusion and user interactions (Rodriguez et al., 2011; Yang and Zha, 2013). Farajtabar et al. (2014) put forth a temporal point process model with one intensity being modulated by the other. Bounds for self-exciting processes are also studied in Hansen et al. (2015). Differently from these, we add another dimension to Hawkes processes by modeling the contagion parameters with a stochastic differential equation equipped with general procedures for learning. This allows much more latitude in parameterizing self-exciting processes as a basic building block before incorporating wider families of processes.

Studies of inference for continuous SDEs were initiated by Archambeau et al. (2007). Later contributions, notably by Ruttor et al. (2013) and Opper et al. (2010), dealt with SDEs whose drift is modulated by a memoryless Telegraph or Kac process taking binary values, as well as incorporating discontinuities in the finite variation terms. We remark that inference for diffusion processes via expectation propagation has been pursued by Cseke et al. (2015). We add some new aspects to the existing theory by introducing Bayesian approaches for performing inference on two SDEs, namely the GBM and the exponential Langevin dynamics, over periods of unequal lengths.
6 Final Remarks
We extended the Hawkes process by treating the magnitudes of self-excitation as random elements satisfying two versions of SDEs. These formulations allow the modeling of phenomena where the events and their intensities accelerate one another in a correlated fashion.
Which stochastic differential equation should one choose? We presented two SDEs over periods of unequal lengths in this work. The availability of other SDEs in the machine learning literature gives us the modeling freedom to maneuver and adapt to each real-life application. Each scenario presents quite distinct specifics that require a certain amount of impromptu and improvised inventiveness. Finally, a flexible hybrid MCMC algorithm is put forward, and connections to the EM algorithm are spelled out.
7 Acknowledgments
Young Lee wishes to thank Aditya K. Menon for inspiring discussions.
This work was undertaken at NICTA. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program.
References
Archambeau, C., Cornford, D., Opper, M., and Shawe-Taylor, J. (2007). Gaussian process approximations of stochastic differential equations. In Gaussian Processes in Practice, Bletchley Park, Bletchley, UK, June 12–13, 2006, pages 1–16.
Bernasco, W. and Nieuwbeerta, P. (2005). How do residential burglars select target areas? A new approach to the analysis of criminal location choice. British Journal of Criminology, 45(3):296–315.
Black, F. and Karasinski, P. (1991). Bond and option pricing when short rates are lognormal. Financial Analysts Journal, pages 52–59.
Brémaud, P. and Massoulié, L. (2002). Power spectra of general shot noises and Hawkes point processes with a random excitation. Advances in Applied Probability, 34(1):205–222.
Cseke, B., Schnoerr, D., Opper, M., and Sanguinetti, G. (2015). Expectation propagation for diffusion processes by moment closure approximations. arXiv preprint arXiv:1512.06098.
Daley, D. J. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes. Springer-Verlag New York, 2nd edition.
Dassios, A. and Zhao, H. (2011). A dynamic contagion process. Advances in Applied Probability, pages 814–846.
Dassios, A. and Zhao, H. (2013). Exact simulation of Hawkes process with exponentially decaying intensity. Electronic Communications in Probability, 18:1–13.
Farajtabar, M., Du, N., Gomez-Rodriguez, M., Valera, I., Zha, H., and Song, L. (2014). Shaping social activity by incentivizing users. In NIPS '14: Advances in Neural Information Processing Systems.
Feller, W. (1951). Two singular diffusion problems. The Annals of Mathematics, 54:173–182.
Hansen, N. R., Reynaud-Bouret, P., and Rivoirard, V. (2015). Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, 21(1):83–143.
Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, pages 89–90.
Hawkes, A. G. and Oakes, D. (1974). A cluster process representation of a self-exciting process. Journal of Applied Probability, pages 493–503.
Kloeden, P. E. and Platen, E. (1999). Numerical Solution of Stochastic Differential Equations. Applications of Mathematics. Springer, Berlin, New York.
Law, R., Illian, J., Burslem, D. F. R. P., Gratzer, G., Gunatilleke, C. V. S., and Gunatilleke, I. A. U. N. (2009). Ecological information from spatial patterns of plants: insights from point process theory. Journal of Ecology, 97(4):616–628.
Lewis, E. and Mohler, G. (2011). A nonparametric EM algorithm for multiscale Hawkes processes. Journal of Nonparametric Statistics, pages 1–16.
Linderman, S. W. and Adams, R. P. (2014). Discovering latent network structure in point process data. In Thirty-First International Conference on Machine Learning (ICML).
Liptser, R. S. and Shiryaev, A. N., editors (1978). Statistics of Random Processes, II. Springer-Verlag, Berlin-Heidelberg-New York.
Ogata, Y. (1981). On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31.
Ogata, Y. (1998). Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics, 50(2):379–402.
Opper, M., Ruttor, A., and Sanguinetti, G. (2010). Approximate inference in continuous time Gaussian-Jump processes. In Advances in Neural Information Processing Systems 23, pages 1831–1839.
Ozaki, T. (1979). Maximum likelihood estimation of Hawkes' self-exciting point processes. Annals of the Institute of Statistical Mathematics, 31(1):145–155.
Rasmussen, J. G. (2013). Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3):623–642.
Reynaud-Bouret, P. and Schbath, S. (2010). Adaptive estimation for Hawkes processes; application to genome analysis. Annals of Statistics, 38:2781–2822.
Robert, C. P. and Casella, G. (2005). Monte Carlo Statistical Methods. Springer-Verlag New York.
Rodriguez, M. G., Balduzzi, D., and Schölkopf, B. (2011). Uncovering the temporal dynamics of diffusion networks. In ICML, pages 561–568.
Rubin, I. (1972). Regular point processes and their detection. IEEE Transactions on Information Theory, 18(5):547–557.
Ruttor, A., Batz, P., and Opper, M. (2013). Approximate Gaussian process inference for the drift function in stochastic differential equations. In Advances in Neural Information Processing Systems 26, pages 2040–2048.
Simma, A. and Jordan, M. I. (2010). Modeling events with cascades of Poisson processes. In UAI, pages 546–555. AUAI Press.
Stimberg, F., Opper, M., Sanguinetti, G., and Ruttor, A. (2011). Inference in continuous-time change-point models. In Advances in Neural Information Processing Systems 24, pages 2717–2725. Curran Associates.
Valera, I. and Gomez-Rodriguez, M. (2015). Modeling adoption and usage of competing products. In ICDM, pages 409–418. IEEE.
Veen, A. and Schoenberg, F. P. (2008). Estimation of space-time branching process models in seismology using an EM-type algorithm. Journal of the American Statistical Association, 103(482):614–624.
Welling, M. and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the International Conference on Machine Learning.
Yang, S.-H. and Zha, H. (2013). Mixture of mutually exciting processes for viral diffusion. In ICML, pages 1–9.
Zammit-Mangion, A., Dewar, M., Kadirkamanathan, V., and Sanguinetti, G. (2012). Point process modelling of the Afghan war diary. Proceedings of the National Academy of Sciences, 109(31):12414–12419.