1 Introduction
1.1 From credit derivative indices to Hawkes processes
Credit indices are financial instruments composed of a set of credit securities, mainly used to hedge credit default risk. A new index series is issued every 6 months and expires after a defined maturity. Though liquid, these indices are traded at a rather "mid-frequency" rate, with trades occurring at a minute scale. For European indices, the market is continuously open from 7:00 to 17:00, with a total of five to ten billion euros reported each day on average. Fig. 1 gives an idea of the activity on a sampled trading day. While recent regulations have led to greater public reporting of trading activities, thus releasing more traded data, this data is by essence quite sparse. Picturing such market behaviour over time represents a challenging opportunity: it originally motivated the work presented in this article.
This study focuses on three principal European credit indices, of the most traded 5-year maturity^1: the Main Index, noted here itxeb, a combination of 125 equally weighted investment-grade entities; the Crossover Index, noted itxex, with 75 sub-investment-grade names; and the Senior Financial, noted itxes, a subset of 30 financial entities from the Main Index, referencing senior debt.

^1 Data provided by otcstreaming.com, and available on demand.
When trying to capture the underlying dynamics of this market, one of the first questions that comes to mind is: how random is the timing of trades? A very naive approach is to model trade arrival times as a purely random and uncorrelated process, for example a Poisson process, by fitting an exponential law to the distribution of inter-arrival times. Yet such a model fails to fit the data: Fig. 2 shows a high density of very short inter-arrival times, suggesting that some self-excitation phenomenon occurs, with trades triggering more trades; an intuition that every practitioner of the field would confirm.
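As an illustration of this check, the inter-arrival times can be tested against an exponential law. Below is a minimal sketch (our own helper, not code from the paper): a genuinely Poisson stream passes a Kolmogorov-Smirnov test that a clustered, burst-prone stream fails.

```python
import numpy as np
from scipy import stats

def poisson_check(timestamps):
    """KS test of inter-arrival times against an exponential law with the
    empirical rate: a naive check of the homogeneous Poisson hypothesis."""
    gaps = np.diff(np.sort(timestamps))
    scale = gaps.mean()  # MLE of 1/lambda under the Poisson model
    return stats.kstest(gaps, "expon", args=(0, scale))

rng = np.random.default_rng(0)
# A genuinely Poisson stream passes the test...
poisson_times = np.cumsum(rng.exponential(1.0, 2000))
stat_p, pval_p = poisson_check(poisson_times)
# ...while a clustered stream (bursts of very short gaps) fails it.
burst = rng.random(2000) < 0.7
clustered_times = np.cumsum(np.where(burst, rng.exponential(0.1, 2000),
                                     rng.exponential(3.0, 2000)))
stat_c, pval_c = poisson_check(clustered_times)
```

The clustered stream mimics the high density of short inter-arrival times observed in Fig. 2.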
Index   (inter-arrival time statistics)                 Number of trades

itxeb    7 min   10 min   1 min   3 min    8 min        22 227
itxex    8 min   12 min   2 min   4 min   10 min        18 152
itxes   17 min   24 min   3 min   9 min   21 min         7 194
A literature review reveals that one model has recently drawn considerable attention and a growing interest from both the scientific and quantitative communities: Hawkes processes. This model, introduced in 1971 by Hawkes ([6], [7]) and famously applied to earthquake tremors, can be used to describe the timing of trades and their cross influences from a point-process perspective.
After summarizing related work on Hawkes processes, we recall the theoretical framework around this stochastic model. We then focus on optimization methods for fitting such a process and describe Two-Stage Hawkes Likelihood Optimization (2SHLO), a maximum-likelihood-based algorithm to fit multidimensional Hawkes processes with a parametric exponential kernel. Lastly, we present our results, first on simulated data, then on the credit derivative indices mentioned above. Hawkes processes in finance have mainly been applied to high-frequency data. Is such a model appropriate for "mid-frequency" credit data? Can we specify the impact of volumes on trades or describe mutual influences between different indices?
1.2 Related work
Hawkes processes were originally designed for seismology analysis and deeply studied for that purpose by Ogata [12], [10], [11]. The concept has since been used to model other effects where self- and cross-ignitions happen, such as cascading social media tweets [14], crime occurrences due to gang retaliations [9], or financial market events. Regarding financial applications, Bowsher [4] proposed in 2002 a generalized Hawkes process model taking into account night gaps and inter-day dependencies as well as intraday seasonality (with a piecewise baseline intensity), which was used to model interactions between trades and price changes, and also to estimate price volatility based on mid-quote intensity for a NYSE stock. Market price microstructure and market impact (the influence of market orders on forthcoming prices) were modelled in [3]; price impact was also studied in [1]. [13] focused on the impact of volume and order types on the limit order book. [2] (2015) gives a full review of Hawkes process applications to finance, such as price and volatility modeling, market reflexivity measurement, order book modeling and risk contagion modeling.

2 Modeling Self- and Cross-Excitement with Hawkes Processes
This section recalls the required formalism, as very clearly stated in [2], [8] or [15], starting from point and counting process definitions, through the key concept of intensity function. The Hawkes model formulation is detailed for the reader, with a particular focus on the exponential kernel structure.
2.1 Core concepts: from counting & point processes to intensity functions
Definition 1 (Point Process)
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $(t_i)_{i \geq 1}$ be a sequence of non-negative random variables such that $\forall i \geq 1$, $t_i < t_{i+1}$. $(t_i)_{i \geq 1}$ is called a (simple) point process on $\mathbb{R}_+$.

Definition 2 (Counting Process)
Let $(t_i)_{i \geq 1}$ be a point process. The right-continuous stochastic process $N = (N(t))_{t \geq 0}$ defined for all $t \geq 0$ as $N(t) = \sum_{i \geq 1} \mathbb{1}_{t_i \leq t}$ is called the counting process associated with $(t_i)_{i \geq 1}$.
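In code, the associated counting process reduces to a rank lookup in the sorted event times (a minimal illustration, not from the paper):

```python
import numpy as np

def counting_process(event_times, t):
    """N(t) = #{i : t_i <= t}; side="right" makes it right-continuous,
    so the jump at t itself is counted."""
    times = np.sort(np.asarray(event_times, dtype=float))
    return int(np.searchsorted(times, t, side="right"))
```

For example, with events at 0.5, 1.0 and 2.0, evaluating at t = 1.0 counts the event occurring exactly at t.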
The study of point processes goes through a single mathematical object, the intensity function, which is defined as the conditional probability density of occurrence of an event in the immediate future.
Definition 3 (Intensity Function)
Let $N$ be a counting process adapted to a filtration $(\mathcal{F}_t)_{t \geq 0}$. The left-continuous intensity $\lambda$ is heuristically defined as

$$\lambda(t \mid \mathcal{F}_t) = \lim_{h \to 0^+} \frac{\mathbb{P}\left(N(t+h) - N(t) = 1 \mid \mathcal{F}_t\right)}{h} \quad (1)$$

The intensity function depends on the choice of the filtration $(\mathcal{F}_t)_{t \geq 0}$, which represents the amount of information available up to time $t$ (see [5] for a rigorous definition of the intensity function). In the context of Hawkes processes, we simply use the natural filtration, with all previous information being available, and write $\lambda(t) := \lambda(t \mid \mathcal{F}_t)$.
Homogeneous Poisson process
One of the simplest point processes is the homogeneous Poisson process, for which the intensity function is constant over time: $\lambda(t) = \lambda > 0$. In that case, durations (or inter-event waiting times) are independent and identically distributed, following an exponential distribution with hazard rate $\lambda$. As presented in the introduction, credit trades cannot be modeled by this memoryless model. Let us introduce Hawkes processes, a particular kind of non-homogeneous Poisson processes with linear dependencies on functions of past events.

2.2 Self-excitement: one-dimensional Hawkes processes
Definition 4 (One-dimensional Hawkes Process)
Let $(t_i)_{i \geq 1}$ be a point process and $N$ the associated counting process, such that its intensity function is defined for each time $t \geq 0$ as

$$\lambda(t) = \mu + \int_0^t \phi(t - s)\, dN(s) = \mu + \sum_{t_i < t} \phi(t - t_i) \quad (2)$$

where $\mu > 0$ is an exogenous base intensity and $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ is a non-negative, measurable function such that $\int_0^\infty \phi(t)\, dt < \infty$.
$N$ is called a Hawkes process with baseline $\mu$ and kernel $\phi$.
The kernel $\phi$ expresses the positive influence of past events on the current value of the intensity: each jump of $N$ increases the probability of future events through the kernel $\phi$. Clustering effects and branching structure are well depicted by Hawkes processes, with the baseline activity $\mu$ generating immigrant events and descendant events enhancing the intensity.
For this study, we focus on exponential kernels, defined as $\phi(t) = \alpha \beta e^{-\beta t} \mathbb{1}_{t > 0}$, with parameters $\alpha \geq 0$ and $\beta > 0$, which allows an easy interpretation: $\alpha$, or adjacency, represents the weight of previous events, while $1/\beta$ is the decay, i.e. the typical duration of influence of a past event.
The branching ratio $n = \int_0^\infty \phi(t)\, dt$ represents the average number of direct descendants of any event. For the exponential kernel above, $n = \alpha$.
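For illustration, the exponential-kernel intensity can be evaluated directly from the past events; a minimal sketch (our own helper names) under the convention $\phi(t) = \alpha \beta e^{-\beta t}$, so that the branching ratio equals $\alpha$:

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """lambda(t) = mu + sum_{t_i < t} alpha*beta*exp(-beta*(t - t_i)),
    with phi(t) = alpha*beta*exp(-beta*t) so the branching ratio is alpha."""
    past = np.asarray(events, dtype=float)
    past = past[past < t]
    return mu + np.sum(alpha * beta * np.exp(-beta * (t - past)))

# Three recent events raise the intensity well above the baseline mu = 0.5;
# far from all events, the intensity relaxes back to the baseline.
lam_near = intensity(2.0, [1.0, 1.2, 1.3], mu=0.5, alpha=0.6, beta=2.0)
lam_far = intensity(50.0, [1.0, 1.2, 1.3], mu=0.5, alpha=0.6, beta=2.0)
```

The decay back to $\mu$ is what produces the clustered, bursty event pattern observed in the trade data.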
2.3 Including mutual excitement with multidimensional Hawkes processes
Definition 5 (Multidimensional Hawkes Process)
Let $M \in \mathbb{N}^*$ and let $(t_i^m)_{i \geq 1, 1 \leq m \leq M}$ be an $M$-dimensional point process. We denote by $N = (N_1, \ldots, N_M)$ the associated counting process, such that its vector intensity function $\lambda = (\lambda_1, \ldots, \lambda_M)$ is defined, for all $t \geq 0$ and $m \in \{1, \ldots, M\}$, as

$$\lambda_m(t) = \mu_m + \sum_{n=1}^M \int_0^t \phi_{mn}(t - s)\, dN_n(s) \quad (3)$$

with $\mu = (\mu_1, \ldots, \mu_M)$ an exogenous base intensity vector and $\Phi = (\phi_{mn})_{1 \leq m, n \leq M}$ a matrix-valued kernel that is component-wise positive and causal (null values when $t < 0$), each component belonging to the space of integrable functions.
$N$ is called a multidimensional (or multivariate) Hawkes process with baseline $\mu$ and kernel $\Phi$.
The choice of an exponential kernel generalizes to multivariate Hawkes processes, with kernel components expressed as $\phi_{mn}(t) = \alpha_{mn} \beta_{mn} e^{-\beta_{mn} t} \mathbb{1}_{t > 0}$.
This article focuses on Hawkes processes with exponential kernels and constant baselines, for which we propose a maximum-likelihood-based fitting method.
3 Hawkes Processes: Model Calibration
The most commonly used technique for parametric inference of Hawkes processes is a direct numerical optimization of the likelihood, first introduced in the work of Ogata (1978) [10], who proved the asymptotic consistency and efficiency of the Maximum Likelihood Estimator (MLE) under a stationarity condition on the underlying point process. The negative log-likelihood of a multidimensional Hawkes process with parameters $\theta$ over the time interval $[0, T]$ is given in Daley & Vere-Jones [5], Proposition 13.1.VI, by:

$$-\ell(\theta) = \sum_{m=1}^M \left( \int_0^T \lambda_m(s)\, ds - \sum_{t_i^m \leq T} \log \lambda_m(t_i^m) \right) \quad (4)$$
Depending on the shape of the kernels, $-\ell$ is generally non-convex, and classic non-convex numerical optimization pitfalls are to be feared: a direct numerical optimization could converge to a merely local minimum. On the other hand, it may take too many iterations to converge, as the negative log-likelihood can be flat on some regions of the parameter space. This problem is also faced when using an Expectation-Maximization (EM) algorithm to find the MLE.
On the computational side, the repeated evaluation of the negative log-likelihood can be highly time-consuming, essentially due to the nested sum in (4): the conditional intensity $\lambda_m(t_i^m)$ inside the logarithm is itself a sum over the history.
3.1 Maximum Likelihood Estimation (MLE) for exponential multidimensional Hawkes processes
Let us consider an $M$-dimensional multivariate Hawkes model with constant baselines and kernel functions as defined in Section 2.3. The choice of an exponential kernel has proved very interesting: it allows intuitive and meaningful interpretations of its parameters, and it also reduces the computational cost of evaluating the negative log-likelihood, as noted by Ogata (1981) [11], who exhibited a recursive formula that eliminates the nested-sum problem.
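In the one-dimensional case, the recursion makes the evaluation linear in the number of events. Below is a sketch (our own code, assuming the kernel convention $\phi(t) = \alpha \beta e^{-\beta t}$ used throughout this paper):

```python
import numpy as np

def neg_log_lik(events, T, mu, alpha, beta):
    """O(n) negative log-likelihood of a 1D exponential Hawkes process,
    kernel phi(t) = alpha*beta*exp(-beta*t), via Ogata's recursion
    R(i) = exp(-beta*(t_i - t_{i-1})) * (1 + R(i-1))."""
    t = np.asarray(events, dtype=float)
    R, log_term = 0.0, 0.0
    for i in range(len(t)):
        if i > 0:
            R = np.exp(-beta * (t[i] - t[i - 1])) * (1.0 + R)
        log_term += np.log(mu + alpha * beta * R)  # intensity at t_i
    # Compensator term: integral of the intensity over [0, T]
    integral = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - t)))
    return integral - log_term
```

Without the recursion, the log term would cost O(n^2), which is exactly the nested-sum problem mentioned above.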
Existing methods to fit the exponential parameters through MLE directly use non-linear optimization algorithms such as Nelder-Mead (also called the downhill simplex method) or BFGS, whose performance decreases as the number of parameters increases. This is quite an issue here, as $M + 2M^2$ parameters describe an $M$-dimensional Hawkes process with exponential kernel.
In addition, other existing methods often make the strong assumption that the decays $\beta_{mn}$ of the exponential kernel are given and fixed a priori. The estimation of each kernel then reduces to the estimation of its adjacency coefficient $\alpha_{mn}$, and since the conditional intensity is linear with respect to the kernel functions $\phi_{mn}$, the negative log-likelihood is convex with respect to all remaining parameters. We can therefore use the widely available convex optimization machinery to find the global minimum efficiently. The main drawback of this approach is the possible inaccuracy of the decay parameters, which can lead to model mismatch.
3.2 Introducing Two-Stage Hawkes Likelihood Optimization (2SHLO)
In order to combine the benefits of both approaches described above, we propose the following estimation method to fit exponential Hawkes processes.
The negative log-likelihood for exponential multivariate Hawkes processes is given by (see Ogata [11] for the recursive formulation):

$$-\ell(\mu, \alpha, \beta) = \sum_{m=1}^M \left( \mu_m T + \sum_{n=1}^M \alpha_{mn} \sum_{t_j^n \leq T} \left(1 - e^{-\beta_{mn}(T - t_j^n)}\right) - \sum_{t_i^m \leq T} \log\left( \mu_m + \sum_{n=1}^M \alpha_{mn} \beta_{mn} R^{mn}(i) \right) \right) \quad (5)$$

where $R^{mn}(i) = \sum_{t_j^n < t_i^m} e^{-\beta_{mn}(t_i^m - t_j^n)}$ satisfies the recursion $R^{mn}(i) = e^{-\beta_{mn}(t_i^m - t_{i-1}^m)} R^{mn}(i-1) + \sum_{t_{i-1}^m \leq t_j^n < t_i^m} e^{-\beta_{mn}(t_i^m - t_j^n)}$.
As we mentioned earlier, if $\beta$ is fixed, then $-\ell(\cdot, \cdot, \beta)$ is convex with respect to the parameters $(\mu, \alpha)$. As a result, it can be minimized using any convex optimization algorithm, for example an Accelerated Gradient Descent (AGD), which converges to a global minimum such that:

$$(\hat{\mu}(\beta), \hat{\alpha}(\beta)) = \operatorname*{argmin}_{\mu, \alpha} \; -\ell(\mu, \alpha, \beta) \quad (6)$$
$g(\beta) := -\ell(\hat{\mu}(\beta), \hat{\alpha}(\beta), \beta)$ is still a non-convex function, but it is defined on a space of lower dimension than the initial parameter space. Any non-linear heuristic algorithm can now be used to minimize $g$, leading to the MLE as^2:

$$\hat{\beta} = \operatorname*{argmin}_{\beta} g(\beta), \qquad \hat{\theta} = (\hat{\mu}(\hat{\beta}), \hat{\alpha}(\hat{\beta}), \hat{\beta}) \quad (7)$$

^2 Since $g(\beta) = \min_{\mu, \alpha} -\ell(\mu, \alpha, \beta)$, the inequality $g(\beta) \leq -\ell(\mu, \alpha, \beta)$ holds for all $(\mu, \alpha, \beta)$; (7) follows immediately by taking the minimum over $\beta$ on each side.
We summarize the fitting method as follows:
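In the univariate case, the two stages can be sketched with scipy. This is an illustrative sketch only (helper names are ours): L-BFGS-B stands in for the inner convex solver and Nelder-Mead performs the outer non-convex search over the decay.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, beta, events, T):
    """1D negative log-likelihood, kernel alpha*beta*exp(-beta*t),
    using Ogata's recursion (O(n) per evaluation)."""
    mu, alpha = params
    t = np.asarray(events, dtype=float)
    R, log_term = 0.0, 0.0
    for i in range(len(t)):
        if i > 0:
            R = np.exp(-beta * (t[i] - t[i - 1])) * (1.0 + R)
        log_term += np.log(mu + alpha * beta * R)
    return mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - t))) - log_term

def two_stage_fit(events, T, beta0=1.0):
    """2SHLO sketch: inner convex minimisation over (mu, alpha) at fixed
    beta, outer Nelder-Mead over beta (optimised in log-space for beta > 0)."""
    bounds = [(1e-6, None), (1e-6, None)]
    def g(log_beta):
        beta = np.exp(log_beta[0])
        inner = minimize(neg_log_lik, x0=[0.5, 0.5], args=(beta, events, T),
                         method="L-BFGS-B", bounds=bounds)
        return inner.fun
    outer = minimize(g, x0=[np.log(beta0)], method="Nelder-Mead")
    beta_hat = np.exp(outer.x[0])
    inner = minimize(neg_log_lik, x0=[0.5, 0.5], args=(beta_hat, events, T),
                     method="L-BFGS-B", bounds=bounds)
    mu_hat, alpha_hat = inner.x
    return mu_hat, alpha_hat, beta_hat

rng = np.random.default_rng(1)
events = np.sort(rng.uniform(0.0, 50.0, 120))  # toy event times
mu_hat, alpha_hat, beta_hat = two_stage_fit(events, 50.0)
```

The inner problem is convex for fixed $\beta$, so any convex solver reaches its global minimum; only the low-dimensional outer search remains heuristic.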
3.3 Goodness of fit
The compensator of a point process with intensity $\lambda$ is defined as $\Lambda(t) = \int_0^t \lambda(s)\, ds$. To assess the goodness of fit of our estimation, we use the residual point process analysis theorem, as stated in [5]:
The transformed sequence $(\Lambda(t_i))_{i \geq 1}$ is a realization of a unit-rate Poisson process if and only if the original sequence $(t_i)_{i \geq 1}$ is a realization of the point process defined by $\lambda$.
The goodness of fit can then be tested by comparing the transformed estimated times $\hat{\Lambda}(t_i)$ to a standard Poisson process, using Q-Q plots and comparisons of density functions.
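For a fitted univariate model, this residual analysis can be sketched as follows (our own helper; the kernel convention $\phi(t) = \alpha \beta e^{-\beta t}$ is assumed):

```python
import numpy as np
from scipy import stats

def compensator(events, mu, alpha, beta):
    """Lambda(t_i) for a 1D exponential Hawkes process with kernel
    phi(t) = alpha*beta*exp(-beta*t); under the true parameters the
    transformed times form a unit-rate Poisson process."""
    t = np.asarray(events, dtype=float)
    out = np.empty_like(t)
    for i, ti in enumerate(t):
        past = t[:i]
        out[i] = mu * ti + alpha * np.sum(1.0 - np.exp(-beta * (ti - past)))
    return out

# Sanity check on a plain Poisson stream (alpha = 0, true rate mu = 2):
# the gaps of the transformed times should be ~ Exp(1).
rng = np.random.default_rng(2)
times = np.cumsum(rng.exponential(1.0 / 2.0, 1000))
gaps = np.diff(compensator(times, mu=2.0, alpha=0.0, beta=1.0))
stat, pval = stats.kstest(gaps, "expon")
```

In practice, the Q-Q plots of Fig. 4 compare these transformed gaps against the unit exponential distribution.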
4 Results: 2SHLO in practice
We ran our experiments in Python 3, relying on the tick [16] library for Hawkes process simulations, AGD optimization and analytic tools. 2SHLO also benefits from the well-known scipy scientific package.
To give an idea of the computational performance of the fitting method, it takes on average 4.69 seconds to fit a 2-dimensional exponential Hawkes process with a training window of length T = 10 000.
4.1 Validating 2SHLO performance on simulated data
To generate Hawkes processes, a few methods are available (as summarized in [2]), based on thinning, time-change or cluster algorithms. We used tick simulation functions, which are based on the thinning algorithm described in (Ogata, 1981, p. 25, Algorithm 2) [11].
To validate the performance of the fitting algorithm, we ran a series of 100 simulation-and-fit procedures on simulated 2D Hawkes timestamps, using different end times T to obtain different training sizes, with the following simulation parameters:
Fig. 3 confirms that the estimated parameters converge to their true values as the number N of ticks in the training set increases: when the Hawkes process is well specified, 2SHLO indeed recovers the optimal parameters asymptotically.
4.2 Calibrating univariate Hawkes processes on "mid-frequency" credit derivative trades
We fitted a modified one-dimensional Hawkes process on reported trade data^3 from 15/01/2017 to 15/12/2017 for the 3 indices ITXEB, ITXES and ITXEX. The modification consists in imposing a null intensity in the gap between two days, and the standard exponential kernel during trading days, so as to account for the absence of trading activity overnight. The resulting model is described in more detail in the appendix. Before making this modeling choice, we experimented with a more elaborate model accounting for the overnight spillover effect, as described in the work of C. G. Bowsher [4], where the intensity recursively depends on the level of the stochastic part of the intensity at the end of the previous trading day, thus extending the effect of trading days over time. The fitting procedure resulted in a null estimate of the spillover parameter, which motivated the simpler model modification described above.

^3 The trading data have been cleaned by discarding transactions that do not reflect market signals, such as roll and switch trades.
To validate the 'universality' of our estimates, we tested the goodness of fit of the Hawkes model with the estimated parameters on the out-of-sample period from 15/01/2018 to 15/07/2018. The model fitted well for the 3 index time series (see Fig. 4), suggesting that each series has stable intrinsic parameters that characterize the pattern of trade arrival times.
Index   Estimated $1/\mu$   Estimated $\alpha$   Estimated $1/\beta$

ITXEB   22 min              0.62                 20 min
ITXEX   24 min              0.60                 23 min
ITXES   58 min              0.65                 36 min
4.3 Measuring the influence of traded volumes
When considering the distribution of trade volumes for credit indices, volume clusters are clearly outlined. For example, for index itxeb, three bins can be distinguished: "small" trades, "medium" trades and "big" trades (see Fig. 5). Using a multivariate Hawkes process model, volume impacts have been modeled by considering 3 counting processes, splitting trades per volume size.
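Splitting one trade stream into size-based counting processes is straightforward; in the sketch below the bin edges are purely illustrative placeholders (the actual clusters come from Fig. 5):

```python
import numpy as np

# Hypothetical bin edges, for illustration only (EUR notional)
SMALL_MAX, MEDIUM_MAX = 25e6, 100e6

def split_by_volume(times, volumes):
    """Split a trade stream into 3 counting processes: small / medium / big."""
    times, volumes = np.asarray(times), np.asarray(volumes)
    labels = np.digitize(volumes, [SMALL_MAX, MEDIUM_MAX])  # 0, 1 or 2
    return [np.sort(times[labels == k]) for k in range(3)]

streams = split_by_volume([1.0, 2.0, 3.0, 4.0],
                          [5e6, 150e6, 30e6, 8e6])
```

Each of the three resulting timestamp arrays then feeds one component of the multivariate Hawkes model.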
Using the same model as in the last section to take overnight gaps into account, the fitting algorithm trained on 2018 data produced the following parameter estimates ($1/\mu$ and $1/\beta$ expressed in minutes):
Visualizing in Fig. 5 the branching ratios, which correspond to the values of $\alpha$, allows some intuitive interpretation of the fitted model: large trades form an almost purely self-exciting process and have a major influence in triggering small trades. We also notice that all trade categories have a non-negligible self-excitation component.


4.4 Measuring cross interactions between traded indices
We again adopted a multivariate Hawkes process with null overnight intensity to study the cross-interactions between index trade times. The fitting algorithm trained on 2017 data produced the following parameter estimates ($1/\mu$ and $1/\beta$ expressed in minutes):
From Fig. 6 we notice that the itxes index is the least influenced by the other indices, with very low cross-excitation components and a high self-excitation component, which was expected given its low liquidity. We also notice that the estimated baselines of all indices are greater for this multivariate model than for the unidimensional model, which can be explained by the fact that including information about the influence of other indices decreases the 'Poissonian' part of the arrival times.
5 Conclusion
Through point process analysis, Hawkes processes allow a relatively simple and interpretable representation of time-dependent events with self- and mutual excitements. With 2SHLO, a maximum-likelihood-based algorithm mixing convex and non-convex optimization, we have been able to fit exponential-kernel multivariate Hawkes processes quickly and accurately. The algorithm has been applied to credit derivative trading data, an unexplored "mid-frequency" market for such processes. We have been able to emphasize self-excitement for three index trade series and to measure the impact of traded volumes. Within the same framework, we were also able to quantify cross-influences between indices. Future prospects turn towards testing 2SHLO performance on high-dimensional datasets, for instance studying both price and volume trade influences as well as market bid/ask dynamics on credit derivatives, in a forecasting perspective.
References
 [1] Amaral, L., Papanicolaou, A.: Price Impact of Large Orders Using Hawkes Processes. In: NYU Tandon Research Paper No. 2874042 (2017)
 [2] Bacry, E., Mastromatteo, I., Muzy, J.F.: Hawkes Processes in Finance. In: Market Microstructure and Liquidity, vol. 01 (2015)
 [3] Bacry, E., Muzy, J.F.: Hawkes models for price and trades highfrequency dynamics. In: Quantitative Finance, vol. 14, pp. 1147–1166 (2013)
 [4] Bowsher, C. G.: Modelling Security Market Events in Continuous Time: Intensity Based, Multivariate Point Process Models. In: Nuffield Economics Working Paper (2003)
 [5] Daley, D.J., Vere-Jones, D.: An introduction to the theory of point processes, vol. 2. Springer (1988)
 [6] Hawkes, A.G.: Spectra of Some SelfExciting and Mutually Exciting Point Processes. In: Biometrika, vol. 58, pp. 83–90 (1971)
 [7] Hawkes, A.G.: Point Spectra of Some Mutually Exciting Point Processes. In: Journal of the Royal Statistical Society. Series B (Methodological), vol. 33, pp. 438–443 (1971)
 [8] Laub, P., Taimre, T., Pollett, P.: Hawkes Processes. In: arXiv preprint arXiv:1507.02822 (2015)
 [9] Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., Tita, G. E.: SelfExciting Point Process Modeling of Crime. In: Journal of the American Statistical Association, vol. 106, pp. 100–108 (2011)
 [10] Ogata, Y.: The asymptotic behaviour of maximum likelihood estimators for stationary point processes. In: Annals of the Institute of Statistical Mathematics, vol. 30, pp. 243–261 (1978)
 [11] Ogata, Y.: On Lewis’ simulation method for point processes. In: IEEE Transactions on Information Theory, vol. 27, pp. 23–31 (1981)
 [12] Ogata, Y.: Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. In: Journal of the American Statistical Association, vol. 83, pp. 9–27 (1988)
 [13] Rambaldi, M., Bacry, E., Lillo, F.: The role of volume in order book dynamics: a multivariate Hawkes process analysis. In: Quantitative Finance (2016)
 [14] Rizoiu, M., Lee, Y., Mishra, S., Xie, L.: A Tutorial on Hawkes Processes for Events in Social Media. In: Frontiers of Multimedia Research, pp. 191–218 (2017)
 [15] Toke, I. M.: An Introduction to Hawkes Processes with Applications to Finance (2011). http://lamp.ecp.fr/MAS/fiQuant/ioane_files/HawkesCourseSlides.pdf
 [16] Bacry, E., Bompaire, M., Gaïffas, S., Poulsen, S.: Tick: a Python library for statistical learning, with a particular emphasis on time-dependent modelling. https://xdatainitiative.github.io/tick/#
Appendix: Hawkes Processes Formulas
5.1 The Univariate Bowsher Hawkes process model
5.1.1 Definition:
As described in [4], the intensity of the univariate Bowsher process is defined recursively, depending on the level of the stochastic part of the intensity function at the end of the previous trading day and on the contributions of the events occurring on day $d$. It is thus necessary to perform the following data transformation: the real half-line is partitioned into intervals of length $\ell$ corresponding to the different trading days. This partition is written as $D_d = [(d-1)\ell, d\ell)$, $d \geq 1$.
The model is thus defined by the (scalar) stochastic intensity $\lambda$, such that, for $t \in D_d$:

$$\lambda(t) = \mu + \pi e^{-\delta(t - (d-1)\ell)}\, \tilde{\lambda}\big((d-1)\ell\big) + \int_{(d-1)\ell}^t \alpha e^{-\beta(t-s)}\, dN(s)$$

where $\tilde{\lambda}((d-1)\ell)$ denotes the stochastic part of the intensity at the end of trading day $d-1$, and where the parameters used are:

- $D_d$ is the interval defining day number $d$
- $\mu$ is the baseline intensity
- $\pi$ is the spillover adjacency coefficient
- $\delta$ is the spillover decay
- $\alpha$ is the self-excitement adjacency coefficient
- $\beta$ is the self-excitement decay
- $dN(s)$ is the increment of the counting process
5.1.2 The Log-likelihood:
The minus log-likelihood of a univariate point process over $[0, T]$ is defined as:

$$-\ell = \int_0^T \lambda(s)\, ds - \sum_{t_i \leq T} \log \lambda(t_i)$$

Computing the minus log-likelihood in the Bowsher Hawkes model results in the following closed form:
5.1.3 The Fitting procedure:
As computing the gradient of the log-likelihood with respect to each parameter is tedious because of the recursive definition of the intensity, and since there are only 5 parameters to fit $(\mu, \pi, \delta, \alpha, \beta)$, we chose the L-BFGS-B algorithm to minimize $-\ell$, which delivered satisfying performance on simulated data.
5.1.4 Data simulation:
We use the thinning algorithm (as described in [11]) to simulate artificial data:

1. Given a Bowsher Hawkes process described as above.
2. Set the current time $t \leftarrow 0$ and the event counter $i \leftarrow 1$.
3. While $t < T$:
   (a) Set the upper bound of the Poisson intensity $\bar{\lambda} = \lambda(t^+)$.
   (b) Sample an inter-arrival time: draw $U \sim \mathcal{U}(0, 1)$ and let $w = -\ln(U)/\bar{\lambda}$.
   (c) Update the current time: $t \leftarrow t + w$.
   (d) Draw $D \sim \mathcal{U}(0, 1)$.
   (e) If $D \leq \lambda(t)/\bar{\lambda}$, accept the current sample: let $t_i \leftarrow t$ and $i \leftarrow i + 1$. Otherwise reject the sample and return to step (a).

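The steps above translate directly into code; here is a minimal sketch for a plain univariate exponential Hawkes process, without the day-gap and spillover terms (helper names are ours):

```python
import numpy as np

def simulate_thinning(mu, alpha, beta, T, seed=0):
    """Ogata's thinning for a 1D Hawkes process with kernel
    phi(t) = alpha*beta*exp(-beta*t): between events the intensity only
    decays, so its current value is a valid upper bound."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        past = np.asarray(events)
        lam_bar = mu + alpha * beta * np.sum(np.exp(-beta * (t - past)))  # (a)
        t += rng.exponential(1.0 / lam_bar)                               # (b)-(c)
        if t >= T:
            break
        lam_t = mu + alpha * beta * np.sum(np.exp(-beta * (t - past)))
        if rng.random() <= lam_t / lam_bar:                               # (d)-(e)
            events.append(t)
    return np.array(events)
```

With $\alpha = 0$ this degenerates to a homogeneous Poisson process of rate $\mu$; with self-excitement, the expected number of events over $[0, T]$ grows to $\mu T / (1 - \alpha)$.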
5.2 The Day Gaps Multivariate Exponential Hawkes process model
5.2.1 Definition:
The intensity of the Day Gaps Multivariate Exponential Hawkes process is defined just like the standard multivariate exponential Hawkes process model described in [6], given by:

$$\lambda_m(t) = \mu_m + \sum_{n=1}^M \int_0^t \phi_{mn}(t - s)\, dN_n(s)$$

where:

- $M$ is the number of nodes
- $\mu_m$ are the baseline intensities
- $\phi_{mn}$ are the kernels
- $dN_n(s)$ are the increments of the counting processes

with an exponential parametrisation of the kernels, $\phi_{mn}(t) = \alpha_{mn} \beta_{mn} e^{-\beta_{mn} t} \mathbb{1}_{t > 0}$, where:

- $\alpha_{mn}$ are the adjacencies of the kernel
- $\beta_{mn}$ are the decays of the kernel

The modification consists in considering the intensity as null between two consecutive trading days, to take the day gaps into consideration:

$$\lambda_m(t) = 0 \quad \text{for } t \notin \bigcup_{d \geq 1} D_d$$
5.2.2 The Log-likelihood:
The minus log-likelihood of a multivariate point process over $[0, T]$ is defined as:

$$-\ell(\theta) = \sum_{m=1}^M \left( \int_0^T \lambda_m(s)\, ds - \sum_{t_i^m \leq T} \log \lambda_m(t_i^m) \right)$$

Computing the minus log-likelihood in the Day Gaps Multivariate Exponential Hawkes process model results in the following closed form:

where the $R^{mn}(i)$ are defined using the Ogata [11] recursive formula:

$$R^{mn}(i) = e^{-\beta_{mn}(t_i^m - t_{i-1}^m)} R^{mn}(i-1) + \sum_{t_{i-1}^m \leq t_j^n < t_i^m} e^{-\beta_{mn}(t_i^m - t_j^n)}$$

and where:

- $D$ is the number of days in the data
- $\ell_{day}$ is the length of one day
- $\mu_m$ are the baseline intensities
- $\alpha_{mn}$ are the adjacency coefficients
- $\beta_{mn}$ are the decay coefficients
- $\tau_d$ is the last time of day number $d$
5.2.3 The Fitting procedure:
To fit this model, we use the 2SHLO algorithm described in this paper, with a projected Newton descent for the convex optimization part and Nelder-Mead for the non-convex optimization step. The choice of Newton descent is justified by the easy computation of the gradient and of the Hessian matrix, which is sparse, and by its quadratic convergence rate compared to gradient descent methods.
The gradient of $-\ell$ is thus given by the following closed-form formulas:

The Hessian of $-\ell$ is given by the following closed-form formulas:

and all other coefficients are null. The fitting algorithm is described as follows:

The projected Newton descent is described as follows:
5.3 Some empirical results:
5.3.1 Loglikelihood landscape:
Visualizing the minus log-likelihood of a 2D exponential Hawkes process with respect to two non-convex variables yields the following figure, emphasizing the flatness of the function to minimize and the existence of valleys that hinder gradient-based optimization:
5.3.2 Goodness of fit size bins study:
Assessing the goodness of fit of the multivariate Hawkes process fitted on 2017 itxeb reported trades, split into three series with respect to trade size:
5.3.3 Goodness of fit Multi index study:
Assessing the goodness of fit of the multivariate Hawkes process fitted on 2017 itxeb, itxes and itxex reported trades