THAP: A Matlab Toolkit for Learning with Hawkes Processes

08/28/2017
by   Hongteng Xu, et al.
Georgia Institute of Technology

As a powerful tool for asynchronous event sequence analysis, point processes have been studied for a long time and have achieved numerous successes in different fields. Among the various point process models, the Hawkes process and its variants have attracted many researchers in statistics and computer science in recent years, because they capture the self- and mutually-triggering patterns between different events in complicated sequences explicitly and quantitatively and are broadly applicable to many practical problems. In this paper, we describe an open-source toolkit implementing many learning algorithms and analysis tools for the Hawkes process model and its variants. Our toolkit systematically summarizes recent state-of-the-art algorithms as well as most classic algorithms for Hawkes processes, which is beneficial for both academic education and research. Source code can be downloaded from https://github.com/HongtengXu/Hawkes-Process-Toolkit.

1 Introduction

Real-world interactions among multiple entities are often recorded as event sequences, such as user behaviors in social networks, earthquakes in different locations, and diseases and their complications. The entities or event types in these sequences often exhibit complicated self- and mutually-triggering patterns: historical events are likely to influence the occurrence of current and future events, and historical events at different time stamps have different impacts. Modeling these event sequences and analyzing the triggering patterns behind them are classical problems in statistics and computer science, which can be addressed with point process models and their learning algorithms.

As a special kind of point process, the Hawkes process model (hawkes1971point) has attracted many researchers and has been widely used in many fields because it represents the triggering patterns explicitly and quantitatively. Additionally, the Hawkes process is very flexible: it has many variants and can be extended and connected with existing machine learning models. Its typical applications include, but are not limited to, financial analysis, bioinformatics, social network analysis and control, and crowd behavior modeling. Because of these properties and broad applications, most existing point process toolkits have in fact been developed with a focus on Hawkes processes.
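To make "representing the triggering patterns explicitly and quantitatively" concrete, the following is a minimal Python sketch of the conditional intensity of a multi-dimensional Hawkes process with exponential impact functions. It is an illustration only and is not taken from THAP's Matlab code; the names `mu`, `A`, and `w`, the convention that `A[u][v]` is the influence of type `v` on type `u`, and the shared decay rate are all assumptions made for this example.

```python
import math

def intensity(t, events, mu, A, w):
    """Conditional intensity of every event type at time t.

    events: list of (time, type) pairs; only events strictly before t count.
    mu:     base rates, one per event type.
    A:      A[u][v] is the assumed infectivity of type v on type u.
    w:      decay rate of the exponential impact function (assumed shared).
    """
    lam = list(mu)  # start from the base rates
    for t_i, v in events:
        if t_i < t:
            # Each past event of type v adds a decaying bump to every type u.
            kernel = w * math.exp(-w * (t - t_i))
            for u in range(len(mu)):
                lam[u] += A[u][v] * kernel
    return lam

# Two event types; one past event of each type.
events = [(0.5, 0), (1.0, 1)]
mu = [0.2, 0.1]
A = [[0.5, 0.0], [0.3, 0.4]]
lam = intensity(1.5, events, mu, A, w=1.0)
```

Here `A[0][1] = 0.0` means that, under this parameterization, events of type 1 never raise the intensity of type 0, while the nonzero entries quantify self- and mutual triggering.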

However, although many new models and learning algorithms for Hawkes processes have been proposed in recent years, the development of existing Hawkes process toolkits lags behind. On the one hand, they concentrate on implementing traditional algorithms rather than the rapidly evolving state of the art. On the other hand, it is difficult to compare modern algorithms fairly and comprehensively because their implementations are scattered across different code bases.

Focusing on modeling and learning the Hawkes process and its variants, we describe in this paper a new toolkit, THAP (Toolkit for HAwkes Processes), which implements a wide variety of learning and analysis algorithms for Hawkes processes. THAP offers a Matlab-based implementation of modern state-of-the-art learning and analysis algorithms and provides two real-world data sets: the IPTV data (luo2014you; luo2015multi) and the Linkedin data (xu2017learning). It can compare different algorithms using a variety of evaluation metrics and thus may clarify which algorithms perform better under which circumstances. The Matlab-based implementation is beneficial for academic education and research: students can learn the basic concepts of Hawkes processes and the details of the corresponding learning algorithms, and can accelerate the initial phases of their research. The open-source nature of THAP makes it easy for third parties to contribute additional implementations, and the modules of THAP can be extended to develop more complicated models and functions, e.g., Wasserstein learning (xiao2017wasserstein) and recurrent neural networks (du2016recurrent).

Figure 1: The structure of THAP.

2 Implementation

THAP is multi-platform Matlab software (R2016a or higher required). It is compatible with MS Windows, Linux, and Mac OS. The toolkit consists of five main components, as shown in Fig. 1.

Data: Import real-world data (i.e., csv files), convert them to Matlab's format (i.e., mat files), and implement data preprocessing such as sampling, stitching, and thinning.

Simulation: Implement three simulation methods to generate synthetic data: the branch clustering method (hawkes1974cluster; moller2006approximate), Ogata's modified thinning method (ogata1981lewis), and the fast thinning method for the Hawkes process with exponential impact functions (dassios2013exact).

Model: Define the Hawkes process model and its variants and implement their learning algorithms.

Analysis: Perform Granger causality analysis and clustering analysis of event sequences.

Visualization: Visualize data, models, and learning results.
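As an illustration of the thinning idea used by the Simulation component, the sketch below simulates a one-dimensional Hawkes process with an exponential impact function by thinning. It is a simplified Python example, not THAP's Matlab implementation; the function name, the parameterization (impact function `alpha * w * exp(-w * t)`, so that `alpha` is the expected number of direct offspring per event), and the default seed are assumptions made here.

```python
import math
import random

def simulate_hawkes_thinning(mu, alpha, w, t_end, seed=0):
    """Simulate event times on [0, t_end) by thinning.

    Assumed intensity: mu + sum_i alpha * w * exp(-w * (t - t_i)),
    with mu > 0 and alpha < 1 for a stationary process.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # excited part of the intensity just after time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= t_end:
            break
        t += wait
        excitation *= math.exp(-w * wait)  # decay up to the candidate time
        lam_t = mu + excitation            # true intensity at the candidate
        if rng.random() * lam_bar <= lam_t:
            events.append(t)               # accept the candidate
            excitation += alpha * w        # jump caused by the new event
    return events

events = simulate_hawkes_thinning(mu=1.0, alpha=0.5, w=2.0, t_end=200.0)
```

Because the exponential kernel lets the excitation be updated in constant time per candidate, this loop avoids re-summing over the whole history, which is the property that fast simulators for exponential impact functions exploit.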

The key modules of THAP are the modeling and analysis modules. Hawkes processes can be categorized into parametric and nonparametric models. The parametric models are Hawkes processes with predefined impact functions, e.g., exponential or Gaussian impact functions. The nonparametric models are Hawkes processes with arbitrary impact functions, which can be represented by a set of basis functions or discretized as a set of sample points with fixed time lags. Different learning algorithms are applied depending on the representation of the impact function. For Hawkes processes with continuous impact functions (i.e., those represented by predefined functions or bases), we can apply maximum likelihood estimation (MLE) directly to estimate the model parameters. For Hawkes processes with discretized impact functions, we can either (a) treat event sequences as time series and apply the least-squares (LS) method (eichler2016graphical), or (b) combine MLE with a solver of ordinary differential equations (ODE) (zhou2013learning2) to estimate the model parameters.
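The objective that MLE maximizes in the parametric case can be sketched as follows for a one-dimensional Hawkes process with an exponential impact function. This Python example is illustrative and is not THAP's code; the function name and the parameterization (impact function `alpha * w * exp(-w * t)`) are assumptions. The log-likelihood is the sum of log-intensities at the observed events minus the integral of the intensity over the observation window.

```python
import math

def hawkes_exp_loglik(events, t_end, mu, alpha, w):
    """Log-likelihood of sorted event times observed on [0, t_end]."""
    loglik = 0.0
    r = 0.0      # r = sum over earlier events of exp(-w * (t - t_j))
    prev = None
    for t in events:
        if prev is not None:
            # Recursive update avoids a double loop over the history.
            r = math.exp(-w * (t - prev)) * (1.0 + r)
        loglik += math.log(mu + alpha * w * r)
        prev = t
    # Integral of the intensity over [0, t_end] (the compensator).
    compensator = mu * t_end + sum(
        alpha * (1.0 - math.exp(-w * (t_end - t))) for t in events
    )
    return loglik - compensator

events = [1.0, 2.0]
ll = hawkes_exp_loglik(events, t_end=3.0, mu=0.5, alpha=0.5, w=1.0)
```

A learning routine would maximize this function over `mu` and `alpha` (and possibly `w`) with any constrained optimizer; setting `alpha = 0` recovers the log-likelihood of a homogeneous Poisson process.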

Figure 2: Visualization of typical functions achieved by THAP. Panels: (a) sequence and intensity; (b) simulators' runtime; (c) learned impact function; (d) estimation errors; (e) log-likelihood of data; (f) Granger causality graph; (g) dynamics of infectivity; (h) clustering analysis.

Additionally, THAP provides three cutting-edge analysis tools. The first is Granger causality analysis. For multi-dimensional Hawkes processes, the self- and mutually-triggering patterns between different event types can be represented by a Granger causality graph. THAP combines MLE with various regularizers, e.g., sparse, group-sparse, and low-rank regularizers, and robustly learns the adjacency matrix of the Granger causality graph (i.e., the infectivity matrix of event types). The second is clustering analysis. THAP contains two methods to cluster event sequences generated by different Hawkes processes: (a) learning a mixture model of Hawkes processes (xu2017dirichlet); (b) implementing a distance metric for marked point processes (iwayama2017definition). The third is dynamical analysis of event sequences. THAP implements the time-varying Hawkes process (TVHP) model of (xu2017learning) and captures the change of the infectivity matrix over time.
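The link between a sparse regularizer and the Granger causality graph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not THAP's algorithm: it shows only the soft-thresholding step that a proximal method for an L1-regularized, nonnegative infectivity matrix would apply, and then reads the graph off the nonzero entries. The function names, the threshold `tau`, and the convention that `A[u][v]` is the influence of type `v` on type `u` are hypothetical.

```python
def soft_threshold(A, tau):
    """Proximal step for an L1 penalty on a nonnegative matrix:
    shrink every entry by tau and clip at zero."""
    return [[max(a - tau, 0.0) for a in row] for row in A]

def granger_edges(A, tol=1e-8):
    """Directed edges (v, u), meaning type v influences type u,
    for every entry A[u][v] that is still nonzero."""
    return [(v, u)
            for u, row in enumerate(A)
            for v, a in enumerate(row)
            if a > tol]

# Hypothetical infectivity estimate with one weak entry, A[0][1].
A = [[0.5, 0.02],
     [0.3, 0.0]]
A_sparse = soft_threshold(A, tau=0.05)
edges = granger_edges(A_sparse)  # [(0, 0), (0, 1)]
```

The weak entry is shrunk to exactly zero, so the edge from type 1 to type 0 disappears from the graph, while the self-triggering of type 0 and its influence on type 1 remain.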

Fig. 2 illustrates some typical functions of THAP. Specifically, using THAP, we can (a) visualize event sequences and their intensity functions; (b) simulate event sequences with different simulators and compare their runtime; (c) learn Hawkes processes with different algorithms and visualize the learned impact functions; (d) calculate the estimation errors of parameters; (e) calculate the log-likelihood of data obtained by different algorithms; (f) learn the Granger causality graph of event types (e.g., the infectivity between TV program categories); (g) learn the dynamics of the infectivity matrix (e.g., the infectivity between companies for employees at different ages); and (h) learn clustering structures of event sequences and distances between them.

3 Related work

Table 1 summarizes the features implemented in other open-source point process toolkits and compares them to those in THAP. The functions implemented by different toolkits are labeled by different symbols. THAP covers most of the functions of the other toolkits and contains many new ones. Specifically, the R-based library R-hawkes (https://cran.r-project.org/web/packages/hawkes/hawkes.pdf) contains only a single estimation algorithm for traditional Hawkes processes (da2014hawkes). The Python-based library pyhawkes (https://github.com/slinderman/pyhawkes) implements only its contributors' published algorithms (linderman2014discovering; linderman2015scalable). The C++ library PtPack (https://github.com/dunan/MultiVariatePointProcess) includes some traditional and advanced learning techniques for point processes, but it is not very user-friendly because it has no Python or Matlab interface. Recently, a new C++ library, tick (https://github.com/X-DataInitiative/tick), has been developed with a Python interface (bacry2017tick); it includes most of PtPack's functions and further improves their performance.

Category          Items
Model type        Parametric; Nonparametric
Impact function   Exponential, Gaussian (parametric); Smooth basis, Discrete (nonparametric)
Simulator         Branch clustering; (Fast) thinning
Learning          MLE (+ regularizer); MLE + ODE; Least-squares
Analysis          Granger causality; Clustering (mixture model); Clustering (distance metric); Long-time dynamics (TVHP)

Toolkits compared: THAP, R-hawkes, pyhawkes, PtPack, tick.

Table 1: Models and algorithms of Hawkes processes in different toolkits.

4 Summary

THAP contributes to the point process research community by (a) enabling an easy and fair comparison among most existing models and learning algorithms for Hawkes processes, (b) supporting advanced analysis tools that are not available in other libraries, and (c) filling a gap in point process education and research with a Matlab-based toolkit. In the future, we plan to add extensions that go beyond existing Hawkes process models.

We would like to acknowledge support for this project from the NSF IIS-1639792, 1717916, and NSFC 61628203.


References