PoPPy: A Point Process Toolbox Based on PyTorch

10/23/2018
by   Hongteng Xu, et al.

PoPPy is a Point Process toolbox based on PyTorch that enables flexible design and efficient learning of point process models. It can be used for interpretable sequential data modeling and analysis, e.g., Granger causality analysis of multi-variate point processes, and point process-based simulation and prediction of event sequences. In practice, the key points of point process-based sequential data modeling are: 1) how to design intensity functions that describe the mechanism behind observed data, and 2) how to learn the proposed intensity functions from observed data. The goal of PoPPy is to provide a user-friendly solution to these key points and to support large-scale point process-based sequential data analysis, simulation and prediction.





1 Overview

1.1 What is PoPPy?

PoPPy is a Point Process toolbox based on PyTorch that enables flexible design and efficient learning of point process models. It can be used for interpretable sequential data modeling and analysis, e.g., Granger causality analysis of multi-variate point processes, and point process-based simulation and prediction of event sequences.

1.2 The Goal of PoPPy

Many real-world sequential data are generated by complicated interactive mechanisms among multiple entities. Treating the entities as events with different discrete categories, we can represent their sequential behaviors as event sequences in the continuous time domain. Mathematically, an event sequence can be denoted as $\mathcal{S} = \{(t_i, c_i)\}_{i=1}^{I}$, where $t_i \in [0, T]$ and $c_i \in \{1, \ldots, C\}$ are the timestamp and the event type (i.e., the index of the entity) of the $i$-th event, respectively. Optionally, each event type $c$ may be associated with a feature vector $\boldsymbol{f}_c \in \mathbb{R}^{D_e}$, $c = 1, \ldots, C$, and each event sequence may also have a feature vector $\boldsymbol{f}_s \in \mathbb{R}^{D_s}$. Many real-world scenarios can be formulated as event sequences, as shown in Table 1.

 Scene Patient admission Job hopping Online shopping
 Entities (Event types) Diseases Companies Items
 Sequences Patients' admission records LinkedIn users' job histories Buying/rating behaviors
 Event feature Diagnosis records Job descriptions Item profiles
 Sequence feature Patient profiles User profiles User profiles
 Task Disease network construction Talent flow modeling Recommendation
Table 1: Typical event sequences in practice.

Given a set of event sequences $\{\mathcal{S}_n\}_{n=1}^{N}$, we aim to model the dynamics of the event sequences, capture the interactive mechanisms among different entities, and predict their future behaviors. Temporal point process models provide a potential solution for achieving these aims. In particular, a multi-variate temporal point process can be represented by a set of counting processes $\{N_c(t)\}_{c=1}^{C}$, in which $N_c(t)$ is the number of type-$c$ events occurring till time $t$. For each $c$, the expected instantaneous happening rate of type-$c$ events at time $t$ is denoted as $\lambda_c(t)$, which is called the "intensity function":

$$\lambda_c(t) = \frac{\mathbb{E}[dN_c(t) \mid \mathcal{H}_t]}{dt}, \quad (1)$$

where $\mathcal{H}_t = \{(t_i, c_i) \mid t_i < t\}$ represents the historical observations before time $t$.

Figure 1: Event sequences and intensity functions.

As shown in Fig. 1, the counting processes can be represented as a set of intensity functions, each of which corresponds to a specific event type. The temporal dependency within the same event type and that across different event types (i.e., the red arrows in Fig. 1) can be captured by choosing particular intensity functions. Therefore, the key points of point process-based sequential data modeling include

  1. How to design intensity functions to describe the mechanism behind observed data?

  2. How to learn the proposed intensity functions from observed data?

The goal of PoPPy is to provide a user-friendly solution to the key points above and to support large-scale point process-based sequential data analysis, simulation and prediction.
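As a toy illustration of the intensity concept, the rate of a homogeneous Poisson process can be recovered empirically by counting events over the observation horizon (a self-contained sketch, not part of PoPPy):

```python
# Approximate the intensity of a homogeneous Poisson process empirically:
# lambda ~= E[N(t + dt) - N(t)] / dt, estimated here over the whole horizon.
import random

random.seed(0)
rate = 2.0          # true intensity (events per unit time)
T = 1000.0          # observation horizon

# Simulate the process via exponential inter-arrival times.
times, t = [], 0.0
while True:
    t += random.expovariate(rate)
    if t > T:
        break
    times.append(t)

# Empirical intensity: total event count divided by the horizon length.
lambda_hat = len(times) / T
print(round(lambda_hat, 1))  # close to the true rate 2.0
```

For an inhomogeneous process, the same count-over-window estimate applied to short windows recovers the time-varying $\lambda_c(t)$.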

1.3 Installation of PoPPy

PoPPy is developed on macOS 10.13.6 and also tested on Ubuntu 16.04. The installation of PoPPy is very simple. In particular,

  1. Install Anaconda3 and create a conda environment.

  2. Install PyTorch 0.4 in the environment.

  3. Download PoPPy from https://github.com/HongtengXu/PoPPy/ and unzip it to the directory in the environment. The unzipped folder should contain several subfolders, as shown in Fig. 2.

  4. Open dev/util.py and change POPPY_PATH to the directory, as shown in Fig. 3.

Figure 2: The subfolders in the package of PoPPy.
Figure 3: An example of the path of PoPPy.

The subfolders in the package include

  • data: It contains a toy dataset in .csv format.

  • dev: It contains a util.py file, which configures the path and the logger of the package.

  • docs: It contains the tutorial of PoPPy.

  • example: It contains some demo scripts for testing the functionality of the package.

  • model: It contains the classes of predefined point process models and their modules.

  • output: It contains the output files generated by the demo scripts in the example folder.

  • preprocess: It contains the classes and the functions of data I/O and preprocessing.

In the following sections, we will introduce the details of PoPPy.

2 Data: Representation and Preprocessing

2.1 Representations of Event Sequences

PoPPy represents observed event sequences as a nested dictionary. In particular, the proposed database has the following structure:

database = {
    'event_features'    : None or (De, C) float array of event features.
                            C is the number of event types;
                            De is the dimension of the event feature.
    'type2idx'          : a Dict = {'event_name': event_index}
    'idx2type'          : a Dict = {event_index: 'event_name'}
    'seq2idx'           : a Dict = {'seq_name': seq_index}
    'idx2seq'           : a Dict = {seq_index: 'seq_name'}
    'sequences'         : a List = [seq_1, seq_2, ..., seq_N]
}
For the i-th sequence:
seq_i = {
    'times'             : (N,) float array of timestamps.
                            N is the number of events.
    'events'            : (N,) int array of event types.
    'seq_feature'       : None or (Ds,) float array of the sequence feature.
                            Ds is the dimension of the sequence feature.
    't_start'           : a float number, the start timestamp of the sequence.
    't_stop'            : a float number, the stop timestamp of the sequence.
    'label'             : None or an int/float number, the label of the sequence.
}
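For concreteness, a minimal instance of this database with two toy sequences could look like the following (plain Python lists stand in for the float/int arrays; all names and timestamps are invented for illustration):

```python
# A minimal instance of the proposed database with two toy sequences.
# Event and sequence features are left as None, as load_sequences_csv would do.
database = {
    'event_features': None,
    'type2idx': {'A': 0, 'B': 1},
    'idx2type': {0: 'A', 1: 'B'},
    'seq2idx': {'seq0': 0, 'seq1': 1},
    'idx2seq': {0: 'seq0', 1: 'seq1'},
    'sequences': [
        {'times': [0.5, 1.2, 3.0], 'events': [0, 1, 0],
         'seq_feature': None, 't_start': 0.0, 't_stop': 4.0, 'label': None},
        {'times': [0.1, 2.2], 'events': [1, 1],
         'seq_feature': None, 't_start': 0.0, 't_stop': 3.0, 'label': None},
    ],
}

# The index dictionaries let us recover event names from the integer codes.
second_event_name = database['idx2type'][database['sequences'][0]['events'][1]]
```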

PoPPy provides three functions to load data from .csv files and convert them to the proposed database.

2.1.1 preprocess.DataIO.load_sequences_csv

This function loads event sequences and converts them to the proposed database. The I/O and the description of this function are shown in Fig. 4.

Figure 4: The description of load_sequences_csv.

For example, the Linkedin.csv file in the folder data records a set of LinkedIn users' job hopping behaviors among different companies, whose format is shown in Fig. 5.

Figure 5: Some rows of Linkedin.csv.

Here, the column id corresponds to the names of sequences (i.e., the indices of users), the column time corresponds to the timestamps of events (i.e., the ages at which the users start to work), and the column event corresponds to the event types (i.e., the companies). Therefore, we can define the input domain_names as

domain_names = {
    'seq_id'    : 'id',
    'time'      : 'time',
    'event'     : 'event'
}

and database = load_sequences_csv(’Linkedin.csv’, domain_names).
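Under the hood, such a loader only needs to scan the .csv once, build the name-to-index dictionaries, and group rows by sequence id. The following is a simplified re-implementation of that idea using the standard library (a sketch, not PoPPy's actual load_sequences_csv):

```python
import csv
import os
import tempfile
from collections import defaultdict

def load_sequences_csv_sketch(file_name, domain_names):
    """Simplified loader: CSV rows -> the nested database structure above."""
    type2idx, seq2idx = {}, {}
    grouped = defaultdict(lambda: {'times': [], 'events': []})
    with open(file_name, newline='') as f:
        for row in csv.DictReader(f):
            seq_name = row[domain_names['seq_id']]
            event_name = row[domain_names['event']]
            seq2idx.setdefault(seq_name, len(seq2idx))
            type2idx.setdefault(event_name, len(type2idx))
            grouped[seq2idx[seq_name]]['times'].append(float(row[domain_names['time']]))
            grouped[seq2idx[seq_name]]['events'].append(type2idx[event_name])
    sequences = []
    for i in range(len(seq2idx)):
        seq = grouped[i]
        sequences.append({'times': seq['times'], 'events': seq['events'],
                          'seq_feature': None, 't_start': 0.0,
                          't_stop': max(seq['times']), 'label': None})
    return {'event_features': None,
            'type2idx': type2idx, 'idx2type': {v: k for k, v in type2idx.items()},
            'seq2idx': seq2idx, 'idx2seq': {v: k for k, v in seq2idx.items()},
            'sequences': sequences}

# Demo on a tiny CSV with the same columns as Linkedin.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as f:
    f.write('id,time,event\nu1,22,Google\nu1,25,Facebook\nu2,23,Google\n')
    path = f.name
db = load_sequences_csv_sketch(path, {'seq_id': 'id', 'time': 'time', 'event': 'event'})
os.remove(path)
```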

Note that the database created by load_sequences_csv() does not contain event features or sequence features, whose values in database are None. PoPPy supports loading categorical or numerical features from .csv files, as shown below.

2.1.2 preprocess.DataIO.load_seq_features_csv

This function loads sequence features from a .csv file and imports them into the proposed database. The I/O and the description of this function are shown in Fig. 6.

Figure 6: The description of load_seq_features_csv.

Take the Linkedin.csv file as an example. Suppose that we have already created database by the function load_sequences_csv, and we want to take the column option1 (i.e., the job titles that each user held) as the categorical feature of the event sequences. We should have

domain_names = {
    'option1'    : 'categorical'
}
database = load_seq_features_csv(
                                file_name = 'Linkedin.csv',
                                seq_domain = 'seq_id',
                                domain_dict = domain_names,
                                database = database)

Here the input normalize is left at its default value, which means that the features in database['sequences'][i]['seq_feature'], $i = 1, \ldots, N$, are not normalized.
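The essence of handling a categorical column is mapping each distinct value to a one-hot vector. A minimal sketch of that idea (the helper one_hot_features is hypothetical, not a PoPPy function):

```python
def one_hot_features(values):
    """Map a list of categorical values to one-hot vectors.
    Indices are assigned in order of first appearance."""
    vocab = {}
    for v in values:
        vocab.setdefault(v, len(vocab))
    dim = len(vocab)
    return [[1.0 if vocab[v] == j else 0.0 for j in range(dim)]
            for v in values], vocab

# E.g., job titles attached to three sequences:
feats, vocab = one_hot_features(['engineer', 'manager', 'engineer'])
```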

2.1.3 preprocess.DataIO.load_event_features_csv

This function loads event features from a .csv file and imports them into the proposed database. The I/O and the description of this function are shown in Fig. 7.

Figure 7: The description of load_event_features_csv.

Similarly, if we want to take the column option1 in Linkedin.csv as the categorical feature of the event types, we should have

domain_names = {
    'option1'    : 'categorical'
}
database = load_event_features_csv(
                                file_name = 'Linkedin.csv',
                                event_domain = 'event',
                                domain_dict = domain_names,
                                database = database)

2.2 Operations for Data Preprocessing

Besides the basic sequence/feature loaders and converters mentioned above, PoPPy contains multiple useful functions and classes for data preprocessing, including sequence stitching, superposing, aggregating and batch sampling. Fig. 8 illustrates these data operations.

Figure 8: The illustration of four data operations.

2.2.1 preprocess.DataOperation.stitching

This function stitches the sequences in two databases randomly or based on their seq_feature and time information (t_start, t_stop). Its description is shown in Fig. 9.

When method = 'random', for each sequence in database1 the function randomly selects a sequence in database2 as its follower and stitches them together. When method = 'feature', the similarity between a sequence in database1 and one in database2 is defined as the product of a temporal Gaussian kernel and a Gaussian kernel on the sequence features, and the function selects the sequence in database2 according to a distribution defined by this similarity. The stitching method has been proven useful for enhancing the robustness of learning results, especially when the training sequences are very short xu2017learning; xu2018learning.

Figure 9: The description of stitching.
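Under the 'random' method, stitching amounts to shifting the follower's timestamps so that it starts at the first sequence's stop time, then concatenating the two. A minimal sketch under that reading (illustrative only, not PoPPy's implementation):

```python
def stitch(seq1, seq2):
    """Append a follower sequence after seq1, shifting its timestamps
    so the follower starts where seq1 stops."""
    offset = seq1['t_stop'] - seq2['t_start']
    return {
        'times': seq1['times'] + [t + offset for t in seq2['times']],
        'events': seq1['events'] + seq2['events'],
        't_start': seq1['t_start'],
        't_stop': seq2['t_stop'] + offset,
    }

a = {'times': [0.5, 1.0], 'events': [0, 1], 't_start': 0.0, 't_stop': 2.0}
b = {'times': [0.2], 'events': [0], 't_start': 0.0, 't_stop': 1.0}
c = stitch(a, b)
# c spans [0.0, 3.0] with the follower's event shifted to t = 2.2
```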

2.2.2 preprocess.DataOperation.superposing

This function superposes the sequences in two databases randomly or based on their seq_feature and time information (t_start, t_stop). Its description is shown in Fig. 10.

When method = 'random', for each sequence in database1 the function randomly selects a sequence in database2 and superposes the two. When method = 'feature', the similarity between a sequence in database1 and one in database2 is defined as the product of a temporal Gaussian kernel and a Gaussian kernel on the sequence features, and the function selects the sequence in database2 according to a distribution defined by this similarity.

Similar to the stitching operation, the superposing method has been proven useful for learning linear Hawkes processes robustly. However, note that unlike the stitching operation, which stitches similar sequences with high probability, the superposing operation tends to superpose dissimilar sequences with high probability. The rationale for this operation can be found in xu2018benefits; xu2018superposition.

Figure 10: The description of superposing.
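Superposition, in contrast, overlays two sequences on the same time axis and re-sorts the merged events by timestamp; a minimal sketch (illustrative only):

```python
def superpose(seq1, seq2):
    """Merge two event sequences on a shared time axis, sorted by timestamp."""
    merged = sorted(zip(seq1['times'] + seq2['times'],
                        seq1['events'] + seq2['events']))
    return {'times': [t for t, _ in merged],
            'events': [e for _, e in merged],
            't_start': min(seq1['t_start'], seq2['t_start']),
            't_stop': max(seq1['t_stop'], seq2['t_stop'])}

a = {'times': [0.5, 2.0], 'events': [0, 0], 't_start': 0.0, 't_stop': 2.5}
b = {'times': [1.0], 'events': [1], 't_start': 0.0, 't_stop': 1.5}
c = superpose(a, b)
# Events from both sequences are interleaved in time order.
```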

2.2.3 preprocess.DataOperation.aggregating

This function discretizes each event sequence into several bins and counts the number of events of each type in each bin. Its description is shown in Fig. 11.

Figure 11: The description of aggregate.
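The aggregation operation reduces to a per-type histogram over time bins; a minimal sketch (the helper aggregate is hypothetical, not the toolbox function):

```python
def aggregate(seq, num_types, num_bins):
    """Count events of each type per time bin.
    Returns a num_bins x num_types table of counts."""
    width = (seq['t_stop'] - seq['t_start']) / num_bins
    counts = [[0] * num_types for _ in range(num_bins)]
    for t, c in zip(seq['times'], seq['events']):
        # Clamp the last timestamp into the final bin.
        b = min(int((t - seq['t_start']) / width), num_bins - 1)
        counts[b][c] += 1
    return counts

seq = {'times': [0.2, 0.8, 1.5, 1.9], 'events': [0, 1, 0, 0],
       't_start': 0.0, 't_stop': 2.0}
table = aggregate(seq, num_types=2, num_bins=2)
```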

2.2.4 preprocess.DataOperation.EventSampler

This class is a subclass of torch.utils.data.Dataset, which samples batches from database. For each sample in the batch, an event (i.e., its event type and timestamp) and its history of length memorysize (i.e., the last memorysize events and their timestamps) are recorded. If event and/or sequence features are available, the sample records these features as well.

Figure 12: The description of EventSampler.
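Ignoring the torch.utils.data machinery, the core of such a sampler is extracting, for each event, its last memorysize predecessors, left-padded when the history is shorter than memorysize. A minimal sketch (the padding convention below, the sequence start time plus a dummy type -1, is an assumption for illustration):

```python
def event_with_history(seq, i, memorysize):
    """Return the i-th event and its last `memorysize` predecessors, left-padded."""
    lo = max(0, i - memorysize)
    hist_t = seq['times'][lo:i]
    hist_c = seq['events'][lo:i]
    pad = memorysize - len(hist_t)
    # Pad with the sequence start time and a dummy type index (-1).
    hist_t = [seq['t_start']] * pad + hist_t
    hist_c = [-1] * pad + hist_c
    return (seq['times'][i], seq['events'][i]), (hist_t, hist_c)

seq = {'times': [0.5, 1.0, 1.7], 'events': [0, 1, 0], 't_start': 0.0}
event, history = event_with_history(seq, i=2, memorysize=3)
```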

3 Temporal Point Process Models

3.1 Modular design of point process model

PoPPy applies a flexible strategy to build point processes' intensity functions from interpretable modules. Such a modular design strategy is very suitable for the Hawkes process and its variants. Fig. 13 illustrates the proposed modular design strategy. In the following sections, we take the Hawkes process and its variants as examples and introduce the modules (i.e., the classes) in PoPPy.

Figure 13: An illustration of the proposed modular design strategy. Each colored block represents a class with some functions. For each block, the dotted frame represents one of its subclasses, which inherits some functions (the white ones) while overriding others or creating new ones (the yellow ones). A black arrow means that the destination class takes an instance of the source class as input.

3.2 model.PointProcess.PointProcessModel

This class contains basic functions of a point process model, including

  • fit: learns the model's parameters given training data. Its description is shown in Fig. 14.

    Figure 14: The description of fit.
  • validation: tests the model given validation data. Its description is shown in Fig. 15.

    Figure 15: The description of validation.
  • simulation: simulates new event sequences from scratch or following observed sequences by Ogata's thinning algorithm ogata1981lewis. Its description is shown in Fig. 16.

    Figure 16: The description of simulate.
  • prediction: predicts the expected counts of events in the target time interval given the learned model and observed sequences. Its description is shown in Fig. 17.

    Figure 17: The description of predict.
  • model_save: saves the model or its parameters. Its description is shown in Fig. 18.

    Figure 18: The description of model_save.
  • model_load: loads the model or its parameters. Its description is shown in Fig. 19.

    Figure 19: The description of model_load.
  • print_info: prints basic information of the model.

  • plot_exogenous: plots the exogenous intensity.

In PoPPy, the instance of this class actually implements an inhomogeneous Poisson process, in which the exogenous intensity is used as the intensity function.

An important subclass of this class is model.HawkesProcess.HawkesProcessModel. This subclass inherits most of the functions above except print_info and plot_exogenous. Additionally, because the Hawkes process considers the triggering patterns among different event types, this subclass has a new function plot_causality, which plots the adjacency matrix of the event types' Granger causality graph. Typical visualization results of the exogenous intensities of different event types and the Granger causality among them are shown in Fig. 20.

(a) exogenous intensity
(b) Granger causality
Figure 20: Typical visualization results.

Compared with its parent class, model.HawkesProcess.HawkesProcessModel uses a specific intensity function, which is defined in the class model.HawkesProcess.HawkesProcessIntensity.

3.3 model.HawkesProcess.HawkesProcessIntensity

This class inherits from torch.nn.Module. It defines the intensity function of a generalized Hawkes process and contains the following functions:

  • print_info: prints the basic information of the intensity function.

  • intensity: calculates $\lambda_{c_i}(t_i)$ for the $i$-th sample in the batch sampled by EventSampler.

  • expected_counts: calculates $\mathbb{E}[dN_c(t)]$ for $c = 1, \ldots, C$ for the $i$-th sample in the batch.

  • forward: overrides the forward function in torch.nn.Module. It calculates $\lambda_{c_i}(t_i)$ and $\mathbb{E}[dN_c(t)]$ for $c = 1, \ldots, C$ for SGD.

Specifically, the intensity function of a type-$c$ event at time $t$ is defined as

$$\lambda_c(t) = g\Bigl(\mu_c + \sum_{t_i < t} \phi_{c c_i}(t - t_i)\Bigr). \quad (2)$$

Here, the intensity function consists of two parts:

  • Exogenous intensity $\mu_c$: it is independent of time and measures the intensity contributed by the intrinsic properties of the sequence and the event type.

  • Endogenous impact $\sum_{t_i < t} \phi_{c c_i}(t - t_i)$: it sums up the influences of historical events quantitatively via impact functions $\phi_{cc'}(t)$, and measures the intensity contributed by the historical observations.

Furthermore, the impact function is decomposed with the help of a basis representation, $\phi_{cc'}(t) = \sum_{m=1}^{M} a_{cc'}^{m} \kappa_m(t)$, where $\kappa_m(t)$ is called the $m$-th decay kernel and $a_{cc'}^{m}$ is the corresponding coefficient.

$g(\cdot)$ is an activation function, which can be

  • Identity: $g(x) = x$.

  • ReLU: $g(x) = \max(x, 0)$.

  • Softplus: $g(x) = \log(1 + \exp(x))$.
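Putting Eq. (2) together with a single exponential decay kernel and the softplus activation, the intensity of one event type can be evaluated numerically as follows (a sketch with invented parameter values, not PoPPy's classes):

```python
import math

def intensity(c, t, history, mu, A, w):
    """lambda_c(t) = softplus( mu[c] + sum_{t_i < t} A[c][c_i] * w * exp(-w (t - t_i)) ).
    One exponential decay kernel; softplus activation keeps the rate positive."""
    x = mu[c]
    for t_i, c_i in history:
        if t_i < t:
            x += A[c][c_i] * w * math.exp(-w * (t - t_i))
    return math.log1p(math.exp(x))  # softplus

mu = [0.1, 0.2]                 # exogenous intensities (invented values)
A = [[0.5, 0.1], [0.2, 0.3]]    # impact coefficients a_{cc'} (invented values)
history = [(0.5, 0), (1.0, 1)]  # observed (timestamp, event type) pairs
lam = intensity(0, t=1.5, history=history, mu=mu, A=A, w=1.0)
```

Recent events of strongly-coupled types raise the intensity; the contribution decays exponentially as $t - t_i$ grows.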

PoPPy provides multiple choices for implementing various intensity functions; each module can be parametrized in different ways.

3.3.1 model.ExogenousIntensity.BasicExogenousIntensity

This class and its subclasses in model.ExogenousIntensityFamily implement several models of exogenous intensity, as shown in Table 2.

 Class Formulation
ExogenousIntensity.BasicExogenousIntensity
ExogenousIntensityFamily.NaiveExogenousIntensity
ExogenousIntensityFamily.LinearExogenousIntensity
ExogenousIntensityFamily.NeuralExogenousIntensity
Table 2: Typical models of exogenous intensity.

Here, the activation function $g(\cdot)$ is defined as aforementioned.

Note that the last two models require event and sequence features as input. If they are called while the features are not given, PoPPy will add one more embedding layer to generate event/sequence features from their indices and learn this layer during training.

3.3.2 model.EndogenousImpact.BasicEndogenousImpact

This class and its subclasses in model.EndogenousImpactFamily implement several models of the coefficients of the impact function, as shown in Table 3.

 Class Formulation
EndogenousImpact.BasicEndogenousImpact
EndogenousImpactFamily.NaiveEndogenousImpact
EndogenousImpactFamily.FactorizedEndogenousImpact
EndogenousImpactFamily.LinearEndogenousImpact
EndogenousImpactFamily.BiLinearEndogenousImpact
Table 3: Typical models of endogenous impact’s coefficient.

Here, the activation function $g(\cdot)$ is defined as aforementioned.

Note that the last two models require event and sequence features as input. If they are called while the features are not given, PoPPy will add one more embedding layer to generate event/sequence features from their indices and learn this layer during training.

3.3.3 model.DecayKernel.BasicDecayKernel

This class and its subclasses in model.DecayKernelFamily implement several models of the decay kernel, as shown in Table 4.

 Class Formulation Number of bases
DecayKernelFamily.ExponentialKernel zhou2013learning 1
DecayKernelFamily.RayleighKernel 1
DecayKernelFamily.GaussianKernel 1
DecayKernelFamily.PowerlawKernel zhao2015seismic 1
DecayKernelFamily.GateKernel 1
DecayKernelFamily.MultiGaussKernel xu2016learning >1
Table 4: Typical models of decay kernel.

Fig. 21 visualizes some examples.

(a) Exponential kernel
(b) Rayleigh kernel
(c) Gaussian kernel
(d) Powerlaw kernel
(e) Gate kernel
(f) Multi-Gaussian kernel
Figure 21: Examples of decay kernels and their integration values.

4 Learning Algorithm

4.1 Loss functions

With the help of PyTorch, PoPPy learns the point process models above efficiently by stochastic gradient descent on CPU or GPU mei2017neural.¹

¹ Currently, the GPU version is under development.

Different from existing point process toolboxes, which mainly focus on the maximum likelihood estimation of point process models, PoPPy integrates three loss functions for learning the models, as shown in Table 5.

 Maximum Likelihood Estimation zhou2013learning ; xu2016learning
 - Class: OtherLayers.MaxLogLike
 - Formulation:
 Least Square Estimation xu2018benefits ; xu2018online
 - Class: OtherLayers.LeastSquare
 - Formulation:
 Conditional Likelihood Estimation xu2017patient
 - Class: OtherLayers.CrossEntropy
 - Formulation:
Table 5: A list of loss functions.

Here, $\mathcal{H}_i$ denotes the history before $t_i$, and $\boldsymbol{e}_{c_i}$ is a one-hot vector whose $c_i$-th element is 1.
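For intuition about the maximum likelihood loss, recall that the negative log-likelihood of one sequence is $-\sum_i \log \lambda_{c_i}(t_i) + \sum_c \int_0^T \lambda_c(s)\,ds$. For a homogeneous multivariate Poisson process the integral has a closed form, which gives a compact sketch (illustrative, not the OtherLayers.MaxLogLike module):

```python
import math

def poisson_nll(times, events, rates, T):
    """Negative log-likelihood of a homogeneous multivariate Poisson process:
    -sum_i log(rates[c_i]) + T * sum_c rates[c]."""
    ll = sum(math.log(rates[c]) for c in events)
    compensator = T * sum(rates)  # closed-form integral of constant intensities
    return -ll + compensator

nll = poisson_nll(times=[0.5, 1.0, 1.7], events=[0, 1, 0],
                  rates=[2.0, 1.0], T=2.0)
```

For a Hawkes process, the same two terms appear, but the compensator integral is evaluated with the decay kernels' integrals instead.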

4.2 Stochastic gradient descent

All the optimizers and learning rate schedulers integrated in PyTorch are applicable to PoPPy. A typical configuration is Adam with an exponential learning rate decay strategy, which achieves good learning results in most situations. Details can be found in the demo scripts in the folder example.

Trick: Although most optimizers are applicable, Adam generally achieves the best performance in our experiments mei2017neural.
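The exponential decay strategy multiplies the learning rate by a fixed factor each epoch, mirroring what torch.optim.lr_scheduler.ExponentialLR does; the schedule itself is just:

```python
def exponential_lr(lr0, gamma, epoch):
    """Learning rate after `epoch` decay steps: lr0 * gamma**epoch."""
    return lr0 * gamma ** epoch

# E.g., initial rate 0.01 decayed by 0.9 per epoch (invented values).
schedule = [exponential_lr(0.01, 0.9, e) for e in range(3)]
```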

4.3 Optional regularization

Besides the L2-norm regularizer integrated in PyTorch's optimizers, PoPPy provides two more regularizers for learning models.

  1. Sparsity: An L1-norm regularizer on the model's parameters can be applied, which helps to learn structured, sparse parameters.

  2. Nonnegativeness: If required, PoPPy can constrain the parameters to be nonnegative during training.
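Both regularizers are simple to state: the sparsity term adds $\eta\|\theta\|_1$ to the loss, and nonnegativeness can be enforced by projecting the parameters onto $[0, \infty)$ after each gradient step. A generic sketch (not PoPPy's fit internals):

```python
def l1_penalty(params, eta):
    """Sparsity regularizer: eta * sum of absolute parameter values."""
    return eta * sum(abs(p) for p in params)

def project_nonnegative(params):
    """Clamp parameters to be nonnegative after a gradient step."""
    return [max(p, 0.0) for p in params]

params = [0.5, -0.2, 1.0]
reg = l1_penalty(params, eta=0.1)      # 0.1 * (0.5 + 0.2 + 1.0)
params = project_nonnegative(params)   # the negative entry is clamped to 0.0
```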

Trick: When the activation function of impact coefficient is softplus, you’d better close the nonnegative constraint by setting the input nonnegative of the function fit as None.

5 Examples

As a result, using PoPPy, users can build their own point process models by combining different modules with high flexibility. As shown in Fig. 22, each point process model is built by selecting different modules and combining them. The red dots represent modules with learnable parameters, the blue dots represent modules without parameters, and the green dots represent loss function modules. Moreover, users can easily add their own modules and design specific point process models for their applications, as long as the new classes override the corresponding functions.

Figure 22: Illustration of the construction of a point process model.

Finally, we list some typical models implemented by PoPPy in Table 6.²

² It should be noted that our implementations may differ from the methods in the references in terms of model and learning algorithm, so the results in the references may not be reproduced by PoPPy.

 Model Linear Hawkes process zhou2013learning
 Exogenous Intensity NaiveExogenousIntensity
 Endogenous Impact NaiveEndogenousImpact
 Decay Kernel ExponentialKernel
 Activation Identity
 Loss MaxLogLike
 Model Linear Hawkes process xu2016learning ; xu2018superposition
 Exogenous Intensity NaiveExogenousIntensity
 Endogenous Impact NaiveEndogenousImpact
 Decay Kernel MultiGaussKernel
 Activation Identity
 Loss MaxLogLike
 Model Linear Hawkes process xu2018benefits
 Exogenous Intensity NaiveExogenousIntensity
 Endogenous Impact NaiveEndogenousImpact
 Decay Kernel MultiGaussKernel
 Activation Identity
 Loss LeastSquares
 Model Factorized point process xu2018online
 Exogenous Intensity LinearExogenousIntensity
 Endogenous Impact FactorizedEndogenousImpact
 Decay Kernel ExponentialKernel
 Activation Identity
 Loss LeastSquares
 Model Semi-Parametric Hawkes process engelhard2018predicting
 Exogenous Intensity LinearExogenousIntensity
 Endogenous Impact NaiveEndogenousImpact
 Decay Kernel MultiGaussKernel
 Activation Identity
 Loss MaxLogLike
 Model Parametric self-correcting process xu2015trailer
 Exogenous Intensity LinearExogenousIntensity
 Endogenous Impact LinearEndogenousImpact
 Decay Kernel GateKernel
 Activation Softplus
 Loss MaxLogLike
 Model Mutually-correcting process xu2017patient
 Exogenous Intensity LinearExogenousIntensity
 Endogenous Impact LinearEndogenousImpact
 Decay Kernel GaussianKernel
 Activation Softplus
 Loss CrossEntropy
Table 6: Typical models implemented by PoPPy.

References

  • (1) M. Engelhard, H. Xu, L. Carin, J. A. Oliver, M. Hallyburton, and F. J. McClernon. Predicting smoking events with a time-varying semi-parametric Hawkes process model. arXiv preprint arXiv:1809.01740, 2018.
  • (2) H. Mei and J. M. Eisner. The neural Hawkes process: A neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, pages 6754–6764, 2017.
  • (3) Y. Ogata. On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.
  • (4) H. Xu, L. Carin, and H. Zha. Learning registered point processes from idiosyncratic observations. In International Conference on Machine Learning, 2018.
  • (5) H. Xu, X. Chen, and L. Carin. Superposition-assisted stochastic optimization for Hawkes processes. arXiv preprint arXiv:1802.04725, 2018.
  • (6) H. Xu, M. Farajtabar, and H. Zha. Learning Granger causality for Hawkes processes. In International Conference on Machine Learning, pages 1717–1726, 2016.
  • (7) H. Xu, D. Luo, and L. Carin. Online continuous-time tensor factorization based on pairwise interactive point processes. In Proceedings of the 27th International Conference on Artificial Intelligence. AAAI Press, 2018.
  • (8) H. Xu, D. Luo, X. Chen, and L. Carin. Benefits from superposed Hawkes processes. In International Conference on Artificial Intelligence and Statistics, pages 623–631, 2018.
  • (9) H. Xu, D. Luo, and H. Zha. Learning Hawkes processes from short doubly-censored event sequences. In International Conference on Machine Learning, pages 3831–3840, 2017.
  • (10) H. Xu, W. Wu, S. Nemati, and H. Zha. Patient flow prediction via discriminative learning of mutually-correcting processes. IEEE Transactions on Knowledge and Data Engineering, 29(1):157–171, 2017.
  • (11) H. Xu, Y. Zhen, and H. Zha. Trailer generation via a point process-based visual attractiveness model. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 2198–2204. AAAI Press, 2015.
  • (12) Q. Zhao, M. A. Erdogdu, H. Y. He, A. Rajaraman, and J. Leskovec. SEISMIC: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1513–1522. ACM, 2015.
  • (13) K. Zhou, H. Zha, and L. Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In Artificial Intelligence and Statistics, pages 641–649, 2013.