PoPPy
A Point Process Toolbox Based on PyTorch
PoPPy is a Point Process toolbox based on PyTorch, which enables flexible design and efficient learning of point process models. It can be used for interpretable sequential data modeling and analysis, e.g., Granger causality analysis of multi-variate point processes, and point process-based simulation and prediction of event sequences. In practice, the key points of point process-based sequential data modeling include: 1) How to design intensity functions to describe the mechanism behind observed data? 2) How to learn the proposed intensity functions from observed data? The goal of PoPPy is to provide a user-friendly solution to the key points above and to achieve large-scale point process-based sequential data analysis, simulation and prediction.
Many real-world sequential data are generated by complicated interactive mechanisms among multiple entities. Treating the entities as events with different discrete categories, we can represent their sequential behaviors as event sequences in the continuous time domain. Mathematically, an event sequence can be denoted as $s=\{(t_i, c_i)\}_{i=1}^{I}$, where $t_i$ and $c_i$ are the timestamp and the event type (i.e., the index of the entity) of the $i$-th event, respectively. Optionally, each event type $c$ may be associated with a feature vector $\boldsymbol{f}_c$, $c\in\mathcal{C}$, and each event sequence $s$ may also have a feature vector $\boldsymbol{f}_s$, $s\in\mathcal{S}$. Many real-world scenarios can be formulated as event sequences, as shown in Table 1.

Scene | Patient admission | Job hopping | Online shopping |
---|---|---|---|
Entities (Event types) | Diseases | Companies | Items |
Sequences | Patients’ admission records | LinkedIn users’ job history | Buying/rating behaviors |
Event feature | Diagnosis records | Job descriptions | Item profiles |
Sequence feature | Patient profiles | User profiles | User profiles |
Task | Building disease networks | Modeling talent flow | Recommendation systems |
Given a set of event sequences $\mathcal{S}=\{s_n\}_{n=1}^{N}$, we aim to model the dynamics of the event sequences, capture the interactive mechanisms among different entities, and predict their future behaviors. The temporal point process model provides us with a potential solution to achieve these aims. In particular, a multi-variate temporal point process can be represented by a set of counting processes $\{N_c(t)\}_{c\in\mathcal{C}}$, in which $N_c(t)$ is the number of type-$c$ events occurring till time $t$. For each $c$, the expected instantaneous happening rate of type-$c$ events at time $t$ is denoted as $\lambda_c(t)$, which is called the "intensity function":

$$\lambda_c(t) = \frac{\mathbb{E}[\mathrm{d}N_c(t)\mid \mathcal{H}_t]}{\mathrm{d}t}, \tag{1}$$

where $\mathcal{H}_t=\{(t_i,c_i)\mid t_i<t\}$ represents the historical observations before time $t$.
As shown in Fig. 1, the counting processes can be represented as a set of intensity functions, each of which corresponds to a specific event type. The temporal dependency within the same event type and that across different event types (i.e., the red arrows in Fig. 1) can be captured by choosing particular intensity functions. Therefore, the key points of point process-based sequential data modeling include:
How to design intensity functions to describe the mechanism behind observed data?
How to learn the proposed intensity functions from observed data?
The goal of PoPPy is to provide a user-friendly solution to the key points above and to achieve large-scale point process-based sequential data analysis, simulation and prediction.
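For intuition about what an intensity function is, a univariate Hawkes intensity with an exponential decay kernel can be evaluated in a few lines (a standalone illustrative sketch with made-up parameters, not part of PoPPy's API):

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.0):
    """Univariate Hawkes intensity with an exponential decay kernel:
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)

# Each past event adds excitation that decays with the elapsed time.
print(hawkes_intensity(4.0, [1.0, 2.5, 3.0]))
```

With an empty history the intensity reduces to the baseline rate mu; each observed event raises it temporarily.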
PoPPy is developed on Mac OS 10.13.6 and has also been tested on Ubuntu 16.04. The installation of PoPPy is very simple. In particular:
Install Anaconda3 and create a conda environment.
Install PyTorch 0.4 in the environment.
Download PoPPy from https://github.com/HongtengXu/PoPPy/ and unzip it to the directory in the environment. The unzipped folder should contain several subfolders, as shown in Fig. 2.
Open dev/util.py and change POPPY_PATH to the directory, as shown in Fig. 3.
The subfolders in the package include
data: It contains a toy dataset in .csv format.
dev: It contains a util.py file, which configures the path and the logger of the package.
docs: It contains the tutorial of PoPPy.
example: It contains some demo scripts for testing the functionality of the package.
model: It contains the classes of predefined point process models and their modules.
output: It contains the output files generated by the demo scripts in the example folder.
preprocess: It contains the classes and the functions of data I/O and preprocessing.
In the following sections, we will introduce the details of PoPPy.
PoPPy represents observed event sequences as a nested dictionary. In particular, the proposed database has the following structure:
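For intuition, the nested dictionary might look like the following sketch. Only the keys 'sequences', 'seq_feature', 't_start', and 't_stop' are grounded in this tutorial; the remaining keys ('times', 'events', 'event_features') are illustrative assumptions, not the exact schema:

```python
# Illustrative sketch of the nested database dictionary; field names beyond
# 'sequences', 'seq_feature', 't_start', and 't_stop' are assumptions.
database = {
    'event_features': None,                # optional features of event types
    'sequences': [
        {
            'times': [23.0, 26.5, 30.0],   # event timestamps
            'events': [0, 2, 1],           # event-type indices
            'seq_feature': None,           # optional feature of the sequence
            't_start': 23.0,
            't_stop': 30.0,
        },
    ],
}
```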
PoPPy provides three functions to load data from .csv files and convert them to the proposed database.
This function loads event sequences and converts them to the proposed database. The IO and the description of this function are shown in Fig. 4.
For example, the Linkedin.csv file in the folder data records a set of LinkedIn users' job hopping behaviors among different companies, whose format is shown in Fig. 5.
Here, the column id corresponds to the names of the sequences (i.e., the indices of the users), the column time corresponds to the timestamps of the events (i.e., the ages at which the users start to work), and the column event corresponds to the event types (i.e., the companies). Therefore, we can define the input domain_names accordingly and call database = load_sequences_csv('Linkedin.csv', domain_names).
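One way to define this column mapping is sketched below; the dictionary keys 'seq_id', 'time', and 'event' are assumptions for illustration, not verified against load_sequences_csv's actual signature:

```python
# Hypothetical column mapping for Linkedin.csv; the key names are assumptions.
domain_names = {
    'seq_id': 'id',    # column holding sequence names (user indices)
    'time': 'time',    # column holding event timestamps (starting ages)
    'event': 'event',  # column holding event types (companies)
}
# database = load_sequences_csv('Linkedin.csv', domain_names)
```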
Note that the database created by load_sequences_csv() does not contain event features or sequence features, whose values in database are None. PoPPy supports loading categorical or numerical features from .csv files, as shown below.
This function loads sequence features from a .csv file and imports them into the proposed database. The IO and the description of this function are shown in Fig. 6.
Take the Linkedin.csv file as an example. Suppose that we have already created database via the function load_sequences_csv, and we want to take the column option1 (i.e., the job titles that each user has held) as the categorical features of the event sequences. We should have
Here the input normalize is set to its default value, which means that the features in database['sequences'][i]['seq_feature'], for all i, are not normalized.
This function loads event features from a .csv file and imports them into the proposed database. The IO and the description of this function are shown in Fig. 7.
Similarly, if we want to take the column option1 in Linkedin.csv as the categorical features of event types, we should have
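The corresponding calls might look like the following hypothetical sketch; the function names load_seq_features_csv and load_event_features_csv and their argument layout are assumptions for illustration, not verified API:

```python
# Hypothetical feature-loading calls; names and arguments are assumptions.
feature_columns = {'option1': 'categorical'}  # job titles as categorical features
# database = load_seq_features_csv('Linkedin.csv', feature_columns, database)
# database = load_event_features_csv('Linkedin.csv', feature_columns, database)
```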
Besides basic sequence/feature loaders and converters mentioned above, PoPPy contains multiple useful functions and classes for data preprocessing, including sequence stitching, superposing, aggregating and batch sampling. Fig. 8 illustrates the corresponding data operations.
This function stitches the sequences in two databases randomly or based on their seq_feature and time information (t_start, t_stop). Its description is shown in Fig. 9.
When method = 'random', for each sequence in database1 the function randomly selects a sequence in database2 as its follower and stitches them together. When method = 'feature', the similarity between a sequence in database1 and one in database2 is defined as the product of a temporal Gaussian kernel and a sequence-feature Gaussian kernel, and the function selects the sequence in database2 according to a distribution defined by this similarity. The stitching method has been proven useful for enhancing the robustness of learning results, especially when the training sequences are very short xu2017learning ; xu2018learning .
This function superposes the sequences in two databases randomly or based on their seq_feature and time information (t_start, t_stop). Its description is shown in Fig. 10.
When method = 'random', for each sequence in database1 the function randomly selects a sequence in database2 and superposes them. When method = 'feature', the similarity between a sequence in database1 and one in database2 is defined as the product of a temporal Gaussian kernel and a sequence-feature Gaussian kernel, and the function selects the sequence in database2 according to a distribution defined by this similarity.
Similar to the stitching operation, the superposing method has been proven useful for learning linear Hawkes processes robustly. However, it should be noted that, unlike the stitching operation, which stitches similar sequences with high probability, the superposing operation tends to superpose dissimilar sequences with high probability. The rationale of such an operation can be found in my papers xu2018benefits ; xu2018superposition .

This function discretizes each event sequence into several bins and counts the number of events of specific types in each bin. Its description is shown in Fig. 11.
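This counting step can be sketched with NumPy (a standalone sketch of the binning logic, not PoPPy's implementation; the helper name aggregate is hypothetical):

```python
import numpy as np

def aggregate(times, events, num_types, t_start, t_stop, num_bins):
    """Count events of each type in equal-width bins over [t_start, t_stop]."""
    edges = np.linspace(t_start, t_stop, num_bins + 1)
    counts = np.zeros((num_types, num_bins), dtype=int)
    for c in range(num_types):
        ts = [t for t, e in zip(times, events) if e == c]
        counts[c], _ = np.histogram(ts, bins=edges)  # per-bin counts of type c
    return counts

counts = aggregate([0.5, 1.2, 1.8, 2.5], [0, 1, 0, 1],
                   num_types=2, t_start=0.0, t_stop=3.0, num_bins=3)
print(counts)
```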
This class is a subclass of torch.utils.data.Dataset, which samples batches from the database. For each sample in the batch, an event (i.e., its event type and timestamp) and its history with length memorysize (i.e., the last memorysize events and their timestamps) are recorded. If event and/or sequence features are available, the sample records these features as well.
PoPPy applies a flexible strategy to build a point process's intensity functions from interpretable modules. Such a modular design strategy is very suitable for the Hawkes process and its variants. Fig. 13 illustrates the proposed modular design strategy. In the following sections, we take the Hawkes process and its variants as examples and introduce the modules (i.e., the classes) in PoPPy.
This class contains basic functions of a point process model, including
fit: learn the model's parameters given training data. Its description is shown in Fig. 14.
validation: test the model given validation data. Its description is shown in Fig. 15.
simulation: simulate new event sequences from scratch or following observed sequences by Ogata's thinning algorithm ogata1981lewis . Its description is shown in Fig. 16.
prediction: predict the expected counts of events in the target time interval given the learned model and observed sequences. Its description is shown in Fig. 17.
model_save: save the model or its parameters. Its description is shown in Fig. 18.
model_load: load the model or its parameters. Its description is shown in Fig. 19.
print_info: print the basic information of the model.
plot_exogenous: plot the exogenous intensity.
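The simulation function above relies on Ogata's thinning algorithm. A minimal univariate version can be sketched as follows (a standalone sketch with assumed parameters, not PoPPy's implementation):

```python
import math
import random

def thinning_hawkes(mu, alpha, beta, t_max, seed=0):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))
    on [0, t_max] via Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < t_max:
        # The intensity is non-increasing between events, so its value just
        # after time t upper-bounds it until the next event.
        lam_bar = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)     # candidate from the bounding process
        if t >= t_max:
            break
        lam_t = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:
            events.append(t)              # accept the candidate point
    return events

events = thinning_hawkes(mu=0.5, alpha=0.4, beta=1.0, t_max=50.0)
```

Rejected candidates advance the clock but leave no event, which is what makes the accepted points follow the target intensity.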
In PoPPy, the instance of this class actually implements an inhomogeneous Poisson process, in which the exogenous intensity is used as the intensity function.
An important subclass of this class is model.HawkesProcess.HawkesProcessModel. This subclass inherits most of the functions above except print_info and plot_exogenous. Additionally, because the Hawkes process considers the triggering patterns among different event types, this subclass has a new function plot_causality, which plots the adjacency matrix of the event types' Granger causality graph. The typical visualization results of the exogenous intensity of different event types and the Granger causality among them are shown in Fig. 20.
Compared with its parent class, model.HawkesProcess.HawkesProcessModel uses a specific intensity function, which is defined in the class model.HawkesProcess.HawkesProcessIntensity.
This class inherits the functions in torch.nn.Module. It defines the intensity function of a generalized Hawkes process, which contains the following functions:
print_info: print the basic information of the intensity function.
intensity: calculate the intensity $\lambda_{c_i}(t_i)$ of the $i$-th sample in the batch sampled by EventSampler.
expected_counts: calculate the expected event counts $\mathbb{E}[N_c(t)]$ for each $c\in\mathcal{C}$ for the $i$-th sample in the batch.
forward: override the forward function in torch.nn.Module. It calculates $\lambda_{c_i}(t_i)$ and the expected counts of all event types for SGD.
Specifically, the intensity function of type-$c$ events at time $t$ is defined as

$$\lambda_c(t) = g\big(\mu_c\big) + g\Big(\sum_{t_i<t} \phi_{c c_i}(t - t_i)\Big), \tag{2}$$

where $g(\cdot)$ is an activation function (e.g., identity or softplus). Here, the intensity function consists of two parts:

Exogenous intensity $\mu_c$: it is independent of time and measures the intensity contributed by the intrinsic properties of the sequence and the event type.

Endogenous impact $\sum_{t_i<t}\phi_{cc_i}(t-t_i)$: it sums up the influences of historical events quantitatively via the impact functions $\phi_{cc'}(t)$, and measures the intensity contributed by the historical observations.

Furthermore, the impact function is decomposed with the help of a basis representation, $\phi_{cc'}(t) = \sum_{m=1}^{M} a_{cc'}^{m}\,\kappa_m(t)$, where $\kappa_m(t)$ is called the $m$-th decay kernel and $a_{cc'}^{m}$ is the corresponding coefficient.
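This decomposed intensity can be sketched in PyTorch (a standalone sketch using exponential decay kernels as the basis and an assumed softplus activation; not PoPPy's module structure):

```python
import torch

def intensity(t, times, events, mu, A, betas, g=torch.nn.functional.softplus):
    """lambda_c(t) = g(mu_c) + g(sum_i sum_m A[m, c, c_i] * exp(-betas[m] * (t - t_i))).
    A has shape (M, C, C): basis coefficients a^m_{c c'}; betas has shape (M,)."""
    C = mu.shape[0]
    endo = torch.zeros(C)
    for ti, ci in zip(times, events):
        if ti < t:
            decay = torch.exp(-betas * (t - ti))           # kappa_m(t - t_i), (M,)
            endo += (A[:, :, ci] * decay[:, None]).sum(0)  # sum over basis index m
    return g(mu) + g(endo)

C, M = 3, 2
mu = torch.zeros(C)
A = 0.1 * torch.ones(M, C, C)
betas = torch.tensor([1.0, 2.0])
lam = intensity(4.0, [1.0, 2.5], [0, 2], mu, A, betas)
```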
PoPPy provides multiple choices to implement various intensity functions: each module can be parametrized in different ways.
This class and its subclasses in model.ExogenousIntensityFamily implement several models of the exogenous intensity, as shown in Table 2.
Class | Formulation |
---|---|
ExogenousIntensity.BasicExogenousIntensity | |
ExogenousIntensityFamily.NaiveExogenousIntensity | |
ExogenousIntensityFamily.LinearExogenousIntensity | |
ExogenousIntensityFamily.NeuralExogenousIntensity |
Here, the activation function is defined as aforementioned .
Note that the last two models require event and sequence features as input. When they are called while the features are not given, PoPPy will add one more embedding layer to generate event/sequence features from their indices, and learn this layer during training.
This class and its subclasses in model.EndogenousImpactFamily implement several models of the coefficients of the impact functions, as shown in Table 3.
Class | Formulation |
---|---|
EndogenousImpact.BasicEndogenousImpact | |
EndogenousImpactFamily.NaiveEndogenousImpact | |
EndogenousImpactFamily.FactorizedEndogenousImpact | |
EndogenousImpactFamily.LinearEndogenousImpact | |
EndogenousImpactFamily.BiLinearEndogenousImpact |
Here, the activation function is defined as aforementioned .
Note that the last two models require event and sequence features as input. When they are called while the features are not given, PoPPy will add one more embedding layer to generate event/sequence features from their indices, and learn this layer during training.
This class and its subclasses in model.DecayKernelFamily implement several models of the decay kernel, as shown in Table 4.
Class | Formulation | |
---|---|---|
DecayKernelFamily.ExponentialKernel zhou2013learning | 1 | |
DecayKernelFamily.RayleighKernel | 1 | |
DecayKernelFamily.GaussianKernel | 1 | |
DecayKernelFamily.PowerlawKernel zhao2015seismic | 1 | |
DecayKernelFamily.GateKernel | 1 | |
DecayKernelFamily.MultiGaussKernel xu2016learning | >1 |
Fig. 21 visualizes some examples.
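The kernels in Table 4 differ mainly in how the triggering effect decays over time. For intuition, three common decay shapes can be sketched as follows (illustrative parameterizations with normalization constants omitted; not PoPPy's exact formulas):

```python
import math

def exponential_kernel(t, w=1.0):
    return w * math.exp(-w * t)                 # fast, memoryless decay

def rayleigh_kernel(t, w=1.0):
    return w * t * math.exp(-0.5 * w * t ** 2)  # rises first, then decays

def powerlaw_kernel(t, w=1.0, c=1.0):
    return w / (t + c) ** (1.0 + w)             # heavy-tailed decay

print(exponential_kernel(1.0), rayleigh_kernel(1.0), powerlaw_kernel(1.0))
```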
With the help of PyTorch, PoPPy learns the point process models above efficiently by stochastic gradient descent on CPU or GPU mei2017neural .^{1}

^{1}Currently, the GPU version is under development.

Different from existing point process toolboxes, which mainly focus on the maximum likelihood estimation of point process models, PoPPy integrates three loss functions to learn the models, as shown in Table 5.

Maximum Likelihood Estimation zhou2013learning ; xu2016learning |
- Class: OtherLayers.MaxLogLike |
- Formulation: $-\sum_{i} \log\lambda_{c_i}(t_i) + \sum_{c\in\mathcal{C}}\int_{0}^{T}\lambda_c(s)\,\mathrm{d}s$ |
Least Square Estimation xu2018benefits ; xu2018online |
- Class: OtherLayers.LeastSquare |
- Formulation: |
Conditional Likelihood Estimation xu2017patient |
- Class: OtherLayers.CrossEntropy |
- Formulation: |
Here, the target of the cross-entropy loss is a one-hot vector whose $c_i$-th element is 1.
All the optimizers and the learning rate schedulers integrated in PyTorch are applicable to PoPPy. A typical configuration is Adam with an exponential learning-rate decay strategy, which should achieve good learning results in most situations. The details can be found in the demo scripts in the folder example.
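This typical configuration can be sketched with standard PyTorch APIs (the model and loss below are placeholders, not a PoPPy class):

```python
import torch

# Adam with exponential learning-rate decay, the configuration suggested above.
model = torch.nn.Linear(4, 1)  # stand-in for a point process model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # lr <- lr * gamma after each epoch
```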
Trick: Although most optimizers are applicable, Adam generally achieves the best performance in our experiments mei2017neural .
Besides the L2-norm regularizer integrated in the optimizers of PyTorch, PoPPy provides two more regularizers when learning models.
Sparsity: an L1-norm regularizer on the model's parameters can be applied, which helps to learn sparse, structured parameters.
Nonnegativeness: if required, PoPPy can ensure that the parameters stay nonnegative during training.
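These two mechanisms can be sketched generically in PyTorch as an L1 penalty added to the loss plus a nonnegativity projection after each update (a generic pattern, not PoPPy's internal code):

```python
import torch

# Generic sketch: L1 (sparsity) penalty + nonnegativity projection.
params = torch.nn.Parameter(torch.randn(5))
optimizer = torch.optim.SGD([params], lr=0.1)

optimizer.zero_grad()
loss = (params ** 2).sum() + 0.01 * params.abs().sum()  # fit term + L1 penalty
loss.backward()
optimizer.step()
with torch.no_grad():
    params.clamp_(min=0.0)  # project parameters back to the nonnegative orthant
```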
Trick: When the activation function of the impact coefficients is softplus, you should disable the nonnegativity constraint by setting the input nonnegative of the function fit to None.
As a result, using PoPPy, users can build their own point process models by combining different modules with high flexibility. As shown in Fig. 22, each point process model can be built by selecting different modules and combining them together. The red dots represent modules with learnable parameters, the blue dots represent modules without parameters, and the green dots represent loss-function modules. Moreover, users can easily add their own modules and design specific point process models for their applications, as long as the new classes override the corresponding functions.
Finally, we list some typical models implemented by PoPPy in Table 6.^{2}

^{2}It should be noted that our implementations may differ from the methods in the references in terms of model and learning algorithm, so the results in the references may not be reproducible with PoPPy.
Model | Linear Hawkes process zhou2013learning |
Exogenous Intensity | NaiveExogenousIntensity |
Endogenous Impact | NaiveEndogenousImpact |
Decay Kernel | ExponentialKernel |
Activation | Identity |
Loss | MaxLogLike |
Model | Linear Hawkes process xu2016learning ; xu2018superposition |
Exogenous Intensity | NaiveExogenousIntensity |
Endogenous Impact | NaiveEndogenousImpact |
Decay Kernel | MultiGaussKernel |
Activation | Identity |
Loss | MaxLogLike |
Model | Linear Hawkes process xu2018benefits |
Exogenous Intensity | NaiveExogenousIntensity |
Endogenous Impact | NaiveEndogenousImpact |
Decay Kernel | MultiGaussKernel |
Activation | Identity |
Loss | LeastSquare |
Model | Factorized point process xu2018online |
Exogenous Intensity | LinearExogenousIntensity |
Endogenous Impact | FactorizedEndogenousImpact |
Decay Kernel | ExponentialKernel |
Activation | Identity |
Loss | LeastSquare |
Model | Semi-Parametric Hawkes process engelhard2018predicting |
Exogenous Intensity | LinearExogenousIntensity |
Endogenous Impact | NaiveEndogenousImpact |
Decay Kernel | MultiGaussKernel |
Activation | Identity |
Loss | MaxLogLike |
Model | Parametric self-correcting process xu2015trailer |
Exogenous Intensity | LinearExogenousIntensity |
Endogenous Impact | LinearEndogenousImpact |
Decay Kernel | GateKernel |
Activation | Softplus |
Loss | MaxLogLike |
Model | Mutually-correcting process xu2017patient |
Exogenous Intensity | LinearExogenousIntensity |
Endogenous Impact | LinearEndogenousImpact |
Decay Kernel | GaussianKernel |
Activation | Softplus |
Loss | CrossEntropy |
International Conference on Machine Learning, 2018.

Online continuous-time tensor factorization based on pairwise interactive point processes. In Proceedings of the 27th International Conference on Artificial Intelligence. AAAI Press, 2018.