## 1 Introduction

There is a growing interest in modelling and understanding temporal dyadic interaction data. Temporal interaction data take the form of time-stamped triples $(t_k, i_k, j_k)$, $k = 1, 2, \dots$, indicating that an interaction occurred between individuals $i_k$ and $j_k$ at time $t_k$. Interactions may be directed or undirected. Examples of such interaction data include commenting on a post on an online social network, exchanging an email, or meeting in a coffee shop. An important challenge is to understand the structure that underpins these interactions. To do so, it is important to develop statistical network models with interpretable parameters that capture the properties observed in real social interaction data.

One important aspect to capture is the community structure of the interactions. Individuals are often affiliated with latent communities (e.g. work, sport), and their affiliations determine their interactions: they are more likely to interact with individuals sharing the same interests than with individuals affiliated with different communities. Another important aspect is reciprocity: many events are responses to recent events in the opposite direction. For example, if Helen sends an email to Mary, then Mary is more likely to send an email to Helen shortly afterwards. A number of papers have proposed statistical models to capture both community structure and reciprocity in temporal interaction data (Blundell et al., 2012; Dubois et al., 2013; Linderman and Adams, 2014). They use models based on Hawkes processes to capture reciprocity, and stochastic block models or latent feature models to capture community structure.

In addition to these two properties, it is important to capture the global properties of the interaction data. Interaction data are often sparse: only a small fraction of the pairs of nodes actually interact. They also typically exhibit high heterogeneity in the degree (the number of interactions per node): some individuals have a large number of interactions, whereas most individuals have very few, resulting in heavy-tailed empirical degree distributions. As shown by Karrer and Newman (2011), Gopalan et al. (2013) and Todeschini et al. (2016), failing to account explicitly for degree heterogeneity in the model can have devastating consequences on the estimation of the latent structure.

Recently, two classes of statistical models based on random measures have been proposed to capture sparsity and power-law degree distributions in network data. The first is the class of models based on exchangeable random measures (Caron and Fox, 2017; Veitch and Roy, 2015; Herlau et al., 2016; Borgs et al., 2018; Todeschini et al., 2016; Palla et al., 2016; Janson, 2017a). The second is the class of edge-exchangeable models (Crane and Dempsey, 2015; 2018; Cai et al., 2016; Williamson, 2016; Janson, 2017b; Ng and Silva, 2017). Both classes can handle sparse as well as dense networks and, although the two constructions are different, connections have been highlighted between the two approaches (Cai et al., 2016; Janson, 2017b).

The objective of this paper is to propose a class of statistical models for temporal dyadic interaction data that captures all the desired properties mentioned above, which are often found in real-world interactions: sparsity, degree heterogeneity, community structure and reciprocity. Combining all these properties in a single model is non-trivial, and to our knowledge no such construction exists. The proposed model generalises existing reciprocating relationships models (Blundell et al., 2012) to the sparse and power-law regime. It can also be seen as a natural extension of both the models based on exchangeable random measures and the edge-exchangeable models, and it shares properties of both families. The approach is shown to outperform alternative models for link prediction on a variety of temporal network datasets.

The construction is based on Hawkes processes and the (static) model of Todeschini et al. (2016) for sparse and modular graphs with overlapping community structure. In Section 2, we present Hawkes processes and compound completely random measures which form the basis of our model’s construction. The statistical model for temporal dyadic data is presented in Section 3 and its properties derived in Section 4. The inference algorithm is described in Section 5. Section 6 presents experiments on four real-world temporal interaction datasets.

## 2 Background material

### 2.1 Hawkes processes

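Let $(t_k)_{k \ge 1}$ be a sequence of event times with $t_k \ge 0$, and let $\mathcal{H}_t = \{t_k \mid t_k < t\}$ denote the subset of event times between time $0$ and time $t$. Let $N(t) = \sum_{k \ge 1} \mathbb{1}_{t_k \le t}$ denote the number of events between time $0$ and time $t$, where $\mathbb{1}_A = 1$ if $A$ is true, and 0 otherwise. Assume that $N(t)$ is a counting process with conditional intensity function $\lambda(t)$, that is, for any $t \ge 0$ and any infinitesimal interval $dt$,

$$\Pr\big(N(t + dt) - N(t) = 1 \mid \mathcal{H}_t\big) = \lambda(t)\, dt. \tag{1}$$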

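Consider another counting process $\tilde N(t)$ with the corresponding history $\tilde{\mathcal{H}}_t$ and conditional intensity $\tilde\lambda(t)$. Then $N(t)$ and $\tilde N(t)$ are mutually-exciting Hawkes processes (Hawkes, 1971) if the conditional intensity functions $\lambda(t)$ and $\tilde\lambda(t)$ take the form

$$\lambda(t) = \mu + \int_0^t \phi(t-u)\, d\tilde N(u), \qquad \tilde\lambda(t) = \tilde\mu + \int_0^t \tilde\phi(t-u)\, dN(u),$$

where $\mu > 0$ and $\tilde\mu > 0$ are the base intensities and $\phi, \tilde\phi : \mathbb{R}_+ \to \mathbb{R}_+$ are non-negative parametric kernels. This defines a pair of processes in which the current rate of events of each process depends on the occurrence of past events of the opposite process.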
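As a concrete reading of these definitions, the sketch below evaluates one conditional intensity directly from the event history of the opposite process. The function name `conditional_intensity` and the representation of the kernel as a Python callable are our own illustrative choices. The sum runs over past events $u < t$, matching the conditioning on the history before $t$.

```python
import math
from typing import Callable, Sequence


def conditional_intensity(t: float, mu: float,
                          kernel: Callable[[float], float],
                          other_events: Sequence[float]) -> float:
    """Intensity of one process at time t, excited by past events of the *other* process.

    Implements lambda(t) = mu + sum_{u in other_events, u < t} kernel(t - u), i.e. the
    integral of kernel(t - u) against the other counting process over [0, t).
    """
    return mu + sum(kernel(t - u) for u in other_events if u < t)


def exp_kernel(s: float) -> float:
    """Illustrative exponential kernel eta * exp(-delta * s) with eta = 0.5, delta = 1.0."""
    return 0.5 * math.exp(-1.0 * s)


# The event at 4.0 lies in the future of t = 3.0 and is ignored.
lam = conditional_intensity(3.0, 0.2, exp_kernel, [1.0, 2.0, 4.0])  # approx. 0.4516
```

Each past event of the opposite process contributes one kernel term, so a recent message from Helen raises the rate at which Mary replies.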

Assume that and for , for . If the kernels decay fast, this results in strong local effects; if instead they peak away from the origin, longer-term effects are likely to occur. We consider here an exponential kernel

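$$\phi(t; \eta, \delta) = \eta\, e^{-\delta t}, \qquad t \ge 0, \tag{2}$$

where $\eta \ge 0$ and $\delta > 0$. The parameter $\eta$ determines the size of the jump in intensity triggered by each event, and $\delta$ is the constant rate of exponential decay. When both processes share the same kernel, the stationarity condition is $\int_0^\infty \phi(t; \eta, \delta)\, dt = \eta/\delta < 1$, that is, $\eta < \delta$. Figure 1 gives an illustration of two mutually-exciting Hawkes processes with exponential kernel and their conditional intensities.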
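The exponential kernel makes simulation easy because each excitation term decays by a factor $e^{-\delta s}$ over an interval of length $s$. Below is a minimal sketch of Ogata's thinning algorithm for the pair. It assumes, as in Section 3, that both directions share the same $(\eta, \delta)$. The function `simulate_hawkes_pair` is hypothetical, and rejecting $\eta \ge \delta$ is a choice of this sketch rather than a requirement for simulating on a finite window.

```python
import math
import random


def simulate_hawkes_pair(mu1, mu2, eta, delta, T, seed=0):
    """Ogata thinning for a pair of mutually-exciting Hawkes processes on [0, T].

    Both directions use the exponential kernel eta * exp(-delta * s): each event of
    process 1 adds eta to the intensity of process 2, and vice versa.
    Returns the sorted event times (events1, events2).
    """
    if eta < 0 or delta <= 0:
        raise ValueError("need eta >= 0 and delta > 0")
    if eta >= delta:
        raise ValueError("stationarity requires eta < delta")
    rng = random.Random(seed)
    t, ex1, ex2 = 0.0, 0.0, 0.0  # ex1, ex2: excitation parts of lambda_1, lambda_2 at time t
    events1, events2 = [], []
    while True:
        # Intensities only decay between events, so the current total is a valid upper bound.
        lam_bar = mu1 + mu2 + ex1 + ex2
        if lam_bar <= 0.0:
            break  # no base rate and no excitation: no further events can occur
        t_cand = t + rng.expovariate(lam_bar)
        if t_cand > T:
            break
        decay = math.exp(-delta * (t_cand - t))
        ex1, ex2, t = ex1 * decay, ex2 * decay, t_cand
        u = rng.random() * lam_bar
        if u < mu1 + ex1:                  # accept as an event of process 1
            events1.append(t)
            ex2 += eta                     # ...which excites process 2
        elif u < mu1 + ex1 + mu2 + ex2:    # accept as an event of process 2
            events2.append(t)
            ex1 += eta                     # ...which excites process 1
        # otherwise the candidate is rejected (thinning)
    return events1, events2


e1, e2 = simulate_hawkes_pair(mu1=0.5, mu2=0.5, eta=0.8, delta=1.0, T=50.0, seed=1)
```

Setting `eta=0.0` recovers two independent Poisson processes with rates `mu1` and `mu2`, which is a convenient sanity check.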

### 2.2 Compound completely random measures

A homogeneous completely random measure (CRM) (Kingman, 1967; 1993) on $\mathbb{R}_+$ with neither fixed atoms nor deterministic component takes the form

$$W = \sum_{i \ge 1} w_i\, \delta_{\theta_i} \tag{3}$$

where $(w_i, \theta_i)_{i \ge 1}$ are the points of a Poisson process on $(0, \infty) \times \mathbb{R}_+$ with mean measure $\rho(dw)\, H(d\theta)$, where $\rho$ is a Lévy measure, $H$ is a locally bounded measure and $\delta_x$ is the Dirac delta mass at $x$. The homogeneous CRM is completely characterized by $\rho$ and $H$, and we write $W \sim \mathrm{CRM}(\rho, H)$, or simply $W \sim \mathrm{CRM}(\rho)$ when $H$ is taken to be the Lebesgue measure. Griffin and Leisen (2017) proposed a multivariate generalisation of CRMs, called compound CRM (CCRM). A compound CRM with independent scores is defined as

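$$W_1 = \sum_{i \ge 1} w_{i1}\, \delta_{\theta_i}, \quad \dots, \quad W_p = \sum_{i \ge 1} w_{ip}\, \delta_{\theta_i} \tag{4}$$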

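where, for $k = 1, \dots, p$, $w_{ik} = \beta_{ik}\, w_{i0}$, the scores $\beta_{ik} \ge 0$ are independently distributed from some probability distribution $F_k$, and $W_0 = \sum_{i \ge 1} w_{i0}\, \delta_{\theta_i}$ is a CRM with mean measure $\rho_0(dw_0)\, H_0(d\theta)$. In the rest of this paper, we assume that $F_k$ is a gamma distribution with parameters $(a_k, b_k)$, $H_0$ is the Lebesgue measure and $\rho_0$ is the Lévy measure of a generalized gamma process

$$\rho_0(dw) = \frac{1}{\Gamma(1 - \sigma)}\, w^{-1-\sigma} e^{-\tau w}\, dw \tag{5}$$

where $\sigma \in (-\infty, 1)$ and $\tau > 0$.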
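Sampling a CCRM exactly is simplest in the finite-activity case $\sigma < 0$. There, $\rho_0$ has finite total mass $\tau^{\sigma}/(-\sigma)$ and normalises to a $\mathrm{Gamma}(-\sigma, \tau)$ density (shape $-\sigma$, rate $\tau$). The sketch below, with a hypothetical function `sample_ccrm_finite`, restricts the labels to a bounded interval $[0, \alpha]$ and assumes the shape/rate parameterisation for $\mathrm{Gamma}(a_k, b_k)$. The case $\sigma \in [0, 1)$ has infinitely many atoms and needs a truncation or other approximate scheme that is not shown here.

```python
import numpy as np


def sample_ccrm_finite(alpha, sigma, tau, a, b, seed=0):
    """Sample a compound CRM with gamma scores on labels in [0, alpha], for sigma < 0 only.

    For sigma < 0 the GGP Levy measure has finite total mass tau**sigma / (-sigma), so
    W_0 restricted to [0, alpha] is compound Poisson with Gamma(-sigma, rate tau) jumps.
    a, b: gamma shape and rate parameters (a_k, b_k) of the scores, k = 1..p.
    Returns labels theta (n,), sociabilities w0 (n,), and affiliations w (n, p).
    """
    if sigma >= 0:
        raise ValueError("this sketch only covers the finite-activity case sigma < 0")
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = rng.poisson(alpha * tau**sigma / (-sigma))        # number of atoms in [0, alpha]
    theta = rng.uniform(0.0, alpha, size=n)               # labels: uniform, since H_0 is Lebesgue
    w0 = rng.gamma(shape=-sigma, scale=1.0 / tau, size=n)  # normalised jumps of rho_0
    beta = rng.gamma(shape=a, scale=1.0 / b, size=(n, a.size))  # scores beta_ik
    w = w0[:, None] * beta                                # w_ik = beta_ik * w_i0
    return theta, w0, w


theta, w0, w = sample_ccrm_finite(alpha=100.0, sigma=-1.0, tau=1.0,
                                  a=[1.0, 2.0], b=[1.0, 1.0], seed=3)
```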

## 3 Statistical model for temporal interaction data

Consider temporal interaction data of the form $(t_k, i_k, j_k)_{k \ge 1}$, where $(t_k, i_k, j_k)$ represents a directed interaction at time $t_k$ from node/individual $i_k$ to node/individual $j_k$. For example, the data may correspond to the exchange of messages between students on an online social network.

We use a point process on $\mathbb{R}_+^3$, and consider that each node $i$ is assigned some continuous label $\theta_i \ge 0$. The labels are only used for the model construction, similarly to Caron and Fox (2017) and Todeschini et al. (2016), and are neither observed nor inferred from the data. A point at location $(t, \theta_i, \theta_j)$ indicates that there is a directed interaction from node $i$ to node $j$ at time $t$. See Figure 2 for an illustration.

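For a pair of nodes $i$ and $j$, with labels $\theta_i$ and $\theta_j$, let $N_{ij}(t)$ be the counting process

$$N_{ij}(t) = \sum_{k \ge 1} \mathbb{1}_{\{t_k \le t,\ i_k = i,\ j_k = j\}} \tag{6}$$

for the number of directed interactions from $i$ to $j$ in the time interval $[0, t]$. For each pair $i \ne j$, the counting processes $N_{ij}(t)$ and $N_{ji}(t)$ are mutually-exciting Hawkes processes with conditional intensities

$$\lambda_{ij}(t) = \mu_{ij} + \int_0^t \phi(t-u; \eta, \delta)\, dN_{ji}(u), \qquad \lambda_{ji}(t) = \mu_{ji} + \int_0^t \phi(t-u; \eta, \delta)\, dN_{ij}(u) \tag{7}$$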

where $\phi$ is the exponential kernel defined in Equation (2). Interactions from individual $i$ to individual $j$ may arise as a response to past interactions from $j$ to $i$ through the kernel $\phi$, or via the base intensity $\mu_{ij}$. We also model assortativity, so that individuals with similar interests are more likely to interact than individuals with different interests. For this, assume that each node $i$ has a set of positive latent parameters $(w_{i1}, \dots, w_{ip}) \in \mathbb{R}_+^p$, where $w_{ik}$ is the level of its affiliation to latent community $k = 1, \dots, p$. The number of communities $p$ is assumed known. We model the base rate as

$$\mu_{ij} = \mu_{ji} = \sum_{k=1}^{p} w_{ik}\, w_{jk}. \tag{8}$$

Two nodes with high levels of affiliation to the same communities will be more likely to interact than nodes with affiliation to different communities, favouring assortativity.
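Since Equation (8) is an inner product of affiliation vectors, all base rates can be computed at once as a Gram matrix. This is a minimal NumPy sketch, assuming affiliations are stored as a nodes-by-communities array; the function name `base_rates` is illustrative, and the diagonal (self-pairs) is simply ignored here.

```python
import numpy as np


def base_rates(w):
    """Base intensities mu[i, j] = sum_k w[i, k] * w[j, k] (Equation 8) for all pairs.

    w: array of shape (n_nodes, p) holding the community affiliations w_ik.
    The result is symmetric (mu_ij = mu_ji); diagonal entries are ignored in this sketch.
    """
    w = np.asarray(w, dtype=float)
    return w @ w.T


# Nodes 0 and 1 belong to community 0 only; node 2 belongs to community 1 only.
mu = base_rates([[2.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
# mu[0, 1] == 2.0 (shared community); mu[0, 2] == mu[1, 2] == 0.0 (no shared community)
```

Pairs with no community in common get a zero base rate, which is the assortative behaviour described above.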

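In order to capture sparsity and power-law properties, and as in Todeschini et al. (2016), the set of affiliation parameters $(w_{i1}, \dots, w_{ip})$ and node labels $\theta_i$ is modelled via a compound CRM with gamma scores, that is, $W_0 = \sum_{i \ge 1} w_{i0}\, \delta_{\theta_i} \sim \mathrm{CRM}(\rho_0)$, where the Lévy measure $\rho_0$ is defined by Equation (5), and for each node $i \ge 1$ and community $k = 1, \dots, p$

$$w_{ik} = w_{i0}\, \beta_{ik}, \qquad \beta_{ik} \overset{\text{ind}}{\sim} \mathrm{Gamma}(a_k, b_k). \tag{9}$$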

The parameter $w_{i0}$ is a degree correction for node $i$ and can be interpreted as measuring the overall popularity or sociability of a given node, irrespective of its level of affiliation to the different communities. An individual with a high sociability parameter is more likely to have interactions overall than individuals with low sociability parameters. The scores $\beta_{ik}$ tune the level of affiliation of individual $i$ to community $k$. The model is defined on $\mathbb{R}_+^3$. We assume that we observe interactions over a subset $[0, T] \times [0, \alpha]^2$, where $\alpha > 0$ and $T > 0$ tune both the number of nodes and the number of interactions. The whole model is illustrated in Figure 2.

The hyperparameters of the model can be interpreted as follows:

- The hyperparameters $(\eta, \delta)$ of the kernel $\phi$, where $\eta \ge 0$ and $\delta > 0$, tune the reciprocity.
- The hyperparameters $(a_k, b_k)_{k = 1, \dots, p}$ tune the community structure of the interactions: $a_k / b_k$ tunes the size of community $k$, while $1 / a_k$ tunes the variability of the level of affiliation to this community; larger values of $1 / a_k$ imply more separated communities.
- The hyperparameter $\sigma$ tunes the sparsity and the degree heterogeneity: larger values imply higher sparsity and heterogeneity. It also tunes the slope of the degree distribution. The parameter $\tau$ tunes the exponential cut-off in the degree distribution. This is illustrated in Figure 3.
- Finally, the hyperparameters $\alpha$ and $T$ tune the overall number of interactions and nodes.

## 4 Properties

### 4.1 Connection to sparse vertex-exchangeable and edge-exchangeable models

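The model is a natural extension of sparse vertex-exchangeable and edge-exchangeable graph models. Let $z_{ij} = \min\big(1, N_{ij}(T) + N_{ji}(T)\big)$ be a binary variable indicating whether there is at least one interaction in $[0, T]$ between nodes $i$ and $j$ in either direction. Since the first interaction between $i$ and $j$, in either direction, can only be generated by the base intensities $\mu_{ij} = \mu_{ji}$, we have

$$\Pr\big(z_{ij} = 1 \mid (w_{ik}, w_{jk})_{k = 1, \dots, p}\big) = 1 - \exp\Big(-2T \sum_{k=1}^{p} w_{ik}\, w_{jk}\Big),$$

which corresponds, up to the time-scaling factor $T$, to the probability of a connection in the static simple graph model proposed by Todeschini et al. (2016). Additionally, for fixed $(w_{ik})$ and $\eta = 0$ (no reciprocal relationships), the model corresponds to a rank-$p$ extension of the rank-1 Poissonized version of edge-exchangeable models considered by Cai et al. (2016) and Janson (2017a). The sparsity properties of our model follow from the sparsity properties of these two classes of models.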
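Given the affiliations, both connections in this paragraph are easy to compute. The sketch below uses the hypothetical helpers `edge_probabilities` and `poissonized_counts`. It evaluates the probability $1 - \exp(-2T\sum_k w_{ik} w_{jk})$ that a pair interacts at least once in $[0, T]$. For the case $\eta = 0$ only, it draws the directed counts $N_{ij}(T) \sim \mathrm{Poisson}(T \mu_{ij})$ of the Poissonized rank-$p$ model. Excluding self-interactions is an assumption of the sketch.

```python
import numpy as np


def edge_probabilities(w, T):
    """Pr(z_ij = 1 | w) = 1 - exp(-2 T sum_k w_ik w_jk) for all pairs (diagonal ignored).

    w: (n_nodes, p) array of affiliations w_ik; T: length of the observation window [0, T].
    """
    w = np.asarray(w, dtype=float)
    return -np.expm1(-2.0 * T * (w @ w.T))  # -expm1(-x) = 1 - exp(-x), accurate for small x


def poissonized_counts(w, T, seed=0):
    """Directed counts N_ij(T) ~ Poisson(T * mu_ij) for i != j; valid only when eta = 0."""
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    counts = rng.poisson(T * (w @ w.T))
    np.fill_diagonal(counts, 0)  # this sketch does not model self-interactions
    return counts


w = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
p = edge_probabilities(w, T=1.0)
c = poissonized_counts(w, T=1.0, seed=0)
```

When $\eta > 0$ the edge probabilities are unchanged, but the counts are no longer Poisson because each interaction can trigger replies.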

### 4.2 Sparsity

The size of the dataset is tuned by both $\alpha$ and $T$. Given these quantities, both the number of interactions and the number of nodes with at least one interaction are random variables. We now study the behaviour of these quantities, showing that the model exhibits sparsity. Let $N_{\alpha,T}$, $E_{\alpha,T}$ and $V_{\alpha,T}$ be, respectively, the overall number of interactions between nodes with label $\theta_i \le \alpha$ until time $T$, the total number of pairs of nodes with label $\theta_i \le \alpha$ who had at least one interaction before time $T$, and the number of nodes with label $\theta_i \le \alpha$ who had at least one interaction before time $T$. We provide in the supplementary material a theorem for the exact expectations of $N_{\alpha,T}$, $E_{\alpha,T}$ and $V_{\alpha,T}$.
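For an observed or simulated dataset, the three statistics can be computed in a single pass over the interactions. This sketch assumes interactions are given as `(t, i, j)` tuples and labels as a dictionary from node to $\theta_i$. `sparsity_statistics` is a hypothetical helper, and a pair counts as one edge whichever direction its interactions take.

```python
def sparsity_statistics(interactions, labels, alpha, T):
    """Return (N, E, V) restricted to labels <= alpha and times <= T.

    N: number of directed interactions; E: number of unordered pairs {i, j} with at
    least one interaction in either direction; V: number of nodes with at least one
    interaction.
    """
    n_interactions = 0
    pairs, nodes = set(), set()
    for t, i, j in interactions:
        if t <= T and labels[i] <= alpha and labels[j] <= alpha:
            n_interactions += 1
            pairs.add(frozenset((i, j)))  # direction is ignored for edges
            nodes.update((i, j))
    return n_interactions, len(pairs), len(nodes)


data = [(0.5, "a", "b"), (0.7, "b", "a"), (1.2, "a", "c"), (3.0, "c", "b")]
theta = {"a": 0.1, "b": 0.4, "c": 0.9}
print(sparsity_statistics(data, theta, alpha=0.5, T=2.0))  # (2, 1, 2)
print(sparsity_statistics(data, theta, alpha=1.0, T=2.0))  # (3, 2, 3)
```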
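Now consider the asymptotic behaviour of the expectations of $N_{\alpha,T}$, $E_{\alpha,T}$ and $V_{\alpha,T}$ as $\alpha$ and $T$ go to infinity. We use the following asymptotic notation: $X_\alpha = O(Y_\alpha)$ if $\limsup_{\alpha \to \infty} X_\alpha / Y_\alpha < \infty$, $X_\alpha = o(Y_\alpha)$ if $\lim_{\alpha \to \infty} X_\alpha / Y_\alpha = 0$, and $X_\alpha = \Theta(Y_\alpha)$ if both $X_\alpha = O(Y_\alpha)$ and $Y_\alpha = O(X_\alpha)$.

Consider $T$ fixed and $\alpha$ tending to infinity. Then

$$\mathbb{E}[N_{\alpha,T}] = \Theta(\alpha^2), \qquad \mathbb{E}[E_{\alpha,T}] = \Theta(\alpha^2), \qquad \mathbb{E}[V_{\alpha,T}] = \begin{cases} \Theta(\alpha) & \text{if } \sigma < 0, \\ \Theta(\alpha \log \alpha) & \text{if } \sigma = 0, \\ \Theta(\alpha^{1+\sigma}) & \text{if } \sigma \in (0, 1). \end{cases}$$

For $\sigma < 0$, the number of edges and interactions grows quadratically with the number of nodes, and we are in the dense regime. When $\sigma \ge 0$, the number of edges and interactions grows subquadratically, and we are in the sparse regime. Higher values of $\sigma$ lead to higher sparsity. For fixed $\alpha$, asymptotic expressions for the expected numbers of edges and interactions are also available as $T$ tends to infinity, and sparsity in $T$ arises under different conditions for the number of edges and for the number of interactions. The derivation of the asymptotic behaviour of the expectations of $N_{\alpha,T}$, $E_{\alpha,T}$ and $V_{\alpha,T}$ follows the lines of the proofs of Theorems 3 and 5.3 in Todeschini et al. (2016) and of Lemma D.6 in the supplementary material of Cai et al. (2016), and is omitted here.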

## 5 Approximate Posterior Inference

Assume a set of observed interactions $\mathcal{D}$ between individuals over a period of time $T$. The objective is to approximate the posterior distribution of the model parameters given $\mathcal{D}$, where $(\eta, \delta)$ are the kernel parameters and