1 Introduction
Like their inhabitants, countries interact with one another: they consult, negotiate, trade, threaten, and fight. These interactions are seldom uncoordinated. Rather, they are connected by a fabric of overlapping communities, such as security coalitions, treaties, trade cartels, and military alliances. For example, OPEC coordinates the petroleum export policies of its thirteen member countries, LAIA fosters trade among Latin American countries, and NATO guarantees collective defense against attacks by external parties.
A single country can belong to multiple communities, reflecting its different identities. For example, Venezuela—an oil-producing country and a Latin American country—is a member of both OPEC and LAIA. When Venezuela interacts with other countries, it sometimes does so as an OPEC member and sometimes as a LAIA member.
Countries engage in both within-community and between-community interactions. For example, when acting as an OPEC member, Venezuela consults with other OPEC countries, but trades with non-OPEC, oil-importing countries. Moreover, although Venezuela engages in between-community interactions when trading as an OPEC member, it engages in within-community interactions when trading as a LAIA member. To understand or predict how countries interact, we must account for their community memberships and how those memberships influence their actions.
In this paper, we take a new approach to learning overlapping communities from interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." A data set of such interaction events can be represented as either 1) a set of event tokens, 2) a tensor of event type counts, or 3) a series of weighted multi-networks. Models that use the token representation naturally yield efficient inference algorithms, models that use the tensor representation exhibit good predictive performance, and models that use the network representation learn latent structure that aligns with well-known concepts such as communities. Previous models of interaction event data have each used a subset of these representations. Our approach—Bayesian Poisson Tucker decomposition (BPTD)—takes advantage of all three.
BPTD builds on the classic Tucker decomposition (Tucker, 1964) to factorize a tensor of event type counts into three factor matrices and a four-dimensional core tensor (section 2). The factor matrices embed countries into communities, action types into "topics," and time steps into "regimes." The core tensor interacts communities, topics, and regimes. The country–community factors enable BPTD to learn overlapping community memberships, while the core tensor enables it to learn directed community–community interaction networks that are specific to topics of action types and temporal regimes. Figure 1 illustrates this structure. BPTD leads to an efficient MCMC inference algorithm (section 4) and achieves better predictive performance than related models (section 6). Finally, BPTD discovers interpretable latent structure that agrees with our knowledge of international relations (section 7).

2 Bayesian Poisson Tucker Decomposition
We can represent a data set of interaction events as a set of event tokens, where a single token $e_n = (i \xrightarrow{a} j, t)$ indicates that sender country $i$ took action $a$ toward receiver country $j$ during time step $t$. Alternatively, we can aggregate these event tokens into a four-dimensional tensor $\boldsymbol{Y}$, where element $y^{(t)}_{i \xrightarrow{a} j}$ is a count of the number of events of type $(i \xrightarrow{a} j, t)$. This tensor will be sparse because most event types never actually occur in practice. Finally, we can equivalently view this count tensor as a series of weighted multi-network snapshots, where the weight on edge $i \xrightarrow{a} j$ in the $t^{\textrm{th}}$ snapshot is $y^{(t)}_{i \xrightarrow{a} j}$.
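The three representations above are equivalent views of the same data. A minimal sketch of converting between them, using hypothetical dimensions and tokens:

```python
import numpy as np

# Toy dimensions (hypothetical): V countries, A action types, T time steps.
V, A, T = 4, 3, 2

# Representation 1: a set of event tokens (sender, receiver, action, time).
tokens = [(0, 1, 2, 0), (0, 1, 2, 0), (3, 2, 0, 1)]

# Representation 2: aggregate tokens into a four-dimensional count tensor Y,
# where Y[i, j, a, t] counts events of type "i took action a toward j at t".
Y = np.zeros((V, V, A, T), dtype=int)
for i, j, a, t in tokens:
    Y[i, j, a, t] += 1

# Representation 3: a series of weighted multi-network snapshots -- one
# V x V adjacency matrix per (action, time step) pair.
snapshots = {(a, t): Y[:, :, a, t] for a in range(A) for t in range(T)}

print(Y.sum())                  # total number of event tokens: 3
print(snapshots[(2, 0)][0, 1])  # weight on edge 0 -> 1 for action 2 at t=0: 2
```

The count tensor is sparse in exactly the sense described above: most of its entries are zero.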
BPTD models each element of count tensor $\boldsymbol{Y}$ as
$$y^{(t)}_{i \xrightarrow{a} j} \sim \textrm{Poisson}\left(\sum_{c=1}^{C} \sum_{d=1}^{C} \sum_{k=1}^{K} \sum_{r=1}^{R} \theta_{ic}\, \theta_{jd}\, \phi_{ak}\, \psi_{tr}\, \lambda^{(k,r)}_{c \rightarrow d}\right), \quad (1)$$
where $\theta_{ic}$, $\theta_{jd}$, $\phi_{ak}$, $\psi_{tr}$, and $\lambda^{(k,r)}_{c \rightarrow d}$ are positive real numbers. Factors $\theta_{ic}$ and $\theta_{jd}$ capture the rates at which countries $i$ and $j$ participate in communities $c$ and $d$, respectively; factor $\phi_{ak}$ captures the strength of association between action $a$ and topic $k$; and $\psi_{tr}$ captures how well regime $r$ explains the events in time step $t$. We can collectively view the country–community factors as a latent factor matrix $\boldsymbol{\Theta}$, where the $i^{\textrm{th}}$ row represents country $i$'s community memberships. Similarly, we can view the action–topic factors and the time-step–regime factors as latent factor matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$, respectively. Factor $\lambda^{(k,r)}_{c \rightarrow d}$ captures the rate at which community $c$ takes actions associated with topic $k$ toward community $d$ during regime $r$. The $C \times C \times K \times R$ such factors form a core tensor that interacts communities, topics, and regimes.
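The Poisson rate is a Tucker product of the factor matrices and the core tensor, which can be sketched with NumPy's `einsum`. All dimensions and factor values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
V, A, T = 5, 4, 3   # countries, action types, time steps (hypothetical)
C, K, R = 2, 2, 2   # communities, topics, regimes (hypothetical)

Theta = rng.gamma(1.0, 1.0, size=(V, C))       # country-community factors
Phi = rng.gamma(1.0, 1.0, size=(A, K))         # action-topic factors
Psi = rng.gamma(1.0, 1.0, size=(T, R))         # time-step-regime factors
Lam = rng.gamma(1.0, 1.0, size=(C, C, K, R))   # core tensor

# Poisson rate for every event type (i, j, a, t): sum over classes (c, d, k, r)
# of Theta[i, c] * Theta[j, d] * Phi[a, k] * Psi[t, r] * Lam[c, d, k, r].
rate = np.einsum('ic,jd,ak,tr,cdkr->ijat', Theta, Theta, Phi, Psi, Lam)

# One draw of the count tensor under the model.
Y = rng.poisson(rate)
```

Note that the same matrix `Theta` appears twice in the contraction: countries share a single embedding whether they act as senders or receivers, with asymmetry carried by the core tensor.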
The country–community factors are gamma-distributed,
$$\theta_{ic} \sim \Gamma(\alpha_i, \beta_i), \quad (2)$$
where the shape and rate parameters $\alpha_i$ and $\beta_i$ are specific to country $i$. We place an uninformative gamma prior over these shape and rate parameters: $\alpha_i, \beta_i \sim \Gamma(\epsilon_0, \epsilon_0)$. This hierarchical prior enables BPTD to express heterogeneity in the countries' rates of activity. For example, we expect that the US will engage in more interactions than Burundi.
The action–topic and time-step–regime factors are also gamma-distributed; however, we assume that these factors are drawn directly from an uninformative gamma prior,
$$\phi_{ak} \sim \Gamma(\epsilon_0, \epsilon_0) \quad \textrm{and} \quad \psi_{tr} \sim \Gamma(\epsilon_0, \epsilon_0). \quad (3)$$
Because BPTD learns a single embedding of countries into communities, it preserves the traditional network-based notion of community membership. Any sender–receiver asymmetry is captured by the core tensor $\boldsymbol{\Lambda}$, which we can view as a compression of count tensor $\boldsymbol{Y}$. By allowing its on-diagonal elements, which we denote by $\lambda^{(k,r)}_{c \leftrightarrow}$, and its off-diagonal elements to be non-zero, the core tensor can represent both within- and between-community interactions.
The elements of the core tensor are gamma-distributed,
$$\lambda^{(k,r)}_{c \leftrightarrow} \sim \Gamma(\eta^{\leftrightarrow}_c\, \nu_k\, \rho_r, \delta) \quad (4)$$
$$\lambda^{(k,r)}_{c \rightarrow d} \sim \Gamma(\eta^{\rightarrow}_c\, \eta^{\rightarrow}_d\, \nu_k\, \rho_r, \delta) \quad \textrm{for } c \neq d. \quad (5)$$
Each community $c$ has two positive weights $\eta^{\leftrightarrow}_c$ and $\eta^{\rightarrow}_c$ that capture its rates of within- and between-community interaction, respectively. Each topic $k$ has a positive weight $\nu_k$, while each regime $r$ has a positive weight $\rho_r$. We place an uninformative prior over the within-community interaction rates and gamma shrinkage priors over the other weights: $\eta^{\leftrightarrow}_c \sim \Gamma(\epsilon_0, \epsilon_0)$, $\eta^{\rightarrow}_c \sim \Gamma(\gamma_0 / C, \zeta)$, $\nu_k \sim \Gamma(\gamma_0 / K, \zeta)$, and $\rho_r \sim \Gamma(\gamma_0 / R, \zeta)$. These priors bias BPTD toward learning latent structure that is sparse. Finally, we assume that $\gamma_0$ and $\zeta$ are drawn from an uninformative gamma prior: $\gamma_0, \zeta \sim \Gamma(\epsilon_0, \epsilon_0)$.
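The generative hierarchy described above can be sketched as follows. The hyperparameter values are hypothetical, and the top-level weights $\gamma_0$ and $\zeta$ are fixed here for simplicity, although BPTD places priors on them:

```python
import numpy as np

rng = np.random.default_rng(1)
C, K, R = 3, 2, 2

# Hypothetical hyperparameter values (assumptions, not the paper's settings).
eps0, delta = 0.1, 1.0
gamma0, zeta = 1.0, 1.0

# NumPy's gamma is parameterized by shape and scale (= 1 / rate).
eta_within = rng.gamma(eps0, 1.0 / eps0, size=C)         # within-community rates
eta_between = rng.gamma(gamma0 / C, 1.0 / zeta, size=C)  # shrinkage prior
nu = rng.gamma(gamma0 / K, 1.0 / zeta, size=K)           # topic weights
rho = rng.gamma(gamma0 / R, 1.0 / zeta, size=R)          # regime weights

# Core tensor: on-diagonal shapes use a community's within-community weight;
# off-diagonal shapes use the product of the two between-community weights.
Lam = np.empty((C, C, K, R))
for c in range(C):
    for d in range(C):
        for k in range(K):
            for r in range(R):
                shape = (eta_within[c] if c == d
                         else eta_between[c] * eta_between[d]) * nu[k] * rho[r]
                Lam[c, d, k, r] = rng.gamma(shape, 1.0 / delta)
```

Because the shrinkage priors give each weight a small shape parameter, most sampled weights are near zero, which in turn drives most core-tensor elements toward zero: the sparsity bias described above.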
As $K \rightarrow \infty$, the topic weights $\nu_k$ and their corresponding action–topic factors constitute a draw from a gamma process (Ferguson, 1973). Similarly, as $R \rightarrow \infty$, the regime weights $\rho_r$ and their corresponding time-step–regime factors constitute a draw from another gamma process. As $C \rightarrow \infty$, the within- and between-community interaction weights and their corresponding country–community factors constitute a draw from a marked gamma process (Kingman, 1972). The mark associated with atom $\eta^{\rightarrow}_c$ is $(\eta^{\leftrightarrow}_c, \theta_{1c}, \ldots, \theta_{Vc})$. We can view the elements of the core tensor and their corresponding factors as a draw from a gamma process, provided that the expected sum of the core tensor elements is finite. This multi-relational gamma process extends the relational gamma process of Zhou (2015).
Proposition 1: In the limit as $C, K, R \rightarrow \infty$, the expected sum of the core tensor elements is finite.
We prove this proposition in the supplementary material.
3 Connections to Previous Work
Poisson CP decomposition: DuBois & Smyth (2010) developed a model that assigns each event token (ignoring time steps) to one of $Q$ latent classes, where each class $q$ is characterized by three categorical distributions—$\boldsymbol{\theta}^{\rightarrow}_q$ over senders, $\boldsymbol{\theta}^{\leftarrow}_q$ over receivers, and $\boldsymbol{\phi}_q$ over actions—i.e.,
$$P\left(e_n = (i \xrightarrow{a} j)\right) = \sum_{q=1}^{Q} \pi_q\, \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}, \quad (6)$$
where $\pi_q$ is the probability of class $q$.
This model is closely related to the Poisson-based model of Schein et al. (2015), which explicitly uses the canonical polyadic (CP) tensor decomposition (Harshman, 1970) to factorize count tensor $\boldsymbol{Y}$ into four latent factor matrices. These factor matrices jointly embed senders, receivers, action types, and time steps into a $Q$-dimensional space,
$$y^{(t)}_{i \xrightarrow{a} j} \sim \textrm{Poisson}\left(\sum_{q=1}^{Q} \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}\right), \quad (7)$$
where $\theta^{\rightarrow}_{iq}$, $\theta^{\leftarrow}_{jq}$, $\phi_{aq}$, and $\psi_{tq}$ are positive real numbers.
Schein et al.'s model generalizes Bayesian Poisson matrix factorization (Cemgil, 2009; Gopalan et al., 2014, 2015; Zhou & Carin, 2015) and non-Bayesian Poisson CP decomposition (Chi & Kolda, 2012; Welling & Weber, 2001).
Although Schein et al.'s model is expressed in terms of a tensor of event type counts, the relationship between the multinomial and Poisson distributions (Kingman, 1972) means that we can also express it in terms of a set of event tokens. This yields an equation that is similar to equation 6,
$$P\left(e_n = (i \xrightarrow{a} j, t)\right) \propto \sum_{q=1}^{Q} \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}. \quad (8)$$
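The multinomial–Poisson relationship invoked here says that independent Poisson counts, conditioned on their total, are multinomially distributed with probabilities proportional to their rates. A quick empirical check, with hypothetical rates:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0, 3.0])   # Poisson rates for three event classes

# Draw independent Poisson counts many times; condition on a fixed total.
draws = rng.poisson(mu, size=(200_000, 3))
totals = draws.sum(axis=1)
conditioned = draws[totals == 6]

# Given the total n = 6, the counts are multinomial with probabilities
# mu / mu.sum(), so the conditional means should approach 6 * mu / mu.sum().
print(conditioned.mean(axis=0))   # approximately [1.0, 2.0, 3.0]
```

This is why a Poisson model over counts and a categorical model over tokens describe the same allocation of events to classes.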
Conversely, DuBois & Smyth’s model can be expressed as a CP tensor decomposition. This equivalence is analogous to the relationship between Poisson matrix factorization and latent Dirichlet allocation (Blei et al., 2003).
We can make Schein et al.'s model nonparametric by adding a per-class positive weight $\lambda_q$—i.e.,
$$y^{(t)}_{i \xrightarrow{a} j} \sim \textrm{Poisson}\left(\sum_{q=1}^{Q} \lambda_q\, \theta^{\rightarrow}_{iq}\, \theta^{\leftarrow}_{jq}\, \phi_{aq}\, \psi_{tq}\right). \quad (9)$$
As $Q \rightarrow \infty$, the per-class weights and their corresponding latent factors constitute a draw from a gamma process.
Adding this per-class weight reveals that CP decomposition is a special case of Tucker decomposition where the cardinalities of the latent dimensions are equal and the off-diagonal elements of the core tensor are zero. DuBois & Smyth's and Schein et al.'s models are therefore highly constrained special cases of BPTD that cannot capture dimension-specific structure, such as communities of countries or topics of action types. These models require each latent class to jointly summarize information about senders, receivers, action types, and time steps. This requirement conflates communities of countries and topics of action types, thus forcing each class to capture potentially redundant information. Moreover, by definition, CP decomposition models cannot express between-community interactions and cannot express sender–receiver asymmetry without learning completely separate latent factor matrices for senders and receivers. These limitations make it hard to interpret these models as learning community memberships.
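The claim that CP decomposition is a diagonal-core special case of Tucker decomposition can be verified numerically; all factors below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
V, A, T, Q = 4, 3, 2, 5

# CP factors: senders, receivers, actions, time steps, plus per-class weights.
S = rng.gamma(1.0, 1.0, size=(V, Q))
Rc = rng.gamma(1.0, 1.0, size=(V, Q))
Ph = rng.gamma(1.0, 1.0, size=(A, Q))
Ps = rng.gamma(1.0, 1.0, size=(T, Q))
w = rng.gamma(1.0, 1.0, size=Q)

cp_rate = np.einsum('iq,jq,aq,tq,q->ijat', S, Rc, Ph, Ps, w)

# Equivalent Tucker form: a Q x Q x Q x Q core tensor that is zero off the
# superdiagonal and holds the per-class weights on it.
core = np.zeros((Q, Q, Q, Q))
core[np.arange(Q), np.arange(Q), np.arange(Q), np.arange(Q)] = w
tucker_rate = np.einsum('ic,jd,ak,tr,cdkr->ijat', S, Rc, Ph, Ps, core)

print(np.allclose(cp_rate, tucker_rate))   # True
```

With the core constrained to its superdiagonal, every off-diagonal (between-community) interaction rate is forced to zero, which is exactly the limitation described above.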
Infinite relational models: The infinite relational model (IRM) of Kemp et al. (2006) also learns latent structure specific to each dimension of a multidimensional tensor; however, unlike BPTD, the elements of this tensor are binary, indicating the presence or absence of the corresponding event type. The IRM therefore uses a Bernoulli likelihood. Schmidt & Mørup (2013) extended the IRM to model a tensor of event counts by replacing the Bernoulli likelihood with a Poisson likelihood (and gamma priors):
$$y^{(t)}_{i \xrightarrow{a} j} \sim \textrm{Poisson}\left(\lambda^{(k_a, r_t)}_{c_i \rightarrow c_j}\right), \quad (10)$$
where $c_i$ and $c_j$ are the respective community assignments of countries $i$ and $j$, $k_a$ is the topic assignment of action $a$, and $r_t$ is the regime assignment of time step $t$. This model, which we refer to as the gamma–Poisson IRM (GPIRM), allocates four-dimensional event types to four-dimensional latent classes—e.g., it allocates all tokens of type $(i \xrightarrow{a} j, t)$ to class $(c_i \xrightarrow{k_a} c_j, r_t)$.
The GPIRM is a special case of BPTD where the rows of the latent factor matrices are constrained to be "one-hot" binary vectors—i.e., $\theta_{ic} = \mathbb{1}(c = c_i)$, $\theta_{jd} = \mathbb{1}(d = c_j)$, $\phi_{ak} = \mathbb{1}(k = k_a)$, and $\psi_{tr} = \mathbb{1}(r = r_t)$. With this constraint, the Poisson rates in equations 1 and 10 are equal. Unlike BPTD, the GPIRM is a single-membership model. In addition, it cannot express heterogeneity in the rates of activity of the countries, action types, and time steps. The latter limitation can be remedied by letting $\theta_{i c_i}$, $\theta_{j c_j}$, $\phi_{a k_a}$, and $\psi_{t r_t}$ be positive real numbers. We refer to this variant of the GPIRM as the degree-corrected GPIRM (DCGPIRM).

Stochastic block models: The IRM itself generalizes the stochastic block model (SBM) of Nowicki & Snijders (2001), which learns latent structure from binary networks. Although the SBM was originally specified using a Bernoulli likelihood, Karrer & Newman (2011) introduced an alternative specification that uses the Poisson likelihood:
$$y_{i \rightarrow j} \sim \textrm{Poisson}\left(\theta_{i c_i}\, \theta_{j c_j}\, \lambda_{c_i \rightarrow c_j}\right), \quad (11)$$
where $\theta_{ic} = \mathbb{1}(c = c_i)$, $\theta_{jd} = \mathbb{1}(d = c_j)$, and $\lambda_{c_i \rightarrow c_j}$ is a positive real number. Like the IRM and the GPIRM, the SBM is a single-membership model and cannot express heterogeneity in the countries' rates of activity. Airoldi et al. (2008) addressed the former limitation by letting $\theta_{ic} \in [0, 1]$ such that $\sum_{c=1}^{C} \theta_{ic} = 1$. Meanwhile, Karrer & Newman (2011) addressed the latter limitation by allowing both $\theta_{i c_i}$ and $\theta_{j c_j}$ to be positive real numbers, much like the DCGPIRM. Ball et al. (2011) simultaneously addressed both limitations by letting all of the factors be positive real numbers, but constrained $\lambda_{c \rightarrow d} = \lambda_{d \rightarrow c}$. Finally, Zhou (2015) extended Ball et al.'s model to be nonparametric and introduced the Poisson–Bernoulli distribution to link binary data to the Poisson likelihood in a principled fashion. In this model, the elements of the core matrix and their corresponding factors constitute a draw from a relational gamma process.
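With one-hot (single-membership) factor matrices, as in the GPIRM, the Tucker rate of equation 1 collapses to a single core-tensor lookup, as in equation 10. A numerical check with hypothetical assignments:

```python
import numpy as np

rng = np.random.default_rng(4)
V, A, T = 4, 3, 2
C, K, R = 2, 2, 2

# Hard (single-membership) assignments, as in the GPIRM.
c_of = rng.integers(C, size=V)   # community of each country
k_of = rng.integers(K, size=A)   # topic of each action
r_of = rng.integers(R, size=T)   # regime of each time step

# One-hot factor matrices built from the assignments.
Theta = np.eye(C)[c_of]
Phi = np.eye(K)[k_of]
Psi = np.eye(R)[r_of]

Lam = rng.gamma(1.0, 1.0, size=(C, C, K, R))

tucker_rate = np.einsum('ic,jd,ak,tr,cdkr->ijat', Theta, Theta, Phi, Psi, Lam)

# With one-hot factors, the sum over classes reduces to a single lookup:
# rate[i, j, a, t] = Lam[c_of[i], c_of[j], k_of[a], r_of[t]].
lookup = Lam[c_of[:, None], c_of[None, :], :, :][:, :, k_of, :][:, :, :, r_of]

print(np.allclose(tucker_rate, lookup))   # True
```

Replacing the one-hot rows with arbitrary positive values yields the mixed-membership behavior that distinguishes BPTD from these single-membership models.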
Non-Poisson Tucker decomposition: Researchers sometimes refer to the Poisson rate in equation 11 as being "bilinear" because it can equivalently be written as $\boldsymbol{\theta}^{\top}_i \boldsymbol{\Lambda}\, \boldsymbol{\theta}_j$. Nickel et al. (2012) introduced RESCAL—a non-probabilistic bilinear model for binary data that achieves state-of-the-art performance at relation extraction. Nickel et al. (2015) then introduced several extensions for extracting relations of different types. Bilinear models, such as RESCAL and its extensions, are all special cases (albeit non-probabilistic ones) of Tucker decomposition.
4 Posterior Inference
Given an observed count tensor $\boldsymbol{Y}$, inference in BPTD involves "inverting" the generative process to obtain the posterior distribution over the parameters conditioned on $\boldsymbol{Y}$ and hyperparameters $\epsilon_0$ and $\delta$. The posterior distribution is analytically intractable; however, we can approximate it using a set of posterior samples. We draw these samples using Gibbs sampling, repeatedly resampling the value of each parameter from its conditional posterior given $\boldsymbol{Y}$, $\epsilon_0$, $\delta$, and the current values of the other parameters. We express each parameter's conditional posterior in a closed form using gamma–Poisson conjugacy and the auxiliary variable techniques of Zhou & Carin (2012). We provide the conditional posteriors in the supplementary material.

The conditional posteriors depend on $\boldsymbol{Y}$ via a set of "latent sources" (Cemgil, 2009) or subcounts. Because of the Poisson additivity theorem (Kingman, 1972), each latent source $y^{(t,r)}_{(i,c) \xrightarrow{(a,k)} (j,d)}$ is a Poisson-distributed random variable:
$$y^{(t,r)}_{(i,c) \xrightarrow{(a,k)} (j,d)} \sim \textrm{Poisson}\left(\theta_{ic}\, \theta_{jd}\, \phi_{ak}\, \psi_{tr}\, \lambda^{(k,r)}_{c \rightarrow d}\right) \quad (12)$$
$$y^{(t)}_{i \xrightarrow{a} j} = \sum_{c=1}^{C} \sum_{d=1}^{C} \sum_{k=1}^{K} \sum_{r=1}^{R} y^{(t,r)}_{(i,c) \xrightarrow{(a,k)} (j,d)}. \quad (13)$$
Together, equations 12 and 13 are equivalent to equation 1. In practice, we can equivalently view each latent source in terms of the token representation described in section 2,
$$y^{(t,r)}_{(i,c) \xrightarrow{(a,k)} (j,d)} = \sum_{n=1}^{N} \mathbb{1}\left(e_n = (i \xrightarrow{a} j, t)\right) \mathbb{1}\left(z_n = (c \xrightarrow{k} d, r)\right), \quad (14)$$
where each token's class assignment $z_n$ is an auxiliary latent variable. Using this representation, computing the latent sources (given the current values of the model parameters) simply involves allocating event tokens to classes, much like the inference algorithm for DuBois & Smyth's model, and aggregating them using equation 14. The conditional posterior for each token's class assignment is
$$P\left(z_n = (c \xrightarrow{k} d, r) \,\middle|\, e_n = (i \xrightarrow{a} j, t), \boldsymbol{\Theta}, \boldsymbol{\Phi}, \boldsymbol{\Psi}, \boldsymbol{\Lambda}\right) \propto \theta_{ic}\, \theta_{jd}\, \phi_{ak}\, \psi_{tr}\, \lambda^{(k,r)}_{c \rightarrow d}. \quad (15)$$
Computation is dominated by the normalizing constant
$$Z_n = \sum_{c=1}^{C} \sum_{d=1}^{C} \sum_{k=1}^{K} \sum_{r=1}^{R} \theta_{ic}\, \theta_{jd}\, \phi_{ak}\, \psi_{tr}\, \lambda^{(k,r)}_{c \rightarrow d}. \quad (16)$$
Computing this normalizing constant naïvely involves summing the full product over all $C^2 K R$ latent classes; however, because each latent class is composed of four separate dimensions, we can improve efficiency. We instead compute
$$Z_n = \sum_{c=1}^{C} \theta_{ic} \sum_{d=1}^{C} \theta_{jd} \sum_{k=1}^{K} \phi_{ak} \sum_{r=1}^{R} \psi_{tr}\, \lambda^{(k,r)}_{c \rightarrow d}, \quad (17)$$
which reuses partial sums across terms and therefore involves many fewer operations.
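The flat sum over all latent classes (equation 16) and the nested, dimension-at-a-time sum (equation 17) compute the same normalizing constant; a sketch with hypothetical factor values:

```python
import numpy as np

rng = np.random.default_rng(5)
C, K, R = 3, 2, 2
theta_i = rng.gamma(1.0, 1.0, size=C)   # row of Theta for sender i
theta_j = rng.gamma(1.0, 1.0, size=C)   # row of Theta for receiver j
phi_a = rng.gamma(1.0, 1.0, size=K)     # row of Phi for action a
psi_t = rng.gamma(1.0, 1.0, size=R)     # row of Psi for time step t
Lam = rng.gamma(1.0, 1.0, size=(C, C, K, R))

# Non-compositional: sum the full product over all C * C * K * R classes.
Z_naive = sum(theta_i[c] * theta_j[d] * phi_a[k] * psi_t[r] * Lam[c, d, k, r]
              for c in range(C) for d in range(C)
              for k in range(K) for r in range(R))

# Compositional: contract one dimension at a time, reusing partial sums.
partial = Lam @ psi_t       # sum over r -> shape (C, C, K)
partial = partial @ phi_a   # sum over k -> shape (C, C)
partial = partial @ theta_j # sum over d -> shape (C,)
Z_comp = theta_i @ partial  # sum over c -> scalar

print(np.isclose(Z_naive, Z_comp))   # True
```

The nested form also exposes the per-dimension marginals needed to sample the class assignment one dimension at a time, which is what makes compositional allocation possible.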
Compositional allocation using equations 15 and 17 improves computational efficiency significantly over naïve non-compositional allocation using equations 15 and 16. In practice, we set $C$, $K$, and $R$ to large values to approximate the nonparametric interpretation of BPTD. For the cardinalities used in our experiments, computing the normalizing constant for equation 15 using equation 16 requires 2,753 times the number of operations implied by equation 17.
Proposition 2: For an $N$-dimensional core tensor with $\prod_{n=1}^{N} C_n$ elements, computing the normalizing constant using non-compositional allocation requires $S$ times the number of operations required to compute it using compositional allocation, where $S > 1$ whenever the cardinalities exceed one and grows with $N$ and with the cardinalities $C_1, \ldots, C_N$.
We prove this proposition in the supplementary material.
BPTD and other Poisson-based models yield allocation inference algorithms that take advantage of the inherent sparsity of the data and scale with the number of event tokens. In contrast, non-Poisson tensor decomposition models (including Hoff's model) lead to algorithms that scale with the size of the count tensor. Allocation-based inference in BPTD is especially efficient because it compositionally allocates each four-dimensional event token to a four-dimensional latent class. Figure 2 illustrates this process. CP decomposition models, such as those of DuBois & Smyth (2010) and Schein et al. (2015), only permit non-compositional allocation. For example, while BPTD allocates each token to a four-dimensional latent class $(c \xrightarrow{k} d, r)$, Schein et al.'s model allocates it to a one-dimensional latent class $q$ that cannot be decomposed. Therefore, when the number of latent classes is even moderately large, BPTD yields a faster allocation inference algorithm than Schein et al.'s model.
5 Country–Country Interaction Event Data
Our data come from the Integrated Crisis Early Warning System (ICEWS) of Boschee et al. and the Global Database of Events, Language, and Tone (GDELT) of Leetaru & Schrodt (2013). ICEWS and GDELT both use the Conflict and Mediation Event Observations (CAMEO) hierarchy (Gerner et al.) for senders, receivers, and actions.
The top-level CAMEO coding for senders and receivers is their country affiliation, while lower levels in the hierarchy incorporate more specific attributes, such as their sectors (e.g., government or civilian) and their religious or ethnic affiliations. When studying international relations using CAMEO-coded event data, researchers usually consider only the senders' and receivers' countries. There are 249 countries represented in ICEWS, including states that are not universally recognized, such as Occupied Palestinian Territory, and former states, such as Former Yugoslav Republic of Macedonia; there are 233 countries in GDELT.
The top level for actions, which we use in our analyses, consists of twenty action classes, roughly ranked according to their overall sentiment. For example, the most negative is 20—Use Unconventional Mass Violence. CAMEO further divides these actions into the QuadClass scheme: Verbal Cooperation (actions 2–5), Material Cooperation (actions 6–7), Verbal Conflict (actions 8–16), and Material Conflict (actions 17–20). The first action (1—Make Statement) is neutral.
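The QuadClass scheme can be encoded directly; the boundary between Verbal and Material Conflict is taken here as actions 17–20 (an assumption about where the split falls):

```python
# Map a top-level CAMEO action code (1-20) to its QuadClass label.
def quad_class(action: int) -> str:
    if action == 1:
        return "Neutral"          # 1 -- Make Statement
    if 2 <= action <= 5:
        return "Verbal Cooperation"
    if 6 <= action <= 7:
        return "Material Cooperation"
    if 8 <= action <= 16:
        return "Verbal Conflict"
    if 17 <= action <= 20:
        return "Material Conflict"
    raise ValueError(f"unknown action code: {action}")

print(quad_class(4), "/", quad_class(20))
```

A mapping like this is useful for checking whether inferred topics respect the QuadClass boundaries, as we do in section 7.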
6 Predictive Analysis
Baseline models: We compared BPTD's predictive performance to that of three baseline models, described in section 3: 1) the GPIRM, 2) the DCGPIRM, and 3) the Bayesian Poisson tensor factorization (BPTF) model of Schein et al. (2015). All three models use a Poisson likelihood and have the same two hyperparameters as BPTD—i.e., $\epsilon_0$ and $\delta$. We set $\epsilon_0$ to 0.1, as recommended by Gelman (2006), and set $\delta$ so that the prior encourages the elements of the core tensor to be sparse. We implemented an MCMC inference algorithm for each model. We provide the full generative process for all three models in the supplementary material.
The GPIRM and the DCGPIRM are both Tucker decomposition models and thus allocate events to four-dimensional latent classes. The cardinalities of these latent dimensions are the same as BPTD's—i.e., $C$, $K$, and $R$. In contrast, BPTF is a CP decomposition model and thus allocates events to one-dimensional latent classes. We set the cardinality $Q$ of this dimension so that the total number of latent factors in BPTF's likelihood was equal to the total number of latent factors in BPTD's likelihood. We chose not to let BPTF and BPTD use the same number of latent classes—i.e., to set $Q = C^2 K R$—for two reasons: BPTF does not permit compositional allocation, so MCMC inference becomes very slow for even moderate values of $C$, $K$, and $R$, and CP decomposition models tend to overfit when $Q$ is large (Zhao et al., 2015). Throughout our predictive experiments, we let $C = 20$, $K = 6$, and $R = 3$. These values were well-supported by the data, as we explain in section 7.
Experimental setup: We constructed twelve different observed tensors—six from ICEWS and six from GDELT. Five of the six tensors for each source (ICEWS or GDELT) correspond to one-year time spans with monthly time steps, starting with 2004 and ending with 2008; the sixth corresponds to a five-year time span with monthly time steps, spanning 1995–2000. We divided each tensor into a training tensor and a test tensor. We further divided each test tensor into a held-out portion and an observed portion via a binary mask. We experimented with two different masks: one that treats the elements involving the fifteen most active countries as the held-out portion and the remaining elements as the observed portion, and one that does the opposite. The first mask enabled us to evaluate the models' reconstructions of the densest (and arguably most interesting) portion of each test tensor, while the second mask enabled us to evaluate their reconstructions of its complement. Across the entire GDELT database, for example, the elements involving the fifteen most active countries—i.e., 6% of all 233 countries—account for 30% of the event tokens. Moreover, 40% of these elements are non-zero. These non-zero elements are highly dispersed, with a variance-to-mean ratio of 220. In contrast, only 0.7% of the elements involving the other countries are non-zero. These elements have a variance-to-mean ratio of 26.
For each combination of the four models, twelve tensors, and two masks, we ran 5,000 iterations of MCMC inference on the training tensor. We then clamped the country–community factors, the action–topic factors, and the core tensor, and inferred the time-step–regime factors for the test tensor from its observed portion by running 1,000 iterations of MCMC inference. We saved every tenth sample after the first 500. We used each sample, along with the clamped country–community factors, action–topic factors, and core tensor, to compute the Poisson rate for each element in the held-out portion of the test tensor. Finally, we averaged these rates across samples and used each element's average rate to compute its probability. We combined the held-out elements' probabilities by taking their geometric mean or, equivalently, by computing their inverse perplexity. We chose this combination strategy to ensure that the models were penalized heavily for making poor predictions on the non-zero elements and were not rewarded excessively for making good predictions on the zero elements. Because the country–community factors, the action–topic factors, and the core tensor are clamped after training, our experimental setup is analogous to that used to assess collaborative filtering models' strong generalization ability (Marlin, 2004).

Results: Figure 3 illustrates the results for each combination of the four models, twelve tensors, and two masks. The top row contains the results from the twelve experiments involving the first mask, where the elements involving the fifteen most active countries were treated as the held-out portion. BPTD significantly outperformed the baselines. BPTF—itself a state-of-the-art model—performed better than BPTD in only one experiment. In general, the Tucker decomposition allows BPTD to learn richer latent structure that generalizes better to held-out data. The bottom row contains the results from the experiments involving the second mask. The models' performance was closer in these experiments, probably because of the large proportion of easy-to-predict zero elements. BPTD and BPTF performed indistinguishably, and both outperformed the GPIRM and the DCGPIRM. The single-membership nature of the GPIRM and the DCGPIRM prevents them from expressing high levels of heterogeneity in the countries' rates of activity. When the held-out elements were highly dispersed, these models sometimes made extremely inaccurate predictions. In contrast, the mixed-membership nature of BPTD and BPTF allows them to better express heterogeneous rates of activity.
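The evaluation metric used above—the geometric mean of the held-out elements' Poisson probabilities, i.e., their inverse perplexity—can be sketched as follows; the counts and rates are hypothetical:

```python
import numpy as np
from math import lgamma, exp

def poisson_pmf(y, rate):
    # P(Y = y) for Y ~ Poisson(rate), computed in log space for stability.
    return exp(y * np.log(rate) - rate - lgamma(y + 1))

# Hypothetical held-out counts and their sample-averaged Poisson rates.
y_held = np.array([0, 2, 1, 0, 5])
rates = np.array([0.1, 1.8, 0.9, 0.4, 4.2])

probs = np.array([poisson_pmf(y, r) for y, r in zip(y_held, rates)])

# Combine by the geometric mean of the probabilities: the inverse perplexity.
inverse_perplexity = np.exp(np.mean(np.log(probs)))
print(inverse_perplexity)
```

Because the geometric mean multiplies the probabilities together, one badly mispredicted non-zero element drags the score down sharply, while many well-predicted zeros contribute only modestly, which is the penalization behavior described above.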
7 Exploratory Analysis
We used a tensor of ICEWS events spanning 1995–2000, with monthly time steps, to explore the latent structure discovered by BPTD. We initially set $C$, $K$, and $R$ to large values—yielding many latent classes—and used the shrinkage priors to adaptively learn the most appropriate numbers of communities, topics, and regimes. We found twenty communities and six topics with weights that were significantly greater than zero. We provide a plot of the community weights in the supplementary material. Although all three regimes had non-zero weights, one had a much larger weight than the other two. For comparison, Schein et al. (2015) used fifty latent classes to model the same data, while Hoff (2015) modeled a similar tensor from GDELT.
Topics of action types: We show the inferred action–topic factors as a heatmap in the left subplot of figure 4. We ordered the topics by their weights $\nu_k$, which are shown above the heatmap. The inferred topics correspond very closely to CAMEO's QuadClass scheme. Moving from left to right, the topics place their mass on increasingly negative actions. Topics 1 and 2 place most of their mass on Verbal Cooperation actions; topic 3 places most of its mass on Material Cooperation actions and the neutral 1—Make Statement action; topic 4 places most of its mass on Verbal Conflict actions and the 1—Make Statement action; and topics 5 and 6 place their mass on Material Conflict actions.
Topic-partitioned community–community networks: In the right subplot of figure 4, we visualize the inferred community structure for a single topic $k$ and the most active regime $r$. The bottom-left heatmap is the community–community interaction network $\lambda^{(k,r)}_{c \rightarrow d}$. The top-left heatmap depicts the rate at which each country acts as a sender in each community, and the bottom-right heatmap depicts the rate at which each country acts as a receiver in each community. The top-right heatmap depicts the number of times each country took an action associated with topic $k$ toward each other country during regime $r$. We grouped the countries by their strongest community memberships and ordered the communities by their within-community interaction weights $\eta^{\leftrightarrow}_c$, from smallest to largest; the thin green lines separate the countries that are strongly associated with one community from those strongly associated with its adjacent communities.
Some communities contain only one or two strongly associated countries. For example, community 1 contains only the US, community 6 contains only China, and community 7 contains only Russia and Belarus. These communities mostly engage in between-community interaction. Other, larger communities, such as communities 9 and 15, mostly engage in within-community interaction. Most communities have a strong geographic interpretation. Moving upward from the bottom, there are communities that correspond to Eastern Europe, East Africa, South-Central Africa, Latin America, Australasia, Central Europe, Central Asia, etc. The community–community interaction network summarizes the patterns in the top-right heatmap. This topic is dominated by the 4—Consult action, so the network is symmetric; the more negative topics have asymmetric community–community interaction networks. We therefore hypothesize that cooperation is an inherently reciprocal type of interaction. We provide visualizations for the other five topics in the supplementary material.
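The grouping and ordering used in the visualization—countries by their strongest community membership, communities by their within-community weight—can be sketched as follows, with all values hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
V, C = 6, 3
Theta = rng.gamma(1.0, 1.0, size=(V, C))   # country-community factors
eta_within = rng.gamma(1.0, 1.0, size=C)   # within-community interaction weights

# Group countries by their strongest community membership, then order the
# groups by within-community weight, from smallest to largest.
strongest = Theta.argmax(axis=1)
community_order = np.argsort(eta_within)
country_order = np.concatenate(
    [np.flatnonzero(strongest == c) for c in community_order])

print(country_order)   # a permutation of 0..V-1
```

Reindexing the heatmaps' rows and columns by `country_order` produces the block structure described above, with boundaries between the community groups.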
8 Summary
We presented Bayesian Poisson Tucker decomposition (BPTD) for learning the latent structure of international relations from country–country interaction events of the form "country $i$ took action $a$ toward country $j$ at time $t$." Unlike previous models, BPTD takes advantage of all three representations of an interaction event data set: 1) a set of event tokens, 2) a tensor of event type counts, and 3) a series of weighted multi-network snapshots. BPTD uses a Poisson likelihood, respecting the discrete nature of the data and its inherent sparsity. Moreover, BPTD yields a compositional allocation inference algorithm that is more efficient than non-compositional allocation algorithms. Because BPTD is a Tucker decomposition model, it shares parameters across latent classes; in contrast, CP decomposition models force each latent class to capture potentially redundant information. BPTD therefore "does more with less." This efficiency is reflected in our predictive analysis: BPTD outperforms BPTF—a CP decomposition model—as well as two other baselines. BPTD learns interpretable latent structure that aligns with well-known concepts from the networks literature. Specifically, BPTD learns latent country–community memberships, including the number of communities, as well as directed community–community interaction networks that are specific to topics of action types and temporal regimes. This structure captures the complexity of country–country interactions, while revealing patterns that agree with our knowledge of international relations. Finally, although we presented BPTD in the context of interaction events, it is well suited to learning latent structure from other types of multidimensional count data.
Acknowledgements
We thank Abigail Jacobs and Brandon Stewart for helpful discussions. This work was supported by NSF #SBE0965436, #IIS1247664, #IIS1320219; ONR #N000141110651; DARPA #FA87501420009, #N6600115C4032; Adobe; the John Templeton Foundation; the Sloan Foundation; the UMass Amherst Center for Intelligent Information Retrieval. Any opinions, findings, conclusions, or recommendations expressed in this material are the authors’ and do not necessarily reflect those of the sponsors.
References

Airoldi et al. (2008) Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.
 Ball et al. (2011) Ball, B., Karrer, B., and Newman, M. E. J. Efficient and principled method for detecting communities in networks. Physical Review E, 84(3), 2011.
 Blei et al. (2003) Blei, D., Ng, A., and Jordan, M. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
 Boschee et al. Boschee, E., Lautenschlager, J., O'Brien, S., Shellman, S., Starz, J., and Ward, M. ICEWS coded event data. Harvard Dataverse, V10.
 Cemgil (2009) Cemgil, A. T. Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience, 2009.
 Chi & Kolda (2012) Chi, E. C. and Kolda, T. G. On tensors, sparsity, and nonnegative factorizations. SIAM Journal on Matrix Analysis and Applications, 33(4):1272–1299, 2012.
 Cichocki et al. (2009) Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-i. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.
 DuBois & Smyth (2010) DuBois, C. and Smyth, P. Modeling relational events via latent classes. In Proceedings of the Sixteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 803–812, 2010.
 Ferguson (1973) Ferguson, T. S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.
 Gelman (2006) Gelman, A. Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3):515–533, 2006.
 Gerner et al. Gerner, D. J., Schrodt, P. A., Abu-Jabr, R., and Yilmaz, Ö. Conflict and mediation event observations (CAMEO): A new event data framework for the analysis of foreign policy interactions. Working paper.

 Gopalan et al. (2014) Gopalan, P., Ruiz, F. J. R., Ranganath, R., and Blei, D. M. Bayesian nonparametric Poisson factorization for recommendation systems. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33, pp. 275–283, 2014.
 Gopalan et al. (2015) Gopalan, P., Hofman, J., and Blei, D. Scalable recommendation with Poisson factorization. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015.
 Harshman (1970) Harshman, R. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis. UCLA Working Papers in Phonetics, 16:1–84, 1970.
 Hoff (2014) Hoff, P. Multilinear tensor regression for longitudinal relational data. arXiv:1412.0048, 2014.
 Hoff (2015) Hoff, P. Equivariant and scalefree Tucker decomposition models. Bayesian Analysis, 2015.
 Karrer & Newman (2011) Karrer, B. and Newman, M. E. J. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 2011.
 Kemp et al. (2006) Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., and Ueda, N. Learning systems of concepts with an infinite relational model. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, 2006.

 Kim & Choi (2007) Kim, Y.-D. and Choi, S. Nonnegative Tucker decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007.
 Kingman (1972) Kingman, J. F. C. Poisson Processes. Oxford University Press, 1972.
 Kolda & Bader (2009) Kolda, T. G. and Bader, B. W. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
 Leetaru & Schrodt (2013) Leetaru, K. and Schrodt, P. GDELT: Global data on events, location, and tone, 1979–2012. Working paper, 2013.
 Marlin (2004) Marlin, B. Collaborative filtering: A machine learning perspective. Master’s thesis, University of Toronto, 2004.
 Mørup et al. (2008) Mørup, M., Hansen, L. K., and Arnfred, S. M. Algorithms for sparse nonnegative Tucker decompositions. Neural Computation, 20(8):2112–2131, 2008.
 Nickel et al. (2012) Nickel, M., Tresp, V., and Kriegel, H.-P. Factorizing YAGO: Scalable machine learning for linked data. In Proceedings of the Twenty-First International World Wide Web Conference, pp. 271–280, 2012.
 Nickel et al. (2015) Nickel, M., Murphy, K., Tresp, V., and Gabrilovich, E. A review of relational machine learning for knowledge graphs: From multirelational link prediction to automated knowledge graph construction. arXiv:1503.00759, 2015.
 Nowicki & Snijders (2001) Nowicki, K. and Snijders, T. A. B. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.
 Schein et al. (2015) Schein, A., Paisley, J., Blei, D. M., and Wallach, H. Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In Proceedings of the Twenty-First ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045–1054, 2015.
 Schmidt & Mørup (2013) Schmidt, M. N. and Mørup, M. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Processing Magazine, 30(3):110–128, 2013.
 Tucker (1964) Tucker, L. R. The extension of factor analysis to threedimensional matrices. In Frederiksen, N. and Gulliksen, H. (eds.), Contributions to Mathematical Psychology. Holt, Rinehart and Winston, 1964.
 Welling & Weber (2001) Welling, M. and Weber, M. Positive tensor factorization. Pattern Recognition Letters, 22(12):1255–1261, 2001.
 Xu et al. (2012) Xu, Z., Yan, F., and Qi, Y. Infinite Tucker decomposition: Nonparametric Bayesian models for multiway data analysis. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, pp. 1023–1030, 2012.
 Zhao et al. (2015) Zhao, Q., Zhang, L., and Cichocki, A. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1751–1763, 2015.
 Zhou (2015) Zhou, M. Infinite edge partition models for overlapping community detection and link prediction. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 1135–1143, 2015.
 Zhou & Carin (2012) Zhou, M. and Carin, L. Augment-and-conquer negative binomial processes. In Advances in Neural Information Processing Systems Twenty-Five, pp. 2546–2554, 2012.
 Zhou & Carin (2015) Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):307–320, 2015.