1 Introduction
Dynamic networks are commonplace in many fields, such as economics, neuroscience, biology, recommender systems, data mining, and others. In these networks, different entities are represented by nodes, and their interactions over time are tracked through their edges. Dynamics networks are interrelated with the statistical relational models [1]
presented in artificial intelligence
[2]. Statistical relational models often are characterized as: graphical models [3], latent class models [4] and tensor factorization models [5]. Interestingly, there is a natural interconnection between the concept of latent models, graphical models and tensor factorization [6, 7]. The relational datasets are in the form of graphs, with nodes and edges representing, respectively, the various entities and their relationships [8]. The relational data is a graphstructured knowledge data that contain information for the relationships between some entities.Tensor factorization methods have been proposed long ago for analyzing relational datasets and dynamic networks as the dynamics can be fully represented as three dimensional tensor with the first two axes indexing the nodes, and the third axis indexing time, Figure 1. It was shown that the RESCAL tensor decomposition can reveal notable interactions in dynamic asymmetric pairwise relationship tensor [5], especially, when restricting the factors to be nonnegative, which make the model parts based and highly interpretable [9].
One of the fundamental challenges in any model analysis is the determination of model hyperparameters
[10]. For the tensor decomposition this means selection of the correct latent dimension in the data that is related with the estimation of the rank of the analyzed tensor [11], which is an NPhard problem [12]. The development of various procedures for model selection is an active research topic that include heuristics such as the Automatic Relevance Determination(ARD)
[13] or generalized class of information criteria for tensors with specific properties [14]. Especially in economics extracting easy understandable latent variables impacts important questions about the hidden mechanisms, causes, propagating channels, and groups of any international trade and economic events[15, 16].RESCAL can be considered as a specific nonnegative Tucker2 decomposition with two equal factors, because of the inherent symmetry of the data. Hence, in RESCAL we need to know the multirank rather the rank of the analyzed tensor. In the case of nonnegative decomposition and even in presence of deficiency of the factors we can apply NMF to the corresponding unfoldings of the tensor, to find the minimal multi rank [17].
To determine the latent dimensionality in NMF, here we utilize a recent model determination technique, called NMFk, [18, 19] that is scalable [20], and has been used to decompose the biggest collection of human cancer genomes [21]. NMFk integrate the classical NMF with custom clustering and Silhouette statistics [22] and works on the principle of stability of the extracted latent variables, from several NMFminimizations combined with the accuracy of the minimization in order to estimate the optimal number of latent features. Here, we report the application of NMFk to determine latent dimensions, to the nonnegative RESCAL tensor decomposition. We demonstrate the efficacy of our method on synthetic data and apply our model determination method to decompose an wellknown international trade flow dataset and validate our results against established economic empirical findings.
1.1 Preliminaries and Notation
Throughout this work vectors are denoted with lowercase bold letters,
, matrices are denoted with uppercase letters , and tensors are denoted with uppercase script letters, . Mode1 multiplication between an order three tensor and a matrix is defined , and similarly for modes 2 and 3. The Frobenius norm of a matrix or tensor is the square root of the sum of the squares of the elements, or , . A single subscript on a matrix, or tensor indicates the slice of the object along the last index, so indicates the column of the matrix, and indicates the matrix along the third mode. Additionally, a superscript in parentheses is used to enumerate items in a set or ensemble, .1.2 Related Works
Here we describe some relevant works that are working with a similar decomposition model. The first proposed model was the single domain Decomposition into Directional Components (DEDICOM) model [23]. DEDICOM is capable of analyzing asymmetric data in marketing research and it has both matrix and tensor versions without constraints on its factors:
twoway DEDICOM  
threeway DEDICOM 
In the threeway DEDICOM model, is the frontal slice of the tensor , and is a diagonal matrix. These models describe a single domain in the sense that they require the row space to be the same as the column space. Since there is no constraint on the factors, these decompositions can be estimated similarly as SVD.
Later, an alternating algorithm, Alternating Simultaneous Approximation Least Square and Newton (ASALSAN) was proposed to perform a threeway DEDICOM [24] that can be applied to large and sparse data. Moreover, the nonnegative version of threeway DEDICOM was also introduced using a multiplicative update algorithm. The model was then applied to analyze email network data and export/import data. However, the latent dimension was selected without being justified, and the meaning of the extracted factors was not analyzed in details.
A relaxed version of threeway DEDICOM is the RESCAL model [5]. The model decomposes a threeway tensor as follows:
In [5], the factors and minimized the regularized minimization problem, and can be estimated using a variation of ASALSAN:
The model was then applied to different datasets, and the authors suggested to find latent dimension using crossvalidation. Instead, they chose a rather large dimension (
), and then used kmeans clustering method to cluster the matrix
into 6 groups.Following the initial work, Ref. [25] introduced updating schemes for different variants of the nonnegative RESCAL mode, including leastsquares with regularization, KLdivergence with regularization, and other.
2 Methods and Implementation
2.1 Nonnegative RESCAL
The RESCAL model is a tensor decomposition that takes advantage of the inherent structure of relational properties such as those expected in a dynamic network. More specifically, RESCAL decomposes an order three tensor by finding a common low dimensional latent space for the first two modes such that
where is an matrix containing the features, and is an tensor capturing the mixing relations between the features. In practice, a decomposition is always approximated by solving the optimization problem
(1) 
When applied to a dynamic networks, RESCAL is interpretable in the way that each column of represents a group of objects or nodes, and represents the relations among these groups at time . An equivalent problem statement demonstrates that RESCAL simultaneously decomposes each slice of an order three tensor with a rank factorization,
(2) 
With this interpretation RESCAL attempts to find a common set of features that can be simultaneously used to represent both the row space and the column space of the matrices . To remove a scaling ambiguity, RESCAL also can constrain the columns of to be unit norm, for .
Quite often with nonnegative data, a nonnegative decomposition provides more insightful features with parts based decomposition [9]. Nonnegative RESCAL provides the same advantage by decomposing the data with nonnegative features, and nonnegative mixing matrices with the optimization
(3)  
subject to 
.
2.2 Multiplicative Update Algorithm
To solve the nonnegative constraint minimization problem in equation 3, we use the multiplicative update scheme similar to the one used for DEDICOM model [24], and for nonnegative RESCAL [25]. Starting from nonnegative random initialization of and , the following update steps are performed until the relative error converges to a predefined tolerance:
(4) 
where is the vectorize operator. After the multiplicate update scheme converges, and are appropriately scaled such that columns of have a sum equal to one. Notice that the decomposition can be scaled without affecting the reconstruction error.
2.3 Model Selection
To select an appropriate latent dimension, we adapt the clustering procedure that has found success in NMF [18]. For each explored latent dimension, , our procedure applies three steps: (i) bootstrapping by resampling the data, (ii) decomposing the bootstrapped data, and then (iii) analyzing the cluster stability of the solutions. Figure 3 shows a diagram of the model selection scheme.
To resample the data, we construct an ensemble of tensors sampled from for where
is a uniform distribution between
and . Each resampling introduces some variability in the data which mitigates the possibility of overfitting.We use custom clustering algorithm designed to exploit the stability of the NMF’s solutions corresponding to the resampled data. The custom property of the clustering is that each cluster should contain precisely one feature vector from each . Our algorithm is based on kmeans but iterates over each
to assign each vector to an appropriate centroid. The assignment is done by a greedy algorithm applied to the cosine similarity between the columns of
and the current centroids.To analyze the quality of the clustering, we use silhouette statistics [22]. The silhouette of a single point has a value between 1 and 1 relating how close it is to other points in the same cluster and how far is this point to the closest of the other clusters. A high silhouette score indicates that the clusters are compact and well separated, while a low silhouette score indicates that the clusters are not well separated. We use both the mean silhouette score, as well as the minimum silhouette score of a group as cluster quality metrics.
The selection of the correct latent dimension is accomplished by considering both the silhouette statistics, which measures the stability of the solutions, and the relative error. We expect that with the correct latent dimension the solutions will cluster well with a small relative error. With a too small latent space, the relative error will be too large, and with too large latent space there will be latent features representing the noise in the system that are not stable and hence will not cluster well, resulting in a low silhouette score. We consider the correct latent dimension to be the largest dimension that generates low relative errors and high silhouette scores.
3 Experiments
3.1 Synthetic Data
Here we test the performance of our protocol for model selection in determining the ground truth latent dimension of synthetic datasets with different levels of noise. First, a synthetic data tensor with dimensions and the latent dimension is generated as follows:

Elements of matrix are randomly sampled from . The matrix is only selected if . Otherwise, it is regenerated.

Elements of matrix is sparsified by setting elements smaller than to zeros.

The tensor is also generated from .

Tensor is also sparsified by setting elements smaller than to zeros.

where elements of are sampled from .

The noise level of the data tensor can be adjusted by the value of .

The noise level is computed as .
The latent dimension determination procedure described in 2.3 is then applied to the simulated data. More specifically, the sample perturbation is and the number of iterations .
In these scenarios, for a range of , a tensor of dimension is simulated with the ground truth dimension . The plots of relative reconstruction error, minimum and average silhouette scores are shown in Figure 4. As the noise factor increases, the noise level also increases and gradually distorts the Lshape of the reconstruction error curves. However, the silhouette score curves are consistent up to , and after that, the silhouette score curves start diverging, demonstrating that the clusters are no longer compact and well separated.
3.2 Economic Application: Decompose the International Trade Flows
International trade has been shown to generate mutual benefit between countries by allowing them to focus on specialization and exchange their produced goods and services. Unsurprisingly, it has increasingly contributed to the Gross Domestic Product (GDP). In fact, according to the World Bank, in 2017, the world GDP is about $80 trillion. Of this total, about 29 percent was traded across countries: international trade flows in goods and services is about $23 trillion. It has been shown that the economic growth of a country is strongly associated with its role in the world trading network. Therefore, understanding the trade pattern is an essential step before we can discuss the effects of international trade, or even suggest policy changes.
Here we show that by applying the nonnegative RESCAL model, with our latent dimension determination method, we can decompose the international trade network into different groups of countries whose exports and imports are tightly linked together and their interactions over time are captured in the resulting interacting tensor. More interestingly, the groups’ activities are also meaningful in the sense that they are consistent with stylized economic facts.
International trade flows data
We obtained the international trade flows data from Direction of Trade Statistics, IMF [26]. Specifically, the data contains monthly export amounts in U.S. dollars between 23 countries from January 1981 to December 2015 (420 months). Thus the data tensor has 23 by 23 by 420 dimensions, in which each entry, , represents how much country exports to countries in month . For clarity, the list of countries includes Australia, Canada, China Mainland, Denmark, Finland, France, Germany, Hong Kong, Indonesia, Ireland, Italy, Japan, Korea, Malaysia, Mexico, Netherlands, New Zealand, Singapore, Spain, Sweden, Thailand, United Kingdom, United States.
This dataset has previously been analyzed with variations of the RESCAL model. In [27], Chen et al. also apply the RESCAL model, though not nonnegative, and they do not have a procedure to determine the correct latent dimension, they simply chose a latent dimension of three for illustrative purposes of the model.
Determine the latent dimension & optimal factors
To determine the latent dimension for the model, we resampled as described in Section 2.3 with the uniform distribution so that each value in our ensemble has the measured value error. We resampled from this distribution 50 times to construct our ensemble and calculated the silhouette statistic for a range of dimensions . Figure 5 shows the relative reconstruction errors, the average and minimum silhouette statistics leading us to conclude that that the true latent dimension is 5.
To determine the optimal factors for the analysis, we first ran 100 iterations from random initialization on the original data with the selected dimension , and select the decomposition, which provided the lowest reconstruction error. For both purposes, the stopping criterion is the relative convergence rate .
Interpretation of the decomposition
Here we show the interpretation and analysis of the optimal decomposition. First, since the columns of are normalized to have a sum equal to one, the entry can be interpreted as how much the country contributes into group . Figure 6 shows that the latent factors, which here are the country contribution profile in each group, approximately correspond to five economic regions: Asia and Pacific (without China), Europe, NAFTA (Canada, Mexico, and the U.S.), U.S., and China. These Geoeconomic regions are essential participants in the international trade, verifying that the model selection procedure extracts meaningful latent factors.
Next, we analyze the interacting tensor to determine its economic meanings. We first look at the aggregate export level for each economic group over time by summing across the rows of each without the diagonal elements. By doing this, we exclude the contribution of the groups themselves to their aggregate export activities. We then check how well these approximated export activities agree with four global and local economic recession periods, which are identified by the National Bureau of Economic Research (NBER).
As shown in Fig 7, the approximated activities well match with the international trading trends in all considered periods. For example, during the Great Recession (12/200706/2009), in which international trade was dramatically decreasing, the activities of all groups dropped substantially. Secondly, during early 2000s Recession, which was partially caused by the dotcom bubble and September 11, the activities represent the fact that the trading trend of all groups, except Asia, were dropping. Thirdly, during the Asia Financial Crisis (1997199), only the activities of the group Asia is going down, representing the fact that this Recession mostly affected Asian countries. Lastly, during the European Debt Crisis, which peaked in 20102012, while the economy of other regions was recovering from the Great Recession, only European countries struggled with their high government debt. This styled fact is replicated in the approximated activity of the Europegroup. Overall, the interacting tensor captured the connection between the export and economic health of different economic regions.
Finally, we analyze the interaction tensor by summing the tensor over time, which gives us the matrix describing how strongly these groups interact. is then normalized by its maximum value for better visualization.
We makes three observations from Figure 8. First, the Europian group has the strongest interaction with itself, which makes sense since this is a group of advanced economies, and they have formed the European Union since 1995. Second, the strong twoway connection between NAFTA and the United States and between China and Asia, are also matched with statistics. Third, by comparing the row and column for China and United States, we can see that China is a trade surplus (the amount of exports is greater than the number of imports), and the United States is a trade deficit. Overall, the interacting tensor did extract meaningful information from the data in the sense that they agree with international economics empirical facts, indicating that our model selection procedure can capture meaningful activities from the data.
4 Discussion
In this paper, we introduce a model selection protocol for determination of the dominant latent dimension of the nonnegative RESCAL model. This method, which is based on nonnegativity assumptions on both factors and , evaluates the stability of the latent factors , or equivalently the quality of clusters generated by factorization of a set of different realizations of the input data. The method then selects the highest dimension at which the stability is still high. Our method performs well on sparse synthetic data with different noise levels. Moreover, when applied to a real dataset, the international trade flows data from IMF, the model was able to decompose considered countries into meaningfull geoeconomics regions; with interacting activities matching the trading characteristics of each region and consistent with what has been observed about different economic recessions.
It would be of a particular interest to apply the method presented here to two common modeling challenges in business analytics and quantitative marketing: forecasting, and customer marketing segmentation. For example, traditional forecasting techniques rely almost exclusively on the timeseries properties of the learning data set (usually called statistical forecasting methods). Another set of techniques that have been developed introduces additional (external) variables, and a regressionlike model was fitted with the additional requirement that the residuals are an ARIMAdistributed process; they are referred to as machine learning (ML) based forecasting methods
[28]. However, it is unclear which set of techniques is the better one and would universally work for different types of forecasts: customers forecasting vs. revenue forecasting vs. i.e., inventory forecasting. Two recent papers have carried out an adhoc comparison of statistical vs. machine learning models and have arrived at precisely opposite conclusions using similar testing methodologies and goodness of fit metrics ([28] and [29]). Both statistical and machine learning techniques use the timeseries nature of the data in the forecasting in a specific manner, whereas the method presented here treats the time dimension like all other dimensions of the multidimensional data set. Further, it is of interest to compare the performance of our method for forecasting of customers and revenue and contrast it against the statistical and the ML methodologies. Remarkably, Markidakis et al. ([29]) have also compared the forecasting performance of several Deep Learning algorithms and have found it to be stacking unfavorably against the statistical models – it would be instructive also to see how the Nonnegative RESCAL algorithm fares against deep learning models.
Further, the customer segmentation is a fundamental modeling exercise in quantitative and precision marketing. Customer segmentation has almost exclusively been done in two distinct, loosely connected steps: A) segmenting customers using static data (no timeresolved data, typically using clustering techniques), B) consider transitions to (slightly) different segmentation states, to simulate timeresolved behavior. We intend to apply the Nonnegative RESCAL methodology to this problem as the time dimension is treated in much more natural fashion here.
Acknowledgment
The work performed in Los Alamos National Laboratory is supported by the U.S. Department of Energy National Nuclear Security Administration under Contract No. DEAC5206NA25396 and LANL laboratory directed research and development (LDRD) grant 20190020DR. Computations were supported by LANL Institutional Computing Program.
5 Bibliography
References
 [1] Sharan U and Neville J 2007 Exploiting timevarying relationships in statistical relational models Proceedings of the 9th WebKDD and 1st SNAKDD 2007 workshop on Web mining and social network analysis pp 9–15
 [2] Raedt L D, Kersting K, Natarajan S and Poole D 2016 Synthesis Lectures on Artificial Intelligence and Machine Learning 10 1–189
 [3] Getoor L and Mihalkova L 2011 Learning statistical models from relational data Proceedings of the 2011 ACM SIGMOD International Conference on Management of data pp 1195–1198
 [4] Magidson J and Vermunt J K 2004 The Sage handbook of quantitative methodology for the social sciences 175–198
 [5] Nickel M, Tresp V and Kriegel H P 2011 A threeway model for collective learning on multirelational data. Icml vol 11 pp 809–816
 [6] Ishteva M 2015 Tensors and latent variable models International Conference on Latent Variable Analysis and Signal Separation (Springer) pp 49–55
 [7] Robeva E and Seigal A 2019 Information and Inference: A Journal of the IMA 8 273–288
 [8] Koller D, Friedman N, Džeroski S, Sutton C, McCallum A, Pfeffer A, Abbeel P, Wong M F, Heckerman D, Meek C et al. 2007 Introduction to statistical relational learning
 [9] Lee D D and Seung H S 1999 Nature 401 788–791
 [10] MacKay D J et al. 1994 ASHRAE transactions 100 1053–1062
 [11] Anandkumar A, Ge R, Hsu D, Kakade S M and Telgarsky M 2014 Journal of Machine Learning Research 15 2773–2832
 [12] Håstad J 1990 Journal of algorithms (Print) 11 644–654
 [13] Mørup M and Hansen L K 2009 Journal of Chemometrics: A Journal of the Chemometrics Society 23 352–363
 [14] Shi C, Lu W and Song R 2019 The Journal of Machine Learning Research 20 809–846
 [15] Eaton J, Kortum S, Neiman B and Romalis J 2016 American Economic Review 106 3401–38
 [16] Bems R, Johnson R C and Yi K M 2013 Annu. Rev. Econ. 5 375–400
 [17] Alexandrov B, DeSantis D, Manzini G and Skau E 2019 arXiv preprint arXiv:1909.07570
 [18] Alexandrov L B, NikZainal S, Wedge D C, Campbell P J and Stratton M R 2013 Cell reports 3 246–259
 [19] Alexandrov B S and Vesselinov V V 2014 Water Resources Research 50 7332–7347
 [20] Chennupati G, Vangara R, Skau E, Djidjev H and Alexandrov B 2020 The Journal of Supercomputing 1–31
 [21] Alexandrov L B, NikZainal S, Wedge D C, Aparicio S A, Behjati S, Biankin A V, Bignell G R, Bolli N, Borg A, BørresenDale A L et al. 2013 Nature 500 415–421
 [22] Rousseeuw P J 1987 Journal of computational and applied mathematics 20 53–65
 [23] Harshman R A, Green P E, Wind Y and Lundy M E 1982 Marketing Science 1 205–242
 [24] Bader B W, Harshman R A and Kolda T G 2006 Pattern analysis of directed graphs using dedicom: an application to enron email. Tech. rep. Sandia National Laboratories
 [25] Krompaß D, Nickel M, Jiang X and Tresp V 2013 Nonnegative tensor factorization with rescal Tensor Methods for Machine Learning, ECML workshop
 [26] Marini M, Dippelsman R J and Stanger M 2018
 [27] Chen E Y and Chen R 2017 arXiv preprint arXiv:1710.06325
 [28] Nesreen K Ahmed Amir F Atiya N E G H E S 2010 Econometric Reviews 29 594––621
 [29] Spyros Makridakis Evangelos Spiliotis V A 2018 PLoS One 13 1––27
Comments
There are no comments yet.