Inductive Framework for Multi-Aspect Streaming Tensor Completion with Side Information

02/18/2018 ∙ by Madhav Nimishakavi, et al. ∙ Microsoft ∙ Indian Institute of Science

Low-rank tensor completion is a well-studied problem and has applications in various fields. However, in many real-world applications the data is dynamic, i.e., the tensor grows as new data arrives. Besides the tensor, in many real-world scenarios, side information is also available in the form of matrices which also grow. Existing work on dynamic tensor completion does not incorporate side information, and most of the previous work assumes that the tensor grows only in one mode. We bridge this gap in this paper by proposing a dynamic tensor completion framework called Side Information infused Incremental Tensor Analysis (SIITA), which incorporates side information and works for general incremental tensors. We carry out extensive experiments on multiple real-world datasets to demonstrate the effectiveness of SIITA in various settings.


1 Introduction

Low-rank tensor completion is a well-studied problem with applications in recommendation systems [31], link prediction [4], and compressed sensing [3], to name a few. The majority of previous work focuses on solving the problem in a static setting [7, 9, 14]. However, most real-world data is dynamic; for example, in an online movie recommendation system the number of users and movies increases with time. It is prohibitively expensive to rerun static algorithms on dynamic data. Therefore, there has been increasing interest in developing algorithms for dynamic low-rank tensor completion [15, 19, 28].

In many real-world scenarios, besides the tensor data, additional side information is also available, e.g., in the form of matrices. An example is the movie-genre information in movie recommendation. In dynamic scenarios, the side information grows with time as well. There has been a considerable amount of work on incorporating side information into tensor completion [22, 8]. However, the previous work on incorporating side information deals with the static setting. In this paper, we propose a dynamic low-rank tensor completion model that incorporates side information growing with time.

Most of the current dynamic tensor completion algorithms work in the streaming scenario, i.e., the case where the tensor grows only in one mode, which is usually the time mode. In this case, the side information is a static matrix. The multi-aspect streaming scenario [6, 28], on the other hand, is a more general framework, where the tensor grows in all of its modes. In this setting, the side information matrices also grow. Figure 1 illustrates the difference between streaming and multi-aspect streaming scenarios with side information.

Besides side information, incorporating nonnegative constraints into tensor decomposition is desirable in an unsupervised setting. Nonnegativity is essential for discovering interpretable clusters [11, 21]. Nonnegative tensor learning has been explored for applications in computer vision [26, 16] and unsupervised induction of relation schemas [24], to name a few. Several algorithms for online Nonnegative Matrix Factorization (NMF) exist in the literature [18, 35], but, to the best of our knowledge, algorithms for nonnegative online tensor decomposition with side information have not been explored. We also fill this gap by showing how nonnegative constraints can be enforced on the decomposition learned by our proposed framework SIITA.

(a) Streaming tensor sequence with side information. (b) Multi-aspect streaming tensor sequence with side information.
Figure 1: Illustration of streaming and multi-aspect streaming sequences with side information. The blue block represents the tensor at the current time step and the green block represents the side information. The blocks in grey represent the data at previous time steps. For easy understanding, we show side information along only one mode.

In this paper, we work with the more general multi-aspect streaming scenario and make the following contributions:

  • Formally define the problem of multi-aspect streaming tensor completion with side information.

  • Propose a Tucker-based framework, Side Information infused Incremental Tensor Analysis (SIITA), for the problem of multi-aspect streaming tensor completion with side information. We employ a stochastic gradient descent (SGD) based algorithm for solving the optimization problem.

  • Incorporate nonnegative constraints with SIITA for discovering the underlying clusters in an unsupervised setting.

  • Demonstrate the effectiveness of SIITA using extensive experimental analysis on multiple real-world datasets in all the settings.

The organization of the paper is as follows. In Section 3, we introduce the definition of a multi-aspect streaming tensor sequence with side information, and in Section 4 we discuss our proposed framework SIITA. We also discuss how nonnegative constraints can be incorporated into SIITA in Section 4. The experiments are shown in Section 5, where SIITA performs effectively in various settings. All our code is implemented in Matlab and can be found at https://madhavcsa.github.io/.

2 Related Work

Property TeCPSGD[19] OLSTEC [15] MAST [28] AirCP [8] SIITA (this paper)
Streaming
Multi-Aspect Streaming
Side Information
Sparse Solution
Table 1: Summary of different tensor streaming algorithms.

Dynamic Tensor Completion: [29, 30] introduce the concept of dynamic tensor analysis by proposing multiple higher-order SVD based algorithms, namely Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA) and Window-based Tensor Analysis (WTA), for the streaming scenario. [25] propose two adaptive online algorithms for CP decomposition of 3-order tensors. [34] propose an accelerated online algorithm for Tucker factorization in the streaming scenario, while an accelerated online algorithm for CP decomposition is developed in [36].

A significant amount of research work is carried out for dynamic tensor decompositions, but work focusing on the problem of dynamic tensor completion is relatively less explored. Work by [19] can be considered a pioneering work in dynamic tensor completion. They propose a streaming tensor completion algorithm based on CP decomposition. Recent work by [15] is an accelerated second order Stochastic Gradient Descent (SGD) algorithm for streaming tensor completion based on CP decomposition. [6] introduces the problem of multi-aspect streaming tensor analysis by proposing a histogram based algorithm. Recent work by [28] is a more general framework for multi-aspect streaming tensor completion.

Tensor Completion with Auxiliary Information: [1] propose a Coupled Matrix Tensor Factorization (CMTF) approach for incorporating additional side information. Similar ideas are also explored in [2] for factorization on Hadoop and in [5] for link prediction in heterogeneous data. [22] propose within-mode and cross-mode regularization methods for incorporating similarity side information matrices into factorization. Based on similar ideas, [8] propose AirCP, a CP-based tensor completion algorithm.

[32] propose nonnegative tensor decomposition by incorporating nonnegative constraints into CP decomposition. Nonnegative CP decomposition is explored for applications in computer vision in [26]. Algorithms for nonnegative Tucker decomposition are proposed in [16], and algorithms for sparse nonnegative Tucker decomposition are proposed in [20]. However, to the best of our knowledge, nonnegative tensor decomposition algorithms do not exist for dynamic settings, a gap we fill in this paper.

An inductive framework for matrix completion with side information is proposed in [12, 23, 27]; to the best of our knowledge, it has not been explored for tensor completion. In this paper, we propose an online inductive framework for multi-aspect streaming tensor completion.

Table 1 provides details about the differences between our proposed SIITA and various baseline tensor completion algorithms.

3 Preliminaries

An $N$-order or $N$-mode tensor is an $N$-way array. We use boldface calligraphic letters to represent tensors (e.g., $\mathcal{X}$), boldface uppercase to represent matrices (e.g., $\mathbf{U}$), and boldface lowercase to represent vectors (e.g., $\mathbf{v}$). $\mathcal{X}[i_1, \dots, i_N]$ represents the entry of $\mathcal{X}$ indexed by $[i_1, \dots, i_N]$.

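Definition 1 (Coupled Tensor and Matrix) [28]: A matrix and a tensor are called coupled if they share a mode. For example, a user × movie × time tensor and a movie × genre matrix are coupled along the movie mode.

Definition 2 (Tensor Sequence) [28]: A sequence of $N$-order tensors $\mathcal{X}^{(1)}, \dots, \mathcal{X}^{(t)}, \dots$ is called a tensor sequence, denoted as $\{\mathcal{X}^{(t)}\}$, where each $\mathcal{X}^{(t)} \in \mathbb{R}^{I_1^{t} \times I_2^{t} \times \dots \times I_N^{t}}$ at time instance $t$.

Definition 3 (Multi-aspect streaming Tensor Sequence) [28]: A tensor sequence of $N$-order tensors $\{\mathcal{X}^{(t)}\}$ is called a multi-aspect streaming tensor sequence if, for any $t \in \mathbb{Z}^{+}$, $\mathcal{X}^{(t-1)} \in \mathbb{R}^{I_1^{t-1} \times \dots \times I_N^{t-1}}$ is the sub-tensor of $\mathcal{X}^{(t)} \in \mathbb{R}^{I_1^{t} \times \dots \times I_N^{t}}$, i.e.,

$$\mathcal{X}^{(t-1)} \subseteq \mathcal{X}^{(t)}, \quad \text{where } I_i^{t-1} \le I_i^{t}, \; \forall\, 1 \le i \le N.$$

Here, $t$ increases with time, and $\mathcal{X}^{(t)}$ is the snapshot tensor of this sequence at time $t$.
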
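Definition 4 (Multi-aspect streaming Tensor Sequence with Side Information): Given a time instance $t$, let $\mathbf{A}_i^{(t)} \in \mathbb{R}^{I_i^{t} \times M_i}$ be a side information (SI) matrix corresponding to the $i$-th mode of $\mathcal{X}^{(t)}$ (i.e., rows of $\mathbf{A}_i^{(t)}$ are coupled along the $i$-th mode of $\mathcal{X}^{(t)}$). While the number of rows in the SI matrices along a particular mode $i$ may increase over time, the number of columns remains the same, i.e., $M_i$ is not dependent on time. In particular, we have

$$\mathbf{A}_i^{(t)} = \begin{bmatrix} \mathbf{A}_i^{(t-1)} \\ \Delta_i^{(t)} \end{bmatrix}, \quad \text{where } \Delta_i^{(t)} \in \mathbb{R}^{[I_i^{t} - I_i^{t-1}] \times M_i}.$$

Putting the side information matrices of all the modes together, we get the side information set $\mathcal{A}^{(t)}$,

$$\mathcal{A}^{(t)} = \{\mathbf{A}_1^{(t)}, \dots, \mathbf{A}_N^{(t)}\}.$$

Given an $N$-order multi-aspect streaming tensor sequence $\{\mathcal{X}^{(t)}\}$, we define a multi-aspect streaming tensor sequence with side information as $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$.

We note that not all modes may have side information available. In such cases, an identity matrix of appropriate size may be used as $\mathbf{A}_i^{(t)}$, i.e., $\mathbf{A}_i^{(t)} = \mathbf{I}^{I_i^{t} \times I_i^{t}}$, where $M_i = I_i^{t}$.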
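The growth pattern in Definitions 3 and 4 can be illustrated with a small NumPy sketch. The sizes, the random data, and all variable names below are made up for illustration; only the structure (the old tensor as a leading sub-block, side information growing by appended rows, identity for modes without side information) follows the definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Snapshot at time t-1: a 3-mode tensor with side information along mode 2 only.
X_prev = rng.random((4, 5, 2))           # I_1 x I_2 x I_3 at time t-1
A2_prev = rng.random((5, 3))             # I_2 x M_2, with M_2 = 3 fixed over time

# Snapshot at time t: every mode may grow; the old tensor is the leading block.
X_curr = np.zeros((6, 7, 3))
X_curr[:4, :5, :2] = X_prev              # X^(t-1) is a sub-tensor of X^(t)

# Side information grows by appending rows; the column count never changes.
delta2 = rng.random((7 - 5, 3))          # rows for the newly arrived mode-2 items
A2_curr = np.vstack([A2_prev, delta2])   # A_2^(t) = [A_2^(t-1); Delta_2^(t)]

# Modes without side information fall back to an identity matrix (M_i = I_i^t).
A1_curr = np.eye(X_curr.shape[0])
A3_curr = np.eye(X_curr.shape[2])
```

Note that with the identity fallback the column count of that mode's matrix equals the current mode size, so it is the one case where the matrix width tracks the tensor.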

The problem of multi-aspect streaming tensor completion with side information is formally defined as follows:

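Problem Definition: Given a multi-aspect streaming tensor sequence with side information $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$, the goal is to predict the missing values in $\mathcal{X}^{(t)}$ by utilizing only entries in the relative complement $\mathcal{X}^{(t)} \setminus \mathcal{X}^{(t-1)}$ and the available side information $\mathcal{A}^{(t)}$.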

4 Proposed Framework SIITA

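In this section, we discuss the proposed framework SIITA for the problem of multi-aspect streaming tensor completion with side information. Let $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$ be an $N$-order multi-aspect streaming tensor sequence with side information. We assume that, at every time step, the entries $\mathcal{X}^{(t)}[i_1, \dots, i_N]$ are observed only for some indices $[i_1, \dots, i_N] \in \Omega$, where $\Omega$ is a subset of the complete set of indices $[i_1, \dots, i_N]$. Let the sparsity operator $\mathcal{P}_\Omega$ be defined as:

$$\mathcal{P}_\Omega(\mathcal{X})[i_1, \dots, i_N] = \begin{cases} \mathcal{X}[i_1, \dots, i_N], & \text{if } [i_1, \dots, i_N] \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$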
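As a minimal sketch of the sparsity operator, assume the observed index set is stored as a boolean mask with the same shape as the tensor (the name `sparsity_operator` and the mask representation are illustrative choices, not taken from the paper's code):

```python
import numpy as np

def sparsity_operator(X, omega_mask):
    """Keep entries of X where omega_mask is True and zero out the rest."""
    return np.where(omega_mask, X, 0.0)

X = np.arange(8, dtype=float).reshape(2, 2, 2)
omega_mask = np.zeros(X.shape, dtype=bool)
omega_mask[0, 1, 0] = True     # observed entry with value 2.0
omega_mask[1, 0, 1] = True     # observed entry with value 5.0
P = sparsity_operator(X, omega_mask)
```

For very sparse tensors a coordinate list of observed indices would be more memory-efficient than a dense mask; the dense form is used here only for readability.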

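Tucker tensor decomposition [17] is a form of higher-order PCA for tensors. It decomposes an $N$-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ into a core tensor multiplied by a matrix along each mode as follows:

$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \dots \times_N \mathbf{U}_N,$$

where $\mathbf{U}_i \in \mathbb{R}^{I_i \times r_i}$, $i = 1{:}N$, are the factor matrices and can be thought of as principal components in each mode. The tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \dots \times r_N}$ is called the core tensor, which shows the interaction between different components. $(r_1, \dots, r_N)$ is the (multilinear) rank of the tensor. The $i$-mode matrix product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ with a matrix $\mathbf{P} \in \mathbb{R}^{r \times I_i}$ is denoted by $\mathcal{X} \times_i \mathbf{P}$; more details can be found in [17]. The standard approach to incorporating side information while learning factor matrices in Tucker decomposition is to use an additive term as a regularizer [22]. However, in an online setting the additive side information term poses challenges, as the side information matrices are also dynamic. Therefore, we propose the following fixed-rank inductive framework for recovering missing values in $\mathcal{X}^{(t)}$ at every time step $t$: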
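The mode product and the Tucker reconstruction described above can be sketched in NumPy as follows. The helper names `mode_product` and `tucker_reconstruct` are illustrative, and modes are 0-indexed in the code while the text counts from 1.

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-`mode` product of tensor T with matrix M of shape (J, T.shape[mode])."""
    # tensordot puts the new axis (length J) first; move it back to position `mode`.
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def tucker_reconstruct(G, factors):
    """Compute G x_1 factors[0] x_2 ... x_N factors[N-1]."""
    out = G
    for mode, F in enumerate(factors):
        out = mode_product(out, F, mode)
    return out

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))                                      # core, rank (2, 3, 2)
U = [rng.standard_normal((I, r)) for I, r in zip((4, 5, 6), G.shape)]   # factor matrices
X_hat = tucker_reconstruct(G, U)                                        # shape (4, 5, 6)
```

In the inductive variant, each factor passed to `tucker_reconstruct` would be the product of a side information matrix and a smaller factor matrix instead of a free factor matrix.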

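$$\min_{\mathcal{G} \in \mathbb{R}^{r_1 \times \dots \times r_N},\; \mathbf{U}_i \in \mathbb{R}^{M_i \times r_i},\, i = 1{:}N} F(\mathcal{X}^{(t)}, \mathcal{A}^{(t)}, \mathcal{G}, \{\mathbf{U}_i\}_{i=1:N}), \tag{1}$$

where

$$F(\mathcal{X}^{(t)}, \mathcal{A}^{(t)}, \mathcal{G}, \{\mathbf{U}_i\}_{i=1:N}) = \left\| \mathcal{P}_\Omega(\mathcal{X}^{(t)}) - \mathcal{P}_\Omega(\mathcal{G} \times_1 \mathbf{A}_1^{(t)}\mathbf{U}_1 \times_2 \dots \times_N \mathbf{A}_N^{(t)}\mathbf{U}_N) \right\|_F^2 + \lambda_g \|\mathcal{G}\|_F^2 + \sum_{i=1}^{N} \lambda_i \|\mathbf{U}_i\|_F^2. \tag{2}$$

$\|\cdot\|_F$ is the Frobenius norm, and $\lambda_g > 0$ and $\lambda_i > 0$, $i = 1{:}N$, are the regularization weights. Conceptually, the inductive framework models the ratings of the tensor as a weighted scalar product of the side information matrices. Note that (1) is a generalization of the inductive matrix completion framework [12, 23, 27], which has been effective in many applications.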
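To make the objective concrete, here is a minimal NumPy sketch for a 3-mode tensor: a masked squared error between the data and the inductive model, plus Frobenius-norm regularizers on the core and the factor matrices. The function names, the boolean-mask representation of the observed set, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def inductive_predict(G, side_info, factors):
    """G x_1 (A_1 U_1) x_2 (A_2 U_2) x_3 (A_3 U_3) for a 3-mode tensor."""
    B = [A @ U for A, U in zip(side_info, factors)]
    return np.einsum('pqr,ap,bq,cr->abc', G, B[0], B[1], B[2])

def siita_objective(X, mask, side_info, G, factors, lam_g, lams):
    """Masked squared error plus Frobenius-norm regularizers."""
    residual = np.where(mask, X - inductive_predict(G, side_info, factors), 0.0)
    reg = lam_g * np.sum(G ** 2) + sum(l * np.sum(U ** 2) for l, U in zip(lams, factors))
    return np.sum(residual ** 2) + reg

rng = np.random.default_rng(0)
dims, feats, ranks = (6, 5, 4), (3, 2, 4), (2, 2, 2)
side_info = [rng.standard_normal((I, M)) for I, M in zip(dims, feats)]
factors = [rng.standard_normal((M, r)) for M, r in zip(feats, ranks)]
G = rng.standard_normal(ranks)
X = inductive_predict(G, side_info, factors)    # noiseless data drawn from the model itself
mask = rng.random(dims) < 0.5                   # roughly half of the entries observed
value = siita_objective(X, mask, side_info, G, factors, lam_g=0.1, lams=[0.1, 0.1, 0.1])
```

Because the toy data is generated from the same parameters, the data-fit term vanishes and `value` reduces to the regularization terms alone.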

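The inductive tensor framework has two-fold benefits over the typical approach of incorporating side information as an additive term. The use of $\mathbf{A}_i\mathbf{U}_i$ terms in the factorization reduces the dimensionality of the variables from $\mathbf{U}_i \in \mathbb{R}^{I_i \times r_i}$ to $\mathbf{U}_i \in \mathbb{R}^{M_i \times r_i}$, and typically $M_i \ll I_i$. As a result, the computational time required for computing the gradients and updating the variables decreases remarkably. Similar to [16], we define

$$\mathbf{U}_i^{(\setminus n)} = \left[ \mathbf{A}_{i-1}^{(t)}\mathbf{U}_{i-1} \otimes \dots \otimes \mathbf{A}_1^{(t)}\mathbf{U}_1 \otimes \mathbf{A}_N^{(t)}\mathbf{U}_N \otimes \dots \otimes \mathbf{A}_{i+1}^{(t)}\mathbf{U}_{i+1} \right],$$

which collects Kronecker products of mode matrices except for $\mathbf{A}_i\mathbf{U}_i$ in a backward cyclic manner.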
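The role of the Kronecker-product matrix is easiest to see through the unfolding identity it supports: the mode-i unfolding of the model equals the mode-i factor times the unfolded core times the transposed Kronecker product of the remaining factors. The sketch below checks this numerically. One caveat is that the order of the Kronecker factors depends on the unfolding convention; with NumPy's row-major reshape the remaining modes appear in increasing order, which differs from the backward cyclic order used in the text. The helper names are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding using NumPy's row-major column ordering."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def kron_except(B, mode):
    """Kronecker product of all B[j] with j != mode, in increasing mode order."""
    out = np.ones((1, 1))
    for j, Bj in enumerate(B):
        if j != mode:
            out = np.kron(out, Bj)
    return out

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
# Each B[i] plays the role of the product of a side information matrix and a factor matrix.
B = [rng.standard_normal((I, r)) for I, r in zip((4, 5, 6), G.shape)]
X_hat = np.einsum('pqr,ap,bq,cr->abc', G, B[0], B[1], B[2])

checks = [
    np.allclose(unfold(X_hat, i), B[i] @ unfold(G, i) @ kron_except(B, i).T)
    for i in range(3)
]
```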

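The gradients of (1) with respect to $\mathbf{U}_i$, for $i = 1{:}N$, and $\mathcal{G}$ can be computed as follows:

$$\frac{\partial F}{\partial \mathbf{U}_i} = -2\,(\mathbf{A}_i^{(t)})^{\top} \mathbf{R}_{(i)}^{(t)} \mathbf{U}_i^{(\setminus n)} \mathbf{G}_{(i)}^{\top} + 2\lambda_i \mathbf{U}_i, \qquad \frac{\partial F}{\partial \mathcal{G}} = -2\,\mathcal{R}^{(t)} \times_1 (\mathbf{A}_1^{(t)}\mathbf{U}_1)^{\top} \times_2 \dots \times_N (\mathbf{A}_N^{(t)}\mathbf{U}_N)^{\top} + 2\lambda_g \mathcal{G}, \tag{3}$$

where

$$\mathcal{R}^{(t)} = \mathcal{P}_\Omega(\mathcal{X}^{(t)}) - \mathcal{P}_\Omega(\mathcal{G} \times_1 \mathbf{A}_1^{(t)}\mathbf{U}_1 \times_2 \dots \times_N \mathbf{A}_N^{(t)}\mathbf{U}_N)$$

is the residual on the observed entries, $\mathbf{R}_{(i)}^{(t)}$ and $\mathbf{G}_{(i)}$ are the mode-$i$ unfoldings of $\mathcal{R}^{(t)}$ and $\mathcal{G}$, and $\mathbf{U}_i^{(\setminus n)}$ is the Kronecker-product matrix defined above.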
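By updating the variables using the gradients given in (3), we can recover the missing entries in $\mathcal{X}^{(t)}$ at every time step $t$; however, that is equivalent to performing a static tensor completion at every time step. Therefore, we need an incremental scheme for updating the variables. Let $\mathbf{U}_i^{(t)}$ and $\mathcal{G}^{(t)}$ represent the variables at time step $t$, and let $\mathcal{X}^{(\Delta t)}$ denote the newly arrived entries $\mathcal{X}^{(t)} \setminus \mathcal{X}^{(t-1)}$ with corresponding side information $\mathcal{A}^{(\Delta t)}$; then

$$F(\mathcal{X}^{(t)}, \mathcal{A}^{(t)}, \mathcal{G}^{(t-1)}, \{\mathbf{U}_i^{(t-1)}\}_{i=1:N}) = F(\mathcal{X}^{(t-1)}, \mathcal{A}^{(t-1)}, \mathcal{G}^{(t-1)}, \{\mathbf{U}_i^{(t-1)}\}_{i=1:N}) + F(\mathcal{X}^{(\Delta t)}, \mathcal{A}^{(\Delta t)}, \mathcal{G}^{(t-1)}, \{\mathbf{U}_i^{(t-1)}\}_{i=1:N}). \tag{4}$$

Since $\mathcal{X}^{(t-1)}$ is recovered at time step $t-1$, the problem is equivalent to using only

$$F^{(\Delta t)} = F(\mathcal{X}^{(\Delta t)}, \mathcal{A}^{(\Delta t)}, \mathcal{G}^{(t-1)}, \{\mathbf{U}_i^{(t-1)}\}_{i=1:N})$$

for updating the variables at time step $t$.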

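We propose to use the following approach to update the variables at every time step $t$, i.e.,

$$\mathbf{U}_i^{(t)} = \mathbf{U}_i^{(t-1)} - \gamma \frac{\partial F^{(\Delta t)}}{\partial \mathbf{U}_i^{(t-1)}}, \; i = 1{:}N, \qquad \mathcal{G}^{(t)} = \mathcal{G}^{(t-1)} - \gamma \frac{\partial F^{(\Delta t)}}{\partial \mathcal{G}^{(t-1)}}, \tag{5}$$

where $\gamma$ is the step size for the gradients. $\mathcal{R}^{(\Delta t)}$, needed for computing the gradients of $F^{(\Delta t)}$, is given by

$$\mathcal{R}^{(\Delta t)} = \mathcal{P}_\Omega(\mathcal{X}^{(\Delta t)}) - \mathcal{P}_\Omega(\mathcal{G}^{(t-1)} \times_1 \mathbf{A}_1^{(\Delta t)}\mathbf{U}_1^{(t-1)} \times_2 \dots \times_N \mathbf{A}_N^{(\Delta t)}\mathbf{U}_N^{(t-1)}). \tag{6}$$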
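The gradient computation and a single update step can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the authors' Matlab implementation: the gradients are obtained by differentiating the masked squared-error objective with Frobenius regularizers directly (so both terms carry a factor of 2), the observed set is a boolean mask, and the Kronecker product is replaced by equivalent mode products to avoid forming a large matrix. All function names are hypothetical.

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-`mode` product of T with M, where M has shape (J, T.shape[mode])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def multi_mode_product(T, mats, skip=None):
    """Apply a mode product with every matrix in `mats`, optionally skipping one mode."""
    out = T
    for mode, M in enumerate(mats):
        if mode != skip:
            out = mode_product(out, M, mode)
    return out

def unfold(T, mode):
    """Mode-`mode` unfolding (row-major column ordering)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def objective(X, mask, A, G, U, lam_g, lams):
    B = [a @ u for a, u in zip(A, U)]
    R = np.where(mask, X - multi_mode_product(G, B), 0.0)
    return np.sum(R ** 2) + lam_g * np.sum(G ** 2) + sum(l * np.sum(u ** 2) for l, u in zip(lams, U))

def gradients(X, mask, A, G, U, lam_g, lams):
    B = [a @ u for a, u in zip(A, U)]
    R = np.where(mask, X - multi_mode_product(G, B), 0.0)    # residual on observed entries
    Bt = [b.T for b in B]
    grad_G = -2.0 * multi_mode_product(R, Bt) + 2.0 * lam_g * G
    grad_U = []
    for i in range(len(U)):
        T = multi_mode_product(R, Bt, skip=i)                # project every mode except i
        grad_U.append(-2.0 * A[i].T @ unfold(T, i) @ unfold(G, i).T + 2.0 * lams[i] * U[i])
    return grad_G, grad_U

def sgd_step(X, mask, A, G, U, lam_g, lams, gamma):
    """One gradient step on the core and all factor matrices, from the same iterate."""
    grad_G, grad_U = gradients(X, mask, A, G, U, lam_g, lams)
    return G - gamma * grad_G, [u - gamma * g for u, g in zip(U, grad_U)]

rng = np.random.default_rng(0)
dims, feats, ranks = (5, 4, 3), (3, 2, 3), (2, 2, 2)
A = [rng.standard_normal((I, M)) for I, M in zip(dims, feats)]
U = [0.1 * rng.standard_normal((M, r)) for M, r in zip(feats, ranks)]
G = 0.1 * rng.standard_normal(ranks)
X = rng.standard_normal(dims)
mask = rng.random(dims) < 0.6
lam_g, lams = 0.1, [0.1, 0.1, 0.1]
G_new, U_new = sgd_step(X, mask, A, G, U, lam_g, lams, gamma=1e-3)
```

A finite-difference comparison against `objective` is a cheap way to validate analytic gradients of this kind before using them in a streaming loop.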
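Input: $\{(\mathcal{X}^{(t)}, \mathcal{A}^{(t)})\}$, $\lambda_g$, $\lambda_i, i = 1{:}N$, $(r_1, \dots, r_N)$, $\gamma$, $K$
Randomly initialize $\mathbf{U}_i^{(0)} \in \mathbb{R}^{M_i \times r_i}, i = 1{:}N$ and $\mathcal{G}^{(0)} \in \mathbb{R}^{r_1 \times \dots \times r_N}$;
for t = 1, 2, … do
        $\mathbf{U}_i^{(t),0} := \mathbf{U}_i^{(t-1)}, i = 1{:}N$;
        $\mathcal{G}^{(t),0} := \mathcal{G}^{(t-1)}$;
        for k = 1:K do
               Compute $\mathcal{R}^{(\Delta t)}$ from (6) using $\mathbf{U}_i^{(t),k-1}, i = 1{:}N$ and $\mathcal{G}^{(t),k-1}$;
               Compute $\partial F^{(\Delta t)} / \partial \mathbf{U}_i^{(t),k-1}$ for $i = 1{:}N$ from (3);
               Update $\mathbf{U}_i^{(t),k}$ using $\partial F^{(\Delta t)} / \partial \mathbf{U}_i^{(t),k-1}$ and $\mathbf{U}_i^{(t),k-1}$ in (5);
               Compute $\partial F^{(\Delta t)} / \partial \mathcal{G}^{(t),k-1}$ from (3);
               Update $\mathcal{G}^{(t),k}$ using $\partial F^{(\Delta t)} / \partial \mathcal{G}^{(t),k-1}$ and $\mathcal{G}^{(t),k-1}$ in (5);
        end for
        $\mathbf{U}_i^{(t)} := \mathbf{U}_i^{(t),K}, i = 1{:}N$;
        $\mathcal{G}^{(t)} := \mathcal{G}^{(t),K}$;
end for
Return: $\mathbf{U}_i^{(t)}, i = 1{:}N$, $\mathcal{G}^{(t)}$.
Algorithm 1 Proposed SIITA Algorithm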
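The overall loop of Algorithm 1 can be sketched for 3-mode tensors as follows. Several choices here are assumptions made for illustration: each time step is represented by the full current snapshot together with a mask that is true only on newly arrived observed entries, all modes have feature-style side information with a fixed number of columns, the gradients are derived directly from the masked squared-error objective, and the core and factor gradients are taken at the same iterate. The function names and default hyperparameters are hypothetical.

```python
import numpy as np

def predict(G, B):
    """G x_1 B[0] x_2 B[1] x_3 B[2] for a 3-mode tensor."""
    return np.einsum('pqr,ap,bq,cr->abc', G, B[0], B[1], B[2])

def inner_updates(X, new_mask, A, G, U, lam_g, lams, gamma, K):
    """K gradient passes that only use the newly arrived observed entries."""
    for _ in range(K):
        B = [a @ u for a, u in zip(A, U)]
        R = np.where(new_mask, X - predict(G, B), 0.0)     # residual on the increment
        gU = [
            -2 * A[0].T @ np.einsum('abc,pqr,bq,cr->ap', R, G, B[1], B[2]) + 2 * lams[0] * U[0],
            -2 * A[1].T @ np.einsum('abc,pqr,ap,cr->bq', R, G, B[0], B[2]) + 2 * lams[1] * U[1],
            -2 * A[2].T @ np.einsum('abc,pqr,ap,bq->cr', R, G, B[0], B[1]) + 2 * lams[2] * U[2],
        ]
        gG = -2 * np.einsum('abc,ap,bq,cr->pqr', R, B[0], B[1], B[2]) + 2 * lam_g * G
        U = [u - gamma * g for u, g in zip(U, gU)]         # gradient step on the factors
        G = G - gamma * gG                                 # gradient step on the core
    return G, U

def siita_stream(snapshots, feats, ranks, lam_g=0.1, lams=(0.1, 0.1, 0.1),
                 gamma=1e-3, K=10, seed=0):
    """Yield (G, U) after each time step; `snapshots` yields (X_t, new_mask_t, A_t)."""
    rng = np.random.default_rng(seed)
    U = [0.1 * rng.standard_normal((m, r)) for m, r in zip(feats, ranks)]
    G = 0.1 * rng.standard_normal(ranks)
    for X, new_mask, A in snapshots:
        G, U = inner_updates(X, new_mask, A, G, U, lam_g, lams, gamma, K)  # warm start
        yield G, U

# Toy two-step sequence: the tensor grows from 4 x 3 x 2 to 8 x 6 x 4.
rng = np.random.default_rng(1)
feats, ranks = (3, 2, 2), (2, 2, 2)
A_full = [rng.standard_normal((I, m)) for I, m in zip((8, 6, 4), feats)]
X_full = rng.standard_normal((8, 6, 4))
observed = rng.random((8, 6, 4)) < 0.7
snapshots, prev = [], (0, 0, 0)
for shape in [(4, 3, 2), (8, 6, 4)]:
    new_mask = observed[:shape[0], :shape[1], :shape[2]].copy()
    new_mask[:prev[0], :prev[1], :prev[2]] = False         # earlier entries are not revisited
    snapshots.append((X_full[:shape[0], :shape[1], :shape[2]], new_mask,
                      [a[:n] for a, n in zip(A_full, shape)]))
    prev = shape
results = list(siita_stream(snapshots, feats, ranks))
```

Because the factor matrices have one row per side information feature rather than one row per tensor index, their shapes stay fixed while the tensor grows, which is what allows the warm start across time steps.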

Algorithm 1 summarizes the procedure described above. The computational cost of implementing Algorithm 1 depends on the update of the variables (5) and the computations in (6). The cost of computing is . The cost of performing the updates (5) is . Overall, at every time step, the computational cost of Algorithm 1 is .

Extension to the nonnegative case: NN-SIITA

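We now discuss how nonnegative constraints can be incorporated into the decomposition learned by SIITA. Nonnegative constraints allow the factors of the tensor to be interpretable.

We denote SIITA with nonnegative constraints by NN-SIITA. At every time step $t$ in the multi-aspect streaming setting, we seek to learn the following decomposition:

$$\min_{\mathcal{G} \in \mathbb{R}_{+}^{r_1 \times \dots \times r_N},\; \mathbf{U}_i \in \mathbb{R}_{+}^{M_i \times r_i},\, i = 1{:}N} F(\mathcal{X}^{(t)}, \mathcal{A}^{(t)}, \mathcal{G}, \{\mathbf{U}_i\}_{i=1:N}), \tag{7}$$

where $F(\cdot)$ is as given in (2).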

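We employ a projected gradient descent based algorithm for solving the optimization problem in (7). We follow the same incremental update scheme discussed in Algorithm 1; however, we use a projection operator, defined below, for updating the variables. For NN-SIITA, (5) is replaced with

$$\mathbf{U}_i^{(t)} = \Pi_{+}\!\left[ \mathbf{U}_i^{(t-1)} - \gamma \frac{\partial F^{(\Delta t)}}{\partial \mathbf{U}_i^{(t-1)}} \right], \; i = 1{:}N, \qquad \mathcal{G}^{(t)} = \Pi_{+}\!\left[ \mathcal{G}^{(t-1)} - \gamma \frac{\partial F^{(\Delta t)}}{\partial \mathcal{G}^{(t-1)}} \right],$$

where $\Pi_{+}$ is the element-wise projection operator defined as

$$\Pi_{+}[x] = \begin{cases} x, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases}$$

The projection operator maps a point back to the feasible region, ensuring that the factor matrices and the core tensor remain nonnegative across iterations.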
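A minimal sketch of the projected update: take an ordinary gradient step, then clip negative entries to zero. The function names are illustrative, and the gradient passed in is assumed to come from whatever gradient routine is used for the unconstrained problem.

```python
import numpy as np

def project_nonnegative(Z):
    """Element-wise projection onto the nonnegative orthant."""
    return np.maximum(Z, 0.0)

def projected_step(param, grad, gamma):
    """Gradient step followed by projection, applied to a factor matrix or the core tensor."""
    return project_nonnegative(param - gamma * grad)

param = np.array([0.5, -0.2, 1.0])
grad = np.array([1.0, 0.0, -1.0])
updated = projected_step(param, grad, gamma=0.6)
```

The same function applies unchanged to arrays of any shape, so it covers both the factor matrices and the core tensor.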

5 Experiments

We evaluate SIITA against other state-of-the-art baselines in two dynamic settings viz., (1) multi-aspect streaming setting (Section 5.1), and (2) traditional streaming setting (Section 5.2). We then evaluate effectiveness of SIITA in the non-streaming batch setting (Section 5.3). We analyze the effect of different types of side information in Section 5.4. Finally, we evaluate the performance of NN-SIITA in the unsupervised setting in Section 5.5.

Datasets: The datasets used in the experiments are summarized in Table 2. MovieLens 100K [10] is a standard movie recommendation dataset. YELP is a downsampled version of the YELP(Full) dataset [13]. The YELP(Full) review dataset consists of a 70K (user) × 15K (business) × 108 (year-month) tensor and a side information matrix of size 15K (business) × 68 (city). We select a subset of this dataset for comparisons because the baseline algorithms considered cannot scale to the full dataset. SIITA, our proposed method, does not have such scalability concerns; in Section 5.4, we show that it scales to datasets of much larger sizes. To create YELP out of YELP(Full), we select the 1000 most frequent users and the 1000 most frequent businesses and create the corresponding tensor and side information matrix. After the sampling, we obtain a tensor of size 1000 (user) × 992 (business) × 93 (year-month) and a side information matrix of dimensions 992 (business) × 56 (city).
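The downsampling step can be sketched as follows, assuming the reviews are available as parallel integer index arrays (one entry per review). This is a hypothetical reading of "most frequent" as "appearing in the most reviews"; the helper names and the toy data are illustrative and are not taken from the authors' preprocessing code.

```python
import numpy as np

def top_k_ids(ids, k):
    """Return the k ids that occur most often in `ids` (ties broken by smaller id)."""
    counts = np.bincount(ids)
    return np.argsort(-counts, kind="stable")[:k]

def downsample(users, businesses, times, k_users, k_businesses):
    """Keep only reviews whose user and business are both among the most frequent."""
    keep_users = top_k_ids(users, k_users)
    keep_businesses = top_k_ids(businesses, k_businesses)
    keep = np.isin(users, keep_users) & np.isin(businesses, keep_businesses)
    return users[keep], businesses[keep], times[keep]

users = np.array([0, 0, 0, 1, 1, 2])
businesses = np.array([0, 1, 1, 1, 2, 2])
times = np.array([0, 0, 1, 1, 2, 2])
sub_users, sub_businesses, sub_times = downsample(users, businesses, times, 2, 2)
```

After such filtering, some businesses or time slices may end up with no remaining reviews, so the retained ids typically need to be re-indexed before building the tensor.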

                   MovieLens 100K               YELP
Modes              user × movie × week          user × business × year-month
Tensor size        943 × 1682 × 31              1000 × 992 × 93
Starting size      19 × 34 × 2                  20 × 20 × 2
Increment step     19, 34, 1                    20, 20, 2
Side info matrix   1682 (movie) × 19 (genre)    992 (business) × 56 (city)
Table 2: Summary of datasets used in the paper. The starting size and increment step given in the table are for the multi-aspect streaming setting. For the streaming setting, the tensor grows in the third dimension, one slice at every time step.

5.1 Multi-Aspect Streaming Setting

Dataset         Missing%  Rank  MAST  SIITA
MovieLens 100K  20%       3     1.60  1.23
MovieLens 100K  20%       5     1.53  1.29
MovieLens 100K  20%       10    1.48  2.49
MovieLens 100K  50%       3     1.74  1.28
MovieLens 100K  50%       5     1.75  1.29
MovieLens 100K  50%       10    1.64  2.55
MovieLens 100K  80%       3     2.03  1.59
MovieLens 100K  80%       5     1.98  1.61
MovieLens 100K  80%       10    2.02  2.96
YELP            20%       3     1.90  1.43
YELP            20%       5     1.92  1.54
YELP            20%       10    1.93  4.03
YELP            50%       3     1.94  1.51
YELP            50%       5     1.94  1.67
YELP            50%       10    1.96  4.04
YELP            80%       3     1.97  1.71
YELP            80%       5     1.97  1.61
YELP            80%       10    1.97  3.49
Table 3: Test RMSE (lower is better) averaged across all the time steps in the multi-aspect streaming tensor sequence setting (Definition 4) for MAST and SIITA. SIITA, the proposed method, outperforms MAST on both datasets at ranks 3 and 5, but not at rank 10 (see the Remark in Section 5.3). Section 5.1 provides more details.
(a) MovieLens 100K (20% Missing) (b) YELP (20% Missing)
Figure 2: Evolution of test RMSE of MAST and SIITA with each time step. For both the datasets, SIITA attains a stable performance after a few time steps, while the performance of MAST degrades with every time step. Refer to Section 5.1 for more details.
(a) MovieLens 100K (20% Missing) (b) YELP (20% Missing)
Figure 3: Runtime comparison between MAST and SIITA at every time step. SIITA is significantly faster than MAST. Refer to Section 5.1 for more details.

We first analyze the model in the multi-aspect streaming setting, for which we consider MAST [28] as a state-of-the-art baseline.
MAST [28]: MAST is a dynamic low-rank tensor completion algorithm, which enforces nuclear norm regularization on the decomposition matrices of CP. A tensor-based Alternating Direction Method of Multipliers is used for solving the optimization problem.

We experiment with the MovieLens 100K and YELP datasets. Since the third mode is time in both the datasets, i.e., (week) in MovieLens 100K and (year-month) in YELP, one way to simulate the multi-aspect streaming sequence (Definition 3) is by considering every slice in third-mode as one time step in the sequence, and letting the tensor grow along other two modes with every time step, similar to the ladder structure given in [28, Section  3.3]. Note that this is different from the traditional streaming setting, where the tensor only grows in time mode while the other two modes remain fixed. In contrast, in the multi-aspect setting here, there can be new users joining the system within the same month but on different days or different movies getting released on different days in the same week etc. Therefore in our simulations, we consider the third mode as any normal mode and generate a more general multi-aspect streaming tensor sequence, the details are given in Table 2. The parameters for MAST are set based on the guidelines provided in [28,  Section 4.3].
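One way to generate such a sequence from a starting size and a per-mode increment is sketched below. It assumes each mode grows by its increment at every step and is clipped once it reaches the full size, which is an illustrative choice and not necessarily the exact schedule used in the experiments. The second helper marks which entries of a snapshot are new relative to the previous one. Function names are hypothetical.

```python
import numpy as np

def snapshot_shapes(full, start, step):
    """Yield the tensor shape at each time step until every mode reaches its full size.

    Assumes every entry of `step` is positive.
    """
    shape = tuple(min(s, f) for s, f in zip(start, full))
    yield shape
    while shape != tuple(full):
        shape = tuple(min(s + d, f) for s, d, f in zip(shape, step, full))
        yield shape

def new_entry_mask(prev_shape, curr_shape):
    """Boolean mask over the current snapshot that is True outside the previous block."""
    mask = np.ones(curr_shape, dtype=bool)
    mask[tuple(slice(0, p) for p in prev_shape)] = False
    return mask

# MovieLens 100K sizes from Table 2.
shapes = list(snapshot_shapes(full=(943, 1682, 31), start=(19, 34, 2), step=(19, 34, 1)))
```

The region selected by `new_entry_mask` is not a rectangular block, which is why the sketch uses a mask over the whole snapshot rather than a sliced sub-tensor.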

We compute the root mean square error on test data (test RMSE; lower is better) at every time step and report the test RMSE averaged across all the time steps in Table 3. We perform experiments on multiple train-test splits for each dataset. We vary the test percentage, denoted by Missing% in Table 3, and the rank of decomposition, denoted by Rank, for both datasets. For every (Missing%, Rank) combination, we run both models on ten random train-test splits and report the average. For SIITA, Rank = $r$ in Table 3 represents the Tucker rank $(r, r, r)$.

In Table 3, the proposed SIITA achieves better results than MAST at ranks 3 and 5; the rank 10 case is discussed in the Remark in Section 5.3. Figure 2 shows the plots for test RMSE at every time step. Because SIITA handles the sparsity in the data effectively, it is significantly faster than MAST, as can be seen from Figure 3. Overall, we find that SIITA, the proposed method, is more effective and faster than MAST in the multi-aspect streaming setting.

5.2 Streaming Setting

(b) MovieLens 100K (20% Missing) (a) YELP (20% Missing)
Figure 4: Evolution of Test RMSE of TeCPSGD, OLSTEC and SIITA with each time step. In both datasets, SIITA performs significantly better than the baseline algorithms in the pure streaming setting. Refer to Section 5.2 for more details.
(b) MovieLens 100K (20% Missing) (a) YELP (20% Missing)
Figure 5: Runtime comparison between TeCPSGD, OLSTEC and SIITA. SIITA is able to exploit sparsity in the data and is much faster. Refer to Section 5.2 for more details.
Dataset         Missing%  Rank  TeCPSGD  OLSTEC  SIITA
MovieLens 100K  20%       3     3.39     5.46    1.53
MovieLens 100K  20%       5     3.35     4.65    1.54
MovieLens 100K  20%       10    3.19     4.96    1.71
MovieLens 100K  50%       3     3.55     8.39    1.63
MovieLens 100K  50%       5     3.40     6.73    1.64
MovieLens 100K  50%       10    3.23     3.66    1.73
MovieLens 100K  80%       3     3.78     3.82    1.79
MovieLens 100K  80%       5     3.77     3.80    1.75
MovieLens 100K  80%       10    3.84     4.34    2.47
YELP            20%       3     4.55     4.04    1.45
YELP            20%       5     4.79     4.04    1.59
YELP            20%       10    5.17     4.03    2.85
YELP            50%       3     4.67     4.03    1.55
YELP            50%       5     5.03     4.03    1.67
YELP            50%       10    5.25     4.03    2.69
YELP            80%       3     4.99     4.02    1.73
YELP            80%       5     5.17     4.02    1.78
YELP            80%       10    5.31     4.01    2.62
Table 4: Test RMSE averaged across all the time steps in the streaming setting for TeCPSGD, OLSTEC (a state-of-the-art streaming tensor completion algorithm), and SIITA. SIITA outperforms the baseline algorithms significantly. See Section 5.2 for more details.

In this section, we simulate the pure streaming setting by letting the tensor grow only in the third mode at every time step. The number of time steps for each dataset in this setting is the dimension of the third mode, i.e., 31 for MovieLens 100K and 93 for YELP. We compare the performance of SIITA with TeCPSGD and OLSTEC algorithms in the streaming setting.
TeCPSGD [19]: TeCPSGD is an online Stochastic Gradient Descent based algorithm for recovering missing data in streaming tensors. This algorithm is based on PARAFAC decomposition. TeCPSGD is the first proper tensor completion algorithm in the dynamic setting.
OLSTEC [15]: OLSTEC is an online tensor tracking algorithm for partially observed data streams corrupted by noise. OLSTEC is a second order stochastic gradient descent algorithm based on CP decomposition exploiting recursive least squares. OLSTEC is the state-of-the-art for streaming tensor completion.

We report test RMSE, averaged across all time steps, for both MovieLens 100K and YELP datasets. Similar to the multi-aspect streaming setting, we run all the algorithms for multiple train-test splits. For each split, we run all the algorithms with different ranks. For every (Missing%, Rank) combination, we run all the algorithms on ten random train-test splits and report the average. SIITA significantly outperforms all the baselines in this setting, as shown in Table 4. Figure 4 shows the average test RMSE of every algorithm at every time step. From Figure 5 it can be seen that SIITA takes much less time compared to other algorithms. The spikes in the plots suggest that the particular slices are relatively less sparse.

5.3 Batch Setting

Dataset         Missing%  Rank  AirCP  SIITA
MovieLens 100K  20%       3     3.351  1.534
MovieLens 100K  20%       5     3.687  1.678
MovieLens 100K  20%       10    3.797  2.791
MovieLens 100K  50%       3     3.303  1.580
MovieLens 100K  50%       5     3.711  1.585
MovieLens 100K  50%       10    3.894  2.449
MovieLens 100K  80%       3     3.883  1.554
MovieLens 100K  80%       5     3.997  1.654
MovieLens 100K  80%       10    3.791  3.979
YELP            20%       3     1.094  1.052
YELP            20%       5     1.086  1.056
YELP            20%       10    1.077  1.181
YELP            50%       3     1.096  1.097
YELP            50%       5     1.095  1.059
YELP            50%       10    1.719  1.599
YELP            80%       3     1.219  1.199
YELP            80%       5     1.118  1.156
YELP            80%       10    2.210  2.153
Table 5: Mean test RMSE across multiple train-test splits in the batch setting. SIITA achieves lower test RMSE than AirCP, a state-of-the-art algorithm for this setting, in most configurations on both datasets. Refer to Section 5.3 for details.

Even though our primary focus is on proposing an algorithm for the multi-aspect streaming setting, SIITA can be run as a tensor completion algorithm with side information in the batch (i.e., non streaming) setting. To run in batch setting, we set in Algorithm 1 and run for multiple passes over the data. In this setting, AirCP [8] is the current state-of-the-art algorithm which is also capable of handling side information. We consider AirCP as the baseline in this section. The main focus of this setting is to demonstrate that SIITA incorporates the side information effectively.  
AirCP [8]: AirCP is a CP-based tensor completion algorithm proposed for recovering the spatio-temporal dynamics of online memes. This algorithm incorporates auxiliary information from memes, locations and times. An alternating direction method of multipliers (ADMM) based algorithm is employed for solving the optimization. AirCP expects the side information matrices to be similarity matrices and takes as input the Laplacian of the similarity matrices. However, in the datasets we experiment with, the side information is available as feature matrices. Therefore, we consider the covariance matrices as similarity matrices.

We run both algorithms until convergence and report test RMSE. For each dataset, we experiment with different test set sizes, and for each size we run our experiments on 10 random splits. We report the mean test RMSE per train-test percentage split. We run our experiments with multiple ranks of factorization. Results are shown in Table 5, where we observe that SIITA achieves better results in most configurations. Note that the rank for SIITA is the Tucker rank, i.e., rank = 3 implies a factorization rank of (3, 3, 3) for SIITA.

Remark: Since all the baselines considered for various settings are CP based, we only compare for CP tensor rank. From Tables 3, 4 and 5 it can be seen that the performance suffers for rank = 10. However, when we run SIITA with a rank = (10, 10, 2) we achieve a lower test RMSE.

5.4 Analyzing Merits of Side Information

Dataset         Missing%  Rank  SIITA (w/o SI)  SIITA
MovieLens 100K  20%       3     1.19            1.23
MovieLens 100K  20%       5     1.19            1.29
MovieLens 100K  20%       10    2.69            2.49
MovieLens 100K  50%       3     1.25            1.28
MovieLens 100K  50%       5     1.25            1.29
MovieLens 100K  50%       10    3.28            2.55
MovieLens 100K  80%       3     1.45            1.59
MovieLens 100K  80%       5     1.42            1.61
MovieLens 100K  80%       10    2.11            2.96
YELP            20%       3     1.44            1.43
YELP            20%       5     1.48            1.54
YELP            20%       10    3.90            4.03
YELP            50%       3     1.57            1.51
YELP            50%       5     1.62            1.67
YELP            50%       10    5.48            4.04
YELP            80%       3     1.75            1.71
YELP            80%       5     1.67            1.61
YELP            80%       10    5.28            3.49
Table 6: Test RMSE averaged across multiple train-test splits in the multi-aspect streaming setting, analyzing the merits of side information. See Section 5.4 for more details.
Dataset         Missing%  Rank  SIITA (w/o SI)  SIITA
MovieLens 100K  20%       3     1.46            1.53
MovieLens 100K  20%       5     1.53            1.54
MovieLens 100K  20%       10    1.55            1.71
MovieLens 100K  50%       3     1.58            1.63
MovieLens 100K  50%       5     1.67            1.64
MovieLens 100K  50%       10    1.56            1.73
MovieLens 100K  80%       3     1.76            1.79
MovieLens 100K  80%       5     1.74            1.75
MovieLens 100K  80%       10    2.31            2.47
YELP            20%       3     1.46            1.45
YELP            20%       5     1.62            1.59
YELP            20%       10    2.82            2.85
YELP            50%       3     1.57            1.55
YELP            50%       5     1.69            1.67
YELP            50%       10    2.54            2.67
YELP            80%       3     1.76            1.73
YELP            80%       5     1.80            1.78
YELP            80%       10    2.25            2.62
Table 7: Test RMSE averaged across multiple train-test splits in the streaming setting, analyzing the merits of side information. See Section 5.4 for more details.
(b) MovieLens 100K (80% Missing) (a) YELP (80% Missing)
Figure 6: Evolution of test RMSE with every time step in the multi-aspect streaming setting for SIITA and SIITA (w/o SI). See Section 5.4 for more details.
(b) MovieLens 100K (80% Missing) (a) YELP (80% Missing)
Figure 7: Run Time comparison between SIITA and SIITA (w/o SI) in the multi-aspect streaming setting. See Section 5.4 for more details.
(b) MovieLens 100K (80% Missing) (a) YELP (80% Missing)
Figure 8: Evolution of test RMSE with every time step in the streaming setting for SIITA and SIITA(w/o SI). See Section 5.4 for more details.
(b) MovieLens 100K (80% Missing) (a) YELP (80% Missing)
Figure 9: Run Time comparison between SIITA and SIITA (w/o SI) in the Streaming setting. See Section 5.4 for more details.
(a) Test RMSE at every time step (b) Run Time at every time step
Figure 10: Investigating the merits of side information for MovieLens 1M dataset in the multi-aspect streaming setting. Side information along the user mode is the most useful for tensor completion. See Section 5.4 for more details.

(a) Evolution of Test RMSE against epochs.

(b) Time elapsed with every epoch.
Figure 11: Investigating the merits of side information for MovieLens 1M dataset in the batch setting. Side information along the user mode is the most useful for tensor completion. See Section 5.4 for more details.

Our goal in this paper is to propose a flexible framework using which side information may be easily incorporated during incremental tensor completion, especially in the multi-aspect streaming setting. Our proposed method, SIITA, is motivated by this need. In order to evaluate merits of different types of side information on SIITA, we report several experiments where performances of SIITA with and without various types of side information are compared.

Single Side Information: In the first experiment, we compare SIITA with and without side information (by setting side information to identity; see Section 3). We run the experiments in both multi-aspect streaming and streaming settings. Table 6 reports the mean test RMSE of SIITA and SIITA (w/o SI), which stands for running SIITA without side information, for both datasets in the multi-aspect streaming setting. For MovieLens 100K, SIITA mostly achieves better performance without side information, whereas for YELP, SIITA performs better with side information in most configurations. Figure 6 shows the evolution of test RMSE at every time step. Figure 7 shows the runtime of SIITA when run with and without side information. SIITA runs faster in the presence of side information. Table 7 reports the mean test RMSE for both datasets in the streaming setting. Similar to the multi-aspect streaming setting, SIITA generally achieves better performance without side information on the MovieLens 100K dataset and with side information on the YELP dataset. Figure 8 shows the test RMSE of SIITA against time steps, with and without side information. Figure 9 shows the runtime at every time step.

Multi Side Information: In all the datasets and experiments considered so far, side information along only one mode is available to SIITA. In this next experiment, we consider the setting where side information along multiple modes is available. For this experiment, we consider the MovieLens 1M [10] dataset, a standard dataset of 1 million movie ratings. This dataset consists of a 6040 (user) × 3952 (movie) × 149 (week) tensor, along with two side information matrices: a 6040 (user) × 21 (occupation) matrix and a 3952 (movie) × 18 (genre) matrix.

Note that among all the methods considered in the paper, SIITA is the only method which scales to the size of MovieLens 1M datasets.

We create four variants of the dataset: the first, with the tensor and all the side information matrices, is denoted by MovieLens 1M; the second, with the tensor and only the side information along the movie mode, is denoted by MovieLens 1M (movie mode); the third, MovieLens 1M (user mode), has only the user-mode side information; and the fourth, MovieLens 1M (no si), has only the tensor and no side information.

We run SIITA in multi-aspect streaming and batch modes for all four variants. Test RMSE at every time step in the multi-aspect streaming setting is shown in Figure 10(a). The evolution of test RMSE (lower is better) against epochs in batch mode is shown in Figure 11(a). From Figures 10(a) and 11(a), it is evident that the variant MovieLens 1M (user mode) achieves the best overall performance, implying that the side information along the user mode is more useful for tensor completion in this dataset. However, MovieLens 1M (movie mode) achieves poorer performance than the other variants, implying that movie-mode side information is not useful for tensor completion in this case. This is also the only side information mode available to SIITA during the MovieLens 100K experiments in Tables 6 and 7. This sub-optimal side information may be a reason for SIITA's diminished performance when using side information on the MovieLens 100K dataset. From the runtime comparisons in Figures 10(b) and 11(b), we observe that MovieLens 1M (where both types of side information are available) takes the least time, while the variant MovieLens 1M (no si) takes the most time to run. This is a benefit we derive from the inductive framework: in the presence of useful side information, SIITA not only achieves better performance but also runs faster.

5.5 Unsupervised Setting

Figure 12: Average Purity of clusters learned by NN-SIITA and NN-SIITA (w/o SI) at every time step in the unsupervised setting, for (a) MovieLens 100K and (b) YELP. For both datasets, side information helps in learning purer clusters. See Section 5.5 for more details.
Figure 13: Evolution of mean average-Purity with the number of top items per cluster for NN-SIITA and NN-SIITA (w/o SI), for (a) MovieLens 100K and (b) YELP. See Section 5.5 for more details.
MovieLens 100K

| Cluster (Action, Adventure, Sci-Fi): Movie | Genres | Cluster (Noisy): Movie | Genres |
|---|---|---|---|
| The Empire Strikes Back (1980) | Action, Adventure, Sci-Fi, Drama, Romance | Toy Story (1995) | Animation, Children’s, Comedy |
| Heavy Metal (1981) | Action, Adventure, Sci-Fi, Animation, Horror | From Dusk Till Dawn (1996) | Action, Comedy, Crime, Horror, Thriller |
| Star Wars (1977) | Action, Adventure, Sci-Fi, Romance, War | Mighty Aphrodite (1995) | Comedy |
| Return of the Jedi (1983) | Action, Adventure, Sci-Fi, Romance, War | Apollo 13 (1995) | Action, Drama, Thriller |
| Men in Black (1997) | Action, Adventure, Sci-Fi, Comedy | Crimson Tide (1995) | Drama, Thriller, War |

YELP

| Cluster (Phoenix): Business | Location | Cluster (Noisy): Business | Location |
|---|---|---|---|
| Hana Japanese Eatery | Phoenix | The Wigman | Litchfield Park |
| Herberger Theater Center | Phoenix | Hitching Post 2 | Gold Canyon |
| Scramble A Breakfast Joint | Phoenix | Freddys Frozen Custard & Steakburgers | Glendale |
| The Arrogant Butcher | Phoenix | Costco | Avondale |
| FEZ | Phoenix | Hana Japanese Eatery | Phoenix |

Table 8: Example clusters learned by NN-SIITA for the MovieLens 100K and YELP datasets. The first column is an example of a pure cluster and the second column is an example of a noisy cluster. See Section 5.5 for more details.

In this section, we consider an unsupervised setting with the aim of discovering underlying clusters of items, such as movies in the MovieLens 100K dataset and businesses in the YELP dataset, from a sequence of sparse tensors. It is desirable to mine clusters such that similar items are grouped together. Nonnegativity constraints are essential for mining interpretable clusters [11, 21]. For this set of experiments, we consider the nonnegative version of SIITA, denoted by NN-SIITA. We investigate whether side information helps in discovering more coherent clusters of items in both datasets.

We run our experiments in the multi-aspect streaming setting. At every time step, we compute the Purity of clusters and report average-Purity. Purity of a cluster is defined as the percentage of the cluster that is coherent. For example, in MovieLens 100K, a cluster of movies is 100% pure if all the movies belong to the same genre and 50% pure if only half of the cluster belongs to the same genre. Formally, suppose clusters of items along mode-$i$ are desired, and let $r_i$ be the rank of the factorization along mode-$i$. Every column of the mode-$i$ factor matrix is considered a distribution over the items, and the top-$k$ items of the distribution represent a cluster. For the $p$-th cluster, i.e., the cluster represented by column $p$ of this matrix, let $b_p$ items among the top-$k$ items belong to the same category. Purity and average-Purity are then defined as follows:

$$\text{Purity}(p) = \frac{b_p}{k}, \qquad \text{average-Purity} = \frac{1}{r_i} \sum_{p=1}^{r_i} \text{Purity}(p).$$

Note that Purity is computed per cluster, while average-Purity is computed for a set of clusters. Higher average-Purity indicates a better clustering.
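The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cluster_purity`, the argument names, and the toy factor matrix are hypothetical, each item is assumed to carry a single category label, and the "same category" count is taken to be the size of the largest same-category group among the top items.

```python
import numpy as np

def cluster_purity(factor, labels, top_k):
    """Return (per-cluster Purity, average-Purity).

    factor: (n_items, rank) nonnegative matrix; column p scores the items
            for cluster p (hypothetical stand-in for the learned factor).
    labels: one category label per item.
    top_k:  number of top-scoring items that form each cluster.
    """
    factor = np.asarray(factor, dtype=float)
    labels = np.asarray(labels)
    purities = []
    for p in range(factor.shape[1]):
        # Indices of the top_k largest entries in column p.
        top = np.argsort(-factor[:, p], kind="stable")[:top_k]
        # Size of the largest group of top items sharing one category.
        _, counts = np.unique(labels[top], return_counts=True)
        purities.append(counts.max() / top_k)
    purities = np.array(purities)
    return purities, float(purities.mean())

# Toy example: 6 items, 2 clusters, clusters formed from the top 3 items.
toy_factor = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.0],
              [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purities, toy_average = cluster_purity(toy_factor, toy_labels, top_k=3)
```

In the toy example the first cluster contains two items of category "a" and one of category "b", while the second cluster contains only category "b" items.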

We report average-Purity at every time step for both datasets, running NN-SIITA with and without side information. Figure 12 shows average-Purity at every time step for the MovieLens 100K and YELP datasets. It is clear from Figure 12 that, for both datasets, side information helps in discovering better clusters. We compute Purity for the MovieLens 100K dataset based on the genre information of the movies, and for the YELP dataset based on the geographic locations of the businesses. Table 8 shows some example clusters learned by NN-SIITA. In the MovieLens 100K dataset, each movie can belong to multiple genres; for computing Purity, we consider the most common genre among all the movies in a cluster. Results shown in Figure 12 are for a fixed number of top items per cluster. However, we also vary this number between 5 and 25 and report the mean average-Purity, which is obtained by computing the mean across all the time steps in the multi-aspect streaming setting. As can be seen from Figure 13, having side information helps in learning better clusters for all the values considered. For MovieLens 100K, the results reported are with a factorization rank of and for YELP, the rank of factorization is . Since this is an unsupervised setting, note that we use the entire data for factorization, i.e., there is no train-test split.
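The multi-genre case and the mean over time steps can be illustrated as follows. This is a hedged sketch rather than the authors' code: the helper names `multilabel_purity` and `mean_average_purity` are hypothetical, and Purity is assumed to be the fraction of movies in the cluster that carry the cluster's most common genre. The two example clusters are the MovieLens 100K clusters listed in Table 8.

```python
from collections import Counter

def multilabel_purity(cluster_genres):
    """Purity of one cluster whose items may each carry several genres."""
    if not cluster_genres:
        return 0.0
    # Count, for every genre, how many items in the cluster carry it.
    counts = Counter(g for genres in cluster_genres for g in set(genres))
    if not counts:
        return 0.0
    # The most common genre is taken as the cluster's category.
    _, best = counts.most_common(1)[0]
    return best / len(cluster_genres)

def mean_average_purity(average_purity_per_step):
    """Mean of the average-Purity values recorded at each time step."""
    return sum(average_purity_per_step) / len(average_purity_per_step)

# Genre lists of the two MovieLens 100K example clusters from Table 8.
pure_cluster = [
    ["Action", "Adventure", "Sci-Fi", "Drama", "Romance"],
    ["Action", "Adventure", "Sci-Fi", "Animation", "Horror"],
    ["Action", "Adventure", "Sci-Fi", "Romance", "War"],
    ["Action", "Adventure", "Sci-Fi", "Romance", "War"],
    ["Action", "Adventure", "Sci-Fi", "Comedy"],
]
noisy_cluster = [
    ["Animation", "Children's", "Comedy"],
    ["Action", "Comedy", "Crime", "Horror", "Thriller"],
    ["Comedy"],
    ["Action", "Drama", "Thriller"],
    ["Drama", "Thriller", "War"],
]
pure_score = multilabel_purity(pure_cluster)
noisy_score = multilabel_purity(noisy_cluster)
```

Under this reading, every movie in the first cluster shares at least one genre, whereas the most common genre in the noisy cluster covers only three of its five movies.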

6 Conclusion

We propose an inductive framework for incorporating side information for tensor completion in standard and multi-aspect streaming settings. The proposed framework can also be used in the batch setting. Given a completely new dataset with side information along multiple modes, SIITA can be used to analyze the merits of different side information for tensor completion. Besides performing better, SIITA is also significantly faster than state-of-the-art algorithms. We also propose NN-SIITA for handling nonnegativity constraints and show how it can be used for mining interpretable clusters. Our experiments confirm the effectiveness of SIITA in many instances. In the future, we plan to extend our proposed framework to handle problem instances with missing side information [33].

Acknowledgement

This work is supported in part by the Ministry of Human Resource Development (MHRD), Government of India.

References

  • Acar et al.  [2011] Acar, Evrim, Kolda, Tamara G., & Dunlavy, Daniel M. 2011. All-at-once Optimization for Coupled Matrix and Tensor Factorizations. In: MLG.
  • Beutel et al.  [2014] Beutel, Alex, Talukdar, Partha Pratim, Kumar, Abhimanu, Faloutsos, Christos, Papalexakis, Evangelos E, & Xing, Eric P. 2014. Flexifact: Scalable flexible factorization of coupled tensors on hadoop. In: SDM.
  • Cichocki et al.  [2015] Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., & Phan, H. A. 2015. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32(2), 145–163.
  • Ermiş et al.  [2015a] Ermiş, B., Acar, E., & Cemgil, A. T. 2015a. Link prediction in heterogeneous data via generalized coupled tensor factorization. In: KDD.
  • Ermiş et al.  [2015b] Ermiş, Beyza, Acar, Evrim, & Cemgil, A Taylan. 2015b. Link prediction in heterogeneous data via generalized coupled tensor factorization. KDD.
  • Fanaee-T & Gama [2015] Fanaee-T, Hadi, & Gama, João. 2015. Multi-aspect-streaming Tensor Analysis. Know.-Based Syst., 332–345.
  • Filipović & Jukić [2015] Filipović, M., & Jukić, A. 2015. Tucker factorization with missing data with application to low-n-rank tensor completion. Multidimens Syst Signal Process.
  • Ge et al.  [2016] Ge, Hancheng, Caverlee, James, Zhang, Nan, & Squicciarini, Anna. 2016. Uncovering the Spatio-Temporal Dynamics of Memes in the Presence of Incomplete Information. CIKM.
  • Guo et al.  [2017] Guo, X., Yao, Q., & Kwok, J. T. 2017. Efficient sparse low-rank tensor completion using the Frank-Wolfe algorithm. In: AAAI.
  • Harper & Konstan [2015] Harper, F. Maxwell, & Konstan, Joseph A. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst., Dec., 19:1–19:19.
  • Hyvönen et al.  [2008] Hyvönen, Saara, Miettinen, Pauli, & Terzi, Evimaria. 2008. Interpretable nonnegative matrix decompositions. Pages 345–353 of: KDD. ACM.
  • Jain & Dhillon [2013] Jain, Prateek, & Dhillon, Inderjit S. 2013. Provable inductive matrix completion. arXiv preprint arXiv:1306.0626.
  • Jeon et al.  [2016] Jeon, ByungSoo, Jeon, Inah, Sael, Lee, & Kang, U. 2016. Scout: Scalable coupled matrix-tensor factorization-algorithm and discoveries. In: ICDE.
  • Kasai & Mishra [2016] Kasai, H., & Mishra, B. 2016. Low-rank tensor completion: a Riemannian manifold preconditioning approach. In: ICML.
  • Kasai [2016] Kasai, Hiroyuki. 2016. Online Low-Rank Tensor Subspace Tracking from Incomplete Data by CP Decomposition using Recursive Least Squares. In: ICASSP.
  • Kim & Choi [2007] Kim, Yong-Deok, & Choi, Seungjin. 2007. Nonnegative tucker decomposition. In: CVPR.
  • Kolda & Bader [2009] Kolda, Tamara G, & Bader, Brett W. 2009. Tensor decompositions and applications. SIAM review, 51(3), 455–500.
  • Lefevre et al.  [2011] Lefevre, Augustin, Bach, Francis, & Févotte, Cédric. 2011. Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence. Pages 313–316 of: Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on. IEEE.
  • Mardani et al.  [2015] Mardani, Morteza, Mateos, Gonzalo, & Giannakis, Georgios B. 2015. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Transactions on Signal Processing.
  • Mørup et al.  [2008] Mørup, Morten, Hansen, Lars Kai, & Arnfred, Sidse M. 2008. Algorithms for sparse nonnegative Tucker decompositions. Neural computation, 20(8), 2112–2131.
  • Murphy et al.  [2012] Murphy, Brian, Talukdar, Partha Pratim, & Mitchell, Tom M. 2012. Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding. In: COLING.
  • Narita et al.  [2011] Narita, Atsuhiro, Hayashi, Kohei, Tomioka, Ryota, & Kashima, Hisashi. 2011. Tensor Factorization Using Auxiliary Information. Pages 501–516 of: Machine Learning and Knowledge Discovery in Databases.
  • Natarajan & Dhillon [2014] Natarajan, Nagarajan, & Dhillon, Inderjit S. 2014. Inductive matrix completion for predicting gene–disease associations. Bioinformatics, 30(12), i60–i68.
  • Nimishakavi et al.  [2016] Nimishakavi, Madhav, Saini, Uday Singh, & Talukdar, Partha. 2016. Relation Schema Induction using Tensor Factorization with Side Information. Pages 414–423 of: EMNLP.
  • Nion & Sidiropoulos [2009] Nion, Dimitri, & Sidiropoulos, Nicholas D. 2009. Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor. IEEE Transactions on Signal Processing.
  • Shashua & Hazan [2005] Shashua, Amnon, & Hazan, Tamir. 2005. Non-negative Tensor Factorization with Applications to Statistics and Computer Vision. Pages 792–799 of: ICML. ICML ’05. New York, NY, USA: ACM.
  • Si et al.  [2016] Si, Si, Chiang, Kai-Yang, Hsieh, Cho-Jui, Rao, Nikhil, & Dhillon, Inderjit S. 2016. Goal-directed inductive matrix completion. In: KDD.
  • Song et al.  [2017] Song, Qingquan, Huang, Xiao, Ge, Hancheng, Caverlee, James, & Hu, Xia. 2017. Multi-Aspect Streaming Tensor Completion. In: KDD.
  • Sun et al.  [2006] Sun, Jimeng, Tao, Dacheng, & Faloutsos, Christos. 2006. Beyond streams and graphs: dynamic tensor analysis. In: KDD.
  • Sun et al.  [2008] Sun, Jimeng, Tao, Dacheng, Papadimitriou, Spiros, Yu, Philip S., & Faloutsos, Christos. 2008. Incremental Tensor Analysis: Theory and Applications. ACM Trans. Knowl. Discov. Data, 2(3).
  • Symeonidis et al.  [2008] Symeonidis, Panagiotis, Nanopoulos, Alexandros, & Manolopoulos, Yannis. 2008. Tag Recommendations Based on Tensor Dimensionality Reduction. In: RecSys.
  • Welling & Weber [2001] Welling, Max, & Weber, Markus. 2001. Positive tensor factorization. Pattern Recognition Letters, 22(12), 1255–1261.
  • Wimalawarne et al.  [2017] Wimalawarne, Kishan, Yamada, Makoto, & Mamitsuka, Hiroshi. 2017. Convex Coupled Matrix and Tensor Completion. arXiv preprint arXiv:1705.05197.
  • Yu et al.  [2015] Yu, Rose, Cheng, Dehua, & Liu, Yan. 2015. Accelerated Online Low-rank Tensor Learning for Multivariate Spatio-temporal Streams. In: ICML.
  • Zhao et al.  [2017] Zhao, Renbo, Tan, Vincent, & Xu, Huan. 2017. Online Nonnegative Matrix Factorization with General Divergences. Pages 37–45 of: AISTATS.
  • Zhou et al.  [2016] Zhou, Shuo, Vinh, Nguyen Xuan, Bailey, James, Jia, Yunzhe, & Davidson, Ian. 2016. Accelerating online cp decompositions for higher order tensors. In: KDD. ACM.