Matrix Factorization via Deep Learning

12/04/2018 · by Duc Minh Nguyen, et al.

Matrix completion is one of the key problems in signal processing and machine learning. In recent years, deep-learning-based models have achieved state-of-the-art results in matrix completion. Nevertheless, they suffer from two drawbacks: (i) they cannot be easily extended to rows or columns unseen during training; and (ii) their performance often degrades when discrete predictions are required. This paper addresses these two drawbacks by presenting a deep matrix factorization model and a generic method that allows joint training of the factorization model and the discretization operator. Experiments on a real movie rating dataset show the efficacy of the proposed models.


1 Introduction

Let $X \in \mathbb{R}^{m \times n}$ be an incomplete matrix and $\Omega$ the set containing the indices of the observed entries in $X$ ($|\Omega| \ll mn$). Matrix completion (MC) concerns the task of recovering the unknown entries of $X$. Existing work often relies on the assumption that $X$ is a low-rank matrix. Matrix factorization (MF) methods [8] approximate the unknown rank-$r$ matrix by the product of two factors, $X \approx UV^{\top}$, with $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$, and $r \ll \min(m, n)$.
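
To make the classical (non-deep) MF setting concrete, the following minimal NumPy sketch fits the two factors by gradient descent on the squared error over the observed entries only; the sizes, rank and step size are illustrative and not taken from the paper.

```python
import numpy as np

# Toy masked low-rank completion problem (sizes, rank and step size are illustrative).
m, n, r = 100, 80, 10
rng = np.random.default_rng(0)
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / np.sqrt(r)  # low-rank ground truth
mask = rng.random((m, n)) < 0.3                     # Omega: indices of observed entries

U = 0.1 * rng.standard_normal((m, r))               # row factors
V = 0.1 * rng.standard_normal((n, r))               # column factors

lr = 0.01
for _ in range(1000):
    R = (U @ V.T - X) * mask                        # residual on observed entries only
    U -= lr * (R @ V)                               # gradient step for U
    V -= lr * (R.T @ U)                             # gradient step for V

X_hat = U @ V.T                                     # completed matrix estimate
```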

Recently, state-of-the-art performance in matrix completion has been achieved by neural network models [11, 13, 7]. Nevertheless, these methods suffer from two main drawbacks. The first concerns their extendability to row or column samples unseen during training. The second concerns the discrete nature of the matrices involved in many application domains: models that ignore it produce sub-optimal solutions by applying a quantization operation (e.g., rounding) to their real-valued predictions.

In this work, we focus on these two drawbacks of deep-learning-based MC. To address the first, we present a deep two-branch neural network model for MF which can be effectively extended to rows and/or columns unseen during training. To address the second, we obtain discrete predictions with a continuous approximation of a discretization operator, which enables simultaneous learning of the MF model and the discretizer. Experiments on a real dataset justify the effectiveness of our methods.

2 Deep Matrix Factorization Model

Consider a partially observed matrix $X \in \mathbb{R}^{m \times n}$, and let $x_i^r \in \mathbb{R}^n$, $i = 1, \dots, m$, be the $i$-th row vector and $x_j^c \in \mathbb{R}^m$, $j = 1, \dots, n$, the $j$-th column vector. Our model, illustrated in Fig. 1, comprises two branches that take as input the row and column vectors $x_i^r$, $x_j^c$. The two branches can be seen as two embedding functions $f_r(\cdot\,; \theta_r)$ and $f_c(\cdot\,; \theta_c)$, realized by $K_r$ and $K_c$ fully connected layers, respectively; $\theta_r$ and $\theta_c$ are the corresponding sets of weights. $f_r$ and $f_c$ map $x_i^r$, $x_j^c$ to the latent factors $u_i \in \mathbb{R}^d$, $v_j \in \mathbb{R}^d$. The prediction for the matrix entry at the $(i, j)$ position is then given by the cosine similarity between $u_i$ and $v_j$. Therefore, our model maps the row and column vectors $x_i^r$, $x_j^c$ to a continuous value $\hat{X}_{ij} \in [-1, 1]$¹ according to $\hat{X}_{ij} = \cos(u_i, v_j) = u_i^{\top} v_j / (\|u_i\|_2 \|v_j\|_2)$. We coin this model Deep Matrix Factorization (DMF)².

¹ During training, all entries are linearly scaled into $[-1, 1]$ according to $\tilde{X}_{ij} = 2\,(X_{ij} - r_{\min})/(r_{\max} - r_{\min}) - 1$, with $r_{\min}$ and $r_{\max}$ the minimum and maximum entry values. A re-scaling step is required to bring the predicted values back to the original range.
² DMF was presented in our previous work [9], and was independently proposed in [12].

We employ the mean square error as an objective function to train our model, with $\ell_2$ regularization on the network parameters $\theta = \{\theta_r, \theta_c\}$. The final objective function has the form

$\mathcal{L}(\theta) = \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \big( \hat{X}_{ij} - X_{ij} \big)^2 + \lambda \|\theta\|_2^2$,    (1)

with $\lambda$ a hyperparameter.
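
A corresponding training step for objective (1) might look as follows, continuing the sketch above; the $\ell_2$ term is realized here through the optimizer's weight decay, and all hyperparameter values are illustrative.

```python
import torch

num_users, num_items = 6040, 3706   # ML-1M sizes (illustrative; exact counts depend on preprocessing)
model = DMF(n_cols=num_items, n_rows=num_users)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def train_step(x_rows, x_cols, targets):
    """One step of (1): x_rows/x_cols are batches of row/column vectors,
    targets the corresponding observed entries scaled to [-1, 1]."""
    optimizer.zero_grad()
    preds = model(x_rows, x_cols)
    loss = torch.mean((preds - targets) ** 2)   # mean square error over the batch
    loss.backward()                             # the l2 term of (1) is applied via weight_decay
    optimizer.step()
    return loss.item()
```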

Figure 1: The deep matrix factorization model (DMF): the two branches consist of $K_r$ and $K_c$ fully connected layers, mapping the inputs to the latent factors $u_i$, $v_j$; $\cos(\cdot, \cdot)$ is the cosine similarity function.
Figure 2: Extendability in matrix completion. Area (I): rows and columns available during training. Areas (II) and (III): entries corresponding to the interactions of unseen rows with seen columns and vice versa. Area (IV): entries corresponding to the interactions of unseen rows with unseen columns.

Extendability of DMF

An important question in MF is how to efficiently extend the model to samples unseen during training. For example, in recommender systems the trained model has to deal with new users and/or new items. Fig. 2 illustrates the different cases involving extendability. Incomplete rows and columns available during training are depicted in area (I); areas (II), (III) and (IV) represent new rows and columns which are only partially observed after training. Let us denote by $X_{\mathrm{I}}$, $X_{\mathrm{I,II}}$, $X_{\mathrm{I,III}}$ and $X_{\mathrm{I\text{--}IV}}$ the matrices corresponding to the entries in areas (I); (I) and (II); (I) and (III); and (I), (II), (III), (IV), respectively. A DMF model trained with the entries in $X_{\mathrm{I}}$ can be extended to the other matrices by passing the corresponding (partially observed) row and column vectors through the trained branches,

$\hat{X}_{ij} = \cos\big( f_r(x_i^r; \theta_r),\, f_c(x_j^c; \theta_c) \big)$,

with $x_i^r$ and $x_j^c$ taken from $X_{\mathrm{I,II}}$, $X_{\mathrm{I,III}}$ or $X_{\mathrm{I\text{--}IV}}$, and with $\theta_r$, $\theta_c$ kept fixed (no re-training).
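
In code, this extension amounts to a forward pass through the already-trained branches, with no weight updates (a sketch continuing the hypothetical DMF class above):

```python
import torch

@torch.no_grad()
def predict_unseen(model, x_row, x_col):
    """Predict an entry whose row and/or column was unseen during training.

    x_row : (partially observed) row vector, e.g. a new user's ratings, zeros for unknowns
    x_col : (partially observed) column vector, e.g. a new item's ratings
    """
    model.eval()
    return model(x_row, x_col)   # forward pass only; theta_r, theta_c are kept fixed
```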

3 Discrete Matrix Factorization

Most matrix factorization models produce only real-valued entries, from which discrete predictions can be obtained by a separate quantization step. Considering a set of quantization levels $\mathcal{D} = \{d_1, \dots, d_L\}$, a quantizer divides the real line into $L$ non-overlapping consecutive intervals, defined as $\mathcal{I}_l = (b_{l-1}, b_l]$, $l = 1, \dots, L$. A quantity $y \in \mathbb{R}$ will be mapped to $d_l$ if it falls within the interval $\mathcal{I}_l$ [1], that is,

$Q(y) = \sum_{l=1}^{L} d_l \, \mathbb{1}\!\left[ b_{l-1} < y \le b_l \right]$,    (2)

where $\mathbb{1}[\cdot]$ outputs $1$ if its argument is true and $0$ otherwise, and $b_0 < b_1 < \dots < b_L$ are the quantization boundaries. In discrete MC, the set of quantization levels corresponds to the set of allowable discrete values of the matrix entries.
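
A direct implementation of the hard quantizer in (2) can be written as follows (a NumPy sketch; the levels and boundaries shown correspond to a uniform five-level quantizer and are illustrative):

```python
import numpy as np

levels = np.array([1., 2., 3., 4., 5.])       # quantization levels d_1..d_L (e.g. star ratings)
bounds = np.array([1.5, 2.5, 3.5, 4.5])       # interior boundaries b_1..b_{L-1} (uniform)

def quantize(y):
    """Hard quantizer Q(y): map each value to the level of the interval containing it."""
    idx = np.digitize(y, bounds)              # interval index (boundary ties differ slightly from (2))
    return levels[idx]

print(quantize(np.array([0.7, 2.2, 2.8, 9.0])))   # -> [1. 2. 3. 5.]
```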

Denote by $H(\cdot)$ the Heaviside step function, with $H(y) = 1$ if $y > 0$ and $H(y) = 0$ otherwise. Without loss of generality, let us assume that $d_l < d_{l+1}$, for all $l$. We define $\Delta_l = d_{l+1} - d_l$. Then, (2) can be written as

$Q(y) = d_1 + \sum_{l=1}^{L-1} \Delta_l \, H(y - b_l)$.    (3)

$Q(y)$ is piecewise constant; thus, incorporating it into a gradient-based learning system such as DMF results in vanishing gradients. We replace the Heaviside step function by a logistic function of the form $\sigma_c(y) = 1 / (1 + e^{-\gamma (y - c)})$, where $c$ is a scalar denoting the center of the sigmoid and $\gamma$ controls the slope of $\sigma_c$. Therefore, we obtain the piecewise smooth function

$\tilde{Q}(y) = d_1 + \sum_{l=1}^{L-1} \Delta_l \, \sigma_{b_l}(y)$.    (4)

Since $\lim_{\gamma \to \infty} \sigma_c(y) = H(y - c)$, $\tilde{Q}$ in (4) becomes arbitrarily close to $Q$ in (3) when $\gamma$ becomes arbitrarily large.
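
The smooth surrogate in (4) replaces each step by a sigmoid. A minimal sketch follows, written with PyTorch tensors so that gradients can also flow into the boundaries; the value of $\gamma$ and the levels are illustrative.

```python
import torch

def soft_quantize(y, levels, bounds, gamma):
    """Piecewise-smooth quantizer: d_1 + sum_l (d_{l+1} - d_l) * sigmoid(gamma * (y - b_l))."""
    deltas = levels[1:] - levels[:-1]                            # level gaps Delta_l
    steps = torch.sigmoid(gamma * (y.unsqueeze(-1) - bounds))    # one soft step per boundary
    return levels[0] + (deltas * steps).sum(dim=-1)

levels = torch.tensor([1., 2., 3., 4., 5.])
bounds = torch.tensor([1.5, 2.5, 3.5, 4.5], requires_grad=True)  # boundaries can be learned
y = torch.tensor([0.7, 2.2, 3.2])
print(soft_quantize(y, levels, bounds, gamma=50.0))              # close to [1., 2., 3.]
```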

By incorporating $\tilde{Q}$ into the DMF model (Sec. 2), we obtain a discrete MF model coined DMF-D, which can be trained using the following objective function:

$\mathcal{L}_D(\theta, b) = \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \big( \tilde{Q}(\hat{X}_{ij}) - X_{ij} \big)^2 + \lambda \|\theta\|_2^2 + \mu \sum_{l=1}^{L-1} \big( b_l - \bar{b}_l \big)^2$,    (5)

where the last term penalizes boundaries $b = \{b_l\}$ that deviate significantly from those of a uniform quantizer, $\bar{b} = \{\bar{b}_l\}$; $\lambda$ and $\mu$ are hyperparameters. We start with a small value of $\gamma$ at the beginning of training and gradually increase it to a very large value, so that the output of the model becomes discrete at the end of training.
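
The penalty term and the annealing of $\gamma$ can be sketched as follows, continuing the code above; the schedule and hyperparameter values are illustrative assumptions.

```python
import torch

uniform_bounds = torch.tensor([1.5, 2.5, 3.5, 4.5])        # boundaries of a uniform quantizer
bounds = uniform_bounds.clone().requires_grad_(True)        # learnable boundaries b_l
mu = 0.1                                                    # penalty weight (hyperparameter)

def dmfd_loss(preds_rescaled, targets, gamma):
    """Data term on soft-quantized predictions plus the uniform-quantizer penalty of (5).

    preds_rescaled: DMF outputs re-scaled to the original rating range.
    """
    q = soft_quantize(preds_rescaled, levels, bounds, gamma)
    data_term = torch.mean((q - targets) ** 2)
    penalty = mu * torch.sum((bounds - uniform_bounds) ** 2)
    return data_term + penalty

# gamma is annealed from a small to a very large value over the course of training.
gammas = torch.linspace(1.0, 200.0, steps=100)              # illustrative schedule
```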

4 Experimental Results

We carry out experiments on the MovieLens1M dataset [2], which contains 1 million user-movie ratings in $\{1, 2, 3, 4, 5\}$. We randomly split the observed entries into training, validation and test sets. We evaluate the prediction quality in terms of the root mean square error (RMSE) and the mean absolute error (MAE), calculated over the entries reserved for testing. We report the results averaged over 5 runs with different random splits.
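
For reference, the two reported metrics over the test entries can be computed as in this minimal NumPy sketch:

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error over the test entries."""
    return np.sqrt(np.mean((pred - true) ** 2))

def mae(pred, true):
    """Mean absolute error over the test entries."""
    return np.mean(np.abs(pred - true))
```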

Table 1 presents results regarding the extendability of different deep-learning-based models. Even though the DMF model does not produce the best predictions for entries in areas (I) and (III), it produces the best results in area (II) and is the only model that can make predictions for entries in area (IV).

Model                 (I)      (II)     (III)    (IV)
U-Autorec [11]                          –        –
I-Autorec [11]                 –        0.856    –
Deep U-Autorec [7]                      –        –
DMF                   0.883    0.904
Table 1: Results (RMSE) on the ML-1M dataset [2], separated by areas (I) to (IV); a dash denotes an area a model cannot extend to.
Method          RMSE    MAE
RankK [4]
OPTSPACE [6]
RDMC [3]
ODMC [5]
DMF-D
Table 2: Discrete MC results (RMSE and MAE) on the MovieLens1M dataset [2].

Table 2 presents a comparison of models that output discrete predictions. DMF-D outperforms existing models by large margins. The results show the benefits of the proposed approximation and the joint training of the latent factors and the discretization operator. More experimental results can be found in [9, 10].

5 Conclusion

We presented DMF, a deep neural network model for matrix factorization which can be extended to unseen samples without re-training. We couple DMF with a method that enables training discrete MF models with gradient descent, obtaining DMF-D, a strong model for discrete matrix completion. Experimental results on real data demonstrate the effectiveness of both models compared to the state of the art.

References

  • [1] R.M. Gray and D.L. Neuhoff. Quantization. IEEE Transactions on Information Theory, 44(6):2325–2383, 1998.
  • [2] F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5(4):19:1–19:19, 2015.
  • [3] J. Huang, F. Nie, and H. Huang. Robust discrete matrix completion. In Conference on Artificial Intelligence (AAAI), pages 424–430, 2013.
  • [4] J. Huang, F. Nie, H. Huang, Y. Lei, and C. Ding. Social trust prediction using rank-k matrix recovery. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2647–2653, 2013.
  • [5] Z. Huo, J. Liu, and H. Huang. Optimal discrete matrix completion. In Conference on Artificial Intelligence (AAAI), pages 1687–1693, 2016.
  • [6] R. H. Keshavan and S. Oh. A gradient descent algorithm on the Grassman manifold for matrix completion. arXiv preprint arXiv:0910.5260, 2009.
  • [7] O. Kuchaiev and B. Ginsburg. Training deep autoencoders for collaborative filtering. arXiv preprint arXiv:1708.01715, 2017.
  • [8] I. Markovsky. Low Rank Approximation: Algorithms, Implementation, Applications. Springer, 2012.
  • [9] D. M. Nguyen, E. Tsiligianni, and N. Deligiannis. Extendable neural matrix completion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
  • [10] D. M. Nguyen, E. Tsiligianni, and N. Deligiannis. Learning discrete matrix factorization models. IEEE Signal Processing Letters, 25(5):720–724, 2018.
  • [11] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie. AutoRec: Autoencoders meet collaborative filtering. In International Conference on World Wide Web (WWW), pages 111–112, 2015.
  • [12] H-J. Xue, X. Dai, J. Zhang, S. Huang, and J. Chen. Deep matrix factorization models for recommender systems. In International Joint Conference on Artificial Intelligence (IJCAI), pages 3203–3209, 2017.
  • [13] Y. Zheng, B. Tang, W. Ding, and H. Zhou. A neural autoregressive approach to collaborative filtering. In International Conference on Machine Learning (ICML), pages 764–773, 2016.