To test our model, we experimented on the MovieLens 100K dataset created by the GroupLens Research Project at the University of Minnesota (Harper and Konstan (2015)), a common benchmark for recommendation systems. It consists of 100,000 ratings on a scale of 1 to 5 from 943 users on 1682 movies. Along with unique identification numbers, the dataset provides the age, occupation, zip code, and gender of each user, and the title, genre, and other metadata of each movie. For our experiments, we use the unique identification number, age, occupation, zip code, and gender as user features, and the unique identification number and genre as movie features. The dataset has a standard 5-fold training and validation split, which we used in order to compare accurately against current state-of-the-art models. To test for transferability, we also experimented on the MovieLens 1M dataset, which has 1 million ratings from 6000 users on 4000 movies with the same user and item metadata. As this dataset has no defined train-validation split, we used an 80-20 split, holding out the ratings with the latest timestamps as validation.
With the HBGNN described in Section 2, we formulate a supervised learning task with the input defined as the user-item bigraph and the target defined as the rating y. Thus, to tune our model, we use the root mean squared error, RMSE = √((1/N) Σₙ (ŷₙ − yₙ)²), as our cost function, given our prediction, ŷ, and the target value, y.
We use AMSGrad with weight decay (Reddi et al. (2018); Krogh and Hertz (1992)), an adaptive gradient descent optimizer based on exponential moving-average updates paired with weight regularization, to optimize the cost function defined in Eq. 1.
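As a minimal sketch of the cost function above (not the authors' implementation, and using NumPy rather than a deep learning framework):

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean squared error between predicted and target ratings (Eq. 1)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

In a PyTorch implementation, the optimizer described above would correspond to, e.g., `torch.optim.Adam(params, amsgrad=True, weight_decay=...)`.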
2 Hierarchical BiGraph Neural Network (HBGNN)
We start by formalizing the generalized bigraph framework used to structure the user-item features of the recommendation system, shown in graphical form in Fig. 5. The generalized bigraph framework allows us to sufficiently express the locality and connectivity of the entities and their features. In the context of recommendation systems, there exist two entities: user and item. Thus, two nodes are placed in the scope of the place graph, connected by a single port. This port provides the communication needed to merge the two entities for the task at hand; for recommendation systems, the task is to predict the rating given the user and item. The bigraph needs to be concrete, meaning all supports must be defined. Thus, to obtain the states of the two entities in the place graph, we define two link graphs, each containing densely connected nodes whose states are allocated by feature. The links amongst the peers are bidirectional, and communicate to embed the profile of the entity.
Now, we will canonically go over how the HBGNN propagates the user and item features to predict a rating, using Fig. 6. Since the features of the user and the item are sparse and usually discrete, we introduce learnable embeddings that are assigned to their respective nodes within the link graphs.
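A rough sketch of this assignment, assuming one learnable table per discrete feature (the feature names, cardinalities, and embedding size below are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # embedding size (hypothetical; the paper uses larger states)

# one learnable embedding table per discrete feature of an entity
tables = {
    "occupation": rng.normal(0.0, 0.1, (21, dim)),
    "gender": rng.normal(0.0, 0.1, (2, dim)),
}

def embed(feature, index):
    """Look up the embedding row assigned to a feature node's discrete value."""
    return tables[feature][index]
```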
Once embeddings are assigned to their respective nodes, both link graphs perform message passing. Each node processes incoming messages using a gated recurrent unit (Li et al. (2015)), with the input being the aggregated sum of all neighbor messages and the hidden state being the node's current state, yielding the updated node feature. It is important to note that the messages sent by neighbors are the neighbors' current states, and that no transform is applied to each message individually; any transform acts on the aggregated sum, so as to be tolerant to isomorphism. The message passing process can be done concurrently across all nodes in both link graphs, and is thus computationally efficient.
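The update above can be sketched as follows. This is an illustrative NumPy toy, not the paper's implementation: the GRU weights are random rather than learned, and the state and graph sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Toy GRU cell with random (untrained) weights; in the model these are learned."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wz, self.Uz = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))
        self.Wr, self.Ur = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))
        self.Wh, self.Uh = rng.uniform(-s, s, (dim, dim)), rng.uniform(-s, s, (dim, dim))

    def __call__(self, m, h):
        # m: aggregated incoming message (input), h: current node state (hidden)
        z = sigmoid(m @ self.Wz + h @ self.Uz)          # update gate
        r = sigmoid(m @ self.Wr + h @ self.Ur)          # reset gate
        cand = np.tanh(m @ self.Wh + (r * h) @ self.Uh)  # candidate state
        return (1.0 - z) * h + z * cand

def message_passing_round(states, adj, gru):
    """One synchronous round over all nodes: each node sums its neighbors'
    current states (adj @ states), then updates via the shared GRU.
    All nodes update concurrently."""
    messages = adj @ states
    return gru(messages, states)

# fully connected link graph over 4 feature nodes with 8-dim states
adj = np.ones((4, 4)) - np.eye(4)
states = np.random.default_rng(1).normal(size=(4, 8))
new_states = message_passing_round(states, adj, GRUCell(8))
```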
After the message passing rounds, all node states in the link graphs are passed through a neural network for encapsulation and then assigned to the respective entity nodes in the place graph. With our design choice of making the link graph fully connected, a single message passing round suffices to propagate information through all combinations of node states. We will experiment with this claim in Section 3.
Since all supports are defined upon assignment, we can communicate via the port for the recommendation system task. We perform message passing between the two nodes of the place graph just as in the link graphs: each entity node contains a gated recurrent unit, and messages are aggregated in the same manner. We can additionally choose whether the identification numbers of the user and item are introduced in the link graph or in the place graph; models that input the identification number at the link graph and models that input it at the place graph are distinguished by their prefix. Furthermore, since the place graph is fully connected, complete propagation between the two peers is again achieved with a single iteration of message passing.
The priors that determine the sparsity of the link graphs are not trivial. In graph structures, sparsity can be controlled either by edge removal or by edge reweighing. We argue that edge reweighing is a generalization of edge removal, since, given an edge connecting node i to node j, the message passed through it can be scaled by 0 to represent removal of that edge. To formalize edge reweighing, we denote the set of edge weights as W: for every edge in the edge set E, there is an associated element in W, so |W| = |E|. Message passing with edge reweighing is then a weighted sum of all neighboring messages.
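A minimal sketch of reweighed aggregation (illustrative NumPy, with hypothetical weights):

```python
import numpy as np

def reweighted_aggregate(states, W):
    """Weighted-sum aggregation: entry W[i, j] scales the message from node j
    to node i; a weight of 0 recovers edge removal."""
    return np.asarray(W) @ np.asarray(states)

# toy example: node 0 keeps node 1's full message, node 1 halves node 0's
states = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[0.0, 1.0], [0.5, 0.0]])
agg = reweighted_aggregate(states, W)
```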
To compute each edge weight, we can leverage past work on the attention mechanism and graph neural networks (Veličković et al. (2017); Vaswani et al. (2017); Liao et al. (2019)). Simply, for each node we assign its hidden state as the key, and each incoming message as the query and value.
The key and query are each transformed using a linear transform.
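One plausible reading of this scheme, sketched with scaled dot-product scoring and identity transforms (both assumptions; the paper does not specify the scoring function):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weights(h, messages, Wk, Wq):
    """Edge weights for one node: its hidden state h is the key; each incoming
    message is the query (and value). Key and query get linear transforms."""
    key = Wk @ h
    d = np.sqrt(len(key))  # scaled dot-product score (assumed)
    scores = np.array([(Wq @ m) @ key / d for m in messages])
    return softmax(scores)

# toy example: the message aligned with the key receives the larger weight
h = np.array([1.0, 0.0])
messages = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
I = np.eye(2)
alpha = attention_weights(h, messages, I, I)
aggregated = sum(a * m for a, m in zip(alpha, messages))  # reweighed sum
```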
Thus, we introduce a variant of the HBGNN that utilizes the attention mechanism for edge reweighing, so as to overcome the unknown prior on feature dependency and sparsity in both the user and item link graphs. We denote this model as Attention HBGNN (AHBGNN).
To obtain the rating, we can proceed in numerous ways by utilizing some combination of the nodes of the link graph and/or the place graph, but in this paper we only illustrate using the place graph alone. Once message passing is complete, we pass the states of the two entity nodes of the place graph into a neural network to predict the rating. To optimize the HBGNN, we treat the recommendation system problem as a supervised learning task.
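A toy sketch of this readout, assuming the two entity states are concatenated before the network (an assumption; the paper does not state the combination rule) and the layer sizes are illustrative:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def predict_rating(h_user, h_item, layers):
    """Concatenate the two place-graph entity states and pass them through an
    MLP given as (W, b) pairs, with leaky ReLU between (not after) layers."""
    x = np.concatenate([h_user, h_item])
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = leaky_relu(x)
    return float(np.squeeze(x))

# toy 1-layer readout (the actual model uses a deeper network)
layers = [(np.ones((1, 4)), np.zeros(1))]
rating = predict_rating(np.array([1.0, 2.0]), np.array([3.0, 4.0]), layers)
```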
Table 1: RMSE train-test results on MovieLens 100K and MovieLens 1M. The suffix * denotes transfer learning.
We argue that HBGNN implicitly utilizes both collaborative filtering and content-based filtering under this generalized bigraph paradigm. HBGNN embeds the user and item profiles into disjoint graphs, encoding the same information used to describe all the users and items in the population. The learned embeddings are generalized and optimized in a manner that takes other users and items into account; this aspect acts as collaborative filtering, since it can generate correlations amongst the population. By using a supervised learning structure, HBGNN also leverages past user-item relationships; this aspect acts as content-based filtering, since it can generate correlations based on user-item history.
We conducted the experiments described in Section 1 and compare our models, -HBGNN, -HBGNN, and AHBGNN, on the RMSE metric against current modeling methods that were tested on both datasets: Graph Convolutional Matrix Completion (GCMC) (van den Berg et al. (2017)), Inductive Graph Convolutional Matrix Completion (IGCMC) (Zhang and Chen (2019)), Inductive Graph-based Matrix Completion (IGMC) (Zhang and Chen (2019)), PinSage (Ying et al. (2018)), Self-Supervised Exchangeable Model (SSEM) (Hartford et al. (2018)), and Factorized Exchangeable Autoencoder (FEAE) (Hartford et al. (2018)). We also tested a traditional multi-layer perceptron model (MLP) on the datasets, with the same features as the HBGNNs and its own set of learnable embeddings.
Both -HBGNN and -HBGNN use 512 parameters for each state in the user and movie link graphs, and 2048 parameters for each state in the rating place graph. The size of each embedding matrix depends on the range of the corresponding user or movie feature, but we fix the user age range at 100 so that we can leverage transfer learning between MovieLens 100K and MovieLens 1M. The encoding neural network that assigns the states of the nodes in the place graph is a single-layer model with 4096 linear perceptrons. The processing network used to predict the rating is a 5-layer multi-layer perceptron with leaky rectifier transforms between the layers. All linear layers are initialized with Kaiming initialization using a uniform distribution.
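The initialization above can be sketched as follows (an illustrative NumPy version, assuming the standard Kaiming-uniform bound; the gain value for the leaky rectifier is an assumption):

```python
import numpy as np

def kaiming_uniform(fan_in, fan_out, gain=np.sqrt(2.0), seed=0):
    """Kaiming initialization with a uniform distribution:
    samples from U(-b, b) with b = gain * sqrt(3 / fan_in)."""
    rng = np.random.default_rng(seed)
    bound = gain * np.sqrt(3.0 / fan_in)
    return rng.uniform(-bound, bound, (fan_out, fan_in))

# e.g. a 512 -> 256 linear layer (sizes here are arbitrary)
W = kaiming_uniform(512, 256)
```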
The results shown in Table 1 demonstrate that our HBGNNs perform competitively with the current modeling methods on both datasets. The -HBGNN and -HBGNN were trained for 75 epochs on the MovieLens 100K dataset and for 30 epochs on MovieLens 1M; the final RMSE loss is shown in Table 1. The attention variants followed the respective training procedures.
Additionally, we fine-tuned the pretrained -HBGNN and -HBGNN that performed best amongst the five folds of the MovieLens 100K dataset on the MovieLens 1M dataset, and the pretrained models outperformed the models without pre-training within one epoch. We also fine-tuned the pretrained -HBGNN and -HBGNN from the MovieLens 1M dataset on the MovieLens 100K dataset and found similar results. Both results can be seen in Table 1; each pretrained model was fine-tuned for only 5 epochs on each dataset, as it converged very quickly and to a better local optimum. To do this, we had to reformat and retrain the unique identification number embeddings for the user and movie, as well as the zip code embedding in the user profile, since these differ between the two datasets. Thus, when fine-tuning, HBGNN only needs to learn the embeddings for the identification numbers and the zip code, and adjust the ports and links minimally. From our experimental results in Table 1, we conclude that this architecture transfers very effectively, requiring only the embeddings for the identification numbers and the user zip code to be relearned.
The -HBGNN achieved faster convergence than -HBGNN; however, once converged, both models achieved similar performance on the validation set, with -HBGNN performing slightly better. -HBGNN appears to overfit the training set much more than -HBGNN on both MovieLens datasets. Both methods achieve competitive performance amongst current methods, and additional regularization should be applied in future work for better generalization and validation performance.
In Fig. 8, we visualize the user profiles created by the user link graph and their respective ratings generated by the place graph using t-distributed Stochastic Neighbor Embedding (t-SNE), for -HBGNN with 1 message passing round in both the place and link graphs. The details of each movie are listed in Table 2.
Table 2: Details of the movies visualized in Fig. 8.

| Movie Title | Genre | Number of Ratings |
|---|---|---|
| Star Trek: First Contact (1996) | Action, Adventure, Sci-Fi | 5, 25, 87, 123, 53 |
| Homeward Bound II… (1996) | Adventure, Children's | 6, 8, 6, 8, 2 |
| Cats Don't Dance (1997) | Animation, Children's, Musical | 5, 6, 7, 1, 5 |
| Love Bug, The (1969) | Children's, Comedy | 4, 8, 24, 4, 1 |
In this paper, we have proposed the Hierarchical BiGraph Neural Network (HBGNN), a hierarchical approach that uses graph neural networks as recommendation systems and structures the user-item features with a bigraph framework. We show that HBGNN is competitive with current recommendation systems and is transferable to larger-scale tasks without much retraining. In future work, we hope to explore different methods of understanding the embeddings in the link and place graphs of the HBGNN (i.e., understanding the individual and joint profiles), and to test different design choices (i.e., methods of message passing, connectivity, and different graph neural network architectures).
- M. Balabanović and Y. Shoham (1997) Fab: content-based, collaborative recommendation. Communications of the ACM 40 (3), pp. 66–72.
- J. Basilico and T. Hofmann (2004) Unifying collaborative and content-based filtering. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, New York, NY, USA, pp. 9.
- J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2013) Spectral networks and locally connected networks on graphs.
- H. Dai, B. Dai, and L. Song (2016) Discriminative embeddings of latent variable models for structured data.
- D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. CoRR abs/1509.09292.
- D. Goldberg, D. Nichols, B. M. Oki, and D. Terry (1992) Using collaborative filtering to weave an information tapestry. Communications of the ACM 35 (12), pp. 61–70.
- F. M. Harper and J. A. Konstan (2015) The MovieLens datasets: history and context. ACM Transactions on Interactive Intelligent Systems 5 (4).
- J. Hartford, D. R. Graham, K. Leyton-Brown, and S. Ravanbakhsh (2018) Deep models of interactions across sets.
- A. Krogh and J. A. Hertz (1992) A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems 4, pp. 950–957.
- K. Lang (1995) NewsWeeder: learning to filter netnews. In Machine Learning Proceedings 1995, A. Prieditis and S. Russell (Eds.), pp. 331–339.
- Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks.
- R. Liao, Y. Li, Y. Song, S. Wang, C. Nash, W. L. Hamilton, D. Duvenaud, R. Urtasun, and R. Zemel (2019) Efficient graph generation with graph recurrent attention networks.
- R. Milner (2006) Pure bigraphs: structure and dynamics. Information and Computation 204 (1), pp. 60–122.
- S. J. Reddi, S. Kale, and S. Kumar (2018) On the convergence of Adam and beyond. In International Conference on Learning Representations.
- B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. In Proceedings of the ACM World Wide Web Conference.
- F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80.
- M. Sevegnani and M. Calder (2015) Bigraphs with sharing. Theoretical Computer Science 577, pp. 43–73.
- R. van den Berg, T. N. Kipf, and M. Welling (2017) Graph convolutional matrix completion.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need.
- P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2017) Graph attention networks.
- Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2019) Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics 38 (5).
- R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
- M. Zhang and Y. Chen (2019) Inductive matrix completion based on graph neural networks.
- Z. Zhao and M. Shang (2010) User-based collaborative-filtering recommendation algorithms on Hadoop. In 2010 Third International Conference on Knowledge Discovery and Data Mining, pp. 478–481.