Hierarchical BiGraph Neural Network as Recommendation Systems

07/27/2020 ∙ by Dom Huh, et al. ∙ George Mason University

Graph neural networks (GNNs) have emerged as a promising modeling method for applications dealing with datasets that are best represented in the graph domain. In particular, developing recommendation systems often requires addressing sparse structured data that lacks feature richness on the user and/or item side and that must be processed within the correct context for optimal performance. Such datasets intuitively map to, and can be represented as, networks or graphs. In this paper, we propose the Hierarchical BiGraph Neural Network (HBGNN), a hierarchical approach to using GNNs as recommendation systems that structures the user-item features in a bigraph framework. Our experimental results show competitive performance with current recommendation-system methods, as well as transferability.

1 Preliminaries

To test our model, we experimented on the MovieLens 100K dataset created by the GroupLens Research Project at the University of Minnesota Harper and Konstan (2015), a common benchmark dataset for recommendation systems. It consists of 100,000 ratings on a scale of 1 to 5 from 943 users on 1682 movies. Along with unique identification numbers, the age, occupation, zip code, and gender are provided for each user, and the title, genre, and other metadata are provided for each movie. For our experiments, we use the unique identification number, age, occupation, zip code, and gender as user features, and the unique identification number and genre as movie features. For this dataset, there exists a standard 5-fold training and validation split, which we use in order to compare accurately with current state-of-the-art models. To test for transferability, we experimented on the MovieLens 1M dataset, which has 1 million ratings from 6000 users on 4000 movies and carries the same user and item metadata. As there is no defined train-valid split for this dataset, we used an 80-20 train-valid split, holding out the latest ratings by timestamp as validation.
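For concreteness, a minimal sketch of how the standard folds can be loaded, assuming the official MovieLens 100K archive layout (u{k}.base/u{k}.test per fold, u.user for user metadata); the paths and column handling here are illustrative, not the authors' code.

```python
# Sketch: loading one of the five standard MovieLens 100K folds with pandas.
# Assumes the official archive layout (u{k}.base / u{k}.test, u.user).
import pandas as pd

RATING_COLS = ["user_id", "item_id", "rating", "timestamp"]

def load_fold(fold: int, root: str = "ml-100k"):
    """Return (train, valid) rating frames for fold k of the standard split."""
    train = pd.read_csv(f"{root}/u{fold}.base", sep="\t", names=RATING_COLS)
    valid = pd.read_csv(f"{root}/u{fold}.test", sep="\t", names=RATING_COLS)
    return train, valid

# User side features: id | age | gender | occupation | zip code
users = pd.read_csv("ml-100k/u.user", sep="|",
                    names=["user_id", "age", "gender", "occupation", "zip"])
```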

Figure 1: Rating Graph
Figure 2: User Graph
Figure 3: Movie Graph
Figure 4: MovieLens Graphs: The three graphs above specify the graphs used for MovieLens 100K with ℓ-HBGNN. For p-HBGNN, the deviation is the elimination of the identification-number nodes in both the user- and movie-level link graphs, with these features instead introduced at the rating place graph in their respective entity nodes.

With the HBGNN described in Section 2, we formulate a supervised learning task with the input defined as the user-item bigraph and the target defined as the rating. Thus, to tune our model, we use root mean squared error, $\mathcal{L}_{\mathrm{RMSE}}$, as our cost function given our prediction, $\hat{y}$, and the target value, $y$.

$$\mathcal{L}_{\mathrm{RMSE}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2} \qquad (1)$$
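As a sanity check, Eq. 1 amounts to a one-line loss; a minimal PyTorch sketch (our illustration, not the authors' code):

```python
# RMSE cost of Eq. 1 over a batch of predicted and target ratings.
import torch

def rmse_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((y_hat - y) ** 2))
```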

We use AMSGrad with weight decay Reddi et al. (2018); Krogh and Hertz (1992), an adaptive gradient-descent optimizer based on exponential-moving-average updates paired with weight regularization, to optimize the cost function defined in Eq. 1.
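In PyTorch, one way to realize this combination is AdamW with the AMSGrad flag; a hedged sketch, with illustrative hyperparameters (the paper does not report learning rate or decay strength):

```python
# AMSGrad paired with weight decay via AdamW(amsgrad=True).
# `model` stands in for any HBGNN instance; lr/weight_decay are illustrative.
import torch

model = torch.nn.Linear(8, 1)  # placeholder for the actual model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=1e-5, amsgrad=True)
```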

2 Hierarchical BiGraph Neural Network (HBGNN)

We start by formalizing the generalized bigraph framework used to structure the user-item features of the recommendation system, shown in graphical form in Fig. 5. The framework allows us to sufficiently express the locality and connectivity of the entities and their features. In the context of recommendation systems, there exist two entities: user and item. Thus, two nodes are placed in the scope of the place graph and are connected by a single port. This port provides the communication between the two entities needed to merge them for the task at hand; for recommendation systems, the task is to predict the rating given user and item. The bigraph needs to be concrete, meaning all supports must be defined. Thus, to obtain the states of the two entity nodes in the place graph, we define two link graphs, $G_U$ and $G_V$, each containing densely connected nodes with states allocated by feature. The links amongst these peers are bidirectional and communicate to embed the profile of the entity.
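A small sketch of this skeleton under our reading of the framework, with fully connected link graphs and a two-node place graph; the feature counts are those used for MovieLens 100K:

```python
# Sketch of the bigraph skeleton: two fully connected link graphs (user,
# item) plus a two-node place graph joined by a single port.
import torch

def fully_connected(n: int) -> torch.Tensor:
    """Adjacency of a fully connected graph without self-loops."""
    return torch.ones(n, n) - torch.eye(n)

user_link_adj = fully_connected(5)   # id, age, occupation, zip, gender
item_link_adj = fully_connected(2)   # id, genre
place_adj = fully_connected(2)       # user node <-> item node (the port)
```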

Figure 5: Generalized bigraph framework: Example with user and item in context of recommendation system with a single port.
Figure 6: Bigraph Deconstruction: Example of a deconstructed bigraph to clearly illustrate the locality and connectivity of the generalized bigraph framework in the context of recommendation systems and graph neural networks, under the assumption of three features for both user and item entities.

We now walk through how the HBGNN propagates the user and item features to predict a rating, using Fig. 6. Since the features of the user, $x_u$, and item, $x_v$, are sparse and usually discrete, we introduce learnable embeddings, $e_u$ and $e_v$, assigned to their respective nodes within the link graphs.

$$e_u = \mathrm{Embed}_U(x_u), \qquad e_v = \mathrm{Embed}_V(x_v) \qquad (2)$$
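A sketch of the user-side embedding tables under Eq. 2; the zip-code bucket count and the embedding dimension are our assumptions, while the other cardinalities follow MovieLens 100K:

```python
# One learnable embedding table per sparse, discrete user feature (Eq. 2).
import torch
import torch.nn as nn

class UserFeatureEmbeddings(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.age = nn.Embedding(100, dim)         # age range fixed at 100
        self.gender = nn.Embedding(2, dim)
        self.occupation = nn.Embedding(21, dim)   # 21 occupations in ML-100K
        self.zip = nn.Embedding(1000, dim)        # hashed zip buckets (assumed)

    def forward(self, age, gender, occupation, zip_bucket):
        # Each row becomes the initial state of one user link-graph node.
        return torch.stack([self.age(age), self.gender(gender),
                            self.occupation(occupation), self.zip(zip_bucket)])
```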

Once embeddings are assigned to their respective nodes, both link graphs perform message passing. Each node processes incoming messages using a gated recurrent unit (GRU) Li et al. (2015), with the input being the aggregated sum of all neighbor messages, $m_v$, for node $v$, and the hidden state being the node's current state, $h_v$, yielding the updated node feature, $h_v'$. It is important to note that the message sent by a neighbor is that neighbor's current state, and no independent transform is applied to each message individually; any transform acts on the aggregated sum, so the update is tolerant to isomorphism. Message passing can be done concurrently across all nodes in both link graphs, making it computationally efficient.

$$m_v = \sum_{w \in \mathcal{N}(v)} h_w, \qquad h_v' = \mathrm{GRU}(m_v, h_v) \qquad (3)$$
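A minimal sketch of one round of Eq. 3 on a fully connected link graph; the shared-GRU-across-nodes reading is our assumption:

```python
# One message-passing round (Eq. 3): summed neighbor states feed a GRU
# whose hidden state is the node's own current state.
import torch
import torch.nn as nn

def message_passing_round(h: torch.Tensor, gru: nn.GRUCell) -> torch.Tensor:
    """h: [num_nodes, dim] node states on a fully connected graph."""
    m = h.sum(dim=0, keepdim=True) - h   # sum over neighbors = total - self
    return gru(m, h)                     # GRU shared across all nodes

dim = 512
gru = nn.GRUCell(input_size=dim, hidden_size=dim)
h = torch.randn(4, dim)                  # e.g. four user feature nodes
h = message_passing_round(h, gru)
```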

Upon completing $T$ rounds of message passing, all node states in each link graph are passed through a neural network for encapsulation and then assigned to the respective entity nodes, $s_u$ and $s_v$, in the place graph. With our design choice of making the link graphs fully connected, $T$ can equal 1 and still allow propagation through all combinations of node states. We experiment with this claim in Section 3.

$$s_u = \phi_U\big(\{h_w' : w \in G_U\}\big), \qquad s_v = \phi_V\big(\{h_w' : w \in G_V\}\big) \qquad (4)$$

Since all supports are defined upon assignment, we can communicate via the port for the recommendation-system task. We perform message passing between the two nodes of the place graph just as in the link graphs: each entity node contains a gated recurrent unit, and messages are aggregated in the same manner. We can additionally choose whether the identification number of the user and item is introduced in the link graphs or in the place graph. Models that input the identification number at the link graphs are prefixed with ℓ-, whereas models that input it at the place graph are prefixed with p-. Furthermore, since the place graph is fully connected, complete propagation between the two peers is achieved with a single iteration of message passing.

The priors needed to determine the sparsity of the link graphs are not trivial. In graph structures, sparsity can be controlled by either edge removal or edge reweighing. We argue that edge reweighing is a generalization of edge removal, since given an edge $e_{wv}$ connecting node $w$ to node $v$, the message passed through $e_{wv}$ can be scaled by 0 to represent removal. To formalize edge reweighing, we denote $\mathcal{A}$ as the set of edge weights $\alpha_{wv}$; for every edge in $E$ there is an associated element in $\mathcal{A}$, thus $|\mathcal{A}| = |E|$. Message passing with edge reweighing then becomes a weighted sum of all neighboring messages.

$$m_v = \sum_{w \in \mathcal{N}(v)} \alpha_{wv} \, h_w \qquad (5)$$

To compute each $\alpha_{wv}$, we can leverage past work on attention mechanisms and graph neural networks Veličković et al. (2017); Vaswani et al. (2017); Liao et al. (2019). Simply, we assign the hidden state $h_v$ as the key, and each message $h_w$ as the query and value for node $v$.

Figure 7: Edge reweighing with the attention mechanism for a node with three neighbor nodes.

The key and query are transformed using linear transforms $W_k$ and $W_q$.

$$\alpha_{wv} = \underset{w \in \mathcal{N}(v)}{\mathrm{softmax}}\big( (W_q h_w)^\top (W_k h_v) \big) \qquad (6)$$
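A sketch of the attention reweighing of Eqs. 5-6 for a single receiving node; the exact parameterization (shared vs. per-graph transforms) is not specified in the text, so this is one plausible instantiation:

```python
# Attention-based edge reweighing (Eqs. 5-6): the receiver's state is the
# key; each neighbor message acts as query and value.
import torch
import torch.nn as nn

class AttentionReweigh(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # W_q of Eq. 6
        self.w_k = nn.Linear(dim, dim, bias=False)  # W_k of Eq. 6

    def forward(self, h_v: torch.Tensor, msgs: torch.Tensor) -> torch.Tensor:
        """h_v: [dim] receiver state; msgs: [num_neighbors, dim]."""
        scores = self.w_q(msgs) @ self.w_k(h_v)       # one score per edge
        alpha = torch.softmax(scores, dim=0)          # edge weights
        return (alpha.unsqueeze(-1) * msgs).sum(0)    # weighted sum (Eq. 5)
```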

Thus, we introduce a variant of the HBGNN that utilizes the attention mechanism for edge reweighing, to overcome the unknown prior on feature dependency and sparsity in both the user and item link graphs. We denote this model Attention HBGNN (AHBGNN).

To obtain the rating, we can proceed in numerous ways by utilizing some combination of the nodes of the link graphs and/or place graph, but in this paper we illustrate using solely the place graph. Once message passing is complete, we pass the states of the two entity nodes of the place graph into a neural network to predict the rating. To optimize the HBGNN, we treat the recommendation-system problem as a supervised learning task.

Methods      MovieLens 100K       MovieLens 1M
             Train     Test       Train     Test
GCMC         -         0.910      -         0.832
IGCMC        -         1.142      -         1.259
IGMC         -         0.905      -         0.857
PinSage      -         0.951      -         0.906
FEAE         -         0.920      -         0.860
SSEM         -         0.910      -         0.863
MLP          1.141     1.178      1.123     1.149
ℓ-HBGNN      0.002     0.927      0.002     0.877
ℓ-HBGNN*     0.002     0.914      0.003     0.863
p-HBGNN      0.729     0.930      0.581     0.898
p-HBGNN*     0.704     0.912      0.579     0.879
ℓ-AHBGNN     0.001     0.910      0.002     0.852
p-AHBGNN     0.708     0.931      0.548     0.870

Table 1: RMSE train-test results on MovieLens 100K and 1M. The suffix * denotes transfer learning.

We argue that HBGNN implicitly utilizes both collaborative filtering and content-based filtering under this generalized bigraph paradigm. HBGNN embeds the user and item profiles into disjoint graphs, encoding the same information used to describe every user and item in the population. The learned embeddings are generalized and optimized in a manner that takes other users and items into account; this aspect acts as collaborative filtering, since it can capture correlations across the population. By using a supervised learning structure, HBGNN also utilizes past user-item relationships; this aspect acts as content-based filtering, since it can capture correlations based on user-item history.

3 Results

We conducted the experiments described in Section 1 and compare our models, ℓ-HBGNN, p-HBGNN, and the AHBGNN variants, on the RMSE metric against current modeling methods tested on both datasets: Graph Convolution Matrix Completion (GCMC) van den Berg et al. (2017), Inductive Graph Convolution Matrix Completion (IGCMC) Zhang and Chen (2019), Inductive Graph-based Matrix Completion (IGMC) Zhang and Chen (2019), PinSage Ying et al. (2018), the Self-Supervised Exchangeable Model (SSEM) Hartford et al. (2018), and the Factorized Exchangeable Autoencoder (FEAE) Hartford et al. (2018). We also tested a traditional multi-layer perceptron (MLP) on both datasets, with the same features as the HBGNNs and its own set of learnable embeddings.

Both ℓ-HBGNN and p-HBGNN use 512 parameters for each state in the user and movie link graphs, and 2048 parameters for each state in the rating place graph. The size of each embedding matrix depends on the range of the corresponding user or movie feature, but we fix the user age range at 100 so that we can leverage transfer learning between MovieLens 100K and MovieLens 1M. The encoding neural network that assigns the states of the nodes in the place graph is a single-layer model with 4096 linear perceptrons. The processing network used to predict the rating is a 5-layer multi-layer perceptron with leaky-rectifier transforms between the layers. All linear layers are initialized with Kaiming initialization using a uniform distribution.
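Putting the reported sizes together, one plausible wiring is sketched below; the hidden widths of the 5-layer rating MLP are not reported, so those are our placeholders:

```python
# Reported sizes: 512-dim link states, 2048-dim place states, a single
# 4096-unit encoding layer, a 5-layer leaky-ReLU rating MLP, and
# Kaiming-uniform initialization on all linear layers.
import torch.nn as nn

def kaiming_uniform(m: nn.Module) -> None:
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity="leaky_relu")

link_dim, place_dim, n_user_feats = 512, 2048, 5
encoder = nn.Sequential(                 # link-graph states -> place node
    nn.Linear(n_user_feats * link_dim, 4096),
    nn.Linear(4096, place_dim),          # wiring to the 2048-dim state assumed
)
rating_head = nn.Sequential(             # two place-node states -> rating
    nn.Linear(2 * place_dim, 1024), nn.LeakyReLU(),
    nn.Linear(1024, 512), nn.LeakyReLU(),
    nn.Linear(512, 128), nn.LeakyReLU(),
    nn.Linear(128, 32), nn.LeakyReLU(),
    nn.Linear(32, 1),
)
encoder.apply(kaiming_uniform)
rating_head.apply(kaiming_uniform)
```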

The results shown in Table 1 indicate that our HBGNNs perform competitively with current modeling methods on both datasets. The ℓ-HBGNN and p-HBGNN were trained for 75 epochs on MovieLens 100K and for 30 epochs on MovieLens 1M; the final RMSE is shown in Table 1. The attention variants followed the same respective training procedures.

Additionally, we fine-tuned the pretrained ℓ-HBGNN and p-HBGNN that performed best amongst the five folds of MovieLens 100K on the MovieLens 1M dataset; the pretrained models outperformed the models without pretraining within one epoch. We also fine-tuned the pretrained ℓ-HBGNN and p-HBGNN from MovieLens 1M on MovieLens 100K and found similar results. Both results can be seen in Table 1; each pretrained model was fine-tuned for only 5 epochs on either dataset, as it converged very quickly and to a better local optimum. To do this, we had to reformat and retrain the unique-identification-number embeddings for user and movie, which differ between the two datasets, as well as the zip code in the user profile. Thus, when fine-tuning, HBGNN only needs to learn the embeddings for the identification numbers and the zip code, and to adjust the ports and links minimally. From the experimental results in Table 1, we conclude that the transferability of this architecture is very effective, requiring only the relearning of the identification-number and user zip-code embeddings.
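A sketch of this transfer step: all message-passing and encoder weights are kept, and only the dataset-specific tables are re-created before brief fine-tuning (the attribute names here are hypothetical):

```python
# Re-initialize only the embeddings that differ across MovieLens datasets
# (user/movie IDs and zip codes); everything else transfers as-is.
import torch.nn as nn

def prepare_for_transfer(model: nn.Module, n_users: int,
                         n_movies: int, n_zips: int, dim: int = 512) -> None:
    model.user_id_emb = nn.Embedding(n_users, dim)    # hypothetical names
    model.movie_id_emb = nn.Embedding(n_movies, dim)
    model.zip_emb = nn.Embedding(n_zips, dim)
```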

Figure 8: TSNE visualization of user profiles using Chebyshev distance distinguished by their respective ratings.

The ℓ-HBGNN achieved faster convergence than p-HBGNN; however, once converged, both models achieved similar performance on the validation set, with ℓ-HBGNN performing slightly better. ℓ-HBGNN appears to overfit the training set much more than p-HBGNN on both MovieLens datasets. Both methods achieve competitive performance amongst current methods, and additional regularization should be applied in future work for better generalization and performance on the validation set.

In Fig. 8, we visualize the user profiles created by the user link graph and their respective ratings generated by the place graph, using t-distributed Stochastic Neighbor Embedding (TSNE) on HBGNN with one message-passing round for both place and link graphs. The details of each movie are listed in Table 2.

Movie Title                        Genre                             Ratings: 1 / 2 / 3 / 4 / 5
Star Trek: First Contact (1996)    Action, Adventure, Sci-Fi         5 / 25 / 87 / 123 / 53
Homeward Bound II… (1996)          Adventure, Children's             6 / 8 / 6 / 8 / 2
Cats Don't Dance (1997)            Animation, Children's, Musical    5 / 6 / 7 / 1 / 5
Love Bug, The (1969)               Children's, Comedy                4 / 8 / 24 / 4 / 1

Table 2: Movies used for TSNE visualization, showing genre and the distribution of provided ratings within the MovieLens 100K dataset.

4 Conclusion

In this paper, we have proposed the Hierarchical BiGraph Neural Network (HBGNN), a hierarchical approach to using graph neural networks as recommendation systems that structures the user-item features in a bigraph framework. We show that HBGNN is competitive with current recommendation systems and is transferable to larger-scale tasks without much retraining. In future work, we hope to explore different methods of understanding the embeddings in the link and place graphs of the HBGNN (i.e., understanding the individual and joint profiles) and to test different design choices (i.e., methods of message passing, connectivity, and different graph neural network architectures).

References