Introduction
With the rapid growth of Internet services and mobile devices, personalized recommender systems play an increasingly important role in modern society. They can reduce information overload and help satisfy diverse service demands. Such systems bring significant benefits to at least two parties. They can: (i) help users easily discover products from millions of candidates, and (ii) create opportunities for product providers to increase revenue.
To provide a more accurate and interpretable recommendation service, knowledge graphs (KGs) are being incorporated into recommender systems. A KG is a heterogeneous graph whose nodes are entities and whose edges represent relations between entities. This is an effective data structure for modeling relational data, e.g., two movies directed by the same director. Several recent works have integrated KGs into the recommendation model, and the approaches can be divided into two branches: path-based DBLP:conf/kdd/HuSZY18; DBLP:conf/cikm/WangZWZLXG18 and regularization-based DBLP:conf/kdd/ZhangYLXM16; DBLP:conf/kdd/Wang00LC19. Path-based methods extract paths from the KG that carry high-order connectivity information and feed these paths into the predictive model. To handle the large number of paths between two nodes, researchers have either applied path selection algorithms to select prominent paths or defined meta-path patterns to constrain the paths. By contrast, regularization-based methods devise additional loss terms that capture the KG structure and use these to regularize the learning of the recommender model.
Although many effective models have been proposed, we argue that there are still several avenues for enhancing performance. First, previous works learn the KG representations in the Euclidean space. As has been observed in other application domains, this may not effectively capture the hierarchical structure that is known to exist within KGs DBLP:conf/icml/SalaSGR18. Second, methods like CKE DBLP:conf/kdd/ZhangYLXM16, CFKG DBLP:journals/algorithms/AiACZ18, and RippleNet DBLP:conf/cikm/WangZWZLXG18 do not distinguish between neighboring entities according to their relative importance and informativeness when learning the representation of each entity. This may lead to undesirable blurring of information from relations in the KG and an incomplete understanding of an entity. Third, all the regularization-based methods adopt a fixed regularization hyperparameter. We argue that the regularization degree should be adaptive, taking on different values for different entities according to the relevance and value of the information from the knowledge graph. Furthermore, different training phases may need different magnitudes of regularization, so the hyperparameter values should evolve during training.
To tackle the aforementioned problems, we propose a knowledge-enhanced recommendation model in the hyperbolic space, namely HyperKnow, for the top-K recommendation task. In particular, we map the entity and relation embeddings of the KG, as well as the user and item embeddings, to the Poincaré ball model. This allows us to capture the hierarchical structure in the KG. We incorporate an attention model in the hyperbolic space, using the Einstein midpoint for aggregation, to form a representation of the neighborhood of each item in the knowledge graph. We then use a regularization term to encourage the representation of an item to remain close to the representation of its neighborhood (in the hyperbolic space). This transfers the relational and structural information from the knowledge graph to the recommendation model. To adaptively control the regularization effect, we model the learning of adaptive and fine-grained regularization factors as a bilevel (inner and outer) optimization problem DBLP:journals/tec/SinhaMD18. We build a proxy function to explicitly link the learning of the regularization-related parameters with the outer objective function. We extensively evaluate our model on three real-world datasets, comparing it with many state-of-the-art methods using a variety of performance metrics. The experimental results not only demonstrate the improvements of our model over the baselines but also show the effectiveness of the proposed components. To summarize, the major contributions of this paper are:


To model the hierarchical structure of the KG, we map the entity and relation embeddings of the KG into the Poincaré ball along with the user and item embeddings. To the best of our knowledge, ours is the first work to consider knowledge-enhanced recommendation in the hyperbolic space.

To transfer the knowledge from the KG to the recommendation model, we incorporate hyperbolic attention and use the Einstein midpoint to aggregate the neighboring entities of an item to form a neighborhood representation.

To learn the adaptive regularization factors, we cast the learning process as a bilevel optimization problem and build a proxy function to explicitly update the regularization-related parameters.

Experiments on three real-world datasets show that HyperKnow significantly outperforms state-of-the-art methods for the top-K recommendation task.
Related Work
General Recommendation
Early recommendation studies largely focused on explicit feedback DBLP:conf/www/SarwarKKR01; DBLP:conf/kdd/Koren08. The recent research focus is shifting towards implicit data DBLP:conf/cikm/TranLL018. Collaborative filtering (CF) with implicit feedback is usually treated as a top-K item recommendation task, where the goal is to recommend to each user a list of items that the user may be interested in. This setting is more practical and challenging DBLP:conf/icdm/PanZCLLSY08, and accords more closely with the real-world recommendation scenario. Early works mostly rely on matrix factorization techniques DBLP:conf/icdm/HuKV08; DBLP:conf/uai/RendleFGS09 to learn latent features of users and items. Due to their ability to learn salient representations, (deep) neural network-based methods DBLP:conf/www/HeLZNHC17; DBLP:conf/icdm/SunZMCGTH19; DBLP:conf/kdd/MaKL19 have also been adopted, and autoencoder-based methods DBLP:conf/wsdm/WuDZE16; DBLP:conf/cikm/MaZWL18; DBLP:conf/wsdm/MaKWWL19 have been proposed for top-K recommendation. In DBLP:conf/kdd/LianZZCXS18; DBLP:conf/ijcai/XueDZHC17, deep learning techniques are used to boost traditional matrix factorization and factorization machine methods. Recently, some methods have also been developed in the hyperbolic space. HyperML DBLP:conf/wsdm/TranT0CL20 conducts metric learning in the hyperbolic space and outperforms its Euclidean counterparts. DBLP:conf/sigir/FengTCCLL20 propose to tackle the next Point-of-Interest recommendation task in the hyperbolic space.
Knowledge Graph Enhanced Recommendation
Knowledge graphs (KGs) are an important means of representing side information in recommender systems and have proven helpful for improving recommendation performance. For example, DBLP:conf/kdd/ZhangYLXM16 propose to apply the TransR method DBLP:conf/aaai/LinLSLZ15 to learn the KG representation as well as the item embeddings in the KG. DBLP:journals/algorithms/AiACZ18 integrate users and items with the KG and jointly learn the recommendation and KG parts. DBLP:conf/www/WangZZLXG19 propose a multi-task feature learning approach for knowledge graph enhanced recommendation, where the two parts are connected with a cross-and-compress unit to transfer knowledge and share regularization of items. Another line of research performs propagation over the KG to assist recommendation. Specifically, RippleNet DBLP:conf/cikm/WangZWZLXG18 extends a user's interests along KG links to discover her potential interests by introducing preference propagation, which automatically propagates users' potential preferences and explores their hierarchical interests in the KG. KPRN DBLP:conf/aaai/WangWX00C19 constructs the extracted path sequence from both the entity embeddings and the relation embeddings; these paths are encoded with an LSTM layer and the preferences for items in each path are predicted through fully-connected layers. KGCN DBLP:conf/www/WangZXLG19 studies the use of Graph Convolutional Networks (GCNs) to compute embeddings of items via propagation among their neighbors in the KG. Recently, KGAT DBLP:conf/kdd/Wang00LC19 recursively performs propagation over the KG via a graph attention mechanism that refines entity embeddings. Several subsequent works DBLP:conf/sigir/ChenZMLM20; DBLP:conf/www/WangX000C20 focus on optimizing the negative sampling procedure in knowledge-enhanced recommendation.
In this paper, we report results for our proposed method using a vanilla negative sampling strategy, so that we can focus on the performance impact of the novel aspects: learning in the hyperbolic space, using hyperbolic attention with Einstein midpoint aggregation, and introducing adaptive regularization. However, advanced negative sampling strategies can also be incorporated into our proposed method to provide a further performance improvement.
Our proposed model distinguishes itself from previous models by learning knowledge-enhanced recommendation in the Poincaré ball model. In addition, we employ an attention model in the hyperbolic space to assign different degrees of importance to the neighboring entities of an item. We also introduce a bilevel optimization formulation of the learning task to achieve an adaptive mechanism that controls the regularization effect.
Preliminaries
Problem Formulation
The knowledge-based recommendation task considered in this paper takes as inputs the user implicit feedback and the item knowledge graph. The implicit feedback is represented by a set of user-item pairs $\{(u, i) \mid u \in \mathcal{U}, i \in \mathcal{I}\}$, where $\mathcal{U}$ is the user set and $\mathcal{I}$ is the item set. The item knowledge graph can be formulated as a set of triples $(h, r, t)$, each consisting of a relation $r \in \mathcal{R}$ and two entities $h, t \in \mathcal{E}$, referred to as the head and tail of the triple.
Then the top-$K$ recommendation task in this paper is formulated as follows: given the training item set $\mathcal{S}_u$ of user $u$ and the non-empty test item set $\mathcal{T}_u$ of user $u$ (requiring that $\mathcal{T}_u \subseteq \mathcal{I}$ and $\mathcal{T}_u \cap \mathcal{S}_u = \emptyset$), the model must recommend an ordered set of items $\mathcal{R}_u$ such that $|\mathcal{R}_u| = K$ and $\mathcal{R}_u \cap \mathcal{S}_u = \emptyset$. The recommendation quality is then evaluated by a matching score between $\mathcal{T}_u$ and $\mathcal{R}_u$, such as Recall@$K$.
Hyperbolic Geometry of the Poincaré Ball
The Poincaré ball model is one of five isometric models of hyperbolic geometry cannon1997hyperbolic, which is a non-Euclidean geometry with constant negative curvature. Formally, the Poincaré ball of radius $1/\sqrt{c}$ is a $d$-dimensional manifold $\mathbb{B}^d_c = \{\mathbf{x} \in \mathbb{R}^d : c\|\mathbf{x}\|^2 < 1\}$ equipped with the Riemannian metric $g^{\mathbb{B}}_{\mathbf{x}}$, which is conformal to the Euclidean metric $g^{E} = \mathbf{I}_d$ with the conformal factor $\lambda^c_{\mathbf{x}} = \frac{2}{1 - c\|\mathbf{x}\|^2}$, i.e., $g^{\mathbb{B}}_{\mathbf{x}} = (\lambda^c_{\mathbf{x}})^2 g^{E}$. The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{B}^d_c$ is measured along a geodesic (i.e., a shortest path between the points) and is given by:
$$d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) = \frac{2}{\sqrt{c}} \tanh^{-1}\!\big(\sqrt{c}\,\|{-\mathbf{x}} \oplus_c \mathbf{y}\|\big) \quad (1)$$
where $\|\cdot\|$ denotes the Euclidean norm and $\oplus_c$ represents Möbius addition DBLP:conf/nips/GaneaBH18:
$$\mathbf{x} \oplus_c \mathbf{y} = \frac{(1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c\|\mathbf{y}\|^2)\,\mathbf{x} + (1 - c\|\mathbf{x}\|^2)\,\mathbf{y}}{1 + 2c\langle\mathbf{x},\mathbf{y}\rangle + c^2\|\mathbf{x}\|^2\|\mathbf{y}\|^2} \quad (2)$$
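To make the geometry concrete, here is a minimal NumPy sketch of the distance and Möbius addition above; the function names and the default curvature c=1 (matching the experimental setting later in the paper) are our own choices for illustration.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Mobius addition in the Poincare ball of curvature -c (Eq. 2)."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    denom = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / denom

def poincare_dist(x, y, c=1.0):
    """Geodesic distance in the Poincare ball (Eq. 1)."""
    diff = mobius_add(-x, y, c)
    return (2.0 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))
```

Note that Möbius addition is neither commutative nor associative, and the distance grows without bound as points approach the ball boundary; this exponential growth of volume is what lets the ball embed tree-like hierarchies with low distortion.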
Methodology
In this section, we introduce the proposed model, HyperKnow, which integrates the knowledge graph with the recommendation task in the hyperbolic space. We first introduce user preference learning in the hyperbolic space. Then we present the hyperbolic attention mechanism used to distinguish items' neighboring entities in the knowledge graph. We next explain how to adaptively balance the recommendation objective and the knowledge graph regularization via a bilevel optimization formulation. Lastly, we describe the training and prediction procedures of the proposed model.
Learning User Preference
User preference modeling lies at the core of recommender systems. Recently, distance metric learning has been widely applied to measure user preference on items, yielding substantial performance gains DBLP:conf/www/HsiehYCLBE17. In this approach, the distance between a user and an item measures the user's preference for that item. To learn the user preference, we apply the Bayesian Personalized Ranking (BPR) loss DBLP:conf/uai/RendleFGS09 to capture the pairwise preference of a user $u$ for an item $i$ that the user has accessed compared to a randomly sampled item $j$:
$$\mathcal{L}_{rec}(\Theta) = \sum_{(u, i, j)} -\ln \sigma\big(d_{\mathbb{B}}(\mathbf{e}_u, \mathbf{e}_j) - d_{\mathbb{B}}(\mathbf{e}_u, \mathbf{e}_i)\big) \quad (3)$$
where $\mathbf{e}_u, \mathbf{e}_i, \mathbf{e}_j \in \mathbb{B}^d_c$ are the user and item embeddings, $\sigma(\cdot)$ is the sigmoid function, and $d$ is the dimension of the manifold. $\Theta$ represents the parameters of the recommender model.
Regularizing Neighboring Entities
Knowledge graphs (KGs), consisting of (head entity, relationship, tail entity) triples, are efficient data structures for representing factual knowledge and are widely used in applications such as question answering DBLP:conf/aaai/ZhangDKSS18. Recently, KGs have been applied in recommender systems to not only enhance the recommendation performance but also provide interpretable recommendation results.
To effectively exploit KGs in recommender systems, we treat them as relational inductive biases DBLP:journals/corr/abs180601261 between items. During the learning process, the relations in the KG can be used as regularizers: if two items link to one or more common entities in the KG, this suggests that a user might have similar preferences for the two items. However, an item can link to multiple entities in the KG, and the relative importance of different entities can differ greatly. Moreover, the entities can contribute in different ways to the description of the item. This motivates us to propose an attention mechanism in the Poincaré ball model.
Considering an item $i$, we use $\mathcal{N}_i$ to denote the set of neighboring triples $(i, r, e)$ for which $i$ is the head entity. Then we apply a TransE-style DBLP:conf/nips/BordesUGWY13 scoring function to calculate the matching score between item $i$ and a neighboring entity $e$ in $\mathcal{N}_i$, normalized via the softmax over $\mathcal{N}_i$:
$$\alpha_{(i,r,e)} = \frac{\exp\big(-d_{\mathbb{B}}(\mathbf{e}_i \oplus_c \mathbf{e}_r, \mathbf{e}_e)\big)}{\sum_{(i,r',e') \in \mathcal{N}_i} \exp\big(-d_{\mathbb{B}}(\mathbf{e}_i \oplus_c \mathbf{e}_{r'}, \mathbf{e}_{e'})\big)} \quad (4)$$
The usual way to aggregate multiple attention-weighted vectors in the Euclidean space is the weighted midpoint. The corresponding operation in the hyperbolic space is not immediately obvious, but fortunately the extension does exist in the form of the Einstein midpoint, which has a simple form in the Klein disk model cannon1997hyperbolic:
$$\mathbf{m}_i = \sum_{(i,r,e) \in \mathcal{N}_i} \left[\frac{\alpha_{(i,r,e)}\,\gamma(\mathbf{v}_e)}{\sum_{(i,r',e') \in \mathcal{N}_i} \alpha_{(i,r',e')}\,\gamma(\mathbf{v}_{e'})}\right] \mathbf{v}_e \quad (5)$$
where the $\gamma(\mathbf{v}) = \frac{1}{\sqrt{1 - c\|\mathbf{v}\|^2}}$ are the Lorentz factors and $\mathbf{v}_e = f_{\mathbb{B}\to\mathbb{K}}(\mathbf{e}_e)$ transforms the coordinates from the Poincaré ball model to the Klein disk model. The Klein model is supported on the same space as the Poincaré ball, but the same point has different coordinates in each model. Let $\mathbf{x}_{\mathbb{B}}$ and $\mathbf{x}_{\mathbb{K}}$ denote the coordinates of the same point in the Poincaré and Klein models, respectively. Then the following transition formulas hold:
$$\mathbf{x}_{\mathbb{K}} = \frac{2\,\mathbf{x}_{\mathbb{B}}}{1 + c\|\mathbf{x}_{\mathbb{B}}\|^2}, \qquad \mathbf{x}_{\mathbb{B}} = \frac{\mathbf{x}_{\mathbb{K}}}{1 + \sqrt{1 - c\|\mathbf{x}_{\mathbb{K}}\|^2}} \quad (6)$$
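As a concrete illustration of the aggregation step, the following NumPy sketch converts a hypothetical set of neighboring entity embeddings from the Poincaré ball to the Klein model, takes the attention-weighted Einstein midpoint, and maps the result back; the function names and example weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def poincare_to_klein(x, c=1.0):
    """Map Poincare coordinates to Klein coordinates (first transition formula)."""
    return 2.0 * x / (1.0 + c * np.dot(x, x))

def klein_to_poincare(x, c=1.0):
    """Map Klein coordinates back to the Poincare ball (second transition formula)."""
    return x / (1.0 + np.sqrt(1.0 - c * np.dot(x, x)))

def einstein_midpoint(points_klein, weights, c=1.0):
    """Attention-weighted Einstein midpoint, computed in the Klein model."""
    gammas = np.array([1.0 / np.sqrt(1.0 - c * np.dot(p, p)) for p in points_klein])
    coef = weights * gammas          # Lorentz-factor-scaled attention weights
    coef = coef / coef.sum()
    return np.sum(coef[:, None] * points_klein, axis=0)

# Hypothetical neighborhood of one item: three entity embeddings in the
# Poincare ball and their (already softmax-normalized) attention weights.
neighbours = [np.array([0.1, 0.2]), np.array([-0.2, 0.1]), np.array([0.05, -0.3])]
alpha = np.array([0.5, 0.3, 0.2])

klein_pts = np.array([poincare_to_klein(p) for p in neighbours])
mid_klein = einstein_midpoint(klein_pts, alpha)
m_i = klein_to_poincare(mid_klein)   # neighborhood representation, back in the ball
```

The detour through the Klein model is what makes the midpoint a simple weighted sum: geodesics in the Klein model are straight lines, so convex combinations stay inside the disk.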
We call $\mathbf{m}_i$ in (5) the neighborhood representation of item $i$. During the training process we add a regularizing term that encourages the neighborhood representation (mapped back to the Poincaré ball) to be close to the item's representation $\mathbf{e}_i$. The goal is to transfer the inductive bias in the KG to the item representation:
$$\mathcal{L}_{kg} = \sum_{i \in \mathcal{I}} d_{\mathbb{B}}\big(\mathbf{e}_i, f_{\mathbb{K}\to\mathbb{B}}(\mathbf{m}_i)\big) \quad (7)$$
Combining this with the user preference learning objective $\mathcal{L}_{rec}$, the overall knowledge-enhanced objective becomes:
$$\mathcal{L} = \mathcal{L}_{rec} + \eta\,\mathcal{L}_{kg} \quad (8)$$
where $\eta$ balances the effect of the KG.
Adaptive and Fine-grained Regularization
Previous works DBLP:conf/kdd/Wang00LC19; DBLP:conf/cikm/WangZWZLXG18; DBLP:conf/kdd/ZhangYLXM16 that derive information from a KG in the recommender setting use a single, fixed value for $\eta$ in Eq. 8 when training the overall objective. However, employing a single fixed value for $\eta$ has several drawbacks. First, different datasets may require different levels of regularization from the KG; treating $\eta$ as a fixed value requires an extra hyperparameter search for each dataset to realize the full power of the KG. Second, different items may need different degrees of regularization; using the same value for every item limits the performance improvement that can be derived from the KG information. Third, in different training phases, the model may need different magnitudes of regularization.
To address the problems outlined above, we propose an adaptive regularization scheme to apply different strengths of regularization to each item and to adjust the strength throughout training. We formulate Eq. 8 as:
$$\mathcal{L}(\Theta, \boldsymbol{\eta}) = \mathcal{L}_{rec} + \sum_{i \in \mathcal{I}} \sigma(\eta_i)\, d_{\mathbb{B}}\big(\mathbf{e}_i, f_{\mathbb{K}\to\mathbb{B}}(\mathbf{m}_i)\big) \quad (9)$$
where $\eta_i$ is the $i$-th element of $\boldsymbol{\eta}$ and $\sigma(\cdot)$ is the sigmoid function. Unfortunately, directly minimizing this objective function cannot achieve the desired purpose of adaptively controlling the regularization. The reason is that, since $\boldsymbol{\eta}$ explicitly appears in the loss function, constantly decreasing the values of $\eta_i$ is the most straightforward way to minimize the loss. As a consequence, instead of reaching values that are optimal for the model, all the weights $\sigma(\eta_i)$ end up very close to zero, leading to unsatisfactory results. To tackle this problem, we model the learning of the recommendation model and the adaptive regularization of the KG as a bilevel optimization problem DBLP:journals/anor/ColsonMS07:
$$\min_{\boldsymbol{\eta}}\; \mathcal{L}_{outer}\big(\Theta^*(\boldsymbol{\eta})\big) \qquad \text{s.t.} \qquad \Theta^*(\boldsymbol{\eta}) = \arg\min_{\Theta}\; \mathcal{L}(\Theta, \boldsymbol{\eta}) \quad (10)$$
Here $\Theta$ contains the model parameters, i.e., the user, item, entity, and relation embeddings. The inner objective function $\mathcal{L}$ is minimized with respect to $\Theta$ with $\boldsymbol{\eta}$ fixed. Meanwhile, the outer objective function $\mathcal{L}_{outer}$ optimizes $\boldsymbol{\eta}$ through $\Theta^*(\boldsymbol{\eta})$, treating $\Theta^*$ as a function of $\boldsymbol{\eta}$.
As most existing models use gradient-based methods for optimization, a simple approximation strategy with less computation is introduced as follows:
$$\Theta^*(\boldsymbol{\eta}) \approx \Theta - \xi\, \nabla_{\Theta}\, \mathcal{L}(\Theta, \boldsymbol{\eta}) \quad (11)$$
In this expression, $\xi$ is the learning rate for one step of inner optimization. Related approximations have been validated in DBLP:conf/wsdm/Rendle12; DBLP:conf/iclr/LiuSY19; DBLP:conf/kdd/MaMZTLC20. Thus, we can define a proxy function to link $\boldsymbol{\eta}$ with the outer optimization:
$$\ell(\boldsymbol{\eta}) = \mathcal{L}_{outer}\big(\Theta - \xi\, \nabla_{\Theta}\, \mathcal{L}(\Theta, \boldsymbol{\eta})\big) \quad (12)$$
For simplicity, we use two optimizers, $\mathrm{Opt}_{\Theta}$ and $\mathrm{Opt}_{\boldsymbol{\eta}}$, to update $\Theta$ and $\boldsymbol{\eta}$, respectively. The iterative procedure is shown in Alg. 1.
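The alternating scheme can be sketched on a toy one-dimensional problem; the quadratic stand-in losses, the numerical gradients, and the step sizes below are illustrative assumptions of ours, not the paper's actual objective or Alg. 1 verbatim.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l_rec(theta):   # stand-in recommendation loss (hypothetical)
    return (theta - 2.0) ** 2

def l_kg(theta):    # stand-in KG regularization term (hypothetical)
    return (theta - 1.0) ** 2

def grad(f, x, h=1e-5):   # central-difference gradient, good enough for a sketch
    return (f(x + h) - f(x - h)) / (2.0 * h)

theta, eta = 0.0, 0.0     # model parameter and its regularization logit
xi, rho = 0.1, 0.5        # inner and outer learning rates

for _ in range(200):
    # Outer step: update eta through the one-step proxy
    # l(eta) = L_rec(theta - xi * grad_theta L(theta, eta)).
    proxy = lambda e: l_rec(
        theta - xi * grad(lambda t: l_rec(t) + sigmoid(e) * l_kg(t), theta))
    eta -= rho * grad(proxy, eta)
    # Inner step: update theta with eta held fixed.
    inner = lambda t: l_rec(t) + sigmoid(eta) * l_kg(t)
    theta -= xi * grad(inner, theta)
```

The key design point is that `eta` never appears directly in the loss being minimized over itself: it is updated only through its effect on the post-update model parameter, which is what prevents the regularization weights from trivially collapsing to zero.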
Training and Prediction
After incorporating a parameter regularization term to avoid overfitting, the overall loss function is:
$$\mathcal{L}_{overall} = \mathcal{L}(\Theta, \boldsymbol{\eta}) + \mu \|\Theta\|_2^2 \quad (13)$$
where $\mu$ is a hyperparameter. When minimizing the objective function, the partial derivatives with respect to all the parameters can be computed by gradient descent with backpropagation. We apply the Adam DBLP:journals/corr/KingmaB14 algorithm to automatically adapt the learning rate during the learning procedure.
Recommendation Phase. For user $u$, we compute the distance between the user and each item in the dataset. Then the $K$ items that are not in the training set and have the shortest distances to the user are recommended to user $u$.
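The recommendation phase amounts to a nearest-neighbor search under the hyperbolic distance. A small sketch follows, using the closed-form Poincaré distance for curvature c = 1; the function names are our own.

```python
import numpy as np

def poincare_dist(x, y):
    """Closed-form Poincare distance for c = 1 (batched over rows of y)."""
    num = 2.0 * np.sum((x - y) ** 2, axis=-1)
    den = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + num / den)

def recommend(user_emb, item_embs, train_items, k):
    """Return indices of the k items nearest to the user, skipping training items."""
    d = poincare_dist(user_emb[None, :], item_embs)
    d[list(train_items)] = np.inf   # never re-recommend items seen in training
    return list(np.argsort(d)[:k])

# Hypothetical embeddings: one user at the origin, four items on a ray.
user = np.array([0.0, 0.0])
items = np.array([[0.1, 0.0], [0.5, 0.0], [0.2, 0.0], [0.05, 0.0]])
top2 = recommend(user, items, train_items={0}, k=2)
```

From the origin the hyperbolic distance is monotone in the Euclidean radius, so the two nearest unseen items here are indices 3 and 2, in that order.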
Evaluation
In this section, we first describe the experimental setup. We then report the results of the conducted experiments and demonstrate the effectiveness of the proposed modules.
Datasets
The proposed model is evaluated on three real-world datasets from various domains with different sparsities: Amazonbook, LastFM, and Yelp2018, all adopted from DBLP:conf/kdd/Wang00LC19. The Amazonbook dataset comes from the book category of the Amazon review dataset DBLP:conf/www/HeM16, which covers a large amount of user-item interaction data, e.g., user ratings and reviews. The LastFM dataset is collected from the Last.fm music website, where the tracks are viewed as the items; a subset of data from Jan. 2015 to Jun. 2015 is selected. The Yelp2018 dataset is adopted from the 2018 edition of the Yelp challenge, where local businesses such as restaurants and bars are viewed as the items.
All the above datasets follow the 10-core setting to ensure that each user and item has at least ten interactions. For Amazonbook and LastFM, items are mapped to Freebase entities via title matching where a mapping is available. For Yelp2018, the item knowledge from the local business information network (e.g., category, location, and attribute) is extracted as the KG data. The data statistics after preprocessing are shown in Table 1.
For a fair comparison, the three datasets in our experiments are exactly the same as those used in DBLP:conf/kdd/Wang00LC19. For each dataset, 80% of the interaction data of each user is randomly selected to constitute the training set, and we treat the remaining 20% as the test set. From the training set, 10% of the interactions are randomly selected as a validation set to tune hyperparameters. The experiments are executed five times and the average result is reported.
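The split described above can be sketched as follows; this is our own minimal reading of the protocol (per-user 80/20 train/test, then 10% of each user's training interactions held out for validation), not the authors' released preprocessing code.

```python
import random
from collections import defaultdict

def split_interactions(interactions, seed=0):
    """Per-user split: 80% train / 20% test, with 10% of train as validation.

    `interactions` is an iterable of (user, item) pairs.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for u, i in interactions:
        by_user[u].append(i)

    train, valid, test = [], [], []
    for u, items in by_user.items():
        items = items[:]
        rng.shuffle(items)
        cut = int(0.8 * len(items))           # 80% of this user's interactions
        test += [(u, i) for i in items[cut:]]
        tr = items[:cut]
        vcut = int(0.1 * len(tr))             # 10% of the training portion
        valid += [(u, i) for i in tr[:vcut]]
        train += [(u, i) for i in tr[vcut:]]
    return train, valid, test
```

Splitting per user (rather than globally) guarantees every user appears in the training set, which the evaluation protocol implicitly requires.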
Amazonbook  LastFM  Yelp2018  
#Users  70,679  23,566  45,919 
#Items  24,915  48,123  45,538 
#Interactions  847,733  3,034,796  1,185,068 
#Entities  88,572  58,266  90,961 
#Relations  39  9  42 
#Triplets  2,557,746  464,567  1,853,704 
FM  NFM  CKE  CFKG  RippleNet  GCMC  KGAT  HyperKnow  Improv.  
Recall@20  
Amazonbook  0.1345  0.1366  0.1343  0.1142  0.1336  0.1316  0.1489  0.1534*  3.23% 
LastFM  0.0778  0.0829  0.0736  0.0723  0.0791  0.0818  0.0870  0.0949*  9.08% 
Yelp2018  0.0627  0.0660  0.0657  0.0522  0.0664  0.0659  0.0712  0.0683  N/A 
NDCG@20  
Amazonbook  0.0886  0.0913  0.0885  0.0770  0.0910  0.0874  0.1006  0.1075*  6.86% 
LastFM  0.1181  0.1214  0.1184  0.1143  0.1238  0.1253  0.1325  0.1533*  16.70% 
Yelp2018  0.0768  0.0810  0.0805  0.0644  0.0822  0.0790  0.0867  0.0897*  3.46% 
The symbol * denotes a statistically significant improvement compared to the best baseline method, based on the paired t-test.
Evaluation Metrics
We evaluate all methods in terms of Recall@K and NDCG@K. For each user, Recall@K (R@K) indicates what percentage of her rated items appear in the top-$K$ recommended items. NDCG@K (N@K) is the normalized discounted cumulative gain at $K$, which takes the positions of correctly recommended items into account.
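For reference, both metrics can be computed per user in a few lines; this sketch uses binary relevance, which matches the implicit-feedback setting, and the function names are our own.

```python
import numpy as np

def recall_at_k(ranked, relevant, k):
    """Fraction of a user's held-out items that appear in the top-k list."""
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Position-aware gain: hits near the top of the list count more."""
    relevant = set(relevant)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(rank + 2)
                for rank in range(min(len(relevant), k)))
    return dcg / ideal
```

Per-user scores are then averaged over all test users to produce the table entries.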
Methods Studied
To demonstrate the effectiveness of our model, we compare it with the following recommendation methods:

FM DBLP:conf/icdm/Rendle10, a classical factorization model, which incorporates second-order interactions between input features.

NFM DBLP:conf/sigir/0001C17, a state-of-the-art factorization model, which subsumes FM under a neural network.

CKE DBLP:conf/kdd/ZhangYLXM16, a representative regularization-based method, which exploits semantic embeddings derived from TransR DBLP:conf/aaai/LinLSLZ15 to enhance matrix factorization.

CFKG DBLP:journals/algorithms/AiACZ18, a model that applies TransE DBLP:conf/nips/BordesUGWY13 on the unified graph including users, items, entities, and relations, casting the recommendation task as the prediction of (u, Interact, i) triplets.

RippleNet DBLP:conf/cikm/WangZWZLXG18, a model that combines regularization and path-based methods, enriching user representations by adding those of items within paths rooted at each user.

GCMC DBLP:journals/corr/BergKW17, a model designed to employ a graph convolutional network on graph-structured data. Here the model is applied on the user-item knowledge graph.

KGAT DBLP:conf/kdd/Wang00LC19, a state-of-the-art KG-enhanced model, which employs a graph neural network and an attention mechanism to learn from high-order graph-structured data for recommendation.

HyperKnow, the proposed model, which learns knowledge-enhanced recommendation in the Poincaré ball, applying hyperbolic attention to distinguish neighboring entities and bilevel optimization for adaptive regularization.
Experiment Settings
In the experiments, the latent dimension of all models is set to 64. The parameters of all baseline methods are initialized as in the corresponding papers, and are then carefully tuned to achieve optimal performance. The learning rate, the coefficient of L2 normalization, the dropout ratio (for NFM, GCMC, and KGAT), and the dimension of the attention network are tuned on the validation set. The number of MLP layers and neurons of NFM, the number of hops and the memory size of RippleNet, and the depth and hidden dimensions of KGAT are set according to the original papers. The network architectures of the above methods are configured to be the same as described in the original papers. For HyperKnow, the curvature $c$ is set to 1, and the batch size is tuned on the validation set. Our experiments are conducted with PyTorch running on GPU machines (NVIDIA Tesla V100).
Performance Comparison
The performance comparison results are shown in Table 2.
Observations about our model. First, the proposed model, HyperKnow, achieves the best performance for most evaluation metrics on the three datasets, which illustrates the superiority of our model. Second, HyperKnow outperforms KGAT on the Amazonbook and LastFM datasets. Although KGAT adopts an attention model to distinguish entity importance in the knowledge graph, it may not effectively capture the hierarchical structure between entities, which can be well modeled by learning the entity and relation embeddings in the hyperbolic space. One possible reason why HyperKnow does not outperform KGAT on the Recall@20 metric for the Yelp2018 dataset is that most of the entities in this KG are linked according to whether they share the same attributes, such as HasTV. Most of these attributes are very generic, which means that the KG provides information of limited value. As a result, much of the transfer that HyperKnow performs from the KG to the recommendation part for the Yelp2018 dataset is likely to be noise. Third, HyperKnow achieves better performance than GCMC and RippleNet. Although GCMC and RippleNet can model high-order connectivities, they fail to identify the important entities that would make a difference in recommendation. By contrast, HyperKnow employs an attention model in the hyperbolic space to learn the neighborhood representation of an item and transfers the knowledge from the KG to the item representation via regularization. Fourth, HyperKnow obtains better results than CKE. One possible reason is that CKE adopts a fixed regularization strength during the whole training process, whereas HyperKnow performs fine-grained regularization between each item and its neighborhood. Fifth, HyperKnow outperforms FM and NFM. One reason may be that using a distance as the scoring function can capture more fine-grained user preferences.
Other observations. First, KGAT outperforms GCMC and RippleNet. KGAT is capable of exploring the high-order connectivity in an explicit way and applies a graph attention model to aggregate the neighbors in the user-item knowledge graph in a weighted manner. Second, FM and NFM achieve better performance than CFKG and CKE in most cases. One major reason is that FM and NFM capture the second-order connectivity between users and entities, whereas CFKG and CKE model connectivity at the granularity of triples, leaving high-order connectivity untouched. Third, RippleNet achieves better performance than FM. This may verify that incorporating two-hop neighboring items is important for enriching user representations. Fourth, NFM performs better than FM. One major reason is that NFM has stronger expressiveness, since the hidden layer allows NFM to capture nonlinear and complex feature interactions between user, item, and entity embeddings.
Architecture  Amazonbook  LastFM  
R@20  N@20  R@20  N@20  
(1) BPR+E  0.1017  0.0729  0.0604  0.1112 
(2) BPR+H  0.1167  0.0833  0.0656  0.1191 
(3) BPR+Att+E  0.1121  0.0812  0.0746  0.1319 
(4) BPR+Att+H  0.1447  0.1025  0.0885  0.1453 
(5) BPR+Avg+H  0.1250  0.0897  0.0775  0.1358 
(6) HyperKnow  0.1534  0.1075  0.0949  0.1533 
Ablation Analysis
To verify the effectiveness of learning in the Poincaré ball, the hyperbolic attention model, and the adaptive regularization mechanism, we conduct an ablation study, reported in Table 3, that demonstrates the contribution of each module to the HyperKnow model. In (1), we use the Euclidean distance to measure the user preference, optimized by the BPR loss. In (2), we apply the distance in the Poincaré ball to measure users' preferences and optimize using Eq. 3. In (3), we integrate the TransE-style attention on top of (1) in the Euclidean space. In (4), we add hyperbolic attention to (2). In (5), we replace the attention model in (4) with an average operation in the hyperbolic space. In (6), we present the overall HyperKnow model to show the effectiveness of the adaptive regularization mechanism.
From the results shown in Table 3, we make the following observations. First, comparing (1) and (2), we observe that measuring user preference by distance in the hyperbolic space achieves better performance than in the Euclidean space. This confirms the results reported in DBLP:conf/wsdm/TranT0CL20. Second, from (2) and (4), we observe that incorporating the hyperbolic attention model significantly improves performance. Third, comparing (3) and (4), the attention model achieves better results in the hyperbolic space than in the Euclidean space. Fourth, from (1), (2), (3), and (4), we observe that equipping the recommendation model with the KG, in either the Euclidean or the hyperbolic space, improves recommendation performance. Fifth, from (4) and (5), we observe that distinguishing the importance of each neighbor of an item through attention yields a considerable improvement over a simple average. Finally, comparing (4) and (6), we observe that the adaptive regularization provides fine-grained regularization power that further improves performance.
CKE  CFKG  KGAT  HyperKnow  
Amazonbook  55s  22s  457s  15s 
LastFM  53s  27s  137s  22s 
Yelp2018  63s  37s  352s  20s 
Training Efficiency
In this section, we compare the training efficiency of HyperKnow with other state-of-the-art KG-enhanced methods in terms of training speed, measured as the time taken for one epoch of training. From the results reported in DBLP:conf/sigir/ChenZMLM20, the compared methods take a similar number of epochs to converge as our proposed method. Since RippleNet is not computationally efficient and takes much longer to train, we omit it from the comparison. All the experiments are conducted on a single NVIDIA Tesla V100 GPU. All the compared methods are executed for 20 epochs and we report the average computation time per epoch, shown in Table 4. The comparison shows that HyperKnow is more computationally efficient than the other state-of-the-art methods, for the following reasons. Compared to CKE, HyperKnow has a smaller number of learnable parameters (8.3 million vs. 11.4 million on the LastFM dataset). Compared to KGAT and CFKG, HyperKnow does not incorporate the users into the KG, which keeps the scale of the KG much smaller.
Influence of Hyperparameters
The value of $\eta$, which controls how strongly the item embedding is regularized towards its neighborhood representation, is an important hyperparameter when the adaptive regularization mechanism is not used. Its effect on the Amazonbook and LastFM datasets is shown in Figure 2.
From the results in Figure 2, we observe that the value of $\eta$ does affect the recommendation performance, with performance deteriorating by as much as 10 percent if a suboptimal value is chosen. Furthermore, no fixed value achieves performance as good as that obtained with the proposed adaptive mechanism. These results demonstrate that fine-grained and adaptive regularization benefits the recommendation task, which confirms the results reported in DBLP:conf/wsdm/Rendle12.
Embedding Visualization
To verify whether the embeddings learned in the Poincaré ball can capture the hierarchical structure in the knowledge graph, we train HyperKnow with 2-dimensional embeddings on the LastFM dataset and visualize the entities in the 2D hyperbolic space. We randomly select two nodes and their two-hop neighbors to visualize. The visualization is shown in Figure 3. The largest dot denotes the selected entity, the medium-sized dots denote the first-hop neighbors of the selected entity, and the smallest dots denote the second-hop neighbors.
From Figure 3, we observe that these three kinds of nodes form hierarchical patterns in the Poincaré ball, suggesting that the embeddings learned in the hyperbolic space can represent the hierarchical relationships.
Conclusion
In this paper, we propose a knowledge-enhanced recommendation model in the hyperbolic space (HyperKnow) for top-K recommendation. HyperKnow learns the user and item embeddings as well as the knowledge graph representation in the Poincaré ball model to capture the hierarchical structure in the knowledge graph. In addition, we incorporate hyperbolic attention to select the most important neighboring entities of each item. To adaptively control the regularization effect, a bilevel optimization mechanism is proposed to generate a fine-grained regularization effect between the recommendation model and the knowledge graph. Experimental results on three real-world datasets clearly validate the performance advantages of our model over multiple state-of-the-art methods and demonstrate the effectiveness of each of the proposed constituent modules.