Recommender systems (RS) estimate users’ preferences for items to provide personalized recommendations and a better user experience. The underlying assumption is that users may be interested in items selected by people who share similar interactions with them. By mining interaction records, a model implicitly learns user-user and item-item similarities, and exploits them for recommendations (Salakhutdinov and Mnih, 2008).
One of the major challenges for RS is the long tail of users with sparse interaction data. Making recommendations for new users, or about new items, with little data is difficult (the so-called cold-start problem). To alleviate this issue, a considerable strand of research has explored incorporating side information to augment the interaction data (Sun et al., 2019). A prominent branch of this work uses textual descriptions and reviews, which often exist alongside rating or purchase data (McAuley and Leskovec, 2013; Tan et al., 2016b). However, a recent re-evaluation of such techniques indicates that the benefits are marginal, especially in cold-start scenarios. For example, modern deep learning-based models yield minimal changes in performance when reviews are masked (Sachdeva and McAuley, 2020).
In this paper we introduce a simple and effective form of data augmentation that leverages pre-trained textual semantic similarity models. The models are applied to widely available textual data, like product descriptions and reviews, yielding new relations between items. In this manner, we complement the implicit item similarity learnt from interactions by introducing explicit semantic relations based on textual attributes. We explore how these relations guide models to group semantically similar items and boost the recommendation system’s performance.
The data augmentation technique is evaluated on a variety of models where the user-item graph can naturally be extended with new relations. Many of these models are variants of knowledge graph recommenders, since these provide an expressive and unified framework for modelling side information between users, items, and related entities (Zhang et al., 2016). Analysis of local and global geometric measures of the generated graphs indicates that the augmented graphs are better represented in hyperbolic spaces (see §4). Therefore, we also explore a variety of representational alternatives for recommender systems, including Euclidean (Bordes et al., 2013; Wang et al., 2014; Yang et al., 2015), complex (Sun et al., 2019) and hyperbolic (Chami et al., 2020; Balazevic et al., 2019) geometries.
Finally, we analyze how the proposed relations are more efficient at encoding semantic information present in textual descriptions compared to baselines that extract latent features from raw text. We find that our technique is more effective at reducing the generalization error, and this is particularly notable in cold-start settings. Furthermore, we analyze which type of text is more helpful for this task, noting that the product description can provide very condensed and useful information to draw semantic relations.
In this work we investigate two types of inductive biases: a data-dependent bias, by augmenting the graph relations via pre-trained language models, and a geometric bias, through the choice of a metric space to embed the relations. By means of a thorough assessment, we show how they complement each other. As we enlarge the data, the hyperbolic properties of the graph become more evident. In summary, we make the following contributions:
We propose an unsupervised data augmentation technique by explicitly mining semantic relations derived from items’ text that boosts the performance, and is particularly effective in cold-start settings.
We provide a thorough analysis of local and global structural aspects of the user-item graph, and its augmented version, which indicates that the graph is better represented in hyperbolic space than in Euclidean space.
We explore KG methods developed in Euclidean, hyperbolic and complex spaces, and showcase how they achieve state-of-the-art performance for recommendations, when leveraged with the appropriate relations.
2. Mining Semantic Relations
A popular approach to incorporate side information in recommender systems is to exploit user reviews and item descriptions, which often exist alongside rating or purchase data (McAuley and Leskovec, 2013; Tan et al., 2016b). A textual review is much more expressive than a single rating, and the underlying assumption is that reviews are effective user/item descriptors (Ge et al., 2019; Margaris et al., 2020), thus they can be used to learn better latent features (Catherine and Cohen, 2017; Chen et al., 2018).
In a conventional deep learning architecture, these features are used in matrix factorization (Zheng et al., 2017). In such a setup, the model is burdened with the task of learning an implicit similarity function that should emerge from the textual input and the user’s purchase history. Based on this affinity, similar users and items are grouped and leveraged for recommendations. However, Sachdeva and McAuley (2020) show that recent models yield minimal changes in performance when reviews are masked. They observe that reviews are more effective when used as a regularizer (McAuley and Leskovec, 2013; Hsieh et al., 2017) rather than as side data to extract latent features, and that this behavior is accentuated in cold-start scenarios.
This paper introduces a different approach to benefit from textual attributes: it proposes leveraging advances in textual similarity models (Cer et al., 2018) to feed a recommender explicit content-based similarities via new edges in the interaction graph. This requires no supervision and complements the implicit similarity function that the model learns, without increasing the computational complexity. The data augmentation increases the density of the graph and serves as an efficient regularizer without the need to reduce the effective capacity of the model (Hernández-García and König, 2018).
The goal of the proposed method is to augment the user-item interaction graph with item-item relations. These relations are based on the semantic similarity between the item descriptions and reviews. Initially, we collect all the available text for each item $i$ in the set of items $\mathcal{I}$. This includes metadata, such as item name and descriptions, and reviews. We experiment with various filters as heuristic measures of the helpfulness of different types of text (e.g. top-k longest reviews, or only the metadata).
To compute text embeddings, we employ the Universal Sentence Encoder (USE) (Cer et al., 2018), as it has shown good performance on sentence similarity benchmarks and can be applied without any further fine-tuning. Moreover, the average review in purchase datasets tends to be one paragraph long (Ai et al., 2018; McAuley and Leskovec, 2013), and USE has also been pre-trained to encode paragraphs composed of more than one sentence. We compute one embedding for each review (or descriptor) corresponding to item $i$. The final embedding of item $i$ is the mean of all its review embeddings. (An ablation of the heuristics and the encoder can be found in §6.4.)
Once we have assigned an embedding to all items, we employ cosine similarity to compute the similarity between them. Finally, we extend the original user-item training set with the semantic relations between pairs of items. We filter out low-similarity pairs, and select the top-k highest similarities to be added as relations.
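The mining step can be sketched as follows, assuming the per-text embeddings (e.g. produced by USE) have already been computed and are passed in as arrays. The function names, the similarity threshold `min_sim`, and the choice of applying the top-k selection per item (rather than globally over all pairs) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def item_embedding(text_embeddings):
    """Mean-pool the embeddings of all texts (reviews/descriptors) of one item."""
    return np.mean(np.asarray(text_embeddings, dtype=float), axis=0)

def mine_semantic_relations(item_vecs, top_k, min_sim):
    """Return (i, "semantic", j) triples linking each item to its most similar items.

    item_vecs: array of shape (num_items, dim); row i is the embedding of item i.
    top_k:     maximum number of neighbours kept per item (hypothetical parameter).
    min_sim:   cosine-similarity threshold below which pairs are dropped (hypothetical).
    """
    X = np.asarray(item_vecs, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise every row
    sims = X @ X.T                                    # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)                   # never pair an item with itself
    triples = []
    for i in range(sims.shape[0]):
        # candidate neighbours of item i, most similar first
        for j in np.argsort(-sims[i])[:top_k]:
            if sims[i, j] >= min_sim:                 # filter out low-similarity pairs
                triples.append((i, "semantic", int(j)))
    return triples
```

The resulting triples can then be appended to the user-item training triples as a new relation type.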
3. Representing the Graph
Given that we extend the user-item graph by adding semantic relations, we need to account for the different types of edges present in the augmented graph. We model this multi-relational graph as a knowledge graph, since they offer a flexible approach to add multiple relations between diverse entities. In this section we describe knowledge graphs, and their application in recommender systems.
3.1. Knowledge Graphs
Knowledge graphs (KGs) are multi-relational graphs where nodes represent entities and typed edges represent relationships among entities. They are popular data structures for representing heterogeneous knowledge in the form of (head, relation, tail) triples, which can be queried and used in downstream applications. The usual approach to working with KGs is to learn representations of entities and relations as vectors, for some choice of space (typically $\mathbb{R}^d$), such that the KG structure is preserved. More formally, let $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$ be a knowledge graph, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relations and $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ is the set of triples stored in the graph. Most KG embedding methods learn vectors $\mathbf{e} \in \mathbb{R}^d$ for $e \in \mathcal{E}$, and $\mathbf{r} \in \mathbb{R}^d$ for $r \in \mathcal{R}$. The likelihood of a triple being correct is evaluated using a model-specific score function $\phi: \mathcal{E} \times \mathcal{R} \times \mathcal{E} \rightarrow \mathbb{R}$.
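To make the triple formulation concrete, the sketch below stores entities and relations as rows of embedding tables and evaluates two classic score functions on a triple: TransE (Bordes et al., 2013), which treats a relation as a translation, and DistMult (Yang et al., 2015), a trilinear product. Both are among the methods compared later. The toy entity names, the dimension and the random initialization are hypothetical; TransE is shown with the L2 norm, one of the options in its original formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: names mapped to row indices of the embedding tables.
entities = {"user_1": 0, "item_a": 1, "item_b": 2}
relations = {"buy": 0, "semantic": 1}
dim = 8

E = rng.normal(size=(len(entities), dim))   # one vector per entity
R = rng.normal(size=(len(relations), dim))  # one vector per relation

def transe_score(h, r, t):
    """TransE: a plausible triple satisfies h + r ≈ t; a higher (less negative) score is better."""
    return -np.linalg.norm(E[h] + R[r] - E[t])

def distmult_score(h, r, t):
    """DistMult: trilinear product sum_k h_k * r_k * t_k."""
    return float(np.sum(E[h] * R[r] * E[t]))

triple = (entities["user_1"], relations["buy"], entities["item_a"])
print(transe_score(*triple), distmult_score(*triple))
```

Because the DistMult score is symmetric in head and tail, it cannot tell a relation apart from its inverse, which is one motivation for the more expressive operators discussed below.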
3.2. KG for Recommender Systems
Knowledge graph embedding methods have been widely adopted for the recommendation problem as an effective tool to model side information (Guo et al., 2020). (In this work we focus on embedding methods, according to the classification presented in (Guo et al., 2020), and not on path-based methods.) Multiple relations between users, items, and heterogeneous entities can be mapped into the KG and incorporated to alleviate data sparsity and enhance recommendation performance (Zhang et al., 2016).
Knowledge graph-based recommender systems can be seen as multi-task models, with well-established advantages. Learning several tasks (relations) at a time reduces the risk of over-fitting by generalizing the shared entity representations (Zhang et al., 2019b). Data augmentation also improves the generalization of the model and acts as an effective regularizer (Hernández-García and König, 2018). Furthermore, related auxiliary relations can help the model learn different types of entity interactions, such as similarities, that are ultimately exploited for recommendations (Ruder, 2017).
Multi-relational knowledge graphs exhibit an intricate and varying structure as a result of the logical properties of the relationships they encode (Miller, 1992; Suchanek et al., 2007; Lehmann et al., 2015). An item can be connected to different entities by symmetric, anti-symmetric, or hierarchical relations. To capture these non-trivial patterns, more expressive operators are necessary.
In Table 1, we show KG embedding methods, along with their operators, and in which RS work they have been applied. It can be seen that previous work integrating KG into RS has applied a rather narrow set of methods. These are the translational approaches TransH (Wang et al., 2014), TransR (Lin et al., 2015), or TransD (Ji et al., 2015), which are extensions of TransE (Bordes et al., 2013). We notice that the state of the art in the field of KG embedding methods has advanced in recent years, as more compound and expressive operators have been developed. However, recommender systems have continued to apply outdated models and have not profited from the current progress.
Recent approaches propose to embed the graph into non-Euclidean geometries such as hyperbolic spaces (Balazevic et al., 2019; Kolyvakis et al., 2020; Chami et al., 2020), to model embeddings over the complex numbers (Trouillon et al., 2016; Lacroix et al., 2018; Sun et al., 2019), or to apply quaternion algebra (Zhang et al., 2019a). In the next section, we describe methods that combine different operators and achieve SotA performance on KG completion tasks. (For a brief review of hyperbolic geometry, see §4.2.)
3.3. Methods Compared
RotatE (Sun et al., 2019): Maps entities and relations to the complex vector space $\mathbb{C}^d$ and defines each relation as a rotation in the complex plane from the source entity to the target entity. Given a triple (h, r, t), it is expected that $\mathbf{t} = \mathbf{h} \circ \mathbf{r}$, where $|r_i| = 1$ and $\circ$ denotes the Hadamard (element-wise) product. Rotations are chosen since they can simultaneously model and infer inversion, composition, symmetric or anti-symmetric patterns.
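A minimal NumPy sketch of the RotatE score, assuming each relation is parametrized by one phase per dimension so that $|r_i| = 1$ holds by construction. The function name is illustrative, and the L2 norm is used for simplicity; the exact norm is an implementation choice.

```python
import numpy as np

def rotate_score(h, theta, t):
    """RotatE-style score: -|| h ∘ r - t || with r = exp(i * theta).

    h, t:  complex entity embeddings of shape (d,)
    theta: real relation phases of shape (d,); coordinate k of h is rotated by theta[k]
    """
    r = np.exp(1j * theta)              # unit modulus: a pure rotation in each complex plane
    return -np.linalg.norm(h * r - t)   # Hadamard product, then distance to the tail
```

A relation whose phases are all π satisfies r ∘ r = 1, so it is symmetric: if t = h ∘ r then also h = t ∘ r, which illustrates how rotations encode symmetric patterns.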
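MuRP (Balazevic et al., 2019): By establishing a comparison with word analogies through hyperbolic distances (Tifrea et al., 2019), the authors propose a scoring function based on relation-specific Möbius matrix-vector multiplication on the head entity, and Möbius addition (Ganea et al., 2018) on the tail entity:

$$\phi(h, r, t) = -d_{\mathbb{B}}\big(\exp_0(\mathbf{R}\,\log_0(\mathbf{h})),\ \mathbf{t} \oplus \mathbf{r}\big)^2 + b_h + b_t$$

where $\mathbf{R} \in \mathbb{R}^{d \times d}$ is a diagonal relation matrix and $\mathbf{r} \in \mathbb{B}^d$ a relation translation vector; $\exp_0$ and $\log_0$ are the exponential and logarithmic maps at the origin; $b_h, b_t \in \mathbb{R}$ are scalar biases for the head and tail entities respectively, and $d_{\mathbb{B}}$ is the hyperbolic distance (Eq 2).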
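The MuRP scoring function can be written in a few lines of NumPy for the ball of curvature -1. This is a minimal sketch, not the reference implementation: the function names are ours, the diagonal relation matrix is stored as a vector, and the distance uses the Möbius-addition form 2·artanh(‖−x ⊕ y‖), which is equivalent to the arcosh closed form for curvature -1.

```python
import numpy as np

EPS = 1e-15  # guards the division in the maps for vectors of (near) zero norm

def mobius_add(x, y):
    """Möbius addition x ⊕ y in the Poincaré ball of curvature -1."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

def exp0(v):
    """Exponential map at the origin: tangent vector -> point inside the ball."""
    n = max(np.linalg.norm(v), EPS)
    return np.tanh(n) * v / n

def log0(p):
    """Logarithmic map at the origin: point inside the ball -> tangent vector."""
    n = max(np.linalg.norm(p), EPS)
    return np.arctanh(n) * p / n

def poincare_dist(x, y):
    """Hyperbolic distance 2 * artanh(|| -x ⊕ y ||) for curvature -1."""
    return 2 * np.arctanh(np.linalg.norm(mobius_add(-x, y)))

def murp_score(h, t, rel_diag, rel_vec, b_h, b_t):
    """-d(exp0(R log0(h)), t ⊕ r)^2 + b_h + b_t, with R = diag(rel_diag)."""
    head = exp0(rel_diag * log0(h))  # relation-specific Möbius matrix-vector multiplication
    tail = mobius_add(t, rel_vec)    # relation-specific Möbius translation of the tail
    return -poincare_dist(head, tail) ** 2 + b_h + b_t
```

With an identity relation (all-ones diagonal, zero translation) and identical head and tail, the distance vanishes and the score reduces to the sum of the two biases.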
RotRef (Chami et al., 2020): Extends MuRP with rotations and reflections in hyperbolic space by learning relation-specific isometries through Givens transformations (https://en.wikipedia.org/wiki/Givens_rotation). The results of these operations are combined with an attention mechanism in the tangent space (Chami et al., 2019).
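The Givens transformations act on consecutive coordinate pairs, so a d-dimensional rotation or reflection needs only d/2 angles. The sketch below shows only these two relation-specific isometries, assuming an even dimension; in the original formulation they are further combined through attention in the tangent space and followed by a hyperbolic translation, which is omitted here. The function names are illustrative. Because both maps preserve norms, they map the Poincaré ball onto itself.

```python
import numpy as np

def givens_rotation(x, theta):
    """Rotate each coordinate pair (x[2k], x[2k+1]) by angle theta[k]."""
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    c, s = np.cos(theta), np.sin(theta)
    out = np.stack([c * x[:, 0] - s * x[:, 1],    # block [[cos, -sin],
                    s * x[:, 0] + c * x[:, 1]],   #        [sin,  cos]]
                   axis=1)
    return out.reshape(-1)

def givens_reflection(x, theta):
    """Reflect each coordinate pair with the block [[cos, sin], [sin, -cos]]."""
    x = np.asarray(x, dtype=float).reshape(-1, 2)
    c, s = np.cos(theta), np.sin(theta)
    out = np.stack([c * x[:, 0] + s * x[:, 1],
                    s * x[:, 0] - c * x[:, 1]],
                   axis=1)
    return out.reshape(-1)
```

A reflection applied twice with the same angles returns the original vector, while a rotation is undone by the opposite angles.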
3.4. Augmenting the graph
We augment the user-item graph by exploiting heterogeneous relations between items and other entities. Our starting point is the user-item graph, composed solely of $(u, \textit{buy}, i)$ triples. We augment this bipartite graph with the aforementioned semantic relations. These are $(i, \textit{semantic}, j)$ triples, meaning that there is a semantic overlap between the descriptors corresponding to these items. Previous work has explored relations between items and diverse entities such as product brands and categories (Zhang et al., 2018; Ai et al., 2018; Xian et al., 2019), or movie directors and actors (Zhang et al., 2016; Cao et al., 2019; Xin et al., 2019), depending on the dataset. To the best of our knowledge, this paper is the first to explore using pre-trained semantic similarity models to create new relations.
Augmenting the graph modifies its size and structure. A change in its connectivity affects which embedding space is optimal (Gu et al., 2019). We therefore analyze this aspect in the next section.
4. Interaction Graph Analysis
The predominant approach to dealing with graphs has been to embed them in a Euclidean space. Nonetheless, graphs from many diverse domains exhibit non-Euclidean features (Bronstein et al., 2017). In particular, several RS datasets exhibit power-law degree distributions (Chamberlain et al., 2019) and properties of scale-free networks (Cano et al., 2006; Kitsak et al., 2017), which imply a latent hyperbolic geometry (Krioukov et al., 2010). If we match the geometry of the target embedding space to the structure of the data, we can improve the representation fidelity (Gu et al., 2019; López et al., 2021). With the goal of understanding which type of Riemannian manifold would be more suitable as an embedding space, we analyze different structural aspects and geometric properties of the graph.
Our analysis shows that when we augment the relationships in the graph, the added edges modify its connectivity and structure, making it more hyperbolic-like.
To investigate the recommendation problem with regard to different relationships and geometries, we focus on the Amazon dataset (https://nijianmo.github.io/amazon/index.html) (McAuley and Leskovec, 2013; Ni et al., 2019), as it is a standard benchmark for RS, and it provides item reviews and metadata in the form of textual descriptions. Nonetheless, our analysis generalizes to RS datasets that exhibit a latent hyperbolic geometry, such as those studied in (Cano et al., 2006; Kitsak et al., 2017; Chamberlain et al., 2019).
Specifically, we adopt the 5-core split for the branches "Musical Instruments" (MusIns), "Video Games" (VGames) and "Arts, Crafts and Sewing" (Arts&Crafts), which are diverse in size and domain. Besides the semantic relations, we also add relationships available in the dataset that have already been explored in previous work (Zhang et al., 2018; Ai et al., 2018; Xian et al., 2019). These are:
also_bought: users who bought item A also bought item B.
also_view: users who bought the item A also viewed item B.
category: the item belongs to one or more categories.
brand: the item belongs to one brand.
The number of each type of relation added to the final augmented graph is reported in Table 2.
4.2. Hyperbolic Geometry
Hyperbolic geometry is a non-Euclidean geometry with constant negative curvature. Hyperbolic space is naturally equipped for embedding symbolic data with hierarchical structures (Nickel and Kiela, 2017; Sala et al., 2018). Intuitively, this is because the amount of space grows exponentially as points move away from the origin. This mirrors the exponential growth of the number of nodes in trees with increasing distance from the root (Cho et al., 2019). Thus, hyperbolic space can be seen as the continuous analogue of a discrete tree-like structure. The embedding norm represents depth in the hierarchy, and the distance between embeddings represents the affinity or similarity of the respective items (López et al., 2019). In this work, we analyze models operating in the $d$-dimensional Poincaré ball: $\mathbb{B}^d = \{\mathbf{x} \in \mathbb{R}^d : \|\mathbf{x}\| < 1\}$. (Although (Chami et al., 2020) learns the curvature, we fix it to $-1$.)
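For two points $\mathbf{x}, \mathbf{y} \in \mathbb{B}^d$, the distance in this space is defined as:

$$d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\left(1 + 2\,\frac{\|\mathbf{x} - \mathbf{y}\|^2}{(1 - \|\mathbf{x}\|^2)(1 - \|\mathbf{y}\|^2)}\right) \qquad (2)$$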
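A small numerical check of this distance, assuming curvature -1 as above: as a point moves towards the boundary of the ball, its distance from the origin grows without bound (it equals 2·artanh(‖x‖)), which reflects the exponential growth of space that makes the ball tree-like. The function name is illustrative.

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance between two points of the unit Poincaré ball (curvature -1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sq = np.sum((x - y) ** 2)                            # squared Euclidean distance
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))  # shrinks towards the boundary
    return float(np.arccosh(1 + 2 * sq / denom))

origin = np.zeros(2)
for norm in (0.5, 0.9, 0.99, 0.999):
    # Euclidean steps towards the boundary become ever longer in hyperbolic terms.
    print(norm, poincare_distance(origin, [norm, 0.0]))
```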
4.3. Curvature Analysis
Curvature is a geometric property that describes the local shape of an object. If we draw two initially parallel paths on a surface with positive curvature, like a sphere, the two paths move closer to each other, whereas on a negatively curved surface, like a saddle, they tend to move apart. There are multiple notions of curvature in Riemannian manifolds, with varying granularity. In the interest of space (for an in-depth treatment, see (Lee, 1997)), we only recall a key notion: hyperbolic spaces have constant negative curvature, Euclidean spaces have zero curvature (flat), and spherical spaces are positively curved.
Discrete data such as graphs do not have manifold structure. Thus, curvature analogs are necessary to provide a measure that satisfies similar properties (Cruceru et al., 2020). In this work, we apply the Ollivier-Ricci curvature to analyze the graphs (Ollivier, 2009). Since this type of curvature characterizes the space locally, we plot the results in Figure 5.
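For reference, the Ollivier-Ricci curvature of an edge $(x, y)$ compares the neighbourhoods of its endpoints through optimal transport. The version below uses the common $\alpha$-lazy random-walk measure; the value of $\alpha$ and the aggregation of edge curvatures into node curvatures (averaging over incident edges is a common choice) are assumptions, since they are not stated in this section.

```latex
m_x^{\alpha}(z) =
\begin{cases}
  \alpha, & z = x,\\
  \dfrac{1-\alpha}{\deg(x)}, & z \in \mathcal{N}(x),\\
  0, & \text{otherwise,}
\end{cases}
\qquad
\kappa(x, y) = 1 - \frac{W_1\big(m_x^{\alpha}, m_y^{\alpha}\big)}{d(x, y)},
\qquad
\kappa(x) = \frac{1}{\deg(x)} \sum_{y \in \mathcal{N}(x)} \kappa(x, y).
```

Here $W_1$ is the 1-Wasserstein (earth mover's) distance with the shortest-path distance $d$ as ground cost. Intuitively, $\kappa(x, y) < 0$ when the neighbourhoods of $x$ and $y$ are far apart, as for an edge bridging two tree-like regions, and $\kappa(x, y) > 0$ when they overlap heavily, as inside a clique.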
We can observe that nodes and edges in the user-item graphs exhibit strongly negative curvature (in red). Negatively curved edges are closely related to graph connectivity, and removing them would result in a disconnected graph (Ni et al., 2015). As we add more relationships, the augmented graph becomes much more connected, so these edges play a less important role. Nonetheless, the overall curvature, as shown by the vast majority of nodes and edges, remains negative. The correspondence with the negative curvature of hyperbolic space suggests that both the user-item and the augmented graphs would profit from a representation in that geometry rather than in a Euclidean one.
Also known as Gromov hyperbolicity (Gromov, 1987), $\delta$-hyperbolicity quantifies the hyperbolicity of a given metric space with a single number. The smaller $\delta$ is, the more hyperbolic-like or negatively curved the space is. This measure has also been adapted to graphs (Fournier et al., 2015).
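The four-point condition behind $\delta$-hyperbolicity can be checked by brute force on small graphs. The sketch below computes $\delta$ for every 4-node subset of a connected, unweighted graph from BFS distances and reports the mean and maximum. It is O(n^4), so for graphs of the size studied here one would sample quadruples or use the algorithms of Fournier et al. (2015); the sampling scheme and any normalization behind the reported values are not specified in this section, and the helper names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source; adj maps node -> list of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def four_point_deltas(adj):
    """Gromov delta of every 4-node subset (assumes a connected graph)."""
    nodes = list(adj)
    d = {u: bfs_distances(adj, u) for u in nodes}
    deltas = []
    for x, y, z, w in combinations(nodes, 4):
        sums = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]],
                      reverse=True)
        deltas.append((sums[0] - sums[1]) / 2)  # half the gap between the two largest sums
    return deltas

def delta_mean_max(adj):
    deltas = four_point_deltas(adj)
    return sum(deltas) / len(deltas), max(deltas)
```

Trees attain $\delta = 0$ for every quadruple, while cycles do not; for example, the 4-cycle has $\delta = 1$ under this convention.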
We report the $\delta$-mean and $\delta$-max in Table 3. We can see that both measures decrease when we compare the user-item (U-I) graph with the augmented one (Augmen). The metric shows that, as we add more relationships to the initial user-item graph, it becomes more hyperbolic-like. This global metric complements the local curvature analysis and also indicates that the graph fits well into a hyperbolic space.
We experiment with different approaches to represent the multi-relational interaction graph for the task of generating recommendations. Our aim is to compare recent KG techniques, with a particular focus on the ones operating in hyperbolic spaces, with KG methods applied in previous work and SotA recommender systems.
The recommender system baselines are:
BPR (Rendle et al., 2009): Standard collaborative filtering baseline for RS based on matrix factorization (MF) with Bayesian personalized ranking.
HyperML (Vinh Tran et al., 2020): Hyperbolic adaptation of CML.
Since reproducing all the works that adopt KGs is infeasible (see Table 1), and in cases such as (Zhang et al., 2016, 2018; Ai et al., 2018) the KG method is applied practically without modifications, we directly employ the KG models themselves. The selected methods are:
TransE (Bordes et al., 2013): Translation-based KG method.
DistMult (Yang et al., 2015): Based on a multi-linear product, a generalization of the dot product.
RotatE (Sun et al., 2019): Performs rotations in the complex plane $\mathbb{C}$.
RotRef (Chami et al., 2020): Based on rotations and reflections.
MuR (Balazevic et al., 2019): Based on multiplications and additions.
For RotRef and MuR we compare to the Euclidean and hyperbolic versions.
To evaluate the models we utilize the branches of the Amazon dataset introduced in §4.1. Since it is very costly to rank all the available items, following (He et al., 2017b; Vinh Tran et al., 2020), we randomly select samples which the user has not interacted with, and rank the ground truth amongst these samples. To generate evaluation splits, the penultimate and last item the user has interacted with are withheld as dev and test sets respectively.
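The leave-one-out split and the sampled-candidate ranking described above can be sketched as follows. The helpers are hypothetical, `num_negatives` is left as a parameter because its value is not given in this passage, and sampling negatives uniformly from all items the user never interacted with is an assumption.

```python
import random

def leave_one_out_split(user_histories):
    """Split each user's time-ordered items: last -> test, penultimate -> dev, rest -> train."""
    train, dev, test = {}, {}, {}
    for user, items in user_histories.items():
        if len(items) < 3:
            # Cannot hold out two items; keep everything for training (assumption,
            # this case does not occur in a 5-core split).
            train[user] = list(items)
            continue
        train[user], dev[user], test[user] = list(items[:-2]), items[-2], items[-1]
    return train, dev, test

def sample_candidates(user_items, all_items, ground_truth, num_negatives, rng):
    """Return the ground-truth item followed by num_negatives never-interacted items."""
    seen = set(user_items) | {ground_truth}
    pool = [i for i in all_items if i not in seen]
    return [ground_truth] + rng.sample(pool, num_negatives)
```

The model then scores every candidate, and only the position of the ground-truth item in that short list enters the metrics.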
Given differences in preprocessing strategies, we reproduce the baselines based on their public implementations. To ensure consistency in the results, we conduct a hyper-parameter search for all methods on the validation set. All models are trained with the Adam optimizer (Kingma and Ba, 2014), and operate with latent dimensions. We train for epochs with early stopping on the dev set. To optimize parameters in hyperbolic models, we apply tangent space optimization (Chami et al., 2019). We choose the best learning rate for each method from and batch-size from . In each case, we report the average of three runs. Preliminary experiments with multi-task losses (splitting the loss between KG and RS components as in (Wang et al., 2019a; Cao et al., 2019; Xin et al., 2019)) did not show significant improvements, so we do not pursue this approach. Models that incorporate item features (NeuMF and CML++) are fed with the text embeddings used to compute the semantic similarities. In this way, all models have access to information extracted from the same sources.
We evaluate the models in two setups: User-item, where we only employ the user-item interactions, and Augmented, where we utilize the graph with all the added relationships. Moreover, for the Augmented case we follow the standard data extension protocol of adding inverse relations to the train split (Lacroix et al., 2018; Chami et al., 2020). That is, for each triple $(h, r, t)$, we also add the inverse $(t, r^{-1}, h)$.
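The inverse-relation protocol amounts to doubling the relation vocabulary. A minimal sketch, assuming relations are integer ids in `[0, num_relations)` and encoding r^{-1} as `r + num_relations`; this is a common convention, not necessarily the one used in the experiments, and the helper name is hypothetical.

```python
def add_inverse_relations(triples, num_relations):
    """For each (h, r, t), also add (t, r_inv, h), where r_inv = r + num_relations."""
    inverses = [(t, r + num_relations, h) for h, r, t in triples]
    return list(triples) + inverses
```

Encoding inverses as fresh relation ids lets every model treat them as ordinary relations with their own parameters.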
Evaluation protocols and Metrics:
To evaluate the recommendation performance of KG methods, we only look at the buy relation. For each user $u$, we rank the items $i$ according to the scoring function $\phi(u, \textit{buy}, i)$. We adopt normalized discounted cumulative gain (NDCG) and hit ratio (HR), both at , as well-established ranking evaluation metrics for recommendations.
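With a single held-out item per user, both metrics reduce to simple functions of the rank of the ground-truth item among the sampled candidates. The sketch below assumes the ground truth is stored at index 0 of the candidate scores and leaves the cutoff `k` as a parameter, since its value is not given in this passage; the pessimistic tie-breaking rule and the function names are assumptions.

```python
import math

def rank_of_ground_truth(scores, gt_index=0):
    """1-based rank of the ground-truth candidate under descending scores.
    Ties are broken pessimistically: equal scores are counted as ranked ahead."""
    gt = scores[gt_index]
    return 1 + sum(1 for j, s in enumerate(scores) if j != gt_index and s >= gt)

def hit_ratio_at_k(rank, k):
    """1 if the ground truth appears in the top-k list, else 0."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0
```

Per-user values are then averaged over all test users.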
Through our experiments we aim to answer the following questions:
RQ1 How do KG methods perform compared to recently published RS?
RQ2 What is the impact of the data augmentation?
RQ3 How important are different relations to improving recommendations?
RQ4 Which text attributes are most helpful?
RQ5 How do different metric spaces compare?
6. Results and Discussion
6.1. RQ1: Performance over user-item graph
In Table 4 we report the results for all models. Regarding the user-item results, we can observe that NeuMF is a very strong baseline, surpassing the performance of all other RS and several KG methods. This can be explained by the fact that NeuMF and CML++ have access to more information than the other baselines, since they are fed with embeddings generated from the items’ text. However, RotRef and MuR outperform all models. These models are designed to deal with multi-relational graphs, but in this case they only see one type of relation (buy). Although they operate solely on the user-item interactions, their compound operators allow a more expressive representation of the bipartite graph, which results in improved recommendations. Owing to their enhanced representation capacity, KG methods achieve very high performance even when applied only to the user-item graph; we therefore argue that they should be adopted as hard-to-beat baselines in future research on recommender systems.
6.2. RQ2: Exploiting Augmented Data
We can see that all models (except for HyperML in MusIns) obtain a significant boost in performance when we train them on the augmented graph , with gains of up to for RotatE over the user-item data in MusIns. The proposed densification process reduces sparsity by adding meaningful relations between the entities. All models, including the RS that are not designed to incorporate multi-relational information, benefit from the augmented data. In this setup, the RS can be thought of as models that aim to predict the plausibility of an interaction between any two entities (not only user-item pairs). Although they do not account for each particular type of relation, they profit from the extended training set, which improves generalization and helps cluster users and items in a way that benefits the recommendations.
In this setting, TransE and TransH show a greater relative increase in their performance (higher ) compared to recommender system baselines. Since the purpose of the KG models is to cope with multi-relational data, they can exploit the augmented relations in a much better way than the RS baselines. Although the rotations in the complex plane of RotatE offer noticeable improvements of more than in all branches with the augmented graph, it is outperformed by translational approaches such as TransE.
HyperML and CML++ do not show large gains or high performances with the added relations. The reason for this is that metric learning approaches that lack relational operators are ill-posed algebraic systems when there is a large number of interactions (see (Tay et al., 2018a), Thm 2.1).
MuR and RotRef are the best performing models also in this setup. These results highlight how heterogeneous sources of information, when leveraged through adequate tools and formulations, allow models to exploit the augmented data and achieve considerable performance gains.
Hyperbolic and Euclidean models show very competitive results for MusIns and Arts&Crafts. It has been shown that in low-dimensional setups () hyperbolic space offers significant improvements (Nickel and Kiela, 2017; Leimeister and Wilson, 2018; Chami et al., 2020). Since we operate with dimensions, both models exhibit a similar representation capacity on these datasets. Nonetheless, MuRP outperforms its Euclidean counterpart in VGames for both setups. These findings are in line with the previous analysis (§4), and they emphasize the importance of choosing a suitable metric space that fits the data distribution as a powerful and efficient inductive bias. Since the amount of space in the hyperbolic representation grows exponentially compared to the Euclidean one, this model is able to accommodate entities in a better way, unfolding latent hierarchies in the graph (see §6.5).
Finally, these results demonstrate the benefits of recent developments in KG embedding methods and their enriched representation capacity. Advanced KG methods achieve much better performance than their predecessors and outperform RS explicitly designed for the task in both setups.
6.3. RQ3: Relation Ablation
In this section we investigate the contribution of each individual relation to the results of MuRP, which we consider the best performing model. Besides evaluating on the test split, we create a subset with the of users with the fewest interactions. These users are affected by the cold-start problem, since they have very few interactions with items. We refer to this split as "Cold Test". The results of the ablation are presented in Table 5.
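Such a cold-start subset can be built from per-user interaction counts. In this minimal sketch the fraction of users is a parameter because its value is not given in this passage; counting interactions on the training split, breaking ties by user id, and the helper name are assumptions.

```python
def cold_start_users(interaction_counts, fraction):
    """Return the given fraction of users with the fewest interactions.

    interaction_counts: dict mapping user id -> number of interactions.
    Ties in the count are broken by user id so that the result is deterministic.
    """
    ordered = sorted(interaction_counts, key=lambda u: (interaction_counts[u], u))
    n = max(1, round(fraction * len(ordered)))  # always keep at least one user
    return set(ordered[:n])
```

Metrics on this subset are computed exactly as on the full test split, restricted to the selected users.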
When we look at individual performance on the test set, we see that each relation brings improvements over the user-item graph (buy relation), which highlights the key role of data augmentation in boosting the performance of these models. We notice that the semantic relation is the best in the "Musical Instruments" branch, whereas in the other two branches it is the second best; there, the also_bought relation seems to be more helpful. also_bought is mined from behavioral patterns of users with respect to complementary products that are usually bought together (McAuley et al., 2015), and it plays a fundamental role in predicting user purchases (Wölbitsch et al., 2019). However, the also_bought and also_view relations are derived from the entire dataset, not only from the train split. These relations therefore incorporate information about users and items in the dev/test sets, which gives them a considerable advantage over the brand, category and semantic relations. (This was confirmed by personal correspondence with the authors of (Ni et al., 2019).)
Finally, all relationships combined outperform the individual setups for all branches of the dataset in the test set, in line with previous results (Zhang et al., 2018). This demonstrates how KG methods can leverage heterogeneous information in a unified manner, and shows that the approach scales to new relation types.
We analyze how different relations affect users in cold-start settings by looking at the of users with the fewest interactions (Cold Test). In this case, the semantic relation brings a very pronounced boost in performance: , and for MusIns, VGames and Arts&Crafts respectively, much more than any other relation. This shows the effectiveness of semantic relations at densifying the graph, with remarkable improvements particularly for sparse users and items. These observations are in line with previous research showing that reviews, when used as regularizers, are particularly helpful in alleviating cold-start problems (Sachdeva and McAuley, 2020). Moreover, we notice that performance on "Cold Test" tends to be better than on "Test" in MusIns and Arts&Crafts. We hypothesize that this is caused by modelling each user as a single point in the space: when users interact with many distinct items, it becomes more difficult to place the user embedding close to all of their preferences.
6.4. RQ4: The Role of Reviews
6.4.1. Relations vs Features
We analyze the effectiveness of semantic relations to model side information extracted from textual descriptions when compared to different ways of incorporating latent features proposed in previous work. We consider the following models: CML++ utilizes the item features as an explicit regularizer, NeuMF initializes the item embeddings with the features, and Narre (Chen et al., 2018) uses TextCNN (Kim, 2014) to extract features from the text. We do not compare to KG approaches since they do not incorporate latent features. For CML++ and NeuMF we employ the text embeddings used to compute the semantic similarities as item features. Results for the ablation are reported in Table 6.
Compared to baselines that do not incorporate any side information, we see that relations bring a larger improvement than features for CML++ and Narre (17.1 vs 3.7 and 10.6 vs 7.4 relative improvement, respectively). Although these models are not specifically designed to incorporate multi-relational information, data augmentation via semantic relations seems to be more effective than exploiting textual features. For NeuMF, the transfer learning technique of initializing the item embeddings with textual features proves to be very effective. Nonetheless, when semantic relations are combined with features, the performance improves.
This ablation showcases the efficacy of semantic relations to model information extracted from textual data. They can be seamlessly integrated with latent features, and the improvements of the combined models demonstrate that they provide complementary information for recommendations.
6.4.2. Type of Text for Semantic Relation
We analyze which type of text is most useful for extracting features and capturing item similarities. To do so, we filter the available text for each item according to different criteria: (a) only textual metadata, such as product name, description and categorical labels, (b) only reviews, (c) only reviews with a high level of sentiment polarity, and (d) metadata + top-k longest reviews. We repeat the method described in §2.2 to pre-process relations, and encode all text with USE. We compare CML++ and NeuMF, which are models that incorporate item features, and MuRP. In all cases we run experiments with the buy + semantic training set, in order to analyze the addition of semantic information only. Results for "MusIns" are reported in Table 7. We can see that adding features on top of relations degrades the performance of CML++ (the same behavior can be observed in Table 6). This model integrates item features as a regularizer to correct the projection into the target embedding space. In this setup, the semantic relations already contribute in that regard, so the extra features seem to mislead the model. On the other hand, NeuMF suffers a drastic drop in performance when features are removed. In this case, the features and relations extracted from text that combines metadata with the longest reviews help the most, followed by using all the available reviews as input.
Finally, MuRP does not use any features and learns solely from the relations available in the graph. We again see that metadata combined with the longest reviews is the most useful text for learning item dis/similarities. We attribute this to longer reviews tending to be more descriptive of the items. However, using only metadata also yields competitive performance. In all three models, reviews with high polarity (i.e. strongly "positive" or "negative" reviews rather than "neutral" ones) do not seem to carry additional descriptive information that could be leveraged for similarities.
This ablation suggests that, to learn useful dis/similarities between items for recommendations, it is beneficial to combine the longest reviews with the item metadata. Nonetheless, metadata by itself can be leveraged to obtain competitive results. We consider this a relevant outcome: in the absence of lengthy reviews, a brief and accurate item name and description can still offer remarkable performance gains.
6.4.3. Encoder Analysis
This ablation examines the choice of pre-trained encoder used to capture text similarities in an unsupervised manner. We compare USE to BERT (Devlin et al., 2019) without any fine-tuning, and to Sentence-BERT (Reimers and Gurevych, 2019), an adaptation of BERT with a siamese network that is fine-tuned for sentence similarity. We analyze the MuRP model with the same four criteria for filtering the available text, and also compare against the setup without semantic relations.
Results for "MusIns" are reported in Table 8. When BERT is used as the encoder, the user-item graph extended with semantic relations performs worse than the user-item graph alone. Figure 6 shows the cosine similarities between randomly sampled item embeddings. For the BERT encoder, which has not been trained on semantic similarity objectives, most items are very similar to each other, which prevents the model from clustering them. This setup can be regarded as an ablation of the original model in which random semantic relations are added. It demonstrates the importance of the semantic information contained in the connections we create, and shows that the model is able to leverage it.
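The diagnosis behind Figure 6 can be reproduced in spirit by summarizing the distribution of cosine similarities over random item pairs. The sketch below is an assumption-laden illustration rather than the paper's analysis code: it samples pairs uniformly, reports only the mean and standard deviation, and uses synthetic vectors in place of real USE or BERT outputs. A "collapsed" space, where every pair looks alike, offers little signal for deciding which items to link.

```python
import math
import random


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_cosine_stats(embeddings: list[list[float]], n_pairs: int = 1000,
                          seed: int = 0) -> tuple[float, float]:
    """Mean and standard deviation of cosine similarity over random item pairs."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(embeddings)), 2)  # two distinct items
        sims.append(cosine(embeddings[i], embeddings[j]))
    mean = sum(sims) / len(sims)
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    return mean, std


# Synthetic stand-ins for encoder outputs (not real USE/BERT vectors).
rng = random.Random(42)
dim, n_items = 64, 100
# "Collapsed" space: every vector is a shared direction plus small noise.
collapsed = [[1.0 + rng.gauss(0, 0.05) for _ in range(dim)] for _ in range(n_items)]
# "Spread" space: isotropic random vectors.
spread = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
print(pairwise_cosine_stats(collapsed))  # mean close to 1, tiny spread
print(pairwise_cosine_stats(spread))     # mean close to 0, wider spread
```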
Sentence-BERT improves over BERT. Nevertheless, we find USE to be the most effective encoder for capturing review and metadata dis/similarities. Sentence-BERT is better than BERT at distinguishing degrees of similarity, but Figure 6 shows that the patterns it detects are alike. USE, in contrast, identifies a broader range of dis/similarities between items, which results in improved recommendation performance.
6.5. RQ5: Metric Space Analysis
Our experiments showed that hyperbolic methods can offer improvements over systems operating in Euclidean space or with complex numbers. Moreover, since hyperbolic space is naturally equipped for embedding hierarchical structures, its self-organizing properties make it amenable to capturing different types of hierarchies as a by-product of the learning process. In the resulting embeddings, the norm represents depth in the hierarchy. As explained in §3.4, each item in the Amazon dataset has a set of categorical labels that describe which categories the item belongs to. We analyze the hyperbolic and Euclidean versions of MuR, and report the Spearman correlation between the norm of each category's embedding and the number of interactions in that category.
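The correlation described above only needs, for each category, the norm of its embedding and its interaction count. A minimal sketch of Spearman's rank correlation with tie-averaged ranks follows; the per-category values are hypothetical and serve only to show the computation, not to reproduce the paper's figures.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-category statistics (not the paper's data).
category_norms = [0.12, 0.35, 0.48, 0.71, 0.90]   # norm of each category embedding
category_interactions = [5400, 2100, 900, 130, 40]
print(round(spearman(category_norms, category_interactions), 3))  # -1.0
```

In this toy case categories with more interactions sit closer to the origin, so the ranks are perfectly reversed. `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.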
The correlation is moderate to high for the hyperbolic model, whereas for the Euclidean model it is non-existent. This indicates that, in the hyperbolic model, more "general" categories (with more interactions) have a smaller norm, which is expected when embedding a hierarchy in hyperbolic space (Sala et al., 2018), whereas the origin of the space has no particular meaning in the Euclidean model.
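Why the norm behaves like depth can be made concrete with the distance of the Poincaré ball model. The formula below assumes the unit ball (curvature $-1$) for simplicity, and the notation is ours rather than the paper's:

```latex
d_{\mathbb{B}}(\mathbf{x}, \mathbf{y})
  = \operatorname{arcosh}\!\left(1 + \frac{2\,\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}
      {\left(1 - \lVert \mathbf{x} \rVert^{2}\right)\left(1 - \lVert \mathbf{y} \rVert^{2}\right)}\right),
\qquad
d_{\mathbb{B}}(\mathbf{0}, \mathbf{x}) = 2\operatorname{artanh}\lVert \mathbf{x} \rVert
  = \ln\frac{1 + \lVert \mathbf{x} \rVert}{1 - \lVert \mathbf{x} \rVert}.
```

For example, a point with norm 0.5 lies at distance $\ln 3 \approx 1.10$ from the origin, while a point with norm 0.9 lies at $\ln 19 \approx 2.94$. Distances blow up towards the boundary, so a point near the origin stays comparatively close to points in every direction, which suits a general category with many sub-categories. In Euclidean space $d(\mathbf{0}, \mathbf{x}) = \lVert \mathbf{x} \rVert$ and distances are translation invariant, so the origin plays no special role.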
To shed light on this aspect, we reconstruct the hierarchies that the hyperbolic and Euclidean models learn. To do so, we randomly select a category embedding and iteratively move to its closest neighbor whose norm is less than or equal to that of the current category, treating that neighbor as the parent category. We report the hierarchies for "Electric Guitar Bags & Cases" and "Footswitches" from the "MusIns" dataset in Table 9.
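The reconstruction procedure is a greedy walk towards the origin. The sketch below follows the description above, with two assumptions of our own: distances are Poincaré distances in the unit ball for the hyperbolic case (pass `dist=math.dist` for the Euclidean variant), and a visited set prevents the walk from cycling between categories of equal norm. The category names and 2-D coordinates are hand-picked toy values, not learned embeddings and not the contents of Table 9.

```python
import math
from collections.abc import Callable


def sq_norm(v: list[float]) -> float:
    return sum(a * a for a in v)


def poincare_distance(x: list[float], y: list[float]) -> float:
    """Geodesic distance in the unit Poincaré ball (curvature -1)."""
    diff = sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1 + 2 * diff / ((1 - sq_norm(x)) * (1 - sq_norm(y))))


def reconstruct_chain(start: str, emb: dict[str, list[float]],
                      dist: Callable[[list[float], list[float]], float] = poincare_distance
                      ) -> list[str]:
    """Walk from `start` towards the origin, one nearest 'parent' at a time.

    At each step the closest not-yet-visited category whose norm is less than
    or equal to the current one is taken as the parent; the walk stops when
    no such category remains.
    """
    chain, visited = [start], {start}
    current = start
    while True:
        here = emb[current]
        candidates = [c for c in emb
                      if c not in visited and sq_norm(emb[c]) <= sq_norm(here)]
        if not candidates:
            return chain
        current = min(candidates, key=lambda c: dist(here, emb[c]))
        chain.append(current)
        visited.add(current)


# Toy, hand-picked coordinates (hypothetical).
emb = {
    "Instruments": [0.0, 0.05],
    "Cases": [0.0, 0.40],
    "Guitar Cases": [0.05, 0.70],
    "Electric Guitar Cases": [0.08, 0.90],
    "Effects": [0.45, 0.0],
    "Footswitches": [0.80, 0.05],
}
print(reconstruct_chain("Electric Guitar Cases", emb))
# ['Electric Guitar Cases', 'Guitar Cases', 'Cases', 'Instruments']
print(reconstruct_chain("Footswitches", emb))
# ['Footswitches', 'Effects', 'Instruments']
```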
The hyperbolic hierarchy is much more concise and precise than the Euclidean one. The hyperbolic model builds several "short" hierarchies and spreads the labels more widely, whereas the Euclidean model learns a "tall" tree of categories, which results in a much noisier arrangement.
This shows that the hyperbolic model automatically infers the hierarchy arising from the label distribution (López et al., 2019; López and Strube, 2020), and offers a more interpretable space. Furthermore, the model achieves this as a by-product of the learning process with augmented relations, without being specifically trained for this purpose.
7. Related Work
Data augmentation plays an important role in machine learning (Krizhevsky et al., 2012; Shorten and Khoshgoftaar, 2019), as it reduces the generalization error without affecting the effective capacity of the model (Hernández-García and König, 2018). In RS, augmentation techniques have been applied by extending co-purchased products (Wölbitsch et al., 2019), generating adversarial pseudo user-item interactions (Wang et al., 2019b), or casting different user actions as purchases (Tan et al., 2016a; Tuan and Phuong, 2017). Other approaches exploit item side information, such as image (Chu and Tsai, 2017), audio (Liang et al., 2015) or video (Chen et al., 2017) features. We propose a new unsupervised method to learn similarity relations between items (or users) based on semantic text models applied to textual attributes. Our work expands the strand of research that incorporates review information as a regularization technique (Sachdeva and McAuley, 2020), and notably improves performance in cold-start settings.
Recommenders using Text:
Previous work has mined text to use it as a regularizer (McAuley and Leskovec, 2013; Hsieh et al., 2017), or as latent features to learn better user and item representations (Zheng et al., 2017; Chen et al., 2018). However, Sachdeva & McAuley argue that the benefit of using reviews for recommendation is overstated, and that the reported gains are only possible under a narrow set of conditions (Sachdeva and McAuley, 2020). They specifically note that reviews seem to provide little benefit as features, but help more when used for regularization. The simple data augmentation method proposed in this paper offers another way to think about using textual information. The added relations improve user and item representations, as adding features can, but without increasing the representation size. It can thus also be seen as providing the benefits of regularization without directly constraining model expressivity, as dropout, for instance, does.
Knowledge Graph Recommenders:
Previous work integrating KG into RS has applied a narrow set of representational methods, favoring Euclidean translational approaches (Guo et al., 2020) (see Table 1). The experiments in this paper expand on previous analysis by including more recent KG embedding methods. Our results show that newer methods significantly benefit from the introduced data augmentation and outperform not only previous KG recommenders, but also other state-of-the-art recommendation systems.
The advantages of hyperbolic space have been argued for in a wide variety of application domains: question answering (Tay et al., 2018b), machine translation (Gulcehre et al., 2019), language modeling (Dhingra et al., 2018), hierarchical classification (López et al., 2019; López and Strube, 2020), and taxonomy refinement (Aly et al., 2019; Le et al., 2019) among others. In RS, hyperbolic geometry naturally emerges in several datasets (Cano et al., 2006; Kitsak et al., 2017), and hyperbolic spaces have been applied in combination with metric learning approaches (Vinh Tran et al., 2020; Chamberlain et al., 2019). In this work, we expand on these studies and carry out a structural analysis of the properties of user-item graphs extended by our data augmentation, which shows that hyperbolic methods may have an important role to play for both interpretability of recommendations, and for enabling higher performance by leveraging lower representational dimensionality.
In this work we propose a simple unsupervised data augmentation technique that adds semantic relations to the user-item graph based on applying pre-trained language models to widely available textual attributes. This can be regarded as a data-dependent prior that introduces an effective inductive bias, without increasing the computational cost of models at inference time.
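The augmentation itself can be sketched as appending item-item triples to the user-item training triples. The exact linking rule is defined in §2.2 and is not repeated in this section, so the sketch assumes a simple rule of our own: each item is linked to its `k` most similar items by cosine similarity, provided the similarity reaches a `threshold`. The relation label `"semantic"`, the parameter values and the toy vectors (standing in for sentence-encoder outputs such as USE) are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_triples(item_vecs: dict[str, list[float]], k: int = 2,
                     threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Link each item to its k most similar items with similarity >= threshold."""
    triples = []
    for a, va in item_vecs.items():
        scored = sorted(((cosine(va, vb), b) for b, vb in item_vecs.items() if b != a),
                        reverse=True)
        for sim, b in scored[:k]:
            if sim >= threshold:
                triples.append((a, "semantic", b))
    return triples


def augment(interactions: list[tuple[str, str]], item_vecs: dict[str, list[float]],
            **kwargs) -> list[tuple[str, str, str]]:
    """Training triples: observed user-item 'buy' edges plus item-item semantic edges."""
    return [(u, "buy", i) for u, i in interactions] + semantic_triples(item_vecs, **kwargs)


# Toy item vectors standing in for sentence-encoder outputs.
item_vecs = {"i1": [1.0, 0.0], "i2": [0.9, 0.1], "i3": [0.0, 1.0]}
interactions = [("u1", "i1"), ("u2", "i3")]
for triple in augment(interactions, item_vecs, k=1, threshold=0.5):
    print(triple)
```

Because the extra triples only enter the training graph, the model that scores user-item pairs at inference time is unchanged, which is consistent with the claim that the augmentation adds no inference cost.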
By exploring a variety of modern KG methods, we observe that recent advances, when combined with our data augmentation technique, result in state-of-the-art RS performance (RQ1, §6.1). Moreover, the proposed data augmentation improves the performance of all analyzed models, including those that are not designed to handle multi-relational information (RQ2, §6.2). Thus, the technique can be considered architecture-agnostic.
Our ablation study highlights the impact of the semantic relations, particularly in cold-start settings (RQ3, §6.3). Regarding the choice of textual inputs, the study reveals that both reviews and brief product descriptions are effective (RQ4, §6.4). An important direction for further work is to explore whether these results generalise to denser domains, as we observed anecdotal evidence that the benefit of this data augmentation diminishes as the average node degree increases.
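The density notion mentioned above, the average node degree, is simply $2|E|/|V|$ for an undirected graph. The sketch below, with hypothetical toy edges, shows how one might track it for the user-item graph before and after adding semantic edges; it counts each edge once and ignores nodes that have no edges, both of which are assumptions on our part.

```python
from collections import Counter


def average_degree(edges: list[tuple[str, str]]) -> float:
    """Mean degree 2|E|/|V| of an undirected graph (isolated nodes not counted)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree)


buy_edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2")]
semantic_edges = [("i1", "i2")]
print(average_degree(buy_edges))                   # 1.5
print(average_degree(buy_edges + semantic_edges))  # 2.0
```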
Finally, our analysis of structural properties of the graphs, which can be extended to more datasets that exhibit a latent hyperbolic geometry (Cano et al., 2006; Kitsak et al., 2017; Chamberlain et al., 2019), reveals that recommenders can benefit from operating in these metric spaces. In particular, we highlight how hyperbolic space improves the interpretability of recommendations (RQ5, §6.5).
Acknowledgements. This work has been supported by the German Research Foundation (DFG) as part of the Research Training Group Adaptive Preparation of Information from Heterogeneous Sources (AIPHES) under grant No. GRK 1994/1 and the Klaus Tschira Foundation, Heidelberg, Germany.
- (2018) Learning heterogeneous knowledge base embeddings for explainable recommendation. Algorithms.
- (2019) Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings. In Proceedings of the 57th Annual Meeting of the ACL, Florence, Italy, pp. 4811–4817.
- (2019) Multi-relational Poincaré graph embeddings. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32, pp. 4463–4473.
- (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Vol. 26, pp. 2787–2795.
- (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42.
- (2006) Topology of music recommendation networks. Chaos (Woodbury, N.Y.), pp. 013107.
- (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In The World Wide Web Conference, WWW ’19, New York, NY, USA, pp. 151–161.
- (2017) TransNets: learning to transform for recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, New York, NY, USA, pp. 288–296.
- (2018) Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, pp. 169–174.
- (2019) Scalable hyperbolic recommender systems. CoRR abs/1902.08648.
- (2020) Low-dimensional hyperbolic knowledge graph embeddings. In Proceedings of the 58th Annual Meeting of the ACL, Online, pp. 6901–6914.
- (2019) Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems 32, pp. 4869–4880.
- (2018) Neural attentional rating regression with review-level explanations. In Proceedings of the 2018 World Wide Web Conference, WWW ’18, Republic and Canton of Geneva, CHE, pp. 1583–1592.
- (2017) Attentive collaborative filtering: multimedia recommendation with item- and component-level attention. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’17, New York, NY, USA, pp. 335–344.
- (2019) Large-margin classification in hyperbolic space. In Proceedings of Machine Learning Research, K. Chaudhuri and M. Sugiyama (Eds.), Vol. 89, pp. 1832–1840.
- (2017) A hybrid recommendation system considering visual information for predicting favorite restaurants. WWW.
- (2020) Computationally tractable Riemannian manifolds for graph embeddings. In 37th International Conference on Machine Learning (ICML).
- (2019) Location embeddings for next trip recommendation. In Companion Proceedings of The 2019 World Wide Web Conference, WWW ’19, New York, NY, USA, pp. 896–903.
- (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the ACL: Human Language Technologies (NAACL-HLT), Minneapolis, Minnesota, pp. 4171–4186.
- (2018) Embedding text in hyperbolic spaces. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), New Orleans, Louisiana, USA, pp. 59–69.
- (2015) Computing the Gromov hyperbolicity of a discrete metric space. Inf. Process. Lett., pp. 576–579.
- (2018) Hyperbolic neural networks. In Advances in Neural Information Processing Systems 31, pp. 5345–5355.
- (2019) Helpfulness-aware review based neural recommendation. CCF Transactions on Pervasive Computing and Interaction 1, pp. 285–295.
- (1987) Hyperbolic groups. In Essays in Group Theory, S. M. Gersten (Ed.).
- (2019) Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations.
- (2019) Hyperbolic attention networks. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA.
- (2020) A survey on knowledge graph-based recommender systems. IEEE Transactions on Knowledge and Data Engineering.
- (2017) Translation-based recommendation. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, New York, NY, USA, pp. 161–169.
- (2017) Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, Republic and Canton of Geneva, CHE, pp. 173–182.
- (2018) Data augmentation instead of explicit regularization.
- (2017) Collaborative metric learning. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, Republic and Canton of Geneva, CHE, pp. 193–201.
- (2018) Improving sequential recommendation with knowledge-enhanced memory networks. In The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’18, New York, NY, USA, pp. 505–514.
- (2015) Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the ACL and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 687–696.
- (2014) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751.
- (2014) Adam: a method for stochastic optimization. ICLR.
- (2017) Latent geometry of bipartite networks. Physical Review E.
- (2020) Hyperbolic knowledge graph embeddings for knowledge base completion. In The Semantic Web, A. Harth, S. Kirrane, A. Ngonga Ngomo, H. Paulheim, A. Rula, A. L. Gentile, P. Haase, and M. Cochez (Eds.), Cham, pp. 199–214.
- (2010) Hyperbolic geometry of complex networks. Physical Review E.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Vol. 25, pp. 1097–1105.
- (2018) Canonical tensor decomposition for knowledge base completion. In Proceedings of Machine Learning Research, J. Dy and A. Krause (Eds.), Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 2863–2872.
- (2019) Inferring concept hierarchies from text corpora via hyperbolic embeddings. In Proceedings of the 57th Annual Meeting of the ACL, Florence, Italy, pp. 3231–3241.
- (1997) Riemannian manifolds: an introduction to curvature. Graduate Texts in Mathematics, Springer New York.
- (2015) DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6 (2), pp. 167–195.
- (2018) Skip-gram word embeddings in hyperbolic space. CoRR abs/1809.01498.
- (2019) Unifying task-oriented knowledge graph learning and recommendation. IEEE Access 7, pp. 115816–115828.
- (2015) Content-aware collaborative music recommendation using pre-trained neural networks. In ISMIR, M. Müller and F. Wiering (Eds.), pp. 295–301.
- (2015) Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pp. 2181–2187.
- (2019) Fine-grained entity typing in hyperbolic space. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Florence, Italy, pp. 169–180.
- (2021) Symmetric spaces for graph embeddings: a Finsler-Riemannian approach. In Proceedings of the 38th International Conference on Machine Learning, M. Meila and T. Zhang (Eds.), Proceedings of Machine Learning Research, Vol. 139, pp. 7090–7101.
- (2020) A fully hyperbolic neural model for hierarchical multi-class classification. In Findings of the ACL: EMNLP 2020, Online, pp. 460–475.
- (2020) What makes a review a reliable rating in recommender systems? Information Processing & Management 57 (6), pp. 102304.
- (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, New York, NY, USA, pp. 165–172.
- (2015) Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, New York, NY, USA, pp. 785–794.
- (1992) WordNet: a lexical database for English. In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992.
- (2015) Ricci curvature of the internet topology. In 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 2758–2766.
- (2019) Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 188–197.
- (2017) Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems 30, pp. 6341–6350.
- (2009) Ricci curvature of Markov chains on metric spaces. Journal of Functional Analysis 256 (3), pp. 810–864.
- (2018) A study of the similarities of entity embeddings learned from different aspects of a knowledge base for item recommendations. In The Semantic Web: ESWC 2018 Satellite Events, A. Gangemi, A. L. Gentile, A. G. Nuzzolese, S. Rudolph, M. Maleshkova, H. Paulheim, J. Z. Pan, and M. Alam (Eds.), Cham, pp. 345–359.
- (2019) Sentence-BERT: sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3982–3992.
- (2009) BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, Arlington, Virginia, USA, pp. 452–461.
- (2017) An overview of multi-task learning in deep neural networks. CoRR abs/1706.05098.
- (2020) How useful are reviews for recommendation? A critical review and potential improvements. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, New York, NY, USA, pp. 1845–1848.
- (2018) Representation tradeoffs for hyperbolic embeddings. In Proceedings of the 35th International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Stockholmsmässan, Stockholm Sweden, pp. 4460–4469.
- (2008) Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML ’08, New York, NY, USA, pp. 880–887.
- (2019) Attentive knowledge graph embedding for personalized recommendation. CoRR abs/1910.08288.
- (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6 (60).
- (2007) Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW ’07, New York, NY, USA, pp. 697–706.
- (2019) RotatE: knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations.
- (2019) Research commentary on recommendations with side information: a survey and research directions. Electronic Commerce Research and Applications 37, pp. 1–29.
- (2016) Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, New York, NY, USA, pp. 17–22.
- (2016) Rating-boosted latent topics: understanding users and items with ratings and reviews. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pp. 2640–2646.
- (2019) AKUPM: attention-enhanced knowledge-aware user preference model for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’19, New York, NY, USA, pp. 1891–1899.
- (2018) Latent relational metric learning via memory-based attention for collaborative ranking. In Proceedings of the 2018 World Wide Web Conference, WWW ’18, Republic and Canton of Geneva, CHE, pp. 729–739.
- (2018) Hyperbolic representation learning for fast and efficient neural question answering. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM ’18, New York, NY, USA, pp. 583–591.
- (2019) Poincaré GloVe: hyperbolic word embeddings. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA.
- (2016) Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, pp. 2071–2080.
- (2017) 3D convolutional networks for session-based recommendation with content features. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, New York, NY, USA, pp. 138–146.
- (2020) HyperML: a boosting metric learning approach in hyperbolic space for recommender systems. In Proceedings of the 13th International Conference on Web Search and Data Mining, WSDM ’20, New York, NY, USA, pp. 609–617.
- (2018) DKN: deep knowledge-aware network for news recommendation. In Proceedings of the 2018 World Wide Web Conference, WWW ’18, Republic and Canton of Geneva, CHE, pp. 1835–1844.
- (2019) Multi-task feature learning for knowledge graph enhanced recommendation. In The World Wide Web Conference, WWW ’19, New York, NY, USA, pp. 2000–2010.
- (2019) Enhancing collaborative filtering with generative augmentation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, New York, NY, USA, pp. 548–556.
- (2019) KGAT: knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’19, New York, NY, USA, pp. 950–958.
- (2014) Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI’14, pp. 1112–1119.
- (2019) Beggars can’t be choosers: augmenting sparse data for embedding-based product recommendations in retail stores. In Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’19, New York, NY, USA, pp. 104–112.
- (2019) Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, New York, NY, USA, pp. 285–294.
- (2019) Relational collaborative filtering: modeling multiple item relations for recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, New York, NY, USA, pp. 125–134.
- (2015) Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the International Conference on Learning Representations (ICLR) 2015.
- (2019) Bayes embedding (BEM): refining representation by integrating knowledge graphs and behavior-specific networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, New York, NY, USA, pp. 679–688.
- (2016) Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, New York, NY, USA, pp. 353–362.
- (2019) Quaternion knowledge graph embeddings. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32, pp. 2735–2745.
- (2019) Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv.
- (2018) Learning over knowledge-base embeddings for recommendation. CoRR abs/1803.06540.
- (2017) Joint deep modeling of users and items using reviews for recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM ’17, New York, NY, USA, pp. 425–434.