Motivation. Web users interact with a huge volume of content every day, be it news, entertainment, or social conversations. To save time and effort, users increasingly depend on curated feeds for such content. A feed is a stream of individualized content items that a service provider tailors to a user. Well-known examples include the feeds of Facebook and Twitter for social networking, Quora and StackExchange for community question answering, Spotify and Last.fm for music, and Google News and Mashable for news. Since a feed is a one-stop source for information, it is important that users understand how items in their feed relate to their profile and activity on the platform.
On some platforms like Twitter and Tumblr, the feed originates solely from updates in the user’s social neighborhood or from their explicitly stated interest categories, and the connection is almost always obvious to the user. However, as service providers gather increasing amounts of user-specific information to better cater to personal preferences, more and more platforms (like Quora, LinkedIn, and Last.fm) are generating complex feeds. Here, a feed results from an intricate combination of the user’s interests, friendship network, actions on the platform, and external trends. Such platforms are the focus of this paper. Over time, a user accumulates several thousands of actions that together constitute her profile (posts, upvotes, likes, comments, etc.), making it impossible for her to remember all these details. Further, the user may not even possess a complete record of her actions on the platform, a common situation that has been referred to as the problem of inverse privacy (Gurevich et al., 2016).
In such situations, identifying explanatory relationships between a user’s online behavior (social network, thematic interests, actions like clicks and votes) and the feed items she receives is useful for at least three reasons: (i) such explanations can convince the user of an item’s relevance whenever its connection to her is non-obvious or surprising; (ii) they can point the user towards future actionability (a course of action to avoid seeing more of certain kinds of items); and (iii) due to the sheer scale and complexity of the data and models that service providers deal with, it is not always realistically possible to show end users why some feed item was shown to them; in such cases, these relationships are a proxy that users could find plausible.
For example, if Alice sees a post on making bombs in her feed while being unaware of any explicit connection to such content, she might be highly curious as to what she might have done to create such an association. In this context, Alice would find it useful to be shown explanations like the following, which could remind her of relevant actions: (i) her good friend Bob is a close friend of Charlie, who follows Chemistry, and the bomb post was tagged as belonging to this category; or (ii) she recently asked a question about food, which is recorded as a sub-category of Organics in the platform’s taxonomy, and the author of the bomb post has also categorized the post under Organics. In our study on two platforms, participants reported seeing such non-obvious items in their feeds over a period of two months.
Limitations of state-of-the-art. In principle, service providers are in the best position to offer such explanations. But they rarely do so in practice. For example, Quora simply tags items not emanating directly from one’s neighborhood or interest profile with ‘Topic you might like’, and Last.fm often attaches notes like ‘Similar to Shayne Ward’ to a track recommendation, neither elaborating the user’s relationship to the artist Shayne Ward, nor how the similarity was determined in the first place. Facebook’s explanations have similarly been brought into question (Andreou et al., 2018). Apart from (Eslami et al., 2015), there is very little work investigating relationships between user interactions on a platform and items in its feeds (outside of Twitter (Edwards et al., 2014), where the feed comes exclusively from the network). Relationship discovery (Liang et al., 2016), however, has been explored in other contexts and for different goals, most notably for understanding entity relatedness in knowledge graphs (Fang et al., 2011; Bianchi et al., 2017; Pirrò, 2015; Seufert et al., 2016; Lao et al., 2011), predicting links on social networks (Zhang et al., 2014), and generating personalized recommendations (Yu et al., 2014; Lao and Cohen, 2010).
Understanding connections between user preferences and online advertisements has been investigated in simulated environments to some extent (Lécuyer et al., 2014; Lecuyer et al., 2015; Parra-Arnau et al., 2017). We differentiate ourselves from these approaches in the following ways:
Prior works have aimed to discover the true provenance of an item using models of causality, and are typically aimed at reverse-engineering the platform. This is very different from our goal of unraveling connections between a user’s own actions and what she sees in her feed, so that she can better understand the interplay between herself and the platform, and gain a better handle on the cognitive overload resulting from her interactions with it;
Mechanisms underlying advertisement targeting are guided by completely different (financial) incentives in comparison to regular feed items.
Approach. We propose FAIRY, a Framework for Activity-Item Relationship discoverY, that (i) addresses the discussed challenges by building user-specific interaction graphs exclusively using information visible to the user herself, (ii) learns models for predicting relevance and surprisal, trained on data from real user-studies on two platforms (Quora and Last.fm), and (iii) uses learning-to-rank techniques to discover and rank relationships derived from the above interaction graphs.
Since our goal is to pinpoint a set of user actions and their subsequent associations to the recommended feed item, FAIRY starts out by building an interaction graph connecting items in the user’s local neighborhood. An example of such an interaction graph for a Quora user is shown in Fig. 1. The interaction graph is modeled as a heterogeneous information network (HIN) (Sun and Han, 2013; Sun et al., 2011; Deng et al., 2011; Liang et al., 2016). The HIN is a graph with different types of nodes (users, categories, and items) and edges (corresponding to different action types like follow, ask, and upvote, shown in different colors). Nodes and edges in this HIN are weighted (not marked in Fig. 1 for simplicity). Edges are directed and timestamped, corresponding to the time an action was performed, wherever applicable. In the FAIRY framework, each path in this interaction graph that connects the user (Alice) to a feed item (the post on bombs) corresponds to a potential explanation for that item, provided that each edge $e$ on the path has timestamp $\tau(e) < \tau_f$, where $\tau_f$ is the time the feed item was seen by the user. Two possible explanation paths (out of many more) are shown via solid edges in Fig. 1.
The high number of such explanation paths in the HIN demands a subsequent ranking module. We employ a learning-to-rank (LTR) model based on ordinal regression (Joachims, 2006), that models judgments of relevance and surprisal, collected from the same
users who received the feed item. Features used in the LTR models correspond to light-weight estimations of user influence, category specificity, item engagement, path pattern frequency, and so on. These are derived from node and edge weights in the HIN, and are intentionally kept simple, to enable interpretability by the end-user.
We compare FAIRY to three baselines: ESPRESSO (Seufert et al., 2016), REX (Fang et al., 2011), and PRA (Lao and Cohen, 2010), based on different underlying algorithms. These are state-of-the-art methods for computing entity relatedness over knowledge graphs. FAIRY outperforms these baselines on two representative platforms in modeling both relevance and surprisal. Code for FAIRY is available at https://github.com/azinmatin/fairy.
Contributions. This paper’s key contributions are:
the first user-centric framework for discovering and ranking explanation paths between users’ activities on a social network and items in their feed;
models for capturing the subtle aspects of user-relevance and surprisal for such explanations, with learning-to-rank techniques over light-weight features;
extensive experiments including ablation studies, showing systematic improvements over multiple baselines, and identifying vital factors in such models;
a user study conducted over two months involving users on both Quora and Last.fm, providing useful design guidelines for future research on feed analysis.
2. Discovering Explanations
The FAIRY framework uses a heterogeneous information network (HIN) (Sun and Han, 2013; Liang et al., 2016; Yu et al., 2014) to represent a user’s presence and activities on a social network as a graph, and relies on a learning-to-rank (LTR) model to order explanation paths mined from this user-specific interaction graph.
We are given a user $u \in U$, where $U$ is the set of members of a platform $P$, and her feed on $P$. Our goal here is to find and rank the set of explanation relations $E(u, f)$ between $u$ and a feed item $f$. $E(u, f)$ is the set of connections between $u$ and $f$ via $u$’s local neighborhood on $P$. This local neighborhood is initialized using the set of actions $A_u$ that $u$ performs. Examples of such activities are ‘asking a question’ on Quora or ‘loving an album’ on Last.fm. Connections to $f$ via $A_u$ are identified by traversing the vicinity of $u$ in $P$, and recorded in $u$’s interaction graph, $G_u$. Strictly speaking, $G_u$ may not be fully connected. Typically, however, $P$ has an underlying taxonomy $T$ of topics that all entities in the network must belong to, and overlaying $T$ on the vicinities of $u$ and $f$ ensures connectivity in $G_u$. Specifically, a path between any pair of entities may be found by first associating them via the categories they directly belong to, followed by subsequent generalization by traversing higher up in $T$, till a connection is established (see (Kasneci et al., 2009) for an application of this strategy in relationship mining). More formally, each $G_u$ is an instance of a HIN, defined as:
Definition 2.1 (Interaction Graph).
The Interaction Graph of a user $u$ is a directed and weighted multi-graph $G_u = (N, E, \theta, \phi, W_N, W_E, \tau)$, where: $N$ is the set of nodes, $E$ is the set of edges, $\theta$ and $\phi$ are functions mapping nodes and edges to their corresponding types (sets $T_N$ and $T_E$), $W_N$ and $W_E$ are node and edge weight mapping functions respectively, and $\tau$ maps each edge to a timestamp.
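The definition above can be made concrete with a small sketch. The class and method names below (e.g., `InteractionGraph`, `add_edge`) are illustrative assumptions, not FAIRY's actual data structures; the sketch only shows how typed, weighted, timestamped parallel edges and their inverses might be stored.

```python
# Minimal sketch of a user-specific interaction graph (HIN).
# All names here are hypothetical; they mirror the definition's
# components: node/edge types (theta, phi), weights (W_N, W_E),
# and per-edge timestamps (tau).
class InteractionGraph:
    def __init__(self):
        self.node_type = {}    # theta: node -> node type
        self.node_weight = {}  # W_N: node -> non-negative weight
        self.edges = []        # multi-graph: parallel edges allowed

    def add_node(self, name, ntype, weight=1.0):
        self.node_type[name] = ntype
        self.node_weight[name] = weight

    def add_edge(self, src, dst, etype, weight=1.0, ts=None):
        # store the edge and its inverse, enabling bi-directional traversal
        self.edges.append((src, dst, etype, weight, ts))
        self.edges.append((dst, src, "inv_" + etype, weight, ts))

g = InteractionGraph()
g.add_node("Alice", "user")
g.add_node("Health", "category", weight=120)
g.add_edge("Alice", "Health", "follows", ts=5)
```

Storing the inverse edge explicitly (rather than traversing edges backwards) keeps path enumeration a simple forward walk over out-edges.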
The user $u$, her actions $A_u$, and the feed item $f$ are naturally part of $G_u$. Fig. 1 shows a representative interaction graph where $u$ is the leftmost user Alice, and $f$ is the rightmost item marked with a bomb. We now explain each property of this HIN, instantiating them with corresponding features of social networks.
Nodes in $N$ correspond to entities in the social network. Node types are either users or various classes of content (categories, tags, posts, songs, etc.). Via the mapping $\theta$, we have $\theta(n) = t_n$, where $t_n \in T_N$ is one of the types above for node $n$. In Fig. 1, $\theta(\text{Alice}) = user$, and $\theta(\text{Health}) = category$.
An edge $e \in E$ represents a relationship (interchangeably referred to as an action henceforth) between two nodes $n_1, n_2 \in N$. Edges represent the following connections: (i) user-content: these capture engagement or actions by the users on the content in $P$ (in Fig. 1, user Alice $\rightarrow$ category Health), (ii) user-user: they capture social relationships between users (Charlie $\rightarrow$ Sam), and (iii) content-content: these edges capture relationships between content items (Food $\rightarrow$ Organics). Each $e$ is mapped to an edge type $\phi(e) \in T_E$, instantiated depending on the platform (e.g., asks, answers, follows for Quora; $\phi(\text{Charlie} \rightarrow \text{Sam})$ = follows). For each edge in the HIN, we add an opposite edge typed with the inverse relation between the same pair of nodes (e.g., we add Health $\rightarrow$ Alice). This enables bi-directional traversal of the HIN.
Nodes and edges in $G_u$ are associated with non-negative weights. The node weight $W_N(n)$ may reflect node influence, specificity, or engagement, depending upon the entity type. The value of $W_N(n)$ itself may be derived from measurable features of the platform that are visible to $u$. For example, if the post on food was upvoted by $k$ users, $W_N(\text{food post}) = k$. An edge weight $W_E(e)$ counts the number of times the action was performed; e.g., a Last.fm user can scrobble (listen to) a specific song five times ($W_E(e) = 5$).
$G_u$ is a multi-graph, implying that there may exist more than one edge between any two nodes, corresponding to multiple actions. For example, Twitter users can both like and re-tweet a tweet, and users on Last.fm can both scrobble a song and love it.
There exists a unique edge timestamp denoted by $\tau(e)$, which is the time when the corresponding action was performed. This is possibly null, if $e$ has existed since the epoch (category memberships are assumed to be such edges in this work). For edges with $W_E(e) > 1$ (action performed multiple times), we define $\tau(e)$ as the timestamp of the first instance of this action.
The size of the interaction graph is characterized by the eccentricity of $u$, $\epsilon(u)$. This is the greatest graph (geodesic) distance (length of the shortest path) between $u$ and any other node in $G_u$, i.e., $\epsilon(u) = \max_{n \in N} d(u, n)$. In other words, $G_u$ is the $\epsilon(u)$-hop neighborhood of $u$.
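The eccentricity of the user node can be computed with a single breadth-first search, a sketch of which follows. The adjacency-list format and node names are toy assumptions for illustration.

```python
from collections import deque

# Sketch: eccentricity of the user node u, i.e., the greatest
# shortest-path distance from u to any reachable node in the
# (undirected view of the) interaction graph.
def eccentricity(adj, u):
    dist = {u: 0}
    q = deque([u])
    while q:
        n = q.popleft()
        for m in adj.get(n, []):
            if m not in dist:
                dist[m] = dist[n] + 1
                q.append(m)
    return max(dist.values())

# toy chain: Alice - Bob - Charlie - Chemistry
adj = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Charlie"],
    "Charlie": ["Bob", "Chemistry"],
    "Chemistry": ["Charlie"],
}
```

Since inverse edges are materialized in the HIN, the directed graph can be treated as undirected for this purpose.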
We are now in a position to discover the set of explanation relations $E(u, f)$, formally defined below.
Definition 2.2 (Explanation Path).
An Explanation Path $p$ is a path connecting $u$ and $f$ in $G_u$ such that the timestamp of every edge $e$ in $p$ is strictly less than the time $\tau_f$ when $f$ was seen by $u$, i.e., $\tau(e) < \tau_f \; \forall e \in p$.
In Fig. 1, for the pair (Alice, bomb post), a path all of whose edge timestamps precede the time Alice saw the post is a valid explanation path, whereas a path containing even one edge with a later timestamp is invalid. Henceforth, explanations and relations (with or without paths) are used interchangeably, and should be understood as equivalent. An explanation path can thus be a combination of user-content, user-user, and content-content edges. For example, the second explanation outlined in the introduction (via the question on food) is a mixture of user-content and content-content edges, but lacks any user-user connection. Thus, given $u$, $f$, and $G_u$, we extract all explanation paths between $u$ and $f$ from $G_u$ as candidates for further processing.
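The temporal validity constraint from Definition 2.2 can be sketched as a simple filter over candidate paths. The tuple layout of edges and the example node names are assumptions for illustration; `None` marks epoch edges such as category memberships, which are always valid.

```python
# Sketch: keep only candidate paths whose every edge predates tau_f,
# the time the feed item was seen. A path is a list of
# (src, dst, edge_type, timestamp) tuples; a None timestamp marks
# an epoch edge (e.g., category membership), which is always valid.
def valid_explanations(paths, tau_f):
    def ok(path):
        return all(ts is None or ts < tau_f for (_, _, _, ts) in path)
    return [p for p in paths if ok(p)]

p1 = [("Alice", "food-q", "asks", 3),
      ("food-q", "Organics", "belongs_to", None)]
p2 = [("Alice", "Bob", "follows", 9)]
```

Filtering by timestamp before ranking keeps the candidate set causally plausible: no path can "explain" a feed item using an action that happened after the item appeared.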
It can be argued that, since the interaction graph can often be dense, explanations could be better presented as sub-graphs (Seufert et al., 2016) instead. But we prefer paths in this work for the following reasons: (i) sub-graphs are difficult to isolate by influence, especially for dense neighborhoods; (ii) paths (simple, without loops) are atomic units of relationship, and sub-graphs can, in fact, be reduced to a number of constituent paths; and (iii) sub-graphs are harder to interpret for the average user, and make comparative assessments more difficult.
3. Ranking Explanations
Due to the high activity count aggregated by a user over her time as a member of the platform, and due to the richness of the platform itself (allowing a post to have multiple categories, having a detailed directed acyclic graph (DAG) taxonomy, etc.), the usual number of candidate explanation paths is too high to be processed by a user if presented all at once. Measurements from our user study show that the number of such paths can vary from a few thousand to even millions (depending on the graph size determined by $\epsilon(u)$, and the length of the relation path $p$). Thus, it is imperative that we rank these paths and present only a top few, to prevent cognitive overload.
Over the last decade or so, learning-to-rank (LTR) (Cao et al., 2007; Joachims, 2006) has emerged as the de facto framework for supervised ranking in information retrieval and data mining, which motivates its application in FAIRY. LTR has three basic variants: pointwise, pairwise, and listwise, with each type proving beneficial in specific contexts (Bast and Haussmann, 2015; Radlinski and Joachims, 2005).
The common guiding criterion, though, is the nature of gold-label judgments that can be collected (and/or inferred). In our case, we want to model the relevance and surprisal of the explanation paths. Generally, it is difficult for a user to score a standalone explanation path on an absolute scale. At the other extreme, it might be even harder to score a complete list of, say, a heuristically chosen sample of ten paths. Collecting preference judgments is the most natural thing to do in our setting: it is a conceivable task for an end user (an average social media user, here) to rate a path as being more relevant (generally useful as a satisfactory explanation to her), or more surprising (such as discovering a forgotten/unknown connection in her vicinity), than another path connecting her to the same feed item. Collecting such explicit pairwise annotations has been suggested as being cognitively preferable for the user in other contexts like document relevance assessments (Carterette et al., 2008). In a similar vein, we use a pairwise learning-to-rank model based on ordinal regression, which directly makes use of the users’ preference judgments on pairs of explanations (Joachims, 2006). We used a linear kernel, as it is very fast to train and has been shown to be highly effective in ranking result pages for search queries (Schuhmacher et al., 2015).
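The pairwise idea can be illustrated with a minimal sketch: each judgment "path $a$ preferred over path $b$" becomes the constraint $w \cdot (x_a - x_b) > 0$ on a linear scoring function. The sketch below fits $w$ with a simple perceptron update on difference vectors; the actual system uses an SVM-based ordinal-regression ranker, so this is only a toy stand-in with hypothetical features.

```python
# Minimal sketch of pairwise learning-to-rank: each preference
# (xa preferred over xb) yields the constraint w . (xa - xb) > 0.
# We fit w with a perceptron-style update; an SVM with a linear
# kernel optimizes a margin-based variant of the same objective.
def train_pairwise(prefs, dim, epochs=50):
    w = [0.0] * dim
    for _ in range(epochs):
        for xa, xb in prefs:  # xa belongs to the preferred path
            d = [a - b for a, b in zip(xa, xb)]
            if sum(wi * di for wi, di in zip(w, d)) <= 0:
                w = [wi + di for wi, di in zip(w, d)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# toy features: (path length, recency); shorter & fresher preferred here
prefs = [([2.0, 0.1], [4.0, 0.9]),
         ([3.0, 0.2], [5.0, 0.8])]
w = train_pairwise(prefs, dim=2)
```

At ranking time, the learned $w$ scores each candidate path independently, so a full ordering falls out of the pairwise training signal.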
3.1. LTR features
The general principles guiding this work are explainability and transparency. This influences the choice of our features in two ways: (i) while the provided explanation should already be insightful to the user, it is not unreasonable to assume that the user could, in turn, want to know what was responsible for a few chosen paths being shown as more surprising or relevant than others. This points towards using simple and interpretable features that can give a naïve user a handle on what was found to be “important” in this context; and (ii) the features should be visible to the user in question: either public, or easily accessible in a few clicks, or by visiting a user-specific URL. Data that only the service provider has access to (e.g., derived measures of user-user similarity), or that requires excessive crawling (e.g., the total number of authors of all posts in a category), are clearly unsuitable for the task at hand. With these guidelines, we define the following sets of features for the FAIRY framework. These are grouped into five sets: (i) user, (ii) category, (iii) item, (iv) path instance, and (v) path pattern. For the first three sets, if there are multiple instances of the same type on a path (two users or three categories), the feature value is averaged over these instances.
3.1.1. User features
We consider two factors for users on explanation paths: (i) user influence, and (ii) user activity. Influence is typically measured on social networks using the number of followers or the link ratio (the ratio of followers to followees, wherever friendship is not mutual) (Srijith et al., 2017). The higher the link ratio, the higher the perceived influence. For activity, we measure the individual types of activity that the user is allowed to perform on the platform (scrobbling tracks, loving tracks, and following other users on Last.fm). More influential and active users may have a discriminative effect on the user’s judgments of relevance.
3.1.2. Category features
(i) Popularity or influence can be estimated for categories too, for example, by counting the number of posts in them, or looking at their total numbers of followers or subscribers. Such aggregates are often made visible by providers, and it is not necessary to actually visit and count the items concerned. (ii) We also consider category specificity (Ramakrishnan et al., 2005), which is reflected by a category’s depth in the category hierarchy (the deeper the level, the more specific the category) and the number of children (sub-categories) it has in the taxonomy (more children implying less specificity). While popularity may influence the user’s inputs in the same way as for users, high category specificity may directly affect the surprisal factor.
3.1.3. Item features
(i) Specificity for items (songs, posts, etc.) may be analogously computed by counting the number of different categories that the item belongs to. (ii) Engagement is an important measure for items, which represents the different actions that have been performed on it (typically the same as the number of users who have interacted with the item). Examples include the number of different listeners of a song on Last.fm, or the number of different answers that a question has on Quora, etc. Engagement may be perceived as analogous to influence or popularity for users and categories in its role in FAIRY.
3.1.4. Path instance features
There are some properties of the explanation path as a whole: these can be understood better by further separating them into path instance and path pattern features. Instance features measure aspects of the specific path in question. These include: (i) Aggregate similarity of the feed item $f$ to the relation path $p$, i.e., $sim(f, p) = \frac{1}{|p| - 1} \sum_{n \in p} sim(f, n)$, where $n$ is any internal node on the path, and $|p|$ is the path length ($|p| - 1$ gives the number of such internal nodes). This normalized similarity function is treated as a plug-in and can be instantiated via embedding-based similarities, or computed locally from $G_u$ itself. The item-path similarity function, in a sense, measures the coherence of the extracted relation path to the recommended item. If it is very low, the path could be quite surprising to the user. (ii) The analogous aggregate similarity of the user $u$ with $p$: $sim(u, p) = \frac{1}{|p| - 1} \sum_{n \in p} sim(u, n)$. This feature models the familiarity of the user with the path as a whole. (iii) Path length $|p|$: The length of an explanation is easily one of the most tangible factors for a user to make decisions on: while shorter paths may imply obvious connections, longer paths may be more surprising. (iv) Path recency: This is a temporal feature, defined with respect to the most recent edge on the path relative to the feed: $rec(p) = \tau_f - \max_{e \in p} \tau(e)$. Highly recent paths will have a low value of this feature; the idea is that relatively newer paths may be more relevant to the user due to freshness, while older paths may have an associated surprise factor. (v) Edge weights: This feature averages the number of times each action on the path has been repeated (e.g., the number of times a user has listened to a specific song). Note that this feature cannot be used for Quora, as none of the actions can be repeated, i.e., the user cannot upvote/follow/ask/answer the same item more than once.
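Three of the instance features above can be sketched directly from the path representation. The tuple layout of edges is a hypothetical choice for illustration, matching the definitions: path length as the number of edges, recency as $\tau_f$ minus the latest edge timestamp, and edge weight as the mean action-repetition count.

```python
# Sketch of three path-instance features, assuming a path is a list of
# (src, dst, edge_type, weight, timestamp) tuples and tau_f is the time
# the feed item was seen. Names are illustrative, not FAIRY's API.
def path_length(path):
    return len(path)  # number of edges on the path

def path_recency(path, tau_f):
    # tau_f minus the most recent edge timestamp: fresher paths score lower
    return tau_f - max(ts for (_, _, _, _, ts) in path)

def avg_edge_weight(path):
    # average number of times each action on the path was repeated
    return sum(w for (_, _, _, w, _) in path) / len(path)

p = [("u", "song", "scrobbles", 5, 2),
     ("song", "artist", "by", 1, 1)]
```

Each feature is a single scalar per path, so the resulting vectors stay small and directly interpretable to the end user.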
3.1.5. Path pattern features
Pattern features drop the concrete node and edge instantiations in $p$ and deal with the underlying sequence of node and edge types instead (e.g., an explanation path in Fig. 1 has a pattern like user −follows→ category −has→ post). We consider the following features: (i) Pattern frequency: This is the average support count of a pattern between any user $u$ and feed item $f$ (frequent patterns may be less surprising). (ii) Pattern confidence: The percentage of $(u, f)$ pairs with at least one observed instance of the pattern. (iii) Edge type counts: Users in our study explicitly mentioned the effect of some edge types on their choice of relevance and surprisal. Thus, to zoom into the effect produced by the aggregate measure of pattern frequency, we also considered the counts of each edge (action) type present on the path as features (#likes, #follows, #belongs-to, etc.).
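Pattern frequency and confidence can be sketched as follows. The mapping from $(u, f)$ pairs to observed patterns and the pattern strings themselves are toy assumptions; the point is only the two aggregate statistics.

```python
from collections import Counter

# Sketch: pattern frequency (average support of a pattern per (u, f)
# pair) and pattern confidence (fraction of pairs with at least one
# instance). `instances` maps each (u, f) pair to the list of path
# patterns observed between them; the data is illustrative.
def pattern_stats(instances, pattern):
    counts = [Counter(pats)[pattern] for pats in instances.values()]
    frequency = sum(counts) / len(counts)
    confidence = sum(1 for c in counts if c > 0) / len(counts)
    return frequency, confidence

instances = {
    ("u1", "f1"): ["U-follows-C-has-P", "U-follows-C-has-P",
                   "U-asks-Q-in-C"],
    ("u2", "f2"): ["U-asks-Q-in-C"],
}
freq, conf = pattern_stats(instances, "U-follows-C-has-P")
```

Frequency reflects how common a pattern is when it does occur, while confidence reflects how widely it occurs across user-item pairs; the two can diverge for patterns concentrated in a few pairs.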
The general intuition here is that the user has clear mental models of relevance and surprisal; the above features are what she sees in her daily interactions with the platform. These are the tangible perceptions that influence her models, and the aim here is to learn how these factors combine to mimic her assessments.
4. User studies
Platforms. Since user feeds are never public, we needed to design user studies from which we could collect gold judgments on explanation paths extracted and ranked by FAIRY. While there is a plethora of social network platforms providing personalized feeds, we chose Quora and Last.fm as they possess the richness that can truly test the full power of our HIN model. To be specific, these platforms have multiple node and edge types, non-obvious feed items, measurable properties for estimating node and edge weights, have millions of members, and have been the subject of previous research (e.g., Quora in (Wang et al., 2013), Last.fm in (Jäschke et al., 2007)). Also, one being a community question-answering site and the other an online music recommender, they represent two completely different application platforms, and are hence ideal for evaluating FAIRY. We use a slightly simplified version of these platforms’ schemas, as shown in Fig. 2.
Users. We hired separate sets of users for Quora and Last.fm, who interacted with the platforms and assessed explanations during May – July 2018. Users had to set up fresh accounts (using real credentials for accountability) so that all their activity could be recorded. Each user had to spend a fixed total number of hours on one platform. This total time was divided into one-hour sessions per day, with a gap of one day between consecutive sessions. Interactions were planned in this staged manner so as to allow the service provider enough time to build up user profiles and generate personalized feeds. The hired users were graduate students of mixed backgrounds who were familiar with Quora and Last.fm. They were paid $10 per one-hour session, for three tasks: (i) interacting with the platform (a minimum number of activities per session from the permissible actions in Fig. 2, with no upper bound, and “natural” behavior recommended), (ii) identifying non-obvious feed items after going through their complete feed (both platforms have a countable feed at a given point of time), and (iii) providing assessments of relevance and surprisal. Their complete activity and feeds on the platforms were recorded. Users were given a random set of ten initial topics to follow on Quora, due to such a requirement by the platform. They were assigned five initial followers from within the study group on Quora and Last.fm.
Judgments. After each session, we updated the interaction graphs of users, selected three non-obvious recommendations per user, and mined explanation paths for these feed items. Path length was restricted to four for Quora and five for Last.fm, which still resulted in a very large number of paths on average per $(u, f)$ pair. Increasing the path length led to an exponential increase in the number of explanations. Pairs of paths were then randomly sampled per feed item, to be assessed by the user on two questions: (i) Which explanation path is more relevant (useful) to you for this feed item? (ii) Which explanation path is more surprising to you for this feed item? The users were allowed to give free-text comments on the reasons motivating their choices. Several of these intuitions and allusions were encoded into our design considerations.
Ethics. To comply with standard ethical guidelines, all participants were informed about the purpose of the study, and that their data was being collected for research purposes. They signed documents to confirm their awareness of the same. Users signed up with their real credentials; no terms of service of the providers were violated over the course of the research. All users deleted their accounts at the conclusion of the study.
5. Evaluation Setup
Table 1: Interaction graph statistics per platform (#activities, number of nodes $|N|$, number of edges $|E|$, and #(u, f) paths).
Datasets. For Quora, we used the pairs of explanation paths evaluated by its users as the gold standard. These explanation paths covered a set of distinct $(u, f)$ pairs, with each explanation path appearing in several preference pairs on average. For Last.fm, we analogously collected evaluated pairs from its users; these paths were extracted as potential explanations for distinct $(u, f)$ pairs. Details on the interaction graphs are in Table 1.
LTR. To run LTR, we divided each dataset into training, development, and test sets. We used the implementation of (Soh et al., 2013) with a linear kernel in all our experiments.
Baselines. FAIRY was compared with three baselines for relationship discovery: ESPRESSO (Seufert et al., 2016), REX (Fang et al., 2011), and PRA (Lao and Cohen, 2010). The goal in ESPRESSO (Seufert et al., 2016) is to find relatedness cores (dense subgraphs) between two sets of query nodes. For this, it first identifies a center, i.e., a node with the highest similarity to both input query sets. It then expands the subgraph by adding other key entities, their context entities, and query context entities. In each step, the entities are selected based on random-walk-based scores. To apply this algorithm, we consider $\{u\}$ and $\{f\}$ as the input query sets. For each path, we compute its score as if it were the output of ESPRESSO. For this, we first find the most similar node on the path to both $u$ and $f$ as the center. We then expand the set of selected nodes by adding their adjacent nodes on the path. When adding each node, we compute its random-walk-based similarity to the adjacent (already selected) node on the path. At the end, we compute the score of each path by averaging the scores of its comprising nodes.
REX (Fang et al., 2011) takes a pair of entities and returns a ranked list of relationship explanations. As in ESPRESSO, the relationships are in the form of subgraphs. The extracted relationships are then ranked based on different classes of measures, which can be used for paths as well. We used all aggregate and distributional measures from the original paper. However, for brevity, we only describe the global distributional measure, as it performed the best. This measure captures the rarity of relations as a signal of interestingness. For this, we compared its value for each explanation path.
For PRA (Lao and Cohen, 2010), we computed scores of paths via pattern-constrained random walks. For instance, a pattern like “user −follows→ post” only allows the random walker to leave a source node of type “user” for nodes of type “post” through edges of type “follows”.
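A pattern-constrained random walk in the spirit of PRA can be sketched as follows. This is a toy simplification: full PRA scores the probability of reaching a specific target node under each pattern, whereas this sketch simply sums the probability mass of walks that complete the pattern. The graph and edge types are assumptions for illustration.

```python
# Sketch of a pattern-constrained random-walk score: at each step the
# walker may only follow edges whose type matches the next element of
# the pattern, and each matching out-edge is taken with probability
# 1 / (number of matching out-edges).
def pra_score(adj, node, pattern):
    if not pattern:
        return 1.0  # pattern fully consumed: walk completed
    etype = pattern[0]
    nexts = [dst for (et, dst) in adj.get(node, []) if et == etype]
    if not nexts:
        return 0.0  # walker is stuck: no edge matches the pattern
    p = 1.0 / len(nexts)
    return sum(p * pra_score(adj, n, pattern[1:]) for n in nexts)

adj = {
    "Alice": [("follows", "Chemistry"), ("follows", "Health")],
    "Chemistry": [("has_post", "bomb-post")],
    "Health": [],
}
```

The multiplication of degree reciprocals along each step is also what produces the short-path bias noted in the results section.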
Metric. We measured the percentage accuracy of each method (or configuration, as applicable) as the ratio of correct predictions to all predictions over pairs of relationship paths.
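The metric reduces to a simple fraction over held-out preference pairs, sketched below with hypothetical scores.

```python
# Sketch of the evaluation metric: the fraction of preference pairs for
# which the model scores the preferred path higher than the other path.
def pairwise_accuracy(scored_pairs):
    # each element is (score_of_preferred_path, score_of_other_path)
    correct = sum(1 for a, b in scored_pairs if a > b)
    return correct / len(scored_pairs)

pairs = [(0.9, 0.4), (0.2, 0.7), (0.6, 0.5), (0.8, 0.1)]
```

With three of the four toy pairs ordered correctly, the accuracy is 0.75.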
6. Results and Insights
6.1. Key findings
Table 2: Accuracy of FAIRY and the baselines ESPRESSO (Seufert et al., 2016), REX (Fang et al., 2011), and PRA (Lao and Cohen, 2010) per platform.
Comparison of FAIRY with baselines. Table 2 compares the accuracy of FAIRY’s relevance and surprisal models with the baselines on both platforms. In all cases, FAIRY significantly outperforms all the baselines (paired $t$-test, $p < 0.05$). Note that all baselines have the same model for both relevance and surprisal, as they try to find either ‘relevant’ or ‘interesting’ relationships. All baselines rely solely on structural properties of the underlying graphs. In ESPRESSO, the scores of cores are affected by the degrees of intermediate nodes. More precisely, on Quora, category nodes have large degrees as they are connected to many other nodes. This affects the scores of paths with many intermediate category nodes. A similar problem occurs in PRA, as it is also based on random walks. Besides, as path scores are computed by multiplying reciprocals of node degrees, PRA is biased toward shorter paths. In REX, we are only able to compare explanation paths with different path patterns. This has a substantial effect on accuracy, as many explanation paths share the same pattern.
User-specific models. To test subjective preferences for relevance and surprisal, we built and evaluated analogous user-specific LTR models. The accuracies of FAIRY’s user-specific models were observed to be higher than those of the aggregate global model for most users (Fig. 3–6). There are, however, a few users whose judgments we could not easily predict. User ids are assigned in descending order of FAIRY accuracy in Fig. 3, and these ids are used as references for comparison across all subsequent figures. We found that FAIRY outperforms the baselines in user models as well. Note, again, that the baselines do not have separate models for relevance and surprisal.
6.2. Analysis and Discussion
Table 3: Ablation results when removing one feature group at a time (e.g., no path pattern features, no path instance features).
Ablation study. To analyze the effect of each feature group on LTR accuracy, we removed one group at a time and retrained the models. The key finding was that no feature group could be removed without affecting the accuracy of at least one model on one of the platforms. For example, while removing the path pattern features does not affect the models’ accuracy on Last.fm, it does affect accuracy on Quora. Therefore, for the sake of consistency, we keep the feature set identical on both platforms. Details of feature group removal are presented in Table 3. To systematically study variations in features, we tested the effects of adding, removing, or replacing single features. For instance, to compute the aggregate similarity of the feed item to the explanation path, we plugged in two different similarity functions: embedding-based similarity and graph-based similarity. In the former, we computed the cosine similarity between the embeddings of categories/tags. To learn the embedding of each category/tag, we first sampled a set of related sentences: on Quora, we sampled questions asked in the concerned category at random, and for Last.fm we treated each tag as a sentence. We then learned the embedding of each sentence using the latent variable generative model of Arora et al. (Arora et al., 2016) and represented the category/tag by the average embedding of its sampled sentences. In the graph-based similarity function, we used the taxonomic distance between categories/tags in the category DAG. Replacing the graph-based similarity with embeddings improved accuracy, which underlines the insufficiency of graph structure alone for capturing the similarity of nodes. We tried several other variations too: adding counts of node types as a new feature, replacing total edge counts with per-edge-type (user-content, user-user, and content-content) counts, or replacing all user activity features with their maximum values (instead of averages) all hurt the accuracy of the Quora relevance model.
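As a rough sketch of the embedding-based variant, the following averages sentence vectors per category and compares categories by cosine similarity. Randomly generated vectors stand in for the learned sentence embeddings of Arora et al., and all names are illustrative:

```python
import numpy as np

def category_embedding(sentence_vectors):
    """Represent a category/tag by the average of its sampled sentence
    embeddings (pre-computed vectors stand in for the embedding model)."""
    return np.mean(sentence_vectors, axis=0)

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-in "sentence embeddings": 50 sampled sentences of dimension 16
# per category, drawn around different regions of the space.
rng = np.random.default_rng(0)
health = category_embedding(rng.normal(size=(50, 16)) + 1.0)
fitness = category_embedding(rng.normal(size=(50, 16)) + 1.0)
finance = category_embedding(rng.normal(size=(50, 16)) - 1.0)

# Categories whose sentences cluster together come out more similar.
assert cosine_similarity(health, fitness) > cosine_similarity(health, finance)
```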
Perturbation analysis. To understand the effect of instances of each node type on users’ judgments, we changed our sampling strategy so that the paths in each pair differ in only one instance; that is, one path can be obtained from the other by perturbing a single instance. For example, Alice → Bob → Health is a user-perturbation of Alice → Jack → Health. We generated perturbed paths for each node type and retrained the LTR models. Table 4 shows that in most cases, random sampling fared better than such perturbed sampling. The only exception is user-perturbed paths on Last.fm. In one-on-one interviews at the end of the study, some users explained their preferences (more relevant) by the presence of specific friends on the paths. User-perturbation created pairs of paths where such friends were present and absent, resulting in clearer preferences and improved modeling. The last row of the table, however, shows that item-perturbations greatly degraded performance. This can be attributed to users’ incomplete knowledge of certain items in the interaction graphs: replacing one unfamiliar item with another has an essentially arbitrary effect on assessments, which worsened model accuracy.
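A minimal sketch of this pair construction (the path encoding and helper names are ours, not the paper’s): starting from a sampled path, replace exactly one instance of the target node type, keeping the anchor user fixed:

```python
import random

def perturb_path(path, node_type, candidates, rng=random.Random(0)):
    """Return a copy of `path` with exactly one node of `node_type`
    replaced by a different candidate instance.  Position 0 (the
    anchor user the explanation starts from) is never perturbed."""
    positions = [i for i, (ntype, _) in enumerate(path)
                 if ntype == node_type and i != 0]
    if not positions:
        return None
    i = rng.choice(positions)
    ntype, old = path[i]
    alternatives = [c for c in candidates if c != old]
    if not alternatives:
        return None
    perturbed = list(path)
    perturbed[i] = (ntype, rng.choice(alternatives))
    return perturbed

path = [("user", "Alice"), ("user", "Jack"), ("category", "Health")]
perturbed = perturb_path(path, "user", ["Bob", "Carol", "Jack"])
# The perturbed path differs from the original in exactly one instance.
assert sum(a != b for a, b in zip(path, perturbed)) == 1
```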
Transitivity of judgments. Relevance and surprisal are subtle factors, and it is worthwhile to investigate the transitivity of users’ assessments (both for understanding and as a sanity check). We therefore extracted all triplets of explanation paths for which we had the users’ judgments on the constituent pairs, and computed the fraction of triplets whose judgments admit a transitive ordering. On a positive note, it turned out that users were largely transitive (consistent) in their judgments on both Quora and Last.fm.
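One plausible way to compute this consistency is sketched below (our reading requires all three pairwise judgments of a triplet; in a strict transitive ordering the within-triplet win counts are {2, 1, 0}, while a preference cycle gives {1, 1, 1}):

```python
from itertools import combinations

def transitive_fraction(preferences):
    """Given pairwise judgments {(a, b): preferred}, report the fraction
    of fully judged triplets whose three judgments admit a transitive
    (strict) ordering rather than a preference cycle."""
    def pref(a, b):
        return preferences.get((a, b)) or preferences.get((b, a))

    items = {x for pair in preferences for x in pair}
    total = consistent = 0
    for a, b, c in combinations(sorted(items), 3):
        judged = [(x, y) for x, y in [(a, b), (b, c), (a, c)] if pref(x, y)]
        if len(judged) < 3:
            continue  # need all three pairs to test transitivity
        total += 1
        wins = {x: 0 for x in (a, b, c)}
        for x, y in judged:
            wins[pref(x, y)] += 1
        # Transitive ordering: win counts {2, 1, 0}; cycle: {1, 1, 1}.
        if sorted(wins.values()) == [0, 1, 2]:
            consistent += 1
    return consistent / total if total else None

prefs = {("p1", "p2"): "p1", ("p2", "p3"): "p2", ("p1", "p3"): "p1",  # transitive
         ("q1", "q2"): "q1", ("q2", "q3"): "q2", ("q1", "q3"): "q3"}  # cycle
assert transitive_fraction(prefs) == 0.5
```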
Surprisal and complexity. To make sure that users were not simply using the complexity (say, length) of a path as a proxy for surprisal, we asked about a third of the users to additionally select the “more complex” path. We noticed that for a substantial fraction of pairs, the more surprising path was in fact the simpler (shorter) one, indicating that surprisal rests on more implicit factors.
Anecdotal examples. Table 5 presents some examples of correct (in blue) and wrong (in red) predictions by FAIRY. In each pair, the first path is the one preferred by the user. The diversity of node and edge types in the correctly predicted pairs shows FAIRY’s ability to learn the underlying factors determining relevance or surprisal. The wrongly predicted pairs provide insights into FAIRY’s shortcomings. For example, the surprisal model does not consider the sensitivity of topics or items; pair #4 is one such case, where the explanation paths reveal sensitive items the users interacted with. Another limitation is that FAIRY does not incorporate background information about the users (such as their nationality), which clearly affects their preferences. For example, on Last.fm, some users mentioned that the presence of ethnic tags (such as Persian, Latin, or Brazilian) influenced their relevance decisions (pair #6).
7. Related Work
Social feeds and transparency. Personalizing social feeds has been the focus of many studies, as the amount of information generated by users’ networks is overwhelming. To increase user engagement, models have been developed that find relevant feed items for users by exploiting their past behavior (Freyne et al., 2010; Hong et al., 2012; Soh et al., 2013; Agarwal et al., 2015). Users, however, are often unaware of the presence of such curation algorithms (Hamilton et al., 2014; Eslami et al., 2015), as service providers generally do not provide insightful explanations. For example, Cotter et al. (Cotter et al., 2017) demonstrate the inadequacy of explanations for feed ranking on Facebook.
Heterogeneous information networks and meta-paths. Because traditional graphs cannot capture complex semantics with different types of entities and relations, heterogeneous information networks (HINs) were introduced to model multiple node and edge types (Sun and Han, 2013). To better analyze such HINs, the meta-path was defined as the pattern of a path, i.e., its sequence of node and edge types. Meta-paths have since been used in applications such as similarity search (Sun et al., 2011; Seyler et al., 2018), relationship discovery (Fang et al., 2011; Kong et al., 2013; Behrens et al., 2018), link prediction (Sun et al., 2012; Zhang et al., 2014, 2018; Dong et al., 2017), and recommendation (Lee et al., 2013; Hu et al., 2018; Liu et al., 2014).
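A meta-path is obtained from a concrete path simply by dropping instances and keeping types, as in this minimal sketch (the path encoding is ours):

```python
def meta_path(path):
    """Reduce a concrete HIN path, encoded as alternating
    (node_type, instance) tuples and edge-type strings, to its
    meta-path: the sequence of node and edge types only."""
    return tuple(step if isinstance(step, str) else step[0] for step in path)

p1 = [("user", "Alice"), "follows", ("user", "Bob"), "answered", ("question", "Q1")]
p2 = [("user", "Dan"), "follows", ("user", "Eve"), "answered", ("question", "Q7")]
# Distinct concrete paths can share one meta-path (one path pattern).
assert meta_path(p1) == meta_path(p2) == ("user", "follows", "user", "answered", "question")
```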
Relationship discovery in knowledge graphs. Finding interesting relationships among graph concepts is too broad an area to do justice to in a short survey; however, mining such connections for knowledge graph entities is a more pertinent sub-problem that has been well studied. The task is either (semi-)supervised, where users are asked for feedback (Behrens et al., 2018), or unsupervised, where heuristic measures of “interestingness” are applied to detect and rank relationships. However, user utility is likely multi-faceted: while we have explored relevance and surprisal, there are probably further facets, such as coherence or complexity. The heuristic measures are normally approximated using topological properties of graphs, such as the specificity and rarity of node/edge/path types (Ramakrishnan et al., 2005; Fang et al., 2011) or the connectivity/reachability of nodes (Lao and Cohen, 2010; Seufert et al., 2016; Liang et al., 2016). These scoring strategies, however, implicitly assume a static topology, and may be of limited use for dynamic interaction graphs, where nodes and edges are added and deleted at each time step and relationships must take temporal constraints into account.
8. Conclusion
We presented FAIRY, a smart user-side framework that presents ranked explanations to users for items in their social feeds. Explanations are represented as relationship paths connecting the user’s own actions to the received feed items. FAIRY was trained and evaluated on data from two real user studies on the popular platforms Quora and Last.fm, where it outperformed three relationship-mining baselines on the task of modeling and predicting which explanations users consider relevant and surprising. The success of FAIRY hinges on two key aspects: (i) a powerful heterogeneous information network representation of the user’s local neighborhood that can capture the complexity of current social media platforms, and (ii) a fast learning-to-rank model that operates with intuitive and interpretable features easily accessible to the user.
FAIRY is the first step towards a general goal of improving transparency through the user’s lens. We plan to implement FAIRY as a browser plug-in for users. Future directions for research would include, among others: (i) better modeling of temporal information in the interaction graph, (ii) further exploiting content-features to build better models of entity similarity, and (iii) understanding effects of the user’s activities across multiple connected platforms.
Acknowledgements. This work was partly supported by the ERC Synergy Grant 610150 (imPACT) and the DFG Collaborative Research Center 1223. We would like to thank Krishna P. Gummadi from MPI-SWS for useful discussions at various stages of this work.
- Personalizing LinkedIn feed. In KDD.
- Investigating ad transparency mechanisms in social media: A case study of Facebook’s explanations. In NDSS.
- A simple but tough-to-beat baseline for sentence embeddings. In ICLR.
- More accurate question answering on Freebase. In CIKM.
- MetaExp: Interactive explanation and exploration of large knowledge graphs. In WWW.
- Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In ESWC.
- Learning to rank: From pairwise approach to listwise approach. In ICML.
- Here or There: Preference judgments for relevance. In ECIR.
- Explaining the news feed algorithm: An analysis of the News Feed FYI blog. In CHI.
- Probabilistic topic models with biased propagation on heterogeneous information networks. In KDD.
- metapath2vec: Scalable representation learning for heterogeneous networks. In KDD.
- Is that a bot running the social media feed? Testing the differences in perceptions of communication quality for a human agent and a bot agent on Twitter. Computers in Human Behavior.
- “I always assumed that I wasn’t really that close to [her]”: Reasoning about invisible algorithms in news feeds. In CHI.
- REX: Explaining relationships between entity pairs. In VLDB.
- Social networking feeds: Recommending items of interest. In RecSys.
- Inverse privacy. CACM.
- A path to understanding the effects of algorithm awareness. In CHI.
- Learning to rank social update streams. In SIGIR.
- Leveraging meta-path based context for top-N recommendation with a neural co-attention model. In KDD.
- Tag recommendations in folksonomies. In PKDD.
- Training linear SVMs in linear time. In KDD.
- STAR: Steiner-tree approximation in relationship graphs. In ICDE.
- Multi-label classification by mining label and instance correlations from heterogeneous information networks. In KDD.
- Relational retrieval using a combination of path-constrained random walks. Machine Learning.
- Random walk inference and learning in a large scale knowledge base. In EMNLP.
- XRay: Enhancing the web’s transparency with differential correlation. In USENIX Security Symposium.
- Sunlight: Fine-grained targeting detection at scale with statistical confidence. In SIGSAC.
- PathRank: Ranking nodes on a heterogeneous graph for flexible hybrid recommender systems. Expert Systems with Applications.
- What links Alice and Bob?: Matching and ranking semantic patterns in heterogeneous networks. In WWW.
- Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. In CIKM.
- MyAdChoices: Bringing transparency and control to online advertising. TWeb.
- Explaining and suggesting relatedness in knowledge graphs. In ISWC.
- Query chains: Learning to rank from implicit feedback. In KDD.
- Discovering informative connection subgraphs in multi-relational graphs. ACM SIGKDD Explorations Newsletter.
- Ranking entities for Web queries through text and knowledge. In CIKM.
- ESPRESSO: Explaining relationships between entity sets. In CIKM.
- An information retrieval framework for contextual suggestion based on heterogeneous information network embeddings. In SIGIR.
- Recommendation for online social feeds by exploiting user response behavior. In WWW.
- Longitudinal modeling of social media with Hawkes process based on users and networks. In ASONAM.
- When will it happen?: Relationship prediction in heterogeneous information networks. In WSDM.
- PathSim: Meta path-based top-k similarity search in heterogeneous information networks. In VLDB.
- Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explorations Newsletter.
- Wisdom in the social crowd: An analysis of Quora. In WWW.
- Personalized entity recommendation: A heterogeneous information network approach. In WSDM.
- Camel: Content-aware and meta-path augmented metric learning for author identification. In WWW.
- Meta-path based multi-network collective link prediction. In KDD.