kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

by Ahmed El-Kishky, et al.

Candidate generation is the first stage in recommendation systems, where a lightweight system is used to retrieve potentially relevant items for an input user. These candidate items are then ranked and pruned in later stages using a more complex ranking model. Since candidate generation sits at the top of the recommendation funnel, it is important to retrieve a high-recall candidate set to feed into downstream ranking models. A common approach for candidate generation is to leverage approximate nearest neighbor (ANN) search from a single dense query embedding; however, this approach can yield a low-diversity result set with many near duplicates. As users often have multiple interests, candidate retrieval should ideally return a diverse set of candidates reflective of the user's multiple interests. To this end, we introduce kNN-Embed, a general approach to improving diversity in dense ANN-based retrieval. kNN-Embed represents each user as a smoothed mixture over learned item clusters that represent distinct "interests" of the user. By querying each of a user's mixture components in proportion to its mixture weight, we retrieve a high-diversity set of candidates reflecting elements from each of a user's interests. We experimentally compare kNN-Embed to standard ANN candidate retrieval, and show significant improvements in overall recall and improved diversity across three datasets. Accompanying this work, we open-source a large Twitter follow-graph dataset to spur further research in graph mining and representation learning for recommender systems.
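
The retrieval scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the smoothing rule (averaging a user's cluster histogram with those of its nearest users), and the brute-force dot-product search standing in for ANN are all assumptions made for illustration. Item cluster assignments and one query embedding per cluster are assumed to be given.

```python
import numpy as np

def allocate_budget(weights, total):
    """Split `total` candidate slots across clusters in proportion to mixture weights."""
    weights = np.asarray(weights, dtype=float)
    raw = weights / weights.sum() * total
    counts = np.floor(raw).astype(int)
    # Hand leftover slots to the clusters with the largest fractional remainders.
    leftover = total - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts

def smoothed_mixture(user_idx, user_emb, cluster_counts, k=2):
    """Assumed smoothing rule: average a user's normalized cluster histogram
    with the histograms of its k most similar users (dot-product similarity)."""
    totals = np.maximum(cluster_counts.sum(axis=1, keepdims=True), 1)
    hist = cluster_counts / totals
    sims = user_emb @ user_emb[user_idx]
    sims[user_idx] = -np.inf  # exclude the user from its own neighbor list
    neighbors = np.argsort(-sims)[:k]
    mix = np.vstack([hist[user_idx], hist[neighbors]]).mean(axis=0)
    return mix / mix.sum()

def retrieve(query_by_cluster, item_emb, item_cluster, mixture, total):
    """Query each cluster with its own embedding and take a share of the
    candidate budget proportional to that cluster's mixture weight."""
    budget = allocate_budget(mixture, total)
    results = []
    for c, n in enumerate(budget):
        if n == 0:
            continue
        ids = np.flatnonzero(item_cluster == c)
        # Exhaustive scoring stands in for an ANN index in this sketch.
        scores = item_emb[ids] @ query_by_cluster[c]
        # A cluster with fewer than n items simply contributes fewer candidates.
        results.extend(ids[np.argsort(-scores)[:n]].tolist())
    return results
```

Under these assumptions, a user whose smoothed mixture puts weight 0.75 on one cluster and 0.25 on another would receive three of every four candidates from the first cluster and one from the second, rather than all four from the neighborhood of a single query embedding.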




Slate-Aware Ranking for Recommendation

We see widespread adoption of slate recommender systems, where an ordere...

TwERC: High Performance Ensembled Candidate Generation for Ads Recommendation at Twitter

Recommendation systems are a core feature of social media companies with...

Candidate Generation with Binary Codes for Large-Scale Top-N Recommendation

Generating the Top-N recommendations from a large corpus is computationa...

Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation at Twitter

Traditionally, heuristic methods are used to generate candidates for lar...

A Novel User Representation Paradigm for Making Personalized Candidate Retrieval

Candidate retrieval is a crucial part in recommendation system, where qu...

Representation Online Matters: Practical End-to-End Diversification in Search and Recommender Systems

As the use of online platforms continues to grow across all demographics...

Learning Multi-Stage Multi-Grained Semantic Embeddings for E-Commerce Search

Retrieving relevant items that match users' queries from billion-scale c...
