PECOS: Prediction for Enormous and Correlated Output Spaces

10/12/2020
by Hsiang-Fu Yu, et al.

Many challenging problems in modern applications amount to finding relevant results from an enormous output space of potential candidates, whose size can range from millions to billions. Moreover, training data is often limited for the many items in the so-called “long tail” of the output space. Given this inherent paucity of training data for most items, developing machine-learned models that perform well for spaces of this size is challenging. Fortunately, items in the output space are often correlated, which presents an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space. PECOS is a three-phase framework: (i) in the first phase, PECOS organizes the output space using a semantic indexing scheme; (ii) in the second phase, PECOS uses the indexing to narrow down the output space by orders of magnitude using a machine-learned matching scheme; and (iii) in the third phase, PECOS ranks the matched items using a final ranking scheme. The versatility and modularity of PECOS allow for easy plug-and-play of various choices for the indexing, matching, and ranking phases. On a dataset where the output space is of size 2.8 million, PECOS with a neural matcher results in a 10% increase in precision@1 (from 46% to 51.2%) over PECOS with a recursive linear matcher but takes 265x more time to train. We also develop fast real-time inference procedures; for example, inference takes less than 10 milliseconds on the dataset with 2.8 million labels.
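
The three phases described in the abstract can be illustrated with a toy sketch in plain Python. This is not the PECOS implementation or its API: every function name here (`build_index`, `match`, `rank`, `predict`) is hypothetical, the label vectors are made up, a few k-means steps stand in for the semantic indexing scheme, and dot products against cluster centroids and label vectors stand in for the machine-learned matcher and ranker.

```python
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(label_vecs, num_clusters, iters=5):
    """Phase 1 (indexing): group correlated labels with a few k-means steps."""
    # Deterministic initialisation: the first `num_clusters` label vectors.
    centroids = [list(v) for v in label_vecs[:num_clusters]]
    assign = [0] * len(label_vecs)
    for _ in range(iters):
        # Assign each label to its nearest centroid (squared Euclidean distance).
        for i, v in enumerate(label_vecs):
            assign[i] = min(
                range(num_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Recompute each non-empty centroid as the mean of its members.
        members = defaultdict(list)
        for i, c in enumerate(assign):
            members[c].append(label_vecs[i])
        for c, vecs in members.items():
            centroids[c] = [sum(col) / len(vecs) for col in zip(*vecs)]
    clusters = defaultdict(list)
    for i, c in enumerate(assign):
        clusters[c].append(i)
    return centroids, dict(clusters)


def match(x, centroids, beam):
    """Phase 2 (matching): keep the `beam` clusters scoring highest for x.

    Assumption: a centroid dot product replaces the learned matcher.
    """
    order = sorted(range(len(centroids)),
                   key=lambda c: dot(x, centroids[c]), reverse=True)
    return order[:beam]


def rank(x, label_vecs, clusters, matched, top_k):
    """Phase 3 (ranking): score only the labels inside the matched clusters.

    Assumption: a label-vector dot product replaces the learned ranker.
    """
    candidates = [l for c in matched for l in clusters.get(c, [])]
    scored = sorted(candidates, key=lambda l: dot(x, label_vecs[l]), reverse=True)
    return scored[:top_k]


def predict(x, label_vecs, centroids, clusters, beam=1, top_k=2):
    """Run matching then ranking for a single input vector x."""
    return rank(x, label_vecs, clusters, match(x, centroids, beam), top_k)


# Toy output space: four labels, two of them near each axis.
label_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
centroids, clusters = build_index(label_vecs, num_clusters=2)
print(clusters)                                               # {0: [0, 2], 1: [1, 3]}
print(predict([1.0, 0.2], label_vecs, centroids, clusters))   # [0, 2]
```

With `beam=1`, only one of the two clusters is scored in the ranking phase, so the ranker looks at half of the labels. The same pruning idea is what lets the matching phase cut an output space of millions down by orders of magnitude before ranking.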

research
05/07/2019

A Modular Deep Learning Approach for Extreme Multi-label Text Classification

Extreme multi-label classification (XMC) aims to assign to an instance t...
research
10/16/2022

End-to-End Learning to Index and Search in Large Output Spaces

Extreme multi-label classification (XMC) is a popular framework for solv...
research
08/27/2020

Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces

Recent research has proposed neural architectures for solving combinator...
research
08/21/2020

Fine-tune BERT for E-commerce Non-Default Search Ranking

The quality of non-default ranking on e-commerce platforms, such as base...
research
06/06/2023

Towards Memory-Efficient Training for Extremely Large Output Spaces – Learning with 500k Labels on a Single Commodity GPU

In classification problems with large output spaces (up to millions of l...
research
02/24/2017

Rank-to-engage: New Listwise Approaches to Maximize Engagement

For many internet businesses, presenting a given list of items in an ord...
