Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search
Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This paper extends top-k MIPS to a budgeted setting that asks for the best approximate top-k MIPS given a limit of B computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling, to solve it. Although the design of these sampling schemes naturally supports budgeted top-k MIPS, they suffer from the linear cost of scanning all data points to retrieve the top-k results and from performance degradation when handling negative inputs. This paper makes two main contributions. First, we show that diamond sampling is essentially a combination of wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge sampling is competitive with (and often superior to) diamond sampling for approximating top-k MIPS in both efficiency and accuracy. Second, we propose a series of algorithmic engineering techniques to deploy wedge sampling on budgeted top-k MIPS. Our novel deterministic wedge-based algorithm runs significantly faster than the state-of-the-art methods for budgeted and exact top-k MIPS while maintaining top-5 precision of at least 80
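To make the sampling idea concrete, the following is a minimal sketch of basic randomized wedge sampling for approximate top-k MIPS. It is not the deterministic algorithm the abstract proposes, and it assumes all entries of the data and the query are non-negative (the abstract notes that negative inputs degrade these schemes). The function name `wedge_mips` and the parameters `n_samples` and `n_candidates` are hypothetical; together they play the role of the budget B, split between drawing samples and re-ranking candidates with exact inner products.

```python
import random
from collections import Counter


def wedge_mips(data, query, k, n_samples, n_candidates, seed=0):
    """Approximate top-k MIPS via basic wedge sampling.

    Illustrative sketch only: assumes non-negative data and query entries.
    """
    rng = random.Random(seed)
    n, d = len(data), len(query)

    # Column view of the data and column sums c_j = sum_i x_ij.
    # Both depend only on the data, so they can be precomputed once.
    columns = list(zip(*data))
    col_sums = [sum(col) for col in columns]

    # Dimension weights w_j = q_j * c_j. Sampling a dimension j with
    # probability proportional to w_j, then a point i with probability
    # x_ij / c_j, selects point i with probability proportional to q . x_i.
    dim_weights = [query[j] * col_sums[j] for j in range(d)]
    if sum(dim_weights) <= 0:
        return []  # every inner product is zero; nothing to rank

    counts = Counter()
    for j in rng.choices(range(d), weights=dim_weights, k=n_samples):
        i = rng.choices(range(n), weights=columns[j], k=1)[0]
        counts[i] += 1

    # Spend the rest of the budget on exact inner products for the
    # most frequently sampled points, then keep the best k.
    candidates = [i for i, _ in counts.most_common(n_candidates)]
    scored = [(sum(q * x for q, x in zip(query, data[i])), i)
              for i in candidates]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.1, 0.1]]
    query = [1.0, 1.0]
    print(wedge_mips(data, query, k=1, n_samples=200, n_candidates=2))
```

In this sketch the counters only nominate candidates; the final ranking uses exact inner products, so a larger `n_candidates` trades more computation for higher precision. Selecting the top counters still touches every sampled point, which is the kind of scanning cost the abstract identifies as a bottleneck.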