Scalable Neural Data Server: A Data Recommender for Transfer Learning

06/19/2022
by   Tianshi Cao, et al.

The absence of large-scale labeled data in a practitioner's target domain can be a bottleneck to applying machine learning algorithms in practice. Transfer learning is a popular strategy for leveraging additional data to improve downstream performance, but finding the most relevant data to transfer from can be challenging. Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has previously been proposed to address this problem. NDS uses a mixture of experts trained on the data sources to estimate the similarity between each source and the downstream task; as a result, the computational cost to each user grows with the number of sources. To address this scalability issue, we propose Scalable Neural Data Server (SNDS), a large-scale search engine that can theoretically index thousands of datasets to serve relevant ML data to end users. SNDS trains the mixture of experts on intermediary datasets during initialization, and represents both data sources and downstream tasks by their proximity to these intermediary datasets. As such, the computational cost incurred by SNDS users remains fixed as new datasets are added to the server. We validate SNDS on a range of real-world tasks and find that data recommended by SNDS improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside of the natural image setting.
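The core idea in the abstract, representing both data sources and downstream tasks by their proximity to a fixed set of intermediary experts, can be illustrated with a small sketch. This is a hypothetical illustration, not the paper's implementation: the function names, the cosine-similarity ranking, and the toy scores are all assumptions. What it shows is why the per-user cost stays fixed: the task only needs to be scored against the fixed set of experts, regardless of how many sources the server indexes.

```python
import numpy as np

def represent(expert_scores):
    """Represent a source or task by its (normalized) vector of scores
    from experts trained on the fixed intermediary datasets."""
    v = np.asarray(expert_scores, dtype=float)
    return v / (np.linalg.norm(v) + 1e-12)

def recommend(source_reprs, task_repr, k=2):
    """Rank indexed sources by cosine similarity of their intermediary-based
    representation to the downstream task's representation."""
    sims = {name: float(np.dot(r, task_repr)) for name, r in source_reprs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy example: 3 intermediary experts, 3 indexed sources, 1 downstream task.
# Scores are made up; in SNDS they would come from evaluating each expert.
sources = {
    "source_a": represent([0.9, 0.1, 0.2]),
    "source_b": represent([0.1, 0.8, 0.3]),
    "source_c": represent([0.2, 0.2, 0.9]),
}
task = represent([0.85, 0.15, 0.25])
print(recommend(sources, task, k=1))  # source_a aligns best with the task
```

Note that adding a new source to the server only requires scoring it once against the same fixed experts; no per-user work grows with the index size.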


Related research

01/09/2020 | Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data
Transfer learning has proven to be a successful technique to train deep ...

09/28/2020 | Scalable Transfer Learning with Expert Models
Transfer of pre-trained representations can improve sample efficiency an...

08/06/2014 | Scalable Greedy Algorithms for Transfer Learning
In this paper we consider the binary transfer learning problem, focusing...

05/03/2021 | OCTOPUS: Overcoming Performance and Privatization Bottlenecks in Distributed Learning
The diversity and quantity of the data warehousing, gathering data from ...

07/08/2022 | Beyond Transfer Learning: Co-finetuning for Action Localisation
Transfer learning is the predominant paradigm for training deep networks...

04/04/2022 | SHiFT: An Efficient, Flexible Search Engine for Transfer Learning
Transfer learning can be seen as a data- and compute-efficient alternati...

12/15/2020 | *-CFQ: Analyzing the Scalability of Machine Learning on a Compositional Task
We present *-CFQ ("star-CFQ"): a suite of large-scale datasets of varyin...
