Dataset2Vec: Learning Dataset Meta-Features

05/27/2019
by   Hadi S. Jomaa, et al.

Machine learning tasks such as hyper-parameter optimization for a new dataset or few-shot learning can be greatly accelerated if they are not run from scratch for every new dataset but instead carry over findings from previous runs. Meta-learning makes use of features of a whole dataset, such as its number of instances, its number of predictors, and the means of its predictors. These so-called meta-features (also known as dataset summary statistics or simply dataset characteristics) have so far been hand-crafted, often specifically for the task at hand. More recently, unsupervised dataset encoding models based on variational auto-encoders have succeeded in learning such characteristics, but only for the special case in which all datasets follow the same schema. In this paper we design a novel model, Dataset2Vec, that characterizes a dataset by a latent feature vector computed from batches and can therefore generalize beyond datasets sharing the same schema to arbitrary tabular datasets. To do so, we employ auxiliary learning tasks on batches of datasets, in particular the task of distinguishing batches drawn from different datasets. We show empirically that the meta-features extracted from batches of similar datasets are concentrated within a small region of the latent space, hence preserving similarity. We also show that using the dataset characteristics learned by Dataset2Vec in a state-of-the-art hyper-parameter optimization model outperforms using the hand-crafted meta-features that have been used in the hyper-parameter optimization literature so far. As a result, we advance the current state of the art in hyper-parameter optimization.
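To make the notion of hand-crafted meta-features concrete, the following is a minimal sketch that computes the three statistics named in the abstract (number of instances, number of predictors, and predictor means) for a small tabular dataset. It is not the Dataset2Vec model. The function name `handcrafted_meta_features` and the toy datasets are assumptions introduced purely for illustration.

```python
from statistics import fmean


def handcrafted_meta_features(rows):
    """Summarize a tabular dataset given as a list of equal-length numeric rows.

    Hypothetical helper for illustration only; it is not part of Dataset2Vec.
    """
    if not rows:
        raise ValueError("dataset must contain at least one instance")
    n_predictors = len(rows[0])
    if any(len(row) != n_predictors for row in rows):
        raise ValueError("all rows must have the same number of predictors")
    # Transpose rows into columns so each predictor can be averaged.
    columns = list(zip(*rows))
    return {
        "n_instances": len(rows),
        "n_predictors": n_predictors,
        "predictor_means": [fmean(column) for column in columns],
    }


# Two toy datasets with different schemas (2 vs. 4 predictors).
narrow = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
wide = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]

mf_narrow = handcrafted_meta_features(narrow)
mf_wide = handcrafted_meta_features(wide)

print(mf_narrow)
print(mf_wide)
```

In this sketch the `predictor_means` entry has one value per predictor, so its length changes with the schema. Per-predictor statistics of this kind therefore need some further aggregation before datasets with different schemas can be compared directly.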

research
09/30/2019

Chameleon: Learning Model Initializations Across Tasks With Different Schemas

Parametric models, and particularly neural networks, require weight init...
research
01/22/2020

Optimized Generic Feature Learning for Few-shot Classification across Domains

To learn models or features that generalize across tasks and domains is ...
research
03/30/2021

Conditional Meta-Learning of Linear Representations

Standard meta-learning for representation learning aims to find a common...
research
05/17/2023

MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks

Meta-learning algorithms are able to learn a new task using previously l...
research
04/06/2023

Learning to Learn with Indispensable Connections

Meta-learning aims to solve unseen tasks with few labelled instances. Ne...
research
06/06/2020

Knowledge-Based Learning through Feature Generation

Machine learning algorithms have difficulties to generalize over a small...
research
02/07/2021

Hyperparameter Optimization with Differentiable Metafeatures

Metafeatures, or dataset characteristics, have been shown to improve the...
