Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

05/28/2023
by   Yue Xu, et al.

Data-efficient learning has drawn significant attention, especially given the current trend toward large multi-modal models, where dataset distillation can be an effective solution. However, the dataset distillation process itself remains highly inefficient. In this work, we model the distillation problem from an information-theoretic perspective. Observing that severe data redundancy exists in dataset distillation, we argue for placing more emphasis on the utility of the training samples. We propose a family of methods to exploit the most valuable samples, validated by a comprehensive analysis of optimal data selection. The new strategy significantly reduces the training cost and extends a variety of existing distillation algorithms to larger and more diversified datasets; e.g., in some cases only 0.04% of the training data is needed for comparable distillation performance. Moreover, our strategy consistently enhances distillation performance, which may open up new analyses of the dynamics of distillation and of the networks themselves. Our method extends distillation algorithms to much larger-scale and more heterogeneous datasets, e.g. ImageNet-1K and Kinetics-400. Our code will be made publicly available.
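The core idea described above is to score the utility of each real training sample and run an off-the-shelf distillation algorithm only on a small, high-utility subset. The sketch below illustrates that pipeline in a minimal form; the centroid-distance utility proxy, the function names, and the 5% keep ratio are illustrative assumptions, not the paper's actual selection criterion.

```python
# Minimal sketch of "select critical samples, then distill from the subset".
# Assumption: utility is approximated by each sample's feature-space distance
# to its class centroid (a stand-in for the paper's data-utility criterion).
import numpy as np

def select_critical_samples(features, labels, keep_fraction=0.1):
    """Rank samples by a simple utility proxy and keep the top fraction."""
    utilities = np.empty(len(features))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        # Samples far from their class centroid are treated as more informative.
        utilities[idx] = np.linalg.norm(features[idx] - centroid, axis=1)
    n_keep = max(1, int(keep_fraction * len(features)))
    return np.argsort(-utilities)[:n_keep]  # indices of the highest-utility samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 64))      # stand-in for extracted features
    y = rng.integers(0, 10, size=1000)       # stand-in for class labels
    selected = select_critical_samples(X, y, keep_fraction=0.05)
    # A standard distillation method (e.g. gradient or trajectory matching)
    # would then be run on X[selected], y[selected] instead of the full set.
    print(f"Distilling from {len(selected)} of {len(X)} real samples")
```

In this sketch the only change to an existing distillation pipeline is the choice of real data fed into it, which is what makes the selection strategy compatible with a variety of distillation algorithms.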


Related research

10/19/2020 · New Properties of the Data Distillation Method When Working With Tabular Data
Data distillation is the problem of reducing the volume of training data ...

09/29/2022 · Dataset Distillation using Parameter Pruning
The acquisition of advanced models relies on large datasets in many fiel...

08/15/2023 · Multimodal Dataset Distillation for Image-Text Retrieval
Dataset distillation methods offer the promise of reducing a large-scale...

01/13/2023 · A Comprehensive Survey to Dataset Distillation
Deep learning technology has unprecedentedly developed in the last decad...

11/11/2015 · Unifying distillation and privileged information
Distillation (Hinton et al., 2015) and privileged information (Vapnik & ...

11/19/2022 · Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
Dataset distillation methods aim to compress a large dataset into a smal...

02/28/2023 · DREAM: Efficient Dataset Distillation by Representative Matching
Dataset distillation aims to generate small datasets with little informa...
