Multimodal Dataset Distillation for Image-Text Retrieval

08/15/2023
by Xindi Wu, et al.

Dataset distillation methods offer the promise of reducing a large-scale dataset down to a significantly smaller set of (potentially synthetic) training examples that preserve sufficient information for training a new model from scratch. So far, dataset distillation methods have been developed for image classification. However, with the rise in capabilities of vision-language models, and especially given the scale of the datasets needed to train them, the time is ripe to expand dataset distillation beyond image classification. In this work, we take the first steps toward this goal by extending the idea of trajectory matching to create a distillation method for vision-language datasets. The key challenge is that vision-language datasets do not have a set of discrete classes. To overcome this, our proposed multimodal dataset distillation method jointly distills the images and their corresponding language descriptions in a contrastive formulation. Since there are no existing baselines, we compare our approach to three coreset selection methods (strategic subsampling of the training dataset), which we adapt to the vision-language setting. We demonstrate significant improvements on the challenging Flickr30K and COCO retrieval benchmarks: the best coreset selection method, which selects 1000 image-text pairs for training, achieves only 5.6% image-to-text retrieval accuracy (recall@1); our dataset distillation approach almost doubles that with just 100 (an order of magnitude fewer) training pairs.
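The abstract names two ingredients: a trajectory-matching objective and a contrastive image-text loss. Below is a minimal PyTorch sketch of how they might fit together, not the paper's implementation: the encoders are reduced to a single linear head, the distilled "images" and "captions" live directly in feature space, and all names, sizes, step counts, and learning rates are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric contrastive loss over a batch of paired image/text embeddings.

    A stand-in for the contrastive formulation mentioned in the abstract;
    the paper's exact objective may differ.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau        # (B, B) pairwise similarities
    labels = torch.arange(logits.size(0))       # i-th image matches i-th text
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# Learnable distilled data: 100 synthetic image/text feature pairs
# (illustrative size matching the "100 training pairs" in the abstract).
syn_img = torch.randn(100, 512, requires_grad=True)
syn_txt = torch.randn(100, 512, requires_grad=True)

# Hypothetical expert checkpoints: parameters at step t and step t+M of a
# model trained on the full dataset (random stand-ins here).
W_start = torch.randn(512, 512)
W_target = torch.randn(512, 512)

opt = torch.optim.Adam([syn_img, syn_txt], lr=1e-3)
for _ in range(10):                             # outer distillation steps
    W = W_start.clone().requires_grad_(True)
    # Inner loop: train a toy "student" (one linear image head) on the
    # distilled pairs, keeping the graph so gradients reach the synthetic data.
    for _ in range(5):
        loss = clip_style_loss(syn_img @ W, syn_txt)
        (grad,) = torch.autograd.grad(loss, W, create_graph=True)
        W = W - 0.1 * grad
    # Trajectory matching: make the student's parameter change mimic the
    # expert's, normalized by the expert's own displacement.
    match = (W - W_target).pow(2).sum() / (W_start - W_target).pow(2).sum()
    opt.zero_grad()
    match.backward()
    opt.step()
```

In the paper itself, the distilled data would be actual images and text passed through full vision and language encoders; the sketch only illustrates the bi-level structure, with an inner loop trained on the distilled pairs via the contrastive loss and an outer loop that updates those pairs to match the expert trajectory.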


