Learning Multimodal Data Augmentation in Feature Space

12/29/2022
by   Zichang Liu, et al.
14

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.

READ FULL TEXT

page 2

page 5

research
05/03/2023

SeqAug: Sequential Feature Resampling as a modality agnostic augmentation method

Data augmentation is a prevalent technique for improving performance in ...
research
05/11/2021

Cross-Modal Generative Augmentation for Visual Question Answering

Data augmentation is an approach that can effectively improve the perfor...
research
11/17/2020

Multimodal Prototypical Networks for Few-shot Learning

Although providing exceptional results for many computer vision tasks, s...
research
08/03/2022

A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval

Every hour, huge amounts of visual contents are posted on social media a...
research
06/02/2023

Calibrating Multimodal Learning

Multimodal machine learning has achieved remarkable progress in a wide r...
research
03/03/2022

Graph Neural Networks for Multimodal Single-Cell Data Integration

Recent advances in multimodal single-cell technologies have enabled simu...
research
04/17/2018

Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video

We tackle the problem of learning concept classifiers from videos on the...

Please sign up or login with your details

Forgot password? Click here to reset