Modular and Parameter-Efficient Multimodal Fusion with Prompting

03/15/2022
by Sheng Liang, et al.

Recent research has made impressive progress in large-scale multimodal pre-training. As model sizes grow rapidly, efficient and flexible alternatives to full fine-tuning become necessary. In this paper, we propose using prompt vectors to align modalities. Our method achieves performance comparable to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient for tasks involving two or more data modalities.
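The core idea of prompt-based fusion can be illustrated with a minimal sketch: a small set of learnable prompt vectors is prepended to the token embeddings of each modality, and only the prompts are trained while the encoders stay frozen. The dimensions, projection matrices, and `fuse` function below are hypothetical placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_prompt = 8, 4  # hypothetical shared embedding size and prompt count

# Frozen unimodal encoders stubbed as fixed linear projections into a
# shared embedding space (illustrative stand-ins, not the real models).
W_img = rng.normal(size=(16, d))
W_txt = rng.normal(size=(12, d))

# In this scheme the prompt vectors are the only trainable parameters.
prompts = rng.normal(size=(n_prompt, d))

def fuse(img_feats, txt_feats, prompts):
    """Prepend learned prompt vectors to the projected modality tokens,
    forming one sequence for a (frozen) multimodal encoder."""
    img_tokens = img_feats @ W_img          # (n_img, d)
    txt_tokens = txt_feats @ W_txt          # (n_txt, d)
    return np.concatenate([prompts, img_tokens, txt_tokens], axis=0)

seq = fuse(rng.normal(size=(3, 16)), rng.normal(size=(5, 12)), prompts)
print(seq.shape)  # (n_prompt + 3 + 5, d) = (12, 8)
```

Because the frozen encoders are untouched, adding a third modality only requires projecting its tokens into the same space and extending the concatenation, which is what makes the approach modular.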

Related research

Efficient Multimodal Fusion via Interactive Prompting (04/13/2023). Large-scale pre-training has brought unimodal fields such as computer vi...

Dynamic Fusion for Multimodal Data (11/10/2019). Effective fusion of data from multiple modalities, such as video, speech...

Multi-stage Pre-training over Simplified Multimodal Pre-training Models (07/22/2021). Multimodal pre-training models, such as LXMERT, have achieved excellent ...

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors (05/31/2018). Multimodal research is an emerging field of artificial intelligence, and...

Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding (12/22/2021). Personality computing and affective computing have gained recent interes...

Training Multimodal Systems for Classification with Multiple Objectives (08/26/2020). We learn about the world from a diverse range of sensory information. Au...

Online Convolutional Dictionary Learning for Multimodal Imaging (06/13/2017). Computational imaging methods that can exploit multiple modalities have ...
