DeepAI AI Chat
Log In Sign Up

M-BERT: Injecting Multimodal Information in the BERT Structure

by   Wasifur Rahman, et al.
University of Rochester
Carnegie Mellon University

Multimodal language analysis is an emerging research area in natural language processing that models language in a multimodal manner. It aims to understand language from the modalities of text, visual, and acoustic by modeling both intra-modal and cross-modal interactions. BERT (Bidirectional Encoder Representations from Transformers) provides strong contextual language representations after training on large-scale unlabeled corpora. Fine-tuning the vanilla BERT model has shown promising results in building state-of-the-art models for diverse NLP tasks like question answering and language inference. However, fine-tuning BERT in the presence of information from other modalities remains an open research problem. In this paper, we inject multimodal information within the input space of BERT network for modeling multimodal language. The proposed injection method allows BERT to reach a new state of the art of 84.38% binary accuracy on CMU-MOSI dataset (multimodal sentiment analysis) with a gap of 5.98 percent to the previous state of the art and 1.02 percent to the text-only BERT.


Multimodal Language Analysis with Recurrent Multistage Fusion

Computational modeling of human multimodal language is an emerging resea...

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input

The ability to model intra-modal and inter-modal interactions is fundame...

Fine-Tuning BERT for Sentiment Analysis of Vietnamese Reviews

Sentiment analysis is an important task in the field ofNature Language P...

Transferring BERT-like Transformers' Knowledge for Authorship Verification

The task of identifying the author of a text spans several decades and w...

Federated pretraining and fine tuning of BERT using clinical notes from multiple silos

Large scale contextual representation models, such as BERT, have signifi...

How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

Bidirectional Encoder Representations from Transformers (BERT) reach sta...

Detecting Insincere Questions from Text: A Transfer Learning Approach

The internet today has become an unrivalled source of information where ...