TEASEL: A Transformer-Based Speech-Prefixed Language Model

09/12/2021
by Mehdi Arjmand, et al.

Multimodal language analysis is a burgeoning field of NLP that aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions. In this area, lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models. Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data, which is the case in multimodal language learning. This work proposes a Transformer-Based Speech-Prefixed Language Model called TEASEL to approach the mentioned constraints without training a complete Transformer model. Unlike a conventional language model, TEASEL includes the speech modality as a dynamic prefix alongside the textual modality. This method exploits a conventional pre-trained language model as a cross-modal Transformer model. We evaluated TEASEL on the multimodal sentiment analysis task defined by the CMU-MOSI dataset. Extensive experiments show that our model outperforms unimodal baseline language models by 4% F1-score, outperforms the current multimodal state-of-the-art (SoTA) model by 1% F1-score, and is 72% smaller than the SoTA model.
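The abstract describes prepending a speech-derived prefix to the token embeddings of a frozen or fine-tuned pre-trained language model. Below is a minimal sketch of that idea in PyTorch, not the authors' implementation: it assumes RoBERTa-base as the backbone and a simple linear projection plus pooling as the prefix encoder, and the names `SpeechPrefixedLM`, `speech_dim`, and `prefix_len` are hypothetical. The sketch also pools the prefix to a fixed length for simplicity, whereas the paper's prefix is dynamic.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel

class SpeechPrefixedLM(nn.Module):
    """Sketch of a speech-prefixed LM: speech features are projected into
    the LM's embedding space and prepended to the word embeddings."""

    def __init__(self, speech_dim: int = 768, prefix_len: int = 20):
        super().__init__()
        self.lm = RobertaModel.from_pretrained("roberta-base")
        hidden = self.lm.config.hidden_size
        # Project raw speech features (e.g., wav2vec-style frames) into
        # the LM's hidden space; the paper's prefix encoder may differ.
        self.prefix_proj = nn.Linear(speech_dim, hidden)
        # Fixed-length pooling stands in for the paper's dynamic prefix.
        self.pool = nn.AdaptiveAvgPool1d(prefix_len)

    def forward(self, input_ids, attention_mask, speech_feats):
        # speech_feats: (batch, frames, speech_dim)
        prefix = self.prefix_proj(speech_feats)                     # (B, T, H)
        prefix = self.pool(prefix.transpose(1, 2)).transpose(1, 2)  # (B, P, H)
        # Word embeddings only; positions are added inside the LM forward.
        tok_emb = self.lm.embeddings.word_embeddings(input_ids)     # (B, L, H)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)         # prepend
        prefix_mask = torch.ones(prefix.shape[:2],
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return self.lm(inputs_embeds=inputs_embeds, attention_mask=mask)
```

For the sentiment task, a small regression or classification head on the pooled output would then be fine-tuned on CMU-MOSI while reusing the pre-trained Transformer weights, which is what lets the approach avoid training a complete Transformer from scratch.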


