Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

11/23/2018
by Yansen Wang, et al.

Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
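
To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of shifting a word embedding with word-aligned nonverbal features: subword-level visual and acoustic sequences are encoded, gated by attention conditioned on the word, and projected into a shift that is added to the original embedding. All module names, feature dimensions, and the exact gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NonverbalShift(nn.Module):
    def __init__(self, word_dim=300, visual_dim=47, acoustic_dim=74, hidden=32):
        super().__init__()
        # Subword-level encoders for the nonverbal streams aligned to one word
        self.visual_rnn = nn.LSTM(visual_dim, hidden, batch_first=True)
        self.acoustic_rnn = nn.LSTM(acoustic_dim, hidden, batch_first=True)
        # Attention gates conditioned on the word embedding (assumed form)
        self.visual_gate = nn.Linear(word_dim + hidden, 1)
        self.acoustic_gate = nn.Linear(word_dim + hidden, 1)
        # Project gated nonverbal summaries into a shift in word-embedding space
        self.shift_proj = nn.Linear(2 * hidden, word_dim)

    def forward(self, word_emb, visual_seq, acoustic_seq):
        # word_emb: (batch, word_dim); *_seq: (batch, time, feat_dim)
        _, (h_v, _) = self.visual_rnn(visual_seq)
        _, (h_a, _) = self.acoustic_rnn(acoustic_seq)
        h_v, h_a = h_v[-1], h_a[-1]                       # (batch, hidden)
        w_v = torch.sigmoid(self.visual_gate(torch.cat([word_emb, h_v], dim=-1)))
        w_a = torch.sigmoid(self.acoustic_gate(torch.cat([word_emb, h_a], dim=-1)))
        shift = self.shift_proj(torch.cat([w_v * h_v, w_a * h_a], dim=-1))
        return word_emb + shift                           # shifted word representation

# Toy usage with random tensors standing in for word-aligned features.
model = NonverbalShift()
word = torch.randn(4, 300)          # GloVe-like word embeddings
visual = torch.randn(4, 20, 47)     # e.g. facial features per video frame
acoustic = torch.randn(4, 20, 74)   # e.g. acoustic features per audio frame
shifted = model(word, visual, acoustic)
print(shifted.shape)                # torch.Size([4, 300])
```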

Related research

Learning Fine-Grained Multimodal Alignment for Speech Emotion Recognition (10/24/2020)
Speech emotion recognition is a challenging task because the emotion exp...

Multi-attention Recurrent Network for Human Communication Comprehension (02/03/2018)
Human face-to-face communication is a complex multimodal signal. We use ...

Learning Multimodal Word Representation via Dynamic Fusion Methods (01/02/2018)
Multimodal models have been proven to outperform text-based models on le...

Evaluating Semantic Rationality of a Sentence: A Sememe-Word-Matching Neural Network based on HowNet (09/11/2018)
Automatic evaluation of semantic rationality is an important yet challen...

Deep contextualized word representations for detecting sarcasm and irony (09/26/2018)
Predicting context-dependent and non-literal utterances like sarcastic a...

Unsupervised Multimodal Language Representations using Convolutional Autoencoders (10/06/2021)
Multimodal Language Analysis is a demanding area of research, since it i...

CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (08/21/2022)
Multimodal sentiment analysis has become an increasingly popular researc...
