Incorporating End-to-End Speech Recognition Models for Sentiment Analysis

02/28/2019
by Egor Lakomkin, et al.
Previous work on emotion recognition has demonstrated a synergistic effect when combining several modalities, such as auditory, visual, and transcribed text, to estimate the affective state of a speaker. Among these, the linguistic modality is crucial for evaluating an expressed emotion. However, manually transcribed spoken text is not available as input to a system in practice. We argue that using ground-truth transcriptions during the training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, where the spoken text has to be recognized on the fly and can contain speech recognition mistakes. In this paper, we propose a method of integrating automatic speech recognition (ASR) output with a character-level recurrent neural network for sentiment recognition. In addition, we conduct several experiments investigating sentiment recognition for human-robot interaction in a noise-realistic scenario, which is challenging for ASR systems. We quantify the improvement compared to using only the acoustic modality in sentiment recognition. We demonstrate the effectiveness of this approach on the Multimodal Corpus of Sentiment Intensity (MOSI) by achieving 73.6% accuracy on a binary sentiment classification task, exceeding previously reported results that use only acoustic input. In addition, we set a new state-of-the-art performance on the MOSI dataset (80.4
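
To make the idea of feeding ASR output to a character-level recurrent network more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the architecture from the paper, which this abstract does not describe in detail. The alphabet, the `CharRNN` class, the hidden size, and the example transcripts are all assumptions introduced for this sketch, and the weights are random rather than trained, so the scores carry no meaning beyond showing the data flow.

```python
import math
import random

# Hypothetical character vocabulary; the source does not specify one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz' "
CHAR_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(transcript):
    """Map an ASR hypothesis to character ids, dropping unknown symbols."""
    return [CHAR_TO_ID[ch] for ch in transcript.lower() if ch in CHAR_TO_ID]


class CharRNN:
    """Tiny Elman RNN with a sigmoid output. Weights are random, not trained."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_in = init(len(ALPHABET), hidden_size)   # one row per character
        self.w_rec = init(hidden_size, hidden_size)    # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def predict(self, transcript):
        """Return a score in (0, 1); with trained weights this would be P(positive)."""
        h = [0.0] * self.hidden_size
        for cid in encode(transcript):
            # A one-hot character input reduces to selecting one row of w_in.
            h = [
                math.tanh(
                    self.w_in[cid][j]
                    + sum(h[k] * self.w_rec[k][j] for k in range(self.hidden_size))
                )
                for j in range(self.hidden_size)
            ]
        logit = sum(hj * wj for hj, wj in zip(h, self.w_out))
        return 1.0 / (1.0 + math.exp(-logit))


model = CharRNN()
clean = "i really liked this movie"
noisy = "i really like this move"   # made-up example of ASR errors
print(model.predict(clean), model.predict(noisy))
```

Because the input is a character sequence rather than a word sequence, a misrecognized word such as "move" for "movie" still shares most of its characters with the intended word, so the network sees a similar input instead of an unrelated or out-of-vocabulary token.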

research
10/29/2021

Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

Alongside acoustic information, linguistic features based on speech tran...
research
04/20/2021

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Text encodings from automatic speech recognition (ASR) transcripts and a...
research
04/07/2020

Keywords Extraction and Sentiment Analysis using Automatic Speech Recognition

Automatic Speech Recognition (ASR) is the interdisciplinary subfield of ...
research
07/21/2023

A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

Speech Emotion Recognition (SER) is a challenging task. In this paper, w...
research
10/27/2020

Emotion recognition by fusing time synchronous and time asynchronous representations

In this paper, a novel two-branch neural network model structure is prop...
research
03/01/2022

Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

Multimodal sentiment analysis has attracted increasing attention and lot...
research
02/21/2023

Connecting Humanities and Social Sciences: Applying Language and Speech Technology to Online Panel Surveys

In this paper, we explore the application of language and speech technol...