An End-to-end Approach for Lexical Stress Detection based on Transformer

11/06/2019
by Yong Ruan, et al.

The dominant approach to automatic lexical stress detection splits an utterance into syllable segments using the phoneme sequence and its time-aligned boundaries, extracts features from each syllable, and then applies a classifier to predict the lexical stress. However, the time boundaries of each phoneme cannot be obtained very accurately, and features must be hand-designed over the syllable segments. We therefore propose an end-to-end approach that uses a Transformer sequence-to-sequence model to estimate lexical stress. We train the Transformer model on audio feature sequences paired with their phoneme sequences annotated with lexical stress marks. During recognition, the recognized phoneme sequence is constrained to match the original standard phoneme sequence without lexical stress marks, while the lexical stress mark of each phoneme is left unconstrained. We train the model on different subsets of LibriSpeech and perform lexical stress recognition on the TIMIT and L2-ARCTIC datasets. For all subsets, the end-to-end model performs better than the syllable-segment classification method. Our method can achieve a 6.36 rate in other studies.
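The constrained recognition step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ARPAbet-style tokens in which vowels carry a stress digit (0, 1 or 2), uses greedy decoding rather than whatever search the paper uses, and takes a hypothetical `score_fn(prefix)` callable as a stand-in for the Transformer decoder, returning a log-probability for each candidate token given the tokens emitted so far.

```python
import math

# Assumption: ARPAbet-style labels, where only vowels carry a stress digit
# (0 = unstressed, 1 = primary, 2 = secondary). The abstract does not name
# the phoneme inventory.
STRESS_MARKS = ("0", "1", "2")
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def allowed_tokens(phoneme):
    """Tokens permitted at a position whose canonical phoneme is known."""
    if phoneme in VOWELS:
        # The phoneme identity is fixed; the stress mark is left free.
        return [phoneme + mark for mark in STRESS_MARKS]
    return [phoneme]


def constrained_decode(canonical, score_fn):
    """Greedy decoding restricted to the canonical phoneme sequence.

    canonical: phonemes without stress marks, e.g. ["R", "EH", "K"].
    score_fn:  hypothetical stand-in for the decoder; maps the list of
               tokens emitted so far to a dict {token: log_probability}.
    """
    output = []
    for phoneme in canonical:
        scores = score_fn(output)
        candidates = allowed_tokens(phoneme)
        # Tokens the model did not score are treated as impossible.
        best = max(candidates, key=lambda tok: scores.get(tok, -math.inf))
        output.append(best)
    return output


def toy_score_fn(prefix):
    """Made-up scores for illustration only; not real model output."""
    table = [
        {"R": -0.1, "L": -2.0},
        {"EH1": -0.3, "EH0": -1.5, "IH1": -0.05},
        {"T": -0.1, "K": -1.0},
        {"ER0": -0.2, "ER1": -1.2},
        {"D": -0.1},
    ]
    return table[len(prefix)]


if __name__ == "__main__":
    print(constrained_decode(["R", "EH", "K", "ER", "D"], toy_score_fn))
```

In the toy scores, the unconstrained best tokens at the second and third steps are `IH1` and `T`, but the constraint forces the canonical phonemes `EH` and `K`, so only the stress digit on each vowel is chosen by the model.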


research
08/22/2021

A Transformer Architecture for Stress Detection from ECG

Electrocardiogram (ECG) has been widely used for emotion recognition. Th...
research
09/27/2021

Predicting Driver Self-Reported Stress by Analyzing the Road Scene

Several studies have shown the relevance of biosignals in driver stress ...
research
12/29/2020

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

This paper describes two novel complementary techniques that improve the...
research
05/17/2021

MUSER: MUltimodal Stress Detection using Emotion Recognition as an Auxiliary Task

The capability to automatically detect human stress can benefit artifici...
research
10/20/2022

Play It Back: Iterative Attention for Audio Recognition

A key function of auditory cognition is the association of characteristi...
research
02/28/2020

Metaphoric Paraphrase Generation

This work describes the task of metaphoric paraphrase generation, in whi...
research
03/11/2019

The Truth and Nothing but the Truth: Multimodal Analysis for Deception Detection

We propose a data-driven method for automatic deception detection in rea...
