Detecting Mismatch between Text Script and Voice-over Using Utterance Verification Based on Phoneme Recognition Ranking

03/20/2020
by Yoonjae Jeong, et al.

This study aims to detect mismatches between a text script and its voice-over. To this end, we present a novel utterance verification (UV) method that calculates the degree of correspondence between a voice-over and the phoneme sequence of its script. We found that the phoneme recognition probabilities of exaggerated voice-overs are lower than those of ordinary utterances, but their rankings do not change significantly. The proposed method therefore uses the recognition ranking of each phoneme segment corresponding to the script's phoneme sequence to measure the confidence that a voice-over utterance matches its script. The experimental results show that the proposed UV method outperforms a state-of-the-art cross-modal attention approach for detecting mismatch between speech and transcription.
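
The idea of scoring by rank rather than by probability can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how ranks are aggregated, so the mean reciprocal rank used here is an assumption, as are the function names `phoneme_rank` and `rank_confidence`, the toy posterior values, and the premise that each audio segment is already aligned to one script phoneme.

```python
def phoneme_rank(posteriors, expected):
    """Return the 1-based rank of `expected` among the recognizer's
    posteriors for one segment (1 = most probable phoneme).
    A phoneme missing from `posteriors` is treated as probability 0."""
    target = posteriors.get(expected, 0.0)
    return 1 + sum(1 for p in posteriors.values() if p > target)


def rank_confidence(segments, script_phonemes):
    """Hypothetical confidence score: mean reciprocal rank of the script's
    phonemes over their aligned segments. The aggregation is an assumption
    made for illustration, not the formula from the paper."""
    if len(segments) != len(script_phonemes):
        raise ValueError("one posterior distribution is needed per script phoneme")
    ranks = [phoneme_rank(seg, ph) for seg, ph in zip(segments, script_phonemes)]
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy posteriors (invented values) for a three-phoneme script.
script = ["k", "ae", "t"]
ordinary = [
    {"k": 0.80, "g": 0.15, "t": 0.05},
    {"ae": 0.70, "eh": 0.20, "ah": 0.10},
    {"t": 0.75, "d": 0.20, "k": 0.05},
]
# Exaggerated delivery: the correct phoneme's probability drops,
# but it still ranks first in every segment.
exaggerated = [
    {"k": 0.45, "g": 0.35, "t": 0.20},
    {"ae": 0.40, "eh": 0.35, "ah": 0.25},
    {"t": 0.42, "d": 0.38, "k": 0.20},
]

score_ordinary = rank_confidence(ordinary, script)
score_exaggerated = rank_confidence(exaggerated, script)
score_mismatch = rank_confidence(ordinary, ["g", "ah", "d"])
```

In this toy data the ordinary and exaggerated utterances receive the same score because every script phoneme still ranks first, whereas a script that does not match the audio scores lower. A probability-based score would instead penalize the exaggerated utterance.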


