Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation

04/24/2019
by Nicholas Ruiz, et al.

We propose a variation of the commonly used Word Error Rate (WER) metric for speech recognition evaluation that incorporates the alignment of phonemes in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, we convert spans of adjacent errors into phonemes with word and syllable boundaries and perform a phonetic Levenshtein alignment. The aligned phonemes are recombined into aligned words that adjust the word alignment labels in each error region. We demonstrate that our Phonetically-Oriented Word Error Rate (POWER) yields scores similar to WER, with the added advantages of better word alignments and the ability to capture one-to-many word alignments corresponding to homophonic errors in speech recognition hypotheses. These improved alignments allow us to better trace the impact of Levenshtein error types on downstream tasks such as speech translation.
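The pipeline described above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon `LEXICON` (simplified ARPAbet without stress marks), the helper names, and the majority-vote rule for recombining phonemes into words are all hypothetical. The sketch also omits word and syllable boundary tokens. It computes a word-level Levenshtein alignment and WER, extracts spans of adjacent errors, and then realigns each span at the phoneme level.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# (label, ref_index, hyp_index); labels: C=correct, S=substitution, D=deletion, I=insertion.
Alignment = List[Tuple[str, Optional[int], Optional[int]]]

# Hypothetical toy pronunciation lexicon (simplified ARPAbet, no stress marks).
LEXICON: Dict[str, List[str]] = {
    "recognize": ["R", "EH", "K", "AH", "G", "N", "AY", "Z"],
    "speech": ["S", "P", "IY", "CH"],
    "wreck": ["R", "EH", "K"],
    "a": ["AH"],
    "nice": ["N", "AY", "S"],
    "beach": ["B", "IY", "CH"],
}


def levenshtein_align(ref: Sequence[str], hyp: Sequence[str]) -> Alignment:
    """Minimum-edit alignment as (label, ref_index, hyp_index); None marks a gap."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all of ref[:i]
    for j in range(m + 1):
        dp[0][j] = j  # insert all of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                           dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace, preferring match/substitution, then deletion, then insertion.
    i, j, align = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            align.append(("C" if ref[i - 1] == hyp[j - 1] else "S", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            align.append(("D", i - 1, None))
            i -= 1
        else:
            align.append(("I", None, j - 1))
            j -= 1
    return align[::-1]


def wer(align: Alignment, n_ref: int) -> float:
    """Word error rate: (S + D + I) / number of reference words."""
    return sum(label != "C" for label, _, _ in align) / n_ref


def error_regions(align: Alignment) -> List[Alignment]:
    """Split an alignment into maximal runs of adjacent non-correct operations."""
    regions, current = [], []
    for op in align:
        if op[0] != "C":
            current.append(op)
        elif current:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions


def realign_region(ref_words: List[str], hyp_words: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Phonetically realign one error region (hypothetical recombination rule):
    each hyp word joins the ref word sharing the most aligned phonemes."""
    def to_phones(words: List[str]) -> Tuple[List[str], List[int]]:
        phones, owner = [], []  # owner[k] = index of the word phoneme k came from
        for k, w in enumerate(words):
            phones += LEXICON[w]
            owner += [k] * len(LEXICON[w])
        return phones, owner

    ref_ph, ref_owner = to_phones(ref_words)
    hyp_ph, hyp_owner = to_phones(hyp_words)
    votes = [Counter() for _ in hyp_words]
    for _, r, h in levenshtein_align(ref_ph, hyp_ph):
        if r is not None and h is not None:  # matched or substituted phoneme
            votes[hyp_owner[h]][ref_owner[r]] += 1
    groups: List[List[str]] = [[] for _ in ref_words]  # empty group -> deletion
    unaligned: List[str] = []  # hyp words with no aligned phoneme -> insertions
    for k, counter in enumerate(votes):
        if counter:
            groups[counter.most_common(1)[0][0]].append(hyp_words[k])
        else:
            unaligned.append(hyp_words[k])
    # For simplicity, unaligned insertions are appended at the end of the region.
    return [([w], g) for w, g in zip(ref_words, groups)] + [([], [w]) for w in unaligned]


ref = "recognize speech".split()
hyp = "wreck a nice beach".split()
word_align = levenshtein_align(ref, hyp)
print([label for label, _, _ in word_align])  # ['I', 'I', 'S', 'S']
print(wer(word_align, len(ref)))  # 2.0
for region in error_regions(word_align):
    r_words = [ref[r] for _, r, _ in region if r is not None]
    h_words = [hyp[h] for _, _, h in region if h is not None]
    print(realign_region(r_words, h_words))
    # [(['recognize'], ['wreck', 'a', 'nice']), (['speech'], ['beach'])]
```

On the classic "recognize speech" / "wreck a nice beach" pair, the word-level alignment produces two insertions followed by two substitutions. The phonetic pass instead groups "wreck a nice" under "recognize" and pairs "speech" with "beach". This is the kind of one-to-many homophonic alignment the abstract describes.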
