On the Inductive Bias of Word-Character-Level Multi-Task Learning for Speech Recognition

11/28/2018
by Jan Kremer, et al.

End-to-end automatic speech recognition (ASR) commonly transcribes audio signals into sequences of characters, while its performance is evaluated by measuring the word-error rate (WER). This suggests that predicting sequences of words directly may be helpful instead. However, training with word-level supervision can be more difficult due to the sparsity of examples per label class. In this paper, we analyze an end-to-end ASR model that combines word-level and character-level representations in a multi-task learning (MTL) framework. We show that it improves the WER, and we study how the word-level model can benefit from character-level supervision by empirically analyzing the learned inductive preference bias of each model component. We find that by adding character-level supervision, the MTL model interpolates between recognizing more frequent words (preferred by the word-level model) and shorter words (preferred by the character-level model).
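To make the mismatch between character-level transcription and word-level evaluation concrete, the following is a minimal sketch of how WER and a character-level error rate can be computed from a standard edit distance. It is not taken from the paper: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and whitespace tokenization is an assumption made for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref, hyp = "the cat sat", "the cat sad"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

In this toy pair, a single wrong character out of eleven makes one of the three reference words wrong, so a small character-level error becomes a much larger word-level one.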

research
11/23/2020

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

End-to-end automatic speech recognition (ASR) systems are increasingly p...
research
07/27/2022

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

End-to-end speech synthesis models directly convert the input characters...
research
10/31/2018

On The Inductive Bias of Words in Acoustics-to-Word Models

Acoustics-to-word models are end-to-end speech recognizers that use word...
research
03/30/2022

Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

We propose a new method for the calculation of error rates in Automatic ...
research
05/10/2023

Quran Recitation Recognition using End-to-End Deep Learning

The Quran is the holy scripture of Islam, and its recitation is an impor...
research
07/18/2018

Hierarchical Multi Task Learning With CTC

In Automatic Speech Recognition, it is still challenging to learn useful...
research
04/01/2022

Multi-sequence Intermediate Conditioning for CTC-based ASR

End-to-end automatic speech recognition (ASR) directly maps input speech...
