Improving Massively Multilingual ASR With Auxiliary CTC Objectives

02/24/2023
by William Chen, et al.

Multilingual Automatic Speech Recognition (ASR) models have extended the usability of speech technologies to a wide variety of languages. Given how many languages these models must handle, however, a key to understanding their imbalanced performance across languages is to examine whether the model actually knows which language it should transcribe. In this paper, we introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark, by conditioning the entire model on language identity (LID). We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages, conditioning on the LID predictions of auxiliary tasks. Our experimental results demonstrate the effectiveness of our technique over standard CTC/Attention-based hybrid models. Furthermore, our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4%. Recipes are available at https://github.com/espnet/espnet/tree/master/egs2/fleurs/asr1 .
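
The abstract does not spell out how the auxiliary objectives are combined, so the following is only a minimal sketch of one common arrangement, not the authors' implementation. It assumes that language identity is exposed as a token prepended to each target sequence, and that the training loss interpolates an attention loss with the mean of several CTC losses (for example, one final and some auxiliary ones). The function names `prepend_lid` and `hybrid_loss`, the `[lang]` token format, and the default weight of 0.3 are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def prepend_lid(tokens: Sequence[str], lang: str) -> List[str]:
    """Return the target tokens with a language-ID token placed in front.

    The "[lang]" token format is an assumption made for this sketch.
    """
    return [f"[{lang}]"] + list(tokens)


def hybrid_loss(
    att_loss: float,
    ctc_losses: Sequence[float],
    ctc_weight: float = 0.3,  # hypothetical default, not taken from the paper
) -> float:
    """Interpolate an attention loss with the mean of one or more CTC losses."""
    if not ctc_losses:
        raise ValueError("need at least one CTC loss")
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # Average the final and auxiliary CTC losses so that adding more
    # auxiliary objectives does not change the overall loss scale.
    ctc_mean = sum(ctc_losses) / len(ctc_losses)
    return ctc_weight * ctc_mean + (1.0 - ctc_weight) * att_loss


if __name__ == "__main__":
    print(prepend_lid(["h", "e", "l", "l", "o"], "en_us"))
    print(hybrid_loss(att_loss=2.0, ctc_losses=[1.0, 3.0], ctc_weight=0.5))
```

With a target built this way, any CTC objective trained on it has to emit the language token before the transcript, which is one way an auxiliary task can produce an LID prediction for later layers to condition on.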


