Scaling Up Deliberation for Multilingual ASR

10/11/2022
by Ke Hu, et al.

Multilingual end-to-end automatic speech recognition (ASR) models are attractive because they are simple to train and deploy. Recent work on large-scale training of such models has shown promising results compared to monolingual models. However, that work often focuses on the multilingual models themselves in a single-pass setup. In this work, we investigate second-pass deliberation for multilingual speech recognition. Our proposed deliberation is multilingual, i.e., the text encoder encodes hypothesis text from multiple languages, and the decoder attends to both multilingual text and audio. We investigate scaling the deliberation text encoder and decoder, and compare scaling the deliberation decoder against scaling the first-pass cascaded encoder. We show that deliberation improves the average word error rate (WER) on 9 languages by 4% relative compared to the single-pass model. By increasing the size of the deliberation model up to 1B parameters, the average WER improvement increases to 9%. Our deliberation rescorer is based on transformer layers and can be parallelized during rescoring.
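As a reading aid for the WER figures in the abstract, here is a minimal sketch of how a relative improvement in average WER is commonly computed. It assumes an unweighted mean over languages, which the abstract does not confirm. The function names and the per-language values are hypothetical placeholders, not results or code from the paper.

```python
def average_wer(wers):
    """Unweighted mean of per-language WERs, in percent (an assumption)."""
    return sum(wers.values()) / len(wers)


def relative_improvement(baseline, improved):
    """Relative WER reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline - improved) / baseline


# Placeholder per-language WERs (percent). These are made-up values
# for illustration only and are NOT taken from the paper.
first_pass = {"lang_a": 10.0, "lang_b": 20.0, "lang_c": 30.0}
deliberation = {"lang_a": 9.0, "lang_b": 19.0, "lang_c": 29.6}

base_avg = average_wer(first_pass)
delib_avg = average_wer(deliberation)
print(f"average WER: {base_avg:.2f} -> {delib_avg:.2f}")
print(f"relative improvement: {relative_improvement(base_avg, delib_avg):.1f}%")
```

A relative figure is distinct from an absolute one: going from 20.0 to 19.2 average WER is a 0.8-point absolute reduction but a 4% relative reduction.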

