Bilingual End-to-End ASR with Byte-Level Subwords

05/01/2022
by Liuhui Deng, et al.

In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages in a single utterance but may change languages across utterances. We conduct our experiments on English and Mandarin dictation tasks, and we find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative, even with a smaller number of outputs and fewer parameters. We conclude with analysis that indicates directions for further improving multilingual ASR.
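To make the difference between character-level and byte-level outputs concrete, the following is a minimal sketch and not the authors' implementation. It assumes UTF-8 as the byte encoding, and the helper names (`char_tokens`, `byte_tokens`, `decode_bytes`) are hypothetical, introduced only for illustration. It shows that an English string and a Mandarin string share one byte inventory of at most 256 symbols, and that an arbitrary byte sequence is not guaranteed to decode to valid text.

```python
def char_tokens(text: str) -> list[str]:
    """Character-level representation: one output unit per Unicode character."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Byte-level representation: one output unit per UTF-8 byte (values 0..255)."""
    return list(text.encode("utf-8"))


def decode_bytes(tokens: list[int]) -> str:
    """Map a byte sequence back to text; invalid UTF-8 becomes U+FFFD."""
    return bytes(tokens).decode("utf-8", errors="replace")


english = "hello"
mandarin = "你好"

for text in (english, mandarin):
    # ASCII characters take one byte each; these Mandarin characters take three.
    print(text, len(char_tokens(text)), len(byte_tokens(text)))

# A complete byte sequence round-trips to the original text.
print(decode_bytes(byte_tokens(mandarin)))

# Dropping the last byte leaves an incomplete character, so the result
# contains a replacement character instead of the second Mandarin character.
print(decode_bytes(byte_tokens(mandarin)[:-1]))
```

The character-level vocabulary grows with the number of distinct characters in the languages covered, while the byte-level vocabulary stays fixed at the cost of longer output sequences for Mandarin. BPE and BBPE merge frequent character or byte sequences into larger units, which this sketch does not attempt to reproduce.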


research
08/03/2021

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

We study training a single end-to-end (E2E) automatic speech recognition...
research
10/22/2020

A multilingual approach to joint Speech and Accent Recognition with DNN-HMM framework

Human can perform multi-task recognition from speech. For instance, huma...
research
06/05/2022

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Despite the rapid progress in automatic speech recognition (ASR) researc...
research
09/16/2021

Utterance-level neural confidence measure for end-to-end children speech recognition

Confidence measure is a performance index of particular importance for a...
research
06/30/2019

Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition

Pretrained contextual word representations in NLP have greatly improved ...
research
12/07/2020

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

Inspired by SpecAugment – a data augmentation method for end-to-end ASR ...
research
06/06/2023

Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

The challenge of fairness arises when Automatic Speech Recognition (ASR)...
