End-to-End ASR for Code-switched Hindi-English Speech

End-to-end (E2E) models have been explored for large speech corpora and have been found to match or outperform traditional pipeline-based systems in some languages. However, most prior work on end-to-end models uses speech corpora exceeding hundreds or thousands of hours. In this study, we explore end-to-end models for code-switched Hindi-English speech with less than 50 hours of data. We use two specific measures to improve network performance in the low-resource setting: multi-task learning (MTL) and balancing the corpus to deal with the inherent class imbalance problem, i.e., the skewed frequency distribution over graphemes. We compare the results of the proposed approaches with traditional, cascaded ASR systems. While the lack of data adversely affects the performance of end-to-end models, we see promising improvements with MTL and balancing the corpus.
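
The skewed frequency distribution over graphemes mentioned above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the balancing procedure used in the paper, which the abstract does not describe. The function names, the toy transcripts, and the inverse-frequency weighting are all hypothetical, and Unicode code points stand in for graphemes.

```python
from collections import Counter


def grapheme_counts(transcripts):
    """Count symbols across transcripts, ignoring whitespace.

    Assumption: each Unicode code point is treated as one symbol. For
    Devanagari this splits some grapheme clusters (e.g. consonant + matra),
    so a real system would need a proper grapheme segmenter.
    """
    counts = Counter()
    for text in transcripts:
        counts.update(ch for ch in text if not ch.isspace())
    return counts


def inverse_frequency_weights(counts):
    """Return a weight per symbol that is inversely proportional to its count.

    Weights are scaled as total / (num_symbols * count), so rare symbols get
    weights above 1 and frequent symbols get weights below 1.
    """
    total = sum(counts.values())
    num_symbols = len(counts)
    return {sym: total / (num_symbols * c) for sym, c in counts.items()}


# Toy code-switched transcripts (made up for illustration).
transcripts = [
    "मुझे weekend पर movie देखनी है",
    "यह plan अच्छा है",
]

counts = grapheme_counts(transcripts)
weights = inverse_frequency_weights(counts)

# Show the most and least frequent symbols with their weights.
ranked = counts.most_common()
for sym, c in ranked[:3] + ranked[-3:]:
    print(repr(sym), c, round(weights[sym], 3))
```

Weights of this kind could, for example, scale a per-symbol loss or guide resampling of utterances; which of these, if either, matches the paper's approach is not stated in the abstract.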


research
05/04/2022

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

This paper describes the ON-TRAC Consortium translation systems develope...
research
12/17/2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

In this paper, we construct a new Japanese speech corpus called "JTubeSp...
research
05/27/2020

Phone Features Improve Speech Translation

End-to-end models for speech translation (ST) more tightly couple speech...
research
07/06/2021

Kosp2e: Korean Speech to English Translation Corpus

Most speech-to-text (S2T) translation studies use English speech as a so...
research
10/19/2021

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

End-to-end TTS suffers from high data requirements as it is difficult fo...
research
11/26/2019

Convolutional Composer Classification

This paper investigates end-to-end learnable models for attributing comp...
research
12/14/2020

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Accents mismatching is a critical problem for end-to-end ASR. This paper...
