Language Model Bootstrapping Using Neural Machine Translation For Conversational Speech Recognition

12/02/2019
by   Surabhi Punjabi, et al.

Building conversational speech recognition systems for new languages is constrained by the availability of utterances that capture user-device interactions. Data collection is both expensive and limited by the speed of manual transcription. To address this, we advocate the use of neural machine translation as a data augmentation technique for bootstrapping language models. Machine translation (MT) offers a systematic way of incorporating collections from mature, resource-rich conversational systems that may be available for a different language. However, ingesting raw translations from a general-purpose MT system may not be effective, owing to the presence of named entities, intra-sentential code-switching, and the domain mismatch between the conversational data being translated and the parallel text used for MT training. To circumvent this, we explore the following domain adaptation techniques: (a) sentence-embedding-based data selection for MT training, (b) model finetuning, and (c) rescoring and filtering translated hypotheses. Using Hindi as the experimental testbed, we translate US English utterances to supplement the transcribed collections. We observe a relative word error rate reduction of 7.8-15.6%. Analysis reveals that translation particularly aids the interaction scenarios which are underrepresented in the transcribed data.
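As a rough illustration of technique (a), the sketch below ranks candidate MT training sentences by cosine similarity between their embeddings and the centroid of a small in-domain set, then keeps the top few. This is only one plausible reading of "sentence embedding based data selection": the abstract does not state the selection criterion, the embedding model, or any threshold, so the centroid scoring, the function names, and the three-dimensional toy vectors here are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_in_domain(
    candidates: Sequence[Tuple[str, Vector]],
    in_domain_embeddings: Sequence[Vector],
    top_k: int,
) -> List[str]:
    """Return the top_k candidate sentences closest to the in-domain centroid.

    Assumption: closeness to the centroid is used as the domain-relevance
    score; the paper may use a different criterion.
    """
    center = centroid(in_domain_embeddings)
    scored = [(cosine(embedding, center), text) for text, embedding in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy embeddings (hypothetical values, not produced by any real model).
in_domain = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
pool = [
    ("play some music", [0.9, 0.2, 0.0]),
    ("the committee adjourned the session", [0.0, 0.1, 0.9]),
    ("set an alarm for six", [0.7, 0.3, 0.1]),
]

selected = select_in_domain(pool, in_domain, top_k=2)
print(selected)
```

In practice the embeddings would come from a sentence encoder applied to the parallel corpus and to the conversational data, and the selected sentence pairs would then be used to train or finetune the MT system.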


