Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models

06/21/2019
by   Ke Hu, et al.
Contextual automatic speech recognition, i.e., biasing recognition towards a given context (e.g., a user's playlists or contacts), is challenging in end-to-end (E2E) models. Such models maintain a limited number of candidates during beam-search decoding, and have been found to recognize rare named entities poorly. The problem is exacerbated when biasing towards proper nouns in foreign languages, e.g., geographic location names, which are virtually unseen in training and are thus out-of-vocabulary (OOV). While grapheme or wordpiece E2E models might have a difficult time spelling OOV words, phonemes are more acoustically salient, and past work has shown that E2E phoneme models can better predict such words. In this work, we propose an E2E model containing both English wordpieces and phonemes in the modeling space, and perform contextual biasing of foreign words at the phoneme level by mapping pronunciations of foreign words into similar English phonemes. In experimental evaluations, we find that the proposed approach performs 16% better than a grapheme-only biasing model, and 8% better than a wordpiece-only biasing model, on a foreign place name recognition task, with only slight degradation on regular English tasks.
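
To make the idea in the abstract concrete, the following is a minimal sketch of phoneme-level biasing, not the authors' implementation. It assumes a hand-written, hypothetical table `FOREIGN_TO_ENGLISH` that maps foreign phoneme symbols to similar English ones, and a simple reward proportional to how much of a biasing phrase the end of a beam-search hypothesis has matched so far. The phoneme symbols, the function names, and the `bonus` parameter are all illustrative assumptions.

```python
# Hypothetical mapping from foreign phoneme symbols to acoustically similar
# English phonemes. The entries are illustrative, not taken from the paper.
FOREIGN_TO_ENGLISH = {
    "R": "r",  # uvular r -> English r (assumed)
    "y": "u",  # front rounded vowel -> English u (assumed)
}


def map_to_english(phonemes):
    """Replace each foreign phoneme with its English stand-in; keep the rest."""
    return [FOREIGN_TO_ENGLISH.get(p, p) for p in phonemes]


def longest_match(hypothesis, phrase):
    """Length of the longest hypothesis suffix that equals a prefix of phrase."""
    max_len = min(len(hypothesis), len(phrase))
    for n in range(max_len, 0, -1):
        if hypothesis[-n:] == phrase[:n]:
            return n
    return 0


def bias_score(hypothesis, bias_phrases, bonus=1.0):
    """Reward a hypothesis for partially matching any biasing phrase."""
    best = max((longest_match(hypothesis, p) for p in bias_phrases), default=0)
    return bonus * best


# A foreign place name given as foreign phonemes, mapped into English phonemes.
foreign_pronunciation = ["k", "R", "e", "t", "E", "j"]
bias_phrase = map_to_english(foreign_pronunciation)

# A partial beam-search hypothesis whose last three phonemes begin the phrase.
partial_hypothesis = ["t", "u", "k", "r", "e"]
score = bias_score(partial_hypothesis, [bias_phrase])
```

In this sketch the score would be added to the model's log-probability for the hypothesis during beam search, so that candidates heading towards a biasing phrase are less likely to be pruned. A production system would more likely compile the biasing phrases into a finite-state structure than scan them one by one.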

research
08/07/2018

Deep context: end-to-end contextual speech recognition

In automatic speech recognition (ASR) what a user says depends on the pa...
research
05/02/2020

A language score based output selection method for multilingual speech recognition

The quality of a multilingual speech recognition system can be improved ...
research
08/10/2020

Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition

Subwords are the most widely used output units in end-to-end speech reco...
research
12/14/2019

Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities

We study the effectiveness of several techniques to personalize end-to-e...
research
10/18/2022

Personalization of CTC Speech Recognition Models

End-to-end speech recognition models trained using joint Connectionist T...
research
07/12/2022

End-to-end speech recognition modeling from de-identified data

De-identification of data used for automatic speech recognition modeling...
research
12/08/2017

Building competitive direct acoustics-to-word models for English conversational speech recognition

Direct acoustics-to-word (A2W) models in the end-to-end paradigm have re...
