Telephonetic: Making Neural Language Models Robust to ASR and Semantic Noise

06/13/2019
by Chris Larson, et al.

Speech processing systems rely on robust feature extraction to handle phonetic and semantic variations found in natural language. While techniques exist for desensitizing features to common noise patterns produced by Speech-to-Text (STT) and Text-to-Speech (TTS) systems, the question remains how best to leverage state-of-the-art language models (which capture rich semantic features, but are trained only on written text) on inputs with ASR errors. In this paper, we present Telephonetic, a data augmentation framework that helps make language model features robust to ASR-corrupted inputs. To capture phonetic alterations, we employ a character-level language model trained using probabilistic masking. Phonetic augmentations are generated in two stages: a TTS encoder (Tacotron 2, WaveGlow) and an STT decoder (DeepSpeech). Similarly, semantic perturbations are produced by sampling from nearby words in an embedding space, which is computed using the BERT language model. Words are selected for augmentation according to a hierarchical grammar sampling strategy. Telephonetic is evaluated on the Penn Treebank (PTB) corpus, where it proves effective as a bootstrapping technique for transferring neural language models to the speech domain. Notably, our language model achieves a test perplexity of 37.49 on PTB, which to our knowledge is state-of-the-art among models trained only on PTB.
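
The two text-side ideas mentioned in the abstract, probabilistic masking of characters and semantic perturbation by sampling nearby words in an embedding space, can be illustrated with a minimal sketch. It is not the authors' implementation. The embedding table `EMBEDDINGS` holds made-up toy vectors standing in for BERT-derived embeddings, the function names `nearest_neighbors`, `perturb`, and `mask_characters` are hypothetical, and words are chosen for replacement uniformly at random rather than by the hierarchical grammar sampling strategy the paper describes.

```python
import math
import random

# ASSUMPTION: toy 3-d vectors standing in for BERT-derived word embeddings.
EMBEDDINGS = {
    "cat": (0.9, 0.1, 0.0),
    "dog": (0.8, 0.2, 0.0),
    "kitten": (0.95, 0.05, 0.05),
    "car": (0.0, 0.9, 0.1),
    "truck": (0.05, 0.85, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, k=2):
    """Return the k words closest to `word` by cosine similarity."""
    query = EMBEDDINGS[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def perturb(tokens, rate, rng, k=2):
    """Replace each in-vocabulary token with a nearby word with probability `rate`.

    ASSUMPTION: uniform random selection replaces the paper's hierarchical
    grammar sampling, which is not specified in the abstract.
    """
    out = []
    for tok in tokens:
        if tok in EMBEDDINGS and rng.random() < rate:
            out.append(rng.choice(nearest_neighbors(tok, k)))
        else:
            out.append(tok)
    return out


def mask_characters(text, rate, rng, mask="_"):
    """Independently replace each character with `mask` with probability `rate`."""
    return "".join(mask if rng.random() < rate else ch for ch in text)


if __name__ == "__main__":
    rng = random.Random(0)
    print(perturb("the cat chased the car".split(), rate=0.5, rng=rng))
    print(mask_characters("the cat chased the car", rate=0.15, rng=rng))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible. The phonetic stage (a TTS to STT round trip through Tacotron 2, WaveGlow, and DeepSpeech) is left out because it requires trained audio models.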

