Detecting Unintended Memorization in Language-Model-Fused ASR

04/20/2022
by   W. Ronny Huang, et al.
0

End-to-end (E2E) models are often being accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data when one has only black-box (query) access to LM-fused speech recognizer, as opposed to direct access to the LM. On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of singly-occurring canaries from the LM training data of 300M examples is possible. Motivated to protect privacy, we also show that such memorization gets significantly reduced by per-example gradient-clipped LM training without compromising overall quality.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/30/2020

Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

End-to-end automatic speech recognition (ASR) systems, such as recurrent...
research
05/26/2022

Contextual Adapters for Personalized Speech Recognition in Neural Transducers

Personal rare word recognition in end-to-end Automatic Speech Recognitio...
research
03/20/2023

On-the-fly Text Retrieval for End-to-End ASR Adaptation

End-to-end speech recognition models are improved by incorporating exter...
research
11/16/2020

Deep Shallow Fusion for RNN-T Personalization

End-to-end models in general, and Recurrent Neural Network Transducer (R...
research
03/09/2022

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Language model fusion helps smart assistants recognize words which are r...
research
02/21/2022

Adaptive Discounting of Implicit Language Models in RNN-Transducers

RNN-Transducer (RNN-T) models have become synonymous with streaming end-...
research
04/15/2022

Improving Rare Word Recognition with LM-aware MWER Training

Language models (LMs) significantly improve the recognition accuracy of ...

Please sign up or login with your details

Forgot password? Click here to reset