Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?

Automatic speech recognition (ASR) is a key technology in many services and applications. Deploying it typically requires user devices to send their speech data to the cloud for ASR decoding. Because the speech signal carries a lot of information about the speaker, this raises serious privacy concerns. As a solution, an encoder residing on each user device may perform local computations to anonymize the representation. In this paper, we focus on protecting speaker identity and study the extent to which users can be recognized from the encoded representation of their speech, as obtained by a deep encoder-decoder architecture trained for ASR. Through speaker identification and verification experiments on the Librispeech corpus with open and closed sets of speakers, we show that the representations obtained from a standard architecture still carry a lot of information about speaker identity. We then propose to use adversarial training to learn representations that perform well in ASR while hiding speaker identity. Our results demonstrate that adversarial training dramatically reduces closed-set classification accuracy, but this does not translate into increased open-set verification error, and hence not into increased protection of speaker identity in practice. We suggest several possible reasons behind this negative result.
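The negative result hinges on two different evaluation measures: closed-set identification accuracy (the fraction of utterances assigned to the correct speaker among a fixed, known set) and open-set verification error, commonly summarized as an equal error rate (EER). The following Python sketch shows how both are typically computed, the first from predicted speaker labels and the second from similarity scores of same-speaker (target) and different-speaker (non-target) trials. It is a minimal illustration under these assumptions; the function names and the simple threshold sweep are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def closed_set_accuracy(predicted: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of utterances whose predicted speaker label matches the true one."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate the EER by sweeping a decision threshold over observed scores.

    A trial is accepted as "same speaker" when its score is >= the threshold.
    The EER is taken where the false acceptance rate (FAR) on non-target
    trials and the false rejection rate (FRR) on target trials are closest;
    we report their mean at that operating point.
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    # Include +inf so the "reject everything" operating point is considered.
    thresholds = sorted(set(target) | set(nontarget)) + [float("inf")]
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in nontarget) / len(nontarget)
        frr = sum(s < t for s in target) / len(target)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    print(closed_set_accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"]))  # 0.75
    print(equal_error_rate([0.2, 0.6, 0.9], [0.1, 0.4, 0.7]))
```

The two numbers answer different questions: accuracy is measured over a fixed set of speakers known to the classifier, whereas EER is measured over verification trials that may involve speakers never seen in training, so a drop in one need not imply a rise in the other.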
