Introducing Model Inversion Attacks on Automatic Speaker Recognition

01/09/2023
by Karla Pizzi, et al.

Model inversion (MI) attacks make it possible to reconstruct average per-class representations of a machine learning (ML) model's training data. It has been shown that in scenarios where each class corresponds to a different individual, such as face classifiers, this represents a severe privacy risk. In this work, we explore a new application for MI: the extraction of speakers' voices from a speaker recognition system. We present an approach to (1) reconstruct audio samples from a trained ML model and (2) extract intermediate voice feature representations which provide valuable insights into the speakers' biometrics. To this end, we propose an extension of MI attacks which we call sliding model inversion. Sliding MI extends standard MI by iteratively inverting overlapping chunks of the audio samples, thereby leveraging the sequential properties of audio data for enhanced inversion performance. We show that the inverted audio data can be used to generate spoofed audio samples that impersonate a speaker and execute voice-protected commands on highly secured systems on the speaker's behalf. To the best of our knowledge, our work is the first to extend MI attacks to audio data, and our results highlight the security risks resulting from the extraction of biometric data in this setting.
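
The idea of inverting overlapping chunks can be illustrated with a toy sketch. The abstract does not specify the model, the loss, or how overlapping chunks are combined, so everything below is an assumption made for illustration: the "speaker classifier" is a hypothetical linear layer with softmax over fixed-length chunks, standard MI is plain gradient ascent on the log-probability of the target speaker, and each new chunk is warm-started from the samples already inverted in its overlap region. The names invert_chunk, sliding_inversion, hop, and all parameter values are invented here and are not taken from the paper.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(weights, chunk):
    """Toy stand-in for a speaker classifier: linear layer + softmax."""
    logits = [sum(w * x for w, x in zip(row, chunk)) for row in weights]
    return softmax(logits)

def invert_chunk(weights, target, init, steps=200, lr=0.05):
    """Standard MI on one chunk: gradient ascent on log p(target | chunk)."""
    chunk = list(init)
    for _ in range(steps):
        probs = class_probs(weights, chunk)
        for i in range(len(chunk)):
            # d/dx_i log p_target = W[target][i] - sum_k p_k * W[k][i]
            grad = weights[target][i] - sum(
                p * row[i] for p, row in zip(probs, weights)
            )
            # Keep samples in a valid amplitude range (assumed to be [-1, 1]).
            chunk[i] = max(-1.0, min(1.0, chunk[i] + lr * grad))
    return chunk

def sliding_inversion(weights, target, total_len, chunk_len, hop):
    """Invert overlapping chunks left to right. Each chunk starts from the
    samples already inverted in its overlap region (assumed scheme)."""
    signal = [0.0] * total_len
    for start in range(0, total_len - chunk_len + 1, hop):
        init = signal[start:start + chunk_len]
        signal[start:start + chunk_len] = invert_chunk(weights, target, init)
    return signal

# Hypothetical setup: 3 speakers, chunks of 8 samples, 50% overlap.
random.seed(0)
NUM_SPEAKERS, CHUNK_LEN = 3, 8
weights = [
    [random.uniform(-1, 1) for _ in range(CHUNK_LEN)]
    for _ in range(NUM_SPEAKERS)
]
inverted = sliding_inversion(
    weights, target=1, total_len=20, chunk_len=CHUNK_LEN, hop=4
)
```

With hop smaller than chunk_len, consecutive windows share samples, so each inversion step is conditioned on what was reconstructed before it. This is one simple way to exploit the sequential structure of audio; a real attack would target a trained neural speaker recognition model rather than this linear toy.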

research
03/18/2019

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Voice Processing Systems (VPSes), now widely deployed, have been made si...
research
09/19/2023

USED: Universal Speaker Extraction and Diarization

Speaker extraction and diarization are two crucial enabling techniques f...
research
07/11/2019

My lips are concealed: Audio-visual speech enhancement through obstructions

Our objective is an audio-visual model for separating a single speaker f...
research
05/07/2020

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

We propose Cotatron, a transcription-guided speech encoder for speaker-i...
research
02/20/2018

Fitting New Speakers Based on a Short Untranscribed Sample

Learning-based Text To Speech systems have the potential to generalize f...
research
03/25/2022

WaveFuzz: A Clean-Label Poisoning Attack to Protect Your Voice

People are not always receptive to their voice data being collected and ...
research
08/24/2023

WavMark: Watermarking for Audio Generation

Recent breakthroughs in zero-shot voice synthesis have enabled imitating...
