Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning

11/17/2022
by Brian Testa, et al.

Emotional surveillance is an emerging area with wide-reaching privacy concerns. These concerns are exacerbated by ubiquitous IoT devices whose multiple sensors can support such surveillance. This work considers one such use case: a speech emotion recognition (SER) classifier tied to a smart speaker. It demonstrates that black-box SER classifiers of this kind can be evaded without compromising the utility of the smart speaker, and it frames the privacy concern as a problem of adversarial evasion of machine learning. Our solution, Defeating Acoustic Recognition of Emotion via Genetic Programming (DARE-GP), uses genetic programming to generate non-invasive additive audio perturbations (AAPs). By constraining the evolution of these AAPs, transcription accuracy can be protected while SER classifier performance is degraded. Because the AAPs are additive, and because they are generated for a fixed set of users in a manner independent of both the utterance and the user's location, they support real-time, real-world evasion of SER classifiers. DARE-GP's use of spectral features, which underlie the emotional content of speech, allows AAPs to transfer to previously unseen black-box SER classifiers. Further, DARE-GP outperforms state-of-the-art SER evasion techniques and is robust against defenses employed by a knowledgeable adversary. The evaluations culminate in acoustic tests against two off-the-shelf commercial smart speakers, where a single AAP could evade a black-box classifier over 70% of the time. The final evaluation deployed AAP playback on a small-form-factor system (Raspberry Pi) integrated with a wake-word system, to evaluate the efficacy of a real-world, real-time deployment in which DARE-GP is automatically invoked by the smart speaker's wake word.
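
To make the idea of constrained evolution of an additive perturbation concrete, here is a minimal toy sketch in Python. It is not DARE-GP itself, and every name and number in it is an assumption made for illustration. A plain genetic algorithm over a fixed-length list of tones stands in for the paper's genetic programming, a crude high-frequency energy ratio stands in for a real SER classifier (`toy_ser`), and a perturbation energy budget (`preserves_utility`) stands in for the transcription-accuracy constraint.

```python
import math
import random

# All constants are toy values chosen for this sketch, not taken from the paper.
SAMPLE_RATE = 8000    # Hz
N_SAMPLES = 400       # 50 ms of audio
THRESHOLD = 0.2       # decision boundary of the toy classifier
ENERGY_BUDGET = 0.3   # max perturbation RMS relative to the speech RMS


def tone(freq_hz, amp):
    """Sampled sine wave."""
    step = 2 * math.pi * freq_hz / SAMPLE_RATE
    return [amp * math.sin(step * i) for i in range(N_SAMPLES)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def hf_ratio(x):
    """Crude spectral feature: first-difference energy over signal energy."""
    diff = sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
    return diff / sum(v * v for v in x)


def toy_ser(x):
    """Hypothetical stand-in for an SER classifier."""
    return "high_arousal" if hf_ratio(x) > THRESHOLD else "low_arousal"


def render(aap):
    """Sum the tones of a perturbation, given as a list of (freq_hz, amp) genes."""
    out = [0.0] * N_SAMPLES
    for freq_hz, amp in aap:
        for i, v in enumerate(tone(freq_hz, amp)):
            out[i] += v
    return out


def mix(speech, aap):
    """The perturbation is additive: it is simply summed with the speech."""
    return [s + p for s, p in zip(speech, render(aap))]


def preserves_utility(speech, aap):
    """Stand-in for the transcription constraint: keep the perturbation quiet."""
    return rms(render(aap)) <= ENERGY_BUDGET * rms(speech)


def fitness(speech, aap):
    """Reject constraint violators; otherwise push the feature across the boundary."""
    if not preserves_utility(speech, aap):
        return float("-inf")
    return hf_ratio(mix(speech, aap))


def mutate(aap, rng):
    """Jitter every gene with Gaussian noise, clamped to the allowed ranges."""
    child = []
    for freq_hz, amp in aap:
        freq_hz = min(3800.0, max(100.0, freq_hz + rng.gauss(0, 200)))
        amp = min(0.3, max(0.0, amp + rng.gauss(0, 0.03)))
        child.append((freq_hz, amp))
    return child


def evolve(speech, seed=0, pop_size=30, generations=40, n_tones=2):
    rng = random.Random(seed)
    pop = [[(rng.uniform(100, 3800), rng.uniform(0.0, 0.3)) for _ in range(n_tones)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(speech, a), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda a: fitness(speech, a))


speech = tone(300, 1.0)  # toy "utterance" that the classifier labels low_arousal
best = evolve(speech)
print(toy_ser(speech), "->", toy_ser(mix(speech, best)))
```

The constraint is enforced by rejection inside `fitness`, so any perturbation that exceeds the energy budget can never be selected over one that respects it. A real system would replace `toy_ser` with queries to an actual classifier and `preserves_utility` with a measured transcription check.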

research
02/02/2022

Speaker Normalization for Self-supervised Speech Emotion Recognition

Large speech emotion recognition datasets are hard to obtain, and small ...
research
11/07/2021

Emotional Prosody Control for Speech Generation

Machine-generated speech is characterized by its limited or unnatural em...
research
11/28/2018

Adversarial Machine Learning And Speech Emotion Recognition: Utilizing Generative Adversarial Networks For Robustness

Deep learning has undoubtedly offered tremendous improvements in the per...
research
11/23/2022

Whose Emotion Matters? Speaker Detection without Prior Knowledge

The task of emotion recognition in conversations (ERC) benefits from the...
research
07/01/2017

Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Usually, people talk neutrally in environments where there are no abnorm...
research
01/21/2021

Soft Genetic Programming Binary Classifiers

The study of the classifier's design and its usage is one of the most i...
research
10/24/2020

Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial Perturbations

Mass surveillance systems for voice over IP (VoIP) conversations pose a ...
