Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems

by   Hadi Abdullah, et al.

Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for modern systems are comprised of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and transferable, and demonstrably achieve mistranscription and misidentification rates as high as 100 Mechanical Turk demonstrating that there is no statistically significant difference between human perception of regular and perturbed audio. Our findings suggest that models may learn aspects of speech that are generally not perceived by human subjects, but that are crucial for model accuracy. We also find that certain English language phonemes (in particular, vowels) are significantly more susceptible to our attack. We show that the attacks are effective when mounted over cellular networks, where signals are subject to degradation due to transcoding, jitter, and packet loss.


page 1

page 9

page 11

page 18


Adversarial Black-Box Attacks for Automatic Speech Recognition Systems Using Multi-Objective Genetic Optimization

Fooling deep neural networks with adversarial input have exposed a signi...

Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition

As an effective method for intellectual property (IP) protection, model ...

SoK: A Study of the Security on Voice Processing Systems

As the use of Voice Processing Systems (VPS) continues to become more pr...

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Voice Processing Systems (VPSes), now widely deployed, have been made si...

Practical Attacks on Voice Spoofing Countermeasures

Voice authentication has become an integral part in security-critical op...

Towards the Transferable Audio Adversarial Attack via Ensemble Methods

In recent years, deep learning (DL) models have achieved significant pro...

Please sign up or login with your details

Forgot password? Click here to reset