Discriminate natural versus loudspeaker emitted speech

01/31/2019
by Thanh-Ha Le, et al.

In this work, we address a novel and potentially emerging problem: discriminating natural human voices from those played back by any kind of audio device, in the context of interactions with an in-house voice user interface. The problem has relevant applications in (1) far-field voice interactions with vocal interfaces such as Amazon Echo, Google Home, and Facebook Portal, and (2) replay spoofing attack detection. In the first application, detecting loudspeaker-emitted speech helps avoid false wake-ups and unintended interactions with the devices; in the second, it helps eliminate attacks that involve replaying recordings collected from enrolled speakers. We first collect a real-world dataset under well-controlled conditions containing two classes: recordings of speech spoken directly by numerous people (considered natural speech), and recordings of speech played back from various loudspeakers (considered loudspeaker-emitted speech). From this dataset, we then build prediction models based on Deep Neural Networks (DNNs), for which different combinations of audio features are considered. Experimental results confirm the feasibility of the task: the combination of audio embeddings extracted from the SoundNet and VGGish networks yields a classification accuracy of up to about 90%.
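The feature-combination idea in the abstract can be illustrated with a minimal sketch: per-clip embeddings from two extractors are concatenated and fed to a small binary classifier. Everything below is an assumption made for illustration, not the authors' implementation: the helper names (`fuse_embeddings`, `run_demo`), the one-hidden-layer network, the 256-dimensional stand-in for SoundNet features, and the randomly generated placeholder embeddings, which replace real extractor outputs. Only the 128-dimensional size of VGGish embeddings reflects the published model.

```python
import numpy as np


def fuse_embeddings(soundnet_emb, vggish_emb):
    """Concatenate per-clip embeddings from two extractors along the feature axis."""
    return np.concatenate([soundnet_emb, vggish_emb], axis=1)


def init_mlp(n_in, n_hidden, rng):
    """Create weights for a one-hidden-layer network (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, x):
    """Return hidden activations and P(loudspeaker-emitted) for each clip."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    logits = np.clip(h @ params["W2"] + params["b2"], -30.0, 30.0)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return h, probs.ravel()


def binary_cross_entropy(probs, labels):
    eps = 1e-12
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1.0 - labels) * np.log(1.0 - probs + eps)))


def train_step(params, x, labels, lr):
    """One full-batch gradient-descent step; returns the loss before the update."""
    h, probs = forward(params, x)
    d_logits = ((probs - labels) / x.shape[0])[:, None]
    d_hidden = d_logits @ params["W2"].T
    d_hidden[h <= 0.0] = 0.0  # gradient of ReLU
    params["W2"] -= lr * (h.T @ d_logits)
    params["b2"] -= lr * d_logits.sum(axis=0)
    params["W1"] -= lr * (x.T @ d_hidden)
    params["b1"] -= lr * d_hidden.sum(axis=0)
    return binary_cross_entropy(probs, labels)


def run_demo(n_clips=200, steps=100, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Labels: 0 = natural speech, 1 = loudspeaker-emitted speech.
    labels = rng.integers(0, 2, size=n_clips).astype(float)
    # PLACEHOLDER embeddings: random vectors with a class-dependent shift.
    # Real features would come from the SoundNet and VGGish networks.
    soundnet_emb = rng.normal(size=(n_clips, 256)) + 0.5 * labels[:, None]  # 256 is assumed
    vggish_emb = rng.normal(size=(n_clips, 128)) + 0.5 * labels[:, None]    # VGGish is 128-D
    fused = fuse_embeddings(soundnet_emb, vggish_emb)
    params = init_mlp(fused.shape[1], 32, rng)
    losses = [train_step(params, fused, labels, lr) for _ in range(steps)]
    _, probs = forward(params, fused)
    return fused, losses, probs


if __name__ == "__main__":
    fused, losses, probs = run_demo()
    print("fused feature shape:", fused.shape)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

Concatenation is only one way to combine features; it keeps both embeddings intact and leaves the weighting between them to the classifier. Because the data here is synthetic, the resulting loss values say nothing about the accuracy reported in the abstract.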

research
04/13/2019

Towards Vulnerability Analysis of Voice-Driven Interfaces and Countermeasures for Replay

Fake audio detection is expected to become an important research area in...
research
09/03/2019

Voice Spoofing Detection Corpus for Single and Multi-order Audio Replays

The evolution of modern voice controlled devices (VCDs) in recent years ...
research
05/28/2022

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

Voice-activated systems are integrated into a variety of desktop, mobile...
research
09/01/2020

When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

Automatic speech recognition (ASR) systems have been widely deployed in ...
research
08/29/2018

Replay attack spoofing detection system using replay noise by multi-task learning

In this paper, we propose a spoofing detection system for replay attack ...
research
03/21/2022

Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach

Acoustic sensing has proved effective as a foundation for numerous appli...
research
07/30/2020

Detecting Distrust Towards the Skills of a Virtual Assistant Using Speech

Research has shown that trust is an essential aspect of human-computer i...
