An Initial Investigation for Detecting Partially Spoofed Audio

04/06/2021
by Lin Zhang, et al.

All existing databases of spoofed speech contain attack data that is spoofed in its entirety. In practice, it is entirely plausible that successful attacks can be mounted with utterances that are only partially spoofed. By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances. This hypothesis raises the obvious question: 'Can we detect partially-spoofed audio?' This paper introduces a new database of partially-spoofed data, named PartialSpoof, to help address this question. This new database enables us to investigate and compare the performance of countermeasures on both utterance- and segmental-level labels. Experimental results using the utterance-level labels reveal that the reliability of countermeasures trained to detect fully-spoofed data degrades substantially when they are tested with partially-spoofed data, whereas countermeasures trained on partially-spoofed data perform reliably on both fully- and partially-spoofed utterances. Additional experiments using segmental-level labels show that spotting injected spoofed segments within an utterance is a much more challenging task, even with the latest countermeasure models.
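The relationship between the two label granularities can be illustrated with a minimal Python sketch. It assumes, for illustration only, that an utterance is labeled spoofed if any of its segments is spoofed, that spoofed regions are given as (start, end) intervals in milliseconds, and that segments have a fixed 20 ms length. The function names and the segment length are hypothetical and are not taken from the PartialSpoof database or its tooling.

```python
from typing import List, Tuple

# Hypothetical segment length in milliseconds; an assumption, not the paper's setting.
SEGMENT_MS = 20


def segment_labels(duration_ms: int,
                   spoofed_intervals: List[Tuple[int, int]],
                   segment_ms: int = SEGMENT_MS) -> List[int]:
    """Return one label per fixed-length segment: 1 = spoofed, 0 = bona fide.

    A segment is marked spoofed if it overlaps any spoofed (start, end) interval.
    Integer milliseconds are used to avoid floating-point boundary errors.
    """
    # Ceiling division so a trailing partial segment still gets a label.
    n_segments = (duration_ms + segment_ms - 1) // segment_ms
    labels = []
    for i in range(n_segments):
        start = i * segment_ms
        end = min(start + segment_ms, duration_ms)
        overlaps = any(s < end and e > start for s, e in spoofed_intervals)
        labels.append(1 if overlaps else 0)
    return labels


def utterance_label(seg_labels: List[int]) -> int:
    """Assumed rule: an utterance is spoofed if any of its segments is spoofed."""
    return 1 if any(seg_labels) else 0


if __name__ == "__main__":
    # A 200 ms utterance with a spoofed region injected between 60 ms and 120 ms.
    partial = segment_labels(200, [(60, 120)])
    print(partial, utterance_label(partial))
    # A bona fide utterance and a fully spoofed one, for comparison.
    print(utterance_label(segment_labels(200, [])))
    print(utterance_label(segment_labels(200, [(0, 200)])))
```

Under this assumed rule, a fully spoofed and a partially spoofed utterance share the same utterance-level label, while only the segment-level labels reveal where the spoofed region lies, which is why the segmental task is harder.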
