Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge

04/23/2019
by Jee-weon Jung, et al.

In this study, we concentrate on replacing the process of extracting hand-crafted acoustic features with an end-to-end DNN that uses complementary high-resolution spectrograms. As audio devices advance, the typical characteristics of replayed speech that conventional knowledge relies on change or diminish under unknown replay configurations. It has therefore become increasingly difficult to detect spoofed speech with a conventional knowledge-based approach. To detect unrevealed characteristics that reside in replayed speech, we input spectrograms directly into an end-to-end DNN without knowledge-based intervention. This study differs from existing spectrogram-based systems in two respects: complementary information and high resolution. Spectrograms carrying different information are explored, and it is shown that additional information, such as phase, can be complementary. High-resolution spectrograms are employed on the assumption that the differences between bona fide and replayed speech lie in the details. Additionally, to verify whether other features are complementary to spectrograms, we also examine raw waveforms and an i-vector-based system. Experiments conducted on the ASVspoof 2019 physical access challenge show promising results, where the t-DCF and equal error rate are 0.0570 and 2.45
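To make "complementary high-resolution spectrograms" concrete, here is a minimal NumPy sketch of one way to build a two-channel DNN input from a log-magnitude spectrogram and its phase. The Hann window, the 2048-point FFT and the 256-sample hop are illustrative assumptions, not the configuration reported in the paper. The helper names `stft` and `complementary_spectrograms` are also hypothetical. Raising `n_fft` narrows each frequency bin (16000 / 2048 ≈ 7.8 Hz at 16 kHz, versus 31.25 Hz for a 512-point FFT), which is one plausible reading of "high resolution".

```python
import numpy as np


def stft(signal: np.ndarray, n_fft: int = 2048, hop: int = 256) -> np.ndarray:
    """Complex STFT with a Hann window (assumed settings, not the paper's).

    Signals shorter than one frame are zero-padded; trailing samples that
    do not fill a complete frame are dropped.
    """
    window = np.hanning(n_fft)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided spectrum: shape (n_frames, n_fft // 2 + 1)
    return np.fft.rfft(frames, n=n_fft, axis=1)


def complementary_spectrograms(
    signal: np.ndarray, n_fft: int = 2048, hop: int = 256, eps: float = 1e-10
) -> np.ndarray:
    """Stack log-magnitude and phase as channels: (2, freq_bins, frames)."""
    spec = stft(signal, n_fft, hop)
    log_mag = np.log(np.abs(spec) + eps)  # magnitude channel
    phase = np.angle(spec)                # phase channel, in [-pi, pi]
    return np.stack([log_mag.T, phase.T])


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

feats = complementary_spectrograms(x)
print(feats.shape)  # (2, 1025, 55)

low_res = complementary_spectrograms(x, n_fft=512, hop=256)
print(low_res.shape[1])  # 257 frequency bins
```

The phase channel is kept separate rather than folded into the magnitude so that a convolutional front end can learn how to weight the two. Whether phase actually helps is the empirical question the study examines, and this sketch does not answer it.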
