Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach

by Rohit Arora, et al.

Automatic Speaker Verification (ASV) systems have potential in biometric applications for logical access control and authentication, so a great deal is at stake if an ASV system is compromised. This preliminary work presents a comparative analysis of the state-of-the-art wavelet-based and MFCC-based spoof detection techniques developed in (Novoselov et al., 2016) and (Alam et al., 2016a), respectively. The results on ASVspoof 2015 justify our preference for wavelet-based features over MFCC features. The experiments on the ASVspoof 2019 database show that traditional handcrafted features are not reliable, which gives us more reason to move towards end-to-end (E2E) deep neural networks and more recent techniques. We use the Sincnet architecture as our baseline. By replacing the Sinc layer with Wavelet Scattering and Continuous Wavelet Transform layers, we obtain E2E deep learning models, which we call WSTnet and CWTnet, respectively. The fusion model achieved 62 … and our Sincnet baseline when evaluated on the modern spoofing attacks in ASVspoof 2019. However, the final scale distribution and the number of scales used in CWTnet are far from optimal for the task at hand. To solve this problem, we replaced the CWT layer in our CWTnet architecture with a Wavelet Deconvolution (WD) layer (Khan and Yener, 2018). Like the CWT layer, this layer computes the Discrete-Continuous Wavelet Transform, but it also optimizes the scale parameters using back-propagation. The WDnet model achieved 26 … CWTnet and Sincnet models, respectively, when evaluated on the ASVspoof 2019 dataset. This shows that WDnet extracts more generalized features than CWTnet, because it focuses only on the most important and relevant frequency regions.
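The Sinc layer of the Sincnet baseline constrains each convolution kernel to be a band-pass filter fully determined by two cut-off frequencies, which are the only parameters learned per filter. The NumPy sketch below builds one such kernel as the difference of two Hamming-windowed ideal low-pass filters and measures its gain on two test tones. It is an illustration under assumptions, not the authors' implementation: the helper names `sinc_bandpass` and `tone_gain`, the 251-tap length, the 400 to 800 Hz band and the 16 kHz sampling rate are all hypothetical choices.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, fs, length=251):
    """Band-pass kernel defined only by two cut-offs (the learnable parameters of a Sinc layer)."""
    f1, f2 = f1_hz / fs, f2_hz / fs            # cut-offs in cycles per sample
    n = np.arange(length) - (length - 1) / 2   # symmetric time axis centred on 0
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f np.sinc(2 f n) is an ideal low-pass
    # with cut-off f; the difference of two low-passes is a band-pass.
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(length)              # window to reduce ripple from truncation


def tone_gain(h, freq_hz, fs, seconds=0.5):
    """Peak output amplitude of the filter for a unit-amplitude sine (edges excluded)."""
    t = np.arange(int(fs * seconds)) / fs
    y = np.convolve(np.sin(2 * np.pi * freq_hz * t), h, mode="valid")
    return np.max(np.abs(y))


fs = 16000
h = sinc_bandpass(400.0, 800.0, fs)
print(tone_gain(h, 600.0, fs))   # in-band tone: gain close to 1
print(tone_gain(h, 3000.0, fs))  # out-of-band tone: gain close to 0
```

Because only the two cut-offs define each kernel, a Sinc layer needs just two parameters per filter, and every learned filter remains interpretable as a frequency band.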
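The key change from CWTnet to WDnet is that the wavelet scales stop being fixed and become parameters updated by back-propagation. The NumPy sketch below illustrates that idea for a single channel under stated assumptions: it assumes a Ricker ("Mexican hat") mother wavelet, psi_s(t) = 2 / (sqrt(3 s) pi^(1/4)) (1 - t^2/s^2) exp(-t^2 / (2 s^2)), which may differ from the wavelet used by the WD layer of Khan and Yener (2018), and it replaces the spoofing-detection loss with a toy objective (the log-energy of the channel's response to a 500 Hz tone). The functions `ricker`, `ricker_dscale`, `energy_and_grad` and `learn_scale` are hypothetical. Because the channel output y = x * psi_s is linear in the sampled wavelet, the gradient with respect to s is just the input convolved with the analytic derivative of psi_s, so no autodiff framework is needed here.

```python
import numpy as np

# Normalisation constant of the Ricker wavelet (assumed mother wavelet, not from the paper).
C = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)


def ricker(s, half_width):
    """Ricker wavelet of scale s sampled at t = -half_width, ..., half_width."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    u = (t / s) ** 2
    return C / np.sqrt(s) * (1.0 - u) * np.exp(-u / 2.0)


def ricker_dscale(s, half_width):
    """Analytic derivative of the sampled Ricker wavelet with respect to the scale s."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    u = (t / s) ** 2
    return C / np.sqrt(s) * np.exp(-u / 2.0) * (-u ** 2 + 3.5 * u - 0.5) / s


def energy_and_grad(x, s, half_width=128):
    """Energy E of one wavelet channel and dE/ds.

    y = x * psi_s is linear in psi_s, so dy/ds = x * (d psi_s / ds) and
    dE/ds = 2 * sum(y * dy/ds). 'valid' mode avoids edge effects.
    """
    y = np.convolve(x, ricker(s, half_width), mode="valid")
    dy = np.convolve(x, ricker_dscale(s, half_width), mode="valid")
    return np.sum(y ** 2), 2.0 * np.sum(y * dy)


def learn_scale(x, s0, lr=2.0, steps=50):
    """Toy training loop: gradient ascent on log-energy, updating only the scale."""
    s = s0
    for _ in range(steps):
        e, g = energy_and_grad(x, s)
        s += lr * g / e  # d(log E)/ds = (dE/ds) / E
    return s


fs, f0 = 16000, 500.0
x = np.sin(2 * np.pi * f0 / fs * np.arange(fs))  # 1 s test tone
print(learn_scale(x, s0=4.0))  # about 8.05 samples, i.e. sqrt(2.5) / omega
```

Starting from s = 4 samples, the scale moves to where the wavelet's frequency response, proportional to s^(5/2) omega^2 exp(-s^2 omega^2 / 2) under this normalisation, peaks at the tone's angular frequency omega = 2 pi f0 / fs, namely s = sqrt(5/2) / omega. Stepping on the log-energy makes the update independent of the signal's amplitude. In an actual WD layer there would presumably be one scale per output channel, with gradients coming from the classification loss rather than from this toy objective.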

