They are wearing a mask! Identification of Subjects Wearing a Surgical Mask from their Speech by means of x-vectors and Fisher Vectors

The Computational Paralinguistics challenges held at the INTERSPEECH conference have always been well received by attendees, owing to their competitive academic and research demands. This year, the INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems; here, the Mask Sub-Challenge is of specific interest. This sub-challenge involves determining from speech recordings whether the speaker was wearing a surgical mask. To address this problem, we employ two different feature extraction methods: x-vector embeddings, the current state-of-the-art approach for speaker recognition, and the Fisher vector (FV), a method originally intended for image recognition that we use here to discriminate utterances. These approaches employ distinct frame-level representations: MFCC and PLP. Using Support Vector Machines (SVM) as the classifier, we compare the performance of the FV encodings and the x-vector embeddings on this classification task. We find that the Fisher vector encodings provide better representations of the utterances than the x-vectors do for this specific dataset. Moreover, we show that a fusion of our best configurations outperforms all the baseline scores of the Mask Sub-Challenge.
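
To make the Fisher vector idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture model has already been trained on frame-level features such as MFCCs, and that its parameters are passed in as `weights`, `means` and `variances` (hypothetical names). The gradient formulas and the power and L2 normalisation follow the commonly used "improved" Fisher vector recipe, which the abstract does not confirm; the random data at the end only stands in for real features.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Encode a (T, D) matrix of frames as a Fisher vector of length 2*K*D.

    weights: (K,), means: (K, D), variances: (K, D) are assumed to come from
    a diagonal-covariance GMM fitted beforehand (an assumption of this sketch).
    """
    T, _ = frames.shape
    sigma = np.sqrt(variances)                                  # (K, D)
    # Deviation of every frame from every component, scaled by sigma: (T, K, D)
    z = (frames[:, None, :] - means[None, :, :]) / sigma[None, :, :]
    # Log-density of each frame under each diagonal Gaussian: (T, K)
    log_prob = -0.5 * (np.sum(z ** 2, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    # Component posteriors, computed stably in the log domain: (T, K)
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients with respect to the means and the standard deviations: (K, D) each
    g_mu = np.einsum('tk,tkd->kd', gamma, z) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('tk,tkd->kd', gamma, z ** 2 - 1)
               / (T * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                      # power normalisation
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                        # L2 normalisation

# Toy usage: random numbers stand in for 200 frames of 13-dimensional MFCCs.
rng = np.random.default_rng(0)
K, D = 4, 13
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
frames = rng.normal(size=(200, D))
fv = fisher_vector(frames, weights, means, variances)
print(fv.shape)  # (104,)
```

Each utterance, whatever its length, is mapped to a vector of the same size, which is what allows a standard classifier such as an SVM (for example scikit-learn's `LinearSVC`) to be trained on the encodings.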

research
08/07/2020

Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) c...
research
06/17/2020

Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs

The task of detecting whether a person wears a face mask from speech is ...
research
08/12/2020

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

This paper introduces our approaches for the Mask and Breathing Sub-Chal...
research
09/17/2018

Generative x-vectors for text-independent speaker verification

Speaker verification (SV) systems using deep neural network embeddings, ...
research
02/05/2016

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

This article describes the systems jointly submitted by Institute for In...
research
06/30/2019

Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition

Pretrained contextual word representations in NLP have greatly improved ...
research
07/13/2019

Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

In this paper, we combine Hidden Markov Models (HMMs) with i-vector extr...
