Learning Domain Invariant Representations for Child-Adult Classification from Speech

10/25/2019
by   Rimita Lahiri, et al.
0

Diagnostic procedures for ASD (autism spectrum disorder) involve semi-naturalistic interactions between the child and a clinician. Computational methods to analyze these sessions require an end-to-end speech and language processing pipeline that go from raw audio to clinically-meaningful behavioral features. An important component of this pipeline is the ability to automatically detect who is speaking when i.e., perform child-adult speaker classification. This binary classification task is often confounded due to variability associated with the participants' speech and background conditions. Further, scarcity of training data often restricts direct application of conventional deep learning methods. In this work, we address two major sources of variability - age of the child and data source collection location - using domain adversarial learning which does not require labeled target domain data. We use two methods, generative adversarial training with inverted label loss and gradient reversal layer to learn speaker embeddings invariant to the above sources of variability, and analyze different conditions under which the proposed techniques improve over conventional learning methods. Using a large corpus of ADOS-2 (autism diagnostic observation schedule, 2nd edition) sessions, we demonstrate upto 13.45 conventional learning methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/21/2018

Adversarial Learning of Raw Speech Features for Domain Invariant Speech Recognition

Recent advances in neural network based acoustic modelling have shown si...
research
11/04/2019

Speaker-invariant Affective Representation Learning via Adversarial Training

Representation learning for speech emotion recognition is challenging du...
research
03/22/2019

Towards adversarial learning of speaker-invariant representation for speech emotion recognition

Speech emotion recognition (SER) has attracted great attention in recent...
research
01/22/2023

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embeddi...
research
11/27/2016

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust unde...
research
03/27/2020

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio...
research
03/28/2022

Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

In our previous work, we proposed a language-independent speaker anonymi...

Please sign up or login with your details

Forgot password? Click here to reset