A knowledge-driven vowel-based approach of depression classification from speech using data augmentation

10/27/2022
by   Kexin Feng, et al.

We propose a novel explainable machine learning (ML) model that identifies depression from speech by modeling the temporal dependencies across utterances and utilizing the spectrotemporal information at the vowel level. Our method first models the variable-length utterances at the local level into a fixed-size vowel-based embedding using a convolutional neural network with a spatial pyramid pooling layer ("vowel CNN"). Depression is then classified at the global level from a group of vowel CNN embeddings that serve as the input to another 1D CNN ("depression CNN"). Different data augmentation methods are designed for training the vowel CNN and the depression CNN. We investigate the performance of the proposed system at various temporal granularities by modeling short, medium, and long analysis windows, corresponding to 10, 21, and 42 utterances, respectively. The proposed method achieves performance comparable to previous state-of-the-art approaches and exhibits explainable properties with respect to the depression outcome. The findings from this work may benefit clinicians by providing additional intuitions during joint human-ML decision-making tasks.
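
The fixed-size embedding step can be illustrated with a minimal sketch of one-dimensional spatial pyramid pooling, which is what lets a CNN accept variable-length input. This is not the authors' implementation: the pyramid levels (1, 2, 4), the use of max pooling, the helper names `spatial_pyramid_pool_1d` and `group_embeddings`, and the choice of non-overlapping analysis windows are all assumptions made for illustration. Only the window sizes of 10, 21, and 42 utterances come from the abstract.

```python
from typing import List, Sequence

# A feature map as a convolutional layer might produce it: [channels][time_steps].
FeatureMap = List[List[float]]


def spatial_pyramid_pool_1d(
    feature_map: FeatureMap, levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Max-pool each channel into `bins` temporal bins per level and concatenate.

    The output length is channels * sum(levels), whatever the number of
    time steps, so variable-length inputs map to a fixed-size vector.
    The levels (1, 2, 4) are an assumption, not taken from the paper.
    """
    pooled: List[float] = []
    for channel in feature_map:
        n = len(channel)
        if n == 0:
            raise ValueError("each channel needs at least one time step")
        for bins in levels:
            for b in range(bins):
                # Adaptive bin edges: floor for the start, ceiling for the end,
                # so every bin is non-empty even when n < bins.
                start = (b * n) // bins
                end = -(-(b + 1) * n // bins)  # ceil((b + 1) * n / bins)
                pooled.append(max(channel[start:end]))
    return pooled


def group_embeddings(
    embeddings: List[List[float]], window: int
) -> List[List[List[float]]]:
    """Split per-utterance embeddings into non-overlapping windows.

    A trailing remainder shorter than `window` is dropped. Whether the
    original work uses overlapping windows is not stated in the abstract.
    """
    return [
        embeddings[i:i + window]
        for i in range(0, len(embeddings) - window + 1, window)
    ]


if __name__ == "__main__":
    short_map = [[0.1, 0.5, 0.2], [0.9, 0.3, 0.4]]           # 2 channels, 3 steps
    long_map = [[float(t) for t in range(10)] for _ in range(2)]  # 2 channels, 10 steps
    # Both embeddings have length 2 * (1 + 2 + 4) = 14.
    print(len(spatial_pyramid_pool_1d(short_map)), len(spatial_pyramid_pool_1d(long_map)))

    # Short, medium, and long analysis windows from the abstract.
    utterances = [spatial_pyramid_pool_1d(short_map) for _ in range(42)]
    for window in (10, 21, 42):
        print(window, len(group_embeddings(utterances, window)))
```

In this sketch, each group returned by `group_embeddings` stands in for the sequence of vowel-level embeddings that a second-stage classifier such as the depression CNN would consume.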


