Self-supervised representations in speech-based depression detection

05/20/2023
by Wen Wu, et al.

This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). An analysis of SSL representations derived from different layers of pre-trained foundation models is first presented for SDD, which provides insight into suitable indicators for depression detection. Knowledge transfer is then performed from automatic speech recognition (ASR) and emotion recognition to SDD by fine-tuning the foundation models. Results show that the use of oracle and ASR transcriptions yields similar SDD performance when the hidden representations of the ASR model are incorporated along with the ASR textual information. By integrating representations from multiple foundation models, state-of-the-art SDD results based on real ASR were achieved on the DAIC-WOZ dataset.

research
11/04/2021

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

Self-supervised speech representations such as wav2vec 2.0 and HuBERT ar...
research
07/10/2021

Layer-wise Analysis of a Self-supervised Speech Representation Model

Recently proposed self-supervised learning approaches have been successf...
research
05/14/2023

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

Self-supervised learning (SSL) speech models such as wav2vec and HuBERT ...
research
06/09/2023

Developing Speech Processing Pipelines for Police Accountability

Police body-worn cameras have the potential to improve accountability an...
research
07/13/2023

Adapting an ASR Foundation Model for Spoken Language Assessment

A crucial part of an accurate and reliable spoken language assessment sy...
research
09/14/2023

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

We introduce a multilingual speaker change detection model (USM-SCD) tha...
research
02/12/2023

ASR Bundestag: A Large-Scale political debate dataset in German

We present ASR Bundestag, a dataset for automatic speech recognition in ...
