DeepAI AI Chat
Log In Sign Up

Depa: Self-supervised audio embedding for depression detection

by   Heinrich Dinkel, et al.

Depression detection research has increased over the last few decades as this disease is becoming a socially-centered problem. One major bottleneck for developing automatic depression detection methods lies on the limited data availability. Recently, pretrained text-embeddings have seen success in sparse data scenarios, while pretrained audio embeddings are rarely investigated. This paper proposes DEPA, a self-supervised, Word2Vec like pretrained depression audio embedding method for depression detection. An encoder-decoder network is used to extract DEPA on sparse-data in-domain (DAIC) and large-data out-domain (switchboard, Alzheimer's) datasets. With DEPA as the audio embedding, performance significantly outperforms traditional audio features regarding both classification and regression metrics. Moreover, we show that large-data out-domain pretraining is beneficial to depression detection performance.


page 1

page 2

page 3

page 4


ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation

Vision transformers, which were originally developed for natural languag...

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

Methods for extracting audio and speech features have been studied since...

How Transferable Are Self-supervised Features in Medical Image Classification Tasks?

Transfer learning has become a standard practice to mitigate the lack of...

ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation

We present ViT5, a pretrained Transformer-based encoder-decoder model fo...

Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval

We present an analysis of large-scale pretrained deep learning models us...

Improving automated segmentation of radio shows with audio embeddings

Audio features have been proven useful for increasing the performance of...

I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch

Growing research demonstrates that synthetic failure modes imply poor ge...