A Topic-Attentive Transformer-based Model For Multimodal Depression Detection

06/27/2022
by   Yanrong Guo, et al.

Depression is one of the most common mental disorders and imposes a heavy negative impact on one's daily life. Diagnosing depression through a clinical interview typically takes the form of questions and answers. In this process, a subject's audio signals and their text transcripts are correlated with depression cues and are easily recorded, so it is feasible in practice to build an Automatic Depression Detection (ADD) model from these two modalities. However, two major challenges must be addressed to construct an effective ADD model. The first is organizing the textual and audio data, whose content and length vary across subjects. The second is the scarcity of training samples due to privacy concerns. Targeting these two challenges, we propose the TOpic ATtentive transformer-based ADD model, abbreviated as TOAT. To address the first challenge, the TOAT model takes the topic as the basic unit of the textual and audio data, following the question-answer structure of a typical interview. On this basis, a topic attention module is designed to learn the importance of each topic, which helps the model better retrieve depressed samples. To mitigate data scarcity, we introduce large pre-trained models and fine-tune them on the small-scale ADD training data. We also design a two-branch architecture with a late-fusion strategy, in which the textual and audio data are encoded independently. We evaluate our model on the multimodal DAIC-WOZ dataset, which is specifically designed for the ADD task. Experimental results show the superiority of our method, and ablation studies demonstrate the effectiveness of the key elements of the TOAT model.
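
As a rough illustration of the architecture described above, the following is a minimal PyTorch sketch of a two-branch model with per-topic attention and late fusion. The names (TopicAttention, TwoBranchADD), the hidden sizes, and the use of simple linear projections in place of the fine-tuned pre-trained text and audio encoders are assumptions made for illustration; this is not the authors' implementation.

import torch
import torch.nn as nn

class TopicAttention(nn.Module):
    # Learns a scalar importance weight per topic and pools topic embeddings.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, topic_emb):                              # (batch, num_topics, dim)
        weights = torch.softmax(self.score(topic_emb), dim=1)  # (batch, num_topics, 1)
        return (weights * topic_emb).sum(dim=1)                # (batch, dim)

class TwoBranchADD(nn.Module):
    # Two-branch model: each branch pools per-topic features with topic attention,
    # and the branch outputs are concatenated (late fusion) before classification.
    # text_dim / audio_dim stand in for outputs of pre-trained per-topic encoders.
    def __init__(self, text_dim=768, audio_dim=512, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.text_attn = TopicAttention(hidden)
        self.audio_attn = TopicAttention(hidden)
        self.classifier = nn.Linear(hidden * 2, 2)  # depressed vs. non-depressed

    def forward(self, text_topics, audio_topics):
        t = self.text_attn(torch.relu(self.text_proj(text_topics)))
        a = self.audio_attn(torch.relu(self.audio_proj(audio_topics)))
        return self.classifier(torch.cat([t, a], dim=-1))

# Example: a batch of 4 interviews, each segmented into 10 topics
model = TwoBranchADD()
logits = model(torch.randn(4, 10, 768), torch.randn(4, 10, 512))
print(logits.shape)  # torch.Size([4, 2])

In this sketch the topic attention weights are shared across a branch but learned separately for text and audio, mirroring the independent encoding of the two modalities described in the abstract.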
