Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

06/29/2021
by Morteza Rohanian, et al.

We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluated on the ADReSSo Challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% on AD classification and an RMSE of 4.26 when predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models improve over word-only models by using the multimodal approach together with word probabilities, disfluency and pause information. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.
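
The abstract mentions highway layers, gating, and an RMSE score without spelling them out. The sketch below is only an illustration, not the authors' implementation. It assumes the standard highway formulation y = T(x) * H(x) + (1 - T(x)) * x and the usual root-mean-square error. The names `highway_layer`, `rmse`, `fused`, and every weight are hypothetical, and random vectors stand in for fused lexical and acoustic features.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as the transform gate activation."""
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Standard highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    The gate T(x) decides, per dimension, how much of the transformed
    signal H(x) to use and how much of the raw input to carry through.
    """
    h = np.tanh(x @ w_h + b_h)   # candidate transform H(x)
    t = sigmoid(x @ w_t + b_t)   # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally sized score lists."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical stand-in for a batch of fused lexical + acoustic vectors.
rng = np.random.default_rng(0)
dim = 8
fused = rng.normal(size=(4, dim))

# Illustrative weights only; a trained model would learn these.
w_h = rng.normal(scale=0.1, size=(dim, dim))
w_t = rng.normal(scale=0.1, size=(dim, dim))
b_h = np.zeros(dim)
b_t = np.full(dim, -20.0)  # strongly negative bias: gate stays nearly closed

out = highway_layer(fused, w_h, b_h, w_t, b_t)

# Toy MMSE-style scores, invented purely to exercise rmse().
toy_error = rmse([30, 20], [27, 24])
```

With the gate bias set strongly negative, the layer passes its input through almost unchanged. This is one intuition for how gating could down-weight an unreliable transform of noisy ASR or acoustic inputs, although the paper's actual gating mechanism may differ.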


