Alzheimer's Dementia Detection from Audio and Text Modalities
Automatic detection of Alzheimer's dementia from speech is enhanced when features of both the acoustic waveform and its linguistic content are extracted. Audio and text transcriptions have been widely used in health-related tasks, as spectral and prosodic speech features, together with semantic and linguistic content, convey information about various diseases. This paper describes the joint submission of the GTM-UVIGO research group and the acceXible startup to the ADReSS challenge at INTERSPEECH 2020. The submitted systems aim to detect patterns of Alzheimer's disease from both the patient's voice and the transcription of their speech. Six systems are built and compared: four are speech-based and two are text-based. X-vector, i-vector, and statistical functional features of speech are evaluated. Since reduced speaking fluency is a common pattern in patients with Alzheimer's disease, rhythmic features are also proposed. For transcription analysis, two systems are proposed: one uses GloVe word-embedding features and the other uses several features extracted by language modelling. Several intra-modality and inter-modality score fusion strategies are investigated, and the performance of single-modality and multimodal systems is reported. The achieved results are promising, outperforming those of the ADReSS baseline systems.
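The abstract mentions intra-modality and inter-modality score fusion without detailing the rule used. A minimal late-fusion sketch, assuming a simple weighted average of per-system detection scores (the weights and threshold below are illustrative, not the paper's actual configuration):

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Late (score-level) fusion: weighted average of per-system scores.

    score_lists: list of per-system score sequences, one score per subject.
    weights: optional per-system weights (hypothetical; normalized to sum to 1).
    Returns one fused score per subject.
    """
    scores = np.asarray(score_lists, dtype=float)  # shape: (n_systems, n_subjects)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the fused score stays in range
    return weights @ scores

# Example: fuse one speech-based and one text-based system's scores
speech_scores = [0.80, 0.30, 0.55]
text_scores = [0.60, 0.40, 0.65]
fused = fuse_scores([speech_scores, text_scores], weights=[0.5, 0.5])
decisions = (fused >= 0.5).astype(int)  # 1 = Alzheimer's dementia detected
```

The same function covers both fusion settings in the abstract: intra-modality fusion passes several speech-based (or several text-based) systems' scores, while inter-modality fusion combines the best speech-based and text-based scores.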