Cascaded Cross-Modal Transformer for Request and Complaint Detection

07/27/2023
by Nicolae-Catalin Ristea, et al.

We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations. Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages. Subsequently, we combine language-specific BERT-based models with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model. We apply our system to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, reaching unweighted average recalls (UAR) of 65.41 [...] for the complaint and request classes, respectively.
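For readers unfamiliar with the metric, unweighted average recall is the mean of the per-class recalls, so a rare class counts as much as a frequent one. The sketch below is only an illustration of that definition: the function name and the toy labels are hypothetical and are not taken from the paper or the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not challenge data).
y_true = ["yes"] * 8 + ["no"] * 2
y_pred = ["yes"] * 8 + ["yes", "no"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.2f}")
```

On these toy labels the plain accuracy is 0.90 while the UAR is 0.75, because the minority class is recalled only half the time. This is why UAR is a common choice when classes are imbalanced.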
