Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

05/22/2018

Multimodal affective computing, which aims to recognize and interpret human affect and subjective information from multiple data sources, remains challenging because: (i) it is hard to extract informative features that represent human affect from heterogeneous inputs; (ii) current fusion strategies fuse different modalities only at an abstract level, ignoring time-dependent interactions between modalities. To address these issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our model outperforms state-of-the-art approaches on published datasets, and we demonstrate that it can visualize and interpret the synchronized attention over modalities.
