Multimodal Emotion-Cause Pair Extraction in Conversations
Emotion cause analysis has received considerable attention in recent years. Previous studies primarily focused on emotion cause extraction from texts in news articles or microblogs. It is also interesting to discover emotions and their causes in conversations. As conversation in its natural form is multimodal, a large number of studies have been carried out on multimodal emotion recognition in conversations, but there is still a lack of work on multimodal emotion cause analysis. In this work, we introduce a new task named Multimodal Emotion-Cause Pair Extraction in Conversations, aiming to jointly extract emotions and their associated causes from conversations reflected in multiple modalities (text, audio and video). We accordingly construct a multimodal conversational emotion cause dataset, Emotion-Cause-in-Friends, which contains 9,272 multimodal emotion-cause pairs annotated on 13,509 utterances in the sitcom Friends. We finally benchmark the task by establishing a baseline system that incorporates multimodal features for emotion-cause pair extraction. Preliminary experimental results demonstrate the potential of multimodal information fusion for discovering both emotions and causes in conversations.
READ FULL TEXT