Towards Multi-Modal Sarcasm Detection via Hierarchical Congruity Modeling with Knowledge Enhancement

10/07/2022
by   Hui Liu, et al.
0

Sarcasm is a linguistic phenomenon indicating a discrepancy between literal meanings and implied intentions. Due to its sophisticated nature, it is usually challenging to be detected from the text itself. As a result, multi-modal sarcasm detection has received more attention in both academia and industries. However, most existing techniques only modeled the atomic-level inconsistencies between the text input and its accompanying image, ignoring more complex compositions for both modalities. Moreover, they neglected the rich information contained in external knowledge, e.g., image captions. In this paper, we propose a novel hierarchical framework for sarcasm detection by exploring both the atomic-level congruity based on multi-head cross attention mechanism and the composition-level congruity based on graph neural networks, where a post with low congruity can be identified as sarcasm. In addition, we exploit the effect of various knowledge resources for sarcasm detection. Evaluation results on a public multi-modal sarcasm detection dataset based on Twitter demonstrate the superiority of our proposed model.

READ FULL TEXT

page 1

page 12

research
09/30/2021

Multi-Modal Sarcasm Detection Based on Contrastive Attention Mechanism

In the past decade, sarcasm detection has been intensively conducted in ...
research
03/22/2022

Multi-Modal Learning for AU Detection Based on Multi-Head Fused Transformers

Multi-modal learning has been intensified in recent years, especially fo...
research
04/05/2023

Detecting and Grounding Multi-Modal Media Manipulation

Misinformation has become a pressing issue. Fake media, in both visual a...
research
07/14/2023

MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

Multi-modal sarcasm detection has attracted much recent attention. Never...
research
12/13/2021

ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

Recently, Multi-modal Named Entity Recognition (MNER) has attracted a lo...
research
09/09/2023

Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering

Multi-modal keyphrase generation aims to produce a set of keyphrases tha...
research
06/24/2019

CORAL8: Concurrent Object Regression for Area Localization in Medical Image Panels

This work tackles the problem of generating a medical report for multi-i...

Please sign up or login with your details

Forgot password? Click here to reset