MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

07/14/2023
by   Libo Qin, et al.

Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) MMSD contains spurious cues that lead models to learn biases; (2) the negative samples in MMSD are not always reasonable. To solve these issues, we introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., the text, image, and text-image interaction views) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and that multi-view CLIP can significantly outperform the previous best baselines.
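The abstract does not say how the three views are combined, so the following is only an illustrative sketch of one common option, late fusion: each view produces its own class scores, and the per-view probabilities are averaged. The function names (`softmax`, `fuse_views`), the averaging rule, and the example logits are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_logits: Dict[str, List[float]]) -> List[float]:
    """Average the class probabilities predicted by each view.

    This averaging rule is a hypothetical late-fusion choice,
    not the fusion method confirmed by the abstract.
    """
    probs = [softmax(logits) for logits in view_logits.values()]
    num_views = len(probs)
    num_classes = len(probs[0])
    return [sum(p[i] for p in probs) / num_views for i in range(num_classes)]


# Made-up logits over the classes [not sarcastic, sarcastic].
example = {
    "text": [0.2, 0.1],
    "image": [0.0, 0.3],
    "interaction": [-1.0, 2.0],
}
fused = fuse_views(example)
# Index of the class with the highest fused probability.
label = max(range(len(fused)), key=fused.__getitem__)
```

In this toy input the text view alone leans slightly towards "not sarcastic", while the interaction view pushes the fused prediction towards "sarcastic", which is the kind of cross-modal signal the three-view design is meant to capture.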


