Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

06/29/2023
by Liqiang Jing, et al.

Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task that aims to generate a natural language sentence for a multimodal social post (an image together with its caption) explaining why the post contains sarcasm. Although the existing pioneer study achieves promising results with a BART backbone, it overlooks the gap between the visual feature space and the decoder's semantic space, the object-level metadata of the image, and the potential external knowledge. To address these limitations, we propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM. In particular, TEAM extracts object-level semantic metadata from the input image instead of the traditional global visual features. Meanwhile, TEAM resorts to ConceptNet to obtain external knowledge concepts related to the input text and the extracted object metadata. Thereafter, TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source (i.e., caption, object metadata, external knowledge) semantic relations to facilitate sarcasm reasoning. Extensive experiments on the publicly released MORE dataset verify the superiority of our model over cutting-edge methods.
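To make the graph-construction step concrete, below is a minimal sketch of how caption tokens, object-level metadata, and external knowledge concepts could be tied together into one multi-source semantic graph. It uses networkx with made-up caption tokens, object labels, and ConceptNet-style concepts; it is an illustrative assumption about the graph structure, not the authors' released implementation, and it omits the graph encoding and BART-based explanation decoding.

# Minimal sketch (not the TEAM release) of assembling a multi-source
# semantic graph from a caption, detected object metadata, and
# externally retrieved knowledge concepts.
import networkx as nx

def build_multi_source_graph(caption_tokens, object_labels, knowledge):
    """Build a heterogeneous graph over three sources.

    knowledge maps a caption token or object label to a list of related
    concepts (assumed to be retrieved offline, e.g. from ConceptNet).
    """
    g = nx.Graph()

    # Caption nodes, with edges between adjacent tokens.
    for i, tok in enumerate(caption_tokens):
        g.add_node(("caption", tok), source="caption")
        if i > 0:
            g.add_edge(("caption", caption_tokens[i - 1]),
                       ("caption", tok), relation="adjacent")

    # Object-level metadata nodes (e.g. detected object classes).
    for label in object_labels:
        g.add_node(("object", label), source="object")

    # Cross-source edges: link each caption token / object label to its
    # retrieved knowledge concepts.
    for anchor, concepts in knowledge.items():
        anchored = [n for n in list(g.nodes) if n[1] == anchor]
        for concept in concepts:
            g.add_node(("knowledge", concept), source="knowledge")
            for node in anchored:
                g.add_edge(node, ("knowledge", concept), relation="related_to")

    return g

if __name__ == "__main__":
    # Toy example: a sarcastic caption about a traffic jam.
    graph = build_multi_source_graph(
        caption_tokens=["what", "a", "lovely", "traffic", "jam"],
        object_labels=["car", "road"],
        knowledge={"jam": ["congestion"], "car": ["vehicle"]},
    )
    print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")

In the full model, a graph of this kind would be encoded (e.g. with a graph neural network) and fused with the text representation before the decoder generates the explanation; those stages are not shown here.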


