How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation

11/20/2022
by Jie Ruan, et al.

Sarcasm generation has been investigated in previous studies as a text-to-text generation problem, i.e., generating a sarcastic sentence for an input sentence. In this paper, we study a new problem of cross-modal sarcasm generation (CMSG), i.e., generating a sarcastic description for a given image. CMSG is challenging because models need to satisfy the characteristics of sarcasm as well as the correlation between the two modalities. In addition, there should be some inconsistency between the two modalities, which requires imagination. Moreover, high-quality training data is scarce. To address these problems, we take a step toward generating sarcastic descriptions from images without paired training data and propose an Extraction-Generation-Ranking based Modular method (EGRM) for cross-modal sarcasm generation. Specifically, EGRM first extracts diverse information from an image at different levels and uses the obtained image tags, sentimental descriptive caption, and commonsense-based consequence to generate candidate sarcastic texts. Then, a comprehensive ranking algorithm, which considers image-text relation, sarcasticness, and grammaticality, is proposed to select a final text from the candidate texts. Human evaluation on five criteria, covering a total of 1200 generated image-text pairs from eight systems, together with auxiliary automatic evaluation, shows the superiority of our method.
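To make the ranking step more concrete, the following is a minimal sketch of how candidate texts could be scored on the three criteria named in the abstract (image-text relation, sarcasticness, grammaticality) and sorted by a weighted sum. It is not the authors' implementation: the abstract does not say how the criteria are combined, so the weighted sum, the names `Candidate`, `tag_overlap`, and `rank_candidates`, and the weights are all assumptions made for illustration. In practice each scorer would be a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A scorer maps a candidate text to a number; higher is better.
Scorer = Callable[[str], float]


@dataclass
class Candidate:
    text: str
    scores: Dict[str, float]  # per-criterion scores
    total: float              # weighted sum used for ranking


def tag_overlap(text: str, tags: List[str]) -> float:
    """Toy image-text relation score (an assumption, not the paper's metric):
    the fraction of image tags that appear as words in the text."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    return sum(tag.lower() in words for tag in tags) / len(tags) if tags else 0.0


def rank_candidates(
    candidates: List[str],
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
) -> List[Candidate]:
    """Score every candidate on every criterion and sort best-first.

    The weighted sum is a hypothetical way to combine the criteria.
    """
    ranked = []
    for text in candidates:
        scores = {name: fn(text) for name, fn in scorers.items()}
        total = sum(weights[name] * value for name, value in scores.items())
        ranked.append(Candidate(text, scores, total))
    # sorted-in-place; ties keep their original candidate order
    ranked.sort(key=lambda c: c.total, reverse=True)
    return ranked


if __name__ == "__main__":
    tags = ["rain", "umbrella"]  # hypothetical tags extracted from an image
    scorers = {
        "relation": lambda t: tag_overlap(t, tags),
        # Placeholder heuristics standing in for learned classifiers.
        "sarcasm": lambda t: 1.0 if "love" in t.lower() else 0.0,
        "grammar": lambda t: 1.0 if t.endswith(".") else 0.0,
    }
    weights = {"relation": 1.0, "sarcasm": 1.0, "grammar": 1.0}
    texts = [
        "A person holds an umbrella in the rain.",
        "I just love how my umbrella keeps the rain off.",
    ]
    for cand in rank_candidates(texts, scorers, weights):
        print(cand.total, cand.text)
```

Keeping each criterion behind a plain callable mirrors the modular idea in the abstract: any scorer can be replaced without touching the ranking logic.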

research
08/19/2019

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

Deep generative models have led to significant advances in cross-modal g...
research
05/30/2023

Unsupervised Melody-to-Lyric Generation

Automatic melody-to-lyric generation is a task in which song lyrics are ...
research
05/12/2023

Unsupervised Melody-Guided Lyrics Generation

Automatic song writing is a topic of significant practical interest. How...
research
12/29/2020

Generating Wikipedia Article Sections from Diverse Data Sources

Datasets for data-to-text generation typically focus either on multi-dom...
research
04/23/2018

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

Automatic generation of natural language from images has attracted exten...
research
01/23/2019

"Is this an example image?" -- Predicting the Relative Abstractness Level of Image and Text

Successful multimodal search and retrieval requires the automatic unders...
research
09/13/2022

Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

We present a new multimodal dataset called Visual Recipe Flow, which ena...
