Affective Feedback Synthesis Towards Multimodal Text and Image Data

03/23/2022
by Puneet Kumar, et al.

In this paper, we define a novel task, affective feedback synthesis, which deals with generating feedback for an input text and its corresponding image in a way similar to how humans respond to multimodal data. A feedback synthesis system is proposed and trained using ground-truth human comments along with the image-text input. We have also constructed a large-scale dataset consisting of images, text, Twitter user comments, and the number of likes for the comments by crawling news articles through Twitter feeds. The proposed system extracts textual features using a transformer-based textual encoder, while the visual features are extracted using a Faster region-based convolutional neural network (Faster R-CNN) model. The textual and visual features are concatenated to construct the multimodal features, from which the decoder synthesizes the feedback. We have compared the results of the proposed system with baseline models using quantitative and qualitative measures. The generated feedback has been analyzed using automatic and human evaluation. It has been found to be semantically similar to the ground-truth comments and relevant to the given text-image input.
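The fusion step described in the abstract, concatenating textual and visual features before decoding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_multimodal` and `fuse_batch`, the toy feature values, and the vector sizes are all assumptions made for illustration, and the real encoders (a transformer-based text encoder and a Faster R-CNN model) are replaced here by hard-coded lists.

```python
from typing import List, Sequence


def concat_multimodal(text_feats: Sequence[float],
                      visual_feats: Sequence[float]) -> List[float]:
    """Concatenate one text feature vector with one visual feature vector.

    Hypothetical helper: the abstract only states that the two feature
    sets are concatenated, not how they are shaped or ordered.
    """
    return list(text_feats) + list(visual_feats)


def fuse_batch(text_batch: Sequence[Sequence[float]],
               visual_batch: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fuse paired text and visual features for a batch of image-text inputs."""
    if len(text_batch) != len(visual_batch):
        raise ValueError("each text input needs exactly one matching image")
    return [concat_multimodal(t, v) for t, v in zip(text_batch, visual_batch)]


# Toy stand-ins for encoder outputs (assumed values, not real features).
text_batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 3-dim text features
visual_batch = [[1.0, 2.0], [3.0, 4.0]]           # 2-dim visual features

# Each fused vector has 3 + 2 = 5 dimensions and would be passed to a decoder.
fused = fuse_batch(text_batch, visual_batch)
```

In this sketch, the fused vector's dimensionality is simply the sum of the two input dimensionalities, which is the defining property of concatenation-based fusion.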

research
10/09/2019

Exploring Hate Speech Detection in Multimodal Publications

In this work we target the problem of hate speech detection in multimoda...
research
11/30/2020

Flood Detection via Twitter Streams using Textual and Visual Features

The paper presents our proposed solutions for the MediaEval 2020 Flood-R...
research
11/12/2018

CUNI System for the WMT18 Multimodal Translation Task

We present our submission to the WMT18 Multimodal Translation Task. The ...
research
07/13/2020

A Feature Analysis for Multimodal News Retrieval

Content-based information retrieval is based on the information containe...
research
02/28/2021

NLP-CUET@DravidianLangTech-EACL2021: Investigating Visual and Textual Features to Identify Trolls from Multimodal Social Media Memes

In the past few years, the meme has become a new way of communication on...
research
09/15/2017

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning

In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning...
research
04/13/2017

Fashion Conversation Data on Instagram

The fashion industry is establishing its presence on a number of visual-...
