Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

04/19/2019
by Julia Kruk, et al.

Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image. For example, a caption might reflect ironically on the image, so neither the caption nor the image is a mere transcript of the other. Instead they combine -- via what has been called meaning multiplication -- to create a new meaning that has a more complex relation to the literal meanings of text and image. Here we introduce a multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. We build a baseline deep multimodal classifier to validate the taxonomy, showing that employing both text and image improves intent detection by 8% compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication. Our dataset offers an important resource for the study of the rich meanings that result from pairing text and image.

research
09/09/2022

MIntRec: A New Dataset for Multimodal Intent Recognition

Multimodal intent recognition is a significant task for understanding hu...
research
09/14/2023

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks

Effectively leveraging multimodal information from social media posts is...
research
10/18/2022

MMGA: Multimodal Learning with Graph Alignment

Multimodal pre-training breaks down the modality barriers and allows the...
research
10/13/2021

MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants

In multimodal assistant, where vision is also one of the input modalitie...
research
05/09/2023

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Webpages have been a rich resource for language and vision-language task...
research
08/09/2021

FiLMing Multimodal Sarcasm Detection with Attention

Sarcasm detection identifies natural language expressions whose intended...
research
09/06/2023

A Multimodal Analysis of Influencer Content on Twitter

Influencer marketing involves a wide range of strategies in which brands...
