Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

by Julia Kruk, et al.
SRI International

Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image. For example, a caption might reflect ironically on the image, so neither the caption nor the image is a mere transcript of the other. Instead, they combine, via what has been called meaning multiplication, to create a new meaning that has a more complex relation to the literal meanings of text and image. Here we introduce a multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. We build a baseline deep multimodal classifier to validate the taxonomy, showing that employing both text and image improves intent detection by 8% compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication. Our dataset offers an important resource for the study of the rich meanings that result from pairing text and image.


