Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks

09/14/2023
by   Danae Sanchez Villegas, et al.

Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection and hate speech classification. However, combining text and image information is challenging because of the idiosyncratic cross-modal semantics, with hidden or complementary information present in matching image-text pairs. In this work, we aim to model these cross-modal relationships directly by proposing the use of two auxiliary losses jointly with the main task when fine-tuning any pre-trained multimodal model. Image-Text Contrastive (ITC) brings the image and text representations of a post closer together and pushes them apart from those of other posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates the understanding of semantic correspondence between images and text by penalizing unrelated pairs. We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets. Furthermore, through detailed analysis, we shed light on the specific scenarios and cases where each auxiliary task proves to be most effective.
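
To make the two auxiliary objectives concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes ITC is a symmetric InfoNCE-style loss over the posts in a batch and ITM is a binary cross-entropy over matched versus mismatched image-text pairs. The function names, the temperature of 0.07, and the weights `w_itc` and `w_itm` are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Softmax cross-entropy for one row of logits, computed stably."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


def itc_loss(image_embs, text_embs, temperature=0.07):
    """Image-Text Contrastive loss (assumed symmetric InfoNCE form).

    The image and text of the same post (same index) are the positive pair;
    every other post in the batch serves as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # Temperature-scaled cosine similarity between every image and every text.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    image_to_text = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        _cross_entropy([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


def itm_loss(match_logits, is_match):
    """Image-Text Matching loss: binary cross-entropy on raw logits.

    is_match[k] is 1 when pair k is a true image-text pair and 0 when the
    text was swapped in from another post.
    """
    total = 0.0
    for z, y in zip(match_logits, is_match):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s=sigmoid(z).
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(match_logits)


def joint_loss(task_loss, itc, itm, w_itc=1.0, w_itm=1.0):
    """Main-task loss plus weighted auxiliary losses (weights are assumptions)."""
    return task_loss + w_itc * itc + w_itm * itm


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print("ITC, aligned pairs:", itc_loss(images, aligned_texts))
    print("ITC, swapped pairs:", itc_loss(images, swapped_texts))
    print("ITM:", itm_loss([2.0, -2.0], [1, 0]))
```

In this sketch the ITC term is small when each image embedding is most similar to its own post's text and large when the pairs are swapped, while the ITM term is small when a matching head scores true pairs high and unrelated pairs low. During fine-tuning, both terms would be added to the classification loss as in `joint_loss`.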
