C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap

09/06/2023
by William Theisen, et al.

The interplay between the image and the comment on a social media post is highly important for understanding the post's overall message. Recent strides in multimodal embedding models, namely CLIP, have provided an avenue forward in relating image and text. However, the current training regime for CLIP models is insufficient for matching content found on social media, regardless of site or language. Current CLIP training data is based on what we call "descriptive" text: text that merely describes an image. Such text is rarely seen on social media, where the vast majority of text content is "commentative" in nature: captions provide commentary and broader context related to the image rather than describing what is in it. Current CLIP models perform poorly on retrieval tasks where image-caption pairs exhibit a commentative relationship. Closing this gap would benefit several important application areas related to social media. For instance, it would allow groups focused on open-source intelligence (OSINT) to further aid efforts during disaster events, such as the ongoing Russian invasion of Ukraine, by easily exposing data to non-technical users for discovery and analysis. To close this gap, we demonstrate that training contrastive image-text encoders on explicitly commentative pairs yields large improvements in retrieval performance, and that these improvements extend across a variety of non-English languages.
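The contrastive training the abstract refers to can be illustrated with the standard CLIP-style symmetric InfoNCE objective, in which matched image-caption pairs in a batch are pulled together and mismatched pairs pushed apart. This is a minimal sketch, not the paper's actual training setup; the batch size, temperature, and toy embeddings below are illustrative assumptions.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities.

    Row i of image_emb and row i of text_emb are assumed to be a
    matched image-caption pair, so the correct targets lie on the
    diagonal of the batch similarity matrix.
    """
    # L2-normalize so dot products are cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    diag = np.arange(len(logits))       # matched pairs sit on the diagonal

    def cross_entropy(l):
        # log-softmax over each row, with max-subtraction for stability
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[diag, diag].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Training on commentative pairs simply means the `text_emb` rows come from commentary-style captions rather than literal descriptions; the objective itself is unchanged.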


