Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

by Eric Müller-Budack, et al.

The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., text enriched with photos, is typically used to convey the news more effectively or to attract attention. Photo content can be purely decorative, depict additional important information, or even contain misleading information. Therefore, automatic approaches that quantify the cross-modal consistency of entity representations can support human assessors in evaluating the overall multimodal message, for instance, with regard to bias or sentiment. In some cases, such measures could give hints for detecting fake news, which is an increasingly important topic in today's society. In this paper, we introduce a novel task of cross-modal consistency verification in real-world news and present a multimodal approach to quantify the entity coherence between image and text. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate cross-modal similarity for these entities using state-of-the-art approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. Results on two novel datasets that cover different languages, topics, and domains demonstrate the feasibility of our approach. Datasets and code are publicly available to foster research in this new direction.
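The abstract does not spell out how the similarity measures are computed. As a rough illustration only, the sketch below assumes that each entity linked in the text comes with a few reference image embeddings gathered from the Web, and that the news photo has an embedding in the same feature space. The function names, the use of cosine similarity, and the max aggregation are all assumptions made for this sketch, not the authors' published measures.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def entity_similarity(news_vec: Vector, reference_vecs: List[Vector]) -> float:
    """Best match between the news photo and any reference image of one entity.

    Assumption: taking the maximum is one simple aggregation choice;
    an entity without reference images gets a score of 0.0.
    """
    return max((cosine_similarity(news_vec, ref) for ref in reference_vecs), default=0.0)


def cross_modal_consistency(
    news_vec: Vector, references_by_entity: Dict[str, List[Vector]]
) -> Dict[str, float]:
    """Score every entity mentioned in the text against the news photo."""
    return {
        entity: entity_similarity(news_vec, refs)
        for entity, refs in references_by_entity.items()
    }


if __name__ == "__main__":
    # Toy two-dimensional embeddings; real features would come from a
    # pretrained face, place, or event model (hypothetical here).
    photo = [0.9, 0.1]
    references = {
        "entity_a": [[1.0, 0.0], [0.8, 0.3]],
        "entity_b": [[0.0, 1.0]],
    }
    for name, score in cross_modal_consistency(photo, references).items():
        print(f"{name}: {score:.3f}")
```

A high score for an entity suggests that the photo plausibly depicts it, while a low score for every mentioned entity could flag the document for closer human inspection.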



QuTI! Quantifying Text-Image Consistency in Multimodal Documents

The World Wide Web and social media platforms have become popular source...

Cross-modal Contrastive Learning for Multimodal Fake News Detection

Automatic detection of multimodal fake news has gained a widespread atte...

Synthetic Misinformers: Generating and Combating Multimodal Misinformation

With the expansion of social media and the increasing dissemination of m...

EDIS: Entity-Driven Image Search over Multimodal Web Content

Making image retrieval methods practical for real-world search applicati...

Understanding, Categorizing and Predicting Semantic Image-Text Relations

Two modalities are often used to convey information in a complementary a...

MM-Locate-News: Multimodal Focus Location Estimation in News

The consumption of news has changed significantly as the Web has become ...

Finding Person Relations in Image Data of the Internet Archive

The multimedia content in the World Wide Web is rapidly growing and cont...
