Training Multimedia Event Extraction With Generated Images and Captions

06/15/2023
by Zilin Du, et al.

Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction. However, the task lacks annotated multimodal training data, and artificially generated training data suffer from distribution shift relative to real-world data. In this paper, we propose Cross-modality Augmented Multimedia Event Learning (CAMEL), which successfully utilizes artificially generated multimodal training data and achieves state-of-the-art performance. We start with two labeled unimodal datasets, one in text and one in images, and generate the missing modality using off-the-shelf image generators like Stable Diffusion and image captioners like BLIP. We then train the network on the resulting multimodal datasets. To learn robust features that are effective across domains, we devise an iterative and gradual training strategy. Extensive experiments show that CAMEL surpasses state-of-the-art (SOTA) baselines on the M2E2 benchmark. On multimedia events in particular, we outperform the prior SOTA by 4.2% F1 on event mention identification and by 9.8% F1 on argument identification, which indicates that CAMEL learns synergistic representations from the two modalities. Our work demonstrates a recipe to unleash the power of synthetic training data in structured prediction.
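
The data-construction step described in the abstract (start from two labeled unimodal datasets and generate the missing modality for each) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Example`, `fill_missing_modality`, `fake_text_to_image`, and `fake_image_to_text`, the file name, and the event labels are all hypothetical, and the two generator functions are stubs standing in for a real image generator such as Stable Diffusion and a real captioner such as BLIP.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    """One paired training example (hypothetical structure, for illustration)."""
    text: str
    image: str       # stand-in for image data, e.g. a file path
    label: str       # annotation carried over from the labeled modality
    synthetic: str   # which modality was generated: "image" or "text"


def fill_missing_modality(
    text_data: List[Tuple[str, str]],
    image_data: List[Tuple[str, str]],
    text_to_image: Callable[[str], str],
    image_to_text: Callable[[str], str],
) -> List[Example]:
    """Turn two labeled unimodal datasets into one list of text-image pairs."""
    paired = []
    # Labeled text gets a generated image; the label stays with the text.
    for text, label in text_data:
        paired.append(Example(text, text_to_image(text), label, "image"))
    # Labeled images get a generated caption; the label stays with the image.
    for image, label in image_data:
        paired.append(Example(image_to_text(image), image, label, "text"))
    return paired


# Stub generators (assumptions): a real pipeline would call an image
# generator and an image captioner here instead.
def fake_text_to_image(text: str) -> str:
    return f"<generated image for: {text}>"


def fake_image_to_text(image: str) -> str:
    return f"a caption describing {image}"


# Placeholder data; the labels are illustrative, not taken from the paper.
text_data = [("Protesters marched through the city.", "Demonstrate")]
image_data = [("photo_001.jpg", "Attack")]

paired = fill_missing_modality(
    text_data, image_data, fake_text_to_image, fake_image_to_text
)
for example in paired:
    print(example)
```

Recording which modality is synthetic in each example is a design choice of this sketch; it would let a training loop treat generated and real inputs differently, which is relevant given the distribution shift the abstract mentions.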


