
Cross-media Structured Common Space for Multimedia Event Extraction

by Manling Li, et al.
University of Illinois at Urbana-Champaign
Columbia University

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.



