In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

01/06/2023
by Andrew Lu, et al.

Understanding event relationships in videos requires a model to understand the underlying structure of events (i.e., the event type, the associated argument roles, and the corresponding entities) along with the factual knowledge needed for reasoning. Structural symbolic representation (SSR) based methods directly take event types and associated argument roles/entities as inputs to perform reasoning. However, the state-of-the-art video event-relation prediction system shows the necessity of using continuous feature vectors from input videos; existing methods based solely on SSR inputs fail completely, even when given oracle event types and argument roles. In this paper, we conduct an extensive empirical analysis to answer the following questions: 1) why SSR-based methods fail; 2) how to properly understand the evaluation setting of video event-relation prediction; 3) how to uncover the potential of SSR-based methods. We first trace the failure of previous SSR-based video event prediction models to sub-optimal training settings. Surprisingly, we find that a simple SSR-based model with tuned hyperparameters can actually yield a 20% absolute improvement in macro-accuracy over the state-of-the-art model. Then, through qualitative and quantitative analysis, we show that evaluation taking only videos as inputs is currently infeasible, and that an accurate evaluation relies on oracle event information. Based on these findings, we propose to further contextualize the SSR-based model into an Event-Sequence Model and to equip it with more factual knowledge through a simple yet effective method: reformulating external visual commonsense knowledge bases into an event-relation prediction pretraining dataset. The resulting model establishes a new state of the art, with a 25% macro-accuracy performance boost.
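The results above are reported in macro-accuracy. The abstract does not define the metric, so the following is a minimal sketch under the assumption that macro-accuracy is the unweighted mean of per-class accuracy over the relation labels (every gold class counts equally, regardless of how often it occurs). The function name and the relation labels used below are hypothetical and for illustration only.

```python
from collections import defaultdict


def macro_accuracy(gold, pred):
    """Unweighted mean of per-class accuracy (assumed definition of macro-accuracy).

    For each gold class c, per-class accuracy is the fraction of examples
    labeled c that were predicted as c; the results are then averaged so that
    rare relation classes weigh as much as frequent ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot compute macro-accuracy on an empty set")

    correct = defaultdict(int)  # correct predictions per gold class
    total = defaultdict(int)    # number of examples per gold class
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1

    # Average per-class accuracy over the classes present in the gold labels.
    return sum(correct[c] / total[c] for c in total) / len(total)


def plain_accuracy(gold, pred):
    """Fraction of all examples predicted correctly (micro/overall accuracy)."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels: a model that always predicts the majority class.
gold = ["no_relation"] * 8 + ["causes"] * 2
pred = ["no_relation"] * 10
print(plain_accuracy(gold, pred))  # 0.8
print(macro_accuracy(gold, pred))  # 0.5
```

Under this assumed definition, a majority-class predictor scores 0.8 in plain accuracy but only 0.5 in macro-accuracy, which is why a macro-averaged metric is a stricter test of whether a model actually distinguishes relation types.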

research
01/13/2022

CLIP-Event: Connecting Text and Images with Event Structures

Vision-language (V+L) pretraining models have achieved great success in ...
research
11/03/2022

Open-Vocabulary Argument Role Prediction for Event Extraction

The argument role in event extraction refers to the relation between an ...
research
08/23/2021

Event Extraction by Associating Event Types and Argument Roles

Event extraction (EE), which acquires structural event knowledge from te...
research
06/06/2023

Joint Event Extraction via Structural Semantic Matching

Event Extraction (EE) is one of the essential tasks in information extra...
research
03/25/2023

COFFEE: A Contrastive Oracle-Free Framework for Event Extraction

Event extraction is a complex information extraction task that involves ...
research
10/03/2019

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

The ability to reason about temporal and causal events from videos lies ...
research
06/19/2013

Event-Object Reasoning with Curated Knowledge Bases: Deriving Missing Information

The broader goal of our research is to formulate answers to why and how ...
