Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding

04/01/2022
by Yuxuan Wang, et al.

Cognitive science has shown that humans perceive videos in terms of events separated by state changes of dominant subjects. State changes trigger new events and are among the most useful signals in the large amount of otherwise redundant information perceived. However, previous research focuses on the overall understanding of segments without evaluating the fine-grained status changes inside them. In this paper, we introduce a new dataset called Kinetic-GEBC (Generic Event Boundary Captioning). The dataset consists of over 170k boundaries, each associated with captions describing status changes in generic events, across 12K videos. Building on this dataset, we propose three tasks that support the development of a more fine-grained, robust, and human-like understanding of videos through status changes. We evaluate many representative baselines on our dataset, and we also design a new TPD (Temporal-based Pairwise Difference) Modeling method for current state-of-the-art backbones that achieves significant performance improvements. The results also show that current methods still face formidable challenges in utilizing different granularities, representing visual differences, and accurately localizing status changes. Further analysis shows that our dataset can drive the development of more powerful methods for understanding status changes and thus improve video-level comprehension.


