Dual-Stream Transformer for Generic Event Boundary Captioning

07/07/2022
by   Xin Gu, et al.
0

This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition. GEBC requires the captioning model to have a comprehension of instantaneous status changes around the given video boundary, which makes it much more challenging than conventional video captioning task. In this paper, a Dual-Stream Transformer with improvements on both video content encoding and captions generation is proposed: (1) We utilize three pre-trained models to extract the video features from different granularities. Moreover, we exploit the types of boundary as hints to help the model generate captions. (2) We particularly design an model, termed as Dual-Stream Transformer, to learn discriminative representations for boundary captioning. (3) Towards generating content-relevant and human-like captions, we improve the description quality by designing a word-level ensemble strategy. The promising results on the GEBC test split demonstrate the efficacy of our proposed model.

READ FULL TEXT
research
06/17/2023

LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning

Our winning entry for the CVPR 2023 Generic Event Boundary Captioning (G...
research
07/03/2022

Exploiting Context Information for Generic Event Boundary Captioning

Generic Event Boundary Captioning (GEBC) aims to generate three sentence...
research
03/22/2023

Text with Knowledge Graph Augmented Transformer for Video Captioning

Video captioning aims to describe the content of videos using natural la...
research
08/04/2021

Optimizing Latency for Online Video CaptioningUsing Audio-Visual Transformers

Video captioning is an essential technology to understand scenes and des...
research
05/13/2022

Joint Generation of Captions and Subtitles with Dual Decoding

As the amount of audio-visual content increases, the need to develop aut...
research
04/01/2022

Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding

Cognitive science has shown that humans perceive videos in terms of even...
research
05/18/2022

It Isn't Sh!tposting, It's My CAT Posting

In this paper, we describe a novel architecture which can generate hilar...

Please sign up or login with your details

Forgot password? Click here to reset