REVECA – Rich Encoder-decoder framework for Video Event CAptioner

06/18/2022
by Jaehyuk Heo, et al.

We describe the approach we used in the Generic Boundary Event Captioning challenge at the Long-Form Video Understanding Workshop held at CVPR 2022. We designed a Rich Encoder-decoder framework for Video Event CAptioner (REVECA) that uses spatial and temporal information from the video to generate a caption for the corresponding event boundary. REVECA uses frame position embedding to incorporate information from before and after the event boundary. Furthermore, it employs features extracted using the temporal segment network and a temporal-based pairwise difference method to learn temporal information. A semantic segmentation mask is adopted in the attentional pooling process to learn the subject of an event. Finally, LoRA is applied to fine-tune the image encoder and improve learning efficiency. REVECA yielded an average score of 50.97 on the Kinetics-GEBC test data, an improvement of 10.17 over the baseline method. Our code is available at https://github.com/TooTouch/REVECA.
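The abstract names LoRA as the technique used to fine-tune the image encoder. As a rough illustration of the general idea only, the following is a minimal sketch of a low-rank adapter on a single linear layer, written in plain Python. It is not taken from the REVECA code: the class name `LoRALinear`, the helper `matmul`, the rank, the scaling factor `alpha`, and the initialization values are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update B @ A.

    Only A (rank x d_in) and B (d_out x rank) would be trained; W stays fixed.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight (assumed given)
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to one input vector x of length d_in."""
        return [sum(w * v for w, v in zip(row, x))
                for row in self.effective_weight()]

    def num_trainable(self):
        """Count the adapter parameters (entries of A and B)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 identity weight with a rank-1 adapter.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
output = layer.forward([1.0, 2.0, 3.0, 4.0])
```

In this sketch the adapter holds 8 trainable values against the 16 in the frozen weight. The saving grows with layer size, which is the usual motivation for using a low-rank update when fine-tuning a large encoder.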

