Object-Oriented Video Captioning with Temporal Graph and Prior Knowledge Building

03/08/2020
by Fangyi Zhu, et al.

Traditional video captioning requires a holistic description of the video, yet detailed descriptions of specific objects may not be available. Moreover, most methods rely on frame-level inter-object features and ambiguous descriptions during training, which makes it difficult to learn vision-language relationships. Without associating objects' transition trajectories, these image-based methods cannot understand activities from visual features. We propose a novel task, named object-oriented video captioning, which focuses on understanding videos at the object level. We re-annotate object-sentence pairs for more effective cross-modal learning. We then design the video-based object-oriented video captioning network (OVC-Net) to reliably analyze activities over time using only visual features and to stably capture vision-language connections on small datasets. To demonstrate its effectiveness, we evaluate the method on the new dataset and compare it with state-of-the-art video captioning methods. The experimental results show that OVC-Net can precisely describe concurrent objects and their activities in detail.

research
08/14/2021

Cross-Modal Graph with Meta Concepts for Video Captioning

Video captioning targets interpreting the complex visual contents as tex...
research
08/05/2021

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

Video captioning combines video understanding and language generation. D...
research
07/24/2022

SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions

Detecting suspicious activities in surveillance videos has been a longst...
research
03/16/2018

Object Captioning and Retrieval with Natural Language

We address the problem of jointly learning vision and language to unders...
research
08/16/2020

Poet: Product-oriented Video Captioner for E-commerce

In e-commerce, a growing number of user-generated videos are used for pr...
research
09/28/2022

Thinking Hallucination for Video Captioning

With the advent of rich visual representations and pre-trained language ...
research
09/07/2021

Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention

Automatically describing video, or video captioning, has been widely stu...