Diversified Co-Attention towards Informative Live Video Commenting

11/07/2019
by Zhihan Zhang, et al.

We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments based on both video frames and other viewers' remarks. A central challenge in this task is appropriately modeling the complex dependencies between video and textual inputs. Previous work on ALVC applies separate attention to these two input sources to obtain their representations. In this paper, we argue that video and text information should be modeled integrally. We propose a novel model equipped with a Diversified Co-Attention layer (DCA) and a Gated Attention Module (GAM). DCA allows interactions between video and text from diversified perspectives via metric learning, while GAM collects an informative context for comment generation. We further introduce a parameter orthogonalization technique to alleviate information redundancy in DCA. Experimental results show that our model outperforms both previous ALVC approaches and the traditional co-attention model, achieving state-of-the-art results.
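
The idea of co-attention under several learned metrics, paired with a penalty that discourages redundant metrics, can be illustrated with a small sketch. This is not the authors' implementation: the bilinear affinity `video @ W @ text^T`, the pairwise squared inner-product penalty, and every name below (`co_attention`, `orthogonality_penalty`, `random_matrix`, the toy dimensions) are assumptions chosen for illustration, and the gated aggregation performed by GAM is omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def co_attention(video, text, metric):
    """One co-attention pass under a single metric matrix (assumed bilinear form)."""
    # affinity[i][j] scores frame i against token j through the metric.
    affinity = matmul(matmul(video, metric), transpose(text))
    # Each frame attends over tokens; each token attends over frames.
    video_ctx = matmul(softmax_rows(affinity), text)
    text_ctx = matmul(softmax_rows(transpose(affinity)), video)
    return video_ctx, text_ctx


def orthogonality_penalty(metrics):
    """Sum of squared Frobenius inner products between distinct metrics.

    An assumed regularizer: it is zero when the metrics are pairwise
    orthogonal, which pushes them toward capturing different perspectives.
    """
    total = 0.0
    for i in range(len(metrics)):
        for j in range(i + 1, len(metrics)):
            inner = sum(
                a * b
                for row_i, row_j in zip(metrics[i], metrics[j])
                for a, b in zip(row_i, row_j)
            )
            total += inner ** 2
    return total


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes, hypothetical: 3 frames, 5 comment tokens, 4-dim features, 2 metrics.
rng = random.Random(0)
dim, n_frames, n_tokens, n_metrics = 4, 3, 5, 2
video = random_matrix(n_frames, dim, rng)
text = random_matrix(n_tokens, dim, rng)
metrics = [random_matrix(dim, dim, rng) for _ in range(n_metrics)]

contexts = [co_attention(video, text, w) for w in metrics]
penalty = orthogonality_penalty(metrics)
```

In a trained model the metrics would be learned parameters and the penalty would be added to the generation loss with some weight; here they are random, so the sketch only shows the shapes and the data flow.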

research
08/13/2018

Live Video Comment Generation Based on Surrounding Frames and Live Comments

In this paper, we propose the task of live comment generation. Live comm...
research
02/07/2020

Multimodal Matching Transformer for Live Commenting

Automatic live commenting aims to provide real-time comments on videos f...
research
09/13/2018

LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

We introduce the task of automatic live commenting. Live commenting, whi...
research
04/28/2023

Knowledge Enhanced Model for Live Video Comment Generation

Live video commenting is popular on video media platforms, as it can cre...
research
06/04/2020

Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts

Live video commenting systems are an emerging feature of online video si...
research
03/19/2021

MDMMT: Multidomain Multimodal Transformer for Video Retrieval

We present a new state-of-the-art on the text to video retrieval task on...
research
11/07/2016

Memory-augmented Attention Modelling for Videos

We present a method to improve video description generation by modeling ...
