Poet: Product-oriented Video Captioner for E-commerce

08/16/2020
by   Shengyu Zhang, et al.

In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not amenable to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods in generation quality, product aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), both collected from Mobile Taobao. We will release the desensitized datasets to promote further investigation of both video captioning and general video analysis problems.

research
10/15/2019

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

This notebook paper presents our model in the VATEX video captioning cha...
research
06/24/2020

Comprehensive Information Integration Modeling Framework for Video Titling

In e-commerce, consumer-generated videos, which in general deliver consu...
research
03/08/2020

Object-Oriented Video Captioning with Temporal Graph and Prior Knowledge Building

Traditional video captioning requests a holistic description of the vide...
research
11/17/2022

Visual Commonsense-aware Representation Network for Video Captioning

Generating consecutive descriptions for videos, i.e., Video Captioning, ...
research
03/26/2023

GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Despite the recent emergence of video captioning models, how to generate...
research
08/25/2021

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training

Translating e-commercial product descriptions, a.k.a product-oriented ma...
research
08/01/2017

Video as a By-Product of Digital Prototyping: Capturing the Dynamic Aspect of Interaction

Requirements engineering provides several practices to analyze how a use...
