Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval

07/06/2020
by Xun Yang, et al.

The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly follow the concept-based paradigm, which works for simple queries but is usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos lie close to each other. Despite its simplicity, it forgoes the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method that jointly learns the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the text query. We then design a tree-augmented query encoder to derive a structure-aware query representation and a temporal attentive video encoder to model the temporal characteristics of videos. Finally, both the query and the videos are mapped into a joint embedding space for matching and ranking. This approach yields a better understanding and modeling of complex queries, and thereby better video retrieval performance. Extensive experiments on large-scale video retrieval benchmark datasets demonstrate the effectiveness of our approach.
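The last two steps of the abstract (temporal attentive pooling of video features, then matching and ranking in a joint embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's encoders are learned neural networks, while this sketch uses hand-written vectors, omits the latent semantic tree and the tree-augmented query encoder entirely, and assumes cosine similarity as the matching score. The function names (`attentive_pool`, `rank_videos`) and the `scorer` weight vector are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def attentive_pool(frames, scorer):
    """Softmax-weighted average of per-frame feature vectors.

    `scorer` is a hypothetical fixed weight vector standing in for a learned
    attention layer: each frame's attention score is dot(frame, scorer).
    """
    scores = [sum(f * w for f, w in zip(frame, scorer)) for frame in frames]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def rank_videos(query_vec, video_vecs):
    """Rank videos by cosine similarity to the query, best match first.

    Assumes the query and video vectors already live in the same space.
    """
    q = l2_normalize(query_vec)
    sims = {}
    for video_id, vec in video_vecs.items():
        v = l2_normalize(vec)
        sims[video_id] = sum(a * b for a, b in zip(q, v))
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two videos, each with two 2-dimensional frame features.
videos = {
    "video_a": attentive_pool([[1.0, 0.0], [0.8, 0.2]], scorer=[1.0, 0.0]),
    "video_b": attentive_pool([[0.0, 1.0], [0.1, 0.9]], scorer=[1.0, 0.0]),
}
ranking = rank_videos([1.0, 0.1], videos)
for video_id, score in ranking:
    print(video_id, round(score, 3))
```

In a trained system the query vector would come from the structure-aware query encoder and the frame features from a visual backbone; only the pooling and ranking logic shown here carries over.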

research
06/06/2019

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

Query-based moment retrieval aims to localize the most relevant moment i...
research
09/17/2018

Dual Dense Encoding for Zero-Example Video Retrieval

This paper attacks the challenging problem of zero-example video retriev...
research
04/27/2022

Relevance-based Margin for Contrastively-trained Video Retrieval Models

Video retrieval using natural language queries has attracted increasing ...
research
09/19/2022

Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising

The advancement of the communication technology and the popularity of th...
research
07/11/2022

LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval

Video-text retrieval is a class of cross-modal representation learning p...
research
04/09/2016

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

This paper presents a novel retrieval pipeline for video collections, wh...
research
11/24/2019

A Proposal-based Approach for Activity Image-to-Video Retrieval

Activity image-to-video retrieval task aims to retrieve videos containin...
