Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence

04/16/2020
by   Huy Manh Nguyen, et al.
0

Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing similarities measured in each embedding space using a weighted sum strategy. We determine the weights according to a sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive to state-of-the-art methods. These experimental results demonstrated the effectiveness of the proposed multiple embedding approach compared to existing methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

page 5

page 6

page 7

page 8

research
08/08/2016

Learning Joint Representations of Videos and Sentences with Web Image Search

Our objective is video retrieval based on natural language queries. In a...
research
08/07/2023

Topological Interpretations of GPT-3

This is an experiential study of investigating a consistent method for d...
research
03/25/2023

Learning video embedding space with Natural Language Supervision

The recent success of the CLIP model has shown its potential to be appli...
research
06/10/2022

Learning the Space of Deep Models

Embedding of large but redundant data, such as images or text, in a hier...
research
11/18/2019

Ladder Loss for Coherent Visual-Semantic Embedding

For visual-semantic embedding, the existing methods normally treat the r...
research
06/17/2015

Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

One of the biggest challenges in Multimedia information retrieval and un...
research
02/07/2022

Towards Micro-video Thumbnail Selection via a Multi-label Visual-semantic Embedding Model

The thumbnail, as the first sight of a micro-video, plays a pivotal role...

Please sign up or login with your details

Forgot password? Click here to reset