Learning Joint Representations of Videos and Sentences with Web Image Search

08/08/2016
by Mayu Otani, et al.

Our objective is video retrieval based on natural language queries. We also consider the analogous problems of retrieving sentences and generating descriptions for an input video. Recent work has addressed these problems by embedding visual and textual inputs into a common space in which semantic similarity corresponds to distance. We adopt the embedding approach and make the following contributions. First, we use web image search in the sentence embedding process to disambiguate fine-grained visual concepts. Second, we propose embedding models for sentence, image, and video inputs whose parameters are learned simultaneously. Finally, we show how the proposed model can be applied to description generation. Overall, we observe a clear improvement over state-of-the-art methods in the video and sentence retrieval tasks. In description generation, performance is comparable to the current state of the art, even though our embeddings were trained for the retrieval tasks.
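To make the retrieval idea concrete, the following is a minimal sketch of ranking videos against a sentence query once both have been mapped into a common space. It is not the authors' model: the embeddings are hand-made toy vectors, the helper names (`l2_normalize`, `euclidean_distance`, `rank_videos`) are hypothetical, and the use of Euclidean distance on L2-normalized vectors is an assumption made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_videos(sentence_embedding, video_embeddings):
    """Return video ids ordered from nearest to farthest from the query.

    Assumption: a smaller distance in the shared space means the video is
    semantically closer to the sentence.
    """
    query = l2_normalize(sentence_embedding)
    scored = [
        (euclidean_distance(query, l2_normalize(vec)), video_id)
        for video_id, vec in video_embeddings.items()
    ]
    scored.sort()
    return [video_id for _, video_id in scored]


# Toy embeddings standing in for the outputs of learned sentence and
# video encoders (hypothetical values, not taken from the paper).
video_embeddings = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.7],
}
query_embedding = [1.0, 0.0, 0.0]

ranking = rank_videos(query_embedding, video_embeddings)
print(ranking)
```

Sentence retrieval for a given video would follow the same pattern with the roles swapped: the video embedding becomes the query and candidate sentence embeddings are ranked by distance.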


