Saying What You're Looking For: Linguistics Meets Video Search

09/20/2013
by Andrei Barbu, et al.

We present an approach to searching large video corpora for clips that depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences that have identical words but entirely different meanings: "The person rode the horse" vs. "The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score that indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses knowledge about the intended sentential query to focus the tracker on the relevant participants and to ensure that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries that correspond to either verbs or nouns, we show how one can search for complex queries that contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies.
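To make the two central ideas concrete, here is a minimal Python sketch. It shows that a bag-of-words view cannot tell the two example sentences apart while a role-aware view can, and that ranking a corpus amounts to sorting clips by a per-clip score. Everything in it is an illustrative assumption and not the authors' method: `toy_roles` is a hypothetical stand-in for a real parser and handles only sentences of the form "The X VERB the Y", `ANNOTATIONS` is a made-up two-clip corpus, and `toy_score` is a placeholder for the scorer described in the abstract, which jointly tracks objects and recognizes whether the tracks depict the sentence.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

Roles = Tuple[str, str, str]  # (agent, verb, patient)


def bag_of_words(sentence: str) -> Counter:
    """Order-insensitive view of a sentence: word counts only."""
    return Counter(sentence.lower().rstrip(".").split())


def toy_roles(sentence: str) -> Roles:
    """Hypothetical parser stand-in for 'The X VERB the Y' sentences only."""
    words = sentence.lower().rstrip(".").split()
    # words looks like ['the', agent, verb, 'the', patient]
    return (words[1], words[2], words[4])


def rank_clips(
    clips: Sequence[str],
    query: str,
    score: Callable[[str, str], float],
) -> list:
    """Score every clip against the query and return clips, best first."""
    return sorted(clips, key=lambda clip: score(clip, query), reverse=True)


# Made-up corpus: each clip is labeled with the roles it supposedly depicts.
ANNOTATIONS = {
    "clip_a": ("horse", "rode", "person"),
    "clip_b": ("person", "rode", "horse"),
}


def toy_score(clip: str, query: str) -> float:
    """Placeholder scorer: 1.0 if the clip's roles match the query's, else 0.0."""
    return 1.0 if ANNOTATIONS[clip] == toy_roles(query) else 0.0


s1 = "The person rode the horse"
s2 = "The horse rode the person"
same_words = bag_of_words(s1) == bag_of_words(s2)  # identical word counts
same_roles = toy_roles(s1) == toy_roles(s2)        # agent and patient swapped
ranked = rank_clips(["clip_a", "clip_b"], s1, toy_score)
```

In this toy setting `same_words` is true and `same_roles` is false, so only the role-aware representation lets `rank_clips` place `clip_b` ahead of `clip_a` for the query "The person rode the horse".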

