SpotEM: Efficient Video Search for Episodic Memory

The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. SpotEM consists of three key ideas: 1) a novel clip selector that learns to identify promising video regions to search conditioned on the language query; 2) a set of low-cost semantic indexing features that capture the context of rooms, objects, and interactions that suggest where to look; and 3) distillation losses that address the optimization issues arising from end-to-end joint training of the clip selector and EM model. Our experiments on 200+ hours of video from the Ego4D EM Natural Language Queries benchmark and three different EM models demonstrate the effectiveness of our approach: computing only 10%-25% of the clip features, we preserve 84%-97% of the original EM model's accuracy. Project page: https://vision.cs.utexas.edu/projects/spotem
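
To illustrate the general idea of query-conditioned clip selection under a compute budget, here is a minimal sketch. It is not the SpotEM implementation: the abstract describes a learned clip selector, whereas this sketch substitutes a plain cosine similarity between low-cost per-clip features and a query embedding. The names `select_clips`, `sparse_features`, `budget_frac`, and `expensive_extractor` are hypothetical.

```python
import math


def select_clips(cheap_feats, query_vec, budget_frac):
    """Pick the clips whose low-cost features best match the query.

    Assumption: cosine similarity stands in for a learned selector.
    Returns the chosen clip indices in temporal order.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scores = [cosine(f, query_vec) for f in cheap_feats]
    # Number of clips allowed by the budget (at least one).
    k = max(1, math.ceil(budget_frac * len(cheap_feats)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def sparse_features(clips, selected, expensive_extractor):
    """Run the expensive extractor only on the selected clips."""
    return {i: expensive_extractor(clips[i]) for i in selected}


if __name__ == "__main__":
    # Toy data: four clips with 2-D low-cost features and a 2-D query vector.
    cheap = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 1.0]]
    query = [1.0, 0.0]
    chosen = select_clips(cheap, query, budget_frac=0.5)
    feats = sparse_features(["c0", "c1", "c2", "c3"], chosen, lambda c: c.upper())
    print(chosen, feats)
```

The downstream EM model would then see expensive features only at the selected positions, which is where the compute saving comes from in this sketch.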

