Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

11/18/2022
by Mengmeng Xu, et al.

This paper addresses the problem of localizing objects in image and video datasets from visual exemplars. In particular, we focus on the challenging problem of egocentric visual query localization. We first identify grave implicit biases in current query-conditioned model design and visual query datasets. Then, we directly tackle these biases at both the frame and object-set levels. Concretely, our method addresses them by expanding limited annotations and dynamically dropping object proposals during training. Additionally, we propose a novel transformer-based module that takes object-proposal set context into account while incorporating query information. We name our module Conditioned Contextual Transformer, or CocoFormer. Our experiments show that the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations. In particular, we improve frame-level detection performance from a baseline of 26.28, and VQ3D localization scores, by significant margins. Our improved context-aware query object detector ranked first and second in the VQ2D and VQ3D tasks, respectively, in the 2nd Ego4D challenge. We also showcase the relevance of our proposed model in the Few-Shot Detection (FSD) task, where we achieve state-of-the-art (SOTA) results. Our code is available at https://github.com/facebookresearch/vq2d_cvpr.
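
The abstract mentions dynamically dropping object proposals during training but does not spell out the rule. As a rough illustration only, the sketch below shows one plausible reading: each proposal in a frame's set is discarded independently with some probability, so the model sees proposal sets of varying size and composition. The function name `drop_proposals`, the `drop_prob` parameter, and the independent-per-proposal rule are all assumptions made for this example, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def drop_proposals(
    proposals: Sequence[T],
    drop_prob: float,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return a randomly thinned copy of a proposal set.

    Hypothetical helper: each proposal is dropped independently with
    probability `drop_prob`. The paper's actual rule may differ.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so drop_prob=0 keeps everything
    # and drop_prob=1 drops everything.
    return [p for p in proposals if rng.random() >= drop_prob]


# Example: proposals as (x1, y1, x2, y2) boxes; values are made up.
boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
thinned = drop_proposals(boxes, drop_prob=0.3, rng=random.Random(0))
```

In a training loop, a call like this would run once per frame before the proposals are passed to the detector head, with dropping disabled at evaluation time.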

research
03/18/2022

Local-Global Context Aware Transformer for Language-Guided Video Segmentation

We explore the task of language-guided video segmentation (LVS). Previou...
research
09/15/2021

Anchor DETR: Query Design for Transformer-Based Detector

In this paper, we propose a novel query design for the transformer-based...
research
08/03/2022

Negative Frames Matter in Egocentric Visual Query 2D Localization

The recently released Ego4D dataset and benchmark significantly scales a...
research
06/13/2022

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

Recent high-performing Human-Object Interaction (HOI) detection techniqu...
research
08/19/2021

Video Relation Detection via Tracklet based Visual Transformer

Video Visual Relation Detection (VidVRD), has received significant atten...
research
09/14/2018

Detection-by-Localization: Maintenance-Free Change Object Detector

Recent researches demonstrate that self-localization performance is a ve...
research
11/24/2022

One-Shot General Object Localization

This paper presents a general one-shot object localization algorithm cal...
