Bridging the Gap Between Object Detection and User Intent via Query-Modulation

06/18/2021
by   Marco Fornoni, et al.

When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search. However, most object detection models ignore user intent and rely on image pixels as their only input. This often leads to incorrect results, such as the absence of a high-confidence detection on the object of interest, or a detection with the wrong class label. In this paper we investigate techniques to modulate standard object detectors so that they explicitly account for user intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be trained to solve both query-modulated detection and standard object detection simultaneously.
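To make the idea of synthesizing query-modulated training data from standard detection annotations more concrete, here is a minimal sketch of one way it could be done. It is not the authors' pipeline: the annotation format, the function name `synthesize_query_examples`, and the choice of using the class label itself as the query are all assumptions made for illustration.

```python
from collections import defaultdict


def synthesize_query_examples(annotations):
    """Turn one image's detection annotations into (query, targets) pairs.

    `annotations` is a list of (label, box) tuples, where a box is
    (x_min, y_min, x_max, y_max). This format is hypothetical and is not
    taken from the paper.
    """
    boxes_by_label = defaultdict(list)
    for label, box in annotations:
        boxes_by_label[label].append(box)

    # Assumed scheme: one example per label present in the image. The label
    # plays the role of the query, and only boxes of that label are targets.
    return [
        {"query": label, "targets": boxes}
        for label, boxes in sorted(boxes_by_label.items())
    ]


# Illustrative annotations for a single image (made-up values).
image_annotations = [
    ("dog", (10, 20, 110, 220)),
    ("cat", (150, 30, 260, 200)),
    ("dog", (300, 40, 380, 210)),
]
examples = synthesize_query_examples(image_annotations)
```

Under these assumptions, an image containing two dogs and one cat yields two examples: a "cat" query with one target box and a "dog" query with two. In a real system the query string would then be mapped to an embedding before being fed to the detector.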

research
06/04/2021

Hallucination In Object Detection – A Study In Visual Part Verification

We show that object detectors can hallucinate and detect missing objects...
research
11/28/2020

Class-agnostic Object Detection

Object detection models perform well at localizing and classifying objec...
research
12/19/2019

Metamorphic Testing for Object Detection Systems

Recent advances in deep neural networks (DNNs) have led to object detect...
research
02/27/2019

Customizing Object Detectors for Indoor Robots

Object detection models based on convolutional neural networks (CNNs) de...
research
12/01/2017

Rank of Experts: Detection Network Ensemble

The recent advances of convolutional detectors show impressive performan...
research
06/28/2022

Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark

Tiny object detection (TOD) in aerial images is challenging since a tiny...
research
06/28/2019

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

Millions of people reach out to digital assistants such as Siri every da...
