
Bridging the Gap Between Object Detection and User Intent via Query-Modulation

06/18/2021
by Marco Fornoni, et al.

When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search. However, most object detection models ignore the user's intent and rely on image pixels as their only input. This often leads to incorrect results, such as the lack of a high-confidence detection on the object of interest, or a detection with the wrong class label. In this paper, we investigate techniques to modulate standard object detectors so that they explicitly account for the user's intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be trained to solve query-modulated detection and standard object detection simultaneously.
