Bridging the Gap Between Object Detection and User Intent via Query-Modulation

by Marco Fornoni, et al.

When interacting with objects through cameras or pictures, users often have a specific intent. For example, they may want to perform a visual search. However, most object detection models ignore the user's intent and rely on image pixels as their only input. This often leads to incorrect results, such as the lack of a high-confidence detection on the object of interest, or a detection with the wrong class label. In this paper we investigate techniques to modulate standard object detectors to explicitly account for the user's intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be trained to solve query-modulated detection and standard object detection simultaneously.
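The abstract does not spell out how training data is synthesized from standard detection annotations. As a minimal sketch of one plausible reading, the snippet below groups a single image's (label, box) annotations by label, so that each label serves as a query and its boxes serve as the expected detections. The function name `synthesize_query_examples`, the box layout, and the sample annotations are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The paper's format is not given.
Box = Tuple[float, float, float, float]


def synthesize_query_examples(
    annotations: List[Tuple[str, Box]],
) -> Dict[str, List[Box]]:
    """Group one image's (label, box) annotations by label.

    Each key acts as a hypothetical query; its value holds the boxes a
    query-modulated detector would be expected to return for that query.
    """
    targets: Dict[str, List[Box]] = defaultdict(list)
    for label, box in annotations:
        targets[label].append(box)
    return dict(targets)


# Hypothetical annotations for a single image.
image_annotations = [
    ("dog", (10.0, 20.0, 110.0, 220.0)),
    ("ball", (150.0, 200.0, 180.0, 230.0)),
    ("dog", (300.0, 40.0, 420.0, 260.0)),
]

examples = synthesize_query_examples(image_annotations)
for query, boxes in examples.items():
    print(query, boxes)
```

Under this reading, one annotated image yields one training example per distinct label it contains, which may be part of why standard detection datasets can supply query-modulated training data at large scale.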

