A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

09/16/2019
by Yue Liao, et al.

Referring expression comprehension aims to localize the object instance described by a natural language expression. Current referring expression methods achieve strong performance. However, none of them can run inference in real time without a drop in accuracy. Inference is relatively slow because these methods artificially split referring expression comprehension into two sequential stages: proposal generation and proposal ranking. This split does not exactly conform to the habits of human cognition. To this end, we propose a novel Real-time Cross-modality Correlation Filtering method (RCCF). RCCF reformulates referring expression comprehension as a correlation filtering process. The expression is first mapped from the language domain to the visual domain and then treated as a template (kernel) to perform correlation filtering on the image feature map. The peak value in the correlation heatmap indicates the center point of the target box. In addition, RCCF regresses a 2-D object size and a 2-D offset. The center point coordinates, object size, and center point offset together form the target bounding box. Our method runs at 40 FPS while achieving leading performance on the RefClef, RefCOCO, RefCOCO+, and RefCOCOg benchmarks. On the challenging RefClef dataset, our method almost doubles the state-of-the-art performance (34.70 […] attention and studies to the new cross-modality correlation filtering framework as well as the one-stage framework for referring expression comprehension.
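The decoding step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the language kernel is taken to be a single C-dimensional vector (so correlation reduces to a per-location dot product), the size and offset maps are dense (2, H, W) arrays, sizes are expressed in input-image pixels, and the feature map has a hypothetical downsampling `stride` of 4. The function names `correlate` and `decode_box` are likewise hypothetical.

```python
import numpy as np


def correlate(feature_map, kernel):
    """Correlate a language-derived kernel with an image feature map.

    feature_map: (C, H, W) image features.
    kernel: (C,) expression embedding already mapped to the visual domain
            (assumed 1x1 spatial extent for this sketch).
    Returns an (H, W) correlation heatmap.
    """
    return np.einsum("chw,c->hw", feature_map, kernel)


def decode_box(heatmap, size_map, offset_map, stride=4):
    """Turn the heatmap peak plus regressed size and offset into a box.

    size_map: (2, H, W) holding (width, height) in input-image pixels (assumed).
    offset_map: (2, H, W) holding sub-cell (dx, dy) offsets of the center.
    stride: assumed downsampling factor between image and feature map.
    Returns (x1, y1, x2, y2) in input-image pixels.
    """
    # The peak of the correlation heatmap marks the target's center cell.
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    w, h = size_map[:, y, x]
    dx, dy = offset_map[:, y, x]
    # Refine the center with the offset, then map back to image coordinates.
    cx = (x + dx) * stride
    cy = (y + dy) * stride
    return (float(cx - w / 2), float(cy - h / 2),
            float(cx + w / 2), float(cy + h / 2))
```

Because the box comes from a single heatmap peak rather than from ranking a set of proposals, the whole pipeline is one forward pass, which is the property the abstract credits for the real-time speed.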


research
05/05/2021

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

Referring Expression Comprehension (REC) has become one of the most impo...
research
08/12/2021

3D-SiamRPN: An End-to-End Learning Method for Real-Time 3D Single Object Tracking Using Raw Point Cloud

3D single object tracking is a key issue for autonomous following robot,...
research
12/12/2018

Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

The task in referring expression comprehension is to localise the object...
research
12/20/2020

PPGN: Phrase-Guided Proposal Generation Network For Referring Expression Comprehension

Reference expression comprehension (REC) aims to find the location that ...
research
04/21/2022

Referring Expression Comprehension via Cross-Level Multi-Modal Fusion

As an important and challenging problem in vision-language tasks, referr...
research
12/07/2019

A Real-time Global Inference Network for One-stage Referring Expression Comprehension

Referring Expression Comprehension (REC) is an emerging research spot in...
research
07/19/2020

Referring Expression Comprehension: A Survey of Methods and Datasets

Referring expression comprehension (REC) aims to localize a target objec...
