Scene Text Retrieval via Joint Text Detection and Similarity Learning

04/04/2021
by Hao Wang, et al.

Scene text retrieval aims to localize and retrieve all text instances in an image gallery that are the same as or similar to a given query text. Such a task is usually realized by matching the query text against the words recognized by an end-to-end scene text spotter. In this paper, we address this problem by directly learning a cross-modal similarity between a query text and each text instance in natural images. Specifically, we establish an end-to-end trainable network that jointly optimizes scene text detection and cross-modal similarity learning. In this way, scene text retrieval can be performed simply by ranking the detected text instances according to the learned similarity. Experiments on three benchmark datasets demonstrate that our method consistently outperforms state-of-the-art scene text spotting and retrieval approaches. In particular, the proposed framework of joint detection and similarity learning achieves significantly better performance than methods that treat the two steps separately. Code is available at: https://github.com/lanfeng4659/STR-TDSL.
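The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network has already produced one embedding per detected text instance and one embedding for the query text, and it assumes that cosine similarity is the similarity measure and that an image is scored by its best-matching instance. The names `cosine_similarity` and `rank_images` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    query_embedding: Sequence[float],
    gallery: Dict[str, List[Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Rank gallery images by their best-matching detected text instance.

    `gallery` maps an image id to the embeddings of the text instances
    detected in that image (assumed to come from the trained network).
    Images with no detections get the lowest possible cosine score, -1.0.
    """
    scores = []
    for image_id, instances in gallery.items():
        best = max(
            (cosine_similarity(query_embedding, emb) for emb in instances),
            default=-1.0,
        )
        scores.append((image_id, best))
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy example with hand-made 2-D embeddings (illustrative values only).
toy_gallery = {
    "img_a": [[1.0, 0.0], [0.0, 1.0]],  # one instance matches the query
    "img_b": [[0.0, 1.0]],              # orthogonal to the query
    "img_c": [],                        # no text detected
}
ranking = rank_images([1.0, 0.0], toy_gallery)
```

Because the gallery embeddings do not depend on the query, they could be computed once and reused across queries; only the query embedding and the similarity scores would need to be recomputed per search.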


