
Towards End-to-End Unified Scene Text Detection and Layout Analysis

by Shangbang Long, et al.

Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. We introduce the first hierarchical scene text dataset to enable this novel research task. We also propose a novel method that simultaneously detects scene text and forms text clusters in a unified way. Comprehensive experiments show that our unified model outperforms multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need for complex post-processing. Dataset and code:




FOTS: Fast Oriented Text Spotting with a Unified Network

Incidental scene text spotting is considered one of the most difficult a...

Scene Text Retrieval via Joint Text Detection and Similarity Learning

Scene text retrieval aims to localize and search all text instances from...

Real-time Scene Text Detection with Differentiable Binarization

Recently, segmentation-based methods are quite popular in scene text det...

Page Layout Analysis System for Unconstrained Historic Documents

Extraction of text regions and individual text lines from historic docum...

Detecting Curve Text in the Wild: New Dataset and New Solution

Scene text detection has made great progress in recent years. The d...

An Efficient and Layout-Independent Automatic License Plate Recognition System Based on the YOLO detector

In this paper, we present an efficient and layout-independent Automatic ...

Unified Line and Paragraph Detection by Graph Convolutional Networks

We formulate the task of detecting lines and paragraphs in a document in...

Code Repositories


The HierText dataset contains ~12k images from the Open Images dataset v6 with a large number of text entities. We provide word-, line-, and paragraph-level annotations.
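
To make the three annotation levels concrete, here is a minimal sketch of how a word, line, and paragraph hierarchy could be represented in Python. The class names (`Word`, `Line`, `Paragraph`), the `vertices` field, and the `count_entities` helper are illustrative assumptions, not the dataset's actual schema or API; the repository is the authority on the real format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinate; assumed convention


@dataclass
class Word:
    text: str
    vertices: List[Point]  # polygon outlining the word (hypothetical field)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A line's text is its words joined by single spaces.
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        # A paragraph's text is its lines joined by newlines.
        return "\n".join(line.text for line in self.lines)


def count_entities(paragraphs: List[Paragraph]) -> Tuple[int, int, int]:
    """Return (paragraph count, line count, word count) for one image."""
    n_lines = sum(len(p.lines) for p in paragraphs)
    n_words = sum(len(line.words) for p in paragraphs for line in p.lines)
    return len(paragraphs), n_lines, n_words


# A made-up example: one paragraph with two lines and three words.
box = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
example = [
    Paragraph(lines=[
        Line(words=[Word("OPEN", box), Word("DAILY", box)]),
        Line(words=[Word("9-5", box)]),
    ])
]
```

Nesting words inside lines and lines inside paragraphs keeps the grouping explicit, so a layout-level query such as "all text in this paragraph" is a simple traversal rather than a separate lookup.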
