Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

05/04/2023
by Renshen Wang, et al.

Text reading order is a crucial aspect of the output of an OCR engine, with a large impact on downstream tasks. The task is difficult because layout structures vary widely across domains, and real-world image degradations such as perspective distortions make it harder still. We propose a lightweight, scalable and generalizable approach to identify text reading order with a multi-modal, multi-task graph convolutional network (GCN) running on a sparse layout-based graph. Predictions from the model provide hints of bidimensional relations among text lines and layout region structures, upon which a post-processing cluster-and-sort algorithm generates an ordered sequence of all the text lines. The model is language-agnostic and runs effectively across multi-language datasets that contain various types of images taken in uncontrolled conditions, and it is small enough to be deployed on virtually any platform, including mobile devices.
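To make the cluster-and-sort idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the algorithm's details. The names `TextLine`, `cluster_and_sort`, `same_region_edges` and `column_wise` are hypothetical. The sketch assumes the model has already produced (a) pairs of text lines predicted to belong to the same layout region and (b) a per-line flag saying whether the line is read top-to-bottom with its neighbours (column-wise) or left-to-right (row-wise).

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    x: float  # centre x of the line's bounding box
    y: float  # centre y of the bounding box (grows downwards)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_and_sort(lines, same_region_edges, column_wise):
    """Order text lines by clustering them into regions, then sorting.

    lines:             list of TextLine
    same_region_edges: (i, j) index pairs assumed to come from the model's
                       "same layout region" predictions (hypothetical input)
    column_wise:       one bool per line, assumed to come from the model's
                       per-line pattern prediction (hypothetical input)
    """
    # Step 1 (cluster): merge lines connected by predicted edges.
    parent = list(range(len(lines)))
    for i, j in same_region_edges:
        parent[find(parent, i)] = find(parent, j)

    clusters = {}
    for idx in range(len(lines)):
        clusters.setdefault(find(parent, idx), []).append(idx)

    # Step 2 (sort inside clusters): majority vote picks the sort axis.
    ordered_clusters = []
    for members in clusters.values():
        votes = sum(column_wise[i] for i in members)
        if 2 * votes >= len(members):
            members.sort(key=lambda i: (lines[i].y, lines[i].x))
        else:
            members.sort(key=lambda i: (lines[i].x, lines[i].y))
        # Simplifying assumption: a cluster is positioned by its
        # topmost and leftmost member coordinates.
        top = min(lines[i].y for i in members)
        left = min(lines[i].x for i in members)
        ordered_clusters.append(((top, left), members))

    # Step 3 (sort clusters): top-to-bottom, then left-to-right.
    ordered_clusters.sort(key=lambda c: c[0])
    return [lines[i].text for _, members in ordered_clusters for i in members]


# Two side-by-side columns, given in naive top-to-bottom scan order.
lines = [
    TextLine("A1", 10, 10), TextLine("B1", 100, 10),
    TextLine("A2", 10, 20), TextLine("B2", 100, 20),
]
order = cluster_and_sort(lines, [(0, 2), (1, 3)], [True] * 4)
print(order)
```

In this two-column example, a plain sort by vertical position would interleave the columns, whereas grouping lines into regions first keeps each column contiguous. The region-ordering rule used here (topmost, then leftmost) is a deliberate simplification chosen for brevity.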
