Text Detection Forgot About Document OCR

10/14/2022
by Krzysztof Olejniczak, et al.
Detection and recognition of text in scans and other images, commonly referred to as Optical Character Recognition (OCR), is a widely used form of automated document processing, with a number of methods available. Advances in machine learning have enabled even more challenging scenarios of text detection and recognition "in-the-wild", such as detecting text on objects in photographs of complex scenes. While state-of-the-art methods for in-the-wild text recognition are typically evaluated on complex scenes, their performance in the domain of documents has not been published. This paper compares several methods designed for in-the-wild text recognition and for document text recognition, and evaluates them on the domain of structured documents. The results suggest that state-of-the-art methods originally proposed for in-the-wild text detection also achieve excellent results on document text detection, outperforming available OCR methods. We argue that document OCR should not be omitted from the evaluation of text detection and recognition methods.