BusiNet – a Light and Fast Text Detection Network for Business Documents

07/04/2022
by Oshri Naparstek, et al.

Optical Character Recognition (OCR), the process of extracting textual information from scanned documents, is a vital technology for digitizing and indexing physical documents. When a document is visually damaged or contains non-textual elements, existing technologies can yield poor results, because errors in text detection can greatly degrade OCR quality. In this paper we present BusiNet, a detection network aimed at OCR of business documents. Business documents often contain sensitive information and therefore cannot be uploaded to a cloud service for OCR. BusiNet was designed to be fast and light so that it can run locally, avoiding these privacy issues. Furthermore, BusiNet is built to handle corruption and noise in scanned documents by training on a specialized synthetic dataset. The model is made robust to unseen noise through adversarial training strategies. We evaluate BusiNet on publicly available datasets, demonstrating the usefulness and broad applicability of our model.
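The abstract does not specify which adversarial training strategy BusiNet uses. As a generic, hedged illustration of the idea only, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic-regression classifier in pure Python: at each step it perturbs the input in the direction that increases the loss, then updates the model on both the clean and the perturbed copy. The model, the toy data, and all function names and hyperparameters (`adversarial_train`, `fgsm`, `eps`, `lr`) are assumptions for illustration; this is not BusiNet's architecture or training procedure.

```python
import math
import random


def sigmoid(z):
    """Logistic function, clamped to avoid overflow in math.exp."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that input x belongs to class 1 under a linear model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the INPUT x: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """FGSM: move each feature by eps in the loss-increasing direction."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]


def adversarial_train(data, epochs=50, lr=0.1, eps=0.1, seed=0):
    """SGD on each clean sample and on its FGSM-perturbed copy."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Perturbation is crafted against the current model parameters.
            x_adv = fgsm(w, b, x, y, eps)
            for sample in (x, x_adv):
                err = predict(w, b, sample) - y  # dBCE/dlogit
                w = [wi - lr * err * si for wi, si in zip(w, sample)]
                b -= lr * err
    return w, b


if __name__ == "__main__":
    toy = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([2.5, 1.5], 1),
           ([-2.0, -2.0], 0), ([-1.5, -2.5], 0), ([-2.5, -1.5], 0)]
    w, b = adversarial_train(toy)
    x_adv = fgsm(w, b, [2.0, 2.0], 1, eps=0.3)
    print(predict(w, b, [2.0, 2.0]), predict(w, b, x_adv))
```

In a deep detection network, the input gradient would typically come from automatic differentiation rather than the closed form `(p - y) * w` used here; `eps` bounds how far each input value may be pushed.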


