Improving Structured Text Recognition with Regular Expression Biasing

11/10/2021
by   Baoguang Shi, et al.

We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve its recognition accuracy by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes text that matches the specified regexes with significantly improved accuracy, at the cost of a generally small degradation on other text. The biasing is realized by modeling the regexes as a Weighted Finite-State Transducer (WFST) and injecting it into the decoder via dynamic replacement. A single hyperparameter controls the biasing strength. The method is useful for recognizing text lines that have known formats or contain words from a domain vocabulary, such as driver license numbers or drug names in prescriptions. We demonstrate the efficacy of regex biasing on datasets of printed and handwritten structured text and measure its side effects.
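The trade-off described in the abstract (higher accuracy on regex-matching text, controlled by a single strength parameter) can be illustrated with a much simpler stand-in. The sketch below is not the WFST-based decoder injection the paper proposes; it only rescores a list of candidate transcriptions, adding a bonus to those that fully match a regex. The function name `rescore_with_regex_bias`, the `bias_strength` parameter, the candidate list, and the license-like pattern are all hypothetical and chosen for illustration.

```python
import re

def rescore_with_regex_bias(hypotheses, patterns, bias_strength=2.0):
    """Return the best (text, score) pair after regex biasing.

    hypotheses: list of (text, log_score) pairs from a hypothetical recognizer.
    patterns: regex strings describing the expected text formats.
    bias_strength: single knob for the bonus; 0.0 disables biasing.
    """
    compiled = [re.compile(p) for p in patterns]
    rescored = []
    for text, log_score in hypotheses:
        # A hypothesis earns the bonus only if the whole string matches a regex.
        matches = any(c.fullmatch(text) for c in compiled)
        bonus = bias_strength if matches else 0.0
        rescored.append((text, log_score + bonus))
    # Pick the hypothesis with the highest biased score.
    return max(rescored, key=lambda pair: pair[1])

# Hypothetical candidates for a field expected to be one letter plus 7 digits.
hypotheses = [("AB12345G", -1.0), ("A8123456", -1.2), ("A812345G", -3.5)]
patterns = [r"[A-Z]\d{7}"]

unbiased = rescore_with_regex_bias(hypotheses, patterns, bias_strength=0.0)
biased = rescore_with_regex_bias(hypotheses, patterns, bias_strength=2.0)
print(unbiased[0], biased[0])
```

With the bonus set to zero the top-scoring candidate wins regardless of format; with a positive bonus the candidate that matches the pattern overtakes it. A larger bonus pushes harder toward regex-matching output, which is also where the risk of degrading non-matching text comes from.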


research
09/05/2023

STEP – Towards Structured Scene-Text Spotting

We introduce the structured scene-text spotting task, which requires a s...
research
09/15/2015

Regular expressions for decoding of neural network outputs

This article proposes a convenient tool for decoding the output of neura...
research
11/09/2021

DataWords: Getting Contrarian with Text, Structured Data and Explanations

Our goal is to build classification models using a combination of free-t...
research
06/03/2023

TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain

State-of-the-art offline Optical Character Recognition (OCR) frameworks ...
research
06/06/2023

Looking and Listening: Audio Guided Text Recognition

Text recognition in the wild is a long-standing problem in computer visi...
research
10/15/2019

Text2Math: End-to-end Parsing Text into Math Expressions

We propose Text2Math, a model for semantically parsing text into math ex...
research
04/22/2023

An approach to extract information from academic transcripts of HUST

In many Vietnamese schools, grades are still being inputted into the dat...
