Intelligent Information Retrieval: Techniques for Character Recognition and Structured Data Extraction

10/05/2022
by Mohana, et al.

The day-to-day activities of every corporation involve working with large volumes of data in varying formats, such as work orders, techlogs, and maintenance documents, all of which are either vector or scanned PDFs. Extracting the required data from these documents for further processing takes long hours of manual work and is costly for these organizations. There is therefore considerable scope for a tool that provides intelligent optical character recognition and automates the extraction of the required information from these documents. This work presents a detailed analysis of end-to-end information extraction and proposes a high-quality information extraction tool. The proposed tool incorporates the necessary preprocessing and a variety of methods for accurate data extraction based on the type of data. The prerequisite work offers extensive insight into the relevant technologies, presents a comparative analysis of them, and performs the capabilities check needed to build further on the intelligent information retrieval tool.


