Intelligent Information Retrieval: Techniques for Character Recognition and Structured Data Extraction
The day-to-day activities of every corporation involve working with a huge amount of data in varying formats, such as work orders, techlogs, and maintenance documents, all of which are either vector or scanned PDFs. Extracting the required data from these documents for further processing takes long hours of manual work, which makes it a costly affair for these organizations. Thus, there is a huge scope for developing a tool that provides intelligent optical character recognition (OCR) and automates the extraction of the required information from these documents. This work presents a detailed analysis of end-to-end information extraction and proposes a high-quality information extraction tool. The proposed tool incorporates the vital preprocessing required and a variety of methods for accurate data extraction based on the type of data. The prerequisite work provides extensive insight into the relevant technologies, presents a comparative analysis of them, and performs a much-needed capability check that can be used to build further on the intelligent information retrieval tool.
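The abstract does not say how the type-specific extraction methods work. The following is a minimal sketch under stated assumptions. It assumes the text has already been recognized, either by OCR for scanned PDFs or read directly from the text layer of vector PDFs. It pulls out each field with its own pattern and converts the value according to its data type. The field names, regular expressions, and the day/month/year date format are hypothetical illustrations, not the proposed tool's method.

```python
import re
from datetime import datetime

# Hypothetical field specs (illustrative assumptions, not from the paper):
# each field name maps to a (data type, regex with one capture group) pair.
FIELD_SPECS = {
    "work_order_no": ("id", r"Work Order\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)"),
    "date": ("date", r"Date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})"),
    "man_hours": ("number", r"Man[- ]?Hours\s*[:\-]?\s*(\d+(?:\.\d+)?)"),
}


def normalize(value: str, kind: str):
    """Convert a raw matched string into a typed value based on its data type."""
    if kind == "date":
        # Assumes day/month/year ordering; real documents may differ.
        return datetime.strptime(value, "%d/%m/%Y").date()
    if kind == "number":
        return float(value)
    return value.strip()


def extract_fields(text: str, specs=FIELD_SPECS) -> dict:
    """Apply each field's pattern to recognized text; missing fields map to None."""
    result = {}
    for name, (kind, pattern) in specs.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[name] = normalize(match.group(1), kind) if match else None
    return result


if __name__ == "__main__":
    sample = "WORK ORDER NO: WO-2291\nDate: 14/03/2021\nMan-Hours: 3.5\n"
    print(extract_fields(sample))
```

Keeping one pattern and one converter per data type lets new document layouts be supported by editing the spec table rather than the extraction loop. A learned or layout-aware extractor could replace the regular expressions without changing that interface.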