A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry

11/05/2021
by Richi Nayak, et al.

With recent developments in digitisation, an increasing number of documents are available online, and several information extraction tools exist for extracting information from them. However, identifying precise answers to a given query is often challenging, especially when the data source holding the relevant information is unknown. The problem becomes harder still when the data are spread across multiple formats, such as PDF files, tables and HTML pages. In this paper, we propose a novel data extraction system, based on text mining approaches, that discovers relevant and focused information from diverse unstructured data sources. We perform a qualitative analysis to evaluate the proposed system and its suitability and adaptability, using the cotton industry as a case study.
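The abstract does not describe the system's internals, so the following is only a hypothetical sketch of the general idea it names: normalising content from differently formatted sources into plain-text passages and ranking those passages against a query with a simple text-mining score. It is not the authors' method. The helper names (`passages_from_html`, `passages_from_csv`, `rank_passages`) and the choice of TF-IDF cosine similarity are illustrative assumptions. PDF parsing is omitted because it needs a third-party library, so PDF text is assumed to have been extracted already.

```python
import csv
import io
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes from an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def passages_from_html(html):
    # Assumption: each HTML text node is treated as one passage.
    parser = _TextCollector()
    parser.feed(html)
    return parser.chunks


def passages_from_csv(text):
    # Assumption: each table row becomes one passage of "header: value" pairs.
    reader = csv.DictReader(io.StringIO(text))
    return [" ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(query, passages, top_k=3):
    """Rank passages by TF-IDF cosine similarity to the query (illustrative choice)."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms found in every passage still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(counts):
        # Query terms absent from all passages get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vectorize(Counter(tokenize(query)))
    scored = [(cosine(q, vectorize(d)), p) for d, p in zip(docs, passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

As a usage example with made-up inputs, `passages_from_html("<p>Cotton yield depends on irrigation.</p><p>Market prices rose.</p>") + passages_from_csv("region,crop,yield\nNamoi,cotton,10.5\n")` gives three passages, and `rank_passages("cotton yield", ...)` scores the HTML sentence and the table row above the unrelated one. A real system would likely need richer extraction and ranking. The point of the sketch is the design choice of mapping every format to a common passage representation before querying.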
