CHIC: Corporate Document for Visual question Answering

The massive use of digital documents, driven by the strong trend toward paperless initiatives, has forced some companies to find ways to automatically process thousands of documents per day. To do so, they use automatic information retrieval (IR), which lets them quickly extract useful information from large datasets. Effective IR methods first require an adequate dataset. Although companies have enough data to address their own needs, a public database is also needed to compare contributions across state-of-the-art methods. Public document datasets exist, such as DocVQA [2] and XFUND [10], but they do not fully satisfy the needs of companies. XFUND contains only forms, whereas companies handle several types of documents (i.e., structured documents such as forms, but also semi-structured documents such as invoices and unstructured documents such as emails). Unlike XFUND, DocVQA covers several types of documents, but only 4.5% of them are corporate documents (e.g., invoices, purchase orders). This 4.5% does not provide the diversity of documents that companies require. We propose CHIC, a public visual question-answering dataset. It contains different types of corporate documents, and the information extracted from them meets the expectations of companies.
