Value Retrieval with Arbitrary Queries for Form-like Documents

12/15/2021
by   Mingfei Gao, et al.
5

We propose value retrieval with arbitrary queries for form-like documents to reduce human effort of processing forms. Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form. To further boost model performance, we propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training. Experimental results show that our method outperforms our baselines significantly and the simpleDLM further improves our performance on value retrieval by around 17% F1 score compared with the state-of-the-art pre-training method. Code will be made publicly available.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/16/2023

Pre-training with Large Language Model-based Document Expansion for Dense Passage Retrieval

In this paper, we systematically study the potential of pre-training wit...
research
02/28/2022

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Structured document understanding has attracted considerable attention a...
research
09/11/2023

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

Transformer-based Language Models are widely used in Natural Language Pr...
research
02/10/2020

Pre-training Tasks for Embedding-based Large-scale Retrieval

We consider the large-scale query-document retrieval problem: given a qu...
research
02/08/2015

Improving Term Frequency Normalization for Multi-topical Documents, and Application to Language Modeling Approaches

Term frequency normalization is a serious issue since lengths of documen...
research
03/16/2022

FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Sequence modeling has demonstrated state-of-the-art performance on natur...
research
12/08/2020

A Topological Method for Comparing Document Semantics

Comparing document semantics is one of the toughest tasks in both Natura...

Please sign up or login with your details

Forgot password? Click here to reset