Document Intelligence Metrics for Visually Rich Document Evaluation

05/23/2022
by Jonathan Degange, et al.

The processing of Visually-Rich Documents (VRDs) is highly important in information extraction tasks associated with Document Intelligence. We introduce DI-Metrics, a Python library devoted to VRD model evaluation that comprises text-based, geometry-based, and hierarchical metrics for information extraction tasks. We apply DI-Metrics to evaluate information extraction performance on the publicly available CORD dataset, comparing three state-of-the-art (SOTA) models and one industry model. The open-source library is available on GitHub.
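
To make the metric families named in the abstract concrete, here is a minimal sketch of one text-based metric (normalized edit similarity) and one geometric metric (bounding-box intersection over union). It is an illustration only and does not use or reproduce the DI-Metrics API: the function names `levenshtein`, `text_similarity`, and `iou`, and the `(x_min, y_min, x_max, y_max)` box layout, are assumptions chosen for this example. Hierarchical metrics are not covered.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def text_similarity(pred: str, truth: str) -> float:
    """Text-based score in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(pred, truth) / longest


def iou(a: Box, b: Box) -> float:
    """Geometric score in [0, 1]: intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical predicted vs. ground-truth field value and box.
    print(text_similarity("Totl 12.50", "Total 12.50"))
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A text score compares what was extracted, while a geometric score compares where it was found on the page. An evaluation of VRD models plausibly needs both, since a model can read a field correctly yet localize it poorly, or the reverse.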


