Evaluating Out-of-Distribution Performance on Document Image Classifiers

10/14/2022
by Stefan Larson, et al.

The ability of a document classifier to handle inputs drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge no study that uses this corpus includes evaluation on out-of-distribution documents. In this paper, we curate and release a new benchmark for evaluating the out-of-distribution performance of document classifiers. The benchmark consists of two types of documents: those that do not belong to any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that belong to one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification for in-domain RVL-CDIP documents reports high accuracy scores, we find that these models exhibit accuracy drops of roughly 15-30% on RVL-CDIP-N, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. Our new benchmark provides researchers with a valuable resource for analyzing the out-of-distribution performance of document classifiers. The data can be found at https://tinyurl.com/4he6my23.
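The two measurements named in the abstract (the accuracy drop on shifted in-domain documents, and how well a classifier separates RVL-CDIP-N from RVL-CDIP-O inputs) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes the classifier exposes one confidence score per document (for example a maximum softmax probability, which the abstract does not specify), and every prediction, label, and score below is made-up toy data used only to exercise the functions.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(in_scores: Sequence[float], out_scores: Sequence[float]) -> float:
    """Probability that a random in-domain document receives a higher
    confidence score than a random out-of-domain document (ties count 0.5).
    This pairwise form is quadratic in the number of documents, which is
    fine for a small illustration."""
    if not in_scores or not out_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_in in in_scores:
        for s_out in out_scores:
            if s_in > s_out:
                wins += 1.0
            elif s_in == s_out:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))


# Toy data (hypothetical, not taken from the paper).
test_labels = [0, 1, 2, 3]
test_preds = [0, 1, 2, 3]        # predictions on the original test split
shifted_labels = [0, 1, 2, 3]
shifted_preds = [0, 1, 0, 3]     # predictions on shifted in-domain documents

acc_original = accuracy(test_preds, test_labels)
acc_shifted = accuracy(shifted_preds, shifted_labels)
accuracy_drop = acc_original - acc_shifted

# Assumed per-document confidence scores (e.g. maximum softmax probability).
conf_in_domain = [0.9, 0.8, 0.6, 0.7]    # stands in for RVL-CDIP-N inputs
conf_out_domain = [0.5, 0.85, 0.3]       # stands in for RVL-CDIP-O inputs
separation = auroc(conf_in_domain, conf_out_domain)

print(f"accuracy drop: {accuracy_drop:.2f}")
print(f"AUROC (in-domain vs. out-of-domain): {separation:.2f}")
```

An AUROC near 0.5 would mean the confidence score cannot tell the two sets apart, while a value near 1.0 would mean it separates them almost perfectly. Other scoring functions can be swapped in without changing the evaluation itself.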


research
08/05/2021

Exploring Out-of-Distribution Generalization in Text Classifiers Trained on Tobacco-3482 and RVL-CDIP

To be robust enough for widespread adoption, document analysis systems i...
research
06/21/2023

On Evaluation of Document Classification using RVL-CDIP

The RVL-CDIP benchmark is widely used for measuring performance on the t...
research
05/26/2023

GVdoc: Graph-based Visual Document Classification

The robustness of a model for real-world deployment is decided by how we...
research
02/20/2018

TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Detecting novelty of an entire document is an Artificial Intelligence (A...
research
03/17/2023

Finding Competence Regions in Domain Generalization

We propose a "learning to reject" framework to address the problem of si...
research
07/15/2021

Data vs classifiers, who wins?

The classification experiments covered by machine learning (ML) are comp...
research
05/09/2018

Creative Invention Benchmark

In this paper we present the Creative Invention Benchmark (CrIB), a 2000...
