Sequence-aware multimodal page classification of Brazilian legal documents

07/02/2022
by Pedro H. Luz de Araujo, et al.

The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours on the initial analysis and classification of those cases, which takes effort away from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, and each page is stored both as an image and as the corresponding text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on document texts. We use them as extractors of visual and textual features, which are then combined through our proposed Fusion Module. Our Fusion Module can handle missing textual or visual input by using learned embeddings for missing data. Moreover, we experiment with bi-directional Long Short-Term Memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both textual and visual classifiers, especially when leveraging the sequential nature of the pages.
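
The idea of substituting an embedding for a missing modality can be illustrated with a minimal, dependency-free sketch. It is not the authors' Fusion Module: the class name `FusionSketch`, the choice of plain concatenation, the single linear scoring layer, and the feature dimensions are all assumptions made for illustration, and the placeholder vectors are randomly initialised here rather than learned.

```python
import random
from typing import List, Optional

Vector = List[float]


class FusionSketch:
    """Hypothetical fusion step: concatenate visual and textual features,
    using a placeholder vector whenever one modality is missing."""

    def __init__(self, visual_dim: int, text_dim: int,
                 num_classes: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Stand-ins for the learned "missing data" embeddings. In a real
        # model these would be trainable parameters (assumption: one per modality).
        self.missing_visual: Vector = [rng.gauss(0.0, 0.1) for _ in range(visual_dim)]
        self.missing_text: Vector = [rng.gauss(0.0, 0.1) for _ in range(text_dim)]
        # A single linear layer over the fused vector (assumption, untrained).
        fused_dim = visual_dim + text_dim
        self.weights: List[Vector] = [
            [rng.gauss(0.0, 0.1) for _ in range(fused_dim)]
            for _ in range(num_classes)
        ]
        self.bias: Vector = [0.0] * num_classes

    def fuse(self, visual: Optional[Vector], text: Optional[Vector]) -> Vector:
        """Return the concatenated features, filling in placeholders for None."""
        v = self.missing_visual if visual is None else visual
        t = self.missing_text if text is None else text
        return list(v) + list(t)

    def scores(self, visual: Optional[Vector], text: Optional[Vector]) -> Vector:
        """Compute one unnormalised score per page class."""
        fused = self.fuse(visual, text)
        return [
            sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(self.weights, self.bias)
        ]

    def predict(self, visual: Optional[Vector], text: Optional[Vector]) -> int:
        """Return the index of the highest-scoring class."""
        s = self.scores(visual, text)
        return max(range(len(s)), key=s.__getitem__)


if __name__ == "__main__":
    model = FusionSketch(visual_dim=4, text_dim=3)
    # A page whose OCR text is unavailable: only visual features are passed.
    print(model.predict([0.2, 0.1, 0.0, 0.5], None))
```

Because the fused vector always has the same length, the downstream classifier (or a sequence model over pages, such as the biLSTM mentioned above) does not need a separate code path for pages that lack text or an image.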


