The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

07/16/2021
by   Allison Hegel, et al.
0

Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much longer, and the segmentation strategies typically used on longer documents miss out on document structure and contextual information, hurting their results on downstream tasks. In our work on legal agreements, we find that visual cues such as layout, style, and placement of text in a document are strong features that are crucial to achieving an acceptable level of accuracy on long documents. We measure the impact of incorporating such visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks including document segmentation, entity extraction, and attribute classification. Our method of segmenting documents based on structural metadata out-performs existing methods on four long-document understanding tasks as measured on the Contract Understanding Atticus Dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/24/2021

StructuralLM: Structural Pre-training for Form Understanding

Large pre-trained language models achieve state-of-the-art results when ...
research
12/31/2019

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Pre-training techniques have been verified successfully in a variety of ...
research
06/30/2020

Segmentation Approach for Coreference Resolution Task

In coreference resolution, it is important to consider all members of a ...
research
05/24/2023

ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

Transforming documents into machine-processable representations is a cha...
research
08/23/2021

Using Neighborhood Context to Improve Information Extraction from Visual Documents Captured on Mobile Phones

Information Extraction from visual documents enables convenient and inte...
research
09/11/2019

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

For understanding generic documents, information like font sizes, column...
research
08/03/2023

A Graphical Approach to Document Layout Analysis

Document layout analysis (DLA) is the task of detecting the distinct, se...

Please sign up or login with your details

Forgot password? Click here to reset