CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review

03/10/2021
by Dan Hendrycks, et al.

Many specialized domains remain untouched by deep learning, as large labeled datasets require expensive expert annotators. We address this bottleneck within the legal domain by introducing the Contract Understanding Atticus Dataset (CUAD), a new dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review. We find that Transformer models have nascent performance, but that this performance is strongly influenced by model design and training dataset size. Despite these promising results, there is still substantial room for improvement. As one of the only large, specialized NLP benchmarks annotated by experts, CUAD can serve as a challenging research benchmark for the broader NLP community.

research
01/02/2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

Reading comprehension of legal text can be a particularly challenging ta...
research
10/20/2020

A Benchmark for Lease Contract Review

Extracting entities and other useful information from legal contracts is...
research
12/15/2020

Learning to Check Contract Inconsistencies

Contract consistency is important in ensuring the legal validity of the ...
research
01/30/2023

LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

Lately, propelled by the phenomenal advances around the transformer arch...
research
10/25/2022

Deconfounding Legal Judgment Prediction for European Court of Human Rights Cases Towards Better Alignment with Experts

This work demonstrates that Legal Judgement Prediction systems without e...
research
12/12/2019

Screening of Informed and Uninformed Experts

Testing the validity of claims made by self-proclaimed experts can be im...
research
09/14/2021

'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP

A key part of the NLP ethics movement is responsible use of data, but ex...
