Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

05/03/2021
by Eugene Yang, et al.

Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models, such as logistic regression or support vector machines, to lexical features. Transformer-based models with supervised tuning have been found to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review volume by 30% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we find that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests that the match between transformer pre-training corpora and the task domain is more important than generally appreciated. Additionally, we show that just-right language model fine-tuning on the task collection before starting active learning is critical. Both too little and too much fine-tuning result in performance worse than that of linear models, even for RCV1-v2.
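To make the linear-model baseline concrete, here is a minimal sketch of a simulated TAR loop. It is an illustration only, not the authors' experimental setup. It assumes relevance-feedback sampling (review the top-scored unreviewed documents each round), a bag-of-words logistic regression trained with plain SGD, and a stopping rule based on a known recall target. The function names, the toy collection, and its labels are all hypothetical.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer producing simple lexical features."""
    return text.lower().split()


def train_logreg(docs, labels, epochs=50, lr=0.5):
    """Fit a bag-of-words logistic regression with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            z = bias + sum(weights.get(t, 0.0) for t in tokens)
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y
            bias -= lr * grad
            for t in tokens:
                weights[t] = weights.get(t, 0.0) - lr * grad
    return weights, bias


def score(model, tokens):
    """Linear score; higher means more likely relevant."""
    weights, bias = model
    return bias + sum(weights.get(t, 0.0) for t in tokens)


def simulate_tar(collection, gold, seed_id, batch_size=2, target_recall=0.8):
    """Review top-scored documents in batches until the recall target is met.

    `gold` stands in for the human reviewer: looking up gold[i] simulates
    reviewing document i. Returns the ids reviewed, in review order.
    """
    docs = [tokenize(text) for text in collection]
    total_relevant = sum(gold)
    reviewed = [seed_id]
    while True:
        found = sum(gold[i] for i in reviewed)
        if found >= target_recall * total_relevant or len(reviewed) == len(docs):
            return reviewed
        model = train_logreg([docs[i] for i in reviewed],
                             [gold[i] for i in reviewed])
        pool = [i for i in range(len(docs)) if i not in reviewed]
        pool.sort(key=lambda i: score(model, docs[i]), reverse=True)
        reviewed.extend(pool[:batch_size])


# Toy collection (made up for illustration): 1 = relevant, 0 = not relevant.
collection = [
    "merger agreement signed by the board",
    "lunch menu for friday",
    "board approves merger terms",
    "parking garage closed monday",
    "draft merger contract attached",
    "holiday party photos",
    "merger due diligence checklist",
    "printer out of toner",
    "agreement on merger timeline",
    "weekly cafeteria schedule",
]
gold = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

reviewed = simulate_tar(collection, gold, seed_id=0)
print("documents reviewed:", len(reviewed), "of", len(collection))
```

The number of documents reviewed before reaching the recall target is the kind of review volume that the abstract compares between linear models and BERT. Swapping in a different classifier would only change `train_logreg` and `score`.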


