The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification

09/15/2021
by Benjamin Clavié, et al.

We highlight an interesting trend to contribute to the ongoing debate around advances in legal Natural Language Processing. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches performance competitive with deep learning models. We also show that the error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain than in general language tasks. We discuss some hypotheses for these results to support future discussions.
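As a rough illustration of the kind of traditional baseline the abstract refers to, the sketch below trains a linear-kernel SVM on TF-IDF n-gram features with scikit-learn. The toy documents, labels, and hyperparameters are illustrative assumptions, not the authors' exact pipeline.

# Minimal sketch of a TF-IDF + linear SVM text-classification baseline.
# The toy corpus and settings below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical legal snippets standing in for a real labelled corpus.
train_texts = [
    "The lessee shall pay rent on the first day of each month.",
    "The tenant agrees to return the premises in good condition.",
    "The defendant appeals the judgment of the lower court.",
    "The appellant seeks reversal of the trial court's ruling.",
]
train_labels = ["contract", "contract", "appeal", "appeal"]

# Word unigram/bigram TF-IDF features feeding a linear SVM classifier.
baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("svm", LinearSVC(C=1.0)),
])
baseline.fit(train_texts, train_labels)

test_texts = ["The landlord must refund the security deposit."]
print(baseline.predict(test_texts))  # e.g. ['contract']

Such a pipeline needs no GPU and trains in seconds, which is part of why it remains a strong point of comparison for BERT-based models in the legal domain.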
