Comparing the Performance of NLP Toolkits and Evaluation measures in Legal Tech

03/12/2021
by Muhammad Zohaib Khan, et al.

Recent developments in Natural Language Processing have led to the introduction of state-of-the-art Neural Language Models that enable unsupervised transfer learning using different pretraining objectives. While these models achieve excellent results on downstream NLP tasks, various domain adaptation techniques can improve their performance on domain-specific tasks. We compare and analyze the pretrained Neural Language Models XLNet (autoregressive) and BERT (autoencoder) on Legal Tasks. Results show that the XLNet Model performs better on our Sequence Classification task of Legal Opinions Classification, whereas BERT produces better results on the NER task. We use domain-specific pretraining and additional legal vocabulary to adapt the BERT Model further to the Legal Domain. We prepare multiple variants of the BERT Model, using both methods and their combination. Comparing our variants of the BERT Model, specialized for the Legal Domain, we conclude that both the additional pretraining and the vocabulary techniques enhance the BERT Model's performance on the Legal Opinions Classification task. Additional legal vocabulary also improves BERT's performance on the NER task, and combining the pretraining and vocabulary techniques improves the final results further. Our Legal-Vocab-BERT Model gives the best results on the Legal Opinions Task, outperforming the larger pretrained general Language Models, i.e., BERT-Base and XLNet-Base.

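As a rough illustration of the vocabulary-augmentation idea described in the abstract, the sketch below adds legal terms to a BERT tokenizer and resizes the model's embedding matrix before any further in-domain pretraining. It uses the Hugging Face Transformers library; the model name, the token list, and all other details are illustrative assumptions, not the paper's actual setup.

    # Hypothetical sketch (not the paper's code): extend BERT's vocabulary with
    # legal terms before continuing domain-specific pretraining or fine-tuning.
    from transformers import BertTokenizerFast, BertForMaskedLM

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # Illustrative legal terms; a real run would mine frequent terms from a
    # legal corpus so they are no longer split into many subword pieces.
    legal_terms = ["certiorari", "habeas", "estoppel", "appellant", "tort"]
    num_added = tokenizer.add_tokens(legal_terms)

    # Grow the embedding matrix to cover the newly added vocabulary entries;
    # the new embeddings are then learned during in-domain pretraining.
    model.resize_token_embeddings(len(tokenizer))
    print(f"Added {num_added} legal tokens to the vocabulary.")

Continued pretraining would then run a masked-language-modeling objective over a legal corpus, after which the adapted checkpoint could be fine-tuned for the classification and NER tasks.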
