verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT

03/11/2022
by Felipe R. Serras, et al.

In this work, we study the use of attention-based algorithms to automate the categorization of Brazilian case law documents. We used data from the Kollemata Project to produce two distinct datasets with adequate class systems. We then implemented a multi-class and multi-label version of BERT and fine-tuned different BERT models on the produced datasets. We evaluated several metrics, adopting the micro-averaged F1-score as our main metric, for which we obtained a value of 0.72, a gain of 30 percentage points over the tested statistical baseline.
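For readers unfamiliar with the main metric, the micro-averaged F1-score in a multi-label setting pools true positives, false positives, and false negatives over every (document, label) pair before computing a single F1 value. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `micro_f1` and the toy label matrices are illustrative assumptions and do not come from the paper's datasets.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over binary label-indicator matrices.

    Each row is one document; each column is one label (1 = assigned).
    Counts are pooled across all documents and labels before the
    F1 formula is applied, which is what "micro" averaging means.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1          # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1          # label predicted but not in the gold set
            elif t == 1 and p == 0:
                fn += 1          # gold label that was missed
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 documents, 3 possible labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.2f}")
```

Because counts are pooled, frequent labels weigh more heavily in the micro average than rare ones, which is worth keeping in mind when class systems are imbalanced.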


research
06/12/2023

Imbalanced Multi-label Classification for Business-related Text with Moderately Large Label Spaces

In this study, we compared the performance of four different methods for...
research
02/14/2022

Punctuation restoration in Swedish through fine-tuned KB-BERT

Presented here is a method for automatic punctuation restoration in Swed...
research
08/15/2023

Finding Stakeholder-Material Information from 10-K Reports using Fine-Tuned BERT and LSTM Models

All public companies are required by federal securities law to disclose ...
research
04/12/2021

WHOSe Heritage: Classification of UNESCO World Heritage "Outstanding Universal Value" Documents with Smoothed Labels

The UNESCO World Heritage List (WHL) is to identify the exceptionally va...
research
03/22/2021

Hybrid Model for Patent Classification using Augmented SBERT and KNN

Purpose: This study aims to provide a hybrid approach for patent claim c...
research
09/08/2019

Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models

This study compares the effectiveness and robustness of multi-class cate...
research
10/15/2022

Large Language Models for Multi-label Propaganda Detection

The spread of propaganda through the internet has increased drastically ...
