CDJUR-BR – A Golden Collection of Legal Document from Brazilian Justice with Fine-Grained Named Entities

05/20/2023
by Antonio Mauricio, et al.

Named Entity Recognition (NER) is a basic task for most Legal Artificial Intelligence (Legal AI) applications. However, texts produced in legal practice refer to entities that currently available NER models do not readily recognize. Categories are lacking for legislation, jurisprudence, evidence, penalties, the roles of people in a legal process (judge, lawyer, victim, defendant, witness), types of locations (crime location, defendant's address), etc. There is therefore still a need for a robust golden collection, annotated with fine-grained entities of the legal domain, that covers the various documents of a legal process, such as petitions, inquiries, complaints, decisions and sentences. In this article, we describe the development of the Golden Collection of the Brazilian Judiciary (CDJUR-BR), which comprises a set of fine-grained named entities annotated by experts in legal documents. CDJUR-BR was created following its own methodology, designed to make the collection comprehensive and robust. Together with the CDJUR-BR repository, we provide a BERT-based NER model trained on CDJUR-BR, whose results indicated the prevalence of CDJUR-BR.
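To make the idea of fine-grained legal entities concrete, the following minimal sketch shows how expert span annotations could be converted into token-level tags for training a NER model. It is an illustration under stated assumptions, not the CDJUR-BR format: the BIO tagging scheme, the label names `PERSON_DEFENDANT` and `LOCATION_CRIME`, the helper `spans_to_bio`, and the example sentence are all hypothetical and only loosely follow the categories listed in the abstract.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); labels here are hypothetical,
# not the actual CDJUR-BR label set.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into BIO tags.

    Spans that fall outside the sentence or overlap another span are rejected.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in sorted(spans):
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Invented example sentence with a person role and a location type.
tokens = ["The", "defendant", "John", "Doe", "was", "seen", "at", "Main", "Street", "."]
spans = [(2, 4, "PERSON_DEFENDANT"), (7, 9, "LOCATION_CRIME")]
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A generic NER label set would tag "John Doe" only as a person and "Main Street" only as a location; the fine-grained labels additionally record the role of the person and the type of the location in the legal process.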

