Multi-granular Legal Topic Classification on Greek Legislation

09/30/2021
by Christos Papaloukas, et al.

In this work, we study the task of classifying legal texts written in the Greek language. We introduce and make publicly available a novel dataset based on Greek legislation, consisting of more than 47 thousand official, categorized Greek legislation resources. We experiment with this dataset and evaluate a battery of advanced methods and classifiers, ranging from traditional machine learning and RNN-based methods to state-of-the-art Transformer-based methods. We show that recurrent architectures with domain-specific word embeddings offer improved overall performance while being competitive even with Transformer-based models. Finally, we show that cutting-edge multilingual and monolingual Transformer-based models compete for the top of the classifiers' ranking, which leads us to question the necessity of training monolingual transfer learning models as a rule of thumb. To the best of our knowledge, this is the first time the task of Greek legal text classification has been considered in an open research project; Greek is also a language with very limited NLP resources in general.


