
Large-Scale Multi-Label Text Classification on EU Legislation

by Ilias Chalkidis, et al.

We consider Large-Scale Multi-Label Text Classification (LMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, annotated with 4.3k EUROVOC labels, which is suitable for LMTC as well as few- and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with label-wise attention outperform other current state-of-the-art methods. Domain-specific WORD2VEC and context-sensitive ELMO embeddings further improve performance. We also find that considering only particular zones of the documents is sufficient. This allows us to bypass BERT's maximum text length limit and fine-tune BERT, obtaining the best results in all but zero-shot learning cases.
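
To make the idea of label-wise attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `label_wise_attention`, the toy inputs, and the parameter layout (one query vector, one weight vector, and one bias per label) are assumptions chosen for illustration, and the hidden states stand in for the outputs of a BIGRU encoder that is not shown.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def label_wise_attention(hidden, label_queries, label_weights, label_biases):
    """Return one probability per label.

    hidden: list of T token vectors of size d (assumed to come from an
            encoder such as a BIGRU; the encoder itself is not shown).
    label_queries, label_weights: one vector of size d per label (hypothetical
            parameter layout for this sketch).
    label_biases: one scalar per label.
    """
    probs = []
    for u, w, b in zip(label_queries, label_weights, label_biases):
        # Attention weights over token positions, specific to this label.
        alphas = softmax([dot(h, u) for h in hidden])
        # Label-specific document vector: weighted sum of the hidden states.
        doc = [sum(a * h[j] for a, h in zip(alphas, hidden))
               for j in range(len(u))]
        # Independent sigmoid per label, since several labels may apply.
        probs.append(1.0 / (1.0 + math.exp(-(dot(doc, w) + b))))
    return probs

# Toy example: 3 tokens, hidden size 2, 2 labels.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = label_wise_attention(
    hidden,
    label_queries=[[1.0, 0.0], [0.0, 1.0]],
    label_weights=[[0.5, -0.5], [-0.5, 0.5]],
    label_biases=[0.0, 0.0],
)
```

Each label attends to the tokens most relevant to it and gets its own document representation, which is why the output uses one sigmoid per label rather than a single softmax across labels.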




Extreme Multi-Label Legal Text Classification: A case study in EU Legislation

We consider the task of Extreme Multi-Label Text Classification (XMTC) i...

An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels

Large-scale Multi-label Text Classification (LMTC) has a wide range of N...

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

Large-scale multi-label text classification (LMTC) aims to associate a d...

Flexible Job Classification with Zero-Shot Learning

Using a taxonomy to organize information requires classifying objects (d...

Generalized Zero-shot ICD Coding

The International Classification of Diseases (ICD) is a list of classifi...

Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning

Exploiting label hierarchies has become a promising approach to tackling...

Deep Learning Based Multi-Label Text Classification of UNGA Resolutions

The main goal of this research is to produce a useful software for Unite...