A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction

06/10/2022
by Wonseok Hwang, et al.

Recent advances in deep learning have dramatically changed how machine learning, especially natural language processing, can be applied to the legal domain. However, this shift to data-driven approaches calls for larger and more diverse datasets, which remain scarce, especially in non-English languages. Here we present the first large-scale benchmark of Korean legal AI datasets, LBox Open, which consists of one legal corpus, two classification tasks, two legal judgement prediction (LJP) tasks, and one summarization task. The legal corpus consists of 150k Korean precedents (264M tokens), of which 63k were sentenced in the last 4 years and 96k are from the first- and second-level courts, where factual issues are reviewed. The two classification tasks are case name prediction (10k) and statute prediction (3k) from the factual description of individual cases. The LJP tasks consist of (1) 11k criminal examples where the model is asked to predict the ranges of fine amount, imprisonment with labor, and imprisonment without labor for the given facts, and (2) 5k civil examples where the inputs are the facts and the claim for relief and the outputs are the degrees of claim acceptance. The summarization task consists of Supreme Court precedents and their corresponding summaries. We also release LCube, the first Korean legal language model, trained on the legal corpus from this study. Given the uniqueness of the law of South Korea and the diversity of the legal tasks covered in this work, we believe that LBox Open contributes to the multilinguality of global legal research. LBox Open and LCube will be publicly available.
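To make the civil LJP task more concrete, the following is a minimal sketch of how one example and its "degree of claim acceptance" label might be represented. The abstract does not specify the dataset schema or the label set, so everything here is an assumption made for illustration: the class name `CivilExample`, the fields `claimed_amount` and `accepted_amount`, and the three-way labels are hypothetical and are not taken from LBox Open itself.

```python
from dataclasses import dataclass


@dataclass
class CivilExample:
    """Hypothetical record for one civil LJP example (not the real schema)."""
    facts: str              # factual description of the case (model input)
    claim_for_relief: str   # what the plaintiff asks for (model input)
    claimed_amount: int     # assumed field: amount claimed by the plaintiff
    accepted_amount: int    # assumed field: amount granted by the court


def acceptance_degree(example: CivilExample) -> str:
    """Map the granted/claimed ratio to a coarse, assumed three-way label."""
    if example.claimed_amount <= 0:
        raise ValueError("claimed_amount must be positive")
    ratio = example.accepted_amount / example.claimed_amount
    if ratio <= 0:
        return "rejected"
    if ratio >= 1:
        return "fully accepted"
    return "partially accepted"


# Toy example with made-up values, only to show the intended usage.
toy = CivilExample(
    facts="The defendant borrowed money and repaid part of it.",
    claim_for_relief="Payment of the outstanding loan balance.",
    claimed_amount=10_000_000,
    accepted_amount=6_000_000,
)
label = acceptance_degree(toy)
```

A coarse label like this turns the task into classification. A model could instead regress the ratio directly, and which formulation the benchmark uses should be checked against the released data.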


research
10/03/2021

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Law, interpretations of law, legal arguments, agreements, etc. are typic...
research
11/01/2022

ClassActionPrediction: A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US

The research field of Legal Natural Language Processing (NLP) has been v...
research
03/25/2023

(Legal Design) Research through Litigation

This paper proposes the concept of 'research through litigation', where ...
research
11/23/2022

Agent-Specific Deontic Modality Detection in Legal Language

Legal documents are typically long and written in legalese, which makes ...
research
09/08/2023

NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus

The statistical analysis of large scale legal corpus can provide valuabl...
research
06/20/2023

Hallucination is the last thing you need

The legal profession necessitates a multidimensional approach that invol...
research
07/31/2023

Adversarially Robust Neural Legal Judgement Systems

Legal judgment prediction is the task of predicting the outcome of court...
