HLDC: Hindi Legal Documents Corpus

04/02/2022
by Arnav Kapoor, et al.

Many populous countries, including India, are burdened with a considerable backlog of legal cases. Developing automated systems that process legal documents and assist legal practitioners could help mitigate this backlog. However, there is a dearth of the high-quality corpora needed to develop such data-driven systems. The problem is even more pronounced for low-resource languages such as Hindi. In this resource paper, we introduce the Hindi Legal Documents Corpus (HLDC), a corpus of more than 900K legal documents in Hindi. The documents are cleaned and structured to enable the development of downstream applications. Further, as a use case for the corpus, we introduce the task of bail prediction. We experiment with a battery of models and propose a Multi-Task Learning (MTL) based model for this task. The MTL model uses summarization as an auxiliary task alongside bail prediction as the main task. Experiments with different models indicate the need for further research in this area. We release the corpus and model implementation code with this paper: https://github.com/Exploration-Lab/HLDC

research
01/31/2022

Corpus for Automatic Structuring of Legal Documents

In populous countries, pending legal cases have been growing exponential...
research
05/28/2021

ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation

An automated system that could assist a judge in predicting the outcome ...
research
10/22/2022

Extractive Summarization of Legal Decisions using Multi-task Learning and Maximal Marginal Relevance

Summarizing legal decisions requires the expertise of law practitioners,...
research
03/16/2022

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

Recognizing facts is the most fundamental step in making judgments, henc...
research
11/15/2022

DeepParliament: A Legal domain Benchmark Dataset for Parliament Bills Prediction

This paper introduces DeepParliament, a legal domain Benchmark Dataset t...
research
01/01/2022

Interpretable Low-Resource Legal Decision Making

Over the past several years, legal applications of deep learning have be...
research
09/08/2023

NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus

The statistical analysis of large scale legal corpus can provide valuabl...
