In this paper, we introduce the Chinese AI and Law challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction. CAIL2018 contains more than 2.6 million criminal cases published by the Supreme People's Court of China, several times more than the datasets used in existing work on judgment prediction. Moreover, its annotations of judgment results are more detailed and richer: they consist of applicable law articles, charges, and prison terms, which are expected to be inferred from the fact descriptions of cases. For comparison, we implement several conventional text classification baselines for judgment prediction. Experimental results show that it is still a challenge for current models to predict the judgment results of legal cases, especially prison terms. To help researchers make improvements on legal judgment prediction, both CAIL2018 and the baselines will be released after the CAIL competition (http://cail.cipsc.org.cn/).
The task of Legal Judgment Prediction (LJP) aims to empower machines to predict the judgment results of legal cases after reading their fact descriptions. It has been studied for decades. Because few cases were publicly available, early works Lauderdale and Clark (2012); Segal (1984); Keown (1980); Ulmer (1963); Nagel (1963); Kort (1957) usually conduct statistical analysis of the judgment results over a small number of cases rather than predicting them. With the development of machine learning algorithms, some works treat LJP as a text classification task and propose to extract effective features from fact descriptions Liu and Chen (2017); Sulea et al. (2017); Aletras et al. (2016); Lin et al. (2012); Liu and Hsieh (2006). These works are still restricted to particular case types and generalize poorly when applied to other scenarios.
Inspired by the success of deep learning techniques on natural language processing tasks, researchers have attempted to employ neural models for judgment prediction under the text classification framework Luo et al. (2017); Hu et al. (2018). However, there is not yet a publicly accessible, high-quality dataset for LJP. Therefore, we collect and release the first large-scale dataset for LJP, i.e., CAIL2018, to encourage further exploration of this task and of other advanced legal intelligence algorithms.
CAIL2018 consists of more than 2.6 million criminal cases, collected from http://wenshu.court.gov.cn/, where they are published by the Supreme People’s Court of China. These documents serve as a reference for legal professionals to improve their working efficiency and are expected to benefit research on legal intelligent systems.
Specifically, each case in CAIL2018 consists of two parts, i.e., a fact description and the corresponding judgment result. Here, the judgment result of each case is reduced to its representative components: relevant law articles, charges, and prison terms. Compared with other datasets used by existing LJP works, CAIL2018 is larger in scale and preserves richer annotations of judgment results. In total, CAIL2018 contains more than 2.6 million criminal cases, which are annotated with criminal law articles and criminal charges. Both the number of cases and the number of labels are several times larger than those of other closed-source LJP datasets.
In the following parts, we give a detailed introduction to the construction of CAIL2018 and the LJP results of baseline methods on this dataset.
Table 1: An example instance in CAIL2018.
|Fact|Relevant Law Article|Charge|Prison Term|Defendant|
|---|---|---|---|---|
|The Defendant Hu…|234th article of criminal law|intentional injury|12 months|Miss./Mr. Hu|
We construct CAIL2018 from criminal documents collected from China Judgments Online (http://wenshu.court.gov.cn/). These documents of criminal cases belong to five types: judgment, verdict, conciliation statement, decision letter, and notice. For LJP, we are only concerned with cases that have judgment results. Therefore, we keep only the judgment documents for training LJP models.
Each original document is well-structured and divided into several parts, e.g., fact description, court view, parties, judgment result, and other information. Therefore, we take the fact part as input and extract the applicable law articles, charges, and prison terms from the judgment result with regular expressions.
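The extraction step can be illustrated with a minimal sketch. The sample text and the three patterns below are hypothetical English stand-ins: the real documents are written in Chinese, and the expressions actually used to build CAIL2018 are not given here.

```python
import re

# Hypothetical English stand-in for the judgment-result part of a document.
# The real documents are in Chinese; these patterns are illustrative only.
JUDGMENT = (
    "The defendant Hu is guilty of intentional injury and is sentenced to "
    "12 months of fixed-term imprisonment, in accordance with Article 234 "
    "of the Criminal Law."
)

ARTICLE_RE = re.compile(r"Article (\d+) of the Criminal Law")
CHARGE_RE = re.compile(r"guilty of ([a-z ]+?) and is sentenced")
TERM_RE = re.compile(r"sentenced to (\d+) months")


def extract_labels(judgment_text):
    """Return (articles, charges, prison_term_in_months) found in the text."""
    articles = [int(a) for a in ARTICLE_RE.findall(judgment_text)]
    charges = CHARGE_RE.findall(judgment_text)
    term = TERM_RE.search(judgment_text)
    months = int(term.group(1)) if term else None
    return articles, charges, months


labels = extract_labels(JUDGMENT)
```

Returning `None` for a missing prison term, rather than raising, makes it easy to discard documents whose judgment result does not match the expected structure.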
Since many criminal cases involve multiple defendants, which would greatly increase the difficulty of LJP, we only retain cases with a single defendant.
In addition, there are many low-frequency charges (e.g., insulting the national flag, jailbreak) and law articles. We filter out cases with charges and law articles whose frequency falls below a fixed threshold. Besides, the leading law articles in the Chinese Criminal Law are not relevant to specific charges, so we filter out these law articles and the corresponding charges as well.
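The single-defendant and low-frequency filters can be sketched as follows. The record layout (`defendants`, `charges`) and the threshold value are assumptions made for illustration, and only charges are filtered here; law articles would be handled the same way.

```python
from collections import Counter

MIN_FREQ = 2  # hypothetical threshold, not the value used for CAIL2018


def filter_cases(cases, min_freq=MIN_FREQ):
    """Keep single-defendant cases whose charges all occur at least min_freq times."""
    # Step 1: drop cases with more than one defendant.
    single = [c for c in cases if len(c["defendants"]) == 1]
    # Step 2: count charge frequencies over the remaining cases.
    charge_counts = Counter(ch for c in single for ch in c["charges"])
    # Step 3: drop cases that carry any low-frequency charge.
    return [
        c for c in single
        if all(charge_counts[ch] >= min_freq for ch in c["charges"])
    ]


# Toy records (illustrative data only).
cases = [
    {"defendants": ["Hu"], "charges": ["intentional injury"]},
    {"defendants": ["Li"], "charges": ["intentional injury"]},
    {"defendants": ["Wang", "Zhao"], "charges": ["theft"]},
    {"defendants": ["Chen"], "charges": ["jailbreak"]},
]
kept = filter_cases(cases)
```

In this sketch, frequencies are counted after the single-defendant filter, so the threshold applies to the cases that can actually end up in the dataset. Counting before that filter would be an equally plausible reading of the text.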
After preprocessing, the dataset consists of criminal cases annotated with criminal law articles, charges, and prison terms. Table 1 shows an instance from CAIL2018.
It is worth noting that the distribution of categories in CAIL2018 is quite imbalanced. In terms of the number of cases per charge, the most frequent charges cover a large share of the cases, whereas the least frequent charges cover only a small share. This imbalance makes it challenging to predict low-frequency charges and law articles.
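One way to quantify this imbalance is to compare the share of cases covered by the k most frequent and the k least frequent labels. The sketch below assumes one charge label per case and uses toy data; the function name `coverage` is hypothetical.

```python
from collections import Counter


def coverage(labels, k):
    """Return the fraction of cases covered by the k most and k least frequent labels."""
    counts = Counter(labels)
    # Case counts per label, sorted from most to least frequent.
    ranked = [n for _, n in counts.most_common()]
    total = sum(ranked)
    return sum(ranked[:k]) / total, sum(ranked[-k:]) / total


# Toy charge labels, one per case (illustrative data only).
toy = ["theft"] * 6 + ["fraud"] * 3 + ["jailbreak"] * 1
top_share, bottom_share = coverage(toy, k=1)
```

On the toy data, the single most frequent charge covers 60% of the cases and the least frequent one covers 10%.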
In this section, we implement and evaluate several typical text classification baselines on three subtasks of LJP, including law articles, charges, and prison terms.
|Tasks|Charges|Relevant Articles|Terms of Penalty|
|---|---|---|---|
We select the following baselines for comparison:
TFIDF+SVM: Term-frequency inverse document frequency (TFIDF) Salton and Buckley (1988) is an efficient method to extract word features, and Support Vector Machine (SVM) Suykens and Vandewalle (1999) is a representative classification model. We use TFIDF to extract text features and employ an SVM with a linear kernel to train the classifier.
FastText: FastText Joulin et al. (2017) is a simple and efficient approach for text classification based on N-grams and hierarchical softmax Mikolov et al. (2013).
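The TFIDF feature extraction used by the first baseline can be sketched in plain Python. The weighting below (tf = count / document length, idf = log(N / df)) is one common variant and is an assumption, since the exact scheme is not specified; the resulting sparse vectors would then be passed to a linear classifier such as an SVM.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses tf = count / doc_length and idf = log(N / df), one common variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (cnt / len(doc)) * math.log(n_docs / df[term])
            for term, cnt in counts.items()
        })
    return vectors


# Toy, already segmented fact descriptions (illustrative data only).
docs = [
    ["defendant", "injured", "victim"],
    ["defendant", "stole", "phone"],
]
vectors = tfidf(docs)
```

A term that appears in every document, such as "defendant" in the toy data, receives a weight of zero, which is why TFIDF suppresses words that carry no discriminative information between cases.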
For all methods, we randomly split the cases into a training set and a test set. Since all fact descriptions are written in Chinese, we employ THULAC Sun et al. (2016) for word segmentation. For the TFIDF+SVM model, we cap the feature size. For neural models, we employ the Skip-Gram model Mikolov et al. (2013) to train word embeddings.
For CNN, we fix the maximum length of a case description, the filter widths, and the size of each filter for consistency.
For training, we employ Adam Kingma and Ba (2015) as the optimizer, with a fixed learning rate, dropout rate, and batch size.
We evaluate the baseline models with several metrics that are widely used in classification tasks: accuracy (Acc.), macro-precision (MP), and macro-recall (MR). Experimental results on the test set are shown in Table 2.
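For reference, the three metrics can be computed as in the sketch below, which assumes single-label predictions and scores a class with no predictions (or no gold instances) as 0. The function name `evaluate` and the toy labels are illustrative.

```python
def evaluate(gold, pred):
    """Return accuracy, macro-precision and macro-recall for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        n_pred = sum(p == label for p in pred)
        n_gold = sum(g == label for g in gold)
        # A class that is never predicted (or never occurs) scores 0 here.
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_gold if n_gold else 0.0)
    # Macro averaging weights every class equally, regardless of its size.
    return acc, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy imbalanced example: the model always predicts the majority charge.
gold = ["theft", "theft", "theft", "fraud"]
pred = ["theft", "theft", "theft", "theft"]
acc, mp, mr = evaluate(gold, pred)
```

In the toy example, accuracy is 0.75 while macro-precision is 0.375 and macro-recall is 0.5. Because macro averaging gives every class equal weight, a model that ignores rare charges can look accurate and still score poorly on MP and MR.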
From this table, we find that current models achieve reasonably high accuracy on charge prediction and relevant law article prediction. However, the MP and MR results show that LJP is still a major challenge, owing to the lack of training data for rare categories and the imbalance issue.
In this work, we release the first large-scale legal judgment prediction dataset, CAIL2018. Compared with existing LJP datasets, CAIL2018 is the largest so far and is publicly available. Moreover, CAIL2018 preserves more detailed annotations, which is consistent with real-world scenarios. Experiments demonstrate that LJP is still challenging and leaves plenty of room for improvement.
A two-phase sentiment analysis approach for judgement prediction. Journal of Information Science.
Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of EMNLP, pages 1422–1432.