Similar Case Matching (SCM) plays a major role in the legal system, especially in common law systems, where the judgment of a case is determined by the most similar cases in the past. As a result, legal professionals often spend much time finding and judging similar cases to prove fairness in judgment. Since automatically finding similar cases can benefit the legal system, we select SCM as one of the tasks of CAIL2019.
The Chinese AI and Law Challenge (CAIL) is a competition on applying artificial intelligence technology to legal tasks, with the goal of using AI to help the legal system. CAIL was first held in 2018, and the main task of CAIL2018 Xiao et al. (2018); Zhong et al. (2018b) was predicting the judgment results from the fact description, where the judgment results include the accusation, the applicable articles, and the term of penalty. CAIL2019 contains three different tasks: Legal Question-Answering, Legal Case Element Prediction, and Similar Case Matching. In this paper, we focus on SCM.
More specifically, CAIL2019-SCM contains 8,964 triplets of legal documents, each collected from China Judgments Online (http://wenshu.court.gov.cn/). To ensure the similarity of the cases within a triplet, all selected documents are related to Private Lending, and every document in a triplet contains a fact description. CAIL2019-SCM requires researchers to decide which two cases in a triplet are more similar. An algorithm that detects similar cases in triplets can then be applied to rank all documents and find the most similar document in a database. In total, 247 teams participated in CAIL2019-SCM, and the best team clearly outperformed the baseline. The results show that existing methods have made great progress on this task, but there is still much room for improvement.
In other words, CAIL2019-SCM can benefit research on legal case matching. There are two main challenges in CAIL2019-SCM: (1) The difference between documents may be small, so it is hard to decide which two documents are more similar. Moreover, the similarity is defined by legal workers, so we must incorporate legal knowledge into this task rather than calculating similarity at the lexical level. (2) The documents are quite long, and it is hard for existing methods to capture document-level information.
In the following parts, we will give more details about CAIL2019-SCM, including related works about SCM, the task definition, the construction of the dataset, and several experiments on the dataset.
2 Related Work
2.1 Semantic Text Matching
SCM aims to measure the similarity between legal case documents. Essentially, it is an application of semantic text matching, which is central to many tasks in natural language processing, such as question answering, information retrieval, and natural language inference. Take information retrieval as an example: given a query and a database, a semantic matching model is required to judge the semantic similarity between the query and the documents in the database. Tasks related to semantic matching have attracted the attention of many researchers in recent decades.
Traditional approaches calculate word-to-word similarity with the vector space model, e.g., term frequency-inverse document frequency Wu et al. (2008) and bag-of-words Bilotti et al. (2007). However, due to the variety of words in different texts, these approaches achieve limited success on the task.
Nevertheless, most previous studies are designed for identifying the relationship between two sentences of limited length.
2.2 Legal Intelligence
Legal intelligence has attracted wide attention from researchers, and applying NLP techniques to legal problems has become increasingly popular in recent years. Previous works Kort (1957); Keown (1980); Lauderdale and Clark (2012) focus on analyzing existing cases with mathematical tools. With the development of deep learning, more researchers have devoted effort to predicting the judgment results of legal cases Luo et al. (2017); Hu et al. (2018); Zhong et al. (2018a); Chalkidis et al. (2019); Jiang et al. (2018); Yang et al. (2019). Furthermore, there are many works on generating court views to interpret charge results Ye et al. (2018), information extraction from legal text Vacek and Schilder (2017); Vacek et al. (2019), legal event detection Yan et al. (2017), identifying applicable law articles Liu et al. (2015); Liu and Hsieh (2006), and legal question answering Kim et al. (2015); Fawei et al. (2018).
Meanwhile, retrieving related legal documents with a query has been studied for decades and is a critical issue in applications of legal intelligence. Raghav et al. (2016) emphasize exploiting paragraph-level and citation information. Kano et al. (2017) and Zhong et al. (2018b) held a legal information extraction and entailment competition to promote progress in legal case retrieval.
3 Overview of Dataset
3.1 Task Definition
We first define the task of CAIL2019-SCM. The input of CAIL2019-SCM is a triplet (A, B, C), where A, B, and C are the fact descriptions of three cases. We define a function sim(x, y) that measures the similarity between two cases x and y. The task of CAIL2019-SCM is then to predict whether sim(A, B) > sim(A, C) or sim(A, B) < sim(A, C).
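The task interface can be sketched as follows: with any similarity function over fact descriptions, the triplet decision reduces to comparing two scores. Here `jaccard_sim` is a toy word-overlap stand-in for illustration only, not the dataset's legally grounded notion of similarity.

```python
# Minimal sketch of the CAIL2019-SCM prediction interface.
# `sim` is any similarity function over fact descriptions; the task is to
# decide whether sim(a, b) > sim(a, c).

def predict_more_similar(sim, a, b, c):
    """Return 'B' if case b is more similar to query a than c is, else 'C'."""
    return "B" if sim(a, b) > sim(a, c) else "C"

def jaccard_sim(x, y):
    """Toy lexical similarity: Jaccard overlap of word sets."""
    xs, ys = set(x.split()), set(y.split())
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0

answer = predict_more_similar(
    jaccard_sim,
    "loan between natural persons with guarantee",
    "loan between natural persons with mortgage",
    "loan for enterprise operation",
)
```

As the paper notes, a lexical similarity like this is exactly what the task is designed to defeat: the annotated similarity is defined by legal workers, not word overlap.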
3.2 Dataset Construction and Details
To ensure the quality of the dataset, we construct it in several steps. First, we select many documents within the range of Private Lending. However, although all cases are related to Private Lending, they still vary considerably, so many pairs of cases are not similar at all. If the cases in a triplet are not similar, it does not make sense to compare their similarities. To produce qualified triplets, we first annotated some crucial elements of Private Lending for each document. The elements include:
The properties of lender and borrower, whether they are a natural person, a legal person, or some other organization.
The type of guarantee, including no guarantee, guarantee, mortgage, pledge, and others.
The usage of the loan, including personal life, family life, enterprise production and operation, crime, and others.
The lending intention, including regular lending, transfer loan, and others.
Conventional interest rate method, including no interest, simple interest, compound interest, unclear agreement, and others.
Interest during the agreed period, bucketed into several agreed interest-rate ranges, and others.
Borrowing delivery form, including no lending, cash, bank transfer, online electronic remittance, bill, online loan platform, authorization to control a specific fund account, unknown or fuzzy, and others.
Repayment form, including unpaid, partial repayment, cash, bank transfer, online electronic remittance, bill, unknown or fuzzy, and others.
Loan agreement, including loan contract, or borrowing, “WeChat, SMS, phone or other chat records”, receipt, irrigation, repayment commitment, guarantee, unknown or fuzzy and others.
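The annotation schema above can be illustrated with a simple elemental similarity that counts on how many fields two cases agree. The field names below are paraphrases of the listed elements, not the dataset's actual keys, and the similarity definition is a sketch rather than the annotation protocol itself.

```python
# Hypothetical field names paraphrasing the annotated elements above.
ELEMENT_FIELDS = [
    "party_properties", "guarantee_type", "loan_usage", "lending_intention",
    "interest_method", "agreed_interest", "delivery_form", "repayment_form",
    "loan_agreement",
]

def element_similarity(case_x, case_y):
    """Fraction of element fields on which both cases have the same value."""
    matches = sum(
        case_x.get(f) is not None and case_x.get(f) == case_y.get(f)
        for f in ELEMENT_FIELDS
    )
    return matches / len(ELEMENT_FIELDS)

case_x = {"guarantee_type": "mortgage", "loan_usage": "family life",
          "interest_method": "simple interest"}
case_y = {"guarantee_type": "mortgage", "loan_usage": "enterprise operation",
          "interest_method": "simple interest"}
```

Here `element_similarity(case_x, case_y)` agrees on two of the nine fields (guarantee type and interest method), which would make the pair a weak candidate for a triplet.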
After annotating these elements, we can assume that cases with similar elements are quite similar. When constructing the triplets, we therefore calculate the tf-idf similarity and the elemental similarity between cases and select similar cases to form triplets. We constructed 8,964 triplets in total by this method; the statistics can be found in Table 1. Legal professionals then annotate every triplet to decide whether sim(A, B) > sim(A, C) or sim(A, B) < sim(A, C). To ensure the quality of annotation, every document and triplet is annotated by at least three legal professionals to reach an agreement.
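The tf-idf side of the candidate selection can be sketched in a few lines: pairs of case documents are scored by tf-idf cosine similarity, and only sufficiently similar pairs are grouped into triplets. This is an illustrative reimplementation, not the authors' pipeline; whitespace tokenization stands in for a real Chinese word segmenter, and the idf smoothing is an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse tf-idf vectors (dicts) for a list of whitespace-tokenized docs."""
    tokenized = [doc.split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))          # document frequency per word
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: (c / len(toks)) * math.log(1 + n / df[w])
                     for w, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "private lending between natural persons with mortgage",
    "private lending between natural persons with pledge",
    "lending for enterprise production and operation",
]
v0, v1, v2 = tfidf_vectors(docs)
```

In this toy example, the first two documents share most of their vocabulary, so `cosine(v0, v1)` is far higher than `cosine(v0, v2)`, and only the first pair would survive the selection step.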
4 Experiments
To assess the difficulty of the similar case matching task, we evaluate several baselines on our dataset. The experimental results show that even state-of-the-art systems perform poorly at evaluating the similarity between different cases.
Baselines. All the baseline models are trained on Large Train and tested on Large Valid and Large Test. We adapt the Siamese framework Bromley et al. (1994) to our scenario with different encoders, e.g., CNN Kim (2014), LSTM Hochreiter and Schmidhuber (1997), and BERT Devlin et al. (2019), for encoding the legal documents. We elaborate on the details of the framework in the following part.
Given a triplet of fact descriptions (A, B, C), we first encode them into distributed vectors with the same encoder and then compute the similarity scores between the query case A and the candidate cases B and C with a linear layer. Assume that each document consists of n words, each represented by a word embedding of dimension d. The encoder layer and a max-pooling layer then transform the embedding sequence into a feature vector whose dimension is that of the hidden vector. For the BERT encoder, we instead feed the document into the model at the character level to obtain the features.
Afterward, we calculate the similarity with a linear layer followed by softmax activation, where the weight matrix W of the linear layer is learned.
For the learning objective, we apply the binary cross-entropy loss between the predicted similarity distribution and the ground-truth label.
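The scoring head described above can be sketched in numpy. This is a hedged illustration, not the authors' implementation: we assume pooled feature vectors for the three documents (random stand-ins for encoder outputs), a linear scorer over the concatenation of query and candidate features, a softmax over the two candidate scores, and binary cross-entropy against the label.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8                                 # hidden size, illustrative only
# Stand-ins for pooled encoder features of documents A, B, C.
h_a, h_b, h_c = rng.normal(size=(3, hidden))
W = rng.normal(size=2 * hidden)            # weights of the linear scorer

def score(h_query, h_cand):
    """Linear similarity score for one (query, candidate) pair."""
    return W @ np.concatenate([h_query, h_cand])

# Softmax over the scores of the two candidates B and C.
logits = np.array([score(h_a, h_b), score(h_a, h_c)])
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()

# Binary cross-entropy; label 1 means B is the more similar case.
label = 1.0
loss = -(label * np.log(probs[0]) + (1 - label) * np.log(probs[1]))
```

At inference time, the model simply predicts B when `probs[0] > 0.5` and C otherwise.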
Model Performance. We use accuracy as the metric in our experiments. Table 2 shows the results of the baselines and the top 3 participant teams on the Large Valid and Large Test sets, from which we draw the following conclusions: 1) The participants achieve promising progress compared to the baseline models. 2) Both the baseline systems and the participant teams perform poorly on the dataset due to the lack of prior legal knowledge; it remains challenging to utilize legal knowledge and simulate legal reasoning on this dataset.
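For clarity, the evaluation metric can be written out: a prediction for a triplet counts as correct when it names the same more-similar case as the annotation, and accuracy is the fraction of correct triplets.

```python
def accuracy(predictions, gold_labels):
    """Fraction of triplets whose predicted more-similar case matches the gold label."""
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

For example, a system that answers "B" on every triplet scores exactly the proportion of triplets whose gold label is "B".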
5 Conclusion
In this paper, we propose a new dataset, CAIL2019-SCM, which focuses on the task of similar case matching in the legal domain. Compared with existing datasets, CAIL2019-SCM can benefit case matching in the legal domain and help legal practitioners work better. Experimental results also show that there is still plenty of room for improvement.
References
- Learning text pair similarity with context-sensitive autoencoders. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1882–1892. Cited by: §2.1.
- Structured retrieval for question answering. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 351–358. Cited by: §2.1.
- Signature verification using a "siamese" time delay neural network. In Advances in Neural Information Processing Systems, pp. 737–744. Cited by: §2.1, §4.
- Neural legal judgment prediction in English. In Proceedings of ACL. Cited by: §2.2.
- Enhanced lstm for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1657–1668. Cited by: §2.1.
- BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Cited by: §4.
- Attention-fused deep matching network for natural language inference. In IJCAI, pp. 4033–4040. Cited by: §2.1.
- A methodology for a criminal law and procedure ontology for legal question answering. In Proceedings of JIST. Cited by: §2.2.
- Multi-perspective sentence similarity modeling with convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1576–1586. Cited by: §2.1.
- Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §4.
- Few-shot charge prediction with discriminative legal attributes. Cited by: §2.2.
- Interpretable rationale augmented charge prediction system. In Proceedings of COLING. Cited by: §2.2.
- Overview of COLIEE 2017. In COLIEE@ICAIL, pp. 1–8. Cited by: §2.2.
- Mathematical models for legal prediction. Computer/LJ 2, pp. 829. Cited by: §2.2.
- COLIEE-2015: evaluation of legal question answering. In Proceedings of JURISIN. Cited by: §2.2.
- Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Cited by: §4.
- Predicting supreme court decisions mathematically: a quantitative analysis of the "right to counsel" cases. The American Political Science Review 51 (1), pp. 1–12. Cited by: §2.2.
- The supreme court’s many median justices. American Political Science Review 106 (4), pp. 847–866. Cited by: §2.2.
- Matching natural language sentences with hierarchical sentence factorization. In Proceedings of the 2018 World Wide Web Conference, pp. 1237–1246. Cited by: §2.1.
- Exploring phrase-based classification of judicial documents for criminal charges in Chinese. In Proceedings of the 16th International Conference on Foundations of Intelligent Systems. Cited by: §2.2.
- Predicting associated statutes for legal problems. Information Processing & Management 51 (1), pp. 194–211. Cited by: §2.2.
- Learning to predict charges for criminal cases with legal basis. In Proceedings of EMNLP. Cited by: §2.2.
- Siamese recurrent architectures for learning sentence similarity. In Thirtieth AAAI Conference on Artificial Intelligence. Cited by: §2.1.
- Learning text similarity with siamese recurrent networks. In Proceedings of the 1st Workshop on Representation Learning for NLP, pp. 148–157. Cited by: §2.1.
- Glove: global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543. Cited by: §4.
- Analyzing the extraction of relevant legal judgments using paragraph-level and citation information. In AI4J: Artificial Intelligence for Justice, pp. 30. Cited by: §2.2.
- THULAC: an efficient lexical analyzer for Chinese. Technical report. Cited by: §4.
- Multiway attention networks for modeling sentence pairs. In IJCAI, pp. 4411–4417. Cited by: §2.1.
- Litigation analytics: case outcomes extracted from US federal court dockets. In Proceedings of NAACL-HLT. Cited by: §2.2.
- A sequence approach to case outcome detection. In Proceedings of ICAIL. Cited by: §2.2.
- A deep architecture for semantic matching with multiple positional sentence representations. In Thirtieth AAAI Conference on Artificial Intelligence. Cited by: §2.1.
- Interpreting tf-idf term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS) 26 (3), pp. 13. Cited by: §2.1.
- Cail2018: a large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478. Cited by: §1.
- Event identification as a decision process with non-linear representation of text. arXiv preprint arXiv:1710.00969. Cited by: §2.2.
- Legal judgment prediction via multi-perspective bi-feedback network. Cited by: §2.2.
- Interpretable charge predictions for criminal cases: learning to generate court views from fact descriptions. In Proceedings of NAACL. Cited by: §2.2.
- ABCNN: attention-based convolutional neural network for modeling sentence pairs. Transactions of the Association for Computational Linguistics 4, pp. 259–272. Cited by: §2.1.
- Legal judgment prediction via topological learning. In Proceedings of EMNLP. Cited by: §2.2.
- Overview of cail2018: legal judgment prediction competition. arXiv preprint arXiv:1810.05851. Cited by: §1, §2.2.