Performance in the Courtroom: Automated Processing and Visualization of Appeal Court Decisions in France

06/11/2020 ∙ by Paul Boniol, et al. ∙ HEC Paris Ecole Polytechnique

Artificial Intelligence techniques are already popular and important in the legal domain. We extract legal indicators from judicial judgments to decrease the asymmetry of information in the legal system and the access-to-justice gap. We use NLP methods to extract entities and data of interest from judgments and construct networks of lawyers and judgments. We propose metrics to rank lawyers based on their experience, their win/loss ratio and their importance in the network of lawyers. We also perform community detection in the network of judgments and propose metrics to represent the difficulty of cases, capitalising on community features.




1. Introduction

Recent advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) allow the analysis of large numbers of legal documents in aggregate, in contrast to traditional methods. A long-standing application of NLP to legal documents is information extraction and retrieval from judicial decisions. The interest in mining data from judgments can be explained by the critical role they play in the administration of justice in both common and civil law systems. The objective of our work is to analyze judgments by French courts to gain insights into the operation of the French judicial system, which could in turn help develop an interface for laypersons. As explained in (Ruhl and Katz, 2015), a legal user interface could shield the user from the complexity of the underlying legal system. Ordinary people perceive the legal system as too complex (26), which results in part from the asymmetry of information in the market for legal services, where ordinary people are disadvantaged in comparison with providers of legal services (1). The asymmetry of information adds to the access-to-justice gap: a layperson lacks the information and tools to choose the right lawyer at an affordable cost, and might prefer to represent herself or refrain from filing a lawsuit. According to (Greacen et al., 2014), “one of six Americans is a self-represented litigant in a newly filed case each year”; however, the resolutions are in favor of litigants represented by a lawyer. Both (1) and (Greacen et al., 2014) suggest that “the ease of access to information” is a way to address the gap in accessing justice. Access to free basic legal information could help users navigate the justice system more easily, better understand the legal area their problem falls into, and choose a lawyer with experience in the subject matter of the dispute.
In our work, we extract and represent information from past judgments to increase the transparency of judicial procedures and make them more accessible to laypersons. First, we pre-process judgments by extracting relevant legal entities, such as the lawyers of each party, using Named Entity Recognition (NER) models. Second, we analyze the win/loss rate of lawyers by building two networks of lawyers: an opposing network and a collaborative network. Third, we use network analysis of judgments to suggest a measure of case difficulty based on case types/communities with distinct win/loss rates.

2. Related work

Numerous studies have been carried out on case-law corpora, each focusing on specific objectives. One long-standing objective is the prediction of case outcomes. One of the first approaches in this field (Kort, 1957) manually converted the factual elements of a case into numerical values, computed their sum, and predicted a decision in favor of the petitioner if the sum was above a manually selected threshold. Recent efforts (Katz et al., 2014; Sulea et al., 2017; Branting et al., 2019; Long et al., 2019) have used machine learning techniques to build outcome prediction models. Judicial judgments are rich in data, which could be used to analyze the operations of the legal system. The authors of (Epstein et al., 2013; Rachlinski and Wistrich, 2017) used empirical methods to understand and describe judicial decision-making. Other researchers extract information to empower legal decision-makers and legal practitioners (Michalopoulos et al., 2019; Mok and Mok, 2019; Howe et al., 2019; Vacek et al., 2019). Judicial decisions lend themselves to the use of network analysis techniques. Networks of case law have been used several times (Fowler et al., 2007; Derlén and Lindholm, 2014) to measure the importance of a case. Graph theory provides tools well adapted to analyzing the complexity of case law networks; for example, (Tarissan and Nollez-Goldbach, 2016) employs a hybrid version for bipartite graphs to clarify procedural aspects of the International Criminal Court. Judgments are expressed in natural language; therefore, to scale their automatic processing, several researchers have been developing natural language processing techniques for the legal domain. Some adapt NLP techniques built for general language to legal language. (Sanchez, 2019) builds a sentence boundary detection (SBD) model for legal documents. Researchers from the Lynx project (Rehm et al., 2019) developed a set of NLP services to extract a variety of information from legal documents: term extraction, text structure recognition, and NER. NER techniques have several applications in the legal domain. (Barriere and Fouret, 2019) improved existing NER models and used the resulting models to extract, from French judgments, entities that should be anonymized before the public release of the judgments.

3. Data

3.1. Data collection

Our dataset consists of 40,000 RTF files that were crawled from Légifrance, a French legal publisher providing access to law codes and legal decisions. To navigate and crawl Légifrance we used Selenium, a browser automation framework with Python bindings that drives a real web browser. For our experiments, we used a sample of 17,215 cases from the court of appeal. We limit our first analysis to cases from the court of appeal due to the specificity of cases from trial courts and the Court of Cassation. In future work, a sample of cases from the Court of Cassation could also be used (more than 400,000 documents are available on Légifrance). We decided to focus first on cases decided by civil courts and to exclude criminal, administrative, and specialized courts. We also remove procedural judgments, such as court orders. The judgments analyzed here are solely final decisions called “arrêt de Cour d’appel.”

3.2. Data preprocessing

Data preprocessing was the most challenging part of the project. The structure and wording of the legal documents, which vary between courts and dockets, as well as the use of formal legal language, were obstacles to the text mining tasks. Below we describe in detail how we approached each part, from segmenting the documents to extracting the persons taking part in each court case and their roles.

3.2.1. Segmentation

Analysis of the macrostructure of cases

The decisions of courts of appeals in France follow an overall similar structure. First, the documents state practical information about the litigation such as dates, jurisdiction, and the different entities involved in the trial, listed in the following order:

  • Appellant (appelant in French): the name of the party is always anonymized, for example: “Monsieur Jean X.”

  • Appellant’s counsel: can be anonymized but always starts with the keyword “Me” or “Maître,” for example, “Me Jean Dupont.”

  • Appellee (intimée in French): this entity has the same format as the appellant.

  • Appellee’s counsel: same format as the appellant’s counsel.

  • Court entities (non-fixed order):

    • Judge (magistrats, conseillers in French): can be anonymized but always starts with the keyword “Président.”

    • Clerk (Greffier in French): can be anonymized but is always expressed near the word “Greffier.”

After listing the entities, decisions from French appeal courts continue with the debate. The debate describes all the facts and the procedure leading to the appeal. It also states the different arguments brought forth by the parties, and follows with the reasoning of the court. Finally, the decision closes with the conclusion, which states the final decision. The keywords separating the different parts vary significantly from one appeal court to another and are sometimes absent, which makes the segmentation task complex. We use graphs to compare the structure of judgments of appeal courts in several territorial jurisdictions. Figure 1 represents the flow of cases in two different jurisdictions. Each graph is built by parsing judgments from the same jurisdiction into sentences and then linking consecutive sentences by an edge. The name of a node is the text of the sentence. When the sentence is longer than five words, the name of the node is "Long_Text_i_j", where i is the index of the case in the whole dataset and j is the index of the sentence within the case. The size of a node is the number of occurrences of its text in all the judgments of the considered jurisdiction. To account for keywords that have small variations across documents, we use the Jaro similarity to identify these variations; examples are in table 1. The Jaro similarity is a similarity measure between two strings $s_1$ and $s_2$ (Jaro, 1989), defined by the following formula:

$$sim_j = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise} \end{cases}$$

  • $|s_i|$ is the length of string $s_i$

  • $m$ is the number of matching characters, that is, identical characters not further apart than $\left\lfloor \max(|s_1|, |s_2|) / 2 \right\rfloor - 1$ positions

  • $t$ is the number of transpositions, that is, half the number of matching characters that appear in a different order in the two strings.

The Jaro similarity is used to contract nodes of similar sentences: if two sentences have a Jaro similarity larger than 0.8, they are considered to belong to the same node. The big nodes are therefore the parts common to all documents, and they represent the structure of these documents.
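The similarity measure and the contraction step can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the paper: the function names are hypothetical, transpositions are counted with integer halving, and the grouping is a simple greedy pass that compares each sentence with the first member of every existing group.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = no match)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched characters in a different order, halved.
    seq1 = [c for c, ok in zip(s1, matched1) if ok]
    seq2 = [c for c, ok in zip(s2, matched2) if ok]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def contract_sentences(sentences, threshold=0.8):
    """Greedy grouping (assumption): a sentence joins the first group whose
    first member has a Jaro similarity above the threshold."""
    groups = []
    for sentence in sentences:
        for group in groups:
            if jaro_similarity(sentence, group[0]) > threshold:
                group.append(sentence)
                break
        else:
            groups.append([sentence])
    return groups
```

Calling `contract_sentences(["faits et procedure", "faits procedure", "par ces motifs"])` puts the first two headings in one group and leaves the third on its own, which mirrors how near-identical keywords end up in the same node.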

Sentence 1                           | Sentence 2                        | Jaro similarity
faits et procedure                   | faits procedure                   | 0.86
procedure et pretentions des parties | procedure et moyens des parties   | 0.83
moyens et pretentions des parties    | pretentions et moyens des parties | 0.92
Table 1. Examples of Jaro similarity between pairs of similar sentences contracted into the same node

Figure 1 points out the differences in structure and flow between documents from different jurisdictions. For instance, Agen uses “ENTRE” to announce the appellant and “ET” to announce the appellee, whereas Douai uses “APPELANT” and “INTIMEE” respectively.

Figure 1. Most recurrent keywords in decisions by the courts of Douai and Agen, respectively. The nodes are parts of the document. When a sentence is longer than five words, the node is named Long_Text_i with i unique. The big nodes are therefore the parts common to all documents, and they reveal the structure.

Nevertheless, we empirically observed that all the decisions, whatever the jurisdiction, share the same keyword “PAR CES MOTIFS” to announce the final decision of the court (last big node in the two flows of figure 1).
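The construction of such a flow graph can be illustrated with a short sketch. It assumes that each case has already been split into a list of sentences, and it stores the graph as two counters instead of using a graph library. The function name and the input layout are hypothetical, and `i` here is the position of the case in the input list.

```python
from collections import Counter


def build_flow_graph(cases, max_words=5):
    """Build node and edge counts for a list of cases.

    cases: list of cases, each a list of sentences (strings).
    Returns (node_counts, edge_counts); node size = node_counts[name].
    """
    node_counts = Counter()
    edge_counts = Counter()
    for i, sentences in enumerate(cases):
        names = []
        for j, sentence in enumerate(sentences):
            # Long sentences get a unique placeholder name, so only short,
            # recurring sentences (typically keywords) grow into big nodes.
            if len(sentence.split()) > max_words:
                names.append(f"Long_Text_{i}_{j}")
            else:
                names.append(sentence.strip())
        node_counts.update(names)
        # Link consecutive sentences of the same case by an edge.
        edge_counts.update(zip(names, names[1:]))
    return node_counts, edge_counts
```

Nodes with the highest counts are the recurring keywords of a jurisdiction. Near-duplicate names could then be merged with a string similarity threshold before plotting.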

Segmentation with keywords

We also sought to extract entities corresponding to the lawyers defending each party. As described above, legal entities are mentioned after the practical information in a fixed order. Moreover, domain experts confirmed these legal entities are mentioned in separate segments. These segments are often preceded by known keywords, as shown in figure 2. Once we have identified the beginning and end of each segment, we use them to extract lawyers’ names as described in the following subsection.
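A keyword-based segmentation of this kind might look as follows. The keyword table is a hypothetical example built from the Douai and Agen keywords mentioned earlier. Real decisions would need per-court variants, and the sketch assumes each keyword stands alone on its line.

```python
import re

# Hypothetical keyword table; each value is a regex alternative for one segment.
SEGMENT_KEYWORDS = {
    "appellant": r"APPELANTE?S?|ENTRE",
    "appellee": r"INTIM[EÉ]E?S?|ET",
    "conclusion": r"PAR CES MOTIFS",
}

# One named group per segment type; a keyword must fill a whole line.
SEGMENT_PATTERN = re.compile(
    "|".join(
        rf"(?P<{name}>^(?:{keywords})[ \t]*:?[ \t]*$)"
        for name, keywords in SEGMENT_KEYWORDS.items()
    ),
    re.MULTILINE,
)


def segment_decision(text):
    """Split a decision into segments running from one keyword line to the next."""
    matches = list(SEGMENT_PATTERN.finditer(text))
    segments = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        # Keep only the first occurrence of each segment type.
        segments.setdefault(current.lastgroup, text[current.end():end].strip())
    return segments
```

Everything before the first keyword (dates, jurisdiction) is ignored here. A fuller version would also capture the court entities and the debate.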

Figure 2. Segmentation of a legal case based on keywords, to facilitate entity recognition

3.3. Extraction of lawyers’ entities

To detect lawyers’ names throughout the document, we first discard all segments except the appellant and appellee segments. Second, we segment them further into sentences using the sentence tokenizer of Polyglot, a Python package providing multilingual natural language processing tools. Third, we only keep sentences containing honorifics used for lawyers, such as “Me” or “Maître,” or lawyers’ keywords like “représenté par.” We then use a well-established Named Entity Recognition model by Polyglot (Al-Rfou et al., 2015) to recognize person entities in the remaining sentences. The model uses pretrained word embeddings from Wikipedia (Al-Rfou et al., 2013) to classify whether a word is an entity or not based on its sentence. Last, we consider the extracted named entities appearing in the appellant segment as lawyers of the appellant, and the names appearing in the appellee segment as lawyers of the appellee. It should be noted that decisions without any reference to a lawyer on both sides were left out.
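The filtering logic of this pipeline can be illustrated without the Polyglot dependency. In the sketch below, a naive line splitter stands in for the sentence tokenizer and a regular expression over capitalised words stands in for the NER model. Both are simplifying assumptions, and all names are hypothetical.

```python
import re

# Cues that signal a lawyer: honorifics and "représenté(e)(s) par".
LAWYER_CUE = re.compile(r"\b(?:Me|Ma[iî]tre)\b|[Rr]epr[ée]sent[ée]e?s? par")
# Crude stand-in for NER: capitalised words right after the honorific.
LAWYER_NAME = re.compile(r"\b(?:Me|Ma[iî]tre)\s+((?:[A-ZÀ-ÖØ-Ý][\w'-]+\s?)+)")


def split_sentences(segment):
    """Naive splitter on line breaks and semicolons (replaces the tokenizer)."""
    return [part.strip() for part in re.split(r"[\n;]+", segment) if part.strip()]


def extract_lawyers(segment):
    """Return the lawyer names found in one party segment."""
    names = []
    for sentence in split_sentences(segment):
        if not LAWYER_CUE.search(sentence):
            continue  # sentence has no lawyer cue, skip it
        for match in LAWYER_NAME.finditer(sentence):
            names.append(match.group(1).strip())
    return names


def lawyers_by_side(segments):
    """Map each side to its lawyers, given {'appellant': text, 'appellee': text}."""
    return {
        side: extract_lawyers(segments.get(side, ""))
        for side in ("appellant", "appellee")
    }
```

A statistical NER model would handle names that the regular expression misses, for example names written without an honorific directly in front of them.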

3.4. Extraction of the judge decision

After the initial segmentation, the final decision of the court is found in the conclusion segment of the judgment. In judgments from appeal courts, the court either confirms the decision of the lower court (Tribunal judiciaire) or reverses it. However, the court can also partially confirm the judgment. In other words, the court can decide to accept one of the appellant’s requests, and therefore change the first decision partially. Empirically, we noticed that certain words are present in certain types of decisions and, after validation by the domain experts, we resorted to a keyword-based solution:

  • “Confirme”, “Rejete”, “Irrecevable”: keep the first decision (Appellee “wins”)

  • “Infirme”, “Rectifier”, “Réforme”: change the first decision (Appellant “wins”)

Out of a sample of 5832 cases, 570 conclusions (10%) include at least one keyword for each of the two outcomes, in which case we keep the outcome that has the most keywords. This is a temporary solution that requires refinement in the future.
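The keyword rule with its majority vote can be sketched as below. Two choices are assumptions made for the illustration: keywords are matched as accent-free stems so that inflected forms are counted, and a tie or a conclusion without keywords returns `None`.

```python
import re
import unicodedata

# Accent-free stems derived from the keywords listed above (assumption).
KEEP_STEMS = ("confirme", "rejet", "irrecevable")   # appellee "wins"
CHANGE_STEMS = ("infirme", "rectifi", "reforme")    # appellant "wins"


def strip_accents(text):
    """Remove diacritics so that 'Réforme' and 'Reforme' are treated alike."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def count_stems(text, stems):
    """Count words in `text` that start with one of the stems."""
    return sum(len(re.findall(rf"\b{stem}\w*", text)) for stem in stems)


def classify_outcome(conclusion):
    """Return 'appellee', 'appellant', or None when undecided."""
    text = strip_accents(conclusion).lower()
    keep = count_stems(text, KEEP_STEMS)
    change = count_stems(text, CHANGE_STEMS)
    if keep == change:
        return None  # no keyword at all, or a tie between the two outcomes
    return "appellee" if keep > change else "appellant"
```

Returning `None` makes the undecided conclusions easy to count and to route to a later refinement step.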

4. Network analysis of Lawyers

Once entity recognition is complete, we extract all the entity instances (and their function) from every document. Since courts tend to have a limited number of lawyers, judges and court clerks, cases share the same entities. Therefore, all the cases can be considered as one big graph in which entities interact with each other.

4.1. Opposing network of lawyers

We extracted the winning and losing lawyers in each decision. From this, we can define a directed weighted network. We draw an edge between two lawyers if they have opposed each other. The edge from lawyer i to lawyer j is weighted by the wins of lawyer i over lawyer j:

$$w_{ij} = a \, W^{A}_{ij} + b \, W^{E}_{ij}$$

where $W^{A}_{ij}$ is the number of wins of lawyer i as an appellant and $W^{E}_{ij}$ is the number of wins of lawyer i as an appellee. Parameters a and b are used to give more weight to winning as an appellant than to winning as an appellee, since legal experts know that winning an appeal is less frequent than losing it. We also confirm this intuition by computing the rate of rejected appeals in our dataset, which is 0.9. We collapse both edges between two lawyers into one directed edge weighted by $|w_{ij} - w_{ji}|$. The edge direction is determined by the sign of $w_{ij} - w_{ji}$, such that the edge target is the lawyer with the most wins. To visualize the most important nodes, we remove lawyers with only one case (899 out of 2146), which leaves us with a network of 1247 nodes and 2182 edges. The resulting network appears in figure 3.

Figure 3. The network of lawyers that have opposed each other in appeal cases. The width of an edge is a function of the number of wins the source node has over the target. The node color indicates whether the lawyer has more losses or more wins, and the node size is proportional to the total number of cases: large yellow nodes mean the lawyer has won many more cases than lost, and vice versa.
Figure 4. Network of lawyers that have collaborated in court cases. The edge color signifies the sign of the win-loss metric and the width of the absolute value.

The edge goes from a ”losing” to a ”winning” lawyer, and the width of the edge represents the difference in the number of wins. Node size is the number of appeal cases where the lawyer appears and the color captures the win-loss difference.
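A small sketch can make the weighting and the collapsing of opposing edges concrete. It assumes one lawyer per side and one tuple per case. The values chosen for `a` and `b` are placeholders, since the text only requires that wins as an appellant weigh more, and pairs with a zero difference are dropped as a further assumption.

```python
from collections import defaultdict


def opposing_edges(cases, a=2.0, b=1.0):
    """Build collapsed directed edges {(loser, winner): weight}.

    cases: iterable of (appellant_lawyer, appellee_lawyer, winner),
           with winner equal to "appellant" or "appellee".
    a, b:  weights of a win as appellant / as appellee (placeholder values).
    """
    w = defaultdict(float)  # w[(i, j)]: weighted wins of lawyer i over lawyer j
    for appellant_lawyer, appellee_lawyer, winner in cases:
        if winner == "appellant":
            w[(appellant_lawyer, appellee_lawyer)] += a
        else:
            w[(appellee_lawyer, appellant_lawyer)] += b

    edges = {}
    for i, j in {tuple(sorted(pair)) for pair in w}:
        diff = w.get((i, j), 0.0) - w.get((j, i), 0.0)
        if diff > 0:
            edges[(j, i)] = diff    # i has more weighted wins: edge j -> i
        elif diff < 0:
            edges[(i, j)] = -diff   # j has more weighted wins: edge i -> j
    return edges
```

With three cases between lawyers A and B, where A wins once as appellant and once as appellee and B wins once as appellant, the sketch yields a single edge from B to A of weight 1.0 under the placeholder weights.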

4.2. Collaboration network of lawyers

The collaboration network in figure 4 shows lawyers that have been on the same side during an appeal case. The edges are weighted by the wins minus the losses, so the network captures which collaborations are the most successful. We removed nodes whose number of collaborations is below a fixed threshold to obtain a decluttered visualization of the network. We obtain a network of 47 nodes and 94 edges out of 2182 nodes and 2950 edges.
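The collaboration weights can be computed along the following lines. The input layout, the function name, and the way collaborations are counted per lawyer are assumptions made for the illustration.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_edges(sides, min_collaborations=1):
    """Return {(lawyer_u, lawyer_v): wins - losses} for co-counsel pairs.

    sides: iterable of (lawyers, won), where `lawyers` lists the lawyers
           on one side of a case and `won` tells whether that side won.
    """
    score = defaultdict(int)   # wins minus losses per pair
    shared = defaultdict(int)  # number of shared cases per pair
    for lawyers, won in sides:
        for pair in combinations(sorted(set(lawyers)), 2):
            score[pair] += 1 if won else -1
            shared[pair] += 1

    # Count collaborations per lawyer and drop those below the threshold.
    per_lawyer = defaultdict(int)
    for (u, v), n in shared.items():
        per_lawyer[u] += n
        per_lawyer[v] += n
    kept = {name for name, n in per_lawyer.items() if n >= min_collaborations}
    return {
        pair: value
        for pair, value in score.items()
        if pair[0] in kept and pair[1] in kept
    }
```

Raising `min_collaborations` prunes occasional collaborators and leaves the core of frequent co-counsel.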

4.3. Lawyers Ranking

In this section, we suggest three metrics to rank and compare lawyers. First, we measure the experience of a lawyer by the number of judgments mentioning them as the appellant’s or appellee’s lawyer. Second, we compute the win/loss ratio of lawyers. Third, we calculate the centrality of a lawyer in the opposing network.

(a) Lawyers ordered by the total number of cases
(b) Lawyers ordered by the win/loss ratio
(c) Lawyers ordered by their importance using PageRank algorithm
Figure 5. Ranking of lawyers using three different measures

In figure 5(a), lawyers are ranked by their experience in appearing before the court of appeal. However, this measure alone does not indicate the performance of the lawyer. Thus we need to refer to the win/loss ratio to evaluate performance. Lawyer 353 ranks first in terms of the win/loss ratio but fifth in terms of the total number of cases. In this respect, lawyer 353 performs better than lawyer 387, who is ranked first in figure 5(a) but ranks ninth in terms of win/loss ratio. A weakness of the win/loss ratio ranking is that it does not consider the experience of the opposing lawyer, although the opponent’s worth can be a measure of the value of a win. To this end, we compute the weighted directed PageRank of the opposing network, shown in figure 5(c). As explained in section 4.1, the weights are the number of wins, such that a win as an appellant’s lawyer counts more than a win as an appellee’s lawyer. So edges directed towards lawyers who win more when representing an appellant have higher weights than edges directed towards lawyers who win more when representing an appellee. Therefore the top lawyers in figure 5(c) are lawyers who won against experienced lawyers and who win mostly as an appellant’s lawyer. Lawyer 387 is ranked better than lawyer 350 in terms of win/loss ratio, but worse in terms of the PageRank measure. We could explain this difference in ranking by the fact that the majority of lawyer 350’s wins were obtained as an appellant’s lawyer, while the majority of lawyer 387’s wins were obtained as an appellee’s lawyer. Thus an appellant would be advised to choose lawyer 350 rather than lawyer 387.
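To make the PageRank ranking concrete, here is a compact power-iteration sketch over a weighted directed edge list. It is a generic textbook formulation, not the code used for figure 5. The damping factor of 0.85 and the fixed number of iterations are conventional defaults chosen for the illustration, and all edge weights are assumed to be positive.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges: {(source, target): positive weight}; in the opposing network an
           edge points from the losing lawyer to the winning lawyer.
    Returns {node: score}; scores sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (source, _target), weight in edges.items():
        out_weight[source] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (source, target), weight in edges.items():
            share = weight / out_weight[source]
            new_rank[target] += damping * rank[source] * share
        rank = new_rank
    return rank
```

Because edges point towards the winner, a lawyer gains rank from every opponent beaten, and gains more when that opponent is itself highly ranked or when the edge weight is large.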

4.4. Network analysis of judgments

In this section, we develop a method to assess the difficulty of cases from the perspective of the appellant. More precisely, the aim is to compute the difficulty within a group of cases dealing with the same legal issues. First, we build a network of cases to discover communities of cases about similar legal issues. Second, we use the win/loss rate of the appeal as a proxy for its difficulty.

Graphs encode knowledge and patterns more efficiently (Rousseau and Vazirgiannis, 2013; Nikolentzos et al., 2019). The crucial element is the edges, which represent some kind of similarity/affiliation among the nodes. Graphs are said to have the property of community structure (Fortunato, 2010) when there are groups of vertices with a high concentration of internal edges and a low concentration of edges between these groups; see the example in figure 6(c). These special groups are called communities, clusters or modules. In order to build a graph of cases, we needed to connect them by some property that represents similarity. Cases about the same legal issues tend to cite the same groups of law articles; therefore, we define the similarity of two cases as the number of common law articles mentioned in the text of the cases, which reflects their thematic similarity. Thus we build a network of judgments to discover the community structure and the natural divisions among the set of studied cases. First, we prepare cases by extracting the cited articles of law using regular expressions. Then we create an edge between two cases if they cite at least k common articles. Figure 7 shows graphs of judgments for different values of k. As we increase k, the graph becomes smaller and the remaining cases are more similar, since they share a higher number of common articles.
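The two steps, article extraction and edge creation, can be sketched as follows. The regular expression is a simplified assumption, since the patterns actually used are not given. It captures only the first identifier after each occurrence of "article(s)" and ignores the name of the code, so it would conflate articles with the same number in different codes.

```python
import re
from itertools import combinations

# Simplified pattern (assumption): "article 1382", "article L. 122-4", ...
ARTICLE = re.compile(
    r"\barticles?\s+((?:[LRD]\.?\s?)?\d+(?:-\d+)*)", re.IGNORECASE
)


def cited_articles(text):
    """Return the set of normalised article identifiers cited in a text."""
    return {re.sub(r"[\s.]", "", ref).upper() for ref in ARTICLE.findall(text)}


def case_graph(cases, k):
    """Link two cases when they share at least k cited articles.

    cases: {case_id: text}. Returns {(case_a, case_b): common article count}.
    """
    articles = {case_id: cited_articles(text) for case_id, text in cases.items()}
    edges = {}
    # Pairwise comparison is quadratic in the number of cases.
    for first, second in combinations(sorted(articles), 2):
        common = len(articles[first] & articles[second])
        if common >= k:
            edges[(first, second)] = common
    return edges
```

For a few thousand cases the pairwise loop is still manageable. A larger corpus would call for an inverted index from article to cases.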

(a) Cases with at least 3 articles in common
(b) Cases with at least 5 articles in common
(c) A case graph displaying community structure: two groups of cases with dense internal connections and sparser connections between groups
Figure 6. Examples of case graphs and communities for 7 cases

We built networks for different values of k from cases of the last three months of 2018, as shown in figure 7. The network naturally groups similar cases into communities. For example, in figure 6(c), cases against the same appellee and about the same issue are grouped together. We also notice in figure 8 that cases with the same win/loss rate are grouped in the same communities.

(a) k=2 80,000 edges 1,000 nodes
(b) k=3 20,000 edges 600 nodes
(c) k=4 5,000 edges 400 nodes
Figure 7. Examples of case graphs and communities for different values of k (5500 cases)
Figure 8. Examples of detected communities. Communities circled in red have a high losing rate. Communities circled in green have a high winning rate.
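The link between communities and difficulty can be illustrated with a short sketch. Connected components are used here as a crude stand-in for a community detection algorithm, which the text does not name, and the appellant win rate of each group serves as the difficulty proxy. All function names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components (stand-in for communities)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, compressing the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    return list(groups.values())


def appellant_win_rate(community, outcomes):
    """Share of cases in a community won by the appellant.

    outcomes: {case_id: "appellant" or "appellee"}.
    A low rate suggests that appeals of this type are hard to win.
    """
    wins = sum(outcomes[case_id] == "appellant" for case_id in community)
    return wins / len(community)
```

A dedicated community detection method would split large components further. The win-rate computation per group stays the same.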

5. Conclusion

We used NLP methods to extract information from judgments of the French court of appeal. We constructed indicators of lawyers’ performance and of case difficulty by applying network analysis techniques to networks of lawyers and networks of cases. Our objective is to use these indicators to guide laypersons confronted with the legal system, and to help narrow the access-to-justice gap by reducing the asymmetry of information that characterizes the legal market. The lawyers’ ranking could serve to build a system that guides an appellee in choosing a lawyer. However, the ranking relies only on the wins and losses of lawyers. In future work, we expect to produce a ranking that takes into account the legal area of the case and its difficulty, so that the ranking can be better tailored to the needs of a layperson.


  • [1] (2016-11) Access to justice and market failure. Slaw.
  • R. Al-Rfou, V. Kulkarni, B. Perozzi, and S. Skiena (2015) Polyglot-NER: massive multilingual named entity recognition. In Proceedings of the 2015 SIAM International Conference on Data Mining, pp. 586–594.
  • R. Al-Rfou, B. Perozzi, and S. Skiena (2013) Polyglot: distributed word representations for multilingual NLP. arXiv preprint arXiv:1307.1662.
  • V. Barriere and A. Fouret (2019) May I check again? A simple but efficient way to generate and use contextual dictionaries for named entity recognition. Application to French legal texts. arXiv preprint arXiv:1909.03453.
  • K. Branting, B. Weiss, B. Brown, C. Pfeifer, A. Chakraborty, L. Ferro, M. Pfaff, and A. Yeh (2019) Semi-supervised methods for explainable legal prediction. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 22–31.
  • M. Derlén and J. Lindholm (2014) Goodbye van Gend en Loos, hello Bosman? Using network analysis to measure the importance of individual CJEU judgments. European Law Journal 20 (5), pp. 667–687.
  • L. Epstein, W. M. Landes, and R. A. Posner (2013) The behavior of federal judges: a theoretical and empirical study of rational choice. Harvard University Press.
  • S. Fortunato (2010) Community detection in graphs. Physics Reports 486 (3-5), pp. 75–174.
  • J. H. Fowler, T. R. Johnson, J. F. Spriggs, S. Jeon, and P. J. Wahlbeck (2007) Network analysis and the law: measuring the legal importance of precedents at the US Supreme Court. Political Analysis 15 (3), pp. 324–346.
  • J. M. Greacen, A. D. Johnson, and V. Morris (2014) From market failure to 100% access: toward a civil justice continuum. UALR L. Rev. 37, pp. 551.
  • J. S. T. Howe, L. H. Khang, and I. E. Chai (2019) Legal area classification: a comparative study of text classifiers on Singapore Supreme Court judgments. arXiv preprint arXiv:1904.06470.
  • M. A. Jaro (1989) Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association 84 (406), pp. 414–420.
  • D. M. Katz, M. J. Bommarito II, and J. Blackman (2014) Predicting the behavior of the Supreme Court of the United States: a general approach. arXiv preprint arXiv:1407.6333.
  • F. Kort (1957) Predicting Supreme Court decisions mathematically: a quantitative analysis of the “right to counsel” cases. American Political Science Review 51 (1), pp. 1–12.
  • S. Long, C. Tu, Z. Liu, and M. Sun (2019) Automatic judgment prediction via legal reading comprehension. In China National Conference on Chinese Computational Linguistics, pp. 558–572.
  • D. P. Michalopoulos, J. Jacob, and A. Coviello (2019) AI-enabled litigation evaluation: data-driven empowerment for legal decision makers. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 264–265.
  • W. Y. Mok and J. R. Mok (2019) Legal machine-learning analysis: first steps towards AI assisted legal research. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 266–267.
  • G. Nikolentzos, A. J. Tixier, and M. Vazirgiannis (2019) Message passing attention networks for document understanding. arXiv preprint arXiv:1908.06267.
  • J. J. Rachlinski and A. J. Wistrich (2017) Judging the judiciary by the numbers: empirical research on judges. Annual Review of Law and Social Science 13, pp. 203–229.
  • G. Rehm, J. M. Schneider, J. Gracia, A. Revenko, V. Mireles, M. Khvalchik, I. Kernerman, A. Lagzdins, M. Pinnis, A. Vasilevskis, et al. (2019) Developing and orchestrating a portfolio of natural legal language processing and document curation services. In Proceedings of the Natural Legal Language Processing Workshop 2019, pp. 55–66.
  • F. Rousseau and M. Vazirgiannis (2013) Graph-of-word and TW-IDF: new approach to ad hoc IR. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 59–68.
  • J. Ruhl and D. M. Katz (2015) Measuring, monitoring, and managing legal complexity. Iowa L. Rev. 101, pp. 223.
  • G. Sanchez (2019) Sentence boundary detection in legal text. In Proceedings of the Natural Legal Language Processing Workshop 2019, pp. 31–38.
  • O. Sulea, M. Zampieri, M. Vela, and J. Van Genabith (2017) Predicting the law area and decisions of French Supreme Court cases. arXiv preprint arXiv:1708.01681.
  • F. Tarissan and R. Nollez-Goldbach (2016) Analysing the first case of the International Criminal Court from a network-science perspective. Journal of Complex Networks 4 (4), pp. 616–634.
  • [26] (2016-11) (Website).
  • T. Vacek, R. Teo, D. Song, T. Nugent, C. Cowling, and F. Schilder (2019) Litigation analytics: case outcomes extracted from US federal court dockets. In Proceedings of the Natural Legal Language Processing Workshop 2019, pp. 45–54.