LawSum: A weakly supervised approach for Indian Legal Document Summarization

by Vedant Parikh, et al.

Unlike the courts in western countries, public records of the Indian judiciary are completely unstructured and noisy. No large-scale publicly available annotated datasets of Indian legal documents exist to date. This limits the scope for legal analytics research. In this work, we propose a new dataset consisting of over 10,000 judgements delivered by the Supreme Court of India and their corresponding hand-written summaries. The proposed dataset is pre-processed by normalising common legal abbreviations, handling spelling variations in named entities, handling bad punctuation and performing accurate sentence tokenization. Each sentence is tagged with its rhetorical role. We also annotate each judgement with several attributes like date, names of the plaintiffs, defendants and the people representing them, judges who delivered the judgement, acts/statutes that are cited and the most common citations used to refer to the judgement. Further, we propose an automatic labelling technique for identifying sentences which have summary-worthy information. We demonstrate that this auto-labelled data can be used effectively to train a weakly supervised sentence extractor with high accuracy. Some possible applications of this dataset besides legal document summarization can be in retrieval, citation analysis and prediction of decisions by a particular judge.




1 Introduction

There is an increasing interest in using the advances of NLP and machine learning to benefit the legal community. Tasks like summarization, rhetorical role labelling and precedent retrieval are especially sought after by lawyers and paralegals. Recently the Supreme Court of India established an AI Department, which focuses on administrative efficiency and legal research. It is evident that AI integration in the Indian legal domain is no longer a futuristic endeavour. However, there are few publicly available large-scale datasets of legal documents, which are a necessity for any serious attempt at legal analytics research. In this work we try to narrow that gap by proposing a new dataset consisting of over 10,000 judgements and corresponding summaries. While the data was already available in the public domain, it was unstructured and noisy. We propose a pre-processing pipeline to curate the dataset and make it more usable.

Unlike the most commonly used datasets, like news articles, Wikipedia pages or social media content, the structure of legal text is much richer. Very long and complex sentences with several abbreviations, named entities and citations are common in legal documents. They also have a variety of distinct units depending on the nature of the information. For instance, statutes are divided into sections, articles and paragraphs, while regulations can be in the form of sections and sub-sections. This is country specific and varies considerably. As a result even the most basic pre-processing steps, like sentence and word tokenization, or entity recognition, do not work out of the box for legal texts. Further, the legal vocabulary is highly technical and distinct from regular text, and is full of abbreviations, which makes sentence tokenization even more difficult. This is commonly observed across many works (Bhattacharya et al., 2019b; Saravanan et al., 2006; Sanchez, 2019). As a result, although several tens of thousands of these court judgements are publicly available, their utility is limited until they can be properly processed. This work aims to bridge that gap by providing a collection of pre-processed and annotated documents that can be used for several downstream tasks. To the best of our knowledge, this is the largest publicly available annotated corpus of Indian legal text.

The proposed dataset consists of 10,764 Supreme Court judgements pre-processed by normalizing abbreviations, handling spelling variations in named entities and tokenizing into sentences. Additionally, we identify other useful meta-information about the case, like date of judgement, involved parties, names of the judges, citations, etc. We also label each sentence with its rhetorical role, using the trained model made available by Bhattacharya et al. (2019c), and with a pseudo-relevance score which indicates whether or not that sentence is summary worthy. We demonstrate the use of this dataset in legal summarization using weakly supervised neural summarization. We conclude that neural approaches comfortably outperform strong baseline techniques. We also conclude that our approach is effective on different time frames and different sub-domains. While we demonstrate the use on a summarization task, we believe the proposed dataset would be useful for several other tasks as well, including rhetorical role labelling, precedent retrieval and citation analysis. We plan to make the entire preprocessing and annotation pipeline publicly available and provide the required tools for loading and using this dataset. Some supplementary material, including sample documents, is made available in our GitHub repository.

2 Related Work

A lot of progress has been made in legal text processing in the past few years. This includes rhetorical role labelling (Saravanan et al., 2008; Bhattacharya et al., 2019c; Hachey and Grover, 2005), argument mining (Moens et al., 2007), legal text summarization (Moens, 2007; Mehta and Majumder, 2019; Bhattacharya et al., 2019b), contract analysis, fact search/identification, etc. These works also generated several benchmark datasets, a few of which are publicly available. However, the majority of the publicly available datasets are small. For example, the text summarization system for Canadian case documents (Farzindar and Lapalme, 2004) uses 10 document-summary pairs for evaluation. Likewise, Saravanan et al. (2006) use 50 annotated judgements for evaluating graphical models of text summarization. These datasets are not limited to English, and several others like TEMIS in Italian (Venturi, 2012) and Lexia in German (Waltl et al., 2016) are available. Even so, most works focus on legal documents from developed countries such as the UK, USA, Australia and Canada (Farzindar and Lapalme, 2004; Nejadgholi et al., 2017). Few such corpora related to the Indian legal domain exist. In contrast to the existing datasets, where the case documents are quite structured, the Indian case documents are a lot noisier and not well structured. Indian case law reports do not usually contain any section headings and do not follow a fixed structure, unlike judgements from other countries like Australia, Canada and the USA, which makes the summarization task more challenging (Bhattacharya et al., 2019b).

Despite this, there are several works that look at different aspects of legal text analysis in India, like rhetorical role labelling (Bhattacharya et al., 2019c), semantic search (More et al., 2019; Zhao et al., 2019) or legal document summarization (Bhattacharya et al., 2019b; Saravanan et al., 2006; Bhattacharya et al., 2021). But the datasets used in these works are either not public or much smaller in size. The shared task on Artificial Intelligence for Legal Assistance (Bhattacharya et al., 2019a, 2020) provides a publicly available dataset for searching precedents and statutes. It consists of about 3000 precedent cases, about 200 statutes and 50 queries (hypothetical scenarios). The task is to find the precedents and statutes relevant to the scenario. Another recent dataset, related to rhetorical role labelling, is provided by Bhattacharya et al. (2019c). They provide 50 judgements tagged with 7 different rhetorical roles. The text summarization dataset used in Bhattacharya et al. (2019b) is somewhat similar to what is proposed in the current work. That dataset consists of around 20,000 judgement and summary pairs from the Supreme Court of India, but it is not publicly available. Moreover, as mentioned in that paper, the dataset is not structured or annotated. In contrast, the proposed collection is both much larger than most previous collections and annotated with a lot of meta-information.

Text summarization refers to selecting the most important portions of the original text and generating a coherent summary from them. The motivation for text summarization in the legal domain is to help in navigating the enormous amount of legal documents produced across the world by the numerous legal institutions. In India itself, there are 25 High Courts and 672 District Courts which publish their legal reports publicly. Such a system will also help in speeding up the several cases that are pending in Indian courts [as of 2019, 87.5 percent, District and Subordinate courts]. Since legal documents are long, legal institutions engage legal experts to produce headnotes, which serve as summaries. But such a task is labour-intensive, time-consuming and quite expensive. Therefore, automatic summarization of legal documents can significantly help legal practitioners. Many summarization algorithms have been proposed to date, both for general text documents and, in a few cases, specifically for summarizing legal documents of various countries.

For summarising Canadian case judgments, the LetSum method (Farzindar and Lapalme, 2004) was proposed, which divides the thematic structure of a judgment into four themes, namely introduction, context, juridical analysis, and conclusion. Relevant sentences are identified for each of the themes and, according to predefined percentages for each theme, the sentences are concatenated to form the summary. A similar approach was used in Bhattacharya et al. (2021) for summarizing Indian court judgements. For Australian case judgments, the CaseSummarizer tool (Polsley et al., 2016) was proposed, which takes into account word frequencies and domain-specific information in the form of abbreviations, legal entities, etc. Legal entities like the involved parties are recognized from section headings and also by identifying part-of-speech tags. Using these features, scores are determined for sentences and, depending on the threshold set by the user, summaries are generated. Another approach for the summarization of Australian case judgements, HAUSS (Galgani et al., 2014), combines various methods into one approach. In addition, many structural attributes are acquired at the term and sentence levels, based on which rules are created for the extraction of relevant sentences. For UK case judgments, Hachey and Grover (2007) use structural information along with manually annotated rhetorical roles for extractive summarization. Salomon (Uyttendaele et al., 2004) is a legal summarization tool which was developed for Belgian criminal cases. It also takes into account the structure of the judgement. The whole document is divided into segments, each having specific information and properties. For example, the alleged offences and the opinions of the court both contain irrelevant information, while the verdict of the court is to the point. The segments containing irrelevant information are summarized using clustering algorithms.
All the legal tools mentioned are designed around the format of the judgements published in their respective countries. Therefore, these methods will not show good results for legal documents of another country, as shown in Bhattacharya et al. (2019b).

One of the first works in text summarization for Indian judgments was by Saravanan et al. (2008), where conditional random field (CRF) based automatic text summarization is proposed. The authors segment the document into seven rhetorical roles and then use various features to identify the labels. They presented their results on 200 Kerala High Court judgements, of which 50 were hand-annotated. In Kanapala et al. (2019), each sentence of the judgement is ranked by optimizing a fitness function which has sentence length, sentence position, degree of similarity, TF–ISF, legal keywords, etc. as its parameters. The proposed approach outperforms classical unsupervised algorithms. The dataset consisted of 1000 Supreme Court judgements. In Bhattacharya et al. (2019b), a systematic comparative study is performed on 17000+ Indian Supreme Court judgements, using algorithms such as LexRank (Erkan and Radev, 2004), LSA (Steinberger and Jezek, 2004), DSDR (He et al., 2012), LetSum (Farzindar and Lapalme, 2004), CaseSummarizer (Polsley et al., 2016), Graphical Model (Saravanan et al., 2006), Neural Extractive Summarizer, etc.

All the approaches discussed above are extractive in nature, where the most important sentences are identified and concatenated to form a summary. For single-document extractive summarization, sentence representations of the document are created and the sentences are ranked using machine learning algorithms, for example SVM and Naive Bayes (Wong et al., 2008) and Hidden Markov Models (Conroy and O’leary, 2001). As the amount of labelled data increased, deep learning models started being used. In Cao et al. (2015), CNN and LSTM were used for creating sentence representations. In Cheng and Lapata (2016), both CNN and LSTM are used for ranking the sentences. In Nallapati et al. (2017), LSTM and GRU were used in a hierarchical manner for generating sentence representations. With the onset of state-of-the-art pre-trained language models, several systems that use transformers like BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) were proposed for summarization (Liu and Lapata, 2019; Zhang et al., 2019). Recently, Google has also released a model for the summarization of long documents (Zaheer et al., 2020).

In the case of unsupervised extractive summarization, TextRank and LexRank were the prominent works. Lately, due to the increase in the amount of text data published daily and the scarcity of labelled data, many new approaches are coming up. The work by West et al. (2019) proposes a novel approach for unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective. Zhou and Rush (2019) use two language models and show that a product-of-experts criterion is enough for maintaining continuous contextual matching while maintaining output fluency. In Zheng and Lapata (2019), a graph-based ranking algorithm is modified by computing node centrality in two ways: employing BERT (Devlin et al., 2018) and using graphs with directed edges, arguing that the contribution of any two nodes to their respective centrality is influenced by their relative position in a document.

3 Dataset Details

The dataset contains 10,764 judgements delivered by the Supreme Court of India, which are publicly available. Table 1 shows some word- and sentence-level statistics of the judgements and headnotes. Fig. 1 shows the frequency distribution for the length of judgements and headnotes in words. In this section we list the annotation schema as well as the preprocessing steps used in creating this dataset. The end result is a structured json file for each judgement-headnote pair, for which a complete json schema is made available.

Figure 1: Frequency Distribution for Judgement and Headnote Lengths

While most of the mentioned fields are annotated with high accuracy (98%), there are two exceptions. In the case of identifying the people related to the case, there are instances where the information is not present at all, or where it is present but we identify it incorrectly or partially. The reasons for the latter scenario are the use of incorrect punctuation and incorrect spellings. We estimate the accuracy of this field to be around 70%. The accuracy of rhetorical roles for sentences is limited by the original model used from Bhattacharya et al. (2019c) and is estimated to be around 50% for our dataset.

3.1 Preprocessing

The pre-processing steps consist mainly of normalizing abbreviations, identifying bad punctuation inserted by typing errors (e.g. ’appel- lant’ and ’consti- tuency’, which should each have been a single word) and removing extra spaces in sentences. The cleaned text is then used to improve sentence tokenization.

Abbreviation Normalization

We identify a list of the 31 most common abbreviations used in legal texts using regular expressions and frequency analysis. During pre-processing these abbreviations are expanded to their root form, which serves two purposes. First, together with removing unwanted punctuation and extra spaces between words, it avoids unexpected sentence breaks. Second, it normalizes the usage of those words. Several judgements use multiple abbreviations as well as their expanded form interchangeably (e.g. no. and num. for number, cls. and cl. for clause, etc). Expanding these reduces the vocabulary size, and the resulting consistency can be quite beneficial for downstream tasks. A complete list can be found in the supplementary material. However, this list only consists of the most frequent abbreviations and is not exhaustive. There are several other, less frequent, abbreviations which are too numerous to handle explicitly. While we do identify them using the rules below, we do not expand them, but instead use that information to prevent incorrect sentence tokenization.

We define a valid abbreviation using the rules below, which were formed empirically.

  1. has four or fewer characters, ends with a full stop and has a document frequency of more than 20.

  2. has a period between two characters (e.g. S.S.C)
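The two rules above can be sketched as a small predicate. This is only an illustration under stated assumptions: the text does not say whether the rules are alternatives or must hold together (the sketch treats them as alternatives), whether the four-character limit counts the trailing full stop (the sketch does not count it), or how tokens are split (the sketch splits on whitespace). The function names are hypothetical.

```python
import re
from collections import Counter

MAX_LEN = 4        # rule 1: four or fewer characters (full stop not counted: assumption)
MIN_DOC_FREQ = 20  # rule 1: document frequency must be more than this

# Rule 2: a period between two letters, e.g. "S.S.C"
INNER_PERIOD = re.compile(r"[A-Za-z]\.[A-Za-z]")


def document_frequency(documents):
    """Count the number of documents in which each whitespace token occurs."""
    df = Counter()
    for text in documents:
        df.update(set(text.split()))
    return df


def is_valid_abbreviation(token, df):
    """Return True if the token satisfies rule 1 or rule 2 (assumed alternatives)."""
    stem = token.rstrip(".")
    rule_1 = (
        token.endswith(".")
        and 0 < len(stem) <= MAX_LEN
        and df[token] > MIN_DOC_FREQ
    )
    rule_2 = bool(INNER_PERIOD.search(token))
    return rule_1 or rule_2
```

A token such as "sec." qualifies only if it occurs in more than 20 documents of the collection, while "S.S.C" qualifies on shape alone.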

Sentence Tokenization

After normalizing the 31 known abbreviations, we tokenize the text using the NLTK sentence tokenizer. Next we post-process the output using a set of rules, which we define by observing the most frequent errors in the original tokenization. We merge a sentence with the one immediately following it if the sentence ends in a valid abbreviation (acronym), as defined above, or if it ends in a number and is followed by a sentence that does not begin with a capital letter. The latter handles a common problem where sections and clauses are referred to as ’sec. 3.’ or ’cls. 4.’ Unlike Sanchez (2019), who finds acronym handling to be ineffective in sentence boundary detection, in our case this preprocessing leads to correct sentence tokenization over 98% of the time.
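The merging rules described above can be sketched as a post-processing pass over an already tokenized list of sentences. This is a minimal sketch, not the authors' pipeline: it assumes the input list comes from a sentence tokenizer such as NLTK's, takes the abbreviation test as a caller-supplied function, and uses hypothetical names throughout.

```python
import re

# A final token that is a number, optionally followed by a full stop, e.g. "3."
ENDS_IN_NUMBER = re.compile(r"\d+\.?$")


def should_merge(prev, nxt, is_abbreviation):
    """Decide whether `nxt` should be appended to the sentence `prev`."""
    tokens = prev.split()
    if not tokens or not nxt:
        return False
    last = tokens[-1]
    # Rule (a): the sentence ends in a valid abbreviation.
    if is_abbreviation(last):
        return True
    # Rule (b): it ends in a number and the next sentence is not capitalised.
    return bool(ENDS_IN_NUMBER.search(last)) and not nxt[0].isupper()


def merge_sentences(sentences, is_abbreviation):
    """Merge wrongly split sentences, scanning left to right."""
    merged = []
    for sent in sentences:
        if merged and should_merge(merged[-1], sent, is_abbreviation):
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
    return merged
```

Because each decision looks at the last token of the sentence built so far, a chain such as "sec." / "3." / "of the Act." collapses into a single sentence.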

Example of sentence tokenization

Original Legal Text

It was observed by Lord Atkin in Eshugbayi Eleko vs Officer Administering the Government of
Nigeria C), that in accordance with British Jurispru dence no member of the executive can
interfere with the liberty or property of a British subject except when he can support the legality
of his act before a Court of justice. (1) [1924] 2 Irish Reports K. B. 104. (2) [1931] A. C. (62 at
670. 114 In The King vs The Secretary of State for Home Affairs(1), Scrutton LJ. observed:
”A man undoubtedly guilty of murder must yet be released if due forms of law have not
been followed in his conviction. ” It seems very arguable that in the whole set-up of Part III
of our Constitution these principles only remain guaranteed by article 21.

NLTK Tokenization

’It was observed by Lord Atkin in Eshugbayi Eleko vs Officer Administering the Government of
Nigeria C), that in accordance with British Jurispru dence no member of the executive can interfere
with the liberty or property of a British subject except when he can support the legality of his act
before a Court of justice.’,
’(1) [1924] 2 Irish Reports K. B.’,
’104.’,
’(2) [1931] A. C. (62 at 670.’,
’114 In The King vs The Secretary of State for Home Affairs(1), Scrutton LJ.’,
’observed: ”A man undoubtedly guilty of murder must
yet be released if due forms of law have not been
followed in his conviction. ”’,
’It seems very arguable that in the whole set-up of Part III of our Constitution these principles only
remain guaranteed by article 21.’

LAWSUMM Tokenization

It was observed by Lord Atkin in Eshugbayi Eleko vs Officer Administering the Government of
Nigeria C), that in accordance with British Jurisprudence no member of the executive can interfere
with the liberty or property of a British subject except when he can support
the legality of his act before a Court of justice.,
(1) [1924] 2 Irish Reports K. B. 104. (2) [1931] A. C. (62 at 670. 114
In The King vs The Secretary of State for Home Affairs(1), Scrutton
LJ. observed: ”A man undoubtedly guilty of murder must yet be released
if due forms of law have not been followed in his conviction. ”,
It seems very arguable that in the whole set-up of Part III of our
Constitution these principles only remain guaranteed by article 21.

Next we describe the annotation schema for the dataset. Each judgement and its associated headnote are converted into a json format with the fields described below. Apart from identifying sentences, we annotate several other fields, which can be potentially useful for summarization as well as several other tasks.

3.2 Annotation Schema

Case Name and Judgement Date

Case name is of the format ABC and others vs. XYZ and others, where ABC are the plaintiffs and XYZ are the defendants. Date includes day, month and year of judgement.


Citations

Searching cases by standardized citations, or using them to refer to other judgements, is a common practice. Every judgement delivered by the Supreme Court of India is indexed by several journals and databases, which are consulted by legal professionals to find relevant past judgements. Each such source comes with its own citation format. In this dataset, we identify the citations from the three most common sources: INSC (Indian Supreme Court), AIR (All India Reporter) and SCR (Supreme Court Reports). In some cases citation information is available for other, less popular sources besides the ones mentioned above; these are clubbed under the ”Other citations” field. This field can be used for citation analysis and for finding similar judgments. It can also be used in citation-based approaches, which leverage other documents to summarize a target document. For a target document, they use the catchphrases of the documents cited by the target document (citphrases) and the citation sentences of documents that cite the target document (citances) (Galgani F., 2012).


Acts

This field lists the statutory acts referred to in the case. It includes a unique act id and the corresponding act text. These form an important part of the judgement and are usually included in the headnote. The field is also particularly useful for finding precedents which relied on the same statutes while making an argument.


Judges

This field lists the judges who delivered the verdict, with additional fields which indicate whether the judge was the chief justice and whether the judge delivered the judgement. Often the names of the judges are spelt differently across cases (e.g. P.K. Balasubramanyan and P. Balasubramanian refer to the same judge). To handle this we use a list of the justices who served in the Supreme Court in the past, along with their years of service, available from the court website. We then match the name mentioned in the judgement to those in the list using Levenshtein distance and the date of the judgement. Such normalization makes it possible to filter cases by judge, which can have applications like predicting the outcome of a case given a particular bench of judges.
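The name matching described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the roster format (name, first year, last year), the normalisation that keeps only lowercase letters, and the choice to simply pick the closest name among judges serving in the judgement year are all hypothetical, and the roster entries in the usage example carry made-up years.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalise(name):
    """Lowercase the name and drop dots, spaces and other non-letters."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def match_judge(raw_name, judgement_year, roster):
    """Return the closest roster name among judges serving in that year.

    roster: list of (canonical_name, first_year, last_year) tuples.
    Returns None when no judge on the roster served in that year.
    """
    serving = [r for r in roster if r[1] <= judgement_year <= r[2]]
    if not serving:
        return None
    key = normalise(raw_name)
    return min(serving, key=lambda r: levenshtein(key, normalise(r[0])))[0]
```

For example, with a placeholder roster `[("P.K. Balasubramanyan", 2000, 2010), ("A.B. Placeholder", 2000, 2010)]`, the call `match_judge("P. Balasubramanian", 2005, roster)` resolves to "P.K. Balasubramanyan". A production version would probably also reject matches whose distance is too large.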

Involved parties

We identify the following parties involved in the judgements:

  • Plaintiffs, Defendants and Intervenors.

  • People appearing for the plaintiffs, defendants and intervenors. This is stored as a list of tuples, each containing the name of the person appearing and the party for whom that person is appearing. The same person can appear for multiple plaintiffs, and multiple people can appear for the same plaintiff.

Judgement Text

This constitutes the actual judgement delivered by the court. It closely resembles the raw judgement text that is available publicly. The only changes from the original judgement text are that we replace the most common abbreviations with their expanded form and remove erroneous punctuation. This text also contains a lot of the meta-information about the case mentioned above, but as free text.

Judgement Sentences

After preprocessing, we tokenize the judgement text into sentences using the steps mentioned in section 3.1. We further annotate each judgement sentence with following information:

  • Pseudo-relevance of the sentence. This is a weak label that indicates whether a sentence is summary worthy or not. A detailed discussion of this is presented in the next section.

  • Rhetorical role of the sentence as defined in Bhattacharya et al. (2019c). For this we train a Bi-LSTM CRF model using the code and training data provided by the authors.

Headnote Text and Sentences

The pre-processing and annotation process for headnote text and sentences is similar to that for the judgements, with one exception. The headnote sentences do not have any pseudo-relevance score associated with them.

3.3 Results and Analysis for Entities Identification

In this section we report the results for entity identification mentioned in Annotation Schema. We estimate the accuracy of all fields based on manual inspection of a subset of 50 randomly selected Supreme Court Judgements.

Case name, judgement date, citations and acts are extracted from the Info file, which is provided along with the judgement and headnote. We were able to extract these entities correctly in all 50 judgements.

For judges, the goal is to find all the judges who presided over the case. We were able to identify the judges in 98% of cases. The errors occur due to wrong usage of punctuation, wrong sentence tokenization and, in some cases, the judge’s name not being present at its regular position in the judgement. For plaintiffs and defendants, we used the case name. For example, in DHIYAN SINGH AND ANR V. JUGAL KISHORE AND ANR, Dhiyan Singh is the plaintiff and Jugal Kishore is the defendant. In some cases, intervenors are also part of the judgement. An intervenor is a person who is not a party to an existing lawsuit but who becomes a party, either by joining with the plaintiff or by uniting with the defendant in resistance to the plaintiff’s claims. Currently, we are able to identify the primary plaintiff, defendants and intervenors. For other secondary parties, the information is provided in a very unstructured manner, which makes it difficult to identify them. Attorneys for the parties, i.e. plaintiffs, defendants and intervenors, are identified with accuracies of 70%, 73% and 100% respectively. Confusion matrices for all three are given in the following figures.

For each judgement sentence we have two attributes, namely rhetorical role and pseudo-relevance. For rhetorical roles, we are 70% accurate in classifying the sentences into seven broad classes.

To find the pseudo-relevance of the sentences of a judgement, we follow the steps below:

  • For each case, we compute TF-IDF vectors for all judgement and headnote sentences.

  • These vectors are used for calculating the cosine similarity between each judgement sentence and all the headnote sentences. This gives a matrix of dimension (number of judgement sentences × number of headnote sentences).

  • We take the maximum similarity value for each judgement sentence, and if that value is greater than a set threshold, the sentence is classified as relevant.
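The three steps above can be sketched in plain Python. This is a minimal sketch under assumptions the text does not settle: whitespace tokenization, raw term counts, a smoothed IDF computed over the sentences of a single case, and hypothetical function names. Only the overall procedure (TF-IDF, cosine similarity matrix, row-wise maximum, threshold) comes from the text.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse TF-IDF vectors (dicts) with IDF fitted on these sentences."""
    tokenised = [s.lower().split() for s in sentences]
    n = len(tokenised)
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF; the exact weighting used by the authors is not stated.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenised]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pseudo_relevance(judgement_sents, headnote_sents, threshold=0.3):
    """Return (labels, max_similarities) for the judgement sentences."""
    vectors = tfidf_vectors(judgement_sents + headnote_sents)
    j_vecs = vectors[:len(judgement_sents)]
    h_vecs = vectors[len(judgement_sents):]
    # Similarity matrix: judgement sentences x headnote sentences.
    sim = [[cosine(j, h) for h in h_vecs] for j in j_vecs]
    max_sim = [max(row, default=0.0) for row in sim]
    return [s > threshold for s in max_sim], max_sim
```

The default of 0.3 mirrors the threshold reported in this section; the function returns the maximum similarities as well, so that other thresholds can be tried without recomputing the matrix.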

To find the threshold value, we manually annotated the relevant sentences for 50 Supreme Court cases. We tried different threshold values from 0.2 to 0.35. The maximum correlation between the manually annotated sentences and the pseudo-relevant sentences, 0.861, was obtained for a threshold value of 0.3. Figure 2 shows the confusion matrices for different thresholds. The summarization results we report use the 0.3 threshold.
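The threshold search can be sketched as a small sweep. The text does not say which correlation measure was used, so this sketch assumes the Pearson correlation between the binary pseudo labels and the binary manual labels; the function names and the numbers in the usage example are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else 0.0


def best_threshold(max_sims, manual_labels, thresholds):
    """Return (correlation, threshold) for the best-agreeing threshold."""
    scored = []
    for t in thresholds:
        pseudo = [1 if s > t else 0 for s in max_sims]
        scored.append((pearson(pseudo, manual_labels), t))
    return max(scored)


# Synthetic example: maximum similarities and manual labels for six sentences.
sims = [0.10, 0.25, 0.32, 0.50, 0.28, 0.40]
manual = [0, 0, 1, 1, 0, 1]
corr, threshold = best_threshold(sims, manual, [0.2, 0.25, 0.3, 0.35])
```

On this toy input the sweep picks 0.3 because it reproduces the manual labels exactly; on real annotations the agreement would be partial, as with the 0.861 reported above.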

Figure 2: Confusion Matrix for various thresholds

4 Benchmark results for Legal Text Summarization

In this section we report some late-breaking results on the legal summarization task. Use of the recent advances in neural summarization techniques in the legal domain has been limited, mainly due to the lack of a publicly available annotated dataset. Given the complex nature and size of legal documents, which can easily run into dozens of pages, getting them manually annotated is prohibitively expensive. Instead we explore weak labelling techniques, similar to what is proposed in Collins et al. (2017), as an alternative to manual annotation. The headnotes are semi-abstractive, and headnote sentences have considerable overlap with one or more judgement sentences.

We exploit this fact, to create a weakly labelled training data. Each sentence in the judgement is considered to be summary worthy if its cosine similarity with one or more headnote sentences is above the threshold of 0.3, set empirically as explained above.


We use 8648 Supreme Court judgements as the training set. The remaining judgements are split equally between the validation and test sets. We generate summaries and perform all evaluations on the test set. Unlike Bhattacharya et al. (2019b), we distribute the judgements randomly into the three sets. A sentence classifier is then trained to identify summary-worthy sentences.

Summary Length

Some algorithms require the desired length of the summary to be given as an input. We tried different approaches for defining the summary length mentioned below:

  • Using the mean ratio of number of words in a headnote to that in a judgement (~23.4%)

  • Using the mean ratio of number of sentences in a headnote to that in a judgement (~23.7%)

  • Mean headnote length in number of words (864 words)

  • Mean headnote length in terms of number of sentences (23 Sentences)

  • Median headnote length in terms of number of words(610 words)

  • Median headnote length in terms of number of sentences (20 Sentences)

5 Model

For this benchmark, we use a simple 2-layer bidirectional LSTM neural network, which has been shown to be good at sentence classification (Liu and Guo, 2019). We used hidden states of 128 dimensions with a dropout of 0.3. The forward and backward hidden states were concatenated and used as input to a linear layer with 2 output nodes for the two classes, namely summary-worthy and not summary-worthy sentences. Word embeddings of size 150 were initialized using the Xavier normal distribution. Pretrained embeddings, such as GloVe, fastText, word2vec, etc., performed poorly as most of the legal vocabulary was missing from them. The maximum sentence length is set to 150 words, which is the mean length of the sentences in the Indian Supreme Court judgements. The total vocabulary size is 10000 words, selected based on the frequency of words in the Indian Supreme Court judgements. The training and validation batch size is set to 32. We used CrossEntropyLoss as the loss function with a 4x bias towards summary-worthy sentences. This is done to reduce the effect of data imbalance. Apart from this, we also undersampled the dataset during the training phase.
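The architecture described above might look roughly like the following PyTorch sketch. The hyperparameters (2 layers, hidden size 128, dropout 0.3, embedding size 150, vocabulary of 10000, class weight of 4 for summary-worthy sentences) come from the text; the class name, the exact placement of dropout, the assignment of index 1 to the summary-worthy class and the absence of padding handling are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SentenceClassifier(nn.Module):
    """2-layer BiLSTM sentence classifier (illustrative sketch)."""

    def __init__(self, vocab_size=10000, embed_dim=150, hidden_dim=128, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        nn.init.xavier_normal_(self.embedding.weight)  # Xavier normal initialisation
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim, num_layers=2,
            bidirectional=True, batch_first=True, dropout=dropout,
        )
        self.dropout = nn.Dropout(dropout)  # placement is an assumption
        self.classifier = nn.Linear(2 * hidden_dim, 2)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # h_n[-2] and h_n[-1] are the final forward and backward states of the top layer.
        sentence_repr = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.classifier(self.dropout(sentence_repr))


# Class 1 is taken to be "summary worthy" and is weighted 4x (index choice is an assumption).
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 4.0]))
```

At inference time, a sentence would be selected when the softmax probability of the summary-worthy class exceeds the chosen threshold, for example `torch.softmax(logits, dim=1)[:, 1] > 0.725`.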

Thresholds were applied to the classifier’s output score for the summary-worthy class. We tried thresholds from 0.5 to 0.85 with a step size of 0.05. The best ROUGE F1 score of 0.619 was obtained at a threshold of 0.725.
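
Thresholding the classifier output, as described above, amounts to the following. This is a minimal sketch with hypothetical function names; the scoring function passed to `best_threshold` stands in for a ROUGE evaluation on held-out data and is not part of the paper.

```python
def select_sentences(sentences, probs, threshold=0.725):
    """Keep sentences whose summary-worthy score reaches the threshold, in document order."""
    return [s for s, p in zip(sentences, probs) if p >= threshold]


def threshold_grid(start=0.5, stop=0.85, step=0.05):
    """Thresholds from start to stop inclusive, rounded to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n + 1)]


def best_threshold(score_fn, thresholds):
    """Return the threshold that maximises score_fn (e.g. mean ROUGE F1)."""
    return max(thresholds, key=score_fn)


sentences = ["The appeal was filed in 1962.", "Held: the appeal must be dismissed."]
probs = [0.41, 0.93]  # made-up classifier scores for illustration
print(select_sentences(sentences, probs))
print(threshold_grid())
```

Because the summary is whatever passes the threshold, its length varies from judgement to judgement instead of being fixed in advance.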

5.1 Results

We compare our model to popular extractive summarization techniques: LexRank (Erkan and Radev, 2004), LSA (Steinberger and Jezek, 2004), Greedy-KL (Haghighi and Vanderwende, 2009) and SumBasic (Vanderwende et al., 2007). We also used a transformer-based approach to extractive summarization built on BERT (Liu and Lapata, 2019), whose implementation is publicly available, for comparison. Unlike Bhattacharya et al. (2019b), we observe that the neural network based approach outperforms these strong baseline techniques by a substantial margin: a ROUGE-1 F score of 0.619 against 0.571 for the best baseline, SumBasic. The results of this preliminary experiment are shown in Table 2. We report the standard ROUGE-F metrics for comparison (Lin, 2004). The baseline summaries are 864 words long, the same as the mean headnote length. For the neural summaries we do not choose the length explicitly and instead rely on the summarizer to classify sentences as important or not important. As is evident, even a simple neural architecture comfortably outperforms strong baselines. This supports our hypothesis that weak labelling can be exploited to generate better summaries of legal documents. In future we would like to explore more suitable architectures and take advantage of other meta-information such as rhetorical roles, entities and citations to improve the summaries.
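
For readers unfamiliar with the metric, ROUGE-N precision, recall and F can be sketched as clipped n-gram overlap between a generated summary and the headnote. This is a simplified illustration that assumes lowercased whitespace tokenization and no stemming or stop-word handling, so its numbers will not match the official ROUGE package exactly; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


generated = "the appeal is dismissed with costs"
headnote = "the appeal is dismissed"
print(rouge_n(generated, headnote, n=1))
print(rouge_n(generated, headnote, n=2))
```

A generated summary that is longer than the headnote can reach high recall while losing precision, which is the trade-off examined when comparing summary lengths.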

                 Judgement   Headnote
Mean Sents       134         20
Median Sents     94          26
Min Sents        3           0
Max Sents        3861        873
Mean Words       4586        864
Median Words     3194        610
Min Words        103         0
Max Words        139943      28408
Table 1: Statistics of LAWSUMM

Method           R1-F     R2-F     R4-F
LexRank          0.542    0.286    0.134
LSA              0.540    0.285    0.133
GreedyKL         0.538    0.276    0.134
Sumbasic         0.571    0.295    0.134
NN-Threshold     0.619    0.408    0.261
Table 2: Benchmark Results

5.2 Analysis for Different Summary Lengths

Approaches 2 and 4 have high recall but comparatively low precision (Table 3), so the generated summaries contain many n-grams that are not part of the headnotes. Therefore, we decreased the number of words and sentences in the generated summaries by using the median numbers of words and sentences in Approaches 3 and 5, which have comparable values for recall, precision and F1-score. Comparing approaches that use sentences as the unit of length with approaches that use words, we see that precision is higher for the latter. This is because sentences in Indian legal judgements are longer than in ordinary text, with a mean length of 150 words.

Approach       R1-F1 Score   R1-Precision   R1-Recall
Approach 1     0.577         0.580          0.619
Approach 2     0.582         0.511          0.735
Approach 3     0.571         0.578          0.612
Approach 4     0.585         0.521          0.719
Approach 5     0.582         0.558          0.660
Approach 6     0.579         0.375          0.629
Table 3: Results for Different Approaches

Analysis for Different Legal Domains

Table 4 shows how well the model generalizes across 5 different legal domains. The model performs well across all the domains except ‘Land Property’ (R1-F 0.560), and performs best on ‘Intellectual Property’ (R1-F 0.697).

Legal Domains             R1-F     R2-F     R4-F
Land Property             0.560    0.326    0.177
Constitutional            0.574    0.311    0.150
Labour Industrial Law     0.676    0.417    0.268
Intellectual Property     0.697    0.509    0.368
Criminal                  0.606    0.305    0.178
Table 4: Statistics of LAWSUMM on different Legal Domains
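
A per-domain breakdown of this kind can be obtained by averaging per-document scores within each group. The sketch below assumes that each test judgement carries a domain label and a ROUGE score; the helper name and the sample records are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def mean_by_group(records):
    """records: iterable of (group, score) pairs; returns {group: mean score}."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [running total, count]
    for group, score in records:
        sums[group][0] += score
        sums[group][1] += 1
    return {group: total / count for group, (total, count) in sums.items()}


# Made-up per-document ROUGE-1 F scores, for illustration only.
records = [("Criminal", 0.60), ("Criminal", 0.70), ("Constitutional", 0.50)]
print(mean_by_group(records))
```

The same helper applies unchanged when the grouping key is the year of the judgement instead of its legal domain.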

Analysis for Summary in different time frames

As the years pass, the style of writing judgements and the language used also change. We therefore test our model on judgements from 1950 to 1993. Fig. 3 shows a line plot of ROUGE-F1 scores against year. The graph shows that our model performs consistently across the years, except for 1950-1955, where the scores dip.

Figure 3: Rouge Scores vs. Year

6 Conclusion

In this work we propose an annotated dataset of 10,764 judgements delivered by the Supreme Court of India, along with the associated hand-written summaries, called headnotes. We pre-process the documents to normalize abbreviations and named entities and to tokenize them into sentences. We also annotate meta-information such as the names of the people and judges associated with the case, the date of the judgement, citations of the case and statutory acts referred to in the judgement. Further, we propose a weakly supervised approach for automatically summarizing these judgements. Preliminary results show the effectiveness of the proposed weakly supervised approach, which outperforms strong baseline techniques.

References



  • P. Bhattacharya, K. Ghosh, S. Ghosh, A. Pal, P. Mehta, A. Bhattacharya, and P. Majumder (2019a) Overview of the FIRE 2019 AILA track: artificial intelligence for legal assistance. In Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, CEUR Workshop Proceedings, Vol. 2517, pp. 1–12.
  • P. Bhattacharya, K. Hiware, S. Rajgaria, N. Pochhi, K. Ghosh, and S. Ghosh (2019b) A comparative study of summarization algorithms applied to legal case judgments. In Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18, 2019, Proceedings, Part I, Lecture Notes in Computer Science, Vol. 11437, pp. 413–428.
  • P. Bhattacharya, P. Mehta, K. Ghosh, S. Ghosh, A. Pal, A. Bhattacharya, and P. Majumder (2020) FIRE 2020 AILA track: artificial intelligence for legal assistance. In FIRE 2020: Forum for Information Retrieval Evaluation, Hyderabad, India, December 16-20, 2020, P. Majumder, M. Mitra, S. Gangopadhyay, and P. Mehta (Eds.), pp. 1–3.
  • P. Bhattacharya, S. Paul, K. Ghosh, S. Ghosh, and A. Wyner (2019c) Identification of rhetorical roles of sentences in Indian legal judgments. In Legal Knowledge and Information Systems - JURIX 2019: The Thirty-second Annual Conference, Madrid, Spain, December 11-13, 2019, Frontiers in Artificial Intelligence and Applications, Vol. 322, pp. 3–12.
  • P. Bhattacharya, S. Poddar, K. Rudra, K. Ghosh, and S. Ghosh (2021) Incorporating domain knowledge for extractive summarization of legal case documents. In ICAIL ’21: Eighteenth International Conference for Artificial Intelligence and Law, São Paulo Brazil, June 21 - 25, 2021, J. Maranhão and A. Z. Wyner (Eds.), pp. 22–31.
  • Z. Cao, F. Wei, S. Li, W. Li, M. Zhou, and H. Wang (2015) Learning summary prior representation for extractive summarization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Beijing, China, pp. 829–833.
  • J. Cheng and M. Lapata (2016) Neural summarization by extracting sentences and words. ArXiv abs/1603.07252.
  • E. Collins, I. Augenstein, and S. Riedel (2017) A supervised approach to extractive summarisation of scientific papers. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pp. 195–205.
  • J. M. Conroy and D. P. O’Leary (2001) Text summarization via hidden Markov models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, New York, NY, USA, pp. 406–407. ISBN 1581133316.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • G. Erkan and D. R. Radev (2004) LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, pp. 457–479.
  • A. Farzindar and G. Lapalme (2004) Letsum, an automatic legal text summarizing system. Legal knowledge and information systems, JURIX, pp. 11–18.
  • F. Galgani, P. Compton, and A. Hoffmann (2012) Citation based summarisation of legal texts. In PRICAI 2012: Trends in Artificial Intelligence, pp. 40–52.
  • F. Galgani, P. Compton, and A. Hoffmann (2014) HAUSS: incrementally building a summarizer combining multiple techniques. Int. J. Hum. Comput. Stud. 72, pp. 584–605.
  • B. Hachey and C. Grover (2005) Sequence modelling for sentence classification in a legal summarisation system. In Proceedings of the 2005 ACM Symposium on Applied Computing, SAC ’05, New York, NY, USA, pp. 292–296. ISBN 1581139640.
  • B. Hachey and C. Grover (2007) Extractive summarisation of legal texts. Artificial Intelligence and Law 14, pp. 305–345.
  • A. Haghighi and L. Vanderwende (2009) Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Boulder, Colorado, pp. 362–370.
  • Z. He, C. Chen, J. Bu, C. Wang, L. Zhang, D. Cai, and X. He (2012) Document summarization based on data reconstruction. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI’12, pp. 620–626.
  • A. Kanapala, S. Jannu, and R. Pamula (2019) Summarization of legal judgments using gravitational search algorithm. Neural Computing and Applications, pp. 1–9.
  • C. Lin (2004) ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out, Barcelona, Spain, pp. 74–81.
  • G. Liu and J. Guo (2019) Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, pp. 325–338.
  • Y. Liu and M. Lapata (2019) Text summarization with pretrained encoders. arXiv:1908.08345.
  • Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized BERT pretraining approach. ArXiv abs/1907.11692.
  • P. Mehta and P. Majumder (2019) Domain-specific summarisation. In From Extractive to Abstractive Summarization: A Journey, pp. 35–48. ISBN 978-981-13-8934-4.
  • M. Moens, E. Boiy, R. M. Palau, and C. Reed (2007) Automatic detection of arguments in legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and Law, ICAIL ’07, New York, NY, USA, pp. 225–230. ISBN 9781595936806.
  • M. Moens (2007) Summarizing court decisions. Information Processing and Management 43 (6), pp. 1748–1764. Note: Text Summarization. ISSN 0306-4573.
  • R. More, J. Patil, A. Palaskar, and A. Pawde (2019) Removing named entities to find precedent legal cases. In Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, CEUR Workshop Proceedings, Vol. 2517, pp. 13–18.
  • R. Nallapati, F. Zhai, and B. Zhou (2017) SummaRuNNer: a recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, pp. 3075–3081.
  • I. Nejadgholi, R. Bougueng, and S. Witherspoon (2017) A semi-supervised training method for semantic search of legal facts in Canadian immigration cases. In JURIX, pp. 125–134.
  • S. Polsley, P. Jhunjhunwala, and R. Huang (2016) CaseSummarizer: a system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan, pp. 258–262.
  • G. Sanchez (2019) Sentence boundary detection in legal text. In Proceedings of the Natural Legal Language Processing Workshop 2019, pp. 31–38.
  • M. Saravanan, B. Ravindran, and S. Raman (2006) Improving legal document summarization using graphical models. Frontiers in Artificial Intelligence and Applications 152, pp. 51.
  • M. Saravanan, B. Ravindran, and S. Raman (2008) Automatic identification of rhetorical roles using conditional random fields for legal document summarization. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I, pp. 481–490.
  • J. Steinberger and K. Jezek (2004) Using latent semantic analysis in text summarization and summary evaluation. In Proceedings of the 5th International Conference on Information Systems Implementation and Modelling, pp. 93–100.
  • C. Uyttendaele, M. Moens, and J. Dumortier (2004) Salomon: automatic abstracting of legal cases for effective access to court decisions. Artificial Intelligence and Law 6, pp. 59–79.
  • L. Vanderwende, H. Suzuki, C. Brockett, and A. Nenkova (2007) Beyond sumbasic: task-focused summarization with sentence simplification and lexical expansion. Inf. Process. Manage. 43 (6), pp. 1606–1618. ISSN 0306-4573.
  • G. Venturi (2012) Design and development of temis: a syntactically and semantically annotated corpus of Italian legislative texts. In Proceedings of the Workshop on Semantic Processing of Legal Texts (SPLeT 2012), pp. 1–12.
  • B. Waltl, F. Matthes, T. Waltl, and T. Grass (2016) LEXIA: a data science environment for semantic analysis of German legal texts. Jusletter IT 4 (1), pp. 4–1.
  • P. West, A. Holtzman, J. Buys, and Y. Choi (2019) BottleSum: unsupervised and self-supervised sentence summarization using the information bottleneck principle. arXiv:1909.07405.
  • K. Wong, M. Wu, and W. Li (2008) Extractive summarization using supervised and semi-supervised learning. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING ’08, USA, pp. 985–992. ISBN 9781905593446.
  • M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontañón, P. Pham, A. Ravula, Q. Wang, L. Yang, and A. Ahmed (2020) Big bird: transformers for longer sequences. ArXiv abs/2007.14062.
  • X. Zhang, F. Wei, and M. Zhou (2019) HIBERT: document level pre-training of hierarchical bidirectional transformers for document summarization. ArXiv abs/1905.06566.
  • Z. Zhao, H. Ning, L. Liu, C. Huang, L. Kong, Y. Han, and Z. Han (2019) FIRE2019@aila: legal information retrieval using improved BM25. In Working Notes of FIRE 2019 - Forum for Information Retrieval Evaluation, Kolkata, India, December 12-15, 2019, CEUR Workshop Proceedings, Vol. 2517, pp. 40–45.
  • H. Zheng and M. Lapata (2019) Sentence centrality revisited for unsupervised summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 6236–6247.
  • J. Zhou and A. Rush (2019) Simple unsupervised summarization by contextual matching. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 5101–5106.