Towards Grad-CAM Based Explainability in a Legal Text Processing Pipeline

12/15/2020
by Lukasz Gorski, et al.

Explainable AI (XAI) is a domain focused on providing interpretability and explainability of a decision-making process. In the domain of law, in addition to system and data transparency, it also requires (legal-) decision-model transparency and the ability to understand the model's inner workings when arriving at a decision. This paper provides a first approach to using a popular image processing technique, Grad-CAM, to showcase the explainability concept for legal texts. With the help of adapted Grad-CAM metrics, we show the interplay between the choice of embedding, its handling of contextual information, and its effect on downstream processing.


1 Introduction

Advancements in the domain of AI and Law have brought additional considerations regarding model development, deployment, updating, and interpretability. This can be seen with the advent of machine-learning-based methods, which naturally exhibit a lower degree of explainability than traditional knowledge-based systems. Yet knowledge representation frameworks that handle legal information, irrespective of their origin, should cover the pragmatics or context around a given concept, and this functionality should be easily demonstrable.

Explainable AI (XAI) is a domain focused on providing interpretability and explainability to a decision-making process. In the domain of law, interpretability and explainability involve more than information/data transparency or system transparency [4] (henceforth referred to as the ontological view). They additionally require (legal-) decision-model transparency, i.e. the ability to understand the model's inner workings when arriving at a decision (the epistemic view). In this paper, we aim to present the system's user and architect with a set of tools that facilitate the discovery of the inputs that contribute most to a convolutional neural network's (CNN's) output, by adapting the Grad-CAM method, which originated in the field of computer vision. We adapt this method to the legal domain and show how it can be used to achieve a better understanding of a given system's state, to explain how different embeddings contribute to the end result, and to optimize the system's inner workings. While this work is concerned with the ontological perspective, we intend it as a stepping stone towards a related perspective, where legally-grounded positions are connected with explanations, thus providing the ability to explain decisions to their addressees. This paper addresses mainly the technical aspects, showing how Grad-CAMs can be applied to legal texts and describing the text processing pipeline, taking this as a departure point for deeper analyses in future work. We present this technical implementation, together with the quantitative comparison metrics, as the main contribution of the paper.

The paper is structured as follows. The state of the art is described in Section 2. Section 3 describes the methodology, including the metrics used for the quantification of results. The architecture used for the experiments is described in Section 4. Section 5 describes the datasets used and the experimental setup. The outcomes are described in Section 6. Finally, Section 7 provides conclusions and future work.

2 Related Work

The feasibility of using different embeddings, contextual (e.g. BERT) and non-contextual (e.g. word2vec), has already been studied outside the domain of law. In [3], it was found that the usage of more sophisticated, context-aware methods is unnecessary in domains where labelled data and simple language are present. As far as the area of law is concerned, the feasibility of using domain-specific vs. general embeddings (based on word2vec) for the representation of Japanese legal texts was investigated, with the conclusion that general embeddings have the upper hand [24]. The feasibility of using BERT in the domain of law has also been put under scrutiny. In [8], its generic pretrained version was used for embedding generation, and it was found that large computational requirements may be a limiting factor for domain-specific embedding creation. The same paper concluded that the performance of the generic version is lower than that of law-based non-contextual embeddings. On the other hand, in [19], BERT versions trained on a legal judgments corpus (of 18,000 documents) were used, and it was found that training on an in-domain corpus does not necessarily offer better performance than generic embeddings. In [9], contradictory conclusions were reached: the system's performance significantly improved when using BERT pre-trained on a legal corpus. These results suggest that the introduction of XAI-based methods might be a condition sine qua non for a proper understanding of general-language embeddings and their feasibility in the domain.

Grad-CAM is an explainability method originating from computer vision [23]. It is a well-established post-hoc explainability technique where CNNs are concerned; moreover, the Grad-CAM method has passed independent sanity checks [1]. Whilst it is mainly associated with explanations of deep learning networks used on image data, it has already been adapted to other areas of application. In particular, a CNN architecture for text classification was described in [12], and there exists at least one implementation which extends this work with Grad-CAM support for explainability [10]. Grad-CAMs have already been used in the NLP domain, for (non-legal) document retrieval [7]. Herein we build upon this work and investigate the feasibility of using this method in the legal domain, in particular allowing for the visualisation of the context-dependency of various word embeddings. Legal language is a special register of everyday language and deserves investigation in its own right. The evolution of legal vocabulary can be precisely traced to particular statutes and precedential judgments, where it is refined and its boundaries are tested [18]. Many terms thus have a particular legal meaning and efficacy, and tools that can safeguard a final black-box model's adherence to the particularities of legal language are valuable.

Endeavours to use XAI methods in the legal domain, similar to this paper, have been undertaken recently. In [5], an Attention Network was used for legal decision prediction, coupled with attention-weight-based highlighting of salient case text (though this approach was found to be lacking). The possibility of explaining BERT's inner workings has been investigated by other authors, through both static and dynamic analyses. An interactive tool for the visualisation of its learning process was implemented in [11]. A machine-learning-based evaluation of context importance was performed in [22]; therein it was found that accounting for the content of a sentence's context greatly improves the performance of a legal information retrieval system.

However, the results mentioned hereinbefore do not allow for a direct and easily interpretable comparison of different types of embeddings, and we aim to provide an easy plug-in solution facilitating this aim.

3 Methodology

We study the interplay between the choice of embedding, its handling of contextual information, and its effect on downstream processing. For this work, a comparison pipeline was prepared, with its main modules being the embedder, a classification CNN, and a metric-based evaluator. All parts are easily pluggable, allowing for extensibility and further testing of different combinations of modules.

The CNN used in the pipeline was trained for classification. We use two different datasets for CNN training (as well as testing); Section 5.1 provides a detailed discussion of the considered datasets:

  1. The Post-Traumatic Stress Disorder (PTSD) [16] dataset [26], where the rhetorical roles of sentences are classified.

  2. Statutory Interpretation - Identifying Particular (SIIP) dataset [22], where the sentences are classified into four categories according to their usefulness for a legal provision’s interpretation.

Whilst many methods have already been used for the analysis of the aforementioned datasets (including regular expressions, Naive Bayes, Logistic Regression, and SVMs [26], or Bi-LSTMs [2]), we are unaware of papers that use (explainable) CNNs for these tasks. On the other hand, the usage of said CNN should not be treated as the main contribution of this paper, as the classification network serves only as an exemplary application, warranting conclusions regarding the paper's main contribution, i.e. the context-awareness of various embeddings when used in the legal domain.

Further down the line, the embeddings are used to transform CNN input sentences into vectors, with the vector representations of each word in a sentence concatenated. Herein our implementation is based on prior work [10, 12].
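As an illustration, a minimal sketch of this step, assuming a gensim-style keyed-vector lookup; the fixed padding length and the zero-vector fallback for out-of-vocabulary tokens are our assumptions, not settings reported above:

```python
import numpy as np

def embed_sentence(tokens, keyed_vectors, dim, max_len=64):
    """Stack per-word vectors into a fixed-size (max_len, dim) matrix.

    `keyed_vectors` is assumed to behave like a gensim KeyedVectors
    mapping; out-of-vocabulary tokens fall back to zero vectors and
    short sentences are zero-padded so the CNN input shape is fixed.
    """
    matrix = np.zeros((max_len, dim), dtype=np.float32)
    for i, token in enumerate(tokens[:max_len]):
        if token in keyed_vectors:
            matrix[i] = keyed_vectors[token]
    return matrix
```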

3.1 Comparison metrics

Grad-CAM heatmaps are inherently visual tools for data analysis. In computer vision, they are commonly used for the qualitative determination of the input image regions that contribute to the final prediction of the CNN. While they are an attractive tool for the qualitative analysis of a single example, they should be supplemented with other tools to allow an easy comparison of multiple embeddings [13] and to facilitate quantitative analysis. Herein the following metrics are introduced and adapted to the legal domain:

  1. Fraction of elements above a relative threshold t (F_t)

  2. Intersection over union with relative thresholds t1 and t2 (IoU_{t1,t2})

The first metric, F_t, is designed to measure the spread of the CNN's attention over the words present in a given input, i.e., what portion of the input is taken into account by the CNN for a particular prediction. For a heatmap vector v, it is defined as the number of elements of v that are larger than the relative threshold t multiplied by the maximum value of v, divided by the length of v:

F_t(v) = |{i : v_i > t · max(v)}| / |v|

The second metric, IoU_{t1,t2}, helps to compare the predictions of two different models given the same input sentence. It answers the question of whether two models, when given the same input sentence, 'pay attention' to the same or to different chunk(s) of it. It takes as arguments two Grad-CAM heatmaps (H1 and H2), binarizes them using the relative thresholds (t1 and t2, respectively), and finally calculates the standard intersection over union. It quantifies the relative overlap of the words considered important for the prediction by each of the two models.
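A minimal numpy sketch of both metrics, following the definitions above (the function names are ours):

```python
import numpy as np

def fraction_above_threshold(heatmap, t):
    """F_t: the share of heatmap elements exceeding t * max(heatmap)."""
    v = np.asarray(heatmap, dtype=np.float32)
    return float(np.mean(v > t * v.max()))

def intersection_over_union(heatmap_1, heatmap_2, t1, t2):
    """IoU_{t1,t2}: binarize each heatmap with its own relative
    threshold, then compute the standard intersection over union."""
    a = np.asarray(heatmap_1) > t1 * np.max(heatmap_1)
    b = np.asarray(heatmap_2) > t2 * np.max(heatmap_2)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0
```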

4 System Architecture

The architecture, as shown in Fig. 1, is designed to implement the methodology described in Section 3 and comprises four main modules: a preprocessing module, an embedding module, a classification module, and a visualization module. The preprocessing module uses industry de facto standard text processing libraries for spelling correction, sentence detection, irregular character removal, etc. The embedding module houses a plug-in system to handle different variants of embeddings, in particular BERT and word2vec. The classification module houses a simple 1D CNN, which facilitates an explainability method common in computer vision, i.e. Grad-CAM. The visualization module is used for heatmap generation and metric computation.

Figure 1: System Architecture

The output of the preprocessing module is fed into the embedding module. The embeddings used are based on variants of BERT and word2vec. In addition to the pre-trained ones, raw data from the CourtListener [17] dataset was used to train our own embeddings.

Within the classification module, the output of the embedding module is fed into a 1D convolutional layer followed by an average pooling layer and fully-connected layers with dropout and softmax [12]. Although CNN architectures stem from computer vision, where an image forms the input of the network, the use of a CNN on a sequence of word vectors is reasonable. In a sentence, the relative positions of words convey meaning, much as the relative positions of pixels convey information in an image; the difference lies in dimensionality. A standard image is 2D, while a sentence is a 1D sequence of words; therefore we use a 1D CNN for the task of sentence classification.
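A minimal Keras sketch of such a network; the filter count, kernel size, and dropout rate are illustrative assumptions rather than the hyperparameters used in our experiments:

```python
import tensorflow as tf

def build_text_cnn(max_len, emb_dim, n_classes,
                   n_filters=128, kernel_size=3, dropout_rate=0.5):
    """1D CNN over a (max_len, emb_dim) sentence matrix: convolution,
    average pooling, then fully-connected layers with dropout/softmax."""
    inputs = tf.keras.Input(shape=(max_len, emb_dim))
    x = tf.keras.layers.Conv1D(n_filters, kernel_size,
                               activation="relu", name="conv1d")(inputs)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```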

With the Grad-CAM technique it is possible to produce a class activation map (heatmap) for a given input sentence and predicted class. Each element of the class activation map corresponds to one token and indicates its importance in terms of the score of a particular (usually the predicted) class. The class activation map thus shows how strongly the particular tokens present in the input sentence influence the CNN's prediction.
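A sketch of the corresponding Grad-CAM computation for the 1D case, assuming the model and the conv1d layer name from the previous sketch; with valid convolution, each output position corresponds approximately to one token:

```python
import numpy as np
import tensorflow as tf

def grad_cam_text(model, sentence_matrix, class_idx, conv_name="conv1d"):
    """Weight each convolutional feature map by the pooled gradient of
    the class score, sum over filters, ReLU, and normalize to [0, 1]."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_name).output, model.output])
    x = tf.convert_to_tensor(sentence_matrix[np.newaxis], tf.float32)
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x)
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)       # (1, positions, filters)
    weights = tf.reduce_mean(grads, axis=1)      # one weight per filter
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam.numpy()
    return cam / cam.max() if cam.max() > 0 else cam
```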

The software stack used for the development of this system was instrumented under Anaconda 4.8.3 (with Python 3.8.3). Tensorflow v. 2.2.0 was used for CNN instrumentation and Grad-CAM calculations (with the code itself expanding the prior implementation available at [10]). Spacy 2.1.8 and blackstone 0.1.15 were used for CourtListener text cleaning. Various BERT implementations and supporting code were sourced from Huggingface libraries: transformers v. 3.1.0, tokenizers v. 0.8.1rc2, nlp v. 0.4.0. Two computing systems available at ICM University of Warsaw were used for the experiments: text cleaning was performed on the okeanos system (Cray XC40), and the main calculations were run on the rysy GPU cluster (4x Nvidia Tesla V100 32GB GPUs).

5 Experiments

5.1 Datasets

As stated in Section 3, we use two different datasets for the experiments. The PTSD dataset comes from the U.S. Board of Veterans' Appeals (BVA) and covers the years 2013 through 2017. It deals with decisions from adjudicated disability claims by veterans for service-related post-traumatic stress disorder (PTSD) [16]. The dataset itself is well known and has already been studied by other authors. It annotates a set of sentences originating from 50 Board decisions according to their function in the decision [26, 25, 21]. The classification consists of six classes: Finding Sentence, Evidence Sentence, Reasoning Sentence, Legal-Rule Sentence, Citation Sentence, and Other Sentence.

The SIIP dataset pertains to the United States Code 5 § 552a(a)(4) provision and annotates the judgments that are most useful for the interpretation of said provision. The seed information for annotation was collected from court decisions retrieved from the Caselaw Access Project data. The sentences are classified into four categories according to their usefulness for interpretation: High Value, Certain Value, Potential Value, and No Value [22].

5.2 Embeddings/Language Modeling

We use pre-trained models as well as domain-specific models trained by ourselves for the vector representation of texts. Many flavours of word2vec and BERT embedders were tested. The paper does not go into the details of a performance comparison of these pre-trained models (or other similar models); this has been addressed in several other papers [15, 20, 11].

For word2vec, a (slimmed-down) GoogleNews model with a reduced vocabulary was used [27]. In addition, Law2vec embeddings were employed, which were trained on a large, freely available legal corpus, with 200 dimensions [14]. For BERT, the bert-base-uncased model was used, a transformer model consisting of 12 layers, 768 hidden units, 12 attention heads, and 110M parameters. In addition, a slimmed-down version of BERT, DistilBERT, was also tried, due to its accuracy being on par with vanilla BERT while offering better performance and a smaller memory footprint.

In addition to pretrained models, we also tried training our own word2vec and BERT models. For this purpose, the CourtListener [17] database was sourced. However, due to the large computational requirements of BERT training, a small subset of this dataset was chosen, consisting of 180MiB of judgments. Moreover, while several legal projects provide access to a vast database of US case law, we found that the judgments available therein need further processing, as the available textual representations usually contain unnecessary elements, such as page numbers or underscores, that hinder machine processing. Our hand-written parser joined hyphenated words and removed page numbers and artifacts that were probably introduced by OCR; furthermore, the text was split into sentences using the spacy-based blackstone parser. In line with other authors [6], we found it to be imperfect, failing to segment sentences containing period-delimited legal abbreviations (e.g. Fed. for Federal). It was thus supplemented with our own manually curated list of abbreviations. The training was performed using the DistilBERT model (for ca. 36 hours), as well as word2vec in two flavours: 200-dimensional (in line with the dimensionality of Law2Vec) and 768-dimensional (in line with the BERT embedding dimensionality).
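As a sketch, the word2vec side of this training can be reproduced with gensim along the following lines; the window size and minimum count are illustrative assumptions (older gensim versions use size= instead of vector_size=), and the output path is hypothetical:

```python
from gensim.models import Word2Vec

# Toy corpus standing in for the cleaned, sentence-split CourtListener
# judgments produced by the pipeline described above.
sentences = [["the", "court", "held", "that", "the", "claim", "fails"],
             ["the", "board", "denied", "the", "appeal"]]

w2v_200 = Word2Vec(sentences=sentences, vector_size=200,
                   window=5, min_count=1, workers=8)
w2v_768 = Word2Vec(sentences=sentences, vector_size=768,
                   window=5, min_count=1, workers=8)
w2v_200.wv.save("courtlistener_200d.kv")  # hypothetical output path
```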

As far as the BERT-based embeddings go, there are a number of ways in which they can be extracted from the model. One is to take the embedding of the special CLS token, which prefixes any sentence fed into BERT; another technique studied in the literature amounts to concatenating the values of the model's final layer. The optimal technique depends on the task and the domain. Herein we found the latter to offer better accuracy for downstream CNN training. The features for CNN processing consisted of tokenized sentences, together with the embeddings of the special BERT tokens (their absence caused a slight drop in accuracy as well).
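A sketch of the per-token final-layer extraction using the Huggingface transformers API (assuming a version with the modern model-output objects); the special [CLS]/[SEP] tokens are kept, in line with the accuracy observation above:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def token_embeddings(sentence):
    """Return one 768-dimensional vector per token (including the
    special [CLS] and [SEP] tokens) from BERT's final hidden layer."""
    encoded = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output = model(**encoded, return_dict=True)
    return output.last_hidden_state[0]  # shape: (n_tokens, 768)
```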

6 Results

6.1 Metric-based heatmap comparison

                                      t1            t2            t3
                                      Mean  StdDev  Mean  StdDev  Mean  StdDev
word2vec (GoogleNews)                 0.53  0.31    0.44  0.30    0.35  0.29
Law2vec                               0.60  0.30    0.52  0.32    0.42  0.33
word2vec (CourtListener, 200d)        0.49  0.28    0.39  0.27    0.29  0.26
word2vec (CourtListener, 768d)        0.48  0.28    0.38  0.28    0.29  0.27
BERT (bert-base-uncased)              0.48  0.32    0.36  0.28    0.24  0.22
DistilBERT (distilbert-base-uncased)  0.67  0.27    0.56  0.27    0.38  0.24
DistilBERT (CourtListener)            0.47  0.39    0.47  0.39    0.44  0.39
Table 1: The F_t heatmap metric for the PTSD dataset, reported for three increasing relative thresholds t1 < t2 < t3.
                                                                       Mean  StdDev  Mean  StdDev  Mean  StdDev
word2vec (GoogleNews) – BERT (bert-base-uncased)                       0.49  0.25    0.41  0.24    0.30  0.21
Law2vec – BERT (bert-base-uncased)                                     0.51  0.26    0.43  0.25    0.34  0.25
word2vec (CourtListener, 200d) – Law2Vec                               0.65  0.25    0.58  0.27    0.51  0.31
word2vec (CourtListener, 768d) – DistilBERT (distilbert-base-uncased)  0.44  0.23    0.35  0.22    0.26  0.21
Table 2: The IoU heatmap metric for selected pairs of embeddings on the PTSD dataset, reported for three increasing relative threshold settings.
                                      PTSD  SIIP
word2vec (GoogleNews)                 0.70  0.90
Law2vec                               0.69  0.85
word2vec (CourtListener, 200d)        0.78  0.93
word2vec (CourtListener, 768d)        0.79  0.94
BERT (bert-base-uncased)              0.84  0.94
DistilBERT (distilbert-base-uncased)  0.85  0.94
DistilBERT (CourtListener)            0.42  0.85
Table 3: Test set accuracy.
Figure 2: A sample heatmap for a correct prediction with the word2vec (CourtListener, 768d) embedding
Figure 3: A sample heatmap for a failed prediction with the word2vec (CourtListener, 768d) embedding

Sample heatmaps are shown in Fig. 2 and Fig. 3, with a colorbar defining the mapping between colors and values. Fig. 2 clearly shows the area of the CNN's attention, which can be quantified further down the line. The picture shows a properly classified sentence, a statement of evidence, defined by the PTSD dataset's authors as a description of a piece of evidence. The CNN pays most attention to the phrase "medical records", which is in line with the PTSD authors' annotation protocols, where this kind of sentence describes a given piece of evidence (e.g. the records of testimony). We found the sentence in Fig. 3 hard to classify ourselves, and prima facie it seemed to us to be an example of an evidentiary sentence. In the CNN's case, no distinctive activations can be spotted.

Yet we did not perform any detailed analyses of such images. Instead, we focus on two types of comparison using the metrics defined in Section 3.1. The comparisons are designed to capture differences between embeddings, particularly in terms of context handling. First, for a given embedding we calculate the CNN's attention spread over words, quantified by the F_t metric averaged over all input sentences in the test set. We can then compare the mean fraction of words (tokens) in the input sentences which contribute to the prediction for the various embeddings. The criterion deciding whether a particular word contributes to the prediction is, in fact, arbitrary and depends on the class activation map (heatmap) binarization threshold. This is why we test a few thresholds, including the 15%-of-maximum threshold suggested in [23] for weakly supervised localization. Essentially, a high value of the fraction indicates that most word vectors in the input sentence are taken into account by the CNN during inference; conversely, a low value indicates that most word vectors in the input sentence are ignored. The comparison results for the PTSD dataset are shown in Table 1 and Table 2 (the SIIP dataset was omitted for brevity and due to its similarity to the presented PTSD results). The outstanding similarity between word2vec and Law2Vec can be spotted in Table 2: both models belong to the same class, as exhibited by the high value of the IoU metric.
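A minimal sketch of this averaging step; heatmaps is assumed to be a list of per-sentence Grad-CAM vectors (as produced by the grad_cam_text sketch above), and t=0.15 corresponds to the 15%-of-maximum binarization threshold:

```python
import numpy as np

def mean_attention_spread(heatmaps, t=0.15):
    """Average F_t over a test set: per sentence, the fraction of
    tokens whose Grad-CAM score exceeds t times the sentence maximum."""
    fractions = [float(np.mean(np.asarray(h) > t * np.max(h)))
                 for h in heatmaps]
    return float(np.mean(fractions))
```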

6.2 Grad-CAM guided context extraction

The analysis of the heatmaps and metrics presented hereinbefore shows that only a part of a given sentence contributes to the final result to a greater extent. We hypothesized that it is possible to reduce the CNN's input data to those important parts without compromising the final prediction. In this respect, Grad-CAM was treated as a helpful heuristic for identifying the words most important to a given CNN in its training phase. For this experiment, the F_t criterion, with a fixed relative threshold, was used to select a percentage of the most important words from each training example. This in turn was used to compose a vocabulary (or white-list) of the most important words encountered during training. Further down the line, this white-list was used during inference, and only the words present on the list were passed as input to the CNN. Nevertheless, the number of white-listed words allowed coherent sentences to still be passed to the CNN (for example, the PTSD sentence "However, this evidence does not make it clear and unmistakable." was reduced to "However, this evidence does not make it clear and" after white-listing).
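A minimal sketch of the white-list construction and inference-time filtering; the relative threshold and the vocabulary cut-off are illustrative assumptions:

```python
from collections import Counter

def build_whitelist(train_tokens, train_heatmaps, t=0.15, top_k=2000):
    """Collect words whose Grad-CAM score exceeds t times the sentence
    maximum across the training set; keep the top_k most frequent."""
    counts = Counter()
    for tokens, cam in zip(train_tokens, train_heatmaps):
        peak = max(cam) if len(cam) else 0.0
        for token, score in zip(tokens, cam):
            if peak > 0 and score > t * peak:
                counts[token] += 1
    return {word for word, _ in counts.most_common(top_k)}

def filter_sentence(tokens, whitelist):
    """At inference time, pass only white-listed words to the CNN."""
    return [token for token in tokens if token in whitelist]
```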

Using this procedure, we managed to keep accuracy on par with the unmodified dataset (e.g. 0.7 for PTSD with word2vec (GoogleNews) and 0.85 for PTSD with DistilBERT (distilbert-base-uncased)).

7 Conclusion & Future Work

We presented the first approach to using a popular image processing technique, Grad-CAM, to showcase the explainability concept for legal texts. A few conclusions that can be drawn from the presented methodology are:

  • The mean value of F_t is higher for the DistilBERT embedding than for the word2vec and Law2vec embeddings. This suggests that a CNN trained and used with this embedding tends to take a relatively larger chunk of the input sentence into account when making a prediction.

  • The described metrics and visualizations provide a peek into the complexity of the context-handling aspects embedded in a language model.

  • They enable a user to identify and catalog the attention words in a given sentence type, for data optimization in downstream processing tasks.

Some issues which need further investigation are:

  • Training such domain-specific models requires time and resources. Apart from algorithmic optimization, data optimization also plays an important role. An extension of this methodology could be used to remove tokens that do not contribute to the final outcome of any downstream processing task. A systematic analysis of the method presented in Section 6.2 is warranted.

  • Mapping the metrics from our methodology onto standard machine learning metrics could allow us to infer the quality of language models in a given domain (i.e. the legal domain). This would allow us to measure the quality of a model when there is not sufficient gold data for the effective training of models (in line with the concept of semi-supervised learning).

  • An extension of this approach could be used to validate the consistency of the context in facts and, in turn, of the legal argument chain that is built on those facts.

Acknowledgment

This research was carried out with the support of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, under grant no GR81-14.

References

  • [1] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim (2018) Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31, pp. 9505–9515. External Links: Link Cited by: §2.
  • [2] S. R. Ahmad, D. Harris, and I. Sahibzada (2020) Understanding legal documents: classification of rhetorical role of sentences using deep learning and natural language processing. In 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pp. 464–467. Cited by: §3.
  • [3] S. Arora, A. May, J. Zhang, and C. Ré (2020) Contextual embeddings: when are they worth it?. External Links: 2005.09117 Cited by: §2.
  • [4] A. Bibal, M. Lognoul, A. de Streel, and B. Frénay (2020) Legal requirements on explainability in machine learning. Artificial Intelligence and Law, pp. 1–21. Cited by: §1.
  • [5] L. K. Branting, C. Pfeifer, B. Brown, L. Ferro, J. Aberdeen, B. Weiss, M. Pfaff, and B. Liao (2020) Scalable and explainable legal prediction. Artificial Intelligence and Law, pp. 1–26. Cited by: §2.
  • [6] E. Choi, G. Brassil, K. Keller, J. Ouyang, and K. Wang Bankruptcy map: a system for searching and analyzing us bankruptcy cases at scale. Cited by: §5.2.
  • [7] J. Choi, J. Choi, and W. Rhee (2020) Interpreting neural ranking models using grad-cam. arXiv preprint arXiv:2005.05768. Cited by: §2.
  • [8] C. Condevaux, S. Harispe, S. Mussard, and G. Zambrano (2019) Weakly supervised one-shot classification using recurrent neural networks with attention: application to claim acceptance detection. In JURIX, pp. 23–32. Cited by: §2.
  • [9] E. Elwany, D. Moore, and G. Oberoi (2019) BERT goes to law school: quantifying the competitive advantage of access to large legal corpora in contract understanding. arXiv preprint arXiv:1911.00473. Cited by: §2.
  • [10] Grad-CAM for text. Note: https://github.com/HaebinShin/grad-cam-text, accessed 2020-08-05. Cited by: §2, §3, §4.
  • [11] B. Hoover, H. Strobelt, and S. Gehrmann (2019) ExBERT: a visual analysis tool to explore learned representations in transformers models. External Links: 1910.05276 Cited by: §2, §5.2.
  • [12] Y. Kim (2014-10) Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1746–1751. External Links: Link, Document Cited by: §2, §3, §4.
  • [13] D. Krakov and D. G. Feitelson (2013) Comparing performance heatmaps. In Workshop on Job Scheduling Strategies for Parallel Processing, pp. 42–61. Cited by: §3.1.
  • [14] Law2Vec: legal word embeddings. Note: https://archive.org/details/Law2Vec, accessed 2020-09-21. Cited by: §5.2.
  • [15] L. Martin, B. Muller, P. J. Ortiz Suárez, Y. Dupont, L. Romary, É. de la Clergerie, D. Seddah, and B. Sagot (2020-07) CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 7203–7219. External Links: Link Cited by: §5.2.
  • [16] V. H. Moshiashwili (2015) The downfall of Auer deference: veterans law at the Federal Circuit in 2014. 64 Am. U. L. Rev. 1007. Cited by: item 1, §5.1.
  • [17] Free Law Project (2020) CourtListener (website). External Links: Link Cited by: §4, §5.2.
  • [18] E. L. Rissland, K. D. Ashley, and R. P. Loui (2003) AI and law: a fruitful synergy. Artificial Intelligence 150 (1-2), pp. 1–15. Cited by: §2.
  • [19] J. Rossi and E. Kanoulas (2019) Legal search in case law and statute law.. In JURIX, pp. 83–92. Cited by: §2.
  • [20] V. Sanh, L. Debut, J. Chaumond, and T. Wolf (2019) DistilBERT, a distilled version of bert: smaller, faster, cheaper and lighter. External Links: 1910.01108 Cited by: §5.2.
  • [21] J. Savelka, V. R. Walker, M. Grabmair, and K. D. Ashley (2017) Sentence boundary detection in adjudicatory decisions in the United States. Cited by: §5.1.
  • [22] J. Savelka, H. Xu, and K. D. Ashley (2019) Improving sentence retrieval from case law for statutory interpretation. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 113–122. Cited by: §2, item 2, §5.1.
  • [23] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2019-10) Grad-cam: visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision 128 (2), pp. 336–359. External Links: ISSN 1573-1405, Link, Document Cited by: §2, §6.1.
  • [24] L. Tang and K. Kageura (2019) An examination of the validity of general word embedding models for processing japanese legal texts. In Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Texts, Montreal, QC, Canada, June 21, 2019, CEUR Workshop Proceedings, Vol. 2385. External Links: Link Cited by: §2.
  • [25] V. R. Walker, J. H. Han, X. Ni, and K. Yoseda (2017) Semantic types for computational legal reasoning: propositional connectives and sentence roles in the veterans’ claims dataset. ICAIL ’17, New York, NY, USA, pp. 217–226. External Links: ISBN 9781450348911, Link, Document Cited by: §5.1.
  • [26] V. R. Walker, K. Pillaipakkamnatt, A. M. Davidson, M. Linares, and D. J. Pesce (2019) Automatic classification of rhetorical roles for sentences: comparing rule-based scripts with machine learning. In Proceedings of the Third Workshop on Automated Semantic Analysis of Information in Legal Texts, Montreal, QC, Canada, June 21, 2019, CEUR Workshop Proceedings, Vol. 2385. External Links: Link Cited by: item 1, §3, §5.1.
  • [27] word2vec-slim. Note: https://github.com/eyaler/word2vec-slim, accessed 2020-09-21. Cited by: §5.2.