Evaluating Document Representations for Content-based Legal Literature Recommendations

04/28/2021
by Malte Ostendorff, et al.

Recommender systems assist legal professionals in finding relevant literature to support their cases. Despite their importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. At the same time, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets. Thus, these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. In total, we compare 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license, which facilitates reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods depending on document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at https://github.com/malteos/legal-document-similarity/.
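
To make the idea of averaged word vectors and a hybrid representation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The toy word vectors, the citation vectors, and the function names (`average_vectors`, `hybrid_vector`, `recommend`) are all hypothetical. Concatenating normalized vectors is only one possible way to combine them; the abstract does not specify the combination used. The two-dimensional citation vectors are placeholders for a real citation embedding such as Poincaré, and ranking by cosine similarity is likewise an assumption.

```python
import math


def average_vectors(tokens, word_vectors):
    """Average the vectors of all in-vocabulary tokens into one document vector."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("document has no in-vocabulary tokens")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length so neither half of a hybrid dominates."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hybrid_vector(text_vec, citation_vec):
    """Assumed hybrid: concatenate the normalized text and citation vectors."""
    return l2_normalize(text_vec) + l2_normalize(citation_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(seed_id, doc_vectors, k=5):
    """Return the ids of the k documents most similar to the seed document."""
    seed = doc_vectors[seed_id]
    scored = [
        (cosine(seed, vec), doc_id)
        for doc_id, vec in doc_vectors.items()
        if doc_id != seed_id
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy data (hypothetical): 2-d "word vectors" and 2-d "citation embeddings".
word_vectors = {
    "contract": [1.0, 0.0],
    "breach": [0.8, 0.2],
    "patent": [0.0, 1.0],
    "infringement": [0.1, 0.9],
}
docs = {
    "case_a": ["contract", "breach"],
    "case_b": ["breach", "contract", "damages"],  # "damages" is out of vocabulary
    "case_c": ["patent", "infringement"],
}
citation_vectors = {
    "case_a": [0.9, 0.1],
    "case_b": [0.8, 0.3],
    "case_c": [0.1, 0.9],
}

doc_vectors = {
    doc_id: hybrid_vector(average_vectors(tokens, word_vectors), citation_vectors[doc_id])
    for doc_id, tokens in docs.items()
}
top = recommend("case_a", doc_vectors, k=2)
```

In this toy setup, the two contract cases share both vocabulary and citation neighborhood, so `case_b` ranks ahead of `case_c` for the seed `case_a`. Normalizing each half before concatenation keeps the text and citation signals on a comparable scale.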


07/07/2020

Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity

Computing similarity between two legal case documents is an important an...

12/29/2021

LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification from Indian Legal Documents

The task of Legal Statute Identification (LSI) aims to identify the lega...

11/28/2017

Generative Interest Estimation for Document Recommendations

Learning distributed representations of documents has pushed the state-o...

04/13/2018

Are Abstracts Enough for Hypothesis Generation?

The potential for automatic hypothesis generation (HG) systems to improv...

10/13/2020

Aspect-based Document Similarity for Research Papers

Traditional document similarity measures provide a coarse-grained distin...

01/31/2021

Improving Accountability in Recommender Systems Research Through Reproducibility

Reproducibility is a key requirement for scientific progress. It allows ...

03/05/2018

Optimizing Slate Recommendations via Slate-CVAE

The slate recommendation problem aims to find the "optimal" ordering of ...

Code Repositories

legal-document-similarity

Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal Literature Recommendations"

