Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study

12/21/2020
by Sophia Althammer, et al.

Domain-specific search has always been a challenging information retrieval task, owing to the domain-specific language, the unique task setting, and the lack of accessible queries and corresponding relevance judgements. In recent years, pretrained language models such as BERT have revolutionized web and news search. Naturally, the community aims to adapt these advancements to cross-domain transfer of retrieval models for domain-specific search. In the context of legal document retrieval, Shao et al. propose the BERT-PLI framework, which models Paragraph-Level Interactions with the language model BERT. In this paper we reproduce the original experiments, clarify pre-processing steps, add missing scripts for framework steps, and investigate different evaluation approaches. However, we are not able to reproduce the evaluation results. Contrary to the original paper, we find that domain-specific paragraph-level modelling does not appear to help the performance of the BERT-PLI model compared to paragraph-level modelling with the original BERT. In addition to our legal search reproducibility study, we investigate BERT-PLI for document retrieval in the patent domain. We find that the BERT-PLI model does not yet achieve performance improvements for patent document retrieval compared to the BM25 baseline. Furthermore, we evaluate the individual components of the BERT-PLI model for cross-domain retrieval between the legal and patent domains, at both the paragraph level and the document level. We find that transferring the BERT-PLI model at the paragraph level leads to comparable results in both domains, and we obtain first promising results for cross-domain transfer at the document level. For reproducibility and transparency, and to benefit the community, we make our source code and the trained models publicly available.
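For readers unfamiliar with the BM25 baseline mentioned above, the following is a minimal sketch of the standard Okapi BM25 scoring function. It is not the authors' implementation: the abstract does not state which BM25 variant, parameters, or tokenization were used, so the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, the smoothed IDF form, and the toy pre-tokenized documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration only.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))

    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF (always positive), one common BM25 variant.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy, hypothetical corpus: already tokenized and lowercased.
docs = [
    ["patent", "claim", "prior", "art"],
    ["court", "case", "precedent", "ruling"],
    ["patent", "court", "appeal"],
]
query = ["prior", "art", "patent"]
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy corpus the first document matches all three query terms and ranks first, while the second shares no terms with the query and receives a score of zero. A purely lexical scorer of this kind is the reference point against which the BERT-PLI results are compared.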


