Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

01/26/2021
by Ilias Chalkidis et al.

Major scandals in corporate history have highlighted the need for regulatory compliance, where organizations must ensure that their controls (processes) comply with relevant laws, regulations, and policies. However, keeping track of constantly changing legislation is difficult, so organizations are increasingly adopting Regulatory Technology (RegTech) to facilitate the process. To this end, we introduce regulatory information retrieval (REG-IR), an application of document-to-document information retrieval (DOC2DOC IR). In REG-IR the query is an entire document, which makes the task more challenging than traditional IR, where queries are short. Furthermore, we compile and release two datasets based on the relationships between EU directives and UK legislation. We experiment on these datasets using a typical two-step pipeline comprising a pre-fetcher and a neural re-ranker. Experimenting with various pre-fetchers, from BM25 to k-nearest neighbors over representations from several BERT models, we show that fine-tuning a BERT model on an in-domain classification task produces the best representations for IR. We also show that neural re-rankers underperform due to contradicting supervision, i.e., similar query-document pairs with opposite labels, which biases them towards the pre-fetcher's score. Interestingly, applying a date filter further improves performance, showcasing the importance of the time dimension.
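The abstract describes a two-step pipeline in which a pre-fetcher retrieves candidates and a neural re-ranker reorders them. As a rough illustration of the first step only, here is a minimal BM25 pre-fetcher that uses the entire query document as the query and applies an optional date filter. This is a sketch under stated assumptions and not the authors' implementation. The following details are all hypothetical choices made for illustration: the `Doc` type, the regex tokenizer, the BM25 parameters (`k1=1.2`, `b=0.75`, Lucene-style IDF), scoring over unique query terms, and the filter direction (drop candidates published before the query document).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    # Hypothetical document record: id, full text, publication date.
    doc_id: str
    text: str
    published: date


def tokenize(text: str) -> list[str]:
    # Assumed lowercase alphanumeric tokenizer; no stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a fixed collection, with a non-negative (Lucene-style) IDF."""

    def __init__(self, docs: list[Doc], k1: float = 1.2, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d.text)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query_terms: set[str], i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        s = 0.0
        for t in query_terms:
            if t in tf:
                num = tf[t] * (self.k1 + 1)
                den = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[t] * num / den
        return s


def prefetch(index: BM25, query: Doc, k: int, date_filter: bool = True) -> list[tuple[str, float]]:
    # DOC2DOC: the whole query document is the query; unique terms keep it tractable.
    q_terms = set(tokenize(query.text))
    scored = []
    for i, d in enumerate(index.docs):
        # Hypothetical date filter: skip candidates older than the query document.
        if date_filter and d.published < query.published:
            continue
        scored.append((d.doc_id, index.score(q_terms, i)))
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]


# Toy usage with made-up documents (not from the released datasets).
uk_laws = [
    Doc("uk-1", "data protection regulations for personal data processing", date(2018, 5, 25)),
    Doc("uk-2", "road traffic vehicle speed limits", date(2019, 1, 1)),
    Doc("uk-3", "personal data protection act", date(1998, 7, 16)),
]
directive = Doc("eu-1", "directive on the protection of personal data", date(2016, 4, 27))
index = BM25(uk_laws)
results = prefetch(index, directive, k=2)
print(results)
```

In a full pipeline, the top-k candidates from a step like this would be passed to a re-ranker. The abstract also reports that k-nearest-neighbor search over fine-tuned BERT representations outperforms BM25 as a pre-fetcher, so this lexical version should be read only as a baseline sketch.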
