LawngNLI: A Long-Premise Benchmark for In-Domain Generalization from Short to Long Contexts and for Implication-Based Retrieval

12/06/2022
by William Bruno, et al.

Natural language inference has trended toward studying contexts beyond the sentence level. An important application area is law: past cases often do not foretell how they apply to new situations, and implications must be inferred. This paper introduces LawngNLI, constructed from U.S. legal opinions with automatic labels of high human-validated accuracy. Premises are long and multigranular. Experiments show two use cases. First, LawngNLI can benchmark in-domain generalization from short to long contexts. It has remained unclear whether large-scale long-premise NLI datasets actually need to be constructed: near-top performance on long premises could be achievable by fine-tuning using short premises. Without multigranularity, benchmarks cannot distinguish a lack of fine-tuning on long premises from domain shift between short and long datasets. In contrast, our long and short premises share the same examples and domain. Models fine-tuned using several past NLI datasets and/or our short premises fall short of top performance on our long premises. So for at least certain domains (such as ours), large-scale long-premise datasets are needed. Second, LawngNLI can benchmark implication-based retrieval. Queries are entailed or contradicted by target documents, allowing users to move between arguments and evidence. Leading retrieval models perform reasonably well zero-shot on a LawngNLI-derived retrieval task. We compare different systems for re-ranking, including lexical overlap and cross-encoders fine-tuned using a modified LawngNLI or past NLI datasets. LawngNLI can train and test systems for implication-based case retrieval and argumentation.
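The abstract mentions lexical overlap as one family of re-ranking baselines compared against fine-tuned cross-encoders. A minimal sketch of such a re-ranker is below; the tokenizer, Jaccard scoring, and example documents are illustrative assumptions, not the paper's actual setup.

```python
import re

def tokenize(text):
    # Lowercase and keep alphanumeric word tokens (illustrative choice).
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_overlap(query, doc):
    """Jaccard overlap between query and document token sets."""
    q, d = tokenize(query), tokenize(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def rerank(query, candidates):
    """Re-order first-stage retrieval candidates by lexical overlap."""
    return sorted(candidates, key=lambda doc: lexical_overlap(query, doc), reverse=True)

# Hypothetical candidate documents from a first-stage retriever.
docs = [
    "The court held the contract was void for lack of consideration.",
    "Weather patterns in the region shifted over the decade.",
]
query = "Was the contract void for lack of consideration?"
ranked = rerank(query, docs)
```

A cross-encoder re-ranker would replace `lexical_overlap` with a learned relevance score over the (query, document) pair, which is where NLI fine-tuning enters.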

