LM vs LM: Detecting Factual Errors via Cross Examination

05/22/2023
by Roi Cohen, et al.

A prominent weakness of modern language models (LMs) is their tendency to generate factually incorrect text, which hinders their usability. A natural question is whether such factual errors can be detected automatically. Inspired by truth-seeking mechanisms in law, we propose a factuality evaluation framework for LMs that is based on cross-examination. Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates. To discover such inconsistencies, we facilitate a multi-turn interaction between the LM that generated the claim and another LM (acting as an examiner) which introduces questions to discover inconsistencies. We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks, finding that it outperforms existing methods and baselines, often by a large gap. Our results demonstrate the potential of using interacting LMs for capturing factual errors.
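The multi-turn interaction described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the `LM` callable interface, the function name `cross_examine`, the prompt wording, the `DONE` stop signal, the `CORRECT`/`INCORRECT` verdict format, and the `max_turns` limit are all hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (an assumption): an LM is any function that maps
# a prompt string to a response string.
LM = Callable[[str], str]


def format_history(transcript: List[Tuple[str, str]]) -> str:
    """Render the question/answer pairs collected so far as plain text."""
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in transcript)


def cross_examine(
    claim: str,
    examinee: LM,
    examiner: LM,
    max_turns: int = 3,
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Let the examiner question the examinee about a claim.

    Returns (claim_accepted, transcript). The prompts and the
    DONE / CORRECT / INCORRECT protocol are illustrative assumptions.
    """
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        # The examiner sees the claim and all earlier answers, then asks
        # a follow-up question or signals that it has asked enough.
        question = examiner(
            f"Claim: {claim}\n{format_history(transcript)}\n"
            "Ask one follow-up question that could expose an "
            "inconsistency, or reply DONE."
        )
        if question.strip().upper() == "DONE":
            break
        # The examinee is the LM that generated the claim.
        answer = examinee(question)
        transcript.append((question, answer))

    # The examiner decides whether the answers are consistent with the claim.
    verdict = examiner(
        f"Claim: {claim}\n{format_history(transcript)}\n"
        "Are the answers consistent with the claim? "
        "Reply CORRECT or INCORRECT."
    )
    return verdict.strip().upper().startswith("CORRECT"), transcript
```

In practice `examinee` and `examiner` would wrap calls to real language models; the sketch leaves them abstract so that any prompt-in, text-out function can be plugged in.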

