Measurement Tampering Detection Benchmark

08/29/2023
by Fabien Roger, et al.

When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals that are robust to optimization. One concern is measurement tampering, where the AI system manipulates multiple measurements to create the illusion of good results instead of achieving the desired outcome. In this work, we build four new text-based datasets to evaluate measurement tampering detection techniques on large language models. Concretely, we are given a set of text inputs, measurements aimed at determining whether some outcome occurred, and a base model able to accurately predict those measurements. For the examples where all measurements indicate that the outcome occurred, the goal is to determine whether the outcome actually occurred or whether the measurements were tampered with. We demonstrate techniques that outperform simple baselines on most datasets but do not achieve maximum performance. We believe there is significant room for improvement in both techniques and datasets, and we are excited about future work tackling measurement tampering.
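The task definition above can be illustrated with a minimal sketch. The abstract does not specify a data format or an evaluation metric, so everything here is an assumption for illustration: the `Example` record, the `tamper_score` detector interface, and the `evaluate_detector` helper are hypothetical names, and AUROC is used as an assumed metric. A real detector would be built on top of the base model that predicts the measurements; this sketch treats the detector as an arbitrary scoring function. Only examples where every measurement is positive are scored, and the detector should rank tampered examples above genuine ones.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    # Hypothetical record layout, not the datasets' actual format.
    text: str                 # the text input
    measurements: List[bool]  # True = this measurement says the outcome occurred
    outcome_occurred: bool    # ground truth, hidden from the detector


def all_positive(ex: Example) -> bool:
    # Only examples where every measurement indicates success are ambiguous:
    # either the outcome really occurred, or the measurements were tampered with.
    return all(ex.measurements)


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    # Probability that a random positive (tampered) example scores higher than
    # a random negative (genuine) example; ties count as one half.
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one example of each class")
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def evaluate_detector(examples: Sequence[Example],
                      tamper_score: Callable[[Example], float]) -> float:
    # Assumed evaluation: AUROC of the detector on all-positive examples,
    # treating tampered examples (outcome did not occur) as the positive class.
    candidates = [ex for ex in examples if all_positive(ex)]
    tampered = [tamper_score(ex) for ex in candidates if not ex.outcome_occurred]
    genuine = [tamper_score(ex) for ex in candidates if ex.outcome_occurred]
    return auroc(tampered, genuine)


# Toy usage: the "partial" example is excluded because one measurement is negative.
examples = [
    Example("real success", [True, True, True], True),
    Example("tampered", [True, True, True], False),
    Example("partial", [True, False, True], False),
]


def toy_score(ex: Example) -> float:
    # Hypothetical detector that cheats by reading the text; a real one cannot.
    return 1.0 if "tampered" in ex.text else 0.0


print(evaluate_detector(examples, toy_score))  # 1.0
```

Restricting scoring to all-positive examples mirrors the abstract's framing: when some measurement is already negative, there is nothing to detect, so those examples carry no signal about tampering.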
