Taken out of context: On measuring situational awareness in LLMs

09/01/2023
by   Lukas Berglund, et al.
0

We aim to better understand the emergence of `situational awareness' in large language models (LLMs). A model is situationally aware if it's aware that it's a model and can recognize whether it's currently in testing or deployment. Today's LLMs are tested for safety and alignment before they are deployed. An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment. Situational awareness may emerge unexpectedly as a byproduct of model scaling. One way to better foresee this emergence is to run scaling experiments on abilities necessary for situational awareness. As such an ability, we propose `out-of-context reasoning' (in contrast to in-context learning). We study out-of-context reasoning experimentally. First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task. Their success is sensitive to the training setup and only works when we apply data augmentation. For both GPT-3 and LLaMA-1, performance improves with model size. These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs. Code is available at: https://github.com/AsaCooperStickland/situational-awareness-evals.

READ FULL TEXT
research
09/04/2023

Are Emergent Abilities in Large Language Models just In-Context Learning?

Large language models have exhibited emergent abilities, demonstrating e...
research
05/23/2023

Concept-aware Training Improves In-context Learning Ability of Language Models

Many recent language models (LMs) of Transformers family exhibit so-call...
research
05/24/2023

Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners

The emergent few-shot reasoning capabilities of Large Language Models (L...
research
05/03/2022

ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Nowadays, pretrained language models (PLMs) have dominated the majority ...
research
10/16/2021

MAAD: A Model and Dataset for "Attended Awareness" in Driving

We propose a computational model to estimate a person's attended awarene...
research
05/20/2023

A request for clarity over the End of Sequence token in the Self-Critical Sequence Training

The Image Captioning research field is currently compromised by the lack...
research
08/21/2023

CSM-H-R: An Automatic Context Reasoning Framework for Interoperable Intelligent Systems and Privacy Protection

Automation of High-Level Context (HLC) reasoning for intelligent systems...

Please sign up or login with your details

Forgot password? Click here to reset