Causal Analysis of Agent Behavior for AI Safety

03/05/2021
by Grégoire Delétang, et al.

As machine learning systems become more powerful, they also become increasingly unpredictable and opaque. Yet finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical question an analyst might ask about an agent. In particular, we show that none of these questions can be answered by observation alone; each requires conducting experiments with systematically chosen manipulations in order to generate the correct causal evidence.
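To make the observation-versus-intervention distinction concrete, here is a minimal, hypothetical sketch (not one of the report's actual use cases): a toy agent whose movement is perfectly correlated with food location in observational data, but whose true causal driver, a light cue, only becomes visible once the cue is manipulated independently of the food. The environment, agent policy, and variable names below are illustrative assumptions.

```python
import random

def run_episode(intervene_light=None):
    """Simulate one episode of a toy agent.

    By default the environment places food on a random side and the light
    on the same side (a confounded correlation). The hypothetical agent's
    true policy is to follow the light, not the food.
    """
    food_side = random.choice(["left", "right"])
    light_side = food_side if intervene_light is None else intervene_light
    agent_move = light_side  # the agent actually follows the light
    return food_side, light_side, agent_move

# Observational study: food location and movement correlate perfectly,
# which is equally consistent with "the agent seeks food".
obs = [run_episode() for _ in range(1000)]
food_match = sum(move == food for food, _, move in obs) / len(obs)
print(f"P(move == food side) under observation: {food_match:.2f}")   # ~1.00

# Interventional study: set the light independently of the food
# (a randomized manipulation), breaking the confound.
inter = [run_episode(intervene_light=random.choice(["left", "right"]))
         for _ in range(1000)]
food_match_do = sum(move == food for food, _, move in inter) / len(inter)
light_match_do = sum(move == light for _, light, move in inter) / len(inter)
print(f"P(move == food side)  under do(light): {food_match_do:.2f}")  # ~0.50
print(f"P(move == light side) under do(light): {light_match_do:.2f}") # ~1.00
```

Under passive observation both hypotheses fit the data equally well; only the randomized manipulation of the light separates them, which is the kind of systematically chosen intervention the report advocates.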

