
Causal Analysis of Agent Behavior for AI Safety

by Grégoire Delétang et al.

As machine learning systems become more powerful, they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical question an analyst might ask about an agent. In particular, we show that none of these questions can be answered by observation alone; each requires experiments with systematically chosen manipulations in order to generate the correct causal evidence.
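The core point, that observation alone cannot distinguish correlation from causation in an agent's behaviour, can be illustrated with a toy simulation. In this hypothetical setup (not from the report), a hidden confounder drives both a stimulus and the agent's action, so the two correlate perfectly under observation; only an intervention that sets the stimulus independently reveals that it has no causal effect on the action:

```python
import random

random.seed(0)

def environment(intervene_s=None):
    """One episode. Pass intervene_s to perform do(S = intervene_s)."""
    c = random.random() < 0.5          # hidden confounder
    s = c if intervene_s is None else intervene_s  # stimulus (normally set by c)
    a = c                              # the agent actually responds to c, not s
    return s, a

# Observational study: stimulus and action appear perfectly correlated.
obs = [environment() for _ in range(1000)]
obs_agreement = sum(s == a for s, a in obs) / len(obs)

# Interventional study: set the stimulus at random, breaking its tie to c.
exp = [environment(intervene_s=random.random() < 0.5) for _ in range(1000)]
exp_agreement = sum(s == a for s, a in exp) / len(exp)

print(f"observational P(A = S) ~ {obs_agreement:.2f}")   # near 1.0
print(f"interventional P(A = S) ~ {exp_agreement:.2f}")  # near 0.5
```

A purely observational analyst would conclude the agent reacts to the stimulus; the manipulation shows the agreement collapses to chance once the confounder is cut out, which is the kind of causal evidence the report's experiments are designed to produce.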
