Tom Everitt

research

∙ 07/20/2023

Characterising Decision Theories with Mechanised Causal Graphs

How should my own decisions affect my beliefs about the outcomes I expec...

0 Matt MacDermott, et al. ∙

research

∙ 05/31/2023

Human Control: Definitions and Algorithms

How can humans stay in control of advanced artificial intelligence syste...

0 Ryan Carey, et al. ∙

research

∙ 01/05/2023

Reasoning about Causality in Games

Causal reasoning and game-theoretic reasoning are fundamental topics in ...

0 Lewis Hammond, et al. ∙

research

∙ 08/17/2022

Discovering Agents

Causal models of agents have been used to analyse the safety aspects of ...

0 Zachary Kenton, et al. ∙

research

∙ 04/21/2022

Path-Specific Objectives for Safer Agent Incentives

We present a general framework for training safe agents whose naive ince...

0 Sebastian Farquhar, et al. ∙

research

∙ 02/23/2022

A Complete Criterion for Value of Information in Soluble Influence Diagrams

Influence diagrams have recently been used to analyse the safety and fai...

0 Chris van Merwijk, et al. ∙

research

∙ 02/22/2022

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

In addition to reproducing discriminatory relationships in the training ...

0 Carolyn Ashurst, et al. ∙

research

∙ 10/20/2021

Shaking the foundations: delusions in sequence models for interaction and control

The recent phenomenal success of language models has reinvigorated machi...

68 Pedro A. Ortega, et al. ∙

research

∙ 03/26/2021

Alignment of Language Agents

For artificial intelligence to be beneficial to humans the behaviour of ...

0 Zachary Kenton, et al. ∙

research

∙ 02/15/2021

How RL Agents Behave When Their Actions Are Modified

Reinforcement learning in complex environments may require supervision t...

0 Eric D. Langlois, et al. ∙

research

∙ 02/09/2021

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Multi-agent influence diagrams (MAIDs) are a popular form of graphical m...

18 Lewis Hammond, et al. ∙

research

∙ 02/02/2021

Agent Incentives: A Causal Perspective

We present a framework for analysing agent incentives using causal influ...

14 Tom Everitt, et al. ∙

research

∙ 11/17/2020

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

How can we design agents that pursue a given objective when all feedback...

5 Jonathan Uesato, et al. ∙

research

∙ 11/17/2020

REALab: An Embedded Perspective on Tampering

This paper describes REALab, a platform for embedded agency research in ...

5 Ramana Kumar, et al. ∙

research

∙ 01/20/2020

The Incentives that Shape Behaviour

Which variables does an agent have an incentive to control with its deci...

9 Ryan Carey, et al. ∙

research

∙ 08/13/2019

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

Can an arbitrarily intelligent reinforcement learning agent be kept unde...

3 Tom Everitt, et al. ∙

research

∙ 06/20/2019

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Proposals for safe AGI systems are typically made at the level of framew...

2 Tom Everitt, et al. ∙

research

∙ 02/26/2019

Understanding Agent Incentives using Causal Influence Diagrams, Part I: Single Action Settings

Agents are systems that optimize an objective function in an environment...

0 Tom Everitt, et al. ∙

research

∙ 11/19/2018

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world...

20 Jan Leike, et al. ∙

research

∙ 05/03/2018

AGI Safety Literature Review

The development of Artificial General Intelligence (AGI) promises to be ...

0 Tom Everitt, et al. ∙

research

∙ 11/27/2017

AI Safety Gridworlds

We present a suite of reinforcement learning environments illustrating v...

0 Jan Leike, et al. ∙

research

∙ 08/13/2017

A Game-Theoretic Analysis of the Off-Switch Game

The off-switch game is a game theoretic model of a highly intelligent ro...

0 Tobias Wängberg, et al. ∙

research

∙ 06/25/2017

Count-Based Exploration in Feature Space for Reinforcement Learning

We introduce a new count-based optimistic exploration algorithm for Rein...

0 Jarryd Martin, et al. ∙

research

∙ 05/23/2017

Reinforcement Learning with a Corrupted Reward Channel

No real-world reward function is perfect. Sensory errors and software bu...

0 Tom Everitt, et al. ∙

research

∙ 09/09/2015

A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

Search is a central problem in artificial intelligence, and BFS and DFS ...

0 Tom Everitt, et al. ∙

