How should my own decisions affect my beliefs about the outcomes I expec...
How can humans stay in control of advanced artificial intelligence syste...
Causal reasoning and game-theoretic reasoning are fundamental topics in ...
Causal models of agents have been used to analyse the safety aspects of ...
We present a general framework for training safe agents whose naive ince...
Influence diagrams have recently been used to analyse the safety and fai...
In addition to reproducing discriminatory relationships in the training ...
The recent phenomenal success of language models has reinvigorated machi...
For artificial intelligence to be beneficial to humans the behaviour of ...
Reinforcement learning in complex environments may require supervision t...
Multi-agent influence diagrams (MAIDs) are a popular form of graphical m...
We present a framework for analysing agent incentives using causal influ...
How can we design agents that pursue a given objective when all feedback...
This paper describes REALab, a platform for embedded agency research in ...
Which variables does an agent have an incentive to control with its deci...
Can an arbitrarily intelligent reinforcement learning agent be kept unde...
Proposals for safe AGI systems are typically made at the level of framew...
Agents are systems that optimize an objective function in an environment...
One obstacle to applying reinforcement learning algorithms to real-world...
The development of Artificial General Intelligence (AGI) promises to be ...
We present a suite of reinforcement learning environments illustrating v...
The off-switch game is a game-theoretic model of a highly intelligent ro...
We introduce a new count-based optimistic exploration algorithm for Rein...
No real-world reward function is perfect. Sensory errors and software bu...
Search is a central problem in artificial intelligence, and BFS and DFS ...