One of the most surprising puzzles in neural network generalisation is g...
We investigate the internal structure of language model computations usi...
Circuit analysis is a promising technique for understanding the internal...
Power-seeking behavior is a key source of risk from advanced AI, but our...
Interpretability research aims to build tools for understanding machine ...
Nash equilibrium is a central concept in game theory. Several Nash solve...
Recent advances in deep reinforcement learning (RL) have led to consider...
One of the fundamental challenges of governance is deciding when and how...
OpenSpiel is a collection of environments and algorithms for research in...
Reciprocity is an important feature of human social interaction and unde...
We propose a model-free deep reinforcement learning method that leverage...
With almost daily improvements in capabilities of artificial intelligenc...
We propose zoneout, a novel method for regularizing RNNs. At each timest...
There is considerable uncertainty about what properties, capabilities an...