János Kramár

research

∙ 09/05/2023

Explaining grokking through circuit efficiency

One of the most surprising puzzles in neural network generalisation is g...

Vikrant Varma, et al.

research

∙ 07/28/2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

We investigate the internal structure of language model computations usi...

Thomas McGrath, et al.

research

∙ 07/18/2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Circuit analysis is a promising technique for understanding the internal...

Tom Lieberum, et al.

research

∙ 04/13/2023

Power-seeking can be probable and predictive for trained agents

Power-seeking behavior is a key source of risk from advanced AI, but our...

Victoria Krakovna, et al.

research

∙ 01/12/2023

Tracr: Compiled Transformers as a Laboratory for Interpretability

Interpretability research aims to build tools for understanding machine ...

David Lindner, et al.

research

∙ 06/02/2021

Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Nash equilibrium is a central concept in game theory. Several Nash solve...

Ian Gemp, et al.

research

∙ 06/08/2020

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Recent advances in deep reinforcement learning (RL) have led to consider...

Thomas Anthony, et al.

research

∙ 04/16/2020

Should I tear down this wall? Optimizing social metrics by evaluating novel actions

One of the fundamental challenges of governance is deciding when and how...

János Kramár, et al.

research

∙ 08/26/2019

OpenSpiel: A Framework for Reinforcement Learning in Games

OpenSpiel is a collection of environments and algorithms for research in...

Marc Lanctot, et al.

research

∙ 03/19/2019

Learning Reciprocity in Complex Sequential Social Dilemmas

Reciprocity is an important feature of human social interaction and unde...

Tom Eccles, et al.

research

∙ 02/26/2018

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

We propose a model-free deep reinforcement learning method that leverage...

Yuke Zhu, et al.

research

∙ 07/24/2017

Guidelines for Artificial Intelligence Containment

With almost daily improvements in capabilities of artificial intelligenc...

James Babcock, et al.

research

∙ 06/03/2016

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

We propose zoneout, a novel method for regularizing RNNs. At each timest...

David Krueger, et al.

research

∙ 04/02/2016

The AGI Containment Problem

There is considerable uncertainty about what properties, capabilities an...

James Babcock, et al.
