Ramana Kumar

research

∙ 09/05/2023

Explaining grokking through circuit efficiency

One of the most surprising puzzles in neural network generalisation is g...

0 Vikrant Varma, et al. ∙

research

∙ 02/09/2023

Scaling Goal-based Exploration via Pruning Proto-goals

One of the gnarliest challenges in reinforcement learning (RL) is explor...

0 Akhil Bagaria, et al. ∙

research

∙ 11/25/2022

Solving math word problems with process- and outcome-based feedback

Recent work has shown that asking language models to generate reasoning ...

0 Jonathan Uesato, et al. ∙

research

∙ 10/04/2022

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

The field of AI alignment is concerned with AI systems that pursue unint...

0 Rohin Shah, et al. ∙

research

∙ 08/17/2022

Discovering Agents

Causal models of agents have been used to analyse the safety aspects of ...

0 Zachary Kenton, et al. ∙

research

∙ 01/20/2022

Safe Deep RL in 3D Environments using Human Feedback

Agents should avoid unsafe behaviour during both training and deployment...

5 Matthew Rahtz, et al. ∙

research

∙ 04/01/2021

Formal Methods for the Informal Engineer: Workshop Recommendations

Formal Methods for the Informal Engineer (FMIE) was a workshop held at t...

12 Gopal Sarma, et al. ∙

research

∙ 11/17/2020

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

How can we design agents that pursue a given objective when all feedback...

5 Jonathan Uesato, et al. ∙

research

∙ 11/17/2020

REALab: An Embedded Perspective on Tampering

This paper describes REALab, a platform for embedded agency research in ...

5 Ramana Kumar, et al. ∙

research

∙ 06/20/2019

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Proposals for safe AGI systems are typically made at the level of framew...

2 Tom Everitt, et al. ∙

research

∙ 04/02/2018

Learning to Prove with Tactics

We implement an automated tactical prover TacticToe on top of the HOL4 in...

0 Thibault Gauthier, et al. ∙

research

∙ 03/09/2018

Clocked Definitions in HOL

Many potentially non-terminating functions cannot be directly defined in...

0 Ramana Kumar, et al. ∙
