
Soft-Bayes: Prod for Mixtures of Experts with Log-Loss
We consider prediction with expert advice under the log-loss with the go...
Logarithmic Pruning is All You Need
The Lottery Ticket Hypothesis is a conjecture that every large neural ne...
An investigation of model-free planning
The field of reinforcement learning (RL) is facing increasingly challeng...
Pitfalls of learning a reward function online
In some agent designs like inverse reinforcement learning an agent needs...
Measuring and avoiding side effects using relative reachability
How can we design reinforcement learning agents that avoid causing unnec...
Reinforcement Learning with a Corrupted Reward Channel
No real-world reward function is perfect. Sensory errors and software bu...
Thompson Sampling is Asymptotically Optimal in General Environments
We discuss a variant of Thompson sampling for nonparametric reinforcemen...
AI Safety Gridworlds
We present a suite of reinforcement learning environments illustrating v...
Agents and Devices: A Relative Definition of Agency
According to Dennett, the same system may be described using a `physical...
Single-Agent Policy Tree Search With Guarantees
We introduce two novel tree search algorithms that use a policy to guide...
Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees
We introduce and analyze two parameter-free linear-memory tree search al...
Iterative Budgeted Exponential Search
We tackle two long-standing problems related to re-expansions in heurist...
Learning to Prove from Synthetic Theorems
A major challenge in applying machine learning to automated theorem prov...
Laurent Orseau
