Reward Reports for Reinforcement Learning

by Thomas Krendl Gilbert, et al.

The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning design has shown that the effects of optimization objectives on the resultant system behavior can be wide-ranging and unpredictable. In this paper we sketch a framework for documenting deployed learning systems, which we call Reward Reports. Taking inspiration from various contributions to the technical literature on reinforcement learning, we outline Reward Reports as living documents that track updates to design choices and assumptions behind what a particular automated system is optimizing for. They are intended to track dynamic phenomena arising from system deployment, rather than merely static properties of models or data. After presenting the elements of a Reward Report, we provide three examples: DeepMind's MuZero, MovieLens, and a hypothetical deployment of a Project Flow traffic control policy.




Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

In the long term, reinforcement learning (RL) is considered by many AI t...

From Design to Deployment of Zero-touch Deep Reinforcement Learning WLANs

Machine learning (ML) is increasingly used to automate networking tasks,...

Hyperbolically-Discounted Reinforcement Learning on Reward-Punishment Framework

This paper proposes a new reinforcement learning with hyperbolic discoun...

Importance Weighted Policy Learning and Adaption

The ability to exploit prior experience to solve novel problems rapidly ...

Toward Understanding the Impact of Staleness in Distributed Machine Learning

Many distributed machine learning (ML) systems adopt the non-synchronous...

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Reinforcement learning provides an automated framework for learning beha...
