Self Reward Design with Fine-grained Interpretability

12/30/2021
by Erico Tjoa, et al.

Transparency and fairness issues in Deep Reinforcement Learning may stem from the black-box nature of the deep neural networks used to learn its policy, value functions, etc. This paper proposes a way to circumvent these issues through the bottom-up design of neural networks (NN) with detailed interpretability, where each neuron or layer has its own meaning and utility corresponding to a humanly understandable concept. With deliberate design, we show that lavaland problems can be solved using an NN model with few parameters. Furthermore, we introduce Self Reward Design (SRD), inspired by Inverse Reward Design, so that our interpretable design can (1) solve the problem by pure design (although imperfectly), (2) be optimized via SRD, and (3) avoid unknown states by recognizing the inactivation of neurons, aggregated as the activation in w_unknown.
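Point (3) can be illustrated with a small sketch. This is not the paper's architecture. It only shows the general idea under stated assumptions: each hypothetical "concept neuron" detects one known lavaland tile type by matching an assumed prototype colour, and a w_unknown-style unit becomes active only when every concept neuron is inactive. The prototype colours, the `width` parameter, the penalty weights, and the function names `concept_activations`, `w_unknown` and `tile_cost` are all illustrative inventions.

```python
import math
from typing import Dict, Tuple

Color = Tuple[float, float, float]

# Hypothetical prototype RGB colours for the known tile types (assumption, not from the paper).
PROTOTYPES: Dict[str, Color] = {
    "grass": (0.0, 1.0, 0.0),
    "dirt": (0.6, 0.4, 0.2),
    "lava": (1.0, 0.0, 0.0),
}


def concept_activations(pixel: Color, width: float = 0.3) -> Dict[str, float]:
    """Each concept neuron outputs a value in [0, 1].

    The value is 1 when the pixel exactly matches the neuron's prototype colour
    and falls linearly to 0 once the Euclidean distance reaches `width`.
    """
    acts = {}
    for name, proto in PROTOTYPES.items():
        dist = math.dist(pixel, proto)
        acts[name] = max(0.0, 1.0 - dist / width)
    return acts


def w_unknown(acts: Dict[str, float]) -> float:
    """Hypothetical unknown unit: high only when all known-concept neurons are inactive."""
    return 1.0 - max(acts.values())


def tile_cost(pixel: Color, lava_penalty: float = 10.0, unknown_penalty: float = 5.0) -> float:
    """Illustrative cost an agent could use to avoid lava and unrecognized tiles alike."""
    acts = concept_activations(pixel)
    return lava_penalty * acts["lava"] + unknown_penalty * w_unknown(acts)


if __name__ == "__main__":
    for label, pixel in [("grass", (0.0, 1.0, 0.0)), ("lava", (1.0, 0.0, 0.0)), ("blue", (0.0, 0.0, 1.0))]:
        acts = concept_activations(pixel)
        print(label, round(w_unknown(acts), 3), round(tile_cost(pixel), 3))
```

A blue tile matches none of the assumed prototypes, so every concept neuron is inactive and the unknown unit fires. A cost built from it therefore steers an agent away from the tile even though no "blue" concept was ever designed. The choice of `1 - max(...)` as the aggregation is just one simple option.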
