Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining

07/08/2023
by Robin Netzorg, et al.

In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data. One such method is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this approach yields interpretable classifications, it often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototype quality on a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reward reweighed, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds three steps to the ProtoPNet training loop. The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining, which realigns the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lowers test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.
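The three R3 steps can be summarized in pseudocode. The sketch below is a minimal illustration only, assuming a reward model that scores a prototype's visualization on the paper's 1-5 scale; the names protopnet, visualize_prototypes, nearest_high_reward_patch, and train are hypothetical placeholders and not the authors' published implementation.

    import torch

    def r3_update(protopnet, reward_model, dataloader, reward_threshold=3.0):
        # Hypothetical sketch of the three R3 steps described above. The
        # model objects and helper functions are illustrative placeholders.

        # Step 1: reward reweighing. Score each prototype's visualization
        # with the learned reward model (1-5, higher = less spurious) and
        # scale that prototype's contribution to the class logits.
        rewards = torch.stack([reward_model(viz)
                               for viz in visualize_prototypes(protopnet,
                                                               dataloader)])
        # last_layer.weight has shape (num_classes, num_prototypes), so the
        # (num_prototypes,) weight vector broadcasts across classes.
        protopnet.last_layer.weight.data *= (rewards / rewards.max())

        # Step 2: reselection. Replace low-reward prototypes with a nearby
        # training patch that the reward model scores highly.
        for j in range(len(rewards)):
            if rewards[j] < reward_threshold:
                patch = nearest_high_reward_patch(protopnet, reward_model,
                                                  dataloader, prototype_idx=j)
                protopnet.prototype_vectors.data[j] = patch

        # Step 3: retraining. Fine-tune the backbone with the standard
        # ProtoPNet objective so its features realign with the updated
        # prototypes.
        train(protopnet, dataloader)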
