Modelling non-reinforced preferences using selective attention

07/25/2022
by Noor Sajid, et al.

How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: (i) encoding diverse memories and (ii) selectively attending to these for preference formation. Our proposed non-reinforced preference learning mechanism using selective attention, Nore, addresses both by leveraging the agent's world model to collect a diverse set of experiences, which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences. We validate Nore in a modified OpenAI Gym FrozenLake environment (without any external signal), with and without volatility, under a fixed model of the environment, and compare its behaviour to Pepper, a Hebbian preference learning mechanism. We demonstrate that Nore provides a straightforward framework to induce exploratory preferences in the absence of external signals.
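
To make the "selectively attending to memories" idea concrete, below is a minimal, hypothetical sketch in NumPy of attending over a bank of encoded memories and gating the attended summary to nudge a preference vector. It is not Nore's actual architecture: the function and parameter names (attend_and_gate, W_k, W_v, w_gate), the dimensions, and the specific update rule are illustrative assumptions only.

```python
# Hypothetical sketch (not the authors' code): selective attention over
# encoded memories, followed by a scalar gate, to produce a preference update.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_and_gate(memories, query, W_k, W_v, w_gate):
    """Attend over memory embeddings and gate the attended summary.

    memories: (N, d) encoded experiences (e.g. real roll-outs interleaved
              with imagined ones from a world model)
    query:    (d,) current context embedding
    Returns a (d,) preference update vector.
    """
    keys = memories @ W_k                      # (N, d)
    values = memories @ W_v                    # (N, d)
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)                  # attention weights over memories
    summary = weights @ values                 # (d,) attended memory summary
    gate = 1.0 / (1.0 + np.exp(-(summary @ w_gate)))  # scalar gate in (0, 1)
    return gate * summary

d, N = 8, 32
memories = rng.normal(size=(N, d))             # stand-in for encoded memories
query = rng.normal(size=d)
W_k, W_v = rng.normal(size=(d, d)), rng.normal(size=(d, d))
w_gate = rng.normal(size=d)

preferences = np.zeros(d)
preferences += 0.1 * attend_and_gate(memories, query, W_k, W_v, w_gate)
print(preferences)
```

In this toy version the attention weights decide which memories shape the preference vector, while the gate scales how strongly the attended summary updates it; no external reward signal is involved, which is the general setting the abstract describes.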

Related research

Exploration and preference satisfaction trade-off in reward-free learning (06/08/2021)
Biological agents have meaningful interactions with their environment de...

Pareto efficient combinatorial auctions: dichotomous preferences without quasilinearity (09/25/2020)
We consider a combinatorial auction model where preferences of agents ov...

RL agents Implicitly Learning Human Preferences (02/14/2020)
In the real world, RL agents should be rewarded for fulfilling human pre...

Preference Transformer: Modeling Human Preferences using Transformers for RL (03/02/2023)
Preference-based reinforcement learning (RL) provides a framework to tra...

Modelling and Analysis of Temporal Preference Drifts Using A Component-Based Factorised Latent Approach (02/27/2018)
The changes in user preferences can originate from substantial reasons, ...

Deciding What to Learn: A Rate-Distortion Approach (01/15/2021)
Agents that learn to select optimal actions represent a prominent focus ...

Conflicting Privacy Preference Signals in the Wild (09/29/2021)
Privacy preference signals allow users to express preferences over how t...
