Scale Invariant Solutions for Overdetermined Linear Systems with Applications to Reinforcement Learning

04/15/2021
by   Rahul Madhavan, et al.

Overdetermined linear systems are common in reinforcement learning, e.g., in Q-function and value function estimation with function approximation. The standard least-squares criterion, however, leads to a solution that is unduly influenced by rows with large norms. This is a serious issue, especially when the matrices in these systems are beyond user control. To address this, we propose a scale-invariant criterion that we then use to develop two novel algorithms for value function estimation: Normalized Monte Carlo and Normalized TD(0). Separately, we also introduce a novel adaptive stepsize that may be useful beyond this work. We use simulations and theoretical guarantees to demonstrate the efficacy of our ideas.
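
The sensitivity of least squares to row norms can be illustrated with a small numerical sketch. The example below is not the paper's algorithm: it assumes, purely for illustration, that scale invariance is obtained by dividing each row of the system (and the matching right-hand-side entry) by the row's Euclidean norm before solving. The function names `least_squares` and `row_normalized_least_squares` and the toy matrix are hypothetical.

```python
import numpy as np


def least_squares(A, b):
    """Ordinary least-squares solution of the overdetermined system A x ~ b."""
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


def row_normalized_least_squares(A, b):
    """Least squares after rescaling every row to unit Euclidean norm.

    Assumption for illustration only: no row of A is all zeros.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return least_squares(A / norms, b / norms[:, 0])


# A small inconsistent system: x = 1, y = 1, x + y = 3.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

# Multiply the third equation by 100. The equation itself is unchanged,
# but its row norm is now much larger than the others.
scale = np.array([1.0, 1.0, 100.0])
A_scaled = A * scale[:, None]
b_scaled = b * scale

x_ls = least_squares(A, b)
x_ls_scaled = least_squares(A_scaled, b_scaled)
x_norm = row_normalized_least_squares(A, b)
x_norm_scaled = row_normalized_least_squares(A_scaled, b_scaled)

print("least squares, original rows:  ", x_ls)
print("least squares, rescaled row:   ", x_ls_scaled)
print("row-normalized, original rows: ", x_norm)
print("row-normalized, rescaled row:  ", x_norm_scaled)
```

In this sketch the plain least-squares solution shifts toward satisfying the rescaled equation, while the row-normalized solution is the same for both versions of the system.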

research
03/22/2017

Deep Exploration via Randomized Value Functions

We study the use of randomized value functions to guide deep exploration...
research
01/04/2020

Represented Value Function Approach for Large Scale Multi Agent Reinforcement Learning

In this paper, we consider the problem of large scale multi agent reinfo...
research
12/19/2013

Avoiding Confusion between Predictors and Inhibitors in Value Function Approximation

In reinforcement learning, the goal is to seek rewards and avoid punishm...
research
12/07/2021

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach

Obtaining first-order regret bounds – regret bounds scaling not as the w...
research
05/23/2018

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

We consider a team of reinforcement learning agents that concurrently op...
research
05/29/2023

VA-learning as a more efficient alternative to Q-learning

In reinforcement learning, the advantage function is critical for policy...
research
05/30/2019

On Value Functions and the Agent-Environment Boundary

When function approximation is deployed in reinforcement learning (RL), ...
