Exploring TD error as a heuristic for σ selection in Q(σ, λ)

12/21/2019
by Abhishek Nan, et al.

In the landscape of TD algorithms, Q(σ, λ) can perform multistep backups in an online manner while unifying sampling with the use of the expectation across all actions in a state. The parameter σ ∈ [0, 1] indicates the extent to which sampling is used: σ = 1 recovers a pure sampling (Sarsa-style) backup, while σ = 0 recovers a pure expectation (Expected-Sarsa-style) backup. Rather than holding σ constant or decaying it over time, its value can be selected based on characteristics of the current state. This report explores the viability of one such scheme, in which σ is chosen based on the TD error.
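As a concrete illustration, below is a minimal tabular sketch of this idea, using a one-step Q(σ) update (eligibility traces omitted) on a toy random-walk environment. The specific heuristic shown, σ = clip(scale · |δ|, 0, 1) based on the most recent TD error δ observed from a state, along with the environment, the epsilon-greedy policy, and all hyperparameters, are assumptions for illustration only and not necessarily the exact scheme studied in the report.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2
gamma, alpha, epsilon = 0.99, 0.1, 0.1

Q = np.zeros((n_states, n_actions))
last_delta = np.zeros(n_states)  # most recent TD error seen from each state

def pi(s):
    """Epsilon-greedy action probabilities in state s."""
    p = np.full(n_actions, epsilon / n_actions)
    p[np.argmax(Q[s])] += 1.0 - epsilon
    return p

def act(s):
    return rng.choice(n_actions, p=pi(s))

def sigma_of(s, scale=1.0):
    # Hypothetical heuristic: a larger recent TD error pushes sigma toward 1
    # (full sampling, Sarsa-like); a smaller one pushes it toward 0
    # (full expectation, Expected-Sarsa-like).
    return float(np.clip(scale * abs(last_delta[s]), 0.0, 1.0))

def env_step(s, a):
    """Toy random walk: terminate at either end, reward 1 on the right."""
    s2 = s + (1 if a == 1 else -1)
    done = s2 <= 0 or s2 >= n_states - 1
    return s2, float(s2 >= n_states - 1), done

for episode in range(200):
    s = n_states // 2
    a = act(s)
    done = False
    while not done:
        s2, r, done = env_step(s, a)
        a2 = act(s2)
        sigma = sigma_of(s2)
        v_bar = pi(s2) @ Q[s2]              # expectation across actions
        mixed = sigma * Q[s2, a2] + (1 - sigma) * v_bar
        target = r if done else r + gamma * mixed
        delta = target - Q[s, a]            # TD error
        Q[s, a] += alpha * delta
        last_delta[s] = delta               # feed the sigma heuristic
        s, a = s2, a2

print(np.round(Q, 3))
```

The design choice in this sketch is that σ is read from the successor state s2, so states whose values are still changing rapidly (large |δ|) are backed up by sampling, while well-estimated states use the lower-variance expected backup.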
