An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy

04/16/2014
by Gavin Taylor, et al.

Recent interest in the use of L_1 regularization in value function approximation includes Petrik et al.'s introduction of L_1-Regularized Approximate Linear Programming (RALP). RALP is unique among L_1-regularized approaches in that it approximates the optimal value function using off-policy samples. Additionally, it produces policies which outperform those of previous methods, such as LSPI. RALP's value function approximation quality is heavily affected by the choice of state-relevance weights in the objective function of the linear program, and by the distribution from which samples are drawn; however, these considerations have not been discussed in the previous literature. In this paper, we discuss and explain the effects of the choice of state-relevance weights and sampling distribution on approximation quality, using both theoretical and experimental illustrations. The results not only provide insight into these effects, but also give intuition about the types of MDPs which are especially well suited for approximation with RALP.
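As a concrete illustration of the linear program the abstract refers to, the sketch below sets up a small RALP-style problem in Python with cvxpy. The feature matrices, rewards, sample count, discount factor, and regularization bound psi are synthetic placeholders rather than values from the paper; only the overall structure mirrors the formulation: a state-relevance-weighted objective over sampled states, sampled Bellman inequality constraints, and an L_1 bound on the feature weights.

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy problem sizes (not taken from the paper).
n_samples, n_features, gamma, psi = 200, 20, 0.9, 20.0

rng = np.random.default_rng(0)
# Random features for sampled states s and next states s'; the first
# column is a constant feature, which also keeps this toy LP feasible.
phi_s  = np.hstack([np.ones((n_samples, 1)), rng.normal(size=(n_samples, n_features - 1))])
phi_s2 = np.hstack([np.ones((n_samples, 1)), rng.normal(size=(n_samples, n_features - 1))])
rewards = rng.uniform(0.0, 1.0, size=n_samples)   # sampled rewards r(s, a)
rho = np.full(n_samples, 1.0 / n_samples)         # state-relevance weights over the samples

w = cp.Variable(n_features)

# Objective: minimize the rho-weighted approximate value over the sampled states.
objective = cp.Minimize((rho @ phi_s) @ w)

# Sampled Bellman constraints  Phi(s) w >= r + gamma Phi(s') w, plus the
# L_1 constraint ||w||_1 <= psi that distinguishes RALP from plain ALP.
constraints = [phi_s @ w >= rewards + gamma * (phi_s2 @ w),
               cp.norm1(w) <= psi]

problem = cp.Problem(objective, constraints)
problem.solve()
print(problem.status)                       # expected: 'optimal'
print("weights:", np.round(w.value, 3))
```

Changing the state-relevance weights rho (uniform over the samples in this sketch) changes where approximation error is penalized in the objective, and changing how the samples themselves are drawn changes which Bellman constraints appear; these are exactly the two choices whose effect on approximation quality the paper analyzes.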

