My Brain is Full: When More Memory Helps

by Christopher Lusena, et al.

We consider the problem of finding good finite-horizon policies for POMDPs under the expected reward metric. The policies considered are free finite-memory policies with limited memory; a policy is a mapping from the space of observation-memory pairs to the space of action-memory pairs (the policy updates the memory as it goes), and the number of possible memory states is a parameter of the input to the policy-finding algorithms. The algorithms considered here are preliminary implementations of three search heuristics: local search, simulated annealing, and genetic algorithms. We compare their outcomes to each other and to the optimal policies for each instance. We compare run times of each policy and of a dynamic programming algorithm for POMDPs developed by Hansen that iteratively improves a finite-state controller, the previous state of the art for finite-memory policies. The value of the best policy can only improve as the amount of memory increases, up to the amount needed for an optimal finite-memory policy. Our most surprising finding is that more memory helps in another way: given more memory than is needed for an optimal policy, the algorithms are more likely to converge to optimal-valued policies.
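To make the policy representation concrete, the following is a minimal Python sketch of a finite-memory policy as a table from (observation, memory) pairs to (action, next memory) pairs, with an exact finite-horizon evaluation and a simple hill-climbing local search over that table. It is an illustration under stated assumptions, not the authors' implementation: the two-state toy POMDP (`T`, `O`, `R`, `START`), the convention that the agent observes the current state before acting, and the function names `evaluate` and `local_search` are all hypothetical.

```python
import itertools
import random

# Hypothetical toy POMDP (not from the paper): 2 states, 2 actions, 2 observations.
STATES, ACTIONS, OBS = [0, 1], [0, 1], [0, 1]
T = [[[0.9, 0.1], [0.1, 0.9]],   # T[s][a][s2]: transition probabilities
     [[0.1, 0.9], [0.9, 0.1]]]
O = [[0.8, 0.2], [0.2, 0.8]]     # O[s][o]: observation probabilities
R = [[1.0, 0.0], [0.0, 1.0]]     # R[s][a]: immediate reward
START = [0.5, 0.5]               # initial state distribution


def evaluate(policy, horizon):
    """Exact expected total reward of a finite-memory policy over `horizon` steps.

    `policy` maps (observation, memory) -> (action, next_memory).
    Memory starts in state 0 (an assumption of this sketch).
    """
    dist = {(s, 0): START[s] for s in STATES}  # joint distribution over (state, memory)
    total = 0.0
    for _ in range(horizon):
        nxt = {}
        for (s, m), p in dist.items():
            for o in OBS:
                a, m2 = policy[(o, m)]
                po = p * O[s][o]
                total += po * R[s][a]
                for s2 in STATES:
                    nxt[(s2, m2)] = nxt.get((s2, m2), 0.0) + po * T[s][a][s2]
        dist = nxt
    return total


def local_search(n_mem, horizon, rng, restarts=5):
    """Hill climbing with random restarts over policy tables with n_mem memory states."""
    keys = list(itertools.product(OBS, range(n_mem)))
    choices = list(itertools.product(ACTIONS, range(n_mem)))
    best, best_val = None, float("-inf")
    for _ in range(restarts):
        policy = {k: rng.choice(choices) for k in keys}
        val = evaluate(policy, horizon)
        improved = True
        while improved:  # stop at a local optimum: no single-entry change helps
            improved = False
            for k in keys:
                for c in choices:
                    cand = dict(policy)
                    cand[k] = c
                    v = evaluate(cand, horizon)
                    if v > val + 1e-12:
                        policy, val, improved = cand, v, True
        if val > best_val:
            best, best_val = policy, val
    return best, best_val


if __name__ == "__main__":
    for n_mem in (1, 2, 3):
        _, value = local_search(n_mem, horizon=4, rng=random.Random(0))
        print(n_mem, round(value, 4))
```

Here the number of memory states `n_mem` is an input to the search, mirroring the abstract's setup; running the search for several values of `n_mem` on a fixed instance is one way to probe how extra memory affects convergence.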




Solving POMDPs by Searching the Space of Finite Policies

Solving partially observable Markov decision processes (POMDPs) is highl...

Computing Complexity-aware Plans Using Kolmogorov Complexity

In this paper, we introduce complexity-aware planning for finite-horizon...

Action Selection for MDPs: Anytime AO* vs. UCT

In the presence of non-admissible heuristics, A* and other best-first al...

Stochastic Bandits with Delay-Dependent Payoffs

Motivated by recommendation problems in music streaming platforms, we pr...

Solving POMDPs by Searching in Policy Space

Most algorithms for solving POMDPs iteratively improve a value function ...

Optimal Policies for the Homogeneous Selective Labels Problem

Selective labels are a common feature of consequential decision-making a...

Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

Local Policy Search is a popular reinforcement learning approach for han...
