Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark
We propose new width-based planning and learning algorithms applied to the Atari-2600 benchmark. The algorithms are inspired by a careful analysis of the design decisions made in previous width-based planners. We benchmark our new algorithms on the Atari-2600 games and show that our best-performing algorithm, RIW_C+CPV, outperforms the previously introduced width-based planning and learning algorithms π-IW(1), π-IW(1)+ and π-HIW(n, 1). Furthermore, we present a taxonomy of the Atari-2600 games according to some of their defining characteristics. This analysis provides further insight into the behaviour and performance of the width-based algorithms introduced. In particular, for games with large branching factors and for games with sparse meaningful rewards, RIW_C+CPV outperforms π-IW(1), π-IW(1)+ and π-HIW(n, 1).