Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient RL

by Udari Madhushani, et al.

Model-free reinforcement learning (RL), in particular Q-learning, is widely used to learn optimal policies for a variety of planning and control problems. However, when the underlying state-transition dynamics are stochastic and high-dimensional, Q-learning requires a large amount of data and incurs a prohibitively high computational cost. In this paper, we introduce Hamiltonian Q-Learning, a data-efficient modification of Q-learning that adopts an importance-sampling-based technique for computing the Q function. To exploit the stochastic structure of the state-transition dynamics, we employ Hamiltonian Monte Carlo to update Q function estimates by approximating the expected future rewards using Q values associated with a subset of next states. Further, to exploit the latent low-rank structure of the dynamic system, Hamiltonian Q-Learning uses a matrix completion algorithm to reconstruct the updated Q function from Q value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply Q-learning in stochastic, high-dimensional problems, the proposed approach broadens the scope of RL algorithms for real-world applications, including classical control tasks and environmental monitoring.
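The abstract combines two steps: a Bellman backup whose expectation over next states is approximated with Hamiltonian Monte Carlo (HMC) samples, and a matrix-completion step that fills in Q values for state-action pairs that were not updated. The following is a minimal Python/NumPy sketch of that loop under loudly stated assumptions, not the authors' implementation. It assumes a hypothetical toy "ring world" of 30 states with a Gaussian next-state density rounded onto the grid, a textbook HMC sampler with unit mass, a plain average of the HMC samples (no importance weights), and iterative hard-impute (truncated SVD) as a stand-in for the unspecified matrix-completion algorithm. The function names `hmc_gaussian`, `complete_low_rank`, and `hamiltonian_q_sweep` and every parameter value are illustrative choices.

```python
import numpy as np


def hmc_gaussian(mu, sigma, n_samples, rng, step=0.2, n_leapfrog=10):
    """Draw n_samples from N(mu, sigma^2) with a basic HMC chain (unit mass)."""
    eps = step * sigma  # leapfrog step scaled to the width of the target

    def potential(x):  # U(x) = -log p(x) up to a constant
        return 0.5 * ((x - mu) / sigma) ** 2

    def grad_potential(x):
        return (x - mu) / sigma ** 2

    x, samples = float(mu), np.empty(n_samples)
    for i in range(n_samples):
        p0 = rng.normal()  # fresh momentum each iteration
        x_new = x
        p = p0 - 0.5 * eps * grad_potential(x_new)  # half step in momentum
        for k in range(n_leapfrog):
            x_new = x_new + eps * p  # full step in position
            if k < n_leapfrog - 1:
                p = p - eps * grad_potential(x_new)  # full step in momentum
        p = p - 0.5 * eps * grad_potential(x_new)  # final half step
        # Metropolis accept/reject corrects the leapfrog discretization error
        d_h = (potential(x) + 0.5 * p0 ** 2) - (potential(x_new) + 0.5 * p ** 2)
        if rng.uniform() < np.exp(min(0.0, d_h)):
            x = x_new
        samples[i] = x
    return samples


def complete_low_rank(M, mask, rank, init=None, n_iters=100):
    """Iterative hard-impute: keep entries where mask is True, fill the rest
    with a rank-`rank` truncated-SVD estimate."""
    fill = M[mask].mean() if init is None else init
    X = np.where(mask, M, fill)
    for _ in range(n_iters):
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
        X = np.where(mask, M, low_rank)  # re-impose the observed entries
    return X


def hamiltonian_q_sweep(Q, rewards, shifts, gamma, sigma, frac, rank, rng,
                        n_samples=20):
    """Bellman backups on a random subset of (s, a) pairs, using HMC samples
    of the next state, then low-rank completion of the other entries."""
    n_states = Q.shape[0]
    V = Q.max(axis=1)  # greedy value of each state under the current Q
    mask = rng.uniform(size=Q.shape) < frac
    if not mask.any():
        return Q.copy()
    targets = np.zeros_like(Q)
    for s, a in zip(*np.nonzero(mask)):
        xs = hmc_gaussian(s + shifts[a], sigma, n_samples, rng)
        nxt = np.rint(xs).astype(int) % n_states  # snap samples onto the ring
        targets[s, a] = rewards[s, a] + gamma * V[nxt].mean()
    # Warm-start unobserved entries from the previous Q estimate
    return complete_low_rank(targets, mask, rank, init=Q)


# Hypothetical toy problem: 30 states on a ring, actions move left / stay / right
rng = np.random.default_rng(0)
n_states, shifts = 30, np.array([-1, 0, 1])
angles = 2 * np.pi * np.arange(n_states) / n_states
rewards = np.cos(angles)[:, None] - 0.1 * np.abs(shifts)[None, :]
Q = np.zeros((n_states, len(shifts)))
for _ in range(30):
    Q = hamiltonian_q_sweep(Q, rewards, shifts, gamma=0.9, sigma=1.0,
                            frac=0.5, rank=2, rng=rng)
```

Warm-starting the completion from the previous estimate keeps a row that receives no backup in a given sweep from being reset to an arbitrary fill value. The rank must also be smaller than the number of actions: with `rank=3` here, the truncated SVD reproduces its input exactly and the unobserved entries would never change.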



