Minimax Optimal Q Learning with Nearest Neighbors

08/03/2023
by Puning Zhao, et al.

Q learning is a popular model-free reinforcement learning method. Most existing works analyze Q learning for finite state and action spaces. If the state space is continuous, the original Q learning method cannot be applied directly. A modification of the original Q learning method was proposed in (Shah and Xie, 2018), which estimates Q values with nearest neighbors and thus makes Q learning suitable for continuous state spaces. (Shah and Xie, 2018) shows that the convergence rate of the estimated Q function is Õ(T^(-1/(d+3))), which is slower than the minimax lower bound Ω̃(T^(-1/(d+2))), indicating that this method is not efficient. This paper proposes two new Q learning methods, one offline and one online, to close the gap in convergence rates left by (Shah and Xie, 2018). Although we still use a nearest neighbor approach to estimate the Q function, our algorithms differ crucially from (Shah and Xie, 2018). In particular, we replace their kernel nearest neighbor estimate over discretized regions with a direct nearest neighbor approach. Consequently, our approach significantly improves the convergence rate, and the time complexity is also significantly reduced in high dimensional state spaces. Our analysis shows that both the offline and online methods are minimax rate optimal.
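As context for the nearest-neighbor idea described above, the sketch below shows a minimal way to estimate Q values for a continuous state space and a finite action set by averaging over the k nearest previously visited states. The class name, the neighbor count k, the discount factor, and the simple one-step bootstrapped update are illustrative assumptions; this is not the paper's offline or online algorithm.

```python
import numpy as np

class NearestNeighborQ:
    """Illustrative nearest-neighbor Q-value estimator for a continuous
    state space in R^d and a finite action set (not the paper's algorithm)."""

    def __init__(self, num_actions, k=1, gamma=0.95):
        self.num_actions = num_actions
        self.k = k          # number of neighbors to average over (assumed)
        self.gamma = gamma  # discount factor (assumed)
        # per action: list of visited states and their Q-value targets
        self.states = [[] for _ in range(num_actions)]
        self.q_vals = [[] for _ in range(num_actions)]

    def q_value(self, state, action):
        # average the stored targets of the k nearest stored states
        if not self.states[action]:
            return 0.0  # default before any data is observed
        state = np.asarray(state, dtype=float)
        dists = np.linalg.norm(np.asarray(self.states[action]) - state, axis=1)
        idx = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.q_vals[action])[idx]))

    def max_q(self, state):
        return max(self.q_value(state, a) for a in range(self.num_actions))

    def update(self, state, action, reward, next_state):
        # one-step Q-learning target built from the nearest-neighbor estimate
        target = reward + self.gamma * self.max_q(next_state)
        self.states[action].append(np.asarray(state, dtype=float))
        self.q_vals[action].append(target)
```

The key design choice sketched here matches the abstract's contrast: Q values are read off directly from nearest raw samples rather than from kernel-weighted averages on a discretized grid.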
