
Provably Efficient Reinforcement Learning with Aggregated States
We establish that an optimistic variant of Q-learning applied to a finite-horizon episodic Markov decision process with an aggregated state representation incurs regret Õ(√(H^5 M K) + ϵ HK), where H is the horizon, M is the number of aggregate states, K is the number of episodes, and ϵ is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions. To the best of our knowledge, this is the first such result pertaining to a reinforcement learning algorithm applied with non-trivial value function approximation without any restrictions on the Markov decision process.
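To make the setup concrete, here is a minimal sketch of what UCB-style optimistic Q-learning over an aggregated state space can look like. This is not the paper's exact algorithm: the aggregation map `phi`, the environment interface (`reset`/`step`), the learning-rate schedule, and the form of the exploration bonus are all illustrative assumptions. The key points it illustrates are that all tables are indexed by the M aggregate states rather than raw states, and that optimistic initialization plus a count-based bonus drives exploration.

```python
import numpy as np

def optimistic_q_learning_aggregated(env, phi, M, A, H, K, c=1.0):
    """Sketch of UCB-style optimistic Q-learning over aggregate states.

    env : episodic MDP with reset() -> state and step(a) -> (state, reward)
          (hypothetical interface, assumed for illustration)
    phi : aggregation map, raw state -> aggregate index in {0, ..., M-1}
    M, A, H, K : number of aggregate states, actions, horizon, episodes
    c   : bonus scale (assumed; the analysis would dictate the constant)
    """
    # Tables sized by aggregate states M, not by the raw state space.
    Q = np.full((H, M, A), float(H))       # optimistic initialization at H
    N = np.zeros((H, M, A), dtype=int)     # visit counts per (step, agg-state, action)
    total_reward = 0.0

    for _ in range(K):
        s = env.reset()
        for h in range(H):
            m = phi(s)
            a = int(np.argmax(Q[h, m]))    # act greedily w.r.t. optimistic Q
            s_next, r = env.step(a)
            total_reward += r

            N[h, m, a] += 1
            t = N[h, m, a]
            alpha = (H + 1) / (H + t)      # step-size schedule (assumed form)
            bonus = c * np.sqrt(H**3 / t)  # count-based exploration bonus (assumed form)

            # Bootstrapped target, clipped at H to stay within the value range.
            v_next = min(H, Q[h + 1, phi(s_next)].max()) if h + 1 < H else 0.0
            Q[h, m, a] = (1 - alpha) * Q[h, m, a] + alpha * (r + v_next + bonus)
            s = s_next

    return Q, total_reward
```

Because Q and N are indexed only by the aggregate index `phi(s)`, memory and per-step cost scale with M and A rather than with the raw state count, which mirrors why the regret bound above is independent of the number of states and actions (up to the aggregation-error term ϵHK).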