
Provably Efficient Reinforcement Learning with Aggregated States
We establish that an optimistic variant of Q-learning applied to a finite-horizon episodic Markov decision process with an aggregated state representation incurs regret Õ(√(H^5 M K) + ϵ HK), where H is the horizon, M is the number of aggregate states, K is the number of episodes, and ϵ is the largest difference between any pair of optimal state-action values associated with a common aggregate state. Notably, this regret bound does not depend on the number of states or actions. To the best of our knowledge, this is the first such result pertaining to a reinforcement learning algorithm applied with non-trivial value function approximation without any restrictions on the Markov decision process.
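To make the setup concrete, here is a minimal sketch of what UCB-style optimistic Q-learning over an aggregated state space can look like. This is not the paper's exact algorithm: the aggregation map `phi`, the environment interface (`reset`/`step`), the learning-rate schedule, and the form of the exploration bonus are all illustrative assumptions. The key points it illustrates are that all tables are indexed by the M aggregate states rather than raw states, and that optimistic initialization plus a count-based bonus drives exploration.

```python
import numpy as np

def optimistic_q_learning_aggregated(env, phi, M, A, H, K, c=1.0):
    """Sketch of UCB-style optimistic Q-learning over aggregate states.

    env : episodic MDP with reset() -> state and step(a) -> (state, reward)
          (hypothetical interface, assumed for illustration)
    phi : aggregation map, raw state -> aggregate index in {0, ..., M-1}
    M, A, H, K : number of aggregate states, actions, horizon, episodes
    c   : bonus scale (assumed; the analysis would dictate the constant)
    """
    # Tables sized by aggregate states M, not by the raw state space.
    Q = np.full((H, M, A), float(H))       # optimistic initialization at H
    N = np.zeros((H, M, A), dtype=int)     # visit counts per (step, agg-state, action)
    total_reward = 0.0

    for _ in range(K):
        s = env.reset()
        for h in range(H):
            m = phi(s)
            a = int(np.argmax(Q[h, m]))    # act greedily w.r.t. optimistic Q
            s_next, r = env.step(a)
            total_reward += r

            N[h, m, a] += 1
            t = N[h, m, a]
            alpha = (H + 1) / (H + t)      # step-size schedule (assumed form)
            bonus = c * np.sqrt(H**3 / t)  # count-based exploration bonus (assumed form)

            # Bootstrapped target, clipped at H to stay within the value range.
            v_next = min(H, Q[h + 1, phi(s_next)].max()) if h + 1 < H else 0.0
            Q[h, m, a] = (1 - alpha) * Q[h, m, a] + alpha * (r + v_next + bonus)
            s = s_next

    return Q, total_reward
```

Because Q and N are indexed only by the aggregate index `phi(s)`, memory and per-step cost scale with M and A rather than with the raw state count, which mirrors why the regret bound above is independent of the number of states and actions (up to the aggregation-error term ϵHK).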