Differentially Private Exploration in Reinforcement Learning with Linear Representation

12/02/2021
by Paul Luyo, et al.

This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration. Through this framework, we prove an O(K^{3/4}/√ϵ) regret bound for (ϵ,δ)-local DP exploration and an O(√(K/ϵ)) regret bound for (ϵ,δ)-joint DP exploration. We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide an O(√(K/ϵ)) regret bound for (ϵ,δ)-joint DP, with a novel algorithm based on low-switching. Finally, we provide insights into the issues of designing local DP algorithms in this model-free setting.
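To make the gap between the two stated rates concrete, the following minimal sketch evaluates only the leading terms of the bounds, with all constants and lower-order factors dropped. It assumes K is the number of episodes, which the abstract does not define, and the function names are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def local_dp_rate(K: int, eps: float) -> float:
    """Leading term of the local DP bound, K^(3/4) / sqrt(eps). Constants are dropped."""
    return K ** 0.75 / math.sqrt(eps)


def joint_dp_rate(K: int, eps: float) -> float:
    """Leading term of the joint DP bound, sqrt(K / eps). Constants are dropped."""
    return math.sqrt(K / eps)


def rate_ratio(K: int, eps: float) -> float:
    """Ratio of the local rate to the joint rate; algebraically this is K^(1/4)."""
    return local_dp_rate(K, eps) / joint_dp_rate(K, eps)


if __name__ == "__main__":
    # Illustrative values only; K is assumed to be the number of episodes.
    for K in (10**2, 10**4, 10**6):
        for eps in (0.1, 1.0):
            print(K, eps, local_dp_rate(K, eps), joint_dp_rate(K, eps), rate_ratio(K, eps))
```

Under these assumptions the ratio of the two leading terms is K^{1/4} regardless of ϵ, so the stated price of local over joint privacy grows with K but does not depend on the privacy level. Both rates are sublinear in K.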


