Differentially Private Regret Minimization in Episodic Markov Decision Processes

12/20/2021
by Sayak Ray Chowdhury, et al.

We study regret minimization in finite-horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP). The work is motivated by the widespread use of reinforcement learning (RL) in real-world sequential decision-making problems, where protecting users' sensitive and private information is becoming paramount. We consider two variants of DP: joint DP (JDP), where a centralized agent is responsible for protecting users' sensitive data, and local DP (LDP), where information must be protected directly on the user side. We first propose two general frameworks, one for policy optimization and another for value iteration, for designing private, optimistic RL algorithms. We then instantiate these frameworks with suitable privacy mechanisms to satisfy the JDP and LDP requirements while simultaneously obtaining sublinear regret guarantees. The regret bounds show that under JDP the cost of privacy is only a lower-order additive term, whereas under the stronger privacy protection of LDP the cost is multiplicative. Finally, the regret bounds are obtained through a unified analysis which, we believe, can be extended beyond tabular MDPs.
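
To give some intuition for why a centralized model of privacy can be cheaper than a local one, the following is a minimal sketch, not the mechanisms used in the paper. It assumes a one-shot counting task (how many of K users visited some state) and uses the standard Laplace mechanism; the function names and the parameters `epsilon`, `trials`, and `seed` are hypothetical. In the centralized variant a trusted aggregator adds noise once to the true count, while in the local variant every user perturbs their own bit before sending it, so the noise accumulates across users.

```python
import random


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def central_count(bits, epsilon, rng):
    """Trusted aggregator: sum the true bits, then add one Laplace noise draw.

    Changing one user's bit changes the sum by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    return sum(bits) + laplace(1.0 / epsilon, rng)


def local_count(bits, epsilon, rng):
    """Local model: every user adds Laplace noise to their own bit.

    The aggregator only ever sees noisy bits, so the total carries
    len(bits) independent noise draws.
    """
    return sum(b + laplace(1.0 / epsilon, rng) for b in bits)


def mean_abs_error(estimator, bits, epsilon, trials=200, seed=0):
    """Average absolute error of an estimator over repeated noisy runs."""
    rng = random.Random(seed)
    truth = sum(bits)
    errors = [abs(estimator(bits, epsilon, rng) - truth) for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    bits = [1 if i % 3 == 0 else 0 for i in range(1000)]  # synthetic visits
    print("central error:", mean_abs_error(central_count, bits, epsilon=1.0))
    print("local error:  ", mean_abs_error(local_count, bits, epsilon=1.0))
```

In this toy setting the centralized error stays on the order of 1/epsilon regardless of the number of users, whereas the local error grows roughly with the square root of the number of users. This is only an analogy for the additive versus multiplicative cost described above; the episodic RL setting requires releasing statistics continually across episodes, which this sketch does not model.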

research
01/18/2022

Differentially Private Reinforcement Learning with Linear Function Approximation

Motivated by the wide adoption of reinforcement learning (RL) in real-wo...
research
10/15/2020

Local Differentially Private Regret Minimization in Reinforcement Learning

Reinforcement learning algorithms are widely used in domains where it is...
research
12/02/2021

Differentially Private Exploration in Reinforcement Learning with Linear Representation

This paper studies privacy-preserving exploration in Markov Decision Pro...
research
06/01/2023

Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards

In this paper, we study the problem of (finite horizon tabular) Markov d...
research
09/18/2020

Private Reinforcement Learning with PAC and Regret Guarantees

Motivated by high-stakes decision-making domains like personalized medic...
research
08/26/2021

Adaptive Control of Differentially Private Linear Quadratic Systems

In this paper, we study the problem of regret minimization in reinforcem...
research
02/02/2022

Improved Regret for Differentially Private Exploration in Linear MDP

We study privacy-preserving exploration in sequential decision-making fo...
