Private Reinforcement Learning with PAC and Regret Guarantees

09/18/2020
by Giuseppe Vietri, et al.

Motivated by high-stakes decision-making domains such as personalized medicine, where user information is inherently sensitive, we design privacy-preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives their own set of outputs (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds and enjoys a JDP guarantee. Our algorithm pays only a moderate privacy cost for exploration: compared with the non-private bounds, the privacy parameter appears only in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
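
For readers unfamiliar with JDP, the following is a sketch of the standard definition from the joint differential privacy literature, not a statement taken from this abstract. The notation is assumed for illustration: M is a mechanism mapping n users' data to n per-user outputs, D and D' are datasets differing only in user i's data, M(D)_{-i} denotes the outputs sent to all users except i, and epsilon is the privacy parameter. The paper's own formulation for episodic RL may differ in detail.

```latex
% Sketch of epsilon-joint differential privacy (assumed notation, see lead-in).
% M : \mathcal{U}^n \to \mathcal{O}^n is \varepsilon-JDP if, for every user i,
% every pair of datasets D, D' differing only in user i's data,
% and every event E \subseteq \mathcal{O}^{n-1},
\[
  \Pr\bigl[ M(D)_{-i} \in E \bigr]
  \;\le\;
  e^{\varepsilon} \, \Pr\bigl[ M(D')_{-i} \in E \bigr].
\]
```

Read this way, the outputs shown to every other user must be nearly insensitive to user i's data, while user i's own output may depend on it freely. This is what lets each user receive a personalized recommendation.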

02/02/2022

Improved Regret for Differentially Private Exploration in Linear MDP

We study privacy-preserving exploration in sequential decision-making fo...
06/02/2022

Offline Reinforcement Learning with Differential Privacy

The offline reinforcement learning (RL) problem is often motivated by th...
10/15/2020

Local Differentially Private Regret Minimization in Reinforcement Learning

Reinforcement learning algorithms are widely used in domains where it is...
01/30/2019

Private Q-Learning with Functional Noise in Continuous Spaces

We consider privacy-preserving algorithms for deep reinforcement learnin...
03/18/2022

Privacy-Preserving Reinforcement Learning Beyond Expectation

Cyber and cyber-physical systems equipped with machine learning algorith...
03/27/2018

Privacy-preserving Prediction

Ensuring differential privacy of models learned from sensitive user data...
11/06/2021

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

We study risk-sensitive reinforcement learning (RL) based on the entropi...