Private Reinforcement Learning with PAC and Regret Guarantees

by Giuseppe Vietri, et al.

Motivated by high-stakes decision-making domains like personalized medicine, where user information is inherently sensitive, we design privacy-preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP), a strong variant of differential privacy for settings where each user receives their own set of outputs (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds and enjoys a JDP guarantee. Our algorithm pays only a moderate privacy cost for exploration: in comparison to the non-private bounds, the privacy parameter appears only in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
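To make the "private optimism" idea concrete, here is a minimal illustrative sketch: visit counts are released through a noise-adding mechanism, and a UCB-style exploration bonus is computed from the privatized counts. This is an assumption-laden simplification, not the paper's algorithm — the paper uses more careful private counters (and a JDP accounting) rather than fresh Laplace noise per release, and the bonus shown is a generic optimism term, not the paper's exact confidence bound.

```python
import numpy as np


def noisy_count(true_count, epsilon, rng):
    """Release a visit count with Laplace noise of scale 1/epsilon
    (illustrative only; the paper's JDP guarantee relies on more
    refined private counting mechanisms)."""
    return true_count + rng.laplace(scale=1.0 / epsilon)


def optimistic_bonus(private_count, horizon, delta=0.05):
    """Generic UCB-style bonus computed from the privatized count.
    Clipping the count below at 1 keeps the bonus finite even when
    privacy noise drives the released count near or below zero."""
    n = max(private_count, 1.0)
    return horizon * np.sqrt(np.log(1.0 / delta) / n)


rng = np.random.default_rng(0)
n_private = noisy_count(true_count=100, epsilon=1.0, rng=rng)
bonus = optimistic_bonus(n_private, horizon=5)
```

The qualitative behavior matches the abstract's claim: since the noise scale is a constant (independent of the count), the privacy perturbation becomes a lower-order term in the bonus as the number of visits grows.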


Related papers:

- Improved Regret for Differentially Private Exploration in Linear MDP
- Offline Reinforcement Learning with Differential Privacy
- Local Differentially Private Regret Minimization in Reinforcement Learning
- Private Q-Learning with Functional Noise in Continuous Spaces
- Privacy-Preserving Reinforcement Learning Beyond Expectation
- Privacy-preserving Prediction
- Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning