Improved Regret for Differentially Private Exploration in Linear MDP

02/02/2022
by Dung Daniel Ngo, et al.

We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem by Luyo et al. (2021) achieves a regret rate with a dependence of O(K^{3/5}) on the number of episodes K. We provide a private algorithm with an improved regret rate that has an optimal dependence of O(√K) on the number of episodes. The key ingredient in our stronger regret guarantee is an adaptive policy update schedule, in which an update occurs only when a sufficient change in the data is detected. As a result, our algorithm has low switching cost and performs only O(log(K)) updates, which greatly reduces the amount of privacy noise. Finally, in the most prevalent privacy regimes, where the privacy parameter ϵ is a constant, our algorithm incurs negligible privacy cost: compared with existing non-private regret bounds, the additional regret due to privacy appears only in lower-order terms.
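The adaptive update schedule can be illustrated with a small sketch. The abstract does not state the exact trigger, so this example assumes a determinant-doubling rule of the kind commonly used in the low-switching literature: the policy is recomputed only when the determinant of the regularized Gram matrix of observed features has grown by a constant factor since the last update. The function names `count_policy_updates` and `unit_features`, the parameters `lam` and `growth`, and the random unit-norm features are all hypothetical, and no privacy noise is added here. The sketch only counts how often such a trigger fires.

```python
import math
import random


def count_policy_updates(features, lam=1.0, growth=2.0):
    """Count triggers of an assumed determinant-growth update rule.

    An update fires whenever det(Gram) has grown by at least `growth`
    since the last update, where Gram = lam * I + sum of phi phi^T.
    """
    d = len(features[0])
    # Inverse of the regularized Gram matrix, starting from (lam * I)^{-1}.
    inv = [[(1.0 / lam) if i == j else 0.0 for j in range(d)] for i in range(d)]
    log_det = d * math.log(lam)
    log_det_at_update = log_det
    updates = 0
    for phi in features:
        # v = inv @ phi and q = phi^T inv phi.
        v = [sum(inv[i][j] * phi[j] for j in range(d)) for i in range(d)]
        q = sum(phi[i] * v[i] for i in range(d))
        # Matrix determinant lemma: det(A + phi phi^T) = det(A) * (1 + q).
        log_det += math.log1p(q)
        # Sherman-Morrison rank-one update of the inverse.
        for i in range(d):
            for j in range(d):
                inv[i][j] -= v[i] * v[j] / (1.0 + q)
        # Fire an update only when enough new information has accumulated.
        if log_det - log_det_at_update >= math.log(growth):
            updates += 1
            log_det_at_update = log_det
    return updates


def unit_features(num_episodes, dim, seed=0):
    """Hypothetical stand-in for per-episode feature vectors (unit norm)."""
    rng = random.Random(seed)
    out = []
    for _ in range(num_episodes):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in x))
        out.append([c / norm for c in x])
    return out


if __name__ == "__main__":
    for K in (100, 1000, 10000):
        print(K, count_policy_updates(unit_features(K, dim=3)))
```

With unit-norm features and `lam = 1`, the log-determinant can grow by at most d·log(1 + K/d) over K episodes, so under this assumed rule the number of updates is at most d·log2(1 + K/d). That is logarithmic in K, matching the O(log(K)) update count described above. Fewer updates mean fewer releases of statistics that need privacy noise.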


