Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients

by   Chris Cundy, et al.

As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions at a given timestep. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies which hide the sensitive state.


page 1

page 2

page 3

page 4


Learning to Share and Hide Intentions using Information Regularization

Learning to cooperate with friends and compete with foes is a key compon...

Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer

In this work, we develop a novel regularizer to improve the learning of ...

Differentially Private Reinforcement Learning with Linear Function Approximation

Motivated by the wide adoption of reinforcement learning (RL) in real-wo...

Distributed Differentially Private Mutual Information Ranking and Its Applications

Computation of Mutual Information (MI) helps understand the amount of in...

Privacy Preserving Off-Policy Evaluation

Many reinforcement learning applications involve the use of data that is...

Invariant Representations with Stochastically Quantized Neural Networks

Representation learning algorithms offer the opportunity to learn invari...

Mutual Information Maximization for Robust Plannable Representations

Extending the capabilities of robotics to real-world complex, unstructur...