Composition Properties of Inferential Privacy for Time-Series Data

07/10/2017 ∙ by Shuang Song, et al. ∙ University of California, San Diego

With the proliferation of mobile devices and the internet of things, developing principled solutions for privacy in time series applications has become increasingly important. While differential privacy is the gold standard for database privacy, many time series applications require a different kind of guarantee, and a number of recent works have used some form of inferential privacy to address these situations. However, a major barrier to using inferential privacy in practice is its lack of graceful composition: even if the same or related sensitive data is used in multiple releases that are safe individually, the combined release may have poor privacy properties. In this paper, we study composition properties of a form of inferential privacy called Pufferfish when applied to time-series data. We show that while general Pufferfish mechanisms may not compose gracefully, a specific, recently introduced Pufferfish mechanism called the Markov Quilt Mechanism has strong composition properties, comparable to those of pure differential privacy, when applied to time series data.


I Introduction

With the proliferation of mobile devices and the internet of things, large amounts of time series data are being collected, stored and mined to draw inferences about the physical environment. Examples include activity recordings of elderly patients to determine the state of their health, power consumption data of residential and commercial buildings to predict power demand responses, location trajectories of users over time to deliver suitable advertisements, among many others. Much of this information is extremely sensitive – activity recordings yield information about what the patient is doing all day, power consumption of a residence can reveal occupancy, and location trajectories can reveal activities of the subjects. It is therefore imperative to develop principled and rigorous solutions that address privacy in these kinds of time series applications.

The gold standard for privacy in database applications has long been differential privacy [2]; the typical setting is that each record corresponds to the private value of a single person, and the goal is to design algorithms that can compute functions such as classifiers and clusterings on the sensitive data, while hiding the participation of a single person. Differential privacy has many good properties, such as post-processing invariance and graceful composition, which have led to its high popularity and practical use over the years.

Unfortunately, many of the time-series applications described above require a different kind of privacy guarantee. Consider the physical activity monitoring application, for example, where the goal is to hide activity at small time intervals while revealing long-term activity patterns. Here the entire dataset is about a single patient, and hence hiding their participation will not be useful. An alternative is entry differential privacy, which hides the inclusion of activity at any given single time point in the data; however, since activities at close-by time points are highly correlated, this will not prevent an adversary from inferring the activity at the hidden time. To address these issues, a number of recent works [9, 8, 4, 10] have used the notion of inferential privacy, where the goal is to prevent an adversary who has some prior knowledge from inferring the state of the time series at any particular time.

A clean and elegant framework for inferential privacy is Pufferfish [7], which is our privacy framework of choice. Pufferfish models a privacy problem through a triple $(\mathcal{S}, \mathcal{Q}, \Theta)$; here $\mathcal{S}$ is a set of secrets, which is a set of potential facts that we may wish to hide. $\mathcal{Q}$ is a set of tuples of the form $(s_i, s_j)$ where $s_i, s_j \in \mathcal{S}$, which represent which pairs of secrets should be indistinguishable to an adversary. Finally, $\Theta$ is a set of distributions that can plausibly generate the data and describes prior beliefs of an adversary. A mechanism is said to satisfy $\epsilon$-Pufferfish privacy in the framework $(\mathcal{S}, \mathcal{Q}, \Theta)$ if an adversary's posterior odds of every pair of secrets $(s_i, s_j)$ in $\mathcal{Q}$ are within a factor of $e^{\epsilon}$ of its prior odds. Pufferfish models the physical activity monitoring application as follows: $\mathcal{S}$ consists of elements of the form $s_a^t$, which represent that the patient has activity $a$ at time $t$; $\mathcal{Q}$ consists of tuples of the form $(s_a^t, s_b^t)$ for all $t$ and all activity pairs $(a, b)$; and $\Theta$ consists of a set of Markov Chains that describe how activities transition across time.

However, a major limitation of Pufferfish privacy is that, except under very special conditions, it often does not compose gracefully: even if the same or related sensitive data is used in multiple Pufferfish releases that are individually safe, the combined release may have poor privacy guarantees [7]. In many real applications, the same or related data is often used across applications, and this forms a major barrier to the practical applicability of Pufferfish.

In this paper, we study this question, and we show a number of composition results for Pufferfish privacy for time series applications in the framework described above. Our results look at two scenarios – sequential and parallel composition; the first is when the same sensitive data is used across multiple computations, and the second is when disjoint sections of the Markov Chain are used in different computations. Note that while in differential privacy, composition in the second case is trivial, this does not apply to Pufferfish, as information about the state of one segment of a Markov Chain can leak information about a correlated segment.

For sequential composition, we show that while in general we cannot expect an arbitrary Pufferfish mechanism to compose gracefully even for the time series framework described above, a specific mechanism, called the Markov Quilt Mechanism, that was recently introduced by [9], does compose linearly, much like pure differential privacy. For parallel composition, we provide two results. First, we show a general result that applies to any Pufferfish mechanism in the framework described above: the privacy guarantee obtained from two releases on two disjoint segments of the Markov Chain is the worse of the two guarantees plus a correction factor that depends on the distance between the segments and on properties of the chain. Second, we show that if the two segments of the chain are far enough apart, then, under some mild conditions, using a specific version of the Markov Quilt Mechanism can provide even better parallel composition guarantees, matching those of differential privacy. Our results thus demonstrate that the Markov Quilt Mechanism and its versions have strong composition properties when applied to Markov Chains, thus motivating their use for real time-series applications.

I-A Related Work

Since graceful composition is a critical property of any privacy definition, there has been a significant amount of work on differential privacy composition [2], and it is known to compose rather gracefully. [2] shows that pure differential privacy composes linearly under sequential composition; for parallel composition, the guarantees are even better, and the combined privacy guarantee is the worst of the guarantees offered by the individual releases. [3] shows that a variant of differential privacy, called approximate differential privacy, has even better sequential composition properties than pure differential privacy. Optimal composition guarantees for both pure and approximate differential privacy are established by [6]. Finally, [1] provides a method for numerically calculating privacy guarantees obtained from composing a number of approximate differentially private mechanisms.

In contrast, little is known about the composition properties of inferential privacy. [7] provides examples to show that Pufferfish may not sequentially compose, except in some very special cases. [5] shows that a specialized version of Pufferfish, called Blowfish, which is somewhat closer to differential privacy does have graceful sequential composition properties; however, Blowfish does not apply to time-series data. [9] provides limited privacy guarantees for the Markov Quilt Mechanism under serial composition; however, these guarantees are worse than linear, and they only apply under much more stringent conditions – namely, if all the mechanisms use the same active Markov Quilt.

II Preliminaries

II-A Time Series Data and Markov Chains

It is common to model time-series data as Markov chains.

Example 1. Suppose we have data tracking the physical activity of a subject: $X = \{X_1, X_2, \ldots, X_T\}$, where $X_t$ denotes the activity (e.g., running, sitting, etc.) of the subject at time $t$. Our goal is to provide the aggregate activity pattern of the subject by releasing (an approximate) histogram, while preventing an adversary from finding out what the subject was doing at a specific time $t$.

Example 2. Suppose we have power consumption data for a house: $X = \{X_1, X_2, \ldots, X_T\}$, where $X_t$ is the power level in Watts at time $t$. Our goal is to output a general power consumption pattern of the household by releasing (an approximate) histogram of the power levels, while preventing an adversary from inferring the power level at a specific time $t$; specific power levels may be sensitive information, as the presence or absence of family members at a given time can be inferred from the power level.

Markov Chains. Temporal correlation in this kind of time-series data is usually captured by a Markov chain $X_1 \to X_2 \to \cdots \to X_T$, where $\mathcal{X}$ represents all possible states and $X_t \in \mathcal{X}$ represents the state at time $t$. In Example 1, $X_t$ represents the activity performed by the subject at time $t$ and $\mathcal{X}$ represents all possible activities. In Example 2, $X_t$ represents the power level of the house at time $t$ and $\mathcal{X}$ represents all possible power levels. The transition from one state to another is determined by a transition matrix $P$, and the state of $X_1$ is drawn from an initial distribution $q$.
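As an illustration of this model, the following minimal Python sketch samples a chain from an initial distribution and a transition matrix and builds the exact histogram that Examples 1 and 2 would release in noisy form. The two-state chain, the function names `sample_chain` and `histogram`, and the numeric values are assumptions made for illustration; they are not taken from the paper.

```python
import random

def sample_chain(q, P, T, seed=0):
    """Draw X_1..X_T with X_1 ~ q and X_{t+1} ~ P[X_t] (states are 0..k-1)."""
    rng = random.Random(seed)
    states = list(range(len(q)))
    x = rng.choices(states, weights=q)[0]
    chain = [x]
    for _ in range(T - 1):
        x = rng.choices(states, weights=P[x])[0]
        chain.append(x)
    return chain

def histogram(chain, k):
    """Exact (non-private) count of visits to each of the k states."""
    counts = [0] * k
    for x in chain:
        counts[x] += 1
    return counts

# Hypothetical two-state chain, e.g. 0 = "sitting", 1 = "running".
q = [0.5, 0.5]
P = [[0.9, 0.1],
     [0.2, 0.8]]
chain = sample_chain(q, P, T=100)
hist = histogram(chain, k=2)
```

The histogram is the quantity an analyst wants; the individual entries of `chain` are what the privacy framework below is meant to protect.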

II-B The Pufferfish Privacy Framework

The Pufferfish privacy framework captures the privacy requirements in these examples. We next define it and specify how the examples above fit in.

A Pufferfish framework is specified by three parameters: a set of secrets $\mathcal{S}$, a set of secret pairs $\mathcal{Q}$ and a set of data distributions $\Theta$. $\mathcal{S}$ consists of possible facts about the data that need to be protected. $\mathcal{Q} \subseteq \mathcal{S} \times \mathcal{S}$ is a set of secret pairs that we want to be indistinguishable. $\Theta$ is a set of distributions that can plausibly generate the data and captures the correlation among records; each $\theta \in \Theta$ represents one adversary's belief about the data. The goal of the Pufferfish framework is to ensure indistinguishability of the secret pairs in $\mathcal{Q}$ under any belief in $\Theta$. Now we define Pufferfish privacy under the framework $(\mathcal{S}, \mathcal{Q}, \Theta)$.

Definition II.1 (Pufferfish Privacy)

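A privacy mechanism $M$ is said to be $\epsilon$-Pufferfish private in a framework $(\mathcal{S}, \mathcal{Q}, \Theta)$ if for datasets $X \sim \theta$ where $\theta \in \Theta$, for all secret pairs $(s_i, s_j) \in \mathcal{Q}$ and for all $w \in \mathrm{Range}(M)$, we have

$$e^{-\epsilon} \le \frac{P(M(X) = w \mid s_i, \theta)}{P(M(X) = w \mid s_j, \theta)} \le e^{\epsilon} \qquad (1)$$

when $s_i$ and $s_j$ are such that $P(s_i \mid \theta) \neq 0$, $P(s_j \mid \theta) \neq 0$.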
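The definition can be checked by brute force on a very small instance. The sketch below is only an illustration under stated assumptions: a binary chain of length 3, a class of distributions containing a single chain, secrets of the form "the state at time i equals a", and a hypothetical mechanism that flips every entry independently with probability `flip` (randomized response). None of these choices come from the paper. The function returns the smallest $\epsilon$ for which the ratio condition holds.

```python
import itertools
import math

def chain_prob(x, q, P):
    """Probability of the state sequence x under the chain (q, P)."""
    p = q[x[0]]
    for s, t in zip(x, x[1:]):
        p *= P[s][t]
    return p

def rr_likelihood(w, x, flip):
    """P(M(x) = w) when M flips each binary entry independently with prob. flip."""
    p = 1.0
    for wi, xi in zip(w, x):
        p *= flip if wi != xi else 1.0 - flip
    return p

def pufferfish_epsilon(q, P, T, flip):
    """Smallest eps satisfying the ratio condition for this chain and mechanism."""
    chains = list(itertools.product((0, 1), repeat=T))
    worst = 0.0
    for w in chains:                      # every possible output
        for i in range(T):                # every time step
            cond = []
            for a in (0, 1):              # P(M(X) = w | X_i = a, theta)
                num = sum(chain_prob(x, q, P) * rr_likelihood(w, x, flip)
                          for x in chains if x[i] == a)
                den = sum(chain_prob(x, q, P) for x in chains if x[i] == a)
                cond.append(num / den)
            worst = max(worst, abs(math.log(cond[0] / cond[1])))
    return worst

eps_independent = pufferfish_epsilon([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], T=3, flip=0.25)
eps_correlated = pufferfish_epsilon([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], T=3, flip=0.25)
```

With independent entries the result equals $\log 3$, the usual randomized-response value for this flip probability. With correlated entries the same mechanism yields a strictly larger value, which is the effect that motivates inferential privacy for time series.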

Pufferfish Framework for Time-Series Data: We can model the time-series data described in the previous section with the following Pufferfish framework.

Let the database be a Markov chain $X = \{X_1, X_2, \ldots, X_T\}$, where each $X_i$ lies in the state space $\mathcal{X}$. Such a Markov Chain may be fully described by a tuple $(q_\theta, P_\theta)$ where $q_\theta$ is an initial distribution and $P_\theta$ is a transition matrix.

Let $s_a^i$ denote the event that $X_i$ takes value $a$. The set of secrets is $\mathcal{S} = \{s_a^i : a \in \mathcal{X}, i \in [T]\}$, and the set of secret pairs is $\mathcal{Q} = \{(s_a^i, s_b^i) : a, b \in \mathcal{X}, a \neq b, i \in [T]\}$. Each $\theta \in \Theta$ represents a Markov chain of the above structure with transition matrix $P_\theta$ and initial distribution $q_\theta$.

In the first example, the state space $\mathcal{X}$ represents the set of all possible activities and $s_a^i$ (equivalently, $X_i = a$) represents the event that the subject is engaged in activity $a$ at time $i$. $(s_a^i, s_b^i) \in \mathcal{Q}$ indicates that we do not want the adversary to distinguish whether the subject is engaged in activity $a$ or $b$ at a given time. In the second example, the state space $\mathcal{X}$ represents the set of all possible power levels and $s_a^i$ (equivalently, $X_i = a$) represents the event that the power level of the house is $a$ at time $i$. $(s_a^i, s_b^i) \in \mathcal{Q}$ indicates that we do not want the adversary to distinguish whether the house is at power level $a$ or $b$ at a given time.

II-C Notation

We use $X$ with a lowercase subscript, for example, $X_i$, to denote a single node in the Markov chain, and $X$ with an uppercase subscript, for example, $X_A$, to denote a set of nodes in the Markov chain. For a set of nodes $X_A$ we use the notation $\mathrm{card}(X_A)$ to denote the number of nodes in $X_A$.

For , we use to denote the subchain , and we use to denote , to denote .

II-D The Markov Quilt Mechanism

[9] proposes the Markov Quilt Mechanism (MQM). It can be used to achieve Pufferfish privacy in the case where $\Theta$ consists of Bayesian networks, of which Markov chains are special cases. We restate the algorithm and the corresponding definitions in this section.

To understand the main idea of MQM, consider a Markov chain $X$. Any two nodes in $X$ are correlated to a certain degree, which means releasing the state of one node potentially provides information on the state of the other. However, the amount of correlation between two nodes usually decays as the distance between them grows. Consider a node $X_i$ in the Markov chain. The nodes close to $X_i$ can be highly influenced by its state, while the nodes that are far away are almost independent. Therefore, to hide the effect of a node $X_i$ on the result of a query, MQM adds noise that is roughly proportional to the number of nearby nodes, and uses a small correction term to account for the effect of the almost independent set.

To measure the amount of dependence, [9] defines max-influence.

Definition II.2 (max-influence)

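The max-influence of a variable $X_i$ on a set of variables $X_A$ under $\Theta$ is

$$e_\Theta(X_A \mid X_i) = \max_{a, b \in \mathcal{X}} \sup_{\theta \in \Theta} \max_{x_A \in \mathcal{X}^{\mathrm{card}(X_A)}} \log \frac{P(X_A = x_A \mid X_i = a, \theta)}{P(X_A = x_A \mid X_i = b, \theta)}. \qquad (2)$$

A higher max-influence means a higher level of correlation between $X_A$ and $X_i$, and max-influence becomes $0$ if $X_A$ and $X_i$ are independent. For simplicity, we use $e(X_A \mid X_i)$ to denote $e_\Theta(X_A \mid X_i)$.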
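For a single Markov chain the max-influence on a set made of one earlier node and one later node can be computed in closed form from matrix powers. The sketch below is an illustration under assumptions that are not part of the paper: one chain only (no supremum over a class of chains), states numbered from 0, times numbered from 1, and every state having positive probability at the time of interest. It uses the fact that the earlier and later nodes are conditionally independent given the node in between, so the worst-case log-ratio of their joint law is the sum of the two worst-case log-ratios.

```python
import numpy as np

def max_influence(q, P, i, a=None, b=None):
    """Max-influence of X_i on {X_{i-a}, X_{i+b}} under one chain (q, P).

    Pass a=None or b=None to drop that node from the set.
    """
    q, P = np.asarray(q, float), np.asarray(P, float)
    k = len(q)
    marginal = lambda t: q @ np.linalg.matrix_power(P, t - 1)   # law of X_t
    tables = []           # each table: row x holds P(other node = . | X_i = x)
    if b is not None:
        tables.append(np.linalg.matrix_power(P, b))
    if a is not None:
        joint = marginal(i - a)[:, None] * np.linalg.matrix_power(P, a)  # [z, x]
        tables.append((joint / marginal(i)[None, :]).T)
    best = 0.0
    for x in range(k):
        for y in range(k):
            total = 0.0
            for table in tables:
                mask = table[x] > 0
                with np.errstate(divide="ignore"):
                    total += np.max(np.log(table[x][mask]) - np.log(table[y][mask]))
            best = max(best, total)
    return best

q = [2 / 3, 1 / 3]                      # stationary law of the chain below
P = [[0.9, 0.1], [0.2, 0.8]]
near = max_influence(q, P, i=3, b=1)    # influence on the next node
far = max_influence(q, P, i=3, b=10)    # influence on a node ten steps away
```

In this example `near` equals $\log 8$ (the ratio $0.8 / 0.1$), and `far` is much smaller, which matches the decay of correlation with distance described above.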

In a Markov chain, the max-influence can be calculated exactly given the transition matrix $P_\theta$ and initial distribution $q_\theta$. It can also be approximated using properties of the stationary distribution and eigen-gap of the transition matrix if the Markov chain is irreducible and aperiodic. [9] shows the following upper bound on max-influence.

Lemma II.3

For an irreducible and aperiodic Markov chain described by , let be the time reversal of . Let be the stationary distribution of and and let be the eigen-gap of . If , and , then for ,

(3)
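The quantities named in the lemma (stationary distribution, time reversal, eigen-gap) can be computed numerically for a given transition matrix. The sketch below is a hedged illustration: it assumes the eigen-gap is one minus the largest eigenvalue magnitude strictly below one of the product of the transition matrix with its time reversal, and it assumes the standard definition of time reversal. The function names are hypothetical and the bound of the lemma itself is not reproduced here.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi with pi P = pi (chain assumed irreducible)."""
    vals, vecs = np.linalg.eig(np.asarray(P, float).T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def time_reversal(P, pi):
    """Reversed chain: P_rev[x, y] = pi[y] * P[y, x] / pi[x]."""
    P = np.asarray(P, float)
    return (P.T * pi[None, :]) / pi[:, None]

def eigen_gap(P):
    """1 - max{|lambda| < 1} over eigenvalues of P times its time reversal."""
    pi = stationary(P)
    M = np.asarray(P, float) @ time_reversal(P, pi)
    mags = np.abs(np.linalg.eigvals(M))
    return 1 - max(m for m in mags if m < 1 - 1e-9)

P = [[0.9, 0.1], [0.2, 0.8]]
pi = stationary(P)
gap = eigen_gap(P)
```

For this two-state chain the stationary distribution is $(2/3, 1/3)$, the chain is reversible, and the eigenvalues of the product are $1$ and $0.49$, so the computed gap is $0.51$.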

To facilitate efficient search for an almost independent set, [9] then defines a Markov Quilt, which takes into account the structure of a Markov chain.

Definition II.4 (Markov Quilt)

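A set of nodes $X_Q$, $Q \subset [T]$, in a Markov chain $X$ is a Markov Quilt for a node $X_i$ if the following conditions hold:

  1. Deleting $X_Q$ partitions $X$ into parts $X_N$ and $X_R$ such that $X = X_N \cup X_Q \cup X_R$ and $X_i \in X_N$.

  2. For all $x_R \in \mathcal{X}^{\mathrm{card}(X_R)}$, all $x_Q \in \mathcal{X}^{\mathrm{card}(X_Q)}$ and for all $a \in \mathcal{X}$, $P(X_R = x_R \mid X_Q = x_Q, X_i = a) = P(X_R = x_R \mid X_Q = x_Q)$.

Thus, $X_R$ is independent of $X_i$ conditioned on $X_Q$.

Intuitively, $X_R$ is a set of “remote” nodes that are far from $X_i$, and $X_N$ is the set of “nearby” nodes; $X_N$ and $X_R$ are separated by the Markov Quilt $X_Q$.

A Markov Quilt $X_Q$ (with corresponding $X_N$ and $X_R$) of $X_i$ is minimal if among all other Markov Quilts with the same nearby set $X_N$, it has the minimal cardinality.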

Lemma II.5

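In a Markov chain $X = \{X_k\}_{k=1}^{T}$, the set of minimal Markov Quilts of a node $X_i$ is

$$S_{Q,i} = \big\{ \{X_{i-a}, X_{i+b}\}, \{X_{i-a}\}, \{X_{i+b}\}, \emptyset \;\big|\; 1 \le a \le i-1,\ 1 \le b \le T-i \big\}. \qquad (4)$$

That is, one node to its left and one node to its right can form a Markov Quilt for $X_i$. Additionally, a Markov Quilt can also be formed by only one node $X_{i-a}$ (or $X_{i+b}$), in which case $X_N = \{X_{i-a+1}, \ldots, X_T\}$ (or $X_N = \{X_1, \ldots, X_{i+b-1}\}$); and the empty Markov Quilt is also allowed, with corresponding $X_N$ as the whole chain and $X_R$ as the empty set.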
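The set of minimal Markov Quilts is small enough to enumerate directly. The following sketch lists, for a node at position `i` in a chain of length `T`, every quilt of the four shapes described above together with the size of its nearby set. The function name and the tuple representation are illustrative assumptions; the nearby-set sizes follow from the definition of a Markov Quilt (the nearby set is the connected part of the chain containing the node once the quilt is deleted).

```python
def minimal_quilts(i, T):
    """List (quilt, nearby_size) for node X_i in a chain X_1..X_T (1-indexed).

    A quilt is a tuple of node indices; nearby_size is the number of nodes
    in the nearby set X_N.
    """
    quilts = [((), T)]                                  # empty quilt: X_N is the whole chain
    for a in range(1, i):
        quilts.append(((i - a,), T - (i - a)))          # X_N = X_{i-a+1}, ..., X_T
    for b in range(1, T - i + 1):
        quilts.append(((i + b,), i + b - 1))            # X_N = X_1, ..., X_{i+b-1}
    for a in range(1, i):
        for b in range(1, T - i + 1):
            quilts.append(((i - a, i + b), a + b - 1))  # X_N = X_{i-a+1}, ..., X_{i+b-1}
    return quilts

quilts = minimal_quilts(i=3, T=5)
```

For a node at position 3 in a chain of length 5 this produces nine quilts, among them the two-sided quilt made of nodes 2 and 4, whose nearby set is the single node 3.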

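  for all $\theta \in \Theta$ do
     for all $X_i$ do
        for all Markov Quilts $X_Q$ (with $X_N$, $X_R$) where $X_Q$ is in $S_{Q,i}$ of Lemma II.5 do
           Calculate $e_\theta(X_Q \mid X_i)$
           if $e_\theta(X_Q \mid X_i) < \epsilon$ then
              $\sigma(X_Q) = \mathrm{card}(X_N) / (\epsilon - e_\theta(X_Q \mid X_i))$   /*score of $X_Q$*/
           else
              $\sigma(X_Q) = \infty$
           end if
        end for
        $\sigma_i = \min_{X_Q \in S_{Q,i}} \sigma(X_Q)$
     end for
     $\sigma_{\max}^{\theta} = \max_i \sigma_i$
  end for
  $\sigma_{\max} = \max_{\theta \in \Theta} \sigma_{\max}^{\theta}$
  return $F(D) + \sigma_{\max} \cdot Z$, where $Z \sim \mathrm{Lap}(1)$
Algorithm 1 MQM(Dataset $D$, $1$-Lipschitz query $F$, $\Theta$, privacy parameter $\epsilon$)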

The Markov Quilt Mechanism for Markov chains is restated in Algorithm 1. Intuitively, for each node $X_i$, MQM searches over all the Markov Quilts, finds the one that needs the least amount of noise, and finally adds noise that is sufficient to protect the privacy of all nodes.

It was shown in [9] that MQM guarantees $\epsilon$-Pufferfish privacy in the framework in Section II-B provided that the query $F$ in Algorithm 1 is $1$-Lipschitz. Note that any $L$-Lipschitz function can be scaled to a $1$-Lipschitz function.

Observe that Algorithm 1 does not specify how to compute max-influence. [9] proposes two versions of MQM – MQMExact which computes the exact max-influence using Definition II.2, and MQMApprox which computes an upper bound of max-influence using Lemma II.3.
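To make the search in Algorithm 1 concrete, here is a minimal Python sketch in the spirit of MQMExact. It is an illustration, not the authors' implementation, and it rests on several assumptions: the class of distributions contains a single chain, states are numbered from 0 and times from 1, every state has positive probability at every time, the score of a quilt is the nearby-set size divided by the privacy parameter minus the max-influence, and the noise is Laplace with scale equal to the largest per-node score. All function names are hypothetical.

```python
import numpy as np

def max_influence(q, P, i, a, b):
    """Exact max-influence of X_i on {X_{i-a}, X_{i+b}}; a or b may be None."""
    marginal = lambda t: q @ np.linalg.matrix_power(P, t - 1)
    tables = []
    if b is not None:
        tables.append(np.linalg.matrix_power(P, b))
    if a is not None:
        joint = marginal(i - a)[:, None] * np.linalg.matrix_power(P, a)
        tables.append((joint / marginal(i)[None, :]).T)
    best = 0.0
    for x in range(len(q)):
        for y in range(len(q)):
            total = 0.0
            for t in tables:
                mask = t[x] > 0
                with np.errstate(divide="ignore"):
                    total += np.max(np.log(t[x][mask]) - np.log(t[y][mask]))
            best = max(best, total)
    return best

def mqm_noise_scale(q, P, T, eps):
    """Largest per-node score when the class contains the single chain (q, P)."""
    q, P = np.asarray(q, float), np.asarray(P, float)
    sigma_max = 0.0
    for i in range(1, T + 1):
        sigma_i = T / eps                               # empty quilt: whole chain is nearby
        for a in [None] + list(range(1, i)):
            for b in [None] + list(range(1, T - i + 1)):
                if a is None and b is None:
                    continue
                e = max_influence(q, P, i, a, b)
                if e < eps:                             # otherwise the score is infinite
                    lo = 1 if a is None else i - a + 1  # first nearby node
                    hi = T if b is None else i + b - 1  # last nearby node
                    sigma_i = min(sigma_i, (hi - lo + 1) / (eps - e))
        sigma_max = max(sigma_max, sigma_i)
    return sigma_max

def mqm_release(data, q, P, eps, query, seed=0):
    """Release query(data) plus Laplace noise; query is assumed 1-Lipschitz."""
    scale = mqm_noise_scale(q, P, len(data), eps)
    return query(data) + np.random.default_rng(seed).laplace(0.0, scale)

sigma_iid = mqm_noise_scale([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], T=6, eps=1.0)
sigma_corr = mqm_noise_scale([2 / 3, 1 / 3], [[0.9, 0.1], [0.2, 0.8]], T=6, eps=2.0)
noisy_count = mqm_release([0, 1, 1, 0, 1, 1], [2 / 3, 1 / 3],
                          [[0.9, 0.1], [0.2, 0.8]], eps=2.0, query=sum)
```

When the entries are independent, every node has a quilt with a single nearby node and zero max-influence, so the noise scale reduces to the reciprocal of the privacy parameter, the same scale the Laplace mechanism would use for a 1-Lipschitz query under differential privacy. For a correlated chain the scale is never worse than the chain length divided by the privacy parameter, which is the score of the empty quilt.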

Previous Results on Composition

To design more sophisticated privacy preserving algorithms, we need to understand the privacy guarantee of the combination of two private algorithms, which is called composition.

There are two types of composition – parallel and sequential. The first describes the case where multiple privacy algorithms are applied on disjoint data sets, while the second describes the case where they are applied to the same data.

A major advantage of differential privacy is that it composes gracefully. [2] shows that applying $K$ differentially private algorithms, each with $\epsilon_k$-differential privacy, guarantees $\max_{k \in [K]} \epsilon_k$-differential privacy under parallel composition, and $\sum_{k \in [K]} \epsilon_k$-differential privacy under sequential composition. Better and more sophisticated composition results have been shown for approximate differential privacy [3] [6].
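The two composition rules for pure differential privacy amount to a maximum and a sum over the individual privacy parameters. A minimal sketch, with hypothetical function names and example budgets:

```python
def parallel_composition(epsilons):
    """Privacy parameter when each mechanism sees a disjoint part of the data."""
    return max(epsilons)

def sequential_composition(epsilons):
    """Privacy parameter when every mechanism sees the same data."""
    return sum(epsilons)

budgets = [0.1, 0.5, 0.2]
parallel_eps = parallel_composition(budgets)
sequential_eps = sequential_composition(budgets)
```

These are the baselines against which the Pufferfish results in the next section are compared.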

Unlike differential privacy, Pufferfish privacy does not always compose linearly [7]. However, we can still hope to achieve composition for special Pufferfish mechanisms or for special classes of data distributions $\Theta$.

[9] does not provide any parallel composition result. It does provide the following sequential composition result for MQM on Markov chains.

Theorem II.6

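Let $\{F_k\}_{k=1}^{K}$ be a set of Lipschitz queries, $(\mathcal{S}, \mathcal{Q}, \Theta)$ be a Pufferfish framework as defined in Section II-B, and $D$ be a database. Given fixed Markov Quilt sets $\{S_{Q,i}\}_{i=1}^{T}$ for all $X_i$, let $M_k$ denote the Markov Quilt Mechanism that releases $F_k(D)$ with $\epsilon_k$-Pufferfish privacy under $(\mathcal{S}, \mathcal{Q}, \Theta)$ using Markov Quilt sets $\{S_{Q,i}\}_{i=1}^{T}$. Then releasing $(M_1(D), \ldots, M_K(D))$ guarantees $K \max_{k \in [K]} \epsilon_k$-Pufferfish privacy under $(\mathcal{S}, \mathcal{Q}, \Theta)$.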

Notice that this result holds only when the same Markov Quilts are used for all releases. Moreover, the final privacy guarantee depends on the worst privacy guarantee over the $K$ releases. In practice, it might not be easy to force the MQM to use the same Markov Quilts in all releases; and if even one of the releases has a large $\epsilon_k$, the final privacy guarantee can be bad.

III Results

As discussed in the previous section, general Pufferfish mechanisms do not compose linearly. However, we can exploit the properties of the data distributions (Markov chains) and of the specific Pufferfish mechanism (MQM) to obtain a new parallel composition result as well as an improved sequential composition result.

III-A Parallel Composition

Setup: Consider the Pufferfish framework $(\mathcal{S}, \mathcal{Q}, \Theta)$ as described in Section II-B. Parallel composition can be formulated as follows.

Suppose there are two subchains of the Markov chain, and where ; and correspondingly, let , and , .

Suppose Alice has access to subchain and wants to release Lipschitz query while guaranteeing -Pufferfish Privacy under framework ; and Bob has access to and wants to release Lipschitz query while guaranteeing -Pufferfish Privacy under framework .

Our goal is to determine how strong a Pufferfish privacy guarantee we can get for releasing both results together.

A General Result for Markov Chains

Theorem III.1

Suppose are two mechanisms such that guarantees -Pufferfish privacy under framework and guarantees -Pufferfish privacy under framework . Then releasing guarantees -Pufferfish Privacy under framework .

Compared with parallel composition for differential privacy, here we have extra terms that capture the correlation between the end point of the first subchain and the starting point of the second. This is to be expected, since there is correlation among states in the Markov chain. Intuitively, if the two subchains are close enough, releasing information on one can cause a privacy breach of the other.

MQM on Markov Chains

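Let $\mathrm{MQM}(D, F, \epsilon, (\mathcal{S}, \mathcal{Q}, \Theta))$ denote the output of MQM on dataset $D$, query function $F$, privacy parameter $\epsilon$ and Pufferfish framework $(\mathcal{S}, \mathcal{Q}, \Theta)$. Suppose Alice and Bob use MQMApprox to publish their respective query results.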

Before we establish a parallel composition result, we begin with a definition.

Definition III.2

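(Active Markov Quilt) Consider an instance of the Markov Quilt Mechanism $M$. We say that a Markov Quilt $X_Q$ (with corresponding $X_N$, $X_R$) for a node $X_i$ is active with respect to $M$ if $X_Q = \arg\min_{X_Q \in S_{Q,i}} \sigma(X_Q)$, and thus $\sigma(X_Q) = \sigma_i$.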

Theorem III.3

Suppose we run MQMApprox to release . If the following conditions hold:

  1. for any , there exists some and such that the active Markov Quilts of and with respect to are of the form and respectively for some , , , , and

  2. , i.e., are far from each other compared to their lengths,

then the release guarantees -Pufferfish Privacy under the framework .

The main intuition is as follows. Note that we require the active Markov Quilt of some to be of the form . For any , the correction factor added to account for the effect of the nodes also automatically accounts for the effect of , provided that does not overlap with . This is ensured by the second condition in Theorem III.3.

III-B Sequential Composition

Consider the case when Alice and Bob have access to the entire Markov Chain $X$, and want to publish Lipschitz queries, each with its own Pufferfish privacy parameter, under the Pufferfish framework $(\mathcal{S}, \mathcal{Q}, \Theta)$ described in Section II-B.

General Results for Markov Chains

First, we show that an arbitrary Pufferfish mechanism does not compose linearly even when $\Theta$ consists of Markov chains.

Theorem III.4

There exists a Markov chain , a function and mechanisms , such that both and guarantee -Pufferfish privacy under framework , yet releasing does not guarantee -Pufferfish privacy under framework .

Now we show that arbitrary Pufferfish mechanisms compose with a correction factor that depends on the max-divergence between the joint and product distributions of the two releases. We define max-divergence first.

Definition III.5 (max-divergence)

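Let $p$ and $q$ be two distributions with the same support. The max-divergence $D_\infty(p \,\|\, q)$ between them is defined as:

$$D_\infty(p \,\|\, q) = \sup_{x \in \mathrm{supp}(p)} \log \frac{p(x)}{q(x)}.$$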
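For discrete distributions the max-divergence is a one-line computation. The sketch below assumes the usual definition, the largest value of the log-ratio of the two probabilities over the common support, and applies it to a hypothetical joint distribution of two binary releases and the product of its marginals, which is the comparison the composition theorem below relies on. The numbers are made up for illustration.

```python
import math

def max_divergence(p, q):
    """Max-divergence of p from q for discrete laws listed over the same support."""
    return max(math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Joint law of two binary releases, flattened as (0,0), (0,1), (1,0), (1,1).
joint = [0.4, 0.1, 0.1, 0.4]
# Product of the two marginals; each marginal is (0.5, 0.5) for this joint law.
product = [0.25, 0.25, 0.25, 0.25]
dependence = max_divergence(joint, product)
```

Here `dependence` equals $\log 1.6$; it would be $0$ if the two releases were independent, since the joint and product distributions would then coincide.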

Now we state the composition theorem.

Theorem III.6

Suppose are two mechanisms used by Alice and Bob which guarantee and -Pufferfish privacy respectively under framework . If there exists , such that for all ,

then the releasing guarantees -Pufferfish Privacy under framework .

The max-divergence between the joint and product distributions of the two releases measures the amount of dependence between them. The more independent they are, the smaller the max-divergence and the stronger the privacy guarantee.

MQM on Markov Chains

We next show that we can further exploit the properties of MQM to provide tighter privacy guarantees than that provided in [9].

Suppose Alice and Bob use MQM to achieve Pufferfish privacy under the same framework $(\mathcal{S}, \mathcal{Q}, \Theta)$. We show that when $\Theta$ consists of Markov chains, even if the two runs of MQM use different Markov Quilts, MQM still composes linearly. This result applies to both MQMExact and MQMApprox.

Theorem III.7

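For the Pufferfish framework $(\mathcal{S}, \mathcal{Q}, \Theta)$ defined in Section II-B, releasing $\mathrm{MQM}(D, F_k, \epsilon_k, (\mathcal{S}, \mathcal{Q}, \Theta))$ for all $k \in [K]$ guarantees $\sum_{k=1}^{K} \epsilon_k$-Pufferfish privacy under framework $(\mathcal{S}, \mathcal{Q}, \Theta)$.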

This result shows that MQM on Markov chains achieves the same composition guarantee as pure differential privacy. Compared with the composition result provided in [9], i.e., Theorem II.6, Theorem III.7 provides a better privacy guarantee under less restrictive conditions. It does not require the same Markov Quilts to be used in the different runs of MQM. Moreover, the privacy guarantee is better when the $\epsilon_k$'s are different: $\sum_{k \in [K]} \epsilon_k$ as opposed to $K \max_{k \in [K]} \epsilon_k$.
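A small worked example shows the size of the gap. It assumes, as the discussion above indicates, that the earlier bound scales as the number of releases times the largest privacy parameter while the new bound is the sum of the parameters; the three values below are hypothetical.

```latex
\[
K = 3, \quad (\epsilon_1, \epsilon_2, \epsilon_3) = (0.1,\ 0.1,\ 1.0):
\qquad
\sum_{k=1}^{K} \epsilon_k = 1.2,
\qquad
K \max_{k \in [K]} \epsilon_k = 3.0 .
\]
```

The two bounds coincide only when all the privacy parameters are equal; a single release with a large parameter inflates the earlier bound by a factor of the number of releases.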

IV Conclusion

In conclusion, motivated by emerging sensing applications, we study composition properties of Pufferfish, a form of inferential privacy, for certain kinds of time-series data. We provide both sequential and parallel composition results. Our results illustrate that while Pufferfish does not have strong composition properties in general, variants of the recently introduced Markov Quilt Mechanism, which guarantees Pufferfish privacy for time series data, do compose well, with composition properties comparable to those of pure differential privacy. We believe that these results make these mechanisms attractive for practical time series applications.

Acknowledgment

We thank Joseph Geumlek, Sewoong Oh and Yizhen Wang for initial discussions. This work was partially supported by NSF under IIS 1253942, ONR under N00014-16-1-2616 and a Google Faculty Research Award.

References

  • [1] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.
  • [2] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, 2006.
  • [3] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 51–60. IEEE, 2010.
  • [4] Arpita Ghosh and Robert Kleinberg. Inferential privacy guarantees for differentially private mechanisms. arXiv preprint arXiv:1603.01508, 2016.
  • [5] Xi He, Ashwin Machanavajjhala, and Bolin Ding. Blowfish privacy: tuning privacy-utility trade-offs using policies. In SIGMOD ’14, pages 1447–1458, 2014.
  • [6] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1376–1385, Lille, France, 07–09 Jul 2015. PMLR.
  • [7] Daniel Kifer and Ashwin Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Trans. Database Syst., 39(1):3, 2014.
  • [8] Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. Dependence makes you vulnerable: Differential privacy under dependent tuples. In NDSS 2016, 2016.
  • [9] Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 1291–1306. ACM, 2017.
  • [10] Yonghui Xiao and Li Xiong. Protecting locations with differential privacy under temporal correlations. In Proceedings of the 22nd ACM SIGSAC CCS.

Appendix A Proofs of Composition Results

A-A Proofs for Parallel Composition Results

(of Theorem III.1) Consider the case when the secret pair is for some . For any , we have

since and are independent conditioned on . The first ratio is upper bounded by since is -Pufferfish private. The second ratio can be written as

(5)

where the second equality follows from the fact that is independent of given .
Since and
, (5) can be upper bounded by

On the other hand, (5) is also upper bounded by

where the equality follows because for any . Therefore (5) is upper bounded by .

Combining the bound of the first ratio, we get

If the secret pair is for some where , we have

where the third equality is because are independent of given .
Therefore we have

where the last step follows from our previous bound for (5) and the fact that guarantees Pufferfish privacy.

The same analysis can be applied to the case where the secret is for some and the upper bound is .

(of Theorem III.3) Denote the noises added by MQM for Alice and Bob by respectively. Consider any secret pair of the form . We want to upper bound the following ratio for any .

By assumption, there exists some whose active Markov Quilt is with corresponding and ; and we have .

The main idea of the proof is that we can “borrow” the Markov Quilt of as the Markov Quilt for any because doing so will not increase the noise scale