Convex Optimization for Linear Query Processing under Approximate Differential Privacy

02/13/2016 ∙ by Ganzhao Yuan, et al. ∙ South China University of Technology ∙ University of Illinois at Urbana-Champaign ∙ Qatar Foundation

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate strategy, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-linear and non-convex. Hence, in the past much effort has been devoted to solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, is guaranteed to find the optimal solution of this optimization problem. This paper points out that under (ϵ, δ)-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. We then propose an efficient algorithm based on Newton's method, which we prove always converges to the optimal solution, with a linear global convergence rate and a quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.


1 Introduction

Differential privacy [5, 3] is a strong and rigorous privacy protection model that is known for its generality, robustness and effectiveness. It is used, for example, in the ubiquitous Google Chrome browser [7]. The main idea is to publish randomized aggregate information over sensitive data, with the guarantee that the adversary cannot infer with high confidence the presence or absence of any individual in the dataset from the released aggregates. An important goal in the design of differentially private methods is to maximize the accuracy of the published noisy aggregates with respect to their exact values.

Besides optimizing for specific types of aggregates, an important generic methodology for improving the overall accuracy of the released aggregates under differential privacy is batch processing, first proposed in [13]. Specifically, batch processing exploits the correlations between multiple queries, so that answering the batch as a whole can lead to higher overall accuracy than answering each query individually. For example, if one aggregate query Q1 (e.g., the total population of New York State and New Jersey) can be expressed as the sum of two other queries (the population of New York and New Jersey, respectively), i.e., Q1 = Q2 + Q3, then we can simply answer Q1 by adding up the noisy answers of Q2 and Q3. Intuitively, answering two queries instead of three reduces the amount of random perturbations required to satisfy differential privacy, leading to higher overall accuracy for the batch as a whole [13, 30]. In this paper, we focus on answering linear aggregate queries under differential privacy. Given a batch of linear aggregate queries (called the workload), we aim to improve their overall accuracy by answering a different set of queries (called the strategy) under differential privacy, and combining their results to obtain the answers to the original workload aggregates.
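To make the batch idea concrete, the following minimal sketch works through the New York/New Jersey example above with a hypothetical strategy that answers only the two state-level counts; the matrices, counts, and noise scale are illustrative and not taken from the paper.

```python
import numpy as np

# Workload: Q1 = NY + NJ, Q2 = NY, Q3 = NJ over the unit counts x = [NY, NJ].
W = np.array([[1.0, 1.0],   # Q1
              [1.0, 0.0],   # Q2
              [0.0, 1.0]])  # Q3

# Strategy: only Q2 and Q3 are answered under the privacy mechanism.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Recombination matrix: W = M A, so noisy answers to the strategy queries
# can be combined into answers for all three workload queries.
M = W @ np.linalg.pinv(A)
print(M)                                   # [[1, 1], [1, 0], [0, 1]]

x = np.array([19.5, 8.9])                  # hypothetical unit counts (millions)
noisy_A = A @ x + np.random.normal(scale=1.0, size=2)  # illustrative noise only
print(M @ noisy_A)                         # reconstructed answers to Q1, Q2, Q3
```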

As shown in [13, 14, 30, 31, 11], different strategy queries lead to different overall accuracy for the original workload. Hence, an important problem in batch processing under differential privacy is to find a suitable strategy that leads to the highest accuracy. Such a strategy can be rather complex, rendering manual construction and brute-force search infeasible [30, 31]. On the other hand, the problem of strategy searching can be formulated into a constrained optimization program, and it suffices to find the optimal solution of this program. However, as we show later in Section 2, the program appears to be non-linear and non-convex; hence, solving it is rather challenging. As we review in Section 2.2, existing approaches resort to either heuristics or complex, expensive and unstable numerical methods. To our knowledge, no existing solutions guarantee to find the optimal solution.

This paper points out that under the (ε, δ)-differential privacy definition (also called approximate differential privacy, explained in Section 2), the constrained optimization program for finding the best strategy queries can be re-formulated into a simple and elegant convex optimization program. Note that although the formulation itself is simple, its derivation is rather complicated and non-trivial. Based on this new formulation, we propose the first polynomial-time solution, COA, that is guaranteed to find the optimal solution of the original constrained optimization problem in search of a suitable strategy for processing a batch of arbitrary linear aggregate queries under approximate differential privacy. COA is based on Newton's method and it utilizes various non-trivial properties of the problem. We show that COA achieves globally linear and locally quadratic convergence rates. Extensive experiments confirm the effectiveness and efficiency of the proposed method.

The rest of the paper is organized as follows. Section 2 provides necessary background on differential privacy and overviews related work. Section 3 presents our convex programming formulation for batch linear aggregate processing under approximate differential privacy. Section 4 describes the proposed solution COA. Section 5 contains a thorough set of experiments. Section 6 concludes the paper with directions for future work. In this paper, boldfaced lowercase letters denote vectors and uppercase letters denote real-valued matrices. We summarize the frequent notations in Table 1.

Symbol Meaning
W : Workload matrix (with m rows and n columns)
m : Number of queries (i.e., rows) in W
n : Number of unit counts (i.e., columns) in W
V ≜ W^T W : Covariance matrix of W
X ≜ A^T A : Solution matrix
A : Strategy matrix
A† : Moore-Penrose pseudo-inverse of matrix A
vec(X) : Vectorized listing of X
mat(x) : Convert x back into a matrix
F(X) : Objective value at X
∇F(X) : Gradient matrix of F at X
∇²F(X) : Hessian matrix of F at X
≜ : Equivalent to "is defined as"
1 / 0 : All-one column vector / all-zero column vector
I : Identity matrix
X ⪰ 0 : Matrix X is positive semidefinite
X ≻ 0 : Matrix X is positive definite
λ_i(X) : i-th eigenvalue of X (increasing order)
Diag(x) : Diagonal matrix with x as the main diagonal entries
diag(X) : Column vector formed from the main diagonal of X
‖X‖ : Operator norm: the largest eigenvalue of X
λ_min(X) : Smallest eigenvalue of X
tr(X) : Sum of the elements on the main diagonal of X
⟨X, Y⟩ : Euclidean inner product, i.e., tr(X^T Y)
X ⊗ Y : Kronecker product of X and Y
X ⊙ Y : Hadamard (a.k.a. entry-wise) product of X and Y
‖X‖_* : Nuclear norm: sum of the singular values of matrix X
‖X‖_F : Frobenius norm: square root of the sum of squared entries of X
‖x‖_X : Generalized vector norm: (x^T X x)^{1/2}
(constants) : Lower and upper bounds on the eigenvalues of X, of the Hessian ∇²F(X), and on the gradient ∇F(X) (see Lemmas 6 and 7)
Table 1: Summary of frequent notations

2 Background

2.1 Preliminaries

A common definition of differential privacy is (ε, δ)-differential privacy [5], as follows:

Definition 1

Two databases D and D' are neighboring iff they differ by at most one tuple. A randomized algorithm M satisfies (ε, δ)-differential privacy iff for any two neighboring databases D and D' and any measurable output set O in the range of M, we have

Pr[M(D) ∈ O] ≤ exp(ε) · Pr[M(D') ∈ O] + δ.

When δ = 0, the above definition reduces to another popular definition: ε-differential privacy (also called "exact differential privacy"). This work focuses on the case where δ > 0, which is sometimes called approximate differential privacy. Usually, δ is set to a value smaller than the reciprocal of the number of records in the dataset D. Both the exact and approximate definitions of differential privacy provide strong and rigorous privacy protection to the users. Given the output of a differentially private mechanism, the adversary cannot infer with high confidence (controlled by the parameters ε and δ) whether the original database is D or any of its neighbors D', which differ from D by one record, meaning that each user can plausibly deny the presence of her tuple. An approximately differentially private mechanism can be understood as satisfying exact differential privacy with a certain probability controlled by the parameter δ. Hence, it is a more relaxed definition, which is particularly useful when the exact definition is overly strict for an application, leading to poor result utility.

One basic mechanism for enforcing approximate differential privacy is the Gaussian mechanism [4], which injects Gaussian noise into the query results, calibrated to the sensitivity of the queries. Note that the Gaussian mechanism cannot be applied to exact differential privacy. Since the proposed method is based on the Gaussian mechanism, it is limited to query processing under approximate differential privacy as well. Specifically, for any two neighboring databases D and D', the L2 sensitivity of a query set Q is defined as the maximum of ‖Q(D) − Q(D')‖₂ over all neighboring pairs. Given a database D and a query set Q, the Gaussian mechanism outputs a random result that follows the Gaussian distribution with mean Q(D) and noise magnitude σ, where σ is proportional to the L2 sensitivity of Q divided by ε, and grows as δ decreases.
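As a concrete illustration, the following minimal sketch applies the Gaussian mechanism to a set of linear queries, assuming the standard calibration σ = Δ₂(Q)·√(2 ln(1.25/δ))/ε from the differential privacy literature; the exact constant used in the paper may differ.

```python
import numpy as np

def gaussian_mechanism(W, x, eps, delta):
    """Answer the linear queries W @ x with Gaussian noise.

    Assumes the standard calibration sigma = Delta2 * sqrt(2*ln(1.25/delta)) / eps,
    where Delta2 is the L2 sensitivity of the query set, i.e. the largest
    L2 norm among the columns of W (each unit count has sensitivity 1).
    """
    delta2 = np.linalg.norm(W, axis=0).max()        # max column L2 norm
    sigma = delta2 * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return W @ x + np.random.normal(scale=sigma, size=W.shape[0])
```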

This paper focuses on answering a batch of linear aggregate queries, each of which is a linear combination of the unit aggregates of the input database D. For simplicity, in the following we assume that each unit aggregate is a simple count, which has a sensitivity of 1. Other types of aggregates can be handled by adjusting the sensitivity accordingly. The query set can be represented by a workload matrix W with m rows and n columns. Each entry of W is the weight that a query places on the corresponding unit count. Since we do not use any other information of the input database besides the unit counts, in the following we abuse notation and use x to denote the vector of unit counts, and define the exact workload answers as W x ("≜" means define). The query batch can be answered directly by releasing W x perturbed with the Gaussian mechanism, calibrated to the sensitivity of W.

Given a workload matrix W, the worst-case expected squared error of a mechanism M is defined as [13, 15, 20]:

err(M) ≜ max_x E[ ‖M(x) − W x‖₂² ],

where the expectation is taken over the randomness of M. Without information on the underlying dataset, the lowest error achievable by any (ε, δ)-differentially private mechanism for the query matrix W and database D is:

err_OPT(W) ≜ inf_M err(M),    (1)

where the infimum is taken over all (ε, δ)-differentially private mechanisms. If a mechanism minimizes the objective value in Eq (1), it is the optimal linear counting query processing mechanism, in the sense that, without any prior information on the sensitive data, it achieves the lowest expected error.

2.2 Existing Solutions

Matrix Mechanism. The first solution for answering batch linear aggregate queries under differential privacy is the matrix mechanism [13]. The main idea is that instead of answering the workload queries W directly, the mechanism first answers a different set of strategy queries under differential privacy, and then combines their results to answer W. Let matrix A represent the strategy queries, where each row represents a query and each column represents a unit count. Then, according to the Gaussian mechanism, A can be answered by releasing A x + b under (ε, δ)-differential privacy, where b denotes a Gaussian noise vector whose scale is proportional to Δ(A), the maximum L2 norm among all column vectors of A. Accordingly, the matrix mechanism answers W by:

W A†(A x + b),    (2)

where A† is the Moore-Penrose pseudo-inverse of A.
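The recombination in Eq (2) can be sketched as follows; this is a hedged illustration of the matrix mechanism with Gaussian noise, again assuming the standard noise calibration rather than the paper's exact constant.

```python
import numpy as np

def matrix_mechanism(W, A, x, eps, delta):
    """Answer workload W via strategy A: perturb A @ x, then recombine
    with W @ pinv(A).  The noise scale follows the Gaussian mechanism,
    with the sensitivity of A given by its largest column L2 norm
    (standard calibration assumed, not necessarily the paper's constant)."""
    delta_A = np.linalg.norm(A, axis=0).max()
    sigma = delta_A * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    noisy_strategy = A @ x + np.random.normal(scale=sigma, size=A.shape[0])
    return W @ np.linalg.pinv(A) @ noisy_strategy
```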

Based on Eq (2), Li et al. [13] formalize the strategy-searching problem for batch linear counting query processing in Eq (1) into the following nonlinear optimization problem:

min_A  Δ_p(A)² · ‖W A†‖²_F,    (3)

where Δ_p(A) denotes the maximum L_p norm among the column vectors of A. In the above optimization program, p can be either 1 or 2, and the method in [13] applies to both exact and approximate differential privacy. This optimization program, however, is rather difficult to solve. The pseudo-inverse A† involved in Program (3) is not a continuous function of A, as it jumps around when A is ill-conditioned. Therefore, the objective does not have a derivative everywhere, and we cannot solve the problem with simple gradient descent. As pointed out in [31], the solutions in [13] are either prohibitively expensive (iteratively solving a pair of related semidefinite programs at very high computational cost) or ineffective (rarely obtaining strategies that outperform naive methods).

Low-Rank Mechanism. Yuan et al. [31] propose the Low-Rank Mechanism (LRM), which formulates the batch query problem as the following low-rank matrix factorization problem:

min_{B, L}  Δ(L)² · ‖B‖²_F   s.t.  B L = W,    (4)

where Δ(L) is the maximum column norm of L. It can be shown that Program (4) and Program (3) are equivalent to each other; hence, LRM can be viewed as a way to solve the Matrix Mechanism optimization program (to our knowledge, LRM is also the first practical solution for this program). The LRM formulation avoids the pseudo-inverse of the strategy matrix; however, it is still a non-linear, non-convex constrained optimization program. Hence, it is also difficult to solve. The solution in LRM is a sophisticated numeric method based on first-order augmented Lagrangian multipliers (ALM). This solution, however, is not guaranteed to find the globally optimal strategy matrix, due to the non-convex nature of the problem formulation.

Further, the LRM solution may not converge at all. Specifically, it iteratively updates the Lagrange multiplier with an update of the form π ← π − β(B L − W), where β is the penalty parameter. When B L is low-rank, the rank inequality for matrix multiplication gives rank(B L) ≤ min(rank(B), rank(L)). Therefore, the equality constraint B L = W may never hold, since a full-rank matrix can never be expressed as the product of two low-rank ones. When this happens, LRM never converges. For this reason, the initial values of the factors need to be chosen carefully so that their product is not low-rank. However, this problem cannot be completely avoided, since the rank of B L may drop during the iterations of LRM. Finally, even in cases where LRM does converge, its convergence rate can be slow, leading to high computational costs, as we show in the experiments. In particular, the LRM solution is not necessarily a monotone descent algorithm, meaning that the accuracy of its solutions can fluctuate across iterations.

Adaptive Mechanism. In order to alleviate the computational overhead of the matrix mechanism, the adaptive mechanism (AM) [14] considers the following optimization program:

(5)

where Q comes from the singular value decomposition of the workload matrix W = U Σ Q^T, with the diagonal values of Σ being the singular values of W, and the optimization variable is a non-negative weight vector λ with one weight per eigen-query (i.e., per row of Q^T). AM then computes the strategy matrix as A = Diag(λ) Q^T, where Diag(λ) is a diagonal matrix with λ as its diagonal values.

The main drawback of AM is that it searches over a reduced subspace of strategies, since it is limited to non-negative weighted combinations of the fixed eigen-queries. Hence, the candidate strategy matrix obtained from the optimization problem in (5) is not guaranteed to be the optimal strategy; in fact, it is often suboptimal, as shown in the experiments.

Exponential Smoothing Mechanism. Based on a reformulation of the matrix mechanism, the Exponential Smoothing Mechanism (ESM) [30] considers solving the following optimization program:

(6)

where max(·) is a function that retrieves the largest element of a vector. This term is hard to handle since it is non-smooth. The authors use the soft-max function to smooth it and employ non-monotone spectral projected gradient descent to optimize the resulting non-convex but smooth objective function over a positive-definiteness constraint set.

One major problem with this method is that Program (6) involves a matrix inverse operator, which may cause numerical instability when the final solution (i.e., the strategy matrix) is of low rank. Further, since the problem is not convex, the ESM solution is not guaranteed to converge to the global optimum, either.

The proposed solution, presented next, avoids all the drawbacks of previous solutions: it is fast, stable, numerically robust, and, most importantly, guaranteed to find the optimal solution.

3 A Convex Problem Formulation

This section presents a convex optimization formulation for finding the best strategy for a given workload of linear aggregate queries. The main idea is that instead of solving directly for the strategy matrix A that minimizes the result error, we first solve for the optimal value of X ≜ A^T A, and then obtain A from X. Note that there can be multiple strategy matrices A corresponding to a given X, in which case we simply output an arbitrary one, since they all lead to the same overall accuracy for the original workload W. As we show soon, the objective function with respect to X is convex; hence, the proposed solution is guaranteed to find the global optimum. The re-formulation of the optimization program involves a non-trivial semi-definite programming lifting technique to remove the quadratic term, presented below.

First of all, based on the non-convex model in Program (3), we have the following lemma (all proofs can be found in the Appendix).

Lemma 1

Given an arbitrary strategy matrix A, we can always construct another strategy A' satisfying (i) every column of A' has L2 norm at most 1, and (ii) the objective value of A' in Program (3) is no larger than that of A.

By Lemma 1, the following optimization program is equivalent to Program (3):

min_A  ‖W A†‖²_F   s.t.  every column of A has L2 norm at most 1.    (7)

This paper focuses on approximate differential privacy, where the sensitivity is measured under the L2 norm (p = 2). Moreover, we assume that W is full rank. If this assumption does not hold, we simply transform W^T W into a full-rank matrix by adding an identity matrix scaled by α, where α approaches zero. Formally, we define:

V ≜ W^T W + α I.    (8)

Let X ≜ A^T A. Using the facts that ‖W A†‖²_F = tr(W (A^T A)† W^T) and that (A^T A)† = X^{-1} when X is full rank, we have the following matrix-inverse optimization program (note that X and V are both full-rank):

min_X  tr(V X^{-1})   s.t.  diag(X) ≤ 1,  X ≻ 0.    (9)

Interestingly, one can also approximate the matrix inverse via a Neumann series and rewrite the objective function in terms of matrix polynomials. Although other convex semi-definite programming reformulations/relaxations exist (discussed in the Appendix of this paper), we focus on Program (9) and provide a convexity analysis below.

Convexity of Program (9). Observe that the objective function of Program (9) is not always convex unless some conditions are imposed on X and V. For instance, in the one-dimensional case it reduces to the inversely proportional function f(x) = v/x with v > 0. Clearly, f is convex on the strictly positive half-line and concave on the strictly negative half-line.

The following lemma states the convexity of Program (9) under appropriate conditions.

Lemma 2

Assume that X ≻ 0. The function F(X) ≜ tr(V X^{-1}) is convex (resp., strictly convex) in X if V ⪰ 0 (resp., V ≻ 0).

Since W^T W is the covariance matrix of W, it is always positive semidefinite. Therefore, according to the above lemma, the objective function of Program (9) is convex. Furthermore, since V = W^T W + αI is strictly positive definite, the objective function is actually strictly convex. Therefore, there exists a unique optimal solution of Program (9).
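A quick numerical sanity check of this convexity claim (midpoint convexity of X ↦ tr(V X^{-1}) over positive definite matrices) can be done as follows; the matrix sizes and random inputs are arbitrary.

```python
import numpy as np

def F(X, V):
    return np.trace(V @ np.linalg.inv(X))

rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((10, n))
V = W.T @ W + 1e-3 * np.eye(n)              # V = W^T W + alpha*I, strictly PD

def random_spd(k):
    B = rng.standard_normal((k, k))
    return B @ B.T + k * np.eye(k)           # well-conditioned SPD test matrix

X1, X2 = random_spd(n), random_spd(n)
lhs = F(0.5 * (X1 + X2), V)
rhs = 0.5 * (F(X1, V) + F(X2, V))
print(lhs <= rhs + 1e-9)                     # True: midpoint convexity holds
```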

Dual program of Program (9). The following lemma describes the dual program of Program (9).

Lemma 3

The dual program of Program (9) is the following:

where the dual variable is associated with the inequality constraint diag(X) ≤ 1.

Lower and upper bounds for Program (9). Next we establish a lower bound and an upper bound on the objective function of Program (9) for any feasible solution.

Lemma 4

For any feasible solution X in Program (9), its objective value is bounded from below and above by quantities that depend only on α and on Σ, where Σ comes from the singular value decomposition W = U Σ Q^T.

The parameter α serves as a regularizer for the convex problem. When α > 0, we always have V ≻ 0. As can be seen in our subsequent analysis, the assumption that V is strictly positive definite is necessary in our algorithm design.

Problem formulation with equality constraints. We next reformulate Program (9) in the following lemma.

Lemma 5

Assume V ≻ 0. The optimization problem in Program (9) is equivalent to the following optimization program:

min_X  tr(V X^{-1})   s.t.  diag(X) = 1,  X ≻ 0.    (10)

Program (10) is much more attractive than Program (9), since the equality constraint is easier to handle than the inequality constraint. As can be seen in our algorithm design below, this equality constraint can be explicitly enforced with suitable initialization. Hence, in the rest of the paper we focus on solving Program (10).

First-order and second-order analysis. It is not hard to verify that the first-order and second-order derivatives of the objective function F(X) = tr(V X^{-1}) can be expressed as (see page 700 in [2]):

∇F(X) = −X^{-1} V X^{-1},    ∇²F(X) = (X^{-1} V X^{-1}) ⊗ X^{-1} + X^{-1} ⊗ (X^{-1} V X^{-1}).    (11)
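The gradient formula in Eq (11) can be verified numerically against a finite-difference directional derivative, as in the following sketch (random test matrices, illustrative tolerances).

```python
import numpy as np

def F(X, V):
    return np.trace(V @ np.linalg.inv(X))

def grad_F(X, V):
    Xinv = np.linalg.inv(X)
    return -Xinv @ V @ Xinv                    # -X^{-1} V X^{-1}

rng = np.random.default_rng(1)
n = 5
V = rng.standard_normal((n, n)); V = V @ V.T + np.eye(n)      # PD
X = rng.standard_normal((n, n)); X = X @ X.T + n * np.eye(n)  # PD

D = rng.standard_normal((n, n)); D = 0.5 * (D + D.T)          # symmetric direction
t = 1e-6
fd = (F(X + t * D, V) - F(X - t * D, V)) / (2 * t)            # directional derivative
an = np.sum(grad_F(X, V) * D)                                 # <grad, D>
print(abs(fd - an) / (abs(an) + 1e-12) < 1e-5)                # True up to rounding
```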

Since our method (described soon) is a greedy descent algorithm, we restrict our discussion to the level set of the initial feasible solution X^0, defined as Ω ≜ { X : F(X) ≤ F(X^0), diag(X) = 1, X ≻ 0 }.

We now analyze bounds for the eigenvalues of the solution in Program (10), as well as bounds for the eigenvalues of the Hessian matrix and the gradient matrix of the objective function in Program (10). The following lemma shows that the eigenvalues of the solution in Program (10) are bounded.

Lemma 6

For any X ∈ Ω, there exist strictly positive constants c_l and c_u such that c_l I ⪯ X ⪯ c_u I.

The next lemma shows that the eigenvalues of the Hessian matrix and the gradient matrix of the objective function in Program (10) are also bounded.

Lemma 7

For any X ∈ Ω, there exist strictly positive constants h_l, h_u, g_l and g_u such that h_l I ⪯ ∇²F(X) ⪯ h_u I and g_l ≤ ‖∇F(X)‖ ≤ g_u.

A self-concordant function [18] is a convex function f satisfying |f'''(x)| ≤ 2 f''(x)^{3/2} on its effective domain (for matrix-variable functions, the condition is imposed along every line in the domain). Self-concordance is useful in the analysis of Newton's method, and self-concordant barrier functions are used to develop interior-point methods for convex optimization.

Self-Concordance Property. The following lemma establishes the self-concordance property of Program (10).

Lemma 8

The objective function F(X) = tr(V X^{-1}), scaled by a suitable strictly positive constant determined by the bounds in Lemmas 6 and 7, is a standard self-concordant function on the level set Ω.

The self-concordance property plays a crucial role in our algorithm design and convergence analysis. First, self-concordance ensures that the current solution always stays in the interior of the constraint set [18], which makes it possible to design a new Cholesky decomposition-based algorithm that avoids eigenvalue decomposition (although Cholesky decomposition and eigenvalue decomposition share the same O(n³) computational complexity for factorizing an n × n positive definite matrix, in practice Cholesky decomposition is often significantly faster than eigenvalue decomposition, e.g., by about 50 times for a large square matrix). Second, self-concordance controls the rate at which the second derivative of the function changes, and it provides a checkable sufficient condition to ensure that our method converges to the global solution with a (local) quadratic convergence rate.

4 Convex Optimization Algorithm

In this section, we provide a Newton-like algorithm, COA, to solve Program (10). We first show how to find the search direction and the step size in Sections 4.1 and 4.2, respectively. Then we study the convergence properties of COA in Section 4.3. Finally, we present a homotopy algorithm to further accelerate the convergence. For notational convenience, we use the shorthand F^k ≜ F(X^k), G^k ≜ ∇F(X^k), H^k ≜ ∇²F(X^k), and D^k to denote the objective value, first-order gradient, Hessian matrix, and search direction at the point X^k, respectively.

1:  Input: V ≻ 0 and an initial X^0 such that diag(X^0) = 1 and X^0 ≻ 0
2:  Output: X*
3:  Initialize k = 0
4:  while not converged do
5:     Solve the following subproblem by Algorithm 2 to obtain the Newton direction D^k:
          D^k = argmin_D  ⟨G^k, D⟩ + ½ vec(D)^T H^k vec(D)   s.t.  diag(D) = 0    (12)
6:      Perform a step-size search to get α^k such that:
7:        (1) X^{k+1} = X^k + α^k D^k is positive definite, and
8:        (2) there is sufficient decrease in the objective.
9:     if X^{k+1} is an optimal solution of Program (10) then
10:        terminate and output X^{k+1}
11:     end if
12:     Increment k by 1
13:  end while
Algorithm 1 Algorithm COA for Solving Program (10)

Following the approach of [27, 10, 32], we build a quadratic approximation of the objective function around any solution X^k by considering its second-order Taylor expansion:

q(D; X^k) ≜ F(X^k) + ⟨G^k, D⟩ + ½ vec(D)^T H^k vec(D).    (13)

The Newton direction D^k for the smooth objective function can then be written as the solution of the following equality-constrained quadratic program:

D^k = argmin_D  q(D; X^k)   s.t.  diag(D) = 0.    (14)

After the direction D^k is computed, we employ an Armijo-rule based step-size selection to ensure positive definiteness and sufficient descent of the next iterate. We summarize our algorithm COA in Algorithm 1. Note that the initial point has to be a feasible solution, i.e., diag(X^0) = 1 and X^0 ≻ 0. Moreover, the positive definiteness of all subsequent iterates is guaranteed by the step-size selection procedure (refer to step 7 in Algorithm 1).

1:  Input: X^k ≻ 0, V, and the current gradient G^k; specify the maximum iteration count T
2:  Output: Newton direction D^k
3:  Initialize the direction to zero and the residual to the negative gradient with its diagonal entries set to 0
4:  Set the first conjugate direction equal to the residual
5:  for t = 1 to T do
6:     Compute the Hessian-vector product of the current conjugate direction via matrix products, setting its diagonal entries to 0
7:     Update the direction and the residual using the standard conjugate gradient step size
8:     Set the next conjugate direction from the new residual
9:  end for
10:  return D^k
Algorithm 2 A Modified Conjugate Gradient Method for Finding the Newton Direction D^k in Program (15)

4.1 Computing the Search Direction

This subsection is devoted to finding the search direction in Eq (14). With G^k and H^k as given in Eq (11), Eq (14) reduces to the following optimization program:

min_D  ⟨G^k, D⟩ + ½ vec(D)^T H^k vec(D)   s.t.  diag(D) = 0.    (15)

At first glance, Program (15) is challenging. First, it is a constrained optimization program with n² variables and n equality constraints. Second, it involves computing and storing an n² × n² Hessian matrix H^k, which is a daunting task in algorithm design.

We carefully analyze Program (15) and propose the following solutions. For the first issue, Eq (15) is essentially an unconstrained quadratic program over the off-diagonal entries of D. To handle the diagonal variables of D, one can explicitly set the diagonal entries of the current solution and of its gradient to 0; therefore, the constraint diag(D) = 0 is always maintained. This implies that the linear conjugate gradient method can be used to solve Program (15). For the second issue, we can exploit the Kronecker product structure of the Hessian matrix. Given a vector vec(D), using the fact that the Hessian matrix can be written as H^k = (X^{-1} V X^{-1}) ⊗ X^{-1} + X^{-1} ⊗ (X^{-1} V X^{-1}), the Hessian-vector product can be computed efficiently as H^k vec(D) = vec( X^{-1} D X^{-1} V X^{-1} + X^{-1} V X^{-1} D X^{-1} ), which only involves matrix-matrix computations. Our modified linear conjugate gradient method for finding the search direction is summarized in Algorithm 2. Note that the algorithm involves a parameter T controlling the maximum number of iterations. The specific value of T does not affect the correctness of COA, only its efficiency. Through experiments, we found that a small constant value of T usually leads to good overall efficiency of COA.
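The following sketch illustrates the two ideas of this subsection in NumPy: the Hessian-vector product formed purely with matrix products, and a small projected conjugate gradient loop that keeps diag(D) = 0. Variable names, the stopping rule, and the iteration cap are illustrative; this is not the authors' Matlab implementation.

```python
import numpy as np

def newton_direction(X, V, G, max_iter=10, tol=1e-8):
    """Conjugate-gradient sketch for the Newton subproblem: solve
    H vec(D) = -vec(G) subject to diag(D) = 0, where the Hessian-vector
    product uses matrix products only (the n^2 x n^2 Hessian is never formed)."""
    Xinv = np.linalg.inv(X)
    S = Xinv @ V @ Xinv                           # X^{-1} V X^{-1}

    def hvp(D):                                   # mat(H vec(D))
        return Xinv @ D @ S + S @ D @ Xinv

    def project(M):                               # enforce zero diagonal
        M = M.copy()
        np.fill_diagonal(M, 0.0)
        return M

    D = np.zeros_like(X)
    R = project(-G)                               # initial residual (D = 0)
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(max_iter):
        HP = project(hvp(P))
        alpha = rs / max(np.sum(P * HP), 1e-16)
        D += alpha * P
        R -= alpha * HP
        rs_new = np.sum(R * R)
        if rs_new < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return D
```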

4.2 Computing the Step Size

After the Newton direction D^k is found, we need to compute a step size α^k that ensures positive definiteness of the next iterate and leads to a sufficient decrease of the objective function. We use Armijo's rule and try step sizes from a geometrically decreasing sequence with a constant decrease rate 0 < β < 1 until we find the smallest exponent t such that, with α^k = β^t, the next iterate X^k + α^k D^k is (i) positive definite, and (ii) satisfies the following sufficient decrease condition [27]:

F(X^k + α^k D^k) ≤ F(X^k) + α^k σ ⟨G^k, D^k⟩,    (16)

where 0 < σ < 1. We use fixed constants for β and σ in our experiments.

We verify positive definiteness of the candidate solution while computing its Cholesky factorization (which takes O(n³) flops). We remark that the Cholesky factorization dominates the computational cost of the step-size computation. To reduce this cost, we can reuse the Cholesky factor from the previous iteration when evaluating the objective function (which requires computing tr(V X^{-1})). The decrease condition in Eq (16) has been considered in [27] to ensure that the objective value not only decreases but also decreases by a certain amount proportional to ⟨G^k, D^k⟩, which measures the optimality of the current solution.
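A minimal sketch of this step-size search is given below, using Cholesky factorization as the positive definiteness test; the backtracking constants beta and sigma are placeholders, since the paper's exact values are not reproduced here.

```python
import numpy as np

def armijo_step(X, D, V, F, G, beta=0.5, sigma=0.1, max_tries=30):
    """Backtracking line search sketch: find a step alpha such that X + alpha*D
    stays positive definite (checked via Cholesky) and the objective satisfies
    the Armijo sufficient-decrease condition.  beta and sigma are illustrative."""
    fX = F(X, V)
    gd = np.sum(G * D)                        # <grad, D>, expected negative
    alpha = 1.0
    for _ in range(max_tries):
        X_new = X + alpha * D
        try:
            np.linalg.cholesky(X_new)         # raises if not positive definite
        except np.linalg.LinAlgError:
            alpha *= beta
            continue
        if F(X_new, V) <= fX + sigma * alpha * gd:   # sufficient decrease
            return alpha, X_new
        alpha *= beta
    return 0.0, X                             # no acceptable step found
```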

The following lemma provides theoretical insight into the line search procedure. It states that a strictly positive step size can always be found in Algorithm 1. This property is crucial in our global convergence analysis of the algorithm.

Lemma 9

There exists a strictly positive constant such that any step size no larger than this constant satisfies the positive definiteness and sufficient descent conditions (in steps 7-8 of Algorithm 1). This constant depends only on quantities that are independent of the current solution X^k.

The following lemma shows that a full Newton step size will eventually be selected. This is useful for the proof of local quadratic convergence.

Lemma 10

If X^k is sufficiently close to the global optimal solution, the line search condition will be satisfied with the full step size α^k = 1.

4.3 Theoretical Analysis

First, we provide convergence properties of Algorithm 1. We prove that Algorithm 1 always converges to the global optimum, and then analyze its convergence rate. Our convergence analysis is mainly based on the proximal point gradient method [27, 10] for composite function optimization in the literature. Specifically, we have the following results (proofs appear in the Appendix):

Theorem 1

Global Convergence of Algorithm 1. Let {X^k} be the sequence generated by Algorithm 1. Then {F(X^k)} is non-increasing and {X^k} converges to the global optimal solution.

Theorem 2

Global Linear Convergence Rate of Algorithm 1. Let {X^k} be the sequence generated by Algorithm 1. Then {X^k} converges linearly to the global optimal solution.

Theorem 3

Local Quadratic Convergence Rate of Algorithm 1. Let {X^k} be the sequence generated by Algorithm 1. When X^k is sufficiently close to the global optimal solution, {X^k} converges quadratically to the global optimal solution.

It is worth mentioning that Algorithm 1 is the first polynomial-time algorithm for linear query processing under approximate differential privacy with a provable global optimality guarantee.

Next, we analyze the time complexity of our algorithm. Assume that COA converges within N outer iterations (Algorithm 1) and T inner iterations (Algorithm 2). Due to the O(n³) complexity of the Cholesky factorization and of the matrix products involved (where n is the number of unit counts), the total complexity of COA is O(N · T · n³). Note that the running time of COA is independent of the number of queries m, once V = W^T W has been formed. In contrast, the time complexity of the current state-of-the-art LRM involves both m and n (as well as its own numbers of outer and inner iterations). Moreover, thanks to the quadratic local convergence rate of COA, its iteration count N is in practice much smaller than the number of iterations required by LRM. According to our experiments, COA typically converges with small values of N and T.

4.4 A Homotopy Algorithm

In Algorithm 1, we assume that V is positive definite. If this is not true, one can add a decreasing regularization parameter α to the diagonal entries of W^T W. We present a homotopy algorithm for solving Program (9) with α approaching 0 in Algorithm 3.

Homotopy algorithms such as those used in [25, 6] have demonstrated the advantages of continuation methods in speeding up the solution of large-scale optimization problems. In a continuation method, a sequence of optimization problems with decreasing regularization parameters is solved until a sufficiently small value is reached, and the solution of each problem is used as a warm start for the next one.

In Eq (8), a smaller α is always preferred because it results in a more accurate approximation of the original optimization problem in Program (9). However, it also implies a slower convergence rate, according to our convergence analysis. Hence, the computational cost of our algorithm is high when a small α is selected. In Algorithm 3, a series of problems with decreasing regularization parameters is solved using Algorithm 1, and the solution of each run of Algorithm 1 is used as the initial solution of the next run. Algorithm 3 starts from a large α and stops when the preferred (small) value of α is reached.

1:  Input: workload matrix W
2:  Output: X
3:  Specify the maximum iteration count T_h
4:  Initialize α^0 and a feasible X^0 with diag(X^0) = 1 and X^0 ≻ 0
5:  for t = 0 to T_h − 1 do
6:     Apply Algorithm 1 with V = W^T W + α^t I and initial solution X^t to obtain X^{t+1}
7:     Decrease the regularization parameter: α^{t+1} < α^t
8:  end for
Algorithm 3 A Homotopy Algorithm for Solving Eq (9) with α approaching 0.
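A compact sketch of the continuation loop in Algorithm 3 is shown below; `coa_solver` stands in for Algorithm 1 and is an assumed interface, and the schedule of α values is illustrative.

```python
import numpy as np

def homotopy_coa(W, coa_solver, alphas=(1.0, 1e-1, 1e-2, 1e-3)):
    """Continuation sketch for Algorithm 3: solve a sequence of problems with
    decreasing regularization alpha, warm-starting each run from the previous
    solution.  `coa_solver(V, X0)` is a placeholder for Algorithm 1."""
    n = W.shape[1]
    X = np.eye(n)                              # feasible start: diag(X) = 1, X > 0
    for alpha in alphas:                       # decreasing regularization
        V = W.T @ W + alpha * np.eye(n)        # regularized covariance (Eq (8))
        X = coa_solver(V, X)                   # warm start from previous solution
    return X
```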
Figure 1: Convergence behavior of the proposed convex optimization algorithm (Algorithm 1) on the four workloads: (a) WDiscrete, (b) WMarginal, (c) WRange, (d) WRelated.

5 Experiments

This section experimentally evaluates the effectiveness of the proposed convex optimization algorithm COA for linear aggregate processing under approximate differential privacy. We compare COA with six existing methods: Gaussian Mechanism (GM) [16], Wavelet Mechanism (WM) [29], Hierarchical Mechanism (HM) [8], Exponential Smoothing Mechanism (ESM) [30, 13], Adaptive Mechanism (AM) [14, 13] and Low-Rank Mechanism (LRM) [30, 31]. Qardaji et al. [23] proposed an improved version of HM by carefully selecting the branching factor. Similar to HM, this method focuses on range query processing, and there is no guarantee on result quality for general linear aggregates; a detailed experimental comparison with [23] is left as future work. Moreover, we also compare with a recent hybrid data- and workload-aware method (DAWA) [12], which is designed only for range queries and exact differential privacy. Since a previous study [31] has shown that LRM significantly outperforms MWEM (the Exponential Mechanism with Multiplicative Weights update), we do not compare with MWEM. Although the batch query processing problem under approximate differential privacy in Program (9) can be reformulated as a standard semi-definite program solvable by interior-point solvers, we do not compare with this approach either, since it requires prohibitively high CPU time and memory consumption even for a single (Newton) iteration.

Figure 2: Effect of varying domain size n (number of queries m fixed) on workload WDiscrete; panels: (a) Search Log, (b) Net Trace, (c) Social Network, (d) UCI Adult.
Figure 3: Effect of varying domain size n (m fixed) on workload WMarginal; same panels.
Figure 4: Effect of varying domain size n (m fixed) on workload WRange; same panels.
Figure 5: Effect of varying domain size n (m fixed) on workload WRelated; same panels.
Figure 6: Effect of varying number of queries m (domain size n fixed) on workload WDiscrete; panels: (a) Search Log, (b) Net Trace, (c) Social Network, (d) UCI Adult.
Figure 7: Effect of varying number of queries m (n fixed) on workload WMarginal; same panels.
Figure 8: Effect of varying number of queries m (n fixed) on workload WRange; same panels.
Figure 9: Effect of varying number of queries m (n fixed) on workload WRelated; same panels.

For AM, we employ the Python implementation obtained from the authors' website (http://cs.umass.edu/~chaoli), using the default stopping criterion provided by the authors. For ESM and LRM, we use the Matlab code provided by the authors, which is publicly available at http://yuanganzhao.weebly.com/. For COA, we implement the algorithm in Matlab (refer to the Appendix of this paper) and report only the results of Algorithm 1, with the regularization parameter α fixed. We performed all experiments on a desktop PC with an Intel quad-core 2.50 GHz CPU and 4 GB of RAM. In each experiment, every algorithm is executed 20 times and the average performance is reported.

Following the experimental settings in [31], we use four real-world datasets (Search Log, Net Trace, Social Network and UCI Adult) and four different types of workloads (WDiscrete, WRange, WMarginal and WRelated). In WDiscrete, each entry is a random variable following a Bernoulli distribution; in WRange, each query sums the unit counts in a range whose start and end points are randomly generated following the uniform distribution; WMarginal contains queries uniformly sampled from the set of all two-way marginals; for WRelated, we generate the workload matrix by low-rank matrix multiplication [31]. We measure the average squared error and the computation time of all methods, where the average squared error is the average squared distance between the exact query answers and the noisy answers. In the following, Section 5.1 examines the convergence of Algorithm 1. Sections 5.2 and 5.3 evaluate the performance of all methods with varying domain size n in {128, 256, 512, 1024, 2048, 4096, 8192} and varying number of queries m in {128, 256, 512, 1024, 2048, 4096, 8192}, respectively. Section 5.4 studies the effect of the workload rank, and Section 5.5 reports running times. Unless otherwise specified, the default parameters in bold are used. The privacy parameters ε and δ are fixed to the same values for all methods in our experiments, except for DAWA, which has δ = 0 since it answers queries under exact differential privacy.
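For reproducibility, the following sketch shows one plausible way to generate the WDiscrete and WRange workloads as described above; the exact distribution parameters used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(42)

def wdiscrete(m, n, p=0.5):
    """WDiscrete sketch: each entry drawn from a Bernoulli distribution
    (the paper's exact parameter and signing may differ)."""
    return rng.binomial(1, p, size=(m, n)).astype(float)

def wrange(m, n):
    """WRange sketch: each query sums the unit counts in a range whose
    endpoints are drawn uniformly at random."""
    W = np.zeros((m, n))
    for i in range(m):
        a, b = sorted(rng.integers(0, n, size=2))
        W[i, a:b + 1] = 1.0
    return W
```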

5.1 Convergence Behavior of COA

First, we verify the convergence properties of COA using all datasets and all workloads. We record the objective value (i.e., the expected error), the optimality measure, and the test error on the four datasets at every iteration, and plot these results in Figure 1.

We make three important observations from these results. (i) The objective value and the optimality measure decrease monotonically, because our method is a greedy descent algorithm. (ii) The test errors do not necessarily decrease monotonically, but tend to decrease over the iterations; this is because random Gaussian noise is added to the results, while the expected squared error decreases. (iii) The objective values stabilize after a small number of iterations, which means that our algorithm has converged, and the subsequent decrease of the error is negligible. This implies that one may use a looser stopping criterion without sacrificing accuracy.

5.2 Impact of Varying Number of Unit Counts

We now evaluate the accuracy of all mechanisms with the domain size n varying from 64 to 4096, after fixing the number of queries to 1024. We report the results of all mechanisms on the four different workloads in Figures 2, 3, 4 and 5, respectively. We have the following observations. (i) COA obtains results comparable to LRM, the current state of the art; part of the reason may be that the random initialization strategy helps LRM avoid undesirable local minima. In addition, COA and LRM achieve the best performance in all settings: their improvement over the naive GM exceeds two orders of magnitude, especially when the domain size is large. (ii) WM and HM obtain similar accuracy on WRange, where they are comparable to COA and LRM; this is because they are designed for range query optimization. (iii) AM and ESM have similar accuracy and are usually strictly worse than COA and LRM. Moreover, the accuracy of AM and ESM is rather unstable on workload WMarginal. For ESM, this instability is caused by numerical errors in the matrix inverse operation, which can be large when the final solution matrix is low-rank. Finally, AM searches a reduced subspace for the optimal strategy matrix, leading to suboptimal solutions of unstable quality.

5.3 Impact of Varying Number of Queries

In this subsection, we test the impact of varying the number of queries from 32 to 8192, with the domain size fixed to 512. The accuracy results of all mechanisms on the four different workloads are reported in Figures 6, 7, 8 and 9. We have the following observations. (i) COA and LRM have similar performance and consistently outperform all other methods in all test cases. (ii) On the WDiscrete and WRange workloads, AM and ESM show comparable performance to each other, which is much worse than that of COA and LRM. (iii) On the WDiscrete, WRange and WRelated workloads, WM and HM improve upon the naive Gaussian mechanism; however, on WMarginal, WM and HM incur higher errors than GM. AM and ESM again exhibit similar performance, which is often better than that of WM, HM and GM.

Figure 10: Effect of varying the rank parameter of the WRelated workload (with m and n fixed) on different datasets; panels: Net Trace, Search Log, Social Network, UCI Adult.
Figure 11: Running time comparisons with varying m and fixed n for the different workloads; panels: WDiscrete, WMarginal, WRange, WRelated.
Figure 12: Running time comparisons with varying n and fixed m for the different workloads; same panels.
Figure 13: Running time comparisons with varying m and a larger fixed n for the different workloads; same panels.

5.4 Impact of Varying Rank of Workload

Past studies [30, 31] show that it is possible to reduce the expected error when the workload matrix has low rank. In this set of experiments, we manually control the rank of the workload to verify this claim. Recall that, during the generation of the WRelated workload, a single parameter determines the inner dimension of the two factor matrices; when the factors contain only independent rows/columns, this parameter is exactly the rank of the workload matrix W. In Figure 10, we vary this rank parameter over a range of increasing values. We observe that both LRM and COA outperform all other methods by at least one order of magnitude; with increasing rank, the performance gap gradually closes. Meanwhile, COA's performance is again comparable to that of LRM.

5.5 Running Time Evaluations

We now evaluate the efficiency of LRM, ESM and COA on the four different types of workloads. Other methods, such as WM and HM, require negligible time, since they are essentially heuristics without complex optimization computations. From our experiments we obtain the following results. (i) In Figure 11, we vary m from 32 to 8192 and fix n to 1024. COA requires roughly the same running time regardless of the number of queries m, whereas the efficiency of LRM deteriorates with increasing m. (ii) In Figure 12, we vary n from 32 to 8192 and fix m to 1024. We observe that COA is more efficient than LRM when n is relatively small. This is mainly because COA converges in far fewer iterations than LRM: through manual inspection, we found that COA converges within a small number of outer iterations (refer to Figure 1) and inner iterations (refer to our Matlab code in the Appendix), whereas LRM often takes many more outer and inner iterations to converge. When n is very large and m is relatively small (1024), COA may run slower than LRM due to the former's cubic runtime complexity with respect to the domain size n. (iii) In Figure 13, we vary m from 32 to 8192 and fix n to a larger value, 2048. We observe that COA is much more efficient than LRM for all values of m, because the runtime of COA is independent of m while that of LRM scales quadratically with m, and COA has a quadratic local convergence rate. These results are consistent with the convergence-rate and complexity analyses in Section 4.3.

Figure 14: Effect of varying domain size n (number of queries m fixed) on workload WRange; panels: (a) Search Log, (b) Net Trace, (c) Social Network, (d) UCI Adult, (e) Random Alternating, (f) Random Laplace, (g) Random Gaussian, (h) Random Uniform.
Figure 15: Effect of varying number of queries m with n = 1024 on workload WRange; same panels.

5.6 COA vs DAWA

DAWA [12] targets very different applications compared to the proposed solution COA. In particular, DAWA focuses on range query processing under exact (i.e., ε-) differential privacy, whereas COA addresses arbitrary linear counting queries under approximate (i.e., (ε, δ)-) differential privacy. Adapting DAWA to approximate differential privacy is non-trivial, because at the core of DAWA lies a dynamic programming algorithm that is specially designed for the L1 cost of the Laplace mechanism (refer to Section 3.2 in [12]). Further, DAWA relies on certain assumptions about the underlying data, e.g., that adjacent counts are similar in value, whereas COA is data-independent. Hence, their relative performance depends on the choice of the parameter δ, as well as on the dataset.

We compare COA, with δ ranging from 0.01 to 0.00001, against DAWA on workload WRange, since DAWA focuses on range queries. We also consider four additional synthetic datasets which do not have a locally smooth structure: Random Alternating, Random Laplace, Random Gaussian and Random Uniform. Specifically, the sensitive data in Random Alternating contains only two values, which appear alternately in the data sequence. For Random Laplace, Random Gaussian and Random Uniform, the sensitive data consists of a random vector with mean zero and variance 10, drawn from the Laplace, Gaussian and uniform distribution, respectively.

Figure 14 shows the results with varying domain size n, and Figure 15 shows the results with varying number of queries m. We have the following observations. (i) On the real-world datasets Search Log, Net Trace and Social Network, and on all synthetic datasets (Random Alternating, Random Laplace, Random Gaussian, Random Uniform), the performance of DAWA is rather poor, since these datasets do not satisfy the assumption that adjacent aggregates have similar values. (ii) With a fixed number of queries, COA significantly outperforms DAWA when the domain size n is large. (iii) COA generally achieves better performance than DAWA unless δ is very small. (iv) DAWA outperforms COA only when δ is very small and the dataset happens to satisfy its assumptions. In such situations, one potential way to improve COA is to incorporate data-dependent information through a post-processing technique (e.g., [9, 11]), which is outside the scope of this paper and left as future work.

6 Conclusions and Future work

In this paper, we introduce a convex re-formulation for optimizing batch linear aggregate queries under approximate differential privacy, and provide a systematic analysis of the resulting convex optimization problem. To solve the convex problem, we propose a Newton-like method, which is guaranteed to achieve a globally linear convergence rate and a locally quadratic convergence rate. Extensive experiments on real-world datasets demonstrate that our method is efficient and effective.

There are several research directions worth pursuing in the future. (i) It is interesting to extend the proposed method to develop hybrid data- and workload-aware differentially private algorithms [12, 11]. (ii) This paper mainly focuses on minimizing the expected squared error; due to the rotational invariance of the L2 norm, the proposed solution achieves the global optimum for this metric. We plan to investigate convex relaxations/reformulations for handling the squared/absolute sum error under differential privacy. (iii) While we consider convex semi-definite optimization, one may consider other convex relaxation methods (e.g., further SDP relaxations [28], Second-Order Cone Programming (SOCP) [26]) and other efficient linear algebra techniques (such as partial eigenvalue decomposition, randomized schemes, or parallelization) to reduce the computational cost of large-scale batch linear aggregate query optimization.

Appendix

1 Semi-definite Programming Reformulations

In this section, we discuss some convex Semi-Definite Programming (SDP) reformulations of the strategy-search optimization program. Based on these reformulations, we can directly and effectively solve the batch query answering problem using off-the-shelf interior-point SDP solvers.

The following lemma is useful in deriving the SDP formulations for approximate and exact differential privacy.

Lemma 11

[2] Schur Complement Condition. Let Z be a real symmetric matrix given in block form by Z = [A, B; B^T, C] with A ≻ 0, and let S ≜ C − B^T A^{-1} B be the Schur complement of A in Z. Then Z ⪰ 0 if and only if S ⪰ 0, and Z ≻ 0 if and only if S ≻ 0.

1.1 Approximate Differential Privacy

This subsection presents the SDP formulation for approximate differential privacy, i.e., δ > 0. Letting X ≜ A^T A, we have X ⪰ 0 and diag(X) ≤ 1. Introducing a new variable Y such that Y ⪰ W X^{-1} W^T, the strategy-search program can be cast into the following convex optimization problem:

min_{X, Y}  tr(Y)   s.t.  Y ⪰ W X^{-1} W^T,  diag(X) ≤ 1,  X ≻ 0.    (17)

Relaxing the strict constraint X ≻ 0 to X ⪰ 0 does not change the optimal value. By Lemma 11, we then have the following optimization problem, which is equivalent to Eq (17):

min_{X, Y}  tr(Y)   s.t.  [X, W^T; W, Y] ⪰ 0,  diag(X) ≤ 1.    (18)

After the solution of Eq (18) has been found by solving a standard convex SDP, we can perform a Cholesky decomposition or eigenvalue decomposition on X such that X = A^T A, and output the matrix A as the final strategy. We remark that the output solution is an exact solution of the approximate differential privacy optimization problem itself.
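Assuming the Schur-complement form sketched in Eq (18), the SDP can be prototyped in a few lines with an off-the-shelf solver; this is a small-scale illustration only (SDP solvers do not scale to large n), and the variable names are ours.

```python
import numpy as np
import cvxpy as cp

# Small illustrative instance of the assumed SDP:
#   min tr(Y)  s.t.  [[X, W^T], [W, Y]] >> 0,  diag(X) <= 1.
m, n = 8, 16
W = np.random.randn(m, n)

X = cp.Variable((n, n), symmetric=True)
Y = cp.Variable((m, m), symmetric=True)
constraints = [cp.bmat([[X, W.T], [W, Y]]) >> 0,   # Schur complement condition
               cp.diag(X) <= 1]
prob = cp.Problem(cp.Minimize(cp.trace(Y)), constraints)
prob.solve(solver=cp.SCS)

# Recover a strategy A with A^T A ~= X (tiny jitter added for the factorization).
A = np.linalg.cholesky(X.value + 1e-6 * np.eye(n)).T
```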

1.2 Exact Differential Privacy

This subsection presents the SDP formulation for exact differential privacy, i.e., δ = 0, where the sensitivity is measured under the L1 norm. Letting X ≜ A^T A, we have:

By Lemma 11, we have its equivalent reformulation:

This is also equivalent to the following problem:

Using Lemma 11 again and dropping the rank constraint, we have the following convex relaxation problem:

(19)

After the problem in Eq (19) has been solved by a standard convex SDP solver, we can output the matrix A as the final strategy. Interestingly, unlike the case of approximate differential privacy, the output matrix is not an exact solution of the exact differential privacy optimization problem, since we drop the rank constraint in Eq (19).

2 Technical Proofs

The following lemma is useful in our proof.

Lemma 12

For any symmetric matrix X and any positive semidefinite matrix Y, the following inequality holds:

tr(X Y) ≥ λ_min(X) · tr(Y),

where λ_min(X) denotes the smallest eigenvalue of X.

We denote . Since both and are PSD matrices, we let . Then we have the following inequalities: