1 Introduction
Differential privacy [5, 3] is a strong and rigorous privacy protection model that is known for its generality, robustness and effectiveness. It is used, for example, in the ubiquitous Google Chrome browser [7]. The main idea is to publish randomized aggregate information over sensitive data, with the guarantee that the adversary cannot infer with high confidence the presence or absence of any individual in the dataset from the released aggregates. An important goal in the design of differentially private methods is to maximize the accuracy of the published noisy aggregates with respect to their exact values.
Besides optimizing for specific types of aggregates, an important generic methodology for improving the overall accuracy of the released aggregates under differential privacy is batch processing, first proposed in [13]. Specifically, batch processing exploits the correlations between multiple queries, so that answering the batch as a whole can lead to higher overall accuracy than answering each query individually. For example, if one aggregate query Q1 (e.g., the total population of New York State and New Jersey) can be expressed as the sum of two other queries (the population of New York and New Jersey, respectively), i.e., Q1 = Q2 + Q3, then we can simply answer Q1 by adding up the noisy answers of Q2 and Q3. Intuitively, answering two queries instead of three reduces the amount of random perturbations required to satisfy differential privacy, leading to higher overall accuracy for the batch as a whole [13, 30]. In this paper, we focus on answering linear aggregate queries under differential privacy. Given a batch of linear aggregate queries (called the workload), we aim to improve their overall accuracy by answering a different set of queries (called the strategy) under differential privacy, and combining their results to obtain the answers to the original workload aggregates.
As shown in [13, 14, 30, 31, 11], different strategy queries lead to different overall accuracy for the original workload. Hence, an important problem in batch processing under differential privacy is to find a suitable strategy that leads to the highest accuracy. Such a strategy can be rather complex, rendering manual construction and bruteforce search infeasible [30, 31]. On the other hand, the problem of strategy searching can be formulated into a constrained optimization program, and it suffices to find the optimal solution of this program. However, as we show later in Section 2, the program appears to be nonlinear and nonconvex; hence, solving it is rather challenging. As we review in Section 2.2, existing approaches resort to either heuristics or complex, expensive and unstable numerical methods. To our knowledge, no existing solutions guarantee to find the optimal solution.
This paper points out that under the (, )differential privacy definition (also called approximate differential privacy, explained in Section 2), the constrained optimization program for finding the best strategy queries can be reformulated into a simple and elegant convex optimization program. Note that although the formulation itself is simple, its derivation is rather complicated and nontrivial. Based on this new formulation, we propose the first polynomial solution COA that guarantees to find the optimal solution to the original constrained optimization problem in search of a suitable strategy for processing a batch of arbitrary linear aggregate queries under approximate differential privacy. COA is based on Newton’s method and it utilizes various nontrivial properties of the problem. We show that COA achieves globally linear and locally quadratic convergence rate. Extensive experiments confirm the effectiveness and efficiency of the proposed method.
The rest of the paper is organized as follows. Section 2 provides necessary background on differential privacy and overviews related work. Section 3 presents our convex programming formulation for batch linear aggregate processing under approximate differential privacy. Section 4 describes the proposed solution COA. Section 5 contains a thorough set of experiments. Section 6
concludes the paper with directions for future work. In this paper, boldfaced lowercase letters denote vectors and uppercase letters denote realvalued matrices. We summarize the frequent notations in Table
1.Symbol  Meaning 

, Workload matrix  
Number of queries (i.e., rows) in  
Unit counts (i.e., columns) in  
, Covariance matrix of  
, Solution matrix  
, Strategy matrix  
, pseudoinverse of matrix  
, Vectorized listing of  
, Convert into a matrix  
, Objective value of  
, Gradient matrix of  
, Hessian matrix of  
, Equivalent to  
1/0  Allone column vector/Allzero column vector 
I  Identity matrix 
Matrix is positive semidefinite  
Matrix is positive definite  
Eigenvalue of (increasing order)  
Diagonal matrix with as the main diagonal entries  
Column vector formed from the main diagonal of  
Operator norm: the largest eigenvalue of  
Smallest eigenvalue of  
Sum of the elements on the main diagonal  
Euclidean inner product, i.e.,  
Kronecker product of and  
Hadamard (a.k.a. entrywise) product of and  
Nuclear norm: sum of the singular values of matrix 

Frobenius norm:  
Generalized vector norm:  
lower bound and upper bound of  
lower bound and upper bound of  
lower bound and upper bound of 
2 background
2.1 Preliminaries
A common definition of differential privacy is (, )differential privacy [5], as follows:
Definition 1
Two databases and are neighboring iff they differ by at most one tuple. A randomized algorithm satisfies ()differential privacy iff for any two neighboring databases and and any measurable output in the range of , we have
When , the above definition reduces to another popular definition: differential privacy (also called “exact differential privacy”). This work focuses on the case where , which is sometimes called approximate differential privacy. Usually, is set to a value smaller than , where is the number of records in the dataset . Both exact and approximate definitions of differential privacy provide strong and rigorous privacy protection to the users. Given the output of a differentially private mechanism, the adversary cannot infer with high confidence (controlled by parameters and ) whether the original database is or any of its neighbors , which differ from
by one record, meaning that each user can plausibly deny the presence of her tuple. An approximately differentially private mechanism can be understood as satisfying exact differential privacy with a certain probability controlled by parameter
. Hence, it is a more relaxed definition which is particularly useful when the exact definition is overly strict for an application, leading to poor result utility.One basic mechanism for enforcing approximate differential privacy is the Gaussian mechanism [4], which injects Gaussian noise to the query results calibrated to the sensitivity of the queries. Note that the Gaussian mechanism cannot be applied to exact differential privacy. Since the proposed method is based on the Gaussian mechanism, it is limited to query processing under approximate differential privacy as well. Specifically, for any two neighbor databases and , the sensitivity of a query set is defined as . Given a database and a query set
, the Gaussian mechanism outputs a random result that follows the Gaussian distribution with mean
and magnitude , where .This paper focuses on answering a batch of linear aggregate queries, , each of which is a linear combination of the unit aggregates of the input database . For simplicity, in the following we assume that each unit aggregate is a simple count, which has an sensitivity of 1. Other types of aggregates can be handled by adjusting the sensitivity accordingly. The query set can be represented by a workload matrix with rows and columns. Each entry in is the weight in query on the th unit count . Since we do not use any other information of the input database besides the unit counts, in the following we abuse the notation by using to represent the vector of unit counts. Therefore, we define , (“” means define). The query batch can be answered directly by:
Given a workload matrix , the worsecase expected squared error of a mechanism is defined as [13, 15, 20]:
where the expectation is taken over the randomness of . Without information of the underlying dataset, the lowest error achievable by any differentially private mechanism for the query matrix and database is:
(1) 
where the infimum is taken over all differentially private mechanisms. If a mechanism minimizes the objective value in Eq (1), it is the optimal linear counting query processing mechanism, in the sense that without any prior information of the sensitive data, it achieves the lowest expected error.
2.2 Existing Solutions
Matrix Mechanism. The first solution for answering batch linear aggregate queries under differential privacy is the matrix mechanism [13]. The main idea is that instead of answering the workload queries directly, the mechanism first answers a different set of queries under differential privacy, and then combine their results to answer . Let matrix represent the strategy queries, where each row represent a query and each column represent a unit count. Then, according to the Gaussian mechanism, can be answered using under differentially privacy, where denotes an dimensional Gaussian variable with scale , and is the maximum norm among all column vectors of . Accordingly, the matrix mechanism answers by:
(2) 
where is the MoorePenrose pseudoinverse of .
Based on Eq (2), Li et al. [13] formalize the strategy searching problem for batch linear counting query processing in Eq(1) into the following nonlinear optimization problem:
(3) 
In the above optimization program, can be either 1 or 2, and the method in [13] applies to both exact and approximate differential privacy. This optimization program, however is rather difficult to solve. The pseudoinverse of of involved in Program (3) is not a continuous function, as it jumps around when is illconditioned. Therefore, does not have a derivative, and we cannot solve the problem with simple gradient descent. As pointed out in [31], the solutions in [13] are either prohibitively expensive (which needs to iteratively solve a pair of related semidefinite programs that incurs computational costs), or ineffective (which rarely obtains strategies that outperform naive methods).
LowRank Mechanism. Yuan et al. [31] propose the LowRank Mechanism (LRM), which formulates the batch query problem as the following lowrank matrix factorization problem:
(4) 
where . It can be shown that Program (4) and Program (3) are equivalent to each other; hence, LRM can be viewed as a way to solve the Matrix Mechanism optimization program (to our knowledge, LRM is also the first practical solution for this program). The LRM formulation avoids the pseudoinverse of the strategy matrix ; however, it is still a nonlinear, nonconvex constrained optimization program. Hence, it is also difficult to solve. The solution in LRM is a sophisticated numeric method based firstorder augmented Lagrangian multipliers (ALM). This solution, however, cannot guarantee to find the globally optimal strategy matrix , due to the nonconvex nature of the problem formulation.
Further, the LRM solution may not converge at all. Specifically, it iteratively updates using the formula: , where is the penalty parameter. When is lowrank, according to the rank inequality for matrix multiplication, it leads to: . Therefore, the equality constraint may never hold since we can never express a fullrank matrix with the product of two lowrank ones. When this happens, LRM never converges. For this reason, the initial value of needs to be chosen carefully so that it is not lowrank. However, this problem cannot be completed avoided since during the iterations of LRM, the rank of may drop. Finally, even in cases where LRM does converge, its convergence rate can be slow, leading to high computational costs as we show in the experiments. In particular, the LRM solution is not necessarily a monotone descent algorithm, meaning that the accuracy of its solutions can fluctuate during the iterations.
Adaptive Mechanism. In order to alleviate the computational overhead of the matrix mechanism, adaptive mechanism (AM) [14] considers the following optimization program:
(5) 
where
is from the singular value decomposition of the workload matrix
with , and , i.e., the diagonal values of . AM then computes the strategy matrix by , where is a diagonal matrix with as its diagonal values.The main drawback of AM is that it searches over a reduced subspace of , since it is limited to a weighted nonnegative combination of the fixed eigenqueries . Hence, the candidate strategy matrix solved from the optimization problem in (5) is not guaranteed to be the optimal strategy. In fact it is often suboptimal, as shown in the experiments.
Exponential Smoothing Mechanism. Based on a reformulation of matrix mechanism, the Exponential Smoothing Mechanism (ESM) [30] considers solving the following optimization program:
(6) 
where is a function that retrieves the largest element in a vector. This function is hard to compute since it is nonsmooth. The authors use the soft max function to smooth this term and employ the nonmonotone spectral projected gradient descent for optimizing the nonconvex but smooth objective function on a positive definiteness constraint set.
One major problem with this method is that Program (6) involves matrix inverse operator, which may cause numerical instability when the final solution (i.e., the strategy matrix) is of low rank. Further, since the problem is not convex, the ESM solution does not guarantee to converge to the global optimum, either.
The proposed solution, presented next, avoids all the drawbacks of previous solutions: it is fast, stable, numerically robust, and most importantly, it guarantees to find the optimal solution.
3 A Convex Problem Formulation
This section presents the a convex optimization formulation for finding the best strategy for a given workload of linear aggregate queries. The main idea is that instead of solving for the strategy matrix that minimizes result error directly, we first solve the optimal value for , and then obtain accordingly. Note that there can be multiple strategy matrices from a given , in which case we simply output an arbitrary one, since they all lead to the same overall accuracy for the original workload . As we show soon, the objective function with respect to is convex; hence, the proposed solution is guaranteed to find the global optimum. The reformulation of the optimization program involves a nontrivial semidefinite programming lifting technique to remove the quadratic term, presented below.
First of all, based on the nonconvex model in Program (3), we have the following lemma^{1}^{1}1All proofs can be found in the Appendix..
Lemma 1
Given an arbitrary strategy matrix , we can always construct another strategy satisfying (i) and (ii) , where is defined in in Program (3).
By Lemma 1, the following optimization program is equivalent to Program (3).
(7) 
This paper focuses on approximate differential privacy where . Moreover, we assume that is full rank. If this assumption does not hold, we simply transform into a full rank matrix by adding an identity matrix scaled by , where approaches zero. Formally, we have:
(8) 
Let . Using the fact that and , we have the following matrix inverse optimization program (note that and are both fullrank):
(9) 
Interestingly, using the fact that , one can approximate the matrix inverse via Neumann Series ^{2}^{2}2 and rewrite the objective function in terms of matrix polynomials ^{3}^{3}3. Although other convex semidefinite programming reformulations/relaxations exist (discussed in the Appendix of this paper), we focus on Program (9) and provide convex analysis below.
Convexity of Program (9). Observe that the objective function of Program (9) is not always convex unless some conditions are imposed on and . For instance, in the the onedimensional case, it reduces to the inversely proportional function , with . Clearly, is convex on the strictly positive space and concave on the strictly negative space.
The following lemma states the convexity of Program (9) under appropriate conditions.
Lemma 2
Assume that . The function is convex (resp., strictly convex) if (resp., ).
Since is the covariance matrix of , is always positive semidefinite. Therefore, according to the above lemma, the objective function of Program (9) is convex. Furthermore, since is strictly positive definite, the objective function is actually strictly convex. Therefore, there exists a unique optimal solution for Program (9).
Lemma 3
The dual program of Program (9) is the following:
where is associated with the inequality constraint .
Lower and upper bounds for Program (9). Next we establish a lower bound and an upper bound on the objective function of Program (9) for any feasible solution.
Lemma 4
For any feasible solution in Program (9), its objective value is sandwiched as
where , and comes from the SVD decomposition that .
The parameter serves as regularization of the convex problem. When , we always have . As can be seen in our subsequent analysis, the assumption that is strictly positive definite is necessary in our algorithm design.
Problem formulation with equality constraints. We next reformulate Program (9) in the following lemma.
Lemma 5
Assume . The optimization problem in Program (9) is equivalent to the following optimization program:
(10) 
Program (10) is much more attractive than Program (9) since the equality constraint is easier to handle than the inequality constraint. As can be seen in our algorithm design below, this equality constraint can be explicitly enforced with suitable initialization. Hence, in the rest of the paper, we focus on solving Program (10).
Firstorder and secondorder analysis. It is not hard to verify that the firstorder and secondorder derivatives of the objective function can be expressed as (see page 700 in [2]):
(11) 
Since our method (described soon) is a greedy descent algorithm, we restrict our discussions on the level set which is defined as:
We now analyze bounds for the eigenvalues of the solution in Program (10), as well as bounds for the eigenvalues of the Hessian matrix and the gradient matrix of the objective function in Program (10). The following lemma shows that the eigenvalues of the solution in Program (10) are bounded.
Lemma 6
For any , there exist some strictly positive constants and such that where and .
The next lemma shows the the eigenvalues of the Hessian matrix and the gradient matrix of the objective function in Program (10) are also bounded.
Lemma 7
For any , there exist some strictly positive constants and such that and , where , , , .
A selfconcordant function [18] is a function for which in the affective domain. It is useful in the analysis of Newton’s method. A selfconcordant barrier function is used to develop interior point methods for convex optimization.
SelfConcordance Property. The following lemma establishes the selfconcordance property of Program (10).
Lemma 8
The objective function
with is a standard selfconcordant function, where is a strictly positive constant with
The selfconcordance plays a crucial role in our algorithm design and convergence analysis. First, selfconcordance ensures that the current solution is always in the interior of the constraint set [18] , which makes it possible for us to design a new Cholesky decompositionbased algorithm that can avoid eigenvalue decomposition^{4}^{4}4Although Cholesky decomposition and eigenvalue decomposition share the same computational complexity ()) for factorizing a positive definite matrix of size , in practice Cholesky decomposition is often significantly faster than eigenvalue decomposition (e.g. by about 50 times for a square matrix of size ).. Second, selfconcordance controls the rate at which the second derivative of a function changes, and it provides a checkable sufficient condition to ensure that our method converges to the global solution with (local) quadratic convergence rate.
4 Convex Optimization Algorithm
In this section, we provide a Newtonlike algorithm COA to solve Program (10). We first show how to find the search direction and the step size in Sections 4.1 and 4.2, respectively. Then we study the convergence property of COA in Section 4.3. Finally, we present a homotopy algorithm to further accelerate the convergence. For notational convenience, we use the shorthand notation , , , and to denote the objective value, firstorder gradient, hessian matrix and the search direction at the point , respectively.
Following the approach of [27, 10, 32], we build a quadratic approximation around any solution for the objective function by considering its secondorder Taylor expansion:
(13) 
Therefore, the Newton direction for the smooth objective functon can then be written as the solution of the following equality constrained quadratic program:
(14) 
After the direction is computed, we employ an Arimijorule based step size selection to ensure positive definiteness and sufficient descent of the next iterate. We summarize our algorithm COA in Algorithm 1. Note that the initial point has to be a feasible solution, thus and . Moreover, the positive definiteness of all the following iterates will be guaranteed by the step size selection procedure (refer to step 7 in Algorithm 1).
4.1 Computing the Search Direction
This subsection is devoted to finding the search direction in Eq (14). With the choice of and , Eq(14) reduces to the following optimization program:
(15) 
At first glance, Program (15) is challenging. First, this is a constrained optimization program with variables and equality constraints. Second, the optimization problem involves computing and storing an Hessian matrix , which is a daunting task in algorithm design.
We carefully analyze Problem (15) and propose the following solutions. For the first issue, Eq (15) is actually a unconstrained quadratic program with variable. In order to handle the diagonal variables of , one can explicitly enforce the diagonal entries of current solution and its gradient to 0. Therefore, the constraint can always be guaranteed. This implies that linear conjugate gradient method can be used to solve Problem (15). For the second issue, we can make good use of the Kronecker product structure of the Hessian matrix. We note that . Given a vector , using the fact that the Hessian matrix can be computed as , the Hessianvector product can be computed efficiently as:
, which only involves matrixmatrix computation. Our modified linear conjugate gradient method for finding the search direction is summarized in Algorithm 2. Note that the algorithm involves a parameter controlling the maximum number of iterations. The specific value of does not affect the correctness of COA, only its efficiency. Through experiments we found that a value of usually leads to good overall efficiency of COA.
4.2 Computing the Step Size
After the Newton direction is found, we need to compute a step size that ensures positive definiteness of the next iterate and leads to a sufficient decrease of the objective function. We use Armijo’s rule and try step size with a constant decrease rate until we find the smallest with such that is (i) positive definite, and (ii) satisfies the following sufficient decrease condition [27]:
(16) 
where . We choose and in our experiments.
We verify positive definiteness of the solution while computing its Cholesky factorization (takes flops). We remark that the Cholesky factorization dominates the computational cost in the stepsize computations. To reduce the computation cost, we can reuse the Cholesky factor in the previous iteration when evaluating the objective function (that requires the computation of ). The decrease condition in Eq (16) has been considered in [27] to ensure that the objective value not only decreases but also decreases by a certain amount , where measures the optimality of the current solution.
The following lemma provides some theoretical insights of the line search program. It states that a strictly positive step size can always be achieved in Algorithm 1. This property is crucial in our global convergence analysis of the algorithm.
Lemma 9
There exists a strictly positive constant
such that the positive definiteness and sufficient descent conditions (in step 78 of Algorithm 1) are satisfied. Here and are some constants which are independent of the current solution .
The following lemma shows that a full Newton step size will be selected eventually. This is useful for the proof of local quadratic convergence.
Lemma 10
If is close enough to global optimal solution such that , the line search condition will be satisfied with step size .
4.3 Theoretical Analysis
First, we provide convergence properties of Algorithm 1. We prove that Algorithm 1 always converges to the global optimum, and then analyze its convergence rate. Our convergence analysis is mainly based on the proximal point gradient method [27, 10] for composite function optimization in the literature. Specifically, we have the following results (proofs appear in the Appendix):
Theorem 1
Theorem 2
Theorem 3
It is worth mentioning that Algorithm 1 is the first polynomial algorithm for linear query processing under approximate differential privacy with a provable global optimum guarantee.
Next we analyze the time complexity of our algorithm. Assume that COA converges within outer iterations (Algorithm 1) and inner iterations (Algorithm 2). Due to the complexity for Cholesky factorization (where is the number of unit counts), the total complexity of COA is . Note that the running time of COA is independent of the number of queries . In contrast, the current stateoftheart LRM has time complexity (where and are the number of outer and inner iterations of LRM, respectively), which involves both and . Note that in the big notation is often much smaller than in practice, due to the quadratic convergence rate of COA. According to our experiments, typically COA converges with and .
4.4 A Homotopy Algorithm
In Algorithm 1, we assume that is positive definite. If this is not true, one can consider adding a deceasing regularization parameter to the diagonal entries of . We present a homotopy algorithm for solving Program (9) with approaching 0 in Algorithm 3.
The homotopy algorithm used in [25, 6] have shown the advantages of continuation method in speeding up solving largescale optimization problems. In continuation method, a sequence of optimization problems with deceasing regularization parameter is solved until a sufficiently small value is arrived. The solution of each optimization is used as the warm start for the next iteration.
In Eq (8), a smaller is always preferred because it results in more accurate approximation of the original optimization in Program (9). However, it also implies a slower convergence rate, according to our convergence analysis. Hence the computational cost of our algorithm is high when small is selected. In Algorithm 3, a series of problems with decreasing regularization parameter are solved by using Algorithm 1, and the solution of each run of Algorithm 1 is used as the initial solution of the next iteration. In this paper, Algorithm 3 starts from a large , and it stops when the preferred arrives.
5 Experiments
This section experimentally evaluates the effectiveness of the proposed convex optimization algorithm COA for linear aggregate processing under approximate differential privacy. We compare COA with six existing methods: Gaussian Mechanism (GM) [16], Wavelet Mechanism (WM) [29], Hierarchical Mechanism (HM) [8], Exponential Smoothing Mechanism (ESM) [30, 13], Adaptive Mechanism (AM) [14, 13] and LowRank Mechanism (LRM) [30, 31]. Qardaji et al. [23] proposed an improved version of HM by carefully selecting the branching factor. Similar to HM, this method focuses on range processing, and there is no guarantee on result quality for general linear aggregates. A detailed experimental comparison with [23] is left as future work. Moreover, we also compare with a recent hybrid data and workloadaware method [12] which is designed only for range queries and exact differential privacy. Since a previous study [31] has shown that LRM significantly outperforms MWEM, we do not compare with Exponential Mechanism with Multiplicative Weights update (MWEM). Although the batch query processing problem under approximate differential privacy in Program (9) can be reformulated as a standard semidefinite programming problem which can be solved by interior point solvers, we do not compare with it either since such method requires prohibitively high CPU time and memory consumption even for one single (Newton) iteration.
For AM, we employ the Python implementation obtained from the authors’ website: http://cs.umass.edu/~chaoli. We use the default stopping criterion provided by the authors. For ESM and LRM, we use the Mablab code provided by the authors, which is publicly available at: http://yuanganzhao.weebly.com/. For COA, we implement the algorithm in Matlab (refer to the Appendix of this paper) and only report the results of Algorithm 1 with the parameter . We performed all experiments on a desktop PC with an Intel quadcore 2.50 GHz CPU and 4GBytes RAM. In each experiment, every algorithm is executed 20 times and the average performance is reported.
Following the experimental settings in [31], we use four realworld data sets (Search Log, Net Trace, Social Network and UCI Adult) and fours different types of workloads (WDiscrete, WRange, WMarginal and WRelated). In WDiscrete
, each entry is a random variable follows the bernoulli distribution; in
WRange, each query sums the unit counts in a range whose start and end points are randomly generated following the uniform distribution.
WMarginal contains queries uniformly sampled from the set of all twoway marginals. For WRelated, we generate workload matrix by lowrank matrix multiplication [31]. Moreover, we measure average squared error and computation time of all the methods. Here the average squared error is the average squared distance between the exact query answers and the noisy answers. In the following, Section 5.1 examines the convergence of Algorithm 1. Sections 5.2 and 5.3 demonstrate the performance of all method with varying domain size {128, 256, 512, 1024, 2014, 4096, 8192} and number of queries { 128, 256, 512, 1024, 2048, 4096, 8192}, respectively. Section 5.5 shows the running time of the proposed method. Unless otherwise specified, the default parameters in bold are used. The privacy parameters are set to in our experiments for all methods, except for DAWA, which has since it answers queries under exact differential privacy.5.1 Convergence Behavior of COA
Firstly, we verify the convergence property of COA using all the datasets on all the workloads. We record the objective value (i.e. the expected error), the optimality measure (i.e. ) and the test error on four datasets at every iteration and plot these results in Figure 1.
We make three important observations from these results. (i) The objective value and optimality measure decrease monotonically. This is because our method is a greedy descent algorithm. (ii) The test errors do not necessarily decrease monotonically but tend to decrease iteratively. This is because we add random gaussian noise to the results and the average squared error is expected to decrease. (iii) The objective values stabilize after the th iteration, which means that our algorithm has converged, and the decrease of the error is negligible after the th iteration. This implies that one may use a looser stopping criterion without sacrificing accuracy.
5.2 Impact of Varying Number of Unit Counts
We now evaluate the accuracy performance of all mechanisms with varying domain size from 64 to 4096, after fixing the number of queries to 1024. We report the results of all mechanisms on the 4 different workloads in Figures 5, 5, 5 and 5, respectively. We have the following observations. (i) COA obtains comparable results with LRM, the current state of the art. Part of the reason may be that, the random initialization strategy makes LRM avoid undesirable local minima. In addition, COA and LRM achieve the best performance in all settings. Their improvement over the naive GM is over two orders of magnitude, especially when the domain size is large. (ii) WM and HM obtain similar accuracy on WRange and they are comparable to COA and LRM. This is because they are designed for range queries optimization. (iii) AM and ESM have similar accuracy and they are usually strictly worse than COA and LRM. Moreover, the accuracy of AM and ESM is rather unstable on workload WMarginal. For ESM, this instability is caused by numerical errors in the matrix inverse operation, which can be high when the final solution matrix is lowrank. Finally, AM searches in a reduced subspace for the optimal strategy matrix, leading to suboptimal solutions with unstable quality.
5.3 Impact of Varying Number of Queries
In this subsection, we test the impact of varying the query set cardinality from 32 to 8192 with fixed to 512. The accuracy results of all mechanisms on the 4 different workloads are reported in Figures 9, 9, 9 and 9. We have the following observations. (i) COA and LRM have similar performance and they consistently outperform all the other methods in all test cases. (ii) On WDiscrete and WRange workloads, AM and ESM show comparable performance, which is much worse performance than COA and LRM. (iii) On WDiscrete, WRange and WRelated workload, WM and HM improve upon the naive Gaussian mechanism; however, on WMarginal, WM and HM incur higher errors than GM. AM and ESM again exhibit similar performance, which is often better than that of WM, HM, and GM.
5.4 Impact of Varying Rank of Workload
Past studies [30, 31] show that it is possible to reduce the expected error when the workload matrix has low rank. In this set of experiments, we manually control the rank of workload to verify this claim. Recall that the parameter determines the size of the matrix and the size of the matrix during the generation of the WRelated workload. When and contain only independent rows/columns, is exactly the rank of the workload matrix . In Figure 13, we vary from to . We observe that both LRM and COA outperform all other methods by at least one order of magnitude. With increasing , the performance gap gradually closes. Meanwhile, COA’s performance is again comparable to LRM.
5.5 Running Time Evaluations
We now demonstrate the efficiency of LRM, ESM and COA for the 4 different types of workloads. Other methods, such as WM and HM, requires negligible time since they are essentially heuristics without complex optimization computations. From our experiments we obtain the following results. (i) In Figure 13, we vary from 32 to 8192 and fix to 1024. COA requires the same running time regardless of the number of queries , whereas the efficiency of LRM deteriorates with increasing . (ii) In Figure 13, we vary from 32 to 8192 and fix to 1024. We observe that COA is more efficient than LRM when is relatively small (i.e., ). This is mainly because COA converges with much fewer iterations than LRM. Specifically, we found through manual inspection that COA converges within about outer iterations (refer to Figure 1) and inner iterations (refer to our Matlab code in the Appendix). In contract, LRM often takes about outer iterations and about inner iterations to converge. When is very large (e.g., when ) and is relatively small (1024), COA may run slower than LRM due to the former’s cubic runtime complexity with respect to the domain size . (iii) In Figure 13, we vary from 32 to 8192 and fix to a lager value 2048. We observe that COA is much more efficient than LRM for all values of . This is because the runtime of COA is independent of while LRM scale quadratically with , and COA has quadratic local convergence rate. These results are consistent with the convergence rate analysis and complexity analysis in Section 4.3.
5.6 COA vs DAWA
DAWA [12] targets very different applications compared to the proposed solution COA. In particular, DAWA focuses on range processing under exact (i.e., ) differential privacy, whereas COA addresses arbitrary linear counting queries under approximate (i.e., (, )) differential privacy. Adapting DAWA to approximate differential privacy is nontrivial, because at the core of DAWA lies a dynamic programming algorithm that is specially designed for cost and the Laplace mechanism (refer to Section 3.2 in [12]). Further, DAWA replies on certain assumptions of the underlying data, e.g., adjacent counts are similar in value, whereas COA is dataindependent. Hence, their relative performance depends on the choice of parameter , as well as the dataset.
We compare COA with different values of ranging from 0.01 to 0.00001 against DAWA on workload WRange, since DAWA focuses on range queries. We also consider 4 additional synthetic datasets which do not have local smoothness structure, i.e. Random Alternating, Random Laplace, Random Gaussian, Random Uniform. Specifically, the sensitive data Random Alternating only contains two values which appear alternatingly in the data sequence. For Random Laplace, Random Gaussian, Random Uniform, the sensitive data consists of a random vector
with mean zero and variance 10 which is drawn from the Laplacian, Gaussian and Uniform distribution, respectively.
Figure 14 shows the results with varying domain size , and Figure 15 shows the results with varying domain size . We have the following observations. (i) On realworld datasets Search Log, Net Trace, Social Network and all synthetic datasets (Random Alternating, Random Laplace, Random Gaussian, Random Uniform), the performance of DAWA is rather poor, since these datasets do not satisfy the assumption that adjacent aggregates have similar values. (ii) With a fixed number of queries , COA significantly outperforms DAWA when is large. (iii) COA generally achieves better performance than DAWA when . (iv) DAWA outperforms COA only when is very small, and the dataset happens to satisfy its assumptions. In such situations, one potential way to improve COA is to incorporate datadependent information through a postprocessing technique (e.g., [9, 11]), which is outside of the scope of this paper and left as future work.
6 Conclusions and Future work
In this paper we introduce a convex reformulation for optimizing batch linear aggregate queries under approximate differential privacy. We provide a systematic analysis of the resulting convex optimization problem. In order to solve the convex problem, we propose a Newtonlike method, which is guaranteed to achieve globally linear convergence rate and locally quadratic convergence rate. Extensive experiment on real world data sets demonstrate that our method is efficient and effective.
There are several research directions worthwhile to pursuit in the future. (i) First of all, it is interesting to extend the proposed method to develop hybrid data and workloadaware differentially private algorithms [12, 11]. (ii) This paper mainly focuses on optimal squared error minimization. Due to the rotational invariance of the norm, the proposed solution can achieve global optimum. We plan to investigate convex relaxations/reformulations to handle the squared/absolute sum error under differential privacy. (iii) While we consider convex semidefinite optimization, one may consider other convex relaxation methods (e.g. further SDP relaxations [28], SecondOrder Cone Programming (SOCP) [26]) and other efficient linear algebra (such as partial eigenvalue decomposition, randomized scheme or parallelization) to reduce the computational cost for largescale batch linear aggregate query optimization.
Appendix
1 Semidefinite Programming Reformulations
In this section, we discuss some convex SemiDefinite Programming (SDP) reformulations for Eq (2) in our submission. Based on these reformulations, we can directly and effectively solve the batch queries answering problem using offtheshelf interiorpoint SDP solvers.
The following lemma is useful in deriving the SDP formulations for approximate and exact differential privacy.
Lemma 11
[2] Schur Complement Condition. Let be a real symmetric matrix given by and be the Schur complement of in , that is: . Then we have:
1.1 Approximate Differential Privacy
This subsection presents the SDP formulation for approximate differential privacy, i.e. . Letting , we have and . Introducing a new variable such that , Eq (2) can be cast into the following convex optimization problem.
(17) 
Since whenever , we relax the to . By Lemma 11, we have the following optimization problem which is equivalent to Eq (17):
(18) 
After the solution in Eq(18) has been found by solving standard convex SDP, we can preform Cholesky decomposition or eigenvalue decomposition on such that and output the matrix as the final configuration. We remark that the output solution is the exact solution of approximate differential privacy optimization problem itself.
1.2 Exact Differential Privacy
This subsection presents the SDP formulation for exact differential privacy, i.e. . Letting , then we have:
By Lemma 11, we have its equivalent reformulation:
This is also equivalent to the following problem:
Using Lemma 11 again and dropping the rank constraint, we have the following convex relaxation problem:
(19) 
After the problem in Eq(1.2) has been solved by standard convex SDP, we can output the matrix as the final configuration. Interestingly, we found that unlike the case for approximate differential privacy, the output matrix is not the exact solution of the exact differential privacy optimization problem since we drop the rank constraint in Eq (1.2).
2 Technical Proofs
The following lemma is useful in our proof.
Lemma 12
For any two matrices and , the following inequality holds:
where denotes the smallest eigenvalue of .
We denote . Since both and are PSD matrices, we let . Then we have the following inequalities:
Comments
There are no comments yet.