# Private Optimization Without Constraint Violations

We study the problem of differentially private optimization with linear constraints when the right-hand side of the constraints depends on private data. This type of problem appears in many applications, especially resource allocation. Previous research provided solutions that retained privacy but sometimes violated the constraints. In many settings, however, the constraints cannot be violated under any circumstances. To address this hard requirement, we present an algorithm that releases a nearly-optimal solution satisfying the problem's constraints with probability 1. We also prove a lower bound demonstrating that the difference between the objective value of our algorithm's solution and the optimal solution is tight up to logarithmic factors among all differentially private algorithms. We conclude with experiments on real and synthetic datasets demonstrating that our algorithm can achieve nearly optimal performance while preserving privacy.


## 1 Introduction

Differential privacy [5] has emerged as the standard for reasoning about user privacy and private computations. A myriad of practical algorithms exists for a broad range of specific problems. We can now solve tasks in a private manner ranging from computing simple dataset statistics [17] to modern machine learning [1]. In this work, we add to this body of research by tackling a fundamental question of constrained optimization.

Specifically, we study optimization problems with linear constraints and Lipschitz objective functions. This family of optimization problems includes linear programming and quadratic programming with linear constraints, which can be used to formulate diverse problems in computer science, as well as other fields such as engineering, manufacturing, and transportation. Resource allocation is an example of a common problem in this family: given multiple agents competing for limited goods, how should the goods be distributed among the agents? Whether assigning jobs to machines or partitioning network bandwidth among different applications, these problems have convex optimization formulations with linear constraints. Given that the input to these problems may come from private user data, it is imperative that we find solutions that do not leak information about any individual.

Formally, the goal in linearly-constrained optimization is to find a vector $\vec{x} \in \mathbb{R}^n$ maximizing a function $g(\vec{x})$ subject to the constraint that $A\vec{x} \le \vec{b}$. Due in part to the breadth of the problems covered by these approaches, the past several decades have seen the development of a variety of optimization algorithms with provable guarantees, as well as fast commercial solvers. The parameters $A$ and $\vec{b}$ encode data about the specific problem instance at hand, and it is easy to come up with instances where simply releasing the optimal solution would leak information about this sensitive data.

As a concrete example, suppose there is a hospital with branches located throughout a state, each of which has a number of patients with a certain disease. A specific drug is required to treat the infected patients, which the hospital can obtain from a set of pharmacies. The goal is to determine which pharmacies should supply which hospital branches while minimizing the transportation cost. In Figure 1, we present this problem as a linear program (LP). The LP is defined by sensitive information: the constraint vector $\vec{b}$ reveals the number of patients with the disease at each branch.

We provide tools with provable guarantees for solving linearly-constrained optimization problems while preserving differential privacy (DP) [5]. Our algorithm applies to the setting where the constraint vector $\vec{b}$ depends on private data, as is the case in many resource allocation problems, such as the transportation problem above. This problem falls in the category of private optimization, for which there are multiple algorithms in the unconstrained case [2, 4, 12]. To the best of our knowledge, only Hsu et al. [11] study differentially private linear programming—a special case of linearly-constrained optimization. They allow their algorithm's output to violate the constraints, which can be unacceptable in many applications. In our transportation example from Figure 1, if the constraints are violated, then some hospital branch will not receive the drugs it requires, or some pharmacy will be asked to supply more drugs than it has in its inventory. The importance of satisfying constraints motivates this paper's central question:

How can we privately solve optimization problems while ensuring that no constraint is violated?

### 1.1 Results overview

Formally, our goal is to privately solve optimization problems of the form

$$\max_{\vec{x} \in \mathbb{R}^n} \left\{ g(\vec{x}) : A\vec{x} \le \vec{b}(D) \right\},$$

where the function $g$ is $L$-Lipschitz and the constraint vector $\vec{b}(D)$ depends on a private database $D$. The database is a set of individuals' records, each of which is an element of a data domain.

To solve this problem, our $(\epsilon, \delta)$-differentially private algorithm maps $\vec{b}(D)$ to a nearby vector $\bar{\vec{b}}(D)$ and releases the vector maximizing $g$ such that $A\vec{x} \le \bar{\vec{b}}(D)$. (We assume $g$ is efficiently optimizable under linear constraints, which is the case, for example, when $g$ is convex.) We ensure that $\bar{\vec{b}}(D) \le \vec{b}(D)$ coordinate-wise, and therefore our algorithm's output satisfies the constraints. This requirement precludes our use of traditional DP mechanisms: perturbing each component of $\vec{b}(D)$ using the Laplace, Gaussian, or exponential mechanisms would not result in a vector that is component-wise smaller than $\vec{b}(D)$. Instead, we extend the truncated Laplace mechanism to a multi-dimensional setting.
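To make the failure of standard perturbation concrete, here is a one-dimensional sketch (the numbers are illustrative, not from the paper): unbounded Laplace noise pushes the released bound above $b$ roughly half the time, which would relax the original constraint.

```python
import numpy as np

rng = np.random.default_rng(7)
b, Delta, eps = 10.0, 1.0, 0.5  # illustrative bound, sensitivity, privacy

# Standard Laplace perturbation of a constraint bound: since the noise is
# symmetric around 0, about half of the draws exceed b, so releasing them
# as the new bound would violate the original constraint A x <= b.
noisy = b + rng.laplace(scale=Delta / eps, size=100_000)
violation_rate = (noisy > b).mean()
assert 0.45 < violation_rate < 0.55
```

The shifted, truncated noise described above avoids this entirely, because the noise is bounded and the whole vector is shifted downward by the same bound.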

We provide a utility bound on the quality of the solution, proving that if $\vec{x}(D)$ is our algorithm's output and $\vec{x}^*$ is the optimal solution to the original optimization problem, then $g(\vec{x}(D))$ is close to $g(\vec{x}^*)$. Our bound depends on the sensitivity $\Delta$ of the vector $\vec{b}$, which equals the maximum $\ell_\infty$-norm of $\vec{b}(D) - \vec{b}(D')$ when $D$ and $D'$ are neighboring, in the sense that $D$ and $D'$ differ on at most one individual's data. Our bound also depends on the "niceness" of the matrix $A$, which we quantify using the condition number $\alpha(A)$ of the linear system [13, 15]. (Here, we use the simplified notation $\alpha(A)$; the condition number $\alpha_{p,q}(A)$ is defined in Section 3, and $\ell_q$ is the norm under which the objective function is $L$-Lipschitz.) We summarize our upper bound below (see Theorem 3.2 for the complete statement). [Simplified upper bound] With probability 1,

$$g(\vec{x}^*) - g(\vec{x}(D)) \le \frac{2 \Delta L \, \alpha(A)}{\epsilon} \ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right). \tag{1}$$

Our main contribution is to show that our proposed algorithm is nearly optimal. We provide a lower bound that is tight up to a logarithmic factor, as we summarize below. [Simplified lower bound] There exists an infinite family of matrices $A$, a $1$-Lipschitz function $g$, and, for any $\Delta > 0$, a mapping $\vec{b}$ from databases to vectors such that:

1. The sensitivity of $\vec{b}$ equals $\Delta$, and

2. For any $\epsilon$ and $\delta$, if $\vec{\mu}$ is an $(\epsilon, \delta)$-differentially private mechanism such that $A\vec{\mu}(D) \le \vec{b}(D)$ with probability 1, then

$$g(\vec{x}^*) - \mathbb{E}[g(\vec{\mu}(D))] \ge \frac{\Delta \, \alpha(A)}{4\epsilon} \ln\left(\frac{e^{\epsilon}-1}{2\delta}+1\right).$$

This lower bound matches the upper bound from Equation (1) up to a multiplicative factor of 8 and lower-order logarithmic terms. See Theorem 3.2 for the complete statement.

#### Pure differential privacy.

A natural question is whether we can achieve pure $\epsilon$-differential privacy (that is, $\delta = 0$). In Section 4, we prove that if the intersection of the feasible regions across all databases $D$ is nonempty, then the optimal $\epsilon$-differentially private mechanism disregards the database and outputs the solution that is optimal over this intersection with probability 1. If the intersection is empty, then no $\epsilon$-differentially private mechanism exists. Therefore, any non-trivial private mechanism must allow for a failure probability $\delta > 0$.

#### Experiments.

We empirically evaluate our algorithm in the contexts of financial portfolio optimization and internet advertising. Our experiments show that our algorithm can achieve nearly optimal performance while preserving privacy. We also compare our algorithm to a baseline $(\epsilon, \delta)$-differentially private mechanism that is allowed to violate the problem's constraints. Our experiments demonstrate that for small values of the privacy parameter $\epsilon$, the baseline algorithm yields a large number of violated constraints, while our algorithm violates no constraints and incurs virtually no loss in revenue.

#### Truncated Laplace mechanism.

Geng et al. [7] also study the truncated Laplace mechanism in a one-dimensional setting. Given a query mapping from databases to $\mathbb{R}$, they study query-output independent noise-adding (QIN) algorithms. Each such algorithm is defined by a noise distribution over $\mathbb{R}$: it releases the query output perturbed by additive random noise drawn from that distribution. They provide upper and lower bounds on the expected noise magnitude of any QIN algorithm, the upper bound equaling the expected noise magnitude of the truncated Laplace mechanism. They show that in the limit as the privacy parameters $\epsilon$ and $\delta$ converge to zero, these upper and lower bounds converge.

The Laplace mechanism is known to be a nearly optimal, general-purpose $\epsilon$-DP mechanism. While other task-specific mechanisms can surpass the utility of the Laplace mechanism [8], they all induce distributions with exponentially decaying tails. The optimality of these mechanisms comes from the fact that the ratio between the mechanism's output distributions for any two neighboring databases is exactly $e^{\epsilon}$. Adding less noise would fail to maintain that ratio everywhere, while adding more noise would distort the query output more than necessary. Geng et al. [7] observe that, in the case of $(\epsilon, \delta)$-DP mechanisms, adding large-magnitude, low-probability noise is wasteful, since the DP criteria can instead be satisfied using the $\delta$ "budget" rather than maintaining the ratio everywhere. To solve our private optimization problem, we shift and add noise to the constraints, and in our case adding large-magnitude, low-probability noise is not only wasteful but violates our requirement that the constraints must be satisfied with probability 1.

Given their similar characterizations, it is not surprising that our mechanism is closely related to that of Geng et al. [7]—the mechanisms both draw noise from a truncated Laplace distribution. Our problem, however, is multidimensional, and therefore extra care must be taken in how much noise we add to each coordinate. Moreover, the proof of our mechanism's optimality is stronger in several ways. First, it holds for any differentially private algorithm, not just for the limited class of QIN algorithms. Second, in the one-dimensional setting—which is the setting that Geng et al. [7] analyze—our lower bound matches our algorithm's upper bound up to a constant factor of 8 for any $\epsilon$ and $\delta$, not only in the limit as $\epsilon$ and $\delta$ converge to zero.

#### Differentially private combinatorial optimization.

Several recent works have studied differentially private combinatorial optimization [9, 10], which is a distinct problem from ours, since most combinatorial optimization problems cannot be formulated using only linear constraints. Hsu et al. [10] specifically study a private variant of a classic allocation problem: there is a set of agents and a set of goods, and the agents' values for bundles of the goods are private. The goal is to allocate the goods among the agents in order to maximize social welfare while maintaining differential privacy. This is similar to, but distinct from, the transportation problem from Figure 1. Indeed, if we were to follow the formulation from Hsu et al. [10], the transportation costs would be private, whereas in our setting, the transportation costs are public but the total demand of each hospital is private.

## 2 Differential privacy definition

To define differential privacy (DP), we first formally introduce the notion of a neighboring database: two databases $D$ and $D'$ are neighboring, denoted $D \sim D'$, if they differ on at most one record. We use the notation $\vec{x}(D)$ to denote the random variable corresponding to the vector that our algorithm releases (non-trivial DP algorithms are, by necessity, randomized). Given privacy parameters $\epsilon \ge 0$ and $\delta \in [0, 1]$, the algorithm satisfies $(\epsilon, \delta)$-differential privacy ($(\epsilon, \delta)$-DP) if for any neighboring databases $D \sim D'$ and any subset $V \subseteq \mathbb{R}^n$,

$$\Pr[\vec{x}(D) \in V] \le e^{\epsilon}\Pr[\vec{x}(D') \in V] + \delta.$$

## 3 Multi-dimensional optimization

Our goal is to privately solve multi-dimensional optimization problems of the form

$$\max_{\vec{x} \in \mathbb{R}^n} \left\{ g(\vec{x}) : A\vec{x} \le \vec{b}(D) \right\}, \tag{2}$$

where $\vec{b}(D)$ is a vector in $\mathbb{R}^m$ and $g$ is an $L$-Lipschitz function according to an $\ell_q$-norm with $q \ge 1$. Preserving differential privacy while ensuring the constraints are always satisfied is impossible if the feasible regions change drastically across databases. For example, if $D$ and $D'$ are neighboring databases with disjoint feasible regions, there is no $(\epsilon, \delta)$-DP mechanism with $\delta < 1$ that always satisfies the constraints (see Lemma A in Appendix A). To circumvent this impossibility, we assume that the intersection of the feasible regions across databases is nonempty. This is satisfied, for example, if the origin is always feasible.

Assumption 3. The set $\bigcap_{D}\left\{\vec{x} \in \mathbb{R}^n : A\vec{x} \le \vec{b}(D)\right\}$ is non-empty.

In our approach, we map each vector $\vec{b}(D)$ to a random variable $\bar{\vec{b}}(D)$ and release

$$\vec{x}(D) \in \operatorname*{argmax}_{\vec{x} \in \mathbb{R}^n}\left\{g(\vec{x}) : A\vec{x} \le \bar{\vec{b}}(D)\right\}. \tag{3}$$

To formally describe our approach, we use the notation $\Delta$ to denote the constraint vector's sensitivity: the maximum of $\|\vec{b}(D) - \vec{b}(D')\|_\infty$ over neighboring databases $D \sim D'$. We define the $i$-th component of $\bar{\vec{b}}(D)$ to be $\bar{b}_i(D) = b_i(D) - s + \eta_i$, where $\eta_i$ is drawn from the truncated Laplace distribution with support $[-s, s]$ and scale $\Delta/\epsilon$, and $s = \frac{\Delta}{\epsilon}\ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right)$. In Appendix A, we prove that the optimization problem in Equation (3) remains feasible. In Section 3.1, we prove that our algorithm preserves differential privacy, and in Section 3.2, we bound our algorithm's loss.
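As an illustration, the construction of $\bar{\vec{b}}(D)$ can be sketched as follows (a minimal NumPy sketch; the helper names and parameter values are ours, and rejection sampling is one standard way to draw truncated Laplace noise):

```python
import numpy as np

def truncated_laplace(scale, bound, size, rng):
    """Draw i.i.d. samples from a Laplace(scale) distribution truncated to
    [-bound, bound], by resampling any draw that lands outside the support."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        draw = rng.laplace(scale=scale, size=size - filled)
        keep = draw[np.abs(draw) <= bound]
        out[filled:filled + keep.size] = keep
        filled += keep.size
    return out

def perturbed_constraints(b, sensitivity, eps, delta, rng):
    """Map b(D) to b_bar(D) = b(D) - s + eta, where each eta_i is truncated
    Laplace noise on [-s, s] with scale sensitivity/eps."""
    m = b.shape[0]
    s = (sensitivity / eps) * np.log(m * (np.exp(eps) - 1.0) / delta + 1.0)
    eta = truncated_laplace(sensitivity / eps, s, m, rng)
    return b - s + eta  # coordinate-wise <= b(D) with probability 1

rng = np.random.default_rng(0)
b = np.array([10.0, 7.5, 3.0])  # illustrative constraint vector
b_bar = perturbed_constraints(b, sensitivity=1.0, eps=0.5, delta=1e-6, rng=rng)
assert np.all(b_bar <= b)  # the released constraints are never violated
```

Any solution feasible for the perturbed constraints $A\vec{x} \le \bar{\vec{b}}(D)$ is then automatically feasible for the original constraints.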

### 3.1 Privacy guarantee

In this section, we prove that our algorithm satisfies differential privacy. We use the notation $\vec{\eta}$ to denote a random vector whose components are drawn i.i.d. from the truncated Laplace distribution with support $[-s, s]$ and scale $\Delta/\epsilon$. We also use the notation $\bar{\vec{b}}(D) = \vec{b}(D) - s + \vec{\eta}$.

The mapping $D \mapsto \bar{\vec{b}}(D)$ preserves $(\epsilon, \delta)$-differential privacy.

###### Proof.

Let $D$ and $D'$ be two neighboring databases. We write the density function of $\bar{\vec{b}}(\cdot)$ as $f_D$ when the database is $D$ and $f_{D'}$ when it is $D'$. This proof relies on the following two claims. The first claim shows that on the intersection of the supports, the density functions $f_D$ and $f_{D'}$ are close. The proof is in Appendix A.

Let $\vec{u}$ be a vector in the intersection of the supports of $f_D$ and $f_{D'}$. Then $f_D(\vec{u}) \le e^{\epsilon} f_{D'}(\vec{u})$. The second claim shows that the total density of $f_D$ on vectors not contained in the support of $f_{D'}$ is at most $\delta$.

Let $V$ be the set of vectors in the support of $f_D$ but not in the support of $f_{D'}$. Then $\int_V f_D(\vec{u})\,d\vec{u} \le \delta$.

###### Proof of Claim 3.1.

Suppose $\vec{u} \in V$. Then for some $i \in \{1, \dots, m\}$, either $u_i > b_i(D')$ or $u_i < b_i(D') - 2s$. This implies that either $\eta_i > s - \Delta$ or $\eta_i < -s + \Delta$. The density function of the truncated Laplace distribution with support $[-s, s]$ and scale $\Delta/\epsilon$ is

$$f(\eta) = \frac{1}{Z}\exp\left(-\frac{|\eta|\epsilon}{\Delta}\right) \quad \text{for } \eta \in [-s, s],$$

where $Z = \frac{2\Delta}{\epsilon}\left(1 - e^{-s\epsilon/\Delta}\right)$ is a normalizing constant. Therefore, the probability that for some $i$, either $\eta_i > s - \Delta$ or $\eta_i < -s + \Delta$ is at most

$$\begin{aligned}
m\left(\int_{-s}^{-s+\Delta} f(\eta)\,d\eta + \int_{s-\Delta}^{s} f(\eta)\,d\eta\right)
&= \frac{m}{Z}\left(\int_{-s}^{-s+\Delta} \exp\left(-\frac{|\eta|\epsilon}{\Delta}\right)d\eta + \int_{s-\Delta}^{s} \exp\left(-\frac{|\eta|\epsilon}{\Delta}\right)d\eta\right) \\
&= \frac{2m\Delta\left(e^{\epsilon}-1\right)e^{-s\epsilon/\Delta}}{Z\epsilon}
= \frac{m\left(e^{\epsilon}-1\right)e^{-s\epsilon/\Delta}}{1-e^{-s\epsilon/\Delta}}
= \frac{m\left(e^{\epsilon}-1\right)}{e^{s\epsilon/\Delta}-1}
= \delta,
\end{aligned}$$

where the final equality follows from the fact that $s = \frac{\Delta}{\epsilon}\ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right)$. In turn, this implies that $\int_V f_D(\vec{u})\,d\vec{u} \le \delta$. ∎

These two claims imply that the mapping preserves $(\epsilon, \delta)$-differential privacy. To see why, let $W$ be an arbitrary set of vectors in the support of $f_D$. Let $W_0$ be the set of vectors in $W$ that are also in the support of $f_{D'}$ and let $W_1$ be the remaining set of vectors in $W$. As in Claim 3.1, let $V$ be the set of vectors in the support of $f_D$ but not in the support of $f_{D'}$. Clearly, $W_1 \subseteq V$. Therefore,

$$\begin{aligned}
\Pr[\vec{b}(D)-s+\vec{\eta}\in W]
&= \Pr[\vec{b}(D)-s+\vec{\eta}\in W_0] + \Pr[\vec{b}(D)-s+\vec{\eta}\in W_1] \\
&\le \Pr[\vec{b}(D)-s+\vec{\eta}\in W_0] + \Pr[\vec{b}(D)-s+\vec{\eta}\in V] \\
&= \int_{W_0} f_D(\vec{u})\,d\vec{u} + \int_{V} f_D(\vec{u})\,d\vec{u} \\
&\le \int_{W_0} e^{\epsilon} f_{D'}(\vec{u})\,d\vec{u} + \int_{V} f_D(\vec{u})\,d\vec{u} && \text{(Claim 3.1)} \\
&\le \int_{W_0} e^{\epsilon} f_{D'}(\vec{u})\,d\vec{u} + \delta && \text{(Claim 3.1)} \\
&\le e^{\epsilon}\Pr[\vec{b}(D')-s+\vec{\eta}\in W] + \delta,
\end{aligned}$$

so differential privacy is preserved. ∎
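The choice of $s$ used in the proof can be sanity-checked numerically: with $s = \frac{\Delta}{\epsilon}\ln\left(\frac{m(e^{\epsilon}-1)}{\delta}+1\right)$, the total mass of the two truncated tails of width $\Delta$ equals exactly $\delta$ (the parameter values below are illustrative, chosen by us):

```python
import math

m, eps, delta, Delta = 5, 0.5, 1e-6, 1.0  # illustrative parameters

# The shift s is chosen so that the tail-mass expression from the proof,
# m (e^eps - 1) / (e^{s eps / Delta} - 1), collapses to exactly delta.
s = (Delta / eps) * math.log(m * (math.exp(eps) - 1.0) / delta + 1.0)
tail_mass = m * (math.exp(eps) - 1.0) / (math.exp(s * eps / Delta) - 1.0)
assert abs(tail_mass - delta) < 1e-12
```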

Since differential privacy is immune to post-processing [6], Theorem 3.1 implies our algorithm is differentially private.

The mapping $D \mapsto \vec{x}(D)$ is $(\epsilon, \delta)$-differentially private.

### 3.2 Quality guarantees

We next provide a bound on the quality of our algorithm, which measures the difference between the optimal solution $g(\vec{x}^*)$ and the solution $g(\vec{x}(D))$ our algorithm returns. Our quality guarantee depends on the "niceness" of the matrix $A$, as quantified by the linear system's condition number [13], denoted $\alpha_{p,q}(A)$. Li [13] proved that this value sharply characterizes the extent to which a change in the vector $\vec{b}$ causes a change in the feasible region, so it makes sense that it appears in our quality guarantees. Given a norm $\|\cdot\|_p$ on $\mathbb{R}^m$ where $p \ge 1$, we use the notation $\|\cdot\|_{p^*}$ to denote the dual norm, where $\frac{1}{p} + \frac{1}{p^*} = 1$. The linear system's condition number $\alpha_{p,q}(A)$ is defined as in the work of Li [13].

When $A$ is nonsingular and $p = q = 2$, $\alpha_{2,2}(A)$ is at most the inverse of the minimum singular value, $1/\sigma_{\min}(A)$. This value is closely related to the matrix $A$'s condition number (which is distinct from $\alpha_{p,q}(A)$, the linear system's condition number), which roughly measures the rate at which the solution $\vec{x}$ to the system $A\vec{x} = \vec{b}$ changes with respect to a change in $\vec{b}$.
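For intuition, the nonsingular case is easy to verify numerically: $1/\sigma_{\min}(A)$ coincides with the spectral norm of $A^{-1}$ (the matrix below is purely illustrative):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])  # illustrative nonsingular matrix

# Smallest singular value of A.
sigma_min = np.linalg.svd(A, compute_uv=False).min()

# 1 / sigma_min(A) equals the largest singular value of A^{-1},
# i.e., the spectral norm of the inverse.
bound = 1.0 / sigma_min
assert np.isclose(bound, np.linalg.norm(np.linalg.inv(A), 2))
```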

We now prove our quality guarantee, which bounds the difference between the optimal solution to the original optimization problem (Equation (2)) and that of the privately transformed problem (Equation (3)).

Suppose Assumption 3 holds and the function $g$ is $L$-Lipschitz with respect to an $\ell_q$-norm on $\mathbb{R}^n$. With probability 1,

$$g(\vec{x}^*) - g(\vec{x}(D)) \le \frac{2\Delta L}{\epsilon} \cdot \inf_{p \ge 1}\left\{\alpha_{p,q}(A)\sqrt[p]{m}\right\} \cdot \ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right).$$

###### Proof.

Let $\vec{b}$ be an arbitrary vector in the support of $\bar{\vec{b}}(D)$ and let $\vec{x}^* \in \operatorname{argmax}_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}) : A\vec{x} \le \vec{b}(D)\}$. Let $\bar{\vec{x}}$ be an arbitrary vector satisfying $A\bar{\vec{x}} \le \vec{b}$. We know that

$$\begin{aligned}
\max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}(D)\} - \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\}
&= g(\vec{x}^*) - \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\} \\
&= g(\vec{x}^*) - g(\bar{\vec{x}}) + g(\bar{\vec{x}}) - \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\}.
\end{aligned}$$

Since $A\bar{\vec{x}} \le \vec{b}$, we know that $g(\bar{\vec{x}}) \le \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\}$. Therefore, since $g$ is $L$-Lipschitz,

$$\max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}(D)\} - \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\} \le g(\vec{x}^*) - g(\bar{\vec{x}}) \le L\,\|\vec{x}^* - \bar{\vec{x}}\|_q. \tag{4}$$

Let $S = \{\bar{\vec{x}} \in \mathbb{R}^n : A\bar{\vec{x}} \le \vec{b}\}$. Equation (4) shows that for every $\bar{\vec{x}} \in S$, the left-hand side of Equation (4) is at most $L\,\|\vec{x}^* - \bar{\vec{x}}\|_q$. Meanwhile, from work by Li [13], we know that for any $\ell_p$-norm,

$$\inf_{\bar{\vec{x}}\in S}\|\vec{x}^*-\bar{\vec{x}}\|_q \le \alpha_{p,q}(A)\cdot\|\vec{b}(D)-\vec{b}\|_p. \tag{5}$$

By definition of the infimum, this means that $\max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}(D)\} - \max_{\vec{x}\in\mathbb{R}^n}\{g(\vec{x}):A\vec{x}\le\vec{b}\} \le L\,\alpha_{p,q}(A)\,\|\vec{b}(D)-\vec{b}\|_p$. This inequality holds for any vector $\vec{b}$ in the support of $\bar{\vec{b}}(D)$, and with probability 1,

$$\|\vec{b}(D)-\bar{\vec{b}}(D)\|_p \le \frac{2\Delta\sqrt[p]{m}}{\epsilon}\ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right).$$

Therefore, the theorem holds. ∎
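The norm bound in the final step is easy to check numerically: each coordinate of $\vec{b}(D) - \bar{\vec{b}}(D)$ equals $s - \eta_i \in [0, 2s]$, so its $\ell_p$-norm is at most $2s\sqrt[p]{m}$. A small sketch, using arbitrary noise supported on $[-s, s]$ and illustrative parameters of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
m, Delta, eps, delta, p = 4, 1.0, 0.5, 1e-6, 2.0  # illustrative values

s = (Delta / eps) * np.log(m * (np.exp(eps) - 1.0) / delta + 1.0)

# For any noise vector eta supported on [-s, s], the difference
# b - b_bar = s - eta lies in [0, 2s] coordinate-wise, so the l_p norm
# is at most 2 * s * m**(1/p).
eta = rng.uniform(-s, s, size=m)
diff = s - eta
assert np.linalg.norm(diff, ord=p) <= 2.0 * s * m ** (1.0 / p)
```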

In the following examples, we instantiate Theorem 3.2 in several specific settings.

[Nonsingular constraint matrix] When $A$ is nonsingular, setting $p = q = 2$ implies that, with probability 1, $g(\vec{x}^*) - g(\vec{x}(D)) \le \frac{2\Delta L\sqrt{m}}{\epsilon\,\sigma_{\min}(A)}\ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right)$.

[Strongly stable linear inequalities] We can obtain even stronger guarantees when the system of inequalities $A\vec{x} < \vec{0}$ has a solution. In that case, the set $\{\vec{x} : A\vec{x} \le \vec{b}\}$ is non-empty for any vector $\vec{b}$ [14], so we need not make Assumption 3. Moreover, when $\|\cdot\|_p$ and $\|\cdot\|_q$ both equal the $\ell_\infty$-norm and $A\vec{x} < \vec{0}$ has a solution, we can replace $\alpha_{p,q}(A)$ in Theorem 3.2 with the following solution to a linear program:

$$\bar{\alpha}(A) = \max_{(\vec{u},\vec{z}) \in \mathbb{R}^{m+n}}\left\{\vec{1}\cdot\vec{u} : -\vec{z} \le \vec{u}^\top A \le \vec{z},\ \vec{u} \ge \vec{0},\ \text{and } \vec{1}\cdot\vec{z} = 1\right\}.$$

This is because we can replace Equation (5) with the analogous inequality in which $\alpha_{p,q}(A)$ is replaced by $\bar{\alpha}(A)$ [14].
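Under our reading of the LP above, $\bar{\alpha}(A)$ can be computed with an off-the-shelf solver. The sketch below assumes SciPy's `linprog` and an illustrative matrix $A$ for which $A\vec{x} < \vec{0}$ is solvable:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])  # A x < 0 is solvable, e.g., x = (1, 1)
m, n = A.shape

# Variables: (u, z) in R^{m+n}. Maximize 1.u  <=>  minimize -1.u.
c = np.concatenate([-np.ones(m), np.zeros(n)])

# -z <= A^T u <= z   <=>   A^T u - z <= 0  and  -A^T u - z <= 0.
A_ub = np.block([[A.T, -np.eye(n)],
                 [-A.T, -np.eye(n)]])
b_ub = np.zeros(2 * n)

# 1.z = 1.
A_eq = np.concatenate([np.zeros(m), np.ones(n)]).reshape(1, -1)
b_eq = np.array([1.0])

bounds = [(0, None)] * m + [(None, None)] * n  # u >= 0, z free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
alpha_bar = -res.fun  # flip the sign back to recover the maximum
```

Note that $\vec{z} \ge \vec{0}$ is implied by $-\vec{z} \le \vec{u}^\top A \le \vec{z}$, so $\vec{z}$ can be left unbounded in the solver.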

In the following theorem, we prove that the quality guarantee from Theorem 3.2 is tight up to a constant factor and lower-order logarithmic terms. Let $A \in \mathbb{R}^{m \times m}$ be an arbitrary diagonal matrix with positive diagonal entries $a_1, \dots, a_m$ and let $g$ be the function $g(\vec{x}) = \sum_{i=1}^m x_i$. For any $\Delta > 0$, there exists a mapping $\vec{b}$ from databases to vectors such that:

1. The sensitivity of $\vec{b}$ equals $\Delta$, and

2. For any $\epsilon$ and $\delta$, if $\vec{\mu}$ is an $(\epsilon, \delta)$-differentially private mechanism such that $A\vec{\mu}(D) \le \vec{b}(D)$ with probability 1, then

$$\max_{\vec{x}}\left\{g(\vec{x}):A\vec{x}\le\vec{b}(D)\right\} - \mathbb{E}[g(\vec{\mu}(D))] \ge \frac{\Delta}{4\epsilon}\cdot\inf_{p\ge 1}\left\{\alpha_{p,1}(A)\sqrt[p]{m}\right\}\cdot\ln\left(\frac{e^{\epsilon}-1}{2\delta}+1\right).$$

Since the objective function $g$ is 1-Lipschitz under the $\ell_1$-norm, this lower bound matches the upper bound from Theorem 3.2 up to a multiplicative factor of $8\ln\left(\frac{m\left(e^{\epsilon}-1\right)}{\delta}+1\right)\big/\ln\left(\frac{e^{\epsilon}-1}{2\delta}+1\right)$.

###### Proof of Theorem 3.2.

For ease of notation, let $t = \frac{1}{\epsilon}\ln\left(\frac{e^{\epsilon}-1}{2\delta}+1\right)$. Notice that $\delta \le 1/2$ implies $t \ge 1$, and therefore $\lfloor t\rfloor \ge t/2$. For each vector $\vec{d} \in \mathbb{Z}^m$, let $D_{\vec{d}}$ be a database where $b_i(D_{\vec{d}}) = \Delta d_i$ for any $i$; the databases $D_{\vec{d}}$ and $D_{\vec{d}'}$ are neighboring if and only if $\|\vec{d}-\vec{d}'\|_\infty \le 1$. Let $\vec{\mu}$ be an $(\epsilon, \delta)$-differentially private mechanism with $A\vec{\mu}(D) \le \vec{b}(D)$ with probability 1, and let $a_1, \dots, a_m$ be the diagonal entries of $A$. Since $A\vec{\mu}(D_{\vec{d}}) \le \Delta\vec{d}$ with probability 1, $\vec{\mu}(D_{\vec{d}})$ must be coordinate-wise smaller than $\left(\frac{\Delta d_1}{a_1}, \dots, \frac{\Delta d_m}{a_m}\right)$.

For each index $i \in \{1, \dots, m\}$, let $S^0_i$ be the set of vectors whose $i$-th component is at most $\frac{\Delta}{a_i}\left(d_i - \lfloor t\rfloor\right)$:

$$S^0_i = \left\{\vec{x}\in\mathbb{R}^m : x_i \le \frac{\Delta}{a_i}\left(d_i - \lfloor t\rfloor\right)\right\}.$$

Similarly, let

$$S^1_i = \left\{\vec{x}\in\mathbb{R}^m : \frac{\Delta}{a_i}\left(d_i - \lfloor t\rfloor\right) < x_i \le \frac{\Delta}{a_i}d_i\right\}.$$

For any vector $\vec{I} \in \{0,1\}^m$, let $S_{\vec{I}} = \bigcap_{i=1}^m S^{I_i}_i$. The sets $S_{\vec{I}}$ partition the support of $\vec{\mu}(D_{\vec{d}})$ into rectangles. Therefore, by the law of total expectation,

$$\mathbb{E}[g(\vec{\mu}(D_{\vec{d}}))] = \sum_{\vec{I}\in\{0,1\}^m} \mathbb{E}\left[g(\vec{\mu}(D_{\vec{d}})) \,\middle|\, \vec{\mu}(D_{\vec{d}}) \in S_{\vec{I}}\right]\Pr\left[\vec{\mu}(D_{\vec{d}}) \in S_{\vec{I}}\right]. \tag{6}$$

Suppose that $\vec{\mu}(D_{\vec{d}}) \in S_{\vec{I}}$ for some $\vec{I} \in \{0,1\}^m$. If $I_i = 0$, then we know that $\mu_i(D_{\vec{d}}) \le \frac{\Delta}{a_i}\left(d_i - \lfloor t\rfloor\right)$. Meanwhile, if $I_i = 1$, then $\mu_i(D_{\vec{d}}) \le \frac{\Delta d_i}{a_i}$, since $A\vec{\mu}(D) \le \vec{b}(D)$ with probability 1. Since $g(\vec{x}) = \sum_{i=1}^m x_i$, we have that for each $\vec{I} \in \{0,1\}^m$,

$$\mathbb{E}\left[g(\vec{\mu}(D_{\vec{d}})) \,\middle|\, \vec{\mu}(D_{\vec{d}}) \in S_{\vec{I}}\right] \le \sum_{i=1}^m \left(\frac{\Delta\left(d_i - \lfloor t\rfloor\right)}{a_i}\mathbf{1}\left\{I_i=0\right\} + \frac{\Delta d_i}{a_i}\mathbf{1}\left\{I_i=1\right\}\right) = \sum_{i=1}^m \left(\frac{\Delta d_i}{a_i} - \frac{\Delta\lfloor t\rfloor}{a_i}\mathbf{1}\left\{I_i=0\right\}\right).$$

Combining this inequality with Equation (6) and rearranging terms, we have that

$$\begin{aligned}
\mathbb{E}[g(\vec{\mu}(D_{\vec{d}}))]
&\le \Delta\sum_{i=1}^m \frac{d_i}{a_i} - \sum_{\vec{I}\in\{0,1\}^m}\sum_{i=1}^m \frac{\Delta\lfloor t\rfloor}{a_i}\mathbf{1}\left\{I_i=0\right\}\Pr\left[\vec{\mu}(D_{\vec{d}})\in S_{\vec{I}}\right] \\
&= \Delta\sum_{i=1}^m \frac{d_i}{a_i} - \Delta\lfloor t\rfloor\sum_{i=1}^m \frac{1}{a_i}\sum_{\vec{I}\in\{0,1\}^m}\mathbf{1}\left\{I_i=0\right\}\Pr\left[\vec{\mu}(D_{\vec{d}})\in S_{\vec{I}}\right].
\end{aligned}$$

For any $i$, $\sum_{\vec{I}\in\{0,1\}^m}\mathbf{1}\left\{I_i=0\right\}\Pr\left[\vec{\mu}(D_{\vec{d}})\in S_{\vec{I}}\right] = \Pr\left[\vec{\mu}(D_{\vec{d}})\in S^0_i\right]$. Therefore,

$$\mathbb{E}[g(\vec{\mu}(D_{\vec{d}}))] \le \Delta\sum_{i=1}^m \frac{d_i}{a_i} - \Delta\lfloor t\rfloor\sum_{i=1}^m \frac{1}{a_i}\Pr\left[\vec{\mu}(D_{\vec{d}})\in S^0_i\right]. \tag{7}$$

We now prove that for every index $i$, $\Pr\left[\vec{\mu}(D_{\vec{d}})\in S^0_i\right] \ge \frac{1}{2}$. This proof relies on the following claim.

For any index $i$, vector $\bar{\vec{d}}$, and integer $j \ge 1$, let $S_{\bar{\vec{d}},i,j}$ be the set of all vectors whose $i$-th component is in the interval $\left(\frac{\Delta\left(\bar{d}_i - j\right)}{a_i}, \frac{\Delta\left(\bar{d}_i - j + 1\right)}{a_i}\right]$:

$$S_{\bar{\vec{d}},i,j} = \left\{\vec{x}\in\mathbb{R}^m : \frac{\Delta\left(\bar{d}_i - j\right)}{a_i} < x_i \le \frac{\Delta\left(\bar{d}_i - j + 1\right)}{a_i}\right\}.$$

Then $\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}})\in S_{\bar{\vec{d}},i,j}\right] \le \delta\sum_{\ell=0}^{j-1}e^{\epsilon\ell}$. Notice that the complement of $S^0_i$ within the support of $\vec{\mu}(D_{\vec{d}})$ is $\bigcup_{j=1}^{\lfloor t\rfloor} S_{\vec{d},i,j}$, a fact that will allow us to prove that $\Pr\left[\vec{\mu}(D_{\vec{d}})\in S^0_i\right] \ge \frac{1}{2}$.

###### Proof of Claim 3.2.

We prove this claim by induction on $j$.

#### Base case (j=1).

Fix an arbitrary index $i$ and vector $\bar{\vec{d}}$. Let $\vec{e}_i$ be the standard basis vector with a 1 in the $i$-th component and 0 in every other component. Since $A\vec{\mu}(D_{\bar{\vec{d}}-\vec{e}_i}) \le \vec{b}(D_{\bar{\vec{d}}-\vec{e}_i})$ with probability 1, we know the probability that $\mu_i(D_{\bar{\vec{d}}-\vec{e}_i}) > \frac{\Delta\left(\bar{d}_i - 1\right)}{a_i}$ is zero. In other words,

$$\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}-\vec{e}_i})\in S_{\bar{\vec{d}},i,1}\right] = 0. \tag{8}$$

Since $D_{\bar{\vec{d}}}$ and $D_{\bar{\vec{d}}-\vec{e}_i}$ are neighboring, this means that

$$\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}})\in S_{\bar{\vec{d}},i,1}\right] \le e^{\epsilon}\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}-\vec{e}_i})\in S_{\bar{\vec{d}},i,1}\right] + \delta = \delta.$$

#### Inductive step.

Fix an arbitrary $j \ge 1$ and suppose that for all indices $i$ and vectors $\bar{\vec{d}}$, $\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}})\in S_{\bar{\vec{d}},i,j}\right] \le \delta\sum_{\ell=0}^{j-1}e^{\epsilon\ell}$. We want to prove that for all indices $i$ and vectors $\bar{\vec{d}}$, $\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}})\in S_{\bar{\vec{d}},i,j+1}\right] \le \delta\sum_{\ell=0}^{j}e^{\epsilon\ell}$. To this end, fix an arbitrary index $i$ and vector $\bar{\vec{d}}$. By the inductive hypothesis, we know that

$$\Pr\left[\vec{\mu}(D_{\bar{\vec{d}}-\vec{e}_i})\in S_{\bar{\vec{d}}-\vec{e}_i,i,j}\right] \le \delta\sum_{\ell=0}^{j-1}e^{\epsilon\ell}. \tag{9}$$

Note that

$$S_{\bar{\vec{d}}-\vec{e}_i,i,j} = S_{\bar{\vec{d}},i,j+1}.$$