# Fair Dimensionality Reduction and Iterative Rounding for SDPs

We model "fair" dimensionality reduction as an optimization problem. A central example is the fair PCA problem: the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum variance (or minimum reconstruction error) is optimized for all groups in a fair (or balanced) manner, e.g., by maximizing the minimum variance over the k groups of the projection to a d-dimensional subspace. This problem was introduced by Samadi et al. (2018) who gave a polynomial-time algorithm which, for k=2 groups, returns a (d+1)-dimensional solution of value at least the best d-dimensional solution. We give an exact polynomial-time algorithm for k=2 groups. The result relies on extending results of Pataki (1998) regarding rank of extreme point solutions to semi-definite programs. This approach applies more generally to any monotone concave function of the individual group objectives. For k>2 groups, our results generalize to give a (d+√(2k+0.25)-1.5)-dimensional solution with objective value as good as the optimal d-dimensional solution for arbitrary k,d in polynomial time. Using our extreme point characterization result for SDPs, we give an iterative rounding framework for general SDPs which generalizes the well-known iterative rounding approach for LPs. It returns low-rank solutions with bounded violation of constraints. We obtain a d-dimensional projection where the violation in the objective can be bounded additively in terms of the top O(√(k))-singular values of the data matrices. We also give an exact polynomial-time algorithm for any fixed number of groups and target dimension via the algorithm of Grigoriev and Pasechnik (2005). In contrast, when the number of groups is part of the input, even for target dimension d=1, we show this problem is NP-hard.

## 1 Introduction

Choosing a low-dimensional representation of a large, high-dimensional data set is a basic computational task with many applications, and is a core primitive for modern machine learning. Perhaps the most ubiquitous and effective technique in practice is principal component analysis (PCA), which finds a subspace that maximizes the sum of squared lengths of data points projected to the subspace (equivalently, minimizes the sum of squared distances, or regression error, to a $d$-dimensional subspace). When viewing the data as the rows of an $m \times n$ matrix $A$, the objective is to find an $n \times d$ projection matrix $P$ that maximizes the squared Frobenius norm, $\|AP\|_F^2$.
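As a minimal sketch of the standard PCA objective described above, the following NumPy snippet computes a projection from the top right singular vectors. The function name `pca_projection` and the random data are illustrative assumptions, and the rows of `A` are assumed to be already centered.

```python
import numpy as np

def pca_projection(A, d):
    """Return an n x d matrix P with orthonormal columns maximizing ||A P||_F^2."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:d].T

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))      # m = 50 points in n = 8 dimensions (assumed centered)
P = pca_projection(A, d=3)

captured = np.linalg.norm(A @ P, "fro") ** 2
# The captured variance equals the sum of the top-d squared singular values of A.
top_squared = np.sum(np.linalg.svd(A, compute_uv=False)[:3] ** 2)
```

The equality between `captured` and `top_squared` is the usual characterization of PCA through the singular value decomposition.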

#### Fairness and multi-objective optimization.

In this age of large-dimensional data, PCA is an indispensable tool for data analysis and a common preprocessing step. Previous work (Samadi et al., 2018a) investigated the case when data falls into two or more groups (e.g., based on gender or education level). A single global objective for the entire data set need not result in a solution which has high fidelity for all groups. In practice, even when the groups have equal size, PCA often results in much higher reconstruction error for some groups. This suggests a necessary tradeoff in how accurately each group’s data can be represented in a single lower-dimensional subspace. How might one redefine dimensionality reduction to produce projections which optimize different groups’ representation in a balanced way?

The definition of these groups need not be a partition; each group could be defined as a different weighting of the data set (rather than a subset, which is a 0/1 weighting). Framed this way, asking for balance or fairness of this optimization can be viewed as dealing with multiple competing objectives. One way to balance multiple objectives is to find a projection that maximizes the minimum objective value over each of the groups (weightings), i.e.,

$$\max_{P \,:\, P^\top P = I_d} \; \min_{1 \le i \le k} \; \|A_i P\|_F^2 = \langle A_i^\top A_i, PP^\top \rangle.$$

More generally, let $\mathcal{P}_d$ denote the set of all $n \times d$ projection matrices $P$, i.e., matrices with $d$ orthonormal columns. For each group $A_i$, we associate a function $f_i : \mathcal{P}_d \to \mathbb{R}$ that denotes the group’s objective value for a particular projection. For any $g : \mathbb{R}^k \to \mathbb{R}$, we define the $(f,g)$-fair dimensionality reduction problem as finding a $d$-dimensional projection $P$ which optimizes

$$\max_{P \in \mathcal{P}_d} \; g\big(f_1(P), f_2(P), \ldots, f_k(P)\big).$$

In the above example of max-min fair PCA, $g$ is simply the $\min$ function and $f_i(P) = \|A_i P\|_F^2$ is the total squared norm of the projection of vectors in $A_i$. Other examples include: defining each $f_i$ as the average squared norm of the projections rather than the total, or the marginal variance, i.e., the difference in total squared norm when using $P$ rather than the best possible projection for that group. (Note that for PCA, max variance and min reconstruction error are the same objective.) One could also choose the product function $g(y_1, \ldots, y_k) = \prod_i y_i$ for the accumulating function $g$. This is also a natural choice, famously introduced in Nash’s solution to the bargaining problem. This framework can also describe the $p$th power mean of the projections.
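To make the framework concrete, here is a small sketch that evaluates the fair objective for a given projection with two choices of the accumulating function (minimum and product). The helper names and the two toy groups are assumptions made for illustration only.

```python
import numpy as np

def group_variances(groups, P):
    """f_i(P) = ||A_i P||_F^2 for every group matrix A_i."""
    return np.array([np.linalg.norm(A @ P, "fro") ** 2 for A in groups])

def fair_objective(groups, P, g=np.min):
    """Evaluate g(f_1(P), ..., f_k(P)); g defaults to the max-min choice."""
    return float(g(group_variances(groups, P)))

# Two toy groups in n = 2 dimensions, target dimension d = 1.
groups = [np.diag([3.0, 1.0]), np.diag([1.0, 2.0])]

P_pca = np.array([[1.0], [0.0]])                  # top direction of the pooled data
P_balanced = np.array([[1.0], [1.0]]) / np.sqrt(2)

min_pca = fair_objective(groups, P_pca)                   # group variances (9, 1)
min_balanced = fair_objective(groups, P_balanced)         # group variances (5, 2.5)
nash_balanced = fair_objective(groups, P_balanced, g=np.prod)
```

In this toy instance the pooled PCA direction leaves the second group with variance 1, while the balanced direction raises the minimum to 2.5.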

The appropriate weighting of objectives often depends on the context and application. We define fair dimensionality reduction to be the general problem with functions $f_1, \ldots, f_k$ and $g$. The central motivating questions of this paper are the following:

• What is the complexity of Fair-PCA?

• More generally, what is the complexity of Fair-Dimension-Reduction?

Framed another way, we ask whether these “fair” optimization problems force us to incur substantial computational cost. In Samadi et al. (2018a), we considered Fair-PCA and showed that the natural semi-definite relaxation of the problem can be used to find a projection to dimension $d+k-1$ whose cost is at most that of the optimal $d$-dimensional projection. For $k=2$ groups, this is an increase of $1$ in the dimension (as opposed to the naïve bound of $2d$, by simply taking the span of the optimal $d$-dimensional subspaces for the two groups). However, the computational complexity of exactly solving Fair-PCA remained open.

### 1.1 Results and Techniques

Let us first focus on Fair-PCA for ease of exposition. The problem can be reformulated as the following mathematical program, where we denote $PP^\top$ by $X$. A natural approach to solving this problem is to consider the SDP relaxation obtained by relaxing the rank constraint to a bound on the trace.

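Exact Fair-PCA

$$\begin{aligned} \max\ & z \\ \text{s.t.}\ & \langle A_i^\top A_i, X \rangle \ge z \qquad i \in \{1, \ldots, k\} \\ & \mathrm{rank}(X) \le d \\ & 0 \preceq X \preceq I \end{aligned}$$

SDP Relaxation of Fair-PCA

$$\begin{aligned} \max\ & z \\ \text{s.t.}\ & \langle A_i^\top A_i, X \rangle \ge z \qquad i \in \{1, \ldots, k\} \\ & \mathrm{tr}(X) \le d \\ & 0 \preceq X \preceq I \end{aligned}$$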
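The difference between the two programs can be illustrated numerically. The sketch below checks feasibility for the relaxation and evaluates the best achievable `z` at a given `X`; the helper names and the toy groups are assumptions for illustration, and no SDP solver is involved.

```python
import numpy as np

def sdp_value(groups, X):
    """Largest feasible z at X: the minimum over groups of <A_i^T A_i, X>."""
    return min(float(np.sum((A.T @ A) * X)) for A in groups)

def is_relaxation_feasible(X, d, tol=1e-9):
    """Check tr(X) <= d and 0 <= X <= I (in the PSD order)."""
    w = np.linalg.eigvalsh((X + X.T) / 2)
    return bool(w[0] >= -tol and w[-1] <= 1 + tol and np.trace(X) <= d + tol)

groups = [np.diag([3.0, 1.0]), np.diag([1.0, 2.0])]
d = 1

P = np.array([[1.0], [1.0]]) / np.sqrt(2)
X_rank_one = P @ P.T          # feasible for the exact program (rank 1)
X_fractional = 0.5 * np.eye(2)  # feasible only for the relaxation (rank 2, trace 1)

value_rank_one = sdp_value(groups, X_rank_one)
value_fractional = sdp_value(groups, X_fractional)
```

Both matrices satisfy the trace and eigenvalue constraints, but only `X_rank_one` satisfies the rank constraint of the exact program, which is why the rank of extreme points matters.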

Our first main result is that the SDP relaxation is exact when there are two groups. Thus finding an extreme point of this SDP gives an exact algorithm for Fair-PCA for two groups. Previously, only approximation algorithms were known for this problem.

Any optimal extreme point solution to the SDP relaxation for Fair-PCA with two groups has rank at most $d$. Therefore, $2$-group Fair-PCA can be solved in polynomial time.

Our results also hold for Fair-Dimension-Reduction when $g$ is monotone nondecreasing in any one coordinate and concave, and each $f_i$ is an affine function of $PP^\top$ (and thus a special case of a quadratic function in $P$).

There is a polynomial-time algorithm for the $2$-group Fair-Dimension-Reduction problem when $g$ is concave and monotone nondecreasing in at least one of its two arguments, and each $f_i$ is linear in $PP^\top$, i.e., $f_i(P) = \langle B_i, PP^\top \rangle$ for some matrix $B_i$.

As indicated in the theorem, the core idea is that extreme-point solutions of the SDP in fact have rank $d$, not just trace equal to $d$.

For $k > 2$, the SDP need not recover a rank $d$ solution. In fact, the SDP may be inexact even for $k = 3$ (see Section 7). Nonetheless, we show that we can bound the rank of a solution to the SDP and obtain the following result. We state it for Fair-PCA, though the same bound holds for Fair-Dimension-Reduction under the same assumptions as in Theorem 1.1. Note that this result generalizes Theorem 1.1.

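For any concave $g$ that is monotone nondecreasing in at least one of its arguments, there exists a polynomial-time algorithm for Fair-PCA with $k$ groups that returns a $\big(d + \lfloor \sqrt{2k + \tfrac{1}{4}} - \tfrac{3}{2} \rfloor\big)$-dimensional embedding whose fair objective value is at least that of the optimal $d$-dimensional embedding. If $g$ is only concave, then the bound on the returned dimension weakens to $d + \lfloor \sqrt{2k + \tfrac{9}{4}} - \tfrac{3}{2} \rfloor$ (see Section 2).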

This strictly improves and generalizes the bound of $d+k-1$ for Fair-PCA from Samadi et al. (2018a). Moreover, if the dimensionality of the solution is a hard constraint, instead of tolerating $s = O(\sqrt{k})$ extra dimensions in the solution, one may solve Fair-PCA for target dimension $d - s$ to guarantee a solution of rank at most $d$. Thus, we obtain an approximation algorithm for Fair-PCA of factor $1 - \frac{s}{d}$. This is stated formally in Section 4.
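The following sketch tabulates the number of extra dimensions from the abstract's bound, $\lfloor \sqrt{2k + 0.25} - 1.5 \rfloor$, using exact integer arithmetic. The function names are illustrative, and the approximation factor $1 - s/d$ is an assumption taken from the scaling argument in Section 4 rather than a formula quoted verbatim from the text.

```python
import math

def extra_dimensions(k):
    """s = floor(sqrt(2k + 1/4) - 3/2), computed without floating point.

    (s + 3/2)^2 <= 2k + 1/4 is equivalent to (s + 1)(s + 2) <= 2k.
    """
    s = 0
    while (s + 2) * (s + 3) <= 2 * k:   # would s + 1 still satisfy the inequality?
        s += 1
    return s

def approximation_factor(k, d):
    """Factor 1 - s/d obtained by solving for target dimension d - s (assumed reading)."""
    s = extra_dimensions(k)
    if s >= d:
        raise ValueError("need s < d for a nontrivial guarantee")
    return 1 - s / d

table = {k: extra_dimensions(k) for k in (1, 2, 3, 5, 6, 10)}
```

For two groups no extra dimension is needed, and the first extra dimension appears at three groups.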

We now turn our attention to the marginal loss function. This measures the maximum over the groups of the difference between the variance of a common solution for the $k$ groups and an optimal solution for an individual group (“the marginal cost of sharing a common subspace”). For this problem, the above scaling method could substantially harm the objective value, since the target function is nonlinear, namely $f_i(P) = \max_{Q \in \mathcal{P}_d} \|A_i Q\|_F^2 - \|A_i P\|_F^2$ for each group $A_i$. In Section 3, we develop a general iterative rounding framework for SDPs with eigenvalue upper bounds and $k$ other linear constraints. This algorithm gives a solution of desired rank that violates each constraint by a bounded amount. The precise statement is Theorem 1.1. It implies that for Fair-PCA with marginal loss as the objective the additive error is

$$\Delta(\mathcal{A}) := \max_{S \subseteq [m]} \; \sum_{i=1}^{\lfloor \sqrt{2|S|} + 1 \rfloor} \sigma_i(A_S)$$

where $A_S = \frac{1}{|S|} \sum_{i \in S} A_i$ and $\sigma_i(\cdot)$ denotes the $i$th largest singular value.

It is natural to ask whether Fair-PCA is NP-hard to solve exactly. The following result implies that it is, even for target dimension $d = 1$. The max-min Fair-PCA problem for target dimension $d = 1$ is NP-hard when the number of groups $k$ is part of the input.

This raises the question of the complexity for a constant number of groups $k$. An alternative view of our SDP-based algorithm for $k = 2$ is via the S-lemma (Yakubovich, 1971, 1997), which shows that for two quadratic constraints over the unit sphere, there is a polynomial-time algorithm. We refer the reader to Pólik and Terlaky (2007); Ben-Tal and Nemirovski (2001) for the various formulations of the S-lemma and applications in control theory, optimization, geometry, portfolio management and statistics. Our proof of Theorem 1.1 effectively shows that the S-lemma can be adapted to our setting by incorporating an upper bound on the eigenvalues while still maintaining polynomial-time solvability.

For $k$ groups, we would have $k$ constraints, one for each group, plus the eigenvalue constraint and the trace constraint; now the tractability of the problem is far from clear. In fact, as we show in Section 7, the SDP has an integrality gap even for $k = 3$. We therefore consider an approach beyond SDPs, one that involves solving non-convex problems. Thanks to the powerful algorithmic theory of quadratic maps, developed by Grigoriev and Pasechnik (2005), it is polynomial-time solvable to check feasibility of a set of quadratic constraints for any fixed $k$. As we discuss next, their algorithm can check for zeros of a function of a set of quadratic functions, and can be used to optimize the function. Using this result, we show that for constant $k$ and $d$, there is a polynomial-time algorithm for rather general functions of the values of individual groups.

Let the fairness objective be where is a degree polynomial in some computable subring of and each is quadratic for . Then there is an algorithm to solve the fair dimensionality reduction problem in time .

By choosing to be the product polynomial over the usual ring or the function which is degree in the ring, this applies to the variants of Fair-PCA discussed above and various other problems.

#### SDP extreme points.

For $k = 2$, the underlying structural property we show is that extreme point solutions of the SDP have rank exactly $d$. First, for $k = d = 1$, this is the largest eigenvalue problem, since the maximum obtained by a matrix of trace equal to $1$ can also be obtained by one of the extreme points in the convex decomposition of this matrix. This extends to trace equal to any $d$, i.e., the optimal solution must be given by the top $d$ eigenvectors of $A^\top A$. Second, without the eigenvalue bound, for any SDP with $k$ constraints, there is an upper bound on the rank of any extreme point, of $O(\sqrt{k})$, a seminal result of Pataki (1998) (see also Barvinok (1995)). However, we cannot apply this directly as we have the eigenvalue upper bound constraint. The complication here is that we have to take into account the constraint $X \preceq I$ without increasing the rank.

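Let $C$ and $A_1, \ldots, A_m$ be $n \times n$ real matrices, $d \le n$, and $b_1, \ldots, b_m \in \mathbb{R}$. Suppose the semi-definite program SDP(I):

$$\begin{aligned} \min\ & \langle C, X \rangle && (1) \\ \text{subject to}\ & \langle A_i, X \rangle \lhd_i b_i \qquad \forall\, 1 \le i \le m && (2) \\ & \mathrm{tr}(X) \le d && (3) \\ & 0 \preceq X \preceq I_n && (4) \end{aligned}$$

where $\lhd_i \in \{\le, \ge, =\}$, has a nonempty feasible set. Then, all extreme optimal solutions $X^*$ to SDP(I) have rank at most $r^* := d + \lfloor \sqrt{2m + \tfrac{9}{4}} - \tfrac{3}{2} \rfloor$. Moreover, given a feasible optimal solution, an extreme optimal solution can be found in polynomial time.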

To prove the theorem, we extend Pataki (1998)’s characterization of rank of SDP extreme points with minimal loss in the rank. We show that the constraints $0 \preceq X \preceq I$ can be interpreted as a generalization of restricting variables to lie between $0$ and $1$ in the case of linear programming relaxations. From a technical perspective, our results give new insights into structural properties of extreme points of semi-definite programs and more general convex programs. Our result adds to a rather short list of algorithmic problems that utilize properties of extreme points of a semi-definite relaxation.

#### SDP Iterative Rounding.

Using Theorem 1.1, we extend the iterative rounding framework for linear programs (see Lau et al. (2011) and references therein) to semi-definite programs, where the $0,1$ constraints are generalized to eigenvalue bounds. The algorithm has a remarkably similar flavor. In each iteration, we fix the subspaces spanned by eigenvectors with $0$ and $1$ eigenvalues, and argue that one of the constraints can be dropped while bounding the total violation in the constraint over the course of the algorithm. While this applies directly to the Fair-PCA problem, it is in fact a general statement for SDPs, which we give below.

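Let $\mathcal{A} = \{A_1, \ldots, A_m\}$ be a collection of $n \times n$ matrices. For any set $S \subseteq [m]$, let $\sigma_i(S)$ be the $i$th largest singular value of the average of matrices $\frac{1}{|S|} \sum_{i \in S} A_i$. We let

$$\Delta(\mathcal{A}) := \max_{S \subseteq [m]} \; \sum_{i=1}^{\lfloor \sqrt{2|S|} + 1 \rfloor} \sigma_i(S).$$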
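For small collections, the quantity $\Delta$ can be evaluated by brute force over all subsets. The sketch below does this, reading the garbled upper summation limit as $\lfloor \sqrt{2|S|} + 1 \rfloor$ (an assumption about the intended parenthesization); the function name `delta` is illustrative, and the enumeration is exponential in the number of matrices.

```python
import itertools
import math
import numpy as np

def delta(mats):
    """Brute-force Delta: max over nonempty S of the sum of the top
    floor(sqrt(2|S|) + 1) singular values of the average of {A_i : i in S}."""
    m = len(mats)
    best = 0.0
    for size in range(1, m + 1):
        top = math.isqrt(2 * size) + 1      # equals floor(sqrt(2|S|) + 1)
        for S in itertools.combinations(range(m), size):
            avg = sum(mats[i] for i in S) / size
            sv = np.linalg.svd(avg, compute_uv=False)   # sorted in decreasing order
            best = max(best, float(sv[:top].sum()))
    return best

example = [np.diag([2.0, 0.0]), np.diag([0.0, 2.0])]
delta_example = delta(example)
```

In the example, each single matrix contributes singular values (2, 0) and the average of both contributes (1, 1), so every subset gives a sum of 2.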

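Let $C$ be an $n \times n$ real matrix and $\mathcal{A} = \{A_1, \ldots, A_m\}$ be a collection of $n \times n$ real matrices, $d \le n$, and $b_1, \ldots, b_m \in \mathbb{R}$. Suppose the semi-definite program SDP:

$$\begin{aligned} \min\ & \langle C, X \rangle \\ \text{subject to}\ & \langle A_i, X \rangle \ge b_i \qquad \forall\, 1 \le i \le m \\ & \mathrm{tr}(X) \le d \\ & 0 \preceq X \preceq I_n \end{aligned}$$

has a nonempty feasible set and let $X^*$ denote an optimal solution. The algorithm Iterative-SDP (see Figure 1) returns a matrix $\tilde{X}$ such that

1. rank of $\tilde{X}$ is at most $d$,

2. $\langle C, \tilde{X} \rangle \le \langle C, X^* \rangle$, and

3. $\langle A_i, \tilde{X} \rangle \ge b_i - \Delta(\mathcal{A})$ for each $1 \le i \le m$.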

### 1.2 Related Work

With the growing use of machine learning algorithms in automated decision making, researchers have raised concerns about the bias that these algorithms might produce in the outcomes (Angwin et al., 2018; Kay et al., 2015; Buolamwini and Gebru, 2018; Sweeney, 2013). This has resulted in a wide range of studies focusing on detecting and eliminating sources of unfairness in different stages of a decision-making process, where most of this work has focused either on biased data or on algorithms producing biased outcomes. In this regard, studying fairness for dimensionality reduction techniques focuses on a more subtle source of bias in ML applications, which may or may not be used in any particular decision-making process. When PCA is used as a preprocessing step for decision making, it can inadvertently erase critically useful information about some populations. Even when it is used merely to visualize data, the erasure of variance for some populations raises concerns of representational bias (Crawford, 2017).

Principal Component Analysis (PCA) (Pearson, 1901; Jolliffe, 1986; Hotelling, 1933) is widely used as a preprocessing step to reduce the computational burden and/or to facilitate data summarization (Raychaudhuri et al., 1999; Iezzoni and Pritts, 1991). Samadi et al. (2018a) observed that vanilla PCA can inadvertently choose a low-dimensional representation of the data which depicts different populations with different fidelities. As a result, vanilla PCA itself can be a source of unfairness in the data representation step, and they suggest replacing it with the Fair PCA algorithm in applications (Samadi et al., 2018b).

As mentioned earlier, Pataki (1998) (see also Barvinok (1995)) showed that low-rank solutions to semi-definite programs with a small number of affine constraints can be obtained efficiently. We also refer the reader to the survey by Lemon et al. (2016) for more details. Closely related to low-rank SDP solutions is the S-lemma (Yakubovich, 1971, 1997), and we refer the reader to the survey by Pólik and Terlaky (2007). We also remark that methods based on the Johnson-Lindenstrauss lemma can be applied to obtain bi-criteria results for the Fair-PCA problem. For example, So et al. (2008) give algorithms that give low-rank solutions for SDPs with affine constraints without the upper bound on eigenvalues. Here we have focused on the single-criterion setting, with violation either in the number of dimensions or the objective but not both. Extreme point solutions to linear programming have played an important role in the design of approximation algorithms (Lenstra et al., 1990). The iterative rounding method for linear programming, based on extreme point solutions, has been a highly successful technique starting with the work of Jain (2001). Restricting the feasible region of certain SDP relaxations with low-rank constraints has been shown to avoid spurious local optima (Bandeira et al., 2016) and reduce the runtime due to known heuristics and analysis (Burer and Monteiro, 2003, 2005; Boumal et al., 2016). We refer the reader to Lau et al. (2011) for details on the topic and applications.

A closely related area, especially to Fair-Dimension-Reduction problem, is multi-objective optimization which has a vast literature. We refer the reader to Deb (2014) and references therein. We also remark that properties of extreme point solutions of linear programs (Ravi and Goemans, 1996; Grandoni et al., 2014) have also been utilized to obtain approximation algorithms to multi-objective problems. For semi-definite programming based methods, the closest works are on simultaneous max-cut (Bhangale et al., 2015, 2018) that utilize sum of squares hierarchy to obtain improved approximation algorithms.

## 2 Low-rank Solutions of Fair-Dimension-Reduction

In this section, we show that all extreme solutions of the SDP relaxation of Fair-Dimension-Reduction have low rank, proving Theorem 1.1-1.1. Before we state the results, we make the following assumptions. In this section, we let $g : \mathbb{R}^k \to \mathbb{R}$ be a concave function which is monotonic in at least one coordinate, and mildly assume that $g$ can be accessed with a polynomial-time subgradient oracle and is polynomially bounded by its input. We are explicitly given functions $f_1, f_2, \ldots, f_k$ which are affine in $PP^\top$, i.e., we are given real $n \times n$ matrices $B_1, \ldots, B_k$ and constants $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$ and $f_i(P) = \langle B_i, PP^\top \rangle + \alpha_i$.

We assume to be -Lipschitz. For functions that are -Lipschitz, we define an -optimal solution to -Fair-Dimension-Reduction problem as a projection matrix of rank whose objective value is at most from the optimum. In the context where an optimization problem has affine constraints where is Lipschitz, we also define -solution as a projection matrix of rank that violates th affine constraints by at most . Note that the feasible region of the problem is implicitly bounded by the constraint .

For Section 2, the algorithm may involve solving an optimization under a matrix linear inequality, which may not give an answer representable in finite bits of computation. However, we give algorithms that return an -close solution whose running time depends polynomially on for any . This is standard for computational tractability in convex optimization (see, for example, in Ben-Tal and Nemirovski (2001)). Therefore, for ease of exposition, we omit the computational error dependent on this to obtain an -feasible and -optimal solution, and define polynomial running time as polynomial in and .

To prove Theorem 1.1-1.1, we first show that extreme point solutions in the semi-definite cone under affine constraints and $X \preceq I$ have low rank. The statement builds on a result of Pataki (1998). We then apply our result to the Fair-Dimension-Reduction problem, which contains the Fair-PCA problem. Finally, we show that the existence of a low-rank solution leads to an approximation algorithm for the Fair-PCA problem.

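We first prove Theorem 1.1. [Theorem 1.1] Let $X^*$ be an extreme point optimal solution to SDP(I). Suppose the rank of $X^*$, say $r$, is more than $r^* = d + \lfloor \sqrt{2m + \tfrac{9}{4}} - \tfrac{3}{2} \rfloor$. Then we show a contradiction to the fact that $X^*$ is extreme. Let $0 \le l \le r$ of the eigenvalues of $X^*$ be equal to one. If $l \ge d$, then we have $l = r = d$ since $\mathrm{tr}(X^*) \le d$ and we are done. Thus we assume that $l \le d - 1$. In that case, there exist matrices $Q_1 \in \mathbb{R}^{n \times (r-l)}$, $Q_2 \in \mathbb{R}^{n \times l}$ and a symmetric matrix $\Lambda \in \mathbb{R}^{(r-l) \times (r-l)}$ such that

$$X^* = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix} \begin{pmatrix} \Lambda & 0 \\ 0 & I_l \end{pmatrix} \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}^\top = Q_1 \Lambda Q_1^\top + Q_2 Q_2^\top$$

where $0 \prec \Lambda \prec I_{r-l}$, $Q_1^\top Q_1 = I_{r-l}$, $Q_2^\top Q_2 = I_l$, and the columns of $Q_1$ and $Q_2$ are orthogonal, i.e., $Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$ has orthonormal columns.

Now, we have

$$\langle A_i, X^* \rangle = \langle A_i, Q_1 \Lambda Q_1^\top + Q_2 Q_2^\top \rangle = \langle Q_1^\top A_i Q_1, \Lambda \rangle + \langle A_i, Q_2 Q_2^\top \rangle$$

and

$$\mathrm{tr}(X^*) = \langle Q_1^\top Q_1, \Lambda \rangle + \mathrm{tr}(Q_2 Q_2^\top)$$

so that $\langle A_i, X^* \rangle$ and $\mathrm{tr}(X^*)$ are linear in $\Lambda$.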

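Observe that the set of $s \times s$ symmetric matrices forms a vector space of dimension $\frac{s(s+1)}{2}$ with the above inner product, where we consider the matrices as long vectors. If $m + 1 < \frac{(r-l)(r-l+1)}{2}$ then there exists an $(r-l) \times (r-l)$ symmetric matrix $\Delta \ne 0$ such that $\langle Q_1^\top A_i Q_1, \Delta \rangle = 0$ for each $1 \le i \le m$ and $\langle Q_1^\top Q_1, \Delta \rangle = 0$.

But then we claim that $Q_1 (\Lambda \pm \delta \Delta) Q_1^\top + Q_2 Q_2^\top$ is feasible for small $\delta > 0$, which implies a contradiction to $X^*$ being extreme. Indeed, it satisfies all the linear constraints by construction of $\Delta$. Thus it remains to check the eigenvalues of the newly constructed matrix. Observe that

$$Q_1 (\Lambda \pm \delta \Delta) Q_1^\top + Q_2 Q_2^\top = Q \begin{pmatrix} \Lambda \pm \delta \Delta & 0 \\ 0 & I_l \end{pmatrix} Q^\top$$

with orthonormal $Q$. Thus it is enough to consider the eigenvalues of

$$\begin{pmatrix} \Lambda \pm \delta \Delta & 0 \\ 0 & I_l \end{pmatrix}.$$

Observe that the eigenvalues of the above matrix are exactly $l$ ones and the eigenvalues of $\Lambda \pm \delta \Delta$. Since the eigenvalues of $\Lambda$ are bounded away from $0$ and $1$, one can find a small $\delta$ such that the eigenvalues of $\Lambda \pm \delta \Delta$ are bounded away from $0$ and $1$ as well, so we are done. Therefore, we must have $m + 1 \ge \frac{(r-l)(r-l+1)}{2}$, which implies $r - l \le -\frac{1}{2} + \sqrt{2m + \frac{9}{4}}$. By $l \le d - 1$, we have $r \le r^*$.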
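The final step of the argument is a quadratic inequality in the number of fractional eigenvalues. The following worked derivation spells it out, writing $s := r - l$ for that number and using $l \le d - 1$ (which holds whenever some eigenvalue is fractional, since the trace is at most $d$). The symbol $s$ is introduced here only for this derivation.

```latex
\begin{align*}
m + 1 \ge \frac{s(s+1)}{2}
  &\iff s^2 + s - 2(m+1) \le 0 \\
  &\iff s \le \frac{-1 + \sqrt{1 + 8(m+1)}}{2}
        = -\frac{1}{2} + \sqrt{2m + \frac{9}{4}}, \\[4pt]
r = l + s
  &\le (d - 1) - \frac{1}{2} + \sqrt{2m + \frac{9}{4}}
   = d + \sqrt{2m + \frac{9}{4}} - \frac{3}{2}.
\end{align*}
```

Since $r$ is an integer, the right-hand side can be rounded down. With $m = k - 1$ constraints this gives $d + \lfloor \sqrt{2k + 1/4} - 3/2 \rfloor$, which matches the bound stated in the abstract.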

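For the algorithmic version, given a feasible $\bar{X}$, we iteratively reduce $r - l$ by at least one until $m + 1 \ge \frac{(r-l)(r-l+1)}{2}$. While $m + 1 < \frac{(r-l)(r-l+1)}{2}$, we obtain $\Delta$ by using Gaussian elimination. Now we want to find the correct value of $\pm\delta$ so that $\Lambda' = \Lambda \pm \delta \Delta$ takes one of the eigenvalues to zero or one. First, determine the sign of $\langle Q_1^\top C Q_1, \Delta \rangle$ to find the correct sign to move $\Lambda$ that keeps the objective non-increasing, say it is in the positive direction. Since the set of feasible $X$ is convex and bounded, the ray $f(t) = Q_1 (\Lambda + t \Delta) Q_1^\top + Q_2 Q_2^\top$, $t \ge 0$, intersects the boundary of the feasible region at a unique $t' > 0$. Perform binary search for the correct value of $t'$ and set $\delta = t'$ up to the desired accuracy. Since $\langle Q_1^\top A_i Q_1, \Delta \rangle = 0$ for each $1 \le i \le m$ and $\langle Q_1^\top Q_1, \Delta \rangle = 0$, the additional tight constraint from moving to the boundary of the feasible region must be an eigenvalue constraint $0 \preceq X \preceq I_n$, i.e., at least one additional eigenvalue is now at 0 or 1, as desired. We apply eigenvalue decomposition to $\Lambda'$ and update $Q_1$ accordingly, and repeat.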
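The rank-reduction loop described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a null-space vector is obtained from an SVD instead of Gaussian elimination, the constraint matrices are assumed symmetric, and a fixed tolerance `tol` decides which eigenvalues count as fractional.

```python
import numpy as np

def sym_coeffs(G):
    """Vector c with <G, D> = c . (upper-triangular entries of D) for symmetric D."""
    G = (G + G.T) / 2.0
    iu = np.triu_indices(G.shape[0])
    c = 2.0 * G[iu]
    c[iu[0] == iu[1]] /= 2.0          # diagonal entries are counted once
    return c

def reduce_to_extreme(X, C, A_list, tol=1e-8):
    """Move X along directions that keep every <A_i, X> and tr(X) fixed, do not
    increase <C, X>, and push fractional eigenvalues to 0 or 1."""
    n, m = X.shape[0], len(A_list)
    for _ in range(n + 1):
        w, V = np.linalg.eigh((X + X.T) / 2.0)
        frac = (w > tol) & (w < 1.0 - tol)
        s = int(frac.sum())                      # s plays the role of r - l
        if s * (s + 1) // 2 <= m + 1:
            break
        Q1, Lam = V[:, frac], np.diag(w[frac])
        # One linear equation per constraint matrix, plus one for the trace.
        rows = np.array([sym_coeffs(Q1.T @ M @ Q1) for M in list(A_list) + [np.eye(n)]])
        _, _, Vh = np.linalg.svd(rows, full_matrices=True)
        z = Vh[-1]                               # null-space vector: fewer rows than columns
        Delta = np.zeros((s, s))
        Delta[np.triu_indices(s)] = z
        Delta = Delta + Delta.T - np.diag(np.diag(Delta))
        if np.sum((Q1.T @ C @ Q1) * Delta) > 0:  # pick the non-increasing direction
            Delta = -Delta
        # Lam + t * Delta is infeasible at t = 1 / ||Delta||_2; bisect for the boundary.
        lo, hi = 0.0, 1.0 / np.abs(np.linalg.eigvalsh(Delta)).max()
        for _ in range(80):
            mid = (lo + hi) / 2.0
            ev = np.linalg.eigvalsh(Lam + mid * Delta)
            if ev[0] >= 0.0 and ev[-1] <= 1.0:
                lo = mid
            else:
                hi = mid
        X = X + lo * (Q1 @ Delta @ Q1.T)
    # Snap eigenvalues that are numerically 0 or 1.
    w, V = np.linalg.eigh((X + X.T) / 2.0)
    w = np.where(w < tol, 0.0, np.where(w > 1.0 - tol, 1.0, w))
    return (V * w) @ V.T

rng = np.random.default_rng(0)
n, d = 6, 3
X0 = 0.5 * np.eye(n)                               # feasible, rank 6, trace 3
A1 = rng.standard_normal((n, n)); A1 = (A1 + A1.T) / 2
C = rng.standard_normal((n, n)); C = (C + C.T) / 2
X_ext = reduce_to_extreme(X0, C, [A1])
```

In the example there is a single affine constraint (`m = 1`), so the loop stops only once at most one eigenvalue is fractional; because the trace stays at the integer 3, the result is expected to have rank 3.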

We also obtain the following corollary from the bound on $r - l$ in the proof of Theorem 1.1. The number of fractional eigenvalues in any extreme point solution to SDP(I) is bounded by $\sqrt{2m + \frac{9}{4}} - \frac{1}{2} \le \lfloor \sqrt{2m} \rfloor + 1$.

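We are now ready to state the main result of this section: we can find a low-rank solution for Fair-Dimension-Reduction. Recall that $\mathcal{P}_d$ is the set of all $n \times d$ projection matrices $P$, i.e., matrices with $d$ orthonormal columns, and the $(f,g)$-Fair-Dimension-Reduction problem is to solve

$$\max_{P \in \mathcal{P}_d} \; g\big(f_1(P), f_2(P), \ldots, f_k(P)\big) \qquad (5)$$

There exists a polynomial-time algorithm to solve $(f,g)$-Fair-Dimension-Reduction that returns a solution $\hat{X}$ of rank at most $r^* := d + \lfloor \sqrt{2k + \tfrac{1}{4}} - \tfrac{3}{2} \rfloor$ whose objective value is at least that of the optimal $d$-dimensional embedding. First, we write a relaxation of (5):

$$\begin{aligned} \max_{X \in \mathbb{R}^{n \times n}}\ & g\big(\langle B_1, X \rangle, \ldots, \langle B_k, X \rangle\big) && (6) \\ \text{subject to}\ & \mathrm{tr}(X) \le d && (7) \\ & 0 \preceq X \preceq I_n && (8) \end{aligned}$$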

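Since $g(x)$ is concave in $x \in \mathbb{R}^k$ and $\langle B_i, X \rangle$ is affine in $X$, we have that $g$ as a function of $X$ is also concave in $X$. By the assumptions on $g$, and the fact that the feasible set is convex and bounded, we can solve the convex program in polynomial time, e.g., by the ellipsoid method, to obtain a (possibly high-rank) optimal solution $\bar{X}$. (In the case that $g$ is linear, the relaxation is also an SDP and may be solved faster in theory and practice.) By the assumptions on $g$, without loss of generality, we let $g$ be nondecreasing in the first coordinate. To reduce the rank of $\bar{X}$, we consider SDP(II):

$$\begin{aligned} \max_{X \in \mathbb{R}^{n \times n}}\ & \langle B_1, X \rangle && (9) \\ \text{subject to}\ & \langle B_i, X \rangle = \langle B_i, \bar{X} \rangle \qquad \forall\, 2 \le i \le k && (10) \\ & \mathrm{tr}(X) \le d && (11) \\ & 0 \preceq X \preceq I_n && (12) \end{aligned}$$

SDP(II) has a feasible solution $\bar{X}$ of objective $\langle B_1, \bar{X} \rangle$, and note that there are $k - 1$ constraints in (10). Hence, we can apply the algorithm in Theorem 1.1 with $m = k - 1$ to find an extreme solution $X^*$ of SDP(II) of rank at most $d + \lfloor \sqrt{2k + \tfrac{1}{4}} - \tfrac{3}{2} \rfloor$. Since $g$ is nondecreasing in $\langle B_1, X \rangle$, an optimal solution to SDP(II) gives objective value at least the optimum of the relaxation (6)-(8), and hence at least the optimum of the original Fair-Dimension-Reduction problem. If the assumption that $g$ is monotonic in at least one coordinate is dropped, Theorem 2 will hold with $\lfloor \sqrt{2k + \tfrac{9}{4}} - \tfrac{3}{2} \rfloor$ extra dimensions by indexing constraints (10) in SDP(II) for all $k$ groups instead of $k - 1$ groups.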

Another way to state Theorem 2 is that the number of groups must reach $\frac{(s+1)(s+2)}{2}$ before $s$ additional dimensions in the solution matrix are required to achieve the optimal objective value. For $k = 2$, no additional dimension in the solution is necessary to attain the optimum. We state this fact as follows. The $(f,g)$-Fair-Dimension-Reduction problem on two groups can be solved in polynomial time. In particular, Corollary 2 applies to Fair-PCA with two groups, proving Theorem 1.1.

## 3 Iterative Rounding Framework with Applications to Fair-PCA

In this section, we first prove Theorem 1.1.

We give an iterative rounding algorithm. The algorithm maintains three subspaces that are mutually orthogonal. Let $F_0, F_1, F$ denote matrices whose columns form an orthonormal basis of these subspaces. We will also abuse notation and denote these matrices by sets of vectors in their columns. We let the ranks of $F_0$, $F_1$ and $F$ be $r_0$, $r_1$ and $r$, respectively. We will ensure that $r_0 + r_1 + r = n$, i.e., vectors in $F_0$, $F_1$ and $F$ span $\mathbb{R}^n$.

We initialize $F_0 = F_1 = \emptyset$ and $F = I_n$. Over iterations, we increase the subspaces spanned by columns of $F_0$ and $F_1$ and decrease $F$ while maintaining pairwise orthogonality. The vectors in columns of $F_1$ will be eigenvectors of our final solution with eigenvalue $1$. In each iteration, we project the constraint matrices $A_i$ orthogonal to $F_0$ and $F_1$. We will then formulate a residual SDP using columns of $F$ as a basis and thus the new constructed matrices will have size $r \times r$. To readers familiar with the iterative rounding framework in linear programming, this generalizes the method of fixing certain variables to $0$ or $1$ and then formulating the residual problem. We also maintain a subset of constraints indexed by $S$, where $S$ is initialized to $\{1, \ldots, m\}$.

The algorithm is specified in Figure 1. In each iteration, we formulate the following SDP(r) with variables $X(r)$, which will be an $r \times r$ symmetric matrix. Recall that $r$ is the number of columns in $F$.

$$\begin{aligned} \min\ & \langle F^\top C F, X(r) \rangle \\ \text{subject to}\ & \langle F^\top A_i F, X(r) \rangle \ge b_i - \langle A_i, F_1 F_1^\top \rangle \qquad i \in S \\ & \mathrm{tr}(X(r)) \le d - \mathrm{rank}(F_1) \\ & 0 \preceq X(r) \preceq I_r \end{aligned}$$

It is easy to see that the semi-definite program remains feasible over all iterations if SDP is declared feasible in the first iteration. Indeed, the solution $X_f$ defined at the end of any iteration is a feasible solution to the next iteration. We also need the following standard claim.

###### Claim 1

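Let $Y$ be a positive semi-definite matrix such that $Y \preceq I$ with $\mathrm{tr}(Y) \le l$. Let $B$ be a real matrix of the same size as $Y$ and let $\lambda_i(B)$ denote the $i$th largest singular value of $B$. Then

$$\langle B, Y \rangle \le \sum_{i=1}^{l} \lambda_i(B).$$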
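Claim 1 is easy to sanity-check numerically. The sketch below samples random matrices $Y$ with eigenvalues in $[0,1]$ and trace at most $l$, and compares both sides of the inequality; the sampling scheme and function names are assumptions chosen for illustration, and a random check is of course not a proof.

```python
import numpy as np

def random_capped_psd(n, l, rng):
    """Random symmetric Y with 0 <= Y <= I and tr(Y) <= l."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    w = rng.uniform(0.0, 1.0, size=n)
    if w.sum() > l:
        w *= l / w.sum()          # scaling down keeps every eigenvalue in [0, 1]
    return (Q * w) @ Q.T

def claim_holds(n=6, l=2, trials=200, seed=1):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        Y = random_capped_psd(n, l, rng)
        B = rng.standard_normal((n, n))
        lhs = np.sum(B * Y)                                    # <B, Y>
        rhs = np.linalg.svd(B, compute_uv=False)[:l].sum()     # top-l singular values
        if lhs > rhs + 1e-9:
            return False
    return True

all_trials_pass = claim_holds()
```

The inequality itself follows from von Neumann's trace inequality together with the eigenvalue and trace bounds on $Y$.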

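The following result follows from Corollary 2 and Claim 1. Recall that

$$\Delta(\mathcal{A}) := \max_{S \subseteq [m]} \; \sum_{i=1}^{\lfloor \sqrt{2|S|} + 1 \rfloor} \sigma_i(S)$$

where $\sigma_i(S)$ is the $i$'th largest singular value of $\frac{1}{|S|} \sum_{i \in S} A_i$.

We let $\Delta$ denote $\Delta(\mathcal{A})$ for the rest of the section.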

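Consider any extreme point solution $X(r)$ of SDP(r) such that $\mathrm{rank}(X(r)) > \mathrm{tr}(X(r))$. Let $X(r) = \sum_{j=1}^{r} \lambda_j v_j v_j^\top$ be its eigenvalue decomposition and $X_f = \sum_{0 < \lambda_j < 1} \lambda_j v_j v_j^\top$. Then there exists a constraint $i \in S$ such that $\langle F^\top A_i F, X_f \rangle \le \Delta$. Let $l = |S|$. From Corollary 2, it follows that the number of fractional eigenvalues of $X(r)$ is at most $-\frac{1}{2} + \sqrt{2l + \frac{9}{4}} \le \sqrt{2l} + 1$. Observe that $\mathrm{tr}(X_f) \le \mathrm{rank}(X_f)$ since $X_f \preceq I$. Thus $\mathrm{tr}(X_f) \le \lfloor \sqrt{2l} + 1 \rfloor$. Moreover, $0 \preceq X_f \preceq I$, thus from Claim 1, we obtain that

$$\Big\langle \sum_{j \in S} F^\top A_j F, X_f \Big\rangle \le \sum_{i=1}^{\lfloor \sqrt{2l} + 1 \rfloor} \sigma_i\Big( \sum_{j \in S} F^\top A_j F \Big) \le \sum_{i=1}^{\lfloor \sqrt{2l} + 1 \rfloor} \sigma_i\Big( \sum_{j \in S} A_j \Big) \le l \cdot \Delta$$

where the first inequality follows from Claim 1 and the second inequality follows since the sum of top singular values reduces after projection. But then we obtain, by averaging, that there exists $j \in S$ such that

$$\langle F^\top A_j F, X_f \rangle \le \frac{1}{l} \cdot l \Delta = \Delta$$

as claimed.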

Now we complete the proof of Theorem 1.1. Observe that the algorithm always maintains that at the end of each iteration, the trace of $X_f$ plus the rank of $F_1$ is at most $d$. Thus at the end of the algorithm, the returned solution has rank at most $d$. Next, consider the solution $X = F_1 F_1^\top + F X_f F^\top$ over the course of the algorithm. Again, it is easy to see that the objective value is non-increasing over the iterations. This follows since $X_f$ defined at the end of an iteration is a feasible solution to the next iteration.

Now we argue the violation in any constraint $i$. While the constraint $i$ remains in the SDP, the solution $X$ satisfies

$$\langle A_i, X \rangle = \langle A_i, F_1 F_1^\top \rangle + \langle A_i, F X_f F^\top \rangle = \langle A_i, F_1 F_1^\top \rangle + \langle F^\top A_i F, X_f \rangle \ge \langle A_i, F_1 F_1^\top \rangle + b_i - \langle A_i, F_1 F_1^\top \rangle = b_i,$$

where the inequality again follows since $X_f$ is feasible with the updated constraints.

When constraint $i$ is removed, it might be violated by a later solution. At this iteration, $\langle F^\top A_i F, X_f \rangle \le \Delta$. Thus, $\langle A_i, F_1 F_1^\top \rangle \ge b_i - \Delta$. In the final solution this bound can only go up, as $F_1$ might only become larger. This completes the proof of the theorem.

#### Application to Fair-PCA.

For the Fair-PCA problem, iterative rounding recovers a rank- solution whose variance goes down from the SDP solution by at most . While this is no better than what we get by scaling (Corollary 4) for the max variance objective function, when we consider the marginal loss, i.e., the difference between the variance of the common -dimensional solution and the best -dimensional solution for each group, then iterative rounding can be much better. The scaling solution guarantee relies on the max-variance being a concave function and for the marginal loss, the loss for each group could go up proportional to the largest max variance (largest sum of top singular values over the groups). With iterative rounding applied to the SDP solution, the loss is the sum of only singular values of the average of some subset of data matrices, so it can be better by as much as a factor of .

## 4 Approximation Algorithm for Fair-PCA

Recall that we require $s := \lfloor \sqrt{2k + \tfrac{1}{4}} - \tfrac{3}{2} \rfloor$ additional dimensions for the projection to achieve the optimal objective. One way to ensure that the algorithm outputs a $d$-dimensional projection is to solve the problem in lower target dimension $d - s$, then apply the rounding described in Section 2. The objectives of the problems with target dimension $d - s$ and $d$ are at most a factor $\frac{d-s}{d}$ apart for the Fair-PCA problem because the objective scales linearly with $X$, giving an approximation guarantee of $1 - \frac{s}{d}$. Recall that given $A_1, \ldots, A_k$, the Fair-PCA problem is to solve

$$\max_{P \,:\, P^\top P = I_d} \; \min_{1 \le i \le k} \; \|A_i P\|_F^2 = \langle A_i^\top A_i, PP^\top \rangle$$

We state the approximation guarantee and the algorithm formally as follows. Let $A_1, \ldots, A_k$ be data sets of $k$ groups and suppose $s := \lfloor \sqrt{2k + \tfrac{1}{4}} - \tfrac{3}{2} \rfloor < d$. Then there exists a polynomial-time approximation algorithm of factor $1 - \frac{s}{d}$ for the Fair-PCA problem. We find an extreme solution $X^*$ of the Fair-PCA problem of finding a projection from $n$ to $d - s$ target dimensions. By Theorem 2, the rank of $X^*$ is at most $d$.

Denote by $\mathrm{OPT}_d$ and $X_d^*$ the optimal value and an optimal solution to Fair-PCA with target dimension $d$. Note that $\frac{d-s}{d} X_d^*$ is a feasible solution to the Fair-PCA relaxation on target dimension $d - s$ whose objective is at least $\frac{d-s}{d} \mathrm{OPT}_d$, because the objective scales linearly with $X$. Therefore, the optimal Fair-PCA relaxation of target dimension $d - s$ attains optimum at least $\frac{d-s}{d} \mathrm{OPT}_d$, giving a $\big(1 - \frac{s}{d}\big)$-approximation ratio.

## 5 Polynomial Time Algorithm for Fixed Number of Groups

We briefly summarize the approach of Grigoriev and Pasechnik (2005). Let be real-valued quadratic functions in variables. Let be a polynomial of degree over some subring of (e.g., the usual or ). The problem is to find all roots of the polynomial , i.e., the set

$$Z = \{x : p(f_1(x), f_2(x), \ldots, f_k(x)) = 0\}.$$

First note that the set of solutions above is in general not finite; it is some manifold and is highly non-convex. The key idea of Grigoriev and Pasechnik (see also Barvinok (1993) for a similar idea applied to a special case) is to show that this set of solutions can be partitioned into a relatively small number of connected components such that there is an into map from these components to roots of a univariate polynomial of degree ; this therefore bounds the total number of components. The proof of this mapping is based on an explicit decomposition of space with the property that if a piece of the decomposition has a solution, it must be the solution of a linear system. The number of possible such linear systems is bounded as , and these systems can be enumerated efficiently.

The core idea of the decomposition starts with the following simple observation that relies crucially on the maps being quadratic (and not of higher degree). The partial derivatives of any degree polynomial of quadratic forms , where , are linear in for any fixed value of .

To see this, suppose and write

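$$\frac{\partial p}{\partial x_i} = \sum_{j=1}^{k} \frac{\partial p(Y_1, \ldots, Y_k)}{\partial Y_j} \frac{\partial Y_j}{\partial x_i} = \sum_{j=1}^{k} \frac{\partial p(Y_1, \ldots, Y_k)}{\partial Y_j} \frac{\partial f_j(x)}{\partial x_i}.$$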

Now the derivatives of are linear in as is quadratic, and so for any fixed values of , the expression is linear in .
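The observation can be checked numerically. The sketch below assumes hypothetical quadratic maps of the form f_j(x) = x^T Q_j x + b_j^T x + c_j and freezes the outer derivatives at arbitrary values, standing in for a fixed value of Y; all names and sizes are illustrative. With a linear term present the gradient is affine rather than strictly linear, so the check is that it commutes with affine combinations of points.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3

# Hypothetical quadratic maps f_j(x) = x^T Q_j x + b_j^T x + c_j.
Qs = [(M + M.T) / 2 for M in rng.standard_normal((k, n, n))]
bs = rng.standard_normal((k, n))

def grad_f(j, x):
    """Gradient of f_j with respect to x: 2 Q_j x + b_j, affine in x."""
    return 2 * Qs[j] @ x + bs[j]

def grad_p_fixed_Y(dp_dY, x):
    """Chain rule with the outer derivatives dp/dY_j held at fixed values."""
    return sum(dp_dY[j] * grad_f(j, x) for j in range(k))

dp_dY = rng.standard_normal(k)  # stands in for dp/dY_j at a fixed Y
x, y = rng.standard_normal(n), rng.standard_normal(n)
t = 0.3

# An affine map commutes with affine combinations of its inputs.
lhs = grad_p_fixed_Y(dp_dY, t * x + (1 - t) * y)
rhs = t * grad_p_fixed_Y(dp_dY, x) + (1 - t) * grad_p_fixed_Y(dp_dY, y)
affine_in_x = np.allclose(lhs, rhs)
```

If any f_j had degree three or higher, its gradient would be at least quadratic in x and this check would fail, which is why the argument needs quadratic maps.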

The next step is a nontrivial fact about connected components of analytic manifolds that holds in much greater generality. Instead of all points that correspond to zeros of , we look at all “critical” points of defined as the set of points for which the partial derivatives in all but the first coordinate, i.e.,

$$Z_c = \left\{x : \frac{\partial p}{\partial x_i} = 0, \ \forall\, 2 \le i \le n \right\}.$$

The theorem says that will intersect every connected component of (Grigor’ev and Vorobjov Jr, 1988).

Now the above two ideas can be combined as follows. We will cover all connected components of . To do this we consider, for each fixed value of , the possible solutions to the linear system obtained, alongside minimizing . The rank of this system is in general at least after a small perturbation (while Grigoriev and Pasechnik (2005) use a deterministic perturbation that takes some care, we could also use a small random perturbation). So the number of possible solutions grows only exponentially in (and not ), and can be effectively enumerated in time . This last step is highly nontrivial, and needs the argument that over the reals, zeros from distinct components need only be computed up to finite polynomial precision (as rationals) to keep them distinct. Thus, the perturbed version still covers all components of the original version. In this enumeration, we check for true solutions. The method actually works for any level set of , and not just its zeros. With this, we can optimize over as well. We conclude this section by paraphrasing the main theorem from Grigoriev and Pasechnik (2005).

(Grigoriev and Pasechnik, 2005) Given quadratic maps and a polynomial over some computable subring of of degree at most , there is an algorithm to compute a set of points satisfying that meets each connected component of the set of zeros of using at most operations with all intermediate representations bounded by times the bit sizes of the coefficients of . The minimizer, maximizer or infimum of any polynomial of degree at most over the zeros of can also be computed in the same complexity.

### 5.1 Proof of Theorem 1.1

We apply Theorem 5 and the corresponding algorithm as follows. Our variables will be the entries of an matrix . The quadratic maps will be plus additional maps for and for columns of . The final polynomial is

$$p(f_1, \ldots, f_k, q_{11}, \ldots, q_{dd}) = \sum_{i \le j} q_{ij}(P)^2.$$

We will find the maximum of the polynomial over the set of zeros of using the algorithm of Theorem 5. Since the total number of variables is and the number of quadratic maps is , we get the claimed complexity of operations and this times the input bit sizes as the bit complexity of the algorithm.
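To illustrate why the zero set of this polynomial captures the orthonormality constraint, the sketch below assumes the additional maps are the entries of P^T P - I_d on and above the diagonal, i.e., q_ii(P) is the squared norm of column i minus one and q_ij(P) is the inner product of columns i and j. This specific form is an assumption made for illustration; the function name and test matrices are hypothetical.

```python
import numpy as np

def orthonormality_polynomial(P):
    """Sum over i <= j of q_ij(P)^2, with q_ij the entries of P^T P - I_d."""
    d = P.shape[1]
    Q = P.T @ P - np.eye(d)      # assumed form of the maps q_ij
    upper = np.triu_indices(d)   # index pairs (i, j) with i <= j
    return float(np.sum(Q[upper] ** 2))

rng = np.random.default_rng(2)
n, d = 5, 2

# Orthonormal columns: the polynomial vanishes up to rounding error.
P_orth, _ = np.linalg.qr(rng.standard_normal((n, d)))
p_orth = orthonormality_polynomial(P_orth)

# Generic matrix: the polynomial is strictly positive.
P_generic = rng.standard_normal((n, d))
p_generic = orthonormality_polynomial(P_generic)
```

Because the polynomial is a sum of squares, it is zero exactly when every q_ij is zero, so optimizing over its zero set is the same as optimizing over matrices with orthonormal columns.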

## 6 Hardness

The Fair-PCA problem:

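$$\max_{z \in \mathbb{R},\, P \in \mathbb{R}^{n \times d}} \ z \quad \text{subject to} \tag{13}$$

$$\langle B_i, P P^T \rangle \ge z, \quad \forall i \in [k] \tag{14}$$

$$P^T P = I_d \tag{15}$$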

for arbitrary symmetric real PSD matrices is NP-hard for and .

We reduce MAX-CUT, a known NP-hard problem, to the stated fair PCA problem. In MAX-CUT, given a simple graph , we optimize

$$\max_{S \subseteq V} \ e(S, V \setminus S) \tag{16}$$

over all subsets of vertices. Here, is the size of the cut in . As is common for NP-hard problems, the decision version of MAX-CUT:

$$\exists?\, S \subseteq V : \ e(S, V \setminus S) \ge b \tag{17}$$

for an arbitrary is also NP-hard. We may write MAX-CUT as an integer program as follows:

$$\exists?\, v \in \{-1, 1\}^V : \ \frac{1}{2} \sum_{ij \in E} (1 - v_i v_j) \ge b \tag{18}$$

Here represents whether a vertex is in the set or not:

$$v_i = \begin{cases} 1 & i \in S \\ -1 & i \notin S \end{cases} \tag{19}$$

and it can be easily verified that the objective represents the desired cut function.
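The verification can be spelled out on a small example. The sketch below compares a direct count of cut edges with the ±1 expression from (18); the graph (a 4-cycle with one chord) and the function names are hypothetical.

```python
def cut_size(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum((i in S) != (j in S) for i, j in edges)

def cut_size_pm1(edges, v):
    """Same quantity via (1/2) * sum over edges of (1 - v_i * v_j)."""
    # Each term is 2 for a cut edge and 0 otherwise, so the sum is even.
    return sum(1 - v[i] * v[j] for i, j in edges) // 2

def indicator(S, num_vertices):
    """Encode S as a vector with v_i = 1 if i is in S and -1 otherwise."""
    return [1 if i in S else -1 for i in range(num_vertices)]

# Hypothetical graph: a 4-cycle on vertices 0..3 plus the chord (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
S = {0, 3}
v = indicator(S, 4)

direct = cut_size(edges, S)       # edges (0,1), (2,3), (0,2) are cut
via_pm1 = cut_size_pm1(edges, v)
```

An edge contributes 1 - v_i v_j = 2 when its endpoints have opposite signs and 0 when they agree, which is why the factor 1/2 recovers the edge count.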

We now show that this MAX-CUT integer feasibility problem can be formulated as an instance of the fair PCA problem (13)-(15). In fact, it will be formulated as a feasibility version of the fair PCA by checking if the optimal of an instance is at least . We choose and for this instance, and we write . The rest of the proof is to show that it is possible to construct constraints in the fair PCA form (14)-(15) to 1) enforce a discrete condition on to take only two values, behaving similarly as ; and 2) check an objective value of MAX-CUT.

The reason as written cannot behave exactly as is that constraint (15) requires but . Hence, we scale the variables in MAX-CUT problem by writing and rearrange terms in (18) to obtain an equivalent formulation of MAX-CUT:

$$\exists?\, u \in \left\{-\tfrac{1}{\sqrt{n}}, \tfrac{1}{\sqrt{n}}\right\}^n : \ n \sum_{ij \in E} -u_i u_j \ge 2b - |E| \tag{20}$$

We are now ready to give an explicit construction of to solve MAX-CUT formulation (20). Let . For each , define

$$B_{2j-1} = bn \cdot \mathrm{diag}(e_j), \qquad B_{2j} = \frac{bn}{n-1} \cdot \mathrm{diag}(\mathbf{1} - e_j)$$

where and denote vectors of length with all zeroes except one at the th coordinate, and with all ones, respectively. It is clear that are PSD. Then for each , the constraints and are equivalent to

$$u_j^2 \ge \frac{1}{n}, \quad \text{and} \quad \sum_{i \ne j} u_i^2 \ge \frac{n-1}{n}$$

respectively. Combining these two inequalities with forces both inequalities to be equalities, implying that for all , as we aim.
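A small numerical sketch of this construction follows. It assumes that the threshold in constraint (14) is b, which is the value under which the two displayed inequalities follow from the definitions of the matrices; the function name and the sample vectors are illustrative.

```python
import numpy as np

def build_sign_constraints(n, b):
    """Return the 2n diagonal PSD matrices B_1, ..., B_{2n} of the construction."""
    mats = []
    for j in range(n):
        e_j = np.zeros(n)
        e_j[j] = 1.0
        mats.append(b * n * np.diag(e_j))                         # B_{2j-1}
        mats.append(b * n / (n - 1) * np.diag(np.ones(n) - e_j))  # B_{2j}
    return mats

n, b = 4, 3.0
Bs = build_sign_constraints(n, b)

# A point of {-1/sqrt(n), 1/sqrt(n)}^n: every constraint holds with equality.
u = np.array([1.0, -1.0, -1.0, 1.0]) / np.sqrt(n)
values = [u @ B @ u for B in Bs]  # <B_i, u u^T> = u^T B_i u

# A unit vector that is not a scaled sign vector violates some constraint.
u_bad = np.array([1.0, 0.0, 0.0, 0.0])
min_bad = min(u_bad @ B @ u_bad for B in Bs)
```

For the scaled sign vector all 2n inner products equal b, while for `u_bad` the constraint for the second matrix evaluates to 0, below the assumed threshold.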

Next, we set

 B