# Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm

"Composable core-sets" are an efficient framework for solving optimization problems in massive data models. In this work, we consider efficient construction of composable core-sets for the determinant maximization problem. This can also be cast as the MAP inference task for determinantal point processes, which have recently gained a lot of interest for modeling diversity and fairness. The problem was recently studied in [IMOR'18], where the authors designed composable core-sets with the optimal approximation bound of Õ(k)^k. On the other hand, the more practical Greedy algorithm has been previously used in similar contexts. In this work, we first provide a theoretical approximation guarantee of O(C^{k^2}) for the Greedy algorithm in the context of composable core-sets. Further, we propose to use a Local Search based algorithm that, while still practical, achieves a nearly optimal approximation bound of O(k)^{2k}. Finally, we implement all three algorithms and show the effectiveness of our proposed algorithm on standard data sets.


## 1 Introduction

Given a set of vectors $P \subset \mathbb{R}^d$ and an integer $k$, the goal of the determinant maximization problem is to find a subset $S$ of $P$ of size $k$ such that the determinant of the Gram matrix of the vectors in $S$ is maximized. Geometrically, this determinant is equal to the squared volume of the parallelepiped spanned by the points in $S$. This problem and its variants have been studied extensively over the last decade. To date, the best approximation factor is due to Nikolov [nikolov2015randomized], which gives a factor of , and it is known that the exponential dependence on $k$ is unavoidable [civril2013exponential].

The determinant of a subset of points is used as a measure of diversity in many applications where a small but diverse subset of objects must be selected as a representative of a larger population [mirzasoleiman2017streaming, gong2014diverse, kulesza2012determinantal, chao2015large, kulesza2011learning, yao2016tweet, lee2016individualness]; recently, this has been further applied to model fairness [celis2016fair]. The determinant maximization problem can also be rephrased as the maximum a posteriori probability (MAP) estimator for determinantal point processes (DPPs). DPPs are probabilistic models of diversity in which every subset of objects is assigned a probability proportional to the determinant of its corresponding Gram matrix. DPPs have found several applications in machine learning over the last few years [kulesza2012determinantal, mirzasoleiman2017streaming, gong2014diverse, yao2016tweet]. In this context, the determinant maximization problem corresponds to the task of finding the most diverse subset of items.

Many of these applications need to handle large amounts of data, and consequently the problem has been considered in massive data models of computation [mirzasoleiman2017streaming, wei2014fast, pan2014parallel, mirzasoleiman2013distributed, mirzasoleiman2015distributed, mirrokni2015randomized, barbosa2015power]. One such strong model that we consider in this work is the composable core-set [IMMM-ccdcm-14], which is an efficient summary of a data set with the composability property: the union of summaries of multiple data sets should provably result in a good summary for the union of the data sets. More precisely, in the context of determinant maximization, a mapping function $c$ that maps any point set to one of its subsets is called an $\alpha$-composable core-set if it satisfies the following condition: given any integer $m$ and any collection of point sets $P_1, \ldots, P_m \subset \mathbb{R}^d$,

$$\mathrm{MAXDET}_k\left(\bigcup_{i=1}^{m} c(P_i)\right) \ge \frac{1}{\alpha} \cdot \mathrm{MAXDET}_k\left(\bigcup_{i=1}^{m} P_i\right),$$

where we use $\mathrm{MAXDET}_k$ to denote the optimum of the determinant maximization problem for parameter $k$. We also say $c$ is a core-set of size $t$ if for any $P \subset \mathbb{R}^d$, $|c(P)| \le t$. If designed for a task, composable core-sets further imply efficient streaming and distributed algorithms for the same task. This has led to recent interest in the composable core-set model since its introduction [mirrokni2015randomized, assadi2017randomized, indyk2018composable].

#### An almost optimal approximate composable core-set.

In [indyk2018composable], the authors designed composable core-sets of size with approximation guarantee of $\tilde{O}(k)^k$ for the determinant maximization problem. Moreover, they showed that the best approximation one can achieve is for any polynomial size core-sets, proving that their algorithm is almost optimal. However, its complexity makes it less appealing in practice. First, the algorithm requires an explicit representation of the point set, which is not available in many DPP applications. A common case is that the DPP kernel is given by an oracle which returns the inner product between two points; in this setting, the algorithm needs to construct the associated Gram matrix and use an SVD to recover the point set, making the time and memory quadratic in the size of the point set. Second, even when the point set is given explicitly, the algorithm is not efficient for large inputs, as it requires solving linear programs, where is the size of the point set.

In this paper, we focus on two simple to implement algorithms which are typically exploited in practice, namely the Greedy and the Local-Search algorithms. We study these algorithms from theoretical and experimental points of view for the composable core-set problem with respect to the determinant maximization objective, and we compare their performance with the algorithm of [indyk2018composable], which we refer to as the LP-based algorithm.

### 1.1 Our Contributions

#### Greedy algorithm.

The greedy algorithm for determinant maximization proceeds in $k$ iterations, and at each iteration it picks the point that maximizes the volume of the parallelepiped formed by the set of points picked so far. The approximation guarantee of the greedy algorithm in the standard setting was studied in [cm-smvsm-09]. In the context of submodular maximization over large data sets, variants of this algorithm have been studied [mirzasoleiman2013distributed]. One can view the greedy algorithm as a heuristic for constructing a core-set of size $k$. To the best of our knowledge, the previous analysis of this algorithm does not provide any multiplicative approximation guarantee in the context of composable core-sets (for more details, see related work).

Our first result shows the first multiplicative approximation factor for composable core-sets on the determinant maximization objective achieved by the Greedy algorithm.

###### Theorem 1.1.

Given a set of points $P \subset \mathbb{R}^d$, the Greedy algorithm achieves an $O(C^{k^2})$-composable core-set of size $k$ for the determinant maximization problem, where $C$ is a constant.

#### The Local Search algorithm.

Our main contribution is to propose to use the Local Search algorithm for constructing a composable core-set for the task of determinant maximization. The algorithm starts with the solution of the Greedy algorithm and at each iteration, swaps in a point that is not in the core-set with a point that is already in the core-set, as long as this operation increases the volume of the set of picked points. While still being simple, as we show, this algorithm achieves a near-optimal approximation guarantee.

###### Theorem 1.2.

Given a set of points $P \subset \mathbb{R}^d$, the Local Search algorithm achieves an $O(k)^{2k}$-composable core-set of size $k$ for the determinant maximization problem.

#### Directional height.

Both of our theoretical results use a modular framework: In Section 3, we introduce a new geometric notion defined for a point set called directional height, which is closely related to the width of a point set defined in [AHV-gavc-05]. We show that core-sets for preserving the directional height of a point set in fact provide core-sets for the determinant maximization problem. Finally, we show that running either the Greedy (Section 5) or Local Search (Section 4) algorithms on a point set obtain composable core-sets for its directional height. We believe that this new notion might find applications elsewhere.

#### Experimental results.

Finally, we implement all three algorithms and compare their performance on two real data sets: the MNIST [lecun1998gradient] data set and the GENES data set, previously used in [batmanghelich2014diversifying, li2015efficient] in the context of DPPs. Our empirical results show that in more than percent of the cases, the solution reported by the Local Search algorithm improves over the Greedy algorithm. The average improvement varies from to up to depending on the data set and the settings of other parameters such as . We further show that although Local Search picks fewer points than the tight approximation algorithm of [indyk2018composable] ( vs. up to ), its performance is better and it runs faster.

### 1.2 Related Work

In a broader scope, determinant maximization is an instance of (non-monotone) submodular maximization, where the logarithm of the determinant is the submodular objective function. There is a long line of work on distributed submodular optimization and its variants [chierichetti2010max, badanidiyuru2014streaming, mirzasoleiman2016distributed, kumar2015fast]. In particular, there have been several efforts to design composable core-sets in various settings of the problem [mirzasoleiman2013distributed, mirrokni2015randomized, barbosa2015power]. In [mirzasoleiman2013distributed], the authors study the problem for monotone functions, and show that the greedy algorithm offers a -composable core-set for the problem, where is the number of parts. On the other hand, [IMMM-ccdcm-14] shows that it is impossible to go beyond an approximation factor of with polynomial size core-sets. Moreover, [mirrokni2015randomized, barbosa2015power] consider a variant of the problem where the data is randomly distributed, and show that the greedy algorithm achieves constant factor “randomized” composable core-sets for both monotone and non-monotone functions. However, these results cannot be directly compared to the current work, as a multiplicative approximation for the determinant converts to an additive guarantee for the corresponding submodular function.

As discussed before, the determinant is one way to measure the diversity of a set of items. Diversity maximization with respect to other measures has also been extensively studied in the literature [hassin1997approximation, gollapudi2009axiomatic, borodin2012max, bhaskara2016linear]. More recently, the problem has received more attention in distributed models of computation, and constant factor approximation algorithms have been devised for several diversity measures [zadeh2017scalable, IMMM-ccdcm-14, ceccarello2017mapreduce]. However, these notions are typically defined by considering only the pairwise dissimilarities between the items; for example, the summation of the dissimilarities over all pairs of items in a set can define its diversity.

One can also go further, and study the problem under additional constraints, such as matroid and knapsack constraints. This has been an active line of research in the past few years, and several centralized and distributed algorithms have been designed in this context for submodular optimization [mirzasoleiman2016fast, lee2009non, lee2010submodular, chekuri2015streaming] and in particular determinant maximization [ebrahimi2017subdeterminant, nikolov2016maximizing].

## 2 Preliminaries

Throughout the paper, we fix $d$ as the dimension of the ambient space and $k$ as the size parameter of the determinant maximization problem. We call a subset of $\mathbb{R}^d$ a point set, and use the term point or vector to refer to an element of a point set. For a set of points $P$ and a point $p$, we write $P + p$ to denote the set $P \cup \{p\}$, and for a point $s \in P$, we write $P - s$ to denote the set $P \setminus \{s\}$.

Let $P$ be a point set of size $k$. We use $\mathrm{VOL}(P)$ to denote the $k$-dimensional volume of the parallelepiped spanned by the vectors in $P$. Also, let $M_P$ denote a $k \times d$ matrix where each row represents a point of $P$. Then the following relates volume to the determinant: $\det(M_P M_P^\top) = \mathrm{VOL}(P)^2$. So the determinant maximization problem can also be phrased as volume maximization. We use the former, but because of the geometric nature of the arguments, we sometimes switch to the volume notation. For any point set $P$, we use $\mathrm{MAXDET}_k(P)$ to denote the optimum of determinant maximization for $P$, i.e., $\mathrm{MAXDET}_k(P) = \max_S \det(M_S M_S^\top)$, where $S$ ranges over all subsets of $P$ of size $k$. $\mathrm{MAXVOL}_k(P)$ is defined similarly.
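The relation between the Gram determinant and the squared volume can be checked numerically. The following is a minimal sketch in plain Python, assuming points are given as tuples of floats; the helper names `gram`, `det`, and `volume` are illustrative and do not come from the paper.

```python
import math

def gram(points):
    """Gram matrix M M^T, where the rows of M are the given points."""
    return [[sum(a * b for a, b in zip(u, v)) for v in points] for u in points]

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, sign, result = len(a), 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular: the points are linearly dependent
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def volume(points):
    """Volume of the parallelepiped spanned by the points (Gram-Schmidt)."""
    basis, vol = [], 1.0
    for p in points:
        r = list(p)
        for b in basis:
            coef = sum(x * y for x, y in zip(r, b))
            r = [x - coef * y for x, y in zip(r, b)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm < 1e-12:
            return 0.0
        vol *= norm
        basis.append([x / norm for x in r])
    return vol

S = [(3.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(det(gram(S)), volume(S) ** 2)  # the two values should agree up to rounding
```

The volume is computed as the product of the Gram-Schmidt residual norms, which is the "height times base" decomposition used repeatedly in the proofs.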

For a point set , we use to refer to the linear subspace spanned by the vectors in . We also denote the set of all -dimensional linear subspaces by . For a point and a subspace , we use to show the Euclidean distance of from .

#### Greedy algorithm for volume maximization.

As pointed out before, a widely used algorithm for determinant maximization in the offline setting is a greedy algorithm which, given a point set $P \subset \mathbb{R}^d$ and a parameter $k$ as input, does the following: start with an empty set $C$. For $k$ iterations, add $\arg\max_{p \in P} \mathrm{VOL}(C + p)$ to $C$. The result is a subset of size $k$ which has the following guarantee.

###### Theorem 2.1 ([cm-smvsm-09]).

Let be a point set and be the output of the greedy algorithm on . Then .
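The greedy procedure described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses the fact that adding a point multiplies the volume by its distance to the span of the points chosen so far, so maximizing the volume is the same as picking the farthest point. The function names and the `1e-12` tolerance are assumptions.

```python
import math

def residual(p, basis):
    """Component of p orthogonal to the span of an orthonormal basis."""
    r = list(p)
    for b in basis:
        coef = sum(x * y for x, y in zip(r, b))
        r = [x - coef * y for x, y in zip(r, b)]
    return r

def greedy(points, k):
    """Return indices of up to k points; each step takes the point
    farthest from the span of the points chosen so far."""
    chosen, basis = [], []
    for _ in range(k):
        best, best_norm, best_res = None, 1e-12, None
        for i, p in enumerate(points):
            if i in chosen:
                continue
            r = residual(p, basis)
            norm = math.sqrt(sum(x * x for x in r))
            if norm > best_norm:
                best, best_norm, best_res = i, norm, r
        if best is None:  # every remaining point lies in the current span
            break
        chosen.append(best)
        basis.append([x / best_norm for x in best_res])
    return chosen

points = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.5)]
print(greedy(points, 2))
```

Keeping an orthonormal basis of the chosen points avoids recomputing a determinant for every candidate in every iteration.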

### 2.1 Core-sets

Core-set is a generic term used for a small subset of the data that represents it very well. More formally, for a given optimization task, a core-set is a mapping $c$ from any data set $P$ into one of its subsets such that the solution of the optimization over the core-set $c(P)$ approximates the solution of the optimization over the original data set $P$. The notion was first introduced in [AHV-aemp-04] and many variations of core-sets exist. In this work, we consider the notion of composable core-sets defined in [IMMM-ccdcm-14].

###### Definition 2.2 (α-Composable Core-sets).

A function $c(P)$ that maps the input set $P \subset \mathbb{R}^d$ into one of its subsets is called an $\alpha$-composable core-set for a function $f$ if, for any collection of sets $P_1, \ldots, P_m \subset \mathbb{R}^d$, we have $f(C) \ge f(P)/\alpha$, where $P = \bigcup_{i \le m} P_i$ and $C = \bigcup_{i \le m} c(P_i)$.

For simplicity, we will often refer to the set $c(P)$ as the core-set for $P$ and use the term “core-set function” with respect to $c(\cdot)$. The size of $c(\cdot)$ is defined as the smallest number $t$ such that $|c(P)| \le t$ for all sets $P$ (assuming it exists). Unless otherwise stated, we may use the term “core-set” to refer to a composable core-set when clear from the context. Our goal is to find composable core-sets for the determinant maximization problem.

## 3 k-Directional Height

As pointed out in the introduction, we introduce a new geometric notion called directional height, and reduce the task of finding composable core-sets for determinant maximization to finding core-sets for this new notion.

###### Definition 3.1 (k-Directional Height).

Let $P \subset \mathbb{R}^d$ be a point set and $H$ be a $(k-1)$-dimensional subspace. We define the $k$-directional height of $P$ with respect to $H$, denoted by $h(P, H)$, to be the distance of the farthest point in $P$ from $H$, i.e., $h(P, H) = \max_{p \in P} \mathrm{dist}(p, H)$.

The notion is an instance of an extent measure defined in [AHV-gavc-05]. It is also related to the notion of directional width of a point set previously used in [AHV-gavc-05], which for a direction vector $v$ is defined to be $\max_{p \in P} \langle v, p \rangle - \min_{p \in P} \langle v, p \rangle$.

Next, we define core-sets with respect to this notion. Such a core-set is essentially a subset of the point set that approximately preserves the $k$-directional height of the point set with respect to any $(k-1)$-dimensional subspace.

###### Definition 3.2 (α-Approximate Core-set for the k-Directional Height).

Given a point set $P$, a subset $C \subseteq P$ is an $\alpha$-approximate core-set for the $k$-directional height of $P$ if, for any $(k-1)$-dimensional subspace $H$, we have $h(C, H) \ge h(P, H)/\alpha$.

We also say a mapping $c(\cdot)$, which maps any point set in $\mathbb{R}^d$ to one of its subsets, is an $\alpha$-approximate core-set for the $k$-directional height problem if the above relation holds for any point set $P$ and $C = c(P)$.

The above notion of core-sets for $k$-directional height is similar to the notion of $\varepsilon$-kernels defined in [AHV-gavc-05] for the directional width of a point set.
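To make the directional height concrete, the sketch below computes the distance of the farthest point from a subspace given by spanning vectors, and the ratio between the heights of a point set and of a candidate subset for that one subspace. All function names are illustrative. Note that a core-set in the sense of Definition 3.2 must satisfy the bound for every subspace, whereas this snippet only evaluates a single one.

```python
import math

def residual(p, basis):
    """Component of p orthogonal to the span of an orthonormal basis."""
    r = list(p)
    for b in basis:
        coef = sum(x * y for x, y in zip(r, b))
        r = [x - coef * y for x, y in zip(r, b)]
    return r

def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        r = residual(v, basis)
        norm = math.sqrt(sum(x * x for x in r))
        if norm > 1e-12:
            basis.append([x / norm for x in r])
    return basis

def dist(p, basis):
    """Euclidean distance from p to the subspace spanned by the basis."""
    return math.sqrt(sum(x * x for x in residual(p, basis)))

def directional_height(points, basis):
    """Distance of the farthest point in `points` from the subspace."""
    return max(dist(p, basis) for p in points)

def height_ratio(points, subset, basis):
    """Height of the full set divided by the height of the subset, for one subspace."""
    return directional_height(points, basis) / directional_height(subset, basis)

P = [(1.0, 0.0, 3.0), (0.0, 2.0, -1.0), (4.0, 4.0, 0.5)]
H = orthonormal_basis([(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)])  # the xy-plane, k = 3
print(directional_height(P, H), height_ratio(P, P[1:], H))
```

Here the subset `P[1:]` drops the point farthest from the plane, so for this particular subspace it can only be an approximate core-set with a factor at least as large as the printed ratio.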

We connect it to composable core-sets for determinant maximization by the following lemma.

###### Lemma 3.3.

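Let $P_1, \ldots, P_m \subset \mathbb{R}^d$ be an arbitrary collection of point sets, and for any $i \le m$, let $c(P_i)$ be an $\alpha$-approximate core-set for the $k$-directional height of $P_i$. Then

$$\mathrm{MAXDET}_k\left(\bigcup_{i=1}^{m} P_i\right) \le \alpha^{2k} \cdot \mathrm{MAXDET}_k\left(\bigcup_{i=1}^{m} c(P_i)\right).$$
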
###### Proof.

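Let $W \subset \bigcup_{i=1}^{m} P_i$ be any subset of size $k$, and also let $w_1 \in W \setminus \bigcup_{i=1}^{m} c(P_i)$. We claim that there is a point $q$ in the union of the core-sets such that $\mathrm{VOL}(W - w_1 + q) \ge \mathrm{VOL}(W)/\alpha$. Showing this claim is enough to prove the lemma, since one can start from the optimum solution, which achieves the largest volume on $\bigcup_{i=1}^{m} P_i$, and for at most $k$ iterations replace a point outside the union of the core-sets by a point inside it, while decreasing the volume by a factor of at most $\alpha$ in each iteration. The volume therefore drops by a factor of at most $\alpha^k$ in total, and the determinant, which is the squared volume, by a factor of at most $\alpha^{2k}$.

So it remains to prove the claim. Let $W = \{w_1, \ldots, w_k\}$, and let $H$ be the plane spanned by $W - w_1$. By definition, $\mathrm{VOL}(W) = \mathrm{dist}(w_1, H) \cdot \mathrm{VOL}(W - w_1)$. On the other hand, suppose that $w_1 \in P_i$. Then by our assumption, there exists $q \in c(P_i)$ so that $\mathrm{dist}(q, H) \ge \mathrm{dist}(w_1, H)/\alpha$. Replacing $w_1$ with $q$, we get

$$\mathrm{VOL}(W - w_1 + q) = \mathrm{dist}(q, H) \cdot \mathrm{VOL}(W - w_1) \ge \frac{\mathrm{dist}(w_1, H) \cdot \mathrm{VOL}(W - w_1)}{\alpha} = \frac{\mathrm{VOL}(W)}{\alpha},$$

which completes the proof. ∎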

###### Corollary 3.4.

Any mapping which is an $\alpha$-approximate core-set for $k$-directional height is an $\alpha^{2k}$-composable core-set for determinant maximization.

We employ the above corollary to analyze both greedy and local search algorithms in Sections 4 and 5.

## 4 The Local Search Algorithm

In this section, we describe and analyze the local search algorithm and prove Theorem 1.2. The algorithm is described in Algorithm 1.
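Since the listing of Algorithm 1 is not reproduced in this text, the following is a hedged sketch of the local search described in Section 1.1: start from the greedy solution and keep swapping a chosen point for an outside point while the swap increases the volume. The multiplicative threshold `(1 + eps)` for accepting a swap is an assumption based on the accuracy parameter mentioned in the analysis, and all function names are illustrative.

```python
import math

def volume(points):
    """Volume of the parallelepiped spanned by the points (Gram-Schmidt)."""
    basis, vol = [], 1.0
    for p in points:
        r = list(p)
        for b in basis:
            coef = sum(x * y for x, y in zip(r, b))
            r = [x - coef * y for x, y in zip(r, b)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm < 1e-12:
            return 0.0
        vol *= norm
        basis.append([x / norm for x in r])
    return vol

def greedy(points, k):
    """Greedy initialization: repeatedly add the point maximizing the volume."""
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: volume(
            [points[j] for j in chosen] + [points[i]])))
    return chosen

def local_search(points, k, eps=0.1):
    """Swap one chosen point for an outside point while the volume grows
    by more than a (1 + eps) factor (assumed acceptance rule)."""
    chosen = greedy(points, k)
    current = volume([points[i] for i in chosen])
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for i in range(len(points)):
                if i in chosen:
                    continue
                candidate = chosen[:pos] + [i] + chosen[pos + 1:]
                vol = volume([points[j] for j in candidate])
                if vol > (1 + eps) * current:
                    chosen, current, improved = candidate, vol, True
    return chosen

# Greedy first takes the longest vector, which is not part of the best pair here.
pts = [(1.01, 0.0), (0.7, 0.7), (0.7, -0.7)]
print(greedy(pts, 2), local_search(pts, 2))
```

Each accepted swap increases the volume by more than a fixed factor, so the loop terminates; the sketch recomputes every volume from scratch and requires `k` to be at most the number of points.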

To prove Theorem 1.2, we follow a two-step strategy. We first analyze the algorithm for individual point sets, and show that the output is a $2k(1+\epsilon)$-approximate core-set for the $k$-directional height problem, as follows.

###### Lemma 4.1.

Let $P \subset \mathbb{R}^d$ be a set of points and $c(P)$ be the result of running the local search algorithm on $P$. Then, for any $(k-1)$-dimensional subspace $H$,

$$h(c(P), H) \ge \frac{h(P, H)}{2k(1+\epsilon)}.$$

Next, we apply Corollary 3.4, which implies that local search gives $(2k(1+\epsilon))^{2k}$-composable core-sets for determinant maximization. Clearly, this completes the proof of the theorem by setting $\epsilon$ to a constant.

So proving Theorem 1.2 boils down to showing Lemma 4.1. Before getting into that, we analyze the running time and present some remarks about the implementation.

#### Running time.

Let be the output of the greedy. By Theorem 2.1 . The algorithm starts with and by definition, in any iteration increases the volume by a factor of at least , hence the total number of iterations is . Finally, each iteration can be naively executed by iterating over all points in , forming the corresponding matrix, and computing the determinant in total time .

We also remark that unlike the algorithm in [indyk2018composable], our method can also be executed without any changes and additional complexity, when the point set is not explicitly given in the input; instead, it is presented by an oracle that given two points of returns their inner product. One can note that in this case the algorithm can be simulated by querying this oracle for at most times.
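To illustrate the oracle setting, the sketch below runs the greedy selection using only inner products between points. It is not taken from the paper: it uses an incremental (pivoted-Cholesky-style) update that tracks, for every candidate, its squared distance to the span of the chosen points. The oracle signature `inner(i, j)` and the function name are assumptions.

```python
import math

def greedy_with_oracle(n, inner, k):
    """Greedy selection over points 0..n-1 using only inner(i, j) = <p_i, p_j>."""
    d2 = [inner(i, i) for i in range(n)]  # squared distance to the current span
    coords = [[] for _ in range(n)]       # coordinates in the implicit orthonormal basis
    chosen = []
    for _ in range(min(k, n)):
        j = max((i for i in range(n) if i not in chosen), key=lambda i: d2[i])
        if d2[j] <= 1e-12:                # all remaining points are in the span
            break
        dj = math.sqrt(d2[j])
        for i in range(n):
            if i == j or i in chosen:
                continue
            # Coordinate of p_i along the new orthonormal direction defined by p_j.
            e = (inner(j, i) - sum(a * b for a, b in zip(coords[j], coords[i]))) / dj
            coords[i].append(e)
            d2[i] -= e * e
        chosen.append(j)
    return chosen

points = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.5)]

def dot_oracle(i, j):
    """Stand-in oracle; in the DPP setting this would be a kernel evaluation."""
    return sum(a * b for a, b in zip(points[i], points[j]))

print(greedy_with_oracle(len(points), dot_oracle, 2))
```

In this sketch the oracle is queried once per point for the diagonal and once per remaining point in each iteration; the explicit coordinates of the points are never used by the selection routine.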

### 4.1 Proof of Lemma 4.1

With no loss of generality, suppose that , the proof automatically extends to . Therefore, has the following property: for any and , . Fix , and let . Our goal is to show there exists so that .

Let be the -dimensional linear subspace spanned by the set of points in the core-set, and let be the projection of onto this subspace. We proceed with proof by contradiction. Set , and suppose to the contrary that for any , . With this assumption, we prove the two following lemmas.

###### Lemma 4.2.

.

###### Lemma 4.3.

.

One can note that, combining the above two lemmas and applying the triangle inequality implies , which contradicts the assumption and completes the proof.

Therefore, it only remains to prove the above lemmas. Let us first fix some notation. Let and for any , let denote the -dimensional subspace spanned by points in .

Proof of Lemma 4.2. For , let be the projection of onto . We prove that there exists an index such that we can write where every . Let be the rank, i.e., the maximum number of independent points of , and clearly, as has dimension , we have . Take a subset of independent points that have the maximum volume and let be a point in ; note that this point should exist as there are points in the core-set. Thus we can write . With an idea similar to the one presented in [cm-smvsm-09], we can prove that the following claim holds.

###### Claim 4.4.

For any such that , we have .

###### Proof.

We prove that if the claim is not true, then has a larger volume than which contradicts the choice of . Let be the linear subspace passing through . It is easy to see that , meaning that . However, if then since is the only point in which is not in , then which is a contradiction. ∎

Finally, for any , set the corresponding coefficient . So we get that where every .

Now take the point . Note that, is in fact the projection of onto . Therefore, using triangle inequality, we have

$$\mathrm{dist}(q'_j, q) = \mathrm{dist}(q, H) \le \sum_{i \ne j} |\alpha_i| \, \mathrm{dist}(q_i, H) \le \frac{(k-1)x}{k} \tag{1}$$

Then we get that

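$$\begin{aligned}
\mathrm{dist}(p, p_G) = \mathrm{dist}(p, G) &\le \mathrm{dist}(p, G_{\bar{j}}) && \text{as } G_{\bar{j}} \subset G \\
&\le \mathrm{dist}(q_j, G_{\bar{j}}) && \text{by the local search property} \\
&\le \mathrm{dist}(q_j, q) && \text{as } q \in G_{\bar{j}} \\
&\le \mathrm{dist}(q_j, q'_j) + \mathrm{dist}(q'_j, q) && \text{by triangle inequality}
\end{aligned}$$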

Proof of Lemma 4.3. Again we prove that we can write where all . We assume that the set of points are linearly independent, otherwise the points in have rank less than and thus the volume is . Therefore, we can write . Note that for any , we have

$$\mathrm{dist}(p_G, G_{\bar{i}}) \le \mathrm{dist}(p, G_{\bar{i}}) \le \mathrm{dist}(q_i, G_{\bar{i}}) \quad \text{by the local search property}$$

where the first inequality follows since is a subspace of and is the projection of onto . Again, similar to the proof of Claim 4.4, this means that . Therefore, using triangle inequality

$$\mathrm{dist}(p_G, H) = \mathrm{dist}\left(\sum_{i=1}^{k} \alpha_i q_i, H\right) \le \sum_{i=1}^{k} |\alpha_i| \, \mathrm{dist}(q_i, H)$$

## 5 The Greedy Algorithm

In this section, we analyze the performance of the greedy algorithm (see Section 2) as a composable core-set function for determinant maximization and prove Theorem 1.1. Our proof plan is similar to the analysis of the local search. We analyze the guarantee of greedy as a core-set mapping for $k$-directional height, and combining that with Corollary 3.4 we achieve the result. We prove the following.

###### Lemma 5.1.

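Let $P \subset \mathbb{R}^d$ be an arbitrary point set and let $c(P)$ denote the output of running greedy on $P$. Then $c(P)$ is a $(2k \cdot 3^k)$-approximate core-set for the $k$-directional height of $P$, i.e., for any $(k-1)$-dimensional subspace $H$ we have

$$h(c(P), H) \ge \frac{1}{2k \cdot 3^k} \cdot h(P, H).$$

So greedy is a $(2k \cdot 3^k)$-approximate core-set for the $k$-directional height problem. Combining this with Corollary 3.4, we conclude that it is also a $(2k \cdot 3^k)^{2k}$-composable core-set for determinant maximization, which proves Theorem 1.1.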
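As a worked check of how the two directional-height guarantees translate into the factors stated in Theorems 1.1 and 1.2, the following lines apply the $\alpha \mapsto \alpha^{2k}$ conversion of Corollary 3.4. The explicit constant $9e^2$ is only one valid choice obtained from the crude bound $\ln(2k) \le k$; it is an illustration and not necessarily the constant intended by the authors.

```latex
% Local Search: \alpha = 2k(1+\epsilon), with \epsilon a constant
\bigl(2k(1+\epsilon)\bigr)^{2k} = \bigl(2(1+\epsilon)\bigr)^{2k} \, k^{2k} = O(k)^{2k}

% Greedy: \alpha = 2k \cdot 3^k
\bigl(2k \cdot 3^k\bigr)^{2k} = (2k)^{2k} \cdot 3^{2k^2} = (2k)^{2k} \cdot 9^{k^2}

% Using \ln(2k) \le k for all k \ge 1:
(2k)^{2k} = e^{2k \ln(2k)} \le e^{2k^2}
\quad\Longrightarrow\quad
\bigl(2k \cdot 3^k\bigr)^{2k} \le \bigl(9e^2\bigr)^{k^2} = O\bigl(C^{k^2}\bigr)
```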

### 5.1 Proof of Lemma 5.1

The proof is similar to the proof of Lemma 4.1. Let be the -dimensional subspace spanned by the output of greedy. Also for a point , define to be its projection onto . Fix , let for some number , which in particular implies that for any , . Then, our goal is to prove . We show that by proving the following two lemmas.

###### Lemma 5.2.

For any , .

###### Lemma 5.3.

For any , .

Clearly, combining them with triangle inequality, we get that for any , , which implies and completes the proof. So it remains to prove the lemmas. Let the output of the greedy be with this order, i.e. is the first vector selected by the algorithm.

Proof of Lemma 5.2. Recall that is the output of greedy. For any and for any , let and define to be the projection of onto . We show the lemma using the following claim.

###### Claim 5.4.

For any and any , we can write so that for each , .

Let us first show how the above claim implies the lemma. It follows that we can write where all . Now since for each , by assumption, we have that . Therefore, it suffices to prove the claim.

Proof of Claim 5.4. We use induction on . To prove the base case of induction, i.e., , note that is the vector with largest norm in . Thus we have that and therefore we can write where . Now, let us assume that the hypothesis holds for the first points; that is, the projection of any point onto can be written as where ’s are at most .

Now, note that by the definition of the greedy algorithm, is the point with farthest distance from . Therefore, for any point , we know that , and thus, . Therefore if we define to be the projection of onto , we can write

$$p^{t+1} = \alpha_{t+1} q_{t+1} - \alpha_{t+1} q^{t}_{t+1} + p^{t} \quad \text{where } |\alpha_{t+1}| \le 1.$$

By the hypothesis, we can write , and , where , and . Since , we can write

$$p^{t+1} = \alpha_{t+1} q_{t+1} + \sum_{j \le t} (\beta_j - \alpha_{t+1} \gamma_j) q_j = \sum_{j \le t+1} \alpha_j q_j$$

where . This completes the proof. ∎

Proof of Lemma 5.3. First, note that for any , we have This is because the greedy algorithm has chosen over in its -th round which means that , and by definition of the greedy algorithm for any we have . So it is enough to prove

$$\exists \, 1 \le t \le k-1 \ \text{ s.t. } \ \mathrm{dist}(q_{t+1}, G_t) \le 3^k x \tag{2}$$

For , let be the projection of onto . Recall that, we are assuming that for any , . To prove (2), we use proof by contradiction, so suppose that for all , . We also define to be the projection of on , i.e., . Given these assumptions, we prove the following claim.

###### Claim 5.5.

For any , we can write where , where for a point and a subspace , denotes projection of onto .

###### Proof.

Intuitively, this is similar to Claim 5.4. However, instead of looking at the execution of the algorithm on the points , we look at the execution of the algorithm on the projected points . Since all of these points are relatively close to the hyperplane , the distances are not distorted by much and therefore, we can get approximately the same bounds. Formally, we prove the claim by induction on , and show that for any s.t. , the point can be written as the sum such that .

Base Case. First, we prove the base case of induction, i.e., . Recall that by our assumption, , and thus by triangle inequality, we have that . Therefore, since is the vector with largest norm in , using triangle inequality again, we have that for any ,

$$\|q'_j\| \le \|q_j\| \le \|q_1\| \le \|q'_1\| + x/k \le \left(1 + \frac{1}{2k}\right) \|q'_1\|$$

Therefore we can write where .

Inductive step. Now, let us assume that the hypothesis holds for . In particular this means that we can write where , and that for a given , we can write where ’s are at most . Now let . By the triangle inequality, we get that

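$$\begin{aligned}
\mathrm{dist}(q_{t+1}, G_t) &\le \mathrm{dist}(q_{t+1}, q'_{t+1}) && \text{(3)} \\
&\quad + \mathrm{dist}\bigl(q'_{t+1}, \Pi(G'_t)(q'_{t+1})\bigr) && \text{(4)} \\
&\quad + \mathrm{dist}\bigl(\Pi(G'_t)(q'_{t+1}), G_t\bigr) \\
&\le x/k + \ell + \mathrm{dist}\Bigl(\sum_{i \le t} \beta_i q'_i, \sum_{i \le t} \beta_i q_i\Bigr) \\
&\le x/k + \ell + \sum_{i \le t} |\beta_i| \, x/k \\
&\le \ell + 3^t x. && \text{(5)}
\end{aligned}$$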

Now we consider two cases. If then using the above

$$\mathrm{dist}(q_{t+1}, G_t) \le 2 \cdot 3^t x \le 3^k x,$$

which contradicts our assumption of . Otherwise,

$$\mathrm{dist}\bigl(\Pi(G'_{t+1})(q'_j), G'_t\bigr) \le \mathrm{dist}(q'_j, G'_t) \le \mathrm{dist}(q_j, G_t) \le \mathrm{dist}(q_{t+1}, G_t) \le 2\ell,$$

where the last inequality follows from Equation (3). Therefore, we can write where .

By the hypothesis, we can write , where . Since , we can write

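$$\Pi(G'_{t+1})(q'_j) = \alpha_{t+1} q'_{t+1} + \sum_{i \le t} (\gamma_i - \alpha_{t+1} \beta_i) q'_i = \sum_{i \le t+1} \alpha_i q'_i \quad \text{where } |\alpha_i| \le 3^{t+1}.$$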

This completes the proof of the claim. ∎

To finish the proof of the lemma, let us show how it follows from the claim. First, note that are points in the -dimensional space , so for some , should lie inside and we have . Fix such . Define the point where are taken from the above claim which means Note that by definition . Therefore,

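$$\begin{aligned}
\mathrm{dist}(q'_{t+1}, q_\alpha) &= \mathrm{dist}(q_\alpha, H) && \text{(6)} \\
&\le \sum_{i \le t} \alpha_i \, \mathrm{dist}(q_i, H) \le 3^k t \cdot x/k. && \text{(7)}
\end{aligned}$$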

Then we get that

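$$\begin{aligned}
\mathrm{dist}(q_{t+1}, G_t) &\le \mathrm{dist}(q_{t+1}, q_\alpha) && \text{as } q_\alpha \in G_t \\
&\le \mathrm{dist}(q_{t+1}, q'_{t+1}) + \mathrm{dist}(q'_{t+1}, q_\alpha) \\
&\le x/k + 3^k t \cdot x/k \\
&\le 3^k x
\end{aligned}$$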

where the second inequality holds because of triangle inequality and the last one from (6) and the fact that . This contradicts our assumption that , and proves the lemma. ∎

## 6 Experiments

In this section, we evaluate the effectiveness of our proposed Local Search algorithm empirically on real data sets. We implement the following three algorithms.

• The Greedy algorithm of Section 5 (GD).

• The Local Search algorithm of Section 4 with accuracy parameter (LS).

• The LP-based algorithm of [indyk2018composable] which has almost tight approximation guarantee theoretically (LP). Note that this algorithm might pick up to points in the core-set.

#### Data sets.

We use two data sets that were also used in [li2015efficient] in the context of approximating DPPs over large data sets.

• MNIST [lecun1998gradient]: contains a set of images of hand-written digits, where each image is of size by .

• GENES [batmanghelich2014diversifying]: contains a set of genes, where each entry is a feature vector of a gene. The features correspond to shortest path distances of different hubs in the BioGRID gene interaction network. This data set was initially used to identify a diverse set of genes to predict a tumor. Here, we slightly modify it and remove genes that have an unknown value at any coordinate which gives us a data set of size .

Moreover, we apply an RBF kernel on both of these data sets using for MNIST and for GENES. These are the same values used in the work of [li2015efficient].
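For readers who want to reproduce the kernel step, a minimal sketch of an RBF (Gaussian) kernel matrix is given below. The parameterization $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$ is one common convention and is an assumption here; the text above does not state which convention or bandwidth scaling was used, and the function names are illustrative.

```python
import math

def rbf_kernel(x, y, sigma):
    """RBF kernel under the assumed convention exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(points, sigma):
    """Full kernel matrix; entry (i, j) plays the role of an inner product."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]

sample = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
K = kernel_matrix(sample, sigma=5.0)
print(K[0][1])
```

With a kernel in place, each entry of `K` acts as the inner product of two implicit feature vectors, so determinants of principal submatrices of `K` are the objective values.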

### 6.1 Experiment Setup.

We partition the data sets uniformly at random into multiple data sets . We use for the smaller GENES data set, and for the larger MNIST data set we use and also we use (equal to the number of digits in the data set). Moreover, since the partitions are random, we repeat every experiment times and take the average in our reported results.

We then use a core-set construction algorithm to compute core-sets of size , i.e., , for . Recall that GD, LS and LP correspond to the Greedy, Local Search and LP-based algorithm of [indyk2018composable] respectively.

Finally, we take the union of these core-sets and compute the solutions for . Since computing the optimal solution can take exponential time (), we will instead use an aggregation algorithm (either GD, LS or LP). We will use the notation to refer to the constructed set of points, returned by . For example, GD/LS refers to the set of points returned by the Greedy algorithm on the union of the core-sets, where each core-set is produced using the Local Search algorithm.
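The core-set and aggregation pipeline described in this setup can be sketched as follows. This is an illustrative outline, not the experimental code: the function `compose`, the round-robin split of a shuffled list, and the toy data are assumptions, and greedy is used for both stages (the GD/GD configuration) only to keep the example short.

```python
import math
import random

def greedy(points, k):
    """Return up to k points, each step taking the point farthest from the current span."""
    chosen, basis = [], []
    remaining = list(points)
    for _ in range(min(k, len(remaining))):
        best_i, best_norm, best_res = None, 1e-12, None
        for i, p in enumerate(remaining):
            r = list(p)
            for b in basis:
                coef = sum(x * y for x, y in zip(r, b))
                r = [x - coef * y for x, y in zip(r, b)]
            norm = math.sqrt(sum(x * x for x in r))
            if norm > best_norm:
                best_i, best_norm, best_res = i, norm, r
        if best_i is None:
            break
        chosen.append(remaining.pop(best_i))
        basis.append([x / best_norm for x in best_res])
    return chosen

def compose(points, m, k, coreset=greedy, aggregate=greedy, seed=0):
    """Partition at random into m parts, build a core-set per part,
    then run the aggregation algorithm on the union of the core-sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::m] for i in range(m)]
    union = [p for part in parts for p in coreset(part, k)]
    return aggregate(union, k)

# Toy data standing in for a real data set.
data = [(float(i % 7), float(i % 5), float(i % 3), 1.0) for i in range(60)]
solution = compose(data, m=4, k=3)
print(solution)
```

Passing a different function as `coreset` or `aggregate` gives the other combinations compared in the experiments, as long as it takes a list of points and `k` and returns a list of points.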

Finally, we vary the value of from to .

### 6.2 Results

#### Local Search vs. Greedy as offline algorithms.

Our first set of experiments simply compares the quality of Greedy and Local Search as centralized algorithms on whole data sets. We perform this experiment to measure the improvement of Local Search over Greedy in the offline setting. Intuitively, this improvement upper bounds the improvement one can expect in the core-set setting. Figure 2 shows the improvement ratio of the determinant of the solution returned by the Local Search algorithm over the determinant of the solution returned by the Greedy algorithm. On average over all values of , Local Search improves over Greedy by for GENES data set and for MNIST data set. Figure 2 shows the ratio of the time it takes to run the Local Search and Greedy algorithms as a function of for both data sets. On average, it takes about times more to run the Local Search algorithm.

#### Local Search vs. Greedy as core-sets.

In our second experiment, we use Greedy algorithm for aggregation, i.e., , and compare GD/LS with GD/GD. Figure 4 shows the improvement of local search over greedy as a core-set construction algorithm. The graph is drawn as a function of , and for each , the improvement ratio is an average over all 10 runs, and shown for all data sets (including GENES, MNIST with partition number , and MNIST with ).

On average this improvement is , and for GENES, MNIST10 and MNIST50 respectively. Moreover, in of all 180 runs of this experiment, Local Search performed better than Greedy, and for some instances, this improvement was up to . Finally, this improvement comes at a cost of increased running time. Figure 4 shows average ratio of the time to construct core-sets using Local Search vs. Greedy.

#### Local Search vs. Greedy - identical algorithms.

We also consider the setting where the core-set construction algorithm is the same as the aggregation algorithm. This mimics the approach of [mirzasoleiman2013distributed], who proposed to use the greedy algorithm on each machine to achieve a small solution; then each machine sends this solution to a single machine that further runs the greedy algorithm on the union of these solutions and reports the result.

In this paper, we show that if instead of Greedy, we use Local Search in both steps, the solution will improve significantly. Using our notation, here we are comparing LS/LS vs. GD/GD. Figure 5 shows the improvement as a function of , averaged over all 10 runs.

On average the improvement is , and for GENES, MNIST10 and MNIST50 respectively. Moreover, in only 1 out of 180 runs did Greedy perform better than Local Search. The improvement could go as high as .

#### Comparing Local Search vs. the LP-based algorithm.

In this section, we compare the performance of the Local Search algorithm and the LP-based algorithm of [indyk2018composable] for constructing core-sets, i.e., we compare GD/LS with GD/LP. Figure 7 shows how much Local Search improves over the LP-based algorithm. On average this improvement is , and for GENES, MNIST10 and MNIST50 respectively. Moreover, in of all runs, Local Search performed better than the LP-based algorithm, and this improvement can go up to . Figure 7 shows the average ratio of the time to construct core-sets using the LP-based algorithm vs. Local Search. As is clear from the graphs, our proposed Local Search algorithm performs better than even the LP-based algorithm, which has almost tight approximation guarantees: while picking fewer points in the core-set, in most cases it finds a better solution and runs faster.

## 7 Conclusion

In this work, we proposed to use the Local Search algorithm to construct composable core-sets for the determinant maximization problem. From a theoretical perspective, we showed that it achieves a near-optimal approximation guarantee. We further analyzed its performance on large real data sets, and showed that most of the time, Local Search performs better than both the almost optimal approximation algorithm and the widely used Greedy algorithm. Generally, for larger values of , the percentage of this improvement has an increasing pattern; however, the amount of this improvement depends on the data set. We also note that here we used a naive implementation of the Local Search algorithm: one could tune the value of to further improve the quality of the solution. Finally, we provided a doubly exponential guarantee for the Greedy algorithm; however, our experiments suggest that this bound might be open to improvement.

## Acknowledgments

The authors would like to thank Stefanie Jegelka, Chengtao Li, and Suvrit Sra for providing their data sets and source code of experiments from [li2015efficient].

Piotr Indyk was supported by NSF TRIPODS award No. 1740751 and Simons Investigator Award. Shayan Oveis Gharan and Alireza Rezaei were supported by the NSF grant CCF-1552097 and ONR-YIP grant N00014-17-1-2429.