# Minimizing Sum of Non-Convex but Piecewise log-Lipschitz Functions using Coresets

We suggest a new optimization technique for minimizing the sum ∑_i=1^n f_i(x) of n non-convex real functions that satisfy a property that we call piecewise log-Lipschitz. This is by forging links between techniques in computational geometry, combinatorics and convex optimization. Example applications include the first constant-factor approximation algorithms whose running-time is polynomial in n for the following fundamental problems: (i) Constrained ℓ_z Linear Regression: Given z>0, n vectors p_1,...,p_n on the plane, and a vector b∈R^n, compute a unit vector x and a permutation π:[n]→[n] that minimize ∑_i=1^n |p_ix-b_π(i)|^z. (ii) Points-to-Lines alignment: Given n points p_1,...,p_n and n lines ℓ_1,...,ℓ_n on the plane, compute the matching π:[n]→[n] and alignment (rotation matrix R and translation vector t) that minimize the sum of Euclidean distances ∑_i=1^n dist(Rp_i-t,ℓ_π(i))^z between each point and its corresponding line. These problems are open even if z=1 and the matching π is given. In this case, the running time of our algorithms reduces to O(n) using core-sets that support: streaming, dynamic, and distributed parallel computations (e.g. on the cloud) in poly-logarithmic update time. Generalizations for handling e.g. outliers or pseudo-distances such as M-estimators for these problems are also provided. Experimental results show that our provable algorithms outperform existing heuristics in practice. A demonstration in the context of Augmented Reality shows how such algorithms may be used in real-time systems.


## 1 Introduction

We define below the general problem of minimizing a sum of piecewise log-Lipschitz functions, and then suggest two example applications.

##### 1.0.0.1 Minimizing sum of piecewise log-Lipschitz functions.

We consider the problem of minimizing the sum over x∈X of a set of n real non-negative functions g_1,…,g_n that may not be convex, but satisfy the piecewise log-Lipschitz condition; see Definition 2. This condition means that we can partition the range of each function g_i into a small number of subsets (sub-ranges), such that g_i satisfies the log-Lipschitz condition on each of these sub-ranges; see Definition 1. That is, g_i has a single minimum in each sub-range, and increases in a bounded ratio around this local minimum. Note that g_i might not be convex even in this sub-range. Formally, if the distance from the minimum in a sub-range is multiplied by c ≥ 1, then the value of the function increases by a factor of at most c^r for some small (usually constant) r > 0.
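To make the condition concrete, here is a minimal self-contained sketch (our own toy example, not from the paper): g(x) = min(|x − 1|, |x + 1|) is non-convex but piecewise log-Lipschitz with the two sub-ranges (−∞, 0] and (0, ∞), and on each sub-range it equals the distance to the local minimum composed with the identity, which is 1-log-Lipschitz.

```python
import math

# Hypothetical illustration: g(x) = min(|x - 1|, |x + 1|) is piecewise
# 1-log-Lipschitz with pieces X_1 = (-inf, 0] and X_2 = (0, inf).
# On X_2, g(x) = |x - 1| = dist(x, 1), and the per-piece growth function
# h(t) = t satisfies the r-log-Lipschitz bound h(c*t) <= c**r * h(t), r = 1.

def g(x):
    return min(abs(x - 1.0), abs(x + 1.0))

def h(t):          # the per-piece growth function; here simply the identity
    return t

r = 1.0
# numerically check h(c*t) <= c**r * h(t) for scales c >= 1 and distances t
ok = all(h(c * t) <= c ** r * h(t) + 1e-12
         for c in (1.0, 1.5, 2.0, 10.0)
         for t in (0.0, 0.3, 1.0, 7.0))
print(ok)  # True
```

The two local minima ±1 are exactly the candidate set that the framework later collects into a "centroid set".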

More generally, we wish to minimize the cost function cost(x) = f(g_1(x), …, g_n(x)), where f is a log-Lipschitz function as explained in the previous paragraph, and g_1, …, g_n are the given piecewise log-Lipschitz functions.

As an application, we reduce the following problems to minimizing such a set of functions.

##### 1.0.0.2 Aligning points-to-lines

, known as the Perspective-n-Point (PnP) problem, is a fundamental problem in computer vision which aims to compute the position of an object (formally, a rigid body) based on its position as detected by a 2D camera [16, 18]. Here, we assume that the structure of the object (its 3D model) is known. This problem is equivalent to the problem of estimating the position of a moving camera based on a captured 2D image of a known object, which is strongly related to the very common procedure of “camera calibration” that is used to estimate the external parameters of a camera using a chessboard.

Formally, the input to the problem is an ordered set of n lines that intersect at the origin and an ordered set of n points, both in ℝ^3. Each line represents a point in the 2D image. The output is an alignment (R, t) that minimizes the sum of Euclidean distances over each point p_i (column vector) and its corresponding line ℓ_i, i.e.,

 min_{(R,t)} ∑_{i=1}^n dist(Rp_i − t, ℓ_i),   (1)

where the minimum is over every rotation matrix R and translation vector t. Here, dist denotes the Euclidean distance, but in practice we may wish to use non-Euclidean distances, such as the distance from a point to the intersection of its corresponding line with the camera’s image plane.
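The cost in (1) is straightforward to evaluate for a candidate alignment. A minimal sketch in the planar case (the representation of a line by a unit direction v and a point q on it is our own choice, not the paper's):

```python
import math

def rot(theta):
    # 2x2 rotation matrix R(theta)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(R, p, t):
    # compute R*p - t for a 2D point p
    return [R[0][0]*p[0] + R[0][1]*p[1] - t[0],
            R[1][0]*p[0] + R[1][1]*p[1] - t[1]]

def dist_point_line(p, v, q):
    # Euclidean distance from p to the line {q + s*v : s in R}, v a unit vector
    w = [p[0] - q[0], p[1] - q[1]]
    s = w[0]*v[0] + w[1]*v[1]           # length of the projection onto v
    perp = [w[0] - s*v[0], w[1] - s*v[1]]
    return math.hypot(perp[0], perp[1])

def cost(theta, t, points, lines):
    # the sum in Eq. (1) for the alignment (R(theta), t)
    R = rot(theta)
    return sum(dist_point_line(apply(R, p, t), v, q)
               for p, (v, q) in zip(points, lines))

points = [(1.0, 0.0), (0.0, 1.0)]
lines = [((1.0, 0.0), (0.0, 0.0)),      # the x-axis
         ((0.0, 1.0), (0.0, 0.0))]      # the y-axis
print(cost(0.0, (0.0, 0.0), points, lines))  # 0.0: each point lies on its line
```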

While dozens of heuristics have been suggested over recent decades, this problem is open even when the points and lines are on the plane, e.g., when we wish to align a set of GPS points to a map of lines (say, highways).

We tackle a variant of this problem in which both the points and the lines lie on the plane, and the lines do not necessarily intersect at the origin.

##### 1.0.0.3 Constrained regression

is a fundamental problem in machine learning, which aims to compute a vector x such that the inner product p^T x will predict the label b of a point (data record) p. Without loss of generality we can assume that all the entries of b are non-negative. This motivates the problem of minimizing the error ∥Ax − b∥ on a “training dataset” of n records which are the rows of an n × d real matrix A, i.e., with respect to the ℓ_z-norm where z > 0 (including the non-standard norm z < 1). To avoid overfitting, or to decrease sparsity (the number of non-zeroes in x), we may wish to keep the norm of x constant, say, ∥x∥ = 1.

This yields the constrained optimization problem

 min_{x∈ℝ^d: ∥x∥=1} ∥Ax − b∥.   (2)

A Lagrange multiplier can then be used to obtain the problem

 min_{λ∈ℝ, x∈ℝ^d} ∥Ax − b∥ + λ(∥x∥ − 1).   (3)

Again, to our knowledge both (2) and (3) are open problems already for points on the plane (d = 2), and even for linear regression (z = 1). A possible leeway is to calibrate λ manually, where a large λ implies better sparsity and less overfitting but a higher fitting error, and to replace the non-convex constraint ∥x∥ = 1 by the convex constraint ∥x∥ ≤ 1. This yields the common Lasso (least absolute shrinkage and selection operator) methods [30]

 min_{∥x∥≤1} ∥Ax − b∥ + λ∥x∥   (4)

for different norms (usually combinations of the ℓ_1 and ℓ_2 norms). Due to the known value of λ and the convex constraint on x, problem (4) can usually be solved using convex optimization.
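For intuition on why the relaxed problem is tractable, here is a hedged sketch of one standard solver for the ℓ_1-penalized variant: proximal gradient descent (ISTA) applied to the squared-loss Lasso ½∥Ax − b∥² + λ∥x∥_1. This is generic textbook code, not the paper's algorithm, and it solves the squared-loss variant rather than (4) itself; the step size and iteration count are ad hoc.

```python
# ISTA (iterative shrinkage-thresholding) for min_x 0.5*||Ax-b||^2 + lam*||x||_1.
# A hedged sketch: a standard convex-optimization routine, not the paper's method.

def soft_threshold(v, k):
    # proximal operator of k*||.||_1, applied entrywise
    return [max(abs(vi) - k, 0.0) * (1 if vi >= 0 else -1) for vi in v]

def ista(A, b, lam, steps=2000, eta=None):
    n, d = len(A), len(A[0])
    if eta is None:
        # crude step size from an upper bound on the gradient's Lipschitz constant
        eta = 1.0 / sum(A[i][j] ** 2 for i in range(n) for j in range(d))
    x = [0.0] * d
    for _ in range(steps):
        r = [sum(A[i][j] * x[j] for j in range(d)) - b[i] for i in range(n)]
        grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(d)]
        x = soft_threshold([x[j] - eta * grad[j] for j in range(d)], eta * lam)
    return x

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 0.0, 1.0]
x = ista(A, b, lam=0.1)
print(x)  # x[0] close to 1; x[1] shrunk to 0 by the l1 penalty
```

In contrast, the constraint ∥x∥ = 1 in (2) is a non-convex set, which is why no such off-the-shelf routine applies there.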

### 1.1 Generalizations

We consider further generalizations of the above problems, as follows.

##### 1.1.0.1 Unknown matching

is the case where the matching between the ith point and its line (in the PnP problem (1)), or between the ith row of A and its label in b (in (3)), is unknown for every i. E.g., in the PnP problem there are usually no labels in the observed images, and in regression b may be an (unsorted) vector of anonymous votes or classifications for all the n users.

##### 1.1.0.2 Non-distance functions

where, for the error vector v of a candidate solution, the cost that we wish to minimize over all these solutions is f(v), such as the maximal entry of v for a “worst case” error, which is more robust to noisy data, or the sum of squared distances for maximizing likelihood in the case of Gaussian noise. As in the latter two cases, the resulting function may not be a distance function.

##### 1.1.0.3 Robustness to outliers

may be obtained by defining f to be a function that ignores, or at least puts less weight on, very large distances, maybe based on some given threshold. Such a function is called an M-estimator and is usually non-convex.

##### 1.1.0.4 Coreset

in this paper refers to a small representation of a set of input points by a weighted (scaled) subset. The approximation is by a multiplicative factor, with respect to the cost of any item (query) in a given set Q. E.g., in (1) Q is the union over all alignments (R, t), and in (2) Q is the set of unit vectors in ℝ^d. Composable coresets have the property that they can be merged and re-reduced; see e.g. [1, 15].

Our main motivation for designing coresets is (i) to reduce the running time of our algorithms from polynomial to near-linear in n, and (ii) to handle big data computation models, as follows, which is straightforward using composable coresets.

##### 1.1.0.5 Handling big data

in our paper refers to the following computation models: (i) Streaming support for a single pass over a possibly unbounded stream of items, using memory and update time that are sub-linear (usually poly-logarithmic) in n. (ii) Parallel computations on distributed data that is streamed to M machines, where the running time is reduced by a factor of M and the communication between the machines and the server should also be sub-linear in the input size n. (iii) Dynamic data, which includes deletion of pairs. Here linear memory is necessary, but we still desire sub-linear update time.
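All three models follow from composability via the standard merge-and-reduce tree. The sketch below is schematic: the `reduce` step is a placeholder (uniform subsampling has no approximation guarantee), and a real implementation would plug in a problem-specific coreset construction such as those in this paper.

```python
import random

# Schematic merge-and-reduce for composable coresets in the streaming model.
# `reduce` is a PLACEHOLDER with no guarantee; it only shows the data flow.

def reduce(items, size):
    random.seed(0)                       # deterministic for the demo
    return random.sample(items, size) if len(items) > size else list(items)

def stream_coreset(stream, leaf=4, size=4):
    buckets = []          # buckets[i] holds a coreset covering 2^i leaves, or None
    buf = []
    for x in stream:
        buf.append(x)
        if len(buf) == leaf:
            c, i = reduce(buf, size), 0
            buf = []
            # binary-counter cascade: merge equal-level coresets, then re-reduce
            while i < len(buckets) and buckets[i] is not None:
                c = reduce(buckets[i] + c, size)
                buckets[i] = None
                i += 1
            if i == len(buckets):
                buckets.append(None)
            buckets[i] = c
    leftovers = [x for b in buckets if b for x in b] + buf
    return reduce(leftovers, size)

print(len(stream_coreset(range(100))))   # memory stays O(size * log n)
```

Deletions (model (iii)) require keeping the whole tree rather than only the O(log n) active buckets, which is why linear memory becomes necessary there.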

### 1.2 Our contribution

##### 1.2.0.1 Generic framework

for defining a cost function cost(A, q) for any finite input subset A from a set X called the ground set, and an item (called a query) q from a (usually infinite) set Q. We show that this framework enables handling the generalizations in Section 1.1, such as outliers, M-estimators and non-distance functions, in a straightforward way. Formally, we define

 cost(A, q) := f(lip(D(a_1, q)), ⋯, lip(D(a_n, q))),   (5)

where lip and f are r-log-Lipschitz and s-log-Lipschitz functions, respectively, and A = {a_1, …, a_n}; see Definition 4.

##### 1.2.0.2 Optimization of piecewise log-Lipschitz functions.

Given x ∈ X and n piecewise r-log-Lipschitz functions g_1, …, g_n, we prove that one of their minima x′ approximates their values g_i(x) simultaneously (for every i), up to a constant factor. See Theorem 3. This yields a finite set of candidate solutions (called a centroid set) that contains an approximated solution, without knowing x.

We use this result to compute a query q′ that approximates D(a_i, q*) simultaneously (for every i), where q* is the query that minimizes cost(A, q) over every q ∈ Q. Observation 5 proves that q′ is the desired approximation for the optimal solution in our framework.

##### 1.2.0.3 Simultaneous optimization and matching

may be required for the special case where A is a set of pairs, and we wish to compute the permutation π and query q that minimize cost(A_π, q) over every q ∈ Q and permutation π, where A_π is the corresponding permutation of the pairs. We provide constant factor approximations for the case of sum (f = Σ) in (5); see Theorems 8 and 13.

##### 1.2.0.4 Constrained Regression

as defined in Section 1 is our first example application of the above framework, where A = {(a_1, b_1), …, (a_n, b_n)}, Q is the unit circle, and D((a_i, b_i), x) = |a_i x − b_i| for every i and x ∈ Q. We provide the first constant factor approximation for the optimal solution that takes time polynomial in n. Such a solution can be computed for every cost function f and lip as defined in (5), e.g., where we wish to minimize the non-convex sum of errors over the unit circle; see Theorem 7. Simultaneous optimization and matching for this problem are suggested in Theorem 8.

##### 1.2.0.5 Approximated Points-to-Lines alignment

as defined in Section 1 is our second example, where A is a set of paired points and lines on the plane, Q is the union over every rotation matrix R and translation vector t, and D((p_i, ℓ_i), (R, t)) = dist(Rp_i − t, ℓ_i) for every i and (R, t) ∈ Q. We provide the first constant factor approximation for the optimal solution that takes polynomial time. Such a solution can be computed for every cost function as defined in (5); see Theorem 12. This includes simultaneous optimization and matching for this problem; see Theorem 13.

##### 1.2.0.6 Composable ε-coresets

for aligning points-to-lines are suggested in Theorems 14 and 15, based on a reduction to coresets for regression.

##### 1.2.0.7 Experimental Results

show that our algorithms perform better in practice compared to both existing heuristics and provable algorithms for related problems. A system for head tracking in the context of augmented reality shows that our algorithms can be applied in real-time using sampling and coresets. Existing solutions are either too slow or provide unstable images, as demonstrated in the companion video [20].

## 2 Related Work

For the easier case of summing convex functions, a framework was suggested in [11, 12]. However, for the case of summing non-convex functions as in this paper, each with more than one local minimum, these techniques do not hold. This is why we had to use more involved algorithms in Section 4. Moreover, our generic framework, such as handling outliers and matching, can be applied also to the works in [11, 12].

The motivation was to obtain weak but faster approximations that suffice for computing coresets. The polynomial time algorithms can then be applied on the coreset. A classic example is projective clustering, where we wish to approximate n points in ℝ^d by a set of k affine j-dimensional subspaces, such as k-means (j = 0) or PCA (k = 1). Constant factor approximations can be computed by considering every set of k subspaces, each spanned by a few input points.

Summing non-convex but polynomial or rational functions was suggested in [31], using tools from algebraic geometry such as semi-algebraic sets and their decompositions. For high degree polynomials, such techniques may be used to compute the minima in Theorem 3. In this sense, piecewise log-Lipschitz functions can be considered as generalizations of such functions, and our framework may be used to extend them to the generalizations in Section 1.1 (outliers, matching, etc.).

##### 2.0.0.1 Aligning points to lines.

The problem of aligning a set of points to a set of lines in the plane is natural, e.g., in the context of GPS points [29, 27], finding sky patterns as in Fig. 1, or aligning pixels in a 2D image to an object that is pre-defined by linear segments [25], as in augmented reality applications [22, 32, 17].

The only known solutions are for the case of sum of squared distances, with no outliers, and when the matching between the points and the lines is given. In this case, the Lagrange multipliers method can be applied in order to get a set of 2nd order polynomials. In three dimensions the problem is called PnP (Perspective-n-Points) and has provable solutions only for the case of exact alignment (zero fitting error) [26, 19], as well as numerous heuristics. When the matching is unknown, ICP (Iterative Closest Point) is the main common technique, based on greedy nearest neighbours; see references in [6]. To handle outliers, RANSAC [9] is heuristically used.

##### 2.0.0.2 Constrained regression

is usually used to avoid overfitting and noise in linear regression, as explained in Section 1 and e.g. in [30, 33, 34]; see references therein. The solution is usually based on a relaxation to convex optimization. However, when the tradeoff parameter λ is unknown, or when we want to ignore outliers or use M-estimators, the resulting problems are non-convex.

To our knowledge, no existing provable algorithms are known for handling outliers, unknown matching, or the ℓ_z norm for general z.

##### 2.0.0.3 Coresets

have many different definitions. In this paper we use the simplest one, which is based on a weighted subset of the input; it preserves properties such as the sparsity of the input and numerical stability. Coresets for regression were suggested in [8] using well-conditioned matrices, which we cite in Theorem 14. We improve the bounds on the coreset size using the framework from [11, 7]. We also reduce the points-to-lines alignment problem to constrained optimization in the proof of Theorem 31, which allows us to apply our algorithms on these coresets for the case of sum of point-line distances.

In most coreset papers, the main challenge is to compute the coreset. However, in this paper, the harder problem was to extract the desired constrained solution from the coresets, which approximate every vector and ignore the constraints.

## 3 Optimization Framework

In what follows, for every pair of vectors x and y in ℝ^d we denote x ≤ y if x_i ≤ y_i for every i ∈ [d]. Similarly, a function f: ℝ^d → ℝ is non-decreasing if f(x) ≤ f(y) for every x ≤ y.

The following definition is a generalization of Definition 2.1 in [14] from one dimension to d dimensions.

###### Definition 1 (Log-Lipschitz function).

Let r > 0, let d ≥ 1 be an integer, and let X be a subset of ℝ^d. Let h: X → [0, ∞) be a non-decreasing function. Then h is r-log-Lipschitz over X if for every c ≥ 1 and x ∈ X such that cx ∈ X, we have h(cx) ≤ c^r · h(x). The parameter r is referred to as the log-Lipschitz constant.

Unlike previous papers, the loss fitting (“distance”) function that we want to minimize in this paper is not a log-Lipschitz function. However, it can be partitioned into a constant number of log-Lipschitz functions in the following sense.

###### Definition 2 (Piecewise log-Lipschitz).

Let g: X → [0, ∞) be a continuous function over a set X, and let dist be a distance function. Let r > 0. The function g is piecewise r-log-Lipschitz if there is a partition of X into subsets X_1, …, X_m such that for every i ∈ [m]:

1. g has a unique infimum x_i in X_i, i.e., {x_i} = arg min_{x ∈ X_i} g(x).

2. h_i: [0, ∞) → [0, ∞) is an r-log-Lipschitz function; see Definition 1.

3. g(x) = h_i(dist(x, x_i)) for every x ∈ X_i.

The set of minima is denoted by M(g) = {x_1, …, x_m}.

Suppose that we have a set of n piecewise r-log-Lipschitz functions g_1, …, g_n, and consider the union M(g_1) ∪ ⋯ ∪ M(g_n) of their minima. The following theorem states that, for every x ∈ X, this union contains a value x′ such that g_i(x′) approximates g_i(x), for every i, up to a multiplicative factor that depends on r.

###### Theorem 3 (simultaneous approximation).

Let g_i: X → [0, ∞) be a piecewise r-log-Lipschitz function for every i ∈ [n], and let M(g_i) denote the minima of g_i as in Definition 2. Let x ∈ X. Then there is x′ ∈ M(g_1) ∪ ⋯ ∪ M(g_n) such that for every i ∈ [n],

 g_i(x′) ≤ 2^r · g_i(x).   (6)
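A quick numerical sanity check of the theorem in its simplest instance (our own toy setting): g_i(x) = |x − c_i| is piecewise 1-log-Lipschitz with M(g_i) = {c_i}, so with r = 1 the theorem promises a candidate x′ ∈ {c_1, …, c_n} with g_i(x′) ≤ 2·g_i(x) for every i simultaneously. Choosing the candidate nearest to x suffices, by the triangle inequality: |c_j − c_i| ≤ |c_j − x| + |x − c_i| ≤ 2|x − c_i| when c_j is nearest to x.

```python
import random

# Verify: for g_i(x) = |x - c_i|, the candidate in {c_1, ..., c_n} nearest to
# the query x approximates every g_i simultaneously within factor 2^r = 2.
random.seed(1)
for _ in range(100):
    cs = [random.uniform(-10, 10) for _ in range(5)]   # the minima c_1..c_5
    x = random.uniform(-20, 20)                        # an arbitrary query
    xp = min(cs, key=lambda c: abs(c - x))             # nearest candidate
    assert all(abs(xp - c) <= 2 * abs(x - c) + 1e-9 for c in cs)
print("ok")
```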
###### Definition 4 (Optimization framework).

Let X be a set called the ground set, let A = {a_1, …, a_n} ⊆ X be a finite input set, and let Q be a set of queries. Let D: X × Q → [0, ∞) be a function. Let lip: [0, ∞) → [0, ∞) be an r-log-Lipschitz function and let f: [0, ∞)^n → [0, ∞) be an s-log-Lipschitz function. We define

 cost(A, q) = f(lip(D(a_1, q)), ⋯, lip(D(a_n, q))).

The following observation states that if we find a query that approximates the function D for every input element, then it also approximates the function cost as defined in Definition 4.

###### Observation 5.

Let cost be as defined in Definition 4. Let q*, q′ ∈ Q and let c ≥ 1. If D(a, q′) ≤ c · D(a, q*) for every a ∈ A, then

 cost(A, q′) ≤ c^{rs} · cost(A, q*).
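Observation 5 can be checked numerically for a concrete choice (illustrative symbols only, not from the paper): take lip(t) = t², which is 2-log-Lipschitz (r = 2), and f = sum, which is 1-log-Lipschitz (s = 1). A query whose per-element distances are within factor c of optimal then has cost within factor c^{rs} = c².

```python
# Numerical instance of Observation 5 with lip(t) = t**2 (r = 2), f = sum (s = 1).
D_opt = [1.0, 2.0, 0.5, 3.0]          # D(a_i, q*) for an optimal query q*
c = 1.5
D_apx = [c * d for d in D_opt]        # a query q' with D(a_i, q') <= c * D(a_i, q*)

def cost(ds):                         # cost = f(lip(...)) = sum of squares
    return sum(d ** 2 for d in ds)

assert cost(D_apx) <= c ** 2 * cost(D_opt) + 1e-9
print(cost(D_apx) / cost(D_opt))      # exactly c**2 = 2.25 here
```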

## 4 Algorithms for Aligning Points to Lines

In this section, we introduce our notation, describe our algorithms, and give an overview of each algorithm. See Sections D.1, D.2 and D.3 for the intuition behind the algorithms presented in this section.

##### 4.0.0.1 Notation.

Let ℝ^{n×d} denote the set of n × d real matrices. We denote by ∥p∥ the length (Euclidean norm) of a point p, by dist(p, ℓ) the Euclidean distance from p to a line ℓ, by proj(p, ℓ) the projection of p on ℓ, i.e., the closest point to p on ℓ, and by sp(A) the linear span of A. For a matrix A we denote by A^⊥ an arbitrary matrix whose columns are mutually orthogonal unit vectors that are also orthogonal to every vector in sp(A). Hence, the matrix whose columns are the columns of A followed by those of A^⊥ is an orthogonal matrix. We denote [n] = {1, …, n} for every integer n ≥ 1.

In this paper, every vector is a column vector, unless stated otherwise. A matrix R is called a rotation matrix if it is orthogonal and its determinant is 1, i.e., R^T R = I and det(R) = 1. For a vector t, called a translation vector, the pair (R, t) is called an alignment. We define Alignments to be the union of all possible alignments in d-dimensional space.

For a permutation π: [n] → [n] and a set A = {(a_1, b_1), …, (a_n, b_n)} of n pairs, A_π is defined as A_π = {(a_1, b_{π(1)}), …, (a_n, b_{π(n)})}.
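The rotation-matrix part of the notation can be checked numerically; the helper names below are our own:

```python
import math

# Sketch of the alignment notation: a 2x2 rotation matrix R(theta) is
# orthogonal with determinant +1, and an alignment is a pair (R, t).
def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def is_rotation(R, eps=1e-9):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    cols = list(zip(*R))
    orthonormal = (abs(dot(cols[0], cols[0]) - 1) < eps and
                   abs(dot(cols[1], cols[1]) - 1) < eps and
                   abs(dot(cols[0], cols[1])) < eps)       # R^T R = I
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return orthonormal and abs(det - 1) < eps              # det(R) = 1

R, t = rotation(math.pi / 3), (2.0, -1.0)   # an alignment (R, t)
print(is_rotation(R))                       # True
```

Note that a reflection such as [[1, 0], [0, −1]] is orthogonal but has determinant −1, so it is excluded by the definition.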

##### 4.0.0.2 Algorithms.

We now present algorithms that compute a constant factor approximation for the problem of aligning points to lines, when the matching is either known or unknown. Algorithm 2 handles the case when the matching is given, i.e., given an ordered set P of points and a corresponding ordered set L of lines, both in the plane, we wish to find an alignment that minimizes, for example, the sum of distances between each point in P and its corresponding line in L.

Formally, let A = {(p_1, ℓ_1), …, (p_n, ℓ_n)} be a set of point-line pairs, and let D((p_i, ℓ_i), (R, t)) = dist(Rp_i − t, ℓ_i) be the distance between Rp_i − t and ℓ_i for every i and alignment (R, t). Let cost be as defined in Definition 4. Then Algorithm 2 outputs a set of alignments that is guaranteed to contain an alignment which approximates the minimal cost up to a constant factor; see Theorem 12.

Algorithm 3 handles the case when the matching is unknown, i.e., given unordered sets P and L consisting of n points and n lines respectively, we wish to find a matching function and an alignment that minimize, for example, the sum of distances between each point and its corresponding line.

Formally, let cost be as defined above. Then Algorithm 3 outputs a set of alignments that is guaranteed to contain an alignment which approximates the minimal cost up to a constant factor, where the minimum is over every alignment and matching function; see Theorem 13.

### 4.1 Algorithm 1: Z-Configurations

##### 4.1.0.1 Overview of Algorithm 1.

Algorithm 1 takes as input a unit vector and three points. The vector represents the direction of a line that intersects the origin. The algorithm computes matrices that satisfy Lemma 9; see Section D.1 for the intuition and interpretation of those matrices. Its first lines define constants and a rotation matrix that rotates the coordinate system counter-clockwise around the origin, and then compute the first output matrices.

The algorithm then tests whether the relevant point lies in the halfplane to the left of the input vector, and computes the last output matrix accordingly; see Fig. 2 for an illustration.

### 4.2 Algorithm  2: Align

In this section we present the main algorithm, called Align; See Algorithm 2. The output of this algorithm satisfies Lemma 10. The algorithm uses our main observations and general technique for minimizing the sum of distances between the point-line pairs.

#### 4.2.1 Overview of Algorithm 2

The input to Algorithm 2 is a set of n pairs, each consisting of a point and a line on the plane. The algorithm runs an exhaustive search over all the possible tuples of input pairs and outputs a candidate set of alignments, where each alignment consists of a rotation matrix R and a translation vector t. Lemma 10 proves that one of these alignments is the desired approximation. See Section D.2 for intuition.

The algorithm first identifies each line by its direction (a unit vector) and its distance from the origin. It then iterates over every tuple of input pairs and turns it into a constant number of alignments, handling separately the case where the corresponding lines are not parallel and the case where they are parallel.

The case where the lines are not parallel. The algorithm computes a rotation matrix that rotates the first line to the x-axis, and calls the sub-procedure Algorithm 1 to compute three matrices. It then reverts the effect of this rotation, and computes the distance between the origin and the intersection of the two lines, since this intersection point was assumed to be the origin in Algorithm 1. These matrices and lines are used to compute a set of unit vectors, each of which defines a possible positioning of the tuple and a corresponding alignment. The union of these alignments is then added to the output set.

The case where the lines are parallel. In this case, we place the first point on its line, and place the second point as close as possible to its line. If more than one alignment satisfies those conditions, then we pick an arbitrary one.

### 4.3 Algorithm 3: Align+Match

##### 4.3.0.1 Overview of Algorithm 3.

Algorithm 3 takes as input a set of n points and a set of n lines in the plane, and a cost function as defined in Theorem 13. The algorithm computes an alignment and a matching function that approximate the minimal value of the given cost function; see Theorem 13.

The algorithm iterates over small tuples of points and lines, matches the chosen points to the chosen lines, computes their corresponding set of alignments by a call to Algorithm 2, and adds the result to a candidate set. Finally, it computes the optimal matching for every alignment in the candidate set, and picks the alignment and corresponding matching that minimize the given cost function.
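The final step above can be sketched as follows. This is a hedged illustration only: brute force over permutations is exponential and stands in for the matching step, whereas an assignment solver (e.g. the Hungarian method) would find the optimal matching in polynomial time for f = sum. All helper names are our own.

```python
import itertools, math

def dist_point_line(p, v, q):          # line = {q + s*v : s in R}, v a unit vector
    w = (p[0] - q[0], p[1] - q[1])
    s = w[0] * v[0] + w[1] * v[1]
    return math.hypot(w[0] - s * v[0], w[1] - s * v[1])

def best_alignment_and_matching(alignments, points, lines):
    # evaluate each candidate alignment under its best matching; keep the minimizer
    best = (float("inf"), None, None)
    for R, t in alignments:
        moved = [(R[0][0]*p[0] + R[0][1]*p[1] - t[0],
                  R[1][0]*p[0] + R[1][1]*p[1] - t[1]) for p in points]
        for perm in itertools.permutations(range(len(lines))):
            c = sum(dist_point_line(moved[i], *lines[perm[i]])
                    for i in range(len(moved)))
            if c < best[0]:
                best = (c, (R, t), perm)
    return best

I = [[1.0, 0.0], [0.0, 1.0]]
points = [(0.0, 5.0), (5.0, 0.0)]
lines = [((1.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (0.0, 0.0))]  # x-axis, y-axis
cost, (R, t), perm = best_alignment_and_matching([(I, (0.0, 0.0))], points, lines)
print(cost, perm)  # matching (0,5) to the y-axis and (5,0) to the x-axis gives 0
```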

## 5 Statements of Main Results

### 5.1 Constrained regression

The following lemma proves that for every two paired sets {a_1, …, a_n} and {b_1, …, b_n} and every unit vector x, there exists a unit vector x′ in a small candidate set that approximates |a_i^T x − b_i| for every i ∈ [n].

###### Lemma 6.

Let A ∈ ℝ^{n×2} and b ∈ ℝ^n. Then there is a set X of unit vectors that can be computed in polynomial time such that (i) and (ii) hold as follows:

1. For every unit vector x there is a vector x′ ∈ X such that for every i ∈ [n],

 |a_i^T x′ − b_i| ≤ 4 · |a_i^T x − b_i|.   (7)
2. There is such that

The following theorem generalizes the approximation obtained in Lemma 6 to the family of cost functions defined in Definition 4.

###### Theorem 7.

Let A = {(a_1, b_1), …, (a_n, b_n)} be a set of n pairs, where for every i ∈ [n] we have a_i ∈ ℝ^2 and b_i ∈ ℝ. Let D((a_i, b_i), x) = |a_i^T x − b_i| for every i ∈ [n] and unit vector x. Let cost be as defined in Definition 4. Then in polynomial time we can compute a unit vector x′ whose cost approximates the minimal cost over every unit vector up to a constant factor.

Recall that for a permutation π and a set A = {(a_1, b_1), …, (a_n, b_n)} of n pairs, A_π is defined as A_π = {(a_1, b_{π(1)}), …, (a_n, b_{π(n)})}. Lemma 6 proves that for every such set and unit vector x, there is x′ ∈ X that approximates |a_i^T x − b_i| for every i ∈ [n]. Hence, by computing the union of such candidate sets over every pairing, we are guaranteed that for any permutation π and unit vector x, one of the vectors in the union will approximate |a_i^T x − b_π(i)| for every i ∈ [n].
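The candidate-set ("centroid set") idea behind Lemma 6 can be prototyped by brute force. The sketch below is our own simplified construction, not the paper's algorithm: for each pair (a_i, b_i) it collects the minimizers of g_i(x) = |a_i·x − b_i| over the unit circle, namely the unit solutions of a_i·x = b_i when they exist, plus ±a_i/∥a_i∥ (the extremes of a_i·x on the circle). The theory promises that one candidate is within a constant factor of the optimum, which we verify against a fine grid search.

```python
import math

def unit(theta):
    return (math.cos(theta), math.sin(theta))

def candidates(A, b):
    # per-constraint minimizers of |a_i . x - b_i| over the unit circle
    thetas = []
    for (a1, a2), bi in zip(A, b):
        na, phi = math.hypot(a1, a2), math.atan2(a2, a1)
        thetas += [phi, phi + math.pi]                  # extremes of a_i . x
        if abs(bi) <= na:                               # a_i . x = b_i solvable
            d = math.acos(bi / na)
            thetas += [phi + d, phi - d]
    return [unit(t) for t in thetas]

def cost(x, A, b):
    # the l1 regression cost over the unit circle (f = sum, lip = identity)
    return sum(abs(a[0]*x[0] + a[1]*x[1] - bi) for a, bi in zip(A, b))

A = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
b = [0.5, 0.2, 0.9]
cand = min(cost(x, A, b) for x in candidates(A, b))
grid = min(cost(unit(2 * math.pi * k / 20000), A, b) for k in range(20000))
print(cand <= 4 * grid + 1e-6)   # the constant-factor guarantee, as in Eq. (7)
```

In this instance the best candidate is in fact near-optimal, well inside the factor-4 guarantee.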

###### Theorem 8.

Let A = {(a_1, b_1), …, (a_n, b_n)} be a set of n pairs, where for every i ∈ [n] we have a_i ∈ ℝ^2 and b_i ∈ ℝ. Let D((a_i, b_i), x) = |a_i^T x − b_i| for every i ∈ [n] and unit vector x. Let cost be as defined in Definition 4 for an r-log-Lipschitz function lip and f = Σ. Then in polynomial time we can compute a unit vector x′ and a permutation π′ that satisfy the following

 cost(A_{π′}, x′) ≤ 4^r · min_{x, π} cost(A_π, x),

where the minimum is over every unit vector and .

### 5.2 Aligning Points-To-Lines

Table 1 summarizes the important results that we obtained for the problem of aligning points-to-lines.

The following lemma proves that the matrices computed in Algorithm 1 satisfy a set of properties.

###### Lemma 9.

Let v be a unit vector and ℓ be the line in direction v. Let the three input points be the vertices of a triangle. Let the matrices be the output of a call to Z-Configurations; see Algorithm 1. Then the following hold:

1. -axis and iff there is a unit vector such that and .

2. For every unit vector , we have that if and .

What follows is the main lemma for Align; see Algorithm 2. The proof of this lemma is divided into three steps that correspond to the steps discussed in the intuition for Algorithm 2 in Section D.2.

###### Lemma 10.

Let A = {(p_1, ℓ_1), …, (p_n, ℓ_n)} be a set of n pairs, where for every i ∈ [n], p_i is a point and ℓ_i is a line, both on the plane. Let S be the output of a call to Align; see Algorithm 2. Then for every alignment (R*, t*) there exists an alignment (R, t) ∈ S such that for every i ∈ [n],

 dist(Rp_i − t, ℓ_i) ≤ 16 · dist(R*p_i − t*, ℓ_i).   (8)

Moreover, and can be computed in <