# Continuous optimization

Sufficient conditions for the existence of efficient algorithms are established by introducing the concept of contractility for continuous optimization. Then all the possible continuous problems are divided into three categories: contractile in logarithmic time, contractile in polynomial time, or noncontractile. For the first two, we propose an efficient contracting algorithm to find the set of all global minimizers with a theoretical guarantee of linear convergence; for the last one, we discuss possible troubles caused by using the proposed algorithm.

## Authors

• 12 publications
• 42 publications
09/03/2019

### Contractility of continuous optimization

By introducing the concept of contractility, all the possible continuous...
09/03/2019

### Contraction methods for continuous optimization

We describe a class of algorithms that establishes a contracting sequenc...
11/28/2017

### Seeded Graph Matching: Efficient Algorithms and Theoretical Guarantees

In this paper, a new information theoretic framework for graph matching ...
06/28/2019

### Algorithms for weighted independent transversals and strong colouring

An independent transversal (IT) in a graph with a given vertex partition...
11/05/2020

### Optimization of Virtual Networks

We introduce a general and comprehensive model for the design and optimi...
08/01/2020

### Dividing Bads is Harder than Dividing Goods: On the Complexity of Fair and Efficient Division of Chores

We study the chore division problem where a set of agents needs to divid...
04/09/2014

### Noisy Optimization: Convergence with a Fixed Number of Resamplings

It is known that evolution strategies in continuous domains might not co...

## Code Repositories

### JCOOL

Java COntinuos Optimization Library

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## Acknowledgments

Author contributions: X.L. performed the theoretical analysis and designed the algorithm, X.X. implemented the program, ran experiments, and produced all the plots; X.L. wrote the manuscript. All authors discussed and commented on the experiments and the manuscript. Competing interests: The authors declare no competing financial interests.

## Supplementary materials

Section S1. Method
Section S2. Monotonic convergence
Section S3. Hierarchical low frequency dominant functions
Section S4. Category I: contractile in logarithmic time
Section S5. Category II: contractile in polynomial time
Section S6. Category III: noncontractile
Fig. S1
References
Data S1

## Section S1. Method

For the median-type contracting algorithm, the contracting sequence of length , , is established as follows: define and

 D(k+1)={x∈D(k):Aχ(k)f(x)≤Medianfχ(k)}, 0≤k≤K−1, (S1)

where are uniformly distributed over with the sample size , the relevant data values with the current and , and is an approximation of w.r.t. the data pairs such that

 ∣∣Aχ(k)f(x)−f(x)∣∣≤Medianfχ(k)−f∗χ(k)  and  |r(k)−r(k)A|

here, is a given threshold, is the percentage of falling into , and is the percentage of samples uniformly distributed over falling into .

Two key parts of the algorithm are (i) the method for constructing a model to fit the given data on , and (ii) the sampling strategy for further generating uniform samples over according to some known interior points. In this work, we use Gaussian process (GP) regression for constructing a model to fit the data pairs

. It has merit in the flexible expressivity because its hyperparameters are automatically tuned with the likelihood maximization. See Data S1 for technical details and (

1) for more.

Now consider the sampling strategy. Suppose are given points and is a union of sample sets , where contains samples from a normal distribution with mean

and variance

(sufficiently large to cover ). Further, let . Now we are going to add a new point from to and should preferably fill in the gaps in the distribution of . Actually, this subsequent point could be determined as

 t=argmaxt∈T(k)(min1≤i≤N∥t−χi∥2). (S3)

One can generate uniform samples in by using the above step recursively, see Fig. S1 for an example to illustrate how the method is performed. This recursive algorithm is closely related to the Voronoi diagrams (2), because the added point will be always in the largest Voronoi cell of if the size of is large enough.

## Section S2. Monotonic convergence

In this section we explain the monotonic convergence: if is not a constant on and the sequence is generated by Eqs. 1 and 2 (not limited to the median-type aigorithm), then for every .

The monotonic convergence proceeds by induction on . First, is trivial since is not a constant on . Assume that , then , where

 S(k)={x∈D(k):f(x)≤f∗χ(k)}; (S4)

now we will show that . Let , then it follows from Eq. S2 that . Hence, for any , we have and it can be further rewritten as

 Aχ(k)f(x′)−ε(k)(x′)≤f∗χ(k), (S5)

and equivalently,

 Aχ(k)f(x′)≤f∗χ(k)+ε(k)(x′)≤f∗χ(k)+u(k), (S6)

thus, , that is, ; meanwhile, since .

## Section S3. Hierarchical low frequency dominant functions

Without loss of generality, assume that on hereafter. We show that if is a -type hierarchical low frequency dominant function, then there exist a class of approximations such that the corresponding error bounds are reduced by a factor of every time the number of function evaluations doubles.

For any nonnegative integer and , we define the th -bandlimited function by

 f(j)ρ(x)=∫∥ω∥∞≤2jnρ^f(ω)e2πix⋅ωdω, (S7)

where is the inner product of and ; then according to the Nyquist-Shannon sampling theorem, can be reconstructed by its samples corresponding to a sampling density of . Further, let , where and

 I(j)ρ=∫2j−1nρ<∥ω∥∞≤2jnρ|^f(ω)|dω, for j=1,2,⋯; (S8)

then and the condition Eq. 3 can be rewritten as

 R(j)ρ<(1+p)I(j)ρ,  or equivalently,  R(j+1)ρ

So it follows that

 R(j+1)ρR(j)ρ=R(j+1)ρI(j)ρ+R(j+1)ρ

so the error bounds are reduced by a factor of every time the number of function evaluations doubles, that is, the corresponding sampling density is increased from to . Furthermore, by noting that

 R(j+1)ρ∥^f∥1=R(j+1)ρI(0)ρ+R(1)ρ≤R(2)ρR(1)ρR(3)ρR(2)ρ⋯R(j+1)ρR(j)ρ<(p1+p)j, (S11)

the error bound can also be rewritten as

 ∥f−f(j)ρ∥∞≤R(j+1)ρ<(p1+p)j∥^f∥1. (S12)

So the error bounds are reduced by a factor of every time the sampling density doubles. In the next section we will further consider the number of samples on the compact sets.

## Section S4. Category I: contractile in logarithmic time

For functions in category I, the key to gaining logarithmic time efficiency comes from two reasons: (i) the median of on is reduced by a factor of as increases and (ii) there is an upper bound on the number of function evaluations on every such that a certain approximation can be constructed by these samples and the error bound is less than or equal to the median of on .

Assume that the sequence are defined recursively by and

 D(k+1)={x∈D(k):f(k+l)ρ(x)≤Medianf(ξ(k))}, k=0,1,⋯, (S13)

where is a uniformly distributed random variable over and is the unique integer such that

 (p1+p)s∥^f∥1

Then it is clear that

 Medianf(ξ(k))=prctile(f(ξ),2−(k+1)⋅100%), (S15)

moreover, since is a -type tempered function on ,

 (p1+p)kMedianf(ξ(0))

Further, note that is a -type hierarchical low frequency dominant function, it follows that for every ,

 ∥f−f(k+s)ρ∥∞≤(p1+p)k+s∥^f∥1<(p1+p)kMedianf(ξ(0))

then according to the convergence shown in section 3, is contained in every ; in addition, since is a critical regular function on , and the Fourier approximation can be reconstructed by its samples corresponding to a sampling density of .

The restriction of to , denoted by , is closely related to the threshold value and the prolate spheroidal functions

which are the eigenfunctions of the time and frequency limiting operator

(3-7). More specifically, if we denote a prolate series up to and including the th term by

 SN(k)f(x)=N(k)∑j=1ψj(x)∫D(k)f(x)ψj(x)dx, (S18)

then follows by noting that has a Fourier transform supported in . Moreover, it is important to mention that a super-exponential decay rate of the error bound as soon as reaches or goes beyond the plunge region around the threshold value (8-9).

Therefore, for every there exist and such that could be constructed by samples of over and

 maxx∈D(k)|f(k+s)ρ(x)−A(k)f(x)|≤% Medianf(ξ(k))−∥f−f(k+s)ρ∥∞, (S19)

and then,

 maxx∈D(k)|f(x)−A(k)f(x)|≤∥f−f(k+s)ρ∥∞+maxx∈D(k)|f(k+s)ρ(x)−A(k)f(x)|≤Medianf(ξ(k)); (S20)

further, for a fixed accuracy , there exists a such that

 (p1+p)K+1<ϵ≤(p1+p)K+2,  or,  K+1

hence, after median-type contractions, one gets the approximate solution set with

 maxx∈D(K+1)|f(x)|≤(p1+p)KMedianf(ξ(0)) (S22)

and the total number of function evaluations is less than

 K−1∑k=0⌈CNΩ,f/2⌉+CNΩ,f=O(NΩ,f⋅logp1+pϵ), (S23)

where is the smallest integer greater than or equal to ; let the computational complexity of is , then the time complexity is less than

 K−1∑k=0⌈CaNaΩ,f/2⌉+CaNaΩ,f=O(NaΩ,f⋅logp1+pϵ), (S24)

taking a logarithmic time for any desired accuracy .

## Section S5. Category II: contractile in polynomial time

In the above discussion, Eq.4 helps us control the bound of the number of function evaluations as a constant value in each contraction; in fact, even without Eq.4, the bound can also be controlled in a probability sense. So in the following we first establish a weaker version of Eq.4, that is, if is a compact set and is continuous and not a constant on , then for any , there must exist a such that

 prctile(f(ξ),2−(k+1)⋅100%)>qϵ1+qϵprctile(f(ξ),2−k⋅100%) (S25)

holds for every with probability at least , where is a uniformly distributed random variable on . From the definition Eq. S13, Eq. S23 is equivalent to saying that the upper bound of is less than of the median of on with probability at least .

Suppose is a uniformly distributed random variable on , , and . Under the assumption of , since is continuous and not a constant on , we have and and there exists a such that . First, the distance between the median and the mean is bounded by standard deviation (10), i.e.,

 m(k)−σ(k)≤μ(k)≤m(k)+σ(k); (S26)

and it follows from Chebyshev s inequality that

 P(|f(ξ(k))−μ(k)|>σ(k)/√ϵ)≤ϵ. (S27)

So it holds that

 prctile(f(ξ(k)),ϵ⋅100%)<μ(k)+1√ϵσ(k)≤m(k)+1+√ϵ√ϵσ(k)=√ϵ+C+C√ϵ√ϵm(k) (S28)

with probability at least , that is,

 P(f(ξ(k))>√ϵ+C+C√ϵ√ϵMedianf(ξ(k))=1+qϵqϵMedianf(ξ(k)))≤ϵ, (S29)

where .

Let us now see what happens after replacing condition Eq. 4 with Eq. S23. Assume that

 (S30)

If is predictable in polynomial time, then there exist and such that Eq. 3 holds; similarly, for every , it holds that

 ∥f−f(kl+s)ρ∥∞≤(p1+p)kl+s∥^f∥1<(qϵ1+qϵ)kMedianf(ξ(0))

with probability at least . Further, according to the convergence shown in section 3, is contained in every with probability at least ; in addition, and the Fourier approximation can be reconstructed by its samples corresponding to a sampling density of . And according to the discussion in the previous subsection, for every there exist and such that could be constructed by less than samples of over and

 maxx∈D(k)|f(x)−A(k)f(x)|≤Medianf(ξ(k)); (S32)

further, for a fixed accuracy , there exists a such that

 (p1+p)(K+1)l<ϵ≤(p1+p)(K+2)l,  or,  K+1

hence, after contractions, one can obtain the approximate solution set with

 maxx∈D(K+1)|f(x)|≤(p1+p)KlMedianf(ξ(0)) (S34)

and clearly, the total number of function evaluations is much less than

 K∑k=0C2klNΩ,f=O(NΩ,f⋅2logp1+pϵ); (S35)

similarly, let the computational complexity of is , then the time complexity is much less than

 K∑k=0Ca2aklNaΩ,f=O(NaΩ,f⋅2alogp1+pϵ), (S36)

taking a polynomial time for any desired accuracy . And it is worth noting that the algorithm is not limited to the median-type.

## Section S6. Category III: noncontractile

For a noncontractile , the contracting algorithm degrades totally into a model-based approach. In the following we consider the time complexities for sufficient smooth functions, Hölder continuous functions and non-Hölder continuous functions, respectively.

Suppose that are uniformly distributed over with the sample size , the relevant data values and interpolates on . For a sample set over , we denote the associated fill distance with

 hN:=supx∈Ωmin1≤i≤N∥x−χi∥, (S37)

then for , there exists such that

 P(hN>Cϵ(logN/N)1/n)=O(N−ϵ), (S38)

or, , where ; see Lemma 12 of (11).

If with on a bounded domain , then there exists a band-limited interpolant (see Lemma 3.9 of (12)) such that for any , it holds that

 |f(x)−Af(x)|≤Chs−n/2N∥f∥Cs(Ω)=O(N−(s−n/2)(1−δ)/n), (S39)

where , see Theorem 3.10 of (12); then for , the time complexity is

 O(Na)=O(2na(s−n/2)(1−δ)log12ϵ), (S40)

where the complexity of is . And this result is similar to that given in (11).

If satisfies a -Hölder condition, i.e., there exist and such that , then there exists a nearest-neighbor interpolant which is closely related to the Voronoi diagram of , such that for any , it holds that

 |f(x)−Af(x)|≤ChαN=O(N−α(1−δ)/n), (S41)

then similarly, for , the time complexity is

 O(Na)=O(2naα(1−δ)log12ϵ). (S42)

Further, if does not satisfy any Hölder condition, then the time complexity is larger than for all , so the algorithm shall not be done in polynomial time.

References and Notes

1. C. E. Rasmussen, C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).

2. F. Aurenhammer, Voronoi diagrams - a survey of a fundamental geometric data structure. ACM Comput. Surv. 23, 345-405 (1991). doi: 10.1145/116873.116880

3. D. Slepian, H. O. Pollak, Prolate spheroidal wave functions, Fourier analysis and uncertainty, I. Bell Systems Tech. J. 40, 43-64 (1961). doi: 10.1002/j.1538-7305.1961.tb03976.x

4. H. J. Landau, H. O. Pollak, Prolate spheroidal wave functions, Fourier analysis and uncertainty, II. Bell Systems Tech. J. 40, 65-84 (1961). doi: 10.1002/j.1538-7305.1961.tb03977.x

5. H. J. Landau, H. O. Pollak, Prolate spheroidal wave functions, Fourier analysis and uncertainty, III. Bell Systems Tech. J. 41, 1295-1336 (1962). doi: 10.1002/j.1538-7305.1962.tb03279.x

6. D. Slepian, Prolate spheroidal wave functions, Fourier analysis and uncertainty, IV. Bell Systems Tech. J. 43, 3009-3057 (1964). doi: 10.1002/j.1538-7305.1964.tb01037.x

7. D. Slepian, On bandwidth. Proc. IEEE 64, 292-300 (1976). doi: 10.1109/PROC.1976.10110

8. J. P. Boyd, Approximation of an analytic function on a finite real interval by a bandlimited function and conjectures on properties of prolate spheroidal functions. Appl. Comput. Harmon. Anal. 25, 168-176 (2003). doi: 10.1016/S1063-5203(03)00048-4

9. A. Bonamia, A. Karoui, Spectral decay of time and frequency limiting operator. Appl. Comput. Harmon. Anal. 42, 1-20 (2017). doi: 10.1016/j.acha.2015.05.003

10. C. Mallows, Another comment on O’Cinneide. The American Statistician 45, 257 (1991). doi: 10.1080/00031305.1991.10475815

11. A. D. Bull, Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res. 12, 2879-2904 (2011).

12. F. J. Narcowich, J. D. Ward, H. Wendland, Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting.

Math. Comp. 74, 743-763 (2005). doi: 10.1090/S0025-5718-04-01708-9