Local search and fixed point computation are fundamental problems in optimization and computer science, as they are embedded in many real-world processes. Both have been studied as heuristics for solving optimization problems and as interesting computational problems in their own right.
In local search, the input is an undirected graph and a function . The set can represent any universe of elements with a notion of neighbourhood and the goal is to find a vertex that is a local minimum, that is with for all . We have oracle access to the function ; for any vertex , we can query the oracle to receive the value . The query complexity is the number of oracle queries necessary and sufficient to find a local optimum.
Algorithms based on local search are widely used in areas such as artificial intelligence, operations research, engineering, mathematics, and computational biology, and include algorithms such as hill climbing, simulated annealing, Metropolis-Hastings, and the WalkSat family of algorithms for Boolean satisfiability (SAT). For many interesting local search problems, it is not known whether a solution can be found in polynomial time. To study this question, Johnson, Papadimitriou and Yannakakis [JPY88] introduced the complexity class PLS to capture the difficulty of computing a locally optimal solution to an optimization problem in a discrete space.
A closely related problem is that of finding an approximate Brouwer fixed point. In the Brouwer fixed point theorem, we have a compact and convex subset of and a continuous function. Then a point such that is guaranteed to exist. An approximate version of this problem (denoted Brouwer) can be studied from a computational point of view: we are given an additional parameter and the goal is to find an -fixed point, provided that the function is Lipschitz continuous. The class PPAD captures the difficulty of finding a Brouwer fixed point [PAP94], together with other problems that are computationally equivalent, such as finding an approximate Nash equilibrium in a multi-player game [DGP09] or an exact equilibrium in a two-player game [CDT09], an Arrow-Debreu equilibrium in a market [VY11], and a polychromatic simplex in Sperner’s Lemma. The classes PLS and PPAD are related. In a striking recent development, Fearnley, Goldberg, Hollender, and Savani [FGH+20] showed that the class CLS, introduced by Daskalakis and Papadimitriou [DP11] to capture continuous local search, is equal to PPAD ∩ PLS.
We study the complexity of local search and Brouwer in the black box model when the number of rounds of interaction with the oracle is bounded. For local search, the input will be a local search instance on the -dimensional grid together with an upper bound on the number of rounds of interaction with the oracle. In each round , any number of simultaneous queries can be issued, after which the oracle answers are received, and then the queries for round can be issued, possibly depending on rounds . The algorithm must stop and output a solution by the end of the -th round. The fully adaptive case is obtained by taking .
The number of rounds, or adaptive complexity, is a core issue for optimization in distributed settings where evaluating the function is time consuming and multiple processors must be used to speed up the algorithm. Thus an algorithm that runs in rounds can be viewed as follows: a central machine issues in each round a set of queries, one to each processor, then waits for the answers before issuing the next set of parallel queries in round . The question then is how many processors are needed to achieve a time of , or equivalently, what is the query complexity in rounds. For more discussion on adaptive complexity, see, e.g., the book by Akl [AKL14] on parallel sorting algorithms. Valiant [VAL75] initiated the study of parallelism using the number of comparisons as a complexity measure and showed that processor parallelism can offer speedups of at least for problems such as sorting and finding the maximum of a list of elements.
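To illustrate the trade-off between rounds and total work in the comparison model mentioned above, the following minimal sketch finds the maximum of a list in two ways: with all pairwise comparisons issued in a single round, and with a pairing tournament that uses one round per level. The function names `max_one_round` and `max_tournament` are our own illustrative choices, and the sketch is not taken from [VAL75].

```python
from itertools import combinations

def max_one_round(xs):
    """Single round: all pairwise comparisons are issued simultaneously."""
    n = len(xs)
    pairs = list(combinations(range(n), 2))
    # The loser of each comparison (ties are lost by the later index).
    lost = {j if xs[i] >= xs[j] else i for i, j in pairs}
    winner = next(i for i in range(n) if i not in lost)
    return xs[winner], 1, len(pairs)          # (max, rounds, comparisons)

def max_tournament(xs):
    """One round per tournament level: disjoint pairs are compared in parallel."""
    alive, rounds, comparisons = list(xs), 0, 0
    while len(alive) > 1:
        nxt = [max(a, b) for a, b in zip(alive[::2], alive[1::2])]
        comparisons += len(nxt)
        if len(alive) % 2:                    # an unpaired element advances for free
            nxt.append(alive[-1])
        alive, rounds = nxt, rounds + 1
    return alive[0], rounds, comparisons

xs = [3, 1, 4, 1, 5, 9, 2, 6]
print(max_one_round(xs))    # (9, 1, 28)
print(max_tournament(xs))   # (9, 3, 7)
```

Fewer rounds are bought with more comparisons, which is the same kind of trade-off we study for oracle queries.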
There has been a line of research on the query complexity of local search and Brouwer fixed-point in the fully adaptive setting. For local search, Aldous [ALD83] first presented a classical randomized algorithm with a warm start, which achieves on any graph, where is the number of points in the graph. Aldous [ALD83] also proved a lower bound of for randomized algorithms on the hypercube with a sophisticated analysis of the random process. For deterministic algorithms, Llewellyn, Tovey, and Trick [LTT93] and Llewellyn and Tovey [LT93] proved that the divide-and-conquer procedure is optimal via an adversarial argument. Thus for the -dimensional grid , these works show a bound of , and the algorithm takes rounds. The randomized case is more challenging, and Aaronson [AAR06] proposed a novel technique called the relational adversarial method, inspired by the adversarial method in quantum computing, which avoids directly analyzing the posterior distribution during the execution. Later, Zhang [ZHA09] and Sun and Yao [SY09] obtained tighter lower bounds using this method with better choices of the random process, proving a lower bound of . Together with the warm-start randomized algorithm of Aldous [ALD83], these works establish a near-optimal bound of for randomized algorithms.
The query complexity of the Brouwer problem was first studied by Hirsch, Papadimitriou, and Vavasis [HPV89], who proved an exponential bound for deterministic algorithms. Chen and Deng [CD05] improved their bound to for deterministic algorithms. They also proposed a deterministic divide-and-conquer algorithm matching this bound, which takes rounds. Later a randomized lower bound of was given by Chen and Teng [CT07].
We present several new algorithms and lower bounds, which characterize the trade-off between the number of rounds of adaptivity and the total number of queries, thus showing a transition from one round algorithms to fully adaptive algorithms.
Let be an undirected graph and a function, where is the value of node . The goal is to find a local minimum of , that is, a vertex with the property that for all neighbours of . We study the setting where is a d-dimensional grid of side length , thus where and if . The dimension d is a constant.
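As a concrete reading of this setting, the following minimal sketch represents grid vertices as tuples of integers and checks the local minimum condition by brute force. The 0-based coordinates and the helper names `neighbours`, `is_local_min` and `all_local_minima` are assumptions made for illustration; two vertices are taken to be adjacent when they differ by one in exactly one coordinate.

```python
from itertools import product

def neighbours(v, n):
    """Grid neighbours of v in {0, ..., n-1}^d (assumed 0-based coordinates)."""
    for axis in range(len(v)):
        for step in (-1, 1):
            c = v[axis] + step
            if 0 <= c < n:
                yield v[:axis] + (c,) + v[axis + 1:]

def is_local_min(f, v, n):
    """v is a local minimum if f(v) <= f(u) for every neighbour u of v."""
    return all(f(v) <= f(u) for u in neighbours(v, n))

def all_local_minima(f, n, d):
    """Brute force over all n^d vertices; only sensible for tiny instances."""
    return [v for v in product(range(n), repeat=d) if is_local_min(f, v, n)]

f = lambda v: (v[0] - 2) ** 2 + (v[1] - 3) ** 2
print(all_local_minima(f, 5, 2))   # [(2, 3)]
```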
We have oracle access to the function and are allowed rounds of interaction with the oracle. A protocol for local search submits a number of queries to the function and outputs at the end a node that is a local minimum of in the graph . When the protocol runs in rounds, multiple queries can be issued at once in each round. The choice of queries submitted in round cannot depend on the results of queries from the same or later rounds (i.e. ). Given as a parameter, we measure the total number of queries submitted by an algorithm.
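The interaction model can be sketched as a thin wrapper around the value function that answers one batch of queries per round and keeps count of rounds and queries. The class name `RoundOracle` and its attributes are hypothetical and serve only to make the accounting explicit.

```python
class RoundOracle:
    """Batch oracle: all queries of a round are submitted before any answer is seen."""

    def __init__(self, f, max_rounds):
        self.f = f
        self.max_rounds = max_rounds
        self.rounds_used = 0
        self.total_queries = 0

    def query_round(self, points):
        if self.rounds_used >= self.max_rounds:
            raise RuntimeError("round budget exhausted")
        points = list(dict.fromkeys(points))   # drop duplicate queries, keep order
        self.rounds_used += 1
        self.total_queries += len(points)
        return {p: self.f(p) for p in points}

# One-round brute force on a path with 8 vertices: query everything at once.
oracle = RoundOracle(lambda v: abs(v - 5), max_rounds=1)
answers = oracle.query_round(range(8))
best = min(answers, key=answers.get)
print(best, oracle.rounds_used, oracle.total_queries)   # 5 1 8
```

A second call to `query_round` on this oracle raises an error, which mirrors the requirement that the algorithm must stop by the end of the last round.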
The deterministic query complexity of a search problem in rounds is the total number of queries necessary and sufficient to find a solution when given rounds of interaction with the oracle. The randomized query complexity is the expected number of queries required to find a solution with probability at least for any input, where the expectation is taken over the coin tosses of the protocol.
1.2 Our Results
In this section we state our results and give an overview of the proofs.
1.2.1 Local Search
We start by studying the case of constant rounds for local search.
Theorem 1 (Local search, constant rounds). Let be a constant. The query complexity of local search in rounds on the -dimensional grid is , for both deterministic and randomized algorithms.
When , this bound is close to , with gap smaller than any polynomial. The classical result by Llewellyn, Tovey, and Trick [LTT93] showed that the query complexity of local search for deterministic algorithms is , and the upper bound is achieved by a divide-and-conquer algorithm with rounds. Thus our result fills the gap between one-round algorithms and logarithmic-round algorithms except for a small margin. This theorem also implies that randomness does not help when the number of rounds is constant.
When the number of rounds is polynomial in , that is for some constant , the algorithm that yields the upper bound in Theorem 1 is no longer efficient. We design a different algorithm for this regime and also show an almost matching lower bound.
Theorem 2 (Local search, polynomial rounds). Let , where is a constant. The randomized query complexity of local search in rounds on the -dimensional grid is at most and at least .
When , the bound is close to , i.e., the bound for constant-round and logarithmic-round algorithms; when , the bound is close to , i.e., the bound for fully adaptive algorithms. Thus, our polynomial-round algorithm fills the gap between constant- or logarithmic-round algorithms and fully adaptive algorithms.
|Number of Rounds||Deterministic||Randomized|
|Polynomial Rounds:||[LTT93][LT93] (footnote 3)|| |
Footnote 3: [LTT93] and [LT93] did not study the number of rounds, but this result follows because they proved a lower bound of for deterministic fully adaptive algorithms and also provided a -round deterministic algorithm with the same efficiency.
Overview of Algorithms for Local Search
Our local search algorithm for a constant number of rounds is a direct generalization of the classical divide-and-conquer algorithm by Llewellyn, Tovey, and Trick [LTT93]. We divide the search space into many sub-cubes of side length in round , query their boundary, then continue the search in the one that satisfies a boundary condition which guarantees the existence of a solution in that sub-cube. In the last round, we query all the points in the current sub-cube and get the solution.
For a polynomial number of rounds, this approach does not give a tight bound for any , so instead we design an algorithm following the framework of Aldous [ALD83]. We randomly sample many points in round as a warm start and then start tracing for the solution from the best point in round . Since the steepest descent used by Aldous’ algorithm is fully adaptive and takes too many rounds, we design a recursive procedure (called “fractal-like steepest descent”) which parallelizes the steepest descent process at the cost of more queries.
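To see why plain steepest descent is expensive in rounds, the following minimal sketch implements the warm-start framework without any parallelization: one round of random samples, then one round per descent step. It is an illustration under our own assumptions (0-based grid coordinates, side length at least 2, the helper names `neighbours` and `warm_start_descent`, and the sample count passed as a parameter), and it is not the fractal-like procedure itself.

```python
import random

def neighbours(v, n):
    for axis in range(len(v)):
        for step in (-1, 1):
            c = v[axis] + step
            if 0 <= c < n:
                yield v[:axis] + (c,) + v[axis + 1:]

def warm_start_descent(f, n, d, num_samples, rng):
    """Returns (local_minimum, rounds_used, queries_used)."""
    values = {}

    def ask(points):                       # one round: a batch of new queries
        new = [p for p in dict.fromkeys(points) if p not in values]
        for p in new:
            values[p] = f(p)
        return len(new)

    samples = [tuple(rng.randrange(n) for _ in range(d)) for _ in range(num_samples)]
    queries, rounds = ask(samples), 1      # round 1: warm start
    x = min(samples, key=values.get)
    while True:                            # every descent step costs a full round
        nbrs = list(neighbours(x, n))
        queries += ask(nbrs)
        rounds += 1
        best = min(nbrs, key=values.get)
        if values[best] >= values[x]:
            return x, rounds, queries      # no strictly smaller neighbour
        x = best

f = lambda v: (v[0] - 7) ** 2 + (v[1] - 12) ** 2
print(warm_start_descent(f, 20, 2, 10, random.Random(0)))
```

The number of rounds grows with the length of the descent path, which is the cost that the recursive procedure is designed to reduce.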
Let be the set of grid points in the -dimensional cube of side length , centered at point . Let be the number of points with smaller function value than point . Assume we already have a procedure and a number such that will either return a point with , or output a correct solution and halt. Also assume in both cases takes at most rounds and queries in total for any . If we want to find a point with for any given or output a correct solution, the naive approach is to run sequentially times, taking
Since each call of must wait for the result from the previous call, the naive approach will take rounds and queries.
Interestingly, we can parallelize these calls using auxiliary variables that are expensive in queries, but very cheap in rounds. For , let be the point with minimum function value on the boundary of cube , which can be found in only one round with queries after getting . Now assume we have the location of at the start of round . Then we can take to be instead of . The location of will be available at round ; then we can compare the value of with the value of . If then
Otherwise, since has smaller value than any point on the boundary of , we can use a slightly modified version of the divide-and-conquer algorithm of [LTT93] to find the solution within the sub-cube in rounds and queries, and then halt all running procedures. If we have for any , applying inequality (1) times gives , so we can return in this case. This parallel approach takes only rounds and queries.
Overview of Lower Bounds for Local Search.
To show lower bounds, we use Yao’s minimax theorem [YAO77]. We first provide a hard distribution of inputs, then show that no deterministic algorithm can achieve accuracy larger than some specific constant on this distribution. The hard distribution of inputs is given by a staircase construction [VAV93, HPV89]. In general, a staircase is a path drawn from a random process in the domain of the function, and the only solution point is hidden at the end of the path. Interestingly, such a construction can be embedded in two related but essentially different tasks: finding a local minimum of a function [ALD83, AAR06, SY09, ZHA09, HY17] and computing a Brouwer fixed point [CT07, HPV89].
When applying the staircase to local search, the value of the start point is set to zero, and the value keeps decreasing from the start to the end along the path, like going down the stairs. The values of any points outside of the staircase are set to the distance to the start point of the staircase. Intuitively, the algorithm cannot find much useful structural information of the input and thus has no advantage over the default path tracing algorithm.
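The value function induced by a staircase can be sketched on a small two-dimensional grid as follows. The sketch makes several assumptions that are ours and not part of the construction above: consecutive connecting points are linked by an L-shaped path (first along axis 0, then along axis 1), the connecting points are coordinate-wise non-decreasing so that the path never touches itself, and distance means grid (L1) distance. The names `l_shaped_path`, `staircase_function` and `local_minima` are hypothetical.

```python
from itertools import product

def l_shaped_path(a, b):
    """Grid path from a to b in 2D, assuming a <= b coordinate-wise."""
    path = [(x, a[1]) for x in range(a[0], b[0] + 1)]
    path += [(b[0], y) for y in range(a[1] + 1, b[1] + 1)]
    return path

def staircase_function(connecting_points):
    path = [connecting_points[0]]
    for a, b in zip(connecting_points, connecting_points[1:]):
        path += l_shaped_path(a, b)[1:]          # skip the shared endpoint
    index = {p: i for i, p in enumerate(path)}
    start = path[0]

    def f(v):
        if v in index:
            return -index[v]                     # 0 at the start, decreasing along the path
        return abs(v[0] - start[0]) + abs(v[1] - start[1])   # L1 distance to the start

    return f, path

def local_minima(f, n):
    def nbrs(v):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            u = (v[0] + dx, v[1] + dy)
            if 0 <= u[0] < n and 0 <= u[1] < n:
                yield u
    return [v for v in product(range(n), repeat=2) if all(f(v) <= f(u) for u in nbrs(v))]

f, path = staircase_function([(0, 0), (3, 1), (4, 5)])
print(len(path), f(path[-1]), local_minima(f, 6))   # 10 -9 [(4, 5)]
```

In this toy instance the end of the staircase is the only local minimum, so an algorithm has to locate the end of the path to succeed.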
The most challenging part is rigorously proving that this intuition is correct. Our main technical innovation is a new technique to incorporate the round limit into the randomized lower bounds, as we were not able to obtain a lower bound for rounds using the methods previously mentioned. This could also serve as a simpler alternative to the classical relational adversarial method [AAR06] in the fully adaptive setting.
A staircase in our proof is defined by an array of connecting grid points , for , and a uniquely determined path is used to link every two consecutive points , for . For round algorithms, we choose a distribution of length staircases, where the length is defined as the number of connecting points in the staircase minus .
The core concept of our proof is good staircases. A length staircase is good with respect to a deterministic algorithm if for each , any point in the suffix after connecting point ( is not included) of the staircase is not queried by in round , when running on the input generated by this staircase. The input functions generated by good staircases are like adversarial inputs, that is, the algorithm could only (roughly) learn the location of the next connecting point in each round , and still know little about everything after . We show that if of all possible staircases are good staircases, then the algorithm will make a mistake with probability at least (Lemma LABEL:lem:good1 and Lemma 14).
We ensure that each possible staircase is chosen with the same probability, and their total number is easy to calculate. Thus the major technical part of our proof is counting the number of good staircases. The following properties of a good staircase are essential for counting them, and are formally proved in Lemma LABEL:lem:good0.
If is a good staircase, any prefix of is also a good staircase, where a prefix of is any staircase formed by a prefix of the array of the connecting points of .
Let be any two good staircases with respect to algorithm . If the first connecting points of the staircases are the same, then will make the same queries in round running on both input functions generated by .
We count the number of good staircases in steps. In the first step, we show all staircases of length are good by definition. In each step , we derive a recursive inequality to calculate the number of length good staircases from the number of length and good staircases with a method we call two-stage analysis.
The proof framework above works for lower bounds in both constant rounds and polynomial rounds. The major difference between these two cases is the distribution of the staircases.
1.2.2 Brouwer Fixed Point
Since more than rounds do not improve the query complexity of Brouwer, as suggested by [CD05, CT07], we focus on the case of a constant number of rounds. If the number of rounds is a non-constant function smaller than , this only changes the bound by a sub-polynomial term.
Theorem 3 (Brouwer fixed point). Let be a constant. The query complexity of computing a Brouwer fixed point on the -dimensional grid in rounds is , for both deterministic and randomized algorithms.
2 Local Search
In this section we study local search on the -dimensional grid. We present algorithms for a constant and a polynomial number of rounds in Section 2.1 and Section 2.2, respectively. The corresponding lower bounds can be found in Section 2.3 and Section 2.4.
2.1 Algorithm for Local Search in Constant Rounds
A -dimensional cube is a Cartesian product of connected (integer) intervals. We use cube to indicate -dimensional cube for brevity, unless otherwise specified. The boundary of cube is defined as all the points with fewer than neighbors in .
Upper Bound of Theorem 1.
Given the d-dimensional grid , we will define a sequence of cubes contained in each other: , where is the whole grid. For each , set as the side length of cube . The values of are chosen for balancing the number of queries in each round, which will be proved later. Note is an integer divisor of . Consider the following algorithm.
Algorithm 1: Local search in constant rounds
Initialize the current cube to .
In each round :
Divide the current cube into a set of mutually exclusive sub-cubes of side length that cover .
Query all the points on the boundary of sub-cubes . Let be the point with minimal value among them.
Set , where is the sub-cube that belongs to.
In round , query all the points in the current cube and find the solution point.
To argue that the algorithm finds a local minimum, note that in each round , the steepest descent starting from will never leave the sub-cube , since if it did it would have to exit through a point of even smaller value than , which contradicts the definition of . Thus there must exist a local optimum within .
Now we calculate the number of queries in each round. In round , the number of points on the boundary of all sub-cubes is , which is equal to The number of queries in round is . Since are constants, the algorithm makes queries in total as required. ∎
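Algorithm 1 can be sketched in a few lines. The sketch below takes the sequence of sub-cube side lengths as an explicit parameter `sides` (a hypothetical name), because the balancing values are a matter of analysis and not of correctness; it only requires each side length to divide the previous one. It also assumes 0-based coordinates, reuses answers from earlier rounds instead of re-querying them, and enumerates boundaries naively, so it reflects query counts but not running time.

```python
from itertools import product

def cube(lo, side):
    """All grid points of the cube with lowest corner lo and the given side length."""
    return product(*(range(l, l + side) for l in lo))

def on_boundary(p, lo, side):
    return any(c == l or c == l + side - 1 for c, l in zip(p, lo))

def constant_round_search(f, n, d, sides):
    """Returns (local_minimum, list with the number of new queries per round)."""
    values, per_round = {}, []

    def ask(points):                              # one round of simultaneous queries
        new = [p for p in set(points) if p not in values]
        values.update((p, f(p)) for p in new)
        per_round.append(len(new))

    lo, cur = (0,) * d, n
    for side in sides:
        assert cur % side == 0, "each side length must divide the previous one"
        corners = [tuple(l + o for l, o in zip(lo, off))
                   for off in product(range(0, cur, side), repeat=d)]
        # Map every boundary point to the corner of the sub-cube that owns it.
        batch = {p: c for c in corners for p in cube(c, side) if on_boundary(p, c, side)}
        ask(batch)
        best = min(batch, key=values.get)
        lo, cur = batch[best], side               # recurse into the sub-cube of the minimum
    final = list(cube(lo, cur))
    ask(final)                                    # last round: query the whole current cube
    return min(final, key=values.get), per_round

f = lambda v: (v[0] - 5) ** 2 + (v[1] - 11) ** 2
print(constant_round_search(f, 16, 2, [4]))       # ((5, 11), [192, 4])
```

In the example, the first round queries the 12 boundary points of each of the 16 sub-cubes of side 4, and the last round only needs the 4 interior points of the selected sub-cube.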
An example can be found in the next figure.
2.2 Algorithm for Local Search in Polynomial Rounds
In this section we present a randomized algorithm that runs in a polynomial number of rounds, filling in the gap between the constant rounds algorithm and the fully adaptive algorithm.
Algorithm 2: Local search in polynomial number of rounds.
Input: Size of the instance , dimension , round limit , value function . These are global parameters accessible from any subroutine. Output: Local minimum in .
Set ; ;
Query points chosen u.a.r. in round and set to the minimum of these
Procedure Fractal-like Steepest Descent (FLSD).
Input: size , depth , grid point , round . Output: point with ; if in the process of searching for such a point it finds a local minimum, then it outputs it and halts everything.
a. Set // executed in round
b. If then: // make steps of steepest descent, since is small enough when
   (a) For to : // executed in rounds to
       i. Query all the neighbors of ; let be the minimum among them
       ii. If then: // thus is a local min
           Output and halt all running FLSD and DACS calls
   (b) Return // executed in round
c. For to : // divide the whole task into pieces; executed in rounds to
   (a) FLSD // execute call in parallel with current procedure
   (b) Query the boundary of to find the point with minimum value on it // making a “giant step” of size step
d. For to : // check if each giant step does make giant progress by using the feedback from sub-procedures; executed in round , after was received in Step ca
   (a) If then: // a solution exists in , call DACS to find it
       i. Set DACS // stop and wait for the result of DACS
       ii. Output and halt all running FLSD and DACS calls
e. Return // executed in round
Procedure Divide-and-Conquer Search (DACS).
Input: cube . Output: Local minimum in .
For to :
If contains only one point then:
Set ; break
Partition into disjoint sub-cubes , each with side half that of
Query all the points on the boundary of each sub-cube .
Let be the point with minimum value among all points queried in , including queries made by Algorithm 2 and all FLSD calls // break ties lexicographically
Let be the unique sub-cube with .
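The halving loop of DACS can be sketched as follows, in a minimal form that makes a few assumptions of our own: the cube is given by its lowest corner `lo` and a side length that is a power of two, previously received answers are passed in through the optional dictionary `known`, and the demonstration runs on the whole grid, where no boundary condition on the input cube is needed. When DACS is called on a proper sub-cube, its correctness relies on the condition established by the caller.

```python
from itertools import product

def dacs(f, lo, side, known=None):
    """Returns (point, rounds_used, all_known_values); side must be a power of two."""
    values = dict(known or {})
    d, rounds = len(lo), 0
    while side > 1:
        half = side // 2
        corners = [tuple(l + o for l, o in zip(lo, off))
                   for off in product((0, half), repeat=d)]
        owner = {}                                   # point -> corner of its sub-cube
        for c in corners:
            for p in product(*(range(a, a + half) for a in c)):
                owner[p] = c
                boundary = any(x == a or x == a + half - 1 for x, a in zip(p, c))
                if boundary and p not in values:
                    values[p] = f(p)                 # all of these form one round
        rounds += 1
        # Minimum over every point queried so far inside the current cube,
        # with ties broken lexicographically.
        inside = [p for p in owner if p in values]
        best = min(inside, key=lambda p: (values[p], p))
        lo, side = owner[best], half
    return lo, rounds, values

f = lambda v: (v[0] - 5) ** 2 + (v[1] - 11) ** 2
point, rounds, _ = dacs(f, (0, 0), 16)
print(point, rounds)   # (5, 11) 4
```

The number of rounds equals the base-2 logarithm of the side length, since the side is halved once per round.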
An example with the trace of the execution is shown in Figure 2.
We first establish that Algorithm 2 is correct.
Lemma 4. If the procedure FLSD does return at Step bb or Step e, it will return within rounds after the start round ; otherwise procedure FLSD will halt within at most rounds after the start round . (footnote 4)
Footnote 4: All the FLSD procedures considered in the following analysis are initiated during the execution of Algorithm 2. Thus Lemma 4, Lemma 5 and Lemma 8 may not hold for FLSD with arbitrary parameters.
We proceed by induction on the depth . The base case is when . Then by the definition of and , we have
Also notice that the parameter will be divided by when decreases by one, so when , the current size will be at most . Assume it holds for .
For any , all the queries made by the procedure itself need rounds and each sub-procedure will take at most rounds by the induction hypothesis. Since all the procedures are independent of each other and could be executed in parallel, the total number of rounds needed for this procedure is . The first part of the lemma thus follows by induction.
Finally, recall that the divide-and-conquer procedure DACS takes rounds, so the procedure will halt within rounds. ∎
Proof of Lemma 5.
We proceed by induction on the depth . The base case is when . Then we know that by the same argument as in Lemma 4, thus of steps of steepest descent will ensure that and . Assume the claim holds for ; we show it for .
For any , by Step da we have for any ; by the induction hypothesis, we have for any . Combining them we get for any . Thus
Also notice that the distance from to is at most , i.e., . This concludes the proof of the lemma for any depth . ∎
Lemma 6. The point returned at Step d(a)i is a local minimum.
We use notation to denote the variable in the procedure DACS and to denote the variable in the procedure FLSD which calls the procedure DACS.
By Step da, we have . Then for each , we have by its definition. Therefore the steepest descent from will never leave the cube , and in particular the cube . Let be the cube that consists only of the point in the DACS procedure. The steepest descent from does not leave the cube , which means that is a local optimum. ∎
Lemma 7. Algorithm 2 outputs the correct answer with probability at least .
The point output at Step b(a)ii is always a local optimum. By Lemma 6, the point output at Step d(a)ii is also a local optimum. Thus we only need to argue that Algorithm 2 will output the solution and halt with probability at least . Notice that
Thus after the first round, with probability at least , we have
If inequality (2) holds, then the procedure FLSD should halt within a number of rounds of at most
which is impossible. Thus, the call FLSD must halt within rounds in this case, which completes the argument. ∎
Now consider the total number of queries made by Algorithm 2.
Lemma 8. A call of procedure FLSD makes queries in total, including those made by its sub-procedures.
We proceed by induction on the depth . The base case is when . In this case, the procedure FLSD performs steps of steepest descent, where . Thus it will make queries.
For any depth ,
the number of queries made by Step cb is at most
the number of queries made by all sub-procedures is bounded as follows by the induction hypothesis
the number of queries made by DACS is at most
Thus the total number of queries is , which concludes the proof for all . ∎
2.3 Randomized Lower Bound for Local Search in Constant Rounds
Notation and Definitions
Recall that . Let . We now consider the grid of side length in this subsection for technical convenience.
For a point , let be the grid points that are in the cube region of size with corner point :
Next we define a quasi-segment, which is an approximation of a straight line segment that only uses grid points. Thus will make a mistake with probability at least
where the probability is taken on the staircase and the value of .
Counting the number of good staircases
Counting the number of good staircases is the major technical challenge in our proof. The concept of probability score function will be useful.
Definition 9 (probability score function).
For any algorithm , let be the set of points queried by during its execution. Given a point , for any , define the set of points
The probability score function is .
The probability score function for a good staircase of length is defined as , where is the set of points that have been queried by after the round , if is executed on value function . (footnote 5)
Footnote 5: By the definition of a good staircase, will not query the end point of staircase in rounds . Since is a deterministic algorithm and is deterministic except at the end point of , the set here is uniquely defined.
For a fixed grid point , the following lemma upper bounds the number of quasi-segments that are intersected by a point .
Let , . Let Then for all we have
where is a constant only depending on . In particular, .
Intuitively, is the surface area of a cube of side length , and is the fraction of it blocked by a unit cube on the surface. Let denote the -norm ball of radius , centered at .
Define as the set of points , such that the geometric straight line intersects the unit -norm ball centered at . Recalling the definition of quasi-segment, we have . The following lemma gives an upper bound for
by the volume of a specific region, which is easier to estimate.
Denote as the cone region in ball , defined by the center and the tangent surface of ball .
is upper bounded by the volume of the cone region
Define as the cone region in the ball , defined by the center and the tangent surface of ball . For any point , the geometric straight line also intersects with the ball , because . So we have . We then claim that .
If , we have .
Otherwise, let be any point that is in the intersection of straight line and the ball . We have . Also notice that is a cone region centered at and . By a simple geometric observation, we know that there must be since .
Thus we have
Proof of Lemma 10.
We finish the proof of Lemma 10 by estimating the volume of the cone region . The central angle of cone is upper bounded by . It is a basic fact of high-dimensional geometry that the volume of cone is upper bounded by , where is a constant only depending on . ∎
For any , denote the cost on the probability score function of point incurred by point by
Then the total cost incurred by one point for all is , where is a constant only depending on .
For a fixed point , we only need to consider the cost for point such that , since the cost for other points is zero.
Note the first inequality enumerates by the -norm of and uses Lemma 10. ∎
We can now prove the following key lemma by a method we call two-stage analysis.
If the number of queries issued by algorithm is at most , then of all possible length staircases are good with respect to .
For , denote as the set of all good staircases of length . Then is the number of all length good staircases, and is the fraction of length good staircases. In particular, .
Let’s first fix any length good staircase , and denote the set of all good staircases growing from as . Now consider the sum of the probability score function of these good staircases,