Optimal Adaptive Detection of Monotone Patterns

11/04/2019
by Omri Ben-Eliezer, et al.

We investigate adaptive sublinear algorithms for detecting monotone patterns in an array. Given a fixed 2 ≤ k ∈ N and ε > 0, consider the problem of finding a length-k increasing subsequence in an array f : [n] → R, provided that f is ε-far from free of such subsequences. Recently, it was shown that the non-adaptive query complexity of the above task is Θ((log n)^⌊log₂ k⌋). In this work, we break the non-adaptive lower bound, presenting an adaptive algorithm for this problem which makes O(log n) queries. This is optimal, matching the classical Ω(log n) adaptive lower bound by Fischer [2004] for monotonicity testing (which corresponds to the case k = 2), and implying in particular that the query complexity of testing whether the longest increasing subsequence (LIS) has constant length is Θ(log n).


1 Introduction

For an integer k ≥ 2 and a function (or sequence) f : [n] → R, a length-k monotone subsequence of f is a tuple of k indices, (i_1, …, i_k), such that i_1 < ⋯ < i_k and f(i_1) < ⋯ < f(i_k). More generally, for a permutation π : [k] → [k], a π-pattern of f is given by a tuple of k indices (i_1, …, i_k) with i_1 < ⋯ < i_k such that f(i_ℓ) < f(i_m) whenever ℓ, m ∈ [k] satisfy π(ℓ) < π(m). A sequence f is π-free if there are no subsequences of f with order pattern π. Pattern avoidance and detection in an array is a central problem in theoretical computer science and combinatorics, dating back to the work of Knuth [Knu68] (from a computer science perspective), and Simion and Schmidt [SS85] (from a combinatorics perspective); see also the survey [Vat15]. Studying the computational problem from a sublinear algorithms perspective, Newman, Rabinovich, Rajendraprasad, and Sohler [NRRS17] initiated the study of property testing for forbidden order patterns in a sequence. For a fixed k and a pattern π of length k, we want to test whether a function f : [n] → R is π-free or ε-far from π-free. (A function f is ε-far from π-free if every π-free function differs from f on at least εn inputs.) They explicitly considered the monotone case, π = (1, 2, …, k), as a particularly interesting instance; monotone patterns are naturally connected to monotonicity testing and the longest increasing subsequence, and can shine new light on these classic problems. Note that being free of length-k monotone increasing subsequences is equivalent, as a simple special case of Dilworth’s theorem [Dil50], to being decomposable into k − 1 monotone non-increasing subsequences. The algorithmic task, which is the subject of this paper, is the following:
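As a concrete baseline, the following Python sketch (our illustration; the paper contains no code, and the function name is ours) decides the full-information version of the problem by dynamic programming, and checks the Dilworth decomposition on a small example. The sublinear algorithms below aim to detect such patterns while querying only a tiny fraction of f.

    def has_increasing_subsequence(f, k):
        # best[j] = length of the longest increasing subsequence of f
        # ending at index j, reported as soon as it reaches k.
        n = len(f)
        best = [1] * n
        for j in range(n):
            for i in range(j):
                if f[i] < f[j] and best[i] + 1 > best[j]:
                    best[j] = best[i] + 1
            if best[j] >= k:
                return True
        return False

    # (3, 4, 1, 2) splits into the two non-increasing subsequences (3, 1)
    # and (4, 2), so it is free of length-3 increasing subsequences, while
    # (2, 1, 4, 3, 6) contains, e.g., (1, 4, 6).
    assert not has_increasing_subsequence([3, 4, 1, 2], 3)
    assert has_increasing_subsequence([2, 1, 4, 3, 6], 3)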

For k ∈ N and ε > 0, design a randomized algorithm that, given query access to a function f : [n] → R, distinguishes with probability at least 2/3 between the case that f is free of length-k monotone subsequences and the case that it is ε-far from free of length-k monotone subsequences.

This paper gives an algorithm with optimal dependence on n for solving the above problem. We state the main theorem next, and discuss connections to monotonicity testing and the LIS shortly after.

Theorem 1. Fix k ∈ N. For any ε > 0, there exists an algorithm that, given query access to a function f : [n] → R which is ε-far from (1, 2, …, k)-free, outputs a length-k monotone subsequence of f with probability at least 2/3, with query complexity and running time O_{k,ε}(log n). (Throughout the introduction, we allow the hidden constant to depend on k and ε; the precise bound we obtain is stated in Lemma 3.)

The algorithm underlying Theorem 1 is adaptive (an algorithm is non-adaptive if its queries do not depend on the answers to previous queries, or, equivalently, if all queries to the function can be made in parallel; otherwise, the algorithm is adaptive), and it solves the testing problem with one-sided error (that is, with perfect completeness: it always outputs “yes” when f is pattern-free), since a length-k monotone subsequence is evidence for f not being (1, …, k)-free. The algorithm improves on a recent result of Ben-Eliezer, Canonne, Letzter and Waingarten [BECLW19], who gave a non-adaptive algorithm for finding length-k monotone patterns with query complexity O_{k,ε}((log n)^⌊log₂ k⌋), which in itself improved upon a (log n)^{O(k²)} upper bound by Newman et al. [NRRS17]. The focus of [BECLW19] was on non-adaptive algorithms, and they gave a matching lower bound of Ω((log n)^⌊log₂ k⌋) queries for non-adaptive algorithms achieving one-sided error. Hence, Theorem 1 implies a natural separation between the power of adaptive and non-adaptive algorithms for finding monotone subsequences.

Theorem 1 is optimal, even among two-sided error algorithms. In the case k = 2, corresponding to monotonicity testing, there is an Ω(log n) lower bound (for constant ε; the precise form of the bound depends on ε) for both non-adaptive and adaptive algorithms [EKK00, Fis04, CS14], even with two-sided error. A simple reduction suggested in [NRRS17] shows that the same lower bound (up to a multiplicative factor depending on k) holds for any fixed k > 2. Thus, an appealing consequence of Theorem 1 is that the natural generalization of monotonicity testing, which considers forbidden monotone patterns of fixed length longer than 2, does not affect the query complexity by more than a constant factor. Interestingly, Fischer [Fis04] shows that for any adaptive algorithm for monotonicity testing on the line [n] there is a non-adaptive algorithm at least as good in terms of query complexity (even when restricting to one-sided error algorithms). That is, adaptivity does not help at all for k = 2. In contrast, the separation between our adaptive upper bound and the non-adaptive lower bound of [BECLW19] implies that this is no longer true for k ≥ 3.

As an immediate consequence, Theorem 1 gives an optimal testing algorithm for the longest increasing subsequence (LIS) problem in a certain regime. The classical LIS problem asks one to determine, given a sequence f, the maximum k for which f contains a length-k increasing subsequence. It is very closely related to other fundamental algorithmic problems on sequences, like the computation of edit distance, Ulam distance, or distance from monotonicity (for example, the latter equals n minus the LIS length), and was thoroughly investigated from the perspective of sublinear-time algorithms [PRR06, ACCL07, SS17, RSSS19] and streaming algorithms [GJKK07, SW07, GG10, SS13, EJ15, NS15]. In the property testing regime, the corresponding decision task is to distinguish between the case where f has LIS length at most k (where k is given as part of the input) and the case that f is ε-far from having such a LIS length. Theorem 1, in combination with the aforementioned lower bounds (which readily carry over to this setting), yields a tight bound on the query complexity of testing whether the LIS length is a constant: fix k ∈ N and ε > 0; then the query complexity of testing whether f : [n] → R has LIS length at most k is Θ_{k,ε}(log n).
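Since the relation between LIS length and distance from monotonicity comes up repeatedly, here is a short self-contained illustration (ours) of both quantities, via the standard patience-sorting computation of the LIS:

    import bisect

    def lis_length(f):
        # tails[t] = smallest possible last value of an increasing
        # subsequence of length t + 1; patience sorting, O(n log n).
        tails = []
        for v in f:
            t = bisect.bisect_left(tails, v)
            if t == len(tails):
                tails.append(v)
            else:
                tails[t] = v
        return len(tails)

    # The distance from monotonicity (minimum number of entries one must
    # change to make f increasing) equals n minus the LIS length: keep a
    # longest increasing subsequence and rewrite everything else around it.
    f = [5, 1, 6, 2, 7, 3]
    print(lis_length(f))           # 3, e.g. (1, 2, 3)
    print(len(f) - lis_length(f))  # 3, the distance from monotonicity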

1.1 Related Work

Considering general permutations π of length k and exact computation, Guillemot and Marx [GM14] showed how to find a π-pattern in a sequence of length n in time 2^{O(k² log k)} · n, later improved by Fox [Fox13] to 2^{O(k²)} · n. In the regime where k may grow with n, an algorithm of Berendsohn, Kozma, and Marx [BKM19] provides the state of the art.

For approximate computation of general patterns π, the works [NRRS17, BC18] investigate the query complexity of property testing for forbidden order patterns. When π is of length k = 2, the problem considered is equivalent to testing monotonicity, one of the most widely studied problems in property testing, with works spanning the past two decades. Over the years, variants of monotonicity testing over various partially ordered sets have been considered, including the line [n] [EKK00, Fis04, Bel18, PRV18, Ben19], the Boolean hypercube {0,1}^d [DGL99, BBM12, BCGSM12, CS13, CST14, CDST15, KMS15, BB15, CS16, CWX17, CS19], and the hypergrid [n]^d [BRY14, CS14, BCS18]. We refer the reader to [Gol17, Chapter 4] for more on monotonicity testing, and for a general overview of the field of property testing (introduced in [RS96, GGR98]).

Understanding the power of adaptivity seems to be a notoriously difficult problem in property testing. In the context of testing for forbidden order patterns, non-adaptive algorithms are rather weak: the non-adaptive query complexity grows polynomially with n for all non-monotone order patterns [NRRS17], and approaches linear in n (as k grows) for most order patterns of length k [BC18]. Prior to our work (which shows a separation between adaptive and non-adaptive algorithms for monotone patterns), the only case in which adaptive algorithms were known to outperform their non-adaptive counterparts was that of non-monotone patterns of length 3 [NRRS17], and an intriguing conjecture from the same paper suggests that, in fact, the query complexity of testing π-freeness is polylogarithmic in n for any fixed-length π – depicting an exponential separation from the non-adaptive case (for non-monotone patterns).

1.2 Main Ideas and Techniques

We now describe the intuition behind the proof of Theorem 1. There are two main technical components: 1) a new structural result for functions with many length-k monotone subsequences, which strengthens a theorem of [BECLW19], and 2) new (adaptive) algorithmic components which lead to the O_{k,ε}(log n)-query algorithm. We start by explaining the polylogarithmic-query upper bound of Newman et al. [NRRS17] and the structural decomposition of [BECLW19].

Fix k and ε, and suppose that f is ε-far from (1, …, k)-free, that is, ε-far from free of length-k increasing subsequences. Notice that f must contain a collection C of at least εn/k pairwise-disjoint increasing subsequences of length k. (Otherwise, greedily eliminating these subsequences gives a (1, …, k)-free function differing from f in strictly less than εn inputs.) For simplicity, consider first k = 2 (which corresponds to the classical problem of monotonicity testing). For any x ∈ [n], we say that x cuts the pair (i, j) ∈ C with slack if, roughly, i + (j − i)/4 ≤ x ≤ j − (j − i)/4, or, informally, if x lies roughly “in the middle” of i and j. Additionally, the width of the pair (i, j) is ⌈log₂(j − i)⌉. Define the collection of copies from C of width w around x by

C_x(w) = {(i, j) ∈ C : x cuts (i, j) with slack, and (i, j) has width w}.

Finally, the density of copies from C of width w around x, and the total density of C around x, are defined by

δ_x(w) = |C_x(w)| / 2^w   and   δ(x) = Σ_{w ∈ [log n]} δ_x(w).

A polylogarithmic-query algorithm.

Fix a location x ∈ [n] and a width w ∈ [log n], and consider drawing O(1/δ_x(w)) indices from the interval [x − 2^w, x + 2^w] uniformly at random, querying f in all of these locations. Letting m be the median of the set {f(i) : (i, j) ∈ C_x(w)}, if we manage to query the “1-entry” i of some (i, j) ∈ C_x(w) where f(i) ≤ m, and the “2-entry” j′ of some (i′, j′) ∈ C_x(w) where f(i′) ≥ m, then (i, j′) would form a valid (1, 2)-pattern, since i < x < j′ and f(i) ≤ m ≤ f(i′) < f(j′). By definition, the number of such entries i, as well as the number of such entries j′, within [x − 2^w, x + 2^w] is at least δ_x(w) · 2^w / 2. Therefore, with good probability, O(1/δ_x(w)) uniform queries from the interval will hit at least one such i and one such j′, which would form the desired (1, 2)-pattern.

We claim that many values of x have some width w where the density δ_x(w) is large. First, a simple double counting argument shows that Σ_{x ∈ [n]} δ(x) = Ω(εn). On the other hand, for any width w we have δ_x(w) = O(1) (as the copies in C are disjoint), and so δ(x) = O(log n). Consequently, the probability that a uniformly random x ∈ [n] satisfies δ(x) = Ω(ε) is Ω(ε / log n). It suffices to pick O(log n / ε) uniformly random x in order for one of them to satisfy δ(x) = Ω(ε) with high probability; and, if this event holds, then there exists w for which δ_x(w) = Ω(ε / log n). We now leverage the querying paradigm described in the previous paragraph: if for any x as above and any w ∈ [log n] we query O(log n / ε) uniform locations in [x − 2^w, x + 2^w], then we shall find a (1, 2)-pattern with good probability. In total, this procedure makes O((log n)³ / ε²) non-adaptive queries.
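The following sketch (ours; the constants and parameter names are illustrative, not the paper’s) implements the k = 2 procedure just described: for a few random centers x it tries every width w, queries a small batch of random positions around x, and reports any increasing pair that straddles x.

    import random

    def nonadaptive_pair_tester(f, centers=20, per_width=20):
        # Non-adaptive tester sketch for k = 2: all queried positions can
        # be chosen before reading any value of f.
        n = len(f)
        for _ in range(centers):
            x = random.randrange(n)
            w = 1
            while 2 ** w <= n:
                lo, hi = max(0, x - 2 ** w), min(n, x + 2 ** w)
                sample = random.choices(range(lo, hi), k=per_width)
                left = [i for i in sample if i < x]
                right = [j for j in sample if j > x]
                if left and right:
                    i = min(left, key=lambda t: f[t])   # lowest value left of x
                    j = max(right, key=lambda t: f[t])  # highest value right of x
                    if f[i] < f[j]:
                        return (i, j)  # a (1, 2)-pattern
                w += 1
        return None

Taking centers and per_width of order (log n)/ε matches the polylogarithmic query count sketched above.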

To deal with general fixed k and ε, the (essentially) same reasoning is applied recursively, leading to the (log n)^{O(k²)}-query algorithm of [NRRS17].

Structural decomposition.

[BECLW19] established a structural theorem for functions that are ε-far from (1, …, k)-free, which led to improved non-adaptive algorithms. Specifically, they show that any f which is ε-far from (1, …, k)-free satisfies at least one of two conditions: either f contains many growing suffixes, or it can be decomposed into splittable intervals. For the purpose of this discussion, let C be any collection of Ω_{k,ε}(n) disjoint (1, …, k)-copies in f. (To simplify the discussion, in the rest of this exposition we will generally not be interested in the exact dependence on the parameters k and ε, and for convenience we often use notions like Ω_{k,ε}(·) and O_{k,ε}(·) that hide this dependence.)

  • Growing suffixes: there exist Ω_{k,ε}(n) values of x ∈ [n] for which δ(x) = Ω_{k,ε}(1), while δ_x(w) is small for every individual width w. (We have previously defined the notions of cutting with slack and density only for the case k = 2, but they generalize rather naturally to any k. First, define the gap index of a (1, …, k)-pattern of f in locations i_1 < ⋯ < i_k as the smallest integer ℓ maximizing i_{ℓ+1} − i_ℓ; the above copy is cut by x with slack if x lies roughly in the middle of the gap [i_ℓ, i_{ℓ+1}]. The gap-width of the copy is ⌈log₂(i_{ℓ+1} − i_ℓ)⌉. The definitions of C_x(w), δ_x(w) and δ(x) can then be generalized in a straightforward manner, replacing “width” with “gap-width” wherever relevant.) In other words, for many x, the sum of local densities of (1, …, k)-patterns in intervals of growing widths around x is not too small, and furthermore, the densities are not concentrated on any small set of widths w. Any such x is said to be the starting point of a growing suffix.

  • Splittable intervals: there exist 1 ≤ ℓ < k and a collection of pairwise-disjoint intervals I_1, …, I_s ⊆ [n] with Σ_{j=1}^{s} |I_j| = Ω_{k,ε}(n), so that each I_j contains a dense collection of disjoint (1, …, k)-patterns of a particular structure. Specifically, each such interval I_j can be partitioned into three disjoint intervals L_j, M_j, R_j (in this order), each of size |I_j|/3, where I_j fully contains Ω_{k,ε}(|I_j|) disjoint copies of (1, …, k)-patterns in which the first ℓ entries lie in L_j and the last k − ℓ entries lie in R_j (none of these entries lies in M_j).

The high-level idea is as follows. First, a greedy rematching procedure from [BECLW19] shows that any f that is ε-far from (1, …, k)-free must contain a collection C of (1, …, k)-patterns with |C| = Ω_{k,ε}(n), where all copies in C have the same gap index ℓ, and C also satisfies a compatibility requirement of the following flavor: if two copies in locations (i_1, …, i_k) and (i′_1, …, i′_k) satisfy i_ℓ < i′_ℓ but have rather different gap-widths, then f(i_ℓ) < f(i′_ℓ). Fix such a choice of C and ℓ for the rest of the discussion.

Now, what happens if some x ∈ [n] is the start of a growing suffix, and for every w ∈ [log n] we query f on a few uniformly random entries from the interval [x + 2^{w−1}, x + 2^w]? A simple quantitative analysis (which we omit here) based on the growing suffixes condition shows that, with good probability, there will be a collection of k copies c_1, …, c_k ∈ C with respective gap-widths w_1 < w_2 < ⋯ < w_k (say), such that the algorithm queries the ℓ-entry – call it a_t – of each c_t. Since x cuts all the above copies with slack, it must hold that a_1 < a_2 < ⋯ < a_k.

In view of our requirement on C, it immediately follows that f(a_1) < f(a_2) < ⋯ < f(a_k). In particular, we have detected a (1, …, k)-copy (a_1, …, a_k), as desired. To summarize, if we pick O_{k,ε}(1) uniformly random choices of x, and apply the above querying procedure for each such x, a (1, …, k)-copy will be detected with good probability, assuming the growing suffixes condition holds. The total (non-adaptive) query complexity is O_{k,ε}(log n), as claimed.
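A compact sketch of this suffix-sampling step (ours; the constants and the “well-behaved” bookkeeping are suppressed, and per_scale is an illustrative parameter): it queries a few random positions at every dyadic scale to the right of x, then checks whether the queried values already contain a length-k increasing subsequence.

    import random

    def growing_suffix_step(f, x, k, per_scale=5):
        # Query per_scale random positions in [x + 2^(w-1), x + 2^w) at
        # every scale w, for O(per_scale * log n) queries in total.
        n = len(f)
        queried = []
        w = 1
        while x + 2 ** (w - 1) < n:
            lo, hi = x + 2 ** (w - 1), min(n, x + 2 ** w)
            queried += random.choices(range(lo, hi), k=per_scale)
            w += 1
        queried = sorted(set(queried))
        # Longest increasing subsequence over the sampled positions,
        # reported as soon as it reaches k.
        best = [1] * len(queried)
        for j in range(len(queried)):
            for i in range(j):
                if f[queried[i]] < f[queried[j]]:
                    best[j] = max(best[j], best[i] + 1)
            if best[j] >= k:
                return True
        return False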


When the growing suffixes condition does not hold, [BECLW19] recursively applies the splittable intervals condition multiple times to get a non-adaptive algorithm making O_{k,ε}((log n)^⌊log₂ k⌋) queries. This cannot be improved non-adaptively, due to the matching lower bound proved there. Thus, in order to achieve a query complexity of O_{k,ε}(log n) under the splittable intervals condition, our algorithm must be adaptive.

[BECLW19] proceeds by devising an O_{k,ε}(log n)-query non-adaptive algorithm for the growing suffixes case, and an O_{k,ε}((log n)^⌊log₂ k⌋)-query non-adaptive algorithm for the splittable intervals case. Thus, in order to obtain an O_{k,ε}(log n)-query adaptive algorithm, it suffices to develop such an algorithm under the splittable intervals assumption.

Robustifying the structural decomposition.

We now hope to devise an O_{k,ε}(log n)-query adaptive algorithm for finding a (1, …, k)-copy, assuming that f satisfies the splittable intervals condition. Perhaps the most natural approach, in view of the splittable intervals condition, is as follows: (i) approximate, in some way, the endpoints of some splittable interval I_j and of its left and right parts L_j and R_j, while enumerating over the gap index ℓ; (ii) make a recursive call searching for a (1, …, ℓ)-copy contained in the left part L_j and a (1, …, k − ℓ)-copy in the right part R_j, with the hope that they will combine together into a (1, …, k)-copy (for this to happen, we also need them to be compatible, in the sense that all values of the first copy lie below all values of the second).

In order to carry on with the approach suggested above, one can try to locate a “1-entry” lying in the left part and a compatible entry lying in the right part (here and henceforth we fix ℓ and assume the gap index of the copies in C to equal ℓ). Since, in total, Ω_{k,ε}(n) entries serve as the 1-entry lying in the left part of some interval I_j, it takes only a constant number of (non-adaptive) uniformly random queries in order to hit one such 1-entry x. Suppose, then, that such a (sufficiently well-behaved) “1-entry” x was hit, and note that the length of the containing splittable interval I_j is unknown to us. The task now is to approximate this length, or (roughly) equivalently, to find (ℓ + 1)-entries compatible with x which lie in the right part, R_j.

Inspired by the previous approaches, we can try to uniformly sample elements to the right of x at all possible scales; that is, to sample a few such elements in [x, x + 2^w], for every possible w ∈ [log n]. Among the queried elements, only those elements y satisfying f(y) > f(x) can serve as candidates to be (ℓ + 1)-entries in R_j, and it can be shown that if x is indeed a (well-behaved) 1-entry of some copy in C, then “true positives” – well-behaved (ℓ + 1)-entries y ∈ R_j satisfying f(y) > f(x) – will, indeed, be queried by this procedure. However, we might overshoot and see many “false positives” among the queries: elements y that satisfy f(y) > f(x), yet do not belong to the interval I_j, and in fact satisfy that y − x is much bigger than |I_j|. Without the ability to deal with overshooting, or to distinguish true positives from false ones, it is unclear how to determine, or even approximate, the length of I_j, and we are seemingly stuck.

The splittable intervals condition, however, does not seem strong enough for our purposes: in order to utilize it, one would seemingly have to “identify”, in some way, which parts of our sequence constitute splittable intervals, and it is not clear how to do so efficiently. In order to bypass this issue, we substantially strengthen the structural theorem. The stronger statement asserts that any f that is ε-far from (1, …, k)-free either satisfies the growing suffixes condition, defined previously, or a robust version of the splittable intervals condition, defined as follows.

  • Robust splittable intervals: there exist 1 ≤ ℓ < k and a collection of pairwise-disjoint intervals I_1, …, I_s satisfying the same properties as in the “splittable intervals” setting described above (with a slightly different dependence on k and ε in the Ω_{k,ε}(·) terms). Additionally, any interval J ⊆ [n] which contains an interval I_j is itself far from (1, …, k)-free; i.e., it contains a collection of Ω_{k,ε}(|J|) disjoint (1, …, k)-copies.

The (stronger) robust splittable intervals condition follows from the original condition of (non-robust) splittable intervals; the proof is strikingly simple, relying on an elementary counting argument. Let I_1, …, I_s be the splittable intervals given by the non-robust setting. Let B be the collection of all intervals J ⊆ [n] for which there exists an interval I_j ⊆ J such that J does not contain δ|J| disjoint (1, …, k)-copies (for a sufficiently small δ > 0). Finally, let B′ ⊆ B be a minimal subcollection of B covering the same set of points as B. By the minimality, it can be shown that any point is contained in at most three intervals from B′, and so Σ_{J ∈ B′} |J| ≤ 3n. Thus, the total number of disjoint (1, …, k)-copies in intervals from B′ (and so, in intervals from B) is bounded by 3δn, meaning in particular that the intervals I_j that are not contained in any interval of B still contain Ω_{k,ε}(n) disjoint (1, …, k)-copies in total, provided that δ is small enough. One can now verify that this remaining collection of intervals satisfies all requirements of the robust splittable intervals condition.
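In symbols (our rendering of the counting step): writing c = c(k, ε) for the hidden constant, so that the intervals I_1, …, I_s contain at least cn disjoint copies in total, the copies lost to bad intervals number at most

    Σ_{J ∈ B′} δ|J| ≤ δ · Σ_{J ∈ B′} |J| ≤ 3δn,

so choosing, say, δ ≤ c/6 leaves at least cn − 3δn ≥ cn/2 disjoint copies inside the surviving intervals.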

Towards an algorithm.

At a high level, the algorithms of [BECLW19] and [NRRS17] proceed in a recursive manner where each step tries to find the relevant width (which is one of log n options). Since their algorithms are non-adaptive, they consider all log n options in each recursive step, and hence suffer a logarithmic factor with each step. Since our algorithm is adaptive, we want to choose a single width to recurse on. The algorithm will ensure that the width considered is large enough. When the chosen width is not too much larger than the true one, our recursive step proceeds similarly to [NRRS17]; we call this the fitting case. However, the width considered may be too large; we call this case overshooting. In order to deal with the overshooting case, we algorithmically utilize the robust structural theorem in a somewhat surprising manner in order to detect a (1, …, k)-copy.

We now expand on the above idea and provide an informal description. As [BECLW19] gives an O_{k,ε}(log n)-query algorithm when our function satisfies the growing suffixes condition, we may assume that f satisfies the robust splittable intervals condition. Consider sampling, for O_{k,ε}(1) repetitions, an index x ∈ [n] uniformly at random, and, for each w ∈ [log n], a uniformly random index y_w ∈ [x, x + 2^w]. Consider the following event:

The index x is a (sufficiently well-behaved) first element in some (1, …, k)-pattern falling in some robust splittable interval I_j, and, for some w satisfying 2^w = Θ(|I_j|), the index y_w is a (well-behaved) (ℓ + 1)-th element in some (1, …, k)-pattern falling in I_j. (Recall that, in the first polylogarithmic-query algorithm described above, we hoped to hit a “1-entry” whose value is no higher than some suitable median value; the “well-behaved” requirements here are of a similar flavor, and do not incur more than a constant overhead in the query complexity.)

We claim that the above event occurs with high (constant) probability for at least one choice of x, and that when this event does occur, the algorithm can be recursively applied without incurring a multiplicative logarithmic factor. Indeed, suppose that the above holds for some x. (More precisely, our algorithm runs this procedure for each of our O_{k,ε}(1) choices of x, without “knowing” which of them satisfies the above event. Since the total number of choices is constant, this incurs only a constant overhead.) We set y to be y_{w⋆}, where w⋆ is the largest w such that f(y_w) > f(x) holds, and notice in particular that w⋆ is at least as large as the width w witnessing the event. This means that f(y) > f(x) and that y − x = Ω(|I_j|).

The fitting case occurs when 2^{w⋆} (achieving the maximum above) is roughly the same as |I_j|. To handle this case, we recurse by finding a (1, …, ℓ)-pattern and a (1, …, k − ℓ)-pattern inside [x, y]. At a high level, if one takes O_{k,ε}(1) independent uniform samples from [x, y], then one of them is likely to fall in the middle part M_j of I_j, splitting [x, y] into a left part containing L_j and a right part containing R_j, and allowing us to proceed recursively. While this description omits a few details, the intuition proceeds similarly to [NRRS17], except that the recursion occurs only for one width, namely w⋆, and hence does not lose multiplicative logarithmic factors as in the previous approaches.


Recall the above proposed approach for approximating an interval I_j: hitting a 1-entry from L_j and a compatible entry from R_j. Let x be a candidate for a 1-entry, which we assume to lie in some splittable interval I, whose characteristics are unknown to us at this point; the probability of a uniformly random x to indeed be a valid 1-entry in some interval I_j is Ω_{k,ε}(1), so we can take O_{k,ε}(1) choices of x and run what follows for each of them separately. For each w ∈ [log n], make O_{k,ε}(1) random queries in [x, x + 2^w], and take y to be the rightmost (in terms of location) among the queried elements z for which f(z) > f(x). Assuming that x is indeed a valid 1-entry, with good probability we query an element which is an (ℓ + 1)-entry in R ⊆ I. Assuming that such an element is indeed queried, our algorithm treats separately the overshooting case – the case where y is not contained in the interval I – and the fitting case, in which y is in the interval I (or in close proximity to it). Note that our algorithm does not “know” at this point which of the cases holds, if at all.

Handling the overshooting case.

The strong guarantee given by the robust splittable intervals condition adds a “for all” element to the structural characterization, which makes it possible to treat the problem posed by overshooting in a rather surprising and non-standard way. Given x and y as in the last paragraph, if y − x is large enough as a function of |I|, then there exist intervals J_1, …, J_m ⊆ [x, y] satisfying the following conditions:

  • J_1 lies immediately after the interval I (which is the interval containing x).

  • J_{i+1} lies immediately after J_i, for any i < m.

  • |J_{i+1}| ≥ C|J_i| for any i < m, where C is large enough as a function of k and ε. Moreover, |J_1| ≥ C|I|.

For any i, set J′_i to be the minimal interval containing both I and J_i. The robust splittable intervals condition asserts that (since each J′_i contains the splittable interval I) the number of disjoint (1, …, k)-copies in J′_i is proportional to |J′_i|, and, provided that C is large enough, this means that J_i itself also contains a collection C_i of Ω_{k,ε}(|J_i|) disjoint (1, …, k)-copies. We now define two sets A_i and B_i as follows. Initialize both sets to be empty, and for any copy c ∈ C_i, add c to A_i if all of its values are at least f(x), and otherwise add it to B_i. At the end of the process, either A_i contains Ω_{k,ε}(|J_i|) disjoint copies whose smallest value is at least f(x), or B_i contains Ω_{k,ε}(|J_i|) disjoint copies whose smallest value is below f(x).

This seemingly innocent combinatorial idea can be exploited non-trivially to find a (1, …, k)-copy. Specifically, the algorithm handling overshooting aims to find (recursively) shorter increasing subsequences in the intervals J_i, with the hope of combining them into a (1, …, k)-copy. (Technically speaking, to make sure that shorter increasing subsequences can be combined into a longer one, each recursive call of the algorithm receives, as part of its input parameters, the range of values in which it is required to find an increasing subsequence, as well as the interval in which the subsequence should reside. For example, if we require one call of the algorithm to return values below some threshold t and locations in J_i, while another call receives the value range above t and an interval lying to the right of J_i as its input, then we can rest assured that outputs from these two calls can be combined into a longer increasing subsequence.) Concretely, for any i, we make two recursive calls of our algorithm on J_i:

  1. The first recursive search is for a (k − 1)-increasing subsequence in the interval J_i, whose values are at least f(x). (More accurately, the range of allowed values for the recursive call is the intersection of the input range with the values at least f(x), and the interval in which this call should operate is J_i.)

  2. The second search is for a (k − 1)-increasing subsequence in J_i, with values smaller than f(x).

By induction, the first recursive call on J_i succeeds with good probability if A_i is large, while the second call succeeds with good probability if B_i is large. Since, for any i, either A_i or B_i must be large, at least one of the following must hold.

  • Some A_i is large. In this case, with good probability the first call on J_i returns a (k − 1)-increasing subsequence all of whose values lie above f(x). By concatenating x with this subsequence, a length-k increasing subsequence is formed, as desired.

  • Some B_i is large. In this case, with good probability the second call on J_i will return a (k − 1)-increasing subsequence with all values smaller than f(x) < f(y), which, combined with y (lying to the right of J_i), forms a (1, …, k)-copy.

  • There exist i < i′ where both B_i and A_{i′} are large. Here, with good probability, an ℓ′-increasing subsequence from J_i with maximum value less than f(x) and a (k − ℓ′)-increasing subsequence from J_{i′} with minimum value at least f(x) will be found (the recursive calls are configured with the appropriate lengths), and together they can be concatenated to form a length-k increasing subsequence, as desired.

In all cases, a length-k increasing subsequence is found with good probability. This rather surprising technique settles the overshooting case, where y is far from the interval I in which x lies. We now turn to the other regime, where a more standard approach suffices.
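The case analysis above can be made concrete with the following self-contained toy (ours, not the paper’s algorithm): brute-force search stands in for the recursive calls, and we assume x lies to the left of every interval in J, y lies to their right, and f(x) < f(y).

    def find_increasing(f, lo, hi, length, vmin=float("-inf"), vmax=float("inf")):
        # Brute-force stand-in for a recursive call: a `length`-increasing
        # subsequence at positions in [lo, hi) with values strictly inside
        # (vmin, vmax); returns its indices, or None.
        def extend(start, prev, need):
            if need == 0:
                return []
            for i in range(start, hi):
                if vmin < f[i] < vmax and f[i] > prev:
                    rest = extend(i + 1, f[i], need - 1)
                    if rest is not None:
                        return [i] + rest
            return None
        return extend(lo, float("-inf"), length)

    def combine_overshoot(f, x, y, J, k):
        # Toy rendering of the overshooting case analysis; each pair
        # (lo, hi) in J plays the role of one interval J_i.
        for lo, hi in J:
            above = find_increasing(f, lo, hi, k - 1, vmin=f[x])
            if above is not None:
                return [x] + above            # case 1: prepend x
            below = find_increasing(f, lo, hi, k - 1, vmax=f[x])
            if below is not None:
                return below + [y]            # case 2: append y
        for l in range(1, k):                 # case 3: stitch across intervals
            for a, (lo, hi) in enumerate(J):
                low = find_increasing(f, lo, hi, l, vmax=f[x])
                if low is None:
                    continue
                for lo2, hi2 in J[a + 1:]:
                    high = find_increasing(f, lo2, hi2, k - l, vmin=f[x])
                    if high is not None:
                        return low + high
        return None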

Handling the fitting case.

Now suppose that x and y are as above, and that y − x = O(|I|). In this case, x and y serve as relatively good estimates for the endpoints of I. This implies that there is a family C′ of Ω_{k,ε}(|I|) disjoint (1, …, k)-copies inside [x, y] such that the ℓ-entry of any copy in C′ lies to the left of, and below, each (ℓ + 1)-entry of any copy in C′.

We make O_{k,ε}(1) random queries in order to find, with good probability, an element z such that for at least Ω_{k,ε}(|I|) of the subsequences in C′, their (ℓ + 1)-entry lies to the right of z while their ℓ-entry lies to the left of z (note that any z ∈ M suffices for this purpose). Next, we make O_{k,ε}(1) random queries in order to find, with good probability, an element z′ such that f(z′) lies below the (ℓ + 1)-entry of many of the subsequences in C′ whose (ℓ + 1)-entries are to the right of z, and above the ℓ-entry of many of the subsequences in C′ whose ℓ-entry is to the left of z. Putting everything together, we find that there exists a collection of Ω_{k,ε}(|I|) disjoint ℓ-increasing subsequences to the left of z whose values lie (strictly) below f(z′), and a collection of Ω_{k,ε}(|I|) disjoint (k − ℓ)-increasing subsequences to the right of z whose values lie above f(z′). Two recursive calls of the algorithm will then find, with high probability, an ℓ-increasing subsequence to the left of z that lies below f(z′), and a (k − ℓ)-increasing subsequence to the right of z that lies above f(z′), which together form the required length-k increasing subsequence.
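In the same toy style, a sketch of one fitting-case step (ours; brute_find stands in for the recursive calls, and tries is an illustrative repetition count):

    import itertools, random

    def brute_find(f, lo, hi, length, vmin, vmax):
        # Brute-force stand-in for a recursive call: a `length`-increasing
        # subsequence within positions [lo, hi) and values in [vmin, vmax).
        idx = [i for i in range(lo, hi) if vmin <= f[i] < vmax]
        for comb in itertools.combinations(idx, length):
            if all(f[a] < f[b] for a, b in zip(comb, comb[1:])):
                return list(comb)
        return None

    def fitting_split(f, x, y, l, k, tries=50):
        # Guess a split position z in (x, y) and a pivot value f(z'), then
        # search for an l-piece on the left below the pivot and a
        # (k - l)-piece on the right at or above it.
        for _ in range(tries):
            z = random.randrange(x + 1, y)
            pivot = f[random.randrange(x, y + 1)]
            left = brute_find(f, x, z, l, float("-inf"), pivot)
            right = brute_find(f, z, y + 1, k - l, pivot, float("inf"))
            if left and right:
                return left + right
        return None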


Putting it all together.

To summarize, our adaptive O_{k,ε}(log n)-query algorithm to find a length-k increasing subsequence in f, assuming f is ε-far from free of such subsequences, goes as follows.

  • First, we try to apply the (non-adaptive) algorithm for finding growing suffixes. If the growing suffixes condition holds, then this step will find the desired increasing subsequence with good probability.

  • Otherwise, the robust splittable intervals condition holds. We pick O_{k,ε}(1) candidates x for 1-entries, and, for each such x and each w ∈ [log n], we sample O_{k,ε}(1) random elements in [x, x + 2^w].

  • Let y be the rightmost queried element for which f(y) > f(x). We run the overshooting algorithm. If x is indeed a well-behaved 1-entry of some splittable interval I and y is an overshoot with respect to x and I, then this step will find a length-k increasing subsequence with good probability.

  • Otherwise, y is not an overshoot with respect to x, and we employ the algorithm for the fitting case.

To analyze the query complexity, note that the only point in the algorithm where the number of queries depends on n is the second step, where elements are sampled at all possible scales with respect to x. It is not hard to verify that the query complexity is thus of the form O_{k,ε}(log n). Indeed, it is bounded by an expression of the form

Σ_p C(p) · log n,

where p ranges over all tuples of parameters for the recursive calls made by our algorithm (these parameters depend only on k and ε, and not on n); in all such tuples, the recursion depth is bounded by k and the number of calls per level is O_{k,ε}(1), which gives the desired bound.
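Schematically (our rendering of this bound): if Q_j(n) denotes the query complexity of a call searching for a length-j pattern, the above amounts to a recursion of the form

    Q_j(n) ≤ C(k, ε) · log n + C(k, ε) · Q_{j−1}(n),   Q_1(n) = O(1),

which unrolls, over the at most k levels, to Q_k(n) ≤ (2C(k, ε))^k · log n = O_{k,ε}(log n).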


1.3 Notation

All logarithms considered are base 2. We consider functions f : [n] → R, where n ∈ N, as the inputs and main objects of study. An interval in [n] is a set of the form {a, a + 1, …, b}, for some a ≤ b. At many places throughout the paper, we think of augmenting the image with a special character ⋆, considering functions f : [n] → R ∪ {⋆}. The character ⋆ can be thought of as a masking operation: in many cases, we will only be interested in entries x of f for which f(x) lies in some prescribed (known in advance) range of values, so entries whose values fall outside this range are marked by ⋆. Whenever the algorithm queries x and observes f(x) = ⋆, it interprets the result as a value that is incomparable (with respect to the ordering of R) to all others. As a result, ⋆-values will never be part of monotone subsequences. We note that augmenting the image with ⋆ was unnecessary in [NRRS17, BECLW19], because those works only considered non-adaptive algorithms. We say that, for a fixed k, the set T is a collection of disjoint monotone subsequences of length k if it consists of k-tuples (i_1, …, i_k), where i_1 < ⋯ < i_k and f(i_1) < ⋯ < f(i_k), and, furthermore, any two tuples in T are disjoint (as sets of indices). We also write E(T) for the union of the indices appearing in the k-tuples of T, i.e., E(T) = ∪_{(i_1, …, i_k) ∈ T} {i_1, …, i_k}. Finally, we let poly(·) denote a large enough polynomial whose degree is (bounded by) a universal constant.
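For concreteness, a value-restricted query wrapper of this kind might look as follows (a minimal sketch, ours; the class name MaskedOracle is hypothetical, with None standing in for ⋆, and list access standing in for query access):

    class MaskedOracle:
        # Wraps query access to f so that values outside a half-open range
        # [vmin, vmax) read as the incomparable symbol ⋆ (here: None),
        # which can never participate in a monotone subsequence.
        def __init__(self, f, vmin=float("-inf"), vmax=float("inf")):
            self.f, self.vmin, self.vmax = f, vmin, vmax

        def query(self, i):
            v = self.f[i]
            return v if self.vmin <= v < self.vmax else None

    oracle = MaskedOracle([5, 1, 6, 2, 7, 3], vmin=2, vmax=7)
    print([oracle.query(i) for i in range(6)])  # [5, None, 6, 2, None, 3]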

2 Stronger structural dichotomy

In this section, we establish the structural foundations – specifically, the growing suffixes versus robust splittable intervals dichotomy – lying at the heart of our adaptive algorithm. We start with the definitions. The first is the definition of a growing suffix setting, as given in [BECLW19]. For what follows, for an index a ∈ [n], let m = m(a) be the largest integer with a + 2^{m−1} ≤ n, and for any t ∈ [m] set I_t(a) = [a + 2^{t−1}, min{a + 2^t, n + 1}). Note that the intervals I_1(a), …, I_m(a) form a partition of {a + 1, …, n} into intervals of exponentially increasing length (except for maybe the last one). Finally, the tuple (I_1(a), …, I_m(a)) is called the growing suffix starting at a.

[Growing suffixes (see [BECLW19], Definition 2.4)] Let α, β > 0. We say that an index a ∈ [n] starts an (α, β)-growing suffix if, when considering the collection of intervals (I_1(a), …, I_m(a)), for each t ∈ [m] there is a subset of indices D_t ⊆ I_t(a) such that the following properties hold.

  1. We have |D_t| / |I_t(a)| ≤ α for all t ∈ [m], and Σ_{t ∈ [m]} |D_t| / |I_t(a)| ≥ β.

  2. For every t, t′ ∈ [m] where t < t′, if b ∈ D_t and b′ ∈ D_{t′}, then f(b) < f(b′).

The second definition, also from [BECLW19], describes the (non-robust) splittable intervals setting.

[Splittable intervals (see [BECLW19], Definition 2.5)] Let