Discovering and Certifying Lower Bounds for the Online Bin Stretching Problem

01/04/2020
by   Martin Böhm, et al.

There are several problems in the theory of online computation where tight lower bounds on the competitive ratio are unknown and expected to be difficult to describe in a short form. A good example is the Online Bin Stretching problem, in which the task is to pack the incoming items online into bins while minimizing the load of the largest bin. Additionally, the optimal load of the entire instance is known in advance. The contribution of this paper is twofold. First, we provide the first non-trivial lower bounds for Online Bin Stretching with 6, 7 and 8 bins, and increase the best known lower bound for 3 bins. We describe in detail the algorithmic improvements which were necessary for the discovery of the new lower bounds, which are several orders of magnitude more complex. The lower bounds are presented in the form of directed acyclic graphs. Second, we use the Coq proof assistant to formalize the Online Bin Stretching problem and certify these large lower bound graphs. The script we propose also certifies all the previously claimed lower bounds, which until now had never been formally proven. To the best of our knowledge, this is the first use of a formal verification toolkit to certify a lower bound for an online problem.


1 Introduction

The Online Bin Stretching problem was introduced by Azar and Regev [azar2001binstretch] as a semi-online generalization of the Online Bin Packing problem. Specifically, the task consists of packing items of various sizes, arriving in an online fashion, into $m$ bins. The problem belongs to the category of semi-online problems as there is a guarantee (known beforehand) that all the input items can be packed into $m$ bins of a given capacity. The objective is to minimize the load of the largest bin. The performance measure (here named the stretching factor) of an online algorithm is the maximum, over all inputs, of the load of the largest bin divided by this capacity.

Note that this setting is equivalent to the classical scheduling problem Online Makespan Minimization where the optimal makespan of the instance is known in advance to the algorithm.

Lower bounds for Online Bin Stretching

In the introductory paper, Azar and Regev [azar2001binstretch] provide a lower bound of $4/3$ as well as an algorithm achieving a stretching factor of $1.625$. In the special case of 2 bins, it is known that this lower bound is tight as there is an algorithm with stretching factor $4/3$. More efficient algorithms have since been proposed, and the current best algorithms designed by Böhm et al. [bohm2014algo] have a stretching factor of $1.5$ for any number of bins and $1.375$ for exactly 3 bins.

bins: [0, 0, 0], next: 1

bins: [1, 0, 0], next: 1

bins: [2, 0, 0], next: 2

bins: [1, 1, 0], next: 3

bins: [2, 2, 0], next: 2

bins: [3, 1, 1], next: 3, packing: [{3}; {3}; {1,1}]

bins: [2, 2, 2], next: 2, packing: [{2,1}; {2,1}; {2}]
Figure 1: Tree describing the lower bound of 4/3 for 3 bins of size 3. In each node, the first list represents the current load of each bin and the second number represents the next item to appear in the online instance. At the leaves, a packing of the relevant items in bins of size 3 is provided.

The original lower bound of $4/3$ for any number of bins is depicted in Figure 1 for the case of 3 bins. Each node of the tree corresponds to a state of the online process: the online algorithm has packed the current items in the bins, and the next item of the instance is provided. Each child of a node represents a possible choice for the online algorithm in which all bin loads are less than 4. Every leaf node contains a proof of existence of a packing that fits all items into bins of capacity 3.

Since the publication of the lower bound above in 1998, significant effort has been spent by several research groups in order to discover a new lower bound with a better ratio. Despite those efforts, no better lower bound is known for general $m$.

Positive progress has been made for cases with small fixed values of $m$. Gabay, Brauner and Kotov [gabay2013] present a lower bound of $19/14$ for $m = 3$ using an extensive computer search, essentially an implementation of the minimax algorithm in this setting.

As the binary result (true or false) of such a computer program is prone to human error and should not be blindly trusted, they produced a decision tree, similar to Figure 1, in order to prove the result. Printing out this tree required a 6-page appendix; it can therefore still be verified by a human, but such a task is quite tedious. This lower bound has been independently generalized to $m = 4$ machines by Gabay et al. [gabay2017improved] and Böhm et al. [bohm2017LB], and the strategy was already too large to be printed on paper. Subsequent research by Böhm et al. [bohm2017LB] led to the state described in Table 1. The first contribution of this paper is to extend these results as presented in Table 2. Specifically, the lower bound of $19/14$, which was already established for $3 \le m \le 5$, is now established for the settings $m = 6, 7, 8$. For $m = 3$, which is the only setting for which a lower bound larger than $19/14$ is known, we have also improved it from $45/33$ to $112/82$. It should be noted that the size of the trees involved has dramatically increased, going from a few thousand nodes to billions of nodes. This is the consequence of several major improvements in the computer program, which were previously described in the PhD thesis of one of the authors [bohm2018phd], and which we detail in this paper.

Certified algorithms

Due to the enormous increase in the size of the strategy output, the aforementioned researchers had to resort to a separate program, which can be called a checker, in order to verify the validity of the tree. Therefore, the lower bounds proved so far depend on the correctness of this checker program. It should be noted that the trees are not actually stored explicitly, but rather as a DAG in order to avoid duplicate subtrees.

Value of m     2      3       4      5
Lower bound    4/3    45/33   19/14  19/14
(decimal)      1.333  1.3636  1.357  1.357
Tree nodes     5      5080    433    3908
Table 1: Previously known lower bounds and number of nodes in the tree describing them.

Value of m     2      3       4      5      6      7      8
Lower bound    4/3    112/82  19/14  19/14  19/14  19/14  19/14
(decimal)      1.333  1.3658  1.357  1.357  1.357  1.357  1.357
Tree nodes     5      k       433    3908   M      M      G
Table 2: Current known lower bounds and number of nodes in the tree describing them.

This method of computing lower bounds falls under the definition of certifying algorithms, which were introduced by Blum and Kannan in [blum1995certif]. Such an algorithm provides a certificate, or witness, in addition to the classic output: given an input $x$, it computes the output $y$ and provides a witness $w$. The certifying algorithm is accompanied by a checker program, which is typically much simpler, and which can verify, given $x$, $y$, and $w$, that $y$ is a valid solution. In our context, the witness corresponds to the tree describing the strategy. Such an approach has for instance been adopted in the algorithmic library LEDA [mehlhorn1997leda] for the maximum cardinality matching problem on graphs. The remaining drawback of this approach is that the checker program still has to be correct in order to trust the solution $y$. While the program was arguably simple, Alkassar et al. [alkassar2014certif] used the automatic verifier VCC and the interactive theorem prover Isabelle [nipkow2002isabelle] in order to build a formal proof of the correctness of the checker program. Surprisingly, there was a bug in the checker, which could make it accept a wrong solution for some ill-formed witness. For a complete survey on the domain of certifying algorithms, we refer the reader to [mcconnell2011certifsurvey]. Subsequent works on checker verification can be found in [trustworthyGraph, noschinski2016formalizing, ShortestPath-AFP].

The lesson that one can learn from this example is that it is arguably dangerous to base a result on the output of a non-trivial program, even if this program seems simple, such as the checker of the online bin stretching lower bounds. The second contribution of this paper is therefore to provide a certified checker. Specifically, we use the proof assistant Coq [barras1997coq] to formalize the online bin stretching problem. Then, we build a checker in the Gallina language used by Coq. We prove that if this checker returns true given a strategy tree, then the corresponding lower bound is valid. Finally, we run the program on the existing trees in order to certify their validity. To the best of our knowledge, this is the first time that a proof assistant software is used to certify such a lower bound found by computer search.

It should be noted that we do not provide any certified result if the computer search procedure does not find a lower bound. As the item sizes are constrained in the computer search, there may exist a lower bound requiring other item sizes.

The rest of the paper is organized as follows. In Section 2, we formally define the problem. In Section 3, we describe in detail the program that we used to improve the best known lower bounds via computer search. In Section 4, we propose a formalization of the lower bound property in Coq, prove that this property matches our definition, and detail the results obtained on the best known lower bounds. Note that we do not detail the Coq proofs nor the checker, as they are not necessary to prove the result. The complete code is available online at [GithubSearch] and [GithubCoq].

2 Bin stretching as a two-player game

In this section, we formally define Online Bin Stretching with integer-sized items as an equivalent two-player game with the two players named Algorithm and Adversary.

The bin stretching game will be parametrized by three positive integers $m$, $t$ and $g$. Before proceeding formally, we wish to note that $m$ stands for the number of bins (machines) in the instance, $t$ stands for the target of the Algorithm and $g$ corresponds to the guarantee that the Adversary must satisfy.

During each round (indexed by $i$), the player Adversary chooses a positive integer $s_i$, corresponding to the size of the next item of the input sequence. After that, the player Algorithm chooses one of the $m$ bins, into which he packs this item. The player Adversary wins if and only if there exists a round such that:

  1. (Hitting the target.) Algorithm loads a bin to capacity $t$, i.e., there exists a bin whose load is at least $t$.

  2. (The guarantee.) There exists a packing of the items presented so far into $m$ bins with capacity at most $g$.

Note that after some number of rounds (at most during the $(mg+1)$-th round), Adversary cannot win as any subsequent packing will have a load of at least $g+1$. Thus, whenever the player Adversary is unable to present any item, we note the game state as winning for the player Algorithm.

If Adversary has a winning strategy, we say that the triple $(m, t, g)$ satisfies the lower bound property. This implies that no online algorithm can solve the Online Bin Stretching problem with a stretching factor smaller than $t/g$.

We now extend this game with a starting state composed of a list of positive integers and a list of $m$ nonnegative integers. These parameters can be seen as defining a state of the game, where the first list contains the items that were sent previously by Adversary, and the second list describes the current bin loads of Algorithm. The rounds are played in the same way. Adversary wins if and only if at some round there exists a bin whose load is at least $t$ and there exists a packing of all the items (those of the starting state and those sent since) into $m$ bins with load at most $g$.

If Adversary has a winning strategy, we say that the parameters together with the starting state satisfy the extended lower bound property.

If the list of bin loads is composed only of zeros and no items were sent previously, the extended property coincides by definition with the lower bound property of the original game.

2.1 Game state

We now define several terms that help us discuss the state of the bin stretching game effectively.

Definition 1.

A bin configuration is a state of the game before the player Adversary makes a move (presents an item). Such a configuration can be represented as a pair consisting of an $m$-tuple of loads of the bins and a list of items such that these items can be packed into bins forming exactly this $m$-tuple of loads.

We also define the extended representation of a bin configuration as an $m$-tuple of lists, where the $i$-th list contains the items which are currently packed into bin $i$.

For an example, suppose and the following items were presented by the player Adversary so far: . Then, one possible bin configuration might be with the extended representation being .

A careful reader will observe there can be another extended representation of the same bin configuration, namely . However, it is true that, from the point of view of both Algorithm and Adversary, the game state is the same – the loads are the same and the sequence of items is also. There is nothing in the second representation that either player can use to their benefit compared to the first representation. Thus, it is correct to treat them as variants of a single bin configuration.
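
To make the distinction between a bin configuration and its extended representations concrete, the following minimal Python sketch collapses an extended representation into a configuration. The function name `to_configuration`, the choice to store the items as a sorted tuple, and the sample items are assumptions made for illustration; they are not taken from the implementation accompanying the paper.

```python
def to_configuration(extended):
    """Collapse an extended representation (one list of items per bin)
    into a bin configuration: (tuple of bin loads, sorted tuple of items)."""
    loads = tuple(sum(bin_items) for bin_items in extended)
    items = tuple(sorted(item for bin_items in extended for item in bin_items))
    return loads, items


# Two different packings of the hypothetical items 1, 1, 2, 3 into two bins.
first = [[3, 1], [2, 1]]
second = [[2, 1, 1], [3]]
```

Both packings yield the loads (4, 3) and the same multiset of items, so the sketch maps them to one and the same bin configuration.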

3 Computing new lower bounds via computer search

Our implemented algorithm is a parallel, multi-computer implementation of the classical minimax game search algorithm. We now give pseudocode for its sequential version. The main procedure of the minimax algorithm is the procedure Sequential as stated below, which recursively calls the evaluation subroutines EvaluateAdversary and EvaluateAlgorithm. The peculiarities of our algorithm (caching, pruning, parallelization) are described in the following sections.

One of the differences between our algorithm and the algorithm of Gabay et al. [gabay2017improved] is that our algorithm makes no use of alpha-beta pruning – indeed, as either Algorithm or Adversary has a winning strategy from each bin configuration, there is no need to use this type of pruning.

Input is a bin configuration $C$.

1: if the configuration $C$ is cached (Section 3.2), return the value found in cache.
2: Create a list of items which can be sent as the next step of the player Adversary (Section 3.1).
3: for every item size $i$ in the list do
4:      Recurse by running EvaluateAlgorithm($C$, $i$).
5:      if EvaluateAlgorithm($C$, $i$) returns 0 (the configuration is winning for player Adversary), stop the cycle and return 0.
6: if the evaluation reaches this step, store the configuration in the cache and return 1 (player Algorithm wins).
Algorithm 1 Procedure EvaluateAdversary

Input is a bin configuration $C$ and item $i$.

1: Prune the tree using known algorithms (Section 3.3.1).
2: for any one of the $m$ bins do
3:      if $i$ can be packed into the bin so that its load is at most $t-1$ then
4:          Create a configuration $C'$ that corresponds to this packing.
5:          Run EvaluateAdversary($C'$).
6:          if EvaluateAdversary($C'$) returns 1, return 1 as well.
7: if we reach this step, no placement of $i$ results in victory of Algorithm; return 0.
Algorithm 2 Procedure EvaluateAlgorithm

Input is a bin configuration $C$.

1: Fix parameters $m$, $t$, $g$.
2: Run EvaluateAdversary($C$).
3: if EvaluateAdversary($C$) returns 0 then
4:      return success (a lower bound exists).
5: else
6:      return failure.
Algorithm 3 Procedure Sequential
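
The three procedures can be illustrated by a small Python sketch. It is a simplified model under stated assumptions, not the authors' program: the names `can_pack` and `adversary_wins` are hypothetical, the offline guarantee is checked by brute-force backtracking instead of the procedure MaxFeas, pruning is omitted, and memoization via `functools.lru_cache` stands in for the hash-based cache. It is only practical for very small parameters.

```python
from functools import lru_cache


def can_pack(items, m, g):
    """Brute force: do the items (sorted decreasingly) fit into m bins of capacity g?"""
    bins = [0] * m

    def place(k):
        if k == len(items):
            return True
        tried = set()  # skip bins with a load we have already tried
        for b in range(m):
            if bins[b] + items[k] <= g and bins[b] not in tried:
                tried.add(bins[b])
                bins[b] += items[k]
                ok = place(k + 1)
                bins[b] -= items[k]
                if ok:
                    return True
        return False

    return place(0)


def adversary_wins(m, t, g):
    """Return True if Adversary has a winning strategy in the game (m, t, g)."""

    @lru_cache(maxsize=None)
    def evaluate_adversary(loads, items):
        # Returns 0 if Adversary wins from this configuration, 1 otherwise.
        for size in range(1, g + 1):
            new_items = tuple(sorted(items + (size,), reverse=True))
            if not can_pack(new_items, m, g):
                continue  # sending this item would break the guarantee
            if evaluate_algorithm(loads, new_items, size) == 0:
                return 0
        return 1

    def evaluate_algorithm(loads, items, size):
        # Returns 1 if Algorithm has a safe placement, 0 otherwise.
        for load in set(loads):  # bins with equal loads are interchangeable
            if load + size <= t - 1:
                new_loads = list(loads)
                new_loads.remove(load)
                new_loads.append(load + size)
                new_loads = tuple(sorted(new_loads, reverse=True))
                if evaluate_adversary(new_loads, items) == 1:
                    return 1
        return 0

    return evaluate_adversary((0,) * m, ()) == 0
```

For instance, `adversary_wins(3, 4, 3)` explores the game of Figure 1, where Adversary forces a bin of load 4 while all items fit into 3 bins of capacity 3.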

3.1 Verifying the offline optimum guarantee

When we evaluate a turn of the Adversary, we need to create the list of items that Adversary can actually send while satisfying the Online Bin Stretching guarantee. In other words, we compute the maximum item size that the adversary can send while satisfying the guarantee. We do this operation inside the procedure MaxFeas, which we describe in Section 3.1.1.

3.1.1 Procedure MaxFeas

If we wish to directly compute the maximum feasible value which can be sent from the configuration where , we can do so by calling the dynamic program DynprogMax (Section 3.1.2). The complexity of DynprogMax is in the worst case (and that is if we ignore potential slowdowns via hashing).

This is polynomial when $m$ is a constant, but as the number of bins grows, such a call per every game state becomes prohibitively expensive. Therefore we first compute a lower estimate and an upper estimate on the maximum feasible value. Ideally, our faster estimates meet at the true value directly, making the dynamic programming call unnecessary.

We first initialize the upper bound from previous computations. The upper bound is set to the minimum of two terms: the maximum feasible value that was computed in the previous vertex of Adversary’s turn, and $m \cdot g$ minus the total size of all items sent so far. The second term is therefore the total size of all items that can yet arrive in this instance.

Online Best Fit.

To find the first lower bound on the maximum feasible value quickly, we employ an online bin packing algorithm Online Best Fit. This algorithm maintains a packing of items into $m$ bins of size $g$ during the evaluation of the algorithm Sequential, packing each item as it is selected by the player Adversary. The algorithm Online Best Fit packs each item into the most-loaded bin where the item fits.

Once the algorithm Sequential selects a different item and evaluates a different branch of the game tree, Online Best Fit removes the previous item from its bin and inserts the new item into the most-loaded bin where it fits.

As Online Best Fit maintains just one packing, which may not be optimal, it can happen that Online Best Fit is unable to pack the next item even though it is a feasible item. In that case, we mark the packing as inconsistent and do not use the lower bound from Online Best Fit until its online packing becomes feasible again.

If the packing maintained by Online Best Fit is still feasible, we return as the lower bound value the amount of unused space on the least-loaded bin.

The main advantage of Online Best Fit is that it takes at most time per each step, and especially for the earlier stages of the evaluation its returned value can match the value of .

Checking the cache.

Next, if a gap still remains between the lower bound and the upper bound, we try to tighten it by calling a procedure Query which queries the cache of feasible and infeasible item multisets. The procedure has a ternary answer: either an item multiset was previously computed to be feasible, or it was computed to be infeasible, or this item set is not present in the cache at all.

We update the lower bound to be the largest value which is confirmed to be feasible, and update the upper bound to be less than the smallest value confirmed to be infeasible.

Best Fit Decreasing.

If the lower and upper bounds are still unequal, we employ a standard offline bin packing algorithm called Best Fit Decreasing. Best Fit Decreasing takes the items of the current instance and first sorts them in decreasing order of their sizes. After that it considers each item one by one in this order, packing it into a bin where it “fits best”, that is, where it minimizes the empty space of a bin. We can also interpret it as first sorting the items in decreasing order and then applying the algorithm Online Best Fit defined above.

As for its complexity, Best Fit Decreasing takes in the worst case time. It does not need to sort items in , as the internal representation of keeps the items sorted.

As with Online Best Fit, the lower bound will be updated to the maximum empty space over all bins after Best Fit Decreasing has finished packing. An item of this size can always be sent without invalidating the Online Bin Stretching guarantee.
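
The lower bound obtained from Best Fit Decreasing can be sketched as follows. This is an illustrative Python version under assumptions: the function name `best_fit_decreasing` is hypothetical, ties between equally loaded bins are broken by the lowest bin index, and the items are re-sorted on every call, whereas the text notes that the real implementation keeps them sorted.

```python
def best_fit_decreasing(items, m, g):
    """Pack items into m bins of capacity g with Best Fit Decreasing.

    Returns the maximum empty space over all bins (a lower bound on the
    largest item that can still be sent), or None if the heuristic fails
    to pack some item. A failure does not prove that the items are infeasible.
    """
    loads = [0] * m
    for size in sorted(items, reverse=True):
        fitting = [b for b in range(m) if loads[b] + size <= g]
        if not fitting:
            return None
        # "Best fit": the most-loaded bin that still has room for the item.
        best = max(fitting, key=lambda b: loads[b])
        loads[best] += size
    # The maximum empty space is found on the least-loaded bin.
    return g - min(loads)
```

With 3 bins of capacity 3 and the items 3, 1, 1 from Figure 1, the sketch leaves one bin empty, so an item of size 3 can still be sent.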

3.1.2 Procedure DynprogMax

Procedure DynprogMax is a sparse modification of the standard dynamic programming algorithm for Knapsack. Given a multiset of items on input, our task is to find the largest item which can be packed together with this multiset into $m$ bins (knapsacks) of capacity $g$ each.

We use a queue-based algorithm that, for each $i$, generates a queue of all valid $m$-tuples that can arise by packing the first $i$ items. We do not need to remember where the items are packed, only the loads of the bins represented by the $m$-tuple.

To generate the next queue, we initialize it to be an empty queue. Next, we traverse the old queue and add the new item to all bins as long as it fits, creating up to $m$ new tuples that need to be added to the new queue.

Unsurprisingly, we wish to make sure that we do not add the same tuple several times during one step. We can use an auxiliary array for this purpose, but we have ultimately settled on a hash-based approach.

We use a small array of fixed-width integers. When considering a new tuple that arises from adding the current item to one of the bins of an old tuple, we first compute the hash of the new tuple. Since we use Zobrist hashing (see Section 3.2), this operation takes only constant time.

Next, we consider adding the new tuple to the new queue. We use a prefix of its hash as an index into the small array and add the tuple to the queue when the array contains something other than the hash of the tuple at this index. We then update this position to contain the hash and continue.

While our hashing technique clearly can lead to duplicate entries in the queue, note that this does not hurt the correctness of our algorithm, only its running time in the worst case.

We continue adding new items to the tuples until all items are packed. In the final pass of the queue, we look at the empty space in the least-loaded bin. The output of DynprogMax is the maximum of this empty space over all tuples in the final pass.

Ignoring the collisions of the hashing scheme (which can happen but will not play a big role if we compute the expected running time based on our randomized hashing function), the time complexity of the procedure MaxFeas is quite high in the worst case: .

Nonetheless, we are convinced that our approach is much faster than implementing MaxFeas using integer linear programming or using a CSP solver (which has been done in [gabay2017improved]) and contributes to the fact that we can solve much larger instances.
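
The idea behind DynprogMax can be sketched in a few lines of Python. This is a simplified model, not the authors' code: the name `dynprog_max` is hypothetical, and an exact Python `set` of sorted load tuples replaces the queue with its small lossy hash array, so duplicates never occur here.

```python
def dynprog_max(items, m, g):
    """Largest item that can be packed together with `items` into m bins of
    capacity g, or None if `items` alone cannot be packed."""
    states = {(0,) * m}  # reachable load tuples, each sorted ascending
    for size in items:
        next_states = set()
        for loads in states:
            for b in range(m):
                # A bin with the same load as its predecessor would
                # produce the same sorted tuple, so skip it.
                if b > 0 and loads[b] == loads[b - 1]:
                    continue
                if loads[b] + size <= g:
                    new_loads = list(loads)
                    new_loads[b] += size
                    next_states.add(tuple(sorted(new_loads)))
        states = next_states
    if not states:
        return None
    # loads[0] is the least-loaded bin of each final tuple.
    return max(g - loads[0] for loads in states)
```

A return value of 0 means that the items are feasible but fill every bin, so no further item can be sent.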

3.2 Caching

Our minimax algorithm makes extensive use of caching. We cache solutions of the dynamic programming procedure MaxFeas as well as any evaluated bin configuration (as a hash) with its value.

Hash table properties.

We store a large hash table of fixed size with each entry being a 64-bit integer corresponding to 63 bits of hash and a binary value. The hash table is addressed by a prefix of the hash, whose length depends on the computer used.

We solve the collisions by a simple linear probing scheme of a fixed length. In it, when a value needs to be inserted to an occupied position, we check the following slots (up to the fixed probing length) for an empty space and we insert the value there, should we find it. If all slots are occupied, we replace one value at random.
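
A minimal Python sketch of such a cache follows. The class name `ProbingCache`, the parameters `log_size` and `probe_len`, and the use of the low bits of the hash as the address (instead of a prefix) are assumptions made for brevity; only the overall scheme (63 hash bits plus one binary value, bounded linear probing, random replacement) follows the description above.

```python
import random


class ProbingCache:
    """Fixed-size cache with bounded linear probing and random replacement."""

    HASH_MASK = (1 << 63) - 1  # keep 63 bits of hash next to one value bit

    def __init__(self, log_size, probe_len, seed=0):
        self.mask = (1 << log_size) - 1
        self.slots = [None] * (1 << log_size)
        self.probe_len = probe_len
        self.rng = random.Random(seed)

    def _positions(self, h):
        start = h & self.mask
        return [(start + k) & self.mask for k in range(self.probe_len)]

    def store(self, h, value):
        h &= self.HASH_MASK
        entry = (h << 1) | value  # value is 0 or 1
        positions = self._positions(h)
        for p in positions:
            if self.slots[p] is None or self.slots[p] >> 1 == h:
                self.slots[p] = entry
                return
        # Every probed slot is occupied: overwrite one of them at random.
        self.slots[self.rng.choice(positions)] = entry

    def lookup(self, h):
        h &= self.HASH_MASK
        for p in self._positions(h):
            entry = self.slots[p]
            if entry is not None and entry >> 1 == h:
                return entry & 1
        return None  # not cached (or evicted earlier)
```

Because entries can be evicted, a lookup that returns `None` only means that the configuration must be evaluated again; it never affects correctness.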

Hash function.

Our hash function is based on Zobrist hashing [zobrist], which we now describe.

For each bin configuration, we count occurrences of items, creating pairs $(i, f)$, where $i$ is the item type and $f$ its frequency (the number of items of this size packed in all bins).

As for the loads of the $m$ bins, we maintain that they are sorted in descending order. We also think of them as ordered pairs $(j, l)$, with $j$ being the position of the bin in the ordering (first: largest, last: smallest) and $l$ the actual value of the load.

For example, we can think of bin configuration as a set of load pairs , , along with pairs for items: , , , , and so on.

At the start of our program, we associate a -bit number with each pair . We also associate a -bit number for each possible load of one bin. These two sets of numbers are stored as a matrix of size and a matrix of size .

The Zobrist hash function is then simply a XOR of all associated numbers for a particular bin configuration.

The main advantage of this approach is fast computation of new hash values. Suppose that we have a bin configuration whose hash is known. After one round of the player Adversary and one round of the player Algorithm, a new bin configuration is formed, with one new item placed.

Calculating the hash of the new configuration is fast, provided we remember the previous hash; the new hash is calculated by applying XOR to the previous hash, the new associated values, and the previous associated values which have changed.
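
The incremental update can be illustrated by the following Python sketch. The class `Zobrist`, its table dimensions, the 64-bit width of the random numbers, and the convention that items with frequency 0 contribute nothing to the hash are assumptions chosen for the illustration.

```python
import random
from collections import Counter


class Zobrist:
    """Zobrist hashing of a bin configuration (sorted loads + item counts)."""

    def __init__(self, m, g, t, seed=0):
        rng = random.Random(seed)
        max_freq = m * g  # at most m*g items (all of size 1) can arrive
        # item[size][freq] and load[position][load] are random 64-bit numbers.
        self.item = [[rng.getrandbits(64) for _ in range(max_freq + 1)]
                     for _ in range(g + 1)]
        self.load = [[rng.getrandbits(64) for _ in range(t)]
                     for _ in range(m)]

    def full_hash(self, loads, counts):
        h = 0
        for pos, load in enumerate(loads):
            h ^= self.load[pos][load]
        for size, freq in counts.items():
            if freq > 0:
                h ^= self.item[size][freq]
        return h

    def update(self, h, loads, counts, size, bin_pos):
        """Place one item of `size` into the bin at `bin_pos`, updating the hash."""
        new_loads = list(loads)
        new_loads[bin_pos] += size
        new_loads.sort(reverse=True)
        for pos in range(len(loads)):
            if loads[pos] != new_loads[pos]:
                # XOR out the old (position, load) pair, XOR in the new one.
                h ^= self.load[pos][loads[pos]] ^ self.load[pos][new_loads[pos]]
        freq = counts[size]
        if freq > 0:
            h ^= self.item[size][freq]
        h ^= self.item[size][freq + 1]
        new_counts = counts.copy()
        new_counts[size] += 1
        return h, tuple(new_loads), new_counts
```

Starting from `Zobrist(3, 3, 4)`, the empty configuration `(0, 0, 0)` and an empty `Counter()`, repeated calls to `update` give the same value as recomputing `full_hash` from scratch on the resulting configuration.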

Caching of the procedure MaxFeas.

We use essentially the same approach for caching results in the procedure MaxFeas, except only the -tuple of loads needs to be hashed.

We also remark upon the values being cached in the procedure MaxFeas. At first glance, it seems that it might be best to store the maximum feasible value with each input multiset. However, this is a very bad idea, as we would lose out on a lot of symmetry.

Indeed, if we remove any item $i$ from a multiset known to be feasible, we already know a lower bound on the largest value that can be sent for the remaining multiset, namely $i$ itself, a value we know is compatible. Storing a single value per multiset would lose out on this fact.

Instead, it is much better to cache binary feasibilities or infeasibilities for a specific multiset . We use these results to improve the values of and for other calls of procedure MaxFeas.

3.3 Tree pruning

Alongside the extensive caching described in Subsection 3.2, we also prune some bin configurations where it is possible to prove that a simple online algorithm is able to finalize the packing. Such a bin configuration is then clearly won for player Algorithm, as it can follow the output of the online algorithm.

3.3.1 Algorithmic pruning

Recall that in the bin stretching game, the player Algorithm is trying to pack all items into $m$ bins with load at most $t-1$. If the search algorithm can quickly deduce that a bin configuration leads to a successful packing, we can immediately evaluate the configuration as winning for the player Algorithm and thus prune the tree.

We can lift several such winning tests – so-called good situations for the player Algorithm– from the algorithmic results of Böhm, Sgall, van Stee and Veselý [bohm2017LB]. However, since the number of bins rises from in [bohm2017LB] up to , the situations can not always be directly generalized.

We now state the new situations that we have generalized from [bohm2017LB] for with .

For the following, we set $\alpha$ to be the extra space that the player Algorithm can use without losing, namely $\alpha = t - 1 - g$.

Good Situation 1.

Given a bin configuration such that the total load of all but the last bin is at least $(m-1) g - \alpha$, there exists an online algorithm that packs all remaining items into bins of capacity $t-1$.

Proof.

If the total amount packed into all but the last bin is at least $(m-1) g - \alpha$, the volume that the last bin must hold (its current load plus all remaining items) is at most $m g - ((m-1) g - \alpha) = g + \alpha = t - 1$, which will always fit on the last bin. ∎
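
As an illustration, a test of this kind can be written in a few lines of Python. The threshold used below is derived from the volume argument in the proof (the total volume of the instance is at most $m g$ and the last bin may be filled up to $t - 1$); the function name `good_situation_1` and the assumption that the "last" bin is the least-loaded one are ours.

```python
def good_situation_1(loads, m, g, t):
    """Return True if packing all remaining items into the least-loaded bin
    is guaranteed to keep every bin at load at most t - 1."""
    alpha = t - 1 - g  # extra space Algorithm may use without losing
    ordered = sorted(loads, reverse=True)
    packed_elsewhere = sum(ordered[:-1])  # all bins except the last one
    # At most m*g - packed_elsewhere units ever land on the last bin;
    # this is at most g + alpha = t - 1 exactly when the test below holds.
    return packed_elsewhere >= (m - 1) * g - alpha
```

For $m = 3$, $g = 14$ and $t = 19$ the threshold is 24, so the loads (14, 10, 0) pass the test while (10, 10, 3) do not.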

Good Situation 2.

Given a bin configuration such that there exist two bins such that:

  1. ,

  2. there exists a bin with load below .

then there exists an online algorithm that packs all remaining items into bins of capacity .

Proof.

We pack the remaining input first into until an item cannot fit – we place that item into , where it always fits. After the item is packed into , the load of , which means Good situation 1 is reached. ∎

Good Situation 3.

Consider a bin configuration . Define the following sizes:

  1. Let be the sum of loads of all bins excluding the last two.

  2. Let (the last bin load requirement) be the smallest load such that if the currently last bin has load at least , GS1 is reached (after reordering the bins).

  3. Let (the overflow) be defined as .

Then, if and:

  • either the second-to-last bin has load at most and above ;

  • or the last bin has load at most but above ;

there exists an online algorithm that packs all remaining items into bins of capacity .

Proof.

Let be the critical bin having load bounded by and let be the other bin. The algorithm packs greedily into . If reaches the threshold load , then GS1 is reached. Assuming it does not, there is now an item of size that can be packed into . Summing up load on , we get , which is sufficient for GS1. ∎

3.3.2 Adversarial pruning

Compared to our fairly strong algorithmic pruning, we have only few tools to quickly detect that a bin configuration is winning for the player Adversary. More specifically, we use only the following two criteria:

Large item heuristic.

Once any bin has load at least , an item of size packed into that bin would cause it to reach load , which is a victory for the player Adversary. Suppose that the -th bin reaches load . We compute the size of the smallest item such that

  1. ;

  2. For any bin in the interval it holds that ; in other words, Algorithm cannot pack two items of size into any bin starting from the -st.

Finally, we check if Adversary can send copies of the item of size . If so, it is a winning bin configuration for this player and we prune the tree.

Notice that there may be multiple different values of for one bin configuration; for instance, in the setting of , for three bins with loads , we should check whether we can send items of size or items of size . Therefore, in the implementation, we compute for each bin its own candidate value of and then check whether at least one is feasible using the dynamic programming test described in Section 3.1.

Five/nine heuristic.

We use a specific heuristic for the case of $t/g = 19/14$, as it is a good candidate for a general lower bound. This heuristic was experimentally observed to slightly compress the size of the output tree in this setting.

This heuristic comes into play once there is a bin of load at least 5 and once all bins are non-empty (even load 1 is sufficient). The item sizes 5 and 9 are complementary in the sense that one of each can fit together in the optimal packing of capacity 14, but the two of them cannot be packed together into a bin that already has load at least 5.

A pair of items of size 9 also cannot fit together into any other bin, as all the bins already have load at least 1.

Finally, if there are too many bins of load at least but not much more, a subsequent input of several items of size will again force a bin of load at least .

We apply this heuristic only when it is true that at all times, items of size can arrive on input without breaking the adversarial guarantee. While this is true, it must be true that all bins are of load strictly less than .

Our heuristic considers repeatedly sending items of size . If at any point there are only bins left with load strictly less than and at the same time items of size can arrive on input, the configuration is winning for the player Adversary. On the other hand, if at any point there is a bin of load at least and the invariant that items of size can still arrive holds, we are also in a winning state for Adversary.

If it is true that by repeatedly sending items of size we eventually reach at least one of the aforementioned two situations, we mark the initial bin configuration as winning for the player Adversary.

A note on performance.

While both of our heuristics reduce the number of tasks in our tree and the number of considered vertices, we were unable to evaluate them in every single vertex of the game tree without a performance penalty. Even the large item heuristic, which can be implemented with just one additional call to the dynamic programming procedures of Section 3.1 slows the program down considerably.

This is likely due to the fact that caching outputs of the dynamic programming calls of Section 3.1 leads to some vertices that do not need to call any dynamic programming procedure, and with our heuristics they are forced to call at least one.

3.4 Monotonicity

One of the new heuristics that enables us to go from a lower bound of on bins to bins is iterating on lower bounds by monotonicity. We define it as follows:

Definition 2.

A winning strategy for Adversary has monotonicity if it is true that for any two items such that is sent immediately after , we have .

Using this concept, we can iterate over from (non-decreasing instances) to (full generality) to find the smallest value of monotonicity which leads to a lower bound, if any.

A potential downside of iterating over monotonicity is that it can introduce an -fold increase in elapsed time in the case that no lower bound exists. Additionally, it is quite likely that monotonicity becomes less useful as the value of increases, as the item of relative size gets smaller and smaller.

Still, solving decision trees of low monotonicity is much faster than solving the full tree, and we have empirically observed that lower bounds of lower monotonicity are fairly common; see Tables 3 and 4 for our empirical results.
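The iteration over monotonicity can be sketched as follows. This is a minimal Python illustration, not the implementation used for the experiments. It assumes that monotonicity k means that every item is at least as large as the previous item minus k (so that k = 0 gives non-decreasing sequences), and `has_lower_bound` is a hypothetical stand-in for a full minimax search restricted to adversary sequences of monotonicity k.

```python
from typing import Callable, Optional, Sequence


def is_k_monotone(sizes: Sequence[int], k: int) -> bool:
    """Assumed reading of Definition 2: each item is at least the previous size minus k."""
    return all(nxt >= prev - k for prev, nxt in zip(sizes, sizes[1:]))


def smallest_monotonicity(has_lower_bound: Callable[[int], bool],
                          max_k: int) -> Optional[int]:
    """Try k = 0, 1, ..., max_k and return the first k for which the search succeeds.

    has_lower_bound is a hypothetical callback standing in for the restricted search.
    Returns None when no value of k up to max_k yields a lower bound.
    """
    for k in range(max_k + 1):
        if has_lower_bound(k):
            return k
    return None
```

When no lower bound exists, the loop runs the search once per value of k, which is the source of the multiplicative slowdown mentioned above.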

Monotonicity caveat.

It is important to remark that when looking for a lower bound for a specific monotonicity value, it is now true that a bin configuration is not enough to describe one state of the bin stretching game. To see this, consider monotonicity . If the first three input items are , the next item needs to be of size or larger. However, if the three input items are (which is permissible for monotonicity ), the next item on input can be of size and above. This means that the two states are not equivalent, even though their bin configuration is the same.

To remedy this, we internally extend the definition of the bin configuration by also marking which item arrived last in the input sequence, which is sufficient for a fixed value of the monotonicity.
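A minimal Python sketch of such an extended state is given below. The names `State`, `make_state` and `allowed_next_sizes` are hypothetical, and the rule for the next item again assumes that monotonicity k allows a drop of at most k between consecutive items.

```python
from typing import List, NamedTuple, Sequence, Tuple


class State(NamedTuple):
    loads: Tuple[int, ...]  # bin loads, sorted so that permutations of bins coincide
    last_item: int          # size of the most recently sent item (0 before any item)


def make_state(loads: Sequence[int], last_item: int) -> State:
    """Build a hashable state that can serve as a cache key."""
    return State(tuple(sorted(loads, reverse=True)), last_item)


def allowed_next_sizes(state: State, k: int, max_size: int) -> List[int]:
    """Item sizes Adversary may send next under the assumed monotonicity rule."""
    lowest = max(1, state.last_item - k)
    return list(range(lowest, max_size + 1))
```

For instance, with one item per bin, the sequences 1, 2, 3 and 3, 2, 1 produce the same loads but different states, and with k = 1 the first state forbids an item of size 1 while the second allows it.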

3.5 Parallelization

Up until now, we have described a single-threaded minimax algorithm with caching and pruning. To get the computing power necessary for results above bins, we have implemented the minimax search as a parallel program for a computer cluster. We now describe the particulars of this implementation.

Tasks.

Our evaluation of the game tree proceeds in the following way: first, we start evaluating the game tree on the main computer (which we internally call queen) until a vertex corresponding to Adversary’s next move meets a certain threshold (for instance, sufficient depth). After that, we designate this adversarial vertex as a task.

Alongside the queen, we have processes whose job is to evaluate the tasks – we call them the workers. Workers which run on the same machine will have a common cache that they access via atomic primitives in order to maintain consistency. Workers on separate machines do not share information.

Due to the mixed environment of standard Unix threads and MPI processes, we also have a single overseer per physical machine. This overseer handles the MPI communication and spawns the individual worker threads.

The tasks are all generated in advance by the queen. After that, their bin configurations are synchronized with all running overseers. The queen then assigns tasks to overseers online, in batches of 250-500 tasks per overseer. The overseer reports the value of each finished task immediately to the queen. When an overseer has finished processing a batch, it requests and receives a new one.

We have selected this communication strategy for two reasons:

  1. To minimize congestion in the processing phase through the fact that the bin configurations are synchronized beforehand and only identifiers are shared in the online assignment phase.

  2. To allow the queen to evaluate and prune unfinished tasks and therefore avoid some unnecessary processing by the workers.
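The batch protocol described above can be illustrated with a small sequential simulation in Python. It is only a sketch: no MPI is involved, the overseers are served in round-robin order as a stand-in for "whichever overseer asks next", and the names `make_batches`, `run_queen` and `evaluate` are hypothetical.

```python
from collections import deque


def make_batches(task_ids, batch_size):
    """Split the pre-generated task identifiers into consecutive batches."""
    return deque(task_ids[i:i + batch_size]
                 for i in range(0, len(task_ids), batch_size))


def run_queen(task_ids, overseers, evaluate, batch_size=250):
    """Hand out batches of task identifiers until none are left.

    Only identifiers travel in this phase; the bin configurations are assumed
    to have been synchronized beforehand. Returns the task results and the
    number of batches each overseer received.
    """
    pending = make_batches(list(task_ids), batch_size)
    results = {}
    assigned = {name: 0 for name in overseers}
    while pending:
        for name in overseers:
            if not pending:
                break
            batch = pending.popleft()
            assigned[name] += 1
            for task_id in batch:
                # Each finished task is reported to the queen immediately.
                results[task_id] = evaluate(task_id)
    return results, assigned
```

A real implementation would also let the queen discard tasks whose result has become irrelevant, which this simulation omits.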

Task selection.

As mentioned above, an important decision to be made by the lower bound algorithm designer is where to split a vertex of the game tree into a task and send it to be processed in the parallel environment.

Based on our experiments, it seems that maintaining the right balance between the number of tasks and their running time is crucial to good performance. When the tasks are too shallow, the performance of the algorithm is dominated by the elapsed time of the most difficult task in the list, which diminishes the gains coming from the parallel implementation.

On the other hand, if there are millions of tasks, the algorithm will still work correctly but we might lose performance from diminishing advantages of individual caching as well as due to pruning happening later in the process.

Previously, we used only task depth as the principal guideline: when items had arrived on input (with usually in the range of ), we marked the bin configuration as a new task.

However, experimenting with running time has shown us that the presence of a larger item makes the evaluation process much faster than we expected. Therefore, we have ultimately settled on a mixed task threshold function which takes into account both the task depth and also the task load , which is the sum of sizes of all items arrived so far in the instance. We split off a task when its task load is above , and failing that when its task depth is below .

We have settled on setting and to be around of the optimal bin capacity . This way we get deeper bin configurations for very small items, which experimentally seems to imply a shorter running time and a similar number of tasks.
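A mixed threshold of this kind might look as follows in Python. This is a sketch under stated assumptions: the function name and the threshold values used in the example are hypothetical, and the depth criterion is read as being met once the number of items reaches the depth threshold, in line with the depth-only rule described earlier.

```python
def is_task(item_sizes, load_threshold, depth_threshold):
    """Decide whether the configuration reached by item_sizes becomes a task.

    task_load is the sum of all item sizes sent so far, task_depth their count.
    The load criterion is checked first; the depth criterion is the fallback
    (assumed here to mean "at least depth_threshold items have arrived").
    """
    task_load = sum(item_sizes)
    task_depth = len(item_sizes)
    if task_load > load_threshold:
        return True
    return task_depth >= depth_threshold
```

With hypothetical thresholds of 8 for the load and 5 for the depth, a single item of size 9 already creates a task, while a run of items of size 1 is only split off after five items, which yields the deeper configurations for small items mentioned above.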

Initial strategy.

Our implementation also allows us to pre-select some initial strategy for the player Adversary in advance. This way we can use our (so far limited) intuitive understanding of what is a good initial move and decrease the time needed to evaluate the whole tree.

A particularly good strategy for the lower bound of seems to be sending an item of size as the first item, followed by items of size . This adversarial strategy leads to a lower bound instance for and bins.

We have therefore implemented a way to pre-select items to be sent in the first few rounds of the game. Given such a list of items, we compute all possible moves of the player Algorithm and create a queue of bin configurations, each of which we evaluate sequentially.
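The construction of this queue can be sketched in Python as below. The function name is hypothetical, bin loads are kept sorted so that permutations of bins are merged, and the sketch does not prune configurations in which a bin has already reached the stretched capacity.

```python
from collections import deque


def configurations_after(items, m):
    """Enumerate all bin configurations reachable after a fixed item list.

    items is the pre-selected adversarial prefix and m the number of bins.
    Every placement choice of Algorithm is explored; loads are stored in
    non-increasing order so that equivalent configurations coincide.
    """
    frontier = {tuple([0] * m)}
    for size in items:
        next_frontier = set()
        for loads in frontier:
            for b in range(m):
                new_loads = list(loads)
                new_loads[b] += size
                next_frontier.add(tuple(sorted(new_loads, reverse=True)))
        frontier = next_frontier
    return deque(sorted(frontier))
```

Each configuration in the returned queue would then be handed to the minimax search as a separate starting position.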

The fact that already this linear, non-adaptive strategy of sending is enough to get a lower bound of for bins was a pleasant surprise to us. We believe this fact is due to the size of the sequence being already non-trivial (the item alone occupies slightly more than of one stretched bin).

A natural extension is to allow the user to input a partial game tree (an adaptive strategy for the player Adversary) and have the algorithm evaluate it sequentially; this can be easily added to our implementation once we learn more about which items should be among the first to send.

Technology.

We have settled on using a combination of OpenMPI [openmpi] and the standard thread library provided by the C++ programming language. In our setting, OpenMPI provides the inter-computer communication API for sending and receiving tasks as described above. We employ the standard Unix threads to spawn the workers themselves; this way they can easily share one large cache for evaluated bin configurations.

We originally considered using only OpenMPI processes for both inter-computer communication and memory sharing on one physical computer; this functionality is present in the latest version of the MPI standard, MPI-3.0. However, after implementing the shared memory functionality, we noticed some slowdown of the worker processes when the shared memory was large (more than 1 gigabyte). This forced us into the heterogeneous model that we use right now.

3.6 Results

Tables 3 and 4 summarize our results; we include previous results for completeness. Note that there may be a lower bound of size say even though none was found with this denominator; for example, some lower bound may reach using item sizes that are not multiples of .

Elapsed time
Fraction Decimal L. b. Mon. Linear Parallel

Yes 0 2s.
No 2s.
No 3s.
No 6s.
No 5s.
Yes 1 15s.
No
Yes 1 1min. 48s.
No 3min. 6s.
No 30min. 7s.
No 21 m. 49s.
Yes 6 29s.
Yes 8 3h. 21m. 31s.
Table 3: The results and performance of our linear and parallel computations for Online Bin Stretching with three bins. The results above the horizontal line were previously shown in [gabay2017improved] and [bohm2016]. The column L. b. indicates whether a lower bound was found when starting with the given stretching factor as seen in column Fraction. The column Mon. shows the lowest monotonicity that our program needs to find a lower bound. In the case of negative results, time measurements were done only using full generality, i.e. with monotonicity . Some fractions below are omitted; our lower bound computation has not found a lower bound on those. The linear results were computed on a server with an AMD Opteron 6134 CPU and 64496 MB RAM. The size of the hash table was set to . The parallel results were computed using OpenMPI on a heterogeneous cluster with worker processes running. The output of the program was not generated during the time measurements.
Elapsed time
Bins Fraction Decimal L. b. Mon. (5) Linear Parallel (5)
Yes 18s.
Yes 2 (1) 10s.
No 19s.
No 48s.
No 1h. 1m. 40s.
Yes 0 (0) 11s.
Yes 1 (0) 2m. 13s. (16s.)
Yes Unk. (1) (1h. 14s.)
Table 4: The results produced by our minimax algorithm for more than bins. Tested on the same machine and with the same parameters as in Table 3, both for linear and parallel computations. The result was computed subsequently in a parallel environment with 64 threads and 512 MB of shared cache. In columns Mon. and Parallel, we list in brackets monotonicity and elapsed time of computation for an input having an item of size 5 at the start. Monotonicity is measured only starting with the second item.

4 Certification

We describe in this section how we certify the results obtained using the computer search via the Coq proof assistant. We first describe the Coq formalization of the problem previously defined in Section 2. In Section 4.1, we define the relevant types and preliminary functions. In Section 4.2, we describe the core of our formalization. Specifically, we first define the function updating the bin configuration after the addition of a given item in some bin. We then define an inductive predicate that recognizes a winning strategy for Adversary, given a bin configuration. We finally use this predicate to define the main predicate . We then show in Section 4.3 that this formalization is correct: if the Coq predicate is true, then the property is true. Indeed, the goal of the Coq script is to prove that is true for given values of , , defining the game . In Section 4.3, we therefore show that this result actually implies a lower bound on this game, i.e., implies . Finally, in Section 4.4, we present the results obtained on the files generated by the program described in Section 3 and detail some features that had to be implemented in order to handle the large file sizes involved.

The code is available online at [GithubCoq], and also contains a program which translates an adversary strategy expressed in the widespread GraphViz format into a file which can be directly processed by our Coq script. Future lower bounds can therefore be certified easily using the same script.

4.1 Preliminaries

We first need a few elementary definitions, after introducing the natural integer variables m, t and g and requiring that m is strictly positive.

The BinExtended type represents the list of items present in a given bin, and is implemented as a list of integers. The BinLoads type represents the current load of all bins, and is also implemented as a list of integers, one per bin. The BinsExtended type corresponds to a bin configuration in its extended representation (see Definition 1), and is internally represented as a list of elements of type BinExtended, one per bin.

For any list of integers, the property Iszero is true if and only if the list contains only zeros and at most m items. It represents the starting loads of the bins, where some bins may be omitted.

The function BinSum returns the load of a bin, given the list of items present in this bin. The function MaxBinSum returns the load of the highest bin, given the bin configuration, and the function MaxBinValue returns the load of the highest bin given only the vector of loads. Note that nil represents the empty list and x::s represents the list of head x and tail s.

Variables m t g : nat.
Hypothesis Posm : m > 0.
Definition BinExtended  := list nat.
Definition BinLoads   := list nat.
Definition BinsExtended := list BinExtended.
Definition Iszero l := (length l <= m) /\ (forall e, In e l -> e = 0).
Fixpoint BinSum (B: BinExtended) := match B with
| nil   => 0
| x ::s => x + BinSum s
end.
Fixpoint MaxBinSum (P: BinsExtended) := match P with
| nil   => 0
| x ::s => max (BinSum x) (MaxBinSum s)
end.
Fixpoint MaxBinValue (S: BinLoads) := match S with
| nil   => 0
| x ::s => max x (MaxBinValue s)
end.
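For readers who want to experiment with these definitions outside Coq, a direct Python transcription might look as follows. This is only an illustrative sketch: the snake_case names are ours, and the parameter m is passed explicitly instead of being a section variable.

```python
def is_zero(loads, m):
    """Counterpart of Iszero: at most m entries, all of them equal to 0."""
    return len(loads) <= m and all(e == 0 for e in loads)


def bin_sum(bin_items):
    """Counterpart of BinSum: load of one bin given its list of items."""
    return sum(bin_items)


def max_bin_sum(packing):
    """Counterpart of MaxBinSum: largest load in an extended configuration."""
    return max((bin_sum(b) for b in packing), default=0)


def max_bin_value(loads):
    """Counterpart of MaxBinValue: largest entry of a load vector (0 if empty)."""
    return max(loads, default=0)
```

As in the Coq code, the empty configuration and the empty load vector both have maximum 0.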

4.2 Definition of the main properties

We now define a few properties specific to the online bin stretching problem.

We first need to define the function AddToBin, which takes three parameters: St of type BinLoads and two integers e and b. This function increases the load of the bin with index b by e, where bins are indexed from 0. If b is not a valid index of St, a new bin of load e is appended to St (note that S k equals k+1).

Fixpoint AddToBin (St: BinLoads) (e: nat) (b: nat) := match St,b with
| nil  , b     => [e]
| x ::s, 0     => (x+e) ::s
| x ::s, (S k) => x :: (AddToBin s e k)
end.
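The behaviour of AddToBin, including the case where the index points past the end of the list, can be checked on small inputs with the following Python sketch. It mirrors the three cases of the Coq pattern match; the function name `add_to_bin` is ours.

```python
def add_to_bin(loads, e, b):
    """Python mirror of AddToBin: add e to the bin with index b (0-based).

    If the list runs out before index b is reached, a single new bin of
    load e is appended, exactly as in the nil case of the Coq definition.
    """
    if not loads:                       # nil, b  => [e]
        return [e]
    head, tail = loads[0], loads[1:]
    if b == 0:                          # x::s, 0  => (x+e)::s
        return [head + e] + tail
    return [head] + add_to_bin(tail, e, b - 1)  # x::s, S k => x :: AddToBin s e k
```

Note that an index far beyond the end still appends only one bin, so the result never contains padding zeros.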

The predicate CompletePacking, given a list ℓ of item sizes (integers) and an element P of type BinsExtended, is true if the configuration P uses at least all the items of ℓ. It uses two functions which are part of the Coq standard library. The function count_occ Nat.eq_dec x y returns the number of occurrences of the element y in the list x and the function concat concatenates a list of lists of elements.

Definition CompletePacking (ℓ : list nat) (P: BinsExtended) := forall e,
count_occ Nat.eq_dec ℓ e <= count_occ Nat.eq_dec (concat P) e.

The predicate SolutionPacking, given the same parameters as the predicate CompletePacking, is true if CompletePacking is true, the length of P is equal to m and no bin has load larger than g. Such a packing is then a certificate that the items described in ℓ can be packed in m bins of capacity g.

Definition SolutionPacking (ℓ : list nat) (P: BinsExtended) :=
CompletePacking ℓ P /\  length P = m /\ MaxBinSum P <= g.
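The two packing predicates can be mirrored in Python as a quick sanity check on small instances. This is an illustrative sketch: the function names are ours, and m and g are passed as explicit arguments.

```python
from collections import Counter
from itertools import chain


def complete_packing(items, packing):
    """Every item size occurs in the packing at least as often as in items."""
    needed = Counter(items)
    present = Counter(chain.from_iterable(packing))
    return all(present[size] >= count for size, count in needed.items())


def solution_packing(items, packing, m, g):
    """The packing covers items, uses exactly m bins, and no bin exceeds load g."""
    max_load = max((sum(b) for b in packing), default=0)
    return complete_packing(items, packing) and len(packing) == m and max_load <= g
```

As in the Coq definition, the packing is allowed to contain more items than the list requires; only the multiset inclusion is checked.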

The main predicate used in the formulation is OnlineInfeasible, which is inductively defined below. Note that the auxiliary variable X is not necessary in the definition, but it allows the Coq prover to easily assume an induction hypothesis when inductively proving properties of OnlineInfeasible. Also note that in the Coq code, S e, the successor of e, is equal to e+1.

Inductive OnlineInfeasible: nat -> list nat -> BinLoads -> Prop :=
| Overflow X ℓ St:   t <= MaxBinValue St  -> (exists P, SolutionPacking ℓ P)
        -> OnlineInfeasible X ℓ St
| Deadend  X ℓ St:   length St <= m
        -> (exists e, forall b, (b < m)
                -> OnlineInfeasible X ( (S e) ::ℓ) (AddToBin St (S e) b) )
        -> OnlineInfeasible (S X) ℓ St
.

The syntax implies the following equivalence.

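$$\mathrm{OnlineInfeasible}\ (X+1)\ \ell\ St \iff \Big( t \le \mathrm{MaxBinValue}(St) \wedge \exists P,\ \mathrm{SolutionPacking}\ \ell\ P \Big) \vee \Big( \mathrm{length}(St) \le m \wedge \exists e,\ \forall b < m,\ \mathrm{OnlineInfeasible}\ X\ ((e+1) :: \ell)\ (\mathrm{AddToBin}\ St\ (e+1)\ b) \Big) \tag{1}$$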

The final predicate defined is LowerBoundCoq:

Definition LowerBoundCoq := exists s,
Iszero s  /\  OnlineInfeasible (m*g+2) [ ] s.

The value m*g+2 in the definition of the predicate LowerBoundCoq is there as a simple upper bound on the number of inductive steps sufficient for any correct proof (recall that no more than m*g items can arrive in any valid input for Online Bin Stretching).
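To build intuition for OnlineInfeasible and LowerBoundCoq, the two constructors can be turned into a brute-force decision procedure for very small games. The Python sketch below is an assumption-laden illustration rather than part of the certified development: it restricts item sizes to 1..g (larger items can never be packed), it returns False as soon as the items sent so far cannot be packed into m bins of capacity g (no later Overflow step could succeed), and it always starts from m empty bins.

```python
def add_to_bin(loads, e, b):
    """Mirror of AddToBin on tuples: add e to bin b, or append a new bin."""
    if b < len(loads):
        return loads[:b] + (loads[b] + e,) + loads[b + 1:]
    return loads + (e,)


def can_pack(items, m, g):
    """Brute-force check that items fit into m bins of capacity g."""
    bins = [0] * m
    ordered = sorted(items, reverse=True)

    def place(i):
        if i == len(ordered):
            return True
        tried = set()  # skip bins whose current load was already tried
        for j in range(m):
            if bins[j] in tried or bins[j] + ordered[i] > g:
                continue
            tried.add(bins[j])
            bins[j] += ordered[i]
            if place(i + 1):
                return True
            bins[j] -= ordered[i]
        return False

    return place(0)


def online_infeasible(x, items, loads, m, t, g):
    """Exhaustive search for a derivation of OnlineInfeasible x items loads."""
    if not can_pack(items, m, g):
        return False                      # no Overflow leaf is reachable
    if max(loads, default=0) >= t:
        return True                       # Overflow
    if x == 0 or len(loads) > m:
        return False
    for e in range(1, g + 1):             # Deadend: some item beats every bin choice
        if all(online_infeasible(x - 1, items + (e,), add_to_bin(loads, e, b), m, t, g)
               for b in range(m)):
            return True
    return False


def lower_bound(m, t, g):
    """Counterpart of LowerBoundCoq, started from m empty bins."""
    return online_infeasible(m * g + 2, (), (0,) * m, m, t, g)
```

For two bins with g = 3, the search succeeds for t = 4, matching the classical lower bound of 4/3, and fails for t = 5. The running time is exponential, so this is only usable for toy parameters.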

4.3 Correctness of the Coq formulation

We show the following theorem, which implies the correctness of the Coq formulation stated in Corollary 1.

Theorem 1.

For any and for any , the proposition OnlineInfeasible X St implies .

Corollary 1.

implies .

Proof.

We prove this result by reverse induction on the sum of the items of . Let and be two lists of integers and be an integer larger than .

Thanks to the decreasing parameter and the fact that in the definition of , the following property holds and is proved in Coq by induction on (Theorem OI_length).

Therefore, for with a large sum, is false for any value of and so the statement is true.

Let and suppose by induction that for all whose items sum to more than , for all , the proposition implies the property .

Let be a list whose items sum to exactly , and let and such that and .

We want to show the property . Using Equation 1 and the proposition , we have two cases.

First, if holds, then one bin of has load at least and there exists a packing of the items of into bins with load at most . Therefore, the property is true.

Otherwise, there exists such that the following property holds.

(2)

Consider any possible move for Algorithm after Adversary played .

The sum of the items of the list is strictly larger than . And we have by hypothesis on :

Therefore, using Equation 2 and the induction hypothesis, we know that the property holds. So, after Algorithm played , Adversary has a winning strategy.

As this is true for all possible moves of Algorithm, we have the property , which completes the proof. ∎

4.4 Verification of a winning strategy for Adversary

We now detail how we used the results obtained in Section 3 in order to prove the property for a given game . We rely on a file, computed by the aforementioned program, which describes a winning strategy for Adversary: which moves Adversary makes after each possible move of Algorithm, as well as the packing solutions on winning states. The format is based on the tree structure illustrated in Figure 1, with several improvements described below. In order to verify that this file is a correct representation of a lower bound, we implement in Coq a function that performs multiple checks, which we call Check in this section. In essence, this function is analogous to the verifier program discussed in Section 1. The crucial difference is that Check is certified: a theorem, proven in Coq, states that, if Check returns true, then the predicate LowerBoundCoq is valid for the game . Then, by Corollary 1, is also valid for the game .

Although we do not detail the complete Coq script here, which exceeds 2000 lines [GithubCoq], we would like to emphasize that the format used to store the lower bound as well as the function Check that verifies it are not implemented in a naive manner because of the file sizes involved. The features implemented, which therefore complicate the Coq script proving the correctness of Check, include the following.

DAG encoding.

The naive tree decomposition of a winning strategy for Adversary details every decision that has to be made, but may contain a large number of duplicate subtrees. Indeed, several nodes of the tree correspond to the same list of items and loads of bins (up to irrelevant permutations). We therefore use a DAG structure to store these duplicates. As constant-access data structures are not available in Coq, we use a single list of trees to denote all the existing duplicate subtrees. When examining the possible outcomes from a node according to the decision of Algorithm, there are then three possibilities: it corresponds to a direct child of this node (if such a subtree is unique), it corresponds to a tree in the list , or one bin exceeds the target load. Note that trees of the list can themselves refer to subsequent trees of the same list, and we are then able to prove our results by induction. It remains to implement a fast way to check that an item is present in the next part of the list . We use for this purpose an AVL tree dictionary indexed by a pair of lists describing the current items and bin loads. To assess the importance of removing these duplicates, note in Table 5 that it decreases the largest graph size by three orders of magnitude.
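The effect of merging duplicate subtrees can be illustrated with a small Python sketch. It is not the Coq encoding: a hash set replaces the AVL tree dictionary, a node is a hypothetical triple (items, loads, children), and two nodes are assumed equivalent when their sorted item lists and sorted load lists coincide.

```python
def canonical_key(items, loads):
    """Key that identifies a node up to permutations of items and of bins."""
    return (tuple(sorted(items)), tuple(sorted(loads)))


def tree_size(node):
    """Number of nodes when every subtree is stored explicitly."""
    _items, _loads, children = node
    return 1 + sum(tree_size(child) for child in children)


def dag_size(node, seen=None):
    """Number of nodes when nodes with equal canonical keys are stored once."""
    if seen is None:
        seen = set()
    items, loads, children = node
    key = canonical_key(items, loads)
    if key in seen:
        return 0          # duplicate: its subtree is already stored
    seen.add(key)
    return 1 + sum(dag_size(child, seen) for child in children)


# Tiny example: the two children of the root differ only by a bin permutation.
leaf_a = ((1, 2), (1, 2), [])
leaf_b = ((1, 2), (2, 1), [])
left = ((1,), (1, 0), [leaf_a])
right = ((1,), (0, 1), [leaf_b])
root = ((), (0, 0), [left, right])
```

In this example the explicit tree has five nodes while the merged structure has three, since the second branch is recognized as a duplicate of the first.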

Last layer compression.

Often, the last items that are sent by the adversary are independent from the decisions made by Algorithm. However, they can represent a large portion of the nodes in the normal DAG representation. Hence, we store such a situation only as a single node with a list of upcoming items, instead of the full tree. This corresponds directly to storing only one node when the large item heuristic of Section 3.3.2 is successful. The number of nodes obtained in such a compressed DAG (cDAG) is represented in Table 5, and leads to a decrease by a factor of 5.

Binary integers.

Coq proofs often rely on the Peano arithmetic, where a natural integer is represented in a unary way by being either 0 or the successor of an integer. In order to decrease the time and resources required to prove our results, we perform the computations using a binary integer representation. We have therefore implemented two analogous functions, which we can name Check_binary and Check_unary, working respectively on binary and unary integers. We prove that these functions give the same result and that if Check_unary returns true, then is valid. Therefore, we can run the function Check_binary while using unary arithmetic in most of our proofs.

Value of
Lower bound 112/82 19/14 19/14 19/14 19/14 19/14
Tree nodes k 433 3908 M M G
DAG nodes k 236 1271 k k M
cDAG nodes k 102 408 7k 61k 598k
Time 38s 1s 2s 12s 4m30 2h
Table 5: Size of the uncompressed and compressed DAGs and (approximate) time needed to load the trees and certify each lower bound. The running times were measured on a machine with an Intel Core i5-6600 CPU and 32 GB of RAM.

With these features implemented, we have been able to certify all the lower bound results previously published and presented in this paper. The amount of time necessary to run each Coq script is reported in Table 5.

References