On bin packing with clustering and bin packing with delays

08/19/2019 ∙ by Leah Epstein, et al. ∙ University of Haifa

We continue the study of two recently introduced bin packing type problems, called bin packing with clustering, and online bin packing with delays. A bin packing input consists of items of sizes not larger than 1, and the goal is to partition or pack them into bins, where the total size of items of every valid bin cannot exceed 1. In bin packing with clustering, items also have colors associated with them. A globally optimal solution can combine items of different colors in bins, while a clustered solution can only pack monochromatic bins. The goal is to compare a globally optimal solution to an optimal clustered solution, under certain constraints on the coloring provided with the input. We show close bounds on the worst-case ratio between these two costs, called "the price of clustering", improving and simplifying previous results. Specifically, we show that the price of clustering does not exceed 1.93667, improving over the previous upper bound of 1.951, and that it is at least 1.93558, improving over the previous lower bound of 1.93344. In online bin packing with delays, items are presented over time. Items may wait to be packed, and an algorithm can create a new bin at any time, packing a subset of already existing unpacked items into it, under the condition that the bin is valid. A created bin cannot be used again in the future, and all items have to be packed into bins eventually. The objective is to minimize the number of used bins plus the sum of waiting costs of all items, called delays. We build on previous work and modify a simple phase-based algorithm. We combine the modification with a careful analysis to improve the previously known competitive ratio from 3.951 to below 3.1551.

1 Introduction

In bin packing problems, a set of items is given, where each item has a rational size in [0,1] (we allow zero sizes, as they are meaningful in the second problem which we study). The goal is to partition these items into subsets called bins, where the total size for each bin does not exceed 1. We use the term load of a bin for the sum of sizes of its items. The process of assigning an item to a bin is called packing, and in such a case we say that the item is packed into the bin.

We study two bin packing problems. The first problem is called bin packing with clustering. In this problem, every item has a second attribute, called a cluster index or a color. A global solution is one where items are packed without considering their clusters, i.e., it is a solution of the classic bin packing problem for this input. A clustered solution is one where every cluster or color must have its own set of bins, and items of different clusters cannot be packed into a common bin. To avoid degenerate cases, an assumption on the input is enforced. Specifically, it is assumed that every cluster is sufficiently large, and an optimal solution for each cluster has at least three bins. The problem was introduced by Azar et al. [3]. It was shown [3] that replacing this assumption with the weaker one where clusters have at least two bins makes the problem less meaningful. The goal is to compare optimal solutions, that is, to compare an optimal clustered solution to an optimal global solution, also called a globally optimal solution. We are interested in the worst-case ratio over all valid inputs, and this ratio is called the price of clustering. From an algorithmic point of view, the goal is to design an approximation algorithm that is not allowed to mix items of different clusters, while still having a good approximation ratio compared to a globally optimal solution. For applications of this problem in the field of massive data sets, see [3].
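To make the comparison between a clustered optimum and a global optimum concrete, here is a minimal sketch that computes both by exhaustive search on a tiny instance. The helper names (`optimal_bins`, `clustering_ratio`) and the example instance are illustrative assumptions and are not taken from [3]; the search is exponential and is only meant for very small inputs.

```python
from fractions import Fraction


def optimal_bins(sizes):
    """Exact minimum number of unit-capacity bins (branch and bound, tiny inputs only)."""
    sizes = sorted(sizes, reverse=True)
    best = len(sizes)  # one bin per item is always a valid packing

    def place(index, loads):
        nonlocal best
        if len(loads) >= best:
            return  # cannot improve on the best packing found so far
        if index == len(sizes):
            best = len(loads)
            return
        size = sizes[index]
        tried = set()  # bins with equal loads are interchangeable
        for b in range(len(loads)):
            if loads[b] + size <= 1 and loads[b] not in tried:
                tried.add(loads[b])
                loads[b] += size
                place(index + 1, loads)
                loads[b] -= size
        loads.append(size)  # open a new bin for this item
        place(index + 1, loads)
        loads.pop()

    place(0, [])
    return best


def clustering_ratio(clusters):
    """Ratio between the optimal clustered cost and the globally optimal cost."""
    clustered = sum(optimal_bins(items) for items in clusters.values())
    all_items = [size for items in clusters.values() for size in items]
    return Fraction(clustered, optimal_bins(all_items))


# Hypothetical instance: each cluster needs three bins on its own.
example = {
    "red": [Fraction(3, 5)] * 3,
    "blue": [Fraction(2, 5)] * 5,
}
print(clustering_ratio(example))
```

In this instance every cluster has an optimal cost of three bins, so it satisfies the assumption on cluster sizes. The clustered solution uses 6 bins, while a global solution pairs each item of size 3/5 with an item of size 2/5 and uses 4 bins.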

The results of [3] show that the price of clustering (under the assumption above) is strictly below 2, and more specifically, it is at most 1.951. The methods used to prove this are based on an auxiliary graph comparing the two different optimal solutions, and a linear program capturing the properties of worst-case inputs. A computer assisted proof was used to find an upper bound on the price of clustering. A lower bound of 1.93344 was provided as well in the same work.

This problem is closely related to batched bin packing [20, 12, 8, 16]. This is a semi-online problem where items are presented in a number of batches, where every batch is to be packed before the next batch is presented. There are two variants, depending on whether bins opened for earlier batches can be used for the current batch. The variant where every batch has its own bins, and the packing is compared to an optimal (offline) one where items of different batches can still be combined into bins together, is closely related to our work. It is mentioned in [3] that if every cluster is arbitrarily large such that its optimal cost grows to infinity, then the price of clustering decreases to approximately 1.691 (we discuss this value [5, 25, 19, 17, 27, 16] in the body of the paper in a different context). In fact, this result regarding the price of clustering with very large clusters follows directly from an earlier result for batched bin packing [16].

The second problem is bin packing with delays. In this online problem, items are presented over time to be packed into bins. An algorithm can decide to create a bin at any time by selecting a subset of already existing unpacked items. The selected subset should have total size at most 1, and once its bin is created, it cannot be used again for future items. Additionally, every item has a positive monotonically non-decreasing delay function, and the delay cost (or delay) of an item is the value of its delay function at the time elapsed from the arrival of the item until it is packed. The objective is to minimize the number of bins plus the total delay cost of all input items. For example, if every item is assigned to a bin right when it arrives, the delays are the smallest possible, but the number of bins may be very large. On the other hand, if the algorithm waits until many items arrive and it can pack them offline, the delay costs may be very large. The problem is analyzed via the competitive ratio, which is the worst-case ratio between the cost of an online algorithm and an optimal offline solution (which still deals with the input as a sequence arriving over time, but it knows the entire sequence). Competitive algorithms should find a trade-off between waiting for additional items to arrive and the resulting delay costs of already existing items, and one expects to see algorithms designed based on ski-rental type methods [24, 22, 21]. Such methods involve waiting until a certain cost is incurred before performing an action that stops the accumulation of that cost. Obviously, additional problem-specific methods are required in the design of algorithms for problems with delay costs.
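The objective can be illustrated with a short sketch that evaluates the cost of a given schedule, that is, the number of bins plus the total delay cost. The names `Item` and `schedule_cost`, the representation of a schedule as pairs of a creation time and a list of items, and the identity delay function used in the example are assumptions made for illustration only.

```python
from fractions import Fraction
from typing import Callable, NamedTuple


class Item(NamedTuple):
    size: Fraction
    arrival: Fraction
    delay: Callable[[Fraction], Fraction]  # non-decreasing delay function


def schedule_cost(bins):
    """Cost of a schedule given as a list of (creation_time, items) pairs."""
    total = Fraction(0)
    for created, items in bins:
        if sum(item.size for item in items) > 1:
            raise ValueError("total size of a bin exceeds 1")
        if any(item.arrival > created for item in items):
            raise ValueError("an item is packed before it arrives")
        # One unit for the bin, plus the delay of every item packed into it.
        total += 1 + sum(item.delay(created - item.arrival) for item in items)
    return total


def identity(elapsed):
    return elapsed


# Hypothetical input: two items of size 1/2 arriving at times 0 and 1/2.
a = Item(Fraction(1, 2), Fraction(0), identity)
b = Item(Fraction(1, 2), Fraction(1, 2), identity)

eager = [(Fraction(0), [a]), (Fraction(1, 2), [b])]  # pack each item on arrival
patient = [(Fraction(1, 2), [a, b])]                 # wait and share one bin

print(schedule_cost(eager), schedule_cost(patient))
```

Packing on arrival costs 2 (two bins, no delay), while waiting for the second item costs 3/2 (one bin plus a delay of 1/2 for the first item). If the second item arrived much later, waiting would be the more expensive choice, which is the trade-off described above.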

Various online combinatorial optimization problems with delays were studied recently [15, 4, 10], continuing earlier studies of ski-rental type problems. Moreover, a completely different model of bin packing with delays was studied as well [1]. Offline and online bin packing are often studied with respect to asymptotic measures [18, 23, 6, 7], while here we study them via absolute measures, as in previous work on the specific problems we study, where the absolute measure is more appropriate (see [26, 9, 13, 14] for studies of bin packing with respect to absolute measures). The two problems studied here may seem unrelated; one is an offline problem and the other one is a completely different online problem. The flavor of the first problem is not algorithmic, and the algorithmic contribution is used in the analysis. The second problem is an online problem where items arrive over time, and even if one designs an offline algorithm for it, still the time axis has a major role. Since the two problems were introduced and studied in the same work [3] where properties of the first one were used in the analysis of the second one, we study them together as well. Note that we also use properties of offline bin packing for the analysis of the online problem, as we will pack subsets of items at the same time, into one bin or several bins.

Bin packing with delays generalizes the TCP acknowledgement problem [11, 21]. In this problem requests arrive over time, and should be acknowledged at times selected by the algorithm, where at every such time, all pending requests can be acknowledged. The objective is the number of acknowledgement events plus the total waiting time of all requests. Instances of this problem are instances of bin packing with delays with zero size items and delay costs based on the identity function (there is also work on more general delay functions, see for example [2]). Consequently, the known lower bound on the competitive ratio of any algorithm for TCP acknowledgement is also a lower bound on the competitive ratio of any algorithm for bin packing with delays.

In this work, we improve the bounds on the price of clustering, and show close bounds of 1.93558 and 1.93667. The upper bound is shown via weighting functions, while the lower bound uses a careful refinement of the previous lower bound approach, where not only clusters with items of sizes close to 1/2 are defined with respect to the worst-case structure but also more complicated clusters are built. We also show how the previous upper bound result can be obtained using a simple analytical proof, and we briefly discuss other versions (with larger clusters). We also generalize the previous algorithm for bin packing with delays such that its parameter can be arbitrary. Here, we apply a simple weight based analysis to obtain a better upper bound of 3.1551, while the previous bound was 3.951 [3]. Our algorithm does not require computation of optimal solutions, and whenever it packs a subset of items, this is done using a greedy algorithm, and therefore it runs in polynomial time if the delay function can be computed easily.

2 Price of clustering

In this section we study the price of clustering. Note that we consider the case where optimal costs for clusters are at least 3. Considering a parameter k, such that the optimal cost for every cluster is at least k, the cases k = 1 and k = 2 were fully analyzed and declared as uninteresting [3]. For k = 1, the price of clustering is unbounded, as an input of very small items may be partitioned into clusters containing single items. For k = 2, the price of clustering is 2, since every cluster may have one item slightly larger than 1/2 and one item slightly smaller than 1/2, where these items cannot be packed into one bin, while in a globally optimal solution they can be packed in the suitable pairs. On the other hand, every bin is full by more than half on average. We study the most general case where the bound on the price of clustering is strictly below 2. For a different parameter k one can use similar proofs to find close bounds (the tight bounds are expected to tend to approximately 1.691 as k grows). The lower bounds will have a similar structure while the upper bounds will require some modifications of the weight functions.

2.1 A lower bound

Our new lower bound has some similarity to the one of [3]. The idea there was that there can be clusters with two items of sizes just above 1/2 and one item of size just below 1/2, such that no two items can be combined in one bin. In our construction, there will also be clusters with five items of sizes approximately 1/3, such that no three items can be packed into a bin, so at most two of them have sizes of 1/3 or less. Similarly, there will be clusters with eleven items of sizes approximately 1/6, where no six items can be packed into a bin. It is possible to continue and add clusters with items of even smaller sizes, but this will increase only the sixth digit after the decimal point. Since the item sizes already have to be defined carefully, and calculations need to be done precisely to ensure the costs of clusters, we do not give the details of such a construction. The current construction can also be continued with additional very small items, but that would also not increase the value of the lower bound significantly.

Let be a large integer. Let be an integer parameter divisible by , We construct an input where there is a globally optimal solution with bins. The input consists of the following items. Let be a very small value.

  • For , a positive type item has size .

    There is one such item for every .

  • For , a negative type item has size .

    There is one such item for every .

  • There are type items, each of size .

  • For , a positive type item has size . The number of positive type items is .

  • For , a negative type item has size . The number of negative type items is .

  • For , a positive type item has size . The number of positive type items is:
    .

  • For , a negative type item has size . The number of negative type items is .

  • A type item has size . The number of such items is .

  • A type item has size . The number of such items is .

  • A type item has size . The number of such items is .

It is obvious that there is no global solution whose cost is below . A globally optimal solution is defined as follows. For , there is a bin with one positive type item and one negative type item, where the total size for such a pair of items is .

Every bin out of the remaining bins has a type item, so the remaining space of such a bin is , and this is where all other items are packed. Specifically, every positive or negative type is packed into a different such bin. The number of such items is , where

Thus the number of these items is below .

The bins with the largest positive type items, that is, each of the bins with a positive type item, contains also one item of each type out of , , and . The total size for such a bin is

for a sufficiently small value of .

For , every positive type item is packed with a negative type item. For , every negative type item is packed with a positive type item. The total size of items of every such bin is exactly . As for negative type items, they are not combined with additional items and the loads of their bins are approximately . Thus, all items are packed into bins as claimed.

Next, we split items into clusters, and we find the optimal cost for every cluster (in particular we will see that it is at least as it is required for a valid input). We will calculate the total number of bins for the optimal clustered solution. In this input, every cluster will have items of similar sizes.

1. Type items are split into subsets of items each. Since a bin can contain at most such items while is sufficiently small such that items can be packed into a bin, an optimal solution has three bins. Thus, as there are clusters, the contribution to the cost is .

2. The calculation for type items is similar to the previous one. Here a cluster will have items, there are clusters, the contribution to the cost is .

3. The calculation for type items is similar to the last two calculations. Here a cluster will have items, there are clusters, the contribution to the cost is .

4. There are clusters, each containing one type item, and for some value of (), a positive type item and a negative type item. As no two items of one cluster have total size of or less, the optimal cost for each cluster is . The contribution to the cost is therefore . The remaining three items, a type item, a positive type item, and a negative type item are added to one of the clusters, which does not decrease its optimal cost.

5. For every , there is a cluster consisting of five items as follows: three positive type items and two negative type items. The number of clusters for a fixed value of is . Since

no three items fit into one bin, and an optimal solution for every cluster uses three bins. The contribution to the cost is

6. For every , there is a cluster consisting of eleven items as follows: six positive type items and five negative type items. The number of clusters for a fixed value of is .

Since

no six items fit into one bin, and an optimal solution for every cluster uses three bins. The contribution to the cost is

All items were assigned to suitable clusters, and the total cost of the clustered optimal solution is at least

Letting and grow without bound, the ratio between the costs of the two optimal solutions is approximately 1.9355858244424.

Theorem 2.1

The price of clustering is at least 1.93558.

2.2 Upper bounds for

We will start the analysis of upper bounds with a simple analysis of the price of clustering, yielding the bound of [3] in a simple way (in fact, since we use an analytic proof, we show a value of 1.95 rather than 1.951). Unlike the previous proof, we do not use auxiliary graphs or computer assisted analysis. Our improved result of 1.93667 will be based on an extension of the approach of the simple bound.

The analysis yielding the bound resembles the one of Simchi-Levi for First-Fit Decreasing (FFD) [26]. In this algorithm, items are sorted by non-increasing size and First-Fit (FF) is applied to this list. FF is a greedy algorithm that packs every item into the bin of smallest index where it can be packed, given the previously packed items, which are not smaller in the case of FFD.
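The two algorithms described above can be sketched as follows. This is a minimal illustration with bins of capacity 1 and exact rational sizes; the function names and the example list are assumptions chosen for the sketch.

```python
from fractions import Fraction


def first_fit(sizes):
    """Pack items in the given order, each into the lowest-index bin where it fits."""
    bins = []
    for size in sizes:
        for contents in bins:
            if sum(contents) + size <= 1:
                contents.append(size)
                break
        else:
            bins.append([size])  # no existing bin can take the item
    return bins


def first_fit_decreasing(sizes):
    """Sort the items by non-increasing size and apply First-Fit."""
    return first_fit(sorted(sizes, reverse=True))


# Hypothetical input on which the initial sorting matters.
half, third = Fraction(1, 2), Fraction(1, 3)
items = [third, half, third, half, third]

print(len(first_fit(items)), len(first_fit_decreasing(items)))
```

On this list FF uses three bins, because each of the first two bins receives one item of size 1/3 and one of size 1/2, while FFD packs the two items of size 1/2 together and then the three items of size 1/3 together, using two bins.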

For a fixed input, let be the number of clusters. Let be the number of bins in an optimal solution for the th cluster, whose input is . We let be the set of items , where . Let be a globally optimal solution for , as well as its cost. Let be the number of bins in the output of FFD for cluster .

We will use weights for the analysis. Weights allow us to bind two solutions and compare them, using the property that the total weight of all input items can be defined consistently.

We start with defining a simple weight function. Let as follows.

Let , where is the size of item . Let , where . Similarly, , and . An item of size strictly above 1/2 is called large. The next claim provides an upper bound on the total weight, based on the value .

Claim 2.2

We have .

Proof. Consider a bin of OPT. The total size of items is at most , and there is at most one large item, which gives at most , since the total weight is at most
 

The next claim holds by definition, and by the assumption on clusters.

Claim 2.3

and .

Consider the output bins of FFD for some input. Recall that indexes of bins are given according to the order in which FFD opens (first uses) them. Let all bins for an output of FFD be called inner except for the last bin. When we say that a bin is earlier than another bin, we mean that it has a smaller index, and a later bin has a larger index.

Claim 2.4

The set of bins of FFD with large items is a prefix of its bins. Every pair of bins of FFD has total size above . If the first item of some bin has size above for an integer , then every earlier bin without any item of size above has items of sizes in (and possibly other items packed later).

The first part holds because the large items are packed first (and a pair of such items cannot share a bin). The second part and third part hold due to the rule of opening a new bin.
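The first two parts of Claim 2.4 can be checked empirically with a short sketch. It assumes that a large item is one of size strictly above 1/2 and that the bound for a pair of bins is 1 (the standard First-Fit property); both thresholds, the function names, and the random test inputs are assumptions made for this illustration.

```python
import random
from fractions import Fraction


def first_fit_decreasing(sizes):
    """FFD with unit-capacity bins; bins are returned in the order they were opened."""
    bins = []
    for size in sorted(sizes, reverse=True):
        for contents in bins:
            if sum(contents) + size <= 1:
                contents.append(size)
                break
        else:
            bins.append([size])
    return bins


def large_bins_form_prefix(bins):
    """True if the bins containing an item above 1/2 are exactly the first few bins."""
    has_large = [any(size > Fraction(1, 2) for size in b) for b in bins]
    return has_large == sorted(has_large, reverse=True)


def every_pair_exceeds_one(bins):
    """True if the total size of every pair of distinct bins is above 1."""
    loads = [sum(b) for b in bins]
    return all(
        loads[i] + loads[j] > 1
        for i in range(len(loads))
        for j in range(i + 1, len(loads))
    )


rng = random.Random(0)  # fixed seed so that the check is reproducible
for _ in range(200):
    sizes = [Fraction(rng.randint(1, 100), 100) for _ in range(rng.randint(1, 30))]
    packing = first_fit_decreasing(sizes)
    assert large_bins_form_prefix(packing)
    assert every_pair_exceeds_one(packing)
```

A randomized check of this kind is not a proof. It only confirms that the two properties hold on the sampled inputs, in line with the argument given above.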

Claim 2.5

Assume that all inner bins of FFD except for possibly one bin (called bad) have loads of at least for some cluster . Then, the total weight is at least .

Proof. The total size is above

(by considering together the last bin, and the bad inner bin if it exists or the first inner bin otherwise). Thus, as  

Let be the number of bins in the prefix for inner bins of FFD for cluster with large items (where if this prefix is empty). We have (because we only consider inner bins).

Claim 2.6

Assume that , and every inner bin whose index is above has load of at least . Then, the total weight is at least .

Proof. The total size of items is at least

(by considering the first and last bin together, and since ) the number of inner bins with loads at least is non-negative).

Thus , and we have

As and we get

 

Let be the first item of the last bin of FFD and its size.

Claim 2.7

At least one of the cases of Claims 2.5 and 2.6 holds for every cluster.

Proof. If , by the second part of Claim 2.4, all inner bins have loads above .

Otherwise, all items that arrived before have sizes above . Every inner bin without a large item has exactly two items of sizes in . If , the condition of Claim 2.5 holds and otherwise the condition of Claim 2.6 holds.   

Proposition 2.8

The price of clustering is at most 1.95.

Proof. We have .   

We proceed to an improved analysis. Intuitively, a bad structure of clusters is that used in the lower bound construction, that is, clusters consist of items of similar sizes, some of which are slightly smaller than a given reciprocal of an integer and some slightly larger than this value. Our improved weight function is based on dealing with such clusters, and in particular, such clusters for items that are relatively large.

For simplicity, we will use the same notation, and use as the name of our new weight function, and the function is also based on item sizes as follows.

The values after the equality sign in the definition are called bonuses, and their values are between zero and . Next, we find an upper bound on the total weight.

Claim 2.9

We have , where .

Proof. Consider a bin of OPT. The total size of items is at most . We consider items of of sizes above , as the weight of any other item is just times its size. If there is no large item, the ratio does not exceed , so the total weight for is below .

If there is a large item, there can be at most two other items of sizes above . If there is also an item of size in , these are the only two items with positive bonuses. Otherwise, if there is also an item of size in , there can be another item of size in . In other cases the total weight is smaller. Thus, the total weight is at most

 

Once again, we show that for every cluster, it holds that . We use the index and the size as before.

Lemma 2.10

Given a cluster , it holds that .

Proof. If the load of any inner bin, possibly excluding one inner bin, is at least , since the total load of any inner bin and the last inner bin is above , we get

Thus,

since . Thus, we assume that at least two inner bins have loads below .

We split the analysis into several cases. In the case , the load of any inner bin is above , so this case was already excluded.

In the case , the load of every inner bin is at least , and the total size satisfies

We have

by . If there is at least one item of size above , or at least four items with positive bonuses, we are done. Otherwise, there are at most three items of sizes above , all of which are not larger than , and every inner bin, except for possibly the first one, has six items of sizes in , so the loads are above , contradicting the assumption.

In the case , the load of every inner bin is at least , and the total size satisfies

We have

by . The last bin has an item of bonus . Every inner bin with an item of size above has a bonus of at least , and every inner bin without such an item has at least four items of sizes in , so the total bonus is at least . The total bonus is therefore at least , and .

In the case , the load of every inner bin is at least , and the total size satisfies

We have

by . If there is a large item, we are done. Otherwise, if there are at least three items of sizes in , since the last bin has an item of bonus , the total bonus is at least . Otherwise, all items of sizes above (at most two such items) are packed into the first inner bin, and every inner bin except for the first one has at least four items of sizes in , and its load is above . As in the case , we have , and the calculation of bonuses is also the same as in that case.

In the case , the load of every inner bin is at least , and the total size satisfies . We have

The bonus of the item of the last bin is , so if there is also a large item, the total bonus is at least and we are done. Otherwise, if all inner bins together have at least five items of sizes in , the total bonus is at least . We are left with the case where there are at most four such items, and in fact there are exactly four such items, as every inner bin has at least two such items. If the second inner bin does not have an item of size above , then it has three items of sizes above , so this case is impossible. Thus, the first bin has two items of sizes above , and the second bin has at least one such item. The total bonus in this case is .

In the case , we use the value of in the analysis. We consider the first inner bin together with the last bin. Every inner bin that is not in the prefix of first inner bins has two items of sizes in . The bins of indices have large items, and the first inner bin either has a large item or two items of sizes in , so its bonus is at least .

If , we get and

Otherwise, , and the first inner bin has a large item. Thus,

and

By , we have

 

Theorem 2.11

The price of clustering is at most 1.93667.

Proof. We have .   

2.3 The price of clustering for larger parameters

In this section we briefly discuss the case of larger , that is, the case where for a given integer , it is known that the optimal solution for every cluster has cost not smaller than .

The lower bound has a similar structure in the sense that item types are similar. In the case , half of the bins of a globally optimal solution have two items of sizes close to , while here only a fraction of of the bins will be such. In the clustered solution, clusters with items of sizes approximately will still have two items of sizes below , but the number of items of sizes above will be . For clusters with items of sizes approximately , there are still five items of size below in each such cluster, but there are items of sizes above . For items of sizes just above (), still the last bin of the cluster will have just one item, but there are bins, so the number of items will be .

The resulting numbers of items (up to negligible constants) are as follows. The number of items of sizes just above is and the number of items of sizes just below is , items of sizes just above : , items of sizes just below : , items of sizes just above : , items of sizes just below : , and items of sizes just above , , and : .

Proposition 2.12

The lower bound on the price of clustering for a given value is

Since this generalizes the case , indeed for we get the earlier lower bound of . For , and , the approximate lower bounds are 1.8781318, 1.8410851, 1.815945, 1.7979, 1.78437, 1.77386, and 1.76546, respectively.

The lower bound for growing to infinity is only approximately since we did not use the entire series but only the first few elements of the sequence defined earlier.

It is possible to show close upper bounds for other values of as well. As an example, we show a close upper bound for . Once again, we expect the worst case for clusters to be of the same form as before, but there will be another relatively full bin in every cluster.

We will use the same notation once more, and use as the name of our new weight function, and the function is also based on item sizes. Let .

Let , where . We have .

The values after the equality sign in the definition are called bonuses again. We find an upper bound on the total weight.

Claim 2.13

We have .

Proof. Consider a bin of OPT. The total size of items is at most . We consider items of of sizes above , as the weight of any other item is just times its size. If there is no large item, the ratio does not exceed , so the total weight for is below . The remaining case, similarly to the calculation for yields .   

Once again, we show that for every cluster, it holds that . We use the index and the size as before.

Lemma 2.14

Given a cluster , it holds that .

Proof. If the load of any inner bin, possibly excluding one inner bin, is at least , once again we get . Thus,

since . Thus, we assume that at least two inner bins have loads below , and therefore we can assume that holds again.

In the case , we found , and we have

by . If , we already get , so we focus on the case for the current range of .

As and , if there is at least one item of size above , or at least six items with positive bonuses, we are done. Otherwise, there are at most five items of sizes above , all of which are not larger than , so the third inner bin has six items of sizes in , and more specifically, all six items of sizes at least . This gives us another lower bound on the total size:

and .

In the case , the total size satisfies , and we have

by . The last bin has an item of bonus . Every inner bin with an item of size above has a bonus of at least , and every inner bin without such an item has at least four items of sizes in , so the total bonus is at least . The total bonus is therefore at least , and .

In the case , the total size satisfies . We have

by .

If there is a large item, we are done as . Otherwise, since , if there are at least five items of sizes in , we are done. Otherwise, all items of sizes above (at most four such items) are packed into the two first inner bins, and every inner bin except for the first two has at least four items of sizes in , and its load is above . If there is just one inner bin with at least one item of size above , we get the same bound on the total size and the entire analysis is the same as in the case where . Otherwise, there are at least three items with bonuses of (two in the first bin and one in the second bin), and the second bin has at least one additional item with a positive bonus. There are at least five other items (including an item of the last bin) with bonuses of . The total bonus is therefore at least . The total size is at least , and