# Recoverable Robust Representatives Selection Problems with Discrete Budgeted Uncertainty

Recoverable robust optimization is a multi-stage approach, where it is possible to adjust a first-stage solution after the uncertain cost scenario is revealed. We analyze this approach for a class of selection problems. The aim is to choose a fixed number of items from several disjoint sets, such that the worst-case costs after taking a recovery action are as small as possible. The uncertainty is modeled as a discrete budgeted set, where the adversary can increase the costs of a fixed number of items. While special cases of this problem have been studied before, its complexity has remained open. In this work we make several contributions towards closing this gap. We show that the problem is NP-hard and identify a special case that remains solvable in polynomial time. We provide a compact mixed-integer programming formulation and two additional extended formulations. Finally, computational results are provided that compare the efficiency of different exact solution approaches.


## 1 Introduction

Most optimization problems in practice are uncertain. To handle such problems under uncertainty, a vibrant field of research has been developed, including such approaches as fuzzy optimization [LK10], stochastic programming [BL11], or robust optimization [KZ16]. Treating uncertainty in an optimization problem typically increases its computational complexity, which means that a problem that is simple under known problem data may become challenging when the data is not known precisely.

In this paper we consider one such problem, which is the representatives multi-selection problem (RMSP). We are given several disjoint sets of items, and need to choose a specified number of items from each set. The aim is to minimize a linear cost function in the items. More formally, the problem can be described as

$$\min_{x\in\mathcal{X}} c^t x$$

with

$$\mathcal{X}=\Big\{x\in\{0,1\}^n : \sum_{i\in T_j} x_i = p_j \ \ \forall j\in[K]\Big\}$$

and $T_1\cup\dots\cup T_K=[n]$ forming a partition of the item set into disjoint sets called parts, where we use the notation $[n]=\{1,\dots,n\}$. The special case of $K=1$ is known as the selection problem [KZ17], while the case with $p_j=1$ for all $j\in[K]$ has been studied as the representatives selection problem [KKZ15] or weighted disjoint hitting set problem [Büs11].

To follow a robust optimization approach for this problem, we need to specify an uncertainty set $\mathcal{U}$ containing all cost scenarios against which we wish to prepare. The classic (single-stage, min-max) approach is then to consider the problem

$$\min_{x\in\mathcal{X}}\ \max_{c\in\mathcal{U}}\ c^t x$$

A drawback of this approach is that it does not incorporate the possibility to react once scenario information becomes available. To alleviate this, two-stage approaches have been introduced, in particular adjustable robust optimization [YGdH19] and recoverable robust optimization [LLMS09]. In the latter approach, we fix a complete solution in the first stage, and can slightly adjust it after the scenario has been revealed.

Different types of uncertainty sets have been proposed in the literature, including discrete uncertainty, interval uncertainty, or ellipsoidal uncertainty [GS16]. Particularly successful has been the so-called budgeted uncertainty as first introduced in [BS03], where only a bounded number of cost coefficients can deviate from their nominal values. That is, the set of possible cost scenarios is given as

$$\mathcal{U}=\Big\{c\in\mathbb{R}^n : c_i=\underline{c}_i+d_i\delta_i,\ \delta_i\in\{0,1\},\ \sum_{i\in[n]}\delta_i\le\Gamma\Big\}$$

for some integer $\Gamma$. We refer to this type of uncertainty as discrete budgeted uncertainty. It is also possible to define continuous budgeted uncertainty, where we allow the deviations $\delta_i$ to be continuous within the interval $[0,1]$. Discrete budgeted uncertainty hence contains only the extreme points of the continuous budgeted uncertainty polyhedron. In the case of single-stage robust optimization, the continuous and discrete variants are equivalent; this is not necessarily the case for two-stage robust optimization.
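To make the single-stage case concrete, the following minimal sketch evaluates the worst-case cost of a fixed selection under discrete budgeted uncertainty. It assumes nonnegative deviations and an integer budget; the function name `worst_case_cost` and the data are hypothetical and only serve as an illustration. Because the cost is linear in the deviations, the adversary simply raises the largest deviations among the selected items.

```python
def worst_case_cost(x, c_low, d, gamma):
    """Return the maximum of c^t x over the discrete budgeted set.

    x      -- 0/1 list, the fixed selection
    c_low  -- nominal (lower) costs
    d      -- nonnegative deviations (assumption of this sketch)
    gamma  -- number of costs the adversary may increase
    """
    nominal = sum(cl for cl, xi in zip(c_low, x) if xi)
    # Only deviations of selected items matter; take the gamma largest.
    deviations = sorted((di for di, xi in zip(d, x) if xi), reverse=True)
    return nominal + sum(deviations[:gamma])


x = [1, 0, 1, 1]
c_low = [3, 1, 4, 2]
d = [2, 9, 1, 5]
print(worst_case_cost(x, c_low, d, gamma=2))  # 9 + 5 + 2 = 16
```

The large deviation of the second item is irrelevant here because that item is not selected.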

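We write $\overline{c}_i=\underline{c}_i+d_i$ for the largest possible cost of item $i$. We use a recoverable robust approach, where we buy a solution $x\in\mathcal{X}$ in the first stage with known costs $C$, and then can adjust this solution after the uncertain second-stage costs $c\in\mathcal{U}$ have been revealed. We can only change at most $k$ many items. Let $\Delta(x,y)=\sum_{i\in[n]}|x_i-y_i|$ denote the Hamming distance, and let $R(x)=\{y\in\mathcal{X}:\Delta(x,y)\le 2k\}$ be the set of recovery solutions for some integer $k\ge 0$; exchanging one item for another changes two coordinates, so $R(x)$ contains exactly those solutions that differ from $x$ in at most $k$ items. We define the following problems. In the incremental problem, we are given $x\in\mathcal{X}$ and $c\in\mathcal{U}$, and solve

$$\textsc{Inc}(x,c)=\min_{y\in R(x)}\ \sum_{i\in[n]}c_iy_i.$$

This represents the recovery step after the solution $x$ has been fixed and the scenario $c$ has been revealed. A layer above this is the adversarial problem, where given $x\in\mathcal{X}$, we solve

$$\textsc{Adv}(x)=\max_{c\in\mathcal{U}}\ \textsc{Inc}(x,c).$$

Finally, the recoverable robust representatives multi-selection problem (RRRMSP) is to solve

$$\textsc{Rec}=\min_{x\in\mathcal{X}}\ \sum_{i\in[n]}C_ix_i+\textsc{Adv}(x).$$

We will sometimes refer to this as the recoverable robust problem for short. The special case of the recoverable robust representatives selection problem (RRRSP), where $p_j=1$ for all $j\in[K]$, was first considered in the PhD thesis [Büs11]. The case of discrete budgeted uncertainty was highlighted as being open.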
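The three nested problems (incremental, adversarial, recoverable robust) can be illustrated by plain enumeration on a tiny instance. The sketch below is not an efficient method and is not taken from the paper; the helper names `feasible`, `inc`, `adv`, `rec` and the instance data are hypothetical. Solutions are represented as sets of item indices, and a recovery may replace at most `k` items of the first-stage solution.

```python
from itertools import combinations, product


def feasible(parts, p):
    """All selections that take p[j] items from parts[j], as frozensets."""
    per_part = [list(combinations(T, pj)) for T, pj in zip(parts, p)]
    return [frozenset(i for choice in combo for i in choice)
            for combo in product(*per_part)]


def inc(x, c, X, k):
    """Incremental problem: cheapest y that drops at most k items of x."""
    return min(sum(c[i] for i in y) for y in X if len(x - y) <= k)


def adv(x, c_low, d, X, k, gamma):
    """Adversarial problem: worst attack on at most gamma items."""
    n = len(c_low)
    best = float("-inf")
    for size in range(gamma + 1):
        for attacked in combinations(range(n), size):
            c = [c_low[i] + (d[i] if i in attacked else 0) for i in range(n)]
            best = max(best, inc(x, c, X, k))
    return best


def rec(parts, p, C, c_low, d, k, gamma):
    """Recoverable robust problem: best first-stage choice and its value."""
    X = feasible(parts, p)
    return min((sum(C[i] for i in x) + adv(x, c_low, d, X, k, gamma), sorted(x))
               for x in X)


# Hypothetical instance: two parts with two items each, one item per part.
parts = [[0, 1], [2, 3]]
C = [1, 2, 1, 3]        # first-stage costs
c_low = [2, 1, 3, 1]    # nominal second-stage costs
d = [5, 1, 2, 4]        # possible cost increases
print(rec(parts, [1, 1], C, c_low, d, k=1, gamma=1))  # (6, [0, 2])
```

The enumeration grows exponentially in the number of parts and in the budget, which is why the structural results of the following sections are needed.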

The special case $K=1$, i.e., the recoverable robust selection problem with discrete budgeted uncertainty, was previously considered in [CGKZ18]. It was shown that the adversarial problem can be solved in polynomial time, and a compact formulation for the recoverable robust problem was derived. So far, neither positive nor negative complexity results for the RRRMSP or RRRSP have been derived, despite the problems being open for nearly 10 years.

Other variants of robust selection problems have been considered in the literature as well. In [Ave01], a polynomial time algorithm was presented for the selection problem with a min-max regret objective and interval uncertainty. The algorithm complexity was further improved in [Con04]. In [DK12], min-max and min-max regret representatives selection problems with interval and discrete uncertainty were considered, and these results were further refined in [DW13]. More recently, [GKZ19] consider two-stage representatives selection problems with general convex uncertainty sets. Furthermore, the setting of recoverable robustness has been applied to other combinatorial problems as well, including knapsack [BKK11], shortest path [Büs12] and spanning tree [HKZ17] problems.

In Table 1 we show the data of an example RRRSP with , and . We further have .

A natural candidate solution is to pick items 1 and 4. This choice has first-stage costs of 8. A worst-case scenario is that the costs of item 4 are increased, which forces us to respond by exchanging item 4 for item 3. The second-stage costs are thus 19, with total costs . Choosing items 2 and 4 results in an objective value of .

Now consider the solution where we pack items 1 and 3. A worst-case attack is now on item 1, which results in an optimal recovery of exchanging items 1 and 2. The total costs of this solution are . In fact, this is the unique optimal solution to the problem.

Note that item 3 is dominated by item 4, being worse in every cost coefficient. A natural assumption one may make is that dominated items should not be packed in an optimal solution. That is, if there is such that , and for some other item , then we can assume in an optimal solution. This example demonstrates that this is in fact not the case, underlining that the recoverable robust problem, despite a seemingly easy structure, is more complex than it appears.

In this paper we make the following contributions. In Section 2 we show that it is possible to solve the adversarial problem in polynomial time. In Section 3 we solve a long-standing open problem by showing that already the recoverable robust representatives selection problem with $p_j=1$ and $|T_j|=2$ for all $j\in[K]$ is NP-hard. In Section 4 we show that a special case of the problem can be solved in polynomial time. Here we assume that $|T_j|=2$, $p_j=1$ and $k=\Gamma=1$. That is, each part contains exactly two items, of which one must be chosen. The adversary can increase costs once, and we can recover by exchanging a single item. The idea to prove that this case can be solved in polynomial time is based on the following observation. Consider any min-max problem with $S$ scenarios:

$$\min_{x\in\mathcal{X}}\ \max_{s\in[S]}\ \sum_{i\in[n]}c^s_ix_i$$

This problem can be equivalently written as

$$\min_{s\in[S]}\ \min_{x\in\mathcal{X}^s}\ \sum_{i\in[n]}c^s_ix_i$$

where

$$\mathcal{X}^s=\Big\{x\in\mathcal{X} : \sum_{i\in[n]}c^s_ix_i\ge\sum_{i\in[n]}c^k_ix_i\ \ \forall k\in[S]\Big\}$$

that is, we guess the worst-case scenario $s$, but restrict the set of feasible solutions to those where $s$ is indeed the worst case. As far as we are aware, such an approach has not been successfully applied before. We consider problem models in Section 5, where we use insight on the adversarial problem from Section 2 to derive a compact mixed integer programming formulation, i.e. a formulation as a mixed integer program of polynomial size. As the number of constraints and variables is of the order , we also discuss different iterative solution approaches. We present computational experiments in Section 6, comparing different exact solution approaches developed in this paper. Finally, Section 7 concludes the paper, and further research questions are pointed out.
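The scenario-guessing reformulation can be checked numerically on a small example. The following sketch compares the plain min-max value with the value obtained by guessing the worst-case scenario and restricting to the solutions for which the guess is correct. The function names and the data are hypothetical, and the enumeration is only meant to illustrate the equivalence, not to be efficient.

```python
def cost(c, x):
    return sum(ci * xi for ci, xi in zip(c, x))


def min_max(X, scenarios):
    """Plain min-max value over an explicit list of scenarios."""
    return min(max(cost(c, x) for c in scenarios) for x in X)


def guess_worst_case(X, scenarios):
    """Guess the worst-case scenario s and optimize over X^s only."""
    best = float("inf")
    for cs in scenarios:
        # X^s: solutions for which scenario s attains the maximum cost.
        Xs = [x for x in X if all(cost(cs, x) >= cost(ck, x) for ck in scenarios)]
        if Xs:  # a scenario may be a worst case for no solution at all
            best = min(best, min(cost(cs, x) for x in Xs))
    return best


# Hypothetical data: choose 2 out of 3 items, three cost scenarios.
X = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
scenarios = [(1, 5, 2), (4, 1, 3), (2, 2, 6)]
print(min_max(X, scenarios), guess_worst_case(X, scenarios))  # 6 6
```

Every solution belongs to at least one set of the second kind, namely the one of its own worst-case scenario, which is why both values coincide.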

To derive a model for

$$\textsc{Adv}(x)=\max_{c\in\mathcal{U}}\ \textsc{Inc}(x,c)$$

we first consider the incremental problem and model it as the following integer program.

$$\begin{aligned}
\textsc{Inc}(x,c)=\min\ & \sum_{i\in[n]}c_iy_i && && \text{(1a)}\\
\text{s.t.}\ & \sum_{i\in T_j}y_i=p_j && \forall j\in[K] && \text{(1b)}\\
& \sum_{i\in[n]}x_iy_i\ge P-k && && \text{(1c)}\\
& y_i\in\{0,1\} && \forall i\in[n] && \text{(1d)}
\end{aligned}$$

where $P=\sum_{j\in[K]}p_j$ is the total number of elements to select. Variable $y_i$ denotes if item $i$ is contained in the recovery solution. Constraint (1c) ensures that we must use at least $P-k$ items from the first-stage solution $x$, i.e., at most $k$ items can be exchanged for other items. As $x$ is fixed in this context, model (1) is an integer linear program. Note that the coefficient matrix is totally unimodular. We dualize its linear programming relaxation by introducing variables $\alpha_j$ for constraints (1b), variable $\beta$ for constraint (1c), and variables $\gamma_i$ for constraints (1d) (which become $0\le y_i\le 1$ in the relaxation). Using the dual of Inc and by exploiting weak and strong duality, we can construct the following compact formulation for the adversarial problem.

$$\begin{aligned}
\textsc{Adv}(x)=\max\ & \sum_{j\in[K]}p_j\alpha_j+(P-k)\beta-\sum_{i\in[n]}\gamma_i && && \text{(2a)}\\
\text{s.t.}\ & \sum_{i\in[n]}\delta_i\le\Gamma && && \text{(2b)}\\
& \alpha_j+x_i\beta\le\underline{c}_i+d_i\delta_i+\gamma_i && \forall j\in[K],\ i\in T_j && \text{(2c)}\\
& \delta_i\in\{0,1\} && \forall i\in[n] && \text{(2d)}\\
& \beta\ge 0 && && \text{(2e)}\\
& \gamma_i\ge 0 && \forall i\in[n] && \text{(2f)}
\end{aligned}$$

We use $\delta_i$ as a binary variable indicating whether the cost of item $i$ should be increased. Constraint (2b) ensures that the total number of items with increased costs is less than or equal to $\Gamma$. Constraints (2c) are the dual constraints to variables $y_i$ in problem Inc.

In the following we show that Adv can be solved in polynomial time. To this end, we use an enumeration argument to decompose Adv into simpler subproblems.

Let us first assume that we fix variable $\beta$ to some value, and decide how many items $\Gamma_j$ per part $T_j$ should have increased costs, with $\sum_{j\in[K]}\Gamma_j\le\Gamma$. Then we have

with subproblems

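$$\begin{aligned}
\textsc{Adv}_j(x,\beta,\Gamma_j)=(P-k)\beta+\max\ & p_j\alpha_j-\sum_{i\in T_j}\gamma_i\\
\text{s.t.}\ & \sum_{i\in T_j}\delta_i\le\Gamma_j\\
& \alpha_j+x_i\beta\le\underline{c}_i+d_i\delta_i+\gamma_i && \forall i\in T_j\\
& \delta_i\in\{0,1\} && \forall i\in T_j\\
& \gamma_i\ge 0 && \forall i\in T_j
\end{aligned}$$

In particular, for fixed $\beta$ and $\Gamma_1,\dots,\Gamma_K$, we can decompose the problem and consider each part separately. Note that in an optimal solution to $\textsc{Adv}_j$, we can assume that

$$\gamma_i=[\alpha_j+x_i\beta-\underline{c}_i-d_i\delta_i]_+$$

where we use the notation $[v]_+=\max\{0,v\}$ for the positive part of a value. Using this observation, we can rewrite the problem as

$$\begin{aligned}
\textsc{Adv}_j(x,\beta,\Gamma_j)=(P-k)\beta+\max\ & p_j\alpha_j-\sum_{i\in T_j}[\alpha_j+x_i\beta-\underline{c}_i-d_i\delta_i]_+ && && \text{(3a)}\\
\text{s.t.}\ & \sum_{i\in T_j}\delta_i\le\Gamma_j && && \text{(3b)}\\
& \delta_i\in\{0,1\} && \forall i\in T_j && \text{(3c)}
\end{aligned}$$

For any fixed choice of $\delta$, the remaining problem is piecewise linear in variable $\alpha_j$. We can conclude that an optimal value for $\alpha_j$ is at one of the kink points, where the slope of the piecewise linear function changes. Hence, there is an optimal $\alpha_j\in A_j(\beta)$ with

$$\begin{aligned}
A_j(\beta)&=A^1_j\cup A^2_j\cup A^3_j(\beta)\cup A^4_j(\beta),\ \text{where}\\
A^1_j&=\{\underline{c}_i : i\in T_j\}\\
A^2_j&=\{\underline{c}_i+d_i : i\in T_j\}\\
A^3_j(\beta)&=\{\underline{c}_i-\beta : i\in T_j\}\\
A^4_j(\beta)&=\{\underline{c}_i+d_i-\beta : i\in T_j\}
\end{aligned}$$

For a fixed choice of $\alpha_j$, problem (3) is equivalent to a selection problem, which can be solved in $O(|T_j|)$ time. Furthermore, we have $|A_j(\beta)|\le 4|T_j|$. Overall, we can calculate

$$\textsc{Adv}_j(x,\beta,\Gamma_j)=\max_{\alpha_j\in A_j(\beta)}\ \textsc{Adv}_j(x,\beta,\Gamma_j,\alpha_j)$$

in $O(|T_j|^2)$ time, where $\textsc{Adv}_j(x,\beta,\Gamma_j,\alpha_j)$ denotes problem (3) with fixed choice of $\alpha_j$.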
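The kink-point argument for a single part can be sketched as follows. The code enumerates the candidate values for the dual variable of the part, and for each candidate solves the remaining selection problem by picking the items whose attack reduces the penalty the most. This is a sketch under stated assumptions: the function names are hypothetical, the constant term $(P-k)\beta$ is left out, sorting is used instead of a linear-time selection routine, and the data are integers with nonnegative deviations. As a cross-check, the second function uses linear programming duality for the inner problem: for fixed attacks, the best value equals the sum of the $p_j$ smallest adjusted costs $\underline{c}_i+d_i\delta_i-x_i\beta$.

```python
from itertools import combinations


def pos(v):
    return max(0, v)


def adv_part(T, p_j, x, c_low, d, beta, gamma_j):
    """Maximize p_j*alpha - sum_i [alpha + x_i*beta - c_i - d_i*delta_i]_+
    over the kink points alpha and at most gamma_j attacked items in T."""
    candidates = set()
    for i in T:
        for base in (c_low[i], c_low[i] + d[i]):
            candidates.update((base, base - beta))
    best = float("-inf")
    for alpha in candidates:
        plain = [pos(alpha + x[i] * beta - c_low[i]) for i in T]
        hit = [pos(alpha + x[i] * beta - c_low[i] - d[i]) for i in T]
        # Attacking item i lowers the penalty by plain - hit >= 0.
        savings = sorted((a - b for a, b in zip(plain, hit)), reverse=True)
        best = max(best, p_j * alpha - sum(plain) + sum(savings[:gamma_j]))
    return best


def adv_part_bruteforce(T, p_j, x, c_low, d, beta, gamma_j):
    """Enumerate all attacks; use the p_j cheapest adjusted costs."""
    best = float("-inf")
    for size in range(gamma_j + 1):
        for attacked in combinations(T, size):
            adjusted = sorted(c_low[i] + (d[i] if i in attacked else 0)
                              - x[i] * beta for i in T)
            best = max(best, sum(adjusted[:p_j]))
    return best


# Hypothetical part with three items, two of which must be selected.
T, x = [0, 1, 2], [1, 0, 1]
c_low, d = [4, 2, 5], [3, 6, 1]
print(adv_part(T, 2, x, c_low, d, beta=2, gamma_j=1))             # 5
print(adv_part_bruteforce(T, 2, x, c_low, d, beta=2, gamma_j=1))  # 5
```

The first function needs one pass over the candidate set per part, whereas the brute-force check enumerates all attack sets and is only usable for very small parts.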

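Now let us assume that for each part $T_j$, we guess the value of $\alpha_j$. Let $\alpha_j=v_j$ for $j\in C$ (in case $\alpha_j\in A^1_j\cup A^2_j$) and $\alpha_j=v_j-\beta$ for $j\in V$ (in case $\alpha_j\in A^3_j(\beta)\cup A^4_j(\beta)$), $C\cup V=[K]$, and $C\cap V=\emptyset$ with suitable constants $v_j$. We find that

$$\begin{aligned}
\textsc{Adv}(x)=\max\ & (P-k)\beta+\sum_{j\in C}p_jv_j+\sum_{j\in V}p_j(v_j-\beta)-\sum_{i\in[n]}\gamma_i\\
\text{s.t.}\ & \sum_{i\in[n]}\delta_i\le\Gamma\\
& v_j+x_i\beta\le\underline{c}_i+d_i\delta_i+\gamma_i && \forall j\in C,\ i\in T_j\\
& v_j-\beta+x_i\beta\le\underline{c}_i+d_i\delta_i+\gamma_i && \forall j\in V,\ i\in T_j\\
& \delta_i\in\{0,1\} && \forall i\in[n]\\
& \beta\ge 0\\
& \gamma_i\ge 0 && \forall i\in[n]
\end{aligned}$$

As before, we can assume that in an optimal solution we have

$$\begin{aligned}
\gamma_i&=[v_j+x_i\beta-\underline{c}_i-d_i\delta_i]_+ && \forall j\in C,\ i\in T_j\\
\gamma_i&=[v_j-\beta+x_i\beta-\underline{c}_i-d_i\delta_i]_+ && \forall j\in V,\ i\in T_j
\end{aligned}$$

Using this property, $\textsc{Adv}(x)$ can be rewritten as:

$$\begin{aligned}
\textsc{Adv}(x)=\max\ & (P-k)\beta+\sum_{j\in C}p_jv_j+\sum_{j\in V}p_j(v_j-\beta)\\
& -\sum_{j\in C}\sum_{i\in T_j}[v_j+x_i\beta-\underline{c}_i-d_i\delta_i]_+-\sum_{j\in V}\sum_{i\in T_j}[v_j-\beta+x_i\beta-\underline{c}_i-d_i\delta_i]_+\\
\text{s.t.}\ & \sum_{i\in[n]}\delta_i\le\Gamma\\
& \delta_i\in\{0,1\} && \forall i\in[n]\\
& \beta\ge 0
\end{aligned}$$

Similar to the reasoning for $\alpha_j$, we find that $\textsc{Adv}(x)$ is piecewise linear in $\beta$, and conclude that there is an optimal $\beta$ which is equal to one of the kink points

$$B(\alpha_1,\dots,\alpha_K)=\{\underline{c}_i+d_i\delta_i-v_j : j\in[K],\ i\in T_j\}\cup\{0\}$$

and hence it is sufficient to consider $\beta$ from the set

$$B=\{\underline{c}_i+d_ib_i-\underline{c}_j-d_jb_j : i,j\in[n],\ b_i,b_j\in\{0,1\}\}\cup\{0\}$$

Note that $|B|\in O(n^2)$.

We can now show that $\textsc{Adv}(x)$ can be solved in polynomial time. Note that directly enumerating all combinations of $\alpha_j$ and $\Gamma_j$ over the parts $j\in[K]$ would require exponential time, which is why we combine our structural observations with a dynamic program to show the following result.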

###### Theorem 1.

The adversarial problem of RRRMSP with discrete budgeted uncertainty can be solved in strongly polynomial time.

###### Proof.

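We first enumerate all values $\beta\in B$. For each $j\in[K]$ and each $\Gamma_j\in\{0,\dots,\Gamma\}$, we calculate $\textsc{Adv}_j(x,\beta,\Gamma_j)$ by enumerating over all possible values of $\alpha_j\in A_j(\beta)$.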

We then use the following dynamic program. Let

denote the maximum adversary value that is achievable using only the first parts and a budget of . We have . Using the recursion

it is then possible to calculate all values of . As

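it is possible to solve the adversarial problem in polynomial time. More precisely, calculating all values $\textsc{Adv}_j(x,\beta,\Gamma_j)$ for fixed $\beta$ requires $O\big(\sum_{j\in[K]}\Gamma|T_j|^2\big)$ time. The subsequent dynamic program needs to calculate $O(K\Gamma)$ many values, each of which requires $O(\Gamma)$ table lookups. So overall the runtime of this method is

$$O\Big(|B|\Big(\sum_{j\in[K]}\Gamma|T_j|^2+\Gamma^2K\Big)\Big)=O(n^4\Gamma+n^2\Gamma^2K)=O(n^5)$$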
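The dynamic program from the proof distributes the attack budget over the parts. A minimal sketch of this step for one fixed value of $\beta$ is given below. It assumes that the per-part values have already been computed and stored in a table, and that any term common to all parts, such as $(P-k)\beta$, is added once by the caller; both the table layout and the names `combine_parts` and `F` are assumptions of this illustration rather than notation from the paper.

```python
def combine_parts(part_values, gamma):
    """Best total value when at most gamma attacks are split over the parts.

    part_values[j][g] is the value of part j if exactly g attacks are
    spent in it, for g = 0, ..., gamma.
    """
    neg_inf = float("-inf")
    # F[b]: best value of the parts processed so far with budgets summing to b.
    F = [0] + [neg_inf] * gamma
    for vals in part_values:
        F = [max(F[b - g] + vals[g] for g in range(b + 1))
             for b in range(gamma + 1)]
    return max(F)


# Hypothetical table for three parts and a budget of two attacks.
part_values = [[4, 5, 8], [3, 7, 7], [1, 2, 6]]
print(combine_parts(part_values, gamma=2))  # 13
```

Each part updates a table with $\Gamma+1$ entries using at most $\Gamma+1$ lookups per entry, which matches the $\Gamma^2K$ term in the runtime bound. The outer loop over all candidate values of $\beta$ is omitted here.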

## 3 Hardness of representatives selection

While the adversarial problem of RRRMSP can be solved in polynomial time, the recoverable robust problem is hard already in the case of RRRSP, as the following result shows.

###### Theorem 2.

The decision version of the recoverable robust representatives selection problem (i.e. the case $p_j=1$ for all $j\in[K]$) is NP-complete, even if we have $|T_j|=2$ for all $j\in[K]$.

###### Proof.

Membership in NP follows from the fact that the adversarial problem is in P (Theorem 1), so it remains to show NP-hardness.

To show this, we reduce from the well-known NP-complete problem Partition. An instance of Partition consists of a multi-set . It is a yes-instance if there exists such that , where we denote by the sum of elements in . Given an instance of Partition, define and let be some big integer. Consider the following instance of RRRSP, which is depicted in Table 2.

For the sake of readability, items are not numbered consecutively. Instead, we write to refer to item number in part . There are parts , where contains exactly the two items and . Note that for , the costs depend on , and for , the costs are identical.

Let be the first parts and be the remaining parts. Finally, let and . This completes our description of the instance . We now claim that if and only if is a yes-instance of Partition.

To see the ‘if’ part, assume there is with

. Then consider the binary vector

resulting from choosing item in the parts and choosing item if , and otherwise item in the parts . Now consider the adversarial stage for this vector, i.e. : Independent of which items the adversarial player attacks, in the recovery stage all recoveries will take place in , due to the choice of . Furthermore, items from which were attacked in the adversarial stage are prioritized in the recovery. Hence, if the adversary does not attack all items for , then any attack in will prove useless in the end. Therefore the adversary has only two valid strategies: (i) Attack all items for . This leads to a result of after the recovery stage. (ii) Attack all items for and waste the remaining attack. This leads to a result of after the recovery stage. We conclude .

To see the ‘only if’ part, assume that for all we have . Let be some binary vector picked by the first-stage player. We show that : If for some , we are immediately done, so assume for all . Let . There are two cases: If , the adversary can apply strategy (i), which leads to an end result of strictly more than . If , the adversary can apply strategy (ii). After this, the selected items in have total cost greater than . Therefore the end result is strictly more than .

## 4 Polynomially solvable cases

We now consider the special case of representatives selection with $|T_j|=2$ and $p_j=1$ for all $j\in[K]$, i.e., each part consists of two elements, and we need to pick one of them. Furthermore, we consider $k=\Gamma=1$. In the following, we show that this case can be solved in polynomial time.

###### Theorem 3.

The recoverable robust representatives selection problem with discrete budgeted uncertainty, $k=\Gamma=1$, and $|T_j|=2$ for all $j\in[K]$ can be solved in strongly polynomial time.

As $\Gamma=1$, it is tempting to make a case distinction upon which item the adversary attacks. However, remember that the first-stage player first chooses one of exponentially many $x\in\mathcal{X}$, and then the adversary can react to this choice in a way not controllable by the first-stage player. Therefore, if we assume that a certain item is attacked, it is invalid to iterate over every single first-stage solution (because the assumed attack need not be the adversary's best response to each of them). Instead, as mentioned in Section 1, we do the following: For each possible attack of the adversarial player, we characterize the set of first-stage solutions for which this attack is an optimal response (details below). We then show how the first-stage player can find the optimum over this set in polynomial time. Surprisingly, the argument becomes more technical than one might expect at first.

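For this section, we make use of the following notation. Let $\langle j,x\rangle$ denote the index of the item chosen in part $T_j$ by a fixed first-stage solution $x$, and let $\langle j,\overline{x}\rangle$ denote the index of the item that is not chosen. Using this notation, the incremental problem can be written in the following way.

$$\begin{aligned}
\textsc{Inc}(x,c)=\min\ & \sum_{j\in[K]}c_{\langle j,x\rangle}+\sum_{j\in[K]}\big(c_{\langle j,\overline{x}\rangle}-c_{\langle j,x\rangle}\big)y_j && && \text{(4a)}\\
\text{s.t.}\ & \sum_{j\in[K]}y_j\le 1 && && \text{(4b)}\\
& y_j\in\mathbb{N}_0 && \forall j\in[K] && \text{(4c)}
\end{aligned}$$

We use a variable $y_j$ for every part $T_j$ ($j\in[K]$) to denote whether the item exchange takes place in this part. The objective (4a) consists of two sums. The first sum denotes the costs if no element is changed. The second sum represents the effect of exchanging one item. There is only a single constraint (4b), enforcing that only a single item change can take place.

Relaxing and dualizing problem (4) gives the following adversarial problem:

$$\begin{aligned}
\textsc{Adv}(x)=\max\ & \sum_{j\in[K]}c_{\langle j,x\rangle}-\pi && && \text{(5a)}\\
\text{s.t.}\ & c_{\langle j,x\rangle}-c_{\langle j,\overline{x}\rangle}\le\pi && \forall j\in[K] && \text{(5b)}\\
& c\in\mathcal{U} && && \text{(5c)}\\
& \pi\ge 0 && && \text{(5d)}
\end{aligned}$$

Note that in an optimal solution to this problem, we can assume that $\pi$ is equal to the largest left-hand side of constraints (5b), or equal to zero if they are all negative. Hence, we find that

$$\textsc{Adv}(x)=\max_{c\in\mathcal{U}}\ \Big(\sum_{j\in[K]}c_{\langle j,x\rangle}-\Big[\max_{j\in[K]}\big(c_{\langle j,x\rangle}-c_{\langle j,\overline{x}\rangle}\big)\Big]_+\Big). \tag{6}$$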
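For a fixed cost scenario, the effect of choosing the dual variable as the largest left-hand side (or zero) can be checked directly: the cheapest recovery either keeps the solution or performs the single most profitable exchange. The sketch below compares an explicit enumeration of all recoveries with this closed form. The data layout (`c[j][t]` is the cost of item `t` in part `j`, `chosen[j]` is the first-stage pick) and the function names are assumptions made for this illustration.

```python
def inc_by_enumeration(c, chosen):
    """Try 'no exchange' and the exchange in every single part."""
    K = len(chosen)
    keep = sum(c[j][chosen[j]] for j in range(K))
    swaps = [keep - c[j][chosen[j]] + c[j][1 - chosen[j]] for j in range(K)]
    return min([keep] + swaps)


def inc_closed_form(c, chosen):
    """Sum of chosen costs minus the positive part of the best saving."""
    K = len(chosen)
    keep = sum(c[j][chosen[j]] for j in range(K))
    best_saving = max(c[j][chosen[j]] - c[j][1 - chosen[j]] for j in range(K))
    return keep - max(0, best_saving)


# Hypothetical scenario with two parts; the first item of each part is chosen.
c = [[7, 1], [3, 1]]
chosen = [0, 0]
print(inc_by_enumeration(c, chosen), inc_closed_form(c, chosen))  # 4 4
```

Maximizing the closed form over all scenarios with at most one increased cost then gives the adversarial value for $k=\Gamma=1$.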

A small technical difficulty is that there may exist multiple strategies for the adversary to solve $\textsc{Adv}(x)$, which are all optimal. The following lemma guarantees that among two special strategies at least one is always optimal. Described informally, strategy I is the strategy to increase the price of an item currently not selected by the first-stage player, in order to decrease the value of a potential recovery at this item. Likewise, strategy II is the strategy to increase the price of an item currently selected by the first-stage player, forcing the first-stage player to either pay the increased price, or to recover this item (instead of recovering another item, which the first-stage player would have preferred to recover if there had been no attack).

###### Lemma 1.

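Given an instance of RRRSP with $k=\Gamma=1$ and $|T_j|=2$ for all $j\in[K]$, let $x\in\mathcal{X}$ be a fixed first-stage solution. If $j^\star,i^\star\in[K]$ have the property that

$$j^\star\in\operatorname{argmax}\big\{\underline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle} : j\in[K]\big\} \tag{7}$$

and

$$i^\star\in\operatorname{argmax}\Big\{d_{\langle j,x\rangle}-\Big[\overline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}-\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+\Big]_+ : j\in[K]\Big\} \tag{8}$$

then there is an optimal solution $c$ to $\textsc{Adv}(x)$ as defined in equation (6) where one of the following two cases holds:

1. (Strategy I) only the cost of item $\langle j^\star,\overline{x}\rangle$ is increased, i.e., $c_{\langle j^\star,\overline{x}\rangle}=\overline{c}_{\langle j^\star,\overline{x}\rangle}$ and $c_i=\underline{c}_i$ for all other items $i$, or

2. (Strategy II) only the cost of item $\langle i^\star,x\rangle$ is increased, i.e., $c_{\langle i^\star,x\rangle}=\overline{c}_{\langle i^\star,x\rangle}$ and $c_i=\underline{c}_i$ for all other items $i$.

Furthermore, for any

$$b^\star\in\operatorname{argmax}\big\{\underline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle} : j\in[K],\ j\neq j^\star\big\}, \tag{9}$$

we have that

$$g_1:=\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+-\max\Big\{\big[\underline{c}_{\langle j^\star,x\rangle}-\overline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+,\ \big[\underline{c}_{\langle b^\star,x\rangle}-\underline{c}_{\langle b^\star,\overline{x}\rangle}\big]_+\Big\} \tag{10}$$

is the gain for the adversarial player, if the adversarial player employs strategy I instead of doing nothing at all. Similarly,

$$g_2:=d_{\langle i^\star,x\rangle}-\Big[\overline{c}_{\langle i^\star,x\rangle}-\underline{c}_{\langle i^\star,\overline{x}\rangle}-\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+\Big]_+ \tag{11}$$

is the gain for the adversarial player, if the adversarial player employs strategy II instead of doing nothing at all.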

###### Proof.

Recall that $\Gamma=1$; hence, the adversary can select only a single part $T_j$ and increase the cost of one of the two items it contains. Note that there are only two possible strategies to optimize the adversarial value in Equation 6:

1. (Strategy I) Reduce the value of the inner maximum. In this case, choose $j^\star$ such that it satisfies Equation 7 and increase the costs of item $\langle j^\star,\overline{x}\rangle$. If the argmax in Equation 7 has multiple elements, every choice is equally good. To see that Equation 10 is correct, take the difference of Equation 6 for the corresponding old and new values of $c$.

2. (Strategy II) Increase the value of the sum. Here one needs to consider that this may lead to an increase in the inner maximum, which should be avoided (i.e., we increase the cost of an item that will be dropped in the recovery step anyway). The best choice here is taking $i^\star$ such that it satisfies Equation 8 and increasing the costs of item $\langle i^\star,x\rangle$, which can again be seen by taking the difference of Equation 6 for old and new values of $c$. Again, if the argmax in Equation 8 has multiple elements, every choice is equally good. Substituting $i^\star$ into Equation 8 yields Equation 11.

Note that the adversarial player will employ strategy I only if $g_1\ge g_2$ and will employ strategy II only if $g_2\ge g_1$. Also note that the numerical values of $g_1$, $g_2$ are independent of the concrete choices of $j^\star,i^\star,b^\star$ satisfying Equations 7, 8 and 9. We can therefore view $g_1,g_2$ as only dependent on $x$, which we make use of in the next proof. We can now prove the main result of this section.
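The two gains of Lemma 1 can be evaluated directly and compared against an enumeration of all single attacks. The sketch below assumes $k=\Gamma=1$, two items per part, at least two parts and nonnegative deviations; the data layout (`c_low[j][t]` and `d[j][t]` for item `t` of part `j`, `chosen[j]` for the first-stage pick) and all function names are hypothetical. The value without any attack plus the larger of the two gains should then coincide with the adversarial value obtained by trying every attack in Equation 6.

```python
def pos(v):
    return max(0, v)


def gains(c_low, d, chosen):
    """Return (value without attack, g1, g2) following Equations 7 to 11."""
    K = len(chosen)
    sel = [c_low[j][chosen[j]] for j in range(K)]        # chosen items
    alt = [c_low[j][1 - chosen[j]] for j in range(K)]    # items not chosen
    d_sel = [d[j][chosen[j]] for j in range(K)]
    d_alt = [d[j][1 - chosen[j]] for j in range(K)]
    diff = [sel[j] - alt[j] for j in range(K)]
    j_star = max(range(K), key=lambda j: diff[j])                   # Eq. 7
    b_star = max((j for j in range(K) if j != j_star),
                 key=lambda j: diff[j])                             # Eq. 9
    top = pos(diff[j_star])
    g1 = top - max(pos(sel[j_star] - alt[j_star] - d_alt[j_star]),
                   pos(diff[b_star]))                               # Eq. 10
    g2 = max(d_sel[j] - pos(sel[j] + d_sel[j] - alt[j] - top)
             for j in range(K))                                     # Eq. 8, 11
    return sum(sel) - top, g1, g2


def adv_by_enumeration(c_low, d, chosen):
    """Evaluate Equation 6 for 'no attack' and every single attack."""
    K = len(chosen)

    def value(c):
        total = sum(c[j][chosen[j]] for j in range(K))
        return total - pos(max(c[j][chosen[j]] - c[j][1 - chosen[j]]
                               for j in range(K)))

    best = value(c_low)
    for j in range(K):
        for t in (0, 1):
            c = [list(row) for row in c_low]
            c[j][t] += d[j][t]
            best = max(best, value(c))
    return best


# Hypothetical instance with two parts; the first item of each part is chosen.
c_low, d, chosen = [[2, 1], [3, 1]], [[5, 1], [2, 4]], [0, 0]
print(gains(c_low, d, chosen))               # (3, 1, 1)
print(adv_by_enumeration(c_low, d, chosen))  # 4
```

In this example both strategies achieve the same gain of 1, so either of them is an optimal response for the adversary.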

###### Proof of Theorem 3.

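Consider a fixed instance of the RRRSP with $k=\Gamma=1$ and $|T_j|=2$ for all $j\in[K]$. For any $j^\star,b^\star,i^\star\in[K]$ with $j^\star\neq b^\star$, consider the following sets of first-stage solutions:

$$\begin{aligned}
\mathcal{X}_{j^\star b^\star}&:=\{x\in\mathcal{X} : x\text{ satisfies }g_1\ge g_2\text{ and }x,j^\star,b^\star\text{ satisfy Equations 9 and 7}\}\\
\mathcal{Y}_{j^\star i^\star}&:=\{x\in\mathcal{X} : x\text{ satisfies }g_1\le g_2\text{ and }x,j^\star,i^\star\text{ satisfy Equations 8 and 7}\}.
\end{aligned}$$

It is easy to see that every first-stage solution is contained in at least one of the above sets. Hence, in order to prove the theorem, it suffices to show how to compute the optimal objective value over each of the sets $\mathcal{X}_{j^\star b^\star}$ and $\mathcal{Y}_{j^\star i^\star}$ in strongly polynomial time. Taking the minimum of all obtained values yields the result.

Computing the optimum over $\mathcal{X}_{j^\star b^\star}$: For any $x$ there are two possible items to choose from part $T_{j^\star}$ and part $T_{b^\star}$. For each fixed choice of the four possible values of $\langle j^\star,x\rangle$ and $\langle b^\star,x\rangle$, run the following subroutine: Observe that Equation 7 and Equation 9 are equivalent to the statement

$$\underline{c}_{\langle b^\star,x\rangle}-\underline{c}_{\langle b^\star,\overline{x}\rangle}\le\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\ \wedge\ \forall j\in[K]\setminus\{j^\star,b^\star\}:\ \underline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}\le\underline{c}_{\langle b^\star,x\rangle}-\underline{c}_{\langle b^\star,\overline{x}\rangle} \tag{12}$$

and, under the condition that Equation 12 is true, we can compute $g_1$ by Equation 10, and furthermore the statement $g_1\ge g_2$ is equivalent to the statement

$$\forall j\in[K]:\ d_{\langle j,x\rangle}-\Big[\overline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}-\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+\Big]_+\le g_1. \tag{13}$$

Therefore, we can iterate over $j\in[K]\setminus\{j^\star,b^\star\}$ and, for each of the two possible choices of $\langle j,x\rangle$, check whether this choice satisfies Equations 12 and 13. If this limits the possible choices to one value, we set $\langle j,x\rangle$ to this value. If this limits the possible choices to zero values, we can break the current loop iteration and skip to the next choice of $\langle j^\star,x\rangle$ and $\langle b^\star,x\rangle$. If both choices remain, we choose $\langle j,x\rangle$ such that $C_{\langle j,x\rangle}+\underline{c}_{\langle j,x\rangle}$ is minimum.

The result is a first-stage solution $x$ whose worst case is attained by strategy I, by Lemma 1. Because $g_1$ only depends on $\langle j^\star,x\rangle$ and $\langle b^\star,x\rangle$, its objective value is minimum among all $x$ considered in the current subroutine.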

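Computing the optimum over $\mathcal{Y}_{j^\star i^\star}$: Similarly to before, for each of the at most four choices of $\langle j^\star,x\rangle$ and $\langle i^\star,x\rangle$, run the following subroutine:

Observe that Equation 7 is equivalent to the statement

$$\forall j\in[K]\setminus\{j^\star\}:\ \underline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}\le\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}, \tag{14}$$

and, under the condition that Equation 14 is true, Equation 8 is equivalent to the statement

$$\forall j\in[K]:\ d_{\langle j,x\rangle}-\Big[\overline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}-\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+\Big]_+\le d_{\langle i^\star,x\rangle}-\Big[\overline{c}_{\langle i^\star,x\rangle}-\underline{c}_{\langle i^\star,\overline{x}\rangle}-\big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+\Big]_+. \tag{15}$$

Likewise, under the condition that Equations 14 and 15 are true, $g_2$ can be computed by Equation 11, and furthermore the statement $g_1\le g_2$ is equivalent to the statement

$$\forall j\in[K]\setminus\{j^\star\}:\ \big[\underline{c}_{\langle j^\star,x\rangle}-\underline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+-\max\Big\{\big[\underline{c}_{\langle j^\star,x\rangle}-\overline{c}_{\langle j^\star,\overline{x}\rangle}\big]_+,\ \big[\underline{c}_{\langle j,x\rangle}-\underline{c}_{\langle j,\overline{x}\rangle}\big]_+\Big\}\le g_2. \tag{16}$$

Therefore, we can use Equations 14, 15 and 16 to iterate over $j\in[K]\setminus\{j^\star,i^\star\}$ analogously to the previous case. At the end, we get a first-stage solution $x$ whose worst case is attained by strategy II, by Lemma 1. Analogously to the previous case, we see that this $x$ is optimal among all first-stage solutions considered in the current subroutine. ∎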

It seems likely that the proof of Theorem 3 can be extended to general values , if holds. There still remain two basic adversarial strategies in this case; one where the chosen item of a part has its costs increased, and one where the cheapest item not chosen in a part has its cost increased. This may lead to an enumeration-based solution method along similar lines as presented in the proof. For the sake of brevity, we omit the details of this claim.

We further remark on two simple special cases. The first is for $k=0$. In this case, it is not possible to use any recovery action and Rec becomes

$$\min_{x\in\mathcal{X}}\ C^tx+\max_{c\in\mathcal{U}}c^tx=\min_{x\in\mathcal{X}}\ \max_{c\in\mathcal{U}}\ (C+c)^tx$$

which is a standard min-max optimization problem with budgeted uncertainty. This means the results from [BS03] apply and Rec can be solved in polynomial time.

###### Observation 1.

The recoverable robust representatives multi-selection problem with discrete budgeted uncertainty and $k=0$ can be solved in polynomial time.

The second special case is for $\Gamma=0$. Here, no adversarial action is possible, and problem Rec becomes