1 Introduction
The increasing complexity of problems encountered in applications is a permanent motivation for the development of intelligent systems for human decision support. Among the various difficulties to overcome for decision making in complex environments we consider here three sources of complexity that often coexist in a decision problem: 1) the combinatorial nature of the set of feasible alternatives 2) the fact that multiple points of view, possibly conflicting, about the value of solutions may coexist, 3) the need of formulating recommendations that are tailored to the objectives and preferences of users and that takes into account the uncertainty in preference elicitation (due to possible mistakes in the responses of users to preference queries).
The first difficulty occurs as soon as the solutions to be compared are characterized by the combinations of elementary decisions. This is the case for instance for the selection problem of an optimal subset within a reference set, under a budget constraint (a.k.a. knapsack problem) where a solution is characterized by elementary decisions concerning items of the reference set. This difficulty prevents the explicit evaluation of all solutions and the determination of the best option requires implicit enumeration techniques. The second difficulty appears in multiagent decision contexts when the agents have different individual value systems or objectives leading to possibly conflicting preferences. It also appears in singleagent decision contexts when the alternatives are assessed w.r.t. different criteria. Finally, it appears in decision under uncertainty when several scenarios that have different impacts on the outcomes of the alternatives are considered. In all these situations, preference modeling requires the definition of multiple objectives to be optimized simultaneously. The combination of difficulties 1 and 2 is at the core of multiobjective combinatorial optimization Ehrgott (2005).
Let us now come to the third difficulty. The coexistence of multiple objectives makes the notion of optimality subjective and requires additional preference information to be collected from the users in order to discriminate between Paretooptimal solutions. In multiobjective decision problems, the “optimal” solution fully depends on the relative importance attached to the different objectives under consideration and on how performances are aggregated. A standard tool used to generate compromise solutions tailored to the decision maker (DM) value system is to optimize a parameterized aggregation function summarizing the performance vector of any solution into a scalar value. This makes it possible to reformulate the initial problem as a singleobjective optimization problem (see e.g.,
Steuer (1986)). However, a precise specification of the preference parameters (e.g., weighting coefficients), prior to the exploration of the set of alternatives, may be cumbersome because it requires a significant amount of preference information. To overcome this problem, incremental decision procedures aiming to integrate and combine the elicitation of preference parameters and the exploration of the set of feasible solutions are appealing (alternatively, one may also consider the approach consisting in computing the nondominated solutions according to a scalarizing function whose parameters are only partially specified Kaddani et al. (2017)). They make it possible to focus the elicitation burden on the information that is really useful to separate competing solutions during the optimization process, and this significantly reduces the number of queries asked to the user.In the fields of operations research and artificial intelligence, numerous contributions have addressed the problem of incrementally eliciting preferences. A first stream of research concerns preference elicitation for decision making in explicit sets (i.e., noncombinatorial problems), to assess multiattribute utility functions
White III et al. (1984), weights of criteria in aggregation functions Benabbou et al. (2017), multicriteria sorting models Özpeynirci et al. (2018), utility functions for decision making under risk Chajewska et al. (2000); Wang and Boutilier (2003); Hines and Larson (2010); Perny et al. (2016), or individual utilities in collective decision making Lu and Boutilier (2011). Preference elicitation for decision support on combinatorial domains is a challenging issue that has also been studied in various contexts such as constraint satisfaction Gelain et al. (2010), matching under preferences Drummond and Boutilier (2014), sequential decision making under risk Regan and Boutilier (2011); Weng and Zanuttini (2013); Gilbert et al. (2015); Benabbou and Perny (2017), and multiobjective combinatorial optimization Branke et al. (2016); Benabbou and Perny (2018); Bourdache and Perny (2019).Almost all incremental elicitation procedures mentioned above proceed by progressive reduction of the parameter space until an optimal decision can be identified. At every step of the elicitation process, a preference query is asked to the DM and the answer induces a constraint on the parameter space, thus a polyhedron including all parameter values compatible with the DM’s responses is updated after each answer (polyhedral method Toubia et al. (2004)). Queries are selected to obtain a fast reduction of the parameter space, in order to enforce a fast determination of the optimal solution. However, such procedures do not offer any opportunity to the DM to revise her opinion about alternatives and the final result may be sensitive to errors in preference statements.
A notable exception in the list of contributions mentioned above is the approach proposed by Chajewska et al. Chajewska et al. (2000). The approach relies on a prior probabilistic distribution over the parameter space and uses preference queries over gambles to update the initial distribution using Bayesian methods. It is more tolerant to errors and inconsistencies over time in answering preference queries. The difficulties with this approach may lie in the choice of a prior distribution and in the computation of Bayesian updates at any step of the procedure. A variant, proposed in Guo and Sanner (2010), relies on simpler questions under certainty, so as to reduce the cognitive load.
Motivation of the paper
As far as we know, the works mentioned in the last paragraph has not been extended for decision making on combinatorial domains. Our goal here is to fill the gap and to propose a Bayesian approach for determining a preferred solution in a multiobjective combinatorial optimization problem. The main issue in this setting is the determination of the next query to ask to the DM, as there is an exponential number of possible queries (due to the combinatorial nature of the set of feasible solutions).
Related work
Several recently proposed Bayesian preference elicitation methods may be related to our work.
– Sauré and Vielma Sauré and Vielma (2019)
proposed an error tolerant variant of the polyhedral method, where the polyhedron is replaced by an ellipsoidal credibility region computed from a multivariate normal distribution on the parameter space. This distribution, and thus the ellipsoidal credibility region, is updated in a Bayesian manner after each query. In contrast with their work, where the set of alternatives is explicitly defined, our method applies on implicit sets of alternatives. Besides, although our method also involves a multivariate normal density function on the parameter space, our query selection strategy is based on the whole density function and not only on a credibility region.
– Vendrov et al. Vendrov et al. (2020) proposed a query selection procedure able to deal with large sets of alternatives (up to hundreds of thousands) based on Expected Value Of Information
(EVOI). The EVOI criterion consists in determining a query maximizing the expected utility of the recommended alternative conditioned on the DM’s answer (where the probability of each answer depends on a response model, e.g. the logistic response model). However, the subsequent optimization problem becomes computationally intractable with a large set of alternatives. The authors consider a continuous relaxation of the space of alternatives that allows a gradientbased approach. Once a query is determined in the relaxed space, the corresponding pair of fictive alternatives is projected back into the space of feasible alternatives. In addition, a second contribution of the paper is to propose an elicitation strategy based on
partial comparison queries, i.e. queries involving partially specified multiattribute alternatives, which limits the cognitive burden when the number of attributes is large. We tackle here another stateoftheart query selection strategy that aims at minimizing the max regret criterion (instead of maximizing the EVOI criterion), a popular measure of recommendation quality.– In a previous work Bourdache et al. (2019)
, we introduced an incremental elicitation method based on Bayesian linear regression for assessing the weights of rankdependent aggregation functions used in decision theory (typically OWA and Choquet integrals). The query selection strategy we proposed is based on the min max regret criterion, similarly to the one we use in the present work. However, the method can only be applied to
explicit sets of alternatives and does not scale to combinatorial domains. The computation of regrets in the provided procedure (in order to determine the next query) requires indeed the enumeration of all possible pairs of solutions for each query, which is impractical if the set of solutions is combinatorial in nature. The change in scale is considerable. For illustration, instances involving 100 alternatives were considered in the numerical tests of our previous work Bourdache et al. (2019) while in the multiobjective knapsack instances under consideration in Section 4 of the present paper, there are about feasible solutions, among which several millions are Pareto optimal. In order to scale to such large problems, we propose a method based on mixed integer linear programming that allows us to efficiently compute MER and MMER values on combinatorial domains.Organization of the paper
In Section 2, we describe the incremental elicitation procedure proposed in Bourdache et al. (2019) for the determination of an optimal solution over explicit sets of solutions and we point out the main issues to overcome to extend the approach to combinatorial sets of solutions. Section 3 is devoted to the method we propose to compute expected regrets on combinatorial domains. The results of numerical tests we carried out are presented in Section 4, that show the operationality of the proposed procedure.
2 Incremental elicitation
We first provide a general overview of the incremental Bayesian elicitation procedure on an explicit set and then discuss the extension to cope with combinatorial optimization problems.
Let denote the set of possible solutions. Since we are in the context of multiobjective optimization, we assume that a utility vector is assigned to any solution . Then we consider the problem of maximizing over a scalarizing function of the form where are positive weights and are basis functions Bishop (2006) (introduced to extend the class of linear models to nonlinear ones). For simplicity, the reader can assume that , i.e., the th component of , and is the (imperfectly known) weight of criterion in the following. At the start of the procedure, a prior density function is associated to the parameter space , where the unknown weighting vector takes value. Then, at each step, the DM responds to a pairwise comparison query and, based on this new preference information, the density function is updated in a Bayesian manner. The aim is, in a minimum number of queries, to acquire enough information about the weighting vector to be able to recommend a solution that is near optimal. We present here the three main parts of the decision process: query selection strategy, Bayesian updating after each query and stopping condition.
2.1 Query selection strategy
At each step of the algorithm, a new preference statement is needed to update the density function on . In order to select an informative query, we use an adaptation of the Current Solution Strategy (CSS) introduced in Boutilier et al. (2006) and based on regret minimization. In our probabilistic setting, regrets are replaced by expected regrets. Before describing more precisely the query selection strategy, we recall some definitions about expected regrets Bourdache et al. (2019).
Definition 1.
Given a density function on and two solutions and , the pairwise expected regret is defined as follows:
In other words, the Pairwise Expected Regret (PER) of with respect to represents the expected utility loss when recommending solution instead of solution . In practice, the PER is approximated using a sample of weighting vectors drawn from . This discretization of enables us to convert the integral into an arithmetic mean:
(1) 
Definition 2.
Given a set of solutions and a density function on , the max expected regret of and the minimax expected regret over are defined by:
Put another way, the max expected regret of is the maximum expected utility loss incurred in selecting in while the minimax expected regret is the minimal max expected regret value of a solution in . As for the PER computation, the MER and the MMER can be approximated using a sample of weight vectors:
(2) 
(3) 
We can now describe the adaptation of CSS to the probabilistic setting. The max expected regret of a solution is used to determine which solution to recommend (the lower, the better) in the current state of knowledge characterized by . At a step of the elicitation procedure, if the stopping condition (that will be defined below) is met, then a solution is recommended. But if the knowledge about the value of needs to be better specified to make a recommendation, the DM is asked to compare to its best challenger (best challenger in the current state of knowledge). In the next subsection, we describe how one uses the DM’s answer to update the density function .
2.2 Bayesian updating
At step of the procedure, a new query of the form “
” is asked to the DM. Her answer is translated into a binary variable
that takes value if the answer is yes and otherwise. Using Bayes’ rule, the posterior density function reads as follows:(4) 
where is assumed to be multivariate Gaussian (the initialization used for will be specified in the numerical tests section). The posterior density function is hard to compute analytically using Equation 4. Indeed, the likelihood
follows a Bernoulli distribution and no conjugate prior is known for this likelihood function in the multivariate case. Therefore, one uses a data augmentation method
Albert and Chib (1993) that consists in introducing a latent variable that represents the utility difference between the two compared solutions, where is a given weighting vector, is an explanatory variable defined by and is a Gaussian noise accounting for the uncertainty about the DM’s answer. The Gaussian nature of the density function for implies that the conditional distribution is also Gaussian: . In order to make consistent with the DM’s answer, one forces if and otherwise. Thus, one obtains the following truncated density function:Using the latent variable, the posterior distribution is formulated as:
(5) 
If the prior density is multivariate Gaussian then is multivariate Gaussian too, as well as . To approximate , Tanner and Wong proposed in Tanner and Wong (1987) an iterative procedure based on the fact that depends in turn on :
(6) 
The procedure consists in solving the fixed point equation obtained by replacing in Equation 5 by its expression in Equation 6. More precisely, ones performs alternately series of samples of values from , and updating of the posterior density function by , which is Gaussian because every is Gaussian. Note that each value of the sample is obtained by iteratively drawing a value from the current distribution then drawing from for all . For more details, we refer the reader to Algorithm in Bourdache et al. (2019).
2.3 Stopping condition
The principle of the incremental elicitation procedure is to alternate queries and update operations on the density until the uncertainty about the weighting vector is sufficiently reduced to be able to make a recommendation with a satisfactory confidence level. A stopping condition that satisfies this specification consists in waiting for the value to drop below a predefined threshold, which can be defined as a percentage of the initial value.
2.4 Main obstacles for extending the approach
The main obstacles encountered while managing to extend the approach to a combinatorial setting are related to the computation of MER and MMER values as they are defined in Equations 2 and 3:

both values requires an exponential number of pairwise comparisons to be computed (because there is an exponential number of feasible solutions);

in addition, the use of linear programming to compute these values is not straightforward because the constraint in Equation 1 is not linear.
These issues are all the more critical given that the MER and MMER values are computed at every step of the incremental elicitation procedure to determine whether it should be stopped or not, and to select the next query.
3 Computation of regrets
While the use of mathematical programming is standard in minmax regret optimization, the framework of minmax expected regret optimization is more novel. We propose here a new method to compute and by mixed integer linear programming, where is implicitly defined by a set of linear constraints and is a sample drawn from the current density . We consider in this section that is linear in , but the presented approach is adaptable to nonlinear aggregation functions if there exists appropriate linear formulations (e.g., the linear formulation of the ordered weighted averages Ogryczak and Śliwiński (2003)). We also assume that .
3.1 Linear programming for MER computation
To obtain a linear expression for , we replace the function in Equation 1 by for each weighting vector where is a binary variable such that if and if (the value of does not matter if , because anyway). For this purpose, we need the following additional constraints:
Proposition 0.
Given , and , if is an aggregation function defined such that for any and , and if satisfies the constraints and , then:
Proof.
Let us denote by the value for any . First note that , because is such that . For any , three cases are possible: Case 1. is such that : becomes , thus and we indeed have . Case 2. is such that : becomes and implies and thus . Case 3. is such that then . In the three cases we have thus . ∎
The constraints and are linear as is linear in . Nevertheless, using variables and their constraints in the formulation of gives a system of linear constraints with a quadratic objective function:
The objective function is quadratic because the term is quadratic in variables and . To linearize the program, we introduce a positive real variable for each that replace the product term . Note that the term does not need linearization because solution is fixed in the MER computation. The obtained linear program is:
It is easy to see that for all thanks to the constraints on . We indeed have when thanks to the constraint , and when thanks to constraints and .
Overall, variables are involved in the linearization of the expression : binary variables are used to linearize the function, and real variables are used to linearize the product term .
3.2 Linear programming for MMER computation
For computing , the objective function
can be linearized by using constraints (standard linearization of a objective function, where the max is taken over a finite set):
Note that computing the minmax expected regret over requires the introduction of one binary variable for each solution , so that
for all (while computing the max expected regret of a given solution only required the introduction of a single binary variable such that for ).
Let us consider the following program , involving quadratic constraints:
Proposition 0.
A solution optimizing is such that:
Proof.
We denote by the optimal value of . We now prove that is equal to . For a given instance of , constraint must be satisfied for any possible instance of . Thus, by Proposition 1, we have that for all because:
It implies that . As the objective function is , for each instance of , the variable takes value . The objective function implies that (1) for a given . Finally, varying over , we can easily see that , and thus (2) . The result follows from (1) and (2). ∎
The quadratic terms are linearized by introducing positive real variables :
One comes up with a mixed integer linear program involving binary variables , positive real variables and constraints, hence an exponential number of variables and constraints due to the combinatorial nature of the set . In the remainder of the section, we propose a method to overcome this issue.
3.3 MMER computation method
The proposed method is based on mixed integer linear programming with dynamic generation of variables and constraints to compute , an optimal solution and its best challenger .
Let us first define a mixed integer linear program that contains only a subset of variables and , and a subset of constraints of type . Given a subset of solutions, computes the minimax expected regret defined by:
Put another way, is the max expected regret of a solution w.r.t. solutions in . More formally, is written as follows:
Note that now only involves variables , variables and constraints.
The algorithm we propose consists in alternatively solving and . Let (resp. ) denote the optimal solution returned by solving (resp. for ). The algorithm starts with a small set of feasible solutions (see Section 3.5 for details regarding the initialization), and then iteratively grows this set by adding to the best challenger of . Convergence is achieved when returns a solution that already belongs to , which implies that . Algorithm 1 describes the procedure.
By abuse of notation, is viewed in the algorithm as a procedure returning the couple consisting of the optimal value mmer of and the corresponding optimal solution . Similarly, is viewed as a procedure returning the couple consisting of the optimal value mer_ of and the corresponding optimal solution . At the termination of the algorithm, mmer corresponds to and is the MMER solution (and its best challenger).
Proposition 0.
Algorithm 1 terminates and returns a minmax expected regret solution and its best challenger.
Proof.
First, it is easy to see that Algorithm 1 always terminates. Indeed, at every step of the algorithm, if the stopping condition is not satisfied then a new solution is added to and a new iteration is performed. In the worst case, all the solutions of are added to the set and the stopping condition is trivially satisfied.
We now prove the validity of Algorithm 1, i.e.:
Assume that (if the equality is trivially true). On the one hand, at any step of the algorithm, we have (1) mer_ because . On the other hand, if then the constraint is satisfied for mmer, i.e., . As by definition of , it implies that (2) mer_. By (1) and (2), we conclude that mmer _.
Finally, mmer minimizes for , thus mmer minimizes over . Consequently, mer_ for all and then mer_ . Thus, by definition of the MMER, we have mer_ and thereby _. ∎
3.4 Clustering the samples
In order to decrease the computation times between queries, we propose to reduce the number of variables and constraints in by applying a clustering method on each sample drawn from density . Let denote the set of cluster centers. The idea is to replace the weights by the cluster centers, the formula for the pairwise expected regret becoming:
(7) 
where is the weight of the cluster center and represents the proportion of weighting vectors of that are in the cluster of center . The formulas for and are adapted in the same way.
3.5 Incremental decision making approach
As detailed in section 2, the MMER computation is used to determine which query to ask at each step as well as to trigger the stopping condition. The whole incremental decision making procedure is summarized in Algorithm 2. The set
is heuristically defined as the set of
optimal solutions for in (Line 2) but can be defined otherwise without any impact on the result in proposition 3. The variable mmer (Line 2) represents the current minmax expected regret value and is computed using Algorithm 1 by replacing the sample by the set of cluster centers .4 Experimental results
Algorithm 2 has been implemented in Python using the SciPy library for Gaussian sampling, the ScikitLearn library for the clustering operations^{1}^{1}1We used kmeans clustering. and the gurobipy module for solving the mixed integer linear programs. The numerical tests have been carried out on randomly generated instances of the multiobjective knapsack and allocation problems. For all tests we used an Intel(R) Core(TM) i74790 CPU with 15GB of RAM.
Multiobjective Knapsack Problem (MKP)
This vector optimization problem is formulated as subject to , where is an matrix of general term representing the utility of item w.r.t objective , is a vector of binary decision variables such that if item is selected and otherwise, is the weight of item and is the knapsack’s capacity. The set of feasible knapsacks is , and the performance vector associated to a solution is .
To simulate elicitation sessions, we consider the problem where . The weighting vector in is initially unknown. We generated instances of MKP for objectives and items. Every item has a positive weight uniformly drawn in , and . Utilities are uniformly drawn in to make sure that .
Multiobjective Allocation Problem (MAP)
Given agents, shareable resources, and a bound on the number of agents that can be assigned to a resource, the set of feasible allocations of resources to agents consists of binary matrices of general term such that and , where are decision variables such that if agent is assigned resource , and otherwise. The cost of an allocation w.r.t. criterion is defined by , where is the cost of assigning agent to resource w.r.t. criterion .
We consider the problem where and is initially unknown. For the tests, we generated instances with criteria, agents, resources and a bound on the number of agents that can be assigned to a resource. The values are randomly generated in , then normalized () to ensure that .
Simulation of the DM’s answers
In order to simulate the interactions with the DM, for each instance, the hidden weighting vectors are uniformly drawn in the canonical basis of (the more vector is unbalanced, the worse the initial recommendation). At each query, the answer is obtained using the response model given in Section 2.2, i.e., for query , the answer depends on the sign of , where . We used different values of to evaluate the tolerance of the approach to wrong answers. We set to simulate a DM that is perfectly reliable in her answers. The strictly positive values are used to simulate a DM that may be inconsistent in her answers. For MKP (resp. MAP), setting led to (resp. ) of wrong answers, while led to (resp. ) of wrong answers.
Parameter settings in algorithms
The prior density in Algorithm 1 is set to , where
is the identity matrix
, so that the distribution is rather flat. At each step of Algorithm 2, a new sample of 100 weighting vectors is generated; the vectors are normalized and partitioned into clusters. This number of clusters has been chosen empirically after preliminary numerical tests: considering the entire sample or using more than 20 clusters led to higher computation times and did not offer a significant improvement on the quality of the recommendations. Last but not least, we stopped the algorithm after 15 queries if the termination condition was not fulfilled before.Illustrative example
Before coming to the presentation of the numerical results, let us first illustrate the progress of the elicitation procedure on the following example: we applied Algorithm 2 on a randomly generated instance of MKP with agents, items, a hidden weighting vector , and we set , which led to an error rate of . Figure 1 illustrates the convergence of the generated samples of weighting vectors (Line 2 of Algorithm 2) toward the hidden weight during the execution of the algorithm. As the weighting vectors are normalized, two components are enough to characterize them. Every graph shows the sample drawn at a given step of the algorithm: before starting the elicitation procedure (Query 0), after query and query .
Query 0  Query 3  Query 10 
Analysis of the results
We first evaluated the efficiency of Algorithm 2 according to the value of . We observed the evolution of the quality of the recommendation (the minimax expected regret solution) after every query. The quality of a recommendation is defined by the score for MKP and by for MAP, where is the hidden weighting vector and is an optimal solution for .