1 Introduction
Preferences [Peintner et al.2008] play an important role in a variety of artificial intelligence applications, and the task of eliciting or learning preferences is a crucial one: typically only limited information about the user’s preferences will be available, and the cost (cognitive or computational) of obtaining additional preference information will be high.
The automated assessment of preferences has received considerable attention, starting with pioneering works in the OR community, such as [White III et al.1984] and especially the UTA methodology [JacquetLagrèze and Siskos1982], which gave rise to a wide variety of extensions [JacquetLagreze and Siskos2001, Greco et al.2008]. Within AI, a number of researchers have proposed interactive methods that elicit preferences in an adaptive way [Chajewska et al.2000, Boutilier2002, Wang and Boutilier2003, Boutilier et al.2006, Guo and Sanner2010, Viappiani and Boutilier2010], observing that, by asking informative questions, it is often possible to make near-optimal decisions with only partial preference information.
While most works assume that items or decisions are available in a (possibly large) dataset, in this paper we propose an adaptive elicitation framework that takes a constructive view on preference elicitation, enlarging its scope from the selection of items among a set of candidates to the synthesis of entirely novel instances. Instances are solutions to a given optimization problem; they are represented as combinations of basic elements (e.g. the components of a laptop) subject to a set of constraints (e.g. the laptop model determines the set of available CPUs). A utility function is learned over the feature representation of an instance, as customary in many preference elicitation approaches. The recommendation is then made by solving a constrained optimization problem in the space of feasible instances, guided by the learned utility.
Preference elicitation in configuration problems has previously been tackled with regret-based elicitation [Boutilier et al.2006, Braziunas and Boutilier2007], where minimax regret is used both as a robust recommendation criterion and as a technique to drive elicitation. The main limitation of this approach is its lack of tolerance to user inconsistency. Indeed, learning a user utility function requires dealing with uncertain and possibly inconsistent user feedback.
Bayesian preference elicitation approaches deal with this problem by building a probability distribution on candidate functions (endowed with a response or error model to be used for inference) and asking queries that maximize informativeness measures such as expected value of information (EVOI) [Chajewska et al.2000, Guo and Sanner2010, Viappiani and Boutilier2010]. These approaches are however computationally expensive and cannot scale to fully constructive scenarios, as shown in our experimental results.
We take a space decomposition perspective and jointly learn a set of weight vectors, each representing a candidate utility function, maximizing diversity between the vectors and consistency with the available feedback. These two conflicting objectives tend to generate equally plausible alternative hypotheses for the unknown utility. Our approach to elicitation works by combining weight vector learning with instance generation, so that each iteration of the algorithm produces two outcomes: a set of weight vectors and a set of instances, each maximizing its score according to one of the weight vectors. We evaluate the effectiveness of our approach by testing our elicitation method on both synthetic and real-world problems, and comparing it to state-of-the-art methods.
2 Background
We first introduce some notation. We use boldface letters to indicate vectors, uppercase letters for matrices, and calligraphic capital letters for sets. We abbreviate the set as whenever the range of the index is clear from the context, and use as a shorthand for . We write to indicate the vector norm, for the usual dot product, for matrix transposition.
We assume a multi-attribute feature space of configurations over features. For the sake of simplicity we focus on binary features only, i.e. each feature takes values in $\{0, 1\}$, assuming a one-hot encoding of categorical features. This is a common choice for preference elicitation methods [Guo and Sanner2010, Viappiani and Boutilier2010]. Support for linearly dependent continuous features will be discussed later on.
We further assume that the set of feasible configurations, denoted by $\mathcal{X}_{\text{feasible}}$, is expressed as a conjunction of linear constraints. This allows us to formulate both arithmetic and logical constraints, e.g. under the canonical mapping of true to $1$ and false to $0$, the Boolean disjunction of two binary variables $x_1 \lor x_2$ can be rewritten as $x_1 + x_2 \geq 1$.
Consistently with the experimental settings of previous work [Guo and Sanner2010, Viappiani and Boutilier2010], we model users with additive utility functions [Keeney and Raiffa1976]; the user’s preferences are represented by a weight vector $w$ and the utility of a configuration $x$ is given by the dot product $\langle w, x \rangle$. In the remainder of the paper we require all weights to be non-negative and bounded: the per-attribute weights must lie in a (constant but otherwise arbitrary) interval $[w^\bot, w^\top]$, with $w^\bot \geq 0$. Both requirements are quite natural (utility values are defined on an interval scale, thus it is always possible to scale the values appropriately; see for instance [Torra and Narukawa2007] and [Keeney and Raiffa1976]), and enable the translation of our core optimization problem into a mixed-integer linear problem (as done in Section 3).
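As a small illustration of the setting just described, the sketch below enumerates a toy space of binary configurations, filters it with the linear form of a Boolean disjunction, and picks the feasible configuration with the highest additive utility. The feature count, the weights and the helper names (`utility`, `satisfies_disjunction`) are hypothetical choices made for this example only; brute-force enumeration stands in for a proper solver.

```python
from itertools import product

def utility(w, x):
    """Additive utility: dot product of the weights and the binary features."""
    return sum(wz * xz for wz, xz in zip(w, x))

def satisfies_disjunction(x, a, b):
    """Linear form of (x[a] OR x[b]) under the mapping true -> 1, false -> 0."""
    return x[a] + x[b] >= 1

# Hypothetical toy space: 3 binary features, feasible iff feature 0 or 1 is on.
feasible = [x for x in product((0, 1), repeat=3)
            if satisfies_disjunction(x, 0, 1)]

# Illustrative non-negative, bounded weights (not taken from the paper).
w = (0.2, 0.5, 0.1)

# Brute-force recommendation: the feasible configuration of maximum utility.
best = max(feasible, key=lambda x: utility(w, x))
```

Enumeration is only viable for tiny spaces; the point of expressing feasibility through linear constraints is that the same conditions can be handed to a MILP solver instead.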
During learning, the actual weight vector is unknown to the learning system, and must be estimated by interacting with the user. We mostly focus on pairwise comparison queries, which are the simplest of the comparative queries. These can be extended to choice sets of more than two options [Viappiani and Craig2009, Viappiani and Boutilier2010] and are common in conjoint analysis [Louviere et al.2000, Toubia et al.2004]. A pairwise comparison between two configurations $x$ and $x'$ has three possible outcomes: either $x$ is preferred to $x'$ (written $x \succ x'$), $x'$ is preferred to $x$ ($x \prec x'$), or there is no clear preference between the two items ($x \sim x'$). We write $\mathcal{D}$ to denote the set of preferences (answers to comparison queries) elicited from the user.
In the next section we describe how informative queries can be generated using our setwise max-margin learning.
3 Setwise Max-margin Learning
Nonlinear Formulation.
We first introduce the problem formulation as a nonlinear optimization problem, and then show how to reduce it to a mixed integer linear program.
The goal of our setwise max-margin approach is twofold. First, for any given set size $k$, we want to find a set of weight vectors $w^1, \dots, w^k$, chosen so that all user-provided preferences are satisfied by the largest possible margin (modulo inconsistencies) and so that they are maximally diverse. Second, we want to construct a set of configurations $x^1, \dots, x^k$, so that each configuration $x^i$ is the “best” possible option when evaluated according to the corresponding $w^i$, and the configurations are maximally diverse among each other. These options will later be used to formulate queries.
The first goal is achieved by translating all pairwise preferences into ranking constraints: preferences of the form $y^s_+ \succ y^s_-$ become linear inequalities of the form $\langle w^i, y^s_+ - y^s_- \rangle \geq \mu$, where $\mu$ is the margin variable (which we aim at maximizing) and $s$ ranges over the responses. Non-separable datasets, which occur in practice due to occasional inconsistencies in user feedback, are handled by introducing slack variables (whose sum we aim at minimizing) in a way similar to UTA and its extensions [JacquetLagrèze and Siskos1982, Greco et al.2008]. When augmented with the slacks, the above inequalities take the form $\langle w^i, y^s_+ - y^s_- \rangle \geq \mu - \varepsilon^i_s$, where $\varepsilon^i_s$ is the penalty incurred by weight vector $w^i$ for violating the margin separation of pair $s$. Indifference preferences, i.e. $y^s_+ \sim y^s_-$, are translated as $|\langle w^i, y^s_+ - y^s_- \rangle| \leq \varepsilon^i_s$; the slack increases with the difference between the estimated utility of the two options.
The second goal requires jointly maximizing the utility of each $x^i$ according to its corresponding weight vector $w^i$ and its scoring difference with respect to the other configurations in the set. We achieve this by maximizing the sum of utilities $\sum_i \langle w^i, x^i \rangle$ and adding ranking constraints of the form $\langle w^i, x^i - x^j \rangle \geq \mu$ for all $i, j$ with $i \neq j$.
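A straightforward encoding of the above desiderata leads to the following mixed-integer non-linear optimization problem over the non-negative margin $\mu$ and the vectors $w^i$, $x^i$, $\varepsilon^i$, for $i = 1, \dots, k$, where $y^s_+$ and $y^s_-$ denote the preferred and the dispreferred configuration of the $s$-th user response:
$$
\begin{aligned}
\max \quad & \mu - \alpha \sum_{i=1}^{k} \|\varepsilon^i\|_1 - \beta \sum_{i=1}^{k} \|w^i\|_1 + \gamma \sum_{i=1}^{k} \langle w^i, x^i \rangle \\
\text{s.t.} \quad & \langle w^i, y^s_+ - y^s_- \rangle \geq \mu - \varepsilon^i_s && \forall i, s && (1) \\
& \langle w^i, x^i - x^j \rangle \geq \mu && \forall i \neq j && (2) \\
& w^\bot \leq w^i \leq w^\top && \forall i && (3) \\
& x^i \in \mathcal{X}_{\text{feasible}}, \quad \varepsilon^i \geq 0 && \forall i && (4)
\end{aligned}
$$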
Let us illustrate the above piece by piece. The objective is composed of four parts: we maximize the shared margin (first part) and minimize the total sum of the ranking errors incurred by each weight vector (second part), while at the same time regularizing the magnitude of the weights (third part) and rewarding the quality of the configurations (last part). The non-negative hyperparameters $\alpha$, $\beta$ and $\gamma$ control the influence of the various components. The weight regularization term copes with the common scenario in which the user has strong preferences about some attributes, but is indifferent to most of them. The $\ell_1$ penalty is frequently used to improve the sparsity of learned models [Tibshirani1996, Zhang and Huang2008, Hensinger et al.2010], with consequent gains in generalization ability and efficiency, as confirmed by our empirical findings (see Section 4). Constraint (1) enforces the correct ranking of the observed user preferences, while (2) ensures that the generated configurations are diverse in terms of the weight vectors they maximize. Constraints (3) and (4) ensure that the weights and configurations are feasible and guarantee the non-negativity of the slacks. Since we require the lower bound on the weights to be non-negative, Eq. (3) also enforces the weights to be non-negative.
Note that we are choosing the configurations and the weight vectors simultaneously. We look for pairs $(w^i, x^i)$ such that the utility loss (see constraint 2) of choosing $x^j$ instead of $x^i$, $j \neq i$, is large (at least $\mu$). Consider Figure 1, where, for simplicity, we need to choose a pair ($k = 2$). Eq. (2) is represented by a red line that partitions the space of feasible utility weights in two parts (in general, there will be $k$ subregions). Since we maximize the margin $\mu$, the optimizer will prefer a set of configurations that partitions the weight space in an “even” way. (This bears similarity with volumetric approaches [Iyengar et al.2001], but there are important differences: first, here we consider real items to find the best separator; second, the margin is also expressed in utility terms; third, the query is found via an optimization process.) In each subregion, we have a corresponding $w^i$ lying “close” to its centre. If, for example, the user indicates a preference for $x^1$ over $x^2$, the feasible region will then become the part of the polytope to the left of the red line; moreover the vector $w^1$ will maximize the margin in the classic sense in the new feasible region.
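To make the role of each term concrete, the following sketch evaluates a candidate solution against the four-part objective and the diversity constraint described above. It assumes the objective has the form "margin, minus weighted total slack, minus weighted $\ell_1$ norm of the weights, plus weighted total utility", which is our reading of the prose; the function names, the hyperparameter values and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def min_slacks(w, prefs, margin):
    """Smallest slacks such that <w, y_plus - y_minus> >= margin - slack."""
    return [max(0.0, margin - dot(w, diff(y_plus, y_minus)))
            for y_plus, y_minus in prefs]

def diversity_ok(ws, xs, margin, tol=1e-9):
    """Check <w_i, x_i - x_j> >= margin for every ordered pair i != j."""
    k = len(ws)
    return all(dot(ws[i], diff(xs[i], xs[j])) >= margin - tol
               for i in range(k) for j in range(k) if i != j)

def objective(ws, xs, prefs, margin, alpha, beta, gamma):
    """Assumed objective: margin - alpha*slack - beta*L1(weights) + gamma*utility."""
    slack = sum(sum(min_slacks(w, prefs, margin)) for w in ws)
    l1 = sum(sum(abs(v) for v in w) for w in ws)
    quality = sum(dot(w, x) for w, x in zip(ws, xs))
    return margin - alpha * slack - beta * l1 + gamma * quality

# Toy candidate with k = 2 weight vectors over 2 binary features.
ws = [[1.0, 0.0], [0.0, 1.0]]
xs = [[1, 0], [0, 1]]
prefs = [([1, 1], [0, 0])]   # one response: (1, 1) preferred to (0, 0)
value = objective(ws, xs, prefs, margin=1.0, alpha=1.0, beta=0.1, gamma=0.5)
```

Here both weight vectors rank the observed preference with margin one (no slack is needed) and each configuration beats the other by exactly the margin under its own weight vector, so the candidate is feasible for the diversity constraint.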
MILP Formulation.
This initial formulation is problematic to solve, as Eq. (2) involves quadratic terms over mixed continuous integer variables. However, the problem can be reformulated as a mixed integer linear program (MILP) by a suitable transformation. This technique is rather common in operational research, see e.g. [Boutilier et al.2006].
Our goal is to replace Eq. (2) with a set of linear constraints. In order to do so, we introduce a set of fresh variables $p^{i,j}_z$ for every pair of indices $i, j$ and every feature $z$. Assuming for the time being that the new variables do satisfy the equation $p^{i,j}_z = w^i_z x^j_z$, we rewrite the fourth component of the objective function in terms of the new variables as:
$$\gamma \sum_{i} \sum_{z} p^{i,i}_z$$
and, similarly, Eq. (2) as:
$$\sum_{z} \left( p^{i,i}_z - p^{i,j}_z \right) \geq \mu \qquad \forall i \neq j$$
The fact that $p^{i,j}_z = w^i_z x^j_z$ is achieved by setting the following additional constraints. We distinguish between two cases: (i) $p^{i,i}_z$ and (ii) $p^{i,j}_z$ for $i \neq j$. Recall that we are maximizing the margin $\mu$. Now, due to the rewritten form of Eq. (2), the optimizer will try to keep $p^{i,i}_z$ as large as possible and $p^{i,j}_z$ as small as possible.
(Case i) We add an explicit upper bound: $p^{i,i}_z \leq \min\{c\, x^i_z, w^i_z\}$, where $c$ is a sufficiently large constant. On one hand, if $x^i_z = 0$ the product $w^i_z x^i_z$ evaluates to $0$, and so does the upper bound. On the other hand, if $x^i_z = 1$ then the product amounts to $w^i_z$, while the upper bound reduces to $\min\{c, w^i_z\}$. By taking a sufficiently large constant (any $c$ no smaller than the largest admissible weight suffices) the upper bound simplifies to $w^i_z$. Since $p^{i,i}_z$ is being maximized, in both cases it will attain the upper bound, and thus satisfy $p^{i,i}_z = w^i_z x^i_z$.
(Case ii) We add an explicit lower bound: $p^{i,j}_z \geq \max\{0, w^i_z - c\,(1 - x^j_z)\}$. If $x^j_z = 1$ the lower bound simplifies to $w^i_z$, due to the non-negativity of $w^i_z$. Otherwise, if $x^j_z = 0$ then the lower bound becomes $\max\{0, w^i_z - c\}$, where the second term is at most $0$. Since $p^{i,j}_z$ is being minimized, in both cases it will attain the lower bound, and thus satisfy $p^{i,j}_z = w^i_z x^j_z$. (Since $\mu$ is upper-bounded by Eq. (1), in some cases the variables $p^{i,j}_z$ do not attain the lower bound. As a consequence, the MILP reformulation of Eq. (2) is a (tight) approximation of the original one. This has no impact on the quality of the solutions.)
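The case analysis above can be checked numerically. The sketch below assumes bounds of the form $\min\{c\,x, w\}$ and $\max\{0, w - c\,(1 - x)\}$, a standard way of linearizing the product of a binary variable and a bounded non-negative continuous variable; the constant `c`, the sample weights and the function names are illustrative assumptions, not values from the paper.

```python
def upper_bound(w, x, c):
    """Case (i): bound on a product variable that the optimizer pushes up."""
    return min(c * x, w)

def lower_bound(w, x, c):
    """Case (ii): bound on a product variable that the optimizer pushes down."""
    return max(0.0, w - c * (1 - x))

def bounds_match_product(weights, c):
    """True if both bounds coincide with w * x for binary x and 0 <= w <= c."""
    return all(upper_bound(w, x, c) == w * x and lower_bound(w, x, c) == w * x
               for x in (0, 1) for w in weights)

W_MAX = 2.0                      # hypothetical largest admissible weight
SAMPLE_WEIGHTS = (0.0, 0.5, 2.0)  # illustrative weights in [0, W_MAX]
linearization_is_exact = bounds_match_product(SAMPLE_WEIGHTS, c=W_MAX)
```

Choosing `c` below the largest weight breaks the equivalence: with `c = 1.0` and `w = 2.0`, the upper bound at `x = 1` becomes `1.0` rather than `2.0`.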
We thus obtain the following mixedinteger linear problem:
s.t.  
(5)  
(6)  
(7)  
(8)  
which can be solved by any suitable MILP solver.
Setwise max-margin.
The full SetMargin algorithm follows the usual preference elicitation loop. Starting from an initially empty set of user responses $\mathcal{D}$, it repeatedly solves the MILP problem above, using $\mathcal{D}$ to enforce ranking constraints on the weight vectors. The generated configurations, which are chosen to be as good as possible with respect to the estimated user preferences and as diverse as possible, are then employed to formulate a set of user queries. The new replies are added to $\mathcal{D}$ and the whole procedure is repeated. The procedure can terminate after a fixed number of iterations or when the difference between utility vectors is very small; alternatively, the decision to stop can be left to the user (e.g. [Reilly et al.2007]).
The procedure is sketched in Algorithm 1. Note that at the end of the preference elicitation procedure, a final recommendation is made by solving the MILP problem for a set of size one.
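The loop just described can be sketched in a few lines. The `elicit` function below follows the solve, query, update cycle and the final size-one solve; everything else is a stand-in chosen for illustration. In particular `toy_solve` is a hypothetical brute-force heuristic over three configurations and is not the MILP of this section, `simulated_user` is a noiseless user with made-up weights, and indifferent answers are simply skipped.

```python
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def elicit(solve, ask, k, max_iters):
    """Generic loop: solve for k configurations, query all pairs, store answers."""
    prefs = []  # list of (preferred, dispreferred) pairs
    for _ in range(max_iters):
        _, xs = solve(prefs, k)
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                answer = ask(xs[i], xs[j])  # +1, -1 or 0
                if answer > 0:
                    prefs.append((xs[i], xs[j]))
                elif answer < 0:
                    prefs.append((xs[j], xs[i]))
                # answer == 0 (indifference) is ignored in this sketch
    _, xs = solve(prefs, 1)  # final recommendation: a set of size one
    return xs[0], prefs

# Hypothetical feasible space: exactly two of three binary features are on.
CONFIGS = [x for x in product((0, 1), repeat=3) if sum(x) == 2]

def toy_solve(prefs, k):
    """Stand-in solver (NOT the MILP): crude weight estimate, then top-k configs."""
    w = [0.0, 0.0, 0.0]
    for better, worse in prefs:
        for z in range(3):
            w[z] += better[z] - worse[z]
    w = [max(0.0, v) for v in w]  # keep weights non-negative
    ranked = sorted(CONFIGS, key=lambda x: -dot(w, x))
    return [w] * k, ranked[:k]

TRUE_W = (3.0, 2.0, 1.0)  # made-up hidden user weights

def simulated_user(a, b):
    ua, ub = dot(TRUE_W, a), dot(TRUE_W, b)
    return (ua > ub) - (ua < ub)

recommendation, collected = elicit(toy_solve, simulated_user, k=2, max_iters=3)
```

Passing the solver and the user as callables keeps the loop independent of how the optimization problem is actually solved, so a MILP-based `solve` could be swapped in without touching `elicit`.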
Linearly dependent real attributes.
In many domains of interest, items are composed of both Boolean and real-valued attributes, where the latter depend linearly on the former. This is for instance the case for the price, weight and power consumption of a laptop, which depend linearly on the choice of components. In this setting, configurations are composed of two parts: $x = (x_B, x_R)$, where $x_B$ is Boolean and $x_R$ is real-valued and can be written as $x_R = C x_B$ for an appropriately sized non-negative cost matrix $C$. It is straightforward to extend the MILP formulation to this setting. We rewrite the weight vector as $w = (w_B, w_R)$. The utility becomes:
$$\langle w, x \rangle = \langle w_B, x_B \rangle + \langle w_R, C x_B \rangle = \langle w_B + C^\top w_R, x_B \rangle$$
The generalized problem is obtained by substituting with . All constraints remain the same. The only notable change occurs in Eq. (8), which becomes:
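A small numeric sketch of the linear dependence may help. It assumes real-valued attributes are obtained as a cost matrix times the Boolean part, as stated above, and shows that the resulting utility equals a dot product between the Boolean part and a "folded" weight vector. The cost matrix, the weights and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def real_part(C, x_bool):
    """Real-valued attributes as a linear function of the Boolean ones: C x_B."""
    return [dot(row, x_bool) for row in C]

def utility(w_bool, w_real, C, x_bool):
    """Utility over both parts of the configuration."""
    return dot(w_bool, x_bool) + dot(w_real, real_part(C, x_bool))

def folded_weights(w_bool, w_real, C):
    """w_B + C^T w_R, so the utility is a dot product with x_B alone."""
    return [wb + sum(w_real[r] * C[r][z] for r in range(len(C)))
            for z, wb in enumerate(w_bool)]

# Hypothetical example: one real attribute driven by three Boolean components.
C = [[300.0, 500.0, 80.0]]
w_bool = [0.5, 0.2, 0.1]
w_real = [0.001]
x_bool = [1, 0, 1]

direct = utility(w_bool, w_real, C, x_bool)
folded = dot(folded_weights(w_bool, w_real, C), x_bool)
```

Because the two computations agree, the real-valued part never needs its own decision variables: it is fully determined by the Boolean choices.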
4 Experiments
We implemented the SetMargin algorithm using Python, leveraging Gurobi 6.5.0 for solving the core MILP problem. Both the SetMargin source code and the full experimental setup are available at https://github.com/stefanoteso/setmargin.
We compare SetMargin against three state-of-the-art Bayesian approaches: i) the Bayesian approach from [Guo and Sanner2010], selecting queries according to restricted informed VOI (riVOI), a computationally efficient heuristic approximation of value-of-information, and performing inference using TrueSkill [R. et al.2006] (based on expectation propagation [Minka2001]); ii) the Bayesian framework of [Viappiani and Boutilier2010], using Monte Carlo methods (with 50,000 particles) for Bayesian inference and asking choice queries (i.e. selection of the most preferred item in a set) selected using a greedy optimization of Expected Utility of a Selection (a tight approximation of EVOI, hereafter just called EUS); iii) Query Iteration (referred to as QI below), also from [Viappiani and Boutilier2010], an even faster query selection method based on sampling sets of utility vectors.
We adopt the indifference-augmented Bradley-Terry user response model introduced in [Guo and Sanner2010]. The probability that a user prefers configuration over is defined according to the classical (without indifference) Bradley-Terry model [Bradley and Terry1952] as , where is the weight vector of the true underlying user utility. Support for indifference is modelled as an exponential distribution over the closeness of the two utilities, i.e.
The parameters and were set to one for all simulations, as in [Guo and Sanner2010].
In all experiments SetMargin uses an internal 5-fold cross-validation procedure to update the hyperparameters , , and after every 5 iterations. The hyperparameters are chosen so as to minimize the ranking loss over the user responses collected so far. is taken in , while and are taken in . (Note that for Eq. (6) and Eq. (7) disappear, so cannot be taken to be less than , as in this case the objective can be increased arbitrarily while keeping the right-hand side of Eq. (1) constant, rendering the problem unbounded.)
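For readers who want to simulate such a user, the sketch below gives one plausible instantiation of an indifference-augmented Bradley-Terry model. It is an assumption-laden illustration rather than the exact model used in the experiments: the logistic form with scale `alpha`, the indifference probability `exp(-beta * |difference|)`, and the way the two are combined (indifference is decided first, otherwise Bradley-Terry picks the winner) are all choices made here, with both parameters defaulting to one.

```python
import math
import random

def response_probabilities(u_a, u_b, alpha=1.0, beta=1.0):
    """Return (P[a preferred], P[b preferred], P[indifferent]) under the
    assumed model; u_a and u_b are the true utilities of the two options."""
    gap = u_a - u_b
    p_indiff = math.exp(-beta * abs(gap))           # closer utilities -> more ties
    p_a_given_choice = 1.0 / (1.0 + math.exp(-alpha * gap))  # Bradley-Terry
    p_a = (1.0 - p_indiff) * p_a_given_choice
    p_b = (1.0 - p_indiff) * (1.0 - p_a_given_choice)
    return p_a, p_b, p_indiff

def sample_response(u_a, u_b, rng, alpha=1.0, beta=1.0):
    """Sample +1 (a preferred), -1 (b preferred) or 0 (indifferent)."""
    p_a, _, p_indiff = response_probabilities(u_a, u_b, alpha, beta)
    r = rng.random()
    if r < p_indiff:
        return 0
    if r < p_indiff + p_a:
        return 1
    return -1

rng = random.Random(0)  # fixed seed for reproducible simulations
answers = [sample_response(2.0, 0.5, rng) for _ in range(10)]
```

Under these assumptions, two options with identical utility always produce an indifferent answer, while a large utility gap makes the better option win almost surely.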
Figure 2: Results on the two synthetic datasets (left and right). Each row represents a different sampling distribution for user utility (from top to bottom: Uniform, Normal, Sparse Uniform, Sparse Normal). The number of iterations is plotted against the utility loss (first and third columns) and the cumulative time (second and fourth columns). Thick lines indicate median values over users, while standard deviations are shown as shaded areas.
Synthetic Dataset.
Following the experimental protocol in [Guo and Sanner2010] and [Viappiani and Boutilier2010], in the first experiment we evaluate the behavior of the proposed method in an artificial setting with increasingly complex problems. We developed synthetic datasets with attributes, for increasing values of . Each attribute takes one of possible values, so that the one-hot encoding of attributes results in features. In terms of space of configurations, for the synthetic dataset corresponds to , for to , and so on. The cardinality of is , and grows (super) exponentially with . For , the dataset is comparable in size to the synthetic one used in [Guo and Sanner2010] and [Viappiani and Boutilier2010]. For larger the size of the space grows much larger than the ones typically used in the Bayesian preference elicitation literature, and as such represents a good testbed for comparing the scalability of the various methods. The feasible configuration space was encoded in SetMargin through appropriate MILP constraints, while the other methods require all datasets to be explicitly grounded. Users were simulated by drawing random utility vectors from each of four different distributions. The first two mimic those used in [Guo and Sanner2010]: (1) a uniform distribution over for each individual weight, and (2) a normal distribution with mean and standard deviation (each attribute is sampled i.i.d.). We further produced two novel sparse versions of the uniform and normal distributions, setting to zero of the entries (sampled uniformly at random). For simplicity, we set a maximum budget of 100 iterations for all methods.
In Figure 2 we report solution quality and timing values for an increasing number of collected user responses, for the different competitors on each of the four different utility vector distributions and datasets and . Solution quality is measured in terms of utility loss , where is the true unknown user utility, and is the solution recommended to the user after the elicitation phase (see Algorithm 1). Computational cost is measured in terms of cumulative time. Given that riVOI, QI and EUS are single-threaded, we disabled multi-threading when running our algorithm in these comparisons. All experiments were run on a 2.8 GHz Intel Xeon CPU with 8 cores and 32 GiB of RAM. For all algorithms, one iteration corresponds to a single pairwise query (we used SetMargin with ). For dense weight vector distributions (first two rows), our approach achieves results which are indistinguishable from those of the competitors in a fraction of their time. Indeed, all Bayesian approaches quickly become impractical for growing values of , while our algorithm can easily scale to much larger datasets, as will be shown later on. For sparse weight vector distributions (last two rows) our approach, in addition to being substantially faster on each iteration, requires fewer queries in order to reach optimal solutions. This is an expected result, as the sparsifying norm in our formulation () enforces sparsity in the weights, while none of the other approaches is designed to do this.
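The evaluation measure can be illustrated on a tiny grounded dataset. The sketch below makes two assumptions that should be read as such: the synthetic space is taken to consist of `r` categorical attributes with `r` values each (one-hot encoded), and the utility loss is taken to be the gap between the best achievable true utility and the true utility of the recommended configuration. The weights and function names are hypothetical.

```python
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def one_hot_configs(r):
    """All configurations of r categorical attributes with r values each,
    one-hot encoded into r * r binary features (assumed dataset shape)."""
    configs = []
    for choice in product(range(r), repeat=r):
        x = []
        for value in choice:
            x.extend(1 if v == value else 0 for v in range(r))
        configs.append(tuple(x))
    return configs

def utility_loss(w_true, recommended, configs):
    """Gap between the best true utility and that of the recommendation."""
    best = max(dot(w_true, x) for x in configs)
    return best - dot(w_true, recommended)

configs = one_hot_configs(3)             # 3 attributes, 3 values each
w_true = [1, 2, 3, 0, 0, 5, 4, 0, 0]     # made-up true user weights
loss = utility_loss(w_true, configs[0], configs)
```

Explicit enumeration like this is what grounded methods require; the number of configurations grows as `r ** r` under the assumed shape, which is why it stops being practical very quickly.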
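Figures 3 and 4: Results of SetMargin for a varying number of weight vectors. Panel titles: uniform and sparse normal (Figure 3); sparse uniform (Figure 4).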
In order to study the effect of increasing the number of weight vectors in our formulation, we also ran SetMargin varying the parameter . Figure 3 reports utility loss results on and datasets for the uniform and sparse normal distributions (the toughest and the simplest, for space limitations). The first and third columns report results in terms of number of iterations. It can be seen that increasing the number of weight vectors tends to favour earlier convergence, especially for the more complex dataset (). However, as in each iteration the user is asked to compare items, different values of imply a different cognitive effort for the user. The second and fourth columns report results in terms of number of queries, where we count all pairwise queries when comparing items. In this case, seems to be the best option. The cognitive cost for the user will likely lie in between these two extremes, but formalizing this concept in an efficient query ordering strategy needs to face the effect of noise. A modified sorting algorithm asking only queries to the user resulted in worse performance, likely because of a cascading effect of inconsistent feedback (but could be beneficial with different noise levels).
Constructive dataset.
Next, we tested SetMargin on a truly constructive setting. We developed a constructive version of the PC dataset used in [Guo and Sanner2010]: instead of explicitly enumerating all possible PC items, we defined the set of feasible configurations with MILP constraints.
A PC configuration is defined by eight attributes: computer type (laptop, desktop, or tower), manufacturer (8 choices), CPU model (37), monitor size (8), RAM amount (10), storage size (10), and price. The price attribute is defined as a linear combination of the other attributes: this is a fair modeling choice, as often the price of a PC is well approximated by the sum of the price of its components plus a bias due to branding. Interactions between attributes are expressed as Horn clauses (e.g. a certain manufacturer implies a set of possible CPUs). The dataset includes 16 Horn constraints (the full list is omitted for space limitations). Note that the search space is of the order of hundreds of thousands of candidate configurations, and is far beyond the reach of existing Bayesian approaches.
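A Horn clause such as "manufacturer M implies one of the CPUs in a given set" has a simple linear form over one-hot features: the indicator of the premise must not exceed the sum of the indicators of the allowed conclusions. The sketch below checks this on a made-up miniature catalogue; the feature indices, the rules and the helper names are hypothetical and unrelated to the actual PC dataset.

```python
from itertools import product

def horn_ok(x, premise, allowed):
    """Linear form of: x[premise] implies (OR of x[c] for c in allowed)."""
    return x[premise] <= sum(x[c] for c in allowed)

def exactly_one(x, group):
    """One-hot constraint: exactly one feature of the group is active."""
    return sum(x[i] for i in group) == 1

# Hypothetical layout: features 0-1 are manufacturers, features 2-4 are CPUs.
MANUFACTURERS = [0, 1]
CPUS = [2, 3, 4]
# Made-up rules: manufacturer 0 ships CPUs 2 or 3, manufacturer 1 only CPU 4.
RULES = [(0, [2, 3]), (1, [4])]

feasible = [x for x in product((0, 1), repeat=5)
            if exactly_one(x, MANUFACTURERS)
            and exactly_one(x, CPUS)
            and all(horn_ok(x, premise, allowed) for premise, allowed in RULES)]
```

Only three of the six manufacturer and CPU combinations survive the rules, which is how a handful of such constraints defines the feasible space without enumerating the catalogue.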
Figure 4 reports results of SetMargin varying using the sparse uniform distribution (the more complex of the sparse ones, dense distributions being unrealistic in this scenario). The first and third columns report utility loss for an increasing number of iterations and queries respectively, showing a behaviour which is similar to the one in Figure 3. Overall, between 50 and 70 queries on average are needed in order to find a solution which is only 10% worse than the optimal one, out of the more than 700,000 available. Note that a vendor may ensure a considerably smaller number of queries by cleverly constraining the feasible configuration space; since our primary aim is benchmarking, we chose not to pursue this direction further. The second and fourth columns report cumulative times. Note that in some cases, standard deviations have a bump; this is due to cases in which some of the hyperparameter values tried by the internal cross-validation result in ill-conditioned optimization problems which are hard to solve. These exceptions can be easily dealt with by setting an appropriate timeout on the cross-validation without affecting the results, as these hyperparameters typically end up having bad performance and being discarded.
5 Conclusion
We presented a max-margin approach for efficient preference elicitation in large configuration spaces. (Max-margin learning has been proposed before for preference elicitation [Gajos and Weld2005], but with rudimentary methods for query selection.) Our approach relies on an extension of max-margin learning to sets, and is effective in the generation of a diverse set of configurations that can be used to ask informative preference queries. The main advantages of this elicitation method are 1) the ability to provide recommendations in large configuration problems, 2) robustness with respect to erroneous feedback, and 3) the ability to encourage sparse utility functions. Experimental comparisons against state-of-the-art Bayesian preference elicitation strategies confirm these advantages. Future work includes extending the approach to truly hybrid scenarios (where real-valued attributes do not depend on categorical ones) and studying its applicability to other problems, such as the identification of Choquet models [AhPine et al.2013].
Acknowledgments
ST was supported by the Caritro Foundation through project E62I15000530007. PV was supported by the Idex Sorbonne Universités under grant ANR11IDEX000402. We thank Craig Boutilier for motivating discussion on the topic.
References
 [AhPine et al.2013] J. AhPine, B. Mayag, and A. Rolland. Identification of a 2additive bicapacity by using mathematical programming. In Algorithmic Decision Theory, pages 15–29. 2013.
 [Boutilier et al.2006] C. Boutilier, R. Patrascu, P. Poupart, and D. Schuurmans. Constraint-based Optimization and Utility Elicitation using the Minimax Decision Criterion. Artificial Intelligence, 170:686–713, 2006.
 [Boutilier2002] C. Boutilier. A POMDP Formulation of Preference Elicitation Problems. In Proceedings of AAAI’02, pages 239–246, 2002.
 [Bradley and Terry1952] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
 [Braziunas and Boutilier2007] D. Braziunas and C. Boutilier. Minimax regret based elicitation of generalized additive utilities. In Proceedings of UAI’07, pages 25–32, 2007.
 [Chajewska et al.2000] U. Chajewska, D. Koller, and R. Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of AAAI’00, pages 363–369, 2000.
 [Gajos and Weld2005] K. Gajos and D. Weld. Preference elicitation for interface optimization. In UIST, pages 173–182, 2005.
 [Greco et al.2008] S. Greco, V. Mousseau, and R. Słowiński. Ordinal regression revisited: multiple criteria ranking using a set of additive value functions. European Journal of Operational Research, 191(2):416–436, 2008.
 [Guo and Sanner2010] S. Guo and S. Sanner. Real-time multiattribute Bayesian preference elicitation with pairwise comparison queries. In Proceedings of AISTATS’10, pages 289–296, 2010.
 [Hensinger et al.2010] E. Hensinger, I. Flaounas, and N. Cristianini. Learning the preferences of news readers with svm and lasso ranking. In Artificial Intelligence Applications and Innovations, pages 179–186. 2010.
 [Iyengar et al.2001] V. S. Iyengar, J. Lee, and M. Campbell. QEval: Evaluating multiple attribute items using queries. In Proceedings of the Third ACM Conference on Electronic Commerce, pages 144–153, 2001.
 [JacquetLagrèze and Siskos1982] E. JacquetLagrèze and Y. Siskos. Assessing a set of additive utility functions for multicriteria decision making: the UTA method. European Journal of Operational Research, 10:151–164, 1982.
 [JacquetLagreze and Siskos2001] E. JacquetLagreze and Y. Siskos. Preference disaggregation: 20 years of mcda experience. European Journal of Operational Research, 130(2):233–245, 2001.
 [Keeney and Raiffa1976] R. L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley and Sons, New York, 1976.
 [Louviere et al.2000] J. J. Louviere, D. A. Hensher, and J. D. Swait. Stated Choice Methods: Analysis and Application. Cambridge University Press, Cambridge, 2000.
 [Minka2001] T. Minka. Expectation propagation for approximate bayesian inference. In Proceedings of UAI’01, pages 362–369, 2001.
 [Peintner et al.2008] B. Peintner, P. Viappiani, and N. YorkeSmith. Preferences in interactive systems: Technical challenges and case studies. AI Magazine, 29(4):13–24, 2008.
 [R. et al.2006] R. Herbrich, T. Minka, and T. Graepel. TrueSkill: A Bayesian skill rating system. In Proceedings of NIPS’06, pages 569–576, 2006.
 [Reilly et al.2007] J. Reilly, J. Zhang, L. McGinty, P. Pu, and B. Smyth. Evaluating compound critiquing recommenders: a realuser study. In Proceedings of the 8th ACM conference on Electronic commerce, pages 114–123. ACM, 2007.
 [Tibshirani1996] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996.
 [Torra and Narukawa2007] V. Torra and Y. Narukawa. Modeling decisions  information fusion and aggregation operators. Springer, 2007.
 [Toubia et al.2004] O. Toubia, J. R. Hauser, and D. I. Simester. Polyhedral methods for adaptive choicebased conjoint analysis. Journal of Marketing Research, 41(1):116–131, 2004.
 [Viappiani and Boutilier2010] P. Viappiani and C. Boutilier. Optimal bayesian recommendation sets and myopically optimal choice query sets. In Proceedings of NIPS’10, pages 2352–2360, 2010.
 [Viappiani and Craig2009] P. Viappiani and C. Boutilier. Regret-based optimal recommendation sets in conversational recommender systems. In Proceedings of RecSys’09, pages 101–108, 2009.
 [Wang and Boutilier2003] T. Wang and C. Boutilier. Incremental Utility Elicitation with the Minimax Regret Decision Criterion. In Proceedings of IJCAI’03, pages 309–316, 2003.
 [White III et al.1984] C. C. White III, A. P. Sage, and S. Dozono. A model of multiattribute decisionmaking and tradeoff weight determination under uncertainty. IEEE Transactions on Systems, Man, and Cybernetics, 14(2):223–229, 1984.

 [Zhang and Huang2008] C.-H. Zhang and J. Huang. The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist., 36(4):1567–1594, 2008.