Introduction
Constructive preference elicitation is the task of recommending structured objects, i.e. configurations of several components, assembled on the basis of the user's preferences [Teso, Passerini, and Viappiani2016, Dragone et al.2016]. In this setting, the space of possible configurations grows exponentially in the number of components. Examples include configurable products, such as personal computers or mobile phone plans, and complex preference-based decision problems, such as customized travel planning or personalized activity scheduling.
The suggested configurations should reflect the preferences of the user, which are unobserved and must be estimated. As in standard preference elicitation [Pigozzi, Tsoukiàs, and Viappiani2016], preferences can be learned by iteratively suggesting candidate products to the user, and refining an estimate of the preference model from the received feedback. The ultimate goal is to produce good recommendations with minimal user effort. Here we focus on choice queries, an interaction protocol that consists of recommending a set of products; the user is invited to indicate the most preferred item in the set [Viappiani and Boutilier2011, Louviere, Hensher, and Swait2000]. Elicitation techniques based on choice set queries rely on some strategy to select the next query set to show to the user. Successful query selection strategies must balance the estimated informativeness of the recommendations (so as to minimize the number of elicitation rounds) against their quality (to maximize the chance of the user buying the product and to keep her engaged). By generalizing pairwise ranking feedback, choice queries over larger sets of items allow finer control over informativeness, diversity and quality [Pu and Chen2009, Bollen et al.2010].
Most existing preference elicitation methods are not designed for constructive tasks [Viappiani and Boutilier2011, Teso, Passerini, and Viappiani2016]. Regret-based methods [Viappiani and Boutilier2009] rely on perfectly rational user responses, while Bayesian approaches do not scale to combinatorial product spaces [Viappiani and Boutilier2010], as discussed in the related work section. A notable exception is the approach of Teso et al. [Teso, Passerini, and Viappiani2016], which avoids the enumeration of the product space by encoding it through mixed-integer linear constraints. Alas, it requires configurations to be encoded with binary variables (in one-hot format), which can be very costly from a computational perspective, and comes with no formal performance guarantees.
In this paper we present several contributions. First, we propose an iterative algorithm, dubbed Choice Perceptron, that generalizes the structured Perceptron [Collins2002, Shivaswamy and Joachims2015] to interactive preference elicitation from pairwise and setwise choice feedback. The query selection strategy is implemented as an optimization problem over the combinatorial space of products. In contrast to previous constructive approaches [Teso, Passerini, and Viappiani2016], our algorithm handles general linear utilities over arbitrary feature spaces, including combinatorial and numerical attributes and features. Second, we prove that under a very general assumption (implied by many existing user response models), the expected average regret suffered by our algorithm decreases at least as $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of iterations. We show how the constants appearing in the bound depend on intuitive properties of the query selection strategy, and, as a third contribution, we propose a simple strategy to control these quantities. Our empirical analysis showcases the effectiveness of our approach against several state-of-the-art (including constructive) alternatives.
Related work
Preference elicitation (PE) is a widely studied subject in AI [Domshlak et al.2011, Pigozzi, Tsoukiàs, and Viappiani2016]. Most existing approaches to PE rely on regret theory [Viappiani and Boutilier2009, Viappiani and Kroer2013] or Bayesian estimation [Viappiani and Boutilier2010]; see [Viappiani and Boutilier2011] for a brief overview. None of them are suitable for constructive settings, for different reasons. Regret-based methods maintain a version space of utility functions consistent with the collected feedback. However, inconsistent user responses, which are common in real-world recommendation, make the version space collapse. Bayesian methods gracefully deal with inconsistent feedback by employing a full distribution over the candidate utility functions. Unfortunately, selection of the query set (based on optimizing its Expected Value of Information or approximations thereof) is computationally expensive, preventing these approaches from scaling to larger combinatorial domains.
The only approach specifically designed for constructive preference elicitation is SetMargin, introduced in [Teso, Passerini, and Viappiani2016]. SetMargin can be seen as a max-margin approximation of Bayesian methods that maintains only most promising candidate utility functions (with small, e.g. to ). Like the Choice Perceptron, it avoids the explicit enumeration of the product catalogue by compactly defining the latter in terms of MILP constraints, for significant runtime benefits. Alas, it only handles configurations encoded in one-hot form, which can become inefficient for very complex problems involving many categorical variables, relies on a rather involved optimization problem, and has not been analyzed from a theoretical standpoint. Our query strategy is much simpler, and aims specifically at optimizing an upper bound on the regret.
Our method is related to Coactive Learning [Shivaswamy and Joachims2015], which has already found application in constructive tasks [Teso, Dragone, and Passerini2017, Dragone et al.2016]; some concepts and arguments used in our theoretical analysis are adapted from the Coactive Learning literature [Shivaswamy and Joachims2012, Raman et al.2013]. However, in our framework the user is asked to choose an option from a set of alternatives, rather than to construct an improved configuration. The two approaches are complementary: when manipulative feedback is easy to obtain, Coactive Learning may be better suited; when the space of products is highly constrained, however, producing feasible improvements may be difficult for the user, and our approach is preferable.
The Choice Perceptron algorithm
We consider a combinatorial space of structured products $\mathcal{Y}$ defined by hard feasibility constraints. As customary in preference elicitation, we focus on the problem of learning a utility function that ranks candidate objects according to the user preferences. The utility of a product $y \in \mathcal{Y}$ may optionally depend on some externally provided context $x$. In the rest of the paper, we assume that the user’s true utility function $u^*$ is fixed and never observed by the algorithm, and that it is linear, i.e. of the form $u^*(x, y) = \langle w^*, \phi(x, y) \rangle$; here $w^* \in \mathbb{R}^d$ are the true preference weights of the user and $\phi$ maps context-configuration pairs to a $d$-dimensional feature space. The feature vectors $\phi(x, y)$ are assumed to be enclosed in a ball of radius $R$.

We propose the Choice Perceptron (cp) algorithm; the pseudocode is listed in Algorithm 1. The cp algorithm keeps an estimate of the true user utility, parameterized by weights $w^t$, and iteratively refines it by interacting with the user. At each iteration $t$, the algorithm receives a context $x^t$ and recommends a set $Q^t$ of $k$ configurations, by selecting them according to some query strategy based on $w^t$. (Footnote 1: The cp algorithm is independent from the particular query selection strategy used. Different query strategies may find better recommendations in different problems.) After receiving the query set, the user chooses the “best” object according to her preferences. This kind of setwise interaction protocol generalizes pairwise ranking feedback, and is well studied in decision theory, psychology, and econometrics [Louviere, Hensher, and Swait2000, Toubia, Hauser, and Simester2004, Pu and Chen2009]. We allow the choice to be noisy, i.e. the user may choose according to a probability distribution over the options in $Q^t$.
After observing the user’s pick, the algorithm updates the current estimate . Here we focus on the following Perceptron update:
(1) 
where is a constant step size. Despite its simplicity, this update comes with sound theoretical guarantees, as shown in the next section. (Footnote 2: Our results could further be extended to more sophisticated updating mechanisms; see e.g. [Shivaswamy and Joachims2015].)
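As a rough illustration of the update step, the sketch below moves the weight estimate towards the features of the chosen configuration and away from the average features of the query set. The exact form of Eq. 1 is not reproduced here: the averaged difference, the function name `perceptron_update`, and the toy feature vectors are assumptions made for illustration only.

```python
from typing import List, Sequence


def perceptron_update(w: Sequence[float],
                      query_features: Sequence[Sequence[float]],
                      chosen: int,
                      eta: float = 1.0) -> List[float]:
    """Return w + eta * (phi(chosen) - mean of phi over the query set).

    Assumed form of the update (not taken from the text): the feature
    difference is averaged over all options in the query set.
    """
    k = len(query_features)
    chosen_phi = query_features[chosen]
    new_w = []
    for j, w_j in enumerate(w):
        mean_j = sum(phi[j] for phi in query_features) / k
        new_w.append(w_j + eta * (chosen_phi[j] - mean_j))
    return new_w


# Toy query set: k = 2 options described by 3 features each.
w0 = [0.0, 0.0, 0.0]
query = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
w1 = perceptron_update(w0, query, chosen=0, eta=0.5)
print(w1)
```

With this averaged form, features shared by all options (the third one in the toy example) leave the estimate unchanged, and only the features that distinguish the user's pick are reinforced.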
We measure the quality of a recommendation set $Q^t$ in context $x^t$ by the instantaneous regret, that is the difference in true utility between a truly optimal object and the best option in the set:
$$\mathrm{REG}(x^t, Q^t) = \max_{y \in \mathcal{Y}} u^*(x^t, y) - \max_{y \in Q^t} u^*(x^t, y),$$
where $u^*$ is the true utility and $\mathcal{Y}$ is the set of feasible products. This definition is in line with previous works on preference elicitation with setwise choice feedback [Viappiani and Boutilier2010]. After $T$ iterations, the average regret is $\frac{1}{T} \sum_{t=1}^{T} \mathrm{REG}(x^t, Q^t)$. A low average regret implies low instantaneous regret throughout the elicitation process, as is necessary for keeping the user engaged. In the next section we prove a theoretical upper bound on the expected average regret suffered by cp under a very general assumption on the user feedback.
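The two regret measures can be sketched in a few lines. This is a minimal illustration that assumes the feasible set is small enough to enumerate and that feature vectors are given directly (the context is left implicit); the function names and the toy data are hypothetical.

```python
from typing import Sequence


def utility(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear utility: inner product of weights and features."""
    return sum(w_j * phi_j for w_j, phi_j in zip(w, phi))


def instantaneous_regret(w_true, feasible, query) -> float:
    """True utility of the best feasible object minus that of the best option shown."""
    best_feasible = max(utility(w_true, phi) for phi in feasible)
    best_shown = max(utility(w_true, phi) for phi in query)
    return best_feasible - best_shown


def average_regret(regrets: Sequence[float]) -> float:
    """Mean of the instantaneous regrets collected so far."""
    return sum(regrets) / len(regrets)


w_true = [1.0, 2.0]
feasible = [[0, 0], [1, 0], [0, 1], [1, 1]]
r1 = instantaneous_regret(w_true, feasible, [[1, 0], [0, 1]])  # optimum not shown
r2 = instantaneous_regret(w_true, feasible, [[1, 1], [0, 0]])  # optimum shown
avg = average_regret([r1, r2])
```

Note that the regret only looks at the best option in the set, so a query set may contain poor options without being penalized, as long as it also contains a good one.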
Theoretical Analysis
In this section we analyze the theoretical properties of the cp algorithm, proving an upper bound on its expected average regret. In the following indicates the conditional expectation of with respect to , where is the iteration index; is the expectation of over the distribution of all user choices . We will also use the shorthands , and .
In order to derive the regret bound, we need to quantify the “quality” of the sets provided by the query strategy. To this end, we adapt the concept of expected informativeness from the Coactive Learning framework [Shivaswamy and Joachims2012]: For any query strategy, there exist and such that, for all and for all users:
(2) 
The LHS of Eq. 2 is the expected utility gain of the update rule (Eq. 1): a positive utility gain indicates that makes a step towards a better approximation of . The term on the RHS is instead the worst-case regret, i.e. the regret with respect to the worst object in the query set. This model simply quantifies the amount of utility gain in terms of a fraction of the worst-case regret and the slack term . Intuitively, captures the minimum quality of the query sets selected by the query strategy, while the slacks are additional degrees of freedom that depend on the expected user replies.
Notice that the above definition is very general and can describe the behavior of any query selection strategy, provided appropriate values for and . Both occur as constants in our regret bound.
By requiring the user to behave “reasonably”, according to the following definition, we can guarantee the expected utility gain to always be nonnegative (see the lemma below). This allows us to make explicit and assign a precise meaning to the value of the constant . A user is reasonable if, for any context and query set , the probability is a nondecreasing monotonic transformation of the true utility :

This property is implied by many widespread user response models, including the Bradley-Terry [Bradley and Terry1952] and Thurstone-Mosteller [Mcfadden2001] models of pairwise choice feedback, and the Plackett-Luce [Plackett1975, Luce1959] model of setwise choice feedback. It is also strictly less restrictive than applying any of these models.
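To make the notion of a reasonable user concrete, the following sketch checks whether a vector of choice probabilities never ranks a lower-utility option above a higher-utility one, and applies the check to a Bradley-Terry pairwise model. The helper names are hypothetical, and the exponential strength parameterization of Bradley-Terry is one common choice assumed here.

```python
import math
from typing import List, Sequence


def is_reasonable(utilities: Sequence[float],
                  probabilities: Sequence[float],
                  tol: float = 1e-12) -> bool:
    """True if no option is chosen with lower probability than an option of lower utility."""
    n = len(utilities)
    return all(probabilities[i] <= probabilities[j] + tol
               for i in range(n) for j in range(n)
               if utilities[i] < utilities[j])


def bradley_terry(u_a: float, u_b: float) -> List[float]:
    """Pairwise choice probabilities with strengths exp(u) (assumed parameterization)."""
    s_a, s_b = math.exp(u_a), math.exp(u_b)
    return [s_a / (s_a + s_b), s_b / (s_a + s_b)]


probs = bradley_terry(1.0, 2.5)
print(is_reasonable([1.0, 2.5], probs))       # Bradley-Terry user
print(is_reasonable([1.0, 2.5], [0.7, 0.3]))  # user who prefers the worse option
```

The check only inspects the ordering of the probabilities, which reflects why the property is weaker than committing to any specific response model.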
Notably, when applied to a reasonable user, the update rule (Eq. 1) always yields a nonnegative expected utility gain. For a reasonable user with utility , it holds that at all iterations .
Proof.
Given that the user is reasonable, we apply Chebyshev’s sum inequality to and , for :
Rearranging, we obtain:
∎
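The argument above can be checked numerically. Chebyshev's sum inequality states that for two sequences sorted in the same order, the average of the products is at least the product of the averages; since choice probabilities sum to one, this implies that a reasonable user's expected true utility is at least the average true utility of the set. The sketch below assumes that the utility gain compares the chosen option against the set average, which is an illustrative reading rather than the exact quantity in the lemma.

```python
from typing import Sequence


def expected_gain(utilities: Sequence[float],
                  probabilities: Sequence[float]) -> float:
    """Expected true utility of the user's pick minus the mean true utility of the set."""
    k = len(utilities)
    expected_choice = sum(p * u for p, u in zip(probabilities, utilities))
    return expected_choice - sum(utilities) / k


utilities = [0.0, 1.0, 3.0]
gain_reasonable = expected_gain(utilities, [0.1, 0.3, 0.6])    # probabilities increase with utility
gain_unreasonable = expected_gain(utilities, [0.6, 0.3, 0.1])  # probabilities decrease with utility
gain_uniform = expected_gain(utilities, [1 / 3, 1 / 3, 1 / 3]) # indifferent user
```

The reasonable user yields a positive gain, the indifferent user a gain of (numerically) zero, and the user whose probabilities decrease with utility a negative gain.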
The lemma allows us to distinguish between informative and uninformative query sets, depending on whether the expected utility gain is strictly positive or null, respectively. We can use these definitions to derive an equivalent formulation of the informativeness making the constants explicit.
Let be the smallest constant such that for all iterations in which the query set is informative. For these iterations setting still satisfies the inequality in Eq. 2. On the other hand, when the query set is uninformative, must satisfy . Given that , the worst-case regret is upper-bounded by , therefore it suffices to set . We can rewrite the expected informativeness as:
(3) 
Here is a constant that is equal to if any query set that may be chosen at iteration is expected to be uninformative and otherwise. Note that , therefore if is informative then (i.e. ), while if is uninformative then (i.e. ). We say that an iteration is expected uninformative if , and let be the total number of expected uninformative iterations.
The last property of the query selection strategy we define in order to state the bound is the affirmativeness, which we adapt from [Raman et al.2013] as follows: For any query selection strategy and for a fixed time horizon , there exists a constant such that . This definition states that is an upper bound on the average expected change in , for . Notice that may be positive, null or negative. Intuitively, a negative indicates that the query set is expected to produce a user choice that disagrees with the current estimate of . This is the case in which the algorithm receives the most information. In general, the smaller is, the quicker cp learns from the user feedback.
The previous assumptions on the user and definitions for the query strategy allow us to derive the following regret bound for cp, along the same lines as in Coactive Learning [Shivaswamy and Joachims2012, Raman et al.2013]. For a reasonable user with true preference weights and an informative and affirmative query strategy, the expected average regret of the cp algorithm is upper bounded by:
Proof.
Using the Cauchy-Schwarz and Jensen inequalities:
(4) 
From the expected affirmativeness and :
Plugging this result into inequality (4) we have:
For a reasonable user, the informativeness in Eq. 3 holds for any query strategy. Applying it to the LHS of the above inequality, along with the law of total expectation, we get:
Applying the informativeness (Eq. 3):
Finally:
from which the claim follows. ∎
Query selection strategy
In the previous section we proved an upper bound on the expected average regret of cp for any query selection strategy, provided that the user is reasonable. Crucially, however, the bound depends on the actual value of , and . These constants depend both on the user and the query selection strategy. While the algorithm has no control over the user, an appropriate design of the query selection strategy can positively affect the impact of the constants on the bound. In the following we present a query selection strategy that aims at reducing the bound by finding a tradeoff between and .
Recall that we want to be large and and small. While we have no direct control over and , which depend on all iterations, we can control their stepwise surrogates:
There is a tradeoff between the two, as they both depend on . Further, while is observed, is not. We proceed as follows. Since is not observed, we indirectly maximize by maximizing , i.e. by picking query configurations that are distant in feature space. For reasonable users, maximizing the distance between objects also tends to maximize the probability of picking a high utility object: the larger the distance, the higher the probability of picking objects with large difference in . On the other hand is observed, so we can choose query configurations with small difference in estimated utility by taking them from a plane orthogonal (or almost orthogonal) to . This way, is close to regardless of the choice of the user, implying . This reasoning leads to the following optimization problem:
s.t.  
where:  
The objective aims at optimizing a convex combination of the distances of the options in () and their distance from optimality (). The two terms are modulated by the hyperparameter. The third constraint forces the first configuration to be optimal, irrespective of the choice of , ensuring that when , contains at least one true optimal configuration. Finally, all options are required to be different in feature space. By maximizing the utility of the objects, we are also pushing towards zero, implying that iteration can only be expected uninformative when is (approximately) antiparallel to :
For reasonable users (by the lemma in the theoretical analysis), implying that the above case is extremely rare, and therefore .
This query strategy essentially attempts to find a good tradeoff between exploration () and exploitation (). In most cases a good strategy is to allow more exploration in the beginning of the elicitation and then exploit more when the algorithm has learned a good approximation of . We therefore set to in our experiments. This also ensures that decreases over time regardless of the user choice, thereby keeping constant.
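The exploration/exploitation trade-off described above can be illustrated with a brute-force stand-in for the optimization problem. This is only a sketch under several assumptions: candidates are enumerated explicitly instead of being defined by MILP constraints, feature vectors are the configurations themselves, diversity is measured by pairwise L1 distance, quality by the sum of estimated utilities, and the names `select_query` and `lam` are hypothetical. The first option is fixed to a maximizer of the estimated utility, mirroring the constraint discussed in the text.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, phi))


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def select_query(w: Sequence[float], candidates: Sequence[Sequence[float]],
                 k: int, lam: float) -> List[Vector]:
    """Pick k distinct options: the estimated optimum plus the k-1 options
    maximizing lam * diversity + (1 - lam) * estimated quality (assumed objective)."""
    distinct = list(dict.fromkeys(tuple(c) for c in candidates))  # drop duplicates, keep order
    best = max(distinct, key=lambda phi: dot(w, phi))
    others = [phi for phi in distinct if phi != best]

    def score(rest: Tuple[Vector, ...]) -> float:
        query = (best,) + rest
        diversity = sum(l1(a, b) for a, b in combinations(query, 2))
        quality = sum(dot(w, phi) for phi in query)
        return lam * diversity + (1.0 - lam) * quality

    rest = max(combinations(others, k - 1), key=score)
    return [best, *rest]


w_est = (1.0, 0.5, 0.25)
space = list(product([0, 1], repeat=3))                # all binary configurations
explore = select_query(w_est, space, k=2, lam=1.0)     # pure diversity
exploit = select_query(w_est, space, k=2, lam=0.0)     # pure estimated quality
```

With `lam=1.0` the second option is the configuration farthest from the estimated optimum, while with `lam=0.0` it is the runner-up in estimated utility; a value of `lam` that decreases over the iterations moves from the first behavior to the second.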
In the following, we will stick to features expressible as linear functions of Boolean, categorical and continuous attributes. This choice is very general, and allows encoding arithmetical, combinatorial and logical constraints, as shown by our empirical evaluation. So long as the feasible set is also defined in terms of mixed-integer linear constraints, query selection can be cast as a mixed-integer linear problem (MILP) and solved with any efficient off-the-shelf solver.
We remark that the previous arguments apply to all choices of , i.e. to both pairwise and setwise choice feedback. Intuitively, larger set sizes imply more diverse and potentially more informative query sets, because they reduce the chance for a reasonable user to pick a low utility option. They also imply more conservative updates, mitigating the deleterious effect of uninformative choices. These effects are studied experimentally.
Empirical Evaluation
We compare cp against three state-of-the-art preference elicitation approaches on three constructive preference elicitation tasks taken from the literature. The query selection problem is solved with Gecode via its MiniZinc interface [Nethercote et al.2007]. (Footnote 3: The complete experimental setting can be retrieved from https://github.com/unitnsml/choiceperceptron)
The three competitors are: [i] the Bayesian approach of [Viappiani and Boutilier2010] using Monte Carlo methods (the number of particles was set to 50,000, as in [Teso, Passerini, and Viappiani2016]) with greedy query selection based on the Expected Utility of a Selection (a tight approximation of the Expected Value of Information criterion); [ii] Query Iteration, also from [Viappiani and Boutilier2010], a sampling-based query selection method that trades off query informativeness for computational efficiency; [iii] the setwise maximum margin method of [Teso, Passerini, and Viappiani2016], modified to accept setwise choice feedback; support for user indifference was also disabled. (Footnote 4: These changes have no impact on the performance of the method, and provide a generous boost to its runtime, due to the fewer pairwise comparisons collected at each iteration.) We indicate the competitors as vbeus, vbqi and SetMargin, respectively. As argued in the previous section, for cp we set to in all experiments, in order to allow more exploration earlier on during the search. In practice we also employ an adaptive Perceptron step size, which is selected at each iteration from the set via cross-validation on the collected feedback; it was found to work well empirically. SetMargin includes a similar tuning procedure.
Our experimental setup is modelled after [Teso, Passerini, and Viappiani2016]. We consider two different kinds of users: “uniform” and “normal” users, whose true preference vectors are drawn, respectively, from a uniform and a normal distribution. Twenty users are sampled at random and kept fixed for each experiment. User responses are simulated with a Plackett-Luce model [Plackett1975, Luce1959]:

We set as in [Teso, Passerini, and Viappiani2016]. In the first two experiments (which are contextless) we report the median over users of the instantaneous regret, as in [Viappiani and Boutilier2010] and [Teso, Passerini, and Viappiani2016]; whereas, in the third experiment (with context) we report the median average regret. In all experiments we also report cumulative runtime and standard deviations.
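A simulated user of this kind can be sketched as follows. The code assumes the usual Plackett-Luce choice rule, in which each option is picked with probability proportional to the exponential of its scaled true utility; the scaling parameter `lam`, its default value, and the function names are assumptions, since the value used in the experiments is not reproduced here.

```python
import math
import random
from typing import List, Optional, Sequence


def plackett_luce_probs(utilities: Sequence[float], lam: float = 1.0) -> List[float]:
    """Choice probabilities proportional to exp(lam * utility)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [wgt / total for wgt in weights]


def simulate_choice(utilities: Sequence[float], lam: float = 1.0,
                    rng: Optional[random.Random] = None) -> int:
    """Sample the index of the option picked by the simulated user."""
    rng = rng or random.Random()
    probs = plackett_luce_probs(utilities, lam)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


true_utils = [0.2, 1.0, 0.5]
probs = plackett_luce_probs(true_utils)
pick = simulate_choice(true_utils, rng=random.Random(0))
```

Larger values of `lam` make the simulated user closer to a noiseless maximizer, while `lam = 0` yields uniformly random choices.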
Figure caption: In both cases, experiments using uniformly distributed and normally distributed users are shown in the left and right plots, respectively. Best viewed in color.
Synthetic experiment.
We evaluated all methods on the synthetic constructive benchmark introduced in [Teso, Passerini, and Viappiani2016]. The space of feasible configurations is the Cartesian product of attributes, each taking values in , i.e. . The features are the one-hot encoding of the attributes, for a total of features. Here we focus on the case ( features, products) which is large enough to be non-trivial, and sufficiently small to be solvable by the two Bayesian competitors. For cp and SetMargin the product space is encoded natively via MILP constraints; the Bayesian methods required it to be enumerated. The users were sampled as in [Teso, Passerini, and Viappiani2016], i.e. from a uniform distribution in the range and a normal distribution with mean and standard deviation . All methods were run until either the user was satisfied (i.e. the regret reported by the method reached zero) or 25 iterations elapsed. We evaluated the importance of the query set size by running cp and SetMargin with . vbeus and vbqi were only run with , due to scalability issues. In the case (Figure 1, left), cp performs better than both vbqi and SetMargin, and worse than vbqi. The runtimes, however, vary wildly. The Bayesian competitors are much more computationally expensive than cp and SetMargin, confirming the results of [Teso, Passerini, and Viappiani2016]; the two MILP methods instead avoid the explicit enumeration of the candidate configurations, with noticeable computational savings. Notably, cp is faster than SetMargin, while performing comparably or better. The gap widens with set size (Figure 1, right; is similar, not shown). Here cp and SetMargin converge after a similar number of iterations, but with very different runtimes. The bottleneck of SetMargin is the hyperparameter tuning procedure; disabling it, however, severely degrades the performance, so we left it on.
PC configuration.
In the second experiment, we compared cp and SetMargin on a much larger recommendation task, also from [Teso, Passerini, and Viappiani2016]. The goal is to suggest a fully customized PC configuration to a customer. A computer is defined by seven categorical attributes (manufacturer, CPU model, etc.) and a numerical one (the price, determined by the choice of components). The features include the one-hot encodings of the attributes and the price. The relations between parts (e.g. what CPUs are sold by which manufacturers) are expressed as Horn constraints. The feasible space includes thousands of configurations, ruling the Bayesian competitors out [Teso, Passerini, and Viappiani2016]. The users were sampled as in the previous experiment. To help keep running times low, the query selection procedure of cp is executed with a 20-second time cutoff. No time cutoff is applied to SetMargin.
The results for and can be seen in Figure 2 (left). On uniform users, cp consistently outperforms SetMargin for both choices of , despite the timeout. Notably, cp with (less informative queries) works as well as SetMargin with (more informed queries) in this setting. For normal users the situation is similar: with , SetMargin catches up with cp after about 80 iterations, but at considerably larger computational cost. Surprisingly, SetMargin behaves worse for than for ; cp instead improves monotonically, for a modest increase in computational effort. In all cases, the runtimes are very favorable to our method, also thanks to the timeout, which however does not compromise performance.
Trip planning.
Finally, we evaluated cp on a slightly modified version of the touristic trip planning task introduced in [Teso, Dragone, and Passerini2017]. Here the recommender must suggest a trip route between 10 cities, each annotated with an offering of 15 activities (resorts, services, etc.). The trip includes the path itself (which is allowed to contain cycles) and the time spent at each city. Differently from [Teso, Dragone, and Passerini2017], at each iteration the user issues a context indicating a subset of cities that the trip must visit. The features include the number of days spent at each location, the number of times an activity is available at the visited locations, the cost of the trip, etc., for a total of 127 features; see [Teso, Dragone, and Passerini2017] for the details. Note that this problem cannot be encoded in SetMargin, i.e. with Boolean and dependent numerical attributes, without incurring significant encoding overhead: the resulting SetMargin query selection problem would include approximately 300 Boolean variables (an almost 300% blowup in problem size). According to our tests, problems of this size are not solvable in real time in practice, compromising the reactiveness of SetMargin.
Differently from the previous two settings, here users were sampled from a standard normal distribution (as in [Teso, Dragone, and Passerini2017]) and from a uniform distribution in the range . Since the feature vector is not one-hot encoded, negative weights are useful to capture user dislikes. The contexts are uniformly sampled from the combinations of 2 or 3 cities. As in the previous experiment, we employ a time cutoff of 20 seconds. We run this experiment with to show how different set sizes affect the performance of the system. Since this experiment is context-based, we let the algorithm run for exactly 100 iterations. Figure 2 (right) reports the median average regret and the median cumulative running time.
The plots show that in both cases there is a significant decrease in average regret with over , in exchange for increased running time; performs better than for about 40 iterations, but then worsens considerably. This is probably due to the timeout, which in this more complicated setting may substantially hinder the MILP solver. Increasing the cutoff to 60 seconds however did not improve the results (data not shown). This indicates that larger values of may be too costly to compute without further approximations, as is also the case for the other competitors.
Choosing the query set size
While our theoretical analysis is agnostic on the number of objects in a query set, in our empirical analysis we collected some insight on how to choose on the basis of the difficulty of the underlying optimization problem. While in general a larger is more informative, it is not always possible to solve the query selection problem to optimality. This may severely hinder the learning capabilities of the algorithm, as in the case of the trip planning setting with . On the other hand, for smaller problems a larger may significantly reduce the number of iterations needed to reach an optimal solution, as for the PC configuration setting. There is, therefore, a tradeoff that depends on the computational complexity of the query selection problem of the application at hand. From our experiments, we can infer, as a rule of thumb, that it is usually better to choose larger () when objects are small and the selection problem easier to solve, whereas a smaller () is preferable when the objects are large and difficult to select. Additionally, the larger the objects, the harder it is for the user to choose the best in the set, so a smaller is also desirable to reduce the cognitive load on the user.
Conclusion
We presented the Choice Perceptron, an algorithm for preference elicitation from noisy choice feedback. Contrary to existing recommenders, cp can solve constructive elicitation problems over arbitrary combinatorial spaces, composed of many Boolean, integer and continuous variables and constraints. Our theoretical analysis shows that, under a very general assumption, the average regret suffered by cp is upper bounded by $\mathcal{O}(1/\sqrt{T})$ after $T$ iterations. The exact constants appearing in the bound depend on intuitive properties of the query selection strategy at hand. We further described a strategy that aims at controlling these constants. We applied cp to constructive preference elicitation tasks for progressively more complex combinatorial structures. Not only is cp the only method expressive enough to deal with all of these problems, but it also outperforms the alternatives in terms of recommendation quality and runtime.
In the future, we plan to research more informed query selection strategies, e.g. by leveraging estimates of during query selection. Other possible directions include exploring different update rules. As mentioned, this algorithm and the analysis could also be extended to perform exponentiated updates or handle generic convex loss functions [Shivaswamy and Joachims2015]. Finally, a deeper investigation of the optimal size of the query set and its possible adaptation during the interaction process could be useful to find an appropriate tradeoff between informativeness and complexity.
References
 [Bollen et al.2010] Bollen, D.; Knijnenburg, B. P.; Willemsen, M. C.; and Graus, M. 2010. Understanding choice overload in recommender systems. In RecSys’10, 63–70. ACM.
 [Bradley and Terry1952] Bradley, R. A., and Terry, M. E. 1952. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39(3/4):324–345.

 [Collins2002] Collins, M. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In ACL’02, volume 10, 1–8.
 [Domshlak et al.2011] Domshlak, C.; Hüllermeier, E.; Kaci, S.; and Prade, H. 2011. Preferences in AI: An overview. Artificial Intelligence 175(7-8):1037–1052.
 [Dragone et al.2016] Dragone, P.; Erculiani, L.; Chietera, M. T.; Teso, S.; and Passerini, A. 2016. Constructive layout synthesis via coactive learning. In Constructive Machine Learning workshop, NIPS.
 [Louviere, Hensher, and Swait2000] Louviere, J. J.; Hensher, D. A.; and Swait, J. D. 2000. Stated choice methods: analysis and applications. Cambridge University Press.
 [Luce1959] Luce, R. D. 1959. Individual choice behavior: A theoretical analysis.
 [Mcfadden2001] McFadden, D. 2001. Economic choices. American Economic Review 91:351–378.
 [Nethercote et al.2007] Nethercote, N.; Stuckey, P. J.; Becket, R.; Brand, S.; Duck, G. J.; and Tack, G. 2007. MiniZinc: Towards a standard CP modelling language. In CP, 529–543.
 [Pigozzi, Tsoukiàs, and Viappiani2016] Pigozzi, G.; Tsoukiàs, A.; and Viappiani, P. 2016. Preferences in artificial intelligence. Ann. Math. Artif. Intell. 77(3-4):361–401.
 [Plackett1975] Plackett, R. L. 1975. The analysis of permutations. Applied Statistics 193–202.
 [Pu and Chen2009] Pu, P., and Chen, L. 2009. User-involved preference elicitation for product search and recommender systems. AI Magazine 29(4):93.
 [Raman et al.2013] Raman, K.; Joachims, T.; Shivaswamy, P.; and Schnabel, T. 2013. Stable coactive learning via perturbation. In ICML (3), 837–845.
 [Shivaswamy and Joachims2012] Shivaswamy, P., and Joachims, T. 2012. Online structured prediction via coactive learning. In ICML, 1431–1438.
 [Shivaswamy and Joachims2015] Shivaswamy, P., and Joachims, T. 2015. Coactive Learning. JAIR 53:1–40.
 [Teso, Dragone, and Passerini2017] Teso, S.; Dragone, P.; and Passerini, A. 2017. Coactive critiquing: Elicitation of preferences and features. In AAAI.
 [Teso, Passerini, and Viappiani2016] Teso, S.; Passerini, A.; and Viappiani, P. 2016. Constructive preference elicitation by setwise max-margin learning. In IJCAI, 2067–2073.
 [Toubia, Hauser, and Simester2004] Toubia, O.; Hauser, J. R.; and Simester, D. I. 2004. Polyhedral methods for adaptive choice-based conjoint analysis. Journal of Marketing Research 41(1):116–131.
 [Viappiani and Boutilier2009] Viappiani, P., and Boutilier, C. 2009. Regret-based optimal recommendation sets in conversational recommender systems. In RecSys, 101–108. ACM.
 [Viappiani and Boutilier2010] Viappiani, P., and Boutilier, C. 2010. Optimal Bayesian recommendation sets and myopically optimal choice query sets. In NIPS, 2352–2360.
 [Viappiani and Boutilier2011] Viappiani, P., and Boutilier, C. 2011. Recommendation sets and choice queries: there is no exploration/exploitation tradeoff! In AAAI.
 [Viappiani and Kroer2013] Viappiani, P., and Kroer, C. 2013. Robust optimization of recommendation sets with the maximin utility criterion. In ADT’13, 411–424. Springer.