Introduction
In constructive preference elicitation (CPE) the recommender aims at suggesting a custom or novel product to a customer [Teso, Passerini, and Viappiani 2016]. The product is assembled on-the-fly from components or synthesized anew by solving a combinatorial optimization problem. The suggested products should of course satisfy the customer’s preferences, which however are unobserved and must be learned interactively [Pigozzi, Tsoukiàs, and Viappiani 2016]. Learning proceeds iteratively: the learner presents one or more candidate recommendations to the customer, and employs the obtained feedback to estimate the customer’s preferences. Applications include recommending custom PCs or cars, suggesting touristic travel plans, designing room and building layouts, and producing recipe modifications, among others.
A major weakness of existing CPE methods [Teso, Passerini, and Viappiani 2016; Teso, Dragone, and Passerini 2017] is that they require the user to provide feedback on complete configurations. In real-world constructive problems such as trip planning and layout design, configurations can be large and complex. When asked to evaluate or manipulate a complex product, the user may become overwhelmed and confused, which compromises the reliability of the obtained feedback [Mayer and Moreno 2003]. Human decision makers can revert to a potentially uninformative prior when problem solving exceeds their available resources. This effect was observed in users tasked with solving simple SAT instances (three variables and eight clauses) [Ortega and Stocker 2016]. In comparison, even simple constructive problems can involve tens of categorical variables and features, in addition to hard feasibility constraints. Working with complete configurations also poses scalability problems on the computational side. Constructive recommenders employ constraint optimization techniques to select recommendations and queries, and optimizing complete configurations can become computationally impractical as the problem size increases.
Here we propose to exploit factorized utility functions [Braziunas and Boutilier 2009], which occur very naturally in constructive problems, to work with partial configurations. In particular, we show how to generalize Coactive Learning (CL) [Shivaswamy and Joachims 2015] to partwise inference and learning. CL is a simple, theoretically grounded algorithm for online learning and preference elicitation. It employs a very natural interaction protocol: at each iteration the user is presented with a single, appropriately chosen candidate configuration and asked to improve it (even slightly). In [Teso, Dragone, and Passerini 2017], it was shown that CL can be lifted to constructive problems by combining it with a constraint optimization solver to efficiently select the candidate recommendation. Notably, the theoretical guarantees of CL remain intact in the constructive case.
Our partwise generalization of CL, dubbed pcl, solves the two aforementioned problems in one go: (i) by presenting the user with partial configurations, it is possible to (substantially) lessen her cognitive load, improving the reliability of the feedback and enabling learning in larger constructive tasks; (ii) in combinatorial constructive problems, performing inference on partial configurations can be exponentially faster than on complete ones. Further, despite being limited to working with partial configurations, pcl can be shown to still provide local optimality guarantees in theory, and to perform well in practice.
This paper is structured as follows. In the next section we overview the relevant literature. We present pcl in the Method section, followed by a theoretical analysis. The performance of pcl is then illustrated empirically on one synthetic and two realistic constructive problems. We close the paper with some concluding remarks.
Related Work
Generalized additive independent (GAI) utilities have been thoroughly explored in the decision-making literature [Fishburn 1967]. They define a clear factorization mechanism, and offer a good trade-off between expressiveness and ease of elicitation [Chajewska, Koller, and Parr 2000; Gonzales and Perny 2004; Braziunas and Boutilier 2009]. Most of the early work on GAI utility elicitation is based on graphical models, e.g. UCP and GAI networks [Gonzales and Perny 2004; Boutilier, Bacchus, and Brafman 2001]. These approaches aim at eliciting the full utility function and rely on the comparison of full outcomes. Both requirements are infeasible when the utility involves many attributes and features, as in realistic constructive problems.
Like our method, more recent alternatives [Braziunas and Boutilier 2005; Braziunas and Boutilier 2007] handle both partial elicitation, i.e. the ability to provide recommendations without full utility information, and local queries, i.e. the elicitation of preference information by comparing only (mostly) partial outcomes. There exist both Bayesian [Braziunas and Boutilier 2005] and regret-based [Braziunas and Boutilier 2007; Boutilier et al. 2006] approaches, each with its own shortcomings. Bayesian methods do not scale even to small constructive problems [Teso, Passerini, and Viappiani 2016], such as those occurring when reasoning over individual parts in constructive settings. On the other hand, regret-based methods require the user feedback to be strictly self-consistent, an unrealistic assumption when interacting with non-experts. Our approach instead is specifically designed to scale to larger constructive problems and, being derived from Coactive Learning, natively handles inconsistent feedback. Crucially, unlike pcl, these local elicitation methods also require performing a number of queries over complete configurations to calibrate the learned utility function. In larger constructive domains this is both impractical (on the user side) and computationally infeasible (on the learner side).
Our work is based on Coactive Learning (CL) [Shivaswamy and Joachims 2015], a framework for learning utility functions over structured domains, which has been successfully applied to CPE [Teso, Dragone, and Passerini 2017; Dragone et al. 2016]. When applied to constructive problems, a crucial limitation of CL is that the learner and the user interact by exchanging complete configurations. Alas, inferring a full configuration in a constructive problem can be computationally demanding, thus preventing the elicitation procedure from being real-time. This can be partially addressed by performing approximate inference, as in [Raman, Shivaswamy, and Joachims 2012], at the cost of weaker learning guarantees. A different approach has been taken in [Goetschalckx, Fern, and Tadepalli 2014], where the exchanged (complete) configurations are only required to be locally optimal, for improved efficiency. Like pcl, this method guarantees the local optimality of the recommended configuration. All of the previous approaches, however, require the user to improve a potentially large complete configuration. This is a cognitively demanding task which can become prohibitive in large constructive problems, even for domain experts, thus hindering feedback quality and effective elicitation. By dealing with parts only, pcl avoids this issue entirely.
Method
Notation.
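We use rather standard notation: scalars $x$ are written in italics and column vectors $\mathbf{x}$ in bold. The inner product of two vectors $\mathbf{a}$ and $\mathbf{b}$ is indicated as $\langle \mathbf{a}, \mathbf{b} \rangle = \sum_i a_i b_i$, the Euclidean norm as $\|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle}$ and the max-norm as $\|\mathbf{a}\|_\infty = \max_i |a_i|$. For an index set $I$, $\mathbf{a}_I$ denotes the subvector of $\mathbf{a}$ restricted to the indices in $I$. We abbreviate $\{1, \ldots, n\}$ to $[n]$, and indicate the complement of $I \subseteq [n]$ as $-I = [n] \setminus I$.
Setting.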
Let $\mathcal{X}$ be the set of candidate structured products. In standard preference elicitation this set is explicitly enumerated. In the constructive case, by contrast, $\mathcal{X}$ is defined by a set of hard constraints. (In this paper, “hard” constraints refer to the constraints delimiting the space of feasible configurations, as opposed to “soft” constraints, which determine preferences over feasible configurations [Meseguer, Rossi, and Schiex 2006].) Products are represented by a function $\boldsymbol{\phi}: \mathcal{X} \to \mathbb{R}^m$ that maps them to an $m$-dimensional feature space. While the feature map can be arbitrary, in practice we stick to features that can be encoded as constraints in a mixed-integer linear programming problem, for efficiency; see the Empirical Analysis section for details. We only assume that the features are bounded, i.e. that $\|\boldsymbol{\phi}(\mathbf{x})\|_\infty \le D$ for all $\mathbf{x} \in \mathcal{X}$ and some fixed $D$.
As is common in multi-attribute decision theory [Keeney and Raiffa 1976], we assume the desirability of a product to be given by a utility function that is linear in the features, i.e., $u^*(\mathbf{x}) = \langle \mathbf{w}^*, \boldsymbol{\phi}(\mathbf{x}) \rangle$. Here the weights $\mathbf{w}^* \in \mathbb{R}^m$ encode the true user preferences. Each weight may be positive, negative, or zero, where zero means that the corresponding feature is irrelevant for the user. Utilities of this kind can naturally express rich constructive problems [Teso, Passerini, and Viappiani 2016; Teso, Dragone, and Passerini 2017].
Parts.
Here we formalize what parts and partial configurations are, and how they can be manipulated. We assume we are given a set of basic parts $\mathcal{B}$. A part $p \subseteq \mathcal{B}$ is any subset of the set of basic parts. Given a part $p$ and an object $\mathbf{x} \in \mathcal{X}$, $\mathbf{x}_p$ indicates the partial configuration corresponding to $p$. We require that the union of the basic parts reconstructs the whole object, i.e. $\mathbf{x}_{\mathcal{B}} = \mathbf{x}$ for all $\mathbf{x} \in \mathcal{X}$. The proper semantics of the decomposition into basic parts is task-specific. For instance, in a scheduling problem a month may be decomposed into days, while in interior design a house may be decomposed into rooms. Analogously, the non-basic parts could then be weeks or floors, respectively. In general, any combination of basic parts is allowed. We capture the notion of combination of partial configurations with the part combination operator $\circ$, so that $\mathbf{x}_p \circ \mathbf{x}_q = \mathbf{x}_{p \cup q}$. We denote the complement of part $p$ as $-p = \mathcal{B} \setminus p$, which satisfies $\mathbf{x}_p \circ \mathbf{x}_{-p} = \mathbf{x}$ for all $\mathbf{x} \in \mathcal{X}$.
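Each basic part $p \in \mathcal{B}$ is associated with a feature subset $I_p \subseteq [m]$, which contains all those features that depend on $p$ (and only those). In general, the sets $I_p$ may overlap. We do, however, require each basic part $p$ to be associated with some features that do not depend on any other basic part $q \neq p$, i.e. that $I_p \setminus \bigcup_{q \in \mathcal{B}, q \neq p} I_q \neq \emptyset$ for all $p \in \mathcal{B}$. The features associated with a non-basic part $p$ are defined as $I_p = \bigcup_{b \in p} I_b$. Since the union of the basic parts makes up the full object, we also have that $I_{\mathcal{B}} = [m]$.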
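The following minimal Python sketch makes these requirements concrete on a hypothetical three-day instance. The part names `day1`, `day2`, `day3` and the feature indices are illustrative assumptions, not taken from the experiments. Each basic part maps to its feature subset, and the subset of a non-basic part is the union over its basic parts. The sketch also checks that every basic part owns at least one private feature.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical instance: features 0, 2, 4 are local to a single day, while
# features 1 and 3 relate consecutive days (e.g. activity diversity).
BASIC_FEATURES: Dict[str, Set[int]] = {
    "day1": {0, 1},
    "day2": {1, 2, 3},
    "day3": {3, 4},
}
M = 5  # total number of features


def part_features(part: FrozenSet[str]) -> Set[int]:
    """I_p of a (possibly non-basic) part: union of its basic parts' subsets."""
    feats: Set[int] = set()
    for b in part:
        feats |= BASIC_FEATURES[b]
    return feats


def has_private_features(basic: Dict[str, Set[int]]) -> bool:
    """True iff every basic part owns a feature that no other basic part touches."""
    for b, feats in basic.items():
        others = set().union(*(f for c, f in basic.items() if c != b))
        if not feats - others:
            return False
    return True
```

The union over all basic parts recovers every feature index, mirroring $I_{\mathcal{B}} = [m]$.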
GAI utility decomposition.
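In the previous section we introduced a decomposition of configurations into parts. To elicit the user preferences via partwise interaction, which is our ultimate goal, we need to decompose the utility function as well. Given a linear utility $u(\mathbf{x}) = \langle \mathbf{w}, \boldsymbol{\phi}(\mathbf{x}) \rangle$, a part $p$ and its feature subset $I_p$, let the partial utility of $p$ be:
$$u_p(\mathbf{x}) = \langle \mathbf{w}_{I_p}, \boldsymbol{\phi}_{I_p}(\mathbf{x}) \rangle$$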
If the basic parts have no shared features, the utility function is additive: it is easy to verify that $u(\mathbf{x}) = \sum_{p \in \mathcal{B}} u_p(\mathbf{x})$. In this case, each part can be managed independently of the others, and the overall configuration maximizing the utility can be obtained by separately maximizing each partial utility and combining the resulting partwise configurations.
However, in many applications of interest the feature subsets do overlap. In a travel plan, for instance, one can be interested in alternating cultural and leisure activities on consecutive days, in order to make the experience more diverse and enjoyable. In this case, the above decomposition no longer applies, as the basic parts may depend on each other through the shared features. Nonetheless, it can be shown that our utility function is generalized additive independent (GAI) over the feature subsets of the basic parts. Formally, a utility $u$ is GAI if and only if, given feature subsets $I_1, \ldots, I_k \subseteq [m]$, it can be decomposed into $k$ independent subutilities [Braziunas and Boutilier 2005]:
$$u(\mathbf{x}) = \sum_{i=1}^{k} f_i(\boldsymbol{\phi}_{I_i}(\mathbf{x}))$$
where each subutility $f_i$ can only depend on the features in $I_i$ (but does not need to depend on all of them). This decomposition enables applying ideas from the GAI literature to produce a well-defined partwise elicitation protocol. Intuitively, we will assign features to subutilities so that whenever a feature is shared by multiple parts, only the subutility corresponding to one of them will depend on that feature.
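We will now construct a suitable decomposition of $u$ into independent subutilities. Fix some order $(p_1, \ldots, p_n)$ of the basic parts, and let:
$$J_{p_i} = I_{p_i} \setminus \bigcup_{j=i+1}^{n} I_{p_j}$$
for all $i \in [n]$. We define the subutilities as $u_{J_{p_i}}(\mathbf{x}) = \langle \mathbf{w}_{J_{p_i}}, \boldsymbol{\phi}_{J_{p_i}}(\mathbf{x}) \rangle$ for all $i \in [n]$. The sets $J_{p_1}, \ldots, J_{p_n}$ partition $[m]$. Summing up the subutilities for all parts therefore yields a utility where each feature is computed exactly once, which recovers the full utility $u$:
$$\sum_{i=1}^{n} u_{J_{p_i}}(\mathbf{x}) = \sum_{i=1}^{n} \langle \mathbf{w}_{J_{p_i}}, \boldsymbol{\phi}_{J_{p_i}}(\mathbf{x}) \rangle = \langle \mathbf{w}, \boldsymbol{\phi}(\mathbf{x}) \rangle = u(\mathbf{x})$$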
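The construction of the subsets $J_{p_i}$ fits in a few lines of Python. This is a minimal sketch under stated assumptions: the toy feature subsets and weights are hypothetical, and `gai_subsets` and `subutility` are illustrative helper names, not the paper's code. Each shared feature goes to the last part in the chosen order that depends on it. The sketch then checks that the subutilities add up to the full linear utility.

```python
from typing import Dict, Sequence, Set


def gai_subsets(order: Sequence[str], basic: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """J_{p_i} = I_{p_i} minus the features of every part ordered after p_i."""
    subsets: Dict[str, Set[int]] = {}
    for i, p in enumerate(order):
        later = set().union(*(basic[q] for q in order[i + 1:]))
        subsets[p] = basic[p] - later
    return subsets


def subutility(w: Sequence[float], phi: Sequence[float], idx: Set[int]) -> float:
    """Linear subutility restricted to the feature indices in idx."""
    return sum(w[j] * phi[j] for j in idx)


# Hypothetical toy instance: features 1 and 3 are shared by consecutive days.
BASIC = {"day1": {0, 1}, "day2": {1, 2, 3}, "day3": {3, 4}}
J = gai_subsets(["day1", "day2", "day3"], BASIC)  # {0}, {1, 2}, {3, 4}
w = [1.0, -2.0, 0.5, 3.0, -1.0]
phi = [1.0, 1.0, 0.0, 1.0, 1.0]
total = sum(subutility(w, phi, J[p]) for p in J)
full = sum(wj * fj for wj, fj in zip(w, phi))
```

Here `total` equals `full`, because every feature index lands in exactly one $J_{p_i}$.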
The GAI decomposition allows eliciting each subutility separately. By doing so, however, we end up ignoring some of the dependencies between parts, namely the features in $I_p \setminus J_p$. This is the price to pay for decomposition and partwise elicitation, and it may lead to suboptimal solutions if too many dependencies are ignored. It is therefore important to minimize the broken dependencies through an appropriate ordering of the parts. Going back to the travel planning example with diversifying features, consider a multi-day trip. Here the parts $p_i$ may refer to individual days, and $I_{p_i}$ includes all features of day $i$. This includes the features relating it to the other days, e.g. the alternation of cultural and leisure activities. Note that the $I_{p_i}$'s overlap. On the other hand, the $J_{p_i}$'s are subsets of features chosen so that every feature only appears once. A diversifying feature relating days 3 and 4 of the trip is assigned either to $J_{p_3}$ or to $J_{p_4}$, but not both.
One way to control the ignored dependencies is by leveraging GAI networks [Gonzales and Perny 2004]. A GAI network is a graph whose nodes represent the subsets $I_p$ and whose edges connect nodes sharing at least one feature. Algorithm 1 presents a simple and effective solution to provide an ordering. It builds a GAI network from $\{I_p\}_{p \in \mathcal{B}}$ and sorts the basic parts in ascending order of node degree (number of incoming and outgoing edges). By ordering last the subsets having intersections with many other parts, this ordering attempts to minimize the lost dependencies in the decomposition above. This is one possible way to order the parts, which we use as an example; more informed or task-specific approaches could be devised.
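A possible reading of Algorithm 1 in Python is sketched below. It is an illustration under assumptions, not the authors' implementation. Ties between equal degrees are broken by part name, and the toy subsets are hypothetical. With the resulting order, the most connected part comes last and keeps all the features it shares.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def gai_order(basic: Dict[str, Set[int]]) -> List[str]:
    """Sort basic parts by ascending degree in the GAI network (two parts are
    adjacent iff their feature subsets intersect). Ties are broken by name,
    an arbitrary choice made here only for determinism."""
    degree = {p: 0 for p in basic}
    for p, q in combinations(basic, 2):
        if basic[p] & basic[q]:
            degree[p] += 1
            degree[q] += 1
    return sorted(basic, key=lambda p: (degree[p], p))


def gai_subsets(order: Sequence[str], basic: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """J_{p_i}: drop from I_{p_i} every feature shared with a later part."""
    return {p: basic[p] - set().union(*(basic[q] for q in order[i + 1:]))
            for i, p in enumerate(order)}


BASIC = {"day1": {0, 1}, "day2": {1, 2, 3}, "day3": {3, 4}}
order = gai_order(BASIC)       # ['day1', 'day3', 'day2']
J = gai_subsets(order, BASIC)  # day2, ordered last, keeps {1, 2, 3}
```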
The pcl algorithm.
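The pseudocode of our algorithm, pcl, is listed in Algorithm 2. pcl starts off by sorting the basic parts, producing an ordering $(p_1, \ldots, p_n)$. Algorithm 1 or any other (e.g. task-specific) sorting procedure can be employed. Then it loops for $T$ iterations, maintaining an estimate $\mathbf{w}^t$ of the user weights as well as a complete configuration $\mathbf{x}^t$. The starting configuration $\mathbf{x}^0$ should be a reasonable initial guess, depending on the task. At each iteration $t$, the algorithm selects a part $p^t$ using the procedure SelectPart (see below). Then it updates the object by inferring a new partial configuration $\mathbf{x}^t_{p^t}$ while keeping the rest of the object fixed. That is, $\mathbf{x}^t_{-p^t} = \mathbf{x}^{t-1}_{-p^t}$ and
$$\mathbf{x}^t_{p^t} = \operatorname{argmax}_{\mathbf{x}_{p^t}} \; \langle \mathbf{w}^t_{J_{p^t}}, \boldsymbol{\phi}_{J_{p^t}}(\mathbf{x}_{p^t} \circ \mathbf{x}^{t-1}_{-p^t}) \rangle,$$
where the maximization ranges over feasible completions. The inferred partial configuration is thus optimal with respect to the local subutility given $\mathbf{x}^{t-1}_{-p^t}$. Note that inference is over the partial configuration only, and therefore can be exponentially faster than inference over full configurations.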
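The partial inference step can be illustrated with a sketch that re-optimizes a single day of a hypothetical three-day plan while keeping the other days fixed. Brute-force enumeration over a tiny domain stands in for the constraint solver used in the paper. The feature map is also an assumption made only for illustration: one "activity 1" indicator per day, plus one diversity indicator per pair of consecutive days. `infer_part` is a hypothetical helper name.

```python
from typing import List, Sequence, Set, Tuple

DOMAIN = (0, 1, 2)  # hypothetical activity codes for one day
N = 3               # number of days (basic parts)


def phi(x: Sequence[int]) -> List[float]:
    """Features 0..N-1: 'day i hosts activity 1'; N..2N-2: 'days i, i+1 differ'."""
    local = [float(a == 1) for a in x]
    diverse = [float(x[i] != x[i + 1]) for i in range(N - 1)]
    return local + diverse


def J(i: int) -> Set[int]:
    """GAI subset of day i under the natural order: each diversity feature
    is assigned to the later of the two days it relates."""
    return {i} | ({N + i - 1} if i > 0 else set())


def infer_part(w: Sequence[float], x: Tuple[int, ...], i: int) -> Tuple[int, ...]:
    """Re-optimize day i w.r.t. the subutility on J(i), keeping the rest of x fixed."""
    candidates = [x[:i] + (a,) + x[i + 1:] for a in DOMAIN]
    return max(candidates, key=lambda c: sum(w[j] * phi(c)[j] for j in J(i)))
```

For example, `infer_part([0.0, -1.0, 0.0, 2.0, 5.0], (0, 0, 0), 1)` picks activity 2 for the middle day. The weight on feature 4 (the diversity of days 2 and 3) plays no role here, because that feature is assigned to the last day.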
Next, the algorithm presents the inferred partial configuration $\mathbf{x}^t_{p^t}$ together with some contextual information (see below). The user is asked to produce an improved partial configuration $\bar{\mathbf{x}}^t_{p^t}$ according to her own preferences, while the rest of the object is kept fixed. We assume that a user is satisfied with a partial configuration if she cannot improve it further. Equivalently, the object is conditionally optimal with respect to part $p^t$ given the rest of the object (the formal definition of conditional optimality is given in the Analysis section). When a user is satisfied with a partial configuration, she returns $\bar{\mathbf{x}}^t_{p^t} = \mathbf{x}^t_{p^t}$, thereby implying no change in the weights $\mathbf{w}^t$.
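After receiving an improvement, if the user is not satisfied, the weights are updated through a perceptron step restricted to a subset $S^t$ of the features. Let $\bar{\mathbf{x}}^t = \bar{\mathbf{x}}^t_{p^t} \circ \mathbf{x}^t_{-p^t}$ denote the improved object. The subset of weights that are actually updated depends on whether $\langle \mathbf{w}^t_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle$ is non-positive or strictly positive. Since we perform inference on $J_{p^t}$, we have that $\langle \mathbf{w}^t_{J_{p^t}}, \boldsymbol{\phi}_{J_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{J_{p^t}}(\mathbf{x}^t) \rangle \le 0$. The user improvement can, however, potentially change all the features in $I_{p^t}$. Intuitively, the weights associated with a subset of features should change only if the utility computed on this subset ranks $\bar{\mathbf{x}}^t$ lower than $\mathbf{x}^t$. The algorithm therefore checks whether $\langle \mathbf{w}^t_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle \le 0$. If so, the weights associated with the whole subset $S^t = I_{p^t}$ are updated. Otherwise, the algorithm can only safely update the weights associated with $S^t = J_{p^t}$, which, as said, meet this condition by construction. In both cases the update is $\mathbf{w}^{t+1}_{S^t} = \mathbf{w}^t_{S^t} + \boldsymbol{\phi}_{S^t}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{S^t}(\mathbf{x}^t)$, while the remaining weights are left unchanged.
As for the SelectPart procedure, we experimented with several alternative implementations. These included prioritizing parts with a large feature overlap with other parts, and bandit-based strategies aimed at predicting a surrogate of the utility gain (namely, a variant of the UCB1 algorithm [Auer, Cesa-Bianchi, and Fischer 2002]). Preliminary experiments showed that informed strategies do not yield a significant performance improvement over the random selection strategy, so we stick with the latter in all our experiments.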
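The update rule can be summarized by the following Python sketch, which operates on plain feature vectors. `pcl_update` is a hypothetical helper name, and feature subsets given as Python sets of indices are an assumption for illustration.

```python
from typing import List, Sequence, Set, Tuple


def pcl_update(w: Sequence[float], phi_x: Sequence[float], phi_xbar: Sequence[float],
               I_p: Set[int], J_p: Set[int]) -> Tuple[List[float], Set[int]]:
    """Perceptron step restricted to a feature subset S.

    If the current weights do not rank the improvement above the
    recommendation on I_p, all of I_p is updated; otherwise only J_p is,
    where the same property holds by construction of partial inference."""
    delta = [b - a for a, b in zip(phi_x, phi_xbar)]
    gain_on_I = sum(w[j] * delta[j] for j in I_p)
    S = I_p if gain_on_I <= 0 else J_p
    new_w = list(w)
    for j in S:
        new_w[j] += delta[j]
    return new_w, S
```

When the user is satisfied, the two feature vectors coincide and the weights are returned unchanged.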
The algorithm stops either when the maximum number of iterations is reached or when a “local optimum” has been found. For ease of exposition we left the latter case out of Algorithm 2. What a local optimum is will be explained in the following Analysis section, and the stopping criterion follows directly from Proposition 1.
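To tie the steps together, the following self-contained sketch runs a pcl-style loop on a hypothetical three-day toy plan. It makes several assumptions, and it is an illustration rather than the authors' implementation:
- Brute-force enumeration replaces the solver.
- The weights start at zero.
- SelectPart is uniform at random.
- The user is simulated as the minimally α-informative user described in the Empirical Analysis section.
- The configuration changes only through partial inference.

The loop stops early only after every day has been confirmed `runs` times on an unchanged configuration. This is a conservative stopping rule chosen so that, by Lemma 1, an early stop certifies a local optimum.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical toy problem: N days, each assigned one activity code from
# DOMAIN; every assignment is assumed feasible.
DOMAIN = (0, 1, 2)
N = 3


def phi(x: Sequence[int]) -> List[float]:
    """Features 0..N-1: 'day i hosts activity 1'; N..2N-2: 'days i, i+1 differ'."""
    local = [float(a == 1) for a in x]
    diverse = [float(x[i] != x[i + 1]) for i in range(N - 1)]
    return local + diverse


def I_p(i: int) -> Set[int]:
    """Features depending on day i: its local one plus adjacent diversity ones."""
    feats = {i}
    if i > 0:
        feats.add(N + i - 1)
    if i < N - 1:
        feats.add(N + i)
    return feats


def J_p(i: int) -> Set[int]:
    """GAI subset under the natural order: drop features shared with later days."""
    return I_p(i) - set().union(*(I_p(k) for k in range(i + 1, N)))


def dot(w: Sequence[float], f: Sequence[float], idx: Set[int]) -> float:
    return sum(w[j] * f[j] for j in idx)


def neighbors(x: Tuple[int, ...], i: int) -> List[Tuple[int, ...]]:
    """Configurations that differ from x at most in day i."""
    return [x[:i] + (a,) + x[i + 1:] for a in DOMAIN]


def pcl(w_star: Sequence[float], T: int = 500, alpha: float = 0.5,
        runs: int = 2, seed: int = 0):
    rng = random.Random(seed)
    w = [0.0] * (2 * N - 1)             # uninformed initial estimate
    x: Tuple[int, ...] = (0,) * N       # initial guess
    clean = [0] * N                     # unchanged, unimproved visits per day
    for _ in range(T):
        i = rng.randrange(N)            # SelectPart: uniformly at random
        x_new = max(neighbors(x, i), key=lambda c: dot(w, phi(c), J_p(i)))
        changed, x = x_new != x, x_new
        # Simulated user; the true utility difference only involves I_p(i).
        u_x = dot(w_star, phi(x), I_p(i))
        cr = max(dot(w_star, phi(c), I_p(i)) for c in neighbors(x, i)) - u_x
        if cr <= 1e-12:                 # satisfied: x_bar = x, no update
            if changed:
                clean = [0] * N
            clean[i] += 1
            if min(clean) >= runs:      # every day confirmed on the same x
                return x, w, True
            continue
        clean = [0] * N
        ok = [c for c in neighbors(x, i)
              if dot(w_star, phi(c), I_p(i)) - u_x >= alpha * cr]
        x_bar = min(ok, key=lambda c: dot(w_star, phi(c), I_p(i)))
        delta = [b - a for a, b in zip(phi(x), phi(x_bar))]
        S = I_p(i) if dot(w, delta, I_p(i)) <= 0 else J_p(i)
        for j in S:                     # perceptron step restricted to S
            w[j] += delta[j]
    return x, w, False
```

A call such as `pcl([1.0, -1.0, 2.0, 3.0, -2.0])` returns the final configuration, the learned weights, and whether the early-stopping certificate was reached.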
Interacting through parts.
In order for the user to judge the quality of a suggested partial configuration $\mathbf{x}_p$, some contextual information may have to be provided. If $\mathbf{x}_p$ depends on other parts via shared features, these dependencies have to be communicated to the user somehow. Otherwise, his/her improvement will not be sufficiently informed.
We distinguish two cases, depending on whether the features of $\mathbf{x}_p$ are local or global. Local features only depend on small, localized portions of $\mathbf{x}$. This is for instance the case for features that measure the diversity of consecutive activities in a touristic trip plan, which depend on consecutive time slots or days only. Here the context amounts to those other portions of $\mathbf{x}$ that share local features with $\mathbf{x}_p$. For instance, the user may interact over individual days only. If the features are local, the context is simply the time slots before and after the selected day. The user is free to modify the activities scheduled that day based on the context, which is kept fixed.
On the other hand, global features depend on all of $\mathbf{x}$ (or large chunks of it). For instance, in house furnishing one may have features that measure whether the total cost of the furniture is within a given budget, or by how much the cost exceeds the budget. A naive solution would be to show the user the whole furniture arrangement $\mathbf{x}$, which can be troublesome when $\mathbf{x}$ is large. A better alternative is to present the user with a summary of the global features, in this case the percentage of the budget used. Such a summary would be sufficient for producing an informed improvement, independently of the actual size of $\mathbf{x}$.
Of course, the best choice of context format is application specific. We only note that, while crucial, the context only provides auxiliary information to the user, and does not affect the learning algorithm directly.
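As one possible illustration of the two context formats, the sketch below uses hypothetical helper names (`local_context`, `budget_summary`) and toy data. With local features, only the adjacent days of a plan are shown. With a global budget feature, a one-line summary is shown whose size does not depend on the size of the object.

```python
from typing import Dict, List, Sequence


def local_context(plan: Sequence[List[str]], day: int) -> Dict[str, List[str]]:
    """Local features: show only the days adjacent to the selected one (kept fixed)."""
    context: Dict[str, List[str]] = {}
    if day > 0:
        context["previous day"] = list(plan[day - 1])
    if day + 1 < len(plan):
        context["next day"] = list(plan[day + 1])
    return context


def budget_summary(costs: Sequence[float], budget: float) -> str:
    """Global features: a summary string instead of the whole object."""
    return f"{100.0 * sum(costs) / budget:.0f}% of the budget used"
```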
Analysis
In preference elicitation, it is common to measure the quality of a recommended (full) configuration $\mathbf{x}$ in terms of the regret:
$$\mathrm{REG}(\mathbf{x}) = u^*(\mathbf{x}^*) - u^*(\mathbf{x})$$
where $u^*$ is the true, unobserved user utility and $\mathbf{x}^* = \operatorname{argmax}_{\mathbf{x}' \in \mathcal{X}} u^*(\mathbf{x}')$ is a truly optimal configuration. In pcl, interaction with the user occurs via partial configurations, namely $\mathbf{x}^t_{p^t}$ and $\bar{\mathbf{x}}^t_{p^t}$. Since the regret is defined in terms of complete configurations, it is difficult to analyze it directly based on information about the partial configurations alone. This makes it hard to prove convergence to globally optimal recommendations.
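The aim of this analysis is instead to show that our algorithm converges to a locally optimal configuration. This is in line with the guarantees offered by other Coactive Learning variants [Goetschalckx, Fern, and Tadepalli 2014]; the latter, however, still rely on interaction via complete configurations. Here a configuration $\mathbf{x}$ is a local optimum for $u^*$ if no partwise modification can improve $\mathbf{x}$ with respect to $u^*$. Formally, $\mathbf{x}$ is a local optimum for $u^*$ if and only if:
$$u^*(\mathbf{x}) \ge u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p}) \quad \forall p \in \mathcal{B}, \; \forall \mathbf{x}'_p \text{ such that } \mathbf{x}'_p \circ \mathbf{x}_{-p} \in \mathcal{X}$$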
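To measure the local quality of a configuration $\mathbf{x}$ with respect to a part $p$, we introduce the concept of conditional regret of the partial configuration $\mathbf{x}_p$ given the rest of the object $\mathbf{x}_{-p}$:
$$\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) = u^*(\mathbf{x}^*_p \circ \mathbf{x}_{-p}) - u^*(\mathbf{x}_p \circ \mathbf{x}_{-p})$$
where $\mathbf{x}^*_p = \operatorname{argmax}_{\mathbf{x}'_p : \mathbf{x}'_p \circ \mathbf{x}_{-p} \in \mathcal{X}} u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p})$. Notice that:
$$\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) = \langle \mathbf{w}^*_{I_p}, \boldsymbol{\phi}_{I_p}(\mathbf{x}^*_p \circ \mathbf{x}_{-p}) - \boldsymbol{\phi}_{I_p}(\mathbf{x}_p \circ \mathbf{x}_{-p}) \rangle$$
since $\boldsymbol{\phi}_{-I_p}(\mathbf{x}^*_p \circ \mathbf{x}_{-p}) = \boldsymbol{\phi}_{-I_p}(\mathbf{x}_p \circ \mathbf{x}_{-p})$, i.e. the features outside $I_p$ do not depend on $p$.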
We say that a partial configuration $\mathbf{x}_p$ is conditionally optimal with respect to part $p$ if $\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) = 0$. The following lemma gives necessary and sufficient conditions for the local optimality of a configuration $\mathbf{x}$.
Lemma 1.
A configuration $\mathbf{x}$ is locally optimal with respect to $u^*$ if and only if $\mathbf{x}_p$ is conditionally optimal for $u^*$ with respect to all basic parts $p \in \mathcal{B}$.
Proof.
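By contradiction. (i) Assume that $\mathbf{x}$ is locally optimal but $\mathbf{x}_p$ is not conditionally optimal with respect to some $p \in \mathcal{B}$. Then $\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) > 0$, and thus there exists a partial configuration $\mathbf{x}'_p$ such that $u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p}) > u^*(\mathbf{x})$. This violates the local optimality of $\mathbf{x}$. (ii) Assume that all partial configurations $\mathbf{x}_p$, $p \in \mathcal{B}$, are conditionally optimal but $\mathbf{x}$ is not locally optimal. Then there exist a basic part $p$ and a partial configuration $\mathbf{x}'_p$ such that $u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p}) > u^*(\mathbf{x})$. This in turn means that $\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) > 0$, which violates the conditional optimality of $\mathbf{x}_p$ with respect to $p$. ∎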
The above lemma gives us a partwise measurable criterion to determine whether a configuration $\mathbf{x}$ is a local optimum, through the conditional regret of $\mathbf{x}_p$ for all basic parts $p \in \mathcal{B}$.
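The criterion can be sketched by brute force on a hypothetical three-day toy plan. The domain, the feature map and the weights are illustrative assumptions. The sketch computes the conditional regret of each day and checks local optimality as in Lemma 1. In the toy example, $(2, 1, 2)$ is a local optimum whose utility is lower than that of $(1, 0, 1)$, so local optimality does not imply global optimality.

```python
from typing import List, Sequence, Tuple

DOMAIN = (0, 1, 2)  # hypothetical activity codes
N = 3               # number of days (basic parts)


def phi(x: Sequence[int]) -> List[float]:
    """Features 0..N-1: 'day i hosts activity 1'; N..2N-2: 'days i, i+1 differ'."""
    local = [float(a == 1) for a in x]
    diverse = [float(x[i] != x[i + 1]) for i in range(N - 1)]
    return local + diverse


def utility(w: Sequence[float], x: Sequence[int]) -> float:
    return sum(wj * fj for wj, fj in zip(w, phi(x)))


def conditional_regret(w_star: Sequence[float], x: Tuple[int, ...], i: int) -> float:
    """CR(x_p | x_-p) for p = day i: utility lost by not re-optimizing day i alone."""
    best = max(utility(w_star, x[:i] + (a,) + x[i + 1:]) for a in DOMAIN)
    return best - utility(w_star, x)


def is_local_optimum(w_star: Sequence[float], x: Tuple[int, ...], tol: float = 1e-9) -> bool:
    """Lemma 1: locally optimal iff conditionally optimal w.r.t. every basic part."""
    return all(conditional_regret(w_star, x, i) <= tol for i in range(N))


W = [1.0, 1.0, 1.0, 2.0, 2.0]  # hypothetical true weights
```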
The rest of the analysis is devoted to deriving an upper bound on the conditional regret incurred by the algorithm and to proving that pcl eventually reaches a local optimum.
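In order to derive the bound, we rely on the concept of informativeness from [Shivaswamy and Joachims 2015], adapting it to partwise interaction. (Here we adopt the definition of strict informativeness for simplicity. Our results can be directly extended to the more general notions of informativeness described in [Shivaswamy and Joachims 2015].) A user is conditionally informative if, when presented with a partial configuration $\mathbf{x}_p$, he/she provides a partial configuration $\bar{\mathbf{x}}_p$ that is at least some fraction $\alpha \in (0, 1]$ better than $\mathbf{x}_p$ in terms of conditional regret. More formally:
$$\langle \mathbf{w}^*_{I_p}, \boldsymbol{\phi}_{I_p}(\bar{\mathbf{x}}_p \circ \mathbf{x}_{-p}) - \boldsymbol{\phi}_{I_p}(\mathbf{x}_p \circ \mathbf{x}_{-p}) \rangle \ge \alpha \, \mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p}) \qquad (1)$$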
In the rest of the paper we will use the notation $\mathrm{CR}(\mathbf{x}_p)$ to mean $\mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p})$, i.e. we drop the complement when no ambiguity can arise.
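At all iterations $t$, the algorithm updates the weights specified by $S^t$, producing a new estimate $\mathbf{w}^{t+1}$ of $\mathbf{w}^*$. The actual indices $S^t$ depend on the condition at line 9 of Algorithm 2: at some iterations $S^t$ includes all of $I_{p^t}$, while at others $S^t$ is restricted to $J_{p^t}$. We distinguish between these cases by:
$$\mathcal{I} = \{ t \in [T] : \langle \mathbf{w}^t_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle \le 0 \}, \qquad \mathcal{J} = [T] \setminus \mathcal{I}$$
so that $S^t = I_{p^t}$ if $t \in \mathcal{I}$, and $S^t = J_{p^t}$ if $t \in \mathcal{J}$. For all $t \in [T]$, the quality of $\mathbf{w}^{t+1}$ is:
$$\langle \mathbf{w}^*, \mathbf{w}^{t+1} \rangle = \langle \mathbf{w}^*, \mathbf{w}^t \rangle + \langle \mathbf{w}^*_{S^t}, \boldsymbol{\phi}_{S^t}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{S^t}(\mathbf{x}^t) \rangle$$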
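Therefore, if the second summand in the last equation, the utility gain $\langle \mathbf{w}^*_{S^t}, \boldsymbol{\phi}_{S^t}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{S^t}(\mathbf{x}^t) \rangle$, is positive, the update produces a better weight estimate $\mathbf{w}^{t+1}$.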
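Since the user is conditionally informative, the improvement always satisfies $\langle \mathbf{w}^*_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle > 0$ whenever the user is not satisfied. When $t \in \mathcal{I}$, we have $S^t = I_{p^t}$, and thus the utility gain is guaranteed to be positive. On the other hand, when $t \in \mathcal{J}$ we have $S^t = J_{p^t}$, and the utility gain reduces to $\langle \mathbf{w}^*_{J_{p^t}}, \boldsymbol{\phi}_{J_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{J_{p^t}}(\mathbf{x}^t) \rangle$. In this case the update ignores the weights in $I_{p^t} \setminus J_{p^t}$, “missing out” a factor of $\langle \mathbf{w}^*_{I_{p^t} \setminus J_{p^t}}, \boldsymbol{\phi}_{I_{p^t} \setminus J_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t} \setminus J_{p^t}}(\mathbf{x}^t) \rangle$.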
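We compactly quantify the missing utility gain as:
$$\zeta^t = \begin{cases} \langle \mathbf{w}^*_{I_{p^t} \setminus J_{p^t}}, \boldsymbol{\phi}_{I_{p^t} \setminus J_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t} \setminus J_{p^t}}(\mathbf{x}^t) \rangle & \text{if } t \in \mathcal{J} \\ 0 & \text{if } t \in \mathcal{I} \end{cases}$$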
Note that $\zeta^t$ can be positive, null or negative for $t \in \mathcal{J}$. When $\zeta^t$ is negative, making the update on $J_{p^t}$ only actually avoids a loss in utility gain.
We now prove that pcl minimizes the average conditional regret as $T \to \infty$ for conditionally informative users.
Theorem 1.
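For a conditionally informative user, the average conditional regret of pcl after $T$ iterations is upper bounded by:
$$\frac{1}{T} \sum_{t=1}^{T} \mathrm{CR}(\mathbf{x}^t_{p^t}) \le \frac{2 D \sqrt{m} \, \|\mathbf{w}^*\|}{\alpha \sqrt{T}} + \frac{1}{\alpha T} \sum_{t \in \mathcal{J}} \zeta^t$$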
Proof.
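The following is a sketch; the complete proof can be found in the Supplementary Material. We start by splitting the iterations into the $\mathcal{I}$ and $\mathcal{J}$ sets defined above, and bound the norm $\|\mathbf{w}^{T+1}\|^2$. In both cases we find that $\langle \mathbf{w}^t_{S^t}, \boldsymbol{\phi}_{S^t}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{S^t}(\mathbf{x}^t) \rangle \le 0$. Starting from $\mathbf{w}^1 = \mathbf{0}$ and using $\|\boldsymbol{\phi}(\mathbf{x})\|_\infty \le D$, this gives $\|\mathbf{w}^{T+1}\|^2 \le 4 m D^2 T$. We then expand the term $\langle \mathbf{w}^*, \mathbf{w}^{T+1} \rangle$ for iterations in both $\mathcal{I}$ and $\mathcal{J}$, obtaining:
$$\langle \mathbf{w}^*, \mathbf{w}^{T+1} \rangle = \sum_{t \in \mathcal{I}} \langle \mathbf{w}^*_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle + \sum_{t \in \mathcal{J}} \langle \mathbf{w}^*_{J_{p^t}}, \boldsymbol{\phi}_{J_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{J_{p^t}}(\mathbf{x}^t) \rangle$$
With a few algebraic manipulations we obtain:
$$\sum_{t=1}^{T} \langle \mathbf{w}^*_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle = \langle \mathbf{w}^*, \mathbf{w}^{T+1} \rangle + \sum_{t \in \mathcal{J}} \zeta^t$$
which we then bound using the Cauchy-Schwarz inequality:
$$\sum_{t=1}^{T} \langle \mathbf{w}^*_{I_{p^t}}, \boldsymbol{\phi}_{I_{p^t}}(\bar{\mathbf{x}}^t) - \boldsymbol{\phi}_{I_{p^t}}(\mathbf{x}^t) \rangle \le \|\mathbf{w}^*\| \, \|\mathbf{w}^{T+1}\| + \sum_{t \in \mathcal{J}} \zeta^t \le 2 D \sqrt{mT} \, \|\mathbf{w}^*\| + \sum_{t \in \mathcal{J}} \zeta^t$$
Applying the conditional informativeness assumption (Eq. 1) and rearranging proves the claim. ∎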
Theorem 1 ensures that the average conditional regret suffered by our algorithm decreases as $O(1/\sqrt{T})$. This alone, however, does not prove that the algorithm will eventually arrive at a local optimum, even if $\mathrm{CR}(\mathbf{x}^t_{p^t}) = 0$ for some $t$. The reason is that partial inference is performed keeping the rest of the object fixed. Between iterations, an inferred part may change as a result of changes to the other parts in previous iterations. The object could, in principle, keep changing at every iteration, even if $\mathrm{CR}(\mathbf{x}^t_{p^t})$ is always equal to $0$. The next proposition, however, shows that this is not the case, thanks to the utility decomposition we employ.
Proposition 1.
Let such that and for all . The configuration is a local optimum.
Proof.
Sketch. The proof proceeds by strong induction. We first show that for all , as only depends on the features in and by assumption for all . By strong induction, assuming that for all and all , we can easily show that as well.
Now, for all , therefore for all . Since for all by assumption, then is a local optimum and will not change for all .
∎
The algorithm actually reaches a local optimum earlier, but it needs to double-check all the parts to be sure that the configuration is actually a local optimum. This justifies a termination criterion that we use in practice. If the algorithm completes two full runs over all the parts, and the user can never improve any of the recommended partial configurations, then the full configuration is guaranteed to be a local optimum and the algorithm can stop. As mentioned, we employ this criterion in our implementation but left it out of Algorithm 2 for simplicity.
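A minimal sketch of a conservative version of this stopping rule is shown below. `LocalOptimumMonitor` is a hypothetical helper, not the authors' code. It counts, for each basic part, the visits in which the user made no improvement. As an additional assumption of this sketch, it also resets the counts whenever partial inference changes the configuration. The configuration is then unchanged while every basic part is checked, so Lemma 1 directly certifies a local optimum. The default `runs=2` mirrors the two full runs described above.

```python
from typing import Dict, Hashable, Iterable


class LocalOptimumMonitor:
    """Stop once every basic part has been presented `runs` times with no
    user improvement and no change to the configuration in between."""

    def __init__(self, parts: Iterable[Hashable], runs: int = 2) -> None:
        self.runs = runs
        self.clean: Dict[Hashable, int] = {p: 0 for p in parts}

    def observe(self, part: Hashable, improved: bool, changed: bool) -> bool:
        """Record one iteration; return True when the algorithm can stop.

        `changed` means partial inference modified the configuration at this
        iteration; the user's feedback refers to the new configuration."""
        if improved:
            self._reset()
            return False
        if changed:
            self._reset()
        self.clean[part] += 1
        return all(c >= self.runs for c in self.clean.values())

    def _reset(self) -> None:
        for p in self.clean:
            self.clean[p] = 0
```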
Empirical Analysis
We ran pcl on three constructive preference elicitation tasks of increasing complexity, comparing different degrees of user informativeness. According to our experiments, informativeness is the most critical factor. The three problems involve rather large configurations, which cannot be handled by coactive interaction via complete configurations. For instance, in [Ortega and Stocker 2016] the user is tasked with solving relatively simple SAT instances over three variables and (at most) eight clauses, and in some cases users were observed to show signs of cognitive overload. In comparison, our simplest realistic problem involves 35 categorical variables (with 8 possible values) and 74 features, plus additional hard constraints. As a consequence, Coactive Learning cannot be applied as-is, and partwise interaction is necessary.
In all of these settings, partwise inference is cast as a mixed-integer linear problem (MILP) and solved with Gecode (http://www.gecode.org/). Although MILP is NP-hard in general, MILP solvers can be very efficient on practical instances. Efficiency is further improved by inferring only partial configurations. Our experimental setup is available at https://github.com/unitnsml/pcl.
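We employed a user simulation protocol similar to that of [Teso, Dragone, and Passerini 2017]. First, for each problem, we sampled user weight vectors $\mathbf{w}^*$ at random from a standard normal distribution. Then, upon receiving a recommendation $\mathbf{x}_p$, an improvement $\bar{\mathbf{x}}_p$ is generated by solving the following problem:
$$\bar{\mathbf{x}}_p = \operatorname{argmin}_{\mathbf{x}'_p} \; u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p}) \quad \text{s.t.} \quad u^*(\mathbf{x}'_p \circ \mathbf{x}_{-p}) - u^*(\mathbf{x}_p \circ \mathbf{x}_{-p}) \ge \alpha \, \mathrm{CR}(\mathbf{x}_p \mid \mathbf{x}_{-p})$$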
This formulation clearly satisfies the conditional informativeness assumption (Eq. 1).
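This simulated user can be sketched by brute force on a hypothetical three-day toy plan. The domain, the feature map and `simulated_improvement` are illustrative assumptions; in the experiments this problem is solved with the same solver used for inference. Among the partial configurations that recover at least a fraction α of the conditional regret, the sketch returns the one with the lowest true utility.

```python
from typing import List, Sequence, Tuple

DOMAIN = (0, 1, 2)  # hypothetical activity codes
N = 3               # number of days (basic parts)


def phi(x: Sequence[int]) -> List[float]:
    """Features 0..N-1: 'day i hosts activity 1'; N..2N-2: 'days i, i+1 differ'."""
    local = [float(a == 1) for a in x]
    diverse = [float(x[i] != x[i + 1]) for i in range(N - 1)]
    return local + diverse


def utility(w: Sequence[float], x: Sequence[int]) -> float:
    return sum(wj * fj for wj, fj in zip(w, phi(x)))


def simulated_improvement(w_star: Sequence[float], x: Tuple[int, ...],
                          i: int, alpha: float) -> Tuple[int, ...]:
    """Least-improving change of day i that still satisfies Eq. (1)."""
    candidates = [x[:i] + (a,) + x[i + 1:] for a in DOMAIN]
    u_x = utility(w_star, x)
    cr = max(utility(w_star, c) for c in candidates) - u_x
    feasible = [c for c in candidates if utility(w_star, c) - u_x >= alpha * cr]
    return min(feasible, key=lambda c: utility(w_star, c))


W = [1.0, 1.0, 1.0, 2.0, 2.0]  # hypothetical true weights
```

With α = 1 the simulated user always returns a conditionally optimal partial configuration. Smaller values of α yield weaker, but still informative, improvements.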
Synthetic setting.
We designed a simple synthetic problem inspired by spin glass models; see Figure 1 for a depiction. In this setting, a configuration consists of a grid in which each node is a binary 0/1 variable. Adjacent nodes are connected by an edge, and each edge is associated with an indicator feature. The feature evaluates to 1 if the incident nodes have different values (green in the figure), and to 0 otherwise (red in the figure). The utility of a configuration is simply the weighted sum of the values of all features (edges). The basic parts consist of all the non-overlapping subgrids of a fixed size (indicated by dotted lines in the figure).
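The structure of this problem can be sketched in Python as follows. The grid and block sizes used here (a 4 x 4 grid split into 2 x 2 blocks) are hypothetical, since the actual sizes are not recoverable from this text, and the helper names are illustrative. Each edge is one feature. The feature subset $I_p$ of a block contains every edge touching one of its nodes, so the edges crossing block boundaries are shared between parts.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]
Edge = Tuple[Node, Node]


def grid_edges(n: int) -> List[Edge]:
    """Horizontal and vertical edges of an n x n grid; one feature per edge."""
    edges: List[Edge] = []
    for r, c in product(range(n), repeat=2):
        if c + 1 < n:
            edges.append(((r, c), (r, c + 1)))
        if r + 1 < n:
            edges.append(((r, c), (r + 1, c)))
    return edges


def edge_features(grid: List[List[int]], edges: List[Edge]) -> List[int]:
    """Indicator per edge: 1 if the incident nodes differ, 0 otherwise."""
    return [int(grid[a[0]][a[1]] != grid[b[0]][b[1]]) for a, b in edges]


def block_feature_subsets(k: int, edges: List[Edge]) -> Dict[Node, Set[int]]:
    """I_p for each k x k block: indices of the edges touching a node of the block."""
    subsets: Dict[Node, Set[int]] = {}
    for j, (a, b) in enumerate(edges):
        for r, c in (a, b):
            subsets.setdefault((r // k, c // k), set()).add(j)
    return subsets
```

On a 4 x 4 grid there are 24 edges. With 2 x 2 blocks, the 8 edges crossing block boundaries each belong to two feature subsets. A checkerboard assignment activates every edge feature.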
Since the problem is small enough for inference of complete configurations to be practical, we compared pcl to standard Coactive Learning, using the implementation of [Teso, Dragone, and Passerini 2017]. In order to keep the comparison as fair as possible, the improvements fed to CL were chosen to match the utility gain obtained by pcl. We further report the performance of three alternative part selection strategies: random, smallest (most independent) part first, and UCB1.
(Figure caption, partially recovered: the x-axis is the number of iterations, while the shaded areas represent the standard deviation.)
The results can be found in the first column of Figure 2. We report both the regret (over complete configurations) and the cumulative runtime of all algorithms, averaged over all users, as well as their standard deviation. The regret plot shows that, despite being restricted to working with partial configurations, pcl does recommend complete configurations of quality comparable to CL after enough queries are made. Out of the three part selection strategies, random performs best, with the two more informed alternatives (especially smallest first) quite close. The runtime gap between full and partwise inference is already clear in this small synthetic problem, and complete inference quickly becomes impractical as the problem size increases.
Training planning.
Generating personalized training plans based on performance and health monitoring has recently received a lot of attention in sport analytics (see e.g. [Fister et al. 2015]). Here we consider the problem of synthesizing a week-long training plan from information about the target athlete. Each day includes 5 time slots (two for the morning, two for the afternoon, one for the evening), for 35 slots in total. We assume we are given a fixed set of training activities (in our experiments: walking, running, swimming, weight lifting, push-ups, squats, abs), as well as knowledge of the slots in which the athlete is available. The training plan associates an activity with each slot where the athlete is available. Our formulation tracks the amount of improvement (e.g. power increase) and fatigue over five different body parts (arms, torso, back, legs, and heart) induced by performing an activity for one time slot. Each day defines a basic part.
The mapping between training activities and improvement/fatigue over each body part is assumed to be provided externally, e.g. by the athlete or by medical personnel monitoring his/her status. The features of $\boldsymbol{\phi}$ include, for each body part, the total performance gain and fatigue, computed over the recommended training plan according to the aforementioned mapping. We further include inter-part features to capture activity diversity in consecutive days. Finally, to prevent injuries, the fatigue accumulated in any body part over 3 consecutive time slots is constrained not to exceed a given threshold.
In this setting, CL is impractical from both the cognitive and the computational points of view. We ran pcl and evaluated the impact of user informativeness by progressively increasing α from a smaller value, to 0.3, to 0.5. The results can be seen in Figure 2. The plots clearly show that, despite the complexity of the configuration and constraints, pcl can still produce very low-regret configurations after about 50 iterations or less.
Understandably, the degree of improvement plays an important role in the performance of pcl and, consequently, in its runtime (users at convergence do not contribute to the runtime), at least up to α = 0.3. Recall, however, that the improvements are partwise, and hence α quantifies the degree of local improvement. Part improvements may be very informative on their own, but they only give a modest amount of information about the full configuration. Still, it is not unreasonable to expect users to be very informative when presented with reasonably sized (and simple) parts. Crucially, pcl allows the system designer to define the parts appropriately depending on the application.
Hotel planning.
Finally, we considered a complex furniture allocation problem: furnishing an entire hotel. The problem is encoded as follows. The hotel is represented by a graph whose nodes are rooms and whose edges indicate which rooms are adjacent. Rooms can be of three types: normal rooms, suites, and dorms. Each room can hold a maximum number of furniture pieces, each associated with a cost. Additional fixed nodes represent bathrooms and bars. The type of a room is decided dynamically based on its position and furniture. For instance, a normal room must contain at most three single or double beds, no bunk beds, and a table, and must be close to a bathroom. A suite must contain one bed, a table and a sofa, and must be close to a bathroom and a bar. Each room is a basic part, and there are 15 rooms to be allocated.
The feature vector contains global features plus local features for each room. The global features include various functions of the number of rooms of each type, the total cost of the furniture, and the total number of guests. The local features instead include characteristics of the room itself, such as its type or the amount of furniture, as well as features shared with adjacent rooms, e.g. whether two adjacent rooms have the same type. These can encode preferences like “suites and dorms should not be too close” or “the hotel should maintain high quality standards while still being profitable”. Given the graph structure, room capacities, and total budget, the goal is to furnish all rooms according to the user’s preferences.
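The sketch below assembles a global-plus-local feature vector in the spirit of the description above. It is only an illustration. The per-item costs and guest capacities in `COST` and `GUESTS` are invented. The global "functions" are plain counts and sums. The adjacency feature counts same-type neighbours rather than encoding pairwise indicators. Room types are taken as given, since in the paper they are decided by the solver.

```python
from typing import Dict, List, Set

ROOM_TYPES = ["normal", "suite", "dorm"]
# Hypothetical per-item costs and guest capacities.
COST = {"single_bed": 100, "double_bed": 150, "bunk_bed": 120, "table": 50, "sofa": 200}
GUESTS = {"single_bed": 1, "double_bed": 2, "bunk_bed": 2}


def hotel_features(types: Dict[str, str], furniture: Dict[str, Dict[str, int]],
                   adjacency: Dict[str, Set[str]]) -> List[float]:
    rooms = sorted(types)
    # Global features: number of rooms of each type, total cost, total guests.
    feats: List[float] = [sum(1 for r in rooms if types[r] == t) for t in ROOM_TYPES]
    feats.append(sum(COST[item] * n for r in rooms for item, n in furniture[r].items()))
    feats.append(sum(GUESTS.get(item, 0) * n for r in rooms for item, n in furniture[r].items()))
    # Local features per room: one-hot type, furniture count, same-type neighbours.
    for r in rooms:
        feats.extend(1.0 if types[r] == t else 0.0 for t in ROOM_TYPES)
        feats.append(sum(furniture[r].values()))
        # Non-room neighbours (bathrooms, bars) have no type and are skipped.
        feats.append(sum(1 for n in adjacency[r] if n in types and types[n] == types[r]))
    return feats
```

The local block has a fixed size per room, so the vector has length 5 + 5·(number of rooms). A linear utility over it can weigh global trade-offs (cost against guests) and local ones (room type, neighbourhood) together.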
This problem is hard to solve to optimality with current solvers; part-based inference alleviates this issue by focusing on individual rooms. Since the hotel has 15 rooms, each iteration affects only 1/15 of the configuration. Furthermore, the global features induce dependencies among all rooms. Nonetheless, the algorithm reduces the regret by an order of magnitude in around 100 iterations, starting from a completely uninformed prior. Note also that, as in the training planning scenario, α = 0.3 achieves essentially the same results as α = 0.5.
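One way to picture "focusing on individual rooms" is coordinate ascent. Each inference call re-optimises the furniture of a single room while all other rooms stay fixed, and a global constraint (here, the budget) still couples the rooms. In the Python sketch below, brute-force enumeration over a tiny catalogue stands in for the constraint solver. The (beds, tables) encoding, prices, budget, and `toy_utility` are hypothetical and are not the paper's model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: each room's furniture as (number of beds, number of tables).
Assignment = Dict[str, Tuple[int, int]]

OPTIONS: List[Tuple[int, int]] = [(b, t) for b in range(4) for t in range(2)]
BUDGET = 400                    # hypothetical total budget
BED_COST, TABLE_COST = 100, 50  # hypothetical prices


def toy_utility(config: Assignment) -> float:
    """Toy utility with a global budget constraint that couples all rooms."""
    cost = sum(BED_COST * b + TABLE_COST * t for b, t in config.values())
    if cost > BUDGET:
        return float("-inf")  # infeasible configuration
    return float(sum(2 * b + t for b, t in config.values()))


def infer_part(config: Assignment, room: str,
               utility: Callable[[Assignment], float]) -> Assignment:
    """Re-optimise one room (the part) by enumeration, keeping all other rooms fixed."""
    best, best_u = config, utility(config)
    for option in OPTIONS:
        candidate = dict(config)
        candidate[room] = option
        u = utility(candidate)
        if u > best_u:  # strict improvement only, so utility never decreases
            best, best_u = candidate, u
    return best


def sweep(config: Assignment, utility: Callable[[Assignment], float]) -> Assignment:
    """One pass of part-wise inference over all rooms in a fixed order."""
    for room in sorted(config):
        config = infer_part(config, room, utility)
    return config
```

Each call searches over one room's options only. The budget makes the rooms interact, so the room optimised first can use up resources that later rooms would otherwise claim.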
Conclusion
In this work we presented an approach to constructive preference elicitation able to tackle large constructive domains, beyond the reach of previous approaches. It is based on Coactive Learning [Shivaswamy and Joachims2015], but only requires inference of partial configurations and partial improvement feedback, thereby significantly reducing the cognitive load of the user. We presented an extensive theoretical analysis demonstrating that, despite working only with partial configurations, the algorithm converges to a locally optimal solution. The algorithm has been evaluated empirically on three constructive scenarios of increasing complexity, and shown to perform well in practice.
Possible future work includes improving part-based interaction by exchanging additional contextual information (e.g. features [Teso, Dragone, and Passerini2017] or explanations) with the user, and applying pcl to large layout synthesis problems [Dragone et al.2016].
References
[Auer, Cesa-Bianchi, and Fischer2002] Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235–256.
[Boutilier, Bacchus, and Brafman2001] Boutilier, C.; Bacchus, F.; and Brafman, R. I. 2001. UCP-networks: A directed graphical representation of conditional utilities. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 56–64. Morgan Kaufmann Publishers Inc.
[Boutilier et al.2006] Boutilier, C.; Patrascu, R.; Poupart, P.; and Schuurmans, D. 2006. Constraint-based optimization and utility elicitation using the minimax decision criterion. Artificial Intelligence 170(8-9):686–713.
[Braziunas and Boutilier2005] Braziunas, D., and Boutilier, C. 2005. Local utility elicitation in GAI models. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, 42–49. AUAI Press.
[Braziunas and Boutilier2007] Braziunas, D., and Boutilier, C. 2007. Minimax regret based elicitation of generalized additive utilities. In UAI, 25–32.
[Braziunas and Boutilier2009] Braziunas, D., and Boutilier, C. 2009. Elicitation of factored utilities. AI Magazine 29(4):79.
[Chajewska, Koller, and Parr2000] Chajewska, U.; Koller, D.; and Parr, R. 2000. Making rational decisions using adaptive utility elicitation. In AAAI/IAAI, 363–369.
[Dragone et al.2016] Dragone, P.; Erculiani, L.; Chietera, M. T.; Teso, S.; and Passerini, A. 2016. Constructive layout synthesis via coactive learning. In Constructive Machine Learning workshop, NIPS.
[Fishburn1967] Fishburn, P. C. 1967. Interdependence and additivity in multivariate, unidimensional expected utility theory. International Economic Review 8(3):335–342.
[Fister et al.2015] Fister, I.; Rauter, S.; Yang, X.-S.; and Ljubič, K. 2015. Planning the sports training sessions with the bat algorithm. Neurocomputing 149:993–1002.
[Goetschalckx, Fern, and Tadepalli2014] Goetschalckx, R.; Fern, A.; and Tadepalli, P. 2014. Coactive learning for locally optimal problem solving. In Proceedings of AAAI.
[Gonzales and Perny2004] Gonzales, C., and Perny, P. 2004. GAI networks for utility elicitation. KR 4:224–234.
[Keeney and Raiffa1976] Keeney, R. L., and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs.
[Mayer and Moreno2003] Mayer, R. E., and Moreno, R. 2003. Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist 38(1):43–52.
[Meseguer, Rossi, and Schiex2006] Meseguer, P.; Rossi, F.; and Schiex, T. 2006. Soft constraints. Foundations of Artificial Intelligence 2:281–328.
[Ortega and Stocker2016] Ortega, P. A., and Stocker, A. A. 2016. Human decision-making under limited time. In Advances in Neural Information Processing Systems, 100–108.
[Pigozzi, Tsoukiàs, and Viappiani2016] Pigozzi, G.; Tsoukiàs, A.; and Viappiani, P. 2016. Preferences in artificial intelligence. Ann. Math. Artif. Intell. 77(3-4):361–401.
[Raman, Shivaswamy, and Joachims2012] Raman, K.; Shivaswamy, P.; and Joachims, T. 2012. Online learning to diversify from implicit feedback. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 705–713. ACM.
[Shivaswamy and Joachims2015] Shivaswamy, P., and Joachims, T. 2015. Coactive learning. JAIR 53:1–40.
[Teso, Dragone, and Passerini2017] Teso, S.; Dragone, P.; and Passerini, A. 2017. Coactive critiquing: Elicitation of preferences and features. In AAAI.
[Teso, Passerini, and Viappiani2016] Teso, S.; Passerini, A.; and Viappiani, P. 2016. Constructive preference elicitation by setwise max-margin learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2067–2073. AAAI Press.
Appendix A Supplementary Material
Proof of Theorem 2
We begin by expanding . If :
Since , , thus:
If instead :
We can therefore expand the term:
Applying the Cauchy-Schwarz inequality:
The LHS of the above inequality expands to:
And thus:
We add and subtract the term:
We obtain: