E-Commerce personalization has been recognized as one of the most effective methods of increasing sales (Yang and Padmanabhan 2005, Ansari and Mela 2003). For example, Amazon and other retail giants provide personalized product recommendations based on their customers’ interests. Gartner predicts that by 2020, businesses that successfully handle personalization in E-Commerce will increase their profits by up to 15%. The starting point of any personalization effort is to obtain a clearer picture of individual customers. This is often done through customer segmentation, i.e., partitioning a customer base into groups of individuals with similar characteristics that are relevant to marketing, such as geographic location, interests, time of visit, etc. Since customer segmentation relies on both the quality and quantity of data collected from customers, it is critical to decide what data will be collected and how it will be collected. For returning customers, we can use their past behavior, such as browsing history, to perform customer segmentation and further personalize their current experience. But how do we decide which segment a new customer belongs to without knowing her browsing history? One popular approach to gathering data from first-time customers is the personality quiz, whose purpose is to segment every potential and current customer by actively asking them questions. After answering a few questions, customers are matched with the type of recommendations or products that best suit their responses. Marketers are starting to use personality quizzes as an effective method of generating leads and increasing e-commerce sales. For example, websites such as Warby Parker (https://www.warbyparker.com/) and ipsy (https://www.ipsy.com/) use personality quizzes to determine users’ interest profiles and make recommendations.
Compared with other preference elicitation methods, a personality quiz-based system requires significantly less effort from users, and users express a stronger intention to reuse such a system and introduce it to others (Hu and Pu 2009a, b). Although the benefit of the personality quiz has been well recognized, it is not clear, in general, how to optimize the quiz design so as to maximize this benefit. In this paper, we formulate the quiz design problem as a combinatorial optimization problem. We aim to select, and further sequence, a group of questions from a given pool so as to maximize the quality of customer segmentation. While the design of each individual question, such as its formatting, coloring, and the way the question is asked, can be complex and important (Couper et al. 2001), that topic is out of the scope of this paper. We focus exclusively on the question selection and sequencing problem by assuming that all candidate questions are given in advance.
The input of our problem is a set of attributes and a pool of candidate questions. We say a question covers an attribute if the answer to that question reveals the value of that attribute. For example, the question “Where are you living?” covers the attribute “location”. Intuitively, a group of “good” questions should cover as many important attributes as possible so as to minimize the uncertainty about the user. Given answers to a group of questions, we measure the remaining uncertainty using the conditional entropy of the uncovered attributes of the user. Our ultimate goal is to select and sequence a group of questions so as to minimize this uncertainty subject to a set of practical constraints.
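To make the covering relation concrete, here is a minimal sketch; the questions, attribute names, and the question-to-attribute mapping below are illustrative placeholders, not data from this paper:

```python
# Hypothetical mapping: each candidate question covers the attributes
# that its answer reveals.
covers = {
    "Where are you living?": {"location"},
    "What is your age range?": {"age"},
    "Which brands do you follow?": {"brand_affinity", "interests"},
}

def covered_attributes(questions):
    """Union of attributes covered by a group of questions."""
    out = set()
    for q in questions:
        out |= covers.get(q, set())
    return out

print(covered_attributes(["Where are you living?", "Which brands do you follow?"]))
```

A group of questions covering more (and more informative) attributes leaves less residual uncertainty about the user.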
In general, our problem falls into the category of non-adaptive active learning based user profiling. The idea of most existing studies is to actively select a group of items, e.g., movies or cars, and ask for users’ feedback on them. This feedback, in turn, can help to enhance the performance of future recommendations. However, these studies often assume that users are willing to provide feedback on all selected items, irrespective of which items are selected and in what sequence. As a consequence, their problem is reduced to a subset selection problem. We argue that this assumption may not hold in our setting; e.g., it has been shown that not all users are willing to share their personal information with a site. According to a survey conducted by Culnan (2001), two out of three users abandon sites that ask for personal information, and one out of five users has provided false information to a site. This motivates us to consider a realistic but significantly more complicated user behavior model. Our model captures the externality of a question by allowing the user to “opt out” of answering a question or even quit the quiz prematurely after answering some questions. A more detailed comparison between our work and related work is presented in Section 2. We next give a brief overview of some important constraints considered in this paper.
1.1 Cardinality Constraint
We can select up to k questions to include in the quiz, where k is some positive integer. For example, it has been shown that 6-8 questions per quiz can be an appropriate setting since it maximizes completions and leads generated (https://socialmediaexplorer.com/content-sections/tools-and-tips/how-to-make-a-personalized-quiz-to-drive-sales/).
1.2 User Behavior
Our setting assumes that the user behavior during a personality quiz can be described as a Markov process (a detailed description of this model is presented in Section 3.1). The user interacts with a sequence of questions in order. After reading a question, she decides probabilistically whether or not to answer it with some question-specific probability, called the answer-through-rate. In principle, this probability could depend on many factors, including the cognitive effort required for understanding and answering the question, the sensitivity of the question, etc. Our model also allows the user to select a “Prefer Not to Answer” (PNA) option, if any, to avoid answering a particular question. A more detailed discussion of the PNA option is provided in the next subsection. In addition, each question has a continuation probability, representing the likelihood that the user is willing to continue the quiz after interacting with the current one. This continuation probability captures the externality of a question; e.g., a very sensitive or lengthy question could cause the user to exit the quiz prematurely. The existence of such externality makes our problem even more complicated, as the ordering of the selected questions matters. For example, Typeform (https://www.typeform.com/surveys/question-types/), an online software as a service (SaaS) company that specializes in online form building and online surveys, suggests that it is better to put sensitive and demographic questions at the end of a quiz or survey: “Starting a survey with intimidating or demographic questions like age and income can put people off. Your first survey question should be interesting, light, and easy to answer. Once they’ve started, they’re more likely to finish and answer more sensitive questions.”
1.3 PNA Option
Regarding the role played by the PNA option in a quiz, there exist two contradictory arguments. On the one hand, several studies (Schuman and Presser 1996, Hawkins and Coney 1981) empirically demonstrate that the data and subsequent analyses are better off when a PNA option is included, because it decreases the proportion of uninformed responses. On the other hand, opponents believe that providing a PNA option could negatively impact the quality of the answers because some users tend not to answer questions so as to minimize the effort required to complete the quiz (Poe et al. 1988, Sanchez and Morchio 1992). Since both arguments are empirically validated by previous studies, we decide to cover both cases in this work.
We next summarize the contributions made in this paper. We first show (in Section 3.2) that our problem subject to the above constraints is NP-hard. Then we develop a series of effective solutions with provable performance bounds. For the case where PNA is not an option (in Section 4), our algorithm achieves an approximation ratio that is arbitrarily close to a constant factor, expressed in terms of a constant whose value is arbitrarily close to 2.718 (i.e., Euler’s number e). For the case where PNA is available (in Section 5), our algorithm likewise achieves a constant approximation ratio. We subsequently consider an extension of the basic model that takes into account a slot-dependent decay factor (in Section 6). Here we assume that the answer-through-rate of a question depends not only on its intrinsic quality but also on its position. For the question selection and sequencing problem under this model, our algorithm again achieves a constant approximation ratio, both when PNA is not an option and when it is.
Most of the notation is listed in Table 1.

| Symbol | Description |
| --- | --- |
| | A set of questions. |
| | A sorted sequence of a set of questions. |
| | The utility of the answers to a set of questions. |
| | The expected utility of displaying a sequence to the user. |
| (resp. ) | Probability of answering (resp. PNA) a question after reading it. |
| (resp. ) | Probability of continuing to read the next question after answering (resp. PNA) a question. |
| | Aggregated continuation probability of a question. |
| | Reachability of a question given that a sequence is displayed to the user. |
| | Concatenation of two sequences. |
| (resp. , , ) | The subsequence of a sequence which is scheduled no later than (resp. before, after, no earlier than) a given slot. |
| | A random set of questions that are answered by the user given a displayed sequence. |
| | A random set obtained by including each question independently with a given probability. |
2 Literature Review
Our paper falls into the general category of non-adaptive active learning supported personalization. This section reviews the literature on three topics that are closely related to our research.
2.1 Active Learning based Recommender System
Active learning, as a subfield of machine learning (Bishop 2006), has been widely used in the design of effective recommender systems. For the purpose of acquiring training data, active learning based recommender systems often actively solicit customer feedback on a group of carefully selected items (Kohrs 2001, Rubens and Sugiyama 2007, Golbandi et al. 2011, Chang et al. 2015).
Existing systems can be further classified into two categories: adaptive learning and non-adaptive learning. Non-adaptive learning refers to those learning strategies that require all users to rate the same set of items, while adaptive learning (Boutilier et al. 2002, Golbandi et al. 2011, Rubens and Sugiyama 2007) may propose different items to different users to rate. Since our paper belongs to the category of non-adaptive learning, we next give a detailed review of the state of the art in non-adaptive learning based recommender systems. Depending on the item selection rule, there are three types of strategies: uncertainty-reduction, error-reduction, and attention-based. The goal of uncertainty-reduction based systems is to reduce the uncertainty about the users’ opinions about new items; they achieve this by selecting items with the highest variance (Kohrs 2001, Teixeira et al. 2002), the highest entropy (Rashid et al. 2002), or the highest Entropy0 (Rashid et al. 2008). The goal of error-reduction based systems is to minimize the system error (Golbandi et al. 2010, Liu et al. 2011, Cremonesi et al. 2010). The idea of the attention-based strategy is to select the items that are most popular among the users (Golbandi et al. 2010, Cremonesi et al. 2010).
Our problem is largely different from all aforementioned studies in terms of both application context and problem formulation: (1) Instead of investigating a particular recommender system, we study a general customer segmentation problem whose solution serves as the foundation of any personalized service. (2) We deal with a significantly more complicated user behavior model in which the user is allowed to pick the PNA option or terminate the quiz prematurely; all aforementioned studies assume that users are guaranteed to rate all selected items, regardless of the sequence of those items, so their problem is reduced to a subset selection problem. (3) Most existing studies develop heuristics without provable performance bounds; we develop the first series of algorithms that achieve bounded approximation ratios.
2.2 Learning Offer Set and Consideration Set
The other two related topics are “offer sets” (Atahan and Sarkar 2011) and “consideration sets” (Roberts and Lattin 1991). Our problem differs from both in fundamental ways. The focus of the offer set literature is to investigate how the profile learning process can be accelerated by carefully selecting the links to display to the user. In (Atahan and Sarkar 2011), users implicitly compare alternative links and reveal their preferences based on the set of links offered; this is different from our model, where users are explicitly asked to answer questions. The literature on consideration sets aims at determining the subset of brands/products that a customer may evaluate when making purchase decisions. Their model does not capture the externality of a question, i.e., users are forced to answer all questions, so the sequence of questions does not play a role in their non-adaptive solution. In addition, most of the aforementioned studies do not provide any theoretical bounds on their proposed solutions. We consider a joint question selection and sequencing problem which is proved to be NP-hard (in Section 3.2), and we theoretically bound the gap between our solution and the optimal solution.
2.3 Submodular Optimization
We later show that our problem is a submodular maximization problem. Although submodular maximization has been extensively studied in the literature (Nemhauser and Wolsey 1978, Nemhauser et al. 1978, Nemhauser and Wolsey 1981, Kawahara et al. 2009, Calinescu et al. 2011), most of these works focus on the subset selection problem, where the ordering of the selected elements does not affect the utility. Our work differs from theirs in that we consider a joint selection and sequencing problem. The only study that considers a sequencing problem is (Tschiatschek et al. 2017); however, their model and problem formulation are largely different from ours. They use a directed graph to model ordered preferences, and their objective is to find a sequence of nodes that covers as many edges in the directed graph as possible. Their objective function is not always submodular, and their formulation does not involve any subset selection because, by default, they can select all elements. Although we restrict our attention to the question selection and sequencing problem in this paper, our research contributes fundamentally to the field of submodular subset selection and sequencing maximization.
3 Preliminaries and Problem Formulation
3.1.1 Submodular Function and Correlation Gap
A set function f that maps subsets of a finite ground set V to non-negative real numbers is said to be submodular if for every X ⊆ Y ⊆ V and every x ∈ V ∖ Y, we have f(X ∪ {x}) − f(X) ≥ f(Y ∪ {x}) − f(Y).
A submodular function f is said to be monotone if f(X) ≤ f(Y) whenever X ⊆ Y.
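As a quick sanity check, the diminishing-returns inequality can be verified exhaustively for a small coverage function, a canonical monotone submodular function; the ground set and the covered sets below are an arbitrary toy example, not from this paper:

```python
from itertools import combinations

# Coverage function f(S) = |union of the sets chosen by S|: a canonical
# monotone submodular function (the sets below are illustrative).
sets = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def f(S):
    return len(set().union(*(sets[x] for x in S)) if S else set())

ground = list(sets)
# Check f(X ∪ {x}) − f(X) >= f(Y ∪ {x}) − f(Y) for all X ⊆ Y and x ∉ Y.
ok = True
for r in range(len(ground) + 1):
    for Y in combinations(ground, r):
        for s in range(len(Y) + 1):
            for X in combinations(Y, s):
                for x in ground:
                    if x in Y:
                        continue
                    ok &= f(set(X) | {x}) - f(set(X)) >= f(set(Y) | {x}) - f(set(Y))
print(ok)  # True: marginal gains diminish as the base set grows
```

The same exhaustive check fails for non-submodular functions, which makes it a handy unit test when designing utility functions.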
We next present a useful result that holds for any submodular function. For any distribution D over subsets of the ground set, let p_i be the marginal probability that element i is included, and let R be a random set that contains each element i independently with probability p_i. The correlation gap of f is the supremum, over all such distributions D, of the ratio between the expected value of f under D and the expected value of f(R).
Intuitively, the correlation gap is the maximum ratio between the expected value of the function when the random variables are correlated and its expected value when the random variables are independent with the same marginals.
(Agrawal et al. 2012) The correlation gap of a monotone submodular function is upper bounded by e/(e − 1).
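The bound can be illustrated on a toy instance in which a correlated distribution and an independent distribution share the same marginals; the two-question instance below is purely illustrative:

```python
from itertools import product
import math

# Toy monotone submodular function: f(∅) = 0 and f of any non-empty set = 1
# (coverage of a single shared element by either of two questions).
def f(S):
    return 1.0 if S else 0.0

# Correlated distribution with marginals p_a = p_b = 0.5:
# S = {a} w.p. 0.5, S = {b} w.p. 0.5, so the expected value is 1.
e_correlated = 0.5 * f({"a"}) + 0.5 * f({"b"})

# Independent distribution with the same marginals.
p = {"a": 0.5, "b": 0.5}
e_independent = 0.0
for bits in product([0, 1], repeat=2):
    S = {q for q, b in zip(p, bits) if b}
    prob = math.prod(p[q] if b else 1 - p[q] for q, b in zip(p, bits))
    e_independent += prob * f(S)

ratio = e_correlated / e_independent
print(ratio)                            # 4/3 ≈ 1.333
print(ratio <= math.e / (math.e - 1))   # True: within the e/(e-1) bound
```

Here correlation concentrates probability on non-empty sets (ratio 4/3), yet the gap never exceeds e/(e − 1) ≈ 1.582 for monotone submodular functions.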
3.1.2 Utility of Answered Questions
Consider any group of questions S; we use f(S) to represent the utility obtained given that every question in S has been answered by the user. Intuitively, obtaining answers to a group of “good” questions should reduce the uncertainty about the user and provide better insights.
In this work, we assume that f is non-decreasing and submodular.
We next give a concrete example to show that an entropy-like utility function is indeed non-decreasing and submodular.
An Example of an Entropy-like Utility Function. Assume there is a set A of attributes and a pool of candidate questions. We say a question q covers an attribute a if the answer to q reveals the value of a, and that a group of questions S covers a if a is covered by at least one question from S. We use A(S) to denote the set of all attributes that can be covered by S. One common notion of uncertainty is the conditional entropy of the unobserved attributes of a user after answering S:

H(X_{A∖A(S)} | X_{A(S)}),   (1)

where X_{A(S)} and X_{A∖A(S)} denote the sets of random variables associated with the covered attributes A(S) and the uncovered attributes A∖A(S), respectively. Intuitively, a group of “good” questions would minimize Eq. (1). Based on the chain rule of entropies, we have H(X_A) = H(X_{A(S)}) + H(X_{A∖A(S)} | X_{A(S)}). Because H(X_A) is fixed, minimizing Eq. (1) is equivalent to maximizing H(X_{A(S)}). Therefore, it is reasonable to define the utility of S as f(S) = H(X_{A(S)}), and we next show that this function is non-decreasing and submodular.
The function f(S) = H(X_{A(S)}), the entropy of the attributes covered by S, is non-decreasing and submodular.
The proof is provided in the appendix.
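For intuition, when the attributes are independent fair binary variables (an assumption made purely for this illustration), the entropy of the covered attributes reduces to a coverage count, which makes monotonicity and diminishing returns easy to check numerically; the questions and attributes are hypothetical:

```python
# Hypothetical question-to-attribute mapping.
covers = {"q1": {"location"}, "q2": {"location", "age"}, "q3": {"income"}}

def f(S):
    """Entropy (in bits) of the covered attributes, assuming each attribute
    is an independent fair binary variable: 1 bit per covered attribute."""
    covered = set().union(*(covers[q] for q in S)) if S else set()
    return float(len(covered))

# Monotonicity: adding a question never decreases the utility.
assert f({"q1"}) <= f({"q1", "q2"})

# Diminishing returns: the marginal gain of q2 does not grow
# as the base set grows.
gain_small = f({"q1", "q2"}) - f({"q1"})              # adding q2 to {q1}
gain_large = f({"q1", "q3", "q2"}) - f({"q1", "q3"})  # adding q2 to a superset
print(gain_small, gain_large)
assert gain_small >= gain_large
```

With correlated attributes the entropy is no longer a plain coverage count, but monotonicity and submodularity still hold, as the appendix proof shows.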
3.1.3 Question Scanning Process
We use a Markov process to model the user’s behavior when interacting with a sequence of quiz questions. Our model is similar to the Cascade Model (Craswell et al. 2008), which provides the best explanation for the position bias of organic search results. We define the answer-through-rate of a question as the probability that the user chooses to answer the question after reading it. In principle, this probability could depend on many factors, including the cognitive effort required for understanding and answering the question, the question’s sensitivity, etc. Instead of answering the question, the user may also select the “Prefer Not to Answer” (PNA) option, if any, with some probability, or simply exit the quiz with the remaining probability. In addition to these intrinsic probabilities, each question is also associated with two continuation probabilities, representing the probability that the user will continue to read the next question after answering the current one, and after selecting PNA on it, respectively.
We summarize the question scanning process of a user as follows.
The user starts with the question placed at the first slot. After reading a question, she takes one of the following five actions: (1) answer it and continue to read the next question; (2) answer it and exit the quiz; (3) select PNA and continue to read the next question; (4) select PNA and exit the quiz; (5) exit the quiz without interacting with the question. The above process repeats until the user exits the quiz or no more questions remain.
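The scanning process above can be simulated directly; the per-question probabilities below are hypothetical placeholders, not estimated values:

```python
import random

# Hypothetical parameters: lam = answer prob, lam_pna = PNA prob,
# beta / beta_pna = continuation prob after answering / after PNA.
QUIZ = [
    {"id": "q1", "lam": 0.9, "lam_pna": 0.05, "beta": 0.95, "beta_pna": 0.8},
    {"id": "q2", "lam": 0.7, "lam_pna": 0.10, "beta": 0.90, "beta_pna": 0.7},
    {"id": "q3", "lam": 0.5, "lam_pna": 0.20, "beta": 0.85, "beta_pna": 0.6},
]

def simulate(quiz, rng):
    """One pass of the question scanning process; returns answered questions."""
    answered = []
    for q in quiz:
        r = rng.random()
        if r < q["lam"]:                     # answer the question
            answered.append(q["id"])
            if rng.random() >= q["beta"]:    # exit after answering
                break
        elif r < q["lam"] + q["lam_pna"]:    # pick "Prefer Not to Answer"
            if rng.random() >= q["beta_pna"]:
                break
        else:                                # exit without interacting
            break
    return answered

print(simulate(QUIZ, random.Random(0)))  # ['q1', 'q2'] with this seed
```

Averaging the utility of the answered set over many simulated passes approximates the expected utility of a given question sequence.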
Some Basics: Throughout this paper, we use capital letters to denote sequences and calligraphic letters to denote sets, e.g., a set of questions versus a sorted sequence of that set. For a given sequence of questions, we use subsequence notation to refer to the questions scheduled no later than (resp. before, after, no earlier than) a given slot. Given two sequences, their concatenation is the new sequence obtained by first displaying the first sequence and then displaying the second. For notational simplicity, we define the aggregated continuation probability of a question as the overall probability that the user continues to the next question after interacting with it.
We next introduce an important definition.
Definition 1 (Reachability of a Question)
Given that a sequence of questions is displayed to the user, we define the reachability of a question in the sequence as the probability that the question will be read, i.e., the product of the aggregated continuation probabilities of all questions scheduled before it.
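Under this definition, reachability is a running product over the preceding questions; a sketch with hypothetical parameters (lam, lam_pna, beta, beta_pna are assumed names for the answer, PNA, and continuation probabilities):

```python
def reachability(seq):
    """reach[i] = probability that the question at slot i is read, i.e., the
    product of aggregated continuation probabilities of earlier questions,
    where the aggregated continuation probability of a question is
    lam * beta + lam_pna * beta_pna."""
    reach, acc = [], 1.0
    for q in seq:
        reach.append(acc)
        acc *= q["lam"] * q["beta"] + q["lam_pna"] * q["beta_pna"]
    return reach

seq = [
    {"lam": 0.9, "lam_pna": 0.05, "beta": 0.95, "beta_pna": 0.8},
    {"lam": 0.7, "lam_pna": 0.10, "beta": 0.90, "beta_pna": 0.7},
    {"lam": 0.5, "lam_pna": 0.20, "beta": 0.85, "beta_pna": 0.6},
]
print(reachability(seq))  # the first question is always read: reach[0] == 1.0
```

Note that reachability decays monotonically along the sequence, which is why the ordering of questions matters.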
In Section 6, we also consider an extended model with slot-dependent answer-through-rates: if a question is scheduled at slot i, its answer-through-rate is its intrinsic answer-through-rate multiplied by a slot-dependent decay factor associated with slot i.
3.2 Problem Formulation
Given any sequence of questions S, we define its expected utility as

u(S) = Σ_{T ⊆ S} Pr[T | S] · f(T),

where Pr[T | S] denotes the probability that we receive answers to exactly the questions in T given that S is displayed to the user. Our objective is to identify the best sequence of questions subject to a cardinality constraint. We next present the formal definition of our problem.
P1: maximize the expected utility u(S) over sequences S, subject to the cardinality constraint |S| ≤ k.
The following theorem states that this problem is intractable in general.
Problem P1 is NP-hard.
The proof is provided in the appendix.
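Despite the hardness, tiny instances can still be solved exactly by enumerating every ordered sequence of at most k questions and evaluating the expected utility by recursion over the scanning process. The sketch below uses an illustrative coverage-count utility and made-up parameters, not data from this paper:

```python
from itertools import permutations

covers = {"q1": {"loc"}, "q2": {"loc", "age"}, "q3": {"income"}}
params = {
    "q1": {"lam": 0.9, "lam_pna": 0.05, "beta": 0.95, "beta_pna": 0.8},
    "q2": {"lam": 0.7, "lam_pna": 0.10, "beta": 0.90, "beta_pna": 0.7},
    "q3": {"lam": 0.5, "lam_pna": 0.20, "beta": 0.85, "beta_pna": 0.6},
}

def f(T):
    """Toy utility: number of attributes covered by the answered questions."""
    return len(set().union(*(covers[q] for q in T)) if T else set())

def expected_utility(seq, i=0, answered=()):
    """Exact expectation of f over the question scanning process."""
    if i == len(seq):
        return f(answered)
    p = params[seq[i]]
    with_q = answered + (seq[i],)
    u = p["lam"] * p["beta"] * expected_utility(seq, i + 1, with_q)             # answer, continue
    u += p["lam"] * (1 - p["beta"]) * f(with_q)                                 # answer, exit
    u += p["lam_pna"] * p["beta_pna"] * expected_utility(seq, i + 1, answered)  # PNA, continue
    u += p["lam_pna"] * (1 - p["beta_pna"]) * f(answered)                       # PNA, exit
    u += (1 - p["lam"] - p["lam_pna"]) * f(answered)                            # exit immediately
    return u

k = 2
best = max((s for r in range(1, k + 1) for s in permutations(params, r)),
           key=expected_utility)
print(best, round(expected_utility(best), 4))
```

On this instance the optimum places the broad-coverage question first, consistent with the intuition that externalities make the ordering matter; the enumeration is factorial in the pool size, hence the need for the approximation algorithms developed next.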
4 Warming Up: Question Selection and Sequencing with No PNA Option
We first study the case where “PNA” is not an option. In other words, the user is left with two options after reading the current question: either answer it or exit the quiz. The reason for investigating this restricted case is twofold: (1) Although the benefit of including a PNA option has been discussed in much existing work (Schuman and Presser 1996, Hawkins and Coney 1981), opponents believe that providing a PNA option could have a negative impact on the quality of the answers, because some users tend to simply tick the PNA option so as to minimize the effort required to complete the quiz (Poe et al. 1988, Sanchez and Morchio 1992). Since both arguments are empirically validated by previous studies, we study both cases in this work. (2) Technically speaking, the case with no PNA option is a special case of the original problem, obtained by setting the PNA probability of every question to zero; starting with this simplified case makes it easier to explain the approach used to solve the general case.
We first present a simplified question scanning process under this restricted case as follows:
The user starts with the question placed at the first slot. After reading a question, she takes one of the following three actions: (1) answer it and continue to read the next question; (2) answer it and exit the quiz; (3) exit the quiz without answering. The above process repeats until the user exits the quiz or no more questions remain.
4.1 Algorithm Design
The general framework of our method is inspired by the early work of Kempe and Mahdian (2008); however, their approach only works for linear objective functions. Before presenting our algorithm, we first introduce a useful property of any optimal solution. In particular, given an optimal solution, we show that little is lost by discarding those questions whose reachability is sufficiently small.
For any , there is a solution of value at least such that and .
Let denote the -th question in . Assume is the last question in whose reachability is no smaller than , e.g., . Recall that we use (resp. ) to denote the sequence of questions scheduled after (resp. no later than) slot . Therefore the reachability of every question in is no smaller than .
We first show that . Let denote the event that is answered by the user. Let denote the event that the first question of has been read and is answered by the user.
It follows that . Since by the definition of , and since is the optimal solution, we have . Since every question in can be reached with probability at least and , is a valid solution. ∎
Lemma 3 allows us to ignore those questions whose reachability is small, at the expense of a bounded decrease in utility. This motivates us to introduce a new problem P2 by only considering those questions whose reachability is sufficiently high. The objective function of P2 is
The goal of P2 is to find a solution that maximizes subject to three constraints. After solving P2 (approximately) and obtaining a solution , we build the final solution to the original problem based on .
We next take a closer look at P2. Intuitively, the solution to P2 is composed of two parts: a prefix and a designated last question, i.e., the last question is scheduled after the prefix. The reason we separate the last question from the other questions in the solution is that, because it is scheduled at the last slot, there is no restriction on its aggregated continuation probability. Constraint (C1) ensures that the reachability of every question in our solution is sufficiently high, and constraint (C2) ensures the feasibility of the final solution, i.e., the size of our solution is upper bounded by the cardinality constraint.
In the rest of this section, we focus on solving P2. We first show that is submodular as a function of .
For any fixed , is a submodular function of .
We first show that for any fixed , is submodular as a function of . For any and , we have , because is submodular and . Thus, is a submodular function of . It follows that for any fixed , is submodular, because a linear combination of two submodular functions is submodular. ∎
As a consequence of Lemma 4, for any fixed choice of the last question, P2 is a submodular maximization problem subject to two linear constraints (constraints (C1) and (C2)), and there exists an approximation algorithm for this problem (Kulik et al. 2009) whose ratio involves a constant whose value is arbitrarily close to 2.718. In order to solve P2, we exhaustively try each question as the one scheduled at the last slot; for each such question, we run the approximation algorithm to obtain a candidate solution. Among all candidate solutions, we take the one with the largest utility and, appending the designated last question to an arbitrary ordering of it, return the result as the final solution to the original problem.
We present the detailed description of our solution in Algorithm 1.
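The overall structure, enumerating the question placed at the last slot and filling the remaining slots subject to a reachability threshold, can be sketched as follows. The plain greedy step below merely stands in for the constrained submodular maximization routine of Kulik et al. (2009); the threshold, utility, and parameters are illustrative assumptions:

```python
covers = {"q1": {"loc"}, "q2": {"loc", "age"}, "q3": {"income"}}
cont = {"q1": 0.9 * 0.95, "q2": 0.7 * 0.9, "q3": 0.5 * 0.85}  # lam * beta per question

def f(S):
    return len(set().union(*(covers[q] for q in S)) if S else set())

def greedy_prefix(pool, k, theta):
    """Greedily add questions by marginal gain while keeping the reachability
    of every later slot (including the designated last question) >= theta."""
    seq, reach = [], 1.0
    while len(seq) < k:
        cands = [x for x in pool if x not in seq and reach * cont[x] >= theta]
        if not cands:
            break
        best = max(cands, key=lambda x: f(set(seq) | {x}) - f(set(seq)))
        seq.append(best)
        reach *= cont[best]
    return seq

def build_quiz(pool, k, theta=0.3):
    """Enumerate the question placed at the last slot, which is exempt from
    the continuation requirement, and keep the best candidate quiz."""
    best_quiz, best_val = [], -1.0
    for last in pool:
        cand = greedy_prefix([x for x in pool if x != last], k - 1, theta) + [last]
        if f(set(cand)) > best_val:
            best_quiz, best_val = cand, f(set(cand))
    return best_quiz

print(build_quiz(list(covers), k=3))
```

Exempting the last slot from the continuation requirement is exactly what allows a low-continuation (e.g., sensitive) question to be included without hurting the reachability of the others.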
4.2 Performance Analysis
We next analyze the performance bound of Algorithm 1. We first present some preparatory lemmas. Since for each candidate last question we find an approximate solution, and the final choice has the maximum utility among all returned solutions, the following lemma holds.
The solution returned by Algorithm 1 is an approximate solution to P2.
Now we are ready to provide a performance bound on the final solution. We first show that the utility of the final solution is close to the objective value it achieves in P2.
Because satisfies constraint (C2) in problem P2, we have . It follows that with probability at least , all questions in will be answered and will be read. Moreover, the probability that will be answered by the user is conditioned on all questions in being answered and being read. It follows that . ∎
We present the main theorem as follows.
For any ,
For any , let denote the optimal solution subject to and . Lemma 3 implies . Therefore, in order to prove this theorem, it suffices to prove .
Assume , and let denote the subsequence of obtained by excluding the last question . Because and , is a valid solution to problem P2. Therefore, because Algorithm 1 finds an approximate solution to P2 (Lemma 5), the claimed bound follows. We next prove that .
By choosing , we have
5 Question Selection and Sequencing under General Model
We now add PNA option to our model. The workflow of our solution is similar in structure to Algorithm 1: we first introduce a new problem, then build the final solution based on the solution to that new problem. However, the way we define the new problem as well as the analysis of our solution are largely different from the one used in the previous model.
5.1 Algorithm Design
For any given sequence of questions , we use and interchangeably to denote a random set obtained by including each question independently with probability . We first introduce a new problem whose objective function is
The goal of is to find a solution that maximizes function . Similar to constraints (C1) and (C2) used in , we use constraint (C3) (resp. constraint (C4)) to ensure that all selected questions can be reached with high probability (resp. the size of the solution is upper bounded by ).
In the following lemma we show that if is fixed, then , as a function of , is submodular.
For any fixed , is a submodular function of .
Assume is a (random) realization of , and let denote the probability that is realized. We have
We next prove that for any fixed and , as a function of is submodular. For any and , we have . The inequality holds because and is submodular. Thus is a submodular function of . By a similar proof, we can show that is also a submodular function of . It follows that is a submodular function of , because a linear combination of submodular functions is submodular. ∎
5.2 Performance Analysis
Recall that we use to denote the prefix of whose reachability is no smaller than , e.g., . We first show that the expected utility of random set is at least .
Assume is the optimal solution, .
Consider any question that is scheduled at slot of . The probability that is read by the user is , and thus the probability that is answered by the user is . Let denote a random set obtained by including each question independently with probability .
First of all, we show that . This is based on two observations: (1) for each question , the marginal probability that is included in is identical to the probability that is answered by the user; (2) is a submodular function, and the correlation gap of a monotone submodular function, as defined in Section 3.1.1, is bounded by e/(e − 1). Thus .
Next, it is easy to verify that . This is because for every , we have ; it follows that has a larger probability of being included in than in . Then we have .
and together imply that . ∎
We next prove that the utility of is close to the expected utility of a random set .
For any ,
We first introduce some useful notations. Given the solution that is returned from Algorithm 2, let denote the (random) set of questions answered by the user given . Then we have . For notational simplicity, we use (resp. ) to denote (resp. ) for short in the rest of this proof. Because , we focus on proving .
Define and . The main result that we will prove is that for any fixed ,
Then the theorem follows from Eq. (8) since
Based on this observation, we next prove Eq. (8).
Notice that the distribution of is determined by the Markov process defined in the previous section. For ease of analysis, for any fixed slot , we next introduce an alternative way to generate the distribution of : For every , (1) we first determine whether will be answered or not given that has been read by the user, and (2) then determine whether will be read or not. In particular, we first construct a random set by including each question independently with probability . Let denote the -th question in , we generate another random set based on as follows:
Initially, . Starting from , if (resp. ), add to with probability (resp. ), repeat this step with ; otherwise, return . Return also once no more questions remain.
Intuitively, includes those questions which are answered by the user given that they have been read, and includes those questions which can be read by the user. It is easy to verify that has the same distribution as . We use (resp. ) to denote (resp. ); it follows that has the same distribution as .
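The two-stage construction can be sketched as follows, under the simplifying assumption (made only for this illustration) that a question that would not be answered is treated as receiving a PNA response; the parameters and names are hypothetical:

```python
import random

params = {
    "q1": {"lam": 0.9, "beta": 0.95, "beta_pna": 0.8},
    "q2": {"lam": 0.7, "beta": 0.90, "beta_pna": 0.7},
}

def sample(seq, rng):
    """Return (Omega, R, T): answered-if-read, read, and answered questions."""
    # Stage 1: decide independently, for each question, whether it would be
    # answered if it were read (the random set Omega).
    omega = {q for q in seq if rng.random() < params[q]["lam"]}
    # Stage 2: walk the sequence to decide which questions are read (R);
    # continuation uses beta after an answer and beta_pna otherwise.
    R = [seq[0]]
    for i in range(len(seq) - 1):
        q = seq[i]
        cont = params[q]["beta"] if q in omega else params[q]["beta_pna"]
        if rng.random() < cont:
            R.append(seq[i + 1])
        else:
            break
    T = [q for q in R if q in omega]  # questions actually answered
    return omega, R, T

omega, R, T = sample(["q1", "q2"], random.Random(1))
print(omega, R, T)
```

Decoupling the answer decisions (Stage 1) from the reading process (Stage 2) is what makes the conditioning argument in the proof tractable: the answered set is the intersection of an independent random set with a prefix determined by the Markov walk.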
We next focus on bounding the value of . Let (resp. ) denote the event that is included in (resp. ). For notational simplicity, define as the marginal benefit of given . For any fixed , we have
Eq. (9) is due to the observation that has the same distribution as . Inequality (10) holds because is a submodular function. Inequalities (11) and (12) hold because both and are independent of the realization of . Inequality (5.2) follows from the fact that every question in has reachability no less than , i.e., . ∎
Now we are ready to present the main theorem of this paper.
For any , .