In assortment planning, the seller’s goal is to select a subset of products (called an assortment) to offer to a customer so as to maximize the expected revenue. This task can be formulated as an optimization problem given the revenue generated from selling each product, along with a probabilistic model of the customer’s preferences for the products. Such a discrete choice model must capture the customer’s substitution behavior when, for instance, the offered assortment does not contain the customer’s most preferred product.
Our focus in this paper is the Markov chain choice model (MCCM) proposed by Blanchet et al. (2016). In this model, the product selected by the customer is determined by a Markov chain over products where the products in the offered assortment are absorbing states. The current state represents the desired product; if that product is not offered, the customer transitions to another product according to the Markov chain probabilities, and the process continues until the desired product is offered or the customer leaves. MCCM generalizes widely used discrete choice models such as the multinomial logit model (Luce, 1959; Plackett, 1975), as well as other generalized attraction models (Gallego et al., 2014); it also well-approximates other random utility models found in the literature such as mixed multinomial logit models (McFadden and Train, 2000). At the same time, the MCCM permits computationally efficient unconstrained assortment optimization as well as efficient approximation algorithms in the constrained case (Blanchet et al., 2016; Désir et al., 2015); this stands in contrast to some richer models such as mixed multinomial logit models (Rusmevichientong et al., 2010) and the nested logit model (Davis et al., 2014) for which assortment optimization is generally intractable. This combination of expressiveness and computational tractability makes MCCM very attractive for use in assortment planning.
A crucial step in this overall enterprise—e.g., before assortment optimization may take place—is the estimation of the choice model’s parameters from observational data. Parameter estimation for MCCM is only briefly considered in the original work of Blanchet et al. (2016). In that work, it is shown that the parameters can be determined from the choice probabilities for the all-products assortment, together with the assortments comprised of all-but-one product. This is not satisfactory because it may be unrealistic or unprofitable to offer assortments of such large cardinality. Therefore, it is desirable to be able to determine the parameters from choice probabilities for smaller cardinality assortments. We note that this is indeed possible for simpler choice models such as the multinomial logit model (see, e.g., Train, 2009), but these simpler models are limited in expressiveness—for example, they cannot express heterogeneous substitution behavior.
In this paper, we show that the MCCM parameters can be identified from the choice probabilities for assortments of sizes as small as two and three, independent of the total number of products (see Footnote 1). We also give a simple and efficient algorithm for reconstructing the parameters from these choice probabilities.
Footnote 1: We focus on identifiability because estimation of choice probabilities from observational data is fairly straightforward, especially when the assortments have small cardinality. However, this issue is revisited in Section 5 in the context of sample complexity.
2 Model and notation
In this section, we describe the Markov chain choice model (MCCM) of Blanchet et al. (2016), along with notations used for choice probabilities and model parameters.
The set of $N$ products in the system is denoted by $[N] := \{1, 2, \dots, N\}$. The “no purchase” option is denoted by product $0$. Upon offering an assortment $S \subseteq [N]$, the set of possible outcomes is $S_+ := S \cup \{0\}$: either some product in $S$ is purchased, or no product is purchased.
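Underlying the MCCM is a Markov chain with state space $[N]_+ = \{0, 1, \dots, N\}$. The (true) parameters of the model are the initial state probabilities $\lambda = (\lambda_i)_{i \in [N]_+}$ and the transition probabilities $\rho = (\rho_{i,j})_{(i,j) \in [N]_+ \times [N]_+}$ (a row stochastic matrix). The transition probabilities satisfy the following properties:
- $\rho_{0,0} = 1$ and $\rho_{0,j} = 0$ for $j \in [N]$ (i.e., the “no purchase” state is absorbing).
- $\rho_{i,i} = 0$ for $i \in [N]$ (i.e., no self-loops in product states).
- The submatrix $(\rho_{i,j})_{(i,j) \in [N] \times [N]}$ is irreducible.
We use $\rho_i$ to denote the $i$-th row of $\rho$.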
In MCCM, the customer arrives at a random initial state $X_1$ chosen according to $\lambda$. At time $t = 1, 2, \dots$:
- If $X_t = 0$, the customer leaves the system without purchasing a product.
- If the product $X_t$ is offered (i.e., $X_t \in S$), the customer purchases $X_t$ and leaves.
- If the product $X_t$ is not offered (i.e., $X_t \notin S_+$), the customer transitions to a new random state $X_{t+1}$ chosen according to $\rho_{X_t}$, and the process continues in time step $t+1$ as if the customer had initially arrived at $X_{t+1}$.
Another way to describe this process is that the Markov chain distribution is temporarily modified so that the states $S_+$ are absorbing, and the customer purchases the product upon reaching such a state (or makes no purchase if the state is $0$). The irreducibility of $(\rho_{i,j})_{(i,j) \in [N] \times [N]}$ ensures that the customer eventually leaves the system (i.e., an absorbing state is reached). Note that only the identity of the final (absorbing) state is observed, as it corresponds to either a purchase or non-purchase. The $X_t$ themselves do not correspond to observable customer behavior, and hence the model parameters $\lambda$ and $\rho$ cannot be directly estimated.
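The absorbing-chain view of the process can be turned into an exact computation of the final-state probabilities. The following is a minimal sketch rather than code from the paper: it assumes states are indexed 0 to N with state 0 as “no purchase”, and the names `choice_probs`, `lam`, and `rho` are hypothetical. It relies on the standard absorbing Markov chain identity that the expected number of visits to the non-offered products solves a linear system, after which one more transition lands in a final state.

```python
import numpy as np

def choice_probs(lam, rho, S):
    """Probability that each state is the final state when assortment S is offered.

    lam: initial state probabilities over states 0..N (state 0 = "no purchase").
    rho: (N+1) x (N+1) row-stochastic transition matrix.
    S:   set of offered products, a subset of {1, ..., N}.
    """
    lam, rho = np.asarray(lam, float), np.asarray(rho, float)
    n = len(lam)
    absorbing = sorted(set(S) | {0})      # offered products and "no purchase"
    transient = [k for k in range(n) if k not in absorbing]
    pi = np.zeros(n)
    pi[absorbing] = lam[absorbing]        # customer starts in a final state
    if transient:
        Q = rho[np.ix_(transient, transient)]   # moves among non-offered products
        R = rho[np.ix_(transient, absorbing)]   # moves into final states
        # Expected visits to each non-offered product: lam_T^T (I - Q)^{-1}.
        visits = np.linalg.solve((np.eye(len(transient)) - Q).T, lam[transient])
        pi[absorbing] += visits @ R
    return pi

# Hypothetical 3-product example (illustrative numbers only).
lam = [0.1, 0.3, 0.4, 0.2]
rho = [[1.0, 0.0, 0.0, 0.0],
       [0.2, 0.0, 0.5, 0.3],
       [0.1, 0.6, 0.0, 0.3],
       [0.3, 0.3, 0.4, 0.0]]
probs = choice_probs(lam, rho, {1, 2})
```

With only product 3 withheld, a customer who starts at product 3 makes exactly one transition, so each entry of `probs` is the initial probability plus 0.2 times the corresponding entry of row 3 of `rho`.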
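The choice probabilities are denoted by $\pi(j, S)$ for $j \in S_+$ and $S \subseteq [N]$: this is the probability that $j$ is the final state in the aforementioned process. Blanchet et al. (2016) relate the choice probabilities and the parameters $\lambda$ and $\rho$ as follows:
$$\lambda_j = \pi(j, [N]) \ \text{ for } j \in [N]_+, \qquad \rho_{i,j} = \frac{\pi(j, [N] \setminus \{i\}) - \pi(j, [N])}{\pi(i, [N])} \ \text{ for } i \in [N],\ j \in [N]_+ \setminus \{i\}. \tag{1}$$
The relations in Equation (1) show that the parameters may be identified from choice probabilities for the assortments $[N]$ and $[N] \setminus \{i\}$ for $i \in [N]$. These choice probabilities may be directly estimated from observations upon offering such assortments to customers.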
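To make the identification from large assortments concrete, here is a small sketch. It is derived directly from the process description (when only product i is withheld, a customer who starts at i makes exactly one transition), not copied from the paper; the function name `recover_from_large_assortments` and the array layout (states 0 to N, state 0 as “no purchase”) are assumptions, and every product is assumed to have positive probability under the all-products assortment.

```python
import numpy as np

def recover_from_large_assortments(pi_full, pi_drop):
    """Recover (lam, rho) from choice probabilities of large assortments.

    pi_full:    array over states 0..N for the all-products assortment.
    pi_drop[i]: array over states 0..N for the assortment without product i.
    Assumes pi_full[i] > 0 for every product i.
    """
    lam = np.asarray(pi_full, float)
    n = len(lam)
    rho = np.zeros((n, n))
    rho[0, 0] = 1.0                       # "no purchase" is absorbing
    for i in range(1, n):
        # Without product i: pi_drop[i][j] = lam[j] + lam[i] * rho[i, j] for j != i.
        rho[i] = (np.asarray(pi_drop[i], float) - lam) / lam[i]
        rho[i, i] = 0.0                   # no self-loops
    return lam, rho

# Hypothetical ground truth, used only to generate consistent inputs.
lam_true = np.array([0.1, 0.3, 0.4, 0.2])
rho_true = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.2, 0.0, 0.5, 0.3],
                     [0.1, 0.6, 0.0, 0.3],
                     [0.3, 0.3, 0.4, 0.0]])
pi_drop = {}
for i in range(1, 4):
    p = lam_true + lam_true[i] * rho_true[i]   # one forced transition out of i
    p[i] = 0.0                                 # product i is never chosen
    pi_drop[i] = p
lam_hat, rho_hat = recover_from_large_assortments(lam_true, pi_drop)
```

The sketch needs N + 1 assortments in total, each containing at least N - 1 products, which is the requirement the main result relaxes.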
3 Main result
The following theorem establishes identifiability of the MCCM parameters from choice probabilities for assortments of sizes as small as two and three.
Theorem 1. There is an efficient algorithm that, for any $K \in \{2, 3, \dots, N-1\}$, when given as input the choice probabilities for all assortments of cardinality $K$ and $K+1$ for a Markov chain choice model, returns the parameters $\lambda$ and $\rho$ of the model.
The number of assortments for which the algorithm actually requires choice probabilities is when , which is far fewer than , the total numbers of assortments of sizes and . The details of this bound are shown following the proof of Theorem 1. However, to simplify the presentation, we describe our parameter recovery algorithm as using choice probabilities for all assortments of sizes and .
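The main steps of our algorithm, shown as Algorithm 1, involve setting up and then solving systems of linear equations that (as we will prove) determine the unknown parameters $\lambda$ and $\rho$. (Note that $\rho_0$ is already known.) The coefficients of the linear equations are determined by the given choice probabilities via conditional choice probabilities $\pi(j, S \mid i)$ for $i \in [N]_+$ and $j \in S_+$, defined as follows:
$$\pi(j, S \mid i) := \Pr(\text{final state is } j \text{ when } S \text{ is offered} \mid \text{initial state is } i). \tag{2}$$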
Note that the initial state in the MCCM is not observed, so these conditional probabilities cannot be directly estimated. Nevertheless, they can be indirectly estimated via the following relationship between the conditional choice probabilities and the (unconditional) choice probabilities.
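Lemma 1. For any $S \subseteq [N]$, $j \in S_+$, and $i \in [N]_+$,
$$\pi(j, S \mid i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \in S_+ \setminus \{j\}, \\ \dfrac{\pi(j, S) - \pi(j, S \cup \{i\})}{\pi(i, S \cup \{i\})} & \text{if } i \in [N] \setminus S. \end{cases}$$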
The cases where () and () are clear from the definition in Equation (2). It remains to handle the case where . Fix such a product , and observe that
The penultimate step uses the Markov property and the case condition that . Rearranging the equation gives the relation claimed by the lemma in this case. ∎
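For reference, the argument in the proof can be written out as the following worked equation. This is a sketch in assumed notation (the symbols $\pi(j, S)$, $\pi(j, S \mid i)$, $X_t$, and the stopping time $\tau$ are choices made here, not necessarily the paper's), and it requires $\pi(i, S \cup \{i\}) > 0$.

```latex
% Assumed notation: \pi(j,S) is the probability that j is the final state under S,
% and \pi(j,S \mid i) is the same probability given that the initial state is i.
% Fix a product i \notin S and let \tau be the first time the chain enters S_+ \cup \{i\}.
\begin{align*}
\pi(j, S)
  &= \Pr(X_\tau = j) + \Pr(X_\tau = i)\,\Pr(\text{final state is } j \mid X_\tau = i) \\
  &= \pi(j, S \cup \{i\}) + \pi(i, S \cup \{i\})\,\pi(j, S \mid i),
\end{align*}
% where the second line uses the Markov property. Rearranging,
\[
\pi(j, S \mid i) = \frac{\pi(j, S) - \pi(j, S \cup \{i\})}{\pi(i, S \cup \{i\})}.
\]
```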
Lemma 1 shows that the conditional choice probabilities for assortments of size $K$ can be determined from the unconditional choice probabilities of assortments of size $K$ and $K+1$. The systems of linear equations used in Algorithm 1 (Equations (3) and (4)) are defined in terms of these conditional choice probabilities and hence are ultimately defined in terms of the unconditional choice probabilities provided as input to Algorithm 1.
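The overall recovery procedure can be sketched end to end as follows. This is an illustrative reconstruction under stated assumptions, not the paper's Algorithm 1 verbatim: it assumes that Equation (3) is the one-step relation (the conditional probability from a non-offered product i equals the `rho[i]`-weighted average of the conditional probabilities) and that Equation (4) expresses each unconditional probability as the `lam`-weighted average of conditional probabilities. The helper names, the least-squares solver, and the example model are all hypothetical, and every division assumes a positive denominator.

```python
import itertools
import numpy as np

def choice_probs(lam, rho, S):
    """Exact final-state probabilities for assortment S (state 0 = "no purchase")."""
    n = len(lam)
    absorbing = sorted(set(S) | {0})
    transient = [k for k in range(n) if k not in absorbing]
    pi = np.zeros(n)
    pi[absorbing] = lam[absorbing]
    if transient:
        Q = rho[np.ix_(transient, transient)]
        R = rho[np.ix_(transient, absorbing)]
        visits = np.linalg.solve((np.eye(len(transient)) - Q).T, lam[transient])
        pi[absorbing] += visits @ R
    return pi

def conditional(pi, j, S, i):
    """pi(j, S | i) from unconditional probabilities of sizes |S| and |S| + 1."""
    if i == 0 or i in S:                  # the initial state is already final
        return 1.0 if i == j else 0.0
    Si = S | {i}
    return (pi(j, S) - pi(j, Si)) / pi(i, Si)   # assumes pi(i, Si) > 0

def recover(pi, N, K):
    """Set up both families of linear systems and solve them by least squares."""
    states = range(N + 1)
    lam_A, lam_b = [], []
    rho_A = {i: [] for i in range(1, N + 1)}
    rho_b = {i: [] for i in range(1, N + 1)}
    for combo in itertools.combinations(range(1, N + 1), K):
        S = frozenset(combo)
        for j in sorted(S | {0}):
            v = np.array([conditional(pi, j, S, k) for k in states])
            lam_A.append(v)               # sum_k lam[k] * pi(j,S|k) = pi(j,S)
            lam_b.append(pi(j, S))
            for i in rho_A:
                if i not in S:            # sum_k rho[i,k] * pi(j,S|k) = pi(j,S|i)
                    rho_A[i].append(v)
                    rho_b[i].append(v[i])
    lam = np.linalg.lstsq(np.array(lam_A), np.array(lam_b), rcond=None)[0]
    rho = np.zeros((N + 1, N + 1))
    rho[0, 0] = 1.0
    for i in rho_A:
        keep = [k for k in states if k != i]   # impose rho[i, i] = 0
        A = np.array(rho_A[i])[:, keep]
        rho[i, keep] = np.linalg.lstsq(A, np.array(rho_b[i]), rcond=None)[0]
    return lam, rho

# Hypothetical 4-product model, used only to generate exact choice probabilities.
lam_true = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
rho_true = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                     [0.1, 0.0, 0.4, 0.3, 0.2],
                     [0.2, 0.3, 0.0, 0.3, 0.2],
                     [0.1, 0.2, 0.5, 0.0, 0.2],
                     [0.3, 0.3, 0.2, 0.2, 0.0]])
pi = lambda j, S: choice_probs(lam_true, rho_true, S)[j]
lam_hat, rho_hat = recover(pi, N=4, K=2)
```

Note that `recover` only ever queries `pi` on assortments of sizes K and K + 1. Fixing `rho[i, i] = 0` is a choice made in this sketch: without it, the system for row i determines the row only up to a multiple of the vector $e_i - \rho_i$.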
4 Proof of Theorem 1
In this section, we give the proof of Theorem 1.
4.1 The case without the “no purchase” option
For the sake of clarity, we first give the proof in the case where the “no purchase” option is absent. This can be regarded as the special case where $\lambda_0 = 0$ and $\rho_{i,0} = 0$ for all $i \in [N]$. So here we regard $\lambda$ and $\rho$ as indexed by the products $[N]$ only. The general case will easily follow from the same arguments with minor modifications.
4.1.1 Proof strategy
We make use of the following result about M-matrices, i.e., the class of matrices that can be expressed as $sI - B$ for some $s > 0$ and non-negative matrix $B$ with spectral radius at most $s$. (Here, $I$ denotes the identity matrix of appropriate dimensions.) In particular, the matrix $I - \rho$ is a (singular) M-matrix that is also irreducible.
Lemma 2 (See, e.g., Theorems 6.2.3 & 6.4.16 in Berman and Plemmons, 1994). If $A$ is an irreducible M-matrix (possibly singular), then every principal submatrix of $A$ (see Footnote 2), other than $A$ itself, is non-singular. If $A$ is also singular, then it has rank $n - 1$, where $A$ is $n \times n$.
Footnote 2: Recall that a principal submatrix of a matrix $A$ is a submatrix obtained by removing from $A$ the rows and columns indexed by some set.
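The two conclusions of Lemma 2 can be checked numerically on a small instance. The sketch below uses a hypothetical irreducible, row-stochastic matrix `rho` with zero diagonal (four products, no “no purchase” state), forms the matrix obtained by subtracting it from the identity, and inspects its rank and its proper principal submatrices. It is an illustration on one example, not a proof.

```python
import itertools
import numpy as np

# Hypothetical irreducible, row-stochastic matrix with zero diagonal.
rho = np.array([[0.0, 0.5, 0.3, 0.2],
                [0.4, 0.0, 0.4, 0.2],
                [0.1, 0.6, 0.0, 0.3],
                [0.3, 0.3, 0.4, 0.0]])
M = np.eye(4) - rho          # singular: every row of M sums to zero

rank_M = np.linalg.matrix_rank(M)

# Determinants of all proper principal submatrices
# (rows and columns restricted to the same index set idx).
proper_dets = [
    np.linalg.det(M[np.ix_(idx, idx)])
    for size in range(1, 4)
    for idx in itertools.combinations(range(4), size)
]
```

Here `rank_M` is one less than the dimension, and every entry of `proper_dets` is nonzero, as the lemma predicts for this example.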
For each and
, define the vector
For each , the collection of the vectors provide the left-hand side coefficients in Equation (3) for . We’ll show that the span of these vectors (in fact, a particular subset of them) has dimension at least . This is sufficient to conclude that is the unique solution to the system of equations in Equation (3) because it has at most unknown variables, and it is clear that satisfies the system of equations. (In fact, there are really only unknown variables, because we can force and .) For the same reason, it is also sufficient to conclude that is the unique solution to the system of equations in Equation (4) (where, in fact, we may use all vectors ).
4.1.2 Rank of linear equations from a single assortment
We begin by characterizing the space spanned by for a fixed . We claim, by Lemma 1, that the vectors in are linearly independent. Indeed, if this collection of vectors is arranged in a matrix , then the submatrix obtained by selecting rows corresponding to is the identity matrix. Thus we have proved
For any , .
Note that in the case , we are done. But when , the linear equations given by the may not uniquely determine the for . To overcome this, we need to be able to combine linear equations derived from multiple assortments. However, for a sum of subspaces and ,
unless and are orthogonal. In our case, the subspaces and for different assortments and are not necessarily orthogonal (even if and are disjoint). So a different argument is needed.
4.1.3 Rank of linear equations from multiple assortments
Our aim is to show that the intersection of subspaces for different assortments and cannot have high dimension. We do this by showing that the intersection is orthogonal to a subspace of high dimension.
For each , let denote the -th row of the matrix (which is an M-matrix). That is, , where is the -th coordinate basis vector. Recall that if , then satisfies Equation (3). This fact can be written in our new notation as
In other words,
For any , .
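The orthogonality relation can also be checked numerically. In the sketch below, which uses assumed notation and hypothetical numbers (products are 0-indexed and there is no “no purchase” state, matching this subsection's setting), `V[k, a]` is the conditional probability that the final state is `S[a]` given initial state `k`, and `r[i]` is the `i`-th row of the identity minus `rho`. The inner products vanish exactly for the non-offered products.

```python
import numpy as np

# Hypothetical irreducible, row-stochastic matrix with zero diagonal.
rho = np.array([[0.0, 0.5, 0.3, 0.2],
                [0.4, 0.0, 0.4, 0.2],
                [0.1, 0.6, 0.0, 0.3],
                [0.3, 0.3, 0.4, 0.0]])
S = [0, 1]                   # offered products (0-indexed in this sketch)
T = [2, 3]                   # non-offered products

# Conditional choice probabilities: identity on offered products,
# and (I - Q)^{-1} R on non-offered products (absorbing-chain formula).
V = np.zeros((4, len(S)))
V[S, np.arange(len(S))] = 1.0
Q = rho[np.ix_(T, T)]
R = rho[np.ix_(T, S)]
V[T] = np.linalg.solve(np.eye(len(T)) - Q, R)

r = np.eye(4) - rho          # r[i] = e_i - rho_i
inner = r @ V                # inner[i, a] = <r_i, column a of V>
```

The rows of `inner` indexed by `T` are zero because, from a non-offered product, the conditional probability equals its one-step average under `rho`. The rows indexed by `S` are not zero in general.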
Now consider two assortments and , and the intersection of their respective subspaces. It follows from Lemma 4 that
This orthogonality is the key to lower-bounding the dimension of the sum of these subspaces, which we capture in the following general lemma.
Let be a family of subsets of , be a subset of , and . Define the subspaces
Let . Fix any . Then, by Lemma 4, for all . In other words,
where , and
To determine , observe that is the span of certain rows of the M-matrix . By Lemma 2, the principal submatrix of corresponding to is either non-singular (when ) or is itself; in either case, it has rank . Hence,
as well. Combining the dimension formula with the last two equation displays gives
The claim now follows from Lemma 3. ∎
4.1.4 Choice of assortments
We now choose a collection of assortments and argue, via Lemma 3 and Lemma 5, that they define linear equations of sufficiently high rank. Specifically, for each , we need a collection such that each does not contain , and
Suppose the assortments have a pairwise common intersection for all , and . Then .
Fix any and , and observe that . Consider the collection of size- assortments given by
This concludes the proof of Theorem 1 without the “no purchase” option. ∎
4.2 The general case with the “no purchase” option
We now consider the general case, where the “no purchase” option is present. The main difference relative to the previous subsection is that is no longer irreducible, as the “no purchase” state is absorbing. However, the submatrix is irreducible, so is an irreducible M-matrix.
The definition of , for and , is now taken to be
Because the indexing starts at , we still define to be the -th row of , so . (In particular, is the all-zeros vector.)
For any ,
Here, the key difference is that the dimension is , rather than just .
We now establish an analogue of Lemma 5 (which is typographically nearly identical).
Let be a family of subsets of , be a subset of , and . Define the subspaces
We now use the fact that , which is a submatrix of , is an irreducible M-matrix. By Lemma 2, the principal submatrix of corresponding to is either non-singular (when ) or is ; in either case, it has rank . So we have
Finishing the proof as in Lemma 5, we have
The choice of assortments demonstrating the subspace of required dimension is the same as before, except now we show that the dimension is at least . Again, fix some , and choose the collection of assortments as before (described in and directly before Equation (7)). Following the inductive argument in the proof of Lemma 6, but now using Lemma 7 and Lemma 8 (instead of Lemma 3 and Lemma 5), we have
Since each of the systems of linear equations from Equation (3) and Equation (4) has (at most) unknown variables, we conclude that the and are unique as solutions to their respective systems of linear equations.
This concludes the proof of Theorem 1. ∎
4.3 Total number of assortments required
We now show that the number of assortments for which we need the choice probabilities is for . Indeed, the construction given above based on Lemma 6 can be used to avoid using all assortments of size (and ) in Algorithm 1.
We choose two sets , which shall serve as “common intersection sets” (in the sense used in Section 4.1.4), as follows. The first set is chosen arbitrarily; it serves as the common intersection set for all . The second set is chosen arbitrarily as long as it is disjoint from (which is possible because ); it serves as the common intersection set for .
For each , we need the equations for the assortments for all . Obtaining the equations for one such requires choice probabilities for assortments and for as per Lemma 1. In total, for all , we need choice probabilities for assortments. For the remaining , we use the same argument for the disjoint common intersection set , and thus require the choice probabilities for at most another assortments.
5 Discussion
Our main result establishes the identifiability of MCCM parameters from choice probabilities for assortments of sizes different from $N-1$ and $N$. This is important because real systems often have cardinality constraints on the assortment sizes. While such constraints are typically considered in the context of assortment optimization (see, e.g., Désir et al., 2015), they are also important in the context of parameter estimation.
One complication of using small assortments to estimate the MCCM parameters is that the number of different assortments required may be as large as $O(N^2)$. In contrast, only $N+1$ assortments are needed when the sizes are $N-1$ and $N$. On the other hand, the statistical difficulty of estimating choice probabilities for large assortments may be higher than that of the same task for smaller assortments. So the possible trade-offs in sample complexity are not straightforward from this analysis. This is an interesting question that we leave to future work.
We are grateful to Shipra Agrawal and Vineet Goyal for helpful discussions, and to Vineet for originally suggesting this problem. This work was supported in part by NSF awards DMR-1534910 and IIS-1563785, a Bloomberg Data Science Research Grant, a Sloan Research Fellowship, and the Research Opportunities and Approaches to Data Science (ROADS) grant from the Data Science Institute at Columbia University.
- Berman and Plemmons (1994) Abraham Berman and Robert J Plemmons. Nonnegative matrices in the mathematical sciences. SIAM, 1994.
- Blanchet et al. (2016) Jose Blanchet, Guillermo Gallego, and Vineet Goyal. A Markov chain approximation to choice modeling. Operations Research, 64(4):886–905, 2016.
- Davis et al. (2014) James M Davis, Guillermo Gallego, and Huseyin Topaloglu. Assortment optimization under variants of the nested logit model. Operations Research, 62(2):250–273, 2014.
- Désir et al. (2015) Antoine Désir, Vineet Goyal, Danny Segev, and Chun Ye. Capacity constrained assortment optimization under the Markov chain based choice model. 2015.
- Gallego et al. (2014) Guillermo Gallego, Richard Ratliff, and Sergey Shebalov. A general attraction model and sales-based linear program for network revenue management under customer choice. Operations Research, 63(1):212–232, 2014.
- Luce (1959) R Duncan Luce. Individual choice behavior: A theoretical analysis. Wiley, 1959.
- McFadden and Train (2000) Daniel McFadden and Kenneth Train. Mixed MNL models for discrete response. Journal of Applied Econometrics, pages 447–470, 2000.
- Plackett (1975) Robin L Plackett. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics), pages 193–202, 1975.
- Rusmevichientong et al. (2010) Paat Rusmevichientong, David Shmoys, and Huseyin Topaloglu. Assortment optimization with mixtures of logits. Technical report, School of IEOR, Cornell University, 2010.
- Train (2009) Kenneth E Train. Discrete choice methods with simulation. Cambridge University Press, 2009.