Submodular maximization has been extensively studied in the literature (Nemhauser et al. 1978). The objective is to select a group of items that maximizes a submodular utility function subject to various constraints. Recently, Golovin and Krause (2011) proposed the problem of adaptive submodular maximization, a natural stochastic variant of classical submodular maximization. In particular, they assume that each item is associated with a particular state drawn from a known distribution, and that the only way to learn an item's state is to select that item. Compared with classical submodular maximization, feasible solutions are now policies instead of subsets: the action taken in each step depends on the observations from the previous steps. For example, a typical adaptive policy works as follows: in each step, we select an item, observe its actual state, and then adaptively select the next item based on the observations made so far. They show that an adaptive greedy policy achieves a $1-1/e$ approximation ratio when maximizing an adaptive submodular and adaptive monotone utility function subject to a cardinality constraint. Our problem setting is similar to theirs in that we are also interested in selecting a sequence of items adaptively so as to maximize the expected utility. However, one unique feature of our model is that each item is assigned a continuation probability, defined as the probability that one can select the next item after that item has been selected. The probabilistic continuation setting allows us to capture scenarios where the selecting process may be terminated prematurely. Selecting an item with a low continuation probability decreases the chance that its subsequent items are selected. Therefore, the actual set of items that a policy can select depends on the order in which it selects them. Our setting is motivated by many real-world applications in machine learning, economics, and operations management.
Taking sponsored search advertising as one example, one challenge faced by ad networks is to select a sequence of advertisements to display to an online user. It is often assumed that the visibility of an advertisement is negatively affected by the advertisements that precede it, e.g., a user is less likely to view a new ad after viewing many other advertisements. One common way to capture this effect is to introduce a continuation probability for each advertisement. In particular, Kempe and Mahdian (2008) and Craswell et al. (2008) assume that users scan through the ads in order: after viewing one advertisement, a user decides probabilistically whether to click it, and whether to continue scanning, with an ad-specific continuation probability. As a result, the user may terminate the ad session prematurely.
Our second practical application is sequential product recommendation. One important task of online retailers such as Amazon is to recommend a list of products to each online user. Given a list of recommended products, users scan through them in order: after browsing one product, a user decides probabilistically whether to browse the next product, with a product-dependent continuation probability. After browsing some products, the user makes a purchase decision among these products (including a no-purchase option). As in the first application, the recommendation process stops immediately whenever the user decides not to browse the next product.
One crucial point in the above applications is that the value of a group of items depends not only on which items belong to that group, but also on the specific ordering of those items. This makes our problem different from set optimization problems: we seek the best sequence of items while accounting for the externality that one item imposes on its subsequent items in terms of their chance of being selected. Although sequence selection has attracted increasing attention, most existing results do not apply to our problem. As will be discussed in Section 2, our utility function does not satisfy the “non-decreasing” property, a common assumption in many existing studies. Moreover, while the majority of prior research considers the non-adaptive setting, we focus on the adaptive setting, where we are allowed to dynamically adjust the selecting strategy based on the current observations. To make our problem tractable, we restrict our attention to a class of stochastic utility functions, adaptive cascade submodular functions. Intuitively, any adaptive cascade submodular function must satisfy a diminishing-returns condition under the adaptive setting. We show that the objective functions in several practical application domains satisfy adaptive cascade submodularity. We propose a simple algorithm that achieves a constant approximation ratio.
2 Related Work
Submodular maximization has been extensively studied in the literature (Nemhauser et al. 1978). However, most existing studies focus on set optimization problems whose objective is to select a set of items that maximizes a submodular utility function. Our focus is on identifying the best sequence of items so as to maximize the expected utility. Although our paper focuses on the adaptive setting, we first review some important studies in the field of non-adaptive sequence selection. Streeter and Golovin (2009) considered the sequence optimization problem, prompted by applications such as online resource allocation. They defined the properties of monotonicity and submodularity over sequences instead of sets. Alaei et al. (2010) introduced the notions of sequence submodularity and sequence-non-decreasing functions. Tschiatschek et al. (2017) and Mitrovic et al. (2018) defined the utility of a sequence over the edges of a directed graph connecting the items, together with a submodular function. However, their results do not apply to our setting since our utility function does not satisfy the property of “sequence monotonicity”. Intuitively, under our setting, adding an item to an existing sequence could decrease the utility of the original sequence. For example, assume there is a sequence with positive utility, as well as an item with zero utility and zero continuation probability. Consider the new sequence obtained by placing this item ahead of the original sequence: the selecting process dies immediately after the first item, so the utility of the new sequence is zero, which is smaller than the utility of the original sequence. Zhang et al. (2015) proposed the concept of string submodularity and provided a set of data-dependent approximation bounds for a greedy strategy. It turns out that our utility function under the non-adaptive setting is string submodular; however, the worst-case performance of the greedy strategy (Zhang et al. 2015) is arbitrarily bad in our setting.
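The failure of sequence monotonicity can be checked on a toy instance. The sketch below is illustrative only: it assumes a simple additive utility (each selected item contributes a fixed value), which is a special case chosen for readability rather than the general utility function studied in this paper, and the names `expected_sequence_utility`, `value`, and `cont` are hypothetical.

```python
def expected_sequence_utility(sequence, value, cont):
    """Expected total value collected when `sequence` is scanned in order.

    Assumed additive model: item e contributes value[e] if it is selected,
    and after selecting e the process continues with probability cont[e].
    """
    total = 0.0
    reach = 1.0  # probability that the current item is actually selected
    for item in sequence:
        total += reach * value[item]
        reach *= cont[item]
    return total


# Hypothetical instance: "blocker" has zero value and zero continuation probability.
value = {"a": 3.0, "b": 2.0, "blocker": 0.0}
cont = {"a": 0.5, "b": 0.5, "blocker": 0.0}

original = expected_sequence_utility(["a", "b"], value, cont)
extended = expected_sequence_utility(["blocker", "a", "b"], value, cont)
print(original, extended)  # 4.0 0.0
```

Placing the blocking item first drives the expected utility from 4.0 to 0.0, even though the new sequence contains every item of the original one.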
Note that all the studies mentioned above are restricted to the non-adaptive setting, where a sequence must be selected all at once. Only recently, Mitrovic et al. (2019) extended these studies to the adaptive setting and proposed the concept of adaptive sequence submodularity. They follow Tschiatschek et al. (2017) in building their basic model. Our study differs from theirs in that our utility function is defined over subsequences instead of graphs (Mitrovic et al. 2019), so their results are not applicable to our setting.
Our work is also closely related to stochastic submodular maximization (Asadpour and Nazerzadeh 2016, Golovin and Krause 2011). Golovin and Krause (2011) extend submodularity to adaptive policies and propose the concept of adaptive submodularity. They show that the greedy adaptive strategy achieves a $1-1/e$ approximation ratio for adaptive submodular maximization subject to a cardinality constraint. In this work, we generalize the concept of adaptive submodularity to functions over sequences instead of sets, and introduce the concept of adaptive cascade submodular functions. As mentioned earlier, our model allows us to capture scenarios where the selecting process may be terminated prematurely. We develop an adaptive policy that achieves a constant approximation ratio for sequence selection problems with an adaptive cascade submodular and adaptive monotone function.
We first introduce some notation and define the general class of adaptive cascade submodular functions. In the rest of this paper, let $[m]$ denote the set $\{1, 2, \ldots, m\}$, and we use $|S|$ to denote the cardinality of a set or a sequence $S$.
3.1 Items and States
Let $E$ denote the entire set of items. Each item $e \in E$ is in a particular state that belongs to a set $O$ of possible states. Denote by $\phi: E \rightarrow O$ a realization of the states of items. Let $\Phi = \{\Phi(e) \mid e \in E\}$ be a random realization, where $\Phi(e)$ denotes a random realization of the state of $e$. After selecting $e$, its actual state $\Phi(e)$ is discovered. We assume there is a known prior probability distribution $p(\phi) = \Pr[\Phi = \phi]$ over the set of all realizations. In addition, each item $e \in E$ is associated with a continuation probability $c(e) \in [0, 1]$ denoting the probability that one can continue to select the next item after selecting $e$. We are interested in selecting a group of items adaptively as follows: we start by selecting the first item, say $e$, and observe its state $\Phi(e)$; then, with probability $c(e)$, we continue to select the next item and observe its state, and otherwise we terminate the selecting process, and so on. During the selecting process, we say the current process is live if we can continue to select the next item; otherwise we say the process is dead. Thus, a selecting process that is live when $e$ is selected remains live afterwards with probability $c(e)$. After each selection, we denote by a partial realization $\psi$ the observations made so far: $\psi$ is a function from some subset of $E$ (i.e., the items selected so far) to their states. We define the domain $\mathrm{dom}(\psi)$ of $\psi$ as the subset of items involved in $\psi$. Given a realization $\phi$ and a partial realization $\psi$, we say $\phi$ is consistent with $\psi$ if they are equal everywhere in $\mathrm{dom}(\psi)$, and we write $\phi \sim \psi$ in this case. We say that $\psi$ is a subrealization of $\psi'$ if $\mathrm{dom}(\psi) \subseteq \mathrm{dom}(\psi')$ and they are equal everywhere in $\mathrm{dom}(\psi)$; in this case we write $\psi \subseteq \psi'$. We use $p(\phi \mid \psi)$ to denote the conditional distribution over realizations given a partial realization $\psi$: $p(\phi \mid \psi) = \Pr[\Phi = \phi \mid \Phi \sim \psi]$.
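To make the notions of consistency, subrealization, and conditional distribution concrete, here is a minimal sketch that represents realizations and partial realizations as Python dictionaries from items to states. The explicit list-of-pairs prior and all function names are assumptions made for illustration; they are not part of the model itself.

```python
def is_consistent(phi, psi):
    """True if the realization phi agrees with psi on every item in dom(psi)."""
    return all(phi[e] == state for e, state in psi.items())


def is_subrealization(psi, psi_prime):
    """True if dom(psi) is contained in dom(psi_prime) and the two agree on dom(psi)."""
    return all(e in psi_prime and psi_prime[e] == state for e, state in psi.items())


def conditional_distribution(prior, psi):
    """Condition a prior, given as (realization, probability) pairs, on psi."""
    consistent = [(phi, p) for phi, p in prior if is_consistent(phi, psi)]
    mass = sum(p for _, p in consistent)
    if mass == 0:
        raise ValueError("partial realization has zero prior probability")
    return [(phi, p / mass) for phi, p in consistent]


# Hypothetical prior: two items with independent, uniformly random binary states.
prior = [
    ({"a": 0, "b": 0}, 0.25),
    ({"a": 0, "b": 1}, 0.25),
    ({"a": 1, "b": 0}, 0.25),
    ({"a": 1, "b": 1}, 0.25),
]
posterior = conditional_distribution(prior, {"a": 1})
```

After observing that item `a` is in state 1, only the two realizations consistent with that observation keep positive probability, each with conditional probability 0.5.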
3.2 Policies and Problem Formulation
Any adaptive strategy for selecting items can be represented by a function $\pi$ from a set of partial realizations to $E$, specifying which item to select next, given the current observations, if the current selecting process is still live. Given any policy $\pi$ and realization $\phi$, we say $\pi$ adopts $S(\pi, \phi)$ under realization $\phi$ if $S(\pi, \phi)$ is the longest possible sequence of items that can be selected by $\pi$ under realization $\phi$. Intuitively, by following $\pi$, one successfully selects all items in $S(\pi, \phi)$ under $\phi$ if the selecting process never dies before it reaches an item with zero continuation probability or all items in $E$ have been selected. By abuse of notation, we use the same notation $S(\pi, \phi)$ to denote the set of items in $S(\pi, \phi)$. Given a sequence $S$, let $S_{\leq t}$ denote the prefix of $S$ of length $t$ and let $S_t$ denote the $t$-th item in $S$ for any $t \in [|S|]$. Writing $c(e)$ for the continuation probability of item $e$, it follows that under realization $\phi$, $\pi$ selects all and only the items in $S_{\leq t}$, where $S = S(\pi, \phi)$, with probability $(1 - c(S_t)) \prod_{i \in [t-1]} c(S_i)$ when $t < |S|$, and with probability $\prod_{i \in [|S|-1]} c(S_i)$ when $t = |S|$.
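The distribution over which prefix of an adopted sequence actually gets selected follows directly from the continuation model, and a short computation can confirm that the prefix probabilities sum to one. This is a sketch under the model described above; `prefix_distribution` and `cont` are hypothetical names introduced for illustration.

```python
def prefix_distribution(sequence, cont):
    """Return probs, where probs[t - 1] is the probability that exactly the
    first t items of `sequence` are selected before the process stops."""
    probs = []
    reach = 1.0  # probability that the process is still live at this position
    last = len(sequence) - 1
    for t, item in enumerate(sequence):
        if t == last:
            # The final item ends the sequence whether or not the process dies.
            probs.append(reach)
        else:
            probs.append(reach * (1.0 - cont[item]))
            reach *= cont[item]
    return probs


cont = {"a": 0.5, "b": 0.5, "c": 0.5}
dist = prefix_distribution(["a", "b", "c"], cont)
print(dist)  # [0.5, 0.25, 0.25]
```

With every continuation probability equal to 0.5, the process stops after the first item half of the time, and the remaining mass is split evenly between stopping after the second item and selecting the whole sequence.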
For notational convenience, define for any and . We next introduce a utility function from a subset of items and their states to a non-negative real number: . The expected utility of a policy under realization is
Based on this notation, we define the expected utility of a policy as
Our goal is to find a policy that maximizes the expected utility:
3.3 Adaptive Cascade Submodularity and Monotonicity
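We first review two concepts that are defined over set functions. For notational convenience, let $f(\psi)$ denote the utility of $\mathrm{dom}(\psi)$ under partial realization $\psi$. (Golovin and Krause 2011)[Adaptive Submodularity] A set function $f$ is adaptive submodular with respect to a prior distribution $p$ if, for any two partial realizations $\psi$ and $\psi'$ such that $\psi \subseteq \psi'$, and any item $e \in E \setminus \mathrm{dom}(\psi')$, the following holds:
$$\mathbb{E}_{\Phi}\left[f(\psi \cup \{(e, \Phi(e))\}) - f(\psi) \mid \Phi \sim \psi\right] \geq \mathbb{E}_{\Phi}\left[f(\psi' \cup \{(e, \Phi(e))\}) - f(\psi') \mid \Phi \sim \psi'\right].$$
(Golovin and Krause 2011)[Adaptive Monotonicity] A set function $f$ is adaptive monotone with respect to a prior distribution $p$ if, for any partial realization $\psi$ and any item $e \in E \setminus \mathrm{dom}(\psi)$, the following holds:
$$\mathbb{E}_{\Phi}\left[f(\psi \cup \{(e, \Phi(e))\}) - f(\psi) \mid \Phi \sim \psi\right] \geq 0.$$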
We next propose a class of stochastic utility functions, adaptive cascade submodularity. For any subset of elements , let denote the set of policies which are allowed to select items only from . It clear that for any . [Adaptive Cascade Submodularity] A function is adaptive cascade submodular with respect to a prior distribution , if for any two partial realizations and such that , and any subset of elements , the following holds:
denote the conditional expected utility of a policy that first selects , then runs , conditioned on a partial realization .
We next show that adaptive cascade submodularity (Definition 3.3) is closely related to the concept of adaptive submodularity (Definition 3.3). In Lemma 3.3, whose proof is deferred to the appendix, we prove that adaptive cascade submodularity with respect to a prior distribution implies that the underlying set function is adaptive submodular with respect to the same distribution. The converse does not necessarily hold: adaptive submodularity of the set function does not imply adaptive cascade submodularity. Nevertheless, we find that many well-studied adaptive submodular and adaptive monotone set functions do give rise to adaptive cascade submodular utility functions. In fact, it is easy to show that if the states of the items are independent and the set function is adaptive submodular, then adaptive cascade submodularity must hold. One such example is sensor placement (Krause and Guestrin 2007), where deployed sensors are assumed to fail probabilistically and independently. In other applications, including influence maximization (Golovin and Krause 2011), the above two properties also hold even when the states are not independent.
If is adaptive cascade submodular with respect to a prior distribution , then the set function is adaptive submodular with respect to .
4 The Adaptive Greedy Plus Policy
In this section, we propose an adaptive policy which achieves a constant approximation ratio for maximizing an adaptive cascade submodular function.
4.1 Technical Lemmas
Before presenting our algorithm, we first provide some additional notation and technical lemmas that will be used to design and analyze the proposed algorithm. We start by introducing the concept of reachability. [Reachability] Given a sequence of items $S$ and any $t \in [|S|]$, we define the reachability of its $t$-th item $S_t$ as $\prod_{i \in [t-1]} c(S_i)$, where $c(e)$ denotes the continuation probability of item $e$; i.e., it is the probability that $S_t$ is selected given that $S$ is adopted. For notational convenience, the empty product is taken to be $1$, so the first item of any sequence has reachability $1$.
Based on the notion of reachability, we next introduce the concepts of $\rho$-reachable sequence, strongly $\rho$-reachable sequence, $\rho$-reachable policy, and strongly $\rho$-reachable policy.
[$\rho$-reachable Sequence] For any $\rho \in [0, 1]$, we say a sequence $S$ is $\rho$-reachable if the reachability of all items in $S$ is at least $\rho$, or equivalently, the reachability of the last item of $S$ is at least $\rho$, i.e., $\prod_{i \in [|S|-1]} c(S_i) \geq \rho$.
Based on the above definition, it can be seen that if we adopt a $\rho$-reachable sequence $S$, then the entire $S$ can be selected with probability at least $\rho$.
[$\rho$-reachable Policy] For any $\rho \in [0, 1]$, we say a policy $\pi$ is $\rho$-reachable if for any realization $\phi$, the sequence adopted by $\pi$ under $\phi$ is $\rho$-reachable. That is, $\pi$ only adopts $\rho$-reachable sequences. Let $\Omega^{\rho}$ denote the set of all $\rho$-reachable policies.
[Strongly $\rho$-reachable Sequence] For any $\rho \in [0, 1]$, we say a sequence $S$ is strongly $\rho$-reachable if $\prod_{i \in [|S|]} c(S_i) \geq \rho$.
[Strongly $\rho$-reachable Policy] For any $\rho \in [0, 1]$, we say a policy $\pi$ is strongly $\rho$-reachable if for any realization $\phi$, the sequence adopted by $\pi$ under $\phi$ is strongly $\rho$-reachable. That is, a strongly $\rho$-reachable policy only adopts strongly $\rho$-reachable sequences. Let $\Omega^{\rho}_{s}$ denote the set of all strongly $\rho$-reachable policies.
We next introduce the notion of a maximal $\rho$-reachable sequence.
[Maximal $\rho$-reachable Sequence] Fix any $\rho \in [0, 1]$. When $\rho > 0$, we say a sequence $S$ is a maximal $\rho$-reachable sequence if $S$ is $\rho$-reachable but not strongly $\rho$-reachable. When $\rho = 0$, all sequences are considered maximal $\rho$-reachable sequences. Let $\mathcal{M}^{\rho}$ denote the set of all maximal $\rho$-reachable sequences.
Intuitively, when $\rho > 0$, a maximal $\rho$-reachable sequence $S$ must satisfy two conditions: 1. the reachability of all items in $S$ is at least $\rho$, and 2. any item placed after $S$ has reachability less than $\rho$.
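The reachability-based definitions above can be summarized in a few lines of code. The following is a minimal sketch that assumes reachability is the product of the continuation probabilities of all preceding items, as suggested by the surrounding discussion; the function names and the threshold argument `rho` are hypothetical.

```python
import math


def reachability(sequence, cont, t):
    """Reachability of the t-th item (1-indexed): the product of the
    continuation probabilities of the t - 1 items placed before it."""
    return math.prod(cont[e] for e in sequence[: t - 1])


def is_reachable(sequence, cont, rho):
    """The last item (hence every item) has reachability at least rho."""
    return reachability(sequence, cont, len(sequence)) >= rho


def is_strongly_reachable(sequence, cont, rho):
    """An item appended after the whole sequence would still have reachability >= rho."""
    return math.prod(cont[e] for e in sequence) >= rho


def is_maximal_reachable(sequence, cont, rho):
    """Reachable but not strongly reachable; every sequence qualifies when rho == 0."""
    if rho == 0:
        return True
    return is_reachable(sequence, cont, rho) and not is_strongly_reachable(sequence, cont, rho)


cont = {"a": 0.5, "b": 0.5, "c": 0.5}
seq = ["a", "b", "c"]
print(reachability(seq, cont, 3))          # 0.25
print(is_maximal_reachable(seq, cont, 0.2))  # True
print(is_maximal_reachable(seq, cont, 0.1))  # False
```

For the threshold 0.2 the sequence is maximal: its last item has reachability 0.25, but any item appended afterwards would have reachability 0.125. For the threshold 0.1 the sequence is strongly reachable and therefore not maximal.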
Based on the above notations, we first show that there exists a -reachable policy whose performance is close to the optimal policy.
Fix any . If is adaptive cascade submodular, there is a -reachable policy of expected utility at least . Proof: The basic idea of our proof is to show that given an optimal policy , we can discard those items whose reachability is small at the cost of a bounded loss.
For any maximal -reachable sequence , we use to denote the probability that is a prefix of some sequence adopted by while the states of is , thus . Based on this notation, we can represent the expected utility of as follows:
denotes the expected utility of conditioned on is a prefix of some sequence adopted by while the states of is . According to Definition 3.3, we have due to . Moreover, we have where the inequality is due to . Then we have
It follows that
where the first inequality is due to (7) and the second inequality is due to is a maximal -reachable sequence. Then we have
Based on the above results, we next construct a -reachable policy whose expected utility is at least . Assume the optimal policy is given, we run until the last item whose reachability is smaller than is selected or the current selecting process is dead, whichever comes first. It is clear that the above policy is -reachable, and its expected utility is whose value is lower bounded by according to (9). This finishes the proof of this lemma.
P1: Maximize subject to:
According to Lemma 4.1, there exists a -reachable policy whose expected utility is at least , then the optimal -reachable policy has the same performance bound. In particular, let denote the optimal solution to P1, we have the following lemma.
Fix any . If is adaptive cascade submodular, we have .
Lemma 4.1 allows us to put our focus on solving P1.
We next introduce a new optimization problem P2 subject to only adopting strongly -reachable sequences. Let denote the optimal solution to P2.
P2: Maximize subject to:
Notice that every strongly -reachable sequence is also a -reachable sequence, similarly, every strongly -reachable policy is also a -reachable policy. Therefore, is upper bounded by . However, we next show that the gap between and can be bounded. Denote by a single item with the maximum expected utility. If is adaptive cascade submodular, then . Proof: Assume the optimal -reachable policy is given, we next construct a strongly -reachable policy as follows: we run until it violates the strongly -reachable constraint (the item whose addition to the current solution violates the constraint is not selected) or the current selecting process is dead, whichever comes first. Observe that any sequence of length can be represented as , where is the concatenation operator. Assume is a -reachable sequence, we have , thus is a strongly -reachable sequence according to Definition 4.1. Based on this observation, it is easy to verify that given any full realization, selects at most one more item (the item that violates the strongly -reachable constraint) than . Due to the adaptive submodularity of (Lemma 3.3), the expected marginal utility of that item is upper bounded by . It follows that . Because is the optimal strongly -reachable policy, we have . This finishes the proof of this lemma.
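According to Definition 4.1, a policy $\pi$ is strongly $\rho$-reachable if $\prod_{e \in S(\pi, \phi)} c(e) \geq \rho$ for every realization $\phi$, where $S(\pi, \phi)$ is the sequence adopted by $\pi$ under $\phi$ and $c(e)$ is the continuation probability of item $e$. For $\rho > 0$, taking logarithms shows that this is equivalent to $\sum_{e \in S(\pi, \phi)} -\log c(e) \leq -\log \rho$. By replacing the product constraint with this sum constraint, we obtain an alternative formulation P2.1 of P2.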
P2.1: Maximize subject to:
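The reformulation rests on the fact that a lower bound on a product of continuation probabilities is the same as an upper bound on a sum of negative logarithms. The sketch below checks this equivalence numerically; it assumes a strictly positive threshold `rho`, treats an item with zero continuation probability as having infinite cost, and uses hypothetical function names.

```python
import math


def log_cost(c):
    """Negative log of a continuation probability (infinite when c == 0)."""
    return math.inf if c == 0 else -math.log(c)


def satisfies_product_constraint(sequence, cont, rho):
    """Product form: the continuation probabilities multiply to at least rho."""
    return math.prod(cont[e] for e in sequence) >= rho


def satisfies_budget_constraint(sequence, cont, rho):
    """Sum form: the log costs add up to at most -log(rho); requires rho > 0."""
    return sum(log_cost(cont[e]) for e in sequence) <= -math.log(rho)


cont = {"a": 0.9, "b": 0.8, "c": 0.5, "z": 0.0}
for seq in (["a", "b"], ["a", "b", "c"], ["a", "z"]):
    print(seq,
          satisfies_product_constraint(seq, cont, 0.5),
          satisfies_budget_constraint(seq, cont, 0.5))
```

On this instance the two tests agree on every sequence. Because of floating-point rounding they may disagree when the product is exactly equal to the threshold, so the instance above avoids boundary cases.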
To facilitate the analysis of our proposed algorithm, we introduce another optimization problem P3 by replacing the objective function in P2.1 using .
P3: Maximize subject to:
Note that is the expected utility of a policy assuming that the selecting process is never dead until it reaches an item with zero continuation probability or when all items in have been selected. Therefore, if is adaptive monotone, then for any . For notation simplicity, define . Then we have the following lemma. Let denote the optimal solution to P3. If is adaptive monotone, then .
4.2 Algorithm Design
Now we are ready to present our adaptive greedy plus policy. For ease of presentation, we define $-\log c(e)$ as the virtual cost of an item $e$, where $c(e)$ is its continuation probability. Intuitively, an item with a higher continuation probability has a smaller virtual cost. The policy randomly picks one solution from the following two candidates: a singleton with the maximum expected utility and a greedy policy.
The second candidate solution is performed in a sequential manner as follows: In each round of a live selecting process, we select an item that maximizes the ratio of the expected marginal benefit to the virtual cost. This process iterates until the budget is used up or the current selecting process is dead. Note that is allowed to violate the budget constraint by adding one additional item. To provide a detailed description of (Algorithm 1), we first introduce some notations. We use to denote the partial realization observed in round . Let denote the expected marginal benefit of conditioned on .
In each round of a live selecting process, we select the item with the largest “benefit-to-cost” ratio
This process repeats until the selecting process is dead or the budget constraint is violated where denotes the set of selected items. Note that is very similar to the adaptive greedy algorithm proposed in (Golovin and Krause 2011) except that is allowed to violate the budget constraint by adding one more item. That is, the first item that violates the budget constraint is also selected by .
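The greedy candidate described above can be sketched as follows. This is an illustrative sketch rather than the paper's Algorithm 1: it assumes the virtual cost of an item is the negative logarithm of its continuation probability, that the budget is the negative logarithm of a threshold `rho`, and that the prior is small enough to enumerate. The sensor-coverage utility, the instance data, and all names are hypothetical.

```python
import itertools
import math
import random


def marginal_benefit(item, psi, prior, utility):
    """Expected utility gain of adding `item`, conditioned on partial realization psi."""
    selected = set(psi)
    mass = gain = 0.0
    for phi, prob in prior:
        if prob > 0 and all(phi[e] == s for e, s in psi.items()):
            mass += prob
            gain += prob * (utility(selected | {item}, phi) - utility(selected, phi))
    return gain / mass


def greedy_policy(items, cont, rho, prior, utility, true_phi, rng):
    """Benefit-to-cost greedy; the first item that breaks the budget is still kept."""
    cost = {e: (-math.log(cont[e]) if cont[e] > 0 else math.inf) for e in items}
    budget = -math.log(rho)
    psi, sequence, spent = {}, [], 0.0
    remaining = set(items)
    while remaining and spent <= budget:
        def ratio(e):
            gain = marginal_benefit(e, psi, prior, utility)
            if cost[e] > 0:
                return gain / cost[e]
            return math.inf if gain > 0 else 0.0
        best = max(sorted(remaining), key=ratio)
        remaining.remove(best)
        sequence.append(best)
        psi[best] = true_phi[best]        # observe the state of the selected item
        spent += cost[best]
        if rng.random() >= cont[best]:    # the selecting process dies
            break
    return sequence


# Hypothetical instance: sensors that work (state 1) independently at random.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {6}}
work_prob = {"a": 0.5, "b": 0.5, "c": 1.0, "d": 0.5}
cont = {"a": 0.5, "b": 0.8, "c": 0.9, "d": 0.5}
items = sorted(cover)


def coverage(selected, phi):
    """Number of targets covered by the selected sensors that are working."""
    covered = set()
    for e in selected:
        if phi[e] == 1:
            covered |= cover[e]
    return float(len(covered))


prior = []
for states in itertools.product([0, 1], repeat=len(items)):
    phi = dict(zip(items, states))
    prob = math.prod(work_prob[e] if phi[e] else 1 - work_prob[e] for e in items)
    prior.append((phi, prob))

true_phi = {"a": 1, "b": 0, "c": 1, "d": 1}
print(greedy_policy(items, cont, 0.5, prior, coverage, true_phi, random.Random(0)))
```

The printed sequence depends on the random continuation draws. If the process never dies, the sketch selects `c`, then `b`, then `a`; item `a` pushes the accumulated cost past the budget and is still kept, after which the loop stops without selecting `d`.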
4.3 Performance Analysis
Now we are ready to present the main theorem of this paper.
Fix any . If is adaptive cascade submodular and is adaptive monotone, then .
Proof: We first build a relation between and . Because we assume is adaptive cascade submodular, we have is adaptive submodular and adaptive monotone according to Lemma 3.3. Therefore, P3 is an adaptive submodular maximization problem subject to a budget constraint, where the cost of each item is and the budget constraint is . According to (Tang and Yuan 2020, Golovin and Krause 2011), the “benefit-to-ratio” based greedy algorithm achieves approximation ratio when maximizing an adaptive submodular and adaptive monotone function, where is the (expected) actual amount of budget consumed by the algorithm and is the budget constraint. This ratio is lower bounded by when . In our case, because is allowed to violate the budget constraint by adding one more item to the solution, we have
Moreover, because does not violate the budget constraint until the last round, the reachability of every item selected by is lower bounded by . Thus the expected utility of is at least . It follows that
due to (10). Now we are ready to bound the approximation ratio of .
(12) is due to randomly picks one between and as the final solution; (13) is due to (11); (14) is due to Lemma 4.1; (16) is due to Lemma 4.1; (17) is due to Lemma 4.1. This finishes the proof of this theorem.
It can be seen that if we set , achieves approximation ratio .
If we set , then our adaptive greedy plus policy achieves approximation ratio given that is adaptive cascade submodular and is adaptive monotone.
In this paper, we propose and study a new stochastic optimization problem, called adaptive cascade submodular maximization. Our goal is to adaptively select a sequence of items that maximizes the expected utility. Our problem is motivated by many real-world applications where the selecting process could be terminated prematurely. We show that existing studies on submodular maximization do not apply to our setting. We start by introducing a class of stochastic utility functions, adaptive cascade submodular functions. Then we propose an adaptive policy that achieves a constant approximation ratio given that the utility function is adaptive cascade submodular and adaptive monotone. In the future, we would like to extend this work by incorporating some practical constraints such as cardinality constraint to the existing model.
Proof of Lemma 3.3: Since is adaptive cascade submodular, we have
for any two partial realizations and such that , and any subset of elements , according to Definition 3.3. Consider a singleton for any , the strategy set contains only one strategy for any partial realization : selecting . Thus for any and partial realization . Condition (18) can be simplified as
- Alaei et al. (2010). Maximizing sequence-submodular functions and its application to online advertising. arXiv preprint arXiv:1009.4153.
- Asadpour and Nazerzadeh (2016). Maximizing stochastic monotone submodular functions. Management Science 62 (8), pp. 2374–2391.
- Craswell et al. (2008). An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 87–94.
- Golovin and Krause (2011). Adaptive submodularity: theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research 42, pp. 427–486.
- Kempe and Mahdian (2008). A cascade model for externalities in sponsored search. In International Workshop on Internet and Network Economics, pp. 585–596.
- Krause and Guestrin (2007). Near-optimal observation selection using submodular functions. In AAAI, Vol. 7, pp. 1650–1654.
- Mitrovic et al. (2018). Submodularity on hypergraphs: from sets to sequences. arXiv preprint arXiv:1802.09110.
- Mitrovic et al. (2019). Adaptive sequence submodularity. In Advances in Neural Information Processing Systems, pp. 5352–5363.
- Nemhauser et al. (1978). An analysis of approximations for maximizing submodular set functions I. Mathematical Programming 14 (1), pp. 265–294.
- Streeter and Golovin (2009). An online algorithm for maximizing submodular functions. In Advances in Neural Information Processing Systems, pp. 1577–1584.
- Tang and Yuan (2020). Influence maximization with partial feedback. Operations Research Letters 48 (1), pp. 24–28.
- Tschiatschek et al. (2017). Selecting sequences of items via submodular maximization. In Thirty-First AAAI Conference on Artificial Intelligence.
- Zhang et al. (2015). String submodular functions with curvature constraints. IEEE Transactions on Automatic Control 61 (3), pp. 601–616.