1 Introduction
In consumer theory, it is standard to assume that the preferences of a consumer are captured by a valuation function, which is often assumed to be known to the mechanism designer. In a real market, however, one can only observe what buyers buy at given prices, i.e., their revealed preferences. Research on revealed preferences within TCS has two primary objectives: (i) learning valuation functions from revealed preferences, so that the learned valuations have predictive power (Beigman and Vohra, 2006; Zadimoghaddam and Roth, 2012; Balcan et al., 2014; Blum et al., 2015); and (ii) directly learning the prices that maximize social welfare or profit (den Boer, 2015; Roth et al., 2016, 2017; Babaioff et al., 2015; Besbes and Zeevi, 2009, 2012; Broder and Rusmevichientong, 2012; Wang et al., 2014; Amin et al., 2015).
The latter problem is important to sellers in today’s online economies, where large amounts of data about consumers’ buying patterns are available. For a seller, profit maximization is generally the primary goal, but she may also want to maximize social welfare to earn the goodwill of consumers, with increased market share as a byproduct.
In this paper, we consider social welfare and profit maximization using only revealed preferences.
1.1 Our model, results, and techniques
Consider a market with $m$ consumers and a producer (seller) who produces and sells a set of $n$ divisible goods. In the most general case, the preferences of consumer $i$ over bundles of goods are defined by a valuation function $v_i: X \to \mathbb{R}$ ($X \subseteq \mathbb{R}^n_+$ is called the feasible set), which is her private information and unknown to the producer. At prices $p$ she demands the bundle $x_i(p)$ that maximizes her value minus payment, i.e., her quasilinear utility
$$x_i(p) \in \arg\max_{x \in X} \; v_i(x) - \langle p, x\rangle.$$
Given prices $p$, the revealed preference refers to the purchased bundle $x_i(p)$ of each consumer $i$ in the market (demand oracle information), or even only the aggregate $\sum_{i=1}^m x_i(p)$ (aggregate demand oracle information). No other information about the valuations is revealed.
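A minimal sketch of the two oracle models, assuming (for illustration only) hypothetical quadratic valuations $v_i(x) = \langle a_i, x\rangle - \frac{\mu}{2}\|x\|_2^2$ on the box $X = [0, B]^n$. For this choice, quasilinear utility maximization separates across goods and has a closed form. The function names and the parameters `a`, `mu`, `B` are our own, not from the paper; the producer would only ever see the returned bundles, never `a`.

```python
import numpy as np

def demand_oracle(a_i, p, mu, B):
    """Bundle maximizing a_i.x - (mu/2)||x||^2 - p.x over the box [0, B]^n.

    The objective is separable and concave in each coordinate, so clipping the
    unconstrained maximizer (a_i - p) / mu to [0, B] gives the exact answer.
    """
    return np.clip((a_i - p) / mu, 0.0, B)

def aggregate_demand_oracle(a, p, mu, B):
    """Sum of all consumers' revealed preferences; the rows of `a` are the a_i."""
    return sum(demand_oracle(a_i, p, mu, B) for a_i in a)

# Two consumers, two goods (hypothetical numbers).
a = np.array([[3.0, 1.0],
              [2.0, 4.0]])
p = np.array([1.0, 2.0])
x1 = demand_oracle(a[0], p, mu=1.0, B=5.0)              # array([2., 0.])
total = aggregate_demand_oracle(a, p, mu=1.0, B=5.0)    # array([3., 2.])
```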
Producing the demanded goods incurs a cost to the producer, represented by a convex cost function $c: \mathbb{R}^n_+ \to \mathbb{R}$. The producer, or the algorithm, posts prices and makes observations repeatedly, trying to maximize the social welfare or profit, as described below.
Social welfare maximization.
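The social welfare of bundles $x_1, \dots, x_m$ is the sum of consumers’ valuations minus the production cost, i.e.,
$$W(x_1, \dots, x_m) = \sum_{i=1}^m v_i(x_i) - c\Big(\sum_{i=1}^m x_i\Big). \tag{1}$$
The benchmark used in this paper is the maximum social welfare $W^*$ and the corresponding maximizing bundles $x_1^*, \dots, x_m^*$, defined as
$$W^* = \max_{x_1, \dots, x_m \in X} W(x_1, \dots, x_m), \qquad (x_1^*, \dots, x_m^*) \in \arg\max_{x_1, \dots, x_m \in X} W(x_1, \dots, x_m). \tag{2}$$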
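To make eq. 1 and eq. 2 concrete, the following sketch evaluates the social welfare on a hypothetical one-good market with two consumers, $v_i(x) = a_i x - x^2/2$ on $X = [0, 5]$, and a linear cost $c(y) = k y$ (all numbers are assumptions). It approximates the maximum of eq. 2 by brute force over a grid and shows that, in this instance, posting the single price $p = k$ induces bundles that attain it.

```python
import numpy as np

def social_welfare(bundles, valuations, cost):
    """Eq. (1): sum_i v_i(x_i) - c(sum_i x_i)."""
    return sum(v(x) for v, x in zip(valuations, bundles)) - cost(sum(bundles))

a = [3.0, 2.0]   # hypothetical marginal-value intercepts
k = 1.0          # hypothetical linear cost c(y) = k * y
valuations = [lambda x, ai=ai: ai * x - 0.5 * x**2 for ai in a]
cost = lambda y: k * y

# Eq. (2) by brute force over a grid of X = [0, 5] for each consumer.
grid = np.linspace(0.0, 5.0, 501)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
W = valuations[0](X1) + valuations[1](X2) - cost(X1 + X2)
i, j = np.unravel_index(np.argmax(W), W.shape)
W_star, x_star = W[i, j], (grid[i], grid[j])

# Posting price p = k induces demands x_i(p) = a_i - p, which attain W_star.
p = k
induced = [min(max(ai - p, 0.0), 5.0) for ai in a]
W_induced = social_welfare(induced, valuations, cost)
```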
In Section 3, offline social welfare maximization is considered. The producer tries to find prices $p$ such that the social welfare of the induced bundles $x_1(p), \dots, x_m(p)$ is maximized. Although there exist many methods to maximize a concave function, this induced social welfare is in general not concave in $p$ (Roth et al., 2016). Moreover, the producer only has access to the aggregate demand oracle; the true valuations are unknown.
We first show using duality theory that the maximum social welfare $W^*$, which is greater than or equal to any social welfare that can be induced by prices, can in fact be induced by a single price vector $p^*$, namely the minimizer of the convex dual function $c^*(p) - \sum_{i=1}^m v_i^*(p)$, where $c^*$ and $v_i^*$ are respectively the convex and concave conjugates (Rockafellar, 1970) reviewed in Section 2. Moreover, the revealed preferences $x_i(p)$ are supergradients of $v_i^*$, which allows subgradients of the dual function to be computed from the aggregate demand oracle. Finally, to obtain a faster algorithm, we apply a smoothing technique to the dual function and then invoke accelerated gradient descent. These ideas are formalized in Algorithm 1, whose guarantee is given below.
Theorem 1.1 (Informal statement of Theorem 3.1).
The additive error between the social welfare induced by Algorithm 1 and the maximum social welfare eq. 2 is at most , where is the number of queries to the aggregate demand oracle.
In other words, to ensure an additive approximation of the maximum social welfare, Algorithm 1 needs queries to the aggregate demand oracle.
(Roth et al., 2016) and (Roth et al., 2017) are the most relevant prior work. (Roth et al., 2016) studies profit maximization instead of social welfare maximization in a market with one consumer. However, it assumes that the valuation function is homogeneous, under which profit maximization can be reduced to social welfare maximization. The assumptions made in (Roth et al., 2016) and in this paper are essentially identical. The key differences are: (i) (Roth et al., 2016) proposes a two-level algorithm, where an outer iterative algorithm maximizes social welfare and, for each outer iterate, the supergradient of the unknown valuation function is computed by solving a dual optimization problem. In this paper, we only need to solve a single (different) dual optimization problem, which gives a simpler approach that may be of independent interest. (ii) The subgradient of the dual objective function in this paper can be interpreted as the excess supply (see the discussion around eq. 4 in Section 3.1), which gives our algorithm a natural interpretation as a Tâtonnement process. (iii) The query complexity given in (Roth et al., 2016) can be as large as to ensure an additive error of between the induced social welfare and the maximum social welfare; one reason is that they use subgradient descent, which works for nonsmooth convex functions but converges slowly. In this paper, by combining a smoothing technique with accelerated gradient descent, Theorem 1.1 only needs queries to the aggregate demand oracle. (Roth et al., 2017) assumes that the valuation is stochastic, but only considers a linear cost. It also considers unit-demand consumers with indivisible goods, which is outside the scope of this paper.
Next, in Section 4 we consider online social welfare maximization under the random permutation model. In this model, consumers arrive and make purchases one by one, and correspondingly the producer is allowed to post prices dynamically, i.e., to update prices from $p_t$ to $p_{t+1}$ after the purchase of the $t$-th consumer. Random permutation here means that the $m$ consumers are first chosen, potentially by an adversary, and then arrive and make purchases one by one in a uniformly random order. (The random permutation model has been extensively studied within online optimization (Goel and Mehta, 2008; Devanur and Hayes, 2009; Agrawal and Devanur, 2015), and is more general than the i.i.d. model, where each valuation is an independent sample from an unknown distribution.) The objective is to maximize the expected online social welfare $\mathbb{E}_\sigma\big[\sum_{t=1}^m v_{\sigma(t)}(x_t) - c\big(\sum_{t=1}^m x_t\big)\big]$, where $\sigma$ is the random arrival order, $x_t$ is the bundle purchased at step $t$, and the expectation is taken over random orders.
Our approach to the online social welfare maximization problem is to run an online convex optimization algorithm on a dual problem over prices. See Algorithm 2 for details; an introduction to online convex optimization is given in Section 4.
Theorem 1.2 (Informal statement of Theorem 4.1).
The expected additive error between the online social welfare achieved by Algorithm 2 and the maximum offline social welfare eq. 2 is bounded by , where the expectation is taken over random orders of valuations.
For a given producer, the number of goods $n$ can be thought of as fixed. As a result, the loss of social welfare induced by Algorithm 2 is sublinear in the number of consumers $m$.
The idea of Algorithm 2 comes from (Agrawal and Devanur, 2015), where a general online stochastic convex programming problem is considered. It has several further advantages when applied to online social welfare maximization. First, it suffices to assume that the valuations are continuous: the consumer demand oracle may need to solve a nonconvex quasilinear utility maximization problem, but our focus is on the producer side. Since the loss passed to the online convex optimization algorithm depends only on the revealed preference $x_t$, not on the valuation $v_{\sigma(t)}$, it is still convex. Second, Algorithm 2 is robust, in the sense that it is not sensitive to errors in quasilinear utility maximization. For details, see the discussion at the end of Section 4.
Profit maximization.
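Next we consider profit maximization with access to the aggregate demand oracle. Given prices $p$, the profit of the producer is the revenue minus the production cost, i.e.,
$$\Pi(p) = \Big\langle p, \sum_{i=1}^m x_i(p)\Big\rangle - c\Big(\sum_{i=1}^m x_i(p)\Big). \tag{3}$$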
Although it is more natural for the producer to maximize profit, this problem is harder because it is nonconvex. The social welfare maximization problems are solved by a reduction to a convex optimization problem on the space of prices. For profit maximization, however, both the set of optimal bundles and the set of optimal prices may be nonconvex, as shown by Example 5.1 in Section 5.
We then consider the case where both the valuations and the cost are separable. A valuation is separable if $v_i(x) = \sum_{j=1}^n v_{ij}(x_j)$, and similarly a cost is separable if $c(y) = \sum_{j=1}^n c_j(y_j)$. Under this assumption, in Section 5 we give upper and lower bounds on the query complexity of profit maximization and of revenue maximization (i.e., when the cost is $0$). These upper and lower bounds match for revenue maximization.
Theorem 1.3 (Informal statement of Theorems 5.1 and 5.2).
Consider a market with $m$ consumers and $n$ goods. If the valuations are strongly concave, and both the valuations and the cost are separable and Lipschitz continuous, then Algorithm 3 maximizes the profit up to an additive error with queries to the aggregate demand oracle. If the cost is zero, then the strong concavity assumption on valuations can be dropped.
On the other hand, for concave, separable and Lipschitz continuous valuations, any algorithm requires queries to the aggregate demand oracle in order to maximize the revenue up to an additive error.
1.2 Related work
Samuelson started the theory of revealed preferences in 1938 (Samuelson, 1938) to facilitate mapping observed data to valuation functions, which led to extensive work within economics on “rationalization” or “fitting the samples” (Houthakker, 1950; Richter, 1966; Afriat, 1967; Uzawa, 1960; Mas-Colell, 1977, 1978; Diewert, 1973; Varian, 2005). In TCS, there has been a lot of work on learning valuations from revealed preferences with which predictions can be made (Beigman and Vohra, 2006; Zadimoghaddam and Roth, 2012; Balcan et al., 2014; Blum et al., 2015).
Another line of research learns prices that maximize social welfare or profit directly, a problem usually known as dynamic pricing (den Boer, 2015; Babaioff et al., 2015; Besbes and Zeevi, 2009, 2012; Broder and Rusmevichientong, 2012; Amin et al., 2015). Some prior works assume nice properties of the demand function (oracle) itself, such as linearity in the case of a large number of goods (Keskin and Zeevi, 2014; den Boer and Zwart, 2013), concavity (Babaioff et al., 2015), or Lipschitz continuity (Besbes and Zeevi, 2009, 2012; Wang et al., 2014). However, these properties may not be satisfied by demands arising from typical concave valuation functions. In (Amin et al., 2015), the valuation function is assumed to be linear; it is first partially inferred and then used in a price optimization step. However, if the valuation is a general concave function, such a learning phase is not possible (Beigman and Vohra, 2006).
Recently, (Dong et al., 2018) studies an online linear classification problem under the revealed preference model.
2 Preliminaries
Market model.
Our model consists of one producer (seller), who produces and sells $n$ divisible goods, and $m$ consumers. Consumer $i$’s preferences are represented by an unknown valuation function $v_i: X \to \mathbb{R}$. The feasible consumption set $X \subseteq \mathbb{R}^n_+$ is typically assumed to be convex and compact with nonempty interior. It is assumed that $X$ is known to the algorithm, and we let $D_X$ denote the diameter of $X$. Our algorithm can be extended to the case where the $v_i$’s have different domains with different diameters; a common domain is used here only for convenience.
Given prices $p$ of the $n$ goods, the quasilinear utility of a bundle $x \in X$ for consumer $i$ is defined as
$$u_i(x, p) = v_i(x) - \langle p, x\rangle.$$
Naturally, consumer $i$ demands a bundle in $X$ that maximizes her quasilinear utility,
$$x_i(p) \in \arg\max_{x \in X} \; u_i(x, p),$$
which is known as the revealed preference of consumer $i$ at prices $p$. Once the seller sets prices $p$, we only get to see $x_i(p)$ for each consumer $i$, in which case every consumer can be thought of as a demand oracle, or even only $\sum_{i=1}^m x_i(p)$, in which case the market can be seen as an aggregate demand oracle. Each $v_i$ is always assumed to be continuous to ensure that $x_i(p)$ exists. This is the only assumption needed for the online social welfare maximization part of this paper; the offline social welfare maximization and profit maximization parts further assume that the valuations are strongly concave, as introduced later.
The production cost is represented by a convex, Lipschitz continuous, nondecreasing cost function , where since is convex. Note that the domain of is big enough to allow production of any aggregate demand. Let denote the modulus of Lipschitz continuity of with respect to the norm. It is assumed that the cost function is known to the algorithm.
The producer, or the algorithm, can only post prices and observe the purchased bundles repeatedly, trying to maximize the social welfare eq. 1 or the profit eq. 3. If the valuations are only continuous, $x_i(p)$ may not be unique, and the induced social welfare and profit may not be uniquely defined. In this paper, the online social welfare result holds for any choice of $x_i(p)$, while in offline social welfare and profit maximization, $x_i(p)$ is unique since strong concavity is assumed.
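When valuations are merely continuous, the demanded bundle can fail to be unique. The sketch below uses a hypothetical non-concave valuation $v(x) = x + \frac{1}{2}\sin(2\pi x)$ for one good on $X = [0, 3]$ and a grid-search oracle of our own; at price $p = 1$ it finds three tied utility maximizers, any of which is a valid revealed preference.

```python
import numpy as np

def grid_demand_oracle(v, p, grid):
    """Approximate revealed preference: maximize v(x) - p*x over a finite grid.

    Works for any continuous v (no concavity needed), but ties are broken by
    np.argmax, which returns the first index attaining the largest value.
    """
    utility = v(grid) - p * grid
    return grid[np.argmax(utility)], utility

# Hypothetical non-concave valuation on X = [0, 3].
v = lambda x: x + 0.5 * np.sin(2 * np.pi * x)
grid = np.linspace(0.0, 3.0, 301)
x_hat, utility = grid_demand_oracle(v, 1.0, grid)

# At p = 1 the utility is 0.5*sin(2*pi*x): x = 0.25, 1.25, 2.25 all tie.
ties = grid[utility >= utility.max() - 1e-9]
```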
Convex and concave conjugates.
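The notions of convex and concave conjugates are crucial to our algorithms. Given a convex function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ with nonempty domain $\mathrm{dom}\, f$, its convex conjugate is defined as
$$f^*(p) = \sup_{x \in \mathrm{dom}\, f} \; \langle p, x\rangle - f(x),$$
where the domain of $f^*$ is $\{p : f^*(p) < +\infty\}$. Similarly, given a concave function $g: \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$, its concave conjugate is defined as
$$g^*(p) = \inf_{x \in \mathrm{dom}\, g} \; \langle p, x\rangle - g(x),$$
where the domain of $g^*$ is $\{p : g^*(p) > -\infty\}$. Since we only take convex conjugates of convex functions and concave conjugates of concave functions, using the same notation $(\cdot)^*$ for both causes no ambiguity. $\partial f$ denotes the set of subgradients of a convex $f$, or the set of supergradients of a concave $f$.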
Note that in our case, since $X$ is nonempty and compact and $v_i$ is continuous, $\mathrm{dom}\, v_i^* = \mathbb{R}^n$. Similarly, for every , .
Lemma 2.1 is crucial to our algorithms: a key observation of this paper is that revealed preferences are actually supergradients of the concave conjugate of the valuation, which follows from Lemma 2.1. Lemma 2.1 can be derived immediately from (Hiriart-Urruty and Lemaréchal, 2012), Corollary E.1.4.4. Although that result is stated for convex functions and convex conjugates, the corresponding properties hold for concave functions and concave conjugates.
Lemma 2.1.
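Suppose $f$ is convex and continuous with nonempty domain. For every pair $(x, p)$, the following are equivalent: $p \in \partial f(x)$; $x \in \partial f^*(p)$; $f(x) + f^*(p) = \langle p, x\rangle$.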
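Lemma 2.1 can be sanity-checked numerically in one dimension. The sketch below assumes a hypothetical concave valuation $v(x) = 2x - x^2/2$ on $X = [0, 3]$, computes the concave conjugate $v^*(p) = \min_{x \in X} px - v(x)$ on a grid (the discretization is our assumption), and checks that the demanded bundle $x(p)$ satisfies the supergradient inequality $v^*(q) \le v^*(p) + x(p)(q - p)$ and the Fenchel-Young equality $v(x(p)) + v^*(p) = p\,x(p)$.

```python
import numpy as np

grid = np.linspace(0.0, 3.0, 3001)        # discretized X = [0, 3] (hypothetical)
v = lambda x: 2.0 * x - 0.5 * x**2        # hypothetical concave valuation

def conj(p):
    """Concave conjugate v*(p) = min_{x in X} p*x - v(x), on the grid."""
    return np.min(p * grid - v(grid))

def revealed_preference(p):
    """Demand x(p) = argmax_{x in X} v(x) - p*x, i.e. the minimizer in conj(p)."""
    return grid[np.argmin(p * grid - v(grid))]

p = 0.7
x_p = revealed_preference(p)              # close to 2 - 0.7 = 1.3
qs = np.linspace(-2.0, 4.0, 121)
# Supergradient inequality from Lemma 2.1: v*(q) <= v*(p) + x(p) * (q - p).
gaps = [conj(p) + x_p * (q - p) - conj(q) for q in qs]
# Fenchel-Young holds with equality at the demanded bundle.
fenchel_gap = v(x_p) + conj(p) - p * x_p
```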
3 Offline social welfare maximization
Problem description.
The goal of offline social welfare maximization is to find prices $p$ such that the social welfare eq. 1 of the induced bundles $x_1(p), \dots, x_m(p)$ is maximized. As introduced below, the $v_i$’s are assumed to be strongly concave, and thus the $x_i(p)$’s are uniquely determined.
Strongly concave valuations.
In the offline setting, the valuation functions $v_i$ are further assumed to be $\mu$-strongly concave, meaning that $v_i + \frac{\mu}{2}\|\cdot\|_2^2$ is concave. Concavity is a standard assumption on valuations that captures diminishing marginal returns. Strong concavity, as its name suggests, is a strong assumption; however, it is satisfied by many common valuations, such as constant elasticity of substitution (CES) functions and Cobb-Douglas functions (cf. (Roth et al., 2016)). Furthermore, a common modulus $\mu$ of strong concavity is used only for convenience; the algorithm can easily be adapted to the case where different $v_i$ have different moduli of strong concavity.
The dual notion to strong concavity (convexity) is strong smoothness. A function $f$ is $L$-strongly smooth if $f$ is differentiable and its gradient is $L$-Lipschitz continuous, i.e., for any $p, q$, $\|\nabla f(p) - \nabla f(q)\|_2 \le L\|p - q\|_2$.
The following lemma follows immediately from (Hiriart-Urruty and Lemaréchal, 2012), Theorems E.4.2.1 and E.4.2.2.
Lemma 3.1.
Suppose $v$ is concave and continuous and $\mathrm{dom}\, v$ is nonempty. Then $v$ is $\mu$-strongly concave if and only if $v^*$ is $\frac{1}{\mu}$-strongly smooth and concave on $\mathbb{R}^n$.
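A numerical illustration of Lemma 3.1 for hypothetical quadratic valuations $v(x) = \langle a, x\rangle - \frac{\mu}{2}\|x\|_2^2$ on $[0, B]^n$, which are $\mu$-strongly concave. The sketch checks by central differences that $\nabla v^*(p)$ equals the demanded bundle, and that this gradient is $\frac{1}{\mu}$-Lipschitz on random price pairs. The valuation, `mu`, `B`, and the random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, B = 3, 2.0, 4.0                     # hypothetical sizes and modulus
a = rng.uniform(1.0, 5.0, size=n)

def v(x):
    return a @ x - 0.5 * mu * x @ x        # mu-strongly concave

def demand(p):
    return np.clip((a - p) / mu, 0.0, B)   # argmax_{x in [0,B]^n} v(x) - p.x

def conj(p):
    x = demand(p)
    return p @ x - v(x)                    # concave conjugate v*(p)

# Gradient of v* equals the revealed preference (central differences at an
# interior point, where no coordinate of the demand is clipped).
p0 = rng.uniform(0.0, 0.5, size=n)
h = 1e-6
fd_grad = np.array([(conj(p0 + h * e) - conj(p0 - h * e)) / (2 * h) for e in np.eye(n)])

# Lemma 3.1: grad v* = demand is (1/mu)-Lipschitz.
ratios = []
for _ in range(1000):
    p, q = rng.uniform(-5.0, 10.0, size=(2, n))
    ratios.append(np.linalg.norm(demand(p) - demand(q)) / np.linalg.norm(p - q))
```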
Accelerated gradient descent.
The accelerated gradient descent (AGD) algorithm, first introduced in (Nesterov, 1983), achieves the optimal convergence rate among first-order methods for smooth convex optimization problems. AGD has been extended in many ways, including (Tseng, 2008; Allen-Zhu and Orecchia, 2014). In this paper, we invoke one variant, the AGM algorithm given in (Allen-Zhu and Orecchia, 2014).
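Lemma 3.2 ((Allen-Zhu and Orecchia, 2014) Theorem 4.1).
Suppose $f$ is $L$-strongly smooth and convex and $x^* \in \arg\min f$. Given $x_0$, for any $T \ge 1$, the AGM algorithm outputs $x_T$ such that $f(x_T) - f(x^*) = O\big(L\|x_0 - x^*\|_2^2 / T^2\big)$.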
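For reference, here is a standard accelerated gradient method (the FISTA-style form of Nesterov's scheme without a proximal term). It is not the AGM variant of (Allen-Zhu and Orecchia, 2014) invoked in Lemma 3.2, but it enjoys the same $O(L\|x_0 - x^*\|^2/T^2)$ rate; the explicit constant $2L\|x_0 - x^*\|_2^2/(T+1)^2$ checked below is the one known for this particular variant. The test problem is a hypothetical ill-conditioned quadratic.

```python
import numpy as np

def accelerated_gradient_descent(grad, L, x0, T):
    """FISTA-style accelerated gradient descent with constant step 1/L.

    For an L-smooth convex f, the k-th iterate satisfies
    f(x_k) - f* <= 2 L ||x0 - x*||^2 / (k + 1)^2.
    """
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(T):
        x = y - grad(y) / L                                 # gradient step from y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)         # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev

# Hypothetical ill-conditioned quadratic f(x) = 0.5 x'Ax - b'x.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)
L, T, x0 = 100.0, 50, np.zeros(3)
x_T = accelerated_gradient_descent(grad, L, x0, T)
bound = 2 * L * np.linalg.norm(x0 - x_star) ** 2 / (T + 1) ** 2
```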
3.1 Algorithm and analysis
We propose Algorithm 1 to solve the offline social welfare maximization problem.
Theorem 3.1.
The social welfare induced by given by Algorithm 1 is within from the maximum offline social welfare.
Here is a proof sketch; detailed proofs are given in Appendix A. The proof is based on two observations. The first one, formalized in Lemma 3.3, says that an optimal solution of a dual optimization problem induces the maximum social welfare $W^*$ of eq. 2.
Lemma 3.3.
Given concave continuous valuations and a convex continuous cost ,
and for any dual optimal solution , maximizes social welfare. If furthermore the cost is nondecreasing and Lipschitz continuous, then
and for any optimal solution of the rightmost dual problem, maximizes social welfare.
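Lemma 3.3 tells us that the minimizer of the dual function $\psi(p) = c^*(p) - \sum_{i=1}^m v_i^*(p)$ induces the maximum social welfare, and thus it is natural to try to solve this dual optimization problem. However, the $v_i$’s are unknown to us, and so are the $v_i^*$’s and $\psi$. The second observation is that the revealed preference $x_i(p)$ actually gives a supergradient of $v_i^*$ at $p$. Formally, given a concave continuous valuation $v_i$, for any $p$, by Lemma 2.1,
$$x_i(p) \in \arg\max_{x \in X} \big(v_i(x) - \langle p, x\rangle\big) \iff p \in \partial v_i\big(x_i(p)\big) \iff x_i(p) \in \partial v_i^*(p). \tag{4}$$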
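Similarly, given a convex continuous cost $c$, for any $p$, $y \in \partial c^*(p)$ if and only if $y \in \arg\max_{y'} \big(\langle p, y'\rangle - c(y')\big)$. In other words, a subgradient of $c^*$ at $p$ is a bundle that maximizes the producer’s profit, assuming everything produced can be sold. Consequently, for any $y \in \partial c^*(p)$, the vector $y - \sum_{i=1}^m x_i(p)$, i.e., supply minus aggregate demand, is a subgradient of the dual function $c^*(p) - \sum_{i=1}^m v_i^*(p)$ at $p$.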
As a result, we can run subgradient-based optimization algorithms to minimize the dual function $c^*(p) - \sum_{i=1}^m v_i^*(p)$ using only the aggregate demand oracle. ($c$ is known to the algorithm, and so is $c^*$; computing subgradients of $c^*$ is a separate problem, but it does not require access to the consumer demand oracles.) Since each $v_i$ is $\mu$-strongly concave, by Lemma 3.1, $v_i^*$ is $\frac{1}{\mu}$-strongly smooth and concave. However, there is no such guarantee for $c$, and thus in general $c^*$ is not strongly smooth. In this case the standard optimization algorithm is subgradient descent. For strongly smooth and convex functions, however, accelerated gradient descent converges much faster than subgradient descent. To invoke accelerated gradient descent, we use the smoothing technique of (Nesterov, 2005): we minimize a smoothed version of the dual function as given in Algorithm 1 and tune the smoothing parameter, which finally gives Theorem 3.1.
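The two observations of this section can be combined into a short sketch in the spirit of Algorithm 1, under several assumptions of ours: quadratic valuations with a closed-form demand (standing in for the unknown aggregate demand oracle), a linear cost $c(y) = \langle k, y\rangle$ on $[0, Y]^n$ whose conjugate is nonsmooth, a smoothed conjugate obtained by subtracting $\frac{\beta}{2}\|y\|_2^2$ inside the supremum (one instance of Nesterov-style smoothing), and FISTA-style acceleration in place of AGM. The gradient of the smoothed dual is smoothed supply minus aggregate demand, so each step raises the price of over-demanded goods, i.e., an accelerated Tâtonnement. This is an illustration, not the paper's Algorithm 1 or its parameter choices.

```python
import numpy as np

# Hypothetical market: 2 consumers, 2 goods, v_i(x) = a_i.x - (mu/2)||x||^2 on [0, B]^n.
a = np.array([[3.0, 1.0], [2.0, 4.0]])
mu, B = 1.0, 5.0
k, Y = np.array([1.0, 1.0]), 100.0    # linear cost c(y) = k.y on [0, Y]^n
beta = 0.01                           # smoothing parameter (assumption)

def aggregate_demand(p):
    """Aggregate demand oracle; the algorithm only sees this sum."""
    return np.clip((a - p) / mu, 0.0, B).sum(axis=0)

def smoothed_supply(p):
    """Gradient of the smoothed conjugate max_{y in [0,Y]^n} (p-k).y - (beta/2)||y||^2."""
    return np.clip((p - k) / beta, 0.0, Y)

def excess_supply(p):
    """Gradient of the smoothed dual c*_beta(p) - sum_i v_i*(p)."""
    return smoothed_supply(p) - aggregate_demand(p)

L = 1.0 / beta + len(a) / mu          # smoothness constant of the smoothed dual
p_prev, y, t = np.zeros(2), np.zeros(2), 1.0
for _ in range(3000):                 # accelerated Tatonnement
    p = y - excess_supply(y) / L      # raise prices of over-demanded goods
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = p + ((t - 1.0) / t_next) * (p - p_prev)
    p_prev, t = p, t_next

def welfare(p):
    """Social welfare (eq. 1) of the bundles demanded at prices p."""
    x = np.clip((a - p) / mu, 0.0, B)
    return np.sum(a * x - 0.5 * mu * x**2) - k @ x.sum(axis=0)

W_star = welfare(k)                   # in this instance p = k attains the optimum, 7.0
W_T = welfare(p_prev)
```

In this instance the smoothed prices settle slightly above $k$ (about $1.029$ for both goods), and the induced welfare is about $1.3 \times 10^{-3}$ below the optimum $7$; shrinking `beta` reduces this bias at the cost of a larger smoothness constant `L`.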
4 Online social welfare maximization
Problem description.
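In online social welfare maximization, consumers arrive one by one and the producer (the algorithm) can post prices dynamically. Specifically, at step $t$, prices $p_t$ are posted, and then the $t$-th consumer arrives and purchases a bundle $x_t$. The algorithm then updates $p_t$ to $p_{t+1}$ based on past information. The goal is to maximize the online social welfare $\sum_{t=1}^m v_t(x_t) - c\big(\sum_{t=1}^m x_t\big)$, where $v_t$ is the valuation of the $t$-th arriving consumer.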
To model randomness in the real world, it is often assumed that valuations are sampled i.i.d. from some unknown distribution. Here we consider a slightly more general model, called the random permutation model. In the random permutation model, an adversary chooses $m$ valuations in advance, which then arrive in a uniformly random order. Formally, let $\sigma$ be a uniformly random permutation of $\{1, \dots, m\}$; then at step $t$ the consumer with valuation $v_{\sigma(t)}$ arrives and makes a purchase, after $p_t$ is posted. Note that in the random permutation model, the corresponding offline problem is fixed (with valuations $v_1, \dots, v_m$). We still let , and let
Our goal is to show that the expected online social welfare is close to the offline maximum social welfare $W^*$, where the expectation is taken over the random permutation $\sigma$.
No further assumption is made: the valuations are only required to be continuous.
Online convex optimization.
The algorithm for online social welfare maximization invokes an online convex optimization (OCO) algorithm as a subroutine. In an OCO problem, there is a feasible domain $\mathcal{K}$ and $T$ steps. At step $t$, the OCO algorithm chooses $p_t \in \mathcal{K}$; then a convex function $f_t$ is chosen (potentially by an adversary) and a loss of $f_t(p_t)$ is incurred. Based on past information (formally, $p_1, \dots, p_t$ and $f_1, \dots, f_t$), the algorithm updates $p_t$ to $p_{t+1}$ and tries to minimize the regret, denoted by
$$\mathrm{Regret}_T = \sum_{t=1}^T f_t(p_t) - \min_{p \in \mathcal{K}} \sum_{t=1}^T f_t(p).$$
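The online (sub)gradient descent algorithm performs the following update at step $t$:
$$p_{t+1} = \Pi_{\mathcal{K}}\big(p_t - \eta_t g_t\big),$$
where $\eta_t$ is the step size, $g_t \in \partial f_t(p_t)$, and $\Pi_{\mathcal{K}}$ is the Euclidean projection onto $\mathcal{K}$.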
Lemma 4.1 ((Hazan et al., 2016) Theorem 3.1).
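Let $D = \mathrm{diam}(\mathcal{K})$, $G \ge \max_t \|g_t\|_2$, and $\eta_t = \frac{D}{G\sqrt{t}}$. Then $\mathrm{Regret}_T \le \frac{3}{2} G D \sqrt{T}$.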
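A minimal sketch of projected online gradient descent with step size $\eta_t = D/(G\sqrt{t})$, on hypothetical linear losses $f_t(p) = \langle g_t, p\rangle$ over the box $\mathcal{K} = [0, 1]^2$, where projection reduces to clipping. The random loss sequence and seed are assumptions; the test compares the realized regret with the bound $\frac{3}{2} G D \sqrt{T}$ (Hazan et al., 2016, Theorem 3.1).

```python
import numpy as np

def online_gradient_descent(subgrads, lo, hi, T, D, G, p1):
    """Projected OGD on K = [lo, hi]^n with step eta_t = D / (G sqrt(t)).

    subgrads(t, p) returns a subgradient of f_t at p, revealed after p is played.
    """
    p = p1.copy()
    played = []
    for t in range(1, T + 1):
        played.append(p.copy())
        g = subgrads(t, p)
        p = np.clip(p - D / (G * np.sqrt(t)) * g, lo, hi)   # projection onto the box
    return np.array(played)

# Hypothetical linear losses f_t(p) = <g_t, p> on K = [0, 1]^2.
rng = np.random.default_rng(1)
T, n = 2000, 2
gs = rng.uniform(-1.0, 1.0, size=(T, n)) + np.array([0.3, -0.2])  # drifting losses
G = np.max(np.linalg.norm(gs, axis=1))
D = np.sqrt(n)                                # diameter of [0, 1]^2
played = online_gradient_descent(lambda t, p: gs[t - 1], 0.0, 1.0, T, D, G, np.zeros(n))
total = gs.sum(axis=0)
best_fixed = np.where(total < 0, 1.0, 0.0)    # best fixed point in hindsight for linear losses
regret = np.sum(gs * played) - total @ best_fixed
```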
4.1 Algorithm and analysis
Algorithm 2 is proposed to solve the online social welfare maximization problem. The idea of Algorithm 2 comes from (Agrawal and Devanur, 2015).
Theorem 4.1.
The expected social welfare of Algorithm 2 with respect to a uniformly random permutation of continuous valuations, is within from the offline optimum social welfare, where is the diameter of . Specifically, for online gradient descent, the difference is bounded by .
Proof.
For convenience, let denote . By the regret bound of the OCO algorithm,
(5)  
First, we examine the second term of eq. 5:
The first equality comes from the definition of , while the second inequality is due to the definition of and the monotonicity and Lipschitz continuity of . Thus the second term of eq. 5 always equals the social welfare achieved by Algorithm 2. In the following we show that the first term of eq. 5 is within from the offline maximum social welfare .
For a permutation of , let denote . Note that is determined by (), is determined by , and depends on and . Fix and , note that the revealed preference maximizes the quasilinear utility given , we have
(6) 
Consider the last term of eq. 6 and take expectation with respect to :
(7) 
Then consider the first two terms of eq. 6:
(8) 
Here the first inequality is due to the Cauchy-Schwarz inequality, while the second inequality is given by the Fenchel-Young inequality.
eq. 6, eq. 7 and eq. 8 give us
(9) 
Furthermore, Lemma B.1 shows that the last sum in eq. 9 is bounded by , and thus Theorem 4.1 is proved for general OCO algorithms. Finally, to prove the bound for online gradient descent, it is enough to use step size (recall that is the diameter of ) and invoke Lemma 4.1. ∎
As the proof of Theorem 4.1 shows, it is enough to have continuous valuations. Furthermore, Algorithm 2 still works if consumers only maximize their quasilinear utilities approximately. Formally, if consumer $\sigma(t)$ finds a bundle $\tilde{x}_t$ such that $v_{\sigma(t)}(\tilde{x}_t) - \langle p_t, \tilde{x}_t\rangle \ge \max_{x \in X} \big(v_{\sigma(t)}(x) - \langle p_t, x\rangle\big) - \epsilon_t$, then an additive error of $\epsilon_t$ will be introduced in eq. 6. Hence, as long as the total error $\sum_{t=1}^m \epsilon_t$ is small, the expected online social welfare of Algorithm 2 will still be close to the offline optimum.
5 Profit maximization for separable valuations and cost
In the previous sections, social welfare maximization was solved by reduction to a convex optimization problem on the price space. However, profit maximization may be nonconvex on both the bundle space and the price space.
Example 5.1.
Consider a market where there is only one consumer, one good, and zero cost. Suppose is continuous and strictly decreasing, with , , and for any . The integral of gives a nondecreasing concave valuation . It can be shown that the maximum profit is , which is attained by price at quantity or price at quantity . Thus the sets of optimal prices and optimal bundles are both nonconvex.
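The phenomenon in Example 5.1 can be reproduced on one concrete instance of our own choosing (the example's specific numbers are not reproduced here). Take the hypothetical marginal value $v'(x) = 2 - x$ on $[0, 1]$, $v'(x) = 1/x - 0.1(x-1)(2-x)$ on $[1, 2]$, and $v'(x) = (4 - x)/4$ on $[2, 4]$. It is continuous and strictly decreasing, so its integral is a nondecreasing concave valuation, and posting price $p = v'(x)$ induces demand $x$. With zero cost, profit is $x\,v'(x)$, which equals $1$ both at quantity $1$ (price $1$) and at quantity $2$ (price $1/2$), but is strictly smaller at the midpoint quantity and at the midpoint price.

```python
import numpy as np

def marginal_value(x):
    """Hypothetical continuous, strictly decreasing inverse demand v'(x) on [0, 4]."""
    x = np.asarray(x, dtype=float)
    return np.piecewise(
        x,
        [x <= 1.0, (x > 1.0) & (x <= 2.0), x > 2.0],
        [lambda s: 2.0 - s,
         lambda s: 1.0 / s - 0.1 * (s - 1.0) * (2.0 - s),
         lambda s: (4.0 - s) / 4.0],
    )

xs = np.linspace(0.0, 4.0, 4001)          # quantities
prices = marginal_value(xs)               # price that induces each quantity
revenue = xs * prices                     # zero cost: profit = price * quantity
best = revenue.max()
argmax_x = xs[revenue >= best - 1e-9]     # all profit-maximizing quantities

# Midpoints of the optimal quantities and of the optimal prices are suboptimal.
rev_mid_quantity = 1.5 * marginal_value(np.array([1.5]))[0]
x_mid_price = xs[np.argmin(np.abs(prices - 0.75))]
rev_mid_price = x_mid_price * marginal_value(np.array([x_mid_price]))[0]
```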
Here we present an algorithm for profit maximization and a nearly matching lower bound when all $v_i$’s and $c$ are separable. Formally, for every $i$ and every $x \in X$, $v_i(x) = \sum_{j=1}^n v_{ij}(x_j)$, and for every $y$, $c(y) = \sum_{j=1}^n c_j(y_j)$. Due to the separability assumption, we restate the assumptions on the feasible set, valuation functions and cost function:

.

For every $i$ and $j$, $v_{ij}$ is strongly concave and Lipschitz continuous.

For every $j$, $c_j$ is Lipschitz continuous.
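The $i$-th consumer’s consumption of good $j$ is completely determined by $p_j$ and is denoted by $x_{ij}(p_j)$. Furthermore, the profit eq. 3 decomposes as $\sum_{j=1}^n \big(p_j \sum_{i=1}^m x_{ij}(p_j) - c_j\big(\sum_{i=1}^m x_{ij}(p_j)\big)\big)$. Our goal is thus to maximize $p_j \sum_{i=1}^m x_{ij}(p_j) - c_j\big(\sum_{i=1}^m x_{ij}(p_j)\big)$ separately for each $j$. Although prices for different goods can now be set independently, for consistency we still count posting one price vector as one query.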
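With separable valuations and cost, one query already reveals the demand for every good at its own price. The sketch below exploits this with a plain grid search (not Algorithm 3): it posts the same candidate price on all goods at once, so $K$ candidates cost $K$ queries regardless of $n$, and then picks the best candidate for each good separately. The market (quadratic valuations, linear costs $c_j(y) = k_j y$) and the price grid are hypothetical.

```python
import numpy as np

# Hypothetical separable market: 2 consumers, 2 goods, linear costs c_j(y) = k_j * y.
a = np.array([[3.0, 1.0], [2.0, 4.0]])
mu, B = 1.0, 5.0
k = np.array([1.0, 1.0])

def aggregate_demand(p):
    """Aggregate demand oracle: one call = one posted price vector = one query."""
    return np.clip((a - p) / mu, 0.0, B).sum(axis=0)

def separable_grid_search(candidates):
    """Post the same candidate price on every good at once, then pick per good.

    Because demand for good j depends only on p_j, each query evaluates the
    per-good profit (p_j - k_j) * demand_j for all n goods simultaneously.
    """
    profits = np.array([(g - k) * aggregate_demand(np.full(k.shape, g)) for g in candidates])
    best = profits.argmax(axis=0)             # best candidate index for each good
    return candidates[best], profits[best, np.arange(len(k))].sum(), len(candidates)

prices, profit, queries = separable_grid_search(np.linspace(0.0, 5.0, 501))
```

In this instance the per-good optima are $p_1 = 1.75$ and $p_2 = 2.5$, with total profit $3.375$ found using $501$ queries.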
Theorem 5.1.
The profit given by Algorithm 3 is no less than the optimum profit minus . The number of queries is .
Proof.
Fix $j$. Let $p_j^*$ denote the profit-maximizing price for good $j$. Suppose . By the definition of strong smoothness and Lemma 3.1, we have
∎
Remark.
Note that if the cost is $0$, so that revenue maximization is considered, then we can set in Algorithm 3, and it suffices to assume concave valuations.
Theorem 5.2 shows that the dependency on and cannot be improved, even for revenue maximization. The proof is given in Appendix C.
Theorem 5.2.
The revenue maximization problem needs queries to get an additive error , even if the valuations are separable, concave, nondecreasing, and Lipschitz continuous.
6 Conclusion and open problems
In this paper, we study social welfare and profit maximization with only revealed preferences. The social welfare maximization problem can be solved by reducing to a convex dual optimization problem in both the offline and online case, while profit maximization is essentially nonconvex, for which we give nearly matching upper and lower bounds on the query complexity when valuations and cost are separable.
While social welfare maximization is interesting and important, it is still more reasonable for a producer to maximize profit. However, as shown by Example 5.1, this problem is in general nonconvex. While we give an algorithm for the separable case, it is a very interesting open problem to design algorithms for profit maximization in a more general setting or show some hardness result.
References
 Afriat (1967) S. N. Afriat. The construction of utility functions from expenditure data. International Economic Review, 1967.
 Agrawal and Devanur (2015) Shipra Agrawal and Nikhil R Devanur. Fast algorithms for online stochastic convex programming. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1405–1424. SIAM, 2015.
 Allen-Zhu and Orecchia (2014) Zeyuan Allen-Zhu and Lorenzo Orecchia. Linear coupling: An ultimate unification of gradient and mirror descent. arXiv preprint arXiv:1407.1537, 2014.
 Amin et al. (2015) Kareem Amin, Rachel Cummings, Lili Dworkin, Michael Kearns, and Aaron Roth. Online learning and profit maximization from revealed preferences. In AAAI, pages 770–776, 2015.
 Babaioff et al. (2015) Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, and Aleksandrs Slivkins. Dynamic pricing with limited supply. ACM Transactions on Economics and Computation, 3(1):4, 2015.
 Balcan et al. (2014) Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V Vazirani. Learning economic parameters from revealed preferences. In International Conference on Web and Internet Economics, pages 338–353. Springer, 2014.
 Beigman and Vohra (2006) Eyal Beigman and Rakesh Vohra. Learning from revealed preference. In Proceedings of the 7th ACM Conference on Electronic Commerce, pages 36–42. ACM, 2006.
 Besbes and Zeevi (2009) Omar Besbes and Assaf Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
 Besbes and Zeevi (2012) Omar Besbes and Assaf Zeevi. Blind network revenue management. Operations Research, 60(6):1537–1550, 2012.
 Blum et al. (2015) Avrim Blum, Yishay Mansour, and Jamie Morgenstern. Learning what’s going on: reconstructing preferences and priorities from opaque transactions. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 601–618. ACM, 2015.
 Broder and Rusmevichientong (2012) Josef Broder and Paat Rusmevichientong. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980, 2012.
 den Boer (2015) Arnoud V den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.
 den Boer and Zwart (2013) Arnoud V den Boer and Bert Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management Science, 60(3):770–783, 2013.
 Devanur and Hayes (2009) Nikhil R Devanur and Thomas P Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 71–78. ACM, 2009.
 Diewert (1973) E. Diewert. Afriat and revealed preference theory. Review of Economic Studies, 40:419–426, 1973.
 Dong et al. (2018) Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 55–70. ACM, 2018.
 Goel and Mehta (2008) Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models with applications to adwords. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 982–991. Society for Industrial and Applied Mathematics, 2008.
 Hazan et al. (2016) Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
 Hiriart-Urruty and Lemaréchal (2012) Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of convex analysis. Springer Science & Business Media, 2012.
 Houthakker (1950) H. S. Houthakker. Revealed preference and the utility function. Economica, 17:159–174, 1950.
 Keskin and Zeevi (2014) N Bora Keskin and Assaf Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62(5):1142–1167, 2014.
 Mas-Colell (1977) Andreu Mas-Colell. The recoverability of consumers’ preferences from market demand. Econometrica, 45(6):1409–1430, 1977.
 Mas-Colell (1978) Andreu Mas-Colell. On revealed preference analysis. The Review of Economic Studies, 45(1):121–131, 1978.
 Nesterov (2005) Yu Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
 Nesterov (1983) Yurii Nesterov. A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Math. Dokl., 27(2):372–376, 1983.
 Nesterov (2013) Yurii Nesterov. Introductory lectures on convex optimization: A basic course. Springer Science & Business Media, 2013.
 Richter (1966) M. Richter. Revealed preference theory. Econometrica, 34(3):635–645, 1966.
 Rockafellar (1970) R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.
 Roth et al. (2016) Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 949–962. ACM, 2016.
 Roth et al. (2017) Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, and Zhiwei Steven Wu. Multidimensional dynamic pricing for welfare maximization. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 519–536. ACM, 2017.
 Samuelson (1938) Paul A Samuelson. A note on the pure theory of consumer’s behaviour. Economica, 5(17):61–71, 1938.
 Sion et al. (1958) Maurice Sion et al. On general minimax theorems. Pacific Journal of mathematics, 8(1):171–176, 1958.
 Tseng (2008) Paul Tseng. On accelerated proximal gradient methods for convex-concave optimization. http://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf, 2008.
 Uzawa (1960) H. Uzawa. Preference and rational choice in the theory of consumption. Mathematical Models in Social Science, eds. K. J. Arrow, S. Karlin, and P. Suppes, 1960.
 Varian (2005) Hal R. Varian. Revealed preference. In Samuelsonian Economics and the 21st Century. M. Szenberg, L. Ramrattan, and A. A. Gottesman, editors, pages 99–115, 2005.
 Wang et al. (2014) Zizhuo Wang, Shiming Deng, and Yinyu Ye. Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Operations Research, 62(2):318–331, 2014.
 Zadimoghaddam and Roth (2012) Morteza Zadimoghaddam and Aaron Roth. Efficiently learning from revealed preference. In WINE, pages 114–127. Springer, 2012.
Appendix A Omitted proofs from Section 3
Proof of Lemma 3.3.
Maximizing social welfare is equivalent to solving the following problem