Imagine you are interested in learning an accurate estimate of the probability that the United States unemployment rate for a particular month will fall below 10%. You could choose to spend hours digging through news articles, reading financial reports, and weighing various opinions against each other, eventually coming up with a reasonably informed estimate. However, you could potentially save yourself a lot of hassle (and obtain a better estimate!) by appealing to the wisdom of crowds.
A prediction market is a financial market designed for information aggregation. For example, in a cost function based prediction market, the organizer (or market maker) trades a set of securities corresponding to each potential outcome of an event. The market maker might offer a security that pays $1 if and only if the United States unemployment rate for January 2010 is above 10%. A risk neutral trader who believes that the true probability that the unemployment rate will be above 10% is \(p\) should be willing to buy a share of this security at any price below \(p\). Similarly, he should be willing to sell a share of this security at any price above \(p\). For this reason, the current market price of this security can be viewed as the population’s collective estimate of how likely it is that the unemployment rate will be above 10%.
These estimates have proved quite accurate in practice in a wide variety of domains. (See Ledyard et al.  for an impressive assortment of examples.) The theory of rational expectations equilibria offers some insight into why prediction markets in general should converge to accurate prices, but is plagued by strong assumptions and no-trade theorems . Furthermore, this theory says nothing of why particular prediction market mechanisms, such as Hanson’s increasingly popular Logarithmic Market Scoring Rule (LMSR) [15, 16], might produce more accurate estimates than others in practice. In this work, we aim to provide additional insight into the learning power of particular market mechanisms by highlighting the deep mathematical connections between prediction markets and no-regret learning.
It should come as no surprise that there is a connection between prediction markets and learning. The theories of markets and learning are built upon many of the same fundamental concepts, such as proper scoring rules (called proper losses in the learning community) and Bregman divergences. To our knowledge, Chen et al. were the first to formally demonstrate a connection, showing that the standard Randomized Weighted Majority regret bound can be used as a starting point to rederive the well-known bound on the worst-case loss of an LMSR market maker. (They went on to show that PermELearn, an extension of Weighted Majority to permutation learning, can be used to efficiently run LMSR over combinatorial outcome spaces for betting on rankings.) As we show in Section 4, the converse is also true; the Weighted Majority regret bound can be derived directly from the bound on the worst-case loss of a market maker using LMSR. However, the connection goes much deeper.
In Section 4, we show how any cost function based prediction market with bounded loss can be interpreted as a no-regret learning algorithm. Furthermore, if the loss of the market maker is bounded, this bound can be used to derive an \(O(\sqrt{T})\) regret bound for the corresponding learning algorithm, where \(T\) is the number of time steps. The key idea is to view the trades made in the market as losses observed by the learning algorithm. We can then think of the market maker as learning a probability distribution over outcomes by treating each observed trade as a training instance.
In Section 5, we go on to show that the class of convex cost function based markets exactly corresponds to the class of Follow the Regularized Leader learning algorithms [34, 18, 17] in which weights are chosen at each time step to minimize a combination of empirical loss and a convex regularization term. This allows us to interpret the selection of a cost function for the market as the selection of a regularizer for the learning problem. Furthermore, we prove an equivalence between another common class of prediction markets, market scoring rules, and convex cost function based markets, which immediately implies that market scoring rules can be interpreted as Follow the Regularized Leader algorithms too. (Footnote 1: A similar but weaker correspondence between market scoring rules and cost function based markets was discussed in Chen and Pennock and Agrawal et al.) These connections provide insight into why it is that prediction markets tend to yield such accurate estimates in practice.
2 Prediction Markets
In recent years, a variety of compelling prediction market mechanisms have been proposed and studied, including standard call market mechanisms and Pennock’s dynamic parimutuel markets . In this work we focus on two broad classes of mechanisms: Hanson’s market scoring rules [15, 16] and cost function based prediction markets as described in Chen and Pennock . We also briefly discuss the related class of Sequential Convex Parimutuel Mechanisms  in Section 5.4.
2.1 Market Scoring Rules
Scoring rules have long been used in the evaluation of probabilistic forecasts. In the context of prediction markets and elicitation, scoring rules are used to encourage individuals to make careful assessments and truthfully report their beliefs [33, 11, 26]3, 32].
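Formally, let \(\{1,\dots,N\}\) be a set of mutually exclusive and exhaustive outcomes of a future event. A scoring rule \(s\) maps a probability distribution \(\vec{p}\) to a score \(s_i(\vec{p})\) for each outcome \(i\), with \(s_i(\vec{p})\) taking values in the extended real line \([-\infty,\infty]\). Intuitively, this score represents the reward a forecaster might receive for predicting the distribution \(\vec{p}\) if the outcome turns out to be \(i\). A scoring rule is said to be regular relative to the probability simplex \(\Delta_N\) if \(\sum_{i=1}^N p_i s_i(\vec{p}') \in [-\infty,\infty)\) for all \(\vec{p},\vec{p}' \in \Delta_N\), with \(\sum_{i=1}^N p_i s_i(\vec{p}) \in (-\infty,\infty)\). This implies that \(s_i(\vec{p})\) is finite whenever \(p_i > 0\). A scoring rule is said to be proper if a risk-neutral forecaster who believes the true distribution over outcomes to be \(\vec{p}\) has no incentive to report any alternate distribution \(\vec{p}'\), that is, if \(\sum_{i=1}^N p_i s_i(\vec{p}) \ge \sum_{i=1}^N p_i s_i(\vec{p}')\) for all distributions \(\vec{p}'\). The rule is strictly proper if this inequality holds with equality only when \(\vec{p} = \vec{p}'\).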
Two examples of regular, strictly proper scoring rules commonly used in both elicitation and machine learning are the quadratic scoring rule:
\[ s_i(\vec{p}) = a_i + b\Big(2p_i - \sum_{j=1}^N p_j^2\Big) \tag{1} \]
and the logarithmic scoring rule:
\[ s_i(\vec{p}) = a_i + b\log(p_i) \tag{2} \]
with arbitrary parameters \(a_1,\dots,a_N\) and parameter \(b > 0\). The uses and properties of scoring rules are too extensive to cover in detail here. For a nice survey, see Gneiting and Raftery.
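Strict properness can be checked numerically. The following is a minimal sketch, assuming the standard forms \(s_i(\vec{p}) = a_i + b(2p_i - \sum_j p_j^2)\) and \(s_i(\vec{p}) = a_i + b\log(p_i)\) for the two rules; the function names, the fixed belief vector, and the random sampling scheme are illustrative choices and not part of the original text.

```python
import math
import random

def quadratic_score(p, i, a=0.0, b=1.0):
    """Quadratic rule: a + b * (2 * p_i - sum_j p_j^2)."""
    return a + b * (2.0 * p[i] - sum(pj * pj for pj in p))

def log_score(p, i, a=0.0, b=1.0):
    """Logarithmic rule: a + b * log(p_i)."""
    return a + b * math.log(p[i])

def expected_score(rule, belief, report):
    """Expected score of `report` when outcomes are drawn from `belief`."""
    return sum(r * rule(report, i) for i, r in enumerate(belief) if r > 0)

def random_distribution(n, rng):
    """Sample a point in the interior of the probability simplex."""
    x = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(x)
    return [xi / total for xi in x]

rng = random.Random(0)
belief = [0.5, 0.3, 0.2]
reports = [random_distribution(len(belief), rng) for _ in range(1000)]

# Smallest advantage of honest reporting over any sampled misreport.
min_gap = {
    rule.__name__: min(
        expected_score(rule, belief, belief) - expected_score(rule, belief, rep)
        for rep in reports
    )
    for rule in (quadratic_score, log_score)
}
```

For both rules the gap stays strictly positive over the sampled reports, which is what strict properness predicts: the gap is \(b\lVert\vec{p}-\vec{p}'\rVert^2\) for the quadratic rule and \(b\) times the KL divergence for the logarithmic rule.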
Market scoring rules were developed by Hanson [15, 16] as a method of using scoring rules to pool opinions from many different forecasters. Market scoring rules are sequentially shared scoring rules. Formally, the market maintains a current probability distribution \(\vec{p}\). At any time, a trader can enter the market and change this distribution to an arbitrary distribution \(\vec{p}'\) of her choice. If the outcome turns out to be \(i\), she receives a (possibly negative) payoff of \(s_i(\vec{p}') - s_i(\vec{p})\). For example, in the popular Logarithmic Market Scoring Rule (LMSR), which is based on the logarithmic scoring rule in Equation 2, a trader who changes the distribution from \(\vec{p}\) to \(\vec{p}'\) receives a payoff of \(b\log(p'_i/p_i)\). (Footnote 2: While \(\vec{p}'\) may be arbitrary, in some market scoring rules, such as the LMSR, distributions that place a weight of 0 on any outcome are not allowed, because doing so would require the trader to pay an infinite amount of money if the outcome with reported probability 0 actually happens.)
Since the trader has no control over \(\vec{p}\), a myopic trader who believes the true distribution to be \(\vec{r}\) maximizes her expected payoff by maximizing \(\sum_{i=1}^N r_i s_i(\vec{p}')\). Thus if \(s\) is a strictly proper scoring rule, traders have an incentive to change the market’s distribution to match their true beliefs. The idea is that if traders update their own beliefs over time based on market activity, the market’s distribution should eventually converge to the collective beliefs of the population.
Each trader in a market scoring rule is essentially responsible for paying the previous trader’s score. Thus the market maker is responsible only for paying the score of the final trader. Let \(\vec{p}_0\) be the initial probability distribution of the market. The worst case loss of the market maker is then
\[ \max_{i \in \{1,\dots,N\}} \sup_{\vec{p} \in \Delta_N} \big( s_i(\vec{p}) - s_i(\vec{p}_0) \big). \]
The worst case loss of the market maker running an LMSR initialized to the uniform distribution is \(b\log N\).
Note that the parameters \(a_1,\dots,a_N\) of the logarithmic scoring rule do not affect either the payoff of traders or the loss of the market maker in the LMSR. For simplicity, in the remainder of this paper when discussing the LMSR we assume that \(a_i = 0\) for all \(i\).
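The telescoping of payments in a market scoring rule can be illustrated with a small simulation. This is a sketch under the assumption that the LMSR payoff for moving the market from \(\vec{p}\) to \(\vec{p}'\) is \(b\log(p'_i/p_i)\) when outcome \(i\) occurs; the trade history and helper names below are made up for illustration.

```python
import math

def lmsr_payoff(p_old, p_new, outcome, b=1.0):
    """Payoff to a trader who moves the market from p_old to p_new."""
    return b * math.log(p_new[outcome] / p_old[outcome])

def market_maker_loss(history, outcome, b=1.0):
    """Total paid out to all traders; history[0] is the initial distribution."""
    return sum(
        lmsr_payoff(history[k], history[k + 1], outcome, b)
        for k in range(len(history) - 1)
    )

N, b = 3, 2.0
history = [
    [1 / 3, 1 / 3, 1 / 3],  # initial (uniform) distribution
    [0.5, 0.3, 0.2],        # after trader 1
    [0.2, 0.7, 0.1],        # after trader 2
    [0.05, 0.9, 0.05],      # after trader 3
]

# Intermediate reports cancel, so only the first and last distributions matter.
losses = [market_maker_loss(history, i, b) for i in range(N)]
final_only = [lmsr_payoff(history[0], history[-1], i, b) for i in range(N)]
worst_case_bound = b * math.log(N)
```

Because each trader's payment is a difference of scores, the sum over traders collapses to the score of the final distribution minus the score of the initial one, and with a uniform start no outcome can cost the market maker more than \(b\log N\).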
2.2 Cost Function Based Markets
As before, let \(\{1,\dots,N\}\) be a set of mutually exclusive and exhaustive outcomes of an event. In a cost function based market, a market maker offers a security corresponding to each outcome \(i\). The security associated with outcome \(i\) pays off $1 if \(i\) happens, and $0 otherwise. (Footnote 3: The dynamic parimutuel market falls outside this framework since the winning payoff depends on future trades.)
Different mechanisms can be used to determine how these securities are priced. Each mechanism is specified using a differentiable cost function \(C : \mathbb{R}^N \to \mathbb{R}\). This cost function is simply a potential function describing the amount of money currently wagered in the market as a function of the quantity of shares purchased. If \(q_i\) is the number of shares of security \(i\) currently held by traders, and a trader would like to purchase \(r_i\) shares of each security (where \(r_i\) could be zero or even negative, representing the sale of shares), the trader must pay \(C(\vec{q}+\vec{r}) - C(\vec{q})\) to the market maker. The instantaneous price of security \(i\) (that is, the price per share of an infinitely small number of shares) is then \(p_i(\vec{q}) = \partial C(\vec{q}) / \partial q_i\).
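We say that a cost function is valid if the associated prices satisfy two simple conditions:
1. For every \(i \in \{1,\dots,N\}\) and every \(\vec{q} \in \mathbb{R}^N\), \(p_i(\vec{q}) \ge 0\).
2. For every \(\vec{q} \in \mathbb{R}^N\), \(\sum_{i=1}^N p_i(\vec{q}) = 1\).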
The first condition ensures that the price of a security is never negative. If the current price of the security associated with an outcome were negative, a trader could purchase shares of this security at a guaranteed profit. The second condition ensures that the prices of all securities sum to 1. If the prices summed to something less than (respectively, greater than) 1, then a trader could purchase (respectively, sell) small equal quantities of each security for a guaranteed profit. Together, these conditions ensure that there are no arbitrage opportunities in the market.
These conditions also ensure that the current prices can always be viewed as a valid probability distribution over the outcome space. In fact, these prices represent the market’s current estimate of the probability that each outcome \(i\) will occur.
The following theorem gives sufficient and necessary conditions for the cost function to be valid. While these properties of cost functions have been discussed elsewhere [5, 1], the fact that they are both sufficient and necessary for any valid cost function is important for our later analysis. As such, we state the full proof here for completeness.
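Theorem 1
A cost function \(C\) is valid if and only if it satisfies the following three properties:
1. Differentiability: The partial derivatives \(\partial C(\vec{q}) / \partial q_i\) exist for all \(\vec{q} \in \mathbb{R}^N\) and \(i \in \{1,\dots,N\}\).
2. Increasing Monotonicity: For any \(\vec{q}\) and \(\vec{q}'\), if \(\vec{q} \ge \vec{q}'\), then \(C(\vec{q}) \ge C(\vec{q}')\).
3. Positive Translation Invariance: For any \(\vec{q}\) and any constant \(k\), \(C(\vec{q} + k\vec{1}) = C(\vec{q}) + k\).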
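Differentiability is necessary and sufficient for the price functions to be well-defined at all points. It is easy to see that requiring the cost function to be monotonic is equivalent to requiring that \(p_i(\vec{q}) \ge 0\) for all \(i\) and \(\vec{q}\). We will show that requiring positive translation invariance is equivalent to requiring that the prices always sum to one.
First, assume that \(\sum_{i=1}^N \partial C(\vec{q})/\partial q_i = 1\) for all \(\vec{q}\). For any fixed value of \(\vec{q}\), define \(\vec{u} = \vec{u}(a) = \vec{q} + a\vec{1}\) and let \(u_i\) be the \(i\)th component of \(\vec{u}\). Then for any \(k\),
\[ C(\vec{q} + k\vec{1}) - C(\vec{q}) = \int_0^k \frac{dC(\vec{u}(a))}{da}\,da = \int_0^k \sum_{i=1}^N \frac{\partial C(\vec{u})}{\partial u_i}\frac{\partial u_i}{\partial a}\,da = \int_0^k 1\,da = k. \]
This is precisely translation invariance.
Now assume instead that positive translation invariance holds. Fix any arbitrary \(\vec{q}^*\) and \(k\) and define \(\vec{q} = \vec{q}^* + k\vec{1}\). Notice that by setting \(\vec{q}^*\) and \(k\) appropriately, we can make \(\vec{q}\) take on any arbitrary values. We have,
\[ \frac{\partial C(\vec{q})}{\partial k} = \sum_{i=1}^N \frac{\partial C(\vec{q})}{\partial q_i}\frac{\partial q_i}{\partial k} = \sum_{i=1}^N \frac{\partial C(\vec{q})}{\partial q_i}. \]
By translation invariance, \(C(\vec{q}^* + k\vec{1}) = C(\vec{q}^*) + k\). Thus,
\[ \frac{\partial C(\vec{q})}{\partial k} = \frac{\partial \big(C(\vec{q}^*) + k\big)}{\partial k} = 1. \]
Combining the two equations, we have \(\sum_{i=1}^N \partial C(\vec{q})/\partial q_i = 1\).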
One quantity that is useful for comparing different market mechanisms is the worst-case loss of the market maker,
\[ \max_{i \in \{1,\dots,N\}} \sup_{\vec{q} \in \mathbb{R}^N} \Big( q_i - \big(C(\vec{q}) - C(\vec{0})\big) \Big). \]
This is simply the difference between the maximum amount that the market maker might have to pay the winners and the amount of money collected by the market maker.
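As an example, the Logarithmic Market Scoring Rule can also be specified as a cost function based market. The cost function of the LMSR is
\[ C(\vec{q}) = b \log \sum_{i=1}^N e^{q_i/b} \]
and the corresponding prices are
\[ p_i(\vec{q}) = \frac{e^{q_i/b}}{\sum_{j=1}^N e^{q_j/b}}. \]
This formulation is equivalent to the market scoring rule formulation in the sense that a trader who changes the market probabilities from \(\vec{p}\) to \(\vec{p}'\) in the MSR formulation receives the same payoff for every outcome \(i\) as a trader who changes the quantity vectors from any \(\vec{q}\) to \(\vec{q}'\) such that \(\vec{p}(\vec{q}) = \vec{p}\) and \(\vec{p}(\vec{q}') = \vec{p}'\) in the cost function formulation.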
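The properties discussed in this section can be checked on the LMSR. The sketch below assumes the usual LMSR cost function \(C(\vec{q}) = b\log\sum_i e^{q_i/b}\) with softmax prices; the helper names and the particular quantity vectors are illustrative only.

```python
import math

def lmsr_cost(q, b=1.0):
    """C(q) = b * log(sum_i exp(q_i / b)), shifted by max(q) for stability."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def lmsr_prices(q, b=1.0):
    """Instantaneous prices p_i(q) = exp(q_i / b) / sum_j exp(q_j / b)."""
    m = max(q)
    w = [math.exp((qi - m) / b) for qi in q]
    z = sum(w)
    return [wi / z for wi in w]

def trade_cost(q, r, b=1.0):
    """Amount paid for buying r_i shares of each security i at state q."""
    q_new = [qi + ri for qi, ri in zip(q, r)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

def trader_profit(q, r, outcome, b=1.0):
    """Net profit if `outcome` occurs: $1 per share held minus the amount paid."""
    return r[outcome] - trade_cost(q, r, b)

b = 1.5
q = [1.0, -2.0, 0.5]
r = [0.3, 0.0, -0.7]
q_new = [qi + ri for qi, ri in zip(q, r)]

p_old, p_new = lmsr_prices(q, b), lmsr_prices(q_new, b)

# Payoff under the market scoring rule view vs. the cost function view.
msr_payoffs = [b * math.log(p_new[i] / p_old[i]) for i in range(len(q))]
cost_payoffs = [trader_profit(q, r, i, b) for i in range(len(q))]

# Translation invariance: adding k shares of every security costs exactly k.
k = 2.25
translation_gap = lmsr_cost([qi + k for qi in q], b) - lmsr_cost(q, b) - k
```

The prices form a probability distribution, buying `k` units of every security costs exactly `k`, and the per-outcome profit of the trade agrees with the scoring rule payoff \(b\log(p'_i/p_i)\).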
3 Learning from Expert Advice
We now briefly review the problem of learning from expert advice. In this framework, an algorithm makes a sequence of predictions based on the advice of a set of experts and receives a corresponding sequence of losses. (Footnote 4: This framework could be formalized equally well in terms of gains, but losses are more common in the literature.) The goal of the algorithm is to achieve a cumulative loss that is “almost as low” as the cumulative loss of the best performing expert in hindsight. No statistical assumptions are made about these losses. Indeed, algorithms are expected to perform well even if the sequence of losses is chosen by an adversary.
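Formally, at every time step \(t \in \{1,\dots,T\}\), every expert \(i \in \{1,\dots,N\}\) receives a loss \(\ell_{i,t} \in [0,1]\). The cumulative loss of expert \(i\) at time \(T\) is then defined as \(L_{i,T} = \sum_{t=1}^T \ell_{i,t}\). An algorithm \(\mathcal{A}\) maintains a weight \(w_{i,t}\) for each expert \(i\) at time \(t\), where \(\sum_{i=1}^N w_{i,t} = 1\). These weights can be viewed as a distribution over the experts. The algorithm then receives its own instantaneous loss \(\ell_{\mathcal{A},t} = \sum_{i=1}^N w_{i,t}\ell_{i,t}\), which can be interpreted as the expected loss the algorithm would receive if it always chose an expert to follow according to the current distribution. The cumulative loss of \(\mathcal{A}\) up to time \(T\) is defined in the natural way as \(L_{\mathcal{A},T} = \sum_{t=1}^T \ell_{\mathcal{A},t}\).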
It is unreasonable to expect the algorithm to achieve a small cumulative loss if none of the experts perform well. As such, it is typical to measure the performance of an algorithm in terms of its regret, defined to be the difference between the cumulative loss of the algorithm and the loss of the best performing expert, that is,
\[ L_{\mathcal{A},T} - \min_{i \in \{1,\dots,N\}} L_{i,T}. \]
An algorithm is said to have no regret if the average per time step regret approaches \(0\) as \(T\) approaches infinity.
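The Randomized Weighted Majority (WM) algorithm is a well-known example of a no-regret algorithm. It uses weights
\[ w_{i,t} = \frac{e^{-\eta L_{i,t-1}}}{\sum_{j=1}^N e^{-\eta L_{j,t-1}}}, \]
where \(\eta > 0\) is a tunable parameter known as the learning rate. It is well known that the regret of WM after \(T\) trials can be bounded as
\[ L_{WM,T} - \min_{i \in \{1,\dots,N\}} L_{i,T} \le \eta T + \frac{\log N}{\eta}. \]
When \(T\) is known in advance, setting \(\eta = \sqrt{\log N / T}\) yields the standard \(O(\sqrt{T \log N})\) regret bound.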
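A short simulation makes the regret guarantee concrete. This is a sketch that assumes the exponential-weights form of Weighted Majority, \(w_{i,t} \propto e^{-\eta L_{i,t-1}}\), with losses in \([0,1]\) and learning rate \(\eta = \sqrt{\log N / T}\); the random loss sequence and the comparison against \(2\sqrt{T\log N}\) are illustrative assumptions rather than values taken from the text.

```python
import math
import random

def wm_weights(cum_losses, eta):
    """Weights proportional to exp(-eta * cumulative loss)."""
    m = min(cum_losses)  # shift so the largest exponent is 0
    w = [math.exp(-eta * (L - m)) for L in cum_losses]
    z = sum(w)
    return [wi / z for wi in w]

def run_wm(loss_rows, eta):
    """Return (cumulative loss of WM, cumulative loss of each expert)."""
    n = len(loss_rows[0])
    cum = [0.0] * n
    alg_loss = 0.0
    for losses in loss_rows:
        w = wm_weights(cum, eta)  # weights depend on past losses only
        alg_loss += sum(wi * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return alg_loss, cum

random.seed(0)
N, T = 5, 2000
rows = [[random.random() for _ in range(N)] for _ in range(T)]

eta = math.sqrt(math.log(N) / T)
alg_loss, cum = run_wm(rows, eta)
regret = alg_loss - min(cum)
reference = 2.0 * math.sqrt(T * math.log(N))
```

Dividing `regret` by `T` gives the average per-step regret, which shrinks as the horizon grows; that is the no-regret property.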
It has been shown that the weights chosen by Weighted Majority are precisely those that minimize a combination of empirical loss and an entropic regularization term [24, 25, 20]. More specifically, the weights at time \(t\) are precisely those that minimize
\[ \sum_{i=1}^N w_i L_{i,t-1} - \frac{1}{\eta} H(\vec{w}) \]
among all \(\vec{w} \in \Delta_N\), where \(H\) is the entropy. This makes Weighted Majority an example of a broader class of algorithms collectively known as Follow the Regularized Leader algorithms [34, 18, 17]. This class of algorithms grew out of the following fundamental insight of Kalai and Vempala.
Consider first the aptly named Follow the Leader algorithm, which chooses weights at time \(t\) to minimize \(\sum_{i=1}^N w_{i,t} L_{i,t-1}\). This algorithm simply places all of its weight on the single expert (or set of experts) with the best performance on previous examples. As such, this algorithm can be highly unstable, dramatically changing its weights from one time step to the next. It is easy to see that Follow the Leader suffers \(\Omega(T)\) regret in the worst case when the best expert changes frequently. For example, if there are only two experts with losses starting at \((1/2, 0)\) and then alternating \((0,1), (1,0), (0,1), \dots\), then FTL places a weight of 1 on the losing expert at every point in time.
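The instability of Follow the Leader is easy to reproduce. The sketch below assumes the two-expert loss sequence \((1/2, 0), (0,1), (1,0), (0,1), \dots\) and breaks the initial tie in favor of the first expert; both choices are assumptions made for illustration.

```python
def follow_the_leader(loss_rows):
    """Put all weight on the expert with the smallest cumulative loss so far."""
    n = len(loss_rows[0])
    cum = [0.0] * n
    alg_loss = 0.0
    for losses in loss_rows:
        leader = min(range(n), key=lambda i: cum[i])  # ties go to the lowest index
        alg_loss += losses[leader]
        cum = [c + l for c, l in zip(cum, losses)]
    return alg_loss, cum

T = 1000
# First round (1/2, 0), then alternate (0, 1), (1, 0), (0, 1), ...
rows = [(0.5, 0.0)] + [
    (0.0, 1.0) if t % 2 == 0 else (1.0, 0.0) for t in range(T - 1)
]

alg_loss, cum = follow_the_leader(rows)
regret = alg_loss - min(cum)
```

After the first round the current leader is always the expert about to incur a loss of 1, so the algorithm loses roughly `T` in total while each expert loses only about `T / 2`, leaving regret that grows linearly in `T`.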
To overcome this instability, Kalai and Vempala suggested adding a random perturbation to the empirical loss of each expert, and choosing the expert that minimizes this perturbed loss. (Footnote 5: A very similar algorithm was originally developed and analyzed by Hannan in the 1950s.) However, in general this perturbation need not be random. Instead of adding a random perturbation, it is possible to gain the necessary stability by adding a regularizer \(R\) and choosing weights to minimize
\[ \sum_{i=1}^N w_{i,t} L_{i,t-1} + \frac{1}{\eta} R(\vec{w}_t). \tag{3} \]
This Follow the Regularized Leader (FTRL) approach gets around the instability of FTL and guarantees low regret for a wide variety of regularizers, as evidenced by the following bound of Hazan and Kale .
Lemma 1 (Hazan and Kale )
For any regularizer , the regret of FTRL can be bounded as
This lemma quantifies the trade-off that must be considered when choosing a regularizer. If the range of the regularizer is too small, the weights will change dramatically from one round to the next, and the first term in the bound will be large. On the other hand, if the range of the regularizer is too big, the weights that are chosen will be too far from the true loss minimizers and the second term will blow up.
It is generally assumed that the regularizer is strictly convex. This assumption ensures that Equation 3 has a unique minimizer and that this minimizer can be computed efficiently. Hazan  shows that if is strictly convex then it is possible to achieve a regret of . In particular, by optimizing appropriately the regret bound in Lemma 1 can be upper bounded by
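The entropic-regularization view of Weighted Majority can be verified numerically. The following sketch assumes the regularized objective \(\sum_i w_i L_{i,t-1} - \frac{1}{\eta}H(\vec{w})\), that is, Follow the Regularized Leader with the negative entropy as regularizer; the cumulative losses, sample size, and helper names are arbitrary illustrative choices.

```python
import math
import random

def entropy(w):
    """Shannon entropy H(w) = -sum_i w_i log w_i (with 0 log 0 = 0)."""
    return -sum(wi * math.log(wi) for wi in w if wi > 0)

def ftrl_objective(w, cum_losses, eta):
    """Empirical loss plus entropic regularizer: sum_i w_i L_i - H(w) / eta."""
    return sum(wi * Li for wi, Li in zip(w, cum_losses)) - entropy(w) / eta

def wm_weights(cum_losses, eta):
    """Exponential weights, the claimed minimizer of the objective above."""
    m = min(cum_losses)
    w = [math.exp(-eta * (L - m)) for L in cum_losses]
    z = sum(w)
    return [wi / z for wi in w]

def random_distribution(n, rng):
    """Sample a point in the interior of the probability simplex."""
    x = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(x)
    return [xi / total for xi in x]

rng = random.Random(1)
cum_losses = [3.0, 1.5, 4.2, 0.7]
eta = 0.8

w_star = wm_weights(cum_losses, eta)
best = ftrl_objective(w_star, cum_losses, eta)

# No sampled distribution should achieve a lower objective value than w_star.
sampled_min = min(
    ftrl_objective(random_distribution(len(cum_losses), rng), cum_losses, eta)
    for _ in range(5000)
)

# Closed form of the optimum: -(1/eta) * log(sum_i exp(-eta * L_i)).
closed_form = -math.log(sum(math.exp(-eta * L) for L in cum_losses)) / eta
```

The minimum over random points on the simplex never falls below the value at the exponential weights, and that value matches the closed form \(-\frac{1}{\eta}\log\sum_i e^{-\eta L_i}\).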
4 Interpreting Prediction Markets as No-Regret Learners
With this foundation in place, we are ready to describe how any bounded loss market maker can be interpreted as an algorithm for learning from expert advice. The key idea is to equate the trades made in the market with the losses observed by the learning algorithm. We can then view the market maker as essentially learning a probability distribution over outcomes by treating each observed trade as a training instance.
More formally, consider any cost function based market maker with instantaneous price functions \(p_i\) for each outcome \(i\). We convert such a market maker to an algorithm for learning from expert advice by setting the weight of expert \(i\) at time \(t\) using
\[ w_{i,t} = p_i(-\epsilon \vec{L}_{t-1}), \tag{5} \]
where \(\epsilon > 0\) is a tunable parameter and \(\vec{L}_{t-1} = (L_{1,t-1},\dots,L_{N,t-1})\) is the vector of cumulative losses at time \(t-1\). In other words, the weight on expert \(i\) at time \(t\) in the learning algorithm is the instantaneous price of security \(i\) in the market when \(-\epsilon L_{j,t-1}\) shares have been purchased (or \(\epsilon L_{j,t-1}\) shares have been sold) of each security \(j\). We discuss the role of the parameter \(\epsilon\) in more detail below.
First note that for any valid cost function based prediction market, setting the weights as in Equation 5 yields a valid expert learning algorithm. Since the prices of any valid prediction market must be non-negative and sum to one, the weights of the resulting algorithm are guaranteed to satisfy these properties too. Furthermore, the weights are a function of only the past losses of each expert, which the algorithm is permitted to observe.
Below we show that applying this conversion to any bounded-loss market maker with slowly changing prices yields a learning algorithm with \(O(\sqrt{T})\) regret. The quality of the regret bound obtained depends on the trade-off between market maker loss and how quickly the prices change. We then show how this bound can be used to rederive the standard regret bound of Weighted Majority, the converse of the result of Chen et al.
4.1 A Bound on Regret
In order to derive a regret bound for the learning algorithm defined in Equation 5, it is necessary to make some restrictions on how quickly the prices in the market change. If market prices change too quickly, the resulting learning algorithm will be unstable and will suffer high worst-case regret, as was the case with the naive Follow The Leader algorithm described in Section 3. To capture this idea, we introduce the notion of -stability, defined as follows.
We say that a set of price functions is -stable for a constant if is continuous and piecewise differentiable for all and for all , where
Defining -stability in terms of the allows us to quantify how slowly the prices change even when the price functions are not differentiable at all points. We can then derive a regret bound for the resulting learning algorithm using the following simple lemma. This lemma states that when the quantity vector in the market is , if the price functions are -stable, then the amount of money that the market maker would collect for the purchase of a small quantity of each security is not too far from the amount that the market maker would have collected had he instead priced the shares according to the fixed price .
Let be any valid cost function yielding -stable prices. For any , any , and any such that for ,
The proof is in Appendix A.
With this lemma in place, we are ready to derive the regret bound. In the following theorem, it is assumed that is known a priori and therefore can be used to set . If is not known in advance, a standard “doubling trick” can be applied . The idea behind the doubling trick is to partition time into periods of exponentially increasing length, restarting the algorithm each period. This leads to similar bounds with only an extra factor of .
Let be any valid cost function yielding -stable prices. Let be a bound on the worst-case loss of the market maker mechanism associated with . Let be the expert learning algorithm with weights as in Equation 5 with . Then for any sequence of expert losses over time steps,
By setting the weights as in Equation 5, we are essentially simulating a market over \(N\) outcomes. Let \(r_{i,t}\) denote the number of shares of outcome \(i\) purchased at time step \(t\) in this simulated market, and denote by \(\vec{r}_t\) the vector of these quantities for all \(i\). Note that \(\vec{r}_t\) is completely in our control since we are simply simulating a market, thus we can choose to set \(r_{i,t} = -\epsilon\ell_{i,t}\) for all \(i\) and \(t\). We have that \(|r_{i,t}| \le \epsilon\) for all \(i\) and \(t\) since \(\ell_{i,t} \in [0,1]\). Let \(q_{i,t} = \sum_{t'=1}^{t} r_{i,t'}\) be the total number of outstanding shares of security \(i\) after time \(t\), with \(\vec{q}_t\) denoting the vector over all \(i\). The weight assigned to expert \(i\) at round \(t\) of the learning algorithm corresponds to the instantaneous price of security \(i\) in the simulated market immediately before round \(t\), that is, \(w_{i,t} = p_i(\vec{q}_{t-1})\).
By the definition of worst-case market maker loss, . It is easy to see that we can rewrite the left-hand side of this equation to obtain
From Lemma 2, this gives us that
Substituting and , we get
Setting yields the bound.
4.2 Rederiving the Weighted Majority Bound
Chen et al. showed that the Weighted Majority regret bound can be used as a starting point to rederive the worst case loss of an LMSR market maker. Here we show that the converse is also true; by applying Theorem 2, we can rederive the Weighted Majority bound from the bounded market maker loss of LMSR.
Let be the pricing function of a LMSR with parameter . Then
Using Equation 5 to transform the LMSR into a learning algorithm, we end up with weights
\[ w_{i,t} = \frac{e^{-\epsilon L_{i,t-1}/b}}{\sum_{j=1}^N e^{-\epsilon L_{j,t-1}/b}}. \]
Setting , we see that these weights are equivalent to those used by Weighted Majority with the learning rate . As mentioned above, this is the optimal setting of . Notice that these weights do not depend on the value of the parameter in the prediction market.
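The correspondence between LMSR prices and Weighted Majority weights can be sketched in a few lines. The code assumes the LMSR price function \(p_i(\vec{q}) = e^{q_i/b}/\sum_j e^{q_j/b}\) and the conversion \(w_{i,t} = p_i(-\epsilon\vec{L}_{t-1})\); the cumulative losses and the values of \(b\) and \(\eta\) are arbitrary, and the only relationship used is \(\epsilon/b = \eta\).

```python
import math

def lmsr_prices(q, b):
    """LMSR prices p_i(q) = exp(q_i / b) / sum_j exp(q_j / b)."""
    m = max(q)
    w = [math.exp((qi - m) / b) for qi in q]
    z = sum(w)
    return [wi / z for wi in w]

def market_weights(cum_losses, eps, b):
    """Expert weights read off the market: prices at quantity vector -eps * L."""
    return lmsr_prices([-eps * L for L in cum_losses], b)

def wm_weights(cum_losses, eta):
    """Weighted Majority weights proportional to exp(-eta * L_i)."""
    m = min(cum_losses)
    w = [math.exp(-eta * (L - m)) for L in cum_losses]
    z = sum(w)
    return [wi / z for wi in w]

cum_losses = [3.0, 1.5, 4.2, 0.7]
eta = 0.3

# For every liquidity parameter b, choosing eps = b * eta recovers WM exactly.
max_diff = max(
    abs(x - y)
    for b in (0.5, 1.0, 10.0)
    for x, y in zip(market_weights(cum_losses, b * eta, b), wm_weights(cum_losses, eta))
)
```

Since only the ratio \(\epsilon/b\) enters the weights, scaling \(\epsilon\) with \(b\) leaves the resulting learning algorithm unchanged.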
5 Connections Between Market Scoring Rules, Cost Functions, and Regularization
In this section, we establish the formal connections among market scoring rules, cost function based markets, and the class of Follow the Regularized Leader algorithms. We start with a representation theorem for cost function based markets, which is crucial in our later analysis.
5.1 A Representation Theorem for Convex Cost Functions
In this section we show a representation theorem for convex cost functions. The proof of this theorem relies on the connection between convex cost functions and a class of functions known in the finance literature as convex risk measures, which was first noted by Agrawal et al. Convex risk measures were originally introduced by Föllmer and Schied to model different attitudes towards risk in financial markets. A risk measure \(\rho\) can be viewed as a mapping from a vector of returns (corresponding to each possible outcome of an event) to a real number. The interpretation is that a vector of returns \(\vec{x}\) is “preferred to” the vector \(\vec{x}'\) under a risk measure \(\rho\) if and only if \(\rho(\vec{x}) < \rho(\vec{x}')\).
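Formally, a function \(\rho : \mathbb{R}^N \to \mathbb{R}\) is a convex risk measure if it satisfies the following three properties:
1. Convexity: \(\rho(\vec{x})\) is a convex function of \(\vec{x}\).
2. Decreasing Monotonicity: For any \(\vec{x}\) and \(\vec{x}'\), if \(\vec{x} \le \vec{x}'\), then \(\rho(\vec{x}) \ge \rho(\vec{x}')\).
3. Negative Translation Invariance: For any \(\vec{x}\) and value \(k\), \(\rho(\vec{x} + k\vec{1}) = \rho(\vec{x}) - k\).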
The financial interpretations of these properties are not important in our setting. More interesting for us is that Föllmer and Schied provide a representation theorem that states that a function \(\rho\) is a convex risk measure if and only if it can be represented as
\[ \rho(\vec{x}) = \sup_{\vec{p} \in \Delta_N} \Big( -\sum_{i=1}^N p_i x_i - \alpha(\vec{p}) \Big), \]
where \(\alpha\) is a convex, lower semi-continuous function referred to as a penalty function. This fact is useful because it allows us to obtain the following result, which was alluded to informally by Agrawal et al. The full proof is included here for completeness.
A function \(C\) is a valid convex cost function if and only if it is differentiable and can be represented as
\[ C(\vec{q}) = \sup_{\vec{p} \in \Delta_N} \Big( \sum_{i=1}^N p_i q_i - R(\vec{p}) \Big) \tag{6} \]
for a convex and lower semi-continuous function \(R\). Furthermore, for any quantity vector \(\vec{q}\), the price vector \(\vec{p}(\vec{q})\) corresponding to \(C\) is the distribution \(\vec{p}\) maximizing \(\sum_{i=1}^N p_i q_i - R(\vec{p})\).
Consider any differentiable function \(C : \mathbb{R}^N \to \mathbb{R}\). Let \(\rho(\vec{q}) = C(-\vec{q})\). Clearly by definition, \(\rho\) satisfies decreasing monotonicity if and only if \(C\) satisfies increasing monotonicity, and \(\rho\) satisfies negative translation invariance if and only if \(C\) satisfies positive translation invariance. Furthermore, \(\rho\) is convex if and only if \(C\) is convex. By Theorem 1, this implies that \(C\) is a valid convex cost function if and only if \(\rho\) is a convex risk measure. The first half of the lemma then follows immediately from the representation theorem of Föllmer and Schied.
Now, because \(R\) is guaranteed to be convex, \(\sum_{i=1}^N p_i q_i - R(\vec{p})\) is a concave function of \(\vec{p}\). The constraints \(\sum_{i=1}^N p_i = 1\) and \(p_i \ge 0\) for all \(i\) define a closed convex feasible set. Thus, the problem of maximizing \(\sum_{i=1}^N p_i q_i - R(\vec{p})\) with respect to \(\vec{p}\) has a global optimal solution and first-order KKT conditions are both necessary and sufficient. Let \(\vec{p}^*\) denote an optimal \(\vec{p}\) for this optimization problem. Then, \(C(\vec{q}) = \sum_{i=1}^N p_i^* q_i - R(\vec{p}^*)\). By the envelope theorem, if \(C\) is differentiable, we have that for any \(i\), \(\partial C(\vec{q})/\partial q_i = p_i^*\). Thus the market prices are precisely those which maximize the inner expression of the cost function.
Furthermore, by a version of the envelope theorem, to ensure that \(C\) is differentiable, it is sufficient to show that \(R\) is strictly convex and differentiable.
A function \(C\) is a valid convex cost function if it can be represented as in Equation 6 for a strictly convex and differentiable function \(R\). For any \(\vec{q}\), the price vector \(\vec{p}(\vec{q})\) is the distribution \(\vec{p}\) maximizing \(\sum_{i=1}^N p_i q_i - R(\vec{p})\).
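The representation can be made concrete for the LMSR. The sketch below relies on the standard convex-duality fact that \(b\log\sum_i e^{q_i/b} = \max_{\vec{p}\in\Delta_N}\big(\sum_i p_i q_i - b\sum_i p_i\log p_i\big)\), so the penalty function here is assumed to be the scaled negative entropy; the function names and test vectors are illustrative.

```python
import math
import random

def lmsr_cost(q, b):
    """C(q) = b * log(sum_i exp(q_i / b))."""
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def lmsr_prices(q, b):
    """p_i(q) = exp(q_i / b) / sum_j exp(q_j / b)."""
    m = max(q)
    w = [math.exp((qi - m) / b) for qi in q]
    z = sum(w)
    return [wi / z for wi in w]

def neg_entropy_penalty(p, b):
    """Assumed penalty: b * sum_i p_i log p_i (negative entropy scaled by b)."""
    return b * sum(pi * math.log(pi) for pi in p if pi > 0)

def inner_expression(p, q, b):
    """sum_i p_i q_i minus the penalty, the quantity maximized over the simplex."""
    return sum(pi * qi for pi, qi in zip(p, q)) - neg_entropy_penalty(p, b)

def random_distribution(n, rng):
    """Sample a point in the interior of the probability simplex."""
    x = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(x)
    return [xi / total for xi in x]

rng = random.Random(2)
b = 1.5
q = [1.0, -2.0, 0.5, 0.0]

cost = lmsr_cost(q, b)
at_prices = inner_expression(lmsr_prices(q, b), q, b)
sampled_max = max(
    inner_expression(random_distribution(len(q), rng), q, b) for _ in range(5000)
)
```

The inner expression evaluated at the LMSR price vector equals the cost, and no sampled distribution exceeds it, consistent with the prices being the maximizing distribution. For this penalty the values range from \(-b\log N\) at the uniform distribution to \(0\) at a vertex, a spread of \(b\log N\), which matches the well-known worst-case loss of the LMSR.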
The ability to represent any valid cost function in this form allows us to define a bound on the worst-case loss of the market maker in terms of the penalty function of the corresponding convex risk measure.
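The worst-case loss of the market maker defined by the cost function in Equation 6 is no more than \(\sup_{\vec{p} \in \Delta_N} R(\vec{p}) - \inf_{\vec{p} \in \Delta_N} R(\vec{p})\), where \(R\) is the convex function appearing in the representation.
The worst-case loss of the market maker is
\[ \sup_{\vec{q}} \Big( \max_i q_i - C(\vec{q}) + C(\vec{0}) \Big) = \sup_{\vec{q}} \Big( \sup_{\vec{p} \in \Delta_N} \sum_{i=1}^N p_i q_i - \sup_{\vec{p} \in \Delta_N} \Big( \sum_{i=1}^N p_i q_i - R(\vec{p}) \Big) \Big) + \sup_{\vec{p} \in \Delta_N} \big( -R(\vec{p}) \big) \le \sup_{\vec{p} \in \Delta_N} R(\vec{p}) - \inf_{\vec{p} \in \Delta_N} R(\vec{p}). \]
The inequality follows from the fact that for any functions \(f\) and \(g\) over any domain \(\mathcal{X}\),
\[ \sup_{x \in \mathcal{X}} f(x) - \sup_{x \in \mathcal{X}} g(x) \le \sup_{x \in \mathcal{X}} \big( f(x) - g(x) \big). \]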
5.2 Convex Cost Functions and Market Scoring Rules
As described in Section 2, the Logarithmic Market Scoring Rule market maker can be defined as either a market scoring rule or a cost function based market. The LMSR is not unique in this regard. As we show in this section, any regular, strictly proper market scoring rule with differentiable scoring functions can be represented as a cost function based market. Likewise, any convex cost function satisfying a few mild conditions corresponds to a market scoring rule. As long as the market probabilities are nonzero, the market scoring rule and corresponding cost function based market are equivalent. More precisely, a trader who changes the market probabilities from \(\vec{p}\) to \(\vec{p}'\) in the market scoring rule is guaranteed to receive the same payoff for every outcome \(i\) as a trader who changes the quantity vectors from any \(\vec{q}\) to \(\vec{q}'\) such that \(\vec{p}(\vec{q}) = \vec{p}\) and \(\vec{p}(\vec{q}') = \vec{p}'\) in the cost function formulation, as long as every component of \(\vec{p}\) and \(\vec{p}'\) is nonzero. Moreover, any price vector that is achievable in the market scoring rule (that is, any \(\vec{p}\) for which \(s_i(\vec{p})\) is finite for all \(i\)) is achievable by the cost function based market.
The fact that there exists a correspondence between certain market scoring rules and certain cost function based markets was noted by Chen and Pennock . They pointed out that the MSR with scoring function and the cost function based market with cost function are equivalent if for all and all outcomes , . However, they did not provide any guarantees about the circumstances under which this condition can be satisfied. Agrawal et al.  also made use of the equivalence between markets when this strong condition holds. Our result gives very general precise conditions under which an MSR is equivalent to a cost function based market.
Recall from Lemma 4 that any convex cost function can be represented as for a convex function . Let denote the function corresponding to the cost function . In the following, we consider cost functions derived from scoring rules by setting
and scoring rules derived from convex cost functions with
We show that there is a mapping between a mildly restricted class of convex cost function based markets and a mildly restricted class of strictly proper market scoring rules such that for every pair in the mapping, Equations 7 and 8 both hold. Furthermore, we show that the markets satisfying these equations are equivalent in the sense described above.
There is a one-to-one and onto mapping between the set of convex cost function based markets with strictly convex and differentiable potential functions and the class of strictly proper, regular market scoring rules with differentiable scoring functions such that for each pair in the mapping, Equations 7 and 8 hold.
Furthermore, each pair of markets in this mapping are equivalent when prices for all outcomes are positive, that is, the profit of a trade is the same in the two markets if the trade starts with the same market prices and results in the same market prices and the prices for all outcomes are positive before and after the trade. Additionally, every price vector achievable in the market scoring rule is achievable in the cost function based market.
We first show that the function in Equation 7 is strictly convex and differentiable and the scoring rule in Equation 8 is regular, strictly proper and differentiable. We then show that Equations 7 and 8 are equivalent. Finally, we show the equivalence between the two markets.
Consider the function in Equation 7. Since we have assumed that is differentiable for all , is differentiable too. Additionally, it is known that a scoring rule is strictly proper only if its expected value is strictly convex , so is strictly convex.
where is any subderivative of with respect to (if is differentiable, ). This immediately implies that the scoring rule defined in Equation 8 is a regular strictly proper scoring rule since is strictly convex. We will see below that is also differentiable.
This also shows that is differentiable for all , since the derivative of is well-defined at all points and
Since and is differentiable (meaning that is the only subderivative of with respect to ), this implies Equation 8.
Suppose in the cost function based market a trader changes the outstanding shares from to . This trade changes the market price from to . If outcome occurs, the trader’s profit is
From Lemma 4, we know that is the optimal solution to the convex optimization . The Lagrange function of this optimization problem is
Since is optimal, the KKT conditions require that , which implies that for all ,