Log-time Prediction Markets for Interval Securities

02/15/2021 ∙ by Miroslav Dudík, et al. ∙ University of Michigan ∙ Microsoft

We design a prediction market to recover a complete and fully general probability distribution over a random variable. Traders buy and sell interval securities that pay $1 if the outcome falls into an interval and $0 otherwise. Our market takes the form of a central automated market maker and allows traders to express interval endpoints of arbitrary precision. We present two designs in both of which market operations take time logarithmic in the number of intervals (that traders distinguish), providing the first computationally efficient market for a continuous variable. Our first design replicates the popular logarithmic market scoring rule (LMSR), but operates exponentially faster than a standard LMSR by exploiting its modularity properties to construct a balanced binary tree and decompose computations along the tree nodes. The second design consists of two or more parallel LMSR market makers that mediate submarkets of increasingly fine-grained outcome partitions. This design remains computationally efficient for all operations, including arbitrage removal across submarkets. It adds two additional benefits for the market designer: (1) the ability to express utility for information at various resolutions by assigning different liquidity values, and (2) the ability to guarantee a true constant bounded loss by appropriately decreasing the liquidity in each submarket.


1. Introduction

Consider a one-dimensional random variable, such as the opening value of the S&P 500 index on December 17, 2021. We design a market for trading interval securities corresponding to predictions that the outcome will fall into some specified interval, say between 2957.60 and 3804.59, implemented as binary contracts that pay out $1 if the outcome falls in the interval and $0 otherwise. We are interested in designing automated market makers that facilitate a fully expressive market in a computationally efficient manner. Traders can select custom interval endpoints of arbitrary precision corresponding to a continuous outcome space, while the market maker will always offer to buy or sell any interval security at some price.

A form of interval security called the condor spread is common in financial options markets, with significant volume of trade. Each condor spread involves trading four different options, and the financial options offered by the market may only support a limited subset of approximate intervals. (A call option written on an underlying stock with strike price \(X\) and expiration date \(T\) pays \(\max\{S - X, 0\}\), where \(S\) is the opening price of the stock on date \(T\). For example, a condor spread can approximate 25 shares of “$1 iff [2650,2775]”.) As of this writing, S&P 500 options expiring on December 17, 2021, distinguish 56 strike prices, allowing the purchase of around 1500 distinct intervals of minimum width 25. Moreover, as each strike price trades independently despite the logical constraints on their relative values, removing arbitrage requires time linear in the number of offered strike prices.

Outside traditional financial markets, the logarithmic market scoring rule (LMSR) market maker Hanson03; Hanson07 has been used to elicit information through the trade of interval securities. The Gates Hillman Prediction Market at Carnegie Mellon University operated LMSR on 365 outcomes, representing 365 days of one year, to forecast the opening time of the new computer science building Othman10. Traders could bet on different intervals by choosing a start and an end date. A similar market (www.cs.utexas.edu/news/2012/research-corner-gates-building-prediction-market) was later launched at the University of Texas at Austin, using a liquidity-sensitive variation of LMSR Othman13. Moreover, LMSR has been deployed to predict product-sales levels Chen2002, instructor ratings Chakraborty13, and political events hanson1999.

LMSR has two limitations that prevent its scaling to markets with a continuous outcome space. First, LMSR’s worst-case loss can grow unbounded if traders select intervals with prior probability approaching zero GaoChenPennock09. Second, standard implementations of LMSR operations run in time linear in the number of outcomes or distinct future values traders define, which in our case can be arbitrarily many. The constant-log-utility and other barrier-function-based market makers ChenPe07; Othman12 achieve constant bounded loss, but still suffer from the second limitation regarding computational intractability. Thus, previous markets allow only a relatively small set of predetermined intervals and run in time linear in the number of supported outcomes, limiting the ability to aggregate high-precision trades and elicit the full distribution of a continuous random variable.

In this paper, we propose two automated market makers that perform exponentially faster than the standard LMSR and previous designs. Market operations (i.e., price, cost, and buy) can be executed in time logarithmic in the number of distinct intervals traded, or linear in the number of bits describing the outcome space. Our first market maker calculates LMSR exactly, but employs a balanced binary tree to implement interval queries and trades. We show that the normalization constant of LMSR—a key quantity in its price and cost function—can be calculated recursively via local computations on the balanced tree. Our work here contributes to the rich literature that aims to overcome the worst-case #P-hardness of LMSR pricing ChenEtAl08 by exploiting the outcome space structure and limiting expressivity Chen:07; Guo:09; Chen:08b; XiaPe11; LaskeyEtAl18.

Our second market maker works by maintaining parallel LMSR submarkets that adopt different liquidity parameters and offer interval securities at various resolutions. We show that liquidity parameters can be chosen to guarantee a constant bounded loss independent of market precision and prices can be kept coherent efficiently by removing arbitrages across submarkets. We demonstrate through agent-based simulation that our second design enjoys more flexible liquidity choices to facilitate the information-gathering objective: it can get close to the “best of both worlds” displayed by coarse and fine LMSR markets, with prices converging fast at both resolutions regardless of the traders’ information structure.

The two proposed designs, to our knowledge, are the first to simultaneously achieve expressiveness and computational efficiency. As both market makers facilitate trading intervals at arbitrary precision, they can elicit any probability distribution over a continuous random variable that can be practically encoded by a machine. We use the S&P 500 index value as a running example, but our framework is generic and can handle any one-dimensional continuous variable, for example, the landfall point of a hurricane along a coastline or the number of tickets sold in the first week of a movie release.

2. Formal Setting

We first review cost-function-based market making AbernethyChVa11; ChenPe07, and then introduce interval markets.

2.1. Cost-Function-Based Market Making

Let \(\Omega\) denote a finite set of outcomes, corresponding to mutually exclusive and exhaustive states of the world. We are interested in eliciting expectations of binary random variables \(\phi_i : \Omega \to \{0, 1\}\), indexed by \(i \in \mathcal{I}\), which model the occurrence of various events, such as “S&P 500 will open between 2957.60 and 3804.59 on December 17, 2021”. Each variable \(\phi_i\) is associated with a security that pays out \(\phi_i(\omega)\) when the outcome \(\omega \in \Omega\) occurs, and thus \(\phi_i\) is also called the payoff function. Binary securities pay out $1 if the specified event occurs and $0 otherwise. The vector \((\phi_i)_{i \in \mathcal{I}}\) is denoted \(\phi\). Traders trade bundles \(\delta \in \mathbb{R}^{|\mathcal{I}|}\) of securities with a central market maker, where positive entries in \(\delta\) correspond to purchases and negative entries to short sales. A trader holding a bundle \(\delta\) receives a payoff of \(\delta \cdot \phi(\omega)\) when \(\omega\) occurs.

Following AbernethyChVa11 and ChenPe07, we assume that the market maker determines security prices using a convex and differentiable potential function \(C : \mathbb{R}^{|\mathcal{I}|} \to \mathbb{R}\), called a cost function. The state of the market is specified by a vector \(\theta \in \mathbb{R}^{|\mathcal{I}|}\), listing the number of shares of each security sold by the market maker so far. A trader who wants to buy a bundle \(\delta\) in the market state \(\theta\) must pay \(C(\theta + \delta) - C(\theta)\) to the market maker, after which the new state becomes \(\theta + \delta\).

The vector of instantaneous prices in the corresponding state \(\theta\) is \(p(\theta) = \nabla C(\theta)\). Its entries \(p_i(\theta)\) can be interpreted as the market’s collective estimates of \(\mathbb{E}[\phi_i]\): a trader can make an expected profit by buying (at least a small amount of) the security \(i\) if she believes that \(\mathbb{E}[\phi_i]\) is larger than the instantaneous price \(p_i(\theta)\), and by selling if she believes the opposite. Therefore, risk-neutral traders with sufficient budgets maximize their expected profits by moving the price vector to match their expectation of \(\phi\). Any expected payoff must lie in the convex hull of the set \(\{\phi(\omega)\}_{\omega \in \Omega}\), which we denote \(\mathcal{M}\) and call a coherent price space, with its elements referred to as coherent price vectors.

We assume that the cost function satisfies two standard properties: no arbitrage and bounded loss. The no-arbitrage property requires that as long as all outcomes \(\omega\) are possible, there be no market transaction with a guaranteed profit for a trader. In this paper, we use the fact that \(C\) is arbitrage-free if and only if it yields price vectors \(p(\theta)\) that are always coherent AbernethyChVa11. The bounded-loss property is defined in terms of the worst-case loss of a market maker, \(\sup_{\theta} \max_{\omega \in \Omega} \big( \theta \cdot \phi(\omega) - C(\theta) + C(0) \big)\), meaning the largest difference, across all possible trading sequences and outcomes, between the amount that the market maker has to pay the traders (once the outcome is realized) and the amount that the market maker has collected (when securities were traded). The property requires that this worst-case loss be a priori bounded by a constant.

2.2. Complete Markets and LMSR

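In a complete market, we have \(\mathcal{I} = \Omega\). Securities are indicators of individual outcomes, \(\phi_\omega(\omega') = \mathbf{1}\{\omega' = \omega\}\), where \(\mathbf{1}\{\cdot\}\) denotes the binary indicator. We denote each market security by its outcome \(\omega\). A risk-neutral trader is incentivized to move the price of each security \(\omega\) to her estimate of \(\mathbb{E}[\phi_\omega]\), which is her subjective probability of \(\omega\) occurring. Thus, traders can express arbitrary probability distributions over \(\Omega\). We consider variants of the LMSR market maker Hanson03 for a complete market, described by the cost function and prices

\[ C(\theta) = b \log \Big( \sum_{\omega \in \Omega} e^{\theta_\omega / b} \Big), \qquad p_\omega(\theta) = \frac{e^{\theta_\omega / b}}{\sum_{\nu \in \Omega} e^{\theta_\nu / b}}, \tag{1} \]

where \(b > 0\) is the liquidity parameter, controlling how fast the price moves in response to trading and limiting the worst-case loss of the market maker to \(b \log |\Omega|\) Hanson03.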

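The securities in a complete market can be used to express bets on any event \(E \subseteq \Omega\). Specifically, one share of a security for the event \(E\) can be represented by the indicator bundle \(\mathbf{1}_E \in \mathbb{R}^{|\Omega|}\) with entries \(\mathbf{1}_{E,\omega} = \mathbf{1}\{\omega \in E\}\). We refer to this bundle as the bundle security for event \(E\). The immediate price of the bundle \(\mathbf{1}_E\) in the state \(\theta\) is

\[ p_E(\theta) = \mathbf{1}_E \cdot p(\theta) = \frac{\sum_{\omega \in E} e^{\theta_\omega / b}}{\sum_{\nu \in \Omega} e^{\theta_\nu / b}}. \tag{2} \]

The cost of buying the bundle \(s \mathbf{1}_E\), sometimes referred to as “the cost of \(s\) shares of \(E\)”, can be written as a function of \(p_E(\theta)\) and \(s\):

\[ C(\theta + s \mathbf{1}_E) - C(\theta) = b \log \frac{e^{s/b} \sum_{\omega \in E} e^{\theta_\omega / b} + \sum_{\omega \in E^c} e^{\theta_\omega / b}}{\sum_{\nu \in \Omega} e^{\theta_\nu / b}} = b \log \big( e^{s/b} p_E(\theta) + p_{E^c}(\theta) \big) = b \log \big( 1 - p_E(\theta) + e^{s/b} p_E(\theta) \big). \tag{3} \]

Above, we write \(E^c\) for the complementary event \(\Omega \setminus E\), and use the fact \(p_{E^c}(\theta) = 1 - p_E(\theta)\), which follows from Eq. (2).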
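As a sanity check on the closed form for the cost of a bundle, the following minimal Python sketch compares it with the direct difference of LMSR cost values on a small complete market. The function names, the four-outcome state, and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def lmsr_cost(theta, b):
    """LMSR cost b * log(sum_w exp(theta_w / b)), shifted by max(theta) for stability."""
    m = max(theta)
    return m + b * math.log(sum(math.exp((t - m) / b) for t in theta))

def lmsr_prices(theta, b):
    """Instantaneous LMSR prices; they are nonnegative and sum to one."""
    m = max(theta)
    weights = [math.exp((t - m) / b) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]

def bundle_price(theta, b, event):
    """Price of the bundle security for an event given as a set of outcome indices."""
    prices = lmsr_prices(theta, b)
    return sum(prices[w] for w in event)

def bundle_cost(theta, b, event, s):
    """Cost of s shares of the event bundle via b * log(1 - p + p * e^{s/b})."""
    p = bundle_price(theta, b, event)
    return b * math.log(1.0 - p + p * math.exp(s / b))

# Hypothetical four-outcome market state, liquidity, event, and trade size.
theta = [0.0, 1.0, -0.5, 2.0]
b = 1.5
event = {1, 3}
s = 0.7

after = [t + s if w in event else t for w, t in enumerate(theta)]
direct = lmsr_cost(after, b) - lmsr_cost(theta, b)   # C(theta + s 1_E) - C(theta)
closed = bundle_cost(theta, b, event, s)             # closed form in terms of p_E
```

Both `direct` and `closed` need a pass over all outcomes here, which is the linear-time behavior that the tree-based designs in the later sections avoid.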

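2.3. Interval Securities over \([0, 1)\)

We consider betting on outcomes within an interval \([0, 1)\). Our approach generalizes to outcomes that are in any interval \([\alpha, \beta) \subseteq \mathbb{R}\) by applying any increasing transformation \(F : [\alpha, \beta) \to [0, 1)\). We assume that the outcome \(\omega\) is specified with \(K\) bits, meaning that there are \(N = 2^K\) outcomes with \(\Omega = \{ j / N : j \in \{0, 1, \dots, N - 1\} \}\). At the end of Sections 3 and 4, we discuss how the assumption of pre-specified bit precision can be removed.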

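Example 1 (Complete market for S&P 500).

We construct a complete market for the S&P 500 opening price on December 17, 2021, by setting \(K = 19\). The resulting complete market distinguishes \(N = 2^{19}\) outcomes, corresponding to prices from $0.00 to $5242.87 in increments of one cent, where we cap prices at $5242.87 (i.e., larger prices are treated as $5242.87). The transformed outcome is then \(\omega = \omega' / 2^{19}\), where \(\omega'\) is the price in cents.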
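The transformation in this example can be sketched in a few lines of Python. The bit precision of 19 is inferred here from the stated cap, since $5242.87 is \(2^{19} - 1\) cents; the function name and the rounding to whole cents are assumptions made for illustration.

```python
K = 19                   # bit precision, inferred from the cap: 2**19 - 1 cents = $5242.87
CAP_CENTS = 2**K - 1     # largest representable price in cents

def to_outcome(price_dollars):
    """Map a dollar price to an outcome in [0, 1), capping at $5242.87."""
    cents = min(round(price_dollars * 100), CAP_CENTS)
    return cents / 2**K

low = to_outcome(2957.60)    # left endpoint of the running-example interval
high = to_outcome(3804.59)   # right endpoint of the running-example interval
capped = to_outcome(6000.0)  # any price above the cap maps to the last outcome
```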

In the outcome space \(\Omega\), we would like to enable price and cost queries as well as buying and selling of bundle securities for the interval events \(I = [\alpha, \beta)\) for any \(\alpha, \beta \in \Omega \cup \{1\}\) with \(\alpha < \beta\). For cost-based markets, sell transactions are equivalent to buying a negative amount of shares, so we design algorithms for three operations: \(\mathrm{price}(I)\), \(\mathrm{cost}(I, s)\), and \(\mathrm{buy}(I, s)\), where \(I\) is the interval event and \(s\) the number of shares. A naive implementation of price and cost following Eqs. (2) and (3) would be linear in \(N\). In this paper, we propose to implement these operations in time that is logarithmic in \(N\).

3. A Log-time LMSR Market Maker

We design a data structure, referred to as an LMSR tree, which resembles an interval tree (CLR99, Section 15.3), but includes additional annotations to support LMSR calculations. We first define the LMSR tree, and show that it can facilitate market operations in time logarithmic in the number of distinct intervals that traders define.

3.1. An LMSR Tree for \([0, 1)\)

We represent an LMSR tree \(T\) with a full binary tree, where each node \(z\) has either no children (when \(z\) is a leaf) or exactly two children, denoted \(\mathrm{left}(z)\) and \(\mathrm{right}(z)\) (when \(z\) is an inner node). The root is denoted root, and the parent of any non-root node \(z\) is denoted \(\mathrm{par}(z)\).

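Definition 2 (LMSR Tree).

An LMSR tree \(T\) is a full binary tree, where each node \(z\) is annotated with an interval \(I_z = [\alpha_z, \beta_z)\) with \(\alpha_z, \beta_z \in \Omega \cup \{1\}\), a height \(h_z \ge 0\), a quantity \(s_z \in \mathbb{R}\) that records the number of sold bundle securities associated with \(I_z\), and a partial normalization constant \(S_z\) (defined below in Eq. 6).

An LMSR tree is required to satisfy:

  • Binary-search property: \(I_{\mathrm{root}} = [0, 1)\), and for every inner node \(z\), \(\alpha_z = \alpha_{\mathrm{left}(z)}\), \(\beta_{\mathrm{left}(z)} = \alpha_{\mathrm{right}(z)}\), and \(\beta_{\mathrm{right}(z)} = \beta_z\).

  • Height balance: \(h_z = 0\) for leaves, and for every inner node \(z\), \(h_z = 1 + \max\{h_{\mathrm{left}(z)}, h_{\mathrm{right}(z)}\}\) and \(|h_{\mathrm{left}(z)} - h_{\mathrm{right}(z)}| \le 1\).

  • Partial-normalization correctness: \(S_z = e^{s_z / b} (\beta_z - \alpha_z)\) for leaves, and for every inner node \(z\), \(S_z = e^{s_z / b} \big( S_{\mathrm{left}(z)} + S_{\mathrm{right}(z)} \big)\).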

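The binary-search property helps to find the unique leaf that contains any \(\omega \in \Omega\) by descending from root and choosing left or right in each node \(z\) based on whether \(\omega < \alpha_{\mathrm{right}(z)}\) or \(\omega \ge \alpha_{\mathrm{right}(z)}\). The height-balance property ensures that the path length from root to any leaf is at most \(O(\log n)\), where \(n\) is the number of leaves of the tree Knuth. We adopt an AVL tree AVL62 as the basis of our LMSR tree, but other balanced binary-search trees (e.g., red-black trees or splay trees) could also be used.

To facilitate LMSR computations, we maintain a scalar quantity \(s_z\) for each node \(z\), which records the number of bundle securities associated with \(I_z\) sold by the market maker. We write \(z \ni \omega\) to mean \(\omega \in I_z\), and \(z' \subseteq z\) to mean \(I_{z'} \subseteq I_z\). Thus, \(z' \subseteq z\) means that \(z'\) is a descendant of \(z\) in \(T\), and \(z' \subset z\) means that \(z'\) is a strict descendant of \(z\). The market state \(\theta\) and its components \(\theta_\omega\) for each individual outcome \(\omega\) represented by the LMSR tree are therefore

\[ \theta = \sum_{z \in T} s_z \mathbf{1}_{I_z}, \qquad \theta_\omega = \sum_{z \ni \omega} s_z. \tag{4} \]

The normalization constant in the LMSR price (Eq. 2) is then

\[ \sum_{\omega \in \Omega} e^{\theta_\omega / b} = \sum_{\omega \in \Omega} \prod_{z \ni \omega} e^{s_z / b}. \tag{5} \]

We decompose the computation of the above normalization constant along the nodes of an LMSR tree, by defining a partial normalization constant in each node:

\[ S_z = \frac{1}{N} \sum_{\omega \in I_z} \prod_{z' \subseteq z :\, z' \ni \omega} e^{s_{z'} / b}. \tag{6} \]

Thus, we have \(\sum_{\omega \in \Omega} e^{\theta_\omega / b} = N S_{\mathrm{root}}\) and obtain the following recursive relationship, which we refer to as partial-normalization correctness and which is at the core of implementing price and buy:

\[ S_z = \begin{cases} e^{s_z / b} (\beta_z - \alpha_z) & \text{if } z \text{ is a leaf,} \\ e^{s_z / b} \big( S_{\mathrm{left}(z)} + S_{\mathrm{right}(z)} \big) & \text{otherwise.} \end{cases} \tag{7} \]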

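Based on the LMSR tree construction, we implement the following operations for any interval \(I = [\alpha, \beta)\):

  • \(\mathrm{price}(I)\) returns the price of bundle security for \(I\);

  • \(\mathrm{cost}(I, s)\) returns the cost of \(s\) shares of bundle security for \(I\);

  • \(\mathrm{buy}(I, s)\) updates \(T\) to reflect the purchase of \(s\) shares of bundle security for \(I\).

For cost, it suffices to implement price and use Eq. (3). Since the price of \([\alpha, \beta)\) satisfies \(p_{[\alpha, \beta)}(\theta) = p_{[\alpha, 1)}(\theta) - p_{[\beta, 1)}(\theta)\), it suffices to implement price for intervals of the form \([\alpha, 1)\). Similarly, buying \(s\) shares of \([\alpha, \beta)\) is equivalent to first buying \(s\) shares of \([\alpha, 1)\) and then buying \((-s)\) shares of \([\beta, 1)\), as the market ends up in the same state \(\theta + s \mathbf{1}_{[\alpha, \beta)}\). We implement price and buy for one-sided intervals \([\alpha, 1)\), and the remaining operations will follow.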

3.2. Price Queries

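We consider price queries for \(I = [\alpha, 1)\). Let \(\mathrm{vals}(T)\) denote the set of distinct left endpoints \(\alpha_z\) in the tree nodes. We start by assuming that \(\alpha \in \mathrm{vals}(T)\), and later relax this assumption. We proceed to calculate \(p_I(\theta)\) in two steps. First, we construct a set of nodes \(\mathcal{Z}\) whose associated intervals are disjoint and cover \(I\). To achieve this, we conduct a binary search for \(\alpha\), putting in \(\mathcal{Z}\) all of the right children of the visited nodes \(z\) that have \(\alpha < \alpha_{\mathrm{right}(z)}\), as well as the final node with \(\alpha_z = \alpha\). Thanks to the height balance, the cardinality of \(\mathcal{Z}\) is \(O(\log n)\), where \(n\) is the number of leaves of \(T\). The resulting set satisfies \(p_I(\theta) = \sum_{z \in \mathcal{Z}} p_{I_z}(\theta)\).

Second, we determine \(p_{I_z}(\theta)\) for each node \(z \in \mathcal{Z}\). Starting from the LMSR price in Eq. (2), we take advantage of the defined partial normalization constants to calculate \(p_{I_z}(\theta)\):

\[
\begin{aligned}
p_{I_z}(\theta) &= \frac{1}{N S_{\mathrm{root}}} \sum_{\omega \in I_z} \prod_{z' \ni \omega} e^{s_{z'} / b} && (8) \\
&= \frac{1}{N S_{\mathrm{root}}} \sum_{\omega \in I_z} \Big( \prod_{z' \supset z} e^{s_{z'} / b} \Big) \Big( \prod_{z' \subseteq z :\, z' \ni \omega} e^{s_{z'} / b} \Big) && (9) \\
&= \frac{S_z}{S_{\mathrm{root}}} \prod_{z' \supset z} e^{s_{z'} / b}. && (10)
\end{aligned}
\]

In Eq. (8), we use that \(\sum_{\nu \in \Omega} e^{\theta_\nu / b} = N S_{\mathrm{root}}\) and then expand \(\theta_\omega\) using Eq. (4). In Eq. (9), we use the fact that any node \(z'\) with a non-empty intersection with \(z\) (i.e., \(I_{z'} \cap I_z \ne \emptyset\)) must be either a descendant or an ancestor of \(z\) as a direct consequence of the binary-search property. The product in Eq. (10) iterates over \(z'\) on the path from root to \(z\), and thus can be calculated along the binary-search path.

We now handle the case when \(\alpha \notin \mathrm{vals}(T)\). After the leaf \(z\) on the search path is reached, we have \(\alpha_z < \alpha < \beta_z\). Instead of expanding the tree, we conceptually create two children of \(z\): \(z'\) and \(z''\) with \(I_{z'} = [\alpha_z, \alpha)\) and \(I_{z''} = [\alpha, \beta_z)\), and add \(z''\) in \(\mathcal{Z}\). Since \(\theta_\omega\) is constant across \(\omega \in I_z\), we obtain \(p_{I_{z''}}(\theta) = \frac{\beta_z - \alpha}{\beta_z - \alpha_z} \, p_{I_z}(\theta)\) by Eq. (2).

Summarizing the foregoing procedures yields Algorithm 1, which simultaneously constructs the set \(\mathcal{Z}\) and calculates the prices \(p_{I_z}(\theta)\). Since it suffices to go down a single path and only perform constant-time computation in each node, the resulting algorithm runs in time \(O(\log n_{\mathrm{vals}})\), where \(n_{\mathrm{vals}}\) denotes the number of distinct values that have appeared as endpoints of intervals in all the executed transactions. We defer complete proofs to the appendix, which is available in the full version of this paper on arXiv.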

Theorem 3.

Algorithm 1 implements \(\mathrm{price}(I)\) in time \(O(\log n_{\mathrm{vals}})\).

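Algorithm 1 Query price of bundle security for an interval \(I = [\alpha, 1)\).
Input: Interval \(I = [\alpha, 1)\), \(\alpha \in \Omega\), LMSR tree \(T\).
Output: Price of bundle security for \(I\).
Initialize \(z \leftarrow \mathrm{root}\), \(P \leftarrow 1\), \(\mathrm{price} \leftarrow 0\)
while \(\alpha_z \ne \alpha\) and \(z\) is not a leaf do
     \(P \leftarrow P \cdot e^{s_z / b}\)
     if \(\alpha < \alpha_{\mathrm{right}(z)}\) then
          \(\mathrm{price} \leftarrow \mathrm{price} + P \cdot S_{\mathrm{right}(z)} / S_{\mathrm{root}}\)
          \(z \leftarrow \mathrm{left}(z)\)
     else
          \(z \leftarrow \mathrm{right}(z)\)
return \(\mathrm{price} + P \cdot \dfrac{S_z}{S_{\mathrm{root}}} \cdot \dfrac{\beta_z - \alpha}{\beta_z - \alpha_z}\)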
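The single-path price query can be illustrated with a minimal Python sketch, written under the assumption that each node stores its interval, its share quantity, and a partial normalization constant scaled so that a leaf with no shares stores the width of its interval. The class and field names, the liquidity value, and the hand-built three-leaf tree are all hypothetical, and the tree is not rebalanced here.

```python
import math
from dataclasses import dataclass
from typing import Optional

B = 1.0  # liquidity parameter (hypothetical value)

@dataclass
class Node:
    lo: float                        # left endpoint of the node's interval
    hi: float                        # right endpoint of the node's interval
    s: float = 0.0                   # shares sold on [lo, hi)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    S: float = 0.0                   # partial normalization constant

def refresh(z):
    """Recompute S bottom-up: leaf -> e^{s/b}(hi - lo); inner -> e^{s/b}(S_left + S_right)."""
    if z.left is None:
        z.S = math.exp(z.s / B) * (z.hi - z.lo)
    else:
        z.S = math.exp(z.s / B) * (refresh(z.left) + refresh(z.right))
    return z.S

def price(root, alpha):
    """Price of [alpha, 1), visiting only the nodes on one root-to-leaf path."""
    z, P, total = root, 1.0, 0.0
    while z.lo != alpha and z.left is not None:
        P *= math.exp(z.s / B)                    # product over the ancestors of the child
        if alpha < z.right.lo:
            total += P * z.right.S / root.S       # right child lies entirely in [alpha, 1)
            z = z.left
        else:
            z = z.right
    # Final node: fully covered if z.lo == alpha, otherwise a proportional part of a leaf.
    return total + P * (z.S / root.S) * (z.hi - alpha) / (z.hi - z.lo)

# Hand-built tree with leaves [0, 0.5), [0.5, 0.75), [0.75, 1).
root = Node(0.0, 1.0, s=0.3,
            left=Node(0.0, 0.5, s=1.0),
            right=Node(0.5, 1.0, s=0.0,
                       left=Node(0.5, 0.75, s=-0.4),
                       right=Node(0.75, 1.0, s=2.0)))
refresh(root)

# Naive reference: per-leaf state is the sum of shares on the path from the root.
PIECES = [(0.0, 0.5, 1.3), (0.5, 0.75, -0.1), (0.75, 1.0, 2.3)]  # (lo, hi, theta)

def naive_price(alpha):
    num = sum(max(0.0, hi - max(lo, alpha)) * math.exp(th / B) for lo, hi, th in PIECES)
    den = sum((hi - lo) * math.exp(th / B) for lo, hi, th in PIECES)
    return num / den
```

Calling `price(root, 0.6)` exercises the case where the query endpoint falls strictly inside a leaf, which the final return statement handles by scaling the leaf price with the covered fraction of the leaf.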

3.3. Buy Transactions

We next implement \(\mathrm{buy}(I, s)\) for \(I = [\alpha, 1)\) while maintaining the LMSR tree properties. The main challenge here is to simultaneously maintain partial-normalization correctness and height balance. We address this by adapting AVL-tree rebalancing.

We begin by considering the case \(\alpha \in \mathrm{vals}(T)\). Similar to price queries, we conduct binary search for \(\alpha\) to obtain the set of nodes \(\mathcal{Z}\) that covers \(I\). We update the values of \(s_z\) across \(z \in \mathcal{Z}\) by adding \(s\), and obtain \(T'\) that has the same structure as \(T\) with the updated share quantities

\[ s'_z = s_z + s \cdot \mathbf{1}\{z \in \mathcal{Z}\}. \]

Thus, the resulting market state is

\[ \theta' = \sum_{z} s'_z \mathbf{1}_{I_z} = \theta + s \sum_{z \in \mathcal{Z}} \mathbf{1}_{I_z} = \theta + s \mathbf{1}_I. \]

We then rely on the recursive relationship defined in Eq. (7) to update the partial normalization constants \(S_z\). It suffices to update the ancestors of the nodes \(z \in \mathcal{Z}\), all of which lie along the search path to \(\alpha\), and each update requires constant time.

When \(\alpha \notin \mathrm{vals}(T)\), we split the leaf that contains \(\alpha\) before adding shares to \(\mathcal{Z}\). This may violate the height-balance property. Similar to the AVL insertion algorithm (Knuth, Section 6.2.3), we fix any imbalance by means of rotations, as we go back along the search path. Rotations are operations that modify small portions of the tree, and at most two rotations are needed to rebalance the tree AVL62. We show in Appendix A.2, Lemma 1, that in each rotation, only a constant number of nodes needs to be updated to preserve the partial-normalization correctness. Thus, the overall running time of the buy operation, presented in Algorithm 2, is \(O(\log n_{\mathrm{vals}})\).

Theorem 4.

Algorithm 2 implements \(\mathrm{buy}(I, s)\) in time \(O(\log n_{\mathrm{vals}})\).

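Algorithm 2 Buy \(s\) shares of bundle security for an interval \(I = [\alpha, 1)\).
Input: Quantity \(s \in \mathbb{R}\), interval \(I = [\alpha, 1)\), \(\alpha \in \Omega\), LMSR tree \(T\).
Output: Tree \(T\) updated to reflect the purchase of \(s\) shares of \(I\).
Define subroutines:
     NewLeaf(\(\alpha_0, \beta_0\)): return a new leaf node \(z\) with
          \(I_z = [\alpha_0, \beta_0)\), \(h_z = 0\), \(s_z = 0\), \(S_z = \beta_0 - \alpha_0\)
     ResetInnerNode(\(z\)): reset \(h_z\) and \(S_z\) based on the children of \(z\)
          \(h_z \leftarrow 1 + \max\{h_{\mathrm{left}(z)}, h_{\mathrm{right}(z)}\}\), \(S_z \leftarrow e^{s_z / b} \big( S_{\mathrm{left}(z)} + S_{\mathrm{right}(z)} \big)\)
     AddShares(\(z, s\)): increase the number of shares held in \(z\) by \(s\)
          \(s_z \leftarrow s_z + s\), \(S_z \leftarrow e^{s / b} S_z\)
Initialize \(z \leftarrow \mathrm{root}\)
while \(\alpha_z \ne \alpha\) and \(z\) is not a leaf do          (add shares to \(\mathcal{Z}\))
     if \(\alpha < \alpha_{\mathrm{right}(z)}\) then
          AddShares(\(\mathrm{right}(z), s\))
          \(z \leftarrow \mathrm{left}(z)\)
     else
          \(z \leftarrow \mathrm{right}(z)\)
if \(\alpha_z \ne \alpha\) then          (split the leaf)
     \(\mathrm{left}(z) \leftarrow\) NewLeaf(\(\alpha_z, \alpha\)), \(\mathrm{right}(z) \leftarrow\) NewLeaf(\(\alpha, \beta_z\))
     \(z \leftarrow \mathrm{right}(z)\)
AddShares(\(z, s\))
while \(z\) is not a root do          (trace the binary-search path back)
     \(z \leftarrow \mathrm{par}(z)\)
     if \(|h_{\mathrm{left}(z)} - h_{\mathrm{right}(z)}| \ge 2\) then          (restore height balance)
          Rotate \(z\) and possibly one of its children (details in Appendix A.2, Algorithm 5)
     ResetInnerNode(\(z\))          (update \(h_z\) and \(S_z\))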
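The following Python sketch combines one-sided buy and price operations into two-sided price, cost, and buy for an interval with endpoints `alpha` and `beta`. It is a simplification under stated assumptions: leaves are split on demand but the AVL rotations are omitted, so the logarithmic bound holds only while the tree happens to stay shallow. The class and method names, the liquidity value, and the trades are hypothetical.

```python
import math

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi          # interval [lo, hi)
        self.s = 0.0                       # shares sold on this node
        self.S = hi - lo                   # partial normalization constant
        self.left = self.right = None

class LMSRTree:
    def __init__(self, b):
        self.b = b
        self.root = Node(0.0, 1.0)

    def _add_shares(self, z, s):
        z.s += s
        z.S *= math.exp(s / self.b)

    def _buy_one_sided(self, alpha, s):
        """Buy s shares of [alpha, 1); rotations are omitted in this sketch."""
        path, z = [], self.root
        while z.lo != alpha and z.left is not None:
            path.append(z)
            if alpha < z.right.lo:
                self._add_shares(z.right, s)
                z = z.left
            else:
                z = z.right
        if z.lo != alpha:                  # split the leaf at alpha
            z.left, z.right = Node(z.lo, alpha), Node(alpha, z.hi)
            path.append(z)
            z = z.right
        self._add_shares(z, s)
        for a in reversed(path):           # restore S bottom-up along the search path
            a.S = math.exp(a.s / self.b) * (a.left.S + a.right.S)

    def _price_one_sided(self, alpha):
        z, P, total = self.root, 1.0, 0.0
        while z.lo != alpha and z.left is not None:
            P *= math.exp(z.s / self.b)
            if alpha < z.right.lo:
                total += P * z.right.S / self.root.S
                z = z.left
            else:
                z = z.right
        return total + P * (z.S / self.root.S) * (z.hi - alpha) / (z.hi - z.lo)

    def price(self, alpha, beta):
        upper = self._price_one_sided(beta) if beta < 1.0 else 0.0
        return self._price_one_sided(alpha) - upper

    def cost(self, alpha, beta, s):
        p = self.price(alpha, beta)
        return self.b * math.log(1.0 - p + p * math.exp(s / self.b))

    def buy(self, alpha, beta, s):
        self._buy_one_sided(alpha, s)
        if beta < 1.0:
            self._buy_one_sided(beta, -s)

tree = LMSRTree(b=2.0)
tree.buy(0.25, 0.75, 3.0)
tree.buy(0.1, 0.3, -1.0)
tree.buy(0.5, 1.0, 0.5)

def theta(w):
    """Naive per-outcome state after the three trades above."""
    return 3.0 * (0.25 <= w < 0.75) - 1.0 * (0.1 <= w < 0.3) + 0.5 * (0.5 <= w < 1.0)

mids = [(j + 0.5) / 20 for j in range(20)]   # grid fine enough for all endpoints used
naive = (sum(math.exp(theta(w) / 2.0) for w in mids if 0.2 <= w < 0.6)
         / sum(math.exp(theta(w) / 2.0) for w in mids))
tree_price = tree.price(0.2, 0.6)
```

A production version would add the height annotations and rotations described above, updating the partial normalization constants of the constant number of nodes each rotation touches.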
Remarks.

We show that price, cost and buy can be implemented in time \(O(\log n_{\mathrm{vals}})\), which is bounded above by the log of the number of buy transactions \(n_{\mathrm{buy}}\) and the bit precision of the outcome \(K\). (Clearly, \(n_{\mathrm{vals}} = O(n_{\mathrm{buy}})\), with each buy transaction introducing at most two new endpoint values. The value of \(n_{\mathrm{vals}}\) is also bounded above by \(N = 2^K\) since the interval endpoints are always in \(\Omega\).) We note that none of the operations require the knowledge of \(K\), so the market in fact supports queries with arbitrary precision. However, the market precision affects the worst-case loss bound for the market maker, which is \(b \log N = b K \log 2\). The next section presents a different construction that achieves a constant worst-case loss independent of the market precision.

4. A Multi-resolution Linearly Constrained Market Maker

We introduce our second design, referred to as the multi-resolution linearly constrained market maker (multi-resolution LCMM). The design is based on the LMSR, but it enables more flexibility by assigning two or more parallel LMSRs with different liquidity parameters to orchestrate submarkets that offer interval securities at different resolutions. However, running submarkets independently can create arbitrage opportunities, as any interval expressible in a coarser market can also be expressed in a finer one. To maintain coherent prices, we design a matrix that imposes linear constraints to tie market prices among different submarkets to support the efficient removal of any arbitrage opportunity, following DudikLaPe12. We first define the multi-resolution LCMM and its properties, and show that price, cost and buy can be implemented in time .

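4.1. A Multi-resolution LCMM for \([0, 1)\)

4.1.1. A Multi-resolution Market

A binary search tree remains at the core of our multi-resolution market construction. Unlike the log-time LMSR, which uses a self-balancing tree, the multi-resolution market builds upon a static one, where each level of the tree represents a submarket of intervals, forming a finer and finer partition of \([0, 1)\). We start with an example of a market that offers interval securities at two resolutions.

Example 5 (Two-level market for \([0, 1)\)).

We consider a market composed of two submarkets, indexed by \(1\) and \(2\), which partition \([0, 1)\) into interval events at two levels of coarseness:

\[ \text{submarket 1: } [0, \tfrac{1}{2}),\ [\tfrac{1}{2}, 1); \qquad \text{submarket 2: } [0, \tfrac{1}{4}),\ [\tfrac{1}{4}, \tfrac{1}{2}),\ [\tfrac{1}{2}, \tfrac{3}{4}),\ [\tfrac{3}{4}, 1). \]

The market provides six interval securities associated with the corresponding interval events, i.e., two in submarket 1 and four in submarket 2.

We extend Example 5 to multiple resolutions. We represent the initial independent submarkets with a complete binary tree \(T^*\) of depth \(K\), which corresponds to the bit precision of the outcome \(\omega\). Let \(\mathcal{Z}^*\) denote the set of nodes of \(T^*\) and \(\mathcal{Z}_k\) the set of nodes at each level \(k \in \{0, 1, \dots, K\}\). \(\mathcal{Z}_0\) contains the root associated with \([0, 1)\), and each consecutive level contains the children of nodes from the previous level, which split their corresponding parent intervals in half. Thus, level \(k\) partitions \([0, 1)\) into \(2^k\) intervals of size \(2^{-k}\), and the final level \(\mathcal{Z}_K\) contains \(N = 2^K\) leaves.

We index interval securities by nodes, with their payoffs defined by \(\phi_z(\omega) = \mathbf{1}\{\omega \in I_z\}\). We partition securities into submarkets corresponding to levels, where submarket \(k\) consists of the securities \(\{\phi_z\}_{z \in \mathcal{Z}_k}\) and has the state \(\theta_k = (\theta_z)_{z \in \mathcal{Z}_k}\). For each submarket, we define the LMSR cost function with a separate liquidity parameter \(b_k\):

\[ C_k(\theta_k) = b_k \log \Big( \sum_{z \in \mathcal{Z}_k} e^{\theta_z / b_k} \Big). \tag{11} \]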

4.1.2. A Linearly Constrained Market Maker

Following the above multi-resolution construction, the overall market has a direct-sum cost \(\tilde{C}(\theta) = \sum_{k} C_k(\theta_k)\), which corresponds to pricing securities in each block \(\theta_k\) independently using \(C_k\). However, as there are logical dependencies between securities in different levels, independent pricing may lead to incoherent prices among submarkets and create arbitrage opportunities.

Example 6 (Arbitrage in a two-level market).

Continuing Example 5, we define separate LMSR costs, where and :

The direct-sum market allows incoherent prices. For example, after buying some shares of security associated with in submarket , the market can have

These prices are incoherent, i.e., do not correspond to probabilities of , , , because under any probability distribution over , we must have and . Thus, a coherent price vector must satisfy linear constraints and , which can be also written as and where

We refer to as the constraint matrix.
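The incoherence described in this example can be reproduced numerically. The Python sketch below runs two independent LMSR submarkets over halves and quarters of the unit interval, buys one share of the left half in the coarse submarket, and compares the price of each half with the summed prices of its two quarters. The liquidity values, the trade size, and the variable names are assumptions chosen for illustration and are not the numbers used in the paper's example.

```python
import math

def lmsr_prices(theta, b):
    """LMSR prices for one submarket with state theta and liquidity b."""
    m = max(theta)
    weights = [math.exp((t - m) / b) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]

b1, b2 = 1.0, 1.0                 # hypothetical liquidity parameters
theta1 = [0.0, 0.0]               # coarse submarket: [0,1/2), [1/2,1)
theta2 = [0.0, 0.0, 0.0, 0.0]     # fine submarket: the four quarters

theta1[0] += 1.0                  # buy one share of [0,1/2) in the coarse submarket only

p1 = lmsr_prices(theta1, b1)
p2 = lmsr_prices(theta2, b2)

# Coherence would require each half to cost as much as its two quarters together.
gap_left = p1[0] - (p2[0] + p2[1])
gap_right = p1[1] - (p2[2] + p2[3])
```

Because the fine submarket never sees the trade, its prices stay uniform while the coarse price of the left half rises, so `gap_left` is positive and `gap_right` is its negative.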

We extend Example 6 to specify price constraints in a multi-resolution market. Later we will show how the constraint matrix can be used to remove arbitrage arising from the constraint violations.

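Recall that \(\mathcal{M}\) denotes a coherent price space, where any expected payoff lies in the convex hull of \(\{\phi(\omega)\}_{\omega \in \Omega}\). For the multi-resolution market, we specify a set of homogeneous linear equalities describing a superset of \(\mathcal{M}\):

\[ \mathcal{M} \subseteq \{ \mu : A^\top \mu = 0 \}. \tag{12} \]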

We design the constraint matrix to ensure that any pair of submarkets is price coherent, meaning that any interval event gets the same price on all levels that can express it. Therefore, for each inner node where , we have

For algorithmic reasons (as we will see in Section 4.3), we further tie the price of to the prices of all of ’s descendants and weight each level by its liquidity parameter :

(13)

Now we can formally define the constraint matrix . Let be the set of inner nodes of and let denote the level of a node . The matrix contains the constraints from Eq. (13) across all :

(14)

Arbitrage opportunities arise if the price of bundle differs from zero, where denotes the th column of . Traders profit by buying a positive quantity of if its price is negative, and selling otherwise. Thus, the constraint matrix gives a recipe for arbitrage removal. We provide the intuition for this in the two-level market, and then give the definition of the multi-resolution LCMM.

Example 7 (Arbitrage removal in a two-level market).

Continuing Example 6, the prices violate the constraint , because . The vector reveals an arbitrage opportunity: buy the security (at the initial price ) and simultaneously sell securities and (at the initial price ), i.e., buy bundle . Since under any outcome , the payout for the bundle is , this is initially profitable. However, buying will increase the price of and decrease the prices of and . Once a sufficiently large quantity of shares of is bought, this form of arbitrage is removed and we have in a new state , where .

A linearly constrained market maker (LCMM) DudikLaPe12 leverages violated constraints, as in Example 7, to remove arbitrage, and then returns the arbitrage proceeds to the trader. Formally, an LCMM is described by the cost function

\[ C(\theta) = \inf_{\eta} \tilde{C}(\theta + A \eta). \tag{15} \]

It relies on the direct-sum cost \(\tilde{C}\), but with each trader purchase that causes incoherent prices, an LCMM automatically seeks the most advantageous cost for the trader by buying bundles \(A \eta\) on the trader’s behalf to remove arbitrage. Trader purchases are accumulated as the state \(\theta\), and automatic purchases made by the LCMM are accumulated as \(A \eta\).

We note that the purchase of bundle \(A \eta\) has no effect on the trader’s payoff, since \((A \eta) \cdot \phi(\omega) = \eta^\top A^\top \phi(\omega) = 0\) for all \(\omega \in \Omega\) thanks to Eq. (12) and the fact that \(\phi(\omega) \in \mathcal{M}\). However, the purchase of \(A \eta\) can lower the cost, so optimizing over \(\eta\) benefits the traders, while maintaining the same worst-case loss guarantee for the market maker as \(\tilde{C}\) DudikLaPe12. Consider a fixed \(\theta\) and the corresponding \(\eta^\star\) minimizing Eq. (15). We calculate prices as \(p(\theta) = \nabla C(\theta) = \tilde{p}(\theta + A \eta^\star)\), where \(\tilde{p} = \nabla \tilde{C}\). By the first-order optimality, \(\eta^\star\) minimizes Eq. (15) if and only if \(A^\top \tilde{p}(\theta + A \eta^\star) = 0\). This means that \(A^\top p(\theta) = 0\), and thus arbitrage opportunities expressed by \(A\) are completely removed by the LCMM cost function \(C\).
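To make the first-order condition concrete, the sketch below removes arbitrage in a two-level market by running plain gradient descent on the auxiliary purchases, using the fact that the gradient of the direct-sum cost with respect to those purchases is the vector of constraint violations. Gradient descent is a stand-in chosen for brevity and is not the efficient procedure developed in the paper. The unweighted constraint columns, the liquidity values, the step size, and all names are assumptions.

```python
import math

def lmsr_prices(theta, b):
    m = max(theta)
    weights = [math.exp((t - m) / b) for t in theta]
    total = sum(weights)
    return [w / total for w in weights]

# Securities ordered as [0,1/2), [1/2,1), [0,1/4), [1/4,1/2), [1/2,3/4), [3/4,1).
# Each row is one column of the constraint matrix: a half minus its two quarters.
A_COLUMNS = [
    [1.0, 0.0, -1.0, -1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, -1.0, -1.0],
]

def direct_sum_prices(theta, b1, b2):
    """Prices when the two submarkets are priced independently."""
    return lmsr_prices(theta[:2], b1) + lmsr_prices(theta[2:], b2)

def shifted_state(theta, eta):
    """State of the direct-sum market after the automatic purchases: theta + A eta."""
    return [t + sum(e * col[i] for e, col in zip(eta, A_COLUMNS)) for i, t in enumerate(theta)]

def constraint_violation(p):
    """Entries of A^T p; all zero exactly when prices are coherent across the two levels."""
    return [sum(a * q for a, q in zip(col, p)) for col in A_COLUMNS]

def remove_arbitrage(theta, b1, b2, step=0.5, iters=2000):
    """Minimize the direct-sum cost over eta by gradient descent; the gradient is A^T p."""
    eta = [0.0, 0.0]
    for _ in range(iters):
        grad = constraint_violation(direct_sum_prices(shifted_state(theta, eta), b1, b2))
        eta = [e - step * g for e, g in zip(eta, grad)]
    return eta

theta = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # one share of [0,1/2) bought in the coarse level
b1 = b2 = 1.0                            # hypothetical liquidity parameters
before = constraint_violation(direct_sum_prices(theta, b1, b2))
eta = remove_arbitrage(theta, b1, b2)
coherent = direct_sum_prices(shifted_state(theta, eta), b1, b2)
after = constraint_violation(coherent)
```

In this symmetric setting the minimizer can be worked out by hand: the automatic purchases move a quarter share in each direction, and both levels then price the left half at \(1 / (1 + e^{-1/2})\).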

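To implement an LCMM, we maintain the state \(\tilde{\theta} = \theta + A \eta\) in the direct-sum market \(\tilde{C}\). After updating \(\theta\) to a new value \(\theta'\), we seek to find \(\eta'\) that removes all the arbitrage opportunities expressed by \(A\). The resulting cost for the trader is

\[ C(\theta') - C(\theta) = \tilde{C}(\theta' + A \eta') - \tilde{C}(\theta + A \eta). \]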

We finish this section by pointing out two favorable properties of the multi-resolution LCMM. Above, we have established that LCMM removes all arbitrage opportunities expressed by \(A\). The next theorem shows that this actually removes all arbitrage. The proof shows that consecutive levels are coherent, which by transitivity implies that the overall price vector is coherent (see Appendix A.3).

Theorem 8.

A multi-resolution LCMM is arbitrage-free.

The multi-resolution LCMM also enjoys the bounded-loss property. For a suitable choice of liquidities, such as , it can achieve a constant worst-case loss bound. The proof uses the fact that the overall loss is bounded by the sum of losses of level markets, which are at most .

Theorem 9.

Let be a sequence of positive numbers such that for some finite . Then the multi-resolution LCMM with liquidity parameters for guarantees the worst-case loss of the market maker of at most , regardless of the outcome precision .
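The effect of decreasing liquidity can be checked with a short calculation. The sketch below assumes that the submarket at level \(k\) has \(2^k\) outcomes, so that its LMSR loss is at most \(b_k \log 2^k\), and uses the hypothetical schedule \(b_k = 1/k^3\), chosen only because its sum has a familiar closed form. Neither the schedule nor the function names come from the paper.

```python
import math

def level_liquidities(K, b0=1.0, power=3.0):
    """Hypothetical liquidity schedule b_k = b0 / k**power for levels k = 1..K."""
    return [b0 / k**power for k in range(1, K + 1)]

def loss_bound(liquidities):
    """Sum over levels of the LMSR loss bound b_k * log(2**k) = b_k * k * log(2)."""
    return sum(b * k * math.log(2.0) for k, b in enumerate(liquidities, start=1))

# The bound grows with the precision K but stays below a constant.
bounds = {K: loss_bound(level_liquidities(K)) for K in (5, 10, 20, 40)}
limit = math.log(2.0) * math.pi**2 / 6.0   # log(2) * sum_{k>=1} 1/k^2
```

With a constant liquidity in every level, the same sum would grow quadratically in the precision, which is why the liquidity has to shrink fast enough across levels.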

4.1.3. A Multi-resolution LCMM Tree

We can now formally define the multi-resolution LCMM tree. The market state of a multi-resolution LCMM is represented by vectors and , whose dimensions can be intractably large (e.g., on the order of ). However, since each LCMM operation involves only a small set of coordinates of and , we only keep track of the coordinates accessed so far and represent them as an annotated subtree of , referred to as an LCMM tree.

Definition 10 (LCMM Tree).

An LCMM tree is a full binary tree, where each node is annotated with , , , such that , and for every inner node :

The tree contains the coordinates of and accessed so far. Since and are initialized to zero, their remaining entries are zero. We write and for the vectors represented by . To calculate prices, we maintain that minimizes Eq. (15), or equivalently that satisfies If this property holds, we say that an LCMM tree is coherent.

4.2. Price Queries

There are many ways to decompose an interval in a multi-resolution market, but they all yield the same price thanks to coherence. The no-arbitrage property also guarantees that the price of \([\alpha, \beta)\) can be obtained by subtracting the price of \([\beta, 1)\) from the price of \([\alpha, 1)\). Therefore, we focus on pricing one-sided intervals of the form \(I = [\alpha, 1)\).

Let be a coherent LCMM tree and and be the vectors represented by . Let be the corresponding state in , so the current security prices are . As before, we identify a set of nodes that covers , and then rely on price coherence to calculate each along the search path.

Assume that is not a root node and we know the price of its parent. Let denote the sibling of and . We can then relate the price of to the price of :

(16)
(17)

Eq. (16) follows by price coherence and Eq. (17) follows by the price calculation in Eq. (1). Thus, we descend the search path to calculate each price , beginning with . It remains to obtain , for which we follow the construction of in Eq. (14):

(18)

Plugging the above equation back in Eq. (17), we obtain555The factor appears in both the numerator and the denominator after plugging Eq. (18) to Eq. (17), so it cancels out.

(19)

These steps yield Algorithm 3. The final line of the algorithm addresses the case when the search ends in the leaf with . Rather than expanding the tree to its lowest level , we use price coherence again: since any strict descendant on the path from to a leaf node has by market initialization, all leaf nodes have the same price. Therefore, the price of equals .

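The length of the search path for \(\alpha\) is \(\mathrm{prec}(\alpha)\), which denotes the bit precision of \(\alpha\), defined as the smallest integer \(k\) such that \(\alpha\) is an integer multiple of \(2^{-k}\). As the computation at each node only requires constant time, the time to price \(I\) is \(O(\mathrm{prec}(\alpha))\), which is bounded above by \(O(K)\).

Theorem 11.

Let \(I = [\alpha, 1)\), \(\alpha \in \Omega\). Algorithm 3 implements \(\mathrm{price}(I)\) in time \(O(\mathrm{prec}(\alpha))\).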

1:Input: Interval , , coherent LCMM tree .
2:Output: Price of bundle security for .
3:Initialize , ,
4:while  and is not a leaf do
5:     , ,
6:     ,