Public blockchains allow any user to submit a transaction that modifies the shared state of the network. Transactions are independently verified and executed by a decentralized network of full nodes. Because full nodes have finite resources, blockchains limit the total computational resources that can be consumed per unit of time. As user demand may fluctuate, most blockchains implement a transaction fee mechanism in order to allocate finite computational capacity among competing transactions.
Smart contracts and gas.
Many blockchains allow transactions to submit and/or execute programs that exist on-chain called smart contracts. Once such a transaction is included in a block, full nodes must re-execute the transaction in order to obtain the resulting updated state of the ledger. All of these transactions consume computational resources, whose total supply is finite. To prevent transactions with excessive resource use and transaction spam, some smart-contract blockchains require users to pay fees in order to compensate the network for processing their transactions.
Most smart contract platforms calculate transaction fees based on a shared unit of account. In the Ethereum Virtual Machine (EVM), this unit is called gas. Each operation in the EVM requires a hardcoded amount of gas which is intended to reflect its relative resource usage. The network enforces a limit on the total amount of gas consumed across all transactions in a block. This limit, called the block limit, prevents the chain from expending computational resources too quickly for full nodes to catch up to the latest state in a reasonable amount of time. Block limits must take into account the maximum amount of each resource that each block may consume (such as storage, bandwidth, or memory) without posing an extreme burden on full nodes meeting the minimal hardware specifications. Because the block limit fixes the total gas supply in each block, the price of gas in ‘real’ terms (e.g., in terms of US Dollars) fluctuates based on demand for transactions in the block.
One-dimensional transaction fees.
Calculating transaction fees through a single, joint unit of account, such as gas, introduces two major challenges. First, if the hardcoded costs of each operation are not precisely reflective of their relative resource usage, there is a possibility of denial-of-service attacks (specifically, resource exhaustion attacks [perez2019broken]), where an attacker exploits resource mispricing to overload the network. Historically, the Ethereum network has suffered from multiple DoS attacks [suicide-attack, transaction-attack, wilcke_ethereum_2016] and has had to manually adjust the relative prices accordingly (e.g., [eip150, eip2929]). Amending the hardcoded costs of each gas operation in response to such attacks typically requires a hard fork of the client software.
Second, one-dimensional fee markets limit the theoretical throughput of the network. Using a joint unit of account to price separate resources decouples their price from supply and demand. As an extreme example to illustrate this dynamic, if the block gas limit is fully saturated with exclusively CPU-intensive operations, gas price will increase as transactions compete for limited space. The cost of transactions that consume exclusively network bandwidth (and nearly no CPU resources) will also increase because these also require gas, even if demand and supply for bandwidth resources across the network remain unchanged. As a result, fewer bandwidth-intensive transactions can be included in the block despite spare capacity, limiting throughput. This limitation occurs because the shared unit of account only allows the network to price resources relative to each other and not in real terms based on the supply and demand for each resource. As we will soon discuss, allowing resources to be priced separately promotes more efficient resource utilization by enabling more precise price discovery. We note that this increase in throughput need not increase hardware requirements for full nodes.
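The throughput argument above can be illustrated with a toy block-packing sketch. Everything in it is an illustrative assumption rather than part of any real protocol: the two resources, their capacities of 100 units each, the gas weighting (gas = CPU + bandwidth), and the greedy packer. The joint gas limit is set to 100 so that even a block consisting only of CPU-heavy transactions stays within CPU capacity.

```python
# Toy comparison of a joint (one-dimensional) gas limit with per-resource limits.
# All numbers and names here are hypothetical.

CAPACITY = {"cpu": 100, "bw": 100}   # what a minimum-spec node can handle per block
JOINT_GAS_LIMIT = 100                # safe even if every unit of gas is CPU


def pack(txs, fits):
    """Greedily include transactions, in order, while `fits` accepts the block."""
    block = []
    for tx in txs:
        if fits(block + [tx]):
            block.append(tx)
    return block


def fits_one_dimensional(block):
    # A single unit of account: gas = cpu + bw.
    return sum(tx["cpu"] + tx["bw"] for tx in block) <= JOINT_GAS_LIMIT


def fits_multidimensional(block):
    # Each resource is metered against its own limit.
    return all(sum(tx[r] for tx in block) <= cap for r, cap in CAPACITY.items())


# Ten CPU-only transactions arrive first, then ten bandwidth-only transactions.
txs = [{"cpu": 10, "bw": 0}] * 10 + [{"cpu": 0, "bw": 10}] * 10

one_dim_block = pack(txs, fits_one_dimensional)
multi_dim_block = pack(txs, fits_multidimensional)

print(len(one_dim_block), len(multi_dim_block))
```

In this sketch the joint limit admits only the ten CPU-only transactions and leaves all bandwidth idle, while the per-resource limits admit all twenty without exceeding either capacity.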
Multidimensional fee markets.
Due to the potential scalability benefits of more granular price discovery, a number of popular smart contract platforms are actively exploring multidimensional fee market mechanisms [adler2022ethcc, multidimensional-eip1559]. Below, we discuss some example proposals that are under active development.
Rollups and data markets.
Rollups are a popular scaling technology that effectively decouples transaction validation and execution from data and consensus [buterin_incomplete_2021]. In rollups, raw transaction data is posted to a base blockchain. Rollups also periodically post succinct proofs of valid execution to the base chain in order to enable secure bridging, prevent rollbacks, and arbitrate potential disputes by using the base chain as an anchor of trust. Rollups naturally create two separate fee markets, one for base layer transactions and one for rollup execution. As rollups have become a popular design pattern for achieving scalability, specialized blockchains (called lazy blockchains) that exclusively order raw data through consensus (i.e., do not perform execution) have emerged [al-bassam_fraud_2018, al-bassam_lazyledger_2019, polygon_team_introducing_2021, nazirkhanova_information_2021, tas_light_2022]. These blockchains naturally allow for transaction data/bandwidth and execution to be priced through independent (usually one-dimensional) fee markets [adler2021ethcc]. Similarly, Ethereum developers have proposed EIP-2242, wherein users may submit special transactions which contain an additional piece of data called a blob [adler2019dataavail, adler2019eip2242]. Blobs may contain arbitrary data intended to be interpreted and executed by rollups rather than the base chain. Later, EIP-4844 extended these ideas by establishing a two-dimensional fee market wherein data blobs and base-chain gas have different limits and are priced separately [noauthor_proto-danksharding_2022]. EIP-4844 therefore intends to increase scalability for rollups, as blobs do not have to compete with base-chain execution for gas.
Most smart contract platforms, including the EVM, execute program operations sequentially by default, limiting performance. There are several proposals to enable parallel execution in the EVM which generally fall into two categories. The first involves minimal changes to the EVM and pushes the responsibility of identifying opportunities for parallel execution to full nodes [gelashvili_block-stm_2022, chen_forerunner_2021, saraph_empirical_2019]. The other approach involves access lists which require users to specify which accounts their transaction will interact with, allowing the network to easily identify non-conflicting transactions that can be executed in parallel [vbuterin_easy_2017, buterin_eip-2930_2020]. While Ethereum makes access lists optional, other virtual machine implementations, such as Solana Sealevel and FuelVM, require users to specify the accounts their transaction will interact with [yakovenko_sealevel_2020, bvrooman_github_nodate, adler_accounts_2020]. Despite this capability, a large fraction of transactions often want to access the same accounts in scenarios including auctions, arbitrage opportunities, and new product launches. Such contention significantly limits the potential benefits of the virtual machine’s parallelization capabilities. As a result, developers of Solana have proposed multidimensional fee markets that price interactions with each account separately in order to charge higher fees for transactions which require sequential execution [aeyakovenko_consider_2021]. Such a proposal incentivizes usage of spare capacity on multi-core machines.
In this paper, we formally illustrate how to efficiently update resource prices, what optimization problem these updates attempt to solve, and some consequences of these observations. We also numerically demonstrate that this approach enhances network performance and reduces DoS-style resource exhaustion attacks. We frame the pricing problem in terms of an idealized, omniscient network designer who chooses transactions to include in blocks in order to maximize total welfare, subject to demand constraints. (The designer is omniscient as the welfare is unknown and likely unmeasurable in any practical setting.) We show that this problem, which is the ‘ideal end state’ of a blockchain but not immediately useful by itself, decomposes into two problems, coupled by the resource prices. One of these two problems is a simple one which can be easily solved on chain and represents the cost to the network for providing certain resources, while the other is a maximal-utility problem that miners and users implicitly solve when creating and including transactions for a given block. Correctly setting the resource prices aligns incentives such that the resource costs to the network are exactly balanced by the utility gained by the users and miners. This, in turn, leads to block allocations which solve the original ‘ideal’ problem, on average.
For convenience, we provide appendix A as a short introduction to convex optimization. We recommend that readers unfamiliar with convex optimization at least skim this appendix, as it covers all the mathematical definitions and major theorems used in this paper. As a general guideline, we recommend that those uninterested in theoretical results skip §3.2 and §3.3 on a first reading.
1.1 Related work
The resource allocation problem has been studied in many fields, including operations research and computer systems. Agrawal et al. [agrawal2022allocation] proposed a similar formulation and price update scheme for fungible resources where utility is defined per-transaction. Prior work on blockchain transaction fees varies from the formal axiomatic analysis of game theoretic properties that different fee markets should have [roughgarden2020eip1559, chung2021foundations] to analysis of dynamic fees from a direct algorithmic perspective [ferreira2021dynamic, leonardos2021dynamical, reijsbergen2021transaction]. Works of the latter form generally focus on whether the system macroscopically converges to an equilibrium. Moreover, these mechanisms focus on dynamic pricing at the block level (e.g., how many transactions should be allowed in a block?) and not directly on questions of how capacity should be allocated and priced across different transaction types.
EIP-1559, implemented last year, introduced major changes to Ethereum’s transaction fee mechanism. Specifically, EIP-1559 implemented a base fee for transactions to be included in each block, which is dynamically adjusted to hit a target block usage and burned instead of rewarded to the miners. We note that while EIP-1559 is closely related to the problem we consider, it ultimately has a different goal: EIP-1559 attempts to make the fee estimation problem easier in a way that disincentivizes manipulation and collusion [vitalikfirstpricedauctions, roughgarden2020eip1559]. We, on the other hand, aim to price resources dynamically to achieve a given network-specified objective. Finally, we note that prior work such as [ferreira2021dynamic] has proved incentive compatibility for a large set of mechanisms that are a superset of EIP-1559. It is likely (but not proven in this work) that our model fits within an extension of their incentive compatibility framework. We leave game theoretic analysis and strategies to ensure incentive compatibility as future work.
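For readers unfamiliar with the base fee adjustment mentioned above, the following is a minimal sketch of the update as we understand it from the EIP-1559 specification. The function name and the example numbers (a gas target of 15M and a starting base fee of 1000) are illustrative assumptions.

```python
# Sketch of the EIP-1559 base fee update, using integer arithmetic as in the
# specification. `next_base_fee` is a hypothetical helper name.

BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # caps the change at 12.5% per block


def next_base_fee(parent_base_fee, parent_gas_used, parent_gas_target):
    """Return the base fee of the next block given the parent block's usage."""
    if parent_gas_used == parent_gas_target:
        return parent_base_fee
    if parent_gas_used > parent_gas_target:
        gas_delta = parent_gas_used - parent_gas_target
        fee_delta = max(
            parent_base_fee * gas_delta
            // parent_gas_target
            // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,
        )
        return parent_base_fee + fee_delta
    gas_delta = parent_gas_target - parent_gas_used
    fee_delta = (
        parent_base_fee * gas_delta
        // parent_gas_target
        // BASE_FEE_MAX_CHANGE_DENOMINATOR
    )
    return parent_base_fee - fee_delta


TARGET = 15_000_000
print(next_base_fee(1000, TARGET, TARGET))      # block exactly at target
print(next_base_fee(1000, 2 * TARGET, TARGET))  # completely full block
print(next_base_fee(1000, 0, TARGET))           # empty block
```

The base fee is unchanged at the target, rises by one eighth after a full block, and falls by one eighth after an empty one.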
2 Transactions and resources
Before introducing the network’s resource pricing problem, we discuss the general setup and motivation for the problem in the case of blockchains.
We will start by reasoning about transactions. A transaction can represent arbitrary data sent over the peer-to-peer network in order to be appended to the chain. Typically, a transaction will represent a value transfer or a call to a smart contract that exists on chain. These transactions are broadcast by users through the peer-to-peer network and collected by consensus nodes in the mempool, which contains all submitted transactions that have not been included on chain. A miner gets to choose which transactions from the mempool are included on chain. Miners may also outsource this process to a block builder in exchange for a reward [buterinPBS]. Once a transaction is included on chain, it is removed from the mempool. (Any conflicting transactions are also removed from the mempool.)
Every transaction needs to be executed by full nodes (which we will refer to as ‘nodes’). Nodes compute the current state of the chain by executing and checking the validity of all transactions that have been included on the chain since the genesis block. Many blockchains have minimum computational requirements for nodes in a blockchain: any node meeting these requirements should be able to eventually ‘catch up’ to the head of the chain in a reasonable amount of time, i.e., execute all transactions and reach the latest state, even as the chain continues to grow. (For example, Ethereum requires 4 GB of RAM and 2 TB of SSD storage, and recommends at least an Intel NUC, 7th gen processor [ethnodereqs].) These requirements both limit the computational resources each block is allowed to consume and promote decentralization by ensuring the required hardware does not become prohibitively expensive. If transactions are included in a blockchain faster than nodes are able to execute them, nodes cannot reconstruct the latest state and cannot ascertain the validity of the chain. This type of denial of service (DoS) attack is also referred to as a resource exhaustion attack. (As a side note, in some systems, it is possible to provide an easily-verifiable proof that the state is correct without the need to execute all past transactions to validate the state of the chain. In these systems, the time-consuming step is generating the proofs. A similar market mechanism might make sense for these systems, but we do not explore this topic here.)
Resource targets and limits.
There are several ways to prevent this type of denial of service attack. For example, one method is to enforce that any valid transaction (or sequence of transactions, e.g., a block) consumes at most some fixed upper bound of resources, or combinations of resources. These limits are set so that a node satisfying the minimum node requirements is always able to process transactions quickly enough to catch up to the head of the chain in a reasonable amount of time. Another possibility is to disincentivize miners and users from repeatedly including transactions that consume large amounts of resources while allowing short ‘bursts’ of resource-heavy transactions. This margin needs to be carefully balanced so that a node meeting the minimum requirements is able to catch up after a certain period of time. This intuition suggests having both a ‘resource target’ and a larger ‘resource limit,’ which we will formalize in what follows.
Most blockchain implementations define a number of meterable resources (simply resources from here on out), which we will label $i = 1, \dots, m$, that a transaction can consume. For example, in Ethereum, the resources could be the individual Ethereum Virtual Machine (EVM) opcodes used in the execution of a transaction. In this paper, the notion of a ‘resource’ is much more general than simply an ‘opcode’ or an ‘execution unit’. Here, a resource can refer to anything as coarse as ‘total bandwidth usage’ or as granular as individual instructions executed on a specific core of a machine; the only requirement for a resource, as used in this paper, is that it can be easily and consistently metered across any node. For a given transaction $j$, we will let
$$a_j \in \mathbf{R}^m_+$$
be the vector of resources that transaction $j$ consumes. In particular, the $i$th entry of this vector, $(a_j)_i$, denotes the amount of resource $i$ that transaction $j$ uses. We note that the resource usage $a_j$ does not, in fact, need to be nonnegative. While our mechanism works in the more general case (with some small modifications), we assume nonnegativity in this work for simplicity.
This framework naturally includes combinations of resources as well. For example, we may have two resources $i$ and $i'$, each cheap to execute once in a transaction, which are costly to execute serially (i.e., it is costly to execute $i$ and then $i'$ in the same transaction). In this case, we can create a ‘combined’ resource $k$ which is itself metered separately. Note that, in our discussion of resources, there is no requirement that the resources be independent in any sense and such ‘combined resources’ are themselves very reasonable to consider.
Resource utilization targets.
As mentioned previously, many networks have a minimum node requirement, implying a sustained target for resource utilization in each group of transactions added to the blockchain. (For simplicity, we will call this sequence of transactions a block, though it could be any collection of transactions that makes sense for a given blockchain.) We will write this resource utilization target as a vector $b^\star \in \mathbf{R}^m$ whose $i$th entry denotes the desired amount of resource $i$ to be consumed in a block. The resource utilization of a particular block is a linear function of the transactions included in the block, written as a Boolean vector $x \in \{0, 1\}^n$, whose $j$th entry is one if transaction $j$ is included in the block and is zero otherwise. We will write $A \in \mathbf{R}^{m \times n}$ as a matrix whose $j$th column is the vector of resources $a_j$ consumed by transaction $j$. We can then write the total quantity of consumed resources by this block as
$$y = Ax,$$
where $y$ is a vector whose $i$th entry denotes the quantity of resource $i$ consumed by all transactions included in the block. Additionally, we can write the deviation from the target utilization, sometimes called the residual, as
$$Ax - b^\star,$$
i.e., a vector whose $i$th element is the total quantity of resource $i$ consumed by transactions in this block, minus the target for resource $i$, $b^\star_i$. For example, in Ethereum post EIP-1559, there is only one resource, gas, which has a target of 15M [ethgas]. (We will see later how this notion of a ‘resource utilization target’ can be generalized to a loss function.)
Resource utilization limits.
In addition to resource targets, a blockchain may introduce a resource limit $b$ such that any valid block with transactions $x$ must satisfy
$$Ax \le b.$$
Continuing the example from before, Ethereum after EIP-1559 has a single resource, gas, with a resource limit of 30M.
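As a concrete illustration of the bookkeeping above, the sketch below computes the block utilization (the resource matrix times the inclusion vector), the residual against the target, and the limit check for a toy instance. The matrix, target, and limit values are made up for illustration, as is the helper name `mat_vec`.

```python
# Toy instance with m = 2 resources (rows) and n = 3 candidate transactions
# (columns). All numbers are hypothetical.

def mat_vec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [
    [21, 50, 5],   # amount of resource 1 used by transactions 1, 2, 3
    [1, 0, 30],    # amount of resource 2 used by transactions 1, 2, 3
]
b_target = [40, 20]   # sustained per-block utilization target
b_limit = [80, 40]    # hard per-block limit

x = [1, 0, 1]         # include transactions 1 and 3, skip transaction 2

y = mat_vec(A, x)                                        # total utilization
residual = [yi - ti for yi, ti in zip(y, b_target)]      # deviation from target
is_valid = all(yi <= li for yi, li in zip(y, b_limit))   # limit check

print(y, residual, is_valid)  # [26, 31] [-14, 11] True
```

Here the block is under target for the first resource and over target for the second, yet still valid because both entries are within the limit.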
We want to incentivize users and miners to keep the total resource usage ‘close’ to $b^\star$. To this end, we introduce a network fee $p_i$ for resource $i$, which we will sometimes call the resource price, or just the price. If transaction $j$ with resource vector $a_j$ is included in a block, a fee $p^T a_j$ is paid to the network. (In Ethereum, the network fee is implemented by burning some amount of the gas fee for each transaction and can be thought of as the burn rate.) For now, we will assume that as the fee $p_i$ gets larger, the amount of resource $i$ used in a block will become smaller and vice versa.
Given $A$ and $b^\star$, it is, in general, not clear how to set the fees $p$ in order to ensure the network has good performance. (In other words, so that the resource utilization is ‘close’ to $b^\star$.) As a real-world example, starting on Sept. 18, 2016, an attacker exploited the fact that resources were mispriced for the EXTCODESIZE opcode, causing the Ethereum network to slow down meaningfully. This mispricing was fixed via a hard fork on Oct. 18, 2016, with details outlined in EIP-150 [eip150]. (The effects of this mispricing can still be observed when attempting to synchronize a full node. A dramatic slowdown in downloading and processing blocks begins at the block height where the attack started.) Though usually less drastic, there have been similar events on other blockchains, underscoring the importance of correctly setting resource prices.
We want a simple update rule for the network fees $p$ with the following properties:
If $Ax = b^\star$, then there is no update.
If $(Ax - b^\star)_i > 0$, then $p_i$ increases.
Otherwise, if $(Ax - b^\star)_i < 0$, then $p_i$ decreases.
There are many update rules with these properties. As a simple example, we can update the network fees as
$$p^{k+1} = \left(p^k + \eta (Ax - b^\star)\right)_+, \tag{2}$$
where $\eta > 0$ is some (usually small) positive number, often referred to as the ‘step size’ or ‘learning rate’, $k$ is the block number such that $p^k$ are the resource prices at block $k$, and $(w)_+ = \max\{0, w\}$ for scalar $w$ and is applied elementwise for vectors. Recently, Ethereum developers [vitalikmultidimeip1559] proposed the update rule
$$p^{k+1} = p^k \odot \exp\left(\eta (Ax - b^\star)\right). \tag{3}$$
Here, $\exp(\cdot)$ is understood to apply elementwise, while $\odot$ is the elementwise or Hadamard product between two vectors. The remainder of this paper will show that many update rules of this form are attempting to (approximately) solve an instance of a particular optimization problem with a natural interpretation, where parts of the update rule come from a specific choice of objective function by the network designer. (We show in appendix C that a similar rule to (3) can be derived as a consequence of the framework presented in this paper, under a particular choice of variable transformation.)
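The two update rules (2) and (3) are short enough to sketch directly. In the sketch below, `residual` stands for the block's utilization minus the target, `eta` is the step size, and the function names and example numbers are illustrative assumptions.

```python
import math


def additive_update(p, residual, eta):
    """Rule (2): take a step along the residual, then clip prices at zero."""
    return [max(0.0, pi + eta * ri) for pi, ri in zip(p, residual)]


def multiplicative_update(p, residual, eta):
    """Rule (3): scale each price by the exponential of its scaled residual."""
    return [pi * math.exp(eta * ri) for pi, ri in zip(p, residual)]


prices = [1.0, 0.5]
residual = [2.0, -10.0]   # resource 1 over target, resource 2 far under target

print(additive_update(prices, residual, eta=0.1))
print(multiplicative_update(prices, residual, eta=0.1))
```

Both rules leave the prices unchanged when the residual is zero, raise the price of an over-target resource, and lower the price of an under-target one. Rule (2) can drive a price exactly to zero, whereas rule (3) keeps a positive price positive (and leaves a zero price at zero).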
We note that [ferreira2021dynamic] has studied fixed points of such iterations in the one-dimensional case, extending the analysis of EIP-1559. However, the multidimensional scenario can be quite a bit more subtle to analyze. For instance, the multiplicative update rule (3) can admit ‘vanishing gradient’ behavior in high dimensions [hochreiter1998vanishing]. We suspect that the one-dimensional fee model of [ferreira2021dynamic] can be extended to the multidimensional rules (2) and (3) and leave this for future work.
3 Resource allocation problem
As system designers, our ultimate goal is to maximize utility of the underlying blockchain network by appropriately allocating resources to transactions. However, we cannot perform this allocation directly, since we cannot control what users or miners wish to include in blocks, nor do we know what the utility of a transaction is to users and miners. Instead, we aim to set the fees to ensure that the resource usage is approximately equal to the desired target, which we will represent as an objective function. We will show later that the update mechanisms proposed above naturally fall out of a more general optimization formulation. Similarly to the Transmission Control Protocol (TCP), each update rule is a result of a particular objective function, chosen by the network designer [low1999optimization, low2003duality].
We define a loss function of the resource usage, $\ell : \mathbf{R}^m \to \mathbf{R} \cup \{+\infty\}$, which maps a block’s resource utilization, $y$, to the ‘unhappiness’ of the network designer, $\ell(y)$. We assume only that $\ell$ is convex and lower semicontinuous. (We will not require monotonicity, nonnegativity, or other assumptions on $\ell$, but we will show that useful properties do hold in these scenarios.)
For example, the loss function can encode ‘infinite dissatisfaction’ if the resource target is violated at all:
$$\ell(y) = \begin{cases} 0 & y = b^\star \\ +\infty & \text{otherwise}. \end{cases} \tag{4}$$
(Functions of this form, which are either $0$ or $+\infty$ at every point, are known as indicator functions.) Note also that this loss is not differentiable anywhere, but it is convex. Another possible loss, which is also an indicator function, is
$$\ell(y) = \begin{cases} 0 & y \le b^\star \\ +\infty & \text{otherwise}. \end{cases} \tag{5}$$
This loss can roughly be interpreted as: we do not mind any usage below $b^\star$, but we are infinitely unhappy with any usage above the target amounts. Alternatively, we may only care about large deviations from the target $b^\star$:
$$\ell(y) = \frac{1}{2} \|y - b^\star\|_2^2,$$
or, perhaps, require that the loss is simply linear and independent of $b^\star$,
$$\ell(y) = u^T y, \tag{6}$$
for some fixed vector $u \in \mathbf{R}^m$. Another important family of losses are those which are separable and depend only on the individual resource utilizations,
$$\ell(y) = \sum_{i=1}^m \phi_i(y_i),$$
where $\phi_i : \mathbf{R} \to \mathbf{R} \cup \{+\infty\}$, for $i = 1, \dots, m$, are convex, nondecreasing functions. (The loss (5) is a special case of this loss, while (6) is a special case when the vector $u$ is nonnegative.) We will make the technical assumption that $\phi_i(0) < \infty$ for every $i$, otherwise no resource allocation would have finite loss.
We will see that each definition of a loss function implies a particular update rule for the network fees $p$. This loss function can more generally be engineered to appropriately capture tradeoffs in increasing throughput of a particular resource at the possible detriment of other resources.
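The example losses described above translate directly into code. The sketch below is illustrative only: the function names are ours, the target vector is written `b`, and the quadratic form (one half of the squared Euclidean distance to the target) is one natural reading of the ‘large deviations’ loss.

```python
import math


def loss_equal_target(y, b):
    """Indicator loss: zero only when usage matches the target exactly."""
    return 0.0 if all(yi == bi for yi, bi in zip(y, b)) else math.inf


def loss_at_most_target(y, b):
    """Indicator loss: zero whenever usage is at or below the target."""
    return 0.0 if all(yi <= bi for yi, bi in zip(y, b)) else math.inf


def loss_quadratic(y, b):
    """Penalizes large deviations from the target much more than small ones."""
    return 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, b))


def loss_linear(y, u):
    """Linear loss with a fixed per-resource cost vector u."""
    return sum(ui * yi for ui, yi in zip(u, y))


def loss_separable(y, phis):
    """Sum of per-resource losses phi_i(y_i)."""
    return sum(phi(yi) for phi, yi in zip(phis, y))


b = [3.0, 3.0]
# The 'at most target' indicator written as a separable loss.
phis = [lambda yi, bi=bi: 0.0 if yi <= bi else math.inf for bi in b]

print(loss_quadratic([4.0, 1.0], b))
print(loss_at_most_target([4.0, 3.0], b), loss_separable([4.0, 3.0], phis))
```

The last line shows the ‘at most target’ indicator and its separable form agreeing on a point that violates the target.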
Now that we have defined the network designer’s loss, which is a way of quantifying ‘unhappiness’ when the resource usage is $y$, we need some way to define the transactions that users are willing to create and, conversely, that miners are willing (and able) to include. We do this in a very general way by letting $S \subseteq \{0, 1\}^n$ be the set of possible transactions that users and miners are willing and able to create and include. Note that this set is discrete and can be very complicated or potentially hard to maximize over (as is the case in practice). For example, the set could encode a demand for transactions which depend on other transactions being included in the block (as is the case in, e.g., miner extractable value [kulkarniMEV, daian2020flash, qin2021quantifying]), network-defined hard constraints of certain resources (such as $Ax \le b$ for every $x \in S$), and even very complicated interactions among different transactions (if certain contracts can, for example, only be called a fixed number of times, as in NFT mints). We make no assumptions about the structure of this set $S$, but only require that the included transactions, $x$, obey the constraint $x \in S$.
Convex hull of resource constraints.
A network designer may be more interested in the long-term resource utilization of the network than the resource utilization of any one particular block. In this case, the designer may choose to ‘average out’ each transaction over a number of blocks instead of deciding whether or not to include it in the next block. To that end, we, as designers, will be allowed to choose convex combinations of elements of the constraint set $S$, which we will write as $\mathbf{cnv}(S)$. (In general, this means that we can pick probability distributions over the elements of $S$, and $x \in \mathbf{cnv}(S)$ is allowed to be the expectation of any such probability distribution; i.e., we only require that, for the designer, $x$ is reasonable ‘in expectation’.) Here, components of $x$ may vary continuously between $0$ and $1$; these values have a simple interpretation. If $x_j$ is not $0$ or $1$, then we can interpret the quantity $x_j$ as only including transaction $j$ after roughly $1/x_j$ blocks. Of course, neither users nor miners can choose transactions to be ‘partially included’, so this property will only apply to the idealized problem we present below. While this relaxation might seem unrealistically ‘loose’, we will see later how this easily translates to the realistic case where transactions are either included or not (that is, $x_j$ is either $0$ or $1$) by users and miners.
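The ‘in expectation’ reading of the convex hull can be made concrete with a small sketch. The three candidate blocks and the probability weights below are hypothetical, as is the helper name; the point is only that a probability distribution over integral blocks yields a fractional inclusion vector.

```python
# Hypothetical set of allowed integral blocks over n = 3 transactions.
S = [(1, 0, 0), (0, 1, 1), (1, 1, 0)]

# A probability distribution over the elements of S (nonnegative, sums to one).
weights = [0.5, 0.25, 0.25]


def expected_inclusion(blocks, probs):
    """Return the convex combination sum_k probs[k] * blocks[k]."""
    n = len(blocks[0])
    return [sum(w * blk[j] for w, blk in zip(probs, blocks)) for j in range(n)]


x = expected_inclusion(S, weights)
print(x)  # [0.75, 0.5, 0.25]

# A fractional entry x_j can be read as 'included roughly once every 1 / x_j blocks'.
print([1 / xj for xj in x])
```

If a block were drawn from `S` according to `weights` at every step, the third transaction would appear in about one block out of four, which matches its entry of 0.25.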
Finally, we introduce the transaction utilities, which we will write as $q \in \mathbf{R}^n$. The transaction utility $q_j$ for transaction $j$ denotes the users’ and miners’ joint utility of including transaction $j$ in a given block. Note that it is very rare (if at all possible) to know the values of $q$. However, we will see that, under mild assumptions, we do not need to know the values of $q$ in order to get reasonable prices, and reasonable update rules will not depend on $q$.
Resource allocation problem.
We are now ready to write the resource allocation problem, which is to maximize the utility of the included transactions, minus the loss, over the convex hull of possible transactions:
$$\begin{aligned} & \text{maximize} && q^T x - \ell(y) \\ & \text{subject to} && y = Ax \\ &&& x \in \mathbf{cnv}(S). \end{aligned} \tag{8}$$
This problem has variables $x \in \mathbf{R}^n$ and $y \in \mathbf{R}^m$, and the problem data are the resource matrix $A$, the set of possible transactions $S$, and the transaction utilities $q$. Because the objective function is concave and the constraints are all convex, this is a convex optimization problem (see appendix A). On the other hand, even though the set $\mathbf{cnv}(S)$ is convex, it is possible that $\mathbf{cnv}(S)$ does not admit an efficient representation (for example, it may contain exponentially many constraints) which means that solving this problem is, in general, not easy.
We can interpret the resource allocation problem (8) as the ‘best case scenario’, where the designer is able to choose which transactions are included (or ‘partially included’) in a block in order to maximize the total utility. While this problem is not terribly useful by itself, since (a) it cannot really be implemented in practice, (b) we often don’t know $q$, and (c) we cannot ‘partially include’ a transaction within a block, we will see that it will decompose naturally into two problems. One of these problems can be easily solved on chain, while the other is solved implicitly by the users (who send transactions to be included) and miners (who choose which transactions to include). The solutions to the latter problem can always be assumed to be integral; i.e., no transactions are ‘partially included’. This will allow us to construct a simple update rule for the prices, which does not depend on $q$. For the remainder of the paper, we will call this combination of users and miners the transaction producers.
Offchain agreements and producers.
Due to the inevitability of user-miner collusion, we consider the combination of the two, the transaction producers, as the natural unit. For example, it is not easily possible to create a transaction mechanism where the users are forced to pay miners some fixed amount, since it is always possible for miners to refund users via some off-chain agreement [roughgarden2020eip1559]. Similarly, we cannot force miners to accept certain transactions by users, since a miner always has plausible deniability of not having seen a given transaction in the mempool. While not perfect for a general analysis of incentives, this conglomerate captures the dynamics between the network’s incentives and those of the miners and users better than assuming each is purely selfishly maximizing their own utility (as opposed to strategically colluding) and suffices for our purposes.
3.1 Setting prices using duality
In this section, we will show a decomposition method for this problem. This decomposition method suggests an algorithm (presented later) for iteratively updating fees in order to maximize the transaction producers’ utility minus the loss of the network, given historical observations.
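To start, we will reformulate (8) slightly by pulling the constraint $x \in \mathbf{cnv}(S)$ into the objective,
$$\begin{aligned} & \text{maximize} && q^T x - \ell(y) - I(x) \\ & \text{subject to} && y = Ax, \end{aligned} \tag{9}$$
where $I$ is the indicator function defined as
$$I(x) = \begin{cases} 0 & x \in \mathbf{cnv}(S) \\ +\infty & \text{otherwise}. \end{cases}$$
The Lagrangian [boyd2004convex, §5.1.1] for problem (9) is then
$$L(x, y, p) = q^T x - \ell(y) - I(x) + p^T (y - Ax),$$
with dual variable $p \in \mathbf{R}^m$. This corresponds to ‘relaxing’ the constraint $y = Ax$ to a penalty assigned to the objective, where the price per unit violation of constraint $i$ is $p_i$. (Negative values denote refunds.) Rearranging slightly, we can write
$$L(x, y, p) = p^T y - \ell(y) + (q - A^T p)^T x - I(x).$$
The corresponding dual function [boyd2004convex, §5.1.2], which we will write as $g$, is found by maximizing over $x$ and $y$:
$$g(p) = \sup_y \left(p^T y - \ell(y)\right) + \sup_x \left((q - A^T p)^T x - I(x)\right).$$
The first term can be recognized as the Fenchel conjugate of $\ell$ [boyd2004convex, §3.3] evaluated at $p$, which we will write as $\ell^*(p)$, while the second term is the optimal value of the following problem:
$$\begin{aligned} & \text{maximize} && (q - A^T p)^T x \\ & \text{subject to} && x \in \mathbf{cnv}(S), \end{aligned} \tag{11}$$
with variable $x$. We can interpret this problem as the transaction producers’ problem of creating and choosing transactions to be included in a block in order to maximize their utility, after netting the fees paid to the network. We note that the optimal value of (11) in terms of $p$, which we will write as $f(p)$, is the pointwise supremum of a family of linear (and therefore convex) functions of $p$, so it, too, is a convex function [boyd2004convex, §3.2.3]. Note that since the objective is linear, problem (11) has the same optimal objective value as the nonconvex problem
$$\begin{aligned} & \text{maximize} && (q - A^T p)^T x \\ & \text{subject to} && x \in S. \end{aligned}$$
Since the dual function $g$ is the sum of convex functions $\ell^*$ and $f$, it, too, is convex. (We will make use of this property soon.) Having defined the dual function $g$, we will see how this function can give us a criterion for how to best set the network fees $p$.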
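To make the transaction producers’ problem concrete, the sketch below evaluates it by brute force on a toy instance: for given prices, it finds the integral block that maximizes utility net of network fees. The data, the choice of the allowed set as all Boolean vectors within a hard resource limit, and the function names are illustrative assumptions; real instances are far too large for enumeration.

```python
from itertools import product

# Hypothetical instance: m = 2 resources, n = 3 transactions.
A = [[2, 1, 3],
     [0, 4, 1]]
q = [4.0, 3.0, 5.0]     # joint user/miner utility per transaction
b_limit = [4, 5]        # hard per-block resource limit


def feasible_set(A, b):
    """All Boolean inclusion vectors whose resource usage is within the limit."""
    n = len(A[0])
    return [
        x for x in product((0, 1), repeat=n)
        if all(sum(a * xj for a, xj in zip(row, x)) <= bi
               for row, bi in zip(A, b))
    ]


def producers_problem(q, A, p, S):
    """Maximize utility minus fees over S; returns (optimal value, maximizer)."""
    # Net utility per transaction: q_j minus the fee p^T a_j.
    net = [q[j] - sum(p[i] * A[i][j] for i in range(len(A)))
           for j in range(len(q))]

    def value(x):
        return sum(nj * xj for nj, xj in zip(net, x))

    best = max(S, key=value)
    return value(best), best


S = feasible_set(A, b_limit)
print(producers_problem(q, A, [0.0, 0.0], S))  # no fees
print(producers_problem(q, A, [1.0, 1.0], S))  # positive fees
```

With zero prices the producers pick the two highest-utility transactions that fit; once both resources carry a price of one, only the first transaction is still worth including, and the optimal value drops accordingly.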
An important consequence of the definition of the dual function is weak duality [boyd2004convex, §5.2.2]. Specifically, letting be the optimal objective value for problem (8), we have that
for every possible choice of price . This is true because we have essentially ‘relaxed’ the constraint to a penalty, so any feasible point for the original problem (9) always has penalty. (There may, of course, be other points that are not feasible for (9) but are perfectly reasonable for this ‘relaxed’ version, so we’ve only made the set of possibilities larger.) The proof is a single line:
A deep and important result in convex optimization is that, in fact, there exists a for which
under some basic constraint qualifications. (Footnote 1: The condition is that the relative interior of is nonempty. Here, we write and , while the relative interior is taken with respect to the affine hull of the set. This condition almost always holds in practice for reasonable functions and sets .) In other words, adding the constraint to the problem is identical to correctly setting the prices . Since we know for any that then
or, that is a minimizer of . This motivates an optimization problem for finding the prices.
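A small numerical check can make weak and strong duality tangible. The sketch below is a hypothetical toy instance, not taken from the paper: two transactions, one resource, an assumed quadratic loss $\ell(y) = y^2/2$ (whose Fenchel conjugate is $p^2/2$), and an allowable set consisting of all 0/1 vectors, so that the packing value over the convex hull is $\sum_i \max(q_i - a_i p, 0)$. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy instance: two transactions competing for one resource.
q = [3.0, 1.0]   # utilities of the two transactions (assumed values)
a = [1.0, 1.0]   # resource usage of each transaction (assumed values)


def loss(y):
    """Assumed network loss l(y) = y^2 / 2."""
    return 0.5 * y * y


def primal_value(x):
    """Objective q^T x - l(Ax) for a 0/1 inclusion vector x."""
    y = sum(ai * xi for ai, xi in zip(a, x))
    return sum(qi * xi for qi, xi in zip(q, x)) - loss(y)


def dual_value(p):
    """g(p) = l*(p) + f(p) for this instance.

    l*(p) = p^2 / 2 is the conjugate of the assumed loss, and f(p) is the
    value of the packing problem over the unit box (the hull of 0/1 vectors).
    """
    conjugate = 0.5 * p * p
    packing = sum(max(qi - ai * p, 0.0) for qi, ai in zip(q, a))
    return conjugate + packing


# Best primal value by enumerating all four bundles.
best_primal = max(primal_value(x) for x in product([0, 1], repeat=2))

# Approximate dual minimizer on a grid of prices 0.00, 0.01, ..., 5.00.
grid = [i / 100 for i in range(0, 501)]
best_price = min(grid, key=dual_value)
```

In this instance every grid price gives a dual value at least as large as the best primal value (weak duality), and the two coincide at the minimizing price, so there is no duality gap here. That coincidence is a property of this particular toy problem and is not guaranteed for every Boolean instance.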
The dual problem.
The dual problem is to minimize , as a function of the fees . In other words, the dual problem is to find the optimal value of
with variable . If we can easily evaluate , then, since this problem is a convex optimization problem, as is convex, solving it is usually also easy. An optimizer of the dual problem has a simple interpretation using its optimality conditions. Let be a solution to the dual problem (12) for what follows. If the packing problem (11) has a unique solution for , then the objective value is differentiable at . (See appendix A.2.) Similarly, under mild conditions on the loss function (such as strict convexity) the function is differentiable at , with derivative satisfying . In this case, the optimality conditions for problem (12) are that
In other words, the fees that minimize (12) are those that charge the transaction producers the exact marginal costs faced by the network, . Furthermore, these are exactly the fees which incentivize transaction producers to include transactions that maximize the welfare generated minus the network loss, subject to resource constraints, since and are feasible and optimal for problem (8).
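In the differentiable case, the optimality conditions (13) can be sketched as follows. The notation is assumed: $x^\star$ solves the packing problem (11) at the price $p^\star$, and $y^\star$ attains the supremum defining $\ell^*(p^\star)$. The gradient of $f$ follows from $f$ being a supremum of functions linear in $p$, and the gradient of $\ell^*$ is the maximizer $y^\star$.

```latex
\begin{equation*}
\nabla g(p^\star)
  = \nabla \ell^*(p^\star) + \nabla f(p^\star)
  = y^\star - A x^\star = 0,
\qquad \text{where} \quad \nabla \ell(y^\star) = p^\star .
\end{equation*}
```

Read this way, the first condition says that the utilization the network targets at price $p^\star$ equals the utilization $A x^\star$ that transaction producers choose, and the second says that the price equals the network's marginal loss at that utilization.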
In general, is not always differentiable, but is almost universally subdifferentiable, under mild additional conditions on (e.g., does not contain a line). Condition (13) may then be replaced with
while are the maximizers of problem (11) for price . We define , and write for the subgradients of at (cf., appendix A.5). The condition then says that the intersection of the extremizing sets and is nonempty at the optimal prices . We show a special case of this below, when , with a direct proof using strong duality that does not require these additional conditions.
3.2 Minimal demand conditions
We can give a condition for which we can guarantee that the optimal prices, i.e., those which minimize the dual problem (12), satisfy . The condition is: when resources have zero fee, the optimal set of included transactions that would be included at no price, defined as , with
is ‘disjoint’ from the set of minimizers of the loss, , defined
in the following sense:
where . An intuitive version of this condition is that the demand for transactions, if they could be executed at no cost to the transaction producers, always incurs some loss for the network. This, in turn, implies that the optimal fees for such resources must be nonzero.
For convenience, for the rest of this section, we will define when and otherwise. This lets us write:
and, for any , we have
To see this, we will use strong duality. We will prove the contrapositive statement: if is optimal, then there exists a point in the intersection .
If is optimal, then, using strong duality:
Rewriting the problem to remove the constraint , we have
Since is a compact set and is lower semi-continuous, there exists some which achieves this maximum. Now, using the definition of (cf., equation (10)),
Putting both statements together, we find
Since we know, by definition of and that
then, putting these together with the above statements, we find that and are minimizers for the first and second terms, respectively, i.e.,
This means that and , or, equivalently, that is nonempty. The statement above follows from the contrapositive: if is nonempty, then is not a minimizer for .
The prices that minimize the function are intimately related to the geometry of the sets and . (We will see this soon.) For this purpose, we will define
to be the cone of hyperplanes that separate the sets and , defined:
Note that , so this set is always nonempty, and is closed and convex as it can be written as the intersection of closed halfspaces (which are also convex sets):
Conditions on prices.
We will show that, if satisfies , then . In other words, any minimizer of the dual function must be a separating hyperplane of the extremizing sets and . The proof is relatively simple. Since
then, using the definition of , we have that
for every and , so
If we restrict and to be in and , respectively, and negate both sides, we then have
or, that for all and , which is the definition of . (Note that we may replace the inequalities with strict inequalities and with , the interior of the cone , to receive a second useful statement.)
We note that the above definitions serve as a natural generalization of the condition that the resource utilization is equal to a target utilization . In our case, we can have many ‘optimal’ utilizations, given by , with the additional granularity that any suboptimal resource utilization vector has a certain degree of displeasure, . If the set of optimal utilizations (for the loss function ) do not overlap with the zero-cost demand, , then the original condition states that the prices must be nonzero.
On the other hand, we know that any set of optimal prices must be a separating hyperplane for the sets and ; i.e., that . This leads to some interesting observations. If zero utilization is a possible target, i.e., , as is the case for any nondecreasing loss such as (7), then the set is contained in the dual cone of , where is the set of conic (nonnegative) combinations of elements in . For more information, see, e.g., [boyd2004convex, §2.6].
There is also a partial converse to the above conditions on the prices . In particular, for any resource cost in the interior of the cone (satisfying the technical condition that contains no line in the direction of ) there is always some scalar such that ; i.e., cannot be a minimizer for if the interior of is nonempty. While interesting, this point is slightly technical, so we defer it to appendix B. In general, we can view this statement as a stronger version of the original claim: if the compact set and closed set are disjoint, then there is a strict separating hyperplane between and , say , so the set is nonempty since it contains . This, in turn, would immediately imply that cannot be minimal.
There are a number of properties of the prices that can be derived from the dual problem (12).
If the objective function is separable and nondecreasing, as in (7), then any price feasible for problem (12) must be nonnegative, . (By feasible, we mean that .) To see this, note that, by definition (7), we have
so we can consider each term individually. If then any must have
as since is nondecreasing in . So and therefore this choice of cannot be feasible, so we must have that .
Superlinear separable losses.
If the losses are superlinear, in that
as and bounded from below, in addition to being nondecreasing, then is finite for . This means that the effective domain of , defined as the set of prices for which is finite,
is exactly the nonnegative orthant. (This discussion may appear somewhat theoretical, but we will see later that this turns out to be an important practical point when updating prices.) While not all losses are superlinear, we can always make them so by, e.g., adding a small, nonnegative squared term to , say
where and is a small positive value, or by setting for large enough.
Alternatively, if the function is decreasing somewhere on the interior of its domain, then there exist points for which prices are negative—i.e., sometimes the network is willing to subsidize usage by paying users to use the network to meet its intended target. The interpretation is simple: if the base demand of the network is not enough to meet the target amount, then the network has an incentive to subsidize users until the marginal cost of the target usage matches the subsidy amounts. We note that this may only apply to very specific transaction types in practice, as it is difficult to issue subsidies in an incentive-compatible manner that doesn’t incentivize the inclusion of ‘junk’ transactions.
Another observation is that there often exist prices past which transaction producers would always prefer to not submit a transaction (or, more generally, will only submit transactions that consume no resources, if such transactions exist). In fact, we can characterize the set of all such prices.
To do this, write for the set of transaction bundles that use no resources, defined
If then is nonempty (as ), and we usually expect this set to be a singleton, . Otherwise, we are saying that there are transactions that are always costless to include. Now, we will define the set
This is the set of prices such that, for every possible transaction bundle , the price of this transaction bundle, , paid to the network, is strictly larger than the total welfare generated by including it, which is . (That is, any transaction bundle that is not costless is always strictly worse than no transaction, for transaction producers, at these prices.) The set is nonempty since for every (and is finite) so, setting , we have that
as , so for large enough. The set is also a convex set, as it is the intersection of convex sets. Additionally, if , then any prices satisfying must also have , where the inequality is taken elementwise. In English: if certain resource prices would mean that transactions that consume resources are not included, then increasing the price of any resources to also implies the same.
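The membership test for this set of prices can be illustrated by brute force on a tiny instance. The sketch below assumes, purely for illustration, that the allowable set is every 0/1 vector of length $n$, that `A[r][i]` is the amount of resource `r` used by transaction `i`, and that `q[i]` is the utility of transaction `i`. The function name `is_prohibitive` and all numbers are hypothetical.

```python
from itertools import product


def is_prohibitive(p, A, q):
    """Return True if every resource-consuming bundle costs strictly more
    in fees than the welfare it generates, i.e. p^T (A x) > q^T x.

    Assumes the allowable set is all 0/1 vectors (an illustrative choice).
    """
    m, n = len(A), len(q)
    for x in product([0, 1], repeat=n):
        usage = [sum(A[r][i] * x[i] for i in range(n)) for r in range(m)]
        if all(u == 0 for u in usage):
            continue  # bundle uses no resources, so it is exempt
        fee = sum(p[r] * usage[r] for r in range(m))
        welfare = sum(q[i] * x[i] for i in range(n))
        if fee <= welfare:
            return False  # some paying bundle is still worth including
    return True


A = [[1, 0], [0, 1]]  # transaction i uses one unit of resource i (assumed)
q = [2.0, 3.0]        # transaction utilities (assumed)

high = is_prohibitive([3.0, 4.0], A, q)    # both prices exceed the utilities
low = is_prohibitive([1.0, 4.0], A, q)     # resource 0 is cheap enough to use
higher = is_prohibitive([5.0, 4.0], A, q)  # raising a price keeps membership
```

The last call illustrates the monotonicity property stated above: with nonnegative resource usage, raising any price from a point in the set stays in the set.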
3.4 Solution methods
As mentioned before, the dual problem (12) is convex. This means that it can often be easily solved if the function (or its subgradients) can be efficiently evaluated. We will see that, assuming users and miners are approximately solving problem (11), we can retrieve approximate (sub)gradients of and use these to (approximately) solve the dual problem (12). In a less-constrained computational environment, a quasi-Newton method (e.g., L-BFGS) would converge quickly to the optimal prices and be efficient to implement. However, these methods aren’t amenable to on-chain computation due to their memory and computational requirements. To solve for the optimal fees on chain, we therefore propose a modified version of gradient descent which is easy to compute and does not require additional storage beyond the fees themselves.
Projected gradient descent.
A common algorithm for unconstrained function minimization problems, such as problem (12), is gradient descent. In gradient descent, we are given an initial starting point and, if the function is differentiable, we iteratively update the prices in the following way:
Here, is some (usually small) positive number referred to as the ‘step size’ or ‘learning rate’ and is the iteration number. This rule has a few important properties. For example, if , that is, is optimal, then this rule does not update the prices, ; in other words, any minimizer of is a fixed point of this update rule. Additionally, this rule can be shown to converge to the optimal value under some mild conditions on , cf. [bertsekas99, §1.2]. This update also has a simple interpretation: if is not zero, then a small enough step in the direction of is guaranteed to evaluate to a lower value than , so an update in this direction decreases the objective . (This is why the parameter is usually chosen to be small.)
Note that if the effective domain of the function , , is not , then it is possible that the st step ends up outside of the effective domain, , so which would mean that the gradient of at price would not exist. To avoid this, we can instead run projected gradient descent, where we project the update step into the domain of , in order to get , i.e.,
where is defined
In English, is the projection of the price to the nearest point in the domain of , as measured by the sum-of-squares loss . (This always exists and is unique as the domain of is always closed and convex, for any loss function as defined above.) There is a relatively rich literature on the convergence of projected gradient descent, and we refer the reader to [shor1985minimization, bertsekas99] for more.
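As a minimal sketch of projected gradient descent, consider the case where the effective domain is the nonnegative orthant, so that the projection is an elementwise clamp at zero. The objective below, $\tfrac12 \|p - c\|^2$ with a hypothetical center $c = (1, -2)$, is a stand-in chosen only because its constrained minimizer $(1, 0)$ is easy to verify by hand; it is not the dual function from the paper.

```python
def project_nonnegative(p):
    """Euclidean projection onto the nonnegative orthant (elementwise clamp)."""
    return [max(pi, 0.0) for pi in p]


def projected_gradient_descent(grad, p0, step, iters):
    """Repeat p <- project(p - step * grad(p)) for a fixed number of steps."""
    p = list(p0)
    for _ in range(iters):
        g = grad(p)
        p = project_nonnegative([pi - step * gi for pi, gi in zip(p, g)])
    return p


# Stand-in objective 0.5 * ||p - center||^2, whose gradient is p - center.
center = [1.0, -2.0]


def grad(p):
    return [pi - ci for pi, ci in zip(p, center)]


p_final = projected_gradient_descent(grad, [5.0, 5.0], step=0.1, iters=500)
```

The first coordinate converges to the unconstrained optimum, while the second is held at the boundary of the domain by the projection, which is the behavior the projected update is designed to produce.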
Evaluating the gradient.
In general, since we do not know , we cannot evaluate the function at a certain point, say . On the other hand, the gradient of at , when is differentiable, depends only on the solution to problem (11) and the maximizer for the dual function , at the price . (This follows from the gradient equation in (13).) So, if we know that transaction producers are solving their welfare maximization problem (11) to (approximate) optimality, equation (13) suggests a simple descent algorithm for solving the dual problem (12).
From before, let be a maximizer of , which is usually easy to compute in practice, and let be an (approximate) solution to the transaction inclusion problem (11) (observed, e.g., after the block is built with resource prices ). We can approximate the gradient of at the current fees using (13), where we replace the true solution with the observed solution . Since is Boolean, we can compute the resource usage after observing only the included transactions. We can then update the fees in, say, block , to a new value by using projected gradient descent with this new approximation:
Whenever is differentiable at , we have that . (To see this, apply the first-order optimality conditions to the objective in the supremum in the definition of .) We can then think of as the resource utilization such that the marginal cost to the network is equal to the current fees . Thus, the network aims to set such that the realized resource utilization is equal to . We can see that (15) will increase the network fee for a resource being overutilized and decrease the network fee for a resource being underutilized. This pricing mechanism updates fees to disincentivize future users and miners from including transactions that consume currently-overutilized resources in future blocks. Additionally, we note that algorithms of this form are not the only algorithms which are reasonable. For example, any algorithm that has a fixed point satisfying and converges to this point under suitable conditions is also similarly reasonable. One well-known example is an update rule of the form of (3):
when the prices must be nonnegative, i.e., when . We note that one important part of reasonable rules is that they only depend on (an approximation of) the gradient of the function , since the value of may not even be known in practice. Additionally, in some cases, the function is nondifferentiable at prices . In this case, the subgradient still often exists and convergence of the update rule (16) can be guaranteed under slightly stronger conditions. (The modification is needed as not all subgradients are descent directions.)
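The two kinds of update discussed above can be sketched in a few lines. The code assumes the approximate gradient is the difference between the target utilization at the current prices (here `target`) and the observed utilization of the last block (here `usage`), and that prices are constrained to be nonnegative. The function names and the exact form of the multiplicative rule are assumptions made for illustration.

```python
import math


def additive_update(prices, target, usage, eta):
    """Projected gradient step: p <- max(p - eta * (target - usage), 0)."""
    return [max(p - eta * (t - u), 0.0)
            for p, t, u in zip(prices, target, usage)]


def multiplicative_update(prices, target, usage, eta):
    """Exponential step: p <- p * exp(-eta * (target - usage)).

    Keeps strictly positive prices positive without an explicit projection.
    """
    return [p * math.exp(-eta * (t - u))
            for p, t, u in zip(prices, target, usage)]


prices = [1.0, 2.0]
target = [10.0, 10.0]
usage = [12.0, 7.0]   # resource 0 overutilized, resource 1 underutilized

add_step = additive_update(prices, target, usage, eta=0.1)
mul_step = multiplicative_update(prices, target, usage, eta=0.1)

# When usage equals the target, both rules leave the prices unchanged.
add_fixed = additive_update(prices, target, target, eta=0.1)
mul_fixed = multiplicative_update(prices, target, target, eta=0.1)
```

Both rules raise the price of the overutilized resource and lower the price of the underutilized one, and both have the same fixed point, namely utilization equal to the target.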
We can derive specific update rules by choosing specific loss functions. For example, consider the loss function
which captures infinite unhappiness of the network designer for any deviation from the target resource usage . The corresponding conjugate function is
with maximizer . (Note that this maximizer does not change for any price ). Since , then the update rule is
If the utilization lies far below , the fees might become negative, i.e., the network would want to subsidize certain resource usage to meet the requirement that it must be equal to .
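For concreteness, the fixed-target example can be written out as follows. The symbols are assumptions: $b^\star$ denotes the target utilization, $x^0$ the observed transaction bundle, $A$ the resource matrix, $\eta$ the step size, and $k$ the block index.

```latex
\begin{align*}
\ell(y) &= \begin{cases} 0 & y = b^\star \\ +\infty & \text{otherwise}, \end{cases}
&
\ell^*(p) &= \sup_{y} \bigl( p^T y - \ell(y) \bigr) = p^T b^\star,
&
y^\star(p) &= b^\star, \\[4pt]
p^{k+1} &= p^k - \eta \bigl( b^\star - A x^0 \bigr).
\end{align*}
```

Because $\ell^*$ is finite for every price in this case, no projection is needed, which is why the update can drive prices below zero.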
Nondecreasing separable loss.
A more reasonable family of loss functions would be the nondecreasing, separable losses:
where for scalar and is applied elementwise for vectors. Additionally, using the definition of the separable loss, we can write
Letting be the maximizers for the individual problems at the current price , we have
For example, if is an indicator function with if and otherwise, as in the loss (5), then an optimal point is always , when . Since no updates will ever set , we therefore have,
While we used projected gradient descent rules for the examples above, we note that this class of update rules is not the only option. Other update rules naturally fall out of other optimization algorithms applied to (12). For example, if we only want to update some subset of the prices at each iteration, we can use block coordinate descent. We can also add adaptive step size rules or momentum terms to our gradient descent formulation. These additions would yield more computationally expensive algorithms, but they may result in faster convergence to optimal prices when the distribution of processed transactions shifts. This is a potentially interesting area for future research.
4 The cost of uniform prices
In this section, we show that pricing resources using the method outlined above can increase network efficiency and make the network more robust to DoS attacks or distribution shifts. We construct a toy experiment to illustrate these differences between uniform and multidimensional resource pricing, and leave more extensive numerical studies to future work.
We consider a blockchain system with two resources (e.g., compute and storage) with resource utilizations and . Resource 1 is much cheaper for the network to use than resource 2, so
Furthermore, we assume that there is a joint capacity constraint on these resources
which captures the resource tradeoff. Each transaction is therefore a vector in with
As in §3.4, we consider the simple loss function
which has update rule
In the scenarios below, we compare our multidimensional fee market approach to a baseline, where both resources are combined into one equal to with . We demonstrate that pricing these resources separately leads to better network performance. All code is available at
We run simulations in the Julia programming language [julia]. The transaction producers’ optimization problem (11) is modeled with JuMP [jump] and solved with COIN-OR’s simplex-based linear programming solver, Clp [clp-solver]. The solution is usually integral, but when it is not, we fall back to the HiGHS mixed-integer linear programming solver [huangfu2018highsg].
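The simulation loop can be sketched in a few lines of standard-library Python. This is not the paper's Julia code: it brute-forces the producers' problem over all bundles, which is only practical for a handful of transactions, and it assumes a simple elementwise capacity limit, a fixed set of transactions submitted in every block, and the projected update with a fixed target. All names and numbers are hypothetical.

```python
from itertools import product


def best_bundle(q, A, p, limit):
    """Brute-force the producers' problem: maximize sum_i (q_i - p . a_i) x_i
    over 0/1 vectors x whose resource usage A x stays within `limit`."""
    n, m = len(q), len(limit)
    best_x, best_val = (0,) * n, 0.0
    for x in product([0, 1], repeat=n):
        usage = [sum(A[r][i] * x[i] for i in range(n)) for r in range(m)]
        if any(u > lim for u, lim in zip(usage, limit)):
            continue  # bundle does not fit in the block
        net = sum((q[i] - sum(p[r] * A[r][i] for r in range(m))) * x[i]
                  for i in range(n))
        if net > best_val:
            best_x, best_val = x, net
    return best_x


def simulate(q, A, target, limit, eta, blocks):
    """Alternate block building and projected price updates."""
    m, n = len(target), len(q)
    p = [0.0] * m
    history = []
    for _ in range(blocks):
        x = best_bundle(q, A, p, limit)
        usage = [sum(A[r][i] * x[i] for i in range(n)) for r in range(m)]
        history.append((list(p), usage))
        # Raise prices of overutilized resources, lower underutilized ones.
        p = [max(p[r] - eta * (target[r] - usage[r]), 0.0) for r in range(m)]
    return history


q = [4.0, 3.0, 2.0, 1.0]            # utilities (assumed)
A = [[1, 1, 1, 1], [0, 1, 0, 2]]    # A[r][i]: resource r used by tx i (assumed)
history = simulate(q, A, target=[2, 1], limit=[3, 2], eta=0.5, blocks=20)
```

In this toy run the first block is built at zero prices and overshoots the target for the first resource, so its price rises until the marginal transaction is no longer worth including.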
Scenario 1: steady state behavior.
We consider a sequence of blocks. At each block, there are submitted transactions, with resource usage randomly drawn as and . (For example, these may be moderate compute and low storage transactions.) Transaction utility is drawn as . We initialize the price vector as and examine the steady state behavior, where the price updates and transaction producer behavior are defined as in the previous section. We use a learning rate throughout.
The resource utilization, shown in figure 1, suggests that our multidimensional scheme tracks the target utilizations more closely than a single-dimensional fee market. Figure 2 shows the squared deviation from the target resource utilizations. Furthermore, the number of transactions included per block is consistently higher under the multidimensional scheme, as illustrated in figure 3 (purple line).
Scenario 2: transactions distribution shift.
Often, the distribution of transaction types submitted to a blockchain network differs for a short period of time (e.g., during NFT mints). There may be a change in both the number of transactions and the distribution of resources required. We repeat the above simulation but add transactions in block ; each transaction has a resource vector . (For example, these transactions may have low computation but high storage requirements.) We draw the utility and begin the network at the steady-state prices from scenario 1.
In figure 4, we see that a multidimensional fee market gracefully handles the distribution shift. The network fully utilizes resource 2 for a short period of time before returning to steady state. Uniform pricing, on the other hand, adjusts its resource usage poorly and oscillates around the target. Figure 5 shows that, as a result, multidimensional pricing is able to include more transactions, both during the distribution shift and after the network returns to steady state. We see that the prices smoothly adjust accordingly.
Parallel transaction execution model.
Consider the scenario where the nodes have parallel execution environments (e.g., threads), each of which has its own set of identical resources. In addition, there are resources shared between the environments. We denote transactions run on thread by . The resource allocation problem becomes
As before, the Boolean vector sets and encode constraints such as resource limits for each parallel environment and the shared environment respectively. In addition, we’d expect to have if each transaction is only allocated to a single environment, which can be encoded by . By stacking the variables into one vector, this problem can be seen as a special case of (8) and can be solved with the same method presented in this work. (The interpretation here is that we are declaring a number of ‘combined resources’, each corresponding to the parallel execution environments along with their shared resources.)
Different price update speeds.
Some resources may be able to sustain burst capacities for much shorter periods of time than other resources. In practice, we may wish to increase the prices of these resources faster. For example, a storage opcode that generates a lot of memory allocations will quickly cause garbage collection overhead, which could slow down the network. As a result, we likely want to increase its price faster than the prices of basic arithmetic, even under the same relative utilization. To do this, we can update (15) to include a learning rate for each resource. We collect these in a diagonal matrix :
These learning rates can be chosen by system designers using simulations and historical data.
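A per-resource learning rate amounts to scaling each coordinate of the gradient step separately. The sketch below assumes the projected update with nonnegative prices and represents the diagonal matrix of rates by the list of its diagonal entries. The function name and all values are hypothetical.

```python
def update_with_rates(prices, target, usage, rates):
    """One projected step with a separate learning rate per resource.

    `rates` holds the diagonal entries of the assumed matrix of rates.
    """
    return [max(p - r * (t - u), 0.0)
            for p, t, u, r in zip(prices, target, usage, rates)]


prices = [1.0, 1.0]
target = [10.0, 10.0]
usage = [12.0, 12.0]    # both resources equally overutilized
rates = [0.05, 0.5]     # hypothetical: second resource reprices 10x faster

new_prices = update_with_rates(prices, target, usage, rates)
```

Although both resources are overutilized by the same amount, the price of the second resource rises ten times as far in a single step.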
Alternatively, we can define utilization on a per-contract basis instead of a per-resource basis (per-contract fees were recently proposed by the developers of Solana [aeyakovenko_consider_2021]). We define the utilization of a smart contract as , where is some weight vector and is the number of times contract is called. In matrix form, , where is the Hadamard (elementwise) product. For each contract, the utilization is when , which can be interpreted as not calling contract in a block. When , the utilization is . When we use per-contract utilizations, the loss function can capture a notion of fairness in resource allocation to contracts. For example, we may want to prioritize cheaper-to-execute contracts over more expensive ones by using, e.g., proportional fairness as in [kelly1997charging], though there are many other notions that may be useful. With this setup, the resource allocation problem is
Again, we can introduce the dual variable for the equality constraint, and, with a similar method to the one introduced in this paper, iteratively update this variable to find the optimal fees to charge for each smart contract call.
We constructed a framework for multidimensional resource pricing in blockchains. Using this framework, we modeled the network designer’s goal of maximizing transaction producer utility, minus the loss incurred by the network, as an optimization problem. We used tools from convex optimization, in particular duality theory, to decompose this problem into two simpler problems: one solved on chain by the network, and another solved off chain by the transaction producers. The prices that unify the competing objectives of minimizing network loss and maximizing transaction producer utility are precisely the dual variables in the optimization problem. Setting these prices correctly (i.e., to minimize the dual function) results in a solution to the original problem. We then demonstrated efficient methods for updating prices that are amenable to on-chain computation. Finally, we numerically illustrated the proposed pricing mechanism via a simple example. We find that it allows the network to equilibrate to its resource utilization target more quickly than the uniform price case, while offering greater throughput without increasing node hardware requirements.
To the best of the authors’ knowledge, this is the first work to systematically study optimal pricing of resources in blockchains in the many-asset setting. Future work and improvements to this model include a detailed game-theoretic analysis, extending that of [ferreira2021dynamic], along with a more concrete analysis of the dynamical behavior of fees set in this manner. Finally, a more thorough numerical evaluation of these methods under realistic conditions (such as testnets) will be necessary to see if these methods are feasible in production.
We would like to thank John Adler, Vitalik Buterin, Dev Ojha, Kshitij Kulkarni, Matheus Ferreira, Barnabé Monnot, and Dinesh Pinto for helpful conversations, insights, and edits. We’re especially grateful to John Adler for bearing with us through many drafts of this work and consistently providing valuable feedback.
Appendix A A (very short) primer on convexity
This appendix serves as a short introduction to the basic notions of convexity used in this paper for readers familiar with basic real analysis and linear algebra. For (much) more, we recommend [boyd2004convex].
A.1 Basic definitions
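We say a set $S \subseteq \mathbf{R}^n$ is convex if, for any two points $x, y \in S$ and any $0 \le \theta \le 1$, we have
$$\theta x + (1 - \theta) y \in S.$$
In other words, a set $S$ is convex if it contains all line segments between any two points in $S$. A classic example of a closed convex set is a closed halfspace
$$H = \{x \in \mathbf{R}^n \mid c^T x \le b\},$$
where $c \in \mathbf{R}^n$ and $b \in \mathbf{R}$. (An otherwise silly but useful example is the empty set, which vacuously meets the requirements.)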
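We say a function $f : D \to \mathbf{R} \cup \{+\infty\}$ over the extended reals is convex if $D \subseteq \mathbf{R}^n$ is convex and, for any $x, y \in D$ and $0 \le \theta \le 1$,
$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$
Equivalently: a function is convex if any chord of the function (a line segment between two points on its graph) lies above (or, strictly speaking, not below) the function itself. We say $f$ is concave if $-f$ is convex. Some basic functions that are convex are linear functions $f(x) = c^T x$ for some $c \in \mathbf{R}^n$, norms $f(x) = \|x\|$, and indicator functions of convex sets:
$$I(x) = \begin{cases} 0 & x \in S \\ +\infty & \text{otherwise}, \end{cases}$$
where $S \subseteq \mathbf{R}^n$ is a convex set.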
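Usually it is simpler to work with functions defined over all of $\mathbf{R}^n$ as opposed to just a subset. We may extend the functions $f : D \to \mathbf{R} \cup \{+\infty\}$ to a function $\tilde f : \mathbf{R}^n \to \mathbf{R} \cup \{+\infty\}$ by setting $\tilde f(x) = f(x)$ for $x \in D$ and $\tilde f(x) = +\infty$ if $x \notin D$. We write the effective domain of a function $f$ defined over $\mathbf{R}^n$ as
$$\mathbf{dom}\, f = \{x \in \mathbf{R}^n \mid f(x) < +\infty\}.$$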
Throughout the paper and the remainder of this appendix, we assume that all functions are extended in this way.
Characterizations of convexity.
There are many equivalent characterizations of convexity for functions. A particularly useful one, if the function $f$ is differentiable over its domain $D$, is, for any two points $x, y \in D$, we have:
$$f(y) \ge f(x) + \nabla f(x)^T (y - x).$$
In other words, any tangent plane to $f$ at $x$ is a global underestimator of the function. A common characterization for twice-differentiable functions is that the Hessian (the matrix of all second derivatives) of $f$ at every point is positive semidefinite, but this is often particularly restrictive. The gradient-based definition of convexity immediately implies that if we find a point $x$ with $\nabla f(x) = 0$, then
$$f(y) \ge f(x)$$
for any $y \in D$; i.e., $x$ is a global minimizer of $f$. (The converse is similarly easy to show.)
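The tangent-plane characterization is easy to check numerically on a simple example. The sketch below uses the hypothetical choice $f(x) = x^2$ on a grid of points and verifies that every tangent line lies below the function. For this $f$, the gap $f(y) - f(x) - f'(x)(y - x)$ equals $(y - x)^2$, so the inequality holds exactly.

```python
def f(x):
    """A simple convex function used for illustration."""
    return x * x


def grad_f(x):
    """Derivative of f."""
    return 2.0 * x


points = [i / 4 for i in range(-20, 21)]  # grid on [-5, 5]

# First-order condition: f(y) >= f(x) + f'(x) * (y - x) for all pairs.
tangent_is_underestimator = all(
    f(y) >= f(x) + grad_f(x) * (y - x) - 1e-12
    for x in points
    for y in points
)

# The stationary point x = 0 is therefore a global minimizer on the grid.
stationary_is_minimum = all(f(y) >= f(0.0) for y in points)
```

The second check is the consequence noted above: at a point where the gradient vanishes, the tangent plane is flat, so the function value there is a global lower bound.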
Consequences of convexity.
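There are a number of important consequences of convexity. The simplest is: given two closed convex sets $S, T \subseteq \mathbf{R}^n$ with $S \cap T = \emptyset$ then there exists a vector $c \ne 0$ and $b \in \mathbf{R}$ that separates these sets, i.e.,
$$c^T x \le b \le c^T y \quad \text{for all } x \in S,\ y \in T.$$
If one of the sets, say $S$, is also compact, then it is possible to make the stronger claim that there exists $c \ne 0$ and $b \in \mathbf{R}$ such that
$$c^T x < b < c^T y \quad \text{for all } x \in S,\ y \in T.$$
The arguments for each of these are relatively simple and can be found in [boyd2004convex, §2.5]. Nearly all major results in convex optimization theory are a consequence of these two facts.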
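As a small illustration of strict separation, consider two disjoint closed discs in the plane, a hypothetical example chosen because a separating hyperplane can be written down by inspection: the vertical line through the midpoint between the discs. The sketch below checks the strict inequalities on sampled boundary points only. Since a linear function attains its extremes over a disc on the boundary, this is suggestive for the whole disc, but the code itself verifies only the samples.

```python
import math


def on_circle(center, radius, k, count=16):
    """Return the k-th of `count` evenly spaced points on a circle."""
    angle = 2.0 * math.pi * k / count
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))


# Two disjoint unit discs centered at (0, 0) and (4, 0).
left = [on_circle((0.0, 0.0), 1.0, k) for k in range(16)]
right = [on_circle((4.0, 0.0), 1.0, k) for k in range(16)]

# Candidate separating hyperplane {x : c^T x = b}: the vertical line x_1 = 2.
c = (1.0, 0.0)
b = 2.0


def side(x):
    """Evaluate c^T x for a point x in the plane."""
    return c[0] * x[0] + c[1] * x[1]


strictly_separated = (all(side(x) < b for x in left)
                      and all(side(y) > b for y in right))
```

Here both sets are compact, so the strict version of the separation result applies and the gap between the two sides is bounded away from zero.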
Convexity-preserving operations on sets.
There are a number of operations which preserve convexity of sets. For example, any (potentially uncountable) intersection of convex sets is convex. The (finite) sum of convex sets $S, T \subseteq \mathbf{R}^n$, defined
$$S + T = \{x + y \mid x \in S,\ y \in T\},$$
is convex, while the negation of a convex set, $-S = \{-x \mid x \in S\}$, is also convex. Any linear function of a convex set, say $AS = \{Ax \mid x \in S\}$ with $A \in \mathbf{R}^{m \times n}$, is convex. In the special case that $S$ is also compact, then $AS$ is compact. (It is, on the other hand, not true in general that if $S$ is closed then $AS$ is closed.) All of these conditions can be easily verified from the definitions above. Additionally, there are other operations that preserve convexity, such as the perspective transform [boyd2004convex, §2.3.3], but we do not need those here.
Convexity-preserving operations on functions.
Similar to the case with convex sets, there are a number of convexity-preserving operations on convex functions. The sum of convex functions is convex, while any nonnegative scaling of a convex function is convex, i.e., is convex. Affine precomposition of convex functions is convex, i.e., if is convex over then is convex over , for any and . Convexity is preserved over suprema:
where is a family of