The rapidly growing number of mobile devices and the dramatic increase in real-time applications have driven interest in fresh data as measured by the age-of-information (AoI). Real-time applications in which fresh data is critical include real-time monitoring, data analytics, vehicular networks, and cloud computing frameworks. For example, real-time knowledge of traffic information and the speed of motor vehicles is crucial in autonomous driving and unmanned aerial vehicles. Another example is real-time mobile crowd-sensing (or mobile crowd-learning) applications, in which a platform is fueled by mobile users’ participatory contribution of real-time data. This class of examples includes real-time traffic congestion and accident information on Google Waze and real-time location information for scattered commodities and resources (e.g., GasBuddy).
Keeping data fresh relies on frequent data generation, processing, and sampling, which can lead to significant (sampling) costs for the data source. In practice, data sources (i.e., fresh data contributors) are self-interested in the sense that they may have their own interests different from those of data destinations (i.e., fresh data requestors). Consequently, the participation of sources relies on proper incentives from the destination. The resulting economic interactions between sources and destinations constitute fresh data markets, which have been studied in [5, 6, 2, 7].
The existing studies on fresh data markets [5, 6, 2, 7] designed incentives assuming complete information. A crucial economic challenge not addressed in these works is market information asymmetry. Specifically, sources in practice may have private (market) information (e.g., sampling cost and data freshness) that is unknown to others. Therefore, they may manipulate the outcome of the system (e.g., their subsidies and the scheduling policies) by misreporting such private information to their own advantage. To the best of our knowledge, no existing work has addressed fresh data markets with such asymmetric information. Motivated by the above issue, this work aims to answer the following key question:
How should a destination acquire fresh data with self-interested sources and market information asymmetry?
I-A Challenges and Solution Approach
Existing related studies on information asymmetry in data markets (without considering data freshness) have identified two different levels of possible manipulation [8, 9, 10, 11, 12, 13], depending on whether data is verifiable, i.e., whether the destination can verify the authenticity (or freshness) of data. These two levels of manipulation are:
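Misreporting. For verifiable data, a source may still misreport its private information (e.g., its sampling cost) to its own advantage.
Data fraud. For unverifiable data, a source may even fake the data itself, e.g., by sending dummy data to avoid incurring the corresponding costs.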
As a first step towards tackling a fresh data market with asymmetric information, this work focuses on the first type of manipulation due to misreporting private cost information and assumes verifiable fresh data. Even this level of misreporting is challenging and may lead to an arbitrarily bad loss, as we will analytically show in Section III-D.
In the economics literature, a standard approach for designing markets with asymmetric information is the optimal mechanism design approach of Myerson. These optimal mechanism design problems are linear and can potentially be reduced to computing a “posted price”, which is computationally efficient. Different from the standard setting, our fresh data market framework features a non-linear age-related cost. This nature of AoI requires new optimal mechanism designs and problem formulations. Once formulated, finding the optimal mechanisms may incur prohibitively expensive computational overhead, as it involves solving a nonlinear infinite-dimensional optimization problem due to the age-related cost.
To this end, we leverage the optimal mechanism design approach to optimize an AoI-related performance and address the following question:
How should a destination design a computationally efficient and optimal mechanism for acquiring fresh data?
We summarize our contributions as follows:
Fresh Data Market Modeling with Private Cost Information. We develop a new analytical model for a fresh data market with private cost information and allow multiple sources to strategically misreport this information. To the best of our knowledge, this is the first work in the AoI literature to address market information asymmetry.
Optimal Mechanism Design. Based on Myerson’s seminal work, we transform the optimal mechanism design problem into an infinite-dimensional nonlinear optimization problem. We then solve the problem and analytically derive the optimal solution.
Quantized Mechanism Design. To further reduce computational overheads, we design a quantized mechanism while maintaining the sources’ truthfulness. This achieves asymptotic optimality and enables one to make tradeoffs between optimality and computational overhead by tuning the quantization step size.
Our analytical and numerical results show that when the sampling cost is exponentially distributed, the performance gains of our optimal mechanism can be unbounded compared against a benchmark mechanism. In addition, the optimal mechanism is most beneficial when there are fewer sources with more heterogeneous sampling costs.
We organize the rest of this paper as follows. In Section II, we discuss some related work. In Section III, we describe the system model and the mechanism design problem formulation. In Sections IV and V, we develop the optimal mechanisms for single-source systems and multi-source systems, respectively. In Section VI, we develop the quantized mechanism. Section VII studies the optimal mechanism design under general virtual cost functions, which will be defined in Sections IV and V. We provide some analytical and numerical results in Section VIII to evaluate the performances of the optimal mechanism and the quantized mechanism, and we conclude the paper in Section IX.
II Related Work
Age-of-Information: The AoI metric has been introduced and analyzed in various contexts in recent years (e.g., [1, 14, 16, 17, 15, 18, 19, 20, 21]). Of particular relevance to this work are studies on the economics of fresh data and information [5, 6, 2, 7]. The studies most closely related to ours are [6, 2], which consider systems with destinations using dynamic pricing schemes to incentivize sensors to provide fresh updates. The sources in [6, 2] are myopic, whereas the sources in our setting are forward-looking, i.e., each source considers its long-term payoff. None of this prior work has considered the role of private market information as we do here.
Optimal Mechanism Design: There exists a rich economics literature on optimal mechanism design (e.g., [22, 23, 24, 25, 26, 27]). Our approach is based on Myerson’s characterization of incentive compatibility and optimal mechanism design. A closely related line of work is optimal procurement mechanism (also known as reverse auction) design (e.g., [23, 24, 25, 26, 27]), in which a buyer designs a mechanism for purchasing items from multiple suppliers and revealing their private quality information. However, existing mechanisms cannot be directly applied here due to differences in the problem setting induced by the age-related cost functions (e.g., linear programming in [23, 24, 25, 26] and combinatorial optimization elsewhere).
Approximately Optimal Mechanism Design: Another closely related direction is approximately optimal mechanism design (e.g., [33, 34, 35, 36, 37, 38] and surveys in [31, 32]). Approximate mechanisms have been proposed to deal with a wide range of practical issues such as bounded communication overheads (e.g., [37, 38, 39]), bounded computational overheads (e.g., [34, 35, 36]), and limited distributional knowledge. In particular, [38, 39] designed quantized mechanisms, which quantize the infinite-dimensional space of agents’ reporting strategies to reduce communication overheads. On the other hand, references [34, 35, 36] mainly proposed approximate mechanisms to reduce the computational overheads of combinatorial problems, which is not the case here. Our quantized mechanism differs from these mechanisms in that it aims at reducing the computational overheads due to the underlying nonlinear infinite-dimensional optimization.
Information Acquisition: There has been a recent line of work on viewing data as an economic good. A growing amount of attention has been placed on understanding the interactions between the strategic nature of data holders and the statistical inference and learning tasks that use data collected from these holders (e.g., [8, 9, 10, 11, 12, 13]). In this line of research, a data collector designs mechanisms with payments to incentivize data holders to reveal data, under private information. However, none of the studies in this line of research considered data freshness.
III System Model and Problem Formulation
III-A System Model
We consider an information update system in which a set of data sources (such as Internet-of-Things devices) generate data packets and send them to one destination.
III-A1 Data Updates and Scheduling
We consider a generate-at-will model (as in, e.g., [15, 19]), in which the sources are able to generate and send a new update when requested by the destination. We assume instant update arrivals at the destination, with negligible transmission delay.
The destination’s data acquisition policy consists of two decision sets, namely the update policy and the (source) scheduling policy. In particular, the update policy requested by the destination determines the sequence of times at which updates are requested, specified through the interarrival time between each pair of consecutive updates. The scheduling policy is a set of binary indicators specifying which source is selected to generate each update: the indicator for a given source and a given update equals one if that source is selected for that update and zero otherwise. For every update, the indicators must sum to one across the sources, i.e., at each update, exactly one source is to be selected.
Let denote the interarrival time between -th and -th updates generated by source . Mathematically,
where indicates that the -th update received by the destination is the -th update generated by source , i.e., and for all and .
Each source ’s data updates are subject to a maximal update frequency constraint (as in ), given by
where is the maximal allowed average update frequency for source , which could reflect constraints on the resources available to this source (e.g. CPU power).
The Age-of-Information (AoI) at any given time is defined as the time elapsed since the time stamp of the most recently received update before that time.
III-A3 Source’s Sampling Cost and Private Information
Each source incurs a unit sampling cost for each update, which is its private information. We consider a Bayesian setting in which each source’s sampling cost is drawn from a given interval according to a prior distribution, described by its cumulative distribution function (CDF) and its probability density function (PDF). We assume that the destination and the other sources know only the prior distribution of a source’s sampling cost, not its realized value. (Footnote 1: In the case where such distributional knowledge is unavailable, one can further consider prior-free approximately optimal mechanism design, which is left for future work.)
III-A4 Destination’s AoI Cost
We introduce an AoI cost function to represent the destination’s level of dissatisfaction for data staleness. We model it as a general non-negative and increasing function in . We can specify the AoI cost function based on applications. For instance, in online learning (advertisement placement and online web ranking [28, 29]), one can use with .
We further define the destination’s cumulative AoI cost as
which denotes the aggregate cost for an interarrival time . Note that is convex in since .
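The relation between the AoI cost and the cumulative AoI cost can be illustrated with a minimal sketch. It assumes a power-law AoI cost (age raised to an exponent, here called `kappa`), and every function name below is illustrative rather than taken from the paper. Under instant arrivals and equally spaced updates, the age ramps from zero up to the interarrival time, so the long-run average AoI cost is the cumulative cost divided by the interarrival time.

```python
def aoi_cost(age, kappa=1.0):
    """Instantaneous AoI cost, assumed here to be age**kappa."""
    return age ** kappa


def cumulative_aoi_cost(interval, kappa=1.0):
    """Closed form of the integral of age**kappa over [0, interval]."""
    return interval ** (kappa + 1) / (kappa + 1)


def cumulative_aoi_cost_numeric(interval, kappa=1.0, steps=10000):
    """Midpoint-rule approximation of the same integral, as a cross-check."""
    width = interval / steps
    return sum(aoi_cost((k + 0.5) * width, kappa) for k in range(steps)) * width


def average_aoi_cost(interval, kappa=1.0):
    """Long-run average AoI cost when updates arrive every `interval` time units."""
    return cumulative_aoi_cost(interval, kappa) / interval


if __name__ == "__main__":
    print(cumulative_aoi_cost(2.0), average_aoi_cost(2.0))
```

Because the instantaneous cost is increasing in the age, the cumulative cost is convex in the interarrival time, which is the property that the later analysis relies on.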
III-B Mechanism Design and Reporting Game
Fig. 1: Interaction between the destination and the sources. Stage I: the destination designs a mechanism and announces it to the sources. Stage II: each source submits its report (a potential misreport) of its sampling cost.
Fig. 1 depicts the interaction between the destination and the sources: the destination in Stage I designs an (economic) mechanism for acquiring each source’s report of its sampling cost and data updates; the sources in Stage II report their sampling costs , where denotes source ’s report. A mechanism takes the sources’ reports (potential misreports) of their sampling costs as the input of the data acquisition policy and for determining the monetary reward to each source. Mathematically, a general mechanism is a tuple of a payment rule , an update policy , and a scheduling policy . The prices (i.e., rewards) can be different across different updates and sources. That is, , where . The sets and are defined in Section III-A1. Policies , , and are functions of the sources’ reported costs .
III-B1 Reporting Game in Multi-Source Systems
When there are multiple sources (i.e., ), the mechanism induces a reporting game among the sources:
Game 1 (Reporting Game).
The reporting game is a tuple given by , defined as:
Players: the set of all sources ;
Strategy space: each source ’s reporting strategy is ;
Payoff: each source has a payoff function: (Footnote 2: The limit inferior in (6) implies that each source is concerned about the worst-case scenario of its payoff.)
The source’s payoff represents its long-run profit per-unit time. Note that, in related studies [6, 2], the considered sources are not forward-looking. Instead, they are assumed myopic, i.e., not maximizing their respective long-term objectives as in (6).
Since each source does not know the other sources’ exact sampling costs but only knows the corresponding prior distributions, the reporting game induces a Bayesian equilibrium, defined as follows:
Definition 1 (Bayesian Equilibrium).
The Bayesian equilibrium is the sources’ reporting profile such that, for all ,
In other words, the Bayesian equilibrium depicts a strategy profile where each player maximizes its expected payoff assuming the strategy of the other players is fixed.
The destination aims to design an optimal mechanism to minimize its expected (long-term time average) overall cost:
where is the Bayesian equilibrium defined in (7).
Each source may have an incentive to misreport its private information. However, according to the revelation principle, for any mechanism, there exists an incentive compatible (i.e., truthful) mechanism that achieves the same outcome. This allows us to replace the equilibrium reports in (8) by the true sampling costs, restrict our attention to incentive compatible mechanisms, and impose the following incentive compatibility constraint:
Furthermore, a mechanism should satisfy the following (interim) individual rationality (IR) constraint:
That is, each source should not receive a negative expected payoff; otherwise, it may choose not to participate in the mechanism.
III-B2 Single-Source System
We now discuss a special case where there is only one source. Hence, we can drop the index and there exists no game-theoretic interaction among sources. The incentive compatibility and the individual rationality constraints are then reduced to:
III-C Problem Formulation
The destination seeks to find a mechanism to minimize its overall cost:
Definition 2 (Equal-Spacing and Flat-Rate Mechanism).
A mechanism is equal-spacing and flat-rate if
for some functions and .
Definition 3 ((Randomized) Stationary Scheduling).
The scheduling policy is said to be stationary if, for all , we have that given any , is chosen randomly at each time and is independent and identically distributed (i.i.d) across and satisfies
for some functions satisfying .
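A randomized stationary scheduling policy can be sketched as repeated i.i.d. draws of a single source per update. The snippet below is only an illustration: the selection probabilities are passed in as a plain list (in the paper they are functions of the reported costs), and the helper name `draw_schedule` is hypothetical.

```python
import random


def draw_schedule(probabilities, num_updates, seed=0):
    """Draw one source per update, i.i.d. across updates.

    Returns a list of one-hot rows; row k has a 1 in the position of the
    source selected for update k and 0 elsewhere.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("selection probabilities must sum to one")
    rng = random.Random(seed)
    sources = list(range(len(probabilities)))
    schedule = []
    for _ in range(num_updates):
        chosen = rng.choices(sources, weights=probabilities, k=1)[0]
        schedule.append([1 if i == chosen else 0 for i in sources])
    return schedule


if __name__ == "__main__":
    rows = draw_schedule([0.7, 0.2, 0.1], num_updates=5)
    for row in rows:
        print(row)
```

Each row selects exactly one source, and over many updates the empirical selection frequencies approach the given probabilities.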
The stationary scheduling policies defined above are memoryless, in the sense that the scheduling decisions are independent across time. We now introduce the following lemma, which shows the existence of optimal mechanisms with these properties:
We present the proof of Lemma 1 in Appendix A. The proof involves showing that, for any optimal mechanism, we can always construct an equal-spacing and flat-rate mechanism with a stationary scheduling policy that yields at most the same objective value. This is mainly done by leveraging the convexity of the cumulative AoI cost. Lemma 1 allows us to restrict our attention to simple mechanisms so that we can now
drop the index in and ;
generate according to some i.i.d. distributions (across ) characterized by as in (15).
Therefore, we use where is the payment profile (i.e., ), and is the probability profile (i.e., ). It follows that, under an equal-spacing and flat-rate mechanism with a stationary scheduling policy, each source ’s payoff in (6) becomes:
and the destination’s overall cost in (8) becomes:
III-D Naive Mechanism
In this subsection, we introduce a naive mechanism that satisfies Definition 2 for single-source systems. We use this to show that such a mechanism can lead to an arbitrarily large cost for the destination when , .
Example 1 (Naive Mechanism).
Under the naive mechanism , the destination subsidizes the source’s reported cost; the update policy rule aims at minimizing its overall cost in (8), naively assuming the source’s report is truthful:
Solving (53) further gives
Given this naive mechanism, the source solves the following reporting problem:
whose solution can be shown to be the largest possible cost, i.e., the optimal reporting strategy is to report the maximal possible value. (Footnote 3: Note that it is not immediate that a source would always report its maximum value under such a mechanism. Though a larger reported value leads to a larger payment per update, it also leads to a larger inter-arrival time between updates.) The destination’s overall cost is then given by
Note that the ratio of the destination’s objectives in (21) under the source’s optimal report and the true cost is , which can be arbitrarily large as approaches infinity. Misreports leading to an arbitrarily large cost to the destination motivates the optimal mechanism design next.
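The over-reporting incentive can be checked numerically in a small sketch. It assumes a linear AoI cost (so the cumulative cost of an interval x is x²/2), a payment per update equal to the reported cost, and an update interval chosen to minimize the destination's average cost given the report, which gives the interval sqrt(2 · report). These closed forms follow from those assumptions and are not copied from the paper's equations; the function names are illustrative.

```python
import math


def naive_interval(reported_cost):
    """Interval minimizing x/2 + reported_cost/x (linear AoI cost assumed)."""
    return math.sqrt(2.0 * reported_cost)


def source_payoff(true_cost, reported_cost):
    """Source's profit per unit time: (payment - true cost) per update, times the rate."""
    return (reported_cost - true_cost) / naive_interval(reported_cost)


def destination_cost(reported_cost):
    """Destination's average AoI cost plus payment rate under the naive rule."""
    x = naive_interval(reported_cost)
    return x / 2.0 + reported_cost / x


def best_report(true_cost, max_cost, grid=1000):
    """Grid search over reports in (0, max_cost] for the payoff-maximizing one."""
    candidates = [max_cost * (k + 1) / grid for k in range(grid)]
    return max(candidates, key=lambda r: source_payoff(true_cost, r))


if __name__ == "__main__":
    true_cost, max_cost = 1.0, 10.0
    print(best_report(true_cost, max_cost))
    print(destination_cost(max_cost) / destination_cost(true_cost))
```

Under these assumptions the payoff is increasing in the report, so the source reports the maximal cost, and the destination's cost ratio equals the square root of the ratio between the maximal and true costs, which grows without bound as the maximal cost grows.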
IV Single-Source Optimal Mechanism Design
In this section, we start with a system with only one source. Therefore, we can drop the index in our notations. We use the results of Lemma 1 to reformulate (13) and characterize the IC and the IR constraints in (11) and (12). The optimal mechanism design problem is then reduced to an infinite-dimensional optimization problem, which we analytically solve and use to derive useful insights.
IV-A Problem Reformulation
Lemma 1 allows us to focus on equal-spacing and flat-rate mechanisms. The scheduling indicators satisfy , since only one source is present. The equal-spacing and flat-rate mechanism is then reduced to .
To further facilitate our analysis, we use to denote the update rate rule and to denote the payment rate rule such that
Since (22) defines an one-to-one mapping between and , we can focus on in the following and then derive the optimal based on the optimal .
IV-B Characterization of IC and IR
IV-B1 Incentive Compatibility
A mechanism is incentive compatible if and only if the following two conditions are satisfied:
is non-increasing in ;
has the following form:
for some constant (here, does not depend on but may depend on .)
IV-B2 Individual Rationality
Given an arbitrary incentive compatible mechanism satisfying (23), to further satisfy the IR constraint in (10), the minimal constant in (23) for the incentive compatible mechanism in Theorem 1 is
We will assume that this choice of the constant is used in the following.
We present an example in Fig. 2 to illustrate (23) and (24). Under a non-increasing and satisfying (23) and (24), a truthfully reporting source receives a payoff of , as shown in Fig. 2 (a); when the source reports , its payoff is . As shown in Fig. 2 (b), such an over-report incurs a payoff loss. Similarly, an under-report would also incur a payoff loss. These demonstrate incentive compatibility. In addition, a truthfully reporting source’s payoff is always non-negative for any and approaches when approaches , as . This demonstrates individual rationality.
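The incentive compatibility argument can be checked numerically. The sketch below assumes the standard Myerson-style payment rule for procurement, in which the payment rate for a report equals the report times the update rate plus the integral of the update rate from the report to the maximal cost, with the additive constant set to zero. The particular update rate rule `1 / (1 + cost)`, the bound `C_MAX`, and all function names are illustrative choices, not the paper's.

```python
C_MAX = 4.0  # assumed upper bound of the cost range


def update_rate(cost):
    """An arbitrary non-increasing update rate rule, chosen for illustration."""
    return 1.0 / (1.0 + cost)


def integral_of_rate(lower, upper=C_MAX, steps=2000):
    """Trapezoidal approximation of the integral of update_rate over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (update_rate(lower) + update_rate(upper))
    for k in range(1, steps):
        total += update_rate(lower + k * h)
    return total * h


def payment_rate(report):
    """Payment rate under the assumed rule: report * rate + remaining area under the rate."""
    return report * update_rate(report) + integral_of_rate(report)


def payoff(true_cost, report):
    """Source's payoff rate when its true cost is `true_cost` and it reports `report`."""
    return payment_rate(report) - true_cost * update_rate(report)


if __name__ == "__main__":
    reports = [k / 10 for k in range(41)]
    best = max(reports, key=lambda r: payoff(1.5, r))
    print(best, payoff(1.5, best))
```

On this grid the payoff-maximizing report coincides with the true cost, the truthful payoff is the area under the update rate curve to the right of the true cost (hence non-negative), and it vanishes at the maximal cost.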
IV-C Mechanism Optimization Problem
Based on (24) and (23), we can focus on optimizing the update rate function only in the following. By the constraint , it follows that . Therefore, the update rate function lives in the Hilbert space associated to the measure of , i.e. the CDF .
This is a functional optimization problem. To derive insightful results, we first relax the constraint in (25b) and then show when such a relaxation in fact leads to a feasible solution (i.e., when it automatically satisfies (25b)).
We introduce the definition of the source’s virtual cost, analogous to the standard definition of virtual value:
Definition 4 (Virtual Cost).
The source’s virtual cost is
The virtual cost allows us to transform the destination’s problem as in the following lemma:
The objective in (25a) can be rewritten as
We prove Lemma 2 in Appendix C, which involves changing the order of integration. If we relax the constraint , Lemma 3 makes the problem in (25) decomposable across every . Each subproblem is given by
which can be solved separately. We are now ready to introduce the solution to problem (25):
We present the proof of Theorem 2 in Appendix D. To comprehend the above results, the optimal solution solves each subproblem by the following two steps: (i) search for a that equalizes the marginal AoI cost reduction and the virtual cost for every ; (ii) project every onto the feasible set .
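The two steps above can be sketched under explicit assumptions. Suppose each subproblem chooses an update rate r to minimize r · F(1/r) + r · φ, where F is the cumulative AoI cost and φ is the virtual cost; this form is an assumption made for illustration. Writing x = 1/r, the first-order condition is x · f(x) − F(x) = φ, where f is the AoI cost. For a power-law cost with exponent `kappa`, the left-hand side equals kappa/(kappa+1) · x^(kappa+1). The snippet solves this condition by bisection and then projects the rate onto the interval from zero to the maximal update frequency. All names are hypothetical.

```python
def marginal_gap(interval, kappa=1.0):
    """x*f(x) - F(x) for the assumed power-law AoI cost f(age) = age**kappa."""
    return kappa / (kappa + 1.0) * interval ** (kappa + 1.0)


def solve_interval(virtual_cost, kappa=1.0, tol=1e-10):
    """Bisection for the interval x with marginal_gap(x) == virtual_cost."""
    lo, hi = 0.0, 1.0
    while marginal_gap(hi, kappa) < virtual_cost:
        hi *= 2.0  # expand the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_gap(mid, kappa) < virtual_cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_update_rate(virtual_cost, max_rate, kappa=1.0):
    """Step (i): solve the first-order condition; step (ii): cap at max_rate."""
    interval = solve_interval(virtual_cost, kappa)
    return min(1.0 / interval, max_rate)


if __name__ == "__main__":
    print(optimal_update_rate(2.0, max_rate=10.0))
    print(optimal_update_rate(2.0, max_rate=0.25))
```

With a linear cost and a virtual cost of 2, the unconstrained interval is 2 and the rate is 0.5; a tighter frequency cap simply truncates the rate. A larger virtual cost yields a longer interval, which is why a non-decreasing virtual cost produces a non-increasing update rate.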
To see when (29) yields a feasible solution satisfying (25b), note that (30) always admits a unique positive solution, and so the optimal update rate for each cost in (29) is well defined. In addition, if the virtual cost is non-decreasing in the cost, the resulting update rate is non-increasing in the cost. (Footnote 4: The condition of the virtual cost being non-decreasing is known as the regularity condition.)
A non-decreasing virtual cost is in fact satisfied for a wide range of distributions of the source’s sampling cost. Fig. 3 illustrates an example of the optimal mechanism when the source’s sampling cost follows a uniform distribution. We will focus on specific distributions in Section VIII and generalize Theorem 2 to the more general (potentially not monotonic) virtual cost case in Section VII.
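The regularity condition can be checked numerically for specific priors. The sketch below assumes the standard procurement form of the virtual cost, namely the cost plus the ratio of the CDF to the PDF at that cost; the helper names, the upper bound of 4, and the exponential rate of 1 are illustrative choices.

```python
import math

C_MAX = 4.0  # assumed upper bound of the cost range


def uniform_cdf(c):
    return c / C_MAX


def uniform_pdf(c):
    return 1.0 / C_MAX


def trunc_exp_cdf(c, lam=1.0):
    return (1.0 - math.exp(-lam * c)) / (1.0 - math.exp(-lam * C_MAX))


def trunc_exp_pdf(c, lam=1.0):
    return lam * math.exp(-lam * c) / (1.0 - math.exp(-lam * C_MAX))


def virtual_cost(c, cdf, pdf):
    """Assumed procurement virtual cost: cost plus CDF/PDF."""
    return c + cdf(c) / pdf(c)


def is_regular(cdf, pdf, points=400):
    """Check on a grid that the virtual cost is non-decreasing in the cost."""
    grid = [C_MAX * k / points for k in range(points + 1)]
    values = [virtual_cost(c, cdf, pdf) for c in grid]
    return all(b >= a - 1e-12 for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(virtual_cost(1.5, uniform_cdf, uniform_pdf))
    print(is_regular(uniform_cdf, uniform_pdf), is_regular(trunc_exp_cdf, trunc_exp_pdf))
```

For the uniform prior starting at zero, the virtual cost is simply twice the cost, and both priors pass the grid check under these assumptions.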
IV-D Differences from Classical Settings and Computational Complexity
We highlight a key difference between the optimal mechanism satisfying Theorem 2 and existing optimal mechanisms in classical economic settings, in which the sellers’ problems can be formulated as infinite-dimensional linear programs. In such classical settings, the optimal mechanism has been shown to be a posted price mechanism, i.e., the optimal mechanism determines a posted price (equal to the virtual cost in our case). If the source’s cost is less than the posted price, then it is assigned the maximum update rate with its payment equal to the price; otherwise the source is assigned no update.
Our problem in (25) differs from the classical settings in the nonlinearity introduced by the AoI cost, which brings the issue of computational complexity. As shown in Fig. 3, the computation of the optimal payment rate requires solving (29) over an entire interval of costs, which may be computationally impractical. (Footnote 5: This is not the case in the aforementioned existing optimal mechanism for classical settings, since the integral is reduced to the virtual cost, which is computationally efficient.) This motivates us to consider a computationally efficient approximation of the optimal mechanism in Section VI.
V Multi-Source Optimal Mechanism
In this section, we extend our results in Section IV to multi-source systems. The additional challenge here is that the optimal mechanism needs to take the sources’ interactions with each other into account. Similar to the single-source case, we first characterize the IC and the IR constraints, and then solve the infinite-dimensional optimization problem.
V-A Problem Reformulation
Lemma 1 allows us to focus on equal-spacing and flat-rate mechanisms with stationary scheduling policies (i.e., ). To further facilitate our analysis, we use to denote the update rate rule and to denote the payment rate rule such that, for all and all :
The above equations (31) define a one-to-one mapping between the two representations of the mechanism. Hence, we can restrict our attention to the update rate and payment rate rules in the following and then derive the optimal mechanism from them. We can then generate the corresponding stationary scheduling policy satisfying (31b) based on the optimal update rates.
V-B Characterization of Incentive Compatibility and Individual Rationality
V-B1 Incentive Compatibility
A mechanism is incentive compatible if and only if the following two conditions are satisfied:
is non-increasing in ;
has the following form:
for some constant for all .
V-B2 Individual Rationality
V-C Mechanism Optimization Problem
Based on (32) and (33), we can focus on optimizing the update rate function only in what follows, which live in the Hilbert space associated to the measure of . We introduce the definition of the source ’s virtual cost.
Definition 5 (Virtual Cost).
The source ’s virtual cost is
The sources’ virtual costs enable the problem to be transformed as in the following lemma:
The problem in (35) has a vector-valued function as its optimization decision. To derive insightful results, we first omit the monotonicity constraints for all sources, similar to our approach in Section IV. Such constraints are automatically satisfied assuming the virtual costs are non-decreasing, as we will show later. We will extend our results to the case of general virtual costs in Section VII.
To solve the problem in (35), we next introduce the aggregate update rate satisfying
and the following definition:
Definition 6 (Aggregate Virtual Cost).
Let be the aggregate virtual cost function, defined as
The definition of the aggregate virtual cost in Definition 6 involves solving a linear programming problem parameterized by . The intuition of solving (37) is as follows. Given each and , we assign the sources with higher virtual costs only after the sources with lower virtual costs are fully utilized (i.e., the constraints in (37b) are binding). It is readily verified that, given , is a piece-wise linear function in , and its differential is a step function in . We now introduce the following result to further transform the destination’s problem:
If is non-decreasing for all , the destination’s problem in (35) leads to the same minimal objective value as the following problem:
We present the proof of Lemma 4 in Appendix E. Lemma 4 transforms the vector functional optimization problem in (35) into a scalar functional optimization problem in (38). Therefore, after obtaining the optimal solution , we can then solve the problem in (37) to obtain the original solution to the problem in (35).
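The linear program behind the aggregate virtual cost can be sketched as a greedy fill. The snippet assumes the program minimizes the sum of virtual cost times update rate over sources, subject to the rates summing to a given aggregate rate and each rate lying between zero and that source's maximal update frequency. This reading of (37) and all names are assumptions for illustration.

```python
def aggregate_virtual_cost(total_rate, virtual_costs, max_rates):
    """Greedy solution of the assumed linear program.

    Sources are filled in increasing order of virtual cost; a source with a
    higher virtual cost is used only after cheaper ones hit their caps.
    Returns the minimal cost and the per-source allocation.
    """
    order = sorted(range(len(virtual_costs)), key=lambda i: virtual_costs[i])
    allocation = [0.0] * len(virtual_costs)
    remaining = total_rate
    cost = 0.0
    for i in order:
        share = min(remaining, max_rates[i])
        allocation[i] = share
        cost += virtual_costs[i] * share
        remaining -= share
    if remaining > 1e-12:
        raise ValueError("aggregate rate exceeds the total capacity of the sources")
    return cost, allocation


if __name__ == "__main__":
    cost, allocation = aggregate_virtual_cost(1.2, [3.0, 1.0, 2.0], [1.0, 0.5, 1.0])
    print(cost, allocation)
```

As the aggregate rate increases, the cost grows linearly with slope equal to the virtual cost of the source currently being filled, which gives the piecewise-linear shape and the step-function derivative described above.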
We observe that the problem in (38) now becomes similar to the problem in Lemma 3 in the single-source case, with the following difference: is not differentiable in . Hence, it follows that the optimality condition of (38) can be rewritten as
i.e., the marginal AoI cost reduction is equal to a subgradient of the aggregate virtual cost.
To understand (39), we first introduce the order indexing such that
i.e., source has the -th smallest virtual cost . We present an illustrative example of (39) in Fig. 4 for a given . As mentioned, the differential of the aggregate virtual cost in corresponds to a step function, as shown in Fig. 4. The intersection point between the subgradient and the curve of the marginal AoI cost reduction corresponds to the solution to (39).
We present the proof of Theorem 4 in Appendix F. Intuitively, after obtaining the optimal aggregate update rate, the problem is reduced to solving (37). That is, we utilize the least expensive (in terms of virtual cost) sources first, and the sources with high virtual costs are assigned update rates of zero. Each source’s allocated update rate is then the residual aggregate update rate (the aggregate update rate minus the update rates assigned to the less expensive sources) projected onto its feasible set.
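The residual-and-project allocation can be sketched for a concrete case. The snippet assumes a linear AoI cost, for which the aggregate rate at which the marginal AoI cost reduction equals a virtual cost φ is 1/sqrt(2φ), and it assumes strictly positive virtual costs. Sources are visited in increasing order of virtual cost, and each receives the residual of its own target rate after subtracting what cheaper sources already received, clipped to its feasible range. This is an illustration of the idea, not the paper's exact expression in (41).

```python
import math


def target_rate(virtual_cost):
    """Aggregate rate at which the marginal AoI gain equals `virtual_cost`
    (linear AoI cost assumed; virtual_cost must be positive)."""
    return 1.0 / math.sqrt(2.0 * virtual_cost)


def allocate(virtual_costs, max_rates):
    """Per-source update rates via residual projection, cheapest source first."""
    order = sorted(range(len(virtual_costs)), key=lambda i: virtual_costs[i])
    rates = [0.0] * len(virtual_costs)
    assigned = 0.0  # total rate already given to cheaper sources
    for i in order:
        residual = target_rate(virtual_costs[i]) - assigned
        rates[i] = min(max(residual, 0.0), max_rates[i])
        assigned += rates[i]
    return rates


if __name__ == "__main__":
    print(allocate([0.5, 2.0, 8.0], [0.3, 0.4, 1.0]))
```

In this example the cheapest source is capped at its maximal frequency, the second source covers the remaining rate up to its own target, and the most expensive source receives no updates.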
VI Quantized Optimal Mechanism
We note that the optimal mechanisms (for both single-source systems and multi-source systems) may be computationally impractical, since computing the optimal payment rate in (23) and (32) requires explicitly solving (29) and (41) for all possible cost values.
Therefore, we are motivated to design a computationally efficient quantized mechanism, which is approximately optimal while maintaining the optimal mechanism’s economic properties.
VI-A Quantized Mechanism
VI-A1 Quantized mechanism description
Let be sources’ quantized reporting profile such that
where depicts the floor operator and is the quantization step size. Based on (42), we introduce the quantized mechanism in the following:
Definition 7 (Quantized Mechanism).
The quantized mechanism is
VI-A2 Computational complexity
We illustrate an example quantized mechanism in Fig. 5. As in Fig. 5, the integral in (43b) is a Riemann sum, i.e., a computation-efficient finite sum approximation of . Specifically, given a quantization step size , computing the Riemann sum in (43b) requires one to compute for at most points for each source , where . Therefore, the overall computational overhead is given by .
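The finite-sum idea can be sketched as follows. The snippet assumes that a report is rounded down to a multiple of the step size and that the integral of the update rate in the payment rule is replaced by a left Riemann sum on the resulting grid; both are one possible reading of (42) and (43b), and the names `quantize` and `riemann_integral` are hypothetical.

```python
import math


def quantize(report, step):
    """Round a report down to the nearest multiple of the quantization step."""
    return math.floor(report / step) * step


def riemann_integral(rate_fn, report, max_cost, step):
    """Left Riemann sum of rate_fn from the quantized report up to max_cost.

    Returns the sum and the number of rate evaluations it needed.
    """
    start = quantize(report, step)
    num_points = math.ceil((max_cost - start) / step)
    total = 0.0
    for k in range(num_points):
        left = start + k * step
        width = min(step, max_cost - left)  # last cell may be shorter
        total += rate_fn(left) * width
    return total, num_points


if __name__ == "__main__":
    rate = lambda c: 1.0 / (1.0 + c)  # illustrative non-increasing rate rule
    approx, evaluations = riemann_integral(rate, report=1.3, max_cost=4.0, step=0.5)
    exact = math.log(5.0 / 2.0)  # integral of 1/(1+c) over [1, 4]
    print(approx, exact, evaluations)
```

The number of rate evaluations is at most the cost range divided by the step size, rounded up, and for a non-increasing rate the left sum overestimates the integral by at most the step size times the drop in the rate over the range. A smaller step therefore trades more computation for a tighter approximation.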
VI-B Properties of the Quantized Mechanism
In this subsection, we study the properties of the quantized mechanism in (43). We note that remains non-increasing in for all . Hence, based on the characterizations of the IC and the IR in Theorem 3 and (33), we have:
We next study the performance of the quantized mechanism in (43) in terms of the destination’s overall cost. To understand how well the quantized mechanism in (43) approximates the optimal mechanism, we derive the following lemma:
The aggregate virtual cost function in Definition 6 is differentiable in and satisfies
where is the Lipschitz constant of .
Lemma 5 characterizes an upper bound of the incremental changes of the aggregate virtual cost in , based on which we can next show that the quantized mechanism is approximately optimal:
The quantized mechanism leads to a bounded quantization loss compared to the optimal mechanism, given by
VI-C Numerical Studies
We provide numerical results in Fig. 6 to understand the impact of the number of quantization intervals on the quantization loss in a single-source system. We consider two classes of distributions of the source’s sampling cost, namely the uniform distribution and the truncated exponential distribution. (Footnote 6: These two distributions of costs are also considered in the literature.)
VI-C1 Uniform Distribution
We first consider the uniform distribution over the interval , the corresponding Lipschitz constant to which is . Fig. 6(a) shows that, when there are at least three quantization intervals, the quantization only incurs negligible loss (less than of the minimal overall cost), which verifies Proposition 1.
VI-C2 Truncated Exponential Distribution
We consider the truncated exponential distribution over the interval with a PDF:
The corresponding Lipschitz constant to which is