I Introduction
I-A Background and Motivations
By connecting objects, physical devices, vehicles, animals and other items to the Internet, the Internet of Things (IoT) has emerged as a new paradigm to enable ubiquitous and pervasive communications [1, 2, 3]. Wireless sensing is one of the fundamental applications of IoT, enabling systems and users to continuously monitor the ambient environment [4].
One of the major hurdles in implementing wireless sensing applications is the limited lifetime of traditional battery-powered sensors, which are costly and hard to maintain [5, 6]. For example, frequent recharging or battery replacement is inconvenient in deserts or remote areas, and is even impossible in some scenarios, such as toxic environments or implanted medical applications [7]. To tackle this problem, radio frequency energy harvesting (RFEH) has recently been proposed as an attractive technology to prolong the operational lifetime of sensors, enhance the deployment flexibility, and reduce the maintenance costs [7, 8].
In this paper, we consider a radio frequency energy harvesting based IoT system consisting of a data access point (DAP) and several energy access points (EAPs). The DAP collects information from its associated sensors. EAPs can provide wireless charging services to sensors via the RF energy transfer technique. The sensors are assumed to have no embedded energy supply, but they can harvest energy from radio frequency (RF) signals radiated by the surrounding EAPs to transmit the data to the DAP [9].
Several studies have considered deploying dedicated EAPs in existing cellular networks, such that the upgraded network can provide both wireless access and wireless charging services [10, 11, 12, 13, 14, 15, 16, 17]. However, these studies assumed that the EAPs are deployed by the operator of the existing network. In practice, the DAP and EAPs may be operated by different operators. (Footnote 1: This could happen when a resource-limited operator cannot provide an RF energy charging service in a certain area due to a limited budget, a lack of site locations, or a lack of licensed spectrum for energy harvesting. It therefore has to resort to third-party operators.) To motivate these third-party, self-interested EAPs to help charge the sensors, effective incentive mechanisms are required to improve the payoff of the DAP as well as those of the EAPs. While several initial works have designed incentive mechanisms [18, 19, 20] for EAPs belonging to different operators, these schemes assumed complete information. Specifically, it was assumed that the EAPs truthfully report their private information to the DAP, e.g., their energy costs and the channel gains between the EAPs and sensors. This holds when there exists a supervising entity in the network that is capable of monitoring and sharing all behaviours and network conditions of the DAP and EAPs to ensure that they always report truthful information. However, without such a supervising entity, the EAPs’ private information might not be known to the DAP, which is normally called information asymmetry in the literature [21]. A rational EAP may maliciously provide misleading information and pretend to be an EAP with a better channel condition and/or a higher energy cost to cheat for more rewards. Such an EAP can succeed in cheating because of the information asymmetry in the RF energy trading process.
To address the above issues, in this paper we design effective incentive mechanisms to maximize the utilities of the DAP and EAPs under asymmetric information. To this end, the following important questions should be addressed:
Which EAPs should the DAP hire, how much energy should be requested from the hired EAPs, and how much reward should be given to the hired EAPs?
These questions are nontrivial to answer because the hierarchical interactions between multiple parties should be modeled and analyzed: the cooperation among the DAP, the DAP’s sensors and the EAPs, and the competition among EAPs with heterogeneous private information. Moreover, information asymmetry makes the problem even more challenging, because it is difficult for the DAP to hire effective EAPs without knowing their private information, such as their energy costs and channel conditions towards its sensors.
I-B Solution and Contribution
To answer the above questions, we apply well-established economic theories to model the conflicting interests among the multiple parties in the considered RFEH-based IoT system. Specifically, we first extend the existing Stackelberg game-based approach with complete information to the considered case with asymmetric information, such that we can evaluate the performance degradation that information asymmetry causes to this approach. More specifically, due to the lack of complete information, the expected utility function of the DAP is defined and optimized in the Stackelberg game with asymmetric information. Considering that contract theory is a powerful tool originating from economics to deal with information asymmetry in a monopoly market, we apply contract theory to develop an optimal contract that effectively motivates the EAPs under asymmetric information. In our contract, the RF energy trading market is analogous to a monopoly labor market in economics. The DAP is modeled as the employer who offers a contract to each EAP. The contract is composed of a series of contract items, which are energy-reward pairs. Each contract item is an agreement on how much reward an EAP will get by contributing a certain amount of RF energy. Heterogeneous EAPs are classified into different types according to their energy costs and instantaneous channel conditions. The EAPs are regarded as laborers in the market, each of which chooses the contract item that best meets its interests. By properly designing the contract, an EAP’s type is revealed through its selection. Thus the DAP can capture each EAP’s private information to a certain extent and thereby relieve the issue of information asymmetry.
To the best of our knowledge, this is the first paper that systematically studies the RFEH-based IoT system under asymmetric information. The main contributions of this paper are summarized as follows.

We develop the framework of RF energy trading in the RFEH-based IoT system and systematically design the incentive mechanisms for a practical scenario with asymmetric information.

To explore the performance degradation due to the lack of full information, we first extend the existing Stackelberg game-based approach to the considered case, in which the instantaneous channel conditions and energy costs of the EAPs are unknown, by optimizing the expected utility function of the DAP. As contract theory is a powerful economic theory for designing incentive mechanisms under asymmetric information, we then reformulate the problem by using contract theory. In our contract design, we characterize the necessary and sufficient conditions for contract feasibility, i.e., the individual rationality (IR) conditions and incentive compatibility (IC) conditions [21]. Subject to the IR and IC constraints, the optimal contract under information asymmetry is obtained by maximizing the DAP’s expected utility as well as the social welfare.

To compare the performance of the proposed contract theory-based approach using asymmetric information with that of the existing Stackelberg game-based method with complete information, we generalize the existing Stackelberg game formulation with unified pricing to the case with discriminative pricing and derive the new Stackelberg equilibrium in closed form. Here discriminative pricing means that we set different energy prices for different EAPs to fully exploit their potential for the energy charging service. Numerical simulation results show that information asymmetry can lead to severe performance degradation for the Stackelberg game-based framework, while the proposed contract theory-based scheme using asymmetric information outperforms the Stackelberg game-based approach with complete information. This implies that the performance of the considered system depends more on the market structure (i.e., whether the EAPs are allowed to optimize their received power at the IoT devices with full freedom or not) than on the information availability (i.e., complete or asymmetric information).
Note that part of the work was presented in our previous conference paper [22]. In this journal version, we extend our previous work by considering both scenarios of complete information and asymmetric information and explore the impacts of information availability and market structure.
The rest of this paper is organized as follows. In Section II, we review the related literature. The system model is presented in Section III. The incentive mechanisms under asymmetric information are proposed in Section IV. The benchmark schemes are elaborated in Section V. Numerical results are presented in Section VI, and conclusions are drawn in Section VII.
II Related Work
II-A EAP-Assisted Wireless Energy Harvesting
The idea of deploying a dedicated wireless energy network that can provide wireless charging services to terminals by using RFEH technology was originally proposed by Huang et al. [10, 11]. The dedicated power transmitters are called power beacons or EAPs. Using stochastic geometry, the tradeoff between the densities of the base stations and EAPs was analyzed in [10]. Many works have exploited EAPs to enable both wireless information and energy access services in existing cellular networks [12, 13, 14, 15, 16, 17]. Stochastic geometry was used to analyze the network performance with the EAPs in [12, 13, 14]. In [15], beamforming was introduced in the EAP-assisted cellular network to reduce the interference resulting from the EAPs. Leveraging finite-length information theory, the system performance in the finite blocklength regime was analyzed in [16]. The security issue with an EAP in the presence of a passive eavesdropper was investigated in [17]. In all the above works, the DAP and EAPs are assumed to belong to the same operator. In such a network, the devices belonging to the same operator with extra energy were assumed to voluntarily assist other devices. However, in practice, the DAP and EAPs may be operated by different operators. To successfully motivate self-interested EAPs to provide help, effective incentive mechanisms are required. There are several prior works on designing incentive mechanisms [18, 19, 20] for the EAPs, where [18, 19] adopted the Stackelberg game and [20] used an auction. However, the existing incentive mechanisms only considered the complete information scenario.
II-B Contract Theory
Contract theory has been employed to address incentive design problems in wireless communications, such as mobile edge computing [23], device-to-device (D2D) communications [24] and cooperative spectrum sharing [25]. To the best of our knowledge, we are the first to apply contract theory to the RF energy trading process in RFEH-based IoT systems. Designing the incentive mechanism in such a scenario is challenging because the DAP needs to choose and reward the most efficient EAPs without knowing their channel conditions and energy costs.
II-C Stackelberg Game
The Stackelberg game has been widely used in wireless communications to model the interactions of strategic parties, in areas such as physical layer security [26], resource management for LTE-unlicensed [27], cognitive radio [28] and wireless energy harvesting [19, 18, 29, 30, 31]. In [29], the authors considered cooperative spectrum sharing with one primary user (PU) and one secondary user (SU), which harvests energy from ambient radio signals. The Stackelberg game was used to design the SU’s optimal cooperation strategy. In [30], simultaneous wireless information and power transfer (SWIPT) in relay interference channels was considered, where multiple source-destination pairs communicate through their dedicated energy harvesting relays. The optimal power splitting ratios for all relays were derived from the formulated Stackelberg game. In [31], the authors formulated a stochastic Stackelberg game to study the delay-optimal power allocation scheme. A recent paper addressed EAP-assisted wireless energy harvesting by using a Stackelberg game [19], but its system settings are different from those in our work: only one EAP with multiple antennas was considered, and the EAP acts as the seller while the base station (BS) acts as the buyer on behalf of its sensors. A more relevant work is [18], where an incentive mechanism was designed for a system with a similar setup, in which monetary rewards with unified pricing were provided by the DAP to motivate third-party EAPs to assist the charging process. Here unified pricing means that the prices per unit energy for different EAPs are the same. In contrast, in this paper we consider a discriminative pricing scheme for the Stackelberg game with heterogeneous EAPs, which includes the unified pricing scheme as a special case. Here discriminative pricing means that we set different energy prices for different EAPs to fully exploit their potential for the energy charging service.
Moreover, we extend the Stackelberg game to the asymmetric information scenario by optimizing the expected utility function of the DAP, instead of optimizing the instantaneous utility function of the DAP as in the classical Stackelberg game.
III System Model
Consider a wireless energy harvesting-based IoT system consisting of one DAP and $N$ EAPs belonging to different operators, which are connected to constant power supplies and connected to the server by backhauls, as shown in Fig. 1. The DAP is responsible for collecting various data from several wireless-powered sensors within its serving region. Without embedded energy supplies, the wireless-powered sensors fully rely on the energy harvested from the RF signals emitted by the EAPs to transmit their information to the DAP. For simplicity, we consider that the RF energy transfer and information transmission are performed over orthogonal bandwidths. For analytical tractability, time division-based transmission among sensors is adopted, i.e., there is only one active sensor during each transmission block. Hereafter, we refer to this active sensor as the information source. Besides, all the nodes in the system are assumed to be equipped with a single antenna and operate in the half-duplex mode.
We consider that the energy-carrying signals sent by the EAPs are independent and identically distributed (i.i.d.) random variables with zero mean and unit variance. Note that no coordination between the EAPs is needed since independent signals are transmitted. All channels are assumed to experience independent slow and flat fading, where the channel gains remain constant during each transmission block and change independently from one block to another. (Footnote 2: Pilots are broadcast by the active sensor to allow the DAP and EAPs to estimate the channels. The DAP is thus aware of the channel gain from the DAP to the sensor, and each EAP is aware of the channel gain from itself to the sensor. However, the DAP is generally not aware of the channel gains from the EAPs to the active sensor. The energy consumption of channel estimation is ignored.) The information source rectifies the RF signals received from the EAPs and uses the harvested energy to transmit its information. The time duration of every transmission block is normalized to one, so we use “energy” and “power” interchangeably hereafter. The amount of energy harvested by the information source during one transmission block can be expressed as
$$E_s = \eta \sum_{m=1}^{N} p_m G_{m,s}, \tag{1}$$
where $N$ is the number of EAPs, $\eta$ is the energy harvesting efficiency, $p_m$ is the charging power of the $m$th EAP, and $G_{m,s}$ is the channel power gain between the $m$th EAP and the information source. Note that the noise is ignored in (1) since it is practically negligible at the energy receiver.
The harvest-use protocol is considered in this paper [32]. More specifically, the information source uses the harvested energy to perform instantaneous information transmission to the DAP. We consider a battery-free design, in which the sensor only has a storage device such as a supercapacitor to hold the harvested energy for a short period of time, e.g., during its scheduled transmission block. Hence the sensor exhausts all the harvested energy in each transmission block, so its energy storage device is empty at the beginning of each transmission block. This battery-free design can reduce the complexity and costs of the sensors, which is particularly suitable for the considered IoT sensing applications and has been adopted by other applications [33, 34]. The transmit power of the information source is thus given by
$$P_s = E_s = \eta \sum_{m=1}^{N} p_m G_{m,s}. \tag{2}$$
Then, the received signal-to-noise ratio (SNR) at the DAP is given by
$$\mathrm{SNR} = \frac{P_s G_{s,a}}{N_0}, \tag{3}$$
where $P_s$ is the transmit power of the information source, $N_0$ is the noise power at the DAP, and $G_{s,a}$ is the channel power gain from the information source to the DAP. Note that the time duration of each transmission block is normalized to one, such that the channel capacity and throughput can be used interchangeably. Hence the achievable throughput (bps) from the information source to the DAP can be expressed as
$$R_{sa} = W \log_2\left(1 + \frac{\eta\, G_{s,a} \sum_{m=1}^{N} p_m G_{m,s}}{N_0}\right), \tag{4}$$
where $W$ is the bandwidth. We define the received signal power at the active sensor contributed by the $m$th EAP as $q_m = p_m G_{m,s}$ (Footnote 3: The received power contributed by each EAP is assumed to be distinguishable by considering that the EAPs work in disjoint narrow bandwidths.), and set $\gamma = \eta G_{s,a} / N_0$ for notational simplicity. We can thus simplify (4) as
$$R_{sa} = W \log_2\left(1 + \gamma \sum_{m=1}^{N} q_m\right). \tag{5}$$
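The throughput model described above can be checked numerically. The following Python sketch is only an illustration: the function names, argument names and example values are assumptions introduced here, not taken from this paper. It computes the throughput once from the EAP transmit powers and channel gains (harvested power, then SNR, then bandwidth times log2(1 + SNR)) and once from the received powers in the compact form, so that the two forms can be compared.

```python
import math


def throughput_from_powers(p, g_eap, g_sa, eta, noise, bandwidth):
    """Throughput (bit/s) from EAP transmit powers and channel gains.

    p      -- transmit power of each EAP (hypothetical example input)
    g_eap  -- channel power gain from each EAP to the active sensor
    g_sa   -- channel power gain from the active sensor to the DAP
    eta    -- energy harvesting efficiency
    noise  -- noise power at the DAP
    """
    # Harvested energy equals transmit power because the block length is one.
    p_s = eta * sum(pm * gm for pm, gm in zip(p, g_eap))
    snr = p_s * g_sa / noise
    return bandwidth * math.log2(1.0 + snr)


def throughput_from_received(q, gamma, bandwidth):
    """Compact form: q holds the received power contributed by each EAP."""
    return bandwidth * math.log2(1.0 + gamma * sum(q))


# Illustrative numbers only.
p = [1.0, 2.0]
g_eap = [1e-3, 5e-4]
g_sa, eta, noise, bandwidth = 1e-4, 0.5, 1e-9, 1e6

q = [pm * gm for pm, gm in zip(p, g_eap)]
gamma = eta * g_sa / noise
r_full = throughput_from_powers(p, g_eap, g_sa, eta, noise, bandwidth)
r_compact = throughput_from_received(q, gamma, bandwidth)
```

Both calls describe the same link, so `r_full` and `r_compact` should agree up to floating-point rounding.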
As mentioned before, the EAPs considered in the system belong to different operators and act strategically, so they would not help the DAP voluntarily. To address this issue, the DAP needs to provide rewards to motivate the EAPs to charge its sensors. In this paper, we mainly focus on monetary rewards as the incentive between operators. Other forms of rewards, such as physical resources (e.g., spectrum) or free data offloading between operators, can also be used. To efficiently exploit the EAPs to achieve a good throughput, the following questions need to be answered under asymmetric information: Which EAPs should the DAP hire, how much energy should be requested from the hired EAPs, and how much reward should be given to the hired EAPs?
IV Incentive Mechanisms with Asymmetric Information
To answer the above questions in the practical scenario with asymmetric information, we first model the strategic interactions between the DAP and EAPs as a Stackelberg game. We adapt the existing Stackelberg game to the considered scenario by defining and optimizing the expected utility of the DAP. In economics, contract theory is a powerful tool for designing incentive mechanisms under information asymmetry. As such, we then reformulate the incentive mechanism problem as an optimal contract design problem.
IV-A Stackelberg Game with Asymmetric Information
In this part, we first explore how to design a Stackelberg game to model the interactions between the DAP and the EAPs, and then derive the optimal energy prices under asymmetric information. In the proposed Stackelberg game with asymmetric information, the DAP provides rewards to the EAPs for charging its sensors. The DAP is the leader of the formulated Stackelberg game, which imposes energy prices on the EAPs. The DAP optimizes the energy prices to maximize its expected utility function, defined as the difference between the benefits obtained from the achievable throughput and its total payment to the EAPs. The EAPs are the followers, which optimize their utility functions, defined as the payment received from the DAP minus their energy costs.
IV-A1 Stackelberg Game Formulation
The channel conditions and energy costs of the EAPs are different, so the efficiencies of the EAPs in charging the sensor are distinct. To fully exploit the potential of the EAPs, a discriminative pricing strategy is considered, i.e., the DAP can impose different prices per unit energy harvested from different EAPs. Let $\mathbf{q} = [q_1, q_2, \ldots, q_N]^T$ denote the vector of the active sensor’s received power from the EAPs, with $q_m$ denoting the received power from the $m$th EAP, and let $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_N]^T$ denote the vector of prices per unit energy harvested from the EAPs, with $\pi_m$ denoting the price per unit energy harvested from the $m$th EAP. The total payment of the DAP to the EAPs is
$$\sum_{m=1}^{N} \pi_m q_m, \tag{6}$$
where $q_m$ is the received energy from the $m$th EAP. Since the aim of the DAP is to achieve a higher throughput at the cost of fewer rewards to the EAPs, the utility function of the DAP can be defined as
$$U_{\mathrm{DAP}}(\boldsymbol{\pi}, \mathbf{q}) = R_{sa} - c \sum_{m=1}^{N} \pi_m q_m, \tag{7}$$
where $R_{sa}$ is the achievable throughput defined in (4) and (5), and $c$ is the unit cost of the DAP, which is normalized as $c = 1$ without loss of generality hereafter.
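Each EAP is modeled as a follower that would like to maximize its individual profit, the utility of which is defined as
$$U_m(\pi_m, q_m) = \pi_m q_m - C_m(p_m), \tag{8}$$
where $\pi_m$ is the price per unit energy, $q_m$ is the received power contributed by the $m$th EAP, $p_m$ is the transmit power of the $m$th EAP, and $C_m(\cdot)$ is used to model the energy cost of the $m$th EAP, given by
$$C_m(x) = a_m x^2, \tag{9}$$
where $a_m > 0$ is the energy cost coefficient. Note that the above quadratic function has been widely adopted in the energy trading market to model the energy cost [35]. Since $p_m = q_m / G_{m,s}$, with $G_{m,s}$ denoting the channel power gain between the $m$th EAP and the information source, the utility function of the $m$th EAP becomes
$$U_m(\pi_m, q_m) = \pi_m q_m - \frac{a_m}{G_{m,s}^2}\, q_m^2. \tag{10}$$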
Since the DAP is not aware of each EAP’s exact energy cost coefficient and channel gain, it can sort the EAPs into discrete types and use the statistical distribution of the EAP types, obtained from historical data, to optimize its expected utility. Specifically, we define the type of the $m$th EAP as
$$\theta_m = \frac{G_{m,s}^2}{a_m}, \tag{11}$$
which suggests that the larger the channel gain $G_{m,s}$ between the EAP and the information source, and/or the lower the unit energy cost coefficient $a_m$, the higher the type of the EAP. Without loss of generality, we assume that there are $K$ types of EAPs in total with $\theta_1 < \theta_2 < \cdots < \theta_K$. In this definition, a higher-type EAP has better channel quality and/or a lower energy cost coefficient. Note that since $G_{m,s} > 0$ and $a_m > 0$, $\theta_m > 0$ holds. Using (11), the utility of an EAP of the $k$th type can be rewritten as
$$U_k(\pi_k, q_k) = \pi_k q_k - \frac{q_k^2}{\theta_k}. \tag{12}$$
Assume there are $N_k$ EAPs belonging to the $k$th type; we thus have $\sum_{k=1}^{K} N_k = N$. We can then rewrite the DAP’s utility according to the types of EAPs as
$$U_{\mathrm{DAP}} = W \log_2\left(1 + \gamma \sum_{k=1}^{K} N_k q_k\right) - \sum_{k=1}^{K} N_k \pi_k q_k. \tag{13}$$
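In this section, we consider a scenario with strong information asymmetry. In such a scenario, the DAP is only aware of the total number of EAPs (i.e., $N$) and the distribution of each type. However, it does not know each EAP’s private type and thus does not know the exact number of EAPs belonging to each type (i.e., $N_k$). As such, the DAP needs to optimize its expected utility over all possible combinations of $N_1, \ldots, N_K$. The expected utility of the DAP with $N$ EAPs is given by
$$\bar{U}_{\mathrm{DAP}} = \sum_{N_1=0}^{N} \sum_{N_2=0}^{N-N_1} \cdots \sum_{N_{K-1}=0}^{N-\sum_{k=1}^{K-2} N_k} Q_{N_1,\ldots,N_K} \left[ W \log_2\left(1 + \gamma \sum_{k=1}^{K} N_k q_k\right) - \sum_{k=1}^{K} N_k \pi_k q_k \right], \tag{14}$$
where $N_K = N - \sum_{k=1}^{K-1} N_k$ is known after giving $N_1, \ldots, N_{K-1}$ since the DAP knows the total number of EAPs, and $Q_{N_1,\ldots,N_K}$ is the probability of a certain combination of the number of EAPs belonging to each type (i.e., $N_1, \ldots, N_K$). We assume that all types are uniformly distributed, i.e., the probability of one EAP belonging to each type is the same, namely $1/K$. In this case, $Q_{N_1,\ldots,N_K}$ can be calculated as
$$Q_{N_1,\ldots,N_K} = \frac{N!}{\prod_{k=1}^{K} N_k!} \left(\frac{1}{K}\right)^{N}. \tag{15}$$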
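To make the averaging over type combinations concrete, the sketch below enumerates every way of splitting the EAPs among the types and assigns each split its probability under the uniform-type assumption stated above. It assumes that this probability is the standard multinomial probability with equal per-type probability; the function names are hypothetical and introduced only for illustration.

```python
from math import factorial


def count_vectors(n_eaps, n_types):
    """Yield every tuple of n_types non-negative integers that sums to n_eaps."""
    if n_types == 1:
        yield (n_eaps,)
        return
    for first in range(n_eaps + 1):
        for rest in count_vectors(n_eaps - first, n_types - 1):
            yield (first,) + rest


def combination_probability(counts):
    """Multinomial probability of the type counts when all types are equally likely."""
    n_eaps = sum(counts)
    coeff = factorial(n_eaps)
    for c in counts:
        coeff //= factorial(c)  # stays an integer at every step
    return coeff / len(counts) ** n_eaps


# Example: 4 EAPs and 3 types (illustrative sizes).
all_counts = list(count_vectors(4, 3))
total_probability = sum(combination_probability(c) for c in all_counts)
```

Because the enumeration covers every possible combination exactly once, `total_probability` should equal one up to rounding, which is a quick sanity check before using these weights in an expectation.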
Since the DAP is not aware of the EAPs’ private information, it can only optimize the expectation of its utility function by using the statistical knowledge of the EAPs’ private information. So the optimization problem for the DAP, i.e., the leader-level game, can be formulated as
$$(\text{P4.1}):\ \max_{\boldsymbol{\pi}}\ \bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi}, \mathbf{q}) \quad \text{s.t.}\ \ \pi_k \geq 0,\ \forall k. \tag{16}$$
Accordingly, the optimization problem for the EAP with the $k$th type, i.e., the follower-level game, can be formulated as
$$(\text{P4.2}):\ \max_{q_k}\ U_k(\pi_k, q_k) \quad \text{s.t.}\ \ q_k \geq 0. \tag{17}$$
Note that although the DAP does not know the EAP’s exact type, it knows the type set of EAPs.
The Stackelberg game for the considered system has been formulated by combining problems (P4.1) and (P4.2). In this game, the DAP is the leader who aims to solve problem (P4.1), while the EAPs are the followers who aim to solve their individual problems (P4.2). Once a game is formulated, the subsequent task is to find its equilibrium point(s). For the solution of the formulated game, the most well-known concept is the Stackelberg equilibrium (SE), which can be formally defined as follows:
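Definition 1 (Stackelberg equilibrium (SE)).
We use $\boldsymbol{\pi}^*$ and $\mathbf{q}^*$ to denote the solutions of problems (P4.1) and (P4.2), respectively. Then, $(\boldsymbol{\pi}^*, \mathbf{q}^*)$ is an SE of the formulated game if the following conditions are satisfied
$$\bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi}^*, \mathbf{q}^*) \geq \bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi}, \mathbf{q}^*), \tag{18}$$
$$U_k(\pi_k^*, q_k^*) \geq U_k(\pi_k^*, q_k),\ \forall k, \tag{19}$$
for all $\boldsymbol{\pi} \succeq \mathbf{0}$ and $\mathbf{q} \succeq \mathbf{0}$.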
IV-A2 Analysis of the Proposed Game
In this part, we analyze the SE of the proposed Stackelberg game with asymmetric information.
It can be observed from (10) that for given values of $\boldsymbol{\pi}$, the utility function of the EAP with the $k$th type is a quadratic function of its contributed power $q_k$ to the active sensor and the constraint is affine, which indicates that problem (P4.2) is a convex optimization problem. Thus, it is straightforward to obtain its optimal solution given in the following lemma:
Lemma 1.
For given values of $\boldsymbol{\pi}$, the optimal $q_k$ of the EAP with the $k$th type for problem (P4.2) is given by
$$q_k^* = \frac{\pi_k \theta_k}{2}. \tag{20}$$
Proof.
The proof of this lemma follows by noting that the objective function of problem (P4.2) given in (17) is a concave quadratic function in terms of $q_k$, so setting its first derivative $\pi_k - 2 q_k / \theta_k$ to zero yields (20). ∎
It can be observed from Lemma 1 that for the same energy price, an EAP with better channel gain and/or less energy cost would like to contribute more power to the sensors.
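The follower's best response in Lemma 1 can be illustrated with a few lines of Python. The sketch assumes that the utility of a type-$\theta$ EAP takes the form price times received power minus received power squared over $\theta$, which is how the quadratic energy cost is read here; the function names and the numerical values are hypothetical.

```python
def eap_utility(q, price, theta):
    """Follower utility: payment received minus the quadratic energy cost.

    Assumed form: price * q - q**2 / theta, where theta is the EAP type.
    """
    return price * q - q * q / theta


def best_response(price, theta):
    """Received power that maximizes eap_utility for a given price (q >= 0)."""
    # First-order condition: price - 2 * q / theta = 0.
    return max(0.0, price * theta / 2.0)


# Illustrative values only.
price = 2.0
low_type, high_type = 1.0, 3.0
q_low = best_response(price, low_type)
q_high = best_response(price, high_type)
u_high = eap_utility(q_high, price, high_type)
```

With the same price, the higher-type EAP (better channel and/or lower cost) contributes more power, in line with the observation above.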
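Replacing $q_k$ with $q_k^*$ in problem (P4.1), the optimization problem at the DAP side can be expressed as
$$(\text{P4.3}):\ \max_{\boldsymbol{\pi}}\ \bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi}) \quad \text{s.t.}\ \ \pi_k \geq 0,\ \forall k, \tag{21}$$
where $\bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi})$ is given by
$$\bar{U}_{\mathrm{DAP}}(\boldsymbol{\pi}) = \sum_{N_1=0}^{N} \sum_{N_2=0}^{N-N_1} \cdots \sum_{N_{K-1}=0}^{N-\sum_{k=1}^{K-2} N_k} Q_{N_1,\ldots,N_K} \left[ W \log_2\left(1 + \frac{\gamma}{2} \sum_{k=1}^{K} N_k \pi_k \theta_k\right) - \frac{1}{2} \sum_{k=1}^{K} N_k \pi_k^2 \theta_k \right], \tag{22}$$
where $Q_{N_1,\ldots,N_K}$ is given in (15).
We can observe that the objective function of problem (P4.3) is concave in the vector $\boldsymbol{\pi}$. This is because each term in the summation is composed of a logarithmic function of an affine expression (concave) and negative quadratic functions (concave), and the sum of concave functions is still concave. Moreover, the constraint is affine. Problem (P4.3) is thus a convex optimization problem, so we can numerically solve the system of equations given by the KKT conditions to obtain its solution. From the KKT conditions, we can also gain some insight into the structure of the solution and thus have the following proposition.
Proposition 1.
The optimal solution to problem (P4.3) has the following structure:
$$\pi_1^* = \pi_2^* = \cdots = \pi_K^*. \tag{23}$$
Proof.
See Appendix A. ∎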
We surprisingly find that the optimal energy prices for different EAPs are the same, even if we impose discriminative prices for different EAPs in the original design of the Stackelberg game. This is because the energy price of unit received power is used in our pricing scheme. The DAP has no motivation to treat the received power from EAPs differently, so a unified pricing per unit received power is achieved.
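The leader's problem can be sketched numerically as follows. This is a minimal illustration under several assumptions that are not taken verbatim from the paper: a single price is offered to all EAPs (as the proposition above states for the optimum), each EAP of type $\theta$ responds with received power equal to price times $\theta$ over two, types are uniformly distributed, and a plain grid search stands in for solving the KKT conditions. All function names and parameter values are hypothetical.

```python
import math
from math import factorial


def count_vectors(n_eaps, n_types):
    """Yield every tuple of n_types non-negative integers that sums to n_eaps."""
    if n_types == 1:
        yield (n_eaps,)
        return
    for first in range(n_eaps + 1):
        for rest in count_vectors(n_eaps - first, n_types - 1):
            yield (first,) + rest


def combination_probability(counts):
    """Multinomial probability of the type counts under uniform types."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff / len(counts) ** sum(counts)


def expected_dap_utility(price, thetas, n_eaps, gamma, bandwidth=1.0):
    """Expected throughput minus expected payment for one common price."""
    total = 0.0
    for counts in count_vectors(n_eaps, len(thetas)):
        weighted = sum(c * t for c, t in zip(counts, thetas))
        received = price * weighted / 2.0   # sum of the followers' best responses
        payment = price * received          # price per unit received power
        rate = bandwidth * math.log2(1.0 + gamma * received)
        total += combination_probability(counts) * (rate - payment)
    return total


def best_uniform_price(thetas, n_eaps, gamma, price_max=5.0, steps=2000):
    """Grid search for the common price that maximizes the expected utility."""
    grid = [price_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda pr: expected_dap_utility(pr, thetas, n_eaps, gamma))


# Illustrative setting: 3 EAPs, 3 equally likely types.
thetas = [1.0, 2.0, 3.0]
price_star = best_uniform_price(thetas, 3, 10.0)
utility_star = expected_dap_utility(price_star, thetas, 3, 10.0)
```

Since the expected utility is concave in the common price under these assumptions, the grid maximizer is close to the true optimum; a finer grid or a one-dimensional root finder on the derivative would tighten it.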
Lacking complete information, the Stackelberg game with asymmetric information performs worse than that with complete information. Note that in the considered scenario, there are $N$ EAPs in the market. In each channel realization, each EAP takes one type from the EAP type set at random. The Stackelberg game under complete information can adapt to the instantaneous combination of EAP types and calculate an optimal price for each combination by optimizing the instantaneous utility function. In contrast, the Stackelberg game under asymmetric information cannot adapt to the instantaneous combination of EAP types, since it can only calculate a single price for all possible combinations. Therefore, the Stackelberg game under asymmetric information performs worse because it fails to adapt to changes in the instantaneous combination of EAP types, i.e., changes in the wireless channel conditions. This deduction will be verified later in the simulation part.
IV-B Optimal Contract with Asymmetric Information
As mentioned above, the performance of the Stackelberg game is degraded under asymmetric information. To improve the performance under asymmetric information, the DAP could design and offer a contract to effectively motivate the EAPs to charge its sensors. Note that in the Stackelberg game, each EAP has the freedom to optimize its own utility by choosing any amount of received signal power at the active sensor once the DAP imposes a given energy price. In contrast, contract theory allows the EAPs to select from limited options only. Specifically, a group of energy-reward pairs (referred to as contract items) is designed, and a contract consisting of these contract items is provided to the EAPs. Each EAP chooses a contract item at its discretion to maximize its benefit. By properly designing the contract items, the DAP can induce each EAP to expose its type through its selection of a contract item and thus relieve the information asymmetry.
In the following, we formulate the optimal contract, characterize its feasibility conditions and provide the optimal solution for the formulated contract.
IV-B1 Contract Formulation
In this part, we formulate a contract for the RF energy trading between the DAP and EAPs, characterize its feasibility conditions, and derive the optimal contract subject to the feasibility conditions.
A contract including a series of energy-reward pairs $(q_k, r_k)$, $k = 1, \ldots, K$, is designed to maximize the expectation of the DAP’s utility. For the $k$th type EAP, $q_k$ is the received power contributed by the EAP and $r_k$ is the reward paid to the EAP as the incentive for the corresponding contribution.
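We first rewrite the utility functions of the DAP and EAPs according to the contract items. The DAP’s utility function is given by
$$U_{\mathrm{DAP}} = W \log_2\left(1 + \gamma \sum_{k=1}^{K} N_k q_k\right) - \sum_{k=1}^{K} N_k r_k, \tag{24}$$
where $r_k$ is the reward paid by the DAP to the EAP with the $k$th type for its corresponding contribution $q_k$. Similar to (14), the expectation of $U_{\mathrm{DAP}}$ can be represented as
$$\bar{U}_{\mathrm{DAP}} = \sum_{N_1=0}^{N} \sum_{N_2=0}^{N-N_1} \cdots \sum_{N_{K-1}=0}^{N-\sum_{k=1}^{K-2} N_k} Q_{N_1,\ldots,N_K} \left[ W \log_2\left(1 + \gamma \sum_{k=1}^{K} N_k q_k\right) - \sum_{k=1}^{K} N_k r_k \right], \tag{25}$$
where $N_K = N - \sum_{k=1}^{K-1} N_k$ is known after giving $N_1, \ldots, N_{K-1}$ since the DAP knows the total number of EAPs, and $Q_{N_1,\ldots,N_K}$ is the probability of a certain combination of the number of EAPs belonging to each type (i.e., $N_1, \ldots, N_K$), which is given by (15). The utility function of the EAP with the $k$th type is then rewritten as
$$U_k = r_k - \frac{q_k^2}{\theta_k}. \tag{26}$$
The social welfare is defined as the sum of the utilities of the DAP and all EAPs, given by
$$U_{\mathrm{DAP}} + \sum_{k=1}^{K} N_k U_k = W \log_2\left(1 + \gamma \sum_{k=1}^{K} N_k q_k\right) - \sum_{k=1}^{K} N_k \frac{q_k^2}{\theta_k}. \tag{27}$$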
It can be seen that the internal transfers, i.e., rewards, are cancelled in the social welfare, which is consistent with the aim to maximize the efficiency of the whole system, i.e., achieving more throughput at the cost of less energy consumptions.
Next, we will figure out the feasibility conditions. In our design, to encourage the EAPs to participate in the charging process and ensure that each EAP only chooses the contract item designed for its type, the following individual rationality (IR) and incentive compatibility (IC) constraints should be satisfied [21].
Definition 2 (Individual Rationality (IR)).
The contract item that an EAP chooses should ensure a nonnegative utility, i.e.,
$$r_k - \frac{q_k^2}{\theta_k} \geq 0,\ \forall k \in \{1, \ldots, K\}. \tag{28}$$
Definition 3 (Incentive Compatibility (IC)).
An EAP of any type $k$ prefers to choose the contract item $(q_k, r_k)$ designed for its type, instead of any other contract item $(q_j, r_j)$ with $j \in \{1, \ldots, K\}$ and $j \neq k$, i.e.,
$$r_k - \frac{q_k^2}{\theta_k} \geq r_j - \frac{q_j^2}{\theta_k},\ \forall j \neq k. \tag{29}$$
The IR condition requires that the reward received by each EAP should compensate for the cost of the energy it consumes when it participates in the energy trading. If $r_k - q_k^2/\theta_k < 0$, the EAP will choose not to charge the information source for the DAP. We define this case as $q_k = r_k = 0$. The IC condition ensures that each EAP automatically selects the contract item designed for its corresponding type. The type of each EAP is thus revealed to the DAP, which is called “self-revelation”. If a contract satisfies the IR and IC constraints, we refer to it as a feasible contract.
Following the idea of contract theory [21], the DAP aims at maximizing its expected utility subject to the IR and IC constraints given in (28) and (29). Thus, the optimal contract is the solution to the following optimization problem
$$\max_{\{(q_k, r_k)\}}\ \bar{U}_{\mathrm{DAP}} \quad \text{s.t.}\ \ r_k - \frac{q_k^2}{\theta_k} \geq 0,\ \forall k;\quad r_k - \frac{q_k^2}{\theta_k} \geq r_j - \frac{q_j^2}{\theta_k},\ \forall j \neq k;\quad q_k \geq 0,\ r_k \geq 0,\ \forall k. \tag{30}$$
The first two constraints correspond to IR and IC, respectively. Note that under the IR and IC constraints, each EAP reveals its private type truthfully. Specifically, the IR condition ensures the EAP’s participation, and the IC condition ensures that each EAP selects the contract item designed for its corresponding type to gain the highest payoff.
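The IR and IC conditions can be checked mechanically for any candidate contract. The Python sketch below is an illustration only: it assumes that a type-$\theta$ EAP choosing an item with received power $q$ and reward $r$ obtains utility $r - q^2/\theta$, and the function names and example contracts are hypothetical, not items derived in this paper.

```python
def item_utility(theta, item):
    """Utility of a type-theta EAP that picks the contract item (q, reward)."""
    q, reward = item
    return reward - q * q / theta


def is_feasible(thetas, contract, tol=1e-9):
    """Return True if the contract satisfies both IR and IC.

    thetas   -- EAP types, one per contract item
    contract -- list of (q, reward) pairs; item k is designed for thetas[k]
    """
    for k, theta in enumerate(thetas):
        own = item_utility(theta, contract[k])
        if own < -tol:                      # IR: nonnegative utility
            return False
        for j, other in enumerate(contract):
            if j != k and item_utility(theta, other) > own + tol:
                return False                # IC: no other item is preferred
    return True


# Illustrative types and contracts.
thetas = [1.0, 2.0, 4.0]
# Rewards chosen so that the lowest type gets zero utility and each higher
# type is indifferent between its own item and the one just below it.
screening_contract = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.75)]
# Same powers but a flat reward: higher items no longer cover their cost.
flat_contract = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]

screening_ok = is_feasible(thetas, screening_contract)
flat_ok = is_feasible(thetas, flat_contract)
```

In this example the first contract passes both checks while the flat-reward contract fails IR for the second type, which shows why the reward has to grow with the requested power.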
IV-B2 Constraint Reduction
There are IR constraints and IC constraints in (30), which are nonconvex and couple different EAPs together. It is hard to solve (30) directly due to these complicated constraints. Motivated by this, in this subsection we first reduce the constraints of (30) and then transform the problem.
We first observe that the following necessary conditions can be derived from the IR and IC constraints.
Lemma 2.
For any feasible contract, if and only if , .
Proof.
See Appendix B. ∎
Lemma 2 shows that the EAP contributing more received power at the information source will receive more reward.
Lemma 3.
For any feasible contract, if and only if , .
The proof of Lemma 3 follows a procedure similar to that of Lemma 2 and is omitted for brevity. Lemma 3 indicates that the EAPs providing the same received power will get the same amount of reward.
Lemma 4.
For any feasible contract, if , then , .
Proof.
See Appendix C. ∎
Lemma 4 shows that a higher type EAP should be given more reward. Together with Lemma 1 and Lemma 2, it can be deduced that a higher type EAP also contributes more energy to the information source. We define this feature as monotonicity.
Definition 4 (Monotonicity).
If and then .
Based on the above analysis, we can now use the IC condition to reduce the IR constraints and have the following lemma.
Lemma 5.
With the IC condition, the IR constraints can be reduced as
(31) 
Proof.
See Appendix D. ∎
We can also reduce the IC constraints and attain the following lemma.
Lemma 6.
With monotonicity, the IC condition can be reduced as the local downward incentive compatibility (LDIC), given by
(32) 
and the local upward incentive compatibility (LUIC), given by
(33) 
Proof.
See Appendix E. ∎
By using the reduced IR and IC constraints, the optimization problem (30) can be transformed as
(34)  
The LDIC and the LUIC in (34) can be combined as shown in Lemma 7.
Lemma 7.
Since the objective function is an increasing function of and a decreasing function of , the above optimization problem can be further simplified as
(35)  
Proof.
See Appendix F. ∎
IV-B3 Solution to Optimal Contract
We now solve the optimization problem (35) to obtain the optimal contract as follows: a standard method is first applied to solve the relaxed problem without the monotonicity constraint, and the solution is then verified to satisfy monotonicity. By iterating the first and second constraints in (35), we have
(36)  
where . Substitute (36) into , and all are removed from the optimization problem (35), which becomes
(37)  
Note that the objective in (37) is composed of logarithmic functions and quadratic functions, both of which are concave. A positively weighted sum of concave functions is still concave. Besides, the constraint set is convex, so we can leverage the standard convex optimization tools in [36] to solve it to get , and then can be calculated by (36). Moreover, monotonicity is met automatically when the type is uniformly distributed [21]. So far, we have derived the optimal contract , , which maximizes the utility of the DAP and satisfies the IR and IC constraints.
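The step of iterating the binding constraints can be illustrated with a short sketch. It assumes the same hypothetical EAP utility pi - q^2/theta, types sorted in increasing order, a binding IR constraint for the lowest type, and binding LDIC constraints for all higher types. The function name `rewards_from_powers` and the utility form are assumptions for illustration, so the recursion is not claimed to be identical to (36).

```python
from typing import List, Sequence


def rewards_from_powers(types: Sequence[float], powers: Sequence[float]) -> List[float]:
    """Rewards obtained by chaining a binding IR (lowest type) and binding LDICs.

    Assumes utility pi - q**2 / theta and `types` sorted in increasing order.
    """
    rewards: List[float] = []
    prev_q, prev_pi = 0.0, 0.0  # starting from (0, 0) makes IR bind for the lowest type
    for theta, q in zip(types, powers):
        # Binding LDIC: type k is indifferent between its own item and item k-1.
        pi = prev_pi + (q * q - prev_q * prev_q) / theta
        rewards.append(pi)
        prev_q, prev_pi = q, pi
    return rewards


def is_monotone(powers: Sequence[float]) -> bool:
    """Check that the required received power is non-decreasing in the type index."""
    return all(a <= b for a, b in zip(powers, powers[1:]))
```

With this recursion only the received powers remain as free variables, which is why the rewards can be eliminated from the objective before the concave maximization is solved.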
IV-C Practical Implementation
To implement the proposed approach in a practical radio frequency energy harvesting-based IoT system, the following steps should be followed.
First, the DAP needs to collect the information required to compute the optimal contract. The active sensor will broadcast pilots to allow the DAP and EAPs to estimate the channels, such that the DAP is aware of the channel gain from the DAP to the sensor. From historical data, the DAP can obtain empirical values of the energy harvesting efficiency factor and noise power, and thus it can obtain the value of parameter . With the known values of other public system parameters, including the channel bandwidth, the user number, and the set of EAP types, the DAP can calculate the optimal contract.
Next, the DAP will broadcast the optimal contract to the candidate EAPs via the corresponding backhauls. By evaluating the contract, each EAP will decide whether to participate in the cooperation. If an EAP decides to participate in the current energy trading, it will send feedback to the DAP. After the DAP receives the feedback, it will sign a contract with the EAP.
Finally, after the contracts are signed, the EAPs will perform the contracts by establishing energy transfer links towards the active sensor and charging it according to the agreed transmit power. When the DAP detects that an EAP has fulfilled its contractual obligation, the DAP will pay that EAP the agreed amount of reward via the backhaul connecting the operators.
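The EAP-side decision in the second step can be sketched as follows. The sketch assumes that an EAP evaluates each broadcast contract item with a hypothetical utility pi - q^2/theta and declines to participate if no item gives a nonnegative utility. The function name `choose_item` and the utility form are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple

Item = Tuple[float, float]  # (received power q, reward pi)


def choose_item(theta: float, contract: Sequence[Item]) -> Optional[int]:
    """Return the index of the item the EAP would sign, or None if it declines."""
    if not contract:
        return None
    # Assumed utility of each broadcast item for an EAP of type theta.
    utilities = [pi - q * q / theta for q, pi in contract]
    best = max(range(len(contract)), key=lambda i: utilities[i])
    # Participation requires a nonnegative utility.
    return best if utilities[best] >= 0.0 else None
```

The returned index plays the role of the feedback sent to the DAP, and a return value of `None` corresponds to an EAP that does not take part in the current energy trading.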
V Benchmark Schemes with Complete Information
To investigate the impact of the information scenarios and compare the proposed schemes with the existing schemes under complete information, we first extend the existing Stackelberg game from a unified pricing strategy to a discriminative pricing strategy. We then present the centralized optimization scheme under complete information as the reference for the proposed incentive mechanisms.
V-A Stackelberg Game Formulation
To fully exploit the potentials of EAPs with distinct channel conditions and energy costs, a discriminative pricing strategy is considered, i.e., the DAP can impose different prices per unit energy harvested from different EAPs. The utility function of the DAP can be rewritten as
(38) 
where is the vector of the active sensor’s received power from EAPs, with denoting the received power from the th EAP, is the vector of prices per unit energy harvested from EAPs, with denoting the price per unit energy harvested from EAP, and is the achievable throughput defined in (4) and (5). The optimization problem for the DAP or the leaderlevel game can be formulated as
(39)  
Note that the optimization problem (P5.1) differs from (P4.1) under asymmetric information: the instantaneous utility of the DAP is optimized here, instead of the expected utility of the DAP under asymmetric information.
Each EAP is modeled as a follower that seeks to maximize its individual profit; its utility is rewritten as
(40) 
where and are the energy cost coefficient and channel gain of the th EAP, respectively. Thus, the optimization problem for the EAP, or the follower-level game, is given by
(41)  
V-B Analysis of the Formulated Stackelberg Game
In this subsection, we derive the SE of the formulated game by analyzing the optimal strategies for the DAP and EAPs to maximize their own utility functions. A closed-form solution is derived by using the Karush-Kuhn-Tucker (KKT) conditions.
First, the optimal of the th EAP is similar to that in (20), and is given by the following lemma.
Lemma 8.
For given , the optimal solution for problem (P5.2) is given by
(42) 
Proof.
The proof of this lemma follows by noting that the objective function of problem (P5.2) given in (41) is a concave function in terms of . ∎
It can be observed from Lemma 8 that for the same energy price, an EAP with better channel gain and/or less energy cost would like to contribute more power to the active sensor.
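The qualitative behaviour stated above can be reproduced with a small sketch of a follower's best response. It assumes a hypothetical follower utility price * gain * p - cost * p^2 in the transmit power p, with no upper limit on p; this form and the function names are illustrative assumptions and are not claimed to match (40) or (42).

```python
def follower_utility(p: float, price: float, gain: float, cost: float) -> float:
    """Assumed utility: payment for received power minus a quadratic energy cost."""
    return price * gain * p - cost * p * p


def best_response_power(price: float, gain: float, cost: float) -> float:
    """Maximizer of the assumed concave utility over p >= 0.

    Setting the derivative price * gain - 2 * cost * p to zero gives the
    stationary point, which is the maximizer because the utility is concave.
    """
    return max(0.0, price * gain / (2.0 * cost))
```

Under this assumed form the best response grows with the channel gain and shrinks with the energy cost coefficient, in line with the observation drawn from Lemma 8.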
Subsequently, we need to solve problem (P5.1) by replacing with given in (42). The optimization problem at the DAP side can be expressed as
(43)  
where is given by
(44)  
We can observe that the objective of problem (P5.3) is a concave function in terms of vector , since the first part in (44) is a logarithmic function (concave), the remaining parts in (44) are a summation of quadratic functions (concave), and the constraint is affine. Hence, problem (P5.3) is a convex optimization problem. By using the KKT conditions to solve problem (P5.3), the closed-form solution for is derived in the following proposition.
Proposition 2.
The optimal solution to problem (P5.3) is given by
(45) 
where is the base of the natural logarithm, is given by
(46) 
Proof.
See Appendix G. ∎
Proposition 2 shows that the optimal prices for the Stackelberg game with complete information are the same. This result is consistent with that of the Stackelberg game with asymmetric information. As we explained before, this is because the received power price is used in the Stackelberg games with complete and asymmetric information. The DAP has no motivation to treat the received power from EAPs differently.
Note that the Stackelberg game under complete information can calculate an optimal price for each instantaneous channel realization and, equivalently, each combination of EAPs’ types. As such, it can adapt to changes in channel conditions. In comparison, the Stackelberg game under asymmetric information can only calculate a single price regardless of changes in the channel conditions, i.e., changes in the combinations of the EAPs’ types.
V-C Centralized Optimization
In this part, we present the centralized optimization scheme, i.e., the optimal contract with complete information, in which the DAP knows exactly the types of the EAPs. The centralized optimization problem is given as follows.
(47)  
where is given in (25).
Since the DAP knows exactly the types of the EAPs, the optimal prices are given by
(48) 
We substitute in (47) with and get
(49)  
where is given by
(50)  
Note that (50) is exactly the expectation of the social welfare, which is defined in (27). Although we originally optimize the utility function of the DAP in problem (P5.4), this is consistent with optimizing the social welfare, as is the case in the contract design discussed earlier.
It can also be observed that problem (P5.5) is a convex optimization problem. This is because each term in the summation is composed of a logarithmic function (concave) and quadratic functions (concave), a summation of concave functions is still concave, and the constraint is affine. We can obtain the solution of problem (P5.5) by solving the system of equations given by the KKT conditions, which is omitted here as it is similar to that in Appendix A.
VI Simulations and Discussions
In this section, we first evaluate the feasibility of the proposed contract, and then compare the performance of the proposed incentive mechanisms. The performance of the centralized optimization scheme is also simulated as the upper bound.
Parameters  Values 

Energy harvesting efficiency  0.5 
Bandwidth  1MHz 
Energy cost coefficient  [0.1,1] 
[5m,10m]  
[15m,25m]  
Pathloss coefficient  2 
Power attenuation at reference distance of 1m  30dB 
Noise power 
The main system parameters are shown in Table I. Since and , the practical ranges of and can be determined by the parameters shown in Table I. In the simulations, K types of EAPs are first generated randomly and used as the set of EAP types. Then each of the N EAPs in the market chooses one type from the set of EAP types uniformly at random, and thus the EAPs’ types are uniformly distributed. The unit of achievable throughput is Mbps.
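The simulation setup can be sketched as follows, using the path-loss values of Table I (30 dB attenuation at the reference distance of 1 m and a pathloss coefficient of 2). The function names, the seeding of the random generator, and the representation of types as plain numbers are illustrative assumptions; the mapping from channel gain and energy cost to an EAP type is not reproduced here.

```python
import random
from typing import List, Sequence


def channel_gain(distance_m: float, ref_loss_db: float = 30.0,
                 pathloss_exp: float = 2.0) -> float:
    """Linear channel gain for a given distance under a log-distance path-loss model."""
    return 10.0 ** (-ref_loss_db / 10.0) * distance_m ** (-pathloss_exp)


def draw_eap_types(type_set: Sequence[float], n_eaps: int,
                   rng: random.Random) -> List[float]:
    """Each of the N EAPs picks one type from the type set uniformly at random."""
    return [rng.choice(type_set) for _ in range(n_eaps)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that a run can be repeated
    # Hypothetical type set of K = 10 values, for illustration only.
    type_set = sorted(rng.uniform(1.0, 10.0) for _ in range(10))
    print(draw_eap_types(type_set, 5, rng))
    print(channel_gain(10.0))
```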
To verify the feasibility (i.e., IR and IC) of the proposed scheme under information asymmetry, the utilities of EAPs with type 3, type 6 and type 9 are plotted in Fig. 2 as functions of all contract items . We can see from Fig. 2 that each utility achieves its peak value only when the EAP chooses the contract item designed for its corresponding type, which indicates that the IC constraint is satisfied. For example, the utility of the type 6 EAP achieves its peak value only when it selects the contract item , which is exactly the one designed for its type. If the type 6 EAP selects any other contract item and , its utility will decrease. Moreover, when each of the above EAPs (i.e., type 3, type 6 and type 9) chooses the contract item designed for its corresponding type, its utility is nonnegative. A similar phenomenon can be observed for all other types of EAPs when they select the contract items designed for their corresponding types, which is not shown in Fig. 2 for brevity. In this sense, the IR condition is satisfied. It can be concluded that, under the proposed scheme, each EAP automatically reveals its type to the DAP by selecting a contract item. This means that the DAP can capture the EAPs’ private information (i.e., their types), and thus effectively address the problem of information asymmetry.
To evaluate the performance of the proposed schemes, we compare the social welfare of the contract, the Stackelberg games and the upper bound. Fig. 3 plots the social welfare of these schemes as a function of . It can be observed from Fig. 3 that the social welfare achieved by all schemes increases with . This is because with the same , the larger the value of , the larger the achievable throughput (refer to (5)), and thus the larger the social welfare (refer to (27)). The optimal scheme with complete information provides the best performance and serves as the upper bound. The performance of the contract scheme is generally better than that of the two Stackelberg games. This is because in contract theory, the EAPs have only limited contract items to choose from, and thus the DAP extracts more benefits from the EAPs and leaves less surplus for them. In Stackelberg games, however, each EAP has the freedom to optimize its individual utility function and thus can reserve more surplus. So the performance of the Stackelberg games is inferior to that of the contract scheme. We can also observe that the Stackelberg game with asymmetric information is inferior to that with complete information. This is because, without complete information, the Stackelberg game fails to adapt to changes of the channel, and thus its performance becomes worse.
Fig. 4 shows the normalized social welfare as a function of , where the social welfare of the contract and the Stackelberg games is normalized by the upper bound. It can be seen in Fig. 4 that when is small, the social welfare of the contract initially achieves more than of that of the centralized optimization scheme with complete information, and gradually approaches it with the increasing of . This demonstrates that the proposed incentive mechanism can effectively mitigate the effects of information asymmetry by leveraging contract theory. In contrast, the performance of the Stackelberg game with complete information is generally less than of that of the optimal scheme with complete information. Moreover, the performance of the Stackelberg game with asymmetric information is even worse, and is generally less than of that of the optimal scheme with complete information. The above results show that, by using its monopoly position in contract theory to provide limited contract items, the DAP can achieve performance close to that of the centralized optimization with complete information. In Stackelberg games, however, the DAP grants the EAPs some freedom to optimize, and the EAPs are selfish and do not care about social welfare. As such, the performance in terms of social welfare is degraded.
To explore the impact of the total number of EAPs in the market, we plot the social welfare and the normalized social welfare of the contract, the Stackelberg games and the upper bound in Fig. 5 and Fig. 6. In Fig. 5, the social welfare of these schemes is plotted as a function of . We can observe from Fig. 5 that the social welfare achieved by the upper bound, the contract, and the Stackelberg game with complete information increases with . This is because the overall social welfare increases with the number of EAPs in the market: the more EAPs in the market, the larger the summation of the utility functions of the DAP and all EAPs. However, the social welfare of the Stackelberg game under asymmetric information decreases slightly as increases. This is because the Stackelberg game under asymmetric information fails to adapt to changes in the combinations of the EAPs’ types. As mentioned before, it can only calculate one single price for all the combinations of EAPs’ types. The more EAPs in the market, the more diverse the combinations of the EAPs’ types. As such, the performance of the Stackelberg game under asymmetric information becomes worse. In comparison, the Stackelberg game with complete information can calculate a price targeting a certain combination of EAPs’ types in the market, so it provides better performance than its asymmetric counterpart. The contract, in turn, leverages the DAP’s monopoly status in the market structure to provide a limited group of contract items for the EAPs to choose from. Therefore, the contract theory-based scheme provides better performance than both Stackelberg games, and its performance is close to that of the upper bound.
In Fig. 6, the normalized social welfare of these schemes is plotted as a function of , where the social welfare of the contract and the Stackelberg games is normalized by the upper bound. It can be seen in Fig. 6 that when , the social welfare of the contract initially achieves more than of that of the centralized optimization scheme with complete information, and gradually approaches it with the increasing of . This demonstrates that the effects of information asymmetry can be mitigated successfully by leveraging contract theory. In contrast, the Stackelberg game with complete information can only provide a normalized social welfare of less than . Besides, the performance of the Stackelberg game with asymmetric information is even worse: it is generally less than of that of the optimal scheme with complete information and decreases significantly with the increasing of in the market. This is because the more EAPs in the market, the more diverse the combinations of EAPs’ types will be. The Stackelberg game under asymmetric information cannot adapt to changes in the combinations of EAPs’ types, as it can only calculate a single price for all possible combinations of EAPs’ types.
VII Conclusions and Future Work
In this paper, we developed incentive mechanisms under complete and asymmetric information to unveil the impact of information asymmetry and market structure. Specifically, we developed a contract-based incentive mechanism for wireless energy trading in radio frequency energy harvesting (RFEH) based Internet of Things (IoT) systems under asymmetric information. For the asymmetric information scenario, a Stackelberg game based scheme was also formulated as a comparison. For the complete information scenario, the existing Stackelberg game was extended from unified pricing to discriminative pricing as a comparison. The simulations showed that the performance of the Stackelberg game degrades significantly without complete information, and that the performance of the contract scheme under asymmetric information is better than that of the Stackelberg scheme with complete information. It can be concluded that the performance of the considered system depends more on the market structure (i.e., whether or not the EAPs are allowed to optimize their received power at the IoT devices with full freedom) than on the information scenario (i.e., complete or asymmetric information).
In our future work, we could consider both information asymmetry and hidden action. In this scenario, the DAP is not aware of the private information of EAPs and cannot distinguish the actions taken by different EAPs, i.e., the received power contributed by different EAPs. Because the actions of EAPs are hidden from the DAP, some EAPs may get the reward of the group without making any effort, which leads to the free-rider problem. In this case, another mathematical tool from economics, known as moral hazard in teams, has good potential for designing effective incentive mechanisms.
Appendix A Proof of Proposition 1
In this part, we prove Proposition 1. Because problem (P4.3) is a convex optimization problem, the KKT conditions are sufficient and necessary for the optimal solution. The KKT conditions of problem (P4.3) are given as follows.
The first-order necessary conditions are given by
(51) 
where are the types of EAPs, are KKT multipliers, are the prices, and . The complementary slackness condition is given by
(52) 
Since and , hold, (52) becomes
(53) 
To get the optimal solution of the KKT conditions, we need to solve the system of equations consisting of (51) and (53) in terms of and , , which is a system of quadratic equations. We now discuss the combinations of active and inactive constraints in the KKT conditions. Let us first test whether leads to a valid solution. By substituting with in (51), we have
(54) 
where is given by
(55) 
and , is given by
(56) 
The above system of equations in (54) can be solved numerically. Note that the right-hand sides of all equations in (54) are the same, so we can conclude that
(57) 
Because problem (P4.3) is a convex optimization problem, we can conclude that the solution given by the KKT conditions is the solution of the original optimization problem.