I Introduction
Consider the modern IoT paradigm in a 5G context, where a large number of small IoT devices are spread across a medium-sized environment, e.g., a home, an office, an automobile or a factory floor. Each IoT device monitors certain inputs and wants to communicate an update to a common monitor essentially as soon as possible. To model this scenario, a metric called the age of information (AoI) was introduced recently. It represents the freshness of information at the monitor/receiver side and has become a very popular object of theoretical interest in the recent past [12, 8, 25, 30, 19]. A comprehensive review can be found in [14]. Essentially, the age of any device at any time is the difference between the current time and the generation time of the last update received from it.
Many variants of the AoI problem have been considered in prior work. For a single node, these depend on the scheduling discipline, e.g., FCFS [12] or LCFS [11]. More importantly, multiple nodes have also been studied, e.g., with multiple sources [30, 8, 19, 9, 24, 7, 29]. With multiple sources, in each time slot one bit of information can be sent from a set of sources to a monitor, e.g., in [9]. The objective is to minimize the long-term weighted sum of the ages of all sources, subject to individual source throughput constraints.
Almost all prior work on AoI with multiple nodes assumes centralized control over the transmission decisions of each node. For example, in the policy of [9], the transmission decisions for each node are based on the current global age of each source. Centralized policies incur large overhead and delay. This can be limiting in a practical large-scale IoT deployment, where devices are low-powered and delay-sensitive. In such deployments a distributed or autonomous setup is preferred, in which each IoT device makes its own decisions given the transmission history.
This paper focuses on the distributed IoT paradigm, where each device has to make autonomous decisions with no communication between nodes. To keep the model simple, we consider slotted time and assume that if two or more nodes transmit in the same slot, a collision occurs and no update is received at the monitor. To keep the model practical, we assume that each node incurs a fixed cost for each transmission, so that no node can afford to transmit all the time. Under this distributed setting, the objective of each node is to minimize its own time-averaged age of information while incurring a reasonable average cost of transmission.
A typical approach to such problems is to model them as a game with a particular utility function and then try to find a Nash equilibrium (NE). This approach has several issues. First, the choice of the exact utility function is not obvious. More importantly, network information, e.g., the number of other nodes, may not be available in a distributed setting, possibly because it is time-varying. The solution concept of NE is nevertheless important in a distributed competitive setting. It establishes that there is a fixed point, or stable strategy, for each node, and it ensures that the system can be driven to an equilibrium.
In this paper, to eliminate the need for network information, we take an alternate approach and reach equilibrium via a local probabilistic transmit (learning) algorithm for each node. This algorithm decides the probability with which the node transmits its most recent packet in any slot. The learning algorithm of each node is local in the sense that it depends only on the node's own current time-averaged age, its current average transmission cost, and its past history of successes/failures in the slots in which it transmitted a packet. The objective is to show that this learning algorithm converges to a fixed point (equilibrium) when followed by all nodes autonomously.
In prior work, learning algorithms that achieve equilibrium have been studied for congestion games (which are also potential games) with additive congestion costs. For these games, the multiplicative weights learning algorithm is known to converge to an NE [13, 15]. For a more general setting, Friedman and Shenker [6] showed that learning algorithms can achieve the NE in a two-player zero-sum game. However, a similar result does not hold for three-player games, as shown by Daskalakis et al. [4]. For a brief survey, we refer the reader to Shoham et al. [23]. For non-congestion games, learning algorithms achieving the NE have been considered only briefly [1, 10, 3]. Similar to our setup, there is also work [17, 18] on finding utility functions for which a given set of strategies is an NE. However, finding utility functions for which the given set of strategies additionally has a low price of anarchy has remained intractable.
The work most closely related to ours, on learning algorithms that achieve equilibrium in communication settings, is [26, 28, 21]. In particular, the existence of an equilibrium reached via a learning algorithm is established for analyzing exponential backoff [26] and for arrival games [28]. In [21], an uplink throughput game is considered: in a distributed setup, each node aims to maximize its throughput by updating its transmission rate using a learning algorithm. Notably, [21] shows that it is not always possible to establish the existence of an equilibrium or how to achieve it. In principle, the learning-algorithm-based approach to achieving equilibrium is challenging.
In our model, following [9, 24, 7], we assume that each IoT device always has a packet to transmit. With multiple competing nodes, the collision model, and a cost for each transmission, it is not clear when each node should transmit without any explicit communication between nodes. We therefore consider a local probabilistic transmit (learning) algorithm. Each node transmits in each slot with a probability determined by its own local knowledge of past successes and failures, its current empirical age and cost, etc. The goal is to reach an equilibrium in this distributed setting. In particular, the proposed learning algorithm weights the current empirical averages of age and cost inverse-exponentially. This is intuitive: the larger the time-averaged age, the more aggressive the transmission probability should be, and the opposite holds for a large average transmission cost. It is easy to argue that a deterministic strategy cannot be an equilibrium strategy.
The learning algorithm tries to find the right balance between two extremes. Transmitting too often leads to many collisions and a large transmission cost, while transmitting too seldom increases the time-averaged age. Moreover, the learning algorithm does not need any knowledge of the network, for example, the number of other nodes in the network or their transmission strategies.
The main result of this paper is that the proposed learning algorithm converges to a unique, non-trivial fixed point (equilibrium). We also explicitly characterize the fixed point and show that it is in fact an NE of a game whose virtual utility function captures the relevant tradeoffs of the problem. The utility function of each node depends on its own transmission probability via the time-averaged age and average transmission cost, and it is decreasing in the other nodes' transmission probabilities.
Note that the actual probabilistic learning algorithm makes no use of this virtual utility function, which depends on network parameters such as the number of nodes in the network. This is why we call it the virtual utility function. The virtual utility function is derived only to characterize the fixed point of the learning algorithm. Moreover, we show that the price of anarchy of the virtual utility function approaches unity as the number of nodes grows large. Hence, even if nodes knew the virtual utility function and could collaborate, the optimal social utility would be close to the sum of the utilities obtained by the proposed learning algorithm.
The main technical ingredients of the paper are as follows. We first consider an expected version of the proposed learning algorithm, where all random variables are replaced by their expected values. We then find the underlying virtual utility function that the expected learning algorithm is trying to maximize. Corresponding to this utility function, we identify a multi-player game and show that it has a unique NE, which is achieved by the best response strategy. To show that the proposed learning algorithm converges to a fixed point, we show that its updates converge to the best response actions for this game. The argument thus has two steps: the proposed learning algorithm converges to the best response actions for the game, and the best response actions converge to the NE of the game. Together, these show that the proposed learning algorithm converges to a fixed point characterized by this NE. Note that the correspondence between the learning algorithm and the best response strategy, which requires network information, is made only for analysis. The learning algorithm itself does not need any network information.

We also present numerical results to validate our theoretical findings. In particular, we show that the proposed learning algorithm converges to an equilibrium quite fast, for any number of nodes in the network. To show this effect, we perturb the system by increasing/decreasing the number of nodes and plot the resulting transmission probabilities. We also plot the time-averaged age seen by any node in the network. It appears to grow exponentially with the number of nodes, as expected, since there is no coordination in the network: the success probability of a node is $p(1-p)^{n-1}$ if $p$ is the fixed point for each of the $n$ nodes. We show analytically only that the price of anarchy approaches 1 as the number of nodes becomes large. In simulations, however, we observe that it is very close to 1 for all values of the number of nodes in the network.
II System Model
Consider a network with nodes and a single receiver/monitor. Time is discretized into equal-length slots. Following prior work [9, 24, 7, 29], we assume that a new data packet (in short, packet) is generated in each slot at each node. If a node decides to transmit in a slot, it transmits its most recent packet, irrespective of the success/failure of transmissions in earlier slots. A packet transmitted by a node in a slot is correctly decoded by the monitor if no other node transmits in the same slot. Otherwise, a collision occurs and all the simultaneously transmitted packets are lost. For realistic modelling, we assume that each transmission by node costs units, to capture transmit energy cost, etc. Also, in each slot , the age of node is given by , where is the last slot (relative to slot ) in which the packet of node was successfully received by the monitor. Fig. 1 shows a sample plot of age against slot. Here, denotes the age of node in slot , while is its initial age. Until the monitor receives a packet from node , the age of node grows linearly with the passage of slots, and it drops to 0 when the packet is received.
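The slot-level model above can be simulated directly. The following Python sketch illustrates it under stated assumptions and is not the authors' simulator. The function and argument names are hypothetical, and the sketch assumes the following:

- each node transmits independently in each slot with a fixed probability;
- every attempt is charged its per-transmission cost, whether or not it collides;
- a packet is decoded only when exactly one node transmits;
- all nodes start with age 0.

```python
import random

def simulate_slots(probs, costs, num_slots, seed=0):
    """Sketch of the slotted collision channel (hypothetical helper).

    probs[i]: fixed per-slot transmission probability of node i (assumed input).
    costs[i]: cost charged to node i for every transmission attempt.
    Returns the per-node age trajectories and the total cost of each node.
    """
    rng = random.Random(seed)
    n = len(probs)
    last_success = [0] * n            # assumption: every node has age 0 at slot 0
    ages = [[] for _ in range(n)]
    total_cost = [0.0] * n
    for t in range(1, num_slots + 1):
        transmitters = [i for i in range(n) if rng.random() < probs[i]]
        for i in transmitters:
            total_cost[i] += costs[i]  # every attempt is charged, even on collision
        if len(transmitters) == 1:     # exactly one transmitter: packet is decoded
            last_success[transmitters[0]] = t
        for i in range(n):
            # age = current slot minus last successful slot (0 in a success slot)
            ages[i].append(t - last_success[i])
    return ages, total_cost
```

With two nodes that always transmit, every slot is a collision. Both ages then grow linearly while both nodes keep paying, which is the tension described next.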
Since nodes are distributed and there is no coordination/communication between them, a natural competition model emerges. Each node wants to transmit often to reduce its age, but in slots distinct from those of the other nodes. Otherwise a collision occurs, in which all the colliding nodes accrue transmission cost without any age reduction. Thus, each node inherently wants to selfishly maximize a utility function that depends on its time-averaged age and average transmission cost. The most appropriate form of the utility function is debatable. Even for a specific utility function, analytically showing that an NE exists and can be achieved may not be tractable. The basic idea behind seeking an NE is to show that there is a fixed point, or stable strategy, for each node, and that the system can be made to operate at an equilibrium.
In this paper, we take an alternate approach and reach equilibrium via a local probabilistic learning algorithm (hereafter called the learning algorithm) for each node. This algorithm decides the probability with which the node transmits its most recent packet in any slot. The learning algorithm of each node is local in the sense that it depends only on the node's own current time-averaged age, its current average transmission cost, and its past history of successes/failures in the slots in which it transmitted a packet. This approach completely eliminates the need for knowledge of network parameters, such as the number of other nodes in the network and their strategies. That knowledge would be needed under the usual technique of defining a utility function for each node and finding its NE.
With this learning algorithm, the objective is to show that it converges to a fixed point (equilibrium) when followed by all nodes autonomously. A priori, this appears to be a challenging task. However, we show in the next section that it is possible for the considered problem. In fact, we also characterize the fixed point that this learning algorithm achieves.
III The Learning Algorithm
Let be a positive integer. Divide the time axis into frames by grouping consecutive slots together. Therefore, frame consists of slot to slot . Henceforth, let refer to slot , i.e., slot in the frame. Further, let be the average transmission cost of node in frame , given by
(1) 
where is the cost per transmission for node , and is a binary random variable that takes value 1 if node transmits in slot , and 0 otherwise. Also, let be the time-averaged age of node in frame (assuming the age at the start of the frame to be 0), given by
(2) 
where is the age of node in slot where age is reset at the start of frame to be 0 (recall that denotes the last slot (relative to slot ) in which the packet of node was successfully received by the monitor).
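As a concrete reading of (1) and (2), the sketch below computes two quantities for one node and one frame: the average transmission cost, and the time-averaged age with the age reset to 0 at the start of the frame. The names are hypothetical. `success[t]` is assumed to be 1 only in slots where the node transmitted and no other node did.

```python
def frame_averages(transmit, success, cost_per_tx):
    """Per-frame averages in the spirit of (1) and (2), for one node (sketch).

    transmit[t]: 1 if the node transmitted in slot t of the frame, else 0.
    success[t]:  1 if the monitor received the node's packet in slot t, else 0.
    """
    T = len(transmit)                              # slots per frame
    avg_cost = cost_per_tx * sum(transmit) / T     # cost of all attempts, per slot
    age, age_sum = 0, 0                            # age is 0 at the start of the frame
    for s in success:
        age = 0 if s else age + 1                  # grows by one per slot, resets on reception
        age_sum += age
    return avg_cost, age_sum / T
```

For example, `frame_averages([1, 0, 1, 1], [0, 0, 1, 0], 2.0)` charges three attempts over four slots and sees ages 1, 2, 0, 1. Only these two per-frame numbers enter the end-of-frame update below.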
Consider the following learning algorithm for deciding the probability with which to transmit in any slot of a given frame. In each slot of frame , node transmits its packet with probability , where is initialized with a random value from the interval , and at the end of each frame , is given by
(3) 
where is the learning rate, while and are constants chosen by node locally (i.e., independently of other nodes in the network).
The intuition for (3) is as follows. If is high, should decrease, whereas if is high, should increase. To account for this tradeoff, the learning component of algorithm (3) consists of two additive terms: (a decreasing function in ), and (an increasing function in due to the negative sign). Because and , the value of the first term belongs to the interval while the value of the second term belongs to the interval . Therefore, and control the relative weights of and respectively, as well as the range of values the corresponding terms can take.
By a simple rearrangement of terms in (3), we obtain . For each , , and because we initialize , for and we have (because a convex combination of two terms with values in an interval also lies in the same interval). Also, the function ensures that . Hence , . Since , is a valid transmission probability.
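To make the qualitative behaviour of (3) concrete, the sketch below implements one end-of-frame update with the same ingredients:

- an exponentially weighted cost term that pushes the probability down;
- an exponentially weighted age term that pushes it up;
- a convex combination with the learning rate;
- a projection onto [0, 1].

The specific functional form, the parameters `alpha` and `beta`, the equal 1/2 weights and the schedule 1/(k+1) are assumptions chosen for illustration. This is a hypothetical stand-in, not the exact rule (3).

```python
import math

def update_probability(p, avg_cost, avg_age, eta, alpha=1.0, beta=0.1):
    """One end-of-frame update with the qualitative shape described for (3).

    Hypothetical stand-in, NOT the paper's exact rule. The cost term
    exp(-alpha * avg_cost) lies in (0, 1] and decreases with cost. The age term
    1 - exp(-beta * avg_age) lies in [0, 1) and increases with age.
    """
    target = 0.5 * math.exp(-alpha * avg_cost) + 0.5 * (1.0 - math.exp(-beta * avg_age))
    # Convex combination of two values in [0, 1] (eta in [0, 1]) stays in [0, 1].
    p_next = (1.0 - eta) * p + eta * target
    # Projection onto [0, 1]: a safeguard mirroring the projection in (3).
    return min(1.0, max(0.0, p_next))

def learning_rate(k):
    """Diminishing learning rate for frame k >= 1 (an assumed example schedule)."""
    return 1.0 / (k + 1)
```

Keeping `target` inside [0, 1] mirrors the argument above: once the update is a convex combination of values in [0, 1], the result is automatically a valid probability.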
Remark 1
For analytical tractability, we update the transmission probability for frame , i.e., in (3), using and . These account only for the average transmission cost and time-averaged age of the previous (i.e., ) frame, instead of all previous frames. We show that, for a large enough frame length , this simplification suffices.
Now, with the given description of the learning algorithm, the three main results of the paper can be summarized as follows.
Theorem 1
If all the nodes in the system obtain their transmission probabilities using the learning algorithm (3), then their transmission probabilities converge almost surely to a unique fixed point.
Theorem 1 establishes that the learning algorithm (3) achieves an equilibrium. Next, we characterize this equilibrium as follows.
Theorem 2
The unique fixed point of Theorem 1 corresponds to the NE of the non-cooperative game , where the nodes act as players and each node chooses an action to maximize its own (virtual) utility given by
(4) 
where , , and denotes the transmission probability of all the nodes in the system except node .
The utility function (4) of each node is relevant for the considered problem. It depends on the node's own transmission probability via the time-averaged age and average transmission cost, and it is decreasing in the other nodes' transmission probabilities through .
Even though an NE guarantees equilibrium, its efficiency is quantified by the price of anarchy (), which measures the loss due to selfish behaviour. For defined in (4), let be the sum of the utilities of all nodes. Then the PoA is defined as , where is the global optimum of while is the NE point. In the next theorem, we show that the PoA remains close to 1 (as desired) for the considered game with the virtual utility function (4).
Theorem 3
For the non-cooperative game defined in Theorem 2, the price of anarchy () approaches unity as becomes large.
The rest of the theoretical part of the paper is dedicated to proving the above three theorems. We first consider an expected version of the proposed learning algorithm (3), in which all random variables are replaced by their expected values. We then find the underlying virtual utility function that the expected learning algorithm is trying to maximize. Corresponding to this utility function, we identify a multi-player game and show that it has a unique NE, which is achieved by the best response strategy. To show that the proposed learning algorithm converges to a fixed point, we show that its updates converge to the best response actions for this game. The argument has two steps: i) the proposed learning algorithm converges to the best response actions for the game, and ii) the best response actions converge to the NE of the game. Together, these show that the proposed learning algorithm converges to a fixed point characterized by this NE.
Let denote the transmission probability vector of the nodes. Then we have the following lemma (proof in Appendix A).
Lemma 1
For a given , if is large,


.
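The large-frame limits can be sanity-checked by simulation under the model of Section II, assuming independent per-slot transmissions. Two predictions follow:

- The average cost per slot tends to the cost per transmission times the node's transmission probability.
- Let q be the per-slot success probability, i.e., the node's own probability times the product of (1 - p_j) over the other nodes. A renewal-reward argument then gives a time-averaged age that tends to (1 - q)/q.

We do not claim these are the exact expressions of Lemma 1. The sketch below (hypothetical function names) only compares the simulated averages with these predictions.

```python
import random

def long_frame_averages(probs, node, cost, T, seed=1):
    """Empirical average cost and time-averaged age of `node` over T slots."""
    rng = random.Random(seed)
    tx_count, age, age_sum = 0, 0, 0
    for _ in range(T):
        tx = [rng.random() < p for p in probs]    # independent transmit decisions
        tx_count += tx[node]
        success = tx[node] and sum(tx) == 1       # no collision: packet received
        age = 0 if success else age + 1
        age_sum += age
    return cost * tx_count / T, age_sum / T

def predicted_averages(probs, node, cost):
    """Renewal-reward predictions: cost * p_node and (1 - q) / q."""
    q = probs[node]
    for j, p in enumerate(probs):
        if j != node:
            q *= 1.0 - p                          # every other node must stay silent
    return cost * probs[node], (1.0 - q) / q
```

For example, with probabilities (0.3, 0.2, 0.1) and node 0, q = 0.3 x 0.8 x 0.9 = 0.216. The predicted average age is therefore about 3.63 slots, and a long simulated frame should land close to it.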
Replacing and in (3) by their converged values (assuming large and using Lemma 1), we obtain the following expected form of the learning algorithm (3):
(5) 
where , and . To avoid overloading notation, we use the same notation for the expected update (5) as in (3); the distinction will be clear from context. Note that (5) is only for analysis and cannot be used in practice, as is unknown. Using (5), we now extract the virtual utility function that (5) is trying to maximize.
Theorem 4
Let denote the transmission probabilities of all the nodes in the system except node . Then, for a given , the expected learning algorithm (5) maximizes the following virtual utility function (unique up to a constant):
(6) 
which is continuous and strictly concave for in interval , with a unique maximizer which lies in .
Iiia NonCooperative Game Model
Using the virtual utility function (6), we next define a game in which the strategy of each user is the probability with which it autonomously transmits in each slot. Let be a game with nodes as players, where each node chooses an action to maximize its own utility given by (6). The best response of a node is given by
(7) 
and under the best response strategy, at the end of each frame , . Further, a Nash equilibrium (NE) is said to exist for if there exists a transmission probability vector such that, for each node , is the best response of node given . Note that the set is nonempty, compact and convex in . Additionally, from Theorem 4, the utility function (6) is continuous and strictly concave (strict concavity also implies quasiconcavity) for . Hence, using Proposition 1, we conclude that an NE exists for .
Proposition 1
[Proposition 20.3 in [20]] The non-cooperative game has a Nash equilibrium if for all ,

the set of actions of player is a nonempty compact convex subset of a Euclidean space, and

the utility function is continuous and quasiconcave on the set of actions .
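The best response dynamics (7) can be illustrated on a toy game. The utility below is a hypothetical stand-in for (6):

u_i(p) = a log p_i - c p_i - b p_i Σ_{j≠i} p_j

It was chosen only because it is strictly concave in the node's own action and has a closed-form best response, p_i = min(1, a / (c + b Σ_{j≠i} p_j)). Like (6), it has the feature that a node's incentive to transmit falls as the others transmit more. All nodes update simultaneously, as at the end of each frame. The constants a, b, c are arbitrary illustrative values.

```python
import math

def best_response(others_sum, a=0.1, c=1.0, b=0.5):
    """Maximizer over (0, 1] of the toy utility a*log(p) - c*p - b*p*others_sum."""
    # The derivative a/p - c - b*others_sum vanishes at a / (c + b*others_sum);
    # by concavity, if that point exceeds 1 the maximizer on (0, 1] is p = 1.
    return min(1.0, a / (c + b * others_sum))

def best_response_dynamics(p0, num_rounds=100):
    """Simultaneous best responses, each node reacting to the previous profile."""
    p = list(p0)
    for _ in range(num_rounds):
        total = sum(p)
        p = [best_response(total - p_i) for p_i in p]
    return p

# Five nodes: the symmetric NE solves p = 0.1 / (1 + 2p), i.e. 2p^2 + p - 0.1 = 0.
p_star = (math.sqrt(1.8) - 1.0) / 4.0
```

In this toy, each best response moves by at most a*b/c^2 = 0.05 per unit change of another node's probability. The update is therefore a contraction, and every starting profile converges to the same point. This is the kind of behaviour Theorem 5 establishes for the actual game under (8).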
Next, we show that the best response strategy (7) for converges to the unique NE.
III-B Convergence of Best Response Strategy
Theorem 5
For the non-cooperative game , if for each node ,
(8) 
(where ), then the best response strategy converges to the unique NE.
Theorem 5 is proved in Appendix D. Note that (8) depends on the values of and . We have assumed that the value of is unknown to the nodes. Next, we show that each node can choose its parameters based on a predetermined value of , independent of the actual value of , such that (8) is satisfied for all values of . The result is summarized in the following lemma (detailed proof in Appendix E).
Lemma 2
Remark 2
As per Lemma 2, . Suppose that for each node , , then , which is a global constant (independent of ). Now from (III), note that for each node , the trajectory of its transmission probability is determined by , and . When is independent of , then as per Lemma 2, and are also independent of . Therefore, if for each node , , then the trajectory of transmission probability is independent of .
In summary, we conclude that each node can independently choose the parameters such that the condition (8) for convergence of the best response strategy is satisfied for all values of .
III-C Convergence of Learning Algorithm (3) to the Best Response Strategy (7)
Definition 1
For the real-valued concave function , is said to be its subgradient at point if, for every other point , we have . Further, a function (where is time) is called a stochastic subgradient of at point if, for a given value of the random variables , is a subgradient of at .
Consider the function defined as below:
(10) 
Taking conditional expectation on both sides of (10), we get
(11) 
where (a) is obtained for large using Lemma 1, with and denoting and respectively, and (b) is obtained using (6). Further, due to Theorem 4, we know that is a concave function in (for fixed ). Therefore, for and ,
(12) 
Hence, using (11) and (12) along with Definition 1, we conclude that is a stochastic subgradient of . Now, using (10), we can write the learning algorithm (3) as
(13) 
which suggests that the learning algorithm can be interpreted as a stochastic subgradient algorithm [2] which maximizes the virtual utility function given by (6). Using this interpretation of the learning algorithm we obtain Theorem 6, with detailed proof in Appendix F.
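The stochastic-subgradient view can be illustrated on a toy problem. The sketch below runs projected stochastic (super)gradient ascent over [0, 1] on the hypothetical concave function f(p) = -(p - 0.3)^2, not on the paper's virtual utility. It observes only gradient values corrupted by zero-mean Gaussian noise. The step size is 1/k, an assumed schedule that satisfies Σ η = ∞ and Σ η² < ∞. The sketch shows the mechanism behind (13), where noisy per-frame observations play the role of the noisy gradient.

```python
import random

def projected_stochastic_ascent(noisy_grad, p0, num_iters, seed=0):
    """Projected stochastic gradient ascent on [0, 1] with step size 1/k."""
    rng = random.Random(seed)
    p = p0
    for k in range(1, num_iters + 1):
        eta = 1.0 / k                    # diminishing step: sum = inf, sum of squares < inf
        p = p + eta * noisy_grad(p, rng)
        p = min(1.0, max(0.0, p))        # projection back onto the feasible set [0, 1]
    return p

def toy_noisy_grad(p, rng):
    """Unbiased but noisy gradient of the toy objective f(p) = -(p - 0.3)^2."""
    return -2.0 * (p - 0.3) + rng.gauss(0.0, 0.5)

p_final = projected_stochastic_ascent(toy_noisy_grad, 0.9, 20000)
```

Despite the noise, the iterate settles near the maximizer 0.3. A constant step size would instead leave it fluctuating at a fixed noise level. This illustrates why a decreasing learning rate is used in (3).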
Theorem 6
Theorem 6 implies that, for properly chosen learning rates (for example, ), the learning algorithm converges to the best response strategy almost surely. If (8) is satisfied, then by Theorem 5 the best response strategy further converges to a unique NE. Combining Theorem 5 and Theorem 6, suppose all nodes update their transmission probabilities by following the learning algorithm (3) with an appropriate learning rate and with parameters satisfying (8). Then their transmission probabilities converge almost surely to a unique NE (a fixed point). This proves Theorem 1 and Theorem 2 simultaneously. The proof of Theorem 3 is given in Appendix G.
IV Numerical Results
We analyzed the convergence properties of the learning algorithm (3) by simulating a scenario with 10 nodes and , , and chosen as per Lemma 2, for different values of . Also, . As shown in Fig. 2, the transmission probability obtained using the learning algorithm converges to the best response strategy very quickly.
To check the robustness of the learning algorithm under dynamic conditions, we performed a second simulation. It starts with 3 nodes at ; 7 new nodes join the system at and leave it again at . As shown in Fig. 3, the learning algorithm converges to the best response strategy despite these disturbances. Note, however, that the learning rate decreases with , so if the system is disturbed at large , convergence is slow. This issue can be resolved by reinitializing whenever it becomes very large.
To understand the effect of the number of nodes on the fixed point of the learning algorithm (3), Fig. 4 plots the converged transmission probability obtained using the learning algorithm (3) for different values of . For the simulation, we used and, for each node , , while , and were chosen as per Lemma 2. For comparison, Fig. 4 also plots the transmission probability for the round-robin (RR) scheme, in which each node is assigned a slot in round-robin fashion to avoid collisions. With RR, the nodes may transmit their packets only in their respective allotted slots, with transmission probability obtained using (3). However, RR is only of theoretical interest because in practice there is no mechanism for slot allotment: the nodes cannot communicate with each other, and there is no centralized controller to allot slots.
Remark 3
When the number of nodes is small, the interval between consecutive allotted slots of each node in RR is also small. Therefore, depending on its transmission cost, it may not be optimal for a node to transmit a packet in every allotted slot. To account for this, we consider that in RR a node transmits in its allotted slot with probability obtained using (3). Further, because of this specific choice of (3) for obtaining the transmission probability under RR, comparing the corresponding plots for the learning algorithm and RR provides useful insight into the impact of collisions on the learning algorithm (3).
For the learning algorithm (3), as increases, two phenomena simultaneously influence the transmission probability:

(i) For large , packet collisions are frequent. The average transmission cost therefore increases with , which decreases the transmission probability.

(ii) For large , more collisions happen and fewer packets are received by the monitor. The time-averaged age therefore becomes high, which increases the transmission probability.

As shown in Fig. 4, for small , phenomenon (ii) dominates, thereby increasing the transmission probability. However, when is large, the two phenomena balance each other, and hence the transmission probability saturates.
For the round-robin scheme, as increases, the interval between successive allotted slots of each node becomes large. Therefore, for a fixed transmission probability, the average transmission cost decreases, whereas the time-averaged age increases, and both effects increase the transmission probability. Hence, the transmission probability under RR increases much more rapidly with than under the learning algorithm.
Additionally, we analyzed how the time-averaged age varies with for the learning algorithm as well as for the round-robin scheme. As shown in Fig. 5, the time-averaged age for the learning algorithm increases much more rapidly with than for the round-robin scheme. If a packet from node is successfully received by the monitor once every slots, then using (2) and assuming ( is the number of slots in each frame) to be an integer, we get the time-averaged age to be
(14) 
In the round-robin scheme, a packet is successfully received every slots, where is the number of allotted slots per transmission for node . As shown in Fig. 4, the transmission probability increases with , and hence decreases (approaching 1) as increases. Therefore, under the round-robin scheme, the time-averaged age increases as , which converges to when is large. Fig. 5 shows a similar trend, as can be verified using the transmission probability values from Fig. 4.
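The arithmetic behind (14) can be checked under the conventions of (2): the age is 0 at the start of the frame, grows by one per slot, and drops to 0 in each slot where a packet is received. Let m be a hypothetical name for the inter-reception interval. If a reception happens once every m slots and m divides the frame length, each period contributes ages 1, 2, ..., m - 1, 0. The frame average is therefore (m - 1)/2. The sketch computes the average directly, so this closed form can be compared with (14).

```python
def periodic_frame_age(m, T):
    """Time-averaged age over a T-slot frame with a reception every m slots.

    Follows the conventions of (2): age 0 at the frame start, +1 per slot,
    reset to 0 in a reception slot (slots m, 2m, ..., T).
    """
    if T % m != 0:
        raise ValueError("the frame length must be a multiple of m")
    age, total = 0, 0
    for t in range(1, T + 1):
        age = 0 if t % m == 0 else age + 1
        total += age
    return total / T
```

Under round robin with one allotted slot every n slots, a node that uses nearly every allotted slot has m close to n. Its average age is then close to (n - 1)/2, i.e., linear in the number of nodes.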
Now, for the learning algorithm (3), the probability that a packet of node is received by the monitor is . Suppose the transmission probability of each node is the same (say, ). Then the probability that a packet of node is received by the monitor becomes , and hence the expected number of slots required for each successful reception by the monitor is . Since the transmission probability saturates for large (Fig. 4), the time-averaged age for the learning algorithm grows exponentially as increases.
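The geometric growth can be quantified directly. Suppose there are n nodes, each transmitting independently with the same probability p (names assumed for illustration). A node then succeeds in a slot with probability p(1 - p)^(n - 1). The number of slots between its successes is therefore geometric with mean 1/(p(1 - p)^(n - 1)), and for fixed p, each additional node multiplies this mean by 1/(1 - p). For comparison, the sketch also gives the round-robin interval n/r when a node uses its allotted slot with probability r. It also locates the symmetric probability that minimizes the random-access interval, which is p = 1/n.

```python
def random_access_interval(p, n):
    """Mean slots between successes of one node when all n nodes transmit w.p. p."""
    success = p * (1.0 - p) ** (n - 1)   # the node transmits and the n-1 others stay silent
    return 1.0 / success                 # mean of a geometric number of slots

def round_robin_interval(n, r):
    """Mean slots between successes when a node owns every n-th slot and uses it w.p. r."""
    return n / r

def best_symmetric_probability(n, grid=1000):
    """Grid search for the p minimizing the random-access interval."""
    candidates = [k / grid for k in range(1, grid)]
    return min(candidates, key=lambda p: random_access_interval(p, n))

for n in (2, 5, 10, 20):
    print(n, round(random_access_interval(0.2, n), 1), round_robin_interval(n, 1.0))
```

Even at the best symmetric probability 1/n, the random-access interval is n/(1 - 1/n)^(n - 1), roughly e times n for large n. At a fixed, saturated probability the gap to round robin widens exponentially. This is consistent with the trend in Fig. 5.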
Finally, we also computed the price of anarchy () for the utility function (4) of each node. For any transmission probability vector of the nodes given by , the overall utility of the system is , where is the utility of node . Therefore, the PoA of the learning algorithm is
(15) 
where is the optimal transmission probability vector that maximizes , while is the vector of converged transmission probabilities obtained using the learning algorithm (3). Note that , and a value close to 1 indicates that the algorithm is close to optimal.
V Conclusion
In this paper, we have presented a new direction for achieving equilibrium in a distributed IoT setting, where each node is interested in minimizing its age of information when there is a cost for each transmission. Typically, for distributed models, one identifies a utility function for each node and tries to establish an NE for it. However, such an approach requires network knowledge, e.g., the number of nodes in the network and their strategies, which may not be available in a distributed network. We instead propose a simple local update (learning) strategy for each node that determines the probability with which to transmit in each slot, based on the current empirical averages of age and cost. For an appropriate choice of parameters, this strategy is shown to achieve an equilibrium that is also identified as an NE of a suitable virtual game. To further quantify the efficiency of this learning strategy, we show that the price of anarchy of the virtual game approaches unity when the number of nodes in the network is large enough.
References
 [1] (1998) Individual equilibrium and learning in processor sharing systems. Operations Research 46 (6), pp. 776–784. Cited by: §I.
 [2] (2008) Stochastic subgradient methods. Lecture Notes for EE364b, Stanford University. Cited by: §IIIC.
 [3] (2013) Distributed spectrum access with spatial reuse. IEEE Journal on Selected Areas in Communications 31 (3), pp. 593–603. Cited by: §I.
 [4] (2010) On learning algorithms for nash equilibria. In International Symposium on Algorithmic Game Theory, pp. 114–125. Cited by: §I.
 [5] (1988) Numerical techniques for stochastic optimization. SpringerVerlag. Cited by: Appendix F.
 [6] (1997) Learning and implementation on the internet. Manuscript. New Brunswick: Rutgers University, Department of Economics. Cited by: §I.
 [7] (2017) Scheduling algorithms for minimizing age of information in wireless broadcast networks with random arrivals: the nobuffer case. arXiv preprint arXiv:1712.07419. Cited by: §I, §I, §II.
 [8] (2015) Optimizing ageofinformation in a multiclass queueing system. arXiv preprint arXiv:1504.05103. Cited by: §I, §I.
 [9] (2018) Scheduling algorithms for optimizing age of information in wireless networks with throughput constraints. In INFOCOM, 2018 Proceedings IEEE, Cited by: §I, §I, §I, §II.
 [10] (2010) Opportunistic medium access in multichannel wireless systems: a learning approach. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pp. 1288–1294. Cited by: §I.
 [11] (2012) Status updates through queues. In 2012 46th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6. Cited by: §I.
 [12] (2012) Realtime status: how often should one update?. In INFOCOM, 2012 Proceedings IEEE, pp. 2731–2735. Cited by: §I, §I.

[13]
(2009)
Multiplicative updates outperform generic noregret learning in congestion games.
In
Proceedings of the fortyfirst annual ACM symposium on Theory of computing
, pp. 533–542. Cited by: §I.  [14] (2017) Age of information: a new concept, metric, and tool. Foundations and Trends® in Networking 12 (3), pp. 162–259. Cited by: §I.
 [15] (2015) Online learning of nash equilibria in congestion games. SIAM Journal on Control and Optimization 53 (2), pp. 1056–1081. Cited by: §I.
 [16] (2012) Discrete event stochastic processes. Lecture Notes for Engineering Curriculum. Cited by: Appendix B, Theorem 7.
 [17] (2013) Designing games for distributed optimization. IEEE Journal of Selected Topics in Signal Processing 7 (2), pp. 230–242. Cited by: §I.
 [18] (2013) Distributed welfare games. Operations Research 61 (1), pp. 155–168. Cited by: §I.
 [19] (2018) Content based status updates. arXiv preprint arXiv:1801.04067. Cited by: §I, §I.
 [20] (1994) A course in game theory. MIT press. Cited by: Proposition 1.
 [21] (2009) Stochastic learning solution for constrained Nash equilibrium throughput in non-saturated wireless collision channels. In Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools, pp. 61. Cited by: §I.
 [22] (2017) A note on the Birkhoff ergodic theorem. Results in Mathematics 72 (1-2), pp. 715–730. Cited by: Appendix A.
 [23] (2007) If multi-agent learning is the answer, what is the question? Artificial Intelligence 171 (7), pp. 365–377. Cited by: §I.
 [24] (2018) Age-optimal updates of multiple information flows. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 136–141. Cited by: §I, §I, §II.
 [25] (2017) Update or wait: how to keep your data fresh. IEEE Transactions on Information Theory 63 (11), pp. 7492–7508. Cited by: §I.
 [26] (2006) Reverse engineering MAC. In 2006 4th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, pp. 1–11. Cited by: §I.
 [27] (2009) Analysis II, Texts and Readings in Mathematics, vol. 38. Hindustan Book Agency, New Delhi. Cited by: Theorem 8.
 [28] (2017) When to arrive in a congested system: achieving equilibrium via learning algorithm. In 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 1–8. Cited by: §I.
 [29] (2017) Age of information in multi-source systems. In GLOBECOM 2017 - 2017 IEEE Global Communications Conference, pp. 1–6. Cited by: §I, §II.
 [30] (2016) The age of information: real-time status updating by multiple sources. arXiv preprint arXiv:1608.08622. Cited by: §I, §I.
Appendix A Proof of Lemma 1
From (1), , where
has a Bernoulli distribution (it takes value 1 with probability
, and 0 otherwise). Since are independent and identically distributed, when is large we obtain relation (1) from the strong law of large numbers.
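To make the strong-law step concrete, the sketch below assumes, as an illustration only, that a node transmits independently in every slot with a fixed probability `p`, so the per-slot transmission indicators are i.i.d. Bernoulli(`p`). The function name `simulate_transmission_rate` and its parameters are hypothetical; the point is only that the empirical fraction of transmitting slots settles at `p` as the number of slots grows.

```python
import random


def simulate_transmission_rate(p: float, num_slots: int, seed: int = 0) -> float:
    """Empirical fraction of slots in which a node transmits.

    Assumption (illustrative only): in each slot the node transmits
    independently with probability p, so the per-slot indicators are
    i.i.d. Bernoulli(p) and their running average converges to p
    almost surely (strong law of large numbers).
    """
    rng = random.Random(seed)
    # Count slots whose Bernoulli(p) indicator equals 1.
    transmissions = sum(1 for _ in range(num_slots) if rng.random() < p)
    return transmissions / num_slots
```

For example, `simulate_transmission_rate(0.3, n)` drifts toward 0.3 as `n` goes from `100` to `10**6`; a fixed seed keeps the run reproducible.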
Now, note that , where is the event that a packet transmitted by node is not received by the monitor in slot . Hence,
(16) 
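Under the collision model of §I (a packet is lost whenever two or more nodes transmit in the same slot), the failure event above is the event that at least one other node also transmits. The sketch below assumes, for illustration, that node `j` transmits independently with probability `p[j]`; all function names are hypothetical. It compares the closed form 1 - ∏_{j≠i}(1 - p_j) with a Monte Carlo estimate.

```python
import math
import random
from typing import Sequence


def prob_others_silent(p: Sequence[float], i: int) -> float:
    """P(no node other than i transmits in a slot), nodes acting independently."""
    return math.prod(1.0 - pj for j, pj in enumerate(p) if j != i)


def prob_failure_given_transmit(p: Sequence[float], i: int) -> float:
    """P(packet of node i is lost | node i transmits): some other node collides."""
    return 1.0 - prob_others_silent(p, i)


def simulate_failure_given_transmit(
    p: Sequence[float], i: int, num_slots: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same conditional loss probability."""
    rng = random.Random(seed)
    sent = lost = 0
    for _ in range(num_slots):
        tx = [rng.random() < pj for pj in p]  # who transmits in this slot
        if tx[i]:
            sent += 1
            if sum(tx) > 1:  # another node transmitted too: collision
                lost += 1
    if sent == 0:
        raise ValueError("node i never transmitted; increase num_slots")
    return lost / sent
```

With `p = [0.2, 0.5, 0.4]`, node 0 loses a transmitted packet with probability 1 - 0.5 · 0.6 = 0.7, and the simulated estimate lands close to that value.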
We also have the following lemma (proved in Appendix B):
Lemma 3
For a fixed , the sequence
is an ergodic uniform Markov chain.
Appendix B Proof of Lemma 3
Within a frame , is fixed, and hence,
where is the probability that the packet transmitted by node in a slot of frame is successfully received by the monitor. Therefore, for a given state , can be written independently of , and the transition probability is independent of . Hence, the sequence is a uniform (time-homogeneous) Markov chain. Further, note that , as well as . Therefore, the Markov chain is also aperiodic (i.e., its period is 1). Hence, to prove that the Markov chain is an ergodic uniform Markov chain, it is sufficient to show that it is positive recurrent.
Also, from any state , any other state can be reached in a finite number of steps (slots) with positive probability, given by if , and if . Hence, all states communicate, i.e., the Markov chain forms a single communicating class, so to show that it is positive recurrent, it is sufficient to show that any one particular state is positive recurrent [16].
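As an illustration of the chain structure used above, the sketch below assumes a simplified age recursion that is not spelled out in this appendix: in each slot the age resets to 1 with a fixed success probability `q` and otherwise grows by 1. Under that assumption, transitions depend only on the current state (homogeneity), state 1 has a self-loop with probability `q > 0` (aperiodicity), and every state reaches every other (irreducibility); the stationary law is geometric, π(a) = q(1 - q)^(a - 1). The names `simulate_age_chain` and `stationary_prob` are hypothetical.

```python
import random
from collections import Counter


def simulate_age_chain(q: float, num_slots: int, seed: int = 0) -> Counter:
    """Visit counts of the assumed age chain A(t+1) = 1 w.p. q, else A(t) + 1.

    The transition rule depends only on the current age, never on t,
    so the chain is time-homogeneous.
    """
    rng = random.Random(seed)
    age = 1
    visits: Counter = Counter()
    for _ in range(num_slots):
        age = 1 if rng.random() < q else age + 1
        visits[age] += 1
    return visits


def stationary_prob(q: float, a: int) -> float:
    """Geometric stationary law pi(a) = q * (1 - q)**(a - 1), a = 1, 2, ..."""
    return q * (1.0 - q) ** (a - 1)
```

Comparing `visits[a] / num_slots` with `stationary_prob(q, a)` for small `a` shows the empirical occupation frequencies approaching the stationary law, as expected for an ergodic chain.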
Let denote the probability of returning to state in step. Then , we have . Hence,
(17) 
Therefore using Theorem 7, we conclude that the state is positive recurrent, thereby proving Lemma 3.
Theorem 7
[Theorems 2.4-2.5 in [16]] If , then the state is recurrent. Additionally, if , then is positive recurrent.
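The two conditions of Theorem 7 can be checked numerically for the simplified age chain in which the age resets to 1 with probability `q` in each slot (an assumption made only for this sketch). Starting from state 1, the first return at step n requires n - 1 failures followed by a success, so f(n) = (1 - q)^(n - 1) q. Then Σ f(n) = 1 (recurrence) and Σ n f(n) = 1/q < ∞ (positive recurrence). The function name `first_return_sums` is hypothetical.

```python
def first_return_sums(q: float, n_max: int) -> tuple[float, float]:
    """Partial sums of f(n) and n * f(n) for f(n) = (1 - q)**(n - 1) * q.

    f(n) is the first-return probability to state 1 of the assumed age
    chain; the first sum tends to 1 and the second to the mean return
    time 1 / q as n_max grows.
    """
    total = 0.0
    mean_return_time = 0.0
    for n in range(1, n_max + 1):
        f_n = (1.0 - q) ** (n - 1) * q
        total += f_n
        mean_return_time += n * f_n
    return total, mean_return_time
```

For `q = 0.2`, the partial sums approach 1 and 5, matching Σ f(n) = 1 and a mean return time of 1/q.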
Appendix C Proof of Theorem 4
If the learning algorithm converges to the maximizer , then it should satisfy:
(18)  
(19) 
Therefore using (18) and (19), we can write
(20) 
Integrating both sides of (20) with respect to , we obtain (6) (with as the constant of integration), which is continuous and strictly concave () for in the interval . Also, is continuous, and it can be verified that for , , while for , . Hence, by continuity, there exists at which , and is the unique maximizer because of the strict concavity of .
However, note that on solving (19), we get
(21) 
Since , using (21) we get , and since , it follows that . Hence, .
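The uniqueness argument above is the standard one: if a continuous, strictly concave function has a derivative that is positive near the left end of the interval and negative near the right end, its derivative crosses zero exactly once, and that point is the unique maximizer. The sketch below locates such a point by bisection on the derivative. The objective U(x) = -1/(s x) - c x and the values `s = 0.5`, `c = 8.0` are hypothetical stand-ins, not the function in (6); U is strictly concave on (0, 1] with U'(x) = 1/(s x²) - c, whose zero is 1/√(c s).

```python
from typing import Callable


def bisect_root(
    g: Callable[[float], float], lo: float, hi: float,
    tol: float = 1e-12, max_iter: int = 200,
) -> float:
    """Zero of a strictly decreasing g on [lo, hi], given g(lo) > 0 > g(hi)."""
    if not (g(lo) > 0 > g(hi)):
        raise ValueError("need g(lo) > 0 > g(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # derivative still positive: maximizer lies to the right
        else:
            hi = mid  # derivative non-positive: maximizer lies to the left
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical strictly concave objective U(x) = -1/(s*x) - c*x (illustration only).
s, c = 0.5, 8.0


def dU(x: float) -> float:
    """Derivative U'(x) = 1/(s*x**2) - c, strictly decreasing for x > 0."""
    return 1.0 / (s * x * x) - c


x_star = bisect_root(dU, 1e-3, 1.0)  # analytic maximizer: 1/sqrt(c*s) = 0.5
```

Bisection is used here only because the sign change of the derivative is exactly what the proof relies on; any root finder for a monotone function would do.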
Remark 4
Note that (18) follows from (III), which uses Lemma 1 and hence assumes the limit . Additionally, for the convergence of , we assume , and by Theorem 6, as . Therefore, for (20) to hold, we first take the limit , and then the limit . If the order of the two limits were exchanged, would converge to 0 before converges to , and hence (18) and (19) would not be satisfied.
Appendix D Proof of Theorem 5
The best-response strategy for the non-cooperative game can be expressed as a function , where and are dimensional vectors. To prove Theorem 5, we use the contraction mapping theorem:
Theorem 8
[Theorem 6.6.4 in [27]] In a metric space , a function is called a strict contraction if there exists a constant such that , . Additionally, if is nonempty and compact, then has a unique fixed point, i.e., there exists a unique such that , and every sequence of the form converges to .
Let be the metric space with the infinity norm as the distance metric. Then for any ,
(22) 
where in (a), is the Jacobian (whose elements are given by ), and the matrix norm is induced by the vector norm. Also, is nonempty (since ) and compact. Hence, to prove the existence of a unique fixed point for using Theorem 8, it is sufficient to show that .
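The argument above bounds the contraction constant by the matrix norm induced by the infinity norm, which is the maximum absolute row sum of the Jacobian. The sketch below illustrates Theorem 8 on a hypothetical best-response map over the compact set [0, 1]^3 (not the paper's map): node i plays clip(b_i - Σ_j A_ij x_j, 0, 1). The affine part has Jacobian -A with induced infinity norm 0.5, and clipping to [0, 1] is 1-Lipschitz in each coordinate, so the map is a strict contraction; fixed-point iteration from two different starting points then reaches the same fixed point. `A`, `b`, and all function names are assumptions for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical coupling matrix and offsets (illustrative, NOT from the paper).
A = [[0.0, 0.3, 0.2],
     [0.1, 0.0, 0.4],
     [0.25, 0.25, 0.0]]
b = [0.8, 0.7, 0.9]


def induced_inf_norm(J: Sequence[Sequence[float]]) -> float:
    """Matrix norm induced by the vector infinity norm: max absolute row sum."""
    return max(sum(abs(v) for v in row) for row in J)


def best_response(x: Sequence[float]) -> List[float]:
    """Hypothetical best response: clip(b_i - sum_j A[i][j] * x[j], 0, 1)."""
    return [
        min(1.0, max(0.0, b[i] - sum(a * xj for a, xj in zip(A[i], x))))
        for i in range(len(b))
    ]


def fixed_point(
    f: Callable[[Sequence[float]], List[float]], x0: Sequence[float],
    tol: float = 1e-12, max_iter: int = 1000,
) -> List[float]:
    """Iterate x <- f(x) until successive iterates agree in the infinity norm."""
    x = list(x0)
    for _ in range(max_iter):
        x_next = f(x)
        if max(abs(u - v) for u, v in zip(x_next, x)) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")
```

Here `induced_inf_norm(A)` is 0.5 < 1, and `fixed_point(best_response, [0, 0, 0])` and `fixed_point(best_response, [1, 1, 1])` agree to within the tolerance, as the contraction mapping theorem predicts.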
Lemma 4
If , then .