Coordinated Random Access for Industrial IoT With Correlated Traffic By Reinforcement-Learning

09/17/2021 ∙ by Alberto Rech, et al. ∙ Università di Padova

We propose a coordinated random access scheme for industrial internet-of-things (IIoT) scenarios, with machine-type devices (MTDs) generating sporadic correlated traffic. This occurs, e.g., when external events trigger data generation at multiple MTDs simultaneously. Time is divided into frames, each split into slots, and each MTD randomly selects one slot for (re)transmission, with probability density functions (PDFs) specific to both the MTD and the number of the current retransmission. PDFs are locally optimized to minimize the probability of packet collision. The optimization problem is modeled as a repeated Markov game with incomplete information, and the linear reward-inaction algorithm is used at each MTD, which provably converges to a deterministic (suboptimal) slot assignment. We compare our solution with both the slotted ALOHA and the min-max pairwise correlation random access schemes, showing that our approach achieves a higher network throughput with moderate traffic intensity.




I Introduction

Machine-type communications (MTC) are considered a key emerging application of 5G-and-beyond cellular networks, and the technology should be updated to support them. The sporadic nature of transmissions by a large number of machine-type devices (MTDs) makes the current uplink multiple access scheme, based on resource request and grant, inefficient. Thus, a random access (RA) solution is to be preferred. Uncoordinated RA [1] has been advocated as effective in dealing with collisions, while entailing a limited communication overhead. Still, the high density of MTDs in 5G-and-beyond networks greatly increases the chances of collisions in the absence of coordination. In particular, in industrial internet-of-things (IIoT) scenarios, the uplink traffic generated by MTDs may be highly correlated, as a result of common underlying traffic generation phenomena. For example, close-by temperature sensors in a production line may send signals almost simultaneously, as they sense the same variation of temperature. On the one hand, this correlation further increases the chances of collisions; on the other hand, it can be exploited to indirectly coordinate RA, to satisfy the strict throughput and latency requirements of IIoT applications.

In the literature, several coordinated RA approaches have been proposed. A first and widely used solution is slotted ALOHA (sALOHA), where time is organized in slots, MTDs transmit at the beginning of the first slot after the packet generation, and, in case of collisions, a random delay is added before retransmission. Typically, the random delay has the same statistics for all MTDs and the coordination is limited to the synchronization of slots. An ultra-reliable low-latency communication (URLLC) scenario, wherein a set of devices compete for a limited number of slots in uplink, is considered in [2]: an iterative online learning algorithm running at each device updates the slot selection, based on the achieved latent throughput. However, the correlation in the packet generation process is not exploited. Instead, traffic correlation has been considered in a machine-to-machine (M2M) scenario [3], where MTDs are clustered and a compressed sensing algorithm is applied to allocate resources to the clusters. Still, clustering entails a significant overhead, greatly reducing the efficiency of the RA scheme. An extreme case of coordinated RA is the fast uplink grant, where each MTD is assigned a single slot, shared with other MTDs, thus collisions may still occur. Under a correlated traffic scenario, the min-max pairwise correlation (MMPC) scheme [4] assigns slots by grouping MTDs according to their correlation in packet generation. MMPC is designed for a system without retransmissions in case of collisions, although retransmission is a useful feature in many scenarios. A traffic prediction-based approach for fast uplink grant is proposed in [5], where packet generation and transmission are modeled by a hidden Markov model (HMM), and the slot allocation aims at minimizing the average packet age of information. Also in this case, retransmissions are not considered. Moreover, both [4] and [5] are centralized solutions, where the gNB allocates slots and communicates the allocation to the MTDs, thus suffering from a communication overhead.

In this paper, in the context of a cellular system supporting IIoT, we propose a novel coordinated uplink RA solution under correlated traffic, which aims at overcoming the limitations of current coordinated RA solutions. Time is divided into frames, each split into slots, and each MTD randomly selects a slot for its transmissions. The PDF for random slot selection is designed specifically for each MTD and for the number of the current retransmission. The PDFs are obtained by an iterative approach, carried out locally at each MTD, to minimize the probability of collision. To this end, we first model the distributed optimization problem as a repeated Markov game with incomplete information, where MTDs are the players and transmission slots are the actions. Then, we resort to the linear reward-inaction (LRI) algorithm for the PDF optimization. The LRI algorithm provably converges to a (suboptimal) pure strategy, thus each MTD will deterministically select its transmission slot, possibly in a different position at each retransmission. Lastly, we compare our solution with the sALOHA and MMPC RA schemes, showing that our approach achieves the highest network throughput with moderate traffic correlation and intensity.

The rest of the paper is organized as follows. In Section II we introduce the system model of correlated packet generation and slotted coordinated RA. The Markov game model describing our distributed optimization problem and the proposed reinforcement-learning algorithm are both presented in Section III. In Section IV we discuss the numerical results and compare our LRI scheme with existing RA schemes. Finally, in Section V we draw some conclusions.

Notation: vectors are denoted in lower-case bold, matrices in upper-case bold. $\mathbb{P}(\cdot)$ and $\mathbb{E}[\cdot]$ are the probability and expectation operators, respectively.

II System Model

We consider a cellular network with static MTD. Each MTD transmits in uplink to a gNB. Time is split into frames, each split into slots. We first describe the packet generation procedure, and then the RA protocol.

II-A Packet Generation

Let be the indicator function of packet generation, i.e., if MTD generates a packet at frame , while otherwise; let us also define the row vector . Packet generations are triggered by events common to multiple (random) MTD, therefore variables ,

are correlated. In particular, the packet generation statistics is described through the joint probability distribution


where . Moreover, let be the marginal probability of packet generation at MTD . In Section IV, we will consider a specific correlated traffic generation model, while the derivation of our proposed RA scheme holds in general for any correlated traffic.

We assume that each MTD can store only one packet for transmission and an MTD that already stores a packet will drop other generated packets. Packets are generated at the end of each frame and stored (one per MTD), then their transmission starts in the next frame. At each frame , packets may be generated at MTD with a joint probability, according to the underlying process (e.g., detection of temperature variation in an industrial line).

II-B RA Scheme

According to a coordinated RA protocol, each MTD with a stored packet attempts to transmit it in each frame, selecting slot , until either the packet is successfully delivered to the gNB, or a maximum number of transmissions is reached. In the latter case, the packet is discarded.

Let , , indicate that MTD in frame is performing the -th transmission attempt of its packet. We also set if , and define the vector . If the maximum number of transmission attempts is reached in frame , the packet is discarded, and at the next frame we have .

The probability of MTD transmitting in slot at the -th attempt is


Thus, the PDF of the slot selected for transmission by MTD in frame is , which depends on the number of transmissions of the current packet.

The PDF define the RA scheme. For example, a SALOHA protocol selects the slot uniformly at random and independently for MTD, i.e., . The design of MTD PDF is the subject of this paper, and will be discussed in the next sections.
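As an illustration of how the per-attempt slot distributions define the RA scheme, the following Python sketch stores one probability row per transmission attempt and draws a slot from the row of the current attempt. The names `uniform_strategy` and `select_slot`, the list-of-rows layout, and the parameter values are hypothetical and not taken from the paper; the uniform rows correspond to the SALOHA-like selection described above.

```python
import random

def uniform_strategy(max_attempts, num_slots):
    """One row per transmission attempt, each uniform over the slots."""
    return [[1.0 / num_slots] * num_slots for _ in range(max_attempts)]

def select_slot(strategy, attempt, rng):
    """Draw a slot index for the given attempt (1-based) from its row."""
    row = strategy[attempt - 1]
    return rng.choices(range(len(row)), weights=row)[0]

# Illustrative values only (not the paper's parameters).
rng = random.Random(0)
strategy = uniform_strategy(max_attempts=3, num_slots=4)
slot = select_slot(strategy, attempt=1, rng=rng)  # an index in 0..3
```

Keeping a separate row per attempt is what lets a scheme use a different slot distribution for each retransmission of the same packet.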

II-C Collision Model and gNB Feedback

A collision occurs whenever two or more MTD schedule their transmissions in the same slot. In this case, we assume that the gNB observes an erasure and cannot decode any packet, thus all transmissions fail. In absence of collisions, we assume that the gNB always correctly receives the packet.
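The collision rule above can be sketched in a few lines of Python. This is only an illustrative model of the stated assumptions (a slot chosen by exactly one MTD is always decoded, any shared slot is an erasure for everyone in it); the function name `resolve_frame` and the dictionary interface are hypothetical.

```python
from collections import Counter

def resolve_frame(slot_choices):
    """Map each transmitting MTD to True (delivered) or False (collided).

    slot_choices: dict {mtd_id: slot_index} for the MTDs that transmit
    in the current frame.
    """
    occupancy = Counter(slot_choices.values())
    # A transmission succeeds only if its slot is used by exactly one MTD.
    return {mtd: occupancy[slot] == 1 for mtd, slot in slot_choices.items()}

# MTDs 0 and 1 share slot 1 and collide; MTD 2 is alone in slot 3.
outcome = resolve_frame({0: 1, 1: 1, 2: 3})
```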


be the binary variable representing the outcome of the transmission of MTD

at frame , thus if the transmission is successful, and otherwise. The success probability at frame is


At the end of frame , the gNB sends in unicast the acknowledgement to each MTD for which the packet was successfully received.

Knowledge Assumptions

The statistics of packet generation are not known and the acknowledgments are sent in unicast, thus the outcome of transmissions is known only to the transmitting MTD.

III PDF Optimization

We now propose a fully distributed algorithm for the optimization of the MTD PDF . The algorithm operates locally at each MTD, with the aim of maximizing the MTD individual throughput, i.e., minimizing the number of retransmissions.

To this end, we first model the RA scheme as a Markov game, wherein MTDs are the players competing in the slot selection. Markov games are of particular interest as they represent a specific framework for multi-agent reinforcement learning (RL). Indeed, differently from a Markov decision process (MDP), wherein a single adaptive agent interacts with the environment and secondary agents can only be part of it, Markov games allow modeling multiple adaptive agents (players) interacting with each other for cooperative or competing goals [6]. Several multi-agent RL algorithms have been developed to learn equilibrium points in Markov games. For the specific task of slot selection, we resort to LRI, a learning automata algorithm, which learns an equilibrium point of the game by updating the PDF at each retransmission. LRI provably converges to a suboptimal deterministic solution, leading each MTD to always transmit in the same slot when facing a certain transmission attempt.

III-A Slot Selection As a Markov Game

Game Definition

Our slot selection process can be modeled as a Markov game (also called stochastic game), where MTD are the players, and their actions are their slots selected for transmission. The game is played in multiple rounds, once per frame.

The action taken by each MTD in frame depends only on the number of retransmissions , which represents the state of the player. There are states, denoted as , where state 0 indicates that the MTD has no packet to transmit, while at state the maximum number of retransmissions is reached. The strategy of player is the set of PDF by which actions are taken, i.e., ; note that at state 0 only one action (no transmission) is accessible.

At the end of each round (frame), player receives the reward , which depends on the actions of all the players. The utility function is the expected reward, which for each MTD can be written as


where we highlighted the dependency of the utility on the strategies. The objective of MTD , is to find a strategy matrix that maximizes its own expected reward, i.e.,


This is a game of incomplete information, since players have no knowledge of the other players' actions. Each player selects its own strategy with an individual objective, thus the game is non-cooperative.

Figure 1: State-action transition diagram of MTD . Ellipses denote the states , while the squares denote the actions .

State Transitions

We now describe the state transitions, with their probabilities, which are also depicted in Fig. 1. The transition from state to state is due only to a new packet generation, thus occurs with probability . States , evolve either towards state (successful transmission) or towards state (failed transmission): the first case occurs with probability upon action , while the latter occurs with probability . When in state , the packet is either successfully received or discarded and possibly replaced by a new packet: thus, this state evolves with probability to state 1 and with probability to state 0. In Fig. 1, ellipses denote states , while squares denote actions , and on the arrows we indicate either the probabilities of taking actions (moving from an ellipse to a square) or the state transition probability for a given action (moving from a square to an ellipse). When in state 0, only one action is possible (no transmission) denoted with 0 in the square.
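The per-MTD state evolution described above can be sketched as a small Python function. Since the state symbols are missing from this rendering of the text, the sketch relies on two labeled assumptions: a successful transmission in an intermediate state empties the buffer (state 0), and a failed one moves to the next attempt. The function name `next_state` and its arguments are hypothetical.

```python
def next_state(state, success, new_packet, max_attempts):
    """Return the next retransmission state of one MTD.

    state: 0 (empty buffer) or the current attempt number 1..max_attempts.
    success: whether the transmission in this frame was acknowledged.
    new_packet: whether a packet is generated at the end of this frame.
    """
    if state == 0:
        # Idle: only a new packet generation moves the MTD to attempt 1.
        return 1 if new_packet else 0
    if state < max_attempts:
        # Assumption: success empties the buffer, failure adds one attempt.
        return 0 if success else state + 1
    # Last attempt: the packet leaves the buffer either way, and a newly
    # generated packet (if any) starts from its first attempt.
    return 1 if new_packet else 0
```

For instance, with `max_attempts=3`, a failure in state 1 leads to state 2, while from state 3 the MTD returns to state 1 or 0 depending only on whether a new packet was generated.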

State of the Game

The state of the game at round is the collection of the states of all players, . Let be the strategy matrix of player , defined as


Let us also define the matrix collecting all strategies of each user in each state as .

III-B Learning The Strategies

The objective of each MTD is to find a strategy that maximizes the expected reward (5) at each transmission attempt. To this end, we resort to the LRI algorithm [7], which is run locally by each MTD and works iteratively, one iteration per frame. Let be the strategy of MTD at frame , where .

At the first iteration, we start with a uniform PDF for all the MTD, i.e., for all and .

At iteration , MTD (storing a packet) transmits in a random slot selected according to its state and strategy. For failed transmissions (), the strategy is not updated, thus , . If a packet is successfully received (), MTD updates its strategy as follows


where is the learning rate, which dictates the speed of the learning process. PDF relative to other retransmissions than are left unaltered, i.e., , . From (7) we note that the probability of transmitting in slot is increased, while the other slots are penalized.
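A minimal Python sketch of one update step follows, assuming that (7) has the standard linear reward-inaction form: on success, every slot probability of the current attempt is scaled by one minus the learning rate and the learning rate is added to the chosen slot; on failure nothing changes. The exact expression cannot be checked against this rendering of the text, so the form used here is an assumption, and the name `lri_update` is hypothetical.

```python
def lri_update(row, chosen_slot, success, learning_rate):
    """Return the updated slot distribution for the current attempt.

    row: list of slot probabilities for this attempt (sums to 1).
    chosen_slot: index of the slot used in this frame.
    success: True if the gNB acknowledged the packet.
    learning_rate: step size in (0, 1).
    """
    if not success:
        # Inaction: a failed transmission leaves the strategy untouched.
        return list(row)
    # Reward: shrink every slot, then move the freed mass to the chosen one.
    new_row = [(1.0 - learning_rate) * p for p in row]
    new_row[chosen_slot] += learning_rate
    return new_row

# Starting from a uniform row over 4 slots, a success in slot 2 with
# learning rate 0.1 raises that slot to 0.325 and lowers the others to 0.225.
updated = lri_update([0.25] * 4, chosen_slot=2, success=True, learning_rate=0.1)
```

The update keeps the row normalized and uses only the MTD's own acknowledgement, which is consistent with the distributed operation described in the text.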

We remark that LRI does not require any knowledge of the other players' states and strategies. In fact, from (7), it can be seen that the algorithm is fully distributed.

III-C LRI Convergence

It is proven that, for small values of , the LRI algorithm converges to a pure Nash equilibrium [8], i.e., the strategy of any player maximizes its utility function, given the strategies of all other players [9]. Moreover, LRI converges to a pure strategy, i.e., only one slot is deterministically selected by each MTD at each retransmission.

However, note that LRI may not attain the maximum utility , and in general will not even provide the maximum sum of utility among all MTD. Still, it converges to a deterministic policy, ensuring the stability of the algorithm.
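To make the convergence to a pure strategy concrete, the following Python sketch simulates a deliberately tiny case that is not one of the paper's scenarios: two MTDs that both have a packet in every frame, two slots, and a single transmission attempt. It assumes the standard linear reward-inaction update; the function name, the number of frames, the learning rate, and the seed are all hypothetical choices.

```python
import random

def simulate_two_mtds(num_frames=2000, learning_rate=0.1, seed=0):
    """Two always-active MTDs, two slots, one attempt per packet."""
    rng = random.Random(seed)
    strategies = [[0.5, 0.5], [0.5, 0.5]]  # uniform initialization
    for _ in range(num_frames):
        slots = [rng.choices([0, 1], weights=p)[0] for p in strategies]
        if slots[0] != slots[1]:  # no collision: both are acknowledged
            for p, s in zip(strategies, slots):
                for j in range(2):
                    p[j] *= 1.0 - learning_rate  # shrink every slot
                p[s] += learning_rate  # reward the chosen slot
        # On a collision nothing is updated (the "inaction" part).
    return strategies

final = simulate_two_mtds()
# Slot each MTD ends up preferring after learning.
preferred = [max(range(2), key=p.__getitem__) for p in final]
```

In this symmetric toy case the two distributions concentrate on different slots, because an update only happens in frames where the MTDs chose different slots. With more MTDs than slots, or with partially correlated traffic, the limit point can be a suboptimal assignment, as noted above.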

IV Numerical Results

Figure 2: Average system throughput (a) and packet transmission time (b) of LRI and SALOHA as a function of , for and .

Figure 3: Average system throughput (a) and packet transmission time (b) of LRI and SALOHA as a function of , for and .

To assess the performance of the proposed solution, we consider a network where each frame comprises slots and MTD are uniformly randomly placed in a square area of side  m.

We consider the space-time Poisson process traffic model of [4], modified here to take into account packet generations at multiple frames. First, active frames, when packets are generated, are modeled by a temporal Poisson process of intensity . In an active frame, several positions (events) are selected in the area, according to a space Poisson point process of intensity , and all MTD within  m from an event generate one packet. Note that as increases, we have two effects: the increase of correlation in packet generation and a higher average number of generated packets in the cell.
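The spatial part of this traffic model can be sketched in Python as follows. The sketch covers a single active frame only: it draws a Poisson number of event positions in the square area and marks every MTD within a given radius of any event as generating a packet. All function names are hypothetical, the Poisson sampler is a simple textbook method chosen to avoid external dependencies, and no numerical values from the paper are assumed.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def triggered_mtds(mtd_positions, event_positions, radius):
    """Indices of MTDs within `radius` of at least one event."""
    return {
        i
        for i, (x, y) in enumerate(mtd_positions)
        if any(math.hypot(x - ex, y - ey) <= radius for ex, ey in event_positions)
    }

def active_frame(mtd_positions, side, intensity, radius, rng):
    """Draw the events of one active frame and return the triggered MTDs.

    intensity: expected number of events per unit area.
    """
    num_events = sample_poisson(intensity * side * side, rng)
    events = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_events)]
    return triggered_mtds(mtd_positions, events, radius)
```

Because one event triggers all MTDs in its neighbourhood, nearby MTDs tend to generate packets in the same frame, which is the source of the traffic correlation; a larger intensity produces both more events and more overlapping neighbourhoods.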

Performance is assessed in terms of average packet transmission time (delay) and system throughput. In formulas, the average packet transmission time (in frames) is


while the average system throughput is defined as the ratio between the average number of packets successfully received at the gNB and the average number of frames used for their transmission, i.e.,


Note that, in our LRI scheme, each MTD acts in a selfish fashion, thus an optimal result for the global system throughput (9) cannot be obtained in general.

IV-A Low-Traffic Scenario

We first consider a low-traffic scenario (), where packets are generated only in frame . The maximum number of transmission attempts is here .

Figs. 2(a) and 2(b) show the average system throughput in packets per frame [pkts/fr] and the average packet transmission time in frames [fr], respectively, both as a function of the spatial event generation rate . The performance according to both metrics is reported for both the LRI and SALOHA RA schemes. We observe that our LRI solution outperforms SALOHA with low and moderate event generation rates , while the performance decreases for high values of . Indeed, going from small to moderate event rates, the traffic correlation increases, a condition exploited by LRI, which yields a higher throughput than SALOHA. For high values of , instead, the overall generation rate increases (more packets are generated), which increases collisions and ultimately decreases the rate and increases the packet delay, as is well known for these kinds of RA schemes. Moreover, notice that, for very high values of , the average transmission time is reduced, due to the increase of the probability of packet expiration (maximum number of transmission attempts reached).

Indeed, as LRI in general does not find the optimal maximum throughput solution, in this case it turns out to be suboptimal also with respect to SALOHA.

IV-B Throughput vs Traffic Intensity

We then consider various traffic intensity scenarios: Figs. 3(a) and 3(b) show the performance as a function of , for .

Note that new packets are generated on average every frames, and we always generate packets at frame .

In this scenario, whenever a packet is generated at MTD while another is in its buffer, the old packet is dropped and the counter of transmission attempts restarts. We note that as increases, the average delay increases and the throughput decreases: this is due to the fact that more packets yield more collisions, with a throughput reduction. Moreover, we observe that the gain of LRI over sALOHA, in terms of both delay and throughput, vanishes as increases. Taking for example the first frames, with a higher , there will be new packet generations at frame , when some MTD are still handling packets generated at , therefore the statistics of is altered by the new arrivals, decorrelating the resulting traffic. Packet drops due to new arrivals have a significant impact also on the average delay: indeed, from Fig. 3(b), we observe that the curves are nearly flat for very high values of . In this case, LRI loses its advantage over sALOHA.

Figure 4: Average system throughput of LRI, MMPC, and SALOHA, as a function of for .

IV-C Single Transmission

We now consider the case , which provides a direct comparison with the MMPC scheme of [4]: in this case, all colliding packets are discarded without further retransmissions. From (8), we have , therefore the system throughput boils down to the expected number of successful transmissions in a frame. Fig. 4 shows the average system throughput of our LRI, MMPC, and SALOHA, as a function of the event generation rate , for . The throughput behaviour is similar to that with , providing a higher improvement for low event generation rates, while being outperformed by MMPC and SALOHA for higher event generation rates. Indeed, we note that, although both LRI and MMPC are designed taking into account the traffic correlation, LRI has better performance up to moderate event generation rates, as it changes the MTD strategies at each retransmission. Again, we observe that for high values of all RA schemes achieve a lower throughput, with LRI degrading its performance due to the selection of a suboptimal solution.

Figure 5: Throughput gain of LRI as a function of the frame , for different traffic correlations (, 0.04, and 0.08) and .

IV-D Convergence Speed for Single Transmission

Finally, we evaluate the convergence speed of the learning algorithm with , which still allows the comparison with MMPC. For the training, we set the learning rate . Let us define the throughput gain


where is the throughput computed after frames of learning, is the throughput of sALOHA and is the throughput of LRI at convergence. Note that we initialize the LRI algorithm with uniform PDF, thus and . At convergence, we have . Fig. 5 shows the throughput gain of LRI, as a function of the learning frames. For comparison purposes, we also report the throughput gain (normalized to the LRI throughput) of MMPC, obtained by replacing with the MMPC throughput in (10). Three event generation intensities are considered, , 0.04, and 0.08. We observe that convergence is faster for high values of , as the correlation is in this case stronger, thus the LRI iterations quickly adjust the strategy. Moreover, LRI already outperforms MMPC within about 1 000 frames. The learning process is slower for low values of , requiring more than 2 000 frames to overcome the throughput achieved with MMPC.

V Conclusions

In this paper, we derived a coordinated RA scheme for an MTC scenario with traffic correlation. We modelled each MTD as a player of a Markov game of incomplete information and, applying the LRI algorithm, we derived pure Nash equilibrium strategies for each player. Numerical results show that our proposed LRI solution outperforms the state-of-the-art RA schemes for moderate traffic correlation and intensity.


  • [1] D. Zucchetto and A. Zanella, “Uncoordinated access schemes for the IoT: Approaches, regulations, and performance,” IEEE Commun. Magazine, vol. 55, no. 9, pp. 48–54, Sep. 2017.
  • [2] A. Destounis, D. Tsilimantos, M. Debbah, and G. S. Paschos, “Learn2MAC: Online learning multiple access for URLLC applications,” in Proc. IEEE Conf. on Computer Commun. Workshops (INFOCOM WKSHPS), Apr. 2019, pp. 1–6.
  • [3] Y. Chang, P. Jung, C. Zhou, and S. Stanczak, “Block compressed sensing based distributed resource allocation for M2M communications,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 3791–3795.
  • [4] A. E. Kalør, O. A. Hanna, and P. Popovski, “Random access schemes in wireless systems with correlated user activity,” in Proc. IEEE Int. Workshop on Signal Processing Advances in Wireless Commun. (SPAWC), Jun. 2018.
  • [5] M. Shehab, A. K. Hagelskjar, A. E. Kalør, P. Popovski, and H. Alves, “Traffic prediction based fast uplink grant for massive IoT,” in Proc. Int. Symp. on Personal, Indoor and Mobile Radio Commun. (PIMRC), Aug. 2020, pp. 1–6.
  • [6] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Machine learning proc. Elsevier, 1994, pp. 157–163.
  • [7] I. J. Shapiro and K. S. Narendra, “Use of stochastic automata for parameter self-optimization with multimodal performance criteria,” IEEE Trans. on Systems Science and Cybernetics, vol. 5, no. 4, pp. 352–360, Oct. 1969.
  • [8] P. S. Sastry, V. V. Phansalkar, and M. A. L. Thathachar, “Decentralized learning of nash equilibria in multi-person stochastic games with incomplete information,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 24, no. 5, pp. 769–777, May 1994.
  • [9] S. Tadelis, Game theory: an introduction. Princeton University Press, 2013.