In the upcoming Internet of Things (IoT), an immense number of devices will be connected to each cellular station; forecasts predict 1 million devices per station. IoT connectivity is primarily aimed at establishing central authentication, security, and management of those devices. However, fine-tuned coordination functionalities (transmit power selection, transmission scheduling, code assignment, etc.) are considered too expensive to be handled centrally, since the cellular station would need to collect bulky state information for each device and solve large-scale optimization problems. For these reasons, it is anticipated that IoT communications will rely on uncoordinated access, i.e., a channel will be dedicated to IoT access and each IoT transmitter will decide individually which transmission pattern to use. Here, we study the use of Online Learning methods for transmission pattern selection.
We consider transmitters scattered in a geographical area, all wanting to transmit to the cellular station (e.g. a common sink), as shown in Fig. 1. We further assume that a) for reasons of overhead reduction, there is no coordination between a transmitter and the cellular station, and b) for reasons of security there is no coordination among different transmitters. Each transmitter must decide on its own when and how to transmit.
I-A Random access protocols
Traditional protocols that can operate in this setting are based on random access. Historically, pure ALOHA was the first such protocol, where a user transmits with a given probability. This was later extended to slotted-ALOHA, which used synchronization to double user throughput. A more mature random access protocol is Carrier Sense Multiple Access (CSMA), where the transmitter checks whether the medium is idle before sending. In the enhanced version with collision avoidance (CSMA/CA), the transmitter “backs off” (selects a smaller probability of access) every time there is a collision, and also uses request-to-send (RTS) and clear-to-send (CTS) signals to reduce the impact of a collision on throughput.
Random access protocols suffer from collisions and idle time, and therefore they achieve lower throughput than the maximum possible. In an effort to improve the throughput achievable by uncoordinated access, many exciting algorithmic ideas have been proposed. For example, Q-CSMA is a protocol where the transmitters avoid collisions by finding efficient schedules in a distributed manner. Although Q-CSMA is shown to asymptotically achieve 100% throughput (the maximum possible), it suffers from large delays. Another interesting direction is the idea of successive cancellation and replica transmission. In this enhanced random access protocol, each transmitter sends multiple replicas of the same packet within a frame. Normally, a large number of collisions occur, but under the assumption that the Signal-to-Interference-plus-Noise Ratio (SINR) levels of the transmitters are sufficiently different, the receiver can decode the strongest signal, subtract it from the received signal, decode the next strongest, and so on, until all signals are decoded correctly. This protocol achieves high throughput, but at the cost of excessive energy usage, which is a concern in IoT applications.
I-B Communication requirements for IoT
We list our requirements for IoT communications.
The Ultra Reliable Low Latency Communications (URLLC) class is a popular 5G definition for communications of high fidelity, seen as an enabler for remote control of vehicles, and other demanding applications. In URLLC, a given amount of bits must be received before a strict deadline (in periods) with a very high probability (often 0.99999). This reliability guarantee is extremely important in automation and remote control, as well as in applications where freshness of information is essential, and the operation of some IoT applications will rely on such guarantees. For this reason, we depart from pure throughput considerations, and we define below the latent throughput, which suffices to meet URLLC requirements.
Time is split in frames, and within each frame there is a fixed number of slots. A frame is called “successful” for a transmitter if it contains at least a required number of successful transmissions of that transmitter. Successful transmissions in previous frames do not count towards the success criterion of the current frame. The latent (URLLC) throughput is the empirical frequency of successful frames. We note that no existing random access protocol provides latent throughput guarantees, as all of them are designed for maximizing pure throughput, which is different from latent throughput. For example, a frame with one successful transmission fewer than required contributes to pure throughput, but contributes nothing to latent throughput. More generally, latent throughput optimization is a difficult problem even with centralized coordination, and has strong ties to the theory of Markov Decision Processes.
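To make the distinction between the two metrics concrete, the following minimal sketch computes both from a per-slot success record. The function names, the list-of-frames layout, and the parameter `m_required` (the number of successful transmissions needed per frame) are illustrative assumptions, not notation from this paper.

```python
from typing import List, Sequence


def pure_throughput(success: Sequence[Sequence[int]]) -> float:
    """Fraction of slots with a successful transmission, over all frames."""
    total_slots = sum(len(frame) for frame in success)
    return sum(sum(frame) for frame in success) / total_slots


def latent_throughput(success: Sequence[Sequence[int]], m_required: int) -> float:
    """Empirical frequency of frames with at least m_required successes."""
    good = sum(1 for frame in success if sum(frame) >= m_required)
    return good / len(success)


# Hypothetical record: 3 frames of 4 slots, 1 marks a successful slot.
record: List[List[int]] = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
pure = pure_throughput(record)           # 3 successes in 12 slots
latent = latent_throughput(record, 2)    # only the second frame qualifies
```

In this toy record the first frame adds to the pure throughput but not to the latent one, since it falls one success short of the assumed requirement.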
I-B2 Energy consumption
Since the majority of IoT devices will work on batteries, energy consumption must be minimized. In this work we assume that energy is proportional to the number of transmissions.
I-C Our contribution
In this paper we propose a protocol for uncoordinated medium access, which is based on the theory of Online Learning. First, we restrict our transmitter to choose transmission patterns at the beginning of the frame, and in particular, we further restrict its options to a randomized dictionary of patterns. During operation, the transmitter first chooses a pattern from the dictionary at random, and then implements the pattern within the frame. The learning operation amounts to progressively adjusting the probability distribution of pattern selection using an online exponentiated gradient descent algorithm. Our simulations show the following for the resulting Learn2MAC scheme:
It achieves high URLLC throughput and low energy consumption when faced with (i) TDMA interference, or (ii) random access interference.
Multiple Learn2MAC users can outperform ALOHA users, in terms of latent throughput, by as much as 100%.
II Problem formulation
II-A System model and assumptions
There are $N$ transmitters sharing the uplink of our system. Time is split in frames of $K$ slots. At the beginning of frame $t$, transmitter $n$ decides a pattern of transmissions to be used within the frame; we denote this decision with $x_n(t) = (x_{n,1}(t), \dots, x_{n,K}(t))$, where $x_{n,k}(t) = 1$ indicates transmission in slot $k$, and $x_{n,k}(t) = 0$ indicates idling. Therefore, at each frame a transmitter chooses its pattern as a binary vector of length $K$ from the set $\mathcal{X} = \{0,1\}^K$.
Our pattern selection setting is very general, as the next example suggests.
Example 1 (Aloha).
Consider frames of $K = 2$ slots, where all possible transmission patterns are $\{(0,0), (0,1), (1,0), (1,1)\}$. A simple protocol could be: “choose one pattern at random with probability 1/4 independently of past events”. Incidentally, this corresponds to slotted-ALOHA with transmission probability $1/2$ in each slot.
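A quick enumeration illustrates the correspondence in Example 1: choosing uniformly among the four two-slot patterns makes each slot busy with probability 1/2, independently of the other slot. The variable names below are illustrative only.

```python
from itertools import product

slots = 2  # frame length assumed in the example
patterns = list(product([0, 1], repeat=slots))   # (0,0), (0,1), (1,0), (1,1)
prob = 1 / len(patterns)                         # uniform choice, 1/4 each

# Marginal probability of transmitting in each slot.
marginals = [sum(prob for x in patterns if x[k] == 1) for k in range(slots)]

# Joint probability of transmitting in both slots.
joint = sum(prob for x in patterns if all(x))
```

The joint probability equals the product of the two marginals, which is what an independent per-slot coin flip would give.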
We make the following assumptions about our system.
(A.1) If two or more transmitters have selected to transmit at the same slot, we have a collision and all transmitted information in this slot is lost. (Footnote 1: In this paper we study the “hard interference” scenario for simplicity. We mention, however, that our work can be extended to other interference models.)
(A.2) At the end of each frame, the cellular station provides feedback information about the occupancy of each slot (idle/success/collision) to all transmitters.
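We reserve $x_n(t)$ to denote the pattern selected by user $n$ in frame $t$, and $x$ to index patterns in the set $\mathcal{X} = \{0,1\}^K$ of binary vectors with one entry per slot of a $K$-slot frame. Because of (A.1), a pattern $x$ produces a successful transmission for user $n$ in slot $k$ of frame $t$ (an event denoted with $s_{n,k}(x,t) = 1$) only if $x_k = 1$ and $x_{m,k}(t) = 0$ for every other user $m \neq n$. Equivalently, we write:
$$s_{n,k}(x,t) = x_k \prod_{m \neq n} \bigl(1 - x_{m,k}(t)\bigr). \qquad (1)$$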
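The hard-interference rule of (A.1) can be sketched in a few lines. The function name `slot_successes` and the list-based representation of patterns are assumptions made for illustration.

```python
from typing import List, Sequence


def slot_successes(own: Sequence[int], others: Sequence[Sequence[int]]) -> List[int]:
    """Per-slot success indicator of pattern `own` under hard interference.

    A slot is successful only if `own` transmits there and no pattern in
    `others` transmits in the same slot.
    """
    result = []
    for k, bit in enumerate(own):
        busy = any(other[k] == 1 for other in others)
        result.append(1 if bit == 1 and not busy else 0)
    return result


# Hypothetical frame of 4 slots with two interfering transmitters.
mine = [1, 1, 0, 1]
interferers = [[1, 0, 0, 0], [0, 0, 1, 0]]
outcome = slot_successes(mine, interferers)
```

Here the first slot collides with the first interferer, so only the second and fourth slots count as successful.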
II-B Performance metrics
Our protocol design is driven by certain objectives, which are used to form the utility function of each transmitter.
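URLLC throughput. In frame $t$ a pattern $x$ is called successful for user $n$, denoted with $u_n(x,t) = 1$, if it contains at least $M_n$ successful transmissions. (Footnote 2: $M_n$ in this case is an application-specific parameter that captures the number of successful transmissions required within a frame in order for user $n$ to achieve its URLLC requirement. In 5G standardization $M_n$ takes small values for reasonable signal strengths, i.e., for it is .) Using (1), $u_n(x,t)$ can be computed as follows:
$$u_n(x,t) = \mathbb{1}\Bigl\{\sum_{k=1}^{K} s_{n,k}(x,t) \ge M_n\Bigr\},$$
where $s_{n,k}(x,t) \in \{0,1\}$ indicates a successful transmission of pattern $x$ in slot $k$ of the $K$-slot frame $t$.
To increase URLLC reliability, transmitter $n$ wants to maximize its URLLC throughput $\frac{1}{T}\sum_{t=1}^{T} u_n(x_n(t), t)$, where $x_n(t)$ is the pattern it selects in frame $t$ and $T$ is some large integer that represents the horizon of interest for the application.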
Energy. We assume that the consumed energy is proportional to the rate of transmissions per frame, given by .
In summary, the instantaneous utility obtained by transmitter in frame is given by:
where scalar is a transmitter-selected weight that balances the importance of URLLC throughput and energy consumption. We mention that is unknown to transmitter since it depends on the patterns of all other users, via .
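As an illustration of how a transmitter could score a pattern from the slot feedback, the sketch below assumes that the utility is the URLLC success indicator minus the weight times the fraction of slots in which the pattern transmits. This particular form, the normalization of the energy term, and all names (`frame_utility`, `busy`, `m_required`, `weight`) are assumptions made for illustration.

```python
from typing import Sequence


def frame_utility(pattern: Sequence[int], busy: Sequence[bool],
                  m_required: int, weight: float) -> float:
    """Utility of `pattern` in a frame where `busy[k]` flags slots used by others."""
    # A transmission succeeds only in slots not used by any other transmitter.
    successes = sum(1 for bit, b in zip(pattern, busy) if bit == 1 and not b)
    urllc = 1.0 if successes >= m_required else 0.0
    energy = sum(pattern) / len(pattern)   # assumed normalization of energy
    return urllc - weight * energy         # assumed form of the trade-off


# Hypothetical frame: only the first slot is used by other transmitters.
busy = [True, False, False, False]
active = frame_utility([1, 1, 0, 1], busy, m_required=2, weight=0.5)
silent = frame_utility([0, 0, 0, 0], busy, m_required=2, weight=0.5)
```

With these numbers the active pattern gets two successes out of three transmissions, so it meets the assumed requirement and pays for three of the four slots, while the silent pattern earns and pays nothing.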
II-C Problem formulation
We would like to design a distributed protocol where each transmitter decides its pattern based only on the feedback of (A.2) in order to optimize the long-term average utility at some horizon :
Random access protocols are expected to perform poorly w.r.t. this objective due to the following limitations. By design they do not ensure high latent throughput (as the number of transmitters increases, the total latent throughput approaches zero), they suffer from collisions and thus spend a lot of energy per unit of achieved throughput, and finally, they have limited flexibility and do not adapt to circumstances. These considerations lead us to design a novel architecture, where each device runs an online learning algorithm in order to determine the most appropriate pattern for maximizing the obtained utility.
III Architecture based on online learning
We take the individual viewpoint of a single transmitter and optimize its utility assuming that the rest of the transmitters are uncooperative, and their transmissions are seen as interference. In particular, to design an adaptive and robust algorithm, we will further assume that the other transmitters are adversaries that are choosing their patterns in order to lower this utility. This worst-case approach will allow us to design an algorithm that is sensitive to interference and quickly adapts to changes in the environment.
III-A Restricting the design space
As in most learning problems, restricting the dimensions is essential for constructing an efficient solution. In our problem, the number of possible patterns for a transmitter is equal to the number of all possible binary vectors whose length is the number $K$ of slots in a frame, i.e., equal to $2^K$. For values encountered in practice (e.g. ) this creates an enormous action space.
We introduce the concept of a dictionary of patterns, i.e., a preselected subset of patterns of fixed cardinality, to which the transmitter will be restricted. The dictionary of patterns mimics the idea of the codebook in communications, where a subset of codes is designed off-line, and at runtime the transmitter selects a code from the codebook.
III-A1 Basic rules for creating dictionaries
We provide some practical guidelines for creating pattern dictionaries.
The zero pattern should always be included in the dictionary, since on many occasions a good action for the user will be to remain silent within a frame.
Non-zero patterns with fewer transmissions than the number of successful transmissions required in a frame should not be used, since they cannot guarantee a successful frame and they consume more energy than the zero pattern.
Patterns with different numbers of transmissions should be used to allow exploration of protocols with different levels of energy and redundancy of transmissions.
For purposes of learning acceleration, the cardinality of the dictionary should be kept small, e.g. .
To avoid an excessive number of collisions, it is preferable that different transmitters have different dictionaries. This can be achieved by generating the dictionaries in a random manner. However, we mention that having the same dictionary allows transmitters to share learned models; therefore, the best approach would be to use groups of pseudo-random transmission patterns.
III-A2 Pattern dictionary design
It is interesting to formulate the dictionary design as an optimization problem. However, we mention a few caveats. First, the optimization depends on the protocols of the other transmitters; therefore, this problem makes sense mostly when the rest of the transmitters have fixed and known protocols. Second, this is a combinatorial problem with non-convex objective and large dimensions, and is therefore highly non-trivial to solve.
Instead, we will take a very simple approach which appears to work in practice. We propose to use a simple, randomized, and fully distributed dictionary design algorithm. In particular, the transmitter chooses its dictionary by (i) including the zero pattern, (ii) excluding every pattern with fewer transmissions than the number of successful transmissions required in a frame, and (iii) choosing the remaining patterns at random. Specifically, fix the dictionary cardinality to be a large value which, however, will not slow down our algorithmic computations. For instance, a typical value could be between and . Start with an empty dictionary. Also, recall that the required number of successful transmissions is determined by the URLLC application. Then repeat the following steps:
Randomized Dictionary Algorithm:
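Initialize the dictionary with the zero pattern.
Choose a number uniformly at random between the required number of successful transmissions and the frame length (the number of transmitting slots in a pattern).
Choose a random binary vector with that number of ones (i.e., with that many transmitting slots).
If the vector is not already in the dictionary, then add it to the dictionary.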
In the remainder, we will assume that the dictionary of our transmitter is chosen with the above algorithm, and remains fixed for the duration of our protocol.
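The randomized dictionary construction can be sketched as follows. This is a minimal illustration under stated assumptions: the number of ones is drawn between the required number of successes and the frame length, duplicates are skipped, and the loop stops once the target size is reached. The names `random_dictionary`, `num_slots`, `m_required`, and `size` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def random_dictionary(num_slots: int, m_required: int, size: int,
                      rng: random.Random) -> List[Tuple[int, ...]]:
    """Build a pattern dictionary with the zero pattern plus random patterns."""
    if not 1 <= m_required <= num_slots:
        raise ValueError("need 1 <= m_required <= num_slots")
    # Guard against asking for more distinct patterns than exist.
    available = 1 + sum(math.comb(num_slots, j)
                        for j in range(m_required, num_slots + 1))
    if not 1 <= size <= available:
        raise ValueError("size must be between 1 and the number of valid patterns")
    zero = tuple([0] * num_slots)
    dictionary = [zero]
    seen = {zero}
    while len(dictionary) < size:
        ones = rng.randint(m_required, num_slots)          # both ends inclusive
        active = set(rng.sample(range(num_slots), ones))   # transmitting slots
        pattern = tuple(1 if k in active else 0 for k in range(num_slots))
        if pattern not in seen:                            # skip duplicates
            seen.add(pattern)
            dictionary.append(pattern)
    return dictionary


# Hypothetical parameters: 10 slots, 2 required successes, 8 patterns.
patterns = random_dictionary(num_slots=10, m_required=2, size=8,
                             rng=random.Random(0))
```

Seeding each transmitter's generator differently is one simple way to obtain different dictionaries per device, in line with the rules listed earlier.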
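III-B Learning the best pattern in the dictionary
Consider a probability distribution $p = (p_x)_{x \in \mathcal{D}}$ over the patterns of the dictionary $\mathcal{D}$, where $p_x$ is a quality metric of pattern $x$. Learning the quality of patterns in the dictionary consists in estimating a “good” probability distribution $p$ that would maximize the expected instantaneous utility:
$$\max_{p \in \Delta(\mathcal{D})} \; \sum_{x \in \mathcal{D}} p_x \, U(x,t),$$
where $U(x,t)$ denotes the instantaneous utility that pattern $x$ yields in frame $t$ and $\Delta(\mathcal{D})$ is the set of probability distributions over $\mathcal{D}$.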
However, a complication arising in this paper is that the precise form of the utility depends on the transmissions of all other users, and therefore it is unknown to the decision maker.
We will take the standard approach in the literature of Online Learning. The idea is to allow the distribution to evolve over time and, at each iteration, to update it in a direction that improves the observed utility from the previous frame. In other words, the previous frame serves as a “prediction” of what will happen in the next frame.
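Here, because the constraint for the distribution $p$ has the form of a simplex (a constraint $\sum_{x} p_x = 1$, $p_x \ge 0$), it is favorable to use the exponentiated gradient, instead of the classical gradient. Therefore, our update mechanism is as follows:
$$p_x(t) = \frac{p_x(t-1)\, e^{\eta\, g_x(t)}}{\sum_{y \in \mathcal{D}} p_y(t-1)\, e^{\eta\, g_y(t)}}, \qquad x \in \mathcal{D},$$
where the vector $g(t) = (g_x(t))_{x \in \mathcal{D}}$ is a subgradient of the expected utility at $p(t-1)$, $\mathcal{D}$ is the dictionary, and $\eta > 0$ is the learning rate. Notice that the subgradient at frame $t$ is computed based on feedback obtained from the previous frame $t-1$. Specifically, the subgradient element $g_x(t)$ has a very intuitive explanation, as it is equal to the marginal benefit we would have in our expected utility (in the previous frame) if we increased the probability of selecting pattern $x$. More simply, recall that the URLLC term of the utility indicates whether pattern $x$ achieves the URLLC objective in frame $t-1$; since the expected utility is linear in the probabilities, we have:
$$g_x(t) = U(x, t-1),$$
i.e., the utility that pattern $x$ would have obtained in frame $t-1$.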
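The exponentiated-gradient update described above can be sketched as a single function. The names `exponentiated_gradient_step`, `probs`, `gradient`, and `rate` are illustrative; subtracting the largest exponent is a numerical safeguard added here, and it cancels in the normalization.

```python
import math
from typing import List, Sequence


def exponentiated_gradient_step(probs: Sequence[float],
                                gradient: Sequence[float],
                                rate: float) -> List[float]:
    """Multiply each probability by exp(rate * gradient) and renormalize."""
    # Shift exponents by their maximum to avoid overflow; the result is unchanged.
    shift = max(rate * g for g in gradient)
    weights = [p * math.exp(rate * g - shift) for p, g in zip(probs, gradient)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical dictionary of 4 patterns with per-pattern utilities from the last frame.
current = [0.25, 0.25, 0.25, 0.25]
observed = [0.0, 0.6, -0.4, 0.6]
updated = exponentiated_gradient_step(current, observed, rate=1.0)
```

Patterns with a higher observed utility gain probability mass, patterns with a lower one lose it, and the result remains a valid distribution without any explicit projection onto the simplex.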
The learning rate can be controlled to trade off how quickly and how accurately we learn. A typical choice in Online Learning is to optimize for a given horizon, in which case we should choose:
where is an upper bound for each subgradient element. Hence, . Alternatively, the learning rate can be chosen larger to accelerate convergence (but discount the accuracy of convergence), or smaller to extend the convergence beyond the horizon (but make it more accurate).
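For orientation, the textbook regret bound for exponentiated-gradient (Hedge-type) updates started from the uniform distribution leads to the horizon-dependent rate below. The symbols are assumptions introduced here: $D$ is the dictionary size, $T$ the horizon in frames, $L$ a bound on the absolute value of each subgradient element, $\eta$ the learning rate, and $R_T$ the regret. Constants differ across references, so this should be read as one common choice rather than as the exact expression used in this paper.

```latex
% Assumed symbols: D = dictionary size, T = horizon (frames),
% L = bound on |g_x(t)|, \eta = learning rate, R_T = regret.
R_T \;\le\; \frac{\ln D}{\eta} + \frac{\eta\, T L^2}{2}
\quad\Longrightarrow\quad
\eta^\star = \frac{1}{L}\sqrt{\frac{2\ln D}{T}},
\qquad
R_T \;\le\; L\sqrt{2\,T\ln D}.
```

The first term penalizes slow learning (small $\eta$) and the second penalizes overreacting to a single frame (large $\eta$); balancing the two yields the stated rate.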
Some remarks are in order:
The above algorithm is a variation of the online gradient algorithm of Zinkevich. At each iteration, the utility is considered unknown (due to random or strategic transmissions of the other transmitters), and it is predicted using the utility of the previous frame, which can be computed using the obtained feedback.
A common metric used to quantify the quality of a learning algorithm is its regret, which is defined as
$$R_T = \sum_{t=1}^{T} \sum_{x \in \mathcal{D}} p^\star_x\, U(x,t) \;-\; \sum_{t=1}^{T} \sum_{x \in \mathcal{D}} p_x(t)\, U(x,t),$$
where $U(x,t)$ is the utility of pattern $x$ of the dictionary $\mathcal{D}$ in frame $t$, $p(t)$ is the distribution chosen by a candidate algorithm, and $p^\star$ is the best distribution if we knew the entire sequence of transmissions of all other transmitters over the entire horizon $T$. Standard results from the literature of online learning tell us that our algorithm minimizes the worst-case regret and achieves regret that grows sublinearly in $T$, i.e., (1) our algorithm is the best learner in the case that the other transmitters are trying to hurt us, and (2) as frames evolve, we learn the best static distribution $p^\star$.
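The regret of a played sequence of distributions can be evaluated in hindsight with a few lines of code. Because the expected utility is linear in the distribution, the best static distribution can be taken to put all its mass on the single best pattern, which the sketch exploits. The function name and the nested-list layout are assumptions for illustration.

```python
from typing import Sequence


def regret(utilities: Sequence[Sequence[float]],
           played: Sequence[Sequence[float]]) -> float:
    """Regret of `played` against the best fixed pattern in hindsight.

    utilities[t][x] is the utility of pattern x in frame t, and
    played[t][x] is the probability given to pattern x in frame t.
    """
    num_patterns = len(utilities[0])
    earned = sum(sum(p * u for p, u in zip(dist, row))
                 for dist, row in zip(played, utilities))
    best_fixed = max(sum(row[x] for row in utilities)
                     for x in range(num_patterns))
    return best_fixed - earned


# Hypothetical run: 2 patterns, 3 frames, uniform play in every frame.
utilities = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
played = [[0.5, 0.5]] * 3
value = regret(utilities, played)
```

In this toy run the second pattern is the best fixed choice with total utility 2, while uniform play earns 1.5 in expectation.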
At this point, we mention that although the other transmitters are not really controlled by an adversary, our algorithm is so sensitive to changes in the interference that it can optimally adapt to many different scenarios, and in particular to situations where the interference fluctuates in a very abrupt and non-stationary way.
IV The Learn2MAC Access Protocol
In this section we summarize the design of our online learning-based multiple access protocol. The procedure is shown as Algorithm 1.
Above, we use the following notation:
As a final remark, note that Learn2MAC exploits the fact that the feedback received is the occupancy of the medium at each slot within the frame; it can therefore be used to deduce the performance of every transmission pattern (and not just the one used) in the previous frame. This significantly speeds up the learning process, and therefore improves the adaptability of the algorithm in changing environments.
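The select, observe, and update loop summarized in this section can be sketched as below for a single learner. This is a minimal illustration and not a transcription of Algorithm 1: the utility form (success indicator minus weighted share of transmitting slots), the hand-written dictionary, the parameter values, and all function names are assumptions. The `busy` flags stand for the slots used by other transmitters, which the learner is assumed to derive from the slot-occupancy feedback.

```python
import math
import random
from typing import List, Sequence, Tuple

Pattern = Tuple[int, ...]


def pattern_utility(pattern: Pattern, busy: Sequence[bool],
                    m_required: int, weight: float) -> float:
    """Assumed utility: URLLC success indicator minus weighted energy share."""
    successes = sum(1 for bit, b in zip(pattern, busy) if bit == 1 and not b)
    urllc = 1.0 if successes >= m_required else 0.0
    return urllc - weight * sum(pattern) / len(pattern)


def learn2mac_sketch(dictionary: List[Pattern],
                     busy_per_frame: Sequence[Sequence[bool]],
                     m_required: int, weight: float, rate: float,
                     rng: random.Random) -> Tuple[List[float], List[int]]:
    """Run the select / observe / update loop over the given frames."""
    probs = [1.0 / len(dictionary)] * len(dictionary)
    chosen = []
    for busy in busy_per_frame:
        # 1) Draw a pattern from the current distribution for this frame.
        index = rng.choices(range(len(dictionary)), weights=probs)[0]
        chosen.append(index)
        # 2) Score every pattern of the dictionary from the slot feedback.
        gains = [pattern_utility(x, busy, m_required, weight) for x in dictionary]
        # 3) Exponentiated-gradient update (shifted by the max for stability).
        top = max(gains)
        weights = [p * math.exp(rate * (g - top)) for p, g in zip(probs, gains)]
        total = sum(weights)
        probs = [w / total for w in weights]
    return probs, chosen


# Hypothetical scenario: slots 0 and 1 are always taken by a TDMA schedule.
dictionary = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
frames = [[True, True, False, False]] * 200
final_probs, history = learn2mac_sketch(dictionary, frames, m_required=2,
                                        weight=0.2, rate=0.5,
                                        rng=random.Random(1))
```

In this toy run the probability mass concentrates on the pattern that transmits only in the two idle slots, because every pattern is scored in every frame regardless of which one was actually used.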
V Numerical Analysis
In this section we illustrate the performance of Learn2MAC and its superiority with respect to baseline random access schemes via simulations. All simulations lasted for frames. The setting here is that each frame has length of slots, a URLLC packet of device is delivered if at least transmissions in the frame were successful, and a device using Learn2MAC has a dictionary of transmission patterns. The weight balancing the importance of latent throughput vs. energy consumption is set to for each device. Finally, the learning rate is set independently of the simulation horizon (which is quite relevant in practice, since it may not be easy or possible to know in advance how many frames a user will be active) to . We compare Learn2MAC with a standard random access scheme, where the device transmits at each slot independently at random with a probability .
We first verify that a single device using Learn2MAC can adapt to an environment with devices using a pre-existing protocol. For this, we examine two cases: (i) “Static Interference”, where half of the slots of a frame are pre-allocated in a fixed TDMA fashion, and (ii) “Dynamic Interference”, where pre-existing terminals access each slot of the frame randomly, each with a probability that is periodic in time. For a fair comparison, the access probability of the baseline random access scheme is configured so that the energy expenditure is the same in both cases.
Results on the running average URLLC throughput are shown in Figures 2 and 3, respectively. Regarding the first case, Fig. 2 clearly illustrates that Learn2MAC learns to use a pattern which corresponds to transmissions in slots left idle by the background TDMA schedule; moreover, we observed that Learn2MAC learns the most efficient such code (i.e. the one with transmissions in idle slots). By contrast, the random access baseline performs very poorly. Regarding the case with dynamic background user activity, Fig. 3 illustrates that Learn2MAC achieves a higher throughput than the random access baseline for the same energy expenditure, thus using energy economically in this case as well.
We then compared the two protocols for the case of uncoordinated medium access; here, we have devices using Learn2MAC in one case and a random access protocol with transmission probability at each slot in the other. (Footnote 3: This value was chosen because it provides a good balance between not even attempting to transmit at least times (due to the access probability being too low), thus losing latent throughput, and transmitting too aggressively, thus leading to many collisions.) We run the simulations for frames as above and measure the total URLLC throughput obtained by the system (by summing up the URLLC throughput obtained by each device) at the end of each run for different numbers of devices . These results are shown in Fig. 4. Note that, since and there are slots in each frame, the maximum number of devices (scheduled by a centralized controller in non-overlapping slots) successfully transmitting a URLLC packet is per frame, which is the upper bound on the total latent throughput. From Fig. 4 we can observe that, at relatively low and medium load (up to devices), the total latent throughput scales almost linearly with : this means that Learn2MAC enables the devices to learn to use transmission patterns with little or no overlap with respect to each other, so that there are few collisions and the devices are all able to coexist and transmit their URLLC packets in almost every frame. By contrast, when random access is used, the throughput obtained remains low due to collisions. When the number of devices approaches , which is the maximum that can be supported, Learn2MAC exhibits the classical behaviour of uncoordinated medium access algorithms, namely a rapid decrease in the latent throughput of the system due to collisions (while still outperforming the random access baseline).
This is the regime where admission control is really needed, since the available resources are very close to (or less than) the total needed by the devices, and Learn2MAC still leads to many collisions in this case. This result suggests that Learn2MAC should be augmented by a mechanism where devices learn whether the system is in the high-load or low-load regime, and some devices must learn to completely disconnect from the system if the former is the case. This direction is very interesting from both the algorithmic/theoretical and the practical perspective, and we leave it as future work.
In this paper, we proposed Learn2MAC, an Online Learning-based Multiple Access scheme that allows users to decide in a distributed manner which transmission pattern to choose. It is shown that Learn2MAC can provide URLLC guarantees, the lack of which is an important limitation of other uncoordinated access schemes, and outperform standard random access both in cases where a single device needs to adapt, in an energy-efficient manner, to an environment with users following pre-existing protocols and in cases where multiple devices need to coordinate using the same protocol. In the latter case, it can enable devices to learn to coordinate with almost latent throughput in cases with high and medium number of resources. Therefore, Learn2MAC is a strong candidate for IoT applications that require at the same time latency guarantees, energy efficiency, and low coordination overhead.
-  “Ericsson mobility report,” June 2018. [Online]. Available: https://www.ericsson.com/en/mobility-report
-  N. Abramson, “The ALOHA system: Another alternative for computer communications,” in Proceedings of the November 17-19, 1970, Fall Joint Computer Conference, ser. AFIPS ’70 (Fall), 1970, pp. 281–285.
-  L. G. Roberts, “ALOHA Packet System with and Without Slots and Capture,” SIGCOMM Computer Communications Review, vol. 5, no. 2, pp. 28–42, Apr. 1975.
-  G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 3, pp. 535–547, March 2000.
-  J. Ni, B. Tan, and R. Srikant, “Q-CSMA: Queue-Length-Based CSMA/CA Algorithms for Achieving Maximum Throughput and Low Delay in Wireless Networks,” IEEE/ACM Transactions on Networking, vol. 20, no. 3, pp. 825–836, June 2012.
-  L. Jiang and J. Walrand, “A Distributed CSMA Algorithm for Throughput and Utility Maximization in Wireless Networks,” IEEE/ACM Transactions on Networking, vol. 18, no. 3, pp. 960–972, Jun. 2010.
-  F. Clazzer, E. Paolini, I. Mambelli, and Č. Stefanović, “Irregular repetition slotted ALOHA over the Rayleigh block fading channel with capture,” in 2017 IEEE International Conference on Communications (ICC), May 2017.
-  A. Destounis and G. S. Paschos, “Complexity of URLLC Scheduling and Efficient Approximation Schemes,” in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), submitted.
-  A. Destounis, G. S. Paschos, J. Arnau, and M. Kountouris, “Scheduling URLLC users with reliable latency guarantees,” in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), May 2018.
-  S. Shalev-Shwartz, “Online Learning and Online Convex Optimization,” Foundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
-  L. Vigneri, G. Paschos, and P. Mertikopoulos, “Large-Scale Network Utility Maximization: Countering Exponential Growth with Exponentiated Gradients,” in IEEE International Conference on Computer Communications (INFOCOM), May 2019.
-  M. Zinkevich, “Online Convex Programming and Generalized Infinitesimal Gradient Ascent,” in International Conference on Machine Learning (ICML), 2003.
-  E. V. Belmega, P. Mertikopoulos, R. Negrel, and L. Sanguinetti, “Online Convex Optimization and No-Regret Learning: Algorithms, Guarantees and Applications,” CoRR, vol. abs/1804.04529, 2018. [Online]. Available: http://arxiv.org/abs/1804.04529