I Introduction & Background
One typical problem with latency-reliability constraints is uplink resource allocation for cyber-physical systems [1]. In this problem, multiple control loops share the wireless medium. Each loop is composed of a controller, an actuator and a sensor. The controller is located at a central entity, while the actuator and the sensor are both located at the device. In each closed control loop, the controller computes actuation decisions from the sensing information. The devices transmit the sensing information through uplink communication and receive the actuation decisions through downlink communication.
The downlink is broadcast to all actuators without the need for coordination. However, depending on the state of the control loop, only some of the sensors transmit state information through uplink communication. As the transmission depends on the state of the control loop, the number of devices transmitting at a certain time is unknown. Thus, at any given time only a subset of the sensors is active, and these active sensors have to be allocated resources to optimize the control performance. This problem was previously investigated with LTE scheduling in a scenario with multiple inverted pendulums in [2]. However, that solution assumes knowledge of device activity to overcome the overdimensioning of scheduling. This information is not available in reality, and the inefficiency of obtaining it has called for a new design of the LTE uplink resource allocation mechanism, called grant-free [3], which reuses the state of the art in random access.
Grant-free focuses on a scenario where devices transmit a single packet or multiple replicas to achieve the latency-reliability constraints. This requires overdimensioning of resources to fulfill tight reliability constraints, as the number of active devices is not known [4]. As a solution to overdimensioning, successive interference cancellation (SIC) is integrated into random access schemes.
SIC enables recovery of overlapping packets through signal processing. This has increased the throughput of random access algorithms from packets per slot up to packet per slot with asymptotic number of devices, reaching the efficiency of scheduling-based solutions. The tradeoff is the decoding complexity. Through edge-cloud processing and distributed computing, this complexity is expected to be manageable for radio access algorithms [5].
Successive interference cancellation was first explored for tree algorithms in [6]. Through that work the throughput of tree algorithms is increased to 0.693 packets per slot, double that of the tree algorithm without SIC. In [6] the clean packet for cancellation is guaranteed with feedback, forcing devices to split from each other. However, too much structure is inefficient, and in [7] it is shown that the same structure can be built through random decisions. The random decisions are shaped with a degree distribution tailored to the number of devices. It is shown that the algorithm reaches a throughput of in the asymptotic region when goes to infinity.
Another work [8] adapts that work to a frameless structure, where the degree distribution is replaced by setting a Binomial probability to transmit at each slot. Compared to the framed structure, the results show that [8] has a better performance in the non-asymptotic region. However, neither of these algorithms can provide a hard guarantee on the latency. Both of them are also susceptible to a varying number of active devices. Hard guarantees can be provided by setting the decisions uniquely for each device. This problem was initially investigated by Massey under the name "protocol sequences" for desynchronized devices in [9]. These algorithms are too pessimistic to be applied to tight latency constraints, as the offset between devices is the main issue there, and it is no longer the main problem thanks to improvements in hardware design. Recently, unique decisions for each device for hard guarantees were investigated in [10] under the name "access codes", where each device transmits packets with respect to a unique code. The design of these codes is of combinatorial complexity. The results are limited, as we detail in later parts of the paper. Moreover, the use of feedback is neglected in that work.
Uniqueness of the access decisions can be guaranteed through feedback to overcome the complexity of the proposed protocol. Using addresses to this end was initially proposed in [11] and adapted for RFID tags with Query Tree Algorithms in [12]. However, this algorithm lags behind in throughput compared to SIC-capable algorithms. The idea to use interference cancellation for Query Tree Algorithms is introduced in [13]. However, the explanation of the algorithm in [13] is unclear. The throughput they show is capped at 0.693, which has already been shown by [6] for tree algorithms (TA) with SIC capabilities. Hard guarantees for performance are not investigated and the difference to [6] is unclear.
In our work we propose a novel Successive Interference Cancellation Query Tree Algorithm, SICQTA. We provide analytical hard upper and lower bounds on latency and compare them with simulations to show their validity. It is shown that the algorithm easily extends to any number of active devices, unlike access codes, and that it provides a higher throughput compared to previous SIC-based works. On top of that, hard latency guarantees make it a suitable candidate solution for the uplink resource allocation problem with an unknown number of active devices.
Our paper is organized as follows: In Sec. II we explain the scenario and provide the problem formulation for reliable access with latency constraints. In Sec. III we briefly introduce the Query Tree Algorithm and the Successive Interference Cancellation Query Tree Algorithm. In Sec. IV the latency bounds are given, and we compare our solution to the access codes and the bounds to simulations. Further discussions are given in Sec. V. Finally, the paper is concluded with possible extensions for future work in Sec. VI.
II Scenario & Problem
We consider a star topology where the central entity is called the gateway and leaf entities of the star are called devices. We consider an uplink scenario where only devices transmit a packet to the gateway. There are devices attached to the gateway. Considered resources in the system are slots of a single channel with a TDM scheme.
Two different channel models are considered, with and without SIC. The first is a collision channel model where perfect reception is assumed: if there is no contention, there is no loss of packets [14]. The second is for the SIC scenario, where we assume that perfect cancellation is possible if clean packets are received. These assumptions are common in MAC layer research to focus on a layer 2 based solution. The impact of more practical channel models is discussed in Sec. V. Each device is synchronized perfectly to the slots defined by the TDM structure. The devices are randomly and sporadically activated and the number of active devices at any slot is , such that . The devices have a homogeneous radio latency constraint L and reliability constraint R. We investigate the multiple access problem of maximizing throughput, which we abstract as maximizing the number of successfully used slots. (Footnote 1: For simplicity we assume that the constraint can be expressed in terms of slots.) The reliability constraint here is the radio layer reliability, which can be input to the end-to-end reliability model.
We define a frame structure consisting of subsequent slots. We investigate the problem of designing codes that represents the binary access decision of a device. The code is of size , i.e., where . The device that has the code will transmit its packet at slot .
The codebook is a collection of all codes and is a matrix with columns and rows where each row represents a unique code for each device. An example is as follows:
with devices and a frame size
. Each device is sporadically active and the activity of all devices is represented with a vector
with elements, i.e., where represents that the device is active. We assume that codebook is ordered such that code of device is in the row of . This assumption allows us to define a frame outcome as in,(1) 
The frame outcome represents the number of packets at each slot of the frame. However, receiver is unaware of this information such that should be converted to MAC layer success outcome . An example for collision channel would be,
(2) 
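To make the frame outcome and its collision-channel success outcome concrete, the following minimal Python sketch computes both for a small codebook. The codebook values, the activity vector and the function names `frame_outcome` and `collision_channel_success` are illustrative assumptions and are not taken from the paper.

```python
def frame_outcome(codebook, activity):
    """Number of packets in each slot: sum of the codes of the active devices."""
    num_slots = len(codebook[0])
    return [
        sum(active * code[slot] for code, active in zip(codebook, activity))
        for slot in range(num_slots)
    ]


def collision_channel_success(outcome):
    """Collision channel: a slot is a success only if it holds exactly one packet."""
    return [1 if packets == 1 else 0 for packets in outcome]


# Hypothetical codebook: 3 devices (rows), frame of 4 slots (columns).
codebook = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
activity = [1, 1, 0]  # devices 0 and 1 are active, device 2 is silent

outcome = frame_outcome(codebook, activity)
success = collision_channel_success(outcome)
print(outcome)  # [1, 1, 2, 0]
print(success)  # [1, 1, 0, 0]
```

In this toy frame, the third slot carries two packets and is therefore lost under the collision channel, while a SIC-capable receiver could in principle recover it from the two clean slots.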
Using the previous definitions we can define an optimization problem for codebook design.
Given frame size and number of devices , maximize the total success per frame through the codebook design :
(3)  
(4)  
(5)  
(6) 
where is the set of all possible activation combinations of devices, L and R are the latency and reliability constraint respectively. The operation is the autocorrelation operation that also gives the summation of binary vectors. This is naturally a combinatorial problem and hard to solve, as can take any value. We can write , where is the number of simultaneously active devices per frame.
The problem definition is shared here for formalism. In the following part of the paper we show that SICQTA solves this problem with a distributed algorithm that is guided via central feedback. Optimality of the algorithm is not proven and remains an open issue for future work.
III Algorithms with Feedback
[Figure 1: two binary query trees over the devices A, B, C, D. (a) QTA example. (b) SICQTA worst-case example. In both trees the root {A,B,C,D} splits into {A,B} and {C,D}; each pair collides once more one level down, next to an empty (idle) node, before splitting into single devices. In (b) the nodes B, D, the first {C,D} node and the idle nodes are drawn dotted.]
III-A Query Tree Algorithm
First, we briefly introduce the contention tree algorithm. At the start of the binary contention tree algorithm [11], any active device sets and transmits. If more than one device is active, the gateway sends feedback to the devices, informing them that a collision has happened, and all the active devices make a uniform random selection whether to set and or vice versa. The devices that have set transmit at slot . If a collision is reported again, only those that have transmitted at slot make a uniform random selection for and . Meanwhile, the devices that have previously set change the values by setting and , thus postponing their transmission. The process goes on until all devices have transmitted successfully. Even though this process stochastically guarantees that all access codes are unique, the distribution representing the latency of devices has a long tail and is not efficient for high reliability constraints.
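The long tail mentioned above can be observed with a small simulation. The sketch below is a simplified model, assuming fair coin splitting and instantaneous feedback; it only counts the slots until all devices are resolved and does not track the counters of individual devices. The function name and its parameters are hypothetical.

```python
import random


def binary_tree_resolution_slots(num_active, rng):
    """Count the slots of a binary contention tree with random splitting (no SIC)."""
    slots = 0
    pending = [num_active]  # stack of unresolved groups, each stored as its size
    while pending:
        group = pending.pop()
        slots += 1  # every group costs one slot: idle, success, or collision
        if group > 1:
            # Collision: each device independently picks one of two subgroups.
            first = sum(rng.random() < 0.5 for _ in range(group))
            pending.append(group - first)
            pending.append(first)
    return slots


rng = random.Random(1)
samples = sorted(binary_tree_resolution_slots(4, rng) for _ in range(10000))
print("median:", samples[len(samples) // 2], "max:", samples[-1])
```

Because the splits are random, the same group can collide repeatedly, so the maximum over many runs lies well above the median. This is the behavior that unique ids are meant to remove.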
To overcome this issue, the Query Tree Algorithm (QTA) is suggested in [12]. In QTA every device has a unique id formed of bits. This limits the total number of devices attached to the gateway to . In QTA, queries are used instead of feedback, but the overhead is the same. Devices are queried with respect to their id bits. The queries start with an empty query. A single bit is appended to the list of queries after each collision, starting from the leftmost bit. Each new collision appends a new bit. As each device has a unique id, this guarantees that two devices have a unique access decision in the worst case after transmissions (if all previous bits are the same for two devices). The gateway implementation of QTA is given in Alg. 1; the device implementation only answers the queries matching its id.
A detailed example is given for four active devices in Fig. 1(a). We have named the devices {A,B,C,D} with ids {000,001,100,101}, respectively. Each circle denotes a slot in the tree. The time-wise progression of the tree is given with slots above the tree. The id size is fixed to 3 bits.
In the first slot, the 4 devices transmit at the same time and collide. In the next slot, the address 0 is queried. Only A and B transmit, and it is again a collision. In the following slot, the query for address 1 is also a collision, so the algorithm moves one level down. The address 00 is queried and both devices transmit. The query for 01 results in an idle slot. Queries for addresses 000 and 001 are made in slots 6 and 7, respectively, and both are successes. The algorithm is completed after the process is repeated for the right branch.
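The gateway side of QTA described above can be sketched as follows. This is a minimal illustration rather than Alg. 1 itself: it assumes ids are bit strings of equal length, that feedback is instantaneous, and that pending queries are served in first-in-first-out order. The function name `qta_log` is hypothetical. The order in which pending queries are served changes which slot a query lands in, but not the total number of slots.

```python
from collections import deque


def qta_log(active_ids):
    """Return the (query, outcome) pairs produced by a QTA gateway."""
    queries = deque([""])  # the empty query is answered by every active device
    log = []
    while queries:
        prefix = queries.popleft()
        responders = [i for i in active_ids if i.startswith(prefix)]
        if len(responders) == 0:
            log.append((prefix, "idle"))
        elif len(responders) == 1:
            log.append((prefix, "success"))
        else:
            # Collision: extend the query by one bit in both directions.
            log.append((prefix, "collision"))
            queries.append(prefix + "0")
            queries.append(prefix + "1")
    return log


# Devices A, B, C, D from the example.
log = qta_log(["000", "001", "100", "101"])
for slot, (query, outcome) in enumerate(log, start=1):
    print(slot, repr(query), outcome)
print("total slots:", len(log))  # total slots: 11
```

For the four example ids, the run consists of 5 collisions, 2 idle slots and 4 successes.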
III-B Query Tree Algorithm with SIC (SICQTA)
SIC allows recovery of packets from a slot where a collision is observed. If, for instance, devices A and B have transmitted a packet in slot 1, then under the collision channel model the outcome "A+B" is treated as a collision and the slot is considered wasted. However, if device B has transmitted its packet in slot 2, the SIC model lets us subtract B from "A+B" and enables recovery of A from slot 1. Instead of breadth-first, SICQTA proceeds depth-first. After the initial success, it checks whether it can cancel the clean packet from previous collisions. If the packet is successfully cancelled, the algorithm skips the direct siblings of those slots. The algorithmic description of SICQTA is given in Alg. 2. (Footnote 2: An open source Python implementation of the algorithms is available at: https://github.com/tumlkn/sicqta)
A detailed example for the worst-case behavior of SICQTA is given in Fig. 1(b) for four active devices. In the first slot, all the devices are queried and it is a collision. In the second and third slots, addresses 0 and 00 are queried, respectively. Both are collisions. In the following slot, 000 is queried and it is a success. 001 is not queried, as the gateway recovers the packet from slots 2 and 3. Two slots are thus successfully recovered, in addition to this slot being a success. The ids 01 and 1 in the query list are not queried and are skipped. Thus, 10 is queried, which results in a collision. Next, 100 is queried and is a success. The gateway recovers D from slot 5 and the algorithm terminates.
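One possible reading of the depth-first behavior described above can be sketched as follows. This is not Alg. 2 itself. It assumes perfect cancellation, unique ids of equal length, and that once the 0-branch of a collision is fully resolved, the gateway knows the content of the 1-branch by subtraction, so that branch costs no query of its own. The name `sicqta_queries` and the `known` flag are hypothetical.

```python
def sicqta_queries(active_ids, prefix="", known=False):
    """Return the queries a SIC-capable gateway actually sends, in order.

    `known=True` marks a node whose content the gateway already obtained by
    cancelling the resolved sibling from the parent collision.
    """
    responders = [i for i in active_ids if i.startswith(prefix)]
    queried = [] if known else [prefix]
    if len(responders) > 1:
        # Collision (observed or reconstructed): resolve the 0-branch first,
        # then obtain the 1-branch by subtraction instead of querying it.
        queried += sicqta_queries(responders, prefix + "0")
        queried += sicqta_queries(responders, prefix + "1", known=True)
    return queried


# Devices A, B, C, D from the example.
print(sicqta_queries(["000", "001", "100", "101"]))
# ['', '0', '00', '000', '10', '100']
```

Under these assumptions the sketch reproduces the six queries of the walkthrough: the addresses 001, 01, 1 and 101 are never sent because their content follows from cancellation.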
IV Analysis & Evaluation
In this section we evaluate the latency of QTA and SICQTA and give bounds on their performance. We also compare the performance of our work with [10], as we share the same problem definition. Finally, the mean delay is compared with the state of the art in tree algorithms to show that the stability region is extended.
IV-A QTA
An upperbound for latency of QTA is given in [15]:
(7) 
where is the number of active devices. This is a tight bound for where with increasing it has a slack. Using the tree structure we can provide a tighter upperbound for latency as,
(8) 
Similarly, the tree structure can be used to provide a lowerbound of latency as:
(9) 
We explain why the example in Fig. 1(a) is the worst case of QTA, which also sheds light on the proof of the bounds. The four devices are separated into 2 groups of 2 as close as possible to the root of the tree, so that they cover as many non-overlapping slots as possible. The devices in each group then repeat the same collision until the last level of the tree. We observe that for this scenario the total number of slots is 11. Using Eq. (7) we get . This shows that the bound is valid and tight for this setting.
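For small trees, the worst-case claim can be checked by exhaustive search. The sketch below enumerates every set of 4 active devices among all 3-bit ids and counts the QTA slots of each set. It relies on the observation that every collision triggers exactly two further queries, so the slot count is one plus twice the number of colliding prefixes. The function name `qta_slot_count` is hypothetical.

```python
from itertools import combinations, product


def qta_slot_count(active_ids, id_bits):
    """Slots used by QTA: the root query plus two child queries per collision."""
    collisions = 0
    for length in range(id_bits):
        for bits in product("01", repeat=length):
            prefix = "".join(bits)
            if sum(i.startswith(prefix) for i in active_ids) > 1:
                collisions += 1
    return 1 + 2 * collisions


id_bits, num_active = 3, 4
all_ids = ["".join(bits) for bits in product("01", repeat=id_bits)]
counts = {ids: qta_slot_count(ids, id_bits) for ids in combinations(all_ids, num_active)}
worst = max(counts.values())
print("worst-case slots:", worst)  # worst-case slots: 11
print([ids for ids, c in counts.items() if c == worst])
```

The set {000, 001, 100, 101} from the example is among the maximizers; the others are the remaining ways of placing two sibling pairs in different halves of the tree.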
IV-B SICQTA
Intuitively, the efficiency of SICTA [6] comes from the possibility to skip some slots in the tree. As shown in [6], the throughput of BTA is doubled. However, the throughput is based on the expected number of slots, and this result cannot be directly translated into the worst-case latency of SICQTA from that of QTA. We have to adapt Eq. (7) for SICQTA using the skipping capability of SIC. The total number of skipped slots compared to the worst case of QTA, given active devices, can be written as,
(10) 
The proof is given in App. C.
We can use this finding to provide an upperbound for latency of SICQTA using Eq. (8) and removing the skipped slots,
(11) 
Intuitively, the algorithm needs at least slots for active devices and a lowerbound for latency of SICQTA can be given as
This is given without any proof, as in bestcase no repetition occurs such that every slot is recoverable from another.
The upper bound for latency can be used for the throughput calculation of SICQTA. If the number of active devices is the same as the total number of devices, i.e., , then we expect SICQTA to have a throughput of 1, as each slot in the tree should be different from one another.
Eq. (11) is a relaxed bound, but it becomes tight for integer values of . Plugging in we get,
(12) 
Thus, we have a throughput of 1 as expected. The proof is given in App. D.
We can check the bound via the example in Fig. 1(b). We see that in total 6 slots are used for SICQTA in the example. Using Eq. (11) we get , showing that the bound is valid and tight for this scenario.
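The example and the throughput claim can also be cross-checked numerically for a small id size. The sketch below is an illustration under the same assumptions as before (perfect cancellation, unique ids of equal length, and a sibling that is obtained by subtraction costs no slot); it is not the authors' implementation, and `sicqta_slot_count` is a hypothetical name. It enumerates all sets of active devices for 3-bit ids and reports the smallest and largest slot count per number of active devices.

```python
from itertools import combinations, product


def sicqta_slot_count(active_ids, prefix="", known=False):
    """Slots spent on the subtree at `prefix`; a `known` node costs no slot itself."""
    responders = [i for i in active_ids if i.startswith(prefix)]
    slots = 0 if known else 1
    if len(responders) > 1:
        slots += sicqta_slot_count(responders, prefix + "0")
        slots += sicqta_slot_count(responders, prefix + "1", known=True)
    return slots


id_bits = 3
all_ids = ["".join(bits) for bits in product("01", repeat=id_bits)]
extremes = {}
for num_active in range(1, len(all_ids) + 1):
    counts = [sicqta_slot_count(ids) for ids in combinations(all_ids, num_active)]
    extremes[num_active] = (min(counts), max(counts))
    print(num_active, "active:", min(counts), "to", max(counts), "slots")
```

In this model, four active devices never need more than the 6 slots of the example, and when all 8 devices are active exactly 8 slots are used, which corresponds to a throughput of 1.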
In Tab. I we compare the number of devices supported by CACSIC with that of SICQTA. The number of active devices is fixed to for CAC, because these are the only available results in [10]. For SICQTA, we see that with a relaxed delay constraint the number of devices supported increases exponentially. Even though the results are similar for low latency constraints, the difference grows with increasing L. The results for SICQTA are also easily extensible to other values, while an exhaustive search is required to build codes for CACSIC. On the other hand, the effect of feedback is neglected in this analysis.
IV-C Simulations
We have run Monte Carlo experiments on a Python-based discrete event simulator, with samples for each experiment, varying the number of active devices.
In Fig. 2 we have plotted the bounds versus simulation for SICQTA. The x-axis depicts the varying number of active devices and the y-axis presents the latency. We have set so implicitly , and we have varied the number of active devices . We see that with iterations for each data point in simulations the bounds are never surpassed and the difference between the lower and the upper bound is quite low.
As we deal with worst-case latency, this is the latency of the last device. In Fig. 2(b) we have evaluated the throughput with varying number of active devices . Mean throughput is almost always above while the tail is also quite constrained, especially with increasing .
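A Monte Carlo experiment of this kind can be sketched in a few lines. The snippet below is not the simulator used for the figures; it is a simplified stand-in that draws random sets of active ids and counts SICQTA slots under the assumptions of perfect cancellation and costless feedback. The id size of 6 bits, the numbers of active devices, the run count and all function names are arbitrary choices for illustration.

```python
import random


def sicqta_slot_count(active_ids, prefix="", known=False):
    """Slots spent on the subtree at `prefix`; a `known` node costs no slot itself."""
    responders = [i for i in active_ids if i.startswith(prefix)]
    slots = 0 if known else 1
    if len(responders) > 1:
        slots += sicqta_slot_count(responders, prefix + "0")
        slots += sicqta_slot_count(responders, prefix + "1", known=True)
    return slots


def simulate(id_bits, num_active, runs, rng):
    """Latency of the last device (in slots) for random sets of active ids."""
    population = [format(i, "0{}b".format(id_bits)) for i in range(2 ** id_bits)]
    return [sicqta_slot_count(rng.sample(population, num_active)) for _ in range(runs)]


rng = random.Random(0)
for num_active in (2, 8, 32):
    latencies = simulate(6, num_active, 1000, rng)
    mean_throughput = sum(num_active / slots for slots in latencies) / len(latencies)
    print(num_active, min(latencies), max(latencies), round(mean_throughput, 3))
```

In this simplified model, each run needs at least as many slots as there are active devices and at most as many slots as there are ids, so the simulated latencies can be compared directly against analytical bounds.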
In Fig. 3 we extend the delay vs throughput comparison in [6] with SICQTA. In this simulation scenario continuous arrivals are considered. If a device gets a packet to transmit while there is an ongoing resolution, the device is queued until the end of that resolution, reflecting the setting in [6]. We see that SICQTA enables a new throughput region that extends to throughput of with . Also with the throughput with stable latency is around . SICQTA becomes similar to SICTA with increasing value. This is logical, as SICTA can be considered a special setting of SICQTA with . Here, it is shown that with the behavior is almost the same as SICTA. It is worth mentioning that the average resolution time is increased, as we see a shift on the y-axis compared to SICTA. We have also simulated higher values of , i.e., and did not observe any difference, so they are not plotted here to avoid clutter. For decreasing the throughput is expected to increase further, reaching .
V Discussions
Constraint  L  L  L  L 

CACSIC [10]  
SICQTA  
SICQTA 
One important point for SICQTA compared to QTA is that the knowledge of does not improve the upperbound of latency. The knowledge of would be used in this case to skip to level . However, in the worstcase all collisions happening before this level consist of different devices, and under a SIC framework, they can all be recovered from each other to obtain useful slots. So the number of skipped slots with knowledge of would be equal to those skipped due to SIC. However, application of knowledge of to QTA can improve the worstcase performance and bring it close to SICQTA.
We have compared the feedback-based algorithms to non-feedback-based algorithms here. However, we assumed that the feedback is instantaneous and costless. In reality that is not the case. The latency incurred due to transmission and reception may even involve hardware delays such as switching from transmit to receive and vice versa. We leave this open for future work.
We are also working on prototyping this algorithm with IEEE 802.15.4 capable sensors and SDRs. One observation we have is that, depending on the quality of the sensor device, the phase noise accumulates through successive interference cancellation, and this makes a collision of 6 packets a wasted slot, as cancellation fails due to the accumulated phase noise. Algorithmic solutions, such as starting the queries from level (Footnote 3: This will cap the maximum number of collided devices to .), should be considered to overcome such hardware constraints. The curious reader can refer to [16] for a theoretical model that incorporates variances in the hardware into the SIC capacity, and to [17] for a practical characterization of the causes of hardware variances.
VI Conclusion
In this work we have evaluated the problem of uplink resource allocation for an unknown number of active devices. We believe that this problem represents the important uplink resource allocation problem for multiple control loops sharing the same wireless network. As a solution we present the Successive Interference Cancellation Query Tree Algorithm (SICQTA). The advantage of the algorithm compared to previous algorithms is its high-throughput performance together with hard latency guarantees. The bounds on the performance are proven analytically and further validated with simulations.
Future work can investigate relaxing the assumptions made to ease the investigation of the protocol. Firstly, the feedback is assumed to be instantaneous and costless; accumulation of feedback messages should be considered to reduce this bottleneck as much as possible. Secondly, we assumed that SIC works perfectly. However, due to accumulated phase noise some collisions cannot be recovered via SIC and indeed result in wasted slots. This should be evaluated and incorporated into the protocol design. Thirdly, even though it is intuitive that decreasing latency and increasing reliability helps cyber-physical systems, an integrated evaluation of control and communication should be done to provide concrete results.
A Proof for upper bound for latency of QTA
The worstcase for QTA is illustrated in Fig. 4. An intuitive explanation is as follows: A device can retransmit at maximum times in the worstcase as that is the size of addresses and every device has a unique address. In this case the device is successful with the transmission and it has experienced collisions. In order to have a collision we need at least devices, and at the worstcase all devices are grouped into two, thus groups. Each group collides separately for times, where there will be idles on the unexplored slots so slots, followed with 2 transmissions for success of each device, we get
(13) 
slots in total. We take into account the activity of the groups of two only after the level . As the initial levels have a lot of overlap, we can remove these levels and consider them separately as
(14) 
where represents the overlapping slots. The number of overlapping slots can be calculated by summing the total number of slots up to level of the tree. We can calculate the total number of nodes in this upper part of the tree as,
(15) 
Plugging this in Eq. (14) we get,
(16)  
(17) 
B Proof for lower bound for latency of QTA
The bestcase in the tree with devices, is that they are organized as a triangle, guaranteeing they are as close as possible to the root. So the level of the successes are almost the same. However, the level of the devices can be the same only if is an integer. If it is not an integer, the bestcase would be some of the devices are successful at level and the others are at . In order to have a complete triangle we would need that devices at level would each have 2 children at . So the number of slots at is equal to the sum of number of devices at plus twice the number of devices at . The number of slots at a level can also be written as so we can write,
(18) 
where and is the number of devices successful in and respectively. We know that the total number of devices is . so we can rewrite Eq. (18) as
(19) 
If we do not consider the level , the tree is a full triangle up to level . We can calculate the total number of slots in the tree for the bestcase through calculating the number of slots for the full tree up to and adding
(20) 
By definition of flooring and ceiling operation if is not an integer. And we can plug Eq. (19) in to get,
(21) 
When is an integer the lowerbound is directly given with , which is equal to the result so we do not mention it separately.
C Proof for number of skipped slots
The skipping in SICQTA consists of two different parts. First part is skipping the idles and second part is skipping the canceled slots . So we can write .
The upperbound for latency of QTA is derived using groups of 2 devices sticking together until the last level of the tree. At the last level they transmit separately, each as a success. The idles occur after separation from the top triangle until the end of the tree. We have collisions and the number of levels until the end of the tree gives us the number of skipped idle slots as
(22) 
Thanks to SIC, after one success the other device does not have to transmit anymore, as after one success the other device can be recovered from the previous collision. Thus, at least slots are skipped for the last level of the tree.
This skipping can be applied to also formation of groups of 2. Groups of 2 are formed from groups of 4. Thus, for the first group formed out of 4 devices, the other group can be recovered from the collision, so one slot can be saved for each separation. In this step we can save slots. This logic can be extended up to separations as we have a binary splitting process. This gives us,
(23) 
Finally, we can write,
(24) 
D Proof for number of skipped slots with
We plug in to Eq. (23)
(25) 
References
 [1] P. Tabuada, “Event-triggered real-time scheduling of stabilizing control tasks,” IEEE Transactions on Automatic Control, vol. 52, no. 9, pp. 1680–1685, 2007.
 [2] M. Vilgelm, O. Ayan, S. Zoppi, and W. Kellerer, “Control-aware uplink resource allocation for cyber-physical systems in wireless networks,” in European Wireless 2017; 23th European Wireless Conference; Proceedings of. VDE, 2017, pp. 1–7.
 [3] “3GPP RP181477: SID on Physical Layer Enhancements for NR URLLC; NR eURLLC L1 ,” 2018.

 [4] M. Gürsu, B. Köprü, S. Coleri Ergen, and W. Kellerer, “Multiplicity estimating random access protocol for resource efficiency in contention based noma,” in Personal, Indoor and Mobile Radio Communications (PIMRC 18), 2018.
 [5] H. ElSayed, S. Sankar, M. Prasad, D. Puthal, A. Gupta, M. Mohanty, and C.T. Lin, “Edge of things: the big picture on the integration of edge, iot and the cloud in a distributed computing environment,” IEEE Access, vol. 6, pp. 1706–1717, 2018.
 [6] Y. Yu and G. B. Giannakis, “Sicta: a 0.693 contention tree algorithm using successive interference cancellation,” in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 3. IEEE, 2005, pp. 1908–1916.
 [7] G. Liva, “Graph-based analysis and optimization of contention resolution diversity slotted aloha,” IEEE Transactions on Communications, vol. 59, no. 2, pp. 477–487, 2011.
 [8] C. Stefanovic, P. Popovski, and D. Vukobratovic, “Frameless aloha protocol for wireless networks,” IEEE Communications Letters, vol. 16, no. 12, pp. 2087–2090, 2012.
 [9] J. Massey and P. Mathys, “The collision channel without feedback,” IEEE Transactions on Information Theory, vol. 31, no. 2, pp. 192–204, 1985.
 [10] C. Boyd, R. Vehkalahti, and O. Tirkkonen, “Interference cancelling codes for ultrareliable random access,” International Journal of Wireless Information Networks, vol. 25, no. 4, pp. 422–433, Dec 2018. [Online]. Available: https://doi.org/10.1007/s1077601804116
 [11] J. Capetanakis, “Tree algorithms for packet broadcast channels,” IEEE transactions on information theory, vol. 25, no. 5, pp. 505–515, 1979.
 [12] J. H. Choi, D. Lee, and H. Lee, “Query tree-based reservation for efficient rfid tag anti-collision,” IEEE Communications Letters, vol. 11, no. 1, 2007.
 [13] R. Kumar, T. F. La Porta, G. Maselli, and C. Petrioli, “Interference cancellation-based rfid tags identification,” in Proceedings of the 14th ACM international conference on Modeling, analysis and simulation of wireless and mobile systems. ACM, 2011, pp. 111–118.
 [14] S. Ghez, S. Verdu, and S. C. Schwartz, “Stability properties of slotted aloha with multipacket reception capability,” IEEE Transactions on Automatic Control, vol. 33, no. 7, pp. 640–649, 1988.
 [15] C. Law, K. Lee, and K.Y. Siu, “Efficient memoryless protocol for tag identification,” in Proceedings of the 4th international workshop on Discrete algorithms and methods for mobile computing and communications. ACM, 2000, pp. 75–84.
 [16] S. P. Weber, J. G. Andrews, X. Yang, and G. de Veciana, “Transmission capacity of wireless ad hoc networks with successive interference cancellation,” IEEE Transactions on Information Theory, vol. 53, no. 8, pp. 2799–2814, Aug 2007.
 [17] G. Zhou, T. He, S. Krishnamurthy, and J. A. Stankovic, “Models and solutions for radio irregularity in wireless sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 2, no. 2, pp. 221–262, 2006.