One of the promises of 5G wireless communication systems is to support large scale machine- type communication (MTC), that is, to provide connectivity to a massive number of devices as in the Internet of Things [1, 2, 3]. In massive MTC scenarios, the connection density will be up to devices per square kilometer . A key characteristic of MTC is that the user traffic is typically sporadic so that in any given time interval, only a small fraction of devices are active. Also, short packets are the most common form of traffic generated by sensors and devices in MTC. This requires a fundamentally different design than that for supporting sustained high-rate mobile broadband communication.
A variety of effective multiple access techniques have been adopted in the previous and current cellular networks, such as frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), and orthogonal frequency division multiple access (OFDMA) . For these systems, resource blocks are orthogonally divided in time, frequency, or code domains. This makes signal detection at the access point (AP) fairly simple because the interference between adjacent blocks is minimized. However, due to the limitation of the number of orthogonal resource blocks, it can only support a limited number of devices. To provide connectivity to a massive number of devices, an information-theoretic paradigm called many-user access has been studied in [6, 7]. It was shown to be asymptotically optimal for active users to simultaneously transmit their identification signatures followed by their message bearing codewords. Further, various massive access schemes have been proposed; see examples [11, 12, 13, 29, 9, 10, 22, 26, 23, 24, 25, 18, 14, 15, 16, 19, 20, 21, 27, 28, 8, 30, 32, 33, 17, 31, 34] and references therein.
Indeed, active device identification and channel estimation are initial steps to enable message decoding in MTC. Due to the sporadic traffic in MTC, these problems are usually cast as neighbor discover or compressed sensing problems [35, 9, 10, 17, 8, 11, 12, 13, 14, 15, 16, 36, 37, 38, 41, 40, 39]. When the channel coefficients are known at the AP, several compressed sensing schemes were proposed for active device identification [41, 40, 39]. Further, approximate message passing (AMP)-based algorithms were applied for joint active device identification and channel estimation [14, 11, 12, 13, 15, 16]. In addition, greedy compressed sensing algorithm was designed for sparse signal recovery based on orthogonal matching pursuit [36, 37].
Another approach to the active device identification is slotted ALOHA. Recently, an enhanced random access scheme called coded slotted ALOHA was proposed in [42, 43], where each message is repeatedly sent over multiple slots, and information is passed between slots to recover messages lost due to collision. These works assume synchronized transmission and perfect interference cancelation. The asynchronous model has been studied in [44, 45]. It is pointed in  that slotted ALOHA only supports the detection of a single device within each slot. Instead, the authors in  proposed the -fold ALOHA scheme such that the decoder can simultaneously decode up to messages in the same slot. By combining with serial interference cancellation, the performance of the -fold ALOHA scheme is further improved in . And [48, 33] proposed the -fold ALOHA based random access scheme for handing Rayleigh fading channels and asynchronous transmission.
To identify the active device from an enormous number of potential users in the system, each device must be assigned a unique sequence. For a given positive integer , a Reed-Muller (RM) code book is generated with up to codewords of length . The code book size is so large that every user is assigned a different signature in any practical system. Given this, RM sequence-based massive access schemes were proposed in [10, 9, 28, 27, 30, 29, 49]. In , a chirp detection algorithm for deterministic compressed sensing based on RM codes and associated functions was proposed. The authors in  have further enhanced the chirp detection algorithm with the slotting and patching framework. However, the algorithm in  works only for the additive white Gaussian noise (AWGN) channel (the channel estimation problem is thus not considered therein). For fading channels, an iterative RM device identification and channel estimation algorithm is adopted in  based on the derived nested structured of RM codes. In our previous work , we extend the real codebook used in  to the full codebook, which encodes more bits with little performance loss. In addition, when the number of active devices is large, the performance of the algorithm in  degrades dramatically. In contrast, by adopting slotting and message passing, the algorithm in  performs gracefully as the number of active devices increases. It can be proved that the worst-case complexities of RM codes-based detection algorithms are sub-linear in the number of codewords, which makes it an attractive algorithm for message decoding in MTC. The above RM detection algorithms are based on the assumption that the signals are synchronized. However, due to propagation delays in the practical environment, asynchrony cannot be ignored.
In this paper, we investigate the joint device identification/decoding and channel estimation in an asynchronous setting. By removing the unrealistic assumption of fully synchronous transmissions, the proposed scheme is an important step towards a practical design. In addition, 5G cellular systems are expected to deploy a large number of antennas to take advantage of massive multiple-input multiple-output (MIMO) technologies. Massive MIMO has been extensively studied to enable massive connectivity [11, 12, 13, 14, 15, 16, 50, 51, 52]
. It can take advantage of the increased spatial degrees of freedom to support a large number of devices simultaneously. Since the user traffic is sporadic in MTC,[11, 12, 13, 14, 15, 16] proposes to formulate the device identification problem based on compressed sensing and thereby can be solved by the computationally efficient AMP algorithm. And a new pilot random access protocol called strongest-user collision resolution is proposed in [50, 51] to solve the intra-cell pilot collision in crowded massive MIMO systems. We note that algorithms using random codes and/or AMP type of decoding do not scale to millions of potential devices. Moreover, the preceding algorithms are based on the assumption that the signals are synchronized. As far as we know, there has been no research publications on asynchronous massive access with a large number of users and antennas. To fill this gap, we investigate asynchronous massive access in a multi-cell wireless network using Reed-Muller codes with many receive antennas which has the potential to support billions of potential devices in the network.
The main contributions are summarized as follows:
To enhance the performance, we extend the RM detection algorithms to the case where the AP is deployed with a large number of antennas.
We further describe an enhanced RM coding scheme with slotting and bit partition. The corresponding detection algorithm is referred to as Algorithm 1. We show that the computational complexity and performance of Algorithm 1 are notably improved, which makes it one important step closer to a practical algorithm.
While many papers in the literature study massive access, this work is one of the few that can truly accommodate billions of devices and more.
The remainder of this paper is organized as follows. The system model is presented in Section II. Section III outlines the relationship between the RM sequence and its subsequences, which is the basis of the RM asynchronous decoding algorithm. Section IV displays the enhanced RM decoding algorithm utilizing slotting, message passing, and bit partition. Further, the computation complexity analysis is given in Section V. Section VI presents the numerical results, while Section VII concludes the paper.
Throughout the paper, boldface uppercase letters stand for matrices while boldface lowercase letters represent column vectors. The superscripts , , and denote the transpose, complex conjugate, and conjugate transpose operator, respectively. The complex number field is denoted by . denotes the -norm of a vector , denotes the Frobenius norm of a matrix , and denotes the cardinality of set . denotes an identity matrix. represents the rounding function, which returns the smallest integer greater than . means element-wise multiplication and gives the phase angle of a complex number.
Ii System Model
Ii-a Transmission Scheme
Let denote a large but finite set of devices on the plane with area , where each device is equipped with one antenna. Further we denote as the active device set on the plane, each of which has bits to be sent. Due to the sporadic traffic in MTC, the number of active devices in a given time interval is far less than the total number of devices, i.e., .
Prior to transmission, a message of bits is partitioned into sub-blocks, where the -th sub-block consists information bits such that . To patch the information bits in different sub-blocks together, we adopt the tree encoder proposed in . Specifically, the tree encoder appends parity bits to sub-block , where the appended check bits satisfy random parity constraints associated with the message bits contained in the previous sub-blocks ( in all cases). These parity check bits are needed to patch the information bits in different patches together, but they do not serve the purpose of transmitting the information. All the sub-blocks have the same size, i.e., .
Assume there are time slots. Each active device randomly selects 2 slots to send its bits. We use bits to encode the location of the primary slot. And we use an arbitrary subset of size to encode a translate, which gives the secondary slot location when it is added to the primary slot location. To distinguish the primary and secondary slots, we fix a single check bit in the information bits to be 0 for the primary slot and 1 for the secondary slot. Thus deducing 1 bit from the total number of bits transmitted. Besides, in this paper, we deal with asynchronous transmission. To estimate the device delay, we assume two information bits to be zeros (To be specified in Section IV-B).
Fig. 1 depicts the transmission scheme where we set and for simplicity. In Fig. 1, each device has information bits to be sent. The information bits is first divided into sub-blocks. Each sub-block contains information bits, along with parity bits such that . Then each device sent 2 copies of the bits in 2 randomly selected slots within the slots. Finally we perform the proposed decoding scheme and use tree decoder to patch together the information in different sub-blocks.
Our approach is to encode these bits in each slot to a length second-order RM codes. A length second-order RM sequence is determined by a symmetric binary matrix and a binary vector . Since is determined by bits and is determined by bits, each sequence encodes bits. Given the matrix-vector pair , the -th entry of the RM sequence can be written as 
where , is the -bit binary expression of . Eq. (1) indicates that .
In this case, we have
Further, the number of information bits is written as
Ii-C Channel Model
In this paper, we consider OFDM modulation where each symbol consists of subcarriers. Denote the frequency samples of device as where is the subcarrier index. As explained before, is a RM sequence generated by (1). The time-domain OFDM symbol of device can be written as
where is the carrier spacing and the symbol duration is and is the maximum device delay. is device ’s sample to be transmitted in subcarrier . denotes the transmit power.
We denote as the active device set that transmitted in time slot . Let , we have since each device randomly choose 2 slots in the time slots. Without loss of generality, we assume the index of the active devices in slot is . We focus on one AP equipped with antennas, and assume that the AP is located at the origin of the plane. The receive signal of the -th antenna of the AP at time slot is written as
where is the channel vector between device and the AP and is additive white Gaussian noise; is the transmission delay and is the transmit signal of device .
Then the AP samples at time . The total number of samples in each slot is thus where is the length of cyclic prefix. Furthermore, the total codelength can be written as
Then the AP discard the first cyclic prefix in the OFDM symbol to form the discrete-time receive signal
where is the normalized delay;
. We assume the normalized delay is uniformly distributed in.
Performing point DFT on yields
and and .
Let . For simplicity, denote as the DFT results in slot .
Ii-D Propagation Model and Cell Coverage
Consider a multiaccess channel with active devices distributed across the plane according to a homogeneous Poisson point process with intensity . The number of active devices on the plane with its area equal to
is a Poisson random variable with mean.
We further divide the active device set into in-cell device set (neighbor) and out-of-the-cell device set (non-neighbor) of the AP according to the nominal SNR between the AP and the devices. If the nominal SNR between a device and the AP is larger than a threshold, then this device is considered an in-cell device of the AP. The purpose of the AP is to identify all in-cell devices and/or decode their messages, where transmissions from out-of-the-cell devices are regarded as interference.
The small-scale fading between the device and the AP is modeled by an independent Rayleigh random variable with unit mean. The large-scale fading is modeled by the free-space path loss which attenuates over distance with some path loss exponent .
Let and denotes the distance and the small scale Rayleigh fading gain between device , and the AP, respectively. Then the channel gain between device and the -th antenna of the AP is expressed as
where the phase of is uniformly distributed on .
The coverage of the AP can be defined in many different ways. According to , device and the AP are neighbors of each other if the channel gain exceeds a certain threshold . Assume device and the AP are neighbors, i.e., , we have . Under the assumption that all devices form a p.p.p., for given , device is uniformly distributed in a disk centered at the AP with radius . The average number of neighbors of the AP is calculated as
where is the Gamma function and is the indicator function. Eq. (16) indicates that is an increasing function of .
In addition, the sum power of all out-of-the-cell devices can be derived as
Iii A Property of RM Sequences
Before given the decoding algorithm, we first derive a property of RM sequence, which is the basis of our decoding algorithm.
Let be a given positive number. Let be a binary -tuple. For , we have
Furthermore, let . For , let the binary matrix be defined recursively as
where is the main diagonal elements of , and is a length column vector.
We have the following result.
Given a length- RM sequence, its order and sub-sequences satisfy
The vector is a length- Walsh sequence with frequency .
Recall is the -bit expression of . For , the vector can be decomposed as
Remark 1: The structure of the derived RM sequence is similar to that of given in . The differences are two folds: 1) given the code length , compared with the structure given in , the structure given in this paper allows us to send more bits of information; 2) the way we splitting the sequences is different.
Iv Device Identification/Decoding and Channel Estimation
In this section, we propose a novel RM asynchronous detection algorithm for active device detection and channel estimation that leverages Proposition 1.
Iv-a RM Asynchronous Detection Algorithm
According to Fig. 1, the AP decodes the messages in different sub-blocks in a sequential manner. In each sub-block, the AP decodes the messages slot-by-slot. Since each device transmits in 2 time slots, the message decoded in the previous time slot will be propagated to another time slot to eliminate its interference.
The detailed algorithm is summarized as in Algorithm 1.
|Algorithm 1: RM asynchronous detection algorithm.|
|Input: the received signal , the average number of devices in each|
|Set , , , , .|
|Remove the interference of device in slot and update according to (37).|
|Denote as the number of detected messages in slot .|
|if are not recorded in do|
|Calculate the translate according to and update .|
|Record , and in each sub-block.|
|Output: Using tree decoder to patch the information bits together and output.|
The findPb algorithm in Algorithm 1 returns all the messages transmitted in slot , including the information bits , the channel vector , and the device delay . The findPb algorithm decodes the messages transmitted in slot in a sequential manner. Assume the channel gain of device is the biggest. We will show that device can be first estimated from the received signal (9). After device is detected, the AP performs successive interference cancellation (SIC) to remove the interference of device to detect the remaining devices. This requires the AP to estimate not only the matrix-vector pair , but also the device delay . In this paper, to estimate the device delay, we let
For simplicity, let , where
In the next section, we show how to estimate the messages of device .
Iv-B The findPb Algorithm
According to (22) and (23), the matrix-vector pair is determined by and . Specifically, the matrix-vector pair of the th device will be estimated recursively. We will show that the algorithm first estimates , then , and finally the channel coefficient , , and .
Then the receiver signal in slot is updated as
for detecting the remaining devices.
Iv-B1 Estimation of
The first term in the right-hand side of (43) is a linear combination of Walsh functions , with frequency , which can be recovered by applying Walsh-Hadamard Transformation (WHT). The second term, , is a linear combination of chirps, which can be considered to be distributed across all Walsh functions to equal degree, and therefore these cross-terms appear as a uniform noise floor.
Let the Hadamard matrix be and its -th elements are . Denote the WHT transformation as , where . The -th entry of can be written as
Equation (47) can be further written as
Equation (48) indicates that, if we have , peaks will appear at frequency , where the maximum value is . On this basis, can be recovered by searching the largest absolute value of and can be estimated based on .
Furthermore, since , the delay of device can be recovered by the phase angle of the maximum value.
Iv-B2 Estimation of
After recovering , we next estimate in a similar way. Define
where the term
consists of all interferences from other devices which are all second order RM sequences and
i.e., the variance of the channel noise is reduced by half. Besides, we have
which indicates that the equivalent channel gain of the interferences is reduced.