I Introduction
One of the promises of 5G wireless communication systems is to support large scale machine type communication (MTC), that is, to provide connectivity to a massive number of devices as in the Internet of Things [1, 2, 3]. In massive MTC scenarios, the connection density will be up to devices per square kilometer [4]. A key characteristic of MTC is that the user traffic is typically sporadic so that in any given time interval, only a small fraction of devices are active. Also, short packets are the most common form of traffic generated by sensors and devices in MTC. This requires a fundamentally different design than that for supporting sustained highrate mobile broadband communication.
A variety of effective multiple access techniques have been adopted in the previous and current cellular networks, such as frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), and orthogonal frequency division multiple access (OFDMA) [5]. For these systems, resource blocks are orthogonally divided in time, frequency, or code domains. This makes signal detection at the access point (AP) fairly simple because the interference between adjacent blocks is minimized. However, due to the limitation of the number of orthogonal resource blocks, it can only support a limited number of devices. To provide connectivity to a massive number of devices, an informationtheoretic paradigm called manyuser access has been studied in [6, 7]. It was shown to be asymptotically optimal for active users to simultaneously transmit their identification signatures followed by their message bearing codewords. Further, various massive access schemes have been proposed; see examples [11, 12, 13, 29, 9, 10, 22, 26, 23, 24, 25, 18, 14, 15, 16, 19, 20, 21, 27, 28, 8, 30, 32, 33, 17, 31, 34] and references therein.
Indeed, active device identification and channel estimation are initial steps to enable message decoding in MTC. Due to the sporadic traffic in MTC, these problems are usually cast as neighbor discover or compressed sensing problems [35, 9, 10, 17, 8, 11, 12, 13, 14, 15, 16, 36, 37, 38, 41, 40, 39]. When the channel coefficients are known at the AP, several compressed sensing schemes were proposed for active device identification [41, 40, 39]. Further, approximate message passing (AMP)based algorithms were applied for joint active device identification and channel estimation [14, 11, 12, 13, 15, 16]. In addition, greedy compressed sensing algorithm was designed for sparse signal recovery based on orthogonal matching pursuit [36, 37].
Another approach to the active device identification is slotted ALOHA. Recently, an enhanced random access scheme called coded slotted ALOHA was proposed in [42, 43], where each message is repeatedly sent over multiple slots, and information is passed between slots to recover messages lost due to collision. These works assume synchronized transmission and perfect interference cancelation. The asynchronous model has been studied in [44, 45]. It is pointed in [46] that slotted ALOHA only supports the detection of a single device within each slot. Instead, the authors in [46] proposed the fold ALOHA scheme such that the decoder can simultaneously decode up to messages in the same slot. By combining with serial interference cancellation, the performance of the fold ALOHA scheme is further improved in [47]. And [48, 33] proposed the fold ALOHA based random access scheme for handing Rayleigh fading channels and asynchronous transmission.
To identify the active device from an enormous number of potential users in the system, each device must be assigned a unique sequence. For a given positive integer , a ReedMuller (RM) code book is generated with up to codewords of length . The code book size is so large that every user is assigned a different signature in any practical system. Given this, RM sequencebased massive access schemes were proposed in [10, 9, 28, 27, 30, 29, 49]. In [27], a chirp detection algorithm for deterministic compressed sensing based on RM codes and associated functions was proposed. The authors in [28] have further enhanced the chirp detection algorithm with the slotting and patching framework. However, the algorithm in [28] works only for the additive white Gaussian noise (AWGN) channel (the channel estimation problem is thus not considered therein). For fading channels, an iterative RM device identification and channel estimation algorithm is adopted in [29] based on the derived nested structured of RM codes. In our previous work [49], we extend the real codebook used in [29] to the full codebook, which encodes more bits with little performance loss. In addition, when the number of active devices is large, the performance of the algorithm in [29] degrades dramatically. In contrast, by adopting slotting and message passing, the algorithm in [49] performs gracefully as the number of active devices increases. It can be proved that the worstcase complexities of RM codesbased detection algorithms are sublinear in the number of codewords, which makes it an attractive algorithm for message decoding in MTC. The above RM detection algorithms are based on the assumption that the signals are synchronized. However, due to propagation delays in the practical environment, asynchrony cannot be ignored.
In this paper, we investigate the joint device identification/decoding and channel estimation in an asynchronous setting. By removing the unrealistic assumption of fully synchronous transmissions, the proposed scheme is an important step towards a practical design. In addition, 5G cellular systems are expected to deploy a large number of antennas to take advantage of massive multipleinput multipleoutput (MIMO) technologies. Massive MIMO has been extensively studied to enable massive connectivity [11, 12, 13, 14, 15, 16, 50, 51, 52]
. It can take advantage of the increased spatial degrees of freedom to support a large number of devices simultaneously. Since the user traffic is sporadic in MTC,
[11, 12, 13, 14, 15, 16] proposes to formulate the device identification problem based on compressed sensing and thereby can be solved by the computationally efficient AMP algorithm. And a new pilot random access protocol called strongestuser collision resolution is proposed in [50, 51] to solve the intracell pilot collision in crowded massive MIMO systems. We note that algorithms using random codes and/or AMP type of decoding do not scale to millions of potential devices. Moreover, the preceding algorithms are based on the assumption that the signals are synchronized. As far as we know, there has been no research publications on asynchronous massive access with a large number of users and antennas. To fill this gap, we investigate asynchronous massive access in a multicell wireless network using ReedMuller codes with many receive antennas which has the potential to support billions of potential devices in the network.The main contributions are summarized as follows:

To enhance the performance, we extend the RM detection algorithms to the case where the AP is deployed with a large number of antennas.

We further describe an enhanced RM coding scheme with slotting and bit partition. The corresponding detection algorithm is referred to as Algorithm 1. We show that the computational complexity and performance of Algorithm 1 are notably improved, which makes it one important step closer to a practical algorithm.

While many papers in the literature study massive access, this work is one of the few that can truly accommodate billions of devices and more.
The remainder of this paper is organized as follows. The system model is presented in Section II. Section III outlines the relationship between the RM sequence and its subsequences, which is the basis of the RM asynchronous decoding algorithm. Section IV displays the enhanced RM decoding algorithm utilizing slotting, message passing, and bit partition. Further, the computation complexity analysis is given in Section V. Section VI presents the numerical results, while Section VII concludes the paper.
Throughout the paper, boldface uppercase letters stand for matrices while boldface lowercase letters represent column vectors. The superscripts , , and denote the transpose, complex conjugate, and conjugate transpose operator, respectively. The complex number field is denoted by . denotes the norm of a vector , denotes the Frobenius norm of a matrix , and denotes the cardinality of set . denotes an identity matrix. represents the rounding function, which returns the smallest integer greater than . means elementwise multiplication and gives the phase angle of a complex number.
Ii System Model
Iia Transmission Scheme
Let denote a large but finite set of devices on the plane with area , where each device is equipped with one antenna. Further we denote as the active device set on the plane, each of which has bits to be sent. Due to the sporadic traffic in MTC, the number of active devices in a given time interval is far less than the total number of devices, i.e., .
Prior to transmission, a message of bits is partitioned into subblocks, where the th subblock consists information bits such that . To patch the information bits in different subblocks together, we adopt the tree encoder proposed in [55]. Specifically, the tree encoder appends parity bits to subblock , where the appended check bits satisfy random parity constraints associated with the message bits contained in the previous subblocks ( in all cases). These parity check bits are needed to patch the information bits in different patches together, but they do not serve the purpose of transmitting the information. All the subblocks have the same size, i.e., .
Assume there are time slots. Each active device randomly selects 2 slots to send its bits. We use bits to encode the location of the primary slot. And we use an arbitrary subset of size to encode a translate, which gives the secondary slot location when it is added to the primary slot location. To distinguish the primary and secondary slots, we fix a single check bit in the information bits to be 0 for the primary slot and 1 for the secondary slot. Thus deducing 1 bit from the total number of bits transmitted. Besides, in this paper, we deal with asynchronous transmission. To estimate the device delay, we assume two information bits to be zeros (To be specified in Section IVB).
Fig. 1 depicts the transmission scheme where we set and for simplicity. In Fig. 1, each device has information bits to be sent. The information bits is first divided into subblocks. Each subblock contains information bits, along with parity bits such that . Then each device sent 2 copies of the bits in 2 randomly selected slots within the slots. Finally we perform the proposed decoding scheme and use tree decoder to patch together the information in different subblocks.
IiB Encoding
Our approach is to encode these bits in each slot to a length secondorder RM codes. A length secondorder RM sequence is determined by a symmetric binary matrix and a binary vector . Since is determined by bits and is determined by bits, each sequence encodes bits. Given the matrixvector pair , the th entry of the RM sequence can be written as [28]
(1) 
where , is the bit binary expression of . Eq. (1) indicates that .
In this case, we have
(2) 
Further, the number of information bits is written as
(3) 
IiC Channel Model
In this paper, we consider OFDM modulation where each symbol consists of subcarriers. Denote the frequency samples of device as where is the subcarrier index. As explained before, is a RM sequence generated by (1). The timedomain OFDM symbol of device can be written as
(4) 
where is the carrier spacing and the symbol duration is and is the maximum device delay. is device ’s sample to be transmitted in subcarrier . denotes the transmit power.
We denote as the active device set that transmitted in time slot . Let , we have since each device randomly choose 2 slots in the time slots. Without loss of generality, we assume the index of the active devices in slot is . We focus on one AP equipped with antennas, and assume that the AP is located at the origin of the plane. The receive signal of the th antenna of the AP at time slot is written as
(5) 
where is the channel vector between device and the AP and is additive white Gaussian noise; is the transmission delay and is the transmit signal of device .
Then the AP samples at time . The total number of samples in each slot is thus where is the length of cyclic prefix. Furthermore, the total codelength can be written as
(6) 
Then the AP discard the first cyclic prefix in the OFDM symbol to form the discretetime receive signal
(7) 
where is the normalized delay;
. We assume the normalized delay is uniformly distributed in
.Performing point DFT on yields
(8)  
(9) 
where
(10) 
and and .
Let . For simplicity, denote as the DFT results in slot .
IiD Propagation Model and Cell Coverage
Consider a multiaccess channel with active devices distributed across the plane according to a homogeneous Poisson point process with intensity . The number of active devices on the plane with its area equal to
is a Poisson random variable with mean
.We further divide the active device set into incell device set (neighbor) and outofthecell device set (nonneighbor) of the AP according to the nominal SNR between the AP and the devices. If the nominal SNR between a device and the AP is larger than a threshold, then this device is considered an incell device of the AP. The purpose of the AP is to identify all incell devices and/or decode their messages, where transmissions from outofthecell devices are regarded as interference.
The smallscale fading between the device and the AP is modeled by an independent Rayleigh random variable with unit mean. The largescale fading is modeled by the freespace path loss which attenuates over distance with some path loss exponent .
Let and denotes the distance and the small scale Rayleigh fading gain between device , and the AP, respectively. Then the channel gain between device and the th antenna of the AP is expressed as
(11) 
where the phase of is uniformly distributed on .
The coverage of the AP can be defined in many different ways. According to [53], device and the AP are neighbors of each other if the channel gain exceeds a certain threshold . Assume device and the AP are neighbors, i.e., , we have . Under the assumption that all devices form a p.p.p., for given , device is uniformly distributed in a disk centered at the AP with radius . The average number of neighbors of the AP is calculated as
(12)  
(13)  
(14)  
(15)  
(16) 
where is the Gamma function and is the indicator function. Eq. (16) indicates that is an increasing function of .
In addition, the sum power of all outofthecell devices can be derived as
(17)  
(18)  
(19)  
(20)  
(21) 
Iii A Property of RM Sequences
Before given the decoding algorithm, we first derive a property of RM sequence, which is the basis of our decoding algorithm.
Let be a given positive number. Let be a binary tuple. For , we have
(22) 
Furthermore, let . For , let the binary matrix be defined recursively as
(23) 
where is the main diagonal elements of , and is a length column vector.
We have the following result.
Proposition 1.
Given a length RM sequence, its order and subsequences satisfy
(24) 
where
(25) 
The vector is a length Walsh sequence with frequency .
Proof:
Remark 1: The structure of the derived RM sequence is similar to that of given in [29]. The differences are two folds: 1) given the code length , compared with the structure given in [29], the structure given in this paper allows us to send more bits of information; 2) the way we splitting the sequences is different.
Iv Device Identification/Decoding and Channel Estimation
In this section, we propose a novel RM asynchronous detection algorithm for active device detection and channel estimation that leverages Proposition 1.
Iva RM Asynchronous Detection Algorithm
According to Fig. 1, the AP decodes the messages in different subblocks in a sequential manner. In each subblock, the AP decodes the messages slotbyslot. Since each device transmits in 2 time slots, the message decoded in the previous time slot will be propagated to another time slot to eliminate its interference.
The detailed algorithm is summarized as in Algorithm 1.
Algorithm 1: RM asynchronous detection algorithm. 
Input: the received signal , the average number of devices in each 
slot. 
for do 
Set , , , , . 
for do 
. 
for do 
if do 
Remove the interference of device in slot and update according to (37). 
. 
end if 
end for 
. 
Denote as the number of detected messages in slot . 
for 
if are not recorded in do 
. 
. 
. 
. 
. 
Calculate the translate according to and update . 
end if 
end for 
end for 
Record , and in each subblock. 
end for 
Output: Using tree decoder to patch the information bits together and output. 
The findPb algorithm in Algorithm 1 returns all the messages transmitted in slot , including the information bits , the channel vector , and the device delay . The findPb algorithm decodes the messages transmitted in slot in a sequential manner. Assume the channel gain of device is the biggest. We will show that device can be first estimated from the received signal (9). After device is detected, the AP performs successive interference cancellation (SIC) to remove the interference of device to detect the remaining devices. This requires the AP to estimate not only the matrixvector pair , but also the device delay . In this paper, to estimate the device delay, we let
(35) 
For simplicity, let , where
(36) 
In the next section, we show how to estimate the messages of device .
IvB The findPb Algorithm
According to (22) and (23), the matrixvector pair is determined by and . Specifically, the matrixvector pair of the th device will be estimated recursively. We will show that the algorithm first estimates , then , and finally the channel coefficient , , and .
Then the receiver signal in slot is updated as
(37) 
for detecting the remaining devices.
IvB1 Estimation of
Define for , (39) and (41) lead to
(42)  
(43) 
where
(44) 
The first term in the righthand side of (43) is a linear combination of Walsh functions , with frequency , which can be recovered by applying WalshHadamard Transformation (WHT). The second term, , is a linear combination of chirps, which can be considered to be distributed across all Walsh functions to equal degree, and therefore these crossterms appear as a uniform noise floor.
Let the Hadamard matrix be and its th elements are . Denote the WHT transformation as , where . The th entry of can be written as
(45)  
(46)  
(47) 
Equation (47) can be further written as
(48) 
Equation (48) indicates that, if we have , peaks will appear at frequency , where the maximum value is . On this basis, can be recovered by searching the largest absolute value of and can be estimated based on .
Furthermore, since , the delay of device can be recovered by the phase angle of the maximum value.
(49) 
IvB2 Estimation of
After recovering , we next estimate in a similar way. Define
(50) 
Under the assumption that and are correctly estimated, according to (39) and (41), is further expressed as
(51) 
where the term
(52) 
consists of all interferences from other devices which are all second order RM sequences and
(53) 
i.e., the variance of the channel noise is reduced by half. Besides, we have
(54) 
which indicates that the equivalent channel gain of the interferences is reduced.