I Introduction
Providing efficient support for the Internet of Things (IoT) and Industry IoT (IIoT) [1] is one of the major objectives for the fifth generation (5G) and the nextgeneration 6G cellular wireless communication [2]. In order to realize the physical information exchange in IoT, MachinetoMachine (M2M) communication is anticipated to support billions of Machine Type Communication (MTC) devices [3]
. In addition, for most IoT applications such as smart metering and intelligent transportation, the MTC devices are intermittently activated with a low probability, and the size of the data packets transmitted by active devices is relatively small
[4, 5]. Therefore, the random access (RA) process for M2M communications in IoT is characterized by massive connection and sporadic transmission, as well as smallsized data packets.Confronted with the characteristics described above, conventional orthogonal multiple access (OMA) schemes, where orthogonal resource blocks (RBs) are assigned to each device, becomes infeasible due to its extremely low resource efficiency. To address this problem, different RA schemes have been proposed, where activated devices can share or contend for the uplink resources, and these RA schemes can be generally categorized into two types: grantbased RA [6, 7, 8, 9, 10, 11, 12] and grantfree RA [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28].
Ia GrantBased Random Access
In grantbased RA schemes, activated users contend for the RBs by transmitting a preamble sequence to BS. At the BS, a RB is assigned to the activated user, whose preamble sequence is received and accepted by the BS.
One problem with the grantbased RA schemes is that the RB is wasted when more than one active device transmit the same preamble sequence. This problem is worsened with the massive connection in crowded RA scenarios. Some solutions were proposed to alleviate the RA congestion by reducing the collision probability, such as the Access Class Barring (ACB) scheme [6], delicate splitting of the RA preamble set [7], and automatic configuration of the RA parameters [8]. Other works addressed the preamble shortage [9, 10] or the early preamble collision detection to avoid RB wastage [11, 12].
However, the RB wastage cannot be fully avoided by grantbased RA schemes, especially in crowded RA scenarios, which results in low resource efficiency. Furthermore, a handshaking process is required between the BS and active devices to recognize the contention winner, which undermines the uplink transmission efficiency of small data packets.
IB GrantFree Random Access
In grantfree RA schemes, activated users directly transmit data packets with encapsulated pilot sequences over shared uplink resources. Therefore, no grant is required from the BS for uplink transmission, which improves the transmission efficiency, especially for smallsized data packets. Note that in grantbased RA, the BS can acquire the ID of active users during the handshaking process, so that the activity of each user can be easily recognized. By contrast, in grantfree RA, the BS has no priori knowledge of each user’s activity before the transmission of data packets. Therefore, it is necessary to exploit encapsulated pilot sequences for user activity detection (UAD) and channel estimation (CE), so that the BS can identify each active user and further accomplish data decoding. Different grantfree RA schemes are explained as follows.
IB1 Slotted ALOHA based grantfree RA Schemes
In slotted ALOHA based grantfree RA schemes, activated devices directly transmit data packets in randomly chosen slots while data packets experiencing no slot collision can be correctly decoded [13]. In order to improve the throughput of slotted ALOHA protocols, the contention resolution diversity slotted ALOHA (CRDSA) protocol [14] was proposed, where each activated device sends two replicas of the same data packet and successive interference cancellation (SIC) technique is employed to recover the collided replicas. The CRDSA protocol introduced a (2,1) repetition code to the conventional slotted ALOHA protocol while the coded slotted ALOHA (CSA) protocol [15] further combined the general packet erasure codes with SIC. According to the analogy between the channel codes and the data packet replica, the frameless IRSA protocol [16] and the spatially coupled RA protocol [17] were proposed, corresponding to the rateless codes and the spatially coupled LDPC codes, respectively.
Although different modifications have been proposed to improve the throughput of slotted ALOHA protocols, the severe slot collision always leads to the failure of SIC, which hinders the application of slotted ALOHA based protocols in crowded RA scenarios.
IB2 Compressed sensingbased grantfree RA schemes
Due to the sporadic transmission feature of M2M communications, the RA process can be formulated as a sparse signal recovery problem. Therefore, different compressed sensing (CS) algorithms employ pilot sequences to accomplish the user activity detection (UAD) and/or channel estimation (CE) problem. For example, a block CS algorithm [18] was proposed for distributed UAD and resource allocation, based on the clustering of devices. The joint UAD and CE problem was addressed by a modified Bayesian compressed sensing algorithm [19] for the cloud radio access network (CRAN). In addition, the powerful approximate message passing (AMP) algorithm was employed for the joint UAD and CE problem when the BS is equipped either with a single antenna [20, 21] or with multiple antennas [22, 23].
IB3 Sparse Bayesian learningbased grantfree RA schemes
Different from CSbased RA schemes, the sparse Bayesian learning (SBL) algorithm further considers the prior hyperparameter of the sparse signal to address the joint UAD and CE problem. The Expectation Maximization (EM) method was employed by the AMPSBL algorithm
[24] to update the sparse signal and the hyperparameter iteratively. A least square (LS)based AMPSBL (LSAMPSBL) algorithm [25] was proposed to recover the sparse signal in three steps, i.e., the AMPSBL first provides a coarse estimate, followed by the derivation for the support of the sparse channel, while the final estimate is derived by the support and the LS estimation. The joint UAD and data decoding task for a grantfree NOMA system was modeled and addressed by a deep autoencoder in [26], which exploits the prior information of user activation probability and optimizes the choice of spreading signature. However, this autoencoder is trained based on known channel state information at receiver and a sufficiently static channel, which impedes it application to the rapidly changing environment. Recently, a messagepassing receiver design was proposed for the joint channel estimation and data decoding in uplink grantfree SCMA systems [27], exploiting variational Bayesian inference to extract the sparse signals. In addition, a message passingbased block sparse Bayesian learning (MPBSBL) algorithm
[28] was proposed for a grantfree NOMA system. In both [27] and [28], nonzero elements in the sparse signals are updated with Gaussian message passing [29, 30, 31] and the hyperparameters are updated with meanfield (MF) message passing [32].The Gaussian message passing [29, 30, 31] is constructed on a factor graph, and it is proven to be optimal on cyclefree graphs. However, this factor graph is denselyconnected under crowded RA scenarios, which causes correlated messages and convergence problem for the Gaussian message passing in MPBSBL algorithm [28]. As a result, the CE and UAD accuracy of the MPBSBL algorithm is undermined under crowded RA scenarios. To address this problem, we impose weighting parameters on the Gaussian messages and MF message update in the MPBSBL algorithm. In order to facilitate the training of these weighting parameters, the edgetype message passing process on the factor graph is transformed to a nodetype one in the deep neural network (DNN) [33, 34]. Then these weights in the DNN can be trained in a datadriven manner to mitigate the message correlation and the convergence problem in the MPBSBL algorithm. In this way, higher UAD and CE accuracy can be achieved for crowded RA scenarios.


Abbr.  Meaning 
DNN  Deep Neural Network 
RA  Random Access 
NORA  NonOrthogonal Random Access 
UAD  User Activity Detection 
CE  Channel Estimation 
MPBSBL [28]  Message PassingBased Block 
Sparse Bayesian Learning.  
DNNMPBSBL  DNNAided MPBSBL 
MF  Mean Field 
LDS  LowDensity Signature 
NMSE  Normalized Mean Square Error 

IC Contributions
In this paper, we focus on an orthogonal frequency division multiplexing system with lowdensity signature (LDSOFDM) [35, 36], where devices perform grantfree RA once they are activated. A deep neural networkaided message passingbased block sparse Bayesian learning (DNNMPBSBL) algorithm is proposed in this paper to perform joint UAD and CE for the grantfree RA procedure. The iterative message passing process of the MPBSBL algorithm [28] is transferred from a factor graph to a neural network. Weights are imposed on the messages passing in the neural network and further trained to minimize the estimation error. It is shown that the trained weights could alleviate the convergence problem of the MPBSBL algorithm, especially in crowded RA scenarios. As a result, the DNNMPBSBL algorithm could improve the UAD and CE accuracy with a smaller number of iterations. In addition, the training of the DNN is conducted offline, and negligible online computational complexity is additionally sacrificed for the performance improvement by the DNN. This guarantees the feasibility of the proposed DNNMPBSBL algorithm for crowded NORA systems with realtime implementation and lowlatency requirement.
The rest of this paper is organized as follows. The system model and the MPBSBL algorithm are presented in Section II. The DNN structure for the DNNMPBSBL algorithm is illustrated in Section III, where the weighted message passing is explained in details. Simulation results are given in Section IV to verify the UAD and CE accuracy of the proposed DNNMPBSBL algorithm. Finally, Section V concludes this paper. For reading convenience, the frequentlyused abbreviations in this paper are summarized in Table. I.
Ii Joint UAD and CE by MPBSBL
Iia System Model
As shown in Fig. 1, a LDSOFDM communication system is considered. It is assumed that there are subcarriers and users, while each user is activated with probability . For each active user , its binary information sequence is encoded and further mapped into a QAM symbol sequence with length . One unique pilot sequence with length is allocated for user , and is further inserted into the transmitted symbol sequence , i.e., . Therefore, the length of is . Then each symbol in is spread onto subcarriers, using a lowdensity spreading pattern
, which is a sparse vector with length
and nonzero elements. In this way, the same sequence is transmitted on different subcarriers.The LDS spreaders for all the users are represented by an LDS spreading matrix , which is assumed known at the receiver. We consider a regular LDS spreading matrix, i.e., the column degree and the row degree in are constant. According to the regular structure of , each subcarrier is shared by potential users, with . When multiple users are activated on the same subcarrier, the grantfree RA procedure is conducted in a nonorthogonal multiple access (NOMA) manner. Therefore, the system model in Fig. 1 is termed a LDSOFDM based grantfree nonorthogonal random access (NORA) system.
Before transmission, the cyclic prefix (CP) is inserted after the OFDM modulator to avoid the intersymbol interference (ISI). At the receiver, the CP is removed, and after the OFDM demodulator, we obtain the frequencydomain received signal matrix , where is the received signal vector representing the received symbols on the th subcarrier.
Now we discuss the channel between the input and the output of this LDSOFDM system. The transmitted sequences of all the users are organized in an input matrix , while the user activity is equivalently incorporated in the channel matrix . Specifically, we consider the effective channel gain vector of user to integrate the actual channel gain vector with the sparse spreading pattern . That is,
(1) 
where the th element of represents the actual channel gain of user on the th subcarrier, and the th element of is nonzero if and only if user chooses to conduct transmission on the th subcarrier. Therefore, the effective channel gain vector is a sparse vector with nonzero elements. Furthermore, we incorporate user activity in the channel matrix, i.e., , where the activity indicator if user is inactive. Otherwise, . In this way, we can obtain
(2) 
where , the th entry of represents the th received symbol on the th subcarrier, the th entry of represents the th transmitted symbol of the th user. The entries in the additive white Gaussian noise (AWGN) matrix
are assumed i.i.d with noise variance
.According to the composition of , can be decomposed as , where and represent the received signal matrices with respect to (w.r.t.) the pilot sequences and the data sequences, respectively. After the separator, is further processed by the proposed DNNMPBSBL algorithm for the joint UAD and CE. According to the output of the DNNMPBSBL algorithm , is fed into the multiuser detection (MUD) module to detect and decode the information bits of active users.
IiB Problem Formulation
In this paper, we mainly consider the received pilot matrix and solve the joint UAD and CE problem. Analogous to (2), can be formulated as
(3) 
where the th entry of represents the th received signal on the th subcarrier, and the th entry of represents the th pilot symbol of the th user. The pilot matrix is assumed known to the receiver. Then we perform vectorization on the transpose of as in [28],
(4) 
where , represents the Kronecker product, , and is the effective channel gain vector of user on subcarriers. According to the LDS spreading matrix , the transmitted sequence of each user is only spread onto subcarriers. Therefore, as explained in (1), elements in are zero, and the positions of the zero elements are known to the BS. We further simplify by eliminating the zeros according to . Accordingly, the columns in corresponding to the zeros in are also removed. Finally, we obtain the simplified version of (3) in equation () of (4), where and are obtained from and by zero elimination. According to (4), the joint UAD and CE problem is formulated as
(5) 
Remark 1
To enable this joint UAD and CE task, each active user transmits its unique pilot sequence before its data transmission. A common practice is to employ the ZadoffChu (ZC) sequences [37] as pilot sequences, which is also considered in this work. One ZC sequence with length is generated as follows,
(6) 
where , the sequence length is a prime number, and is the root of . When the root is fixed, we can generate different pilot sequences by cyclically shifting one existing sequence by elements. Therefore, when is set to 1, different pilot sequences can be generated for the users.
IiC MPBSBL Algorithm [28]
The recovery of the block sparse signal in (4) can be addressed by the MPBSBL algorithm [28], which is based on two types of lowcomplexity message passing algorithms. The Gaussian message passing is employed for updating the estimated channel gain and the mean field (MF) message passing is adopted to approximate the hyperparameters related to the user activity. The assumptions employed by the MPBSBL algorithm are detailed as follows.
For user , the distribution of the channel gains on subcarriers is assumed conditioned on a hyperparameter
. Specifically, the channel gains are assumed independently and identically distributed (i.i.d) with complex Gaussian distribution, i.e.,
. In addition, the reciprocal of the estimated hyperparameter is compared with a predefined threshold to determine the user activity. If is smaller than , this user is detected as inactive. Otherwise, this user is detected as active. The hyperparameteris assumed to follow a Gamma distribution
[27, 28, 38]. The noise precision is unknown at the receiver but assumed with a priori probability. With these assumptions above, the joint a posterior probability is factorized as follows
(7) 
where , , is the th row of matrix in (4), , , and ,
represents the complex Gaussian distribution probability density function (pdf) of
with mean and variance , while represents the Gamma distribution pdf of with parameters and . According to the variational Bayesian inference theory [38], the parameters and are usually assumed in the order of . Further details are referred to [27, 28].Accoring to the factorization in (7), a factor graph can be established for the MPBSBL algorithm, which is illustrated in Fig. 2. In Fig. 2, , , and denote , , , and , respectively. In addition, the extra variable is introduced and the constraint is represented by . Then, is a function of and , i.e., . The MPBSBL algorithm performed on the factor graph in Fig. 2 is briefed as follows
Denote as the iteration index and as the product of all the incoming messages from the neighboring nodes to , then according to , the variance and mean of are derived as follows,
(8) 
(9) 
By considering the message passing from to , the variance and mean of are updated as
(10) 
The variance and mean passing from to are derived in (11).
(11) 
Then the hyperparameter for the variance of the channel gain is updated according to the MF message passing,
(12) 
The variance and mean of are derived by , and the noise precision ,
(13) 
Finally, the noise precision is updated with the MF message passing,
(14) 
At the last iteration, if is smaller than a predefined threshold , user is regarded as inactive. Otherwise, this user is determined as active and is the estimated channel gain for user . Further details of the derivations above are referred to [28].
Iii DNNAided MPBSBL Algorithm
Two types of message passing are adopted by the MPBSBL algorithm. The Gaussian message passing is employed for CE in (10), while the MF message passing is employed to update the hyperparameters in (12) for UAD. According to the factor graph in Fig. 2, the variable nodes and sum nodes are densely connected. That is, the connection between and is characterized by many short cycles with girth 4, which results in the correlation problem of the Gaussian messages [29]. Furthermore, in crowded NORA systems, where the number of sum nodes is much smaller than the number of variable nodes , the correlated Gaussian messages further lead to the convergence problem for the MPBSBL algorithm. As a result, the convergence speed is deteriorated and the accuracy of the MPBSBL algorithm fails to get improved as the signaltonoise ratio (SNR) increases. On the other hand, the correlated Gaussian messages also lead to inaccuracy for the MF message update, which undermines the UAD performance.
Confronted with the problems above, we propose a DNNMPBSBL algorithm, which imposes weights on the Gaussian messages on the factor graph in Fig. 2. Simultaneously, weights are also imposed on the MF message update in (12) and further trained to improve the UAD accuracy. To facilitate the training of the weights, we transfer the message passing process from the factor graph in Fig. 2 to a DNN structure. The backandforth message passing process on the factor graph is now transformed into a forwardpropagation trellis in Fig. 3(a), which resembles the structure of a neural network. It is assumed that there are subcarriers, users, the length of pilot sequence is , each user chooses subcarriers to perform uplink data transmission, and the DNNMPBSBL algorithm is performed with iterations. The regular LDS spreading matrix is also shown in Fig. 3(a).
As shown in Fig. 3(a), the trellis can be considered as a neural network. Each iteration of the MPBSBL algorithm is now represented by one iteration block. Since the structure is identical for each iteration block, we only illustrate one iteration block in Fig. 3(a). However, it is noted that neighboring iteration blocks are connected. For illustration simplicity, we include the nodes from the th iteration block, e.g. and as the input nodes in the th iteration block. In this way, the entire DNN is constructed by connecting iteration blocks. Within each iteration block, one layer represents one particular message, i.e., each layer is defined by its output message. Two auxiliary layers and , i.e., Layer 2 and Layer 6 are also added for the clarity of illustration. Therefore, according to equations (8) to (14), there are 9 layers within each iteration block. Detailed organization of each layer is illustrated in Fig. 3(b) and the weighted message passing process is explained as follows.


Index  Layer Input  Layer Output  Length 
1  None  
2  
3  
4  
5  
6  
7  
8  
9  

Iiia Layer Organization and Weighting Matrix
For notational simplicity, the nodes in each layer are placed in a row vector in Fig. 3(b), and the weighted message passing process is then equivalent to the input vector multiplying a weighting matrix. The input vectors, output vector, and the length of output vector for each layer are listed in Table II while the elements of different vectors are organized as follows.
IiiA1 For vectors with length
The organization for the row vector in Layer 1 is taken as an example in Fig. 3(b). Every elements are grouped into a pilot symbol group according to the common pilot symbol they correspond to. These pilot symbol groups are organized according to the index of the pilot symbols. Within each pilot symbol group, the elements are organized according to the subcarrier index. For example, the th node in the th pilot symbol group represents the th received pilot signal on the th subcarrier. In this way, the row vector is organized as .
IiiA2 For vectors with length
The organization for the auxiliary row vector in Layer 6 is taken as an example. Every elements are grouped into a user group, according to the common user they correspond to. Within each user group, every elements are grouped into a subgroup according to the common subcarrier they correspond to. Within each subcarrier subgroup, the elements are arranged according to the pilot symbol index.
IiiA3 For vectors with length
The organization for the row vector in Layer 3 is taken as an example. Every elements are grouped together according to the common user they correspond to. Within each user group, the elements are organized according to the subcarrier index. In this way, the row vector is organized as , where .
IiiA4 For vectors with length
The elements in the row vector are arranged according to the user index.
IiiA5 Weighting matrix
Since input vectors and output vector of a layer are organized as row vectors, the weighted message update (i.e., the output vector) for every layer can be represented by the input vector multiplied by a weighting matrix. The size of the weighting matrix is where and are the length of the input vector and the output vector, respectively. According to Fig. 3(a), nodes in neighboring layers are not fully connected, i.e., the weighting matrix is defined by the interlayer connection. An example is shown in Fig. 3(b) for the weighting matrix defined by the connection from Layer 4 to Layer 5 of Fig. 3(a). According to this example, the th entry of the weighting matrix is nonzero if and only if the th node in the input layer is connected to the th node in the output layer. Furthermore, only the nonzero weights in the weighting matrix are trained for the DNNMPBSBL algorithm.
IiiB Weighted Message Passing
The weighted message passing is explained as follows.
Layer 1: Layer 1 serves as the input of the neural network within one iteration block and it is composed of two message vectors, and . The elements and are grouped together in Fig. 3(a) since they share identical connection to the nodes in Layer 2.
Layer 2: Layer 2 is the auxiliary layer , and the output is composed of two message vectors and , whose elements correspond to the terms in the cumulative summation of (8) and (9), respectively. The weighted message update of Layer 2 is derived as follows,
(15) 
(16) 
where and are row vectors with length . and are two weighting matrices with size . , , , and are weighting matrices defined by the connection from Layer 1 to Layer 2. In addition, the fraction and operations are the elementwise division and multiplication operations while the operation refers to the matrix multiplication operation.
Layer 3: The weighted output message vectors of Layer 3 are derived as follows,
(17) 
where and are weighting matrices defined by the connection from Layer 2 to Layer 3. is a diagonal weighting matrix with size .
Layer 4: The weighted output message vectors of Layer 4 are derived as follows,
(18) 
where refers to the Kronecker product. and are two allone row vectors and their vector length are and , respectively. , , and are diagonal weighting matrices with size .
Layer 5: The weighted output message vector is derived as follows,
(19) 
where and are row vectors with length , , and are weighting matrices defined by the connection from Layer 4 to Layer 5.
Layer 6: Layer 6 is the auxiliary layer with weighted output message vectors and , whose elements correspond to the terms in the cumulative summation in (11).
(20) 
where is a row vector with length .
Layer 7: The weighted output vectors of Layer 7 are derived as follows,
(21) 
where and are weighting matrices defined by the connection from Layer 6 to Layer 7. and are diagonal weighting matrices with size . is a weighting matrix with size .
Layer 8: The weighted output vectors of Layer 8 are derived as follows,
(22) 
where and are weighting matrices defined by the connection from Layer 7 to Layer 8. is a diagonal weighting matrix with size , and is a weighting matrix with size .
Layer 9: The weighted output message of Layer 9 is derived as follows,
(23) 
where and are diagonal weighting matrices with size , and is an allone column vector with length . is a weighting matrix with size .
IiiC Loss Function
The DNNMPBSBL algorithm is conducted in two periods: the training period and the testing period. In the training period, according to the given samples in the training set, the nonzero entries in the weighting matrices are trained to minimize the estimation error. In the testing period, the weighting matrices are fixed. The UAD and CE accuracy of the DNNMPBSBL algorithm is evaluated by the samples in the test set.
In order to train the nonzero entries in the weighting matrices, a loss function is employed during the training period to measure the estimation error. Then the weights are adjusted with the stochastic gradient descent (SGD) algorithm in a backpropagation manner to minimize the loss function. For the stability of the SGD algorithm, we employ the mean square error (MSE)
as the loss function, where is the estimated channel gain and is the known channel gain in the training set. It is noted that, different from the training period, the Normalized MSE (NMSE) is considered for the simulations in the testing period to measure the UAD and CE accuracy.