Providing efficient support for the Internet of Things (IoT) and Industry IoT (IIoT)  is one of the major objectives for the fifth generation (5G) and the next-generation 6G cellular wireless communication . In order to realize the physical information exchange in IoT, Machine-to-Machine (M2M) communication is anticipated to support billions of Machine Type Communication (MTC) devices 
. In addition, for most IoT applications such as smart metering and intelligent transportation, the MTC devices are intermittently activated with a low probability, and the size of the data packets transmitted by active devices is relatively small[4, 5]. Therefore, the random access (RA) process for M2M communications in IoT is characterized by massive connection and sporadic transmission, as well as small-sized data packets.
Confronted with the characteristics described above, conventional orthogonal multiple access (OMA) schemes, where orthogonal resource blocks (RBs) are assigned to each device, becomes infeasible due to its extremely low resource efficiency. To address this problem, different RA schemes have been proposed, where activated devices can share or contend for the uplink resources, and these RA schemes can be generally categorized into two types: grant-based RA [6, 7, 8, 9, 10, 11, 12] and grant-free RA [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28].
I-a Grant-Based Random Access
In grant-based RA schemes, activated users contend for the RBs by transmitting a preamble sequence to BS. At the BS, a RB is assigned to the activated user, whose preamble sequence is received and accepted by the BS.
One problem with the grant-based RA schemes is that the RB is wasted when more than one active device transmit the same preamble sequence. This problem is worsened with the massive connection in crowded RA scenarios. Some solutions were proposed to alleviate the RA congestion by reducing the collision probability, such as the Access Class Barring (ACB) scheme , delicate splitting of the RA preamble set , and automatic configuration of the RA parameters . Other works addressed the preamble shortage [9, 10] or the early preamble collision detection to avoid RB wastage [11, 12].
However, the RB wastage cannot be fully avoided by grant-based RA schemes, especially in crowded RA scenarios, which results in low resource efficiency. Furthermore, a handshaking process is required between the BS and active devices to recognize the contention winner, which undermines the uplink transmission efficiency of small data packets.
I-B Grant-Free Random Access
In grant-free RA schemes, activated users directly transmit data packets with encapsulated pilot sequences over shared uplink resources. Therefore, no grant is required from the BS for uplink transmission, which improves the transmission efficiency, especially for small-sized data packets. Note that in grant-based RA, the BS can acquire the ID of active users during the handshaking process, so that the activity of each user can be easily recognized. By contrast, in grant-free RA, the BS has no priori knowledge of each user’s activity before the transmission of data packets. Therefore, it is necessary to exploit encapsulated pilot sequences for user activity detection (UAD) and channel estimation (CE), so that the BS can identify each active user and further accomplish data decoding. Different grant-free RA schemes are explained as follows.
I-B1 Slotted ALOHA based grant-free RA Schemes
In slotted ALOHA based grant-free RA schemes, activated devices directly transmit data packets in randomly chosen slots while data packets experiencing no slot collision can be correctly decoded . In order to improve the throughput of slotted ALOHA protocols, the contention resolution diversity slotted ALOHA (CRDSA) protocol  was proposed, where each activated device sends two replicas of the same data packet and successive interference cancellation (SIC) technique is employed to recover the collided replicas. The CRDSA protocol introduced a (2,1) repetition code to the conventional slotted ALOHA protocol while the coded slotted ALOHA (CSA) protocol  further combined the general packet erasure codes with SIC. According to the analogy between the channel codes and the data packet replica, the frameless IRSA protocol  and the spatially coupled RA protocol  were proposed, corresponding to the rateless codes and the spatially coupled LDPC codes, respectively.
Although different modifications have been proposed to improve the throughput of slotted ALOHA protocols, the severe slot collision always leads to the failure of SIC, which hinders the application of slotted ALOHA based protocols in crowded RA scenarios.
I-B2 Compressed sensing-based grant-free RA schemes
Due to the sporadic transmission feature of M2M communications, the RA process can be formulated as a sparse signal recovery problem. Therefore, different compressed sensing (CS) algorithms employ pilot sequences to accomplish the user activity detection (UAD) and/or channel estimation (CE) problem. For example, a block CS algorithm  was proposed for distributed UAD and resource allocation, based on the clustering of devices. The joint UAD and CE problem was addressed by a modified Bayesian compressed sensing algorithm  for the cloud radio access network (C-RAN). In addition, the powerful approximate message passing (AMP) algorithm was employed for the joint UAD and CE problem when the BS is equipped either with a single antenna [20, 21] or with multiple antennas [22, 23].
I-B3 Sparse Bayesian learning-based grant-free RA schemes
Different from CS-based RA schemes, the sparse Bayesian learning (SBL) algorithm further considers the prior hyper-parameter of the sparse signal to address the joint UAD and CE problem. The Expectation Maximization (EM) method was employed by the AMP-SBL algorithm to update the sparse signal and the hyper-parameter iteratively. A least square (LS)-based AMP-SBL (LS-AMP-SBL) algorithm  was proposed to recover the sparse signal in three steps, i.e., the AMP-SBL first provides a coarse estimate, followed by the derivation for the support of the sparse channel, while the final estimate is derived by the support and the LS estimation. The joint UAD and data decoding task for a grant-free NOMA system was modeled and addressed by a deep auto-encoder in , which exploits the prior information of user activation probability and optimizes the choice of spreading signature. However, this auto-encoder is trained based on known channel state information at receiver and a sufficiently static channel, which impedes it application to the rapidly changing environment. Recently, a message-passing receiver design was proposed for the joint channel estimation and data decoding in uplink grant-free SCMA systems 
, exploiting variational Bayesian inference to extract the sparse signals. In addition, a message passing-based block sparse Bayesian learning (MP-BSBL) algorithm was proposed for a grant-free NOMA system. In both  and , non-zero elements in the sparse signals are updated with Gaussian message passing [29, 30, 31] and the hyper-parameters are updated with mean-field (MF) message passing .
The Gaussian message passing [29, 30, 31] is constructed on a factor graph, and it is proven to be optimal on cycle-free graphs. However, this factor graph is densely-connected under crowded RA scenarios, which causes correlated messages and convergence problem for the Gaussian message passing in MP-BSBL algorithm . As a result, the CE and UAD accuracy of the MP-BSBL algorithm is undermined under crowded RA scenarios. To address this problem, we impose weighting parameters on the Gaussian messages and MF message update in the MP-BSBL algorithm. In order to facilitate the training of these weighting parameters, the edge-type message passing process on the factor graph is transformed to a node-type one in the deep neural network (DNN) [33, 34]. Then these weights in the DNN can be trained in a data-driven manner to mitigate the message correlation and the convergence problem in the MP-BSBL algorithm. In this way, higher UAD and CE accuracy can be achieved for crowded RA scenarios.
|DNN||Deep Neural Network|
|NORA||Non-Orthogonal Random Access|
|UAD||User Activity Detection|
|MP-BSBL ||Message Passing-Based Block|
|Sparse Bayesian Learning.|
|NMSE||Normalized Mean Square Error|
In this paper, we focus on an orthogonal frequency division multiplexing system with low-density signature (LDS-OFDM) [35, 36], where devices perform grant-free RA once they are activated. A deep neural network-aided message passing-based block sparse Bayesian learning (DNN-MP-BSBL) algorithm is proposed in this paper to perform joint UAD and CE for the grant-free RA procedure. The iterative message passing process of the MP-BSBL algorithm  is transferred from a factor graph to a neural network. Weights are imposed on the messages passing in the neural network and further trained to minimize the estimation error. It is shown that the trained weights could alleviate the convergence problem of the MP-BSBL algorithm, especially in crowded RA scenarios. As a result, the DNN-MP-BSBL algorithm could improve the UAD and CE accuracy with a smaller number of iterations. In addition, the training of the DNN is conducted offline, and negligible online computational complexity is additionally sacrificed for the performance improvement by the DNN. This guarantees the feasibility of the proposed DNN-MP-BSBL algorithm for crowded NORA systems with real-time implementation and low-latency requirement.
The rest of this paper is organized as follows. The system model and the MP-BSBL algorithm are presented in Section II. The DNN structure for the DNN-MP-BSBL algorithm is illustrated in Section III, where the weighted message passing is explained in details. Simulation results are given in Section IV to verify the UAD and CE accuracy of the proposed DNN-MP-BSBL algorithm. Finally, Section V concludes this paper. For reading convenience, the frequently-used abbreviations in this paper are summarized in Table. I.
Ii Joint UAD and CE by MP-BSBL
Ii-a System Model
As shown in Fig. 1, a LDS-OFDM communication system is considered. It is assumed that there are sub-carriers and users, while each user is activated with probability . For each active user , its binary information sequence is encoded and further mapped into a QAM symbol sequence with length . One unique pilot sequence with length is allocated for user , and is further inserted into the transmitted symbol sequence , i.e., . Therefore, the length of is . Then each symbol in is spread onto sub-carriers, using a low-density spreading pattern
, which is a sparse vector with lengthand non-zero elements. In this way, the same sequence is transmitted on different sub-carriers.
The LDS spreaders for all the users are represented by an LDS spreading matrix , which is assumed known at the receiver. We consider a regular LDS spreading matrix, i.e., the column degree and the row degree in are constant. According to the regular structure of , each sub-carrier is shared by potential users, with . When multiple users are activated on the same sub-carrier, the grant-free RA procedure is conducted in a non-orthogonal multiple access (NOMA) manner. Therefore, the system model in Fig. 1 is termed a LDS-OFDM based grant-free non-orthogonal random access (NORA) system.
Before transmission, the cyclic prefix (CP) is inserted after the OFDM modulator to avoid the inter-symbol interference (ISI). At the receiver, the CP is removed, and after the OFDM demodulator, we obtain the frequency-domain received signal matrix , where is the received signal vector representing the received symbols on the -th sub-carrier.
Now we discuss the channel between the input and the output of this LDS-OFDM system. The transmitted sequences of all the users are organized in an input matrix , while the user activity is equivalently incorporated in the channel matrix . Specifically, we consider the effective channel gain vector of user to integrate the actual channel gain vector with the sparse spreading pattern . That is,
where the -th element of represents the actual channel gain of user on the -th sub-carrier, and the -th element of is non-zero if and only if user chooses to conduct transmission on the -th sub-carrier. Therefore, the effective channel gain vector is a sparse vector with non-zero elements. Furthermore, we incorporate user activity in the channel matrix, i.e., , where the activity indicator if user is inactive. Otherwise, . In this way, we can obtain
where , the -th entry of represents the -th received symbol on the -th sub-carrier, the -th entry of represents the -th transmitted symbol of the -th user. The entries in the additive white Gaussian noise (AWGN) matrix
are assumed i.i.d with noise variance.
According to the composition of , can be decomposed as , where and represent the received signal matrices with respect to (w.r.t.) the pilot sequences and the data sequences, respectively. After the separator, is further processed by the proposed DNN-MP-BSBL algorithm for the joint UAD and CE. According to the output of the DNN-MP-BSBL algorithm , is fed into the multi-user detection (MUD) module to detect and decode the information bits of active users.
Ii-B Problem Formulation
In this paper, we mainly consider the received pilot matrix and solve the joint UAD and CE problem. Analogous to (2), can be formulated as
where the -th entry of represents the -th received signal on the -th sub-carrier, and the -th entry of represents the -th pilot symbol of the -th user. The pilot matrix is assumed known to the receiver. Then we perform vectorization on the transpose of as in ,
where , represents the Kronecker product, , and is the effective channel gain vector of user on sub-carriers. According to the LDS spreading matrix , the transmitted sequence of each user is only spread onto sub-carriers. Therefore, as explained in (1), elements in are zero, and the positions of the zero elements are known to the BS. We further simplify by eliminating the zeros according to . Accordingly, the columns in corresponding to the zeros in are also removed. Finally, we obtain the simplified version of (3) in equation () of (4), where and are obtained from and by zero elimination. According to (4), the joint UAD and CE problem is formulated as
To enable this joint UAD and CE task, each active user transmits its unique pilot sequence before its data transmission. A common practice is to employ the Zadoff-Chu (ZC) sequences  as pilot sequences, which is also considered in this work. One ZC sequence with length is generated as follows,
where , the sequence length is a prime number, and is the root of . When the root is fixed, we can generate different pilot sequences by cyclically shifting one existing sequence by elements. Therefore, when is set to 1, different pilot sequences can be generated for the users.
Ii-C MP-BSBL Algorithm 
The recovery of the block sparse signal in (4) can be addressed by the MP-BSBL algorithm , which is based on two types of low-complexity message passing algorithms. The Gaussian message passing is employed for updating the estimated channel gain and the mean field (MF) message passing is adopted to approximate the hyper-parameters related to the user activity. The assumptions employed by the MP-BSBL algorithm are detailed as follows.
For user , the distribution of the channel gains on sub-carriers is assumed conditioned on a hyper-parameter
. Specifically, the channel gains are assumed independently and identically distributed (i.i.d) with complex Gaussian distribution, i.e.,. In addition, the reciprocal of the estimated hyper-parameter is compared with a pre-defined threshold to determine the user activity. If is smaller than , this user is detected as inactive. Otherwise, this user is detected as active. The hyper-parameter
is assumed to follow a Gamma distribution[27, 28, 38]. The noise precision is unknown at the receiver but assumed with a priori probability
. With these assumptions above, the joint a posterior probability is factorized as follows
where , , is the -th row of matrix in (4), , , and ,
represents the complex Gaussian distribution probability density function (pdf) ofwith mean and variance , while represents the Gamma distribution pdf of with parameters and . According to the variational Bayesian inference theory , the parameters and are usually assumed in the order of . Further details are referred to [27, 28].
Accoring to the factorization in (7), a factor graph can be established for the MP-BSBL algorithm, which is illustrated in Fig. 2. In Fig. 2, , , and denote , , , and , respectively. In addition, the extra variable is introduced and the constraint is represented by . Then, is a function of and , i.e., . The MP-BSBL algorithm performed on the factor graph in Fig. 2 is briefed as follows
Denote as the iteration index and as the product of all the incoming messages from the neighboring nodes to , then according to , the variance and mean of are derived as follows,
By considering the message passing from to , the variance and mean of are updated as
The variance and mean passing from to are derived in (11).
Then the hyper-parameter for the variance of the channel gain is updated according to the MF message passing,
The variance and mean of are derived by , and the noise precision ,
Finally, the noise precision is updated with the MF message passing,
At the last iteration, if is smaller than a predefined threshold , user is regarded as inactive. Otherwise, this user is determined as active and is the estimated channel gain for user . Further details of the derivations above are referred to .
Iii DNN-Aided MP-BSBL Algorithm
Two types of message passing are adopted by the MP-BSBL algorithm. The Gaussian message passing is employed for CE in (10), while the MF message passing is employed to update the hyper-parameters in (12) for UAD. According to the factor graph in Fig. 2, the variable nodes and sum nodes are densely connected. That is, the connection between and is characterized by many short cycles with girth 4, which results in the correlation problem of the Gaussian messages . Furthermore, in crowded NORA systems, where the number of sum nodes is much smaller than the number of variable nodes , the correlated Gaussian messages further lead to the convergence problem for the MP-BSBL algorithm. As a result, the convergence speed is deteriorated and the accuracy of the MP-BSBL algorithm fails to get improved as the signal-to-noise ratio (SNR) increases. On the other hand, the correlated Gaussian messages also lead to inaccuracy for the MF message update, which undermines the UAD performance.
Confronted with the problems above, we propose a DNN-MP-BSBL algorithm, which imposes weights on the Gaussian messages on the factor graph in Fig. 2. Simultaneously, weights are also imposed on the MF message update in (12) and further trained to improve the UAD accuracy. To facilitate the training of the weights, we transfer the message passing process from the factor graph in Fig. 2 to a DNN structure. The back-and-forth message passing process on the factor graph is now transformed into a forward-propagation trellis in Fig. 3(a), which resembles the structure of a neural network. It is assumed that there are sub-carriers, users, the length of pilot sequence is , each user chooses sub-carriers to perform uplink data transmission, and the DNN-MP-BSBL algorithm is performed with iterations. The regular LDS spreading matrix is also shown in Fig. 3(a).
As shown in Fig. 3(a), the trellis can be considered as a neural network. Each iteration of the MP-BSBL algorithm is now represented by one iteration block. Since the structure is identical for each iteration block, we only illustrate one iteration block in Fig. 3(a). However, it is noted that neighboring iteration blocks are connected. For illustration simplicity, we include the nodes from the -th iteration block, e.g. and as the input nodes in the -th iteration block. In this way, the entire DNN is constructed by connecting iteration blocks. Within each iteration block, one layer represents one particular message, i.e., each layer is defined by its output message. Two auxiliary layers and , i.e., Layer 2 and Layer 6 are also added for the clarity of illustration. Therefore, according to equations (8) to (14), there are 9 layers within each iteration block. Detailed organization of each layer is illustrated in Fig. 3(b) and the weighted message passing process is explained as follows.
|Index||Layer Input||Layer Output||Length|
Iii-a Layer Organization and Weighting Matrix
For notational simplicity, the nodes in each layer are placed in a row vector in Fig. 3(b), and the weighted message passing process is then equivalent to the input vector multiplying a weighting matrix. The input vectors, output vector, and the length of output vector for each layer are listed in Table II while the elements of different vectors are organized as follows.
Iii-A1 For vectors with length
The organization for the row vector in Layer 1 is taken as an example in Fig. 3(b). Every elements are grouped into a pilot symbol group according to the common pilot symbol they correspond to. These pilot symbol groups are organized according to the index of the pilot symbols. Within each pilot symbol group, the elements are organized according to the sub-carrier index. For example, the -th node in the -th pilot symbol group represents the -th received pilot signal on the -th sub-carrier. In this way, the row vector is organized as .
Iii-A2 For vectors with length
The organization for the auxiliary row vector in Layer 6 is taken as an example. Every elements are grouped into a user group, according to the common user they correspond to. Within each user group, every elements are grouped into a sub-group according to the common sub-carrier they correspond to. Within each sub-carrier sub-group, the elements are arranged according to the pilot symbol index.
Iii-A3 For vectors with length
The organization for the row vector in Layer 3 is taken as an example. Every elements are grouped together according to the common user they correspond to. Within each user group, the elements are organized according to the sub-carrier index. In this way, the row vector is organized as , where .
Iii-A4 For vectors with length
The elements in the row vector are arranged according to the user index.
Iii-A5 Weighting matrix
Since input vectors and output vector of a layer are organized as row vectors, the weighted message update (i.e., the output vector) for every layer can be represented by the input vector multiplied by a weighting matrix. The size of the weighting matrix is where and are the length of the input vector and the output vector, respectively. According to Fig. 3(a), nodes in neighboring layers are not fully connected, i.e., the weighting matrix is defined by the inter-layer connection. An example is shown in Fig. 3(b) for the weighting matrix defined by the connection from Layer 4 to Layer 5 of Fig. 3(a). According to this example, the -th entry of the weighting matrix is non-zero if and only if the -th node in the input layer is connected to the -th node in the output layer. Furthermore, only the non-zero weights in the weighting matrix are trained for the DNN-MP-BSBL algorithm.
Iii-B Weighted Message Passing
The weighted message passing is explained as follows.
Layer 1: Layer 1 serves as the input of the neural network within one iteration block and it is composed of two message vectors, and . The elements and are grouped together in Fig. 3(a) since they share identical connection to the nodes in Layer 2.
Layer 2: Layer 2 is the auxiliary layer , and the output is composed of two message vectors and , whose elements correspond to the terms in the cumulative summation of (8) and (9), respectively. The weighted message update of Layer 2 is derived as follows,
where and are row vectors with length . and are two weighting matrices with size . , , , and are weighting matrices defined by the connection from Layer 1 to Layer 2. In addition, the fraction and operations are the element-wise division and multiplication operations while the operation refers to the matrix multiplication operation.
Layer 3: The weighted output message vectors of Layer 3 are derived as follows,
where and are weighting matrices defined by the connection from Layer 2 to Layer 3. is a diagonal weighting matrix with size .
Layer 4: The weighted output message vectors of Layer 4 are derived as follows,
where refers to the Kronecker product. and are two all-one row vectors and their vector length are and , respectively. , , and are diagonal weighting matrices with size .
Layer 5: The weighted output message vector is derived as follows,
where and are row vectors with length , , and are weighting matrices defined by the connection from Layer 4 to Layer 5.
Layer 6: Layer 6 is the auxiliary layer with weighted output message vectors and , whose elements correspond to the terms in the cumulative summation in (11).
where is a row vector with length .
Layer 7: The weighted output vectors of Layer 7 are derived as follows,
where and are weighting matrices defined by the connection from Layer 6 to Layer 7. and are diagonal weighting matrices with size . is a weighting matrix with size .
Layer 8: The weighted output vectors of Layer 8 are derived as follows,
where and are weighting matrices defined by the connection from Layer 7 to Layer 8. is a diagonal weighting matrix with size , and is a weighting matrix with size .
Layer 9: The weighted output message of Layer 9 is derived as follows,
where and are diagonal weighting matrices with size , and is an all-one column vector with length . is a weighting matrix with size .
Iii-C Loss Function
The DNN-MP-BSBL algorithm is conducted in two periods: the training period and the testing period. In the training period, according to the given samples in the training set, the non-zero entries in the weighting matrices are trained to minimize the estimation error. In the testing period, the weighting matrices are fixed. The UAD and CE accuracy of the DNN-MP-BSBL algorithm is evaluated by the samples in the test set.
In order to train the non-zero entries in the weighting matrices, a loss function is employed during the training period to measure the estimation error. Then the weights are adjusted with the stochastic gradient descent (SGD) algorithm in a back-propagation manner to minimize the loss function. For the stability of the SGD algorithm, we employ the mean square error (MSE)as the loss function, where is the estimated channel gain and is the known channel gain in the training set. It is noted that, different from the training period, the Normalized MSE (NMSE) is considered for the simulations in the testing period to measure the UAD and CE accuracy.