## I Introduction

The emerging Internet of Things (IoT) enables information exchange among objects in the physical world and drives the development of many areas, such as Industry 4.0 and smart cities. Providing efficient support for the IoT and related services is one of the major objectives of Machine-to-Machine (M2M) communications. In the upcoming IoT era, it is anticipated that the number of M2M devices would exceed 50 billion by 2020 [2]. Furthermore, many M2M applications, such as smart metering, e-health, intelligent transportation and fleet management, are generally characterized by small-sized data packets intermittently transmitted by a massive number of M2M devices. Specifically, these M2M devices would be activated with a low probability [3, 4], which implies that the random-access (RA) process in M2M communications exhibits prominent features of massiveness and sparseness.

### I-A Challenges

Due to the massiveness of M2M communications, access attempts from a huge number of M2M devices can lead to severe congestion and collisions in existing RA schemes. Under such access collisions, it becomes challenging for the base station (BS) to recognize the sparsely activated devices, i.e., to perform device activity detection. In addition, the small-sized data packets in IoT applications impose new requirements on the transmission efficiency of M2M communications.

### I-B Related Work

Different solutions have been proposed to address the problems encountered in crowded RA scenarios. Some grant-based schemes [5, 6, 7, 8, 9, 10, 11, 12] modify the exchange of controlling signals to efficiently allocate resource blocks (RBs) for uplink data transmission while other grant-free schemes exploit the sparseness feature with compressed sensing (CS) algorithms [13, 14, 15, 16, 17, 18, 19, 20]. In addition, the slotted ALOHA protocol has also been extensively investigated to improve the throughput of RA schemes [21, 22, 23, 24, 25, 26, 27, 28, 29, 30].

#### I-B1 Grant-Based RA Schemes

In grant-based RA schemes, each activated device randomly chooses one orthogonal preamble to apply for the corresponding RB for its uplink transmission. These preambles inherently exhibit outstanding detection performances due to their orthogonality. However, the massiveness of M2M communications causes severe overload of the physical random access channel (PRACH) and preamble collisions.

The PRACH overload, i.e., RA congestion when a massive number of active devices apply for a limited number of RBs, further leads to low access probability, high latency and even interruption of service. Some solutions have been proposed for the PRACH overload, such as the Access Class Barring (ACB) scheme [5], delicate splitting of the RA preamble set [6] as well as the automatic configuration of the RA channel parameters [7]. The preamble collision issue is also dealt with from different perspectives. For example, the preamble shortage is addressed either by increasing the number of available preambles [8] or by preamble reusing [9]. Early preamble collision detection schemes were proposed to avoid RB wastage [10, 11]. The early collision detection [10] is performed based on tagged preambles which is also exploited to monitor the RA load. A collision-aware resource access (CARA) scheme was proposed in [11] for the efficient use of granted RBs. In addition, a non-orthogonal resource allocation (NORA) [12] scheme was proposed to exploit the timing alignment information of devices in different spatial groups.

Unfortunately, a handshaking process is always required by grant-based RA schemes, which causes heavy signaling overhead and undermines the transmission efficiency for small-sized data packets. Furthermore, it is still hard to alleviate the impacts of severe preamble collisions on grant-based RA schemes. As an alternative, some grant-free RA schemes below may serve as promising solutions to crowded M2M communications.

#### I-B2 Compressed Sensing Based Grant-Free RA Schemes

Several grant-free schemes employ CS algorithms to exploit the sparseness feature of M2M communications. The CS algorithms are performed upon the received pilot signal to accomplish the user activity detection (UAD) and/or channel estimation (CE) problem. For example, a block CS algorithm [13] was proposed for distributed UAD and resource allocation for clustered devices. A greedy algorithm based on orthogonal matching pursuit (OMP) was proposed in [14] for the joint UAD and CE problem. The same task as in [14] is accomplished by a modified Bayesian compressed sensing algorithm [15] for the cloud radio access network (C-RAN). The powerful approximate message passing (AMP) algorithm [16] was studied for the joint UAD and CE problem when the BS is equipped either with one single antenna [17, 18] or with multiple antennas [19, 20].

The performance of CS-based schemes relies on the *sampling ratio* of the CS algorithms. Hence, sufficiently long pilot sequences are required due to the massiveness of M2M devices. In addition, pilots are transmitted for the UAD whenever new devices are activated with sporadic data packets, which undermines the transmission efficiency of short packets. Therefore, another kind of grant-free RA scheme, namely the slotted ALOHA based RA protocols, is considered for crowded M2M communications [23, 24, 25, 26, 27, 28, 29, 30].

TABLE I: Advantages and disadvantages of different RA schemes

| RA Scheme | Pros | Cons |
|---|---|---|
| Grant-based RA scheme | Outstanding detection performance | Heavy signaling overhead due to handshaking; inevitable access collision |
| CS-based grant-free RA scheme | Capable of joint UAD and CE | Efficiency undermined for small-sized data packets |
| Slotted ALOHA-based RA protocols | Higher efficiency due to incorporated data transmission | Vulnerable to collisions |
| Fixed-symbol aided RA scheme | Higher efficiency due to incorporated data transmission; access collision solved by MUD | One symbol sacrificed for device activity detection |

#### I-B3 Slotted ALOHA Based RA Protocols

In slotted ALOHA protocols, active devices transmit sporadic data packets in aligned slots, and packets in collision-free slots can be correctly recovered. A fast retrial scheme was proposed in [23] for multi-channel ALOHA. In this scheme, collisions can be immediately recognized and collided packets are re-transmitted in randomly chosen sub-channels in the next slot. The stability of this fast retrial scheme is further analyzed in [24] with rate control for machine-type communications. For single-channel slotted ALOHA protocols, the throughput is improved by transmitting multiple replicas of the same data packet and employing inter-slot successive interference cancellation (SIC) among replicas. For example, in the contention resolution diversity slotted ALOHA (CRDSA) protocol [25], each activated device sends two replicas, i.e., the CRDSA protocol introduces a (2,1) repetition code to the conventional slotted ALOHA protocol. Recently, an irregular repetition slotted ALOHA (IRSA) protocol was proposed in [26], which improves the RA throughput by optimizing the probability for activated devices to choose repetition codes with different rates. This IRSA protocol was extended to the Rayleigh fading channel [27], where collided data packets can still be decoded as long as the signal-to-interference-and-noise ratio (SINR) exceeds a predefined threshold. A coded slotted ALOHA (CSA) scheme [28] was proposed, which employs general packet erasure codes to encode the data packets. At the receiver end of the CSA protocol, the maximum a posteriori decoder and SIC are employed together to recover the collided packets. According to the analogy between the CSA and erasure correcting codes, the frameless IRSA protocol [29] and the spatially coupled RA protocol [30] were proposed, analogous to rateless codes and spatially coupled LDPC codes, respectively.

Different from CS-based RA schemes, slotted ALOHA based RA protocols are efficient for small-sized data packets since their uplink transmission phase is incorporated in the RA process. Sporadic data packets from newly activated devices can be directly transmitted without another round of UAD at the receiver. However, since the successful decoding of data packets relies heavily on collision-free slots or high SINR, existing schemes such as the IRSA and the CSA can barely work in crowded M2M communications where slots suffer from severe collisions. For reading convenience, the advantages and disadvantages of different RA schemes are summarized in Table I.

### I-C Contributions

In order to inherit the advantages of slotted ALOHA based RA protocols and deal with the intra-slot collision, we propose a fixed-symbol aided RA scheme, where active devices access the network in a grant-free manner. In each RA frame, each active device inserts one fixed symbol into its data packet and transmits the packet in one randomly chosen slot. The operations at the BS are divided into two phases. For the first phase, based on the low-complexity message passing algorithms (MPA), an iterative message passing based activity detection (MP-AD) algorithm is proposed to detect the device activity. For the second phase, according to the activity detection result, multi-user detection (MUD) is further employed in each slot to decode the collided data packets.

The MP-AD algorithm is explained as follows. Firstly, we model the fixed-symbol aided RA process by considering the received signals of the fixed symbols in all the slots. According to the system model, a factor graph is established with three different types of nodes. The message update equations for different types of nodes in the factor graph are derived, according to which the BS is enabled to detect the activity of each device in each slot. In order to alleviate the correlation problem of the message passing process in the MP-AD algorithm, we further introduce the deep neural network (DNN) structure. This DNN-aided MP-AD (DNN-MP-AD) algorithm is designed by transforming the edge-type message passing process on a factor graph into a node-type one in a DNN structure. Weights are then imposed on the messages passing in the DNN and further trained to improve the detection accuracy.

Although the activity detection issue is similar to the sparse recovery problem, the proposed MP-AD algorithm differs from existing CS solutions in that the transmission constraint due to the small-sized data packets is considered at every iteration, i.e., only one slot in a RA frame is chosen by each active device to perform data transmission. In addition, the DNN-MP-AD algorithm alleviates the correlation problem by training the weights in the DNN structure. This training process causes no additional online computational complexity since the weights are trained off-line with powerful hardware devices such as the GPU.

The major contributions of this paper are listed as follows.

(i) A fixed-symbol aided RA scheme is proposed for M2M communications. The proposed RA scheme inherits the transmission efficiency of slotted ALOHA protocols and tackles the problem of severe packet collisions via MUD.

(ii) The MP-AD algorithm is designed based on a factor graph where the Bernoulli messages are passed to detect the activity of each device in each slot. This MP-AD algorithm provides essential information for subsequent MUD.

(iii) The DNN-MP-AD algorithm is designed based on the MP-AD framework to alleviate the correlation problem of the messages. By imposing weights on the messages passing in the DNN, the detection accuracy is further improved without causing any additional online computational complexity.

### I-D Paper Organization

This paper is organized as follows. The fixed-symbol aided RA scheme is proposed with corresponding system model in Section II. According to the system model, a factor graph is presented for the MP-AD algorithm in Section III and the message update equations for different types of nodes are elaborated on. The correlation problem and the DNN-MP-AD algorithm are presented in Section IV while numerical simulation results are illustrated in Section V. Finally, the conclusion and future work are given in Section VI.

## II Fixed-Symbol Aided Random Access Scheme

### II-A System Model

As shown in Fig. 1(a), we consider an M2M communication system with $N$ devices served by a central massive MIMO BS. Each device is activated with a certain probability $p_a$, depending on its service type. The BS is equipped with $M$ antennas while each device has one single antenna. The channel between the devices and the BS is a slow time-varying block fading TDD (time division duplex) channel while the channel state information (CSI) is assumed known to the BS via channel estimation. We consider the application scenarios where devices are placed at fixed locations. Therefore, the coherence time is sufficiently long and the CSI remains constant over a large number of RA frames after one round of channel estimation [31, 32]. One RA frame is divided into $K$ slots. Since the M2M communication in IoT applications is mainly characterized by small-sized data packets, each active device randomly chooses only one slot to perform data packet transmission. Furthermore, the value of $K$ is assumed relatively small to reduce the detection latency for activated devices. More details on the system model are discussed in Section III-F.

Before transmission, each active device inserts a fixed symbol into its data packet. Without loss of any generality, this fixed symbol is placed at the beginning of the transmitted data packet and its value is fixed as unit value. In order to detect the device activity in each slot, the iterative MP-AD algorithm is performed at the BS upon the received signals of the first symbols in all of the slots. A mathematical model of the received signals is firstly established for the MP-AD algorithm as follows while the details for this algorithm are explained in Section III.

Since the value of the fixed symbol is set to unity, the received signal on the $m$-th antenna with respect to (w.r.t.) the first symbol in the $k$-th slot can be written as

$$y_{mk} = \sum_{n=1}^{N} h_{mn}\,x_{nk} + z_{mk}, \tag{1}$$

where $N$ is the number of devices, $h_{mn}$ is the channel gain from the $n$-th device to the $m$-th antenna, $z_{mk}$ is the additive Gaussian noise, and $x_{nk}$ is the device-slot indicator, i.e., $x_{nk}=1$ if the $n$-th device selects the $k$-th slot to transmit its data packet. Otherwise, $x_{nk}=0$. The received signal in (1) can be rewritten in a matrix form by considering all the slots and all the antennas

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z}, \tag{2}$$

where $\mathbf{H} = [h_{mn}]$ is the channel matrix while $\mathbf{Z} = [z_{mk}]$ is an additive white Gaussian noise matrix with variance $\sigma^2$ for each entry. $\mathbf{X} = [x_{nk}]$ is the device-slot indicator matrix, as well as the target of the MP-AD algorithm and the DNN-MP-AD algorithm.

It is noted that although the problem (2) is a sparse recovery issue, the target matrix $\mathbf{X}$ follows a different distribution from the ones considered by existing CS algorithms. Since each active device can only choose one slot for transmission while a slot might be chosen by more than one device, each row of the matrix $\mathbf{X}$ has at most one “1” but each column of $\mathbf{X}$ might have more than one “1”. When slot $k$ is not chosen by any M2M device, the $k$-th column of matrix $\mathbf{X}$ would be all zeros. Similarly, if M2M device $n$ is inactive, the $n$-th row of matrix $\mathbf{X}$ would also be all zeros. With $p_a$ denoting the activation probability of a device and $K$ the number of slots in one RA frame, the probability that a certain row of $\mathbf{X}$ has just one “1” is $p_a$ while the probability for each entry to be “1” is

$$P(x_{nk}=1) = \frac{p_a}{K}. \tag{3}$$
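The structure of the device-slot indicator matrix can be checked with a short simulation. The following is a minimal sketch, assuming a real-valued Gaussian channel and illustrative values for the numbers of devices, antennas and slots (none of these values are taken from the paper); the function names are hypothetical.

```python
import random

def draw_indicator_matrix(num_devices, num_slots, p_active, rng):
    """Each device is active with probability p_active and, if active,
    picks exactly one slot uniformly at random."""
    X = [[0] * num_slots for _ in range(num_devices)]
    for n in range(num_devices):
        if rng.random() < p_active:
            X[n][rng.randrange(num_slots)] = 1
    return X

def received_signals(H, X, noise_std, rng):
    """First-symbol observations Y = H X + Z (rows: antennas, columns: slots)."""
    num_slots = len(X[0])
    return [[sum(h * x_row[k] for h, x_row in zip(h_row, X))
             + rng.gauss(0.0, noise_std)
             for k in range(num_slots)]
            for h_row in H]

rng = random.Random(0)
N, M, K, p_a = 5000, 8, 4, 0.1          # assumed, illustrative sizes
X = draw_indicator_matrix(N, K, p_a, rng)
H = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
Y = received_signals(H, X, noise_std=0.1, rng=rng)

# Fraction of non-zero entries; should be close to p_a / K.
empirical = sum(map(sum, X)) / (N * K)
```

Every row of `X` contains at most one non-zero entry, and the empirical fraction of ones approaches the activation probability divided by the number of slots as the number of devices grows.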

###### Remark 1

For notational convenience, we consider a real-valued channel matrix for the proposed MP-AD algorithm and the DNN-MP-AD algorithm. Note that this is not in contradiction with the realistic complex-valued channel matrix, since we can regard the complex-valued matrices as a stack of two real-valued matrices, i.e., the following two equations are equivalent

(4) |

(5) |

where , and are the real part and imaginary part of the complex-valued received matrix, and are the real part and imaginary part of the complex-valued channel matrix, and and represent real part and imaginary part of the complex noise matrix. Since (5) is identical to (2), the realistic complex-valued problem in (4) can be represented by the real-valued problem (2). However, it is noted that the transformation from (4) to (5) doubles the column number of the matrix representation. Therefore, the antennas in the MP-AD algorithm actually correspond to antennas in the realistic complex-valued scenarios.
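The equivalence claimed in Remark 1 can be verified numerically on a toy example. The sketch below assumes that antennas index the rows of the channel matrix, so the real and imaginary parts are stacked along the antenna dimension; under the transposed convention the same identity holds with the stacking applied along the other dimension. All helper names and numerical values are illustrative.

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    """Entry-wise sum of two matrices of equal size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def stack_real_imag(A):
    """Real parts stacked on top of imaginary parts (doubles the antenna dimension)."""
    return ([[a.real for a in row] for row in A]
            + [[a.imag for a in row] for row in A])

# Toy complex-valued system: 2 antennas, 3 devices, 2 slots.
H_c = [[1 + 2j, -0.5 + 1j, 0.3 - 0.7j],
       [0.2 - 1j, 1.5 + 0.5j, -1 + 0j]]
X = [[1, 0], [0, 0], [0, 1]]            # real-valued 0/1 indicator matrix
Z_c = [[0.1 - 0.2j, 0j], [-0.3j, 0.05 + 0.1j]]

Y_c = add(matmul(H_c, X), Z_c)          # complex-valued model
Y_r = add(matmul(stack_real_imag(H_c), X), stack_real_imag(Z_c))  # real-valued model
```

Because the indicator matrix is real-valued, `Y_r` coincides with `stack_real_imag(Y_c)`, and the real-valued system has twice as many antenna entries as the complex-valued one.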

### II-B Random Access Process

An example of one RA frame in the proposed fixed-symbol aided RA scheme is shown in Fig. 1(b). In the data packet transmitted in one slot, the first symbol marked in white represents the fixed symbol which takes the unit value while the following symbols marked in different colors represent the data symbols of different devices.

At the receiver end, the received signals with respect to the first symbols in all of the slots are processed by the MP-AD algorithm, according to which, the BS is enabled to detect the activity of each device in each slot. The received signals with respect to the data symbols in each slot are firstly stored in the memory. After the device activity detection, the BS performs subsequent MUD in each slot to decode the collided data packets. For example, if the device activity is correctly detected in slot 2, the subsequent MUD is only performed for 2 active devices, i.e., device 3 and device 5 while inactive devices in this slot are ignored.

There are many off-the-shelf MUD algorithms which can be directly applied to the massive MIMO BS. For example, the low-complexity Gaussian message passing iterative detector (GMPID) [33, 34, 35] can be employed for MUD when the data symbols are Gaussian distributed, which is discussed in Section III-F. In addition, due to the low activation probability, the number of collided packets in each slot is much smaller than the total number of devices. As a result, excellent MUD performance can be expected as long as the device activity is correctly detected by the MP-AD algorithm in the corresponding slot.

The proposed fixed-symbol aided RA scheme inherits the uplink transmission efficiency of slotted ALOHA protocols and only one fixed symbol is sacrificed for the device activity detection. The severe intra-slot collision issue in crowded M2M communications is solved by the MUD, which is enabled by the device activity detection result of the following MP-AD algorithm. The proposed fixed-symbol aided random access scheme is summarized in Scheme 1 while the MP-AD algorithm is explained as follows.
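To make the per-slot load argument concrete, the number of packets colliding in a given slot can be modeled as a binomial random variable. This is a minimal sketch under the assumption that devices activate independently and choose slots uniformly; the numerical values are illustrative and not taken from the paper.

```python
from math import comb

def slot_load_pmf(num_devices, num_slots, p_active, j):
    """P(exactly j devices transmit in one given slot), assuming independent
    activation and a uniformly random slot choice per active device."""
    p = p_active / num_slots
    return comb(num_devices, j) * p ** j * (1.0 - p) ** (num_devices - j)

N, K, p_a = 1000, 10, 0.05              # assumed, illustrative values
expected_load = N * p_a / K             # mean number of packets per slot
tail = 1.0 - sum(slot_load_pmf(N, K, p_a, j) for j in range(16))
```

With these values the mean load is 5 packets per slot, far below the 1000 devices in the system, and `tail` gives the probability that more than 15 packets collide in one slot.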

## III Message Passing Based Activity Detection Algorithm


The message passing algorithm (MPA) is renowned for its feasible implementation complexity since the overall processing can be decomposed into distributed calculations, which are suitable for parallel execution. As a result, the MPA has been widely applied to compressed sensing [16], MUD [33, 34, 35], channel estimation [36] and belief propagation (BP) decoding of LDPC codes [37]. Therefore, it is practical for the powerful massive MIMO BS to perform the iterative message passing algorithm to detect the activity of each device in each slot. The factor graph for the MP-AD algorithm is presented as follows.

### III-A Factor Graph Representation

The system model described in Section II can be represented by a factor graph in Fig. 2 and the messages passing on the graph are the likelihood messages for the Bernoulli variables, i.e., the entries in the device-slot indicator matrix $\mathbf{X}$. As shown in Fig. 2, there are three types of nodes in the factor graph, i.e., sum nodes (SNs), variable nodes (VNs) and check nodes (CNs). Each SN stands for an entry $y_{mk}$ in the received matrix $\mathbf{Y}$, i.e., the received signal on the $m$-th antenna with respect to the first symbol in the $k$-th slot. Each VN is a Bernoulli variable $x_{nk}$ and represents the $k$-th element of the $n$-th row in matrix $\mathbf{X}$. Furthermore, the CNs stand for the check node constraints for the devices, i.e., at most one slot can be chosen by a device in an RA frame.

According to the factor graph, the message updating diagram among SNs, VNs and CNs is illustrated in Fig. 3. According to the principle of the MPA, the output message, defined as the extrinsic information, is derived by the incoming messages from the other edges that are connected to the same node. The message update equations for the SNs, VNs and CNs are derived as follows.

### III-B Message Update at Sum Nodes

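The message update at each SN can be seen as a multiple-access process and the extrinsic message from the $m$-th SN to the $n$-th VN is presented as an example in Fig. 3(a). Firstly, for a given slot $k$, the received signal $y_{mk}$ at the $m$-th SN can be rewritten as

$$y_{mk} = h_{mn}x_{nk} + \sum_{i\neq n} h_{mi}x_{ik} + z_{mk}, \tag{6}$$

where $h_{mi}$ is the channel gain from the $i$-th device to the $m$-th antenna, $x_{ik}\in\{0,1\}$ is the device-slot indicator and $z_{mk}$ is the additive Gaussian noise with variance $\sigma^2$. We assume that $p_{n\to m}(t)$ denotes the non-zero probability for the Bernoulli variable $x_{nk}$ passing from the $n$-th VN to the $m$-th SN in the $t$-th iteration. According to the incoming messages from the VNs and the *central limit theorem*, the SN approximates $\sum_{i\neq n} h_{mi}x_{ik} + z_{mk}$ as an equivalent Gaussian noise with mean $e_{m\to n}(t)$ and variance $v_{m\to n}(t)$,

$$e_{m\to n}(t) = \sum_{i\neq n} h_{mi}\,p_{i\to m}(t), \qquad v_{m\to n}(t) = \sum_{i\neq n} h_{mi}^{2}\,p_{i\to m}(t)\,\bar{p}_{i\to m}(t) + \sigma^{2}, \tag{7}$$

where $p_{i\to m}(t)$ and $\bar{p}_{i\to m}(t) = 1 - p_{i\to m}(t)$ are the non-zero and zero probabilities for the Bernoulli variable $x_{ik}$ passing from the $i$-th VN to the $m$-th SN. Then the extrinsic message from the $m$-th SN to the $n$-th VN is derived as

$$q_{m\to n}(t) = P\big(x_{nk}=1 \mid y_{mk}, \mathbf{h}_m, \mathbf{p}_m(t)\big) = \frac{f(y_{mk}-h_{mn})}{f(y_{mk}-h_{mn}) + f(y_{mk})}, \tag{8}$$

where $\mathbf{h}_m$ is the $m$-th row of $\mathbf{H}$, $\mathbf{p}_m(t) = \{p_{i\to m}(t),\, i\neq n\}$ is the set of Bernoulli probabilities and $f(\cdot)$ is the Gaussian distribution *probability density function* for $\mathcal{N}\big(e_{m\to n}(t), v_{m\to n}(t)\big)$.

To avoid overflow caused by a large number of multiplications of probabilities, the Bernoulli messages are represented in the form of *log-likelihood ratio* (LLR) instead of the non-zero probability in (8). For example, the LLR message $l_{m\to n}(t)$ passing from the $m$-th SN to the $n$-th VN in the $t$-th iteration is derived by performing the logarithmic operation on the ratio of the non-zero probability to the zero probability

$$l_{m\to n}(t) = \log\frac{q_{m\to n}(t)}{1-q_{m\to n}(t)} \overset{(a)}{=} \frac{2h_{mn}\big(y_{mk}-e_{m\to n}(t)\big) - h_{mn}^{2}}{2\,v_{m\to n}(t)}, \tag{9}$$

where $(a)$ is derived by substituting the result of (8) into the derivation of $l_{m\to n}(t)$ as well as the fact that $P(x=0) = 1 - P(x=1)$ for each Bernoulli variable $x$.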
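The sum-node update described above can be sketched in a few lines. This is an illustrative implementation under the Gaussian approximation stated in the text: the interference from the other devices plus noise is replaced by a Gaussian with matched mean and variance, and the LLR is the log-ratio of the two Gaussian likelihoods. The function name and argument layout are assumptions made for this sketch.

```python
def sn_to_vn_llr(y, h_row, p_in, n, noise_var):
    """Extrinsic LLR for device n at one sum node.

    y         : received first-symbol sample at this antenna and slot
    h_row     : channel gains from all devices to this antenna
    p_in      : incoming non-zero probabilities from the variable nodes
    n         : index of the device the message is sent to
    noise_var : variance of the additive Gaussian noise
    """
    # Mean and variance of interference-plus-noise, excluding device n.
    mean = sum(h * p for i, (h, p) in enumerate(zip(h_row, p_in)) if i != n)
    var = noise_var + sum(h * h * p * (1.0 - p)
                          for i, (h, p) in enumerate(zip(h_row, p_in)) if i != n)
    h_n = h_row[n]
    # log N(y - h_n; mean, var) - log N(y; mean, var), simplified.
    return (2.0 * h_n * (y - mean) - h_n * h_n) / (2.0 * var)
```

A positive value favors the hypothesis that device `n` transmitted in the slot; the closed form avoids evaluating the Gaussian densities explicitly.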

### III-C Message Update at Variable Nodes

The message update at each VN can be seen as a broadcasting process and the extrinsic message from the VN is derived following the message combination rule [38], i.e., the extrinsic message is a normalized product of the input probabilities.

#### III-C1 Message update for sum nodes

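The message update from the $n$-th VN to the $m$-th SN is presented as an example in Fig. 3(d). The extrinsic message is derived by the initial probability of each VN, the incoming message from the $n$-th CN and the incoming messages from the $i$-th SN with $i\neq m$. According to the message combination rule, the extrinsic message from the $n$-th VN to the $m$-th SN is

$$p_{n\to m} \overset{(a)}{=} \frac{p_0\, r_{n}\prod_{i\neq m} q_{i\to n}}{p_0\, r_{n}\prod_{i\neq m} q_{i\to n} + \bar{p}_0\, \bar{r}_{n}\prod_{i\neq m} \bar{q}_{i\to n}}, \tag{10}$$

where $p_0$ is the initial non-zero probability of the VN, $r_n$ is the non-zero probability passed from the $n$-th CN, $q_{i\to n}$ is the non-zero probability passed from the $i$-th SN, $\bar{a} = 1-a$ for any probability $a$, and $(a)$ is derived by the normalized product of the input probabilities.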

#### III-C2 Message update for check nodes

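Similarly, in Fig. 3(c), the extrinsic message update from the $k$-th VN to the $n$-th CN is derived by the initial probability and the incoming messages from the $m$-th SN with $m\in\{1,\ldots,M\}$,

$$s_{k\to n} \overset{(a)}{=} \frac{p_0\prod_{m=1}^{M} q_{m\to n}}{p_0\prod_{m=1}^{M} q_{m\to n} + \bar{p}_0\prod_{m=1}^{M} \bar{q}_{m\to n}}, \tag{11}$$

where $p_0$ is the initial non-zero probability of the VN, $q_{m\to n}$ is the non-zero probability passed from the $m$-th SN, $\bar{a} = 1-a$ for any probability $a$, and $(a)$ is derived by the normalized product of the input Bernoulli probabilities.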
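The message combination rule at a variable node is easiest to implement in the LLR domain, where the normalized product of Bernoulli probabilities becomes a plain sum. The sketch below illustrates this equivalence; the function names are hypothetical and the handling of the prior and CN terms follows the verbal description above rather than a confirmed formula.

```python
import math

def llr(p):
    """LLR of a Bernoulli message with non-zero probability p."""
    return math.log(p / (1.0 - p))

def prob(l):
    """Non-zero probability corresponding to an LLR l."""
    return 1.0 / (1.0 + math.exp(-l))

def combine_probabilities(probs):
    """Normalized product of independent Bernoulli messages."""
    on = math.prod(probs)
    off = math.prod(1.0 - p for p in probs)
    return on / (on + off)

def vn_extrinsic_llr(prior_llr, cn_llr, sn_llrs, exclude=None):
    """Sum of incoming LLRs, leaving out the SN the message is sent to.
    For a VN-to-CN message, pass cn_llr=0.0 and exclude=None."""
    return prior_llr + cn_llr + sum(l for i, l in enumerate(sn_llrs)
                                    if i != exclude)
```

For instance, `prob(sum(llr(p) for p in probs))` returns the same value as `combine_probabilities(probs)`, which is why the LLR form avoids long products of small probabilities.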

### III-D Message Update at Check Nodes

The $n$-th CN represents a constraint for the corresponding VNs that a VN $x_{nk}=1$ if and only if the $n$-th device is active and the other VNs $x_{nj}=0$ for any $j\neq k$. As illustrated in Fig. 3(b), the extrinsic message from the $n$-th CN to the $k$-th VN is derived by the initial activation probability for this device and the incoming messages from the $j$-th VN with $j\neq k$. The message update is presented as

(13) |

where . The extrinsic LLR message passing from the -th CN to the -th VN is derived by

(14) |

where

(15) |
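The check-node constraint can be illustrated with a small sketch. It assumes the following prior over one row of the indicator matrix: the row is all-zero with probability one minus the activation probability, and otherwise contains a single "1" in a uniformly chosen slot. The resulting closed form is an assumption made for illustration and is not claimed to coincide term by term with the update equations of the paper; a brute-force marginalization over all valid rows is included as a cross-check.

```python
import math

def cn_to_vn_llr(vn_llrs, k, p_active):
    """Extrinsic LLR from a check node to slot k (assumed closed form).
    vn_llrs holds the incoming LLRs from the VNs of one device."""
    num_slots = len(vn_llrs)
    prior_on = p_active / num_slots
    others = sum(math.exp(l) for j, l in enumerate(vn_llrs) if j != k)
    return math.log(prior_on) - math.log((1.0 - p_active) + prior_on * others)

def cn_to_vn_llr_bruteforce(vn_probs, k, p_active):
    """Same quantity by enumerating the all-zero row and every one-hot row."""
    num_slots = len(vn_probs)
    rows = [[0] * num_slots] + [[1 if j == s else 0 for j in range(num_slots)]
                                for s in range(num_slots)]
    weight = {0: 0.0, 1: 0.0}
    for row in rows:
        w = (1.0 - p_active) if sum(row) == 0 else p_active / num_slots
        for j, x in enumerate(row):
            if j != k:                  # extrinsic: skip the target VN
                w *= vn_probs[j] if x else 1.0 - vn_probs[j]
        weight[row[k]] += w
    return math.log(weight[1] / weight[0])
```

Under this assumed form, when one of the other VNs reports a large positive LLR, the returned message is approximately the negative of that LLR: a device that very likely transmitted in one slot is very unlikely to have transmitted in another.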

### III-E Output and Decision

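The final output Bernoulli message for each VN $x_{nk}$ is derived by all the incoming messages from the SNs and the CN as well as the initial probability $p_0$,

$$p^{\mathrm{out}}_{nk} = \frac{p_0\, r_{n}\prod_{m=1}^{M} q_{m\to n}}{p_0\, r_{n}\prod_{m=1}^{M} q_{m\to n} + \bar{p}_0\, \bar{r}_{n}\prod_{m=1}^{M} \bar{q}_{m\to n}}, \tag{16}$$

where $r_n$ and $q_{m\to n}$ are the non-zero probabilities passed to this VN from the $n$-th CN and the $m$-th SN, respectively, and $\bar{a} = 1-a$ for any probability $a$. The output LLR message for each Bernoulli variable is

$$l^{\mathrm{out}}_{nk} = \log\frac{p^{\mathrm{out}}_{nk}}{1-p^{\mathrm{out}}_{nk}}. \tag{17}$$

Then, the decision for each Bernoulli variable is

$$\hat{x}_{nk} = \begin{cases} 1, & l^{\mathrm{out}}_{nk} > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{18}$$

Finally, $\hat{\mathbf{X}} = [\hat{x}_{nk}]$ is the output of the MP-AD algorithm. For reading convenience, the MP-AD algorithm is summarized in Phase 1 of Scheme 1. For Phase 2 (MUD phase) of Scheme 1, the sub-matrix $\mathbf{H}_k$ is obtained according to $\hat{\mathbf{X}}$, i.e., $\mathbf{H}_k$ is composed of the columns of the channel matrix $\mathbf{H}$ corresponding to the active devices in slot $k$.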
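The hand-over from activity detection to MUD can be sketched as follows. The example assumes a zero threshold on the output LLRs and a channel matrix whose columns correspond to devices; both are assumptions of this sketch, and the helper names are hypothetical. The toy values mirror the earlier example in which device 3 and device 5 collide in slot 2.

```python
def hard_decision(llr_matrix, threshold=0.0):
    """Map output LLRs to a 0/1 device-slot decision matrix."""
    return [[1 if l > threshold else 0 for l in row] for row in llr_matrix]

def active_submatrix(H, X_hat, k):
    """Indices of devices detected in slot k and the matching columns of H."""
    active = [n for n, row in enumerate(X_hat) if row[k] == 1]
    return active, [[h_row[n] for n in active] for h_row in H]

# Toy output LLRs for 6 devices (rows) and 3 slots (columns).
out_llrs = [[-4.0, -3.1, -5.0],
            [-6.2, -2.0, -4.4],
            [-3.0,  5.5, -7.1],
            [ 2.1, -4.0, -3.0],
            [-5.0,  3.2, -6.0],
            [-4.0, -4.0, -4.0]]
H = [[10 * m + n for n in range(6)] for m in range(2)]   # placeholder channel

X_hat = hard_decision(out_llrs)
active, H_k = active_submatrix(H, X_hat, k=1)            # second slot (0-based index 1)
```

Here `active` contains the zero-based indices 2 and 4, so the subsequent MUD in this slot only needs the two corresponding channel columns instead of all six.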

### III-F Discussions

#### III-F1 Channel State Information

Normally, the channel matrix is affected by both the large-scale fading and the small-scale fading. Since we consider devices with relatively fixed locations, the large-scale fading is assumed constant and can be compensated by the *reverse power control*, i.e., each device adjusts its transmission power to guarantee approximately identical mean received power for all the devices at the BS. Therefore, the channel matrix is only characterized by the small-scale fading. Then, the devices transmit pilot sequences to facilitate the uplink channel estimation at the BS.

It is noted that for IoT application scenarios, such as the Phasor Measurement Unit (PMU) in smart grid [39, 40] and energy management systems in smart homes [41, 42], the devices are stationary or quasi-stationary and the coherence time of the uplink transmission is therefore sufficiently long [31, 32]. For example, the experimental results in [31] show that in a static testing environment, the channel is constant within tens of minutes and the channel response retains a strong temporal correlation even under human interference. In addition, as shown by simulations in Section V-D, the MP-AD algorithm and the DNN-MP-AD algorithm exhibit tolerance against the channel variation. Therefore, after one round of channel estimation, the uplink channel is assumed constant over different RA frames.

#### III-F2 Synchronization

As stated above, the M2M devices are assumed stationary with fixed locations in this paper. As a result, the transmission delay from each device to the BS can be acquired when it is registered in the network. In this way, synchronization among all the devices can be performed by exploiting the timing advance information. Therefore, we assume that the data packets from the active devices are symbol-wise synchronized.

#### III-F3 Subsequent MUD

The subsequent MUD is performed according to the activity detection result of the MP-AD algorithm. It is noted that the number of collided packets in each slot is much smaller than the number of devices, which greatly reduces the computational complexity of MUD at the BS, especially when the activation probability is low. There are many off-the-shelf MUD algorithms which can be directly applied to the massive MIMO BS. For example, when the data symbols are assumed Gaussian distributed, the performance of the GMPID is proven to converge to that of the minimum mean square error (MMSE) estimator [34]. Furthermore, aided by a 10-bit superposition coded modulation scheme [43, 44], the GMPID can still guarantee excellent decoding accuracy for discrete data symbols. Therefore, the accuracy of the device activity detection is the key to the throughput of the proposed fixed-symbol aided RA scheme.

#### III-F4 Energy Consumption and Storage Overhead

As shown in Fig. 6, the proposed MP-AD algorithm exhibits outstanding detection accuracy in the low-SNR regime. As a result, the transmitting power can be effectively lowered in the proposed fixed-symbol aided RA scheme and the energy consumption is therefore feasible for the devices. At the receiver end, since the message passing algorithm is employed for both the MP-AD algorithm and subsequent MUD, the massive MIMO BS can decompose the overall processing into distributed computations executed in parallel. Due to the low computational complexity, the energy consumption also remains feasible for the BS. Therefore, compared with the CS-based grant-free RA schemes and slotted ALOHA based RA protocols, no extra energy is sacrificed in the proposed RA scheme. In addition, compared with grant-based RA schemes, the proposed RA scheme can greatly reduce the energy consumption of the devices since no handshaking process is required.

In SIC-based slotted ALOHA protocols, collided packets need to be stored to facilitate subsequent inter-slot SIC. However, this storage overhead can be excessively high due to the severe access collisions in crowded M2M communications. By contrast, both the CS-based RA scheme and the proposed fixed-symbol aided RA scheme only need to store the received signals in the current RA frame. As mentioned in Section II-A, the number of slots in one RA frame is assumed relatively small to reduce the processing delay. In addition, the data packets in each slot are normally small-sized for IoT applications. Therefore, the storage overhead also remains feasible for the proposed RA scheme.

## IV DNN-Aided Message Passing Based Activity Detection Algorithm

### IV-A Correlation Problem

The message update in message passing algorithms is derived based on the assumption that the messages are mutually independent. However, for the MP-AD algorithm, the messages suffer from the correlation problem, which undermines the accuracy of device activity detection. The correlation problem is caused by the following reasons.

According to the factor graph in Fig. 2, the connection between SNs and VNs is characterized by many short cycles with girth 4. As a result, the messages passing on the factor graph can be strongly correlated. In addition, the correlation problem is also caused by the CN update. Rewrite the update equation (14) for the -th CN as follows,

(19) |

Assuming as a constant, we can observe that the function is dominated by when is negative with a large absolute value. On the contrary, this function is dominated by and can be approximated as if is a large positive number. We assume that in the accumulative summation term of (19), the message is a large positive number while the other messages for are negative or positive but with limited absolute value. Then the message in (19) is approximately proportional to . This can be explained by the fact that if a VN is highly likely to be 1, then the other VNs for are highly likely to be 0. Since is passed to VN , the extrinsic message from VN to SN is strongly correlated with the extrinsic message from VN to SN. This correlation can be helpful for the iterative convergence when the LLR message is reliable. However, when false alarm occurs, is a large positive number for VN . Then the correlation problem of the CN update will cause error propagation on the entire factor graph. According to [34], the extrinsic messages from VNs to CNs are more likely to be unreliable in overloaded MIMO systems where the antennas fail to sustain a large number of devices. In addition, the correlation problem in overloaded systems is even worsened when the SNR is high since the absolute value of the unreliable LLR message is larger with higher SNR.

The correlation problem encountered by the GMPID in [34] is solved by a scale-and-add (SA) method where the SA-GMPID is derived by analyzing the convergence of the message passing process. However, the correlation problem caused by the CN update greatly complicates the theoretical analysis for the proposed MP-AD algorithm. As an alternative, the emerging deep learning method has been proven effective in improving the performance of BP decoders on densely connected Tanner graphs [45]. Motivated by this result, we resort to the weighted message passing in DNN and transform the edge-type message passing process in Fig. 3 into a node-type one in a DNN structure. After the transformation, weights are imposed on the messages in the DNN and further trained to alleviate this correlation problem.

### IV-B Preliminaries to Neural Networks

In order to make this work self-contained, we provide some preliminaries on neural networks [46]. In terms of topology, a neural network has several layers of nodes (neurons): the first layer is termed the *input layer*, the last layer is termed the *output layer*, and the remaining layers in the middle are *hidden layers*. Typically, for each node in a hidden layer of the neural network, its input includes the incoming messages from the neighboring nodes in the previous layer and a bias term. The output of this hidden node is obtained by the summation of the input messages, followed by a non-linear activation function, e.g., the function to introduce non-linearity. In addition, the nodes in the hidden layers are usually fully or densely connected to the neighbors in the previous layer. In the training phase, weighting parameters are assigned to the messages passing in the neural network while the *loss function* (e.g., the *cross entropy* function or the *Mean Square Error* function) is employed to measure the distance between the output and the true values (i.e., the correct output). Then, the weights are trained with the samples in the *training set* to minimize this loss function. Mathematically, the neural network can be considered as a function of the input neurons and the weights in a hyperspace. With a set of well-trained weights, the neural network can serve as a *classifier* which produces the correct output for future inputs.
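The computation performed by a single hidden node, as described above, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `tanh` is an assumed choice of activation, since the activation named in the text is not recoverable here.

```python
import math

def hidden_node_output(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the incoming messages plus a bias, then an activation.

    `activation` defaults to tanh purely as an illustrative assumption.
    """
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)

def mean_square_error(outputs, targets):
    """Mean Square Error loss between network outputs and the true values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

# One hidden node with three neighbors in the previous layer.
node_output = hidden_node_output([0.5, -1.0, 2.0], [1.0, 1.0, 1.0], bias=0.0)
```

Training then amounts to adjusting `weights` and `bias` so that the loss over the training set decreases.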

The DNN, one fundamental framework in deep learning, features a large number of hidden layers. Recently, the emerging DNN has been proven effective in a wide variety of communication scenarios such as MIMO communication [47, 48] and sparse vector recovery [49]. For the decoder design of channel codes, a DNN-based BP decoder was proposed for short BCH codes [45] to alleviate the influence of short cycles in densely connected Tanner graphs.

### IV-C Neural Network Structure for DNN-MP-AD Algorithm

Unlike a typical neural network, the neural network structure for the DNN-MP-AD algorithm is constructed by directly transforming the edge-type message passing on the factor graph into a node-type one, i.e., each edge in the factor graph becomes a node in a hidden layer. Therefore, every hidden layer represents one specific message update in Fig. 3. Moreover, unlike a typical fully-connected neural network, the inter-layer connection for the DNN-MP-AD algorithm is uniquely determined by the factor graph in Fig. 2. Since the messages passing in the DNN-MP-AD algorithm are LLR messages, no non-linear activation function is considered and the bias term is determined by the message update equations in the MP-AD algorithm.

For a better understanding of the DNN-MP-AD algorithm, an example is illustrated in Fig. 4 assuming that related parameters are , , and iterations for the MP-AD algorithm. Entries in the received signal matrix and the channel matrix now serve as nodes of the input layer. However, entries in are not depicted in Fig. 4 for the clarity of presentation. Each complete iteration of the MP-AD algorithm is now transformed into four different hidden layers, corresponding to the four message update processes in Fig. 3. The output layer, i.e., the decision layer has nodes, corresponding to the LLR messages for the entries in .

To be specific, the messages passing from SNs to VNs in Fig. 3(a) are represented by the nodes in hidden layer A, which indicates that there are nodes in hidden layer A. The messages passing from VNs to CNs in Fig. 3(c) and the messages passing from CNs to VNs in Fig. 3(b) are represented by the nodes in hidden layers B and C, respectively. Therefore, the number of nodes in hidden layers B and C is . The messages passing from VNs to SNs in Fig. 3(d) are represented by the nodes in hidden layer D. Consequently, there are nodes in hidden layer D. We further employ the subscript as the index for different iterations. Then the nodes in hidden layer serve as the input for the next hidden layer . It should be noted that, according to (9), the input for subsequent hidden layers A also includes the nodes in the input layer. However, for the clarity of presentation in Fig. 4, we do not depict the connection from the input layer to the hidden layers with since this connection is identical to that from the input layer to . Furthermore, the output layer does not require the nodes from hidden layer D, which indicates that the last iteration only contains 3 hidden layers in Fig. 4. Finally, according to (12) and (17), the nodes in hidden layer A are required by hidden layer D and the output layer. The connections from hidden layer A to hidden layer D and the output layer are illustrated by two interleavers and in Fig. 4.

For the convenience of illustration, the nodes are colored to indicate the node affiliation in Fig. 4. To be specific, nodes in different layers are divided into groups according to the device they belong to. Within each device group, the nodes in hidden layers A and D are divided into subgroups according to different slots, while the nodes within each slot subgroup are distinguished by the antenna they correspond to. In terms of the hidden layers B, C and the output layer, the nodes in each device group are simply distinguished according to the slot. With this node affiliation, the -th node in the -th slot subgroup of the -th device group in hidden layer is interpreted as the LLR message in Fig. 3(a). Similarly, we can define the interpretation for the nodes in other hidden layers and the connections between different layers are uniquely determined as in Fig. 4. It is noted that, different from the typical DNN, the neural network structure for the DNN-MP-AD algorithm exhibits relatively sparse inter-layer connection.

Overall, there are hidden layers and one input layer as well as one output layer in the neural network structure for the DNN-MP-AD algorithm, where is the number of iterations for the iterative MP-AD algorithm.
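The layer layout described in this subsection can be enumerated with a short Python sketch. It assumes, following the text, four hidden layers (A, B, C, D) per iteration with layer D omitted in the last iteration, one node per (device, slot, antenna) triple in layers A and D, and one node per (device, slot) pair in layers B, C and the output layer. All function and parameter names are hypothetical.

```python
def layer_sequence(num_iterations):
    """Ordered hidden-layer labels; the last iteration has no layer D."""
    layers = []
    for t in range(1, num_iterations + 1):
        kinds = "ABC" if t == num_iterations else "ABCD"
        layers.extend(f"{kind}{t}" for kind in kinds)
    return layers

def nodes_per_layer(num_devices, num_slots, num_antennas):
    """Node counts implied by the device/slot/antenna grouping of the nodes."""
    per_edge = num_devices * num_slots * num_antennas  # SN-VN messages
    per_variable = num_devices * num_slots             # VN-CN messages
    return {"A": per_edge, "B": per_variable, "C": per_variable,
            "D": per_edge, "output": per_variable}

# Two iterations give seven hidden layers: A1, B1, C1, D1, A2, B2, C2.
example_layers = layer_sequence(2)
```

Under these assumptions the number of hidden layers is four times the number of iterations minus one.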

### IV-D Weighted Message Update for DNN-MP-AD Algorithm

Based on the neural network structure, the weighted message update at different hidden layers of the DNN-MP-AD algorithm is presented as follows. For notational convenience, we denote as the set of all the weights in the following equations.

#### IV-D1 Hidden Layer A

As stated, the nodes in hidden layer A represent the messages passing from SNs to VNs. We further make some modifications on (9) for the convenience of presentation and present the weighted message update equation at hidden layer A as follows.

(20) |

with

(21) |

where is the iteration index and the node in hidden layer A corresponds to the -th antenna in the -th slot subgroup of the -th device group. is the set of neighbors in the previous layer , i.e., nodes in are connected to node in hidden layer A while is the index of the device group for neighbor in hidden layer . The weights and are imposed on the messages and respectively while is regarded as the bias term of hidden layer A. For notational simplicity, the iteration index is removed from the weights but these weights are independently trained across different iterations. It should be noted that at the first iteration, is assumed to be 0 for all the nodes in hidden layer since no a priori information is passed from VNs to SNs at the beginning.

#### IV-D2 Hidden Layer B

The nodes in hidden layer B represent the messages passing from VNs to CNs in (12). The weighted update equation is now rewritten as

(22) |

where and are weights while is regarded as the bias term of hidden layer B. Node in hidden layer B corresponds to the -th slot of the -th device group and is the set of neighbor nodes in hidden layer A for node in hidden layer B. In addition, is the index of the antenna which the neighbor in corresponds to.

#### IV-D3 Hidden Layer C

The nodes in hidden layer C represent the messages passing from CNs to VNs in (14). Therefore, the update equation is now rewritten as

(23) |

with

(24) |

where and are weights and is regarded as the bias term of hidden layer C. Node in hidden layer C corresponds to the -th slot of the -th device group and is the set of neighbor nodes in hidden layer B for node in hidden layer C. In addition, is the index of the slot which the neighbor in corresponds to.

#### IV-D4 Hidden Layer D

As shown in Fig. 4, the input of hidden layer D includes the nodes in both hidden layer A and hidden layer C. Therefore, we have

(25) |

where and are weights while is regarded as the bias term of hidden layer D. Node in hidden layer D corresponds to the -th antenna in the -th slot subgroup of the -th device group. is the set of neighbor nodes in hidden layer A for node in hidden layer D and is the index of the antenna which the neighbor in corresponds to.

#### IV-D5 Output Layer

Similar to hidden layer D, the input of the output layer includes the nodes in the last hidden layer A and the last hidden layer C.

(26) |

where and are weights while is regarded as the bias term of the output layer. Node in the output layer is the LLR for the -th entry of the matrix , is the set of neighbor nodes in the last hidden layer A for node in the output layer and is the index of the antenna which neighbor in corresponds to. Finally, the decision is made as in (18).

### IV-E Loss Function

In order to accelerate the training of the weights in the DNN, we utilize the cross entropy function as the loss function. It is noted that the relationship between the output LLR and the non-zero probability of the corresponding VN coincides with the sigmoid function, i.e., . According to this relationship, the non-zero probability vector and the true value vector, i.e., the *label vector*, are taken to calculate the cross entropy

(27) |

In order to further accelerate the training of the weights near the input layer, the loss function is modified by introducing the multi-loss, i.e., we consider the non-zero probability of each middle iteration .

(28) |

where the iteration index and the non-zero probability is related to the output LLR of each middle iteration, i.e., .
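The loss computation described above can be sketched as follows. This is a minimal illustration rather than a reproduction of (27) and (28): it assumes the LLR is the log-ratio of the probabilities of 1 and 0, so that the non-zero probability is the sigmoid of the LLR, it averages the cross entropy over the entries, and it sums the per-iteration losses with equal weight. These normalization choices and all names are assumptions.

```python
import math

def nonzero_probability(llr):
    """Sigmoid of the LLR, written in a numerically stable form."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)

def cross_entropy(llrs, labels, eps=1e-12):
    """Average binary cross entropy between sigmoid(LLR) and 0/1 labels."""
    total = 0.0
    for llr, label in zip(llrs, labels):
        p = min(max(nonzero_probability(llr), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(labels)

def multi_loss(llrs_per_iteration, labels):
    """Sum of the cross entropy of every iteration's output LLRs (assumed form)."""
    return sum(cross_entropy(llrs, labels) for llrs in llrs_per_iteration)

# Two iterations, three entries; the second iteration is more confident.
loss = multi_loss([[0.5, -0.2, 1.0], [2.0, -3.0, 4.0]], [1, 0, 1])
```

Including the intermediate iterations in the loss gives the weights near the input layer a more direct gradient signal, which is the motivation stated in the text.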

Finally, the weights in the set are trained to minimize the loss function in (28). After the training process, these weights are preserved in the neural network and employed by the DNN-MP-AD algorithm for future inputs. The procedure of the DNN-MP-AD algorithm is almost identical to that of the MP-AD algorithm in Phase 1 of Scheme 1. The differences are that is also included as the input information and equations (20), (22), (23) and (25) are employed for the four message updates in each iteration while (26) is employed for the output message.

## V Simulation Results

Configuration | Value |
---|---|
Simulation platform for MP-AD algorithm | Matlab 2018a |
Simulation platform for DNN-MP-AD algorithm | TensorFlow 1.6, Python 3.5 |
GPU | GTX 1080 Ti |
Optimizer | AdamOptimizer |
Training set size | |
Test set size | |
Mini-batch size | |
Epoch number | 20 |
Initial weight | 1 |
Learning rate | |

For the simulations, we consider the complex-valued Rayleigh fading channel. As stated in Remark 1, the complex-valued scenario is actually equivalent to the real-valued scenario for the MP-AD algorithm and the DNN-MP-AD algorithm; the only difference is that antennas in the real-valued scenario correspond to antennas in the complex-valued scenario. Configurations related to the following simulations are listed in Table II.
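One common way to see the equivalence between the complex-valued and real-valued scenarios is to stack the real and imaginary parts of the channel and of the received samples. The sketch below assumes that the transmitted entries are real-valued (for instance 0/1 activity indicators), so that each complex antenna yields two real observations. It is an illustration under that assumption, not a restatement of Remark 1, and the function name is hypothetical.

```python
def stack_real_imag(channel, received):
    """Turn M complex antenna observations into 2M real ones.

    channel:  list of M rows, each holding one complex gain per device.
    received: list of M complex received samples.
    Assumes real-valued transmitted entries.
    """
    real_channel = ([[h.real for h in row] for row in channel]
                    + [[h.imag for h in row] for row in channel])
    real_received = [y.real for y in received] + [y.imag for y in received]
    return real_channel, real_received

# One complex antenna, two devices, only the first device active, no noise.
channel = [[1 + 2j, 0.5 - 1j]]
activity = [1, 0]
received = [sum(h * x for h, x in zip(row, activity)) for row in channel]
real_channel, real_received = stack_real_imag(channel, received)
```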

### V-A Throughput of Fixed-Symbol Aided RA Scheme

The throughput is defined as the average number of successfully received data packets in each RA frame. As stated above, the data symbols are assumed to be Gaussian distributed with zero mean and variance , where the mean received power after reverse power control is set to 1. The signal-to-noise ratio (SNR) is defined as where is the variance of the complex additive white Gaussian noise. For the proposed fixed-symbol aided RA scheme, the MP-AD algorithm is employed to detect the device activity in each slot while the MMSE estimator or the GMPID [34] is employed to recover the Gaussian data symbols. The Gaussian data symbols are considered correctly recovered as long as the recovery Mean Square Error (MSE) is lower than a predefined threshold . The value of is determined by the realistic coding scheme for the Gaussian symbol. For the sake of simplicity, we assume that the threshold is determined as the lowest value such that the MMSE estimator could correctly recover the Gaussian data symbol for the given SNR. Then, a data packet from an activated device is assumed successfully received by the BS if and only if the activity of all the devices is correctly detected in the corresponding slot and the Gaussian data symbols are correctly recovered.
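The success criterion above can be expressed as a small counting routine. The sketch below is illustrative only: the data layout (one tuple per slot) and all names are assumptions, and the per-frame count it returns would still need to be averaged over many simulated frames to obtain the throughput.

```python
def packet_received(detected_activity, true_activity, recovery_mse, mse_threshold):
    """A packet counts only if the whole slot's activity pattern is detected
    correctly and the recovery MSE is below the threshold."""
    activity_correct = list(detected_activity) == list(true_activity)
    return activity_correct and recovery_mse < mse_threshold

def frame_packet_count(slots, mse_threshold):
    """Number of successfully received packets in one RA frame.

    slots: iterable of (detected_activity, true_activity, mse_by_device).
    """
    received = 0
    for detected, truth, mse_by_device in slots:
        for device, active in enumerate(truth):
            if active and packet_received(detected, truth,
                                          mse_by_device[device], mse_threshold):
                received += 1
    return received

frame = [
    ([1, 0, 1], [1, 0, 1], [0.01, 0.0, 0.5]),  # device 0 succeeds, device 2 fails on MSE
    ([1, 1, 0], [0, 1, 0], [0.0, 0.01, 0.0]),  # false alarm on device 0 spoils the slot
]
count = frame_packet_count(frame, mse_threshold=0.1)
```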

The throughput of the proposed fixed-symbol aided RA scheme is shown in Fig. 5, where the throughput upper bound is obtained by assuming that the data packets from all the activated devices are successfully received, i.e., the upper bound is equal to . In addition, we also illustrate the number of failed packets caused by the error of the MP-AD algorithm, the MMSE estimator and the GMPID. The solid curves show that, for different activation probabilities , increasing the number of antennas greatly improves the throughput of the proposed RA scheme. In addition, the throughput of the RA scheme with GMPID employed for MUD is identical to that with the MMSE estimator employed for MUD. The dashed curves show that, for different numbers of antennas, the failed packets in the fixed-symbol aided RA scheme are mainly caused by the error of the MP-AD algorithm, indicating that the device activity detection is the key to the throughput of the proposed RA scheme. Finally, the dotted curves show that the low-complexity GMPID closely approaches the performance of the MMSE estimator and that excellent MUD performance can be guaranteed as long as the device activity detection is accurate.

### V-B Impacts of System Parameters on Detection Accuracy

According to the simulation results in Fig. 5, the accuracy of the MP-AD algorithm is the key to the throughput of the proposed fixed-symbol aided RA scheme. Therefore, we focus on the accuracy of the MP-AD algorithm for the following simulations and investigate the impacts of different system parameters. It is noted that when a false alarm (FA) of the MP-AD algorithm occurs, i.e., an inactive device is considered active, the recovered data symbol of this FA device is close to zero according to the GMPID. However, since Gaussian data symbols are considered, the recovered symbol of this FA device is also accepted by the BS, leading to the wrong reception of inactive devices. Therefore, neither missed detection (MD) nor FA is acceptable in the fixed-symbol aided RA scheme and we do not further distinguish these two types of errors. Instead, we employ the element error rate (EER) for every element of matrix to indicate the detection accuracy.
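Since missed detections and false alarms are counted alike, the EER reduces to the fraction of entries on which the detected matrix differs from the true one. A minimal sketch, with hypothetical names and matrices represented as lists of 0/1 rows:

```python
def element_error_rate(detected, truth):
    """Fraction of matrix entries where the detected value differs from the
    true value; missed detections and false alarms count equally."""
    errors = 0
    total = 0
    for detected_row, true_row in zip(detected, truth):
        for d, t in zip(detected_row, true_row):
            errors += int(d != t)
            total += 1
    return errors / total

# One missed detection among four entries.
eer = element_error_rate([[1, 0], [0, 0]], [[1, 0], [0, 1]])
```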

The simulation results for the impacts of three different parameters , and SNR are illustrated in Fig. 6, respectively. For comparison, the performances of the Linear Minimum Mean Square Error (LMMSE) estimator [50], the matched filter (MF) estimator and the V-AMP estimator [19] are also investigated for the problem in (2), with an additional check constraint imposed on their final hard decision, i.e., at most one element in each row of matrix is 1.
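The text does not specify how the check constraint is enforced on the baseline estimators. One plausible way, sketched below purely as an assumption, is to keep only the largest soft estimate in each row and to declare it 1 if it exceeds a decision threshold. The threshold value of 0.5 and the function name are illustrative.

```python
def constrained_hard_decision(soft_rows, threshold=0.5):
    """Hard decision with at most one 1 per row.

    Only the largest soft estimate in a row can become 1, and only if it
    exceeds `threshold`; every other entry in the row is set to 0.
    """
    decisions = []
    for row in soft_rows:
        best = max(range(len(row)), key=lambda i: row[i])
        decisions.append([1 if i == best and row[i] > threshold else 0
                          for i in range(len(row))])
    return decisions

# First row: two entries exceed the threshold, only the larger one survives.
# Second row: no entry exceeds the threshold, so the row is all zeros.
decided = constrained_hard_decision([[0.9, 0.8, 0.1], [0.2, 0.1, 0.3]])
```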

The impact of antenna number is illustrated in Fig. 6(a). It can be observed that the EER performance improves almost linearly in the logarithmic scale with increasing . Therefore, an accurate detection can be guaranteed within a feasible number of iterations as long as the devices are supported by enough antennas. This observation can be explained by the fact that more antennas bring more incoming messages for each VN according to (17). As a result, the final decision can be made with higher accuracy.

Similar observations can be found in Fig. 6(b) for the impact of device number . When a small number of devices are supported by the massive MIMO BS, e.g. , the MP-AD algorithm exhibits higher accuracy with fewer iterations. However, the detection accuracy deteriorates as the number of devices increases.

The impact of SNR is illustrated in Fig. 6(c). It is shown that the EER performance of the MP-AD algorithm improves as the SNR increases.

As shown in Fig. 6, the MP-AD algorithm outperforms the other three estimators in terms of the detection accuracy. Specifically, the matrix follows a different distribution from the estimation target in [19]. Therefore, the V-AMP estimator for sparse signal recovery does not work since it fails to consider the transmission constraint in each row of . In addition, according to the simulations, at most 10 iterations are required by the MP-AD algorithm to reach the fixed EER point, which supports the implementation feasibility of the MP-AD algorithm.

### V-C Improvement by DNN-MP-AD Algorithm

For training convenience, we focus on small-scale system models. In addition, only 8 iterations are considered for the DNN-MP-AD algorithm since the previous simulations verify that at most 10 iterations are required by the MP-AD algorithm to reach the fixed EER point.

Similar to the simulations in Fig. 6, we also investigate the performance improvement brought by the DNN-MP-AD algorithm from three different perspectives. The results are illustrated in Fig. 7. It is shown in Fig. 7 that under most settings considered, the DNN-MP-AD algorithm reduces the EER of the MP-AD algorithm by almost one order of magnitude. That is, the DNN structure brings significant improvement to the detection accuracy without causing any additional online computational complexity. According to Fig. 7(a) and Fig. 7(b), the performance improvement of the DNN-MP-AD algorithm is more prominent for overloaded systems, i.e., when