As society becomes fully networked, the number of wireless devices and the volume of data traffic are growing rapidly, which calls for the development of fifth-generation (5G) wireless communication . It is known that 5G wireless networks will support three generic services: Enhanced Mobile Broadband (eMBB), Massive Machine-Type Communication (mMTC), and Ultra-Reliable Low-Latency Communication (URLLC) . Among them, mMTC aims to enable communication among a large number of low-cost, sporadically active devices with low data rates . It is a necessary service driven by many emerging use cases, such as the Internet of Things (IoT) and machine-to-machine communications. Hence, reliable support for massive device connectivity has become an important issue.
Channel state information (CSI) plays an important role in coherent communication. In time-division duplex (TDD) massive multiple-input multiple-output (MIMO) transmission, uplink (UL) CSI at the base station (BS) can be estimated through orthogonal pilots, and downlink channel estimates can be obtained by exploiting channel reciprocity . In massive access scenarios, channel estimation is challenging for two reasons . First, since devices are generally low-cost, the duration of pilot sequences is limited by the uplink power budget; when devices are mobile, it is also limited by the channel coherence time . Hence, the number of devices is much larger than the number of available orthogonal pilots, and it becomes impossible to give every device a dedicated pilot. Second, each device sends data to the BS intermittently, so it is not necessary to allocate dedicated pilots to all the devices within the network . These are the key motivations for the study of random access.
Many recent works focus on random uplink access in massive MIMO [4, 5, 6, 7]. For example, in , each device is assigned a unique non-orthogonal pilot hopping pattern. According to these patterns, active devices select pilots in the training phases of multiple transmission slots and transmit data codewords afterwards. Hence, devices can be identified and their codewords can be merged. In , coded access and successive interference cancellation are used to realize random uplink access. The strongest-user collision resolution decision criterion was proposed in . Each active device randomly selects a pilot from the pilot set, but only the device with the strongest path gain can access the network successfully. However, these works consider only independent and identically distributed (i.i.d.) channels. In realistic outdoor wireless propagation environments, the BS is located at an elevated position and the scattering around the BS is limited. Hence, most of the channel power lies in a finite number of spatial directions .
In this paper, we consider random pilot and data access in massive MIMO systems with spatially correlated Rayleigh fading channels. A device grouping and pilot set allocation algorithm is proposed. Specifically, devices are divided into multiple groups based on their correlation characteristics, and a unique pilot set is assigned to each group. Devices with large channel power overlaps in the angular domain are assigned to different groups, since it is difficult to distinguish them if they select the same pilot. Then the random access protocol in  is utilized. The non-orthogonal pilot hopping pattern over multiple slots is predetermined for each device. The construction of pseudo-random pilot hopping patterns can be modeled as a process in which each active device randomly selects a pilot from its pilot set in each slot . Since device grouping and pilot set allocation are performed before random access, devices reusing the same pilots have less overlapping angle of arrival (AoA) intervals, and the impairment caused by pilot interference is reduced. Hence, the proposed scheme shows performance gains over the traditional scheme where devices and pilots are not grouped  in terms of the estimation error and spectral efficiency. The gains in estimation performance become larger as the channel angular spread (AS) becomes smaller and the signal-to-noise ratio (SNR) becomes higher. Moreover, when the channel covariance matrices of devices reusing the same pilots are orthogonal, the estimation error is minimized.
In this paper, bold lowercase and bold uppercase letters denote column vectors and matrices, respectively. The conjugate, transpose, and conjugate transpose are denoted by $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$, respectively. The Euclidean norm and expectation operators are denoted by $\|\cdot\|$ and $\mathbb{E}\{\cdot\}$, respectively. Let $x \sim \mathcal{CN}(0, \sigma^2)$ denote a circularly symmetric complex Gaussian random variable $x$ with zero mean and variance $\sigma^2$.
II. Channel Model and Channel Estimation
We consider a massive MIMO system in a single-cell scenario operating in TDD mode, where the BS is equipped with a uniform linear array (ULA) of M antennas and serves K single-antenna devices. The set of devices within the network is denoted by .
We consider spatially correlated Rayleigh fading channels which are frequency-flat fading on a narrow-band sub-carrier. Let denote the UL channel vector between the BS antenna array and device k. Let and denote the incidence angle and the angular region, respectively. The channel vector can be modeled as [8, 9]
where denotes the channel gain function of device k. If the BS antennas are spaced half a wavelength apart, the steering vector . Supposing , the covariance matrix is given by
where denotes the power azimuth spectrum (PAS), which is assumed to follow the truncated Laplacian distribution in this paper. Let , , and denote the AS, the mean AoA, and the large-scale fading coefficient of device k, respectively. Then equals 
From , when the number of BS antennas is sufficiently large, the covariance matrix can be approximated by
where is a unitary M-point discrete Fourier transform (DFT) matrix. For, is given by
where for . It indicates that when M is sufficiently large, channel spatial correlations are related to the channel power distribution in the angular domain. Specifically, the eigenvector matrices of the channel covariance matrices can be approximated by the DFT matrix, and the eigenvalues depend on the channel PASs. In addition, channels are assumed to be wide-sense stationary , and channel covariance matrices can be obtained by the BS.
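The relation between the covariance matrix and the DFT matrix can be checked numerically. The following is a minimal sketch, not the paper's exact expressions: it assumes a half-wavelength ULA with steering entries exp(-j*pi*m*sin(theta)), a truncated Laplacian PAS of the form exp(-sqrt(2)|theta - mean|/spread) on [-pi/2, pi/2] normalized to total power beta, and a Riemann-sum approximation of the integral. The function names and the diagonal-energy metric are illustrative choices.

```python
import numpy as np

def laplacian_pas(theta, mean_aoa, spread, beta=1.0):
    """Truncated Laplacian PAS on a uniform grid, scaled to total power beta (assumed form)."""
    s = np.exp(-np.sqrt(2.0) * np.abs(theta - mean_aoa) / spread)
    dtheta = theta[1] - theta[0]
    return beta * s / (s.sum() * dtheta)

def covariance(M, mean_aoa, spread, beta=1.0, n_grid=4001):
    """R = integral of a(theta) a(theta)^H S(theta) d(theta), via a Riemann sum."""
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    dtheta = theta[1] - theta[0]
    S = laplacian_pas(theta, mean_aoa, spread, beta)
    # Column n of A is the steering vector a(theta_n) for half-wavelength spacing.
    A = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(theta)))
    return (A * (S * dtheta)) @ A.conj().T

M = 64
R = covariance(M, mean_aoa=np.deg2rad(20.0), spread=np.deg2rad(5.0))

# Rotate R into the DFT basis and measure how much energy sits on the diagonal.
F = np.fft.fft(np.eye(M)) / np.sqrt(M)  # unitary M-point DFT matrix
D = F.conj().T @ R @ F
diag_power = np.sum(np.abs(np.diag(D)) ** 2)
total_power = np.sum(np.abs(D) ** 2)
print(f"fraction of energy on the diagonal: {diag_power / total_power:.3f}")
```

Rerunning the sketch with a larger M or a smaller spread gives a quick feel for how well the DFT basis approximates the eigenvectors in a given setting.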
The pilot is assumed to be symbols long, which is shorter than the channel coherence interval. During the training phase, the set of active devices is denoted by and its size is assumed to be . The pilot of device k is denoted by , which is the pilot sequence in the pilot set. The transmit power of the pilot signal satisfies , i.e., . Let denote the set of devices using the same pilot as device k. The received pilot signal at the BS is given by
where N is the additive white Gaussian noise (AWGN) whose elements are i.i.d. as . The SNR . After decorrelation, the channel observation of device k equals
The minimum mean square error (MMSE) estimate of is given by
Based on the orthogonality principle of MMSE estimation, the covariance of channel estimation error can be obtained as
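As an illustration of the MMSE step described in this section, the sketch below uses an assumed, simplified observation model: after decorrelation the BS sees y = h_k + (sum of the colliders' channels) + n, with n distributed as CN(0, I/snr). The normalization, the function names, and the toy diagonal covariance matrices are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def mmse_channel_estimate(y, R_k, R_others, snr):
    """Return (h_hat, C_err) for y = h_k + sum_j h_j + n, n ~ CN(0, I/snr)."""
    M = R_k.shape[0]
    Q = R_k + sum(R_others, np.zeros_like(R_k)) + np.eye(M) / snr
    W = R_k @ np.linalg.inv(Q)   # MMSE filter R_k Q^{-1}
    h_hat = W @ y
    C_err = R_k - W @ R_k        # error covariance R_k - R_k Q^{-1} R_k
    return h_hat, C_err

def sample_channel(R, rng):
    """Draw h ~ CN(0, R) using a Hermitian square root of R."""
    w, V = np.linalg.eigh(R)
    root = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    n = R.shape[0]
    z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return root @ z

rng = np.random.default_rng(0)
M, snr = 8, 10.0
# Toy covariances: device k occupies directions 0 and 1, the collider 1 and 2.
R_k = np.diag([4.0, 4.0, 0, 0, 0, 0, 0, 0]).astype(complex)
R_j = np.diag([0, 4.0, 4.0, 0, 0, 0, 0, 0]).astype(complex)
h_k, h_j = sample_channel(R_k, rng), sample_channel(R_j, rng)
n = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2 * snr)

h_hat, C_err = mmse_channel_estimate(h_k + h_j + n, R_k, [R_j], snr)
mse_with_collider = np.trace(C_err).real
_, C_alone = mmse_channel_estimate(h_k + n, R_k, [], snr)
mse_alone = np.trace(C_alone).real
print(mse_with_collider, mse_alone)
```

In this toy setting the collider shares one spatial direction with device k, so the trace of the error covariance is larger than in the collision-free case.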
III. DGPSA-Based Random Access Protocol
Since the number of devices within the network is larger than that of orthogonal pilots, it is impossible to allocate a dedicated pilot to each device. Hence, random access becomes a necessary solution. In this section, a device grouping and pilot set allocation (DGPSA) algorithm is proposed, which is performed before the random access process. The characteristics of correlated channels are fully utilized in the DGPSA-based random access protocol. Finally, the channel estimation error, its theoretical lower bound and the spectral efficiency of the proposed scheme are derived.
III-A. DGPSA Algorithm
In this subsection, a DGPSA algorithm is proposed and described in Algorithm 1. The main idea is that the channel covariance matrices of devices reusing a pilot set should be as orthogonal as possible, i.e., devices with large AoA interval overlaps should be assigned to different groups and randomly access different pilots. Similar ideas were utilized in  to mitigate inter-cell pilot contamination. In this paper, however, Algorithm 1 is dedicated to device grouping and pilot set allocation in multi-user single-cell scenarios.
In Algorithm 1, pilots are equally divided into groups as . In order to distinguish different devices within a group, the number of pilots in a group should be more than one, which will be illustrated in Section III-B.
The process of device grouping consists of two steps. The first step is to assign devices with similar covariance matrices to different groups with orthogonal pilot sets. The similarity is measured by the angle between the covariance matrices of different devices. Since these matrices are Hermitian positive semi-definite, for any two devices i and j, the angle between their covariance matrices is calculated as
where smaller means weaker orthogonality and stronger similarity. The second step is to assign each ungrouped device to the group in which the channel covariance matrices of the devices are as orthogonal as possible.
The output of Algorithm 1 is the grouping pattern , which means that devices in group can randomly access pilot sequences in the pilot set .
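The two steps described above can be sketched in code. The sketch is a hypothetical greedy heuristic, not a reproduction of Algorithm 1, whose listing is not available here. It assumes the angle between two covariance matrices is defined through the normalized trace inner product, it seeds the groups with the most similar devices, and it caps group sizes at ceil(K / G); all function names and the toy covariance matrices are illustrative.

```python
import numpy as np

def covariance_angle(R_i, R_j):
    """Angle in [0, pi/2] between Hermitian PSD matrices (0 = aligned, pi/2 = orthogonal)."""
    inner = np.trace(R_i @ R_j).real
    denom = np.linalg.norm(R_i, "fro") * np.linalg.norm(R_j, "fro")
    return float(np.arccos(np.clip(inner / denom, 0.0, 1.0)))

def greedy_grouping(covs, n_groups):
    """Hypothetical two-step heuristic; returns a list of device-index lists."""
    K = len(covs)  # assumes K >= n_groups
    ang = np.array([[covariance_angle(covs[i], covs[j]) for j in range(K)]
                    for i in range(K)])
    np.fill_diagonal(ang, np.inf)
    # Step 1: seed each group with one of the mutually most similar devices.
    i0, j0 = np.unravel_index(np.argmin(ang), ang.shape)
    seeds = [int(i0), int(j0)][:n_groups]
    while len(seeds) < n_groups:
        rest = [k for k in range(K) if k not in seeds]
        seeds.append(min(rest, key=lambda k: ang[k, seeds].sum()))
    groups = [[s] for s in seeds]
    # Step 2: place each remaining device where its smallest angle to members is largest.
    cap = -(-K // n_groups)  # ceil(K / n_groups), keeps group sizes balanced
    for k in range(K):
        if k in seeds:
            continue
        open_groups = [g for g in groups if len(g) < cap]
        best = max(open_groups, key=lambda g: min(ang[k, m] for m in g))
        best.append(k)
    return groups

def block_cov(active, M=4):
    """Toy diagonal covariance with unit power on the listed directions."""
    d = np.zeros(M)
    d[list(active)] = 1.0
    return np.diag(d).astype(complex)

# Devices 0 and 1 share directions {0, 1}; devices 2 and 3 share directions {2, 3}.
covs = [block_cov({0, 1}), block_cov({0, 1}), block_cov({2, 3}), block_cov({2, 3})]
groups = greedy_grouping(covs, n_groups=2)
print(groups)
```

In this toy example, the two devices with identical covariance matrices end up in different groups, so devices that share a pilot set occupy disjoint directions.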
III-B. Random Access and Channel Estimation
Due to random access, collisions will occur in the pilot domain. Hence, it is impossible to distinguish the transmitting devices based on the received pilots in a single slot. As a result, the random pilot and data access protocol in  is utilized.
Specifically, a UL transmission frame is divided into L transmission slots, as shown in Fig. 1. Each device is associated with a unique, predefined pseudo-random pilot hopping pattern, which consists of pilot sequences from its assigned pilot set. In a transmission slot with symbols, active devices select pilot sequences according to their patterns and then send part of their data codeword. At the receiver, the BS runs a correlation decoder across slots and identifies the pilot patterns in order to detect the transmitting devices. Maximum ratio combining (MRC) is applied to the codeword, and the MRC outputs are combined according to the pilot patterns.
When L is large, the transmission of pilots and codewords is affected by an asymptotically large number of channel fades and interference events . Relying on the ergodicity of such a process, the estimation error and spectral efficiency can be characterized as in Section III-B and Section III-C. Since long pilot hopping patterns are used as identifiers, this protocol is best suited to delay-tolerant, low-rate applications.
Assuming the number of devices and pilots in group are and , respectively, each device can be allocated a unique non-orthogonal pilot hopping pattern if . Theoretically, when is sufficiently large, the BS can identify all the devices within the network even if the number of devices far exceeds the number of pilots.
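A small counting example makes the uniqueness requirement concrete. It assumes that the condition amounts to having at least as many distinct length-L pilot sequences as devices in the group, i.e., (number of pilots)^L >= (number of devices); this reading and the function name are assumptions, and uniqueness of patterns alone does not guarantee reliable detection in practice.

```python
def min_slots_for_unique_patterns(n_devices, n_pilots):
    """Smallest L with n_pilots**L >= n_devices (distinct length-L pilot sequences)."""
    if n_pilots < 2:
        raise ValueError("need at least two pilots per group to tell devices apart")
    L, patterns = 1, n_pilots
    while patterns < n_devices:
        L += 1
        patterns *= n_pilots
    return L

# 400 devices sharing 4 pilots: 4**4 = 256 < 400 <= 4**5 = 1024.
print(min_slots_for_unique_patterns(400, 4))  # prints 5
```

The single-pilot case is rejected because every pattern would then be identical, which matches the requirement that a group contain more than one pilot.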
Devices within the network are independently active with the activation probability. The sporadic and independent activation of devices and the construction of pseudo-random pilot hopping patterns can be modeled as a process in which each active device randomly selects one of the sequences from its pilot set in each slot. The probability of having active devices among the K devices is then calculated as 
Let denote possible sets of active devices. Assuming devices in the set are active, its element is denoted by , and denotes possible collision sets of device . The number of colliders to device is . Assuming the collision set is considered, the mean square error of channel estimation (MSE-CE) of device is given by 
Next, we obtain the MSE-CE of a device averaged over all possible sets of active devices, the selection of one device and its colliders. The expected MSE-CE is calculated as
Let and denote the number of devices and pilots in group y, and . The probability of having c colliders to a given device in group y is shown in (14).
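The activity and collision statistics can be illustrated with binomial probabilities. The sketch below assumes independent activation with probability p_a and uniform pilot selection within a group, so that each of the other group members collides with a given active device with probability p_a / tau_y. This unconditional form is an assumption and may differ from expression (14), which may condition on the number of active devices; the function names and the example values are illustrative.

```python
from math import comb

def prob_active(K, k_a, p_a):
    """P(exactly k_a of K devices are active) under independent activation."""
    return comb(K, k_a) * p_a**k_a * (1 - p_a) ** (K - k_a)

def prob_colliders(K_y, tau_y, p_a, c):
    """P(c of the other K_y - 1 group members are active and pick the same pilot)."""
    q = p_a / tau_y  # active with prob. p_a, same pilot with prob. 1 / tau_y
    return comb(K_y - 1, c) * q**c * (1 - q) ** (K_y - 1 - c)

K_y, tau_y, p_a = 100, 4, 0.02
pmf = [prob_colliders(K_y, tau_y, p_a, c) for c in range(K_y)]
mean_colliders = sum(c * p for c, p in enumerate(pmf))
print(f"P(no collider) = {pmf[0]:.4f}, mean colliders = {mean_colliders:.4f}")
```

Under these assumptions the mean number of colliders is (K_y - 1) p_a / tau_y, so it falls as more pilots are assigned to a group.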
The problem of minimizing can be expressed as
Given the number of devices, is determined for . If is minimized for all values of and , can be minimized. Similarly, if is minimized for all kinds of active patterns, all choices of one device and its possible collision events, can be minimized. Hence, the equivalent problem is given by
When , i.e., or for , the effect of can be eliminated and can be obtained. Hence, for any two devices and within the same group, if , can be minimized, which satisfies
does not vary with the fluctuation of . Considering , the theoretical minimum value of is given by
Hence, for each device, if its channel covariance matrix is orthogonal to those of its colliders, the estimation error is minimized. This can be realized when devices sharing a pilot set have orthogonal covariance matrices, i.e., they have non-overlapping AoA intervals. The purpose of Algorithm 1 is to place devices with approximately orthogonal covariance matrices in the same group and to keep the interference as limited as possible. Besides, given the number of pilots in a group, when the pilot length decreases, the number of groups decreases, i.e., the number of possible colliders to a device increases and thus increases. Hence, the MSE-CE gap between the DGPSA-based random access scheme and the theoretical lower bound decreases as the pilot length increases.
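The effect of orthogonal covariance matrices can be made explicit with a short derivation. It is a sketch under an assumed normalization, in which the decorrelated observation is the sum of the desired channel, the colliders' channels, and noise with covariance $\rho^{-1}\mathbf{I}_M$; the symbols $\mathbf{C}_k$ (error covariance), $\mathcal{C}_k$ (collider set), $\mathbf{A}$, $\mathbf{B}$, and $\lambda_{k,m}$ (eigenvalues of $\mathbf{R}_k$) are introduced here for illustration and may not match the paper's notation.

```latex
\begin{align*}
\mathbf{C}_k &= \mathbf{R}_k - \mathbf{R}_k \left(\mathbf{A} + \mathbf{B}\right)^{-1} \mathbf{R}_k,
\qquad \mathbf{A} = \mathbf{R}_k + \rho^{-1}\mathbf{I}_M,
\quad \mathbf{B} = \sum_{j \in \mathcal{C}_k} \mathbf{R}_j . \\
\intertext{If $\mathbf{R}_k \mathbf{R}_j = \mathbf{0}$ for all $j \in \mathcal{C}_k$, then $\mathbf{R}_k \mathbf{B} = \mathbf{0}$, and since $\mathbf{A}^{-1}$ commutes with $\mathbf{R}_k$,}
\mathbf{R}_k \mathbf{A}^{-1} (\mathbf{A} + \mathbf{B})
  &= \mathbf{R}_k + \mathbf{A}^{-1} \mathbf{R}_k \mathbf{B} = \mathbf{R}_k
\;\Longrightarrow\;
\mathbf{R}_k (\mathbf{A} + \mathbf{B})^{-1} = \mathbf{R}_k \mathbf{A}^{-1}, \\
\mathbf{C}_k &= \mathbf{R}_k - \mathbf{R}_k \left(\mathbf{R}_k + \rho^{-1}\mathbf{I}_M\right)^{-1} \mathbf{R}_k, \\
\operatorname{tr}(\mathbf{C}_k) &= \sum_{m=1}^{M} \frac{\lambda_{k,m}}{1 + \rho\,\lambda_{k,m}} .
\end{align*}
```

Under these assumptions the colliders drop out of the error covariance entirely, and the resulting trace depends only on the SNR and the eigenvalues of the device's own covariance matrix.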
III-C. Uplink Sum Rate
Assuming devices in set are active, during the data transmission phase, the signal is transmitted from device to the BS antenna array, where . The received signal is given by
where is the channel vector of active device , is the data transmission SNR, and is the independent additive noise.
The possible collision events of active devices in set are denoted by . The possible collision event is considered. The MRC is utilized at the BS, i.e., . We rewrite “ ” to “” for short, and obtain
The spectral efficiency of device is given by 
Similar to the process in Section III-B, we calculate the expected value of (21) with respect to the number of active devices, active patterns, the number of collision devices, and collision events of active devices. Finally, the expected spectral efficiency of devices within the network is given by
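To make the role of MRC in the rate expression concrete, the following sketch evaluates a generic instantaneous SINR and spectral efficiency for one channel realization, with the combining vector set to the channel estimate. It is not expression (21): it ignores the pilot overhead factor and the averaging over activity and collision events, and it assumes unit-variance noise with data SNR `snr`. The function name and the toy channels are illustrative.

```python
import numpy as np

def mrc_spectral_efficiency(h_hat_k, channels, k, snr):
    """log2(1 + SINR) for device k with MRC vector v = h_hat_k (one realization)."""
    v = h_hat_k
    # np.vdot conjugates its first argument, so this is |v^H h_j|^2 for each device j.
    gains = np.array([abs(np.vdot(v, h)) ** 2 for h in channels])
    signal = snr * gains[k]
    interference = snr * (gains.sum() - gains[k])
    noise = np.linalg.norm(v) ** 2
    return float(np.log2(1.0 + signal / (interference + noise)))

h0 = np.array([1.0, 1.0j])
h1 = np.array([1.0, -1.0j])  # orthogonal to h0
se_alone = mrc_spectral_efficiency(h0, [h0], k=0, snr=3.0)
se_orthogonal = mrc_spectral_efficiency(h0, [h0, h1], k=0, snr=3.0)
se_aligned = mrc_spectral_efficiency(h0, [h0, h0], k=0, snr=3.0)
print(se_alone, se_orthogonal, se_aligned)
```

With perfect channel knowledge, an interferer whose channel is orthogonal to the desired one leaves the rate unchanged, whereas a fully aligned interferer reduces it sharply.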
IV. Simulation Results
In this section, we present the simulation and analysis results to evaluate the performance of the proposed scheme.
We consider the UL system with devices, where the BS is equipped with a 128-antenna ULA with half-wavelength spacing. We use the truncated Laplacian distribution in (3) to generate the channel PAS, and the channel power is normalized . For devices within the network, we assume their angular spread degrees (ASDs) are equal, i.e., for , and their large-scale fading coefficients are assumed to be 1. We assume their mean channel AoAs are uniformly distributed within the interval. The channel SNRs in the training phase and the data transmission phase are equal.
We employ the MSE-CE metric to evaluate the performance of the DGPSA-based random pilot and data access scheme developed in Section III. In Fig. 2, pilots are equally divided into groups and devices are divided based on Algorithm 1. Since the number of devices in different groups is approximately equal, the assignment of pilots is reasonable. Assuming and dB, Fig. 2 shows the MSE-CE of the DGPSA-based random access scheme and the traditional random access scheme . In , devices and pilots are not grouped. It can be observed that the proposed scheme outperforms the traditional scheme in terms of the MSE-CE across all ASD regimes. This is because the overlapping AoA intervals of possible colliders in this paper are smaller than those in . In particular, when the ASD is small, i.e., channels are strongly correlated, the improvement is significant.
In Fig. 3, for the DGPSA-based random access scheme, pilots and devices are divided into groups based on Algorithm 1. When , each device is pre-allocated a dedicated pilot. Assuming and , Fig. 3 shows the MSE-CE of the proposed scheme, its theoretical lower bound, the traditional random access scheme , and the ideal case where devices have dedicated pilot sequences. We observe from Fig. 3 that the MSE-CE improves as the pilot length increases. Besides, the performance of the proposed scheme is close to the theoretical lower bound regardless of the pilot length in the low-SNR regime, where noise dominates. When the SNR is high and pilot interference dominates, the MSE-CE gap between the proposed scheme and its theoretical lower bound increases as the pilot length decreases. This is because the proposed algorithm makes the channel covariance matrices of possible colliders approximately, but not strictly, orthogonal, and a shorter pilot increases the number of colliders and the residual interference. Nevertheless, the proposed scheme shows clear performance gains over the traditional scheme, especially in the high-SNR regime. Furthermore, the performance gap between the proposed scheme with and the ideal case with is not large.
Random access has become an important topic because the number of pilot sequences is limited and device activity is sporadic. However, most related works concentrate on i.i.d. channels and pay less attention to realistic, spatially correlated outdoor propagation environments. In this work, a DGPSA-based random pilot and data access protocol is proposed for massive MIMO systems with spatially correlated Rayleigh fading channels. Specifically, devices and pilot sets are divided into groups by the DGPSA algorithm so that devices within the same group have less overlapping channel AoA intervals. Active devices then perform random pilot and data access. The theoretical MSE-CE lower bound is derived. The simulation results show that the proposed scheme outperforms the traditional scheme in terms of the MSE-CE and spectral efficiency. Furthermore, the MSE-CE gains are more significant in the smaller-ASD and higher-SNR regimes. Besides, the MSE-CE of the proposed scheme is close to its theoretical lower bound over a wide SNR region, especially for long pilot sequences. Hence, the DGPSA-based random pilot and data access protocol is well suited to large numbers of low-power, intermittently active devices in massive MIMO systems with spatially correlated Rayleigh fading channels.
- [1] NGMN Alliance, “5G white paper,” Next Generation Mobile Networks, White Paper, pp. 1–125, 2015.
- [2] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: Physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, pp. 59–65, Sep. 2016.
- [3] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, pp. 3590–3600, Nov. 2010.
- [4] E. de Carvalho, E. Björnson, J. H. Sørensen, E. G. Larsson, and P. Popovski, “Random pilot and data access in massive MIMO for machine-type communications,” IEEE Trans. Wireless Commun., vol. 16, pp. 7703–7717, Dec. 2017.
- [5] E. de Carvalho, E. Björnson, J. H. Sørensen, P. Popovski, and E. G. Larsson, “Random access protocols for massive MIMO,” IEEE Commun. Mag., vol. 55, pp. 216–222, May 2017.
- [6] J. H. Sørensen, E. de Carvalho, and P. Popovski, “Massive MIMO for crowd scenarios: A solution based on random access,” in IEEE Globecom Workshops (GC Wkshps), Austin: Academic, Dec. 2014, pp. 352–357.
- [7] E. Björnson, E. de Carvalho, J. H. Sørensen, E. G. Larsson, and P. Popovski, “A random access protocol for pilot allocation in crowded massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, pp. 2220–2234, Apr. 2016.
- [8] L. You, X. Q. Gao, X. G. Xia, N. Ma, and Y. Peng, “Pilot reuse for massive MIMO transmission over spatially correlated Rayleigh fading channels,” IEEE Trans. Wireless Commun., vol. 14, pp. 3352–3366, Feb. 2015.
- [9] X. Meng, X. Q. Gao, and X. G. Xia, “Omnidirectional precoding based transmission in massive MIMO systems,” IEEE Trans. Commun., vol. 64, pp. 174–186, Nov. 2016.
- [10] K. I. Pedersen, P. E. Mogensen, and B. H. Fleury, “A stochastic model of the temporal and azimuthal dispersion seen at the base station in outdoor propagation environments,” IEEE Trans. Veh. Technol., vol. 49, pp. 437–447, Mar. 2000.
- [11] B. Clerckx and C. Oestges, MIMO Wireless Networks: Channels, Techniques and Standards for Multi-Antenna, Multi-User and Multi-Cell Systems, 2nd ed. Oxford, UK: Academic Press, 2013.
- [12] H. F. Yin, D. Gesbert, M. Filippou, and Y. Z. Liu, “A coordinated approach to channel estimation in large-scale multiple-antenna systems,” IEEE J. Sel. Area Commun., vol. 31, pp. 264–273, Feb. 2013.
- [13] E. Björnson, J. Hoydis, and L. Sanguinetti, Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency, Foundations and Trends in Signal Processing, vol. 11, pp. 154–655, Nov. 2017.