Recently, massive Multiple-Input Multiple-Output (MIMO), in which the Base Station (BS) has a large number of antennas (e.g., hundreds or even more), has attracted increasing attention [3, 2, 4, 1, 5, 6]. In particular, massive MIMO can bring significant improvements in both throughput and energy efficiency [3, 5, 6, 2, 4, 1]. Furthermore, the access demand of wireless communication has increased exponentially in recent years. According to a forecast from ABI Research, the number of wireless communication devices will reach 40.9 billion in 2020, and the number of Internet of Things (IoT) units is predicted to increase to 26 billion by 2020. As a result, due to the limited spectrum resources, overloaded access in the same time/frequency/code is inevitable in future wireless communication systems. Non-Orthogonal Multiple Access (NOMA), in which all the users can be served concurrently at the same time/frequency/code, has been identified as one of the key radio access technologies to increase the spectral efficiency and reduce the latency of Fifth Generation (5G) mobile networks [10, 11, 12, 13, 15, 16, 14]. In addition, the system performance of NOMA can be further enhanced by combining NOMA with MIMO (MIMO-NOMA) [17, 22, 18, 23, 20, 19, 21].
In this paper, we consider an overloaded massive MIMO-NOMA uplink, in which the number of users is larger than the number of BS antennas. This is unlike conventional massive MIMO, which requires the number of antennas to be much larger than the number of users [4, 3]. We propose a low-complexity iterative detection for the overloaded massive MIMO-NOMA and analyze its convergence.
I-A Motivation and Related Work
Unlike Orthogonal Multiple Access (OMA) systems, e.g., Time Division Multiple Access (TDMA) and Orthogonal Frequency Division Multiple Access (OFDMA) [24, 25], the signal processing for NOMA imposes higher computational complexity and energy consumption at the BS [11, 12, 13, 14]. Low-complexity uplink detection for MIMO-NOMA is a challenging problem due to the non-orthogonal interference between the users [20, 21, 22], especially when the number of users is large. For example, the optimal Multiuser Detector (MUD) for Multiuser MIMO (MU-MIMO), such as Maximum A-posteriori Probability (MAP) detection or Maximum Likelihood (ML) detection, was proven to be NP-hard [26, 27, 28]. Low-complexity uplink detection for MIMO-NOMA is hence desirable [20, 22, 21]. A low-complexity iterative Linear Minimum Mean Square Error (LMMSE) detection has been proposed to approach the sum capacity of the MIMO-NOMA system, but the complexity of LMMSE detection is still too high because it requires a large matrix inversion. To avoid the matrix inversion, a graph-based detection called the Message Passing Algorithm (MPA) is applied; MPAs are encountered in many computer science and engineering problems such as signal processing, linear programming, and social networks [30, 31, 32, 33, 34, 35]. There are different types of MPAs for MUD, such as belief propagation on Gaussian graphical models (Ga-BP) and Gaussian belief propagation (GaBP) algorithms [36, 40, 37, 38, 39], and BP on "non-Gaussian" probabilistic models [41, 42]. These BP algorithms exchange extrinsic information in the process. Note the different meanings of "Gaussian" in GaBP and Ga-BP: "Gaussian" in GaBP means that the messages in BP are Gaussian (represented by mean and variance), while "Gaussian" in Ga-BP means that the system can be described by a Gaussian probabilistic model. Since Gaussian graphical problems can be efficiently solved by GaBP, Ga-BP is presumed to be GaBP by default. Separately, another class of MPA for MUD has been studied: Gaussian Message Passing (GMP), which exchanges extrinsic and/or a-posteriori information. In addition, a scale-and-add GMP (SA-GMP) has been proposed to improve the MSE performance and convergence properties via spectral radius minimization.
Apart from that, an advanced Approximate Message Passing (AMP) algorithm, which is asymptotically optimal under many practical scenarios, is proposed in [45, 44, 47, 46]. AMP is simple to implement in practice, i.e., its per-iteration complexity is low. However, AMP may become unreliable in ill-conditioned channels [48, 49].
For factor graphs with a tree structure, the means and variances of the MPA converge to the true marginal means if LLRs are used as messages and the sum-product algorithm is applied [31, 32]. However, if the graph has cycles (loops), the MPA may fail to converge. In general, on loopy graphs, different types of MPAs may have different convergence behaviors, such as different convergence conditions, speeds, and performances. To the best of our knowledge, most previous works on the MPA focus on the convergence of GaBP algorithms for Gaussian graphs [36, 37, 38, 39]. In [40, 41], the convergence of message passing algorithms is analyzed. However, the messages considered there are for vector variables, which require high-complexity matrix operations during message updating, and the existing result applies only to Code Division Multiple Access (CDMA) MIMO with binary channels. For underloaded massive MIMO-NOMA, the convergence of underloaded GMP and underloaded SA-GMP has been analyzed in earlier work; these algorithms are distinct from the overloaded GMP and overloaded SA-GMP in this paper. The convergence of GMP for the overloaded massive MIMO-NOMA, however, remains largely open. In Fig. 1, we give an overview of the detection algorithms for the overloaded massive MIMO-NOMA system.
In this paper, we first analyze the convergence of GMP for the overloaded massive MIMO-NOMA system. We find that the convergence of GMP depends on a spectral radius: as the iterations proceed, the estimate of GMP approaches a fixed point with a squared error that decays at a rate set by this spectral radius. Specifically, the GMP converges when the spectral radius is less than 1, and diverges otherwise. Furthermore, we optimize the spectral radius with linear modifications (e.g., scale and add), and obtain a new low-complexity fast-convergence MUD for the coded overloaded massive MIMO-NOMA system. The contributions of this paper are summarized below.
We prove that the variances of GMP always converge to the MSE of LMMSE detection. This provides an alternative way to estimate the MSE of the LMMSE detector.
We prove that the convergence of GMP depends on a spectral radius.
We derive two sufficient conditions under which the means of GMP converge in the overloaded case; the resulting fixed point has a higher MSE than that of the LMMSE detector.
We propose a new fast-convergence detector called scale-and-add GMP (SA-GMP), which converges to the LMMSE detection and has a faster convergence speed than GMP for any overloaded system load.
I-C Comparisons with literature
Here we compare our contributions with the literature:
I-C1 Relationship with AMP and damping AMP
The asymptotic variance behavior of AMP is the same as that of GMP [50, 51], because for large MIMO systems the approximation in AMP is accurate, as guaranteed by the Law of Large Numbers (LLN) [44, 46]. However, their means behave differently because GMP exchanges a-posteriori probability (APP) information at the sum node, while AMP is derived from extrinsic-information-based message passing. As a result, the mean of GMP may have a worse MSE than AMP/GaBP due to the correlation problem in APP processing. However, the proposed optimized SA-GMP has better MSE and convergence properties than GMP and AMP/GaBP. Unlike AMP, each step of which can be rigorously described by state evolution (SE), SA-GMP only requires its convergence properties (e.g., the final fixed point, convergence condition and speed) to be optimized. Furthermore, although damping schemes [52, 53] have been used to prevent the divergence of AMP, there are fundamental differences between the damping AMP and SA-GMP. First, in the damping AMP, the damping operation is employed in every message update step, including the mean update and the variance update. In SA-GMP, by contrast, the scale-and-add (SA) modification is applied only in the mean message update at the sum node, while the variable-node message update and the variance update remain unchanged from the GMP. Second, in the damping AMP, there is no closed-form solution for the optimal damping parameter, which can only be determined empirically or by exhaustive search. In contrast, the optimal relaxation parameter in SA-GMP is derived in closed form by minimizing the spectral radius (see Corollary 3).
I-C2 Differences from the underloaded case
Because the number of users exceeds the number of antennas, the expression of the GMP differs from that of the underloaded case studied in earlier work. Hence, the underloaded GMP and underloaded SA-GMP perform poorly when applied in the overloaded scenario. Furthermore, the convergence results of GMP for the overloaded massive MIMO-NOMA also differ from those of the underloaded case: i) the convergence condition is different; ii) if the GMP for the overloaded massive MIMO-NOMA converges, it converges to a higher (rather than the same) MSE than that of the LMMSE detector.
This paper is organized as follows. In Section II, the overloaded massive MIMO-NOMA and LMMSE estimator are introduced. The GMP is elaborated in Section III. Section IV presents the proposed fast-convergence SA-GMP. Numerical results are shown in Section V, and we conclude our work in Section VI.
II System Model and LMMSE Estimator
Fig. 2 shows a coded overloaded massive MIMO-NOMA system, in which autonomous single-antenna terminals simultaneously communicate with the BS, which has an array of antennas [4, 3]. Both the number of users and the number of antennas are large, and the number of users exceeds the number of antennas. At each time instant, the received vector y follows the linear model (1): it is the fading channel matrix applied to the message vector sent from the users, plus a Gaussian noise vector. We assume that the BS knows the channel matrix, and only the real system is considered because the complex case can be easily extended from the real case (Footnote 1: A complex system model can be converted to a corresponding real-valued system model (1) by stacking the real and imaginary parts of the signal, noise, and received vectors and expanding the channel matrix accordingly.). Given an estimate of the transmitted vector, the mean square error (MSE) of the estimation is defined in terms of the expected squared error between the estimate and the transmitted vector.
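The real-valued conversion mentioned in the footnote can be sketched as follows. This is only an illustration of the standard stacking construction; the function name `complex_to_real`, the dimensions, and the random data are assumptions made here, and noise is omitted so the equivalence can be checked exactly.

```python
import numpy as np

def complex_to_real(H_c, y_c):
    """Stack real and imaginary parts so that y_c = H_c x_c becomes a real system."""
    # Real-valued channel: [[Re H, -Im H], [Im H, Re H]]
    H_r = np.block([[H_c.real, -H_c.imag],
                    [H_c.imag,  H_c.real]])
    # Real-valued received vector: [Re y; Im y]
    y_r = np.concatenate([y_c.real, y_c.imag])
    return H_r, y_r

rng = np.random.default_rng(0)
n_rx, n_tx = 3, 5  # hypothetical sizes (more users than antennas)
H_c = rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
x_c = rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx)
y_c = H_c @ x_c  # noiseless complex model, for checking only

H_r, y_r = complex_to_real(H_c, y_c)
x_r = np.concatenate([x_c.real, x_c.imag])   # stacked real-valued signal
residual = np.max(np.abs(H_r @ x_r - y_r))   # should be at rounding level
```

The real system has twice as many rows and columns as the complex one, so any detector written for the real model can be reused unchanged.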
As illustrated in Fig. 2, at each user, an information sequence is encoded by a common error-correcting code (used by all users) into a coded sequence, which is interleaved by an independent random interleaver of the same length and then modulated by a Gaussian modulator to obtain a transmission vector. In this paper, we assume that the transmitted symbols of each user are independent and identically Gaussian distributed. The source variance denotes the power constraint or the large-scale fading coefficient of each user. To simplify the analysis, we assume the same source variance for all users.
Regarding the Gaussian transmission assumption, discrete modulated signals are generally used in real communication systems. However, according to Shannon theory [57, 58], the capacity of a Gaussian channel is achieved by a Gaussian input. Therefore, the independent Gaussian source assumption is widely used in the design of communication networks [59, 61, 60]. In practice, capacity-achieving superposition coded modulation (SCM), the quantization and mapping method, Gallager mapping, etc. may be used to generate Gaussian-like signals [64, 62, 63].
In this paper, the transmissions are assumed to be Gaussian sources, but this does not mean that the proposed SA-GMP only works for Gaussian transmissions. The Gaussian source assumption is employed to prove that the SA-GMP converges to the optimal LMMSE detection. For non-Gaussian transmissions, the SA-GMP also works well, but its optimality is hard to prove. This is similar to LMMSE detection, which is optimal for Gaussian sources but also works well in many non-Gaussian cases. For example, it has been shown that the iterative LMMSE detector with a matched multiuser code is sum-capacity achieving for the MIMO-NOMA system. Furthermore, our simulation results in Fig. 11 show that the proposed method also works well for the practical discrete MIMO-NOMA system.
II-B Iterative Receiver
We adopt a joint detection-decoding iterative receiver, which is widely used in CDMA systems and Inter-Symbol Interference (ISI) channels, for the overloaded massive MIMO-NOMA system. The receiver consists of a MIMO multi-user detector (MUD), called the elementary signal estimator (ESE), and a bank of single-user decoders; the messages exchanged between them are the input and output estimates of the transmitted symbols at the ESE and the decoders. As illustrated in Fig. 2, at the BS, the received signals and the a-priori messages are passed to the ESE, which estimates the MUD-extrinsic message for each decoder; this message is then demodulated and deinterleaved. The corresponding single-user decoder calculates the decoder-extrinsic message from its input. This message is then interleaved and re-modulated to obtain a new a-priori message for the ESE. The process is repeated until the maximum number of iterations is reached. If the messages are all Gaussian distributed, they can be replaced by their means and variances.
In this paper, we consider the low-complexity GMP as the ESE for the iterative receiver. Before discussing the GMP, we first present some results on the LMMSE estimator. These results will be used to support the convergence analysis and performance comparison for the GMP in the rest of this paper.
II-C LMMSE Estimator
In the massive MIMO-NOMA, the complexity of the optimal MAP estimator is too high, and the LMMSE estimator is an alternative low-complexity ESE. For the massive MIMO-NOMA system, the LMMSE estimator is an optimal linear detector when the sources are Gaussian distributed, and is sum-capacity achieving with a matched multiuser code. The LMMSE detector (2) combines the expectation and variance of the prior message of each user with the received vector and the channel matrix.
From (2), the MSE of the LMMSE detector is obtained from the diagonal of its a-posteriori covariance matrix.
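As a hedged illustration, the textbook form of the LMMSE estimator for a real linear Gaussian model with independent per-user priors can be sketched as below. The function name `lmmse`, the argument names, and the toy dimensions are assumptions for this sketch and are not claimed to match the notation of (2).

```python
import numpy as np

def lmmse(H, y, x_prior_mean, x_prior_var, noise_var):
    """Standard LMMSE estimate for y = H x + n with independent Gaussian priors."""
    prior_precision = np.diag(1.0 / x_prior_var)
    # A-posteriori covariance: (H^T H / noise_var + V_prior^{-1})^{-1}
    post_cov = np.linalg.inv(H.T @ H / noise_var + prior_precision)
    # A-posteriori mean: combines the prior information and the matched-filter output
    post_mean = post_cov @ (x_prior_mean / x_prior_var + H.T @ y / noise_var)
    # Per-user MSE: average of the diagonal of the a-posteriori covariance
    mse = np.trace(post_cov) / H.shape[1]
    return post_mean, post_cov, mse

rng = np.random.default_rng(1)
n_antennas, n_users, noise_var = 8, 12, 0.1  # hypothetical overloaded setting
H = rng.standard_normal((n_antennas, n_users))
x = rng.standard_normal(n_users)
y = H @ x + np.sqrt(noise_var) * rng.standard_normal(n_antennas)

x_hat, V_hat, mse = lmmse(H, y, np.zeros(n_users), np.ones(n_users), noise_var)
```

The explicit inverse of a matrix whose size equals the number of users is the step that becomes expensive in large systems, which is what motivates message-passing alternatives.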
II-C1 MSE of LMMSE Estimator
The following proposition is obtained from random matrix theory [71, 43]. (Footnote 2: We first consider a normalized version of the system model. According to known results, Eqn. (5) gives an asymptotic MSE estimate for the LMMSE detection of the normalized model, with the noise variance rescaled accordingly. The original model is then obtained by scaling both sides of the equation. Since scaling (multiplying or dividing) does not change the MSE of LMMSE detection, Eqn. (5) also gives an asymptotic MSE estimate for the LMMSE detection of the original model.)
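Proposition 1: In the massive MIMO-NOMA, where the system load (the ratio of the number of users to the number of antennas) is fixed and greater than 1 and the system is large, the MSE (or a-posteriori variance) of the LMMSE detector is given in closed form by Eqn. (5), as a function of the system load and the signal-to-noise ratio.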
II-C2 Comparison Between Coded and Uncoded MIMO-NOMA
In the uncoded MIMO-NOMA, the input variance is fixed during the iteration. In this case, for an overloaded system, the MSE of the LMMSE detector is essentially independent of the Gaussian noise and remains poor. The reason is that in the overloaded regime the inter-user interference limits the system performance. Hence, we introduce the error-correcting code to mitigate the errors caused by the inter-user interference in the overloaded massive MIMO-NOMA. This can be explained from the perspective of information theory. Although, under the power constraint, the sum capacity of the overloaded MU-MIMO system increases with the number of users, the average user rate decreases to zero as the system becomes increasingly overloaded. Error-correcting codes are employed to decrease the user rate to meet the capacity requirement of the overloaded massive MIMO-NOMA system.
In the coded massive MIMO-NOMA, the input variance decreases as the number of iterations with the decoders increases. As a result, the system performance is improved through the joint iteration between the LMMSE detector and the single-user decoders. We use a simple repetition code (an error-correcting code) as an example, in which each user transmits every symbol several times. This is equivalent to multiplying the number of antennas at the BS by the repetition factor. If the repetition factor exceeds the system load, the overloaded system degenerates into an underloaded system.
II-C3 Complexity of LMMSE Estimator
The complexity of the LMMSE estimator is dominated by a large matrix inversion and grows with the number of iterations. When the numbers of users and antennas are large, the complexity of the LMMSE estimator is too high to be practical. Hence, designing a low-complexity detector with little or no performance loss for the overloaded massive MIMO-NOMA is important. In this paper, we consider the low-complexity GMP for this purpose.
III Gaussian Message Passing Detector and Convergence Analysis
Fig. 3 shows a bipartite factor graph of the MIMO-NOMA system discussed in this paper. The GaBP [40, 33] and its asymptotic version AMP [45, 44] are the state-of-the-art MPAs for MIMO detection. However, they have convergence difficulties under certain system loads. In this paper, we present the GMP and analyze its convergence for the overloaded massive MIMO-NOMA. GMP is similar to GaBP in that Gaussian messages are passed on the edges of the Gaussian factor graph. However, while GaBP passes extrinsic messages on the whole graph, GMP updates a-posteriori messages at the sum nodes. In the rest of this paper, we replace the ESE in Fig. 2 with the GMP, and we drop the subscript for simplicity.
III-A Sum-Node Message Update of GMP
Each sum node (SN) can be seen as a multiple-access process, and its message is updated by (6). The update is indexed by the iteration number and uses the corresponding entry of the received vector y, the entries of the channel matrix, and the variance of the Gaussian noise. On every edge of the factor graph, one mean and one variance are passed from the variable node (VN) to the SN, and one mean and one variance are passed from the SN to the VN. These messages are initialized before the first iteration. Different from GaBP, the message update at the SN outputs a-posteriori messages to its connected edges.
III-B Variable-Node Message Update of GMP
Each variable node (VN) is a broadcast process and is updated by (7), which combines the messages arriving from the SNs with the estimated mean and variance of the symbol provided by the decoder. The message update at the VN outputs extrinsic messages, which is the same as that in GaBP.
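The difference between an a-posteriori and an extrinsic output can be illustrated with a generic scalar example. This is a sketch of the standard rule for fusing independent Gaussian estimates of one variable, not the paper's update (7); the helper names `combine_gaussian` and `extrinsic_outputs` and the toy numbers are assumptions.

```python
import numpy as np

def combine_gaussian(means, variances):
    """Fuse independent Gaussian estimates of one scalar (precision-weighted average)."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * np.asarray(means, dtype=float)).sum()
    return mean, var

def extrinsic_outputs(means, variances):
    """For each edge, fuse all incoming messages except the one from that edge."""
    out = []
    for k in range(len(means)):
        keep = [i for i in range(len(means)) if i != k]
        out.append(combine_gaussian([means[i] for i in keep],
                                    [variances[i] for i in keep]))
    return out

incoming_means = [1.0, 2.0, 4.0]  # toy messages arriving at one variable node
incoming_vars = [1.0, 2.0, 4.0]

# A-posteriori output: uses every incoming message
app_mean, app_var = combine_gaussian(incoming_means, incoming_vars)
# Extrinsic outputs: one (mean, variance) pair per edge, excluding that edge's input
ext = extrinsic_outputs(incoming_means, incoming_vars)
```

Each extrinsic output has a larger variance than the a-posteriori one because it discards the message that came from the edge it is sent back on, which is what prevents a node from being fed its own information.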
III-C Extrinsic Message Output and Decision of GMP
The GMP outputs the extrinsic messages:
When the MSE of GMP meets the requirement or the number of iterations reaches the limit, we output the a-posteriori estimation and its MSE:
Remark 2: In the sum-node message update, GMP passes a-posteriori information, rather than the extrinsic information used in GaBP, from the SNs to the VNs. We show in Section III-F that the variances of the a-posteriori message update and the extrinsic message update converge to exactly the same value, but their means behave differently. It is hard to analyze the mean convergence of GaBP because of its complicated structure. However, the convergence conditions of the mean of GMP can be derived. The reason is that, when a-posteriori message update is used at the SN, the GMP converges to a classical iterative algorithm, whose convergence depends on a spectral radius. Based on these points, this paper proposes an optimized SA-GMP that minimizes the spectral radius, thus obtaining better MSEs and convergence properties (e.g., the fixed point, convergence condition and speed) than GMP and AMP/GaBP (see Figs. 7-9).
Another point to note is that, in the underloaded GMP and underloaded SA-GMP schemes considered in earlier work, the VNs output a-posteriori messages and the SNs output extrinsic messages. However, the a-posteriori message update at the VNs leads to a higher performance loss when the MIMO-NOMA system is overloaded (i.e., the number of users is larger than the number of antennas), because the a-priori information (on each edge) at the variable node accounts for a larger proportion of the a-posteriori information than that at the sum node, which in turn aggravates the correlation problem in GMP. Therefore, in the proposed SA-GMP, we let the VNs output extrinsic messages and the SNs output a-posteriori messages to mitigate the correlation problem.
Here we provide an intuition for the a-posteriori and extrinsic message updates. Generally, extrinsic message processing is used to cut off the loops in a loopy factor graph. However, extrinsic message update is not necessary for all the processors in a loop. For example, in this paper, a bipartite loopy factor graph only needs extrinsic update at the VNs and a-posteriori update at the SNs. As a result, the loops in the bipartite factor graph are cut off by the partial extrinsic update at the VNs. In detail, the variance of GMP converges to the MSE of LMMSE because the VN output (extrinsic update at the VNs) and the SN output (a-posteriori update at the SNs) are independent of each other (Footnote 3: Similarly, on a bipartite loopy factor graph, the underloaded GMP only updates extrinsic messages at the SNs and a-posteriori messages at the VNs; the VN output (a-posteriori update at the VNs) and the SN output (extrinsic update at the SNs) are again independent of each other.), which is the same as the case in which extrinsic update is used at both the VNs and the SNs. Apart from that, in the mean update of GMP, the partial extrinsic update introduces an additional term in (34), which greatly reduces the spectral radius so that the GMP can converge in many cases. These are the reasons why the partial extrinsic update in each loop works for the GMP. However, if we use a-posteriori update at both the VNs and the SNs, the two outputs in the variance update are no longer independent of each other, as they share a common term. Hence, the variance of GMP will not converge to the MSE of LMMSE. In this case, the additional term in the mean update also disappears, which greatly increases the spectral radius and makes the GMP diverge in most cases. These are the reasons why a full a-posteriori update fails to work for loopy factor graphs.
III-D GMP in Matrix Form
Collecting the per-edge means and variances into matrices and vectors, Algorithm 1 shows the detailed process of the GMP. Message update (6) is rewritten as step 5, and message update (7) is rewritten as step 7; step 10 uses the same matrix notation.
III-E Complexity of GMP
As the variance updates in (6)-(9) are independent of the received y, they can be pre-computed before the iterative detection. Therefore, in each iteration, the GMP needs only multiplications and additions, with no matrix inversion. The complexity of GMP is thus low, and it scales with the number of inner iterations at the GMP and the number of outer iterations between the ESE and the decoders.
III-F Variance Convergence of GMP
Proposition 2: In the massive MIMO-NOMA, where the system load is fixed and greater than 1 and the system is large, the a-posteriori variances of GMP converge to a fixed value that depends on the system load and the signal-to-noise ratio.
See APPENDIX A.
It should be noted that, as the user decoders are included at the receiver, the a-priori variance fed back from the decoders can be very small and thus cannot be neglected in Proposition 2. Comparing Proposition 1 with Proposition 2, it is easy to verify that the two limits coincide, which means that the output extrinsic variances of the GMP and the LMMSE detector are the same. Therefore, we obtain the following theorem.
Theorem 1: In the massive MIMO-NOMA, where the system load is fixed and greater than 1 and the system is large, the variances of GMP converge to that of the LMMSE detector.
From (6), the sum-node variance also converges to a certain value.
Remark 3: The original GMP uses slightly different quantities in the sum-node message update. However, when the system is large, the difference is negligible. Therefore, the original GMP has the same variance convergence results. In addition, the above analysis provides an alternative way to estimate the MSE of the LMMSE detector.
III-G Mean Convergence of GMP
Previous work shows that, in underloaded massive MIMO-NOMA, the mean of the underloaded GMP converges to that of the LMMSE multi-user detector under a sufficient condition, and proposes an underloaded SA-GMP whose mean and variance always converge to those of the LMMSE multi-user detector with a faster convergence speed. However, the underloaded GMP and underloaded SA-GMP perform poorly in the overloaded case, and the mean convergence analysis is different due to the intractable interference between the large number of users.
The following theorem gives two sufficient conditions for the mean convergence of the overloaded GMP.
III-G1 Classical Iterative Algorithm
We first introduce the classical iterative algorithm and its convergence proposition, which will be used for the mean convergence analysis of the GMP. The iterative algorithm has the form $\mathbf{x}(t) = \mathbf{B}\,\mathbf{x}(t-1) + \mathbf{c}$, where neither the iteration matrix $\mathbf{B}$ nor the vector $\mathbf{c}$ depends upon the iteration number $t$.
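A small numerical sketch of this classical iteration is given below. It relies on the standard result that a stationary linear iteration converges from any starting point exactly when the spectral radius of the iteration matrix is below 1, in which case the limit solves (I - B) x = c. The helper names, the matrix size, and the target radius 0.9 are illustrative assumptions.

```python
import numpy as np

def spectral_radius(B):
    """Largest eigenvalue magnitude of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(B)))

def iterate(B, c, n_iter, x0=None):
    """Run x <- B x + c for n_iter steps, starting from x0 (zeros by default)."""
    x = np.zeros_like(c) if x0 is None else x0
    for _ in range(n_iter):
        x = B @ x + c
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
B = 0.9 * A / spectral_radius(A)  # rescaled so that the spectral radius is 0.9
c = rng.standard_normal(6)

x_fixed = np.linalg.solve(np.eye(6) - B, c)  # fixed point of the iteration
x_500 = iterate(B, c, 500)
gap = np.linalg.norm(x_500 - x_fixed)        # distance to the fixed point
```

Rescaling `B` so that its spectral radius exceeds 1 makes the same loop diverge, which mirrors the convergence/divergence dichotomy used in the analysis of the GMP means.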
III-G2 Mean Convergence of GMP
The following Lemma shows that the mean of GMP converges to the classical iterative algorithm.
Lemma 1: In the overloaded massive MIMO-NOMA, the sum-node messages of GMP converge to an iterative algorithm of the classical form, given in (14).
See APPENDIX B.
Based on Lemma 1 and Proposition 3, we have the following theorem.
Theorem 2: In the overloaded massive MIMO-NOMA, where the system load is fixed and greater than 1 and the system is large, the GMP converges to the fixed point given in (15) if any of the following conditions holds.
1. The matrix is strictly or irreducibly diagonally dominant,
See APPENDIX C.
Comparing (2) with (15), we can see that the GMP would converge to the LMMSE detection only if the parameters of the two expressions coincided. However, from (9) or Proposition 2, we can see that they do not. Therefore, unlike the underloaded case, in which the GMP converges to the LMMSE detection whenever it is convergent, the GMP in the overloaded massive MIMO-NOMA does not converge to the LMMSE detection even if it is convergent.
III-G3 Spectral Radius and Convergence Point
As the numbers of users and antennas are large, and using (12), random matrix theory gives
Then, we have the following corollary based on Theorem 2.
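Corollary 1: In the overloaded massive MIMO-NOMA, where the system load is fixed and greater than 1 and the system is large, the spectral radius of the GMP iteration is given in closed form. If this spectral radius is less than 1, the GMP converges to a fixed point whose parameters are given in Appendix A.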
According to Theorem 2 and Corollary 1, the convergence condition of the GMP in the overloaded massive MIMO-NOMA is also different from that of the underloaded case.
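Let the mean deviation vector be the difference between the mean estimate at a given iteration and the fixed point. From (14), the deviation at each iteration equals the iteration matrix applied to the deviation at the previous iteration.
Therefore, the means converge to the fixed point at an exponential rate governed by the spectral radius: the smaller the spectral radius, the faster the convergence.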
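The link between spectral radius and convergence speed can be made concrete with a rough iteration-count estimate. The sketch below assumes a symmetric iteration matrix, for which the deviation norm shrinks by at least the spectral radius at every step; the helper name `iterations_needed`, the eigenvalues, and the tolerance are illustrative assumptions.

```python
import math
import numpy as np

def iterations_needed(rho, tol):
    """Smallest t with rho**t <= tol, valid for 0 < rho < 1."""
    return math.ceil(math.log(tol) / math.log(rho))

rng = np.random.default_rng(3)
# Build a symmetric iteration matrix with prescribed eigenvalues (radius 0.8)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigs = np.array([0.8, -0.5, 0.3, 0.1, -0.05])
B = Q @ np.diag(eigs) @ Q.T

e = rng.standard_normal(5)        # initial mean deviation vector
e0_norm = np.linalg.norm(e)
t_needed = iterations_needed(0.8, 1e-6)
for _ in range(t_needed):
    e = B @ e                     # deviation recursion e(t) = B e(t-1)
ratio = np.linalg.norm(e) / e0_norm  # relative deviation after t_needed steps
```

With a radius of 0.5 instead of 0.8, the same tolerance is reached in about a third as many iterations, which is the practical motivation for minimizing the spectral radius.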
III-G4 Comparison with LMMSE
Corollary 2: In the overloaded massive MIMO-NOMA, even if the GMP converges, it converges to a value with a worse MSE than that of the LMMSE detection.
See APPENDIX D.
Corollary 2 shows that the GMP performs worse than the LMMSE detection even if it converges.
IV A New Fast-Convergence SA-GMP that Approaches LMMSE Performance
As shown in Section III, the GMP does not converge to the optimal LMMSE detection and has a low convergence speed. The main reason is that the spectral radius does not achieve its minimum value. Therefore, we propose a new fast-convergence scale-and-add GMP (SA-GMP). As shown in Fig. 4, the SA-GMP is obtained by modifying the mean updates of GMP with linear operators, namely: 1) scaling the received signal and the channel matrix by a factor determined by a relaxation parameter; 2) adding a new term to the mean message update at each SN. However, we keep the variance output at the VNs the same as that of the GMP, because it always converges to the variance of the optimal LMMSE detection. By doing so, we can optimize the relaxation parameter to minimize the spectral radius.
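The idea of tuning a relaxation parameter to minimize a spectral radius has a classical analogue, sketched below for intuition only. The example uses the relaxed (Richardson) iteration with iteration matrix I - wA for a symmetric positive definite A, whose radius is minimized at w = 2 / (smallest eigenvalue + largest eigenvalue). This is not the SA-GMP update or the paper's Corollary 3; the matrix, the helper name `relaxed_radius`, and the grid are assumptions.

```python
import numpy as np

def relaxed_radius(A, w):
    """Spectral radius of the relaxed iteration matrix I - w*A."""
    return np.max(np.abs(np.linalg.eigvals(np.eye(A.shape[0]) - w * A)))

rng = np.random.default_rng(4)
G = rng.standard_normal((6, 10))
A = G @ G.T / 10 + 0.1 * np.eye(6)  # symmetric positive definite test matrix

lam = np.linalg.eigvalsh(A)         # eigenvalues in ascending order
w_opt = 2.0 / (lam[0] + lam[-1])    # classical optimal relaxation parameter
rho_opt = relaxed_radius(A, w_opt)

# Compare against a grid of other relaxation parameters
rho_grid = [relaxed_radius(A, w) for w in np.linspace(0.01, 2.0 / lam[-1], 200)]
```

No value on the grid gives a smaller radius than `w_opt`, and the minimum equals (largest - smallest) / (largest + smallest) eigenvalue, so a closed-form parameter choice removes the need for an empirical search.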
IV-A Sum-Node Message Update of SA-GMP
IV-B Variable-Node Message Update of SA-GMP
IV-C Extrinsic Message Output and Decision of SA-GMP
The extrinsic estimation