I Introduction
Multiple-input multiple-output (MIMO) technology exploits the spatial dimension by adding more antennas [1] to increase spectral efficiency and network capacity. However, conventional MIMO configurations fall short of providing the required spatial diversity in the upcoming fifth generation (5G) mobile communication standard, which promises to connect billions of devices and achieve several gigabit-per-second data rates. Towards this end, massive MIMO has been introduced [2], in which a few hundred antennas serve tens of terminals over time and frequency resources.
Despite the extensive work on massive MIMO, large MIMO will also play an important role in the future. Large MIMO systems use tens of antennas in communication terminals, and can afford a large number of antennas on both the transmitter and the receiver sides [3], such as for example , , , and configurations. Large point-to-point MIMO wireless links are of specific interest in 5G for high-speed wireless backhaul connectivity between base stations (BSs). Also, multipoint-to-point large multiuser MIMO can be used in 5G in the uplink when the number of served transmitting users is less than, but comparable to, the number of BS antennas. Moreover, large MIMO can also be considered for point-to-multipoint downlink multiuser MIMO (MU-MIMO) [4], whether in enhanced versions of the current wireless communications standards or in 5G, where users sharing the same physical resource blocks are chosen based on the degree of orthogonality of their cascaded precoder and channel.
After being traditionally driven by diversity-multiplexing tradeoffs, recent wireless communication system designs have been driven by two factors: system performance in terms of throughput and bit error rate, and system complexity in terms of processing latency and computational complexity. The performance of MIMO systems is largely determined by the detection scheme at the receiver side; various schemes provide different performance-complexity tradeoffs [5]. Linear detectors, such as zero forcing (ZF) and minimum mean square error (MMSE), are the least complex, but also the farthest from optimal. On the other hand, maximum likelihood (ML) detectors are optimal but the most computationally intensive, with complexity that grows exponentially with the number of antennas. Several suboptimal detectors fill the spectrum in between, including sphere decoders (SD) and their variants [6, 7, 8, 9]. Moreover, in addition to conventional hard-output (HO) detectors, soft-output (SO) detectors play an important role in near-capacity achieving systems, but are more complex because they require processing significantly more signal combinations to generate reliability information.
In massive MIMO systems, linear detectors achieve near-optimal performance by exploiting the channel hardening effect [10], and approximate matrix inversions via Neumann series approximations [11] are used for practical implementations. However, large MIMO systems do not have very large receive-to-transmit antenna ratios. Hence, they cannot achieve the performance gains of asymmetric massive MIMO systems, and they do not allow for similar practical implementations, since Neumann series expansions fail to converge. For large MIMO systems, the detection schemes in the literature are grouped into several areas: detection based on local search [12, 13]; detection based on metaheuristics [14, 15]; detection via message passing on graphical models [16, 17]; lattice reduction (LR) aided detection [18, 19]; and detection using Monte Carlo sampling [20]. However, for these schemes to achieve a near-ML performance with high orders of antennas and modulation constellations, the entailed complexity would be prohibitive.
A popular family of MIMO detectors that achieves good performance-complexity tradeoffs employs nonlinear subset-stream detection. The nulling-and-cancellation (N/C) detector [21] is a low-complexity member of this family; it consists of linear nulling followed by successive interference cancellation (SIC). The chase detector (CD) [22] is a more complex member of this family; it first creates a list of candidate decision vectors, and then chooses the best candidate from this list as a final decision. Chase detection is considered a special case of list detection. However, it differs from list sphere decoding (LSD) [23], for example, in the way the list is generated and administered; in LSD, list admission is based on proximity to an initial solution, while in CD, list generation is deterministic, and is done by spanning all possible subtree symbols emanating from the root symbol in a specific layer of interest. Furthermore, other popular subset-stream detectors exist (e.g., [24, 25, 26]) that decompose the channel matrix into lower-order subchannels to reduce the number of jointly detected streams.
All aforementioned subset-stream detectors make use of QR decomposition (QRD). However, the SO subspace detector (SSD) [27] transforms the channel matrix via a punctured QRD, which we refer to in this paper as WR decomposition (WRD). In [28, 29, 30], WRD-based SSD is generalized to allow for joint detection of arbitrary-sized subsets of decoupled streams, and efficient implementation methods are presented. The QRD-based version of this detector is called the layered orthogonal lattice detector (LORD) [31, 32], and both are special cases of CD. To the best of our knowledge, the use of punctured QRD in MIMO detectors has not been studied analytically in the literature, and its applicability to large MIMO systems has not been addressed.
The contributions of this paper are summarized as follows:

We present a family of WRD-based detectors that build on popular QRD-based detectors. In particular, we propose a punctured ML (PML) detector, a punctured N/C (PN/C) detector, a punctured CD (PCD), as well as a hard-output subspace detector.

We analyze mathematically the bit error rate (BER) performance of the proposed HO detectors. First, the diversity gain is characterized and used to show that channel matrix puncturing does not negatively affect the diversity gain in HO detection. Second, the performance of these detectors is studied via a probabilistic BER characterization.

We extend the study to several variations of SO detection schemes, and show that significant performance gains can be achieved with channel puncturing.

We propose efficient architectures and analyze the computational complexity of the proposed detectors. We show that the computational savings are much more pronounced with large MIMO dimensions.

We study the performance of the proposed detectors in the context of large MIMO with high-order modulations, and in the presence of spatial channel correlation. We show that the performance of these schemes scales up efficiently with high orders, and that they are superior to their QRD-based counterparts in the presence of channel correlation.
The remainder of the paper is organized as follows. The system model and basic reference detectors are presented in Sec. II. The proposed WRD-based ML detector, N/C detector, CD, and SSD detection algorithms are presented in Sec. III. The achievable diversity gains of these detectors are derived in Sec. IV, followed by a probabilistic BER characterization that describes the behaviour of the proposed approaches in Sec. V. The SO versions of the detectors are then proposed in Sec. VI, and an efficient architecture is proposed in Sec. VII alongside a complexity study. Finally, simulation results are presented in Sec. VIII.
Regarding notation, bold upper case, bold lower case, and lower case letters correspond to matrices, vectors, and scalars, respectively. Scalar norms, vector norms, and Frobenius norms are denoted by $|\cdot|$, $\|\cdot\|$, and $\|\cdot\|_{F}$, respectively. $\mathbb{E}[\cdot]$, $\mathrm{tr}(\cdot)$, $\Re(\cdot)$, $(\cdot)^{T}$, and $(\cdot)^{H}$ stand for the expected value, trace function, real part, transpose, and conjugate transpose, respectively. $\mathcal{N}(\cdot,\cdot)$ refers to the normal distribution, and $Q(\cdot)$ refers to the Q-function, where $Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-u^{2}/2}\,du$. is a punctured matrix with entries , and is an identity matrix of size
. Detector ML optimality is in the log-max sense.
II System Model and Reference Detectors
II-A System Model
We consider spatial multiplexing in a MIMO system with $N$ transmit antennas and $M$ receive antennas. The equivalent complex baseband input-output system relation is given by
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (1)$$
where $\mathbf{y}$ is the received complex vector, $\mathbf{H}$ is the channel matrix with entries that are assumed to be i.i.d. complex, circularly symmetric Gaussian random variables, $\mathbf{x}$ is the transmitted symbol vector, and $\mathbf{n}$ is a complex-valued circularly symmetric Gaussian random vector with zero mean and variance $\sigma^{2}$
(). Each symbol , , belongs to a normalized complex constellation (), and we have , where is the finite set of points on a dimensional lattice generated by all possible symbol vectors. For simplicity, we assume a uniform modulation constellation on all layers, and hence . The coded bit representation of a symbol is denoted by , where and for . The signal-to-noise ratio () is defined in terms of the noise variance as .
At the receiver side, and assuming perfect knowledge of the channel, QRD decomposes $\mathbf{H}$ as $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ has orthonormal columns ($\mathbf{Q}^{H}\mathbf{Q} = \mathbf{I}$), and $\mathbf{R}$ is a square upper-triangular matrix (UTM) with real and positive diagonal entries. The transformed receive symbol vector can then be equivalently expressed as
$$\tilde{\mathbf{y}} = \mathbf{Q}^{H}\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{Q}^{H}\mathbf{n}, \qquad (2)$$
where $\mathbf{Q}^{H}\mathbf{n}$ and $\mathbf{n}$ are statistically identical since $\mathbf{Q}$ is orthonormal.
An “exhaustive” log-max ML detector searches the complete lattice , computing Euclidean distance metrics, to solve for
(3) 
Note that the SD achieves exact log-max ML performance with fewer computations, by executing a tree-based search on a subset of , skipping vectors in the space whose partial distance already exceeds the current best distance.
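The exhaustive search described above can be illustrated with a minimal sketch in plain Python. The function names `squared_distance` and `ml_detect`, the QPSK constellation, and the small 2x2 channel are assumptions made for illustration only; the sketch searches the untransformed system directly, which yields the same minimizer as the QRD-transformed metric because an orthonormal transformation preserves Euclidean distances.

```python
import itertools

# QPSK normalized to unit average energy (an illustrative choice).
QPSK = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]

def squared_distance(y, H, x):
    """Return ||y - H x||^2 for received vector y, channel H, candidate x."""
    total = 0.0
    for row, y_m in zip(H, y):
        residual = y_m - sum(h * s for h, s in zip(row, x))
        total += abs(residual) ** 2
    return total

def ml_detect(y, H, constellation=QPSK):
    """Search all len(constellation)**N symbol vectors; return the closest."""
    n_tx = len(H[0])
    return min(itertools.product(constellation, repeat=n_tx),
               key=lambda x: squared_distance(y, H, x))

# Noise-free usage example with a hypothetical 2x2 channel.
H = [[1 + 1j, 0.5], [0.2j, 1 - 0.5j]]
x_true = (QPSK[0], QPSK[3])
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
x_hat = ml_detect(y, H)
```

The number of visited candidates is the constellation size raised to the number of transmit antennas, which is why this search is only practical for very small systems.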
II-B Nulling-and-Cancellation (N/C) Detector
The N/C detector [21] is used in the widely known vertical Bell Labs layered space-time (V-BLAST) architecture [33]. When combined with QRD, N/C becomes a computationally efficient procedure which is highly sensitive to layer ordering. Nulling is performed by linearly premultiplying the received vector with , which suppresses the interference from , , at the layer. This is followed by SIC (back-substitution and slicing) to suppress co-antenna interference; hence, is computed as
(4) 
for , where is the slicing operator on the constellation . In terms of error rate, N/C serves as an upper bound on the performance of the other detection schemes.
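The nulling and SIC steps can be sketched as follows, again in plain Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: `qr_decompose`, `slice_symbol`, and `nc_detect` are hypothetical names, the QRD is computed by modified Gram-Schmidt, detection proceeds from the last layer upwards without any layer reordering, and the channel and QPSK constellation are illustrative.

```python
def qr_decompose(H):
    """Thin QR by modified Gram-Schmidt. Returns Q as a list of columns, and R."""
    m, n = len(H), len(H[0])
    Q, R = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [H[i][j] for i in range(m)]
        for i in range(j):
            R[i][j] = sum(q.conjugate() * a for q, a in zip(Q[i], v))
            v = [a - R[i][j] * q for a, q in zip(v, Q[i])]
        R[j][j] = sum(abs(a) ** 2 for a in v) ** 0.5  # real, positive diagonal
        Q.append([a / R[j][j] for a in v])
    return Q, R

def slice_symbol(z, constellation):
    """Slicing operator: nearest constellation point to z."""
    return min(constellation, key=lambda s: abs(z - s))

def nc_detect(y, H, constellation):
    """Nulling (multiply by Q^H) followed by SIC (back-substitution and slicing)."""
    Q, R = qr_decompose(H)
    n = len(R)
    y_t = [sum(q.conjugate() * b for q, b in zip(Q[i], y)) for i in range(n)]
    x_hat = [0j] * n
    for i in range(n - 1, -1, -1):
        # Cancel the contribution of the already detected layers j > i.
        interference = sum(R[i][j] * x_hat[j] for j in range(i + 1, n))
        x_hat[i] = slice_symbol((y_t[i] - interference) / R[i][i], constellation)
    return x_hat

# Noise-free usage example with a hypothetical 3x3 channel and QPSK.
QPSK = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
H = [[1 + 1j, 0.5, 0.2j], [0.3, 1 - 1j, 0.4], [0.1j, 0.2, 1 + 0.5j]]
x_true = [QPSK[0], QPSK[3], QPSK[1]]
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
x_hat = nc_detect(y, H, QPSK)
```

Because each layer relies on the decisions of the layers detected before it, a wrong decision at the first detected layer propagates upwards, which is the weakness the chase detector targets.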
II-C Chase Detector (CD)
The CD [22] mitigates error propagation in SIC by populating a list of candidate symbol vectors for final decision. It first partitions , , and in (2) as
(5) 
where , , , , , , is a vector of zero-valued entries, and . Then, for each at the root layer, a candidate vector is calculated as in (4) and added to . The maximum number of candidate vectors in is , and the final HO decision vector is chosen from to be
(6) 
Note that CD differs from LSD [23] in several aspects. For example, LSD list admission depends on runtime channel conditions, which makes it nondeterministic and more complex. Also, in an SO setting, LSD does not guarantee computing all the required distance metrics.
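The deterministic list generation of the CD can be sketched as below. The sketch assumes that the root layer is the last layer, that every constellation point is tried at the root, and that the remaining layers are filled in by SIC; `chase_detect` and the helper names are hypothetical, and the channel and QPSK constellation are illustrative.

```python
def qr_decompose(H):
    """Thin QR by modified Gram-Schmidt. Returns Q as a list of columns, and R."""
    m, n = len(H), len(H[0])
    Q, R = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [H[i][j] for i in range(m)]
        for i in range(j):
            R[i][j] = sum(q.conjugate() * a for q, a in zip(Q[i], v))
            v = [a - R[i][j] * q for a, q in zip(v, Q[i])]
        R[j][j] = sum(abs(a) ** 2 for a in v) ** 0.5
        Q.append([a / R[j][j] for a in v])
    return Q, R

def slice_symbol(z, constellation):
    """Nearest constellation point to z."""
    return min(constellation, key=lambda s: abs(z - s))

def chase_detect(y, H, constellation):
    """Return (best candidate, list of (distance, candidate)) for a root at layer N."""
    Q, R = qr_decompose(H)
    n = len(R)
    y_t = [sum(q.conjugate() * b for q, b in zip(Q[i], y)) for i in range(n)]
    candidates = []
    for root in constellation:            # one candidate per root symbol
        x = [0j] * n
        x[n - 1] = root
        for i in range(n - 2, -1, -1):    # SIC on the remaining layers
            interference = sum(R[i][j] * x[j] for j in range(i + 1, n))
            x[i] = slice_symbol((y_t[i] - interference) / R[i][i], constellation)
        dist = sum(abs(y_t[i] - sum(R[i][j] * x[j] for j in range(i, n))) ** 2
                   for i in range(n))
        candidates.append((dist, x))
    best = min(candidates, key=lambda c: c[0])[1]
    return best, candidates

# Noise-free usage example with a hypothetical 3x3 channel and QPSK.
QPSK = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
H = [[1 + 1j, 0.5, 0.2j], [0.3, 1 - 1j, 0.4], [0.1j, 0.2, 1 + 0.5j]]
x_true = [QPSK[2], QPSK[0], QPSK[3]]
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
x_hat, candidates = chase_detect(y, H, QPSK)
```

The list size equals the constellation size regardless of the channel realization, which is the deterministic behaviour that distinguishes the CD from LSD.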
II-D Layered Orthogonal Lattice Detector (LORD)
Instead of executing the CD routine once, LORD repeats chase detection with different layer orderings, each time with a different layer as root, by cyclically shifting the columns of . The best output from these trials is the final solution. Each permuted at step , , is QR-decomposed into and according to (5). Let denote the output CD solution from step . Then, the final solution is , where
(7) 
Since distances are preserved under different layer orderings with QRD, the accumulated candidate vectors across different partitions form an “extended” candidate list, despite the potential overlap of lists from each partition. Therefore, the added gain with LORD compared to CD is significant.
III Detection Schemes Based on Punctured Channel Matrix
III-A Punctured QR Decomposition (WRD)
WRD transforms into a punctured UTM with , by puncturing entries between the diagonal and the last column through a matrix , such that . A brute-force approach for computing [27] involves matrix inversions, which is complex and prone to round-off error. However, an alternative approach that employs QRD followed by elementary matrix operations can be used to derive and [29].
Let be QR-decomposed such that . Obviously, and for all , hence, . Now assume the entry in row of is to be nulled, for and . We have and , from which it follows that . Hence, with , the equations
(8)  
(9)  
(10) 
when repeated for , would puncture the required entry and update the entry in row of , as well as update the column of accordingly, while
(11)  
(12)  
(13) 
would normalize in and update the nonzero entries in row of accordingly. All these operations are to be carried out for . The resultant is , and the resultant is . The transformed received symbol vector after applying can then be expressed as
(14) 
such that
(15) 
where in this case is a diagonal matrix. For example, in the special case of MIMO, is obtained from by puncturing entries :
(16) 
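The QRD-followed-by-elementary-operations route to WRD can be sketched in plain Python. This is a minimal sketch under stated assumptions and not a transcription of (8)-(13): the names `qr_decompose` and `wr_decompose` and the 4x4 test channel are hypothetical, the root layer is taken to be the last column, and the sketch builds a matrix `W` with unit-norm columns such that the product of its conjugate transpose with the channel is upper triangular with zeros between the diagonal and the last column.

```python
def qr_decompose(H):
    """Thin QR by modified Gram-Schmidt. Returns Q as a list of columns, and R."""
    m, n = len(H), len(H[0])
    Q, R = [], [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [H[i][j] for i in range(m)]
        for i in range(j):
            R[i][j] = sum(q.conjugate() * a for q, a in zip(Q[i], v))
            v = [a - R[i][j] * q for a, q in zip(v, Q[i])]
        R[j][j] = sum(abs(a) ** 2 for a in v) ** 0.5
        Q.append([a / R[j][j] for a in v])
    return Q, R

def wr_decompose(H):
    """Puncture R between the diagonal and the last column (assumed root layer)."""
    Q, R = qr_decompose(H)
    n = len(R)
    W = [col[:] for col in Q]      # columns of W, initialized to those of Q
    Rp = [row[:] for row in R]     # punctured matrix, initialized to R
    for i in range(n - 2):         # rows that hold entries to be nulled
        for j in range(i + 1, n - 1):
            c = Rp[i][j] / R[j][j]
            # w_i <- w_i - conj(c) q_j, hence row i <- row i - c * (row j of R).
            W[i] = [w - c.conjugate() * q for w, q in zip(W[i], Q[j])]
            for k in range(j, n):
                Rp[i][k] -= c * R[j][k]
            Rp[i][j] = 0j          # entry is nulled by construction
        norm = sum(abs(w) ** 2 for w in W[i]) ** 0.5
        W[i] = [w / norm for w in W[i]]    # restore unit norm of column i
        Rp[i] = [r / norm for r in Rp[i]]  # scale row i accordingly
    return W, Rp

# Usage example with a hypothetical, well-conditioned 4x4 channel.
H = [[2 + 1j, 0.5, 0.3j, 0.1],
     [0.2, 1.5 - 1j, 0.4, 0.2j],
     [0.1j, 0.3, 1.8 + 0.5j, 0.6],
     [0.4, 0.1j, 0.2, 1.2 - 0.7j]]
N = len(H)
W, R_p = wr_decompose(H)
# Compare W^H H with the punctured matrix returned by the routine.
WH_H = [[sum(W[i][m].conjugate() * H[m][k] for m in range(N)) for k in range(N)]
        for i in range(N)]
max_err = max(abs(WH_H[i][k] - R_p[i][k]) for i in range(N) for k in range(N))
```

In this construction the last column of `W` is left equal to the last column of `Q`, so it stays orthogonal to the other columns, while the remaining columns are no longer mutually orthogonal.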
Note that the column at the root layer in (layer here), remains orthogonal to all other columns. Hence, taking the expectation of over , we have:
(17) 
Therefore, although the resultant noise after puncturing is colored, WRD preserves the noise variance at the layer of interest. However, the statistical properties of the elements of get distorted under puncturing. The nonzero elements of (given i.i.d. Rayleigh fading) are known to be independent random variables with the following distributions [34, 21]:

The off-diagonal elements are circularly symmetric complex Gaussian with unit variance.

The square of the diagonal element is chi-squared distributed with degrees of freedom, and its probability density function is given by
(18)
where chi-squared comes from the sum of squares of Rayleigh distributed random variables. While the distributions of nonzero off-diagonal elements remain intact, the distributions of diagonal elements at upper layers , lose degrees of freedom from down to , as depicted in Fig. 1 for a channel matrix. This is caused by the fact that each puncturing operation at layer renders the column of dependent on one of the remaining columns, thus eliminating two degrees of freedom from the corresponding distribution of .
Fig. 1: Empirical cumulative distribution functions (CDFs) of the diagonal elements of (a) , and (b) shown in dotted lines compared to theoretical chi-squared CDFs in solid lines.
Similar to the ML detector, an “exhaustive” PML detector searches to find
(19) 
Premultiplying by , unlike , modifies Euclidean distances, hence we have
(20) 
Note that this minimum distance detector is not optimal due to the presence of colored noise.
III-B Punctured N/C Detector (PN/C)
With PN/C, we null by premultiplying by instead of , and perform SIC as
(21) 
for , where , and . Note that slicing on layers can be done in parallel since is diagonal.
III-C Punctured Chase Detector (PCD)
The PCD builds on the partition in equation (15), and performs the operations of a CD (Sec. II-C). A modified list of candidate symbol vectors is thus created. The distance of a vector is given by
(22) 
For a given , the distance in (22) is minimized as
(23)  
(24)  
(25) 
where , which is a vectorized slicing operation, and . The symbol vector is then added to , together with its distance . The final HO symbol vector is found from as the one with smallest distance.
While the PCD computes distances only to candidate symbol vectors, for a given layer ordering and channel partition, it is clear from (23) that it achieves exactly the same performance as the PML detector. In other words, there is no vector in the lattice , outside the set , that can have a smaller distance metric than that of the PCD solution. The proof goes as follows:
(26)  
(27)  
(28)  
(29) 
III-D Vector-Based Sub-Space Detector (VSSD)
The VSSD is an extension to PCD, the same way LORD is an extension to CD. The columns of are cyclically shifted, and punctured UTMs are generated as shown in Fig. 2. Each permuted at step , , is WR-decomposed into and according to (15). Let denote the PCD solution from step . The final solution is , where is defined as:
(30) 
Note that we revert to the original space of to compute the true Euclidean distance metrics in (30). The gain achieved by VSSD compared to PCD is limited, since each generates an independent space, and hence we end up taking the best output from independent trials. The VSSD is in effect the HO version of the reference SO SSD [28], and we refer to it simply as SSD in the remainder of this paper.
III-E Symbol-Based Sub-Space Detector (SSSD)
As a variation of SSD, the SSSD selects at each step , only the root symbol of the output vector as a component of the final output vector. Thus, the output vector gets assembled one symbol at a time over executions of PCD, where
(31) 
For example, in a MIMO system, we have , where is the HO solution of a PCD following the partition in Fig. 2(a). Similarly , , and , are obtained following the partitions (b), (c), and (d), respectively. Note that we can define symbol-based LORD (SLORD) in a similar manner:
(32) 
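The difference between the vector-based and symbol-based selection rules can be illustrated with a short sketch. It assumes that the per-step detector outputs are already available, expressed in the original layer order, and that layer k is the root layer at step k; the function names and the hand-made step outputs are hypothetical and serve only to show the two rules, not to indicate relative performance.

```python
def squared_distance(y, H, x):
    """True Euclidean metric ||y - H x||^2 in the original (unpermuted) space."""
    return sum(abs(y_m - sum(h * s for h, s in zip(row, x))) ** 2
               for row, y_m in zip(H, y))

def vector_based_select(y, H, step_outputs):
    """Vector-based rule: keep the whole step output with the smallest metric."""
    return min(step_outputs, key=lambda x: squared_distance(y, H, x))

def symbol_based_select(step_outputs):
    """Symbol-based rule: take only the root symbol (layer k) from step k."""
    return [x[k] for k, x in enumerate(step_outputs)]

# Hand-made example: identity channel, three layers, one output per step.
H = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, -1, 1]
step_outputs = [[1, 1, -1],    # step 0: root layer 0
                [1, -1, -1],   # step 1: root layer 1
                [-1, 1, 1]]    # step 2: root layer 2
x_vec = vector_based_select(y, H, step_outputs)
x_sym = symbol_based_select(step_outputs)
```

The vector-based rule can only return one of the step outputs, whereas the symbol-based rule can assemble a vector that none of the individual steps produced.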
IV Analysis of Achievable Diversity Gain
It is known that ML detection achieves full receive diversity , and it can be shown that the N/C and PN/C detectors, being special cases of ZF with decision feedback, can only achieve a receive diversity gain of . Moreover, it can be argued that both SSD (VSSD) and LORD also achieve full diversity, since they exploit the full channel matrix to compute distance metrics. In what follows, we study the achievable diversity gains of PML (PCD), SSSD, and SLORD.
IV-A Punctured ML Detector / Punctured Chase Detector (PML/PCD)
To capture the diversity order of PML, we derive the pairwise error probability (PEP). Suppose that is transmitted while is erroneously detected; the PEP can then be expressed as
(33)  
(34)  
(35) 
where is the probability that event occurs, and . Since consists of circularly symmetric complex Gaussian random variables, so is . It is easy to show that
(36)  
(37) 
where is introduced since is a scalar. Hence, we have
and therefore,
(38) 
where the inequality holds since (section 5.2 in [35]). Moreover, using the union bound, we have
(39) 
where , and . Finally, using the Chernoff bound, the average PEP is upper bounded as
(40) 
where since the columns of were normalized in (11).
For regular ML detection [36, 5, 37], we have
(41) 
where the expected value over the elements of results in full receive diversity , because each column of contains independent Rayleigh distributed random variables, whose square is exponentially distributed. However, with instead of in PML detection, the first columns have single diagonal elements, whose squares are chi-squared distributed with degrees of freedom, which corresponds to the sum of two exponentially distributed random variables, and hence a receive diversity order equal to . Only column of provides a diversity equal to . Therefore, by analogy with (41), the average PEP for the PML detector is
(42) 
and hence PML detection cannot achieve a receive diversity gain of order greater than 2. However, noting that PML and PCD are identical (Sec. III-C), and knowing that the regular CD achieves a receive diversity order of 2 (more on that in Sec. V-B), we conclude that channel puncturing does not reduce the diversity gain of the CD.
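The link between chi-squared degrees of freedom and diversity order used in this argument follows from a standard moment-generating-function identity, sketched here. The symbols $g$, $k$, and $c$ are illustrative and are not taken from the paper's notation: $g$ is the sum of $k$ squared magnitudes of i.i.d. unit-variance complex Gaussian variables (so that $2g$ is chi-squared with $2k$ degrees of freedom), and $c$ is a positive constant proportional to the SNR.

```latex
\begin{align*}
f_{g}(g) &= \frac{g^{k-1} e^{-g}}{(k-1)!}, \qquad g \ge 0, \\
\mathbb{E}\!\left[e^{-c g}\right]
  &= \int_{0}^{\infty} e^{-c g}\, \frac{g^{k-1} e^{-g}}{(k-1)!}\, dg
   = \frac{1}{(1+c)^{k}}
   \approx c^{-k} \quad \text{for } c \gg 1 .
\end{align*}
```

With four degrees of freedom ($k = 2$), the averaged bound therefore decays as the inverse square of the SNR, which is a diversity order of 2; a column with $k$ independent Rayleigh entries gives order $k$ by the same identity.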
IV-B Symbol-Based Sub-Space Detector (SSSD)
To capture the diversity order of SSSD, we derive a modified PEP. Without loss of generality, we assume that layer is the root layer of interest. Hence, an error occurs when is transmitted and is erroneously detected, with probability
(43)  
(44)  
(45) 
where , is computed as in Sec. IIIC, and are the Nth column of and , and and are the first columns of and , respectively. Let , and let ; we have
(46) 
Since and the columns of are circularly symmetric complex Gaussian, so is . Thus, it can be shown that
(47)  
(48)  
(49) 
Hence, continuing from (IVB), we have
(50)  
(51)  
(52)  
(53) 
Then, using union and Chernoff bounds, with (), the average PEP can be upper bounded as
(54)  
(55)  
(56) 
where the last approximation holds since the second exponential term is less than , with equality at high (). Finally, taking the expectation over all squared elements of , which are exponentially distributed, we obtain
(57) 
The denominator represents noise plus interference, hence, SSSD appears to achieve a full receive diversity gain at the layer of interest when BERs are plotted in terms of signal-to-interference-plus-noise ratio (). In the case of SLORD, following a similar derivation, the average PEP can be expressed as
(58) 
where , and consists of the first columns of . Note that can be expressed as