## I Introduction

Multiple-input multiple-output (MIMO) communication using a large number of transmit and receive antennas has become a mainstream technology in most modern wireless standards, primarily in 5G, in order to support aggressive spectral-efficiency targets. However, achieving the ideal performance promised by this technology requires the use of MIMO detectors whose complexity grows exponentially in the number of transmit antennas and polynomially in the size of the signal constellation. To support low-latency communications while providing high throughput, computationally efficient designs of MIMO detectors that do not incur substantial performance loss are needed, especially for large MIMO dimensions and dense constellations.

The topic of MIMO detection is a classical area of research, and the literature is very rich with schemes that provide various performance-complexity tradeoffs in the design space (e.g., see overviews in [2018_Xu_two_decades, 2009_larsson_MIMO_detection_methods]). The benchmark for performance in the sense of generating good soft decisions on the transmitted information bits remains the maximum likelihood (ML) detection scheme, which provides optimal performance at exponential complexity. Alternatively, the benchmarks for low-complexity are the zero-forcing (ZF) and minimum mean-square error (MMSE) schemes, which decouple the transmit layers through a linear filtering stage to generate log-likelihood ratios (LLRs) for each bit symbol either in parallel or sequentially through decision feedback. Although linear processing incurs only a marginal loss in mutual information between the transmitter and receiver, and offers fairly good performance in fast fading channels, it severely limits the diversity order of a MIMO system in slowly fading channels [2008_larsson_fixed_complexity].

Tree-search based detectors such as sphere decoding [2003_damen_on_ML_detection], list decoding [2003_hochwald_achieving_nearcapacity], and other variants map the detection problem into a search problem for the closest signal vector. They find the signal vector closest to the received vector by forming a search tree and recursively enumerating all constellation symbols across all layers, from the parent down to the leaves. Such schemes suffer from non-deterministic complexity (see scheduling solutions in [2014_sphereP2_mansour]). To simplify the search process, fixed-complexity schemes such as [barbero2008fixing, 2006_Wenk_ISCAS] limit the search steps to a set of survivor paths. While these schemes are efficient in finding the ML path, they do not necessarily find all the best competing paths that are needed to generate soft decisions.

An alternative concept is partial marginalization (PM) [2008_larsson_fixed_complexity, 2011_persson_partial_marg], which exhaustively enumerates only over a small subset of carefully chosen parent layers, and marginalizes over the remaining child layers using ZF with decision-feedback estimates. While the bit LLRs for parent symbols are easy to compute, computing bit LLRs for child symbols is complicated by two facts: 1) each child *bit* requires a separate QR decomposition (QRD), and 2) the LLRs are prone to error propagation due to decision feedback, which worsens as the number of layers detected through decision feedback grows. In [2006_siti_novel_LORD], the closely related layered orthogonal lattice detector (LORD) scheme mitigates the first drawback by operating with a single parent layer and computing bit LLRs for the parent symbol only; independent QRDs and tree searches are performed to compute the bit LLRs for all symbols by choosing a different symbol as parent each round.

To overcome the second drawback, the WL detection (WLD) scheme [2014_mansour_SPL_WLD] first applies a (non-unitary) filtering matrix $\mathbf{W}$ to decompose the channel into a sparse lower-triangular matrix $\mathbf{L}$ (and hence the acronym WLD). It then enumerates across one parent layer and detects symbols in all other child layers in parallel via least-squares (LS) estimates with no decision feedback. The channel matrix is “punctured” to have a special structure in order to break the connections among child nodes, while retaining connections only to the parent. Essentially, all child nodes become leaf nodes, and hence LS estimates are optimal. An immediate consequence is that the LS estimates of the counter hypotheses of the bits in each leaf symbol can be easily derived from the LS estimate itself [2014_sphereP1_mansour]. A closely related concept is the achievable information rate (AIR)-PM detector [2012_rusek_optimal_channel_short, 2017_hu_softoutput_AIR], which derives a “shortened” channel similar to the WLD’s punctured structure using information-theoretic optimizations.

In this paper, we show that the concepts of channel puncturing of [2014_mansour_SPL_WLD] and AIR-PM-based channel shortening of [2017_hu_softoutput_AIR] are related. After introducing the system model in Sec. II, we first present a matrix characterization of the WLD detector of a given puncturing order in terms of Gaussian elimination matrices. We then derive a lower bound on the achievable rate of the WLD detector, as well as a bound on the quality of its hard-decision estimate, and show that these bounds approach capacity and the ML hard decision as the puncturing order increases (Sec. III). We also propose a new *augmented* WLD (AWLD) MIMO detection scheme in Sec. IV in which an augmented channel rather than the original channel is punctured. We derive a lower bound on the AIR of the AWLD detector and characterize its gap to capacity. In Sec. V, we propose an alternate mismatched detection model compared to [2012_rusek_optimal_channel_short] and use it to derive optimal punctured channel matrices that maximize the AIR. We prove that the AWLD detector is optimal under this model, and is in fact equivalent to the AIR-PM detector of [2017_hu_softoutput_AIR]. The AWLD detector decomposes into an MMSE prefilter and channel-gain compensation stages, followed by an unaugmented WLD. Hence, AIR-optimal channel puncturing can be achieved using simple QR decomposition followed by Gaussian elimination.

## II System Model

Consider a MIMO system with transmit antennas and receive antennas. Let represent the MIMO communication channel, which is assumed to be perfectly known at the receiver. The transmit signal is composed of symbols drawn from constellation with average energy , where each symbol is mapped from bits . The receive signal is modeled using the input-output relation

(1) |

where the noise term

. The conditional probability

and metric according to (1) are

(2) | ||||

(3) | ||||

(4) | ||||

(5) |

Using the observation and assuming no prior information on , the ML detector generates the LLR of the bit of the symbol in as

(6) |

To avoid computing exponentials, the max-log approximation [2005_moon_error_correcting_codes] can be applied to approximate (6) as

(7) |

In the absence of any structure on the channel matrix, computing the sums in (6) or the max terms in (7) has exponential complexity.
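The difference between the exact LLR sums and their max-log counterparts can be illustrated with a brute-force sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear model `y = H @ x + noise` with noise variance `N0`, the Gray-mapped QPSK mapper `qpsk_symbol`, the function name `brute_force_llrs`, and the sign convention (LLR positive when bit value 1 is more likely).

```python
import itertools
import numpy as np

def qpsk_symbol(b0, b1):
    """Hypothetical Gray-mapped QPSK symbol with unit average energy."""
    return ((1 - 2 * b0) + 1j * (1 - 2 * b1)) / np.sqrt(2)

def brute_force_llrs(y, H, N0):
    """Exact and max-log bit LLRs by enumerating every transmit vector.

    Assumes y = H x + n with noise variance N0 and no prior on x.
    The enumeration visits 4**n_tx candidates, i.e. exponential cost.
    """
    n_tx = H.shape[1]
    n_bits = 2 * n_tx  # two bits per QPSK symbol
    patterns = np.array(list(itertools.product((0, 1), repeat=n_bits)))
    metrics = np.empty(len(patterns))
    for idx, bits in enumerate(patterns):
        x = np.array([qpsk_symbol(bits[2 * i], bits[2 * i + 1])
                      for i in range(n_tx)])
        # Scaled squared Euclidean distance between y and the hypothesis H x
        metrics[idx] = np.linalg.norm(y - H @ x) ** 2 / N0

    exact = np.empty(n_bits)
    max_log = np.empty(n_bits)
    for k in range(n_bits):
        m1 = metrics[patterns[:, k] == 1]  # hypotheses with bit k equal to 1
        m0 = metrics[patterns[:, k] == 0]  # hypotheses with bit k equal to 0
        # Exact: log-sum-exp over each hypothesis set
        exact[k] = np.logaddexp.reduce(-m1) - np.logaddexp.reduce(-m0)
        # Max-log: keep only the best (smallest) metric in each set
        max_log[k] = m0.min() - m1.min()
    return exact, max_log
```

Because each log-sum-exp exceeds the corresponding maximum by at most the logarithm of the number of terms, the two LLR values in this sketch differ by at most the log of the hypothesis-set size, while both still require the full enumeration.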

## III WLD MIMO Detector

Let denote the (thin) QL decomposition [2013_golub_matrix] of , where has orthonormal columns, is lower-triangular with real positive diagonal elements. In [2014_mansour_SPL_WLD, 2014_mansour_eurasip_WLD], a technique to puncture into by nulling all entries below the main diagonal and to the right of the first column ( for and ) using Gaussian elimination is presented. Here, we give an alternate characterization using matrices, and generalize it to other puncturing patterns. Assume is partitioned as

(8) |

where is a real scalar. For non-singular , is given by

(9) |

The diagonal matrix is chosen such that the puncturing matrix satisfies :

(10) | ||||

(11) | ||||

(12) |

The above definition of can be generalized to any lower-triangular puncturing pattern of order as follows:

(13) | ||||

(14) |

where and are given by

(15) | ||||

(16) |
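As a rough numerical illustration of puncturing by Gaussian elimination, the sketch below computes a thin QL decomposition and then nulls the entries of the lower-triangular factor that lie below the main diagonal and to the right of the first `nu` columns. The function names `thin_ql` and `puncture`, the parameter `nu`, and the use of NumPy's QR routine on column-reversed input to obtain a QL factorization are assumptions of this sketch. The diagonal normalization discussed in the text is deliberately omitted, so this shows only the elimination step, not the complete filtering matrix.

```python
import numpy as np

def thin_ql(A):
    """Thin QL decomposition A = Q @ L via QR of the column-reversed matrix.

    Q has orthonormal columns; L is lower triangular with a real positive
    diagonal (assumes A has full column rank).
    """
    Q, R = np.linalg.qr(A[:, ::-1])      # reduced QR of A with columns reversed
    Q, L = Q[:, ::-1], R[::-1, ::-1]     # reversing back turns R into a lower factor
    d = np.diag(L) / np.abs(np.diag(L))  # unit-modulus phases of the diagonal
    return Q * d, d.conj()[:, None] * L  # move the phases from L into Q

def puncture(L, nu=1):
    """Null entries (i, k) of lower-triangular L with nu <= k < i (0-indexed).

    Returns (E, P) with P = E @ L, where E is unit lower triangular and P
    keeps only the first nu columns and the main diagonal of the pattern.
    """
    n = L.shape[0]
    P = L.astype(complex)
    E = np.eye(n, dtype=complex)
    for i in range(nu + 1, n):
        # Eliminate from the column next to the diagonal back toward column nu.
        # Row k of P is already punctured, so it only touches columns < nu and k.
        for k in range(i - 1, nu - 1, -1):
            f = P[i, k] / P[k, k]
            P[i, :] -= f * P[k, :]
            E[i, :] -= f * E[k, :]
    return E, P
```

In this sketch each row operation uses only rows above the current one, which is why `E` stays lower triangular with ones on its diagonal and why the diagonal of `P` equals that of `L`.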

Note that is a non-singular lower triangular matrix with ones on the main diagonal. Also, since normalizes the diagonal elements of , then the remaining eigenvalues of are positive and less than or equal to 1. Therefore, it follows that and , where () and () are the maximum (minimum) singular values and eigenvalues of , respectively. For simplicity of notation, we drop the superscript , with the understanding that the puncturing order is .

### III-A WLD MIMO Detection Model

By applying the filtering matrix , the metric in (3) computed by the WLD detector takes the form

(17) |

Next, expanding (17) and dropping the irrelevant term , (17) can be rewritten as

(18) |

where , and . The corresponding detection model becomes

(19) |

instead of the true conditional probability in (2). Based on (19), the AIR of the WLD detector is lower-bounded by [2006_arnold_simulation_based]

(20) |

where the expectations are taken over the true channel statistics, and , with being the prior distribution of .

###### Theorem 1.

Assuming , and let

be the signal-to-noise ratio (SNR), then the AIR of the WLD detector is lower-bounded by

###### Proof:

We compute the expectations in (20) as

following [2012_rusek_optimal_channel_short]. Substituting for , and , and applying the matrix inversion lemma [2011_zhang_matrix_theory] followed by standard simplifications, the result follows. ∎

Note that for , we have and , and then , which is the capacity of the channel. In fact, as increases from 1, the metrics computed by the WLD detector approach the hard-decision ML metrics as shown by the following lemma:

###### Lemma 1.

Let and where , then

(21) | ||||

(22) |

where is the condition number of , and are the largest and smallest singular values of , respectively.

###### Proof:

Note that the layer order within the parent set and within the child set is irrelevant. What matters is which layers are selected to form the parent set. for Gaussian inputs can be used as a criterion for parent layer selection, but the complexity of possible combinations grows as . Alternatively, a less sensitive approach to parent layer selection is to do multiple detection rounds, each time choosing new layers as parents and generating bit LLRs for these parent symbols only.

## IV Augmented WLD MIMO Detector

Instead of basing the detection metric in (3) on , we form the augmented vector and augmented matrix

(24) |

in a manner analogous to the square-root MMSE [2000_hassibi_square-root_MMSE], and reformulate in (3) based on rather than as

(25) |

We next expand the squared-distance in (25) in terms of the projection matrix onto the column space of and its orthogonal complement as

(26) |

Let be the thin QL decomposition of :

(27) |

where is an matrix with orthonormal columns (i.e., but not unitary since ), is lower triangular, and are respectively the upper and lower block matrices of . Note that neither the rows nor the columns of and are orthonormal. Also, from (27), it follows that

(28) | ||||

(29) |

However, (28) is *not* the QL decomposition of . (29) implies that is a lower-triangular matrix proportional to the inverse of , i.e., . Then, from (27) we have

from which it follows that

(30) |

where is the standard MMSE filter matrix,

(31) | ||||

(32) |

with . Substituting (30) back in (25), we obtain

(33) |

Note that in (33), the term appears explicitly, while tree processing is solely based on in . We therefore puncture using an appropriate puncturing matrix similar to puncturing in (9) or (14) using . For a given puncturing order , we conformally partition similar to (14) and obtain the partition blocks of size , of size , and of size . The resulting punctured augmented matrix is given by

(34) | ||||

(35) | ||||

(36) | ||||

(37) |

where in (36) is chosen to have .

Next, applying to filter in (33) as

(38) |

and dropping the irrelevant term , the metric computed by the *augmented* WLD (AWLD) detector corresponding to (33) takes the form

(39) |

where

(40) | ||||

(41) |

The corresponding AWLD detection model (Fig. 1) becomes

(42) |

###### Theorem 2.

###### Proof:

The lower bound on the AIR of the AWLD detector based on (42) is defined as

(44) |

where assuming . The main difference compared to the proof of Theorem 1 is the effect of the term in (42) when evaluating under Gaussian densities, which annihilates the effect of the prior density to give

(45) |

After some manipulations, the expectations in (44) become

Substituting (41) and (40) for and , and applying (31) for , then . Also, it is easy to show that

(46) |

from which it follows that this matrix product is Hermitian. Therefore, is real. Adding the two expectations above results in

from which (43) follows since . ∎

With the punctured structure of the channel matrix as given in (34)-(36), the gap of to AWGN capacity can be determined using the following corollary.

###### Corollary 1.

The gap of the AIR of the AWLD detector to AWGN capacity is

(47) |

where is the diagonal element of in (35), and is the row vector consisting of the first elements in row of (excluding the diagonal element).

It is worth noting that computing the augmented channel requires simple processing steps comparable to QL decomposition. In particular, matrix inversion is not needed to compute in (32) because the inverse of is available from (29). Moreover, following the modular approach of [2015_mansour_JSP_2x2QAM], an efficient hardware architecture for an AWLD MIMO detector can be constructed from optimized MIMO detector cores. Finally, extensions to include soft-input information, imperfect channel estimation effects, and correlated channels are directly applicable based on [2017_hu_softoutput_AIR].
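The remark that no explicit matrix inversion is needed can be checked numerically with a short sketch. It assumes the usual square-root MMSE construction, in which the channel is stacked on top of a scaled identity; the scaling `sigma` (the square root of the noise-to-signal energy ratio), the function names `thin_ql` and `augmented_mmse_filter`, and the variable names are hypothetical and may differ from the paper's notation.

```python
import numpy as np

def thin_ql(A):
    """Thin QL decomposition A = Q @ L with a real positive diagonal in L."""
    Q, R = np.linalg.qr(A[:, ::-1])      # reduced QR of A with columns reversed
    Q, L = Q[:, ::-1], R[::-1, ::-1]     # reversing back yields a lower factor
    d = np.diag(L) / np.abs(np.diag(L))  # unit-modulus phases of the diagonal
    return Q * d, d.conj()[:, None] * L

def augmented_mmse_filter(H, sigma):
    """MMSE filter from the QL factors of the augmented channel.

    Assumes the augmented matrix is [H; sigma * I]. Its lower block Q2
    satisfies Q2 @ L_aug = sigma * I, so Q2 / sigma is the inverse of L_aug
    and the filter follows from a single matrix product.
    """
    m, n = H.shape
    H_aug = np.vstack([H, sigma * np.eye(n)])
    Q_aug, L_aug = thin_ql(H_aug)
    Q1, Q2 = Q_aug[:m], Q_aug[m:]        # upper and lower blocks of Q_aug
    W = (Q2 @ Q1.conj().T) / sigma       # equals inv(L_aug) @ Q1^H
    return W, Q2, L_aug
```

Under these assumptions the returned `W` should match the textbook regularized solution `solve(H^H H + sigma**2 I, H^H)`, and `Q2` is lower triangular because it is a scaled inverse of a lower-triangular factor.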

## V Modified MIMO Detection Model

Instead of working with Euclidean-distance based metrics as in (3), the authors in [2012_rusek_optimal_channel_short] propose replacing , , in (4) with mismatched parameters that are subject to AIR optimization. As a result, instead of the true conditional probability in (2), the mismatched model of [2012_rusek_optimal_channel_short] is

(48) | ||||

(49) |

where is absorbed into and . It is shown in [2012_rusek_optimal_channel_short] that detectors limited to the Euclidean-based model in (5) where admits a Cholesky factorization proportional to are not optimal from a mutual information perspective because the resulting optimal matrix to use in (49) may not be positive semi-definite, and hence no such factorization exists. By maximizing the lower bound on the achievable rate based on (49), the authors in [2012_rusek_optimal_channel_short] derive an explicit expression for the optimal front-end filter , which is the MMSE filter compensated by the receiver tree processing through rather than . Using , the authors in [2017_hu_softoutput_AIR] derive an explicit expression for the optimal so that the tree processing term admits a Cholesky factorization of the form , such that has a punctured structure analogous to that of the WLD scheme [2014_mansour_SPL_WLD].

In this work, we propose the following modified model
