# Optimal Augmented-Channel Puncturing for Low-Complexity Soft-Output MIMO Detectors

We propose a computationally efficient soft-output detector for multiple-input multiple-output (MIMO) channels based on augmented channel puncturing to reduce tree-processing complexity. The proposed detector, dubbed the augmented WL detector (AWLD), employs a punctured channel with a special structure derived by triangularizing the original channel in augmented form, followed by Gaussian elimination. We prove that these punctured channels are optimal in maximizing the lower bound on the achievable information rate (AIR) based on a newly proposed mismatched detection model. We show that the AWLD decomposes into a minimum mean-square error (MMSE) prefilter and channel-gain compensation stages, followed by a regular unaugmented WL detector (WLD). It attains the same performance as the existing AIR partial marginalization (AIR-PM) detector, but with much simpler processing.

## I Introduction

Multiple-input multiple-output (MIMO) communication using a large number of transmit and receive antennas has become a mainstream technology in most modern wireless standards, primarily in 5G, in order to support the aggressive targets set on spectral efficiency. However, achieving the ideal performance promised by this technology requires the use of MIMO detectors whose complexity grows exponentially in the number of transmit antennas and polynomially in the size of the signal constellation. To support low-latency communications while providing high throughput, computationally efficient designs of MIMO detectors that do not incur substantial performance loss are needed, especially for large MIMO dimensions and dense constellations.

The topic of MIMO detection is a classical area of research, and the literature is very rich with schemes that provide various performance-complexity tradeoffs in the design space (e.g., see overviews in [2018_Xu_two_decades, 2009_larsson_MIMO_detection_methods]). The benchmark for performance in the sense of generating good soft decisions on the transmitted information bits remains the maximum likelihood (ML) detection scheme, which provides optimal performance at exponential complexity. Alternatively, the benchmarks for low-complexity are the zero-forcing (ZF) and minimum mean-square error (MMSE) schemes, which decouple the transmit layers through a linear filtering stage to generate log-likelihood ratios (LLRs) for each bit symbol either in parallel or sequentially through decision feedback. Although linear processing incurs only a marginal loss in mutual information between the transmitter and receiver, and offers fairly good performance in fast fading channels, it severely limits the diversity order of a MIMO system in slowly fading channels [2008_larsson_fixed_complexity].

Tree-search based detectors such as sphere decoding [2003_damen_on_ML_detection], list decoding [2003_hochwald_achieving_nearcapacity], and other variants map the detection problem into a search problem for the closest signal vector. They find the closest signal vector to the received vector by forming a search tree and recursively enumerating all constellation symbols across all layers, from the parent down to the leaves. Such schemes suffer from non-deterministic complexity (see scheduling solutions in [2014_sphereP2_mansour]). To simplify the search process, fixed-complexity schemes such as [barbero2008fixing, 2006_Wenk_ISCAS] limit the search steps to a set of survivor paths. While these schemes are efficient in finding the ML path, they do not necessarily find all the best competing paths that are needed to generate soft decisions.

An alternative concept is partial marginalization (PM) [2008_larsson_fixed_complexity, 2011_persson_partial_marg], which exhaustively enumerates only over a small subset of $\nu$ carefully chosen parent layers, and marginalizes over the other $N-\nu$ child layers (with $N$ the number of transmit layers) using ZF with decision-feedback estimates. While the bit LLRs for parent symbols are easy to compute, computing bit LLRs for child symbols is complicated by two facts: 1) each child bit requires a separate QR decomposition (QRD), and 2) the LLRs are prone to error propagation for large $N-\nu$ due to decision feedback. In [2006_siti_novel_LORD], the closely related layered orthogonal lattice detector (LORD) scheme mitigates the first drawback by operating with $\nu=1$ and computing bit LLRs for the parent symbol only; $N$ independent QRDs and tree searches are performed to compute the bit LLRs for all symbols by choosing a different symbol as parent each round.

To overcome the second drawback, the WL detection (WLD) scheme [2014_mansour_SPL_WLD] first applies a (non-unitary) filtering matrix to decompose the channel into a sparse lower-triangular matrix (and hence the acronym WLD). It then enumerates across one parent layer and detects symbols in all other child layers in parallel via least-squares (LS) estimates with no decision feedback. The channel matrix is “punctured” to have a special structure in order to break the connections among child nodes, while retaining connections only to the parent. Essentially, all child nodes become leaf nodes, and hence LS estimates are optimal. An immediate consequence is that the LS estimates of the counter hypotheses of the bits in each leaf symbol can be easily derived from the LS estimate itself [2014_sphereP1_mansour]. A closely related concept is the achievable information rate (AIR)-PM detector [2012_rusek_optimal_channel_short, 2017_hu_softoutput_AIR], which derives a “shortened” channel similar to the WLD’s punctured structure using information-theoretic optimizations.

In this paper, we show that the concepts of channel puncturing of [2014_mansour_SPL_WLD] and AIR-PM-based channel shortening of [2017_hu_softoutput_AIR] are related. After introducing the system model in Sec. II, we first present a matrix characterization of the WLD detector of order $\nu$ in terms of Gaussian elimination matrices. We then derive a lower bound on the achievable rate of the WLD detector, as well as a bound on the quality of its hard-decision estimate, and show that these bounds approach capacity and the ML hard decision as $\nu$ increases (Sec. III). We also propose a new augmented WLD (AWLD) MIMO detection scheme in Sec. IV in which an augmented channel rather than the original channel is punctured. We derive a lower bound on the AIR of the AWLD detector and characterize its gap to capacity. In Sec. V, we propose an alternate mismatched detection model compared to [2012_rusek_optimal_channel_short] and use it to derive optimal punctured channel matrices that maximize the AIR. We prove that the AWLD detector is optimal under this model, and is in fact equivalent to the AIR-PM detector of [2017_hu_softoutput_AIR]. The AWLD detector decomposes into an MMSE prefilter and channel-gain compensation stages, followed by an unaugmented WLD. Hence, AIR-optimal channel puncturing can be achieved using simple QR decomposition followed by Gaussian elimination.

## II System Model

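Consider a MIMO system with $N$ transmit antennas and $M$ receive antennas. Let $\mathbf{H}\in\mathbb{C}^{M\times N}$ represent the MIMO communication channel, which is assumed to be perfectly known at the receiver. The transmit signal $\mathbf{x}$ is composed of $N$ symbols $x_n$ drawn from a constellation with average energy $E_s$, where each symbol is mapped from bits $x_{qn}\in\{+1,-1\}$. The receive signal $\mathbf{y}$ is modeled using the input-output relation

$$\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{z}, \tag{1}$$

where the noise term $\mathbf{z}\sim\mathcal{CN}(\mathbf{0},N_0\mathbf{I}_M)$. The conditional probability $p(\mathbf{y}|\mathbf{x})$ and metric $\mu(\mathbf{y}|\mathbf{x})$ according to (1) are

\begin{align}
p(\mathbf{y}|\mathbf{x}) &= \frac{1}{(\pi N_0)^{M}}\exp\big(\mu(\mathbf{y}|\mathbf{x})\big), \tag{2}\\
\mu(\mathbf{y}|\mathbf{x}) &= -\frac{1}{N_0}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|^2 \tag{3}\\
&= -\frac{1}{N_0}\big(\mathbf{y}^\dagger\mathbf{y}-2\Re\{\mathbf{y}^\dagger\mathbf{H}\mathbf{x}\}+\mathbf{x}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{x}\big) \tag{4}\\
&\propto 2\Re\{\mathbf{y}^\dagger\mathbf{H}\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{H}^\dagger\mathbf{H}\mathbf{x}. \tag{5}
\end{align}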

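Using the observation $\mathbf{y}$ and assuming no prior information on $\mathbf{x}$, the ML detector generates the LLR of the $q$th bit of the $n$th symbol in $\mathbf{x}$ as

$$L(x_{qn}|\mathbf{y}) = \ln\frac{\sum_{\mathbf{x}:x_{qn}=+1}\exp\big(\mu(\mathbf{y}|\mathbf{x})\big)}{\sum_{\mathbf{x}:x_{qn}=-1}\exp\big(\mu(\mathbf{y}|\mathbf{x})\big)}. \tag{6}$$

To avoid computing exponentials, the max-log approximation $\ln\sum_i e^{a_i}\approx\max_i a_i$ [2005_moon_error_correcting_codes] can be applied to approximate (6) as

$$L(x_{qn}|\mathbf{y}) \approx \max_{\mathbf{x}:x_{qn}=+1}\mu(\mathbf{y}|\mathbf{x})-\max_{\mathbf{x}:x_{qn}=-1}\mu(\mathbf{y}|\mathbf{x}). \tag{7}$$

In the absence of any structure on $\mathbf{H}$, computing the sums in (6) or the max terms in (7) has exponential complexity.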
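To make the exponential cost concrete, the following is a minimal brute-force sketch of the max-log rule in (7). The QPSK bit labelling, the function names `qpsk` and `maxlog_llrs`, and the use of NumPy are assumptions made for illustration only; they are not taken from the original work.

```python
import itertools
import numpy as np

def qpsk(bits):
    """Map +/-1 bit pairs to unit-energy QPSK symbols (labelling assumed for illustration)."""
    b = np.asarray(bits, dtype=float).reshape(-1, 2)
    return (b[:, 0] + 1j * b[:, 1]) / np.sqrt(2)

def maxlog_llrs(y, H, N0):
    """Brute-force max-log LLRs as in (7); visits all 4**N QPSK vectors."""
    n_bits = 2 * H.shape[1]
    # Best metric seen so far for each bit under hypothesis +1 and -1.
    best = {+1: np.full(n_bits, -np.inf), -1: np.full(n_bits, -np.inf)}
    for bits in itertools.product((+1, -1), repeat=n_bits):
        mu = -np.linalg.norm(y - H @ qpsk(bits)) ** 2 / N0  # metric (3)
        for i, b in enumerate(bits):
            best[b][i] = max(best[b][i], mu)
    return best[+1] - best[-1]
```

The loop visits every candidate vector, so the run time grows as $4^N$ for this QPSK example, which is the cost the punctured detectors in the following sections aim to avoid.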

## III WLD MIMO Detector

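Let $\mathbf{H}=\mathbf{Q}\mathbf{L}$ denote the (thin) QL decomposition [2013_golub_matrix] of $\mathbf{H}$, where $\mathbf{Q}\in\mathbb{C}^{M\times N}$ has orthonormal columns and $\mathbf{L}\in\mathbb{C}^{N\times N}$ is lower-triangular with real positive diagonal elements. In [2014_mansour_SPL_WLD, 2014_mansour_eurasip_WLD], a technique to puncture $\mathbf{L}$ into $\mathbf{L}_p$ by nulling all entries below the main diagonal and to the right of the first column ($l_{ij}=0$ for $i>j$ and $j>1$) using Gaussian elimination is presented. Here, we give an alternate characterization using matrices, and generalize it to other puncturing patterns. Assume $\mathbf{L}$ is partitioned as

$$\mathbf{L}_{N\times N}=\begin{bmatrix} p & \mathbf{0}_{1\times(N-1)}\\ \mathbf{r}_{(N-1)\times 1} & \mathbf{S}_{(N-1)\times(N-1)}\end{bmatrix}, \tag{8}$$

where $p$ is a real scalar. For non-singular $\mathbf{S}$, $\mathbf{L}_p$ is given by

$$\mathbf{L}_p \triangleq \mathbf{W}_p\mathbf{L} = \mathbf{D}_p\begin{bmatrix} 1 & \mathbf{0}\\ \mathbf{0} & \mathrm{diag}(\mathbf{S})\,\mathbf{S}^{-1}\end{bmatrix}\begin{bmatrix} p & \mathbf{0}\\ \mathbf{r} & \mathbf{S}\end{bmatrix}. \tag{9}$$

The diagonal matrix $\mathbf{D}_p$ is chosen such that the puncturing matrix $\mathbf{W}_p$ satisfies $\mathrm{diag}(\mathbf{W}_p\mathbf{W}_p^\dagger)=\mathbf{I}$:

\begin{align}
\mathbf{D}_p &= \begin{bmatrix} 1 & \mathbf{0}\\ \mathbf{0} & \mathrm{diag}(\mathbf{S})^{-1}\boldsymbol{\Sigma}\end{bmatrix} \tag{10}\\
\mathbf{W}_p &= \begin{bmatrix} 1 & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}\mathbf{S}^{-1}\end{bmatrix} \tag{11}\\
\boldsymbol{\Sigma} &= \mathrm{diag}\big(\mathbf{S}^{-1}\mathbf{S}^{-\dagger}\big)^{-1/2}. \tag{12}
\end{align}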

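The above definition of $\mathbf{L}_p$ can be generalized to any lower-triangular puncturing pattern of order $\nu$ as follows:

\begin{align}
\mathbf{L}^{(\nu)}_{N\times N} &= \begin{bmatrix} \mathbf{P}_{\nu\times\nu} & \mathbf{0}_{\nu\times(N-\nu)}\\ \mathbf{R}_{(N-\nu)\times\nu} & \mathbf{S}_{(N-\nu)\times(N-\nu)}\end{bmatrix} \tag{13}\\
\mathbf{L}^{(\nu)}_p &\triangleq \mathbf{W}^{(\nu)}_p\mathbf{L}^{(\nu)} = \mathbf{D}^{(\nu)}_p\,\mathrm{diag}(\mathbf{L})\begin{bmatrix} \mathbf{I} & \mathbf{0}\\ \mathbf{0} & \mathbf{S}^{-1}\end{bmatrix}\begin{bmatrix} \mathbf{P} & \mathbf{0}\\ \mathbf{R} & \mathbf{S}\end{bmatrix} \tag{14}
\end{align}

where $\mathbf{D}^{(\nu)}_p$ and $\mathbf{W}^{(\nu)}_p$ are given by

\begin{align}
\mathbf{D}^{(\nu)}_p &= \begin{bmatrix} \mathrm{diag}(\mathbf{P})^{-1} & \mathbf{0}\\ \mathbf{0} & \mathrm{diag}(\mathbf{S})^{-1}\boldsymbol{\Sigma}\end{bmatrix} \tag{15}\\
\mathbf{W}^{(\nu)}_p &= \begin{bmatrix} \mathbf{I} & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}\mathbf{S}^{-1}\end{bmatrix}. \tag{16}
\end{align}

Note that $\mathrm{diag}(\mathbf{S})\,\mathbf{S}^{-1}$ is a non-singular lower-triangular matrix with ones on the main diagonal. Also, since $\mathbf{D}^{(\nu)}_p$ normalizes the diagonal elements of $\mathbf{W}^{(\nu)}_p\mathbf{W}^{(\nu)\dagger}_p$ to one, the first $\nu$ eigenvalues of $\mathbf{W}^{(\nu)}_p$ are equal to 1 and the remaining eigenvalues of $\mathbf{W}^{(\nu)}_p$ are positive and less than or equal to 1. Therefore, it follows that $\sigma_{\max}\geq\lambda_{\max}=1$ and $\sigma_{\min}\leq\lambda_{\min}\leq 1$, where $\sigma_{\max}$ ($\sigma_{\min}$) and $\lambda_{\max}$ ($\lambda_{\min}$) are the maximum (minimum) singular values and eigenvalues of $\mathbf{W}^{(\nu)}_p$, respectively. For simplicity of notation, we drop the superscript $(\nu)$, with the understanding that the puncturing order is $\nu$.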
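The puncturing step can be sketched numerically as follows. This is an illustrative NumPy sketch under stated assumptions: the helper names `ql` and `puncture` are hypothetical, the QL factors are obtained from NumPy's QR routine by reversing the column order, and the puncturing matrix is built directly from the closed form in (16) rather than by an explicit elimination loop.

```python
import numpy as np

def ql(H):
    """Thin QL decomposition H = Q @ L with a real positive diagonal in L."""
    # QR of the column-reversed matrix, then reverse back to get a lower-triangular factor.
    Qr, Rr = np.linalg.qr(H[:, ::-1])
    Q, L = Qr[:, ::-1], Rr[::-1, ::-1]
    ph = np.diag(L) / np.abs(np.diag(L))      # unit-modulus phases of the diagonal
    return Q * ph, ph.conj()[:, None] * L     # move the phases from L into Q

def puncture(L, nu):
    """Return (W_p, L_p) of order nu following (13)-(16)."""
    N = L.shape[0]
    W = np.eye(N, dtype=complex)
    if nu < N:
        S_inv = np.linalg.inv(L[nu:, nu:])
        # Sigma @ S^{-1}: scale every row of S^{-1} to unit norm, cf. (12).
        W[nu:, nu:] = S_inv / np.linalg.norm(S_inv, axis=1, keepdims=True)
    return W, W @ L
```

After puncturing, the child block of $\mathbf{L}_p$ is diagonal, so each child layer connects only to the parent layers, and every row of $\mathbf{W}_p$ has unit norm.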

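### III-A WLD MIMO Detection Model

By applying the filtering matrix $\mathbf{W}_p$, the metric in (3) computed by the WLD detector takes the form

$$-\frac{1}{N_0}\big\|\mathbf{Q}^\dagger\mathbf{y}-\mathbf{L}\mathbf{x}\big\|^2 \xrightarrow{\ \mathbf{W}_p\ } -\frac{1}{N_0}\big\|\mathbf{W}_p(\mathbf{Q}^\dagger\mathbf{y}-\mathbf{L}\mathbf{x})\big\|^2. \tag{17}$$

Next, expanding (17) and dropping the term that does not depend on $\mathbf{x}$, (17) can be rewritten as

$$\mu_p(\mathbf{y}|\mathbf{x}) = 2\Re\{\mathbf{y}^\dagger\mathbf{F}_p\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_p\mathbf{x}, \tag{18}$$

where $\mathbf{F}_p\triangleq\frac{1}{N_0}\mathbf{Q}\mathbf{W}_p^\dagger\mathbf{L}_p$, and $\mathbf{G}_p\triangleq\frac{1}{N_0}\mathbf{L}_p^\dagger\mathbf{L}_p$. The corresponding detection model becomes

$$p_p(\mathbf{y}|\mathbf{x}) = \exp\big(2\Re\{\mathbf{y}^\dagger\mathbf{F}_p\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_p\mathbf{x}\big), \tag{19}$$

instead of the true conditional probability in (2). Based on (19), the AIR of the WLD detector is lower-bounded by [2006_arnold_simulation_based]

$$I^{\text{WLD}}_{\text{LB}} = \mathbb{E}_{Y,X}\big[\log p_p(\mathbf{y}|\mathbf{x})\big]-\mathbb{E}_{Y}\big[\log p_p(\mathbf{y})\big], \tag{20}$$

where the expectations are taken over the true channel statistics, and $p_p(\mathbf{y})=\int p_p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$, with $p(\mathbf{x})$ being the prior distribution of $\mathbf{x}$.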

###### Theorem 1.

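Assuming $\mathbf{x}\sim\mathcal{CN}(\mathbf{0},E_s\mathbf{I}_N)$, and letting $\beta\triangleq E_s/N_0$ be the signal-to-noise ratio (SNR), the AIR of the WLD detector is lower-bounded by

$$I^{\text{WLD}}_{\text{LB}} = \log\det\big(\mathbf{I}_N+\beta\,\mathbf{L}_p^\dagger\mathbf{L}_p\big)+\mathrm{Tr}\Big(\big(\mathbf{W}_p\mathbf{W}_p^\dagger-\mathbf{I}_N\big)\big(\mathbf{I}_N+\beta\,\mathbf{L}_p\mathbf{L}_p^\dagger\big)^{-1}\Big).$$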

###### Proof:

We compute the expectations in (20) as

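\begin{align}
\mathbb{E}_{Y,X}\big[\log p_p(\mathbf{y}|\mathbf{x})\big] &= E_s\,\mathrm{Tr}(\mathbf{G}_p)\\
-\mathbb{E}_{Y}\big[\log p_p(\mathbf{y})\big] &= N\log E_s+\log\det\big(\mathbf{G}_p+\tfrac{1}{E_s}\mathbf{I}\big)\\
&\quad-\mathrm{Tr}\Big(\mathbf{F}_p^\dagger\big[E_s\mathbf{H}\mathbf{H}^\dagger+N_0\mathbf{I}\big]\mathbf{F}_p\big[\mathbf{G}_p+\tfrac{1}{E_s}\mathbf{I}\big]^{-1}\Big)
\end{align}

following [2012_rusek_optimal_channel_short]. Substituting for $\mathbf{F}_p$, $\mathbf{G}_p$, and the SNR, and applying the matrix inversion lemma [2011_zhang_matrix_theory] followed by standard simplifications, the result follows. ∎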
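The two expectations in the proof can be evaluated numerically for a given channel realization. The sketch below is illustrative only and rests on assumptions: the function names are hypothetical, $\mathbf{F}_p=\frac{1}{N_0}\mathbf{Q}\mathbf{W}_p^\dagger\mathbf{L}_p$ and $\mathbf{G}_p=\frac{1}{N_0}\mathbf{L}_p^\dagger\mathbf{L}_p$ are obtained by expanding (17), natural logarithms are used (rates in nats), and the reference `capacity` is the Gaussian-input expression $\log\det(\mathbf{I}+\frac{E_s}{N_0}\mathbf{H}^\dagger\mathbf{H})$.

```python
import numpy as np

def ql(H):
    """Thin QL decomposition H = Q @ L with a real positive diagonal in L."""
    Qr, Rr = np.linalg.qr(H[:, ::-1])
    Q, L = Qr[:, ::-1], Rr[::-1, ::-1]
    ph = np.diag(L) / np.abs(np.diag(L))
    return Q * ph, ph.conj()[:, None] * L

def puncture(L, nu):
    """Return (W_p, L_p) of order nu following (13)-(16)."""
    N = L.shape[0]
    W = np.eye(N, dtype=complex)
    if nu < N:
        S_inv = np.linalg.inv(L[nu:, nu:])
        W[nu:, nu:] = S_inv / np.linalg.norm(S_inv, axis=1, keepdims=True)
    return W, W @ L

def wld_air_lower_bound(H, Es, N0, nu):
    """Sum of the two expectations in the proof of Theorem 1 (in nats)."""
    M, N = H.shape
    Q, L = ql(H)
    W, Lp = puncture(L, nu)
    Gp = Lp.conj().T @ Lp / N0
    Fp = Q @ W.conj().T @ Lp / N0
    A = Gp + np.eye(N) / Es
    Ryy = Es * H @ H.conj().T + N0 * np.eye(M)   # covariance of y
    e1 = Es * np.trace(Gp).real
    e2 = (N * np.log(Es) + np.linalg.slogdet(A)[1]
          - np.trace(Fp.conj().T @ Ryy @ Fp @ np.linalg.inv(A)).real)
    return e1 + e2

def capacity(H, Es, N0):
    """Gaussian-input capacity log det(I + (Es/N0) H^H H) in nats."""
    N = H.shape[1]
    return np.linalg.slogdet(np.eye(N) + (Es / N0) * H.conj().T @ H)[1]
```

For $\nu=N$ no layer is punctured and the bound coincides with `capacity`; for smaller $\nu$ it can only fall below it.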

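Note that for $\nu=N$, we have $\mathbf{W}_p=\mathbf{I}$ and $\mathbf{L}_p=\mathbf{L}$, and then $I^{\text{WLD}}_{\text{LB}}=\log\det\big(\mathbf{I}_N+\frac{E_s}{N_0}\mathbf{H}^\dagger\mathbf{H}\big)$, which is the capacity of the channel. In fact, as $\nu$ increases from 1, the metrics computed by the WLD detector approach the hard-decision ML metrics as shown by the following lemma: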

###### Lemma 1.

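Let $\mathbf{x}_{\text{ML}}=\arg\min_{\mathbf{x}}\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}\|$ and $\mathbf{x}_{\text{WLD}}=\arg\min_{\mathbf{x}}\|\mathbf{W}_p(\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x})\|$, where $\tilde{\mathbf{y}}=\mathbf{Q}^\dagger\mathbf{y}$, then

\begin{align}
\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{ML}}\| \leq \|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{WLD}}\| &\leq \kappa(\mathbf{W}_p)\,\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{ML}}\| \tag{21}\\
\|\mathbf{W}_p(\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{WLD}})\| &\leq \sigma_{\max}(\mathbf{W}_p)\,\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{ML}}\| \tag{22}
\end{align}

where $\kappa(\mathbf{W}_p)=\sigma_{\max}(\mathbf{W}_p)/\sigma_{\min}(\mathbf{W}_p)$ is the condition number of $\mathbf{W}_p$, and $\sigma_{\max}(\mathbf{W}_p)$, $\sigma_{\min}(\mathbf{W}_p)$ are the largest and smallest singular values of $\mathbf{W}_p$, respectively.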

###### Proof:

The first inequality in (21) follows from the definition of the ML solution. For the second, we have

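\begin{align}
\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{WLD}}\| &= \|\mathbf{W}_p^{-1}\mathbf{W}_p(\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{WLD}})\| \notag\\
&\leq \sigma_{\max}(\mathbf{W}_p^{-1})\,\|\mathbf{W}_p(\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{WLD}})\| \notag\\
&\leq \sigma_{\max}(\mathbf{W}_p^{-1})\,\|\mathbf{W}_p(\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{ML}})\| \tag{23}\\
&\leq \sigma_{\max}(\mathbf{W}_p^{-1})\,\sigma_{\max}(\mathbf{W}_p)\,\|\tilde{\mathbf{y}}-\mathbf{L}\mathbf{x}_{\text{ML}}\|, \notag
\end{align}

from which (21) follows since $\sigma_{\max}(\mathbf{W}_p^{-1})=1/\sigma_{\min}(\mathbf{W}_p)$. Note that (22) and (23) follow because $\|\mathbf{A}\mathbf{v}\|\leq\sigma_{\max}(\mathbf{A})\|\mathbf{v}\|$ for any $\mathbf{v}$. ∎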
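The bounds of Lemma 1 can be checked on a small example by exhaustive search. The sketch below is illustrative and assumes a QPSK alphabet, $\tilde{\mathbf{y}}=\mathbf{Q}^\dagger\mathbf{y}$, and hypothetical helper names; the brute-force search stands in for the actual tree search and is only practical for very small $N$.

```python
import itertools
import numpy as np

def ql(H):
    """Thin QL decomposition H = Q @ L with a real positive diagonal in L."""
    Qr, Rr = np.linalg.qr(H[:, ::-1])
    Q, L = Qr[:, ::-1], Rr[::-1, ::-1]
    ph = np.diag(L) / np.abs(np.diag(L))
    return Q * ph, ph.conj()[:, None] * L

def puncture(L, nu):
    """Return (W_p, L_p) of order nu following (13)-(16)."""
    N = L.shape[0]
    W = np.eye(N, dtype=complex)
    if nu < N:
        S_inv = np.linalg.inv(L[nu:, nu:])
        W[nu:, nu:] = S_inv / np.linalg.norm(S_inv, axis=1, keepdims=True)
    return W, W @ L

def hard_decisions(y_t, L, W, alphabet):
    """Exhaustive ML and WLD hard decisions over all candidate vectors."""
    cands = [np.array(c) for c in itertools.product(alphabet, repeat=L.shape[0])]
    x_ml = min(cands, key=lambda x: np.linalg.norm(y_t - L @ x))
    x_wld = min(cands, key=lambda x: np.linalg.norm(W @ (y_t - L @ x)))
    return x_ml, x_wld
```

With `d_ml` and `d_wld` denoting the unfiltered distances of the two decisions, (21) reads `d_ml <= d_wld <= np.linalg.cond(W) * d_ml`.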

Note that the layer order within the parent set and within the child set is irrelevant. What matters is which layers are selected to form the parent set. $I^{\text{WLD}}_{\text{LB}}$ for Gaussian inputs can be used as a criterion for parent layer selection, but the number of possible combinations grows as $\binom{N}{\nu}$. Alternatively, a less sensitive approach to parent layer selection is to do multiple detection rounds, each time choosing new layers as parents and generating bit LLRs for these parent symbols only.

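## IV Augmented WLD MIMO Detector

Instead of basing the detection metric in (3) on $\mathbf{H}$, we form the augmented vector $\mathbf{y}_a\triangleq[\mathbf{y}^{T}\ \mathbf{0}_{1\times N}]^{T}$ and augmented matrix

$$\mathbf{H}_a\triangleq\begin{bmatrix}\frac{1}{\sqrt{N_0}}\mathbf{H}_{M\times N}\\ \frac{1}{\sqrt{E_s}}\mathbf{I}_N\end{bmatrix}\quad(\text{size }(M+N)\times N) \tag{24}$$

in a manner analogous to the square-root MMSE [2000_hassibi_square-root_MMSE], and reformulate $\mu(\mathbf{y}|\mathbf{x})$ in (3) based on $\mathbf{H}_a$ rather than $\mathbf{H}$ as

\begin{align}
-\mu(\mathbf{y}|\mathbf{x}) &= \frac{1}{N_0}\|\mathbf{y}\|^2-\frac{2}{\sqrt{N_0}}\Re\left\{[\mathbf{y}^\dagger\ \mathbf{0}]\begin{bmatrix}\frac{1}{\sqrt{N_0}}\mathbf{H}\\ \frac{1}{\sqrt{E_s}}\mathbf{I}_N\end{bmatrix}\mathbf{x}\right\}+\mathbf{x}^\dagger\Big(\frac{1}{N_0}\mathbf{H}^\dagger\mathbf{H}+\frac{1}{E_s}\mathbf{I}_N\Big)\mathbf{x}-\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x} \notag\\
&= \frac{1}{N_0}\|\mathbf{y}_a\|^2-\frac{2}{\sqrt{N_0}}\Re\{\mathbf{y}_a^\dagger\mathbf{H}_a\mathbf{x}\}+\mathbf{x}^\dagger\mathbf{H}_a^\dagger\mathbf{H}_a\mathbf{x}-\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x}. \tag{25}
\end{align}

The first three terms of (25) form the squared distance $\|\mathbf{y}_a/\sqrt{N_0}-\mathbf{H}_a\mathbf{x}\|^2$. We next expand this squared distance in terms of the projection matrix $\mathbf{P}_{H_a}$ onto the column space of $\mathbf{H}_a$ and its orthogonal complement $\mathbf{P}^{\perp}_{H_a}$ as

$$\Big\|\frac{\mathbf{y}_a}{\sqrt{N_0}}-\mathbf{H}_a\mathbf{x}\Big\|^2 = \Big\|\mathbf{P}_{H_a}\Big(\frac{\mathbf{y}_a}{\sqrt{N_0}}-\mathbf{H}_a\mathbf{x}\Big)\Big\|^2+\Big\|\mathbf{P}^{\perp}_{H_a}\frac{\mathbf{y}_a}{\sqrt{N_0}}\Big\|^2. \tag{26}$$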

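Let $\mathbf{H}_a=\mathbf{Q}_a\mathbf{L}_a$ be the thin QL decomposition of $\mathbf{H}_a$:

$$\mathbf{H}_a=\begin{bmatrix}\frac{1}{\sqrt{N_0}}\mathbf{H}\\ \frac{1}{\sqrt{E_s}}\mathbf{I}_N\end{bmatrix}=\mathbf{Q}_a\mathbf{L}_a=\begin{bmatrix}\mathbf{Q}_{a1}\\ \mathbf{Q}_{a2}\end{bmatrix}\mathbf{L}_a=\begin{bmatrix}\mathbf{Q}_{a1}\mathbf{L}_a\\ \mathbf{Q}_{a2}\mathbf{L}_a\end{bmatrix}, \tag{27}$$

where $\mathbf{Q}_a$ is an $(M+N)\times N$ matrix with orthonormal columns (i.e., $\mathbf{Q}_a^\dagger\mathbf{Q}_a=\mathbf{I}_N$, but not unitary since $\mathbf{Q}_a\mathbf{Q}_a^\dagger\neq\mathbf{I}_{M+N}$), $\mathbf{L}_a$ is $N\times N$ lower triangular, and $\mathbf{Q}_{a1}$ ($M\times N$) and $\mathbf{Q}_{a2}$ ($N\times N$) are respectively the upper and lower block matrices of $\mathbf{Q}_a$. Note that neither the rows nor the columns of $\mathbf{Q}_{a1}$ and $\mathbf{Q}_{a2}$ are orthonormal. Also, from (27), it follows that

\begin{align}
\mathbf{H} &= \sqrt{N_0}\,\mathbf{Q}_{a1}\mathbf{L}_a, \tag{28}\\
\frac{1}{\sqrt{E_s}}\mathbf{I}_N &= \mathbf{Q}_{a2}\mathbf{L}_a=\mathbf{L}_a\mathbf{Q}_{a2}. \tag{29}
\end{align}

However, (28) is not the QL decomposition of $\mathbf{H}$. Equation (29) implies that $\mathbf{Q}_{a2}$ is a lower-triangular matrix proportional to the inverse of $\mathbf{L}_a$, i.e., $\mathbf{Q}_{a2}=\frac{1}{\sqrt{E_s}}\mathbf{L}_a^{-1}$. Then, from (27) we have

$$\frac{1}{N_0}\mathbf{H}^\dagger\mathbf{H}+\frac{1}{E_s}\mathbf{I}_N=\mathbf{H}_a^\dagger\mathbf{H}_a=\mathbf{L}_a^\dagger\mathbf{L}_a,$$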

from which it follows that

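$$\Big\|\frac{\mathbf{y}_a}{\sqrt{N_0}}-\mathbf{H}_a\mathbf{x}\Big\|^2 = \big\|\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})\big\|^2+\frac{1}{N_0}\big\|(\mathbf{I}-\mathbf{Q}_a\mathbf{Q}_a^\dagger)\mathbf{y}_a\big\|^2, \tag{30}$$

where $\tilde{\mathbf{W}}$ is the standard MMSE filter matrix,

\begin{align}
\tilde{\mathbf{W}} &= \mathbf{H}^\dagger\big[\mathbf{H}\mathbf{H}^\dagger+\alpha\mathbf{I}_M\big]^{-1}=\big[\mathbf{H}^\dagger\mathbf{H}+\alpha\mathbf{I}_N\big]^{-1}\mathbf{H}^\dagger \tag{31}\\
&= \frac{1}{N_0}\big(\mathbf{H}_a^\dagger\mathbf{H}_a\big)^{-1}\mathbf{H}^\dagger=\frac{1}{N_0}\big(\mathbf{L}_a^\dagger\mathbf{L}_a\big)^{-1}\mathbf{H}^\dagger=\sqrt{\beta}\,\mathbf{Q}_{a2}\mathbf{Q}_{a1}^\dagger, \tag{32}
\end{align}

with $\alpha\triangleq N_0/E_s$ and $\beta=E_s/N_0=1/\alpha$. Substituting (30) back in (25), we obtain

$$\mu(\mathbf{y}|\mathbf{x})=\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x}-\big\|\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})\big\|^2-\frac{1}{N_0}\big\|(\mathbf{I}-\mathbf{Q}_a\mathbf{Q}_a^\dagger)\mathbf{y}_a\big\|^2. \tag{33}$$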
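The identities (27)-(32) are easy to check numerically. The following sketch is an illustration under assumptions: the helper names are hypothetical, the QL factors are obtained from NumPy's QR routine by column reversal, and $\alpha=N_0/E_s$, $\beta=E_s/N_0$ as implied by (31)-(32).

```python
import numpy as np

def ql(H):
    """Thin QL decomposition H = Q @ L with a real positive diagonal in L."""
    Qr, Rr = np.linalg.qr(H[:, ::-1])
    Q, L = Qr[:, ::-1], Rr[::-1, ::-1]
    ph = np.diag(L) / np.abs(np.diag(L))
    return Q * ph, ph.conj()[:, None] * L

def augmented_ql(H, Es, N0):
    """Form H_a as in (24) and return (H_a, Q_a1, Q_a2, L_a) as in (27)."""
    M, N = H.shape
    Ha = np.vstack([H / np.sqrt(N0), np.eye(N) / np.sqrt(Es)])
    Qa, La = ql(Ha)
    return Ha, Qa[:M], Qa[M:], La

def mmse_filter_from_ql(Qa1, Qa2, Es, N0):
    """MMSE filter via the last expression in (32); no explicit inverse is taken."""
    return np.sqrt(Es / N0) * Qa2 @ Qa1.conj().T
```

The filter returned by `mmse_filter_from_ql` matches the textbook form $[\mathbf{H}^\dagger\mathbf{H}+\alpha\mathbf{I}_N]^{-1}\mathbf{H}^\dagger$, while using only the blocks of $\mathbf{Q}_a$ that the QL decomposition already provides.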

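Note that in (33), the MMSE-filtered observation $\tilde{\mathbf{W}}\mathbf{y}$ appears explicitly, while tree processing is solely based on $\mathbf{L}_a$ in $\|\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})\|^2$. We therefore puncture $\mathbf{L}_a$ using an appropriate puncturing matrix $\mathbf{W}_{ap}$, similar to puncturing $\mathbf{L}$ in (9) or (14) using $\mathbf{W}_p$. For a given puncturing order $\nu$, we conformally partition $\mathbf{L}_a$ similar to (14) and obtain the partition blocks $\mathbf{P}_a$ of size $\nu\times\nu$, $\mathbf{R}_a$ of size $(N-\nu)\times\nu$, and $\mathbf{S}_a$ of size $(N-\nu)\times(N-\nu)$. The resulting punctured augmented matrix is given by

\begin{align}
\mathbf{L}_{ap} &\triangleq \mathbf{W}_{ap}\mathbf{L}_a \tag{34}\\
\mathbf{W}_{ap} &\triangleq \mathbf{D}_{ap}\,\mathrm{diag}(\mathbf{L}_a)\begin{bmatrix}\mathbf{I}_\nu & \mathbf{0}\\ \mathbf{0} & \mathbf{S}_a^{-1}\end{bmatrix}=\begin{bmatrix}\mathbf{I}_\nu & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_a\mathbf{S}_a^{-1}\end{bmatrix} \tag{35}\\
\mathbf{D}_{ap} &= \begin{bmatrix}\mathrm{diag}(\mathbf{P}_a)^{-1} & \mathbf{0}\\ \mathbf{0} & \mathrm{diag}(\mathbf{S}_a)^{-1}\boldsymbol{\Sigma}_a\end{bmatrix} \tag{36}\\
\boldsymbol{\Sigma}_a &= \mathrm{diag}\big(\mathbf{S}_a^{-1}\mathbf{S}_a^{-\dagger}\big)^{-1/2}, \tag{37}
\end{align}

where $\mathbf{D}_{ap}$ in (36) is chosen to have $\mathrm{diag}(\mathbf{W}_{ap}\mathbf{W}_{ap}^\dagger)=\mathbf{I}$.

Next, applying $\mathbf{W}_{ap}$ to filter $\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})$ in (33) as

$$\big\|\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})\big\|^2 \xrightarrow{\ \mathbf{W}_{ap}\ } \big\|\mathbf{W}_{ap}\mathbf{L}_a(\tilde{\mathbf{W}}\mathbf{y}-\mathbf{x})\big\|^2, \tag{38}$$

and dropping the terms that do not depend on $\mathbf{x}$, the metric computed by the augmented WLD (AWLD) detector corresponding to (33) takes the form

$$\mu_{ap}(\mathbf{y}|\mathbf{x}) = 2\Re\{\mathbf{y}^\dagger\mathbf{F}_{ap}\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_{ap}\mathbf{x}+\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x}, \tag{39}$$

where

\begin{align}
\mathbf{F}_{ap} &= \tilde{\mathbf{W}}^\dagger\mathbf{L}_{ap}^\dagger\mathbf{L}_{ap} \tag{40}\\
\mathbf{G}_{ap} &= \mathbf{L}_{ap}^\dagger\mathbf{L}_{ap}. \tag{41}
\end{align}

The corresponding AWLD detection model (Fig. 1) becomes

$$p_{ap}(\mathbf{y}|\mathbf{x}) = \exp\Big(2\Re\{\mathbf{y}^\dagger\mathbf{F}_{ap}\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_{ap}\mathbf{x}+\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x}\Big). \tag{42}$$
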
###### Theorem 2.

Under the same assumptions as Theorem 1, the AIR of the augmented WLD detector based on (42) with $\mathbf{F}_{ap}$, $\mathbf{G}_{ap}$ given in (40), (41) respectively, is lower-bounded by

$$I^{\text{AWLD}}_{\text{LB}}=N\log E_s+\log\det\big(\mathbf{L}_{ap}^\dagger\mathbf{L}_{ap}\big). \tag{43}$$

###### Proof:

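The lower bound on the AIR of the AWLD detector based on (42) is defined as

$$I^{\text{AWLD}}_{\text{LB}} = \mathbb{E}_{Y,X}\big[\log p_{ap}(\mathbf{y}|\mathbf{x})\big]-\mathbb{E}_{Y}\big[\log p_{ap}(\mathbf{y})\big], \tag{44}$$

where $p_{ap}(\mathbf{y})=\int p_{ap}(\mathbf{y}|\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$, assuming $\mathbf{x}\sim\mathcal{CN}(\mathbf{0},E_s\mathbf{I}_N)$. The main difference compared to the proof of Theorem 1 is the effect of the term $\frac{1}{E_s}\mathbf{x}^\dagger\mathbf{x}$ in (42) when evaluating $p_{ap}(\mathbf{y})$ under Gaussian densities, which annihilates the effect of the prior density to give

$$p_{ap}(\mathbf{y}) = \frac{1}{\pi^{N}E_s^{N}}\int\exp\big(2\Re\{\mathbf{y}^\dagger\mathbf{F}_{ap}\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_{ap}\mathbf{x}\big)\,d\mathbf{x}. \tag{45}$$

After some manipulations, the expectations in (44) become

\begin{align}
\mathbb{E}_{Y,X}\big[\log p_{ap}(\mathbf{y}|\mathbf{x})\big] &= N-E_s\,\mathrm{Tr}(\mathbf{G}_{ap})+2E_s\,\Re\{\mathrm{Tr}(\mathbf{F}_{ap}^\dagger\mathbf{H})\}\\
-\mathbb{E}_{Y}\big[\log p_{ap}(\mathbf{y})\big] &= N\log E_s+\log\det(\mathbf{G}_{ap})-\mathrm{Tr}\Big(\mathbf{F}_{ap}^\dagger\big[E_s\mathbf{H}\mathbf{H}^\dagger+N_0\mathbf{I}\big]\mathbf{F}_{ap}\mathbf{G}_{ap}^{-1}\Big).
\end{align}

Substituting (41) and (40) for $\mathbf{G}_{ap}$ and $\mathbf{F}_{ap}$, and applying (31) for $\tilde{\mathbf{W}}$, then $\mathbf{F}_{ap}^\dagger\mathbf{H}=\mathbf{G}_{ap}\tilde{\mathbf{W}}\mathbf{H}$. Also, it is easy to show that

$$\tilde{\mathbf{W}}\mathbf{H} = \big[\mathbf{H}^\dagger\mathbf{H}+\alpha\mathbf{I}_N\big]^{-1}\mathbf{H}^\dagger\mathbf{H}=\mathbf{I}-\alpha\big[\alpha\mathbf{I}_N+\mathbf{H}^\dagger\mathbf{H}\big]^{-1}, \tag{46}$$

from which it follows that this matrix product is Hermitian. Therefore, $\mathrm{Tr}(\mathbf{F}_{ap}^\dagger\mathbf{H})$ is real. Adding the two expectations above results in

$$I^{\text{AWLD}}_{\text{LB}} = N\log E_s+\log\det(\mathbf{G}_{ap})-\mathrm{Tr}\Big(\mathbf{G}_{ap}\Big[\frac{1}{E_s}\mathbf{I}+\frac{1}{N_0}\mathbf{H}^\dagger\mathbf{H}\Big]^{-1}\Big)+N,$$

from which (43) follows since $\mathrm{Tr}\big(\mathbf{G}_{ap}\big[\frac{1}{E_s}\mathbf{I}+\frac{1}{N_0}\mathbf{H}^\dagger\mathbf{H}\big]^{-1}\big)=\mathrm{Tr}(\mathbf{W}_{ap}^\dagger\mathbf{W}_{ap})=N$. ∎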

With the punctured structure of the channel matrix as given in (34)-(36), the gap of $I^{\text{AWLD}}_{\text{LB}}$ to AWGN capacity can be determined using the following corollary.

###### Corollary 1.

The gap of the AIR of the AWLD detector to AWGN capacity is

$$C_{\text{AWGN}}-I^{\text{AWLD}}_{\text{LB}} = \sum_{k=1}^{N-\nu}\log\Big(s_{a_{kk}}^{2}\,\big\|[\mathbf{S}_a^{-1}]_{\bar{k}}\big\|^{2}+1\Big), \tag{47}$$

where $s_{a_{kk}}$ is the $k$th diagonal element of $\mathbf{S}_a$ in (35), and $[\mathbf{S}_a^{-1}]_{\bar{k}}$ is the row vector consisting of the first $k-1$ elements in row $k$ of $\mathbf{S}_a^{-1}$ (excluding the diagonal element).

###### Proof:

Applying (34)-(36) in (43), the $\log\det$ term splits and the capacity term $C_{\text{AWGN}}$ emerges. ∎
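Theorem 2 and Corollary 1 can be cross-checked numerically. The sketch below is illustrative only: the function names are hypothetical, natural logarithms are used (nats), and $C_{\text{AWGN}}$ is taken to be $\log\det(\mathbf{I}+\frac{E_s}{N_0}\mathbf{H}^\dagger\mathbf{H})$, which is an assumption about the intended capacity expression.

```python
import numpy as np

def ql(H):
    """Thin QL decomposition H = Q @ L with a real positive diagonal in L."""
    Qr, Rr = np.linalg.qr(H[:, ::-1])
    Q, L = Qr[:, ::-1], Rr[::-1, ::-1]
    ph = np.diag(L) / np.abs(np.diag(L))
    return Q * ph, ph.conj()[:, None] * L

def augmented_L(H, Es, N0):
    """Lower-triangular factor L_a of the augmented channel in (24)."""
    N = H.shape[1]
    Ha = np.vstack([H / np.sqrt(N0), np.eye(N) / np.sqrt(Es)])
    return ql(Ha)[1]

def awld_air_lower_bound(H, Es, N0, nu):
    """Evaluate (43): N log Es + log det(L_ap^H L_ap), in nats."""
    N = H.shape[1]
    La = augmented_L(H, Es, N0)
    Wap = np.eye(N, dtype=complex)
    if nu < N:
        Sa_inv = np.linalg.inv(La[nu:, nu:])
        Wap[nu:, nu:] = Sa_inv / np.linalg.norm(Sa_inv, axis=1, keepdims=True)
    Lap = Wap @ La
    # L_ap is triangular, so log det(L_ap^H L_ap) = 2 * sum(log |diag|).
    return N * np.log(Es) + 2 * np.sum(np.log(np.abs(np.diag(Lap))))

def awld_gap(H, Es, N0, nu):
    """Evaluate the capacity gap in (47), in nats."""
    La = augmented_L(H, Es, N0)
    Sa = La[nu:, nu:]
    if Sa.shape[0] == 0:
        return 0.0
    Sa_inv = np.linalg.inv(Sa)
    gap = 0.0
    for k in range(Sa.shape[0]):
        off = Sa_inv[k, :k]                       # entries left of the diagonal in row k
        gap += np.log(Sa[k, k].real ** 2 * np.linalg.norm(off) ** 2 + 1)
    return gap

def capacity(H, Es, N0):
    """Gaussian-input capacity log det(I + (Es/N0) H^H H) in nats."""
    N = H.shape[1]
    return np.linalg.slogdet(np.eye(N) + (Es / N0) * H.conj().T @ H)[1]
```

For any $\nu$, `capacity(H, Es, N0) - awld_air_lower_bound(H, Es, N0, nu)` should agree with `awld_gap(H, Es, N0, nu)` up to rounding, and the gap vanishes for $\nu=N$.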

It is worth noting that computing the augmented channel requires simple processing steps comparable to QL decomposition. In particular, matrix inversion is not needed to compute $\tilde{\mathbf{W}}$ in (32) because the inverse of $\mathbf{L}_a$ is available from (29). Moreover, following the modular approach of [2015_mansour_JSP_2x2QAM], an efficient hardware architecture for an AWLD MIMO detector can be constructed from optimized MIMO detector cores. Finally, extensions to include soft-input information, imperfect channel estimation effects, and correlated channels are directly applicable based on [2017_hu_softoutput_AIR].

## V Modified MIMO Detection Model

Instead of working with Euclidean-distance based metrics as in (3), the authors in [2012_rusek_optimal_channel_short] propose replacing the channel-dependent terms in (4) with mismatched parameters that are subject to AIR optimization. As a result, instead of the true conditional probability in (2), the mismatched model of [2012_rusek_optimal_channel_short] is

\begin{align}
\mu_r(\mathbf{y}|\mathbf{x}) &= 2\Re\{\mathbf{y}^\dagger\mathbf{F}_r\mathbf{x}\}-\mathbf{x}^\dagger\mathbf{G}_r\mathbf{x} \tag{48}\\
p_r(\mathbf{y}|\mathbf{x}) &= \exp\big(\mu_r(\mathbf{y}|\mathbf{x})\big) \tag{49}
\end{align}

where $N_0$ is absorbed into $\mathbf{F}_r$ and $\mathbf{G}_r$. It is shown in [2012_rusek_optimal_channel_short] that detectors limited to the Euclidean-based model in (5), where $\mathbf{G}_r$ admits a Cholesky factorization, are not optimal from a mutual information perspective because the resulting optimal matrix $\mathbf{G}_r$ to use in (49) may not be positive semi-definite, and hence no such factorization exists. By maximizing the lower bound on the achievable rate based on (49), the authors in [2012_rusek_optimal_channel_short] derive an explicit expression for the optimal front-end filter $\mathbf{F}_r$, which is the MMSE filter compensated by the receiver tree processing through $\mathbf{G}_r$. Using this filter, the authors in [2017_hu_softoutput_AIR] derive an explicit expression for the optimal $\mathbf{G}_r$ so that the tree processing term $\mathbf{x}^\dagger\mathbf{G}_r\mathbf{x}$ admits a Cholesky factorization whose factor has a punctured structure analogous to that of the WLD scheme [2014_mansour_SPL_WLD].

In this work, we propose the following modified model

 μm(y|x)