# The Bussgang Decomposition of Non-Linear Systems: Basic Theory and MIMO Extensions

Many of the systems that appear in various signal processing applications are non-linear, for example, due to hardware impairments such as non-linear amplifiers and finite-resolution quantization. The Bussgang decomposition is a popular tool for analyzing the performance of systems that involve such non-linear components. In a nutshell, the decomposition provides an exact probabilistic relationship between the output and the input of a non-linearity: the output is equal to a scaled version of the input plus uncorrelated distortion. The decomposition can either be used to compute exact performance results or lower bounds where the uncorrelated distortion is treated as independent noise. This lecture note explains the basic theory, provides key examples, extends the theory to complex-valued vector signals, and clarifies some potential misconceptions.


## I Relevance

The origin of the decomposition is a technical report by Julian J. Bussgang from 1952 [1]. Interestingly, the decomposition is not explicitly stated in his report, but rather a consequence of his results. In fact, it is mainly non-trivial extensions of his results that are utilized in current research; for example, applications to complex-valued multiple-input multiple-output (MIMO) systems are popular in the communication community. There is no standard reference that presents and proves those extended results, and it can be hard to differentiate between which results are exact and which are mere approximations. This lecture note fills these gaps.

## II Prerequisites

This lecture note requires basic knowledge of random variables, linear algebra, signals and systems, and estimation theory.

## III Original Bussgang Decomposition for Real Gaussian Random Variables

In the original paper [1], Bussgang considers two jointly Gaussian stationary random processes $f(t)$ and $g(t)$. The process $f(t)$ undergoes a non-linear memoryless distortion represented by the function $U(\cdot)$. The resulting non-Gaussian random process is

$$F(t) = U(f(t)). \tag{1}$$

Bussgang computed the cross-correlation of the two random variables obtained by sampling $F(t)$ and $g(t)$ at specific time instances. Let $x$ and $y$ denote the zero-mean Gaussian random variables obtained by sampling $f(t)$ and $g(t)$, respectively, at two such time instances. Moreover, let $z = U(x)$ be the sampled output of the non-linear distortion function. We then have the following main result from [1, Sec. III].

###### Theorem 1 (The Bussgang theorem).

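The cross-correlation of $z$ and $y$ is

$$\mathbb{E}\{zy\} = \mathbb{E}\{U(x)y\} = \underbrace{\frac{\mathbb{E}\{U(x)x\}}{\mathbb{E}\{x^2\}}}_{\triangleq B}\, \mathbb{E}\{xy\}, \tag{2}$$

where $B$ is called the Bussgang gain and $\mathbb{E}\{xy\}$ is the cross-correlation of $x$ and $y$.

The Bussgang theorem shows that the cross-correlation between two Gaussian signals is the same before and after one of them has passed through a non-linear function, except for a scaling factor $B$. The value of $B$ depends on the choice of $U(\cdot)$, but the theorem holds for any function.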

A consequence of Theorem 1 for $y = x$ is that the output signal can be decomposed as

$$z = U(x) = Bx + \eta, \tag{3}$$

where $\eta$ is a zero-mean random variable that is uncorrelated with both $x$ and $y$. This is the Bussgang decomposition in its elementary form and shows that the output contains the useful part $Bx$ and the distortion part $\eta$. In other words, the output of a non-linear function is equal to a scaled version of the input $x$ plus the uncorrelated distortion $\eta$. Note that $x$ and $\eta$ are not independent. Since $\eta = U(x) - Bx$ is a deterministic function of $x$, the distortion term is non-Gaussian distributed and statistically dependent on $x$. Although the Bussgang decomposition is named after Bussgang, the result is not explicitly stated in [1].
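The decomposition in (3) can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from [1]: the distortion `tanh`, the input power `Cx`, and the way `y` is generated are assumptions chosen only for illustration. It estimates the Bussgang gain from samples, forms the distortion term, and shows that the distortion is uncorrelated with a jointly Gaussian `y` while a higher-order moment reveals its dependence on `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
Cx = 2.0                                      # assumed input power

x = rng.normal(0.0, np.sqrt(Cx), n)           # zero-mean Gaussian input
# y is jointly Gaussian with x (assumed correlation, for illustration only)
y = 0.7 * x + rng.normal(0.0, 1.0, n)

z = np.tanh(x)                                # hypothetical memoryless distortion U(x)

B = np.mean(z * x) / np.mean(x**2)            # Bussgang gain E{U(x)x}/E{x^2}
eta = z - B * x                               # distortion term in z = B x + eta

corr_eta_y = np.mean(eta * y)                 # close to 0: eta is uncorrelated with y
dep_eta_x3 = np.mean(eta * x**3)              # clearly non-zero: eta depends on x

print(f"B = {B:.3f}, E[eta*y] = {corr_eta_y:.1e}, E[eta*x^3] = {dep_eta_x3:.3f}")
```

Since `B` is estimated from the same samples, the sample correlation between `eta` and `x` is zero by construction; the check against `y` is the non-trivial part.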

## IV Bussgang Decomposition for Complex Random Variables

The Bussgang theorem was extended to the complex case in [2]. We will present this result and then provide a direct proof that is inspired by [3]. For notational convenience, in the remainder of this lecture note, we use $C_x = \mathbb{E}\{|x|^2\}$ to denote the power of a signal $x$ and we use $C_{xy} = \mathbb{E}\{xy^*\}$ to denote the cross-correlation between $x$ and $y$.

###### Theorem 2 (The complex Bussgang theorem).

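Consider the jointly circularly symmetric complex Gaussian random variables $x$ and $y$. Let $z = U(x)$ be the output of a deterministic function. The cross-correlations $C_{zy}$ and $C_{xy}$ are then related as

$$C_{zy} = \mathbb{E}\{U(x)y^*\} = \underbrace{\frac{\mathbb{E}\{U(x)x^*\}}{\mathbb{E}\{|x|^2\}}}_{\triangleq B = C_{zx}/C_x}\, \mathbb{E}\{xy^*\} = B C_{xy}. \tag{4}$$
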
###### Proof:

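We begin by decomposing $y$ into two parts:

$$y = \frac{\mathbb{E}\{yx^*\}}{\mathbb{E}\{|x|^2\}}\, x + \epsilon. \tag{5}$$

Interestingly, this is equivalent to computing a minimum mean-squared error (MMSE) estimate of $y$ given $x$, with $\epsilon$ representing the estimation error. Hence, it follows that the second part, $\epsilon$, in (5) is uncorrelated with $x$:

$$\mathbb{E}\{\epsilon x^*\} = \mathbb{E}\{yx^*\} - \frac{\mathbb{E}\{yx^*\}}{\mathbb{E}\{|x|^2\}}\, \mathbb{E}\{|x|^2\} = 0. \tag{6}$$

Since $x$ and $\epsilon$ are jointly Gaussian, the fact that $x$ and $\epsilon$ are uncorrelated implies that they are also independent complex Gaussian variables. By using the decomposition in (5), it follows that

$$C_{zy} = \mathbb{E}\{U(x)y^*\} = \frac{\mathbb{E}\{U(x)x^*\}}{\mathbb{E}\{|x|^2\}}\, \mathbb{E}\{xy^*\} + \underbrace{\mathbb{E}\{U(x)\epsilon^*\}}_{=0} = B C_{xy} \tag{7}$$

by using that the independence between $x$ and $\epsilon$ implies $\mathbb{E}\{U(x)\epsilon^*\} = \mathbb{E}\{U(x)\}\mathbb{E}\{\epsilon^*\} = 0$. ∎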

The complex Bussgang theorem is the natural complex-valued extension of Theorem 1. The corresponding complex Bussgang decomposition is given by (3), with the only exception that the Bussgang gain is now computed as $B = \mathbb{E}\{U(x)x^*\}/\mathbb{E}\{|x|^2\}$ instead.
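A quick numerical sanity check of Theorem 2 can look as follows. This is only a sketch under stated assumptions: the amplitude clipper `U`, the helper `crandn`, and the mixing coefficients used to create `y` are hypothetical choices, not part of the lecture note.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

def crandn(size):
    """Circularly symmetric complex Gaussian samples with unit power."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

x = crandn(n)
y = (0.6 - 0.3j) * x + 0.5 * crandn(n)        # jointly Gaussian with x (assumed mixing)

def U(v, a_max=1.0):
    """Hypothetical amplitude clipper: keeps the phase, limits |v| to a_max."""
    mag = np.maximum(np.abs(v), 1e-12)
    return v * np.minimum(1.0, a_max / mag)

z = U(x)
B = np.mean(z * np.conj(x)) / np.mean(np.abs(x)**2)   # complex Bussgang gain
C_zy = np.mean(z * np.conj(y))                        # cross-correlation after distortion
C_xy = np.mean(x * np.conj(y))                        # cross-correlation before distortion

print("C_zy     =", np.round(C_zy, 4))
print("B * C_xy =", np.round(B * C_xy, 4))
```

The two printed numbers should agree up to Monte Carlo error, which illustrates (4).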

A first use case of the Bussgang decomposition is to quantify the signal-to-distortion ratio (SDR) at the output of the distortion function. The SDR is simply the power ratio of the desired signal $Bx$ to the additive distortion $\eta$:

$$\mathrm{SDR} = \frac{\mathbb{E}\{|Bx|^2\}}{\mathbb{E}\{|\eta|^2\}} = \frac{|B|^2 C_x}{C_z - |B|^2 C_x}, \tag{8}$$

where we have used that the additive distortion $\eta$ is uncorrelated with the desired signal $Bx$.
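To illustrate (8), the sketch below evaluates the SDR of a one-bit quantizer applied separately to the real and imaginary parts of a unit-power complex Gaussian signal. The quantizer and its output scaling are assumptions made for this example. For this particular choice, $B = \sqrt{2/\pi}$ and $C_z = 1$, so (8) gives $\mathrm{SDR} = (2/\pi)/(1 - 2/\pi) \approx 1.75$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # Cx = 1

# Hypothetical one-bit quantizer applied to the real and imaginary parts
z = (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

Cx = np.mean(np.abs(x)**2)
Cz = np.mean(np.abs(z)**2)
B = np.mean(z * np.conj(x)) / Cx                 # Bussgang gain

eta = z - B * x                                  # distortion term
sdr_direct = np.mean(np.abs(B * x)**2) / np.mean(np.abs(eta)**2)
sdr_formula = np.abs(B)**2 * Cx / (Cz - np.abs(B)**2 * Cx)   # right-hand side of (8)
sdr_theory = (2 / np.pi) / (1 - 2 / np.pi)

print(f"direct: {sdr_direct:.3f}, formula: {sdr_formula:.3f}, theory: {sdr_theory:.3f}")
```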

A second use case is to analyze the performance of a communication system where $x$ is the transmitted information signal. Suppose the received signal is the noisy distorted signal $U(x) + w$, where $U(\cdot)$ models the hardware distortion and $w$ is thermal noise with power $\sigma^2$. The hardware distortion might, for example, be caused by a sequence of non-ideal blocks in the receiver hardware [4], as illustrated in Fig. 1. The first block is the low-noise amplifier (LNA) that can distort both the amplitude and phase of the input signal. In the yellow figure, the amplitude distortion is exemplified and clipping occurs for input signals with large amplitudes. The second block is the in-phase/quadrature (I/Q) demodulator that might have mismatches between its branches, leading to I/Q imbalance. In the green curve, the effect of I/Q imbalance is shown on a QPSK constellation where the actual transmitted points are affected by leakage from the mirror subcarriers. Finally, in the analog-to-digital converter (ADC) block, the real and imaginary parts of the received signal are quantized to be represented by a finite number of bits. Quantization distortion is inevitable even if a large number of ADC bits are used [5, 6]. We can use the Bussgang decomposition in (3) to rewrite the received signal as

$$U(x) + w = \underbrace{Bx}_{\text{Desired signal}} + \underbrace{\eta + w}_{\text{Uncorrelated signal}}. \tag{9}$$

This signal contains a desired part $Bx$ and an uncorrelated additive “noise” term $\eta + w$. Since the latter term is uncorrelated with $x$, we can utilize the worst-case uncorrelated additive noise theorem from [7] to compute an achievable data rate. That theorem says that the worst distribution of $\eta + w$ from a rate perspective is independent complex Gaussian, in which case the rate is

$$\log_2\!\left(1 + \frac{\mathbb{E}\{|Bx|^2\}}{\mathbb{E}\{|\eta|^2\} + \sigma^2}\right) = \log_2\!\left(1 + \frac{|B|^2 C_x}{C_z - |B|^2 C_x + \sigma^2}\right) \text{ bit per channel use}. \tag{10}$$

One can possibly achieve a larger rate than (10) by somehow making use of the information content in $\eta$. But we achieve (10) if we treat $\eta$ as independent Gaussian noise in the decoder.
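The rate expression in (10) is straightforward to evaluate from samples. The sketch below assumes a hypothetical amplitude clipper as the hardware model and an arbitrary noise power `sigma2`; the function name `bussgang_rate` is likewise introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
sigma2 = 0.1                                     # assumed thermal noise power
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def clip(v, a_max):
    """Hypothetical amplifier model: keep the phase, limit the amplitude to a_max."""
    mag = np.maximum(np.abs(v), 1e-12)
    return v * np.minimum(1.0, a_max / mag)

def bussgang_rate(x, z, sigma2):
    """Evaluate (10) from samples of the input x and the distorted output z."""
    Cx = np.mean(np.abs(x)**2)
    Cz = np.mean(np.abs(z)**2)
    B = np.mean(z * np.conj(x)) / Cx
    signal = np.abs(B)**2 * Cx                   # power of the desired part Bx
    return np.log2(1.0 + signal / (Cz - signal + sigma2))

rate_ideal = np.log2(1.0 + np.mean(np.abs(x)**2) / sigma2)   # no hardware distortion
rates = {a: bussgang_rate(x, clip(x, a), sigma2) for a in (0.5, 1.0, 2.0)}

for a, r in rates.items():
    print(f"clipping level {a}: {r:.2f} bit per channel use (ideal {rate_ideal:.2f})")
```

With this clipper, the rate grows with the clipping level and approaches the distortion-free rate when clipping becomes rare.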

### IV-A Alternative Computation of the Bussgang Gain and Two Examples

If the distortion function is differentiable, there is an alternative way of computing the Bussgang gain that might be easier. We will exemplify how to compute it in the real-valued case where $x \sim \mathcal{N}(0, C_x)$ has the probability density function (PDF) $p(x) = \frac{1}{\sqrt{2\pi C_x}} e^{-x^2/(2C_x)}$. Note that the derivative of this PDF is $p'(x) = -\frac{x}{C_x} p(x)$. We can then rewrite the Bussgang gain as

$$B = \frac{\mathbb{E}\{U(x)x\}}{C_x} = \int_{-\infty}^{\infty} U(x) \frac{x}{C_x} p(x)\, dx \overset{(a)}{=} -\int_{-\infty}^{\infty} U(x) p'(x)\, dx \overset{(b)}{=} \int_{-\infty}^{\infty} U'(x) p(x)\, dx = \mathbb{E}\{U'(x)\}, \tag{11}$$

where we identify $p'(x)$ in $(a)$ and integrate by parts to get $(b)$. The last expression in (11) reveals that the Bussgang gain can also be computed as the expected value of the first derivative of the distortion function. This result is a special case of Price’s theorem [8, Example 9-17].

###### Example 1 (One-bit quantization).

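Consider a real-valued signal $x \sim \mathcal{N}(0, C_x)$ that enters the non-linear distortion function $U(x) = \mathrm{sgn}(x)$, which represents one-bit quantization. The Bussgang gain can then be found as $B = \mathbb{E}\{U'(x)\} = \mathbb{E}\{2\delta(x)\} = 2p(0) = \sqrt{2/(\pi C_x)}$, where $\delta(x)$ is the Dirac function. The same Bussgang gain can be computed as $\mathbb{E}\{U(x)x\}/C_x = \mathbb{E}\{|x|\}/C_x = \sqrt{2/(\pi C_x)}$.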
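Both ways of computing the Bussgang gain can be compared numerically. In the sketch below, the smooth function `tanh` and the input power `Cx` are assumptions used to illustrate (11), and the one-bit quantizer is taken as the sign function with unit output levels, for which the closed-form gain is $\sqrt{2/(\pi C_x)}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
Cx = 1.5                                        # assumed input power
x = rng.normal(0.0, np.sqrt(Cx), n)

# Smooth example (hypothetical): U(x) = tanh(x), with U'(x) = 1 - tanh(x)^2
B_corr = np.mean(np.tanh(x) * x) / Cx           # E{U(x)x}/Cx
B_deriv = np.mean(1.0 - np.tanh(x)**2)          # E{U'(x)}, last expression in (11)

# One-bit quantizer (assumed U(x) = sgn(x)): compare with sqrt(2/(pi*Cx))
B_onebit = np.mean(np.sign(x) * x) / Cx
B_onebit_theory = np.sqrt(2.0 / (np.pi * Cx))

print(f"tanh:    {B_corr:.4f} (correlation) vs {B_deriv:.4f} (derivative)")
print(f"one-bit: {B_onebit:.4f} (samples) vs {B_onebit_theory:.4f} (closed form)")
```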

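A similar alternative way of computing the Bussgang gain exists in the complex-valued case, where the derivative of the distortion function is defined as [9]:

$$\frac{\partial U(x)}{\partial x} = \frac{1}{2}\left(\frac{\partial U(x)}{\partial \Re\{x\}} - j\frac{\partial U(x)}{\partial \Im\{x\}}\right). \tag{12}$$

One can then show that the Bussgang gain can be computed as [9]

$$B = \mathbb{E}\left\{\frac{\partial U(x)}{\partial x}\right\}. \tag{13}$$
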
###### Example 2 (Third-order non-linearity).

Consider a complex-valued signal $x$ that enters a third-order non-linear distortion function $U(x)$, which might model a non-linear amplifier [3, 10]. The Bussgang gain can be obtained as $B = \mathbb{E}\{U(x)x^*\}/C_x$. The same number is found by evaluating $\mathbb{E}\{\partial U(x)/\partial x\}$ using (12).
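As a concrete instance, the sketch below assumes the third-order polynomial $U(x) = x + a|x|^2 x$ with a hypothetical real coefficient `a`; this specific form and the value of `a` are illustrative assumptions. For $x \sim \mathcal{N}_{\mathbb{C}}(0, C_x)$ we have $\mathbb{E}\{|x|^4\} = 2C_x^2$, so the correlation-based gain is $1 + 2aC_x$, and the derivative in (12) equals $1 + 2a|x|^2$, which has the same mean.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
Cx = 1.0                                         # assumed input power
a = -0.1                                         # hypothetical third-order coefficient
x = np.sqrt(Cx / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

z = x + a * np.abs(x)**2 * x                     # assumed U(x) = x + a|x|^2 x

B_corr = np.mean(z * np.conj(x)) / Cx            # E{U(x)x*}/Cx
# Derivative (12) of x + a x^2 x^* with respect to x: 1 + 2a|x|^2
B_deriv = np.mean(1.0 + 2.0 * a * np.abs(x)**2)  # E{dU/dx} as in (13)
B_theory = 1.0 + 2.0 * a * Cx                    # uses E{|x|^4} = 2 Cx^2

print(f"correlation: {B_corr.real:.4f}, derivative: {B_deriv:.4f}, closed form: {B_theory:.4f}")
```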

### IV-B Additive Quantization Noise Model is Nothing But Bussgang Decomposition

The Bussgang decomposition is unique in the sense that it is the only decomposition of a distorted signal having the property that the additive distortion noise $\eta$ is uncorrelated with the desired signal $Bx$. No other value of $B$ can be used to achieve that.

One seemingly different decomposition is the Additive Quantization Noise Model (AQNM), originally proposed in [5] to model quantization errors. This model is sometimes described as an alternative decomposition; however, AQNM is nothing but the Bussgang decomposition for quantization. In [5, Lemma 1], a scalar quantizer function $Q(\cdot)$ is considered, which has the property $\mathbb{E}\{x \mid Q(x)\} = Q(x)$, which means that each quantization interval is represented by its mean value. When the input is $x$, it is shown that the output $z = Q(x)$ can be expressed as a summation of a scaled version of $x$ plus an uncorrelated distortion term $\eta$ as follows:

$$z = Q(x) = (1 - \beta)x + \eta, \tag{14}$$

where $\beta = \mathbb{E}\{|x - z|^2\}/C_x$ and $\mathbb{E}\{|\eta|^2\} = \beta(1 - \beta)C_x$.

We will show that (14) equals the Bussgang decomposition $z = Bx + \eta$, where the Bussgang gain equals $B = 1 - \beta$. Using the assumption $\mathbb{E}\{x \mid Q(x)\} = Q(x)$ from [5], we have

$$C_{zx} = \mathbb{E}\{Q(x)x^*\} = \mathbb{E}\{\mathbb{E}\{Q(x)x^* \mid Q(x)\}\} = \mathbb{E}\{Q(x)Q^*(x)\} = C_z. \tag{15}$$

By utilizing this result, the scaling in (14) can be rewritten as

$$1 - \beta = 1 - \frac{\mathbb{E}\{|x - z|^2\}}{C_x} = 1 - \frac{C_x + C_z - C_{zx} - C_{zx}^*}{C_x} = \frac{C_{zx}}{C_x} = B. \tag{16}$$

Hence, the AQNM is a special case of the Bussgang decomposition for distortion functions that satisfy a particular condition. The bottom line is that the Bussgang decomposition is unique, but the value of $B$ depends on the distortion function.
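The equivalence between the AQNM scaling and the Bussgang gain can be verified with a small experiment. The sketch below builds a quantizer whose output levels are the empirical conditional means of the cells, so that the assumption from [5] holds on the sample; the thresholds in `edges` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
x = rng.normal(0.0, 1.0, n)                       # real Gaussian input

edges = np.array([-1.5, -0.5, 0.5, 1.5])          # assumed thresholds (5 cells)
cell = np.digitize(x, edges)                      # cell index of every sample
# Represent each cell by its empirical conditional mean, so E{x | Q(x)} = Q(x)
levels = np.array([x[cell == k].mean() for k in range(len(edges) + 1)])
z = levels[cell]                                  # quantizer output Q(x)

Cx, Cz, Czx = np.mean(x**2), np.mean(z**2), np.mean(z * x)
beta = np.mean((x - z)**2) / Cx                   # AQNM parameter in (14)
B = Czx / Cx                                      # Bussgang gain
eta = z - B * x                                   # distortion term

print(f"C_zx = {Czx:.4f}, C_z = {Cz:.4f}")        # equal, as in (15)
print(f"1 - beta = {1 - beta:.4f}, B = {B:.4f}")  # equal, as in (16)
```

Because the levels are computed from the same samples, the identities (15) and (16) hold up to floating-point precision here rather than only approximately.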

## V Extension to MIMO Systems

In recent years, it has become popular to analyze MIMO systems that are subject to hardware impairments, in particular, in MIMO communications [11, 12, 6]. In this part, we extend the Bussgang results to be applicable to such cases.

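Consider two jointly circularly symmetric Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$, which both have length $M$. The correlation matrices are denoted as $\mathbf{C}_{x} = \mathbb{E}\{\mathbf{x}\mathbf{x}^{\mathrm{H}}\}$ and $\mathbf{C}_{y} = \mathbb{E}\{\mathbf{y}\mathbf{y}^{\mathrm{H}}\}$ and are assumed to have full rank. The cross-correlation matrix is denoted as $\mathbf{C}_{xy} = \mathbb{E}\{\mathbf{x}\mathbf{y}^{\mathrm{H}}\}$. Using this notation, we can generalize the Bussgang theorem as follows.

###### Theorem 3 (Bussgang Theorem for MIMO Distortions).

Consider the jointly circularly symmetric Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$. Let $U(\cdot)$ denote a distortion function and let $\mathbf{z} = U(\mathbf{x})$ be the distorted signal when using $\mathbf{x}$ as input. The cross-correlation matrix $\mathbf{C}_{zy}$ of $\mathbf{z}$ and $\mathbf{y}$ is a linear transformation of the cross-correlation matrix $\mathbf{C}_{xy}$ of $\mathbf{x}$ and $\mathbf{y}$:

$$\mathbf{C}_{zy} = \mathbf{C}_{zx}\mathbf{C}_{x}^{-1}\mathbf{C}_{xy}. \tag{17}$$
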
###### Proof:

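The proof is a matrix extension of the proof of Theorem 2. Let us express $\mathbf{y}$ as a summation of the MMSE estimate of it given $\mathbf{x}$ and the estimation error $\boldsymbol{\epsilon}$:

$$\mathbf{y} = \mathbf{C}_{yx}\mathbf{C}_{x}^{-1}\mathbf{x} + \boldsymbol{\epsilon}, \tag{18}$$

where $\mathbf{C}_{yx}$ is defined as $\mathbb{E}\{\mathbf{y}\mathbf{x}^{\mathrm{H}}\}$. If we multiply both sides of (18) by $\mathbf{x}^{\mathrm{H}}$ from the right and take the expectation, we obtain

$$\mathbf{C}_{yx} = \mathbf{C}_{yx}\mathbf{C}_{x}^{-1}\mathbf{C}_{x} + \mathbb{E}\{\boldsymbol{\epsilon}\mathbf{x}^{\mathrm{H}}\} = \mathbf{C}_{yx} + \mathbb{E}\{\boldsymbol{\epsilon}\mathbf{x}^{\mathrm{H}}\}, \tag{19}$$

from which it follows that $\mathbb{E}\{\boldsymbol{\epsilon}\mathbf{x}^{\mathrm{H}}\} = \mathbf{0}$. Hence, $\mathbf{x}$ and $\boldsymbol{\epsilon}$ are uncorrelated, which implies that they are also independent since these are jointly Gaussian variables. Finally, we obtain (17) as

$$\mathbf{C}_{zy} = \mathbb{E}\{\mathbf{z}\mathbf{y}^{\mathrm{H}}\} = \mathbb{E}\{\mathbf{z}\mathbf{x}^{\mathrm{H}}\}\mathbf{C}_{x}^{-1}\mathbf{C}_{yx}^{\mathrm{H}} + \mathbb{E}\{\mathbf{z}\boldsymbol{\epsilon}^{\mathrm{H}}\} = \mathbf{C}_{zx}\mathbf{C}_{x}^{-1}\mathbf{C}_{xy} \tag{20}$$

by utilizing that $\mathbf{C}_{yx}^{\mathrm{H}} = \mathbf{C}_{xy}$ and that $\mathbb{E}\{\mathbf{z}\boldsymbol{\epsilon}^{\mathrm{H}}\} = \mathbf{0}$ since $\mathbf{z}$ and $\boldsymbol{\epsilon}$ are independent. ∎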

From this theorem we notice that the Bussgang gain is represented by the matrix

$$\mathbf{B} = \mathbf{C}_{zx}\mathbf{C}_{x}^{-1} \tag{21}$$

and we call it a MIMO extension since the distortion function takes multiple inputs and provides multiple outputs. It is possible to extend the result to the case where $\mathbf{C}_{x}$ is rank-deficient, in which case the inverse in (21) is replaced by a pseudo-inverse; see [3, Section II.A] for details.

A consequence of Theorem 3 is the Bussgang decomposition for MIMO functions:

$$\mathbf{z} = U(\mathbf{x}) = \mathbf{B}\mathbf{x} + \boldsymbol{\eta}, \tag{22}$$

where the additive distortion term $\boldsymbol{\eta}$ is uncorrelated both with $\mathbf{x}$ and with any other random vector $\mathbf{y}$ that is correlated with $\mathbf{x}$ (and jointly circularly symmetric Gaussian with it, as in Theorem 3). This result is illustrated in Fig. 2(a).
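The matrix-valued results can be checked by simulation. The sketch below is an illustration under assumptions: the mixing matrices `A` and `G`, the helper functions, and the distortion `U` (cross-talk between neighboring branches followed by a saturation) are all hypothetical. It estimates the Bussgang gain matrix in (21) and verifies (17) and the uncorrelatedness of the distortion in (22).

```python
import numpy as np

rng = np.random.default_rng(7)
M, n = 3, 400_000

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit power."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def corr(a, b):
    """Sample estimate of E{a b^H} for column-wise realizations."""
    return a @ b.conj().T / a.shape[1]

A = np.array([[1, 0.5j, 0], [0.3, 1, -0.4], [0, 0.2j, 1]])        # assumed mixing
G = np.array([[0.5, 0, 0.2j], [0, -0.4, 0.1], [0.3j, 0.2, 0.6]])  # assumed mixing
x = A @ crandn(M, n)                      # correlated Gaussian input, one column per sample
y = G @ x + 0.5 * crandn(M, n)            # jointly Gaussian with x

def U(v):
    """Hypothetical MIMO distortion: cross-talk between branches, then saturation."""
    w = v + 0.3 * np.roll(v, 1, axis=0)
    return w / (1.0 + 0.5 * np.abs(w)**2)

z = U(x)
C_x, C_xy, C_zx, C_zy = corr(x, x), corr(x, y), corr(z, x), corr(z, y)
B = C_zx @ np.linalg.inv(C_x)             # Bussgang gain matrix (21)
eta = z - B @ x                           # distortion in (22)

err_theorem = np.max(np.abs(C_zy - B @ C_xy))   # deviation from (17), Monte Carlo error only
err_uncorr = np.max(np.abs(corr(eta, x)))       # E{eta x^H}, zero by construction
print(f"max |C_zy - B C_xy| = {err_theorem:.1e}, max |E[eta x^H]| = {err_uncorr:.1e}")
```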

### V-A Element-Wise Distortion for MIMO Systems

The Bussgang decomposition for MIMO functions has been widely used to model the hardware impairments in multiple-antenna communication systems [6, 11]. In this case, $M$ is the number of receive antennas and the distortion function $U(\cdot)$ represents impairments in the antenna branches. A common assumption is that there is no crosstalk between the branches, so that each one can be separately modeled in the way shown in Fig. 1. The distortion function then has the form

$$\mathbf{z} = U(\mathbf{x}) = \begin{bmatrix} U_1(x_1) \\ \vdots \\ U_M(x_M) \end{bmatrix}, \tag{23}$$

where $x_m$ denotes the $m$th element of $\mathbf{x}$. Hence, each output is a distorted version of only the input having the same index. We can then simplify the Bussgang matrix by utilizing Theorem 2 for each pair of elements. More precisely, it follows that $\mathbf{C}_{zx} = \mathbf{D}\mathbf{C}_{x}$, where $\mathbf{D} = \mathrm{diag}(d_1, \ldots, d_M)$ is a diagonal matrix and $d_m$ is the Bussgang gain corresponding to the $m$th component of the distortion function, i.e., $d_m = \mathbb{E}\{U_m(x_m)x_m^*\}/\mathbb{E}\{|x_m|^2\}$. Hence, the Bussgang gain matrix of the overall MIMO distortion becomes $\mathbf{B} = \mathbf{C}_{zx}\mathbf{C}_{x}^{-1} = \mathbf{D}$ and we obtain the simplified Bussgang decomposition

$$\mathbf{z} = \mathbf{D}\mathbf{x} + \boldsymbol{\eta} = \begin{bmatrix} d_1 x_1 \\ \vdots \\ d_M x_M \end{bmatrix} + \boldsymbol{\eta}. \tag{24}$$

Hence, when an element-wise distortion function affects the Gaussian signal $\mathbf{x}$, the output is an element-wise scaled version of $\mathbf{x}$ plus a distortion vector $\boldsymbol{\eta}$ that is uncorrelated with $\mathbf{x}$.
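The diagonal structure of the Bussgang gain matrix for element-wise distortions can be observed numerically. In the sketch below, the input correlation matrix `C` and the per-branch clipping levels `a_max` are assumptions for illustration. The full matrix from (21) is compared with the per-branch scalar gains.

```python
import numpy as np

rng = np.random.default_rng(8)
M, n = 3, 400_000

C = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])               # assumed input correlation matrix
L = np.linalg.cholesky(C)
w = (rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))) / np.sqrt(2)
x = L @ w                                      # correlated complex Gaussian input

a_max = np.array([[0.5], [1.0], [1.5]])        # hypothetical clipping level per branch
mag = np.maximum(np.abs(x), 1e-12)
z = x * np.minimum(1.0, a_max / mag)           # U_m acts on x_m only, as in (23)

C_x = x @ x.conj().T / n
C_zx = z @ x.conj().T / n
B = C_zx @ np.linalg.inv(C_x)                  # general Bussgang gain matrix (21)

# Scalar Bussgang gain of every branch
d = np.mean(z * x.conj(), axis=1) / np.mean(np.abs(x)**2, axis=1)
off_diag = B - np.diag(np.diag(B))

print("diag(B)            =", np.round(np.diag(B).real, 3))
print("per-branch gains d =", np.round(d.real, 3))
print("largest off-diagonal magnitude =", f"{np.max(np.abs(off_diag)):.1e}")
```

The off-diagonal entries of `B` are at the level of the Monte Carlo error, even though the input elements are clearly correlated.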

### V-B Are the Elements of Distortion η Uncorrelated?

Since the Bussgang gain matrix $\mathbf{B} = \mathbf{D}$ is diagonal when having element-wise distortions, one may tend to think that the elements of the distortion $\boldsymbol{\eta}$ will also be uncorrelated, so that we effectively get one separate Bussgang decomposition per received signal. However, this is generally not the case, as we will show next. Let $\mathbf{C}_{\eta} = \mathbb{E}\{\boldsymbol{\eta}\boldsymbol{\eta}^{\mathrm{H}}\}$ denote the correlation matrix of the distortion vector $\boldsymbol{\eta}$. Using the fact that $\boldsymbol{\eta}$ is uncorrelated with $\mathbf{x}$, it can be computed as

$$\mathbf{C}_{\eta} = \mathbf{C}_{z} - \mathbf{B}\mathbf{C}_{x}\mathbf{B}^{\mathrm{H}}. \tag{25}$$

Whenever the input signal contains correlated elements, such that $\mathbf{C}_{x}$ is non-diagonal, the correlation matrix $\mathbf{C}_{\eta}$ will likely also be non-diagonal. This is intuitively quite clear: If two (almost) identical signals are sent through identical hardware components, then the distortion should also be (almost) identical. This type of correlation typically appears in wireless communications since each receive antenna observes a different linear combination of the same transmitted information signals. Some conditions for when the correlation can be neglected, so that $\mathbf{C}_{\eta}$ is approximately diagonal, are derived in [3]. However, it is rather common that the correlation is neglected without motivation (cf. [6, 12]), which might lead to substantial approximation errors.

As an example, we consider a setup where a 4-antenna receiver quantizes the real and imaginary parts of each entry in the received signal using identical $b$-bit ADCs. The input signal $\mathbf{x}$ is generated from the transmitted signal $\mathbf{s}$ of a 4-antenna transmitter through the MIMO channel matrix $\mathbf{H}$. We consider Rayleigh fading where $\mathbf{H}$ has independent zero-mean complex Gaussian entries. For each channel realization, $\mathbf{H}$ is assumed perfectly known and the transmitted signal $\mathbf{s}$ is complex Gaussian, so $\mathbf{x}$ is conditionally complex Gaussian distributed. The Bussgang decomposition then says that the ADC output can be written as $\mathbf{z} = \mathbf{D}\mathbf{x} + \boldsymbol{\eta}$. To demonstrate that the elements of $\boldsymbol{\eta}$ are correlated, Fig. 3 shows the cumulative distribution function (CDF) of the normalized off-diagonal elements of $\mathbf{C}_{\eta}$ (i.e., the correlation coefficients) for different numbers of ADC bits. When the ADC resolution is low, most of the correlation coefficients are non-zero and some are rather large. However, when the ADC resolution is high, the off-diagonal elements are almost zero and can potentially be approximated as zero when quantifying communication rates.
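The dependence of the distortion correlation on the ADC resolution can be reproduced qualitatively with a much simpler setup than the one behind Fig. 3. The sketch below is not that experiment: it assumes only two antennas with a fixed input correlation `rho` and a hypothetical uniform quantizer `adc` with an arbitrary clipping range, and it reports the magnitude of the correlation coefficient between the two distortion terms.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400_000
rho = 0.9                                       # assumed correlation between two antennas

def crandn(size):
    """Circularly symmetric complex Gaussian samples with unit power."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

x1 = crandn(n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * crandn(n)
x = np.vstack([x1, x2])                         # 2 x n, unit power per antenna

def adc(v, bits, v_max=3.0):
    """Hypothetical uniform mid-rise quantizer with 2**bits levels on [-v_max, v_max]."""
    step = 2 * v_max / 2**bits
    idx = np.clip(np.floor(v / step), -2**(bits - 1), 2**(bits - 1) - 1)
    return (idx + 0.5) * step

def distortion_corr_coeff(bits):
    """Magnitude of the correlation coefficient between the two distortion elements."""
    z = adc(x.real, bits) + 1j * adc(x.imag, bits)   # quantize I and Q separately
    C_x = x @ x.conj().T / n
    B = (z @ x.conj().T / n) @ np.linalg.inv(C_x)    # Bussgang gain matrix
    eta = z - B @ x
    C_eta = eta @ eta.conj().T / n                   # equals C_z - B C_x B^H, cf. (25)
    return np.abs(C_eta[0, 1]) / np.sqrt(C_eta[0, 0].real * C_eta[1, 1].real)

coeffs = {b: distortion_corr_coeff(b) for b in (1, 2, 4, 6)}
for b, c in coeffs.items():
    print(f"{b}-bit ADC: |correlation coefficient| = {c:.3f}")
```

In this toy setup the one-bit case gives a clearly non-zero coefficient, while the coefficient is close to zero at six bits, in line with the trend described above.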

### V-C Generalized Bussgang Decomposition for Non-Gaussian Input Signals

In the Bussgang theorem, we are utilizing that $\mathbf{x}$ and $\mathbf{y}$ are Gaussian signals. The main result cannot be generalized to non-Gaussian signals. However, we can always decompose the distorted signal according to (22) using the Bussgang gain matrix $\mathbf{B} = \mathbf{C}_{zx}\mathbf{C}_{x}^{-1}$, but it generally won’t be a diagonal matrix, even if an element-wise distortion of the type in (23) is used. The intuition is that $\mathbf{B}\mathbf{x}$ is the linear MMSE estimate of $\mathbf{z}$ given a non-Gaussian distributed observation $\mathbf{x}$. In this analogy, $\boldsymbol{\eta}$ is the estimation error, which is uncorrelated with $\mathbf{x}$ since

$$\mathbb{E}\{\boldsymbol{\eta}\mathbf{x}^{\mathrm{H}}\} = \mathbb{E}\{(\mathbf{z} - \mathbf{C}_{zx}\mathbf{C}_{x}^{-1}\mathbf{x})\mathbf{x}^{\mathrm{H}}\} = \mathbf{C}_{zx} - \mathbf{C}_{zx}\mathbf{C}_{x}^{-1}\mathbf{C}_{x} = \mathbf{0}. \tag{26}$$

The generalized Bussgang decomposition for non-Gaussian input is illustrated in Fig. 2(b). It is suitable both for quantifying the SDR and for analyzing the performance of non-linear communication systems. For example, [10] did this using practically modulated data signals. The paper also shows that although treating the uncorrelated distortion as independent Gaussian noise is convenient, one can increase the performance by exploiting its information content.
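A small exact example shows that the gain matrix need not be diagonal for non-Gaussian inputs. The sketch below assumes two QPSK symbols mixed by a hypothetical matrix `A` and an element-wise third-order distortion with an arbitrary coefficient; since there are only 16 equiprobable outcomes, all expectations are computed exactly by enumeration instead of by simulation.

```python
import itertools
import numpy as np

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))   # unit-power QPSK symbols
S = np.array(list(itertools.product(qpsk, repeat=2))).T        # 2 x 16, all symbol pairs

A = np.array([[1.0, 0.0],
              [2.0 / np.sqrt(5.0), 1.0 / np.sqrt(5.0)]])       # assumed mixing matrix
X = A @ S                                 # equiprobable, non-Gaussian input vectors

Z = X - 0.2 * np.abs(X)**2 * X            # hypothetical element-wise third-order distortion

K = S.shape[1]
C_x = X @ X.conj().T / K                  # exact expectations over the 16 outcomes
C_zx = Z @ X.conj().T / K
B = C_zx @ np.linalg.inv(C_x)             # generalized Bussgang gain matrix
eta = Z - B @ X
C_eta_x = eta @ X.conj().T / K            # E{eta x^H}, zero as in (26)

print("B =\n", np.round(B.real, 4))
print("max |E[eta x^H]| =", f"{np.max(np.abs(C_eta_x)):.1e}")
```

The second row of `B` has a non-zero off-diagonal entry although each output depends only on its own input, while the distortion remains uncorrelated with the input.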

## VI Lessons Learned

The Bussgang decomposition establishes that the output of a non-linear function is a scaled version of the random input signal plus an uncorrelated distortion term. It is an exact and unique representation. The distortion is neither independent of the input nor Gaussian, but it can be treated as such to obtain a lower bound on the communication performance. The decomposition can be extended to MIMO systems, but then the entries of the distortion vector are generally mutually correlated.

## References

• [1] J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” Research Laboratory of Electronics, Massachusetts Institute of Technology, Tech. Rep. 216, 1952.
• [2] J. Minkoff, “The role of AM-to-PM conversion in memoryless nonlinear systems,” IEEE Trans. Commun., vol. 33, no. 2, pp. 139–144, 1985.
• [3] E. Björnson, L. Sanguinetti, and J. Hoydis, “Hardware distortion correlation has negligible impact on UL massive MIMO spectral efficiency,” IEEE Trans. Commun., vol. 67, no. 2, pp. 1085–1098, Feb 2019.
• [4] T. Schenk, RF imperfections in high-rate wireless systems: Impact and digital compensation.   Springer, 2008.
• [5] A. K. Fletcher, S. Rangan, V. K. Goyal, and K. Ramchandran, “Robust predictive quantization: Analysis and design via convex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 618–632, 2007.
• [6] Q. Bai, A. Mezghani, and J. A. Nossek, “On the optimization of ADC resolution in multi-antenna systems,” in IEEE ISWCS, 2013.
• [7] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?” IEEE Trans. Inform. Theory, vol. 49, no. 4, pp. 951–963, 2003.
• [8] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed.   McGraw-Hill Higher Education, 2002.
• [9] W. McGee, “Circularly complex Gaussian noise–a Price theorem and a Mehler expansion,” IEEE Transactions on Information Theory, vol. 15, no. 2, pp. 317–319, 1969.
• [10] Ö. T. Demir and E. Björnson, “Channel estimation in massive MIMO under hardware non-linearities: Bayesian methods versus deep learning,” IEEE Open Journal of the Communications Society, vol. 1, pp. 109–124, 2020.
• [11] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Massive MIMO systems with non-ideal hardware: Energy efficiency, estimation, and capacity limits,” IEEE Trans. Inform. Theory, vol. 60, no. 11, pp. 7112–7139, 2014.
• [12] L. Xu, X. Lu, S. Jin, F. Gao, and Y. Zhu, “On the uplink achievable rate of massive MIMO system with low-resolution ADC and RF impairments,” IEEE Communications Letters, vol. 23, no. 3, pp. 502–505, 2019.