CQNet: Complex Input Quantized Neural Network designed for Massive MIMO CSI Feedback

by   Sijie Ji, et al.
Nanyang Technological University

Massive Multiple Input Multiple Output (MIMO) is a core technology of next-generation communication systems. With the growing complexity of channel state information (CSI) in massive MIMO systems, traditional compressive sensing based CSI feedback has become a bottleneck that limits practical deployment. Recently, numerous deep learning based CSI feedback approaches have demonstrated their efficiency and potential. However, the existing methods lack a reasonable interpretation of the deep learning model, and the accuracy of the model decreases significantly as the CSI compression rate increases. In this paper, starting from the intrinsic properties of CSI data, we devise corresponding deep learning building blocks to compose a novel neural network, CQNet. Experimental results show that CQNet outperforms the state-of-the-art method with less computational overhead, achieving an average performance improvement of 8.07%. The paper also investigates the reasons for the decrease in model accuracy at large compression rates and proposes a strategy of embedding a quantization layer to achieve effective compression, by which the original average accuracy loss of 67.19% is reduced to 21.96% and the compression rate is increased by 8 times on the original benchmark.




I Introduction

Massive multiple-input multiple-output (MIMO) technology is considered one of the core technologies of next-generation communication systems, e.g., 5G. By equipping a large number of antennas, the base station (BS) can fully utilize spatial diversity to improve channel capacity. In particular, by enabling beamforming, a 5G BS can concentrate signal energy on a specific user equipment (UE) to achieve a higher signal-to-noise ratio (SNR), less interference leakage and, hence, higher channel capacity. However, the BS can conduct beamforming only when it has the channel state information (CSI) of the downlink at hand.


Many research efforts have been devoted to time-division duplexing (TDD) massive MIMO, because the CSI in the TDD mode can be obtained by exploiting channel reciprocity, where the pilot-aided training overhead is independent of the number of antennas.

Fig. 1: (a) Time-Division Duplexing (TDD) mode: the BS obtains CSI by channel reciprocity. (b) Frequency-Division Duplexing (FDD) mode: the BS obtains CSI through the UE feedback transmission.

However, the TDD mode is not efficient enough for time-sensitive communication such as live video streaming and vehicular communications. The frequency-division duplexing (FDD) mode, on the other hand, uses different frequency bands for uplink and downlink transmissions at the same time, which is more efficient and can support more users simultaneously. Thus, most contemporary cellular systems operate in FDD mode [15]. From the deployment perspective, adopting massive MIMO in FDD mode therefore attracts greater interest in exploring effective approaches [1]. The biggest challenge of working in FDD mode is the overhead of CSI acquisition. Unlike in TDD mode, in FDD mode, as Figure 1 depicts, the uplink and downlink channels are separated into different frequency bands, and hence channel reciprocity does not exist. As a consequence, the UE has to explicitly feed back the downlink CSI to the BS, and the pilot-aided training overhead grows quadratically with the number of transmitting antennas, which might cancel out the benefit of massive MIMO itself [13].

Fortunately, one important observation for massive MIMO systems helps alleviate the above issue. Experimental studies of massive MIMO channels [24, 10] show that as the number of transmitting antennas increases, the user channel matrices tend to be sparse due to the limited local scatterers at the BS. This observation inspires researchers to exploit a non-obvious sparse representation of CSI. Specifically, the massive MIMO channel has an approximately sparse representation in the joint angular-delay domain [18], which can be obtained by applying a 2D-DFT to the channel matrix. The angle of arrival (AoA) and the delay spread of the paths remain constant for uplink and downlink in the angular-delay domain. Based on these characteristics, compressive sensing based CSI feedback algorithms have been proposed in recent years, e.g., LASSO [3], TVAL3 [11], and BM3D-AMP [17].

Compressive sensing based CSI feedback methods, however, rely heavily on channel sparsity and are limited by the inefficiency of iterative signal reconstruction. Their performance is highly dependent on the wireless channel, so they are not a desirable approach considering the diversified use cases of 5G networks.

The recent rapid development of deep learning (DL) technologies provides another possible solution for efficiently feeding back CSI in FDD massive MIMO systems. Instead of relying on sparsity, DL approaches utilize the auto-encoder framework [6] as an implicit prior constraint for encoding data [4]. The decoder learns a map from the low-dimensional data space to the targeted data distribution and reconstructs the original data in a single run, without requiring labeled data, which naturally overcomes the limits of compressive sensing based approaches in channel sparsity and operation efficiency. Recent studies [5, 14, 20, 21, 23] have demonstrated the feasibility and efficacy of DL in CSI feedback.

DL based CSI compression is still at an early stage of development. Most previous works pay no attention to the complexity of the proposed DL model, which deserves careful study because CSI compression is conducted by the UE, which has limited computing power and memory resources. Also, previous studies have not fully exploited the characteristics of complex-valued CSI to integrate the real and imaginary parts organically into real-valued neural network models, which limits their performance in accurately representing the wireless channel. Accordingly, in this work, we propose Complex Quantization Net (CQNet), a DL based neural network framework for massive MIMO CSI compression/decompression, which is empowered by forged complex-valued convolution layers and attention mechanisms. The proposed CQNet outperforms the state-of-the-art with higher accuracy in CSI feedback and less computational overhead in operation. At the same time, we propose an effective compression paradigm, which greatly mitigates the accuracy degradation of current methods at large compression rates and is able to improve the compression rate by a factor of 8 at the same accuracy. We state the following contributions.

  • Signals and CSI are represented in complex envelopes, but at present, the majority of building blocks for DL models are based on real-valued operations and representations. We propose a way to extend existing real valued DL models to support complex valued CSI input, and maintain connections of CSI real and imaginary parts in the model.

  • CSI corresponds to the channel frequency response, which carries the physical information of the angle of arrival and the path delay; this information is clearly displayed in the angular-delay domain as clusters of different resolution. Thus, we introduce attention mechanisms to let the DL model learn information with weights rather than treat all of it equally.

  • CQNet is a lightweight network and reduces hardware-dependent operations, such as exponent calculation.

  • CQNet embeds quantization as a constraint of the neural network, which mitigates quantization loss and achieves effective compression with higher accuracy in CSI compression.

The rest of this paper is organized as follows. Section II reviews related works. Section III introduces the system model and preliminary, including channel model and CSI feedback process. Section IV presents the detailed design of CQNet. Section V evaluates the performance of CQNet and provides experimental details. Section VI concludes the paper.

II Related Work

The challenge of CSI feedback in massive MIMO systems has motivated plenty of studies. Their main focus is to reduce feedback overhead by using the spatial and temporal correlation of CSI. Currently established CSI feedback protocols are based on the concept of Compressive Sensing (CS), which recovers the channel as a sparse vector from an underdetermined linear system [2]. However, CS based algorithms [3, 4] rely heavily on the assumption of channel sparsity, and such algorithms [11, 17] need iterative reconstruction, making the reconstruction process very slow.

To overcome these limitations, researchers have recently leveraged deep learning, which relaxes the sparsity assumption and learns a representative transform in a data-driven way. In particular, as Figure 2 illustrates, an encoder at the UE side transforms the angular-delay domain CSI into a compressed representation called codewords, and a decoder at the BS side reconstructs the CSI from the codewords. Such an architecture does not rely on any sparsity assumption; instead, it learns a non-linear data transform between the original data distribution and the latent space data distribution. Both sides conduct the transformation in a non-iterative way, which is much faster than traditional compressive sensing based methods.

The first work, CsiNet [21], explored and demonstrated the efficiency of deep learning based CSI feedback. The authors proposed CsiNet based on a convolutional neural network (CNN) and carefully designed two sequential RefineNet units in the decoder to refine the reconstruction accuracy. CsiNet significantly outperforms the traditional methods of CSI feedback (LASSO, BM3D-AMP and TVAL3) under various compression rates. Based on that, CsiNet-LSTM [20] leverages a recurrent convolutional neural network (RCNN), grouping several channels within the coherence time as one input to the neural network to explore the temporal relationship between channels. CsiNet-LSTM preserves accuracy under high compression ratios. However, the introduced LSTM increases the computational overhead. CsiNet+ [5] comprehensively surveyed recent deep learning based CSI feedback methods and proposed a parallel multiple-rate compression framework focusing on the practical storage issue. However, it requires manual switching according to the compression rate. The state-of-the-art method, CRNet [14], builds on the fact that the density of the CSI matrix in the angular-delay domain is highly dependent on the channel, and proposes the CRBlock to flexibly extract features at different resolutions. As a result, CRNet outperforms CsiNet under the same computational complexity.

Different from previous works, this work starts from exploring the inherent characteristics of CSI data and takes the practical issues of limited computation resources and limited storage at the UE side into consideration to come up with a tailored lightweight DL framework, CQNet, for the CSI feedback problem. In addition, this work investigates the potential quantization loss in DL based CSI feedback methods and proposes an effective compression paradigm.

III System Model and Preliminary

III-A Massive MIMO OFDM FDD System

Consider a single-cell FDD system using massive MIMO with $N_t$ antennas at the BS, where $N_t \gg 1$, and $N_r$ antennas at the UE side. For simplicity, we assume here that $N_r$ equals 1. The received signal $y \in \mathbb{C}^{N_c \times 1}$ can be expressed as

$$y = A x + z$$

where $N_c$ indicates the number of subcarriers, $x \in \mathbb{C}^{N_c \times 1}$ indicates the transmitted symbols, and $z \in \mathbb{C}^{N_c \times 1}$ is the complex additive Gaussian noise. $A$ can be expressed as $\mathrm{diag}(h_1^H p_1, \cdots, h_{N_c}^H p_{N_c})$, where $h_i \in \mathbb{C}^{N_t \times 1}$ and $p_i \in \mathbb{C}^{N_t \times 1}$ represent the downlink channel coefficients and the beamforming precoding vector for subcarrier $i$, respectively. $(\cdot)^H$ here represents the conjugate transpose.

In order to derive the beamforming precoding vector $p_i$, the BS needs knowledge of the corresponding channel coefficient $h_i$, which is fed back by the UE. Suppose that the downlink channel matrix is $H = [h_1 \cdots h_{N_c}]^H$, which contains $N_c N_t$ elements. The number of parameters that need to be fed back is $2 N_c N_t$, including the real and imaginary parts of the CSI. Note that the amount of feedback parameters is proportional to the number of antennas, meaning that in massive MIMO, the extremely large number of antennas gives rise to an excessively large feedback channel matrix $H$.

The channel matrix $H$ is often sparse in the angular-delay domain. By a 2D discrete Fourier transform (DFT), the original spatial-frequency domain CSI can be converted into the angular-delay domain, such that

$$H' = F_d H F_a^H$$

where $F_d$ and $F_a$ are the DFT matrices with dimension $N_c \times N_c$ and $N_t \times N_t$, respectively. For the angular-delay domain channel matrix $H'$, every element in $H'$ corresponds to a certain path delay with a certain angle of arrival (AoA). In $H'$, only the first $N_c'$ rows contain useful information, while the remaining rows, which represent the paths with larger propagation delays, are made up of near-zero values and can be omitted without much information loss. Let $H''$ denote the informative rows of $H'$. Although $H''$ is already smaller than the original CSI matrix, its $2 N_c' N_t$ real-valued entries may still be a large amount to feed back. $H''$ might be sparse enough for compressive sensing based methods when $N_t \to \infty$. In practice, however, $N_t$ is limited, which invalidates the sparsity assumption, especially when a large compression ratio is applied.

In this paper, we design CQNet by adopting an encoder-decoder network. The channel matrix $H$ is first converted into its angular-delay representation $H'$ by 2D-DFT. We then remove the near-zero components to obtain $H''$. The encoder of CQNet at the UE side compresses $H''$ into a codeword vector $v$ according to a given compression ratio $\eta$. $v$ is then fed back to the BS, which reconstructs $\hat{H}''$ from $v$ using its decoder. $\hat{H}''$ is finally zero-filled and reverted to the original $H$ by inverse 2D-DFT.
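The pre- and post-processing around the encoder-decoder can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the function names, the use of unitary DFT matrices, and the argument `n_keep` (the number of retained delay rows) are assumptions made for the example.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix (normalisation is an assumption of this sketch)."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def to_angular_delay(H, n_keep):
    """Map a spatial-frequency matrix H of shape (N_c, N_t) to the
    angular-delay domain via F_d H F_a^H and keep the first n_keep delay rows."""
    n_c, n_t = H.shape
    F_d, F_a = dft_matrix(n_c), dft_matrix(n_t)
    H_ad = F_d @ H @ F_a.conj().T
    return H_ad[:n_keep, :]

def from_angular_delay(H_trunc, n_c):
    """Zero-fill the discarded delay rows and invert the 2D-DFT."""
    n_keep, n_t = H_trunc.shape
    F_d, F_a = dft_matrix(n_c), dft_matrix(n_t)
    H_ad = np.zeros((n_c, n_t), dtype=complex)
    H_ad[:n_keep, :] = H_trunc
    # The DFT matrices are unitary, so the inverse is the conjugate transpose.
    return F_d.conj().T @ H_ad @ F_a
```

When no rows are discarded (`n_keep == n_c`), the round trip is lossless; any reconstruction error in the full pipeline therefore comes from the truncation and from the encoder-decoder itself.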

III-B CSI Feedback Process

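$H''$ is put into the UE's encoder to produce the codeword $v$ such that

$$v = f_E(H''; \Theta_E)$$

where $f_E$ denotes the encoding process and $\Theta_E$ represents the set of parameters of the encoder.

Once the BS receives the codeword $v$, the decoder is used to reconstruct the channel by

$$\hat{H}'' = f_D(v; \Theta_D)$$

where $f_D$ denotes the decoding process and $\Theta_D$ represents the set of parameters of the decoder. Therefore, the entire feedback process can be expressed as

$$\hat{H}'' = f_D(f_E(H''; \Theta_E); \Theta_D)$$

The goal of CQNet is to minimize the difference between the original $H''$ and the reconstructed $\hat{H}''$, which can be expressed formally as finding the parameter sets of the encoder and decoder satisfying

$$(\hat{\Theta}_E, \hat{\Theta}_D) = \arg\min_{\Theta_E, \Theta_D} \left\| H'' - f_D(f_E(H''; \Theta_E); \Theta_D) \right\|_2^2$$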
IV CQNet Design

In this section, we present the design of CQNet and its key components. Figure 2 depicts the overall architecture of CQNet. CQNet is an encoder-decoder deep learning framework which contains four main building blocks tailored to the CSI feedback problem.

Fig. 2: The encoder and decoder architecture of CQNet. The encoder compresses CSI into a codeword vector $v$ according to a given compression ratio $\eta$. The decoder reconstructs CSI based on the received feedback $v$.

CQNet employs a forged complex-valued input layer that takes the real and imaginary parts of the CSI and performs 1×1 convolutions with multiple filters to separately represent the full complex-valued channel coefficients of different signal paths (Section IV-A). Following this, two different types of attention blocks are applied to devise an informative and lightweight encoder, i.e., the channel-wise attention block, which aims at enhancing the effectiveness of the complex-valued input layer, and the spatial-wise attention block, which aims at making use of the cluster effect in the angular-delay domain (Section IV-B). CQNet keeps the residual refine block and the multi-resolution block on the decoder side, as previous studies have shown their effectiveness [14, 21]. To further reduce the computation cost, CQNet replaces the fully connected layer adopted in most previous studies with a point-wise convolution layer and adopts the hard-Sigmoid activation, which is more hardware friendly than the conventional Sigmoid activation (Section IV-C). In addition, CQNet embeds quantization as layers in the neural network, which serve as additional regularization constraints to mitigate quantization loss and improve accuracy under large compression rates (Section IV-D). Appendix A, Figure 9 presents the complete CQNet architecture with detailed layer-level design for reproducibility.

IV-A Forged Complex-valued Input

While a typical deep learning neural network is designed based on real-valued inputs, operations, and representations, the input of our problem consists of the complex-valued path channel coefficients in $H''$. How best to cope with complex-valued inputs is still an open question in the machine learning community [19]. Most existing studies utilizing deep learning for wireless communication or wireless sensing systems separate the real and imaginary parts of the complex-valued signals, take them as two independent channels of an image as input, and perform mixed convolutions around the real and imaginary parts of different values. In this way, the real and imaginary parts of the same complex value are decoupled during the convolution process, which may destroy the original characteristics of each complex-valued channel coefficient.

To tackle this issue, CQNet devises a specific input layer, which utilizes point-wise convolution to couple the real and imaginary parts of the same channel coefficient. The forged complex-valued input layer employs multiple convolutional filters to encode the real and imaginary parts of each complex-valued element in $H''$ with respective learnable weights.

Mathematically, $\mathcal{F}_c: H'' \to U$ is a convolutional transformation. Here, $H'' \in \mathbb{R}^{N_c' \times N_t \times 2}$ is a 3D tensor, extended from its 2D version by including an additional dimension to separately express the real and imaginary parts, and $U \in \mathbb{R}^{N_c' \times N_t \times C}$, where $C$ indicates the number of convolutional filters applied to learn different weighted representations. Let $W = [w_1, w_2, \ldots, w_C]$ denote the learned set of filter kernels, where $w_c$ refers to the learnable parameters of the $c$-th filter. The output of $\mathcal{F}_c$ is $U = [u_1, u_2, \ldots, u_C]$, $u_c \in \mathbb{R}^{N_c' \times N_t}$, where

$$u_c = w_c * H'' = \sum_{i=1}^{2} w_c^i * h^i$$

Here $*$ denotes convolution, $w_c = [w_c^1, w_c^2]$ and $H'' = [h^1, h^2]$, with $h^1$ and $h^2$ the real and imaginary parts. For simplicity, bias terms are omitted. Since the output is produced by a summation over the two channels, the dependency between real and imaginary parts is implicitly embedded in $u_c$. CQNet fixes the number of learnable filters $C$ based on the trade-off between accuracy and model size. In comparison, a conventional $3 \times 3$ kernel entangles the real and imaginary parts of neighboring elements in $H''$, and as a result 9 complex values are interpolated into one synthesized value (Figure 3 (b)), thus losing the original physical information carried by the channel matrix. Figure 3 (a) illustrates the design of the complex-valued input layer $\mathcal{F}_c$, the output of which is directed to the attention mechanism (detailed in the next section).
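The point-wise coupling described above can be illustrated with a short NumPy sketch. It assumes the channel matrix is stored as a real tensor of shape (rows, columns, 2) holding real and imaginary parts, and that each 1×1 filter is just a pair of weights; the function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def forged_complex_input(H_ri, W):
    """Point-wise (1x1) convolution over the real/imaginary axis.

    H_ri: (rows, cols, 2) real tensor, last axis = [real, imag].
    W:    (C, 2) weights, one (w_real, w_imag) pair per filter (bias omitted).
    Returns U of shape (rows, cols, C) with
    U[i, j, c] = W[c, 0] * real[i, j] + W[c, 1] * imag[i, j].
    """
    return np.einsum("ijk,ck->ijc", H_ri, W)
```

Because the kernel is 1×1, each output element depends only on the real and imaginary parts of the same channel coefficient, in contrast to a 3×3 kernel, which would also mix in the eight neighbouring coefficients.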

Fig. 3: (a) The forged complex-valued input operation couples the real and imaginary parts of the complex-valued channel matrix and after that the channel-wise attention is applied to strengthen learning of significant channel coefficients. (b) Conventional convolution entangles the real and imaginary parts with neighboring elements’ real parts and imaginary parts.

IV-B Attention Mechanism for Informative Encoder

The performance of CSI feedback scheme highly depends on the compression part, the encoder. Due to the limited computing power and storage space of UE, deepening the encoder network design is not practical. Therefore, CQNet adopts attention mechanism to achieve distilled yet informative encoding output.

Attention mechanism assists the neural network to focus on important features and suppress unnecessary ones by assigning different learnable weights. It can be interpreted as a means of biasing the allocation of available computational resources to the most informative components of a signal that increases the representativeness of the neural network. CQNet imposes two different attention mechanisms for different stages, resulting in a lightweight yet informative encoder.

IV-B1 Channel-Wise Attention

To preserve the complex-valued nature of $H''$, CQNet devises a forged complex-valued input layer. The output of the input layer, $U$, is essentially a weighted representation of the original $H''$. Specifically, $U \in \mathbb{R}^{N_c' \times N_t \times C}$, where the number of channels $C$ corresponds to the differently learned weightings of $H''$, among which some may be more important than others. Based on this, CQNet introduces the channel-wise attention mechanism, the SE block [8], to help the neural network model the relationship between these weightings so as to focus on important features and suppress unnecessary ones. A diagram of the SE block is shown in Figure 3 (a).

The output $U$ first goes through the $\mathcal{F}_{sq}$ transformation, a global average pooling, to obtain the channel-wise statistics descriptor $z$,

$$z_c = \mathcal{F}_{sq}(u_c) = \frac{1}{N_c' N_t} \sum_{i=1}^{N_c'} \sum_{j=1}^{N_t} u_c(i, j)$$

Here, $\mathcal{F}_{sq}$ expands the network receptive field to the whole angular-delay domain to obtain global statistical information, compensating for the insufficient local receptive field of convolution.

After that, the channel descriptor $z$ goes through the $\mathcal{F}_{ex}$ transformation, i.e., a gated layer with sigmoid activation that learns the nonlinear interaction as well as the non-mutually-exclusive relationship between channels, such that

$$s = \mathcal{F}_{ex}(z, W) = \sigma(W_2\, \delta(W_1 z))$$

where $\sigma$ is the sigmoid function, $\delta$ is the ReLU function, and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ for a reduction ratio $r$, as in the SE block [8]. $\mathcal{F}_{ex}$ further explicitly models inter-channel dependencies based on $z$ and obtains the calibrated $s$, which is the attention vector that summarizes all the characteristics of the channels, including intra-channel and inter-channel dependencies. Before being fed into the next layer, each channel of $U$ is scaled by the corresponding attention value, such that

$$\tilde{u}_c = \mathcal{F}_{scale}(u_c, s_c) = s_c\, u_c$$

The channel-wise attention mechanism intrinsically captures dynamics based on the complex-valued input by learning to weigh the importance of each channel in $U$, boosts the feature discriminability, and generates a more informative $\tilde{U}$.
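The squeeze, excitation and scale steps can be sketched as follows. This is a minimal NumPy version of a generic SE block under assumed shapes (features stored channels-last, biases omitted); the weight matrices `W1` and `W2` and the reduction ratio implied by their shapes are hypothetical, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(U, W1, W2):
    """Channel-wise attention on U of shape (rows, cols, C).

    W1: (C // r, C) and W2: (C, C // r) are the two gating matrices.
    Returns the rescaled features and the attention vector s of shape (C,).
    """
    z = U.mean(axis=(0, 1))                      # squeeze: global average pooling
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))    # excitation: ReLU then sigmoid gate
    return U * s, s                              # scale: broadcast s over rows and cols
```

Each entry of `s` lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; it never changes the spatial layout of a channel.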

IV-B2 Spatial-Wise Attention

Spatial-wise attention focuses on learning the places of the more informative parts across spatial domain. Specifically, after being converted to angular-delay domain, the channel coefficients exhibit effect of clusters corresponding to the distinguishable paths that arrive with specific delays and AoAs. In order to pay more attention to those clusters, CQNet employs a CBAM block [22] to learn differentiation with weighting in the spatial domain as Figure 4 illustrates.

Fig. 4: Based on the cluster effect in the angular-delay domain, spatial-wise attention uses the generated spatial statistical descriptors as the basis for assigning weights, forcing the network to focus more on the distinguishable propagation paths.

First, two pooling operations, i.e., average-pooling and max-pooling, are applied across the channels of the input feature maps $X$ to generate two 2D feature maps, $X_{avg}$ and $X_{max}$, respectively. CQNet concatenates the two feature maps to generate a compressed spatial feature descriptor $[X_{avg}; X_{max}]$, and convolves it with a standard convolution layer $f_{conv}$ to produce a 2D spatial attention mask $M$. The mask is activated by Sigmoid and then multiplied with the original feature maps to obtain $\tilde{X}$ with spatial-wise attention:

$$\tilde{X} = \sigma\big(f_{conv}([X_{avg}; X_{max}])\big) \otimes X$$
With spatial-wise attention, CQNet focuses the neural network to the more informative signal propagation paths in the angular-delay domain.
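A compact NumPy sketch of this spatial mask is given below. The kernel size, the zero padding, and the helper `conv2d_same` are assumptions made for the illustration (the text only states that a standard convolution layer is used), and the loop-based convolution favours readability over speed.

```python
import numpy as np

def conv2d_same(x, k):
    """Cross-correlate x (H, W, C_in) with one kernel k (kh, kw, C_in),
    zero-padded so the output keeps shape (H, W). Odd kernel sizes assumed."""
    kh, kw, _ = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros(x.shape[:2])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw, :] * k)
    return out

def spatial_attention(X, k):
    """X: (H, W, C) feature maps; k: (kh, kw, 2) kernel over [avg, max] maps."""
    x_avg = X.mean(axis=2, keepdims=True)             # average-pool across channels
    x_max = X.max(axis=2, keepdims=True)              # max-pool across channels
    desc = np.concatenate([x_avg, x_max], axis=2)     # (H, W, 2) spatial descriptor
    mask = 1.0 / (1.0 + np.exp(-conv2d_same(desc, k)))  # sigmoid-activated mask
    return X * mask[:, :, None], mask
```

The same mask is applied to every channel, so positions in the angular-delay plane that the mask rates highly are emphasised consistently across all feature maps.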

IV-C Reduction of the Computation Cost

In practice, UEs are often edge devices with limited computational power, memory and storage, which must be taken into consideration in CQNet design. This section details our efforts in reducing its space and time cost.

IV-C1 Space Cost

Since the final objective of CQNet is to compress CSI into a fixed-length vector $v$ with compression ratio $\eta$, the last layer of the encoder is conventionally a fully connected layer. A convolution can produce the same fixed-length output, since both operations reduce to element-wise multiplications followed by summation. CQNet therefore replaces the fully connected layer with a convolution layer, which greatly reduces the number of parameters of the network. Taking an input $H''$ of 2048 dimensions and $\eta = 1/4$ as an example, the number of parameters of the fully connected layer is $2048 \times 512 = 1{,}048{,}576$, while that of the convolution layer is 512 times fewer, i.e., 2048.

IV-C2 Time Cost

The Sigmoid activation function, as commonly used, contains an exponential operation:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

In order to reduce the time cost of the computation, CQNet replaces the Sigmoid function with its hard version, a piece-wise linear analogue denoted as $h\sigma$ [7, 9]:

$$h\sigma(x) = \frac{\mathrm{ReLU6}(x + 3)}{6}$$

where ReLU6 is a clipped version of ReLU, $\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$, which ensures quantization precision on float16 edge devices.

Fig. 5: Comparison between Sigmoid and hard-Sigmoid functions.

Figure 5 compares the excitation curves of the hard-Sigmoid and Sigmoid functions. The hard-Sigmoid induces no discernible degradation in accuracy but benefits from the computational advantage of entailing no exponential calculations. In practice, hard-Sigmoid fits in most software and hardware frameworks and can mitigate potential numerical quantization loss introduced by different hardware.
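The two activations can be compared numerically with a few lines of NumPy. The sketch assumes the common ReLU6-based form of hard-Sigmoid, ReLU6(x + 3) / 6; the function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Standard Sigmoid, requires an exponential."""
    return 1.0 / (1.0 + np.exp(-x))

def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return np.clip(x, 0.0, 6.0)

def hard_sigmoid(x):
    """Piece-wise linear stand-in for Sigmoid: only add, clip and divide."""
    return relu6(x + 3.0) / 6.0

# Largest gap between the two curves on a dense grid.
x = np.linspace(-8.0, 8.0, 3201)
max_gap = np.max(np.abs(sigmoid(x) - hard_sigmoid(x)))
```

On this grid the two curves stay within roughly 0.07 of each other, and the hard version saturates exactly at 0 and 1 for inputs below -3 and above 3.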

IV-D Effective Quantization in the Neural Network

Unlike the commonly used encoder-decoder framework, where the encoder output is fed directly into the decoder to reconstruct the input, in our problem of CSI feedback the encoder output at the UE side needs to be transferred to the BS as a bit-stream through a real communication channel. The output of a DL encoder is commonly a 32-bit floating-point representation, which provides an opportunity for further compression that has not been studied in previous DL-based CSI feedback work. For instance, representing a 32-bit parameter with a 4-bit quantized number gives a true compression ratio of $\eta \times 4/32 = \eta/8$, which we call the effective compression ratio $\eta_e$.

Direct quantization, however, leads to significant quantization loss, as we will show in our experimental evaluation. Given $H''$ of 2048 dimensions and $\eta = 1/4$, we let the output $v$ of the CRNet [14] encoder be quantized uniformly over its value range with bit width $B = 4$; namely, the length-512 32-bit floating-point codeword $v$ is converted into a 2048-bit 0-1 bit-stream for transmission by


where the range is divided into interval , and denotes the interval length.

The converted 2048 bits are dequantized back into a length-512 32-bit floating-point codeword and fed into the decoder. Following the operation, the average NMSE result drops dramatically from to , which results in more than times performance drop.
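The direct quantization round trip can be sketched as follows. Several details are assumptions of this example rather than the paper's exact scheme: the value range is taken as [0, 1], bin indices are obtained by flooring, and dequantization returns bin midpoints. All function names are hypothetical.

```python
import numpy as np

def quantize(v, bits, lo=0.0, hi=1.0):
    """Uniformly split [lo, hi] into 2**bits bins and return each value's bin index."""
    levels = 2 ** bits
    delta = (hi - lo) / levels                      # interval length
    idx = np.clip(np.floor((v - lo) / delta), 0, levels - 1).astype(np.int64)
    return idx, delta

def dequantize(idx, delta, lo=0.0):
    """Map bin indices back to floats (bin midpoints, an assumption of this sketch)."""
    return lo + (idx + 0.5) * delta

def to_bitstream(idx, bits):
    """Serialise each index as `bits` binary digits, most significant bit first."""
    shifts = np.arange(bits - 1, -1, -1)
    return ((idx[:, None] >> shifts) & 1).astype(np.uint8).ravel()

def effective_ratio(eta, bits, float_bits=32):
    """Effective compression ratio when each float is sent with `bits` bits."""
    return eta * bits / float_bits
```

With a length-512 codeword and a bit width of 4, `to_bitstream` yields 2048 bits, and `effective_ratio(1/4, 4)` evaluates to 1/32, i.e., 8 times more compression than the nominal ratio of 1/4.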

Fig. 6: An example with 4bit quantization: output of an encoder with

Figure 6 visualizes the quantization process for a batch of 10 codewords of length 128. The upper panel shows the original output values in 32-bit floating-point form, and the bottom panel shows the corresponding values quantized into a 4-bit representation.

To mitigate the quantization loss, CQNet embeds the quantization-dequantization process as layers that can be trained together with the whole neural network. Since the quantization operation is not differentiable, we set the gradient of the layer to be a constant. Essentially, the layer becomes a regularization term that forces the network to adjust the data distribution according to the quantization method and thus reduces the quantization loss.

Embedding the quantization layer in the deep neural network also offers room for adaptive quantization. We can either fix the bit width as a hyper-parameter or set it as a learnable parameter so that the quantization layer can adaptively choose the bit width used to represent a floating-point number.
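The idea of a trainable quantization-dequantization layer with a constant gradient can be sketched as a small class with explicit forward and backward passes. This is an illustration only: the class name, the [0, 1] value range, the midpoint dequantization and the choice of 1 as the constant gradient (a straight-through style estimator) are all assumptions, since the text does not specify them.

```python
import numpy as np

class QuantDequantLayer:
    """Quantize then immediately dequantize, so the decoder sees quantized values."""

    def __init__(self, bits, lo=0.0, hi=1.0):
        self.levels = 2 ** bits
        self.lo = lo
        self.delta = (hi - lo) / self.levels

    def forward(self, v):
        # Non-differentiable step: snap each value to the midpoint of its bin.
        idx = np.clip(np.floor((v - self.lo) / self.delta), 0, self.levels - 1)
        return self.lo + (idx + 0.5) * self.delta

    def backward(self, grad_out, const=1.0):
        # The true derivative is zero almost everywhere, so a constant is used
        # instead (const=1.0 is an assumed value) to let gradients reach the encoder.
        return const * grad_out
```

During training, the encoder receives gradients as if the layer were a scaled identity, while the decoder only ever sees quantized codewords, which is what lets the network adapt its output distribution to the quantizer.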

V Evaluation

In this section, we evaluate the overall performance of CQNet and the efficacy of its key components. The detailed experiment setting is described in Section V-A. Section V-B presents the overall performance and computational overhead as compared with state-of-the-art machine learning based CSI feedback approaches. We then conduct an ablation study by additively evaluating the forged complex-valued input layer and the two attention blocks to assess their efficacy (Section V-C). Finally, we analyze the effect of the new quantization layer and discuss the possibility of adaptive compression in Section V-D.

V-A Experiment Setting

V-A1 Data Generation

To ensure a fair performance comparison, we use the same dataset as provided in the first work on deep learning based massive MIMO CSI feedback [21], which is also used in later studies on this problem [5, 14, 20]. The channel coefficients are generated by the COST 2100 channel model [12] with a configuration of $N_t$ = 32 uniform linear array (ULA) antennas at the BS, $N_r$ = 1 antenna at the UE and $N_c$ = 1024 sub-carriers. There are two types of scenarios. The first one is an indoor pico-cell scenario operating in the 5.3 GHz band. The BS is positioned at the center of a 20 m square area and the UEs are randomly positioned within that square. The other is an outdoor rural scenario operating in the 300 MHz band. The BS is positioned at the center of a 400 m square area and the UEs are randomly positioned within that square. The generated CSI matrices are converted to the angular-delay domain by 2D-DFT.

The total 150,000 independently generated CSI are split into three parts, i.e., 100,000 for training, 30,000 for validation, and 20,000 for testing, respectively.

V-A2 Training Scheme and Evaluation Metric

As the comparison scheme, we use the state-of-the-art method CRNet [14], which significantly outperforms other CSI feedback work. CRNet demonstrates the effectiveness of training with a cosine annealing learning rate with warm-up instead of a fixed learning rate, and hence we adopt the same training scheme in CQNet. To evaluate the performance, we measure the normalized mean square error (NMSE) between the original $H''$ and the reconstructed $\hat{H}''$:

$$\mathrm{NMSE} = E\left\{ \frac{\| H'' - \hat{H}'' \|_2^2}{\| H'' \|_2^2} \right\}$$
The model was trained with a batch size of 200 and 8 workers on a single NVIDIA 2080Ti GPU. The number of epochs is set to 1000, as recommended in previous work [14, 5]. To further ensure fairness, we fix the random seed in every run.
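The NMSE metric used throughout the evaluation can be computed as in the sketch below. It assumes that the first array axis indexes samples and that the expectation is estimated by the sample mean; the function names are illustrative.

```python
import numpy as np

def nmse(H, H_hat):
    """Mean over samples of ||H - H_hat||^2 / ||H||^2.

    H, H_hat: arrays of shape (batch, ...), real or complex.
    """
    axes = tuple(range(1, H.ndim))
    err = np.sum(np.abs(H - H_hat) ** 2, axis=axes)
    power = np.sum(np.abs(H) ** 2, axis=axes)
    return np.mean(err / power)

def nmse_db(H, H_hat):
    """NMSE on the decibel scale, as reported in the figures and tables."""
    return 10.0 * np.log10(nmse(H, H_hat))
```

As a sanity check, a reconstruction that is uniformly 10% too small in amplitude (`H_hat = 0.9 * H`) has an NMSE of 0.01, i.e., -20 dB; more negative values indicate better reconstruction.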

V-B CQNet Overall Performance

Fig. 7: Normalized Mean Square Error(dB) Comparison between CRNet and CQNet.

Figure 7 shows the overall performance of CQNet as compared with CRNet [14] under the same hardware condition and training scheme. In indoor scenarios, CQNet obtains an average performance increase of 7.88%, with the most significant increase being 10.83%. In outdoor scenarios, the average improvement in NMSE is 8.26%, with the most significant increase being 12.03%. Even in the worst case, CQNet achieves 3.68% and 4.66% improvement in indoor and outdoor scenarios, respectively. The results show that CQNet consistently outperforms CRNet for all compression ratios in both indoor and outdoor scenarios, with an 8.07% overall average improvement in NMSE. In addition, we notice that both CQNet and CRNet have lower accuracy in outdoor scenarios, which is probably caused by the data processing. Due to the long propagation distance, the propagation loss and the path delay in outdoor scenarios are larger, leading to part of the information being discarded when calculating $H''$.
Fig. 8: the Number of flops () of CRNet and CQNet.

At the same time, we also derive the computational cost of the two models in FLOPs (floating-point operations). As Figure 8 indicates, CQNet requires 1.6%, 5.0%, 10.2%, 17.6%, and 26.6% fewer FLOPs than CRNet at compression ratios of 1/64, 1/32, 1/16, 1/8, and 1/4, respectively, which indicates that CQNet yields higher accuracy with less computational complexity. The FLOPs reduction shrinks as the compression becomes more aggressive: the longer the codeword (i.e., the milder the compression), the larger the reduction. This gain comes from the design of replacing the fully connected layer, whose number of parameters is proportional to the length of the codeword v.
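To illustrate why the saving grows with the codeword length, the sketch below counts the weights of a single fully connected compression layer. It assumes an input of 2 x 32 x 32 = 2048 real values, a size commonly used with this kind of dataset but not stated in this section, and the function name is hypothetical. It does not reproduce the FLOPs of Figure 8.

```python
def fc_layer_cost(input_size, compression_ratio):
    """Return (codeword length, number of weights) of one dense layer."""
    codeword_len = int(input_size * compression_ratio)
    weights = input_size * codeword_len  # one weight per input/output pair
    return codeword_len, weights

INPUT_SIZE = 2 * 32 * 32  # assumed: real and imaginary parts of a 32 x 32 CSI matrix

for denom in (4, 8, 16, 32, 64):
    length, weights = fc_layer_cost(INPUT_SIZE, 1 / denom)
    print(f"ratio 1/{denom}: codeword length {length}, FC weights {weights}")
```

Because the weight count is linear in the codeword length, a design that avoids this layer saves the most at a ratio of 1/4 and the least at 1/64, which matches the trend reported above.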

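Components varied across columns: Complex-valued Input, Channel-wise Attention, Spatial-wise Attention
(A: baseline CRNet; B: forged complex-valued input; C: channel-wise attention; D: spatial-wise attention; E: both attention mechanisms)

Compression ratio    A          B          C          D          E
1/4                  -21.912    -26.880    -26.146    -22.322    -27.212
1/8                  -14.048    -15.317    -15.565    -15.164    -15.845
1/16                 -10.216    -11.347    -11.130    -10.651    -11.277
1/32                  -8.484     -8.744     -8.945     -8.763     -8.974
1/64                  -6.063     -6.072     -6.045     -6.064     -6.438
TABLE I: NMSE (dB) Comparison of Ablation Study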

V-C Ablation Study

Considering the limited interpretability of deep neural networks, we further conduct an ablation study to better quantify the gain of the proposed forged complex-valued input and attention mechanisms. The number of epochs in the ablation studies is set to 500; the remaining settings are the same as discussed in Section V-A.

V-C1 Forged Complex-valued Input Design

To evaluate the forged complex-valued input design, we conduct an ablation study that adds the forged complex-valued input layer to the baseline CRNet without any other modification. The result is shown in Table I column B. With the forged complex-valued input layer, the accuracy surpasses that of the baseline CRNet (Table I column A) at all compression ratios, with an average improvement of 9.2%, which demonstrates the efficacy of appropriately interpreting the complex notation.

V-C2 Attention Mechanism Design

In addition to the complex-valued input design, we further conduct an ablation study with three groups of experiments: adopting only channel-wise attention (Table I column C), adopting only spatial-wise attention (Table I column D), and adopting both attention mechanisms (Table I column E). Compared to the baseline, adopting either channel-wise or spatial-wise attention improves the performance, with accuracy 8.8% and 3.5% higher, respectively. Adopting the two attention mechanisms at the same time gives the best performance gain, 11.9%.

However, we notice that the gain from the added attention mechanisms is not consistent. In some cases, the accuracy drops compared to Table I column B, which uses the forged complex-valued input layer only. Nevertheless, when both attention mechanisms are adopted simultaneously, the overall performance is still 2.5% better than that of Table I column B. This is why we adopt both attention mechanisms in the final CQNet design. In particular, when the compression rate is large, for example at a compression ratio of 1/64, adopting both attention mechanisms improves the NMSE (dB) from -6.063 to -6.438, an improvement of 6.2%, while adding the complex-valued input layer alone yields an improvement of only 0.1%.
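The percentages quoted in this subsection appear to be relative changes of the NMSE values in dB with respect to a reference column, averaged over the five compression ratios. The sketch below uses only the numbers of Table I (the dictionary and helper names are ours) and, under that reading, reproduces the reported averages when rounded to one decimal place.

```python
# NMSE (dB) values of Table I, columns A to E, one row per compression ratio.
TABLE_I = {
    "1/4":  (-21.912, -26.880, -26.146, -22.322, -27.212),
    "1/8":  (-14.048, -15.317, -15.565, -15.164, -15.845),
    "1/16": (-10.216, -11.347, -11.130, -10.651, -11.277),
    "1/32": (-8.484, -8.744, -8.945, -8.763, -8.974),
    "1/64": (-6.063, -6.072, -6.045, -6.064, -6.438),
}

def average_gain(column, reference=0):
    """Mean relative change (in percent) of the dB values versus a reference column."""
    gains = [(row[column] - row[reference]) / row[reference] * 100.0
             for row in TABLE_I.values()]
    return sum(gains) / len(gains)

for name, column in zip("BCDE", range(1, 5)):
    print(f"column {name} vs A: {average_gain(column):.1f}%")
print(f"column E vs B: {average_gain(4, reference=1):.1f}%")
```

Since the dB values are negative, a positive percentage here corresponds to a more negative NMSE, that is, to a more accurate reconstruction.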

TABLE II: NMSE (dB) Result of Quantization Loss & Embedding Quantization Gain. (a) CRNet; (b) CQNet.

V-D Effective Quantization

To evaluate the effect of the add-on quantization layer and the performance at the effective compression ratio, we conduct experiments under indoor scenarios with different quantization ratios. We take CRNet and CQNet models trained at different compression ratios, pass the encoder outputs through quantization with the corresponding bit width, and feed them to the decoder after dequantization. We follow this procedure for both CRNet and CQNet; the results are shown in Table II, denoted 'wo/q' and 'w/q' for compression without and with the add-on quantization layer, respectively. We report results for selected settings; where the quantization loss is very small, those results are not included. As the results demonstrate, both CRNet and CQNet incur different levels of quantization loss at different compression rates. In the most severe setting, both CRNet and CQNet suffer an accuracy drop of more than 4 times. When the compression rate itself is high, the quantization loss is relatively small; for example, in one such setting the accuracy of CRNet and CQNet decreases by 23.9% and 12.1%, respectively. This means that the accuracy loss of the DL model itself is dominant there; therefore, by reducing the quantization loss, we can choose a model with less accuracy loss for quantization and thus achieve better results at the same compression rate.
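The add-on procedure (quantize the encoder output, transmit it, dequantize before decoding) can be sketched with a uniform quantizer. This is an illustration under stated assumptions rather than the quantizer used in the experiments: it assumes codeword entries in the range [0, 1], and the function names are hypothetical.

```python
def quantize(values, bits):
    """Map values in [0, 1] to integer codes in [0, 2**bits - 1] (uniform steps)."""
    levels = 2 ** bits - 1
    return [min(levels, max(0, round(v * levels))) for v in values]

def dequantize(codes, bits):
    """Map integer codes back to approximate values in [0, 1]."""
    levels = 2 ** bits - 1
    return [c / levels for c in codes]

codeword = [0.03, 0.41, 0.77, 0.98]   # toy encoder output
codes = quantize(codeword, bits=4)    # what would be fed back
restored = dequantize(codes, bits=4)  # what the decoder would receive
print(codes)
print(restored)
```

The difference between `codeword` and `restored` is the quantization error that the decoder has to tolerate; with fewer bits the step size grows and so does that error.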

As mentioned in Section IV-D, we embed quantization as a layer of the neural network and set its gradient to a constant to regularize the neural network. The results are shown in Table II, denoted 'e/q'. Compared to direct quantization (w/q), the embedded quantization layer improves the accuracy in all reported settings, with particularly significant improvements in some of them. Across four of these settings, the improvements of CRNet and CQNet are 236.6% and 245.4%, 204.02% and 36.8%, 265.4% and 56.5%, and 39.3% and 65.9%, respectively. The results demonstrate the effectiveness of the embedded quantization layer and show that the performance gain is orthogonal to the neural network architecture.
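Rounding has a zero gradient almost everywhere, so a quantization layer cannot be trained with its true derivative. Setting the gradient to a constant, as described above, resembles what is often called a straight-through estimator; that name is our interpretation, not a term used in the text. The sketch below shows the idea with a hand-written forward and backward pass, assuming inputs in [0, 1] and a constant of 1; both function names are hypothetical.

```python
def quant_forward(x, bits):
    """Forward pass: clip to [0, 1] and snap each value to a uniform grid."""
    levels = 2 ** bits - 1
    return [round(min(1.0, max(0.0, v)) * levels) / levels for v in x]

def quant_backward(grad_output, constant=1.0):
    """Backward pass: replace the (zero) gradient of rounding by a constant,
    so the incoming gradient flows through to the encoder."""
    return [constant * g for g in grad_output]

activations = [0.26, 0.74, 1.2]
print(quant_forward(activations, bits=2))
print(quant_backward([0.5, -2.0, 0.1]))
```

Because the decoder sees quantized values during training, it can adapt to the quantization error instead of meeting it for the first time at inference.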

This design gives us a new way of effective compression that compensates for the reduced accuracy of various current schemes at large compression rates. The best results, showing the corresponding effective compression ratio based on this quantization design, are in bold in Table II. It can be seen that, even compared to the ideal case without quantization loss, this effective compression approach improves the accuracy of CRNet and CQNet by 131.4% and 114.4%, 135.5% and 140.1%, and 130.1% and 137.9% for three of the original compression ratios, respectively. In addition, in the extreme case we achieve accuracy similar to that obtained before, pushing the limit of the compression ratio by a factor of 8.
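The definition of the effective compression ratio is not preserved in this section. One plausible reading, stated here as an assumption, is that it scales the original compression ratio by the fraction of bits kept per codeword entry. With 32-bit floating-point entries (also an assumption), a 4-bit quantizer would then give a factor of 8. The values below are illustrative and are not taken from Table II.

```python
from fractions import Fraction

def effective_ratio(compression_ratio, bits, float_bits=32):
    """Compression ratio after accounting for the bits kept per codeword entry.

    Assumes unquantized entries occupy `float_bits` bits (hypothetical default).
    """
    return compression_ratio * Fraction(bits, float_bits)

original = Fraction(1, 16)               # illustrative original ratio
effective = effective_ratio(original, bits=4)
print(effective)                         # ratio after 4-bit quantization
print(original / effective)              # factor gained by quantization
```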

VI Conclusion

In this paper, we study the CSI feedback problem for 5G communication systems, which is regarded as a bottleneck in massive MIMO operation. Taking the physical properties of the CSI data into consideration, we propose a novel deep learning framework, CQNet, which is based on the attention mechanism with pseudo-complex input. CQNet outperforms the state-of-the-art CRNet with less computational overhead. In addition, we investigate a practical issue, the quantization loss faced in real communication systems, and find that integrating a quantization layer into the neural network may serve as a constraint that reduces the quantization loss. With the proposed effective compression paradigm, we can mitigate the accuracy reduction that previously occurred at large compression rates and can further increase the compression rate.

Appendix A Detailed layer level architecture diagram for reproducibility

Fig. 9: Detailed encoder and decoder design of the proposed CQNet. The input feature shape is given on top of the corresponding block. Conv denotes a convolutional operation, the number in front of it is the filter size, bn denotes the batch-norm operation, and activation layers are left out for simplicity.


  • [1] E. Björnson, E. G. Larsson, and T. L. Marzetta (2016) Massive mimo: ten myths and one critical question. IEEE Communications Magazine 54 (2), pp. 114–123. Cited by: §I.
  • [2] E. J. Candes, J. K. Romberg, and T. Tao (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59 (8), pp. 1207–1223. Cited by: §II.
  • [3] I. Daubechies, M. Defrise, and C. De Mol (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 57 (11), pp. 1413–1457. Cited by: §I, §II.
  • [4] D. L. Donoho, A. Maleki, and A. Montanari (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919. Cited by: §I, §II.
  • [5] J. Guo, C. Wen, S. Jin, and G. Y. Li (2020) Convolutional neural network-based multiple-rate compressive sensing for massive mimo csi feedback: design, simulation, and analysis. IEEE Transactions on Wireless Communications 19 (4), pp. 2827–2840. Cited by: §I, §II, §V-A1, §V-A2.
  • [6] G. E. Hinton and R. R. Salakhutdinov (2006) Reducing the dimensionality of data with neural networks. science 313 (5786), pp. 504–507. Cited by: §I.
  • [7] A. Howard, M. Sandler, G. Chu, L. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al. (2019) Searching for mobilenetv3. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314–1324. Cited by: §IV-C2.
  • [8] J. Hu, L. Shen, and G. Sun (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141. Cited by: §IV-B1.
  • [9] A. Krizhevsky and G. Hinton (2010) Convolutional deep belief networks on cifar-10. Unpublished manuscript 40 (7), pp. 1–9. Cited by: §IV-C2.
  • [10] P. Kyritsi, D. C. Cox, R. A. Valenzuela, and P. W. Wolniansky (2003) Correlation analysis based on mimo channel measurements in an indoor environment. IEEE Journal on Selected areas in communications 21 (5), pp. 713–720. Cited by: §I.
  • [11] C. Li, W. Yin, and Y. Zhang (2009) User’s guide for tval3: tv minimization by augmented lagrangian and alternating direction algorithms. CAAM report 20 (46-47), pp. 4. Cited by: §I, §II.
  • [12] L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker (2012) The cost 2100 mimo channel model. IEEE Wireless Communications 19 (6), pp. 92–99. Cited by: §V-A1.
  • [13] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang (2014) An overview of massive mimo: benefits and challenges. IEEE journal of selected topics in signal processing 8 (5), pp. 742–758. Cited by: §I.
  • [14] Z. Lu, J. Wang, and J. Song (2020) Multi-resolution csi feedback with deep learning in massive mimo system. In ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §I, §II, §IV-D, §IV, §V-A1, §V-A2, §V-B.
  • [15] T. L. Marzetta, G. Caire, M. Debbah, I. Chih-Lin, and S. K. Mohammed (2013) Special issue on massive mimo. Journal of communications and networks 15 (4), pp. 333–337. Cited by: §I.
  • [16] T. L. Marzetta (2010) Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE transactions on wireless communications 9 (11), pp. 3590–3600. Cited by: §I.
  • [17] C. A. Metzler, A. Maleki, and R. G. Baraniuk (2016) From denoising to compressed sensing. IEEE Transactions on Information Theory 62 (9), pp. 5117–5144. Cited by: §I, §II.
  • [18] R. Roy and T. Kailath (1989) ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on acoustics, speech, and signal processing 37 (7), pp. 984–995. Cited by: §I.
  • [19] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal (2017) Deep complex networks. arXiv preprint arXiv:1705.09792. Cited by: §IV-A.
  • [20] T. Wang, C. Wen, S. Jin, and G. Y. Li (2018) Deep learning-based csi feedback approach for time-varying massive mimo channels. IEEE Wireless Communications Letters 8 (2), pp. 416–419. Cited by: §I, §II, §V-A1.
  • [21] C. Wen, W. Shih, and S. Jin (2018) Deep learning for massive mimo csi feedback. IEEE Wireless Communications Letters 7 (5), pp. 748–751. Cited by: §I, §II, §IV, §V-A1.
  • [22] S. Woo, J. Park, J. Lee, and I. So Kweon (2018) Cbam: convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pp. 3–19. Cited by: §IV-B2.
  • [23] Y. Yang, F. Gao, G. Y. Li, and M. Jian (2019) Deep learning-based downlink channel prediction for fdd massive mimo system. IEEE Communications Letters 23 (11), pp. 1994–1998. Cited by: §I.
  • [24] Y. Zhou, M. Herdin, A. M. Sayeed, and E. Bonek (2007) Experimental study of mimo channel statistics and capacity via the virtual channel representation. Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep 5, pp. 10–15. Cited by: §I.