I Introduction
Massive multiple-input multiple-output (MIMO) technology is considered one of the core technologies of next-generation communication systems, e.g., 5G. By equipping a large number of antennas, the base station (BS) can fully utilize spatial diversity to improve channel capacity. In particular, by enabling beamforming, a 5G BS can concentrate signal energy toward a specific user equipment (UE) to achieve a higher signal-to-noise ratio (SNR), less interference leakage, and hence higher channel capacity. However, beamforming can be conducted by the BS only when it has the downlink channel state information (CSI) at hand [16]. Many research efforts have been devoted to time-division duplexing (TDD) massive MIMO, because the CSI in TDD mode can be obtained by exploiting channel reciprocity, where the pilot-aided training overhead is independent of the number of antennas.
However, the TDD mode is not efficient enough for time-sensitive communication such as live video streaming, vehicular communications, etc. On the other hand, the frequency-division duplexing (FDD) mode uses different frequency bands for uplink and downlink transmissions at the same time, which is more efficient and can support more users simultaneously. Thus, most contemporary cellular systems operate in FDD mode [15]. From the deployment perspective, adopting massive MIMO in FDD mode therefore attracts great interest [1]. The biggest challenge in the FDD mode is the overhead of CSI acquisition. Unlike the TDD mode, in the FDD mode, as Figure 1 depicts, the uplink and downlink channels occupy different frequency bands, and hence channel reciprocity does not hold. As a consequence, the UE has to explicitly feed the downlink CSI back to the BS, and the pilot-aided training overhead grows quadratically with the number of transmitting antennas, which might overturn the benefit of massive MIMO itself [13].
Fortunately, one important observation about massive MIMO systems helps alleviate the above issue. Experimental studies of massive MIMO channels [24, 10] show that as the number of transmitting antennas increases, the user channel matrices tend to become sparse due to the limited local scattering at the BS. This observation inspires researchers to exploit the implicit sparse representation of CSI. Specifically, the massive MIMO channel has an approximately sparse representation in the joint angular-delay domain [18], which can be obtained by applying a 2D-DFT to the channel matrix. The angle of arrival (AoA) and the spread delay of each path remain constant between uplink and downlink in the angular-delay domain. Based on these characteristics, compressive sensing based CSI feedback algorithms have been proposed in recent years, e.g., LASSO [3], TVAL3 [11], and BM3D-AMP [17].
Compressive sensing based CSI feedback methods, however, rely heavily on channel sparsity and are limited by the efficiency of their iterative signal reconstruction. Their performance is highly dependent on the wireless channel, making them undesirable given the diversified use cases of 5G networks.
The recent rapid development of deep learning (DL) technologies provides another possible solution for efficiently feeding back CSI in FDD massive MIMO systems. Instead of relying on sparsity, DL approaches utilize the autoencoder framework [6] as an implicit prior constraint for encoding data [4]. The decoder learns a map from the low-dimensional latent space to the targeted data distribution and reconstructs the original data in a single forward pass, without requiring labeled data, which naturally overcomes the limits of compressive sensing based approaches in channel sparsity and operation efficiency. Recent studies [5, 14, 20, 21, 23] have demonstrated the feasibility and efficacy of DL for CSI feedback.
DL-based CSI compression is still in its early stage of development. Most previous works pay little attention to the complexity of the proposed DL model, which needs significant study given that CSI compression is conducted by the UE, a device with limited computing power and memory resources. Moreover, previous studies have not fully exploited the characteristics of the complex-valued CSI by organically integrating the real and imaginary parts into real-valued neural network models, which limits their accuracy in representing the wireless channel. Accordingly, in this work we propose Complex Quantization Net (CQNet), a DL-based neural network framework for massive MIMO CSI compression/decompression, empowered by forged complex-valued convolution layers and attention mechanisms. The proposed CQNet outperforms the state of the art with higher accuracy in CSI feedback and less computational overhead in operation. We also propose an effective compression paradigm that greatly alleviates the accuracy degradation of current methods at large compression rates and is able to improve the compression rate by a factor of 8 at the same accuracy. We state the following contributions.

Signals and CSI are represented as complex envelopes, but at present the majority of building blocks for DL models are based on real-valued operations and representations. We propose a way to extend existing real-valued DL models to support complex-valued CSI input while maintaining the coupling between the real and imaginary parts of the CSI inside the model.

CSI corresponds to the channel frequency response; the physical information it carries, the angle of arrival and the path delay, appears as clusters of different resolutions in the angular-delay domain. We therefore introduce attention mechanisms so that the DL model learns to weigh information by importance rather than treat all of it equally.

CQNet is a lightweight network and avoids hardware-demanding operations, such as exponent calculation.

CQNet embeds quantization as a constraint of the neural network, which mitigates quantization loss and achieves effective compression with higher accuracy.
The rest of this paper is organized as follows. Section II reviews related works. Section III introduces the system model and preliminary, including channel model and CSI feedback process. Section IV presents the detailed design of CQNet. Section V evaluates the performance of CQNet and provides experimental details. Section VI concludes the paper.
II Related Work
The challenge of CSI feedback in massive MIMO systems has motivated plenty of studies, whose main focus is to reduce feedback overhead by exploiting the spatial and temporal correlation of CSI. Currently established CSI feedback protocols are based on the concept of compressive sensing (CS): specifically, they recover the channel as a sparse vector from an underdetermined linear system [2]. However, CS-based algorithms [3, 4] rely heavily on the assumption of channel sparsity, and such algorithms [11, 17] need iterative reconstruction, making the recovery process very slow. To overcome these limitations, researchers have recently leveraged deep learning, which relaxes the sparsity assumption while learning a representative transform in a data-driven manner. In particular, as Figure 2 illustrates, an encoder at the UE side transforms the angular-delay domain CSI into a compressed representation called a codeword, and a decoder at the BS side reconstructs the CSI from the codeword. Such architectures do not rely on any sparsity assumption; instead, they learn a nonlinear transform between the original data distribution and the latent-space distribution. Both sides conduct the transformation in a non-iterative way, which is much faster than traditional compressive sensing based methods.
The first work, CsiNet [21], explored and demonstrated the efficiency of deep learning based CSI feedback. It is built on a convolutional neural network (CNN) with two carefully designed sequential RefineNet units in the decoder to refine reconstruction accuracy. CsiNet significantly outperforms the traditional CSI feedback methods (LASSO, BM3D-AMP, and TVAL3) under various compression rates. Building on that, CsiNet-LSTM [20] leverages a recurrent convolutional neural network (RCNN), grouping several channels within the coherence time as one input to the neural network to explore the temporal relationship between channels. CsiNet-LSTM preserves accuracy under high compression ratios; however, the introduced LSTM increases the computational overhead. CsiNet+ [5] comprehensively surveyed recent deep learning based CSI feedback methods and proposed a parallel multiple-rate compression framework focusing on the practical storage issue, although it requires manual switching according to the corresponding compression rate. The state-of-the-art method, CRNet [14], exploits the fact that the density of the CSI matrix in the angular-delay domain is highly channel-dependent, and proposes the CRBlock to flexibly extract features at different resolutions. CRNet outperforms CsiNet under the same computational complexity. Different from previous works, this work starts by exploring the inherent characteristics of the CSI data, and takes the practical constraints of limited computation and storage at the UE side into consideration to devise a tailored lightweight DL framework, CQNet, for the CSI feedback problem. In addition, this work investigates the potential quantization loss in DL-based CSI feedback and proposes an effective compression paradigm.
III System Model and Preliminary
III-A Massive MIMO OFDM FDD System
Consider a single-cell FDD system using massive MIMO with $N_t$ antennas at the BS, where $N_t \gg 1$, and $N_r$ antennas at the UE side. For simplicity, here we assume $N_r$ equals 1. The received signal on the $n$-th subcarrier can be expressed as

(1)  $y_n = \hat{h}_n^{H} x_n + z_n, \quad n = 1, \dots, \tilde{N}_c,$

where $\tilde{N}_c$ indicates the number of subcarriers, $x_n$ indicates the transmitted symbol, and $z_n$ is the complex additive Gaussian noise. $\hat{h}_n^{H}$ can be expressed as $\hat{h}_n^{H} = \tilde{h}_n^{H} v_n$, where $\tilde{h}_n \in \mathbb{C}^{N_t}$ and $v_n \in \mathbb{C}^{N_t}$ represent the downlink channel coefficients and the beamforming precoding vector for subcarrier $n$, respectively. $(\cdot)^{H}$ here represents conjugate transpose.
In order to derive the beamforming precoding vector $v_n$, the BS needs the corresponding channel coefficients $\tilde{h}_n$, which are fed back by the UE. Suppose that the downlink channel matrix is $\tilde{H} = [\tilde{h}_1, \dots, \tilde{h}_{\tilde{N}_c}]^{H} \in \mathbb{C}^{\tilde{N}_c \times N_t}$, which contains $\tilde{N}_c N_t$ elements. The number of parameters that need to be fed back is $2 \tilde{N}_c N_t$, including the real and imaginary parts of the CSI. Note that the amount of feedback is proportional to the number of antennas, meaning that in massive MIMO the extremely large number of antennas gives rise to an excessive size of the feedback channel matrix $\tilde{H}$.
The channel matrix $\tilde{H}$ is often sparse in the angular-delay domain. By the 2D discrete Fourier transform (DFT), the original spatial-frequency domain CSI can be converted into the angular-delay domain, such that

(2)  $H = F_d \tilde{H} F_a^{H},$

where $F_d$ and $F_a$ are DFT matrices of dimension $\tilde{N}_c \times \tilde{N}_c$ and $N_t \times N_t$, respectively. For the angular-delay domain channel matrix $H$, every element corresponds to a certain path delay with a certain angle of arrival (AoA). In $H$, only the first $N_c$ rows contain useful information; the remaining rows, which represent paths with larger propagation delays, consist of near-zero values and can be omitted without much information loss. Let $H_a \in \mathbb{C}^{N_c \times N_t}$ denote the informative rows of $H$. Although $H_a$ is already smaller than the original CSI matrix, its $2 N_c N_t$ real-valued parameters may still remain numerous. While $H_a$ might be sparse enough for compressive sensing based methods when $N_t \to \infty$, in practice $N_t$ is limited, rendering the sparsity assumption invalid, especially when a large compression ratio is applied.
In this paper, we design CQNet by adopting an encoder-decoder network. The channel matrix $\tilde{H}$ is first converted into the angular-delay representation $H$ by 2D-DFT. We then remove the near-zero components to obtain $H_a$. The encoder of CQNet at the UE side compresses $H_a$ into a codeword vector $\mathbf{v}$ according to a given compression ratio $\gamma$. $\mathbf{v}$ is then fed back to the BS, which reconstructs $\hat{H}_a$ from $\mathbf{v}$ using its decoder. $\hat{H}_a$ is finally zero-filled and reverted to the original form of $\tilde{H}$ by the inverse 2D-DFT.
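The preprocessing and postprocessing around the encoder and decoder (2D-DFT per Eq. (2), truncation of the near-zero delay rows, then zero-filling and the inverse transform) can be sketched in a few lines of numpy. The function names and the FFT convention here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def to_angular_delay(H_sf, n_keep):
    """2D-DFT of the spatial-frequency CSI (Eq. (2), up to the DFT
    convention) followed by truncation to the first n_keep delay rows."""
    H_ad = np.fft.fft2(H_sf)     # angular-delay representation
    return H_ad[:n_keep, :]      # keep the informative rows -> H_a

def to_spatial_frequency(H_a, n_rows):
    """Zero-fill the discarded delay rows and invert the 2D-DFT."""
    H_ad = np.zeros((n_rows, H_a.shape[1]), dtype=complex)
    H_ad[:H_a.shape[0], :] = H_a
    return np.fft.ifft2(H_ad)
```

When no rows are discarded, the round trip is exact; truncation introduces only the small loss contributed by the near-zero rows.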
III-B CSI Feedback Process
$H_a$ is put into the UE's encoder to produce the codeword $\mathbf{v}$ such that

(3)  $\mathbf{v} = f_e(H_a; \Theta_e),$

where $f_e$ denotes the encoding process and $\Theta_e$ represents the set of parameters of the encoder.
Once the BS receives the codeword $\mathbf{v}$, the decoder is used to reconstruct the channel by

(4)  $\hat{H}_a = f_d(\mathbf{v}; \Theta_d),$

where $f_d$ denotes the decoding process and $\Theta_d$ represents the set of parameters of the decoder. Therefore, the entire feedback process can be expressed as

(5)  $\hat{H}_a = f_d(f_e(H_a; \Theta_e); \Theta_d).$

The goal of CQNet is to minimize the difference between the original $H_a$ and the reconstructed $\hat{H}_a$, which can be expressed formally as finding the parameter sets of the encoder and decoder satisfying

(6)  $(\hat{\Theta}_e, \hat{\Theta}_d) = \arg\min_{\Theta_e, \Theta_d} \left\| H_a - f_d(f_e(H_a; \Theta_e); \Theta_d) \right\|_2^2.$
IV CQNet Design
In this section, we present the design of CQNet and its key components. Figure 2 depicts the overall architecture of CQNet. CQNet is an encoder-decoder deep learning framework which contains four main building blocks tailored to the CSI feedback problem.
CQNet employs a forged complex-valued input layer that takes the real and imaginary parts of the CSI and performs multiple 1×1 convolutions to separately represent the full complex-valued channel coefficients of different signal paths (Section IV-A). Following that, two different types of attention blocks are applied to devise an informative and lightweight encoder: the channel-wise attention block, which aims at enhancing the effectiveness of the complex-valued input layer, and the spatial-wise attention block, which aims at making use of the cluster effect in the angular-delay domain (Section IV-B). CQNet keeps the residual refine block and the multi-resolution block on the decoder side, as previous studies have shown their effectiveness [14, 21]. To further reduce computation cost, CQNet replaces the fully connected layer adopted in most previous studies with a point-wise convolution layer, and adopts the hard-Sigmoid activation, which is more hardware friendly than the conventional Sigmoid activation (Section IV-C). In addition, CQNet embeds quantization as layers in the neural network, serving as additional regularization constraints that mitigate quantization loss and improve accuracy under large compression rates (Section IV-D). Appendix A, Figure 9 presents the complete CQNet architecture with detailed layer-level design for reproducibility.
IV-A Forged Complex-valued Input
While a typical deep learning neural network is designed based on real-valued inputs, operations, and representations, the input of our problem consists of the complex-valued path channel coefficients in $H_a$. How best to cope with complex-valued inputs is still an open question in the machine learning community [19]. Most existing studies that apply deep learning to wireless communication or wireless sensing separate the real and imaginary parts of the complex-valued signals, treat them as two independent channels of an image, and perform convolutions that mix the real and imaginary parts of different values. In such a way, the real and imaginary parts of the same complex value are decoupled during the convolution process, which may destroy the original characteristics of each complex-valued channel coefficient. To tackle this issue, CQNet devises a specific input layer, which utilizes point-wise convolution to couple the real and imaginary parts of the same channel coefficient. The forged complex-valued input layer employs multiple 1×1 convolutional filters to encode the real and imaginary parts of each complex-valued element in $H_a$ with respective learnable weights.
Mathematically, $F_{in}$ is a convolutional transformation. Its input $H_a' \in \mathbb{R}^{2 \times N_c \times N_t}$ is a 3D tensor, extended from the 2D $H_a$ by including an additional dimension to separately express the real and imaginary parts, and its output is $U \in \mathbb{R}^{C \times N_c \times N_t}$, where $C$ indicates the number of 1×1 convolutional filters applied to learn different weighted representations. Let $K = [k_1, k_2, \dots, k_C]$ denote the learned set of filter kernels, where $k_c = [k_c^{re}, k_c^{im}]$ refers to the learnable parameters of the $c$-th filter. The output of $F_{in}$ is $U = [u_1, u_2, \dots, u_C]$, where

(7)  $u_c = k_c * H_a' = k_c^{re} \, \mathrm{Re}(H_a) + k_c^{im} \, \mathrm{Im}(H_a).$

Here $*$ denotes convolution. For simplicity, bias terms are omitted. Since the output is produced by a summation over the two input channels, the dependency between the real and imaginary parts is implicitly embedded in $u_c$. CQNet chooses the number of learnable filters $C$ based on the trade-off between accuracy and model size. In comparison, a conventional 3×3 kernel entangles the real and imaginary parts of neighboring elements in $H_a$, so that 9 complex values are interpolated into one synthesized value, as in Figure 3 (b), thus losing the original physical information carried by the channel matrix. Figure 3 (a) illustrates the design of the complex-valued input layer, $F_{in}$, whose output is directed to the attention mechanism (to be detailed in the next section).

IV-B Attention Mechanism for Informative Encoder
The performance of a CSI feedback scheme highly depends on the compression part, i.e., the encoder. Due to the limited computing power and storage space of the UE, deepening the encoder network is not practical. Therefore, CQNet adopts attention mechanisms to achieve a distilled yet informative encoding output.
An attention mechanism assists the neural network in focusing on important features and suppressing unnecessary ones by assigning different learnable weights. It can be interpreted as a means of biasing the allocation of available computational resources toward the most informative components of a signal, which increases the representational power of the neural network. CQNet imposes two different attention mechanisms at different stages, resulting in a lightweight yet informative encoder.
IV-B1 Channel-Wise Attention
To stay faithful to the complex-valued $H_a$, CQNet devises the forged complex-valued input layer, whose output $U$ is essentially a weighted representation of the original $H_a$. Specifically, $U = [u_1, \dots, u_C]$, where the number of channels $C$ corresponds to the different learned weightings of $H_a$, among which some may be more important than others. Based on this, CQNet introduces a channel-wise attention mechanism, the SE block [8], to assist the neural network in modeling the relationships among the channels so as to focus on important features and suppress unnecessary ones. A diagram of the SE block is shown in Figure 3 (a).
The output $U$ first goes through the squeeze transformation $F_{sq}$, a global average pooling, to obtain the channel-wise statistics descriptor $\mathbf{z} \in \mathbb{R}^{C}$,

(8)  $z_c = F_{sq}(u_c) = \frac{1}{N_c N_t} \sum_{i=1}^{N_c} \sum_{j=1}^{N_t} u_c(i, j).$

Here, $F_{sq}$ acts to expand the network's receptive field to the whole angular-delay domain so as to obtain global statistical information, compensating for the insufficient local receptive field of convolution.
After that, the channel descriptor $\mathbf{z}$ goes through the excitation transformation $F_{ex}$, i.e., a gated layer with Sigmoid activation, to learn the nonlinear interaction as well as the non-mutually-exclusive relationship between channels, such that

(9)  $\mathbf{s} = F_{ex}(\mathbf{z}) = \sigma\!\left(W_2 \, \delta(W_1 \mathbf{z})\right),$

where $\delta$ is the ReLU function, $\sigma$ is the Sigmoid function, and $W_1$ and $W_2$ are the weights of the gated layer. $F_{ex}$ explicitly models inter-channel dependencies based on $\mathbf{z}$ and obtains the calibrated $\mathbf{s}$, the attention vector that summarizes the characteristics of the channels, including intra-channel and inter-channel dependencies. Before being fed into the next layer, each channel of $U$ is scaled by the corresponding attention value, such that

(10)  $\tilde{u}_c = F_{scale}(u_c, s_c) = s_c \, u_c.$

The channel-wise attention mechanism intrinsically captures the dynamics of the complex-valued input by learning to weigh the importance of each channel in $U$, boosting feature discriminability and generating a more informative $\tilde{U}$.
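A forward pass through this squeeze-excite-scale pipeline, following Eqs. (8)-(10), can be sketched in numpy. The FC-ReLU-FC-Sigmoid structure is the standard SE design [8]; the helper name and shapes are assumptions, not CQNet's code:

```python
import numpy as np

def se_attention(U, W1, W2):
    """Channel-wise attention over U of shape (C, H, W):
    squeeze (global average pool), excite (FC-ReLU-FC-Sigmoid),
    then rescale each channel by its attention value."""
    z = U.mean(axis=(1, 2))                                      # Eq. (8)
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))    # Eq. (9)
    return U * s[:, None, None]                                  # Eq. (10)
```

With all-zero gate weights, every attention value is sigmoid(0) = 0.5, so each channel is simply halved; trained weights instead learn unequal per-channel scales.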
IV-B2 Spatial-Wise Attention
Spatial-wise attention focuses on learning where the more informative parts lie across the spatial domain. Specifically, after conversion to the angular-delay domain, the channel coefficients exhibit a cluster effect corresponding to the distinguishable paths that arrive with specific delays and AoAs. In order to pay more attention to those clusters, CQNet employs a CBAM block [22] to learn a weighted differentiation over the spatial domain, as Figure 4 illustrates.
First, two pooling operations, i.e., average-pooling and max-pooling, are applied across the channels of the input $\tilde{U}$ to generate two 2D feature maps, $F_{avg}$ and $F_{max}$, respectively. CQNet concatenates the two feature maps into a compressed spatial feature descriptor $[F_{avg}; F_{max}]$ and convolves it with a standard convolution layer to produce a 2D spatial attention mask $M_s$. The mask is activated by Sigmoid and then multiplied with the original feature maps to obtain the spatially attended output,

(11)  $M_s = \sigma\!\left(f^{conv}([F_{avg}; F_{max}])\right), \qquad U' = M_s \otimes \tilde{U}.$

With spatial-wise attention, CQNet focuses the neural network on the more informative signal propagation paths in the angular-delay domain.
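The spatial branch of Eq. (11) can likewise be sketched in numpy; the odd kernel size with 'same' padding follows the CBAM design [22] and is an assumption here, not a detail confirmed by the paper:

```python
import numpy as np

def spatial_attention(U, kernel):
    """CBAM-style spatial attention: channel-wise avg/max pooling,
    concatenation, one 2D convolution, Sigmoid mask, rescale.
    U: (C, H, W); kernel: (2, k, k) with k odd."""
    desc = np.stack([U.mean(axis=0), U.max(axis=0)])   # (2, H, W) descriptor
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))
    H, W = desc.shape[1:]
    m = np.empty((H, W))
    for i in range(H):                  # naive 'same'-padded 2D convolution
        for j in range(W):
            m[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    mask = 1.0 / (1.0 + np.exp(-m))     # Sigmoid activation of the mask
    return U * mask[None]               # broadcast mask over all channels
```

Unlike the channel-wise branch, which outputs one scale per channel, this branch outputs one scale per angular-delay position, shared by all channels.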
IV-C Reduction of the Computation Cost
In practice, UEs are often edge devices with limited computational power, memory and storage, which must be taken into consideration in CQNet design. This section details our efforts in reducing its space and time cost.
IV-C1 Space Cost
Since the final objective of CQNet is to compress the CSI into a fixed-length vector $\mathbf{v}$ with compression ratio $\gamma$, the last layer of the encoder is conventionally a fully connected layer. The operation of a 1×1 convolution is equivalent to that of a fully connected layer applied at each position, since both entail element-wise multiplication. CQNet replaces the fully connected layer with a 1×1 convolution layer, which greatly reduces the parameters of the network because the convolution weights are shared across positions. Taking the input $H_a'$ of 2048 dimensions and $\gamma = 1/4$ as an example, the number of parameters of the fully connected layer is $2048 \times 512 = 1{,}048{,}576$, while that of the convolution layer is 512 times fewer.
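The equivalence that justifies the swap — a 1×1 convolution applies one dense matrix to the channel vector at every spatial position, sharing it across all positions instead of learning a separate weight per position — can be checked numerically (toy shapes, not CQNet's layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out, H, W = 4, 3, 5, 5
x = rng.standard_normal((C_in, H, W))
Wk = rng.standard_normal((C_out, C_in))   # a 1x1 conv kernel IS a dense matrix

# 1x1 convolution: contract the channel axis at every spatial position.
conv_out = np.einsum('oc,chw->ohw', Wk, x)

# Fully connected layer applied per position: same linear map, same result.
fc_out = (Wk @ x.reshape(C_in, -1)).reshape(C_out, H, W)

assert np.allclose(conv_out, fc_out)
```

The parameter saving comes precisely from this sharing: the dense matrix depends only on channel counts, not on the spatial extent it is applied over.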
IV-C2 Time Cost
The Sigmoid activation function, as commonly used, contains an exponential operation,

(12)  $\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}.$

In order to reduce the computation time, CQNet replaces the Sigmoid function with its hard version, a piecewise-linear analogue denoted as hard-Sigmoid [7, 9],

(13)  $\text{hard-Sigmoid}(x) = \frac{\mathrm{ReLU6}(x + 3)}{6},$

where ReLU6 is a clipped version of ReLU, which ensures quantization precision on float16 edge devices,

(14)  $\mathrm{ReLU6}(x) = \min(\max(0, x), 6).$
Figure 5 compares the excitation curves of the hard-Sigmoid and Sigmoid functions. The hard-Sigmoid induces no discernible degradation in accuracy but benefits from the computational advantage of entailing no exponential calculation. In practice, hard-Sigmoid fits in most software and hardware frameworks and can mitigate the potential numerical quantization loss introduced by different hardware platforms.
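Eqs. (12)-(14) translate directly into code; a quick numerical check confirms that the hard version matches Sigmoid exactly at the origin and saturates to the same limits, while using only comparisons and one division:

```python
import numpy as np

def relu6(x):
    """Eq. (14): ReLU clipped to [0, 6]."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_sigmoid(x):
    """Eq. (13): piecewise-linear Sigmoid analogue, no exponentials."""
    return relu6(x + 3.0) / 6.0

def sigmoid(x):
    """Eq. (12): the conventional Sigmoid, for comparison."""
    return 1.0 / (1.0 + np.exp(-x))
```

Both give 0.5 at x = 0; the hard version clips exactly to 0 below -3 and to 1 above +3, whereas Sigmoid only approaches those limits asymptotically.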
IV-D Effective Quantization in the Neural Network
Unlike the commonly used encoder-decoder framework, where the encoder output is fed directly into the decoder to reconstruct the input, in CSI feedback the encoder output at the UE side must be transferred to the BS as a bitstream through a real communication channel. The output of a DL encoder is commonly a 32-bit floating-point representation, providing an opportunity to perform further compression, which has not been studied in previous DL-based CSI feedback work. For instance, representing a 32-bit parameter with a 4-bit quantized number reduces the true compression ratio by a further factor of 8, which we call the effective compression ratio $\gamma_e$.
Direct quantization, however, leads to significant quantization loss, as we will show in our experimental evaluation. Given $H_a'$ of 2048 dimensions and $\gamma = 1/4$, we let the output $\mathbf{v}$ of the CRNet [14] encoder, with value range $[v_{min}, v_{max}]$, be quantized uniformly with bit width $B = 4$; namely, the 512-length 32-bit floating-point codeword $\mathbf{v}$ is transferred as a 2048-bit 0-1 bitstream by

(15)  $Q(v_i) = \left\lfloor \frac{v_i - v_{min}}{\Delta} \right\rfloor,$

where the range $[v_{min}, v_{max}]$ is divided into $2^B$ intervals and $\Delta = (v_{max} - v_{min}) / 2^B$ denotes the interval length.
The 2048 transferred bits are dequantized back to a 512-length floating-point vector and fed into the decoder. Following this operation, the average NMSE result drops dramatically, a several-fold performance loss.
Figure 6 visualizes the quantization process of a batch of 10 codewords of length 128. The upper plot shows the original output values in 32-bit floating-point form, and the bottom plot shows the corresponding values quantized into a 4-bit representation.
To mitigate the quantization loss, CQNet embeds the quantization-dequantization process as layers that are trained together with the whole neural network. Since the quantization operation is not differentiable, we set the gradient of the layer to be a constant. Essentially, the layer becomes a regularization term that forces the network to adjust its data distribution according to the quantization method and thus reduces the quantization loss.
Embedding the quantization layer in the deep neural network also offers room for adaptive quantization. We can either fix the bit width $B$ as a hyperparameter or set it as a learnable parameter, so that the quantization layer can adaptively choose the bit width used to represent a floating-point number.
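A minimal sketch of such a quantization layer, assuming uniform quantization per Eq. (15), mid-interval dequantization, and a constant (identity) backward gradient; the function names and the reconstruction rule are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def quantize(v, B, vmin, vmax):
    """Forward pass: uniform B-bit quantization over [vmin, vmax], Eq. (15)."""
    delta = (vmax - vmin) / (2 ** B)
    idx = np.floor((v - vmin) / delta)
    return np.clip(idx, 0, 2 ** B - 1).astype(np.int64)

def dequantize(idx, B, vmin, vmax):
    """Map each interval index back to the interval midpoint."""
    delta = (vmax - vmin) / (2 ** B)
    return vmin + (idx + 0.5) * delta

def quant_layer_backward(grad_out):
    """Backward pass: the non-differentiable rounding is given a constant
    (identity) gradient, so gradients still flow into the encoder."""
    return grad_out
```

The quantize-dequantize round trip bounds the per-element error by half an interval, $\Delta/2$; training through the layer lets the network shape its codeword distribution so this residual error hurts the decoder as little as possible.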
V Evaluation
In this section, we evaluate the overall performance of CQNet and the efficacy of its key components. The detailed experiment setting is described in Section V-A. Section V-B presents the overall performance and computational overhead compared with state-of-the-art machine learning based CSI feedback approaches. We then conduct an ablation study by additively evaluating the forged complex-valued input layer and the two attention blocks to assess their efficacy (Section V-C). Finally, we analyze the effect of the new quantization layer and discuss the possibility of adaptive compression in Section V-D.
V-A Experiment Setting
V-A1 Data Generation
To ensure a fair performance comparison, we use the same dataset as provided in the first work on deep learning based massive MIMO CSI feedback [21], which is also used in later studies on this problem [5, 14, 20]. The channel coefficients are generated by the COST 2100 channel model [12] with $N_t = 32$ uniform linear array (ULA) antennas at the BS, $N_r = 1$ antenna at the UE, and $\tilde{N}_c = 1024$ subcarriers. There are two scenarios. The first is an indoor picocell scenario operating in the 5.3 GHz band, where the BS is positioned at the center of a 20 m square area and UEs are randomly positioned within that square. The other is an outdoor rural scenario operating in the 300 MHz band, where the BS is positioned at the center of a 400 m square area and UEs are randomly positioned within that square. The generated CSI matrices are converted to the angular-delay domain by 2D-DFT.
The total of 150,000 independently generated CSI samples is split into three parts: 100,000 for training, 30,000 for validation, and 20,000 for testing.
V-A2 Training Scheme and Evaluation Metric
As the comparison scheme, we use the state-of-the-art method CRNet [14], which significantly outperforms other CSI feedback work. CRNet demonstrates the effectiveness of a cosine-annealing learning rate with warm-up instead of a fixed learning rate, and hence CQNet adopts the same training scheme. To evaluate the performance, we measure the normalized mean square error (NMSE) between the original $H_a$ and the reconstructed $\hat{H}_a$:

(16)  $\mathrm{NMSE} = \mathrm{E}\!\left\{ \frac{\| H_a - \hat{H}_a \|_2^2}{\| H_a \|_2^2} \right\}.$
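The metric of Eq. (16), in the dB form used throughout the results, is a short numpy function (a hypothetical helper, not the authors' evaluation script):

```python
import numpy as np

def nmse_db(H, H_hat):
    """Average NMSE over a batch of CSI matrices of shape (N, Nc, Nt),
    reported in dB: per-sample error power over per-sample signal power,
    averaged across the batch."""
    err = np.sum(np.abs(H - H_hat) ** 2, axis=(1, 2))
    pwr = np.sum(np.abs(H) ** 2, axis=(1, 2))
    return 10.0 * np.log10(np.mean(err / pwr))
```

A reconstruction equal to the all-zeros matrix gives NMSE = 1, i.e., 0 dB; more negative values mean better reconstruction.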
The model was trained with a batch size of 200 and 8 workers on a single NVIDIA 2080Ti GPU. The number of epochs is set to 1000, as recommended in previous work [14, 5]. To further ensure fairness, we fix the random seed in every run.

V-B CQNet Overall Performance
Figure 7 shows the overall performance of CQNet compared with CRNet [14] under the same hardware conditions and training scheme. In indoor scenarios, CQNet obtains an average performance increase of 7.88%, with the most significant increase being 10.83%. In outdoor scenarios, the average improvement in NMSE is 8.26%, with the most significant increase being 12.03%. Even in the worst case, CQNet achieves 3.68% and 4.66% improvements in the indoor and outdoor scenarios, respectively. The results show that CQNet consistently outperforms CRNet at all compression ratios in both indoor and outdoor scenarios, with an 8.07% overall average improvement in NMSE. In addition, we notice that both CQNet and CRNet have lower accuracy in outdoor scenarios, which is probably caused by the data preprocessing: due to the long propagation distance, the propagation loss and path delay in outdoor scenarios are larger, so part of the information is discarded when truncating $H$ to $H_a$.
At the same time, we also measure the computational cost of the two models in flops (floating-point operations). As Figure 8 indicates, the number of flops of CQNet is 1.6%, 5.0%, 10.2%, 17.6%, and 26.6% less than CRNet at compression ratios of 1/64, 1/32, 1/16, 1/8, and 1/4, respectively, which indicates that CQNet yields higher accuracy with lower computational complexity. The flops reduction grows with the compression ratio: the milder the compression, the larger the reduction. This gain comes from replacing the fully connected layer, whose parameter count is proportional to the length of the codeword $\mathbf{v}$.
|                        |   A    |   B    |   C    |   D    |   E    |
| Baseline               |   ✓    |   ✓    |   ✓    |   ✓    |   ✓    |
| Complex-valued Input   |        |   ✓    |   ✓    |   ✓    |   ✓    |
| Channel-wise Attention |        |        |   ✓    |        |   ✓    |
| Spatial-wise Attention |        |        |        |   ✓    |   ✓    |
| γ = 1/4                | 21.912 | 26.880 | 26.146 | 22.322 | 27.212 |
| γ = 1/8                | 14.048 | 15.317 | 15.565 | 15.164 | 15.845 |
| γ = 1/16               | 10.216 | 11.347 | 11.130 | 10.651 | 11.277 |
| γ = 1/32               |  8.484 |  8.744 |  8.945 |  8.763 |  8.974 |
| γ = 1/64               |  6.063 |  6.072 |  6.045 |  6.064 |  6.438 |
V-C Ablation Study
Considering the limited interpretability of deep neural networks, we conduct an ablation study to better quantify the gains of the proposed forged complex-valued input and the attention mechanisms. The number of epochs for the ablation studies is set to 500; the rest of the settings remain the same as in Section V-A.
V-C1 Forged Complex-valued Input Design
To evaluate the forged complex-valued input design, we conduct an ablation study that adds the forged complex-valued input layer to the baseline CRNet without other modifications. The result is shown in Table I, column B. With the forged complex-valued input layer, the accuracy surpasses the baseline CRNet (Table I, column A) at all compression ratios, with an average improvement of 9.2%, which demonstrates the efficacy of appropriately interpreting the complex notation.
V-C2 Attention Mechanism Design
In addition to the complex-valued input design, we further conduct an ablation study with three groups of experiments: only adopting channel-wise attention (Table I, column C), only adopting spatial-wise attention (Table I, column D), and adopting both attention mechanisms (Table I, column E). Compared to the baseline, adopting either channel-wise or spatial-wise attention improves performance, with accuracy 8.8% and 3.5% higher for the channel-wise and spatial-wise attention, respectively. Adopting the two attention mechanisms at the same time gives the best performance gain, 11.9%.
However, we notice that the performance of the added attention mechanisms shows some uncertainty: in some cases, the accuracy may drop compared to that of Table I, column B, with the forged complex-valued input layer only. Nevertheless, adopting both attention mechanisms simultaneously still yields an overall performance 2.5% better than column B, which only adds the complex-valued input layer. This is why we adopt both attention mechanisms in the final CQNet design. In particular, when the compression rate is large, e.g., γ = 1/64, adopting both attention mechanisms improves the NMSE (dB) from 6.063 to 6.438, an improvement of 6.2%, while adding the complex-valued input layer alone improves it by only 0.1%.
V-D Effective Quantization
To evaluate the effect of the add-on quantization layer and the performance under the effective compression ratio $\gamma_e$, we conduct experiments in the indoor scenario with different quantization bit widths. We take the trained CRNet and CQNet at different compression ratios, let the encoder outputs go through quantization with the corresponding bit width, and feed them back to the decoder after dequantization. We follow this procedure for both CRNet and CQNet; the results are shown in Table II, denoted 'wo/q' and 'w/q' for the results without and with direct quantization, respectively. For large bit widths the quantization loss is very small, so we do not include those results. As the results demonstrate, both CRNet and CQNet suffer different levels of quantization loss at different compression rates. In the worst setting, both CRNet and CQNet suffer more than 4-fold accuracy drops. When the compression rate itself is high, the quantization loss is relatively small; for example, CRNet and CQNet see accuracy decreases of 23.9% and 12.1%, respectively. This means the accuracy loss of the DL model itself is dominant; therefore, by reducing the quantization loss, we can choose the model with less accuracy loss for quantization and thus achieve better results at the same compression rate.
As mentioned in Section IV-D, we embed quantization as a layer of the neural network and set its gradient to a constant to regularize the network. Results are shown in Table II, denoted 'e/q'. Compared to direct quantization (w/q), the embedded quantization layer improves the accuracy at all and all , with an especially significant improvement for . At the values of and , the improvements of CRNet and CQNet are 236.6% and 245.4%, 204.02% and 36.8%, 265.4% and 56.5%, and 39.3% and 65.9%, respectively. These results fully demonstrate the effectiveness of the embedded quantization layer and show that its performance gain is orthogonal to the neural network architecture.
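The constant-gradient treatment described above corresponds to what the deep learning literature calls a straight-through estimator: the forward pass emits the quantized value while the backward pass behaves as the identity. A framework-agnostic NumPy sketch follows; the [0, 1] range and the `x + (q - x)` formulation are our assumptions, not necessarily CQNet's exact implementation:

```python
import numpy as np

def uniform_quantize(x, bits):
    # B-bit uniform quantize-then-dequantize for values in [0, 1]
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def quantize_ste(x, bits):
    # Straight-through estimator: the forward value equals the quantized
    # output, but because (q - x) is treated as a constant during
    # backpropagation, the layer's Jacobian w.r.t. x is the identity.
    q = uniform_quantize(x, bits)
    return x + (q - x)  # in an autodiff framework, (q - x) would be detached

def ste_backward(upstream_grad):
    # the constant (identity) gradient assigned to the quantization layer
    return upstream_grad
```

In PyTorch or TensorFlow, the same effect is obtained by detaching `(q - x)` from the computation graph, so the non-differentiable rounding never blocks gradient flow back to the encoder.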
This design gives us a new means of effective compression that compensates for the reduced accuracy of various current schemes at large compression rates. The best results, with the corresponding effective compression ratio under this quantization design, are bolded in Table II. Even compared to the ideal case without quantization loss, this effective compression approach improves the accuracy of CRNet and CQNet by 131.4% and 114.4%, 135.5% and 140.1%, and 130.1% and 137.9% for the original compression ratios of , , and , respectively. In addition, in the extreme case of , we achieve accuracy similar to that at before, pushing the limit of the compression ratio from to , an 8x improvement.
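As a concrete illustration of the idea, an effective compression ratio can be computed by counting the actual feedback payload (codeword length times bits per entry) against transmitting every CSI coefficient at full floating-point precision. The definition and the numbers below are illustrative assumptions, not the paper's values:

```python
def effective_compression_ratio(n_coeffs, codeword_len, bits, float_bits=32):
    """Feedback payload relative to sending all n_coeffs CSI values at
    float_bits precision (definition assumed for illustration)."""
    return (codeword_len * bits) / (n_coeffs * float_bits)

# e.g. 2048 CSI coefficients compressed to a 128-entry codeword:
# 4-bit quantization shrinks the payload a further 8x versus 32-bit
# floats, so a 1/16 dimensional compression becomes 1/128 effectively.
ratio = effective_compression_ratio(2048, 128, 4)  # -> 1/128
```

This is how a lower bit width can buy back dimensional compression at a fixed feedback budget, provided the quantization loss is kept small.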
VI Conclusion
In this paper, we study the CSI feedback problem for 5G communication systems, which is considered a bottleneck in massive MIMO operation. Taking the physical properties of the CSI data itself into consideration, we propose a novel deep learning framework, CQNet, built on attention mechanisms with pseudo-complex input. The overall performance of CQNet is superior to that of the state-of-the-art CRNet, with less computation overhead. In addition, we investigate a practical issue, the quantization loss faced in real communication systems, and show that integrating a quantization layer into the neural network can serve as a constraint that reduces quantization loss. With our proposed effective compression paradigm, we mitigate the accuracy reduction that previous schemes suffer at large compression rates and can further increase the compression rate.
Appendix A Detailed layer-level architecture diagram for reproducibility
References
 [1] (2016) Massive MIMO: ten myths and one critical question. IEEE Communications Magazine 54 (2), pp. 114–123. Cited by: §I.
 [2] (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics 59 (8), pp. 1207–1223. Cited by: §II.
 [3] (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics 57 (11), pp. 1413–1457. Cited by: §I, §II.
 [4] (2009) Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106 (45), pp. 18914–18919. Cited by: §I, §II.
 [5] (2020) Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: design, simulation, and analysis. IEEE Transactions on Wireless Communications 19 (4), pp. 2827–2840. Cited by: §I, §II, §V-A1, §V-A2.
 [6] (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507. Cited by: §I.

 [7] (2019) Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314–1324. Cited by: §IV-C2.
 [8] (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. Cited by: §IV-B1.
 [9] (2010) Convolutional deep belief networks on CIFAR-10. Unpublished manuscript 40 (7), pp. 1–9. Cited by: §IV-C2.
 [10] (2003) Correlation analysis based on MIMO channel measurements in an indoor environment. IEEE Journal on Selected Areas in Communications 21 (5), pp. 713–720. Cited by: §I.
 [11] (2009) User's guide for TVAL3: TV minimization by augmented Lagrangian and alternating direction algorithms. CAAM Report 20 (4647), pp. 4. Cited by: §I, §II.
 [12] (2012) The COST 2100 MIMO channel model. IEEE Wireless Communications 19 (6), pp. 92–99. Cited by: §V-A1.
 [13] (2014) An overview of massive MIMO: benefits and challenges. IEEE Journal of Selected Topics in Signal Processing 8 (5), pp. 742–758. Cited by: §I.
 [14] (2020) Multi-resolution CSI feedback with deep learning in massive MIMO system. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §I, §II, §IV-D, §IV, §V-A1, §V-A2, §V-B.
 [15] (2013) Special issue on massive MIMO. Journal of Communications and Networks 15 (4), pp. 333–337. Cited by: §I.
 [16] (2010) Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Transactions on Wireless Communications 9 (11), pp. 3590–3600. Cited by: §I.
 [17] (2016) From denoising to compressed sensing. IEEE Transactions on Information Theory 62 (9), pp. 5117–5144. Cited by: §I, §II.

 [18] (1989) ESPRIT – estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (7), pp. 984–995. Cited by: §I.
 [19] (2017) Deep complex networks. arXiv preprint arXiv:1705.09792. Cited by: §IV-A.
 [20] (2018) Deep learning-based CSI feedback approach for time-varying massive MIMO channels. IEEE Wireless Communications Letters 8 (2), pp. 416–419. Cited by: §I, §II, §V-A1.
 [21] (2018) Deep learning for massive MIMO CSI feedback. IEEE Wireless Communications Letters 7 (5), pp. 748–751. Cited by: §I, §II, §IV, §V-A1.
 [22] (2018) CBAM: convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19. Cited by: §IV-B2.
 [23] (2019) Deep learning-based downlink channel prediction for FDD massive MIMO system. IEEE Communications Letters 23 (11), pp. 1994–1998. Cited by: §I.
 [24] (2007) Experimental study of MIMO channel statistics and capacity via the virtual channel representation. Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. 5, pp. 10–15. Cited by: §I.