Learning on a Grassmann Manifold: CSI Quantization for Massive MIMO Systems

This paper focuses on the design of beamforming codebooks that maximize the average normalized beamforming gain for any underlying channel distribution. While the existing techniques use statistical channel models, we utilize a model-free data-driven approach with foundations in machine learning to generate beamforming codebooks that adapt to the surrounding propagation conditions. The key technical contribution lies in reducing the codebook design problem to an unsupervised clustering problem on a Grassmann manifold where the cluster centroids form the finite-sized beamforming codebook for the channel state information (CSI), which can be efficiently solved using K-means clustering. This approach is extended to develop a remarkably efficient procedure for designing product codebooks for full-dimension (FD) multiple-input multiple-output (MIMO) systems with uniform planar array (UPA) antennas. Simulation results demonstrate the capability of the proposed design criterion in learning the codebooks, reducing the codebook size and producing noticeably higher beamforming gains compared to the existing state-of-the-art CSI quantization techniques.

I Introduction

Transmit beamforming with receive combining is one of the simplest approaches to achieve full diversity in a MIMO system. It requires CSI at the transmitter only in the form of the transmit beamforming vector. Frequency division duplex (FDD) large-scale MIMO systems cannot exploit channel reciprocity to acquire CSI at the transmitter from uplink transmission. This necessitates channel estimation using downlink pilots and the subsequent feedback of the channel estimates (in this case the beamforming vector) to the transmitter over a dedicated feedback channel with limited capacity, which results in a significant overhead when the number of antennas is large. One way to overcome this problem is to construct a set of beamforming vectors constituting a codebook, which is known to both the transmitter and the receiver. The problem then reduces to determining the best beamforming vector at the receiver and conveying its index to the transmitter over the feedback channel [1]. A key step in this procedure is to construct the codebook, which is a classical problem in MIMO communications [2]. A common feature of all classical works in this direction is the assumption of a statistical channel model (such as Rayleigh) for which the optimal codebooks are constructed to optimize system performance.

With the recent interest in data-driven approaches to wireless system design, it is quite natural to wonder whether machine learning has any role to play in this classical problem. Since the fundamental difficulty in this problem is the dimensionality of the channel, the natural tendency is to think in terms of obtaining a low dimensional representation of the channel using deep learning techniques, such as autoencoders [3], [4], which can be used for codebook construction. An autoencoder operates on the hypothesis that the data possesses a representation on a lower dimensional manifold of the feature space, albeit unknown, and tries to learn the embedded manifold by training over the dataset. In contrast, for MIMO beamforming, the underlying manifold is known to be a Grassmann manifold. This removes the requirement of “learning” the manifold from the dataset, which can often be extremely complicated. Once the manifold is known, we can leverage “shallow” learning techniques, such as clustering algorithms on the manifold, to find the optimal codebook for beamforming.

Prior Art. As is the case with any communication theory problem, almost all existing works on limited feedback assume some analytical model for the channel to enable tractable analyses, e.g., i.i.d. Rayleigh fading [5], spatial correlation [6], temporal correlation [7] or both [8]. Specifically, the problem of quantized maximum ratio transmission (MRT) beamforming can be interpreted as a Grassmannian line packing problem for both uncorrelated [9] and spatially correlated [10] Rayleigh fading channels and has been extensively studied. The idea of connecting Grassmann manifolds to wireless communications is not new and has been used in other aspects of MIMO systems, such as non-coherent communication [11] and limited feedback unitary precoding [12]. In the context of limited feedback FDD MIMO, the codebook based on Grassmannian line packing is strictly dependent on the assumption of Rayleigh fading and hence cannot be extended to more realistic scenarios. On the other hand, discrete Fourier transform (DFT) based beamforming exploits the second order statistics of the channel (such as the direction of departure of the dominant path) and offers a simple yet robust solution to the codebook construction. Owing to the direct connection with the spatial parameters of the channel, the DFT codebook can be extended to the Kronecker product (KP) codebook for 3D beamforming in FD-MIMO scenarios. A major drawback of the DFT codebook is that it scans all possible directions even though many of them may not be used, and thus the available feedback bits are not used efficiently. Finally, since we will be proposing a clustering based solution, it is useful to note that clustering has already found applications in many related problems, such as MIMO detection [13], automatic modulation recognition [14], and radio resource allocation in a heterogeneous network [15].

Contributions. The key technical contribution of this paper lies in the novel formulation of transmit beamforming codebook design for any arbitrary channel distribution as a Grassmannian $K$-means clustering problem. First, we develop the algorithm for $K$-means clustering on the Grassmann manifold that finds the centroids of the clusters. Leveraging the fact that optimal MRT beamforming vectors lie on a Grassmann manifold, we then develop the design criterion for optimal beamforming codebooks. We then formally establish the connection between the Grassmannian $K$-means algorithm and the codebook design problem and show that the optimal codebook is nothing but the set of centroids given by the $K$-means algorithm. This approach is further extended to develop product codebooks for FD-MIMO systems employing UPA antennas. In particular, we show that under the rank-1 approximation of the channel, the optimal codebook can be decomposed as the Cartesian product of two Grassmannian codebooks of smaller dimensions. We discuss the optimality and performance of the codebooks obtained with both proposed techniques in terms of average normalized beamforming gain.

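Notation. We use boldface lower-case (upper-case) letters, e.g. $\mathbf{a}$ ($\mathbf{A}$), to designate column vectors (matrices) with entries in $\mathbb{C}$. We use $\mathbb{C}^{M\times N}$ to represent the $M\times N$ dimensional complex space, $\mathcal{U}(M,N)$ to represent the set of all $M\times N$ orthonormal matrices, and $\mathcal{U}(M)$ to represent the set of all $M\times M$ unitary matrices. Further, $(\cdot)^*$ denotes the complex conjugate, $(\cdot)^T$ denotes the transpose, $(\cdot)^H$ denotes the Hermitian (conjugate transpose), $\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ denotes the singular value decomposition of $\mathbf{A}$, and $\mathrm{vec}(\mathbf{A})$ denotes the vectorization of a matrix $\mathbf{A}$. Also, $\mathbb{E}_{\mathbf{x}}[\cdot]$ represents the expectation over the distribution of $\mathbf{x}$, $|\cdot|$ denotes the absolute value, and $\|\cdot\|_2$ denotes the matrix two-norm.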

II System Overview

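We consider a narrow-band point-to-point MIMO communication scenario, where the transmitter and receiver are equipped with $M_t$ and $M_r$ antennas, respectively. In this paper, we focus on the transmit beamforming operation, where the transmitter sends one data stream over a flat fading channel. The discrete-time baseband input-output relation for this system can be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{f}s + \mathbf{n}, \tag{1}$$

where $\mathbf{y}\in\mathbb{C}^{M_r\times 1}$ is the received baseband signal, $\mathbf{H}\in\mathbb{C}^{M_r\times M_t}$ is the block fading MIMO channel, $s$ is the transmitted symbol, $\mathbf{n}\in\mathbb{C}^{M_r\times 1}$ is the additive noise at the receiver, and $\mathbf{f}\in\mathbb{C}^{M_t\times 1}$ is the beamforming vector. The symbol energy is given by $E_t$ and the total transmitted energy is $E_t\|\mathbf{f}\|_2^2$. The additive noise is Gaussian, i.e., the entries of $\mathbf{n}$ are i.i.d. complex Gaussian. It is assumed that perfect channel knowledge is always available at the receiver. With the combining vector $\mathbf{z}\in\mathbb{C}^{M_r\times 1}$, the estimated transmitted symbol is obtained as $\mathbf{z}^H\mathbf{y}$. The receive SNR is

$$\gamma_r = \frac{E_t\,|\mathbf{z}^H\mathbf{H}\mathbf{f}|^2}{|\mathbf{z}^H\mathbf{n}\mathbf{n}^H\mathbf{z}|} = \gamma_t\,\frac{|\mathbf{z}^H\mathbf{H}\mathbf{f}|^2}{\|\mathbf{z}\|_2^2\,\|\mathbf{f}\|_2^2},$$

where $\gamma_t$ is the transmit SNR. Without loss of generality, it is assumed that $\|\mathbf{z}\|_2 = 1$. Under this assumption

$$\gamma_r = \gamma_t\,\frac{|\mathbf{z}^H\mathbf{H}\mathbf{f}|^2}{\|\mathbf{f}\|_2^2} = \gamma_t\,\Gamma(\mathbf{f},\mathbf{z}), \tag{2}$$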

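where $\Gamma(\mathbf{f},\mathbf{z})$ is the effective channel gain or the beamforming gain. The MIMO beamforming problem is to choose $\mathbf{f}$ and $\mathbf{z}$ such that $\Gamma(\mathbf{f},\mathbf{z})$ is maximized, which would in turn maximize the SNR and consequently minimize the average probability of error and maximize the capacity [16]. A receiver that employs maximum ratio combining (MRC) chooses $\mathbf{z}$ such that $\Gamma(\mathbf{f},\mathbf{z})$ is maximized for a given $\mathbf{f}$ [9]. Under the assumption that the receiver always uses MRC, $\mathbf{z}$ is given by

$$\mathbf{z} = \mathbf{H}\mathbf{f}/\|\mathbf{H}\mathbf{f}\|_2,$$

and $\Gamma(\mathbf{f},\mathbf{z})$ can be simplified as

$$\Gamma(\mathbf{f}) = \frac{\|\mathbf{H}\mathbf{f}\|_2^2}{\|\mathbf{f}\|_2^2}. \tag{3}$$

Therefore the MIMO beamforming problem is to find the optimal beamforming vector $\mathbf{f}$ that maximizes $\Gamma(\mathbf{f})$ and can be formally posed as

$$\mathbf{f} = \arg\max_{\mathbf{x}\in\mathbb{C}^{M_t\times 1}} \Gamma(\mathbf{x}). \tag{4}$$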

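To constrain transmit power, we assume that $\|\mathbf{f}\|_2^2 = 1$ without loss of generality. We consider maximum ratio transmission (MRT), which selects $\mathbf{f}$ to maximize $\Gamma(\mathbf{f})$ for a given $\mathbf{H}$ [17]. Under the assumptions of MRT, receive MRC, and no other design constraints on $\mathbf{f}$, for a given $\mathbf{H}$, the optimal beamforming vector that maximizes $\Gamma(\mathbf{f})$ is

$$\mathbf{f} = \arg\max_{\mathbf{x}\in\mathbb{C}^{M_t\times 1}} \|\mathbf{H}\mathbf{x}\|_2^2 \ \text{ subject to } \ \|\mathbf{x}\|_2^2 = 1 \;=\; \arg\max_{\mathbf{x}\in\mathcal{U}(M_t,1)} \|\mathbf{H}\mathbf{x}\|_2^2. \tag{5}$$

Note that the $\arg\max$ of any function returns only one out of its possibly many global maximizers, and thus the output may not necessarily be unique. For an MRT system, $\mathbf{f}$ is the unit-norm eigenvector associated with the maximum eigenvalue of $\mathbf{H}^H\mathbf{H}$ [18]. Let $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_{M_t}$ be the eigenvalues of $\mathbf{H}^H\mathbf{H}$ and $\mathbf{v}_1,\mathbf{v}_2,\dots,\mathbf{v}_{M_t}$ be the corresponding eigenvectors. One possible solution of (5) is $\mathbf{f}=\mathbf{v}_1$ and the corresponding beamforming gain is

$$\Gamma(\mathbf{v}_1) = \|\mathbf{H}\mathbf{v}_1\|_2^2 = \lambda_1. \tag{6}$$

For a given $\mathbf{H}$, consider the set of all solutions of (5). Then $\mathbf{v}_1$ belongs to this set, and every beamforming vector $\mathbf{f}$ in this set achieves $\Gamma(\mathbf{f})=\lambda_1$.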
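The eigenvector characterization in (5) and (6) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation; the function name `mrt_beamformer`, the antenna counts, and the synthetic Rayleigh channel are assumptions made purely for illustration.

```python
import numpy as np

def mrt_beamformer(H):
    """Return (v1, lambda1): the dominant eigenpair of H^H H, cf. (5)-(6)."""
    # eigh handles Hermitian matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(H.conj().T @ H)
    return eigvecs[:, -1], float(eigvals[-1])

rng = np.random.default_rng(0)
Mr, Mt = 2, 8  # hypothetical antenna counts
# Synthetic i.i.d. complex Gaussian channel, used only as test data
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)

v1, lam1 = mrt_beamformer(H)
gain_mrt = np.linalg.norm(H @ v1) ** 2  # ||H v1||^2, should equal lambda_1

# An arbitrary unit-norm vector cannot exceed the MRT gain
x = rng.standard_normal(Mt) + 1j * rng.standard_normal(Mt)
x /= np.linalg.norm(x)
gain_random = np.linalg.norm(H @ x) ** 2

print(gain_mrt, lam1, gain_random)
```

Using `eigh` rather than a general eigensolver is a design choice: $\mathbf{H}^H\mathbf{H}$ is Hermitian, so the eigenvalues are real and returned sorted.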

MRT beamforming requires CSI at the transmitter (CSIT). In particular, in an FDD system, the receiver estimates the channel and sends $\mathbf{f}$ back to the transmitter over a feedback channel. Thus the feedback overhead increases as $M_t$ increases. Since the feedback channel is typically assumed to be a low-rate reliable channel, it is not always possible to transmit $\mathbf{f}$ over this channel without any data compression [1]. One way to model this feedback bottleneck is to assume that the feedback channel is zero-delay and error-free, with capacity limited to $B$ bits per channel use. Thus, it is necessary to introduce some method of quantization for $\mathbf{f}$. The most well-known approach for the quantization is to construct a dictionary of beams [1], also known as the beam codebook. In particular, the transmitter and receiver agree upon a finite set of possible beamforming vectors, say $\mathcal{F}$ of cardinality $2^B$. The receiver chooses the vector in $\mathcal{F}$ that maximizes $\Gamma$ and feeds the index of the codeword back to the transmitter. The system-level diagram of a limited feedback FDD-MIMO system, as discussed so far, is provided in Fig. 1.

Therefore, for a given codebook $\mathcal{F}$, the optimal beamforming vector as stated in (4) is

$$\mathbf{f} = \arg\max_{\mathbf{f}_i\in\mathcal{F}} \|\mathbf{H}\mathbf{f}_i\|_2^2. \tag{7}$$
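The receiver-side selection rule in (7) amounts to an exhaustive search over the codewords. A minimal sketch follows, assuming the codebook is stored as a matrix whose columns are unit-norm codewords; the function name `select_codeword` and the toy identity codebook are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def select_codeword(H, F):
    """Pick the codeword maximizing ||H f_i||^2, cf. (7).

    H: (Mr, Mt) channel matrix.
    F: (Mt, N) matrix whose columns are unit-norm codewords, N = 2^B.
    Returns the index fed back to the transmitter and the achieved gain.
    """
    gains = np.linalg.norm(H @ F, axis=0) ** 2  # one gain per codeword
    idx = int(np.argmax(gains))
    return idx, float(gains[idx])

# Toy example: Mt = 4, B = 2 bits, codewords are the standard basis vectors
F = np.eye(4, dtype=complex)
H = np.array([[0.0, 3.0, 0.0, 0.0]], dtype=complex)  # single receive antenna
idx, gain = select_codeword(H, F)
print(idx, gain)  # the second codeword (index 1) is aligned with the channel
```

Only the index `idx` needs to be fed back, which costs $B=\log_2 N$ bits regardless of $M_t$.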

It is important to note that the original problem of finding the optimal MRT solution for (5) is a constrained optimization problem on the Euclidean space and does not have a unique solution on $\mathcal{U}(M_t,1)$. This problem can be reformulated as a manifold optimization problem as follows. As argued in [9], it can be shown that the optimal MRT beamformers for every $\mathbf{H}$ lie on a special kind of Riemannian manifold embedded in $\mathbb{C}^{M_t}$, known as the Grassmann manifold. This will be discussed in Section III. As will be evident later, this manifold structure of the search domain of $\mathbf{f}$ in (5) is the key enabler for our data-driven codebook design.

III Clustering on a Grassmann Manifold

In this section, we provide a brief introduction to the Grassmann manifold, which is instrumental for the proposed beamforming codebook design. The reader is referred to the foundational texts in differential geometry, such as [19], for a comprehensive and rigorous treatment of this manifold. A Grassmann manifold refers to a set of subspaces embedded in a higher-dimensional space (such as the surface of a sphere in $\mathbb{R}^3$). More formally, the complex Grassmann manifold $\mathcal{G}(M_t,M)$ is defined as

$$\mathcal{G}(M_t,M) := \left\{\mathrm{span}(\mathbf{Y}) : \mathbf{Y}\in\mathbb{C}^{M_t\times M},\ \mathbf{Y}^H\mathbf{Y}=\mathbf{I}_M\right\}. \tag{8}$$

Any element $\mathcal{Y}$ in $\mathcal{G}(M_t,M)$ is typically represented by an orthonormal matrix $\mathbf{Y}$ whose columns span $\mathcal{Y}$. It is to be noted that there exists no unique representation of a subspace $\mathcal{Y}$. This can be explained as follows. Let $\mathbf{Y}$ be the orthonormal basis that spans $\mathcal{Y}$; then $\mathcal{Y}$ can also be spanned by some other orthonormal matrix $\mathbf{Y}'=\mathbf{Y}\mathbf{R}$ for some $\mathbf{R}\in\mathcal{U}(M)$. Thus $\mathbf{Y}$ and $\mathbf{Y}'$ span the same subspace, which is represented by an equivalence relation $\mathbf{Y}\equiv\mathbf{Y}'$. Each of these $M$-dimensional linear subspaces can be regarded as a single point on the Grassmann manifold, which is represented by its orthonormal basis. Since a linear subspace can be specified by an arbitrary basis, each point on $\mathcal{G}(M_t,M)$ is an equivalence class of orthonormal matrices. Specifically, $\mathbf{Y}$ and $\mathbf{Y}'$ correspond to the same point on $\mathcal{G}(M_t,M)$.

Now consider the case of $M=1$, i.e. $\mathcal{G}(M_t,1)$, which is the set of all one-dimensional subspaces in $\mathbb{C}^{M_t}$. In other words, one can visualize $\mathcal{G}(M_t,1)$ as the collection of all lines passing through the origin of the space $\mathbb{C}^{M_t}$. A line passing through the origin in $\mathbb{C}^{M_t}$ is represented in $\mathcal{G}(M_t,1)$ by a unit vector $\mathbf{f}$ that spans the line. It can also be generated by any other unit vector $\mathbf{f}'$ if $\mathbf{f}'\equiv\mathbf{f}$, i.e. $\mathbf{f}'=\mathbf{f}e^{j\theta}$ for some $\theta$. Also note that $\mathbf{f}$ and $-\mathbf{f}$ correspond to the same point on $\mathcal{G}(M_t,1)$. Since a complex Grassmann manifold of arbitrary dimensions is difficult to visualize, for illustration purposes we present a real Grassmann manifold in Fig. 2.

The notion of “distance” between the lines in $\mathcal{G}(M_t,1)$ generated by two unit vectors $\mathbf{f}_1,\mathbf{f}_2$ can be defined as the sine of the angle between the lines [20]. In particular, the distance is expressed as

$$d(\mathbf{f}_1,\mathbf{f}_2) = \sin(\theta_{1,2}) = \sqrt{1-|\mathbf{f}_1^H\mathbf{f}_2|^2}. \tag{9}$$
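To make the distance in (9) concrete, the sketch below evaluates it for a few pairs of unit vectors. It is an illustration only: the function name `chordal_distance` and the example vectors are assumptions, not part of the original text. The example also shows that a phase rotation $\mathbf{f}e^{j\theta}$ leaves the point on the manifold unchanged, so its distance to $\mathbf{f}$ is (numerically) zero.

```python
import numpy as np

def chordal_distance(f1, f2):
    """Distance (9) between the lines spanned by unit vectors f1 and f2."""
    c = abs(np.vdot(f1, f2))  # np.vdot conjugates its first argument: |f1^H f2|
    # Clip at zero to guard against tiny negative values from rounding
    return float(np.sqrt(max(0.0, 1.0 - c ** 2)))

e1 = np.array([1.0, 0.0], dtype=complex)
e2 = np.array([0.0, 1.0], dtype=complex)
f45 = (e1 + e2) / np.sqrt(2)            # line at 45 degrees to e1
f_rot = f45 * np.exp(1j * 0.7)          # same line, different representative

d_orth = chordal_distance(e1, e2)       # orthogonal lines
d_45 = chordal_distance(e1, f45)        # sin(45 degrees)
d_same = chordal_distance(f45, f_rot)   # same point on the manifold
print(d_orth, d_45, d_same)
```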

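The connection between $\mathcal{G}(M_t,1)$ and optimal MRT beamforming is established next.

Lemma 1.

For a given channel realization $\mathbf{H}$ and an optimal MRT beamformer $\mathbf{f}$ solving (5), every $\mathbf{f}'$ such that $\mathbf{f}'\equiv\mathbf{f}$ is also an optimal MRT beamformer.

Proof:

Given that $\mathbf{f}$ solves (5) and $\mathbf{f}'\equiv\mathbf{f}$, i.e. $\mathbf{f}'=\mathbf{f}e^{j\theta}$ for some $\theta$, then

$$\|\mathbf{H}\mathbf{f}\|_2^2 = |\mathbf{f}^H\mathbf{H}^H\mathbf{H}\mathbf{f}| = |e^{-j\theta}\mathbf{f}^H\mathbf{H}^H\mathbf{H}\mathbf{f}e^{j\theta}| = |\mathbf{f}'^H\mathbf{H}^H\mathbf{H}\mathbf{f}'| = \|\mathbf{H}\mathbf{f}'\|_2^2. \tag{10}$$

Therefore, $\mathbf{f}'$ is also a solution of (5). ∎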

Remark 1.

According to Lemma 1, every $\mathbf{f}'$ such that $\mathbf{f}'=\mathbf{f}e^{j\theta}$ corresponds to the same point on $\mathcal{G}(M_t,1)$. Therefore, any probability distribution on the channel $\mathbf{H}$ will impose a probability distribution on $\mathbf{f}$ in $\mathcal{G}(M_t,1)$. Thus we need a quantizer defined on $\mathcal{G}(M_t,1)$ to encode the optimal MRT beamforming vectors and generate a $B$-bit beamforming codebook for limited feedback beamforming.

The optimal MRT beamformers are points on $\mathcal{G}(M_t,1)$, and hence can be described as a point process whose characteristics depend on the underlying channel distribution.

Remark 2.

For a Rayleigh fading channel where the entries of $\mathbf{H}$ are i.i.d. complex Gaussian, it has been shown in [9] that the optimal MRT beamforming vector is uniformly distributed on $\mathcal{G}(M_t,1)$. Thus, the construction of $\mathcal{F}$ is equivalent to finding the best packing of lines in $\mathbb{C}^{M_t}$ [20]. See [9] for more details.

For an arbitrary distribution of $\mathbf{H}$, the optimal MRT beamforming vector will no longer be uniformly distributed on $\mathcal{G}(M_t,1)$. As an illustrative example, using the distance function defined in (9), we compare Ripley’s $K$ function [21] of the optimal MRT beamforming vectors on $\mathcal{G}(M_t,1)$ for a realistic scenario (see Section VI for more details on the experimental setup) with that of Rayleigh fading channels for the same system model. Ripley’s $K$ function is a spatial descriptive statistic that measures the deviation of a point process from spatial homogeneity. From Fig. 3, we see that the distribution of the optimal MRT beamforming vectors for the realistic channel is significantly different from the uniform distribution on $\mathcal{G}(M_t,1)$, which is equivalent to complete randomness on the manifold. We can also infer that the optimal MRT beamformers of the channel for the considered scenario exhibit a clustering tendency on $\mathcal{G}(M_t,1)$. Therefore, a reasonable codebook construction scheme is to deploy an unsupervised clustering method (such as $K$-means clustering) on $\mathcal{G}(M_t,1)$ that can identify optimal cluster centroids and form the codebook $\mathcal{F}$. In what follows, we will establish that $K$-means clustering on $\mathcal{G}(M_t,1)$ actually yields an optimal codebook.

III-A Grassmannian K-means Clustering

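$K$-means clustering on a given metric space is a method of vector quantization that partitions a set of data points into $K$ non-overlapping clusters in which each data point belongs to the cluster with the nearest cluster centroid. The centroids are the quantized representations of the data points that belong to the respective clusters. A quantizer on the given metric space maps the data points to one of the $K$ centroids. The centroids are chosen such that the average distortion due to the quantization, according to a pre-defined distortion measure, is minimized. Before we formally introduce the main steps of the clustering algorithm on $\mathcal{G}(M_t,1)$, we first define the notion of a distortion measure and a quantizer as follows.

Definition 1 (Distortion measure).

The distortion caused by representing $\mathbf{x}\in\mathcal{G}(M_t,1)$ with $\mathbf{f}\in\mathcal{G}(M_t,1)$ is defined as the distortion measure $d_o(\mathbf{x},\mathbf{f})$, which is given by $d_o(\mathbf{x},\mathbf{f}) = d^2(\mathbf{x},\mathbf{f})$.

Definition 2 (Grassmann quantizer).

Let $\mathcal{F}$ be a $B$-bit codebook such that $\mathcal{F}\subset\mathcal{G}(M_t,1)$; then a Grassmann quantizer $Q_{\mathcal{F}}$ is defined as a function mapping elements of $\mathcal{G}(M_t,1)$ to elements of $\mathcal{F}$, i.e. $Q_{\mathcal{F}}:\mathcal{G}(M_t,1)\to\mathcal{F}$.

A performance measure of a Grassmann quantizer is the average distortion $D(Q_{\mathcal{F}})$, where

$$D(Q_{\mathcal{F}}) := \mathbb{E}_{\mathbf{x}}\left[d_o(\mathbf{x},Q_{\mathcal{F}}(\mathbf{x}))\right] = \mathbb{E}_{\mathbf{x}}\left[d^2(\mathbf{x},Q_{\mathcal{F}}(\mathbf{x}))\right]. \tag{11}$$

In most practical settings, we may have access to a set of data points in lieu of the probability distribution of $\mathbf{x}$. Then the expectation w.r.t. $\mathbf{x}$ in (11) means averaging over this set. Therefore the objective of $K$-means clustering with $K=2^B$ is to find the set of $K$ centroids, i.e. $\mathcal{F}_K$, that minimizes $D(Q_{\mathcal{F}})$ and can be expressed as

$$\mathcal{F}_K = \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} D(Q_{\mathcal{F}}) = \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} \mathbb{E}_{\mathbf{x}}\left[d^2(\mathbf{x},Q_{\mathcal{F}}(\mathbf{x}))\right], \tag{12}$$

and the associated quantizer is

$$Q_{\mathcal{F}_K}(\mathbf{x}) = \arg\min_{\mathbf{f}_i\in\mathcal{F}_K} d_o(\mathbf{x},\mathbf{f}_i) = \arg\min_{\mathbf{f}_i\in\mathcal{F}_K} d^2(\mathbf{x},\mathbf{f}_i). \tag{13}$$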

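However, finding the optimal solution for $K$-means clustering is an NP-hard problem. Therefore, we use the Linde-Buzo-Gray algorithm [22] (outlined in Alg. 1), a heuristic algorithm that iterates between updating the cluster centroids and mapping each data point to the nearest centroid, and that guarantees convergence to a local optimum. In Alg. 1, the only non-trivial step is the centroid calculation for a set of points. In contrast to the squared distortion measure in the Euclidean domain, the centroid of elements in a general manifold with respect to an arbitrary distortion measure does not necessarily exist in a closed form. However, the centroid computation on $\mathcal{G}(M_t,1)$ is feasible because of the following lemma.

Lemma 2 (Centroid computation).

For a set of points $\{\mathbf{x}_i\}_{i=1}^{N_k}$, $\mathbf{x}_i\in\mathcal{G}(M_t,1)$, that form the $k$-th Voronoi partition, the centroid $\mathbf{f}_k$ is

$$\mathbf{f}_k = \arg\min_{\mathbf{f}\in\mathcal{G}(M_t,1)} \sum_{i=1}^{N_k} d^2(\mathbf{x}_i,\mathbf{f}) = \operatorname{eig}\left(\sum_{i=1}^{N_k}\mathbf{x}_i\mathbf{x}_i^H\right), \tag{15}$$

where $\operatorname{eig}(\mathbf{X})$ is the dominant eigenvector of the matrix $\mathbf{X}$. Indeed, since $d^2(\mathbf{x}_i,\mathbf{f}) = 1-|\mathbf{x}_i^H\mathbf{f}|^2$, minimizing the sum is equivalent to maximizing $\mathbf{f}^H\left(\sum_{i}\mathbf{x}_i\mathbf{x}_i^H\right)\mathbf{f}$ over unit-norm $\mathbf{f}$.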
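Since Alg. 1 is not reproduced in this text, the following is only a plausible sketch of an LBG-style Grassmannian $K$-means loop built from (13) and (15); it should not be read as the authors' exact algorithm. The function names, the random initialization from data points, the fixed iteration count, and the synthetic two-cluster data are all assumptions.

```python
import numpy as np

def grassmann_centroid(X):
    """Centroid (15): dominant eigenvector of sum_i x_i x_i^H. X is (Mt, N)."""
    _, vecs = np.linalg.eigh(X @ X.conj().T)  # ascending eigenvalues
    return vecs[:, -1]

def grassmann_kmeans(X, K, n_iter=20, seed=0):
    """LBG-style K-means on G(Mt, 1). Columns of X are unit-norm points."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    # Assumed initialization: K distinct data points picked at random
    F = X[:, rng.choice(N, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Nearest centroid under d^2 = 1 - |f^H x|^2 means largest |f^H x|^2
        labels = np.argmax(np.abs(F.conj().T @ X) ** 2, axis=0)
        for k in range(K):
            members = X[:, labels == k]
            if members.shape[1] > 0:  # keep the old centroid if a cluster is empty
                F[:, k] = grassmann_centroid(members)
    corr = np.abs(F.conj().T @ X) ** 2
    labels = np.argmax(corr, axis=0)
    distortion = float(np.mean(1.0 - corr.max(axis=0)))  # empirical version of (11)
    return F, labels, distortion

# Synthetic data: noisy lines around two orthogonal directions in C^4
rng = np.random.default_rng(1)
Mt = 4

def noisy_lines(center, n):
    Z = center[:, None] + 0.05 * (rng.standard_normal((Mt, n))
                                  + 1j * rng.standard_normal((Mt, n)))
    Z = Z * np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # arbitrary phase per point
    return Z / np.linalg.norm(Z, axis=0)

X = np.hstack([noisy_lines(np.eye(Mt)[:, 0], 100), noisy_lines(np.eye(Mt)[:, 1], 100)])
F, labels, distortion = grassmann_kmeans(X, K=2, n_iter=20, seed=1)
print(distortion)
```

Each assignment step and each centroid update cannot increase the average distortion, which is why the loop settles at a local optimum rather than a guaranteed global one.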

IV Grassmannian Codebook Design

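In this section, we formally establish the connection between Grassmannian $K$-means clustering and the optimal codebook construction. The transmitter and receiver use a $B$-bit codebook $\mathcal{F}$ to map the channel matrix $\mathbf{H}$ to a codeword $\mathbf{f}$ in $\mathcal{F}$ according to (7). In order to define the optimality of the codebook, we first introduce the average normalized beamforming gain for $\mathcal{F}$ as

$$\Gamma_{\rm av} := \mathbb{E}_{\mathbf{H}}\left[\frac{\Gamma(\mathbf{f})}{\Gamma(\mathbf{v}_1)}\right] = \mathbb{E}_{\mathbf{H}}\left[\frac{\|\mathbf{H}\mathbf{f}\|_2^2}{\lambda_1}\right] = \mathbb{E}_{\mathbf{H}}\left[\frac{\sum_{i=1}^{M_t}\lambda_i|\mathbf{v}_i^H\mathbf{f}|^2}{\lambda_1}\right] \geq \mathbb{E}_{\mathbf{v}_1}\left[|\mathbf{v}_1^H\mathbf{f}|^2\right].$$

To measure the average distortion introduced by the quantization using $\mathcal{F}$, we use the loss in $\Gamma_{\rm av}$ as given below:

$$L(\mathcal{F}) := \mathbb{E}_{\mathbf{H}}\left[1-\Gamma_{\rm av}\right] \leq \mathbb{E}_{\mathbf{v}_1}\left[1-|\mathbf{v}_1^H\mathbf{f}|^2\right] := L_{\rm ub}(\mathcal{F}), \tag{16}$$

where $L_{\rm ub}(\mathcal{F})$ is an upper bound of $L(\mathcal{F})$, and a sufficient condition for equality in (16) is $\lambda_2=\dots=\lambda_{M_t}=0$. The optimal codebook intends to minimize $L(\mathcal{F})$ by minimizing its upper bound as given in (16). Since the current limited feedback approach quantizes $\mathbf{v}_1$ as $\mathbf{f}$, it is reasonable to minimize $L_{\rm ub}(\mathcal{F})$, which depends only on $\mathbf{v}_1$ and $\mathbf{f}$. This yields the following codebook design criterion.

Definition 3 (Codebook design criterion).

Over all of the $B$-bit codebooks $\mathcal{F}\subset\mathcal{G}(M_t,1)$, the Grassmannian codebook $\mathcal{F}^*$ is the one that minimizes $L_{\rm ub}(\mathcal{F})$. Therefore

$$\mathcal{F}^* := \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} L_{\rm ub}(\mathcal{F}). \tag{17}$$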

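Building on this discussion, we now state the method to find the optimal codebook in $\mathcal{G}(M_t,1)$ as follows.

Theorem 1.

For a feedback channel with capacity $B$ bits per channel use, the Grassmannian codebook as defined in Definition 3 is the same as the set of cluster centroids found by the $K$-means algorithm with $K=2^B$ that minimizes $D(Q_{\mathcal{F}})$ for a given distribution of the optimal MRT beamforming vector through its training dataset as given in (11), i.e.

$$\mathcal{F}^* = \mathcal{F}_K, \tag{18}$$

where $\mathcal{F}_K$ is given by (12).

Proof:

The optimal codebook is given by

$$\begin{aligned}
\mathcal{F}^* &= \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} L_{\rm ub}(\mathcal{F}) = \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} \mathbb{E}_{\mathbf{v}_1}\left[1-|\mathbf{v}_1^H\mathbf{f}|^2\right] \\
&= \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} \mathbb{E}_{\mathbf{v}_1}\left[\min_{\mathbf{f}_i\in\mathcal{F}}\left(1-|\mathbf{v}_1^H\mathbf{f}_i|^2\right)\right] \\
&= \arg\min_{\substack{\mathcal{F}\subset\mathcal{G}(M_t,1)\\ |\mathcal{F}|=2^B}} \mathbb{E}_{\mathbf{v}_1}\left[\min_{\mathbf{f}_i\in\mathcal{F}} d^2(\mathbf{v}_1,\mathbf{f}_i)\right] = \mathcal{F}_K,
\end{aligned}$$

which completes the proof. ∎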

Theorem 1 states the equivalence of the optimal codebook design with $K$-means clustering on $\mathcal{G}(M_t,1)$. The benefit of making this connection is that it provides an approach for finding the optimal codebooks by leveraging existing work on $K$-means clustering on $\mathcal{G}(M_t,1)$. We are now in a position to state the key steps in designing the optimal codebooks based on Grassmannian $K$-means clustering.

IV-A Codebook Construction

We assume a stationary distribution of the channel for a given coverage area of a transmitter. In order to construct the Grassmannian codebook, we construct a dataset of channel realizations sampled for different user locations. The available channel dataset is split into a training dataset and a testing dataset, used for generating beamforming codebooks and evaluating their performance, respectively. We assume that the size of the training set is large enough so that the sampling distribution closely approximates the original distribution. The training procedure yields the optimal codebook $\mathcal{F}^*$, whose performance is evaluated by measuring $\Gamma_{\rm av}$ for the channel realizations in the test set. Further details and benchmarking results are outlined in Section VI. The codebook design and performance evaluation processes are illustrated in Alg. 2.
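The evaluation half of this procedure can be sketched as follows. Alg. 2 is not reproduced in this text, so the sketch below is an assumption about how such an evaluation might look: the function name `average_normalized_gain`, the 80/20 split, the synthetic channel dataset, and the identity matrix used as a stand-in codebook are all hypothetical. In the actual procedure, the codebook would come from Grassmannian $K$-means applied to the dominant eigenvectors of the training channels.

```python
import numpy as np

def average_normalized_gain(channels, F):
    """Empirical Gamma_av: mean over channels of max_i ||H f_i||^2 / lambda_1.

    channels: (N, Mr, Mt) array of channel realizations.
    F: (Mt, 2^B) codebook with unit-norm columns.
    """
    ratios = []
    for H in channels:
        lam1 = np.linalg.eigvalsh(H.conj().T @ H)[-1]      # largest eigenvalue
        best = np.max(np.linalg.norm(H @ F, axis=0) ** 2)  # selection rule (7)
        ratios.append(best / lam1)
    return float(np.mean(ratios))

rng = np.random.default_rng(0)
N, Mr, Mt = 500, 2, 4  # hypothetical dataset size and antenna counts
dataset = (rng.standard_normal((N, Mr, Mt))
           + 1j * rng.standard_normal((N, Mr, Mt))) / np.sqrt(2)

# Assumed 80/20 split into training and testing sets
perm = rng.permutation(N)
train, test = dataset[perm[:400]], dataset[perm[400:]]

# Placeholder codebook (NOT a trained one): standard basis vectors, B = 2 bits
F = np.eye(Mt, dtype=complex)
gain_test = average_normalized_gain(test, F)
print(gain_test)  # lies in (0, 1]; 1 would mean no quantization loss
```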

V Grassmannian Product Codebook for FD-MIMO

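After discussing the general notion of codebook construction for transmit beamforming in a MIMO system, we focus our attention on a special case of FD-MIMO communication. We assume that the transmitter is equipped with a UPA with dimensions $M_v\times M_h$ ($M_t=M_vM_h$) while the receiver has one antenna, i.e. $M_r=1$. Let $\tilde{\mathbf{H}}\in\mathbb{C}^{M_v\times M_h}$ represent the channel matrix where the $(i,j)$-th element corresponds to the channel between the antenna element at the $i$-th row and $j$-th column of the UPA and the single receiver antenna at the user. Note that this system model appears as a special case of the general MIMO system discussed in the previous section where $\mathbf{H}^T=\mathrm{vec}(\tilde{\mathbf{H}}^T)$. Hence the codebook can be designed as given in Alg. 2 using $K$-means clustering in $\mathcal{G}(M_vM_h,1)$. The dimension of the codewords grows as the product $M_vM_h$. Naturally, $K$-means clustering will suffer from the curse of dimensionality, as the difference between the maximum and minimum distances between two points in the dataset becomes less prominent as the dimension of the space increases [23]. In this section, we show that the codebook can be obtained by clustering on lower dimensional manifolds by exploiting the geometry of the UPA. Considering the UPA channel as a matrix $\tilde{\mathbf{H}}$, we have the singular value decomposition of $\tilde{\mathbf{H}}$ as follows.

$$\tilde{\mathbf{H}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H, \tag{19}$$

where $\mathbf{U}\in\mathbb{C}^{M_v\times M_v}$ is the left singular matrix, $\mathbf{V}\in\mathbb{C}^{M_h\times M_h}$ is the right singular matrix and $\boldsymbol{\Sigma}\in\mathbb{C}^{M_v\times M_h}$ is the rectangular diagonal matrix with singular values $\sigma_i$ in decreasing order. Let $\lambda_i$ be the $i$-th eigenvalue of $\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}$; then $\sigma_i=\sqrt{\lambda_i}$. Further, $\mathbf{u}_i$ and $\mathbf{v}_i$ are the $i$-th column vectors of $\mathbf{U}$ and $\mathbf{V}$, respectively, and are unit-norm vectors of dimensions $M_v$ and $M_h$. Then we have

$$\mathbf{H}^T = \mathrm{vec}(\tilde{\mathbf{H}}^T) = \mathrm{vec}(\mathbf{V}^*\boldsymbol{\Sigma}^T\mathbf{U}^T) = \mathrm{vec}\left(\sum_{i=1}^{\mathrm{rank}(\tilde{\mathbf{H}})}\sigma_i\mathbf{v}_i^*\mathbf{u}_i^T\right) = \sum_{i=1}^{\mathrm{rank}(\tilde{\mathbf{H}})}\sigma_i\,\mathbf{u}_i\otimes\mathbf{v}_i^*. \tag{20}$$

From (20), we can represent $\mathbf{H}^T$ as a linear combination of the vectors $\mathbf{u}_i\otimes\mathbf{v}_i^*$ scaled with $\sigma_i$. Thus we have

$$\mathbf{H} = \sum_{i=1}^{\mathrm{rank}(\tilde{\mathbf{H}})}\sigma_i\,\mathbf{u}_i^T\otimes\mathbf{v}_i^H. \tag{21}$$

Due to the finiteness of the physical paths between the transmitter and the receiver, it is well known that $\tilde{\mathbf{H}}$ has low rank. For the sake of simplicity, we approximate the channel with its dominant direction, i.e. the term associated with $\sigma_1$, which is called the rank-1 approximation, and the approximated channel is given as

$$\mathbf{H} \approx \bar{\mathbf{H}} = \sigma_1\,\mathbf{u}_1^T\otimes\mathbf{v}_1^H. \tag{22}$$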
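The Kronecker expansion (21) and the rank-1 approximation (22) can be verified numerically. The sketch below is illustrative only; the array sizes and the random test matrix are assumptions. It relies on the fact that stacking the columns of $\tilde{\mathbf{H}}^T$ is the same as reading $\tilde{\mathbf{H}}$ row by row, which is NumPy's default (row-major) flattening order.

```python
import numpy as np

rng = np.random.default_rng(0)
Mv, Mh = 3, 4  # hypothetical UPA dimensions
H_tilde = rng.standard_normal((Mv, Mh)) + 1j * rng.standard_normal((Mv, Mh))

# H^T = vec(H_tilde^T): row-major flattening of H_tilde gives the same ordering
H = H_tilde.reshape(-1)

# Reduced SVD: H_tilde = U diag(s) Vh, with u_i = U[:, i] and v_i^H = Vh[i, :]
U, s, Vh = np.linalg.svd(H_tilde, full_matrices=False)

# Expansion (21): H = sum_i sigma_i (u_i^T kron v_i^H)
H_expanded = sum(s[i] * np.kron(U[:, i], Vh[i, :]) for i in range(len(s)))

# Rank-1 approximation (22): keep only the dominant term
H_bar = s[0] * np.kron(U[:, 0], Vh[0, :])

# The Kronecker terms are orthonormal, so the error energy is the sum of the
# discarded squared singular values
approx_error = np.linalg.norm(H - H_bar)
print(np.allclose(H, H_expanded), approx_error, np.sqrt(np.sum(s[1:] ** 2)))
```

A random Gaussian matrix like the one above is full rank, so the rank-1 error here is large; the approximation is meaningful only when the channel is dominated by a few paths, as the text argues.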

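Let $\mathbf{x}$ be a beamforming vector for $\bar{\mathbf{H}}$. Then the KP form of $\bar{\mathbf{H}}$ naturally leads us to the idea of using $\mathbf{x}$ of the form $\mathbf{x}=\mathbf{x}_v\otimes\mathbf{x}_h$. This motivates us to use separate codebooks $\mathcal{F}_v$ and $\mathcal{F}_h$ for the vertical and horizontal dimensions and enables us to design product codebooks by clustering in lower dimensional manifolds. The beamforming gain for $\bar{\mathbf{H}}$ can now be simplified as

$$\begin{aligned}
\Gamma(\mathbf{x}) &= \|\bar{\mathbf{H}}\mathbf{x}\|_2^2 = \left\|\sigma_1(\mathbf{u}_1^T\otimes\mathbf{v}_1^H)(\mathbf{x}_v\otimes\mathbf{x}_h)\right\|_2^2 \\
&\overset{(a)}{=} \sigma_1^2\,\|\mathbf{u}_1^T\mathbf{x}_v\|_2^2\,\|\mathbf{v}_1^H\mathbf{x}_h\|_2^2 = \sigma_1^2\,|\mathbf{u}_1^T\mathbf{x}_v|^2\,|\mathbf{v}_1^H\mathbf{x}_h|^2,
\end{aligned}$$

where step $(a)$ follows from the mixed-product property $(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D})=(\mathbf{A}\mathbf{C})\otimes(\mathbf{B}\mathbf{D})$ together with the fact that $\|\mathbf{A}\otimes\mathbf{B}\|_2=\|\mathbf{A}\|_2\|\mathbf{B}\|_2$ for two matrices $\mathbf{A}$ and $\mathbf{B}$ of any dimensions. The optimal MRT beamforming vector for $\bar{\mathbf{H}}$ can be simplified as

$$\begin{aligned}
\mathbf{f} &= \arg\max_{\mathbf{x}\in\mathcal{U}(M_t,1)} \Gamma(\mathbf{x}) = \arg\max_{\substack{\mathbf{x}_v\in\mathcal{U}(M_v,1)\\ \mathbf{x}_h\in\mathcal{U}(M_h,1)}} |\mathbf{u}_1^T\mathbf{x}_v|^2\,|\mathbf{v}_1^H\mathbf{x}_h|^2 \\
&= \arg\max_{\mathbf{x}_v\in\mathcal{U}(M_v,1)} |\mathbf{u}_1^T\mathbf{x}_v|^2 \ \otimes \arg\max_{\mathbf{x}_h\in\mathcal{U}(M_h,1)} |\mathbf{v}_1^H\mathbf{x}_h|^2 = \mathbf{f}_v\otimes\mathbf{f}_h,
\end{aligned} \tag{23}$$

where

$$\mathbf{f}_v = \arg\max_{\mathbf{x}_v\in\mathcal{U}(M_v,1)} |\mathbf{u}_1^T\mathbf{x}_v|^2, \qquad \mathbf{f}_h = \arg\max_{\mathbf{x}_h\in\mathcal{U}(M_h,1)} |\mathbf{v}_1^H\mathbf{x}_h|^2. \tag{24}$$

Observe that one possible solution for the optimal MRT beamformer in (23) is given by $\mathbf{f}_v=\mathbf{u}_1^*$ and $\mathbf{f}_h=\mathbf{v}_1$. Following Remark 1, we can argue that $\mathbf{f}=\mathbf{f}_v\otimes\mathbf{f}_h$, where $\mathbf{f}_v\in\mathcal{G}(M_v,1)$ and $\mathbf{f}_h\in\mathcal{G}(M_h,1)$.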
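The factorization of the gain and the product-form optimum $\mathbf{u}_1^*\otimes\mathbf{v}_1$ can be checked with a short NumPy sketch. This is an illustration under assumed dimensions and a random test matrix, not the authors' code; the helper name `product_gain` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Mv, Mh = 3, 4  # hypothetical UPA dimensions
H_tilde = rng.standard_normal((Mv, Mh)) + 1j * rng.standard_normal((Mv, Mh))

U, s, Vh = np.linalg.svd(H_tilde, full_matrices=False)
u1 = U[:, 0]
v1 = Vh[0, :].conj()                       # Vh stores v_1^H, so conjugate to get v_1
H_bar = s[0] * np.kron(u1, v1.conj())      # rank-1 channel (22) as a flat row vector

def product_gain(xv, xh):
    """Gain |H_bar (xv kron xh)|^2 for a product-form beamformer."""
    return abs(H_bar @ np.kron(xv, xh)) ** 2   # 1-D @ 1-D: plain sum, no conjugation

# Product-form optimum from (24): f_v = u_1^*, f_h = v_1
gain_opt = product_gain(u1.conj(), v1)

# Arbitrary unit-norm factors: the gain factorizes as in the derivation above
xv = rng.standard_normal(Mv) + 1j * rng.standard_normal(Mv)
xh = rng.standard_normal(Mh) + 1j * rng.standard_normal(Mh)
xv /= np.linalg.norm(xv)
xh /= np.linalg.norm(xh)
gain_x = product_gain(xv, xh)
gain_factored = s[0] ** 2 * abs(np.dot(u1, xv)) ** 2 * abs(np.vdot(v1, xh)) ** 2
print(gain_opt, s[0] ** 2, gain_x, gain_factored)
```

Here `np.dot(u1, xv)` computes $\mathbf{u}_1^T\mathbf{x}_v$ (no conjugation) while `np.vdot(v1, xh)` computes $\mathbf{v}_1^H\mathbf{x}_h$, mirroring the transpose and Hermitian in the formula.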

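The loss in average normalized beamforming gain with the codebook $\mathcal{F}$ can be bounded as

$$\begin{aligned}
L(\mathcal{F}) &= \mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[1-\frac{\Gamma(\mathbf{f}_v\otimes\mathbf{f}_h)}{\Gamma(\mathbf{u}_1^*\otimes\mathbf{v}_1)}\right] = \mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[1-\left|(\mathbf{u}_1^T\otimes\mathbf{v}_1^H)(\mathbf{f}_v\otimes\mathbf{f}_h)\right|^2\right] \\
&\leq 2\,\mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[1-\left|(\mathbf{u}_1^T\otimes\mathbf{v}_1^H)(\mathbf{f}_v\otimes\mathbf{f}_h)\right|\right] \\
&\leq 2\,\mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[\min_{\theta,\phi}\left\|(e^{j\theta}\mathbf{u}_1^*\otimes e^{j\phi}\mathbf{v}_1)-(\mathbf{f}_v\otimes\mathbf{f}_h)\right\|_2\right] \\
&\leq 2\,\mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[\min_{\theta,\phi}\left(\|e^{j\theta}\mathbf{u}_1^*\|_2\,\|e^{j\phi}\mathbf{v}_1-\mathbf{f}_h\|_2 + \|e^{j\theta}\mathbf{u}_1^*-\mathbf{f}_v\|_2\,\|e^{j\phi}\mathbf{f}_h\|_2\right)\right] \\
&= 2\,\mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[\min_{\theta,\phi}\left(\|e^{j\phi}\mathbf{v}_1-\mathbf{f}_h\|_2 + \|e^{j\theta}\mathbf{u}_1^*-\mathbf{f}_v\|_2\right)\right] \\
&= 2\,\mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[\left(1-|\mathbf{v}_1^H\mathbf{f}_h|\right)^{1/2} + \left(1-|\mathbf{u}_1^T\mathbf{f}_v|\right)^{1/2}\right].
\end{aligned} \tag{25}$$

Definition 4 (Grassmannian product codebook).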

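Under the rank-1 approximation of the channel, $\mathbf{H}\approx\bar{\mathbf{H}}$, the $B$-bit Grassmannian product codebook $\mathcal{F}^*$ is the one that satisfies the codebook design criterion in Definition 3 for given $B_v$ and $B_h$ with $B=B_v+B_h$, where $\mathcal{F}^*=\mathcal{F}_v^*\otimes\mathcal{F}_h^*$, $|\mathcal{F}_v^*|=2^{B_v}$ and $|\mathcal{F}_h^*|=2^{B_h}$.

We will now state the method to construct the product codebook as follows.

Lemma 3.

The Grassmannian product codebook as defined in Definition 4 is constructed using the sets of centroids $\mathcal{F}_v^*$ and $\mathcal{F}_h^*$ obtained from the independent $K$-means clustering of the optimal MRT beamforming vectors $\mathbf{f}_v$ and $\mathbf{f}_h$ on $\mathcal{G}(M_v,1)$ and $\mathcal{G}(M_h,1)$ with $2^{B_v}$ and $2^{B_h}$ clusters, respectively.

Proof:

From Definition 4,

$$\begin{aligned}
\mathcal{F}^* &= \mathcal{F}_v^*\otimes\mathcal{F}_h^* = \arg\min_{\mathcal{F}_v,\mathcal{F}_h} L_{\rm ub}(\mathcal{F}) \\
&= \arg\min_{\mathcal{F}_v,\mathcal{F}_h} \mathbb{E}_{\mathbf{u}_1,\mathbf{v}_1}\left[\left(1-|\mathbf{v}_1^H\mathbf{f}_h|^2\right)+\left(1-|\mathbf{u}_1^T\mathbf{f}_v|^2\right)\right] \\
&= \arg\min_{\substack{\mathcal{F}_v\subset\mathcal{G}(M_v,1)\\ |\mathcal{F}_v|=2^{B_v}}} \mathbb{E}_{\mathbf{u}_1}\left(1-|\mathbf{u}_1^T\mathbf{f}_v|^2\right) \ \otimes \arg\min_{\substack{\mathcal{F}_h\subset\mathcal{G}(M_h,1)\\ |\mathcal{F}_h|=2^{B_h}}} \mathbb{E}_{\mathbf{v}_1}\left(1-|\mathbf{v}_1^H\mathbf{f}_h|^2\right),
\end{aligned}$$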