# Two-Timescale Hybrid Compression and Forward for Massive MIMO Aided C-RAN

We consider the uplink of a cloud radio access network (C-RAN), where massive MIMO remote radio heads (RRHs) serve as relays between users and a centralized baseband unit (BBU). Although employing massive MIMO at RRHs can improve the spectral efficiency, it also significantly increases the amount of data transported over the fronthaul links between RRHs and BBU, which becomes a performance bottleneck. Existing fronthaul compression methods for conventional C-RAN are not suitable for the massive MIMO regime because they require fully-digital processing and/or real-time full channel state information (CSI), incurring high implementation cost for massive MIMO RRHs. To overcome this challenge, we propose to perform a two-timescale hybrid analog-and-digital spatial filtering at each RRH to reduce the fronthaul consumption. Specifically, the analog filter is adaptive to the channel statistics to achieve massive MIMO array gain, and the digital filter is adaptive to the instantaneous effective CSI to achieve spatial multiplexing gain. Such a design can alleviate the performance bottleneck of limited fronthaul with reduced hardware cost and power consumption, and is more robust to the CSI delay. We propose an online algorithm for the two-timescale non-convex optimization of analog and digital filters, and establish its convergence to stationary solutions. Finally, simulations verify the advantages of the proposed scheme.

## I Introduction

Cloud radio access network (C-RAN) [1] and massive multiple-input multiple-output (MIMO) [2] are regarded as two key technologies for future wireless systems. Both technologies can significantly improve the spectral and energy efficiency of wireless systems by employing a huge number of antennas per unit area. However, they adopt different architectures and thus have their own pros and cons.

C-RAN is essentially a large-scale distributed antenna system, in which many remote radio heads (RRHs) are distributed within a given geographical area and are connected to a centralized baseband unit (BBU) pool through fronthaul links. Each RRH merely serves as a relay that forwards signals from/to the BBU via its fronthaul link, while all baseband processing is performed at the BBU. Since each user can always find some nearby RRHs with strong channel conditions, users at different locations can enjoy a uniform quality of experience without suffering from the cell-edge effect. In practice, however, the performance of C-RAN is limited by the fronthaul capacity between each RRH and the BBU, especially when each RRH has multiple antennas. In contrast, a massive MIMO system deploys a large number of antennas at the base station (BS) to achieve large spatial multiplexing and array gains. Since processing is done locally at the BS, the performance is not limited by fronthaul capacity.

Recently, massive MIMO aided C-RAN, in which each RRH is equipped with a massive MIMO array, has been proposed to further improve the spectral and energy efficiency of wireless systems [3]. However, moving the signal processing of an uplink massive MIMO system from the RRH to the cloud would require a huge amount of digitally sampled data to be transported over the fronthaul link. Therefore, the uplink data must be compressed at each RRH to satisfy the limited fronthaul capacity constraint. Various fully-digital fronthaul compression techniques have been proposed for the uplink of C-RAN with small-scale multi-antenna RRHs, ranging from the more complicated quantize-and-forward (QF) schemes [4, 5] to the simpler uniform scalar quantization schemes [6] and RRH selection schemes [7]. In particular, the spatial compression and forward scheme proposed in [8] combines fully-digital linear spatial filtering and uniform scalar quantization to alleviate the performance bottleneck caused by the limited fronthaul capacity. Unfortunately, fully-digital spatial filtering requires a large number of analog-to-digital converters (ADCs) and radio frequency (RF) chains at each massive MIMO RRH. In [9], fully-analog linear spatial filtering is used at each RRH to achieve fronthaul compression with reduced hardware cost and power consumption. However, fully-analog processing is known to be less efficient than hybrid analog and digital processing. Moreover, the analog filtering matrix in [9] is adapted to the instantaneous channel state information (CSI). This makes it difficult to extend to wideband systems with many subcarriers, because the instantaneous CSI usually differs across subcarriers [10].

In this paper, we propose a two-timescale hybrid (analog and digital) compression and forward (THCF) scheme for the uplink transmission of massive MIMO aided C-RAN. The scheme alleviates the performance bottleneck of the limited fronthaul with reduced hardware cost and power consumption. In this scheme, each RRH first performs two-timescale hybrid analog and digital spatial filtering to reduce the dimension of its received signal. Specifically, the analog filtering matrix is adapted to the long-term channel statistics to achieve massive MIMO array gain. The digital filtering matrix is adapted to the instantaneous effective CSI (i.e., the product of the instantaneous channel and the analog filtering matrix) to achieve spatial multiplexing gain. Then, each RRH applies uniform scalar quantization to each of these dimensions. Finally, the quantized signals at the RRHs are sent to the BBU for joint decoding. The following variables are jointly optimized to maximize a general utility function of the users' long-term average data rates: the power allocation at the users, the analog/digital filtering matrices and quantization bit allocation at the RRHs, and the receive beamforming matrix at the BBU. This utility includes average weighted sum-rate maximization and proportional fairness (PFS) utility maximization as special cases.

Such a two-timescale hybrid design has several advantages. First, the analog filtering matrix is robust to CSI signaling latency. Moreover, the channel statistics are approximately the same over different subcarriers [11]. A single analog filtering matrix is therefore sufficient to cover all subcarriers at each RRH, which makes the design applicable to wideband systems. With the proposed THCF scheme, the massive MIMO aided C-RAN uplink can enjoy the huge array gain provided by massive MIMO almost for free. That is, its complexity and power consumption are similar to those of a C-RAN with small-scale multi-antenna RRHs. However, implementing this architecture also raises several technical challenges.

• Two-timescale Stochastic Non-convex Optimization: The joint optimization of long-term control variables (analog filtering) and short-term control variables (power allocation, digital filtering, quantization bit allocation, and receive beamforming matrix) is a two-timescale stochastic non-convex optimization problem, which is difficult to solve. Specifically, the objective function contains expectation operators whose arguments involve the optimal short-term control variables, which do not have closed-form expressions. In addition, the optimizations of the short-term control variables at different time slots are usually coupled for a general utility function such as PFS. Moreover, both the short-term and long-term subproblems are non-convex.

• Lack of Channel Statistics: In practice, we may not even have explicit knowledge of the channel statistics. Hence, the solution should be able to learn the unknown channel statistics on its own.

• Convergence Analysis: It is very important to establish the convergence of the algorithm. However, this is non-trivial for a two-timescale stochastic non-convex optimization problem.

To address the above challenges, we propose an online block-coordinate stochastic successive convex approximation (BC-SSCA) algorithm with self-learning capability to solve the two-timescale stochastic non-convex optimization problem without explicit knowledge of the channel statistics. In addition, we establish convergence of the BC-SSCA algorithm to stationary solutions. Finally, simulations show that the proposed two-timescale hybrid scheme achieves better tradeoff performance than the baselines.

The rest of the paper is organized as follows. In Section II, we give the system model for two-timescale hybrid compression and forward in the uplink of massive MIMO aided C-RAN. In Section III, we formulate the two-timescale stochastic non-convex optimization problem for the joint optimization of long-term and short-term control variables. The proposed online BC-SSCA algorithm and the associated convergence proof are presented in Section IV. The simulation results are given in Section V to verify the advantages of the proposed solution, and the conclusion is given in Section VI. The key notations used in this paper are summarized in Table I.

## II System Model

### II-A Network Architecture and Channel Model

Consider the uplink of a massive MIMO aided C-RAN, as shown in Fig. 1. In this network, $N$ RRHs are distributed within a specific geographical area to serve $K$ single-antenna users, and each RRH is equipped with a massive MIMO antenna array and a smaller number of Rx RF chains. Each RRH $n$ serves as a relay between the BBU and the users, and is connected to the BBU via a fronthaul link of capacity $C_n$ bits per second (bps). The BBU makes the resource allocation decisions and jointly decodes the users' messages based on the signals from all the RRHs. We assume that the number of users $K$ is fixed and sufficiently small, so that there are enough spatial degrees of freedom to serve all the users. This is a typical operating regime that has been assumed in many works on massive MIMO systems [12, 10, 13]. As a motivating example, consider a system in which the users are a fixed number of pico BSs and the RRHs provide backhaul links between the pico-cells and the BBU.

For clarity, we focus on a narrowband system with a flat block-fading channel, but the proposed algorithm can be easily modified to cover wideband systems as well. In this case, the received signal at RRH $n$ is given by

$$ y_n = \sum_{k=1}^{K} h_{n,k}\sqrt{p_k}\, s_k + z_n = H_n P^{1/2} s + z_n, $$

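where $H_n = [h_{n,1},\dots,h_{n,K}]$ with $h_{n,k}$ denoting the channel vector from user $k$ to RRH $n$, $s = [s_1,\dots,s_K]^T$ with $s_k$ denoting the data symbol of user $k$, $P = \mathrm{diag}(p_1,\dots,p_K)$ with $p_k$ denoting the transmit power of user $k$, and $z_n$ is the additive white Gaussian noise vector.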
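The received-signal model above can be simulated directly. The following is a minimal NumPy sketch under stated assumptions: the symbols have unit power and the noise is circularly symmetric complex Gaussian with variance `noise_var`. The helper name `received_signal` and the `noise_var` parameter are illustrative and do not come from the paper.

```python
import numpy as np

def received_signal(H_n, p, s, noise_var=1.0, rng=None):
    """Hypothetical helper: y_n = H_n P^{1/2} s + z_n for one RRH n.

    H_n: (num_antennas, K) complex matrix whose k-th column is h_{n,k}.
    p:   (K,) transmit powers p_k.
    s:   (K,) data symbols s_k.
    noise_var: variance of each complex noise entry (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    num_antennas = H_n.shape[0]
    # Circularly symmetric complex Gaussian noise z_n with the given variance.
    z = np.sqrt(noise_var / 2) * (rng.standard_normal(num_antennas)
                                  + 1j * rng.standard_normal(num_antennas))
    # P^{1/2} s scales each symbol s_k by sqrt(p_k).
    return H_n @ (np.sqrt(p) * s) + z
```

Setting `noise_var=0.0` returns only the noiseless term $H_nP^{1/2}s$, which is convenient for checking the model by hand.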

### II-B Two-timescale Hybrid Compression and Forward at RRHs

Each RRH applies the THCF scheme so that the compressed received signal can be forwarded to the BBU via its fronthaul link with a limited capacity of $C_n$ bps, as illustrated in Fig. 2. Specifically, RRH $n$ first applies a two-timescale hybrid filtering matrix $F_nV_n$, which compresses the received signal $y_n$ into a low-dimensional signal $\bar{y}_n = V_n^H F_n^H y_n$. Here, $F_n$ and $V_n$ are the analog and digital filtering matrices, respectively, and the dimensions are chosen such that digital filtering at each RRH causes no information loss [8]. The analog filtering matrix $F_n$ is usually implemented using an RF phase-shifting network [14]. Hence, $F_n$ can be represented by a phase vector $\theta_n$, whose $i$-th element is the phase of the $i$-th element of $F_n$. In this paper, we assume that high-resolution ADCs are used at each RRH, so the quantization error due to the ADCs is negligible. Then, simple uniform scalar quantization [6] is applied to each element of $\bar{y}_n$ at RRH $n$ to achieve fronthaul compression. The quantization is performed at baseband after the digital filter rather than at the ADC. This is because the number of quantization bits must be adjusted dynamically according to the instantaneous channel state to improve the efficiency of fronthaul compression.

After the uniform scalar quantization, the compressed received signal is modeled by

$$ \tilde{y}_n = \bar{y}_n + e_n = V_n^H F_n^H \left( H_n P^{1/2} s + z_n \right) + e_n, $$

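where $e_n = [e_{n,1},\dots,e_{n,L}]^T$ with $e_{n,l}$ denoting the quantization error for $\bar{y}_{n,l}$, the $l$-th element of $\bar{y}_n$. Let $d_{n,l}$ denote the number of bits that RRH $n$ uses to quantize the real or imaginary part of $\bar{y}_{n,l}$, and let $d_n = [d_{n,1},\dots,d_{n,L}]^T$. With uniform scalar quantization, the covariance matrix of $e_n$ is a function of $p = [p_1,\dots,p_K]^T$, $F_nV_n$ and $d_n$, given by [6]

$$ Q_n(p, F_nV_n, d_n) = \mathrm{diag}\left(q_{n,1},\dots,q_{n,L}\right), $$

where $q_{n,l}$ is the variance of the quantization error $e_{n,l}$:

$$ q_{n,l} = \begin{cases} \dfrac{3}{4^{d_{n,l}}}\left( \sum_{k=1}^{K} p_k \left|h_{n,k}^H \tilde{v}_{n,l}\right|^2 + \left\|\tilde{v}_{n,l}\right\|^2 \right), & \text{if } d_{n,l} > 0, \\ \infty, & \text{if } d_{n,l} = 0, \end{cases} \tag{1} $$

where $\tilde{v}_{n,l}$ denotes the $l$-th column of $F_nV_n$. Finally, each RRH forwards the quantized bits to the BBU via its fronthaul link.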
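Equation (1) can be evaluated per RRH as a short sketch. This assumes unit noise variance, consistent with the $\|\tilde{v}_{n,l}\|^2$ term. The `relaxed` flag, which switches to the continuous approximation used later in the paper, and the function name are illustrative choices.

```python
import numpy as np

def quant_noise_var(p, H_n, Vt_n, d_n, relaxed=False):
    """Quantization-noise variances q_{n,l} of (1); the relaxed form drops the d=0 case.

    p:    (K,) transmit powers p_k.
    H_n:  (num_antennas, K) channel of RRH n (column k is h_{n,k}).
    Vt_n: (num_antennas, L) composite filter F_n V_n (column l is v~_{n,l}).
    d_n:  (L,) bits per real/imaginary part.
    """
    d_n = np.asarray(d_n, dtype=float)
    proj = H_n.conj().T @ Vt_n                              # [k, l] = h_{n,k}^H v~_{n,l}
    signal = (p[:, None] * np.abs(proj) ** 2).sum(axis=0)   # sum_k p_k |h^H v~|^2
    noise = (np.abs(Vt_n) ** 2).sum(axis=0)                 # ||v~_{n,l}||^2 (unit noise variance)
    q = 3.0 / 4.0 ** d_n * (signal + noise)
    if not relaxed:
        q[d_n == 0] = np.inf                                # zero bits: dimension is dropped
    return q
```

Each extra bit divides the error variance by 4, which is the familiar 6 dB-per-bit rule of uniform quantization.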

### II-C Joint Rx Beamforming at the BBU

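The received signal at the BBU from all RRHs can be expressed as

$$ \tilde{y} = \tilde{V}^H H P^{1/2} s + \tilde{V}^H z + e, $$

where:

- $H = [h_1,\dots,h_K]$, with $h_k = [h_{1,k}^T,\dots,h_{N,k}^T]^T$ denoting the composite channel vector of user $k$;
- $\tilde{V} = \mathrm{blkdiag}(F_1V_1,\dots,F_NV_N)$;
- $z = [z_1^T,\dots,z_N^T]^T$;
- $e = [e_1^T,\dots,e_N^T]^T$.

The BBU applies a joint Rx beamforming vector $u_k$ to obtain the estimated data symbol of each user $k$ as

$$ \hat{s}_k = u_k^H \tilde{y} = u_k^H \tilde{V}^H H P^{1/2} s + u_k^H \tilde{V}^H z + u_k^H e, \quad \forall k. $$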

### II-D Frame Structure and Achievable Data Rate

In this paper, we focus on a coherence time interval of channel statistics, within which the channel statistics (distribution) are assumed to be constant. The coherence time of channel statistics is divided into frames, and each frame consists of $T_s$ time slots, as illustrated in Fig. 3. The channel state is assumed to be constant within each time slot. We assume that one (possibly outdated) channel sample can be obtained in each frame by uplink channel training. Specifically, the users send uplink pilot signals, and the BBU then estimates the channel from the received pilot signals collected from the RRHs via the fronthaul. Several compressive sensing (CS) based channel estimation methods have been proposed for uplink channel training with a limited number of RF chains; see, e.g., [15, 16]. At each time slot, the BBU needs to obtain the effective CSI $\{F_n^H H_n\}$, which can also be obtained by uplink channel training. The dimension of the effective channel equals the number of RF chains at each RRH. Therefore, a simple least-squares (LS) channel estimator suffices to obtain a good estimate of the effective channel with low computation time, i.e., the delay of the effective CSI can be made small relative to the channel coherence time. In our design, the BBU does not need explicit knowledge of the channel statistics. By observing one channel sample in each frame, the proposed algorithm learns the channel statistics automatically (in an implicit way). Specifically, the long-term analog filtering matrices are updated only once per frame based on a (possibly outdated) channel sample, which achieves the massive MIMO array gain with reduced implementation cost. In contrast, the short-term control variables adapt to the real-time effective CSI to achieve the spatial multiplexing gain. For convenience, we use the following notation: $\theta = [\theta_1^T,\dots,\theta_N^T]^T$, $v$ denotes the collection of digital filtering matrices $\{V_n\}$, and $d$ denotes the collection of quantization bits $\{d_{n,l}\}$.

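For given long-term control variables $\theta$ (the phase vectors of the analog filtering matrices), short-term control variables $x = \{p, v, d, u\}$ with $u = [u_1,\dots,u_K]$, and channel realization $H$, the achievable data rate of user $k$ is given by

$$ r^\circ_k(\theta, x; H) = \log\left(1 + \mathrm{SINR}_k(\theta, x; H)\right), $$

where $\mathrm{SINR}_k(\theta,x;H)$ is the SINR of user $k$, given by

$$ \mathrm{SINR}_k(\theta,x;H) = \frac{p_k \left|u_k^H \tilde{V}^H h_k\right|^2}{\sum_{l\neq k} p_l \left|u_k^H \tilde{V}^H h_l\right|^2 + u_k^H \tilde{V}^H \tilde{V} u_k + u_k^H Q(\theta,p,v,d)\, u_k}, $$

where

$$ Q(\theta,p,v,d) = \mathrm{diag}\left(Q_1(p,F_1V_1,d_1),\dots,Q_N(p,F_NV_N,d_N)\right). $$

Note that $F_n$ is a function of $\theta_n$, and we will explicitly write it as $F_n(\theta_n)$ when needed.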
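The SINR is computed from the composite quantities $\tilde{V}$, $H$, $u_k$ and the diagonal of $Q$. The sketch below evaluates all users' SINRs and rates at once under stated assumptions: unit-power symbols and unit noise variance, finite entries of $Q$ (i.e., $d_{n,l} > 0$), and the natural logarithm for `log`. The function name is hypothetical.

```python
import numpy as np

def sinr_and_rate(U, Vt, H, p, q):
    """SINR_k and r_k = log(1 + SINR_k) (natural log) for all users k.

    U:  (D, K) receive beamformers; column k is u_k.
    Vt: (A, D) block-diagonal composite filter V~.
    H:  (A, K) composite channel; column k is h_k.
    p:  (K,) transmit powers; q: (D,) finite diagonal of Q.
    """
    G = U.conj().T @ Vt.conj().T @ H                 # G[k, l] = u_k^H V~^H h_l
    received = np.abs(G) ** 2 * p[None, :]           # p_l |u_k^H V~^H h_l|^2
    signal = np.diag(received)
    interference = received.sum(axis=1) - signal     # sum over l != k
    noise = (np.abs(Vt @ U) ** 2).sum(axis=0)        # u_k^H V~^H V~ u_k
    quant = (q[:, None] * np.abs(U) ** 2).sum(axis=0)  # u_k^H Q u_k
    sinr = signal / (interference + noise + quant)
    return sinr, np.log1p(sinr)
```

To report rates in bits per channel use instead, divide the returned rate by `np.log(2)`.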

Let $x(H)$ denote the short-term control variables under channel state $H$, and let $\tilde{\Omega} = \{x(H), \forall H\}$ denote the collection of the short-term control variables for all possible channel states. The feasible set of the short-term control variables is the set of all short-term control variables that satisfy the following constraints:

$$ p_k \in [0, P_k], \quad \forall k, \tag{2} $$

$$ 2B_W \sum_{l=1}^{L} d_{n,l} \le C_n, \quad \forall n, \tag{3} $$

$$ d_{n,l} \ge 0 \text{ is an integer}, \quad \forall n,l, \tag{4} $$

where $P_k$ is the individual power constraint of user $k$, $B_W$ is the system bandwidth, and (3) is the fronthaul capacity constraint. Then the average data rate of user $k$ is

$$ \bar{r}^\circ_k(\theta, \tilde{\Omega}) = \mathbb{E}\left[ r^\circ_k(\theta, x(H); H) \right], $$

where the expectation is taken with respect to the channel state $H$. For convenience, define $\bar{r}^\circ = [\bar{r}^\circ_1,\dots,\bar{r}^\circ_K]^T$ as the average data rate vector.

## III Two-timescale Joint Optimization at BBU

### III-A Problem Formulation

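Note that $q_{n,l}$ is not a continuous function of $d_{n,l}$ because $d_{n,l}$ is an integer. To make the problem tractable, we relax the integer constraint on $d_{n,l}$ and approximate the quantization noise power with the following continuous function of a real variable $d_{n,l} \ge 0$:

$$ \hat{q}_{n,l} = \frac{3}{4^{d_{n,l}}}\left( \sum_{k=1}^{K} p_k \left|h_{n,k}^H \tilde{v}_{n,l}\right|^2 + \left\|\tilde{v}_{n,l}\right\|^2 \right). \tag{5} $$

The same approximation has also been considered in [8]. Let $r_k(\theta,x;H)$ denote the approximate data rate of user $k$, obtained by two replacements: $q_{n,l}$ in (1) is replaced with $\hat{q}_{n,l}$ in (5), and the integer constraint in (4) is replaced with the constraint $d_{n,l} \ge 0$. Moreover, define $\bar{r}(\theta,\Omega) = [\bar{r}_1,\dots,\bar{r}_K]^T$ as the approximate average data rate vector, where $\bar{r}_k(\theta,\Omega) = \mathbb{E}[r_k(\theta,x(H);H)]$ and $\Omega = \{x(H), \forall H\}$. Here, each $x(H)$ belongs to the set of all short-term control variables that satisfy constraints (2), (3) and $d \ge 0$. To simplify notation, we drop the arguments of these functions when there is no ambiguity.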

With the above approximate rate, the two-timescale joint optimization of long-term and short-term control variables can be formulated as the following utility maximization problem

$$ \mathcal{P}: \max_{\theta\in\Theta,\,\Omega}\; g\left(\bar{r}(\theta,\Omega)\right), \tag{6} $$

where the utility function $g(\cdot)$ is a continuously differentiable (and possibly non-concave) function of $\bar{r}$, and $\Theta$ is the feasible set of $\theta$. Moreover, $g$ is non-decreasing with respect to each $\bar{r}_k$, and its derivative with respect to $\bar{r}$ is Lipschitz continuous. This general utility function includes many important network utilities as special cases, such as the average sum rate ($g(\bar{r}) = \sum_{k=1}^{K} \bar{r}_k$) and the proportional fairness utility ($g(\bar{r}) = \sum_{k=1}^{K}\log(\epsilon + \bar{r}_k)$, where $\epsilon > 0$ is a small number that avoids the singularity at $\bar{r}_k = 0$).
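The two example utilities and their gradients $\nabla_{\bar{r}} g$ fit in a few lines. The gradient is the quantity that appears in the stationarity conditions below. The sketch assumes the average-rate vector is a NumPy array; the function names and the default `eps` value are illustrative.

```python
import numpy as np

def sum_rate(r_bar):
    return np.sum(r_bar)

def sum_rate_grad(r_bar):
    return np.ones_like(r_bar)          # every user receives weight 1

def pf_utility(r_bar, eps=1e-3):
    return np.sum(np.log(eps + r_bar))  # eps > 0 avoids log(0)

def pf_grad(r_bar, eps=1e-3):
    return 1.0 / (eps + r_bar)          # users with low average rate get larger weights
```

The PF gradient explains why proportional fairness couples time slots. A user's weight depends on its *average* rate, so decisions in one slot change the weights used in all others.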

### III-B Stationary Solution of Problem P

Since Problem $\mathcal{P}$ is a two-timescale stochastic non-convex problem, we focus on designing an efficient algorithm to find stationary solutions of Problem $\mathcal{P}$, as defined below.

###### Definition 1 (Stationary solution of P).

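A solution $(\theta^*, \Omega^* = \{x^*(H), \forall H\})$ is called a stationary solution of Problem $\mathcal{P}$ if it satisfies the following conditions:

1. For every $H$ outside a set of probability zero,

   $$ \left(x - x^*(H)\right)^T J_x\left(\theta^*, x^*(H); H\right) \nabla_{\bar{r}} g(\bar{r}^*) \le 0 \tag{7} $$

   for all feasible $x$. Here, $J_x(\theta^*, x^*(H); H)$ is the Jacobian matrix of the (approximate) rate vector $r = [r_1,\dots,r_K]^T$ with respect to $x$ at $\theta = \theta^*$ and $x = x^*(H)$, and $\nabla_{\bar{r}} g(\bar{r}^*)$ is the derivative of $g$ at $\bar{r}^* = \bar{r}(\theta^*, \Omega^*)$. The Jacobian matrix of $r$ with respect to $x$ is defined as $J_x = [\partial r_1/\partial x, \dots, \partial r_K/\partial x]$, where $\partial r_k/\partial x$ is the partial derivative of $r_k$ with respect to $x$.

2. The long-term variable satisfies

   $$ (\theta - \theta^*)^T \nabla_\theta g(\bar{r}^*) \le 0, \quad \forall \theta \in \Theta, \tag{8} $$

   where $\nabla_\theta g(\bar{r}^*) = \mathbb{E}\left[J_\theta(\theta^*, x^*(H); H)\right] \nabla_{\bar{r}} g(\bar{r}^*)$ is the partial derivative of $g(\bar{r}(\theta,\Omega^*))$ with respect to $\theta$ at $\theta = \theta^*$. Here, $J_\theta(\theta^*, x^*(H); H) = [\partial r_1/\partial \theta, \dots, \partial r_K/\partial \theta]$ is the Jacobian matrix of the (approximate) rate vector with respect to $\theta$ at $\theta = \theta^*$ and $x = x^*(H)$.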

In other words, $(\theta^*, \Omega^*)$ is a stationary solution of $\mathcal{P}$ if two conditions hold. First, for fixed $\theta^*$, each $x^*(H)$ satisfies the stationarity condition (7) w.p.1. Second, for fixed $\Omega^*$, $\theta^*$ is a stationary point of $\max_{\theta\in\Theta} g(\bar{r}(\theta,\Omega^*))$. The stationary solution is a natural extension of the stationary point of a deterministic optimization problem. A globally optimal solution must be a stationary solution. However, the set of stationary solutions may also contain locally optimal solutions and certain types of saddle points. When $\mathcal{P}$ is a two-timescale stochastic convex problem, a stationary solution is also a globally optimal solution.

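Note that a stationary solution of $\mathcal{P}$ may not satisfy the integer constraints in (4). To obtain an integer quantization bit allocation, we use the same method as in [8] and round each $d^*_{n,l}$ to a neighboring integer as follows:

$$ \hat{d}_{n,l}(\alpha_n) = \begin{cases} \lfloor d^*_{n,l} \rfloor, & \text{if } d^*_{n,l} - \lfloor d^*_{n,l} \rfloor \le \alpha_n, \\ \lceil d^*_{n,l} \rceil, & \text{otherwise}, \end{cases} \quad \forall n,l, $$

where $\alpha_n \in [0,1]$ is chosen such that the rounded allocation $\{\hat{d}_{n,l}(\alpha_n)\}_l$ satisfies the fronthaul capacity constraint (3). Two facts guarantee that such an $\alpha_n$ exists. First, $\alpha_n = 1$ rounds every $d^*_{n,l}$ down, and thus satisfies (3) whenever $d^*_n$ does. Second, $\sum_l \hat{d}_{n,l}(\alpha_n)$ is non-increasing in $\alpha_n$. Hence we can always find such an $\alpha_n$ using a bisection search over $[0,1]$.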
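The rounding rule and the bisection over $\alpha_n$ can be sketched for a single RRH as follows. One assumption goes beyond the text: the search returns (approximately) the *smallest* feasible $\alpha_n$, so that as few fractional bits as possible are rounded down. The input is assumed to satisfy (3) already. The function name and `tol` are illustrative.

```python
import math

def round_bits(d_star, C_n, B_W, tol=1e-9):
    """Round relaxed bit allocations d*_{n,l} of one RRH with the threshold rule above.

    Assumption: alpha_n is taken as (approximately) the smallest value in [0, 1]
    for which 2 * B_W * sum(d_hat) <= C_n; d_star itself must satisfy (3).
    """
    budget = C_n / (2.0 * B_W)          # (3) rewritten as sum_l d_{n,l} <= C_n / (2 B_W)

    def rounded(alpha):
        # Round down when the fractional part is at most alpha, otherwise round up.
        return [math.floor(d) if d - math.floor(d) <= alpha else math.ceil(d)
                for d in d_star]

    if sum(rounded(0.0)) <= budget:     # rounding every fractional entry up already fits
        return rounded(0.0)
    lo, hi = 0.0, 1.0                   # invariant: rounded(hi) feasible, rounded(lo) not
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(rounded(mid)) <= budget:
            hi = mid
        else:
            lo = mid
    return rounded(hi)
```

For example, with $d^*_n = (1.3, 2.7, 0.5)$ and $C_n/(2B_W) = 4.5$, rounding everything up gives 6 bits. The search settles at $\alpha_n = 0.5$, which yields $(1, 3, 0)$.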

## IV Online Block-Coordinate Stochastic Successive Convex Approximation

There are several challenges in finding stationary solutions of Problem $\mathcal{P}$, elaborated as follows.

###### Challenge 1.
Complex coupling between the short-term and long-term control variables; no closed-form characterization of the average data rates $\bar{r}$; unknown distribution of $H$.

To the best of our knowledge, no efficient online algorithm with self-learning capability exists that can handle the two-timescale stochastic non-convex optimization problem $\mathcal{P}$. In this section, we propose an online BC-SSCA algorithm to find stationary solutions of Problem $\mathcal{P}$. We first summarize the proposed BC-SSCA algorithm and then elaborate on the implementation details.

### IV-A Summary of the BC-SSCA Algorithm

The proposed online BC-SSCA algorithm is summarized in Algorithm 1, and its timeline is illustrated in Fig. 3. In BC-SSCA, an auxiliary weight vector $\mu$ is introduced to approximate the derivative $\nabla_{\bar{r}} g(\bar{r})$. At the beginning of each coherence time of channel statistics, the BBU resets the BC-SSCA algorithm with an initial analog filter phase vector $\theta^0$ and weight vector $\mu^0$. Then $\theta$ and $\mu$ are updated once at the end of each frame. In particular, $\theta$ is updated by maximizing, with respect to $\theta$, a concave surrogate function of $g(\bar{r}(\theta,\Omega))$. We cannot obtain the optimal $\theta$ by directly maximizing $g(\bar{r}(\theta,\Omega))$, because this function is not concave and has no closed-form expression. Let $\theta^t$ and $\mu^t$ denote the analog filter phase vector and weight vector used during the $t$-th frame. The $t$-th iteration ($t$-th frame) of the BC-SSCA algorithm is described as follows.

#### Step 1 (Short-term control optimization at each time slot)

At time slot $i$ in the $t$-th frame, the BBU first acquires the effective channel $\{F_n^H(\theta_n^t) H_n(i)\}$, where $H_n(i)$ is the channel state of RRH $n$ at time slot $i$. It then calculates the short-term control variables $x(i)$ by running a short-term block-coordinate (BC) algorithm with inputs $\theta^t$, $\mu^t$ and the effective channel. The number of iterations of the short-term BC algorithm at frame $t$ is a design parameter. Note that $x(i)$ depends on $H(i)$ only through the effective channel $\{F_n^H(\theta_n^t) H_n(i)\}$.

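Specifically, for given inputs $\theta^t$, $\mu^t$ and $H(i)$, the short-term BC algorithm runs the prescribed number of iterations. Its goal is to find a stationary point (up to a certain accuracy) of the following weighted sum-rate maximization problem (WSRMP):

$$ \max_{x}\; \sum_{k=1}^{K} \mu_k^t\, r_k\left(\theta^t, x; H(i)\right) \quad \text{s.t. (2), (3), and } d \ge 0. $$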

The short-term control variables are obtained by solving the WSRMP for the following reason. Consider a stationary solution $(\theta^*,\Omega^*)$. By (7), the short-term control variables $x^*(H)$ for channel realization $H$ must be a stationary point of the WSRMP with $\theta = \theta^*$ and the stationary weight vector $\mu^* = \nabla_{\bar{r}} g(\bar{r}^*)$. This is because $J_x \nabla_{\bar{r}} g(\bar{r}^*)$ is exactly the gradient of $\sum_k \mu^*_k r_k$ with respect to $x$. Therefore, for a fixed long-term control variable $\theta$, once $\mu^*$ is known, the joint optimization of the collection of short-term control variables $\Omega$ decouples into per-time-slot problems, each of which is a WSRMP. However, $\mu^*$ is not known a priori. The basic idea of the proposed algorithm is therefore to iteratively update the long-term variable $\theta^t$ and the weight vector $\mu^t$. The updates are designed so that $\theta^t$ and $\mu^t$ converge to a stationary solution $\theta^*$ and the corresponding stationary weight vector $\mu^*$, respectively. The short-term control variables that satisfy (7) for each channel state $H$ can then be calculated by finding a stationary point of the corresponding WSRMP.

The details of the short-term BC algorithm are postponed to Section IV-B. Here, we only discuss how the number of short-term iterations affects convergence. For any finite $t$, the number of iterations at frame $t$ is finite, and we can let it grow to infinity as $t \to \infty$ to ensure convergence to stationary solutions. For a fixed $t$, more iterations usually lead to faster overall convergence at the cost of higher complexity.

#### Step 2 (Long-term control optimization at the end of frame t)

In Step 2a, the BBU obtains a full channel sample $H^t$ before the end of the $t$-th frame. Then, in Step 2b (at the end of the $t$-th frame), the BBU updates the surrogate function $\bar{f}^t(\theta)$ as follows. The update is based on $H^t$, the current iterate $\theta^t$, and the short-term control variables $\{x(i)\}$ of the frame:

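$$ \bar{f}^t(\theta) = g(\hat{r}^t) + (f^t)^T(\theta - \theta^t) - \tau \left\|\theta - \theta^t\right\|^2, \tag{9} $$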

where $\tau > 0$ is a constant; $\hat{r}^t = [\hat{r}^t_1,\dots,\hat{r}^t_K]^T$ is an approximation of the average data rate vector, which is updated recursively as

$$ \hat{r}^t_k = (1-\rho^t)\,\hat{r}^{t-1}_k + \rho^t \sum_{i=tT_s+1}^{(t+1)T_s} \frac{r_k\left(\theta^t, x(i); H(i)\right)}{T_s}, \quad \forall k, \tag{10} $$

and $f^t$ is an approximation of the partial derivative of $g(\bar{r}(\theta,\Omega))$ with respect to $\theta$ at $\theta = \theta^t$, which is updated recursively as

$$ F^t = (1-\rho^t)\,F^{t-1} + \rho^t J_\theta\left(\theta^t, x(tT_s+1); H^t\right), \qquad f^t = F^t \nabla_{\bar{r}} g(\hat{r}^t), \tag{11} $$

where $\rho^t$ is a sequence to be properly chosen, and $J_\theta$ is the Jacobian matrix of the rate vector $r$ with respect to $\theta$, whose expression is derived in Appendix A. The matrix $F^t$ is an approximation of the Jacobian matrix of the average rate vector $\bar{r}$ with respect to $\theta$. It will be shown in Lemma 1 that $\hat{r}^t$ and $f^t$ converge to the true average data rate and partial derivative, respectively. This addresses two issues: the average data rates have no closed-form characterization, and the distribution of $H$ is unknown. Both are handled by approximating the average data rate and its partial derivative recursively as in (10) and (11), based on online observations of the channel samples. Moreover, the weight vector is updated as

$$ \mu^{t+1} = (1-\gamma^t)\,\mu^t + \gamma^t \bar{\mu}^t, \tag{12} $$

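with $\bar{\mu}^t = \nabla_{\bar{r}} g(\hat{r}^t)$, where $\gamma^t$ is a step-size sequence satisfying $\gamma^t \to 0$ and $\sum_t \gamma^t = \infty$.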
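The recursive estimates in (10) and (11) are exponential-smoothing updates of a rate vector and a Jacobian. The sketch below implements one frame's update of each. It assumes the per-slot rates of the frame are stacked into a $(T_s, K)$ array and that $J_\theta$ has shape $(\dim\theta, K)$. The function names are hypothetical. The schedule $\rho^t = 1/(t+1)$ in the test is only an example, for which (10) reduces to a running average of the per-frame mean rates.

```python
import numpy as np

def update_rate_estimate(r_hat_prev, slot_rates, rho):
    """(10): r_hat^t = (1 - rho) r_hat^{t-1} + rho * (frame average of slot rates).

    slot_rates: (T_s, K) array; row i holds r_k(theta^t, x(i); H(i)) for all k.
    """
    return (1.0 - rho) * r_hat_prev + rho * slot_rates.mean(axis=0)

def update_gradient_estimate(F_prev, J_theta, grad_g, rho):
    """(11): F^t = (1 - rho) F^{t-1} + rho J_theta,  f^t = F^t grad_g(r_hat^t).

    F_prev, J_theta: (dim_theta, K); grad_g: (K,) gradient of g at r_hat^t.
    """
    F = (1.0 - rho) * F_prev + rho * J_theta
    return F, F @ grad_g
```

Averaging over frames with a decaying $\rho^t$ is what lets the algorithm track $\bar{r}$ and its derivative without knowing the channel distribution.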

In Step 2c, the BBU solves the following quadratic optimization problem:

$$ \bar{\theta}^t = \arg\max_{\theta\in\Theta}\; \bar{f}^t(\theta), \tag{13} $$

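which has the closed-form solution $\bar{\theta}^t = \mathbb{P}_\Theta\left[\theta^t + f^t/(2\tau)\right]$, where $\mathbb{P}_\Theta[\cdot]$ denotes the projection onto the box feasible region $\Theta$. Finally, $\theta$ is updated according to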

$$ \theta^{t+1} = (1-\gamma^t)\,\theta^t + \gamma^t \bar{\theta}^t. \tag{14} $$

Then the above iteration is carried out until convergence.
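Steps 2c and the final update can be sketched together under two assumptions. First, the surrogate has the separable quadratic form $g(\hat{r}^t) + (f^t)^T(\theta-\theta^t) - \tau\|\theta-\theta^t\|^2$ with $\tau > 0$, whose unconstrained maximizer is $\theta^t + f^t/(2\tau)$. Second, $\Theta$ is the per-phase box $[0, 2\pi]$; these bounds are an assumption. Because the quadratic has the same curvature in every coordinate, clipping that maximizer onto the box gives the exact constrained maximizer. The function name is hypothetical.

```python
import numpy as np

def long_term_update(theta, f, tau, gamma, lower=0.0, upper=2.0 * np.pi):
    """Projected surrogate maximization (13) followed by the smoothing step (14).

    theta: current phases theta^t; f: gradient estimate f^t;
    tau > 0: surrogate curvature; gamma: step size gamma^t.
    """
    # (13): maximizer of f^T (th - theta) - tau ||th - theta||^2 over the box.
    theta_bar = np.clip(theta + f / (2.0 * tau), lower, upper)
    # (14): move only a fraction gamma toward the surrogate maximizer.
    return (1.0 - gamma) * theta + gamma * theta_bar
```

The smoothing in (14) keeps consecutive analog filters close. This matters because each $\theta^t$ is based on only one channel sample per frame.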

### IV-B Short-term Block-Coordinate Algorithm

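To apply the BC algorithm, we first transform the WSRMP into the following weighted minimum mean square error (WMMSE) problem:

$$ \min_{\beta, v, d, u, w}\; \sum_{k=1}^{K} \mu_k \left( w_k \eta_k - \log w_k \right) \quad \text{s.t. } d \ge 0,\; \beta_k^2 \le P_k\ \forall k, \text{ and (3)}, \tag{15} $$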

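where $w = [w_1,\dots,w_K]^T$ with $w_k > 0$ is a weight vector for the MSE, $\beta = [\beta_1,\dots,\beta_K]^T$ with $\beta_k = \sqrt{p_k}$, and

$$ \eta_k \triangleq \mathbb{E}\left[|s_k - \hat{s}_k|^2 \,\middle|\, H\right] = \left|1 - u_k^H\tilde{V}^H h_k \beta_k\right|^2 + \sum_{l\neq k}\left|u_k^H\tilde{V}^H h_l\beta_l\right|^2 + u_k^H\tilde{V}^H\tilde{V}u_k + u_k^H Q(\theta,p,v,d)\, u_k $$

is the MSE of user $k$. Following a proof similar to that of Theorem 1 in [17], it can be shown that the WSRMP is equivalent to (15). Moreover, suppose $(\beta^*, v^*, d^*, u^*, w^*)$ is a stationary point of (15). Then $(p^*, v^*, d^*, u^*)$ is also a stationary point of the WSRMP, where $p^* = [p^*_1,\dots,p^*_K]^T$ with $p^*_k = (\beta^*_k)^2$. Therefore, we focus on designing a BC algorithm to find a stationary point of (15).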
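The equivalence rests on a standard WMMSE identity: with the MMSE receiver $u_k = C^{-1}\tilde{V}^H h_k \beta_k$, where $C = \sum_l \beta_l^2 \tilde{V}^H h_l h_l^H \tilde{V} + \tilde{V}^H\tilde{V} + Q$, the MSE satisfies $\eta_k = 1/(1+\mathrm{SINR}_k)$. Hence $-\log \eta_k = \log(1+\mathrm{SINR}_k)$. The sketch below checks this numerically. It assumes unit-power symbols, unit noise variance and a finite $Q$, matching the MSE expression above. The function name and the random test data are illustrative.

```python
import numpy as np

def mmse_receivers_and_mse(Vt, H, p, q):
    """MMSE receivers u_k, their MSEs eta_k, and SINR_k for fixed filters and powers.

    Vt: (A, D) composite filter V~; H: (A, K) composite channel;
    p: (K,) powers with beta_k = sqrt(p_k); q: (D,) finite diagonal of Q.
    """
    beta = np.sqrt(p)
    A_eff = Vt.conj().T @ H                               # column k: V~^H h_k
    C = (A_eff * p) @ A_eff.conj().T + Vt.conj().T @ Vt + np.diag(q)
    U = np.linalg.solve(C, A_eff * beta)                  # u_k = C^{-1} V~^H h_k beta_k
    G = U.conj().T @ A_eff                                # G[k, l] = u_k^H V~^H h_l
    cross = np.abs(G * beta[None, :]) ** 2                # |u_k^H V~^H h_l beta_l|^2
    interference = cross.sum(axis=1) - np.diag(cross)
    noise = (np.abs(Vt @ U) ** 2).sum(axis=0)             # u_k^H V~^H V~ u_k
    quant = (q[:, None] * np.abs(U) ** 2).sum(axis=0)     # u_k^H Q u_k
    eta = np.abs(1 - np.diag(G) * beta) ** 2 + interference + noise + quant
    sinr = np.diag(cross) / (interference + noise + quant)
    return U, eta, sinr
```

The check `1 / eta == 1 + sinr` holding to machine precision is also why setting $w_k = 1/\eta_k$ is optimal in the WMMSE reformulation.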

In the proposed BC algorithm, the short-term control variables are optimized in an alternating manner by solving a convex subproblem with respect to each variable. The BC algorithm is summarized in Algorithm 2. The choice of the initial point and the update equation for each variable are elaborated below.

#### IV-B1 Choice of Initial Point

For $p$, we choose the initial point $p_k = P_k,\ \forall k$, i.e., each user transmits at maximum power. For $d$, we choose the initial point $d_{n,l} = C_n/(2B_W L),\ \forall n,l$, i.e., equal quantization bit allocation at each RRH. For $V_n$, the initial point is built from the leading eigenvectors of the effective channel covariance at RRH $n$.

#### IV-B2 Optimization of u, w and β

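When the other short-term variables are fixed, the optimal $u_k$ is given by the MMSE receiver

$$ u_k^* = \left( \sum_{l=1}^{K} \beta_l^2\, \tilde{V}^H h_l h_l^H \tilde{V} + \tilde{V}^H\tilde{V} + Q \right)^{-1} \tilde{V}^H h_k \beta_k, \quad \forall k, \tag{16} $$

where $Q$ is an abbreviation for $Q(\theta,p,v,d)$. The optimal $w_k$ is given by

$$ w_k^* = \eta_k^{-1}, \quad \forall k, \tag{17} $$

and the optimal $\beta_k$ is given by $\beta_k^*(\lambda_k)$ with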

 β∗k(λk) =μkwkRe[uHk˜VHhk] ×(K∑l=12μlwlhHk˜VuluHl˜VHhk+νk+2λk)−