## 1. Introduction

Today’s 4G and 5G Cellular Radio Access Networks (RANs) are experiencing unprecedented
growth in traffic at base stations (BSs) due to increased subscriber numbers
and their higher quality of service requirements (cisco; nokia). To meet the resulting
demand, techniques such as Massive Multiple-Input Multiple-Output (MIMO) communication,
cell densification, and millimeter-wave communication are expected to
be deployed in fifth-generation (5G) cellular standards (3gppintro). But this
in turn significantly increases the power and cost required to operate RAN sites
backed by silicon-based computation. While research and industry efforts have
provided general solutions (e.g., sleep mode (lahdekorpi2017energy) and
network planning (wu2015energy)) to increase energy efficiency and decrease power
consumption of RANs, the fundamental challenge of power requirements scaling
with the exponentially increasing computational requirements of the RAN persists.
Previously (*ca.* 2010), this problem had not limited
innovation, due to the rapid pace of advances in silicon’s computational efficiency.
Today, however, such developments are not maintaining the pace
they had in past years, due to transistors approaching atomic limits
(courtland2016transistors) and the end of Moore’s Law
(expected *ca.* 2025–2030 (khan2018science; shalf2020future; itrs)).
This therefore calls into question the prospects of silicon to
achieve NextG cellular targets in terms of both energy and spectral efficiency.

This work investigates a radically different processing architecture
for RANs, one based on quantum computation, to see if this emerging technology
can offer cost and power benefits over silicon computation in wireless networks.
We seek to quantitatively analyze whether, in the coming years and decades, mobile operators
might rationally increase the RAN’s capital expenditure (CapEx) by purchasing costly quantum
hardware, in a bid to lower its operational expenditure (OpEx) and hence
the *Total Cost of Ownership* (TCO = CapEx + OpEx). The OpEx cost reduction
would result from the reduced power consumption of the RAN, due to higher
computational efficiency of quantum processing over silicon processing for certain
heavyweight baseband processing tasks. Figure 1 depicts this
envisioned scenario, where quantum processing units (QPUs) coexist
with silicon processing units (SPUs) at Centralized RAN (C-RAN) datacenters
(checko2014cloud). QPUs are used for the RAN’s heavy baseband processing,
whereas SPUs handle the network’s control plane (e.g., resource allocation,
communication control interface), transfer systems (e.g., enhanced
common public radio interface, mobility management entity), and further
lightweight tasks such as pre- and post-processing for the QPU-specific
computation.

This paper presents the first extensive analysis of power consumption and quantum annealing (QA) architecture to make the case for the future feasibility of quantum-processing-based RANs. While recent successful point solutions that apply QA to a variety of wireless network applications (10.1145/3372224.3419207; 9500557; kim2019leveraging; wang2016quantum; cao2019node; ishizaki2019computational; wang2019simulated; chancellor2016direct; lackey2018belief; bian2014discrete; cui2021quantum) serve as our motivation, previous work stops short of a macroscopic power and cost comparison between QA and silicon. Despite QA’s benefits demonstrated by these prior works in their respective point settings, an account of how these results factor into the overall computational performance and power requirements of the base station and C-RAN remains missing. Therefore, here we investigate these issues head-on, to make an end-to-end case that QA will likely offer future benefits over silicon for the processing associated with wireless networks.

In order to realize such an architecture, several key system performance metrics need to be analyzed, quantified, and evaluated, most notably the power consumption, computational throughput, spectral efficiency, operational cost, and feasibility of large-scale quantum hardware. We carefully discuss these issues in later sections of the work (§3, §5). Our approach is to first describe the factors that influence processing latency and throughput on current QA devices and then, by assessing recent developments in the area, project what future QA devices will be capable of under the same metrics (§3). We next analyze cost by evaluating the power consumption of QA (§5). Our analysis reveals that a three-way interplay between latency, power consumption, and the number of qubits (quantum bits) in the QA hardware determines whether QA technology can offer benefits over silicon hardware. In particular, latency influences spectral efficiency, power consumption influences energy efficiency, and the number of qubits influences both. Based on these insights, we determine the technology points (i.e., latency, power consumption, and number of qubits) that QA hardware must meet in order to provide an advantage over silicon in terms of energy and spectral efficiency.

Table 1 summarizes our results,
showing that for 200 and 400 MHz bandwidths,
with 2.13M and 4.26M qubits respectively, we predict that
QA processing will achieve spectral efficiency equal to today’s
14 nm CMOS silicon processing, while reducing power consumption by
7.9 kW (16% lower) and 40.9 kW (45% lower) in representative 5G base
station scenarios.
In a C-RAN scenario with five base stations of 200 and
400 MHz bandwidths, QA processing with 10.6M and 21.3M qubits
respectively attains equal spectral
efficiency to silicon, while reducing the
power consumption by 100 kW (35% lower) and
232 kW (58% lower) respectively. Our further evaluations compare QA against future
1.5 nm CMOS silicon, which is expected to be the silicon technology at
the end of Moore’s law scaling (*ca.* 2030 (itrs)). In a
BS scenario with 400 MHz bandwidth and 128 antennas, QA will reduce
power consumption by 30.4 kW (37% lower) in comparison to 1.5 nm CMOS
silicon hardware, while achieving spectral efficiency equal to silicon with
8.5M-qubit QA hardware.

Overall, our quantitative results predict that QA hardware will offer benefits over silicon hardware in certain wireless network scenarios, once the technology matures to hold at least 3–22 million qubits while improving problem processing time to hundreds of microseconds (§3). Scaling QA processors to millions of qubits will pose challenges in the engineering, control, and operation of hardware resources, which designers continue to investigate (boothby2021architectural; bunyk-architectural). Nevertheless, recent research demonstrates large-scale qubit control techniques, showing that million-qubit-scale quantum hardware is already at this point in time a realistic prospect (vahapoglu2021single).

| B/W | Qubits, BS | Qubits, C-RAN | BS Power, CMOS (kW) | BS Power, QA (kW) | C-RAN Power, CMOS (MW) | C-RAN Power, QA (MW) |
|---|---|---|---|---|---|---|
| 50 MHz | 530K | 2.65M | 19.3 | 36 | 0.13 | 0.12 |
| 100 MHz | 1.06M | 5.3M | 29.4 | 37.9 | 0.18 | 0.13 |
| 200 MHz | 2.13M | 10.6M | 49.5 | 41.6 | 0.28 | 0.18 |
| 400 MHz | 4.26M | 21.3M | 89.9 | 49 | 0.48 | 0.25 |

Silicon results reflect a 14 nm CMOS process; QA results reflect a 140 µs problem processing latency (*cf.* §3). System parameters correspond to a base station with 64 antennas, 64-QAM modulation, 0.5 coding rate, and 100% time and frequency domain duty cycles. The C-RAN handles five base stations.

## 2. Background and Assumptions

While classical computation uses bits to process information, quantum computation uses qubits: physical devices that can hold a superposition of the 0 and 1 states simultaneously (tech). The current technology landscape consists broadly of fault-tolerant approaches to quantum computing versus noisy intermediate-scale quantum (NISQ) implementations. Fault-tolerant quantum computing (shor1996fault) is an ideal scenario that is still far off in the future, whereas NISQ computing (preskill2018quantum), which is available today, suffers from high machine noise levels but gives us an insight into what future fault-tolerant methods will be capable of in terms of key quantum effects such as qubit entanglement and tunneling (shor1996fault). NISQ processors can be further classified into digital gate-model or analog annealing (QA) architectures.

Gate-model devices (knill2005quantum) are fully general-purpose
computers, using programmable logic gates acting on qubits (wichert2020principles), whereas
annealing-model devices (tech), inspired by
the Adiabatic Theorem of quantum mechanics,
offer a means to search an optimization problem for
its lowest *ground state* energy configurations
in a high-dimensional energy landscape (probsolving). While gate-model quantum devices of a size relevant to practical applications are not yet generally available (ibm), today’s QA devices with about 5,000 qubits enable us to commence empirical studies at realistic scales (tech).

In this study, we make the following two assumptions:

Assumption 1 — Ising model formulation. To enable QA computation, we assume that the cellular baseband’s heavy processing tasks are formulated as Ising model problems. Recent prior work in this area has formulated frequency domain detection, forward error correction, and precoding problems into Ising models (kim2019leveraging; 10.1145/3372224.3419207; 9500557; bian2014discrete; cui2021quantum). We assume further baseband tasks such as pre-distortion weight selection and up/down sampling and filtering admit Ising model formulations, because their computation is digital and decision-seeking (aschbacher2005digital; mattingley2010real), and penalty methods are available to reduce higher-order optimization problems into Ising models (boros2002pseudo; dattani2019quadratization).
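To make the Ising model target concrete, the following minimal sketch (our own illustrative fields and couplings, not drawn from any baseband task) shows the problem form an annealer searches: finding the spin configuration that minimizes the Ising energy. A brute-force search stands in here for the annealer itself.

```python
from itertools import product

def ising_energy(spins, h, J):
    """Energy of a spin configuration under the Ising Hamiltonian
    E(s) = sum_i h_i * s_i + sum_{i<j} J_ij * s_i * s_j."""
    e = sum(h[i] * s for i, s in enumerate(spins))
    e += sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())
    return e

def ground_state(h, J):
    """Brute-force search for the minimum-energy spin configuration --
    the configuration a quantum annealer approximates for large problems."""
    n = len(h)
    return min(product([-1, +1], repeat=n),
               key=lambda s: ising_energy(s, h, J))

# Toy 3-spin problem: field and coupling values are illustrative only.
h = [0.5, -1.0, 0.2]
J = {(0, 1): 1.0, (1, 2): -0.8}
print(ground_state(h, J))  # -> (-1, 1, 1), energy -3.1
```

A baseband task "holds an Ising formulation" precisely when its decision variables and objective can be cast as the `h` and `J` above.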

Assumption 2 — Bespoke QA hardware. Qubit connectivity significantly impacts performance: sparse connectivity hampers dense problem graphs due to problem-mapping difficulties (10.1145/3372224.3419207). Recent advances in QA have bolstered qubit connectivity (topologies), and further improvement efforts continue (katzgraber2018small; lechner2015quantum), so we assume that these efforts will allow QA hardware tailored to the problem of interest.

##### Roadmap.

In the remainder of this paper, Section 3 analyzes QA hardware architecture and its end-to-end processing latency, and Section 4 describes power modeling in RANs and cellular computational targets. We are then in a position to present our silicon-versus-quantum power comparison methodology and discuss our results in Section 5.

## 3. Quantum Processing Performance

To characterize current and future QA performance, this section analyzes processing time on QA devices. A client sends quantum machine instructions (QMIs) that describe an input problem computation to a QA QPU, and the QPU responds with solution data. Fig. 2 depicts the entire latency a QMI experiences from entering the QPU to the readout of the solution, which consists of programming (§3.1), sampling (§3.2), and post-processing (§3.3) times.

### 3.1. Programming

As the QMI reaches the QPU, the QPU programs the QMI’s
input problem coefficients:
room-temperature electronics send raw signals into the QA
refrigeration unit to program the on-chip flux
digital-to-analog converters (*Φ-DACs*).
The Φ-DACs then apply external magnetic fields
and magnetic couplings locally to the qubits and
couplers respectively. This process is called a
programming cycle, and in current technology it
typically takes 4–40 µs (timing),
dictated by the bandwidth of the control lines and
the Φ-DAC addressing scheme (boothby2021architectural).
During the programming cycle, the QPU dissipates an amount of heat
that increases the effective temperature of the qubits. This is due to
the movement of flux quanta^{3} in the inductive storage loops of the Φ-DACs. Thus, a
post-programming thermalization time is required to
cool the QPU, ensure proper reset/initialization
of qubits, and allow the QPU to maintain thermal equilibrium
with the refrigeration unit
(≈15 mK). QA clients can specify thermalization
times in the range 0–10 ms with microsecond-level granularity; the default value is a conservative one millisecond (tech).

^{3} QA devices store coefficient information in the form of magnetic flux quanta, transferred via single flux quantum (SFQ) voltage pulses (bunyk-architectural).

QMI coefficients are programmed by using six Φ-DACs per qubit and
one Φ-DAC per coupler, and the supported bit precision is
currently up to five bits (four for value, one for sign)
(bunyk-architectural). Each Φ-DAC consists of two
inductor storage loops with a pair of Josephson junctions each.
The energy dissipated on chip is on the order of $I_c \Phi_0$
per single flux quantum (SFQ) moved in an inductor storage loop,
where $I_c$ is the Φ-DAC’s junction critical current and
$\Phi_0$ is the magnetic flux quantum.^{4}
For the worst-case reprogramming scenario, this corresponds to 32
SFQs (−16 to +16) moving into (or out of) all inductor storage loops
of each Φ-DAC (bunyk-architectural). Table 2
reports on-chip energy dissipation values for various QPU sizes
and Φ-DAC critical currents, showing that programming a
large-scale device with 10M qubits and 75M couplers
(i.e., a connectivity of 15 per qubit (topologies)) will dissipate only
36 pJ on chip. With a 1 µW power budget available at the
15 mK QPU stage, this accounts for 36 µs of QPU thermalization time.

^{4} $\Phi_0 = h/2e$, where $h$ is Planck’s constant and $e$ is the electron charge.

| Qubits | Couplers | Φ-DACs | Energy dissipated, $I_c$ = 55 µA (bunyk-architectural) | Energy dissipated, $I_c$ = 1 µA (mcdermott2018quantum) |
|---|---|---|---|---|
| 512 | 1,472 | 4,544 | 66 fJ | 1 fJ |
| 2,048 | 6,016 | 18,304 | 266 fJ | 5 fJ |
| 5,436 | 37,440 | 70,056 | 1 pJ | 18 fJ |
| 10 M | 75 M | 135 M | 2 nJ | 36 pJ |

The next step resets/initializes the qubits,
during which each qubit transitions from a higher energy state to an
intended ground state, generating spontaneous photon emissions
that heat the QPU. Reed et al. (reed2010fast)
demonstrate the suppression of these emissions
using *Purcell* filters, requiring 80 ns
(120 ns) for 99% (99.9%) fidelity (reed2010fast).

An $N_q$-qubit, $N_c$-coupler, five-bit-precision QA device needs to program a worst-case amount of data of 27 KB for the current QA ($N_q$ = 5,436, $N_c$ = 37,440) and 100 MB for a large-scale QA ($N_q$ = 10M, $N_c$ = 75M). Thus, to maintain today’s microsecond-level programming cycle time in a future large-scale QA, the programming control lines’ bandwidth must increase by roughly three orders of magnitude (i.e., GHz-bandwidth lines are needed). With Purcell filter design integration and a sufficient amount of control-line bandwidth, the overall programming time could therefore reach 40–80 µs in a 10M-qubit large-scale QA device.
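The data-volume scaling above can be sketched as follows. We assume, for illustration, one five-bit coefficient per qubit (bias) and per coupler (coupling strength); this counting reproduces the ~27 KB current-device figure, while the 100 MB large-scale figure quoted above may additionally include per-Φ-DAC overheads not modeled here.

```python
def programming_bytes(n_qubits, n_couplers, bits=5):
    """Worst-case programming payload under the assumption of one
    `bits`-bit coefficient per qubit and per coupler."""
    return (n_qubits + n_couplers) * bits / 8

current = programming_bytes(5_436, 37_440)           # today's QA
future = programming_bytes(10_000_000, 75_000_000)   # large-scale QA
print(f"current: {current / 1e3:.1f} KB, "
      f"future: {future / 1e6:.1f} MB, "
      f"bandwidth scale-up: ~{future / current:,.0f}x")
```

Even under this conservative counting, the payload grows by three orders of magnitude, which is what pushes the control lines from MHz to GHz bandwidths.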

### 3.2. Sampling

The process of executing a QMI on a QA device is called sampling, and the time taken for sampling is called the sampling time. The sampling time is classified into three sub-components: the anneal, readout, and readout delay times. A single QMI consists of multiple samples of an input problem, with each sample annealed and read out once, followed by a readout delay (see Fig. 2). Sampling a QMI begins after the QPU programming process.

#### 3.2.1. Anneal.

In this time interval, the QPU implements a QA algorithm (tech) to solve the input problem, where low-frequency annealing lines control the annealing algorithm’s schedule. The bandwidth of these control lines hence limits the minimum annealing time, which is one microsecond today. Weber et al. (lincolnlab) propose the use of flexible print cables with moderate bandwidth (∼100 MHz) and high isolation (>50 dB) for annealing, which can potentially decrease annealing times to tens of nanoseconds. Faster annealing times (<40 ns) and/or qubits with longer coherence lifetimes also enable coherent quantum annealing regimes, which may benefit QA fidelity (fastanneal; yan2016flux).

#### 3.2.2. Readout

After annealing, the spin configuration of the qubits (i.e., the solution) is read out by measuring the direction of each qubit’s persistent current ($I_p$). This readout information propagates from the qubits to readout detectors located at the perimeter of the QPU chip via flux bias lines. Each flux bias line is a chain of electrical circuits called Quantum Flux Parametrons (QFPs), which detect and amplify the qubits’ $I_p$ to improve the readout signal-to-noise ratio. These QFP chains act like shift registers, propagating the information from qubits to detectors (doi:10.1063/1.4939161). Current QA devices contain a limited number of flux bias lines, each responsible for reading out many qubits; further, each flux bias line reads out one qubit at a time (i.e., time-division readout), so only as many qubits as there are lines are read out in parallel (see Table 3). Hence, the readout time depends on the qubits’ physical locations, the bandwidth of the flux bias lines, and the signal integration time. For the current status of the technology, the readout time is 25–150 µs per sample (tech). Nevertheless, recent research demonstrates promising fast readout techniques, which we describe next.

Chen et al. (doi:10.1063/1.4764940) and Heinsoo et al. (PhysRevApplied.10.034040) describe frequency-multiplexed readout schemes that enable simultaneous readout of multiple qubits within a flux bias line. While there is no fundamental limit on the number of qubits read out simultaneously, a physical limit is imposed by the line width of the qubits’ readout microresonators and the 4–8 GHz operating band (6 GHz center frequency, 4 GHz bandwidth) of commercial microwave transmission line components used in the readout architecture (doi:10.1063/1.4939161). Microresonators with quality factor $Q$ can capture line widths down to $6/Q$ GHz, thus enabling up to $4Q/6$ qubits to be read out simultaneously. Table 3 reports these results, showing that a $Q$ of $10^6$ will enable up to 666K-qubit parallel readout. This analysis assumes that each microresonator can be fabricated at exactly its design frequency, which is currently not the case. Further developments in understanding the RF properties of microresonators will therefore be needed to achieve this multiplexing performance.

| Qubits | Time-division | Frequency-multiplex (doi:10.1063/1.4939161) | Frequency-multiplex (dorche2020high) |
|---|---|---|---|
| 512 | 16 | 512 | 512 |
| 2,048 | 32 | 666 | 2,048 |
| 5,436 | 52 | 666 | 5,436 |
| 10 M | 2,200 | 666 | 666K |

Recent work by Grover *et al.* (grover2020fast) shows the application of QFPs as isolators, achieving a readout fidelity of 98.6% (99.6%) in only 80 ns (1 µs). Walter *et al.* (PhysRevApplied.7.054020) describe a single-shot readout scheme requiring only 48 ns (88 ns) to achieve a 98.25% (99.2%) readout fidelity. Their designs are also compatible with multiplexed architectures and earlier readout schemes, implying that with design integration, readout times could reach the order of microseconds per sample.

#### 3.2.3. Readout delay

After a sample’s anneal–readout process, a readout delay is added (see Fig. 2). In this time interval, the qubits are reset for the next sample’s anneal. QA clients can specify delays in the range 0–10 ms, with a conservative default of one millisecond. Nevertheless, about one microsecond is sufficient for high-fidelity qubit reset (§3.1) (reed2010fast).

### 3.3. Postprocessing

This time interval is used for post-processing the solutions returned by the QA to improve solution quality (post). Multiple samples’ solutions are post-processed at once, in parallel with the current QMI’s annealer computation, whereas the final batch of post-processing occurs in parallel with the programming of the next QMI (see Fig. 2). Thus, the post-processing time does not factor into the overall processing time (timing).

In summary, the projected overall programming time is 40–80 µs (programming: 4–40 µs; thermalization and reset: 36–40 µs), the anneal time is one µs/sample, the readout time is one µs/sample, and the readout delay is one µs/sample. For a target sample count $N_s$, the total projected QMI run time is therefore $80 + 3N_s$ µs, taking the upper programming figure.
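As a sanity check on these timing components, a short sketch of the projected run-time budget (80 µs programming plus 3 µs per sample, per the summary above):

```python
def qmi_runtime_us(n_samples, t_prog_us=80.0):
    """Projected QMI run time (microseconds) for a future large-scale
    QA device: programming (~80 us upper projection) plus ~1 us each
    of anneal, readout, and readout delay per sample."""
    return t_prog_us + 3.0 * n_samples

# Reproduces the run times used later in Table 6:
for n in (1, 20, 50, 100, 1000):
    print(n, qmi_runtime_us(n))  # -> 83, 140, 230, 380, 3080 us
```

The 140 µs latency assumed throughout the power comparison (§5) corresponds to $N_s = 20$ samples.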

## 4. RAN Power Models and Cellular Targets

This section describes power modeling in RANs and cellular computational targets (4G and 5G).

| BBU Task (TOPS) | Ref. / 4G, $N$=1 | 4G, $N$=2 | 4G, $N$=4 | 4G, $N$=8 | 5G 200 MHz, $N$=32 | $N$=64 | $N$=128 | 5G 400 MHz, $N$=32 | $N$=64 | $N$=128 |
|---|---|---|---|---|---|---|---|---|---|---|
| DPD | 0.160 | 0.320 | 0.640 | 1.280 | 51.20 | 102.4 | 204.8 | 102.4 | 204.8 | 409.6 |
| Filter | 0.400 | 0.800 | 1.600 | 3.200 | 128.0 | 256.0 | 512.0 | 256.0 | 512.0 | 1024 |
| FFT | 0.160 | 0.320 | 0.640 | 1.280 | 51.20 | 102.4 | 204.8 | 102.4 | 204.8 | 409.6 |
| FD (linear) | 0.090 | 0.180 | 0.360 | 0.720 | 28.80 | 57.60 | 115.2 | 57.60 | 115.2 | 230.4 |
| FD (non-linear) | 0.030 | 0.120 | 0.480 | 1.920 | 307.2 | 1228.8 | 4915.2 | 614.4 | 2457.6 | 9830.4 |
| FEC | 0.140 | 0.140 | 0.280 | 0.560 | 22.40 | 44.80 | 89.60 | 44.80 | 89.60 | 179.2 |
| CPRI | 0.720 | 0.720 | 1.440 | 2.880 | 115.2 | 230.4 | 460.8 | 230.4 | 460.8 | 921.6 |
| PCP | 0.400 | 0.800 | 1.600 | 3.200 | 12.80 | 25.60 | 51.20 | 12.80 | 25.60 | 51.20 |
| Total | 2.100 | 3.400 | 7.040 | 15.04 | 716.8 | 2,048 | 6,533.6 | 1,420.8 | 4,070.4 | 13,056 |

### 4.1. Power Modeling

RAN power models account for power by splitting the BS or C-RAN functionality into the components and sub-components shown in Figs. 1 and 3. This section details these components and their associated power models. We follow the developments by Desset et al. (desset2012flexible) and Ge et al. (ge2017energy).

#### 4.1.1. RAN Base Station

A RAN BS (see Fig. 3) is comprised of a baseband unit (BBU), a radio unit (RU), power amplifiers (PAs), antennas, and a power system (PS). The entire BS power consumption ($P_{\text{BS}}$) is then modeled as:

$$P_{\text{BS}} = \frac{P_{\text{BBU}} + P_{\text{RU}} + P_{\text{PA}}}{(1-\sigma_{\text{AC}})(1-\sigma_{\text{MS}})(1-\sigma_{\text{DC}})} \qquad (1)$$

where $P_{\text{BBU}}$, $P_{\text{RU}}$, and $P_{\text{PA}}$ are the BS components’ power consumptions, and $\sigma_{\text{AC}}$, $\sigma_{\text{MS}}$, and $\sigma_{\text{DC}}$ correspond to the fractional losses of Active Cooling (A/C), Mains Supply (MS), and DC–DC conversions of the power system respectively (ge2017energy).

The BBU performs the processing associated with the digital baseband (BB), and the control and transfer systems. The baseband includes computational tasks such as digital pre-distortion (DPD), up/down sampling and filtering, OFDM-FFT processing, frequency-domain (FD) mapping/demapping and equalization, and forward error correction (FEC). The control system undertakes the platform control processing (PCP), and the transfer system processes the eCPRI transport layer. The total BBU power consumption ($P_{\text{BBU}}$) is then (desset2012flexible):

$$P_{\text{BBU}} = \sum_{i} P_i + P_{\text{leak}} \qquad (2)$$

where $P_i$ is computational task $i$’s power consumption, and $P_{\text{leak}}$ is the leakage power resulting from the hardware employed in processing these baseband tasks. FD processing is split into two parts, with linear and non-linear scaling over the number of antennas (desset2012flexible). The RU performs analog RF signal processing, consisting of clock generation, low-noise and variable-gain amplification, IQ modulation, mixing, buffering, pre-driving, and analog–digital conversions. RU power consumption ($P_{\text{RU}}$) scales linearly with the number of transceiver chains, and each chain consumes about 10.8 W (desset2012flexible). For macro-cell BSs, each PA (including the antenna feeder) is typically configured at 102.6 W power consumption (ge2017energy).
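Putting Eqs. (1) and (2) together, the BS power model can be sketched as below. The loss fractions are illustrative placeholders (this paper does not state them); the RU and PA per-unit figures are the 10.8 W and 102.6 W values quoted above, and the BBU power is a hypothetical input.

```python
def bs_power(p_bbu, p_ru, p_pa, s_ac=0.10, s_ms=0.09, s_dc=0.075):
    """Eq. (1): component power inflated by power-system losses.
    Loss fractions (s_ac, s_ms, s_dc) are assumed example values."""
    return (p_bbu + p_ru + p_pa) / ((1 - s_ac) * (1 - s_ms) * (1 - s_dc))

# Example: 64 transceiver chains (64 x 10.8 W), 64 PAs (64 x 102.6 W),
# and a hypothetical 4 kW BBU.
total_w = bs_power(4000, 64 * 10.8, 64 * 102.6)
print(f"{total_w / 1e3:.1f} kW")
```

The denominator shows why power-system losses matter: every watt saved in the BBU saves more than a watt at the mains.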

#### 4.1.2. C-RAN

In the C-RAN architecture, BS processing functionality is amortized and shared: Remote Radio Heads (RRHs) perform analog RF signal processing, and a BBU pool performs the digital baseband computation (of many BSs) at a centralized datacenter (see Fig. 1). Fronthaul (FH) links connect the RRHs with the centralized BBU pool. To relax the FH latency and bandwidth requirements, a part of the baseband computation is performed at the RRH sites. Several such split models have been proposed (8479363; 3gpp). We consider a split where RRHs perform low Layer 1 baseband processing, such as cyclic prefix removal and FFT-specific computation. The power consumption of the C-RAN ($P_{\text{C-RAN}}$) is then:

$$P_{\text{C-RAN}} = P_{\text{BBU-pool}} + N \left( P_{\text{RRH}} + P_{\text{PA}} + P_{\text{FH}} \right) \qquad (3)$$

where each $P$ term is the corresponding C-RAN component’s power consumption and $N$ is the number of RRHs. Fronthaul power consumption depends on the technology; for fiber-based Ethernet or passive optical networks, it can be modeled by assuming a set of parallel communication channels as (7437385; alimi2019energy):

$$P_{\text{FH}} = \rho \left\lceil \frac{\lambda}{C} \right\rceil P_{\text{link}} \qquad (4)$$

where $\rho$ is a constant scaling factor, and $\lambda$ and $C$ represent the traffic load and the capacity of the fronthaul link respectively. For a link capacity of 500 Mbps, $P_{\text{link}}$ is typically ca. 37 W (liu2018designing). Power consumption results according to these models are discussed in §5.

### 4.2. Cellular Processing Requirements

This section describes cellular computational targets, in estimated Tera operations per second (TOPS), that the BBU needs to process. The target depends on parameters such as the bandwidth (BW), modulation (M), coding rate (R), number of antennas ($N_{\text{ant}}$), and time ($dt$) and frequency ($df$) domain duty cycles. Prior work (desset2012flexible) presents these TOPS complexity values for individual BBU tasks in a reference scenario (BW = 20 MHz, M = 6, R = 1, $N_{\text{ant}}$ = 1, $dt$ = $df$ = 100%), which we replicate in Table 4 as Reference. These values scale as (desset2012flexible):

$$\text{TOPS} = \text{TOPS}_{\text{ref}} \prod_{k=1}^{6} \left( \frac{X_k}{X_{k,\text{ref}}} \right)^{s_k} \qquad (5)$$

where $X_k \in \{\text{BW}, M, R, N_{\text{ant}}, dt, df\}$ for $k \in [1,6]$ respectively. The scaling exponents $\{s_1, \ldots, s_6\}$ are {1,0,0,1,1,0} for DPD, Filter, and FFT, {1,0,0,1,1,1} for linear FD, {1,0,0,2,1,1} for non-linear FD, {1,1,1,1,1,1} for CPRI and FEC, and {0,0,0,1,0,0} for PCP (desset2012flexible). The authors determine these exponents based on the dependence of each BBU operation on the corresponding parameters. Table 4 reports the TOPS complexity values for representative 4G and 5G scenarios.

## 5. Power and Cost Comparison

Our methodology compares silicon and quantum processing at equal spectral-efficiency outcomes. We specify the same BBU targets (Table 4) with silicon and quantum hardware, ensuring equal bits processed per second per Hz per km².

The power consumption of silicon CMOS hardware depends on its performance-per-watt efficiency and the amount of computation at hand. Technology scaling improves this efficiency from generation to generation, inversely proportional to the square of the transistors’ core supply voltage ($V_{dd}$) (stillmaker2017scaling). Table 5 shows typical $V_{dd}$ values of various CMOS devices.

| Process | 65 nm | 45 nm | 22 nm | 14 nm | 7 nm | 1.5 nm |
|---|---|---|---|---|---|---|
| $V_{dd}$ | 1.1 V | 1.0 V | 0.9 V | 0.8 V | 0.65 V | 0.4 V |

A 65 nm CMOS device has a 0.04 TOPS/W efficiency (desset2012flexible), from which we compute the same for today’s 14 nm CMOS via $V_{dd}$ scaling, obtaining a 0.076 TOPS/W efficiency (i.e., $0.04 \times (1.1/0.8)^2 \approx 0.076$). Using this hardware efficiency and the TOPS requirements of Table 4, we compute the silicon hardware power consumption. Additional power results from leakage currents in the silicon transistor channel; this leakage power is set to 30% of the dynamic power (desset2012flexible).
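The $V_{dd}^2$ scaling step is a one-liner; the following sketch reproduces the 14 nm figure from the 65 nm baseline:

```python
def scaled_efficiency(base_tops_per_watt, vdd_base, vdd_new):
    """Performance-per-watt scaling with supply voltage: efficiency
    is inversely proportional to Vdd^2 (stillmaker2017scaling)."""
    return base_tops_per_watt * (vdd_base / vdd_new) ** 2

# 65 nm (1.1 V, 0.04 TOPS/W) -> 14 nm (0.8 V)
print(round(scaled_efficiency(0.04, 1.1, 0.8), 3))  # -> 0.076
```

The same call with 0.4 V projects the 1.5 nm end-of-Moore's-law efficiency used in the comparisons below.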

Fig. 4 reports the power consumption of 4G and 5G BSs with 14 nm CMOS processing, computed according to the models in §4. In Fig. 3(a), we see that the power amplifier (PA) is the dominant component of 4G BS power consumption, accounting for 57–58% of the total BS power. But as the network scales to the higher bandwidths and antenna counts envisioned in 5G, the BBU becomes the dominant power-consuming component (see Fig. 3(b)), accounting for 69–74% of the total BS power. This quick escalation in power, from 0.35–1.43 kW in 4G to 34.7–261.3 kW in 5G, is mainly due to the quadratic dependency of FD processing on the number of antennas (see Table 4) and the increased network bandwidth that is a consequence of millimeter-wave communication.

The power consumption of QA hardware is nominally 25 kW, dominated by its refrigeration unit (king-naturecomms2021). However, to maintain this 25 kW power for 5G baseband processing, a sufficient number of qubits is required in the QA hardware, all under the same refrigeration unit. Hence, we first estimate this requirement to satisfy 5G’s spectral-efficiency demand.

To compute this, we convert 5G’s target TOPS of Table 4 into target problems per second (PPS), then estimate the number of qubits the QA requires to achieve this PPS, individually for each baseband computational task. We formulate it as:

$$Q_{\text{total}} = \sum_{i} Q_i, \qquad Q_i = P_i \times q_i \times t_i \qquad (6)$$

where $Q_{\text{total}}$ is the total number of qubits the QA requires for the entire baseband processing, and $Q_i$ is the qubit requirement for baseband task $i$. $P_i$ is the target problems per second, $q_i$ is the number of qubits per problem, and $t_i$ is the run time per problem, of the baseband task. We next demonstrate how to compute these values with running examples.

A MIMO detection requires on average 80M operations via the Sphere Decoding algorithm (jalden2004maximum), which translates 5G’s target 2457.6 TOPS (Table 4) into 30.72M PPS. Solving the same problem using QA requires 384 qubits (kim2019leveraging), and its run time is 140 µs (§3). Substituting these values into Eq. 6 leads to the result that 5G’s FD processing will require 1.65M qubits with $N_s = 20$ samples.

Solving a rate-half, 8,448-block-length 5G LDPC code via the belief propagation algorithm requires 150M operations for a typical 20 iterations (fernandes2010parallel). For the 5G FEC target of 89.6 TOPS (see Table 4), this translates to solving 600K PPS. QA-based LDPC decoding (10.1145/3372224.3419207) requires 21,132 qubits per such problem, and its run time is 140 µs (§3). This leads to the result that 5G’s FEC processing will require 1.77M qubits with $N_s = 20$ samples.
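Both worked examples follow the same arithmetic, which the sketch below makes explicit (operation counts, qubit counts, and the 140 µs run time are the values quoted above):

```python
def qubits_required(target_tops, ops_per_problem, qubits_per_problem,
                    runtime_s):
    """Eq. (6): qubits = (problems/second) x qubits/problem x runtime,
    i.e., enough qubits to keep runtime_s worth of problems in flight."""
    pps = target_tops * 1e12 / ops_per_problem
    return pps * qubits_per_problem * runtime_s

fd = qubits_required(2457.6, 80e6, 384, 140e-6)      # MIMO detection
fec = qubits_required(89.6, 150e6, 21_132, 140e-6)   # LDPC decoding
print(f"FD: {fd / 1e6:.2f}M qubits, FEC: {fec / 1e6:.2f}M qubits")
```

The runtime factor is why the qubit counts in Table 6 grow linearly with $t$: slower problem turnaround must be compensated by more problems resident on the chip at once.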

| $N_s$ | 1 | 20 | 50 | 100 | 1,000 |
|---|---|---|---|---|---|
| $t$ (µs) | 83 | 140 | 230 | 380 | 3,080 |
| $Q_{\text{FD}}$ | 0.98M | 1.65M | 2.7M | 4.5M | 36.3M |
| $Q_{\text{FEC}}$ | 1.05M | 1.77M | 2.91M | 4.81M | 39.1M |
| $Q_{\text{total}}$ | 2.53M | 4.27M | 7.0M | 11.6M | 94.2M |

5G’s FD and FEC tasks correspond to 75% of the baseband computation. In the absence of QA-based solution methods for the remaining 25% of tasks, we apply linear scaling to obtain approximate qubit-requirement estimates. Table 6 reports the number of qubits the QA requires as a function of the problem run time ($t$), showing that with $t$ of {83, 140, 230, 380, 3,080} µs, the QA requires {2.53, 4.27, 7.0, 11.6, 94.2} million qubits respectively to satisfy 5G’s baseband demand. Hence, QA must meet these qubit-and-run-time combinations to achieve spectral efficiency equal to silicon processing in 5G wireless networks. While we demonstrate an example scenario with 400 MHz BW, 64 antennas, 64-QAM modulation, and a 0.5 coding rate, a similar methodology can be applied to estimate network-specific qubit requirements.

Fig. 4(a) reports the power consumption results of a 5G BS where QA is used for the BBU’s baseband processing. In comparison to silicon (Fig. 3(b)), QA reduces BS power by 41 kW and 188 kW in 64- and 128-antenna systems once QA meets the above qubit–latency requirements. In Fig. 4(b), we report power consumption in a C-RAN setting with five BSs, where the fronthaul is allowed a 100 Gbps bandwidth. This requires 21.3M qubits in the QA. While there is no fundamental limit on the number of qubits allowed in a refrigeration unit, we conservatively consider three QA devices (each drawing 25 kW) to hold these 21.3M qubits. In comparison to silicon processing, QA processing reduces C-RAN power by 232 kW (58% lower).

Table 7 reports the OpEx cost savings and carbon emission reductions associated with the respective power savings, computed by considering an average $0.143 (USD) electricity price and 0.92 pounds of CO₂ emitted per kWh (blscost; carbon). To provide economic benefit over silicon, assuming silicon CapEx is negligible, future QAs’ CapEx must be lower than the respective OpEx savings. For instance, if QA were to be employed in a C-RAN scenario, a CapEx lower than {$290K, $581K, $1.45M, $2.9M} would provide economic benefit over silicon in one, two, five, and ten years, respectively.

Table 8 reports the power consumption of various CMOS technologies in a BS scenario with a 400 MHz bandwidth. The shaded cells represent where QA will benefit over silicon in terms of power. In 128-antenna scenarios and beyond, QA benefits over even future 1.5 nm CMOS, at which Moore’s law is expected to terminate (ca. 2030) (itrs).
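The OpEx and carbon figures follow directly from the power savings; a minimal sketch using the electricity price and emission factor quoted above:

```python
def annual_savings(power_saved_kw, price_per_kwh=0.143,
                   lb_co2_per_kwh=0.92):
    """Yearly OpEx savings ($) and CO2 reduction (kilotonnes) from a
    sustained power reduction; 1 kt = 2.20462e6 lb."""
    kwh = power_saved_kw * 24 * 365
    return kwh * price_per_kwh, kwh * lb_co2_per_kwh / 2.20462e6

# C-RAN scenario: 232 kW saved -> ~$290K and ~0.85 kt CO2 per year,
# consistent with Table 7.
cost, co2_kt = annual_savings(232)
print(f"${cost / 1e3:.0f}K, {co2_kt:.2f} kt CO2")
```

The same function with 5.7 kW or 26.8 kW recovers the BS columns of Table 7 to within rounding.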

| Years | BS ($N$ = 64), Cost ($) | BS ($N$ = 64), CO₂ (kt) | BS ($N$ = 128), Cost ($) | BS ($N$ = 128), CO₂ (kt) | C-RAN, Cost ($) | C-RAN, CO₂ (kt) |
|---|---|---|---|---|---|---|
| 1 | 50K | 0.15 | 235K | 0.68 | 290K | 0.85 |
| 2 | 100K | 0.30 | 471K | 1.37 | 581K | 1.70 |
| 5 | 250K | 0.75 | 1.17M | 3.43 | 1.45M | 4.25 |
| 10 | 500K | 1.50 | 2.35M | 6.87 | 2.90M | 8.50 |

Future QAs must also be able to handle the increased chip area and control-line count. A tile of eight qubits takes a 335×335 µm² chip area (bunyk-architectural), which upon scaling to 10M qubits will take a 374×374 mm² area. If $N_{\text{DAC}}$ is the number of Φ-DACs (§3), then the QA requires $3\sqrt[3]{N_{\text{DAC}}}$ control lines via the status-quo “cubic-XYZ” addressing scheme (bunyk-architectural). A 10M-qubit device with 135M Φ-DACs (see Table 2) will therefore require a total of about 1,540 control lines (i.e., addressing, triggering, and power lines).
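The cubic-XYZ line count reduces to a cube root; a one-function sketch:

```python
def control_lines(n_dacs):
    """Cubic-XYZ addressing (bunyk-architectural): each Phi-DAC is
    selected by one line from each of three orthogonal groups, so
    ~3 * cube_root(n_dacs) lines suffice to address them all."""
    return 3 * round(n_dacs ** (1 / 3))

# 135M Phi-DACs (Table 2) -> ~1,540 lines, as stated above.
print(control_lines(135_000_000))
```

The sub-linear growth is the point: a 1,350× increase in Φ-DACs over today’s devices costs only about an 11× increase in control lines.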

| $N_{\text{ant}}$ \ Power (kW) | 65 nm | 45 nm | 22 nm | 14 nm | 7 nm | 1.5 nm |
|---|---|---|---|---|---|---|
| 32 | 53.4 | 51.8 | 42.7 | 34.7 | 25.8 | 13.0 |
| 64 | 145 | 135 | 111 | 89.8 | 65.1 | 31.1 |
| 128 | 445 | 398 | 326 | 261 | 184 | 82.8 |

## 6. Conclusion

While the conventional assumption that silicon hardware will achieve NextG cellular processing targets may well hold true, this Challenge Paper makes the case for the possible future feasibility and potential power advantage of QA over silicon. Our extensive analysis of current QA technology projects quantitative targets that future QAs may well meet in order to provide benefits over silicon in terms of performance, power, and cost. While we acknowledge that the practical deployment of quantum processors is at least tens of years away, this early study informs future quantum hardware design and RAN architecture evolution.

## Acknowledgements

This research is supported by National Science Foundation (NSF) Award CNS-1824357. We thank Keith Briggs, Catherine McGeoch, and Catherine White for useful discussions. P.A.W. is supported by the Engineering and Physical Sciences Research Council (EPSRC) Hub in Quantum Computing and Simulation, Grant Ref. EP/T001062/1.
