Challenge: A Cost and Power Feasibility Analysis of Quantum Annealing for NextG Cellular Wireless Networks

09/03/2021
by   Srikar Kasi, et al.

In order to meet mobile cellular users' ever-increasing network usage, today's 4G and 5G networks are designed mainly with the goal of maximizing spectral efficiency. While they have made progress in this regard, controlling the carbon footprint and operational costs of such networks remains a long-standing problem among network designers. This Challenge paper takes a long view on this problem, envisioning a NextG scenario where the network leverages quantum annealing computation for cellular baseband processing. We gather and synthesize insights on power consumption, computational throughput and latency, spectral efficiency, operational cost, and deployment timelines surrounding quantum technology. Armed with these data, we analyze and project the quantitative performance targets future quantum hardware must meet in order to provide a computational and power advantage over silicon hardware, while matching its whole-network spectral efficiency. Our quantitative analysis predicts that with quantum hardware operating at a 140 μs problem latency and 4.3M qubits, quantum computation will achieve a spectral efficiency equal to silicon while reducing power consumption by 40.8 kW (45% lower) in a representative 5G base station scenario with 400 MHz bandwidth and 64 antennas, and an 8 kW power reduction (16% lower) in a 200 MHz-bandwidth 5G scenario.

1. Introduction

Today's 4G and 5G Cellular Radio Access Networks (RANs) are experiencing unprecedented growth in traffic at base stations (BSs) due to increased subscriber numbers and their higher quality of service requirements (cisco; nokia). To meet the resulting demand, techniques such as Massive Multiple-Input Multiple-Output (MIMO) communication, cell densification, and millimeter-wave communication are expected to be deployed in fifth-generation (5G) cellular standards (3gppintro). But this in turn significantly increases the power and cost required to operate RAN sites backed by silicon-based computation. While research and industry efforts have provided general solutions (e.g., sleep mode (lahdekorpi2017energy) and network planning (wu2015energy)) to increase energy efficiency and decrease power consumption of RANs, the fundamental challenge of power requirements scaling with the exponentially increasing computational requirements of the RAN persists. Previously (ca. 2010), this problem had not limited innovation, due to the rapid pace of advances in silicon's computational efficiency. Today, however, such developments are no longer maintaining the pace of past years, due to transistors approaching atomic limits (courtland2016transistors) and the end of Moore's Law (expected ca. 2025–2030 (khan2018science; shalf2020future; itrs)). This calls into question the prospects of silicon to achieve NextG cellular targets in terms of both energy and spectral efficiency.

Figure 1. Our envisioned deployment scenario of Quantum and Silicon Processing Units (QPUs and SPUs) in a centralized RAN datacenter. QPUs undertake heavy baseband computation while SPUs manage the control plane.

This work investigates a radically different processing architecture for RANs, one based on quantum computation, to see if this emerging technology can offer cost and power benefits over silicon computation in wireless networks. We seek to quantitatively analyze whether in the coming years and decades, mobile operators might rationally invest in the RAN's capital expenditure (CapEx) by purchasing quantum hardware of high cost, in a bid to lower its operational expenditure (OpEx) and hence the Total Cost of Ownership (TCO = CapEx + OpEx). The OpEx cost reduction would result from the reduced power consumption of the RAN, due to the higher computational efficiency of quantum processing over silicon processing for certain heavyweight baseband processing tasks. Figure 1 depicts this envisioned scenario, where quantum processing units (QPUs) coexist with silicon processing units (SPUs) at Centralized RAN (C-RAN) datacenters (checko2014cloud). QPUs are used for the RAN's heavy baseband processing, whereas SPUs handle the network's control plane (e.g., resource allocation, communication control interface), transfer systems (e.g., enhanced common public radio interface, mobility management entity), and further lightweight tasks such as pre- and post-processing the QPU-specific computation.

This paper presents the first extensive analysis of power consumption and quantum annealing (QA) architecture to make the case for the future feasibility of quantum-processing-based RANs. While recent successful point solutions that apply QA to a variety of wireless network applications (10.1145/3372224.3419207; 9500557; kim2019leveraging; wang2016quantum; cao2019node; ishizaki2019computational; wang2019simulated; chancellor2016direct; lackey2018belief; bian2014discrete; cui2021quantum) serve as our motivation, previous work stops short of a macroscopic power and cost comparison between QA and silicon. Despite the benefits of QA demonstrated by these prior works in their respective point settings, an account of how these results factor into the overall computational performance and power requirements of the base station and C-RAN remains missing. Therefore, here we investigate these issues head-on, to make an end-to-end case that QA will likely offer future benefits over silicon for the processing associated with wireless networks.

In order to realize such an architecture, several key system performance metrics need to be analyzed, quantified, and evaluated, most notably the power consumption, computational throughput, spectral efficiency, operational cost, and feasibility of large-scale quantum hardware. We discuss these issues in later sections of the work (§3, §5). Our approach is to first describe the factors that influence processing latency and throughput on current QA devices and then, by assessing recent developments in the area, project what future QA devices will be capable of under the same metrics (§3). We next analyze cost by evaluating the power consumption of QA (§5). Our analysis reveals that a three-way interplay between latency, power consumption, and the number of qubits (quantum bits) in the QA hardware determines whether QA technology can offer a benefit over silicon hardware. In particular, latency influences spectral efficiency, power consumption influences energy efficiency, and the number of qubits influences both. Based on these insights, we determine the technology points (i.e., latency, power consumption, and number of qubits) that QA hardware must meet in order to provide an advantage over silicon in terms of energy and spectral efficiency.

Table 1 summarizes our results, showing that for 200 and 400 MHz bandwidths, with 2.13M and 4.26M qubits respectively, we predict that QA processing will achieve spectral efficiency equal to today's 14 nm CMOS silicon processing, while reducing power consumption by 7.9 kW (16% lower) and 40.9 kW (45% lower) in representative 5G base station scenarios. In a C-RAN scenario with five base stations of 200 and 400 MHz bandwidths, QA processing with 10.6M and 21.3M qubits respectively attains equal spectral efficiency to silicon, while reducing the power consumption by 100 kW (35% lower) and 232 kW (48% lower) respectively. Our further evaluations compare QA against future 1.5 nm CMOS silicon, which is expected to be the silicon technology at the end of Moore's law scaling (ca. 2030 (itrs)). In a BS scenario with 400 MHz bandwidth and 128 antennas, QA will reduce power consumption by 30.4 kW (37% lower) in comparison to 1.5 nm CMOS silicon hardware, while achieving equal spectral efficiency to silicon with 8.5M-qubit QA hardware.

Overall, our quantitative results predict that QA hardware will offer benefits over silicon hardware in certain wireless network scenarios, once the technology matures to hold at least 3–22 million qubits, while improving problem processing time to hundreds of microseconds (§3). Scaling QA processors to millions of qubits will pose challenges in the engineering, control, and operation of hardware resources, which designers continue to investigate (boothby2021architectural; bunyk-architectural). Nevertheless, recent research demonstrates large-scale qubit control techniques, showing that million-qubit-scale quantum hardware is already a realistic prospect (vahapoglu2021single).

| Bandwidth (MHz) | Qubits, BS | Qubits, C-RAN | BS power, CMOS (kW) | BS power, QA (kW) | C-RAN power, CMOS (MW) | C-RAN power, QA (MW) |
|---|---|---|---|---|---|---|
| 50 | 530K | 2.65M | 19.3 | 36 | 0.13 | 0.12 |
| 100 | 1.06M | 5.3M | 29.4 | 37.9 | 0.18 | 0.13 |
| 200 | 2.13M | 10.6M | 49.5 | 41.6 | 0.28 | 0.18 |
| 400 | 4.26M | 21.3M | 89.9 | 49 | 0.48 | 0.25 |

Table 1. Summary of qubit requirements of QA hardware to achieve equal spectral efficiency to silicon, and power consumption of silicon and QA, at various bandwidths. Silicon results reflect a 14 nm CMOS process; QA results reflect a 140 μs problem processing latency (cf. §3). System parameters correspond to a base station with 64 antennas, 64-QAM modulation, 0.5 coding rate, and 100% time and frequency domain duty cycles. The C-RAN handles five base stations.
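The savings quoted in the text follow directly from the base-station columns of Table 1. The short Python sketch below recomputes them; the dictionary layout and the function name `qa_saving` are illustrative choices, not part of the original analysis.

```python
# Base-station power figures copied from Table 1: bandwidth (MHz) -> (CMOS kW, QA kW).
BS_POWER_KW = {
    50: (19.3, 36.0),
    100: (29.4, 37.9),
    200: (49.5, 41.6),
    400: (89.9, 49.0),
}


def qa_saving(cmos_kw, qa_kw):
    """Return (absolute saving in kW, relative saving in percent).

    Negative values mean the QA-based base station draws more power than silicon.
    """
    saving = cmos_kw - qa_kw
    return saving, 100.0 * saving / cmos_kw


for bandwidth, (cmos, qa) in BS_POWER_KW.items():
    saving, percent = qa_saving(cmos, qa)
    print(f"{bandwidth} MHz: {saving:+.1f} kW ({percent:+.0f}%)")
```

With these figures, QA draws more power than silicon at 50 and 100 MHz and only becomes advantageous at 200 MHz and above, because the roughly constant refrigeration power is amortized over a larger baseband workload.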

2. Background and Assumptions

Figure 2. Timing diagram of a quantum annealer device. Machine access overheads not relevant to our proposed use case are omitted. Post-processing runs on integrated silicon, in parallel with the annealer computation (tech).

While classical computation uses bits to process information, quantum computation uses qubits, physical devices that allow superposition of bits simultaneously (tech). The current technology landscape consists broadly of fault-tolerant approaches to quantum computing versus noisy intermediate scale quantum (NISQ) implementations. Fault-tolerant quantum computing (shor1996fault) is an ideal scenario that is still far off in the future, whereas NISQ computing (preskill2018quantum), which is available today, suffers high machine noise levels, but gives us an insight into what future fault-tolerant methods will be capable of in terms of key quantum effects such as qubit entanglement and tunneling (shor1996fault). NISQ processors can be further classified into digital gate model or analog annealing (QA) architectures.

Gate-model devices (knill2005quantum) are fully general-purpose computers, using programmable logic gates acting on qubits (wichert2020principles), whereas annealing-model devices (tech), inspired by the Adiabatic Theorem of quantum mechanics, offer a means to search an optimization problem for its lowest ground state energy configurations in a high-dimensional energy landscape (probsolving). While gate-model quantum devices of a size relevant to practical applications are not yet generally available (ibm), today's QA devices with about 5,000 qubits enable us to commence empirical studies at realistic scales (tech).

In this study, we make the following two assumptions:

Assumption: Ising Model formulation. To enable QA computation, we assume that the cellular baseband's heavy processing tasks are formulated as Ising model problems. Recent prior work in this area has formulated frequency domain detection, forward error correction, and precoding problems into Ising models (kim2019leveraging; 10.1145/3372224.3419207; 9500557; bian2014discrete; cui2021quantum). We assume that further baseband tasks such as pre-distortion weight selection and up/down sampling and filtering also admit Ising model formulations, because their computation is digital and decision-seeking (aschbacher2005digital; mattingley2010real), and because penalty methods are available to reduce higher-order optimization problems into Ising models (boros2002pseudo; dattani2019quadratization).

Assumption: Bespoke QA hardware. Qubit connectivity significantly impacts performance: sparse connectivity negatively affects dense problem graphs due to problem mapping difficulties (10.1145/3372224.3419207). Recent advances in QA have bolstered qubit connectivity (topologies), and further improvement efforts continue (katzgraber2018small; lechner2015quantum). We therefore assume that these efforts will allow QA hardware tailored to the problem of interest.

Roadmap.

In the remainder of this paper, Section 3 analyzes QA hardware architecture and its end-to-end processing latency, and Section 4 describes power modeling in RANs and cellular computational targets. We will then be in a position to present our silicon versus quantum power comparison methodology and then discuss our results in Section 5.

3. Quantum Processing Performance

To characterize current and future QA performance, this section analyzes processing time on QA devices. A client sends a quantum machine instruction (QMI) that characterizes an input problem computation to a QA QPU, and the QPU responds with solution data. Fig. 2 depicts the entire latency a QMI experiences from entering the QPU to the readout of the solution, which consists of programming (§3.1), sampling (§3.2), and post-processing (§3.3) times.

3.1. Programming

As the QMI reaches the QPU, the QPU programs the QMI's input problem coefficients: room temperature electronics send raw signals into the QA refrigeration unit to program the on-chip flux digital-to-analog converters (Φ-DACs). The Φ-DACs then apply external magnetic fields and magnetic couplings locally to the qubits and couplers respectively. This process is called a programming cycle, and in current technology it typically takes 4–40 μs (timing), dictated by the bandwidth of control lines and the Φ-DAC addressing scheme (boothby2021architectural). During the programming cycle, the QPU dissipates an amount of heat that increases the effective temperature of the qubits. This is due to the movement of flux quanta in the inductive storage loops of Φ-DACs (QA devices store coefficient information in the form of magnetic flux quanta, which is transferred via single flux quantum (SFQ) voltage pulses (bunyk-architectural)). Thus, a post-programming thermalization time is required to cool the QPU, ensure proper reset/initialization of qubits, and allow the QPU to maintain a thermal equilibrium with the refrigeration unit (15 mK). QA clients can specify thermalization times in the range 0–10 ms with microsecond-level granularity. The default value is a conservative one millisecond (tech).

QMI coefficients are programmed by using six Φ-DACs per qubit and one Φ-DAC per coupler, and the supported bit precision is currently up to five bits (four for value, one for sign) (bunyk-architectural). Each Φ-DAC consists of two inductor storage loops with a pair of Josephson junctions each. The energy dissipated on chip is on the order of $I_c \Phi_0$ per single flux quantum (SFQ) moved in an inductor storage loop, where $I_c$ is the Φ-DAC's junction critical current and $\Phi_0 = h/2e$ is the magnetic flux quantum (h is Planck's constant and e is the electron charge). For the worst-case reprogramming scenario, this corresponds to 32 SFQs (from −16 to +16) moving into (or out of) all inductor storage loops of each Φ-DAC (bunyk-architectural). Table 2 reports on-chip energy dissipation values for various QPU sizes and Φ-DAC critical currents, showing that programming a large-scale device with 10M qubits and 75M couplers (i.e., 15 per qubit (topologies)) will dissipate only 36 pJ on chip at $I_c$ = 1 μA. With a 1 μW power budget available at the 15 mK QPU stage, this accounts for 36 μs of QPU thermalization time.

| Qubits | Couplers | Φ-DACs | Energy dissipated, $I_c$ = 55 μA (bunyk-architectural) | Energy dissipated, $I_c$ = 1 μA (mcdermott2018quantum) |
|---|---|---|---|---|
| 512 | 1,472 | 4,544 | 66 fJ | 1 fJ |
| 2,048 | 6,016 | 18,304 | 266 fJ | 5 fJ |
| 5,436 | 37,440 | 70,056 | 1 pJ | 18 fJ |
| 10 M | 75 M | 135 M | 2 nJ | 36 pJ |

Table 2. On-chip energy dissipation values for various choices of QPU sizes and Φ-DAC critical currents.
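The entries of Table 2 can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions: the DAC count uses the six-per-qubit, one-per-coupler rule from the text, and the energy is taken as one critical-current-times-flux-quantum unit per SFQ, per storage loop, per junction. The factor of two for the junction pair is an assumption chosen here because it reproduces the tabulated values; the function and parameter names are hypothetical.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum h/(2e), in webers


def phi_dac_count(n_qubits, n_couplers):
    """Six flux DACs per qubit plus one per coupler, as stated in the text."""
    return 6 * n_qubits + n_couplers


def programming_energy_joules(n_qubits, n_couplers, critical_current_amps,
                              sfq_per_loop=32, loops_per_dac=2,
                              junctions_per_loop=2):
    """Worst-case on-chip energy for one programming cycle.

    ASSUMPTION: each SFQ moved costs (critical current x flux quantum) per
    junction; junctions_per_loop=2 is an illustrative choice that matches Table 2.
    """
    events = (phi_dac_count(n_qubits, n_couplers)
              * sfq_per_loop * loops_per_dac * junctions_per_loop)
    return events * critical_current_amps * PHI_0


def thermalization_time_seconds(energy_joules, cooling_power_watts=1e-6):
    """Time needed to remove the dissipated energy at a given cooling power."""
    return energy_joules / cooling_power_watts


energy = programming_energy_joules(10_000_000, 75_000_000, 1e-6)
print(f"{energy * 1e12:.1f} pJ, "
      f"{thermalization_time_seconds(energy) * 1e6:.1f} us")
```

For the 10M-qubit device at a 1 μA critical current this gives roughly 36 pJ, and therefore roughly 36 μs of thermalization at a 1 μW cooling budget, in line with the text.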

The next step resets/initializes the qubits, during which each qubit transitions from a higher energy state to an intended ground state, generating spontaneous photon emissions that heat the QPU. Reed et al. (reed2010fast) demonstrate the suppression of these emissions using Purcell filters, requiring 80 ns (120 ns) for 99% (99.9%) fidelity.

A QA device with $N_Q$ qubits, $N_C$ couplers, and five-bit precision needs to program a worst-case amount of data, which is 27 Kbytes for the current QA ($N_Q$ = 5,436, $N_C$ = 37,440) and 100 Mbytes for a large-scale QA ($N_Q$ = 10M, $N_C$ = 75M). Thus, to maintain today's microsecond-level programming cycle time in future large-scale QA, the bandwidth of the programming control lines must increase in proportion to this data volume (i.e., GHz-bandwidth lines are needed). With Purcell filter design integration and a sufficient amount of control line bandwidth, the overall programming time could therefore reach 40–80 μs in a 10M-qubit large-scale QA device.

3.2. Sampling

The process of executing a QMI on a QA device is called sampling, and the time taken for sampling is called the sampling time. The sampling time is classified into three sub-components: the anneal, readout, and readout delay times. A single QMI consists of multiple samples of an input problem, with each sample annealed and read out once, followed by a readout delay (see Fig. 2). Sampling a QMI begins after the QPU programming process.

3.2.1. Anneal.

In this time interval, the QPU implements a QA algorithm (tech) to solve the input problem, where low-frequency annealing lines control the annealing algorithm's schedule. The bandwidth of these control lines hence limits the minimum annealing time, which is one microsecond today. Weber et al. (lincolnlab) propose the use of flexible print cables with a moderate bandwidth (100 MHz) and high isolation (50 dB) for annealing, which can potentially decrease annealing times to tens of nanoseconds. Faster annealing times (< 40 ns) and/or qubits with longer coherence lifetimes also enable coherent quantum annealing regimes, which may benefit the QA fidelity (fastanneal; yan2016flux).

3.2.2. Readout

After annealing, the spin configuration of qubits (i.e., the solution) is read out by measuring the direction of the qubits' persistent current. This readout information propagates from the qubits to readout detectors located at the perimeter of the QPU chip via flux bias lines. Each flux bias line is a chain of electrical circuits called Quantum Flux Parametrons (QFPs), which detect and amplify the qubits' persistent current to improve the readout signal-to-noise ratio. These QFP chains act like shift registers, propagating the information from qubits to detectors (doi:10.1063/1.4939161). In current QA devices with $N_Q$ qubits, there are $\sqrt{N_Q/2}$ flux bias lines, with each flux bias line responsible for reading out $\sqrt{2N_Q}$ qubits. Further, each flux bias line reads out one qubit at a time (i.e., time-division readout), thus a total of $\sqrt{N_Q/2}$ qubits are read out in parallel. Hence, the readout time depends on the qubits' physical locations, the bandwidth of flux bias lines, and the signal integration time. With current technology, the readout time is 25–150 μs per sample (tech). Nevertheless, recent research demonstrates promising fast readout techniques, which we describe next.

Chen et al. (doi:10.1063/1.4764940) and Heinsoo et al. (PhysRevApplied.10.034040) describe frequency-multiplex readout schemes that enable simultaneous readout of multiple qubits within a flux bias line. While there is no fundamental limit on the number of qubits read out simultaneously, a physical limit is imposed by the line width of the qubits' readout microresonators and the 4–8 GHz operating band (6 GHz center frequency, 4 GHz bandwidth) of commercial microwave transmission line components used in the readout architecture (doi:10.1063/1.4939161). Microresonators with quality factor $Q$ can capture line widths up to $6/Q$ GHz, thus enabling up to $4Q/6$ qubits to be read out simultaneously. Table 3 reports these results, showing that a $Q$ of $10^6$ will enable up to 666K-qubit-parallel readout. This analysis assumes that each microresonator can be fabricated at exactly its design frequency, which is currently not the case. Further developments in understanding the RF properties of microresonators will therefore be needed to achieve this multiplexing performance.

| Qubits | Read out in parallel: time-division | Frequency-multiplex, $Q = 10^3$ (doi:10.1063/1.4939161) | Frequency-multiplex, $Q = 10^6$ (dorche2020high) |
|---|---|---|---|
| 512 | 16 | 512 | 512 |
| 2,048 | 32 | 666 | 2,048 |
| 5,436 | 52 | 666 | 5,436 |
| 10 M | 2,200 | 666 | 666K |

Table 3. Number of qubits read out in parallel by time-division (status quo) and frequency-multiplex (projected) readout schemes at various choices of QPU sizes and readout microresonator quality factors ($Q$).
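The readout parallelism figures of Table 3 can be sketched in Python as follows. Two points here are assumptions rather than statements from the source: the time-division count is modeled as the square root of half the qubit count, which is inferred from (and matches) the time-division column, and the function names are hypothetical.

```python
import math


def time_division_parallel(n_qubits):
    """Qubits read out in parallel when each flux bias line reads one qubit at a time.

    ASSUMPTION: the number of flux bias lines is sqrt(n_qubits / 2), inferred
    from the time-division column of Table 3.
    """
    return math.sqrt(n_qubits / 2)


def frequency_multiplex_parallel(n_qubits, quality_factor,
                                 band_ghz=4.0, center_ghz=6.0):
    """Qubits read out in parallel with frequency multiplexing.

    Each resonator occupies a line width of center_ghz / quality_factor, so the
    band holds band_ghz * quality_factor / center_ghz channels; the device
    cannot read out more qubits than it has.
    """
    channels = math.floor(band_ghz * quality_factor / center_ghz)
    return min(n_qubits, channels)


for n in (512, 2_048, 5_436, 10_000_000):
    print(n,
          round(time_division_parallel(n)),
          frequency_multiplex_parallel(n, 10**3),
          frequency_multiplex_parallel(n, 10**6))
```

For the 10M-qubit case the time-division value evaluates to about 2,236, which the table rounds to 2,200.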

Recent work by Grover et al. (grover2020fast) shows the application of QFPs as isolators, achieving a readout fidelity of 98.6% (99.6%) in only 80 ns (1 μs). Walter et al. (PhysRevApplied.7.054020) describe a single-shot readout scheme requiring only 48 ns (88 ns) to achieve a 98.25% (99.2%) readout fidelity. Their designs are also compatible with multiplexed architectures and earlier readout schemes, implying that with design integration, readout times could reach the order of microseconds per sample.

3.2.3. Readout delay

After a sample's anneal-readout process, a readout delay is added (see Fig. 2). In this time interval, qubits are reset for the next sample's anneal. QA clients can specify readout delay times in the range 0–10 ms, and the default value is a conservative one millisecond. Nevertheless, about one microsecond is sufficient for high-fidelity qubit reset (§3.1) (reed2010fast).

3.3. Postprocessing

This time interval is used for post-processing the solutions returned by QA to improve the solution quality (post). Multiple samples' solutions are post-processed at once, in parallel with the current QMI's annealer computation, whereas the final batch of post-processing occurs in parallel with the programming of the next QMI (see Fig. 2). Thus, the post-processing time does not factor into the overall processing time (timing).

In summary, the projected overall programming time is 40–80 μs (programming: 4–40 μs, thermalization and reset: 36–40 μs), anneal time is one μs/sample, readout time is one μs/sample, and readout delay time is one μs/sample. For a target sample count $N_s$, the total projected QMI run time is therefore $80 + 3N_s$ μs, taking the upper end of the programming-time range.
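The projected latency budget summarized above can be written as a small Python helper. This is a sketch that assumes the upper end (80 μs) of the projected programming time, which is the choice consistent with the run times listed later in Table 6; the function name and keyword arguments are illustrative.

```python
def qmi_run_time_us(n_samples, programming_us=80.0, anneal_us=1.0,
                    readout_us=1.0, readout_delay_us=1.0):
    """Projected end-to-end QMI run time in microseconds.

    One programming cycle is followed by n_samples repetitions of
    anneal + readout + readout delay. Post-processing overlaps with the
    annealer and is therefore not counted.
    """
    per_sample_us = anneal_us + readout_us + readout_delay_us
    return programming_us + n_samples * per_sample_us


for n_samples in (1, 20, 50, 100, 1000):
    print(n_samples, qmi_run_time_us(n_samples))
```

With 20 samples per QMI this yields the 140 μs problem latency used throughout the analysis.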

4. RAN Power Models and Cellular Targets

This section describes power modeling in RANs and cellular computational targets (4G and 5G).

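| BBU task | Reference, 1 antenna | 4G (BW = 20 MHz), 2 antennas | 4G, 4 antennas | 4G, 8 antennas | 5G (BW = 200 MHz), 32 antennas | 5G (200 MHz), 64 antennas | 5G (200 MHz), 128 antennas | 5G (BW = 400 MHz), 32 antennas | 5G (400 MHz), 64 antennas | 5G (400 MHz), 128 antennas |
|---|---|---|---|---|---|---|---|---|---|---|
| DPD | 0.160 | 0.320 | 0.640 | 1.280 | 51.20 | 102.4 | 204.8 | 102.4 | 204.8 | 409.6 |
| Filter | 0.400 | 0.800 | 1.600 | 3.200 | 128.0 | 256.0 | 512.0 | 256.0 | 512.0 | 1,024 |
| FFT | 0.160 | 0.320 | 0.640 | 1.280 | 51.20 | 102.4 | 204.8 | 102.4 | 204.8 | 409.6 |
| FD (linear) | 0.090 | 0.180 | 0.360 | 0.720 | 28.80 | 57.60 | 115.2 | 57.60 | 115.2 | 230.4 |
| FD (non-linear) | 0.030 | 0.120 | 0.480 | 1.920 | 307.2 | 1,228.8 | 4,915.2 | 614.4 | 2,457.6 | 9,830.4 |
| FEC | 0.140 | 0.140 | 0.280 | 0.560 | 22.40 | 44.80 | 89.60 | 44.80 | 89.60 | 179.2 |
| CPRI | 0.720 | 0.720 | 1.440 | 2.880 | 115.2 | 230.4 | 460.8 | 230.4 | 460.8 | 921.6 |
| PCP | 0.400 | 0.800 | 1.600 | 3.200 | 12.80 | 25.60 | 51.20 | 12.80 | 25.60 | 51.20 |
| Total | 2.100 | 3.400 | 7.040 | 15.04 | 716.8 | 2,048 | 6,553.6 | 1,420.8 | 4,070.4 | 13,056 |

Table 4. 4G and 5G cellular BBU computational targets in macro base stations operating at 64-QAM modulation and 0.5 coding rate (the Reference column uses a coding rate of 1; see §4.2). Time and frequency domain duty cycles are at 100%. Values are in Tera Operations per Second (TOPS).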

4.1. Power Modeling

RAN power models account for power by splitting the BS or C-RAN functionality into the components and sub-components shown in Figs. 1 and 3. This section details these components and their associated power models. We follow the developments by Desset et al. (desset2012flexible) and Ge et al. (ge2017energy).

Figure 3. A typical macrocell BS architecture.

4.1.1. RAN Base Station

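A RAN BS (see Fig. 3) comprises a baseband unit (BBU), a radio unit (RU), power amplifiers (PAs), antennas, and a power system (PS). The entire BS power consumption ($P_{\mathrm{BS}}$) is then modeled as:

$$P_{\mathrm{BS}} = \frac{P_{\mathrm{BBU}} + P_{\mathrm{RU}} + P_{\mathrm{PA}}}{(1-\sigma_{\mathrm{A/C}})(1-\sigma_{\mathrm{MS}})(1-\sigma_{\mathrm{DC}})} \tag{1}$$

where $P_{\mathrm{BBU}}$, $P_{\mathrm{RU}}$, and $P_{\mathrm{PA}}$ are the power consumption of the respective BS components, and $\sigma_{\mathrm{A/C}}$, $\sigma_{\mathrm{MS}}$, and $\sigma_{\mathrm{DC}}$ correspond to fractional losses of Active Cooling (A/C), Mains Supply (MS), and DC–DC conversions of the power system respectively (ge2017energy).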

The BBU performs the processing associated with digital baseband (BB), and control and transfer systems. The baseband includes computational tasks such as digital pre-distortion (DPD), up/down sampling or filtering, OFDM-FFT processing, frequency domain (FD) mapping/demapping and equalization, and forward error correction (FEC). The control system undertakes the platform control processing (PCP), and the transfer system processes the eCPRI transport layer. The total BBU power consumption ($P_{\mathrm{BBU}}$) is then (desset2012flexible):

$$P_{\mathrm{BBU}} = \sum_{i} P_i + P_{\mathrm{leak}} \tag{2}$$

where $P_i$ is the power consumption of computational task $i$ (DPD, Filter, FFT, linear FD, non-linear FD, FEC, CPRI, and PCP), and $P_{\mathrm{leak}}$ is the leakage power that results from the hardware employed in processing these baseband tasks. FD processing is split into two parts, with linear and non-linear scaling over the number of antennas (desset2012flexible). The RU performs analog RF signal processing, consisting of clock generation, low-noise and variable gain amplification, IQ modulation, mixing, buffering, pre-driving, and analog–digital conversions. RU power consumption ($P_{\mathrm{RU}}$) scales linearly with the number of transceiver chains, and each chain consumes about 10.8 W (desset2012flexible). For macro-cell BSs, each PA (including antenna feeder) is typically configured at 102.6 W power consumption (ge2017energy).

4.1.2. C-RAN

In the C-RAN architecture, BS processing functionality is amortized and shared, where Remote Radio Heads (RRHs) perform analog RF signal processing and a BBU-pool performs digital baseband computation (of many BSs) at a centralized datacenter (see Fig. 1). Fronthaul (FH) links connect RRHs with the centralized BBU-pool. To relax the FH latency and bandwidth requirements, a part of baseband computation is performed at RRH sites. Several such split models have been proposed (8479363; 3gpp). We consider a split where RRHs perform low Layer 1 baseband processing, such as cyclic prefix removal and FFT-specific computation. The power consumption of C-RAN ($P_{\mathrm{CRAN}}$) is then:

$$P_{\mathrm{CRAN}} = P_{\mathrm{BBU\text{-}pool}} + \sum_{n=1}^{N} \left( P_{\mathrm{RRH}}^{(n)} + P_{\mathrm{FH}}^{(n)} \right) \tag{3}$$

where $P_{\mathrm{BBU\text{-}pool}}$, $P_{\mathrm{RRH}}^{(n)}$, and $P_{\mathrm{FH}}^{(n)}$ are the power consumption of the respective C-RAN components and N is the number of RRHs. Fronthaul power consumption depends on the technology, and for fiber-based ethernet or passive optical networks, it can be modeled by assuming a set of parallel communication channels as (7437385; alimi2019energy):

$$P_{\mathrm{FH}} = \rho \, \frac{T_{\mathrm{FH}}}{C_{\mathrm{FH}}} \tag{4}$$

where $\rho$ is a constant scaling factor, and $T_{\mathrm{FH}}$ and $C_{\mathrm{FH}}$ represent the traffic load and the capacity of the fronthaul link respectively. For a link capacity of 500 Mbps, $\rho$ is typically ca. 37 Watts (liu2018designing). Power consumption results according to the models herein are discussed in §5.

4.2. Cellular Processing Requirements

Figure 4. Power consumption of silicon 14 nm CMOS processing in 4G and 5G base stations: (a) a 4G scenario with 20 MHz bandwidth; (b) a representative 5G scenario with 400 MHz bandwidth. BBU bar plots are shown with its sub-components (see legend, §4.1.1) in ascending order of power consumption from bottom to top. The percentages (rounded to nearest integer) show the power contribution of that particular BS component (labeled on X-axis) to the total BS power. The BS power at $N_{\mathrm{ant}}$ = {2, 4, 8, 32, 64, 128} antennas is {0.35, 0.71, 1.43, 34.7, 89.9, 261.3} kW, in their respective scenarios.

This section describes the cellular computational targets, estimated in Tera Operations per Second (TOPS), that the BBU needs to process. These targets depend on parameters such as the bandwidth (BW), modulation (M), coding rate (R), number of antennas ($N_{\mathrm{ant}}$), and time (dt) and frequency (df) domain duty cycles. Prior work (desset2012flexible) presents these TOPS complexity values for individual BBU tasks in a reference scenario (BW = 20 MHz, M = 6, R = 1, $N_{\mathrm{ant}}$ = 1, dt = df = 100%), which we replicate in Table 4 as Reference. These values scale as (desset2012flexible):

$$\mathrm{TOPS}_i = \mathrm{TOPS}_i^{\mathrm{ref}} \prod_{k=1}^{6} \left( \frac{X_k^{\mathrm{act}}}{X_k^{\mathrm{ref}}} \right)^{s_k} \tag{5}$$

where $X_k \in$ {BW, M, R, $N_{\mathrm{ant}}$, dt, df} for $k \in [1,6]$ respectively, and the superscripts act and ref denote the actual and reference scenarios. The scaling exponents {$s_1$, $s_2$, $s_3$, $s_4$, $s_5$, $s_6$} are {1,0,0,1,1,0} for DPD, Filter, and FFT, {1,0,0,1,1,1} for linear FD, {1,0,0,2,1,1} for non-linear FD, {1,1,1,1,1,1} for CPRI and FEC, and {0,0,0,1,0,0} for PCP (desset2012flexible). The authors determine these exponents based on the dependence of each BBU operation on the corresponding parameters. Table 4 reports the TOPS complexity values for representative 4G and 5G scenarios.
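The scaling rule can be checked against Table 4 with a short Python sketch. The reference values and exponents are taken from the text and the Reference column of Table 4; the task keys (`FD_lin`, `FD_nl`), function names, and argument names are illustrative choices.

```python
# Reference-scenario complexity in TOPS (the "Reference" column of Table 4).
REFERENCE_TOPS = {
    "DPD": 0.160, "Filter": 0.400, "FFT": 0.160, "FD_lin": 0.090,
    "FD_nl": 0.030, "FEC": 0.140, "CPRI": 0.720, "PCP": 0.400,
}

# Scaling exponents in the order (BW, M, R, antennas, dt, df).
EXPONENTS = {
    "DPD": (1, 0, 0, 1, 1, 0),
    "Filter": (1, 0, 0, 1, 1, 0),
    "FFT": (1, 0, 0, 1, 1, 0),
    "FD_lin": (1, 0, 0, 1, 1, 1),
    "FD_nl": (1, 0, 0, 2, 1, 1),
    "FEC": (1, 1, 1, 1, 1, 1),
    "CPRI": (1, 1, 1, 1, 1, 1),
    "PCP": (0, 0, 0, 1, 0, 0),
}

# Reference scenario: 20 MHz, 6 bits/symbol (64-QAM), rate 1, 1 antenna, full duty cycles.
REFERENCE_POINT = (20.0, 6.0, 1.0, 1.0, 1.0, 1.0)


def scaled_tops(task, bw_mhz, mod_bits, rate, antennas, dt=1.0, df=1.0):
    """Scale one task's reference TOPS to the given scenario."""
    actual = (bw_mhz, mod_bits, rate, antennas, dt, df)
    tops = REFERENCE_TOPS[task]
    for x_act, x_ref, exponent in zip(actual, REFERENCE_POINT, EXPONENTS[task]):
        tops *= (x_act / x_ref) ** exponent
    return tops


def total_tops(**scenario):
    """Sum the scaled TOPS over all BBU tasks."""
    return sum(scaled_tops(task, **scenario) for task in REFERENCE_TOPS)


scenario_5g = dict(bw_mhz=400, mod_bits=6, rate=0.5, antennas=64)
print(scaled_tops("FD_nl", **scenario_5g), total_tops(**scenario_5g))
```

For the 400 MHz, 64-antenna scenario this reproduces the non-linear FD entry (2,457.6 TOPS) and the column total (4,070.4 TOPS) of Table 4.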

5. Power and Cost Comparison

Our methodology compares silicon and quantum processing at equal spectral efficiency outcomes. We specify the same BBU targets (Table 4) for silicon and quantum hardware, ensuring equal bits processed per second per Hz per km².

Power consumption of silicon CMOS hardware depends on its performance-per-watt efficiency and the amount of computation at hand. Technology scaling improves this efficiency from generation to generation, inversely proportional to the square of its transistors' core supply voltage ($V_{dd}$) (stillmaker2017scaling). Table 5 shows typical $V_{dd}$ values of various CMOS devices.

| Process | 65 nm | 45 nm | 22 nm | 14 nm | 7 nm | 1.5 nm |
|---|---|---|---|---|---|---|
| $V_{dd}$ | 1.1 V | 1.0 V | 0.9 V | 0.8 V | 0.65 V | 0.4 V |

Table 5. High-performance supply voltages of CMOS devices (itrs; itrs_old). The 1.5 nm process is expected to be the silicon technology at the end of Moore's law (ca. 2030).

A 65 nm CMOS device has a 0.04 TOPS/Watt efficiency (desset2012flexible), from which we compute the efficiency of today's 14 nm CMOS via supply-voltage scaling, obtaining 0.076 TOPS/Watt (i.e., $0.04 \times (1.1/0.8)^2 \approx 0.076$). Using this hardware efficiency and the TOPS requirements of Table 4, we compute silicon hardware power consumption. Additional power results from leakage currents in the silicon transistor channel, and this leakage power is set to 30% of dynamic power (desset2012flexible).
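The efficiency scaling and the resulting silicon BBU power can be sketched in Python. The supply voltages come from Table 5 and the 65 nm reference efficiency and 30% leakage fraction from the text; the function names are hypothetical, and the result covers the BBU only, without the radio unit, power amplifiers, or power-system losses of §4.1.

```python
# Supply voltages in volts, per CMOS process (Table 5).
V_DD = {"65nm": 1.1, "45nm": 1.0, "22nm": 0.9, "14nm": 0.8, "7nm": 0.65, "1.5nm": 0.4}

REFERENCE_PROCESS = "65nm"
REFERENCE_EFFICIENCY_TOPS_PER_W = 0.04  # 65 nm CMOS efficiency quoted in the text


def efficiency_tops_per_watt(process):
    """Efficiency scales with the inverse square of the supply voltage."""
    ratio = V_DD[REFERENCE_PROCESS] / V_DD[process]
    return REFERENCE_EFFICIENCY_TOPS_PER_W * ratio ** 2


def bbu_power_watts(target_tops, process, leakage_fraction=0.30):
    """Dynamic power for the TOPS target plus leakage as a fraction of dynamic power."""
    dynamic_watts = target_tops / efficiency_tops_per_watt(process)
    return dynamic_watts * (1.0 + leakage_fraction)


print(round(efficiency_tops_per_watt("14nm"), 3))
print(round(bbu_power_watts(4070.4, "14nm") / 1e3, 1), "kW")
```

Under these assumptions, the 4,070.4 TOPS target of the 400 MHz, 64-antenna scenario corresponds to roughly 70 kW of BBU power on 14 nm CMOS.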

Fig. 4 reports power consumption results of 4G and 5G BSs with 14 nm CMOS processing, computed according to the models in §4. In Fig. 4(a), we see that the power amplifier (PA) is the dominating component of 4G BS power consumption, accounting for 57–58% of the total BS power. But as the network scales to the higher bandwidths and antenna counts envisioned in 5G, the BBU becomes the dominant power-consuming component (see Fig. 4(b)), accounting for 69–74% of the total BS power. This quick escalation in power from 0.35–1.43 kW in 4G to 34.7–261.3 kW in 5G is mainly due to the quadratic dependency of FD processing on the number of antennas (see Table 4), and the increased network bandwidth that results from millimeter-wave communication.

Figure 5. (a) Power consumption of a representative 5G BS with 400 MHz bandwidth, where QA is used for the BBU's baseband processing. The BS power at {32, 64, 128} antennas is {37, 49, 73} kW respectively. (b) Power consumption of silicon (484 kW) and QA (252 kW w/ three devices) processing in a C-RAN scenario with five 400 MHz-bandwidth, 64-antenna base stations. In both (a) and (b), the BBU's further computation (i.e., Control and Transfer systems) is processed by 14 nm CMOS silicon. BBU bar plots are shown with its sub-components (see legend, §4.1.1) in increasing order of power from bottom to top. The percentages (rounded to nearest integer) correspond to components labeled on X-axis.

The power consumption of QA hardware is nominally 25 kW, dominated by its refrigeration unit (king-naturecomms2021). However, to keep power at this 25 kW level for 5G baseband processing, the QA hardware must hold a sufficient number of qubits, all under the same refrigeration unit. Hence, we first estimate the number of qubits needed to satisfy 5G’s spectral efficiency demand.

To compute this, we convert 5G’s target TOPS of Table 4 into target problems per second (PPS), then estimate the number of qubits QA requires to achieve this PPS, separately for each baseband computational task. We formulate it as:

$$N_Q \;=\; \sum_i N_Q^{(i)} \;=\; \sum_i \mathrm{PPS}_i \cdot q_i \cdot T_i \qquad (6)$$

where $N_Q$ is the total number of qubits the QA requires for the entire baseband processing, and $N_Q^{(i)}$ is the qubit requirement for baseband task $i$. $\mathrm{PPS}_i$ is the target problems per second, $q_i$ is the number of qubits per problem, and $T_i$ is the run time per problem, of baseband task $i$. We next demonstrate how to compute these values with running examples.

A MIMO detection problem requires on average 80M operations via the Sphere Decoding algorithm (jalden2004maximum), which translates 5G’s target of 2457.6 TOPS (Table 4) into 30.72M problems per second. Solving the same problem using QA requires 384 qubits (kim2019leveraging), and its run time is 140 μs with 20 samples (§3; see Table 6). Substituting these values into Eq. 6 shows that 5G’s FD processing will require 1.65M qubits.

Decoding a rate-half 5G LDPC code of block length 8,448 via the belief propagation algorithm requires 150M operations for a typical 20 iterations (fernandes2010parallel). For the 5G FEC target of 89.6 TOPS (see Table 4), this translates to solving 600K problems per second. QA-based LDPC decoding (10.1145/3372224.3419207) requires 21,132 qubits per such problem, and its run time is 140 μs with 20 samples (§3; see Table 6). This leads to the result that 5G’s FEC processing will require 1.77M qubits.
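The two running examples above can be written out as worked arithmetic. This is a sketch under one stated assumption: the per-problem run time is taken as 140 μs, the value in Table 6 that matches the quoted qubit counts. The FEC problem rate is left unrounded here; the text rounds it to 600K.

```latex
\begin{align*}
\text{FD:}\quad  & \frac{2457.6\times10^{12}\ \text{ops/s}}{80\times10^{6}\ \text{ops/problem}}
                   = 30.72\times10^{6}\ \text{problems/s},\\
                 & 30.72\times10^{6}\times 384\times 140\times10^{-6}\ \text{s}
                   \approx 1.65\times10^{6}\ \text{qubits};\\[4pt]
\text{FEC:}\quad & \frac{89.6\times10^{12}\ \text{ops/s}}{150\times10^{6}\ \text{ops/problem}}
                   \approx 5.97\times10^{5}\ \text{problems/s},\\
                 & 5.97\times10^{5}\times 21{,}132\times 140\times10^{-6}\ \text{s}
                   \approx 1.77\times10^{6}\ \text{qubits}.
\end{align*}
```

One way to read the product: each problem occupies its qubits for the full run time, so sustaining a given problem rate requires (problems per second × run time) problems to be resident on the device at once.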

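| Samples per problem | 1 | 20 | 50 | 100 | 1,000 |
|---|---|---|---|---|---|
| Run time per problem (μs) | 83 | 140 | 230 | 380 | 3,080 |
| FD qubits | 0.98M | 1.65M | 2.7M | 4.5M | 36.3M |
| FEC qubits | 1.05M | 1.77M | 2.91M | 4.81M | 39.1M |
| Total qubits | 2.53M | 4.27M | 7.0M | 11.6M | 94.2M |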
Table 6. QA qubit requirement at various problem run times to achieve spectral efficiency equal to silicon processing, in a 5G BS scenario with 400 MHz BW and 64 antennas.

5G’s FD and FEC tasks correspond to 75% of baseband computation. In the absence of QA-based solution methods for the remaining 25% of baseband tasks, we apply linear scaling to obtain approximate qubit requirement estimates. Table 6 reports the number of qubits the QA requires as a function of problem run time, showing that with run times of {83, 140, 230, 380, 3080} μs, QA requires {2.53, 4.27, 7.0, 11.6, 94.2} million qubits respectively to satisfy 5G’s baseband demand. Hence, QA must meet these combinations of qubit count and run time to achieve spectral efficiency equal to silicon processing in 5G wireless networks. While we demonstrate an example scenario with 400 MHz bandwidth, 64 antennas, 64-QAM modulation, and a 0.5 coding rate, a similar methodology can be applied to estimate network-specific qubit requirements.
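The two per-task rows of Table 6 can be cross-checked with a short script that sweeps the tabulated run times. This is a sketch, not the authors' code: the dictionary layout and function name are hypothetical, the inputs are the figures quoted in the text, and the linear scaling applied to the remaining baseband tasks is not reproduced because its exact factor is not stated here.

```python
# Sketch: recompute the FD and FEC qubit rows of Table 6 from quoted inputs.

RUN_TIMES_US = [83, 140, 230, 380, 3080]  # run times listed in Table 6

TASKS = {
    "FD":  {"tops": 2457.6, "ops_per_problem": 80e6,  "qubits_per_problem": 384},
    "FEC": {"tops": 89.6,   "ops_per_problem": 150e6, "qubits_per_problem": 21132},
}

# Per-task values as printed in Table 6 (millions of qubits).
TABLE6_MILLIONS = {
    "FD":  [0.98, 1.65, 2.7, 4.5, 36.3],
    "FEC": [1.05, 1.77, 2.91, 4.81, 39.1],
}


def qubits_millions(task: dict, run_time_us: float) -> float:
    """Qubits (in millions) needed to sustain the task's problem rate."""
    problems_per_second = task["tops"] * 1e12 / task["ops_per_problem"]
    qubits = problems_per_second * task["qubits_per_problem"] * run_time_us * 1e-6
    return qubits / 1e6


computed = {
    name: [qubits_millions(task, t) for t in RUN_TIMES_US]
    for name, task in TASKS.items()
}

for name, values in computed.items():
    cells = "  ".join(f"{v:6.2f}M" for v in values)
    print(f"{name:>3}: {cells}")
```

The recomputed values agree with the table to within rounding; small differences in the FEC row arise because the text rounds the FEC problem rate to 600K before multiplying.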

Fig. 5(a) reports the power consumption of a 5G BS where QA is used for the BBU’s baseband processing. In comparison to silicon (Fig. 4(b)), QA reduces BS power by 41 kW and 188 kW in 64- and 128-antenna systems respectively, once QA meets the above qubit–latency requirements. In Fig. 5(b), we report power consumption in a C-RAN setting with five BSs, where the fronthaul is allowed a 100 Gbps bandwidth. This requires 21.3M qubits in the QA. While there is no fundamental limit on the number of qubits allowed in a refrigeration unit, we conservatively consider three QA devices (each drawing 25 kW) to be capable of holding these 21.3M qubits. In comparison to silicon processing, QA processing reduces C-RAN power by 232 kW (48% lower, from 484 kW to 252 kW). Table 7 reports the OpEx cost savings and carbon emission reductions associated with the respective power savings, computed by considering an average electricity price of $0.143 (USD) per kWh and 0.92 pounds of CO₂ emitted per kWh (blscost; carbon). To provide economic benefit over silicon, assuming silicon CapEx is negligible, future QAs’ CapEx must be lower than the respective OpEx savings. For instance, if QA were employed in a C-RAN scenario, a CapEx lower than {290K, 581K, 1.45M, 2.9M} USD would provide economic benefit over silicon in one, two, five, and 10 years, respectively. Table 8 reports the power consumption of various CMOS technologies in a BS scenario with a 400 MHz bandwidth. The shaded cells indicate scenarios where QA draws less power than silicon. In scenarios with 128 antennas and beyond, QA retains a power benefit even over future 1.5 nm CMOS, at which Moore’s law is expected to terminate (ca. 2030) (itrs).

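| Years | BS (64 antennas): Cost ($) | BS (64 antennas): CO₂ (kt) | BS (128 antennas): Cost ($) | BS (128 antennas): CO₂ (kt) | C-RAN: Cost ($) | C-RAN: CO₂ (kt) |
|---|---|---|---|---|---|---|
| 1 | 50K | 0.15 | 235K | 0.68 | 290K | 0.85 |
| 2 | 100K | 0.30 | 471K | 1.37 | 581K | 1.70 |
| 5 | 250K | 0.75 | 1.17M | 3.43 | 1.45M | 4.25 |
| 10 | 500K | 1.50 | 2.35M | 6.87 | 2.90M | 8.50 |

Table 7. Summary of OpEx electricity cost savings (in USD) and CO₂ emission reduction (in metric kilotons) QA will achieve in comparison to silicon in 5G network scenarios.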
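The savings in Table 7 follow from the power reductions, the electricity price, and the emission factor quoted above. The sketch below shows the arithmetic under two assumptions that are not stated in the text: hardware runs continuously for 8,760 hours per year, and pounds are converted to metric tons at 2,204.62 lb per ton. Function names are hypothetical.

```python
# Sketch: OpEx and CO2 savings from a sustained power reduction.

HOURS_PER_YEAR = 8760          # assumption: continuous operation, non-leap year
PRICE_USD_PER_KWH = 0.143      # average electricity price quoted in the text
CO2_LB_PER_KWH = 0.92          # emission factor quoted in the text
LB_PER_METRIC_TON = 2204.62    # standard unit conversion


def savings(power_saved_kw: float, years: float) -> tuple[float, float]:
    """Return (cost saved in USD, CO2 avoided in metric kilotons)."""
    energy_kwh = power_saved_kw * HOURS_PER_YEAR * years
    cost_usd = energy_kwh * PRICE_USD_PER_KWH
    co2_kt = energy_kwh * CO2_LB_PER_KWH / LB_PER_METRIC_TON / 1e3
    return cost_usd, co2_kt


def qa_pays_off(capex_usd: float, power_saved_kw: float, years: float) -> bool:
    """True if the QA CapEx is below cumulative OpEx savings (silicon CapEx ignored)."""
    cost_usd, _ = savings(power_saved_kw, years)
    return capex_usd < cost_usd


# Power reductions quoted in the text: 41 kW (64 antennas), 188 kW (128), 232 kW (C-RAN).
for label, kw in [("BS-64", 41), ("BS-128", 188), ("C-RAN", 232)]:
    for years in (1, 2, 5, 10):
        cost, co2 = savings(kw, years)
        print(f"{label:>6} {years:>2} yr: ${cost / 1e3:8.1f}K  {co2:5.2f} kt")
```

For example, `qa_pays_off(1_000_000, 232, 5)` asks whether a hypothetical 1M USD device would recover its cost within five years in the C-RAN scenario.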

Future QAs must also be able to handle increased chip area and more control lines. A tile of eight qubits takes 335 × 335 μm² of chip area (bunyk-architectural), which upon scaling to 10M qubits will take 374 × 374 mm² of area. If $N$ is the number of Φ-DACs (§3), then the QA requires $3\sqrt[3]{N}$ control lines via the status quo “cubic-XYZ” addressing scheme (bunyk-architectural). A 10M-qubit device with 135M Φ-DACs (see Table 2) will therefore require a total of about 1,540 control lines (i.e., addressing, triggering, and power lines).
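The area and control-line figures above can be reproduced with a few lines of arithmetic. This sketch assumes a tile side of 335 μm, which yields a die roughly 374 mm on a side for 10M qubits, and assumes the tiles are laid out on a square die with no overhead for wiring or pads. Function names are hypothetical.

```python
# Sketch: chip side length and control-line count for a scaled-up annealer.
import math

TILE_SIDE_UM = 335.0   # assumed unit: micrometres per tile side
QUBITS_PER_TILE = 8


def chip_side_mm(num_qubits: float) -> float:
    """Side length (mm) of a square die holding num_qubits, tiles only."""
    tiles = num_qubits / QUBITS_PER_TILE
    return math.sqrt(tiles) * TILE_SIDE_UM / 1e3


def control_lines(num_dacs: float) -> float:
    """Control lines under cubic-XYZ addressing: three times the cube root."""
    return 3.0 * num_dacs ** (1.0 / 3.0)


side = chip_side_mm(10e6)
lines = control_lines(135e6)
print(f"10M qubits -> {side:.1f} mm x {side:.1f} mm")
print(f"135M DACs  -> {lines:.0f} control lines")
```

The cube-root growth is what keeps the line count manageable: doubling the number of DACs raises the number of control lines by only about 26%.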

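| Antennas | 65 nm | 45 nm | 22 nm | 14 nm | 7 nm | 1.5 nm |
|---|---|---|---|---|---|---|
| 32 | 53.4 | 51.8 | 42.7 | 34.7 | 25.8 | 13.0 |
| 64 | 145 | 135 | 111 | 89.8 | 65.1 | 31.1 |
| 128 | 445 | 398 | 326 | 261 | 184 | 82.8 |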
Table 8. Power consumption (in kW) of a BS (400 MHz bandwidth) with various CMOS devices (top row) and antennas (left column). The shaded/colored cells reflect that QA computation takes less power than silicon in that scenario.
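A rough comparison of Table 8 against the QA-based BS power can be sketched as below. It rests on a strong assumption that is not confirmed by the text: the QA BS power of {37, 49, 73} kW from Figure 5(a), which was computed with 14 nm CMOS handling the remaining BBU tasks, is applied unchanged to every CMOS node. The result is therefore only indicative and may not match the shaded cells exactly.

```python
# Sketch: where does a QA-based BS draw less power than silicon?
# Assumption: QA BS power from Figure 5(a) is reused for every CMOS node.

SILICON_KW = {  # Table 8: BS power (kW) by antenna count and CMOS node
    32:  {"65 nm": 53.4, "45 nm": 51.8, "22 nm": 42.7, "14 nm": 34.7, "7 nm": 25.8, "1.5 nm": 13.0},
    64:  {"65 nm": 145,  "45 nm": 135,  "22 nm": 111,  "14 nm": 89.8, "7 nm": 65.1, "1.5 nm": 31.1},
    128: {"65 nm": 445,  "45 nm": 398,  "22 nm": 326,  "14 nm": 261,  "7 nm": 184,  "1.5 nm": 82.8},
}

QA_KW = {32: 37.0, 64: 49.0, 128: 73.0}  # Figure 5(a): QA-based BS power (kW)


def qa_saving_kw(antennas: int, node: str) -> float:
    """Silicon BS power minus QA BS power; positive means QA draws less."""
    return SILICON_KW[antennas][node] - QA_KW[antennas]


for antennas, row in SILICON_KW.items():
    winners = [node for node in row if qa_saving_kw(antennas, node) > 0]
    print(f"{antennas:>3} antennas: QA draws less power than {', '.join(winners)}")
```

Under this assumption the 64-antenna, 14 nm case gives the 40.8 kW reduction quoted in the abstract, and the 128-antenna row shows a benefit at every node, including 1.5 nm.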

6. Conclusion

While the conventional assumption that silicon hardware will achieve NextG cellular processing targets may well hold true, this Challenge paper makes the case for the possible future feasibility and potential power advantage of QA over silicon. Our extensive analysis of current QA technology projects quantitative targets that future QAs may well meet in order to provide benefits over silicon in terms of performance, power, and cost. While we acknowledge that the practical deployment of quantum processors is at least tens of years away, this early study informs future quantum hardware design and RAN architecture evolution.

Acknowledgements

This research is supported by the National Science Foundation (NSF) Award CNS-1824357. We thank Keith Briggs, Catherine McGeoch, and Catherine White for useful discussions. P.A.W. is supported by the Engineering and Physical Sciences Research Council (EPSRC) Hub in Quantum Computing and Simulation, Grant Ref. EP/T001062/1.

References