Towards Quantum Belief Propagation for LDPC Decoding in Wireless Networks

07/21/2020 ∙ by Srikar Kasi, et al. ∙ 0

We present Quantum Belief Propagation (QBP), a Quantum Annealing (QA) based decoder design for Low Density Parity Check (LDPC) error control codes, which have found many useful applications in Wi-Fi, satellite communications, mobile cellular systems, and data storage systems. QBP reduces the LDPC decoding to a discrete optimization problem, then embeds that reduced design onto quantum annealing hardware. QBP's embedding design can support LDPC codes of block length up to 420 bits on real state-of-the-art QA hardware with 2,048 qubits. We evaluate performance on real quantum annealer hardware, performing sensitivity analyses on a variety of parameter settings. Our design achieves a bit error rate of 10^-8 in 20 μs and a 1,500 byte frame error rate of 10^-6 in 50 μs at SNR 9 dB over a Gaussian noise wireless channel. Further experiments measure performance over real-world wireless channels, requiring 30 μs to achieve a 1,500 byte 99.99% frame delivery rate at SNR 15-20 dB. QBP achieves a performance improvement over an FPGA based soft belief propagation LDPC decoder, by reaching a bit error rate of 10^-8 and a frame error rate of 10^-6 at an SNR 2.5–3.5 dB lower. In terms of limitations, QBP currently cannot realize practical protocol-sized (e.g., Wi-Fi, WiMax) LDPC codes on current QA processors. Our further studies in this work present future cost, throughput, and QA hardware trend considerations.


1. Introduction

As the design of mobile cellular wireless networks continues to evolve, time-critical baseband processing functionality from the base stations at the very edge of the wireless network is being shifted and aggregated into more centralized locations (e.g., Cloud/Centralized-RAN (Lin et al., 2010; Sundaresan, 2013; Checko et al., 2015)) or even small edge datacenters. A key component of mobile cellular baseband processing is the error correction code, a construct that adds parity bit information to the data transmission in order to correct the bit errors that interference and the vagaries of the wireless channel inevitably introduce into the data. In particular LDPC codes, first introduced by Gallager (Gallager, 1962) in 1962 but (with few exceptions (Zyablov and Pinsker, 1975; Tanner, 1981; Margulis, 1982)) mostly ignored until the work of MacKay et al. in the late 90s (MacKay, 1999), have approached the Shannon rate limit (Shannon, 1948). Along with Turbo codes (Berrou et al., 1993), LDPC codes stand out today because of their exceptional error correcting capability even close to capacity, but their decoding comprises a significant fraction of the processing requirements for a mobile cellular base station. LDPC codes are considered for inclusion in the 5G New Radio traffic channel (ETSI, 2018), the DVB-S2 standard for satellite communications (Morello and Mignone, 2006), and deep space communications (Book, 2020, 2014). LDPC codes are also currently utilized in the most recent revisions of the 802.11 Wi-Fi protocol family (IEEE, 2012). Given the dominance of LDPC codes in today’s wireless networks, the search for computationally efficient decoders and their ASIC/FPGA realization is underway.

Background: Quantum Annealing.

This paper notes that exciting new developments in the field of computer architecture hold the potential to efficiently decode LDPC codes: recently, quantum annealer (QA) machines previously only hypothesized (McGeoch, 2014; Kadowaki and Nishimori, 1998) have been commercialized, and are now available for use by researchers. QA machines are specialized, analog computers that solve NP-complete and NP-hard optimization problems in their Ising specification (Bian et al., 2010) on current hardware, with future potential for substantial speedups over conventional computing (McGeoch and Wang, 2013). They comprise an array of physical devices, each representing a single physical qubit (quantum bit) that can take on a continuum of values, unlike classical information bits, which can only take on binary values. The user of the QA inputs a set of desired pairwise constraints between individual qubits (i.e., a slight preference that two particular qubits should differ, and/or a strong preference that two particular qubits should be identical) and preferences that each individual qubit should take a particular classical value (0 or 1) in the solution the machine outputs. The QA then considers the entire set of constraints as a large optimization problem that is typically expressed as a quadratic polynomial of binary variables (Kadowaki and Nishimori, 1998; Lucas, 2014). A multitude of quantum annealing trials comprises a single machine run, with each anneal trial resulting in a potentially different solution to the problem: a set of classical output bits, one per qubit, that best fits the user-supplied constraints on that particular trial.

Quantum-Inspired and Hybrid Algorithms.

The growing interest in quantum computing has recently led to the emergence of several physics-based quantum-inspired algorithms (QIA) (Han and Kim, 2002; Aramon et al., 2019; Katzgraber et al., 2006; Arrazola et al., 2019) and quantum-classical hybrid algorithms (QCH) (Tran et al., 2016; McClean et al., 2016; Sweke et al., 2019; Irie et al., 2020). QIA can be used to simulate quantum phenomena such as superposition and entanglement on classical hardware (Montiel et al., 2019), where widely practiced QIA approaches (e.g., digital annealing (Aramon et al., 2019; Matsubara et al., 2020)) have solved combinatorial optimization problems with as many as 8,192 problem variables (Matsubara et al., 2020). QCH algorithms broadly operate on a hybrid workflow between classical search heuristics and quantum queries, providing ways to use noisy intermediate-scale quantum computers (Preskill, 2018) for optimizing problems with as many as 10,000 variables (D-Wave Hybrid Solver Service., Website). In this work, while we demonstrate a quantum annealing based LDPC decoder approach by realizing a small 700 variable problem, we also note that implementation of the same ideas using QIA and QCH methods is also a promising possibility.

This paper presents Quantum Belief Propagation (QBP), a new uplink LDPC decoder that takes a new look at error control decoding, from the fresh perspective of the quantum annealer. QBP is a novel way to design an LDPC decoder that sets aside traditional belief propagation (BP) decoding, and instead reduces the first principles of the LDPC code construction in a highly efficient way directly onto the physical grid of qubits present in the QA we use in this study, the D-Wave 2000-qubit (DW2Q) quantum adiabatic optimizer machine, taking into account the practical, real-world physical qubit interconnections. We have empirically evaluated QBP on the real DW2Q QA hardware. Results on the real-world quantum annealer show that QBP achieves a bit error rate of 10^-8 in 20 μs and a 1,500 byte frame error rate of 10^-6 in 50 μs at a signal-to-noise ratio of 9 dB over a Gaussian noise channel. In comparison with FPGA-based soft BP LDPC decoders, QBP achieves the same bit error rate and frame error rate at an SNR 2.5–3.5 dB lower, even when the classical decoder is allowed a very large number of iterations (100). Currently, QBP cannot realize practical protocol-sized LDPC codes on state-of-the-art QA processors with 2,048 qubits. Our further studies present limitations and the predicted future of QA (§9).

2. Primer: LDPC codes

A binary (N, K) LDPC code is a linear block code described functionally by a sparse parity check matrix (Gallager, 1962; MacKay, 1999). It is said to be a $(d_b, d_c)$-regular code if every bit node participates in $d_b$ checks, and every check has $d_c$ bits that together constitute a check constraint. This section describes the conventional encoding and decoding schemes of LDPC codes. Let $H = [h_{ij}]_{M \times N}$ be the LDPC parity check matrix. Each row in H represents a check node constraint whereas each column indicates which check constraints a bit node participates in. In the Tanner graph (Tanner, 1981) of Figure 1, the nodes labeled $c_i$ are check nodes and those labeled $b_j$ are bit nodes, and a value of 1 at $h_{ij}$ in H represents an edge between $c_i$ and $b_j$. Code girth, the length of the shortest cycle in the Tanner graph, is a crucial measure, as a low girth affects the independence of information exchanged between check and bit nodes, diminishing the code’s performance (Lu and Moura, 2006; Orlitsky et al., 2002; Chilappagari et al., 2008).

LDPC Encoder.

Let u be a message of length K. The overall encoding process is summarized as follows:

  1. Convert H into the augmented form $[P \mid I_{N-K}]$ by Gauss-Jordan elimination. (Here, P is obtained in the conversion process and I is the identity matrix.)

  2. Construct a generator matrix G as $[I_K \mid P^T]$.

  3. The encoded message c is constructed as c = uG.

This way of encoding ensures that the modulo-two bit sum at every check node is zero (Gallager, 1962).
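The three encoding steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' encoder: it assumes H is already in the augmented form [P | I], uses a tiny hand-picked P that is not sparse, and the helper names (`generator_from_p`, `encode`, `syndrome`) are hypothetical.

```python
# Toy systematic encoder over GF(2), standard library only.
# Assumption: H is already in the augmented form [P | I_{N-K}].

def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def generator_from_p(p):
    """Build G = [I_K | P^T] from the (N-K) x K block P."""
    k = len(p[0])
    pt = transpose(p)          # K x (N-K)
    eye = identity(k)
    return [eye[r] + pt[r] for r in range(k)]

def encode(u, g):
    """c = uG with arithmetic modulo two."""
    n = len(g[0])
    return [sum(u[r] * g[r][c] for r in range(len(u))) % 2 for c in range(n)]

def syndrome(h, c):
    """H c^T modulo two; all zeros means every check node sums to zero."""
    return [sum(row[j] * c[j] for j in range(len(c))) % 2 for row in h]

# Hypothetical toy example with N = 6, K = 3.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
H = [P[r] + identity(3)[r] for r in range(3)]   # H = [P | I_3]
G = generator_from_p(P)
codeword = encode([1, 0, 1], G)
```

Because G H^T = P^T + P^T = 0 modulo two, every message encoded this way has an all-zero syndrome, which is the property the text attributes to the encoding.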

Figure 1. A Tanner Graph of an example LDPC code.

LDPC Decoder.

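We describe the BP-based min-sum algorithm (Zhao et al., 2005). Let y be the received information, $N(c_i)$ the set of bit nodes participating in check constraint $c_i$, and $M(b_j)$ the set of check nodes connected to bit node $b_j$.

Initialization. Initialize all the bit nodes with their respective a priori log-likelihood ratios (LLRs) as:

$$\mathrm{LLR}(b_j) = \log \frac{\Pr(b_j = 0 \mid y)}{\Pr(b_j = 1 \mid y)} \qquad (1)$$

Step 1. For every combination $\{(i, j) \mid h_{ij} = 1\}$, initialize messages sent to check $c_i$ from bit $b_j \in N(c_i)$ as:

$$Z_{b_j \to c_i} = \mathrm{LLR}(b_j) \qquad (2)$$

Step 2. Every check node $c_i$ then updates the message to be sent back, w.r.t. every $b_j \in N(c_i)$, as:

$$Z_{c_i \to b_j} = \prod_{b_{j'} \in N(c_i) \setminus b_j} \mathrm{sign}\left(Z_{b_{j'} \to c_i}\right) \cdot \min_{b_{j'} \in N(c_i) \setminus b_j} \left| Z_{b_{j'} \to c_i} \right| \qquad (3)$$

Step 3. Each bit node now updates the message to send back, w.r.t. every $c_i \in M(b_j)$, as:

$$Z_{b_j \to c_i} = \mathrm{LLR}(b_j) + \sum_{c_{i'} \in M(b_j) \setminus c_i} Z_{c_{i'} \to b_j} \qquad (4)$$

To decode, each bit node computes:

$$Z_{b_j} = \mathrm{LLR}(b_j) + \sum_{c_i \in M(b_j)} Z_{c_i \to b_j} \qquad (5)$$

Decision Step. After Step 3, quantize $\hat{Z} = [Z_{b_0}, Z_{b_1}, \ldots, Z_{b_{N-1}}]$ such that $\hat{Z}_{b_j} = 0$ if $Z_{b_j} \geq 0$, else $\hat{Z}_{b_j} = 1$. $\hat{Z}$ are the decoded bits. If $\hat{Z}$ satisfies the condition enforced at encoding ($H\hat{Z}^T = 0$), then $\hat{Z}$ is declared as the final decoded message. If it doesn’t satisfy this condition, the BP algorithm iterates Steps 1–3 until a satisfactory $\hat{Z}$ is obtained. The decoder terminates at a predetermined threshold number of iterations.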
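The message-passing loop above can be sketched as follows. This is a plain, unoptimized illustration of min-sum decoding, not the FPGA decoder evaluated later; it assumes the LLR convention log(Pr(bit = 0)/Pr(bit = 1)), requires every check to involve at least two bits, and the function name and toy inputs are hypothetical.

```python
def min_sum_decode(h, llr, max_iters=20):
    """Min-sum BP over parity check matrix h (list of rows).

    llr[j] is the a priori LLR of bit j, positive when 0 is more likely.
    Returns (hard_decisions, converged).
    """
    m, n = len(h), len(h[0])
    check_bits = [[j for j in range(n) if h[i][j]] for i in range(m)]  # N(c_i)
    bit_checks = [[i for i in range(m) if h[i][j]] for j in range(n)]  # M(b_j)

    # Step 1: bit-to-check messages start at the a priori LLRs.
    b2c = {(i, j): llr[j] for i in range(m) for j in check_bits[i]}
    c2b = {}
    hard = [0 if v >= 0 else 1 for v in llr]

    for _ in range(max_iters):
        # Step 2: sign product times minimum magnitude of the other inputs.
        for i in range(m):
            for j in check_bits[i]:
                others = [b2c[(i, k)] for k in check_bits[i] if k != j]
                sign = 1
                for v in others:
                    if v < 0:
                        sign = -sign
                c2b[(i, j)] = sign * min(abs(v) for v in others)

        # Step 3 and the per-bit total used for the decision.
        for j in range(n):
            total = llr[j] + sum(c2b[(i, j)] for i in bit_checks[j])
            for i in bit_checks[j]:
                b2c[(i, j)] = total - c2b[(i, j)]   # exclude destination check
            hard[j] = 0 if total >= 0 else 1

        # Decision step: stop once every parity check is satisfied.
        if all(sum(hard[j] for j in check_bits[i]) % 2 == 0 for i in range(m)):
            return hard, True
    return hard, False

# Hypothetical toy input: codeword [1,0,1,1,1,0] with a weak error on bit 0.
H_TOY = [[1, 1, 0, 1, 0, 0],
         [0, 1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0, 1]]
LLR_TOY = [0.5, 2.0, -2.0, -2.0, -2.0, 2.0]
decoded, converged = min_sum_decode(H_TOY, LLR_TOY)
```

In this toy run the two checks that involve bit 0 both push its total below zero, so a single iteration corrects the weak error.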

3. Classical BP Decoder Limitations

The goal of most classical BP LDPC decoders is an efficient hardware implementation that maximizes throughput, thus driving a need to minimize data errors. A variety of architectures for the classical hardware implementation of LDPC decoders have been developed (Hailes et al., 2016; Hocevar, 2004; Sun et al., 2006), and in practice, depending on the problem of interest and the hardware resource availability, the decoders are implemented in serial, partly-parallel, or fully parallel architectures on FPGA/ASIC hardware. Although existing decoders do reach theoretically supported line speeds of, e.g., Wi-Fi (IEEE, 2012), Wi-MAX (IEEE, 2012), and DVB-S/S2 (ETSI, 2009), they make throughput compromises, in particular, reducing decoding precision (such as using low precision LLR bit width, limiting iterations, or using reduced-complexity algorithms (Hailes et al., 2016)). Therefore, the goal of maximizing throughput requires making the most efficient tradeoffs among the following:

  1. To achieve high throughput, a high degree of decoding parallelism is required, demanding more resources in the silicon hardware implementation.

  2. Accurate decoding results require high LLR bit precision, along with a precise decoding algorithm, again demanding more hardware resources.

  3. The iterative nature of the BP algorithm impedes throughput by requiring numerous serial iterations before reaching the best, final result. Thus a tradeoff between iteration limit and throughput must be made.

These tradeoffs induce network designers to compromise between decoder operation line rate and precision, within the available limited silicon hardware resources. Block RAMs (BRAMs) are the fundamental array storage resources in FPGAs, where state-of-the-art BRAMs have a read and a write port with independent clocks, implying that a single BRAM can perform a maximum of two read/write operations in parallel (Xilinx Vivado Design Suite User Guide., Website; Amaricai and Boncalo, 2017). Therefore, to realize the high degree of parallelism required in protocol-sized LDPC codes, many BRAMs must be used in parallel to access the BP LLRs. Furthermore, to meet FPGA device timing constraints, today’s dual-ported support for BRAMs limits the size of a single data access to 2,048 bits and the number of BRAMs accessible in a single clock cycle to 1,024 (Kastner et al., 2018; Teubner et al., 2010; Xilinx Vivado Design Suite User Guide., Website). This limitation results in the maximum degree of achievable parallelization in current top-end Xilinx FPGAs, which corresponds to a 2,048 (1,024 × 2) LDPC code block length. However, practical block lengths reach up to 1,944 bits in Wi-Fi, 2,304 bits in Wi-MAX, and 64,800 bits in DVB-S2 protocol standards (IEEE, 2012, 2012; ETSI, 2009).

A Xilinx FPGA Resource Study.

Using the Xilinx synthesis tool Vivado HLS, we have implemented a min-sum algorithm based decoder for a 1/2-rate, 1944 block-length, (3, 6)-regular LDPC code, on the Xilinx Virtex Ultrascale 440 (xcvu440, the most resourceful Xilinx FPGA) with 8-bit LLR precision. The resource measurement metric in FPGAs is generalized to a Configurable Logic Block (CLB). Each CLB in the Ultrascale architecture contains eight six-input LUTs (see Footnote 1) and 16 flip-flops, along with arithmetic carry logic and multiplexers (Xilinx UltraScale Architecture User Guide., Website). Our implementation of this fully parallel LDPC decoder covers 229,322 of the 316,220 CLBs in the device (about 72.5%), the upper limit of reliability in terms of resource utilization. Furthermore, our HLS implementation of a (4, 8)-regular LDPC code of block length 2048 bits (fully parallel decoder with 8-bit LLR precision) does not fit into that FPGA.

Footnote 1. Most recent FPGAs are equipped with six-input LUTs, which is equivalent to the resources of a four-input LUT (Xilinx UltraScale Architecture User Guide., Website; Hailes et al., 2016).

4. Primer: Quantum Annealers

Figure 2. A portion of the Chimera qubit connectivity graph of the DW2Q QA, showing qubits (nodes in the figure) grouped by unit cell. The edges in the figure are couplers.

Quantum Annealing is a heuristic approach to solve combinatorial optimization problems and can be understood at a high level as solving the same class of problems as the more familiar simulated annealing (Laarhoven and Aarts, 1987) techniques. QA takes advantage of the fundamental fact that any process in nature seeks a minimum energy state to attain stability. Given a discrete optimization problem as input, a QA quantum processor unit (QPU) internally frames it as an energy minimization problem and outputs its ground state as the solution.

Quantum Annealing Fundamentals.

In the QA literature, qubits are classified into two types: physical and logical. A physical qubit is a qubit that is directly available physically on the QA hardware, while a logical qubit is a set of physical qubits. It is often the case that the QA hardware lacks a coupler between a particular pair of physical qubits that the user would like to correlate. To construct such a relationship, it is general practice to use intermediate couplers to make several physical qubits behave similarly, a process known as embedding and explained below in §4.2. The set of these similarly behaving embedded physical qubits is then referred to as a logical qubit. The process of evolution of quantum bits to settle down at the ground state in the DW2Q QA is called an anneal, while the time taken for this evolution is called the annealing time. The strength of the preference given to each single qubit to end up in a particular 0 or 1 state is a bias, while the strength of each coupler is called coupler strength. Moreover, the strengths of the couplers that are used to make physical qubits behave similarly, as in the aforementioned embedding process, are called JFerros.

Quantum Annealing Hardware.

The QA processor hardware is a network of interlaced radio-frequency superconducting quantum interference device flux qubits fabricated as an integrated circuit, where the local longitudinal fields (i.e., biases) of the devices are adjustable with an external magnetic field and the interactions (i.e., couplers) between pairs of devices are realized with a tunable magnetic coupling using a programmable on-chip control circuitry (Johnson et al., 2011; King et al., 2018). The interconnection diagram of the DW2Q QA hardware we use in this study is a quasi-planar bi-layer Chimera graph. Fig. 2 shows a 2×4 portion of the 16×16 QA’s Chimera graph: each set of eight physical qubits in the figure is called a Chimera unit cell, whereas each edge in the figure is a coupler.

The Annealing Process.

QA processors simulate systems in the two-dimensional transverse field Ising model described by the time-dependent Hamiltonian:

$$H(s) = -A(s) \sum_i \sigma_i^x + B(s) H_P \qquad (6)$$

$$H_P = \sum_i h_i \sigma_i^z + \sum_{i<j} J_{ij} \sigma_i^z \sigma_j^z \qquad (7)$$

where $\sigma_i^{x,z}$ are the Pauli matrices acting on the $i$-th qubit, and $h_i$ and $J_{ij}$ are the problem parameters, with $s = t/t_a$, where $t$ is the time and $t_a$ is the annealing time. $A(s)$ and $B(s)$ are two monotonic signals such that at the beginning of the anneal (i.e., $s = 0$), $A(0) \gg B(0) \approx 0$, and at the end of the anneal (i.e., $s = 1$), $B(1) \gg A(1) \approx 0$. The annealing processor initializes every qubit in a superposition state that has no classical counterpart, then gradually evolves this Hamiltonian from time $t = 0$ until $t = t_a$ by introducing quantum fluctuations in a low-temperature environment. The time-dependent evolution of these signals A and B is essentially the annealing algorithm. During the annealing process, the system ideally stays in the local minima and probabilistically finds the global minimum energy configuration of the problem Hamiltonian at its conclusion (Amin, 2015; D-Wave Systems Technology Information., Website).

4.1. QA Problem Forms

QA processors can be used to solve the class of quadratic unconstrained binary optimization (QUBO) problems in their equivalent Ising specification (Bian et al., 2010; Kim et al., 2019), which we define here. The generalized Ising/QUBO form is:

$$E = \sum_i h_i q_i + \sum_{i<j} J_{ij} q_i q_j \qquad (8)$$

Ising form solution variables $q_i$ take values in $\{-1, +1\}$, and in QUBO form they take values in $\{0, 1\}$. The linear coefficient $h_i$ is the bias of $q_i$, whereas the quadratic coefficient $J_{ij}$ is the strength of the coupler between $q_i$ and $q_j$. Coupler strengths can be used to make the qubits agree or disagree. For instance, let us consider an example Ising problem:

$$E = J_{12} q_1 q_2, \quad q_1, q_2 \in \{-1, +1\} \qquad (9)$$

Case I: $J_{12} = +1$. The energies for qubit states $(q_1, q_2)$ = $(-1, -1)$, $(-1, +1)$, $(+1, -1)$, and $(+1, +1)$ are $+1$, $-1$, $-1$, and $+1$ respectively. Hence a strong positive coupler strength obtains a minimum energy of $-1$ when the two qubits are opposites of each other.

Case II: $J_{12} = -1$. The energies for qubit states $(q_1, q_2)$ = $(-1, -1)$, $(-1, +1)$, $(+1, -1)$, and $(+1, +1)$ are $-1$, $+1$, $+1$, and $-1$ respectively. Hence a strong negative coupler strength obtains a minimum energy of $-1$ when the two qubits agree with each other.
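The two cases can be checked by enumerating all four spin states. The sketch below assumes the example problem is a single coupler between two Ising spins, E = J12 q1 q2, with J12 set to +1 and then -1; the function names are illustrative only.

```python
from itertools import product

def ising_energy(j12, q1, q2):
    """Energy of a two-spin Ising problem with one coupler and no biases."""
    return j12 * q1 * q2

def ground_states(j12):
    """Return (minimum energy, sorted list of states that attain it)."""
    states = list(product((-1, +1), repeat=2))
    energies = {s: ising_energy(j12, *s) for s in states}
    e_min = min(energies.values())
    return e_min, sorted(s for s, e in energies.items() if e == e_min)

case_one = ground_states(+1)   # positive coupler: spins prefer to differ
case_two = ground_states(-1)   # negative coupler: spins prefer to agree
```

Under these assumptions, `case_one` lists the two states whose spins differ and `case_two` lists the two states whose spins agree, both at energy -1.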

4.2. Embedding of Logical Qubits

To visualize the relationship between logical and physical qubits, let us consider another example problem:

$$E = J_{12} q_1 q_2 + J_{23} q_2 q_3 + J_{13} q_1 q_3 \qquad (10)$$

Figure 3(a) is the direct graphical representation of this example problem. However, observe that a three-node, fully connected graph structure does not exist in the Chimera graph (cf. Figure 2). Hence, the standard solution is to embed one of the logical qubits into a physical realization consisting of two physical qubits, as Figure 3(b) shows, such that we can construct each required edge in Figure 3(a). Here, logical qubit $q_1$ is mapped to two physical qubits, $q_{1A}$ and $q_{1B}$, with a JFerro of $-1$ to make $q_{1A}$ and $q_{1B}$ agree with each other.

Figure 3. The embedding process of Eq. 10, where the logical qubit $q_1$ in (a) is mapped onto two physical qubits $q_{1A}$ and $q_{1B}$ as in (b) with a JFerro of $-1$; here $q_{1A}$ and $q_{1B}$ agree.
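A brute-force check makes the effect of the embedding concrete. The sketch assumes the logical problem is a three-qubit triangle of couplers, that one logical qubit is split into two physical qubits joined by a JFerro of -1, and that the coupler values below (chosen so the triangle is not frustrated) are purely illustrative.

```python
from itertools import product

def logical_energy(q, j12, j23, j13):
    """Energy of the fully connected three-qubit logical problem."""
    q1, q2, q3 = q
    return j12 * q1 * q2 + j23 * q2 * q3 + j13 * q1 * q3

def embedded_energy(p, j12, j23, j13, jf):
    """Same problem with q1 split into q1a and q1b, chained by coupler jf."""
    q1a, q1b, q2, q3 = p
    return (j12 * q1a * q2 + j23 * q2 * q3 + j13 * q1b * q3
            + jf * q1a * q1b)

def argmin_states(energy, n):
    """Exhaustively find the minimum energy and the states that reach it."""
    states = list(product((-1, 1), repeat=n))
    e_min = min(energy(s) for s in states)
    return e_min, [s for s in states if energy(s) == e_min]

J12, J23, J13, JF = 1, 1, -1, -1   # illustrative values, not from the paper

logical = argmin_states(lambda s: logical_energy(s, J12, J23, J13), 3)
embedded = argmin_states(lambda s: embedded_energy(s, J12, J23, J13, JF), 4)

# Unembed: read the logical value of q1 from the (agreeing) physical pair.
unembedded = [(q1a, q2, q3) for q1a, q1b, q2, q3 in embedded[1]]
```

For these values the embedded ground states keep the two physical qubits equal, and reading them back as one logical qubit recovers exactly the ground states of the original triangle.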

5. Design

In this section we first detail Quantum Belief Propagation’s reduction of the LDPC decoding problem into a quadratic polynomial (QUBO) form (§5.1), and then present QBP’s graph embedding model (QGEM) design on real QA hardware (§5.2).

5.1. QBP’s LDPC to QUBO Reduction

Our QUBO reduction (§5.1.1) is a linear combination of two functions we have created: (1) an LDPC satisfier function (§5.1.2), and (2) a distance function (§5.1.3). During an anneal, the LDPC satisfier function leaves all the valid LDPC codewords in the zero energy level while raising the energy of all invalid codewords by a magnitude proportional to the LDPC code girth (§2). QBP’s distance function distinguishes the true solution of the problem among all the valid LDPC codewords by separating them by a magnitude depending on the distance between the individual codeword and the received information (with channel noise).

System Model. Let $y = [y_0, y_1, \ldots, y_{N-1}]$ be the received information corresponding to an LDPC-encoded transmitted message $x = [x_0, x_1, \ldots, x_{N-1}]$. Let V be the set of all check constraints $c_i$ of this LDPC encoding. Furthermore, let the final decoded message be the final states of the qubits $[q_0, q_1, \ldots, q_{N-1}]$ respectively, and let any $q_{e_i}$ ($i > 0$) be an ancillary qubit used for calculation purposes. Any given binary string is said to be a valid codeword when it checks against a given parity check matrix, and an invalid codeword otherwise.

5.1.1. QBP’s objective function

QBP’s QUBO objective function comprises two terms, an LDPC satisfier function to prioritize solutions that satisfy the LDPC check constraints (i.e., $L_{sat}(c_i) = 0$ for every check node $c_i$), and a distance function to calculate candidate solutions’ proximity to the received information. The entire QUBO function is a weighted linear combination of these two terms:

$$\min_{\{q_j,\, q_{e_i}\}} \left\{ W_1 \sum_{c_i \in V} L_{sat}(c_i) + W_2 \sum_{j=0}^{N-1} \Delta_j \right\} \qquad (11)$$

Here, $W_1$ is a positive weight used to enforce LDPC-satisfying constraints, while the positive weight $W_2$ increases the success probability of finding the ground truth (Ishikawa, 2009).

The overall mechanism is depicted in Fig. 4 with real data: computing the energy values of 20 valid and 20 invalid codewords drawn at random. In Fig. 4(a), we see an energy gap (whose magnitude is denoted $E_g$) that our LDPC satisfier function creates between valid and invalid codewords. Note that $E_g$ is directly proportional to the girth (§2) of the LDPC code (i.e., if the girth of the code is low, there exists an invalid codeword which fails fewer check constraints, thus implying a low energy gap $E_g$). Increasing $W_1$ in Eq. 11 increases this energy gap, thus eliminating invalid codewords as potential solutions. We observe in Fig. 4(b) that the distance function distinguishes the actually transmitted codeword from other valid (but not transmitted) codewords that would otherwise also land in the ground energy state. The distance function works by separating the energy levels of both the valid and the invalid codewords by a factor proportional to the design parameter $W_2$. We explore experimentally in §7 the impact of wireless channel SNR and the QA dynamic range on the best choice of $W_1$ and $W_2$.

Figure 4. (a) LDPC satisfier function creating an energy gap between valid and invalid codewords. (b) QBP’s objective function separating the energy bands of both the valid and invalid LDPC codewords, to correctly decode.
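The interplay of the two terms can be illustrated by exhaustively minimizing the objective on a toy code. This sketch replaces the annealer with brute-force search and makes several assumptions: the per-check satisfier term is taken at its minimum over the ancillary qubits (0 for even parity, 1 for odd), the distance term is the squared difference between a bit and its probability of being one (§5.1.3), and the checks, probabilities, and weights are invented for illustration.

```python
from itertools import product

def check_penalty(bits):
    """Satisfier term minimized over ancillas: 0 if parity is even, else 1."""
    return sum(bits) % 2

def qbp_energy(q, checks, p_one, w1, w2):
    """Weighted sum of satisfier and distance terms for candidate bits q."""
    sat = sum(check_penalty([q[j] for j in check]) for check in checks)
    dist = sum((q[j] - p_one[j]) ** 2 for j in range(len(q)))
    return w1 * sat + w2 * dist

def is_valid(q, checks):
    return all(check_penalty([q[j] for j in check]) == 0 for check in checks)

def brute_force_decode(checks, p_one, w1, w2):
    """Stand-in for the annealer: try every bit string, keep the lowest energy."""
    candidates = product((0, 1), repeat=len(p_one))
    return min(candidates, key=lambda q: qbp_energy(q, checks, p_one, w1, w2))

# Hypothetical toy problem: the transmitted word is (1, 0, 1, 1, 1, 0) and the
# channel leaves bit 0 leaning slightly the wrong way (0.4 instead of > 0.5).
CHECKS = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
P_ONE = [0.4, 0.1, 0.9, 0.9, 0.9, 0.1]
decoded = brute_force_decode(CHECKS, P_ONE, w1=1.0, w2=1.0)
hard_decision = tuple(1 if p > 0.5 else 0 for p in P_ONE)
```

Here the hard decision on the soft values alone is not a valid codeword, while the minimum of the combined objective is the valid word closest to the received information.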

5.1.2. LDPC satisfier function

The only LDPC encoding constraint is that the modulo-two bit sum at every check node is zero, i.e., that the sum be even. For each check node $c_i$ we define the function:

$$L_{sat}(c_i) = \left( \sum_{j : h_{ij} = 1} q_j - 2 L_e(c_i) \right)^2 \qquad (12)$$

The LDPC constraint is satisfied at check node $c_i$ if and only if $L_{sat}(c_i) = 0$. Here $L_e(c_i)$ is a function of ancillary qubits $q_{e_i}$ (defined in §5.1). We formulate $L_e(c_i)$ to use a minimal number of ancillary qubits with the following minimization:

$$L_e(c_i) = \sum_{s=1}^{t} 2^{s-1} q_{e_{s+k}} \qquad (13)$$

$$t = \lfloor \log_2 d(c_i) \rfloor \qquad (14)$$

where $d(c_i)$ is the degree of $c_i$ (i.e., the number of bits in check constraint $c_i$). In Eq. 13, the value of $k$ in $q_{e_{s+k}}$ is the largest index of the ancillary qubit used while computing $L_e(c_{i-1})$, ensuring each ancillary qubit is only used once. This formulation of $L_e(c_i)$ is the binary encoding of integers in the range $[0, 2^t - 1]$, where a single integer corresponds to a single ancillary qubit configuration. The number of ancillary qubits required per check node is given in Table 1. Upon expansion of Eq. 12, $L_{sat}(c_i)$ introduces both biases and couplers to the objective QUBO and hence requires embedding on the Chimera graph.

Check node degree d(c_i):     3     4–7     8–15     16–31
Ancillary qubits required:    1     2       3        4
Table 1. Ancillary qubits required versus check node degree.
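The ancillary-qubit construction can be sanity-checked numerically. The sketch below assumes the satisfier term has the form (sum of check bits minus twice a binary-encoded ancillary integer) squared, with floor(log2(degree)) ancillary qubits per check as in Table 1; the function names are hypothetical.

```python
from itertools import product

def num_ancillas(degree):
    """floor(log2(degree)) for degree >= 1, computed without floating point."""
    return degree.bit_length() - 1

def l_e(ancillas):
    """Binary-encoded integer held by the ancillary qubits (LSB first)."""
    return sum((1 << s) * a for s, a in enumerate(ancillas))

def l_sat(bits, ancillas):
    """Satisfier value for one check: zero only if sum(bits) == 2 * L_e."""
    return (sum(bits) - 2 * l_e(ancillas)) ** 2

def min_l_sat(bits):
    """Minimum of l_sat over every ancillary configuration for this check."""
    t = num_ancillas(len(bits))
    return min(l_sat(bits, anc) for anc in product((0, 1), repeat=t))

# Ancilla counts for the degree ranges listed in Table 1.
table_one = {d: num_ancillas(d) for d in (3, 4, 7, 8, 15, 16, 31)}
```

With t ancillary qubits the encoded integer reaches 2^t - 1, which is enough to match half of any even bit sum up to the check degree, so the minimum is zero exactly when the parity is even.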

5.1.3. Distance function

We define a distance $\Delta_j$ that computes the proximity of the qubit $q_j$ to its respective received information $y_j$ as:

$$\Delta_j = \left( q_j - \Pr(q_j = 1 \mid y_j) \right)^2 \qquad (15)$$

In Eq. 15, the probability that $q_j$ should take a one value given the received soft information $y_j$ can be computed using the likelihood information obtained from the soft demapping of received symbols, for various modulations and channels (Yazdani and Ardakani, 2011). For instance, for BPSK-modulated ($0 \to -1$, $1 \to +1$) information transmitted over an AWGN channel with noise variance $\sigma^2$, this probability is given by $1/(1 + e^{-2 y_j / \sigma^2})$.

Hence, we observe that $\Delta_j$ is smaller for the $q_j \in \{0, 1\}$ that has a greater probability of being the transmitted bit. Upon expansion of Eq. 15, we note that the distance function introduces only biases to the QUBO problem and hence does not require embedding due to the absence of coupler terms.
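A short numerical sketch shows why the distance term contributes only biases. It assumes the BPSK mapping 0 to -1 and 1 to +1 over AWGN, for which Pr(bit = 1 | y) = 1 / (1 + exp(-2y / sigma^2)), and a squared-difference distance; the names below are illustrative.

```python
import math

def prob_one_bpsk_awgn(y, sigma2):
    """Pr(bit = 1 | y) for BPSK (0 -> -1, 1 -> +1) over AWGN, variance sigma2."""
    return 1.0 / (1.0 + math.exp(-2.0 * y / sigma2))

def distance(q, p_one):
    """Squared distance between a binary value q and its probability of one."""
    return (q - p_one) ** 2

def distance_bias(p_one):
    """Linear QUBO coefficient of q in the expansion of (q - p)^2.

    For binary q, q^2 = q, so (q - p)^2 = (1 - 2p) q + p^2: one bias plus a
    constant offset, with no coupler between different qubits.
    """
    return 1.0 - 2.0 * p_one

p = prob_one_bpsk_awgn(0.8, sigma2=0.5)   # received sample leaning towards +1
bias = distance_bias(p)                    # negative bias favours q = 1
```

A received sample above zero yields a probability above one half and therefore a negative bias, which lowers the energy of the state q = 1.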

5.2. Embedding on Annealer Hardware

Section 4 above has described the process of embedding problems onto the QA in general terms. In this section, we explain how we embed QBP’s QUBO reduction onto the Chimera graph of the DW2Q QA hardware. QBP’s embedding design can make use of an arbitrarily large hardware qubit connectivity, supporting LDPC code block lengths up to 420 bits on the state-of-the-art DW2Q QA.

Let us assign 2D coordinate values U(x, y) to each unit cell in the Chimera graph, with the bottom-left-most unit cell as the origin U(0, 0). Here we define terminology:

  • A Chimera unit cell U(x', y') is said to be a neighbor of U(x, y) if and only if |x − x'| + |y − y'| = 1, and let Λ(x, y) denote the set of all neighbors of U(x, y).

  • An intracell embedding is an embedding where both participating qubits lie in the same Chimera unit cell.

  • An intercell embedding is an embedding where one of the qubits belongs to U(x, y), and the other participating qubit belongs to a unit cell in Λ(x, y).
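The terminology can be expressed as small helper functions. This sketch assumes that two unit cells are neighbors when they are adjacent horizontally or vertically (Manhattan distance one) on the 16×16 grid of unit cells described in §4; the function names are hypothetical and not part of any D-Wave API.

```python
GRID_ROWS = 16   # unit-cell rows in the DW2Q Chimera graph
GRID_COLS = 16   # unit-cell columns in the DW2Q Chimera graph

def neighbors(x, y, cols=GRID_COLS, rows=GRID_ROWS):
    """Unit cells at Manhattan distance one from U(x, y), clipped to the grid."""
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < cols and 0 <= b < rows]

def is_intracell(cell_a, cell_b):
    """Both participating qubits lie in the same unit cell."""
    return cell_a == cell_b

def is_intercell(cell_a, cell_b):
    """The second qubit lies in a unit cell neighboring the first one."""
    return cell_b in neighbors(*cell_a)

corner = neighbors(0, 0)      # the origin cell has only two neighbors
interior = neighbors(5, 5)    # an interior cell has four
```

Cells on the boundary of the grid have fewer neighbors, which limits where check constraints that share bits can be placed next to each other.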

QGEM: QBP’s Graph Embedding Model.

We structure our embedding scheme into two levels, Level I (§5.2.1) and Level II (§5.2.2). QBP’s graph embedding model (QGEM) first maps the check constraints (i.e., $L_{sat}(c_i)$) by constructing the Level-I embedding for all the available Chimera unit cells, and next it accommodates more check constraints via the Level-II embedding, using the idle qubits that were left out during the Level-I embedding. QGEM makes use of the entire qubit computational resources available in the DW2Q QA hardware, leaving no qubit idle in the machine.

In the Level-I embedding, QGEM represents a single check constraint (i.e., each $L_{sat}(c_i)$) of at most degree three on a single Chimera unit cell using one of the four schemas presented in Fig. 5, which we refer to as Types A–D. Each of these schemas uses six qubits for a degree-three check constraint, leaving two qubits in the unit cell idle. Based on the coordinate location U(x, y) of the unit cell, QGEM chooses a single schema for a single Chimera unit cell in a fashion that creates a specific pattern of idle qubits in the Chimera graph, then leverages this pattern to accommodate more check constraints as explained in §5.2.2. Next, QGEM places the check constraints that share a common bit closest to each other, then embeds the qubits representing this shared bit to make them agree, as described in Fig. 3 (§4.2). Specifically, if a check constraint $c_i$ is placed in U(x, y), then QGEM places the check constraints that share common bits with $c_i$ in the neighbor unit cells of U(x, y) and embeds the qubits representing such commonly shared bits via an intercell embedding (see dotted lines in Fig. 6(a)).

Figure 5. QBP’s unit cell schemas for Level-I Chimera Graph embedding. Here () of Eq. 16 can be interpreted as () respectively in each schema. Idle qubits are shown in a darker shade. Embeddings are thin-blue lines and thick-orange lines are QUBO problem couplers.

In the Level-II embedding, QGEM represents a single check constraint in an ensemble of nine Chimera unit cells using the pattern of idle qubits that the Level-I embedding leaves. The placement of each of these ensembles in the Chimera graph follows a similar fashion as in Level-I embedding (i.e., placing the ensembles whose Level-II check constraints share bits close to each other).

We detail the overall working of QBP’s graph embedding model more fully with a running example. Consider a (2, 3)-regular LDPC code: as the degree of each check node is three, let us assume that $[x_a, x_b, x_c]$ are the three bits participating in one of the check constraints $c_i$. Let $[q_a, q_b, q_c]$ be the bit-node-representing qubits used at the decoder to extract $[x_a, x_b, x_c]$ respectively. From Eqs. 12 and 13, the LDPC satisfying constraint of this check node is:

$$L_{sat}(c_i) = \left( q_a + q_b + q_c - 2 q_{e_1} \right)^2 \qquad (16)$$

5.2.1. Level-I Embedding

Upon expansion of Eq. 16, we observe that the quadratic terms (i.e., qubit-pairs) requiring a coupler connectivity are { ($q_a$, $q_b$), ($q_a$, $q_c$), ($q_b$, $q_c$), ($q_a$, $q_{e_1}$), ($q_b$, $q_{e_1}$), ($q_c$, $q_{e_1}$) }. QBP’s Level-I embedding for the example in Eq. 16 can be visualized by identifying the four qubits of Eq. 16 with the correspondingly labeled qubits of each schema in Figs. 5 and 6. QBP realizes the above required coupler connectivity in four schemas presented in Fig. 5. We next demonstrate the Type A schema.
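The expansion of a degree-three check constraint into biases and couplers can be reproduced mechanically. The sketch assumes the constraint has the form (q_a + q_b + q_c - 2 q_e1)^2 over binary variables, so that each squared variable reduces to itself; the variable labels and helper names are illustrative.

```python
from itertools import combinations, product

def expand_square(coeffs):
    """Expand (sum_v coeffs[v] * q_v)^2 for binary q_v into QUBO terms.

    Since q_v^2 = q_v, each squared term becomes a bias coeffs[v]^2 and each
    cross term becomes a coupler 2 * coeffs[u] * coeffs[v].
    """
    biases = {v: c * c for v, c in coeffs.items()}
    couplers = {(u, v): 2 * coeffs[u] * coeffs[v]
                for u, v in combinations(coeffs, 2)}
    return biases, couplers

def qubo_energy(values, biases, couplers):
    """Evaluate the expanded QUBO for a given 0/1 assignment."""
    linear = sum(b * values[v] for v, b in biases.items())
    quadratic = sum(c * values[u] * values[v] for (u, v), c in couplers.items())
    return linear + quadratic

# Degree-three check with one ancillary qubit (labels are assumptions).
COEFFS = {"q_a": 1, "q_b": 1, "q_c": 1, "q_e1": -2}
biases, couplers = expand_square(COEFFS)
```

The result has six couplers, one per pair among the four qubits, which is why a fully connected four-qubit structure has to be realized inside the unit cell.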

Construction. We construct the required-and-available coupler connectivity using the QA’s direct physical couplers (e.g., to in Type A, Fig. 5), and realize the required-but-unavailable coupler connectivity {(), (, )}, using two intracell embeddings (e.g. to in Type A, Fig. 5).

Placement. Let us assume that QGEM chooses the above Type A schema for one of the Chimera unit cells whose placement is shown in Fig. 6(a). We note that the example LDPC code is (2, 3)-regular, and so every bit node participates in two check constraints. This implies that each bit-node-representing qubit (i.e., excluding ancillary qubits) must be present in two Chimera unit cells, since in the Level-I embedding we represent a check constraint in a single Chimera unit cell. QGEM thus represents the other check constraint of each of the three bit-node-representing qubits in a neighbor unit cell connected via an intercell embedding as depicted in Fig. 6(a), thus making the physical qubits involved in the embedding agree. QGEM repeats this construction over the entire Chimera graph, mapping each check constraint to an appropriate physical location in the QA hardware. QGEM selects the schema type to use (see Fig. 5) for each unit cell in a way that the two idle qubits of the Level-I unit cell schemas form the pattern shown in Fig. 7(a).

5.2.2. Level-II Embedding

Let us continue with the example of Eq. 16. The overview of QBP’s Level-II embedding is presented in Fig. 7. Here, in Level-II, the mapping of bits in the check constraint of Eq. 16 to physical qubits is (, , , ) map to (, , , ) respectively. In the Fig. 7, qubits i [0, 3] represent , i [0, 2] represent , i [0, 2] represent , and the qubits i [0, 3] represent , as they are embedded together as shown in Fig. 7(b). The pattern in the figure now allows us to realize all the required coupler connectivity of the example in Eq. 16 as depicted in Fig. 7(a). Similar to our Level-I placement policy, QGEM repeats this construction over the entire Chimera graph, mapping each Level-II check constraint to an appropriate physical location in the QA.

Figure 6. QBP’s Level-I Chimera Graph Embedding.

6. Implementation

We implement QBP on the DW2Q QA: our decoder targets a regular maximum girth LDPC code of block length 420 bits. In the DW2Q, a solver is a resource that runs the input problem. We implement QBP remotely via the C16-VFYC hardware solver API, using the Python client library. This solver first maps the implementation of the problem at hand directly onto the DW2Q’s QPU hardware, then determines the final states of the few (15 on our particular DW2Q) defective qubits via postprocessing on integrated conventional silicon (D-Wave Virtual Full-Yield Chimera Solver., Website). Since postprocessing problem size is two orders of magnitude smaller than overall problem size, postprocessing parallelizes with annealer computation and therefore does not factor into overall performance.

DW2Q readout fidelity is greater than 99%, and the chance of QPU programming error is less than 10% for problems that use all the available QA hardware resources (D-Wave Quantum Processing Unit., Website). We nevertheless increase readout fidelity and decrease the chance of programming error via the standard method of running multiple anneals for every LDPC decoding problem, where each anneal reads the solution bit-string once. In our evaluation, we further quantify the unavoidable intrinsic control errors (§9) that arise due to the quantization effects and the flux noise of the qubits (D-Wave Quantum Processing Unit., Website). Our end-to-end evaluation results capture all the above sources of QA imprecision.

Figure 7. QBP’s Level-II Chimera Graph Embedding. In Fig. 7(b), () represent () of Eq. 16.

7. Evaluation

Our experimental evaluation is on the DW2Q QA, beginning with a description of our experimental methodology (§7.1). We measure performance over a variety of DW2Q parameter settings (chosen in §7.2), in both simulated wireless channels and realistic trace-driven wireless channels. End-to-end experiments (§7.3) compare head-to-head against FPGA-based soft belief propagation decoding.

7.1. Experimental Methodology

Let us define an instance I as an LDPC codeword. Our evaluation dataset consists of 150 instances with an overall message bits. We conduct anneals for each instance and note the distribution of the solutions returned along with their occurrence frequency. We rank the distinct solutions returned for an instance I in increasing order of their energies, so that rank 1 is the minimum energy solution. All the solutions can be treated as independent and identically distributed random variables, as each anneal is identical and independent.

7.1.1. BER Evaluation

Consider the rank of the minimum energy solution found in a particular population sample of the entire solution distribution, where the sample size is the number of anneals performed. We compute the expected number of bit errors of an instance I over these anneals as:

(17)

where the probability that this rank equals i, for each rank i from 1 up to the number of distinct solutions of instance I, is computed from the cumulative distribution function of the solutions observed over these anneals as (Kingman, 1975):

(18)

Hence we compute the bit error rate (BER) of an instance I with K information bits upon performing anneals as:

(19)
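The BER procedure described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' code: the function and argument names are hypothetical, and the expression for the probability of the minimum rank is the standard order-statistics result for independent draws, (1 − F(i−1))^n − (1 − F(i))^n, which we assume corresponds to Eq. 18.

```python
def min_rank_probabilities(counts, n_anneals):
    """Probability that the best (lowest) rank seen in n_anneals draws equals i.

    counts[i] is how often the rank-(i+1) solution was observed; rank 1 is
    the minimum energy solution. All names here are illustrative assumptions.
    """
    total = sum(counts)
    probs, seen, cdf_prev = [], 0, 0.0
    for c in counts:
        seen += c
        cdf = seen / total  # empirical CDF F(i) over ranks
        # P(min rank == i) = (1 - F(i-1))^n - (1 - F(i))^n for i.i.d. anneals
        probs.append((1.0 - cdf_prev) ** n_anneals - (1.0 - cdf) ** n_anneals)
        cdf_prev = cdf
    return probs


def expected_ber(counts, bit_errors, n_anneals, k_info_bits):
    """Expected bit errors of the best solution found, divided by K."""
    probs = min_rank_probabilities(counts, n_anneals)
    expected_errors = sum(p * e for p, e in zip(probs, bit_errors))
    return expected_errors / k_info_bits


# Toy data (made up): three distinct solutions seen 50, 30 and 20 times,
# carrying 0, 2 and 5 bit errors respectively.
print(expected_ber([50, 30, 20], [0, 2, 5], n_anneals=2, k_info_bits=10))
```

With a single anneal the probabilities reduce to the observed solution frequencies, and as the number of anneals grows the probability mass shifts toward rank 1.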

7.1.2. FER Evaluation

Frame Construction: We construct a frame from fixed-length data blocks, so the number of blocks required is the frame length divided by the block length, where each block is an instance. Given a pool of available instances, we can construct a single frame by combining any such subset of instances. Thus the total number of distinct frames we construct for our frame error rate (FER) evaluation is the number of ways of choosing that many blocks from the available instances.

FER Calculation: A frame is error-free iff all the code blocks in the frame have zero errors, just as if it had a cyclic redundancy check appended. We compute the probability of a particular frame being error-free as:

(20)

Then we compute the overall frame error rate (FER) as:

(21)
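The frame construction and FER calculation above can be illustrated with a short sketch. It assumes that each block's probability of decoding with zero bit errors is already known, that blocks fail independently, and that the overall FER is one minus the mean error-free probability over all distinct frames; the function name and the toy probabilities are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def frame_error_rate(block_ok_probs, blocks_per_frame):
    """FER averaged over every distinct frame built from the instance pool.

    block_ok_probs[j] is the (assumed known) probability that instance j
    decodes with zero bit errors; a frame is error-free only if all of its
    blocks are, so its error-free probability is the product of theirs.
    """
    n_frames = comb(len(block_ok_probs), blocks_per_frame)
    total_ok = sum(prod(frame)
                   for frame in combinations(block_ok_probs, blocks_per_frame))
    return 1.0 - total_ok / n_frames


# Toy pool of three instances, two blocks per frame (three distinct frames).
print(frame_error_rate([1.0, 0.5, 0.5], blocks_per_frame=2))
```

Exhaustive enumeration is only practical for small pools; for a pool of 150 instances and frames of many blocks, the number of combinations is very large and sampling frames would be the more realistic choice.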

7.1.3. Wireless Trace-driven Evaluation

We collected channel traces from an aerial drone server communicating with a ground client in an outdoor environment, using the Intel 5300 NIC wireless chip at the client (Halperin et al., 2011). In realistic wireless transmissions, code blocks are transmitted over multiple OFDM symbols, where subcarriers within an OFDM symbol typically experience a diverse range of channels. In our performance evaluation over experimental channels, we compute the per-subcarrier SNR from channel state information (CSI) readings, and apply correspondingly scaled Gaussian noise to the bits of each subcarrier individually. Next we demodulate and interleave the data symbols and perform QBP’s decoding. Hence for this evaluation we use the distance function (§5.1.3) with its noise variance set, for each received bit, to the noise variance experienced by that bit’s subcarrier.
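A minimal sketch of the per-subcarrier noise step follows. It is an illustration under stated assumptions rather than the evaluation code: symbols are assumed to have unit power, so the noise variance is the inverse of the linear SNR, and the function names and the bit-to-subcarrier mapping are hypothetical.

```python
import math
import random


def noise_variance_from_snr_db(snr_db, signal_power=1.0):
    """Noise variance for a given SNR in dB, assuming the stated signal power."""
    return signal_power / (10.0 ** (snr_db / 10.0))


def add_per_subcarrier_noise(symbols, subcarrier_of, snr_db_per_subcarrier, rng):
    """Add Gaussian noise to each symbol using its own subcarrier's SNR.

    Returns the noisy symbols and the per-symbol noise variances, which a
    decoder could then use as the variance term of its distance function.
    """
    noisy, variances = [], []
    for idx, symbol in enumerate(symbols):
        var = noise_variance_from_snr_db(snr_db_per_subcarrier[subcarrier_of[idx]])
        noisy.append(symbol + rng.gauss(0.0, math.sqrt(var)))
        variances.append(var)
    return noisy, variances


# Four unit-power symbols spread over two subcarriers at 10 dB and 3 dB.
rng = random.Random(0)
noisy, variances = add_per_subcarrier_noise(
    [1.0, -1.0, 1.0, -1.0], [0, 0, 1, 1], [10.0, 3.0], rng)
print(variances)
```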

7.1.4. QA versus FPGA Throughput Evaluation

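Consider a data frame with a given number of message bits. Let us assume that QBP decodes this frame on the QA within a given compute time, and soft BP decodes the same frame on an FPGA running at a given clock frequency for a given number of iterations, each of which takes a fixed number of FPGA clock cycles. The actual throughput each decoder achieves is then the number of message bits delivered error-free, i.e., the message bits scaled by (1–FER), per unit decoding time: the compute time for QBP, and the number of iterations times the clock cycles per iteration divided by the clock frequency for soft BP.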

The clock cycles per iteration and the achievable clock frequency depend on the decoder implementation architecture (i.e., serial or parallel) and FPGA hardware type. In order to make a throughput comparison between QA and FPGAs, we evaluate the QA throughput versus the throughput of the best silicon realization (i.e., a fully parallel decoder that completes an iteration in one clock cycle) on the highest specification Xilinx FPGA, for a range of FPGA clock frequencies, and highlight the design-dependent operating-time regions (§7.3).
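The throughput comparison can be sketched numerically. The formulas below are our reading of this subsection, not a verbatim reproduction of its equations: throughput is taken as (1 − FER) times the message bits divided by the decoding time. The example values (420-bit code, 56 MHz clock, 15 iterations, 10 anneals of 1 μs) are those reported in §7.3, while the 140 message bits assume the design rate 1/3 of a (2,3)-regular code.

```python
def qa_throughput_bps(message_bits, n_anneals, anneal_time_s, fer=0.0):
    """QA throughput: error-free message bits per unit compute time."""
    compute_time_s = n_anneals * anneal_time_s
    return (1.0 - fer) * message_bits / compute_time_s


def fpga_throughput_bps(message_bits, clock_hz, n_iterations,
                        cycles_per_iteration=1, fer=0.0):
    """FPGA soft BP throughput; one cycle per iteration models a fully parallel decoder."""
    decode_time_s = n_iterations * cycles_per_iteration / clock_hz
    return (1.0 - fer) * message_bits / decode_time_s


# Assumed: 420 coded bits at design rate 1/3 gives 140 message bits.
qa = qa_throughput_bps(140, n_anneals=10, anneal_time_s=1e-6)
fpga = fpga_throughput_bps(140, clock_hz=56e6, n_iterations=15)
print(qa / 1e6, fpga / 1e6, fpga / qa)  # Mbps, Mbps, throughput ratio
```

Under these assumptions the ratio between the two throughputs comes out between 37 and 38, which is in line with the roughly 40-fold annealing time improvement that §7.3 cites as the break-even point.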

7.2. Parameter Sensitivity Analysis

In this section, we determine the DW2Q QA’s optimal system parameters, including JFerro, annealing time, and number of anneals, together with the QUBO design parameter, for evaluating QBP’s overall end-to-end system performance (§7.3).

7.2.1. Choice of Embedding Coupler Strength

In the QA literature, the coupler strength of an embedding is termed JFerro (§4). As the fundamental purpose of embeddings is to make qubits agree, a large, negative JFerro is required in order to ensure the embedding is effective (§4.1). However, as the supported range for coupler strengths in the DW2Q QA is [−1, 1], it is general practice to normalize all QUBO coefficients with respect to |JFerro| to bring all the coupler strengths into this supported range.

Figure 8. Left. Choosing JFerro strength to minimize BER. Right. Effect of the QUBO design parameter (§7.2.2) on BER at various channel SNRs. The magnitude of this parameter that minimizes BER is proportional to SNR.

Consider a QUBO problem with coupler strengths in the range [A, B]. Then |JFerro| must be greater than max(|A|, |B|) to prioritize embeddings over problem couplers, yet moderate enough that the normalized coupler strengths [A/|JFerro|, B/|JFerro|] remain distinguishable as the range shrinks. We perform our JFerro sensitivity analysis at a moderate SNR of 8 dB. We use a relatively high anneal time (299 μs) to ensure minimal disturbance from the time limit, and we set the two QUBO design parameters to 1.0 and 6.0; experiments show that other values of these parameters result in similar trends for the JFerro sensitivity. Fig. 8 (left) depicts QBP’s BER performance at various |JFerro| strengths. The BER curve for the largest anneal count clearly shows that |JFerro| = 8.0 minimizes BER, while for {1, 10} anneals the minimum at |JFerro| = 8.0 is barely visible, as the effect of |JFerro| is slight with fewer anneals. Hence we set |JFerro| = 8.0 for all further evaluation.
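The normalization trade-off can be made concrete with a small sketch. It is a simplified illustration, not QBP's implementation: couplers are held in plain dictionaries keyed by hypothetical qubit pairs, the supported coupler range is assumed to be [−1, 1], and chain (embedding) couplers are pinned at the most negative supported value after scaling.

```python
def normalize_couplers(problem_couplers, chain_edges, j_ferro_mag):
    """Scale problem couplers by 1/|JFerro| and set chain couplers to -1.

    Raises if |JFerro| does not exceed the largest problem coupler magnitude,
    since the chains would then not dominate the problem couplers.
    """
    max_abs = max(abs(v) for v in problem_couplers.values())
    if j_ferro_mag <= max_abs:
        raise ValueError("|JFerro| must exceed max(|A|, |B|)")
    scaled = {edge: v / j_ferro_mag for edge, v in problem_couplers.items()}
    scaled.update({edge: -1.0 for edge in chain_edges})
    return scaled


# Problem couplers spanning [A, B] = [-6, 4], one chain edge, |JFerro| = 8.
scaled = normalize_couplers({(0, 1): 4.0, (1, 2): -6.0}, [(2, 3)], 8.0)
print(scaled)
```

A larger |JFerro| compresses the problem couplers toward zero, which is the loss of resolution that the sensitivity analysis above trades off against chain reliability.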

7.2.2. Choice of design parameter

QBP’s LDPC satisfier function (Eq. 12) introduces coupler strengths (i.e., quadratic coefficients) greater than one, which must be normalized to bring all the problem coupler strengths into the supported range. Hence we fix the first QUBO design parameter at 1.0 and consider the choice of the second, the parameter that determines sensitivity to the received bits, in order to identify the correct codeword. We choose the optimal value of this parameter dynamically as a function of the wireless channel SNR, to balance between the received soft information and the LDPC code constraints.

We perform our sensitivity analysis at |JFerro| = 8.0 (§7.2.1), with the first design parameter fixed at 1.0 (§7.2.2), and use a high anneal time to ensure minimal disturbance from the time limit. Fig. 8 (right) depicts QBP’s BER performance at various SNRs while varying the second design parameter. In the figure we observe that the magnitude of this parameter that minimizes BER increases with channel SNR. Hence QBP chooses its value at the time of data reception: as an incoming frame arrives, the receiver uses the packet preamble to estimate SNR, and then looks up the best value for decoding in a lookup table.
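The receiver-side lookup can be sketched as a simple breakpoint table. The SNR breakpoints and parameter values below are placeholders chosen only to be increasing with SNR, as the sensitivity analysis suggests; they are not values from the paper, and the names are hypothetical.

```python
import bisect

# Hypothetical table: SNR breakpoints in dB and the design parameter value
# used in each of the resulting four SNR intervals (placeholder numbers).
SNR_BREAKPOINTS_DB = [4.0, 6.0, 8.0]
BEST_PARAMETER = [2.0, 4.0, 6.0, 8.0]


def lookup_design_parameter(snr_db):
    """Return the tabulated parameter for the interval containing snr_db."""
    return BEST_PARAMETER[bisect.bisect_right(SNR_BREAKPOINTS_DB, snr_db)]


# The SNR estimate would come from the packet preamble in a real receiver.
for snr in (3.0, 4.0, 7.9, 9.0):
    print(snr, lookup_design_parameter(snr))
```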

Figure 9. Choosing the anneal time. The figure depicts the probability of not finding the ground truth across the distribution of problem instances. An anneal time of 1 μs is sufficient to achieve a high probability of finding the ground truth.

7.2.3. Choosing the annealing time

We perform our annealing time sensitivity analysis using |JFerro| = 8.0 (§7.2.1) and the first design parameter fixed at 1.0 (§7.2.2). We choose the second design parameter as above (§7.2.2) and perform 10 anneals (other numbers of anneals result in similar trends). Fig. 9 presents the probability of not finding the minimum energy solution over the cumulative distribution across problem instances. We find that an anneal time as low as 1 μs yields a high probability of finding the ground truth, hence we use an anneal time of 1 μs.

Henceforth we quantify QBP’s performance in terms of total compute time, i.e., the number of anneals multiplied by the anneal time. Fig. 10 depicts the combined result of the calibrations presented in §7.2. Specifically, Fig. 10 shows the probability of not finding the minimum energy solution across the cumulative distribution of problem instances at wireless channel SNR 6 dB, over various choices of the design parameter and compute times. The figure shows that the best choice of the design parameter results in a relatively low probability of not finding the ground truth, and also shows the benefit of increasing compute time up to 100 μs.

Figure 10. The effect of the calibrations in §7.2 at SNR 6 dB, depicting the probability of not finding the minimum energy state. All plots share common x-y axes, and the distribution is across problem instances. The bottom–left plot corresponds to the best design parameter value at SNR 6 dB (see Fig. 8 right).
(a) Average BER, AWGN channel.
(b) CDF, AWGN channel.
(c) Average FER, AWGN channel.
(d) Throughput comparison of QBP versus soft BP decoders for a (2,3)-regular code of block length 420 bits. In the figure, the hatched area is the operating-time region of QA, the colored (solid filled) area is the throughput gap between QA (10 anneals) and FPGAs (15 iterations), the dotted vertical line is the maximum clock frequency (56 MHz) achieved by our FPGA implementation (8-bit LLR precision, fully parallel decoder), and the dark horizontal line is the upper bound of FPGA throughput (15 iterations) imposed by that maximum clock frequency. The data points from top to bottom of the QBP line in the figure correspond to {1, 5, 10, 20, 50, 100} anneals respectively in all the plots.
Figure 11. Quantum Belief Propagation’s system performance in an AWGN channel. The CDF in Fig. 11(b) is across individual LDPC problem instances. In Fig. 11(c), frame sizes are in bits and the soft BP iterations are 100. In Fig. 11(d), all plots share common x-y axes.

7.3. System Performance

This section reports QBP’s end-to-end performance under the above system and design parameter choices (§7.2).

7.3.1. AWGN Channel Performance

We first evaluate over a Gaussian wireless channel at SNRs in the range 1–11 dB, comparing head-to-head against soft BP decoders operating within various iteration count limits.

Bit error rate performance.

In Fig. 11(a), we investigate how average end-to-end BER behaves as the wireless channel SNR varies. At channel SNRs below 6 dB, QBP’s performance lags that of conventional soft BP decoders operating at 20 and 100 iterations, and differences in QBP’s performance at various QA compute times are barely distinguishable. This is because the optimal choice of the design parameter at low SNRs is low (§7.2.2), which makes the probability of finding the ground truth low for a QA. However, at SNRs above 6 dB, QBP’s BER curves drop quickly, reaching a BER of 10^-8 at an SNR of only 7.5–8.5 dB, whereas conventional soft BP decoders achieve the same BER at an SNR of 10.5–11 dB. This is because the optimal choice of the design parameter at high SNRs is high (§7.2.2), which separates the ground truth from the remaining solutions by a large energy gap and makes the true transmitted message easier to distinguish. Our QBP LDPC decoder thus achieves a performance improvement over a conventional silicon-based soft BP decoder by reaching a BER of 10^-8 at an SNR 2.5–3.5 dB lower.

(a) BER of LDPC problem instances at different SNRs and QA compute times in trace driven channels. The missing boxes in the figure are below BER.
(b) CDF across individual LDPC problem instances in a trace driven channel.
Figure 12. Quantum Belief Propagation’s overall system performance in experimental trace-driven channels. In Fig. 12(a), boxes’ lower/upper whiskers and lower/upper quartiles represent / and / percentiles respectively.

Across problem instances. In Fig. 11(b), we investigate how bit errors are distributed among individual LDPC problem instances in the same parameter class. The figure shows that when the QBP decoder fails due to too-low QA compute time, bit error rates are rather uniformly distributed across different problem instances. Conversely, when the compute time increases to 10–100 μs, the decoder drives BER low, so most instances have zero bit errors, and BER variation reduces. The result shows that {0, 28, 56, 73, 92, 98} percent of instances under QBP’s decoding are below the BER achieved by soft BP at QA compute times of {1, 5, 10, 20, 50, 100} μs respectively.

Frame error rate performance.

We investigate QBP’s FER performance under frame sizes of 420 and 12,000 bits. In Fig. 11(c), we observe a shallow FER error floor for SNRs less than 6 dB, noting the dependence of that error floor value on the frame length. At an SNR of 8–9 dB, QBP achieves an FER of 10^-6 with low dependence on the frame length and QA compute time, while soft BP achieves the same FER at an SNR 2–3 dB higher.

Throughput Analysis.

An FPGA-based LDPC decoder is bounded by a maximum operating clock frequency, the frequency beyond which the FPGA signal routing fails. Let us define the code block solution time as the inverse of the minimum possible time to obtain a decoded solution (i.e., the compute time for QA and the time to complete all soft BP iterations for an FPGA). Fig. 11(d) reports the throughputs. The figure shows that as the channel SNR increases, the throughput gap between QA (10 anneals) and FPGAs (15 iterations) tends toward a constant value whose magnitude is essentially the gap between the processing throughputs of QA and FPGAs, as the value of (1–FER) (§7.1.4) tends toward one. The results imply that the QA can achieve a throughput improvement over the fastest FPGAs implementing a fully parallel decoder either when the annealing time alone improves by roughly 40×, or when the annealing time improves by 5× in combination with a 5.4× increase of qubit resources in the QA.

Fig. 11(d) compares QBP against soft BP for a small code of 420 bits, for which the achieved maximum clock frequency (56 MHz) is high enough for the FPGA to reach a throughput better than the DW2Q QA. However, the maximum clock frequency reduces significantly as code block lengths increase, due to the higher complexity of the decoder. Our FPGA implementation (fully parallel decoder, 8-bit LLR precision) of a (2,3)-regular LDPC code of block length 2048 bits achieves a maximum clock frequency of 17 MHz, while a similar (4,8)-regular LDPC code does not fit into that FPGA.

7.3.2. Trace-driven Channel Performance

Here we demonstrate QBP’s performance in real-world trace-driven channels (§7.1.3).

Bit error rate performance.

Fig. 12(a) depicts QBP’s BER performance in trace-driven channels. For a given compute time, we observe the BER distribution across problem instances, and its dependency on the channel SNR. For channel average SNRs in the range 5–10 dB, we observe that a few instances lie at a high BER of , thus driving the mean BER high. At higher average SNRs (greater than 10–15 dB), BER falls very rapidly with increasing QA compute time for more than 90% of problem instances, since there is a lower probability that channel subcarriers experience very low SNRs in this scenario.

Across problem instances. Drilling down into individual problem instances at a particular average SNR in the range 10–15 dB, we observe in Fig. 12(b) that more than 75% of the problem instances lie below the BER at compute times of 20–100 μs, while exhibiting an error floor spanning two orders of BER between and when the QA compute time is set to 1 μs (far less than general practice).

Frame error rate performance.

Fig. 13 depicts QBP’s FER performance in trace-driven channels at various channel average SNRs. Each box in the figure represents 10 different channel traces, where we compute FER by constructing , distinct frames (as mentioned in §7.1.2) for each channel trace when the frame size is 420 and 12,000 bits respectively. We observe that FER exhibits an error floor when the average channel SNRs are less than 10–15 dB, and drops drastically for channel SNRs greater than 15 dB.

Figure 13. QBP’s FER performance in trace-driven channels. The unit of frame size in the figure is bits. In the figure, boxes’ lower/upper whiskers and lower/upper quartiles represent / and / percentiles respectively.

8. Related Work

Bian et al. (Bian et al., 2014) present discrete optimization problem solving techniques tailored to QA, solving the LDPC decoding problem by dividing the Tanner graph into several sub-regions using min-cut heuristics, where a different QA run solves each sub-region. Bian et al. coordinate the solutions of each run to construct the final decoded message. QBP’s approach differs from (Bian et al., 2014) with respect to both QUBO formulation and QA hardware embedding. The Bian et al. QUBO design does not adapt to both the wireless channel noise (distance function, §5.1.3) and the binary encoding minimization of the ancillary qubits (LDPC satisfier function, §5.1.2). From an embedding perspective, QBP can solve up to 280 check constraints in a single anneal, while Bian et al. solve up to only 20 check constraints on an earlier QA with 512 qubits (which extends to 60–80 check constraints on the current QA with 2,048 qubits). Bian et al. evaluate over a binary symmetric channel (each sub-region run with ) with crossover probabilities in the range of 8–14%, unrealistically high for practical wireless networks, and nonetheless find that only 4% out of anneals had no bit errors, lower-bounding their BER by . Lackey proposes techniques for solving Generalized BP problems by sampling a Boltzmann distribution (Lackey, 2018), but does not venture into a performance evaluation. It is also possible to use QBP’s QUBO design (§5.1) as an input to D-Wave’s built-in greedy search embedding tool (Cai et al., 2014), but this approach scales up to only 60 (2,3)-regular LDPC check constraints, which limits the LDPC code block length to an impractical 90 encoded bits.

QA machines have recently been used to successfully optimize problems in several adjacent domains including Networks (Wang et al., 2016; Kim et al., 2019), Machine Learning (Mott et al., 2017; Adachi and Henderson, 2015), Scheduling (Venturelli et al., 2015), Fault Diagnosis (Bian et al., 2016; Perdomo-Ortiz et al., 2015), and Chemistry (Streif et al., 2019). Efficient embedding methods for mapping fully-connected QUBOs onto QA hardware graphs have also been discussed (Venturelli et al., 2015; Boothby et al., 2016), which support up to 64 variables on the DW2Q QA.

9. Looking Forward

QA hardware trend predictions. For the past decade, the number of physical qubits in D-Wave’s QPU has been steadily doubling each year and this trend is expected to continue (D-Wave Systems Technology Information., Website). Fig. 14 presents a predicted extrapolation of quantum annealer qubit and coupler counts into the future. The figure shows that at these rates, an annealer processing chip with one million qubits could be available roughly by the year 2037. Let us envision future QAs with a processor topology that is either a Chimera or a supergraph of Chimera (e.g., Pegasus (D-Wave Next-Generation QPU Topology., Website)) with a given number of available qubits, which enables QBP to decode block lengths of at most 5/24 of that qubit count (in bits) in a single anneal. Thus in a QA with {10^4, 10^5, 10^6} qubits, we forecast QBP to be able to decode LDPC codes of block lengths up to {2,083, 20,833, 208,333} bits respectively in a single anneal, with peak processing throughputs reaching {0.694, 6.94, 69.4} Gbps respectively, while most classical fully parallel decoders do not implement block lengths exceeding 2,048 bits due to signal routing and clock frequency constraints (Hailes et al., 2016).
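The forecast figures above can be checked with a few lines of arithmetic. The sketch assumes qubit counts of 10^4, 10^5, and 10^6 (inferred from the quoted block lengths and the 5/24 ratio), a code rate of 1/3 as for a (2,3)-regular code, and a single 1 μs anneal per block; the rate and timing are our assumptions, chosen because they reproduce the quoted throughputs.

```python
def max_block_length_bits(n_qubits):
    """Largest block length decodable in one anneal: floor of 5/24 of the qubits."""
    return (5 * n_qubits) // 24


def peak_throughput_gbps(n_qubits, code_rate=1 / 3, compute_time_s=1e-6):
    """Message bits per block divided by an assumed per-block compute time."""
    message_bits = max_block_length_bits(n_qubits) * code_rate
    return message_bits / compute_time_s / 1e9


for n_qubits in (10**4, 10**5, 10**6):
    print(n_qubits, max_block_length_bits(n_qubits),
          round(peak_throughput_gbps(n_qubits), 3))
```

Applying the same ratio to the 2,048 qubits of the DW2Q gives 426 bits, consistent with the 420-bit block length used in the evaluation.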

Limitations of QA. The lack of all-to-all qubit connectivity in today’s QPUs limits the size of the problems the QA can practically solve, implying that the requirement of embedding is a major impediment to leveraging QA for practical applications. Furthermore, the process of transferring the computation to, and running it on, a real analog QA device introduces a source of noise distinct from communication channel noise called intrinsic control error (ICE), which arises due to the flux noise and the quantization effects of the qubits. ICE effects in the QA alter both the problem biases and the couplers, leading the QA to solve a slightly modified input problem in each anneal. Although these errors are currently on the order of , they may degrade the solution quality of some problems whose minimum energy state is not sufficiently separated from the other states in the energy landscape of the input problem Hamiltonian (D-Wave Quantum Processing Unit., Website). From a design perspective, ideally the qubits that are embedded together must all agree and end up in a similar final state at the end of the annealing process; otherwise the embedding chain is said to be broken. Broken chains typically lead to performance degradation, and they are more likely to occur when the number of qubits embedded together in a particular chain is large. QBP’s embedding design includes chains of length two to four for Level-I embeddings and five to nine for Level-II embeddings (§5.2). Further, there exist postprocessing techniques such as majority vote, weighted random, and minimize local energy that can be used to improve the performance of broken chains (D-Wave qbsolv embedding tool., Website). QBP’s embedding results in a low fraction of broken chains ( 2%), and we use the majority voting technique in those cases to find the variables involved.
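Majority voting over a broken chain can be sketched in a few lines. This is a generic illustration, not D-Wave's or the authors' implementation: the data layout (a dictionary from physical qubit to its read-out bit, and a dictionary from logical variable to its chain of qubits) and the tie-breaking rule are assumptions.

```python
from collections import Counter


def resolve_chain(values, tie_value=0):
    """Majority vote over the read-out bits of one chain (ties use tie_value)."""
    counts = Counter(values)
    ones, zeros = counts.get(1, 0), counts.get(0, 0)
    if ones == zeros:
        return tie_value
    return 1 if ones > zeros else 0


def unembed(sample, chains):
    """Map a physical-qubit sample back to logical variables, chain by chain."""
    return {var: resolve_chain([sample[q] for q in qubits])
            for var, qubits in chains.items()}


# Chain x0 is broken (qubits 0 and 1 read 1, qubit 2 reads 0); chain x1 is intact.
sample = {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}
print(unembed(sample, {"x0": [0, 1, 2], "x1": [3, 4]}))
```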

Cost considerations. QA technology is currently a cloud-based system and costs USD $2,000 for an hour of QPU access time, which is approximately $17.5M for a year. As the evolution of the technology is currently at an early stage (2011–), we consider the next 15 years for the technology to mature to the market. As usage becomes more widespread in future years, we hypothesize that QA prices will decrease with the same trend as classical compute prices have done since the late 20th century. Fig. 15 (top) shows the consumer price index (CPI) of classical computers and peripherals over time (Long Term Price Trends of Computers and Peripherals: U.S. Bureau of Labor Statistics., Website), while Fig. 15 (bottom) shows a similar predicted trend for QA price per hour (PPH). Figs. 14 and 15 imply that, at these rates, QA technology is expected to deliver a machine with more than one million qubits on a single annealer processing chip at prices of $730, $235, $130, $82, and $68 per hour of QPU access time by the years 2040, 2045, 2050, 2055, and 2059, respectively. This represents an approximate projected cost of $6.4M, $2M, $1.1M, $700K, and $600K per year by the above respective years.
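The yearly figures follow from the hourly prices under the assumption of continuous access for a 365-day year, which the short sketch below reproduces; the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous QPU access, non-leap year


def annual_cost_usd(price_per_hour_usd):
    """Yearly cost of round-the-clock access at a given hourly price."""
    return price_per_hour_usd * HOURS_PER_YEAR


# Current price followed by the projected prices quoted above.
for price in (2000, 730, 235, 130, 82, 68):
    print(price, annual_cost_usd(price))
```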

Figure 14. D-Wave QA’s hardware resource counts over time. Historical data is in the years 2011–2020. The blue filled (darker) and the red filled (lighter) areas are the predicted qubit and coupler counts respectively, whose upper/lower boundaries are extrapolations of the most recent 2017–2020/2015–2017 qubit-coupler growths respectively. Annotations in the figure are the QA processor titles in the respective years.

Timing considerations. Currently, the DW2Q has a 30–50 ms preprocessing time, a 6–8 ms programming time, and a 0.125 ms solution readout time per anneal, which are beyond the processing times available for wireless technologies (3–10 ms) (Kim et al., 2019), with supported annealing times in the range [1 μs, 2 ms]. Given the large cost, embedding, and timing overheads of today’s annealers, QBP currently cannot be deployed for use in practical applications. While approaches (Bian et al., 2014; D-Wave qbsolv embedding tool., Website) that decompose large-scale optimization problems can be used to study more problem variables, they suffer from requiring additional factors of the aforementioned machine overhead times for each extra anneal. The historical trend is encouraging, with the DW2Q having a 5× annealing time improvement over the circa-2011 D-Wave One (Boixo et al., 2014).

Figure 15. Top. The plot shows the consumer price index (CPI) of classical computers and peripherals over time with 1997 as the base year. Bottom. The plot shows the predicted price per hour (PPH) of quantum annealers over time. The larger data point is the actual 2015–2020 QA price, which is conservatively assumed to remain the same until the QA technology matures in a predicted 17 years.

10. Conclusion and Future Work

QBP is a novel QA-based uplink LDPC decoder that makes efficient use of the entire QA hardware to achieve new levels of performance beyond state-of-the-art BP decoders. Further efforts are needed to generalize QBP’s graph embedding to large-scale LDPC codes with higher check bit degrees. The techniques we propose here may in the more distant future come to be relevant to practical protocol settings, while application of the aforementioned Cloud/Centralized-RAN architecture has also been proposed for small cells (Sundaresan et al., 2016b, a), opening the possibility of its future application to managed Wi-Fi local-area networks. Investigating QA technology for problems such as network security, downlink precoding, scheduling, and other uplink channel codes such as Polar and Turbo codes is a potential direction for future work.

Acknowledgements

We thank the anonymous shepherd and reviewers of this paper for their extensive technical feedback, which has enabled us to significantly improve the work. We also thank Davide Venturelli, Catherine McGeoch, the NASA Quantum AI Laboratory (QuAIL), D-Wave Systems, and the Princeton Advanced Wireless Systems (PAWS) Group for useful discussions. This research is supported by National Science Foundation (NSF) Award CNS-1824357, a gift from InterDigital corporation, and an award from the Princeton University School of Engineering and Applied Science Innovation Fund. Support from the USRA Cycle 3 Research Opportunity Program allowed machine time on a D-Wave machine hosted at NASA Ames Research Center.

References

  • S. H. Adachi and M. P. Henderson (2015) Application of quantum annealing to training of deep neural networks. Cited by: §8.
  • A. Amaricai and O. Boncalo (2017) Design trade–offs for fpga implementation of ldpc decoders. In Field, G. Dekoulis (Ed.), pp. 105. External Links: Document, Link Cited by: §3.
  • M. H. Amin (2015) Searching for quantum speedup in quasistatic quantum annealers. Physical Review A 92 (5), pp. 052323. Cited by: §4.
  • M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber (2019) Physics-inspired optimization for quadratic unconstrained problems using a digital annealer. Frontiers in Physics 7, pp. 48. Cited by: §1.
  • J. M. Arrazola, A. Delgado, B. R. Bardhan, and S. Lloyd (2019) Quantum-inspired algorithms in practice. External Links: arXiv:1905.10415 Cited by: §1.
  • C. Berrou, A. Glavieux, and P. Thitimajshima (1993) Near shannon limit error-correcting coding and decoding: turbo-codes. 1. In Proceedings of ICC ’93 - IEEE International Conference on Communications, Vol. 2, Geneva, Switzerland, pp. 1064–1070. Cited by: §1.
  • Z. Bian, F. Chudak, R. B. Israel, B. Lackey, W. G. Macready, and A. Roy (2016) Mapping constrained optimization problems to quantum annealing with application to fault diagnosis. Frontiers in ICT 3, pp. 14. Cited by: §8.
  • Z. Bian, F. Chudak, R. Israel, B. Lackey, W. G. Macready, and A. Roy (2014) Discrete optimization using quantum annealing on sparse ising models. Frontiers in Physics 2, pp. 56. Cited by: §8, §9.
  • Z. Bian, F. Chudak, W. G. Macready, and G. Rose (2010) The Ising model: teaching an old problem new tricks. Vol. 2. Cited by: §1, §4.1.
  • S. Boixo, T. F. Rønnow, S. V. Isakov, Z. Wang, D. Wecker, D. A. Lidar, J. M. Martinis, and M. Troyer (2014) Evidence for quantum annealing with more than one hundred qubits. Nature physics 10 (3), pp. 218–224. Cited by: §9.
  • C. B. Book (2020) Radio frequency and modulation systems–part 1 earth stations and spacecraft. Cited by: §1.
  • C. O. Book (2014) Erasure correcting codes for use in near-earth and deep-space communications. Cited by: §1.
  • T. Boothby, A. D. King, and A. Roy (2016) Fast clique minor generation in chimera qubit connectivity graphs. Quantum Information Processing 15 (1), pp. 495–508. Cited by: §8.
  • J. Cai, W. G. Macready, and A. Roy (2014) A practical heuristic for finding graph minors. Cited by: §8.
  • A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S. Berger, and L. Dittmann (2015) Cloud RAN for mobile networks—a technology overview. IEEE Communications Surveys Tutorials 17 (1), pp. 405–426. External Links: Document Cited by: §1.
  • S. K. Chilappagari, D. V. Nguyen, B. Vasic, and M. W. Marcellin (2008) Girth of the tanner graph and error correction capability of ldpc codes. In Communication, Control, and Computing, 2008 46th Annual Allerton Conference on, IL, USA, pp. 1238–1245. Cited by: §2.
  • D-Wave Hybrid Solver Service. (Website) Cited by: §1.
  • D-Wave Next-Generation QPU Topology. (Website) Cited by: §9.
  • D-Wave qbsolv embedding tool. (Website) Cited by: §9, §9.
  • D-Wave Quantum Processing Unit. (Website) Cited by: §6, §9.
  • D-Wave Systems Technology Information. (Website) Cited by: §4, §9.
  • D-Wave Virtual Full-Yield Chimera Solver. (Website) Cited by: §6.
  • T. ETSI (2018) 138 212 v15. 2.0 technical specification–5g, nr, multiplexing and channel coding. ETSI. Cited by: §1.
  • ETSI (2009) ETSI Standard EN 302 307: Digital Video Broadcasting; Second generation framing structure, channel coding and modulation systems for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications (DVB-S2). Cited by: §3, §3.
  • R. Gallager (1962) Low-density parity-check codes. IRE Transactions on Information Theory 8 (1), pp. 21–28. Cited by: §1, §2, §2.
  • P. Hailes, L. Xu, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo (2016) A survey of FPGA-based LDPC decoders. IEEE Communications Surveys & Tutorials 18 (2), pp. 1098–1122. Cited by: §3, §9, footnote 1.
  • D. Halperin, W. Hu, A. Sheth, and D. Wetherall (2011) Tool release: gathering 802.11 n traces with channel state information. ACM SIGCOMM Computer Communication Review 41 (1), pp. 53–53. Cited by: §7.1.3.
  • K. Han and J. Kim (2002) Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation 6 (6), pp. 580–593. Cited by: §1.
  • D. E. Hocevar (2004) A reduced complexity decoder architecture via layered decoding of LDPC codes. In IEEE Workshop on Signal Processing Systems, TX, USA, pp. 107–112. Cited by: §3.
  • IEEE (2012) IEEE Standard for Information technology–Telecommunications and information exchange between systems Local and metropolitan area networks–Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. Vol. . External Links: Document Cited by: §1.
  • IEEE (2012) IEEE Standard 802.11: Wireless LAN Medium Access and Physical Layer Specifications. Cited by: §3, §3.
  • IEEE (2012) IEEE Standard 802.16: Air Interface for Broadband Wireless Access Systems. Cited by: §3, §3.
  • H. Irie, H. Liang, S. Gongyo, T. Hatsuda, et al. (2020) Hybrid quantum annealing via molecular dynamics. Cited by: §1.
  • H. Ishikawa (2009) Higher-order clique reduction in binary graph cut. In IEEE CVPR, FL, USA, pp. 2993–3000. Cited by: §5.1.1.
  • M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, et al. (2011) Quantum annealing with manufactured spins. Nature 473 (7346), pp. 194. Cited by: §4.
  • T. Kadowaki and H. Nishimori (1998) Quantum annealing in the transverse Ising model. Physical Review E 58 (5), pp. 5355. Cited by: §1.
  • R. Kastner, J. Matai, and S. Neuendorffer (2018) Parallel programming for FPGAs. External Links: 1805.03648 Cited by: §3.
  • H. G. Katzgraber, S. Trebst, D. A. Huse, and M. Troyer (2006) Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment 2006 (03), pp. P03018. Cited by: §1.
  • M. Kim, D. Venturelli, and K. Jamieson (2019) Leveraging quantum annealing for large MIMO processing in centralized radio access networks. In Proceedings of the ACM Special Interest Group on Data Communication, SIGCOMM ’19, New York, NY, USA, pp. 241–255. External Links: ISBN 9781450359566, Link, Document Cited by: §4.1, §8, §9.
  • A. D. King, J. Carrasquilla, J. Raymond, I. Ozfidan, E. Andriyash, A. Berkley, M. Reis, T. Lanting, R. Harris, F. Altomare, et al. (2018) Observation of topological phenomena in a programmable lattice of 1,800 qubits. Nature 560 (7719), pp. 456. Cited by: §4.
  • J. F. Kingman (1975) Random discrete distributions. Journal of the Royal Statistical Society: Series B (Methodological) 37 (1), pp. 1–15. Cited by: §7.1.1.
  • P. J. M. Laarhoven and E. H. L. Aarts (1987) Simulated annealing: theory and applications. Kluwer Academic Publishers, USA. External Links: ISBN 9027725136 Cited by: §4.
  • B. Lackey (2018) A belief propagation algorithm based on domain decomposition. Cited by: §8.
  • Y. Lin, L. Shao, Z. Zhu, Q. Wang, and R. K. Sabhikhi (2010) Wireless network cloud: architecture and system requirements. IBM Journal of Research and Development 54 (1), pp. 4:1–4:12. Cited by: §1.
  • Long Term Price Trends of Computers and Peripherals: U.S. Bureau of Labor Statistics. (Website) Cited by: §9.
  • J. Lu and J. M. Moura (2006) Structured LDPC codes for high-density recording: large girth and low error floor. IEEE transactions on magnetics 42 (2), pp. 208–213. Cited by: §2.
  • A. Lucas (2014) Ising formulations of many NP problems. Frontiers in Physics 2, pp. 5. External Links: Link, Document Cited by: §1.
  • D. J. MacKay (1999) Good error-correcting codes based on very sparse matrices. IEEE Trans. on Information Theory 45 (2), pp. 399–431. Cited by: §1, §2.
  • G. A. Margulis (1982) Explicit constructions of graphs without short cycles and low density codes. Combinatorica 2 (1), pp. 71–78. Cited by: §1.
  • S. Matsubara, M. Takatsu, T. Miyazawa, T. Shibasaki, Y. Watanabe, K. Takemoto, and H. Tamura (2020) Digital annealer for high-speed solving of combinatorial optimization problems and its applications. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), Beijing, China, pp. 667–672. Cited by: §1.
  • J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik (2016) The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18 (2), pp. 023023. Cited by: §1.
  • C. C. McGeoch (2014) Adiabatic quantum computation and quantum annealing: theory and practice. Synthesis Lectures on Quantum Computing 5 (2), pp. 1–93. Cited by: §1.
  • C. C. McGeoch and C. Wang (2013) Experimental evaluation of an adiabiatic quantum system for combinatorial optimization. In Proceedings of the ACM International Conference on Computing Frontiers, CF ’13, New York, NY, USA. External Links: ISBN 9781450320535, Link, Document Cited by: §1.
  • O. Montiel, Y. Rubio, C. Olvera, and A. Rivera (2019) Quantum-inspired acromyrmex evolutionary algorithm. Scientific reports 9 (1), pp. 1–10. Cited by: §1.
  • A. Morello and V. Mignone (2006) DVB-S2: The second generation standard for satellite broad-band services. Proc. of the IEEE 94 (1), pp. 210–227. Cited by: §1.
  • A. Mott, J. Job, J. Vlimant, D. Lidar, and M. Spiropulu (2017) Solving a Higgs optimization problem with quantum annealing for machine learning. Nature 550 (7676), pp. 375–379. Cited by: §8.
  • A. Orlitsky, R. Urbanke, K. Viswanathan, and J. Zhang (2002) Stopping sets and the girth of Tanner graphs. In Proceedings IEEE International Symposium on Information Theory, Lausanne, Switzerland, pp. 2. Cited by: §2.
  • A. Perdomo-Ortiz, J. Fluegemann, S. Narasimhan, R. Biswas, and V. N. Smelyanskiy (2015) A quantum annealing approach for fault detection and diagnosis of graph-based systems. The European Physical Journal Special Topics 224 (1), pp. 131–148. Cited by: §8.
  • J. Preskill (2018) Quantum computing in the NISQ era and beyond. Quantum 2, pp. 79. Cited by: §1.
  • C. E. Shannon (1948) A mathematical theory of communication. Bell System Technical Journal 27 (3), pp. 379–423. Cited by: §1.
  • M. Streif, F. Neukart, and M. Leib (2019) Solving quantum chemistry problems with a D-Wave quantum annealer. In Quantum Technology and Optimization Problems, S. Feld and C. Linnhoff-Popien (Eds.), Cham, pp. 111–122. External Links: ISBN 978-3-030-14082-3 Cited by: §8.
  • Y. Sun, M. Karkooti, and J. R. Cavallaro (2006) High throughput, parallel, scalable LDPC encoder/decoder architecture for OFDM systems. In 2006 IEEE Dallas/CAS Workshop on Design, Applications, Integration and Software, TX, USA, pp. 39–42. Cited by: §3.
  • K. Sundaresan, M. Y. Arslan, S. Singh, S. Rangarajan, and S. V. Krishnamurthy (2016a) FluidNet: a flexible cloud-based radio access network for small cells. IEEE/ACM Trans. Netw. 24 (2), pp. 915–928. External Links: ISSN 1063-6692 Cited by: §10.
  • K. Sundaresan, M. Y. Arslan, S. Singh, S. Rangarajan, and S. V. Krishnamurthy (2016b) FluidNet: a flexible cloud-based radio access network for small cells. IEEE/ACM Trans. on Networking 24 (2), pp. 915–928. External Links: Document Cited by: §10.
  • K. Sundaresan (2013) Cloud-driven architectures for next generation small cell networks. In Proceedings of the Eighth ACM International Workshop on Mobility in the Evolving Internet Architecture, MobiArch ’13, New York, NY, USA, pp. 3–4. External Links: ISBN 9781450323666, Link, Document Cited by: §1.
  • R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Fahrmann, B. Meynard-Piganeau, and J. Eisert (2019) Stochastic gradient descent for hybrid quantum-classical optimization. External Links: 1910.01155 Cited by: §1.
  • R. Tanner (1981) A recursive approach to low complexity codes. IEEE Transactions on information theory 27 (5), pp. 533–547. Cited by: §1, §2.
  • J. Teubner, R. Mueller, and G. Alonso (2010) FPGA acceleration for the frequent item problem. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), CA, USA, pp. 669–680. Cited by: §3.
  • T. T. Tran, M. Do, E. G. Rieffel, J. Frank, Z. Wang, B. O’Gorman, D. Venturelli, and J. C. Beck (2016) A hybrid quantum-classical approach to solving scheduling problems. In Ninth annual symposium on combinatorial search, NY, USA, pp. 98–106. Cited by: §1.
  • D. Venturelli, S. Mandrà, S. Knysh, B. O’Gorman, R. Biswas, and V. Smelyanskiy (2015) Quantum optimization of fully connected spin glasses. Phys. Rev. X 5, pp. 031040. External Links: Document, Link Cited by: §8.
  • D. Venturelli, D. J. J. Marchand, and G. Rojo (2015) Quantum annealing implementation of job-shop scheduling. External Links: 1506.08479 Cited by: §8.
  • C. Wang, H. Chen, and E. Jonckheere (2016) Quantum versus simulated annealing in wireless interference network optimization. Scientific reports 6, pp. 25797. Cited by: §8.
  • Xilinx UltraScale Architecture User Guide. (Website) Cited by: §3, footnote 1.
  • Xilinx Vivado Design Suite User Guide. (Website) Cited by: §3.
  • R. Yazdani and M. Ardakani (2011) Efficient LLR calculation for non-binary modulations over fading channels. IEEE transactions on communications 59 (5), pp. 1236–1241. Cited by: §5.1.3.
  • J. Zhao, F. Zarkeshvari, and A. H. Banihashemi (2005) On implementation of min-sum algorithm and its modifications for decoding low-density parity-check (LDPC) codes. IEEE transactions on communications 53 (4), pp. 549–554. Cited by: §2.
  • V. V. Zyablov and M. S. Pinsker (1975) Estimation of the error-correction complexity for Gallager low-density codes. Problemy Peredachi Informatsii 11 (1), pp. 23–36. Cited by: §1.