# Temporal Information Processing on Noisy Quantum Computers

The combination of machine learning and quantum computing has emerged as a promising approach for addressing previously untenable problems. Reservoir computing is a state-of-the-art machine learning paradigm that utilizes nonlinear dynamical systems for temporal information processing, whose state-space dimension plays a key role in the performance. Here we propose a quantum reservoir system that harnesses complex dissipative quantum dynamics and the exponentially large quantum state-space. Our proposal is readily implementable on available noisy gate-model quantum processors and possesses universal computational power for approximating nonlinear short-term memory maps, important in applications such as neural modeling, speech recognition and natural language processing. We experimentally demonstrate on superconducting quantum computers that small and noisy quantum reservoirs can tackle high-order nonlinear temporal tasks. Our theoretical and experimental results pave the way for attractive temporal processing applications of near-term gate-model quantum computers of increasing fidelity but without quantum error correction, signifying the potential of these devices for wider applications beyond static classification and regression tasks in interdisciplinary areas.

## I Introduction

The ingenious use of quantum effects has led to a significant number of quantum machine learning algorithms that offer computational speed-ups [1, 2]. While awaiting the demonstration of these quantum algorithms on full-fledged quantum computers equipped with quantum error correction, quantum computing has transitioned from theoretical ideas to the noisy intermediate-scale quantum (NISQ) technology era [3]. Hybrid quantum-classical algorithms using short-depth circuits are particularly suitable for implementation on NISQ devices. Many notable experimental demonstrations of NISQ devices employ hybrid algorithms for data classification [4] and quantum chemistry [5]. An ongoing quest is to find interesting applications on quantum computers with increasingly lower noise profiles that nevertheless do not reach a threshold low enough to enable continuous quantum error correction.

Here we propose a hybrid quantum-classical algorithm that utilizes dissipative quantum dynamics for temporal information processing on gate-model NISQ quantum processors. Our approach exploits dissipative quantum systems as universal approximators for nonlinear maps with short-term or fading memory, important in a broad class of real-world problems including spoken digit recognition [6], neural modeling [7] and machine learning tasks (e.g., speech processing and natural language processing) [8, 9]. This is a quantum analogue of the universal function approximation property that neural networks enjoy [10], but for nonlinear mappings from sequential input to sequential output data [11, 12, 13].

If a map has short-term memory, it must eventually forget its initial conditions. Recurrent neural networks are popular approximators for short-term memory maps [14, 15]. These are artificial neural networks (NNs), a neuromorphic computing scheme inspired by the human brain. As the number of “neurons” responsible for learning increases, NNs with appropriate connections can approximate certain classes of maps [16], at the cost of being increasingly difficult to train [17, 18]. Reservoir computing (RC) circumvents this training cost by using a fixed but randomly generated dynamical system, the “reservoir”, to nonlinearly map sequential inputs into a reservoir state. The connectivity of artificial neurons in the reservoir is random and never requires training. Only a simple linear regression algorithm is required to optimize parameters of a readout function to approximate target outputs. The use of a simple linear readout has connections to the biological concept of mixed selectivity, as demonstrated in monkeys [19]. The ease of RC implementation has brought forward many successful hardware implementations of classical (i.e., non-quantum) RC schemes [20, 6, 21]. Experiments suggest that the reservoir state-space dimension contributes importantly to the performance improvement for some tasks [21, 22]. Recently, a class of quantum reservoirs (QR) has been proposed to harness the exponentially large quantum state dimension for temporal information processing [23, 24]. This class of QRs is suitable for ensemble quantum systems and a static (non-temporal) version of [23] has been demonstrated in NMR for approximating static maps [25]. Chen and Nurdin further proposed a different class of QRs that is provably universal for approximating nonlinear short-term memory maps [26]. However, realizing these previous proposals in the quantum gate-model remains challenging due to the large number of gates required to implement the dynamics via Trotterization.

In this Letter, we propose and experimentally demonstrate the first universal class of QRs that is readily implementable on noisy gate-model quantum computers for temporal signal processing. The attractiveness of our proposal is that random and noisy quantum circuits of arbitrary depths can be exploited for temporal information processing. Our proof-of-principle demonstration is performed on superconducting quantum processors, showing that quantum reservoirs with a small number of noisy qubits can tackle complex nonlinear temporal tasks, even in the absence of readout and process error mitigation techniques. This work serves as the first theoretical and experimental realization of applying near-term gate-model quantum computers to nonlinear temporal information processing tasks, opening an avenue for time series modeling and signal processing applications of these devices.

## II Temporal information processing tasks

Two challenging temporal information processing problems are posed. The first is the multi-step ahead prediction problem, where we are given a length input sequence and the corresponding output sequence of the target map that depends on the initial condition . The first input-output data pair (), where is the train data (in the sequel, we use the input-output train data during ; see the next section on “Experimental demonstration” for a discussion). The goal is to use the train data to optimize the parameters w of another map , so that the outputs approximate . The second problem is the map emulation problem, that is, to optimize w of to emulate using different input-output train data pairs . The input and output sequences have the same length for , so that the total number of train data is (we will again use train data during ). When given a previously unseen input sequence , the task is for to approximate the entire output sequence . To obtain each input-output data pair, the initial conditions and of the two maps and are reset.

## III Reservoir computing

To say that has short-term memory means that the output becomes less dependent on the initial condition as is increased and the output at time becomes increasingly less dependent on input samples from much earlier times ; see Appendix Sec. A.2. To approximate this class of maps, RC exploits nonlinear dynamical systems to project the input into a reservoir state at time . If the reservoir eventually forgets its initial conditions, the state contains short-term memory of the sequential inputs. The dynamics of the reservoir is random and never requires training. The tunable parameters w appear in a readout function , which linearly combines the elements of into an output , with a bias term . The parameters w are optimized using linear regression to minimize an empirical mean squared-error between and . The combination of the reservoir and the readout function produces an input-output map .
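The readout training step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the names `fit_linear_readout` and `apply_readout` are hypothetical, and the reservoir states are random stand-ins rather than outputs of a real reservoir. It only shows that the trainable part of an RC scheme is an ordinary least-squares problem with a bias term.

```python
import numpy as np

def fit_linear_readout(states, targets):
    """Least-squares fit of targets[l] ~ states[l] @ w + w_c (hypothetical helper)."""
    # Append a column of ones so the bias term w_c is fitted jointly with w.
    design = np.hstack([states, np.ones((states.shape[0], 1))])
    params, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return params[:-1], params[-1]

def apply_readout(states, w, w_c):
    """Evaluate the linear readout on a sequence of reservoir states."""
    return states @ w + w_c

# Stand-in data: 50 time steps of a 3-dimensional reservoir state (assumption).
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(50, 3))
true_w, true_c = np.array([0.5, -1.0, 2.0]), 0.3
targets = states @ true_w + true_c

w, w_c = fit_linear_readout(states, targets)
predictions = apply_readout(states, w, w_c)
```

Because the reservoir itself is never trained, the same recorded states can be reused with a different `targets` vector for each task.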

Echo-state networks, one of the pioneering classical RC schemes, have been numerically demonstrated to achieve state-of-the-art performance in chaotic system modeling [27]. Subsequent hardware realizations of classical RC proposals suggest empirically that for certain tasks, such as spoken digit recognition, the reservoir state dimension plays a role in the RC’s task performance [21, 22, 6].

## IV Universal quantum reservoir computers

We propose to use a QR, with a view towards taking advantage of its exponentially large state space. For an -qubit QR, we define its state ’s dissipative evolution, with initial condition , as

$$\rho_l = T(u_l)\rho_{l-1} = (1-\epsilon)\left(u_l T_0 + (1-u_l) T_1\right)\rho_{l-1} + \epsilon\sigma, \tag{1}$$

where , and $\sigma$ is an arbitrary but fixed density matrix. Here $T_0$ and $T_1$ are two random but fixed completely positive trace-preserving (CPTP) maps. In particular, we can choose () with arbitrary unitaries implemented by native quantum gates of NISQ devices. High fidelity for the unitaries is not strictly required; if the noise in the system is such that it acts to replace with another CPTP map , the QR dynamics is again of the form Eq. (1). This form of QR dynamics has a natural quantum circuit interpretation; see Fig. 1(a) and Appendix Sec. D for an explanation. The input $u_l$ is encoded as the probability of applying $T_0$ or $T_1$. The term $\epsilon$ controls the speed at which the QR forgets its initial condition (i.e., the “memory” of the QR, with $\epsilon = 1$ being memoryless). Since $\rho_l$ grows exponentially in size with the number of qubits, we obtain partial information about $\rho_l$ by measuring each qubit in the Pauli $Z$ basis to obtain $\mathrm{Tr}(\rho_l Z^{(i)})$ for $i = 1, \ldots, n$, where $Z^{(i)}$ is the Pauli $Z$ operator acting on qubit $i$. We define a linear readout function

$$\bar{y}_l = h_{\mathbf{w}}(\rho_l) = \sum_{i=1}^{n} w_i \,\mathrm{Tr}\!\left(\rho_l Z^{(i)}\right) + w_c. \tag{2}$$

Eqs. (1) and (2) define a QR that implements an input-output map . More generally, our QR proposal consists of multiple non-interacting subsystems with differing numbers of qubits, similar to the proposals in [24, 26], and a more general form of the readout function can be used. Our class of QRs possesses the universal computational power for approximating nonlinear short-term memory maps. Furthermore, this universality property is invariant under stationary Markovian noise and time-invariant readout error. See Appendix Sec. A, B and C for both proofs. This is supported by experiments, demonstrating that the proposed QRs can tackle nonlinear tasks under the gate and readout error levels achievable by available superconducting NISQ hardware.
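As an illustration of Eqs. (1) and (2), the following NumPy sketch evolves a small density matrix and reads out the single-qubit Pauli-Z expectations. Several choices here are assumptions made for the example only: the maps are taken to be unitary channels with randomly drawn unitaries, the fixed state in Eq. (1) is taken to be the all-zeros computational basis state, qubit 1 is mapped to the most significant bit, and all function names are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale columns by unit-modulus phases; q stays unitary

def qr_step(rho, u, U0, U1, eps, sigma):
    """One step of Eq. (1), assuming T_j(rho) = U_j rho U_j^dagger."""
    mixed = u * (U0 @ rho @ U0.conj().T) + (1 - u) * (U1 @ rho @ U1.conj().T)
    return (1 - eps) * mixed + eps * sigma

def z_expectations(rho, n):
    """Tr(rho Z^(i)) for each qubit i, computed from the diagonal of rho."""
    probs = np.real(np.diag(rho))
    basis = np.arange(2 ** n)
    out = []
    for i in range(n):
        bit = (basis >> (n - 1 - i)) & 1        # value of qubit i in each basis state
        out.append(np.sum(probs * (1 - 2 * bit)))  # eigenvalue +1 for 0, -1 for 1
    return np.array(out)

def readout(rho, n, w, w_c):
    """Linear readout of Eq. (2)."""
    return float(z_expectations(rho, n) @ w + w_c)

n, eps = 2, 0.1
rng = np.random.default_rng(1)
U0, U1 = random_unitary(2 ** n, rng), random_unitary(2 ** n, rng)
sigma = np.zeros((2 ** n, 2 ** n), dtype=complex)
sigma[0, 0] = 1.0                                # |0...0><0...0| (assumption)

rho = sigma.copy()
features = []
for u in rng.uniform(0, 1, size=20):             # inputs in [0, 1] act as probabilities
    rho = qr_step(rho, u, U0, U1, eps, sigma)
    features.append(z_expectations(rho, n))
features = np.array(features)
```

The rows of `features` are what a linear regression would consume when fitting the readout weights; the density-matrix simulation itself scales exponentially with the number of qubits and is only practical for small examples.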

## V Experimental demonstration

Four nonlinear tasks are chosen to carefully test different computational aspects of the QR proposal. Tasks 1 and 2 test the QR’s ability to learn high-dimensional and highly nonlinear maps. Task 3 tests the short-term memory ability and Task 4 is a long-term memory map for testing the capability of the QR beyond its theoretical guarantee. For all experimental and numerical details, see Appendix Sec. E. We implement four distinct QRs on three IBM superconducting quantum processors [28]. A 4-qubit QR and a 10-qubit QR are implemented on the 20-qubit Boeblingen device; qubits with lower gate errors and longer coherence times are chosen. The 5-qubit Ourense and Vigo devices are used for two distinct 5-qubit QRs. These 5-qubit quantum devices admit simpler qubit couplings but lower gate errors than the 20-qubit Boeblingen device. Through comparison among the four QRs, we can investigate the impact of the size of QRs, the complexity of quantum circuits implementing the QR dynamics and the intrinsic hardware noise on the QRs’ approximation performance.

We require the QRs to forget initial conditions for approximating short-term memory maps. Traditionally, initial conditions are washed out with a sufficiently long input sequence until reaching a steady state. Here we bypass the washout by choosing and so that is the steady state of Eq. (1) under , meaning that we can initialize the QR circuits in . Furthermore, and should be different and hardware-efficient but sufficiently complex to produce nontrivial quantum dynamics. We choose and , where is an arbitrary rotation on single qubit [29] and is the CNOT gate with control qubit and target qubit ; see Fig. 1(b) and (c) for the circuit schematics. The gate parameters and are uniformly randomly sampled from . The numbers of layers and are sufficiently large to couple all qubits linearly while respecting the coherence limits of these devices. Owing to the more flexible qubit couplings in the Boeblingen device, circuits implementing the 4-qubit and 10-qubit QRs have more gates and random parameters than those of the 5-qubit QRs.

When restricted to pure state preparation, instead of realizing Fig. 1(a), we efficiently implement QRs through Monte Carlo sampling as in [30]. At each time step , we draw random circuits, each circuit implements and with probabilities and , respectively; otherwise the circuit is reset in with probability . We choose a sufficiently large and for a moderate short-term memory. To estimate , circuits implementing the QRs on the Boeblingen device and the 5-qubit QRs are run for and , respectively. These are chosen according to circuit execution times of the devices.
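The Monte Carlo idea can be illustrated classically with pure-state trajectories: each trajectory resets with the dissipation probability and otherwise applies one of two unitaries according to the current input, and the average over trajectories approximates the density matrix of Eq. (1). The sketch below is a state-vector simulation under assumed choices (a single qubit, a Hadamard and a real rotation as the two unitaries, hypothetical function names); it is not the hardware sampling procedure of the experiments.

```python
import numpy as np

def exact_step(rho, u, U0, U1, eps, sigma):
    """Density-matrix update of Eq. (1) with unitary channels (assumption)."""
    mixed = u * (U0 @ rho @ U0.conj().T) + (1 - u) * (U1 @ rho @ U1.conj().T)
    return (1 - eps) * mixed + eps * sigma

def monte_carlo_states(inputs, U0, U1, eps, psi0, n_traj, rng):
    """Estimate rho_l at every step by averaging n_traj pure-state trajectories."""
    psis = np.tile(psi0, (n_traj, 1))               # one row per trajectory
    estimates = []
    for u in inputs:
        r = rng.uniform(size=n_traj)
        reset = r < eps                              # probability eps
        use_u0 = (~reset) & (r < eps + (1 - eps) * u)  # probability (1 - eps) * u
        use_u1 = ~(reset | use_u0)                   # probability (1 - eps) * (1 - u)
        new = np.empty_like(psis)
        new[reset] = psi0
        new[use_u0] = psis[use_u0] @ U0.T            # row-vector form of U0 @ psi
        new[use_u1] = psis[use_u1] @ U1.T
        psis = new
        estimates.append(psis.T @ psis.conj() / n_traj)  # mean of |psi><psi|
    return estimates

rng = np.random.default_rng(0)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
c, s = np.cos(0.4), np.sin(0.4)
RY = np.array([[c, -s], [s, c]], dtype=complex)
psi0 = np.array([1, 0], dtype=complex)
sigma = np.outer(psi0, psi0.conj())
inputs, eps = [0.2, 0.9, 0.5, 0.7], 0.2

estimates = monte_carlo_states(inputs, H, RY, eps, psi0, 5000, rng)

# Compare against the exact mixed-state evolution.
rho, max_err = sigma.copy(), 0.0
for u, est in zip(inputs, estimates):
    rho = exact_step(rho, u, H, RY, eps, sigma)
    max_err = max(max_err, np.abs(est - rho).max())
```

The sampling error shrinks roughly as the inverse square root of the number of trajectories, which is why the number of sampled circuits has to be chosen together with the available execution time.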

We apply the four QRs to the four nonlinear tasks on the multi-step ahead prediction and map emulation problems. To implement the same washout as for the QRs for each target map, we inject a constant input sequence of length followed by train and test inputs uniformly randomly sampled from . This change in the input statistics leads to a transitory target output response. We remove the associated transients by discarding the first four target input-output data and the corresponding QR experimental data; see Appendix Sec. E.2 for all data. For the multi-step ahead problem, train and test time steps run from to and to , respectively. For the map emulation problem, train input-output pairs running from to are used, followed by one unseen test input-output pair with the same time steps. To harness the flexibility of the QR approach, a multi-tasking technique is used, in which the four QRs are evolved and the estimates of for all time steps are recorded once, whereas the readout parameters w are optimized independently for each task. That is, a fixed QR dynamics, with fixed gate parameter values, is exploited for multiple tasks simultaneously. We evaluate and compare the task performance of QRs using the normalized mean-squared error, , where and between prediction and target . While the success of experimental demonstration of hybrid quantum-classical algorithms often requires error mitigation techniques to reduce the effect of decoherence [31, 32], we remark that our results are obtained without any process or readout error mitigation.
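For readers who want to compute a comparable error measure, here is a small sketch of a normalized mean-squared error. The exact normalization used in the text is not recoverable from this copy, so the definition below (squared error divided by the squared deviation of the target from its mean) is an assumption, as is the function name `nmse`.

```python
import numpy as np

def nmse(prediction, target):
    """Normalized mean-squared error (assumed definition, see lead-in).

    Returns sum((prediction - target)^2) / sum((target - mean(target))^2),
    so a perfect prediction scores 0 and predicting the target mean scores 1.
    """
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    numerator = np.sum((prediction - target) ** 2)
    denominator = np.sum((target - target.mean()) ** 2)
    return numerator / denominator

perfect = nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nmse([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])
```

Under this convention, values well below 1 indicate that the readout captures structure in the target beyond its average.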

As the number of qubits increases, the 10-qubit Boeblingen QR is expected to perform better than other QRs. For the multi-step ahead prediction problem, we observe that two qubits in the 10-qubit Boeblingen QR experienced significant time-varying deviations between the experimental data and simulation results on the Qiskit simulator; see Appendix Sec. E.4 for a discussion. To remedy this issue, we set the corresponding elements of w to be zeros. The resulting 10-qubit Boeblingen QR (with NMSE0.08) outperforms other QRs with a smaller number of qubits on the first three tasks, and achieves an almost two-fold performance improvement on Task 2; see Table 1 for all NMSEs. The 10-qubit Boeblingen QR predicted outputs follow the target outputs relatively closely as shown in Fig. 2(a). The 5-qubit Ourense QR admits very simple dynamics–with consisting only of CNOTs and whereas the 5-qubit Vigo QR has more gate operations and random gate parameters. The 5-qubit Ourense QR is outperformed by the 5-qubit Vigo QR in all tasks. Considering that the Ourense and Vigo devices have similar noise characteristics and the same qubit coupling map, this suggests that the QR performance can be improved by choosing a more complex quantum circuit, in the sense of having a longer gate sequence.

The 10-qubit Boeblingen QR performs better on all tasks than the 5-qubit QRs except on Task 4. This could be due to the impact of the higher noise level in the Boeblingen device and the fact that the output sequence is generated by a map that is not known to have short-term memory; see Appendix Sec. E.5 for the hardware specifications. Our universal class of QRs can exploit the property of spatial multiplexing as initially proposed in Ref. [24]; see also [26] and Fig. 3 for an illustration. Outputs of distinct and non-interacting 5-qubit QRs can be combined linearly to harness the computational features of both members. Since the combined Ourense and Vigo devices have 10 qubits overall, as with the 10-qubit Boeblingen QR, but with lower noise levels, it would be meaningful to combine the 5-qubit Vigo and Ourense QRs via spatial multiplexing on the map emulation problem. The results of this multiplexing are summarized in Table 2.

The combination of two 5-qubit QRs as discussed above achieves for the four tasks without any readout or process error mitigation. The predicted multiplexed QR outputs corresponding to the unseen inputs follow the target outputs relatively closely as shown in Fig. 2(b). Without spatial multiplexing, the 5-qubit Ourense or the 5-qubit Vigo QR shows worse performance in the first three tasks; see Table 2. The spatially multiplexed 5-qubit QR combines computational features from the constituent QRs and can achieve comparable performance to the individual members as well as gaining an almost two-fold performance boost on Task 2. We anticipate that spatial multiplexing of QRs with more complex circuit structures and a larger number of qubits can lead to further performance improvements.
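Spatial multiplexing with a linear readout amounts to concatenating the measured features of the member reservoirs and fitting a single set of readout weights. The sketch below uses random stand-in features (not experimental data) and hypothetical helper names; it shows only the mechanics and the fact that, on the training data, the combined least-squares fit cannot be worse than either member alone. It makes no claim about test performance.

```python
import numpy as np

def fit_and_predict(train_X, train_y, test_X):
    """Fit a linear readout with bias by least squares and evaluate it on test_X."""
    add_bias = lambda X: np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(add_bias(train_X), train_y, rcond=None)
    return add_bias(test_X) @ w

rng = np.random.default_rng(0)
feat_a = rng.uniform(-1, 1, (40, 5))    # stand-in for <Z> features of reservoir A
feat_b = rng.uniform(-1, 1, (40, 5))    # stand-in for <Z> features of reservoir B
target = rng.uniform(0, 1, 40)          # stand-in target output sequence

combined = np.hstack([feat_a, feat_b])  # spatial multiplexing: stack the features

def train_mse(X):
    """In-sample mean-squared error of the fitted readout."""
    return np.mean((fit_and_predict(X, target, X) - target) ** 2)

mse_a, mse_b, mse_ab = train_mse(feat_a), train_mse(feat_b), train_mse(combined)
```

The in-sample guarantee follows because the column space of the combined design matrix contains that of each member; whether the gain carries over to unseen inputs depends on the task and on how many training samples are available.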

## VI Conclusion

We propose the first universal class of quantum reservoir computers that utilizes dissipative quantum dynamics and is readily implementable on available noisy gate-model quantum hardware for temporal information processing. Our approach harnesses noisy quantum circuits of arbitrary depths, signifying the potential of these devices beyond binary data classification or static regression applications. The theoretical analysis is supported by proof-of-concept experiments on superconducting quantum processors, demonstrating that small-scale noisy quantum reservoirs can perform complex nonlinear tasks in the absence of readout and process error mitigation techniques.

Our approach is scalable in the number of qubits by offloading exponentially costly computations to noisy quantum systems and utilizing classical algorithms with linear cost to process temporal data. Moreover, when implemented on NISQ devices, the micro-second timescale for the evolution of the quantum reservoir suggests its potential for real-time fast signal processing tasks. Guided by our theory, we applied the spatial multiplexing technique initially proposed in [24], and demonstrated experimentally that exploiting distinct computational features of multiple small noisy quantum reservoirs can lead to a computational boost. As NISQ hardware becomes increasingly accessible and the noise level is continually reduced, we anticipate that the quantum reservoir approach will find useful applications in a broad range of scientific disciplines that employ time series modeling and analysis. We are also optimistic for useful applications to be possible even for a noise level above the threshold for continuous quantum error correction.

## VII Acknowledgments

The authors thank Keisuke Fujii for an insightful discussion. NY is supported by the MEXT Quantum Leap Flagship Program Grant Number JPMXS0118067285.

## Appendix A Universality for nonlinear fading memory maps

We first define notation for the rest of this section. Let be the set of infinite sequences such that for all . Let and be subsets of for which the indices are restricted to and , respectively. For any complex matrix , is the Schatten -norm for some . For any operator , the induced operator norm is . Let denote the set of density operators.

Consider a map that maps an infinite input sequence to a real infinite output sequence . We say that is -fading memory if there exists a decreasing sequence with , such that for any , we have whenever . Here is the output sequence at time . We also require to be causal and time-invariant as in Ref. [26], meaning that the output of at time depends only on the input up to and including that time, and its outputs are invariant under time-shifts. Now we are interested in approximating with a time-invariant fading memory map produced by a quantum reservoir computer.

### A.1 The convergence property

Since is fading memory, the map must also forget its initial condition . This is the convergence property [33] or the echo-state property [27]. We now show that the QR dynamics given by Eq. (1) in the main article is convergent with respect to any . For any , and ,

$$\|T(u_l)(\rho - \sigma)\|_1 = (1-\epsilon)\,\|(u_l T_0 + (1-u_l) T_1)(\rho - \sigma)\|_1 \le (1-\epsilon)\,\|\rho - \sigma\|_1, \tag{3}$$

where the last inequality follows from [34, Theorem 9.2] and the fact that the convex combination of CPTP maps is again a CPTP map. Now let and be two arbitrary initial density operators, using the inequality Eq. (3) times, we have

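$$\|\rho_{1,k} - \rho_{2,k}\|_1 = \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T(u_l)\right)(\rho_{1,0} - \rho_{2,0}) \right\|_1 \le (1-\epsilon)^k \,\|\rho_{1,0} - \rho_{2,0}\|_1 \le 2(1-\epsilon)^k,$$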

where is the time-composition of from right to left. This implies that there exists a steady state for , depending only on the input sequence , such that for any initial condition , we have

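$$\lim_{k\to\infty} \|\rho_k - \rho_*\|_1 = \lim_{k\to\infty} \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T(u_l)\right)\rho_0 - \rho_* \right\|_1 = 0.$$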
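The contraction argument can be checked numerically: two different initial density operators driven by the same input sequence should approach each other in trace norm at least as fast as the geometric bound derived above. The sketch below assumes unitary channels with randomly drawn unitaries and a random fixed state; the helper names are hypothetical. The trace norm is computed as the nuclear norm of the difference.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def random_density(dim, rng):
    """Random density matrix: positive semidefinite with unit trace."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho)

def step(rho, u, U0, U1, eps, sigma):
    """Eq. (1) with T_j(rho) = U_j rho U_j^dagger (assumption)."""
    mixed = u * (U0 @ rho @ U0.conj().T) + (1 - u) * (U1 @ rho @ U1.conj().T)
    return (1 - eps) * mixed + eps * sigma

rng = np.random.default_rng(2)
dim, eps = 4, 0.15
U0, U1 = random_unitary(dim, rng), random_unitary(dim, rng)
sigma = random_density(dim, rng)
rho1, rho2 = random_density(dim, rng), random_density(dim, rng)

distances, bounds = [], []
for k, u in enumerate(rng.uniform(0, 1, size=30), start=1):
    rho1 = step(rho1, u, U0, U1, eps, sigma)
    rho2 = step(rho2, u, U0, U1, eps, sigma)
    distances.append(np.linalg.norm(rho1 - rho2, ord="nuc"))  # trace norm
    bounds.append(2 * (1 - eps) ** k)                          # 2 (1 - eps)^k
```

Plotting `distances` against `bounds` on a logarithmic axis makes the role of the dissipation parameter visible: larger values forget the initial condition faster.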

We now define a general form of the quantum reservoir (QR) dynamics, in which each QR consists of non-interacting subsystems initialized in a product state of the subsystems, where each subsystem evolves according to the completely positive trace-preserving map (CPTP) with the same form as Eq. (1) in the main article. That is, at any time the density operator of the QR is governed by the dynamics,

$$\rho_l = \bigotimes_{j=1}^{N} \rho^{(j)}_l = T(u_l)\rho_{l-1} = \bigotimes_{j=1}^{N} T^{(j)}(u_l)\rho^{(j)}_{l-1}. \tag{4}$$

Here the -th subsystem with qubits undergoes the dissipative evolution,

$$\rho^{(j)}_l = T^{(j)}(u_l)\rho^{(j)}_{l-1} = (1-\epsilon_j)\left(u_l T^{(j)}_0 + (1-u_l) T^{(j)}_1\right)\rho^{(j)}_{l-1} + \epsilon_j K_{\sigma_j}, \tag{5}$$

where , is the -th subsystem density operator at time and is an arbitrary but fixed density operator. For any we have , therefore the CPTP map maps density operators to the constant density matrix . The CPTP maps and are arbitrary but fixed and input-independent. To show that is again convergent when the subsystems are initialized in a product state as in the above, we can apply the same argument as for [35, Lemma 5].

Consider two CPTP maps and of the form Eq. (4). Let and be two arbitrary initial product states. Then,

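$$\begin{aligned}
\|\rho_{1,k}\otimes\sigma_{1,k} - \rho_{2,k}\otimes\sigma_{2,k}\|_1
&= \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\otimes T^{(2)}(u_l)\right)(\rho_{1,0}\otimes\sigma_{1,0} - \rho_{2,0}\otimes\sigma_{2,0}) \right\|_1 \\
&\le \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\otimes T^{(2)}(u_l)\right)(\rho_{1,0}\otimes\sigma_{1,0} - \rho_{2,0}\otimes\sigma_{1,0}) \right\|_1 \\
&\quad + \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\otimes T^{(2)}(u_l)\right)(\rho_{2,0}\otimes\sigma_{1,0} - \rho_{2,0}\otimes\sigma_{2,0}) \right\|_1 \\
&= \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\right)(\rho_{1,0} - \rho_{2,0}) \otimes \left(\overleftarrow{\prod}_{l=1}^{k} T^{(2)}(u_l)\right)\sigma_{1,0} \right\|_1 \\
&\quad + \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\right)\rho_{2,0} \otimes \left(\overleftarrow{\prod}_{l=1}^{k} T^{(2)}(u_l)\right)(\sigma_{1,0} - \sigma_{2,0}) \right\|_1 \\
&= \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(1)}(u_l)\right)(\rho_{1,0} - \rho_{2,0}) \right\|_1 \|\sigma_{1,k}\|_1 + \left\| \left(\overleftarrow{\prod}_{l=1}^{k} T^{(2)}(u_l)\right)(\sigma_{1,0} - \sigma_{2,0}) \right\|_1 \|\rho_{2,k}\|_1 \\
&\le 2(1-\epsilon_1)^k + 2(1-\epsilon_2)^k.
\end{aligned}$$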

Therefore, is again convergent with respect to all .

### A.2 The fading memory property

Now we associate a readout function to Eq. (4). Let be the total number of qubits, define

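$$\bar{y}_l = h_{\mathbf{w}}(\rho_l) = \sum_{d=1}^{R}\,\sum_{i_1=1}^{n}\,\sum_{i_2=i_1+1}^{n}\cdots\sum_{i_n=i_{n-1}+1}^{n}\;\sum_{r_{i_1}+\cdots+r_{i_n}=d} w^{r_{i_1},\ldots,r_{i_n}}_{i_1,\ldots,i_n}\,\langle Z^{(i_1)}\rangle_l^{r_{i_1}}\cdots\langle Z^{(i_n)}\rangle_l^{r_{i_n}} + w_c, \tag{6}$$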

where . This readout function is a multivariate polynomial in variables and is its degree. When we have a linear readout function, which is used in all experiments.
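A polynomial readout of this kind can be realized by expanding the measured expectations into monomial features and then fitting ordinary linear weights on the expanded features. The sketch below enumerates all monomials of total degree 1 up to a chosen maximum degree using `itertools.combinations_with_replacement`; the function name and the stand-in data are hypothetical, and the enumeration order of the monomials is an arbitrary choice for the example.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_features(z, max_degree):
    """Monomials of total degree 1..max_degree in the columns of z.

    z has shape (time_steps, n); each column holds one expectation value
    per time step. The constant term is left to a separate bias weight.
    """
    z = np.asarray(z, dtype=float)
    columns = []
    for degree in range(1, max_degree + 1):
        # Each multiset of column indices of size `degree` is one monomial.
        for idx in combinations_with_replacement(range(z.shape[1]), degree):
            columns.append(np.prod(z[:, list(idx)], axis=1))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
z = rng.uniform(-1, 1, size=(6, 3))     # stand-in expectations for n = 3 qubits
linear = polynomial_features(z, 1)      # degree 1 reproduces the linear readout
quadratic = polynomial_features(z, 2)   # 3 linear + 6 quadratic monomials
```

With degree 1 the expansion returns the original features, matching the linear readout used in the experiments; higher degrees trade more readout parameters for a richer function class.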

The quantum reservoir dynamics in Eq. (4) and readout function in Eq. (6) define a unique functional depending on and ; see [11, 12, 26] for a detailed discussion. For any and any initial condition , the functional is given by

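$$\overline{M}_{(T,h_{\mathbf{w}})}(u) = h_{\mathbf{w}}\!\left(\left(\overrightarrow{\prod}_{j=0}^{\infty} T(u_{-j})\right)\rho_{-\infty}\right),$$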

where and the limit is point-wise. We can restate the fading memory property in terms of continuity of with respect to a certain norm. Given a decreasing sequence with and any , define a weighted norm . The map is -fading memory if it is continuous in .

We now show that is -fading memory for any decreasing sequence . Using the same argument in [26, Lemma 3], it follows that is -fading memory if is continuous with respect to the inputs for all . In fact, we show that is uniformly continuous. Let and ,

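$$\begin{aligned}
\|T^{(j)}(x) - T^{(j)}(y)\|_{1-1}
&= \sup_{A\in\mathbb{C}^{2^{n_j}\times 2^{n_j}},\ \|A\|_1 = 1} \left\|\left(T^{(j)}(x) - T^{(j)}(y)\right)A\right\|_1 \\
&= (1-\epsilon_j)\,|x-y| \sup_{A\in\mathbb{C}^{2^{n_j}\times 2^{n_j}},\ \|A\|_1 = 1} \left\|T^{(j)}_0(A) - T^{(j)}_1(A)\right\|_1 \\
&\le (1-\epsilon_j)\,|x-y|\left(\left\|T^{(j)}_0\right\|_{1-1} + \left\|T^{(j)}_1\right\|_{1-1}\right) \\
&\le 2(1-\epsilon_j)\,|x-y|,
\end{aligned}$$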

where the last inequality follows from [36, Theorem 2.1]. We remark that [26, Lemma 3] is stated with respect to the Schatten norm, but the same argument holds for .

### A.3 The universality property

Now consider the family of maps arising from different numbers of qubits, different dynamics and different readout parameters w and degree in . To show that is universal for approximating nonlinear -fading memory with any decreasing sequence , we apply the Stone-Weierstrass Theorem [37, Theorem 7.3.1] to show that is dense in the set of all continuous functions defined on . It has been shown that the space is a compact metric space [12, Lemma 2]. We now state the Stone-Weierstrass Theorem.

###### Theorem 1 (Stone-Weierstrass)

Let be a compact metric space and be the set of real-valued continuous functions defined on . If a subalgebra of contains the constant functions and separates points of , then is dense in .

That the family forms a polynomial algebra follows from [26, Lemma 5] and the observation that for any QR dynamics and of two non-interacting subsystems (indexed by the superscript) of the form Eq. (4), we have is again of the form Eq. (4) for any density operator of subsystem . Furthermore, is again convergent when initialized in a product state of the subsystems. Therefore, the family forms a polynomial algebra consisting of -fading memory maps.

Lastly, constant functions can be obtained by setting . It remains to show that separates points in . That is, for any distinct with for at least one , we need to find a map such that . We show that we can construct a single-qubit quantum reservoir with this property.

Consider a single-qubit quantum reservoir with a linear readout function (). For the rest of this proof, we drop the subsystem index. This quantum reservoir consists of one system qubit and one ancilla qubit denoted as . Choose the dynamics to be

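$$\rho_l = T(u_l)\rho_{l-1} = (1-\epsilon)\left(u_l\,\mathrm{Tr}_a\!\left(e^{-iH}(\rho_{l-1}\otimes\rho_a^{0})e^{iH}\right) + (1-u_l)\,\mathrm{Tr}_a\!\left(e^{-iH}(\rho_{l-1}\otimes\rho_a^{1})e^{iH}\right)\right) + \epsilon K_{I/2}, \tag{7}$$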

where for , denotes the partial trace over ancilla and . The map is a CPTP map defined as for any . The Hamiltonian is of the Ising type , where and are the Pauli and operators on qubit , with being the ancilla qubit.

We order an orthogonal basis for as . The matrix representation of the CPTP map Eq. (7) is

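$$\overline{T}(u_l) = |00\rangle\langle 00| + (1-\epsilon)\begin{pmatrix} 0 & 0 & 0 & 0 \\ \sin^2(2J)(2u_l-1) & \cos^2(2J) & 0 & 0 \\ 0 & 0 & \cos(2J)\cos(2\alpha) & -\cos(2J)\sin(2\alpha) \\ 0 & 0 & \cos(2J)\sin(2\alpha) & \cos(2J)\cos(2\alpha) \end{pmatrix}.$$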

Since Eq. (7) is convergent, we can choose any initial condition with the corresponding vector representation . Taking a linear readout function, for , the quantum reservoir implements a functional

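$$\overline{M}_{(T,h_{\mathbf{w}})}(u) = 2w_1\left[\left(\overrightarrow{\prod}_{j=0}^{\infty}\overline{T}(u_{-j})\right)\overline{\rho}_{-\infty}\right]_2 + w_c,$$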

where is the second element of the vector corresponding to . Now given two distinct inputs , suppose that . Then choose such that and therefore,

$$\overline{M}_{(T,h_{\mathbf{w}})}(u) - \overline{M}_{(T,h_{\mathbf{w}})}(v) = 2w_1(1-\epsilon)(u_0 - v_0) \neq 0.$$

Suppose , note that in general

$$\overline{M}_{(T,h_{\mathbf{w}})}(u) = w_1\sin^2(2J)(1-\epsilon)\sum_{j=0}^{\infty}\left((1-\epsilon)\cos^2(2J)\right)^j(2u_{-j}-1).$$

Choose and such that . Then the above is a convergent power series and the subtraction is well-defined:

$$\overline{M}_{(T,h_{\mathbf{w}})}(u) - \overline{M}_{(T,h_{\mathbf{w}})}(v) = 2w_1\sin^2(2J)(1-\epsilon)\sum_{j=0}^{\infty}\left((1-\epsilon)\cos^2(2J)\right)^j(u_{-j}-v_{-j}).$$

The above is a power series of the form

$$f(\theta) = 2w_1\sin^2(2J)(1-\epsilon)\sum_{j=0}^{\infty}\theta^j(u_{-j}-v_{-j}),$$

where has a nonzero radius of convergence and is non-constant since and . Furthermore, since we assume that , we have . Invoking [38, Theorem 3.2], there exists such that for all . This concludes the proof for separation of points. The universality of now follows from the Stone-Weierstrass Theorem.
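The series form of the single-qubit functional can be cross-checked against a step-by-step recursion. The sketch below is a finite-history illustration under stated assumptions: the input history is truncated to a finite list, the recursion starts from zero, the bias weight is omitted, and the recursion itself is read off from the second row of the matrix representation with the first vector component held at 1/2. The function names are hypothetical.

```python
import numpy as np

def readout_recursive(inputs, J, eps, w1):
    """Iterate s_l = (1 - eps) * (cos^2(2J) * s_{l-1} + w1 * sin^2(2J) * (2 u_l - 1))."""
    s = 0.0
    for u in inputs:                      # inputs are ordered oldest first
        s = (1 - eps) * (np.cos(2 * J) ** 2 * s
                         + w1 * np.sin(2 * J) ** 2 * (2 * u - 1))
    return s

def readout_series(inputs, J, eps, w1):
    """Truncated power series in theta = (1 - eps) * cos^2(2J)."""
    theta = (1 - eps) * np.cos(2 * J) ** 2
    recent_first = np.asarray(inputs, dtype=float)[::-1]  # recent_first[j] = u_{-j}
    j = np.arange(len(recent_first))
    return (w1 * np.sin(2 * J) ** 2 * (1 - eps)
            * np.sum(theta ** j * (2 * recent_first - 1)))

J, eps, w1 = 0.3, 0.2, 1.5
u = [0.1, 0.8, 0.4, 0.9, 0.3]
v = [0.1, 0.8, 0.4, 0.9, 0.6]            # differs from u only in the latest input

out_u_rec = readout_recursive(u, J, eps, w1)
out_u_ser = readout_series(u, J, eps, w1)
gap = readout_series(u, J, eps, w1) - readout_series(v, J, eps, w1)
expected_gap = 2 * w1 * np.sin(2 * J) ** 2 * (1 - eps) * (u[-1] - v[-1])
```

When two histories differ only in the most recent input, the difference reduces to the leading term of the series, which is nonzero whenever the sine factor and the readout weight are nonzero; this is the simplest instance of the separation-of-points argument.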

## Appendix B Robustness against stationary Markovian noise

The quantum reservoir model proposed is robust against stationary Markovian noise. For the -th subsystem in the QR dynamics Eq. (4), a stationary Markovian noise process during some time interval , where is the time step and , can be modeled as a CPTP map for all . The -th subsystem in the QR dynamics Eq. (4) under this noise is

where is again a CPTP map for and is again a density operator. That is, the universal family is invariant and remains universal under stationary Markovian noise.

## Appendix C Robustness against time-invariant readout error

The universal class of QRs is robust against time-invariant readout error whenever a linear readout function is used. Let be the computational basis for an -qubit system, with . The readout error is characterized by a measurement calibration matrix whose -th element is the probability of measuring the state given that the state is prepared in the state .

We employ the readout error correction method described in Ref. [4]. For an -qubit QR, at each time step , we execute calibration circuits with each circuit initialized in one of the computational basis elements. The outcomes are used to create the measurement calibration matrix . The readout error at time step is corrected by applying the pseudo-inverse of to the measured outcomes from the experiments.

For all experiments, the measurement outcomes are stored as the count of measuring each basis element in . Let , where is the count of measuring at time step . Let , where is the finite-sampled approximation of for . Then we have , where is a linear transformation. After applying readout error correction, we have , where is the pseudo-inverse of . To optimize the readout function parameters w, collect all measurement data in a matrix so that , where is the sequence length. The linear output of the quantum reservoir computer is , where w includes the bias term . Append a corresponding row and column to to account for the bias term. Suppose the readout error is time-invariant, then for . The quantum reservoir computer output after readout error correction is . Assume that has all rows linearly independent, then ordinary least squares yields . Now given test data with readout error correction,

$$v_{\text{test}}A^{+}B\mathbf{w}' = v_{\text{test}}A^{+}AB\mathbf{w} = v_{\text{test}}B\mathbf{w}.$$

Therefore, the QR predicted output and thus the universal family are invariant under time-invariant readout error.
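The underlying linear-algebra fact can be illustrated numerically: if a fixed invertible matrix distorts the measured features at every time step, an ordinary least-squares readout fitted on the distorted features produces the same predictions as one fitted on the clean or corrected features. The sketch below uses generic random features and a simple invertible stochastic matrix as a stand-in for the calibration matrix; these choices and the helper name `ols_predict` are assumptions for the example, not the experimental calibration data.

```python
import numpy as np

def ols_predict(train_X, train_y, test_X):
    """Fit a linear readout with bias by least squares, then predict on test_X."""
    add_bias = lambda X: np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(add_bias(train_X), train_y, rcond=None)
    return add_bias(test_X) @ w

rng = np.random.default_rng(3)
m = 4                                            # number of measured features
train_V = rng.uniform(0, 1, (30, m))             # stand-in clean training features
test_V = rng.uniform(0, 1, (10, m))              # stand-in clean test features
train_y = rng.uniform(0, 1, 30)                  # stand-in target outputs

# Time-invariant readout error: a fixed invertible stochastic matrix (assumption).
p = 0.1
A = (1 - p) * np.eye(m) + p * np.ones((m, m)) / m
A_pinv = np.linalg.pinv(A)

clean = ols_predict(train_V, train_y, test_V)
uncorrected = ols_predict(train_V @ A, train_y, test_V @ A)
corrected = ols_predict(train_V @ A @ A_pinv, train_y, test_V @ A @ A_pinv)
```

The equality of `clean`, `uncorrected` and `corrected` relies on the distortion being the same for training and test data and on the matrix being invertible; a readout error that drifts over time would break this invariance.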

## Appendix D Quantum circuit interpretation of the quantum reservoir dynamics

We provide an explanation for the quantum circuit interpretation of the QR dynamics given by Eq. (1) and Fig. 1(b) in the main article, with