QSW_MPI: a framework for parallel simulation of quantum stochastic walks

03/05/2020 ∙ by Edric Matwiejew, et al. ∙ 0

QSW_MPI is a python package developed for the modeling of quantum stochastic walks, a generalization of the classical random walk and continuous-time quantum walk in the Lindblad formalism. This model allows for the study of a wide range of Markovian open quantum systems subject to varying degrees of incoherent scattering. Consisting of a python interface built on parallelized Fortran libraries utilizing sparse data structures; QSW_MPI is scalable to massively parallel computers, making possible the simulation of walks on graphs with many thousands of vertices. QSW_MPI also provides explicit support for an extension of the standard quantum stochastic walk model to the study of non-Hermitian absorption and emission processes.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.


  • (1) T. Oliphant, NumPy: A guide to NumPy, 2006, published: USA: Trelgol Publishing.
  • (2)

    E. Jones, T. Oliphant, P. Peterson, SciPy: Open source scientific tools for Python (2001).

  • (3) L. Dalcìn, R. Paz, M. Storti, J. D’Elia, MPI for Python: Performance improvements and MPI-2 extensions, Journal of Parallel and Distributed Computing 68 (5) (2008) 655–662. doi:10.1016/j.jpdc.2007.09.005.
  • (4) A. A. Hagberg, D. A. Schult, P. J. Swart, Exploring Network Structure, Dynamics, and Function using NetworkX, in: G. Varoquaux, T. Vaught, J. Millman (Eds.), Proceedings of the 7th Python in Science Conference, Pasadena, CA USA, 2008, pp. 11 – 15.
  • (5) J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering 9 (3) (2007) 90–95. doi:10.1109/MCSE.2007.55.
  • (6) A. Collette, Python and HDF5, O’Reilly, 2013.

1 Introduction

The exploration of nanoscale systems and quantum information are at the core of a new generation of technologies, including quantum computation and molecular-scale electronics. This necessitates the development of efficient simulation software capable of modelling the dynamics of these systems in their practical application i.e. when subject to decoherence. To this effort we have developed QSW_MPI edric_qsw_mpi; a package designed for the efficient simulation of Quantum Stochastic Walks (QSWs) on both workstations and massively parallel computers.

Walk based models describe the probabilistic evolution of a system consisting of discrete sites linked by a coupling potential, which together constitute a network (or graph). These include the Continuous-Time Random Walk (CTRW) and its quantum analogue, the Continuous-Time Quantum Walk (CTQW). Such models have been applied to a wide range of physical and informatic systems. For example, their natural correspondence to the tight binding model in solid state physics has seen their application to the study of quantum and classical energy transport in molecular systems zhang_forster_2016; mulken_continuous-time_2011. Elsewhere in the rapidly developing field of quantum computing, quantum-walk based algorithms have been developed which are exponentially faster than their classical counterparts.

Developed by Whitfield et al.in 2010, QSWs described a quantum walk subject to an external environment, whose decohering effects are derived axiomatically from the underlying network. The result of this is a generalization of the CTRW and CTQW, which allows for the exploration of dynamics in a continuum from the quantum to classical regimes.

Since their introduction, QSWs have been implemented experimentally govia_quantum_2017

, a recent example being their application to the design of artificial neurons

tang_experimental_2019. They have also been explored as a basis for quantum algorithms. A QSW-based variant of the PageRank search engine algorithm exhibited final distributions which broke undesirable degeneracies present in the classical version, while also speeding up the rate of convergence sanchez-burillo_quantum_2012; loke_comparing_2015.

One of the most notable applications has been the use of QSWs to describe quantum assisted transport through the light harvesting complexes of photosynthetic bacteria mohseni_environment-assisted_2008. This extended the QSW model to include the non-unitary processes of absorption into and emission from the network. These processes have since been formally introduced to the QSW model and applied to the study of transport in monomers, dimers and topologically disordered networks schijven_quantum_2014.

The established expressiveness and flexibility of the QSW model thus motivates making possible its application to systems of greater complexity. To this end, software packages for the simulation of QSWs have been developed using Mathematica and the Julia programming language falloon_qswalk:_2017; glos_qswalk.jl:_2019 falloon_reply_2019. They both offer a user friendly interface, but are limited in the simulated network size due to memory constraints and their reliance on single-process linear algebra libraries.

QSW_MPI addresses these limitations by taking a distributed memory approach to the construction of the QSW super-operator and the calculation of the system evolution via matrix exponentiation. This takes the form of subroutines contained in Fortran libraries with which the user interacts using a python interface; taking advantage of the ubiquity of the interpreted python language and the speed afforded by highly optimized Fortran compilers. This allows a user to write simulations appropriate for execution on massively parallel computers with minimal background in programming or parallel computation. An additional novel feature of this package is its explicit support for the inclusion of absorption and emission process through easily defined modifications to the underlying network structure.

This paper thus proceeds as follows. In Section 2 the mathematical framework of QSWs is introduced, along with its extension to the modeling of non-Hermitian transport processes. Section 3 provides an overview of the computational methods utilized in QSW_MPI. This is followed by an overview of the software package, with particular attention given to its installation and usage. Validation and performance of the QSW_MPI is discussed in Section 5, and concluding statements given in Section 6.

2 Theory

This section provides a self-contained overview of the mathematical formalism underpinning a QSW. The starting point of this is the definition of the CTRW and CTQW on networks, followed by an overview of the master equation approach to the description of Markovian open systems. From this the QSW master equation is then introduced, which unifies the CTRW and CTQW models under a density theoretic framework. The practical extension of this equation to the inclusion of non-Hermitian absorption and emission process is then discussed. We conclude by presenting the vectorized form of the QSW master equation. This produces a structurally simplified form of the Master equation, the numerical approximation of which is the task at hand.

2.1 Networks

A network (or ‘graph’) is defined as an object comprised of vertices (or ‘sites’) connected by a set of edges . This is represented by an adjacency matrix, ,


where describes the magnitude of connection between two vertices and falloon_qswalk:_2017. Edges of form are known as self loops with a graph containing no self loops being referred to as a ‘simple’ graph. The case where encodes directionality into the graph structure, such graphs being referred to as ‘directed’. Conversely, graphs for which all are referred to as ‘undirected’. The sum total of the outgoing edge weights from vertex ,


is termed the vertex ‘out degree’falloon_qswalk:_2017.

2.2 Continuous-Time Classical Random Walks

A CTRW describes the probabilistic evolution of a system (walker) though a parameter space as a continuous function of time schijven_quantum_2014

. Most typically, CTRWs refer to a type of Markov process. This describes a scenario where the future state of a system depends only on its current state. Heuristically, one might describe such systems as having a ‘short memory’. Under this condition a CTRW over a network is described by a system of first order ordinary differential equations,


where element of

is the probability of the walker being found at vertex

of the network, and has the solution which satisfies kampen_n._g._stochastic_2007. is the transition matrix derived from ,


where the off-diagonal elements represent the probability flow along an edge from vertex to vertex , while the diagonal elements account for the total outflow from vertex per unit time. Scalar is the system wide network transition rate falloon_qswalk:_2017.

2.3 Continuous-Time Quantum Walks

A Continuous-Time Quantum Walk (CTQW) is constructed by mapping to an -dimensional Hilbert space where the set of its vertices form an orthonormal basis. The matrix elements of the system Hamiltonian are then equal to the classical transition matrix (). In place of , the evolution of the state vector is considered, the dynamics of which are governed by the Schrödinger equation falloon_qswalk:_2017,


which has the formal solution when is time-independent111In atomic units where and .. The probability associated with vertex at time is then given by .

While Equations 3 and 5 appear superficially similar, there are a number of fundamental differences between the two processes. Firstly, describes a complex probability amplitude, meaning that its possible paths may interfere. Secondly, the Hermiticity requirement on needed to maintain unitary (probability preserving) evolution of the system dictates that must be an undirected graph falloon_qswalk:_2017; schijven_quantum_2014.

2.4 Markovian Open Quantum Systems

A density matrix,


describes a statistical ensemble of quantum states, , each with an associated probability and . The case where is non-zero for more than one is termed a ‘mixed state’ while the case of only one non-zero is termed a ‘pure state’.

Density matrices satisfy: , , (with equality holding for only pure states) and (where is a quantum operator) falloon_qswalk:_2017. Diagonal elements represent the probability density at a given vertex and are termed ‘populations’, while off-diagonal elements describe phase coherence between vertices and .

The dynamics of are given by the Liouville-von Neumann equation,


which is the density theoretic equivalent of the Schrödinger equation (Equation 5) breuer_theory_2009.

Consider a system, , coupled to an external reservoir (or ‘bath’), . The Hilbert space of is given by breuer_theory_2009,


where and are the Hilbert spaces of and . is referred to as an ‘open’ system, while is closed in the sense that its dynamics can be described unitarily. Under the conditions that the evolution of S is Markovian with no correlation between S and B at t = 0, and given of finite dimensions . The dynamics of S are described by a generalization of Equation 7: the Kossakowski-Lindblad quantum master equation breuer_theory_2009,


where is not necessarily as it may incorporate unitary dynamics in both and ; the Lindblad operators span the Liouville space and the scalars . The reduced density operator

is formed by tracing out the degrees of freedom associated with B. Equation

9 is invariant under unitary transformations of the Lindblad operators, allowing for construction of a wide range of phenomenological models schijven_quantum_2014.

It is useful to introduce the shorthand,


which are referred to as a ‘dissipator’. Inclusion of these terms results in decoherence and relaxation effects in the dynamics of , observed mathematically as decay in the off-diagonal coherences of to zero with sufficient . Importantly, this has the following consequence: processes obeying the master equation have a well defined equilibrium state, which is not the case for unitary dynamics schijven_quantum_2014.

2.5 Quantum Stochastic Walks

2.5.1 Standard Definition

A QSW on a graph is derived from Equation 10 by defining in the basis of vertex states, , and constructing a unitary Hamiltonian from the adjacency matrix, , of an arbitrary ,


and the Lindblad operators from its transition matrix (Equation 4),


where . Each then describes an incoherent scattering channel along a directed graph edge. Importantly, this allows for the encoding of directness resulting from the edge weights of into the walk’s dynamics falloon_qswalk:_2017.

Finally, a scalar decoherence parameter is introduced whitfield_quantum_2010. This allows for the model to be easily tuned to explore a continuum of mixed quantum and classical dynamics. The standard form of a QSW is then,


with denoted as and for all dissipator terms. At , Equation 13 reduces to a CTQW obeying the Liouville-von Neumann equation (Equation 7) and, at , the density-matrix equivalent of the CTRW equation (Equation 3) is obtained.

2.5.2 Extension to Absorption and Emission

The QSW model naturally facilitates the modelling of absorption and emission processes through the inclusion of Lindblad operators describing scattering channels that do not have a counterpart in the coherent Hamiltonian. The addition of these dynamics to the QSW framework have been derived formally schijven_quantum_2014. With the QSW operators in their matrix representations, this is practically achieved by forming the Lindblad operators from an augmented adjacency matrix, , containing and the additional channels as non-symmetric edge weights. Here, the absorption channels (sources) are,


and emission channels (sinks),


where with equal to plus the total number of sources and sinks, and is the absorption rate at vertex and is the emission rate at vertex . A vertex in the non-augmented network to which a source is attached is referred to as an absorption vertex, while a vertex to which an emission channel is attached is referred to as an emission vertex.

For example, consider a dimer graph shown in Figure 1 on which absorption is modeled at vertex 1 () and emission at vertex 2 (). The corresponding QSW has formed by adjacency matrix , the unaugmented

adjacency padded to maintain dimensional consistency, and its Lindblad operators derived from

, where

Figure 1: A dimer graph with a source () attached to vertex 1 and a sink () attached to vertex 2. Note that the absorption and emission channels are unidirectional.

As such, a QSW incorporating both absorptive and emissive processes can be succinctly expressed as,


with being of dimensions .

2.5.3 Vectorization of the Master Equation

Equation 17 may be recast as a system of first order differential equations through its representation in a Liouville space breuer_theory_2009. This process, termed ‘Vectorization’, makes use of the identity to obtain a form of the QSW master equation consisting of a linear sum in its Hamiltonian and Lindblad operators. Meaning that its equation of motion,


has the solution




and is related to by the mapping . is referred to as a ‘super-operator’ as its action on an operator returns another operator.

3 Computational Methods

QSW_MPI has been developed primarily for use on distributed memory systems, computational clusters consisting of multiple networked CPUs each with their own local memory. This strategy affords a much higher degree of parallelization than is possible with a threaded model whereby multiple tasks run on the same CPU with shared memory access noauthor_mpi_nodate. Parallelization was achieved using the Message Passing Interface (MPI) protocol, a well established and highly portable standard for distributed memory computation. Essentially, MPI runs multiple copies of a given program over an MPI communicator, in which each process (or node) is identified sequentially by its rank. Communication occurs between these isolated nodes via the passing of messages as instructed by directives placed in code using the MPI API.

Overall, the Fortran libraries developed for QSW_MPI consist of approximately 45 subroutines comprised of over 3000 lines of code, most of which are optimized for parallel execution. This section thus provides a high level overview of approaches and design considerations used in the development of the package. We being with a discussion of the selected matrix exponentiation methods which, by Equation 19, is the primary task at hand. This is followed by an outline of the data structures and parallelization scheme used to achieve fast and memory effcient QSW simulation.

3.1 Matrix Exponentiation

The matrix exponential is defined by a converging Taylor series


where and . Direct application of this formula is not generally practical as the rate of convergence can vary wildly. An additional challenge is that, for , grows exponentially with , meaning that multiple powers of may not be stored easily in computer memory. For this reason, it is preferable to directly approximate , thus avoiding the need to store intermediate matrices moler_nineteen_2003.

Of the algorithms used to compute , perhaps the most common for sparse matrices are the Krylov subspace techniques. These proceed by approximating the -dimensional problem in a smaller -dimensional subspace of , on which efficient dense matrix exponentiation methods may then be used sheehan_computing_2010.

Other popular techniques are based on polynomial expansions of . For example, the Chebyshev approximation method is based on the Chebyshev expansion of the matrix exponential about the point , where and

are the eigenvalues of

with the smallest and largest real values WangScholz1998; MidgleyWang2000; izaac_computational_2018.

This method is popular in quantum simulation as it offers fast and reliable convergence for Hermitian matrices moler_nineteen_2003; izaac_computational_2018; izaac_pyctqw:_2015. It also has a lower memory overhead than the Krylov methods, not requiring storage of basis vectors and ancillary matrices sheehan_computing_2010. However, if is non-Hermitian with eigenvalues off the negative real axis (of the complex plane), the expansion can produce a poor approximation of moler_nineteen_2003. This is shown in Figure 3, where complex eigenvalues resulting from inclusion of the non-unitary Lindblad operators in result in poor numerical stability proportional to and .

Figure 3: Norm-wise error resulting from application of the Chebyshev approximation method as given in Ref. izaac_computational_2018 to simulate a QSW on a directed dimer graph with target numerical error less than . Error was determined by comparison with the result given by the Mathematica MatrixExp function.

A third approach, ‘scaling and squaring’, takes advantage of the relationship,


to reduce the number of terms in a series expansion of needed to satisfy a numerical error less than al-mohy_computing_2011. Let denote a Taylor series expansion to terms. Then,


Historically, this method has not been favoured due to difficulties in the selection of the optimal and parameters moler_nineteen_2003. However, the scaling and squaring algorithm developed by Al-Mohy and Higham al-mohy_computing_2011 achieves this reliably via backwards error analysis. This method shares the advantages of the Chebyshev approximation over the Krylov methods. It has also been experimentally demonstrated as being numerically stable for both Hermitian and non-Hermitian matrices, with computational performance comparable to the Chebyshev approximation in both cases al-mohy_computing_2011. This method is also well tested; it forms the basis for sparse matrix exponentiation in the widely used SciPy Python library jones_scipy:_2001. Despite its perceived success, there has been little explicit discussion of its use in the literature of quantum simulation. This may be due in part to it not being currently implemented in widely used parallel numerical libraries such as PetSc balay_petsc_2019. As such, this method has been selected for its purported advantages, stability with non-Hermitian matrices and novelty in the context of computational quantum physics.

3.2 Sparse operator representation

As all of the terms in

involve a Kronecker product with an identity matrix or sparse Lindblad operator,

is highly sparse. In fact, while the total number of matrix elements in is , the number of non-zero entries is of the order . This means that there is a sparsity of over 90% for graphs with 20 vertices or greater. Therefore, the scope of possible simulations can be greatly increased by representing using a sparse matrix format. For this, the Compressed Sparse Rows (CSR) format has been selected. An advantageous property of the CSR datatype is that it provides for the efficient access of matrix rows, as each row has its non-zero column indicies and values stored as a contiguous sub-array. This quality distinguishes CSR from other common sparse matrix formats and allows for the efficient computation of matrix-vector products: the fundamental algebraic operation in matrix exponentiation.

3.3 Parallel Partitioning Scheme

Figure 7: A directed square lattice graph with an adjacency matrix, , of non-zero structure as shown in (b). The orange squares denote the relative magnitude of each value. In (c) the non-zero structure of the resulting QSW super-operator, , with . Red and blue intensity depicts the relative magnitude of the non-zero values, with red denoting a real valued entry and blue denoting a complex valued entry. Green vertical lines depict the parallel row-wise partitioning of and the vectorized density operator, , for a MPI communicator consisting of four nodes.

The structure of resulting from a directed lattice graph is shown in Figure 7. It displays a high degree of structural symmetry, with a block-circulant structure along its diagonal flanked by diagonal striping. However, it is important to note that, aside from the case of , is not block circulant as generally

. Hence, efficient eigendecomposition techniques, which take advantage of analytical solutions for the eigenvalues and eigenvectors of circulant block matrices, do not provide a general method for obtaining


Despite this, these structural properties are easily exploited. Firstly, the diagonal structure results in each row being primarily dependant during matrix-vector multiplication on vector elements within the ‘vicinity’ of its row index. This motivates adoption of the row-wise partitioning scheme depicted in Figure 7. This ensures that the parallel processes communicate primarily with nodes containing vector elements adjacent to their local partition, thus limiting inter-process communication.

Secondly, the block structure of means that the super-operator can be efficiently constructed directly from the CSR representations of and , avoiding the need to form intermediate Kronecker products or store each Lindblad operator separately. With this approach, the construction of is independent at each node following receipt of , and is consistently stored in distributed memory.

As is found through repeated action of on a vector, MPI inter-process communication in each multiplication cycle is further reduced by the determination of the specific vector elements that need to be sent and received with each multiplication on formation of . This has the additional benefit of ensuring that the sparse matrix multiplication algorithm implemented in QSW_MPI acts on contiguous row sub-arrays of the CSR values and column index arrays, as is typically the case for non-parallel sparse matrix multiplication kepner_graph_2011.

4 Package Overview

The QSW_MPI software package brings together the computational methods described in Section 3 through a Python interface, thus providing a user friendly means of high performance QSW simulation. This also provides additional support for the storage, analysis and basic visualization of the simulation results using external Python packages. While targeted towards distributed memory systems, QSW_MPI is highly portable due to its minimal dependence on external parallel libraries such as PetSc, which may require significant user configuration. It may be compiled on any system with an MPI implementation, HDF5 file format support, and a recent Python installation balay_petsc_2019.

The extra functionally provided by the Python interface is built on the widely supported NumPy, SciPy, NetworkX, Matplotlib, MPI for Python and h5Py packages. NumPy and SciPy are numerical libraries which are used in QSW_MPI to interface with the Fortran subroutines and provide in-python support for the CSR datatype jones_scipy:_2001; oliphant_numpy:_2006. NetworkX provides a rich environment for the construction and analysis of networks hagberg_exploring_2008. It is used in conjunction with Matplotlib to provide basic support for the visualization of simulation results, and the structure of networks is augmented with sources and sinks hunter_matplotlib:_2007. MPI for Python provides Python support for MPI directives and the passing of MPI communicator handles between the Python interface and Fortran subroutines dalcin_mpi_2005; dalcin_mpi_2008; dalcin_parallel_2011.

A complete explanation of the methods contained in QSW_MPI exceeds the scope of this document, further documentation and installation instructions are included with the package and are additionally hosted on Read the Docs Matwiejew. This section provides usage examples which demonstrate the most significant features of QSW_MPI and typical simulation workflows.

4.1 Usage Example

4.1.1 Basic QSW simulation

As MPI based programs are parallelized through running multiple instances of the same program, they do not support the use of interactive environments such as the Jupyter notebook. The following code is to be placed in ‘example.py’, which is then run by issuing the terminal command:

bash mpirun -N n python example.py

where n is a user specified parameter equal to the number of MPI nodes. Another possible workflow is to save the simulation results to disc and carry out visualization and analysis interactively using the QSW_MPI python modules, exclusive of MPI.py. Steps necessary to achieve this are covered in Section 4.1.2.

In this example a QSW is simulated on a directed wheel graph, shown in Figure 11 (a). First, import the required modules:

python from mpi4py import MPI import qsw_mpi as qsw import numpy as np import networkx as nx

Next, set up the MPI environment. This involves creating an MPI communicator object and defining the process rank:

python mpi_comm = MPI.COMM_WORLD rank = mpi_comm.Get_rank()

For this example, Networkx is used to create a wheel graph at each rank. The graph is then converted to a SciPy CSR matrix:

python vertices = 4 Graph = nx.wheel_graph(vertices) G = nx.to_scipy_sparse_matrix(Graph, dtype=np.complex128)

and made directed by assigning random weights between to the edges of G. Note the use of np.random.seed(1): it ensures that each MPI process generates the same sequence of random numbers:

python np.random.seed(1) for i in range(G.count_nonzero()): G.data[i] = G.data[i]*np.random.random()

From G, a directed Laplacian is constructed as per Equation 4. This is used to obtain the Lindblad matrix and then symmetrised to obtain the Hamiltonian as defined by Equation 11: python gamma = 1 H = qsw.operators.graph(gamma, G) L = qsw.operators.site_lindblads(H) qsw.operators.symmetrise(H)

A walk object is then instantiated; upon creation, it constructs the distributed super-operator and determines its 1-norm power series.

python omega = 0.1 wheel_graph = qsw.MPI.walk( omega, H, L, mpi_comm)

The initial state of the walker, , is then defined and passed to the walk object which vectorizes and partitions the density matrix over the MPI communicator. Note that Python is 0-indexed, meaning that the below code specifies an initial state of as per Equation 6.

python rho_0 = np.zeros( G.shape, dtype=np.complex128) rho_0[0,0] = 1 wheel_graph.initial_state(rho_0)

A single time point is obtained through use of the ‘step’ method. ‘target’ defines the rank to which the will be returned.

python t = 5 rho_t = wheel_graph.step(5, target = 0)

A time series is obtainable using the ‘series’ method.

python t1 = 0 t2 = 5 steps = 50 rho_t_series = wheel_graph.series( t1, t2, steps, target=0)

To analyse the results, it is essential to act on ‘rho_t’ and ‘rho_t_series’ from the target rank: they will not be defined elsewhere. This is achieved by simply enclosing these tasks in an ‘if’ statement which checks that the process rank is equal to the target rank. Analysis may proceed by acting directly on the ‘rho_t’ and ‘rho_t_series’ numpy arrays, or through use of the QSW_MPI measurements sub-module. For example, the vertex populations are obtained by

python if rank == 0: pop_step = qsw.measure.populations( rho=rho_t) pop_series = qsw.measure.populations( rho=rho_t_series)

where, as expected, the last element of ‘pop_series’ is equal to ‘pop_step’ and the populations at each time-step sum to 1.

python print(pop_step) print(pop_series[50]) print(np.sum(pop_step))

[fontsize=]pycon [0.25851439 0.17227131 0.427752 0.1414623 ] [0.25851439 0.17227131 0.427752 0.1414623 ] 0.9999999999999978

Inter-vertex coherence can likewise be extracted with the coherences method. The obtained population and coherence measurements are shown in Figures 11 (c) and (d). These plots can be reproduced with use of the ’population_lines‘ and ’coherence_lines‘ methods provided in the qsw_mpi.plot module.

python vertex_pairs, cohs =  qsw.measure.coherences(rho=rho_t_series)

Using this ‘walk’ object, multiple simulations may be carried out with different and values of . QSW_MPI supports a small number of default initial states; for example, ‘even’ creates in an equal superposition across all vertices.

python wheel_graph.initial_state(’even’) wheel_graph.set_omega(0.5)

4.1.2 Introducing Dissipators

This model can now be expanded to include absorption and emission processes, with the connection of ‘sources’ and ‘sinks’ to the graph following the conventions defined in Equations 14 and 15. These are described by tuples of two NumPy arrays, the first giving the point of connection and the second the absorption or emission rates at those vertices, noting again that Python is 0-indexed.

python sources = ([0], [0.7]) sinks = ([3], [0.8])

The resulting graph is shown in Figure 15 (a). As these additions result in a structural change in and , it is necessary to create a new walk object.

python wheel_graph_augmented = qsw.MPI.walk( omega, H, L, mpi_comm, sources = sources, sinks = sinks)

When running a simulation either remotely or for a large system, it is more convenient to save the output directly to disc. To do so, first create a .qsw file to contain the results of walk carried out using the ‘wheel_graph_augmented’ object. This walk makes use of another default initial state: ‘sources’, which distributes

over all defined source vertices. The ‘chunk_size’ parameter controls the number of time-steps between each write to disc. By default, chunked saving is automatically enabled for series with an estimated output size greater than 4 GB.

python wheel_graph_augmented.File( ’usage_example’, action = ”w”) wheel_graph_augmented.initial_state(’sources’) wheel_graph_augmented = qsw.MPI.walk( omega, H, L, mpi_comm, sources = sources, sinks = sinks) wheel_graph_augmented.series( t1, t2, steps, save = True, name = ”series 1” chunk_size = 5)

Vertex populations and coherences can be extracted directly from the .qsw file, thus avoiding the need to load the entire series into memory. The results as shown in Figure 15 (b) and (c). With a higher number of vertices an extra plotting dimension can add some clarity. These plots can be reproduced via the ’population_bars‘ and ’coherence_bars‘ methods provided in the qsw_mpi.plot module.

python if rank == 0: file = qsw.io.File(’usage_example’) pop_series = qsw.measure.populations( rho=rhot) vertices_pairs, cohs = qsw.measure.coherences( rho=rho_t_series)

Later, further simulations with the same system can be conducted by initializing ‘wheel_graph_augmented’ using the ‘usage_example.qsw’ which contains all of the arrays necessary to reconstruct the same super-operator and a record of the used for saved walks.

python wheel_graph_augmented.File( omega, ’usage_example’, mpi_comm)

Figure 11: The vertex populations for a QSW on the 4 vertex star graph (a) are shown in (b) and its inter-vertex coherence in (c).
Figure 15: The same graph as in Figure 11 with an absorption site at vertex 0 and emission site at vertex 3 is shown in (a), with the vertex populations for a QSW over the same period shown in (b) and the inter-vertex coherence in (c). Note the exponential decay of population at the source vertex 4 and its accumulation at sink vertex 5 over the duration of the walk.

5 Validation and Performance

The following tests were performed on four sets of graphs generated via Networkx. These included a set of line graphs, square lattice grids and a random graph with edges with 4 to 4096 vertices. This corresponded to a maximum size of approximately million. The final set consisted of fully connected dense graphs ranging from 4 to 441 vertices. In each case the graph edges were assigned a random edge weighting in the interval (0,1].

Figure 19 shows a comparison of results obtained using QSW_MPI and the Mathematica package QSWalk.m, which relies on the Mathmatica MatrixExp function for walk propagation falloon_qswalk:_2017. For graphs with 100 vertices taken from the above sets, the scaling and squaring algorithm achieved the specified target precision over an interval containing 1000 time-steps, as shown in Figure 19. As an additional check, the populations of were totalled for each returned result during the overall benchmarking process. Norm preservation was consistently observed within the limits of the target precision.

Figure 19: Difference between results calculated to double precision using QSW_MPI and QSWalk.m. Green and blue denotes the real and imaginary components of respectively. Each walk was carried out on a graph of 100 vertices for , and . (a) Line graph, (b) Square Lattice and (c) Random graph.
Figure 24: Performance of QSW simulation packages on a desktop computer. For all walks and . The QSWalk.jl and QSWalk.m packages consist of single-threaded programs, while QSW_MPI running with an four MPI nodes (one for each available CPU core). (a) shows performance on the set of line graphs, (b) square grid graphs, (c) random graphs with edges and (d) complete graphs. System specifications: Intel i7-6700K 4 cores at 4.0 GHz and 16 GB RAM at 3200 MHz. The red lines in sub-figures (b) and (c) highlight a change in the scaling of the corresponding package due to a transition from in RAM to RAM + disc storage.

The package performs well compared to available desktop alternatives when run on the same consumer-level hardware. As QSW_MPI, QSWalk.m and QSWalk.jl make use of sparse libraries, performance should ideally scale with the number of non-zeros rather then the size of falloon_qswalk:_2017; glos_qswalk.jl:_2019. This is shown in Figure 24, where for the line, square grid and random graph sets, QSW_MPI is clearly the fastest option. This is due in part to the higher memory efficiency afforded by construction method used by QSW_MPI). In Figure 24 (b), QSWalk.jl deviates from linear scaling at approximately non-zeros as it begins to access on disc swap space. At that point, it requires s to generate as opposed to the s needed by QSW_MPI. The same effect is observed in Figure 24 (c) for QSW_MPI where, at approximately non-zeros, QSW_MPI deviates from linear scaling. However, this occurs far later than with the other two packages. As shown in Figure 24 (d), QSWalk.jl is the best performer for the case of complete graphs. Parallel operations in QSW_MPI are least efficient in this instance as each row of has the highest possible number of row elements meaning that the partitioning scheme has the highest possible MPI inter-process connectivity. In addition, QSWalk.jl utilizes a matrix exponentiation module which is able to switch to dense matrix multiplication methods given a sufficient number of non-zeros glos_qswalk.jl:_2019. However, the scalability of QSW_MPI means that it is capable of higher performance with a modest increase in the number of available CPU cores.

Figure 25: Proportional speedup as a function of CPU cores for QSW_MPI running on the Pawsey Supercomputing Center’s Magnus system. Line graph, 4096 vertices. Square Lattice 4096 vertices. Random graph, 2209 vertices. Complete graph, 196 vertices. System specifications: XC40 Series Supercomuter, 2 Intel Xeon E5-2690V3 12 cores at 2.6 Ghz and 64 GB RAM per node.

This is confirmed by tests run on the Pawsey Supercomputing Centre’s Magnus system noauthor_pawsey_nodate, where QSW_MPI is observed to scale well with the number of CPU cores, as shown in Figure 25. With use of Amdahl’s law as an approximate measure of the total parallelism, the proportion of parallel operations for the line, square grid, random graph and complete graph were 0.96, 0.91, 0.88 and 0.85. Unsurprisingly, parallelism is highest for the line graph, for which there is the lowest number of elements per row in . The case of complete graphs serves as a measure of the lower bound of QSW_MPI performance.

6 Conclusion

Quantum systems often feature a large number of degrees of freedom, to which the size of their Hilbert space has an exponential relationship. The main challenge of developing QSW_MPI was to successfully utilize memory and processor efficient methods from the field of high performance computing to render the simulation of such systems practical on currently available hardware. An additional challenge arose in non-Hermitian structure of super-operator of the quantum stochastic walk model: methods common to most quantum simulations were not directly applicable as they were unstable with respect to the non-unitary dynamics.

These challenges were overcome by use of the Message Passing Interface standard of distributed computing to develop an implementation of a relatively recent method of matrix exponentiation, stable for both Hermitian and non-Hermitian matrices. As far as we know, this is the first implementation of this specific algorithm in the context of distributed memory computing. In order to ensure memory efficiency, sparse data-structures were employed in the creation of the necessary operator matrices and during basic linear algebra operations. This resulted in software which out-performs existing packages to the extent of many thousands of basis states when simulating sparsely connected systems, on both distributed supercomputer clusters and readily available consumer computers. Key to this performance increase was the detailed analysis of the structure of the vectorized quantum stochastic walk super-operator. While somewhat mundane from a theoretical or technical standpoint, the resulting performance highlights the value of ‘tailor-made’ approaches compared to use of ‘off the shelf’ software libraries when simulating extensive systems.

As QSW_MPI implements its own basic linear algebra subroutines, its parallel performance may not be optimal as compared to an implementation utilising pre-existing parallel libraries such as PetSc. However, the installation of such libraries is not always straightforward; thus QSW_MPI, easily installed from the Python PyPI package repository and dependent only on packages included on most Unix-based operating systems, gains a good deal of portability due to this omission. Nevertheless, it would be beneficial to extend this package to allow for the use of external linear algebra libraries as a user-specified compile time option. This would be relatively easy to achieve due to the modularity of the package’s underlying Fortran libraries.

In its relatively short period of existence, the quantum stochastic walk model has proven to be highly versatile. Release of the QSW_MPI package to the research community increase its scope to the study of larger and more expressive systems.


This work was supported by the Muriel and Colin Ramm Postgraduate Scholarship in Physics, and resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.