1 Introduction
Hamiltonian simulation is the problem of simulating the dynamics of quantum systems. Being the original motivation for quantum computers [feynman1982simulating, feynman1986quantum], it has been shown to be BQP-hard and is hence conjectured not to be classically solvable in polynomial time, since such an algorithm would imply an efficient classical algorithm for any problem with an efficient quantum algorithm, including integer factorization [shor1999polynomial]. Following the first proposal by Lloyd [lloyd1996universal] for local Hamiltonians, Aharonov and Ta-Shma gave an efficient algorithm for sparse Hamiltonians [aharonov2003adiabatic]. Subsequently many algorithms have been proposed which improved the runtime [berry2007efficient, berry2009black, berry2017exponential, childs2010relationship, childs2012hamiltonian, poulin2011quantum, wiebe2011simulating, low2016hamiltonian, low2017hamiltonian], mostly in the so-called sparse Hamiltonian model, and these have recently culminated in a few works with optimal dependence on all (or nearly all) parameters for sparse Hamiltonians [berry2015hamiltonian, low2017optimal].
In this work, we consider a different model for obtaining information about the Hamiltonian, which is based on the assumption that the entries of the Hamiltonian are stored in a novel memory architecture inspired by [kerenidis2016quantum]. Quantum access to this memory model allows us to efficiently prepare states that encode the rows of the Hamiltonian. Using the ability to prepare these states in combination with a quantum walk [berry2015hamiltonian], we give a Hamiltonian simulation algorithm that runs in time $\tilde{O}(\sqrt{N}\,t\|H\|)$ for nonsparse Hamiltonians of dimensionality $N$, where $\tilde{O}$ hides polylogarithmic factors.
In [kerenidis2016quantum], a quantum algorithm for recommendation systems was introduced based on an explicit description of a memory structure, which resulted in a fast quantum algorithm for estimating singular values of any real matrix. This fast singular value estimation algorithm is used in a quantum algorithm for solving dense linear systems [wossnig2018quantum]. The memory structure in [kerenidis2016quantum, wossnig2018quantum] allows us to prepare states that correspond to row vectors. In the problem considered here, i.e., Hamiltonian simulation, the main hurdle to using this memory structure is that we need to efficiently prepare states (that are different from the states in [kerenidis2016quantum, berry2009black, berry2015hamiltonian]) from row vectors; these states in particular encode more complicated information for each row. The precise definition of the memory structure, i.e., the model in which we obtain our result, is presented in Definition 1 (in Sec. 2). The memory structure appears to be reasonable since it is a quantum extension of classical random access memory. A quantum computer with access to this model can efficiently prepare quantum states that encode information about the rows of the $N$-dimensional Hamiltonian, i.e., with time complexity (circuit depth) of $O(\mathrm{polylog}(N))$. This is proven in Lemma 1. Using the efficient state preparation, we implement a linear combination of quantum walks in order to simulate the time evolution for nonsparse Hamiltonians with only polylogarithmic dependence on the precision. The main result of this work is summarized in the following theorem, which we prove in Sec. 3.
Theorem 1 (Nonsparse Hamiltonian Simulation).
Let $H \in \mathbb{C}^{N\times N}$ (with $N = 2^n$) be a Hermitian matrix stored in the memory as specified in Definition 1. There exists a quantum algorithm to simulate the evolution $e^{-iHt}$ for time $t$ and error $\epsilon$ with time complexity (circuit depth)
$$O\left(t\|H\|_1\,\frac{\log(t\|H\|_1/\epsilon)}{\log\log(t\|H\|_1/\epsilon)}\,\log N\left(\log N + \log^{2.5}\left(\frac{t\|H\|_1}{\epsilon}\right)\right)\right), \qquad (1)$$
where $\|H\|_1$ denotes the induced 1-norm of $H$.
In this work, we use $\|H\|_1$ to denote the induced 1-norm (i.e., the maximum absolute column-sum norm), defined as $\|H\|_1 = \max_{j}\sum_{i=1}^{N}|H_{ij}|$; we use $\|H\|$ to denote the spectral norm; and we use $\|H\|_{\max}$ to denote the max norm, defined as $\|H\|_{\max} = \max_{i,j}|H_{ij}|$.
By the fact that $\|H\|_1 \le \sqrt{N}\,\|H\|$ (see [CK10]; alternatively, a more general version of this relation is shown in Appendix A), we immediately have the following corollary.
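As an illustrative sanity check (not part of the quantum algorithm), the relation between these norms can be verified numerically; the random Hermitian matrix below is an arbitrary assumption for demonstration.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 16

# Random Hermitian matrix H.
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = (A + A.conj().T) / 2

norm_1 = np.max(np.sum(np.abs(H), axis=0))   # induced 1-norm: max column sum
norm_spec = np.linalg.norm(H, 2)             # spectral norm
norm_max = np.max(np.abs(H))                 # max norm

# Relations used in the text: ||H||_max <= ||H|| <= ||H||_1 <= sqrt(N)*||H||.
assert norm_max <= norm_spec
assert norm_spec <= norm_1 <= np.sqrt(N) * norm_spec
```

The chain of inequalities holds for any Hermitian matrix, which is what Corollary 1 exploits.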
Corollary 1.
Let $H \in \mathbb{C}^{N\times N}$ (where $N = 2^n$) be a Hermitian matrix stored in the memory as specified in Definition 1. There exists a quantum algorithm to simulate the evolution $e^{-iHt}$ for time $t$ and error $\epsilon$ with time complexity (circuit depth)
$$O\left(t\sqrt{N}\,\|H\|\,\frac{\log(t\sqrt{N}\|H\|/\epsilon)}{\log\log(t\sqrt{N}\|H\|/\epsilon)}\,\log N\left(\log N + \log^{2.5}\left(\frac{t\sqrt{N}\|H\|}{\epsilon}\right)\right)\right). \qquad (2)$$
Remarks.

In Theorem 1, the circuit depth scales as $\tilde{O}(\sqrt{N}\,t\|H\|)$. However, the gate complexity could be larger by a factor of up to $O(N)$ in general because of the addressing scheme which allows for quantum access to classical data stored in the data structure specified in Definition 1 (the addressing scheme has logarithmic circuit depth but not logarithmic circuit size). If some structure of $H$ is promised (e.g., the entries of $H$ repeat in some pattern), the addressing scheme can be implemented efficiently.
1.1 Related Work
It is important to clearly distinguish our result from other works on Hamiltonian simulation since we use a different framework. Previous quantum algorithms are based on access models that give constant-size local terms of a Hamiltonian (i.e., the local Hamiltonian specification), a linear combination of tensor products of Paulis that form a Hamiltonian (i.e., the linear-combination-of-unitaries specification), or an oracle which allows for data access via a black box (i.e., the sparse specification, or black-box specification). For nonsparse Hamiltonians, a suitable model is the black-box specification: querying the oracle with an index pair $(j,k)$ returns the corresponding entry of the Hamiltonian $H$, i.e.,
$$|j,k\rangle|z\rangle \mapsto |j,k\rangle|z \oplus H_{jk}\rangle, \qquad (4)$$
where $\oplus$ denotes the bitwise XOR. With access to the black-box Hamiltonian, simulation with error $\epsilon$ can then be provably performed with query complexity $\tilde{O}(N^{2/3})$ for dense Hamiltonians [berry2009black], and it was also empirically observed in [berry2009black] that for several classes of Hamiltonians even a complexity of $O(\sqrt{N})$ queries can be achieved. Using an efficient implementation of the query oracle, e.g., a qRAM as presented in [giovannetti2008quantum], it is then possible to achieve a time scaling of $\tilde{O}(\sqrt{N})$ for certain classes of Hamiltonians. Whether this is generally possible for all Hamiltonians was left as an open problem.
We note that a direct comparison between our algorithm and previous works, such as [berry2009black, berry2015hamiltonian], is not appropriate due to the difference in the memory models used. The results in [berry2009black, berry2015hamiltonian] are based on the black-box oracle model or the sparse Hamiltonian model, while in our work we assume that the entries of the Hamiltonian are stored in the data structure which we explicitly give in Definition 1. In order to perform Hamiltonian simulation, we prepare a state of the form of Eq. (6) with time complexity (circuit depth) $O(\mathrm{polylog}(N))$, which allows us to implement a quantum walk operator whose time dependence is on $\|H\|_1$, which is upper bounded by $\sqrt{N}\|H\|$. The mechanism is similar to the quantum walk for Hamiltonian simulation presented in [berry2009black], where the state preparation is done using amplitude amplification. However, in [berry2009black] preparing that state with black-box access to the Hamiltonian incurs a cost of $O(\sqrt{N})$ in time.
Given access to a sparse Hamiltonian oracle, the circuit depth of black-box Hamiltonian simulation is $\tilde{O}(d\,t\|H\|_{\max})$ for $d$-sparse Hamiltonians, as shown in [berry2009black, berry2015hamiltonian]. When $H$ is nonsparse, their results imply the scaling $\tilde{O}(N\,t\|H\|_{\max})$. In applications where $\|H\|_{\max}$ is the measure of cost, our result has no advantage against theirs, since $\|H\|$ can be as large as $N\|H\|_{\max}$. However, in the case where $\|H\|$ is the measure of cost, as assumed e.g. in [harrow2009quantum, childs2017quantum], our result has a quadratic improvement in the dimensionality dependence, since $\|H\|_{\max}$ can be as large as $\|H\|$, in which case their scaling becomes $\tilde{O}(N\,t\|H\|)$ compared to our $\tilde{O}(\sqrt{N}\,t\|H\|)$. Here we explicitly show that our memory model can perform black-box Hamiltonian simulation and hence is at least as strong as the black-box model. Whether the reverse direction is possible is left open for future work.
1.2 Applications
One immediate application of simulating nonsparse Hamiltonians is unitary implementation: given access to the entries of a unitary $U$, the objective is to construct a quantum circuit that approximates $U$ with precision $\epsilon$. As proposed in [berry2009black, jordan2009efficient], unitary implementation can be reduced to Hamiltonian simulation by considering the Hamiltonian
$$H = \begin{pmatrix} 0 & U \\ U^\dagger & 0 \end{pmatrix} \qquad (5)$$
and the fact that $e^{-iH\pi/2}\,|1\rangle|\psi\rangle = -i\,|0\rangle U|\psi\rangle$ (since $H^2 = \mathbb{1}$ implies $e^{-iH\pi/2} = -iH$). If the entries of a unitary matrix can be accessed by a black-box query oracle, it is shown in [berry2009black] that this unitary operator can be implemented with $\tilde{O}(N^{2/3})$ queries to the black box. Assume the entries of $U$ are stored in a data structure as in Definition 1. Using the same reduction to Hamiltonian simulation, our Hamiltonian simulation algorithm implies an implementation of $U$ with time complexity (circuit depth) $\tilde{O}(\sqrt{N})$.

Simulating nonsparse Hamiltonians can also be used as a subroutine for solving linear systems of equations for nonsparse matrices. The essence of solving a linear system $Ax = b$ is to apply $A^{-1}$ to $|b\rangle$, assuming here for simplicity that $|b\rangle$ lies entirely in the column space of $A$. When $A$ is sparse, it is shown in [childs2017quantum] that $A^{-1}$ can be approximated as a linear combination of unitaries of the form $e^{-iAt}$. An efficient quantum algorithm for Hamiltonian simulation such as [berry2015hamiltonian] can then be used as a subroutine, so that this linear system can be solved with gate complexity that scales linearly in the sparsity and polylogarithmically in $1/\epsilon$. If $A$ is nonsparse, their algorithm scales linearly in $N$. Based on the memory model in [kerenidis2016quantum], a quantum algorithm for solving linear systems for nonsparse matrices was described in [wossnig2018quantum], with time complexity (circuit depth) scaling as $\sqrt{N}\,\mathrm{polylog}(N)/\epsilon$. If we assume a similar memory structure as in [kerenidis2016quantum, wossnig2018quantum], using our Hamiltonian simulation algorithm together with the linear combinations of unitaries (LCU) decompositions in [childs2017quantum], we obtain a quantum algorithm for solving linear systems for nonsparse matrices whose time complexity (circuit depth) scales as $\sqrt{N}$ and polylogarithmically in $1/\epsilon$, which is an exponential improvement in the error dependence compared to [wossnig2018quantum].
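The reduction above can be checked numerically on small instances. The following sketch (with an arbitrary random unitary standing in for $U$) verifies that evolving the block Hamiltonian of Eq. (5) for time $\pi/2$ applies $U$ up to a global phase of $-i$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

# Random n x n unitary U via QR decomposition.
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(M)

# Block Hamiltonian H = [[0, U], [U^dag, 0]]; H is Hermitian and H^2 = I.
H = np.block([[np.zeros((n, n)), U], [U.conj().T, np.zeros((n, n))]])
assert np.allclose(H, H.conj().T)
assert np.allclose(H @ H, np.eye(2 * n))

# Evolve |1>|psi> (second block) for time pi/2 via the eigendecomposition.
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
w, P = np.linalg.eigh(H)
evolution = P @ np.diag(np.exp(-1j * (np.pi / 2) * w)) @ P.conj().T
out = evolution @ np.concatenate([np.zeros(n), psi])

# Since H^2 = I, e^{-i pi H / 2} = -iH, so the result is -i |0> U |psi>.
assert np.allclose(out, np.concatenate([-1j * (U @ psi), np.zeros(n)]))
```

The key design point is that $H^2 = \mathbb{1}$, so the evolution truncates to $\cos(t)\mathbb{1} - i\sin(t)H$.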
In the following we first define our memory model, then describe the algorithm in detail and finally prove the main results. We then finish with a summary of this work and a discussion of the benefits and possible drawbacks of our algorithm.
2 Data structure and quantum walk
We first give a precise definition of the memory structure that holds the entries of the Hamiltonian, and then show how this memory structure can be used to prepare states that will allow us to perform fast Hamiltonian simulation even for dense matrices.
Definition 1 (Data Structure).
Let $H \in \mathbb{C}^{N\times N}$ be a Hermitian matrix (where $N = 2^n$), with entries $H_{jk}$. Each entry is represented with $b$ bits of precision. Define $B$ as an array of $N$ binary trees $B_j$ for $j \in \{0, 1, \dots, N-1\}$. Each $B_j$ corresponds to the row $j$ of $H$, and its organization is specified by the following rules.

The leaf node $k$ of the tree $B_j$ stores the tuple $(H_{jk}, |H_{jk}|)$ corresponding to the index-entry pair $(k, H_{jk})$.

For the level immediately above the bottom level, i.e., the leaves, the data stored in any node is determined as follows: suppose the node has two children storing data $(a, |a|)$ and $(b, |b|)$, respectively. (Note that $a$ and $b$ are complex numbers.) Then the tuple that is stored in this node is given by $(|a|, |b|)$.

For an internal node above the parents-of-the-leaves level, we calculate the data stored in it by the following rule: suppose the node has two children that store tuples $(a_1, a_2)$ and $(b_1, b_2)$, respectively. Then this internal node stores the tuple $(a_1 + a_2, b_1 + b_2)$.
Note that for each binary tree in the above data structure, the tuple stored in an internal (non-leaf) node consists of two real numbers, while for a leaf node, the tuple stored in it consists of one complex number and one real number. The root node of $B_j$ stores a tuple whose two entries sum to $\sum_k |H_{jk}|$, the 1-norm of row $j$.
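A classical sketch of one row's tree, under one consistent reading of the tuple rules above (an actual implementation may lay the nodes out differently), can be written as:

```python
import numpy as np

def build_row_tree(row):
    """Binary tree for one row of H (sketch of Definition 1).

    Leaves store (H_jk, |H_jk|); the level above stores (|a|, |b|) for its
    two children; every higher internal node stores the component-wise sums
    of its children's tuples, so the root entries sum to the row 1-norm."""
    assert len(row) > 0 and (len(row) & (len(row) - 1)) == 0  # power of two
    leaves = [(z, abs(z)) for z in row]
    levels = [leaves]
    # Parents of the leaves: (|a|, |b|) for children (a, |a|) and (b, |b|).
    level = [(abs(a), abs(b)) for (a, _), (b, _) in zip(leaves[::2], leaves[1::2])]
    levels.append(level)
    # Higher internal nodes: component-wise sums of the children's tuples.
    while len(level) > 1:
        level = [(l[0] + l[1], r[0] + r[1]) for l, r in zip(level[::2], level[1::2])]
        levels.append(level)
    return levels  # levels[0] = leaves, levels[-1] = [root]

row = np.array([1 - 1j, -2.0, 0.5j, 3.0])   # illustrative row entries
tree = build_row_tree(row)
root = tree[-1][0]
# The two root entries sum to the 1-norm of the row.
assert np.isclose(root[0] + root[1], np.sum(np.abs(row)))
```

The point of the layered sums is that every subtree weight needed by the state-preparation cascade of Lemma 1 is available in one lookup.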
Using this memory model, we can efficiently perform the mapping described in the following technical lemma for efficient state preparation.
Lemma 1 (State Preparation).
Let $H \in \mathbb{C}^{N\times N}$ be a Hermitian matrix (where $N = 2^n$) stored in the memory as specified in Definition 1. Each entry $H_{jk}$ is represented with $b$ bits of precision. Then the following holds.

A quantum computer that has access to the data structure can perform the mapping
$$|j\rangle|0^{\otimes n}\rangle|0\rangle \mapsto |j\rangle\,\frac{1}{\sqrt{\|H\|_1}}\left(\sum_{k=0}^{N-1}\sqrt{H_{jk}}\,|k\rangle|0\rangle + \sqrt{\|H\|_1-\sigma_j}\,|j\rangle|1\rangle\right) \qquad (6)$$
with time complexity (circuit depth) $O(\log N(\log N + \mathrm{poly}(b)))$, where $\sigma_j = \sum_{k}|H_{jk}|$, and the square-root of a complex number satisfies $\sqrt{H_{jk}}\left(\sqrt{H_{kj}}\right)^{*} = H_{jk}$ (with the sign convention specified in the proof).

The size of the memory containing all $N^2$ complex entries is $O(N^2 \log^2 N)$ bits.
In order to perform the mapping we will need the following Lemma, which enables us to efficiently implement the conditional rotations with complex numbers.
Lemma 2.
Let $z = re^{i\theta} \in \mathbb{C}$ with $r \le 1$, and let $\tilde{r}$ and $\tilde{\theta}$ be the $b$-bit finite-precision representations of $r$ and $\theta$, respectively. Then there exists a unitary $U$ that performs the following mapping:
$$U\,|\tilde{r}\rangle|\tilde{\theta}\rangle|0\rangle = |\tilde{r}\rangle|\tilde{\theta}\rangle\left(\tilde{r}e^{i\tilde{\theta}}\,|0\rangle + \sqrt{1-\tilde{r}^2}\,|1\rangle\right). \qquad (7)$$
Moreover, $U$ can be implemented with $O(b)$ 1- or 2-qubit gates.
Proof.
Define $U$ as
$$U = e^{i\tilde{\theta}/2}\; e^{i\tilde{\theta}\sigma_z/2}\; e^{-i\arccos(\tilde{r})\,\sigma_y}, \qquad (8)$$
where $\sigma_y$ and $\sigma_z$ are Pauli matrices and the rotations act on the last qubit, conditioned on the registers $|\tilde{r}\rangle$ and $|\tilde{\theta}\rangle$.
To implement $U$, we use one rotation controlled on each qubit of the first register, with the rotation angles halved for each successive bit. The other two factors of $U$ can be implemented in a similar way. Therefore, $U$ can be implemented with $O(b)$ 1- or 2-qubit gates. ∎
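The halved-angle cascade can be illustrated classically. The sketch below (a simplified, single-rotation version of the construction, with an arbitrary bit string as input) checks that one $\sigma_y$ rotation per bit, with angles halved for successive bits, composes to a rotation by the angle encoded in the register:

```python
import numpy as np

def ry(phi):
    """Rotation about the y-axis by angle phi (acting on a real 2-vector)."""
    return np.array([[np.cos(phi / 2), -np.sin(phi / 2)],
                     [np.sin(phi / 2),  np.cos(phi / 2)]])

def controlled_ry_cascade(bits):
    """Apply Ry(2*pi * 0.b1 b2 ... b_b) to |0> using one rotation per bit,
    with the rotation angle halved for each successive bit."""
    state = np.array([1.0, 0.0])
    angle = 2 * np.pi
    for bit in bits:
        angle /= 2          # halve the angle for each successive bit
        if bit:             # the rotation fires when the control bit is 1
            state = ry(angle) @ state
    return state

b = 8
bits = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative b-bit register content
phi = 2 * np.pi * sum(bit / 2 ** (i + 1) for i, bit in enumerate(bits))
assert np.allclose(controlled_ry_cascade(bits), ry(phi) @ np.array([1.0, 0.0]))
```

Because rotations about a fixed axis commute, the $b$ controlled rotations compose additively, which is why $O(b)$ gates suffice for each factor.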
Now we are ready to prove Lemma 1.
Proof of Lemma 1.
We first describe the construction and size of the memory structure. Our structure is similar to the model presented in [kerenidis2016quantum] but requires a different connectivity.

The qRAM is built from binary trees and we start with an empty tree.

When a new entry $H_{jk}$ arrives, we create or update the leaf node $k$ in the tree $B_j$, where adding the entry costs $O(\log N)$ time, since the depth of the tree $B_j$ is at most $\lceil \log N \rceil$. Since the path from the root to the leaf is of length at most $\lceil \log N \rceil$ (under the assumption that $N = 2^n$), we furthermore have to update at most $\lceil \log N \rceil$ nodes, which can be done in $O(\log N)$ time if we store an ordered list of the levels in the tree.

The total time to update the tree with a new entry is thus $O(\log^2 N)$.

The memory requirements for $w$ entries are given by $O(w \log^2 N)$ bits, as for every entry at most $\lceil \log N \rceil$ nodes are added or updated and each node requires at most $O(\log N)$ bits.
With this memory structure, we can then perform the mapping specified in Eq. (6) with the following steps. For each $j$, we start from the root of $B_j$. Starting with the initial state $|j\rangle|0^{\otimes n}\rangle|0\rangle$, first apply the rotation (according to the tuple stored in the root node) on the last register to obtain the state
$$|j\rangle|0^{\otimes n}\rangle\,\frac{1}{\sqrt{\|H\|_1}}\left(\sqrt{\sigma_j}\,|0\rangle + \sqrt{\|H\|_1 - \sigma_j}\,|1\rangle\right), \qquad (9)$$
where $\sigma_j = \sum_k |H_{jk}|$.
Then a sequence of conditional rotations is applied on each qubit of the second register to obtain the state as in Eq. (6). At each level of the binary tree $B_j$, a query to the data structure is made to load the data (stored in the tuple of each node) into a register in superposition. The rotation angles are then determined by calculating the square-root and trigonometric functions on the output of the query: this can be implemented with $O(\mathrm{poly}(b))$ 1- and 2-qubit gates based on methods using Taylor series and long multiplication. Then the conditional rotation is applied by the circuit described in Lemma 2, and the gate complexity of the conditional rotation is $O(b)$. There are $n = \log N$ levels, so the gate complexity excluding the implementation of the oracle is $O(\mathrm{poly}(b)\log N)$. To obtain quantum access to the classical data structure, a quantum addressing scheme is required; the addressing scheme described in [giovannetti2008quantum] can be used. Although the circuit size of this addressing scheme is $O(N)$ for each level, its circuit depth is $O(\log N)$. Therefore, the time complexity (circuit depth) for preparing the state in Eq. (6) is $O(\log N(\log N + \mathrm{poly}(b)))$.
We use the following rules to determine the sign of the square-root of a complex number $z$: if $z$ is not a real negative number, we write $z = e^{i\theta}|z|$ (for $|z| \ge 0$ and $\theta \in (-\pi, \pi)$) and take $\sqrt{z} = e^{i\theta/2}\sqrt{|z|}$; when $z$ is a real negative number, we write $z = e^{i\pi}|z|$ and take $\sqrt{z} = i\sqrt{|z|}$. With this recipe, we have $\sqrt{H_{jk}}\left(\sqrt{H_{kj}}\right)^{*} = H_{jk}$, as required for Eq. (19). ∎
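The recipe can be spelled out in a few lines; the numerical check below covers entries that are not negative reals (for negative reals, the separate convention stated above applies):

```python
import numpy as np

def signed_sqrt(z):
    """Square root following the recipe above: write z = e^{i theta}|z| with
    theta in (-pi, pi) when z is not a negative real and take
    sqrt(z) = e^{i theta/2} sqrt(|z|); for negative real z take i*sqrt(|z|)."""
    z = complex(z)
    if z.imag == 0 and z.real < 0:
        return 1j * np.sqrt(-z.real)
    r, theta = abs(z), np.angle(z)   # np.angle returns theta in (-pi, pi]
    return np.sqrt(r) * np.exp(1j * theta / 2)

# For an entry H_jk that is not a negative real, the recipe gives
# sqrt(H_jk) * conj(sqrt(H_kj)) = H_jk, where H_kj = conj(H_jk).
for z in [2 + 1j, -1 + 3j, 0.5j, 4.0, -2 - 2j]:
    assert np.isclose(signed_sqrt(z) * np.conj(signed_sqrt(np.conj(z))), z)

assert np.isclose(signed_sqrt(complex(-4.0, 0.0)), 2j)  # negative-real branch
```

The halved angle $\theta/2$ is what makes the phases of $\sqrt{H_{jk}}$ and $\sqrt{H_{kj}}$ add back up to the phase of $H_{jk}$.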
An example of the state preparation procedure is demonstrated in Fig. 1. In this example, the initial state (omitting the first register) is $|00\rangle|0\rangle$, and we prepare the state for row $j$ of a $4\times 4$ Hamiltonian. Let $\sigma_j = \sum_{k=0}^{3}|H_{jk}|$. Applying the first rotation, we obtain the state
$$\frac{1}{\sqrt{\|H\|_1}}\left(\sqrt{\sigma_j}\,|00\rangle|0\rangle + \sqrt{\|H\|_1 - \sigma_j}\,|00\rangle|1\rangle\right). \qquad (10)$$
Then, applying a rotation on the first qubit of the first register conditioned on the last register, we obtain the state
$$\frac{1}{\sqrt{\|H\|_1}}\left(\left(\sqrt{|H_{j0}|+|H_{j1}|}\,|0\rangle + \sqrt{|H_{j2}|+|H_{j3}|}\,|1\rangle\right)|0\rangle|0\rangle + \sqrt{\|H\|_1-\sigma_j}\,|00\rangle|1\rangle\right). \qquad (12)$$
Then, applying a rotation on the second qubit of the first register conditioned on the first qubit of the first register and the last register, we obtain the state
$$\frac{1}{\sqrt{\|H\|_1}}\left(\sum_{k=0}^{3}\sqrt{H_{jk}}\,|k\rangle|0\rangle + \sqrt{\|H\|_1-\sigma_j}\,|00\rangle|1\rangle\right), \qquad (13)$$
as required.
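The cascade in this example generalizes to any power-of-two row length. The following numerical sketch (using the principal square root, which agrees with the recipe of Lemma 1 except on negative reals, and an illustrative row and value of $\|H\|_1$) telescopes the conditional rotations through the subtree sums:

```python
import numpy as np

def prepared_amplitudes(row, H1):
    """Sketch of the Lemma 1 cascade: starting from the row's total weight,
    each level multiplies the amplitude by sqrt(child_sum / parent_sum),
    which telescopes to sqrt(|H_jk| / H1); the leaf contributes the phase."""
    n = int(np.log2(len(row)))
    amps = np.zeros(len(row) + 1, dtype=complex)
    total = np.sum(np.abs(row))
    for k, z in enumerate(row):
        amp = np.sqrt(total / H1)          # first rotation (Eq. (9))
        lo, hi = 0, len(row)
        for _ in range(n):                 # one conditional rotation per level
            mid = (lo + hi) // 2
            parent = np.sum(np.abs(row[lo:hi]))
            lo, hi = (lo, mid) if k < mid else (mid, hi)
            amp *= np.sqrt(np.sum(np.abs(row[lo:hi])) / parent)
        # Attach the phase of sqrt(H_jk) at the leaf.
        amp *= np.sqrt(z + 0j) / np.sqrt(abs(z)) if z != 0 else 0.0
        amps[k] = amp
    amps[-1] = np.sqrt((H1 - total) / H1)  # padding branch flagged by |1>
    return amps

row = np.array([1 - 1j, -2.0 + 0.5j, 0.5j, 3.0])
H1 = 8.0                                   # assume ||H||_1 >= row 1-norm
amps = prepared_amplitudes(row, H1)
assert np.allclose(amps[:4], np.sqrt(row + 0j) / np.sqrt(H1))
assert np.isclose(np.linalg.norm(amps), 1.0)
```

The telescoping product of subtree ratios is exactly why storing partial sums at every tree level suffices to prepare the state of Eq. (6).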
Based on the memory structure specified in Definition 1 and the efficient state preparation in Lemma 1, we construct a quantum walk operator for $H$ as follows. First define the isometry $T$ as
$$T = \sum_{j=0}^{N-1}\left(|j\rangle|0\rangle \otimes |\varphi_j\rangle\right)\langle j|, \qquad (17)$$
with
$$|\varphi_j\rangle = \frac{1}{\sqrt{\|H\|_1}}\left(\sum_{k=0}^{N-1}\sqrt{H_{jk}}\,|k\rangle|0\rangle + \sqrt{\|H\|_1 - \sigma_j}\,|j\rangle|1\rangle\right), \qquad (18)$$
where $\sigma_j = \sum_k |H_{jk}|$. Let $S$ be the swap operator that maps $|j\rangle|z_1\rangle|k\rangle|z_2\rangle$ to $|k\rangle|z_2\rangle|j\rangle|z_1\rangle$, for all $j, k \in \{0, \dots, N-1\}$ and $z_1, z_2 \in \{0,1\}$. We observe that
$$\langle j|\,T^\dagger S T\,|k\rangle = \frac{1}{\|H\|_1}\sqrt{H_{jk}}\left(\sqrt{H_{kj}}\right)^{*} = \frac{H_{jk}}{\|H\|_1}, \qquad (19)$$
where the second equality is ensured by the choice of the square-root as in the proof of Lemma 1. This implies that
$$T^\dagger S T = \frac{H}{\|H\|_1}. \qquad (20)$$
The quantum walk operator $W$ is defined as
$$W = iS\left(2TT^\dagger - \mathbb{1}\right). \qquad (21)$$
A more general characterization of the eigenvalues of quantum walks is presented in [szegedy2004quantum]. Here we give a specific proof of the relationship between the eigenvalues of $W$ and $H$ as follows.

Lemma 3.
Let the unitary operator $W$ be defined as in Eq. (21), and let $\lambda$ be an eigenvalue of $H$ with eigenstate $|\lambda\rangle$. It holds that
$$W|\mu_\pm\rangle = \mu_\pm|\mu_\pm\rangle, \qquad (22)$$
where
$$|\mu_\pm\rangle \propto \left(\mathbb{1} + i\mu_\pm S\right)T|\lambda\rangle, \qquad (23)$$
$$\mu_\pm = \pm e^{\pm i \arcsin(\lambda/\|H\|_1)}. \qquad (24)$$
Proof.
By the fact that $T^\dagger T = \mathbb{1}$ and $T^\dagger S T = H/\|H\|_1$, we have
$$W\left(\mathbb{1} + i\mu S\right)T|\lambda\rangle = iST|\lambda\rangle + i\mu\left(2i\frac{\lambda}{\|H\|_1}ST|\lambda\rangle - iT|\lambda\rangle\right) = \mu T|\lambda\rangle + \left(i - 2\mu\frac{\lambda}{\|H\|_1}\right)ST|\lambda\rangle. \qquad (25)$$
In order for this state to be an eigenstate with eigenvalue $\mu$, it must hold that
$$\mu^2 - 2i\frac{\lambda}{\|H\|_1}\mu - 1 = 0, \qquad (26)$$
and the solution is
$$\mu_\pm = i\frac{\lambda}{\|H\|_1} \pm \sqrt{1 - \frac{\lambda^2}{\|H\|_1^2}} = \pm e^{\pm i \arcsin(\lambda/\|H\|_1)}. \qquad (27)$$
∎
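The algebra in this proof can be checked numerically: on the invariant subspace spanned by $T|\lambda\rangle$ and $ST|\lambda\rangle$, the walk operator acts as a $2\times 2$ matrix whose eigenvalues match Eq. (24). The value of $\lambda/\|H\|_1$ below is an arbitrary illustration:

```python
import numpy as np

# In the ordered basis (T|lam>, S T|lam>), W = iS(2TT^dag - 1) maps
#   T|lam>   -> i S T|lam>
#   S T|lam> -> -i T|lam> + 2i*lam_bar * S T|lam>,   lam_bar = lambda/||H||_1,
# so it acts as the 2x2 matrix M below.
lam_bar = 0.37
M = np.array([[0.0, -1j],
              [1j, 2j * lam_bar]])

mu = np.linalg.eigvals(M)
theta = np.arcsin(lam_bar)
expected = [np.exp(1j * theta), -np.exp(-1j * theta)]   # Eq. (24)
for m in mu:
    assert any(np.isclose(m, e) for e in expected)
# Product of the roots equals the constant term of Eq. (26).
assert np.isclose(mu[0] * mu[1], -1)
```

The characteristic polynomial of `M` is exactly $\mu^2 - 2i(\lambda/\|H\|_1)\mu - 1$, the quadratic of Eq. (26).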
3 Linear combination of unitaries and Hamiltonian simulation
To see how to convert the quantum walk operator into Hamiltonian simulation, we consider the linear combination of unitaries
$$V_\infty = \sum_{m=-\infty}^{\infty} J_m(-z)\,W^m = J_0(-z)\,\mathbb{1} + \sum_{m=1}^{\infty} J_m(-z)\left(W^m + (-1)^m W^{-m}\right), \qquad (28)$$
where $z = t\|H\|_1$, $e^{\frac{w}{2}(u - 1/u)} = \sum_{m=-\infty}^{\infty} J_m(w)\,u^m$ is the generating function for the Bessel functions, and the second equality follows from the fact that $J_{-m}(-z) = (-1)^m J_m(-z)$. The eigenvalue of $V_\infty$ on the eigenstate $|\mu_\pm\rangle$ is
$$\sum_{m=-\infty}^{\infty} J_m(-z)\,\mu_\pm^m = e^{-\frac{z}{2}\left(\mu_\pm - \mu_\pm^{-1}\right)} = e^{-i\lambda t}, \qquad (29)$$
where the first equality follows from (9.1.41) in [abramowitz1964handbook], and the second equality follows from Eq. (24) and the fact that $\mu_\pm - \mu_\pm^{-1} = 2i\lambda/\|H\|_1$. Now we consider an approximation of Eq. (28):
$$V = \sum_{m=-k}^{k} J_m(-z)\,W^m. \qquad (30)$$
Note that the eigenvalues of $V$ are
$$\sum_{m=-k}^{k} J_m(-z)\,\mu_\pm^m. \qquad (31)$$
Note that each eigenvalue of $V$ does not depend on the choice of branch $\pm$, as $J_{-m}(-z) = (-1)^m J_m(-z)$.
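The identity behind Eq. (29) and the quality of the truncation in Eq. (30) can be verified numerically. The sketch below evaluates integer-order Bessel functions via their standard integral representation; the parameter values are arbitrary illustrations:

```python
import numpy as np

def bessel_j(m, x, num=20001):
    """J_m(x) for integer m via the integral representation
    J_m(x) = (1/pi) * integral_0^pi cos(m*theta - x*sin(theta)) d(theta),
    computed with the composite trapezoidal rule."""
    theta = np.linspace(0.0, np.pi, num)
    vals = np.cos(m * theta - x * np.sin(theta))
    dtheta = theta[1] - theta[0]
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dtheta / np.pi

lam_bar = 0.4     # lambda / ||H||_1
z = 0.6           # z = t * ||H||_1, so lambda * t = lam_bar * z
theta = np.arcsin(lam_bar)
k = 20            # truncation order of Eq. (30)

# Both eigenvalue branches of Eq. (24) give the same truncated eigenvalue,
# and it is already numerically indistinguishable from e^{-i lambda t}.
for mu in (np.exp(1j * theta), -np.exp(-1j * theta)):
    val = sum(bessel_j(m, -z) * mu ** m for m in range(-k, k + 1))
    assert np.isclose(val, np.exp(-1j * lam_bar * z))
```

The rapid convergence reflects the factorial decay of $J_m(z)$ in $m$, which is what Lemma 4 quantifies.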
To bound the error of this approximation, we have the following technical lemma, whose proof is given in Appendix B.
Lemma 4.
Let $V_\infty$ and $V$ be defined as above, and let $0 < \delta < 1$. There exists a positive integer $k$ satisfying
$$k = O\left(|z| + \frac{\log(1/\delta)}{\log\log(1/\delta)}\right) \qquad (32)$$
such that
$$\left\|V - V_\infty\right\| \le \delta. \qquad (33)$$
In the following, we provide technical lemmas for implementing a linear combination of unitaries. Suppose we are given implementations of unitaries $U_0, U_1, \dots, U_{m-1}$, and coefficients $\alpha_0, \alpha_1, \dots, \alpha_{m-1} > 0$. Then the operator
$$V = \sum_{i=0}^{m-1} \alpha_i U_i \qquad (34)$$
can be implemented probabilistically by the technique called linear combination of unitaries (LCU) [kothari2014efficient]. Provided $s := \sum_i \alpha_i \le 2$, $V$ can be implemented with success probability $1/s^2$. To achieve this, we define the select operation, denoted by $\mathrm{select}(U)$, as
$$\mathrm{select}(U)\,|i\rangle|\psi\rangle = |i\rangle\,U_i|\psi\rangle. \qquad (35)$$
We summarize the probabilistic implementation of $V$ in the following lemma, whose proof is shown in Appendix B.

Lemma 5.
Let $V$ be defined as above. If $s = \sum_i \alpha_i \le 2$, then there exists a quantum circuit $\widetilde{V}$ that maps $|0\rangle|\psi\rangle$ to the state
$$\frac{1}{s}\,|0\rangle V|\psi\rangle + \sqrt{1 - \frac{1}{s^2}}\,|\Phi\rangle, \qquad (36)$$
where $|\Phi\rangle$ satisfies $\left(|0\rangle\langle 0|\otimes\mathbb{1}\right)|\Phi\rangle = 0$. Moreover, this quantum circuit uses $O(1)$ applications of $\mathrm{select}(U)$ and $O(m)$ additional 1- and 2-qubit gates.
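A linear-algebra sketch of this probabilistic implementation (with matrices standing in for the quantum circuit, and an arbitrary two-term example) is:

```python
import numpy as np

def lcu_apply(alphas, unitaries, psi):
    """Probabilistic LCU (sketch): prepare the ancilla with amplitudes
    sqrt(alpha_i / s), apply select(U), unprepare, and keep the ancilla-|0>
    block. The surviving block is (1/s) * sum_i alpha_i U_i |psi>."""
    s = float(sum(alphas))
    m, dim = len(alphas), len(psi)
    # B |0> = sum_i sqrt(alpha_i / s) |i>: any unitary with this first column.
    col = np.sqrt(np.array(alphas) / s)
    B, _ = np.linalg.qr(np.column_stack([col, np.eye(m)[:, 1:]]))
    B *= np.sign(B[0, 0]) * np.sign(col[0])   # fix the QR sign convention
    # select(U): block-diagonal application of U_i controlled on |i>.
    select = np.zeros((m * dim, m * dim), dtype=complex)
    for i, U in enumerate(unitaries):
        select[i * dim:(i + 1) * dim, i * dim:(i + 1) * dim] = U
    W = np.kron(B.conj().T, np.eye(dim)) @ select @ np.kron(B, np.eye(dim))
    full = W @ np.kron(np.eye(m)[:, 0], psi)
    return full[:dim], s                      # ancilla-|0> block, and s

# Two single-qubit unitaries combined as 0.6*I + 0.8*X.
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
psi = np.array([1, 1j]) / np.sqrt(2)
out, s = lcu_apply([0.6, 0.8], [I2, X], psi)
assert np.allclose(out, (0.6 * I2 + 0.8 * X) @ psi / s)
```

The $1/s$ attenuation of the kept block is precisely the amplitude in Eq. (36), which oblivious amplitude amplification subsequently boosts.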
Let $\widetilde{V}$ be the quantum circuit in Lemma 5, and let $P$ be the projector defined as $P = |0\rangle\langle 0|\otimes\mathbb{1}$. We have
$$P\,\widetilde{V}\,|0\rangle|\psi\rangle = \frac{1}{s}\,|0\rangle V|\psi\rangle. \qquad (37)$$
If $V$ is a unitary operator, one application of the oblivious amplitude amplification operator $-\widetilde{V}(\mathbb{1}-2P)\widetilde{V}^\dagger(\mathbb{1}-2P)\widetilde{V}$ implements $V$ with certainty. However, in our application, the operator $V$ only implements an approximation of a unitary $U$ in the sense that
$$\|V - U\| \le \delta, \qquad (38)$$
with $\delta \ge 0$. The following lemma shows that the error caused by the oblivious amplitude amplification is bounded by $O(\delta)$, and the proof is given in Appendix B.
Lemma 6.
Let the projector $P$ be defined as above, and suppose the unitary operator $\widetilde{V}$ satisfies $P\widetilde{V}|0\rangle|\psi\rangle = \frac{1}{2}|0\rangle V|\psi\rangle$ for all $|\psi\rangle$, where $\|V - U\| \le \delta$ for some unitary $U$. Then $\left\|-\widetilde{V}(\mathbb{1}-2P)\widetilde{V}^\dagger(\mathbb{1}-2P)\widetilde{V}\,|0\rangle|\psi\rangle - |0\rangle U|\psi\rangle\right\| = O(\delta)$.
Now we are ready to prove Theorem 1.
Proof of Theorem 1.
The proof we outline here follows closely the proof given in [berry2015hamiltonian]. The intuition of this algorithm is to divide the simulation into $m$ segments, with each segment simulating $e^{-iHt/m}$. To implement each segment, we use the LCU technique to implement $V$ defined in Eq. (30), with coefficients $\alpha_\ell = |J_\ell(-z)|$ for $-k \le \ell \le k$ and $z = t\|H\|_1/m$. When $z \le \ln 2$, we have $\sum_{\ell=-k}^{k}|J_\ell(-z)| \le e^{|z|} \le 2$, and Lemmas 5 and 6 can be applied. By Eq. (29) and using Lemma 4, set
$$k = O\left(\frac{\log(m/\epsilon)}{\log\log(m/\epsilon)}\right), \qquad (39)$$
and we obtain a segment that simulates $e^{-iHt/m}$ with error bounded by $\epsilon/m$. Repeating the segment $m$ times, each with error $\epsilon/m$, we obtain a simulation of $e^{-iHt}$ with error bounded by $\epsilon$. We have
$$m = \left\lceil\frac{t\|H\|_1}{\ln 2}\right\rceil = O\left(t\|H\|_1\right). \qquad (40)$$
By Lemma 5, each segment can be implemented by $O(1)$ applications of $\mathrm{select}(W)$ and additional 1- and 2-qubit gates, as well as the cost for computing the coefficients $J_\ell(-z)$ for $-k \le \ell \le k$. The coefficients can be classically computed using the methods in [british1960bessel, olver1964error], and the cost is $O(k)$ times the number of bits of precision, which is $O(k\log(m/\epsilon))$. The cost of $\mathrm{select}(W)$ is $O(k)$ times the cost for implementing the quantum walk operator $W$. By Lemma 1, the state in Eq. (18) can be prepared with time complexity (circuit depth) $O(\log N(\log N + \mathrm{poly}(b)))$, where $b$ is the number of bits of precision. To achieve the overall error bound $\epsilon$, we choose $b = O(\log(t\|H\|_1/\epsilon))$. Hence the time complexity for the state preparation is $O(\log N(\log N + \log^{2.5}(t\|H\|_1/\epsilon)))$, which is also the time complexity for applying the quantum walk operator $W$. Therefore, the time complexity of one segment is
$$O\left(k\,\log N\left(\log N + \log^{2.5}\left(\frac{t\|H\|_1}{\epsilon}\right)\right)\right). \qquad (41)$$
Considering all $m = O(t\|H\|_1)$ segments, the time complexity is
$$O\left(t\|H\|_1\,\frac{\log(t\|H\|_1/\epsilon)}{\log\log(t\|H\|_1/\epsilon)}\,\log N\left(\log N + \log^{2.5}\left(\frac{t\|H\|_1}{\epsilon}\right)\right)\right),$$
as claimed. ∎
4 Discussion
We presented a quantum Hamiltonian simulation algorithm that provably runs in time $\tilde{O}(\sqrt{N}\,t\|H\|)$ for nonsparse Hamiltonians, which gives a polynomial speedup compared to previous results. In order to do so, we required access to a seemingly more powerful memory model for state preparation that can prepare states encoding the columns and rows of the Hamiltonian in polylogarithmic time. Our technique for Hamiltonian simulation combines ideas from linear combinations of quantum walks with an efficient memory model that prepares special states to provide the improved performance.
We note that the memory structure used may require a large overhead in practice if the data is not already stored in it. Furthermore, it might be hard to implement such a memory structure physically due to the exponential amount of quantum resources [aaronson2015read, adcock2015advances, ciliberto2018quantum]. However, if the Hamiltonian is highly structured (i.e., the entries repeat in some pattern), the memory model can be efficiently implemented. Another potential point of criticism is the required error rate of such a device, as some computations will require a very low error rate per gate to retain a feasible overall error rate for applications [arunachalam2015robustness]. However, not all computations need such low error rates [arunachalam2015robustness], and hence the feasibility of our algorithm as a subroutine in an explicit algorithm must be further validated experimentally. To summarize, our algorithm inherits many problems of Grover's search algorithm for unordered classical data.
However, if the above mentioned caveats can be overcome, our algorithm still supplies a polynomial speedup over known quantum algorithms for Hamiltonian simulation and furthermore allows for a polynomial speedup in comparison with the best known classical algorithms for several practical problems, as for example linear regression or linear systems.
5 Acknowledgement
We thank Richard Cleve and Simone Severini for the discussion and comments on this project. CW acknowledges financial support by a David R. Cheriton Graduate Scholarship. LW acknowledges financial support by the Royal Society and Cambridge Quantum Computing Ltd.
Appendix A The relation between $\|H\|_1$ and $\|H\|$
We prove the following proposition.
Proposition 2.
If $H \in \mathbb{C}^{N\times N}$ is Hermitian and has at most $d$ nonzero entries in any row, it holds that $\|H\|_1 \le \sqrt{d}\,\|H\|$.
Proof.
First observe that $\|H\|_1 = \max_j \|He_j\|_{\ell_1}$, the maximum vector 1-norm of a column of $H$, and furthermore, since $H$ is Hermitian with $d$-sparse rows, its columns also have at most $d$ nonzero entries. For any vector $v$ with at most $d$ nonzero entries we have $\|v\|_{\ell_1} \le \sqrt{d}\,\|v\|_{\ell_2}$ by the Cauchy–Schwarz inequality. From this we have that $\|H\|_1 \le \sqrt{d}\,\max_j\|He_j\|_{\ell_2}$ for a sparse $H$. By Theorem 5.6.18 in [horn1990matrix], relating the induced norms, we have $\max_j \|He_j\|_{\ell_2} \le \|H\|$, and using the above we find $\|H\|_1 \le \sqrt{d}\,\|H\|$, as desired. ∎
It immediately follows that $\|H\|_1 \le \sqrt{N}\,\|H\|$ for dense $H$, i.e., for $d = N$.
Appendix B Proofs of technical lemmas
Proof of Lemma 4.
The proof outlined here follows closely the proof of Lemma 8 in [berry2015hamiltonian]. Recalling the definitions of $V$ and $V_\infty$, we define the weights in the truncated linear combination by rescaling the Bessel coefficients $J_m(-z)$ with a normalization factor. The normalization is chosen such that the weights of the truncated sum total a convenient constant, which gives the best result [berry2015hamiltonian].
Since
(42) 
observe that we have two error sources: the first coming from the truncation of the series, and the second from the different renormalization of the terms, which introduces an error in the first terms of the sum. We therefore start by bounding the normalization factor. For the Bessel functions it holds that $|J_m(z)| \le \frac{1}{m!}\left|\frac{z}{2}\right|^{m}$ for all $m \ge 0$, by (9.1.5) in [abramowitz1964handbook]. For $m \ge k+1$ we can hence find the following bound on the truncated part:
$$\sum_{m=k+1}^{\infty}\left|J_m(z)\right| \le \frac{2}{(k+1)!}\left|\frac{z}{2}\right|^{k+1}. \qquad (43)$$
Since , based on the normalization, we hence find that
(44) 
which is a lower bound on the normalization factor . Since , and since the correction is small, this implies that
(45) 
and we have a multiplicative error from the renormalization.
Next we want to bound the error in the truncation before we join the two error sources.
From Eq. (29) we know that
(46) 
by the normalization of . From this we see that we can obtain a bound on the truncated part as follows.
(47) 
Therefore we can upper bound the left-hand side in terms of the exact value of , i.e., if we can bound the rightmost term in Eq. (47). Using furthermore the bound in Eq. (45), we obtain
(48) 
which, using and , reduces to
(49) 
We can then obtain the desired bound by rearranging the above equation and using that , so that we have
(50) 
We hence only need to bound the righthand side.
For we can use that and obtain the bound [berry2015hamiltonian]. For the case we need to refine the analysis and will show that the bound remains the same.
Let be as above. First observe that
, and it follows that
(51) 
Therefore we only need to treat the remaining case.
(52) 
Using this bound we hence obtain from Eq. (50),
(53) 
In order for the above expression to be upper-bounded by , it suffices to show that is upper-bounded as claimed. ∎
Proof of Lemma 5.
Let $s = \sum_i \alpha_i$. We first define the unitary operator $B$ that prepares the coefficients: