1 Introduction
Quantum machine learning, as a burgeoning field in quantum computation, aims to facilitate machine learning tasks with quantum advantages [1]. Numerous theoretical studies have shown that quantum machine learning algorithms can dramatically reduce the runtime complexity over their classical counterparts, e.g., the quantum perceptron [2] and the quantum kernel classifier [3]. Meanwhile, experimental studies have employed near-term quantum devices to accomplish toy-model learning tasks with promising performance [4, 5, 6, 7, 8, 9]. Both theoretical and experimental results suggest that we are stepping into a new era, in which near-term quantum processors [10] can be applied to benefit real-world machine learning tasks.
Quantum private learning, in contrast to the auspicious achievements of public data learning, has not been well explored yet. Very few studies have investigated the connection between classical and quantum data privacy [11, 12, 13]. However, as a central topic in machine learning [14, 15], the significance of devising quantum private learning algorithms is continuously increasing. Due to legal, financial, or moral constraints, private learning aims to train an accurate learning model without exposing the precise information in individual training examples, e.g., genomic data and medical records of patients. Thus, in many applications, the legitimacy of the proposed quantum learning algorithms will be questioned if they cannot promise a privacy guarantee. Differential privacy (DP), which quantitatively formalizes a rigorous and standard notion of ‘privacy’, provides one of the most prominent solutions towards private learning [16]. During the past decade, extensive DP learning algorithms have been proposed under varied practical settings [17, 18, 19, 20, 21].
A fundamental topic in private learning is devising privately sparse regression learning models [22, 20]. Let $D = \{(X_i, y_i)\}_{i=1}^{n}$ be the given dataset, where $X_i \in \mathbb{R}^{d}$ and $y_i \in \mathbb{R}$ are the $i$-th feature vector and the corresponding target, respectively. Equivalently, we write $X \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^{n}$. In all practical scenarios, $d \gg n$. Suppose that the dataset satisfies $y = X\theta^{*} + w$, where $\theta^{*} \in \mathbb{R}^{d}$ with $\|\theta^{*}\|_0 = s$ (i.e., $s$-sparse) is the underlying sparse parameter to be estimated, and $w \in \mathbb{R}^{n}$ is the noise vector. The goal of privately sparse regression learning is to recover $\theta^{*}$ while satisfying differential privacy. The mainstream learning model to tackle this task is the private Lasso estimator [23], which estimates $\theta^{*}$ by minimizing the loss function
$\min_{\theta \in \mathcal{C}} \mathcal{L}(\theta), \qquad \mathcal{L}(\theta) := \frac{1}{2n}\|X\theta - y\|_2^2,$ (1)
where the constraint set $\mathcal{C}$ is an $\ell_1$-norm ball that guarantees the sparsity of the estimated result, and differential privacy should be preserved with respect to any change of an individual pair $(X_i, y_i)$. Following the conventions [22, 23, 20, 24], we set $\mathcal{C} = \{\theta \in \mathbb{R}^{d} : \|\theta\|_1 \le 1\}$, and suppose that the entries of $X$ and $y$ are bounded by a constant throughout the paper. Note that our results can be easily generalized to $\ell_1$ balls of other radii and to other boundedness constants.
An important utility measure of the private Lasso is the expected excess empirical risk, i.e.,
$\mathbb{E}\big[\mathcal{L}(\theta_{\mathrm{priv}})\big] - \min_{\theta \in \mathcal{C}} \mathcal{L}(\theta),$ (2)
where the expectation is taken over the internal randomness of the algorithm for generating $\theta_{\mathrm{priv}}$. The studies [22, 23] proposed private Lasso algorithms with provable utility bounds. Ref. [25] presented a private algorithm with a tighter utility bound under the strong convexity and mutual incoherence assumptions. All of these algorithms run in time that is polynomial in $n$ and $d$. Despite the achievements in the classical scenario, how to design a quantum private Lasso remains unknown and is very challenging because of the disparate priorities of quantum machine learning and DP learning. Specifically, quantum machine learning algorithms aim to achieve a low runtime cost, while DP learning algorithms pursue a tight utility bound. Therefore, the proposed quantum private learning algorithm should accommodate the following requirements from each side:

the runtime cost should outperform its classical counterparts;

the utility bound is expected to be nearly optimal.
Contributions. The main contribution of this study is a quantum differentially private Lasso estimator for the private sparse regression learning task, which yields a better runtime complexity than the best-known classical Lasso and a nearly optimal utility guarantee. To the best of our knowledge, this is the first quantum private learning algorithm that can accomplish practical learning problems with provable advantages.
To make a fair comparison, the quantum input/output models used in our study are restricted to an almost identical setting to the classical case [26], except that we allow coherent queries to the entries of the given inputs. Such quantum input/output models were employed in [3] to design a sublinear-runtime quantum kernel classifier. Let us further emphasize the importance of the employed quantum input model. As cautioned in [3], using quantum input models that are too powerful may render the achieved quantum speedup inconclusive. Alternatively, given a strong classical input model, quantum-inspired machine learning algorithms can collapse various quantum algorithms that claimed exponential speedups [27].
The formal definition of classical and quantum input models used in this study is as follows.
Definition 1 (Classical and quantum input models).
For a given dataset , the classical (quantum) input oracles and can recover the entry and the entry with and in time (in superposition).
Note that ‘in superposition’ means that coherent queries are permitted in the quantum input model, i.e., it is allowed to query many locations at the same time [28]. Numerous quantum algorithms have exploited this property to gain quantum advantages [1, 29].
Our main result is the following theorem on the runtime complexity and the utility bound of the proposed quantum private Lasso estimator with the adaptive privacy mechanism.
Theorem 1 (Informal, see Theorems 7 and 8 for the formal description).
Given the quantum input model and in Definition 1, the proposed quantum private Lasso estimator after iterations is differentially private, and outputs with overall runtime and the utility bound
Given the classical input model formulated in Definition 1, the optimal (lower-bound) runtime of the classical Lasso is $\Omega(n + d)$ (see Lemma 2). Compared with the classical results, our quantum private Lasso yields a runtime speedup in the high-dimensional regime. Moreover, the optimal utility bound has been proven in [23], which indicates that our result is nearly optimal. Consequently, the proposed quantum private Lasso estimator meets the two requirements of quantum private learning algorithm design.
We emphasize two main ingredients that distinguish the quantum private Lasso from the classical private Lasso [23]. The first is the proposed quantum Frank-Wolfe algorithm, which is employed as the backbone of our quantum private Lasso estimator. We prove that the runtime cost of the proposed algorithm achieves a quadratic speedup over the optimal classical Lasso in terms of the feature dimension $d$. We also demonstrate a runtime lower bound for any quantum Lasso estimator, which implies that our result is nearly optimal in terms of $d$. The other key component is the proposed adaptive privacy mechanism, which enables the quantum private Lasso estimator to achieve the near-optimal utility guarantee and the runtime speedup simultaneously; naive applications of existing differential privacy mechanisms result in either a runtime or a utility bound worse than the classical counterparts (see Section 4 for details).
1.1 Related work
The main focus of this study is quantum machine learning. Previous quantum machine learning literature related to our work can be divided into two groups: quantum regression algorithms and quantum differential privacy mechanisms. Here we compare our proposal with these two groups separately.
1.1.1 Quantum regression algorithms
There are a few proposals aiming to solve quantum regression tasks without the privacy requirement. A representative quantum linear regression algorithm was proposed by [30], which showed that the ordinary least-squares fitting problem can be solved with an exponential speedup under the assumption that there exists a quantum random access memory (QRAM) that encodes classical inputs into quantum states in logarithmic runtime [31, 32]. Under such an assumption, the quantum linear systems algorithm [33] can be employed to obtain the closed-form expression for the estimated solution with an exponential speedup. Following this pipeline, subsequent quantum regression algorithms further improve the runtime complexity with respect to polynomial terms, e.g., the rank and condition number [34, 35, 36], and tackle variant regression tasks, e.g., nonlinear regression and ridge regression [37]. In contrast to solving the closed-form expression, the study [38] tackles ridge regression by using the gradient descent method, where the runtime at each iteration achieves an exponential speedup under the QRAM assumption. We remark that such an assumption is very strong, and it remains an open question how to efficiently implement QRAM. Moreover, recent quantum-inspired algorithms adopt a similar assumption to QRAM and dequantize numerous quantum machine learning algorithms with claimed exponential speedups [39, 40, 41, 27].
Unlike the aforementioned results, the input model used in our proposal requires only a very mild assumption, as explained in Definition 1, and our result does not assume the input data to be low rank. Moreover, the quantum-inspired classical methods cannot collapse the runtime speedup achieved by our quantum Lasso, because we employ the quantum minimum finding algorithm as a subroutine.
1.1.2 Quantum private learning
Few studies have investigated the topic of quantum private learning [44, 13, 45, 11]. The study [13] developed quantum privacy mechanisms and analyzed their privacy guarantees. Note that, although Ref. [13] proposed quantum differentially private mechanisms, naively employing them to build the quantum private Lasso estimator leads to an unaffordable utility guarantee (see Section 4 for details). The study [45] proposed a quantum private perceptron, which can classify the input dataset while preserving privacy. However, the privacy metric used in [45] follows the study [14], which is unrelated to the notion of differential privacy and is incomparable with our results. In addition, the proposed quantum private perceptron [45] does not achieve any runtime advantage. The study [44] leveraged the concept of differential privacy to protect quantum classifiers against adversarial attacks. Instead of preserving data privacy, the study [44] only focuses on how to employ the perturbation noise used in quantum differential privacy to preserve the classification accuracy. The study [11] connected the concept of differential privacy to shadow tomography. The main contribution of that study is to utilize results from differential privacy to tackle the quantum shadow tomography problem, deriving the number of state copies required for an unknown quantum state. Recently, the study [12]
explored quantum differential privacy from the perspective of learning theory. Specifically, the authors demonstrated that learnability in the proposed quantum statistical query learning model implies learnability in the quantum private probably approximately correct model. We emphasize that, unlike the above studies, our work aims to develop a private learning algorithm that achieves both a quantum advantage and a provable utility guarantee. These two factors have not been considered together before.
2 Preliminaries
We unify some basic notation throughout the whole paper. The set $\{1, \ldots, n\}$ is denoted as $[n]$. Given a matrix $X$ and a vector $y$, $X_i$ and $y_i$ represent the $i$-th row of $X$ and the $i$-th entry of $y$, respectively. We denote the $\ell_p$ norms of $X$ and $y$ as $\|X\|_p$ and $\|y\|_p$, respectively. Specifically, the Frobenius norm of $X$ is defined as $\|X\|_F = \sqrt{\sum_{ij} X_{ij}^2}$. The notation $e_i$ always refers to the $i$-th unit basis vector, e.g., for $d = 2$, $e_1 = (1, 0)^{\top}$. The identity matrix of size $d \times d$ is denoted as $\mathbb{I}_d$. The Laplacian distribution with scale $\lambda$ (variance $2\lambda^2$) is denoted as $\mathrm{Lap}(\lambda)$.
2.1 Convex optimization
We introduce two basic definitions in convex optimization; refer to [46] for more details.
Definition 2 ($L$-Lipschitz).
A function $f$ is called $L$-Lipschitz over a set $\mathcal{C}$ if for all $x_1, x_2 \in \mathcal{C}$, we have
$|f(x_1) - f(x_2)| \le L\,\|x_1 - x_2\|.$ (3)
If the function $f$ is $L$-Lipschitz, differentiable, and convex, then
$\|\nabla f(x)\| \le L \quad \text{for all } x \in \mathcal{C}.$ (4)
Definition 3 (Curvature constant [47]).
The curvature constant $C_f$ of a convex and differentiable function $f$ with respect to a compact domain $\mathcal{C}$ is
$C_f = \sup_{x, s \in \mathcal{C},\, \gamma \in [0,1],\, y = x + \gamma(s - x)} \frac{2}{\gamma^2}\big(f(y) - f(x) - \langle y - x, \nabla f(x)\rangle\big).$ (5)
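For intuition, a standard bound (not stated explicitly here, but a direct consequence of Definition 3 for quadratic objectives) relates the curvature constant to the domain diameter and the largest Hessian eigenvalue:

```latex
% Illustrative worked bound; X, y, and the l1 ball follow the Lasso setup.
% For f(\theta) = \tfrac{1}{2n}\|X\theta - y\|_2^2 with \nabla^2 f = \tfrac{1}{n}X^\top X,
C_f \;\le\; \operatorname{diam}_{\ell_2}(\mathcal{C})^2 \cdot
            \lambda_{\max}\!\Big(\tfrac{1}{n}X^\top X\Big)
    \;=\; 4\,\lambda_{\max}\!\Big(\tfrac{1}{n}X^\top X\Big),
% since the \ell_1 unit ball has \ell_2 diameter 2.
```

This is why the convergence bounds below scale with the conditioning of the data matrix rather than with the ambient dimension directly.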
2.2 Quantum computation
We present essential background of quantum computation, i.e., quantum states, quantum oracles, and the complexity measure. We refer to [28] for more details.
Quantum mechanics works in the Hilbert space $\mathcal{H} \cong \mathbb{C}^{d}$, where $\mathbb{C}$ represents the complex Euclidean space. We use Dirac notation to denote quantum states. A pure quantum state is defined by a vector (named a ‘ket’) with unit length. Specifically, a state is written as $|\psi\rangle = \sum_i a_i |i\rangle$ with $\sum_i |a_i|^2 = 1$, where the computational basis state $|i\rangle$ stands for the unit basis vector $e_i$. The inner product of two quantum states $|\psi\rangle$ and $|\phi\rangle$ is denoted by $\langle \psi | \phi \rangle$, where $\langle \psi |$ refers to the conjugate transpose of $|\psi\rangle$. We call a state $|\psi\rangle$ in superposition if the number of nonzero entries in its amplitude vector is larger than one. Analogous to the ‘ket’ notation, density operators can be used to describe more general quantum states. Given a mixture of quantum pure states $|\psi_i\rangle$ with probabilities $p_i$ satisfying $\sum_i p_i = 1$, the density operator presents the mixed state as $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ with $\mathrm{Tr}(\rho) = 1$.
The basic element in quantum computation is the quantum bit (qubit). A qubit is a two-dimensional quantum state, e.g., a qubit can be written as $|q\rangle = a|0\rangle + b|1\rangle$ with $|a|^2 + |b|^2 = 1$. Let $|q'\rangle$ be another qubit. The quantum state represented by these two qubits is formulated by the tensor product, i.e., $|q\rangle \otimes |q'\rangle$, as a four-dimensional vector. Following conventions, we can also write $|q\rangle \otimes |q'\rangle$ as $|q\rangle|q'\rangle$ or $|q, q'\rangle$. For clearness, we sometimes denote $|q\rangle|q'\rangle$ as $|q\rangle_A |q'\rangle_B$, which means that the qubit $|q\rangle$ ($|q'\rangle$) is assigned to the quantum register $A$ ($B$). The corresponding density operator for the two-qubit case is $\rho = |\psi\rangle\langle\psi|$ with $|\psi\rangle = |q\rangle \otimes |q'\rangle$.
There are three typical quantum operations. The first is quantum (logic) gates, which operate on a small number of qubits. Any quantum gate corresponds to a unitary transformation and can be stated in the circuit model, e.g., an $n$-qubit quantum gate $U$ satisfies $U U^{\dagger} = U^{\dagger} U = \mathbb{I}_{2^n}$. The second is a quantum channel, which refers to a completely positive (CP) trace-preserving map. Given a density operator $\rho$, the evolved state after applying a channel $\mathcal{E}$ is denoted as $\mathcal{E}(\rho) = \sum_i K_i \rho K_i^{\dagger}$, with $\sum_i K_i^{\dagger} K_i = \mathbb{I}$. Note that a unitary is a special case of a quantum channel. The last is the quantum measurement, which aims to extract quantum information such as the computation result into classical form. Analogous to a quantum channel, a quantum measurement is modeled by a set of operators $\{M_m\}$ with $\sum_m M_m^{\dagger} M_m = \mathbb{I}$. Given a density operator $\rho$, the outcome $m$ will be measured with probability $p_m = \mathrm{Tr}(M_m \rho M_m^{\dagger})$ and the post-measurement state will be $M_m \rho M_m^{\dagger} / p_m$.
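The tensor-product and measurement rules above can be checked numerically. The following sketch (illustrative only, using plain NumPy rather than any quantum library) builds a two-qubit state and its computational-basis outcome probabilities:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])   # |0>
ket1 = np.array([0.0, 1.0])   # |1>

# A qubit a|0> + b|1> with |a|^2 + |b|^2 = 1; here the equal superposition.
q = (ket0 + ket1) / np.sqrt(2)
q_prime = ket0                # a second qubit fixed to |0>

# Two-qubit state |q> (x) |q'>: a 4-dimensional unit vector.
state = np.kron(q, q_prime)

# Computational-basis measurement: outcome i occurs with probability |<i|state>|^2.
probs = np.abs(state) ** 2
print(probs)
```

As expected, only the basis states |00> and |10> carry probability mass, each with probability 1/2, and the probabilities sum to one.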
A quantum oracle can be treated as a ‘black box’ that encapsulates certain quantum operations and can be used as the input to another algorithm. The quantum input model refers to a unitary transformation that allows us to access the input data in superposition, i.e., denoting $S$ as a set of indexes to be queried, we have $O_X |i\rangle|0\rangle = |i\rangle|X_i\rangle$ for any $i \in S$. Note that, analogous to classical computers, $|X_i\rangle$ represents the binary string of $X_i$. Similar rules can be applied to $O_y$.
Finally, the runtime complexity of a quantum algorithm is defined as the number of elementary operations employed in the algorithm. We use $O(\cdot)$ to denote the runtime complexity, or $\tilde{O}(\cdot)$ when hiding polylogarithmic factors. We also employ the little-$o$ notation, i.e., $f = o(g)$, to denote that $\lim_{n \to \infty} f(n)/g(n) = 0$.
2.3 Quantum minimum finding algorithm
A crucial technique used in our study is the quantum minimum finding algorithm (Dürr–Høyer's algorithm) [48]. Given an unordered list $\{u_i\}_{i=1}^{N}$ with $N$ items, the goal of the minimum finding algorithm is to find an index $j^*$, i.e.,
$u_{j^*} = \min_{i \in [N]} u_i.$ (6)
The theoretical result of quantum minimum finding algorithm is as follows.
Lemma 1 (Quantum minimum finding algorithm, [48]).
The quantum minimum finding algorithm finds the index $j^*$ defined in Eqn. (6) with probability at least $1/2$. The corresponding runtime complexity is $O(\sqrt{N})$.
We follow Ref. [48] to explain the implementation of the quantum minimum finding algorithm, summarized in Alg. 1, and refer the interested readers to Ref. [49] for a detailed explanation. First, the input of the algorithm is a quantum oracle that returns the $i$-th item of the unordered list in superposition; a total runtime budget is fixed in advance. While the budget is not exhausted, the algorithm continuously employs Grover search to obtain a candidate index and compares its item with the current threshold item, keeping the smaller of the two. Once the budget is reached, the quantum minimum finding algorithm outputs the current threshold index as the prediction of $j^*$.
A central component of the quantum minimum finding algorithm is the comparator oracle $O_{\mathrm{comp}}$, which is employed to mark every item $u_i$ with $u_i < u_j$ for a given threshold index $j$. Mathematically, $O_{\mathrm{comp}}$ acts as
$O_{\mathrm{comp}} |i\rangle |j\rangle |0\rangle = |i\rangle |j\rangle |f_j(i)\rangle,$
where $f_j(i) = 1$ if $u_i < u_j$; otherwise, $f_j(i) = 0$. Note that $O_{\mathrm{comp}}$ can be implemented efficiently by querying the input oracle twice.
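The threshold-update loop of Alg. 1 can be mimicked classically. In the sketch below (a hypothetical classical simulation, with uniform sampling standing in for Grover search over the marked items), the comparator plays exactly the marking role described above; of course, the classical loop inspects all $N$ items and so offers no speedup:

```python
import random

def quantum_minimum_finding_sim(u, rng=random):
    """Classical simulation of the Durr-Hoyer threshold loop (illustrative;
    a real implementation would use Grover search for the O(sqrt(N)) cost).
    Keep a threshold index j; repeatedly jump to a uniformly random marked
    item u[i] < u[j] until no marked item remains."""
    j = rng.randrange(len(u))
    while True:
        # Comparator-oracle role: mark every i with u[i] < u[j].
        marked = [i for i, v in enumerate(u) if v < u[j]]
        if not marked:          # the threshold is already the minimum
            return j
        j = rng.choice(marked)  # stand-in for Grover search over marked items

u = [7, 3, 9, 1, 4]
print(quantum_minimum_finding_sim(u))  # always returns 3, the argmin of u
```

The threshold value strictly decreases on every update, so the loop always terminates at an index of the minimum regardless of the random choices.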
2.4 Differential privacy
We briefly introduce the definition of classical and quantum differential privacy.
Definition 4 (Classical differential privacy [50]).
An algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if for any two neighboring datasets $D$ and $D'$, and for all measurable sets $\mathcal{S}$ of outputs, the following holds:
$\Pr[\mathcal{A}(D) \in \mathcal{S}] \le e^{\epsilon}\, \Pr[\mathcal{A}(D') \in \mathcal{S}] + \delta.$ (7)
Here the neighboring datasets $D$ and $D'$ are such that the number of rows in $D$ that need to be modified (e.g., removed) to obtain $D'$ is one.
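As a concrete classical illustration of Definition 4 (a standard textbook construction, not part of this paper's algorithms), the Laplace mechanism achieves $\epsilon$-DP by adding noise calibrated to the query's sensitivity:

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Classical Laplace mechanism: release true_answer + Lap(sensitivity/epsilon).
    A Laplace sample is generated as the difference of two exponentials."""
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise

# A counting query changes by at most 1 between neighboring datasets,
# so its sensitivity is 1.
dataset = [1, 0, 1, 1, 0, 1]
count = sum(dataset)
print(laplace_mechanism(count, sensitivity=1.0, epsilon=0.5))
```

Smaller $\epsilon$ means a larger noise scale and hence stronger privacy but lower utility, which is exactly the tension that the utility bounds in this paper quantify.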
Quantum differential privacy (QDP) [13] leverages ideas similar to its classical counterpart to achieve the privacy guarantee, i.e., certain noise is deliberately introduced into the output in order to protect privacy. The major difference is that, instead of using the classical Gaussian or Laplacian mechanism to introduce randomness, QDP employs a quantum channel to add randomness; e.g., the channel can be an amplitude damping, phase damping, or depolarizing channel.
Denote by $\{\Pi_m\}$ a set of positive operators corresponding to the different measurement outcomes. Given an input state $\rho$ and a quantum channel $\mathcal{E}$, let $\mathcal{A}$ be the quantum algorithm that takes the input to generate the privacy-protected quantum state $\mathcal{E}(\rho)$, followed by the measurement $\{\Pi_m\}$. The probability of observing the outcome $m$ is $\Pr[\mathcal{A}(\rho) = m] = \mathrm{Tr}(\Pi_m \mathcal{E}(\rho))$. Similarly, a subset of outcomes $\mathcal{S}$ is observed with probability $\Pr[\mathcal{A}(\rho) \in \mathcal{S}] = \sum_{m \in \mathcal{S}} \mathrm{Tr}(\Pi_m \mathcal{E}(\rho))$. By leveraging the above notation, the definition of quantum differential privacy is as follows.
Definition 5 (Quantum differential privacy [13]).
The quantum algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if for all neighboring input quantum states $\rho$ and $\sigma$, and all measurable sets $\mathcal{S}$ of outcomes, the following holds:
$\Pr[\mathcal{A}(\rho) \in \mathcal{S}] \le e^{\epsilon}\, \Pr[\mathcal{A}(\sigma) \in \mathcal{S}] + \delta.$ (8)
3 Quantum nonprivate Lasso estimator
In this section, we devise a quantum nonprivate Lasso estimator that can tackle sparse regression learning tasks with a provable runtime speedup over its classical counterpart, the Frank-Wolfe algorithm, which is introduced in Section 3.1. The quantum Lasso estimator, as the quantum generalization of the Frank-Wolfe algorithm, is elaborated in Section 3.2. The devised quantum nonprivate Lasso estimator serves as the backbone of the quantum private Lasso (see Section 4).
3.1 Classical FrankWolfe algorithm
The implementation of the Frank-Wolfe algorithm [51] (also known as the conditional gradient method) is summarized in Alg. 2. Remarkably, the Frank-Wolfe algorithm and its variants, as representative methods for solving constrained convex optimization tasks, have been broadly used to build the nonprivate Lasso estimator formulated in Eqn. (1) [47, 52, 53]. Furthermore, the study [23] combines the nonprivate Lasso estimator with the Laplacian privacy mechanism to build a differentially private Lasso estimator.
Let us briefly introduce the Frank-Wolfe algorithm, following the notation used in Eqn. (1). The Frank-Wolfe method is an iterative optimizer that solves $\min_{\theta \in \mathcal{C}} \mathcal{L}(\theta)$ with $\mathcal{C}$ being the $\ell_1$-norm ball. Since the constraint domain $\mathcal{C}$ is the $\ell_1$-norm ball, the minimization of the linearized objective can be done by checking each vertex of the polytope $\mathcal{C}$, where we denote the vertices set $\mathcal{S} = \{s_i\}_{i=1}^{2d}$ so that $s_i = e_i$ for $i \in [d]$ and $s_i = -e_{i-d}$ for $i \in \{d+1, \ldots, 2d\}$. In other words, the vertices set $\mathcal{S}$ contains the $2d$ signed unit basis vectors.
At the $t$-th iteration, the gradient $\nabla \mathcal{L}(\theta^{(t)})$ in Line 4 of Alg. 2 is defined entrywise as, for $i \in [d]$,
$\nabla \mathcal{L}(\theta^{(t)})_i = \frac{1}{n} \sum_{k=1}^{n} \big( \langle X_k, \theta^{(t)} \rangle - y_k \big) X_{ki},$ (9)
which follows from Eqn. (1). The Frank-Wolfe algorithm then moves towards the minimizer of the linearized objective, i.e.,
$s^{(t)} \in \operatorname*{arg\,min}_{s \in \mathcal{S}} \big\langle s, \nabla \mathcal{L}(\theta^{(t)}) \big\rangle.$ (10)
Locating the minimizer $s^{(t)}$ is accomplished in Line 5 of Alg. 2. Figure 1 illustrates the intuition of the Frank-Wolfe algorithm. Note that the updating rule of the Frank-Wolfe algorithm, as shown in Alg. 2, is a linear combination of vertices, which indicates that the iterate is sparse, with at most $t$ nonzero entries after $t$ iterations.
The Frank-Wolfe algorithm is robust to noise in the following sense. Instead of calculating the exact solution $s^{(t)}$ as shown in Eqn. (10), employing an approximated solution $\tilde{s}^{(t)}$ (e.g., obtained by a noisy solver), where $\tilde{s}^{(t)}$ is sampled from a certain distribution $\mathcal{D}$, to update the learning parameters can also guarantee the convergence of the Frank-Wolfe algorithm, as long as $\tilde{s}^{(t)}$ satisfies the following relation,
$\mathbb{E}_{\tilde{s}^{(t)} \sim \mathcal{D}}\big[ \langle \tilde{s}^{(t)}, \nabla \mathcal{L}(\theta^{(t)}) \rangle \big] \le \min_{s \in \mathcal{S}} \langle s, \nabla \mathcal{L}(\theta^{(t)}) \rangle + \frac{1}{2}\, \delta_0\, \gamma_t\, C_{\mathcal{L}},$ (11)
where $C_{\mathcal{L}}$ is the curvature constant formulated in Definition 3, $\gamma_t = 2/(t+2)$ is the learning rate, and the additive term quantifies the approximation quality at step $t$, with $\delta_0 \ge 0$ being an arbitrary fixed error parameter [47]. The following proposition quantifies the convergence rate of the Frank-Wolfe algorithm.
Proposition 1 (Theorem 1, [47]).
Let $\{\tilde{s}^{(t)}\}_{t=0}^{T-1}$ be a sequence of vectors from $\mathcal{S}$ with $\theta^{(t+1)} = (1 - \gamma_t)\theta^{(t)} + \gamma_t \tilde{s}^{(t)}$ and $\gamma_t = 2/(t+2)$, such that for all $t$, Eqn. (11) is satisfied. Then the result $\theta^{(T)}$ satisfies
$\mathbb{E}\big[\mathcal{L}(\theta^{(T)})\big] - \min_{\theta \in \mathcal{C}} \mathcal{L}(\theta) \le \frac{2 C_{\mathcal{L}}}{T + 2}\,(1 + \delta_0).$ (12)
We emphasize that, although the original proof of Proposition 1 [47, Appendix A] only takes the deterministic case into account, it can easily be extended to the expectation setting given in Eqn. (11). Proposition 1 implies that the only difference between the exact and approximate scenarios is that the utility bound of the latter is slightly worse than that of the former. Moreover, under the exact setting, Lasso achieves the utility bound $O(C_{\mathcal{L}}/T)$.
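For reference, the vanilla (non-private, classical) Frank-Wolfe iteration for the Lasso objective can be sketched as follows. Variable names and the step size $\gamma_t = 2/(t+2)$ follow the standard algorithm rather than the paper's exact pseudocode:

```python
import numpy as np

def frank_wolfe_lasso(X, y, T=200, radius=1.0):
    """Frank-Wolfe (conditional gradient) for
    min_{||theta||_1 <= radius} (1/2n) ||X theta - y||_2^2."""
    n, d = X.shape
    theta = np.zeros(d)
    for t in range(T):
        grad = X.T @ (X @ theta - y) / n
        # The linear subproblem over the l1 ball is solved by a signed vertex
        # +/- radius * e_i at the coordinate of largest |gradient entry|.
        i = int(np.argmax(np.abs(grad)))
        s = np.zeros(d)
        s[i] = -radius * np.sign(grad[i])
        gamma = 2.0 / (t + 2)                    # standard step size
        theta = (1 - gamma) * theta + gamma * s  # convex combination: stays in the ball
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
theta_star = np.zeros(20)
theta_star[3] = 1.0                 # 1-sparse ground truth on coordinate 3
y = X @ theta_star                  # noiseless targets
theta_hat = frank_wolfe_lasso(X, y)
print(int(np.argmax(np.abs(theta_hat))))
```

Note how the iterate is always a convex combination of at most $T$ signed vertices, which is the sparsity property mentioned above; on this noiseless toy problem, the dominant coordinate of the ground truth is recovered.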
We end this subsection with the optimal (lower-bound) runtime complexity of the classical nonprivate Lasso given the input model formulated in Definition 1.
Lemma 2.
Given the input model formulated in Definition 1, the runtime complexity of the classical nonprivate and differentially private Lasso is lower bounded by $\Omega(n + d)$.
Proof of Lemma 2.
The study [26] proves that, given the input model formulated in Definition 1, the optimal runtime for the support vector machine (SVM) is $\Omega(n + d)$. Moreover, under the same setting of the input model, the study [54] proves the equivalence between SVM and Lasso. Consequently, the optimal runtime for Lasso is also $\Omega(n + d)$. Recall that differential privacy mechanisms [55] cannot reduce the runtime complexity. Therefore, the runtime lower bound for private Lasso is at least $\Omega(n + d)$.
∎
3.2 Quantum nonprivate Lasso estimator
Our main technical contribution here is a quantum version of the Frank-Wolfe algorithm, which we use to build a quantum Lasso estimator. The implementation of the quantum Lasso estimator is summarized in Alg. 3.
Two steps of our proposal differ from its classical counterpart (Alg. 2); namely, the construction of the state preparation oracle, and the employment of the quantum minimum finding algorithm to find $s^{(t)}$. These two steps enable the quantum Lasso estimator to quadratically reduce the runtime complexity of finding $s^{(t)}$ for any $t \in [T]$. The following theorem describes the overall runtime complexity of the quantum nonprivate Lasso estimator.
Theorem 2.
The proof of Theorem 2 is given at the end of this subsection. Compared with Lemma 2, the proposed quantum nonprivate Lasso achieves a quadratic runtime speedup over the optimal classical Lasso in terms of the feature dimension $d$. Moreover, the achieved runtime complexity of the quantum nonprivate Lasso is nearly optimal in terms of $d$, supported by the following corollary.
Corollary 1.
The runtime complexity of the quantum nonprivate (or private) Lasso is lower bounded by $\Omega(\sqrt{n} + \sqrt{d})$.
Proof of Corollary 1.
The proof of this corollary follows Lemma 2 closely. Given the input model formulated in Definition 1, the runtime lower bound of quantum SVM is $\Omega(\sqrt{n} + \sqrt{d})$ [3]. Owing to the equivalence between Lasso and SVM [54], the runtime lower bound of the corresponding quantum Lasso is also $\Omega(\sqrt{n} + \sqrt{d})$.
∎
We next elaborate how to implement Line 4 and Line 5 of Alg. 3.
State preparation (Line 4). This step aims to build the oracle that encodes the classical gradient vector in Eqn. (3.1) into a quantum state to earn the runtime speedup. The runtime cost of implementing the oracle is as follows, and we provide the proof details in Appendix A.
Theorem 3 (State preparation).
Given access to quantum input oracles and , the state preparation oracle , which prepares an estimated state with successful probability and , i.e.,
(13) 
can be constructed in runtime, where for any , and the runtime hides a polylogarithmical term .
Notice that the runtime complexity to obtain the classical gradient vector scales at least linearly with the feature dimension $d$, due to the matrix-vector multiplication involved. In contrast, the runtime of our algorithm to prepare the estimated state is independent of the feature dimension $d$. Since $d \gg n$ in most practical scenarios, this result indicates the advantage of preparing the state instead of directly computing its classical form, and enables the quantum Lasso to earn a runtime speedup over the classical Lasso.
Find $s^{(t)}$ (Line 5). Given access to the state preparation oracle, we can directly employ the quantum minimum finding algorithm [48] to find the index of the minimum entry, or equivalently, $s^{(t)}$. We summarize the runtime complexity of this step below, and leave the proof details to Appendix B.
Corollary 2.
Suppose that the state preparation oracle can be implemented in runtime. With success probability at least , the classical output can be obtained in runtime. The success probability can be boosted to by repeating the quantum minimum finding algorithm times.
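The boosting claim in Corollary 2 follows the standard success-amplification argument (a generic sketch; $p$ and $\delta'$ are illustrative symbols, not the paper's exact quantities):

```latex
\Pr[\text{all } k \text{ independent runs fail}] = (1-p)^{k} \le e^{-pk},
\quad\Longrightarrow\quad
k = \Big\lceil \tfrac{1}{p}\ln\tfrac{1}{\delta'} \Big\rceil
\ \text{repetitions succeed with probability at least } 1-\delta'.
```

Hence only a logarithmic number of repetitions is needed to drive the failure probability below any target $\delta'$.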
Now we are ready to prove Theorem 2.
Proof of Theorem 2.
Error analysis and utility bound. The error of the quantum Lasso comes from the two subroutines, Line 4 and Line 5, respectively. First, the state preparation oracle only generates an approximated state with success probability and , as stated in the proof of Theorem 3. Second, the quantum minimum finding algorithm can only locate the index that corresponds to the minimum entry of with success probability , as shown in Corollary 2.
Since the quantum minimum finding algorithm queries the oracle at most times as illustrated in Section 2, the probability that the state can always be successfully prepared in all queries is . Overall, the success probability to obtain is , where the index is defined as
(14) 
Since there are in total iterations in the quantum Lasso algorithm, the success probability to collect is
(15) 
where . Eqn. (15) can be simplified as
(16) 
where we choose , . The inequality uses for and ( and correspond to and , respectively). In other words, with success probability , we can collect .
We then analyze the utility bound of the quantum Lasso when the collected basis vectors are with . Following Theorem 3 and the definition of as formulated in Line 5 of Alg. 3, we have
(17) 
where the first inequality uses , the second inequality comes from the fact , and the last inequality employs . By expanding and with their explicit forms, we obtain the following relation, i.e.,
(18) 
In conjunction with Eqn. (11) and (18), we can choose . Finally, Proposition 1 yields
(19) 
where the second inequality employs , and .
Runtime analysis. We then analyze the runtime complexity of each iteration, which can be efficiently obtained from Theorem 3 and Corollary 2. As shown in Theorem 3, the runtime of using the oracle to prepare the state is . Note that we omit the influence of in the runtime analysis of the quantum Lasso, since the runtime to prepare only has a logarithmic dependence in terms of . Following the results in the error analysis, at the th iteration, by repeatedly querying the quantum minimum finding algorithm times, the target basis vectors that satisfy Eqn. (14) can be collected with success probability . Therefore, based on the claim of Corollary 2, the runtime to find is . The runtime of the quantum Lasso with iterations is therefore . By exploiting the explicit form of , , and , the runtime complexity of the quantum Lasso is then equal to
∎
4 Quantum private Lasso estimator
The outline of this section is as follows. In Subsection 4.1, we build a first quantum private Lasso estimator by integrating a classical differential privacy mechanism with the quantum nonprivate Lasso, inspired by the study [23]. We prove that the utility bound of this estimator is nearly optimal, while its runtime cost is large. To achieve the runtime speedup, in Subsection 4.2, we propose a second quantum private Lasso estimator by combining the quantum differential privacy mechanism with the quantum nonprivate Lasso. However, its utility bound may not converge. Next, in Subsection 4.3, we devise a third quantum private Lasso by leveraging the adaptive privacy mechanism, and show that its runtime is better than that of classical DP Lasso algorithms while retaining a nearly optimal utility bound. Lastly, in Subsection 4.4, we discuss the possibility of implementing the third proposal on near-term quantum devices.
The following table provides a quick summary of the main conclusion in this section.
Classical [23]  Q-Laplacian  Q-Depolarization  Q-Adaptive
Runtime cost
Utility bound
4.1 Quantum private Lasso estimator with classical Laplacian noise
The differentially private Lasso estimator aims to obtain the optimal parameters defined in Eqn. (1) while preserving privacy. To achieve the privacy promise, the study [23] replaced the index searching step, i.e., Eqn. (10), in the nonprivate Lasso algorithm (Line 5 of Alg. 2) by its noisy variant
$s^{(t)} \in \operatorname*{arg\,min}_{s \in \mathcal{S}} \big( \langle s, \nabla \mathcal{L}(\theta^{(t)}) \rangle + z_s \big),$
where $\{z_s\}$ are noise samples drawn from the Laplacian distribution $\mathrm{Lap}(\lambda)$ with $\lambda$ being the scale. The randomness introduced by the Laplacian noise enables the developed private Lasso to achieve the differential privacy property with a nearly optimal utility bound [23].
Lemma 3 (Modified from Theorem 3.1, [23]).
Following the same notations used in Eqn. (1), with the variance of the Laplacian distribution set as and being the Lipschitz constant of , the classical private Lasso estimator proposed by [23] after iterations is differentially private and achieves the nearly optimal utility bound, i.e., . The runtime complexity to execute this algorithm is .
Motivated by the classical proposal [23], our first quantum private Lasso estimator is constructed by combining the Laplacian noise with the quantum nonprivate Lasso estimator. We summarize its implementation in Alg. 4. Specifically, instead of preparing and searching the index that corresponds to the minimum entry of as shown in Lines 4–5 of Alg. 3, the quantum private Lasso estimator prepares the state and searches the index that corresponds to the minimum entry of , where is the noise vector sampled from the Laplacian distribution .
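The noisy index-selection step can be sketched classically as report-noisy-min (an illustrative sketch with a hypothetical helper name; the calibration of the noise scale to the Lipschitz constant, $n$, $T$, and the privacy budget follows [23] and is omitted here):

```python
import random

def noisy_argmin(scores, scale, rng=random):
    """Report-noisy-min sketch: perturb each candidate score with
    Laplace(scale) noise, then return the index of the smallest noisy score.
    A Laplace sample is generated as the difference of two exponentials."""
    noisy = [s + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
             for s in scores]
    return min(range(len(noisy)), key=noisy.__getitem__)

scores = [0.9, 0.2, 0.5, 0.7]
# With a small scale the private selection matches the true argmin;
# larger scales trade utility for stronger privacy.
print(noisy_argmin(scores, scale=1e-6, rng=random.Random(1)))  # -> 1
```

Because only the winning index is released (not the noisy scores), each Frank-Wolfe iteration consumes a small, composable amount of privacy budget.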