Regulators around the world are enforcing privacy-by-design and privacy-by-default approaches to protect users' data at rest, in transit, and during processing. Several service providers and applications that traditionally use users' data in the plaintext domain to extract patterns and provide services are now applying encrypted-domain computations. Example applications include disease classification in health-care, data search in the cloud, and biometric verification (e.g., [1, 2, 3, 4, 8, 5, 6, 7] and references therein).
The common theme across these applications is that two distrusting parties want to work towards a common goal by combining their data while preserving data privacy. For example, a buyer may want to prove his age to an on-line shop using a security token instead of sending his date of birth.
There are algorithms developed in the literature to support data privacy for applications such as classification, data mining, and distance calculation [1, 2, 3, 4, 8, 5, 6, 7]. In all of these algorithms, one party encrypts the sensitive data whenever that data must be sent to the other party. Hence, the second party needs to process the received data in the encrypted domain. This approach ensures data privacy. Regardless of the algorithm, the privacy-preserving scalar product (PPSP) has been used as one of the privacy-enabling tools between the two parties. The intuition behind this is that a mathematical function that relies on two different variables can be rewritten as a scalar product [3, 4]. Therefore, PPSP is a vital tool in most privacy-preserving (PP) algorithms.
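As a concrete instance of this reduction, the squared Euclidean distance between two parties' vectors (an identity that also underpins one of the schemes reviewed in Section 2) decomposes into norms each party can compute alone plus a single scalar product. The sketch below illustrates this with arbitrary example vectors:

```java
// Example of reducing a two-party function to a scalar product: the squared
// Euclidean distance between A's vector x and B's vector y decomposes into
// norms each party can compute alone plus one scalar product x^T y.
public class DistanceAsScalarProduct {
    static long dot(long[] a, long[] b) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        long[] x = {3, 1, 4};   // held by A
        long[] y = {2, 7, 1};   // held by B

        long direct = 0;        // ||x - y||^2 computed directly
        for (int i = 0; i < x.length; i++) direct += (x[i] - y[i]) * (x[i] - y[i]);

        // ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x^T y
        long viaSp = dot(x, x) + dot(y, y) - 2 * dot(x, y);
        System.out.println(direct + " == " + viaSp);  // 46 == 46
    }
}
```

Only the cross term `dot(x, y)` involves both parties' data, so a PPSP protocol for that single term suffices to evaluate the whole distance privately.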
Suppose two parties, A and B, want to compute the scalar product
$$z = \mathbf{x}^T\mathbf{y} = \sum_{i} x_i y_i,$$
where vector $\mathbf{x}$ belongs to A and vector $\mathbf{y}$ belongs to B. The privacy requirement here is that no party is allowed to learn the other's input vector. At the end, only one party learns the output of the scalar product (SP). Fig. 1a shows the data flow between the users during the SP computation and Fig. 1b shows the use of PPSP in different applications.
Several solutions have been proposed in the literature to address this problem (see Section 2). These solutions rely on either public-key encryption techniques to achieve strong security or randomisation techniques to achieve high efficiency. The security of these schemes relies on mathematically hard problems, and these solutions may become obsolete within a few years due to the rise of quantum computers, since there are existing quantum algorithms that can efficiently solve these mathematically intractable problems [11, 12, 9, 13, 10].
Hence, this paper exploits lattice-based cryptography to build a PPSP. The proposed model is similar to lattice-based fully homomorphic encryption schemes and supports multiple encryptions and additions without decryption. However, the major challenge was to ensure that the error terms do not overflow and affect the accuracy. The paper proposes a methodology to control the error terms while ensuring a given security level, i.e., 128-bit.
Lattice-based cryptography is believed to be secure against quantum attacks and is expected to replace the existing public-key cryptography schemes [11, 12, 9, 13, 10]. Therefore, the proposed solution will be secure against quantum computers and can be used in PP algorithms for various applications to achieve privacy. At the same time, the experimental results (see Section 6) show that the proposed PPSP can also be executed significantly faster than the existing PPSP schemes at an equivalent security level.
The rest of this paper is organised as follows: The related work is discussed in Section 2. The background information about lattice-based cryptography and its hardness assumptions are provided in Section 3. The proposed algorithm is described in Section 4 followed by the security analysis and parameter selections in Section 5. Experimental results are provided in Section 6. The conclusions and future work are discussed in Section 7.
2 Literature review
The existing PPSP schemes can broadly be divided into two categories: 1) schemes built using proven cryptography such as homomorphic encryption, and 2) schemes built on information-theoretic techniques such as randomisation and linear algebra. Even though the latter is much more efficient than the former, the security level of the latter is not quantified. The following subsections study the state-of-the-art algorithms in each category.
2.1 Homomorphic encryption based PPSP
Homomorphic encryption techniques such as Paillier play a vital role in supporting PPSP since they offer a high security level. Even though this scheme is highly secure, it becomes inefficient as the size of the vectors grows, i.e., it may take a long time (e.g., a few minutes on a modern laptop with five cores and 6GB memory) to compute the scalar product when the dimension of the vectors is around 1000. Several efficient PPSP schemes were proposed in the literature to improve the efficiency [25, 26, 27, 28, 29, 30, 31, 21, 20, 23]. All of these schemes use the homomorphic PPSP scheme as a benchmark to measure efficiency. We discuss these in the following subsections.
A lattice-based functional encryption technique that predicates whether the SP equals a given target value was proposed in the literature. This work is based on lattice trapdoors. If the SP equals the target value, then the trapdoors successfully remove the large elements in the problem. Note that this work is completely different from the objective of the work proposed in this paper, and its algorithm cannot be modified to develop a PPSP scheme.
Several related works treat the encryption technique as a black box to develop applications ranging from logistic regression based prediction to statistics over smart meter readings in the encrypted domain. In contrast to traditional homomorphic encryption such as Paillier, learning with errors based encryption involves a number of parameters that must be set properly for problems of different dimensions. Otherwise, as we will show in Section 3, the error terms will overflow and decryption will be unsuccessful. In this paper, we clearly show how to set up the parameters to achieve different levels of security. Most importantly, this is the first paper that compares the performance of a quantum-secure cryptographic scheme against a traditional homomorphic encryption scheme and an information-theoretically secure scheme, and shows that a quantum-secure scheme can outperform the other schemes if the parameters are set properly.
2.2 Information theory based PPSP
In 2001, Du et al. proposed a PPSP algorithm using a 1-out-of-N oblivious transfer function and homomorphic encryption. This algorithm is based on splitting the input vector of Party A into a number of random vectors to achieve privacy from Party B. The drawback of this method is that both parties need to be on-line and interact several times to perform the SP.
In 2002, Du et al. proposed another SP protocol which reduces the communication complexity of their previous work, but with the help of a semi-trusted third-party server. The algorithm requires the third-party server to generate two random vectors; one is revealed to A and the other to B. Using these vectors, A and B compute shares of the SP. Hence, both parties must reveal their shares to obtain the actual SP value. The communication complexity of this protocol is four times the communication cost of an SP without privacy. Moreover, the major drawback of this work is the involvement of a third party, who can easily collude with one of the parties to reveal the other party's input vector.
Vaidya and Clifton in 2002 proposed a novel PPSP solution without the need for a third party. Its communication complexity is the same as that of Du et al.'s scheme, although its computation cost differs. Moreover, the security of this SP algorithm depends on the difficulty of solving linear equations.
In 2007, Amirbekyan et al. proposed a PPSP based on homomorphic encryption and randomisation (the add-vector protocol). Since $2\mathbf{x}^T\mathbf{y} = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - \|\mathbf{x}-\mathbf{y}\|^2$, the authors exploited a homomorphic encryption technique to compute $\|\mathbf{x}-\mathbf{y}\|^2$. Party A generates a public and private key pair using any homomorphic encryption scheme that offers additive homomorphism (e.g., Paillier encryption) and encrypts the elements of vector $\mathbf{x}$. The encrypted vector and the public key are sent to Party B. Party B subtracts its vector $\mathbf{y}$ from the encrypted $\mathbf{x}$ using the homomorphic properties and obtains the encrypted $\mathbf{x}-\mathbf{y}$. Subsequently, Party B permutes and sends the elements of the encrypted $\mathbf{x}-\mathbf{y}$ to Party A. Party A decrypts the vector received from Party B and obtains the permuted $\mathbf{x}-\mathbf{y}$. Party A also receives $\|\mathbf{y}\|^2$ from Party B. Using these, Party A can compute the required SP. Similarly, there are several variations of PPSP algorithms proposed in the literature that use homomorphic encryption, randomisation, or both [29, 30, 31].
One of the most secure and lightweight algorithms to date is SPOC: Secure and Privacy-preserving Opportunistic Computing, proposed in [21, 20], which is shown to be faster than the other SP schemes while achieving high security. The security and privacy of the input vectors are protected by masking them with large random integers. It is shown that the computational complexity is almost negligible and the communication complexity is almost half compared to the Paillier homomorphic encryption based SP. To make a fair comparison with the proposed scheme, we reset the SPOC parameters to achieve 128-bit security against traditional computers. Then, in Section 6, we compare the performance of SPOC against the proposed lattice-based PPSP scheme and show that the latter is at least twice as fast as the SPOC algorithm.
Recently, a linear algebra based PPSP was proposed for biometric identification. The proposed solution is efficient and does not require the parties to be on-line. In particular, the solution is very useful when Party A wants to outsource the SP computation to Party B.
For this scheme, Party A holds both input vectors. Initially, Party A obtains a diagonal matrix from the first input vector, then generates two random invertible matrices and a random lower triangular matrix. The encryption of the input vector is simply a matrix multiplication, and this encrypted matrix is sent to Party B. Later, if Party A wants to compute an SP, then Party A generates another random lower triangular matrix and computes an encryption of the diagonal matrix of the second vector. This encrypted matrix is sent to Party B, who computes a matrix trace expression that is equivalent to the required SP, where $\mathrm{tr}(\cdot)$ denotes the matrix trace operation.
This model has been applied in various biometric authentication applications. For example, a recent work exploited this scheme to protect biometric templates. The user extracts a biometric template and encrypts it using random matrices as explained in the previous paragraph. Later, if the user wants to authenticate to the server, the user extracts a new biometric sample, encrypts it using the random matrices, and sends it to the server. Using these encrypted samples, the server can measure their similarity. This model requires the multiplication of several matrices, and the complexity increases substantially when the elements of the matrices are set to large integers to achieve 128-bit or higher security. Again, the security of these schemes depends on integer factorisation and is vulnerable to quantum algorithms.
3 Lattice based Cryptography
We use bold lower-case letters (e.g., $\mathbf{x}$) to denote column vectors; for row vectors we use the transpose (e.g., $\mathbf{x}^T$). We use bold upper-case letters (e.g., $\mathbf{A}$) to denote matrices, and identify a matrix with its ordered set of column vectors. We denote horizontal concatenation of vectors and/or matrices using a vertical bar, e.g., $[\mathbf{A} \,|\, \mathbf{b}]$. For any integer $q \ge 1$, we use $\mathbb{Z}_q$ to denote the ring of integers modulo $q$, and $\mathbb{Z}_q^{n \times m}$ to denote the set of $n \times m$ matrices with entries in $\mathbb{Z}_q$. Real numbers are denoted by lower-case italic letters.
An $n$-dimensional lattice $\Lambda$ is a full-rank discrete subgroup of $\mathbb{R}^n$. Let $\mathbf{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_k\}$ denote a set of linearly independent vectors in $\mathbb{R}^n$. Then the lattice generated by $\mathbf{B}$ is defined to be the set of all integer combinations of $\mathbf{B}$ as follows:
$$\Lambda(\mathbf{B}) = \left\{ \sum_{i=1}^{k} z_i \mathbf{b}_i : z_i \in \mathbb{Z} \right\}.$$
The set of vectors $\mathbf{B}$ is called a basis for the lattice $\Lambda$, and $k$ is called the rank of the lattice.
Without loss of generality, we consider integer lattices, i.e., lattices whose points have coordinates in $\mathbb{Z}^n$. Among these lattices, many cryptographic applications use a particular family of so-called "q-ary" integer lattices, which contain $q\mathbb{Z}^n$ as a sub-lattice for some small integer $q$. There are two different q-ary lattices considered in many lattice-based cryptographic applications. Let us define them as follows:
For instance, for any integer $q$ and any matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, the set of vectors $\mathbf{z} \in \mathbb{Z}^m$ that satisfy the following equation
$$\mathbf{A}\mathbf{z} \equiv \mathbf{0} \bmod q$$
forms a lattice of dimension $m$, which is closed under congruence modulo $q$. This lattice is denoted by $\Lambda_q^{\perp}(\mathbf{A})$, where
$$\Lambda_q^{\perp}(\mathbf{A}) = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} \equiv \mathbf{0} \bmod q\}.$$
Using $\Lambda_q^{\perp}(\mathbf{A})$, we define a coset or shifted lattice $\Lambda_{\mathbf{u}}^{\perp}(\mathbf{A})$, where
$$\Lambda_{\mathbf{u}}^{\perp}(\mathbf{A}) = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} \equiv \mathbf{u} \bmod q\} = \Lambda_q^{\perp}(\mathbf{A}) + \bar{\mathbf{z}},$$
where $\bar{\mathbf{z}}$ is an integer solution to $\mathbf{A}\bar{\mathbf{z}} \equiv \mathbf{u} \bmod q$.
Similarly, we can define another $m$-dimensional q-ary lattice, $\Lambda_q(\mathbf{A})$: the set of vectors $\mathbf{z} \in \mathbb{Z}^m$ for which there exists $\mathbf{s} \in \mathbb{Z}_q^n$ satisfying the following equation:
$$\mathbf{z} \equiv \mathbf{A}^T\mathbf{s} \bmod q.$$
It is easy to check that $\Lambda_q(\mathbf{A})$ and $\Lambda_q^{\perp}(\mathbf{A})$ are dual lattices (up to a scaling factor of $q$).
3.2 Lattice Hard Problems
There are three well-known lattice hard problems that have been exploited by researchers to build several cryptographic applications. This section briefly defines these hard problems.
3.2.1 Short integer solution
The hardness of finding a short integer solution (SIS) was first exploited by Ajtai. The SIS problem has served as a foundation for many cryptographic applications such as one-way hash functions, identification schemes, and digital signatures using lattices. The SIS problem can be defined as follows:
Definition for SIS
For given uniformly random vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m \in \mathbb{Z}_q^n$, forming the columns of a matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, finding a non-zero short integer vector $\mathbf{z} \in \mathbb{Z}^m$ with norm $\|\mathbf{z}\| \le \beta$ such that
$$\mathbf{A}\mathbf{z} = \sum_{i=1}^{m} \mathbf{a}_i z_i \equiv \mathbf{0} \bmod q$$
is intractable. This problem has the following useful observations:
Without the requirement $\|\mathbf{z}\| \le \beta$, i.e., without requiring a "short" solution, it is easy to find a vector via Gaussian elimination that satisfies $\mathbf{A}\mathbf{z} \equiv \mathbf{0} \bmod q$.
The problem becomes easier to solve if $m$ is increased and more difficult to solve if $n$ is increased.
The norm bound $\beta$ and the number of column vectors $m$ must be large enough that a solution is guaranteed to exist. This is the case, for example, when $m \ge n \log_2 q$ and $\beta \ge \sqrt{m}$.
3.2.2 Inhomogeneous short integer solution
Definition for ISIS
For given uniformly random vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m \in \mathbb{Z}_q^n$, forming the columns of a matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, and a uniformly random vector $\mathbf{u} \in \mathbb{Z}_q^n$, finding a non-zero integer vector $\mathbf{z} \in \mathbb{Z}^m$ with norm $\|\mathbf{z}\| \le \beta$ such that
$$\mathbf{A}\mathbf{z} \equiv \mathbf{u} \bmod q$$
is intractable.
3.2.3 Learning with errors
Learning with errors (LWE) [13, 9] is an encryption-enabling lattice-based problem similar to SIS. To enable encryption, the LWE problem depends on a "small" error distribution $\chi$ over the integers. LWE is parametrised by positive integers $n$ and $q$, and the small error distribution $\chi$, which is typically a "rounded" normal distribution with mean $0$ and standard deviation $\alpha q$ for a constant $\alpha > 0$. The constant $\alpha$ plays a critical role in the security of LWE, and it should be chosen as large as possible while satisfying the following condition:
$$\alpha q > 2\sqrt{n}. \qquad (8)$$
There are two versions of LWE based problems. Before defining these, let us define a distribution called LWE-distribution as follows:
For a given secret vector $\mathbf{s} \in \mathbb{Z}_q^n$, a sample from the LWE distribution is obtained by choosing a vector $\mathbf{a} \in \mathbb{Z}_q^n$ uniformly at random and a "small" error $e \leftarrow \chi$, and outputting the pair $(\mathbf{a},\, b = \mathbf{a}^T\mathbf{s} + e \bmod q)$.
Using the LWE distribution, we can define two versions of LWE problem as follows:
Search-LWE: Given $m$ independent samples $(\mathbf{a}_i, b_i)$ drawn from the above LWE distribution for a uniformly random $\mathbf{s}$ (fixed for all samples), it is intractable to find $\mathbf{s}$.
Decision-LWE: Given $m$ independent samples $(\mathbf{a}_i, b_i)$, where every sample is distributed according to either (1) the LWE distribution for a uniformly random $\mathbf{s}$ (fixed for all samples) or (2) the uniform distribution, distinguishing which is the case is intractable. We can make the following observations from the two LWE problems outlined above:
Without the error term $e$, the Search-LWE problem can be solved easily using the Gaussian elimination technique and the secret $\mathbf{s}$ can be recovered.
Similarly, for the Decision-LWE problem, without the error term $e$, the Gaussian elimination technique will reveal with high probability that no solution exists when the samples are not drawn from the LWE distribution.
If there are $m$ LWE samples $(\mathbf{a}_i, b_i)$ for a uniformly random $\mathbf{s}$ (fixed for all samples), we can combine all the $\mathbf{a}_i$s into a matrix $\mathbf{A}$, the $b_i$s into a vector $\mathbf{b}$, and the $e_i$s into a vector $\mathbf{e}$ to obtain the following vector-matrix linear equation:
$$\mathbf{b} = \mathbf{A}^T\mathbf{s} + \mathbf{e} \bmod q.$$
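The matrix form above can be sketched with toy parameters. The values of $n$, $m$, $q$, and the error standard deviation below are illustrative only, far too small for any real security:

```java
import java.util.Random;

// Toy LWE sample generation in the matrix form b = A^T s + e mod q.
// Parameters are illustrative only and provide no real security.
public class LweSamples {
    static final int N = 8;        // secret dimension n
    static final int M = 16;       // number of samples m
    static final long Q = 3329;    // modulus q (illustrative)

    static long[][] A = new long[M][N];  // rows are the sample vectors a_i
    static long[] b = new long[M];       // b_i = <a_i, s> + e_i mod q
    static long[] s = new long[N];       // the secret

    static void generate(long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < N; i++) s[i] = Math.floorMod(rnd.nextLong(), Q);
        for (int i = 0; i < M; i++) {
            long dot = 0;
            for (int j = 0; j < N; j++) {
                A[i][j] = Math.floorMod(rnd.nextLong(), Q);
                dot = (dot + A[i][j] * s[j]) % Q;
            }
            long e = Math.round(rnd.nextGaussian() * 3.0); // small rounded-Gaussian error
            b[i] = Math.floorMod(dot + e, Q);
        }
        // Without the error e, s could be recovered from (A, b) by Gaussian
        // elimination; the small error is exactly what makes LWE hard.
    }

    public static void main(String[] args) {
        generate(42);
        System.out.println("b[0] = " + b[0]);
    }
}
```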
In the following sections, we exploit the above lattice hard problems to develop the lattice-based PPSP.
4 Lattice-based PP Scalar Product Computation
Let us suppose there are two distrusting entities, X and Y. Entity X owns an $l$-dimensional binary vector $\mathbf{x} \in \{0,1\}^l$. Entity Y owns another $l$-dimensional binary vector $\mathbf{y} \in \{0,1\}^l$. Both X and Y want to interact with each other to compute the SP without revealing their own vector to the other party. In the end, one party obtains $\mathbf{x}^T\mathbf{y}$. To perform PPSP using lattices, four steps are required. The following subsections describe each of them in detail. The complete algorithm is given in Fig. 2.
4.0.1 System initialisation
Let us start by generating a uniformly random matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times l}$, which is known to both X and Y. The matrix contains column vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_l$, i.e., $\mathbf{A} = [\mathbf{a}_1 \,|\, \mathbf{a}_2 \,|\, \cdots \,|\, \mathbf{a}_l]$.
4.0.2 Step 1
Entity X computes a SIS-style vector $\mathbf{u}$ using $\mathbf{A}$ and the binary vector $\mathbf{x}$ as
$$\mathbf{u} = \mathbf{A}\mathbf{x} \bmod q,$$
and sends $\mathbf{u}$ to Y.
4.0.3 Step 2
Entity Y generates a uniformly random vector $\mathbf{s} \in \mathbb{Z}_q^n$, a small error term $e_0 \leftarrow \chi$, and a small error vector $\mathbf{e}_1 \leftarrow \chi^l$. Then Y computes the following LWE-style term $c$ and vector $\mathbf{b}$:
$$c = \mathbf{s}^T\mathbf{u} + e_0 \bmod q, \qquad \mathbf{b}^T = \mathbf{s}^T\mathbf{A} + \mathbf{e}_1^T + \Delta\,\mathbf{y}^T \bmod q,$$
where $\Delta = \lfloor q/(l+1) \rfloor$ (see Section 5.1), and sends these to X.
4.0.4 Step 3
Entity X performs the following computation to retrieve the SP value:
$$d = \mathbf{b}^T\mathbf{x} - c \bmod q, \qquad (12)$$
and outputs $\mathbf{x}^T\mathbf{y} = \lfloor d/\Delta \rceil \bmod (l+1)$.
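The three steps can be simulated end-to-end. The sketch below follows our reading of the scheme, with $\Delta = \lfloor q/(l+1) \rfloor$ encoding the result; all parameter values are toy choices for illustration, far below the 128-bit parameter sets of Table II:

```java
import java.util.Random;

// End-to-end toy run of the lattice-based PPSP (Steps 1-3). Parameter values
// are illustrative only and provide no real security.
public class LwePpspSketch {
    static final int N = 16;                  // LWE dimension n
    static final int L = 8;                   // input vector dimension l
    static final long Q = 1L << 20;           // modulus q
    static final long DELTA = Q / (L + 1);    // Delta = floor(q / (l+1))

    // Runs the protocol on binary vectors x (held by X) and y (held by Y)
    // and returns the scalar product recovered by X.
    public static long run(int[] x, int[] y, long seed) {
        Random rnd = new Random(seed);

        // System initialisation: shared uniform matrix A in Z_q^{n x l}.
        long[][] A = new long[N][L];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < L; j++)
                A[i][j] = Math.floorMod(rnd.nextLong(), Q);

        // Step 1 (X): u = A x mod q  (a SIS-style instance with short x).
        long[] u = new long[N];
        for (int i = 0; i < N; i++) {
            long acc = 0;
            for (int j = 0; j < L; j++) acc = (acc + A[i][j] * x[j]) % Q;
            u[i] = acc;
        }

        // Step 2 (Y): secret s, small errors e0 and e1; LWE-style c and b.
        long[] s = new long[N];
        for (int i = 0; i < N; i++) s[i] = Math.floorMod(rnd.nextLong(), Q);
        long e0 = Math.round(rnd.nextGaussian() * 2.0);
        long c = 0;
        for (int i = 0; i < N; i++) c = (c + s[i] * u[i]) % Q;
        c = Math.floorMod(c + e0, Q);
        long[] b = new long[L];
        for (int j = 0; j < L; j++) {
            long e1 = Math.round(rnd.nextGaussian() * 2.0);
            long acc = 0;
            for (int i = 0; i < N; i++) acc = (acc + s[i] * A[i][j]) % Q;
            b[j] = Math.floorMod(acc + e1 + DELTA * y[j], Q);
        }

        // Step 3 (X): d = b^T x - c = Delta * <x,y> + small error (mod q); round.
        long d = 0;
        for (int j = 0; j < L; j++) d = (d + b[j] * x[j]) % Q;
        d = Math.floorMod(d - c, Q);
        return Math.round((double) d / DELTA) % (L + 1);
    }

    public static void main(String[] args) {
        int[] x = {1, 0, 1, 1, 0, 1, 0, 1};
        int[] y = {1, 1, 0, 1, 0, 1, 1, 0};
        System.out.println("recovered <x,y> = " + run(x, y, 7)); // true value is 3
    }
}
```

Because the accumulated error $\mathbf{e}_1^T\mathbf{x} - e_0$ stays far below $\Delta/2$ at these settings, the final rounding step recovers the exact scalar product.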
4.1 Condition for Correctness
Let us derive the condition for the above-mentioned algorithm to output a correct result. In (12),
$$d = \mathbf{b}^T\mathbf{x} - c = \mathbf{s}^T\mathbf{A}\mathbf{x} + \mathbf{e}_1^T\mathbf{x} + \Delta\,\mathbf{y}^T\mathbf{x} - \mathbf{s}^T\mathbf{u} - e_0 \bmod q.$$
Since $\mathbf{u} = \mathbf{A}\mathbf{x} \bmod q$, and hence $\mathbf{s}^T\mathbf{A}\mathbf{x} \equiv \mathbf{s}^T\mathbf{u} \bmod q$,
$$d = \Delta\,\mathbf{x}^T\mathbf{y} + (\mathbf{e}_1^T\mathbf{x} - e_0) \bmod q. \qquad (13)$$
In (13), the scalar product is masked by the error term $\nu = \mathbf{e}_1^T\mathbf{x} - e_0$. To output a correct answer, this error term must satisfy the following condition:
$$|\nu| = |\mathbf{e}_1^T\mathbf{x} - e_0| < \Delta/2, \qquad (14)$$
which proves the correctness of the proposed algorithm. Further, the requirement (14) on the error term should be analysed so that $|\nu|$ is always smaller than $\Delta/2$. To achieve this, we need to find the upper bound of the error term. The following subsection is dedicated to this analysis.
4.2 Upper bound of the error term ($\nu = \mathbf{e}_1^T\mathbf{x} - e_0$)
As we described in Section 3.2.3, the small error terms are sampled from a "rounded" normal distribution: a sample is drawn from a normal distribution with mean $0$ and standard deviation $\sigma$, rounded to the nearest integer, and reduced modulo $q$. Hence $e_0$ and every element of the error vector $\mathbf{e}_1$ belong to a "rounded" normal distribution with mean $0$ and standard deviation $\sigma$.
Using the above information, let us find the upper bound of the error term $\nu = \mathbf{e}_1^T\mathbf{x} - e_0$. Let us define an $(l+1)$-dimensional vector $\mathbf{e} = [\mathbf{e}_1^T \,|\, -e_0]^T$ and another $(l+1)$-dimensional vector $\bar{\mathbf{x}} = [\mathbf{x}^T \,|\, 1]^T$; hence $\nu = \mathbf{e}^T\bar{\mathbf{x}}$. Using the Cauchy-Schwarz inequality, we can bound the error term as
$$|\nu| \le \|\mathbf{e}\| \cdot \|\bar{\mathbf{x}}\|.$$
Since $\mathbf{x}$ is binary, the Euclidean norm of $\bar{\mathbf{x}}$ is at most $\sqrt{l+1}$. Hence,
$$|\nu| \le \|\mathbf{e}\|\sqrt{l+1}.$$
Since each element of $\mathbf{e}$ follows a rounded normal distribution with mean $0$ and standard deviation $\sigma$, if we take $5\sigma$ as an upper bound for the magnitude of each element, then the probability of an element exceeding this bound is roughly one in four million. The probability will decrease further if we choose a higher number of standard deviations for the upper bound. Without loss of generality, in the rest of the paper, we use five standard deviations. Therefore, with very high probability, $\|\mathbf{e}\| \le 5\sigma\sqrt{l+1}$, and hence the error
$$|\nu| \le 5\sigma(l+1).$$
As long as this error is smaller than $\Delta/2$, i.e.,
$$5\sigma(l+1) < \frac{\Delta}{2}, \qquad (22)$$
our proposed solution outputs a correct result. Hence, if the upper bound for the standard deviation $\sigma$ is
$$\sigma < \frac{\Delta}{10(l+1)} \approx \frac{q}{10(l+1)^2}, \qquad (23)$$
then with high probability (it may fail to provide a correct result roughly one in four million times), the proposed algorithm outputs a correct result. This concludes the proof of correctness. The requirements for correctness are listed in Table I. The next section analyses the security of the proposed algorithm.
5 Security Analysis
Firstly, let us analyse whether Y can learn the secret vector $\mathbf{x}$ from the exchanged vector $\mathbf{u}$ in Step 1. Since $\mathbf{x} \in \{0,1\}^l$ (and therefore $\mathbf{x}$ is a short vector), according to the hardness of the ISIS problem defined in Section 3.2, it is intractable for Y to solve $\mathbf{A}\mathbf{x} \equiv \mathbf{u} \bmod q$ and obtain a short vector as a solution.
The Step 1 operation is similar to hashing. Since the dimension of a typical vector $\mathbf{x}$ is $l$, there are $2^l$ possibilities. The only problem (the same as in any hashing algorithm) is that the output of Step 1 is deterministic for the same $\mathbf{x}$.
Therefore, a brute-force approach may not work for Y. Hence, Y needs to use mathematical properties to uncover $\mathbf{x}$ from $\mathbf{u}$. In other words, if Y can recover $\mathbf{x}$ from $\mathbf{u}$, then Y can solve the lattice hard problem. As defined in Section 3.2, Y cannot find a vector shorter than the bound achievable by the best known algorithms. Therefore, let us analyse the shortest possible vector which can be recovered by Y.
Suppose Y wants to find a short vector from $\mathbf{u}$; then Y may exploit the state-of-the-art techniques called the lattice-reduction method and/or the combinatorial method. Denote the shortest vector which can be found by these techniques as $\mathbf{z}'$. It is proven in the literature (theoretically and experimentally) that the Euclidean length of $\mathbf{z}'$ has the following lower bound:
$$\|\mathbf{z}'\| \ge \min\left\{q,\; 2^{2\sqrt{n \log_2 q \log_2 \delta}}\right\}, \qquad (24)$$
where $\delta$ is the quality of the lattice-reduction algorithm. Hence, if
$$\|\mathbf{x}\| \le \sqrt{l} < \min\left\{q,\; 2^{2\sqrt{n \log_2 q \log_2 \delta}}\right\}, \qquad (25)$$
then Y cannot recover $\mathbf{x}$ from $\mathbf{u}$. This is the first condition for security. Also, the cost ($T$) of finding a short binary vector using the techniques described above is given by the cost equation (26), where the lattice-reduction quality $\delta$ must satisfy equation (27).
Now let us focus on whether X can recover $\mathbf{y}$ from the messages $c$ and $\mathbf{b}$ sent by Y to X in Step 2.
According to the definition in Section 3.2, if $c$ and $\mathbf{b}$ are LWE terms, then it is intractable for X to recover $\mathbf{s}$, since $c$ and $\mathbf{b}$ are indistinguishable from the uniform distribution. This holds if $\mathbf{A}$, $\mathbf{s}$, and $\mathbf{u}$ are uniformly distributed and the error term $e_0$ and error vector $\mathbf{e}_1$ are sampled from a normal distribution with standard deviation satisfying (8); then $c$ and $\mathbf{b}$ are indistinguishable from uniformly random values.
Matrix $\mathbf{A}$ is already a uniformly random matrix, and entity Y can generate $\mathbf{s}$, $e_0$, and $\mathbf{e}_1$ with the correct distributions. The vector $\mathbf{u}$ sent by X is uniformly random as long as the number of possibilities for $\mathbf{x}$ is larger than the number of possible values of $\mathbf{u}$, i.e., $2^l > q^n$, or $l > n \log_2 q$ (this is the second security condition).
Since the dimension of $\mathbf{u}$ is $n$, and the scalar $\mathbf{s}^T\mathbf{u}$ is masked by the error term $e_0$, the term $c$ is a scalar LWE term and hence completely random. Therefore, according to the LWE definition, it is intractable for X to recover the elements of $\mathbf{s}$ from the scalar $c$. To analyse $\mathbf{b}$, let us denote the $i$th element of $\mathbf{b}$ as $b_i = \mathbf{s}^T\mathbf{a}_i + e_{1,i} + \Delta y_i$, where $i \in \{1, \ldots, l\}$. In $b_i$, the term $\mathbf{s}^T\mathbf{a}_i + e_{1,i}$ is a scalar LWE term, i.e., uniformly random. Similar to an LWE encryption scheme, it acts like a one-time pad to hide the message $\Delta y_i$. Hence, X cannot recover $\mathbf{y}$ from $\mathbf{b}$, and therefore the proposed scheme is secure. In Section 5.1, we show that our parameter choice satisfying (8) (the third security condition) is hard to break and at least equivalent to 128-bit security.
In LWE, the noise term plays a major role in determining the hardness. The normal distribution from which the error terms are sampled must satisfy (8); its standard deviation must be chosen as large as possible, while (8) gives the minimum required for the hardness of LWE. To quantify the hardness or security level of LWE for a concrete set of parameters, Regev et al. exploited the dual lattice in [17, p. 21]. The idea is to find how many operations are required to distinguish an LWE term from the uniform distribution. This is only possible if an adversary can find a short vector in the dual lattice. To see this, let us denote an LWE vector as $\mathbf{b} = \mathbf{A}^T\mathbf{s} + \mathbf{e} \bmod q$ and a short vector in the dual lattice as $\mathbf{w}$. If $\mathbf{b}$ is an LWE vector, then the scalar product $\mathbf{w}^T\mathbf{b}$ will be close to an integer, up to the small error [17, p. 22]. If not, then $\mathbf{b}$ is a uniformly random vector. Therefore, finding a short vector in the dual lattice must be hard, and the standard deviation of the error term must be large enough relative to the length of the shortest findable dual vector; otherwise this distinguishing attack may succeed. This requirement and (24) can now be used to quantify the LWE security.
Now we use the lattice properties, i.e., the length of a short vector in the dual lattice is related by a factor of $q$ to the length of a short vector in the lattice [17, p. 22]. Combining this with (24) bounds the length of the shortest dual vector an adversary can find; if the error standard deviation exceeds the corresponding threshold, the dual-lattice distinguishing attack becomes infeasible.
5.1 Parameter Selection
Firstly, let us obtain the relationship between $q$ and $\Delta$. Since the maximum possible value for the SP $\mathbf{x}^T\mathbf{y}$ is $l$, we split $q$ into $l+1$ parts, i.e., the distance between consecutive encoded values is $\Delta = \lfloor q/(l+1) \rfloor$. To obtain a correct result, as shown in (22), half of this distance should be large enough to accommodate the error term. Table I provides the necessary requirements on all the parameters to achieve correctness and security. This table is a summary of the requirements derived in the previous sections. Using this table, let us obtain a concrete set of parameters to achieve 128-bit security. The same strategy has been used to obtain the parameters for lower and higher security levels in Section 6.
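Under the error bound derived in Section 4.2 (five standard deviations), the relationship between $q$, $l$, and the error standard deviation can be checked mechanically. The sketch below does exactly that; the numeric values are illustrative, not the parameter sets from Table II:

```java
// Checks the correctness constraint Delta/2 > 5 * sigma * (l + 1) for
// candidate parameters. The values in main are illustrative only.
public class ParamCheck {
    public static boolean correctnessHolds(long q, int l, double sigma) {
        long delta = q / (l + 1);                     // distance between encoded SP values
        return delta / 2.0 > 5.0 * sigma * (l + 1);   // error bound from Section 4.2
    }

    public static void main(String[] args) {
        System.out.println(correctnessHolds(1L << 32, 1000, 3.2)); // ample margin: true
        System.out.println(correctnessHolds(1L << 16, 1000, 3.2)); // q far too small: false
    }
}
```

The quadratic dependence on $l + 1$ is visible here: doubling the vector dimension requires roughly a fourfold increase in $q$ for the same error width.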
To obtain 128-bit security, we need to choose our parameters such that the cost equation (26) yields at least $2^{128}$ operations. The chosen cost fixes the lattice-reduction quality $\delta$ through (27), which in turn constrains $n$ and $q$. Based on this and the other requirements (all listed in Table I), we propose six sets of parameters in Table II to achieve 128-bit security. These parameters have been cross-validated using the well-known LWE Estimator (the source code for the LWE Estimator, which calculates the security complexity using six different algorithms such as lattice reduction and dual-lattice attacks, is available at https://bitbucket.org/malb/lwe-estimator).
In Table II, the parameters $n$ and $q$ play a major role in ensuring 128-bit security; they are linked, as increasing one allows a smaller value for the other. These parameters determine the size of matrix $\mathbf{A}$ and the memory requirement. The first four sets are equivalent in terms of memory, while the last two require substantially more. As shown in the experiments, the running times for the last two are significantly higher and not useful for practical applications. For Sets V and VI, the size of $q$ does not decrease as much as for the other sets, so their security levels exceed 128 bits. The reason is that a larger $n$ leads to a larger matrix $\mathbf{A}$; hence, in order to satisfy the error distribution parameter in (23), the value for $q$ must be set high, and increasing $q$ in this regime increases the security margin.
6 Experimental Results
In order to evaluate the proposed LWE based PPSP scheme, we implemented the algorithm in Java (code is available from: https://tinyurl.com/ycpw6ncj) and tested it on a 64-bit Windows PC with 16GB RAM and an Intel(R) Core(TM) i5-4210U CPU at 1.70GHz. For performance comparison, we also implemented the Paillier homomorphic encryption based PPSP scheme on the same PC using Java. Additionally, we compared our scheme with one of the most efficient PPSP algorithms, namely SPOC. Our test results show that the proposed LWE based scheme is significantly faster than the Paillier homomorphic PPSP scheme and at least twice as fast as SPOC at 128-bit security.
|Input by X: binary vector $\mathbf{x}$|
|Output to X: $\mathbf{x}^T\mathbf{y}$|
|Step 1: X performs the following operations:|
|Generates Paillier public-private key pair $(pk, sk)$|
|FOR EACH $i \in \{1, \ldots, l\}$, computes $E(x_i)$|
|keeps $sk$, and sends $E(x_1), \ldots, E(x_l)$ and $pk$ to Y|
|Step 2: Y executes the following operations|
|Computes $E(\mathbf{x}^T\mathbf{y}) = \prod_{i=1}^{l} E(x_i)^{y_i} \bmod n^2$|
|Sends $E(\mathbf{x}^T\mathbf{y})$ back to X|
|Step 3: X decrypts and obtains $\mathbf{x}^T\mathbf{y} = D(E(\mathbf{x}^T\mathbf{y}))$|
6.1 Proposed Lattice-based PPSP Scheme and Paillier PPSP scheme
The Paillier cryptosystem is an additively homomorphic public-key encryption scheme. Its provable semantic security is based on the decisional composite residuosity problem: it is mathematically intractable to decide whether an integer $z$ is an $n$-th residue modulo $n^2$ for some composite $n$, i.e., whether there exists some $w$ such that $z = w^n \bmod n^2$. Let $n = pq$, where $p$ and $q$ are two large prime numbers. A message $m \in \mathbb{Z}_n$ can be encrypted using the Paillier cryptosystem as $c = g^m r^n \bmod n^2$, where $g \in \mathbb{Z}_{n^2}^*$ and $r$ is a random number in $\mathbb{Z}_n^*$. For given encryptions $E(m_1)$ and $E(m_2)$, an encryption of $m_1 + m_2$ can be obtained as $E(m_1 + m_2) = E(m_1) \cdot E(m_2) \bmod n^2$, and multiplication of an encryption by a plaintext constant $k$ can be computed efficiently as $E(k m_1) = E(m_1)^k \bmod n^2$. Hence, the Paillier cryptosystem is additively homomorphic. Let us denote $E(\cdot)$ and $D(\cdot)$ as the Paillier homomorphic encryption and decryption functions. Using the homomorphic properties and the above definitions, the homomorphic encryption based PPSP is described in Table III.
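The additive homomorphism just described can be sketched with a textbook Paillier implementation (the common $g = n + 1$ variant). The key size and vectors below are toy values chosen for illustration, not the NIST-recommended modulus length used in the experiments:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Minimal textbook Paillier (g = n + 1 variant) illustrating the additive
// homomorphism behind the PPSP of Table III. Toy key sizes; not for real use.
public class PaillierDemo {
    final BigInteger n, n2, g, lambda, mu;
    final SecureRandom rnd = new SecureRandom();

    PaillierDemo(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, rnd);
        BigInteger q = BigInteger.probablePrime(primeBits, rnd);
        n = p.multiply(q);
        n2 = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger pm1 = p.subtract(BigInteger.ONE), qm1 = q.subtract(BigInteger.ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1));   // lcm(p-1, q-1)
        // L(u) = (u - 1) / n;  mu = L(g^lambda mod n^2)^{-1} mod n
        mu = g.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n).modInverse(n);
    }

    BigInteger encrypt(BigInteger m) {
        BigInteger r = new BigInteger(n.bitLength() - 1, rnd).add(BigInteger.ONE);
        return g.modPow(m, n2).multiply(r.modPow(n, n2)).mod(n2);   // g^m * r^n mod n^2
    }

    BigInteger decrypt(BigInteger c) {
        return c.modPow(lambda, n2).subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    public static void main(String[] args) {
        PaillierDemo ph = new PaillierDemo(128);
        int[] x = {1, 0, 1, 1};   // X's binary vector (sent encrypted)
        int[] y = {1, 1, 0, 1};   // Y's binary vector (kept in plaintext at Y)
        // Y homomorphically accumulates E(<x,y>) = prod_i E(x_i)^{y_i} mod n^2.
        BigInteger acc = ph.encrypt(BigInteger.ZERO);
        for (int i = 0; i < x.length; i++) {
            BigInteger ci = ph.encrypt(BigInteger.valueOf(x[i]));
            acc = acc.multiply(ci.modPow(BigInteger.valueOf(y[i]), ph.n2)).mod(ph.n2);
        }
        System.out.println("decrypted scalar product = " + ph.decrypt(acc)); // 2
    }
}
```

Note that Y never decrypts: it only multiplies and exponentiates ciphertexts, which is exactly why each element of $\mathbf{x}$ must be individually encrypted and why the scheme slows down as the vector dimension grows.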
According to the NIST recommendation [32, 33], public-key encryption schemes such as RSA and Paillier must use 3072-bit keys for encryption and decryption in order to achieve 128-bit security. Hence, to obtain the running time for the Paillier homomorphic encryption based PPSP, we used 3072-bit keys. We also obtained the running time for the proposed LWE based scheme for the first five sets of parameters given in Table II (the sixth set was ignored as it took too long to run). The running times, averaged over 100 executions, are listed in Table IV (no parallelisation or multi-threading was used).
|Paillier Based PPSP (ms)|
As presented in Table IV, Set I outperformed the other sets. This is due to the fact that, even though the security levels are equal across all the sets, when $n$ increases, the matrix $\mathbf{A}$ becomes larger and requires an increased number of multiplications. In turn, this slows down the algorithm. With this observation, we continue using the parameters of Set I for the remainder of the experiments presented in this paper. The last column in Table IV shows the average running time for the Paillier scheme; the proposed scheme is substantially faster than the Paillier PPSP scheme. The dimensions of the input vectors for these sets are given in the third column of Table II.
To compare the performance of the proposed scheme at different security levels, a new set of parameters is provided in Table V. Based on the NIST recommendations [32, 33], the corresponding key sizes for the Paillier scheme are also provided in Table V. Using this information, the average running time is plotted in Fig. 3. While the average running time for the proposed scheme increases linearly, it increases exponentially for the Paillier scheme. It should be noted that the average running time for the proposed scheme remains in the order of seconds even at the highest security level tested (without any parallel computation or multi-threading). These results demonstrate that the proposed lattice PPSP scheme is significantly faster than the Paillier PPSP.
6.2 Proposed Scheme and Randomisation Technique
Table VI shows the state-of-the-art randomisation based PPSP scheme, called SPOC [4, 20]. The security of the SPOC algorithm depends on the hardness of factoring an integer. The masked values are protected by a large secret integer that is known only to X. If Y wants to recover X's input vector, Y needs to factor the masked values to find the common secret. This approach is akin to RSA encryption or any public-key encryption that relies on the hardness of factoring integers. According to the NIST recommendation [32, 33], the size of these integers must be around 3072 bits in order to obtain 128-bit security (without loss of generality, we ignore the requirement of prime numbers). Hence, we set the modulus in Table VI to 3072 bits to compare SPOC and the proposed lattice PPSP scheme.
Using this setting, the average running times for the proposed and SPOC PPSP schemes were obtained at 128-bit security. Fig. 4 shows the average running times of both schemes for input vectors of different dimensions. The proposed scheme is at least twice as fast as SPOC for these security parameters. It should be noted that, since SPOC relies on the hardness of integer factorisation, similar to the Paillier scheme, it is also vulnerable to quantum attacks.
|Input by X:|
|Output to X:|
|Step 1: X performs the following operations:|
|Given security parameters , , , ,|
|choose two large primes ,|
|such that , , set|
|Choose a large random number , and random|
|numbers , , with|
|FOR EACH ,|
|keeps secret, and sends to Y|
|Step 2: Y executes the following operations|
|FOR EACH ,|
|where is a random number with|
|Send to X|
|Step 3: Now X computes and obtains|