Cloud-Based Approximate Constrained Shortest Distance Queries Over Encrypted Graphs With Privacy Protection

09/21/2018 ∙ by Meng Shen, et al. ∙ UNSW Canberra IEEE Nokia 0

Constrained shortest distance (CSD) querying is one of the fundamental graph query primitives, which finds the shortest distance from an origin to a destination in a graph with a constraint that the total cost does not exceed a given threshold. CSD querying has a wide range of applications, such as routing in telecommunications and transportation. With an increasing prevalence of cloud computing paradigm, graph owners desire to outsource their graphs to cloud servers. In order to protect sensitive information, these graphs are usually encrypted before being outsourced to the cloud. This, however, imposes a great challenge to CSD querying over encrypted graphs. Since performing constraint filtering is an intractable task, existing work mainly focuses on unconstrained shortest distance queries. CSD querying over encrypted graphs remains an open research problem. In this paper, we propose Connor, a novel graph encryption scheme that enables approximate CSD querying. Connor is built based on an efficient, tree-based ciphertext comparison protocol, and makes use of symmetric-key primitives and the somewhat homomorphic encryption, making it computationally efficient. Using Connor, a graph owner can first encrypt privacy-sensitive graphs and then outsource them to the cloud server, achieving the necessary privacy without losing the ability of querying. Extensive experiments with real-world datasets demonstrate the effectiveness and efficiency of the proposed graph encryption scheme.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 13

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Recent years have witnessed the prosperity of applications based on graph-structured data [1, 2], such as online social networks, road networks, web graphs [3], biological networks, and communication networks [4, 5]. Consequently, many systems for managing, querying, and analyzing massive graphs have been proposed in both academia (e.g., GraphLab [6], Pregel [7] and TurboGraph [8]) and industry (e.g., Titan, DEX and GraphBase). With the prevalence of cloud computing, graph owners (e.g., enterprises and startups for graph-based services) desire to outsource their graph databases to a cloud server, which raises a great concern regarding privacy. An intuitive way to enhance data privacy is encrypting graphs before outsourcing them to the cloud. This, however, usually comes at the price of inefficiency, because it is quite difficult to perform operations over encrypted graphs.

Shortest distance querying is one of the most fundamental graph operations, which finds the shortest distance, according to a specific criterion, for a given pair of source and destination in a graph. In practice, however, users may consider multiple criteria when performing shortest distance queries [2]. Taking the road network as an example, a user may want to know the shortest distance, in terms of travelling time, between two cities within a budget for total toll payment. This problem can be represented by a constrained shortest distance (CSD) query, which finds the shortest distance based on one criterion with one or more constraints on other criteria.

In this paper, we focus on single-constraint CSD queries. This is because most practical problems can be represented as a single-constraint CSD query. For instance, such a query on a communication network could return the minimum cost from a starting node to a terminus node, with a threshold on routing delay. In addition, multi-constraint CSD queries can usually be decomposed into a group of sub-queries, each of which can be abstracted as a single-constraint CSD query. Formally, a CSD query111For simplicity, we refer to single-constraint CSD queries as CSD queries hereafter. is such that: given an origin , a destination , and a cost constraint , finding the shortest distance between and whose total cost does not exceed .

Existing studies in this area can be roughly classified into two categories. The

first category mainly focuses on the CSD query problem over unencrypted graphs [9, 10, 11, 12, 2]. However, these methods cannot be easily applied in the encrypted graph environment, because many operations on plain graphs required in these methods (e.g., addition, multiplication, and comparison) cannot be carried out successfully without a special design for encrypted graphs. The second category aims at enabling the shortest distance (or shortest path) queries over encrypted graphs [1, 13]. They usually adopt distance oracles such that the approximate distance between any two vertices can be efficiently computed, e.g., in a sublinear way. The main limitation of these approaches is that they are incapable of performing constraint filtering over the cloud-based encrypted graphs. Therefore, they cannot be directly applied to answering CSD queries.

Motivated by the limitations of existing schemes, our goal in this paper is to design a practical graph encryption scheme that enables CSD queries over encrypted graphs. As the CSD problem over plain graphs has been proved to be NP-hard [10], existing studies (e.g., [2]) usually resort to approximate solutions, which guarantee that the resulting distance is no longer than times of the shortest distance (where is an approximation ratio predefined by graph owners), subject to the cost constraint . The encryption of graphs would make the CSD problem even more complicated. Hence, we also focus on devising an approximate solution.

Specifically, this paper presents Connor, a novel graph encryption scheme targeting the approximate CSD querying over encrypted graphs. Connor is built on a secure 2-hop cover labeling index (2HCLI), which is a type of distance oracle such that the approximate distance between any two vertices in a graph can be efficiently computed [1, 2]. The vertices of the graph in the secure 2HCLI are encrypted by particular pseudo-random functions (PRFs). In order to protect real values of graph attributes while allowing for cost filtering, we encrypt costs and distances (between pairs of vertices) by the order-revealing encryption (ORE) [14, 15] and the somewhat homomorphic encryption (SWHE) [16], respectively. Based on the ORE, we design a simple but efficient tree-based ciphertexts comparison protocol, which can accelerate the constraint filtering process on the cloud side.

The main contributions of this paper are as follows.

  1. We propose a novel graph encryption scheme, Connor, which enables the approximate CSD querying. It can answer an -CSD query in milliseconds and thereby achieves computational efficiency.

  2. We design a tree-based ciphertexts comparison protocol, which helps us to determine the relationship of the sum of two integers and another integer over their ciphertexts with controlled disclosure. This protocol can also serve as a building block in other relevant application scenarios.

  3. We present a thorough security analysis of Connor and demonstrate that it achieves the latest security definition named CQA2-security [17]. We also implement a prototype and conduct extensive experiments on real-world datasets. The evaluation results show the effectiveness and efficiency of the proposed scheme.

To the best of our knowledge, this is the first work that enables the approximate CSD querying over encrypted graphs.

The rest of this paper is organized as follows. We summarize the related work in Section II and describe the background of the approximate CSD querying in Section III. We formally define the privacy-preserving approximate CSD querying problem in Section IV. After that, the construction of Connor is presented in Section V, with a detailed description of the tree-based ciphertexts comparison protocol in Section VI. We exhibit the complexity and security analyses in Section VII, evaluate the proposed scheme through extensive experiments in Section VIII, and conclude this paper in Section IX.

Ii Related Work

In an era of cloud computing, security and privacy become great concerns of cloud service users [18, 19, 20, 21, 22, 23]. Here we briefly summarize the related work from two aspects, i.e., CSD querying over plain graphs and graph privacy protection.

Plain CSD queries. The constrained shortest distance/path querying over plain graphs has attracted many research attentions. Hansen [9] proposed an augmented Dijkstra’s algorithm for exact constrained shortest path queries without an index. This method, however, resulted in a significant computational burden. In order to improve the querying efficiency, another solution [11] focused on approximate constrained shortest path queries, which were also index-free.

The state-of-the-art solution to the exact constrained shortest path querying with an index was proposed by Storandt [12], which accelerated query procedure with an indexing technique called contraction hierarchies. This approach still results in impractically high query processing cost. Wang et al. [2] proposed a solution to the approximate constrained shortest path querying over large-scale road networks. This method took full advantage of overlay graph techniques to construct an overlay graph based on the original graph, whose size was much smaller than that of the original one. Consequently, they built a constrained labeling index structure over the overlay graph, which greatly reduced the query cost. Unfortunately, all these solutions are merely suitable to perform queries over unencrypted graphs.

Graph privacy protection. Increasing concerns about graph privacy have been raised with the wide adoption of the cloud computing paradigm over the past decade. Chase and Kamara [17] first introduced the notion of graph encryption, where they proposed several constructions for graph operations, such as adjacency queries and neighboring queries. Cao et al. [24] defined and solved the problem of privacy-preserving query over encrypted graph data in cloud computing by utilizing the principle of “filtering-and-verification”. They built the feature-based index of a graph in advance and then chose the efficient inner product to carry out the filtering procedure. Some approaches [25, 26, 13] utilized the differential privacy technique to query graphs privately, which might suffer from weak security. These studies, however, introduced prohibitively great storage costs and were not practical for large-scale graphs. Meng et al. [1] proposed three computationally efficient constructions that supported the approximate shortest distance querying with distance oracles, which were provably secure against a semi-honest cloud server.

Secure multi-party computation (SMC) techniques have been widely applied to address the privacy-preserving shortest path problem [27, 28, 29, 30], as well as other secure computation problems [31]. Aly et al. [28] focused on the shortest path problem over traditional combinatorial graph in a general multi-party computation setting, and proposed two protocols for securely computing shortest paths in the graphs. Blanton et al. [27] designed data-oblivious algorithms to securely solve the single-source single-destination shortest path problem, which achieved the optimal or near-optimal performance on dense graphs. Keller and Scholl [29] designed several oblivious data structures (e.g., priority queues) for SMC and utilized them to compute shortest paths on general graphs. Gupta et al. [30] proposed an SMC-based approach for finding policy-compliant paths that have the least routing cost or satisfy bandwidth demands among different network domains. However, existing general-purpose SMC solutions for the shortest path problem may result in heavy communication overhead.

Although there are respectable studies on graph querying over encrypted graphs, the privacy-preserving CSD query remains unsolved. In this paper, we propose a novel and efficient graph encryption scheme for CSD queries.

Iii Background

This section presents the formal definition of the CSD query problem and introduces the 2HCLI structure for graph queries.

Notation Meaning
Input graph
Number of vertices and edges in
Distance and cost of an edge
Distance and cost of the edge from to
Origin, destination, approximation ratio, amplification factor and cost constraint in an -CSD query
Plain and encrypted graph index
In- and out-label set associated with vertex
Depth of a cost constraint tree
Length of a path code
ORE ciphertext of
Security parameter
Output length of ORE encryption
Input length of symmetric encryption algorithms
Query token
Candidate sets as the outputs of the cost constraint filtering
Maximum distance over all the sketches
TABLE I: List of notations

Iii-a Approximate CSD Query

Let be a directed graph222We refer to as a directed graph in this paper, unless otherwise specified. with a vertex set and an edge set . Each edge is associated with a distance and a cost . We regard the cost as the constraint. We denote the set of edges that connect two vertices as a path. For a path , its distance is defined as , which indicates the distance from its origin to its destination. Similarly, we define the cost of as . The notations throughout the paper are summarized in Table I.

Given a graph , an origin vertex , a destination vertex , and a cost constraint , a CSD query is to find the the shortest distance between and with the total cost no more than . Since the CSD query problem has been proved to be NP-hard [10], we keep in line with existing solutions [2] and focus on proposing an approximate CSD solution in this paper.

Inspired by a common definition of the approximate shortest path query over plain graphs [2], we define the approximate CSD query (i.e., -CSD query) as follows. (-CSD QUERY). Given an origin , a destination , a cost constraint and an approximation ratio , an -CSD query returns the distance of a path , such that and , where is the optimal answer to the exact CSD query with the origin , destination and cost constraint .

Fig. 1 shows a simple graph with five vertices, where the distance and cost of each edge are marked alongside it. Given an origin , a destination , a cost constraint , the exact CSD query returns the optimal distance , where the corresponding path is . For an approximation ratio , a valid answer to the -CSD query with the same parameters (e.g, the origin , the destination , and ) is 8, with the corresponding path . That is because and .

Based on the above definition, given two paths and with the same origin and destination, we say that -dominates iff and . With this principle, we can reduce the construction complexity of graph index significantly, because a great deal of redundant entries in the index can be filtered out. We will make a further illustration in the following subsection.

[height=3cm]figure/shen1.pdf
Fig. 1: An example illustrating the -CSD query over a graph.

Iii-B Constructing Labeling Index

The encrypted index designed in this paper is mainly constructed based on the well-known 2HCLI, which is a special data structure that supports the shortest distance query efficiently [32, 33, 2]. Here we briefly describe the basic idea of the 2HCLI, and illustrate its application in building a constrained labeling index.

Given a graph with a vertex set and an edge set , each vertex is associated with an in-label set and an out-label set, which are denoted by and , respectively. Each entity in corresponds to the shortest distance from a vertex to . It implies that is reachable from by one or more paths, but is not necessarily a neighbor, or 2-hop neighbour, of . Similarly, each entity in corresponds to the shortest distance from to another vertex in . To answer a shortest distance query from an origin to a destination , we first find the common vertices in the labels and , and then select the shortest distance from to . Note that the entities in and are carefully selected [33] so that the distance of any two vertices and can be computed by and .

Considering the graph in Fig. 1, if we ignore the cost criterion of edges, the basic unconstrained shortest distance query with an origin and a destination can be answered with the help of the 2HCLI, as shown in Fig. 2. Given the labels and , it is easy to obtain the set of common vertices, which consists of vertices and . The final answer to the basic shortest distance query should be 5, because .

Although it is simple and straightforward to construct the 2HCLI for a graph with only the distance criterion, constructing a labeling index based on the 2HCLI for the CSD query is much more complex. That is because in the CSD query setting with two types of edge criteria, there might be multiple combinations of distance and cost for each pair of vertices in the labels and . For ease of illustration, we also take as an example the graph, as well as the CSD query, in Fig. 1. The corresponding 2HCLI is shown in Fig. 3, where the 2-tuple alongside each arrow represents the distance and cost from the starting vertex to the ending vertex. Note that in the shortest distance query in Fig. 2, the shortest distance from to via is unique. However, in the CSD query setting depicted in Fig. 3, there are four possible distances with different costs from to via . Due to the existence of the cost criterion, the number of possible distances for each pair of vertices could increase dramatically in large-scale graphs, which results in a higher complexity in constructing the 2HCLI and calculating the answers to a CSD query.

[height=3cm]figure/shen2.pdf
Fig. 2: A 2HCLI example of the basic shortest distance query. Each entity in 2HCLI alongside the arrow indicates the shortest distance from the starting vertex to the ending vertex, e.g., the shortest distance from to is 3.
[height=3cm]figure/shen3.pdf
Fig. 3: A 2HCLI example of the exact CSD query. Each entity in the 2HCLI alongside the arrow indicates the distance and cost, respectively. The shortest distance from to with a cost constraint is 5.

In order to improve the querying efficiency, we adopt a methodology that combines an offline filtering operation and an online filtering operation.

The offline filtering aims at reducing the construction complexity of the 2HCLI and decreasing the number of entries in the in-label and out-label sets as many as possible. We adopt the method proposed in [2]. The entities in the 2HCLI are carefully selected in such a way that for any CSD query from to with a cost constraint , the query can be answered correctly using only the 2HCLI. Since the construction of the 2HCLI should be independent of the cost constraint in specific CSD queries, we can use the definition of -domination to filter out redundant entries in the in- and out-label sets.

Taking for example the two entries from to with in Fig. 3, the path with the -tuple of (3,2) -dominates another path with the -tuple of (2,6). Therefore, the entry corresponding to the path can be filtered out (as depicted by a dashed arrow), which helps to reduce the number of entries in . The resulting 2HCLI is exhibited in Fig. 4. We refer the reader to [2] for more construction details.

The online filtering aims at selecting the possibly valid answers to a given CSD query, based on only the 2HCLI. For instance, given an -CSD query from to with a cost constraint , we can first find the common vertex set between and , and then return the minimum with for each . Since the above comparisons should be conducted with the corresponding ciphertexts, an efficient online filtering approach will be devised in Section VI.

[height=2.6cm]figure/shen4.pdf
Fig. 4: The resulting 2HCLI after performing the offline filtering on the original 2HCLI in Fig. 3. Each entity in the 2HCLI indicates the vertex identifier, distance and cost, respectively. The answer to the approximate CSD query (i.e., the origin , the destination , , and ) is 6, which happens to be the answer to the exact CSD query.

Iv Problem Formulation

This section presents the system model and the security model of the privacy-preserving -CSD querying, as well as the preliminaries of the proposed graph encryption scheme.

Iv-a System Model

We adopt the general system model in the literature [17, 1] for the privacy-preserving -CSD querying, as illustrated in Fig. 5, which mainly involves two types of entities, namely a user and a cloud server.

The user constructs the secure searchable index for the graph and outsources the encrypted index along with the encrypted graph to the cloud server. When the user, say Alice, performs an -CSD query over her encrypted graph, she first generates a query token and then submits it to the cloud server. Upon receiving Alice’s query token, the cloud server executes the pre-designed query algorithms to match entries in the secure index with the token. Finally, the cloud server replies the user with the answer to the -CSD query.

The graph encryption scheme is formally defined as follows.

(GRAPH ENCRYPTION). A graph encryption scheme consists of three polynomial-time algorithms that work as follows:

  • : is a probabilistic secret key generation algorithm that takes as input a security parameter and outputs a secret key and a public/secret-key pair .

  • : is a graph encryption algorithm that takes as input an approximation ratio , a secret keys , a key pair , an amplification factor and a graph , and outputs a secure index .

  • : is a two-party protocol between a user that holds a secret key , a key pair and a query , and a cloud server that holds an encrypted graph index . After executing this protocol, the user receives the distance as the query result and the cloud server receives a terminator .

[height=2.5cm]figure/shen5.pdf
Fig. 5: The system model of privacy-preserving CSD query scheme.

Iv-B Security Model

Graph encryption is a generalization of symmetric searchable encryption (SSE) [34, 35, 36, 37, 38]. Thus, we adopt the security definition of SSE settings in our graph encryption scheme. This security definition is consistent with the latest proposed security definition in [35, 39, 17], which is also known as CQA2-security (i.e., the chosen-query attack security). Now we present the formal CQA2-security definition as follows.

(CQA2-security model). Let be a graph encryption scheme and consider the following probabilistic experiments where is a semi-honest adversary, is a simulator, and and are (stateful) leakage functions.

Real:

  • outputs a graph , an approximation ratio and an amplification factor .

  • The challenger begins by running to generate a secret key and a public/secret-key pair , and then computes the encrypted index by . The challenger sends the encrypted index to .

  • makes a polynomial number of adaptive queries, and for each query , and the challenger execute .

  • computes a bit as the output of the experiment.

Ideal:

  • outputs a graph , an approximation ratio and an amplification factor .

  • Given the leakage function , simulates a secure graph index and sends it to .

  • makes a polynomial number of adaptive queries. For each query , is given the leakage function , and and execute a simulation of , where is playing the role of the cloud server and is playing the role of the user.

  • computes a bit as the output of the experiment.

We say that the graph encryption scheme is -secure against the adaptive chosen-query attack, if for all PPT adversaries , there exists a PPT simulator such that

where is a negligible function.

Iv-C Preliminaries

Now we briefly introduce an encryption technique employed in our design, i.e., the order-revealing encryption.

Order-revealing encryption (ORE) is a generalization of the order-preserving encryption (OPE) scheme, but provides stronger security guarantees. As pointed by Naveed et al. [40], the OPE-encrypted databases are extremely vulnerable to inference attacks. To address this limitation, the ORE scheme has been proposed [14, 15], which is a tuple of three algorithms described as follows:

  • ORE.Setup(): Input a security parameter , output the secret key .

  • ORE.Encrypt(): Input a secret key and a message , output a ciphertext .

  • ORE.Compare(): Input two ciphertexts and , output a bit , which indicates the greater-than or less-than relationship of the corresponding plaintexts and .

V Construction of Connor

In this section, we introduce our graph encryption scheme Connor for the privacy-preserving -CSD querying.

V-a Construction Overview

The construction process is based on two particular pseudo-random functions and , and a somewhat homomorphic encryption (SWHE) scheme. In this paper, we adopt the concrete instantiation of a SWHE scheme in the literature [16]. The parameters of and are illustrated in Equation eq:parameters, & h: {0,1}^λ ×{0,1}^* →{0,1}^λ
& g: {0,1}^λ ×{0,1}^* →{0,1}^λ+ z + k where is the security parameter, and and are the output lengths of the ORE and SWHE encryptions, respectively.

We start with a straightforward construction as follows, including:

  • KeyGen: Given the security parameter , the user randomly generates a secret key and a pair of public and secret keys for SWHE.

  • Setup: Given an original graph , an approximation ratio , and an amplification factor , the user obtains the encrypted graph index by using Algorithm V-A. The 2HCLI of can be generated by the method described in Section III-B.

    Let be the maximum distance over all the sketches and . Motivated by the literature [1], each distance is encrypted as by the SWHE to protect its real value (line 8). Considering that is bounded by , the SWHE encryption of distance allows for obtaining the minimum sum over a certain number of distance pairs.

    Each cost , multiplied by the amplification factor , is encrypted by the ORE encryption (line 9). is a big integer and should be carefully selected to enlarge the plaintext space of . In practice, the product of and the maximum cost value over all the sketches should be sufficiently large (e.g., at least ), which is used to provide a sufficient randomness to the inputs. Since is kept private by the user, the cloud server cannot learn the real values of .

  • Query: To perform an -CSD query with an origin , a destination , and a cost constraint , the user generates query tokens and , and sends them to the cloud server. The cloud server obtains and from the index. For each encrypted vertex identifier that appears in both and , the cloud server performs a cost constraint filtering operation (which will be described in details in Section VI), and adds each pair which satisfies the cost constraint into a candidate set . Note that the cost constraint is multiplied by because we encrypt the cost , instead of .

    Then, the cloud server directly obtains , where SWHE.Eval for each pair in . The correctness of the above calculation follows homomorphic properties of SWHE. We refer the readers to [1] for more details.

    Finally, the cloud server returns to the user, who, in turn, obtains the answer to the -CSD query by decrypting with its secret key .

Note that this straightforward approach does not only correctly answer the -CSD query over encrypted graphs, but also protects the vertex identifier, distance, and cost information.

However, the encrypted graph index obtained from Algorithm V-A, without performing any queries, still results in information leakage. On one hand, it reveals the length of each encrypted sketch, i.e., and , as well as the order information of ORE-encrypted costs in all sketches. On the other hand, it also discloses the number of common vertices between and , which indicates the number of vertices that connect to . In particular, if the cloud server knows that there is no common vertex between and , it learns that cannot reach .

[t] Setup algorithm for [1] A secret key , a key pair , an approximation ratio , an amplification factor , and an original graph . The encrypted graph index .

Generate the 2-hop labeling index from . Initialize two dictionaries and . Let be the maximum distance over all the sketches and set .

each Set , . each Compute . Compute SWHE.Enc. Compute ORE.Enc. Insert () into the dictionary . Repeat the above procedure for each sketch in and add entries into .

return as the encrypted graph index.

V-B Privacy-preserving -CSD Querying

In order to enhance protection of sensitive information, we construct a privacy-preserving -CSD querying scheme , where the key generation procedure is the same as in , with improved index construction and CSD query procedures as exhibited in Algorithms V-B and V-B, respectively.

[t] Setup algorithm for [1] A secret key , a key pair , an approximation ratio , an amplification factor , and an original graph . The encrypted graph index .

Generate the 2HCLI of . Initialize two dictionary and . Let be the maximum distance over the sketches and set .

each Set , , , and . Initialize a counter

each Compute . Compute SWHE.Enc. Compute ORE.Enc.

Set and . Compute . Set . Set .

Repeat the above procedure for each sketch in and obtain , except that: (i) set and , and (ii) compute .

return as the encrypted graph index.

[htbp] Query algorithm for [1] The user’s input are the secret key , secret key pair , an amplification factor , and the query . The cloud server’s input is the encrypted index . user’s output is and cloud server’s output is .

user generates , , and . user constructs a cost constraint tree based on using secret as described in Section VI. user sends to cloud server.

cloud server parses as .

cloud server initializes a set and a counter . cloud server computes . cloud server computes . cloud server performs . cloud server add into . Set . cloud server computes .

cloud server initializes a set and a counter . cloud server computes . cloud server computes . cloud server performs . cloud server add into . Set . cloud server computes .

For each encrypted vertex identifier that appears in both in and , the cloud server performs the cost constraint filtering operation through Algorithm VI-C, and add the pair which satisfies the cost constraint into a set . The pair that Algorithm VI-C cannot verify is also added into .

For each pair in , the cloud server first computes SWHE.Eval, and then computes .

cloud server returns to the user. user decrypts with .

return Decrypted value of as .

The Setup for works as follows. The user first builds the 2HCLI of graph , and then encrypts sketches associated with (i.e., and ), as described in lines 2-17.

Note that in order to prevent the leakage of the sketch size in the previous straightforward approach, we split each encrypted sketch and , and ensure that they are stored in the dictionary separately, with a size of one. More precisely, we utilize a counter and generate the unique and for each entity in (line 11). Similarly, the unique and for each entity in can be generated (line 16). The (or ) indicates the position that this entity will be stored in (or ), which ensures each position in the dictionary (or ) having only one entity.

(or ) is used to make an XOR operation with . Since (or ) is different for each sketch, the XOR operation makes the resulting indistinguishable, which guarantees that the static encrypted graph index reveals neither the number of common vertices between and , nor the order information of costs.

The Query in Algorithm V-B works as follows. Assume that the user asks for the shortest distance between and , whose total cost does not exceed . She first generates the query token and sends it to the cloud server (lines 1-3). Upon receiving the token , the cloud server searches in the index and obtains and (lines 5-22). That is, the cloud server iteratively judges whether the dictionary () contains the key () or not. If it exists, then it adds the corresponding entity into the set ().

Once and are obtained, the cloud server performs the cost constraint filtering (line 23) and computes (line 24), which are the same as described in the straightforward approach. Finally, the user gets the final answer by decrypting , which is returned by the cloud server, using its .

Vi Tree-Based Ciphertexts Comparison Approach

This section introduces a tree-based ciphertexts comparison approach, which is used for cost constraint filtering in the graph encryption scheme described in Section V.

Vi-a Scenarios

Assume that there is a user (i.e., ) and a server (i.e., ). has many integers which are encrypted by a kind of cryptography algorithm and then outsourced to . Now, wants to ask for to obtain integer pairs, e.g., (, ), whose sum does not exceed . Note that the plaintexts of , and could not be disclosed to , except for the greater-than, equality, or less-than relationship. A naive approach is to download all the integers, calculate the summation locally, and choose the integer pairs satisfying the constraint. This method, however, is meaningless if one wants to offload the computation to the cloud. Hence, it is desirable to have a practical solution to this problem.

Note that this scenario is different from the well-known SMC scheme. In the setting of SMC [41, 42], a set of (two or more) parties with private inputs wish to compute a function of their inputs while revealing nothing but the result of the function, which is used for many practical applications, such as exchange markets. SMC is a collaborative computing problem that solves the privacy preserving problem among a group of mutually untrusted participants. The ciphertexts of all pairs of (, ) and the cost constraint are outsourced to the cloud server, which is responsible for the inequality tests. Furthermore, we could reveal the relationship between the sum of two ciphertexts and another ciphertext to the server, which is referred to as controlled disclosure in the literature [17].

It seems that we might leverage the homomorphic encryption technique, since it supports a sum operation of calculating . Nevertheless, as the homomorphic encryption is probabilistic, we are unable to determine the relationship between and over their ciphertexts.

Vi-B Main Idea

The main idea of the tree-based ciphertexts comparison protocol is to encode an integer with the ORE primitive. To the best of our knowledge, none of the existing approaches can support ORE and homomorphism properties simultaneously. Hence, we design a novel method to address this problem, which is motivated by the following facts.

If we want to compare with , we can compare with and with , respectively. Now, we result in 4 possible cases corresponding to combinations of the two relationships. If () and (), we can know that (). In the rest two cases, i.e., and , or and , we cannot achieve a deterministic result. At this point, we can further divide into . And then we can compare and with and , respectively.

By iteratively performing such an operation, we can determine the relationship between and

with an increasing probability. Due to the ORE property, it is easy to perform the above operations over ciphertexts. Next, we will show how to implement this idea efficiently by utilizing a tree structure.

Vi-C Details of Protocol

To implement the comparison of and over their ciphertexts, we construct a cost constraint tree, whose nodes represent specific values that are related to . For clarity, we define as the ORE ciphertext of .

An example of the tree structure is depicted in Fig. 6. For each node, we assign 0 to its left child path, while 1 to the right child path. If an integer is not greater than the value of this node, we take the left child path for further comparison; otherwise, we take the right child path. Thus, for any path from the root node to a leaf node, we can obtain a path code, which is an effective representation of the comparison procedure. For instance, an incoming integer would traverse Nodes , , and , and thereby end with a path code of 010. We define the length (i.e., the number of bits) of a path code as . Note that is actually equal to the depth of the tree which is denoted by .

Now the relationship between and can be determined as follows. We first get the ORE ciphertexts of and , as well as their path codes and by traversing the tree separately. When computing , if an overflow occurs (i.e., ), we know that with confidence. If , we also know that with confidence. Otherwise, we are unable to determine the relationship and end up with an uncertainty. We summarize this procedure in Algorithm VI-C.

[height=4cm]figure/shen6.pdf
Fig. 6: An example of the cost constraint tree with a depth of 3, where circles represent nodes. The boxes in the dashed rectangle indicate path codes for all possible comparison results. Note that these boxes are not a part of the tree.

[htbp] Tree-Based Ciphertexts Comparison Algorithm [1] Two ORE ciphertexts , and a cost constraint tree whose depth is . The relationship between and .

Initialize a counter and two empty strings and .

Visit the -th level of the tree with and concatenate with corresponding or . Visit the -th level of the tree with and concatenate with corresponding or . Set

return . return .

return uncertainty.

Discussion. Observe that when we go through a cost constraint tree, one more step can further reduce the uncertainty of the relationship between and by half. We denote the probability of uncertainty as

Pr.

where is the length of the path code. We can easily know the probability of certainty is

Pr Pr.

When the tree depth is 6 (e.g., ), the probability of certainty could reach about 0.9844.

Another observation is the comparison procedure reveals the order information between (or ) and . Thus, the server can infer the interval that belongs to with precision of . To prevent the server from inferring the real value of , in Connor, the user randomly picks a big integer number that is applied to , , and simultaneously, which significantly enlarges the plaintext and ciphertext spaces (e.g., ). The value of is generally a small integer (e.g., 6 in our implementation) that is determined by the user, and both and are kept secret by the user. Therefore, the server cannot infer the real value of (or ) from the order relationship among ciphertexts. We will formally analyze the leakage functions and security issues in the next section.

Vii Complexity and Security Analyses

This section presents the complexity and security analyses on the proposed graph encryption scheme Connor.

Vii-a Complexity Analysis

Connor mainly consists of the Setup and Query algorithms, as described in Algorithms V-B and V-B.

The dominant component in determining the complexity of the Setup algorithm is the encryption of the plain 2HCLI generated from a graph . Let be the total sketch for all vertices in , then the time complexity and space complexity are both , where is the number of vertices in .

The Query algorithm consists of a query token generation process on the user side and a CSD query process on the cloud server side. Let be the maximum size of the sketch associated with each vertex in . The complexity of the query token generation process is mainly determined by the construction of a cost constraint tree, whose time complexity and space complexity are both . For the CSD querying process, the time complexity of getting and , performing cost constraint filtering, and performing distance computation are , , and , respectively. The space complexity of the above three components are , , and , respectively. Therefore, the total time complexity and space complexity of the CSD querying process are and , respectively.

Vii-B Security Analysis

We now present the security analysis on Connor. For clarity, we first discuss the leakage functions, and then prove that Connor is secure under the CQA2-security model.

Setup Leakage. The leakage function of our construction reveals the information that can be deduced from the secure 2HCLI of graph , including the total number of vertices in the graph , the maximum distance over all the sketches , and the size of . More precisely, the size of consists of the total number of sketch entities in and , which are denoted by and , respectively. Thus, the leakage function .

Note that the order relationship of pairwise costs and the order relationship between the cost and cost constraint are not included in , because for each entity in sketches, we make an XOR operation using a unique integer value after we encrypt it, and this makes each entity in sketches are indistinguishable.

Query Leakage. The leakage function of our construction consists of the query pattern leakage, the sketch pattern leakage, and the cost pattern leakage. Intuitively, the query pattern leakage reveals whether a query has appeared before. The sketch pattern leakage reveals the sketch associated to a queried vertex, the common vertices between two different sketches, and the size of the sketches of queried vertices. The cost pattern leakage reveals 1) the order relationship among costs, and 2) the order relationship between costs and the cost constraint during the query procedure. We formalize these leakage functions as follows.

(QUERY PATTERN LEAKAGE). Let be a non-empty sequence of queries. Each query specifies a tuple (, , ). For any two queries and , define , i.e., whether each element of matches each element of , respectively. Then, the query pattern leakage function returns an (symmetric) matrix, in which each entry (, ) equals . Note that does not leak the identities of the query vertices.

(SKETCH PATTERN LEAKAGE). Given a secure 2HCLI of a graph and a query , the sketch pattern leakage function is defined as . is a list, each element of which is the sketches associated to the queried vertices, and is a pair , where and are multi-sets and is a particular pseudo-random function.

(COST PATTERN LEAKAGE). The cost constraint in a query can essentially be represented by a certain number of uniform intervals. Let be the depth of the cost constraint tree (c.f. Section VI). The intervals associated with are , where . Assign each interval with a list , i.e., the -th interval is associated with , which stores all the cost values belong to this interval. The leaked interval information forms an array , of which the -th element is (i.e., ). In addition, assume that is the total number of entries in the sketches of the queried vertices. For each pair of costs and , its order relationship of the greater-than, equality, and less-than can be represented by , , and , respectively. The leaked order information of costs is a (symmetric) matrix with each entry (, ) being , , or . Therefore, the cost pattern leakage function .

Thus, .

The leakage functions are defined over the 2HCLI rather than the original graph. In fact, the information leakage of the original graph is limited to the minimum number of paths for the queried source-destination vertices. It can be defined as an (symmetric) matrix , where is the number of vertices in the graph. Each element in is NULL, 0, or a positive integer, which indicates an uncertain status (i.e., topology is well protected), disconnection, or the minimum number of paths of the two queried vertices, respectively.

For the cost values in the 2HCLI, we introduce a user-held amplification factor to enlarge the plaintext and ciphertext spaces. Thus, the server cannot infer the real cost values just from their order information revealed by the leakage function . For the distance values in the 2HCLI, we use the SWHE encryption to protect their real values from the server.

Theorem 1. If the cryptography primitives , , ORE, and the SWHE are secure, then the proposed graph encryption scheme is -secure against the adaptive chosen-query attack.

The key idea is constructing a simulator . Given the leakage functions and , constructs a fake encrypted 2HCLI structure and a list of query . If for all PPT adversaries , they cannot distinguish between the two games Real and Ideal, we can say that our graph encryption scheme is -secure against the adaptive chosen-query attack.

Simulating . handles each vertex () to generate a fake in 2HCLI based on the leakage function . randomly chooses for with , and samples and uniformly without repetition. For all , takes the following steps to simulate each sketch: computes and , where is a particular pseudo-random function. Then, it encrypts each vertex in the sketch of by computing , where is a fake secret key. It randomly generates two integers and and obtains ciphertexts and by encrypting () and using the SWHE and ORE schemes. Let . stores in the index . That is, . Similarly, generates a fake and finally obtains the fake 2HCLI .

Simulating . Given the leakage function , simulates the query token as follows. first checks if either of the queried vertices and has appeared in any previous query. If appeared previously, sets and to the values that were previously used. Otherwise, it sets and for some previously unused and . It then remembers the association among , , and . takes the same steps for the queried vertex : setting and analogously and associating with the selected and .

To simulate a fake cost constraint tree , first checks if the queried appeared in any previous query. If appeared previously, sets the to the value that was previously used. Otherwise, constructs a full binary tree based and encrypts each tree node by using the ORE scheme with a randomly generated key. returns this encrypted tree as .

simulates the query procedure as follows. Given the query token , first checks if the query has been queried before. If yes, returns the value that was previously used as the query result. Otherwise, checks whether the queried vertex (or ) has been queried before. If the query vertex has appeared in a previous query, sets to the values that were previously used from of . Otherwise, for a newly appeared vertex , takes the following steps: To generate the sketches associated with , first initializes a set and a counter , Then, it iteratively computes