GraphSE^2: An Encrypted Graph Database for Privacy-Preserving Social Search

In this paper, we propose GraphSE^2, an encrypted graph database for online social network services to address massive data breaches. GraphSE^2 preserves the functionality of social search, a key enabler for quality social network services, in which queries are conducted on a large-scale social graph while set and computational operations are performed on user-generated contents. To enable efficient privacy-preserving social search, GraphSE^2 provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE^2 with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE^2 is practical for querying a social graph with a million users.

1. Introduction

Data breaches in online social networks (OSNs) affect billions of individuals and raise critical privacy concerns across the entire society (Liang et al., 2015; Information is Beautiful, 2018). Besides, driven by the demands on huge storage and computation resources, OSN service providers utilise public commercial clouds as their back-end data storage (AWS, 2018a, b; Engineering, 2018), which further broadens the attack plane (Ren et al., 2012). Therefore, there is an urgent call to improve the control of data confidentiality for cloud providers (Yang et al., 2016; Liu et al., 2016; Li et al., 2016), in particular for current OSN services. The prevailing consensus to prevent data leakage is encryption. However, this approach impairs the functionality of social search, a key enabler for quality OSN services (Sullivan, 2012). Social search allows users to search content of interests created by their friends. Compared with traditional web search, it produces personalised search results and serves for a wide range of OSN services such as friend discovering and user targeting.

The first task in enabling privacy-preserving social search is to query very large encrypted social graphs scalably. On the one hand, a typical social graph can contain millions or even billions of users. On the other hand, users may generate a large volume of content that will be queried for social search related services (Curtiss et al., 2013). The second and more challenging task is to realise complex social search queries in an efficient and secure manner. As developed in plaintext systems (e.g., Facebook’s Unicorn (Curtiss et al., 2013)), social search queries contain set operations on graph-structured data, and the contents retrieved from the graph need to be further analysed (e.g., aggregation and sorting) for advanced services such as friendship-based recommendation.

In the literature, some work (Nayak et al., 2015; Blanton et al., 2013) leverages generic building blocks (e.g., garbled circuits and oblivious data structures) to devise secure computational frameworks for graph algorithms. However, those frameworks do not appear to be scalable for low latency queries over large graphs. For example, a recent garbled circuits based framework (Nayak et al., 2015) takes several minutes to complete a sorting algorithm over a graph with only tens of thousands of nodes. Other work focuses on dedicated privacy-preserving graph algorithms, e.g., neighbour search (Chase and Kamara, 2010; Kamara et al., 2018), and shortest distance queries (Xie et al., 2016; Meng et al., 2015; Wu et al., 2016; Wang et al., 2017). Unfortunately, the above algorithms either offer limited functionality or target functionality different from that of social search queries.

Contributions. To bridge the gap, in this paper, we propose and implement GraphSE2, the first encrypted graph database that supports privacy-preserving social search. Unlike prior work which either suffers from low scalability or limited functionality, GraphSE2 enables scalable queries over very large encrypted social graphs, and preserves the rich functionality of the plaintext social search systems. Our contributions can be summarised as follows:

  • We propose an encrypted and distributed graph model built on social graph modelling, searchable encryption, and the data partition technique. It facilitates queries over encrypted graph partitions in parallel, and maintains the locality of graph data and user-generated contents for low query latency.

  • We devise mixed yet interchangeable protocols to enable complex social search functions. The way of doing this is to decompose queries into atomic operations (i.e., set, arithmetic, and sorting operations) and then adapt suitable cryptographic primitives for efficient realisation. All these operations are tailored to be executed in parallel.

  • We realise query operators of the Facebook’s social search system Unicorn (Curtiss et al., 2013), i.e., term, and, or, difference, and apply. We also design a query planner that can parse a query to atomic operations and initiate the corresponding primitives.

  • We formally prove the security of our proposed query protocols under the real and ideal paradigm. Queries, graph data, and results are protected throughout the query process.

  • We show the practicality of GraphSE2 by implementing a prototype which is readily deployable. It leverages Spark (Zaharia et al., 2010) for setup (data partition and encryption), Redis (Redis Labs, 2017) as the storage back-end, and uses Apache Thrift (Slee et al., 2007) to implement the query planner and query processing logic.

Our comprehensive evaluation on the Youtube dataset (Mislove et al., 2007) with 1 million nodes confirms that all atomic operations are of practical performance. For set queries, GraphSE2 retrieves a content list with entities within ms. For an average user ( friends), GraphSE2 takes at most ms if the set operation involves two indexing terms (attributes); and it takes no more than ms for five indexing terms. Regarding the computational operations, GraphSE2 takes ms to handle arithmetic computations over entities, and ms to sort entities. As a summary, most of the queries for an average user are processed within s, and throughput is reduced at most compared to the plaintext queries.

Organisation. The rest of this paper is structured as follows. We discuss related work in Section 2 and introduce the background in Section 3. After that, we describe the system overview in Section 4, and present the encrypted and distributed graph data model and the design of atomic operations in Section 5. In Section 6, we introduce the realisation of privacy-preserving social search queries and their security. Next, we describe our prototype implementation in Section 7, and evaluate the performance in Section 8. We give a conclusion in Section 9.

2. Related Work

Privacy-preserving graph query processing. There exist various designs that aim to answer a certain type of query over the encrypted graph. Structured encryption (Chase and Kamara, 2010) is proposed in the framework of SSE and supports adjacency and neighbouring queries. Some recent work is proposed to support privacy-preserving subgraph queries (Cao et al., 2011; Chang et al., 2016). However, all the above designs enable limited query functionality. Another line of work on privacy-preserving graph processing is to perform shortest-path queries over the encrypted graph. Protocols for this type of query are devised via oblivious RAM (Xie et al., 2016), structured encryption (Meng et al., 2015), or Garbled Circuit (Wu et al., 2016; Wang et al., 2017). To implement more complicated algorithms, protocols are proposed to use secret sharing and homomorphic encryption for Breadth-first search (BFS) (Blanton et al., 2013), PageRank (Xie and Xing, 2014), and approximate eigen-decomposition (Sharma et al., 2018). We stress that the above work targets query functionality other than social search. Note that a recent framework named GraphSC (Nayak et al., 2015) can generate data-oblivious Garbled Circuit (GC) for graph algorithms such as PageRank and Matrix Factorisation. Because oblivious data structures are adapted for large graphs and all computations are realised via GC, it does not appear to achieve low latency for social search queries.

Encrypted database system. Our system is also related to encrypted database systems (Popa et al., 2011; Pappas et al., 2014; Poddar et al., 2016; Papadimitriou et al., 2016; Yuan et al., 2017). CryptDB (Popa et al., 2011) is the first practical encrypted database system, which is built on property-preserving encryption (PPE). It supports SQL queries over encrypted relational data records. BlindSeer (Pappas et al., 2014) proposes a Bloom Filter based index and leverages GC to evaluate arbitrary boolean queries with keywords and ranges. Arx (Poddar et al., 2016) follows the design of CryptDB to support SQL queries, but it uses SSE and GC to reduce the leakage from PPE. Seabed (Papadimitriou et al., 2016) uses additively symmetric homomorphic encryption (ASHE) to perform efficient aggregation over the encrypted data, and develops a schema with padding to mitigate the inference attack (Naveed et al., 2015). EncKV (Yuan et al., 2017) adapts SSE and ORE schemes to design an encrypted and distributed key-value store. However, all the encrypted databases mentioned above are neither designed for graph data nor optimised for social search.

Graph processing system. In the plaintext domain, a large number of graph processing systems (Low et al., 2012; Curtiss et al., 2013; Chi et al., 2016) (just to list a few) are proposed to support efficient large graph processing. However, all the above systems only support queries over the graphs in unencrypted form, which are unable to address privacy concerns of sensitive data leakage. Authenticated graph query (Goodrich et al., 2011) is proposed to verify the correctness of graph queries, which could be a complementary work to prevent attacks from malicious adversaries.

3. Background

3.1. Social Graph Model

The social graph consists of nodes (aka entities) and edges (aka relationships of entities) in social networks. As the social graph is a sparse graph (Curtiss et al., 2013), it is normally represented via a set of adjacency lists. Like (Curtiss et al., 2013), we refer to these adjacency lists as posting lists.

Formally, the social graph is an edge-labeled and directed graph $G = (V, E)$, where $V$ is the entity set and $E$ is the relationship set. Each posting list contains a list of entities, which are (sort-key, id) pairs. The sort-key is an integer that indicates the importance of the entity in a posting list, and the id is its unique identifier.

The posting lists are indexed by the inverted index, and modelled by the edges in the social graph: all edges in $E$ can be represented as a triad which consists of its egress and ingress nodes ($u, v \in V$) plus an edge-type, which is a string representing the relationship between nodes (e.g., friend, like). The inverted indexing term is in the form of edge-type:id. For example, the user may use friend:id to get the posting list of the friends of the user with that id.
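The posting-list model above can be illustrated with a small plaintext sketch. It assumes indexing terms of the form edge-type:id, as in the friend:1 examples of Table 1, and that each posting list is kept in descending sort-key order; the function name `build_posting_lists` and the sample data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def build_posting_lists(edges, sort_keys):
    """Build an inverted index from (egress, ingress, edge_type) triples.

    Each indexing term "edge_type:egress_id" maps to a posting list of
    (sort_key, id) pairs for the ingress nodes (an assumed convention).
    """
    index = defaultdict(list)
    for src, dst, edge_type in edges:
        term = f"{edge_type}:{src}"
        index[term].append((sort_keys[dst], dst))
    # Assumption: most important entities (largest sort-key) come first.
    for term in index:
        index[term].sort(reverse=True)
    return dict(index)

# Hypothetical toy graph: user 1 has friends 2 and 3 and likes item 7.
edges = [(1, 2, "friend"), (1, 3, "friend"), (2, 3, "friend"), (1, 7, "like")]
sort_keys = {2: 10, 3: 25, 7: 4}
index = build_posting_lists(edges, sort_keys)
print(index["friend:1"])
```

Looking up `index["friend:1"]` plays the role of retrieving the posting list of user 1's friends.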

3.2. Oblivious Cross-Tags (OXT) Protocol

The Oblivious Cross-Tags (OXT) protocol (Cash et al., 2013) is an SSE protocol that proceeds between a client and a server. It provides an efficient way to perform conjunctive queries in an encrypted database. (The scheme proposed in (Kamara and Moataz, 2017) supports disjunctive queries, but it consumes large storage space.) Here we provide a high-level description as needed for the basic operations of our proposed system.

The OXT protocol has two types of data structures. Firstly, for every keyword $w$, an inverted index, referred to as ‘TSet($w$)’, is built to point to the set of all entity identifiers (ids) associated with $w$. Each TSet($w$) is identified by an indexing term called stag($w$), and all id values in TSet($w$) are encrypted via a secret key $K_e$. Both stag($w$) and $K_e$ are computed as a PRF applied to $w$ with the client’s secret keys. Another data structure called ‘XSet’ is built to hold a list of hash values $h(id, w)$ (called ‘xtag’) over all entity identities $id$ and keywords $w$ contained in $id$, where $h$ is a certain (public) cryptographic hash function. The above two data structures are stored on the server-side.

To search a conjunctive query $w_1 \wedge \dots \wedge w_n$ with $n$ keywords, the client sends the ‘search token’ stag($w_1$) related to $w_1$ (called the ‘s-term’, which we assume to be $w_1$ in the above query) to the server, which allows the server to retrieve TSet($w_1$) from the TSet. In addition, the client sends ‘intersection tokens’ (called ‘xtraps’) related to the $n-1$ keyword pairs $(w_1, w_j)$ consisting of the ‘s-term’ paired with each of the remaining query keywords $w_j$, $2 \le j \le n$ (called ‘x-terms’). The xtraps allow the server to evaluate the cryptographic hash function $h(id, w_j)$ of pairs without knowing either the keyword $w_j$ or the $id$. The server checks the existence of $h(id, w_j)$ in the XSet and filters TSet($w_1$) to subsets of entities that contain the pairs $(w_1, w_j)$. It only returns the entities that contain all $w_j$ to the client. The client finally uses $K_e$ to recover the ids of entities.

As mentioned in (Cash et al., 2013), the security of OXT is parameterised by a leakage function. It depicts what an adversary is allowed to learn about the database and queries via executing the OXT protocol. Informally, consider a vector of queries, which consists of a vector of s-terms, a vector of boolean formulas, and a sequence of x-term vectors. After executing the queries in a chosen database, the adversary can only learn:

  • The total number of (entity, keyword) pairs.

  • The boolean formulae that the client wishes to query.

  • The repeat pattern in the s-terms.

  • The size of the posting lists for the s-terms.

  • The number of x-terms for each query.

  • The set of results matching each pair of (s-term, x-term)-conjunction.

  • The set of results existing in the posting lists of both s-terms, which is only revealed when two queries have different s-terms but the same x-terms.

3.3. Secure Computation

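Additive Sharing and Multiplication Triplets. To additively share ($\mathsf{Shr}^A(\cdot)$) an $\ell$-bit value $a$, the first party $P_0$ generates $a_0 \in \mathbb{Z}_{2^\ell}$ uniformly at random and sends $a_1 = a - a_0 \bmod 2^\ell$ to the second party $P_1$. The first party’s share is denoted by $\langle a \rangle^A_0 = a_0$ and the second party’s is $\langle a \rangle^A_1 = a_1$; the modulo operation is omitted in the description later. To reconstruct ($\mathsf{Rec}^A(\cdot, \cdot)$) an additively shared value $\langle a \rangle^A$ in $P_i$, $P_{1-i}$ sends $\langle a \rangle^A_{1-i}$ to $P_i$, who computes $\langle a \rangle^A_0 + \langle a \rangle^A_1$. Given two shared values $\langle a \rangle^A$ and $\langle b \rangle^A$, Addition ($\mathsf{Add}^A(\cdot, \cdot)$) is easily performed non-interactively. In detail, $P_i$ locally computes $\langle c \rangle^A_i = \langle a \rangle^A_i + \langle b \rangle^A_i$, which also can be denoted by $\langle c \rangle^A = \langle a \rangle^A + \langle b \rangle^A$. To multiply ($\mathsf{Mul}^A(\cdot, \cdot)$) two shared values $\langle a \rangle^A$ and $\langle b \rangle^A$, we leverage Beaver’s multiplication triplets technique (Beaver, 1991). Assume that the two parties have already precomputed and shared $\langle x \rangle^A$, $\langle y \rangle^A$ and $\langle z \rangle^A$, where $x, y$ are uniformly random values in $\mathbb{Z}_{2^\ell}$, and $z = xy$. Then, $P_i$ computes $\langle e \rangle^A_i = \langle a \rangle^A_i - \langle x \rangle^A_i$ and $\langle f \rangle^A_i = \langle b \rangle^A_i - \langle y \rangle^A_i$. Both parties run $\mathsf{Rec}^A(\langle e \rangle^A_0, \langle e \rangle^A_1)$ and $\mathsf{Rec}^A(\langle f \rangle^A_0, \langle f \rangle^A_1)$ to get $e, f$, and $P_i$ lets $\langle c \rangle^A_i = i \cdot e \cdot f + f \cdot \langle x \rangle^A_i + e \cdot \langle y \rangle^A_i + \langle z \rangle^A_i$.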
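The additive sharing and Beaver-triplet multiplication described above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions: both parties' shares live in one tuple, the bit length `ELL = 32` is an arbitrary choice, and the triplet is generated by a trusted helper function rather than by a secure precomputation protocol. All function names are hypothetical.

```python
import secrets

ELL = 32            # assumed bit length of shared values
MOD = 1 << ELL      # all arithmetic is modulo 2^ELL

def share(a):
    """Split a into two additive shares (party 0's share, party 1's share)."""
    a0 = secrets.randbelow(MOD)
    return a0, (a - a0) % MOD

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD

def add(a_sh, b_sh):
    """Addition is local: each party adds its own two shares."""
    return tuple((x + y) % MOD for x, y in zip(a_sh, b_sh))

def beaver_triple():
    """Shares of random x, y and of z = x*y (stand-in for precomputation)."""
    x, y = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(x), share(y), share(x * y % MOD)

def mul(a_sh, b_sh, triple):
    """Multiply two shared values using one Beaver triplet."""
    x_sh, y_sh, z_sh = triple
    # Each party masks its shares; e and f are then opened to both parties.
    e = reconstruct(*((a - x) % MOD for a, x in zip(a_sh, x_sh)))
    f = reconstruct(*((b - y) % MOD for b, y in zip(b_sh, y_sh)))
    # Party i computes i*e*f + f*<x>_i + e*<y>_i + <z>_i.
    return tuple(
        (i * e * f + f * x_sh[i] + e * y_sh[i] + z_sh[i]) % MOD
        for i in range(2)
    )

a_sh, b_sh = share(1234), share(5678)
sum_sh = add(a_sh, b_sh)
prod_sh = mul(a_sh, b_sh, beaver_triple())
print(reconstruct(*sum_sh), reconstruct(*prod_sh))
```

Summing the two output shares of `mul` gives $ef + fx + ey + xy = (e + x)(f + y) = ab$, which is why opening the masked values $e$ and $f$ suffices.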

Garbled Circuit and Yao’s Sharing. Yao’s Garbled Circuit (GC) is first introduced in (Yao, 1982), and its security model has been formalised in (Bellare et al., 2012). GC is a generic tool to support secure two-party computation. The protocol is run between a “garbler” with a private input $x$ and an “evaluator” with its private input $y$. The two parties wish to securely evaluate a function $f(x, y)$. At the end of the protocol, both parties learn the value of $f(x, y)$ but no party learns more than what is revealed from this output value. In detail, the garbler runs a garbling algorithm to generate a garbled circuit and a decoding table for the function $f$. The garbler also encodes its input $x$ and sends the encoding to the evaluator. The evaluator runs an oblivious transfer (OT) (Asharov et al., 2013) protocol with the garbler to acquire the encoding of its own input $y$. Finally, the evaluator can compute the encoded output from the garbled circuit and the two encoded inputs, decode it with the decoding table, and share the result with the garbler. The security proof against a semi-honest adversary under the two-party setting is given in (Lindell and Pinkas, 2009).

In the following parts, we assume that is the garbler and is the evaluator. GC can be considered as a protocol which takes as inputs the Yao’s shares and produces the Yao’s shares of outputs. In particular, the Yao’s shares of 1-bit value is denoted as and , where are the labels representing and , respectively. The evaluator uses its shares to evaluate the circuit and gets the output shares (another labels).

Additive shares can be switched to Yao’s shares efficiently. To be more precise, two parties secretly share their additive shares , in bitwise via Yao’s sharing. The evaluator then receives and and evaluates the circuit to get the label of .

4. System Overview

Figure 1. System architecture overview.

4.1. System Architecture

As shown in Figure 1, GraphSE2 has two entities: the on-premise social search service front-end (SF) and the index server cluster (ISC) with several index servers (ISs) in an untrusted cloud. Note that this setting is consistent with many off-the-shelf social network service providers such as Airbnb (AWS, 2018a) and Instagram (Engineering, 2018), who use cloud data storage as the back-end to manage large graphs and massive user-generated data contents. Also, such architecture is now natively supported by public clouds, e.g., AWS Outposts (AWS, 2018c). GraphSE2 aims to improve the protection of data confidentiality at the back-end, which is usually the high-value target for adversaries in practice.

During the setup phase, SF partitions the social graph into disjoint subgraphs and builds two instances of SSE indexes of each subgraph for the queries on structured information. The generated indexes are uploaded to two non-colluding ISCs, each with multiple ISs. The sort-keys are co-located with the corresponding indexes in the form of additive shares on the two ISCs for the arithmetic operations and sorting. Specifically, each IS has one of the two additive shares, and it pairs with a counter-party in the other cluster, which maintains the same index but holds the other share. Upon receiving a query from its users, SF uses a query planner to parse the query into atomic operations (see Section 5.3) to generate a query plan. It then sends the query tokens of atomic operations to all ISs to execute the query plan. After that, each IS requests the structured information via the tokens. Based on the matched encrypted contents, it executes arithmetic operations and scoring/ranking algorithms with its counter-party. Finally, the encrypted result is returned to SF.

In this architecture, we consider a scenario of secure computation outsourcing where the in-house front-end assigns the computation to the index servers in two untrusted but non-colluding clusters. Such a model of secure multi-party computation is formalised in (Kamara et al., 2011) and applied in many existing studies (Nikolaenko et al., 2013; Baldimtsi and Ohrimenko, 2015; Mohassel and Zhang, 2017). Built on this model, GraphSE2 offers two advantages: (i) the front-end is not required to be involved with any computation after it distributes the data to the servers, and (ii) the computation process can benefit from the mixture of multi-party computation protocols that enable efficient arithmetic operation, comparison, and sorting at the same time. Note that the communication between index servers will not be the system bottleneck, because they can be deployed in cloud clusters with dedicated datacenter networking support. This is consistent with prior studies based on the same architecture (Mohassel and Zhang, 2017).

Query operator    Example (from (Curtiss et al., 2013))
term              (term friend:1)
and               (and friend:1 friend:2)
or                (or friend:1 friend:2)
difference        (difference friend:3 (and friend:1 friend:2))
apply             (apply friend: friend:1)
Atomic operation columns: Index Access, Set Operations, Arithmetic, Sorting
Table 1. Supported social search operators in GraphSE2 and their essential atomic operations.

4.2. High-level Description

Before introducing the details of our system, we elaborate on the design overview and underlying design intuitions. To query large social graphs, GraphSE2 develops an encrypted and distributed graph model. It is built on graph modelling, searchable encryption, and the standard data partition algorithm. Each server evenly stores an encrypted disjoint part of the whole graph. Meanwhile, this model is designed to co-locate the encrypted contents with the disjoint part containing the users who generate or relate to the contents. As a result, GraphSE2 not only maximises the system scalability but also preserves data locality for low query latency.

To facilitate the realisation of various social search queries in the encrypted domain, GraphSE2 first splits these complex queries into two stages, i.e., content search over the structured social graph and computational operations on the retrieved contents. Within the above stages, queries are further decomposed into atomic operations, i.e., Index Access, Set Operations, Arithmetic operations, and Sorting. Since the first stage commonly performs set operations over the social graph, GraphSE2 realises our proposed graph model via a well-known searchable encryption scheme for boolean queries (aka OXT (Cash et al., 2013)). The second stage requires a combination of different computations to further analyse user contents. For example, collaborative filtering (Breese et al., 1998) first obtains the scores of user contents via several addition and multiplication operations and then sorts the scores for an accurate recommendation.

To accelerate sophisticated computations in the second stage, GraphSE2 mixes different secure computation protocols. Note that such philosophy also appears in recent privacy-preserving computation applications (Demmler et al., 2015; Mohassel and Zhang, 2017). Unlike prior work, GraphSE2 customises the mixed protocols for social search queries and adapts them to our distributed graph model. In particular, GraphSE2 represents the importance (score) of user-generated contents as the additive shares and deploys two distributed instances at two non-colluding server clusters to store both the graph partitions and corresponding shares respectively. Doing so allows GraphSE2 to support parallel and batch addition and multiplication without the interaction between servers (multiplication involves a round of interaction between two servers, but they are in the same partition of two clusters). To achieve fast sorting, GraphSE2 first converts additive shares to Yao’s shares inside garbled circuits (GC) and then invokes a tailored distributed sorting protocol via GC. Each pair of servers in two clusters can perform local sorting in parallel, and then the intermediate results are aggregated for global sorting. Within the protocol, the underlying scores are hidden against servers from either of the two parties.

4.3. Threat Assumptions

In this work, we assume that the front-end is a private server dedicatedly maintained by the OSN service provider. It is a trustworthy party in the proposed model. Similar to real-world OSN service providers (e.g. Airbnb), all users should submit their queries to the front-end through webpages or mobile apps. We assume that the front-end utilises secure channels and cryptographic techniques to protect users’ secrets. On the other hand, we assume that all index servers are located in the untrusted domain. Meanwhile, we consider that the two clusters are semi-honest but non-colluding parties. Each cluster performs social search faithfully but intends to learn additional information such as query terms, result ids and ranking values from the graph. Besides, those clusters hold user data and perform query functions, and thus they are high-value targets of adversaries. We assume that the two clusters can be compromised by two different passive adversaries, but the two adversaries will not collude. GraphSE2 aims to protect the confidentiality of the private information in the social graph when the data storage back-end of the social search service is deployed in an untrusted domain.

4.4. Query Operators

GraphSE2 follows a typical plaintext social search system (Curtiss et al., 2013) to define the operators (see Table 1).

In general, all operators in GraphSE2 aim to retrieve posting lists from the encrypted graph index. The simplest form of these operators is term, which retrieves a single posting list via an Index Access operation. Like the other social search system, GraphSE2 also supports and and or operators, which yield the intersection and union of posting lists via Set Operations respectively. In addition, it supports difference operator, which yields results from the first posting list that are not present in the others. Moreover, GraphSE2 supports the unique query operator of Unicorn system (Curtiss et al., 2013), i.e., apply. The operator allows GraphSE2 to perform multiple rounds of posting list retrieval to retrieve contents that are more than one edge away from the source node.
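To make the operator semantics concrete, the following plaintext sketch evaluates the five operators of Table 1 over ordinary Python sets. It illustrates only what each operator returns, not how GraphSE2 computes it over encrypted data; the toy index and all function names are hypothetical, and sort-keys are omitted for brevity.

```python
def term(index, t):
    """Retrieve a single posting list (as a set of ids)."""
    return set(index.get(t, ()))

def and_(*lists):
    """Intersection of posting lists."""
    return set.intersection(*lists)

def or_(*lists):
    """Union of posting lists."""
    return set.union(*lists)

def difference(first, *others):
    """Ids in the first posting list that appear in none of the others."""
    return first - set().union(*others)

def apply(index, prefix, inner):
    """Second-round retrieval: look up prefix+id for every id in `inner`."""
    result = set()
    for entity_id in inner:
        result |= term(index, f"{prefix}{entity_id}")
    return result

# Hypothetical friendship index.
index = {"friend:1": [2, 3], "friend:2": [1, 3, 4], "friend:3": [1, 2, 5]}

f1, f2, f3 = (term(index, f"friend:{i}") for i in (1, 2, 3))
common = and_(f1, f2)                         # (and friend:1 friend:2)
either = or_(f1, f2)                          # (or friend:1 friend:2)
only3 = difference(f3, common)                # (difference friend:3 (and ...))
friends_of_friends = apply(index, "friend:", f1)  # (apply friend: friend:1)
print(common, either, only3, friends_of_friends)
```

The `apply` call shows the multi-round behaviour: the ids returned by the inner query become the indexing terms of a second retrieval, reaching entities two edges away from user 1.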

To enable quality search services (e.g., friendship-based recommendation), the retrieved posting lists should be scored/ranked before returning to users. As mentioned in Section 4.1, the additive shares of sort-keys are stored with their indexes. As a result, most of the query operators (e.g., term, and, difference and or) can use these shares to perform Sorting on the retrieved contents. Furthermore, it is often useful to return results in an order different from sorting by sort-keys. For instance, collaborative filtering (Breese et al., 1998) evaluates an arithmetic formula about friendships and ratings on items to produce the personalised scores for recommended items. The new score is a better prediction than the sort-keys, as the latter only reflect the overall preference in the community (e.g., the hit-count on the item). The defined operators natively support arithmetic computations via the additive shares affixed to the indexes. Specifically, the apply operator can support the secure evaluation of complicated scoring formulas with Arithmetic operations: it can access different types of entities (e.g., user’s friends, items liked by users, etc.) in a multiple round-trip query, which means it can combine the scores of different entities and cache the intermediate result for next-round computations.

Notation Meaning
the unique identifier of entity
the encrypted entity
an indexing term in the form of
an inverted indexed database
a list of indexed by
the encrypted posting list with pairs
the -th party in GraphSE2 ()
a numerical value
a matrix
the Additive/Yao’s share of a numerical value in
the Additive/Yao’s share of a matrix in
a garbling scheme
Table 2. Notations and Terminologies

5. The Proposed System

We give a list of needed notations in our system construction and security analysis in Table 2. The detailed definitions of preliminaries we used are given in Section 3.

5.1. Encrypted Graph Data Model

To support social search operations in (Curtiss et al., 2013) on an encrypted social graph (see Section 3.1 for details), GraphSE2 creates the OXT index (i.e., and , see Section 3.2 for details) for encrypted graph structure access in s, and the additive shares are integrated with the corresponding index to support complex computations. Specifically, to support simple graph structure data access, each posting list is encrypted and stored as a tuple in the : . The tuple consists of the of indexing term as the key and the encrypted posting list as the value. Each element in is an encrypted tuple , which keeps the encryption of entity . Additionally, the sort-key of entity is shared as additive sharing value. GraphSE2 associates it with the encrypted entity to support complex computations. Moreover, GraphSE2 evaluates the cryptographic hash function of pairs to generate an for complex set operations.

5.2. Encrypted and Distributed Graph Index

In order to support the system to process the query in parallel, GraphSE2 distributes the encrypted graph across multiple index servers for each cluster.

Figure 2. Our encrypted and distributed data model, the arrows indicate the friend relationships between users.

GraphSE2 devises a partition strategy that shards the posting lists by hashing on the result id. Figure 2 gives an example of the proposed partition strategy in an index server cluster with two index servers. We employ a modulo partition strategy, which splits the original posting list into multiple non-duplicate parts, but other graph partition strategies (e.g., (Low et al., 2012)) can also be applied to shard the social graph. The design has three advantages in the context of a distributed environment. First, it maintains availability in the event of server failure. Furthermore, the sharding strategy enables the distributed system to finish most of the set operations and the consequent scoring, ranking and truncating within the index servers. It splits the computation load across distributed servers to improve the efficiency and also cuts down the communication cost between the index servers and the front-end. Finally, it does not affect the security of GraphSE2 because the adversary who compromises an index server cluster gets the same view (the whole encrypted database) as the adversary in a single instance. If the adversary cannot access all index servers in the cluster, only the view on a fraction of the encrypted database is learned.
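A minimal plaintext sketch of the modulo partition strategy follows. It assumes integer entity ids and shards each posting list by `id % num_servers`; the helper names and data are hypothetical. Because an id always lands on the same server regardless of which posting list it appears in, each server can intersect its local partitions independently.

```python
def shard_index(index, num_servers):
    """Split every posting list across servers by entity id modulo num_servers."""
    servers = [dict() for _ in range(num_servers)]
    for term, posting_list in index.items():
        for sort_key, entity_id in posting_list:
            shard = servers[entity_id % num_servers]
            shard.setdefault(term, []).append((sort_key, entity_id))
    return servers

def local_and(server, term_a, term_b):
    """Intersect two posting lists using only one server's partition."""
    ids_a = {eid for _, eid in server.get(term_a, [])}
    ids_b = {eid for _, eid in server.get(term_b, [])}
    return ids_a & ids_b

# Hypothetical posting lists of (sort_key, id) pairs.
index = {
    "friend:1": [(25, 3), (10, 2), (7, 4)],
    "friend:2": [(25, 3), (7, 4), (5, 1)],
}
servers = shard_index(index, num_servers=2)

# Each server answers locally; the front-end only merges the partial results.
partials = [local_and(s, "friend:1", "friend:2") for s in servers]
merged = set().union(*partials)
print(partials, merged)
```

The merged result equals the intersection computed over the unsharded lists, which is the property that lets set operations, scoring and truncation stay inside the servers.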

5.3. Atomic Operations

As mentioned in Section 4.4, the social search queries are implemented by a set of operators. We observe that these operators can be decomposed to a set of atomic operations. We now describe the implementation of these atomic operations in the encrypted domain. For each atomic operation, we explain how we adapt and optimise it in the proposed system.

5.3.1. Index Access

We start with Index Access operation, which is used to retrieve the neighbouring nodes of the target user with the given edge-type (e.g., friend, likes) from the social graph. Algorithm 1 outlines the searching procedure using operations. On receiving the search keyword, firstly generates a search token , which is of the indexing term . can use to search and get the encrypted posting list as the return. Index Access operation can be easily extended to run in parallel. More specifically, broadcasts search token to all s. After that, each uses to get its local partition of the whole encrypted posting list and sends it back.

Security. The security of Index Access is guaranteed by the security property of . Informally, Index Access is -semantically-secure against adaptive attacks where is the leakage function of . is well-defined and discussed in (Cash et al., 2013). It ensures Index Access only leaks the number of edges in the encrypted social graph.

1:, Indexing Term
2:Encrypted Result
3:function IndexAccess()
4:      inputs indexing term , and inputs ;
5:      computes ;
6:      sends to ;
7:      computes ;
8:     return ;
9:end function
Algorithm 1 Index Access

5.3.2. Set Operations

This operation involves boolean expressions with multiple indexing terms. GraphSE2 uses it to query the encrypted graph-structured data and find the neighbouring nodes and the corresponding user-generated content that satisfy the given boolean expression. In GraphSE2, we adapt protocol to support this atomic operation, but some of the other SSE protocols supporting conjunctive queries (e.g., (Lai et al., 2018)) can also be readily adapted as the building block of GraphSE2. The protocol supports conjunctive queries of the form natively, but it can be extended to support boolean queries of the form , where is the ‘s-term’, and is an arbitrary boolean expression (Cash et al., 2013). As shown in Algorithm 2, the extended protocol follows the basic steps to obtain search tokens and search in and interactively. Nevertheless, it introduces additional steps (lines 3, 13 and 15–17 in Algorithm 2) to evaluate the boolean expression . Specifically, replaces all indexing terms with boolean variables () and generates a boolean function . then sends to . sets the value of to the truth values of . Then, it evaluates and returns as a result if outputs true.

The algorithm can be utilised to enable set operations of social search queries as shown in Table 1, which will be discussed in the following section.

Security. In cryptographic terms, the protocol is proved to be -semantically-secure against adaptive attacks, where is the leakage function defined in (Cash et al., 2013). It ensures that the untrusted server only learns the information defined in the leakage function, but no other information about the query and underlying dataset. We refer the reader to Section 3.2 for more details.

1:, Query with s-term
2:Encrypted Result
3:function BooleanQuery()( is the indexing term list , and is an arbitrary boolean expression)
4:      inputs indexing term , and inputs ;
5:      initialise a boolean expression from and sends it to ;
6:      runs ;
7:      parses to ;
8:     for  do
9:          computes ;
10:          sends to ;
11:     end for
12:      initialises ;
13:     for  do
14:         for  do
15:               uses to compute ;
16:               lets ;
17:         end for
18:         if  then
19:               adds in ;
20:         end if
21:     end for
22:     return ;
23:end function
Algorithm 2 Boolean Query
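The control flow of the extended boolean query can be illustrated with a plaintext analogue. The sketch below is not the encrypted protocol: membership is tested directly on plaintext sets, whereas the protocol tests it through search tokens without revealing the terms. The names `boolean_query`, `index` and `phi` are hypothetical.

```python
def boolean_query(index, s_term, x_terms, phi):
    """Plaintext analogue of a boolean query driven by an s-term.

    index   : dict mapping an indexing term to its posting list
    s_term  : term whose posting list supplies the candidates
    x_terms : remaining terms, each mapped to one boolean variable
    phi     : boolean function over the truth values of x_terms
    """
    x_sets = [set(index.get(term, ())) for term in x_terms]
    results = []
    for candidate in index.get(s_term, ()):
        # One truth value per x-term: does the candidate appear in its list?
        truth_values = [candidate in x_set for x_set in x_sets]
        if phi(*truth_values):
            results.append(candidate)
    return results


index = {
    "friend:1": [2, 3, 4, 5],
    "friend:2": [3, 5, 9],
    "likes:7": [4, 5],
}
conjunction = boolean_query(index, "friend:1", ["friend:2", "likes:7"],
                            lambda a, b: a and b)
negation = boolean_query(index, "friend:1", ["friend:2"], lambda a: not a)
```

The loop runs once per entry of the s-term posting list, which is why the cost of the query follows the size of that list rather than the size of the database.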

5.3.3. Arithmetic

GraphSE2 uses Arithmetic operations to support complex scoring functions over the retrieved content from Set Operations. Arithmetic operations in GraphSE2 involve the secure two-party computation between two s. Here, we introduce the simplest model of GraphSE2, where each only has one , for ease of presentation on how to use additive shares (see Section 3.3 for detailed definition) to compute addition and multiplication under two-party setting. Note that this model can be extended to support multiple pairs of s.

In GraphSE2, the posting list is generalised as a matrix and the arithmetic operations are evaluated over the matrix. The reason for that is, instead of running the scoring function with arithmetic operations multiple times for each item of the posting list, the batch processing can reduce the system overhead and support scoring algorithms in parallel. We denote the matrix of sort-keys returned from a structured information query by , and the corresponding shared matrix is denoted by . Given two shared matrices and , the addition operation () can be evaluated non-interactively by computing in each party. To multiply two shared matrices (), two s generate the multiplication triplets, which are shared matrices: . has the same dimension as , has the same dimension as , and . computes and , and sends it to its counter-party. Both parties then recover and let .
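To make the share arithmetic concrete, here is a minimal single-process sketch of additive sharing with triplet-based multiplication on scalars. It follows the standard Beaver-triplet construction, which we assume matches the formulas in this section; the modulus `MOD`, the function names, and the trusted-dealer `make_triple` (a stand-in for the OT-based offline generation) are assumptions made for illustration only.

```python
import random

MOD = 2 ** 31  # assumed ring size, chosen for illustration only


def share(x):
    """Split x into two additive shares that sum to x modulo MOD."""
    r = random.randrange(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    return (s0 + s1) % MOD


def add_shares(a_i, b_i):
    """Addition is local: each party adds its own two shares."""
    return (a_i + b_i) % MOD


def make_triple():
    """Hypothetical trusted dealer producing shares of (u, v, z = u * v)."""
    u, v = random.randrange(MOD), random.randrange(MOD)
    return share(u), share(v), share(u * v % MOD)


def beaver_multiply(a, b, triple):
    """Multiply shared a and b; only the masked values e and f are exchanged."""
    (a0, a1), (b0, b1) = a, b
    (u0, u1), (v0, v1), (z0, z1) = triple
    e = reconstruct((a0 - u0) % MOD, (a1 - u1) % MOD)  # e = a - u
    f = reconstruct((b0 - v0) % MOD, (b1 - v1) % MOD)  # f = b - v
    c0 = (a0 * f + e * b0 + z0) % MOD
    c1 = (-e * f + a1 * f + e * b1 + z1) % MOD
    return c0, c1


a, b = share(1234), share(5678)
total = reconstruct(add_shares(a[0], b[0]), add_shares(a[1], b[1]))
product = reconstruct(*beaver_multiply(a, b, make_triple()))
```

The matrix case follows the same pattern with matrix products in place of scalar products, provided the operand order is kept consistent.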

The multiplication operation relies on the triplets, which must be generated before the actual computation. In addition, each party keeps its secret during the generation process; otherwise, the other party can recover after the two parties exchange . Thus, GraphSE2 introduces a secure offline protocol (Mohassel and Zhang, 2017) to generate the triplets via OT. It utilises the following relationship: to compute the shares of . The resulting offline protocol only needs to compute the shares of and as the other two terms can be computed locally.

We illustrate the computing process of in the offline protocol. The basic step of the offline protocol is to use and a column from to compute the share of their product. This is repeated for each column in to generate . Therefore, for simplicity, we focus on the above basic step: We assume that the size of is and we denote each element in as , and . In addition, we assume each column of has elements, which are denoted as , . The computation process is listed as follows:

  • runs a correlated-OT protocol (COT) (Asharov et al., 2013), and sets the correlation function to for .

  • For each bit of , chooses a random value for each bit and runs with .

  • If , gets ; If , gets . It is equivalent to get in side.

  • sets , and sets .

After computing , the -th element of the -th row in is . Analogously, and can compute the share of in the same way.
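The bit-by-bit structure of this step can be sketched as follows. The snippet computes additive shares of a product of two private values using one correlated message pair per bit, which is our reading of the steps listed above. The oblivious transfer itself is only simulated by a local selection, so the code shows the arithmetic and offers no security; `L_BITS`, `MOD` and `cot_product_shares` are hypothetical names.

```python
import random

L_BITS = 31          # assumed bit length of the ring elements
MOD = 2 ** L_BITS


def cot_product_shares(a, b):
    """Return additive shares of a * b mod MOD.

    The sender holds a, the receiver holds b. For bit k of b, the sender
    offers the pair (r, a * 2^k + r); the receiver obtains the entry
    selected by that bit. The OT is simulated locally (no security).
    """
    sender_share = 0
    receiver_share = 0
    for k in range(L_BITS):
        r = random.randrange(MOD)
        m0 = r
        m1 = (a * (1 << k) + r) % MOD   # correlation: f(x) = a * 2^k + x
        bit = (b >> k) & 1
        received = m1 if bit else m0    # stands in for the OT output
        sender_share = (sender_share - r) % MOD
        receiver_share = (receiver_share + received) % MOD
    return sender_share, receiver_share


a_val, b_val = 1234567, 7654321
s_share, r_share = cot_product_shares(a_val, b_val)
```

Summing the received values gives the product plus all random masks, and the sender's share removes exactly those masks, so the two shares add up to the product.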

Security. The additive sharing scheme offers security guarantees to Arithmetic operations in GraphSE2 via its computational indistinguishability property. More specifically, as discussed in (Pullonen et al., 2012), the scheme creates uniformly distributed inputs and outputs to protect the original input/output of the Arithmetic operation under the threat model of GraphSE2, i.e., two semi-honest but non-colluding parties.

Figure 3. Local sorting process for one pair of garbler and evaluator. The input of the garbler is at the top, and the input/output of the evaluator is on the bottom.

5.3.4. Sorting

This operation is required to rank the scores computed by Arithmetic operations. A naive solution is to recover all scores from additive shares in and sort them as plaintext. However, transmitting all rank results to is bandwidth-consuming, so the sort operation can be very inefficient as a result. Therefore, GraphSE2 mixes the additive sharing scheme and Yao’s Garbled Circuit (see Section 3.3 for details) to support arithmetic operations and comparison at the same time, as this avoids the communication overhead of sending the shares back to . To protect the privacy of score values, the generated circuit should have a fixed sequence of comparisons for a given input size (i.e., it should be trace-oblivious), and it should not reveal the actual score values after circuit evaluation.

Local Sorting. To enable sorting on s, GraphSE2 leverages an efficient scheme in (Demmler et al., 2015) to switch from additive sharing to Yao’s sharing. It then adopts the sorting network (Batcher, 1968) to generate the optimised sorting circuit. Finally, the garbler concatenates the sorting network with an XOR gate and applies a random mask to hide the score values. As a result, the evaluator can use the decode table to figure out the rank, but it does not learn the score values. The local sorting algorithm in GraphSE2 can be divided into five phases. Figure 3 illustrates the process of local sorting.

Given as the garbler and as the evaluator, where both parties pre-share a scoring vector , GraphSE2 runs the protocol to sort the vector and returns a sorted vector in descending order of . The protocol can be summarised as follows:

  • Phase 1: runs to generate the circuit in Figure 3 as well as its decode table . It then sends the circuit and the decode table to . Doing so ensures that only can see the final result with random mask .

  • Phase 2: sends the encoded inputs of its additive shares with a payload vector indicating the position. This prevents from learning the additive shares of .

  • Phase 3: retrieves the encoded inputs of its additive shares and payload vector from via OT protocol. This prevents from learning the additive shares of .

  • Phase 4: generates the encoded input of a random mask to perform the last XOR gate to protect the vector.

  • Phase 5: uses the given inputs to evaluate the circuit, and uses to decode the outputs.

Since the circuit puts a mask after sorting, only gets the ranking without knowing the actual scores.
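The requirement that the circuit uses a fixed sequence of comparisons can be illustrated in plaintext with a bitonic sorting network in the style of Batcher (1968). The sketch below is not a garbled circuit and applies no mask: it only shows that the list of compare-exchange steps depends on the input length alone, and that carrying a position payload yields the ranking. The function names and the power-of-two length restriction are assumptions of this sketch.

```python
def bitonic_comparators(n):
    """Fixed compare-exchange schedule (i, j, descending) for n = 2^k inputs."""
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    steps = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    steps.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return steps


def oblivious_sort_desc(scores):
    """Sort (score, position) pairs in descending order of score."""
    items = [(score, pos) for pos, score in enumerate(scores)]
    for i, j, descending in bitonic_comparators(len(items)):
        first, second = items[i], items[j]
        out_of_order = (first[0] < second[0]) if descending else (first[0] > second[0])
        if out_of_order:
            items[i], items[j] = second, first
    return items


scores = [5, 1, 9, 3, 7, 2, 8, 6]
ranked = oblivious_sort_desc(scores)
ranking = [pos for _, pos in ranked]   # the positions in rank order
```

Because `bitonic_comparators` never looks at the data, two inputs of the same length always trigger the same comparisons, which is the property the garbled sorting circuit relies on.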

Figure 4. Global sorting process in the coordinators. The input of the garbler is the masked score vector with a payload that indicates the position of the score in vector, and the input of the evaluator is the random masks.

Global Sorting. The above sorting strategy is a suitable and efficient solution for the simplest model, i.e., only one in each . However, it can be problematic when each has several s. In this case, no can provide a full sorted list as each only has a disjoint part of the whole graph. Hence, still needs to perform another inefficient plaintext sorting.

Therefore, GraphSE2 uses a specific protocol which is run by a chosen coordinator of each . The protocol can perform an extra round of sorting upon the results from local sort while keeping the scoring values secret. Assume that each has different s, and are chosen to be the coordinators for and , respectively. After local sorting, the evaluator in sends a vector of masked scoring values where and to and the garbler in sends the masks to . In the global sorting, GraphSE2 switches the roles of and (i.e., is the garbler, and is the evaluator) for two reasons. First, it prevents from evaluating two circuits at the same time, as also needs to evaluate another local sorting circuit for its partition. Second, it facilitates pipelined data processing: the partial result of each can be sent to and for global sorting separately. Once the and get the first result, they can start to run encoding and OT. The protocol is summarised as follows:

  • Phase 1: runs to generate the circuit in Figure 4 and the decode table . It then sends the circuit and to .

  • Phase 2: sends the encoded inputs where and with a payload vector .

  • Phase 3: retrieves encoded masks via OT protocol.

  • Phase 4: uses the given inputs to evaluate the circuit, and uses to decode the outputs. The result is sent to in descending order.

Security. The security properties of Garbled Circuit (Bellare et al., 2012) and OT (Asharov et al., 2013) ensure the security of Sorting in GraphSE2 in the following three aspects: Firstly, no adversary can learn the input of its counter-party when sorting (i.e., the other additive share in and the masked score/mask in ). Secondly, the output of is masked by a one-time mask, which is a uniformly random number. It protects the original score vector because the evaluator only learns the masked values from output. Finally, for the output of , only the decode table of rank is sent to the evaluator, which also ensures that the evaluator only learns global rank without knowing the actual ranking score.

6. Query Realisation

In GraphSE2, receives queries as query strings in the form of s-expressions. Each s-expression is composed of several operators that describe the set of results the client wishes to receive (see Table 1). In the following sections, we introduce the operators in GraphSE2 and their security properties.

6.1. Graph Operators

GraphSE2 uses atomic operations in Section 5.3 to realise all operators in Table 1. In this section, we present the detailed constructions of these operators. Note that we only consider the operators as the outermost operators, i.e., they are not nested in any other query strings, because the query plan generation highly depends on the outermost operators.

term. The term operator runs an Index Access operation to retrieve a posting list. In addition, if there is a requirement to rank the result, the Sorting operation is able to return a sorted posting list which puts the record with higher relevance at the beginning of the list.

and. This operator is natively supported by the BooleanQuery algorithm. As mentioned in Section 5.3, conjunctive queries with nested queries (e.g., ) are processed by evaluating the boolean expression in . It is obvious that the and operator is executed in a sub-linear time, as its complexity is proportional to the size of .

difference. The difference operator is extended from the BooleanQuery algorithm. Consider the query (difference friend:3 (and friend:1 friend:2)) from Table 1, which aims to find the friends of , who are neither ’s friends nor ’s friends. The boolean expression, in this case, is , but the results that satisfy the expression are removed from the results of the query . In summary, the difference operator excludes the results that satisfy the boolean expression . Therefore, the s-expression with the difference operator is represented as . Compared with the and operator, it returns the results only if the boolean expression returns false instead of true.

or. The complexity of the original approach for processing disjunctive queries is linear in the size of the database (Cash et al., 2013). To achieve a sub-linear time complexity, we leverage the above difference and term operators to build a new disjunctive query operator. In particular, an s-expression that starts with the or operator can be processed via a list of difference expressions and an additional term expression. For instance, if a disjunctive query has three indexing terms: , the corresponding s-expression is (), and it is parsed as: . The above three s-expressions return three different sets of results, and their union is the final result of the or operator. The correctness of the above approach can be easily proved by the set identity: .

In general, a disjunctive s-expression with indexing terms can be rewritten as s-expressions with difference operator and s-expressions with term operator. The complexity is proportional to , where is the number of disjunctive indexing terms and is the result size of the most frequent term .
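The rewriting of a disjunction can be checked on plaintext sets. The sketch below uses one decomposition that is consistent with the description above, namely subtracting from each posting list the union of all later ones and keeping the last list as a plain term query. The exact order of terms used by GraphSE2 is not recoverable from this text, so the order here, the function name and the sample sets are assumptions.

```python
def disjunction_via_difference(posting_sets):
    """Rewrite a union of sets as disjoint parts.

    Part i is posting_sets[i] minus the union of all later sets, which
    mirrors one difference expression; the last part is the final set
    itself and mirrors the additional term expression.
    """
    parts = []
    for i, current in enumerate(posting_sets):
        later = set().union(*posting_sets[i + 1:])
        parts.append(current - later)
    return parts


w1, w2, w3 = {1, 2, 3, 4}, {3, 4, 5}, {4, 6}
parts = disjunction_via_difference([w1, w2, w3])
combined = set().union(*parts)
```

Since the parts are pairwise disjoint, concatenating the partial results produces the disjunction without duplicates.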

6.2. Apply Operator

The apply is a unique operator in Unicorn (Curtiss et al., 2013), which enables graph-traversal. The basic idea is to retrieve the results of nested queries and use these results to construct and execute a new query. For example, given an s-expression (apply friend: friend:), issues a query (term friend:) and collects users, it then generates the second query to get the entities that are more than one edge away from the user in the encrypted graph-structured data.

GraphSE2 defines a query structure to construct the apply operator. In detail, the query structure is a tuple of (prefix, s, filter), where the prefix (e.g., friend:) is prepended to the given to form the indexing terms, s is an s-expression with indexing terms (e.g., (term ?)), and filter indicates the ranking algorithm for its results. To execute an apply operator, pre-processes the input with a given prefix in the query structure and uses processed to execute the s-expression s from the query structure.  handles the query from the s-expression and applies the designated filter in the query structure to refine the result. Consequently, can retrieve a list of as the result of the nested query structure. GraphSE2 then takes the retrieved and the outer query structure as input and repeats the above procedure until it reaches the outermost query structure. Algorithm 3 gives the detailed implementation of the apply.

1:Outer Query Structure , Nested Query Structure , Array of
2:Encrypted Result
3:function Apply()(in )
4:     for  do
5:         ;
6:         ;
7:     end for
8:     ;
9:     for  do
10:         ;
11:         ;
12:     end for
13:     ;
14:     return ;
15:end function
1:Query , Result Filter
2:Encrypted Result
3:function Search(q, f)(in )
4:     ;
5:     return
6:end function
Algorithm 3 Apply
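The two-level flow of the apply operator can be sketched on a plaintext index. This is only an illustration of the nesting described above (run the nested query, prepend the prefix to each returned identifier, run the outer queries, then filter); it does not reproduce Algorithm 3, and the names `search`, `apply_operator`, `top_two` and `keep_all` are hypothetical.

```python
def search(index, term, result_filter):
    """Run a term query and refine the posting list with the given filter."""
    return result_filter(index.get(term, []))


def keep_all(hits):
    return hits


def top_two(hits):
    """Hypothetical filter that truncates a result list to its first two entries."""
    return hits[:2]


def apply_operator(index, prefix, nested_term, nested_filter, outer_filter):
    """Use the results of a nested query to build and run the outer queries."""
    nested_ids = search(index, nested_term, nested_filter)
    merged, seen = [], set()
    for uid in nested_ids:
        # Prepend the prefix to each returned identifier to form a new term.
        for hit in search(index, f"{prefix}{uid}", keep_all):
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return outer_filter(merged)


index = {
    "friend:1": [2, 3, 4],
    "friend:2": [1, 5],
    "friend:3": [1, 5, 6],
    "friend:4": [1, 7],
}
result = apply_operator(index, "friend:", "friend:1", top_two, keep_all)
```

Truncating the nested result before the second round keeps the number of outer queries, and the size of the list that later needs sorting, small.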

The apply operator performs the necessary steps on behalf of its users to improve the efficiency of GraphSE2. For example, suppose users ask for recommendations from friends: either or the client can execute a two-step query that first retrieves the friend list and then issues an additional query to get the recommendation. Compared with the latter strategy, running the apply operator in greatly reduces the workload on the client side: it saves the network latency of transmitting intermediate results between and the client, in addition to the computational cost of aggregation and regeneration. Furthermore, can further optimise the query and adopt a different scoring strategy given its semantic context (as shown in the following example).

Example: Friend Recommendation. Friend recommendation is a good example of the use of the apply operator. According to the Homophily theory (McPherson et al., 2001), people with higher similarity have a higher probability of becoming friends. In this context, the system aims to recommend friends-of-friends to its user in order of similarity. Hence, for this application, we apply a simple ranking function that returns the sorted similarity values directly for both the outer and nested queries.

It is also possible to implement the friends-of-friends query without the apply operator. Intuitively, friends-of-friends can also be treated as an edge type: GraphSE2 may explicitly store the friends-of-friends list and use the indexing term friends-of-friends:id to index it. The friend recommendation problem is then easily processed by the term operator. However, such a naive solution blows up the memory consumption on : as shown in Table 3, the estimated size of a friends-of-friends posting list is almost 370x larger than the original friend list of a typical user (Curtiss et al., 2013). In GraphSE2, each encrypted tuple occupies bytes of memory (see Section 8.2 for the detailed discussion), which indicates that the friends-of-friends posting lists for 1 million users consume TB of RAM.

The apply operator also reduces the query latency: it is expensive to sort the posting lists of friends-of-friends:id with 48k entities inline in Garbled Circuit (it needs s to evaluate the corresponding circuit). In comparison, introducing an extra round to query enables to truncate the result, and makes the sorting process more efficient. For example, if GraphSE2 applies a filter to return the top results to for the nested query, the result size can be reduced to around users with a higher similarity. The reduced result is also small enough to evaluate the sorting circuit on it. Under our settings in Section 8, the system replies with either a full result list after - s or a truncated result list after - s.

friends-of-friends apply
Est. # of users/posting list k
Est. Storage overhead GB TB
Query delay s s
Table 3. Performance estimation of Friend Recommendation implementations with 1 million users.

6.3. Security Analysis

In Section 5.3, we discuss the security of each atomic operation. Here, we analyse the security of the overall system. Specifically, we formulate the security of GraphSE2 based on the prior work of SSE (Curtmola et al., 2011; Cash et al., 2013) and further combine the security of additive sharing and Garbled Circuit to depict the security of query operators. Throughout the analysis, we consider a query in GraphSE2 containing a boolean formula and a tuple of indexing terms .

Overview. The main idea of analysing the security of GraphSE2 is similar to that in SSE schemes (Curtmola et al., 2011; Cash et al., 2013). Specifically, the analysis constructs a simulator of GraphSE2 to show that the adversary in GraphSE2 only learns the controlled leakage parameterised by a leakage function , after querying a vector of queries . Note that the simulator of GraphSE2 is slightly different from the original SSE simulator; we outline these differences as the sketch of our security analysis. Firstly, we update the leakage function of SSE ( in GraphSE2) to additionally capture the ranks leaked in query results. Secondly, we slightly modify the capability of adversaries to fit our two-party model: an adversary in our system is able to see the view on the corrupted cluster as well as the output of the counter-party. Under the new adversary model, the joint distribution of the outputs of both the adversary and the counter-party can be properly simulated by an efficient algorithm with the updated leakage function. Finally, as GraphSE2 has the submodules implemented by SSE, additive share scheme, and Yao’s Garbled Circuit, the simulator of GraphSE2 can be constructed by combining the simulators of these submodules. For instance, our simulator uses the output of SSE simulator as the input of garbled circuit simulator in order to simulate the query operators with structured data access and sorting.

Due to the page limit, the updated leakage function is given in Appendix A.1 and the detailed proof is in Appendix A.2.

Discussion. Note that there exist some emerging threats against the building blocks of GraphSE2. Regarding SSE, leakage-abuse attacks (Cash et al., 2015; Zhang et al., 2016) can help an attacker to exploit the information learned during queries. To mitigate them, recent studies on padding countermeasures (Cash et al., 2015; Bost and P-A, 2017) and forward/backward privacy (Bost et al., 2017; Sun et al., 2018) have been proposed and shown to be effective. We leave the integration of these advanced security features into our system as future work. Regarding sorting, GraphSE2 reveals the rank of the query result. Recent work (Kellaris et al., 2016; Kornaropoulos et al., 2019) demonstrates that the underlying data values are likely to be reconstructed if an adversary knows ranks and some auxiliary information about queries and datasets. Currently, we do not consider such a strong adversary, and how to fully address the above threat remains an interesting open problem.

7. Implementation

We implement a prototype system for evaluating the performance of GraphSE2. To build this prototype, we first realise the cryptographic primitives in Section 3. Specifically, we use the symmetric primitives, i.e., AES-CMAC and AES-CBC, from Bouncy Castle Crypto APIs (The Legion of the Bouncy Castle, 2007). In addition, we use a built-in curve (Type A curve) from the Java Pairing-based Cryptography (JPBC) library (Caro and Iovino, 2011) to support the group operations in . The security parameter of the symmetric key encryption schemes is 128-bit, and the security parameter of the elliptic curve cryptographic scheme is 160-bit. Regarding secure two-party computation, we set the field size to . Therefore, we can use regular arithmetic on the Java integer type to implement the modulo operations, which is significantly faster than the native modulo operation of the Java BigInteger type (we observed that it is 50x faster). We apply this optimisation to the implementation of the additive sharing scheme in the finite field : each addition (multiplication) operation is calculated by several regular addition and multiplication operations followed by the modulo operation. Oblivious Transfer and Garbled Circuit are implemented using FlexSC (Wang, 2018). It implements the extended OTs in (Asharov et al., 2013) and several optimisations for the garbled circuit, which make it a practical primitive in a Java environment.

The prototype system consists of three main components: the encrypted database generator, the query planner in and the index server daemon in . The encrypted database generator is running on a cluster with Hadoop (Apache, 2015). It partitions the plaintext data, runs the adapted to convert the data into encrypted tuples with the additive share of sort-keys and stores these tuples on each . We leverage Spark (Zaharia et al., 2010) to execute these tasks in-memory and enable the pipelining data processing to further accelerate this process. The generated tuples are stored in the in-memory key-value store Redis (Redis Labs, 2017) in the form of on each for querying purpose later. In addition, the generated is kept in the external storage of each to support the set operations. All queries are handled by the query planner and index server daemons by following the query processing flow in Section 4.1. Thrift thread pool proxy (Slee et al., 2007) is deployed to handle the queries in index server daemons.

To improve the runtime performance of our prototype, each posting list is segmented into fixed-size blocks indexed by its and a block counter for the . As the final result of the block counter indicates the total number of blocks for each , it is also stored in Redis after the whole posting list is converted to encrypted tuples. Those counters enable to retrieve multiple tuples in parallel. We also introduce a startup process for protocol and secure two-party computation in index server daemons. In terms of the matching, the index server daemon creates a Bloom Filter (Bloom, 1970) to load the into memory during the startup process. In our prototype, we deploy the Bloom filter from Alexandr Nikitin as it is the fastest Bloom filter implementation for JVM (Nikitin, 2017). We set the false positive rate to , and the generated Bloom Filter only occupies a small fraction of memory. Besides generating the Bloom Filter, each index server also pre-computes several multiplication triplets and sorting circuits and periodically refreshes it to avoid extra computational cost on-the-fly.
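The block layout described above can be sketched with a dictionary standing in for the key-value store. Here HMAC-SHA256 from the Python standard library replaces the PRF used by the prototype, the entries are left unencrypted, and the names `block_key`, `store_posting_list` and `fetch_posting_list` are hypothetical. The point of the sketch is that storing the block count lets all block keys be derived up front and fetched in parallel.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor


def block_key(prf_key, term, counter):
    """Derive a store key from the indexing term and a block counter."""
    message = f"{term}|{counter}".encode()
    return hmac.new(prf_key, message, hashlib.sha256).hexdigest()


def store_posting_list(store, prf_key, term, entries, block_size):
    """Write fixed-size blocks plus the final block count for the term."""
    blocks = [entries[i:i + block_size] for i in range(0, len(entries), block_size)]
    for counter, block in enumerate(blocks):
        store[block_key(prf_key, term, counter)] = block
    store[block_key(prf_key, term, "count")] = len(blocks)


def fetch_posting_list(store, prf_key, term):
    """Look up the block count, then fetch every block in parallel."""
    count = store.get(block_key(prf_key, term, "count"), 0)
    keys = [block_key(prf_key, term, counter) for counter in range(count)]
    with ThreadPoolExecutor() as pool:
        blocks = list(pool.map(store.__getitem__, keys))  # order is preserved
    return [entry for block in blocks for entry in block]


store = {}
key = b"demo-prf-key"            # illustrative key only
store_posting_list(store, key, "friend:1", list(range(10)), block_size=4)
fetched = fetch_posting_list(store, key, "friend:1")
```

Without the stored count, a reader would have to probe block 0, 1, 2 and so on until a lookup fails, which forces sequential retrieval.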

Our prototype system implementation consists of four main modules with roughly 3000 lines of Java code; we also implement a test module with another 1000 lines of Java code.

Node type # of nodes Edge type # of edges
User friend
Group follow
Table 4. Statistics of Youtube social network dataset.

8. Experimental Evaluations

8.1. Setup

Platform. We deploy the index server daemons in a cluster () with 6 virtual machine instances in the Microsoft Azure platform. All VMs are E4-2s v3 instances, configured with 2 Intel Xeon E5-2673 v4 cores, 32GB RAM, 64GB SSD external storage and 40Gbps virtualised NIC. Another D16s v3 instance is created in Azure to run with the query planner and client; it is equipped with 16 Intel Xeon E5-2673 v3/v4 cores, 64GB RAM, 128GB SSD external storage and 40Gbps virtualised NIC. We also have the other three E4-2s v3 instances controlled by ; we use them to run the encrypted database generator for generating the encrypted index. All VMs are installed with Ubuntu Server 16.04LTS.

Dataset. We use a Youtube dataset from (Mislove et al., 2007), which is an anonymised Youtube user-to-user links and user group memberships network dataset. The detailed statistical summary is given in Table 4. We recognise the user-to-user links as friend edge and user group memberships as follow edge from this dataset. The generated posting lists are indexed by above two edge types and user s. As the social network in our Youtube dataset is an unweighted network, we randomly generate a weight between and for each edge to evaluate the arithmetic and sort operations of GraphSE2.

Baseline. To evaluate the performance of GraphSE2, we create a baseline graph search system by removing or replacing the cryptographic operations of GraphSE2. Specifically, we leverage a hash function to generate instead of using expensive group operations. The index and sort-keys are stored in plaintext, which means that the can compute and sort without any network communication for OT and multiplication triplets. Finally, the query planner provides the indexing term in plaintext instead of to the as the query token. Nonetheless, we still use the PRF value of the indexing term and the block counter as the tuple index, because we want to keep the table structure of unaltered to make our system comparable to the baseline. We use this baseline system to evaluate the overhead of cryptographic operations, as GraphSE2 implements the same operators as Facebook Unicorn (Curtiss et al., 2013).

8.2. Evaluation Results

Vector length 2 4 8 16 32 64 128
# of AND Gates 4382 9148 19448 41968 91616 201664 446336
GC evaluation time (ms) 17.3 20.3 31.5 48.2 101.0 206.5 440.0
GC comm. overhead (MB) 0.12 0.41 0.46 0.97 2.10 4.49 9.80
Table 5. Benchmark of sorting circuit size and evaluation time. The garbled sorting algorithm is bitonic merging/sorting; we use it to sort vector.
Figure 5. A tuple-wise storage overhead comparison between the encrypted database and plaintext database.

EDB Generation. Firstly, we demonstrate the runtime performance of the encrypted database generator. The generator needs to partition and create additive shares from the original plaintext data, and to generate the adapted index for each . GraphSE2 uses to locally generate the partitions and additive shares for our dataset and then uses the dedicated cluster to generate the encrypted graph index in parallel. The result on our million records dataset shows that it only takes s to pre-process data on and mins to generate the encrypted index via Spark.

Storage. Recall that GraphSE2 uses adapted index to support boolean queries over the encrypted graph, which requires generating two dedicated data structures (i.e., and ). As a result, GraphSE2 consumes more storage capacity than the baseline system (see Figure 5), because it is required to keep more information (i.e., in ciphertext), and because it stores an encrypted index, which is larger than the corresponding plaintext. By using the , we observe that our system increases the memory consumption of Redis by (557MB in and 300MB in plaintext), which is slightly smaller than the theoretical memory consumption overhead ( according to Figure 5). The reason is that GraphSE2 also keeps the number of blocks of each posting list (see Section 7) to accelerate the tuple retrieval process. (If the size of a posting list is unknown, the system needs to retrieve the tuples from blocks sequentially, as the key is derived from and the block counter; otherwise, the tuples can be retrieved in parallel.) As shown in Figure 5, the block counter requires an additional bytes of memory for each indexing term both in and in plaintext. It introduces the same extra cost in GraphSE2 as in the baseline system, which makes the memory consumption overhead smaller than the theoretical expectation.

For storage overhead, GraphSE2 increases it by 17x (1.5GB versus 90MB), mostly because a group element is much larger than a PRF value. However, the Bloom Filter saves memory at runtime, because the size of a Bloom Filter only depends on the false positive rate and the total number of elements inside (the number of edges in our system) (Bloom, 1970), and it is much smaller than itself. By fixing the false positive rate to , the runtime overhead of in our system is identical to the baseline system (only 18MB in RAM).
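The dependence of the filter size on only two quantities can be made explicit with the standard sizing formulas for a Bloom filter with an optimal number of hash functions: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below evaluates them; the element count and false positive rate in the example are hypothetical and are not the values used in the evaluation.

```python
import math


def bloom_filter_bits(num_elements, false_positive_rate):
    """Bits needed by a Bloom filter with an optimal number of hash functions."""
    bits = -num_elements * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits)


def optimal_hash_count(num_bits, num_elements):
    """Number of hash functions that minimises the false positive rate."""
    return max(1, round(num_bits / num_elements * math.log(2)))


# Hypothetical parameters: one million elements, 1% false positive rate.
bits = bloom_filter_bits(1_000_000, 0.01)
hashes = optimal_hash_count(bits, 1_000_000)
megabytes = bits / 8 / 1_000_000
```

The size of each stored element never enters the formulas, which is why a filter over large group elements can stay small in memory.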

Figure 6. Query delay for two-keyword set queries.

Query Delay. To understand the query delay introduced by cryptographic primitives, we measure the overhead of each cryptographic primitive independently. To evaluate the query delay introduced by the set operations, we choose an indexing term a from friend edges with fixed selectivity (, as the average user has friends according to Unicorn paper (Curtiss et al., 2013)), and we further choose several variable indexing terms v from friend edges with selectivity from to . Figure 6 illustrates query delay on Index Access v and two variants of two-term Boolean Query. In the Index Access v query, the query only consists of s-term v, and the figure shows that its execution time is linear in the size of the corresponding posting lists. The other two-term queries combine the previous queries with the fixed term a. In the first of these two queries, we use a as x-term, so each tuple from should be checked with the cost of an exponentiation, which requires to ms to process. In the last one, a is used as s-term, where we observe that the execution time stays constant ( ms), irrespective of the variable selectivity of the x-term v. It demonstrates that GraphSE2 can respond to a query related to moderate users with a tiny latency, and it is also able to answer queries about popular users with a slightly bigger but still modest delay.

Secure addition and multiplication are supported by the additive sharing scheme with a relatively small overhead, because in most cases the two servers perform the computation non-interactively using regular arithmetic operations, and because the expensive tasks, such as multiplication triplet generation, are pre-computed in the startup process. Figures 8(a) and 8(b) show the execution delay of addition and multiplication over vectors of different sizes. We can see that the addition of two vectors containing entities can be done within ms. The multiplication operation needs ms to compute the product of entities, because it requires several efficient but non-negligible rounds of communication with the counter-party.
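
The difference between the two operations can be sketched with two-party additive sharing and a pre-computed multiplication triplet (Beaver, 1991). The sketch below runs both parties in one process and assumes a ring of size 2^64; the ring size and all function names are illustrative assumptions rather than details of the GraphSE2 prototype.

```python
import secrets

MOD = 2 ** 64  # assumed ring size for this sketch


def share(x: int) -> tuple:
    """Split x into two additive shares, one per server."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


def add_shares(x_sh: tuple, y_sh: tuple) -> tuple:
    """Addition is local: each server adds the two shares it holds."""
    return (x_sh[0] + y_sh[0]) % MOD, (x_sh[1] + y_sh[1]) % MOD


def beaver_triple() -> tuple:
    """Shares of random a, b and c = a*b, generated ahead of time."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul_shares(x_sh: tuple, y_sh: tuple, triple: tuple) -> tuple:
    """Multiplication needs one exchange to open d = x - a and e = y - b."""
    a_sh, b_sh, c_sh = triple
    # Each server sends its masked shares; both learn d and e (the only communication).
    d = reconstruct((x_sh[0] - a_sh[0]) % MOD, (x_sh[1] - a_sh[1]) % MOD)
    e = reconstruct((y_sh[0] - b_sh[0]) % MOD, (y_sh[1] - b_sh[1]) % MOD)
    # x*y = c + d*b + e*a + d*e; the public d*e term is added by server 0 only.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1
```

Addition never leaves the local server, while every multiplication consumes one triplet and requires the two masked values to be exchanged, which is consistent with the higher delay reported for multiplication.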

As the sorting algorithm is implemented with a Garbled Circuit, we provide a benchmark of the circuit size and the corresponding evaluation time. The results are listed in Table 5. Note that we do not report the circuit generation time, as the circuit is generated in the startup process. The reported results demonstrate the practicality of our sorting strategy: the local sorting generally involves fewer entities after partitioning ( if the system sorts a result list with users), which takes ms to sort. Additionally, the local sort helps to truncate the result before it is sent for global sorting, which makes the sorting algorithm in the Garbled Circuit more efficient (less than ms for a vector with entities).
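
The two-stage strategy can be sketched in plaintext as follows. The sketch deliberately omits the Garbled Circuit and only shows the data flow: each partition sorts and truncates its own results, and the global stage only sees the truncated lists. The `(user_id, score)` entry layout, the descending order and the function names are assumptions made for illustration.

```python
import heapq
import itertools


def local_sort_and_truncate(partition: list, k: int) -> list:
    """Sort one partition's (user_id, score) entries and keep its top k."""
    return sorted(partition, key=lambda entry: entry[1], reverse=True)[:k]


def global_top_k(partitions: list, k: int) -> list:
    """Merge the truncated, locally sorted lists and return the overall top k.

    No partition can contribute more than k entries to the final top k,
    so truncating locally does not change the result but shrinks the
    input of the (expensive) global sorting stage.
    """
    candidates = [local_sort_and_truncate(p, k) for p in partitions]
    merged = heapq.merge(*candidates, key=lambda entry: entry[1], reverse=True)
    return list(itertools.islice(merged, k))
```

For instance, with three partitions and k = 3, the global stage handles at most nine entries no matter how long the individual result lists are.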

Figure 7. Query delay for multiple-keyword set queries.

We further examine the query delay of set operations under the multiple-keyword setting, using the same a as the s-term and adding more variable terms as x-terms. Figure 7 shows that each additional x-term increases the query delay by ms, which means that a query with x-terms can still be processed within ms.

Figure 8. Execution time for (a) addition and (b) multiplication operations on two vectors with , , entities.

Communication. We measure the inter-cluster communication overhead, because it accounts for the main communication cost in GraphSE2, namely sending the garbled sorting circuit and the input labels to the counter-party. Note that this is much larger than sending the multiplication triplets ( bytes for each) and the encrypted ( bytes for each). Table 5 reports the communication overhead for different circuit sizes. It shows that for an average user with approximately friends, the two parties only need to transmit MB of data to sort them. This overhead is negligible both on our evaluation platform (40Gbps NIC in the Azure intranet) and on other public clouds such as AWS.

Throughput. To evaluate the impact of our system on throughput, we measure the server throughput for different types of operators. For each operator, we compare the throughput of GraphSE2 with that of the baseline. Figure 9 and Table 6 show the results. In all cases, we group our 6 VM instances into 2 clusters of 3 VMs each to fulfil the two-party setting. Each VM uses only one core for the computation, and we simulate parallel client processes that send queries to the server, which ensures workload on the server side. The results show that the throughput penalty comes mainly from sorting: the local sorting decreases the throughput by to . We also observe that the global sorting is the bottleneck of the whole system (see Table 6): it yields a constant query throughput for all operators, which means that it takes longer to obtain the final result. For the operators without sorting, however, the throughput loss is modest (only to ).

Operators term and/diff. or
Baseline
Our system
Table 6. Throughput (queries/sec) comparison of global sorting for queries with different operators.
Figure 9. Throughput of different types of query operators with concurrent clients running under GraphSE2 and the baseline. All operators except term have two keywords. Diff. stands for the difference operator; ls. stands for locally sorted in after applying the operators.

9. Conclusion

This paper presents an encrypted graph database, named GraphSE2. It enables privacy-preserving rich queries in the context of social network services. Our system leverages advanced cryptographic primitives (i.e., , and a mixing protocol with additive sharing and Garbled Circuit) with strong security guarantees for queries on structured social graph data and for queries with computation, respectively. To achieve practical performance, GraphSE2 generates an encrypted index on a distributed graph model to facilitate parallel processing of the proposed graph queries. GraphSE2 is implemented as a prototype system, and our evaluation on the YouTube dataset illustrates its efficiency on social search.

References

  • Apache (2015) Apache. 2015. Hadoop. https://hadoop.apache.org [online]. (2015).
  • Asharov et al. (2013) G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. 2013. More Efficient Oblivious Transfer and Extensions for Faster Secure Computation. In ACM CCS’13.
  • AWS (2018a) AWS. 2018a. AWS Case Study: Airbnb. https://aws.amazon.com/solutions/case-studies/airbnb/ [online]. (2018).
  • AWS (2018b) AWS. 2018b. AWS Case Study: PIXNET. https://aws.amazon.com/cn/solutions/case-studies/pixnet/ [online]. (2018).
  • AWS (2018c) AWS. 2018c. AWS Outposts: Run AWS Infrastructure On-Premises for A Truly Consistent Hybrid Experience. https://aws.amazon.com/outposts/ [online]. (2018).
  • Baldimtsi and Ohrimenko (2015) F. Baldimtsi and O. Ohrimenko. 2015. Sorting and Searching Behind the Curtain. In FC’15.
  • Batcher (1968) K.E. Batcher. 1968. Sorting Networks and their Applications. In ACM SJCC’68.
  • Beaver (1991) D. Beaver. 1991. Efficient Multiparty Protocols using Circuit Randomization. In CRYPTO’91.
  • Bellare et al. (2012) M. Bellare, V.T. Hoang, and P. Rogaway. 2012. Foundations of Garbled Circuits. In ACM CCS’12.
  • Blanton et al. (2013) M. Blanton, A. Steele, and M. Alisagari. 2013. Data-Oblivious Graph Algorithms for Secure Computation and Outsourcing. In ACM ASIACCS’13.
  • Bloom (1970) B.H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Commun. ACM 13, 7 (1970), 422–426.
  • Bost et al. (2017) R. Bost, B. Minaud, and O. Ohrimenko. 2017. Forward and Backward Private Searchable Encryption from Constrained Cryptographic Primitives. In ACM CCS’17.
  • Bost and Fouque (2017) R. Bost and P.-A. Fouque. 2017. Thwarting Leakage Abuse Attacks against Searchable Encryption: A Formal Approach and Applications to Database Padding. Cryptology ePrint Archive, Report 2017/1060. (2017).
  • Breese et al. (1998) J.S. Breese, D. Heckerman, and C. Kadie. 1998. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In UAI’98.
  • Canetti (2000) R. Canetti. 2000. Security and Composition of Multiparty Cryptographic Protocols. Journal of Cryptology 13, 1 (2000), 143–202.
  • Cao et al. (2011) N. Cao, Z. Yang, C. Wang, K. Ren, and W. Lou. 2011. Privacy-Preserving Query over Encrypted Graph-Structured Data in Cloud Computing. In IEEE ICDCS’11.
  • Caro and Iovino (2011) A. De Caro and V. Iovino. 2011. JPBC: Java Pairing Based Cryptography. In IEEE SCC’11. 850–855.
  • Cash et al. (2015) D. Cash, P. Grubbs, J. Perry, and T. Ristenpart. 2015. Leakage-Abuse Attacks Against Searchable Encryption. In ACM CCS’15.
  • Cash et al. (2013) D. Cash, S. Jarecki, C.S. Jutla, H. Krawczyk, M-C. Rosu, and M. Steiner. 2013. Highly-Scalable Searchable Symmetric Encryption with Support for Boolean Queries. In CRYPTO’13.
  • Chang et al. (2016) Z. Chang, L. Zou, and F. Li. 2016. Privacy Preserving Subgraph Matching on Large Graphs in Cloud. In ACM SIGMOD’16.
  • Chase and Kamara (2010) M. Chase and S. Kamara. 2010. Structured Encryption and Controlled Disclosure. In ASIACRYPT’10.
  • Chi et al. (2016) Y. Chi, G. Dai, Y. Wang, G. Sun, G. Li, and H. Yang. 2016. Nxgraph: An Efficient Graph Processing System on A Single Machine. In IEEE ICDE’16.
  • Curtiss et al. (2013) M. Curtiss, I. Becker, T. Bosman, S. Doroshenko, L. Grijincu, T. Jackson, et al. 2013. Unicorn: A System for Searching the Social Graph. Proceedings of the VLDB Endowment 6, 11 (2013), 1150–1161.
  • Curtmola et al. (2011) R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky. 2011. Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions. Journal of Computer Security 19, 5 (2011), 895–934.
  • Demmler et al. (2015) D. Demmler, T. Schneider, and M. Zohner. 2015. ABY-A Framework for Efficient Mixed-Protocol Secure Two-Party Computation. In NDSS’15.
  • Engineering (2018) Instagram Engineering. 2018. What Powers Instagram: Hundreds of Instances, Dozens of Technologies. https://instagram-engineering.com/what-powers-instagram-hundreds-of-instances-dozens-of-technologies-adf2e22da2ad [online]. (2018).
  • Goodrich et al. (2011) M.T. Goodrich, R. Tamassia, and N. Triandopoulos. 2011. Efficient Authenticated Data Structures for Graph Connectivity and Geometric Search Problems. Algorithmica 60, 3 (2011), 505–552.
  • Information is Beautiful (2018) Information is Beautiful. 2018. World’s Biggest Data Breaches. http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/ [online]. (2018).
  • Kamara and Moataz (2017) S. Kamara and T. Moataz. 2017. Boolean Searchable Symmetric Encryption with Worst-Case Sub-Linear Complexity. In EUROCRYPT’17.
  • Kamara et al. (2018) S. Kamara, T. Moataz, and O. Ohrimenko. 2018. Structured Encryption and Leakage Suppression. In CRYPTO’18.
  • Kamara et al. (2011) S. Kamara, P. Mohassel, and M. Raykova. 2011. Outsourcing Multi-Party Computation. Cryptology ePrint Archive, Report 2011/272. (2011).
  • Kellaris et al. (2016) G. Kellaris, G. Kollios, K. Nissim, and A. O’Neill. 2016. Generic Attacks on Secure Outsourced Databases. In ACM CCS’16.
  • Kornaropoulos et al. (2019) E.M. Kornaropoulos, C. Papamanthou, and R. Tamassia. 2019. Data Recovery on Encrypted Databases with K-Nearest Neighbor Query Leakage. In IEEE S&P’19.
  • Lai et al. (2018) S. Lai, S. Patranabis, A. Sakzad, J.K. Liu, D. Mukhopadhyay, R. Steinfeld, et al. 2018. Result Pattern Hiding Searchable Encryption for Conjunctive Queries. In ACM CCS’18.
  • Li et al. (2016) J. Li, L. Zhang, J. K. Liu, H. Qian, and Z. Dong. 2016. Privacy-Preserving Public Auditing Protocol for Low-Performance End Devices in Cloud. IEEE Transactions on Information Forensics and Security 11, 11 (2016), 2572–2583.
  • Liang et al. (2015) K. Liang, J.K. Liu, R. Lu, and D. S. Wong. 2015. Privacy Concerns for Photo Sharing in Online Social Networks. IEEE Internet Computing 19, 2 (2015), 58–63.
  • Lindell and Pinkas (2009) Y. Lindell and B. Pinkas. 2009. A Proof of Security of Yao’s Protocol for Two-party Computation. Journal of Cryptology 22, 2 (2009), 161–188.
  • Liu et al. (2016) J.