Evolution as a Service: A Privacy-Preserving Genetic Algorithm for Combinatorial Optimization

05/27/2022
by   Bowen Zhao, et al.

Evolutionary algorithms (EAs), such as the genetic algorithm (GA), offer an elegant way to handle combinatorial optimization problems (COPs). However, limited by expertise and resources, most users lack the capability to implement EAs to solve COPs. An intuitive and promising solution is to outsource evolutionary operations to a cloud server, but this raises privacy concerns. To this end, this paper proposes a novel computing paradigm, evolution as a service (EaaS), where a cloud server renders evolutionary computation services for users without sacrificing users' privacy. Inspired by the idea of EaaS, this paper designs PEGA, a novel privacy-preserving GA for COPs. Specifically, PEGA enables users to outsource COPs to a cloud server that holds a competitive GA and approximates the optimal solution in a privacy-preserving manner. PEGA features the following characteristics. First, any user without expertise and sufficient resources can solve her COPs. Second, PEGA does not leak the contents of optimization problems, i.e., users' privacy. Third, PEGA has the same capability as the conventional GA to approximate the optimal solution. We implement PEGA in a twin-server architecture and evaluate it on the traveling salesman problem (TSP, a widely known COP). In particular, we utilize encryption to protect users' privacy and carefully design a suite of secure computing protocols to support the evolutionary operators of GA on encrypted data. Privacy analysis demonstrates that PEGA does not disclose the contents of the COP to the cloud server. Experimental evaluation results on four TSP datasets show that PEGA is as effective as the conventional GA in approximating the optimal solution.

I Introduction

Evolutionary algorithms (EAs), such as the genetic algorithm (GA), are powerful tools for tackling combinatorial optimization problems (COPs) in science and engineering. Many problems in these fields can be formulated as COPs, for example in synthetic biology, transport of goods and planning, and production planning [10, 12]. EAs have proven to be a powerful tool for handling COPs due to their global probabilistic search ability, which is based on biological evolution mechanisms such as selection and mutation [20, 13]. Applications in science and engineering therefore have strong requirements for EAs to tackle optimization problems [12].

Limited expertise and resources hinder common users from tackling COPs through EAs effectively. In practice, most users facing COPs lack expertise, such as knowledge of EAs and the programming skills needed to implement them. Also, EAs based on biological evolution require many iterative operations to search for the approximate optimal solution, which consumes abundant computing resources. From the users' perspective, even though they need to solve COPs, they fail to do so effectively due to limited capability and resources.

One promising and elegant solution is for the cloud server to render an evolutionary computing service for users. The cloud server is equipped with sufficient computing and storage resources and can offer convenient and flexible computation services, such as training and inference of machine learning [2, 8, 7, 14], named machine learning as a service (MLaaS). In MLaaS, users outsource training or inference tasks to the cloud server and get the results, while the cloud server performs the computation. Because the computing is provided by the cloud server, MLaaS does not require users to have expertise or sufficient computing resources. Similarly, users can outsource evolutionary computation tasks to the cloud server and get optimization results even though they lack the programming skills to implement EAs and sufficient resources to run them.

Privacy concerns are critical challenges for outsourcing the computation of EAs to the cloud server, just as in MLaaS [2, 8, 7, 14]. Optimization results of COPs are private information of users [15, 3, 1]. For example, optimization results of COPs for synthetic biology, transport of goods and planning, and production planning involve private biological information and plans for goods transportation and production, to name but a few. The cloud server is not generally regarded as a trusted entity in an outsourced computation scenario [16, 8, 7, 14]; examples include iCloud leaking celebrity photos and Amazon Web Services exposing Facebook user records. Obviously, no user or company is willing to reveal biological information or plans for goods transportation and production to others. Moreover, many regulations require the protection of personal data. The General Data Protection Regulation (GDPR) of the EU stipulates that any information relating to an identified or identifiable natural person is private and should be protected. The contents of COPs should also be regarded as private information. Given the contents of the COP, the cloud server holding EAs can obtain the optimization results, which breaches privacy regulations.

To tackle the privacy concerns of outsourcing the computation of EAs, in this paper we define a novel computing paradigm, evolution as a service (EaaS), in which the cloud server renders an evolutionary computing service for users without sacrificing users’ privacy. Broadly speaking, the cloud server encapsulates EAs as a service. Users outsource evolutionary computation tasks to the cloud server and adopt privacy-preserving methods (e.g., encryption) to protect privacy. The cloud server performs evolutionary operations and returns optimization results to users. In EaaS, the cloud server cannot learn the contents of users’ COPs or the optimization results. Also, users are not required to have expertise in EAs or sufficient resources. In short, EaaS enables users to solve COPs conveniently and flexibly without sacrificing privacy, which relieves the dilemma between evolutionary computation and privacy concerns.

The key idea of EaaS is that users outsource the encrypted contents of the optimization problem to the cloud server, and the cloud server renders an evolutionary computation service for users over encrypted data. Technically, the implementation of EaaS faces several challenges.

First, the cloud server must perform evolutionary operations without sacrificing users’ privacy. An EA involves basic evolutionary operations including population initialization, evaluation, selection, crossover, and mutation [15, 17]. Population initialization requires randomly generating several hundred or several thousand individuals, each of which represents a possible solution. Arguably, when the cloud server has no knowledge of the contents of the COP, generating possible solutions is not a trivial task. Furthermore, if the cloud server has difficulty initializing the population, it is also challenging to perform evaluation, selection, crossover, and mutation, as these operations rely on the initialized population.

Second, the cloud server must evaluate the fitness value of each individual in the population without learning the possible solution. In EAs, the fitness value determines the quality of solutions and is a crucial metric for selecting dominant individuals. To protect users’ privacy, the cloud server should be prevented from obtaining users’ possible solutions [15]. Unfortunately, if the cloud server has no knowledge of the possible solutions, it cannot directly evaluate the fitness values of individuals in the population.

Third, the cloud server must select dominant individuals without knowing individuals’ fitness values. EAs are inspired by the process of natural selection, so their critical operation is to select dominant individuals based on individuals’ fitness values. Technically, this requires the cloud server to compare individuals’ fitness values without knowing them. Intuitively, secure comparison protocols [22, 19, 18] seem to provide a potential solution. However, the work [22] requires two parties holding private data to perform a comparison operation. If the user participates in the comparison operation, the user’s communication overhead increases significantly, as an EA needs several hundred or several thousand individuals to generate the approximate optimal solution. The protocols [19, 18] only generate an encrypted comparison result. Given encrypted comparison results, the cloud server cannot select dominant individuals. In short, selecting dominant individuals poses challenges in both communication and operation.

To tackle the above challenges, this paper focuses on the implementation of EaaS through GA and carefully designs a privacy-preserving GA, called PEGA (the name comes from Privacy-prEserving Genetic Algorithm). Specifically, we exploit the threshold Paillier cryptosystem (THPC) [6] and a one-way mapping function to protect the user’s privacy. The homomorphism of THPC enables evaluating individuals’ fitness values over encrypted data. Also, we propose a suite of secure computation protocols to support privacy-preserving evolutionary operations of GA, such as selection. Our contributions are threefold.

  • We propose a novel computing paradigm, EaaS, a privacy-preserving evolutionary computation paradigm that outsources evolutionary computation to a cloud server. EaaS does not require users to have expertise of EAs and programming skills for EAs implementation but can output the approximate optimal solution for users. Furthermore, EaaS does not leak users’ privacy to the cloud server.

  • We carefully design PEGA, a privacy-preserving genetic algorithm based on the computing paradigm EaaS. In particular, a secure division protocol (SecDiv) and a secure comparison protocol (SecCmp) are presented to support privacy-preserving fitness proportionate selection. SecDiv and SecCmp enable the cloud server to compute the probability of each individual being selected and to select potentially dominant individuals without disclosing possible solutions, respectively.

  • We take four TSP (a widely known COP) datasets (i.e., gr48, kroA100, eil101, and kroB200) to evaluate the effectiveness and efficiency of PEGA. Results of experiments and analyses on the four TSP datasets demonstrate that PEGA is as effective as the conventional GA [5] in approximating the optimal solution.

The rest of this paper is organized as follows. In Section II, the related work is briefly described. In Section III, we formulate EaaS and PEGA. The design of PEGA is elaborated in Section IV. In Section V, PEGA for TSP is given. Results of privacy analysis and experimental evaluation are shown in Section VI. Finally, we conclude the paper in Section VII.

II Related Work

In this section, we briefly review privacy-preserving evolutionary algorithms (EAs). In contrast to privacy-preserving neural network (NN) inference [2, 14], privacy-preserving evolutionary algorithms have received little attention. One possible reason is that EAs require the server to perform random operations, such as population initialization and mutation, while the operations of NNs are generally deterministic. Also, privacy-preserving NN inference does not need the server to obtain intermediate results. In contrast, privacy-preserving EAs require the server to learn intermediate results in order to perform subsequent operations. For example, the server needs to learn the plaintext comparison result to select dominant individuals.

Sakuma et al. [15] proposed a privacy-preserving GA to solve TSP by means of secure multi-party computation and the Paillier cryptosystem. The work [15] considers a scenario where multiple servers hold traveling costs while a user wants to choose the server that provides the optimal route; the servers and the user are unwilling to disclose their own private data. Thus, the work [15] requires interaction between the user and the servers. Han et al. [3] presented a privacy-preserving GA for rule discovery, where two parties holding datasets jointly perform a privacy-preserving GA to discover a better set of rules. The scheme [3] also needs the two parties to interact to generate an optimal solution. Funke et al. [1] designed a privacy-preserving multi-objective EA based on Yao’s secure two-party protocol [22]. The authors of [1] claim that their solution improves security and efficiency, but it still requires two parties to interact. Jiang et al. [4] put forward a cloud-based privacy-preserving GA by means of somewhat homomorphic encryption, where a user outsources the operations of GA to the cloud server. However, the work [4] fails to support privacy-preserving selection operations, and no practical problem is used to evaluate its effectiveness and efficiency. Zhan et al. [23] proposed a rank-based cryptographic function (RCF) to construct privacy-preserving EAs, including particle swarm optimization and differential evolution. However, the authors do not give the construction of RCF, and their scheme suffers from some privacy concerns. Although a designer in [23] fails to obtain the fitness function, he holds the possible solutions. Thus, as long as the designer learns which solution is dominant, he can obtain the approximate optimal solution, which discloses a user’s privacy.

Among existing privacy-preserving EAs, there is no effective solution that provides a privacy-preserving evolution service without requiring the user to interact. Motivated by this, we formulate EaaS and give its implementation through GA.

III Formulation

In this section, we give formal definitions of evolution as a service (EaaS) and the privacy-preserving genetic algorithm (PEGA), where PEGA is a concrete implementation of EaaS.

III-A Formulation for EaaS

Input: A user has a combinatorial optimization problem , and a cloud server holds competitive EAs.
Output: The user obtains .
Procedure:
Encrypt problem (@user):

  • Initialize and its optimization function .

  • .

  • Send to the cloud server.

Perform evolution (@cloud server):

  • , where , and denotes the population size.

  • Return to the user.

Obtain solution (@user):

  • .

Fig. 1: Structure of EaaS.
Definition (EaaS).

EaaS consists of users and a cloud server, where users have a requirement of solving a COP (denoted by ) through evolutionary algorithms (EAs), whilst the cloud server holds competitive EAs and sufficient resources to perform EAs. The cloud server encapsulates EAs as a service and renders a convenient and flexible evolutionary computing service for users. To avoid exposing private information to the cloud server, users encrypt the content of the COP denoted by and outsource it to the cloud server. Taking as input and an EA, the cloud server performs evolutionary operations (e.g., evaluation, selection, crossover, mutation) denoted by and returns an encrypted optimal solution to the user, where indicates an encrypted solution, and is the objective function of . Formally, EaaS can be formulated as the following pattern

(1)

where is the population size. Fig. 1 shows the structure of EaaS.

From Definition III-A and Fig. 1, we see that EaaS does not require the user to have the expertise and resources to solve a COP through EAs. To approximate the optimal solution of the COP, the user outsources operations to the cloud server. The cloud server is given only encrypted data, so it fails to learn the contents of the COP. In other words, EaaS enables the cloud server to perform evolutionary operations over encrypted data and to generate encrypted optimization solutions, which protects the user’s privacy. The key to EaaS is supporting evolutionary operations on encrypted data.

III-B Formulation for PEGA

To validate the computing paradigm of EaaS, we take GA, a widely known EA, as an example to instantiate EaaS; the resulting scheme is called PEGA. GA usually comprises five polynomial-time operations: population initialization, evaluation, selection, crossover, and mutation, where the latter four are regarded as evolutionary operators [5]. A formal definition of PEGA is as follows.

Definition (PEGA).

A privacy-preserving genetic algorithm (PEGA) takes as input an encrypted COP and its optimization function , and outputs an encrypted optimization solution , . Formally, PEGA can be formulated as the following pattern

(2)

where indicates the repetition, and denotes the iteration times. , , , , and indicate operations of population initialization, evaluation, selection, crossover, and mutation, respectively. Note that , , , , and take as input encrypted data and output encrypted data.

Input: A user has a combinatorial optimization problem , and a cloud server holds a competitive GA.
Output: The user obtains .
Procedure:
Encrypt content (@user):

  • Initialize and its optimization function .

  • .

  • Send to the cloud server.

Perform evolution (@cloud server):

  • , where , and is the iteration times.

  • Return to the user.

Obtain solution (@user):

  • .

Fig. 2: Structure of PEGA.

From Definition III-A and Definition III-B, we see that PEGA concretizes as , , , and . In the next section, we elaborate on the design of PEGA, especially on how to execute evolutionary operations on encrypted data.
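For orientation, the operator pattern of Eq. (2), one initialization followed by evaluation, selection, crossover, and mutation repeated for a fixed number of iterations, can be sketched on plaintext data as below. This is a minimal Python sketch of a conventional GA on permutations, not PEGA itself: the function names, the inverse-cost selection weights, the order crossover, and the swap mutation are illustrative assumptions, and PEGA would replace each operator with a counterpart that takes and returns encrypted data.

```python
import random

def init_population(n_pop, n_genes, rng):
    """I: random permutations serve as chromosomes."""
    return [rng.sample(range(n_genes), n_genes) for _ in range(n_pop)]

def evaluate(pop, cost):
    """E: cost of every chromosome (lower is better in this sketch)."""
    return [cost(ch) for ch in pop]

def select(pop, fitness, rng):
    """S: fitness proportionate selection on inverted costs (assumed weighting)."""
    weights = [1.0 / (1.0 + f) for f in fitness]
    return [list(ch) for ch in rng.choices(pop, weights=weights, k=len(pop))]

def crossover(a, b, rng):
    """C: simple order crossover that keeps the child a valid permutation."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]

def mutate(ch, rng, rate=0.1):
    """M: swap two genes with a small probability."""
    if rng.random() < rate:
        i, j = rng.sample(range(len(ch)), 2)
        ch[i], ch[j] = ch[j], ch[i]
    return ch

def genetic_algorithm(cost, n_genes, n_pop=30, t=50, seed=0):
    rng = random.Random(seed)
    pop = init_population(n_pop, n_genes, rng)
    best = min(pop, key=cost)
    for _ in range(t):                        # (E, S, C, M) repeated t times
        fitness = evaluate(pop, cost)
        parents = select(pop, fitness, rng)
        pop = [mutate(crossover(parents[k], parents[(k + 1) % n_pop], rng), rng)
               for k in range(n_pop)]
        best = min(pop + [best], key=cost)    # keep the best chromosome seen so far
    return best
```

Calling `genetic_algorithm(cost, n_genes)` with any cost function over permutations returns the best chromosome seen during the run; the fixed seed makes repeated runs reproducible.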

IV PEGA Design

To be self-contained, we first describe the threshold Paillier cryptosystem (THPC) used to encrypt the COP, and then give the system model and threat model of PEGA. Next, the details of the PEGA design are illustrated.

IV-A Primitive

The detailed algorithms of THPC with (2, 2)-threshold decryption are listed as follows.

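Key Generation (KeyGen): Let $p$ and $q$ be two big prime numbers with $\kappa$ bits, where $(p-1)/2$ and $(q-1)/2$ are also prime numbers. The public key is denoted by $pk = (g, N)$, where $N = pq$ and $g = N + 1$. The private key is denoted by $sk = (\lambda, \mu)$, where $\lambda = \mathrm{lcm}(p-1, q-1)$ and $\mu = \lambda^{-1} \bmod N$. In particular, the private key $\lambda$ is split into two partially private keys $\lambda_1$ and $\lambda_2$ such that $\lambda_1 + \lambda_2 \equiv 0 \pmod{\lambda}$ and $\lambda_1 + \lambda_2 \equiv 1 \pmod{N}$. As $\mu = \lambda^{-1} \bmod N$, $\lambda\mu \equiv 0 \pmod{\lambda}$ and $\lambda\mu \equiv 1 \pmod{N}$. Let $\lambda_1$ be a random integer in the interval $(0, \lambda N)$ and $\lambda_2 = \lambda\mu - \lambda_1 \bmod \lambda N$.

Encryption (Enc): Take as input a message $m \in \mathbb{Z}_N$ and $pk$, and output $[\![m]\!] \leftarrow \mathrm{Enc}(pk, m) = (1 + m \cdot N) \cdot r^{N} \bmod N^2$, where $[\![m]\!] = [\![m \bmod N]\!]$ and $r$ is a random number in $\mathbb{Z}_N^*$.

Decryption (Dec): Take as input a ciphertext $[\![m]\!]$ and $sk$, and output $m \leftarrow \mathrm{Dec}(sk, [\![m]\!]) = L([\![m]\!]^{\lambda} \bmod N^2) \cdot \mu \bmod N$, where $L(x) = \frac{x-1}{N}$.

Partial Decryption (PDec): Take as input a ciphertext $[\![m]\!]$ and a partially private key $\lambda_i$ ($i = 1$ or 2), and output $M_i \leftarrow \mathrm{PDec}(\lambda_i, [\![m]\!]) = [\![m]\!]^{\lambda_i} \bmod N^2$.

Threshold Decryption (TDec): Take as input partially decrypted ciphertexts $M_1$ and $M_2$, and output $m \leftarrow \mathrm{TDec}(M_1, M_2) = L(M_1 \cdot M_2 \bmod N^2)$.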

The homomorphic operations on ciphertexts supported by THPC are described as follows.

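  • Additive homomorphism: $\mathrm{Dec}(sk, [\![m_1]\!] \cdot [\![m_2]\!]) = m_1 + m_2$;

  • Scalar-multiplication homomorphism: $\mathrm{Dec}(sk, [\![m]\!]^{c}) = c \cdot m$ for $c \in \mathbb{Z}_N$.

On the basis of additive homomorphism and scalar-multiplication homomorphism, THPC enables subtraction over encrypted data. Specifically, $\mathrm{Dec}(sk, [\![m_1]\!] \cdot [\![m_2]\!]^{-1}) = m_1 - m_2$.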
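To make the five THPC algorithms and the homomorphic properties concrete, the following is a minimal Python sketch of a (2, 2)-threshold Paillier scheme with g = N + 1. It is a toy under stated assumptions rather than the authors' implementation: the function names, the splitting of the key through the product of the private key and its inverse modulo N, and the small primes suggested in the usage note are illustrative choices, and the three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
import math
import random

def keygen(p, q, rng):
    """Toy key generation; returns N, the full key, and two key shares."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # lam^{-1} mod N
    delta = lam * mu                # delta = 0 mod lam and delta = 1 mod N
    lam1 = rng.randrange(1, lam * n)
    lam2 = (delta - lam1) % (lam * n)
    return n, (lam, mu), (lam1, lam2)

def enc(n, m, rng):
    """Ciphertext (1 + m*N) * r^N mod N^2 for a random r coprime to N."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            return (1 + (m % n) * n) * pow(r, n, n2) % n2

def L(x, n):
    """L(x) = (x - 1) / N on values congruent to 1 modulo N."""
    return (x - 1) // n

def dec(n, sk, c):
    """Decryption with the full private key (lam, mu)."""
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n

def pdec(n, lam_i, c):
    """Partial decryption with one key share."""
    return pow(c, lam_i, n * n)

def tdec(n, m1, m2):
    """Combine the two partial decryptions into the plaintext."""
    return L(m1 * m2 % (n * n), n)
```

For example, with `n, sk, (l1, l2) = keygen(2**31 - 1, 2**61 - 1, random.Random(1))` (two Mersenne primes, far too small for real use), `tdec(n, pdec(n, l1, c), pdec(n, l2, c))` recovers the plaintext of a ciphertext `c`, and `dec(n, sk, c1 * c2 % (n * n))` returns the sum of the two plaintexts.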

Note that no single partially private key can decrypt any ciphertext. Also, as operations over ciphertexts encrypted by Enc require a $\bmod N^2$ operation, for brevity we omit $\bmod N^2$ in the rest of this paper. Just like the Paillier cryptosystem [11], THPC only works on integers. To effectively handle floating-point numbers, a given floating-point number is encoded as , where is a constant; for example, is used to encode a 64-bit floating-point number. In this paper, if a message to be encrypted is a floating-point number, it is encrypted as , i.e., . To simplify notation, we use to denote in the rest of the paper.

IV-B System Model and Threat Model

In our system, we consider a user who outsources an encrypted COP to twin-cloud servers (i.e., and ). The twin-cloud servers jointly provide a privacy-preserving GA service to solve the encrypted COP by performing secure two-party computations. The user obtains an encrypted optimization solution from . As depicted in Fig. 3, PEGA comprises a user and twin cloud servers.

Fig. 3: PEGA system model.
  • User: The user has a COP to be solved and outsources the problem to cloud servers with powerful computation and sufficient resources. To protect privacy, the user initializes a public/private pair () of THPC, and then encrypts the problem with the public key as . Also, in order to enable cloud servers performing evolutionary operators over encrypted data, the user splits the private key into two partially private keys and and sends them into and , respectively.

  • Cloud server 1 (): takes charge of storing sent from the user. and jointly perform secure two-party computation protocols over encrypted data to support the operations of GA. Note that can directly execute certain homomorphic operations (e.g., additive homomorphism and scalar-multiplication homomorphism) over encrypted data supported by THPC.

  • Cloud server 2 (): is responsible for assisting to perform the operations of GA in a privacy-preserving manner.

In the system of PEGA, the computation is outsourced to cloud servers. According to the outsourced computation situation [16], there is one type of adversary that attempts to obtain the user’s private information, i.e., the contents of the COP. The adversary involves either or . Inspired by prior work [14, 9], we assume both and are curious-but-honest (also called semi-honest), i.e., they follow the required computation protocols and perform the required computations correctly, but may try to obtain the user’s private information with the help of the encrypted TSP and intermediate computation results. Note that and do not share their partially private keys and parameters in a non-colluding twin-server architecture [14, 9]. The assumption of non-colluding twin-cloud servers is reasonable. If one cloud server shared its private parameters or intermediate computation results with the other, it would be handing evidence of data leakage to the other one. Arguably, for its own commercial interests, a cloud server is unwilling to provide such evidence to another.

IV-C Overview of PEGA

In this section, we give a high-level description of our proposed PEGA. The goal of PEGA is to perform the operations of GA over encrypted data and output an encrypted optimization solution. As shown in Fig. 4, PEGA consists of five polynomial-time operations, i.e., Gen_Initial_Pop, Evaluation, Selection, Crossover, and Mutation, which are briefly described as follows.

Fig. 4: Overview of PEGA.
  • Gen_Initial_Pop: Given an encrypted COP , Gen_Initial_Pop randomly generates a population comprising individuals. Each individual is denoted by a chromosome, and each chromosome consists of genes. Formally, Gen_Initial_Pop takes as input , and outputs encrypted chromosomes denoted by , and .

  • Evaluation: Given encrypted chromosomes, Evaluation firstly computes the fitness value of each encrypted chromosome. Specifically, Evaluation utilizes the homomorphism of THPC to obtain encrypted fitness value of each encrypted chromosome according to the optimization function . Formally, Evaluation takes as input and , and outputs . Also, Evaluation outputs the optimal chromosome holding minimum fitness value. To this end, we carefully design a secure comparison protocol (SecCmp) that can compare and ( and ). Formally, given and , SecCmp outputs , where . Thus, given , Evaluation can output via SecCmp, where , and .

  • Selection: For encrypted chromosomes, Selection explores the well-studied fitness proportionate selection operator [24] to select dominant individuals. Specifically, Selection firstly computes an encrypted probability for each individual. After that, Selection performs operations of fitness proportionate selection over encrypted probabilities. The critical operations of Selection include addition, division, and comparison on encrypted data. To enable division on encrypted data, we propose a secure division protocol (SecDiv). Formally, given and , SecDiv outputs .

  • Crossover: Choosing two chromosomes as parents, Crossover performs a crossover operator to enable two parents crossover generating children. Roughly speaking, Crossover exchanges some genes of two parents to generate new chromosomes.

  • Mutation: For each encrypted chromosome, Mutation exchanges some genes of the chromosome to generate a new chromosome.

When the number of iterations is used as the termination condition, all operations except Gen_Initial_Pop, i.e., Evaluation, Selection, Crossover, and Mutation, are repeated. PEGA takes encrypted data as input, generates encrypted intermediate results, and outputs an encrypted optimization solution to protect privacy.

IV-D Privacy-preserving Protocols for PEGA

In this section, we first elaborate on the secure division protocol (SecDiv) and the secure comparison protocol (SecCmp) that are used to construct PEGA. Next, through SecDiv and SecCmp, we design a secure probability algorithm (SecPro) and a secure fitness proportionate selection algorithm (SecFPS) to support Selection on encrypted data. Also, SecCmp enables Evaluation over encrypted data.

IV-D1 Secure Division Protocol (SecDiv)

Given and , where , SecDiv outputs . The key idea of SecDiv is to convert division to scalar multiplication. Specifically, for any integers , we have , where and is an integer. Formally, . SecDiv consists of three steps.

  • calls PDec to partially decrypt to get . Next, sends to .

  • calls PDec to partially decrypt to get and then calls TDec to obtain with and . After that, computes . Next, encodes as . Finally, returns to . Clearly, is an integer.

  • computes .
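The three steps of SecDiv can be sketched with a toy threshold Paillier instance. This is a minimal Python illustration under stated assumptions, not the authors' code: the constant `ELL`, the rounding used to encode the reciprocal as an integer, the tiny hard-coded primes, and every function name are hypothetical, and Python 3.8 or later is assumed for the modular inverse via `pow`. As in the description above, the second server recovers the divisor through threshold decryption and the first server finishes with one scalar multiplication.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1              # toy primes, far too small for real use
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)
ELL = 20                                 # assumed fixed-point scale 2^ELL

def split_key(rng):
    """Split the private key into two shares whose sum is LAM * MU mod LAM * N."""
    lam1 = rng.randrange(1, LAM * N)
    return lam1, (LAM * MU - lam1) % (LAM * N)

def enc(m, rng):
    """Paillier encryption with g = N + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2

def pdec(lam_i, c):
    """Partial decryption with one key share."""
    return pow(c, lam_i, N2)

def tdec(m1, m2):
    """Combine two partial decryptions into the plaintext."""
    return (m1 * m2 % N2 - 1) // N

def sec_div(cx, cy, lam1, lam2):
    """Return a ciphertext of x * round(2^ELL / y), about (x / y) * 2^ELL."""
    # Step 1 (first server): partially decrypt the divisor and pass it on.
    y1 = pdec(lam1, cy)
    # Step 2 (second server): finish decryption, encode 1/y as an integer.
    y = tdec(y1, pdec(lam2, cy))
    den = round(2**ELL / y)
    # Step 3 (first server): one scalar multiplication on the dividend.
    return pow(cx, den, N2)
```

Decrypting the output and dividing by `2**ELL` gives an approximation of the quotient; for a dividend of 3 and a divisor of 4 the scaled result corresponds to 0.75.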

IV-D2 Secure Comparison Protocol (SecCmp)

Given and , SecCmp outputs 0 when , 1 otherwise (). Formally, . SecCmp consists of three steps.

  • generates a random number through tossing a coin. computes

    (3)

    where are two random integers, , , , and . is a security parameter, e.g., . Next, calls PDec to get , and sends to .

  • calls PDec to get and then calls TDec to obtain with and . If , sets , otherwise, . Finally, sends to .

  • obtains the comparison result by computing . When , we have when , otherwise, . When , we have when , otherwise, .

Clearly, given , it is easy to implement Evaluation by calling SecCmp. Specifically, Evaluation performs comparison operations on encrypted data to obtain the optimal chromosome holding minimum fitness values.
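The blinding idea behind SecCmp can be illustrated on plaintext integers. The sketch below is an assumption-laden stand-in, not Eq. (3): the masking formula, the ranges of `r1` and `r2`, the bit length `SIGMA`, the modulus, and all function names are hypothetical choices in the style of common two-server comparison protocols. It shows only the arithmetic on the value the second server would see after threshold decryption; in an actual run the first server would form that value homomorphically on ciphertexts.

```python
import random

N = (2**31 - 1) * (2**61 - 1)    # stand-in for the Paillier modulus (toy size)
SIGMA = 16                       # assumed bit length of the blinding factor r1
HALF = N // 2

def s1_mask(x, y, rng):
    """First server: blind the difference; d is what the second server decrypts."""
    pi = rng.randrange(2)                          # coin toss
    r1 = rng.randrange(1, 2**SIGMA)
    r2 = rng.randrange(HALF - r1 + 1, HALF + 1)    # r1 + r2 > N/2 and r2 <= N/2
    if pi == 0:
        d = (r1 * (x - y + 1) + r2) % N
    else:
        d = (r1 * (y - x) + r2) % N
    return pi, d

def s2_decide(d):
    """Second server: sees only the blinded value d and reports one bit."""
    return 0 if d > HALF else 1

def sec_cmp(x, y, rng):
    """Return 0 when x >= y and 1 when x < y (inputs assumed far below N / 2^SIGMA)."""
    pi, d = s1_mask(x, y, rng)
    u = s2_decide(d)
    return pi ^ u                                  # the coin removes the blinding
```

Because the coin `pi` decides which difference is blinded, the bit `u` alone does not tell the second server which input is larger; only the first server, who knows `pi`, can turn `u` into the comparison result.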

For brevity, we utilize to denote . According to the fitness proportionate selection operator [24], it requires to compute each individual’s probability. Thus, the individual’s probability is denoted by

(4)

However, to protect users’ privacy, the cloud server only obtains . Given , it is not trivial for the cloud server to compute . Fortunately, the proposed SecDiv offers a potential solution. Specifically, through the proposed SecDiv, we design a secure probability algorithm (SecPro) to compute each individual’s probability on encrypted data. Given encrypted fitness values , SecPro outputs encrypted probabilities . Formally, . As shown in Algorithm 1, SecPro consists of three steps.

  • firstly computes by the additive homomorphism of THPC, so we have . After that, calls PDec to partially decrypt to get . Next, sends to .

  • calls PDec to partially decrypt to get and then calls TDec to obtain with and . After that, computes and . Next, encodes as . Finally, returns to . Clearly, is an integer.

  • computes for . It can be seen that .

Input: has .
Output: obtains .
Step 1. computes
  • ;

  • ;

  • and sends to .

Step 2. computes
  • and ;

  • , , and ;

  • and sends to .

Step 3. computes
  • for .

Algorithm 1 SecPro.

From Algorithm 1, we see that . If , we have . In other words, Algorithm 1 does not change the numerical relationship among probabilities of individuals.
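The claim that scaling does not change the order of the probabilities can be checked with plain integers. This is a small Python sketch under assumptions: `ELL`, the rounding of the reciprocal, and the function name are hypothetical, and the arithmetic is shown on plaintexts, whereas in SecPro the last step would be a scalar multiplication on each encrypted fitness value.

```python
ELL = 20  # assumed fixed-point scale 2^ELL

def scaled_probabilities(fitness):
    """Integer stand-ins for f_i / sum(f): each equals f_i * round(2^ELL / sum)."""
    total = sum(fitness)                 # the sum is recovered by threshold decryption
    den = round(2**ELL / total)          # integer encoding of 1 / sum
    return [f * den for f in fitness]    # one scalar multiplication per individual
```

Since every fitness value is multiplied by the same positive integer, a larger fitness value always yields a larger scaled probability, and dividing by `2**ELL` recovers the usual probabilities up to rounding error.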

Also, to enable fitness proportionate selection on encrypted data, we construct a secure fitness proportionate selection algorithm (SecFPS) via SecCmp. Given encrypted probabilities , SecFPS outputs individuals. The key idea of SecFPS is to perform comparison operations over encrypted data. Formally, , where represents a population consisting of individuals. As shown in Algorithm 2, SecFPS consists of three steps.

  • generates encrypted random numbers and sends them to . Note that as , the random number multiplies by to reach the same order of magnitude as .

  • computes for . Thus, we have . In other words, produces a ciphertext set of orderly sequence .

  • and jointly perform a binary search over encrypted data to find the individual , and ( through calling SecCmp. Repeat step (3) until generating individuals.

Input: has .
Output: obtains .
computes for , where is a random number in and then sends to ;
for to do
    computes ;
end for
for to do
    and jointly perform ;
    adds to ;
end for
FindIndividual begin
    ;
    if returns then
        return FindIndividual;
    else
        if then
            return ;
        else
            if returns then
                return ;
            else
                return FindIndividual;
            end if
        end if
    end if
end

Algorithm 2 SecFPS.
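The structure of Algorithm 2, running sums followed by a recursive binary search that relies only on pairwise comparisons, can be sketched on plaintext integers as follows. This is a hedged Python illustration, not the encrypted protocol: the function names, the `less_than` callback (a placeholder for a call to SecCmp), and the way the random point is drawn are assumptions made for the example.

```python
import random
from itertools import accumulate

def find_individual(cum, r, low, high, less_than):
    """Binary search for the first index i with cum[i] >= r."""
    mid = (low + high) // 2
    if less_than(cum[mid], r):                     # cum[mid] < r: search the right half
        return find_individual(cum, r, mid + 1, high, less_than)
    if mid == low or less_than(cum[mid - 1], r):
        return mid                                 # cum[mid-1] < r <= cum[mid]
    return find_individual(cum, r, low, mid - 1, less_than)

def fitness_proportionate_selection(probs, rng, less_than=lambda a, b: a < b):
    """Pick len(probs) indices, each with probability proportional to probs[i]."""
    cum = list(accumulate(probs))                  # ordered running sums
    picks = []
    for _ in range(len(probs)):
        r = rng.randrange(1, cum[-1] + 1)          # random point on the wheel
        picks.append(find_individual(cum, r, 0, len(cum) - 1, less_than))
    return picks
```

Each selection needs a number of comparisons that grows only logarithmically with the population size, which matters when every comparison is an interactive protocol between the two servers.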

Note that the proposed SecCmp can be used to construct secure selection operators, such as secure tournament selection, secure elitism selection. The critical operation for tournament selection and elitism selection is to compare fitness values of individuals [24], which is supported by SecCmp.

V PEGA for TSP

This section takes TSP, a widely known COP, as an example to demonstrate the idea of EaaS through the proposed PEGA.

V-A Problem Encryption

Given a list of cities and the traveling cost between each possible city pair, TSP is to find the shortest possible route that visits each city exactly once and returns to the origin city. Formally, as shown in Fig. 5, the TSP can be denoted by a strictly upper triangular matrix, where the entry in the matrix represents the traveling cost of a city pair. For example, "6" is the traveling cost between WDC and CHI.

Fig. 5: An example of encrypted TSP.
Definition (Encrypted TSP).

An encrypted TSP means the city list and traveling cost between each possible city pair of a plaintext TSP are mapped into random numbers through cryptographical functions; it requires finding the shortest possible route that visits each city once and returns to the origin city over encrypted city list and traveling costs. Formally, let be a TSP matrix, the encrypted TSP is denoted by

(5)

where represents a family of cryptographical functions.

Clearly, generating an encrypted TSP requires encrypting the list of cities and the traveling cost between possible city pairs. On the one hand, as described in the design of PEGA, we exploit THPC to encrypt the TSP. Specifically, each entry of is encrypted through THPC. On the other hand, a one-way hash function can serve as the cryptographical function to map the city list into random numbers. However, a hash function always generates the same output when the same input is given. If the cloud server knows all cities, it can easily obtain the city list of by executing efficient hashing operations. Also, the output of is usually more than 256 bits, which incurs a high communication and storage cost. As depicted in Fig. 5, we observe that given a TSP, its representation matrix is not unique. Inspired by this, we assume all cities are denoted by , and their mapping is denoted by , where is the set of natural numbers. Thus, in this paper, we define a one-way function that randomly maps one city into one unique natural number. For example, in Fig. 5, "WDC" is mapped into "4" and "1" in encrypted TSP 1 and encrypted TSP 2, respectively. Formally, for any item (e.g., "1") of , it can represent any city. Thus, when each city in the city list of a TSP is randomly mapped into a natural number, the cloud server fails to obtain the city list.

Fig. 5 gives the storage structure of the encrypted TSP. Specifically, the first row and the first column of denote the city index. represents the traveling cost between city and city . "0" indicates that the two cities are unreachable, whereas any other entry indicates that the two cities are reachable. Assuming the number of cities is , the objective function can be denoted by

(6)

where is a possible route. Finally, a user outsources to .
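As a minimal illustration of the random city-index mapping and of the route cost that the objective function evaluates, the following Python sketch works on plaintext costs only. The helper names (`encode_cities`, `build_matrix`, `route_cost`), the cities "X" and "Y", and every cost other than the WDC-CHI cost of 6 are hypothetical; `random.Random` is used only for reproducibility and is not a cryptographically secure source of randomness.

```python
import random


def encode_cities(cities, rng):
    """Randomly map each city name to a unique index in 1..n (a random permutation)."""
    indices = list(range(1, len(cities) + 1))
    rng.shuffle(indices)
    return dict(zip(cities, indices))


def build_matrix(costs, mapping):
    """Store each pairwise cost under (i, j) with i < j, i.e. strictly upper triangular."""
    matrix = {}
    for (a, b), cost in costs.items():
        i, j = sorted((mapping[a], mapping[b]))
        matrix[(i, j)] = cost
    return matrix


def route_cost(matrix, route):
    """Sum the costs along the route, including the leg back to the origin."""
    total = 0
    for pos, i in enumerate(route):
        j = route[(pos + 1) % len(route)]
        total += matrix[(min(i, j), max(i, j))]
    return total


# Hypothetical instance: only the WDC-CHI cost (6) comes from the text.
cities = ["WDC", "CHI", "X", "Y"]
costs = {("WDC", "CHI"): 6, ("WDC", "X"): 9, ("WDC", "Y"): 4,
         ("CHI", "X"): 3, ("CHI", "Y"): 7, ("X", "Y"): 5}
tour = ["WDC", "CHI", "X", "Y"]

# Two independent index mappings of the same problem.
mapping_1 = encode_cities(cities, random.Random(1))
mapping_2 = encode_cities(cities, random.Random(2))
cost_1 = route_cost(build_matrix(costs, mapping_1), [mapping_1[c] for c in tour])
cost_2 = route_cost(build_matrix(costs, mapping_2), [mapping_2[c] for c in tour])
```

Both mappings give the same cost for the same tour, which illustrates why relabeling the cities changes the matrix representation but not the underlying optimization problem.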

V-B Problem Solving via PEGA

In this section, we elaborate on how to solve an encrypted TSP through PEGA.

V-B1 Initialization

Given , initializes encrypted chromosomes denoted by , where is denoted by an array of index of , such as "---". As the one-way function is adopted, the index of does not disclose the city. Thus, given , is able to generate encrypted chromosomes to initialize a population.

V-B2 Evaluation

Given and , can compute . Specifically, without loss of generality, let be denoted by "---", computes an encrypted fitness value as

(7)

By the additive homomorphism of THPC, we see that . Thus, can calculate and generate . Next, and jointly compute and find the encrypted chromosome holding the minimum fitness value by calling SecCmp. Specifically, without loss of generality, assume , i.e., , outputs denoted by "---" and sets it as the optimal chromosome.
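The additive homomorphism used in Evaluation can be sketched with a toy Paillier cryptosystem. This is an illustration under strong assumptions: it uses the plain (non-threshold) Paillier scheme rather than THPC, tiny hard-coded primes that offer no security, and hypothetical function names and route costs. It only shows that multiplying ciphertexts yields an encryption of the summed route cost.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt 0 <= m < n as (1 + n)^m * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Recover m from c using L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def encrypted_route_cost(n, enc_matrix, route):
    """Homomorphically sum the encrypted costs along a closed route."""
    total = encrypt(n, 0)
    for pos, i in enumerate(route):
        j = route[(pos + 1) % len(route)]
        total = add_encrypted(n, total, enc_matrix[(min(i, j), max(i, j))])
    return total


n, sk = keygen()
# Hypothetical plaintext costs indexed by city-index pairs (i, j) with i < j.
plain = {(1, 2): 6, (1, 3): 9, (1, 4): 4, (2, 3): 3, (2, 4): 7, (3, 4): 5}
enc_matrix = {pair: encrypt(n, cost) for pair, cost in plain.items()}
enc_cost = encrypted_route_cost(n, enc_matrix, [1, 2, 3, 4])
recovered = decrypt(n, sk, enc_cost)  # 6 + 3 + 5 + 4
```

In this sketch a single party holds the whole private key, whereas in a threshold scheme the decryption capability is split, so the decryption step here is only a correctness check of the homomorphic sum.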

V-B3 Selection

can choose among different selection operators, such as fitness proportionate selection, tournament selection, and elitism selection. Here, we consider the case where utilizes the fitness proportionate selection operator as the selection operator. Specifically, firstly cooperates with to obtain by calling SecPro, where is the probability of the individual (). After that, teams with to generate a new population by calling SecFPS.

V-B4 Crossover

Given encrypted chromosomes , can adopt the conventional crossover operator (such as edge recombination crossover operator, ERX [5]) to generate children. Assume chooses denoted by "---" and denoted by "---" as two parent chromosomes, it is easy for to generate two children by calling ERX [5].

V-B5 Mutation

Given encrypted chromosomes, is able to perform mutation operations on . Specifically, can change the element of .
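Because a chromosome is only an array of city indices, mutation needs no decryption. The sketch below assumes a swap mutation, which is one common choice for permutation-encoded chromosomes; the text above does not fix a particular mutation operator, and the function name and parameters are hypothetical.

```python
import random


def swap_mutation(chromosome, rate, rng):
    """With probability `rate`, swap two distinct positions; the result stays a permutation."""
    child = list(chromosome)
    if rng.random() < rate:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


parent = [3, 1, 4, 2, 5]  # hypothetical route over five city indices
child = swap_mutation(parent, rate=1.0, rng=random.Random(7))
```

A swap keeps every city index exactly once in the route, so the mutated chromosome remains a valid tour.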

VI Results of Privacy Analysis and Experiment

VI-A Privacy Analysis

THPC [6] has been proved to be semantically secure. Thus, the homomorphic operations performed by do not disclose the user’s private data. In this paper, we carefully design SecDiv and SecCmp based on a non-colluding twin-server architecture to select dominant individuals in a privacy-preserving manner. In this section, we demonstrate that SecDiv and SecCmp securely perform division and comparison over encrypted data.

Theorem.

Given and , where , SecDiv does not disclose .

Proof.

Given and (), SecDiv computes to produce . When is large enough, must be an integer. Without loss of generality, let , we have . Thus, SecDiv essentially performs one scalar multiplication operation. As THPC is semantically secure, does not leak . Therefore, SecDiv does not disclose .

Lemma.

SecPro can produce each individual’s encrypted fitness value and encrypted probability without leaking the individual’s city list and route cost.

Proof.

According to Algorithm 1, we see , where is the sum of all individuals’ route costs, and is the individual route cost. As Theorem VI-A holds, SecPro can securely compute encrypted fitness values. Also, we have . Thus, we say that SecPro can securely compute encrypted probabilities when Theorem VI-A holds.

Theorem.

Given and , SecCmp does not disclose and .

Proof.

In the view of , he only learns encrypted data, so SecCmp does not disclose and to as THPC is semantically secure. In the view of , he can learn ( or (). However, as and are unknown to , given either or , fails to get , , , and . Thus, SecCmp does not leak and to . In short, SecCmp does not disclose and . Furthermore, even though knows , he cannot get as fails to know and .

Lemma.

SecFPS can select dominant individuals over encrypted data without leaking the individual’s probability.

Proof.

From Algorithm 2, we see although can obtain or (), he fails to know . Thus, we say that fails to learn . Also, can get or , but he fails to obtain as Theorem VI-A holds. In short, SecFPS does not disclose the individual’s probability.

VI-B Experimental Evaluation

In this section, we evaluate the effectiveness of PEGA by comparing it with two conventional GA variants [5] and report the performance of PEGA in terms of computational complexity and communication cost. Specifically, the first GA variant adopts fitness proportionate selection as the selection operator (named GA1), and the second one adopts k-tournament selection as the selection operator (named GA2). Note that both GA1 and GA2 utilize the ERX operator as the crossover operator due to its remarkable performance for TSP [5]. Through our proposed secure computing protocols, PEGA can support fitness proportionate selection and k-tournament selection. Roughly speaking, SecFPS, built on SecCmp and SecDiv, enables fitness proportionate selection. Also, SecCmp naturally supports k-tournament selection.
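To see why a comparison primitive alone suffices for tournament selection, consider the plaintext sketch below. The `is_smaller` callback is a hypothetical stand-in for a secure comparison such as SecCmp; here it compares plain integers, and the population and route costs are illustrative values only.

```python
import random


def k_tournament_select(population, costs, k, rng, is_smaller=lambda a, b: a < b):
    """Return the lowest-cost individual among k distinct, randomly drawn contenders."""
    contenders = rng.sample(range(len(population)), k)
    best = contenders[0]
    for idx in contenders[1:]:
        # Only pairwise comparisons are needed, never the cost values themselves.
        if is_smaller(costs[idx], costs[best]):
            best = idx
    return population[best]


population = [[1, 2, 3, 4], [1, 3, 2, 4], [1, 2, 4, 3]]
costs = [18, 23, 27]  # hypothetical route costs
winner = k_tournament_select(population, costs, k=3, rng=random.Random(0))
```

The selection logic touches the costs only through `is_smaller`, so replacing that callback with a protocol that compares ciphertexts would leave the rest of the routine unchanged.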

The experiments are executed on four widely used TSP datasets (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/tsp/), i.e., gr48, kroA100, eil101, and kroB200, where gr48 and kroA100 are small scale, while eil101 and kroB200 are medium scale [21]. We implement PEGA and the conventional GA variants in Java. A personal computer running 64-bit Windows 10 with an Intel Core i7-4790 CPU (3.6 GHz) and 16 GB memory acts as the user. A server running 64-bit Windows 10 with an Intel Core i7-10700 CPU (2.9 GHz) and 32 GB memory simulates the two cloud servers. Since GA is a stochastic approach, 30 independent runs are executed for each algorithm to generate an average. Experimental settings are listed in Table I, where denotes the length of in bits. The crossover rate and the mutation rate use the settings in [21, 24].

Parameters Values
The population size
The crossover rate
The mutation rate
, ,
-tournament
The max number of generations 10000
TABLE I: Experimental Parameter Settings

VI-C Effectiveness Evaluation

Given four TSPs, i.e., gr48, kroA100, eil101, and kroB200, we first compare the performance of GA1 and GA2. Experimental results are shown in Fig. 6. The x-axis is the number of generations and the y-axis is the path length of the route. The red solid line and blue dashed line represent GA1 and GA2, respectively. As depicted in Fig. 6, GA2 is remarkably superior to GA1 in terms of convergence. Specifically, in contrast to GA1, GA2 always converges to a smaller path length on the four given TSPs. In other words, GA2 has a stronger ability to approximate the optimal solution than GA1. Thus, we argue that k-tournament selection outperforms fitness proportionate selection for TSPs. One possible explanation is that k-tournament selection always selects a dominant individual into the next generation, whilst poor individuals may be selected by fitness proportionate selection.

(a) Crossover rate (0.08), mutation rate (0.1)
(b) Crossover rate (0.1), mutation rate (0.15)
(c) Crossover rate (0.1), mutation rate (0.15)
(d) Crossover rate (0.1), mutation rate (0.15)
Fig. 6: Comparison of the convergence between GA1 and GA2 (The average result of 30 independent runs).

Although GA2 outperforms GA1 as shown in Fig. 6, to demonstrate the effectiveness of the proposed PEGA, we construct PEGA1 and PEGA2, where PEGA1 and PEGA2 adopt the same evolutionary operators as GA1 and GA2, respectively. For four TSPs, i.e., gr48, kroA100, eil101, and kroB200, the comparison results between PEGA and GA are presented in Table II. For the statistical tests, the Wilcoxon rank-sum test at significance level 0.05 is adopted to examine whether the compared results are significantly different. The mean and standard deviation are also reported. The best results are highlighted in bold based on the p-value of the Wilcoxon rank-sum test. In particular, to make a fair comparison, PEGA1 and PEGA2 use the same initial population as GA1 and GA2, respectively.
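For readers unfamiliar with the test, the sketch below computes a two-sided Wilcoxon rank-sum p-value with the normal approximation. It is an illustrative implementation, not the one used in the experiments: tied values receive average ranks, no tie correction is applied to the variance, and the two samples are hypothetical route costs rather than experimental data.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank sum of x under the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                            # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2              # mean of w under the null hypothesis
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation of w
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))           # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical final route costs of two algorithms over five runs each.
runs_a = [6207, 6150, 6310, 6198, 6255]
runs_b = [6314, 6120, 6290, 6230, 6275]
z_value, p_value = rank_sum_test(runs_a, runs_b)
significant = p_value < 0.05
```

A p-value above the chosen significance level means the test does not reject the hypothesis that both samples come from the same distribution, which is how the p-value rows of Table II are read.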

As depicted in Table II, in terms of mean, PEGA1 outperforms GA1 on gr48, kroA100, and kroB200. Meanwhile, PEGA2 outperforms GA2 on gr48 and kroA100. In terms of std, PEGA1 has less std on gr48, eil101, and kroB200, which are not exactly the same problems as those on which PEGA1 is superior in mean. Thus, a smaller mean does not imply a smaller std. From Table II, the p-value on all four TSPs is larger than 0.05, so we conclude that there is no significant difference between PEGA1 and GA1. Similarly, PEGA2 and GA2 do not differ significantly. One possible explanation is that PEGA and GA perform the same evolution operators. Furthermore, our proposed secure computing protocols do not introduce noise into computational results, which guarantees calculation accuracy. The only difference between PEGA and GA is that PEGA performs evolution operators on encrypted data to protect privacy, whereas GA performs evolution operators on cleartext data directly. The mean and std of PEGA and GA are not identical because PEGA and GA use different random numbers when performing evolution operators.

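| Problem | Scale | Statistic | PEGA1 | GA1 | PEGA2 | GA2 |
|---|---|---|---|---|---|---|
| gr48 | small | mean | 6.2071e+03 | 6.3138e+03 | 5.2949e+03 | 5.3033e+03 |
| gr48 | small | std | 515.74 | 530.22 | 119.78 | 98.32 |
| gr48 | small | p-value | 0.3183 | | 1.0 | |
| kroA100 | small | mean | 6.8017e+04 | 6.8961e+04 | 2.2819e+04 | 2.3175e+04 |
| kroA100 | small | std | 2.4004e+03 | 1.9935e+03 | 619.31 | 755.14 |
| kroA100 | small | p-value | 0.3615 | | 0.1150 | |
| eil101 | medium | mean | 1.4739e+03 | 1.4431e+03 | 686.4667 | 683.8667 |
| eil101 | medium | std | 45.85 | 82.73 | 15.68 | 9.99 |
| eil101 | medium | p-value | 0.5475 | | 0.5746 | |
| kroB200 | medium | mean | 1.7723e+05 | 1.7895e+05 | 3.3878e+04 | 3.3775e+04 |
| kroB200 | medium | std | 5.8435e+03 | 3.6344e+03 | 761.45 | 615.0 |
| kroB200 | medium | p-value | 0.4679 | | 0.8419 | |

Note. Each p-value compares one pair: PEGA1 with GA1 (listed under PEGA1) or PEGA2 with GA2 (listed under PEGA2).

TABLE II: Comparison Results between PEGA and GA (The average result of 30 independent runs)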

To visualize the above conclusion, we plot the convergence curves of PEGA1, GA1, PEGA2, and GA2 on the four TSPs in Fig. 7. The x-axis is the number of generations and the y-axis is the path length of the route. The red solid line and blue solid line represent GA1 and GA2, respectively. The cyan dashed line and black dashed line represent PEGA1 and PEGA2, respectively. Fig. 7 visually shows that PEGA1 and GA1 have the same convergence trend, and PEGA2 and GA2 have the same convergence trend. Based on Table II and Fig. 7, we argue that PEGA is as effective as GA for TSPs in approximating the optimal solution.

(a) Crossover rate (0.08), mutation rate (0.1)
(b) Crossover rate (0.1), mutation rate (0.15)
(c) Crossover rate (0.1), mutation rate (0.15)
(d) Crossover rate (0.1), mutation rate (0.15)
Fig. 7: Comparison of the convergence between PEGA and GA (The average result of 30 independent runs).

To further demonstrate the effectiveness of PEGA, we make PEGA1 and PEGA2 use the same random numbers as GA1 and GA2, respectively, to perform evolutionary operations. The experimental results are given in Fig. 8. The magenta circles and cyan circles represent GA1 and GA2, respectively. The blue solid line and black solid line represent PEGA1 and PEGA2, respectively. From Fig. 8, we see that PEGA1 and GA1 have the same convergence, and PEGA2 and GA2 have the same convergence, when the same random numbers are adopted. This is because our proposed secure computing protocols support exact computations on encrypted data. In fact, Fig. 8 illustrates that PEGA is as effective as GA. In other words, given encrypted TSPs, PEGA can approximate the optimal solution as effectively as GA.

(a) Crossover rate (0.08), mutation rate (0.1)
(b) Crossover rate (0.1), mutation rate (0.15)
(c) Crossover rate (0.1), mutation rate (0.15)
(d) Crossover rate (0.1), mutation rate (0.15)
Fig. 8: Comparison of the convergence between PEGA and GA (The average result of 30 independent runs).

VI-D Efficiency Evaluation

In this section, we evaluate the efficiency of PEGA in terms of communication cost and computation cost. Table III compares the communication cost of PEGA and GA. From Table III, we see that PEGA has a larger communication cost than GA. In PEGA, a user submits an encrypted TSP matrix, i.e., . In contrast, the user in GA submits the TSP matrix directly. As the ciphertext of THPC is significantly larger than its plaintext, the communication cost of PEGA is larger than that of GA. Also, when a small is set, it can significantly reduce the communication cost of PEGA. One possible explanation is that the smaller , the smaller the ciphertext size of THPC is. As shown in Table III, even when a large TSP (e.g., kroB200) and a large are set, the communication cost of PEGA is less than 6 MB.

| Scheme | gr48 | kroA100 | eil101 | kroB200 |
|---|---|---|---|---|
| GA | 7 KB | 33 KB | 25 KB | 132 KB |
| PEGA (128-bit key) | 173 KB | 0.74 MB | 0.76 MB | 2.98 MB |
| PEGA (256-bit key) | 373 KB | 1.46 MB | 1.50 MB | 5.90 MB |

Note. "128-bit key" and "256-bit key" give the length in bits of the public key of the Paillier cryptosystem used in PEGA.

TABLE III: Comparison of Communication Cost between PEGA and GA (The average result of 30 independent runs)

Assume there are individuals and cities (). A GA consists of Gen_Initial_Pop, Evaluation, Selection, Crossover, and Mutation. Gen_Initial_Pop initializes dimension individuals, so its computational complexity is . Evaluation computes each individual’s route cost, so its computational complexity is also . Selection generally selects new individuals via a proportionate selection operator. The computational complexity of the conventional fitness proportionate selection operator is . In this paper, PEGA adopts the idea of binary search to select new individuals, and its computational complexity is . Thus, PEGA improves the performance of Selection compared with the conventional GA. In this paper, we adopt ERX to perform Crossover. The computational complexity of ERX is . The computational complexity of Mutation is for a population with individuals. Thus, we see that the computational complexity of GA is , while that of PEGA is , where is the number of generations.
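The binary-search idea behind the faster Selection step can be sketched on plaintext probabilities as follows. This is only an illustration of the search structure: in PEGA the comparisons would be carried out over encrypted values, whereas here `bisect` compares plain floats, and the function name and probabilities are hypothetical.

```python
import bisect
import itertools
import random


def fps_binary_search(probabilities, n_select, rng):
    """Fitness proportionate selection via binary search on cumulative probabilities."""
    cumulative = list(itertools.accumulate(probabilities))
    total = cumulative[-1]
    picks = []
    for _ in range(n_select):
        r = rng.random() * total
        # First index whose cumulative probability exceeds r: O(log n) comparisons.
        idx = bisect.bisect_right(cumulative, r)
        picks.append(min(idx, len(cumulative) - 1))  # guard against rounding at the top end
    return picks


probabilities = [0.1, 0.4, 0.2, 0.3]  # hypothetical selection probabilities
selected = fps_binary_search(probabilities, n_select=4, rng=random.Random(0))
```

Each draw needs a logarithmic number of comparisons instead of a linear scan over the population, which matters most when every comparison is an interactive protocol on ciphertexts.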

In contrast to GA, PEGA needs to encrypt the TSP matrix . Fig. 9 shows the runtime of encryption and of searching for the optimal route in PEGA, where the searching runtime is that of running one generation. From Fig. 9, we see that the encryption runtime of PEGA grows with the number of cities. PEGA requires encrypting the reachable routes between pairs of cities, and the more cities there are, the more such routes there are. Also, PEGA takes around 3 s to produce a potential solution when k-tournament selection is adopted. Furthermore, for the four TSPs, PEGA takes almost the same runtime to produce a possible solution. For PEGA, Selection performs most of the computations on encrypted data, and computations on encrypted data are more time-consuming than computations on plaintext data. As the computational complexity of Selection of PEGA is , when is the same, PEGA takes almost the same runtime to produce a possible solution for the four different TSPs. From Fig. 9, we also see that fitness proportionate selection consumes more runtime to generate a possible solution. One possible explanation is that the fitness proportionate selection operator requires more operations on encrypted data than the k-tournament selection operator.

Fig. 9: Runtime of PEGA (The average result of 30 independent runs).

VII Conclusion

In this paper, we proposed the computing paradigm of evolution as a service (EaaS) and designed a privacy-preserving genetic algorithm for COPs based on EaaS, called PEGA. To show the effectiveness and efficiency of PEGA, we used the widely known TSP to evaluate it. In PEGA, a user encrypts her TSP matrix to protect privacy and outsources the evolutionary computations to cloud servers. The cloud servers perform evolutionary computations over encrypted data and produce a solution as effective as that of the conventional GA. To support operations on encrypted TSPs, this paper presented a secure division protocol (SecDiv) and a secure comparison protocol (SecCmp) built on the twin-server architecture. Experimental evaluations on four TSPs (i.e., gr48, kroA100, eil101, and kroB200) show that there is no significant difference between PEGA and the conventional GA. Also, given encrypted TSPs, PEGA with the k-tournament selection operator can produce one potential solution in around 3 s. For future work, we will extend the idea of EaaS to other algorithms, such as particle swarm optimization (PSO) and ant colony optimization (ACO).

References

  • [1] D. Funke and F. Kerschbaum (2010) Privacy-preserving multi-objective evolutionary algorithms. In Proceedings of International Conference on Parallel Problem Solving from Nature, pp. 41–50.
  • [2] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing (2016) Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of International Conference on Machine Learning, pp. 201–210.
  • [3] S. Han and W. K. Ng (2007) Privacy-preserving genetic algorithms for rule discovery. In Proceedings of International Conference on Data Warehousing and Knowledge Discovery, pp. 407–417.
  • [4] L. Jiang and Z. Fu (2020) Privacy-preserving genetic algorithm outsourcing in cloud computing. Journal of Cybersecurity 2 (1), pp. 49.
  • [5] P. Larranaga, C. M. H. Kuijpers, R. H. Murga, I. Inza, and S. Dizdarevic (1999) Genetic algorithms for the travelling salesman problem: a review of representations and operators. Artificial intelligence review 13 (2), pp. 129–170.
  • [6] A. Lysyanskaya and C. Peikert (2001) Adaptive security in the threshold setting: from cryptosystems to signature schemes. In Proceedings of International Conference on the Theory and Application of Cryptology and Information Security, pp. 331–350.
  • [7] P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R. A. Popa (2020) Delphi: a cryptographic inference service for neural networks. In Proceedings of USENIX Security Symposium, pp. 2505–2522.
  • [8] P. Mohassel and P. Rindal (2018) ABY3: a mixed protocol framework for machine learning. In Proceedings of ACM SIGSAC conference on computer and communications security, pp. 35–52.
  • [9] P. Mohassel and Y. Zhang (2017) Secureml: a system for scalable privacy-preserving machine learning. In Proceedings of IEEE symposium on security and privacy, pp. 19–38.
  • [10] G. Naseri and M. A. Koffas (2020) Application of combinatorial optimization strategies in synthetic biology. Nature Communications 11 (1), pp. 1–14.
  • [11] P. Paillier (1999) Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pp. 223–238.
  • [12] V. T. Paschos (2017) Applications of combinatorial optimization (2nd edition). John Wiley & Sons.
  • [13] A. Radhakrishnan and G. Jeyakumar (2021) Evolutionary algorithm for solving combinatorial optimization—a review. Innovations in Computer Science and Engineering, pp. 539–545.
  • [14] D. Rathee, M. Rathee, N. Kumar, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma (2020) CrypTFlow2: practical 2-party secure inference. In Proceedings of ACM Conference on Computer and Communications Security, pp. 325–342.
  • [15] J. Sakuma and S. Kobayashi (2007) A genetic algorithm for privacy preserving combinatorial optimization. In Proceedings of ACM Conference on Genetic and Evolutionary Computation, pp. 1372–1379.
  • [16] Z. Shan, K. Ren, M. Blanton, and C. Wang (2018) Practical secure computation outsourcing: a survey. ACM Computing Surveys 51 (2), pp. 1–40.
  • [17] Y. Sun, B. Xue, M. Zhang, G. G. Yen, and J. Lv (2020) Automatically designing cnn architectures using the genetic algorithm for image classification. IEEE transactions on cybernetics 50 (9), pp. 3840–3854.
  • [18] T. Veugen, F. Blom, S. J. de Hoogh, and Z. Erkin (2015) Secure comparison protocols in the semi-honest model. IEEE Journal of Selected Topics in Signal Processing 9 (7), pp. 1217–1228.
  • [19] T. Veugen (2014) Encrypted integer division and secure comparison. International Journal of Applied Cryptography 3 (2), pp. 166–180.
  • [20] R. Wang and Z. Zhang (2021) Set theory-based operator design in evolutionary algorithms for solving knapsack problems. IEEE Transactions on Evolutionary Computation 25 (6), pp. 1133–1147.
  • [21] F. Wei, W. Chen, X. Hu, and J. Zhang (2019) An empirical study on evolutionary algorithms for traveling salesman problem. In Proceedings of International Conference on Information Science and Technology, pp. 273–280.
  • [22] A. C. Yao (1982) Protocols for secure computations. In Proceedings of the Annual Symposium on Foundations of Computer Science, pp. 160–164.
  • [23] Z. Zhan, S. Wu, and J. Zhang (2021) A new evolutionary computation framework for privacy-preserving optimization. In Proceedings of International Conference on Advanced Computational Intelligence, pp. 220–226.
  • [24] J. Zhong, X. Hu, M. Gu, and J. Zhang (2005) Comparison of performance between different selection strategies on simple genetic algorithms. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, Vol. 2, pp. 1115–1121.