Content-Based Multi-Source Encrypted Image Retrieval in Clouds with Privacy Preservation

09/22/2018 ∙ by Meng Shen, et al. ∙ UNSW Canberra ∙ IEEE ∙ Beijing Institute of Technology ∙ NetEase, Inc

Content-based image retrieval (CBIR) is one of the fundamental image retrieval primitives. Its applications can be found in various areas, such as art collections and medical diagnoses. With an increasing prevalence of cloud computing paradigm, image owners desire to outsource their images to cloud servers. In order to deal with the risk of privacy leakage of images, images are typically encrypted before they are outsourced to the cloud, which makes CBIR an extremely challenging task. Existing studies focus on the scenario with only a single image owner, leaving the problem of CBIR with multiple image sources (i.e., owners) unaddressed. In this paper, we propose a secure CBIR scheme that supports Multiple Image owners with Privacy Protection (MIPP). We encrypt image features with a secure multi-party computation technique, which allows image owners to encrypt image features with their own keys. This enables efficient image retrieval over images gathered from multiple sources, while guaranteeing that image privacy of an individual image owner will not be leaked to other image owners. We also propose a new method for similarity measurement of images that can avoid revealing image similarity information to the cloud. Theoretical analysis and experimental results demonstrate that MIPP achieves retrieval accuracy and efficiency simultaneously, while preserving image privacy.

1 Introduction

Recent years have witnessed the prosperity of image-sharing services and applications (e.g., Instagram), which has resulted in an increasing demand for image retrieval. In the early years, text-based image retrieval systems, which rely on manually tagging image properties, could fulfill the requirements of image retrieval. With the growing number of Internet users, however, hundreds of millions of images appear on the Internet per second, and traditional text-based image retrieval is gradually becoming impractical because labeling consumes prohibitive manpower and financial resources. Content-based image retrieval (CBIR) cbir1 ; cbir2 ; cbir3 has been proposed for real-world applications; it uses image features extracted automatically from images, such as colors color1 ; color2 , textures texture1 ; texture2 , and shapes shape1 ; shape2 .

In general, higher-resolution images consume more storage. For instance, a photo taken by a cell phone today may be about 2MB, and one taken by a professional camera may reach 10MB or more. With the increasing prevalence of cloud computing and storage cloud1 ; cloud2 , migrating services to the cloud has rapidly become a trend for mass data storage and management. By outsourcing images to the cloud, service providers can make their services easily accessible to geographically distributed users by only requiring them to pay for the computation and storage resources they actually use.

Outsourcing images directly to cloud servers, however, increases the risk of privacy leakage when images contain sensitive information, such as patient’s medical information or personal location information. For instance, a compromised cloud vendor could enable access to the outsourced images by unauthorized users. In order to protect images against privacy leakage threats, images are usually encrypted before being outsourced to the cloud. Since encryption operations disrupt the image content, it becomes a challenging task to perform CBIR over encrypted images. Therefore, it is highly desirable to devise a privacy-preserving CBIR system for cloud-based encrypted image sets.

Many schemes have been proposed in the field of secure CBIR ZhangY ; ZhangL ; XiaZH ; YuanJ ; YuanXL ; XiaZH1 ; LiW ; FerreiraB ; ZhangXP ; ChengH ; ZhangCY , which can be roughly classified into two categories. In the first category, image owners extract features from plain images, and then outsource both the encrypted images and the encrypted image features to the cloud. In the second category, image owners outsource only the encrypted images to the cloud, which is responsible for extracting features from the encrypted images and for conducting retrieval operations.

Existing studies share a common limitation: they consider only the single-source case (i.e., a single image owner). In real-world applications, however, image retrieval is more likely to involve multiple image sources. For instance, consider a cloud-based e-health application, which takes an encrypted ultrasonic medical image of an undiagnosed patient as input and searches for similar confirmed cases in a collection of encrypted medical images. Suppose the images are collected from multiple hospitals (i.e., sources), each of which is unwilling, and not permitted, to share its plain medical images with the others. The existing schemes can be easily extended to the multi-source scenario by performing retrieval over the encrypted images of different owners one by one. Although simple and straightforward, this extension introduces multiple rounds of communication between users and individual image owners, and is therefore inefficient in terms of retrieval time and communication overhead.

There are several challenges in designing a secure and efficient CBIR scheme with multiple image owners. First, we should ensure the privacy of the images and image features of different image owners. Second, in secure image retrieval schemes, the authorized query user should communicate his secret image encryption key with the image owner in order to generate a secret query. When there are multiple image owners, each image owner uses his own secret image encryption key to encrypt images and image features, so the authorized query user has to communicate his secret image encryption key with every image owner to generate a secret query, which increases the communication overhead. It is desirable to address this problem in a secure image retrieval scheme with multiple image owners. Finally, when the cloud executes image retrieval, it may obtain the similarity relation information of the images in the retrieval result. This privacy issue should also be addressed.

In this paper, we propose MIPP, a novel content-based multi-source image retrieval scheme with privacy protection. MIPP operates in the same way as existing schemes in the first category, which outsource encrypted images along with their encrypted image features to the cloud. In order to address the challenges of supporting multiple image owners, we first encrypt images with a key stream and encrypt the corresponding image features by the secure multi-party computation method. We then propose a novel method to measure image similarity, which helps, to a certain extent, to avoid revealing image similarity information to the cloud.

The main contributions in this paper are highlighted as follows:

  • We design MIPP, which, to the best of our knowledge, is the first scheme in the first category that enables content-based multi-source image retrieval with privacy protection. In MIPP, multiple image owners are allowed to encrypt images and image features with their own unique secret image encryption keys. This enables efficient image retrieval over images gathered from multiple sources, while guaranteeing that the image privacy of an individual image owner will not be leaked to other image owners. Thus, MIPP can meet the practical requirements of real-world applications.

  • We present a new approach to measure the similarity of images, which can avoid the leakage of image similarity information in retrieval results. Extensive experimental results show that the resulting retrieval outcome is comparable to that with the typical Euclidean distance criterion.

The rest of this paper is organized as follows. We summarize the related work in Section 2 and introduce the preliminaries in Section 3. In Section 4, we present the system model, threat model, and design goals of our scheme. We detail the design of MIPP in Section 5 and present the security analysis in Section 6. We evaluate the performance of the proposed scheme in Section 7 and conclude this paper in Section 8.

2 Related work

In this section, we will present a brief overview of existing research schemes in the field of secure image retrieval.

Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result which, when decrypted, matches the result of the operations performed on the plaintext. Zhang et al. ZhangY ; ZhangL leveraged this property of homomorphic encryption in secure image retrieval. However, homomorphic encryption has high computational complexity, which makes retrieval too time-consuming. Xia et al. XiaZH proposed a secure CBIR scheme with the Bag-of-Words model and Earth Mover's Distance. The retrieval index was constructed by locality-sensitive hashing. In this scheme, the user and the image owner engage in two rounds of two-way communication, which results in high communication overhead. Yuan et al. YuanJ proposed a scheme named SEISA with access control and secure k-means outsourcing, which supports dynamic updates of images. Yuan et al. YuanXL proposed a scheme that can explore user relationships while preserving image privacy. The secure index and encrypted image features were constructed by an entity called SF rather than the image owner. This scheme supports dynamic updates of images without affecting the current social structure. The scheme proposed by Xia et al. XiaZH1 was able to deter the illegal distribution of images while preserving image privacy. Li et al. LiW proposed a privacy-preserving retrieval scheme for outsourced media, which encrypted image features by applying a one-way hash and encrypting part of the hash values. This scheme created trade-offs among privacy preservation, retrieval quality, and complexity by adjusting the number of encrypted bits in the hash value.

In the second category, Ferreira et al. FerreiraB proposed a scheme named IES-CBIR that can extract image features from encrypted images. The texture and color features were encrypted separately: the color feature was encrypted by scrambling pixels in the HSV color model, and the texture feature was encrypted by shuffling rows and columns in images. This scheme enabled dynamic updates of images by using the Bag-of-Visual-Words model. Zhang et al. ZhangXP proposed a histogram-based retrieval scheme for encrypted JPEG images with machine learning. They encrypted images by permuting DCT coefficients, and the server retrieved the histogram at each frequency position from the encrypted images. Cheng et al. ChengH proposed a Markov-process-based retrieval scheme for encrypted JPEG images. The AC coefficients were modeled as a Markov process, and the server could extract features from the transition probability matrices of the AC coefficients of encrypted images. Zhang et al. ZhangCY proposed an encrypted medical image retrieval algorithm based on the DWT-DCT frequency domain. In this algorithm, features were extracted from encrypted images.

As described in Section 1, these schemes may lead to heavy communication overhead and privacy leakage (e.g., of image features and image similarity) when simply extended to CBIR with multiple image owners.

3 Preliminary

In this section, we will introduce the preliminaries, including image features used in this paper and the secure multi-party computation.

3.1 Image feature

The MPEG-7 standard ManjunathBS ; MejiaLavalle is a multimedia content description interface that contains a set of descriptors. We extract the Edge Histogram Descriptor (EHD) from images for our secure CBIR scheme. The EHD is a non-homogeneous texture descriptor in MPEG-7 that captures the spatial distribution of edges and works well in CBIR.
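To make the idea of an edge histogram concrete, the following is a minimal sketch of an EHD-style extractor. It is an illustration under stated assumptions, not the extractor used in this paper: the five 2x2 filters follow the commonly described MPEG-7 edge types, while the 2x2-pixel block size, the threshold of 11, and the function name `edge_histogram` are assumptions. The normalization and quantization of bins that a full MPEG-7 implementation performs are omitted, so the function returns raw counts.

```python
import math

SQRT2 = math.sqrt(2.0)
# 2x2 filter coefficients for the five edge types, in the order
# vertical, horizontal, 45-degree, 135-degree, non-directional.
# (Assumed from the usual description of the MPEG-7 EHD.)
FILTERS = (
    (1.0, -1.0, 1.0, -1.0),
    (1.0, 1.0, -1.0, -1.0),
    (SQRT2, 0.0, 0.0, -SQRT2),
    (0.0, SQRT2, -SQRT2, 0.0),
    (2.0, -2.0, -2.0, 2.0),
)


def edge_histogram(gray, threshold=11.0):
    """Return 80 raw bin counts (4x4 sub-images x 5 edge types).

    gray is a list of rows of grey levels. Blocks are 2x2 pixels here,
    so each filter coefficient applies to a single pixel.
    """
    height, width = len(gray), len(gray[0])
    hist = [0] * 80
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            block = (gray[y][x], gray[y][x + 1],
                     gray[y + 1][x], gray[y + 1][x + 1])
            strengths = [abs(sum(c * v for c, v in zip(f, block)))
                         for f in FILTERS]
            edge_type = max(range(5), key=strengths.__getitem__)
            if strengths[edge_type] < threshold:
                continue  # no significant edge in this block
            # Index of the sub-image (4x4 grid) that contains the block.
            sub_image = min(4 * y // height, 3) * 4 + min(4 * x // width, 3)
            hist[sub_image * 5 + edge_type] += 1
    return hist
```

For an 8x8 image with a sharp vertical boundary between columns 2 and 3, only the vertical-edge bins of the second column of sub-images are incremented.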

3.2 Secure multi-party computation

Assume that there are multiple parties, each of which owns a secret number. Each party wants to obtain the sum of the numbers of all parties without revealing its own number to the others. Thus, each participant needs to encrypt his number before making it public. Secure multi-party computation can operate on encrypted numbers, which meets the above requirement. With secure multi-party computation, we can obtain the sum or product of numbers in the same manner as when calculating on plain numbers. Secure multi-party computation has been applied in many real-world applications, such as secure voting and secure electronic auctions.

Jung et al. JungT proposed a collusion-tolerant, privacy-preserving sum calculation scheme that does not need secure channels. This scheme can effectively calculate the sum of encrypted numbers, and can be briefly described as follows.

Step 1: Select two large prime numbers p and q with the same length, where q divides p − 1.

Step 2: Define the q-order cyclic multiplicative group G_1 with a generator g_1 defined in Equation (1), where h is a random number in Z, and define the q-order multiplicative group G_2, whose generator g_2 is defined in Equation (2).

(1)
(2)

Step 3: Each participant randomly chooses a number r in Z and calculates a public number from it. She then exchanges this public number with two other participants. After one round of exchanges, she can calculate a secret number R as shown in Equation (3) and the ciphertext as shown in Equation (4).

(3)
(4)

Step 4: Each participant shares her ciphertext with the other participants and calculates the product of all ciphertexts according to Equation (5).

(5)

Step 5: The sum of all numbers can be obtained by calculating Equation (6).

(6)
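The following worked derivation illustrates why multiplying the ciphertexts can reveal only the sum. The concrete forms of the generator, the mask R_i, and the ciphertext C_i used below are assumptions chosen to be consistent with the steps above and with the ciphertext form in Section 5.3.3; they are not taken from the equations of this section. Participants are assumed to be arranged in a ring of n members, with x_i denoting the secret number of participant i.

```latex
% Assumed forms (indices taken modulo n):
g_1 = h^{(p^2 - p)/q} \bmod p^2, \qquad
R_i = \left( g_1^{r_{i+1}} / g_1^{r_{i-1}} \right)^{r_i} \bmod p^2, \qquad
C_i = (1 + x_i p)\, R_i \bmod p^2 .

% The masks cancel around the ring:
\prod_{i=1}^{n} R_i
  = g_1^{\sum_i r_i r_{i+1} - \sum_i r_{i-1} r_i}
  = g_1^{0} = 1 .

% Every product of two or more terms x_i p vanishes modulo p^2:
\prod_{i=1}^{n} C_i \equiv \prod_{i=1}^{n} (1 + x_i p)
  \equiv 1 + p \sum_{i=1}^{n} x_i \pmod{p^2} .

% Hence, provided \sum_i x_i < p:
\sum_{i=1}^{n} x_i = \frac{\left( \prod_i C_i \bmod p^2 \right) - 1}{p} .
```

Under these assumptions, a single ciphertext C_i is masked by R_i, and only the product over all participants removes the masks.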

4 Problem Formulation

In this section, we will introduce the system model, threat model, and design goals of our scheme.

4.1 System model

There are four types of entities in our secure multi-source CBIR system, including multiple image owners, authorized query users, the cloud, and a key management center (KMC), as illustrated in Fig. 1. The description of each type of entity is detailed as follows.

Figure 1: System Model of the Secure Multi-Source CBIR Scheme
  • Multiple image owners: They are the providers of the image databases, shown as Image Owner in Fig. 1. We assume that each image owner has a secure channel to communicate his secret key to the KMC.

  • Authorized query users: They are users authorized by specific image owners and have the authority to send image retrieval requests to the cloud. We assume that authorized query users in the system will not reveal their secret image encryption key or distribute image retrieval results to unauthorized users.

  • Cloud: It takes responsibility for building secure retrieval indexes and executing image retrieval. It stores encrypted images, encrypted image features, information of image owners, and a list of authorized query users. We assume that the cloud is honest-but-curious, which means it will execute the image retrieval operation correctly, while it may also analyze images or image features to obtain some sensitive information about images.

  • Key management center (KMC): It has three main functionalities. First, it is responsible for storing the secret image encryption keys, the information of image owners, and the authorized query user lists received from image owners, and for temporarily storing the secret image encryption key of an authorized query user when a query from that user arrives. Second, it decrypts the encrypted image retrieval results from the cloud, and then encrypts these images with the secret image encryption key of the authorized query user. Finally, it sends the newly encrypted image retrieval results to the cloud. In this paper, we assume that the KMC is fully trusted, i.e., it will not reveal the secret image encryption keys of image owners and authorized query users to others.

The workflow of our scheme is described as follows:

  1. Multiple image owners extract the EHD feature to represent each image, and then encrypt their images and image features respectively. After that, they outsource the encrypted images and encrypted image features along with their identity to the cloud. The authorized user list is also sent to the cloud for the image retrieval service, shown as 1⃝ in Fig. 1. They also need to send the secret image encryption keys that are used to encrypt images to the KMC through a secure channel, shown as 2⃝ in Fig. 1. Image owners need not store the secret keys that are used to encrypt image features. When a new image owner arrives, the above operations are repeated.

  2. Before issuing a query, an authorized user extracts EHD features from the query images and encrypts the query image features, and then submits the generated encrypted query to the cloud for the image retrieval operation, shown as 3⃝ in Fig. 1. At the same time, he sends his secret image encryption key to the KMC through a secure channel, shown as 4⃝ in Fig. 1. For each retrieval request, the image encryption key that the authorized query user sends to the KMC should be different from the one used in his previous retrieval. After receiving the retrieval results from the cloud, the authorized query user decrypts them with his own image encryption key, which is the same as the key sent to the KMC. Finally, the authorized query user can sort the decrypted retrieval results to get the top-h similar images.

  3. After receiving encrypted images and image features from image owners, the cloud builds the retrieval index. When an image retrieval request arrives, it verifies the identity of the query user and, if the verification succeeds, executes the image retrieval operation. When it obtains the retrieval results, it sends the encrypted retrieval results to the KMC instead of sending them to the authorized query user, shown as 5⃝ in Fig. 1. At the same time, the information of the authorized query user is also sent to the KMC along with the corresponding encrypted retrieval results.

  4. When the KMC receives encrypted images from cloud, it will decrypt these images with image owners’ secret image encryption key first. Then, it will encrypt these images with the secret image encryption key of the related authorized query user. After that, the re-encrypted images will be sent to the cloud, shown as 6⃝ in Fig. 1.

  5. The cloud returns the re-encrypted image retrieval results received from the KMC to the authorized query user, shown as 7⃝ in Fig. 1.

4.2 Threat model

In our scheme, we consider the following two kinds of threats:

  1. Eavesdroppers.

    In the process of image transmission (e.g., image owners sending encrypted images and their features to the cloud, the cloud sending encrypted retrieval results to the KMC, the KMC sending re-encrypted images to the cloud, and the cloud providing retrieval results to the authorized query user), eavesdroppers may eavesdrop on image information. This is a weak adversary that can be defended against by encryption.

  2. Cloud.

    In our scheme, we assume the cloud is honest-but-curious. It will correctly execute the image retrieval operation, but it may analyze image content through encrypted images with their features at the same time. Thus, we should guarantee the data privacy in the process of encrypting images and image features. Additionally, the cloud may obtain the similarity relation information of images over the process of secure image retrieval. We should avoid this kind of image privacy leakage in cloud.

4.3 Design goals

In this section, we describe design goals of our scheme as follows.

  1. Image privacy.

    Image privacy is very important in a secure image retrieval service. We should prevent the cloud and unauthorized users from obtaining plain images, plain image features, and image similarity relation information through the encrypted images, the encrypted image features, and the encrypted retrieval results.

  2. Retrieval accuracy.

    Retrieval accuracy is an indispensable element of secure image retrieval. In our scheme, we propose a new approach to measure the similarity of images. Therefore, the difference in retrieval accuracy between our scheme and a secure retrieval scheme that uses the Euclidean distance should be within reasonable limits.

  3. Efficiency.

    Efficiency refers to the time consumed by a secure image retrieval scheme. We should ensure high efficiency in our scheme to enhance its practicality in real-world applications, which means that the time for encryption, index construction, and retrieval should be reduced.

5 The design of MIPP

In this section, we first list the notations used in our scheme in Table 1. We then introduce the system overview and describe the data encryption method. In order to solve the image similarity leakage problem in the cloud, we propose a new method to measure image similarity. The secure content-based image retrieval, index construction, and index update methods of our scheme are also introduced in this section.

5.1 Notations in this section

Notations used in our scheme are described in Table 1.

Table 1
Notations Used in Our Scheme.
Notation    Description
n           The size of images and image features
u           The size of query image and image features
OID         The identity of each image owner
AUL         The authorized user list
SK          The secret image encryption key of each image owner
USK         The secret image encryption key of query user
UID         The user id
AK          The authentication key to verify query user's identity
W           The plain image collection of each image owner
F           The plain image feature collection of each image owner
FF          The square image feature collection of each image owner
EW          The encrypted image collection of each image owner
EF          The encrypted image feature collection of each image owner
EFF         The encrypted square image feature collection of each image owner
QW          The query image collection of each query user
QF          The query image feature collection of each query user
EQ          The encrypted query image feature collection of each query user
QWW         The square query image feature collection of each query user
EQQ         The encrypted square query image feature collection of each query user
S           The top-h retrieval results
ER          The top-h encrypted retrieval results
NER         The re-encrypted retrieval results

5.2 System overview

There are four entities in our system, each with its own responsibility, described as follows.

  • The image owner is the provider of an image database. Each image owner has his own image collection W. He extracts EHD image features from W to obtain the image feature collection F and the square image feature collection FF. To preserve the privacy of images and image features, he first generates a secret image encryption key SK and runs the ImageEnc algorithm to obtain the encrypted image collection EW. He then generates secret image feature encryption keys and runs the ImageFeatureEnc process to obtain the encrypted image feature collection EF and the encrypted square image feature collection EFF. EW, EF, and EFF are then outsourced to the cloud along with OID, AUL, and AK through the network. In addition, SK is sent to the KMC for storage through a secure channel.

  • The authorized user is the user who needs image retrieval. He first extracts EHD features from the query images QW to obtain the query image feature collection QF and runs the ImageFeatureEnc process to obtain the encrypted query image feature collection EQ. Next, he computes the square query image feature collection QWW and its encrypted form EQQ. After that, he generates a secret query Q = {EQ, EQQ, UID, AK} and sends Q to the cloud for secure image retrieval. Finally, he generates a secret image encryption key USK and sends this key to the KMC through a secure channel. After receiving the retrieval results from the cloud, he uses his own secret image encryption key USK and runs the ImageDec algorithm to decrypt the returned images NER, and then sorts the result images to obtain the top-h similar images S = {s_1, s_2, …, s_h}.

  • The cloud is responsible for storage and retrieval. It stores EW, EF, EFF, OID, AUL, and AK. Given a query, it uses AK to verify the query user's identity against the AUL. If the verification succeeds, it runs the ImageRetrieval process to retrieve similar images from the image database. Instead of sending the top-h encrypted retrieval results ER to the authorized query user directly, it first sends ER and AK to the KMC. After that, it sends the NER received from the KMC to the authorized query user. In addition, to improve retrieval efficiency, it runs the IndexConstruct process to construct the retrieval index I.

  • The KMC stores SK and AUL, and temporarily stores USK. After receiving ER and AK from the cloud, it runs the ImageDec algorithm to decrypt ER and obtain the plain images. It then uses AK to obtain the secret image encryption key of the authorized query user and runs the ImageEnc algorithm to encrypt the plain images with this key to obtain NER. Finally, it sends NER to the cloud. After it finishes the above operations, it can discard the USK of this query.

5.3 Data encryption

For preserving image privacy in the cloud, images and image features should be encrypted before outsourcing to the cloud. We will introduce data encryption methods of our scheme in this section, including key generation, image encryption, image decryption, and image feature encryption method.

5.3.1 Key generation

Given a security parameter, image owners and authorized query users run the KeyGen algorithm to obtain the secret image encryption keys SK and USK respectively. The length of SK and USK is at least the total number of pixels in an image, and every number in SK and USK is between 0 and 255 inclusive.

5.3.2 ImageEnc&ImageDec - Image encryption and decryption

Given a secret key SK and an image collection W, image owners run the ImageEnc(SK, W) algorithm to encrypt images, shown as Algorithm 1. The authorized query user also runs this algorithm to encrypt images with his secret key USK. In our image encryption scheme, we use a standard key stream to encrypt images, which is secure against chosen-plaintext attacks (CPA). Thus, our image encryption scheme can protect the privacy of image content.

After receiving NER from the cloud, the user runs the ImageDec(SK, EW) algorithm to decrypt images, shown as Algorithm 2. The two dimensions used in Algorithm 1 and Algorithm 2 are the height and width of the images. The KMC also uses this algorithm to decrypt images.

1:  while  do
2:      while  do
3:          ;
4:          ;
5:      end while
6:      ;
7:  end while
Algorithm 1 ImageEnc(SK, W)
1:  while  do
2:      while  do
3:          ;
4:          ;
5:      end while
6:      ;
7:  end while
Algorithm 2 ImageDec(SK, EW)
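Since the bodies of Algorithms 1 and 2 did not survive in this copy, the following is a minimal sketch of a key-stream image cipher that matches the surrounding description (one key value in 0..255 per pixel, two nested loops over height and width). The per-pixel operation is assumed to be XOR, which is one common choice for a key stream; the paper may use a different operation. The function names `key_gen`, `image_enc`, `image_dec`, and `kmc_reencrypt` are hypothetical.

```python
import secrets


def key_gen(num_pixels):
    """Return a key stream with one random byte (0..255) per pixel."""
    return secrets.token_bytes(num_pixels)


def image_enc(key, image):
    """Encrypt a grey-level image (list of rows) pixel by pixel.

    Assumption: each pixel is XORed with the key byte at the same
    row-major position.
    """
    height, width = len(image), len(image[0])
    return [[image[i][j] ^ key[i * width + j] for j in range(width)]
            for i in range(height)]


def image_dec(key, enc_image):
    """Decrypt an image; XOR with the same key stream is its own inverse."""
    return image_enc(key, enc_image)


def kmc_reencrypt(sk, usk, enc_image):
    """KMC step: decrypt with the owner's key SK, re-encrypt with USK."""
    return image_enc(usk, image_dec(sk, enc_image))
```

With this sketch, an image encrypted under SK and passed through `kmc_reencrypt` can be decrypted by the query user with USK alone, which mirrors steps 4 and 5 of the workflow in Section 4.1.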

5.3.3 ImageFeatureEnc - Image feature encryption

In order to preserve the privacy of image features, image features should be encrypted before they are outsourced to the cloud. Image owners and authorized query users both need to encrypt image features, and they use the same method. For an image feature f = {a_1, a_2, …, a_l}, where l is the dimension of f, they first square each element to obtain the square feature ff = {a_1^2, a_2^2, …, a_l^2}. We use the secure multi-party computation method introduced in Section 3.2 to encrypt image features. The ImageFeatureEnc process is described in detail as follows.

First, as in secure multi-party computation, they choose a large prime number q whose length is the same as that of p and which divides p − 1. Then, they select a random number h ∈ Z and generate g_1 and g_2 according to Equation (1) and Equation (2).

Second, for each dimension a_i of image feature f, they randomly choose a number r_i ∈ Z and calculate a number R_i according to Equation (3).

Finally, they obtain the ciphertext ea_i of each dimension a_i in f by calculating ea_i = (1 + a_i p) R_i mod p^2. The ciphertext of ff is obtained in the same way. The encrypted feature ef and the encrypted square feature eff are shown as follows.

ef = {ea_1, ea_2, …, ea_l} = {(1 + a_1 p) R_1 mod p^2, (1 + a_2 p) R_2 mod p^2, …, (1 + a_l p) R_l mod p^2}.

eff = {(1 + a_1^2 p) R_1 mod p^2, (1 + a_2^2 p) R_2 mod p^2, …, (1 + a_l^2 p) R_l mod p^2}.

After encrypting all image features, they can obtain the encrypted feature collection EF and EFF.

Note that the parameter p is public to all image owners, authorized query users, and the cloud. The parameters q, h, and r used in the image feature encryption process may differ among image features. Image owners and authorized users can select them on their own without communicating with others. Additionally, they do not need to store these parameters, which can be discarded after use.
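The feature encryption above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the primes are toy values (real parameters would be large), the generator and the masks R_i are assumed to follow a ring construction over Z*_{p^2} in which the masks multiply to 1, and the names `_masks`, `image_feature_enc`, and `element_sum` are hypothetical. The sketch draws independent masks for the feature and the square feature.

```python
import random

P, Q = 2039, 1019   # toy primes with Q | P - 1 (assumption; real ones are large)
N2 = P * P


def _masks(length):
    """Return masks R_1..R_l in Z*_{p^2} whose product is 1 modulo p^2."""
    g1 = 1
    while g1 == 1:
        h = random.randrange(2, P)
        g1 = pow(h, (N2 - P) // Q, N2)           # element of order Q
    r = [random.randrange(1, Q) for _ in range(length)]
    y = [pow(g1, ri, N2) for ri in r]
    # R_i = (y_{i+1} / y_{i-1})^{r_i}; the exponents cancel around the ring.
    return [pow(y[(i + 1) % length] * pow(y[i - 1], -1, N2) % N2, r[i], N2)
            for i in range(length)]


def image_feature_enc(f):
    """Encrypt a feature f and its element-wise square ff."""
    ff = [a * a for a in f]
    ef = [(1 + a * P) * R % N2 for a, R in zip(f, _masks(len(f)))]
    eff = [(1 + a * P) * R % N2 for a, R in zip(ff, _masks(len(ff)))]
    return ef, eff


def element_sum(enc):
    """Multiply all ciphertexts and recover the sum of the plain elements.

    Correct only while the true sum is smaller than P.
    """
    prod = 1
    for c in enc:
        prod = prod * c % N2
    return (prod - 1) // P
```

In this sketch, the masks hide each individual element, while the product of a whole encrypted feature exposes only the sum of its elements, which is what the distance computation in Section 5.5 relies on.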

5.4 New approach to manage image similarity

In the field of image retrieval, the similarity of images is usually measured by the distance between image features: if two images are similar, the distance between their features is small. The Euclidean distance, shown as Equation (7), is typically used for this purpose and measures the similarity of images accurately. However, when it is used to measure the similarity of images in the cloud, the cloud can obtain the image similarity relation information. In order to solve this problem, we propose a new approach to measure image similarity.

Given two image features X = {x_1, x_2, …, x_l} and Y = {y_1, y_2, …, y_l}, the distance NewDis between X and Y is calculated as in Equation (8). Compared with the Euclidean distance, NewDis replaces x_i and y_i in the third part of Equation (7) with substitute values.

(7)
(8)

The experimental results show that the accuracy and recall rate of the proposed approach are comparable to those of the Euclidean distance. In addition, the proposed approach can support multi-source encrypted image retrieval. Therefore, we use the proposed approach to calculate the similarity between images in our scheme.
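The relation between the two distances can be illustrated with a small sketch. The Euclidean distance is written in its expanded three-term form. Because the substitute values are not legible in this copy, the sketch assumes that x_i and y_i in the third term are replaced by the mean values of X and Y; this assumption is motivated by the fact that only element sums are available to the cloud in Section 5.5, and it may differ from Equation (8). The function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance in expanded form: sum x^2 + sum y^2 - 2 sum x*y."""
    value = (sum(a * a for a in x) + sum(b * b for b in y)
             - 2 * sum(a * b for a, b in zip(x, y)))
    return math.sqrt(max(0.0, value))


def new_dis(x, y):
    """Assumed variant: the cross term uses the means of x and y.

    sum_i (mean_x * mean_y) = sum(x) * sum(y) / l, so only the sums of the
    elements and of the squared elements are needed.
    """
    l = len(x)
    value = (sum(a * a for a in x) + sum(b * b for b in y)
             - 2 * sum(x) * sum(y) / l)
    return math.sqrt(max(0.0, value))
```

Under this assumption, the two distances coincide when both vectors are constant, and the variant is generally not zero for two identical non-constant vectors.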

5.5 Secure content-based image retrieval

Before issuing a query, the authorized query user generates a secret query Q = {EQ, EQQ, UID, AK} and sends Q to the cloud for image retrieval. After receiving Q from the authorized query user, the cloud first verifies whether this user is authorized and which owner authorized this user. If the verification succeeds, the cloud searches the authorized images in the image database through the retrieval index.

For image collection W and query image collection QW, the similarity of image w and image qw can be measured by calculating the distance between f and qf.

However, image collection and image feature collection in the cloud are all encrypted. The similarities between image w and image qw can be measured by calculating the distance between ef and eq.

As all image features are encrypted by the secure multi-party computation method, the encrypted image feature ef = {ea_1, ea_2, …, ea_l} and the encrypted query image feature eq = {eqa_1, eqa_2, …, eqa_l}, where l is the dimension of the image feature, have the same form. The distance between them can be calculated according to Equations (5), (6), and (9), as described below.

First, the cloud stores the encrypted image feature ef and the encrypted square image feature eff. It receives the encrypted query image feature eq and the encrypted square query image feature eqq from an authorized query user. Then, according to Equation (5), it obtains four aggregated values: CEA and CEQA for the two features, and their counterparts for the two square features.

Second, the sum of the elements underlying each of the four aggregated values can be obtained according to Equation (6).

Finally, the distance Sim between ef and eq can be obtained by Equation (9). The similarity of image w and image qw can be measured by this distance value, and a small distance value indicates that they are similar.

(9)

However, there exists a problem during the distance calculation process. Ciphertexts in encrypted image features are very large and the computation complexity of calculating the sum is high, which will consume too much time. Therefore, constructing a retrieval index is very necessary for high retrieval efficiency.

5.6 Index construction

During the distance computation, the sums of the encrypted elements and in the encrypted features are calculated by using secure multi-party computation. However, the ciphertexts and in the encrypted features are very large, so the sum calculation has high computational complexity and takes too much time. We therefore build a retrieval index to improve retrieval efficiency. Because most of the time is spent computing the sums of ciphertexts in encrypted features, given an encrypted image feature ef = {ea_1, ea_2, …, ea_l}, we can calculate , in advance and store them in the retrieval index table, as shown in Table 2.

Given an encrypted query image feature eq = {eqa_1, eqa_2, …, eqa_l}, the cloud only needs to calculate , and then get , from the index table when executing encrypted image retrieval. Because the sums of ciphertexts in the encrypted features are computed in advance, much time is saved in the retrieval process and the retrieval efficiency of our scheme is improved.

Table 2
Retrieval index.
Image Owner Image ID
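The precompute-then-look-up idea behind the retrieval index can be sketched as follows. This is only an illustration under loud assumptions: `aggregate` is a stand-in for the costly per-feature sum over ciphertexts, `MODULUS` is an arbitrary placeholder, and the comparison in `retrieve` is a placeholder rather than the distance of Equation (9). What the sketch does show is that the per-image work happens once at indexing time, while a query costs one aggregate computation plus table lookups.

```python
MODULUS = 2**127 - 1  # placeholder; the real ciphertext space is not given here


def aggregate(encrypted_feature):
    """Stand-in for the expensive sum over the elements of one feature."""
    total = 0
    for element in encrypted_feature:
        total = (total + element) % MODULUS
    return total


def build_index(encrypted_features):
    """Map (image owner, image ID) -> precomputed aggregate."""
    return {key: aggregate(ef) for key, ef in encrypted_features.items()}


def retrieve(index, encrypted_query, k):
    """Aggregate the query once, then compare against cached entries."""
    query_aggregate = aggregate(encrypted_query)
    # Placeholder comparison, NOT the paper's similarity measure.
    scored = sorted(index.items(),
                    key=lambda item: abs(item[1] - query_aggregate))
    return [key for key, _ in scored[:k]]


efs = {("alice", 1): [1, 2, 3], ("bob", 2): [10, 20, 30], ("alice", 3): [2, 2, 3]}
index = build_index(efs)
print(retrieve(index, [1, 2, 4], 2))
```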

5.7 Images and Index update in cloud

Image owners may add images to the cloud or delete images from it. When the images in the cloud change, the stored image features change as well. Because the retrieval index must stay consistent with the images in the cloud, it has to be modified whenever those images change. Our scheme supports dynamic updating of images and the index in the cloud through add, delete, and update operations.

  1. Add operation.

    When an image owner asks the cloud to add images, the owner sends the encrypted images and encrypted image features to the cloud, and the cloud stores them in the image database. The cloud then calculates the related data and of the encrypted image features and adds these two values, along with , image ID, to the index table.

  2. Delete operation.

    When an image owner wants to delete images in the cloud, he sends the image IDs to the cloud. For the image IDs that belong to this image owner, the cloud deletes the corresponding entries from the index table, together with the encrypted images and their encrypted image features.

  3. Update operation.

    To preserve image privacy, an image owner may re-encrypt his images and image features and then update them in the cloud. When an image owner requests an update, the cloud deletes the stale encrypted images and image features identified by the image IDs, and then adds the re-encrypted images and their image features to the image database. Since updating encrypted image features will not change the and , there is no need to update the index table.
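The three operations can be sketched as a small in-memory store. The class `CloudStore`, its dictionaries, and the `compute_entry` callback are hypothetical names; the sketch assumes, as stated in the text, that an update leaves the index entry unchanged.

```python
class CloudStore:
    """Toy model of the cloud's image database and retrieval index."""

    def __init__(self, compute_entry):
        self.compute_entry = compute_entry  # stand-in for the index computation
        self.database = {}  # (owner, image_id) -> (enc_image, enc_feature)
        self.index = {}     # (owner, image_id) -> precomputed index entry

    def add(self, owner, image_id, enc_image, enc_feature):
        key = (owner, image_id)
        self.database[key] = (enc_image, enc_feature)
        self.index[key] = self.compute_entry(enc_feature)

    def delete(self, owner, image_ids):
        """Delete only the IDs that belong to this owner; return those removed."""
        removed = []
        for image_id in image_ids:
            key = (owner, image_id)
            if key in self.database:
                del self.database[key]
                self.index.pop(key, None)
                removed.append(image_id)
        return removed

    def update(self, owner, image_id, enc_image, enc_feature):
        """Replace stale ciphertexts; the index entry is left untouched."""
        key = (owner, image_id)
        if key not in self.database:
            raise KeyError(f"{key} is not stored")
        self.database[key] = (enc_image, enc_feature)


store = CloudStore(compute_entry=sum)
store.add("alice", 1, b"img-v1", [1, 2, 3])
store.update("alice", 1, b"img-v2", [3, 2, 1])
print(store.index[("alice", 1)], store.delete("alice", [1]))
```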

6 Security analysis

In this section, we will analyze the security of our scheme, including data privacy and image similarity leakage in the cloud.

6.1 Data privacy

In our scheme, data privacy contains the privacy of image content, image features and image similarity information in cloud. We will analyze these three kinds of privacy in the following subsections.

6.1.1 Image Content Privacy

As described in Section 5.2.2, the image owner generates a secret key SK to encrypt images. The image owner does not want unauthorized users (e.g., the cloud, an adversary, or others) to obtain his image content, so he will not reveal his secret key to them. In our scheme, the image owner stores his secret key in the KMC. We assume that the KMC is fully trusted and will not reveal secret keys to unauthorized users. We also assume that an authorized query user will neither reveal his secret image encryption key nor send image retrieval results to unauthorized users. For the privacy of images outsourced to the cloud, our scheme supports dynamically updating images in the cloud: image owners can re-encrypt images and outsource the re-encrypted images to replace the stale encrypted ones. This further strengthens the privacy protection of images in the cloud. Even if an unauthorized user obtains the key of an image owner, he can only decrypt that owner's images for a limited period; once the image owner updates his encrypted images in the cloud, the unauthorized user can no longer decrypt the re-encrypted images. For the privacy of image retrieval results, the authorized query user sends the KMC a secret image encryption key for each query that differs from the one used in his previous query. Even if an unauthorized user obtains the secret image encryption key of an authorized query user, he can only decrypt the retrieval results of that single query, which further strengthens image privacy protection. Since image owners, authorized query users, and the KMC will not reveal secret image encryption keys to others, unauthorized users are unable to obtain these keys and therefore cannot obtain plain image content through image owners, authorized query users, or the KMC.

In existing research, the cloud returns encrypted retrieval results to authorized query users directly. If we adopted this strategy, an unauthorized user who obtained one image owner's secret image encryption key could try it against encrypted image retrieval results to obtain the plain image contents, and this would work for every query issued by any authorized query user. Even though we have argued that unauthorized users are unable to obtain secret image encryption keys, this strategy still leaves a potential weakness. Therefore, we designed a different strategy. After finishing the retrieval operation, the cloud first sends ER to the KMC. The KMC decrypts all images received from the cloud, encrypts them with the secret image encryption key of the corresponding authorized query user, and sends the re-encrypted images NER back to the cloud. Finally, the cloud forwards NER to the authorized query user. This strategy solves the above problem. Because the images returned by the cloud are encrypted with the secret image encryption key of the authorized query user, an unauthorized user who obtains the secret image encryption keys of image owners is still unable to decrypt them. Even if the unauthorized user obtains the secret image encryption key of an authorized query user, he can only decrypt the images returned to that user in that query. Furthermore, the authorized query user sends a different image encryption key to the KMC for each query, so even if the unauthorized user can decrypt the retrieval results of one query, he cannot decrypt those of the next query with the same key. In our scheme, unauthorized users are unable to obtain the secret keys of image owners and authorized query users; therefore, our scheme guarantees that image content privacy cannot be compromised by unauthorized users.
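The re-encryption flow (ER from the cloud, NER back from the KMC) can be sketched as follows. This is an illustration under stated assumptions: `toy_cipher` is a deliberately insecure stand-in for whatever image cipher is actually used, and the `KMC` class, its dictionaries, and the method `reencrypt` are hypothetical names.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream.

    Toy only, NOT secure. Applying it twice with the same key restores
    the input, so it serves as both encryption and decryption here.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class KMC:
    """Trusted key management centre holding owner and per-query keys."""

    def __init__(self):
        self.owner_keys = {}  # owner ID -> owner's secret image key
        self.query_keys = {}  # user ID -> key for the user's current query

    def reencrypt(self, uid, encrypted_results):
        """Turn ER (owner-encrypted images) into NER (query-key-encrypted)."""
        query_key = self.query_keys[uid]
        ner = []
        for owner, ciphertext in encrypted_results:
            plain = toy_cipher(self.owner_keys[owner], ciphertext)
            ner.append(toy_cipher(query_key, plain))
        return ner


kmc = KMC()
kmc.owner_keys["alice"] = secrets.token_bytes(32)
kmc.query_keys["user-1"] = secrets.token_bytes(32)  # replaced for each query
er = [("alice", toy_cipher(kmc.owner_keys["alice"], b"image bytes"))]
ner = kmc.reencrypt("user-1", er)
print(toy_cipher(kmc.query_keys["user-1"], ner[0]))
```

Because the cloud only ever forwards NER, a leaked owner key does not open the returned results, and a leaked query key opens the results of one query only.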

6.1.2 Image feature privacy

We use the sum protocol in the secure multi-party computation model proposed in JungT to encrypt image features. Jung et al. JungT give three security definitions in their paper, reproduced as Definitions 1, 2, and 3.

Definition 1 (CDH problem in $G$) The Computational Diffie-Hellman problem in a multiplicative group $G$ with generator $g$ is defined as follows: given only $g$, $g^a$, $g^b \in G$, where $a, b \in \mathbb{Z}$, compute $g^{ab}$ without knowing $a$ or $b$.

Definition 2 (DDH problem in $G$) The Decisional Diffie-Hellman problem in a multiplicative group $G$ with generator $g$ is defined as follows: given only $g$, $g^a$, $g^b$, $g^c \in G$, where $a, b, c \in \mathbb{Z}$, decide if $g^{ab} = g^c$.

Definition 3 (CDH-Security in ) We say our privacy preserving (sum or product) calculation is CDH-secure in if any Probabilistic Polynomial Time Adversary (PPTA) who cannot solve the CDH problem with non-negligible chance has negligible chance to infer any honest participant's private value in , i.e., any PPTA's probability to solve the CDH problem satisfies 1/p() for any polynomial p() where is the order of the group defined in the CDH problem.
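A toy instance may help make the Computational and Decisional Diffie-Hellman problems concrete. The sketch below uses the multiplicative group modulo the small prime 23 with generator 5, both chosen purely for illustration; in such a tiny group the discrete logarithm can be brute-forced, which is exactly what is assumed infeasible in groups of cryptographic size.

```python
P = 23  # toy prime modulus; real groups are vastly larger
G = 5   # generator of the multiplicative group modulo 23


def dh_public(exponent):
    """Return g^exponent in the toy group."""
    return pow(G, exponent, P)


def brute_force_dlog(value):
    """Recover x with g^x == value (mod P); feasible only because P is tiny."""
    for x in range(P - 1):
        if pow(G, x, P) == value:
            return x
    raise ValueError("value is not in the group generated by G")


def solve_cdh(ga, gb):
    """CDH: compute g^(ab) from g^a and g^b, here by brute-forcing a."""
    return pow(gb, brute_force_dlog(ga), P)


def is_dh_triple(ga, gb, gc):
    """DDH: decide whether gc equals g^(ab)."""
    return solve_cdh(ga, gb) == gc


a, b = 6, 15
print(solve_cdh(dh_public(a), dh_public(b)) == pow(G, a * b, P))  # True
```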

In Jung’s paper, each participant will receive the and sent from participant and ; therefore, the unauthorized user may obtain , and . Jung et al. proved that their sum protocol is CDH-secure in in this condition.

Theorem 6.1 Our scheme can protect image feature privacy from being captured by the cloud and unauthorized users.

Proof: In our scheme, the image owner generates for each dimension. The cloud and unauthorized users are able to obtain , and , thus our image feature privacy is also CDH-secure in .

For an image feature f = {a, a, … , a}, the image owner will calculate f = {a, a, … , a} and encrypt them to obtain ef and ef. The ciphertexts a and a are shown as follows:

If the unauthorized user wants to obtain a, then he has to solve the secret parameter . Because the calculation process of is unknown to him, he is unable to solve , and he cannot obtain the plaintext a.

At the same time, the parameter collection = {, {}} can be chosen among different image features. This means if we want to encrypt image feature , , , , we can choose four different parameter collections to encrypt them. Even if an unauthorized user obtains one parameter collection , he is only able to decrypt one image feature. However, unauthorized users are unable to get the parameter collection . Moreover, we can discard the parameter collection directly, as there is no need to store it.

Furthermore, our scheme supports the update of image features in the cloud, which enables image owners to update their encrypted image features at any time. If an unauthorized user obtains one parameter collection , he can only decrypt the encrypted image features that were encrypted with this parameter collection, and once image owners update their encrypted image features in the cloud, the parameter collection that this unauthorized user holds becomes invalid. The unauthorized user then needs to obtain the new parameter collection to decrypt the encrypted image features. However, image owners will not store the parameter collection and they will not reveal the to unauthorized users, so unauthorized users will not obtain the and they will be unable to recover the plaintext of image features.

According to the above description, unauthorized users are unable to obtain the plaintext of image features. Thus, our scheme can protect the image feature privacy from being captured by unauthorized users.

6.2 Image similarity leakage in cloud

In our scheme, we assume that the cloud is "honest but curious", which means that it will execute the image retrieval operation accurately while also trying to analyze the relations among images or other information about them. In current research, image similarity information is leaked to the cloud whenever the cloud executes an image retrieval operation: images in the retrieval result are ordered by their similarity to the query image, which reveals the similarity information of images to the cloud.

A new distance for measuring image similarity is proposed in our scheme, which can solve the above problem. Fig. 2 shows how images that are truly similar to the query image are distributed within the retrieval results when the cloud returns the top 100 images. The abscissa is the position range, as a percentage of the retrieval results, and the ordinate is the probability that similar images fall in that range. When retrieving with the Euclidean distance, a very high proportion of the truly similar images appear in the top 10% of the retrieval results, and the proportion decreases from the beginning to the end of the results. With our scheme, the truly similar images are distributed uniformly and are not concentrated at the beginning of the retrieval results, which misleads the cloud when it analyzes the similarity of the images. Because similar images are uniformly distributed in the retrieval results, the cloud may infer a wrong similarity relation. Therefore, our scheme can prevent image similarity leakage to the cloud.

Figure 2: The percentage that images will appear in each range of all returned images.
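The kind of distribution plotted in Fig. 2 can be computed with a short routine like the one below. It is a sketch under assumptions: the function name, the choice of ten equal-width rank ranges, and the toy data are illustrative, and the ground-truth set of truly similar images is taken as given.

```python
def rank_distribution(ranked_ids, relevant_ids, bins=10):
    """Fraction of the truly similar hits falling in each rank range."""
    if not ranked_ids or len(ranked_ids) % bins != 0:
        raise ValueError("result list length must be a positive multiple of bins")
    width = len(ranked_ids) // bins
    counts = [0] * bins
    for position, image_id in enumerate(ranked_ids):
        if image_id in relevant_ids:
            counts[position // width] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Toy top-100 result: four similar images in the first 10%, one in the middle.
ranked = list(range(100))
similar = {0, 1, 2, 3, 50}
print(rank_distribution(ranked, similar))
```

A distance-ordered result concentrates the mass in the first range, as in this toy input; a result list that hides similarity from the cloud would instead give roughly equal fractions in every range.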

7 Performance evaluation

In this section, we will introduce the performance evaluation of our scheme, including experimental setting, evaluation of retrieval accuracy, evaluation of time consumptions, and evaluation of storage consumption.

7.1 Experimental setting

The Corel image data set corel1 ; corel2 is commonly used to evaluate schemes in image retrieval research. It contains 100 categories, each of which has 100 images, and we use these as test images in our experiments. We choose 10 categories and generate 5 queries for each category, so there are 50 queries in total to evaluate the retrieval accuracy. The proposed scheme is implemented in C++ on a 2.7 GHz Intel Core(TM) processor.

7.2 Evaluation of retrieval accuracy

In the field of information retrieval, precision, recall, and F1-Measure are typical metrics for evaluating retrieval results, as formulated in Equations (10)-(12), where TP represents the true positives, FP the false positives, and FN the false negatives. Precision and recall tend to vary in opposite directions, as shown in Fig. 3, so we use the F1-Measure to make a comprehensive evaluation. Fig. 3 shows that the retrieval accuracy of our scheme is about 10% lower on average than that of retrieval by Euclidean distance, and its retrieval recall is about 5% lower on average. Fig. 4 shows the F1-Measure of the MIPP scheme and of retrieval with Euclidean distance: the F1-Measure of our scheme is about 7% lower. As described in previous sections, our scheme supports multiple image owners and prevents image similarity information from leaking to the cloud. The loss of retrieval accuracy is the trade-off for these two properties.

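$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{12}$$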
Figure 3: Retrieval accuracy of our scheme and Euclidean distance scheme.
Figure 4: F1-Measure of our scheme and Euclidean distance scheme.
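For concreteness, the three metrics can be computed from a retrieved set and a ground-truth set as sketched below, assuming the standard definitions in terms of TP, FP, and FN. The function name and the toy inputs are illustrative.

```python
def precision_recall_f1(retrieved_ids, relevant_ids):
    """Return (precision, recall, F1) for one query."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    tp = len(retrieved & relevant)  # similar images that were returned
    fp = len(retrieved - relevant)  # returned images that are not similar
    fn = len(relevant - retrieved)  # similar images that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1([1, 2, 3, 4], [2, 3, 5, 6, 7, 8])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```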

7.3 Evaluation of time consumptions

The time consumption of our scheme consists primarily of index construction time and secure image retrieval time, which are described as follows:

  1. Index construction time.

    When a new image owner participates in our scheme, the number of images increases. Fig. 5 shows that the index construction time grows with the number of images. For 10,000 images, the index construction time is approximately 7 minutes, which is tolerable. Once the retrieval index has been constructed, the efficiency of secure image retrieval in our scheme is improved.

  2. Secure image retrieval time.

    The time consumption of plain image retrieval, encrypted image retrieval with index, and encrypted image retrieval without index is shown in Fig. 6, Fig. 7, and Fig. 8, respectively. The results show that the more images there are, the more time retrieval takes. Plain image retrieval over 10,000 images takes approximately 1 second. Encrypted image retrieval without index over 10,000 images takes approximately 6.8 minutes, while encrypted image retrieval with index takes approximately 50 ms. From these experimental data, we calculate that index-based image retrieval is approximately 8,160 times faster than non-indexed image retrieval and approximately 1,200 times faster than plain image retrieval when retrieving from a collection of 10,000 images. We conclude that index-based encrypted image retrieval greatly improves retrieval efficiency compared with plain image retrieval and non-indexed encrypted image retrieval, and that the retrieval efficiency of our scheme is appreciable.


Figure 5: Index construction consumption
Figure 6: Plain image retrieval consumption

Figure 7: Consumption of encrypted image retrieval with index
Figure 8: Consumption of encrypted image retrieval without index
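The speed-up ratios can be recomputed from the retrieval times quoted in the text for 10,000 images. The variable names below are illustrative. Taking the quoted times at face value, 6.8 minutes against 50 ms gives the factor of 8,160, whereas 1 second against 50 ms gives a factor of about 20; a factor of 1,200 would correspond to a plain retrieval time of roughly 60 seconds, so the comparison with plain retrieval should be read with that in mind.

```python
# Retrieval times for 10,000 images as quoted in the text, in milliseconds.
PLAIN_MS = 1_000       # plain retrieval: about 1 second
NO_INDEX_MS = 408_000  # encrypted, without index: about 6.8 min = 408 s
WITH_INDEX_MS = 50     # encrypted, with index: about 50 ms

speedup_vs_no_index = NO_INDEX_MS / WITH_INDEX_MS
speedup_vs_plain = PLAIN_MS / WITH_INDEX_MS

# Plain retrieval time (in seconds) that a 1,200x ratio would imply.
implied_plain_s = 1_200 * WITH_INDEX_MS / 1_000

print(speedup_vs_no_index, speedup_vs_plain, implied_plain_s)
```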

7.4 Evaluation of storage consumption

The storage consumption includes index storage and encrypted image features storage consumption, described as follows:

  1. Index storage consumption.

    We extract image features to represent images for similarity calculation and construct a retrieval index to improve retrieval efficiency. The storage consumption of the index table is shown in Table 3. The retrieval index of 10,000 images costs approximately 267 KB of storage space, so the storage consumption of the index table is very low.

  2. Encrypted image features storage consumption.

    To preserve the privacy of image features, we outsource encrypted image features to the cloud. Table 3 shows that 10,000 encrypted image features consume approximately 3.2 GB of storage space. Because encrypted image features are stored in the cloud, which offers ample storage capacity, the storage consumption of encrypted image features in our scheme is tolerable.

Table 3
Storage consumption of 10,000 images.

| | Encrypted Image Features | Retrieval Index |
|---|---|---|
| Storage consumption (KB) | 3352278 | 267 |
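The figures in Table 3 can be converted to per-image costs with a few lines of arithmetic. The sketch assumes 1 KB = 1024 bytes and 1 GB = 1024² KB; the variable names are illustrative.

```python
N_IMAGES = 10_000
FEATURES_KB = 3_352_278  # encrypted image features, from Table 3
INDEX_KB = 267           # retrieval index, from Table 3

features_gb = FEATURES_KB / 1024 ** 2
features_kb_per_image = FEATURES_KB / N_IMAGES
index_bytes_per_image = INDEX_KB * 1024 / N_IMAGES

print(f"{features_gb:.2f} GB in total, "
      f"{features_kb_per_image:.1f} KB of encrypted features per image, "
      f"{index_bytes_per_image:.1f} bytes of index per image")
```

Under these assumptions, each encrypted feature occupies a few hundred kilobytes while each index entry occupies a few tens of bytes, which is why the index stays small even as the collection grows.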

8 Conclusion

In this paper, we presented a content-based multi-source encrypted image retrieval scheme in clouds with privacy protection. We encrypted image features with secure multi-party computation, which allowed image owners to encrypt image features by using their own keys. We also proposed a new method to measure the similarity of images that could avoid revealing image similarity information to the cloud to a certain extent. Theoretical analysis and experimental results showed that our scheme enabled accurate and efficient image retrieval over images gathered from multiple sources, while providing privacy guarantees. In future work, we will further improve the image retrieval efficiency.

Acknowledgements

This work was supported in part by the National Science Foundation of China [Grant number 61602039] and the China National Key Research and Development Program [Grant number 2016YFB0800301].

References

  • (1) M. Kaur, N. Sohi, A novel technique for content based image retrieval using color, texture and edge features, in: 2016 International Conference on Communication and Electronics Systems (ICCES), 2016, pp. 1–7. doi:10.1109/CESYS.2016.7889955.
  • (2) A. Yalavarthi, K. Veeraswamy, K. A. Sheela, Content based image retrieval using enhanced gabor wavelet transform, in: 2017 International Conference on Computer, Communications and Electronics (Comptelix), 2017, pp. 339–343. doi:10.1109/COMPTELIX.2017.8003990.
  • (3) A. Rashno, S. Sadri, Content-based image retrieval with color and texture features in neutrosophic domain, in: 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA), 2017, pp. 50–55. doi:10.1109/PRIA.2017.7983063.
  • (4) Y. Chen, The image retrieval algorithm based on color feature, in: 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2016, pp. 647–650. doi:10.1109/ICSESS.2016.7883151.
  • (5) C. H. Su, H. S. Chiu, T. M. Hsieh, An efficient image retrieval based on hsv color space, in: 2011 International Conference on Electrical and Control Engineering, 2011, pp. 5746–5749. doi:10.1109/ICECENG.2011.6058026.
  • (6) S. S. Devi, R. Balasundaram, Content based texture image retrieval based on modified dominant directional local binary pattern, in: 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), 2017, pp. 1–6. doi:10.1109/ICACCS.2017.8014592.
  • (7) X. Chen, Y. Zheng, C. Yu, C. Gao, Image retrieval based on color and texture features, in: 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2013, pp. 403–406. doi:10.1109/IIH-MSP.2013.107.
  • (8) A. K. Naveena, N. K. Narayanan, Image retrieval using combination of color, texture and shape descriptor, in: 2016 International Conference on Next Generation Intelligent Systems (ICNGIS), 2016, pp. 1–5. doi:10.1109/ICNGIS.2016.7854023.
  • (9) A. Anandh, K. Mala, S. Suganya, Content based image retrieval system based on semantic information using color, texture and shape features, in: 2016 International Conference on Computing Technologies and Intelligent Data Engineering (ICCTIDE’16), 2016, pp. 1–8. doi:10.1109/ICCTIDE.2016.7725364.
  • (10) M. Shen, B. Ma, L. Zhu, R. Mijumbi, X. Du, J. Hu, Cloud-based approximate constrained shortest distance queries over encrypted graphs with privacy protection, IEEE Transactions on Information Forensics and Security 13 (4) (2018) 940–953. doi:10.1109/TIFS.2017.2774451.
  • (11) L. Zhu, X. Tang, M. Shen, X. Du, M. Guizani, Privacy-preserving ddos attack detection using cross-domain traffic in software defined networks, IEEE Journal on Selected Areas in Communications (2018) 1–1. doi:10.1109/JSAC.2018.2815442.
  • (12) Y. Zhang, L. Zhuo, Y. Peng, J. Zhang, A secure image retrieval method based on homomorphic encryption for cloud computing, in: 2014 19th International Conference on Digital Signal Processing, 2014, pp. 269–274. doi:10.1109/ICDSP.2014.6900669.
  • (13) L. Zhang, T. Jung, K. Liu, X. Y. Li, X. Ding, J. Gu, Y. Liu, Pic: Enable large-scale privacy preserving content-based image search on cloud, IEEE Transactions on Parallel and Distributed Systems 28 (11) (2017) 3258–3271. doi:10.1109/TPDS.2017.2712148.
  • (14) Z. Xia, Y. Zhu, X. Sun, Z. Qin, K. Ren, Towards privacy-preserving content-based image retrieval in cloud computing, IEEE Transactions on Cloud Computing 6 (1) (2018) 276–286. doi:10.1109/TCC.2015.2491933.
  • (15) J. Yuan, S. Yu, L. Guo, Seisa: Secure and efficient encrypted image search with access control, in: 2015 IEEE Conference on Computer Communications (INFOCOM), 2015, pp. 2083–2091. doi:10.1109/INFOCOM.2015.7218593.
  • (16) X. Yuan, X. Wang, C. Wang, A. C. Squicciarini, K. Ren, Towards privacy-preserving and practical image-centric social discovery, IEEE Transactions on Dependable and Secure Computing (2017) 1–1. doi:10.1109/TDSC.2016.2609930.
  • (17) Z. Xia, X. Wang, L. Zhang, Z. Qin, X. Sun, K. Ren, A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing, IEEE Transactions on Information Forensics and Security 11 (11) (2016) 2594–2608. doi:10.1109/TIFS.2016.2590944.
  • (18) L. Weng, L. Amsaleg, T. Furon, Privacy-preserving outsourced media search, IEEE Transactions on Knowledge and Data Engineering 28 (10) (2016) 2738–2751. doi:10.1109/TKDE.2016.2587258.
  • (19) B. Ferreira, J. Rodrigues, J. Leitão, H. Domingos, Privacy-preserving content-based image retrieval in the cloud, in: 2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS), 2015, pp. 11–20. doi:10.1109/SRDS.2015.27.
  • (20) X. Zhang, H. Cheng, Histogram-based retrieval for encrypted jpeg images, in: 2014 IEEE China Summit International Conference on Signal and Information Processing (ChinaSIP), 2014, pp. 446–449. doi:10.1109/ChinaSIP.2014.6889282.
  • (21) H. Cheng, X. Zhang, J. Yu, F. Li, Markov process based retrieval for encrypted jpeg images, in: 2015 10th International Conference on Availability, Reliability and Security, 2015, pp. 417–421. doi:10.1109/ARES.2015.18.
  • (22) C. Zhang, J. Li, S. Wang, Z. Wang, An encrypted medical image retrieval algorithm based on dwt-dct frequency domain, in: 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), 2017, pp. 135–141. doi:10.1109/SERA.2017.7965719.
  • (23) B. S. Manjunath, J. R. Ohm, V. V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (6) (2001) 703–715. doi:10.1109/76.927424.
  • (24) M. Mejía-Lavalle, C. P. Lara, J. R. Ascencio, The mpeg-7 visual descriptors: A basic survey, in: 2013 International Conference on Mechatronics, Electronics and Automotive Engineering, 2013, pp. 115–120. doi:10.1109/ICMEAE.2013.46.
  • (25) T. Jung, X. Y. Li, M. Wan, Collusion-tolerable privacy-preserving sum and product calculation without secure channel, IEEE Transactions on Dependable and Secure Computing 12 (1) (2015) 45–57. doi:10.1109/TDSC.2014.2309134.
  • (26) J. Li, J. Z. Wang, Automatic linguistic indexing of pictures by a statistical modeling approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9) (2003) 1075–1088. doi:10.1109/TPAMI.2003.1227984.
  • (27) J. Z. Wang, J. Li, G. Wiederhold, Simplicity: semantics-sensitive integrated matching for picture libraries, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (9) (2001) 947–963. doi:10.1109/34.955109.