PassBio: Privacy-Preserving User-Centric Biometric Authentication

11/14/2017 ∙ by Kai Zhou, et al. ∙ Michigan State University

The proliferation of online biometric authentication has necessitated security requirements of biometric templates. The existing secure biometric authentication schemes feature a server-centric model, where a service provider maintains a biometric database and is fully responsible for the security of the templates. The end-users have to fully trust the server in storing, processing and managing their private templates. As a result, the end-users' templates could be compromised by outside attackers or even the service provider itself. In this paper, we propose a user-centric biometric authentication scheme (PassBio) that enables end-users to encrypt their own templates with our proposed light-weighted encryption scheme. During authentication, all the templates remain encrypted such that the server will never see them directly. However, the server is able to determine whether the distance of two encrypted templates is within a pre-defined threshold. Our security analysis shows that no critical information of the templates can be revealed under both passive and active attacks. PassBio follows a "compute-then-compare" computational model over encrypted data. More specifically, our proposed Threshold Predicate Encryption (TPE) scheme can encrypt two vectors x and y in such a manner that the inner product of x and y can be evaluated and compared to a pre-defined threshold. TPE guarantees that only the comparison result is revealed and no key information about x and y can be learned. Furthermore, we show that TPE can be utilized as a flexible building block to evaluate different distance metrics such as Hamming distance and Euclidean distance over encrypted data. Such a compute-then-compare computational model, enabled by TPE, can be widely applied in many interesting applications such as searching over encrypted data while ensuring data security and privacy.


I Introduction

Biometric authentication is widely used in services such as access control to authenticate individuals based on their biometric traits. Unlike passwords or identity documents used in conventional authentication systems, biometric traits such as fingerprints, irises and behavioral characteristics are physically linked to an individual and cannot be easily manipulated. Because of this strong connection, the security and privacy of the biometric templates used in the authentication process is a critical issue [15, 16, 28].

Existing biometric authentication systems generally employ a two-phase mechanism [28]. In the registration phase, an end-user submits her biometric template to the service provider, who stores the template along with the end-user’s ID in a central database. In the query phase, an end-user requesting access to certain services submits a fresh template to the service provider for authentication. Based on the end-user’s ID, the service provider retrieves the enrolled template for comparison. The end-user is successfully authenticated only if the two templates are close enough under a certain distance metric.
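The two-phase mechanism described above can be sketched as follows. This is only an illustrative baseline of the conventional flow, not code from the paper: the in-memory `database`, the function names, the Euclidean metric and the sample values are all assumptions made for the example. Note that the enrolled template is stored and compared in plaintext.

```python
import math

# Hypothetical in-memory database: user ID -> enrolled plaintext template.
database = {}

def register(user_id, template):
    """Registration phase: the server stores the template in plaintext."""
    database[user_id] = list(template)

def authenticate(user_id, fresh_template, threshold):
    """Query phase: accept only if the fresh template is close enough."""
    enrolled = database.get(user_id)
    if enrolled is None:
        return False
    # Euclidean distance is an assumption; any distance metric could be used.
    return math.dist(enrolled, fresh_template) <= threshold

register("alice", [10, 20, 30])
print(authenticate("alice", [11, 19, 30], threshold=2.0))  # noisy re-capture
print(authenticate("alice", [40, 5, 60], threshold=2.0))   # unrelated template
```

Because biometric captures are noisy, the comparison is a threshold test rather than an equality test, which is why a plain hash of the template cannot replace it.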

The above biometric authentication model can be regarded as server-centric. That is, the service provider receives end-users’ biometric templates in plaintext and is fully responsible for the security of the templates. Such an approach has several inherent deficiencies. First, the end-users have to fully trust the service provider to properly handle their templates; otherwise the security and privacy of the templates are at risk. For example, different service providers may cross-check their databases to discover duplicates, revealing that the same end-user is enrolled in different services. As a consequence, the privacy of the end-user is violated. Second, unlike passwords, biometric templates are inherently noisy. As a result, the fresh template to be authenticated is not necessarily the same as the registered template. This property prevents the service provider from keeping the templates encrypted during the whole authentication process. At some point, the templates have to be recovered in plaintext for distance computation and comparison. This gives adversaries the opportunity to spy on the registered or freshly submitted templates.

To address the above issues, we propose a user-centric model for biometric authentication. In terms of security, such a user-centric model has several unique features, compared to the server-centric model. First, biometric templates are encrypted at user side and then transmitted to the server. The service provider is only able to see encrypted versions of the registered templates and query templates. Second, the secret keys and the templates are generated and processed locally thus never leaving the local environment. Third, computations involved in authentication are all carried out on ciphertext, meaning that no templates are exposed in plaintext during the authentication. These features can effectively reduce the possibility for the server as well as outside adversaries to learn any key information of the biometric templates.

To meet the demands of the proposed user-centric model, the underlying encryption scheme should be efficient and expose as little information as possible. Since key management and encryption are carried out on the user side, the encryption scheme should be computationally efficient. Some existing encryption schemes that rely on heavy cryptographic operations, such as Predicate Encryption (PE) [22, 29], Inner Product Encryption (IPE) [4, 2, 23, 10] and Homomorphic Encryption (HE) [31, 27], may not be practical in such a scenario. Also, the encryption scheme should support certain kinds of computation on encrypted data. For example, given two encrypted vectors, the server should be able to decide whether the two vectors are close enough (e.g., within a certain threshold) under some distance metric. The encryption scheme should expose as little template information as possible for security and privacy. Although some distance-preserving transformation schemes [36] have been proposed for private nearest neighbor search on encrypted data, these schemes inevitably expose the distance between the registered and query templates, which makes them vulnerable to security attacks [36].

In this paper, we propose a new primitive named Threshold Predicate Encryption (TPE). TPE encrypts two vectors $x$ and $y$ as a ciphertext $C_x$ and a token $T_y$, respectively. Unlike traditional cryptosystems, the decryption of TPE reveals only whether the inner product of $x$ and $y$ is within a threshold or not, instead of the plaintext. Therefore, no further information about the vectors or the inner product is exposed. TPE is fundamentally different from previous schemes such as IPE [23] and PE [22]. IPE reveals the inner product of $x$ and $y$, and thus the distance between the registered template and the query template, which makes the scheme vulnerable to security attacks [36]. PE can only reveal whether the inner product is equal to a threshold or not. It is not flexible enough for biometric authentication, since generally we want to know whether the distance between the two templates is within a threshold. In comparison, our proposed TPE provides an excellent trade-off between information leakage and flexibility, which makes it uniquely suitable for biometric authentication.

TPE enables a compute-then-compare computational model over encrypted data. In this model, given ciphertexts, any party is able to compute the distance between the underlying plaintexts and then compare the distance with a threshold. The output is an indicator showing whether the distance is within the threshold or not. We show that such a computational model captures the essence of various applications such as privacy-preserving biometric identification and searching over encrypted data. TPE based schemes are able to fulfill the requirements of such applications while ensuring the security and privacy of the data.

The main contributions of this paper are summarized as follows:

  • We propose a user-centric biometric authentication scheme enabling end-users to utilize their biometric templates for authentication while preserving template privacy.

  • We propose a new primitive named TPE that can encrypt two vectors $x$ and $y$ in such a manner that the decryption result only reveals whether the inner product of $x$ and $y$ is within a threshold or not.

  • The proposed TPE enables a compute-then-compare computational model over encrypted data. We show that such a computational model can be applied to many privacy-preserving applications such as biometric identification and searching over encrypted data.

The rest of this paper is organized as follows. In Section III, we introduce the system model as well as the threat model where different attacks are identified. We then illustrate the design of TPE and give a detailed implementation in Section IV. Based on TPE, we propose the user-centric biometric authentication scheme in Section V, where different similarity measurements are considered. We give detailed security analysis of TPE in Section VI. In Section VII, we introduce some applications of TPE such as outsourced biometric identification and searching over encrypted data. We analyze the complexity of TPE and evaluate the performance of TPE through some simulations in Section VIII. We conclude in Section IX.

II Related Work

The proposed TPE scheme can be regarded as an instance of functional encryption. That is, given the decryption key, the decryption process actually produces a function of the underlying plaintext, instead of the plaintext itself. From an application point of view, biometric authentication and identification is closely related to finding the nearest neighbor of a given point (i.e., nn or $k$-nn search). Thus, in this section, we review some related works concerning these two topics.

II-A Functional Encryption and Controlled Disclosure

In a conventional cryptosystem, the decryption process will eventually recover the underlying plaintext $x$. As a result, all information about $x$ is disclosed. Many applications, however, require only partial disclosure of the information about $x$. For example, a financial organization wants to filter out those customers whose transactions exceed a certain amount. For privacy reasons, all the transactions of the customers are encrypted. In this case, instead of decrypting the transactions, a more desirable approach is to determine whether a transaction exceeds a certain amount without disclosing the transaction. Such application scenarios motivate the research on functional encryption [24, 5, 12]. In a functional encryption scheme, a decryption key $sk_f$ is associated with a function $f$. Given the ciphertext $C_x$, the decryption process will evaluate the function $f(x)$, where $x$ is the underlying plaintext. Note that in this process, the plaintext $x$ cannot be recovered. Thus, by issuing different decryption keys $sk_f$, functional encryption can actually implement controlled disclosure of the plaintext $x$.

Much research effort has been devoted to designing various functions for functional encryption schemes. Representative works are Predicate Encryption (PE) [22, 29] and Inner Product Encryption (IPE) [4, 2, 23, 10]. In PE, a message is modeled as a vector $x$ and a decryption key is associated with a vector $y$. The decryption result is meaningful (otherwise, a random number) if and only if the inner product of $x$ and $y$ is equal to $0$. Based on this basic implementation, different predicates are realized such as exact threshold, polynomial evaluation and set comparison. In contrast, IPE schemes recover the value of the inner product of $x$ and $y$ without revealing either $x$ or $y$. In the context of controlled disclosure, IPE discloses more information about the plaintext than PE. This is because with PE, one can only decide whether the inner product of $x$ and $y$ is equal to a certain value or not, while with IPE, one learns the value of the inner product. In comparison, with TPE, what we seek is to control the amount of disclosed information so that it lies between those of PE and IPE. As a result, TPE can efficiently fulfill the task of biometric authentication while exposing less information about the templates.
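The difference in disclosure between the three primitives can be made concrete with a small sketch. It contains no cryptography: each function simply computes, on plaintext vectors, the value that the corresponding decryption is described as revealing. The function names and the sample vectors are assumptions made for illustration.

```python
def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))

def ipe_output(x, y):
    """IPE-style disclosure: the exact inner product."""
    return inner_product(x, y)

def pe_output(x, y):
    """PE-style disclosure: only whether the inner product equals 0."""
    return inner_product(x, y) == 0

def tpe_output(x, y, theta):
    """TPE-style disclosure: only whether the inner product is within theta."""
    return inner_product(x, y) <= theta

x, y = [1, 2, 3], [4, -5, 2]          # inner product: 4 - 10 + 6 = 0
print(ipe_output(x, y))               # the exact value is exposed
print(pe_output(x, y))                # one bit: equal to 0 or not
print(tpe_output(x, y, theta=5))      # one bit: within the threshold or not
```

Both PE and TPE reveal a single bit per decryption, but only the threshold test tolerates the noise inherent in biometric templates.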

II-B Secure $k$-nn Search

The problem of secure $k$-nn search can be described as finding the $k$ nearest neighbors ($k$-nn) of a given query point among a set of encrypted points. The schemes [11, 8, 34, 36, 35] for secure $k$-nn search mainly differ in the attack models they consider and the security levels they provide. For instance, the scheme in [34] focused on search efficiency at the cost of partial privacy leakage. Both [36] and [35] considered a stronger known-plaintext attack model. The basic ideas of these two schemes are quite similar. Given two encrypted points in the data set and one encrypted query, the comparison process in the schemes is able to determine which point is closer to the query point. Repeating this comparison process will finally reveal which point in the data set is the nearest neighbor to the query point.

Our proposed TPE scheme utilizes techniques similar to those in [35]. However, the computational models as well as the security requirements are fundamentally different. In the biometric identification scheme in [35], given a query template, the server is able to identify the closest template in the database, which is returned to the end-user. After decrypting the returned template, the end-user is able to calculate the distance and determine whether the distance is within a threshold. We note that such a computational model cannot be easily applied to biometric authentication. This is because in biometric authentication, it is the server that compares the distance with a threshold, while the server is not allowed to decrypt the templates and therefore cannot calculate the distance. Moreover, secure $k$-nn based approaches inherently expose more information than needed. From $k$-nn search, a server can learn the relative distances between a query template and all the templates in the database. Such information is more than needed for biometric authentication and identification, where ideally, the server only needs to know whether the distance exceeds a pre-defined threshold.

III Problem Statement

III-A System model

We consider an online biometric authentication system consisting of two parties: an online service provider and a set of end-users. The service provider provides certain online services such as storage to its authenticated end-users. We assume that every end-user possesses a device such as a mobile phone that is able to collect her biometric traits and transform the traits into biometric templates at the local side. Without loss of generality, we assume that each biometric template is represented by an $n$-dimensional vector $x = (x_1, \dots, x_n)$ of real numbers.

The biometric authentication process consists of two phases. In the registration phase, an end-user $U_i$ registers with her biometric template $x$ along with a unique identifier $ID_i$. We note that the template is sent to the service provider in encrypted form, denoted as $C_x$, and $ID_i$ can be any pseudorandom string that uniquely identifies $U_i$ within the system. The tuple $(ID_i, C_x)$ for the end-user $U_i$ is then stored at the server side by the service provider. In the query phase, when the end-user $U_i$ desires to authenticate herself to the service provider, she locally generates a fresh biometric template $y$ and sends the tuple $(ID_i, T_y)$ to the service provider, where $T_y$ is the encrypted form of $y$. On receiving the query, the service provider retrieves the record $(ID_i, C_x)$ by searching for $ID_i$ in the server. The distance between $x$ and $y$ is then evaluated based on $C_x$ and $T_y$. If the distance is within a certain threshold $\theta$, the service provider views the end-user as a valid user. We also note that during the query phase, the service provider is only able to derive whether the distance between $x$ and $y$ is within the threshold $\theta$, not the exact distance between them.

III-B Threat model

We assume the end-users are fully trusted in the registration phase. That is, they will honestly generate their own biometric templates and register at the service provider using the encrypted templates. In the query phase, we assume the encryption and decryption algorithms are publicly known. However, the secret keys are generated and kept secret at the local side throughout the whole authentication process. We do allow the adversaries to submit their own biometric templates through the local device. In this case, the local device acts as an oracle to encrypt templates and submit the encrypted templates to the service provider. The service provider can be honest-but-curious or malicious. In the former case, the service provider will honestly follow the protocol but will try to obtain any useful information of end-users’ biometric templates based only on the encrypted templates. In the latter case, the adversaries may collude with the service provider such as sharing with the service provider the invalid templates that are submitted through the local devices. In summary, depending on the different capabilities of the service provider and the adversaries, we propose two attack models as follows.

  1. Passive Attack: the service provider knows the registered record $(ID_i, C_x)$ for end-user $U_i$ and observes a series of submitted queries $T_{y_1}, T_{y_2}, \dots$. However, the service provider does not know the underlying templates in plaintext. Such an attack model is also known as the Ciphertext-Only-Attack in cryptography.

  2. Active Attack: besides the registered record $(ID_i, C_x)$ for end-user $U_i$, the service provider is able to observe a series of submitted queries $T_{y_1}, T_{y_2}, \dots$ as well as the corresponding plaintexts $y_1, y_2, \dots$. Such an attack model corresponds to the Chosen-Plaintext-Attack in cryptography. In practice, an adversary may submit her own templates through the local device. The service provider can then collude with the adversary to obtain the queries in plaintext as well as the encrypted queries.

Informally, the security requirement of biometric authentication is that the service provider is unable to learn more information about the templates than is allowed through the authentication process. In particular, it should be possible for the service provider to determine whether the distance between two templates is within a threshold or not, but infeasible to derive any key information about the registered template or the query templates. We will formally define the security against both attacks in Section VI.

IV Proposed Threshold Predicate Encryption Scheme

A user-centric privacy-preserving biometric authentication scheme requires that an end-user is able to encrypt her registered biometric template as well as the freshly generated query templates. For the service provider, given two encrypted templates, it should be able to determine the distance between the two templates and compare the distance with a threshold. In this section, we introduce Threshold Predicate Encryption (TPE) that can fulfill the functionalities required by such a biometric authentication system.

IV-A Framework

Our proposed privacy-preserving biometric authentication scheme is based on the new primitive named Threshold Predicate Encryption (TPE). Generally speaking, TPE can be regarded as an instance of functional encryption [24, 5], where decryption outputs a function of the plaintext instead of the plaintext itself. The framework of functional encryption can be briefly summarized as follows. A plaintext vector $x$ is encrypted as $C_x$ and a secret key associated with a vector $y$ is generated as $sk_y$. Given $C_x$ and $sk_y$, the decryption gives the value of $f(x, y)$, where $f$ is a pre-defined function. Two notable instances of functional encryption are Inner Product Encryption (IPE) [23] and Predicate Encryption (PE) [22]. The function $f$ in IPE is the inner product. That is, the decryption of IPE gives the inner product of $x$ and $y$. In comparison, PE produces a meaningful decryption result (e.g., a fixed flag value) if and only if the inner product of $x$ and $y$ is $0$. Otherwise, the decryption result is just some random number. An important predicate is that the inner product of $x$ and $y$ equals $0$. Based on this, an extension of PE can implement exact threshold predicate encryption, meaning that the decryption result is meaningful only if the inner product of $x$ and $y$ is equal to a pre-defined threshold $\theta$.

At a high level, functional encryption aims at revealing only limited information about the plaintext. As introduced above, IPE reveals the inner product of the plaintext and a vector. PE reveals whether the inner product is equal to $0$ (or a threshold) or not. In application scenarios like biometric authentication, the amounts of information revealed by IPE and PE are both inappropriate. As shown in our later analysis, the inner product of $x$ and $y$ can be modeled as the distance between the registered template and the query template. As a result, IPE gives the exact distance between the two templates, which exposes too much information. With PE, one can decide whether the distance between the two templates is equal to a certain threshold, which is not sufficient for authentication purposes. What we need is a functional encryption scheme that can determine whether the distance between the two templates is within a threshold or not. Specifically, a TPE scheme is composed of five algorithms:

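  • TPE.Setup $\to param$: the setup algorithm generates system parameters $param$.

  • TPE.KeyGen($\lambda$) $\to sk$: on input of a security parameter $\lambda$, the key generation algorithm generates a secret key $sk$.

  • TPE.Enc($sk$, $x$) $\to C_x$: given a vector $x$ and the secret key $sk$, the encryption algorithm encrypts $x$ as ciphertext $C_x$.

  • TPE.TokenGen($sk$, $y$) $\to T_y$: given a vector $y$ and the secret key $sk$, the token generation algorithm generates a token $T_y$ for $y$.

  • TPE.Dec($C_x$, $T_y$) $\to \Lambda$: given the ciphertext $C_x$ and the token $T_y$, the decryption algorithm outputs a result $\Lambda$ satisfying

    $\Lambda = 1$ if $x \circ y \le \theta$, and $\Lambda = 0$ otherwise,

    where $x \circ y$ is the inner product of $x$ and $y$.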

IV-B Design of TPE

While our proposed TPE scheme utilizes some similar techniques as the biometric identification scheme in [35], the settings of biometric authentication are fundamentally different. In particular, our proposed TPE is designed to address the following challenges.

Challenge 1

The system and threat model of outsourced biometric identification and biometric authentication are different. In biometric identification, the database owner possesses the encryption and decryption keys. The aim of the server is to identify the template closest to the query template. Then the database owner will retrieve the template, decrypt it and compare the distance to a threshold. However, in our scenario, the server does not possess the decryption key thus is unable to decrypt the encrypted template and calculate the distance. What we need is an encryption scheme that can directly determine whether the distance between the query template and the registered template is within the threshold based only on ciphertexts.

Challenge 2

The computations involved in biometric identification and authentication are different. In biometric identification, the server needs to compute and compare the distances between a query template and all the templates in the database. However, in biometric authentication, we need to compute a single distance and compare it with a threshold.

Challenge 3

The decryption process in [35] will output a randomized distance between a query template and registered template. From this randomized distance, it is not easy to directly compare it with a threshold without first recovering the actual distance.

To address the above challenges, we first embed the threshold into the registered templates. To enhance security, we pad the templates with one-time randomness in a special manner and apply a random permutation to both the query template and the registered template. After all these transformations, the decryption process can derive $x \circ y - \theta$, where $x \circ y$ denotes the distance between a registered template $x$ and a query template $y$. However, if we output this value directly, it is inevitable that the exact value of $x \circ y$ will be exposed. Therefore, we introduce more one-time randomness into the encrypted templates. As a result, the decryption result becomes $\alpha\beta\,(x \circ y - \theta)$, where $\alpha$ and $\beta$ are positive one-time random numbers associated with $x$ and $y$, respectively. This design reveals only enough information to determine whether the distance between $x$ and $y$ is within the threshold $\theta$, and at the same time conceals the exact value of the distance.

IV-C Construction of TPE

Following the aforementioned design of our threshold predicate encryption scheme, we give a detailed implementation in Protocol 1.

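Input: $x$, $y$, $\theta$.
Output: $\Lambda$.

TPE.Setup $\to param$:

1:  Set the system parameters $param$.

TPE.KeyGen($\lambda$) $\to sk$:

1:  Randomly generate two non-singular $(n+3) \times (n+3)$ matrices $M_1$ and $M_2$ and calculate their inverses $M_1^{-1}$ and $M_2^{-1}$.
2:  Choose a random permutation $\pi$ of the $n+3$ coordinates.
3:  Set $sk = (M_1, M_2, M_1^{-1}, M_2^{-1}, \pi)$.

TPE.Enc($sk$, $x$) $\to C_x$:

1:  Generate two random numbers $\alpha > 0$ and $r_x$.
2:  Extend the vector $x$ to an $(n+3)$-dimensional vector $x' = (\alpha x_1, \dots, \alpha x_n, -\alpha\theta, r_x, 0)$.
3:  Permute $x'$ to obtain $x'' = \pi(x')$.
4:  Transform $x''$ to a diagonal matrix $X$ with $x''$ being the diagonal.
5:  Generate a random lower triangular matrix $S_x$ with the diagonal entries fixed as $1$.
6:  Compute $C_x = M_1 S_x X M_2$.

TPE.TokenGen($sk$, $y$) $\to T_y$:

1:  Generate two random numbers $\beta > 0$ and $r_y$.
2:  Extend $y$ to an $(n+3)$-dimensional vector $y' = (\beta y_1, \dots, \beta y_n, \beta, 0, r_y)$.
3:  Permute $y'$ to obtain $y'' = \pi(y')$.
4:  Transform $y''$ to a diagonal matrix $Y$ with $y''$ being the diagonal.
5:  Generate a random lower triangular matrix $S_y$ with the diagonal entries fixed as $1$.
6:  Compute $T_y = M_2^{-1} Y S_y M_1^{-1}$.

TPE.Dec($C_x$, $T_y$) $\to \Lambda$:

1:  Compute $I = Tr(C_x T_y)$, where $Tr(\cdot)$ denotes the trace of a matrix.
2:  Set $\Lambda = 1$ if $I \le 0$; otherwise set $\Lambda = 0$.
Protocol 1 Threshold Predicate Encryption (TPE) Scheme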
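The steps of Protocol 1 can be exercised with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the padded layouts $(\alpha x, -\alpha\theta, r_x, 0)$ and $(\beta y, \beta, 0, r_y)$, the dimension $n+3$, all function names, and the use of small random integers with exact `Fraction` arithmetic instead of real-valued randomness, chosen so that the threshold test is not affected by rounding. Such a toy parameter choice offers no security.

```python
import random
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Exact Gauss-Jordan inverse; raises ValueError if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def random_invertible(m, rng):
    while True:  # retry until the random integer matrix is non-singular
        A = [[rng.randint(-9, 9) for _ in range(m)] for _ in range(m)]
        try:
            return A, inverse(A)
        except ValueError:
            continue

def unit_lower_triangular(m, rng):
    return [[1 if i == j else (rng.randint(-9, 9) if j < i else 0)
             for j in range(m)] for i in range(m)]

def diag(v):
    return [[v[i] if i == j else 0 for j in range(len(v))] for i in range(len(v))]

def keygen(n, rng):
    m = n + 3  # assumed padded dimension
    M1, M1_inv = random_invertible(m, rng)
    M2, M2_inv = random_invertible(m, rng)
    perm = list(range(m))
    rng.shuffle(perm)
    return {"M1": M1, "M1_inv": M1_inv, "M2": M2, "M2_inv": M2_inv, "perm": perm}

def encrypt(sk, x, theta, rng):
    alpha, r_x = rng.randint(1, 9), rng.randint(-9, 9)
    ext = [alpha * v for v in x] + [-alpha * theta, r_x, 0]
    X = diag([ext[i] for i in sk["perm"]])
    S_x = unit_lower_triangular(len(ext), rng)
    return matmul(matmul(sk["M1"], matmul(S_x, X)), sk["M2"])

def token(sk, y, rng):
    beta, r_y = rng.randint(1, 9), rng.randint(-9, 9)
    ext = [beta * v for v in y] + [beta, 0, r_y]
    Y = diag([ext[i] for i in sk["perm"]])
    S_y = unit_lower_triangular(len(ext), rng)
    return matmul(matmul(sk["M2_inv"], matmul(Y, S_y)), sk["M1_inv"])

def decrypt(C, T):
    P = matmul(C, T)
    trace = sum(P[i][i] for i in range(len(P)))  # alpha*beta*(x.y - theta)
    return 1 if trace <= 0 else 0

rng = random.Random(2017)
sk = keygen(3, rng)
C_x = encrypt(sk, [1, 2, 3], theta=10, rng=rng)
print(decrypt(C_x, token(sk, [1, 1, 1], rng)))  # inner product 6, within 10
print(decrypt(C_x, token(sk, [3, 3, 3], rng)))  # inner product 18, exceeds 10
```

The decryptor only ever sees the sign of a randomly scaled difference, which matches the design goal stated in Section IV-B; a real deployment would need the parameter sizes and randomness distributions analysed in the security section.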

Now, we prove the correctness of the proposed TPE scheme. For a square matrix $A$, the trace $Tr(A)$ is defined as the sum of the diagonal entries of $A$. Given an invertible matrix $M$ of the same size, the transformation $M A M^{-1}$ is called a similarity transformation of $A$. We have the following lemma from linear algebra.

Lemma 1.

The trace of a square matrix remains unchanged under similarity transformation. That is, $Tr(A) = Tr(M A M^{-1})$.

Based on Lemma 1, we have the following theorem.

Theorem 1.

For the proposed TPE scheme in Protocol 1, $\Lambda$ equals $1$ if and only if $x \circ y \le \theta$, where $x \circ y$ denotes the inner product of $x$ and $y$.

Proof:

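Following the procedure in Protocol 1, the vector $x$ is transformed to $C_x = M_1 S_x X M_2$. The vector $y$ is transformed to $T_y = M_2^{-1} Y S_y M_1^{-1}$. Then we have $I = Tr(C_x T_y) = Tr(M_1 S_x X Y S_y M_1^{-1})$. From Lemma 1, we have $I = Tr(S_x X Y S_y)$. Since $S_x$ and $S_y$ are selected as lower triangular matrices, where all the diagonal entries are set to $1$, the matrices $S_x X$ and $Y S_y$ are lower triangular and their diagonal entries are the same as those of $X$ and $Y$. The product of two lower triangular matrices is again lower triangular, with diagonal entries equal to the products of the corresponding diagonal entries. Thus we have $I = Tr(XY)$. Since $X$ and $Y$ are diagonal matrices, $Tr(XY) = x'' \circ y'' = x' \circ y' = \alpha\beta\,(x \circ y - \theta)$. Since $\alpha$ and $\beta$ are positive, we have $I \le 0$ (i.e., $\Lambda = 1$) if and only if $x \circ y \le \theta$.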

V Biometric Authentication Under Different Distance Metrics

In this section, we will first introduce some necessary background on biometric authentication. Then, we show how to construct privacy-preserving biometric authentication systems utilizing our proposed TPE scheme under different distance metrics.

V-A Background

The first critical step in biometric authentication is to efficiently transform biometric traits into templates that are easy to compute on. Such a process is often called feature extraction. The extracted features are often represented as feature vectors. Depending on the biometric traits, the process as well as the result of feature extraction could differ. For example, a fingerprint can be transformed to a FingerCode [18, 19, 17], which is a vector of integers with dimension $640$. An iris image is often represented as a binary string of $2048$ bits. In the following, we briefly review the feature extraction process of fingerprints. The details can be found in [19, 17].

As illustrated in Fig. 1 (partially obtained from [17]), given an image of a fingerprint, the first step is to identify a reference point. Then the region of interest around the reference point is divided into bands and sectors. Those sectors are further normalized and filtered by different Gabor filters. Finally, the features are extracted from each filtered image. The final result is a $640$-dimensional vector (FingerCode) representing each fingerprint image, where each entry in the vector is an $8$-bit integer. An important feature of the FingerCode is that it is translation invariant, meaning that translation of the fingerprint image does not result in much difference in the FingerCode. However, FingerCode is not rotation invariant. As a result, rotation of images will often produce different FingerCodes. To resolve this issue, a user is often associated with several (for example, 5) FingerCodes captured from rotated images in the database. In the following discussion, we assume that at the local side, there exists a sensor that can capture the end-user’s biometric trait and transform it to a multi-dimensional vector.

Fig. 1: Feature extraction of fingerprints: (i) Identify reference point; (ii) Divide region of interest into sectors around reference point; (iii) Filter region of interest; (iv) Extract features.

In a user-centric biometric authentication system, an end-user sends her encrypted biometric template to the service provider in the registration phase. In the query phase, the end-user encrypts a freshly generated template and sends it to the service provider for authentication. Thus, a critical issue is to decide whether two templates are close enough. This problem reduces to measuring the distance between two vectors in a metric space and comparing the distance to a certain threshold. Such a compute-then-compare computational model on encrypted data is well suited for our proposed TPE scheme.

Furthermore, different biometric templates often rely on different similarity measurements. For example, in Iris recognition, the templates are represented by binary vectors and the similarity is generally measured by Hamming distance. For fingerprint, the Euclidean distance is normally utilized to measure the similarity. Our proposed TPE scheme is highly flexible in that it can be applied to measuring similarity based on different distance metrics. As a result, TPE can be utilized as the critical component to build different privacy-preserving biometric authentication systems. In the rest of this section, we will illustrate how to utilize TPE to construct a biometric authentication scheme based on Euclidean distance, Hamming distance and so on.

V-B Euclidean Distance

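Euclidean distance is often used to measure the similarity between vectors of non-binary entries. A FingerCode representing a fingerprint is an $n$-dimensional vector, where each entry is an $l$-bit integer. Typically, $n = 640$ and $l = 8$. We denote a registered FingerCode as $x = (x_1, \dots, x_n)$ and a query FingerCode as $y = (y_1, \dots, y_n)$. Let $d_E$ be the Euclidean distance between $x$ and $y$. Then we have

$$d_E^2 = \sum_{i=1}^{n} (x_i - y_i)^2 = \sum_{i=1}^{n} x_i^2 - 2\, x \circ y + \sum_{i=1}^{n} y_i^2,$$

where $x \circ y$ is the inner product of $x$ and $y$. Let $\theta$ be a pre-defined threshold. Our goal is to extend $x$ and $y$ to vectors $x'$ and $y'$ respectively such that the relation $d_E \le \theta$ can be determined through computing $x' \circ y'$. In light of this, one suitable choice is $x' = (x_1, \dots, x_n, \sum_{i} x_i^2 - \theta^2, 1)$ and $y' = (-2y_1, \dots, -2y_n, 1, \sum_{i} y_i^2)$. Then we have

$$x' \circ y' = \sum_{i=1}^{n} x_i^2 - 2\, x \circ y + \sum_{i=1}^{n} y_i^2 - \theta^2 = d_E^2 - \theta^2,$$

so that $d_E \le \theta$ if and only if $x' \circ y' \le 0$.

To secure the biometric templates, we further add one-time randomness to the extended vectors as shown in Protocol 2. The rest of the encryption procedure is then the same as in TPE.Enc and TPE.TokenGen.

As presented in Protocol 2, during the registration phase, an end-user encrypts her template $x$ as $C_x$ and registers $C_x$ along with her identity $ID_i$ at the service provider. During the query phase, the end-user encrypts a freshly generated template $y$ as $T_y$ and sends $T_y$ to the service provider. Then the service provider runs TPE.Dec with inputs $C_x$ and $T_y$ and outputs an authentication result. The correctness of this scheme is guaranteed by Theorem 1, with a slight adaptation to Euclidean distance. That is, $\Lambda = 1$ if and only if $d_E \le \theta$.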

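Input: $x$, $y$, $\theta$.
Output: $\Lambda$.

Setup (End-user $U_i$):

1:  Set the public parameters $param$.
2:  Randomly generate two invertible matrices $M_1$ and $M_2$ with dimension $(n+4) \times (n+4)$ and a permutation $\pi$.
3:  Set secret key $sk = (M_1, M_2, M_1^{-1}, M_2^{-1}, \pi)$.

Registration (End-user $U_i$):

1:  Generate random numbers $\alpha > 0$ and $r_x$. Extend $x$ to an $(n+4)$-dimensional vector $x' = (\alpha x_1, \dots, \alpha x_n, \alpha(\sum_{i} x_i^2 - \theta^2), \alpha, r_x, 0)$.
2:  Permute $x'$ to obtain $x'' = \pi(x')$.
3:  Transform $x''$ to a diagonal matrix $X$ with $x''$ being the diagonal.
4:  Generate a random lower triangular matrix $S_x$ with the diagonal entries fixed as $1$. Compute $C_x = M_1 S_x X M_2$.
5:  Register the record $(ID_i, C_x)$ to the service provider $SP$, where $ID_i$ is the identity of end-user $U_i$.

Query (End-user $U_i$):

1:  Generate random numbers $\beta > 0$ and $r_y$.
2:  Extend $y$ to an $(n+4)$-dimensional vector $y' = (-2\beta y_1, \dots, -2\beta y_n, \beta, \beta \sum_{i} y_i^2, 0, r_y)$.
3:  Permute $y'$ to obtain $y'' = \pi(y')$.
4:  Transform $y''$ to a diagonal matrix $Y$ with diagonal being $y''$.
5:  Generate a random lower triangular matrix $S_y$ with the diagonal entries fixed as $1$. Compute $T_y = M_2^{-1} Y S_y M_1^{-1}$.
6:  Send the query $(ID_i, T_y)$ to $SP$.

Authentication (Service Provider $SP$):

1:  On receiving a query $(ID_i, T_y)$ from the end-user $U_i$, retrieve the registered record $(ID_i, C_x)$ according to $ID_i$.
2:  Compute $I = Tr(C_x T_y)$.
3:  Set $\Lambda = 1$ if $I \le 0$; otherwise set $\Lambda = 0$.
Protocol 2 Privacy Preserving Biometric Authentication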

V-C Distance in Hamming Space

From the construction of Euclidean distance, we know that the critical part in computing the distance through inner product lies in proper design of the extended vectors. Thus, in the following, we will focus on how to design the vectors in order to compute different distances.

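Hamming distance is a popular metric for measuring the similarity of binary templates such as iris codes. Now, we assume the registered template and query template are $\mathbf{x} = (x_1, \dots, x_n)$ and $\mathbf{y} = (y_1, \dots, y_n)$ respectively, where $x_i$ and $y_i$ are 0 or 1. To calculate the Hamming distance $d_H$ between $\mathbf{x}$ and $\mathbf{y}$, we first map the entries of $\mathbf{x}$ and $\mathbf{y}$ to $\pm 1$, with one bit value mapped to $1$ and the other to $-1$. Then we have

$$\mathbf{x} \circ \mathbf{y} = n - 2 d_H.$$

The condition $d_H \le \theta$ is equivalent to $\mathbf{x} \circ \mathbf{y} \ge n - 2\theta$. Thus, we need to design vectors $\mathbf{x}'$ and $\mathbf{y}'$ such that $\mathbf{x} \circ \mathbf{y} - (n - 2\theta)$ can be represented as $\mathbf{x}' \circ \mathbf{y}'$. In light of this, one suitable choice is $\mathbf{x}' = (x_1, \dots, x_n, 2\theta - n)$ and $\mathbf{y}' = (y_1, \dots, y_n, 1)$. The rest of the authentication process is then similar to Protocol 2.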
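The relation between Hamming distance and the inner product of the sign-mapped vectors can be verified directly. The following sketch assumes the convention 0 to -1 and 1 to +1 (the opposite convention gives the same inner product) and an assumed one-entry extension; the function names are illustrative only.

```python
def to_signs(bits):
    """Map binary entries to +/-1 (assumed convention: 0 -> -1, 1 -> +1)."""
    return [1 if b else -1 for b in bits]


def hamming_via_inner_product(x_bits, y_bits):
    """Recover the Hamming distance from the inner product: d_H = (n - <x, y>) / 2."""
    inner = sum(a * b for a, b in zip(to_signs(x_bits), to_signs(y_bits)))
    return (len(x_bits) - inner) // 2


def within_hamming_threshold(x_bits, y_bits, theta):
    """True iff d_H <= theta, decided from the sign of one inner product."""
    n = len(x_bits)
    x_ext = to_signs(x_bits) + [2 * theta - n]   # assumed extension of x
    y_ext = to_signs(y_bits) + [1]               # assumed extension of y
    return sum(a * b for a, b in zip(x_ext, y_ext)) >= 0


x = [1, 0, 1, 1, 0]
y = [1, 1, 1, 0, 0]   # differs from x in two positions
print(hamming_via_inner_product(x, y))
print(within_hamming_threshold(x, y, 2), within_hamming_threshold(x, y, 1))
```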

In fact, the Hamming distance between two binary vectors is just one specific distance metric. Many other metrics, such as Minkowski distance, Sokal & Michener similarity and Sokal & Sneath-II [7], have been introduced for different applications. Using our proposed TPE scheme, we are able to evaluate such metrics and compare them to a pre-defined threshold. The critical part is to properly design the vectors $\mathbf{x}'$ and $\mathbf{y}'$ given two binary vectors $\mathbf{x}$ and $\mathbf{y}$.

VI Security Analysis

In this section, we analyze the security of PassBio under both passive attack and active attack as defined in Section III. PassBio is designed so that the service provider is unable to learn any critical information about the registered and query templates other than what is already revealed by the decryption process, given an encrypted registered template and a sequence of encrypted query templates.

Since PassBio is based on our proposed TPE, we focus on the security analysis of TPE in the following discussion. An important difference between TPE and traditional symmetric encryption schemes is that the decryption process is carried out by the service provider, which could be malicious, and this process reveals whether the inner product is within a threshold or not. Therefore, in the security analysis of TPE, it is necessary to analyze the security of both the encryption and the decryption process, which are discussed separately in the following sections.

VI-A Encryption Security

We first give a sketch of the encryption security analysis. We use two experiments to model the ability of the adversary in the passive attack and the active attack, respectively. Then, we define the security of TPE under both attacks. Finally, we prove the security of TPE under active attack, since it implies security under passive attack.

VI-A1 Security against passive attack

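In our scenario, the passive attack corresponds to the ciphertext-only attack [21], where an adversary observes a sequence of ciphertexts. We define an experiment $\mathsf{Exp}^{\mathsf{mult}}_{\mathcal{A}}(\lambda)$ to simulate passive attacks, where the superscript denotes that the adversary is able to submit multiple messages instead of a single message.

1:  Given a security parameter $\lambda$, the adversary $\mathcal{A}$ outputs two sequences of messages $(m_{0,1}, \dots, m_{0,t})$ and $(m_{1,1}, \dots, m_{1,t})$, where the length of each message satisfies $|m_{0,i}| = |m_{1,i}|$.
2:  The challenger $\mathcal{C}$ runs the setup function to generate the secret key $SK$.
3:  $\mathcal{C}$ chooses a uniform bit $b \in \{0, 1\}$ and computes the ciphertext $c_i$ of $m_{b,i}$ with the token generation function for $i = 1, \dots, t$. The sequence $(c_1, \dots, c_t)$ is returned to $\mathcal{A}$.
4:  The adversary $\mathcal{A}$ outputs a bit $b'$.
5:  The output of the experiment is 1 if $b' = b$, and 0 otherwise.
Passive attack experiment $\mathsf{Exp}^{\mathsf{mult}}_{\mathcal{A}}(\lambda)$

Based on $\mathsf{Exp}^{\mathsf{mult}}_{\mathcal{A}}(\lambda)$, we now define the security of TPE under passive attack.

Definition 1.

The proposed TPE scheme is secure against passive attack if for every polynomial-time adversary $\mathcal{A}$, there is a negligible function $\mathsf{negl}$ such that

$$\Pr\left[\mathsf{Exp}^{\mathsf{mult}}_{\mathcal{A}}(\lambda) = 1\right] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$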

Remark 1.

In the above security definition, we only use the token generation function as a representative. This is because the operations involved in the encryption function and the token generation function are almost the same, so the security analysis for one applies to the other. Nevertheless, our security proof shows that both functions meet the security requirement.

Based on Definition 1, we have the following theorem.

Theorem 2.

The proposed TPE scheme is secure against passive attack.

We will omit the proof of Theorem 2. Instead, we will prove security against active attack since it implies the security under passive attack.

VI-A2 Security against active attack

Under the active attack, the service provider is able to observe a sequence of pairs of query templates and their encrypted versions. This can happen when, for example, some adversaries submit their templates and collude with the service provider. This attack scenario corresponds to the Chosen-Plaintext Attack (CPA) in cryptography. Accordingly, an encryption scheme has CPA-security if it is secure against CPA. To prove that TPE has CPA-security, we model the active attack using an adversary $\mathcal{A}$ and an experiment $\mathsf{Exp}^{\mathsf{cpa}}_{\mathcal{A}}(\lambda)$. We define CPA-security for TPE as follows.

1:  The setup function generates a secret key $SK$.
2:  The adversary $\mathcal{A}$ is given oracle access to the token generation function and outputs two messages $m_0$ and $m_1$ of the same length to the challenger $\mathcal{C}$.
3:  The challenger $\mathcal{C}$ chooses a uniform bit $b \in \{0, 1\}$, then computes the token $c$ for $m_b$ and returns $c$ to $\mathcal{A}$.
4:  $\mathcal{A}$ continues to have oracle access to the token generation function and outputs a bit $b'$. Note, however, that $\mathcal{A}$ cannot use the oracle to generate tokens for messages related to $m_0$ and $m_1$.
5:  The output of the experiment is 1 if $b' = b$, and 0 otherwise.
Active attack experiment $\mathsf{Exp}^{\mathsf{cpa}}_{\mathcal{A}}(\lambda)$

Definition 2.

The proposed TPE is secure against active attack if for every polynomial-time adversary $\mathcal{A}$, there is a negligible function $\mathsf{negl}$ such that

$$\Pr\left[\mathsf{Exp}^{\mathsf{cpa}}_{\mathcal{A}}(\lambda) = 1\right] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$

Remark 2.

Different from the passive attack experiment, the adversary will continually have oracle access to the token generation function. This models the situation where the adversary is able to observe multiple pairs of messages and their ciphertexts.

Remark 3.

Unlike the passive attack experiment where the adversary submits multiple pairs of messages, we only discuss the situation where the adversary submits one pair of messages to the challenger. This is because it is proved in [21] that any private-key encryption scheme that is CPA-secure is also CPA-secure for multiple encryptions. As a result, it is sufficient to prove that TPE is CPA-secure for one single encryption.
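To make the structure of the active attack experiment concrete, the game can be written as a small test harness. The harness below is a sketch: the interface (`keygen`, `token_gen`, an adversary object with `choose` and `guess`) is hypothetical, and the toy scheme plugged into it is deliberately insecure and is not TPE. It only illustrates what it means for an adversary to win the experiment with probability noticeably above one half.

```python
import random


def cpa_experiment(keygen, token_gen, adversary, rng):
    """Run one round of the active-attack game; return 1 if the adversary wins."""
    key = keygen(rng)
    oracle = lambda message: token_gen(key, message, rng)  # oracle access
    m0, m1 = adversary.choose(oracle)
    b = rng.randrange(2)                                   # challenger's uniform bit
    challenge = token_gen(key, (m0, m1)[b], rng)
    return 1 if adversary.guess(oracle, challenge) == b else 0


# Toy scheme (NOT TPE): every entry is multiplied by a fixed secret scalar,
# with no one-time randomness at all.
def toy_keygen(rng):
    return rng.randint(2, 1000)


def toy_token_gen(key, message, rng):
    return [key * v for v in message]


class ScalingAdversary:
    """Recovers the secret scalar with a single oracle query."""

    def choose(self, oracle):
        self.m0, self.m1 = [1, 2], [3, 4]
        return self.m0, self.m1

    def guess(self, oracle, challenge):
        key = oracle([1, 0])[0]
        return 0 if challenge == [key * v for v in self.m0] else 1


rng = random.Random(0)
wins = sum(cpa_experiment(toy_keygen, toy_token_gen, ScalingAdversary(), rng)
           for _ in range(100))
print(wins)
```

The toy scheme loses every round because its output is a deterministic function of the key and the message, which is the weakness that one-time randomness is meant to remove.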

Theorem 3.

The proposed TPE is secure against active attack.

Proof:

We need to prove that the adversary $\mathcal{A}$ cannot distinguish the tokens generated for $m_0$ and $m_1$, even given oracle access to the token generation function.

Consider the encryption of a message $\mathbf{y}$. Suppose $\mathbf{y}$ is an $n$-dimensional vector. Following the token generation procedure, the vector $\mathbf{y}$ is first extended to a vector $\mathbf{y}'$ using the random numbers $\beta$ and $r_y$. The vector $\mathbf{y}'$ is then permuted as $\mathbf{y}''$, which is then extended to a diagonal matrix $Y$. Then, the ciphertext for $\mathbf{y}$ is $C_y = M_2^{-1} S_y Y M_1^{-1}$, where $S_y$ is a random lower triangular matrix. We note that the product of $S_y$ and $Y$ produces a lower triangular matrix, denoted as $T$, with $\mathbf{y}''$ as the diagonal. Now we focus on the product $M_2^{-1} T M_1^{-1}$.

Denote the entries in and as and , respectively, where . For matrix , denote its non-zero entries in the lower triangular part as , where and . Then, by law of matrix multiplication, each entry in can be written in the form of

(1)

where , are polynomials. Equation (1) is obtained by summing up each terms of , and , respectively.

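Now, observe Equation (1) in the context of the active attack experiment. The entries of the secret matrices are fixed, while the random numbers used to extend the vector and the entries of the lower triangular matrix are one-time random numbers. The entries of the message are chosen and can be controlled by the adversary $\mathcal{A}$. In step 4 of the experiment, the adversary can select a different message each time and observe the resulting ciphertext entries, since $\mathcal{A}$ continuously has oracle access to the token generation function. However, since the random numbers and the lower triangular matrix are one-time values, the polynomials in Equation (1) all look random to $\mathcal{A}$. As a result, their summation looks random to $\mathcal{A}$. This means that, for any message chosen by $\mathcal{A}$ and its corresponding ciphertext, $\mathcal{A}$ cannot distinguish which message is actually encrypted. The adversary can therefore only output its bit by random guessing, so it wins the experiment with probability at most negligibly greater than $\frac{1}{2}$, as required by Definition 2.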

VI-B Decryption Security

The decryption function outputs an intermediate result, denoted as $I$, and a final result $\mathrm{Tr}(I)$. In the following security analysis, we discuss what information can be learned by the service provider from each of them.

As in Protocol 1, $I = C_x C_y = M_1 S_x X S_y Y M_1^{-1}$, where $S_x$ and $S_y$ are random matrices. Recall from the proof of Theorem 3 that $C_y = M_2^{-1} S_y Y M_1^{-1}$. Since the matrices $I$ and $C_y$ follow the same construction, the transformation that produces $I$ also has CPA-security. In other words, the transformation is semantically secure, meaning that the adversary is not able to derive any key information about $\mathbf{x}$ and $\mathbf{y}$ from $I$.
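The reason the long-term matrices disappear from the final result can be written out explicitly. The derivation below assumes the ciphertext forms $C_x = M_1 S_x X M_2$ and $C_y = M_2^{-1} S_y Y M_1^{-1}$ with unit lower triangular $S_x$, $S_y$ and diagonal $X$, $Y$; this ordering of the factors is an assumption, since Protocol 1 is not reproduced in this section.

```latex
\begin{aligned}
\operatorname{Tr}(C_x C_y)
  &= \operatorname{Tr}\!\left(M_1 S_x X M_2 \, M_2^{-1} S_y Y M_1^{-1}\right) \\
  &= \operatorname{Tr}\!\left(M_1 \,(S_x X)(S_y Y)\, M_1^{-1}\right) \\
  &= \operatorname{Tr}\!\left((S_x X)(S_y Y)\right)
     && \text{since } \operatorname{Tr}(M A M^{-1}) = \operatorname{Tr}(A) \\
  &= \sum_{i} x''_i \, y''_i
     && \text{the product of lower triangular matrices has diagonal } x''_i y''_i \\
  &= \mathbf{x}' \circ \mathbf{y}'
     && \text{the same permutation is applied to both vectors.}
\end{aligned}
```

Under these assumptions the trace depends only on the extended vectors, so any randomness placed in $S_x$, $S_y$ or in padding entries that meet a zero in the other vector is cancelled, while a multiplicative factor on the whole vector survives.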

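Now, for the final result, we define a decryption oracle $\mathcal{O}$ as follows.

1:  The oracle $\mathcal{O}$ fixes a vector $\mathbf{x}$ and a number $\theta$.
2:  For any submitted vector $\mathbf{y}$, $\mathcal{O}$ generates two positive random numbers $\alpha$ and $\beta$ and outputs $\alpha\beta(\mathbf{x} \circ \mathbf{y} - \theta)$.
Decryption Oracle $\mathcal{O}$

Theorem 4.

The oracle $\mathcal{O}$ does not have CPA-security.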

Proof:

We provide a proof sketch since the CPA-security proof process follows that for Theorem 3.

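An adversary $\mathcal{A}$ is able to continuously access the oracle. $\mathcal{A}$ submits vectors $\mathbf{y}$ of her own choice and observes the outputs. Since $\alpha$ and $\beta$ are positive, the sign of each output equals the sign of $\mathbf{x} \circ \mathbf{y} - \theta$, so there may exist $\mathbf{y}_0$ and $\mathbf{y}_1$ such that the output for $\mathbf{y}_0$ is positive while the output for $\mathbf{y}_1$ is negative. This means that, in an experiment defined for CPA-security, the adversary is able to distinguish the ciphertexts of the two submitted messages. By definition, the oracle does not have CPA-security. ∎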

Theorem 4 states that the final result actually reveals some information about $\mathbf{x}$ and $\mathbf{y}$. This is expected in our design, since we want to determine from the final result whether the inner product of $\mathbf{x}$ and $\mathbf{y}$ is within a threshold or not. However, we note that in our proposed TPE, every vector $\mathbf{x}$ is associated with a one-time independent random number $\alpha$ and every vector $\mathbf{y}$ is associated with a one-time random number $\beta$. As a result, in the active attack, what an adversary can observe through decryption is a series of results $\alpha\beta_i(\mathbf{x} \circ \mathbf{y}_i - \theta)$. Since the $\beta_i$ are selected independently, the final results only reveal whether $\mathbf{x} \circ \mathbf{y}_i - \theta$ is positive or not. No more key information can be derived from them.
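The behaviour of the decryption oracle can be illustrated with a short simulation. This is a sketch only: the integer range used for the positive one-time random numbers is an arbitrary assumption, and the function name is illustrative.

```python
import random


def decryption_oracle(x, theta, y, rng):
    """Return alpha * beta * (<x, y> - theta) with fresh positive alpha and beta."""
    alpha = rng.randint(1, 10**6)   # one-time randomness tied to x
    beta = rng.randint(1, 10**6)    # one-time randomness tied to y
    inner = sum(a * b for a, b in zip(x, y))
    return alpha * beta * (inner - theta)


rng = random.Random(1)
x, theta = [1, 2, 3], 10

# <x, y> = 12 is above the threshold, so every output is positive,
# but the magnitude changes from call to call.
above = [decryption_oracle(x, theta, [2, 2, 2], rng) for _ in range(5)]

# <x, y> = 6 is below the threshold, so every output is negative.
below = [decryption_oracle(x, theta, [1, 1, 1], rng) for _ in range(5)]

print([v > 0 for v in above], [v < 0 for v in below])
```

Repeated queries with the same vector give outputs of the same sign but unrelated magnitudes, which is why the magnitude cannot be used to estimate how far the inner product is from the threshold.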

VI-C The Effect of Randomness on Security

Besides the randomly generated long-term keys (i.e., $M_1$, $M_2$ and $\pi$), we also introduce different one-time randomness in the encryption scheme. At a high level, the one-time randomness provides TPE with CPA-security similar to that of the one-time pad. From a cryptographic point of view, the one-time pad encryption scheme provides perfect security. However, it is not practical, since the one-time secret key has the same length as the message itself. The most notable difference between TPE and traditional encryption schemes is that TPE does not actually decrypt the message. Instead, TPE evaluates a function of the ciphertext in order to obtain the function value of the plaintext. As a result, TPE does not require the one-time randomness in the decryption process. In this sense, TPE can achieve security comparable to the one-time pad while avoiding the impractical key management requirement.

It is important to understand the effect of different randomness on security. We briefly categorize the one-time randomness utilized by TPE into three types.

  • Type I: result-disguising randomness. When extending the vectors in both the encryption and token generation functions, we multiply each entry of $\mathbf{x}$ and $\mathbf{y}$ by the random numbers $\alpha$ and $\beta$ respectively. Since $\alpha$ and $\beta$ remain in the decryption result, we call this result-disguising randomness.

  • Type II: vector-extension randomness. In both the encryption and token generation functions, we extend the vector and pad it with a random number.

  • Type III: matrix-multiplication randomness. In both the encryption and token generation functions, we multiply the extended matrices ($X$ and $Y$) with random matrices ($S_x$ and $S_y$).

These three types of one-time randomness together ensure the CPA-security of the encryption process of TPE, as analyzed in Section VI-A. The main function of decryption is to evaluate the trace of the matrix. We note that the trace function cancels Type II and Type III randomness, whereas Type I randomness remains in the decryption result. This is important because the decryption result then reveals only partial information about the plaintext, which is just adequate for the purpose of biometric authentication. We will further demonstrate the effect of Type I randomness in Section VII-A.

VII Other Applications of TPE

Our proposed threshold predicate encryption scheme enables a compute-then-compare computational model over encrypted data. That is, given two encrypted vectors $\mathbf{x}$ and $\mathbf{y}$, an untrusted party is able to determine whether the inner product of $\mathbf{x}$ and $\mathbf{y}$ is greater than or within a threshold $\theta$. No other key information about the value of $\mathbf{x}$ or $\mathbf{y}$ is exposed. Previously, we also showed that many distance and similarity metrics can be computed from the inner product of $\mathbf{x}$ and $\mathbf{y}$. Such properties of TPE are critical for many applications that require data security and privacy.

VII-A Improved Security for Outsourced Biometric Identification

Outsourcing different computational problems to the cloud while preserving the security and privacy of the outsourced problem has become a new trend. Many previous works have considered secure outsourcing of different problems [39, 41, 38, 40, 37, 35]. In [35], a secure outsourcing scheme is proposed for biometric identification. The system models of outsourced biometric identification and biometric authentication are fundamentally different. In outsourced biometric identification, a data owner possesses a database of users' biometric templates. The goal of biometric identification is that, given a query template, the data owner needs to identify the user to whom the query template belongs.

To protect the security and privacy of biometric templates, [35] proposed an outsourcing scheme where the database owner first encrypts the templates and then outsources the encrypted data to the cloud. Specifically, the data owner encrypts each biometric template using a symmetric key. A given query template is encrypted using the same key. The scheme is designed in such a manner that, given two encrypted templates and an encrypted query template, the cloud is able to determine which of the two templates is closer to the query, without learning any key information about the templates or the query. By repeating this process, the cloud is able to identify the template that is closest to the query. The encrypted version of this template is then returned to the data owner, who can decrypt it to obtain the template and calculate the actual distance between it and the query. Thus, the data owner can finally decide whether the two are close enough to belong to the same person.

There are mainly two security and privacy issues regarding the above scheme. First, the registration phase is vulnerable to the registration attack [13], since an adversary (i.e., the cloud) is able to inject known templates into the database. During decryption, the cloud is able to derive the following equation (i.e., Equation (3) in [13]):

where is the -th entry in a submitted query template . Since and are computable and and are selected by the cloud, the cloud is able to recover . Repeating such attack will finally recover the whole query template as demonstrated in [13].

Second, from the decryption result, the cloud is able to learn more information than needed. In particular, the cloud is able to determine which one of any two encrypted templates is closer to the query template. By repeating this process, the cloud can rank all the templates by their distances to the query template. This reveals more information than is needed for biometric identification.

We now show that our proposed TPE scheme can address these two issues. The security vulnerability of the scheme in [35] is caused by the lack of Type I randomness as defined in Section VI-C. The trace function cancels the Type III randomness, resulting in Equation (3) in [13].

Our proposed TPE scheme can be directly utilized in outsourced biometric identification. In the encryption part, each registered template is encrypted with . A query template is encrypted with . The decryption process will give , where and are one-time random numbers associated with and respectively. As a result, Equation (3) in [13] is replaced by

Note that is a one-time random number associated with a query and is a one-time random number associated with . Thus, although the adversary is able to insert known templates into the database, it cannot derive due to the one-time randomness. In other words, the outsourced biometric identification scheme based on TPE is able to defend against registration attack.

For the second privacy issue, the decryption result only reveals whether the distance between the query and the registered template is within a threshold or not. Since a fresh one-time random number is associated with each registered template, the relative distance information is concealed. As a result, the cloud is not able to rank the registered templates according to their distances to the query template.

VII-B Searching Over Encrypted Data

With the development of cloud computing and storage, there is a clear motivation for searching over encrypted data [32, 30, 6, 33]. For example, a medical institution may store its medical data in the cloud. To ensure security of the data, the institution chooses to encrypt all the data before outsourcing. Meanwhile, the institution wishes to maintain the searching ability over the encrypted data in order to retrieve the desired data files. The proposed TPE is a promising solution for searching over encrypted data. In the following, we discuss how to utilize TPE to implement different searching functionalities.

VII-B1 Set Intersection

We assume that a file $F$ is indexed by a set of keywords $W_F$. The files and their associated keyword sets are encrypted and outsourced to the cloud. A search query consists of a set of keywords $W_Q$. Given the search query, the cloud will return the file $F$ if the overlap of the keyword sets $W_F$ and $W_Q$ reaches a certain threshold $\theta$, that is, $|W_F \cap W_Q| \ge \theta$.

The above set intersection search function can be implemented through TPE as follows. Suppose the universe of keywords is the set $U$ with size $n$. Fix the order of the keywords within $U$. Then, an index can be formulated as an $n$-dimensional binary vector $\mathbf{p}$, where $p_i = 1$ means that the $i$-th keyword in $U$ appears in $W_F$. The vector $\mathbf{p}$ for file $F$ is encrypted using the TPE encryption function. Each file is then encrypted using a standard symmetric encryption scheme such as AES. The encrypted files and indices are outsourced to the cloud. For a search query $W_Q$, a vector $\mathbf{q}$ can be formulated in a similar manner. Then a search token can be generated using the TPE token generation function. With this formulation, $|W_F \cap W_Q| = \mathbf{p} \circ \mathbf{q}$, where $\mathbf{p} \circ \mathbf{q}$ denotes the inner product of $\mathbf{p}$ and $\mathbf{q}$. With TPE, the cloud is able to identify the files whose associated indices satisfy $\mathbf{p} \circ \mathbf{q} \ge \theta$ while not learning any useful information about the indices.
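The plaintext relation that TPE evaluates here, namely that the size of the keyword overlap equals an inner product of binary indicator vectors, can be sketched as follows. The keyword universe, file names and function names are made-up examples; in the scheme described above this comparison would be carried out on encrypted indices rather than in the clear.

```python
def index_vector(keywords, universe):
    """Binary indicator vector of a keyword set over a fixed, ordered universe."""
    present = set(keywords)
    return [1 if word in present else 0 for word in universe]


def overlap(file_keywords, query_keywords, universe):
    """Size of the keyword intersection, computed as an inner product."""
    p = index_vector(file_keywords, universe)
    q = index_vector(query_keywords, universe)
    return sum(a * b for a, b in zip(p, q))


def matching_files(files, query_keywords, universe, theta):
    """Names of the files whose keyword overlap with the query is at least theta."""
    return [name for name, keywords in files.items()
            if overlap(keywords, query_keywords, universe) >= theta]


universe = ["cardiology", "diabetes", "mri", "oncology", "xray"]
files = {
    "report_a": {"mri", "oncology", "xray"},
    "report_b": {"diabetes"},
}
print(matching_files(files, {"mri", "xray", "diabetes"}, universe, theta=2))
```

Keywords that are not part of the universe are simply ignored by `index_vector`, so the universe has to be fixed before any index is encrypted.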

VII-B2 Weighted Sum Evaluation

For many kinds of numeric data, it is useful to evaluate the weighted sum of a data record under different weights. For example, the grades of a student in each subject form a vector $\mathbf{x}$. An evaluator wants to evaluate the performance of the students according to some criteria. Each criterion can be formulated as a weighted sum of the grades. The different weights reflect different emphasis on the subjects.

We assume that an administrator possesses the grades of all the students. For privacy reasons, all the grades are encrypted using the TPE encryption function and stored on an external server. An evaluator desires to identify those students whose performance meets a certain standard. In this scenario, the evaluator can submit a vector of weights $\mathbf{w}$ to the administrator, who will then generate a search token for the evaluator through the token generation function. The evaluator can submit the token generated for $\mathbf{w}$ to the server and search over the encrypted grades. The server is then able to identify the students whose grades satisfy $\mathbf{x} \circ \mathbf{w} \ge \theta$.
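In plaintext terms, the server-side predicate in this scenario is a thresholded weighted sum. The short sketch below shows that predicate on made-up student records; the names, grades, weights and the direction of the comparison (at least the threshold) are illustrative assumptions, and in the scheme itself the server would only see encrypted grades and a token.

```python
def weighted_sum(grades, weights):
    """Inner product of a grade vector and a weight vector."""
    return sum(g * w for g, w in zip(grades, weights))


def students_meeting_standard(records, weights, theta):
    """Names of the students whose weighted sum of grades is at least theta."""
    return [name for name, grades in records.items()
            if weighted_sum(grades, weights) >= theta]


records = {
    "student_1": [90, 80, 70],
    "student_2": [60, 95, 85],
}
weights = [5, 3, 2]   # emphasis on the first subject

print(weighted_sum(records["student_1"], weights))
print(students_meeting_standard(records, weights, theta=800))
```

A different criterion only requires a new weight vector and token; the encrypted grades stored on the server stay unchanged.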

VIII Performance Evaluation

In this section, we evaluate the performance of PassBio. First, we give detailed analysis of both computational and communication complexity. Then, some numeric results are presented for the proposed TPE through simulation.

VIII-A Complexity Analysis

As shown in Protocol 2, on the local side an end-user needs to run the setup, encryption and token generation algorithms. The service provider needs to run the decryption algorithm for every query. The computational bottleneck of these algorithms lies in matrix multiplication and matrix inversion. Thus, in the following analysis, we focus on these two operations. Without loss of generality, we assume that the matrices involved in the computation all have the same dimension $n \times n$.

For the setup function, two random matrices are generated and two matrix inversions need to be calculated. Note that the setup phase is generally a one-time process. That is, the setup function needs to be executed by the end-user only once. The encryption and token generation functions both take three matrix multiplications. As a result, they have a complexity of $O(n^3)$, without optimization for matrix multiplication.

In the decryption function, the trace of $C_x C_y$ needs to be computed. There is no need to calculate the full matrix product before evaluating the trace; only the diagonal entries need to be computed. Thus, decryption has a complexity of $O(n^2)$.
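The saving comes from the identity $\mathrm{Tr}(AB) = \sum_{i}\sum_{k} A_{ik} B_{ki}$, which touches each entry of the two matrices exactly once. A minimal sketch in pure Python (the function name is illustrative):

```python
def trace_of_product(a, b):
    """Trace of a @ b in O(n^2) operations, without forming the product matrix."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Full product for comparison: [[19, 22], [43, 50]], whose trace is 19 + 50.
print(trace_of_product(a, b))  # 69
```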

In terms of communication overhead, assume all the matrices and vectors have the same dimension $n$. In the registration phase, the end-user needs to submit the encrypted template, an $n \times n$ matrix, to the service provider. Thus the communication overhead for registration is $O(n^2)$. Similarly, the communication overhead for the query phase is also $O(n^2)$.

VIII-B Efficiency Improvement

The above complexity analysis shows that the computational bottleneck of both the encryption and token generation functions lies in matrix multiplication. For resource-constrained devices such as mobile phones, multiplying high-dimensional matrices can be prohibitively expensive. In the following, we introduce two typical techniques that can reduce the computational overhead for mobile devices.

VIII-B1 Dimension Reduction

The complexity of normal matrix multiplication is $O(n^3)$, where $n$ is the dimension of the matrices. Thus, a straightforward way to reduce the complexity is to reduce the dimension of the matrices. For applications such as biometric authentication and identification, it is critical to preserve the identification accuracy while reducing the dimension. Several works [3, 20, 26] have been devoted to reducing the sizes of biometric templates. In [3], two techniques are introduced to decimate the FingerCode representation. The tessellation reduction approach reduces the dimension of the FingerCode at the feature generation phase, which is described in Section V-A. Specifically, given a fingerprint image, this approach reduces the number of sectors of the tessellation. The other approach is to directly apply a general dimension reduction method such as PCA to the obtained FingerCodes. In this way, the most compact representation of the FingerCode is found for a specific dataset.

We note that both approaches degrade the identification accuracy, although it remains at a satisfactory level. In the experiments of [3], the length of the FingerCode varies from 640 to 8 in the tessellation approach. For the PCA approach, the dimension of the FingerCode varies from 64 to 4. Generally speaking, the shorter the FingerCode is, the worse the accuracy would be. However, the experimental results demonstrated that FingerCodes of dimension 96 (from tessellation reduction) and 8 (from PCA) can achieve a satisfactory accuracy compared to that of the original 640. We also note that the approaches in [3] quantize each entry in the FingerCode, resulting in reduced accuracy. Our proposed TPE scheme, however, can be applied directly to real numbers. Thus, TPE is applicable to the non-quantized case in [3], which has a higher accuracy.

VIII-B2 Online/Offline Computation

The idea of online/offline computation [9, 14, 25] is to divide a computationally expensive process into an online phase and an offline phase. During the offline phase, some pre-computation is done without being given the input. During the online phase, given the input, it is relatively easy to combine it with the offline computation result in order to generate the final result. Typically, the offline computation is carried out when the mobile device is idle or charging. Thus, such an approach can reduce the overall response time and battery consumption.

Our proposed TPE scheme can utilize such an approach to reduce the online computational overhead. For example, in the query phase, an end-user needs to compute $C_y = M_2^{-1} S_y Y M_1^{-1}$ given a transformed template $Y$. During the offline phase, the end-user can generate the random matrix $S_y$ and compute $M_2^{-1} S_y$. The computation results can be stored for later use. When a fresh template is generated, the end-user can compute $C_y = (M_2^{-1} S_y) Y M_1^{-1}$ during the online phase. This approach can reduce the online computational overhead by about half, which is critical for resource-constrained devices.
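The split between the two phases can be sketched as follows. The sketch assumes the token has the form (left secret matrix) × (random lower triangular matrix) × (diagonal template matrix) × (right secret matrix); the function names and the small integer matrices are placeholders for illustration. Multiplying by the diagonal matrix is done by scaling columns, which costs $O(n^2)$, so only one full matrix multiplication remains online.

```python
def mat_mul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def scale_columns(m, diagonal):
    """Compute m @ diag(diagonal) in O(n^2) by scaling column j by diagonal[j]."""
    return [[v * d for v, d in zip(row, diagonal)] for row in m]


def offline_phase(left, s_y):
    """Template-independent part: can run while the device is idle."""
    return mat_mul(left, s_y)


def online_phase(precomputed, y_ext, right):
    """Template-dependent part: one column scaling and one full multiplication."""
    return mat_mul(scale_columns(precomputed, y_ext), right)


# Placeholder matrices standing in for the secret key and one-time randomness.
left = [[1, 2], [3, 4]]
right = [[0, 1], [1, 1]]
s_y = [[1, 0], [5, 1]]          # unit lower triangular
y_ext = [2, 3]                  # diagonal of the transformed template

stored = offline_phase(left, s_y)
token = online_phase(stored, y_ext, right)

# Direct computation with three full multiplications, for comparison.
y_diag = [[2, 0], [0, 3]]
direct = mat_mul(mat_mul(mat_mul(left, s_y), y_diag), right)
print(token == direct)
```

Each stored offline result must be used for a single query only, since the lower triangular matrix is one-time randomness.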

VIII-C Numeric Results

In this section, we measure the performance of our proposed TPE scheme through simulation. Since the setup and encryption functions are both one-time processes during the registration phase, we mainly focus on the execution time of the token generation function.

Since PassBio is a user-centric biometric authentication scheme, we measure the performance on both a mobile phone and a personal laptop. In the simulation, we use a mobile phone with the Android 6.0 operating system, a 2.5 GHz Cortex-A72 CPU and 4 GB RAM. We also use a personal laptop with macOS 10, a 1.6 GHz Intel Core i5 and 4 GB RAM. The Java library UJMP [1] and the C++ library Armadillo are used for the simulation on the mobile phone and the laptop, respectively. We note that the performance depends on the choice of software packages, and our selection does not guarantee the best performance. From the complexity analysis, we know that the most important parameter affecting the performance is the dimension $n$ of the vector. For the simulation on the mobile phone and the laptop, we let $n$ vary from 10 to 300 and from 100 to 2000, respectively. Due to the dimension reduction techniques introduced in Section VIII-B1, this range of dimensions is sufficient for most biometric templates. We also use the online/offline computation mechanism introduced in Section VIII-B2 to reduce the online computational overhead.

Fig. 2: Performance of token generation and evaluation simulated on laptop (with vs. without pre-computation)
Fig. 3: Performance of token generation and evaluation on mobile phone (with vs. without pre-computation)

The numeric result on the laptop is shown in Fig. 2. The token generation time for moderate size template ( is around 200) is just around one millisecond with pre-computation. For high-dimensional template with , the token generation time is less than 1 second with pre-computation. The numeric result on the mobile phone is shown in Fig. 3. The simulation results show that it is efficient to generate tokens for templates with moderate size. For example, when , the generation time is approximately 50 . When , the generation time is around 900 . It can be observed in both figures that the online/offline mechanism can effectively reduce the online computational overhead. By pre-computation during the offline phase, the online computation time is reduced to about half of the whole processing time.

IX Conclusion

In this paper, we proposed a Threshold Predicate Encryption (TPE) scheme. TPE is able to encrypt a vector $\mathbf{x}$ and generate a token for another vector $\mathbf{y}$. Given the two encrypted vectors, any party is able to determine whether the inner product of $\mathbf{x}$ and $\mathbf{y}$ is within a pre-defined threshold or not. Our security analysis shows that no sensitive information about the vectors can be learned by the untrusted party under both passive and active attacks. Based on TPE, we proposed PassBio, a privacy-preserving user-centric biometric authentication scheme. One key feature of PassBio is that end-users can encrypt their own biometric templates and register them with the service provider. End-users can then encrypt their freshly generated templates and submit them to the service provider for authentication. We showed that TPE is suitable for a compute-then-compare computational model on encrypted data. Such a computational model can be widely used in applications that require computations on encrypted data while preserving data security and privacy. In particular, we presented two additional applications of TPE: searching over encrypted data and outsourced biometric identification. Our simulation results demonstrated that the proposed TPE can be efficiently implemented on both mobile phones and personal laptops.

References

  • [1] Universal java matrix package. https://ujmp.org/.
  • [2] Michel Abdalla, Florian Bourse, Angelo De Caro, and David Pointcheval. Simple functional encryption schemes for inner products. In IACR International Workshop on Public Key Cryptography, pages 733–751. Springer, 2015.
  • [3] Tiziano Bianchi, Stefano Turchi, Alessandro Piva, Ruggero Donida Labati, Vincenzo Piuri, and Fabio Scotti. Implementing fingercode-based identity matching in the encrypted domain. In Biometric Measurements and Systems for Security and Medical Applications (BIOMS), 2010 IEEE Workshop on, pages 15–21. IEEE, 2010.
  • [4] Allison Bishop, Abhishek Jain, and Lucas Kowalczyk. Function-hiding inner product encryption. In International Conference on the Theory and Application of Cryptology and Information Security, pages 470–491. Springer, 2015.
  • [5] Dan Boneh, Amit Sahai, and Brent Waters. Functional encryption: Definitions and challenges. Theory of Cryptography, pages 253–273, 2011.
  • [6] Dan Boneh and Brent Waters. Conjunctive, subset, and range queries on encrypted data. Theory of cryptography, pages 535–554, 2007.
  • [7] Seung-Seok Choi, Sung-Hyuk Cha, and Charles C Tappert. A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1):43–48, 2010.
  • [8] Sunoh Choi, Gabriel Ghinita, Hyo-Sang Lim, and Elisa Bertino. Secure knn query processing in untrusted cloud environments. IEEE Transactions on Knowledge and Data Engineering, 26(11):2818–2831, 2014.
  • [9] Sherman SM Chow, Joseph K Liu, and Jianying Zhou. Identity-based online/offline key encapsulation and encryption. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, pages 52–60. ACM, 2011.
  • [10] Pratish Datta, Ratna Dutta, and Sourav Mukhopadhyay. Functional encryption for inner product with full function privacy. In Public-Key Cryptography–PKC 2016, pages 164–195. Springer, 2016.
  • [11] Yousef Elmehdwi, Bharath K Samanthula, and Wei Jiang. Secure k-nearest neighbor query over encrypted data in outsourced environments. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 664–675. IEEE, 2014.
  • [12] Shafi Goldwasser, S Dov Gordon, Vipul Goyal, Abhishek Jain, Jonathan Katz, Feng-Hao Liu, Amit Sahai, Elaine Shi, and Hong-Sheng Zhou. Multi-input functional encryption. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 578–602. Springer, 2014.
  • [13] Changhee Hahn and Junbeom Hur. Poster: Towards privacy-preserving biometric identification in cloud computing. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1826–1828. ACM, 2016.
  • [14] Susan Hohenberger and Brent Waters. Online/offline attribute-based encryption. In International Workshop on Public Key Cryptography, pages 293–310. Springer, 2014.
  • [15] Anil K Jain and Karthik Nandakumar. Biometric authentication: System security and user privacy. IEEE Computer, 45(11):87–92, 2012.
  • [16] Anil K Jain, Karthik Nandakumar, and Arun Ross. 50 years of biometric research: Accomplishments, challenges, and opportunities. Pattern Recognition Letters, 79:80–105, 2016.
  • [17] Anil K Jain, Salil Prabhakar, and Lin Hong. A multichannel approach to fingerprint classification. IEEE transactions on pattern analysis and machine intelligence, 21(4):348–359, 1999.
  • [18] Anil K Jain, Salil Prabhakar, Lin Hong, and Sharath Pankanti. Fingercode: a filterbank for fingerprint representation and matching. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society