Aggregation and Embedding for Group Membership Verification

by   Marzieh Gheisari, et al.

This paper proposes a group membership verification protocol preventing the curious but honest server from reconstructing the enrolled signatures and inferring the identity of querying clients. The protocol quantizes the signatures into discrete embeddings, making reconstruction difficult. It also aggregates multiple embeddings into representative values, impeding identification. Theoretical and experimental results show the trade-off between the security and the error rates.




1 Introduction

Verifying that an item, device, or individual is a member of a group is needed by many applications that grant or refuse access to sensitive resources. Group membership verification is not about identifying first and then checking membership. Rather, granting access requires that the members of the group can be distinguished from non-members, but it does not require distinguishing members from one another.

Group membership verification protocols first enroll eligible signatures into a data structure stored at a server. Then, at verification time, the structure is queried by a client signature and access is granted or refused. For security, the data structure must be adequately protected so that an honest but curious server cannot reconstruct the signatures. For privacy, verification should proceed anonymously, without disclosing identities.

A client signature is a noisy version of the enrolled one, for example due to changes in lighting conditions. The verification must absorb such variations and cope with the continuous nature of signatures. Signatures must also be such that a noisy version from one user is unlikely to get similar enough to the enrolled signature of any other user. Continuity, discriminability and statistical independence are inherent properties of signatures.

This paper proposes a group membership verification protocol preventing a curious but honest server from reconstructing the enrolled signatures and inferring the identity of querying (trusted) clients. It combines two building blocks:

Block #1: One building block hashes continuous vectors into discrete embeddings. This lossy process limits the ability of the server to reconstruct signatures from the embeddings.

Block #2: The other building block aggregates multiple vectors into a unique representative value which will be enrolled at the server. The server can therefore not infer any specific signature from this value. Sufficient information must be preserved through the aggregation process for the server to assert whether or not a querying signature is a member of the group.

These two blocks can be assembled in two configurations. In the first, block #1 precedes block #2: the system acquires and hashes the signatures before aggregating them. In the opposite configuration, the acquired signatures are aggregated first and the result of this aggregation is then hashed. At query time, the newly acquired signature is always hashed before being sent to the server. Weaknesses and strengths of these two configurations are explored in the paper.

2 Related Work

Group membership verification protocols relying on cryptography exist [1] but are more relevant to authentication, identification and secret binding. Other approaches apply homomorphic encryption to signatures, compare [2] and threshold them [3, 4] in the encrypted domain, and need active participation of clients. Approaches involving cryptography, however, are extremely costly in both memory and CPU.

Group membership is linked to Bloom filters, which are used to test whether an element is a member of a set. When considering security, a server using Bloom filters cannot infer any information on one specific entry [5]. Note that Bloom filters cannot deal with continuous high dimensional signatures and that queries must be encrypted to protect the privacy of users [6, 7]. Bloom filters, adapted to our setup, nevertheless form a baseline in our experiments (see Sect. 5.3).

Embedding a single high dimensional signature is quite a standard technique. The closest to our work is the privacy-preserving identification mechanism based on sparsifying transform [8, 9]. It produces an information-preserving sparse ternary embedding, ensuring privacy of the data users and security of the signature.

Aggregating signals into similarity-preserving representations is very common in computer vision [10, 11, 12]. These works do not consider security or privacy. In [13], Iscen et al. use the group testing paradigm to pack a random set of image signatures into a unique high-dimensional vector. It is therefore an excellent basis for the aggregation block: the similarities between the original non-aggregated signatures and a query signature are preserved through the aggregation.

3 Notations and definitions

Signatures are vectors in $\mathbb{R}^d$. If $N$ users/items belong to the group, then the protocol considers $N$ signatures, $x_1$, …, $x_N$. The signature to verify is a query vector $q \in \mathbb{R}^d$. Group membership verification considers two hypotheses linked to the continuous nature of the signatures:

$\mathcal{H}_1$: The query is related to one of the vectors. For instance, it is a noisy version of vector $j$, $q = x_j + n$, with $n$ a noise vector.

$\mathcal{H}_0$: The query is not related to any vector in the group.

We first design a group aggregation technique which computes a single representation $r$ from all $N$ vectors. This is done at the enrollment phase. Variable $\ell$ denotes the size in bits of this representation.

At the verification phase, the query is hashed by a function $h$ whose output $h(q)$ has size $\ell'$ in bits. This function might be probabilistic to ensure privacy. The group membership test decides which hypothesis is deemed true by comparing $h(q)$ and $r$. This is done by first computing a score function $c$ and thresholding its result: $t = [c(h(q), r) > \tau]$.

3.1 Verification Performances

The performance of this test is measured by the probabilities of false negative, $p_{fn}(\tau)$, and false positive, $p_{fp}(\tau)$. As $\tau$ varies from $-\infty$ to $+\infty$, these measures are summarized by the AUC (Area Under Curve) performance score. Another figure of merit is $p_{fn}(\tau)$ for $\tau$ s.t. $p_{fp}(\tau) = \epsilon$, a required false positive level.
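The measures above can be estimated from two samples of scores, one drawn under each hypothesis. The following is a minimal sketch, not the authors' evaluation code: it assumes that the test accepts when the score exceeds the threshold, and the function names (`error_rates`, `auc`, `p_fn_at_fp`) are hypothetical.

```python
import numpy as np

def error_rates(scores_h1, scores_h0, tau):
    """Empirical (p_fn, p_fp) of the test 'accept if score > tau'."""
    scores_h1 = np.asarray(scores_h1, dtype=float)
    scores_h0 = np.asarray(scores_h0, dtype=float)
    p_fn = float(np.mean(scores_h1 <= tau))  # members wrongly rejected
    p_fp = float(np.mean(scores_h0 > tau))   # non-members wrongly accepted
    return p_fn, p_fp

def auc(scores_h1, scores_h0):
    """Probability that a member outscores a non-member (ties count 1/2)."""
    s1 = np.asarray(scores_h1, dtype=float)[:, None]
    s0 = np.asarray(scores_h0, dtype=float)[None, :]
    return float(np.mean((s1 > s0) + 0.5 * (s1 == s0)))

def p_fn_at_fp(scores_h1, scores_h0, eps):
    """p_fn at a threshold whose empirical p_fp does not exceed eps."""
    s0 = np.sort(np.asarray(scores_h0, dtype=float))
    k = int(np.ceil((1.0 - eps) * len(s0))) - 1
    k = min(max(k, 0), len(s0) - 1)
    return error_rates(scores_h1, scores_h0, s0[k])[0]

# Toy scores: three member queries and four non-member queries.
members = [0.9, 0.8, 0.4]
outsiders = [0.1, 0.3, 0.5, 0.2]
print(error_rates(members, outsiders, tau=0.45))
print(auc(members, outsiders))
print(p_fn_at_fp(members, outsiders, eps=0.25))
```

The AUC is computed here as a rank statistic over all member/non-member pairs, which is equivalent to integrating the ROC curve traced as the threshold sweeps the real line.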

3.2 Security and Privacy

A curious server can reconstruct a signature $x$ from its embedding $e = h(x)$ (for instance the query): $\hat{x} = \mathrm{rec}(e)$. The mean squared error is a way to assess its accuracy: $\mathrm{MSE}_e = d^{-1}\,\mathbb{E}[\|x - \hat{x}\|^2]$. The best reconstruction is known to be the conditional expectation: $\hat{x} = \mathbb{E}[x \mid e]$.

Reconstructing an enrolled signature from the group representation is even more challenging. Due to the aggregation block, the curious server can only reconstruct a single vector $\hat{x}$ from the aggregated representation, and this vector serves as an estimation of any signature in the group:

$\mathrm{MSE}_E = (dN)^{-1} \sum_{i=1}^{N} \mathbb{E}[\|x_i - \hat{x}\|^2]$.  (1)

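Figure 1: Performance for a unique group. (a) Verification performance vs. reconstruction error, for varying sparsity. (b) Verification performance vs. the number of enrolled signatures; solid and dashed lines correspond to two settings.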

4 Verification for a few group members

This section discusses the verification protocol when $N$ is small. We study the two different configurations for assembling block #1 and block #2.

Block #1: Embedding. An embedding $h$ maps a vector to a sequence of discrete symbols. This quantization shall preserve enough information to tell whether two embeddings are related, but not enough to reconstruct a signature. We use the sparsifying transform coding [8, 9]. It projects $x \in \mathbb{R}^d$ to the range space of a transform matrix $W$. The output alphabet $\{-1, 0, +1\}$ is imposed by quantizing the components of $Wx$ whose amplitude is lower than $\lambda$ to 0, the others to +1 or -1 according to their sign. In expectation, $S$ symbols are non null.

Block #2: Aggregation. Aggregation processes a set of input vectors to produce a unique output vector, through a function denoted $a$. When block #1 is used before, that is, when considering $a(h(x_1), \ldots, h(x_N))$, then $a$ maps $N$ sequences of discrete symbols to one sequence of discrete symbols. When block #2 is used before block #1, that is when considering $h(a(x_1, \ldots, x_N))$, then $a$ maps $N$ vectors of $\mathbb{R}^d$ to one vector of $\mathbb{R}^d$.
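The embedding of block #1 can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation of [8, 9]: the function name `ternary_embed`, the parameter name `lam` for the threshold, and the choice of a random orthogonal transform matrix are all hypothetical.

```python
import numpy as np

def ternary_embed(x, W, lam):
    """Project x with W, then quantize each component to {-1, 0, +1}.

    Components with amplitude below lam become 0; the others keep their sign.
    """
    y = W @ x
    e = np.sign(y).astype(np.int8)
    e[np.abs(y) < lam] = 0
    return e

rng = np.random.default_rng(0)
d = 8
# Assumption of this sketch: a random square orthogonal transform matrix.
W, _ = np.linalg.qr(rng.standard_normal((d, d)))
x = rng.standard_normal(d)
e = ternary_embed(x, W, lam=0.6)
print(e)
```

Raising `lam` makes the embedding sparser: fewer components survive the threshold, so less information about the signature is kept.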

4.1 Aggregation strategies

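The nature of $a$ highly depends on the type of vector the aggregation function receives. When considering $h(a(x_1, \ldots, x_N))$, then $a$ gets continuous signatures. In this case it is possible to design two aggregation schemes:

$a(x_1, \ldots, x_N) = X \mathbf{1}_N$,  (2)
$a(x_1, \ldots, x_N) = (X^{+})^{\top} \mathbf{1}_N$,  (3)

where $X$ is the $d \times N$ matrix $[x_1, \ldots, x_N]$, $\mathbf{1}_N = (1, \ldots, 1)^{\top}$, and $X^{+}$ is the pseudo-inverse of $X$. Eq. (2) is called the sum and (3) the pinv schemes in [13].

When considering $a(h(x_1), \ldots, h(x_N))$, then $a$ gets the embeddings $e_i = h(x_i)$ of the signatures. Two additional aggregation strategies are the sum and sign pooling (4) and the majority vote (5):

$a(e_1, \ldots, e_N) = \mathrm{sign}\big(\sum_{i=1}^{N} e_i\big)$,  (4)
$a(e_1, \ldots, e_N)_j = \arg\max_{s \in \{-1, 0, +1\}} |\{ i : e_{i,j} = s \}|$.  (5)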
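The four aggregation strategies named above (sum, pinv, sum and sign pooling, majority vote) can be sketched with NumPy as follows. This is an illustrative reading of those names, not the authors' code: the function names are hypothetical, signatures and embeddings are stored as columns, and ties in the majority vote are broken in favour of the first symbol in the order -1, 0, +1, which is an arbitrary choice of this sketch.

```python
import numpy as np

def agg_sum(X):
    """Sum scheme: X is d x N (one signature per column); returns X @ 1_N."""
    return X.sum(axis=1)

def agg_pinv(X):
    """Pinv scheme: transpose of the pseudo-inverse of X applied to 1_N."""
    return np.linalg.pinv(X).T @ np.ones(X.shape[1])

def agg_sum_sign(E):
    """Sum and sign pooling: component-wise sign of the summed embeddings."""
    return np.sign(E.sum(axis=1)).astype(np.int8)

def agg_majority(E):
    """Majority vote: most frequent symbol per component (ties -> first symbol)."""
    symbols = np.array([-1, 0, 1])
    counts = np.stack([(E == s).sum(axis=1) for s in symbols], axis=1)
    return symbols[counts.argmax(axis=1)].astype(np.int8)

# Three ternary embeddings of length 4, stored as columns.
E = np.array([[ 1,  1,  0],
              [-1,  0,  0],
              [ 1, -1,  0],
              [ 1, -1, -1]])
print(agg_sum_sign(E))   # [ 1 -1  0 -1]
print(agg_majority(E))   # [ 1  0 -1 -1]
```

With linearly independent signatures, the pinv representative has a unit inner product with every enrolled signature, which is the property exploited in [13].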


4.2 Four resulting schemes

The assemblage of the blocks and the aggregation strategies overall create four variants. We name them:

  • HoA-2: this scheme sums the raw signatures into a unique vector before embedding it in order to obtain $r$. It therefore corresponds to the case where $r = h(a(x_1, \ldots, x_N))$, the aggregation $a$ being defined by (2).

  • HoA-3: here also, aggregation precedes embedding, $r = h(a(x_1, \ldots, x_N))$, and $a$ is defined by (3).

  • AoH-4: this scheme embeds each signature before aggregating with sum and sign pooling as defined by (4).

  • AoH-5: here also, embedding precedes aggregation, but the majority vote is used as defined by (5).

The score function comparing the hashed query with the group representation is always .

5 Reconstruction and Verification

This section makes the following assumptions: i) enrolled signatures are modelled as white Gaussian vectors, $x_i \sim \mathcal{N}(0_d, I_d)$; ii) the transform matrix $W$ is square and orthogonal, and is known by the attacker.

5.1 Ability to reconstruct from the embedding

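Since $W$ preserves the norm, the MSE on $x$ is the same as the mean square reconstruction error on $y = Wx$, which is also white Gaussian distributed. Thanks to the independence of the components of $y$, the conditional expectation can be computed component-wise. We introduce the density function conditioned on the interval $I_s$:

$f(y \mid I_s) = f(y)\,\mathbb{1}_{I_s}(y) / \mathbb{P}(I_s)$,

with intervals $I_{-1} = (-\infty, -\lambda]$, $I_0 = (-\lambda, \lambda)$, and $I_{+1} = [\lambda, +\infty)$. Function $f$ is the p.d.f. of $\mathcal{N}(0, 1)$ and $\mathbb{1}_{I}$ is the indicator function of interval $I$.

Observing that the $j$-th symbol of $e$ equals $s$ reveals that $y_j \in I_s$. This component is reconstructed as $\hat{y}_j = \mathbb{E}[y_j \mid y_j \in I_s]$. Note that $\hat{y}_j = 0$ for $s = 0$ because $f$ is symmetric around 0. For $s = +1$, the reconstruction value equals $f(\lambda)/p_\lambda$, where $p_\lambda = \mathbb{P}(I_{+1})$. By symmetry, the reconstruction value for $s = -1$ is $-f(\lambda)/p_\lambda$, and $\mathrm{MSE}_e$ admits the following closed form:

$\mathrm{MSE}_e = 1 - 2 f(\lambda)^2 / p_\lambda$.

This quantity starts at $1 - 2/\pi \approx 0.36$ when $\lambda = 0$. The embeddings are then full binary words (no symbol is null). All components are reconstructed by $\pm\sqrt{2/\pi}$ but with a large variance. As $\lambda$ increases, this variance decreases but fewer non-null components are reconstructed. $\mathrm{MSE}_e$ achieves a minimum of about $0.19$ for $\lambda \approx 0.61$, where about $54\%$ of the symbols of an embedding are non null. Then, $\mathrm{MSE}_e$ increases up to $1$ for a large $\lambda$: the embeddings become sparser and sparser. When the embedding is fully zero, each component is reconstructed by $0$, and $\mathrm{MSE}_e$ equals $1$.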
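The behaviour described above can be checked numerically. The sketch below assumes that each projected component is a standard Gaussian sample (zero mean, unit variance) and that it is rebuilt from its ternary symbol by the conditional mean; under that assumption the per-component error is $1 - 2 f(\lambda)^2 / \mathbb{P}(y \geq \lambda)$. The function names are hypothetical.

```python
import math

def mse_ternary(lam):
    """Per-component MSE when a N(0,1) sample is rebuilt from its ternary symbol."""
    pdf = math.exp(-0.5 * lam * lam) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(lam / math.sqrt(2.0))  # P(y >= lam)
    if tail == 0.0:
        # Threshold so large that every symbol is null: error is the variance.
        return 1.0
    return 1.0 - 2.0 * pdf * pdf / tail

def non_null_fraction(lam):
    """Expected fraction of non-null symbols, P(|y| >= lam)."""
    return math.erfc(lam / math.sqrt(2.0))

# Grid search for the threshold that is easiest to invert for the attacker.
lams = [i / 1000.0 for i in range(0, 4001)]
best = min(lams, key=mse_ternary)
print(mse_ternary(0.0))                       # error for full binary words
print(best, mse_ternary(best), non_null_fraction(best))
print(mse_ternary(4.0))                       # very sparse embeddings
```

The minimum found by the grid search is the point where the attacker reconstructs best, so a threshold away from it, on the sparse side, trades verification accuracy for a larger reconstruction error.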

Figure 2: Performance for multiple groups, for HoA-6 and AoH-7. (a) Verification performance vs. group size; dotted lines are the theoretical predictions. (b) Reconstruction error vs. group size.

5.2 Ability to reconstruct the signatures

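The curious server tries to reconstruct a unique vector $\hat{x}$ from $r$ which represents the $N$ enrolled signatures. Note that $r$ is scale invariant: scaling the signatures by any positive factor does not change $r$. Suppose that the curious server reconstructs $\hat{x} = \kappa u$ with $\|u\| = 1$. The best scaling minimizing (1) is: $\kappa^\star = u^\top m$ with $m = N^{-1} \sum_{i=1}^{N} x_i$. The curious server cannot compute $\kappa^\star$, which leads to a larger distortion:

$\mathrm{MSE}_E \geq (dN)^{-1} \sum_{i=1}^{N} \mathbb{E}[\|x_i\|^2] - d^{-1}\,\mathbb{E}[(u^\top m)^2]$.

This lower bound is further minimized by choosing $u \propto m$.

Therefore, aggregation (2) is less secure, as the other schemes do not allow the reconstruction of $m$. In the worst case (2), the curious server estimates $m$ by $\hat{m}$:

$\mathrm{MSE}_E = (dN)^{-1} \sum_{i=1}^{N} \mathbb{E}[\|x_i - m\|^2] + d^{-1}\,\mathbb{E}[\|m - \hat{m}\|^2]$.

The first term is the squared distance between $x_i$ and $m$, whereas the second term corresponds to the reconstruction error for inverting the embedding. In the end:

$\mathrm{MSE}_E = 1 - (1 - \mathrm{MSE}_e)/N$.

This figure of merit increases with $N$ because $\mathrm{MSE}_e \leq 1$, $\forall \lambda$: Packing more signatures increases security.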
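A small simulation illustrates why packing more signatures helps. It rests on two assumptions of this sketch: signatures are white Gaussian with unit variance, and the best single vector the server could hope for is the average of the group. Under them, the per-component squared distance between a signature and its group average is $1 - 1/N$, which already grows with $N$ before any embedding error is added. The function name is hypothetical.

```python
import numpy as np

def mean_sq_distance_to_average(N, d=256, trials=200, seed=0):
    """Monte Carlo estimate of the per-component E||x_i - m||^2 / d."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((trials, N, d))   # trials groups of N signatures
    m = X.mean(axis=1, keepdims=True)         # average of each group
    return float(np.mean((X - m) ** 2))

for N in (2, 4, 16):
    print(N, mean_sq_distance_to_average(N), 1.0 - 1.0 / N)
```

Even a server that recovered the group average exactly would therefore be left with an error close to the signature variance once the group holds more than a handful of members.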

5.3 Verification performances

We compare to a baseline defined as a Bloom filter optimally tuned for given $N$ and having length $\ell$. An embedding is mandatory to first turn the real signatures into discrete objects. This means that, under $\mathcal{H}_1$, a false negative happens whenever the embedding of the noisy query differs from the embedding of the enrolled signature.

Fig. 1(a) shows the verification performance vs. the reconstruction error (7) for the schemes of Sect. 4.1 for different sparsity levels. Two schemes perform better. For low privacy (small reconstruction error), HoA-3 achieves the best verification performance; for high privacy, AoH-4 is recommended. In these regimes, the performances are better than the Bloom filter.

Fig. 1(b) shows how the verification performances decrease as the number $N$ of enrolled signatures increases. As mentioned in [13], the behavior of the aggregation scheme depends on the ratio between the number of signatures and their dimension. The longer the signatures, the more of them can be packed into one representation.

6 Verification for multiple groups

When $N$ is large, aggregating all the signatures into a unique $r$ performs poorly. Rather, for large $N$, we propose to partition the enrolled signatures into $M$ groups, and to compute $M$ different representatives, one per partition.

Random assignment: The signatures are randomly assigned into $M$ groups of size $n = N/M$.

Clustering: Similar signatures are assigned to the same group. The paper uses the k-means algorithm to do so. Yet, the size of the groups is no longer constant.
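The two partitioning strategies can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, signatures are stored as rows here, and the clustering is a plain Lloyd iteration with random initial centers rather than any particular k-means library.

```python
import numpy as np

def random_assignment(N, M, rng):
    """Shuffle the indices and cut them into M groups of (almost) equal size."""
    return np.array_split(rng.permutation(N), M)

def kmeans_assignment(X, M, rng, iters=20):
    """Plain Lloyd k-means on the rows of X; group sizes are not controlled."""
    centers = X[rng.choice(len(X), size=M, replace=False)]
    for _ in range(iters):
        # Squared distance of every signature to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for g in range(M):
            if np.any(labels == g):
                centers[g] = X[labels == g].mean(axis=0)
    return [np.flatnonzero(labels == g) for g in range(M)]

rng = np.random.default_rng(0)
# Toy data: two well-separated blobs of 5 and 7 signatures.
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(10.0, 0.1, (7, 2))])
print([len(g) for g in random_assignment(12, 3, rng)])     # even groups
print(sorted(len(g) for g in kmeans_assignment(X, 2, rng)))  # uneven groups
```

The toy run shows the contrast stated above: random assignment yields groups of equal size, whereas clustering follows the data and produces groups of different sizes.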

6.1 Verification performances

Denote by $(p_{fp}^{(g)}, p_{fn}^{(g)})$ the operating point of group number $g$, $1 \leq g \leq M$. The overall system outputs a positive answer when at least one group test is positive. Denote by $(P_{fp}, P_{fn})$ the performance of the global system. Under $\mathcal{H}_0$, the query is not related to any vector. Therefore,

$P_{fp} = 1 - \prod_{g=1}^{M} (1 - p_{fp}^{(g)})$.  (12)

Under $\mathcal{H}_1$, the query is related to only one vector belonging to one group, say group $g$. A false negative occurs if this test produces a false negative and the other tests a true negative each:

$P_{fn} = p_{fn}^{(g)} \prod_{g' \neq g} (1 - p_{fp}^{(g')})$.  (13)

The operating point of a group test is mainly due to the size of the group. The random assignment creates even groups (if $M$ divides $N$), so these share the operating point $(p_{fp}, p_{fn})$.
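For even groups sharing one operating point, the global error rates follow directly from the rule that the system answers positively when at least one group test is positive. The sketch below additionally assumes that the group tests are statistically independent; the function name is hypothetical.

```python
def global_operating_point(p_fp, p_fn, M):
    """Global (P_fp, P_fn) for M independent group tests with a shared operating point."""
    # A non-member is accepted as soon as one of the M tests fires.
    P_fp = 1.0 - (1.0 - p_fp) ** M
    # A member is rejected if its own group misses it and no other group fires.
    P_fn = p_fn * (1.0 - p_fp) ** (M - 1)
    return P_fp, P_fn

for M in (1, 10, 100):
    print(M, global_operating_point(p_fp=0.01, p_fn=0.05, M=M))
```

The false positive rate grows with the number of groups, so each group test has to run at a stricter threshold when the partition is finer.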

Fig. 2(a) shows the experimental performance and the one predicted by (12) and (13) over a range of group sizes. Since clustering makes groups of different sizes, we show the performances versus $\min_g n_g$, where $n_g$ is the size of the $g$-th group. The theoretical formulas are more accurate for random partitioning, where the groups are even. Estimations of the group operating points were less precise with the clustering strategy, and this inaccuracy accumulates in (12) and (13).

Clustering improves the verification performances a lot, especially for HoA-3. A similar phenomenon was observed in [13]. Yet, Fig. 2(b) shows that it does not endanger the system: the reconstruction error is only slightly smaller than for random assignment, and indeed close to 1 for . This is obtained for for HoA-3 giving . The space is so big that the clusters are gigantic and do not reveal much about where the signatures are. However, the anonymity is reduced because the server learns which group provided a positive test. This is measured in terms of $k$-anonymity by the size of the smallest group, i.e. $k = \min_g n_g$. Fig. 2(a) indeed shows the trade-off between $k$-anonymity and the verification performances.

7 Conclusion

This paper proposed four schemes for verifying the group membership of continuous high dimensional vectors. The keystones are the aggregation and embedding functions. They prevent accurate reconstruction of the enrolled signatures, while still recognizing noisy versions of them. However, anonymity is slightly reduced when managing many signatures aggregated into several representatives: the server is able to link each signature to its group number, but to nothing more. The full identity of the user is therefore preserved.


  • [1] Stuart Schechter, Todd Parnell, and Alexander Hartemink, “Anonymous authentication of membership in dynamic groups,” in Proceedings of the International Conference on Financial Cryptography, 1999.
  • [2] J. R. Troncoso-Pastoriza, D. González-Jiménez, and F. Pérez-González, “Fully private noninteractive face verification,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 7, July 2013.
  • [3] Zekeriya Erkin, Martin Franz, Jorge Guajardo, Stefan Katzenbeisser, Inald Lagendijk, and Tomas Toft, “Privacy-preserving face recognition,” in Proceedings of the International Symposium on Privacy Enhancing Technologies, 2009.
  • [4] Ahmad-Reza Sadeghi, Thomas Schneider, and Immo Wehrenberg, “Efficient privacy-preserving face recognition,” in Proceedings of the International Conference on Information, Security and Cryptology, 2010.
  • [5] Giuseppe Bianchi, Lorenzo Bracciale, and Pierpaolo Loreti, “‘Better than nothing’ privacy with Bloom filters: To what extent?,” in Proceedings of the International Conference on Privacy in Statistical Databases, 2012.
  • [6] Dan Boneh, Eyal Kushilevitz, Rafail Ostrovsky, and William E. Skeith, “Public key encryption that allows pir queries,” in Proceedings of the International Cryptology Conference, Advances in Cryptology, 2007.
  • [7] M. Beck and F. Kerschbaum, “Approximate two-party privacy-preserving string matching with linear complexity,” in Proceedings of the IEEE International Congress on Big Data, 2013.
  • [8] Behrooz Razeghi, Slava Voloshynovskiy, Dimche Kostadinov, and Olga Taran, “Privacy preserving identification using sparse approximation with ambiguization,” in Proceedings of the IEEE International Workshop on Information Forensics and Security, 2017.
  • [9] Behrooz Razeghi and Slava Voloshynovskiy, “Privacy-preserving outsourced media search using secure sparse ternary codes,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
  • [10] J. Sivic and A. Zisserman, “Video google: a text retrieval approach to object matching in videos,” in Proceedings of the IEEE International Conference on Computer Vision, 2003.
  • [11] Hervé Jégou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Pérez, and Cordelia Schmid, “Aggregating local image descriptors into compact codes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1704–1716, 2012.
  • [12] F. Perronnin and C. Dance, “Fisher kernels on visual vocabularies for image categorization,” in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2007.
  • [13] Ahmet Iscen, Teddy Furon, Vincent Gripon, Michael Rabbat, and Hervé Jégou, “Memory vectors for similarity search in high-dimensional spaces,” IEEE Transactions on Big Data, 2017.