Cloud-based machine learning platforms such as Amazon SageMaker [joshi2020amazon], Google AutoML [bisong2019google], and Microsoft Azure [team2016azureml] have recently been widely adopted. They allow companies and application developers to easily build and deploy their AI applications as a service (AIaaS). However, users of AIaaS may encounter two major challenges. 1) Large Data Requirement: Deep learning models usually require large amounts of training data. This training data needs to be uploaded to the cloud services for the developers to build their models, which may be inconvenient and at times infeasible. 2) Data Privacy Concerns: Sharing data with untrusted servers may pose threats to end-user privacy. For instance, a biometric authentication application deployed in the cloud will expose user photos to a third-party cloud service.
To address the large data requirement problem, there has been increasing research on approaches that require less training data, popularly known as Few-Shot Learning [Wang2019FewshotLA]. Specifically, metric-based few-shot classification methods [snell_prototypical_2017, vinyals_matching_2016, oreshkin_tadam:_2018, sung_learning_2017, yoon_tapnet:_2019] learn to map images of unseen classes into distinct embeddings that preserve the distance relations among them, and then classify the input query image by its distance to the class embeddings. Recent works have achieved up to 90% accuracy on the challenging task of 5-way 5-shot classification on the MiniImageNet dataset [leaderboard]. Despite the success and promise of few-shot learning, it is imperative to address the data privacy concerns to protect user-supplied sensitive data, e.g., when a metric-based few-shot model is deployed in a cloud server (Fig. 1).
Several privacy-preserving approaches may be adopted in machine learning applications, including cryptography, differential privacy, and data obfuscation. Recent works [cabrero2021sok] adopted cryptographic techniques to protect the confidentiality of data. For example, remote machine learning services can provide results on encrypted queries [cabrero2021sok]; a range of primitives, such as Homomorphic Encryption, may be adopted to manage the encrypted data. Despite promising results, crypto-based methods incur high computational overhead, creating challenges for practical deployment. Furthermore, such solutions may breach privacy by disclosing the exact computation results, and an adversary may utilize the model's output to launch inference attacks on training data [fredrikson2015model, shokri2017membership]. Differential privacy [dwork2014privacybook] has been adopted to train machine learning models while providing indistinguishability guarantees for individual records in the training set [abadi_deep_2016]. However, the strong privacy guarantees tend to reduce the model performance and have shown disparate impacts on the underrepresented classes [disparateML]. In contrast, data obfuscation methods, e.g., image blurring and pixelization, achieve privacy protection without incurring high computational costs. Obfuscation can be applied to protect both training and testing data, and can provide differential privacy guarantees for individual-level data [fan_image_2018].
This paper focuses on the privacy of testing data (support + query) specifically for few-shot learning. A few-shot model built for clean images exhibits poor performance when tested with noisy/private image data. This is because meta-learning-based few-shot models do not work well with out-of-distribution tasks [finn_model-agnostic_2017, snell_prototypical_2017, parnami2022learning]. Therefore, applying the obfuscation methods to the image data and simply using an off-the-shelf pre-trained few-shot model leads to degradation in performance, as observed in our experiments (Fig. 3, Baseline Model). Hence, it is imperative to study privacy specifically in the context of few-shot learning. To this end, we suggest a private few-shot learning approach trained on noisy data samples, as illustrated in Fig. 2. After applying an obfuscation mechanism to the local input data samples, a user transfers privacy-encoded data to the cloud. The proposed jointly trained denoiser and embedding network, the Denoising Network, constructs a privacy-preserved latent space for robust few-shot classification. To validate the proposed approach, we examine four privacy methods, including traditional obfuscation methods such as Pixelization and Blurring, which do not provide quantifiable privacy guarantees [McPherson2016DefeatingIO], and also Differentially Private Pixelization (DP-Pix) [fan_image_2018], which provides differential privacy guarantees.
This study examines practical implications for a holistic private few-shot learning framework on an untrusted service platform, which has not been studied previously. Our main contributions are 1) proposing the first unified framework for deploying few-shot learning models in the cloud while protecting the privacy of user-supplied sensitive data, 2) thoroughly examining privacy methods on three different datasets of varying difficulty, and 3) discovering and observing the existence of an effective privacy-preserved latent space for few-shot learning.
II Few-Shot Learning
Few-shot learning is a subfield of machine learning that focuses on the ability of machine learning models to generalize from few training examples. The recent progress in the field has largely come from a machine learning technique called meta-learning [finn_model-agnostic_2017]. The idea is to train a model on numerous different but similar kinds of tasks such that the model can generalize to a new test task (as long as the test task is from the same distribution of tasks the model was initially trained on). There are three kinds of few-shot learning: metric-based, optimization-based, and model-based [vinyals]. Here we only discuss metric-based techniques and refer the reader to our survey paper [parnami2022learning] for further reading.
The early seminal works in metric-based few-shot learning include Prototypical Networks [snell_prototypical_2017], Matching Networks [vinyals_matching_2016], and Relation Networks [sung_learning_2017]. In all these methods, the network learns to encode embeddings of input images such that images that belong to the same class are closer to each other and those from different classes are farther apart, where closeness is defined in terms of a metric such as the Euclidean distance.
II-A Few-Shot Classification
We base our framework (Fig. 2) on Prototypical Networks [snell_prototypical_2017] for building our Few-Shot Private Image Classification (FS-PIC) model. The model is trained on a labeled dataset $D_{train}$ and tested on $D_{test}$. The sets of classes present in $D_{train}$ and $D_{test}$ are disjoint. The test set has only a few labeled samples per class. We follow an episodic training paradigm in which, in each episode, the model is trained to solve an $N$-way $K$-shot private image classification task. Each episode is created by first sampling $N$ classes from the training set and then sampling two sets of examples from these classes: (1) the support set $S$ containing $K$ examples for each of the $N$ classes and (2) the query set $Q$ containing different examples from the same $N$ classes. The episodic training for the FS-PIC task minimizes, for each episode, the loss on the prediction of samples in the query set, given the support set. The model is a parameterized function, and the loss is the negative log-likelihood of the true class of each query sample:
$$\mathcal{L}(\theta) = -\sum_{(x, y) \in Q_t} \log p_\theta(y \mid x, S_t), \tag{1}$$
where $(x, y) \in Q_t$ is the sampled query, $S_t$ is the support set at episode $t$, and $\theta$ is the model parameter.
Prototypical Networks make use of the support set to compute a centroid (prototype) for each class (in the sampled episode), and query samples are classified based on the distance to each prototype. For instance, a CNN $f_\theta$, parameterized by $\theta$, learns an $M$-dimensional space where $D$-dimensional input samples of the same class are close and those of different classes are far apart. For every episode $t$, each embedding prototype $\mathbf{p}_c$ (of class $c$) is computed by averaging the embeddings of all support samples of class $c$ as
$$\mathbf{p}_c = \frac{1}{|S_t^c|} \sum_{(x_i, y_i) \in S_t^c} f_\theta(x_i), \tag{2}$$
where $S_t^c \subset S_t$ is the subset of support examples belonging to class $c$. Given a distance function $d(\cdot, \cdot)$, the distance of the query $x$ to each of the class prototypes $\mathbf{p}_c$ is calculated. By taking a softmax [bridle1990probabilistic] of the measured (negative) distances, the model produces a distribution over the $N$ classes in each episode:
$$p_\theta(y = c \mid x, S_t) = \frac{\exp\left(-d(f_\theta(x), \mathbf{p}_c)\right)}{\sum_{c'} \exp\left(-d(f_\theta(x), \mathbf{p}_{c'})\right)}, \tag{3}$$
where the metric $d$ is the Euclidean distance and the parameters $\theta$ of the model are updated with stochastic gradient descent by minimizing Equation (1). Once the training finishes, the parameters of the network are frozen. Then, given any new FS-PIC task, the class corresponding to the maximum $p_\theta(y = c \mid x, S_t)$ is the predicted class for the input query $x$.
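The episodic procedure described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the dictionary-based data layout, and the `embed` callable (standing in for the trained embedding network) are all hypothetical, and the loss is averaged over the query set, which is an assumption that only rescales a summed loss.

```python
import math
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode as (support, query) lists of (x, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


def prototypes(support, embed):
    """Average the embeddings of the support samples of each class."""
    sums, counts = {}, {}
    for x, label in support:
        z = embed(x)
        acc = sums.setdefault(label, [0.0] * len(z))
        for i, value in enumerate(z):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def class_probabilities(x, protos, embed):
    """Softmax over negative Euclidean distances to the class prototypes."""
    z = embed(x)
    logits = {c: -math.dist(z, p) for c, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def episode_loss(query, protos, embed):
    """Negative log-likelihood of the true class, averaged over the query set."""
    nll = [-math.log(class_probabilities(x, protos, embed)[y]) for x, y in query]
    return sum(nll) / len(nll)
```

At test time the predicted class is simply the key with the largest probability, e.g. `max(probs, key=probs.get)`.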
III Privacy Methods
We study the following methods to introduce privacy in images (depicted in Fig. 8 of the Appendix).
III-A Independent Gaussian Noise
Introducing noise into an image is one way to distort information [Mivule2013UtilizingNA]. Kim [Kim2002AMF] first published work on additive noise with the general expression $Z = X + \eta$, where $X$ is the original data point, $\eta$ is the random variable (noise) with a distribution $\eta \sim \mathcal{N}(\mu, \sigma^2)$, and $Z$ is the transformed data point, obtained by the addition of noise to the input $X$.
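As a rough illustration of additive noise, the sketch below perturbs each pixel of a grayscale image independently with Gaussian noise. The function name and the list-of-rows image layout are hypothetical, and clipping the result to the range 0 to 255 is an assumption made here to keep values displayable; it is not stated in the text.

```python
import random


def add_gaussian_noise(image, sigma, mean=0.0, rng=None):
    """Return a noisy copy of a grayscale image given as rows of 0..255 values."""
    rng = rng or random.Random()
    noisy = []
    for row in image:
        # Each pixel receives an independent Gaussian draw; clipping is an assumption.
        noisy.append([min(255.0, max(0.0, p + rng.gauss(mean, sigma))) for p in row])
    return noisy
```

A larger `sigma` distorts the image more strongly, which corresponds to a stronger obfuscation setting.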
III-B Common Image Obfuscation
Two widely used image obfuscation techniques are Pixelization and Blurring.
Pixelization (also referred to as mosaicing) can be achieved by superposing a rectangular grid with cells of size $b \times b$ over the original image and averaging the color values of the pixels within each grid cell.
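Pixelization can be sketched in a few lines of plain Python, assuming a grayscale image stored as a list of rows and a cell size written here as `b`. Both function names are hypothetical, and the authors' experiments use an imaging library rather than code like this.

```python
def pixelize(image, b):
    """Average a grayscale image over non-overlapping b x b grid cells."""
    height, width = len(image), len(image[0])
    cells = []
    for top in range(0, height, b):
        row = []
        for left in range(0, width, b):
            block = [image[y][x]
                     for y in range(top, min(top + b, height))
                     for x in range(left, min(left + b, width))]
            row.append(sum(block) / len(block))
        cells.append(row)
    return cells


def upsample(cells, b, height, width):
    """Repeat each cell average over its b x b block to restore the image size."""
    return [[cells[y // b][x // b] for x in range(width)] for y in range(height)]
```

Calling `upsample(pixelize(image, b), b, height, width)` yields the familiar mosaic image at the original resolution.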
Blurring, i.e., Gaussian blur, removes details from an image by convolving a 2D Gaussian kernel with the image. Let the radius of blur be $r$; then the size of the 2D kernel is $(2r+1) \times (2r+1)$. The values in this 2D kernel are sampled from the distribution
$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right), \tag{4}$$
where $(x, y)$ are the coordinates inside the 2D kernel with origin at the center and the standard deviation $\sigma$ is approximated from the radius $r$ [Gwosdek2011TheoreticalFO]. We use the Pillow imaging library [clark2015pillow] for the implementation.
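To make the kernel construction concrete, the sketch below builds a 2D Gaussian kernel for a given radius, assuming a kernel of size (2r+1) x (2r+1) centered at the origin. The function name is hypothetical, `sigma` is passed explicitly rather than approximated from the radius, and normalizing the kernel so its entries sum to one is a common convention assumed here, not something the text specifies.

```python
import math


def gaussian_kernel(radius, sigma):
    """Build a normalized (2*radius + 1) x (2*radius + 1) Gaussian kernel."""
    norm = 2.0 * math.pi * sigma * sigma
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) / norm
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    # Normalize so that blurring preserves the overall image brightness (assumption).
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]
```

Convolving an image with this kernel replaces each pixel by a weighted average of its neighbors, with weights that fall off with distance from the center.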
III-C Differentially Private Image Pixelization
Differential privacy (DP) is the state-of-the-art privacy paradigm for statistical databases [dwork2014privacybook]. Differentially Private Pixelization (DP-Pix) [fan_image_2018] extends the DP notion to image data publication. It introduces a concept of $m$-Neighborhood, where two images ($I$ and $I'$) are neighboring images if they differ by at most $m$ pixels. By differential privacy, content represented by up to $m$ pixels can be protected. A popular mechanism to achieve DP is the Laplace mechanism. However, the global sensitivity of direct image perturbation would be very high, i.e., $255m$, leading to high perturbation error. The DP-Pix method first performs pixelization $P_b$ (with grid cells of $b \times b$ pixels) on the input image $I$, and then applies Laplace perturbation to the pixelized image $P_b(I)$, effectively reducing the sensitivity to $\frac{255m}{b^2}$. The following equation summarizes the algorithm ($A_p$) to achieve $\epsilon$-differential privacy:
$$A_p(I) = P_b(I) + \mathbf{r}, \tag{5}$$
where each value in $\mathbf{r}$ is randomly drawn from a Laplace distribution with mean 0 and scale $\frac{255m}{b^2\epsilon}$. The parameter $\epsilon$ specifies the level of DP guarantee, where smaller values indicate stronger privacy. As DP is resistant to post-processing [dwork2014privacybook], any computation performed on the output of DP-Pix, i.e., the perturbed pixelized images, would not affect the $\epsilon$-DP guarantees. Our approach proposes a denoising module for the images obfuscated by DP-Pix, improving the latent representation without sacrificing DP guarantees.
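The two steps of DP-Pix, pixelization followed by Laplace perturbation, can be sketched as follows. This is an illustrative sketch rather than the reference implementation of [fan_image_2018]: the function names are hypothetical, the Laplace scale 255m/(b^2 * epsilon) is assumed from the DP-Pix formulation, and the image dimensions are required to be multiples of `b` so that every cell contains exactly b x b pixels. A Laplace sample is drawn as the difference of two exponential samples, which is a standard identity.

```python
import random


def laplace_scale(b, m, epsilon):
    """Scale of the Laplace noise, assuming sensitivity 255 * m / b**2."""
    return 255.0 * m / (b * b * epsilon)


def dp_pix(image, b, m, epsilon, rng=None):
    """Pixelize a grayscale image with b x b cells, then add Laplace noise per cell."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if height % b or width % b:
        raise ValueError("image dimensions must be multiples of b")
    scale = laplace_scale(b, m, epsilon)
    noisy_cells = []
    for top in range(0, height, b):
        row = []
        for left in range(0, width, b):
            block = [image[y][x]
                     for y in range(top, top + b)
                     for x in range(left, left + b)]
            cell_mean = sum(block) / len(block)
            # Difference of two Exp(1/scale) draws is Laplace(0, scale).
            noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            row.append(cell_mean + noise)
        noisy_cells.append(row)
    return noisy_cells
```

Increasing `b` shrinks the noise scale quadratically but coarsens the pixelization, which is the trade-off between Laplace noise and approximation error discussed for DP-Pix.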
IV Privacy Enhanced Few-shot Image Classification
To build a few-shot model that can preserve the privacy of the input images, we can utilize any of the privacy methods discussed in the previous section. However, doing so may degrade the few-shot classification performance tremendously. To avoid this, we introduce a denoiser and train it jointly for few-shot classification using meta-learning on noisy images (Fig. 2). Together, the denoiser and the embedding network form our Denoising Network. Combined with a properly chosen privacy method, the Denoising Network aims to discover a privacy-preserved latent embedding space (rather than denoising to recover the original image), where the privacy of input data is preserved and robustness and generality for few-shot classification are maintained.
Denoiser: Zhang et al. [zhang_beyond_2017] proposed a denoising convolutional neural network (DnCNN) which uses residual learning to predict the Gaussian noise. Specifically, the input of the network is a noisy observation $\tilde{x} = x + v$, where $\tilde{x}$ is the input image, $x$ is the clean image, and $v$ is the actual noise. The network learns the residual mapping function $R(\tilde{x}) \approx v$ and predicts the clean image using $\hat{x} = \tilde{x} - R(\tilde{x})$. The averaged mean squared error between the predicted residue and actual noise is used as the loss function to train this denoiser with parameters $\phi$ as
$$\mathcal{L}_{denoise}(\phi) = \frac{1}{2n} \sum_{i=1}^{n} \left\| R(\tilde{x}_i; \phi) - (\tilde{x}_i - x_i) \right\|^2, \tag{6}$$
where $n$ is the number of noisy-clean image pairs. We plug the DnCNN denoiser into our FS-PIC pipeline (Fig. 2) to estimate the clean image before pixelization, blurring, Gaussian noise, and DP-Pix. The architecture of the denoiser can be found in Fig. 9 of the Appendix.
Embedding Network: Partially denoised images from the denoiser are fed to the embedding network to obtain denoised embeddings, which then form the class prototypes. The classification loss is measured using Eq. 1.
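The total loss for training the Denoising Network (Denoiser + Embedding Network) is formulated as the sum of the denoising loss and the classification loss:
$$\mathcal{L}_{total} = \mathcal{L}_{denoise} + \mathcal{L}_{cls}, \tag{7}$$
where $\mathcal{L}_{denoise}$ is the denoising loss of the denoiser and $\mathcal{L}_{cls}$ is the few-shot classification loss (Eq. 1).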
The joint loss enforces the reduction of noise in input images while learning the distinctive representations that maximize the few-shot classification accuracy. This simple loss guides the embedding space towards a privacy-preserved latent space without losing its generality. For Prototypical Networks, the prototypes are expected to be the centers of the privacy-preserved embeddings for each class. Although the sum of losses can be weighted, we observed in our experiments that weighting did not significantly impact the final accuracy of the few-shot image classification model as long as the weighting coefficients are non-zero. We outline the episodic training process used for building an FS-PIC model in Algorithm 1 and describe the notations used in Table I.
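The joint objective can be illustrated numerically with a small sketch. It assumes images flattened into lists of pixel values, a denoising loss that follows the DnCNN convention of dividing the summed squared error by twice the number of image pairs, and a classification loss given by the average negative log-probability of the true class. All names, including the weights `w_denoise` and `w_cls`, are hypothetical.

```python
import math


def denoising_loss(predicted_residuals, noisy_images, clean_images):
    """Squared error between predicted residual and actual noise (noisy - clean)."""
    total = 0.0
    for residual, noisy, clean in zip(predicted_residuals, noisy_images, clean_images):
        total += sum((r - (y - x)) ** 2 for r, y, x in zip(residual, noisy, clean))
    return total / (2.0 * len(noisy_images))  # 1/(2n) factor is an assumption


def classification_loss(true_class_probs):
    """Average negative log-likelihood of the true class of each query sample."""
    return -sum(math.log(p) for p in true_class_probs) / len(true_class_probs)


def total_loss(predicted_residuals, noisy_images, clean_images, true_class_probs,
               w_denoise=1.0, w_cls=1.0):
    """Weighted sum of both losses; unit weights give the plain sum."""
    return (w_denoise * denoising_loss(predicted_residuals, noisy_images, clean_images)
            + w_cls * classification_loss(true_class_probs))
```

With both weights left at 1.0, the function reduces to the unweighted sum of the denoising and classification losses.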
|#examples in the training set|
|#classes in the training set|
|#classes sampled per episode|
|#support examples sampled per class|
|#query examples sampled per class|
Datasets: 1) Omniglot [lake_one_2011] is a dataset of 1623 handwritten characters collected from 50 alphabets. Each character has 20 examples, each drawn by a different human subject. We follow the same procedure as in [vinyals_matching_2016] by resizing the gray-scale images to $28 \times 28$ and augmenting the character classes with rotations in multiples of 90 degrees. Our training, validation, and testing split is of sizes 1028, 172, and 423 characters, respectively (or 4112, 688, and 1692 with augmentation). 2) CelebFaces Attributes Dataset (CelebA) [liu2015faceattributes] is a large-scale face attributes dataset with more than 10K celebrities (classes). For the purpose of our experiments, we select classes that have at least 30 samples. This gives us 2360 classes in total, out of which 1510 are used for training, 378 for validation, and 427 for testing. We use the aligned and cropped version of the dataset, in which images are of dimension $178 \times 218$. We center crop each image to and then resize to . 3) MiniImageNet [vinyals_matching_2016] dataset contains 100 general object classes where each class has 600 color images. The images are resized to $84 \times 84$, and the dataset is split into 64 training, 16 validation, and 20 testing classes following [snell_prototypical_2017].
Settings for Privacy Methods: We explore a range of parameter values for each privacy method. Gaussian blur with radius $r$ is used for blurring images. A filter window of size $b \times b$ is used for pixelization. The pixelated image is then resized to match the model input dimensions. We perform experiments with Gaussian noise with mean 0 and standard deviation $\sigma$. For DP-Pix, we fix the privacy parameters and vary the pixelization parameter $b$.
Denoising Network: We use a lighter version of the DnCNN [zhang_beyond_2017] model, i.e., with 8 CNN layers instead of 17, for first denoising the image and subsequently feeding the denoised image into one of the following embedding networks. Conv-4 is a 4-layer convolutional neural network with 64 filters in each layer, originally proposed in [snell_prototypical_2017] for few-shot classification. ResNet-12 is a 12-layer CNN with 4 residual blocks. It has been shown to have better classification accuracy on few-shot image classification tasks. The architectures of the two embedding networks are detailed in Fig. 10 of the Appendix.
Training and Evaluation: We train using N-way K-shot PIC tasks (Algorithm 1) and use the Adam optimizer with a learning-rate decay of 0.5 every 20 epochs. Table II lists the hyperparameters for the three datasets. The network is trained to minimize the total loss of the denoiser and classifier (Eq. 7). We evaluate the performance by sampling 5-way 5-shot PIC tasks (with the same privacy settings) from the test sets and measuring the classification accuracy. The final results report the performance averaged over 1000 test episodes for the Omniglot dataset, and 600 test episodes for both the MiniImageNet and CelebA datasets. To measure the effectiveness of the proposed denoising embedding space, we both train and evaluate each model's performance in two settings: 1) without using the denoiser and 2) jointly training the denoiser with the classifier, i.e., the proposed Denoising Network.
Privacy Risk Evaluation: Privacy attacks on trained models such as model inversion [fredrikson2015model] and membership inference [shokri2017membership] are not applicable in our setting because the denoising and embedding models are trained with publicly available classes (data) using meta-learning. The user-supplied test data (support and query set) are obfuscated for privacy protection. A practical privacy attack on obfuscated images is to infer the identities using existing facial recognition systems and public APIs, e.g., Rekognition. In this study, our goal is to investigate (1) the efficacy of the studied image obfuscation methods for privacy protection and (2) whether the proposed denoising approach has effects on privacy. To simulate a powerful adversary, we apply state-of-the-art face recognition techniques, e.g., FaceNet with the Inception ResNet V1 network [facenet], on the CelebA dataset; MTCNN [MTCNN] is applied to detect and resize the facial region in each input image. Specifically, 1000 entities were randomly selected from the CelebA dataset. For each entity, we randomly sampled 30 images, which were then partitioned between training and testing (20 : 10). Different versions of the test set were generated by applying image obfuscation methods with various parameter values (denoted as Noisy) and by applying the proposed Denoising Network (denoted as Denoised). We fine-tuned the Inception network and trained an SVC classifier on the clean training data. In Fig. 5, we report the accuracy on the noisy and denoised test sets, i.e., the success of re-identification, with higher values indicating higher privacy risks.
VI Results and Discussions
VI-A Task Difficulty
The average 5-way 5-shot classification accuracy of our baseline few-shot model [snell_prototypical_2017] trained on clean images and tested on clean images is 99% on Omniglot dataset, 91% on CelebA dataset, and 61% on MiniImageNet dataset using Conv-4 encoder (Table III). This shows the approximate level of difficulty of few-shot tasks for each dataset i.e., Omniglot tasks are easy, tasks from CelebA have medium difficulty, and MiniImageNet tasks are hard.
We compare results for few-shot private image classification using three models in Fig. 3:
Baseline Few-Shot Model: When the few-shot model is trained on clean images and is tested on noisy images.
Noisy Few-Shot Model Without Denoiser: When the baseline few-shot model is trained on noisy images and is tested on noisy images with same privacy settings.
Noisy Few-Shot Model With Denoiser: When the baseline few-shot model is jointly trained with the denoiser on noisy images and is tested on noisy images with same privacy settings (Algorithm 1).
In all cases, we observe that the noisy few-shot models outperform the baseline few-shot model by a wide margin. Also, in most cases, we note that adding a denoiser improves the accuracy. To better observe the effectiveness of the denoiser, in Fig. 4, we quantify the improvement as the % Gain in accuracy obtained with the denoiser. We also quantify the change to the original image caused by the privacy method (post denoising) by calculating the Structural Similarity Index (SSIM) [SSIM] between the denoised image and the original clean image, averaged over 100 test images for each dataset and privacy parameter.
Blurring, Pixelization and Gaussian Noise: As we increase the value of the privacy parameters, the SSIM decreases, suggesting higher dissimilarity between the denoised images and the original image (Fig. 4). Despite the degradation caused by the privacy method to the original image, we observe positive % Gains for all three datasets. Specifically, on hard tasks (MiniImageNet), a gain of up to 15% in accuracy is observed with the proposed Denoising Network, reaffirming the generality of few-shot learning. For easy (Omniglot) and medium (CelebA) tasks, where the baseline accuracy is already high, a relatively small positive gain of up to 5% is reported.
DP-Pix: As we increase the size of the pixelization window ($b$), the amount of Laplace noise that we add to the image decreases (as defined by the scale $\frac{255m}{b^2\epsilon}$); however, the image quality decreases because of increasing pixelization. Therefore, we observe a trade-off point where the accuracy first increases and then decreases as we increase $b$ (Fig. 3). This trade-off is particularly observed for the CelebA and MiniImageNet datasets. For the Omniglot dataset, the performance just decreases with increasing $b$ because of the low-resolution images in the dataset. From Fig. 4, we observe that DP-Pix has the lowest SSIM values when compared with the other privacy methods, causing the most notable changes to the original image. Interestingly, even with low SSIM values, we find instances that exhibit a moderate % Gain, indicating the presence of privacy-preserving denoising embeddings. Results using the ResNet-12 encoder are presented in Fig. 11 and 12 of the Appendix.
VI-C Empirical Privacy Risks
As described earlier, this experiment showcases the efficacy of image obfuscation methods against a practical adversary who utilizes state-of-the-art face recognition models trained on clean images. Furthermore, this experiment empirically investigates whether the proposed denoising network has effects on privacy protection. To this end, we vary the algorithmic parameters of the privacy methods and report the face re-identification rates on both obfuscated and denoised images in Fig. 5. For the clean test set sampled from CelebA, the re-identification accuracy is 68.12%.
Blurring: In Fig. 5(a), the privacy risks are quite high with a small Gaussian kernel size for both blurred and denoised images. As we increase the radius $r$, the chance of re-identifying the image decreases rapidly. We also observe that after denoising, the blurred images are more likely to be re-identified. For instance, at one of the studied radii, the re-identification rate is 7.34% for blurred images but 54.01% for denoised images.
Pixelization: As shown in Fig. 5(b), a small cell size in pixelization, e.g., $b = 2$, leads to high face re-identification rates for both pixelized and denoised images. Increasing $b$ helps reduce the rate of face re-identification rapidly, e.g., from 53.73% to 4.82% for pixelized images by increasing $b$ from 2 to 6. Denoising slightly increases the privacy risk, but the additional risk diminishes with larger $b$ values and is much lower than that observed in blurring.
Additive Gaussian: Over the range of $\sigma$ values studied in our experiments, Additive Gaussian inflicts lower privacy risks with small noise, compared to Blurring and Pixelization. As shown in Fig. 5(c), increasing $\sigma$ leads to a moderate reduction in the privacy risk. For example, the face re-identification rate is 5.11% at the mid noise level, reduced from 8.54% at the low noise level. Denoising the obfuscated images leads to a significant increase in the privacy risk at low $\sigma$ values, e.g., a 33.55% higher face re-identification rate.
DP-Pix: Fig. 5(d) presents the re-identification results for images obfuscated with DP-Pix as well as those denoised by our proposed approach. We observe low privacy risks across all $b$ values. Furthermore, performing denoising on DP-Pix obfuscated images does not lead to significantly higher privacy risks with any $b$ value, as opposed to other image obfuscation methods. While face re-identification rates are consistently low, higher rates occur for certain values of $b$. Recall that higher utility was also observed for particular values of $b$ in Fig. 3. It has been reported in [fan_image_2018] that the quality of obfuscated images may be optimized by tuning the $b$ value given the privacy requirement $\epsilon$, by balancing the approximation error by pixelization and the Laplace noise with scale $\frac{255m}{b^2\epsilon}$.
VI-D Qualitative Evaluation of Privacy Methods
Fig. 6 provides a qualitative evaluation of the obfuscated and denoised images generated for a range of parameter values. MTCNN [MTCNN] was applied to a sample input image of CelebA to detect the facial region. Perceptually, the proposed denoiser may improve the image quality of the obfuscated images to various extents. However, image quality does not always correlate with empirical privacy risks, i.e., face re-identification with public models. In combination with Fig. 5, we observe that the proposed denoising leads to various levels of privacy risk increment, while producing higher quality images. For example, the results show a higher privacy risk increment for Blurring (23.33%), a moderate increment for Pixelization (10.14%) and Additive Gaussian (7.96%), and little increment for DP-Pix. Fig. 6 confirms that the denoiser performance may vary depending on the image obfuscation method, and that DP-Pix provides consistent privacy protection even with denoising. Additional qualitative results are presented in Fig. 13 of the Appendix.
VI-E Observation of the Privacy-Preserved Embedding Space
Fig. 7 shows the evolution of the embeddings in the process of privacy encoding and privacy-preserved representation learning by presenting the t-SNE [van2008visualizing] visualization of the clean, noisy, and denoised embeddings of randomly sampled 100 test images from a total of 5 classes from the CelebA dataset. The embeddings are obtained from the ResNet-12 encoder trained under different noise settings for 5-way 5-shot classification. We say the embeddings are clean when the input images have no noise and the encoder is trained for few-shot classification of clean images. The noisy embeddings are obtained by using the encoder trained for few-shot classification of noisy images and without using the denoiser. The denoised embeddings are obtained by the proposed Denoising Network (Fig. 2) i.e., the encoder trained in conjunction with denoiser for few-shot classification on noisy images.
We report the results for a case when a few-shot method such as Prototypical Networks can generate good clusters for the clean images (Fig. 7a), and observe the impact on clustering with noisy images and subsequently when those images are denoised. We notice that when the initial clusters are good, pixelization (Fig. 7c) and blurring (Fig. 7e) have little impact on the quality of the clusters even with a high amount of noise. Therefore, pixelization and blurring maintain generality (robust to noise) but are also vulnerable to re-identification. Gaussian noise (Fig. 7b) distorts the initial clusters more significantly, which can lead to lower few-shot classification performance. Applying denoising to Gaussian noise improves the clustering results, but still poses a moderate privacy threat, as seen in the re-identification experiments (Fig. 5(c)). Similarly, with DP-Pix (Fig. 7d), the original clusters are also distorted upon obfuscation. However, when denoised with the proposed Denoising Network, we observe better clustering performance. Because of DP-Pix's privacy guarantee and lowest re-identification rates, we can say that the obtained denoised embeddings are privacy-protected, i.e., the network finds a privacy-preserved embedding space which maintains generality (robust to noise) and also preserves privacy.
VII Related Works
Xie et al. [Xie2020SecureCF] incorporate differential privacy into few-shot learning through adding Gaussian noise into the model training process [abadi_deep_2016] to protect the privacy of training data. [Xue2021DifferentiallyPP, Huai2020PairwiseLW] have also provided a strong privacy protection guarantee in pairwise learning for training data. On the other hand, [Gelbhart2020DiscreteFL] propose to use hashing to store the embedding of the input images. Similar to cryptographic approaches [cabrero2021sok], the work [Gelbhart2020DiscreteFL] incurs high computational complexity to achieve accuracy. Differently, our approach addresses the privacy of user data at source (i.e., the images are already privatized before the server sees them) with strong privacy protection. To the best of our knowledge, ours is the only approach that addresses privacy in the context of few-shot metric learning for user-supplied training and testing data.
VIII Conclusion & Future Work
In this paper, we present a novel framework for training a few-shot private image classification model, which aims to preserve the privacy of user-supplied training and testing data. The framework makes it possible to deploy few-shot models in the cloud without compromising users' data privacy. We discuss and confirm that there exists a privacy-preserved embedding space which offers both stronger privacy and generalization performance on few-shot classification. The proposed method provides privacy guarantees while preventing severe degradation of the accuracy, as confirmed by results on three different datasets of varying difficulty with several privacy methods. Evaluation with re-identification attacks verifies the low empirical privacy risk of our proposed method, especially with DP-Pix. While our study focuses on well-known image obfuscation methods, future research may explore scenarios where users apply novel image obfuscation methods locally, i.e., different from those applied to training data. Furthermore, our results motivate the future direction of searching for a more effective privacy-preserved space for few-shot learning in other domains such as speech [parnami2020few]. Examining other evaluation metrics for the privacy-preserved embedding space will also benefit future studies. We release the code for our experiments at https://github.com/ArchitParnami/Few-Shot-Privacy.