Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation

by Yang He, et al.

Today's success of state-of-the-art methods for semantic segmentation is driven by large datasets. Data is considered an important asset that needs to be protected, as the collection and annotation of such datasets come at significant effort and cost. In addition, visual data might contain private or sensitive information, which makes it equally unsuited for public release. Unfortunately, recent work on membership inference in the broader area of adversarial machine learning and inference attacks on machine learning models has shown that even black-box classifiers leak information about the dataset they were trained on. We present the first attacks and defenses for complex, state-of-the-art models for semantic segmentation. In order to mitigate the associated risks, we also study a series of defenses against such membership inference attacks and find effective countermeasures against the existing risks. Finally, we extensively evaluate our attacks and defenses on a range of relevant real-world datasets: Cityscapes, BDD100K, and Mapillary Vistas.




1 Introduction

The availability of large datasets is playing a key role in today’s state-of-the-art computer vision methods, ranging from image classification (e.g. ImageNet [7]), over semantic segmentation [6, 19, 33], to visual question answering [2]. Therefore, research and industry alike have recognized the importance of large-scale datasets [7, 13, 29, 37] for pushing the performance of computer vision algorithms. However, data collection, and in particular the annotation and curation of large datasets, comes at a substantial cost. There are sizable efforts from the research community [6, 10, 33], and industry has also picked up the task of collection (e.g. [19]) as well as providing annotation services such as Amazon MTurk. Such datasets can in turn be monetized and constitute important assets to companies.

Figure 1: We present the first study of membership inference attacks and defenses for black-box semantic segmentation models. For attacks, we propose a specific pipeline and methods for semantic segmentation. For defenses, we show feasible solutions from our systematic comparison study.

Consequently, such assets need protection, e.g. as part of intellectual property, and it should be controlled which parts are made public (e.g. for research purposes) and which parts remain private. Based on these datasets, high-performing models are trained and then made public (e.g. as black-box models) via an API or as part of a product. One might assume that the information of the training set remains contained within the trained parameters of the model and therefore remains private. Beyond the aspect of intellectual property, data might also include private information that was captured as part of the data collection process, which is sensitive and whose protection is important for safe and clean services.

Unfortunately, recent work on membership inference attacks [24, 25, 27] has shown that even a black-box model leaks information about its training data. Such attacks aim to infer whether a particular sample was used as part of the training data or not. They have shown high success rates on a range of classification tasks and have equally proven hard to fully prevent (i.e., to defend against). While this constitutes a potential threat to the machine learning model, it can also be used as a forensic technique to detect potentially unauthorized use of data.

However, we are still missing even a basic understanding of if and how these membership attack vectors extend to semantic segmentation, which is a basic computer vision task with broad applications [4, 12, 14, 15, 35]. Hence, we propose and study the first membership inference attacks and defenses for semantic segmentation, as presented in Figure 1. To reach this goal, we design an attack pipeline based on per-patch analysis and discover that (1) not all areas of an input are helpful to membership inference, (2) structural information itself leaks membership privacy, and (3) effective defense mechanisms exist that can reduce the effectiveness of these attacks substantially. Accordingly, we highlight our contributions to the segmentation task and review relevant work.

1.1 Contributions

Our main contributions are as follows. (1) We present the first work on membership inference attacks against semantic segmentation models under different data/model assumptions. (2) We show structural outputs of segmentation have severe risks of leaking membership. Our proposed structural loss maps achieve the best attack results. (3) We present a range of defense methods to reduce membership leakage. In the end, we show feasible solutions to protect against membership attacks. (4) Extensive comparisons and ablation studies are provided in order to shed light on the core challenges of membership inference attacks for semantic segmentation.

1.2 Related Work

Recent attacks against machine learning models have drawn much attention, with communities focusing on attacking model functionality (e.g., adversarial attacks [9, 16, 17, 21, 28, 32]) or stealing the functionality [22] or configuration [20] of a model. In the following, we review work on data privacy and security.

Membership inference attack. Membership inference attacks have been successfully carried out in many problems and domains, ranging from biomedical data [3] and locations [23] to purchasing records [25] and images [27].

It has been shown that machine learning models can be attacked to infer the membership status of their training data. Shokri et al. [27] proposed membership inference attacks against classification models utilizing multiple shadow models to mimic the behavior of the victim model. Shadow models were trained by querying the victim model and using the examples to which the victim assigned higher confidence. Then, a binary classifier was trained with information from the shadow models and applied to attack the victim. Further, Salem et al. [25] demonstrated that a single shadow model is enough to reach similar results. They also showed that the underlying distributions of the data used to train the shadow model and the victim can be different, which allows for attacks under relaxed assumptions. In addition, learning-free attacks were proposed, which constitute low-skill attacks without knowledge about the model and data distribution priors. Salem et al. [25] proposed to directly set a threshold on the confidence scores of predictions to recognize membership. Sablayrolles et al. [24] set a threshold on loss values and achieved quite successful results. While prior work has only studied classification models so far, our contribution is the first study of attacks and defenses on semantic segmentation models. Although the segmentation problem can be understood as pixel-wise classification, it turns out that the information derived from a single pixel is weak and needs to be aggregated over a patch or even the full image for a successful attack. Beyond this, we propose the first dedicated attacks that fully leverage the information of the full segmentation output and hence lead to even stronger attack vectors.

Privacy-preserving machine learning

The goal of these techniques is to reduce information leakage during training with limited access to training data, and they have been applied to deep learning [1, 26]. Differential privacy [8] allows learning the statistical properties of a dataset while preserving the privacy of the individual data points in it. In particular, Nasr et al. [18] provided membership protection for a classifier by training a coupled attacker in an adversarial manner. Zhang et al. [34] obfuscated training data before feeding it to the model training task, which hides the statistical properties of the original dataset by adding random noise or providing new samples. In our work, we compare a series of defense approaches to mitigate membership leakage in semantic segmentation.

2 Attacks against Black-box Semantic Segmentation Models

Membership inference is the task of inferring whether a particular data point was part of the training data or not. Membership inference attacks against classification models exploit overfitting artifacts on training data [27]. Typical models tend to be overconfident on data points that were seen during training. Such overfitting leads to characteristic patterns and distributions of confidence scores or loss values, which have facilitated membership inference attacks. We show how such attacks can equally be constructed against models for semantic segmentation. While such models can be understood as pixel-wise classifiers, it turns out that the information that can be derived from a single pixel is rather weak. Hence, we develop a method that aggregates such information over patches and full images to arrive at stronger attacks. Finally, we show how the full structure of the segmentation output can be leveraged to perform attacks that even work on the final label map without confidences, a setting where previous attacks on classification models would completely fail.

We first describe our pipeline for attacking segmentation models, and then present two attack settings exploited in our study, which have different constraints during attacks. Furthermore, we discuss our evaluation methodology, and then show evaluation results.

2.1 Methods

Our approach infers membership information on a patch level based on the output of the segmentation model, which is then further aggregated to an image-level attack. In this section, we explain how to train and utilize a shadow model to build such an attack. Furthermore, we discuss several design choices concerning data representation and patch selection that significantly contribute to the effectiveness of the attack. In the following, we assume a victim semantic segmentation model V and a test dataset comprised of data pairs used in training V and others not used.

Construction of shadow model S. In line with previous work on membership inference, we construct a shadow model S that is to some extent similar to the victim segmentation model and is therefore expected to exhibit similar behaviour and artifacts w.r.t. membership. The exact assumptions about our knowledge of the victim model that inform the construction of the shadow model are detailed in section 2.2. We prepare a dataset to train S, which is a semantic segmentation model. S aims to capture semantic relations and dependencies between different classes in structured outputs. S is used to provide training data with known membership labels to the patch classifier.

Construction of per-patch attack A. S provides training examples for A, since we have complete membership information of S. This allows us to train a binary classifier A for the per-patch attack in order to predict whether an example was in the training data or not. Here, we apply a ResNet-50 [11] to learn a classification model. It is able to capture local In/Out patterns and summarize them into a final output with the global average pooling at the end of ResNet-50. Besides, it also allows us to interpret results with class activation maps [36], in contrast to attackers for image classification, which offer no such interpretation. We discuss different choices for the representation used by this classifier next.

Construction of image-level attack A. In order to further amplify the attack, we aggregate the information of the per-patch attack on an image level. For a given image, we crop several patches and feed them into A to obtain their classification scores. Finally, our model for image-level attack is calculated by

$$A_{\mathrm{img}}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} A(X_i, Y_i),$$

where $N$ is the number of cropped patches and $(X_i, Y_i)$ is the $i$-th patch of a testing image pair. We discuss different selection strategies for the patches below.

Data representation. In line with prior work on membership inference, we assume that we want to answer the membership query for a particular data point together with the corresponding ground truth (GT). In our work, we present two representations for training A:

1. Concatenation. We concatenate the posteriors of a structured prediction and its GT as the input of a binary classifier. This allows the model to not only judge the score and structure of the predictions, but also take into account if the predictions represent correct or misclassifications.

2. Structured loss map. We compute a dense structured loss map from prediction posteriors and GT, and feed the loss map as the input of a binary classifier. Previous work [24] shows the success of applying a threshold on the loss value of an image pair for image classification, and this method can be easily applied to semantic segmentation. Despite this, we show keeping structures of loss maps is still crucial to membership inference attacks for semantic segmentation, and helps to achieve stronger attacks. Consequently, a binary classifier is able to find some In/Out loss patterns on a structured loss map.
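
The two representations can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the authors' implementation: the text does not name the loss, so per-pixel cross-entropy is an assumption, as are the one-hot encoding of the GT, the zero loss for ignored pixels, and the function names `structured_loss_map` and `concatenation`.

```python
import math

def structured_loss_map(posteriors, gt, ignore_label=255, eps=1e-12):
    """Dense loss map from posteriors (H x W x C lists) and GT labels (H x W).

    Assumption: the per-pixel loss is cross-entropy, -log p[y].
    Assumption: pixels carrying the ignore label get loss 0.
    """
    loss_map = []
    for p_row, y_row in zip(posteriors, gt):
        loss_map.append([
            0.0 if y == ignore_label else -math.log(max(p[y], eps))
            for p, y in zip(p_row, y_row)
        ])
    return loss_map

def concatenation(posteriors, gt, num_classes):
    """Concatenate posteriors with the one-hot GT, giving H x W x 2C."""
    out = []
    for p_row, y_row in zip(posteriors, gt):
        out.append([
            list(p) + [1.0 if c == y else 0.0 for c in range(num_classes)]
            for p, y in zip(p_row, y_row)
        ])
    return out
```

Both functions keep the spatial layout of the prediction, which is the property the binary classifier relies on to find In/Out patterns.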

Selection of patches. As our method is based on scoring each patch with the per-patch attack, the selection of patches plays an important role in obtaining stronger attacks. Therefore, we study the influence of different patch selection schemes with the following choices:

1. Sliding windows. We crop patches on a regular grid with a fixed step size.

2. Random locations. We sample patches uniformly across the image.

3. Random locations with rejection. The importance of different patches for recognizing membership is not alike; therefore, this scheme aims to reject patches which do not contribute to the final result or even provide misleading information. In our study, we observe that patches with too strong confidences or too small loss should be omitted. For example, road areas account for most pixels of an image and are segmented very well; therefore, this scheme tends not to use the center of a road and instead selects its borders to other classes. Hence, we propose to sample random candidate locations and reject patches based on their confidence scores or loss values, depending on the construction method of A.

To conclude, we construct image-level membership inference attacks from per-patch attacks. This pipeline allows us to leverage distinct patches for successful attacks.

Besides, our patch-based attack pipeline is flexible w.r.t. image sizes and aspect ratios, in case different image sizes exist in a dataset or even across multiple datasets.
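
The three patch selection schemes and the image-level aggregation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the rejection threshold `min_loss`, the retry limit `max_tries`, and all function names are hypothetical, rejection is shown for the loss-based variant only, and the image-level score is assumed to be the mean of the per-patch scores.

```python
import random

def sliding_windows(height, width, patch, step):
    """Top-left corners of patches on a regular grid with a fixed step."""
    return [(y, x)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]

def random_locations(height, width, patch, n, rng):
    """Top-left corners sampled uniformly across the image."""
    return [(rng.randint(0, height - patch), rng.randint(0, width - patch))
            for _ in range(n)]

def mean_patch_loss(loss_map, y, x, patch):
    """Average loss inside the patch whose top-left corner is (y, x)."""
    values = [v for row in loss_map[y:y + patch] for v in row[x:x + patch]]
    return sum(values) / len(values)

def random_with_rejection(loss_map, patch, n, min_loss, rng, max_tries=1000):
    """Random locations, rejecting patches whose mean loss is too small.

    min_loss and max_tries are hypothetical parameters of this sketch.
    """
    height, width = len(loss_map), len(loss_map[0])
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        y, x = random_locations(height, width, patch, 1, rng)[0]
        if mean_patch_loss(loss_map, y, x, patch) >= min_loss:
            kept.append((y, x))
    return kept

def image_level_score(patch_scores):
    """Aggregate per-patch attack scores (assumed here: a plain mean)."""
    return sum(patch_scores) / len(patch_scores)
```

In this sketch a well-segmented region such as the center of a road has a near-zero mean loss and is therefore skipped, while patches that straddle class borders are kept.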

2.2 Attack Settings

In our method, we train a shadow segmentation model S and an attacker A for attacking a victim segmentation model V. Our two attack settings differ in the knowledge on data distribution and model selection for training V and S.

Data & model dependent attacks: This attack assumes that the victim model can be queried at training time of the attacker. Besides, this setting allows training a shadow model with the same architecture as the victim. Specifically, S and V have the same learning protocol and post-processing techniques during inference. Further, this attack assumes that the training data distributions of S and V are also identical, i.e., the data come from the same database. As a result, this attacker is constructed in five steps: (1) prepare data for the attack; (2) query the victim; (3) rank the examples with a criterion (i.e., confidence scores or loss values); (4) train a shadow model with the top-ranked examples; (5) train a binary classifier for the attack, which may take different forms of input.

Data & model independent attacks: For this attack, we only know the victim model’s functionality and a defined label space. There is no query process for constructing the training set for S; instead, S can be trained with a dataset of a different distribution, which leads to a cheaper and more practical attack. Furthermore, the model configuration and training protocol of the victim are unknown. The goal of the shadow model is to capture the membership status for each example and to provide training data for the attack model A. In particular, we highlight the severity of information leakage in this simplified attack: model and data distribution are completely different from the victim's, and there is not even a query process which might be detected on the server. In this setting, we attack a victim model in three steps: (1) prepare data for the attack; (2) train a shadow model; (3) train an attacker.

2.3 Evaluation Methodology

We evaluate the performance of membership inference attacks with precision-recall curves and receiver operating characteristic (ROC) curves. We regard the images used during training as positive examples and all others as negatives. Therefore, given a testing set with $N_{\mathrm{in}}$ image pairs used to train a model and $N_{\mathrm{out}}$ pairs not used, random guessing with probability 0.5/0.5 for both classes achieves precision $N_{\mathrm{in}} / (N_{\mathrm{in}} + N_{\mathrm{out}})$ and recall 0.5. We set different thresholds in a classifier and compare its precision-recall curve to the random guess performance to observe whether attacks are successful. Similarly, we draw the random guess behavior in a ROC curve, which is the diagonal of the plot. Furthermore, to compare different attacks quantitatively, we apply the maximum F-score in precision-recall curves and the AUC-score in ROC curves to evaluate attack performance. Last, our method is based on per-patch attacks; therefore, we employ the same metrics for per-patch evaluation, which helps us understand and compare different attacks, as well as the defense methods in section 3, exhaustively.
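
As a worked example of the random-guess reference, the sketch below computes precision and F-score from the numbers of In and Out test examples. It assumes the F-score is the harmonic mean of precision and recall, which reproduces the reference values reported in section 2.4 (0.5536 and 0.6313); reading the 1488/912 Cityscapes parts as the In/Out test images of the dependent attack is also an assumption, and the function name is hypothetical.

```python
def random_guess_baseline(n_in, n_out, recall=0.5):
    """Precision and F-score of guessing In/Out with probability 0.5 each."""
    precision = n_in / (n_in + n_out)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, f_score

# Dependent attack, image level (assumed split: 1488 In, 912 Out).
print(random_guess_baseline(1488, 912))
# Independent attack on Cityscapes victims (2975 In, 500 Out).
print(random_guess_baseline(2975, 500))
```

Because the In/Out ratio differs between the two settings, the random-guess F-score differs as well, which is why numbers from the two settings are not directly comparable.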

2.4 Evaluation Results

Data and architectures. We conduct the experiments on street scene semantic segmentation across various datasets, including Cityscapes [6], BDD100K [33] and Mapillary Vistas [19], which are captured in different countries under diverse weather conditions and image qualities, providing multiple domains of street scenes. In addition, we apply PSPNet [35], UperNet [31], Deeplab-v3+ [5] and DPC [4] to train our segmentation models. For per-patch attackers, we train a ResNet-50 [11] from scratch, allowing us to visualize the regions contributing to the recognition of membership for an example by class activation mapping [36]. In detail, we modify the ResNet-50 to downsample an input spatially by a factor of 8 and feed a 90×90 input block into the attacker, corresponding to 713×713 image patches. Finally, we also compare our pipeline to previous attackers for classification models [24, 25] to demonstrate the effectiveness of specific considerations for segmentation models.

Setup for data & model dependent attacks. For dependent attacks, we conduct experiments with Cityscapes and PSPNet (a.k.a. PSP→PSP). We split Cityscapes into four parts with sizes 1488, 912, 555 and 520. We train a victim model from ImageNet [7] pretrained models, which reaches 59.88 mean IoU (mIoU) for segmentation. For evaluation of per-patch attacks, we sample 29760 patches from images used to train the victim and 30096 patches from images not used. Therefore, this setting leads to a chance-level precision of random guess for image-level and per-patch attacks of about 0.62 and 0.497, respectively. Besides, the F-scores of random guess for image-level and per-patch attacks are 0.5536 and 0.4985, which are drawn as a reference in Figure 2.

| Dataset | Model | Backbone | In / Out |
|---|---|---|---|
| Cityscapes (Victim) | PSPNet [35], UperNet [31] | ResNet-101 [11] | 2975 / 500 |
| BDD100K (Shadow), Mapillary (Shadow) | Deeplab-v3+ [5], DPC [4] | Xception-71 [30] | 4k / (3k+1k), 10k / (8k+2k) |

Table 1: Data and model descriptions of victim and shadow models for independent attacks.
| Cityscapes ID | Cityscapes class | Mapillary Vistas ID | Mapillary Vistas class |
|---|---|---|---|
| 0 | Road | 13, 24, 41 | Road, Lane Marking - General, |
| 1 | Sidewalk | 2, 15 | Curb, Sidewalk |
| 2 | Building | 17 | Building |
| 3 | Wall | 6 | Wall |
| 4 | Fence | 3 | Fence |
| 5 | Pole | 45, 47 | Pole, Utility Pole |
| 6 | Traffic Light | 48 | Traffic Light |
| 7 | Traffic Sign | 50 | Traffic Sign (Front) |
| 8 | Vegetation | 30 | Vegetation |
| 9 | Terrain | 29 | Terrain |
| 10 | Sky | 27 | Sky |
| 11 | Person | 19 | Person |
| 12 | Rider | 20, 21, 22 | Bicyclist, Motorcyclist, Other Rider |
| 13 | Car | 55 | Car |
| 14 | Truck | 61 | Truck |
| 15 | Bus | 54 | Bus |
| 16 | Train | 58 | On Rails |
| 17 | Motorcycle | 57 | Motorcycle |
| 18 | Bicycle | 52 | Bicycle |

Table 2: Label-ID transformations from Mapillary Vistas to Cityscapes.
| Methods | Dependent Attacks | | Independent Attacks | | | | | | | |
| | PSP→PSP | | Deeplab-v3+→PSP | | Deeplab-v3+→Uper | | DPC→PSP | | DPC→Uper | |
| | F-score (%) | AUC (%) | F-score (%) | AUC (%) | F-score (%) | AUC (%) | F-score (%) | AUC (%) | F-score (%) | AUC (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Adapted Salem et al. [25] (learning-based) | 77.2 | 67.2 | 92.4 | 63.5 | 92.3 | 62.6 | - | - | - | - |
| Adapted Salem et al. [25] (learning-free) | 77.4 | 62.0 | 92.3 | 63.4 | 92.3 | 59.2 | 92.3 | 63.4 | 92.3 | 59.2 |
| Adapted Sablayrolles et al. [24] | 82.2 | 74.9 | 94.4 | 81.4 | 93.0 | 72.4 | 94.4 | 81.4 | 93.0 | 72.4 |
| Ours (C+GT, Full) | 80.6 | 81.2 | 94.5 | 85.0 | 92.8 | 71.8 | 93.2 | 73.5 | 92.6 | 68.8 |
| Ours (Loss, Full) | 84.2 | 82.6 | 95.7 | 89.1 | 93.2 | 76.3 | 93.1 | 73.5 | 92.4 | 68.3 |
| Ours (C+GT, Random) | 83.4 | 82.7 | 95.0 | 86.1 | 95.4 | 88.5 | 92.9 | 74.9 | 94.4 | 85.5 |
| Ours (Loss, Random) | 84.8 | 84.6 | 95.7 | 90.8 | 95.8 | 94.3 | 94.0 | 77.7 | 93.3 | 79.4 |
| Ours (C+GT, Rejection) | 83.3 | 83.0 | 94.9 | 86.3 | 95.3 | 91.2 | 93.5 | 76.3 | 94.4 | 86.1 |
| Ours (Loss, Rejection) | 86.7 | 87.1 | 95.9 | 91.1 | 96.2 | 94.9 | 94.1 | 77.8 | 93.5 | 82.0 |

Table 3: Comparison of different attackers. We compare our attackers to previous methods, including the learning-based attacker [25] and learning-free attackers by applying a threshold on a confidence score [25] or a loss value [24].
Figure 2: Evaluation of the importance of spatial structures for PSP→PSP, starting from our final model (Size 90). The first row draws precision-recall curves, and best F-scores are presented. The second row draws ROC curves, and AUC-scores are presented.

Setup for data & model independent attacks. For independent attacks, we employ different segmentation models for the shadow model and the victim, as summarized in Table 1. In particular, BDD100K has a label space of 19 classes that is completely compatible with Cityscapes, but Mapillary Vistas has 65 labels. To handle this situation, we pick 25 of the 65 classes and set the others as ignored regions. Some conceptually similar classes are merged, creating a label space of 19 categories that is compatible with Cityscapes. Table 2 shows the details of our merged label space. Pixels whose Mapillary Vistas labels do not appear in Table 2 are set to the ignored label, which has the value 255. For victim models, we train a PSPNet and an UperNet using the official split of Cityscapes, leading to 79.7 and 76.6 mIoU for segmentation. For shadow models, we apply our splits to balance the data used in training (In) and testing (Out) for providing training data of the binary classifier. In the end, the F-score of random guess for the image-level independent attack is 0.6313. We can compare this number to all the attackers in Table 3 and observe the severe information leakage of semantic segmentation models.
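
The label-space merge can be expressed as a lookup table built from the ID pairs in Table 2. The sketch below is only an illustration of that mapping; the names `MAPILLARY_TO_CITYSCAPES`, `build_lookup` and `remap` are hypothetical, and label maps are represented as nested lists to keep the example dependency-free.

```python
IGNORE_LABEL = 255

# Cityscapes train ID -> Mapillary Vistas IDs, copied from Table 2.
MAPILLARY_TO_CITYSCAPES = {
    0: [13, 24, 41], 1: [2, 15], 2: [17], 3: [6], 4: [3],
    5: [45, 47], 6: [48], 7: [50], 8: [30], 9: [29],
    10: [27], 11: [19], 12: [20, 21, 22], 13: [55], 14: [61],
    15: [54], 16: [58], 17: [57], 18: [52],
}

def build_lookup(num_source_classes=65):
    """Lookup list: index is a Mapillary ID, value is a Cityscapes ID or 255."""
    lut = [IGNORE_LABEL] * num_source_classes
    for cityscapes_id, mapillary_ids in MAPILLARY_TO_CITYSCAPES.items():
        for mapillary_id in mapillary_ids:
            lut[mapillary_id] = cityscapes_id
    return lut

def remap(label_map, lut):
    """Apply the lookup to an H x W label map; unknown IDs become ignored."""
    return [[lut[v] if 0 <= v < len(lut) else IGNORE_LABEL for v in row]
            for row in label_map]
```

With this table, exactly 25 Mapillary Vistas IDs map onto the 19 Cityscapes classes and every other ID becomes the ignored label 255.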

Results. Results of the different versions of our model as well as a comparison to previous work are presented in Table 3. While previous work on membership inference targets classification models [25, 24], we facilitate a comparison to these approaches by extending them to the segmentation scenario. Salem et al. [25] propose a learning-based attacker and a learning-free attacker. We train their learning-based attacker with 1×1 vector inputs and test on all pixel locations. Final image-level attacks are obtained by averaging the binary classification scores of all locations. Similar to our method, we test different settings, and it fails to achieve attacks with the shadow model DPC [4] in Table 1. Besides, we test their learning-free attacker by averaging the confidence scores of all locations. Equally, we facilitate a comparison to [24], where we use the loss map of the segmentation output. For our methods, we report the numbers for the last two patch selection strategies with 10 sampled patches. Besides, we also perform attacks with full image inputs using our binary classifiers, which have a global average pooling at the end and are able to handle different input sizes. We emphasize that the ratio of In/Out testing examples differs between dependent and independent attacks; therefore, the numbers between them cannot be compared. We conclude that recent models for semantic segmentation are susceptible to membership inference attacks, with AUC scores of the attacker up to 87.1 in the dependent and 94.9 in the independent setting. Overall, we observe that our loss-based method with the rejection scheme performs best in most settings and measures.
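
The two adapted learning-free baselines reduce a whole prediction to a single number and threshold it. The following sketch illustrates this under assumptions: the confidence of a pixel is taken to be its maximum posterior, a high mean confidence or a low mean loss is read as evidence for membership, and the thresholds and function names are hypothetical.

```python
def mean_confidence(posteriors):
    """Average of the per-pixel maximum posterior over an H x W x C output."""
    confidences = [max(p) for row in posteriors for p in row]
    return sum(confidences) / len(confidences)

def mean_loss(loss_map):
    """Average of an H x W loss map."""
    values = [v for row in loss_map for v in row]
    return sum(values) / len(values)

def is_member_by_confidence(posteriors, threshold):
    """Learning-free attack in the style of [25]: confident means In."""
    return mean_confidence(posteriors) >= threshold

def is_member_by_loss(loss_map, threshold):
    """Learning-free attack in the style of [24]: small loss means In."""
    return mean_loss(loss_map) <= threshold
```

Both baselines discard the spatial layout of the output, which is the information the patch-based attackers keep.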

Figure 3: Image-level comparison results w.r.t. patch selection and data representations, under varying patch numbers.

Importance of spatial structures. Key to strong membership inference performance is exploiting the structural information of the spatial output we observe from a segmentation model. Hence, we conduct attacks with gradually reduced structural inputs in our dependent attacks in order to analyze the importance of this structural information for our goal. Our final model takes 90×90 blocks as inputs for per-patch attackers. Therefore, we crop sub-blocks from the inputs of our final model with sizes of 60, 45, 30 and 15 to provide different levels of structure. We compare the precision-recall curves and ROC curves for per-patch as well as image-level attacks in Figure 2. We note that all the feature vectors in the blocks of different sizes have the same scale of receptive fields. We apply the same per-patch attacker architecture for sizes 90 to 30, but modify the architecture for size 15, because its spatial size is too small: for size 15, we train a ResNet-50 with a different spatial downsampling factor, which performs better than the 8× downsampling used for the other sizes. First, we compare the per-patch attack performance and observe that attacks become harder with decreasing patch sizes, as smaller patches provide less structure. Second, we compare image-level attacks, where the random selection strategy is applied to integrate all patches. We sample 5, 20, 20, 30 and 30 random patches for sizes 90, 60, 45, 30 and 15 to integrate image-level results. Size 90 achieves the best performance, even though the other attackers obtain very close image-level results. Last, we highlight that our concatenation-based attacker degenerates to previous work [25] with 1×1 vector inputs. We observe that 1×1 inputs continue this decreasing trend and achieve worse results than size 15, which can be found in Table 3.
From these results, we conclude that structures are of great importance in membership inference attacks for semantic segmentation, so that an attacker is able to mine In/Out confidence or loss patterns over an array input.

Analysis of patch selection and data representation. We test our three sampling strategies and two representations as described in section 2.1. Figure 3 plots the image-level comparison results. For sliding windows, we sample at least 6 patches to guarantee that an entire image can be covered. For random locations, we sample different numbers of patches for image-level attacks to observe the influence of the number of patches, starting from one patch. We conduct these experiments 3 times and report the mean. In summary, we observe that these two strategies achieve comparable performance when the same number of patches is used. Specifically, sliding windows perform better on dependent attacks with loss maps, and random locations are better for independent attacks (Deeplab-v3+→PSP and Deeplab-v3+→Uper), which may be caused by inconsistent data distributions or different behaviors of segmentation models. Last, we test our random locations with rejection strategy. To avoid the effect of random seeds, we sample the same locations as in the previous random locations experiment if a patch is not rejected. We see clear improvements if we sample very few patches, whose results are sensitive to the sampled locations. In street scenes, road covers a large portion of pixels; therefore, random sampling tends to select a road patch, which has the highest accuracy over all the classes and is less discriminative for In/Out classification. After ignoring those patches, performance improves because the rejection helps us avoid those less informative patches. To conclude, not all regions contribute to successful attacks on segmentation, so we need a scheme that selects regions to determine the membership status of an image, instead of processing the whole image like previous work for classification.

Comparing our patch-based attacks to the full-image attacks, we find that using full images as inputs decreases performance significantly, even though the same classifier is applied. The classification of full images may be affected by misleading areas. Hence, partitioning an image into many patches helps to focus on local patterns and to make a better decision. Besides, we observe that our rejection scheme achieves better performance than the random scheme, which further supports our argument on the difference between segmentation and classification. In addition, our concatenation-based attacker outperforms [25], which demonstrates the importance of spatial structures, similar to Figure 2. In our results, [24] obtains acceptable performance, but worse than our structured loss map-based attackers, which retain the structural information. Finally, our novel structured loss maps achieve better results than concatenation and other methods [25, 24] in most cases.

3 Defenses

To mitigate the leakage of membership, we aim to reduce or hide overfitting artifacts of predictions. Therefore, we study a range of defense methods to reduce the distribution gaps between training data and other data w.r.t. confidence scores of predictions or loss values, including Argmax, Gauss, Dropout and DPSGD. The first two methods can be applied to any segmentation model, and the last two can be applied to deep neural networks.

Settings. In this section, we analyze the performance of image-level attacks based on random locations, which is easily compared to the results without defenses in Table 3. We sample patches at the same locations for different defenses, consistent with the previous attacks. Because Gaussian noise or dropout changes the output distributions, the rejection scheme may sample different patches. Therefore, we do not test our rejection scheme in this section. We apply the AUC-score to compare different methods, where random guess always has an AUC-score of 0.5 for all splits. Besides, for independent attacks, we report the settings Deeplab-v3+→PSP and Deeplab-v3+→Uper, which yield the stronger attacks. As in our attack protocol, our shadow and victim models have the same post-processing and learning protocol in dependent attacks. In other words, we employ the same defense and strength factors on shadow and victim models in this setting. In contrast, we apply defenses to victim models only for independent attacks, because we do not have any knowledge of victims in this setting. Last, an ideal defense is supposed to make attacks hard and preserve segmentation utility at the same time. Therefore, we jointly observe membership protection capability and segmentation utility to compare different defense methods and seek a better solution.

Figure 4: Performance comparison for Argmax defense.

3.1 Methods and results

Argmax. This defense returns only the predicted labels instead of the posteriors for an image. We use one-hot vectors to carry out the attacks for our methods and the others [25, 24]. Obviously, the previous learning-free attacker [25] based on confidence scores fails to recognize membership status, because every example has confidence 1. In Figure 4, we show the comparison for all the other methods. Because argmax is easy to notice, we also train the binary classifiers for independent attacks with the argmax operation. In general, argmax reduces membership leakage in segmentation models only slightly for all attackers: a model already leaks information when it returns only predicted labels. To conclude, we highlight the difference from protecting classification: argmax cannot successfully protect membership privacy for segmentation.
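The one-hot conversion used for the Argmax defense can be sketched as follows. This is an illustrative example only; the function name `argmax_defense` and the nested-list input format are assumptions.

```python
def argmax_defense(posteriors):
    """Replace each pixel's posterior vector by a one-hot vector at its argmax.

    posteriors: H x W x C nested lists of class probabilities (assumed input format)
    """
    defended = []
    for row in posteriors:
        new_row = []
        for pixel in row:
            # Index of the most likely class (first index wins on ties).
            best = max(range(len(pixel)), key=pixel.__getitem__)
            new_row.append([1.0 if c == best else 0.0 for c in range(len(pixel))])
        defended.append(new_row)
    return defended

posteriors = [[[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]]
one_hot = argmax_defense(posteriors)
# one_hot == [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]]
```

Every pixel now has confidence 1, which explains why a purely confidence-based attacker fails, while the label layout itself is left untouched.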


Gauss. To hide overfitting artifacts or patterns, we add Gaussian noise to the posteriors with different variances, varying from 0.01 to 0.1 with step 0.01 for independent attacks. To further test the defense for dependent attacks, we add very strong noise with variance up to 0.4. After adding noise, we set negative values to 0 and then normalize each location individually. Segmentation performance decreases with stronger noise. Therefore, we show the joint privacy-segmentation plots in Figure 5 to observe the defense behavior as well as the maintained utility of the segmentation method. First, we observe that Gauss successfully protects PSPNet and UperNet in independent attacks: it reduces AUC scores from 0.9 to less than 0.6, while only losing 0.2 mIoU. Second, we observe that our loss-based attackers are more sensitive to Gaussian noise. Although the attacks based on structured loss maps are stronger, they are easier to defend against with Gaussian noise. Finally, we find that this defense hardly mitigates leakage for dependent attacks. Even though we employ very strong noise, which reduces mIoU from 59.88 to 23.17, both attacks still reach AUC scores of more than 0.75. To conclude, Gauss can hardly protect a model when noise of the same distribution is added to the victim and shadow models, because binary classifiers can still pick up useful information from noisy inputs.
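The noise post-processing described above (add noise, set negative values to 0, normalize each location) can be sketched as follows. This is a minimal illustration assuming that normalization means dividing each pixel's vector by its sum; the function name `gauss_defense` and the uniform fallback for an all-zero pixel are assumptions, not details stated in the text.

```python
import random

def gauss_defense(posteriors, variance, rng=None):
    """Add zero-mean Gaussian noise per class score, clip at 0, renormalize per pixel.

    posteriors: H x W x C nested lists of class probabilities (assumed input format)
    """
    if rng is None:
        rng = random.Random()
    std = variance ** 0.5
    defended = []
    for row in posteriors:
        new_row = []
        for pixel in row:
            noisy = [max(p + rng.gauss(0.0, std), 0.0) for p in pixel]
            total = sum(noisy)
            if total == 0.0:
                # Assumption: fall back to a uniform vector if every class was clipped.
                new_row.append([1.0 / len(pixel)] * len(pixel))
            else:
                new_row.append([v / total for v in noisy])
        defended.append(new_row)
    return defended

posteriors = [[[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]]
noisy_posteriors = gauss_defense(posteriors, variance=0.1, rng=random.Random(0))
```

Because the noise is drawn independently for every class score at every location, larger variances flip the per-pixel argmax more often, which is the source of the mIoU drop discussed above.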

Figure 5: Joint plots for Gaussian noise defense.
Figure 6: Joint plots for Dropout defense during test.
Figure 7: Joint plots for DPSGD defense.

Dropout. Dropout is used to avoid overfitting when training a deep neural network, and we applied it with dropout ratio 0.1 when training our victim models. However, our studies in Section 2 show that this does not hide membership. Therefore, we enable the dropout operation during testing to blur a prediction. We find that a network still produces decent results when a different dropout ratio is used at test time. Hence, we apply dropout ratios 0.1, 0.5 and 0.9 to obfuscate a prediction to different degrees. We show the joint plots in Figure 6. From our study, we observe that enabling dropout during testing slightly mitigates membership leakage, but segmentation performance decreases considerably when a large ratio is applied.
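The effect of leaving dropout switched on at test time can be illustrated with the standard inverted-dropout operation applied to a vector of activations. This is a sketch only: the function name `dropout`, the toy activations, and the choice of inverted scaling are assumptions, and where the dropout layer sits in the network is not covered here.

```python
import random

def dropout(activations, ratio, rng):
    """Zero each activation with probability `ratio`; rescale survivors by 1 / (1 - ratio)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1)")
    keep = 1.0 - ratio
    return [a / keep if rng.random() >= ratio else 0.0 for a in activations]

features = [0.5, 1.0, 1.5, 2.0]  # made-up activations
rng = random.Random(0)
# Two forward passes with the same input give different outputs, which blurs the prediction.
blurred = {ratio: [dropout(features, ratio, rng) for _ in range(2)]
           for ratio in (0.1, 0.5, 0.9)}
```

At ratio 0.9 most activations are removed in every pass, which is consistent with the strong loss of segmentation performance observed for large ratios.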

DPSGD. Differential Privacy SGD (DPSGD) [1] adds Gaussian noise to the clipped gradients of the individual examples of a training batch, such that the learnt parameters, and hence all derived results such as predictions, are differentially private. As a result, the influence that a single addition or removal of a training example can have is limited, which should hamper membership inference. We apply DPSGD in our study to protect a model. Before training, we collect gradient statistics over the entire training data for the different layers of a network, and set an individual clipping factor for each layer. Similar to the other defenses, we also apply varying variances for the Gaussian noise in DPSGD. We train PSPNet with Gaussian variances 10, 410 for dependent settings, and variances 10, 410, 810 for independent settings. For UperNet, we train with 10, 10, 310, 610. We show the joint plots in Figure 7, and observe that DPSGD successfully protects membership in all the settings. In particular, even when we employ noise of 1e-6 for the three segmentation models, they only lose 1.12, 1.36 and 0.75 mIoU, while leakage is prevented significantly. Therefore, we recommend training a segmentation model with DPSGD to protect membership privacy in practice.
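The gradient computation of one DPSGD step for a single layer, following the clip-then-noise recipe of [1], can be sketched as follows. The function name `dpsgd_gradient`, the flat-list gradient format, and the parameterization of the noise as `noise_std * clip_norm` are assumptions for illustration; with per-layer clipping factors, the function would be called once per layer with that layer's `clip_norm`.

```python
import math
import random

def dpsgd_gradient(per_example_grads, clip_norm, noise_std, rng):
    """Clip each example's gradient to L2 norm `clip_norm`, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose norm exceeds the clipping factor.
        scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    batch_size = len(per_example_grads)
    # Noise is added once to the summed gradient, then the result is averaged.
    return [(s + rng.gauss(0.0, noise_std * clip_norm)) / batch_size for s in summed]

batch_grads = [[3.0, 4.0], [0.3, 0.4]]  # made-up per-example gradients
private_grad = dpsgd_gradient(batch_grads, clip_norm=1.0, noise_std=1e-3,
                              rng=random.Random(0))
```

Clipping bounds how much any single training example can move the parameters, and the added noise masks the remaining contribution, which is why this defense targets the cause of membership leakage rather than its symptoms in the output.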

Summary of defenses. In spite of the success of membership inference under various settings, we point out feasible solutions that can significantly reduce the risk of information leakage. (1) Adding Gaussian noise helps prevent leakage under independent settings with unknown attackers; this is simple and can be applied directly to an existing model without further cost. (2) For deep neural networks, we suggest applying DPSGD to train a model, which successfully mitigates the leakage for all three attacks with limited model degradation, even though it adds Gaussian noise to the gradients during training and hence requires increased training time.

Figure 8: Class activation maps (CAMs) and structured loss maps (SLMs) for independent attack Deeplab-v3+ Uper. Columns: No defense, Argmax, Gauss, Dropout, DPSGD.

3.2 Interpretability

One difference between attacking segmentation and attacking classification lies in the form of the input, which can be regarded as an image. Therefore, our method can provide interpretations for different examples, indicating the regions that are important for recognizing membership status. Besides, interpretations also help us understand and compare different defenses. In Figure 8, we apply class activation maps (CAMs) [36] to highlight the areas that help to detect examples/patches from the training set. We also compare the activation areas before and after applying defenses, together with the structured loss maps.
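For reference, a class activation map in the sense of [36] is a weighted sum of the last convolutional feature maps, with the weights taken from the linear layer that follows global average pooling. The sketch below assumes the attacker has such an architecture; the function name `class_activation_map`, the weights, and the feature maps are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM(y, x) = sum_k w_k * f_k(y, x) for K feature maps of size H x W."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]

# Two made-up 2 x 2 feature maps and the weights of the "member" output unit.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
member_weights = [0.5, 1.5]
cam = class_activation_map(feature_maps, member_weights)
# cam == [[0.5, 0.0], [0.0, 3.0]]
```

Locations with large values are those that contribute most to the attacker's "member" decision, which is what the highlighted regions in Figure 8 visualize.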

First, we observe that our attacker is able to mine regions with specific objects or intersections between two classes, even though it has no interaction with the victim. Second, we compare the attacker's behavior under the different defenses. Argmax merely changes the intensities of the CAM but preserves the major layout of the original CAM. For Gaussian noise, we use variance 0.1 here and clearly observe noise on the structured loss map at all pixel locations. As a result, all examples have a similar CAM, in which strong activations form a rough circle. Dropout changes the structured loss maps in many places and thereby changes the CAM. In particular, it changes locations with strong loss values more than others. For DPSGD, the loss maps are very similar to those of the original model; the only differences are in some regions that are hard to segment. Even though DPSGD changes the loss maps only a little, the final CAMs can change a lot for some examples. Therefore, it helps defend against the stealing of membership information while preserving segmentation performance very well.

4 Conclusion

We have provided the first membership inference attacks and defenses for semantic segmentation models by extending previous membership attackers for classification and proposing a new task-specific representation (i.e., structured loss maps). Our study is conducted under two different settings with various model/data assumptions. We show that spatial structures are important for successful attacks on segmentation, and that our structured loss maps achieve the best results among all methods. Besides, we study defense methods to reduce membership leakage and provide safe segmentation. As a result, we suggest adding Gaussian noise to the posteriors at inference time, or applying differential privacy SGD to train a model. We hope that our work contributes to the awareness of novel threats that modern deep learning models pose, such as the leakage of information on the training data. Our contributions show that such threats can be mitigated with little impact on the utility of the overall model.


  • [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §1.2, §3.1.
  • [2] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh (2015) Vqa: visual question answering. In ICCV, Cited by: §1.
  • [3] M. Backes, P. Berrang, M. Humbert, and P. Manoharan (2016) Membership privacy in microrna-based studies. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.2.
  • [4] L. Chen, M. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff, H. Adam, and J. Shlens (2018) Searching for efficient multi-scale architectures for dense image prediction. In NeurIPS, Cited by: §1, §2.4, §2.4, Table 1.
  • [5] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, Cited by: §2.4, Table 1.
  • [6] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele (2016) The cityscapes dataset for semantic urban scene understanding. In CVPR, Cited by: §1, §2.4.
  • [7] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In CVPR, Cited by: §1, §2.4.
  • [8] C. Dwork (2011) Differential privacy. Encyclopedia of Cryptography and Security. Cited by: §1.2.
  • [9] V. Fischer, M. C. Kumar, J. H. Metzen, and T. Brox (2017) Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101. Cited by: §1.2.
  • [10] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, Cited by: §1.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §2.1, §2.4, Table 1.
  • [12] Y. He, W. Chiu, M. Keuper, and M. Fritz (2017) STD2P: rgbd semantic segmentation using spatio-temporal data-driven pooling. In CVPR, Cited by: §1.
  • [13] W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, et al. (2017) The kinetics human action video dataset. arXiv preprint arXiv:1705.06950. Cited by: §1.
  • [14] G. Lin, A. Milan, C. Shen, and I. Reid (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In CVPR, Cited by: §1.
  • [15] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In CVPR, Cited by: §1.
  • [16] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017) Universal adversarial perturbations. In CVPR, Cited by: §1.2.
  • [17] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, Cited by: §1.2.
  • [18] M. Nasr, R. Shokri, and A. Houmansadr (2018) Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Cited by: §1.2.
  • [19] G. Neuhold, T. Ollmann, S. Rota Bulo, and P. Kontschieder (2017) The mapillary vistas dataset for semantic understanding of street scenes. In ICCV, Cited by: §1, §2.4.
  • [20] S. J. Oh, M. Augustin, B. Schiele, and M. Fritz (2018) Towards reverse-engineering black-box neural networks. In ICLR, Cited by: §1.2.
  • [21] S. J. Oh, M. Fritz, and B. Schiele (2017) Adversarial image perturbation for privacy protection a game theory perspective. In ICCV, Cited by: §1.2.
  • [22] T. Orekondy, B. Schiele, and M. Fritz (2019) Knockoff nets: stealing functionality of black-box models. In CVPR, Cited by: §1.2.
  • [23] A. Pyrgelis, C. Troncoso, and E. De Cristofaro (2018) Knock knock, who’s there? membership inference on aggregate location data. NDSS. Cited by: §1.2.
  • [24] A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, and H. Jegou (2019) White-box vs black-box: Bayes optimal strategies for membership inference. In ICML, Cited by: §1.2, §1, §2.1, §2.4, §2.4, §2.4, Table 3, §3.1.
  • [25] A. Salem, Y. Zhang, M. Humbert, M. Fritz, and M. Backes (2019) Ml-leaks: model and data independent membership inference attacks and defenses on machine learning models. In NDSS, Cited by: §1.2, §1.2, §1, §2.4, §2.4, §2.4, §2.4, Table 3, §3.1.
  • [26] R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: §1.2.
  • [27] R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP), Cited by: §1.2, §1.2, §1, §2.
  • [28] D. Stutz, M. Hein, and B. Schiele (2019) Disentangling adversarial robustness and generalization. In CVPR, Cited by: §1.2.
  • [29] C. Sun, A. Shrivastava, S. Singh, and A. Gupta (2017) Revisiting unreasonable effectiveness of data in deep learning era. In ICCV, Cited by: §1.
  • [30] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In CVPR, Cited by: Table 1.
  • [31] T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun (2018) Unified perceptual parsing for scene understanding. In ECCV, Cited by: §2.4, Table 1.
  • [32] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille (2017) Adversarial examples for semantic segmentation and object detection. In ICCV, Cited by: §1.2.
  • [33] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell (2018) BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687. Cited by: §1, §2.4.
  • [34] T. Zhang (2018) Privacy-preserving machine learning through data obfuscation. arXiv preprint arXiv:1807.01860. Cited by: §1.2.
  • [35] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017) Pyramid scene parsing network. In CVPR, Cited by: §1, §2.4, Table 1.
  • [36] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba (2016) Learning deep features for discriminative localization. In CVPR, Cited by: §2.1, §2.4, §3.2.
  • [37] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba (2017) Places: a 10 million image database for scene recognition. IEEE T-PAMI. Cited by: §1.