XOR Mixup: Privacy-Preserving Data Augmentation for One-Shot Federated Learning

06/09/2020 ∙ by MyungJae Shin, et al. ∙ 0

User-generated data distributions are often imbalanced across devices and labels, hampering the performance of federated learning (FL). To remedy this non-independent and identically distributed (non-IID) data problem, in this work we develop a privacy-preserving XOR-based mixup data augmentation technique, coined XorMixup, and thereby propose a novel one-shot FL framework, termed XorMixFL. The core idea is to collect other devices' encoded data samples that are decoded only using each device's own data samples. The decoding provides synthetic-but-realistic samples until an IID dataset is induced, which is then used for model training. Both encoding and decoding procedures follow bit-wise XOR operations that intentionally distort raw samples, thereby preserving data privacy. Simulation results corroborate that XorMixFL achieves up to 17.6% higher accuracy than benchmark schemes under a non-IID MNIST dataset.







1 Introduction

Securing more data is essential in imbuing more intelligence into machine learning (ML) models. In view of this, the problem of utilizing the sheer amount of user-generated private data has attracted significant attention in both academia and industry (park2018wireless; kairouz2019advances; park2020extreme). Federated learning (FL) is one promising solution based on exchanging model parameters among devices without sharing raw data, thereby preserving data privacy (McMahan2016; Konecny2016; Yang:FLSurvey; Smith:FLSurvey). While effective under independent and identically distributed (IID) data distributions, the performance of FL is highly degraded under non-IID user-generated data in practice (Zhao2018; Oh20:CL). Indeed, when each device has scarce samples of specific labels, the classification accuracy under the MNIST and CIFAR-10 datasets is degraded by up to % and %, respectively, compared to the IID counterparts (zhao2018federated).

On this account, in this article we seek an FL solution coping with non-IID data distributions. Inspired by the Mixup data augmentation technique (Vanilla Mixup), which produces a synthetic sample by linearly superpositioning two raw samples (Zhang2018), we first propose an XOR-based mixup data augmentation method (XorMixup), which we extend into a novel FL framework, termed XorMixFL.

Figure 1: A schematic illustration of XorMixFL in which XorMixup data augmentation is used for correcting non-IID data distributions while preserving raw data privacy.


The key idea is to exploit the flipping property of the bit-wise XOR operation ⊕: (A ⊕ B) ⊕ B = A. For two devices, XorMixup operates as follows.

  (i) Device 1 generates an encoded seed sample by XOR-mixing two of its own samples drawn from different labels, and transmits it to device 2.

  (ii) Device 2 decodes the seed sample by XOR-mixing it with its own sample of the same label as the dummy sample, producing a synthetic sample that is similar to, but different from, the original target-label sample.

While both (i) and (ii) preserve raw data privacy between two devices, (ii) improves the synthetic sample’s authenticity, increasing one-shot FL accuracy as detailed next.
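To make the flipping property concrete, the following minimal sketch (ours, not the paper's code) encodes and decodes random uint8 "images" with NumPy's bit-wise XOR; all variable names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)         # device 1's target-label sample
x2 = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)         # device 1's dummy-label sample
x2_server = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)  # server's own dummy-label sample

encoded = np.bitwise_xor(x1, x2)                    # step (i): encoding distorts x1
decoded_exact = np.bitwise_xor(encoded, x2)         # flipping property: (x1 ^ x2) ^ x2 == x1
decoded_synth = np.bitwise_xor(encoded, x2_server)  # step (ii): similar-but-different synthetic sample

assert np.array_equal(decoded_exact, x1)      # exact recovery with the same dummy sample
assert not np.array_equal(decoded_synth, x1)  # decoding with another sample distorts the result
```

Decoding with a different dummy sample than the one used for encoding is exactly what preserves privacy: the raw sample is never exactly recoverable by the receiver.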


As illustrated in Fig. 1, by applying XorMixup to a one-shot FL framework having only one communication round (Guha:19; HybridFL), each device in XorMixFL uploads its encoded seed samples to a server. The server decodes and augments the seed samples using its own base samples until all samples are evenly distributed across labels. The server can be treated as one of the devices, or as a parameter server storing an imbalanced dataset. Then, utilizing the reconstructed dataset, the server trains a global model until convergence, which is then downloaded by each device.


To the best of the authors’ knowledge, this is the first piece of FL research based on XOR operations for addressing the non-IID data problem. Under a non-IID MNIST dataset, simulation results corroborate that XorMixFL achieves up to % and % higher accuracy than standalone ML and Vanilla FL, respectively. For an ablation study, we additionally propose a baseline one-shot FL (MixFL) whose encoding follows from Vanilla Mixup while ignoring decoding. Compared to MixFL, XorMixFL achieves comparable accuracy while preserving more data privacy, i.e., higher dissimilarity between the augmented and original data samples measured using the multidimensional scaling (MDS) method (Cox2008), highlighting the importance of XorMixup in one-shot FL.

2 Related Work

Existing one-shot FL schemes (Guha:19; HybridFL) consider that each device first trains a local model until convergence, and the server then constructs a global model by aggregating the converged local models. This does not take global data distributions into consideration and is thus vulnerable to the non-IID data problem. By contrast, XorMixFL constructs the global model by training it with synthetic samples that are uploaded from devices in one communication round while preserving their data privacy.

To preserve privacy while exchanging data samples, homomorphic encryption such as RSA (Craig09:Homomorphic) or differential privacy mechanisms (Koda:GC20) can be used, at the cost of non-negligible computing overhead or accuracy degradation, respectively. Alternatively, XorMixup reduces the accuracy degradation with low complexity by leveraging Mixup data augmentation (Zhang2018) and XOR operations. The original purpose of Vanilla Mixup is oversampling; XorMixup additionally focuses on its privacy-preserving benefit in that the combination distorts raw samples. Furthermore, instead of the linear combination used in Vanilla Mixup, we apply XOR operations, which are often used in cipher algorithms for hiding original information (Churchhouse01:book).

Several recent works also study FL frameworks based on data sample exchanges (Jeong18; Jeong:FML19; Oh20:CL). For one-shot FL under non-IID data, in (Jeong18; Jeong:FML19) a synthetic sample generator is trained after collecting seed samples, which may still violate data privacy. In (Oh20:CL), seed samples are collected after Vanilla Mixup encoding, for running knowledge distillation operations, rather than one-shot FL. For a comprehensive overview on data sample exchanges in the context of FL, compared to model parameter exchanges (McMahan2016; Konecny2016; KimCL:19; Chen:20019aa; YangQuekPoor:2019aa; Wang:2018aa; Amiri:SPAWC19; elgabli2019gadmm; samarakoon2018federated; chen2018lag) and model output exchanges (Jeong18; Oh20:CL; Han:Intellisys20; Ahn:PIMRC20; Ahn:ICASSP20), readers are encouraged to read (Park:2018ab).

3 Methodology

In this section, we describe the system model under study and the operations of XorMixFL. Consider a one-shot FL system consisting of one server and multiple devices, as depicted in Fig. 1, where each device stores its own local dataset. The server trains and distributes a global model after collecting samples from the devices in a privacy-preserving way.

In XorMixFL, the server aims to train a global model to classify unlabeled samples. We consider a supervised task with unlabeled features and their ground-truth labels. All devices share the same label space but store different features. We assume that both the devices and the server store datasets that are imbalanced across labels, i.e., a non-IID global dataset, in which some of the target labels are deficient in samples. For each label, the server stores a number of its own samples, hereafter referred to as base samples, which will be used for data augmentation to balance the server's training dataset.

Before augmenting samples, the server informs its connected devices of the target labels lacking samples, and requests a number of samples per target label from each device. Each device then uploads XOR-encoded samples to the server. At the server, the encoded samples are decoded using XOR operations with the server's base samples, generating synthetic-but-realistic samples that correct its imbalanced training dataset. The encoding preserves raw data privacy by mixing target-label samples and dummy-label (non-target-label) samples using bit-wise XOR operations. Likewise, the decoding preserves data privacy by mixing each encoded sample not with the raw dummy sample but with the server's base sample.

After the decoding, due to the use of the server's base samples, residual noise remains, as shown in Fig. 1. This is partly intended to preserve privacy, but too much noise is obviously harmful to accuracy. To avoid excessive noise, it is important to extract common features before each encoding or decoding. To this end, up to a maximum number of samples within the same label are averaged, as done in Vanilla Mixup (Zhang2018). This sample blending step not only extracts common features, but also preserves more privacy by mixing multiple samples. The aforementioned operations of XorMixFL are elaborated in the following three steps.

Notation Meaning
# of devices
-th device
Dataset of
# of base samples in the -th label at
Maximum # of blending samples per label
# of target labels
# of dummy labels
# of samples per each target label
# of samples in the remaining M dummy labels
Total required # of samples at
# of samples uploaded to from
Raw sample of a target label at
Raw sample of a dummy label at
Base sample of a dummy label at
Encoded sample at , transmitted to
Decoded sample from at
Table 1: List of Notations

1) Sample Blending.


Let the target sample denote a 1-dimensional vector drawn from a target label of the device that the server wants to receive, and let the dummy sample denote a 1-dimensional vector drawn from a non-target (dummy) label. For each label, up to a maximum number of samples are averaged, resulting in a blended target-label sample x̄_t and a blended dummy-label sample x̄_d, respectively. Each blended sample is obtained by iteratively applying a pairwise sample blending function, defined as

    x̄ = λ · x_a + (1 − λ) · x_b,    (Eq. 1)

where λ ∈ [0, 1] denotes the blending ratio of the two samples x_a and x_b.
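As a minimal sketch, assuming the pairwise convex-combination form of Vanilla Mixup, the blending step might look as follows (function and parameter names are ours, not the paper's):

```python
import numpy as np

def blend(x_a: np.ndarray, x_b: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Pairwise blending function (Eq. 1): a convex combination with ratio lam."""
    return lam * x_a + (1.0 - lam) * x_b

def blend_up_to(samples: list, lam: float = 0.5) -> np.ndarray:
    """Iteratively fold up to len(samples) same-label samples into one blended sample."""
    out = samples[0].astype(np.float64)
    for s in samples[1:]:
        out = blend(out, s.astype(np.float64), lam)
    return out

# With lam = 0.5, iteratively blending [0, 4, 8] gives ((0 + 4)/2 + 8)/2 = 5.
```

Iterative pairwise blending weights earlier samples more lightly than later ones, which is one plausible reading of "iteratively applying" the blending function.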

2) XOR Encoding.

For the given blended target-label sample x̄_t and dummy-label sample x̄_d, we apply the bit-wise XOR operation and obtain an encoded sample x_enc, i.e.,

    x_enc = x̄_t ⊕ x̄_d.    (Eq. 2)
The encoded sample is sent from device to the server .

3) XOR Decoding.

The server performs the sample blending operations with its own base samples to yield a blended base sample x̄_b. Given x̄_b and the XOR-encoded sample x_enc received from the device, the server applies the bit-wise XOR operation, resulting in the decoded sample

    x̂ = x_enc ⊕ x̄_b.    (Eq. 3)

Note that x_enc is decoded not using the device's blended dummy sample of Eq. 2 but using the server's own blended base sample in Eq. 3, thereby preserving the privacy of the raw samples.

To illustrate, as visualized in Fig. 1, consider an example where a server lacks samples of a target label out of 10 labels (digits 0–9). To preserve sample privacy, device 1 selects a dummy label at random, within which samples are blended. Likewise, device 1 performs the same blending for the target label. Then, using XOR, device 1 encodes the two blended samples (see Eq. 2) and sends the encoded sample to the server. Next, to decode the received sample, the server first blends its own samples in the dummy label 2, creating a blended dummy sample. Then, the server applies XOR to the received and blended samples, yielding the decoded sample (see Eq. 3). Finally, the server adds the decoded sample to its dataset, and then trains an ML model that is distributed to every device after training completes.

Generalizing this to multiple samples, the server requests encoded samples from each connected device. These operations of each device and the server are summarized in Algorithms 1 and 2, respectively.

  Input: target label, dummy label, the required number of samples, and the blending parameters
  repeat
     Randomly select a target-label sample
     Randomly select a dummy-label sample
     Blend the target-label sample via Eq. 1, up to the maximum number of blending samples
     Blend the dummy-label sample via Eq. 1, up to the maximum number of blending samples
     XOR the two blended samples via Eq. 2, yielding an encoded sample
     Store the encoded sample in a buffer
  until the buffer size reaches the required number of samples
  Upload the buffer to the server
Algorithm 1 Device’s XOR encoding procedure
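Algorithm 1 can be sketched in Python as follows; the paper provides no code, so all names, the blending form, and the uint8 quantization are our own assumptions:

```python
import random
import numpy as np

def blend_many(samples, lam=0.5):
    # Iteratively blend same-label samples (Eq. 1) to extract common features.
    out = samples[0].astype(np.float64)
    for s in samples[1:]:
        out = lam * out + (1.0 - lam) * s
    return out

def device_encode(target_pool, dummy_pool, n_required, max_blend, lam=0.5):
    """Sketch of Algorithm 1: produce n_required XOR-encoded seed samples."""
    buffer = []
    while len(buffer) < n_required:
        # Blend up to max_blend randomly selected samples per label (Eq. 1).
        xt = blend_many(random.sample(target_pool, min(max_blend, len(target_pool))), lam)
        xd = blend_many(random.sample(dummy_pool, min(max_blend, len(dummy_pool))), lam)
        # XOR the two blended samples (Eq. 2); quantize back to uint8 first.
        buffer.append(np.bitwise_xor(xt.astype(np.uint8), xd.astype(np.uint8)))
    return buffer  # uploaded to the server in one communication round
```

Only the XOR-encoded buffer leaves the device; neither raw target-label nor raw dummy-label samples are transmitted.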

4 Experiments

In this section, we numerically analyze the performance of the proposed XorMixFL scheme in a non-IID MNIST classification task. As benchmark schemes, we consider standalone ML, Vanilla FL, and MixFL. Unless otherwise specified, by default we consider devices having a fixed number of samples per target label while storing samples in the remaining dummy labels.

4.1 Accuracy Evaluation under Non-IID Data Distributions

In this subsection, we investigate the test accuracy and per-label accuracy of XorMixFL and its benchmark schemes for different numbers of target labels. With one target label, Tab. 2 shows that XorMixFL achieves about 10% higher test accuracy than standalone ML regardless of the choice of the target label. Compared to MixFL, XorMixFL shows around 2% lower test accuracy. This is because the augmented samples generated by XorMixup include more non-trivial noise to preserve more privacy than Vanilla Mixup used in MixFL, as visualized in Fig. 2. In terms of target-label accuracy, Fig. 3(a) shows that XorMixFL achieves 65% target-label accuracy, which is 1.87x higher than standalone ML. This highlights the effectiveness of XorMixup in (one-shot) FL.

Next, with two target labels, Tab. 2 illustrates that the test accuracy slightly decreases from the one-target-label cases, while still exhibiting a similar trend across the proposed and benchmark schemes. To be more specific, comparing Fig. 3(a) and (b), the target-label accuracy also decreases, with some labels relatively robust and others sensitive to the additional target label; the accuracy of the sensitive label drops noticeably compared to the one-target-label case.

Figure 2: A visualization of original samples (target label: , dummy label: ) and augmented samples using: (a) Vanilla Mixup, (b) XorMixup without sample blending (), and (c) XorMixup with sample blending ().
Method | 1 target label: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 2 target labels: (0, 2), (0, 8), (2, 4), (4, 8) | 4 target labels: (0, 2, 4, 8)
XorMixFL 94.44% 95.83% 95.28% 93.99% 94.72% 94.17% 94.58% 93.43% 94.65% 91.54% 92.36% 91.03% 92.61% 91.28% 88.48%
MixFL 96.85% 96.72% 95.59% 96.13% 95.82% 95.89% 95.52% 95.57% 95.56% 95.34% 95.52% 93.98% 94.87% 93.76% 91.93%
Vanilla FL 83.27% 84.83% 84.25% 83.49% 84.82% 83.82% 85.12% 83.31% 84.26% 82.94% 81.72% 81.23% 78.75% 79.23% 77.12%
Standalone 89.12% 88.62% 89.34% 91.13% 89.83% 88.31% 89.28% 90.11% 91.22% 87.43% 86.41% 88.16% 87.19% 86.58% 84.82%
Table 2: Test accuracy evaluation for different target-label choices.
Figure 3: Target-label accuracy evaluation for different numbers of target labels: (a) one target label, where the blue, red, and yellow bars represent XorMixFL, MixFL, and standalone ML, respectively; (b) two target labels (two among 0, 2, 4, and 8); and (c) four target labels (0, 2, 4, 8). Tot denotes the test accuracy (averaged over all labels).

Lastly, with four target labels, following the same tendency, Tab. 2 shows that the test accuracy further decreases from the two-target-label cases, and the resultant lowest test accuracy is %. Likewise, as observed in Fig. 3(c), the target-label accuracy is the lowest among the one-, two-, and four-target-label cases. Specifically, compared to the two-target-label case in Fig. 3(b), the target-label accuracy decreases by around %.

4.2 Data Privacy Evaluation

In this subsection, we study the sample privacy guarantee of the augmented dataset. Following (Jeong:FML19; Oh20:CL), the sample privacy is measured using the minimum MDS value (Cox2008) between the augmented sample and any raw sample contributing to the augmented sample. A larger MDS value implies higher dissimilarity, preserving more privacy.
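As an illustration of this metric, the following simplified stand-in (ours) replaces the MDS-based distance with a plain Euclidean distance between the augmented sample and each contributing raw sample; the paper itself uses MDS (Cox2008):

```python
import numpy as np

def min_dissimilarity(augmented, raw_samples):
    """Proxy privacy score: the minimum distance between an augmented sample
    and each raw sample that contributed to it (larger = more privacy).
    Plain Euclidean distance stands in for the paper's MDS-based measure."""
    aug = augmented.astype(np.float64).ravel()
    return min(float(np.linalg.norm(aug - r.astype(np.float64).ravel()))
               for r in raw_samples)
```

Taking the minimum over contributing raw samples captures the worst case: the augmented sample is only as private as its closest source.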

Tab. 3 shows the MDS values of XorMixFL and MixFL. For the blending ratio λ, a ratio biased toward either sample reveals more of one of the raw samples, while equal blending (λ = 0.5) minimizes the raw-sample leakage. Therefore, for MixFL, which shows the highest accuracy in Sec. 4.1, we aim to preserve more privacy by choosing λ = 0.5. By contrast, for XorMixFL, we aim to increase accuracy by choosing a biased λ.

  Input: target label, dummy label, the required numbers of samples, and the blending parameters
  Inform the connected devices of the target label and the numbers of requested samples
  repeat
     Randomly select a dummy-label base sample
     Blend the base sample via Eq. 1, up to the maximum number of blending samples
     XOR the blended base sample and a received encoded sample via Eq. 3, yielding a decoded sample
     Store the decoded sample in the training dataset
  until the number of decoded samples reaches the required total
  Train the server's model
  Broadcast the server's trained model to all devices
Algorithm 2 Server’s XOR decoding and training procedures
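A corresponding sketch of the server-side decoding loop of Algorithm 2, again under our own naming and blending-form assumptions:

```python
import random
import numpy as np

def server_decode(encoded_batch, base_pool, max_blend, lam=0.5):
    """Sketch of Algorithm 2's decoding loop: XOR each received seed sample
    with a blend of the server's own dummy-label base samples (Eq. 3)."""
    decoded = []
    for enc in encoded_batch:
        picks = random.sample(base_pool, min(max_blend, len(base_pool)))
        xb = picks[0].astype(np.float64)
        for s in picks[1:]:
            xb = lam * xb + (1.0 - lam) * s           # blend base samples (Eq. 1)
        decoded.append(np.bitwise_xor(enc, xb.astype(np.uint8)))  # Eq. 3
    return decoded  # added to the server's training set under the target label
```

Because the server blends its own base samples rather than reusing the device's dummy sample, the decoded output differs from the raw target-label sample, which is the source of both the residual noise and the privacy guarantee.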

As shown in Tab. 3, the MDS value increases with the number of blending samples and the number of dummy labels. Accordingly, the lowest mean MDS value among all XorMixFL cases is 2116.97, which is even higher than the maximum mean MDS value among all MixFL cases. The highest mean MDS value of XorMixFL, 3030.43, is obtained with the largest numbers of blending samples and dummy labels, whereas the highest mean MDS value of MixFL is 2095.57. The highest MDS value of XorMixFL is thus about 45% higher than that of MixFL. These results show that XorMixFL preserves more data privacy than MixFL.

4.3 Impacts of Hyperparameters

In this subsection, we compare XorMixFL and MixFL in terms of the test accuracy and the sensitivity to key design parameters: the number of dummy labels, the sample blending ratio λ, and the maximum number of blending samples per label. Across all parameter changes, XorMixFL is more robust than MixFL, as elaborated next.

Test Accuracy.

First, we study the impact of the number of dummy labels on test accuracy. With 2 dummy labels under λ = 0.25 and one blending sample per label, XorMixFL achieves a test accuracy of 95.29%. With 3 dummy labels, the test accuracy of XorMixFL is 91.60%, i.e., only a 3.8% reduction, which is the largest reduction for XorMixFL when only the number of dummy labels changes. On the other hand, the test accuracy of MixFL is 94.49% with 1 dummy label under λ = 0.25 and five blending samples per label. With 3 dummy labels, the test accuracy of MixFL drops to 85.65%, corresponding to a 9.3% reduction, the largest for MixFL when only the number of dummy labels changes.

Second, we investigate the impact of λ on test accuracy. With λ = 0.5 under 2 dummy labels and five blending samples per label, the test accuracy of XorMixFL is 92.78%. With λ = 0.95, the test accuracy increases by around 2%, the largest λ-induced increase for XorMixFL. For MixFL, the corresponding largest λ-induced increase is around 5%.

Lastly, we consider the impact of the maximum number of blending samples on test accuracy. With five blending samples under λ = 0.5 and 2 dummy labels, the test accuracy of XorMixFL is 92.78%; with one blending sample, it increases by about 3%. Similarly, with five blending samples under λ = 0.95 and 1 dummy label, the test accuracy of MixFL is 90.74%; with one blending sample, it increases by about 5%.

Number of averaging samples       1                  2                  3                  4                  5
XorMixFL (1 dummy label)    2116.97 / 335.85   2335.78 / 545.64   2520.88 / 687.77   2674.01 / 778.90   2781.09 / 838.49
XorMixFL (2 dummy labels)   2321.38 / 352.65   2409.72 / 643.56   2664.12 / 814.13   2821.37 / 900.01   2911.42 / 947.06
XorMixFL (3 dummy labels)   2406.62 / 365.61   2519.44 / 741.04   2800.24 / 895.33   2949.51 / 966.04   3030.43 / 1001.48
MixFL (1 dummy label)       1248.46 / 156.20   1342.34 / 213.23   1382.11 / 311.44   1402.53 / 292.41   1496.32 / 349.25
MixFL (2 dummy labels)      1462.75 / 251.78   1523.10 / 320.13   1611.42 / 343.13   1683.45 / 391.39   1820.43 / 385.32
MixFL (3 dummy labels)      1521.13 / 225.39   1621.47 / 276.52   1850.32 / 403.72   1999.01 / 545.66   2095.57 / 634.79
Table 3: MDS comparison (mean / standard deviation) between XorMixFL and MixFL (target label: , ).
                                  1 dummy label            2 dummy labels           3 dummy labels
λ =                            0.25    0.5     0.95      0.25    0.5     0.95      0.25    0.5     0.95
XorMixFL, 1 blending sample
  Test acc.                   92.33%  92.54%  92.05%    95.29%  95.10%  94.47%    91.60%  91.51%  90.82%
  Per-label acc.              52.89%  54.48%  51.65%    80.01%  77.61%  74.45%    44.62%  41.98%  39.72%
XorMixFL, 5 blending samples
  Test acc.                   91.75%  92.98%  93.11%    94.37%  92.78%  94.19%    90.82%  90.42%  91.60%
  Per-label acc.              46.16%  56.55%  60.28%    69.59%  56.23%  68.95%    34.92%  33.42%  43.68%
MixFL, 1 blending sample
  Test acc.                   94.49%  94.49%  95.96%    87.85%  90.65%  91.98%    87.44%  90.31%  91.86%
  Per-label acc.              72.82%  72.60%  72.73%    21.94%  43.36%  47.72%    19.78%  40.91%  46.46%
MixFL, 5 blending samples
  Test acc.                   94.49%  90.07%  90.74%    88.64%  86.21%  90.71%    85.65%  86.03%  90.66%
  Per-label acc.              42.08%  32.82%  35.67%    26.83%  24.71%  34.76%    15.42%  24.30%  34.72%
Table 4: The average test and per-label accuracy of XorMixFL and MixFL when the number of dummy labels and the blending ratio λ change (target label: , dummy labels: and/or and/or ).

Target-Label Accuracy.

In terms of per-label accuracy, we compare the impacts of the number of dummy labels and λ. First, with 3 dummy labels under λ = 0.5 and one blending sample per label, the per-label accuracy of XorMixFL is 41.98%. With 2 dummy labels, the per-label accuracy increases by about 84%, reaching 77.61%. This is the largest percent change for XorMixFL when only the number of dummy labels changes. On the other hand, with 3 dummy labels under λ = 0.25 and one blending sample, the per-label accuracy of MixFL is 19.78%. With 1 dummy label, the per-label accuracy increases by about 268%, reaching 72.82%. This is the largest percent change for MixFL when only the number of dummy labels changes.

Second, we study the impact of the blending ratio λ on the per-label accuracy. With λ = 0.25 under 1 dummy label and five blending samples, the per-label accuracy of XorMixFL is 46.16%. With λ = 0.95, it increases by about 30%, reaching 60.28%. This is the largest λ-induced increase for XorMixFL. On the other hand, with λ = 0.25 under 3 dummy labels and one blending sample, the per-label accuracy of MixFL is 19.78%. With λ = 0.95, it increases by about 134%, reaching 46.46%.

Lastly, we investigate the impact of the maximum number of blending samples on the per-label accuracy. With five blending samples under λ = 0.5 and 2 dummy labels, the per-label accuracy of XorMixFL is 56.23%; with one blending sample, it increases by about 38%, reaching 77.61%. By contrast, with five blending samples under λ = 0.25 and 1 dummy label, the per-label accuracy of MixFL is 42.08%; with one blending sample, it increases by about 73%, reaching 72.82%.

5 Conclusion

In this work, we proposed a novel privacy-preserving one-shot FL framework, XorMixFL, which allows devices to locally augment insufficient samples to correct non-IID data distributions while hiding the details of the original samples. Numerical simulations validated the effectiveness of XorMixFL in terms of accuracy and privacy guarantees, while revealing the impacts of key design parameters, such as the numbers of blending samples and dummy labels, on both. Building on the core idea of XorMixup data augmentation, extending our one-shot FL framework to standard multi-shot FL applications is an interesting direction for future study.