Continual Learning with Differential Privacy

10/11/2021
by Pradnya Desai, et al.
University of Florida

In this paper, we focus on preserving differential privacy (DP) in continual learning (CL), in which we train ML models to learn a sequence of new tasks while memorizing previous tasks. We first introduce a notion of continual adjacent databases to bound the sensitivity of any data record participating in the training process of CL. Based upon that, we develop a new DP-preserving algorithm for CL with a data sampling strategy to quantify the privacy risk of training data in the well-known Averaged Gradient Episodic Memory (A-GEM) approach by applying a moments accountant. Our algorithm provides formal guarantees of privacy for data records across tasks in CL. Preliminary theoretical analysis and evaluations show that our mechanism tightens the privacy loss while maintaining a promising model utility.


1 Introduction

The ability to acquire new knowledge over time while retaining previously learned experiences, referred to as continual learning (CL), brings machine learning closer to human learning [29, 32, 3]. More specifically, given a stream of tasks, CL focuses on training a machine learning (ML) model to quickly learn a new task by leveraging the knowledge acquired from previous tasks under a limited amount of computation and memory resources [19, 31]. As a result, the main challenge for existing CL algorithms is that they can quickly suffer from catastrophic forgetting.

Also, memorizing previous tasks while learning new tasks further exposes CL models to adversarial attacks, especially model and data inference attacks [34, 12, 36]. CL models can disclose private and sensitive information in the training set, such as healthcare data [44, 4, 16], financial records [38, 43], and bio-medical images [27, 15]. Continuously accessing data from previously learned tasks, either stored in episodic memories [7, 30, 2, 35, 28] or produced by generative memories [33, 37, 21], incurs additional privacy risk compared to a single ML model trained on a single task. However, there is still a lack of scientific study on protecting private training data in CL algorithms.

Motivated by this, we propose to preserve differential privacy (DP) [8], which offers rigorous probabilistic privacy protection for the training data in CL. Merely employing existing DP-preserving mechanisms can either cause a significantly large privacy loss or quickly exhaust the limited computation and memory resources for learning new tasks while memorizing previous tasks through either episodic or generative memories. Thus, effectively and efficiently preserving DP in CL remains a largely open problem.

Key contributions. To effectively bound the privacy loss in CL, we first define continual adjacent databases (Def. 2) to capture the impact of the current task's data and the episodic memory on the privacy loss and model utility. Based upon that, we incorporate a moments accountant [1] into the Averaged Gradient Episodic Memory (A-GEM) algorithm [7], yielding a new DP-CL algorithm that preserves DP in CL.

Our idea is to configure the episodic memory in A-GEM as independent mini-memory blocks. For each task, we store a subset of its training data as a mini-memory block with an associated task index in the episodic memory. At each training step, we compute reference gradients on the mini-memory blocks independently; these reference gradients are used to optimize the process of memorizing previously learned tasks, as in A-GEM. More importantly, by keeping track of the task and mini-memory block indices, we can leverage a moments accountant to estimate the privacy cost spent on each mini-memory block. Based upon this, we derive a new strategy (Lemma 2) to bound the DP loss over the whole CL process while maintaining the computational efficiency of the A-GEM algorithm.

To our knowledge, our proposed mechanism establishes the first formal connection between DP and CL. Experiments conducted on the permuted MNIST dataset [13] and the Split CIFAR dataset [41] show promising results in preserving DP in CL, compared with baseline approaches.

2 Background

In this section, we revisit continual learning and differential privacy, and introduce our problem statement. The goal of CL is to learn a model through a sequence of tasks such that learning each new task does not cause forgetting of the previously learned tasks. Let $D_t = \{(x_i, y_i)\}_{i=1}^{N_t}$ be the dataset at task $t$, consisting of $N_t$ samples, where each sample $x_i$ is associated with a label $y_i$. Each $y_i$ is a one-hot vector of $K$ categories: $y_i \in \{0, 1\}^K$. A classifier $f_\theta$ outputs class scores $f_\theta: \mathbb{R}^d \rightarrow [0, 1]^K$, mapping an input $x$ to a vector of scores $f_\theta(x) = \{f_k(x)\}_{k=1}^{K}$ s.t. $\forall k: f_k(x) \ge 0$ and $\sum_{k=1}^{K} f_k(x) = 1$. The class with the highest score is selected as the predicted label for the sample. The classifier $f_\theta$ is trained by minimizing a loss function $\ell(f_\theta(x), y)$ that penalizes mismatching between the prediction $f_\theta(x)$ and the original value $y$.

Averaged Gradient Episodic Memory (A-GEM) [7]. Suppose a sequence of tasks $1, \dots, t-1$ has already been learned. The goal is to train the model at the current task $t$ so that it minimizes the loss on task $t$ without forgetting the previously learned tasks $1, \dots, t-1$. The key feature of A-GEM is to store a subset of data from each task $k$, denoted as $M_k$, in an episodic memory $EM$. The algorithm then ensures that the loss on the average episodic memory across all previously learned tasks, i.e., $\ell(f_\theta, EM)$, does not increase at every training step. In A-GEM, the objective function for learning the current task $t$ is:

$\min_\theta \; \ell(f_\theta, D_t) \quad \text{s.t.} \quad \ell(f_\theta, EM) \le \ell(f_{\theta^{t-1}}, EM), \;\; \text{where } EM = \cup_{k < t} M_k \qquad (1)$

where $\theta^{t-1}$ denotes the values of the model parameters learned after training task $t-1$.

The constrained optimization problem of Eq. (1) can be approximated quickly, and the updated gradient $\tilde{g}$ is computed as:

$\tilde{g} = g - \frac{g^{\top} g_{ref}}{g_{ref}^{\top} g_{ref}} \, g_{ref} \quad \text{(applied when } g^{\top} g_{ref} < 0\text{; otherwise } \tilde{g} = g) \qquad (2)$

where $g$ is the proposed gradient update on $D_t$ and $g_{ref}$ is the reference gradient computed from the episodic memory of previous tasks.
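
To make the projection in Eq. (2) concrete, here is a minimal NumPy sketch of the A-GEM gradient update; the function name and the flattened-gradient representation are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def agem_update(g: np.ndarray, g_ref: np.ndarray) -> np.ndarray:
    """A-GEM projection (Eq. 2): if the proposed gradient g conflicts with the
    reference gradient g_ref (negative dot product), project g so that the
    episodic-memory loss does not increase."""
    dot = g @ g_ref
    if dot >= 0:
        # No conflict with previous tasks: use g as-is.
        return g
    # Remove the component of g that points against g_ref.
    return g - (dot / (g_ref @ g_ref)) * g_ref

# Example with a toy 3-dimensional gradient.
g = np.array([1.0, -2.0, 0.5])
g_ref = np.array([0.5, 1.0, 0.0])
print(agem_update(g, g_ref))
```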

Differential Privacy [9, 10, 20, 39]. To avoid training data leakage, DP restricts what adversaries can learn from the training data given the released model parameters, by ensuring similar model outcomes with and without any single data sample in the dataset. The definition of DP is as follows:

Definition 1

($\epsilon, \delta$)-DP [8]. A randomized algorithm $A$ fulfills ($\epsilon, \delta$)-DP if, for any two adjacent databases $D$ and $D'$ differing in at most one sample, and for all outcomes $O \subseteq Range(A)$, we have:

$\Pr[A(D) \in O] \le e^{\epsilon} \Pr[A(D') \in O] + \delta \qquad (3)$

where $\epsilon$ is the privacy budget and $\delta$ is the broken probability.

The privacy budget $\epsilon$ controls the amount by which the distributions induced by $D$ and $D'$ may differ; a smaller $\epsilon$ enforces a stronger privacy guarantee. The broken probability $\delta$ accounts for the improbable "bad" events in which an adversary may infer whether a particular data sample belongs to the training data; such events occur with probability at most $\delta$.
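
To illustrate how such an ($\epsilon, \delta$)-DP guarantee can be obtained for a bounded-norm quantity, the sketch below applies the classical Gaussian mechanism calibration $\sigma \ge \Delta \sqrt{2\ln(1.25/\delta)}/\epsilon$ (valid for $\epsilon \le 1$) to a clipped gradient; the function name and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_mechanism(grad: np.ndarray, clip_bound: float,
                       epsilon: float, delta: float,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Clip a gradient to L2 norm `clip_bound` (sensitivity Delta = clip_bound
    under add/remove-one adjacency) and add Gaussian noise calibrated by the
    classical bound sigma >= Delta * sqrt(2 ln(1.25/delta)) / epsilon."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_bound / (norm + 1e-12))
    sigma = clip_bound * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=grad.shape)

noisy = gaussian_mechanism(np.ones(10), clip_bound=1.0, epsilon=1.0, delta=1e-5)
```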

DP in Continual Learning. There are several works on DP in CL [11, 23]. In [11], the authors train a DP-GAN [42] to approximate the distribution of the past datasets. They leverage a small portion of public data (i.e., data that does not need to be kept private) to initialize and train the GAN in the first few iterations of each task, then continue training the GAN model under a DP constraint. The trained generator produces adversarial examples imitating real examples of past tasks, which are then employed to supplement the actual data of the current training task. DPL2M [23] perturbs the objective functions using the DPAL mechanism [22, 24] and applies A-GEM to optimize the perturbed objective function. However, both [11, 23] lack a concrete definition of adjacent databases, leaving their DP protection unclear or not well justified. Different from existing works, we provide a formal DP protection for CL models.

3 Continual Learning with DP

Figure 1: DP in CL protects privacy for a stream of different tasks. The updated gradient $\tilde{g}$ is computed from 1) the reference gradient $g_{ref}$ computed on the episodic memory of previous tasks and 2) the proposed gradient $g$ of the current task, using our proposed DP-CL algorithm. The blue box indicates the training data of the current task; orange and green boxes indicate mini-memory blocks in the episodic memory, of which the orange ones are used to compute $g_{ref}$.

This section establishes a connection between differential privacy and continual learning. We first propose a notion of continual adjacent databases in CL: two databases are continual adjacent if they differ in at most a single sample of the training data and at most a single sample of the episodic memory across all tasks. Formally:

Definition 2

Continual Adjacent Databases. Two databases $D = \{D_{train}, EM\}$ and $D' = \{D'_{train}, EM'\}$, where $D_{train} = \cup_{t} D_t^{train}$, $D'_{train} = \cup_{t} D_t'^{train}$, $EM = \cup_{t} M_t$, and $EM' = \cup_{t} M'_t$, are called continual adjacent if $D_{train}$ and $D'_{train}$ differ in at most one sample and $EM$ and $EM'$ differ in at most one sample.

A Naive Algorithm. Based upon Definition 2, a straightforward approach, called DP-AGEM, is to simply apply a moments accountant [1] to A-GEM [7] in order to preserve DP in CL. At each task $t$, we divide the dataset $D_t$ into $D_t^{train}$ and $M_t$ such that $D_t^{train}$ and $M_t$ are disjoint: $D_t^{train} \cap M_t = \emptyset$. By sampling the training data $D_t^{train}$ with a sampling rate $q$, DP-AGEM computes a proposed gradient $g$, which is bounded by a predefined $\ell_2$-norm clipping bound $C$. In practice, it is beneficial to keep track of the privacy budget spent on each task independently, as well as the total privacy budget used in the entire training process. To achieve this, in computing the reference gradients, the algorithm first randomly samples data from all the data samples in the episodic memory $EM$ with a sampling probability $q$. Given a particular mini-memory block $M_k$ ($k < t$) in the episodic memory, the sampled data from $M_k$ is used to compute a reference gradient $g_{ref}^k$, which is clipped with the $\ell_2$-norm bound $C$. Then, the Gaussian mechanism is employed to inject random Gaussian noise with a predefined noise scale $\sigma$ into both $g$ and each $g_{ref}^k$. The reference gradient $g_{ref}$ is the average of the reference gradients computed on each mini-memory block, i.e., $g_{ref} = \frac{1}{t-1} \sum_{k < t} g_{ref}^k$. Finally, the updated gradient $\tilde{g}$, computed using Eq. (2) with $g$ and $g_{ref}$, is used to update the model parameters. After training task $t$, $M_t$ is added into the episodic memory $EM$. The training process continues until the model has been trained on all tasks.
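
The per-block reference-gradient step of DP-AGEM described above can be sketched as follows, assuming a generic grad_fn(params, batch) callback; the names, the Poisson sampling, and the noise parameterization are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def dp_agem_reference_gradient(grad_fn, params, episodic_memory, q, clip_bound,
                               sigma, rng=np.random.default_rng(1)):
    """DP-AGEM style: sample from EVERY mini-memory block, clip and noise each
    per-block reference gradient, then average them. Touching every block at
    every task is what drives the budget accumulation discussed below."""
    g_refs = []
    for block in episodic_memory:           # one mini-memory block per past task
        mask = rng.random(len(block)) < q   # Poisson sampling with rate q
        batch = [block[i] for i in np.flatnonzero(mask)]
        if not batch:
            continue
        g = grad_fn(params, batch)
        g = g * min(1.0, clip_bound / (np.linalg.norm(g) + 1e-12))  # clip to C
        g = g + rng.normal(0.0, sigma * clip_bound, size=g.shape)   # Gaussian noise
        g_refs.append(g)
    return np.mean(g_refs, axis=0) if g_refs else None
```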

Since the $\ell_2$-norms of the gradients $g$ and $g_{ref}$ are bounded, we can leverage a moments accountant to bound the privacy loss for a single task as well as the privacy loss accumulated across all tasks. Let $\epsilon_t$ be the privacy budget used to compute $g$ on $D_t^{train}$, and let $\epsilon_{ref}^t$ be the privacy budget spent on computing the reference gradient at training task $t$. The privacy budget used for a specific task $t$, denoted $\bar{\epsilon}_t$, and the total privacy budget $\epsilon_{total}$ of DP-AGEM accumulated until task $T$ can be computed as in the following lemma.

Lemma 1

Until task $T$, 1) the privacy budget used for a specific, previously learned task $t \le T$ is $\bar{\epsilon}_t = \epsilon_t + \sum_{i=t+1}^{T} \epsilon_{ref}^i$, and 2) the total privacy budget of DP-AGEM is $\epsilon_{total} = \sum_{t=1}^{T} \epsilon_t + \sum_{t=2}^{T} (t-1)\,\epsilon_{ref}^t$.

Proof

We use induction on the number of tasks to prove Lemma 1. When $T = 1$, the episodic memory is empty; therefore $\bar{\epsilon}_1 = \epsilon_1$ and $\epsilon_{total} = \epsilon_1$, so Lemma 1 is true for $T = 1$. Assume that Lemma 1 is true for $T - 1$, i.e., $\bar{\epsilon}_t = \epsilon_t + \sum_{i=t+1}^{T-1} \epsilon_{ref}^i$ for every $t \le T - 1$ and $\epsilon_{total} = \sum_{t=1}^{T-1} \epsilon_t + \sum_{t=2}^{T-1} (t-1)\,\epsilon_{ref}^t$. We need to show that Lemma 1 is true for $T$. At task $T$, the budget $\epsilon_T$ is spent on $D_T^{train}$, and the reference-gradient budget $\epsilon_{ref}^T$ is spent on each of the $T - 1$ mini-memory blocks in the episodic memory. Adding these to the inductive hypothesis gives $\bar{\epsilon}_t = \epsilon_t + \sum_{i=t+1}^{T} \epsilon_{ref}^i$ for every $t \le T$ and $\epsilon_{total} = \sum_{t=1}^{T} \epsilon_t + \sum_{t=2}^{T} (t-1)\,\epsilon_{ref}^t$. Thus, Lemma 1 holds.

Two Levels of DP Protection. Based on our definition of continual adjacent databases (Def. 2), Lemma 1 provides two levels of DP protection to an arbitrary data sample, as follows. Until task $T$: (1) given the DP budget $\bar{\epsilon}_t$ for a specific task $t$, the participation information of an arbitrary data sample in task $t$ is protected under ($\bar{\epsilon}_t, \delta$)-DP given the released parameters $\theta$, i.e., $\Pr[A(D) = \theta] \le e^{\bar{\epsilon}_t} \Pr[A(D') = \theta] + \delta$ for any adjacent databases $D$ and $D'$ of task $t$; and (2) the participation information of an arbitrary data sample in the whole training data is protected under ($\epsilon_{total}, \delta$)-DP given the released parameters $\theta$, i.e., $\Pr[A(D) = \theta] \le e^{\epsilon_{total}} \Pr[A(D') = \theta] + \delta$ for any continual adjacent databases $D$ and $D'$. This is fundamentally different from existing works [11, 23], which do not provide any formal DP protection in CL.

Although DP-AGEM can preserve DP in CL, it suffers from a large privacy budget accumulation across tasks, since the episodic memory of every previous task is accessed when training each later task. The resulting loose DP protection is impractical in the real world. To address this, we present an algorithm that tightens the DP loss.

DP-CL Algorithm. Our DP-CL algorithm (Alg. 1 and Figure 1) takes a sequence of $T$ tasks and their datasets as inputs. Samples in $D_t^{train}$ are used to compute the proposed gradient update $g$ on task $t$ with a sampling rate $q$ (Line 6). We clip $g$ so that its $\ell_2$-norm is bounded by a predefined gradient clipping bound $C$, and then add random Gaussian noise into $g$ with a predefined noise scale $\sigma$ (Line 9). Note that after training task $t$, the samples in $M_t$ are added to the episodic memory $EM$ as a mini-memory block (Lines 17, 24-26). To reduce the privacy budget accumulated over the number of tasks, we limit the access to seen data of previous tasks by using a single randomly selected mini-memory block $M_k$ ($k < t$) from $EM$ to compute $g_{ref}$ (Lines 20-23). We clip $g_{ref}$ by the gradient clipping bound $C$ and then add random Gaussian noise to it (Line 14). The updated gradient $\tilde{g}$ is computed by Eq. (2) (Line 15) and then used to update the model parameters (Line 16). The privacy budgets of our DP-CL algorithm can be bounded as in the following lemma.

1:  Input: Number of tasks , dataset , gradient clipping bound , objective function
2:  Initialize model , episodic memory , moments accountant
3:  for  do
4:      s.t. ,
5:     for each iteration  do
6:         Take random samples in with a sampling rate
7:        for  do
8:           
9:           
10:           if  then
11:              
12:           else
13:              
14:              
15:              Compute with Eq. 2
16:           
17:     
18:  print .get_priv_spent()
19:  Output: -DP-CL ,
20:  CalGref():
21:     Randomly choose from , where
22:     (
23:     return
24:  UpdateEpsMem():
25:     
26:     return
27:  ClipGrad():
28:     return
Algorithm 1 DP in Continual Learning (DP-CL) Algorithm
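
To make the training loop concrete, below is a compact Python sketch of the DP-CL procedure described above: sample the current task, clip and noise $g$, pick one random mini-memory block to form a clipped and noised $g_{ref}$, project with Eq. (2), and append the task's mini-memory block after training. The helper names, the grad_fn callback, and the plain SGD update are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(g, C):
    """Rescale g so that its L2 norm is at most C."""
    return g * min(1.0, C / (np.linalg.norm(g) + 1e-12))

def dp_cl_train(tasks, grad_fn, theta, C=1.0, sigma=1.0, q=0.01, lr=0.1, iters=100):
    """tasks: list of (train_data, memory_block) pairs; grad_fn(theta, batch) -> gradient.
    Per-example clipping (as in DP-SGD) is omitted here for brevity."""
    episodic_memory = []                      # list of mini-memory blocks, one per past task
    for train_data, memory_block in tasks:
        for _ in range(iters):
            # Poisson-sample a mini-batch from the current task and compute a noisy gradient g.
            batch = [x for x in train_data if rng.random() < q]
            if not batch:
                continue
            g = clip(grad_fn(theta, batch), C) + rng.normal(0, sigma * C, theta.shape)
            if not episodic_memory:
                g_tilde = g                   # first task: nothing to remember yet
            else:
                # Pick ONE random mini-memory block to compute a noisy reference gradient.
                block = episodic_memory[rng.integers(len(episodic_memory))]
                g_ref = clip(grad_fn(theta, block), C) + rng.normal(0, sigma * C, theta.shape)
                dot = g @ g_ref               # A-GEM projection (Eq. 2)
                g_tilde = g if dot >= 0 else g - dot / (g_ref @ g_ref) * g_ref
            theta = theta - lr * g_tilde      # SGD step
        episodic_memory.append(memory_block)  # store this task's mini-memory block
    return theta
```
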
Lemma 2

Until task $T$, 1) the privacy budget used for a specific, previously learned task $t \le T$ is $\bar{\epsilon}_t = \epsilon_t + \sum_{i > t:\, M_t \text{ is chosen at task } i} \epsilon_{ref}^i$, where $\epsilon_{ref}^i$ is the privacy budget used for the randomly chosen mini-memory block from $EM$ to compute $g_{ref}$ at task $i$, and 2) the total privacy budget of DP-CL is $\epsilon_{total} = \sum_{t=1}^{T} \epsilon_t + \sum_{t=2}^{T} \epsilon_{ref}^t$.

Proof

We use induction to prove Lemma 2. When $T = 1$, the episodic memory is empty; therefore $\bar{\epsilon}_1 = \epsilon_1$ and $\epsilon_{total} = \epsilon_1$, so Lemma 2 is true for $T = 1$. Assume that Lemma 2 is true for $T - 1$. We need to show that Lemma 2 is true for $T$. At task $T$, the budget $\epsilon_T$ is spent on $D_T^{train}$, while the reference-gradient budget $\epsilon_{ref}^T$ is charged only to the single randomly chosen mini-memory block. Adding these to the inductive hypothesis yields the stated bounds. Consequently, Lemma 2 holds.

Our DP-CL algorithm thus significantly reduces the privacy consumption to the total budget in Lemma 2, which is linear in the number of training tasks. In addition, our sampling approach for computing $g_{ref}$ is unbiased, since every data sample in the episodic memory has the same expected probability of being selected to compute $g_{ref}$. In our experiments, we show that DP-CL outperforms the baseline approach DP-AGEM.

4 Experimental Results

Figure 2: (a) Theoretical analysis of privacy accumulation; average accuracy over tasks of the A-GEM and DP-CL algorithms, with varying noise scale ( and  for all tasks), on (b) the permuted MNIST dataset and (c) the Split CIFAR dataset.
                                   Forgetting (F)   Worst-case F   LCA
A-GEM
DP-CL ( and  for all tasks):
   = 0.85                          0.0070
   = 0.9
   = 0.95
   = 1.0
   = 1.15
   = 1.30

Table 1: Forgetting measure (F), worst-case F, and LCA results on the permuted MNIST dataset. Lower F and worst-case F, and higher LCA, are better.
                                   Forgetting (F)   Worst-case F   LCA
A-GEM
DP-CL ( and  for all tasks):
   = 0.95
   = 0.96
   = 0.97
   = 0.98
   = 0.99
   = 1.0

Table 2: Forgetting measure (F), worst-case F, and LCA results on the Split CIFAR dataset. Lower F and worst-case F, and higher LCA, are better.

We have conducted experiments on the permuted MNIST dataset [13] and the Split CIFAR dataset [41]. The permuted MNIST dataset is a variant of the MNIST dataset [18] of handwritten digits: each task is a ten-digit classification problem in which a different random permutation is applied to the input pixels of the images. The Split CIFAR dataset [41] is a split version of the original CIFAR-100 dataset [17], consisting of disjoint subsets, each constructed by randomly sampling classes without replacement from the 100 classes. Our validation focuses on shedding light on the interplay between model utility and privacy loss when preserving DP in CL. Our code and datasets are available on GitHub: https://github.com/PhungLai728/DP-CL.
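
For reference, a small sketch of how a stream of permuted-MNIST-style tasks can be generated, assuming the images are already loaded as flattened arrays (the loader and the number of tasks are left to the reader):

```python
import numpy as np

def make_permuted_tasks(images: np.ndarray, labels: np.ndarray, num_tasks: int, seed: int = 0):
    """images: (N, 784) flattened MNIST images; returns a list of (permuted_images, labels) tasks.
    Task 0 uses the identity permutation; every later task applies its own fixed random pixel permutation."""
    rng = np.random.default_rng(seed)
    tasks = []
    for t in range(num_tasks):
        perm = np.arange(images.shape[1]) if t == 0 else rng.permutation(images.shape[1])
        tasks.append((images[:, perm], labels))
    return tasks
```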

Baseline Approaches. We evaluate our DP-CL algorithm and compare it with A-GEM [7], one of the state-of-the-art CL algorithms. Note that A-GEM does not preserve DP; therefore, we only use A-GEM to show the upper-bound model performance. We apply four well-known metrics, including the average accuracy, the average forgetting measure [6], the worst-case forgetting measure [7], and the learning curve area (LCA) [7], to evaluate our mechanism.
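
As a reminder of how the accuracy-based CL metrics above are typically computed from a task-accuracy matrix (a sketch following the standard definitions in [6, 7]; the LCA computation is omitted and the variable names are ours):

```python
import numpy as np

def cl_metrics(acc: np.ndarray):
    """acc[i, j] = accuracy on task j after finishing training on task i (T x T matrix).
    Returns average accuracy after the last task, average forgetting, and worst-case forgetting."""
    T = acc.shape[0]
    avg_acc = acc[T - 1, :].mean()
    # Forgetting of task j: best accuracy ever achieved on j minus its final accuracy.
    forgetting = np.array([acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)])
    return avg_acc, forgetting.mean(), forgetting.max()

# Example with a toy 3-task accuracy matrix.
acc = np.array([[0.95, 0.00, 0.00],
                [0.90, 0.93, 0.00],
                [0.85, 0.88, 0.92]])
print(cl_metrics(acc))
```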

Model Configuration. For the permuted MNIST dataset, we use a fully connected network with two hidden layers, trained on the stream of tasks via stochastic gradient descent. In computing $g_{ref}$, a fixed batch size is used for each training task and for the mini-memory block, with a fixed noise scale and gradient clipping bound, and each experiment is averaged over several runs. For the Split CIFAR dataset, we use a reduced ResNet-18 [19, 14] with three times fewer feature maps across all layers; the network has a final linear classifier for prediction. Other hyper-parameters, e.g., the learning rate, noise scale, and gradient clipping bound, are the same as in the permuted MNIST experiment, and each experiment is again averaged over several runs.
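
A minimal sketch of a two-hidden-layer fully connected network of the kind used for permuted MNIST; the hidden width of 256 units and the softmax output are illustrative assumptions, since the exact hyper-parameters are not reproduced here:

```python
import numpy as np

def init_mlp(in_dim=784, hidden=256, out_dim=10, seed=0):
    """Two-hidden-layer fully connected network; 256 hidden units is an assumed width."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.normal(0, np.sqrt(2.0 / d_in), (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """ReLU hidden layers and softmax output scores that sum to 1 (as in Section 2)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    logits = h @ W + b
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = forward(init_mlp(), np.random.rand(4, 784))  # 4 example images
```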

Comparing Privacy Accumulation. Since the number of data samples and the sampling rate remain the same for every task, the privacy budgets $\epsilon_t$ and $\epsilon_{ref}^t$ can be taken to be the same for every task. Therefore, for clarity and without loss of generality, in this privacy accumulation comparison between DP-AGEM and our DP-CL algorithm, we draw random Gaussian values (with a fixed mean and standard deviation) and assign the generated values as the per-task privacy budgets $\epsilon_t$ and $\epsilon_{ref}^t$.

Figure 2(a) illustrates how the privacy loss accumulates over tasks in DP-AGEM and in our DP-CL algorithm. Our algorithm achieves a notably tighter privacy budget than DP-AGEM, which accesses data samples from the whole episodic memory to compute $g_{ref}$. As the number of tasks increases, DP-AGEM's privacy budget grows rapidly, whereas our approach's privacy budget increases only slightly and is linear in the number of tasks (training steps).
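
A small simulation sketch of this comparison, under the simplifying assumption of additive composition with identically drawn per-task budgets (the numbers are illustrative, not those used for Figure 2(a)):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20
eps_task = np.abs(rng.normal(0.1, 0.02, T))   # per-task budget for computing g (assumed values)
eps_ref = np.abs(rng.normal(0.1, 0.02, T))    # per-task budget for the reference gradient

# DP-AGEM: at task t, the reference-gradient budget is charged to all t-1 stored blocks.
dp_agem = np.cumsum(eps_task + np.arange(T) * eps_ref)
# DP-CL: at task t, only one randomly chosen block is accessed, so the charge is eps_ref once.
dp_cl = np.cumsum(eps_task + (np.arange(T) > 0) * eps_ref)

print("total after last task:", dp_agem[-1], "(DP-AGEM) vs", dp_cl[-1], "(DP-CL)")
```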

Privacy Loss and Model Utility. From our theoretical analysis, DP-AGEM suffers from a huge privacy budget accumulation over tasks. Therefore, we only compare our DP-CL algorithm and the noiseless A-GEM model for the sake of simplicity.

As shown in Figures 2(b) and 2(c), our proposed method achieves an average accuracy comparable to the noiseless A-GEM model at the first task. On the permuted MNIST dataset, as the number of tasks increases, the average accuracy of DP-CL drops faster than that of A-GEM; for example, at later tasks, A-GEM's average accuracy degrades only moderately, while DP-CL's average accuracy drops further under a tight privacy budget. When the privacy budget increases, the average accuracy gap between our model and the noiseless A-GEM becomes larger, indicating that preserving DP in CL may increase catastrophic forgetting. This phenomenon is further clarified by the forgetting, worst-case forgetting, and LCA measures (Table 1): beyond the first setting, forgetting and worst-case forgetting increase significantly, while LCA decreases moderately in DP-CL.

On the Split CIFAR dataset, as the number of tasks increases, the average accuracy of DP-CL drops quickly while the average accuracy of the A-GEM model fluctuates: A-GEM's average accuracy drops from the first task to the second and then varies through the last task, whereas DP-CL's average accuracy gradually decreases toward the last task under a tight privacy budget. The fluctuation of the A-GEM model is probably due to the curse of dimensionality, since the number of training examples is much smaller than the number of trainable parameters in the reduced ResNet-18. Different from the permuted MNIST dataset, on the Split CIFAR dataset the average accuracy gap between DP-CL and the noiseless A-GEM becomes smaller when the privacy budget increases, especially at the first task, where the gap consistently shrinks across the six settings. This shows the trade-off between privacy budget and model utility: when we spend more privacy budget, the model accuracy improves. The gap between DP-CL's and A-GEM's average accuracy becomes significantly larger as the number of tasks increases, but the differences among the privacy budgets decrease; at the last task, the gaps across the six settings are close to one another. As shown in Table 2, when the privacy budget increases, forgetting and worst-case forgetting increase significantly, while the LCA fluctuates only slightly. This further confirms our observation on the MNIST dataset that preserving DP in CL may increase catastrophic forgetting.

Key observations. From our preliminary experiments, we draw the following observations: (1) merely incorporating the moments accountant into A-GEM causes a large privacy budget accumulation; and (2) although our DP-CL algorithm can preserve DP in CL, optimizing the trade-off between model utility and privacy loss in CL remains an open problem, since the privacy noise can worsen catastrophic forgetting.

5 Conclusion and Future Work

In this paper, we established the first formal connection between DP and CL. We combine the moments accountant and A-GEM in a holistic approach that preserves DP in CL under a tightly accumulated privacy budget. Our model shows promising results under strong DP guarantees in CL and opens a new research line on optimizing the model utility and privacy loss trade-off. One immediate question is how to align the privacy noise with catastrophic forgetting under the same privacy protection. We also plan to apply our approach to a broader range of models and datasets, especially under adversarial attacks [34, 5], and with heterogeneous and adaptive privacy-preserving mechanisms [25, 26, 40]. Our work further highlights an open direction of quantifying the privacy risk given diverse correlations among tasks: learning a highly related task can further disclose private information of another task, and vice versa.

Acknowledgments

The authors gratefully acknowledge the support from the National Science Foundation (NSF) grants NSF CNS-1935928/1935923, CNS-1850094, IIS-2041096/2041065.

References

  • [1] M. Abadi, A. Chu, I. Goodfellow, H.B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §1, §3.
  • [2] D. Abati, J. Tomczak, T. Blankevoort, S. Calderara, R. Cucchiara, and B.E. Bejnordi (2020) Conditional channel gated networks for task-aware continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3931–3940. Cited by: §1.
  • [3] R. Aljundi, K. Kelchtermans, and T. Tuytelaars (2019) Task-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11254–11263. Cited by: §1.
  • [4] A. Alnemari, C.J. Romanowski, and R.K. Raj (2017) An adaptive differential privacy algorithm for range queries over healthcare data. In 2017 IEEE International Conference on Healthcare Informatics (ICHI), pp. 397–402. Cited by: §1.
  • [5] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. (2021) Extracting training data from large language models. USENIX Security Symposium. Cited by: §5.
  • [6] A. Chaudhry, P.K. Dokania, T. Ajanthan, and P.H.S. Torr (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In ECCV, pp. 532–547. Cited by: §4.
  • [7] A. Chaudhry, M.A. Ranzato, M. Rohrbach, and M. Elhoseiny (2019) Efficient lifelong learning with a-GEM. In International Conference on Learning Representations, Cited by: §1, §1, §2, §3, §4.
  • [8] C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Cited by: §1, Definition 1.
  • [9] C. Dwork, A. Roth, et al. (2014) The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (3-4), pp. 211–407. Cited by: §2.
  • [10] C. Dwork (2008) Differential privacy: a survey of results. In International conference on theory and applications of models of computation, pp. 1–19. Cited by: §2.
  • [11] S. Farquhar and Y. Gal (2018) Differentially private continual learning. Privacy in Machine Learning and AI workshop at ICML. Cited by: §2, §3.
  • [12] M. Fredrikson, S. Jha, and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333. Cited by: §1.
  • [13] I.J. Goodfellow, M. Mirza, D. Xiao, A. Courville, and Y. Bengio (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. ICLR. Cited by: §1, §4.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §4.
  • [15] M. Helmstaedter, K.L. Briggman, S.C. Turaga, V. Jain, H.S. Seung, and W. Denk (2013) Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500 (7461), pp. 168–174. Cited by: §1.
  • [16] H.B. Kartal, X. Liu, and X.B. Li (2019) Differential privacy for the vast majority. ACM Transactions on Management Information Systems (TMIS) 10 (2), pp. 1–15. Cited by: §1.
  • [17] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Cited by: §4.
  • [18] Y. LeCun (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: §4.
  • [19] D. Lopez-Paz and M.A. Ranzato (2017) Gradient episodic memory for continual learning. Neural Information Processing Systems (NeurIPS). Cited by: §1, §4.
  • [20] F. McSherry and K. Talwar (2007) Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pp. 94–103. Cited by: §2.
  • [21] O. Ostapenko, M. Puscas, T. Klein, P. Jahnichen, and M. Nabi (2019) Learning to remember: a synaptic plasticity driven framework for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11321–11329. Cited by: §1.
  • [22] H. Phan, M. T. Thai, H. Hu, R. Jin, T. Sun, and D. Dou (2020) Scalable differential privacy with certified robustness in adversarial learning. In International Conference on Machine Learning, pp. 7683–7694. Cited by: §2.
  • [23] N.H. Phan, M.T. Thai, M.S. Devu, and R. Jin Differentially private lifelong learning. In Privacy in Machine Learning (NeurIPS 2019 Workshop), Cited by: §2, §3.
  • [24] N. Phan, R. Jin, M. T. Thai, H. Hu, and D. Dou (2019) Preserving differential privacy in adversarial learning with provable robustness. CoRR abs/1903.09822. External Links: Link, 1903.09822 Cited by: §2.
  • [25] N. Phan, M. N. Vu, Y. Liu, R. Jin, D. Dou, X. Wu, and M. T. Thai (2019) Heterogeneous gaussian mechanism: preserving differential privacy in deep learning with provable robustness. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, pp. 4753–4759. External Links: ISBN 9780999241141 Cited by: §5.
  • [26] N. Phan, X. Wu, H. Hu, and D. Dou (2017) Adaptive laplace mechanism: differential privacy preservation in deep learning. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 385–394. Cited by: §5.
  • [27] S.M. Plis, D.R. Hjelm, R. Salakhutdinov, E.A. Allen, H.J. Bockholt, J.D. Long, H.J. Johnson, J.S. Paulsen, J.A. Turner, and V.D. Calhoun (2014) Deep learning for neuroimaging: a validation study. Frontiers in neuroscience 8, pp. 229. Cited by: §1.
  • [28] J. Rajasegaran, S. Khan, M. Hayat, F.S. Khan, and M. Shah (2020) Itaml: an incremental task-agnostic meta-learning approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13588–13597. Cited by: §1.
  • [29] R. Ratcliff (1990) Connectionist models of recognition memory: constraints imposed by learning and forgetting functions.. Psychological review 97 (2), pp. 285. Cited by: §1.
  • [30] M. Riemer, I. Cases, R. Ajemian, M. Liu, I. Rish, Y. Tu, and G. Tesauro (2019) Learning to learn without forgetting by maximizing transfer and minimizing interference. ICLR. Cited by: §1.
  • [31] A.A. Rusu, N.C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell (2016) Progressive neural networks. arXiv preprint arXiv:1606.04671. Cited by: §1.
  • [32] J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y.W. Teh, R. Pascanu, and R. Hadsell (2018) Progress & compress: a scalable framework for continual learning. In International Conference on Machine Learning, pp. 4528–4537. Cited by: §1.
  • [33] H. Shin, J.K. Lee, J. Kim, and J. Kim (2017) Continual learning with deep generative replay. Neural Information Processing Systems (NeurIPS). Cited by: §1.
  • [34] R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. Cited by: §1, §5.
  • [35] X. Tao, X. Hong, X. Chang, S. Dong, X. Wei, and Y. Gong (2020) Few-shot class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12183–12192. Cited by: §1.
  • [36] Y. Wang, C. Si, and X. Wu (2015) Regression model fitting under differential privacy and model inversion attack. In IJCAI, pp. 1003–1009. Cited by: §1.
  • [37] C. Wu, L. Herranz, X. Liu, Y. Wang, J. van de Weijer, and B. Raducanu (2018) Memory replay gans: learning to generate new categories without forgetting. In Neural Information Processing Systems (NeurIPS), Cited by: §1.
  • [38] N. Wu, F. Farokhi, D. Smith, and M. A. K. (2020) The value of collaboration in convex machine learning with differential privacy. IEEE Symposium on Security and Privacy. Cited by: §1.
  • [39] C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren (2019) GANobfuscator: mitigating information leakage under GAN via differential privacy. IEEE Transactions on Information Forensics and Security 14 (9), pp. 2358–2371. Cited by: §2.
  • [40] D. Xu, W. Du, and X. Wu (2021) Removing disparate impact on model accuracy in differentially private stochastic gradient descent. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD ’21, pp. 1924–1932. External Links: ISBN 9781450383325 Cited by: §5.
  • [41] F. Zenke, B. Poole, and S. Ganguli (2017) Continual learning through synaptic intelligence. In International Conference on Machine Learning, pp. 3987–3995. Cited by: §1, §4.
  • [42] X. Zhang, S. Ji, and T. Wang (2018) Differentially private releasing via deep generative model (technical report). arXiv preprint arXiv:1801.01594. Cited by: §2.
  • [43] T. Zhu, G. Li, W. Zhou, and S.Y. Philip (2017) Differential privacy and applications. Cited by: §1.
  • [44] M.T. Zia, M.A. Khan, and H. El-Sayed (2020) Application of differential privacy approach in healthcare data–a case study. In 2020 14th International Conference on Innovations in Information Technology (IIT), pp. 35–39. Cited by: §1.