1 Introduction
Machine learning has become a ubiquitous tool for analyzing personal data and developing data-driven services. Unfortunately, the underlying learning models can pose a serious threat to privacy if they inadvertently reveal sensitive information from the training data. For example, Carlini et al. [12] show that the Google text completion system contains credit card and social security numbers from personal emails, which may be exposed to users during the autocompletion of text. Once such sensitive data has entered a learning model, however, its removal is non-trivial and requires selectively reverting the learning process. In the absence of specific methods for this task, retraining from scratch has long been the only resort, which is costly and only possible if the original training data is still available.
As a remedy, Cao & Yang [11] and Bourtoule et al. [8] propose methods for machine unlearning. These methods decompose the learning process and are capable of removing individual data points from a learning model in retrospect. As a result, they make it possible to eliminate isolated privacy issues, such as data points associated with individuals. However, information leaks may not only manifest in single data instances but also in groups of features and labels. A leaked address of a celebrity might be shared in hundreds of social media posts, affecting large parts of the training data. Similarly, relevant features in a bag-of-words model may be associated with sensitive names and data, contaminating the entire feature space.
Unfortunately, instance-based unlearning as proposed in previous work is inefficient in these cases: First, a runtime improvement can hardly be obtained over retraining, as the leaks are not isolated and larger parts of the training data need to be removed. Second, omitting several data points will inevitably reduce the fidelity of the corrected learning model. It becomes clear that the task of unlearning is not necessarily confined to removing data points, but may also require corrections on the orthogonal layers of features and labels, regardless of the amount of affected training data.
In this paper, we propose a method for unlearning features and labels. Our approach is inspired by the concept of influence functions, a technique from robust statistics [31] that allows for estimating the influence of data on learning models [33, 34]. By reformulating this influence estimation as a form of unlearning, we derive a versatile approach that maps changes of the training data in retrospect to closed-form updates of the model parameters. These updates can be calculated efficiently, even if larger parts of the training data are affected, and enable the removal of features and labels. As a result, our method can correct privacy leaks in a wide range of learning models with convex and non-convex loss functions.
For models with strongly convex loss, such as logistic regression and support vector machines, we prove that our approach enables certified unlearning. That is, it provides theoretical guarantees on the removal of features and labels from the models. To obtain these guarantees, we extend the concept of certified data removal [28] and show that the difference between models obtained with our approach and retraining from scratch becomes arbitrarily small. Consequently, we can define an upper bound on this difference and thereby realize provable unlearning in practice.
For models with non-convex loss functions, such as deep neural networks, similar theoretical guarantees do not hold in general. However, we empirically demonstrate that our approach provides substantial advantages over prior work. Our method is significantly faster than sharding [24, 8] and retraining while removing data and preserving a similar level of accuracy. Moreover, due to the compact updates, our approach requires only a fraction of the training data and hence is applicable when the original data is not entirely available. We show the efficacy of our approach in case studies on removing privacy leaks in spam classification and unintended memorization in natural language processing.
Contributions
In summary, we make the following major contributions:

Unlearning with closed-form updates. We introduce a novel framework for unlearning of features and labels. This framework builds on closed-form updates of learning models and thus is significantly faster than instance-based approaches to unlearning.

Certified unlearning. We derive two unlearning strategies for our framework based on first-order and second-order gradient updates. Under convexity and continuity assumptions on the loss, we show that both strategies can provide certified unlearning.

Empirical analysis. We empirically show that unlearning of sensitive information is possible even for deep neural networks with non-convex loss functions. We find that our first-order update is highly efficient, enabling a speedup over retraining of up to three orders of magnitude.
The rest of the paper is structured as follows: We review related work on machine unlearning and influence functions in Section 2. Our approach and its technical realization are introduced in Sections 3 and 4, respectively. The theoretical analysis of our approach is presented in Section 5 and its empirical evaluation in Section 6. Finally, we discuss limitations in Section 7 and conclude the paper in Section 8.
2 Related Work
The increasing application of machine learning to personal data has spurred a series of research efforts on detecting and correcting privacy issues in learning models [e.g., 53, 13, 12, 45, 36, 47]. In the following, we provide an overview of work on machine unlearning and influence functions. A broader discussion of privacy and machine learning is given by De Cristofaro [21] and Papernot et al. [41].
Machine unlearning
Methods for unlearning sensitive data are a recent branch of security research. Earlier, the efficient removal of samples was also called decremental learning [14] and used to speed up cross-validation for various linear classifiers [16, 17, 15]. Cao & Yang [11] show that a large number of learning models can be represented in a closed summation form that allows for elegantly removing individual data points in retrospect. However, for adaptive learning strategies, such as stochastic gradient descent, this approach provides little advantage over retraining from scratch and thus is not well suited for correcting problems in neural networks.
As a remedy, Bourtoule et al. [8] propose a universal strategy for unlearning data points from classification models. Similarly, Ginart et al. [24] develop a technique for unlearning points in clustering. The key idea of both approaches is to split the data into independent partitions—so-called shards—and aggregate the final model from submodels trained over these shards. In this setting, the unlearning of data points can be carried out efficiently by retraining only the affected submodels. Aldaghri et al. [3] show that this approach can be further sped up for least-squares regression by choosing the shards cleverly. Unlearning based on shards, however, is suitable for removing a few data points only and inevitably deteriorates in performance when larger portions of the data require changes.
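The inefficiency of sharding under widespread changes can be quantified with a short calculation. The following sketch (our own illustration, not part of the cited evaluations) computes, via inclusion–exclusion, the probability that n uniformly assigned points touch all S shards:

```python
from math import comb

def prob_all_shards_affected(n_points: int, n_shards: int) -> float:
    """Probability that n points, assigned to shards uniformly at random,
    hit every one of the S shards (inclusion-exclusion over missed shards)."""
    S = n_shards
    return sum((-1) ** k * comb(S, k) * ((S - k) / S) ** n_points
               for k in range(S + 1))

# With 20 shards, a few dozen affected points already hit
# almost every shard in expectation.
```

For instance, with 20 shards the probability exceeds 99% once a few hundred points are affected, forcing a full retraining of every submodel.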
This limitation of sharding is schematically illustrated in Fig. 1. The probability that all shards need to be retrained increases with the number of data points to be corrected. For a practical setup with the shard count proposed by Bourtoule et al. [8], changes to only a small number of points are already sufficient to impact all shards and render this form of unlearning inefficient, regardless of the size of the training data. We provide a detailed analysis of this limitation in the appendix on stochastic sharding analysis. Consequently, privacy leaks involving hundreds or thousands of data points cannot be addressed with these approaches.
Influence functions
The concept of influence functions that forms the basis of our approach originates from robust statistics [31] and was first used by Cook & Weisberg [20] for investigating the changes of simple linear regression models. Although the proposed techniques have been occasionally employed in machine learning [35, 32], the seminal work of Koh & Liang [33] recently brought general attention to this concept and its application to modern learning techniques. In particular, this work uses influence functions for explaining the impact of data points on the predictions of learning models. Influence functions have since been used to trace bias in word embeddings back to documents [10, 19], determine reliable regions in learning models [46], and explain deep neural networks [7]. Moreover, Basu et al. [6] increase the accuracy of influence functions by using higher-order approximations, Barshan et al. [5] improve the precision of influence calculations through nearest-neighbor strategies, and Guo et al. [29] show that the runtime can be decreased when only specific samples are considered. Golatkar et al. [26, 25] use influence functions for sample removal in deep neural networks by proposing special approximations of the learning model.
In terms of theoretical analysis, Koh et al. [34] study the accuracy of influence functions when estimating the loss on test data, and Neel et al. [40] perform a similar analysis for gradient-based update strategies. Rad & Maleki [44] further show that the prediction error on leave-one-out validations can be reduced with influence functions. Finally, Guo et al. [28] introduce the idea of certified removal of data points, which we extend in our approach.
All of these approaches, however, remain on the level of data instances. To our knowledge, we are the first to build on the concept of influence functions for unlearning features and labels from learning models.
3 Unlearning with Updates
Let us start by considering a supervised learning task that is described by a dataset D = {z₁, …, zₙ}, with each object zᵢ consisting of a data point xᵢ ∈ X and a label yᵢ ∈ Y. We assume that X is a vector space and denote the j-th feature (dimension) of x by x[j]. Given a loss function ℓ(z; θ) that measures the difference between the predictions of a learning model θ and the true labels, the optimal model θ* can be found by minimizing the regularized empirical risk,

θ* = argmin_θ L(θ; D) = argmin_θ Σ_{z ∈ D} ℓ(z; θ) + λ Ω(θ),    (1)

where Ω is a regularizer and L(θ; D) describes the loss on the entire dataset. In this setup, the process of unlearning amounts to adapting θ* to changes in D without recalculating the optimization problem in Eq. 1.
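For concreteness, the regularized empirical risk of Eq. 1 can be instantiated for logistic regression. The sketch below assumes an L2 regularizer and labels in {−1, +1}; these are illustrative choices, not requirements of the framework:

```python
import numpy as np

def logistic_loss(theta, X, y, lam=0.01):
    """Regularized empirical risk (Eq. 1) for logistic regression:
    sum of per-sample losses plus an (assumed) L2 regularizer.
    X: (n, p) data matrix, y: labels in {-1, +1}, theta: (p,) parameters."""
    z = X @ theta
    per_sample = np.logaddexp(0.0, -y * z)   # log(1 + exp(-y * <theta, x>)), stable
    return per_sample.sum() + lam * 0.5 * theta @ theta
```

At θ = 0 every sample contributes log 2, which gives a quick sanity check of the implementation.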
3.1 Unlearning Data Points
To provide an intuition for our approach, we begin by asking the following question: How would the optimal learning model θ* change if only one data point z had been perturbed by some change δ? Replacing z = (x, y) by z̃ = (x + δ, y) leads to the new optimal set of model parameters:

θ*_{z→z̃} = argmin_θ L(θ; D \ {z} ∪ {z̃}).    (2)

However, calculating the new model exactly is expensive. Instead of replacing the data point z with z̃, we can also upweight z̃ by a small value ε and downweight z accordingly, resulting in the following optimization problem:

θ*_{ε, z→z̃} = argmin_θ L(θ; D) + ε ℓ(z̃; θ) − ε ℓ(z; θ).    (3)

Eqs. 2 and 3 are equivalent for ε = 1 and solve the same problem. As a result, we do not need to explicitly remove a data point from the training data but can revert its influence on the learning model through a combination of appropriate upweighting and downweighting.
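This equivalence is easy to verify numerically. The following sketch uses a tiny ridge-regression problem (a hypothetical example with a closed-form minimizer) and shows that retraining with a replaced point and up/down-weighting the summed loss with ε = 1 yield the same parameters:

```python
def fit(points, lam=0.1):
    """Closed-form minimizer of sum_i w_i * (theta*x_i - y_i)^2 + lam*theta^2
    for weighted points (x, y, w). Illustrative 1-D model, not the paper's."""
    sx2 = sum(w * x * x for x, y, w in points)
    sxy = sum(w * x * y for x, y, w in points)
    return sxy / (sx2 + lam)

data = [(1.0, 2.0, 1.0), (2.0, 3.0, 1.0), (3.0, 7.0, 1.0)]
z_old, z_new = (3.0, 7.0), (3.0, 4.0)       # perturb one label

# Eq. 2: retrain with the point replaced
theta_replace = fit([(1.0, 2.0, 1.0), (2.0, 3.0, 1.0), (3.0, 4.0, 1.0)])
# Eq. 3: keep the data, upweight z_new and downweight z_old by eps = 1
eps = 1.0
theta_weighted = fit(data + [(z_new[0], z_new[1], eps),
                             (z_old[0], z_old[1], -eps)])
```

Both routes produce identical parameters, since adding ℓ(z̃) and subtracting ℓ(z) from the summed loss is exactly the replacement of z by z̃.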
It is easy to see that this approach is not restricted to a single data point. We can simply define a set of data points Z and its perturbed version Z̃, and arrive at the weighting

θ*_{ε, Z→Z̃} = argmin_θ L(θ; D) + ε Σ_{z̃ ∈ Z̃} ℓ(z̃; θ) − ε Σ_{z ∈ Z} ℓ(z; θ).    (4)

This generalization enables us to approximate changes on larger portions of the training data. Instead of solving the problem in Eq. 4, however, we formulate this optimization as an update of the original model θ*. That is, we seek a closed-form update Δ(Z, Z̃) of the model parameters, such that

θ*_{Z→Z̃} ≈ θ* + Δ(Z, Z̃),    (5)

where Δ(Z, Z̃) has the same dimension as the learning model θ* but is sparse if only a few parameters are affected.
As a result of this formulation, we can describe changes of the training data as a compact update Δ rather than iteratively solving an optimization problem. We show in Section 4 that this update step can be computed efficiently using first-order and second-order gradients. Furthermore, we prove in Section 5 that the unlearning success of both updates can be certified up to a tolerance if the loss function is strictly convex, twice differentiable, and Lipschitz-continuous.
3.2 Unlearning Features and Labels
Equipped with a general method for updating a learning model, we proceed to introduce our approach for unlearning features and labels. To this end, we expand our notion of perturbations and include changes to labels by defining z̃ = (x̃, ỹ), where x̃ modifies the features of a data point and ỹ its label. By using different changes in the perturbations Z̃, we can now realize different types of unlearning using closed-form updates.
Replacing features
As the first type of unlearning, we consider the task of correcting features in a learning model. This task is relevant if the content of some features violates the privacy of a user and needs to be replaced with alternative values. As an example, personal names, identification numbers, residence addresses, or other sensitive data might need to be removed after a model has been trained on a corpus of emails.
For a set of features F and their new values V, we define the perturbations on the affected points Z by

Z̃ = { (x[F ← V], y) : (x, y) ∈ Z },

where x[F ← V] denotes the data point x with the features in F replaced by the corresponding values in V. For example, a credit card number contained in the training data can be blinded by a random number sequence in this setting. The values in V can be adapted individually, so that fine-grained corrections become possible.
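As an illustration, such perturbations can be constructed directly on a feature matrix. The helper below is hypothetical (our own names, not the paper's implementation):

```python
import numpy as np

def build_feature_perturbations(X, affected_rows, feature_ids, new_values):
    """Construct the affected points Z and their perturbed versions Z~:
    the columns in `feature_ids` are overwritten with `new_values`
    (e.g., a sensitive token count blinded by a neutral value)."""
    Z = X[affected_rows].copy()
    Z_tilde = Z.copy()
    for j, v in zip(feature_ids, new_values):
        Z_tilde[:, j] = v
    return Z, Z_tilde

X = np.array([[1., 0., 3.],
              [0., 2., 1.]])
Z, Z_tilde = build_feature_perturbations(X, [0, 1], feature_ids=[2], new_values=[0.])
```

The pair (Z, Z̃) then enters the update of Eq. 5 unchanged, regardless of which features were rewritten.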
Replacing labels
As the second type of unlearning, we focus on correcting labels. This form of unlearning is necessary if the labels captured in a model contain unwanted information. For example, in generative language models, the training text is used as input features (preceding characters) and labels (target characters) [27, 48]. Hence, defects can only be eliminated if the labels are unlearned as well. For the affected points Z and the set of new labels Ỹ, we define the corresponding perturbations by

Z̃ = { (x, ỹ) : x ∈ Zₓ, ỹ ∈ Ỹ },

where Zₓ corresponds to the data points in Z without their original labels. The new labels can be individually selected for each data point, as long as they come from the domain Y, that is, Ỹ ⊂ Y. Note that replaced labels and features can be easily combined in one set of perturbations Z̃, so that defects affecting both can be corrected in a single update. In Section 6.2, we demonstrate that this combination can be used to remove unintended memorization from generative language models with high efficiency.
Revoking features
Based on appropriate definitions of the perturbations, our approach makes it possible to replace the content of features and thus eliminate privacy leaks. However, in some scenarios it might be necessary to completely remove features from a learning model—a task that we denote as revocation. In contrast to the correction of features, this form of unlearning poses a unique challenge: The revocation of features reduces the input dimension of the learning model. While this adjustment can be easily carried out through retraining with adapted data, constructing a model update as in Eq. 5 is tricky.
To address this problem, let us consider a model θ* trained on a dataset D. If we remove a set of features F from this dataset and train the model again, we obtain a new optimal model θ*₋F with reduced input dimension. By contrast, if we set the values of the features F to zero in the dataset and train again, we obtain an optimal model θ*F₌₀ with the same input dimension as θ*. Fortunately, these two models are equivalent for a large class of learning models, including support vector machines and several neural networks, as the following lemma shows.
Lemma 1.
The models θ*₋F and θ*F₌₀ are equivalent for learning models that process the input x through a linear transformation of the form θᵀx.
Proof.
It is easy to see that it is irrelevant for the dot product in the linear transformation whether a dimension of x is missing or equals zero. As a result, the loss of both models is identical for every data point z. Hence, the empirical risk is also equal for both models, and thus the same objective is minimized during learning, resulting in equal parameters. ∎
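The core step of this proof can be checked numerically for a linear model; the values below are arbitrary illustrative numbers:

```python
# Zeroing a feature is equivalent to removing it together with the
# matching weight, since the dot product is unaffected either way.
w = [0.5, -1.2, 2.0, 0.3]
x = [1.0, 4.0, 0.0, 2.0]     # feature 2 already set to zero

dot_zeroed = sum(wi * xi for wi, xi in zip(w, x))

# remove dimension 2 from both the weights and the data point
w_red = [wi for i, wi in enumerate(w) if i != 2]
x_red = [xi for i, xi in enumerate(x) if i != 2]
dot_reduced = sum(wi * xi for wi, xi in zip(w_red, x_red))
```

Since every per-sample loss depends on the input only through this dot product, the two training objectives coincide.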
Lemma 1 enables us to erase features from many learning models by first setting them to zero, calculating the parameter update, and then reducing the dimension of the models accordingly. Concretely, to revoke the features F, we locate the data points where these features are non-zero,

Z = { (x, y) ∈ D : x[j] ≠ 0 for some j ∈ F },

and construct corresponding perturbations such that the features are set to zero by unlearning,

Z̃ = { (x[F ← 0], y) : (x, y) ∈ Z },

where x[F ← 0] denotes the data point x with all features in F set to zero.
Revoking labels
The previous strategy allows revoking features from several learning models. It is crucial if, for example, a bagofwords model has captured sensitive data in relevant features and therefore a reduction of the input dimension during unlearning is unavoidable. Unfortunately, a similar strategy for the revocation of labels is not available for our method, as we are not aware of a general shortcut, such as Lemma 1. Still, if the learning model contains explicit output dimensions for the class labels, as with some neural network architectures, it is possible to first replace unwanted labels and then manually remove the corresponding dimensions.
4 Update Steps for Unlearning
Our approach rests on changing the influence of training data with a closed-form update of the model parameters, as shown in Eq. 5. In the following, we derive two strategies for calculating this closed form: a first-order update and a second-order update. The first strategy builds on the gradient of the loss function and thus can be applied to any model with a differentiable loss. The second strategy also incorporates second-order derivatives, which limits its application to loss functions with an invertible Hessian matrix.
4.1 First-Order Update
Recall that we aim to find an update Δ(Z, Z̃) that we add to our model θ*. If the loss ℓ is differentiable, we can find the optimal first-order update by

Δ(Z, Z̃) = −τ ( Σ_{z̃ ∈ Z̃} ∇_θ ℓ(z̃; θ*) − Σ_{z ∈ Z} ∇_θ ℓ(z; θ*) ),    (6)

where τ is a small constant that we refer to as the unlearning rate. A complete derivation of Eq. 6 is given in the appendix. Intuitively, this update shifts the model parameters in the direction from Z to Z̃, where the size of the update step is determined by the rate τ. This update strategy is related to the classic gradient descent update used in many learning algorithms, given by

Δ_GD = −τ Σ_{z̃ ∈ Z̃} ∇_θ ℓ(z̃; θ*).

However, it differs from this update step in that it moves the model toward the difference in gradient between the original and perturbed data, which minimizes the loss on Z̃ and at the same time removes the information contained in Z.
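A minimal sketch of this first-order update for logistic regression, using analytic gradients instead of an auto-differentiation framework (the function names and the choice of τ are illustrative):

```python
import numpy as np

def grad_logistic(theta, X, y):
    """Sum of per-sample logistic-loss gradients; labels y in {-1, +1}."""
    s = 1.0 / (1.0 + np.exp(y * (X @ theta)))     # sigmoid(-y * <theta, x>)
    return X.T @ (-y * s)

def first_order_update(theta, Z_x, Z_y, Zt_x, Zt_y, tau=0.01):
    """Eq. 6: shift the parameters against the gradient difference between
    the perturbed points Z~ and the original points Z. The unlearning
    rate tau must be calibrated, e.g., with the exposure metric."""
    delta = grad_logistic(theta, Zt_x, Zt_y) - grad_logistic(theta, Z_x, Z_y)
    return theta - tau * delta
```

When Z and Z̃ coincide, the gradient difference vanishes and the model is left untouched, which matches the intuition behind Eq. 6.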
The first-order update is a simple yet effective strategy: gradients of the loss can be computed in O(p) [43], and modern auto-differentiation frameworks like TensorFlow [1] and PyTorch [42] offer easy gradient computation for the practitioner. The update step involves a parameter τ that controls the impact of the unlearning step. To ensure that data has been completely replaced, it is necessary to calibrate this parameter using a measure for the success of unlearning. In Section 6, for instance, we show how the exposure metric by Carlini et al. [12] can be used for this calibration.
4.2 Second-Order Update
The calibration of the update step can be eliminated if we make further assumptions on the properties of the loss ℓ. If we assume that ℓ is twice differentiable and strictly convex, the influence of a single data point can be approximated in closed form [20] by

∂θ*_{ε, z→z̃} / ∂ε |_{ε=0} = −H⁻¹_{θ*} ( ∇_θ ℓ(z̃; θ*) − ∇_θ ℓ(z; θ*) ),

where H⁻¹_{θ*} is the inverse Hessian of the loss at θ*, that is, the inverse matrix of the second-order partial derivatives. We can now perform a linear approximation for θ*_{z→z̃} to obtain

θ*_{z→z̃} ≈ θ* − H⁻¹_{θ*} ( ∇_θ ℓ(z̃; θ*) − ∇_θ ℓ(z; θ*) ).    (7)

Since all operations are linear, we can easily extend Eq. 7 to account for multiple data points and derive the following second-order update:

Δ(Z, Z̃) = −H⁻¹_{θ*} ( Σ_{z̃ ∈ Z̃} ∇_θ ℓ(z̃; θ*) − Σ_{z ∈ Z} ∇_θ ℓ(z; θ*) ).    (8)

A full derivation of this update step is provided in the appendix. Note that the update does not require any parameter calibration, since the weighting of the changes is directly derived from the inverse Hessian of the loss function.
The second-order update is the preferred strategy for unlearning on models with a strongly convex and twice differentiable loss function, such as logistic regression, that guarantees the existence of the inverse Hessian. Technically, the update step in Eq. 8 can be easily calculated with common machine-learning frameworks. In contrast to the first-order update, however, this computation involves the inverse Hessian matrix, which is non-trivial to obtain for neural networks, for example.
Computing the inverse Hessian
Given a model with p parameters trained on n data points, forming and inverting the Hessian requires O(np² + p³) time and O(p²) space [33]. For models with a small number of parameters, the inverse Hessian can be precomputed and stored explicitly, such that each subsequent request for unlearning only involves a simple matrix-vector multiplication. In Section 6.1, for example, we show that unlearning features from a logistic regression model can be realized with this approach in less than a second.
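For such small models, the second-order update of Eq. 8 can be sketched with an explicit Hessian. The logistic-regression example below is illustrative and uses an assumed regularization strength:

```python
import numpy as np

def grad_logistic(theta, X, y, lam=0.0):
    """Summed gradient of the (optionally regularized) logistic loss."""
    s = 1.0 / (1.0 + np.exp(y * (X @ theta)))
    return X.T @ (-y * s) + lam * theta

def hessian_logistic(theta, X, lam=0.01):
    """Hessian of the regularized logistic loss at theta (p x p)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    d = p * (1.0 - p)                        # per-sample curvature
    return (X * d[:, None]).T @ X + lam * np.eye(X.shape[1])

def second_order_update(theta, X_train, Z, Zt, lam=0.01):
    """Eq. 8: theta - H^{-1} (grads on Z~ minus grads on Z), where
    Z = (Z_x, Z_y) are affected points and Zt their replacements.
    H^{-1} could be precomputed once and reused for later requests."""
    H_inv = np.linalg.inv(hessian_logistic(theta, X_train, lam))
    diff = grad_logistic(theta, *Zt) - grad_logistic(theta, *Z)
    return theta - H_inv @ diff
```

If the perturbed points equal the originals, the gradient difference is zero and the model remains unchanged, as expected.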
For complex learning models, such as deep neural networks, the Hessian matrix quickly becomes too large for explicit storage. Moreover, these models typically do not have convex loss functions, so that the matrix may also be non-invertible, rendering an exact update impossible. Nevertheless, we can approximate the inverse Hessian using techniques proposed by Koh & Liang [33]. While this approximation weakens the theoretical guarantees of the unlearning process, it enables applying second-order updates to a variety of complex learning models, similar to the first-order strategy.
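One way to sketch such an approximation is a truncated Neumann series evaluated purely with Hessian-vector products, as detailed in the following. The routine below is illustrative: it assumes a user-supplied `hvp` function as well as damping and scaling constants chosen so that the series converges.

```python
import numpy as np

def inverse_hvp(hvp, v, n_iter=200, scale=10.0, damping=0.01):
    """Approximate H^{-1} v with the truncated series H^{-1} = sum_i (I - H)^i,
    evaluated with Hessian-vector products only. `hvp(u)` must return H u;
    damping and down-scaling keep the eigenvalues of the effective
    Hessian inside (0, 1) so that the recursion converges."""
    h = v.copy()
    for _ in range(n_iter):
        # recursion: h_j = v + (I - (H + damping*I)/scale) h_{j-1}
        h = v + h - (hvp(h) + damping * h) / scale
    return h / scale      # undo the scaling of the Hessian
```

Because only products Hu are ever formed, the p × p matrix never needs to be stored, which is what makes the scheme applicable to large networks.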
To apply second-order updates in practice, we have to avoid storing H⁻¹ explicitly while still being able to compute products of the form H⁻¹v. To this end, we rely on the scheme proposed by Agarwal et al. [2] for computing such expressions, which only requires calculating Hv and does not need to store H. Hessian-vector products (HVPs) allow us to calculate Hv efficiently by making use of the linearity of the gradient,

Hv = ∇²_θ L(θ*; D) v = ∇_θ ( ∇_θ L(θ*; D)ᵀ v ).

Denoting the first j terms of the Taylor expansion of H⁻¹ by H⁻¹_j = Σ_{i=0}^{j} (I − H)ⁱ, we have H⁻¹_j = I + (I − H) H⁻¹_{j−1}, and can recursively define an approximation of H⁻¹. If 0 < λᵢ < 1 for all eigenvalues λᵢ of H, we have H⁻¹_j → H⁻¹ for j → ∞. To ensure this convergence, we add a small damping term to the diagonal of H and scale down the loss function (and thereby the eigenvalues) by some constant, which does not change the optimal parameters θ*. Under these assumptions, we can formulate the following algorithm for computing an approximation of H⁻¹v: Given data points z₁, …, z_j sampled from D, we define the iterative updates

H̃⁻¹₀ v = v,    H̃⁻¹ᵢ v = v + ( I − ∇²_θ ℓ(zᵢ; θ*) ) H̃⁻¹_{i−1} v.

In each update step, H is estimated using a single data point, and we can use HVPs to evaluate the product efficiently in O(p), as demonstrated by Pearlmutter [43]. Using batches of data points instead of single ones and averaging the results further speeds up the approximation. Choosing j large enough so that the updates converge and averaging multiple runs to reduce the variance of the results, we obtain our final estimate of H⁻¹v. In Section 6.2, we demonstrate that this strategy can be used to calculate the second-order updates for a deep neural network with millions of parameters.
5 Certified Unlearning
Machine unlearning is a delicate task, as it aims at reliably removing privacy issues and sensitive data from learning models. Ideally, this task should build on theoretical guarantees to enable certified unlearning, where the corrected model is stochastically indistinguishable from one created by retraining. In the following, we derive conditions under which the updates of our approach introduced in Section 4 provide certified unlearning. To this end, we build on the concepts of differential privacy [22, 18] and certified data removal [28], and adapt them to the unlearning problem.
Let us first briefly recall the idea of differential privacy in machine learning: For a training dataset D, let A be a learning algorithm that outputs a model θ ∈ Θ after training on D, that is, θ = A(D). Randomness in A induces a probability distribution over the output models in Θ. The key idea of differential privacy is a measure of the difference between a model trained on D and another one trained on a neighboring dataset D′ that differs in one data point.
Definition 1.
Given some ε > 0, a learning algorithm A is said to be ε-differentially private (ε-DP) if

e^{−ε} ≤ P(A(D) = θ) / P(A(D′) = θ) ≤ e^{ε}

holds for all θ ∈ Θ and all neighboring datasets D and D′.
Thus, for an ε-DP learning algorithm, the difference between the log-likelihood of a model trained on D and one trained on D′ is smaller than ε for all possible models, datasets, and data points. Based on this definition, we can introduce the concept of certified unlearning. In particular, we consider an unlearning method U that maps a model A(D) to a corrected model θ_U = U(A(D), D, D̃), where D̃ denotes the dataset containing the perturbations required for the unlearning task.
Definition 2.
Given some ε > 0 and a learning algorithm A, an unlearning method U is ε-certified if

e^{−ε} ≤ P( U(A(D), D, D̃) = θ ) / P( A(D̃) = θ ) ≤ e^{ε}

holds for all θ ∈ Θ and all datasets D and D̃.
This definition ensures that the probability of obtaining a model using the unlearning method U and that of training a new model on D̃ from scratch deviate by at most a factor of e^{ε}. Similar to certified data removal [28], we introduce (ε, δ)-certified unlearning, a relaxed version of ε-certified unlearning, defined as follows.
Definition 3.
Under the assumptions of Definition 2, an unlearning method U is (ε, δ)-certified if

P( U(A(D), D, D̃) = θ ) ≤ e^{ε} P( A(D̃) = θ ) + δ   and   P( A(D̃) = θ ) ≤ e^{ε} P( U(A(D), D, D̃) = θ ) + δ

hold for all θ ∈ Θ and all datasets D and D̃.
That is, (ε, δ)-certified unlearning allows the method to slightly violate the conditions from Definition 2 by a constant δ. Using the above definitions, it becomes possible to derive conditions under which certified unlearning is possible for both of our approximate update strategies.
5.1 Certified Unlearning of Features and Labels
Based on the concept of certified unlearning, we analyze our approach and its theoretical guarantees on removing features and labels. To ease this analysis, we make two assumptions on the employed learning algorithm: First, we assume that the loss ℓ is twice differentiable and strictly convex, such that the inverse Hessian always exists. Second, we consider L2 regularization in optimization problem (1), that is, Ω(θ) = ½‖θ‖², which ensures that the loss function is strongly convex.
A powerful concept for analyzing unlearning is the gradient residual r = ∇_θ L(θ; D̃) for a given model θ and a corrected dataset D̃. For strongly convex loss functions, the gradient residual is zero if and only if θ equals the minimizer of the loss on D̃, since in this case the optimum is unique. Therefore, the norm of the gradient residual reflects the distance of a model from one obtained by retraining on the corrected dataset D̃. While a small value of this norm is not sufficient to judge the quality of unlearning, we can develop upper bounds that allow us to prove properties related to differential privacy [18, 28]. Consequently, we derive bounds on the gradient-residual norm of our two update strategies. The corresponding proofs are given in the appendix (Proofs for Certified Unlearning).
Theorem 1.
If all perturbations lie within a radius R, that is, ‖z̃ − z‖ ≤ R for every pair of original and perturbed points, and the loss ℓ is Lipschitz-continuous with respect to x and θ, the following upper bounds on the gradient-residual norm hold:

If the unlearning rate τ is sufficiently small, an explicit upper bound holds for the first-order update of our approach.

If the Hessian of ℓ is additionally Lipschitz-continuous with respect to θ, an explicit upper bound holds for the second-order update of our approach.
This theorem enables us to bound the gradient-residual norm of both update steps. We leverage these bounds to limit the difference between unlearning and retraining from scratch. In particular, we follow the approach of Chaudhuri et al. [18] and add a random linear term to the loss function to shape the distribution of the model parameters. Given a vector b drawn from a random distribution, we define the perturbed loss

L_b(θ; D) = L(θ; D) + bᵀθ,

with a corresponding gradient residual given by

r_b = ∇_θ L(θ; D̃) + b.

By definition, the gradient residual of L_b differs from that of the original loss L only by the added vector b, which allows us to precisely determine its influence on the bounds of Theorem 1 depending on the underlying distribution of b.
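The construction of the perturbed loss takes only a few lines. In the sketch below, the Gaussian choice for b is one instantiation suggested later by Theorem 3, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbed_loss(loss_fn, b):
    """Wrap a loss with the random linear term b^T theta. The gradient
    residual of the wrapped loss differs from the original residual
    exactly by the constant vector b."""
    def loss_b(theta, *args):
        return loss_fn(theta, *args) + b @ theta
    return loss_b

# one assumed instantiation: Gaussian noise with a standard deviation
# that would be chosen according to Theorem 3
p = 3
b = rng.normal(loc=0.0, scale=0.1, size=p)
quad = lambda theta: 0.5 * theta @ theta      # placeholder loss
noisy = perturbed_loss(quad, b)
```

Since bᵀθ is linear, neither the Hessian nor the curvature assumptions of the analysis are affected by the added term.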
Let θ_b be an exact minimizer of L_b on D̃ with density function f and θ_U an approximate minimum obtained through unlearning with density f_U. Guo et al. [28] show that the max-divergence between f and f_U for the model produced by U can be bounded using the following theorem.
Theorem 2 (Guo et al. [28]).
Let U be an unlearning method with a gradient residual r satisfying ‖r‖ ≤ ε′. If b is drawn from a probability distribution with density p such that for any b₁, b₂ with ‖b₁ − b₂‖ ≤ ε′ there exists an ε with p(b₁) ≤ e^{ε} p(b₂), then

e^{−ε} ≤ f_U(θ) / f(θ) ≤ e^{ε}

for any model θ produced by the unlearning method U.
Theorem 2 equips us with a way to prove the certified unlearning property from Definition 2. Using the gradient residual bounds derived in Theorem 1, we can adjust the density function of in such a way that Theorem 2 applies for both removal strategies using the approach presented by Chaudhuri et al. [18] for differentially private learning strategies.
Theorem 3.
Let A be a learning algorithm that returns the unique minimum of the perturbed loss L_b, and let U be an unlearning method that produces a model with gradient residual r. If ‖r‖ ≤ ε′ for some ε′ > 0, we have the following guarantees:

If b is drawn from a distribution with density p(b) ∝ e^{−(ε/ε′)‖b‖}, then the method U performs ε-certified unlearning for A.

If b is drawn from a Gaussian distribution with standard deviation c·ε′/ε for some c > 0, then the method U performs (ε, δ)-certified unlearning for A with δ = 1.5·e^{−c²/2}.
Theorem 3 allows us to establish certified unlearning of features and labels in practice: Given a learning model trained with noise coming from b, our approach is certified as long as the gradient-residual norm—which can be bounded using Theorem 1—remains smaller than a constant depending on ε, δ, and the parameters of the distribution of b.
Table 1: Overview of the unlearning scenarios.

Dataset | Model | Points | Features | Parameters | Classes | Replacement | Certified
Enron   | LR    | –      | –        | –          | –       | –           | ✓
Alice   | LSTM  | –      | –        | –          | –       | –           | ✗
6 Empirical Analysis
We proceed with an empirical analysis of our approach and its capabilities. For this analysis, we examine the efficacy of unlearning in practical scenarios and compare our method to other strategies for removing data from learning models, such as retraining and fine-tuning. As part of these experiments, we employ models with convex and non-convex loss functions to understand how this property affects the success of unlearning. Overall, our goal is to investigate the strengths and potential limitations of our approach when unlearning features and labels in practice, and to examine the theoretical bounds derived in Section 5.1.
Unlearning scenarios
Our empirical analysis is based on the following two scenarios in which sensitive information must be removed from a learning model. The scenarios involve common privacy and security issues in machine learning, with each scenario focusing on a different issue, learning task, and model. Table 1 provides an overview of these scenarios for which we present more details on the experimental setup in the following sections.
Scenario 1: Sensitive features. Our first scenario deals with machine learning for spam filtering. Content-based spam filters are typically constructed using a bag-of-words model [4, 51]. These models are extracted directly from the email content, so that sensitive words and personal names in the emails unavoidably become features of the learning model. These features pose a severe privacy risk when the spam filter is shared, for example in an enterprise environment, as they can reveal the identities of individuals in the training data, similar to a membership inference attack [45]. We evaluate unlearning as a means to remove these features (Section 6.1).
Scenario 2: Unintended memorization. In the second scenario, we consider the problem of unintended memorization [12]. Generative language models based on recurrent neural networks are a powerful tool for completing and generating text. However, these models can memorize sequences that appear rarely in the training data, including credit card numbers or private messages. This memorization poses a privacy problem: Through specifically crafted input sequences, an attacker can extract this sensitive data from the models during text completion [13, 12]. We apply unlearning of features and labels to remove identified leaks from language models (Section 6.2).
Performance measures
Unlike other problems in machine learning, the performance of unlearning does not depend on a single numerical measure. For example, one method may only partially remove data from a learning model, whereas another may be successful but degrades the prediction performance of the model. Consequently, we identify three factors that contribute to effective unlearning and provide performance measures for our empirical analysis.
1. Efficacy of unlearning. The most important factor for successful unlearning is the removal of data. While certified unlearning, as presented in Section 5, theoretically ensures this removal, we cannot provide similar guarantees for learning models with nonconvex loss functions. As a result, we need to employ measures that quantitatively assess the efficacy of unlearning. In particular, we use the exposure metric [12] to measure the memorization strength of specific sequences in language generation models after unlearning.
2. Fidelity of unlearning. The second factor contributing to the success of unlearning is the performance of the corrected model. An unlearning method is of practical use only if it preserves the capabilities of the learning model as much as possible. Hence, we consider the fidelity of the corrected model as a performance measure. In our experiments, we use the accuracy of the original model and the corrected model on a holdout set as a measure for the fidelity.
3. Efficiency of unlearning. If the training data used to generate a model is still available, a simple but effective unlearning strategy is retraining from scratch. This strategy, however, involves significant runtime and storage costs. Therefore, we also consider the efficiency of unlearning as a relevant factor. In our experiments, we measure the runtime and the number of gradient calculations for each unlearning method, and relate them to retraining since gradient computations are the most costly part in our update strategies and modern optimization algorithms for machine learning models.
Baseline methods
To compare our approach with related strategies for data removal, we employ different baseline methods as references for examining the efficacy, fidelity, and efficiency of unlearning.
Retraining. As the first baseline method, we employ retraining from scratch. This method is applicable if the original training data is available and guarantees proper removal of data. The unlearning method by Bourtoule et al. [8] does not provide advantages over this baseline when too many shards are affected by data changes. As shown in Section 2 and detailed in Stochastic Sharding Analysis, this effect already occurs for relatively small sets of data points, and thus we do not explicitly consider sharding in our empirical analysis.
Fine-tuning. As a second method for comparison, we make use of naive fine-tuning. Instead of starting over, this strategy simply continues to train a model using the corrected data. This is especially helpful for neural networks, where the new optimal parameters are close to the original ones, so that many optimization steps at the beginning of training can be saved. In particular, we implement this fine-tuning by performing stochastic gradient descent over the training data for one epoch. This naive unlearning strategy serves as a middle ground between costly retraining and specialized methods, such as our approach.
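As a hedged illustration, one epoch of such fine-tuning for a logistic regression model might look as follows; the toy data, learning rate, and regularization strength are hypothetical stand-ins, not the setup used in the experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_one_epoch(w, X, y, lr=0.1, lam=1e-3, seed=0):
    """One epoch of SGD over the corrected data, starting from the trained weights."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(X)):       # visit every corrected sample once
        p = sigmoid(X[i] @ w)               # predicted probability of the positive class
        grad = (p - y[i]) * X[i] + lam * w  # logistic loss gradient plus L2 term
        w = w - lr * grad
    return w

# toy corrected dataset: two linearly separable clusters
X = np.array([[2.0, 0.1], [1.5, -0.2], [-2.0, 0.3], [-1.7, -0.1]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = finetune_one_epoch(np.zeros(2), X, y)
```

Starting from the already trained weights rather than from scratch is what saves the early optimization steps.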
Occlusion. For linear classifiers, there exists a one-to-one mapping between features and weights. In this case, one can naively unlearn features by replacing them with zero whenever they occur or, equivalently, by setting the corresponding weights to zero. This method ignores the shift in the data distribution incurred by the missing features but is very efficient, as it requires no training or update steps. Although easy to implement, occlusion can lead to problems if the removed features have a significant impact on the model.
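For a linear model, this baseline amounts to a single operation on the weight vector; the sketch below uses made-up weights and feature indices:

```python
import numpy as np

def occlude(weights, removed):
    """Occlusion baseline: zero out the weights of the features to unlearn.
    For a linear classifier this is equivalent to always zeroing the features."""
    w = weights.copy()
    w[list(removed)] = 0.0
    return w

w = np.array([0.8, -1.2, 0.4, 2.0])       # hypothetical trained weights
w_occ = occlude(w, removed={1, 3})        # unlearn features 1 and 3
x = np.array([1.0, 5.0, 1.0, 5.0])
score = x @ w_occ                         # depends only on the remaining features
```

The missing adaptation to the resulting distribution shift is exactly what the update-based methods account for and occlusion ignores.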
6.1 Unlearning Sensitive Names
In our first unlearning scenario, we remove sensitive features from a content-based spam filter. As a basis for this filter, we use the Enron dataset [39], which includes emails labeled as spam or non-spam. We divide the dataset into a training and a test partition. To create a feature space for learning, we extract the words contained in each email using whitespace delimiters and obtain a bag-of-words model with features weighted by the term frequency-inverse document frequency (TF-IDF) metric. We normalize the feature vectors and learn a logistic regression classifier on the training set to use it for spam filtering. Logistic regression is commonly used for similar learning tasks and employs a strictly convex and twice-differentiable loss function. This ensures that a single optimal parameter vector exists, which can be obtained via retraining and used as an optimal baseline. Moreover, the Hessian matrix has a closed form and can be stored in memory, allowing an exact second-order update for evaluation.
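The feature extraction described above can be sketched in a few lines of numpy; the whitespace tokenization and the unsmoothed IDF variant below are simplifications, not necessarily the exact preprocessing used in the experiments:

```python
import numpy as np

def tfidf_features(docs):
    """Bag-of-words model with TF-IDF weighting and L2-normalized rows."""
    vocab = sorted({w for d in docs for w in d.split()})   # whitespace tokens
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            tf[i, index[w]] += 1.0                         # term frequency
    df = (tf > 0).sum(axis=0)                              # document frequency
    X = tf * np.log(len(docs) / df)                        # apply inverse document frequency
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                                # guard degenerate rows
    return X / norms, vocab

X, vocab = tfidf_features(["win money now", "meeting at noon", "win win lottery"])
```

Note that every word, including names and numbers, becomes a dimension of the feature space, which is precisely how sensitive terms enter the model.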
Sensitive features
To gain insights into the relevant features of the classifier, we employ a simple gradient-based explanation method [50]. While we observe several reasonable words with high weights in the model, we also discover features that contain sensitive information. For example, we identify several features corresponding to first and last names of email recipients. Using a list of common names, we find numerous surnames and forenames present in the entire dataset. Similarly, we find features corresponding to phone numbers and zip codes related to the company Enron.
Although these features may not appear to be a significant privacy violation at first glance, they lead to multiple problems: First, if the spam filter is shared as part of a network service, the model may reveal the identity of individuals in the training data. Second, these features likely represent artifacts and thus bias spam filtering for specific individuals, for example, those having similar names or postal zip codes. Third, if the features are relevant for the non-spam class, an adversary might craft inputs that evade the classifier. Consequently, there is a need to resolve this issue and correct the learning model.
Unlearning task
We address the problem using feature unlearning, that is, we apply our approach and the baseline methods to revoke the identified features from the classification model. Technically, we benefit from the convex loss function of the logistic regression, which allows us to apply certified unlearning as presented in Section 5. Specifically, it is easy to see that Theorem 3 holds, since the gradients of the logistic regression loss are bounded and thus Lipschitz-continuous. For a detailed discussion of the Lipschitz constants, we refer the reader to the paper by Chaudhuri et al. [18].
Efficacy evaluation
Theorem 3 equips us with a certified unlearning strategy via a privacy budget that must not be exceeded by the gradient residual norm of the parameter update. Concretely, for given privacy parameters and noise on the weights sampled from a Gaussian distribution, the gradient residual must be smaller than the bound given in Equation (9).
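The quantity being bounded here can be computed directly: for L2-regularized logistic regression, the gradient residual is the gradient of the loss on the corrected data, evaluated at the updated parameters, and it is zero exactly for the retrained optimum. The following numpy sketch is an illustrative check on toy data, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, X, y, lam):
    """Gradient of the L2-regularized logistic loss over a dataset."""
    return X.T @ (sigmoid(X @ w) - y) + lam * len(X) * w

def gradient_residual_norm(w, X, y, lam):
    """Norm of the loss gradient at w; zero iff w minimizes the loss on (X, y)."""
    return np.linalg.norm(grad_loss(w, X, y, lam))

# toy corrected dataset and an (approximately) retrained model
X = np.array([[1.0, 2.0], [1.0, -1.0], [-1.0, 1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0])
lam = 0.1
w = np.zeros(2)
for _ in range(5000):                     # gradient descent stands in for retraining
    w -= 0.1 * grad_loss(w, X, y, lam) / len(X)
residual = gradient_residual_norm(w, X, y, lam)   # close to zero at the optimum
```

An approximate update step leaves a nonzero residual, and it is exactly this norm that the certification bound constrains.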
Table 2 shows the effect of the regularization strength and the noise variance on the classification performance on the test dataset. As expected, the privacy budget is clearly affected by the variance: a large variance impacts the classification performance, whereas the impact of the regularization is small.
0.001  
0.01  
0.1  
1 
To evaluate the gradient residual norm further, we fix the regularization strength and the variance, and remove random combinations of the most important names from the dataset using our approaches. The distribution of the gradient residual norm after unlearning is presented in Fig. 2. We can observe that the residual rises with the number of names to be removed, since more data is affected by the update steps. The second-order update achieves extremely small gradient residuals with small variance, while both the first-order update and naive feature removal produce higher residuals with more outliers. Since retraining always produces gradient residuals of zero, the second-order approach is best suited for unlearning in this scenario. Notice that by Equation (9), the bound for the gradient residual depends linearly on the noise parameter for fixed privacy parameters. Therefore, the second-order update also permits smaller noise for a given feature combination to unlearn or, vice versa, allows unlearning many more features for a given noise level.
Fidelity evaluation
To evaluate the fidelity in a broad sense, we follow the approach of Koh & Liang [33] and first compare the loss on the test data after the unlearning task. Fig. 3 shows the difference in test loss between retraining and unlearning when randomly removing feature combinations of two different sizes (left and right) from the dataset. Both the first-order and the second-order method approximate the retraining very well in terms of Pearson's correlation, even if many features are removed. Simply setting the affected weights to zero, however, cannot adapt to the distribution shift and leads to larger deviations in the test loss.
In addition to the loss, we also evaluate the accuracy of the spam filter for the different unlearning methods on the test data in Table 3. Since the accuracy is less sensitive to small model changes, we restrict ourselves to the most important features and choose a small regularization strength such that single features can become important. The number of affected samples in this experiment rises quickly with the number of deleted features. The large fraction of affected training data stresses that instance-based methods are not suited to repair these privacy leaks, as they would shrink the available data noticeably.
As a first observation, we note that removing the sensitive features leads to a slight drop in performance for all methods, especially when more features are removed. On this small scale, in turn, the second-order method provides the best results and is closest to a model trained from scratch. This evaluation shows that single features can have a significant impact on the classification performance and that unlearning can be necessary if the application of the model requires a high level of accuracy.
Removed features  
Original model  
Retraining  
Occlusion  
Unlearning (1st)  
Unlearning (2nd) 
Efficiency evaluation
We use the previous experiment to measure the efficiency of deleting features and present the results in Table 4. In particular, we use the L-BFGS algorithm [37] for optimizing the logistic regression loss. Due to the linear structure of the learning model and the convexity of the loss, the runtime of all methods is remarkably low. We find that the first-order update is significantly faster than the other approaches. This difference in performance results from the underlying optimization problem: While the other approaches operate on the entire dataset, the first-order update considers only the corrected points and thus enables a considerable speedup. For the second-order update, the majority of runtime and gradient computations is spent on the computation and inversion of the Hessian matrix. If, however, the number of parameters is small, it is possible to precompute and store the inverse Hessian on the training data, such that the second-order update comes down to a matrix-vector multiplication and becomes faster than retraining.
Unlearning methods  Gradients pt  Runtime pt  Speedup 
Retraining  s  —  
Unlearning (1st)  s  
Unlearning (2nd)  s  
Hessian stored  s 
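The stored-Hessian variant described above can be sketched as follows. The update form theta <- theta* + H^-1 (g_original - g_corrected) follows the usual influence-function formulation; details such as the dataset on which the Hessian is evaluated may differ from the exact method in the paper, and all names and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_grad(w, X, y, lam):
    return X.T @ (sigmoid(X @ w) - y) + lam * len(X) * w

def hessian(w, X, lam):
    """Closed-form Hessian of the L2-regularized logistic loss."""
    p = sigmoid(X @ w)
    S = p * (1.0 - p)                                   # per-sample curvature
    return X.T @ (S[:, None] * X) + lam * len(X) * np.eye(X.shape[1])

# toy data; feature 2 plays the role of the "sensitive" feature to unlearn
X = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, 0.8], [-1.0, 1.5, 0.0], [-2.0, -0.5, 0.1]])
y = np.array([1.0, 1.0, 0.0, 0.0])
lam = 0.1
w_star = np.zeros(3)
for _ in range(5000):                                   # original training
    w_star -= 0.1 * full_grad(w_star, X, y, lam) / len(X)

X_corr = X.copy()
X_corr[:, 2] = 0.0                                      # occlude the feature in the data

# invert the Hessian once (here evaluated on the corrected data); afterwards
# each unlearning request is a single matrix-vector product
H_inv = np.linalg.inv(hessian(w_star, X_corr, lam))
g_diff = full_grad(w_star, X, y, lam) - full_grad(w_star, X_corr, y, lam)
w_unlearned = w_star + H_inv @ g_diff
```

The updated parameters land close to the model one would obtain by retraining on the corrected data, at a fraction of the cost.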
6.2 Unlearning Unintended Memorization
In the second scenario, we remove unintended memorization artifacts from a generative language model. Carlini et al. [12] show that these models can memorize rare inputs in the training data and exactly reproduce them during application. If this data contains private information like credit card numbers or telephone numbers, this may become a severe privacy issue [13, 53]. In the following, we use our approach to tackle this problem and demonstrate that unlearning is also possible with non-convex loss functions.
Canary insertion
We conduct our experiments using the novel Alice in Wonderland as training set and train an LSTM network at the character level to generate text [38]. Specifically, we train an embedding for the characters and use two layers of LSTM units followed by a dense layer, resulting in a model with millions of parameters. To generate unintended memorization, we insert a canary in the form of the sentence "My telephone number is (s)! said Alice" into the training data, where (s) is a sequence of digits of varying length [12]. In our experiments, we use numbers of different lengths and repeat the canary so that a fixed number of data points is affected. After optimizing the categorical cross-entropy loss of the model on this data, we find that the inserted phone numbers are the most likely prediction when we ask the language model to complete the canary sentence, indicating exploitable memorization.
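Inserting such a canary into a text corpus is a simple preprocessing step; the sketch below uses an invented secret and random insertion positions for illustration:

```python
import random

CANARY = "My telephone number is {}! said Alice"

def insert_canary(text, secret, repeats, seed=0):
    """Insert the canary sentence at random positions of the training text."""
    rng = random.Random(seed)
    sentences = text.split(". ")
    for _ in range(repeats):
        sentences.insert(rng.randrange(len(sentences) + 1), CANARY.format(secret))
    return ". ".join(sentences)

corpus = "Alice was beginning to get very tired. So she was considering in her own mind"
poisoned = insert_canary(corpus, "0123456789", repeats=3)
```

Training on the poisoned corpus then lets us measure how strongly the secret is memorized.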
Exposure metric
In contrast to the previous scenario, the loss of the generative language model is non-convex and thus certified unlearning is not applicable. A simple comparison to a retrained model is also difficult, since the optimization procedure is non-deterministic and might get stuck in local minima of the loss function. Consequently, we require an additional measure to assess the efficacy of unlearning in this experiment and make sure that the inserted telephone numbers have been effectively removed. To this end, we use the exposure metric introduced by Carlini et al. [12],

exposure(s) = log2 |Q| - log2 rank(s),

where s is a sequence and Q is the set containing all sequences of identical length given a fixed alphabet. The function rank returns the rank of s with respect to the model and all other sequences in Q. The rank is calculated using the log-perplexity of the sequence and states how many other sequences are more likely, i.e., have a lower log-perplexity.
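For small candidate sets, the exposure can be computed exactly by brute force: it is the log2 size of the candidate set minus the log2 rank of the sequence, with rank 1 being the most likely. The per-position digit model below is purely hypothetical and only serves to make the computation concrete:

```python
import itertools
import math

def log_perplexity(seq, probs):
    """Log-perplexity of a digit sequence under a per-position digit model."""
    return -sum(math.log2(probs[pos][ch]) for pos, ch in enumerate(seq))

def exposure(secret, probs):
    """Exposure = log2 |Q| - log2 rank(secret), rank computed by brute force."""
    n = len(secret)
    candidates = ["".join(t) for t in itertools.product("0123456789", repeat=n)]
    lp_secret = log_perplexity(secret, probs)
    rank = sum(1 for q in candidates if log_perplexity(q, probs) <= lp_secret)
    return math.log2(len(candidates)) - math.log2(rank)

# a toy model that has strongly "memorized" the sequence 42
probs = [{d: (0.91 if d == "4" else 0.01) for d in "0123456789"},
         {d: (0.91 if d == "2" else 0.01) for d in "0123456789"}]
```

For realistic sequence lengths the candidate set is intractable, which motivates the sampling-based approximation discussed next.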
As an example, Fig. 4 shows the perplexity distribution of our model when a telephone number has been inserted during training. The histogram is created using a subset of the total possible sequences in the candidate set. The perplexity of the inserted number differs significantly from all other number combinations, indicating that it has been strongly memorized by the underlying language model.
The exact computation of the exposure metric is expensive, as it requires operating over the entire candidate set. Note, however, that the empirical perplexity distribution in Fig. 4 closely follows a skew normal distribution, even though the evaluated number of sequences is small compared to the size of the candidate set. Therefore, we can use the approximation proposed by Carlini et al. [12], which determines the exposure of a given perplexity value using the cumulative distribution function of the fitted skew-normal density.
Unlearning task
To unlearn the memorized sequences, we replace each digit of the phone number in the data with a different character, such as a random or constant value. Empirically, we find that using text substitutions from the training corpus works best for this task: the model has already captured these character dependencies, resulting in a small update of the model parameters. However, due to the way generative language models are trained, the update is more involved than in the previous scenario. The model is trained to predict a character from its preceding characters. Thus, replacing a text means changing both the features (the preceding characters) and the labels (the target characters). Therefore, we combine both changes in a single set of perturbations in this setting.
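To make the combined change of features and labels concrete: for a character-level model with context window k, replacing a substring alters every (context, target) pair that touches the replaced region. The helper below is an illustrative sketch with invented names, assuming the replacement has the same length as the original:

```python
def training_pairs(text, k):
    """(context, target) pairs for a character-level model with window k."""
    return [(text[i - k:i], text[i]) for i in range(k, len(text))]

def perturbations(text, old, new, k):
    """Return the affected pairs Z (original) and Z~ (corrected) after
    replacing substring `old` by the same-length string `new`."""
    corrected = text.replace(old, new)
    before, after = training_pairs(text, k), training_pairs(corrected, k)
    z = [p for p in before if p not in after]         # pairs to unlearn
    z_tilde = [p for p in after if p not in before]   # their replacements
    return z, z_tilde

text = "my number is 1234!"
z, z_tilde = perturbations(text, "1234", "held", k=3)
```

In a real corpus, affected pairs would be tracked by position rather than by value; the value-based list difference here is only for illustration.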
Efficacy evaluation
First, we check whether the memorized telephone numbers have been successfully unlearned from the generative language model. An important result of the study by Carlini et al. [12] is that the exposure is associated with an extraction attack: a sequence whose exposure is sufficiently small relative to the size of the candidate set cannot be extracted. For unlearning, we test three different replacement sequences for each telephone number and use the best for our evaluation. Table 5 shows the results of this experiment.
Length of number  5  10  15  20  
Original model  
Retraining  
Finetuning  
Unlearning (1st)  
Unlearning (2nd) 
We observe that our first-order and second-order updates yield exposure values close to zero, rendering an extraction impossible. In contrast, fine-tuning leaves a large exposure in the model, making a successful extraction very likely. On closer inspection, we find that the performance of fine-tuning depends on the order of the training data during the gradient updates, resulting in a high standard deviation across the experimental runs. This problem cannot be easily mitigated by training for further epochs and thus highlights the need for dedicated unlearning techniques. The fact that the simple first-order update can eradicate the memorization completely also shows that unintended memorization is present only on a local scale of the model.
Replacement  Canary Sentence completion 
taken  mad!’ ‘prizes! said the lory confuse 
not there␣  it,’ said alice. ‘that’s the beginning 
under the mouse  the book!’ she thought to herself ‘the 
the capital of paris  it all about a gryphon all the three of 
Throughout our experiments, we also find that the replacement string plays a major role in the unlearning process for language generation models. In Fig. 4, we report the log-perplexity of the canary for three different replacement strings after unlearning.¹ Each replacement shifts the canary far to the right and turns it into a very unlikely prediction with low exposure values. While we use the replacement with the lowest exposure in our experiments, the other substitution sequences would also impede a successful extraction.

¹Strictly speaking, each replacement induces its own perplexity distribution, but we find the difference to be marginal and thus place all values in the same histogram for the sake of clarity.
It remains to answer the question of what the model actually predicts for the canary sequence after the unlearning step. Table 6 shows different completions of the inserted canary sentence produced by the second-order update for replacement strings of different lengths. Apparently, the predicted string is not equal to the replacement, that is, the unlearning does not push the model completely into the parameter set matching the replacement. In addition, we note that the sentences do not seem random; they follow the structure of the English language and still reflect the wording of the novel.
Fidelity evaluation
To evaluate the fidelity of the unlearning strategies, we examine the performance of the model in terms of accuracy. Table 7 shows the accuracy after unlearning for different numbers of affected data points. For small sets of affected points, our approach yields results comparable to retraining from scratch. No statistically significant difference can be observed in this setting, even when comparing sentences produced by the models. However, the accuracy of the corrected model decreases as the number of points becomes larger, because the assumption of an infinitesimal change is violated. Here, the second-order method is better able to handle larger changes, because the Hessian contains information about the unchanged samples. The first-order approach focuses only on the samples to be fixed and thus increasingly reduces the accuracy of the corrected model. Again, we find that the replacement string plays an important role for the fidelity, especially when more samples are affected, which is reflected in the high standard deviation observed in this case. Depending on the task, the replacement string can thus be seen as a hyperparameter of the unlearning approach that has to be tuned.
Affected samples  
Original model  
Retraining  
Finetuning  
Unlearning (1st)  
Unlearning (2nd) 
Efficiency evaluation
Finally, we examine the efficiency of the different unlearning methods in this scenario. At the time of writing, the CUDA library version 10.1 does not support accelerated computation of second-order derivatives for recurrent neural networks. Therefore, we report the CPU computation time (Intel Xeon Gold 6226) for the second-order update method of our approach, while the other methods are computed on a GPU (GeForce RTX 2080 Ti). The runtime and number of gradient computations required for each approach are presented in Table 8.
As expected, the time to retrain the model from scratch is extremely long, as the model and dataset are large. In comparison, one epoch of fine-tuning is faster but does not solve the unlearning task in terms of efficacy. The first-order method is the fastest approach and provides a speedup of three orders of magnitude relative to retraining. The second-order method still yields a considerable speedup over retraining, although the underlying implementation does not benefit from GPU acceleration. Given that the first-order update provides a high efficacy in unlearning and only a slight decrease in fidelity when correcting a limited number of points, it provides the overall best performance in this scenario.
Unlearning methods  Gradients  Runtime  Speedup 
Retraining  min  —  
Finetuning  s  
Unlearning (1st)  s  
Unlearning (2nd)  s 
7 Limitations
Removing data from a learning model in retrospection is a challenging endeavor. Although our unlearning approach successfully solves this task in our empirical analysis, it has limitations that are discussed in the following and need to be considered in practical applications.
Scalability of unlearning
As shown in our empirical analysis, the efficacy of unlearning decreases with the number of affected data points. While privacy leaks involving dozens of sensitive features and hundreds of affected points can be handled well with our approach, changing half of the training data likely exceeds its capabilities. Clearly, our work does not violate the no-free-lunch theorem [52], and unlearning based on closed-form updates cannot replace the large variety of learning strategies used in practice.
Still, our method provides a significant speedup compared to retraining and sharding in situations where a moderate number of data points need to be corrected. Consequently, it is a valuable unlearning method in practice and a countermeasure to mitigate privacy leaks when the entire training data is no longer available or retraining from scratch would not resolve the issue fast enough.
Nonconvex loss functions
Our approach can only guarantee certified unlearning for strongly convex loss functions with Lipschitz-continuous gradients. While both update steps of our approach work well for neural networks with non-convex loss functions, they require an additional measure to validate successful unlearning in practice. Fortunately, such external measures are often available, as they typically provide the basis for characterizing the data leakage prior to its removal. In our experiments, for instance, we use a metric proposed by Carlini et al. [12] for unintended memorization in generative language models. Furthermore, the active research field of Lipschitz-continuous neural networks [49, 23, 30] already provides promising models that may result in better unlearning guarantees in the near future.
Unlearning requires detection
Finally, we would like to point out that our unlearning method requires knowledge of the data to be removed from a model. Detecting privacy leaks in learning models is a hard problem that lies outside the scope of this work. First, the nature of privacy leaks depends on the type of data and the learning models being used. For example, the analysis of Carlini et al. [12, 13] focuses on generative learning models and cannot easily be transferred to non-sequential models. Second, privacy issues are usually context-dependent and difficult to formalize. The Enron dataset, which was released without proper anonymization, may contain other sensitive information not currently known to the public. The automatic discovery of such privacy issues is a research challenge in its own right.
8 Conclusion
Instance-based unlearning is concerned with removing data points from a learning model after training, a task that becomes essential when users demand the "right to be forgotten" under privacy regulations such as the GDPR. However, privacy-sensitive information is often spread across multiple instances, impacting larger portions of the training data. Instance-based unlearning is limited in this setting, as it depends on a small number of affected data points. As a remedy, we propose a novel framework for unlearning features and labels based on the concept of influence functions. Our approach captures the changes to a learning model in a closed-form update, providing significant speedups over other approaches.
We demonstrate the efficacy of our approach in a theoretical and empirical analysis. Based on the concept of differential privacy, we prove that our framework enables certified unlearning on models with a strongly convex loss function and evaluate the benefits of our unlearning strategy in empirical studies on spam classification and text generation. In particular, for generative language models, we are able to remove unintended memorization while preserving the functionality of the models. This result provides insights on the problem of memorized sequences and shows that memorization is not necessarily deeply embedded in the neural networks.
We hope that this work fosters further research that derives approaches for unlearning and sharpens theoretical bounds on privacy in machine learning.
Acknowledgements
The authors gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC 2092 CASA (390781972). Furthermore, we acknowledge funding from the German Federal Ministry of Education and Research (BMBF) under the projects IVAN (FKZ 16KIS1167) and BIFOLD (Berlin Institute for the Foundations of Learning and Data, ref. 01IS18025 A and ref. 01IS18037 A) as well as by the Ministerium für Wirtschaft, Arbeit und Wohnungsbau Baden-Wuerttemberg under the project PoisonIvy.
References
 Abadi et al. [2015] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Largescale machine learning on heterogeneous systems, 2015.
 Agarwal et al. [2017] N. Agarwal, B. Bullins, and E. Hazan. Second-order stochastic optimization for machine learning in linear time. Journal of Machine Learning Research (JMLR), pages 4148–4187, 2017.
 Aldaghri et al. [2020] N. Aldaghri, H. Mahdavifar, and A. Beirami. Coded machine unlearning. arxiv:2012.15721, 2020.
 Attenberg et al. [2009] J. Attenberg, K. Weinberger, A. Dasgupta, A. Smola, and M. Zinkevich. Collaborative email-spam filtering with the hashing trick. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2009.

 Barshan et al. [2020] E. Barshan, M. Brunet, and G. Dziugaite. Relatif: Identifying explanatory training examples via relative influence. In Proc. of International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
 Basu et al. [2020] S. Basu, X. You, and S. Feizi. On second-order group influence functions for black-box predictions. In Proc. of International Conference on Machine Learning (ICML), pages 715–724, 2020.

 Basu et al. [2021] S. Basu, P. Pope, and S. Feizi. Influence functions in deep learning are fragile. In International Conference on Learning Representations (ICLR), 2021.
 Bourtoule et al. [2021] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot. Machine unlearning. In Proc. of IEEE Symposium on Security and Privacy (S&P), 2021.
 Boyd & Vandenberghe [2004] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
 Brunet et al. [2019] M.E. Brunet, C. AlkalayHoulihan, A. Anderson, and R. Zemel. Understanding the origins of bias in word embeddings. In Proc. of International Conference on Machine Learning (ICML), 2019.
 Cao & Yang [2015] Y. Cao and J. Yang. Towards making systems forget with machine unlearning. In Proc. of IEEE Symposium on Security and Privacy (S&P), 2015.
 Carlini et al. [2019] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In Proc. of USENIX Security Symposium, pages 267–284, 2019.
 Carlini et al. [2021] N. Carlini, F. Tramèr, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, and A. Roberts. Extracting training data from large language models. In Proc. of USENIX Security Symposium, 2021.
 Cauwenberghs & Poggio [2000] G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS), pages 388–394, 2000.
 Cawley [2006] G. Cawley. Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In The 2006 IEEE International Joint Conference on Neural Network Proceedings, pages 1661–1668, 2006.
 Cawley & Talbot [2003] G. C. Cawley and N. L. Talbot. Efficient leave-one-out cross-validation of kernel fisher discriminant classifiers. Pattern Recognition, 36(11):2585–2592, 2003.
 Cawley & Talbot [2004] G. C. Cawley and N. L. Talbot. Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10):1467–1475, 2004.
 Chaudhuri et al. [2011] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, pages 1069–1109, 2011.
 Chen et al. [2020] H. Chen, S. Si, Y. Li, C. Chelba, S. Kumar, D. Boning, and C.J. Hsieh. Multistage influence function. In Advances in Neural Information Processing Systems (NeurIPS), pages 12732–12742, 2020.
 Cook & Weisberg [1982] R. D. Cook and S. Weisberg. Residuals and influence in regression. New York: Chapman and Hall, 1982.
 De Cristofaro [2021] E. De Cristofaro. A critical overview of privacy in machine learning. IEEE Security & Privacy Magazine, 19(4), 2021.
 Dwork [2006] C. Dwork. Differential privacy. In Automata, Languages and Programming, pages 1–12, 2006.
 Fazlyab et al. [2019] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. J. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NIPS), 2019.
 Ginart et al. [2019] A. Ginart, M. Y. Guan, G. Valiant, and J. Zou. Making AI forget you: Data deletion in machine learning. In Advances in Neural Information Processing Systems (NeurIPS), 2019.

 Golatkar et al. [2020] A. Golatkar, A. Achille, and S. Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
 Golatkar et al. [2021] A. Golatkar, A. Achille, A. Ravichandran, M. Polito, and S. Soatto. Mixed-privacy forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
 Graves [2013] A. Graves. Generating sequences with recurrent neural networks. Technical Report arXiv:1308.0850, Computing Research Repository (CoRR), 2013.
 Guo et al. [2020a] C. Guo, T. Goldstein, A. Y. Hannun, and L. van der Maaten. Certified data removal from machine learning models. In Proc. of International Conference on Machine Learning (ICML), pages 3822–3831, 2020a.
 Guo et al. [2020b] H. Guo, N. F. Rajani, P. Hase, M. Bansal, and C. Xiong. Fastif: Scalable influence functions for efficient model interpretation and debugging. arxiv:2012.15781, 2020b.
 Gouk et al. [2020] H. Gouk, E. Frank, B. Pfahringer, and M. J. Cree. Regularisation of neural networks by enforcing Lipschitz continuity. arXiv, 2020.
 Hampel [1974] F. Hampel. The influence curve and its role in robust estimation. In Journal of the American Statistical Association, 1974.
 Hassibi et al. [1994] B. Hassibi, D. Stork, and G. Wolff. Optimal brain surgeon: Extensions and performance comparisons. In Advances in Neural Information Processing Systems (NeurIPS), 1994.
 Koh & Liang [2017] P. W. Koh and P. Liang. Understanding blackbox predictions via influence functions. In Proc. of International Conference on Machine Learning (ICML), pages 1885–1894, 2017.
 Koh et al. [2019] P. W. Koh, K. Ang, H. H. K. Teo, and P. Liang. On the accuracy of influence functions for measuring group effects. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
 LeCun et al. [1990] Y. LeCun, J. Denker, and S. Solla. Optimal brain damage. In Advances in Neural Information Processing Systems (NeurIPS), 1990.
 Leino & Fredrikson [2020] K. Leino and M. Fredrikson. Stolen memories: Leveraging model memorization for calibrated white-box membership inference. In Proc. of the USENIX Security Symposium, 2020.
 Liu & Nocedal [1989] D. C. Liu and J. Nocedal. On the limited memory bfgs method for large scale optimization. Mathematical Programming, 45:503–528, 1989.
 Merity et al. [2018] S. Merity, N. S. Keskar, and R. Socher. An analysis of neural language modeling at multiple scales. arXiv:1803.08240, 2018.

Metsis et al. [2006] V. Metsis, G. Androutsopoulos, and G. Paliouras. Spam filtering with naive Bayes - which naive Bayes? In Proc. of Conference on Email and Anti-Spam (CEAS), 2006.
Neel et al. [2020] S. Neel, A. Roth, and S. Sharifi-Malvajerdi. Descent-to-delete: Gradient-based methods for machine unlearning. arXiv:2007.02923, 2020.
 Papernot et al. [2018] N. Papernot, P. McDaniel, A. Sinha, and M. P. Wellman. SoK: Security and privacy in machine learning. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), 2018.
 Paszke et al. [2019] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
 Pearlmutter [1994] B. A. Pearlmutter. Fast exact multiplication by the hessian. Neural Comput., 6(1):147–160, 1994.
 Rad & Maleki [2018] K. R. Rad and A. Maleki. A scalable estimate of the extra-sample prediction error via approximate leave-one-out. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82, 2018.
 Salem et al. [2019] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In Proc. of the Network and Distributed System Security Symposium (NDSS), 2019.
 Schulam & Saria [2019] P. Schulam and S. Saria. Can you trust this prediction? auditing pointwise reliability after learning. In Proc. of International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.
 Shokri et al. [2017] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In Proc. of the IEEE Symposium on Security and Privacy (S&P), pages 3–18, 2017.
 Sutskever et al. [2011] I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. In Proc. of International Conference on Machine Learning (ICML), 2011.
 Virmaux & Scaman [2018] A. Virmaux and K. Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
 Warnecke et al. [2020] A. Warnecke, D. Arp, C. Wressnegger, and K. Rieck. Evaluating explanation methods for deep learning in computer security. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), Sept. 2020.
 Weinberger et al. [2009] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. of the International Conference on Machine Learning (ICML), 2009.

Wolpert & Macready [1997] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
Zanella-Béguelin et al. [2020] S. Zanella-Béguelin, L. Wutschitz, S. Tople, V. Rühle, A. Paverd, O. Ohrimenko, B. Köpf, and M. Brockschmidt. Analyzing information leakage of updates to natural language models. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2020.
Appendix
Deriving the Update Steps
In the following, we derive the first-order and second-order update strategies used in the paper. For a deeper theoretical discussion of the employed techniques, we refer the reader to the book by Boyd & Vandenberghe [9].
Firstorder update
To derive the first-order update for our approach, let us first reconsider the optimization problem for the corrected learning model:
$\theta^*_{\epsilon} = \operatorname{argmin}_\theta \; L_B(\theta; \epsilon), \quad L_B(\theta; \epsilon) = L(\theta; D) + \epsilon \sum_{\tilde{z} \in \tilde{Z}} \ell(\tilde{z}; \theta) - \epsilon \sum_{z \in Z} \ell(z; \theta),$   (10)
where $L_B$ is the combined loss function that is minimized. If $\epsilon$ is small and $L_B$ is differentiable with respect to $\theta$, we can approximate $L_B$ using a first-order Taylor series at $\theta^*$:
$L_B(\theta; \epsilon) \approx L_B(\theta^*; \epsilon) + \nabla_\theta L_B(\theta^*; \epsilon)^\top (\theta - \theta^*).$   (11)
Since $\theta^*$ is a minimum of $L(\theta; D)$, we assume $\nabla_\theta L(\theta^*; D) = 0$. Plugging in the Taylor series approximation and using the condition that $\nabla_\theta L(\theta^*; D) = 0$, we arrive at
$L_B(\theta; \epsilon) - L_B(\theta^*; \epsilon) \approx \epsilon \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big)^\top (\theta - \theta^*).$
Since $L_B(\theta^*; \epsilon)$ is constant, we can focus on the dot product. For two vectors $a$ and $b$, the dot product can be written as $a^\top b = \lVert a \rVert \lVert b \rVert \cos\beta$, where $\cos\beta$ is the cosine of the angle between the vectors $a$ and $b$. The minimal value of the cosine is $-1$, which is achieved when $a$ points in the direction of $-b$; hence the approximation is minimized when $\theta - \theta^*$ points in the direction of $-\big(\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*)\big)$. This result indicates that the negative gradient of the changed loss terms is the optimal direction to move starting from $\theta^*$. The actual step size, however, is unknown and must be adjusted by a small constant $\tau$, yielding the update step defined in Section 4.1:
$\theta \leftarrow \theta^* - \tau \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big).$
Due to the linearity of the gradient in this step, the derivation is identical when multiple points are affected.
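The first-order update above can be illustrated numerically. The following is a minimal sketch for a logistic regression model; the helper names (`grad_loss`, `first_order_unlearn`), the synthetic data, and the chosen unlearning rate are our own illustration and not part of the paper's implementation:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_loss(theta, X, y):
    # Sum of per-sample logistic-loss gradients over the given points.
    return X.T @ (sigmoid(X @ theta) - y)

def first_order_unlearn(theta_star, Z_x, Z_y, Zt_x, Zt_y, tau):
    # theta <- theta* - tau * (gradient on replacements Z~ - gradient on affected Z)
    return theta_star - tau * (grad_loss(theta_star, Zt_x, Zt_y)
                               - grad_loss(theta_star, Z_x, Z_y))

# Toy demonstration on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
theta_true = rng.normal(size=5)
y = (X @ theta_true > 0).astype(float)

# Approximate theta* by plain gradient descent on the full dataset D.
theta_star = np.zeros(5)
for _ in range(500):
    theta_star -= 0.1 * grad_loss(theta_star, X, y) / len(X)

# Replace one affected point z by a sanitized version z~ and apply the update.
Z_x, Z_y = X[:1], y[:1]
Zt_x, Zt_y = Z_x + 0.1, Z_y
theta_fo = first_order_unlearn(theta_star, Z_x, Z_y, Zt_x, Zt_y, tau=0.05)
```

Note that the step only touches the gradients of the affected and replacement points, which is what makes it independent of the dataset size.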
Secondorder update
If we assume that the loss $\ell$ is twice differentiable and strictly convex, the inverse Hessian $H_{\theta^*}^{-1} = \big(\nabla^2_\theta L(\theta^*; D)\big)^{-1}$ exists and we can proceed to approximate changes to the learning model using the technique of Cook & Weisberg [20]. In particular, we can determine the optimality conditions for Eq. 10 directly by
$0 = \nabla_\theta L(\theta^*_\epsilon; D) + \epsilon \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*_\epsilon) - \epsilon \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*_\epsilon).$
If $\epsilon$ is sufficiently small, we can approximate these conditions using a first-order Taylor series at $\theta^*$. This approximation yields the solution:
$0 \approx \epsilon \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big) + \nabla_\theta L(\theta^*; D) + \big( \nabla^2_\theta L(\theta^*; D) + O(\epsilon) \big)(\theta^*_\epsilon - \theta^*).$
Since we know that $\nabla_\theta L(\theta^*; D) = 0$ by the optimality of $\theta^*$, we can rearrange this solution using the Hessian $H_{\theta^*}$ of the loss function, such that
$\theta^*_\epsilon - \theta^* \approx -\epsilon \, H_{\theta^*}^{-1} \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big),$   (12)
where we additionally drop all terms in $O(\epsilon^2)$. By expressing this solution in terms of the influence of $\epsilon$, we can further simplify it and obtain
$\left. \frac{\partial \theta^*_\epsilon}{\partial \epsilon} \right|_{\epsilon = 0} = -H_{\theta^*}^{-1} \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big).$
Finally, when using $\epsilon = 1$ in Eq. 10, each data point $z \in Z$ is replaced by its counterpart $\tilde{z} \in \tilde{Z}$ completely. In this case, Eq. 12 directly leads to the second-order update defined in Section 4.2:
$\theta \leftarrow \theta^* - H_{\theta^*}^{-1} \big( \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta^*) \big).$
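For a small model the second-order update can be computed exactly, since the Hessian fits in memory and can be inverted directly; for large models, the product $H_{\theta^*}^{-1} v$ would instead be approximated, e.g., with conjugate gradients and Hessian-vector products [Pearlmutter, 1994]. The sketch below uses a ridge-regularized logistic regression so that the Hessian is strictly positive definite; the regularizer `lam` and the helper names are our own illustrative choices, not the paper's code:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def grad_loss(theta, X, y):
    # Sum of per-sample logistic-loss gradients.
    return X.T @ (sigmoid(X @ theta) - y)

def hessian(theta, X, lam):
    # Hessian of the ridge-regularized logistic loss over the full dataset;
    # the ridge term lam keeps it strictly convex and hence invertible.
    s = sigmoid(X @ theta)
    return (X * (s * (1.0 - s))[:, None]).T @ X + lam * np.eye(X.shape[1])

def second_order_unlearn(theta_star, X, Z_x, Z_y, Zt_x, Zt_y, lam=1e-2):
    # theta <- theta* - H^{-1} (gradient on Z~ - gradient on Z)
    v = grad_loss(theta_star, Zt_x, Zt_y) - grad_loss(theta_star, Z_x, Z_y)
    H = hessian(theta_star, X, lam)
    return theta_star - np.linalg.solve(H, v)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
theta_star = np.zeros(5)            # stand-in for the trained model parameters
Z_x, Z_y = X[:1], y[:1]             # affected point z and its replacement z~
theta_so = second_order_unlearn(theta_star, X, Z_x, Z_y, Z_x + 0.1, Z_y)
```

In contrast to the first-order step, this update requires no unlearning rate, since the Hessian rescales the gradient difference to an appropriate step.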
Proofs for Certified Unlearning
In the following, we present the proofs for certified unlearning of our approach and, in particular, the bounds on the gradient residual used in Section 5. First, let us recall Theorem 1 from Section 5.1.
Theorem 1.
If all perturbations lie within a radius $m$, that is $\lVert \tilde{z}_i - z_i \rVert \le m$ for all $n$ affected points, and the gradient $\nabla_\theta \ell$ of the loss is Lipschitz with constant $\gamma$ with respect to $\theta$ and $z$, the following upper bounds hold:

If the unlearning rate satisfies $\tau \le (\gamma N)^{-1}$ with $N = |D'|$, we have
$\lVert \nabla_\theta L(\theta^{fo}; D') \rVert \le 2 \gamma m n$
for the first-order update $\theta^{fo}$ of our approach.

If additionally the Hessian $\nabla^2_\theta \ell$ is Lipschitz with respect to $\theta$, we have
$\lVert \nabla_\theta L(\theta^{so}; D') \rVert = O(m^2 n^2)$
for the second-order update $\theta^{so}$ of our approach.
To prove this theorem, we begin by introducing a lemma that is useful for investigating the gradient residual of the model $\theta^*$ on the corrected dataset $D'$.
Lemma 2.
Given a radius $m$ with $\lVert \tilde{z}_i - z_i \rVert \le m$ for all $i$, a gradient $\nabla_\theta \ell$ that is Lipschitz with constant $\gamma$ with respect to $z$, and a learning model $\theta^*$ with $\nabla_\theta L(\theta^*; D) = 0$, we have
$\lVert \nabla_\theta L(\theta^*; D') \rVert \le \gamma m n.$
Proof.
By definition, we have
$\nabla_\theta L(\theta^*; D') = \textstyle\sum_{z \in D'} \nabla_\theta \ell(z; \theta^*).$
We can now split the dataset $D'$ into the set of replacement points $\tilde{Z}$ and the remaining data $D \setminus Z$ as follows:
$\nabla_\theta L(\theta^*; D') = \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta^*) + \sum_{z \in D \setminus Z} \nabla_\theta \ell(z; \theta^*).$
By applying a zero addition and leveraging the optimality of the model $\theta^*$ on our dataset $D$, that is $\nabla_\theta L(\theta^*; D) = 0$, we can then express the gradient as follows:
$\nabla_\theta L(\theta^*; D') = \textstyle\sum_{i=1}^{n} \big( \nabla_\theta \ell(\tilde{z}_i; \theta^*) - \nabla_\theta \ell(z_i; \theta^*) \big) + \nabla_\theta L(\theta^*; D).$   (13)
Finally, using the Lipschitz continuity of the gradient with respect to $z$ in this expression, we arrive at the following inequalities that finalize the proof of Lemma 2:
$\lVert \nabla_\theta L(\theta^*; D') \rVert \le \textstyle\sum_{i=1}^{n} \gamma \lVert \tilde{z}_i - z_i \rVert \le \gamma m n.$
∎
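The bound of Lemma 2 can be checked numerically in a simple setting. The sketch below uses the quadratic loss $\ell(z; \theta) = \frac{1}{2}\lVert \theta - z \rVert^2$, whose gradient $\theta - z$ is Lipschitz in $z$ with $\gamma = 1$ and whose minimizer over $D$ is the sample mean; the concrete dataset and perturbations are our own illustration:

```python
import numpy as np

# Per-sample loss l(z; theta) = 0.5 * ||theta - z||^2, so grad_theta l = theta - z,
# which is Lipschitz in z with constant gamma = 1. The minimizer of
# L(theta; D) = sum_z l(z; theta) is the mean of D.
rng = np.random.default_rng(1)
D = rng.normal(size=(50, 3))
theta_star = D.mean(axis=0)

n, m, gamma = 5, 0.2, 1.0
delta = rng.normal(size=(n, 3))
delta *= m / np.linalg.norm(delta, axis=1, keepdims=True)  # ||z~_i - z_i|| = m
D_prime = D.copy()
D_prime[:n] += delta                                       # replace Z by Z~

# Gradient residual of theta* on the corrected dataset D'.
residual = np.linalg.norm((theta_star - D_prime).sum(axis=0))
print(residual <= gamma * m * n)  # the bound of Lemma 2 holds
```

Here the residual reduces to the norm of the summed perturbations, which the triangle inequality bounds by $\gamma m n$, exactly as in the proof.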
With the help of Lemma 2, we can prove the update bounds of Theorem 1. Our proof is structured in two parts, where we start with investigating the first case and then proceed with the second case of the theorem.
Proof (Case 1).
For the first-order update, we recall that
$\theta^{fo} = \theta^* - \tau \, G(\theta^*),$
where $\tau$ is the unlearning rate and we have
$G(\theta) = \textstyle\sum_{\tilde{z} \in \tilde{Z}} \nabla_\theta \ell(\tilde{z}; \theta) - \sum_{z \in Z} \nabla_\theta \ell(z; \theta).$
Consequently, we seek to bound the norm of $\nabla_\theta L(\theta^{fo}; D')$.
By Taylor's theorem, there exists a constant $\eta \in [0, 1]$ and a parameter $\theta' = \theta^* + \eta \, (\theta^{fo} - \theta^*)$