Model Extraction and Active Learning

Machine learning is being increasingly used by individuals, research institutions, and corporations. This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model. However, such MLaaS systems raise privacy concerns, one being model extraction. Adversaries maliciously exploit the query interface to steal the model. More precisely, in a model extraction attack, a good approximation of a sensitive or proprietary model held by the server is extracted (i.e. learned) by a dishonest user. Such a user only sees the answers to select queries sent using the query interface. This attack was recently introduced by Tramer et al. at the 2016 USENIX Security Symposium, where practical attacks for different models were shown. We believe that better understanding the efficacy of model extraction attacks is paramount in designing better privacy-preserving MLaaS systems. To that end, we take the first step by (a) formalizing model extraction and proposing the first definition of extraction defense, and (b) drawing parallels between model extraction and the better investigated active learning framework. In particular, we show that recent advancements in the active learning domain can be used to implement both model extraction, and defenses against such attacks.


1 Introduction

Advancements in various facets of machine learning have made it an integral part of our daily life. However, most real-world machine learning tasks are resource intensive. To that end, several cloud providers, such as Amazon, Google, Microsoft, and BigML, offset the storage and computational requirements by providing Machine Learning-as-a-Service (MLaaS). An MLaaS server offers support for the training phase as well as a query interface for accessing the trained model. The trained model is then queried by other users on chosen instances (refer Fig. 1). Often, this is implemented in a pay-per-query regime, i.e. the server, or the model owner via the server, charges the users for the queries to the model. For example, Google’s Cloud Machine Learning Engine [2] charges users for predictions.

Current research is focused on improving the performance of training algorithms and of the query interface, while little emphasis is placed on the related security aspects. For example, in many real-world applications, the trained models are privacy-sensitive: a model can (a) leak sensitive information about training data [7] during or after training, and (b) itself have commercial value or be used in security applications that assume its secrecy (e.g., spam filters, fraud detection etc. [47, 35, 61]). To keep the models private, there has been a surge in the practice of oracle access, or black-box access. Here, the trained model is made available for scoring (i.e. prediction) but is kept secret. MLaaS systems use oracle access to balance the trade-off between privacy and usability.

Despite the use of oracle access, a broad suite of attacks continues to target existing MLaaS systems [15, 1]. For example, membership inference attacks attempt to determine whether a given data-point is included in the model’s training dataset only by interacting with the MLaaS interface (e.g. [60]). In this work, we focus on model extraction attacks, where an adversary makes use of the MLaaS query interface in order to steal the proprietary model (i.e. learn the model or a good approximation of it). Tramèr et al. [64] show that many commonly used MLaaS interfaces can be exploited using only a few queries to recover a model’s secret parameters. Although this demonstrates empirically that model extraction attacks are feasible, their work considers interfaces that reveal auxiliary information, such as confidence values, together with the prediction output. Additionally, their work does not formalize model extraction. We believe that such a formalization is paramount for designing secure MLaaS systems that are resilient to the aforementioned threats.

In this paper, we take the first step in this direction. The key observation guiding our formalization is that the process of model extraction is very similar to active learning [58], a special case of semi-supervised machine learning. An active learner learns an approximation of a labeling function $f^*$ through repeated interaction with an oracle, who is assumed to know $f^*$. These interactions typically involve the learner sending an instance $x$ to the oracle, and the oracle returning the label $y = f^*(x)$ to the learner. Since the learner can choose the instances to be labeled, the number of data-points needed to learn the labeling function is often much lower than in the normal supervised case. Similarly, in model extraction, the adversary uses a strategy to query an MLaaS server with the following goals: (a) to successfully steal (i.e. learn) the model (i.e. labeling function) known by the server (i.e. oracle), in such a way as to (b) minimize the number of queries made to the MLaaS server, as each query costs the adversary a fixed dollar value.

While the overall process of active learning mirrors the general description of model extraction, the entire spectrum of active learning cannot be used to study model extraction. Indeed, some scenarios (e.g., PAC active learning) assume that the query instances are sampled from the actual input distribution. However, an attacker is not restricted to such a condition and can query any instance. For this reason, we believe that the query synthesis framework of active learning, where the learner has the power to generate arbitrary query instances, best replicates the capabilities of the adversary in the model extraction framework. Additionally, the query synthesis scenario ensures that we make no assumptions about the adversary’s prior knowledge. By casting model extraction as query synthesis active learning, we are able to draw concrete similarities between the two. Consequently, we are able to use algorithms and techniques from the active learning community to perform powerful model extraction attacks, and investigate possible defense strategies.

In particular, we show that: (a) query synthesis active learning algorithms can be used to perform model extraction on linear classifiers with no auxiliary information, (b) simple defense strategies, such as changing the prediction output with constant and small probability, are not effective, and (c) defense strategies that change the prediction output depending on the instances that are being queried, such as the work of Alabdulmohsin et al. [3], are more robust to extraction attacks implemented using existing query synthesis active learning algorithms. However, in Algorithm 1 of Section 4 we show that this defense is not secure against traditional passive learning algorithms. This suggests that there is “no free lunch”: accuracy might have to be sacrificed to prevent model extraction. A formal investigation of such a result would be an interesting avenue for future work.

[Figure 1 diagram; recoverable labels: "Data owner", "MLaaS Server (oracle)"]

Figure 1: Model extraction can be envisioned as active learning. A data owner, with the help of an MLaaS server, trains a model on its data. The proprietary model is stored by the server, which also answers queries from users. In a model extraction attack, a dishonest user tries to exploit this interface to “steal” the model in the same way as a learner uses answers from a machine-learning oracle in order to learn the labeling function.

Paper structure. We begin with a brief comparison between passive machine learning and active learning in Section 2. This allows us to introduce the notation used in this paper, and to review the state of the art in active learning. Section 3 formalizes model extraction attacks, casts them into the query synthesis active learning framework, and finally discusses possible defense strategies. Section 4 reports our experimental findings, showing that query synthesis active learning can be used to perform model extraction, and tests different defense strategies. Specifically, we observe that (or $0.5 worth [2]) queries are needed to extract most halfspaces when the MLaaS server does not deploy any defense, and (or $20 worth) queries are required to learn a halfspace when it uses data-independent randomization. Finally, we discuss some open issues in Section 5, which provide avenues for future work. Related work is discussed in Section 6 and we end the paper with some concluding remarks.

2 Machine Learning

In this section, we give a brief overview of machine learning and of the terminology we use throughout the paper. In particular, we summarize the passive learning framework in subsection 2.1, and focus on active learning algorithms in subsection 2.2. This review of the state of the art in active learning is needed to explicitly link model extraction to active learning, a link that is presented in Section 3.

2.1 Passive learning

In the standard, passive machine learning setting, the learner has access to a large labeled dataset and uses it in its entirety to learn a predictive model from a given class. Let $\mathcal{X}$ be an instance space, and $\mathcal{Y}$ be a set of labels. For example, in object recognition, $\mathcal{X}$ can be the space of all images, and $\mathcal{Y}$ can be a set of objects that we wish to detect in these images. We refer to a pair $(x, y) \in \mathcal{X} \times \mathcal{Y}$ as a data-point or labeled instance ($x$ is the instance, $y$ is the label). Finally, there is a class of functions $\mathcal{F}$ from $\mathcal{X}$ to $\mathcal{Y}$ called the hypothesis space that is known in advance. The learner’s goal is to find a function $\hat{f} \in \mathcal{F}$ that is a good predictor for the label $y$ given the instance $x$, with $(x, y) \in \mathcal{X} \times \mathcal{Y}$. To measure how well $\hat{f}$ predicts the labels, a loss function $\ell$ is used. Given a data-point $z = (x, y)$, $\ell(\hat{f}, z)$ measures the “difference” between $\hat{f}(x)$ and the true label $y$. When the label domain $\mathcal{Y}$ is finite (classification problem), the $0$-$1$ loss function is frequently used:

$$\ell(\hat{f}, z) = \begin{cases} 0 & \text{if } \hat{f}(x) = y, \\ 1 & \text{otherwise.} \end{cases}$$

If the label domain $\mathcal{Y}$ is continuous, one can use the square loss: $\ell(\hat{f}, z) = (\hat{f}(x) - y)^2$.

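In the passive setting, the PAC (probably approximately correct) learning [65] framework is predominant. Here, we assume that there is an underlying distribution $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$ that describes the data; the learner has no direct knowledge of $\mathcal{D}$ but has access to a set of training data $D$ drawn from it. The main goal in passive PAC learning is to use the labeled instances from $D$ to produce a hypothesis $\hat{f}$ such that its expected loss with respect to the probability distribution $\mathcal{D}$ is low. This is often measured through the generalization error of the hypothesis $\hat{f}$, defined by

$$\mathrm{Err}_{\mathcal{D}}(\hat{f}) = \mathbb{E}_{z \sim \mathcal{D}}\left[\ell(\hat{f}, z)\right], \qquad (1)$$

where $\ell$ is the loss function and $z = (x, y)$ is a data-point.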


More precisely, we have the following definition.

Definition 1 (PAC passive learning [65]).

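An algorithm $A$ is a PAC passive learning algorithm for the hypothesis class $\mathcal{F}$ if the following holds for any distribution $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$ and any $\epsilon, \delta \in (0, 1)$: if $A$ is given $s_A(\epsilon, \delta)$ i.i.d. data-points generated by $\mathcal{D}$, then $A$ outputs $\hat{f} \in \mathcal{F}$ such that $\mathrm{Err}_{\mathcal{D}}(\hat{f}) \le \min_{f \in \mathcal{F}} \mathrm{Err}_{\mathcal{D}}(f) + \epsilon$ with probability at least $1 - \delta$. We refer to $s_A(\epsilon, \delta)$ as the sample complexity of algorithm $A$.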

Remark 1 (Realizability assumption).

In the general case, the labels are given together with the instances, and the factor $\min_{f \in \mathcal{F}} \mathrm{Err}_{\mathcal{D}}(f)$ depends on the hypothesis class. Machine learning literature refers to this as the agnostic or non-separable case of PAC learning. However, in some applications, the labels themselves can be described using a labeling function $f^* \in \mathcal{F}$. In this case (known as realizable learning), $\min_{f \in \mathcal{F}} \mathrm{Err}_{\mathcal{D}}(f) = 0$ and the distribution $\mathcal{D}$ can be described by its marginal over $\mathcal{X}$. A PAC passive learning algorithm $A$ in the realizable case takes $s_A(\epsilon, \delta)$ i.i.d. instances generated by $\mathcal{D}$ and the corresponding labels generated using $f^*$, and outputs $\hat{f} \in \mathcal{F}$ such that $\mathrm{Err}_{\mathcal{D}}(\hat{f}) \le \epsilon$ with probability at least $1 - \delta$.

2.2 Active learning

In the passive setting, learning an accurate model (i.e. learning with low generalization error) requires a large number of data-points. Thus, the labeling effort required to produce an accurate predictive model may be prohibitive. In other words, the sample complexity of many learning algorithms grows rapidly as $\epsilon \to 0$ (refer Example 1). This has spurred interest in learning algorithms that can operate on a smaller set of labeled instances, leading to the emergence of active learning. In active learning, the learning algorithm is allowed to select a subset of unlabeled instances, query their corresponding labels from an annotator (a.k.a. oracle) and then use them to construct or update a model. How the algorithm chooses the instances varies widely. However, the common underlying idea is that by actively choosing which data-points are used for training, the learning algorithm can aggressively reduce sample complexity.

Formally, an active learning algorithm is an interactive process between two parties - the oracle $\mathcal{O}$ and the learner $\mathcal{L}$. The only interaction allowed is through queries - $\mathcal{L}$ chooses $x \in \mathcal{X}$ and sends it to $\mathcal{O}$, who responds with $y \in \mathcal{Y}$ (i.e., the oracle returns the label for the chosen unlabeled instance). This value of $(x, y)$ is then used by $\mathcal{L}$ to infer some information about the labeling procedure, and to choose the next instance to query. Over many such interactions, $\mathcal{L}$ outputs $\hat{f}$ as a predictor for labels. We can use the generalization error (1) to evaluate the accuracy of the output $\hat{f}$. However, depending on the query strategy chosen by $\mathcal{L}$, other types of error can be used.

There are two distinct scenarios for active learning: PAC active learning and Query Synthesis (QS) active learning (query synthesis active learning is also known as Membership Query Learning). We describe these scenarios below.

2.2.1 PAC active learning

This scenario was introduced by Dasgupta in 2005 [22] in the realizable context and then subsequently developed in following works (e.g., [10, 21, 32]). In this scenario, the instances are sampled according to the marginal of $\mathcal{D}$ over $\mathcal{X}$, and the learner, after seeing them, decides whether to query for their labels or not. Since the data-points seen by $\mathcal{L}$ come from the actual underlying distribution $\mathcal{D}$, the accuracy of the output hypothesis $\hat{f}$ is measured using the generalization error (1), as in the classic (i.e., passive) PAC learning.

There are two options to consider for sampling data-points. In stream-based sampling (also called selective sampling), the instances are sampled one at a time, and the learner decides whether to query for the label or not on a per-instance basis. Pool-based sampling assumes that all of the instances are collected in a static pool $S \subseteq \mathcal{X}$ and then the learner chooses specific instances in $S$ and queries for their labels. Typically, instances are chosen by $\mathcal{L}$ in a greedy fashion using a metric to evaluate all instances in the pool. This is not possible in stream-based sampling, where $\mathcal{L}$ goes through the data sequentially, and therefore has to make each query decision individually. Pool-based sampling is extensively studied since it has applications in many real-world problems, such as text classification, information extraction, image classification and retrieval, etc. [48]. Stream-based sampling represents scenarios where obtaining unlabeled data-points is easy and cheap, but obtaining their labels is expensive (e.g., a stream of data is collected by a sensor, but the labeling needs to be performed by an expert).
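To make the two sampling options concrete, the following minimal sketch contrasts them on a one-dimensional threshold classifier (label $-1$ to the left of a threshold, $+1$ from the threshold on). The function names, the query rule of the stream-based learner, and the median-of-the-uncertain-pool metric of the pool-based learner are illustrative assumptions, not taken from the cited works.

```python
import math

def stream_based(stream, oracle):
    """Selective sampling: decide per instance whether to pay for its label.

    Assumed rule: query x only if its label is not already implied by
    earlier answers, i.e. x lies strictly inside the uncertainty interval.
    """
    lo, hi, queries = -math.inf, math.inf, 0
    for x in stream:
        if lo < x < hi:
            queries += 1
            if oracle(x) == 1:
                hi = x   # the threshold is at or below x
            else:
                lo = x   # the threshold is above x
    return lo, hi, queries

def pool_based(pool, oracle, budget):
    """Greedy pool-based sampling with an assumed metric: always query the
    median of the still-uncertain pool instances."""
    lo, hi, queries = -math.inf, math.inf, 0
    candidates = sorted(pool)
    while queries < budget:
        uncertain = [x for x in candidates if lo < x < hi]
        if not uncertain:
            break        # every pool label is now implied
        x = uncertain[len(uncertain) // 2]
        queries += 1
        if oracle(x) == 1:
            hi = x
        else:
            lo = x
    return lo, hi, queries

if __name__ == "__main__":
    pool = [i / 100 for i in range(100)]
    oracle = lambda x: 1 if x >= 0.37 else -1
    print(stream_based(pool, oracle))
    print(pool_based(pool, oracle, budget=20))
```

Both learners end with the same uncertainty interval; the pool-based learner can rank the whole pool before each query, whereas the stream-based learner must commit to each decision as the instance arrives.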

Before describing query synthesis active learning, we wish to highlight the advantage of PAC active learning over passive PAC learning (i.e. the reduced sample complexity) for some hypothesis class through Example 1. Recall that this advantage comes from the fact that an active learner is allowed to adaptively choose the data from which it learns, while a passive learning algorithm learns from a static set of data-points.

Example 1 (PAC learning for halfspaces).

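Let $\mathcal{F}_{d,\mathrm{HS}}$ be the hypothesis class of $d$-dimensional halfspaces, used for binary classification. A function in $\mathcal{F}_{d,\mathrm{HS}}$ is described by a normal vector $w \in \mathbb{R}^d$ (i.e., $\|w\|_2 = 1$) and is defined by

$$f_w(x) = \mathrm{sign}(\langle w, x \rangle) \quad \text{for any } x \in \mathbb{R}^d,$$

where given two vectors $a, b \in \mathbb{R}^d$, their product is defined as $\langle a, b \rangle = \sum_{i=1}^{d} a_i b_i$. Moreover, if $t \in \mathbb{R}$, then $\mathrm{sign}(t) = 1$ if $t \ge 0$ and $\mathrm{sign}(t) = -1$ otherwise. A classic result in passive PAC learning states that $O\left(\frac{d}{\epsilon}\log\frac{1}{\epsilon} + \frac{1}{\epsilon}\log\frac{1}{\delta}\right)$ data-points are needed to learn $\mathcal{F}_{d,\mathrm{HS}}$ [65]. On the other hand, several works propose active learning algorithms for $\mathcal{F}_{d,\mathrm{HS}}$ with sample complexity $\tilde{O}(d \log(1/\epsilon))$ (under certain distributional assumptions); here the $\tilde{O}$ notation ignores logarithmic factors and terms dependent on $\delta$. For example, if the underlying distribution is log-concave, there exists an active learning algorithm with sample complexity $\tilde{O}(d \log(1/\epsilon))$ [12, 13, 73]. This general reduction in the sample complexity for $\mathcal{F}_{d,\mathrm{HS}}$ is easy to see when $d = 1$. In this case, the data-points lie on the real line and their labels are a sequence of $-1$’s followed by a sequence of $+1$’s. The goal is to discover a point $w$ where the change from $-1$ to $+1$ happens. PAC learning theory states that this can be achieved with $\tilde{O}(1/\epsilon)$ points sampled i.i.d. from $\mathcal{D}$ (the exact bound carries additional logarithmic and $\delta$-dependent factors). On the other hand, an active learning algorithm that uses a simple binary search can achieve the same task with $O(\log(1/\epsilon))$ queries [22] (refer Figure 2).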
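The gap between the two regimes for $d = 1$ can be illustrated with a short sketch. It assumes instances in $[0, 1]$, a threshold strictly inside that interval, and a uniform marginal for the passive baseline; the helper names (`make_oracle`, `active_learn`, `passive_learn`) are hypothetical and chosen only for this illustration.

```python
import random

def make_oracle(threshold):
    """Return a label-only oracle for a 1-d halfspace and its query counter."""
    counter = {"queries": 0}
    def oracle(x):
        counter["queries"] += 1
        return 1 if x >= threshold else -1
    return oracle, counter

def active_learn(oracle, eps):
    """Binary search on [0, 1]; assumes the threshold lies in (0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if oracle(mid) == 1:
            hi = mid     # the label flips at or before mid
        else:
            lo = mid     # the label flips after mid
    return (lo + hi) / 2

def passive_learn(oracle, n_samples, rng):
    """Passive baseline: label n i.i.d. uniform points and return the
    midpoint of the gap between the last -1 and the first +1."""
    lo, hi = 0.0, 1.0
    for _ in range(n_samples):
        x = rng.random()
        if oracle(x) == 1:
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2

if __name__ == "__main__":
    oracle, counter = make_oracle(0.3141)
    print(active_learn(oracle, 1e-3), counter["queries"])
    oracle, counter = make_oracle(0.3141)
    print(passive_learn(oracle, 200, random.Random(0)), counter["queries"])
```

With `eps = 1e-3` the binary search stops after 10 queries, since the interval halves at every step until its length is at most `eps`; the passive baseline spends one label per sampled point regardless of how informative that point is.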

Figure 2: Halfspace classification in dimension $1$.

2.2.2 Query Synthesis (QS) active learning

In this scenario, the learner can request labels for any instance in the input space $\mathcal{X}$, including points that the learner generates de novo, independent of the distribution $\mathcal{D}$ (e.g., $\mathcal{L}$ can ask for labels for those $x$ that have zero probability of being sampled according to $\mathcal{D}$). Query synthesis is reasonable for many problems, but labeling such arbitrary instances can be awkward if the oracle is a human annotator. Thus, this scenario better represents real-world applications where the oracle is automated (e.g., results from synthetic experiments [38]). Since the data-points are independent of the distribution, generalization error is not an appropriate measure of accuracy of the hypothesis $\hat{f}$, and other types of error are typically used. These new error formulations depend on the concrete hypothesis class $\mathcal{F}$ considered. For example, if $\mathcal{F}$ is the class of boolean functions from $\{0,1\}^n$ to $\{0,1\}$, then the uniform error is used. Assume that the oracle $\mathcal{O}$ knows $f^* \in \mathcal{F}$ and uses it as labeling function (realizable case), then the uniform error of the hypothesis $\hat{f}$ is defined as

$$\mathrm{Err}_u(\hat{f}) = \Pr_{x \in \{0,1\}^n}\left[\hat{f}(x) \neq f^*(x)\right],$$

where $x$ is sampled uniformly at random from the instance space $\{0,1\}^n$. Recent work [4, 18] for the class of halfspaces $\mathcal{F}_{d,\mathrm{HS}}$ (refer to Example 1) uses geometric error. Assume that the true labeling function used by the oracle is $f_{w^*}$, then the geometric error of the hypothesis $f_w \in \mathcal{F}_{d,\mathrm{HS}}$ is defined as

$$\mathrm{Err}_2(f_w) = \|w^* - w\|_2,$$

where $\|\cdot\|_2$ is the 2-norm (i.e. the Euclidean distance from the vector $w^*$ to the vector $w$).
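As a small illustration of the two error measures, the sketch below computes the uniform error by exhaustive enumeration (feasible only for small input length) and the geometric error as a Euclidean distance between normal vectors. The function names and the toy hypotheses are assumptions made for this example.

```python
import itertools
import math

def uniform_error(f_hat, f_star, n):
    """Fraction of the 2^n boolean inputs on which f_hat and f_star disagree."""
    points = itertools.product((0, 1), repeat=n)
    disagreements = sum(1 for x in points if f_hat(x) != f_star(x))
    return disagreements / 2 ** n

def geometric_error(w_hat, w_star):
    """Euclidean distance between two normal vectors of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_hat, w_star)))

if __name__ == "__main__":
    f_star = lambda x: x[0] & x[1]   # true labeling function: AND of two bits
    f_hat = lambda x: x[0]           # hypothesis that ignores the second bit
    print(uniform_error(f_hat, f_star, 3))
    print(geometric_error((1.0, 0.0), (0.0, 1.0)))
```

Neither measure refers to a data distribution, which is why both remain meaningful when the queried instances are synthesized by the learner.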

In both active learning scenarios (PAC and QS), the learner needs to evaluate the “usefulness” of an unlabeled instance $x$, which can either be generated de novo or sampled from the given distribution, in order to decide whether to query the oracle for the corresponding label. The literature offers many ways of formulating such query strategies. Most of the existing work presents strategies whose goal is an efficient search through the hypothesis space (refer the survey by Settles [58]). Another point of consideration for an active learner is deciding when to stop. This is essential as active learning is geared at improving accuracy while being sensitive to the cost of acquiring new data (i.e., reducing the query complexity). While one school of thought relies on stopping criteria based on an intrinsic measure of stability or self-confidence within the learner, another holds that the decision is based on economic or other external factors (refer [58, Section 6.7]).

Given this large variety within active learning, we enhance the standard definition of a learning algorithm and propose the definition of an active learning system, which is geared towards model extraction.

Definition 2 (Active learning system).

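Let $\mathcal{F}$ be a hypothesis class with instance space $\mathcal{X}$ and label space $\mathcal{Y}$. An active learning system for $\mathcal{F}$ is given by two entities, the learner $\mathcal{L}$ and the oracle $\mathcal{O}$, interacting via membership queries: $\mathcal{L}$ sends to $\mathcal{O}$ an instance $x \in \mathcal{X}$; $\mathcal{O}$ answers with a label $y \in \mathcal{Y}$. We indicate via the notation $\mathcal{O}_{f^*}$ the realizable case where $\mathcal{O}$ uses a specific labeling function $f^* \in \mathcal{F}$, i.e. $y = f^*(x)$. (Another type of query that has been studied is the equivalence query [5]. Here $\mathcal{L}$ can ask whether a hypothesis $\hat{f}$ is correct or not. The oracle answers yes if $\hat{f} = f^*$. Otherwise, it answers no and provides a counterexample, i.e. $x$ such that $\hat{f}(x) \neq f^*(x)$.) The behavior of $\mathcal{L}$ is described by the following parameters: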

  1. Scenario: this is the rule that describes the generation of the input for the querying process (i.e. which instances $x \in \mathcal{X}$ can be queried). In the PAC scenario, the instances are sampled from the underlying distribution $\mathcal{D}$. In the query synthesis (QS) scenario, the instances are generated by the learner $\mathcal{L}$;

  2. Query strategy: given a specific scenario, the query strategy is the algorithm that adaptively decides if the label for a given instance $x_i$ is queried for, given that the queries $x_1, \dots, x_{i-1}$ have been answered already. In the query synthesis scenario, the query strategy also describes the procedure for instance generation.

  3. Stopping criteria: this is a set of considerations used by $\mathcal{L}$ to decide when it must stop asking queries.

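Any system $(\mathcal{L}, \mathcal{O})$ described as above is an active learning system for $\mathcal{F}$ if one of the following holds:

  • (PAC scenario) For any $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$ and any $\epsilon, \delta \in (0, 1)$, if $\mathcal{L}$ is allowed to interact with $\mathcal{O}$ using $q_{\mathcal{L}}(\epsilon, \delta)$ queries, then $\mathcal{L}$ outputs $\hat{f} \in \mathcal{F}$ such that $\mathrm{Err}_{\mathcal{D}}(\hat{f}) \le \min_{f \in \mathcal{F}} \mathrm{Err}_{\mathcal{D}}(f) + \epsilon$ with probability at least $1 - \delta$.

  • (QS scenario) Fix an error measure Err for the functions in $\mathcal{F}$. For any $f^* \in \mathcal{F}$ and any $\epsilon, \delta \in (0, 1)$, if $\mathcal{L}$ is allowed to interact with $\mathcal{O}_{f^*}$ using $q_{\mathcal{L}}(\epsilon, \delta)$ queries, then $\mathcal{L}$ outputs $\hat{f} \in \mathcal{F}$ such that $\mathrm{Err}(\hat{f}) \le \epsilon$ with probability at least $1 - \delta$.

We refer to $q_{\mathcal{L}}(\epsilon, \delta)$ as the query complexity of $\mathcal{L}$.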

As we will show in the following section (in particular, refer subsection 3.2), the query synthesis scenario is more appropriate in casting model extraction attack as active learning.

3 Model Extraction

In this section, we begin by formalizing the process of model extraction (subsection 3.1). We draw parallels between model extraction and active learning in subsection 3.2. We then discuss possible defense strategies based on noisy answers in subsection 3.3.

3.1 Model Extraction Definition

We begin by describing the operational ecosystem of model extraction attacks in the context of MLaaS systems. An entity learns a private model $f^*$ from a public class $\mathcal{F}$, and provides it to the MLaaS server. The server provides a client-facing query interface for accessing the model for prediction. For each query issued by a client, the server responds with the corresponding prediction. For example, in the case of logistic regression, the MLaaS server knows a model represented by parameters $a_0, a_1, \dots, a_d$. The client issues queries of the form $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, and the MLaaS server responds with $0$ if $(1 + e^{-a(x)})^{-1} \le 0.5$ and $1$ otherwise, with $a(x) = a_0 + \sum_{i=1}^{d} a_i x_i$.

Model extraction is the process where an adversary exploits this interface to learn labels that can then be used to learn more about the proprietary model $f^*$. The adversary can be interested in learning the description of the model $f^*$ itself (i.e., stealing the parameters as in a reverse engineering attack), or in obtaining an approximation of the model, say $\hat{f}$, that it can then use for free for the same task as the original $f^*$ was intended for. To capture the different goals of an adversary, we say that the attack is successful if the extracted model is close enough to $f^*$ according to an error function on $\mathcal{F}$ that is context dependent. Since many existing MLaaS providers operate in a pay-per-query regime, we use query complexity as a measure of efficiency of such model extraction attacks.

Formally, consider the following experiment: an adversary $\mathcal{A}$, who knows the hypothesis class $\mathcal{F}$, has oracle access to a proprietary model $f^*$ from $\mathcal{F}$. This can be thought of as $\mathcal{A}$ interacting with a server $S$ that safely stores $f^*$. The interaction has several rounds. In each round, $\mathcal{A}$ chooses an instance $x$ and sends it to $S$. The latter responds with $f^*(x)$. After a few rounds, $\mathcal{A}$ outputs a function $\hat{f}$ that is the adversary’s candidate approximation of $f^*$; the experiment considers $\hat{f}$ a good approximation if its error with respect to the true labeling function $f^*$ held by the server is less than a fixed threshold $\epsilon$. The error function Err is defined a priori and fixed for the extraction experiment on the hypothesis class $\mathcal{F}$.

Experiment 1 (Extraction experiment).

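Given the hypothesis class $\mathcal{F} = \{f : \mathcal{X} \to \mathcal{Y}\}$, fix an error function $\mathrm{Err} : \mathcal{F} \to \mathbb{R}$. Let $S$ be an MLaaS server with the knowledge of a specific $f^* \in \mathcal{F}$, denoted by $S(f^*)$. Let $\mathcal{A}$ be an adversary interacting with $S$ with a maximum budget of $q$ queries. The extraction experiment $\mathrm{Exp}^{\epsilon}_{\mathcal{F}}(S(f^*), \mathcal{A}, q)$ proceeds as follows:

  1. $\mathcal{A}$ is given a description of $\mathcal{F}$ and oracle access to $f^*$ through the query interface of $S$. That is, if $\mathcal{A}$ sends $x$ to $S$, it gets back $y = f^*(x)$. After at most $q$ queries, $\mathcal{A}$ eventually outputs $\hat{f}$;

  2. The output of the experiment is $1$ if $\mathrm{Err}(\hat{f}) \le \epsilon$. Otherwise the output is $0$.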
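The two steps of Experiment 1 can be sketched as a small test harness. Everything named here (`Server`, `BudgetExceeded`, `extraction_experiment`, the toy one-dimensional threshold model and its binary-search adversary) is a hypothetical illustration; in particular, scoring an adversary that overruns its budget as a failure is an assumption of this sketch, since the experiment itself only considers adversaries that respect the budget.

```python
class BudgetExceeded(Exception):
    """Raised when the adversary asks more queries than its budget allows."""

class Server:
    """MLaaS server holding a secret model and answering label-only queries."""
    def __init__(self, f_star, budget):
        self._f_star = f_star
        self.budget = budget
        self.used = 0

    def query(self, x):
        if self.used >= self.budget:
            raise BudgetExceeded
        self.used += 1
        return self._f_star(x)

def extraction_experiment(f_star, adversary, budget, err, eps):
    """Return 1 if the adversary's output has error at most eps, else 0.

    Assumption of this sketch: overrunning the query budget counts as 0.
    """
    server = Server(f_star, budget)
    try:
        f_hat = adversary(server.query)
    except BudgetExceeded:
        return 0
    return 1 if err(f_hat) <= eps else 0

# Toy instantiation: 1-d threshold model, adversary runs a fixed number of
# binary-search rounds and outputs its estimate of the threshold.
SECRET = 0.3141

def f_star(x):
    return 1 if x >= SECRET else -1

def make_adversary(rounds):
    def adversary(query):
        lo, hi = 0.0, 1.0
        for _ in range(rounds):
            mid = (lo + hi) / 2
            if query(mid) == 1:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2
    return adversary

def err(threshold_hat):
    return abs(threshold_hat - SECRET)

if __name__ == "__main__":
    print(extraction_experiment(f_star, make_adversary(10), 10, err, 1e-3))
    print(extraction_experiment(f_star, make_adversary(5), 5, err, 1e-3))
```

In this toy setting the adversary is simply an active learner in the query synthesis scenario run against the server's interface, and the query budget is the only resource the experiment charges for.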

Informally, an adversary $\mathcal{A}$ is successful if with high probability the output of the extraction experiment is $1$ for a small value of $\epsilon$ and a fixed query budget $q$. This means that $\mathcal{A}$ likely learns a good approximation of $f^*$ by only asking $q$ queries to the server. More precisely, we have the following definition.

Definition 3 (Extraction attack).

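Let $\mathcal{F}$ be a public hypothesis class and $S$ an MLaaS server as explained before. We say that an adversary $\mathcal{A}$, which interacts with $S$, implements an $\epsilon$-extraction attack of complexity $q$ and confidence $\gamma$ against the class $\mathcal{F}$ if

$$\Pr\left[\mathrm{Exp}^{\epsilon}_{\mathcal{F}}(S(f^*), \mathcal{A}, q) = 1\right] \ge \gamma$$

for any $f^* \in \mathcal{F}$, where $\mathrm{Exp}^{\epsilon}_{\mathcal{F}}(S(f^*), \mathcal{A}, q)$ denotes the output of the extraction experiment (Experiment 1). The probability is over the randomness of $\mathcal{A}$.

In other words, in Definition 3 the success probability of an adversary constrained by a fixed budget of $q$ queries is explicitly lower bounded by the quantity $\gamma$, which we call confidence.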

Before discussing the connection between model extraction and active learning, we provide an example of a hypothesis class that is easy to extract.

Example 2 (Equation-solving attack for linear regression).

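Let $\mathcal{F}_{d,\mathrm{R}}$ be the hypothesis class of regression models from $\mathbb{R}^d$ to $\mathbb{R}$. A function $f_a$ in this class is described by $d + 1$ parameters $a_0, a_1, \dots, a_d$ from $\mathbb{R}$ and defined by: for any $x \in \mathbb{R}^d$,

$$f_a(x) = a_0 + \sum_{i=1}^{d} a_i x_i.$$

Consider the adversary $\mathcal{A}_{ES}$ that queries $x^1, \dots, x^{d+1}$ ($d + 1$ instances from $\mathbb{R}^d$) chosen in such a way that the set of vectors $\{(1, x^i)\}_{i=1,\dots,d+1}$ is linearly independent in $\mathbb{R}^{d+1}$. $\mathcal{A}_{ES}$ receives the corresponding $d + 1$ labels, $y_1, \dots, y_{d+1}$, and can therefore solve the linear system given by the equations $f_a(x^i) = y_i$. Assume that $f_{a^*}$ is the function known by the MLaaS server (i.e., $y_i = f_{a^*}(x^i)$). It is easy to see that if we fix $\mathrm{Err}(f_a) = \|a^* - a\|_1$, then $\Pr[\mathrm{Exp}^{0}_{\mathcal{F}_{d,\mathrm{R}}}(S(f_{a^*}), \mathcal{A}_{ES}, d + 1) = 1] = 1$. That is, $\mathcal{A}_{ES}$ implements $0$-extraction of complexity $d + 1$ and confidence $1$.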
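The equation-solving adversary of Example 2 can be sketched in a few lines. This sketch assumes one particular choice of linearly independent queries (the origin and the $d$ unit vectors) and uses exact rational arithmetic so that the recovered parameters match the secret ones exactly; the function names are hypothetical.

```python
from fractions import Fraction

def solve_linear_system(A, b):
    """Gauss-Jordan elimination over the rationals for a square system."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("query vectors are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def equation_solving_attack(query, d):
    """Recover (a_0, ..., a_d) of a linear regression model with d + 1 queries.

    Assumed query set: the origin and the d unit vectors, so that the
    vectors (1, x^i) are linearly independent.
    """
    xs = [[0] * d] + [[1 if j == i else 0 for j in range(d)] for i in range(d)]
    ys = [query(x) for x in xs]
    A = [[1] + x for x in xs]          # rows (1, x^i)
    return solve_linear_system(A, ys)

if __name__ == "__main__":
    secret = [Fraction(3, 2), Fraction(-2), Fraction(5), Fraction(7)]
    server = lambda x: secret[0] + sum(a * xi for a, xi in zip(secret[1:], x))
    print(equation_solving_attack(server, 3) == secret)
```

Any other set of $d + 1$ queries works equally well as long as the augmented vectors are linearly independent; the unit-vector choice merely keeps the system easy to inspect.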

Tramèr et al. [64] introduce the equation solving attack for logistic regression and multilayer perceptron models. However, they consider a different attack model, one where the server returns the label $y$ and auxiliary information about it (e.g., the probability of $y$ being correct) for each instance $x$. We discuss such attack models in more detail in Remark 2.

3.2 Active Learning and Extraction

From the description presented in Section 2, it is clear that model extraction in the MLaaS system context closely resembles active learning. The survey of active learning in subsection 2.2 contains a variety of algorithms and scenarios which can be used to implement model extraction attacks (or to study their impossibility). However, not all possible scenarios of active learning are interesting for model extraction. We notice that in the case of model extraction, an adversary has no knowledge of the data distribution $\mathcal{D}$. Additionally, such an adversary is not restricted to querying only instances sampled from $\mathcal{D}$. For this reason, we believe that query synthesis (QS) is the right active learning scenario to investigate in order to draw a meaningful parallel with model extraction. Recall that query synthesis is the only framework where the query inputs can be generated de novo (i.e., they do not conform to a distribution). Formally, we have the following:

Theorem 1.

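Given a hypothesis class $\mathcal{F}$ and an error function Err, let $(\mathcal{L}, \mathcal{O})$ be an active learning system for $\mathcal{F}$ in the QS scenario (Definition 2). If the query complexity of $\mathcal{L}$ is $q_{\mathcal{L}}(\epsilon, \delta)$, then there exists an adversary $\mathcal{A}$ that implements $\epsilon$-extraction with complexity $q_{\mathcal{L}}(\epsilon, \delta)$ and confidence $1 - \delta$ against the class $\mathcal{F}$.

Proof. Consider the adversary $\mathcal{A}$ that is the learner $\mathcal{L}$ (i.e., $\mathcal{A}$ deploys the query strategy procedure and the stopping criteria that describe $\mathcal{L}$). This is possible because $\mathcal{L}$ is in the QS scenario and is independent of any underlying unknown distribution. Let $q = q_{\mathcal{L}}(\epsilon, \delta)$ and observe that

$$\Pr\left[\mathrm{Exp}^{\epsilon}_{\mathcal{F}}(S(f^*), \mathcal{A}, q) = 1\right] = \Pr\left[\mathcal{A} \text{ outputs } \hat{f} \text{ and } \mathrm{Err}(\hat{f}) \le \epsilon\right] \ge 1 - \delta.$$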

Theorem 1 states that any active learning algorithm in the QS scenario can be used to implement a model extraction attack. Therefore, in order to study the security of a given hypothesis class in the MLaaS framework, we can use known techniques and results from the active learning literature. Two examples of this follow - we describe the active learning algorithms presented in [4] and in [43] as model extraction attacks.

Example 3 (Decision tree extraction via QS active learning).

Let $\mathcal{F}_{n,\mathrm{BF}}$ denote the set of boolean functions with domain $\{0,1\}^n$ and range $\{-1,+1\}$ (the reader can think of the two values $+1$ and $-1$ as the bits $0$ and $1$; using the range $\{-1,+1\}$ is very common in the literature on learning boolean functions). An interesting subset of $\mathcal{F}_{n,\mathrm{BF}}$ is given by the functions that can be represented as a boolean decision tree. A boolean decision tree $T$ is a labeled binary tree, where each node $v$ of the tree is labeled by a set $L_v \subseteq \{1, \dots, n\}$ and has two outgoing edges. Every leaf in this tree is labeled either $+1$ or $-1$. Given an $n$-bit string $x = (x_1, \dots, x_n)$ as input, the decision tree defines the following computation: the computation starts at the root of the tree $T$. When the computation arrives at an internal node $v$ we calculate the parity of $\sum_{i \in L_v} x_i$ and go left if the parity is $0$ and go right otherwise. The value of the leaf that the computation ends up in is the value of the function. We denote by $\mathcal{F}^{m}_{n,\mathrm{BT}}$ the class of boolean decision trees with $n$-bit input and $m$ nodes. Kushilevitz and Mansour [43] present an active learning algorithm for the class $\mathcal{F}_{n,\mathrm{BF}}$ that works in the QS scenario. This algorithm utilizes the uniform error to determine the stopping condition (refer subsection 2.2). The authors claim that this algorithm has practical efficiency when restricted to the classes $\mathcal{F}^{m}_{n,\mathrm{BT}} \subseteq \mathcal{F}_{n,\mathrm{BF}}$ for any $m$. In particular, if the active learner $\mathcal{L}$ of [43] interacts with the oracle $\mathcal{O}_{T^*}$ where $T^* \in \mathcal{F}^{m}_{n,\mathrm{BT}}$, then $\mathcal{L}$ learns $g \in \mathcal{F}_{n,\mathrm{BF}}$ such that $\Pr_{x}[g(x) \neq T^*(x)] \le \epsilon$ with probability at least $1 - \delta$ using a number of queries polynomial in $n$, $m$, $1/\epsilon$, and $\log(1/\delta)$. Using Theorem 1, this directly translates to the existence of an adversary that implements $\epsilon$-extraction with complexity polynomial in $n$, $m$, $1/\epsilon$, and $\log(1/\delta)$, and confidence $1 - \delta$, against the class $\mathcal{F}^{m}_{n,\mathrm{BT}}$. Moreover, the algorithm of [43] can be extended to (a) boolean functions of the form $f : \{0, 1, \dots, k-1\}^n \to \{-1, +1\}$ that can be computed by a polynomial-size $k$-ary decision tree (a $k$-ary decision tree is a tree in which each inner node has $k$ outgoing edges), and (b) regression trees (i.e., the output is a real value from $[0, M]$). In the second case, the running time of the learning algorithm is polynomial in $M$ (refer Section 6 of [43]). Note that the attack model considered here is a stronger model than that considered by [64] because the attacker/learner does not get any information about the internal path of the decision tree (refer Remark 2).
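The computation defined by such a tree can be sketched as follows. The data structures and names are hypothetical, indices are 0-based for convenience, and the sketch assumes that parity $0$ sends the computation to the left child and parity $1$ to the right child. The learner of [43] only ever sees the returned leaf value, never the path taken.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Union

@dataclass(frozen=True)
class Leaf:
    value: int                 # +1 or -1

@dataclass(frozen=True)
class Node:
    subset: FrozenSet[int]     # 0-based indices of the bits whose parity is tested
    left: "Tree"               # followed when the parity is 0 (assumed convention)
    right: "Tree"              # followed when the parity is 1

Tree = Union[Leaf, Node]

def evaluate(tree: Tree, bits: Sequence[int]) -> int:
    """Walk from the root to a leaf, branching on the parity at each node."""
    node = tree
    while isinstance(node, Node):
        parity = sum(bits[i] for i in node.subset) % 2
        node = node.left if parity == 0 else node.right
    return node.value

if __name__ == "__main__":
    # Root tests the parity of bits 0 and 1; its right child tests bit 2.
    tree = Node(frozenset({0, 1}), Leaf(+1),
                Node(frozenset({2}), Leaf(+1), Leaf(-1)))
    print(evaluate(tree, (1, 0, 1)))
```

A label-only oracle for this class is then just `lambda x: evaluate(tree, x)`, which is the interface a QS learner, and hence an extraction adversary, interacts with.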

Example 4 (Halfspace extraction via QS active learning).

Let $\mathcal{F}_{d,\mathrm{HS}}$ be the hypothesis class of $d$-dimensional halfspaces defined in Example 1. Alabdulmohsin et al. [4] present a spectral algorithm to learn a halfspace in the QS scenario that, in practice, outperformed earlier active learning strategies in the PAC scenario. They demonstrate, through several experiments, that their algorithm learns $f_w \in \mathcal{F}_{d,\mathrm{HS}}$ such that $\|w - w^*\|_2 \le \epsilon$ with approximately queries, where $f_{w^*}$ is the labeling function used by $\mathcal{O}$. It follows from Theorem 1 that an adversary utilizing this algorithm implements $\epsilon$-extraction against the class $\mathcal{F}_{d,\mathrm{HS}}$ with complexity and confidence . We validate the practical efficacy of this attack in Section 4.

Remark 2 (Extraction with auxiliary information).

Observe that we define model extraction only for those MLaaS servers that return just the label value for a well-formed query (i.e. in the oracle access setting). A weaker model (i.e., one where attacks are easier) considers the case of MLaaS servers responding to a user’s query even when is incomplete (i.e. with missing features), and returning the label along with some auxiliary information. The work of Tramèr et al. [64] shows that model extraction attacks in the presence of such “leaky servers” are feasible and efficient (i.e. low query complexity) for many hypothesis classes (e.g., logistic regression, multilayer perceptron, and decision trees). In particular, they propose an equation solving attack [64, Section 4.1] that uses the confidence values returned by the MLaaS server together with the labels to steal the parameters of a logistic regression model or a perceptron model. For example, in the case of logistic regression, the MLaaS server knows the parameters and responds to a query with the label ( if and otherwise) and the value as confidence value for . Clearly, the knowledge of the confidence values allows an adversary to implement the same attack we describe in Example 2 for linear regression models.

Section 4.2 of [64] describes a path-finding attack that uses the leaf/node identifier returned by the server, even for incomplete queries, to steal a decision tree. These attacks are very efficient (i.e., queries are needed to steal a -dimensional logistic regression model); however, their efficiency relies heavily on the presence of the various forms of auxiliary information provided by the MLaaS server. In Section 6 of [64], attacks for logistic regression, neural networks, and kernel SVMs are proposed that do not use the confidence values. For binary logistic regression the best strategy proposed is the line search of Lowd and Meek [47], while for multiclass logistic regression and RBF kernel SVMs, the best attack uses the PAC active learning algorithm of [19] (stream-based sampling model). All these attacks are considered less feasible than the earlier proposals because of the higher query cost (e.g., for logistic regression, fewer queries are needed in the equation solving attack). However, the work in [64] does not consider more recent and more efficient algorithms in the QS scenario. Our work explores this direction through a formalization of the model extraction framework that enables understanding the possibility of extending/improving the active learning attacks presented in [64]. Furthermore, a better understanding of model extraction attacks and their unavoidable connection with active learning is paramount for designing MLaaS systems that are resilient to model extraction.
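To illustrate why confidence values make extraction so cheap, the sketch below recovers the parameters of a logistic regression model from a simulated leaky server. It is a simplified stand-in for the equation solving attack of [64], not their implementation: the server `make_leaky_oracle`, the random Gaussian queries, and the use of d + 1 queries are assumptions of this example.

```python
import numpy as np


def make_leaky_oracle(w, b):
    """Hypothetical leaky server: returns (label, confidence) for a query x."""
    def oracle(x):
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
        return int(p >= 0.5), p
    return oracle


def equation_solving_attack(oracle, d, rng):
    """Recover (w, b) from d + 1 confidence values by solving a linear system."""
    X = rng.normal(size=(d + 1, d))           # d + 1 synthesized queries
    p = np.array([oracle(x)[1] for x in X])   # confidence values leaked by the server
    logits = np.log(p / (1.0 - p))            # inverse sigmoid gives w . x + b
    A = np.hstack([X, np.ones((d + 1, 1))])   # unknowns are the d weights and the bias
    theta = np.linalg.solve(A, logits)
    return theta[:d], theta[d]


rng = np.random.default_rng(0)
w_true, b_true = np.array([0.5, -1.0, 0.25]), 0.1
w_hat, b_hat = equation_solving_attack(make_leaky_oracle(w_true, b_true), 3, rng)
```

With labels only, the inverse-sigmoid step is unavailable, which is why the oracle access setting calls for the active learning strategies discussed in this section.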

3.3 Defense Strategies

Theorem 1 formalizes our main claim that model extraction in the context of MLaaS systems described at the beginning of Section 3 (i.e., oracle access) is equivalent to QS active learning. Therefore, any advancement in the area of QS active learning directly translates to a new threat for MLaaS systems. In this section, we discuss strategies that could be used to make the process of extraction more difficult, such as adding some noise to the returned label, and we investigate the link between machine-learning in the noisy setting and model extraction. We leave the design of a good defense strategy as an open problem, and believe this is an interesting direction for future work where the machine learning and the security communities can fruitfully collaborate.

In this section, we assume that the MLaaS server with the knowledge of , , has the freedom to modify the prediction before forwarding it to the client. More precisely, we assume that there exists a (possibly) randomized procedure that the server uses to compute the answer to a query , and returns that instead of . We use the notation to indicate that the server implements to protect . Clearly, the learner that interacts with can still try to learn a function from the noisy answers from the server; however, because of the added noise, this process could require more queries or could produce a less accurate model (with respect to the undefended case).

3.3.1 Classification case

We focus on the binary classification problem where is a hypothesis class of functions of the form and is binary, but our argument can be easily generalized to the multi-class setting.

First, in the following two remarks we recall two known results from the literature [33] that establish information-theoretic bounds (i.e., the computational cost is ignored) for the number of queries required to extract the model when any defense is implemented. Let be the generalization error of the model known by the server and be the generalization error of the model learned by an adversary interacting with . Assume that the hypothesis class has VC dimension equal to (footnote 8: the VC dimension of a hypothesis class is the largest number such that there exists a subset of size which can be shattered by . An example set is said to be shattered by if .).

Remark 3 (Passive learning).

Assume that the adversary uses a passive learning algorithm to compute , such as the Empirical Risk Minimization algorithm (footnote 9: given a labeled training set , the Empirical Risk Minimization algorithm outputs ). Then, the adversary can learn with excess error (i.e., ) with examples. For any algorithm, there is a distribution such that the algorithm needs at least samples to achieve an excess error of .

Remark 4 (Active learning).

Assume that the adversary uses an active learning algorithm to compute , such as the disagreement-based active learning algorithm [33]. Then, the adversary achieves excess error with queries (where is the disagreement coefficient [33]). For any active learning algorithm, there is a distribution such that it takes at least queries to achieve an excess error of .

Then, we discuss some common defense strategies that unfortunately do not define an effective defense. Observe that any defense strategy used by a server to prevent the extraction of a model can be seen as a randomized procedure that outputs instead of with a given probability over the random coins of . In the discrete case, we represent this with the notation



is the random variable that represents the answer of the server to the query (e.g., ). When the function is fixed, we can consider the supremum of the function , which represents the upper bound for the probability that an answer from is wrong:

Before discussing potential defense approaches, we first present a general negative result. The following proposition states that any candidate defense that correctly responds to a query with probability greater than or equal to for some constant for all instances can be easily broken. Indeed, an adversary that repeatedly queries the same instance can figure out the correct label by simply looking at the most frequent label that is returned from . We prove that with this extraction strategy, the number of queries required increases by only a logarithmic multiplicative factor. More precisely:

Proposition 1.

Let be a hypothesis class used for classification and be an active learning system for in the QS scenario with query complexity . For any , randomized procedure for returning labels, such that there exists with , there exists an adversary that, interacting with , can implement an -extraction attack with confidence and complexity .

The proof of Proposition 1 can be found in Appendix A.1.
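The extraction strategy behind Proposition 1 can be sketched in Python as follows. The wrapper repeats each query and returns the most frequent label. The batch size comes from a standard Hoeffding bound, which is an assumption of this sketch and not necessarily the constant used in Appendix A.1; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def repetitions_needed(rho, delta):
    """Batch size k so that a majority vote over k answers, each wrong with
    probability at most rho (below 1/2), errs with probability at most delta (Hoeffding)."""
    gap = 0.5 - rho
    return math.ceil(math.log(1.0 / delta) / (2.0 * gap * gap))


def majority_vote_oracle(noisy_oracle, k):
    """Wrap a noisy label oracle: ask each query k times, return the most frequent answer."""
    def oracle(x):
        votes = Counter(noisy_oracle(x) for _ in range(k))
        return votes.most_common(1)[0][0]
    return oracle


# Simulated defense: the true label of x is flipped with probability 0.25.
rng = random.Random(0)
true_label = lambda x: 1 if x >= 0 else 0
noisy = lambda x: true_label(x) if rng.random() > 0.25 else 1 - true_label(x)
k = repetitions_needed(rho=0.25, delta=0.01)
denoised = majority_vote_oracle(noisy, k)
```

Any QS active learner can then be run against `denoised` as if the server were noise-free, at the price of k times as many queries. An odd k avoids ties in the binary case.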

Proposition 1 can be used to discuss the following two different defense strategies:

  • Data-independent randomization. Let denote a hypothesis class that is subject to an extraction attack using QS active learning. An intuitive defense for involves adding noise to the query output independent of the labeling function and the input query . In other words, for any , , and is a constant value in the interval . It is easy to see that this simple strategy cannot work. It follows from Proposition 1 that if , then is not secure. On the other hand, if , then the server is useless since it outputs an incorrect label with probability at least 1/2.

Example 5 (Halfspace noisy extraction).

For example, we know that -extraction with any level of confidence can be implemented with complexity using QS active learning for the class i.e. for binary classification via halfspaces (refer Example 4). It follows from the earlier discussion that any defense that flips labels with a constant flipping probability does not work. This defense approach is similar to the case of “noisy oracles” studied extensively in the active learning literature [36, 37, 53]. For example, from the machine-learning literature we know that if the flipping probability is exactly (), the AVERAGE algorithm (similar to our Algorithm 1) -extracts with labels [39]. Under bounded noise where each label is flipped with probability at most (), the AVERAGE algorithm does not work anymore, but a modified Perceptron algorithm can learn with labels [71] in a stream-based active learning setting, and a QS active learning algorithm proposed by Chen et al. [18] can also learn with the same number of labels. An adversary implementing [18] is even more efficient than the adversary defined in the proof of Proposition 1 (i.e., the total number of queries only increases by a constant multiplicative factor instead of ). We validate the practical efficiency of this attack in Section 4.

  • Data-dependent randomization. Based on the outcome of the earlier discussion, we believe that a defense that aims to protect a hypothesis class against model extraction via QS active learning should implement data-dependent perturbation of the returned labels. That is, we are interested in a defense such that the probability depends on the query input and possibly the labeling function . For example, given a class that can be extracted using an active learner (in the QS scenario), if we consider a defense such that for some instances, then the proof of Proposition 1 does not work (the argument only works if there is a constant such that for all ) and the effectiveness of the adversary is not guaranteed anymore (footnote 10: intuitively, in the binary case if then the definition of performed by in step 2 (majority vote) is likely to be wrong. However, notice that this is not always the case in the multiclass setting: for example, consider the case when the answer to query is defined to be wrong with probability and, when wrong, is sampled uniformly at random among the classes that are different from the true class ; then, if is large enough, defined via the majority vote is still likely to be correct).

Example 6 (Halfspace noisy extraction).

For the case of binary classification via halfspaces, Alabdulmohsin et al. [3] design a system that follows this strategy. They consider the class and design a learning rule that uses training data to infer a distribution of models, as opposed to learning a single model. To elaborate, the algorithm learns the mean and the covariance for a multivariate Gaussian distribution on such that any model drawn from provides an accurate prediction. During prediction, when the label for an instance is queried, a new is drawn at random from the learned distribution and the label is computed as . The problem of learning such a distribution of classifiers is formulated as a convex-optimization problem, which can be solved quite efficiently using existing solvers. The authors show that this randomization method can mitigate the risk of reverse engineering without incurring any notable loss in predictive accuracy. In particular, they use PAC active learning algorithms [19, 12] (assuming that the underlying distribution is Gaussian) to learn an approximation from queries answered in three different ways: (a) with their strategy, i.e. using a new model for each query, (b) using a fixed model to compute all labels, and (c) using a fixed model and adding independent noise to each label, i.e. and . They show that the geometric error of with respect to the true model is higher in the first setting (i.e. in (a)) than in the others. When the number of queries is fixed to 1000, on 15 different UCI datasets, their strategy typically gives an error that is an order of magnitude larger. We empirically evaluate this defense in the context of model extraction using QS active learning algorithms in Section 4.
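The prediction step of such a defense can be simulated with the short sketch below. It is not the system of [3]: the mean `mu` and covariance `sigma` are hypothetical values rather than the output of their convex program, the labeling rule (sign of the inner product, labels in {-1, +1}) is assumed, and the flip rate is measured against the label of the mean model.

```python
import numpy as np


def make_randomized_server(mu, sigma, rng):
    """Simulated server: draws a fresh halfspace w ~ N(mu, sigma) for every query."""
    def answer(x):
        w = rng.multivariate_normal(mu, sigma)
        return 1 if np.dot(w, x) >= 0 else -1   # assumed labeling rule: sign of w . x
    return answer


def estimate_flip_rate(server, x, mu, trials=1000):
    """Fraction of answers to x that disagree with the label given by the mean model."""
    reference = 1 if np.dot(mu, x) >= 0 else -1
    answers = [server(x) for _ in range(trials)]
    return sum(a != reference for a in answers) / trials


rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0])
sigma = 0.05 * np.eye(2)                        # hypothetical learned covariance
server = make_randomized_server(mu, sigma, rng)
far = estimate_flip_rate(server, np.array([1.0, 0.0]), mu)     # far from the boundary
near = estimate_flip_rate(server, np.array([0.01, 1.0]), mu)   # close to the boundary
```

In this toy setting the answers are almost never flipped far from the decision boundary and are flipped close to half of the time near it, which is the data-dependent behaviour that defeats a plain majority vote.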

3.3.2 Continuous case

We discuss here the difficulties in generalizing Proposition 1 to the continuous case, i.e. when the target model held by the MLaaS server is a real-valued function ; a thorough investigation is left for future work. As already noticed, the fact that MLaaS server implements a defense strategy means that its answer to a query can be represented by a random variable . For the discrete case, we capture the probability of being wrong (i.e., being different from the value ) via  (2); on the other hand this formalization is not the natural one when considering real-valued models: In the continuous case an answer from the server can be considered correct when it is “close enough” to the value (even if it is not equal to it). In other words, we are now interested in


instead of as defined in (2). Define . Intuitively, any candidate defense for which there exists a function such that is small (i.e., close to zero) should be easy to break when is used to protect . This is because small implies that is a good approximation of and an adversary that queries an instance several times can efficiently compute an approximation of using known results, e.g. [44]; then, can learn using the labeled points . The difficulty in formalizing this argument comes from the fact that, in general, it is complicated to bound the error of as an estimate of (while in the classification case the Chernoff bound can be used with the majority vote), and moreover it is hard to predict the effect of the estimation error on the output of a learning system.

4 Implementation and Evaluation

We carried out experiments to validate our claims that query synthesis active learning can be used to successfully perform model extraction. Our experiments are designed to answer the following three questions:

  1. Is active learning practically useful in settings without any auxiliary information, such as confidence values i.e. in an oracle access setting?

  2. Is active learning useful in scenarios where the oracle is able to perturb the output i.e. in a data-independent randomization setting?

  3. Is active learning useful in scenarios where the oracle is able to perform more subtle perturbations i.e. in a data-dependent randomization setting?

To answer these questions, we focused on the hypothesis class of -dimensional half spaces and we implemented two QS algorithms [4, 18]. To perform model extraction, we use the aforementioned algorithms to learn an approximation , and terminate execution when . For our experiments, the halfspace held by the server/oracle (i.e., the optimal hypothesis ) was learned using Python’s scikit-learn library. All experiments were executed on an Ubuntu 16.04 server with 32 GB RAM and an Intel i5-6600 CPU clocked at 3.30GHz. In the context of the three questions, our experiments suggest that:

  1. QS active learning algorithms are efficient for model extraction, with low query complexity and run-time. For the digits dataset (), the dataset with the largest value of which we evaluated on, the active learning algorithm implemented required 900 queries () to extract the halfspace with geometric error .

  2. QS active learning algorithms are also efficient when the oracle flips the labels independently with constant probability . This only moderately increases the query complexity (for low values of ). For the digits dataset (), with , the implemented algorithm required 36546 ( worth of) queries to extract the halfspace with geometric error .

  3. State-of-the-art QS algorithms fail to recover the model when the oracle responds to queries using tailored model randomization techniques (refer subsection 3.3). However, passive learning algorithms (refer Algorithm 1) are effective in this setting.

4.1 Results

We begin by describing evaluation results for the aforementioned three questions. We conclude by comparing our approach with the algorithm proposed by Lowd and Meek [47].

Figure 3: Number of queries needed for halfspace extraction using the version space approximation algorithm. Note that the asymptotic query complexity for this algorithm is . This explains the increase in query complexity as a function of .

Q1. Usefulness in an oracle access setting: We implemented Version Space Approximation proposed by Alabdulmohsin et al. [4] in approximately 50 lines of MATLAB. This algorithm operates iteratively, based on the principles of version space learning, i.e. searching a predefined space of hypotheses [49] (footnote 11: a version space [50] is a hierarchical representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering the specific examples). In each iteration, the algorithm first approximates a version space, and then synthesizes an instance that reduces this approximated version space quickly. The final query complexity for this algorithm is .

Figure 3 plots the number of queries needed to extract a halfspace as a function of termination criterion/geometric error . As discussed earlier, the query complexity is dependent on the dimensionality of the halfspace to be extracted. Across all values of , observe that with the exponential decrease in error , the increase in query complexity is linear - often by a small factor (). The implemented query synthesis algorithm involves solving a convex optimization problem to approximate the version space, an operation that is potentially time consuming. However, based on several runs of our experiment, we noticed that the algorithm always converges in minutes (for our choice of operational parameters). The cost of extraction is based on current pricing standards.

While the equation solving attack proposed by Tramèr et al. [64] requires fewer queries, it also requires the actual value of the prediction output i.e. as a confidence value. On the other hand, extraction using query synthesis does not rely on any auxiliary information returned by the MLaaS server to increase its efficiency i.e. the only input needed for query synthesis-based extraction attacks is sign(). Such extraction attacks do not rely on any distributional assumption made on the input space.

Q2. Resilience to data-independent noise: An intuitive defense against model extraction might be to flip the sign of the prediction output with independent probability i.e. if the output , then (refer subsection 3.3). This setting (i.e., noisy oracles) is extensively studied in the machine learning community. Trivial solutions, such as repeatedly sampling the same query to obtain a batch and using majority voting to determine the right label, can be employed; if the probability that the outcome of the vote is correct is represented as , then the batch size needed for the voting procedure is , i.e. there is an increase in query complexity by a (multiplicative) factor , an expensive proposition. While other solutions exist [70, 52], we implemented the dimension coupling framework proposed by Chen et al. [18] in approximately 150 lines of MATLAB. The dimension coupling framework reduces a dimensional learning problem to lower-dimensional sub-problems. It then appropriately aggregates the results to produce a halfspace. This approach is resilient to noise, i.e. the oracle can flip the label with constant probability (known a priori) , and the algorithm will converge with probability . The query complexity for this algorithm is .

Figure 4 (panels: (a) Adult Income, (b) Breast Cancer, (c) Digits, (d) Wine): Number of queries needed for halfspace extraction using the dimension coupling algorithm. Note that the asymptotic query complexity for this algorithm is . This explains the increase in query complexity as a function of . Refer to footnote 12 for the relation between the number of queries and .

The results of our experiment are presented in Figure 4. The algorithm is successful in extracting the halfspace for a variety of values, with only a modest increase in complexity in comparison to the noise-free setting (footnote 12: the exact bound is , where is a function of that is approximately . Thus, there is a multiplicative increase in the number of queries with increase in ). The time (and number of queries) taken for convergence is proportional to , ranging from minutes for successful completion. For the worst query complexity (Fig. 4(c)), the cost of extraction is .

Q3. Resilience to data-dependent noise: As alluded to in subsection 3.3, another defense against extraction involves learning a family of functions very similar to such that they all provide accurate predictions with high probability. Proposed by Alabdulmohsin et al. [3], data-dependent randomization enables the MLaaS server to sample a random function for each query i.e. for each instance , the MLaaS server obtains a new and responds with . Thus, this approach can be thought of as flipping the sign of the prediction output with probability (defined in subsection 3.3).

In this algorithm, a separation parameter determines how close the samples from are; the larger the value of , the closer the samples are (refer Section 4 in [3] for more details). We measure the value of as a function of for those values generated by the dimension coupling algorithm. is estimated by (a) obtaining , for , and using them to classify to obtain , and (b) obtaining the percentage of the prediction outputs that are not equal to . Our hope was that if the value of , then an adversary similar to defined in Proposition 1 could be used to perform extraction.

Figure 5: ; synthesized by the dimension coupling algorithm.
Figure 6: Distance of the instances synthesized by (a) the version space approximation algorithm, and (b) the dimension coupling algorithm from the optimal halfspace.

Figure 5 suggests otherwise; the average value of for some small . Since any adversary will be unable to determine a priori the inputs for which this value is greater than half, neither majority voting, nor the vanilla dimension coupling framework will help extract the halfspace. We believe this is the case for current state-of-the-art algorithms as the instances they synthesize are close to the optimal halfspace (footnote 13: observe that the distance of a point from a halfspace is given by ). To validate this claim, we measure this distance for both the algorithms [4, 18] and plot it in Figure 6. Observe that a majority of the points are very close to the halfspace in both cases.
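The distance measurement behind Figure 6 can be reproduced with a few lines of Python. The sketch assumes the standard point-to-hyperplane formula, |w . x + b| / ||w||, for a halfspace with normal vector w and offset b; the function name is illustrative.

```python
import math


def distance_to_halfspace(w, x, b=0.0):
    """Euclidean distance from the point x to the boundary of the halfspace w . x + b >= 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


# A synthesized query lying exactly on the decision boundary has distance 0.
on_boundary = distance_to_halfspace([1.0, 0.0], [0.0, 5.0])
off_boundary = distance_to_halfspace([3.0, 4.0], [1.0, 1.0], b=-2.0)
```

Queries with a small distance are the ones for which a randomized defense is most likely to return the wrong label.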

Such forms of data-dependent randomization, however, are not secure against traditional passive learning algorithms. One such algorithm (Algorithm 1) takes as input an estimated upper bound of . The algorithm first draws instances from the dimensional unit sphere uniformly at random, and proceeds to have them labeled - by the oracle defined in [3] in this case. It then computes the average . , the direction of , is the algorithm’s estimate of the classifier , and the length of is used as an indicator of whether the algorithm succeeds: if this estimated upper bound is correct (i.e. ), then with high probability, ; otherwise it outputs fail, indicating the variance bound is incorrect. In such situations, we can reduce and try again. A detailed proof of the algorithm’s guarantees is available in Appendix A.2.

Algorithm 1 Passive Learning Algorithm
1: Input: variance upper bound , target error
3: Draw uniformly at random, and query their labels
5: if  then
6:     Return
7: else
8:     Return fail
9: end if
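The averaging idea behind Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure analysed in Appendix A.2: the sample size `m` and the length `threshold` are hypothetical parameters standing in for the quantities that Algorithm 1 derives from the variance bound and the target error, and labels are assumed to lie in {-1, +1}.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a point uniformly at random from the d-dimensional unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def average_algorithm(oracle, d, m, threshold, rng):
    """Average the labeled random instances y * x; return the direction of the
    average, or None (standing for 'fail') when the average is too short."""
    v = [0.0] * d
    for _ in range(m):
        x = random_unit_vector(d, rng)
        y = oracle(x)                     # label in {-1, +1}, possibly randomized
        for i in range(d):
            v[i] += y * x[i] / m
    length = math.sqrt(sum(c * c for c in v))
    if length < threshold:
        return None
    return [c / length for c in v]


# Noise-free halfspace with normal (1, 0, 0) as a stand-in for the defended server.
rng = random.Random(0)
w_hat = average_algorithm(lambda x: 1 if x[0] >= 0 else -1,
                          d=3, m=2000, threshold=0.1, rng=rng)
```

Because every query is drawn independently of the answers, the loop can be split across workers and the partial sums merged, which is what makes the parallel execution mentioned in the evaluation possible.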
Figure 7: log(Number of queries) needed for halfspace extraction (protected by the defense strategy proposed in [3]) using Algorithm 1. Note that the asymptotic query complexity for this algorithm is . This explains the increase in query complexity as a function of and . The large value of dominates the query complexity in this algorithm.

While the asymptotic bounds for Algorithm 1 are larger than the active learning algorithms discussed thus far, the constant can be reduced by a multiplicative factor to reduce the total number of queries used i.e. or etc. In Figure 7, we observe that extracting halfspaces with geometric error requires queries (with an estimated cost of ). While achieving requires queries (cost of ), the algorithm can be executed in parallel enabling faster run-times.

Dataset | # Queries | Geometric Error
Wine | 189 | 0.071
Breast Cancer | 940 | 0.162
Digits | 1879 | 0.665
Table 1: Number of queries and geometric error observed after extracting halfspaces using the line search procedure proposed by Lowd and Meek. Observe that the geometric error in some cases is large.

Lowd and Meek Baseline: The algorithm proposed by Lowd and Meek [47] can also be used to extract a halfspace. However, note that this algorithm can only operate in a noise-free setting. From Table 1, one can observe that the number of queries required to extract the halfspace is larger than for the query synthesis algorithms we implemented. For example, consider the breast cancer dataset. The version space algorithm is able to extract a halfspace at a distance of with 400 queries. The geometric error of the extracted halfspaces is also higher than that of the halfspaces extracted in the query synthesis case. The query complexity of the Lowd and Meek algorithm is , where ( is the -th coordinate of the ground-truth classifier ). This is worse than the query complexity of classical active learning algorithms. While this algorithm is not tailored to minimize the geometric error, we believe that these results further validate our claim that query synthesis active learning is a promising direction to explore in connection to model extraction.

5 Discussion

We begin our discussion by highlighting algorithms an adversary could use if the assumptions made about the operational ecosystem are relaxed. Then, we discuss strategies that can potentially be used to make the process of extraction more difficult, and shortcomings in our approach.

5.1 Varying the adversary’s capabilities

The operational ecosystem in this work is one where the adversary is able to synthesize data-points de novo to extract a model through oracle access. In this section, we discuss other algorithms an adversary could use if this assumption is relaxed. We begin by discussing other models an adversary can learn in the query synthesis regime, and move on to discussing algorithms in other approaches.

Equivalence queries. In her seminal work, Angluin [6] proposes a learning algorithm, L*, to correctly learn a regular set from any minimally adequate teacher in polynomial time. For this to work, however, equivalence queries are also needed along with membership queries. Should MLaaS servers provide responses to such equivalence queries, different extraction attacks could be devised. To learn linear decision boundaries, Wang et al. [68] first synthesize an instance close to the decision boundary using labeled data, and then select the real instance closest to the synthesized one as a query. Similarly, Awasthi et al. [9] study learning algorithms that make queries that are close to examples generated from the data distribution. These attacks require the adversary to have access to some subset of the original training data. In other domains, program synthesis using input-output example pairs [31, 67, 25, 57] also follows a similar principle.

If the adversary had access to a subset of the training data, or had prior knowledge of the distribution from which this data was drawn, it could launch a different set of attacks based on the algorithms discussed below.

Stream-based selective sampling. Atlas et al. [8] propose selective sampling as a form of directed search (similar to Mitchell [49]) that can greatly increase the ability of a connectionist network (applied to power system security analysis in their paper) to generalize accurately. Dagan et al. [20] propose a method for training probabilistic classifiers by choosing those examples from a stream that are more informative. Lindenbaum et al. [45] present a lookahead algorithm for selective sampling of examples for nearest neighbor classifiers. The algorithm looks for the example with the highest utility, taking its effect on the resulting classifier into account. Another important application of selective sampling is feature selection [46], an important preprocessing step. Other applications of stream-based selective sampling include sensor scheduling [40], learning ranking functions for information retrieval [72], and word sense disambiguation [30].

Pool-based sampling. Dasgupta [23] surveys active learning in the non-separable case, with a special focus on statistical learning theory. He claims that in this setting, AL algorithms usually follow one of the following two strategies - (i) efficient search in the hypothesis space (as in the algorithm proposed by Chen et al. [18], or by Cohn et al. [19]), or (ii) exploiting clusters in the data (as in the algorithm proposed by Dasgupta et al. [24]). The latter option can be used to learn more complex models, such as decision trees. As the ideal halving algorithm is difficult to implement in practice, pool-based approximations are used instead, such as uncertainty sampling and the query-by-committee (QBC) algorithm [29, 62, 16]. Unfortunately, such approximation methods are only guaranteed to work well if the number of unlabeled examples (i.e. pool size) grows exponentially fast with each iteration. Otherwise, such heuristics become crude approximations and they can perform quite poorly.
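As a minimal illustration of the pool-based heuristics mentioned above, the sketch below performs one step of uncertainty sampling for a linear model: it selects the pool instance that lies closest to the current decision boundary. The function name and the use of the unnormalized margin |w . x + b| as the uncertainty score are assumptions of this example.

```python
def uncertainty_sample(pool, w, b=0.0):
    """Return the pool instance about which the current linear model is least certain,
    i.e. the one with the smallest absolute margin |w . x + b|."""
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return min(pool, key=margin)


pool = [(2.0, 0.0), (0.1, 3.0), (-1.0, 1.0)]
query = uncertainty_sample(pool, w=(1.0, 0.0))
```

The quality of the selected query is limited by what the pool happens to contain, which is the reason the pool must grow quickly for such heuristics to approximate the halving algorithm well.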

5.2 Complex Models

A natural question that arises is as follows: how do our results apply to complex models, such as deep neural networks (DNNs)? Active learning algorithms considered in this paper work in an iterative fashion. Let be the entire hypothesis class. At time let the set of possible hypotheses be . Usually an active-learning algorithm issues a query at time and updates the set of possible hypotheses to , which is a subset of . Once the size of is “small” the algorithm stops. Analyzing the effect of a query on the set of possible hypotheses is very complicated in the context of complex models, such as DNNs. We believe this is a very important and interesting direction for future work. However, our discussion on defenses is applicable to complex models because it essentially treats the classifier as a black-box (i.e., we do not use the internal structure of the classifier in our analysis).
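For a simple hypothesis class, this iterative shrinking of the set of possible hypotheses is easy to make explicit. The sketch below uses one-dimensional threshold functions, an example chosen here for illustration and not taken from the paper: the set of consistent hypotheses is an interval of thresholds, and each synthesized query halves it.

```python
def learn_threshold(oracle, lo=0.0, hi=1.0, eps=1e-3):
    """Learn a threshold t in (lo, hi] from an oracle that returns 1 iff x >= t.
    The interval [lo, hi] is the set of thresholds consistent with all answers so far."""
    queries = 0
    while hi - lo > eps:
        x = (lo + hi) / 2.0     # synthesized query that halves the consistent set
        if oracle(x) == 1:      # x is at or above the threshold
            hi = x
        else:                   # x is below the threshold
            lo = x
        queries += 1
    return (lo + hi) / 2.0, queries


secret_threshold = 0.3141
estimate, queries = learn_threshold(lambda x: 1 if x >= secret_threshold else 0)
```

Here the effect of each query on the consistent set is trivial to analyse, and the number of queries grows only logarithmically in 1/eps. For DNNs, no comparably simple description of the consistent set is available.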

5.3 Model Transferability

Most work in active learning has assumed that the correct hypothesis space for the task is already known, e.g. whether the model being learned is a logistic regression, a neural network, and so on. In such situations, observe that the labeled data being used is biased, in that it is implicitly tied to the underlying hypothesis. Thus, it can become problematic if one wishes to re-use the labeled data chosen to learn another, different hypothesis space. This leads us to model transferability (footnote 14: a special case of agnostic active learning [11]), a less studied form of defense where the oracle responds to any query with the prediction output from an entirely different hypothesis class. For example, imagine if a learner tries to learn a halfspace, but the teacher performs prediction using a boolean decision tree. Initial work in this space includes that of Shi et al. [59], where an adversary can steal a linear separator by learning input-output relations using a deep neural network. However, the performance of query synthesis active learning in such ecosystems is unclear.

5.4 Limitations

We stress that these limitations are not a function of our specific approach, and stem from the theory of active learning.

  1. As noted by Dasgupta [22], the label complexity of PAC active learning depends heavily on the specific target hypothesis, and can range from $O(\log \frac{1}{\epsilon})$ to $\Omega(\frac{1}{\epsilon})$, where $\epsilon$ is the target error. Similar results have been obtained by others [34, 51]. This suggests that for some hypothesis classes, the query complexity of active learning algorithms is as high as in the passive setting.

  2. Some query synthesis algorithms assume that there is some labeled data to bootstrap the system. However, this may not always be true, and randomly generating these labeled points may adversely impact the performance of the algorithm.

  3. For our particular implementation, the algorithms proposed rely on the geometric error between the optimal and learned halfspaces. Oftentimes, however, there is no direct correlation between this geometric error and the generalization error used to measure the model’s goodness.
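The gap described in the third limitation can be illustrated with a small sketch. It is not taken from our implementation: it assumes homogeneous halfspaces in two dimensions, takes the geometric error to be the Euclidean distance between unit normal vectors, and estimates the generalization error as the fraction of sample points on which the two halfspaces disagree. The names `spread` and `near_boundary` are hypothetical data sets chosen to make the contrast visible.

```python
import math


def predict(w, x):
    """Homogeneous halfspace: label 1 if w . x > 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0


def geometric_error(w_true, w_learned):
    """Euclidean distance between the two unit normal vectors."""
    return math.dist(w_true, w_learned)


def disagreement(w_true, w_learned, points):
    """Empirical generalization error: fraction of points labelled differently."""
    differing = sum(predict(w_true, x) != predict(w_learned, x) for x in points)
    return differing / len(points)


def on_circle(angles):
    """Map angles (in radians) to points on the unit circle."""
    return [(math.cos(a), math.sin(a)) for a in angles]


delta = 0.1                                   # angle between the two normals
w_true = (1.0, 0.0)
w_learned = (math.cos(delta), math.sin(delta))

n = 1000
# Inputs spread evenly around the circle.
spread = on_circle(2 * math.pi * (i + 0.5) / n for i in range(n))
# Inputs concentrated in the thin wedge between the two decision boundaries.
near_boundary = on_circle(math.pi / 2 + delta * (i + 0.5) / n for i in range(n))

print(geometric_error(w_true, w_learned))
print(disagreement(w_true, w_learned, spread))
print(disagreement(w_true, w_learned, near_boundary))
```

The same pair of halfspaces, with the same small geometric error, disagrees on only a few percent of the evenly spread inputs but on every input drawn from the wedge between the two boundaries. Whether a small geometric error implies a small generalization error therefore depends on the data distribution.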

6 Related Work

Machine learning algorithms and systems are optimized for performance. Little attention is paid to the security and privacy risks of these systems and algorithms. Our work is motivated by the following attacks against machine learning.

1. Causative Attacks: These attacks are primarily geared at poisoning the training data used for learning, such that the resulting classifier performs erroneously at test time. These include: (a) mislabeling the training data, (b) changing rewards in the case of reinforcement learning, or (c) modifying the sampling mechanism (to add some bias) such that it does not reflect the true underlying distribution in the case of unsupervised learning [56]. Papernot et al. [55] modify input features, resulting in misclassification by deep neural networks.

2. Evasion Attacks: Once the algorithm has been trained successfully, these attacks provide tailored inputs such that the output is erroneous. These noisy inputs often preserve the semantics of the original inputs, are imperceptible to humans, or are physically realizable. The well-studied area of adversarial examples is an instantiation of such an attack. Moreover, evasion attacks can even be black-box, i.e., the attacker need not know the model. This is because an adversarial example optimized for one model is highly likely to be effective against other models. This concept, known as transferability, was introduced by Carlini et al. [17]. Notable works in this space include [14, 26, 41, 42, 63, 54, 69, 27].

3. Exploratory Attacks: These attacks are the primary focus of this work, and are geared at learning intrinsics of the algorithm used for training. These intrinsics can include model parameters, hyperparameters, or training data. Typically, these attacks fall into two categories: model inversion and model extraction. In the first class, Fredrikson et al. [28] show that an attacker can learn sensitive information about the dataset used to train a model, given access to side-channel information about the dataset. In the second class, Tramer et al. [64] provide attacks to learn the parameters of a model hosted on the cloud through a query interface. In an attack termed membership inference, Shokri et al. [60] learn about the training data used for machine learning by training their own inference models. Wang et al. [66] propose attacks to learn a model’s hyperparameters.

7 Conclusions

In this paper, we formalize model extraction in the context of Machine-Learning-as-a-Service (MLaaS) servers that return only prediction values (i.e., oracle access setting), and we study its relation with query synthesis active learning (Theorem 1). By doing so, we are able to implement efficient attacks (i.e., attacks that require few queries) to the class of halfspace models used for binary classification (Section 4). While our experiments focus on the class of halfspace models, we believe that extraction via active learning can be extended to multiclass and non-linear models such as deep neural networks, random forests, etc. We also begin exploring possible defense approaches (subsection 3.3). To the best of our knowledge, this is the first work to formalize security in the context of MLaaS systems. We believe this is a fundamental first step in designing more secure MLaaS systems. Finally, we suggest that data-dependent randomization (e.g., model randomization as in [3]) is the most promising direction to follow in order to design effective defenses.


  • [1], 2017.
  • [2], 2018.
  • [3] Ibrahim M. Alabdulmohsin, Xin Gao, and Xiangliang Zhang. Adding robustness to support vector machines against adversarial reverse engineering. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 231–240, 2014.
  • [4] Ibrahim M Alabdulmohsin, Xin Gao, and Xiangliang Zhang. Efficient active learning of halfspaces via query synthesis. In AAAI, pages 2483–2489, 2015.
  • [5] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, 1987.
  • [6] Dana Angluin. Learning regular sets from queries and counterexamples. Information and computation, 75(2):87–106, 1987.
  • [7] Giuseppe Ateniese, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. IJSN, 10(3):137–150, 2015.
  • [8] Les E Atlas, David A Cohn, and Richard E Ladner. Training connectionist networks with queries and selective sampling. In Advances in neural information processing systems, pages 566–573, 1990.
  • [9] Pranjal Awasthi, Vitaly Feldman, and Varun Kanade. Learning using local membership queries. In Conference on Learning Theory, pages 398–431, 2013.
  • [10] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. J. Comput. Syst. Sci., 75(1):78–89, 2009.
  • [11] Maria-Florina Balcan, Alina Beygelzimer, and John Langford. Agnostic active learning. Journal of Computer and System Sciences, 75(1):78–89, 2009.
  • [12] Maria-Florina Balcan, Andrei Z. Broder, and Tong Zhang. Margin based active learning. In Learning Theory, 20th Annual Conference on Learning Theory, COLT 2007, San Diego, CA, USA, June 13-15, 2007, Proceedings, pages 35–50, 2007.
  • [13] Maria-Florina Balcan and Philip M. Long. Active and passive learning of linear separators under log-concave distributions. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, pages 288–316, 2013.
  • [14] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Black-box attacks on deep neural networks via gradient estimation. 2018.
  • [15] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
  • [16] Klaus Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 59–66, 2003.
  • [17] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
  • [18] Lin Chen, Seyed Hamed Hassani, and Amin Karbasi. Near-optimal active learning of halfspaces via query synthesis in the noisy setting. In AAAI, pages 1798–1804, 2017.
  • [19] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine learning, 15(2):201–221, 1994.
  • [20] Ido Dagan and Sean P Engelson. Committee-based sampling for training probabilistic classifiers. In Proceedings of the Twelfth International Conference on Machine Learning, pages 150–157. The Morgan Kaufmann series in machine learning,(San Francisco, CA, USA), 1995.
  • [21] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS, 2007.
  • [22] Sanjoy Dasgupta. Coarse sample complexity bounds for active learning. In Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada], pages 235–242, 2005.
  • [23] Sanjoy Dasgupta. Two faces of active learning. Theoretical computer science, 412(19):1767–1781, 2011.
  • [24] Sanjoy Dasgupta, Daniel J Hsu, and Claire Monteleoni. A general agnostic active learning algorithm. In Advances in neural information processing systems, pages 353–360, 2008.
  • [25] Dana Drachsler-Cohen, Sharon Shoham, and Eran Yahav. Synthesis with abstract examples. In International Conference on Computer Aided Verification, pages 254–278. Springer, 2017.
  • [26] Gamaleldin F Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial examples that fool both human and computer vision. arXiv preprint arXiv:1802.08195, 2018.
  • [27] Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945, 1, 2017.
  • [28] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In USENIX Security Symposium, pages 17–32, 2014.
  • [29] Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm. Machine learning, 28(2):133–168, 1997.
  • [30] Atsushi Fujii, Takenobu Tokunaga, Kentaro Inui, and Hozumi Tanaka. Selective sampling for example-based word sense disambiguation. Computational Linguistics, 24(4):573–597, 1998.
  • [31] Sumit Gulwani. Synthesis from examples: Interaction models and algorithms. In Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2012 14th International Symposium on, pages 8–14. IEEE, 2012.
  • [32] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, 2007.
  • [33] Steve Hanneke. Theory of disagreement-based active learning. Foundations and Trends in Machine Learning, 7(2-3):131–309, 2014.
  • [34] Tibor Hegedűs. Generalized teaching dimensions and the query complexity of learning. In Proceedings of the eighth annual conference on Computational learning theory, pages 108–117. ACM, 1995.
  • [35] Ling Huang, Anthony D. Joseph, Blaine Nelson, Benjamin I. P. Rubinstein, and J. D. Tygar. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec 2011, Chicago, IL, USA, October 21, 2011, pages 43–58, 2011.
  • [36] Matti Kääriäinen. Active learning in the non-realizable case. In Algorithmic Learning Theory, 17th International Conference, ALT 2006, Barcelona, Spain, October 7-10, 2006, Proceedings, pages 63–77, 2006.
  • [37] Richard M. Karp and Robert Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 881–890, 2007.
  • [38] Ross D King, Jem Rowland, Stephen G Oliver, Michael Young, Wayne Aubrey, Emma Byrne, Maria Liakata, Magdalena Markham, Pinar Pir, Larisa N Soldatova, et al. The automation of science. Science, 324(5923):85–89, 2009.
  • [39] Adam R. Klivans and Pravesh Kothari. Embedding hard learning problems into gaussian space. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain, pages 793–809, 2014.
  • [40] Vikram Krishnamurthy. Algorithms for optimal scheduling and management of hidden markov model sensors. IEEE Transactions on Signal Processing, 50(6):1382–1397, 2002.
  • [41] Alex Kurakin, Dan Boneh, Florian Tramèr, Ian Goodfellow, Nicolas Papernot, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. 2018.
  • [42] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
  • [43] Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993.
  • [44] Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 665–674, 2016.
  • [45] Michael Lindenbaum, Shaul Markovitch, and Dmitry Rusakov. Selective sampling for nearest neighbor classifiers. In AAAI/IAAI, pages 366–371. Citeseer, 1999.
  • [46] Huan Liu, Hiroshi Motoda, and Lei Yu. A selective sampling approach to active feature selection. Artificial Intelligence, 159(1-2):49–74, 2004.
  • [47] Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, August 21-24, 2005, pages 641–647, 2005.
  • [48] Andrew McCallum and Kamal Nigam. Employing EM and pool-based active learning for text classification. In Proceedings of the Fifteenth International Conference on Machine Learning, Madison, Wisconsin, USA, July 24-27, 1998, pages 350–358, 1998.
  • [49] Tom M Mitchell. Generalization as search. Artificial intelligence, 18(2):203–226, 1982.
  • [50] Tom Michael Mitchell. Version spaces: an approach to concept learning. Technical report, STANFORD UNIV CALIF DEPT OF COMPUTER SCIENCE, 1978.
  • [51] Mohammad Naghshvar, Tara Javidi, and Kamalika Chaudhuri. Noisy bayesian active learning. In Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on, pages 1626–1633. IEEE, 2012.
  • [52] Robert Nowak. Noisy generalized binary search. In Advances in neural information processing systems, pages 1366–1374, 2009.
  • [53] Robert D. Nowak. The geometry of generalized binary search. IEEE Trans. Information Theory, 57(12):7893–7906, 2011.
  • [54] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
  • [55] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
  • [56] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
  • [57] Hila Peleg, Shachar Itzhaky, and Sharon Shoham. Abstraction-based interaction model for synthesis. In International Conference on Verification, Model Checking, and Abstract Interpretation, pages 382–405. Springer, 2018.
  • [58] Burr Settles. Active learning literature survey. Technical Report 1648, University of Wisconsin-Madison, Madison, WI, 2009.
  • [59] Yi Shi, Yalin Sagduyu, and Alexander Grushin. How to steal a machine learning classifier with deep learning. In Technologies for Homeland Security (HST), 2017 IEEE International Symposium on, pages 1–5. IEEE, 2017.
  • [60] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 3–18. IEEE, 2017.
  • [61] Nedim Srndic and Pavel Laskov. Practical evasion of a learning-based classifier: A case study. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014, pages 197–211, 2014.
  • [62] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov):45–66, 2001.
  • [63] Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.
  • [64] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction apis. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016., pages 601–618, 2016.
  • [65] Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
  • [66] Binghui Wang and Neil Zhenqiang Gong. Stealing hyperparameters in machine learning. arXiv preprint arXiv:1802.05351, 2018.
  • [67] Chenglong Wang, Alvin Cheung, and Rastislav Bodik. Interactive query synthesis from input-output examples. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 1631–1634. ACM, 2017.
  • [68] Liantao Wang, Xuelei Hu, Bo Yuan, and Jianfeng Lu. Active learning via query synthesis and nearest neighbour search. Neurocomputing, 147:426–434, 2015.
  • [69] David Warde-Farley and Ian Goodfellow. Adversarial perturbations of deep neural networks. Perturbations, Optimization, and Statistics, page 311, 2016.
  • [70] Songbai Yan, Kamalika Chaudhuri, and Tara Javidi. Active learning from imperfect labelers. In Advances in Neural Information Processing Systems, pages 2128–2136, 2016.
  • [71] Songbai Yan and Chicheng Zhang. Revisiting perceptron: Efficient and label-optimal learning of halfspaces. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 1056–1066, 2017.
  • [72] Hwanjo Yu. Svm selective sampling for ranking with application to data retrieval. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 354–363. ACM, 2005.
  • [73] Chicheng Zhang and Kamalika Chaudhuri. Beyond disagreement-based agnostic active learning. In Advances in Neural Information Processing Systems, pages 442–450, 2014.

Appendix A Appendix