Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints

03/10/2019 ∙ by Xing Hu, et al. ∙ berkeley college 0

As neural networks continue their reach into nearly every aspect of software operations, the details of those networks become an increasingly sensitive subject. Even those that deploy neural networks embedded in physical devices may wish to keep the inner workings of their designs hidden, either to protect their intellectual property or as a form of protection from adversarial inputs. The specific problem we address is how, through a heavy system stack and given noisy and imperfect memory traces, one might reconstruct the neural network architecture, including the set of layers employed, their connectivity, and their respective dimension sizes. Considering both the intra-layer architecture features and the inter-layer temporal association information introduced by empirical DNN design experience, we draw upon ideas from speech recognition to solve this problem. We show that off-chip memory address traces and PCIe events provide ample information to reconstruct such neural network architectures accurately. We are the first to propose such accurate model extraction techniques and demonstrate an end-to-end attack experimentally in the context of an off-the-shelf Nvidia GPU platform with a full system stack. Results show that the proposed techniques achieve high reverse engineering accuracy and improve one's ability to conduct targeted adversarial attacks, raising the success rate from 14.6%∼25.5% (without network architecture knowledge) to 75.9% (with the extracted network architecture).




1 Introduction

Machine learning approaches, especially deep neural networks (DNNs), are transforming a wide range of application domains, such as computer vision [1, 2, 3], speech recognition [4], and language processing [5, 6, 7]. Computer vision in particular has seen commercial adoption of DNNs with impacts across the automotive industry, business service, consumer market, agriculture, government sector, and so forth [8]. For example, autonomous driving, with $77 billion projected in revenue by 2035, has attracted the attention of giants including Tesla, Audi, and Waymo [9, 10, 11, 12]. Despite the rising opportunities for DNNs to benefit our life [13], the security problems introduced by DNN systems have emerged as an urgent and severe problem, especially for mission critical applications [14, 15, 16, 17].

As DNN models become more important in system design, protecting DNN model architecture information becomes more critical, due to both security concerns and intellectual property protection. Black-box DNNs, which encapsulate the internal model characteristics, have become the mainstream in the AI development community. By extracting the model information, attackers can not only counterfeit the intellectual property of the DNN design, but also conduct more efficient adversarial attacks against the DNN system [18, 19]. The commonly used deployment strategy, “Cloud Training Edge Inference”, makes model extraction more destructive. It is appealing for attackers to mount physical attacks on edge devices, because the success of hacking one device can be leveraged to unlock many other devices sharing the same neural network model.

The AI community recognizes the importance of neural network security, and many studies have appeared. Prior studies mainly conduct model extraction by detecting the decision boundary of the victim black-box DNN model [20, 21]. Nevertheless, such an approach demands significant computational resources and a huge time overhead: even given prior knowledge of the total number of layers and their types, it still takes 40 GPU-days to search a 7-layer network architecture with a simple chain topology [19]. Even worse, this approach cannot accommodate state-of-the-art DNNs with complex topologies, e.g., DenseNet [22] and ResNet [23], due to the enlarged search space of possible network architectures.

Real attackers actually have more information at their disposal than might be suggested by algorithm-centric attacks. We show that a more systematic approach, using architecturally visible information, has a great deal of power. A complete DNN system includes several components: the DNN model, the software framework, and the hardware platform. These components are not independent of each other. Information leakage from the hardware platform, for example, could expose the kernel events in the DNN model execution and potentially unveil the entire DNN model.

Unlike attacks on accelerators [24], attacks on GPU platforms are much more challenging because of the heavier system stack and deeper memory hierarchy. With the system stack, DNN layers are transformed into many GPU kernels dynamically at run time (Figure 3). For example, a single Conv layer can end up as more than 10 GPU kernels under different implementations (e.g., Winograd, Fourier); with 65-1500 kernels for a typical DNN, it is difficult even to determine the number of layers and their boundaries, let alone their structure and connections. Accelerator attacks [24] do not consider this problem. Additionally, the comprehensive memory optimizations on GPUs raise the difficulty because of: (1) unknown address mapping from logical address to physical address, and then to device address; (2) noisy memory traffic for intermediate data (for example, data with read-only access may come from the workspace in cuDNN); (3) incomplete memory accesses due to optimizations for data reuse and computing parallelism. Therefore, it is extremely challenging to accurately identify the layer sequence based on the imperfect execution statistics (with filtering and run-time scheduling) of a very long kernel sequence (10x∼1000x).

To address these issues, we propose a methodology that extracts models by fully exploiting both architecture execution features visible off-chip (e.g., memory access behavior) and priors learned from the rich families of DNNs now in operation. Considering both the intra-layer architecture execution features and the inter-layer temporal association likelihood, we draw upon ideas from speech recognition to achieve accurate model extraction. A central idea of the paper is that inter-layer DNN architecture features can be considered “sentences” in a language of DNNs. Just as in natural language, our reading of any individual “word” may be quite error-prone, but when placed into the context of the “sentence” we can find a parsing that maximizes the likelihood of a correct match far more effectively than character-by-character approaches ever could. We show that off-chip memory address traces and PCIe events provide ample information to reconstruct the neural network architecture accurately with this greater context. Overall, we are the first to propose and demonstrate end-to-end attacks in the context of an off-the-shelf Nvidia GPU platform with a full system stack, which underscores the need to design secure architectures and systems that protect DNNs.

In summary, we make the following contributions:

  • We propose a holistic method which considers both the intra-layer architecture features and inter-layer temporal association likelihood introduced by the DNN design empirical experience to conduct accurate model extraction. We show that off-chip memory address traces and PCIe events provide ample information to reconstruct neural network architecture accurately.

  • We formalize neural network architecture extraction as a sequence prediction problem, and solve it with a sequence model based on analogous speech recognition techniques, achieving high accuracy and generality. Building upon this information, we show how one can reconstruct the layer topology and explore the dimension space with the assistance of the memory bus traffic information, and finally form the complete neural network architecture.

  • We experimentally demonstrate our methodologies on an off-the-shelf GPU platform. With the easy-to-get off-chip bus communication information, the extracted network architectures exhibit very small difference from that of the victim DNN models.

  • We conduct an end-to-end attack to show that the extracted model boosts the effectiveness of adversarial attacks, yielding a 50.4% improvement in attack success rate compared to cases without neural network architecture knowledge. We demonstrate that memory address traces are able to damage NN system security, which calls for hardware security studies (e.g., ORAM) and for the attention of the architecture/system community to build a more robust NN system stack.

2 Background and Motivation

In this section, we introduce the background of model extraction and existing model extraction techniques.

2.1 Model Characteristics

Model extraction attacks aim to explore the model characteristics of DNNs for establishing a near-equivalent DNN model [25]. It is the initial step for further attack. For example, to attack a victim black-box model, the adversary needs to build a substitute model for generating adversarial examples [20, 21, 16, 17], while the similarity of characteristics between substitute and victim models strongly impacts the effectiveness of these adversarial examples [18, 19].

The model characteristics one would hope to extract include: (1) Network architecture, which consists of layer depth and types, connection topology between layers, and layer dimensions (including channel number, feature map and weight kernel size, stride, and padding). (2) Parameters, which include the weights, biases, and Batch Normalization (BN) parameters. They are updated during stochastic gradient descent (SGD) in the training process. (3) Hyper-parameters, which include the learning rate, regularization factors, and momentum coefficients. The hyper-parameters are statically configured at the beginning and are not updated during SGD.

Among all of the model characteristics, the network architecture is most fundamental for NN security. The model parameters, hyper-parameters, and even training data may be inferred with the knowledge of the network architecture [25, 26]. Moreover, previous work [18, 19] observes that the network architecture similarity between the substitute and victim model plays a key role for attack success rate.

2.2 Algorithm vs. Holistic Approaches

Figure 1: (a) Prior approach merely relying on algorithm; (b) Proposed holistic approach.

Many algorithms designed for model extraction have been proposed [19, 25]. Unfortunately, they require prior knowledge of the network architecture and significant computation. As shown in Figure 1, the key idea of an algorithm-centric approach is to search the candidate model zoo for the model whose decision boundary is closest to that of the victim model. The models in the candidate model zoo are trained with the inputs and outputs obtained by querying the victim model. However, it is extremely challenging to apply this type of method to extract complex DNN models. Two different networks may have similar input–output responses for most queries, which makes such methods inherently inaccurate. Furthermore, unlike the parameters, the network architecture cannot evolve dynamically during the learning process. As a result, all possible network architectures must be enumerated to find the closest one, which consumes significant computation resources. It is almost impossible to find a victim model that has not been released before.

From another perspective, we explore how architecture execution features can be exploited to achieve better model extraction, as shown in Figure 1. Although prior studies have started to consider leveraging architectural information leakage [24], that work focuses on customized FPGA DNN accelerators, which have less complex system dynamics and are therefore easier to reverse engineer. In this work, we demonstrate end-to-end extraction attacks in the context of GPUs, which draw from a richer set of possible layers and suffer from much noisier architectural measurements. When we consider both layer architecture features and the inter-layer association probabilities, we show that it is still possible to conduct accurate model extraction even with the architecture and system noise.

3 Attack Overview

In this section, we introduce the threat model and the specific hardware information obtained during execution.

3.1 Attack Model

We focus on edge security in this work. As shown in Figure 2, the attacker can physically access one edge device encapsulating a victim DNN model for model extraction, and then attack all the other devices that share the same neural network model. We assume that the adversary uses bus snooping techniques that passively monitor PCIe and memory bus events. Bus snooping is a well understood, practical, and low-cost attack that has been widely demonstrated [27, 28, 29]. We do not assume that the attacker has any access to the data passing through the buses, only the addresses, and the attacks described here can work even when the data is encrypted. We make no assumptions with regard to the ability of the attacker to know even what family of DNN models might be running or what software code might implement those models, or to have any other information about the operation of the device under attack that is not directly exposed through externally accessible connections. The model extraction parts of the attack are fully passive, requiring only the ability to observe architectural side channels over time. To complete the attack and craft adversarial inputs, the ability to provide specific inputs and observe results is also required.

Figure 2: Illustration of the attack model. (a) Hack-one, Attack-All-Others. (b) Bus snooping at GPU platform.

3.2 Target Hardware Platform

To make the attack more concrete, we specifically consider a heterogeneous CPU-GPU platform. The basic infrastructure [30] is shown in Figure 2(b). The CPU and GPU are connected by the PCIe bus, and the host and device memories are attached to the CPU and GPU through DDR and GDDR memory buses, respectively. Such a design offers good programmability, generality, and high performance, and hence is a representative platform for such attacks. Many real industrial products are built around such an architecture, including most of the existing L3 autopilot systems [12, 11]. The adversary can get access to the PCIe and GDDR buses for model extraction [27, 28, 29], either by physically probing the interconnect [27] or by attaching a DMA-capable device [28].

3.3 Architecture Information Leakage

Table 1 lists the information we can get from the PCIe bus and device memory bus.

Bus | Obtained | Inferred
PCIe Bus | Kernel events; memory copy size | Kernel duration time
Device Memory Bus | Memory request trace | Read volume; write volume; RAW reuse distance
Table 1: Bus snooped information.

3.3.1 Information Leakage Through PCIe Bus.

Obtained Information: According to the GPU programming model, the CPU transfers data from the host memory to the device memory and then launches GPU kernels for execution. Once the GPU finishes the task, it transfers results back to the host memory. Thus, there are copy events or control messages on the PCIe bus before kernel launch and at kernel completion during CUDA program execution [31]. The attacker can obtain two kinds of information: the kernel events and the memory copy size between CPU and GPU.

Inferred Information: From the kernel events above, we can infer the kernel duration time.

3.3.2 Information Leakage Through Memory Bus.

Obtained Information: During the process of memory bus snooping, the memory access type (read or write), address, and a time stamp for each access can be obtained [29].

Inferred Information: From the time stamps of the memory requests and the kernel execution periods, we can infer the following architectural execution characteristics. (1) The read and write data volumes of the memory requests in every kernel. (2) The data reuse distance between kernels, derived from the addresses and types of the memory requests. Specifically, we focus on the kernel-wise reuse distance of the Read-after-Write (RAW) pattern.
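The inference above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `MemRequest` and `Kernel` record layouts are hypothetical, and the sketch assumes kernels are sorted by start time and do not overlap, so each memory request can be attributed to the single kernel window that contains its time stamp.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical record layouts; a real bus analyzer defines its own format.
MemRequest = namedtuple("MemRequest", "time kind addr size")  # kind: "R" or "W"
Kernel = namedtuple("Kernel", "start end")  # window derived from PCIe kernel events


def per_kernel_volumes(kernels, requests):
    """Return one (duration, read_bytes, write_bytes) tuple per kernel.

    Assumes `kernels` is sorted by start time and windows do not overlap.
    """
    starts = [k.start for k in kernels]
    stats = [[k.end - k.start, 0, 0] for k in kernels]
    for req in requests:
        # Index of the last kernel that started at or before this request.
        idx = bisect_right(starts, req.time) - 1
        if idx < 0 or req.time > kernels[idx].end:
            continue  # request falls outside every kernel window
        column = 1 if req.kind == "R" else 2
        stats[idx][column] += req.size
    return [tuple(s) for s in stats]
```

Requests that fall between kernel windows are simply dropped here; a real trace would need a policy for such traffic.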

4 Network Architecture Extraction

At a high level, the goal of the proposed methodology is to leverage the hardware snooped information to extract the network architecture of victim model, including the set of layers employed, the connections between layers, and their respective dimension size. This reverse engineering goes through multiple layers of the DNN system stack, including framework, primitive, and hardware platform. As shown in Figure 3.(a), the frameworks optimize the network architecture to form the framework-level layer computational graph and transform these high-level abstractions to hardware primitives (cuDNN, OpenCL) for better resource utilization. The cuDNN library [32] launches the well-optimized handcraft kernel sequence according to the layer type. Finally, kernel execution on the hardware platform exhibits architecture features, including the memory access pattern and the kernel execution latency.

Figure 3: (a) DNN system stack. (b) Three-step methodology for network architecture extraction.

There are several challenges in extracting the network architecture from kernel execution feature sequences: 1) The relationship between layers and kernels is not a static one-to-one correspondence. For example, a single Conv layer may be implemented as 10x-100x different kernels in 7 different implementations at run time. Therefore, accurately identifying the layer sequence based on the execution statistics of a very long kernel sequence (10x∼1000x) is an important and challenging task. 2) Some kernels belonging to different layers have quite similar architectural execution features, such as BN, ReLU, and some kernels from Conv. 3) Memory hierarchy and programming library optimizations increase the variation of the architecture events, which introduces run-time noise into the kernel execution features. For example, the cuDNN [32] library heavily optimizes convolution and matrix-vector multiplication. There are seven different algorithm implementations for the Conv layer, which are selected at run time to fully leverage the compute capability of the GPU for better performance. Hence, the Conv layers produce variable numbers of execution kernels with different features. Overall, these architectural and system designs introduce noise that lowers the identification accuracy for recovering the DNN network structure.

To address these issues, we propose a methodology which employs both architecture execution features and the inter-layer context probability of building DNN models. The overall process consists of three steps: 1) Run-time layer sequence identification; 2) Layer topology reconstruction; and 3) Dimension size estimation, as shown in Figure 3(b). The three steps of performing the model extraction are as follows:

4.1 Run-time Layer Sequence Identification

In this step, the attacker identifies the layers executed at run time according to the kernel execution features. We first analyze the characteristics of different layers. We find that both the kernel execution features and the layer context features are important for run-time layer sequence identification. We then formalize layer sequence identification as a sequence-to-sequence problem. Finally, we leverage a speech recognition approach [33] as a tool to solve this problem, which achieves accurate prediction.

After comprehensively investigating modern DNN models, we consider the following layers in this work: Conv (convolution), FC (fully-connected), BN (batch normalization), ReLU (rectified linear unit), Pool, Add, and Concat, because most state-of-the-art neural network architectures can be represented by these basic layers [2, 34, 23, 35, 36, 37]. Note that it is easy to integrate other layers into this methodology if necessary. Every layer performs a certain operation on its input data and outputs the results to the next layer(s). The layers have the following functionality: 1) Conv and FC implement linear transformations on the input or activation data. 2) ReLU performs a nonlinear transformation on the input activation and has equal input and output data volumes. 3) BN performs normalization (e.g., scaling and shifting) on the input activation for faster convergence and also has equal input and output volumes. 4) Pool aggregates features by down-sampling the input activation for dimension reduction. 5) Add performs element-wise addition on two input activation tensors. 6) Concat concatenates several sub-input tensors into a single output tensor [34]. To identify which layer the kernels belong to, we first analyze the characteristics of these layers in terms of both architectural behavior and model design principles.

4.1.1 Layer Characterization

Intra-Layer Architectural Characterization. We highlight the following architectural features of kernels for analysis: 1) kernel duration time; 2) the read volume and write volume through the memory bus during kernel execution; 3) input/output data volume ratio, where the output volume equals the write volume of this kernel and the input volume equals the write volume of the previously executed kernel; 4) kernel dependency distance (kdd), which is the maximum distance in the kernel sequence between the currently executing kernel and the previous kernels it depends on.
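As an illustration of features 3) and 4), the following Python sketch derives the input/output volume ratio and the kernel dependency distance from per-kernel access information. It is a sketch under stated assumptions rather than the paper's implementation: a dependency is taken to mean a read of an address last written by an earlier kernel (the RAW pattern from Section 3.3.2), the function names are hypothetical, and a kernel with no observed dependency is given distance 0.

```python
def kernel_dependency_distance(kernel_accesses):
    """Compute one dependency distance per kernel.

    kernel_accesses: list in execution order of (read_addrs, write_addrs),
    each an iterable of addresses observed on the memory bus.
    """
    last_writer = {}  # address -> index of the latest kernel that wrote it
    kdd = []
    for i, (reads, writes) in enumerate(kernel_accesses):
        # Distance to every earlier kernel whose output this kernel reads.
        dists = [i - last_writer[a] for a in reads if a in last_writer]
        kdd.append(max(dists, default=0))
        for a in writes:
            last_writer[a] = i
    return kdd


def in_out_ratio(write_volumes):
    """Input volume of kernel i is the write volume of kernel i-1."""
    ratios = [None]  # the first kernel has no predecessor
    for prev, cur in zip(write_volumes, write_volumes[1:]):
        ratios.append(prev / cur if cur else None)
    return ratios
```

Tracking only the latest writer per address means a reused buffer is attributed to its most recent producer, which is one reasonable reading of the definition above.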

We observe that although the kernels of different layers have their own features according to their functionality, it is still challenging to predict which layer a kernel belongs to based on these execution features alone. As shown in Figure 4, every point represents the multi-dimensional feature information of an execution kernel. We observe that many points in Figure 4 are close to each other and are therefore difficult to tell apart. Our experiments show that about 30% of kernels are identified incorrectly with the execution features only, and this error rate increases drastically with more complex network architectures. The details of the experimental results are shown in Section 6.2.2. In summary, the aforementioned factors lower the prediction accuracy for recovering the DNN structure if we consider each layer independently.

Figure 4: Kernel features of layers.
Figure 5: Context-aware layer sequence identification. (a). Map the layer sequence identification to speech recognition problem; (b) An example of layer sequence identification.

DNN Inter-Layer Context. Given the previous layer, there is a non-uniform likelihood for the type of the following layer. This phenomenon provides the opportunity to achieve better layer identification. For example, there is a small likelihood that an FC layer follows a Conv layer in DNN models, because it does not make sense to have two consecutive linear transformation layers. Such temporal association information between layers (a.k.a. layer context) is inherently introduced by the DNN model design philosophy. Recalling the design philosophy of some typical NN models, e.g., VGG [2], ResNet [23], GoogleNet [34], and DenseNet [22], there are some common empirical rules in building a network architecture: 1) the architecture consists of several basic blocks connected iteratively; 2) the basic blocks usually include a linear operation first (Conv, FC), possibly followed by normalization to improve convergence (BN), then a non-linear transformation (ReLU), possibly down-sampling of the feature map (Pool), and possibly tensor reduction or merge (Add, Concat).

Although DNN architectures evolve rapidly, the basic design philosophy remains the same. Furthermore, the state-of-the-art technical direction of Neural Architecture Search (NAS), which uses a reinforcement learning search method to optimize the network architecture, also follows similar empirical experience [36]. Therefore, such layer context generally appears in network architecture design and can be used as prior knowledge for layer identification.
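To make the notion of non-uniform layer context concrete, the sketch below estimates first-order transition probabilities between layer types from a corpus of layer sequences. This is a deliberately simplified stand-in: the paper captures context with an LSTM rather than a bigram table, the function name `transition_probs` is hypothetical, and any sequences fed to it here are toy inputs rather than data from the paper.

```python
from collections import Counter, defaultdict


def transition_probs(layer_sequences):
    """Estimate P(next layer type | previous layer type) by counting bigrams."""
    counts = defaultdict(Counter)
    for seq in layer_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        probs[prev] = {nxt: c / total for nxt, c in ctr.items()}
    return probs


# Toy sequences following the "linear, normalize, activate" block pattern.
toy_corpus = [
    ["Conv", "BN", "ReLU", "Pool", "Conv", "BN", "ReLU", "FC"],
    ["Conv", "ReLU", "Conv", "BN", "ReLU", "Add"],
]
toy_probs = transition_probs(toy_corpus)
```

In this toy corpus, BN follows Conv far more often than ReLU does and FC never follows Conv directly, which is the kind of prior the sequence model can exploit.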

4.1.2 Run-time Layer Sequence Prediction.

Based on the above analysis, two major sources of information are jointly considered in layer prediction: the architectural kernel execution features and the layer context probability distribution in the layer sequence. This problem is similar to speech recognition, as shown in Figure 5, which also involves two parts: the acoustic model, which converts acoustic signals to text, and the language model, which computes the probability distribution over word sequences. Therefore, we map run-time layer sequence prediction onto a speech recognition problem and use ASR (automatic speech recognition) technologies [38, 33] as a tool to implement the layer identification.

Formally, the run-time layer sequence prediction problem can be described as follows: we take the kernel execution feature sequence $X$ with temporal length $T$ as input. At each time step, the kernel feature is described by a six-dimension tuple whose last component is kdd. The label space is the set of sequences composed of all typical layers. The goal is to train a layer sequence identifier that labels the input kernel feature sequence in a way that minimizes the distance between the predicted layer sequence ($L$) and the oracle sequence ($L^*$).

Context-aware Layer Sequence Identification. To build the classifier, we adopt the LSTM model (a typical recurrent neural network) with a CTC (Connectionist Temporal Classification) decoder, which is commonly used in automatic speech recognition [38, 33]. As shown in Figure 5, given the input sequence $X$, the output vector at each time step is normalized by the softmax operation and transformed into a probability distribution over the next layer OP. The objective function of training is defined to minimize the CTC cost for a given target layer sequence $L^*$:

$$\mathrm{CTC}(X, L^*) = -\ln P(L^* \mid X)$$

where $P(L^* \mid X)$ denotes the total probability of an emission result $L^*$ in the presence of $X$.

Take the simplified example in Figure 5(b), which shows a sequence of 3 execution kernels. At every time step, the LSTM outputs the probability distribution over the layer OPs. At the final time step, the CTC decoder uses beam search to find the sequence with the highest probability. In Figure 5(b), the number above an extending node is the total probability of all labelings beginning with this layer OP as the layer sequence prefix. Taking the ‘BN’ in the second row as an example, the probability of sequences with ‘BN’ as the prefix is 0.5. At every iteration, the extensions of the most probable remaining prefix are explored. Searching ends when a single labeling is more probable than any remaining prefix. In this example, ‘Conv, BN, ReLU’ is the sequence found by CTC beam search, which is taken as the prediction result.
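The "total probability of a labeling" that the decoder ranks can be illustrated by brute force on a tiny example. The sketch below enumerates every alignment path, applies the standard CTC collapse rule (merge repeats, then drop blanks), and sums path probabilities per labeling. It is exhaustive and therefore exponential in the number of time steps, so it only stands in for the beam search described above; the per-step probabilities are made up for illustration and are not the values in Figure 5(b).

```python
from itertools import product

BLANK = "-"  # CTC blank symbol


def collapse(path):
    """CTC collapse: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return tuple(out)


def labeling_probs(step_probs):
    """Sum the probability of every alignment path per collapsed labeling.

    step_probs: list with one dict per time step, mapping symbol -> probability.
    """
    totals = {}
    for path in product(*[list(p) for p in step_probs]):
        prob = 1.0
        for t, sym in enumerate(path):
            prob *= step_probs[t][sym]
        label = collapse(path)
        totals[label] = totals.get(label, 0.0) + prob
    return totals


# Illustrative per-kernel distributions for a 3-kernel sequence.
steps = [
    {"Conv": 0.8, "BN": 0.1, BLANK: 0.1},
    {"BN": 0.7, "Conv": 0.1, BLANK: 0.2},
    {"ReLU": 0.9, BLANK: 0.1},
]
totals = labeling_probs(steps)
best = max(totals, key=totals.get)
```

With these toy numbers the most probable labeling is the three-layer sequence Conv, BN, ReLU; beam search reaches the same kind of answer without enumerating every path.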

At the end of this step, we can get the run-time layer sequence according to the extracted features of the GPU kernel sequence. The experimental details of the model training, validation, and testing are explained in Section 6.

4.2 Layer Topology Reconstruction

After obtaining the predicted run-time layer sequence, the next step is to recover the connectivity between layers in order to reconstruct the layer topology. If the feature map data of one layer is fed as the input of another layer, there should be a directed topology connection from the former to the latter. Since this work focuses on the inference stage, there is only forward propagation across the whole network architecture.

We first analyze the cache behaviors of feature map data and have the following observations:

Observation-1: Only feature map data (activation data) introduces the RAW memory access pattern on the memory bus. There are several types of data throughout DNN inference: input images, parameters, and feature map data. Only feature map data is updated during inference. Feature map data is written to the memory hierarchy and read back as the input data of the next layer. The input image and parameter data are not updated during the whole inference procedure. Therefore, the RAW memory access pattern is not introduced by the input image and parameter data, but only by the feature map data.

Observation-2: Feature map data is very likely to introduce read cache misses, especially for convergent and divergent layers. 1) A convergent layer is one that receives feature map data from several layers. Add and Concat, the main convergent layers, introduce many read cache misses contributed by the feature map, since they only read feature map data. As shown in Figure 6, the read cache-miss rate of the Add layer is more than 98% and that of Concat is more than 50%. 2) A divergent layer is one that outputs feature map data to several successor layers on different branches. We observe that GPU kernels execute the layers in one branch before another. Therefore, there is a very long distance between the divergent layer and its successor layers in the run-time layer sequence. Because the CUDA library implements aggressive data reuse optimization that allocates more cache capacity to the weight tensor than to the feature map data, it is highly likely that the feature map will be flushed out and need to be read again because of the long reuse distance.

Based on these two observations, we are able to reconstruct the layer connections by detecting the RAW access patterns in different layers. We propose a layer topology reconstruction algorithm as follows: Step-1: We scan the memory request addresses of every layer in the run-time layer sequence. We add a connection from a layer to a successor layer if the successor reads an address that this layer has written. Step-2: If there is a non-end layer without any successor, we add a connection between that layer and the next layer in the layer sequence.
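The two steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' code: each layer is a hypothetical dict carrying the sets of addresses it was observed to read and write, and an address is assumed not to be rewritten by a later layer (buffer reuse would need the latest-writer bookkeeping that a real trace requires).

```python
def reconstruct_topology(layers):
    """Return sorted directed edges (src_index, dst_index) between layers.

    layers: list in run-time order of dicts with keys
    'name', 'reads' (set of addresses) and 'writes' (set of addresses).
    """
    edges = set()
    n = len(layers)
    # Step-1: connect a layer to every later layer that reads what it wrote.
    for i in range(n):
        for j in range(i + 1, n):
            if layers[i]["writes"] & layers[j]["reads"]:
                edges.add((i, j))
    # Step-2: a non-end layer without a successor feeds the next layer.
    has_successor = {src for src, _ in edges}
    for i in range(n - 1):
        if i not in has_successor:
            edges.add((i, i + 1))
    return sorted(edges)


# Toy residual block: the ReLU output skips over a Conv into the Add layer.
toy_layers = [
    {"name": "Conv", "reads": set(), "writes": {0x10}},
    {"name": "BN", "reads": set(), "writes": {0x20}},  # input hit in cache
    {"name": "ReLU", "reads": {0x20}, "writes": {0x30}},
    {"name": "Conv", "reads": {0x30}, "writes": {0x40}},
    {"name": "Add", "reads": {0x30, 0x40}, "writes": {0x50}},
]
toy_edges = reconstruct_topology(toy_layers)
```

In the toy trace the first Conv has no observed reader because the BN input hit in the cache, so Step-2 supplies the edge to its immediate successor.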

Figure 6: Read cache-miss rate of layers in VGG11, ResNet18, and Inception.

4.3 Dimension Size Estimation

After the above two steps, we construct the layer topology without the dimension size information. In this section, we explain how to estimate the dimension size parameters according to the memory read and write volume.

4.3.1 Layer Input/Output Data Volume Estimation

In the first stage, we estimate the input and output size of every layer starting from ReLU layers.

Step-1: ReLU input/output size estimation. As characterized in the previous subsection, ReLU and Add have high cache-miss rates, surpassing 98%. Hence, the read volume observed on the bus is almost the same as the input feature map size of the DNN model. The write volume can then be estimated as equal to the read volume, since ReLU has equal input and output data volumes. Based on this observation, we can obtain the input and output sizes of ReLU layers.

Step-2: Broadcasting ReLU size to other layers. In a neural network, the previous layer’s output acts as the input to the current layer, so the output size (feature map height/width and channel number for Conv, or neuron number for FC) of the previous layer equals the input size of the current layer. Hence, given the input size of a ReLU layer, the output size of the previous BN/Add/Conv/FC layer and the input size of the next Conv/FC layer can be estimated. Since the ReLU layer is an almost standard layer in every basic block, it can guide the dimension size estimation of its adjacent layers. The Add layer can play a similar role for dimension estimation at the divergence and convergence points of compute branches.

Step-3: Estimate the DNN input and output size. In this step, we estimate the input size of the first layer and the output size of the last layer with the PCIe information. As described in Section 3.3.1, the adversary is able to obtain the memory copy size through PCIe. The input image data is copied to the GPU at the beginning of a batch inference, and the prediction results are copied to the host at the end. Therefore, the input size and output size can be inferred from the memory copy sizes.
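Steps 1 and 2 amount to seeding sizes at ReLU layers and propagating them along the reconstructed edges. The sketch below shows one way this could look; it is an assumption-laden illustration, not the paper's procedure. In particular, `bytes_per_elem=4` assumes 32-bit activations, only BN and ReLU are treated as size-preserving, and edges into Concat are skipped because its inputs may differ in size.

```python
SAME_SIZE = {"BN", "ReLU"}  # layer types with equal input and output volume


def broadcast_sizes(layer_types, edges, relu_read_bytes, bytes_per_elem=4):
    """Propagate element counts from ReLU layers to their neighbours.

    layer_types: list of layer type names in run-time order.
    edges: iterable of (src, dst) layer index pairs.
    relu_read_bytes: dict mapping a ReLU layer index to its bus read volume.
    Returns (in_size, out_size) lists; unknown entries stay None.
    """
    n = len(layer_types)
    in_size, out_size = [None] * n, [None] * n
    for i, nbytes in relu_read_bytes.items():
        in_size[i] = out_size[i] = nbytes // bytes_per_elem
    changed = True
    while changed:  # every change fills one unknown, so this terminates
        changed = False
        for src, dst in edges:
            if layer_types[dst] == "Concat":
                continue  # Concat inputs need not share one size
            if out_size[src] is None and in_size[dst] is not None:
                out_size[src], changed = in_size[dst], True
            elif in_size[dst] is None and out_size[src] is not None:
                in_size[dst], changed = out_size[src], True
        for i, kind in enumerate(layer_types):
            if kind not in SAME_SIZE:
                continue
            if in_size[i] is None and out_size[i] is not None:
                in_size[i], changed = out_size[i], True
            elif out_size[i] is None and in_size[i] is not None:
                out_size[i], changed = in_size[i], True
    return in_size, out_size
```

The input size of the first layer stays unknown in this sketch; per Step-3 it would come from the PCIe memory copy size instead.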

4.3.2 Dimension Space Calculation

In the second stage, we calculate the dimension parameters from the constructed layer topology, knowing the input/output size of every layer. We want to estimate the following dimension space: the input (output) channel size, the input (output) height, the input (output) width, the weight size, and the convolution padding and stride. In fully connected layers, the channel size denotes the neuron number. The quantitative estimation is listed in Table 2. Based on the fact that the input size of each layer is the same as the output size of the previous layer, and following the constraints shown in Table 2, we are able to search for the possible solutions for every layer.

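Layer OP | Constraints & Estimation
BN | output channel = input channel, output height = input height, output width = input width
ReLU | output channel = input channel, output height = input height, output width = input width
Add | output channel = input channel, output height = input height, output width = input width
Concat | output height = input height, output width = input width, output channel = sum of the input channels over all branches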
Table 2: Dimension space calculation.
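
To illustrate how such constraints narrow the search, the sketch below enumerates candidate Conv configurations for a known input height/width and an estimated output size. It assumes the standard convolution output-size formula, OH = floor((IH + 2P - K) / S) + 1, together with hypothetical candidate sets for the kernel and stride. The function name `conv_candidates` and the candidate lists are illustrative choices, not values taken from Table 2.

```python
def conv_candidates(in_h, in_w, out_size, kernels=(1, 3, 5, 7), strides=(1, 2)):
    """Enumerate (K, S, P, OC) settings whose output volume matches out_size."""
    solutions = []
    for k in kernels:
        for s in strides:
            for p in range(k // 2 + 1):      # assumption: padding is at most K // 2
                if in_h + 2 * p < k or in_w + 2 * p < k:
                    continue                  # kernel larger than the padded input
                out_h = (in_h + 2 * p - k) // s + 1
                out_w = (in_w + 2 * p - k) // s + 1
                if out_size % (out_h * out_w) == 0:
                    solutions.append({"K": k, "S": s, "P": p,
                                      "OH": out_h, "OW": out_w,
                                      "OC": out_size // (out_h * out_w)})
    return solutions


# A 56x56 input whose Conv output holds 64 * 56 * 56 elements.
candidates = conv_candidates(56, 56, out_size=64 * 56 * 56)
```

Several configurations usually satisfy the volume constraint (here, for instance, a 3x3 kernel with stride 1 and 64 output channels, or a 3x3 kernel with stride 2 and 256 output channels), so the search yields a set of candidate solutions rather than a single one.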

5 Further-step Adversarial Attack

The extracted network architecture can be used to conduct further-step attacks. In this work, we use the adversarial attack as a use case to show the importance of the network architecture, since it is one of the most common attacks in the domain of neural network security.

In the adversarial attack, the adversaries manipulate the output of the neural network model by inserting small perturbations to the input images that still remain almost imperceptible to human vision [20]. The goal of adversarial attack is to search the minimum perturbation on input that can mislead the model to produce an arbitrary (untargeted attack) [20] or a pre-assigned (targeted attack) [39, 40, 41] incorrect output. To conduct the adversarial attack against a black-box model, the adversary normally builds a substitute model first, by querying the input and output of the victim model. Then the adversary generates the adversarial examples based on the white-box substitute model [39, 42, 43]. Finally, they use these adversarial examples to attack the black-box model.

Step-1: Build substitute models. In our work, we train substitute models with the extracted network architectures, whereas previous work selects typical network architectures to build the substitute models, as shown in Figure 7.

Step-2: Generate adversarial examples. The state-of-the-art solution [18] uses an ensemble method to improve the attack success rate, based on the hypothesis that if an adversarial image remains adversarial for multiple models, it is more likely to be effective against the black-box model as well. We follow similar techniques to generate adversarial images for the ensemble of multiple models.

Step-3: Apply the adversarial examples. As the final step, the adversary attacks the black-box model using the generated adversarial examples as input.

We follow the same adversarial attack methodology as the previous work [18]; the only difference is that we use the predicted network architecture to build the substitute models. The experiments show that, with an accurately extracted network architecture, the success rate of the adversarial attack improves significantly. The detailed results are shown in Section 6.4.2.

Figure 7: Adversarial Attack Flow.

6 Attack Demonstration

In this section, we evaluate a complete hacking flow, taking CNNs as a case study.

6.1 Evaluation Methodology

Experimental Platform: To validate the feasibility of stealing the memory information during inference execution, we conduct the experiments on a hardware platform equipped with an Nvidia K40 GPU [44]. The NN models are implemented in the PyTorch framework [45], with CUDA 8.0 [46] and the cuDNN optimization library [32].

Experimental Setup: We use the nvprof tool to emulate bus snooping. nvprof is an NVIDIA profiling tool kit that enables us to understand and optimize the performance of OpenACC applications [47]. The information that we can get from the nvprof tool is shown in Table 1. Based on these raw data, we can reconstruct the network architecture of the victim model according to the steps in Figure 3.(b).

Layer Sequence Identifier Training: As an initial step for network architecture extraction, we first train the layer sequence identifier, which is based on an LSTM-CTC model. The detailed training procedure is as follows.

Dataset for Training: To prepare the training data, we first generate 8500 random computational graphs and obtain the kernel features experimentally with nvprof, which emulates the process of bus snooping. Two kinds of randomness are considered during random graph generation: topological randomness and dimensional randomness. At every step, the generator randomly selects one type of block from sequential, Add, and Concat blocks. The sequential block candidates include (Conv, ReLU), (FC, ReLU), and (Conv, ReLU, Pool), with or without BN. The FC layer only occurs when the feature map size is smaller than a threshold. The Add block is randomly built from sequential blocks with a shortcut connection. The Concat block is built with a randomly generated subtrack number, possibly within Add blocks and sequential blocks. The dimension size parameters, such as the channel, stride, padding, and weight size of Conv layers and the neuron number of FC layers, are randomly generated to improve the diversity of the random graphs. The input size of the first layer and the output size of the last layer are fixed during random graph generation, considering that they are usually fixed on one specific target platform. We randomly select 80% of the random graphs as the training set and the other 20% as the validation set to check whether the training is overfitting. To verify the effectiveness and generalization of our hardware-aided framework, we examine various NN models as the test set, including VGG [2], ResNet [23], Inception [34], and Nasnet [36], to cover as many layer types as possible.
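
The generation procedure described above could look roughly like the following sketch. It is a simplified, hypothetical generator rather than the one used to build the dataset: the function names, the candidate values for channels, kernels, and strides, the 50% probabilities, and the default thresholds are all assumptions. It also only emits a flat layer sequence with per-layer parameters, without checking shape compatibility across branches, and tracks the feature map size only through pooling.

```python
import random

LAYER_KINDS = {"conv", "bn", "relu", "pool", "fc", "add", "concat"}


def conv(rng):
    """A Conv layer with randomly drawn dimension parameters."""
    kernel = rng.choice([1, 3, 5, 7])
    return ("conv", {"out_channels": rng.choice([16, 32, 64, 128, 256]),
                     "kernel": kernel,
                     "stride": rng.choice([1, 2]),
                     "padding": rng.randint(0, kernel // 2)})


def sequential_block(rng, with_pool=False):
    """(Conv, ReLU) or (Conv, ReLU, Pool), with or without BN."""
    block = [conv(rng)]
    if rng.random() < 0.5:
        block.append(("bn", {}))
    block.append(("relu", {}))
    if with_pool:
        block.append(("pool", {}))
    return block


def random_graph(seed, num_blocks=8, fmap=32, fc_threshold=16):
    rng = random.Random(seed)
    layers, flattened = [], False
    for _ in range(num_blocks):
        # FC blocks only appear once the feature map is small; after the
        # first FC layer, only FC blocks follow.
        if flattened or (fmap < fc_threshold and rng.random() < 0.5):
            layers += [("fc", {"neurons": rng.choice([64, 128, 256])}), ("relu", {})]
            flattened = True
            continue
        kind = rng.choice(["sequential", "add", "concat"])
        if kind == "sequential":
            with_pool = rng.random() < 0.5
            layers += sequential_block(rng, with_pool)
            if with_pool:
                fmap = max(1, fmap // 2)
        elif kind == "add":
            # Sequential body whose result is merged with a shortcut.
            layers += sequential_block(rng) + [("add", {})]
        else:
            # Concat block with a random number of subtracks.
            for _ in range(rng.randint(2, 4)):
                layers += sequential_block(rng)
            layers.append(("concat", {}))
    return layers


# 8500 random graphs, split 80% / 20% into training and validation sets.
graphs = [random_graph(seed) for seed in range(8500)]
random.Random(0).shuffle(graphs)
split = int(0.8 * len(graphs))
train_set, val_set = graphs[:split], graphs[split:]
```

Seeding each graph separately keeps the dataset reproducible, which makes it easier to regenerate the same graphs when collecting kernel features.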

Identifier Configurations: The identifier utilizes the LSTM-CTC model, consisting of one hidden layer with 128 cells, for layer sequence identification. It is trained using the Adam [48] optimizer with learning rate adaptation. The training is terminated after 100 epochs.

6.2 Run-Time Layer Sequence Identification

In this section, we first evaluate the layer sequence identification accuracy. Then we analyze the importance of the layer context information and how noise affects the identification accuracy.

6.2.1 Prediction Accuracy

Evaluation Metric. Speech recognition models adopt the mean normalized edit distance between the predicted sequence and the label sequence to quantify the prediction accuracy [33, 38], which is referred to as the label error rate (LER). We therefore also adopt LER to evaluate the prediction accuracy. The LER is calculated as follows [33]:

$$\mathrm{LER} = \frac{1}{|S|} \sum_{(x, z) \in S} \frac{ED(h(x), z)}{|z|}$$

where $ED(p, q)$ is the edit distance between two sequences $p$ and $q$, i.e. the minimum number of insertions, substitutions, and deletions required to change $p$ into $q$, $|S|$ is the number of samples in the testing set, $h(x)$ is the identified layer sequence, and $z$ is the oracle layer sequence.
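
As a concrete illustration of the metric, the following sketch computes the edit distance with the standard dynamic-programming recurrence and averages it, normalized by the oracle sequence length, over a set of (predicted, oracle) pairs. The function names are illustrative, and normalizing by the oracle length is an assumption based on the definition of LER in [33].

```python
def edit_distance(p, q):
    """Minimum number of insertions, substitutions, and deletions turning p into q."""
    prev = list(range(len(q) + 1))
    for i, a in enumerate(p, 1):
        cur = [i]
        for j, b in enumerate(q, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def label_error_rate(pairs):
    """Mean edit distance between predicted and oracle sequences, normalized by oracle length."""
    return sum(edit_distance(pred, oracle) / len(oracle)
               for pred, oracle in pairs) / len(pairs)


# One block with a spurious ReLU and one block predicted exactly.
oracle = "conv bn relu conv bn add relu".split()
predicted = "conv bn relu conv bn relu add relu".split()
ler = label_error_rate([(predicted, oracle), (oracle, oracle)])
```

In this toy case, a single spurious ReLU in a seven-layer block contributes 1/7 for that sample, and the exact prediction contributes 0.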

Results. We first evaluate the accuracy on the validation set. The average LER on the validation set is about 0.08, which indicates good prediction capability. Furthermore, we evaluate the accuracy of identifying several typical networks, as shown in Table 3. For the VGG and ResNet families, the prediction LER is lower than 0.07. For Inception and Nasnet, the LER increases to 0.117 and 0.132, respectively, because of their much deeper and more complex topologies. In summary, our proposed method predicts well in these cases.

A Detailed Example. We take ResNet34 as an example and present the detailed results in Table 4. We make the following observations: 1) The prediction model is generally effective in correctly identifying the layer sequence; 2) Although BN/ReLU layers are occasionally missed or spuriously created, the critical Conv/FC/Add/Concat layers are correctly recognized.

VGG16 VGG19 ResNet34 ResNet101 ResNet152 Inception Nasnet_large
0.020 0.017 0.040 0.067 0.068 0.117 0.132
Table 3: Prediction LER on typical networks.
Network Oracle Sequence Predicted Sequence
ResNet34 (LER 0.040) conv bn relu pool conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu pool fc conv bn relu pool conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn add relu conv bn relu conv bn conv bn add relu conv bn relu conv bn relu add relu conv bn relu conv bn relu add relu pool fc
Table 4: Detailed comparison between the oracle and predicted layer sequence.

6.2.2 With/Without Layer Context Information.

We analyze the importance of inter-layer information in this section. As shown in Figure 8.(a), we compare the LER of two methods: a layer-context-aware identifier based on the LSTM-CTC model and a single-layer identifier based on an MLP (multi-layer perceptron) model.

We draw two conclusions from this experiment: 1) Considering the layer context information yields much better prediction accuracy. The results show that the average LER of LSTM-CTC is two times lower than that of the MLP-based method. 2) Layer context information is increasingly important when identifying more complex network architectures. As shown in Figure 8.(b), compared to simple network architectures with only chain topologies, the more complex architectures with remote connections (e.g. Add or Concat) cause higher error rates. For the MLP-based model, the LER increases dramatically when the network is more complex (from 0.18 to 0.5), while for the LSTM-CTC model, the average LER increases only modestly (from 0.065 to 0.104).

Figure 8: MLP: layer prediction without inter-layer information; LSTM-CTC: leverage the inter-layer information for better prediction. The results indicate that the inter-layer likelihood is very important for layer prediction.

The experimental results indicate that the layer context with inter-layer temporal association is a very important information source, especially for layer sequences within complex topologies.

6.2.3 Noise Influence.

We conduct experiments to analyze the prediction sensitivity when noise is present in the kernel features. When 5%, 10%, 20%, or 30% random noise is added to the read and write volumes of the validation set, the average LERs of the layer prediction are as shown in Table 5. The results indicate that the layer sequence identifier is resistant to noise: even at the 30% noise level, the LER only rises to 0.16.

Noise 5% 10% 15% 30%
LER 0.08 0.09 0.12 0.16
Table 5: Prediction LER on validation set with random noise.
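
One simple way to emulate such a perturbation is sketched below. The noise model is an assumption, since the exact distribution is not specified here: each read and write volume is scaled by an independent factor drawn uniformly from [1 - level, 1 + level]. The function name `add_noise` and the example volumes are hypothetical.

```python
import random


def add_noise(features, level, rng):
    """Scale each (read, write) volume by a random factor in [1 - level, 1 + level]."""
    noisy = []
    for read_volume, write_volume in features:
        noisy.append((read_volume * (1 + rng.uniform(-level, level)),
                      write_volume * (1 + rng.uniform(-level, level))))
    return noisy


rng = random.Random(0)  # fixed seed so the perturbed trace is reproducible
clean = [(65536, 65536), (32768, 8192), (4096, 40)]
noisy = add_noise(clean, level=0.10, rng=rng)
```

The perturbed feature sequence can then be fed to the identifier in place of the clean one to measure how the LER changes with the noise level.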

Figure 9: Layer input and output size estimation (normalized to the oracle size).
Figure 10: Illustration of a complete example of reconstructing ResNet18. Step-1: Calculating the kernel features based on the bus snooped information (emulated by nvprof); Step-2: Identify the run-time layer sequence; Step-3: Reconstruct the layer topology and get the input and output size of layers; Step-4: dimension estimation according to the layer type and input/output size.

6.3 Dimension Space Estimation

In this section, we show how accurately the input and output size of every layer can be estimated, which is important for dimension space estimation. The results are shown in Figure 9, where the estimated size is normalized to the oracle size. For Conv, BN, ReLU, Add, and Concat, the estimation accuracy can reach up to 98%. The FC layer presents lower accuracy for the following reason: the FC layer is usually at the end of the network, where the neuron number shrinks. The activation data of the ReLU layer is therefore likely to be filtered by the cache, and the corresponding cache miss rate is much lower. As a result, it is not accurate to use ReLU read transactions to estimate the FC size. Instead, we use the read volume of the FC layer to predict the input and output size.

6.4 An end-to-end Demo Case

We use a complete example to illustrate the model extraction against the victim model ResNet18. With the extracted network architecture, we then conduct the subsequent adversarial attack, which shows that the attack success rate can be significantly improved.

6.4.1 Model Extraction

The model extraction flow is shown in Figure 10, consisting of the following four steps.

Step-1: Kernel feature calculation: We obtain the bus-snooped information and calculate the sequence of kernel features, as shown in the corresponding subgraph of Figure 10.

Step-2: Run-time layer sequence identification: Taking in the kernel feature sequence, we use the layer sequence identifier, which is based on an LSTM-CTC model, to identify the layer sequence. The prediction results are listed in the right boxes of the corresponding subgraph of Figure 10. Because a network architecture usually consists of several basic blocks repeated iteratively, we use a hierarchical expression to present the prediction results, and the complete layer sequence of ResNet18 is expressed as a sequence of such blocks. The identifier predicts the layer sequence precisely for most of the blocks, and there are only small mistakes in two blocks: in one block, the identifier misses a BN layer; in another, the identifier incorrectly adds an extra ReLU. Since BN and ReLU layers do not change the feature map size, these mispredictions do not affect the dimension size estimation results.

Step-3: Layer topology reconstruction: Based on the RAW (read-after-write) memory access dependency pattern, the layer topology is constructed following the reconstruction algorithm, as shown in the corresponding subgraph of Figure 10.

Step-4: Dimension size estimation: 1) We first estimate the feature map input and output size of ReLU layers according to their read and write volume (in blue color); 2) We then estimate the input and output size of all the other layers, propagating from ReLU; 3) We finally estimate the complete dimension space based on the input and output size of layers, according to the equations in Table 2. By assuming reasonable kernel and stride sizes, the padding, input channel, and output channel of each Conv layer can then be derived.

Two examples of the possible dimension solutions are shown in the corresponding subgraph of Figure 10. ‘[]’ represents a basic block with an Add layer; ‘K’, ‘P’, and ‘S’ represent the weight, the padding, and the stride size of the Conv layer, respectively; ‘B’ represents BN; and ‘R’ represents ReLU.

We successfully reconstruct the network architecture of the black-box DNN model after these four steps. Although the final graph is not unique due to the variable dimension sizes, the candidate graphs all fall within the same network architecture family. In the following subsection, we show that these predicted network architectures help boost the adversarial attack efficiency.
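
The idea behind the topology reconstruction in Step-3 can be illustrated with a small sketch. It is not the reconstruction algorithm used in this work, only a minimal example of read-after-write (RAW) dependency tracking under assumed inputs: each identified layer is given as a hypothetical tuple of its name, the addresses it reads, and the addresses it writes, listed in execution order.

```python
def build_topology(layers):
    """Connect each layer to the most recent earlier layers that wrote what it reads.

    layers: list of (name, read_addresses, write_addresses) in execution order.
    Returns a sorted list of (producer_index, consumer_index) edges.
    """
    last_writer = {}   # address -> index of the layer that last wrote it
    edges = set()
    for idx, (_name, reads, writes) in enumerate(layers):
        for addr in reads:
            if addr in last_writer:       # RAW dependency on an earlier layer
                edges.add((last_writer[addr], idx))
        for addr in writes:
            last_writer[addr] = idx
    return sorted(edges)


# Toy residual block: the Add layer reads both the main path and the shortcut.
trace = [
    ("conv", {"weights0"}, {"A"}),
    ("relu", {"A"}, {"B"}),
    ("conv", {"B", "weights1"}, {"C"}),
    ("add", {"C", "B"}, {"D"}),
]
edges = build_topology(trace)
```

Weight addresses are never written by a layer, so they create no edges. The shortcut shows up as the extra edge from the ReLU to the Add layer.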

6.4.2 Adversarial Attack Efficiency

In this section, we show that the adversarial attack efficiency can be significantly improved with the extracted network architecture information.

The adversarial attack flow consists of three steps: 1) building substitute models; 2) generating adversarial examples towards the substitute model; 3) applying the adversarial examples to the victim black-box model, as introduced in Section 5. We use the algorithm proposed in prior work [18], which achieves a better attack success rate by generating the adversarial examples based on an ensemble of four substitute models.

Setup: In these experiments, we use ResNet18 [23] as the victim model for the targeted attack. Our work adopts the extracted neural network architecture, as shown in Figure 10, to build the substitute models. For comparison, the baseline examines substitute models built from the following networks: the VGG family (VGG11, VGG13, VGG16, VGG19) [2], the ResNet family (ResNet34, ResNet50, ResNet101, ResNet152) [23], the DenseNet family (DenseNet121, DenseNet161, DenseNet169, DenseNet201) [22], SqueezeNet [49], and Inception [34].


First, we randomly select 10 classes, each with 100 images, from the ImageNet dataset [50] for testing. To perform the targeted attack, we test both the case where the targeted class is far away from the original class and the case where it is close to the original class. We compare the following five solutions: ensembled models with substitute models from the VGG family, the DenseNet family, mixed architectures (SqueezeNet, Inception, AlexNet, DenseNet), the ResNet family, and the extracted ResNet architectures obtained by our model extraction. The results are shown in Table 6. We make several observations: 1) The attack success rate is generally low for the cases without network architecture knowledge. The adversarial examples generated by substitute models from the VGG family, the DenseNet family, and mixed architectures only conduct successful attacks in 14.6%–25.5% of the cases. 2) With some knowledge of the victim architecture, the attack success rate can be significantly improved. For example, the substitute models within the ResNet family achieve an attack success rate of 43%. 3) With accurate network extraction, although the extracted network still differs slightly from the original network, the attack success rate is boosted to 75.9%. These results indicate that our model extraction can significantly improve the success rate of subsequent adversarial attacks.

In a further step, we take a closer look at the targeted attack that leads the images in Class-755 to be misclassified as Class-255, in order to explore the ensembled model with various substitute combinations. We randomly pick four substitute models from the candidate model zoo, and the results are shown in the blue bars of Figure 11. We also compare the results to the cases that generate the adversarial examples using substitute models 1) from the VGG family; 2) from the DenseNet family; 3) from SqueezeNet, Inception, AlexNet, and DenseNet (the ‘Mix’ bar in the figure); 4) from the ResNet family; and 5) from the extracted cognate ResNet18 model (our method). As shown in Figure 11, the average success rate of the random cases is only 17%, and the best random-picking case achieves an attack success rate of just 34%. We observe that all good random-picking cases (attack success rate > 20%) include substitute models from the ResNet family. Our method, with accurately extracted DNN models, achieves the best attack success rate across all the cases, 40% larger than the best random-picking case and the ResNet family cases. In a nutshell, with the help of effective and accurate model extraction, the subsequent adversarial attack can achieve a much better attack success rate.

Substitute models | VGG family | DenseNet family | Mix | ResNet family | Extracted DNN
Success rate | 18.1% | 25.5% | 14.6% | 43% | 75.9%
Table 6: Success rate with different substitute models.
Figure 11: Attack success rate across different cases when conducting the targeted attack (Class-750 to Class-255): 1) randomly picking 4 substitute models from the candidate pool; 2) substitute models from the VGG family; 3) substitute models from the DenseNet family; 4) substitute models from SqueezeNet, Inception, AlexNet, and DenseNet (Mix); 5) substitute models from the ResNet family; 6) substitute models from our extracted network architectures.

7 Discussion

In this section, we first analyze the intrinsic reason why the network architecture extraction succeeds and how generally it applies. Then, we explain the impact of existing memory bus protection methods on our attack approach and propose potential defensive techniques.

7.1 Generality and Insights

The standardization throughout the whole stack of neural network systems facilitates model extraction. Standardized hardware platforms, drivers, libraries, and frameworks are developed to help machine learning industrialization with user-friendly interfaces. The transformation from the input neural network architecture to the final hardware code depends on the compilation and scheduling strategies of the DNN system stack, which can be learned under the same execution environment. Therefore, the adoption of these hardware platforms, frameworks, and libraries in the development workflow gives attackers an opportunity to investigate the execution pattern and reconstruct the network architecture based on the hardware execution information.

The key to extracting the network structure is to learn the transformations between framework-level computation graphs and kernel feature sequences. Therefore, we build the training set based on random graphs with the basic operations provided by the DNN framework. In our methodology, as long as the DNN model is built from the basic operations provided by the framework (such as Conv2d, ReLU, and MaxPool2d in PyTorch), the neural network structure can be reconstructed. In addition, our methodology can be extended to include the other operations in the framework model zoo. Therefore, our methodology is generally applicable to various CNN models with different neural network architectures. We demonstrate that memory address traces can compromise NN system security, which calls for hardware security studies (e.g. ORAM) and may draw the attention of the architecture/system community to building a more robust NN system stack.

7.2 Defence Strategies

7.2.1 Microarchitecture Methodologies.

There are a few architectural memory protection methods. Oblivious Memory: To reduce the information leakage on the bus, previous work proposes oblivious RAM (ORAM) [51, 52, 53], which hides the memory access pattern by encrypting the data addresses. With ORAM, attackers cannot identify two operations even when they are accessing the same physical address [51]. However, ORAM techniques incur a significant memory bandwidth overhead (up to 10x), which makes them impractical for bandwidth-sensitive GPU architectures.

Dummy Read/Write Operations: Another possible defence is to introduce fake memory traffic to disturb the statistics of memory events. Unfortunately, such noise causes only a small degradation of the layer sequence prediction accuracy, as illustrated in Section 6.2.3. Dummy RAW operations that obfuscate the identification of layer dependencies may therefore be a more fruitful defensive technique to explore.

7.2.2 System Methodologies.

The essence of our work is to learn the compilation and scheduling graphs of the system stack. Although the computational graphs go through multiple levels of the system stack, we demonstrate that it is still possible to recover the original computational graph based on the raw information stolen from the hardware. At the system level one could: 1) customize the overall NN system stack with TVM, which is able to implement graph-level optimization for the operations and the data layout [54], so that the internal optimization makes it harder for attackers to learn the scheduling and compilation graph; or 2) apply security-oriented scheduling between different batches during the front-end graph optimization. Although such optimizations may have little impact on performance, they may obfuscate the attacker's view of kernel information.

8 Related Work

Machine learning security has become an attractive topic with the industrialization of DNN techniques. The existing related work mainly comes from the following two perspectives.

Algorithm perspective: Neural network security attracts much attention with the industrialization of DNN techniques. Previous work discusses concrete AI security problems and machine learning attack approaches [13, 16, 55]. The adversarial attack is one of the most important attack models; it generates adversarial examples with invisible perturbations to confuse the victim model into making a wrong decision. These adversarial examples can produce either targeted [39, 20, 56, 57, 58, 59, 60, 61] or untargeted [41, 62, 63, 64, 65] output for further malicious actions. Adversarial attacks can be categorized as white-box attacks [66, 39, 56, 67, 41, 68, 40, 20, 69, 70, 71] and black-box attacks [25, 43, 20, 41, 39, 42, 72, 73, 74, 75, 76, 77, 65, 78, 79, 60, 61], according to the prior knowledge regarding the victim model. More specifically, in a white-box attack, the attacker knows the internal model characteristics (i.e. network architecture and parameters) of the victim model. White-box attacks are less practical than black-box attacks in real deployment, since designers intend to hide this information from users. In a black-box attack, the attacker has no knowledge of the model characteristics and can only query the black-box model for input and output responses. The state-of-the-art work observes that adversarial examples transfer better if the substitute and victim model are in the same network architecture family [18, 19]. Therefore, extracting the inner network structure is important for attack effectiveness.

Consequently, model extraction work has emerged to explore the model characteristics. Previous work steals the parameters and hyperparameters of DNN models with basic knowledge of the NN architecture [25, 26]. Seong et al. explore the internal information of the victim model based on meta-learning [19]. However, it is inefficient to extract the neural network architecture at the algorithm level. Our work addresses this issue from the system perspective with the help of bus-snooped information, which significantly benefits the attack efficiency of current software algorithms.

Hardware perspective: Several accelerator-based attacks have been proposed, aiming to conduct either model extraction [24] or input inversion [80]. However, their methodology relies on specific features of the hardware platforms and is not generally applicable to GPU platforms with a full system stack. Some studies investigate information leakage in general-purpose platforms. Cathy [81] explores side-channel techniques to get the neuron and GEMM operator number in CPU. Naghibijouybari et al. show that side-channel effects in GPU platforms can reveal the neuron numbers [82]. However, there is no direct evidence of how useful these statistics are to the attack effectiveness. Targeting security at the edge (e.g. automotive), this work is the first to propose an NN model extraction methodology and experimentally conduct an end-to-end attack on an off-the-shelf GPU platform with a full system stack (e.g. PyTorch + cuDNN).

9 Conclusion

The widespread use of neural network-based AI applications means that there is more incentive than ever before for attackers to extract an accurate picture of the inner functioning of their design. Through the acquisition of memory access events from bus snooping, layer sequence identification by the LSTM-CTC model, layer topology connection according to the memory access pattern, and layer dimension estimation under data volume constraints, we demonstrate that one can accurately recover a similar network architecture as the attack starting point. These reconstructed neural network architectures yield a significant increase in attack success rate.