While Machine Learning as a Service (MLaaS) has emerged as a viable and lucrative business model, there is an urgent need to protect deep neural networks (DNN) from being used, copied and re-distributed by unauthorized parties (i.e. intellectual property infringement). Recently, digital watermarking techniques have been adopted to embed watermarks into DNN models during the training stage. The ownership of these models is subsequently verified by detecting the embedded watermarks, which are supposed to be robust to multiple types of DNN modification such as fine-tuning, pruning and watermark overwriting (Uchida et al., 2017; Le Merrer et al., 2017; Adi et al., 2018).
The principle of digital watermarking for neural network models is mainly inspired by the protection of the intellectual property rights (IPR) of digital media such as images or videos, which are processed by media processors, e.g. photo galleries or video players. Nevertheless, this approach disregards a unique and fundamental feature of neural network models: they are themselves information processors that fulfill certain tasks, including the classification, detection or manipulation of input information. Therefore, a novel and preferable strategy for neural network IP protection is to paralyze the DNN models against unauthorized parties, while maintaining their normal functionalities for lawful usage. To this end, a designated digital entity must be presented to endorse authorized usage of the protected network in question. We refer to this type of protection entity as a digital passport and, correspondingly, the process of embedding digital passports into a DNN model is referred to as digital passporting. Within this paper, we shall illustrate how to embed digital passports so that the resulting DNN models are both functionality-preserving and well-protected (see definitions in Section 2.1).
On the one hand, digital passporting bears a similarity to digital watermarking: both embed certain digital entities into DNN models during training. In terms of IPR protection, however, embedded watermarks only enable the verification of the ownership of DNN models; one has to rely on government investigation and enforcement actions to discourage the infringement of IPRs. More often than not, this kind of juridical protection can be unreliable, costly and long-overdue. On the other hand, passport-protected DNN models will not function normally unless valid passports are provided, thus immediately preventing unlawful usage of the networks at no extra cost. Indeed, we regard this proactive protection as the most prominent advantage of digital passporting over digital watermarking. For instance, as shown in Figure 1 for CIFAR10 classification performances, the protected networks with valid passports demonstrated almost identical accuracies to the original network, while the same networks presented with fake passports merely achieved about 10% classification rates. Moreover, even if a computationally demanding reverse-engineering algorithm was used to recover the protected network parameters (see Section 3.1), the best accuracy obtained was no more than 70%, substantially inferior to that of the protected network, i.e. 92%.
To the best of our knowledge, the present paper is the first work that shows how to embed and use digital passports to prevent DNN models from being infringed by unauthorized parties. Section 1.1 reviews recent digital watermarking techniques developed for DNNs. Section 2 details the DNN architectures for embedding and verifying digital passports of target DNN models. Section 3 explains the use of digital passports as watermarks, while extensive experimental results in Section 4 demonstrate the efficacy of the passporting method.
In summary, the contributions of this paper are as follows:
We renovate the paradigm of digital watermarking based neural network IP protection, by proposing a digital passporting based strategy which provides reliable, preventive and timely IP protection at virtually no extra cost for all neural networks.
This paper formulates the digital passporting problem and proposes a generic solution as well as concrete implementation schemes that embed digital passports into DNN models through dedicated passporting layers (Section 2; Figure 2). The embedded passports prevent unauthorized network usage (infringement) by paralyzing the networks, while maintaining their functionality for verified users (Section 4.2; Figure 6).
Our work shows that the embedded passports can also verify the ownership of networks, in case the DNN hidden parameters are illegally disclosed or reverse-engineered while the public parameters are plagiarized (Section 3).
1.1 Related work
(Uchida et al., 2017) was probably the first work to propose a general framework to embed watermarks into DNN models, by imposing on the weight parameters an additional regularization term that is dependent on the watermarks to be embedded. It was shown that the performances of the original networks (for image classification) were not affected by the embedded watermarks, and that the ownership of the network models was robustly verified against a variety of modifications such as fine-tuning and pruning.
However, the aforementioned method was limited in the sense that one has to access all the network weights in question to extract the embedded watermarks (i.e. the white-box setting). Therefore, (Le Merrer et al., 2017) proposed to embed watermarks in the classification labels of adversarial examples, so that the watermarks can be extracted remotely through a service API without the need to access the internal network weights (i.e. the black-box setting). Later, (Adi et al., 2018) showed that embedding watermarks in the networks' (classification) outputs is effectively a designed backdooring, and provided theoretical analyses of the performances under various conditions. Also in black-box and white-box settings, (Darvish Rouhani et al., 2018; Chen et al., 2018; Jia & Potkonjak, 2018) demonstrated how to embed watermarks (or fingerprints) that are robust to watermark overwriting, model fine-tuning and pruning. Noticeably, a wide variety of DNN architectures such as Wide Residual Networks (WRNs) and Convolutional Neural Networks (CNNs) were investigated.
(Vukotić et al., 2018) investigated a new family of transformations based on deep learning networks for blind image watermarking. In one of the latest works, an IBM team (Zhang et al., 2018) proposed to use three types of watermarks (i.e. content, unrelated and noise) and demonstrated their performance on the MNIST and CIFAR10 datasets. Lastly, (Zhu et al., 2018) proposed HiDDeN, an end-to-end trainable framework for data hiding in color images based on CNNs and generative adversarial networks, which may be applied to watermarking and steganography.
2 Digital Passport for Deep Neural Networks
The ultimate goal of digital passporting is to design and train DNN models in such a way that the protected networks function normally only if a designated digital passport (or signature) is presented; otherwise, the functionalities of the original networks are paralyzed. In the following, we shall first formulate the desired characteristics of DNNs protected by digital passports, and then illustrate a generic solution as well as concrete implementation schemes that are employed to embed the digital passports into a convolutional neural network (CNN). (This paper focuses only on the digital passporting of CNNs; the passporting of other network architectures such as GANs is out of the scope of this paper and will be reported elsewhere.)
2.1 Problem formulation
Let N denote a CNN model, with parameters W, to be protected by a secret digital passport s; after a training or passporting process, the network embedded with the passport is denoted by N[W, s]. The inference of such a protected model can be characterized as a process that modifies the network behavior according to the running-time digital passport t:

M(N[W, s], t) = M_s if t = s, and M_t otherwise,     (1)

in which M_s is the network performance with the passport correctly verified, and M_t is the performance with an incorrect passport, i.e. t ≠ s.
The following properties of the protected network are desired for the sake of intellectual property protection:

If the correct passport is presented (t = s), the performance M_s should be as close as possible to that of the original, unprotected network. Specifically, if the inconsistency between M_s and the original performance is smaller than a desired threshold, then the protected network is called functionality-preserving.

If an incorrect passport is presented (t ≠ s), on the other hand, the performance M_t should be as far as possible from M_s. The discrepancy between M_s and M_t is therefore defined as the protection strength. Moreover, if the strength is larger than a desired threshold, then the network in question is called well-protected.
2.2 A generic solution
In order to modify the behavior of the CNN as formulated in (1), we propose the following generic solution: we first partition the set of CNN parameters W into two non-overlapping subsets, i.e. W = {W_p, W_h}, and use the public parameters W_p, together with the secret passport s, to determine the hidden parameters W_h (in this work, traditional hidden layer parameters are considered as public parameters unless they are protected by (2)). Formally, this principle can be illustrated as follows:

W_h = F(W_p, s),     (2)

in which F denotes a set of mathematical functions that calculate the values of W_h from the given W_p and s. We refer to F as passport functions in the rest of the present paper.
The learning or digital passporting of a CNN model therefore involves the optimization of the network objective function, e.g. minimizing the cross-entropy loss, by adjusting the public parameters W_p. The hidden parameters W_h are no longer trainable; instead, they are directly updated according to (2).
During the inference stage, the hidden parameters are computed with the running-time passport t, i.e. W_h = F(W_p, t). Clearly, only if the designated secret passport is provided (t = s) will the hidden parameters be set correctly. Otherwise, the CNN functionalities will be paralyzed due to incorrect values of the hidden parameters.
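As a minimal numpy sketch of this principle (the parameter shapes and the averaging form of the passport function are illustrative assumptions, not the concrete schemes of Section 2.3), hidden parameters derived from a wrong running-time passport diverge from those derived from the secret one:

```python
import numpy as np

rng = np.random.default_rng(42)

# Public (trainable) parameters: 8 filters of shape 3x3x3.
W_p = rng.normal(size=(8, 3, 3, 3))
# Secret passport s, here one patch shared by all filters.
s = rng.uniform(-1.0, 1.0, size=(3, 3, 3))

def passport_function(W_p, s):
    """Illustrative passport function F(W_p, s): derive one hidden
    scalar per filter from the public weights and the passport."""
    return (W_p * s).sum(axis=(1, 2, 3)) / s.size

W_h_valid = passport_function(W_p, s)            # running-time t == s
t_fake = rng.uniform(-1.0, 1.0, size=s.shape)    # a guessed passport
W_h_fake = passport_function(W_p, t_fake)        # running-time t != s

# Wrong passports yield wrong hidden parameters, paralysing inference.
print(np.allclose(W_h_valid, W_h_fake))  # False
```

A full implementation would apply this per layer, but the paralysis mechanism is the same: only the secret passport reproduces the hidden parameters the network was trained with.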
Table 1: Three different choices of passport functions (B means the parameter is a trainable variable as in a standard batch normalization layer, * denotes the convolution of the passport with the public weight kernels, and Avg denotes the channel-wise average of the convolution outputs).
2.3 Concrete implementations
A concrete implementation of the generic solution therefore has to answer two questions:
How to partition the CNN parameters W into the public and hidden subsets {W_p, W_h}?
Which mathematical passport functions F are to be used?
We shall illustrate below a number of implementation schemes, each with its respective answers to the above questions. In particular, the implementations are inspired by the commonly adopted batch normalization technique, which essentially applies a channel-wise linear transformation to its inputs (Ioffe & Szegedy, 2015).
In this work, we propose to append after each convolution or fully connected layer a digital passporting layer, whose scale factor γ and bias shift term β are dependent on both the weights of the preceding layer and the secret passport as follows:

O^l(X^l_p) = γ^l X^l_p + β^l,     (3)
γ^l = F_γ(W^l_p, s^l_γ),     (4)
β^l = F_β(W^l_p, s^l_β),     (5)

in which l denotes the layer number, X^l_p is the input to the passport layer, O^l is the corresponding linear transformation output, F_γ and F_β are the passport functions, while s^l_γ and s^l_β are the passports used to derive the scale factor and bias term respectively. Figure 2 depicts the architecture of the digital passport layers used in a ResNet layer and Table 1 summarizes the different choices of passport functions that have been employed in our work.
The practical choice of formulas (3), (4) and (5) is inspired by the Batch Normalization (BN) layer (Ioffe & Szegedy, 2015), which is also why the V1 and V2 schemes still train the scale factor or bias term as in standard BN (denoted B in Table 1). The respective performances of these three choices against different attacks are illustrated and discussed in the following sections (see Sections 3.1 and 4).
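To make the layer concrete, the following numpy sketch implements one fully passport-derived choice under our reading of Table 1 (the naive valid-mode correlation and all function names are our own assumptions): the scale factor and bias term are channel-wise averages of the public kernels applied to the two passports, and the layer then linearly transforms its input as in (3)-(5).

```python
import numpy as np

def channel_avg_conv(passport, kernels):
    """Avg(W_p * s): correlate the passport (C_in, H, W) with each public
    kernel (C_out, C_in, k, k) and average the resulting map per channel."""
    C_out, C_in, k, _ = kernels.shape
    H, W = passport.shape[1:]
    out = np.empty(C_out)
    for c in range(C_out):
        vals = [np.sum(passport[:, i:i + k, j:j + k] * kernels[c])
                for i in range(H - k + 1) for j in range(W - k + 1)]
        out[c] = np.mean(vals)
    return out

def passport_layer(x, W_p, s_gamma, s_beta):
    """Sketch of the passporting layer: scale and shift the channel
    activations x (C_out, H, W) with passport-derived gamma and beta."""
    gamma = channel_avg_conv(s_gamma, W_p)                 # cf. eq. (4)
    beta = channel_avg_conv(s_beta, W_p)                   # cf. eq. (5)
    return gamma[:, None, None] * x + beta[:, None, None]  # cf. eq. (3)

rng = np.random.default_rng(0)
W_p = rng.normal(size=(4, 3, 3, 3))      # public conv weights
x = rng.normal(size=(4, 8, 8))           # activations of that conv layer
s_g = rng.uniform(-1, 1, size=(3, 8, 8))
s_b = rng.uniform(-1, 1, size=(3, 8, 8))
y = passport_layer(x, W_p, s_g, s_b)
```

An invalid passport would produce wrong gamma and beta and hence a wrongly scaled output at every passport layer, which is what paralyzes the network end to end.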
It turns out that the introduction of digital passporting layers does not affect the convergence of parameter tuning: as shown in Figure 3, both the test accuracy and the computed linear transformation parameters γ and β stabilize in the later learning phase. More specifically, as demonstrated by the experimental results in Section 4, the performance discrepancies between the passport-protected networks and their original counterparts are no more than, respectively, 1% and 5% for the CIFAR10 and CIFAR100 classification experiments investigated in this paper. This superior functionality-preserving capability is ascribed to the fact that the original objective functions remain unaltered during the learning stage (in contrast, the objective functions are inevitably changed by the addition of certain regularization terms to embed watermarks (Uchida et al., 2017; Le Merrer et al., 2017; Adi et al., 2018)).
Digital passporting by fine-tuning: note that the proposed digital passporting method can be applied with the public parameters either initialized from a pre-trained model or trained from scratch using standard initialization methods, e.g. He initialization (He et al., 2015). Figure 3 shows a comparison between a passport network trained from a pre-trained model (blue) and from scratch (green). As shown, the training initialized with pre-trained models struggles with drastic changes in the hidden parameters, and the final accuracy is inferior to that of the from-scratch approach. We therefore do not conduct further fine-tuning experiments in this work.
2.4 Generation and attacking of passport
The public parameters of a passport-protected DNN might be easily plagiarized, in which case the plagiarizer still has to deceive the network with certain passports. The chance of success of such an attacking strategy depends on the odds of correctly guessing the secret passports. Figure 4 illustrates three different types of passports which have been investigated in our work:
Random patterns, whose elements are independently generated according to the uniform distribution on [-1, 1]. The attack for this type of passport is generated in the same vein. We refer to this passport-attack combination as type T1 (see Table 2).
The chance for a random attack to coincide with the random pattern passport is extremely low; thus, strong protection against attacks is guaranteed. Yet the downside is that it is hard to associate these random patterns with a person or corporate identity, which is often needed to prove network ownership. Also, the average values of random patterns might concur with each other, due to the uniform distribution of each element, thus jeopardizing the protection strength.
One selected image (two images are needed for the V3 passport functions defined in Table 1) is fed through a trained network with the same architecture, and the corresponding feature maps are collected. The selected image is then used as the passport at the input layer, and the corresponding feature maps are used as the passports at the other layers. We refer to passports generated in this way as fixed image passports. The corresponding passport-attack combination is denoted as T2 in Table 2.
The image passport is advantageous since it is straightforward to associate it with a person or corporate identity. However, the protection strength provided by a single image passport is limited, as plagiarizers might initiate attacks with images of similar or near-duplicate content.
A set of K selected images is fed through a trained network with the same architecture, and the corresponding feature maps are collected at each layer. Among the K options, only one is randomly selected as the passport at each layer. Specifically, for a set of K images applied to a network with L layers, there are altogether K^L possible combinations of passports that can be generated. We refer to passports generated in this way as random image passports, which feature both strong protection strength and easy association with a person or corporate identity.
Attackers targeting this type of passport have to pick one passport at each layer; even if they have knowledge of the set of images, the chance of guessing the correct passport combination is merely K^(-L). This type of passport attack is denoted as T3.
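With hypothetical numbers (say K = 20 candidate images and an L = 18-layer network; both values are illustrative, not the paper's settings), the combinatorial search space facing a T3 attacker grows as follows:

```python
# K candidate images, one chosen independently per layer of an L-layer net.
K, L = 20, 18                           # illustrative values only
combinations = K ** L                   # possible passport sequences
guess_probability = 1.0 / combinations  # chance of one random guess, K^-L
print(f"{combinations:.3e} combinations")  # ~2.621e+23
```

Even modest K and L put a single correct guess far beyond brute force, which is the source of the protection strength claimed for random image passports.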
Respective performances of the above passports and attacks are demonstrated in Section 4.
3 Digital Passports as Watermarks
In case the original training datasets are somehow made available to plagiarizers, they may opt to reverse-engineer the hidden parameters directly. The functionality of the protected networks might then be retained, to various extents, depending on how successfully the hidden parameters can be recovered. As shown in the following subsection, the chance of success actually depends on which passport functions (in Table 1) are adopted.
3.1 A reverse-engineering attack of hidden parameters
Given the original training datasets, in principle, the hidden parameters might be reverse-engineered by setting them as trainable variables while holding the cloned public parameters as constants. Then the optimization algorithm shall adjust the hidden parameters to minimize the training error e.g. the cross-entropy loss.
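The attack can be sketched on a toy one-layer model (pure numpy; the linear model, learning rate and MSE loss are illustrative stand-ins for a full network and cross-entropy): the cloned public weights are held constant while gradient descent tunes the hidden scale and bias to fit the leaked training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cloned public parameters (held constant by the attacker).
W_p = rng.normal(size=4)
W_p /= np.linalg.norm(W_p)
gamma_true, beta_true = 1.7, -0.3   # hidden parameters set by the passport

X = rng.normal(size=(256, 4))       # the leaked training inputs
y = gamma_true * (X @ W_p) + beta_true

# Reverse-engineering: treat gamma, beta as trainable, minimise the MSE.
gamma, beta = 0.0, 0.0
lr = 0.05
for _ in range(500):
    z = X @ W_p                     # frozen public computation
    err = gamma * z + beta - y
    gamma -= lr * 2.0 * np.mean(err * z)   # gradient step on gamma
    beta -= lr * 2.0 * np.mean(err)        # gradient step on beta
```

In this noiseless toy case the hidden parameters are recovered exactly; the point of Section 3.1 is that with real networks and the V2/V3 passport functions the analogous optimization stalls well below the protected accuracy.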
On the one hand, the reverse-engineering attack is able to successfully recover the hidden bias terms for the vulnerable choice (V1) of passport functions, with the reverse-engineered accuracies reaching almost 92% for CIFAR10 classification. On the other hand, the more resilient passport functions (i.e. V2 and V3) demonstrate better protection strength against the reverse-engineering attack: the best accuracy the attacker can achieve is no more than 70% (please consult more results in Section 4.3 and the supplementary material). Taking into account the crippled network performance, the exceedingly high computational cost and the high-priced skills required to set up the reverse-engineering attack, we regard this attacking approach as unprofitable and demotivating for plagiarizers who intend to seek commercial benefits.
3.2 Verification of suspect network ownership
We shall show below that, even if the hidden parameters are reverse-engineered or illegally disclosed, the ownership of the protected networks can still be verified through the designed network behavior, which is highly dependent on the designated passports as formulated in (1). Under this circumstance, digital passports play the role of digital watermarks in the white-box setting (Uchida et al., 2017).
As shown in Figure 3, the hidden parameters γ and β converge to specific constant values that lead to the desired performance M_s. Therefore, the public parameters W_p and the secret passports s_γ, s_β are actually constrained by the passport functions (4) and (5):

γ = F_γ(W_p, s_γ),     β = F_β(W_p, s_β).
Bearing this constraint in mind, we propose to verify the ownership of a suspect plagiarized network by the following steps:
Feed the network with the secret passports and check whether the test accuracy on a pre-determined set of test samples is the same as the expected performance M_s. Since the chance of enabling the network with a random guess is extremely low (e.g. K^(-L) for networks protected by random image passports, see Section 2.4), one can confidently claim ownership if the verification outcome is positive;
Moreover, add random noise to a varying percentage of the secret passport elements, and check whether the test accuracies using the perturbed passports match the set of pre-recorded performances (see Figure 5 for an example). One can claim ownership if the verification outcome is positive.
In order to strengthen the claim of ownership, one can furthermore select either personal identification pictures or corporate logos (Figure 3(c)) when designing the fixed or random image passports (see definitions in Section 2.4).
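The two verification steps above can be sketched as follows (the function names, tolerance and noise fractions are illustrative assumptions; in practice `eval_fn` would measure the test accuracy of the suspect network under a given passport, and the toy stand-in below merely degrades with the fraction of altered elements):

```python
import numpy as np

def verify_ownership(eval_fn, passport, expected_acc, recorded_curve,
                     noise_fracs=(0.1, 0.3, 0.5), tol=0.02, seed=0):
    """Two-step white-box ownership check (a sketch).

    Step 1: the claimed secret passport must reproduce the expected
    accuracy. Step 2: perturbing a growing fraction of passport elements
    must reproduce the accuracy curve recorded at passporting time.
    """
    rng = np.random.default_rng(seed)
    if abs(eval_fn(passport) - expected_acc) > tol:
        return False
    for frac, recorded_acc in zip(noise_fracs, recorded_curve):
        noisy = passport.copy()
        idx = rng.choice(noisy.size, size=round(frac * noisy.size),
                         replace=False)
        noisy.flat[idx] += rng.uniform(-1.0, 1.0, size=idx.size)
        if abs(eval_fn(noisy) - recorded_acc) > tol:
            return False
    return True

# Toy stand-in: accuracy drops with the fraction of altered elements.
secret = np.random.default_rng(1).uniform(-1, 1, size=100)
eval_fn = lambda p: 0.92 - 0.6 * np.mean(p != secret)
print(verify_ownership(eval_fn, secret, 0.92, (0.86, 0.74, 0.62)))
```

The second step matters because it ties the claimant to behavior recorded before any dispute, which a plagiarizer holding only the public parameters cannot reproduce.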
It must be noted that using passports as proofs of ownership to stop infringement is a last resort, needed only if the hidden parameters are illegally disclosed or (partially) recovered. We believe this juridical protection is often unnecessary, since the proposed technological solution provides proactive, rather than reactive, IP protection of deep neural networks.
Table 2: CIFAR10 protection strengths and performance inconsistencies under the T1, T2 and T3 passport attacks. Numbers after the network names are the test accuracies of the original networks; values in brackets are standard deviations.

| | | Protection strength (in %) | | | Performance inconsistency (in %) | | |
| | | T1 | T2 | T3 | T1 | T2 | T3 |
| V1 | ResNet (92.72) | 48.39 (4.31) | 33.48 (24.21) | 76.04 (6.42) | +0.19 (0.17) | +0.21 (0.17) | +0.37 (0.18) |
| V2 | ResNet (92.72) | 82.50 (0.24) | 59.54 (23.42) | 82.51 (1.05) | -0.21 (0.22) | +0.07 (0.18) | +0.10 (0.10) |
| V3 | ResNet (92.72) | 82.57 (0.27) | 78.98 (8.95) | 81.86 (0.93) | -0.15 (0.20) | -0.81 (0.32) | -0.73 (0.24) |
| | VGGNet (92.24) | - | - | 82.26 (0.35) | - | - | +0.02 (0.26) |
| | AlexNet (86.41) | - | - | 76.83 (1.59) | - | - | +0.82 (0.23) |
4.1 Experiment setup
In our experiments, we investigated two image classification tasks, i.e. CIFAR10 and CIFAR100, with three popular deep learning architectures, i.e. ResNet (He et al., 2016), VGGNet (Simonyan & Zisserman, 2014) and the seminal AlexNet (Krizhevsky et al., 2012). Detailed descriptions of the datasets, network architectures and hyperparameters of the training algorithm are elaborated in the supplementary material.
For each task and dataset, we trained multiple networks with different passport functions and tested them against different attacking strategies. Performances of the passport-protected networks are reported using histograms of their respective accuracies (see Figure 6 for the experiment results). In terms of quantitative evaluation metrics, we adopted the performance inconsistency and the protection strength defined in Section 2.1, which are also given as follows:

I = M_s - M_o,     P = M_s - M_t,

in which I stands for the inconsistency, P for the protection strength, and M_o, M_s and M_t denote, respectively, the test accuracies of the original, the protected and the attacked networks.
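In code, the two metrics reduce to simple differences between the protected, original and attacked accuracies (the example values are hypothetical, in percent):

```python
def performance_inconsistency(acc_protected, acc_original):
    """I = M_s - M_o: accuracy change caused by embedding the passport."""
    return acc_protected - acc_original

def protection_strength(acc_protected, acc_attacked):
    """P = M_s - M_t: accuracy lost when an invalid passport is used."""
    return acc_protected - acc_attacked

# Hypothetical accuracies in percent: original 92, protected 92, attacked 10.
assert performance_inconsistency(92, 92) == 0
assert protection_strength(92, 10) == 82
```

A functionality-preserving, well-protected network is thus one with I near zero and P large.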
Table 3: CIFAR100 results with the V3 passport function and T3 passport attacks. Numbers after the network names are the test accuracies of the original networks; values in brackets are standard deviations.

| | | Protection strength (in %) | Performance inconsistency (in %) |
| V3 | ResNet (70.19) | 71.14 (0.19) | +1.99 (0.23) |
| | VGGNet (70.86) | 66.98 (0.01) | -2.88 (0.37) |
| | AlexNet (58.19) | 61.33 (0.33) | +4.21 (0.49) |
4.2 Protection performances against passport attacks
The following experiments were carried out to evaluate the performance of the proposed digital passport method. First, each given network architecture is embedded with the three different types of passports introduced in Section 2.4, and each configuration is repeated 5 times with different passports being embedded. The test accuracies of the 5 protected networks with the corresponding valid passports are measured. Second, for each passport-embedded network, 1000 fake-passport attacks are attempted and the resulting test accuracies measured. Third, for each protected network, the three different passport functions (i.e. V1, V2, V3) introduced in Table 1 are adopted and the resulting test accuracies measured.
Figure 6 illustrates histograms of CIFAR10 classification accuracies measured, respectively, for the original network, the protected network with valid or fake passports, using different passport functions. Table 2 summarizes the averaged performance inconsistencies and protection strengths, over all passport protected networks.
Functionality-preserving: First, it is observed that the performance inconsistency between the original and the protected networks is no more than 1% for all networks. This important result is ascribed to the fact that the original objective functions remain unaltered during the learning stage.
Well-protected: Second, the protection strength of the digital-passport-based networks ranges from 33% up to 83%. Among the three types of passport functions, V3 provides the most resilient protection, with a protection strength consistently larger than 76%.
In summary, the networks embedded with the V2/V3 passport functions are consistently functionality-preserving and well-protected, as defined in Section 2.1. In contrast, the combination of the V1 passport function and the fixed image passport provides the most vulnerable protection, with an averaged protection strength of merely 33%. Inspection of the histograms in Figure 6 shows that the most aggressive attack may achieve a test accuracy near 90%. As a result of this vulnerability, for the experiments with the other network architectures and CIFAR100, we skip V1 and only use the V3 passport function, together with the random image passports and T3 passport attacks.
CIFAR100: We also conducted the same experiments on this public dataset (see Table 3 for results); the performance inconsistency is between -3% and 4%, and the network protection strength ranges from 61% up to 71%. Note that for ResNet and AlexNet, the test accuracies of the protected networks are actually higher than those of the originals. Also, the fake-passport attacking accuracies for both ResNet and VGGNet are about 1%, virtually equivalent to randomly guessing 1 out of 100 classes.
4.3 Protection performance against reverse-engineering attacks
For each network model constructed for the experiments in Section 4.2, we apply the reverse-engineering attack (RevA) illustrated in Section 3.1 and measure the performance of the recovered networks. Each type of network is repeated 5 times as stated before, and Figure 7 illustrates the histograms of the measured accuracy distributions, respectively, for the reverse-engineered networks and the protected networks with valid passports. The corresponding protection strengths and performance inconsistencies are summarized in Table 4.
It was observed that the networks protected by the V1 passport function are vulnerable to RevAs, and this is particularly true for the random image passport. On the other hand, the networks have better protection with the V2/V3 passport functions, where the protection strengths are around 24%. While the network functionality is not completely disabled, taking into account the high computational cost of RevAs, we view the protection against them as effective.
Finally, as a last resort, the embedded passport images can be associated with person or corporate identities as depicted in Figure 3(c). This provides an easily verifiable approach to claim ownership of protected networks.
4.4 Distribution of test accuracies with different deep architecture
This section describes distributions of the test accuracies with different networks on CIFAR10 (Figure 8) and CIFAR100 (Figure 9), respectively. For each histogram, green line represents test accuracy of the original network, red histogram represents test accuracies with the correct passports and blue histogram represents test accuracies with fake passports.
5 Discussions and Conclusions
We renovated the paradigm in recent studies of digital watermarking for deep neural network protection, by proposing to paralyze network functionalities for unauthorized usage and thus prevent IP infringements in a cost-effective, proactive and timely manner. The proposed generic solution and implementation schemes are proved, by extensive experimental results, to be effective, reliable and resilient against tens of thousands of fake-passport attacks and reverse-engineering attacks.
We believe this paper puts forward a new research direction for the study of deep neural networks IP protection which is urgently needed. Our future works include the passport protection of other network architectures such as GANs, which is feasible according to the generic solution principle, yet remains to be investigated empirically.
- Adi et al. (2018) Adi, Y., Baum, C., Cisse, M., Pinkas, B., and Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX), 2018.
- Chen et al. (2018) Chen, H., Darvish Rohani, B., and Koushanfar, F. DeepMarks: A Digital Fingerprinting Framework for Deep Neural Networks. arXiv e-prints, art. arXiv:1804.03648, April 2018.
- Darvish Rouhani et al. (2018) Darvish Rouhani, B., Chen, H., and Koushanfar, F. DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models. arXiv e-prints, art. arXiv:1804.00750, April 2018.
- He et al. (2015) He, K., Zhang, X., Ren, S., and Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, 2015.
- He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
- Ioffe & Szegedy (2015) Ioffe, S. and Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML), pp. 448–456, 2015.
- Jia & Potkonjak (2018) Jia, G. and Potkonjak, M. Watermarking deep neural networks for embedded systems. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8, 2018.
- Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pp. 1097–1105, 2012.
- Le Merrer et al. (2017) Le Merrer, E., Perez, P., and Trédan, G. Adversarial Frontier Stitching for Remote Neural Network Watermarking. arXiv e-prints, art. arXiv:1711.01894, November 2017.
- Mun et al. (2017) Mun, S.-M., Nam, S.-H., Jang, H.-U., Kim, D., and Lee, H.-K. A robust blind watermarking using convolutional neural network. arXiv preprint arXiv:1704.03248, 2017.
- Simonyan & Zisserman (2014) Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Uchida et al. (2017) Uchida, Y., Nagai, Y., Sakazawa, S., and Satoh, S. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pp. 269–277, 2017.
- Vukotić et al. (2018) Vukotić, V., Chappelier, V., and Furon, T. Are deep neural networks good for blind image watermarking? In International Workshop on Information Forensics and Security (WIFS), pp. 1–7, 2018.
- Zhang et al. (2018) Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., and Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security (ASIACCS), pp. 159–172, 2018.
- Zhu et al. (2018) Zhu, J., Kaplan, R., Johnson, J., and Fei-Fei, L. Hidden: Hiding data with deep networks. In European Conference on Computer Vision (ECCV), pp. 682–697, 2018.