ArchNet: Data Hiding Model in Distributed Machine Learning System

04/23/2020 ∙ by Kaiyan Chang, et al. ∙ 0

Cloud computing services have become the de facto standard for training neural networks. However, the computing resources of cloud servers are limited by hardware and by the fixed algorithms of the service provider. We observe that this problem can be addressed by a distributed machine learning system that utilizes idle devices on the Internet. We further demonstrate that such a system can improve computing flexibility by providing diverse algorithms. For data encryption in the distributed system, we propose the Tripartite Asymmetric Encryption theorem and give a mathematical proof. Based on the theorem, we design a universal image encryption model, ArchNet. The model has been implemented on the MNIST, Fashion-MNIST and Cifar-10 datasets. We use different base models on the encrypted datasets and compare the results with the RC4 algorithm and a Differential Privacy policy. The accuracies on the datasets encrypted by ArchNet are 97.26%, 84.15% and 79.80%, compared with 97.31%, 82.31% and 80.22% on the original datasets. Our evaluations show that ArchNet significantly outperforms RC4 in recognition accuracy on 3 classic image classification datasets, and that our encrypted dataset sometimes outperforms the original dataset and the Differential Privacy policy.




1. Introduction

Real-world datasets are diverse and complex. Cloud computing services have become the de facto standard for training neural networks. Deep learning often involves several complex models that take a long time to train. Some machine learning service providers (e.g., Google) develop TPUs or FPGAs to accelerate neural network computation on cloud servers (Farhadi et al., 2019; Huot et al., 2020) and encourage people to upload datasets to the cloud. Figure 1 shows a general cloud machine learning model. However, there are two problems with this business model. (i) Although cloud machine learning services are convenient, each upload can only use the algorithms implemented by the service provider. (ii) Researchers often produce new and specific deep learning models in their work, but lack datasets to validate their algorithms. To address the first problem, many ML service providers allow users to rent cloud servers and implement their own algorithms. To address the second problem, dataset exchange platforms (e.g., Kaggle) have been established, and general datasets are widely disseminated on the Internet for data researchers (Prabhu, 2019).

Figure 1. Cloud Computing system

Despite the first solution, some users do not have the expertise to design better models and train them on their own. They are not satisfied with the fixed performance of commercial models, and they want to try newer and more diverse algorithms to make their own services more competitive. General data exchange platforms also struggle to satisfy researchers' demand for personalized datasets. Therefore, we propose a distributed machine learning system in which the cloud machine learning service provider only serves as an intermediary between the data publisher and the algorithm owner. We also focus on the privacy of users' datasets. The key point in distributed machine learning is to ensure that the user's dataset information is not leaked. Traditional general encryption algorithms can ensure that users' datasets are encrypted, but they cannot ensure that the patterns in the encrypted datasets can still be recognized by a deep learner.

Although the Fully Homomorphic Encryption (FHE) algorithms proposed in recent years can achieve a certain effect in theory, the encrypted data must rely on a special neural network structure to be correctly identified, which is inconvenient for algorithm designers. The differential privacy policy cannot easily separate the base model from the encryption model. To resolve the tension between these two aspects, we propose ArchNet and Tripartite Asymmetric Encryption. After ArchNet encryption, the dataset can be correctly recognized by a neural network model without special processing. The remaining problem is to guarantee that the encrypted dataset is difficult to crack. Our method addresses the data hiding problems of existing machine learning models and the data security problems of distributed machine learning systems.

In this paper, we first introduce a novel and effective distributed machine learning system for diverse deep learning algorithms (i.e., moving the computing end of machine learning from the central server to the remote end). We further propose a dataset encryption scheme (ArchNet), which addresses the problem of an untrustworthy remote end in the distributed machine learning computing system. We derive the basic principle of Tripartite Asymmetric Encryption mathematically and give a formal proof that a neural network can be used for encryption and decryption. We find that using a neural network as the encryptor makes the encrypted dataset difficult for others to steal and easy to learn at the computing end. The reasons are (i) some combinations of basic neural network units have reversible operations, and (ii) it is difficult for humans to recognize data in high-dimensional space.

We design and implement our technique as ArchNet, a new general dataset encryption strategy. We compare ArchNet with RC4 (Mironov, 2002) on three image classification datasets (MNIST, F-MNIST (Xiao et al., 2017), Cifar-10 (Krizhevsky, 2009)) using the same base model. Our results show that the recognition rate on datasets encrypted by ArchNet is always significantly higher than that of RC4, and is almost the same as on the original dataset and under the DP policy (Mironov, Aug 2017). In terms of recognizability, a picture encrypted by ArchNet is not easily recognized by a person, so the encryption effect is better.

Our primary contributions in this paper are as follows:

  • We propose Tripartite Asymmetric Encryption and two kinds of keys.

  • We are the first to prove the rationality of neural network in data encryption.

  • We are the first to apply the neural network to data encryption in the network-based distributed machine learning system.

  • We design, implement and evaluate our data encryption scheme (ArchNet), which significantly outperforms the RC4 encryption algorithm.

The rest of the paper is organized as follows. Section II provides the principle of distributed machine learning. Section III summarizes the necessary theoretical basis of data hiding, and describes the basic principle of Tripartite Asymmetric Encryption. Section IV provides the ArchNet model to implement a data encryption scheme. We present our experimental results in Section VII. Section VIII summarizes the related work and Section IX concludes the paper.

2. Distributed Machine Learning Solution

In this section, we first describe the structure of the distributed machine learning system. We then explain why the model hiding problem and the data hiding problem must be solved in the system.

2.1. Distributed Machine Learning System

The computing resources of cloud servers are limited by hardware and by the fixed algorithms of the service provider. To solve this problem, we propose a distributed machine learning system. It differs from distributed machine learning on a heterogeneous computing system (i.e., a computer with GPU, TPU or FPGA): it is a change of the business pattern in cloud ML. The computer network is the basis of our system.

Figure 2. Distributed Machine Learning System Structure

We define the computer network structure as the union of 3 different sets of nodes, as shown in Figure 2: the machine learning server node, the set of dataset publishing nodes, and the set of computer nodes with computing resources. The machine learning server node receives datasets from the dataset publishing nodes via the Internet and then posts the task information. A computer that meets the task requirements can automatically make a request to the machine learning server, which sends the dataset to the computer through the Internet. The computer provides a deep learning algorithm to train on the dataset. When the training is finished, the computer sends the deep learning model back to the machine learning server, and the server pays a fee to the computer after checking the model on the validation dataset. Finally, the server sends the deep learning model back to the dataset publisher node.
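The workflow above can be sketched as a small simulation. This is only an illustrative sketch: the class names, the threshold "training algorithm", the fee, and the acceptance rule `min_accuracy` are all assumptions made for the example and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, int]        # (encrypted feature, label); stands in for an encrypted image
Model = Callable[[float], int]    # a trained model is reduced to a plain callable


@dataclass
class Publisher:
    """Dataset publishing node: supplies data and finally receives the trained model."""
    train: List[Sample]
    validation: List[Sample]
    model: Optional[Model] = None


@dataclass
class Worker:
    """Computing node: trains with an algorithm the other parties never see."""
    earned: float = 0.0

    def fit(self, data: List[Sample]) -> Model:
        # Hypothetical private algorithm: threshold halfway between the class means.
        zeros = [x for x, y in data if y == 0]
        ones = [x for x, y in data if y == 1]
        threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return lambda x: int(x > threshold)


@dataclass
class Server:
    """Intermediary: relays data and models, validates, and pays the worker."""
    fee: float = 1.0
    min_accuracy: float = 0.9   # assumed acceptance rule, not taken from the paper

    def run_task(self, publisher: Publisher, worker: Worker) -> float:
        model = worker.fit(publisher.train)              # dataset goes out, model comes back
        hits = sum(model(x) == y for x, y in publisher.validation)
        accuracy = hits / len(publisher.validation)
        if accuracy >= self.min_accuracy:
            worker.earned += self.fee                    # pay only after validation
            publisher.model = model                      # forward the model to the publisher
        return accuracy


publisher = Publisher(
    train=[(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)],
    validation=[(0.15, 0), (0.85, 1)],
)
worker = Worker()
accuracy = Server().run_task(publisher, worker)
print(accuracy, worker.earned)
```

Note that the server object never holds any training logic of its own, which mirrors its role as a pure intermediary.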

In this system, the machine learning server does not need to have computing resources. It behaves as an intermediary for data and algorithm deployment. The dataset publishing node is the publisher of the dataset and the user of the trained deep learning model. The computing node is both a publisher of the deep learning model and a user of the dataset. (E.g., the computing node can be a university or a scientific research institute that wants to use the dataset provided by the publisher to validate its new deep learning algorithms.)

Compared with the cloud ML system, our system has no resource restrictions in the server. It can help the dataset publishers find specified algorithms. The system makes algorithm designers more active in validating their algorithms.

2.2. Model Hiding

The users of the computing nodes do not want others to know their private training algorithms. The computing node serializes the model into a universal format (e.g., ONNX), which connects the deep learning model publisher and the dataset publisher. Because the model in universal format does not contain the optimization method or the data augmentation algorithm, it hides the training algorithm of the deep learning model. The computing node knows how to train the model, while the server node does not know the specific training algorithm. (E.g., suppose a scientific research group decides to test its new image classification algorithm on a computing node. The group submits a dataset request to the machine learning service provider, trains on the dataset with the new algorithm, and generates an ONNX file. Finally, the group sends the ONNX file back to the server. The server validates the deep learning model after the task is completed, but does not know the specific training algorithm.) The private training algorithm is therefore well protected in our system: only the computing node knows the crucial training method.

2.3. Data Hiding

Dataset publishers do not want others to know the contents of their datasets, in order to protect privacy and sensitive information. (E.g., a dataset publisher publishes a dataset. Although the server and the computing nodes can obtain the dataset, they cannot know what it represents, but they can still use it to train a deep learning model.) To hide data, we design an encryption algorithm. General encryption algorithms such as AES and RC4 are already widely applied to data on the Internet. However, using such algorithms for data hiding causes problems for neural network training: they disrupt the original distribution of the dataset, so the accuracy of the neural network becomes very low. Therefore, some researchers propose fully homomorphic encryption (FHE). However, current FHE is not universal across deep learning algorithms; only specific models (e.g., CryptoNet) can be used as the base model. For the rest of the paper, we focus on the training accuracy under data encryption and propose our encryption method (ArchNet) to improve the accuracy of neural networks on encrypted datasets.
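To make concrete why a general stream cipher disrupts the data distribution, the sketch below implements the standard RC4 algorithm in plain Python and applies it to a toy "image" made of one dark half and one bright half. The key and the toy image are assumptions for illustration; this is not the experimental setup of the paper.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher: key scheduling, then XOR with the generated keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                          # pseudo-random generation algorithm (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Toy "image": a flat row of pixels with a dark half and a bright half.
image = bytes([0] * 512 + [255] * 512)
cipher = rc4(b"hypothetical-key", image)

print(len(set(image)), "distinct pixel values before encryption")
print(len(set(cipher)), "distinct byte values after encryption")
```

The two-level structure of the toy image is replaced by bytes spread over most of the value range, so spatial patterns that a convolutional model relies on are no longer present in the ciphertext. Because RC4 is symmetric, calling `rc4` again with the same key restores the image.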

3. Encryption Principle

In this section, we first describe the basics of data hiding problem and give the mathematical principle of our encryption method. We finally propose measurement standards to evaluate the performance of an encryption model.

3.1. Data Hiding Problem

An optimization problem usually consists of three different components: a vector of parameters $x$, an objective function $F(x)$, and a set of constraint functions $C_i(x)$. The goal is to find a concrete value of the parameter vector that maximizes $F(x)$ while satisfying all constraint functions, as shown below.

$$\max_{x \in \mathbb{R}^n} F(x) \quad \text{subject to} \quad C_i(x) \ge 0,\ i \in N; \qquad C_i(x) = 0,\ i \in Q$$

Here $\mathbb{R}$, $N$, and $Q$ denote the set of real numbers, the indices for inequality constraints, and the indices for equality constraints, respectively.

Usually, we use an iterative method with a small learning rate to maximize $F(x)$. We must make sure the result meets the constraints after each iteration. However, most constraint functions cannot be expressed analytically in practical applications. Evolutionary algorithms are generally used to solve optimization problems with no gradient information, while back propagation in neural networks is used to solve optimization problems with explicit gradient information. Both can only solve unconstrained optimization problems, and reducing a constrained optimization problem to an unconstrained one is difficult and depends on the specific problem. Data hiding is a constrained optimization problem. Its goal is to maximize the distance between the distribution of the original data and the distribution of the encrypted data. Its constraint is that the encrypted data can still be recognized with high accuracy by deep learning algorithms. This constraint is the premise for keeping algorithms flexible in the distributed machine learning system. The rest of the paper focuses on simplifying the data hiding problem so that it can be solved by a neural network.
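One way to write the data hiding problem in the constrained form described above is sketched here. All symbols are assumptions introduced for illustration and are not the paper's notation: $E_\theta$ is an encryptor with parameters $\theta$, $p_X$ and $p_{E_\theta(X)}$ are the distributions of the original and encrypted data, $d$ is some distance between distributions, $h$ is a deep learning classifier, and $\tau$ is a required accuracy level.

```latex
\max_{\theta}\; d\bigl(p_{X},\; p_{E_\theta(X)}\bigr)
\quad \text{subject to} \quad
\mathrm{Acc}\bigl(h;\; E_\theta(X),\; Y\bigr) \;\ge\; \tau
```

The accuracy constraint has no analytic form, which is why the following sections look for a way to remove it instead of enforcing it at every iteration.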

3.2. Tripartite Asymmetric Encryption

In order to transform the data hiding problem into an unconstrained optimization problem, we eliminate the constraint functions. Treat the deep learner as a set of functions. The first question is what kind of function can act on the dataset so that the deep learner still classifies the transformed data correctly. A function with this property can be used as an encryption function. The following definitions state this question mathematically.

Definition 1.

Let $F = \{f_1, f_2, \dots\}$ be a countably infinite set of functions. If for $f \in F$ there exists a function $g$ such that

$$g \circ f = I$$

where $I$ denotes the unit (identity) mapping, then $f$ is called the first type of encoding function or the first type of encryption function (E1) corresponding to $g$, and $g$ is called the first type of decoding function or the first type of decryption function (D1) corresponding to $f$.

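Definition 2.

Let $F = \{f_1, f_2, \dots\}$ be a countably infinite set of functions. If for $f \in F$ there exists a function $h$ such that

$$h \circ f : X \to Y$$

maps every element of the dataset $X$ to its label in $Y$, then $f$ is the second type of encoding function or the second type of encryption function (E2) corresponding to $h$, and $h$ is the second type of decoding function or the second type of decryption function (D2) corresponding to $f$, where the domain of $f$ is the dataset $X$ and the range of $h$ is the label set $Y$ corresponding to the dataset $X$.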

The definitions above state that the first type of decoding function decrypts the encrypted data to the original data, while the second type of decoding function decrypts the encrypted data to the corresponding label. We intend to find a function with a special property: it is both the first type of encoding function and the second type of encoding function.

Theorem 3.1.

If a function is the first type of encoding function and is also the second type of encoding function, then it has both a first type of decoding function and a second type of decoding function.


Under Theorem 3.1, there are 3 propositions for the pattern recognition problem. Proposition P: a deep learning model is used as the decoder. Proposition Q: the target classification label is used as the supervision output. Proposition R: the original dataset is used as the supervision output. It is easy to see that objects satisfying $P \wedge Q$ need the encoding function and the second type of decoding function, and objects satisfying $P \wedge R$ need the encoding function and the first type of decoding function. (E.g., in our distributed machine learning system, from the view of the computers with computing resources, the data can be classified only when the conjunction $P \wedge Q$ is satisfied, and from the view of the dataset publisher, the data can be encrypted only when the conjunction $P \wedge R$ is satisfied.)

The above derivation provides the mathematical basis for our distributed machine learning system. The publisher of the dataset needs to hold the encryption function and the first type of decoding function. The computer with computing resources needs to hold the second type of decoding function. Under our theorem, pattern recognition can continue while the data is encrypted. Theorem 3.1 is an extension of asymmetric encryption and the basic form of Tripartite Asymmetric Encryption (TAE). The following shows how TAE in a distributed machine learning system can be implemented by eliminating constraints.
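The division of functions among the parties can be illustrated with a toy example. The sketch below uses hand-written linear maps instead of neural networks, so it only illustrates the roles of the encoding function and the two decoding functions; the function names and the labeling rule (label 1 when the two coordinates sum to a positive number) are assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[float, float]
Encoded = Tuple[float, float, float]


def encode(x: Point) -> Encoded:
    """Encoding function: lifts a 2-D point into 3-D (held by the dataset publisher)."""
    a, b = x
    return (a + b, a - b, 2 * a + 3 * b)


def decode_to_data(z: Encoded) -> Point:
    """First type of decoding function: recovers the original point (publisher only)."""
    s, d, _ = z
    return ((s + d) / 2, (s - d) / 2)


def decode_to_label(z: Encoded) -> int:
    """Second type of decoding function: classifies the encoded point (computing node)."""
    s, _, _ = z
    return int(s > 0)


def true_label(x: Point) -> int:
    """Ground-truth rule of the toy task: label 1 when the coordinates sum to > 0."""
    return int(x[0] + x[1] > 0)


for point in [(1.0, 2.0), (-2.0, 0.5)]:
    z = encode(point)
    print(point, "->", z, "->", decode_to_data(z), "label", decode_to_label(z))
```

The computing node only ever needs `decode_to_label`, so it can solve the classification task without being able to reconstruct the original point. In ArchNet both decoders are neural networks that have to be learned, which this toy does not model.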

We use an analytical approach to find the key to the problem. The constraint of the data hiding problem is that the second type of decryptor must identify the encrypted data with high accuracy. To eliminate this constraint, we assume that the second type of decryptor is an unconstrained optimization model, and that this model can recognize the encrypted dataset well under any distribution.

We assume that the second type of decryptor is a neural network model. Neural networks can theoretically fit arbitrary nonlinear functions. In order to prove that the neural network model can recognize the encrypted dataset better, we define the concept of Function Compound Closure(FCC).

Definition 3 ().

Let countable functions set . Then can construct a new set as follows:

1. For all .

2. For any , if can be compound, and there exists inverse operation of compound operation, such that , .

Then set is called the closure of set under the function composition(FCC).

Suppose the neural network contains convolution, fully-connected, pooling and activation function operations (L. et al., 2015); then the neural network is a composition of functions. Moreover, we expect the set of neural network models to be a closure under function composition (FCC). Because Definition 3 guarantees that the compound operation has an inverse operation, this explains why there exists a neural network that can decode a dataset that has been mapped to a higher dimension by a neural network. The mathematical expression is as follows.

Theorem 3.2.

Suppose the dataset can be classified correctly and is encoded by a neural network that maps it to a high-dimensional space. Then there exists a neural network that decodes the encoded data, mapping it back to the original low-dimensional space.

Proof. Let the set of neural networks be an FCC, and let the dataset have a given number of attributes.

There exists a function compound operation from this set that maps the dataset to a high-dimensional space, producing an encoded dataset.

Because the set of neural networks is an FCC, and a neural network can reduce the data dimension (e.g., with a fully-connected layer), there exists a neural network that reduces the encoded dataset to the original low-dimensional space and restores it to the original dataset.

Hence the encoded dataset holds all the feature information of the original dataset.

Because the original dataset can be classified correctly, there exists a neural network that can classify the encoded dataset.

Theorem 3.2 demonstrates that there exists a neural network that can decode the encrypted data to its original state or to its correct classification, where the encrypted data is produced by a neural network encoder that maps the data to a higher dimension. This gives the mathematical basis for using a neural network as an unconstrained optimization tool to solve data hiding problems, and it is also the theoretical basis of our encryption algorithm implementation (ArchNet). We next propose measurement standards to evaluate the performance of an encryption model. We expect the encryption method to be difficult for malicious users to crack, and the encrypted data patterns to be easy for a deep learning model to recognize.

3.3. Difficulty to Steal

For data hiding, we expect that the holder of the encrypted dataset cannot obtain the original dataset without the first type of decryption function. We have shown that obtaining the first type of decryption function requires proposition R in Section 3.2 to be satisfied. For the holder of the encrypted dataset, proposition R is not satisfied, so in theory the holder cannot recover the original dataset. Therefore, a dataset encrypted by our policy is difficult to steal.

3.4. Computability

The data encryption method should let the computer recognize patterns well through the deep learning model. An encryption algorithm with high computability should ensure that the validation accuracy on the original dataset and on the encrypted dataset is approximately equal when the number of training epochs is the same:

$$\mathrm{Acc}(\text{original dataset}) \approx \mathrm{Acc}(\text{encrypted dataset})$$

Using the validation accuracy alone to measure the computability of an encryption method depends largely on the quality of the base model. To remove the influence of the base model, we propose a computability indicator. Suppose we solve an image classification problem with a given classifier, and an encryption method acts on the original dataset. The computability of the encryptor with this classifier is expressed as the relative drop in validation accuracy:

$$\text{indicator} = \frac{\mathrm{Acc}(\text{original dataset}) - \mathrm{Acc}(\text{encrypted dataset})}{\mathrm{Acc}(\text{original dataset})}$$

Here the indicator measures how easy the encrypted data is to compute on, and both accuracies are taken after the same number of training epochs. For the same number of epochs, the higher the indicator, the worse the computability of the encryption method: the encrypted data is difficult for the deep learning model to recognize. The lower the indicator, the better the encryption method: the encrypted data is easy for the deep learning model to recognize.
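As a check on how the computability indicator behaves, the sketch below computes the relative accuracy drop from the ArchNet accuracies reported in Table 1. The function name is an assumption, and reading the indicator as a relative drop is inferred from the reported numbers, which it reproduces to within 0.01 percentage points.

```python
def relative_accuracy_drop(original_acc: float, encrypted_acc: float) -> float:
    """Relative drop in validation accuracy caused by encryption (lower is better)."""
    return (original_acc - encrypted_acc) / original_acc


# (accuracy on original data, accuracy on encrypted data) in percent, ArchNet columns of Table 1.
archnet = {
    "F-MNIST": (82.31, 84.15),
    "MNIST": (97.31, 97.26),
    "Cifar-10": (80.22, 79.80),
}

drops = {
    name: 100 * relative_accuracy_drop(original, encrypted)
    for name, (original, encrypted) in archnet.items()
}
for name, drop in drops.items():
    print(f"{name}: {drop:.2f}%")
```

A negative value, as for F-MNIST, means the base model scored higher on the encrypted data than on the original data.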

4. ArchNet Structure

Figure 3 presents a high level overview of our approach. We describe the key components and the motivating examples of ArchNet in detail below.

4.1. Overview of ArchNet

Figure 3. The high level overview of the ArchNet.

According to Theorem 3.2, the premise of using a neural network as the decoder is that the encoder maps data to a high-dimensional space. An image is a tensor with three dimensions: channel, height and width. The channel is displayed as pixel color, and the height and width are displayed as pixel location. The target of the encryption method is to prevent people from recognizing the encrypted data. We propose a method that maps the original data to a high-dimensional space. This differs from the denoising autoencoder (Bengio et al., 2013), which reduces the data dimension. People can recognize data with no more than 3 dimensions, but cannot recognize data with more than 3 dimensions. According to Theorem 3.1, while training the encryptor and decryptor, we require their composition to produce the unit mapping, as shown in Figure 3. Therefore, the input dataset and the target dataset of the training model are the same. When the training is over, the model is split at the middle high-dimensional layer into two models. The first model is the first type of encoding function, and the second model is the first type of decoding function. The dataset publisher encrypts the training data with the first model to obtain the encrypted dataset. The dataset user does not need to hold any part of the first type of functions. The shape of the layer-by-layer output of the model, shown in Figure 3, resembles an arch, so we call this encryption and decryption method ArchNet. The first type of encoding function, which maps the original dataset into high-dimensional space, is called the H-encoder. The first type of decoding function, which restores the encrypted dataset from high-dimensional space to the original dataset, is called the L-decoder.

Figure 4. The concrete architecture of ArchNet. Yellow represents the fully-connected layer. Blue represents the ReLU activation function. Black represents the convolution layer.

4.2. H-encoder

The H-encoder consists of several basic neural network modules. It cannot be trained separately from the L-decoder. To make the H-encoder keep the original distribution of the data, we primarily rely on convolution layers. To make the model more difficult to steal when the encoder is applied to a simple dataset, a fully-connected layer and an activation function (e.g., ReLU, Softmax, Tanh) need to be added to the H-encoder. Because a pooling layer loses data information, we do not recommend adding one. Using convolution layers alone is not suitable either, because convolution is regular and cannot hide the data: with only convolution layers, the data in the middle layer can still be recognized by human beings, as shown in Figure 8. The output of the H-encoder is high-dimensional data. To expand the dimension of the data, we add a transposed convolution layer at the end of the H-encoder.

4.3. L-decoder

The L-decoder is the implementation of the first type of decoding function. The high-dimensional output of the H-encoder is the input of the L-decoder, whose goal is to map the high-dimensional output back to the original dataset. The L-decoder includes convolution layers, which retain the local characteristics of the data. For simple datasets, the L-decoder also includes a fully-connected layer and an activation function. In principle, the L-decoder and the H-encoder are symmetrical in structure, and combining them achieves the unit mapping. The difference between the two is that the L-decoder does not have a transposed convolution layer. We choose the neural network design according to the dataset structure. For a simple dataset (e.g., MNIST), a fully-connected layer and an activation function are added to the neural network to increase the complexity of the encrypted dataset. For a complex dataset (e.g., Cifar-10), only convolution layers are used in the neural network to increase the computability of the encrypted dataset.

4.4. Training Strategy

ArchNet can be used to encrypt a variety of data under different tasks. This paper primarily focuses on data encryption for image classification. The quality of the base model is significant when training on the encrypted dataset. Considering the learning strategy of ArchNet, suppose there is a training set $(X, Y)$, where $X$ is an image set and $Y$ is the label set corresponding to the image set. The training task of ArchNet involves a parametric function $f(x, \theta)$. The goal is to obtain the parameter $\hat{\theta}$ such that

$$\hat{\theta} = \arg\min_{\theta} \sum_{x \in X,\, y \in Y} L\bigl(y, f(x, \theta)\bigr)$$

where $L(y, f(x, \theta))$ defines the loss between the output of ArchNet and the real output $y$. We use the binary cross entropy as the loss function, which is defined as:

$$L = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[\, y_i \log f_i(x, \theta) + (1 - y_i) \log\bigl(1 - f_i(x, \theta)\bigr) \Bigr]$$

Here $n$ denotes the number of elements in the dataset. ArchNet achieves better results when its parameters are initialized from a uniform distribution. Training the neural network with gradient back propagation converges faster than evolutionary optimization strategies.
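For reference, the binary cross entropy named in this section and the mean square error named in Section 5.3 can both be written in a few lines. This is a minimal plain-Python sketch for scalar lists, not the paper's implementation; the `eps` clamp is an assumption added to keep the logarithm finite.

```python
import math
from typing import Sequence


def binary_cross_entropy(targets: Sequence[float], outputs: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross entropy between targets in [0, 1] and outputs in (0, 1)."""
    total = 0.0
    for y, p in zip(targets, outputs):
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(targets)


def mean_squared_error(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Mean of the squared differences between targets and outputs."""
    return sum((y - p) ** 2 for y, p in zip(targets, outputs)) / len(targets)


pixels = [1.0, 0.0, 1.0, 0.0]          # toy normalized target pixels
good = [0.9, 0.1, 0.8, 0.2]            # a close reconstruction
bad = [0.4, 0.6, 0.5, 0.5]             # a poor reconstruction

print(binary_cross_entropy(pixels, good), binary_cross_entropy(pixels, bad))
print(mean_squared_error(pixels, good), mean_squared_error(pixels, bad))
```

Both losses are smaller for the closer reconstruction, so either can drive the encoder and decoder toward the unit mapping when the target is the input image itself.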

4.5. Universality

ArchNet is the implementation of TAE, an encryption algorithm for deep learning systems. It has no effect on the learning of the base machine learning model. ArchNet can be expressed as $g(f(x)) = x$, where $g$ is the first type of decoding function and $f$ is the first type of encoding function. In the general strategy, $h(f(x)) = y$, where $y$ denotes the label of $x$ and $h$ denotes the second type of decoding function. ArchNet supports a diversity of base deep learning algorithms in the distributed machine learning system. Unlike FHE, it allows many choices of the function $h$. Compared with traditional general encryption algorithms such as RC4 and AES (Daemen and Rijmen, 1998), the ArchNet model is more stable: it adapts to a variety of base models while maintaining a higher accuracy. The accuracy difference between the ArchNet model and the original model is less than 1%, which makes the second equation hold more consistently.

4.6. Data Preprocessing

Data quantity, data quality and data distribution are three aspects of data that affect distributed machine learning from different perspectives. In a distributed machine learning system, the amount of data affects the efficiency of Internet transmission. If the data quality is poor, the accuracy of the base model is relatively low, and the accuracy of training on the encrypted data is not high either. The impact of data quality on ArchNet is significant. Because ArchNet acts as the first type of encryption and decryption function, the encrypted data retains structural characteristics similar to those of the original data. Therefore, the data quality determines the data distribution, and the data distribution affects the distribution of the encrypted data after ArchNet encryption. To get better pattern recognition accuracy in distributed machine learning, we need to remove as much noise from the data as possible so that the original data performs well on the base model.

4.7. Time Delay

In the distributed machine learning system, time delay is inevitable. If time delay is defined as , then , where denotes the delay of training ArchNet model, denotes the delay of network data transmission, which is divided into four parts: data publisher to machine learning server, machine learning server to data user(i.e., computers with computing source) to machine learning server, machine learning server to data publisher. Since the four parts are all network delays, we approximately equivalent them to the equivalent delay . denotes the time of computing on the computer resource side, and denotes the time of data staying in the machine learning server and waiting for allocation. From the actual situation, the delay of is the smallest, is different because of the difference in training computers. In general, delay is the largest. The generation of is determined by the size of dataset and its batch size.

4.8. A Motivating Example

Figure 5. The processing of the MNIST dataset.

We take the MNIST dataset as an example to demonstrate the data flow in the distributed machine learning system. MNIST is a typical image classification dataset. Suppose the dataset publisher wants to find a model to recognize handwritten digits, but does not want to use its own computer and is not sure that its own algorithm is good enough. It uses the MNIST dataset to train ArchNet for encryption, then encrypts the MNIST dataset and sends it to the machine learning service provider. The service provider receives the encrypted dataset and looks for a device with computing resources that is willing to take the task. The device receives the encrypted MNIST dataset and trains a model with a deep learning algorithm of its own design. After the training, the trained model is sent back to the service provider, which validates the accuracy and then passes the model to the dataset publisher.

5. Experimental Evaluation

In this section, we discuss our implementation and how we fine-tune ArchNet to achieve optimal performance. We will release our implementation later on GitHub. All our measurements are performed on a system running Ubuntu 16.9 with an NVIDIA GTX 2080 Ti GPU. Dataset generation is implemented on an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz.

5.1. Model Architecture

Our ArchNet model is implemented on the MNIST, Fashion-MNIST and Cifar-10 datasets with pytorch-1.3.0 as the DL framework. The H-encoder consists of four convolution layers, one transposed convolution layer and one fully-connected layer. The fully-connected layer uses ReLU as its activation function, which is intended to increase the nonlinearity of the encoder. The tensor size is doubled in the transposed convolution layer. The L-decoder consists of 8 convolution layers, two of which use ReLU as activation function. The ArchNet model was trained for 10 epochs (i.e., 10 complete passes over the dataset) to achieve a high accuracy (e.g., the average accuracy on the MNIST dataset is above 99.9%). The training time of ArchNet for the MNIST dataset is less than 10 minutes, and that for the Cifar-10 dataset is less than 15 minutes. The layer with the largest number of parameters in the ArchNet model is the fully-connected layer. The number of parameters in ArchNet is . Because Cifar-10 is a complex dataset, it does not need a fully-connected layer to increase the difficulty of stealing. Table 1 shows the ArchNet architecture in detail.
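The following PyTorch sketch shows one way the convolution-only variant described for Cifar-10 could be laid out. It is an illustration under stated assumptions rather than the authors' implementation: the channel width of 16, the kernel sizes, the position of the two ReLU activations, the stride-2 convolution that undoes the doubling, the learning rate, and the random input batch are all hypothetical.

```python
import torch
from torch import nn


class HEncoder(nn.Module):
    """Four convolutions, then a transposed convolution that doubles height and width."""

    def __init__(self, channels: int = 3, width: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ConvTranspose2d(width, channels, kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class LDecoder(nn.Module):
    """Eight convolutions, two followed by ReLU; the first one halves height and width."""

    def __init__(self, channels: int = 3, width: int = 16):
        super().__init__()
        layers = [nn.Conv2d(channels, width, kernel_size=3, stride=2, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU()]
        for _ in range(5):
            layers.append(nn.Conv2d(width, width, kernel_size=3, padding=1))
        layers.append(nn.Conv2d(width, channels, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


torch.manual_seed(0)
encoder, decoder = HEncoder(), LDecoder()
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)     # learning rate is an assumed value

images = torch.rand(4, 3, 32, 32)                 # random stand-in for a Cifar-10 batch
encrypted = encoder(images)                       # high-dimensional representation
reconstruction = decoder(encrypted)               # should approximate the input
loss = nn.functional.mse_loss(reconstruction, images)

optimizer.zero_grad()
loss.backward()
optimizer.step()                                  # one joint update of both halves

print(tuple(encrypted.shape), tuple(reconstruction.shape))
```

Both halves are optimized jointly against the input image itself; after training, only `encoder` would be used to produce the encrypted dataset, while `decoder` stays with the publisher.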

Method      ArchNet                                  RC4
Dataset     Original   Encrypted   Rel. drop         Original   Encrypted   Rel. drop
F-MNIST     82.31%     84.15%      -2.23%            82.31%     10.60%      87.49%
MNIST       97.31%     97.26%      0.05%             97.31%     12.65%      87.00%
Cifar-10    80.22%     79.80%      0.52%             80.22%     10.65%      86.72%

Table 1. Evaluation on ArchNet and RC4

5.2. Training Data

For each validation dataset, we construct the H-encoder and L-decoder and train them for 50 epochs. The results show that their accuracy is more than 99%. For the MNIST and F-MNIST datasets, the ratio of training samples to validation samples is 6:1. For the Cifar-10 dataset, the ratio is 5:1. After training ArchNet, we take its H-encoder and use it to encrypt each dataset. We then train the selected base model on the encrypted dataset to obtain the final model. The accuracy of the final model is verified on the encrypted validation set, and the result is reported through the value defined by formula 3.

Figure 6. Encrypted-dataset accuracy and formula 3 values of RC4 and ArchNet compared with the original-dataset accuracy on F-MNIST, MNIST, Cifar-10.

5.3. Training Strategy

For ArchNet, we use Adam as the optimization algorithm. The initial learning rate is . We choose the mean squared error (MSE) as the loss function, since it directly measures the difference between the output image and the real image. Neural network parameters are initialized with the uniform initialization method. We use Resnet151 (He et al., 2016) as the base model for Cifar-10 and a convolutional neural network as the base model for MNIST and F-MNIST. These two base models allow a better evaluation of the effect of ArchNet. The base models are trained on the MNIST, Cifar-10 and F-MNIST datasets for 100 epochs. For the comparative differential privacy (DP) experiment, we use the tensorflow-privacy package with the SGD optimization strategy.
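To make the three ingredients named above concrete (uniform initialization, MSE loss, Adam updates), the following sketch fits a one-dimensional linear model in plain Python. It is a toy under stated assumptions, not the ArchNet training code. The learning rate, step count, initialization range and the linear model itself are arbitrary choices made for this illustration.

```python
import random


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def fit_line_with_adam(xs, ys, lr=0.05, steps=2000, beta1=0.9, beta2=0.999,
                       eps=1e-8, seed=0):
    """Fit y = w*x + b by minimizing MSE with Adam; returns (w, b, loss history)."""
    rng = random.Random(seed)
    # Uniform initialization (the range is an arbitrary choice for this sketch).
    params = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]
    m = [0.0, 0.0]  # first-moment (mean) estimates
    v = [0.0, 0.0]  # second-moment (uncentered variance) estimates
    history = []
    n = len(xs)
    for t in range(1, steps + 1):
        w, b = params
        preds = [w * x + b for x in xs]
        history.append(mse(preds, ys))
        # Gradients of the MSE with respect to w and b.
        grads = [
            sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n,
            sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n,
        ]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            params[i] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params[0], params[1], history


w, b, history = fit_line_with_adam([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(history[0], history[-1])
```

In PyTorch the same roles would typically be played by `torch.optim.Adam` and `torch.nn.MSELoss`. The plain-Python version is used here only to keep the update rule visible.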

5.4. Evaluation Results

Table 1 shows the accuracy and related values for the validation datasets used in our experiment. For each method, it lists the accuracy on the original dataset, the accuracy on the encrypted dataset, and the value defined by formula 3.

The original-dataset accuracy depends on the base model and the dataset. The encrypted-dataset accuracy depends primarily on the base model, but also on the encryption policy. The value defined by formula 3 depends primarily on the encryption policy.
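As a worked check, most entries in the third column of Table 1 are consistent with a relative accuracy drop, (original - encrypted) / original. Formula 3 is not reproduced in this section, so reading it this way is an assumption. The function name below is hypothetical, and the accuracies are copied from Table 1.

```python
def relative_accuracy_drop(acc_original, acc_encrypted):
    """(original - encrypted) / original, in percent. Assumed reading of the metric."""
    return 100.0 * (acc_original - acc_encrypted) / acc_original


# (original accuracy %, encrypted accuracy %) copied from Table 1.
table1 = {
    ("ArchNet", "F-MNIST"): (82.31, 84.15),
    ("ArchNet", "MNIST"): (97.31, 97.26),
    ("ArchNet", "Cifar-10"): (80.22, 79.80),
    ("RC4", "F-MNIST"): (82.31, 10.60),
    ("RC4", "MNIST"): (97.31, 12.65),
    ("RC4", "Cifar-10"): (80.22, 10.65),
}

for (method, dataset), (orig, enc) in table1.items():
    print(f"{method:8s} {dataset:9s} {relative_accuracy_drop(orig, enc):7.2f}%")
```

Under this reading, the MNIST and Cifar-10 rows reproduce the tabulated values to two decimals. The F-MNIST rows come out slightly different (about -2.24% and 87.12%, against the tabulated -2.23% and 87.49%), which is one more reason to treat the reading as an assumption rather than as the definition.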

Figure 7. The accuracy of the base model on different datasets. Subfigure (a) shows the training results on Fashion-MNIST dataset. Subfigure (b) shows the training results on MNIST dataset. Subfigure (c) shows the training results on Cifar-10 dataset.
Dataset       F-MNIST       MNIST         Cifar-10
Parameters    122,960,821   122,960,821   14,961
Table 2. ArchNet structure on different datasets

5.5. Analysis on the Difficulty of Stealing

A pure convolution ArchNet maps the data A to a high-dimensional form B. We select three dimensions from B to visualize. Figure 6 is an image generated by passing the MNIST dataset through an ArchNet with only convolution layers. Under the regular operation of a convolution layer, a human can easily distinguish the characteristics of the dataset, because the receptive field of a convolution is similar to that of the human eye. A convolution layer is only a two-dimensional linear processing of the data and does not break up the original data distribution. Figure 7 uses an ArchNet with an activation function and a fully-connected layer on a simple dataset. This breaks up the original distribution that a human can perceive, and we cannot obtain any other useful information by visualizing the encrypted data. Therefore, it is difficult to steal the original dataset from the ArchNet-encrypted dataset.
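The statement that a convolution layer is a linear operation can be checked numerically. The sketch below uses tiny hand-picked integer images and a kernel (all hypothetical) to show that convolution commutes with addition, while adding a ReLU breaks that property. It illustrates the linearity argument only and says nothing by itself about how hard the encoding is to invert.

```python
def conv2d(img, kernel):
    """Valid 2-D cross-correlation (what deep learning frameworks call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(img):
    return [[max(0, v) for v in row] for row in img]


def add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Hypothetical 3x3 "images" and a 2x2 horizontal-difference kernel.
x = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
y = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
kernel = [[1, -1], [1, -1]]

# Convolution is linear: conv(x + y) == conv(x) + conv(y).
linear_holds = conv2d(add(x, y), kernel) == add(conv2d(x, kernel), conv2d(y, kernel))

# With a ReLU after the convolution, additivity no longer holds for these inputs.
relu_holds = relu(conv2d(add(x, y), kernel)) == add(
    relu(conv2d(x, kernel)), relu(conv2d(y, kernel))
)

print(linear_holds, relu_holds)  # True False
```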

Figure 8. MNIST, subfigures (a) to (h).
Figure 9. Cifar-10, subfigures (a) to (d).
Figure 10. Accuracy comparison between DP and ArchNet.

5.6. Operability Analysis

The value defined by formula 3 reflects the operability of a general encryption algorithm. In our experiment, it measures the difference between pattern recognition on the encrypted dataset and pattern recognition on the original dataset under a given encryption method. For example, when the original dataset is MNIST, the value for the dataset encrypted by ArchNet is 0.05%, which is much smaller than the 87.00% for the dataset encrypted by the RC4 algorithm. Similar effects appear on the other datasets. Our method is therefore much better than a general encryption algorithm in operability.

5.7. Convergence Relation Analysis

The convergence curve of the base model trained on the encrypted dataset is basically the same as that of the base model trained on the original dataset. This shows that the ArchNet-encrypted dataset is close to the original dataset in convergence relation. In the distributed machine learning system, this convergence relation proves that in the relation between and is very small, which is defined in section 4.7. Because the training curve on the encrypted dataset is similar to that on the original dataset, the computing end that provides computing resources can optimize its model on the basis of the existing model without considering the encryption method. As shown in Figure 10, compared to the differential privacy policy with SGD, ArchNet is less stable but achieves slightly higher accuracy.

5.8. General Analysis

The formula 3 values of the three datasets encrypted by ArchNet are less than 1%, while those of the three datasets encrypted by RC4 are around 87.00%. The value is negative on the F-MNIST dataset, which suggests that the samples are enhanced when ArchNet maps the data to the high-dimensional space, so that the same base model classifies the data more easily there. If the same encryption method is applied to different datasets and the resulting values are similar, then the encryption method is independent of the dataset and has good universality. For both ArchNet and RC4, the values are similar across datasets. Therefore, ArchNet is as general as RC4.
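The universality argument rests on the values being similar across datasets. The text gives no numeric criterion for "similar", so the sketch below uses the max-minus-min spread as one illustrative, hypothetical measure, applied to the third-column values copied from Table 1.

```python
# Third-column values (in percent) copied from Table 1.
values = {
    "ArchNet": {"F-MNIST": -2.23, "MNIST": 0.05, "Cifar-10": 0.52},
    "RC4": {"F-MNIST": 87.49, "MNIST": 87.00, "Cifar-10": 86.72},
}


def spread(per_dataset):
    """Largest minus smallest value across datasets (a hypothetical similarity measure)."""
    vals = list(per_dataset.values())
    return max(vals) - min(vals)


spreads = {method: round(spread(v), 2) for method, v in values.items()}
print(spreads)
```

Both spreads stay within a few percentage points (2.75 for ArchNet and 0.77 for RC4), which is the sense in which the values can be called similar across the three datasets.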

6. Related Works

In this section, we discuss three areas related to ArchNet.

6.1. Distributed Machine Learning

Distributed machine learning covers a wide range of concepts, including distributed learning models over multiple computing units and distributed learning models for big data. Recent research uses distributed technology to improve the performance of traditional machine learning. Based on the concept of model sharing, Jiang et al. introduce a big data analysis system (Jiang et al., ) in which the high-dimensional big model is partitioned into sub-models across multiple server nodes. Some researchers also apply distributed learning to specific scenarios, such as the medical and legal fields (Metsker et al., 2019; Li et al., 2019; Liu and Tian, 2019). However, existing research on distributed machine learning systems mainly focuses on synchronization and data distribution (Ho et al., 2013; Parsaeefard et al., 2019). In contrast, our distributed machine learning system focuses more on the innovation of the business model and uses ArchNet to solve the problem of data hiding in this business system.

6.2. Neural Network Encryption

Neural network encryption is an active research topic. The theory of fully homomorphic encryption (FHE) proposed by Gentry lays a foundation for the encryption of complicated data (Gentry, 2009). The CryptoNets neural network model, proposed by Dowlin et al. (Xie et al., 2014), uses FHE to realize privacy-preserving deep learning. The authors provide a framework for designing neural networks that can run on encrypted data and propose a polynomial approximation of the ReLU activation function. CryptoNets and the neural networks derived from it for processing encrypted data have been gradually improved (Boemer et al., 2019; Juvekar et al., 2018; Chou et al., 2018). Some researchers use encrypted neural networks to solve the edge computing problem on IoT devices (Tian et al., 2019). However, existing solutions design neural networks that operate on encrypted data. In contrast, we study how to encrypt the data so that a neural network without special processing can still identify its patterns. This increases the diversity of algorithms available at the data computing end of the distributed machine learning system.

6.3. Model Stealing and Prevention

When machine learning is applied over a network, we need to solve the problem of model stealing. Model stealing concerns data leakage: sensitive training data embedded in a machine learning model may leak personal privacy. Papernot et al. propose PATE, which distributes the private data among different teacher models and trains a student model from them to prevent model stealing (Papernot et al., 2018). Long et al. improve the PATE method with a GAN (Long et al., 2019). Other studies put forward attack schemes, ranging from stealing statistical machine learning models to stealing deep learning models, together with corresponding countermeasures (Kesarwani et al., 2017; Juuti et al., 2018; Shi et al., 2018; Lee et al., 2018). However, existing methods aim to prevent the server from stealing the client's data in cloud-based machine learning, while our system is a machine learning system for the distributed scenario that uses two types of keys to lock the dataset. Our method makes the second type of key more flexible.

7. Conclusions

We propose the basic form of a distributed machine learning system, give the basic form of three-way asymmetric encryption from a mathematical point of view, and give two formal theorems about distributed encryption. We further demonstrate how ArchNet can solve the problem of data hiding in distributed machine learning. Our experiments demonstrate the correctness of the proposed data hiding principle. Compared with the traditional RC4 encryption algorithm, they also show that a model based on neural networks can complete the encryption and decryption tasks in the distributed machine learning system well, in terms of both difficulty of stealing and operability. ArchNet can be used in many ways, such as 3D picture transmission and remote model extraction.


References

  • Y. Bengio, L. Yao, G. Alain, and P. Vincent (2013) Generalized denoising auto-encoders as generative models.
  • F. Boemer, A. Costache, R. Cammarota, and C. Wierzynski (2019) nGraph-HE2: a high-throughput framework for neural network inference on encrypted data.
  • E. Chou, J. Beal, D. Levy, S. Yeung, A. Haque, and L. Fei-Fei (2018) Faster CryptoNets: leveraging sparsity for real-world encrypted inference.
  • J. Daemen and V. Rijmen (1998) AES proposal: Rijndael.
  • M. Farhadi, M. Ghasemi, and Y. Yang (2019) A novel design of adaptive and hierarchical convolutional neural networks using partial reconfiguration on FPGA. In HPEC.
  • C. Gentry (2009) A fully homomorphic encryption scheme.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. pp. 770–778.
  • Q. Ho, J. Cipar, H. Cui, K. K. Jin, and E. P. Xing (2013) More effective distributed ML via a stale synchronous parallel parameter server. Advances in Neural Information Processing Systems, pp. 1223–1231.
  • F. Huot, Y. Chen, R. Clapp, C. Boneti, and J. Anderson (2020) High-resolution imaging on TPUs. In ISC HPC.
  • J. Jiang, L. Yu, J. Jiang, Y. Liu, and B. Cui. Angel: a new large-scale machine learning system. National Science Review 5 (2), pp. 102–122.
  • M. Juuti, S. Szyller, A. Dmitrenko, S. Marchal, and N. Asokan (2018) PRADA: protecting against DNN model stealing attacks.
  • C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan (2018) Gazelle: a low latency framework for secure neural network inference.
  • M. Kesarwani, B. Mukhoty, V. Arya, and S. Mehta (2017) Model extraction warning in MLaaS paradigm.
  • A. Krizhevsky (2009) Learning multiple layers of features from tiny images.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning.
  • T. Lee, B. Edwards, I. Molloy, and D. Su (2018) Defending against model stealing attacks using deceptive perturbations.
  • H. Li, S. A. L, Q. Huining, M. Aditya, D. Hao, and L. Dianbo (2019) Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. Journal of Biomedical Informatics (99).
  • H. Liu and G. Tian (2019) Building engineering safety risk assessment and early warning mechanism construction based on distributed machine learning algorithm. Safety Science (120).
  • Y. Long, S. Lin, Z. Yang, C. A. Gunter, and B. Li (2019) Scalable differentially private generative student model via PATE.
  • O. Metsker, E. Trofimov, M. Petrov, and N. Butakov (2019) Russian court decisions data analysis using distributed computing and machine learning to improve lawmaking and law enforcement. Procedia Computer Science 156.
  • I. Mironov (2017) Renyi differential privacy. Proceedings of IEEE 30th Computer Security Foundations Symposium CSF 2017, pp. 263–275.
  • I. Mironov (2002) (Not so) random shuffles of RC4. In Proceedings of CRYPTO, pp. 304–319.
  • N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson (2018) Scalable private learning with PATE. In ICLR.
  • S. Parsaeefard, I. Tabrizian, and A. L. Garcia (2019) Representation of federated learning via worst-case robust optimization theory. In NeurIPS.
  • V. U. Prabhu (2019) Kannada-MNIST: a new handwritten digits dataset for the Kannada language.
  • Y. Shi, Y. Sagduyu, K. Davaslioglu, and J. Li (2018) Active deep learning attacks under strict rate limitations for online API calls.
  • Y. Tian, J. Yuan, S. Yu, and Y. Hou (2019) LEP-CNN: a lightweight edge device assisted privacy-preserving CNN inference solution for IoT.
  • H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
  • P. Xie, M. Bilenko, T. Finley, R. Gilad-Bachrach, K. Lauter, and M. Naehrig (2014) Crypto-nets: neural networks over encrypted data. Computer Science.