1 Introduction
The lack of transparency and interpretability in neural networks is one of the root causes of their problems from the perspective of safety and IT security [1]. Starting from an initial state, a neural network is trained on training data using a certain algorithm. This gives rise to a new state of the neural network, usually better fitted to solving the problem at hand. However, it is in general not possible to derive any reliable statements from this new state about what happened during training. Hence, it is extremely difficult to detect malicious data introduced by an attacker into the training data set in order to manipulate the resulting model in a so-called poisoning attack [4]. Although tests may be conducted after training using the new state of the network, due to the typically large dimension of the input space it is highly unlikely that they will reveal the effects of specially crafted poisoned data.
This document presents a different approach to the problem. Essentially, it allows verifying the correctness and integrity of the training procedure for a neural network after the fact. Hash functions, which are a standard cryptographic tool, are used to make sure an attacker cannot tamper with the training data and falsely claim that other data were used for training than those actually used. It is important to note that the ability to verify the training procedure is a large asset, but is not sufficient on its own to reliably detect and counter attacks during the training phase. Indeed, the approach only guarantees that the data provided were actually used for training the neural network. However, it does not make any statement about the integrity of the data themselves and the absence of manipulations, which still need to be checked. This can be done either completely by hand, which is typically unrealistic, by an automatic procedure, or by using some technical preprocessing and manual inspection of a much smaller number of data items. Several methods have been proposed to detect poisoning attacks [3, 12, 14]. They essentially rely on clustering algorithms, which may take into account not only the data themselves but also the behaviour of the neural network when processing these data. Although some of these methods work quite well under certain circumstances, they cannot yet be used as a reliable tool for detection, especially when facing more sophisticated poisoning attacks, e.g. [10, 13].
The document starts with some definitions concerning neural networks and briefly reviews hash trees, the main building block of the proposed approach. It then presents, in section 3, a solution which allows verifying the correctness and integrity of training for a neural network. Two cases are distinguished. The straightforward solution from subsection 3.1 targets the whole training procedure, whereas a modified version, presented in subsection 3.2, allows selectively verifying some parts of the training procedure. The raison d’être of the modification is that the standard solution requires completely redoing the training procedure for the purpose of verification, which may come with very high costs. The modified solution takes samples, and can thus balance the costs by choosing the desired proportion of the training procedure to be checked. However, this partial verification procedure is susceptible to certain attacks, which are analysed in section 4. The solutions outlined in section 3 are described in greater technical detail in Appendix A.
2 Definitions
In this document, we consider a neural network to be a function $N_w: \mathcal{X} \to \mathcal{Y}$ mapping data from the input space $\mathcal{X}$ to the output space $\mathcal{Y}$, which is defined by its general setup (including the type of network and the layers used) and the set of weights $w$ connecting its neurons.
In the training phase of the neural network, the initial weights $w_0$ are updated using (batches of) data from the training set $T$, while the general setup is fixed. One training step may thus be formalised as a function

$$t: \mathcal{W} \times T^b \to \mathcal{W}, \qquad (w, (x_1, \ldots, x_b)) \mapsto w',$$

which induces a function

$$(N_w, (x_1, \ldots, x_b)) \mapsto N_{w'},$$

where the setup of $N_{w'}$ is the same as that of $N_w$, the weights of $N_{w'}$ are $w' = t(w, (x_1, \ldots, x_b))$, and $b$ is the batch size, which is usually fixed. We use the shorthand notation $B = (x_1, \ldots, x_b)$ for a batch of data. The function $t$ may implicitly depend on certain values, e.g. pseudorandom seeds, from which pseudorandom numbers are derived, or hyperparameters, cf. subsection A.1. Furthermore, we define the map $\iota: \{1, \ldots, |T|\} \to T$ for numbering the training data.

A hash tree [8] is a standard cryptographic data structure. In a hash tree, each leaf contains some data (or its hash value), and each node contains the hash value of some combination (typically, the concatenation) of the contents of its children. Using hash trees, one can efficiently verify the integrity of large amounts of data. In order to verify the presence of a leaf in a hash tree, one needs to compute all intermediate hashes until one reaches the root hash of the tree. This requires computing a logarithmic (in the number of leaves) number of hashes if the number of children of a node is constant (e.g. two for binary hash trees). More precisely, for a binary hash tree of depth $d$, given a leaf $v_0$ and the root hash $r$, one needs to compute the hashes $v_{j+1} = H(v_j \| s_j)$ for $j = 0, \ldots, d-1$ and check that $v_d = r$ holds, where $H$ is the hash function used, $s_j$ is the sibling (in this example, the right sibling) of $v_j$, respectively, and concatenation of $x$ and $y$ is denoted by $x \| y$. Besides $v_0$ and $r$, the values $s_j$, $j = 0, \ldots, d-1$, are thus necessary to verify that $v_0$ is contained in the tree with root hash $r$. Generalising the verification procedure to non-binary hash trees is straightforward. At each level, it requires the values of all siblings of the node through which the hash chain to the root passes.
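As an illustration, the hash chain computation for a binary hash tree can be sketched as follows (a minimal sketch using SHA-256 via Python's hashlib; the function and parameter names are our own, not part of the scheme):

```python
import hashlib

def H(data: bytes) -> bytes:
    """SHA-256, playing the role of the hash function of the tree."""
    return hashlib.sha256(data).digest()

def verify_leaf(leaf_hash: bytes, root: bytes, siblings) -> bool:
    """Recompute the hash chain from a leaf to the root of a binary hash tree.

    `siblings` holds one (hash, sibling_is_right) pair per tree level,
    ordered from the leaf level upwards; the flag tells on which side the
    sibling is concatenated."""
    node = leaf_hash
    for sibling, sibling_is_right in siblings:
        node = H(node + sibling) if sibling_is_right else H(sibling + node)
    return node == root
```

For a tree of depth $d$, the list of siblings has exactly $d$ entries, matching the logarithmic verification cost stated above.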
We assume that the hash tree uses a cryptographically secure hash function $H$. In addition, we make the assumption that the root hash $r$ is protected using a secure digital signature algorithm and that the signature itself is properly authenticated (e.g. via a PKI), which prevents tampering. Recommendations for these functions may for instance be found in [2].
3 Verification procedure
This section outlines the procedures for a complete or partial verification of the training procedure. Both solutions use hash trees. In principle, the complete verification, which needs to use all the data and completely recalculate the hash tree, could also use other hash-based data structures, like hash lists. However, for the sake of consistency we present the solution based on hash trees, since the partial verification does rely on properties of hash trees. In each solution, the proving party needs to store data specifying the training procedure, merge it into a hash tree and sign its root hash. Upon request by the verifying party, the proving party provides some information from the hash tree as well as the relevant information from the training procedure. This allows the verifying party to check that the training procedure was conducted as stated by the proving party.
3.1 Complete verification
We denote by $N_{w_0}$ our function with the initial set of weights $w_0$. This function is now trained by repeated application of the function $t$, where the second argument may differ between iterations, but its size is usually fixed as stated in section 2. This gives rise to a chain of transformations

$$N_{w_0} \mapsto N_{w_1} \mapsto \cdots \mapsto N_{w_m}$$

with $w_i = t(w_{i-1}, B_i)$ for $1 \le i \le m$. Using the map $\iota$ and the function $t$, knowledge of $w_0$ and the indices of the training data in the batches $B_1, \ldots, B_m$ completely determines $N_{w_m}$.
This observation gives rise to the following idea. We use a hash tree $\mathcal{T}$ whose leaves contain the following information (for more details, see subsection A.1):

1. Meta data
2. Information determining the setup of $N_{w_0}$
3. Information determining the function $t$
4. Information determining the map $\iota$
5. The indices used for training
6. The initial weights $w_0$ of $N_{w_0}$
The data themselves need not be directly included in the hash tree; rather, their hash values are stored in category 4 (see subsection A.1 for details). We assume that these data are in any case stored by the proving party as a backup or for future use, whether or not the information for subsequent verification is generated during the training procedure.
Then, given a neural network $N_{w_m}$ with weights $w_m$ and the digitally signed root hash $r$ of $\mathcal{T}$, the verifying party can check that $N_{w_m}$ was derived from $N_{w_0}$ using the training procedure as specified by $\mathcal{T}$. Since the result of training is deterministic given all the information contained in the leaves of $\mathcal{T}$, it is not possible to provide false information on the data used for training or on the method applied.
More precisely, the verifying party will check the following conditions in the stated order:

1. The digital signature of $r$ is authentic and correct.
2. Hashing the information mentioned above in the right way gives $r$.
3. The training data hash to the values stored in 4.
4. Applying the training procedure as specified by $\mathcal{T}$ to $N_{w_0}$ with initial weights $w_0$ yields $N_{w_m}$.
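The replay at the heart of condition 4 can be sketched as follows (a toy sketch: the network is reduced to a list of weights and `training_step` is an arbitrary deterministic stand-in for one training step; in practice the real optimiser would be rerun with the recorded seeds and hyperparameters):

```python
def training_step(weights, batch):
    # Toy deterministic stand-in for one training step t(w, B): each weight
    # is nudged by a value derived from the batch.
    return [w + 0.01 * sum(batch) for w in weights]

def replay_training(w0, batches):
    """Recompute the whole chain of transformations from the information in
    the hash tree: initial weights plus the batches (given by their indices)."""
    weights = list(w0)
    for batch in batches:
        weights = training_step(weights, batch)
    return weights

def check_final_state(w0, batches, claimed_final):
    # Condition 4: the replayed weights must equal the claimed final weights.
    return replay_training(w0, batches) == claimed_final
```

Because the replay is deterministic, any tampering with a batch changes the replayed weights and the check fails.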
3.2 Partial verification
While the straightforward approach from subsection 3.1 can be used for the purpose of verification, it would require a lot of computational resources. The resources for the first three steps can be assumed to be negligible in comparison, but the fourth step requires the verifying party to repeat all the computations necessary to reach the final state $N_{w_m}$ starting from $N_{w_0}$. One can assume that in many cases this is not acceptable or even infeasible for the verifying party. On the one hand, the computing power required may be prohibitive for large neural networks, even if we assume that the final training, which results in a neural network meeting the developer’s goals, accounts for only a small fraction of the total computational effort expended in development. On the other hand, the proving party may not want to disclose all data used for training the network, for instance to protect its intellectual property or due to data protection requirements. In such a case, one can modify the procedure in a way which allows proving the correctness and integrity of any batch of multiple intermediate training steps. Proving these properties for all intermediate training steps would amount to proving them for the complete training procedure. The batches to be checked can later be chosen by the verifying party. The proving party then needs to provide the data necessary for checking, and the verifying party can use the hash tree and these data to check the respective batches. Depending on the number of training steps to be checked, only a small subset of the training data may need to be disclosed, thus largely protecting the proving party’s intellectual property, since hash values from intermediate levels of the hash tree do not leak any information on these data.
More precisely, we can use $c - 1$ additional checkpoints $C_1, \ldots, C_{c-1}$ between the initial state $C_0 = N_{w_0}$ and the final one $C_c = N_{w_m}$, where checkpoint $C_i$ is reached after $n_i$ training steps, $n_0 = 0$, $n_c = m$, and $n_{i-1} < n_i$ for all $1 \le i \le c$. The number $c$ is a parameter. The concrete value that should be assigned to it depends on the computational effort required for the transition between two checkpoints and the space required for storing a checkpoint.
A checkpoint $C_i$ is defined by the tuple $(n_i, w^{(i)}, a_i)$, where $n_i$ is the number of times $t$ was applied to $N_{w_0}$ to arrive at $C_i$ (in other words, the number of training steps), and the neural network at $C_i$ is defined by its weights $w^{(i)}$. The component $a_i$ may be empty or, if applicable, hold additional required information. For instance, $a_i$ may contain the state of the pseudorandom number generator at this point, if this information cannot be straightforwardly derived from the initial pseudorandom seeds (stored in 3., cf. subsection A.1) and the value $n_i$ itself.
Then, given $C_{i-1}$ with its weights $w^{(i-1)}$ for some $1 \le i \le c$, $C_i$ with its weights $w^{(i)}$, and the training data used for the transition from $C_{i-1}$ to $C_i$, the verifying party can check the correctness and integrity of the $n_i - n_{i-1}$ training steps for the transition between the checkpoints $C_{i-1}$ and $C_i$. This is done by recomputing these steps and checking that by hashing the respective information one ultimately arrives at the root hash $r$ of the hash tree $\mathcal{T}$. As before, a digital signature of $r$ must also be provided.
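In the same toy setting as for complete verification, checking a single transition can be sketched as follows (checkpoints are reduced to pairs of step count and weights; `training_step` is again an arbitrary deterministic stand-in for one training step):

```python
def training_step(weights, batch):
    # Toy deterministic stand-in for one training step t(w, B).
    return [w + 0.01 * sum(batch) for w in weights]

def verify_transition(ckpt_prev, ckpt_next, batches):
    """Replay the training steps between two checkpoints and compare the
    result with the weights stored for the later checkpoint."""
    n_prev, w_prev = ckpt_prev
    n_next, w_next = ckpt_next
    if len(batches) != n_next - n_prev:   # one batch per training step
        return False
    weights = list(w_prev)
    for batch in batches:
        weights = training_step(weights, batch)
    return weights == w_next
```

Note that the check also rejects a transition whose claimed step count does not match the number of batches supplied.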
In this case, the hash tree $\mathcal{T}$ includes the following information:

1. Meta data
2. Information determining the setup of $N_{w_0}$
3. Information determining the function $t$
4. Information determining the map $\iota$
5. For each $i$ with $0 \le i \le c$:
   (a) The value $n_i$ itself, i.e. the number of training steps since the start of training to arrive at $C_i$
   (b) The weights $w^{(i)}$ of $C_i$
   (c) The indices of the data used in the training steps between $C_{i-1}$ and $C_i$ (which we define as the empty set for $i = 0$)
   (d) If applicable, additional information from $a_i$

The verifying party will check the following conditions in the stated order:

1. The digital signature of $r$ is authentic and correct.
2. The information from 1.–4. is contained in the hash tree $\mathcal{T}$.
3. For each transition from $C_{i-1}$ to $C_i$ to be verified:
   (a) The respective information from 5. is contained in the hash tree $\mathcal{T}$.
   (b) The training data used in the training steps between $C_{i-1}$ and $C_i$ according to 5.(c) hash to the values stored in 4.
   (c) Using the training set items as specified by the respective indices and applying the training procedure as specified in 1.–4. to $C_{i-1}$ yields $C_i$.

Whenever the presence of information in the hash tree is checked, this is done using the properties of $\mathcal{T}$ as discussed in section 2. The proving party needs to furnish all information from intermediate levels of the hash tree which is necessary for these calculations.
When choosing the number $k$ of transitions between checkpoints to be verified, there is a tradeoff between efficiency and the integrity guarantees attained. On the one hand, decreasing this number reduces the computing time required to repeat the calculations. On the other hand, when checking fewer transitions, one only checks the correctness of a smaller portion of the training procedure, and the probability of discovering integrity violations diminishes. In any case, the concrete transitions to be checked must not be known beforehand, since using a predefined set of transitions would allow an adversary to hide malicious changes without any risk of being exposed.
4 Security analysis
In the case of partial verification, an attacker can provide a certain amount of false data and has some chance that he will not be exposed. Since only a certain number of checkpoint transitions are verified, if the attacker provides false data for a small number of transitions, his risk of exposure is quite low. Assume there are $c$ checkpoint transitions in total, $k$ of them are verified (using random sampling) and the attacker manipulates the data for $\ell$ transitions; then the probability $p$ that this will go unnoticed is about

$$p = \binom{c-\ell}{k} \Big/ \binom{c}{k} \approx \left(1 - \frac{k}{c}\right)^{\ell} \approx e^{-k\ell/c}.$$
For instance, if the verifying party chooses to check $k = c/10$ of the transitions and the attacker has manipulated data for $\ell = 5$ transitions, we get $p \approx (1 - 1/10)^5 \approx 0.59$. We note that the precision of the first approximation is quite good, unless $k$ gets close to $c$ (i.e. a significant part of the transitions are checked); concerning the second approximation, convergence to the exponential function is quite fast. For example, taking $c = 100$ and again using $k = 10$ and $\ell = 5$, the exact formula for $p$ evaluates to approximately $0.58$.
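The exact probability and the two approximations can be evaluated directly (a short sketch; the function names are ours):

```python
from math import comb, exp

def p_exact(c: int, k: int, l: int) -> float:
    """Probability that k uniformly sampled transitions (out of c in total)
    all miss the l manipulated ones."""
    return comb(c - l, k) / comb(c, k)

def p_approx(c: int, k: int, l: int) -> float:
    # First approximation: (1 - k/c) ** l
    return (1 - k / c) ** l

def p_exp(c: int, k: int, l: int) -> float:
    # Second approximation: exp(-k * l / c)
    return exp(-k * l / c)
```

For $c = 100$, $k = 10$ and $\ell = 5$, the three functions return approximately $0.58$, $0.59$ and $0.61$, illustrating how close the approximations are in this regime.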
The attacker may achieve even higher values for $p$ by manipulating fewer transitions, but whether this is still feasible (since $\ell \ge 1$ necessarily needs to hold) depends on the values of $c$ and $k$, which he cannot directly influence. In addition, a very small value for $\ell$ means the attacker can only use a small number of poisoned samples for training, which might severely degrade the performance of a poisoning attack, rendering it ineffective.
However, if an attacker tampers with the verification data, there is no need for him to provide false information only on which data were used for training during a particular transition (category 5.(c) in subsection 3.2). He might as well lie about the number of training steps taken during this transition (category 5.(a)). In this way, a much more powerful attack may become feasible, since the attacker can generate the verification data for a particular transition and have it contain an average, inconspicuous number of training steps, whereas in fact that transition included many more steps and introduced a massive amount of poisoned data. Essentially, it is not possible to guarantee the correctness of the number of steps asserted for a transition without performing the verification procedure for this transition.
One approach to mitigate this problem would be to use a more sophisticated algorithm for sampling the transitions to be checked. For instance, one might choose with higher probability those transitions whose initial and final weights and/or performance differ much more than is the case on average. Intuitively, this should increase the probability of finding transitions using an unusually large number of training steps. However, the attacker might additionally tamper with the initial or final weights of transitions adjacent to the transition he originally targeted in order to level out the changes and defeat the verifier’s heuristic. For instance, when targeting transition $i$ and restricting manipulations to $\ell$ transitions as above, he might also change the information from category 5.(b) for the adjacent transitions. If this successfully levels out any information the verifier might use, the probability of the attack going unnoticed is again $p$ as computed above.

5 Conclusion and outlook
The integrity of the training procedure of neural networks can be protected using wellknown cryptographic mechanisms, which also allow another party to verify the integrity afterwards. This document has outlined a proposal on how to adapt the cryptographic mechanisms to the setting in question. While integrity can only be guaranteed with certainty by completely repeating the training procedure, the partial verification procedure as specified above can give a verifier a certain amount of confidence about the integrity and expose an attacker manipulating data to the risk of being detected. There is a tradeoff between the level of confidence, based upon the probability of manipulations going unnoticed, and the computational effort and storage space required for verification (see subsection A.3), which can be tuned using several parameters (the number of checkpoints to be stored and of transitions to be verified).
The more sophisticated attack scenarios on the partial verification procedure presented in section 4, which are based on including false meta data about the number of training steps between two checkpoints, and possible mitigations could be further explored both analytically and empirically to derive more accurate estimates of the probability of successful attacks.
It is important to note that protecting and verifying the integrity of the training procedure does not in itself prevent poisoning attacks, which introduce specially crafted malicious training data, but is just one building block for solving this problem. Rather, the absence of malicious training data must additionally be confirmed using methods for poisoning detection.
This document focused on protecting the integrity of the training procedure as the key stage of the life cycle of neural networks, but the underlying ideas lend themselves to an easy generalisation to other stages of this life cycle. For instance, protecting the integrity of transmitted sensor data, their curation and their preprocessing all the way to the training data set can effectively prevent the addition of poisoned samples, if properly implemented.
The security of the proposed solution is based on the security of the cryptographic mechanisms used. These should hence be chosen and implemented with care. In particular, digital signatures must be used for sealing the hash tree against tampering and they must be properly authenticated.
Acknowledgements
The author would like to thank Ute Gebhardt and Matthias Neu for carefully proofreading earlier versions of this document and providing valuable suggestions for improvement.
References
 [1] Christian Berghoff, Matthias Neu, and Arndt von Twickel. Vulnerabilities of Connectionist AI Applications: Evaluation and Defence. arXiv preprint, abs/2003.08837, 2020.
 [2] BSI. TR-02102 Cryptographic Mechanisms: Recommendations and Key Lengths. Technical report, Bundesamt für Sicherheit in der Informationstechnik, 2020.
 [3] Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. In Huáscar Espinoza, Seán Ó hÉigeartaigh, Xiaowei Huang, José Hernández-Orallo, and Mauricio Castillo-Effen, editors, Workshop on Artificial Intelligence Safety 2019 co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI-19), Honolulu, Hawaii, January 27, 2019, volume 2301 of CEUR Workshop Proceedings. CEUR-WS.org, 2019.
 [4] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint, abs/1712.05526, 2017.

 [5] Ian J. Goodfellow, Yoshua Bengio, and Aaron C. Courville. Deep Learning. Adaptive computation and machine learning. MIT Press, 2016.
 [6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 770–778. IEEE Computer Society, 2016.
 [7] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
 [8] Ralph C. Merkle. A Certified Digital Signature. In Advances in Cryptology - CRYPTO ’89, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings, pages 218–238, 1989.
 [9] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
 [10] Aniruddha Saha, Akshayvarun Subramanya, and Hamed Pirsiavash. Hidden Trigger Backdoor Attacks. arXiv preprint, abs/1910.00033, 2019.
 [11] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, San Diego, 2015. http://arxiv.org/abs/1409.1556.
 [12] Brandon Tran, Jerry Li, and Aleksander Madry. Spectral Signatures in Backdoor Attacks. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 8011–8021, 2018.
 [13] Alexander Turner, Dimitris Tsipras, and Aleksander Madry. Label-Consistent Backdoor Attacks. arXiv preprint, abs/1912.02771, 2019.
 [14] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In Proceedings of the IEEE Symposium on Security and Privacy (SP), pages 707–723, 2019.
Appendix A Technical details
A.1 List of data included in the hash tree
This section makes a proposal on which data to include in the different categories as sketched in subsection 3.1 and subsection 3.2. The objective is to include all data which are necessary for deterministically reproducing the training procedure or parts thereof. If applicable, the lists may be extended or redundant information may be removed in concrete implementations. We advocate establishing consensus on a standardised list of data and on the structure of the hash tree (see subsection A.2) in order to make the approach interoperable between different parties with no or minimal modifications.
Meta data
The meta data contain all information necessary for parsing and interpreting the other pieces of information. This includes the concrete values of the constants $m$ (the total number of training steps) and $c$ (describing the number of checkpoints) as well as the number of values included in the different categories of data and the way they are structured.
Information determining the setup of $N_{w_0}$
This category includes the following data:
- General architecture of the neural network
- Number of layers
- Number of neurons in each layer
- Ordering of weights (i.e. meta data on the order in which the weights are stored in the hash tree)
Information determining the function $t$
This category includes the following data (readers unfamiliar with the general terminology of neural networks may for instance refer to [5]):
- Optimisation method
- Pseudorandom seeds
- Regularisation terms
- Values of hyperparameters
Information determining the map $\iota$
This category includes the following data:
- Size $|T|$ of the training data set
- For $1 \le j \le |T|$, tuples $(j, H(\iota(j)))$ consisting of the number $j$ and the hash value of training set item $\iota(j)$

The indices used for training or between checkpoints $C_{i-1}$ and $C_i$
This is straightforward.

Weights
The initial weights $w_0$ (subsection 3.1) or the weights $w^{(i)}$ of the checkpoints (subsection 3.2).
A.2 Structuring the hash tree
The concrete structure of the hash tree affects the storage space and the amount of computation required for performing a verification. Binary hash trees would be the straightforward and standard solution. They offer the great advantage that for verifying the presence of a leaf they require at most one additional value at each level (namely, the sibling of the respective node) in order to compute and check the hash chain to the root hash. However, in our application the leaves whose presence is checked are not independent from each other, and we often need to check whole batches of leaves at the same time anyway. Furthermore, binary trees require more intermediate levels than trees whose nodes have more children. In particular, more intermediate values need to be stored. For instance, using the formulae for the geometric series it is easy to see that when using a hash tree where every node has four children instead of two, one can reduce the number of intermediate nodes by up to a factor of three. Including the leaves, this leads to saving about one third of the storage space.
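The factor-of-three saving can be verified by counting intermediate nodes (a quick sketch; exact counts vary slightly due to rounding at each level):

```python
def intermediate_nodes(n_leaves: int, k: int) -> int:
    """Number of non-leaf nodes in a k-ary hash tree over n_leaves leaves."""
    total, nodes = 0, n_leaves
    while nodes > 1:
        nodes = -(-nodes // k)   # ceiling division: nodes on the next level up
        total += nodes
    return total
```

For $2^{20}$ leaves, a binary tree has $2^{20} - 1$ intermediate nodes, while a tree with four children per node has about a third of that.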
Due to this observation, we propose the following structure for the hash tree, which as a side effect tries to give some logical meaning to (some) intermediate nodes and thus make the scheme more easily comprehensible.
For complete verification, all the leaves of the tree need to be checked. In principle, one could imagine a tree with just one level, with the root hash being the hash of the concatenation of all the leaves. However, this would both completely obfuscate the semantic structure of the data to be stored and make debugging overly difficult. Instead, we propose to hash together data from the same category (i.e. meta data, information on the setup, …) and finally compute the root hash $r$ from the concatenation of the category-wise root hashes (which we denote $r_j$ for category $j$). Data from the same category may be hashed together using concatenation of all items (which we propose for categories 1.–3., which contain relatively few data) or by using hash subtrees, whether binary or otherwise (which we suggest for the other categories). A sketch of the hash tree structure is depicted in Figure 1.
For partial verification, only a portion of the leaves needs to be checked. While the general information from categories 1.–4. is required as in the case of complete verification, only some sample of the information for the transitions between checkpoints is verified. This leads us to suggest the following layout for the hash tree: The root hash of the tree is again computed from the concatenation of category-wise root hashes (again denoted $r_j$ for category $j$). Categories 1.–4. should be treated in the same way as for complete verification.
For category 5., for each $i$ with $0 \le i \le c$, the corresponding data should be hashed together. The top hashes $h_i$ for the different values of $i$ should be combined using a binary hash tree whose root hash is the root hash $r_5$ of category 5. A binary tree is suggested, since only some values of $i$ are checked and they are not known beforehand. Using a binary tree makes sure that the amount of intermediate information required to recompute $r_5$ is as small as possible. For each $i$, we compute the top hash as $h_i = H(n_i \| h_i^{(b)} \| h_i^{(c)} \| h_i^{(d)})$, where $n_i$ is the value from subcategory 5.(a), and $h_i^{(b)}$, $h_i^{(c)}$ and $h_i^{(d)}$ are the top hashes of appropriate hash trees combining the data from subcategories (b), (c) and (d), respectively. Since the data from 5.(b), 5.(c) and 5.(d) for any selected $i$ need to be checked at the same time, we suggest using non-binary hash trees in order to save space. The exact number of children at each level of the hash tree can be chosen based on the amount of data to be stored and practical considerations regarding the implementation. Figure 2 sketches the proposed hash tree for partial verification.
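A minimal sketch of this combination might look as follows (the byte encoding of the step count and the promotion of an odd node to the next level are assumptions of this sketch, not part of the proposal):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def checkpoint_top_hash(n_i: int, h_b: bytes, h_c: bytes, h_d: bytes) -> bytes:
    """Top hash of a checkpoint: the hash of n_i concatenated with the top
    hashes of the subtrees for subcategories (b), (c) and (d)."""
    return H(n_i.to_bytes(8, "big") + h_b + h_c + h_d)

def category5_root(top_hashes):
    """Combine the per-checkpoint top hashes with a binary hash tree."""
    level = list(top_hashes)
    while len(level) > 1:
        nxt = [H(level[j] + level[j + 1]) for j in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # promote an unpaired node unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]
```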
A.3 Storage requirements
In this section, we estimate the storage overhead induced by the proposed approach. We assume that the training data themselves are in any case stored by the proving party as a backup or for future use and hence do not consider them in the analysis that follows.
Large neural networks can have up to about $10^8$ parameters [6, 11], and storing these parameters can require storage space in the hundreds of megabytes, which we estimate by 400 MB when using 32 bits of precision. Hence, storing only the weights for $c$ checkpoints would require about $c \cdot 400$ MB of storage space. The additional storage space for categories 1.–4. and 5.(a), 5.(c) and 5.(d) should be negligible in comparison (note that in category 4., only the hash values of the data items are stored, not the data themselves, which might have non-negligible size).
For a fixed checkpoint $C_i$, if the hash tree storing its weights (subcategory 5.(b)) has, for instance, 16 children at each level, the penultimate level includes $10^8/16 \approx 6.3 \cdot 10^6$ nodes, each of size 32 bytes when using SHA-256 as a hash function. Therefore, about 200 MB of storage space are required to store the penultimate level of the hash tree. The preceding level would hence require about 12.5 MB, and so forth. In this way, by the formula for the geometric series, the overall storage requirement for these hash trees when using 16 children per level is less than the one required for storing the parameters themselves. The total storage requirement for storing the data corresponding to $c$ checkpoints is thus bounded by roughly $c \cdot 650$ MB.
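The estimate can be reproduced numerically (a sketch under illustrative assumptions: $10^8$ parameters at 32 bits of precision, SHA-256 hashes of 32 bytes, 16 children per node):

```python
def tree_overhead_bytes(n_leaves: int, k: int, hash_size: int = 32) -> int:
    """Total size of all intermediate levels of a k-ary hash tree, i.e. the
    geometric series n/k + n/k^2 + ... nodes of hash_size bytes each."""
    total, nodes = 0, n_leaves
    while nodes > 1:
        nodes = -(-nodes // k)   # ceiling division: nodes on the next level up
        total += nodes * hash_size
    return total

params = 10 ** 8                 # assumed number of weights
weights_bytes = 4 * params       # 32-bit precision: about 400 MB
overhead = tree_overhead_bytes(params, k=16)
```

With these figures the intermediate levels occupy about 213 MB, below the roughly 400 MB needed for the parameters themselves.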
If weight-dependent information for specific optimisation methods needs to be included (e.g. for the momentum method [9] or Adam [7]), this information needs to be stored at every checkpoint. Since it requires essentially the same amount of storage as the weights themselves, the total storage requirement doubles in this case, giving an upper bound of roughly $c \cdot 1.3$ GB.