I Introduction
Secure key agreement is one of the basic steps in secure channel establishment. The algorithms responsible for the key exchange must ensure that no eavesdropper is able to reproduce the secure key. Applied key agreement protocols are based on mathematical operations which have no computationally efficient inversion, e.g. the factorization of large numbers or other derived problems.
Quantum computing poses a real threat to applied cryptography systems. Currently used algorithms, based on the public-key cryptography approach, offer only conditional security. Efficient derivation of a secure key from the exchanged fragmentary information would break the security of the key agreement protocol. Currently, there is one known algorithm, Shor's algorithm, capable of factorizing large numbers efficiently. Hence, it can extract exchanged keys and break all applied asymmetric cryptography [19]. However, a successful implementation of this algorithm requires a quantum computer with a sufficient number of qubits.
Some modern cryptography techniques, such as quantum cryptography and neural cryptography, are able to overcome this problem and provide a variety of quantum-proof algorithms. The Tree Parity Machine (TPM) is one such solution. It achieves key agreement through the mutual learning of two artificial neural networks. This paper introduces an accelerated key exchange process between two TPM which uses non-binary vectors at the input.
The paper is structured as follows. Section II presents the architecture of the TPM, the process of mutual learning, the secure key agreement protocol and the security of the TPM. Section III describes entropy and its application to assessing the quality of the exchanged key. Section IV introduces non-binary input vectors and the effective length of the agreed key. Section V explains the methodology of the performed simulations and presents an analysis of the gathered results. Section VI summarizes the paper.
II Tree Parity Machine
Artificial neural networks (ANN) are increasingly popular and find applications in many fields, including security. In [9] the authors introduce a novel approach to key agreement implemented with neural networks. Such an approach can also be used for error correction in quantum cryptography systems [14].
II-A TPM architecture
A TPM is a two-layered, perceptron-structured artificial neural network with discrete weights, binary input and binary output [11]. The input vector has $K \cdot N$ elements, where $N$ denotes the number of inputs for each neuron in the first layer, and $K$ indicates the number of neurons in the first layer. Every element of the input vector can have one of two possible values, either $-1$ or $1$. The first layer consists of neurons similar to the McCulloch-Pitts model [13]. Every input $x_{kj}$ is connected to the $k$-th neuron and has its corresponding weight $w_{kj}$. The values of the weights are the only difference from the former model. Every weight can take a discrete value between $-L$ and $L$, where $L$ is a parameter of the TPM and denotes the minimum/maximum possible weight value of the input neurons.
The output of the aforementioned neurons is based on a slightly modified signum function $\sigma$. The formula of the function is presented in (1). It differs from the regular signum function in that it never returns zero. The value of $\sigma(0)$ is mapped either to $-1$ or to $1$, based on whether the side is the sender or the recipient of the communication [21]. The parties decide beforehand which side is the sender and which is the recipient.
$$\sigma(x) = \begin{cases} -1 & \text{if } x < 0, \\ \pm 1 & \text{if } x = 0 \text{ (fixed by the role of the side)}, \\ 1 & \text{if } x > 0 \end{cases} \tag{1}$$
The argument of the neuron's activation function is the sum of the products of the input vector's elements with the corresponding weights. The exact formula is presented in (2).
$$h_k = \sum_{j=1}^{N} w_{kj} \, x_{kj} \tag{2}$$
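The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the `zero_value` argument (the value a side has agreed to use for $\sigma(0)$) and the use of NumPy are assumptions made here, and the overall output is taken to be the product of the first-layer outputs, which is the usual parity-machine definition.

```python
import numpy as np

def sigma(h, zero_value):
    """Modified signum: never returns 0; sigma(0) takes the side's agreed value."""
    return np.where(h > 0, 1, np.where(h < 0, -1, zero_value))

def tpm_output(weights, x, zero_value=-1):
    """Return the first-layer outputs sigma_k and the overall TPM output."""
    h = np.sum(weights * x, axis=1)   # formula (2): sum of weight * input per neuron
    s = sigma(h, zero_value)
    return s, int(np.prod(s))         # assumed: output is the product of the sigma_k

# Hypothetical small TPM: K = 3 neurons, N = 4 inputs each, weights in [-L, L]
K, N, L = 3, 4, 2
rng = np.random.default_rng(0)
w = rng.integers(-L, L + 1, size=(K, N))
x = rng.choice([-1, 1], size=(K, N))
s, tau = tpm_output(w, x)
```

Both `s` and `tau` only ever contain $-1$ or $1$, which is what allows the parties to exchange a single output bit per iteration.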
II-B Key agreement protocol
The parties performing the key agreement execute a protocol which results in a secure shared key known only to the participating parties. This is usually achieved by exchanging some information over an unsecured channel and by performing mathematical operations whose results are known only to the authorized parties [8]. The first and most popular key agreement protocol was proposed by Diffie and Hellman [2].
TPM offers functionality which can be adopted for key exchange purposes. The protocol for two parties consists of the following steps [21]:

both participants must agree on all the parameters for the TPM and initialize them with random weights;

the key agreement participants publicly exchange a previously chosen random input vector $X$;

each party computes the output from their TPM and publishes the results;

if the outputs match, both participants apply the appropriate learning rule, which updates the weights of their TPM accordingly.
The last three steps are repeated until full synchronization is reached. Full synchronization means that every pair of corresponding weights in the two TPM is equal, at which point both TPM are the same.
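The aforementioned learning rules are responsible for updating the weights of each TPM in such a way that the synchronization process finishes in finite time [10]. There are three different learning rules which can be used in the process of updating the weights [16]:

Hebbian learning rule
$$w_{kj}(t+1) = w_{kj}(t) + x_{kj} \, \tau \, \Theta(\sigma_k, \tau) \, \Theta(\tau^{A}, \tau^{B}) \tag{4}$$
Anti-Hebbian learning rule
$$w_{kj}(t+1) = w_{kj}(t) - x_{kj} \, \tau \, \Theta(\sigma_k, \tau) \, \Theta(\tau^{A}, \tau^{B}) \tag{5}$$
Random walk learning rule,
$$w_{kj}(t+1) = w_{kj}(t) + x_{kj} \, \Theta(\sigma_k, \tau) \, \Theta(\tau^{A}, \tau^{B}) \tag{6}$$
where $\Theta(a, b)$ denotes the function returning $1$ if $a = b$ and $0$ otherwise, and parameter $t$ denotes the iteration in the key agreement algorithm. Here $\tau$ is the output of the TPM being updated, computed as the product of the outputs $\sigma_k$ of its first-layer neurons, while $\tau^{A}$ and $\tau^{B}$ are the outputs published by the two parties. A weight which would leave the range between $-L$ and $L$ is set to the nearest bound.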
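To make the interplay between the protocol steps and the learning rules concrete, the following Python sketch synchronizes two small TPM with binary inputs. It is a minimal sketch under stated assumptions, not the authors' simulator: the function names, the default parameters, the clipping of weights to the allowed range and the choice of which side maps $\sigma(0)$ to $-1$ are all assumptions made for illustration.

```python
import numpy as np

def theta(a, b):
    """Return 1 where a equals b and 0 otherwise (element-wise)."""
    return np.equal(a, b).astype(int)

def forward(w, x, zero_value):
    """Return first-layer outputs sigma_k and the TPM output tau (their product)."""
    h = (w * x).sum(axis=1)
    s = np.where(h > 0, 1, np.where(h < 0, -1, zero_value))
    return s, int(s.prod())

def update(w, x, s, tau, tau_other, L, rule="hebbian"):
    """One learning step: only rows with sigma_k == tau move, and only if outputs match."""
    gate = theta(s, tau)[:, None] * theta(tau, tau_other)
    step = {"hebbian": x * tau, "anti_hebbian": -x * tau, "random_walk": x}[rule]
    return np.clip(w + step * gate, -L, L)   # assumed: weights are kept inside [-L, L]

def synchronize(K=3, N=10, L=3, rule="hebbian", seed=0, max_iter=100_000):
    """Run mutual learning until both weight matrices are equal."""
    rng = np.random.default_rng(seed)
    wa = rng.integers(-L, L + 1, size=(K, N))      # random initial weights, party A
    wb = rng.integers(-L, L + 1, size=(K, N))      # random initial weights, party B
    for t in range(1, max_iter + 1):
        x = rng.choice([-1, 1], size=(K, N))       # public random binary input
        sa, ta = forward(wa, x, zero_value=-1)     # assumed role convention for sigma(0)
        sb, tb = forward(wb, x, zero_value=1)
        wa = update(wa, x, sa, ta, tb, L, rule)
        wb = update(wb, x, sb, tb, ta, L, rule)
        if np.array_equal(wa, wb):
            return t, wa                           # iterations used and shared weights
    raise RuntimeError("no synchronization within max_iter iterations")

iterations, shared_weights = synchronize()
```

The returned iteration count varies with the seed, which reflects the non-deterministic nature of the synchronization process.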
The synchronization process of two parity machines is not a deterministic algorithm. The number of iterations is not fixed and depends on the size and parameters of the TPM. However, it has been shown that the time is finite and can be easily estimated by users [5]. The process takes longer for larger TPM sizes ($K$ and $N$) and a larger maximum weight value ($L$). Other factors which affect the number of iterations required for two TPM to finish mutual learning include the distribution of the initial weights and the learning rule [3].
II-C Security of Tree Parity Machines
The security of a key agreement protocol is crucial for communication. Any eavesdropper able to reproduce the key, based on the messages exchanged between the parties or on any other source, breaks the security of the channel. Consequently, such a situation compromises the secure key exchange protocol. Hence, it is crucial to assess the security of any novel algorithm or protocol.
TPM have been studied extensively. In [6] and [12] the authors identify four distinct types of attacks that TPM may be vulnerable to:

brute force attack – research shows that it is impossible to find the exact key as a result of a brute force attack against TPM in polynomial time;

genetic algorithm for weight prediction – it has been shown that only TPM with a single neuron in the second layer are vulnerable to this type of attack;

a man-in-the-middle interception attack – studies show that, on average, a share of the weights was synchronized in the eavesdropper's TPM;

sign of weight classification using neural networks – in [12] the authors demonstrate that classification using artificial neural networks achieves high accuracy in determining the sign of the weights in the TPM, which reduces the time needed by a brute force attack by almost half.
The studies show that, by utilizing these attack vectors, it is possible to gain some information about the key. Hence, cryptosystems should be aware of this threat and counteract it in order to minimize the likelihood of key reconstruction.
II-D Man-in-the-middle attack
Synchronization of two TPM without additional layers of security is a process prone to man-in-the-middle attacks. This attack relies on the possibility of placing a node $E$ between the parties $A$ and $B$ performing a key agreement. The node $E$ eavesdrops on all the messages shared between $A$ and $B$. Based on the information collected, node $E$ may be able to gain unauthorized access to information sent between $A$ and $B$. Moreover, if the nodes are not mutually authenticated, the adversarial party may be able to alter the messages to attempt an attack with a higher probability of success.
In terms of TPM, man-in-the-middle attacks come down to capturing all the input vectors and outputs of the parties being intercepted. An adversarial TPM performs the learning process on the acquired data. There are three scenarios to be considered while intercepting the key exchange. Let $A$, $B$ be the parties wishing to exchange the key and let $E$ be an intruder able to perform a man-in-the-middle attack.

If $\tau^{A} \neq \tau^{B}$ – no TPM are synchronized during this step.

If $\tau^{A} = \tau^{B} \neq \tau^{E}$ – only TPM $A$ and $B$ are synchronized, while TPM $E$ (attacker) does not update its weights.

If $\tau^{A} = \tau^{B} = \tau^{E}$ – all the TPM update their weights accordingly.
The last scenario brings the adversarial party closer to obtaining the exchanged key. Hence, this situation should be avoided at all costs.
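The three scenarios can be expressed as a small decision function. The sketch below is illustrative only: the function name and the labels "A", "B" and "E" for the two legitimate parties and the eavesdropper are assumptions, and the attacker is modelled as a purely passive listener who applies the same update condition as the legitimate parties.

```python
def who_updates(tau_a, tau_b, tau_e):
    """Return the set of TPM which update their weights in one iteration."""
    if tau_a != tau_b:
        return set()                 # outputs differ: nobody learns in this step
    if tau_e != tau_a:
        return {"A", "B"}            # attacker disagrees: only the parties learn
    return {"A", "B", "E"}           # all outputs equal: the attacker learns too

# The last case is the one that moves the attacker towards the shared key.
cases = [who_updates(1, -1, 1), who_updates(1, 1, -1), who_updates(-1, -1, -1)]
```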
III Entropy
The quality of random number generation has a significant impact on the final security of the cryptosystem. A true random number generator produces every available output with equal probability. Unfortunately, computers are incapable of generating fully random numbers. Frequently, numbers are generated by a pseudorandom number generator. This requires a seed supplied beforehand, which is the starting point of the pseudorandom number sequence, and each further number depends on it. Many contemporary implementations lack important features such as sound mathematical foundations, unpredictability and cryptographic security [15].
Entropy is one of the measures which assess the quality of the generated numbers. Let us assume the random source generates $n$ different numbers with corresponding probabilities $p_1, \dots, p_n$. The entropy of a source defined in this way is presented in (7) [20].
$$H = -\sum_{i=1}^{n} p_i \log_b p_i \tag{7}$$
The base $b$ of the logarithm determines the units in which entropy is measured, e.g. for $b = 2$ and $b = e$ the units are bits and nats, respectively [1].
Let us consider a random source which produces one of two outputs, with corresponding probabilities $p$ and $1 - p$. The entropy of the described source is presented in (8) [1].
$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p) \tag{8}$$
Figure 2 shows the plot of the entropy of the aforementioned two-value random source. The maximum of the function is reached for $p = 0.5$, where $H(p) = 1$ bit, which corresponds to equal probability of both values. Hence, entropy values increase as the probability distribution of the source gets closer to the uniform distribution. This can be generalized for sources producing more outcomes.
The entropy function is used later to assess the quality of the keys generated by different types of TPM. Taking into account (9), the effective length of a key should depend on the entropy of the synchronized weights (not just their values).
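The entropy definitions of this section can be checked numerically with a short Python sketch. The function names are assumptions introduced here, and the empirical estimator simply plugs relative frequencies into the entropy formula, which is one common way of estimating entropy from samples.

```python
import math
from collections import Counter

def entropy(probabilities, base=2):
    """Shannon entropy of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

def binary_entropy(p):
    """Entropy in bits of a source with two outcomes of probability p and 1 - p."""
    return entropy([p, 1 - p], base=2)

def empirical_entropy(samples, base=2):
    """Plug-in estimate: use relative frequencies of the samples as probabilities."""
    counts = Counter(samples)
    total = len(samples)
    return entropy([c / total for c in counts.values()], base)

fair = binary_entropy(0.5)      # equal probabilities give the largest value
biased = binary_entropy(0.9)    # a skewed source carries less information per output
```

Comparing `fair` and `biased` illustrates why a weight distribution that drifts away from uniform shortens the usable key.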
IV Non-binary input vectors
The TPM uses binary vectors for input [9] during the synchronization process. Other approaches, presented in [4, 7, 18], use complex-valued, vector-valued and chaos-generated input vectors, respectively, to improve the learning process. Additionally, in [17] the authors propose whale optimization-based synchronization, which reduces the duration of the learning process.
This paper introduces a new approach: non-binary input vectors used to synchronize TPM for a secure key agreement protocol. The authors propose that a mutual learning process which uses vectors with a greater range of possible values for every element influences the synchronization time of two TPM. Simulations presented in Section V verify this proposition and indicate that this approach can significantly increase the security of neural cryptography.
IV-A Non-binary vector TPM architecture
So far, the exact TPM was defined by parameters $K$, $N$ and $L$. In this paper the authors introduce a new parameter $M$, denoting the minimum/maximum value of each element of the input vector $X$. Hence, the input vector will have the following form: $X = [x_{11}, x_{12}, \dots, x_{KN}]$, where $x_{kj} \in \{-M, \dots, -1, 1, \dots, M\}$. Thus, during the synchronization process the entities can use non-binary input vectors, instead of the binary vectors which are currently used in practical implementations.
Introducing the parameter $M$ does not affect the architecture of the TPM or the learning process. The formulas shown in Section II are still valid despite the wider range of values in the learning vectors. However, simulations presented in Section V show that as the input vectors become more differentiated, the distribution of settled keys becomes less similar to the uniform distribution. Therefore, an unbiased estimation of key length is required.
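Generating such input vectors is straightforward. The sketch below is a minimal illustration under two assumptions that go beyond the text: zero is excluded from the element values, so that the bound equal to one reproduces the binary case, and elements are drawn uniformly and independently. The function name and signature are hypothetical.

```python
import numpy as np

def random_input(K, N, M, rng):
    """Draw a K x N input with elements uniform on {-M, ..., -1, 1, ..., M}."""
    magnitude = rng.integers(1, M + 1, size=(K, N))   # absolute values 1..M
    sign = rng.choice([-1, 1], size=(K, N))           # independent random signs
    return sign * magnitude                           # zero never occurs (assumption)

rng = np.random.default_rng(0)
binary_x = random_input(3, 10, 1, rng)     # bound of 1 gives the classic binary input
nonbinary_x = random_input(3, 10, 4, rng)  # elements between -4 and 4, excluding 0
```

Under the learning rules of Section II, a single weight change has the magnitude of the corresponding input element, so a larger bound lets a weight move by more than one unit per iteration.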
IV-B Agreed key length
After the synchronization process both parties share identical keys. The keys are distilled from the weights of the TPM, which are the same after the mutual learning process. The key length depends on the size of the TPM as well as on the parameter $L$, which indicates the minimum/maximum value the weights may reach during synchronization. Assuming the ideal uniform distribution of the weights, the key length is equal to $K \cdot N \cdot \log_2(2L + 1)$. However, the distribution of the weights differs from the uniform distribution [16]. Hence, the entropy should be used to measure the quality of the key exchanged between the parties. The updated key length is defined as follows:
$$l_{\mathrm{eff}} = K \cdot N \cdot \bar{H} \tag{9}$$
where $\bar{H}$ indicates the average entropy of the weights, measured in bits. Entropy itself is presented in Section III.
However, the exact distribution of weights is not known beforehand. Taking this fact into consideration, equation (9) should be updated. The estimated effective key length shown in equation (10) uses the estimated entropy $\hat{H}$ based on the simulation results. Additionally, we propose using the floor function in the equation, since the effective key length is measured in whole bits.
$$\hat{l}_{\mathrm{eff}} = \lfloor K \cdot N \cdot \hat{H} \rfloor \tag{10}$$
It should be noted that equation (9) indicates the theoretical maximum key length which can be extracted from the shared weights. However, a dedicated algorithm which equalizes the probability can be used to obtain a cryptographic key from an unevenly distributed numerical sequence. This algorithm must be deterministic, since both parties retrieve the cryptographic key from the weights simultaneously.
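The relation between the ideal and the estimated effective key length can be illustrated with a short Python sketch. It rests on assumptions made here for illustration: the entropy is estimated by pooling all observed weight values into one empirical distribution, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter

def average_weight_entropy(weight_samples):
    """Estimate the per-weight entropy in bits from observed weight values."""
    counts = Counter(weight_samples)
    total = len(weight_samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ideal_key_length(K, N, L):
    """Key length in bits if every weight were uniform on the 2L + 1 values."""
    return K * N * math.log2(2 * L + 1)

def estimated_effective_key_length(K, N, entropy_bits):
    """Floor of K * N * estimated entropy, giving a whole number of bits."""
    return math.floor(K * N * entropy_bits)

K, N, L = 3, 10, 3
uniform_weights = list(range(-L, L + 1))           # every value equally frequent
extreme_weights = [-L] * 4 + [L] * 4 + [0, 1]      # mass concentrated at the bounds
uniform_bits = estimated_effective_key_length(K, N, average_weight_entropy(uniform_weights))
extreme_bits = estimated_effective_key_length(K, N, average_weight_entropy(extreme_weights))
```

With these made-up samples the second estimate is clearly shorter than the first, which is the qualitative behaviour that motivates measuring key length through entropy rather than through the number of possible weight values.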
V Verification
This section presents the impact of the new parameter $M$, which indicates the maximum/minimum value of the input vector elements during the synchronization process, on the number of iterations required in the learning process and on the quality of the output key.
V-A Methodology
The quality of the output key is measured by its effective length. The effective length is calculated on the basis of (10). The simulation scenarios cover multiple TPM sizes. For each scenario, statistical analysis was prepared based on 1000 simulations. The presented confidence intervals are calculated at the 95% confidence level. The scenarios include all possible combinations of the varied parameters, while the remaining two parameters are kept fixed across all simulation scenarios. Synchronization time, entropy and effective key length are measured in order to compare the chosen scenarios. Furthermore, we performed man-in-the-middle attack scenarios during which we measured the average synchronization score of the malicious TPM.
V-B Results
The synchronization process becomes longer as the size of the TPM increases; it also generates a longer key for cryptographic purposes. However, simulations presented in Table I reveal that the TPM size and parameters are not the only elements that have an impact on the duration of the synchronization process. Multiple simulations were performed with different values of parameter $M$. An increase of the value of $M$, which limits the maximum and minimum possible values of the input vector, reduces the synchronization time significantly. The synchronization time in Table I is expressed as the number of output bits exchanged between the parties to achieve full synchronization between the two TPM (learning iterations). Thus, the volume of data exchanged between the parties performing key agreement decreases as the value of parameter $M$ increases.
Table I: Synchronization time (columns: Average, Minimum, Maximum, Median).
It should be noted that faster synchronization increases security. As the value of parameter $M$ increases, the key agreement process takes less time; hence, a longer and more secure key can be obtained in a shorter period of time. This makes the solution more competitive with other key exchange protocols.
V-C Extrema values effect
Numerous simulations of the TPM learning process using non-binary input vectors led to the discovery of an effect which the authors name the extrema values effect. A similar effect is shown in [18]; however, only binary input vectors are considered in that paper.
Faster synchronization times and lower numbers of messages exchanged between users have an impact on the distribution of weights. As the minimum/maximum input value $M$ increases, the probabilities of a weight taking the extreme values $-L$ and $L$ also increase. As a result, the probability distribution of weights becomes less similar to the uniform distribution. Hence, every weight of the TPM carries less random information. The exact distribution of weights is presented in Figure 4.
The unequal distribution of weights in the TPM results in a reduction of the effective key length, since the entropy value becomes lower. Entropy values and effective key lengths are presented in Table II. To visualize the proportions between effective key lengths, the results for the considered parameters are presented in Figure 3.
Table II: Entropy and estimated effective key length.
Table III: Adversarial TPM synchronization score (columns: Average, Minimum, Maximum, Median).
V-D Susceptibility to a man-in-the-middle attack
Many studies address TPM vulnerability to man-in-the-middle attacks. Therefore, simulations with an adversarial TPM were conducted in which learning used non-binary input vectors.
We assumed the worst-case scenario in which the adversarial neural network was able to eavesdrop on all of the data exchanged between the parties performing the key exchange. During the simulations, the final synchronization score was gathered for the adversarial neural network. The synchronization score measures the similarity between two TPM. The more common weights there are, the higher the score. Hence, the formula needs to return higher values with the progress of the learning process. The formula for calculating the end score is presented in equation (11). In the following equation, $w^{E}_{kj}$ denotes the weights of the adversarial TPM, $w^{A}_{kj}$ denotes the weights of a legitimate party, and the function $\Theta$ is defined in Section II.
$$\mathrm{score} = \frac{100\%}{K \cdot N} \sum_{k=1}^{K} \sum_{j=1}^{N} \Theta\left(w^{A}_{kj}, w^{E}_{kj}\right) \tag{11}$$
In terms of security, the attacker’s TPM should have the lowest synchronization score possible.
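A similarity score of this kind can be computed as the share of weight positions on which the attacker agrees with a legitimate party. The sketch below is an illustration only: the function name is hypothetical, and expressing the score as a percentage of matching weights is an assumption about the normalization.

```python
import numpy as np

def synchronization_score(w_party, w_attacker):
    """Percentage of positions where the two weight matrices hold the same value."""
    w_party = np.asarray(w_party)
    w_attacker = np.asarray(w_attacker)
    return 100.0 * float(np.mean(w_party == w_attacker))

# Hypothetical 2 x 2 weight matrices: the attacker matches half of the weights.
party = [[1, 2], [3, 4]]
attacker = [[1, 0], [3, 0]]
score = synchronization_score(party, attacker)
```

A score at the upper end of the scale would mean that the attacker holds every weight of the legitimate party, and therefore the key.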
The synchronization scores of the adversarial TPM are presented in Table III. Additionally, simulation results are shown in Figure 5 to visualize the relationship between scenarios with different TPM. Increased values of parameter $M$ result in higher median synchronization scores; hence, the TPM is more prone to man-in-the-middle attacks. For certain parameter values, we observed situations in which the adversarial TPM reached a synchronization score that compromises the key. This means that the relationship between the parameters should be preserved to ensure security. Additionally, the median decreases as the number of inputs $N$ increases; therefore, the impact of non-binary input vectors on the synchronization score is less clear for larger TPM. Furthermore, the confidence intervals are considerable. This variability makes it difficult to predict the weights of the attacker's malicious TPM.
VI Summary
Correct selection of TPM parameters is a key issue in implementing secure key agreement protocols for neural cryptography. It is crucial to find a trade-off between effective key length, synchronization time, and security of the final key, which is used to protect data in a network environment. This comes down to selecting the appropriate network size and the extreme values of the weights and learning vectors.
This article proposes an improved way of learning TPM. A significant acceleration of the key agreement process was achieved by utilizing a non-binary input vector. This reduces the volume of data exchanged between the parties performing key agreement. Faster synchronization increases security levels; in particular, it mitigates the risk of the key being obtained by an intruder using a man-in-the-middle attack. However, speeding up the process results in an unequal distribution of weights in the TPM. This was measured by calculating the effective key length based on the entropy of each weight. The proposed solution was also verified in an insecure environment in which two TPM are subject to a man-in-the-middle attack.
We envisage that future work will explore the development of a secure key exchange protocol using non-binary input vectors in TPM during mutual learning. This work will focus on studying the extrema values effect thoroughly and on minimizing the reduction of effective key length.
Acknowledgement
This work was supported by the ECHO project which has received funding from the European Union’s Horizon 2020 research and innovation programme under the grant agreement no. 830943.
References
[1] (2006) Elements of information theory (Wiley series in telecommunications and signal processing). Wiley-Interscience, USA. ISBN 0471241954.
[2] (1976) New directions in cryptography. IEEE Transactions on Information Theory 22 (6), pp. 644–654.
[3] (2015) Distance of the initial weights of tree parity machine drawn from different distributions. Advances in Science and Technology Research Journal 9 (26), pp. 137–142. ISSN 2080-4075.
[4] (2020) Neural cryptography based on complex-valued neural network. IEEE Transactions on Neural Networks and Learning Systems 31 (11), pp. 4999–5004.
[5] (2016) Synchronization of two tree parity machines. In 2016 New Trends in Signal Processing (NTSP), pp. 1–4.
[6] (2016) Synchronization of two tree parity machines. In 2016 New Trends in Signal Processing (NTSP), pp. 1–4.
[7] (2021-02-04) Neural cryptography based on generalized tree parity machine for real-life systems. Security and Communication Networks 2021, pp. 6680782.
[8] (2005) Key agreement. In Encyclopedia of Cryptography and Security, pp. 325–325. ISBN 9780387234830.
[9] (2002-02) Secure exchange of information by synchronization of neural networks. EPL (Europhysics Letters) 57.
[10] (2002) Theory of interacting neural networks. In Handbook of Graphs and Networks, pp. 199–217. ISBN 9783527602759. https://onlinelibrary.wiley.com/doi/pdf/10.1002/3527602755.ch9
[11] (2014) Machine learning: an algorithmic perspective, second edition. 2nd edition, Chapman & Hall/CRC. ISBN 1466583282.
[12] (2018-04-13) Security evaluation of tree parity rekeying machine implementations utilizing side-channel emissions. EURASIP Journal on Information Security 2018 (1), pp. 3. ISSN 2510-523X.
[13] (1943-12-01) A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5 (4), pp. 115–133. ISSN 1522-9602.
[14] (2019-04-25) Error correction in quantum cryptography based on artificial neural networks. Quantum Information Processing 18 (6), pp. 174. ISSN 1573-1332.
[15] (2014-09) PCG: a family of simple fast space-efficient statistically good algorithms for random number generation. Technical Report HMC-CS-2014-0905, Harvey Mudd College, Claremont, CA.
[16] (2007) Neural synchronization and cryptography. Ph.D. Thesis, University of Würzburg. arXiv:0711.2411.
[17] (2021) Artificial neural synchronization using nature inspired whale optimization. IEEE Access 9, pp. 16435–16447.
[18] (2021-02) Secure exchange of information using artificial intelligence and chaotic system guided neural synchronization. Multimedia Tools and Applications.
[19] (1997) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing 26 (5).
[20] (2020) Entropy and randomness: from analogic to quantum world. IEEE Access 8, pp. 74553–74561.
[21] (2005) Tree parity machine rekeying architectures. IEEE Transactions on Computers 54 (4), pp. 421–427.