1 Introduction
Neural cryptography is an emerging subfield of cryptography that employs artificial neural networks for encryption and cryptanalysis.
Early contributions in this area considered cryptographic systems based on recursive autoencoders and showed that feedforward networks trained via backpropagation can encrypt plaintext messages in the activation patterns of hidden-layer neurons [2]. Later work introduced key-exchange systems where coupled neural networks synchronize to establish common secret keys [5]; while the original approach was not completely secure [6], more recent work showed that modern convolutional interacting neural networks can indeed learn to protect their communication against adversarial eavesdroppers [1]. Another popular idea is to combine chaotic dynamics and neural networks [7, 8, 11, 12, 13]. For example, chaotic neural networks were used for image encryption and experimentally verified to be secure, and chaotic Hopfield networks were found to generate random binary sequences for text encryption.

Given this short survey, the novel idea for neural cryptography proposed in this paper can be seen as a hybrid approach that harnesses chaotic dynamics and the deterministic outcome of a training procedure. Namely, we propose to use echo state networks [3] both for encryption and decryption.
Considering the classic scenario where Alice and Bob exchange messages and want to protect their communication against Eve's eavesdropping, we assume that both share an identical copy of an echo state network whose internal states evolve according to a nonlinear dynamical system. To encrypt a message (a text, an image, etc.), Alice feeds it into her copy of the network and trains the output weights such that the network reproduces the input. She then sends these output weights to Bob who uses them to run his copy of the network, which regenerates the message. Eve, on the other hand, may intercept the communicated output weights, but without the corresponding echo state network (its structure, input weights, and internal weights), she will not be able to decipher the message. Our experiments with this kind of private-key (symmetric) cryptography system reveal the approach to be easy to use, efficient, scalable, and secure.
Next, we briefly summarize the basic theory behind echo state networks and how to use them as autoencoders that memorize their input. We then discuss how to harness them for cryptography and present experiments which underline that our approach satisfies the fundamental cryptographic properties of diffusion and confusion.
2 Echo State Networks as Memories
Echo state networks (ESNs) follow the paradigm of reservoir computing where a large reservoir of recurrently interconnected neurons processes sequential data. The central idea is to randomly generate weights between input and reservoir neurons as well as weights between reservoir neurons. Only the weights between reservoir and output neurons are trained in order to adapt the network to a particular task.
At time $t$, the states of the input, output, and reservoir neurons are collected in vectors $\mathbf{u}(t)$, $\mathbf{y}(t)$, and $\mathbf{x}(t)$, respectively, and their evolution over time is governed by the following nonlinear dynamical system

$\mathbf{x}(t) = (1-\alpha)\,\mathbf{x}(t-1) + \alpha\, f\bigl(\mathbf{W}^{in}\,\mathbf{u}(t) + \mathbf{W}\,\mathbf{x}(t-1)\bigr)$  (1)

$\mathbf{y}(t) = g\bigl(\mathbf{W}^{out}\,\mathbf{x}(t)\bigr)$  (2)

where $\alpha \in (0,1]$ is called the leaking rate. The function $f$ is understood to act componentwise on its argument and is typically a sigmoidal activation function. For the output layer, however, $g$ is usually just a linear or softmax function, depending on the application context.

To train an echo state network, one provides a training sequence of input data gathered in a matrix $\mathbf{U}$ together with a sequence of desired outputs gathered in $\mathbf{Y}$. The training sequence is fed into the network and the internal activations that result from iterating equation (1) are recorded in a matrix $\mathbf{X}$. Appropriate output weights can then be determined using regularized least squares

$\mathbf{W}^{out} = \mathbf{Y}\,\mathbf{X}^{\top} \bigl(\mathbf{X}\,\mathbf{X}^{\top} + \beta\,\mathbf{I}\bigr)^{-1}$  (3)

where $\beta$ is a regularization constant. However, for good practical performance, the scale of $\mathbf{W}^{in}$ and the spectral radius of $\mathbf{W}$ have to be chosen carefully. Together with the leaking rate $\alpha$, these parameters are rather task specific; useful and commonly adhered to general guidelines are given in [9].
Because of its recurrent connections, the reservoir of an echo state network can be understood as a nonlinear, high-dimensional expansion of the input data that has a memory of the past. The temporal reach of this memory is called "memory capacity" and is bounded by the number of reservoir neurons [4]. An entire input sequence (e.g. a text file) can therefore be stored in and retrieved from the reservoir, provided the reservoir is large enough. Hence, our idea in this paper is to produce an echo state network with a large reservoir and to train it to memorize an input sequence. Once training is complete, we let the network run freely to (re)generate the memorized sequence.
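For illustration, equations (1) to (3) amount to only a few lines of code. The following NumPy sketch uses illustrative hyperparameters (reservoir size, leaking rate, spectral radius) of our own choosing, not the parametrization of our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_res, n_out, T = 4, 100, 4, 200   # illustrative sizes
alpha, beta = 0.3, 1e-8                  # leaking rate, regularization

# Fixed random input and reservoir weights; only W_out is learned
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius to 0.9

def step(x, u):
    """Leaky-integrator state update, equation (1), with f = tanh."""
    return (1 - alpha) * x + alpha * np.tanh(W_in @ u + W @ x)

# Drive the reservoir with a training sequence and record the states in X
U = rng.uniform(-1, 1, (n_in, T))   # input sequence, one column per time step
Y = np.roll(U, -1, axis=1)          # teacher signal: predict the next input
X = np.zeros((n_res, T))
x = np.zeros(n_res)
for t in range(T):
    x = step(x, U[:, t])
    X[:, t] = x

# Regularized least squares for the output weights, equation (3)
W_out = Y @ X.T @ np.linalg.inv(X @ X.T + beta * np.eye(n_res))
```

Only `W_out` depends on the training data; `W_in` and `W` stay fixed once generated.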
3 ESN-Based Encryption and Decryption
We consider the classic cryptographic scenario where Alice and Bob want to secure their communication against Eve's eavesdropping. Using a secret key, Alice converts her messages, known as plaintexts, into encrypted messages, known as ciphertexts. She then sends the ciphertexts to Bob who uses the same key to convert them back into plaintexts.
Given this setup, our idea is to "memorize" a given message using an echo state network at one end of a communication channel and to "recall" it at the other end using the same network. If Alice and Bob share an identical copy of the network, Alice can train it to memorize the data and transmit only the resulting output weights $\mathbf{W}^{out}$ over the insecure channel. Bob then plugs these weights into his copy of the network and runs it to reconstruct Alice's message. In other words, the weight matrices $\mathbf{W}^{in}$ and $\mathbf{W}$ and the leaking rate $\alpha$ of the echo state network constitute the secret key of our cryptographic system. Without it, Eve cannot decipher the transmitted ciphertext $\mathbf{W}^{out}$.
3.1 Representing Data
In our practical implementation of the above scheme, we consider byte-level representations of messages. This allows for flexibility and wide applicability because, in the memory of a computer, texts and images are represented as byte streams after all. To further increase flexibility, we consider a one-hot encoding of individual bytes where each of the 256 possible values is represented as a 256-dimensional binary vector.
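In code, this byte-level one-hot encoding and its inverse might look as follows (a small helper sketch; the function names are our own):

```python
import numpy as np

def one_hot_bytes(data: bytes) -> np.ndarray:
    """Encode a byte string as a sequence of 256-dimensional one-hot vectors."""
    vectors = np.zeros((len(data), 256), dtype=np.uint8)
    vectors[np.arange(len(data)), list(data)] = 1
    return vectors

def bytes_from_one_hot(vectors: np.ndarray) -> bytes:
    """Invert the encoding: the index of each row's 1 is the byte value."""
    return bytes(int(i) for i in vectors.argmax(axis=1))
```

Texts, images, or any other files can thus be fed to the network uniformly as sequences of 256-dimensional binary vectors.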
3.2 Memorizing Data
Given any byte sequence of input data, we train and apply an echo state network as follows: First, we prepend a dummy byte to the original sequence so as to make the later recall process independent of the value of the original first byte. Second, we one-hot encode the resulting sequence to obtain $\mathbf{u}(1), \ldots, \mathbf{u}(T)$ where each $\mathbf{u}(t)$ is a binary vector of length 256. Given this encoding, we then set the input and output sequences for an echo state network to

$\mathbf{U} = \bigl[\,\mathbf{u}(1), \ldots, \mathbf{u}(T-1)\,\bigr]$  (4)

$\mathbf{Y} = \bigl[\,\mathbf{u}(2), \ldots, \mathbf{u}(T)\,\bigr]$  (5)

where the indices of the vectors in sequences $\mathbf{U}$ and $\mathbf{Y}$ differ by one time step. Given an echo state network with input weights $\mathbf{W}^{in}$ and reservoir weights $\mathbf{W}$, we then iterate the system in (1) and (2) and learn appropriate output weights $\mathbf{W}^{out}$ according to (3).
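The construction of the shifted training pair in (4) and (5), including the prepended dummy byte, can be sketched as follows (the helper name and the dummy value 0 are our choices):

```python
import numpy as np

def build_training_pair(message: bytes, dummy: int = 0):
    """Prepend a dummy byte and return one-hot input/teacher matrices that
    are shifted by one time step, so the network learns u(t) -> u(t+1)."""
    seq = bytes([dummy]) + message
    onehot = np.zeros((len(seq), 256))
    onehot[np.arange(len(seq)), list(seq)] = 1.0
    U = onehot[:-1].T   # inputs  u(1), ..., u(T-1)
    Y = onehot[1:].T    # targets u(2), ..., u(T)
    return U, Y
```

Feeding `U` through equation (1) and regressing the recorded states onto `Y` via equation (3) then yields the ciphertext weights.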
3.3 Recalling Data
Once $\mathbf{W}^{out}$ has been determined, it can be plugged into an identical copy of the echo state network at the other end of a communication channel. This network can then regenerate the encoded message one element at a time. To this end, we consider the dummy byte and one-hot encode it to obtain $\mathbf{u}(1)$. Using this as the first input to the network, we run the system in (1) and (2) to obtain the first output $\mathbf{y}(1)$. At each time step, we consider the network output $\mathbf{y}(t)$, which is not necessarily a binary vector, as a vector of probabilities for the different bytes. We thus subject it to the softmax function and binarize the result such that the most likely entry becomes 1 and all others become 0. The resulting binary vector is then used as the input $\mathbf{u}(t+1)$ for the next iteration of the network. Moreover, we decode the binary vectors obtained in each iteration into bytes; collected in sequence, they are exactly the original byte sequence memorized by the echo state network.

3.4 Working with "Data Chunks"
As the size $n$ of a data sequence increases, the size of a reservoir that can memorize it increases, too. This makes the matrix multiplications required for the network's state updates expensive: since the reservoir size grows with $n$, the total cost for internal updates is of order $O(n^3)$. To reduce this cost, we adopt a "divide-and-conquer" strategy where we split the data into chunks of size $c$ and employ a small reservoir to memorize each chunk at an effort of $O(c^3)$. Hence, for an entire sequence, i.e. for $n/c$ chunks, efforts reduce to $O(n\,c^2)$.
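To make the overall scheme of Sections 3.2 to 3.4 concrete, the following self-contained sketch encrypts a message by training the output weights and decrypts it by free-running an identical network. All parameter values are illustrative, the binarized softmax is realized directly via argmax (which selects the same entry), and for simplicity the chunked variant reuses one network instead of a small reservoir per chunk:

```python
import numpy as np

def one_hot(b: int) -> np.ndarray:
    v = np.zeros(256)
    v[b] = 1.0
    return v

# Secret key shared by Alice and Bob: random weights and leaking rate
rng = np.random.default_rng(42)
n_res, alpha, beta = 300, 0.5, 1e-8
W_in = rng.uniform(-0.5, 0.5, (n_res, 256))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def step(x, u):
    """State update of equation (1) with f = tanh."""
    return (1 - alpha) * x + alpha * np.tanh(W_in @ u + W @ x)

def encrypt(message: bytes) -> np.ndarray:
    """Alice: train W_out so that the network maps u(t) to u(t+1)."""
    seq = bytes([0]) + message           # prepend dummy byte 0
    states, targets = [], []
    x = np.zeros(n_res)
    for t in range(len(seq) - 1):
        x = step(x, one_hot(seq[t]))
        states.append(x)
        targets.append(one_hot(seq[t + 1]))
    X, Y = np.array(states).T, np.array(targets).T
    return Y @ X.T @ np.linalg.inv(X @ X.T + beta * np.eye(n_res))

def decrypt(W_out: np.ndarray, length: int) -> bytes:
    """Bob: free-run his identical copy of the network from the dummy byte."""
    x, u, out = np.zeros(n_res), one_hot(0), bytearray()
    for _ in range(length):
        x = step(x, u)
        b = int(np.argmax(W_out @ x))    # binarized output: most likely byte
        out.append(b)
        u = one_hot(b)                   # feed the decision back as input
    return bytes(out)

# Divide-and-conquer over chunks (Section 3.4)
def encrypt_chunked(message: bytes, c: int):
    return [(encrypt(message[i:i + c]), len(message[i:i + c]))
            for i in range(0, len(message), c)]

def decrypt_chunked(chunks) -> bytes:
    return b"".join(decrypt(W_out, n) for W_out, n in chunks)

message = b"attack at dawn"
assert decrypt(encrypt(message), len(message)) == message
assert decrypt_chunked(encrypt_chunked(message, 5)) == message
```

With a reservoir much larger than a chunk, the ridge regression fits the teacher outputs almost exactly, so the free-running network reproduces the training trajectory byte for byte.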
4 Experiments and Results
In our practical experiments, we found that echo state networks used as described above can indeed memorize and perfectly recall different types of data such as texts, images, audio files, videos, archives, etc. In this section we report results obtained from different kinds of security analysis of our cryptographic scheme. The parametrization of the echo state networks considered in these experiments is summarized in Tab. 1.
Tab. 1: Parametrization of the echo state networks used in our experiments.

parameter | value
chunk size | (reservoir size chosen as )
leaking rate |
spectral radius of $\mathbf{W}$ |
input scaling of $\mathbf{W}^{in}$ |
random seed | randomly chosen
input connectivity | input neurons are connected to 30% of reservoir neurons
reservoir connectivity | reservoir neurons are connected to 30% of reservoir neurons
activation function | logistic for the reservoir and softmax for the output
4.1 Security Analysis
Any cryptography system should be robust against common types of attacks such as brute-force attacks, known-plaintext attacks, and ciphertext-only attacks. A brute-force attack is one in which an attacker attempts to find the keys of the system through trial and error. It is evident from Tab. 1 that the key space of our proposed system is very large and that most of its parameters are unbounded. This renders brute-force attacks extremely time consuming and practically infeasible.
Figures 1(a) and (e) show two original images one of which (Lena) was given as a tiff file, the other (cat) as a png file. Both were encrypted and decrypted using the same echo state network. Decryption produced the images in Fig. 1(b) and (f) which are identical to the original ones. However, when decrypting with networks with slightly modified parameters, i.e. when using slight variations of the secret key, we obtained useless images as shown in Fig. 1(c), (d), (g), and (h). These results are prototypical and show that the system is highly sensitive to the secret key. This makes it robust against brute force attacks because decryption is only possible if all the parameters of the secret key are set precisely.
Known-plaintext attacks are ones where an attacker has access to an example of a plaintext (a message) and a corresponding ciphertext (a weight matrix $\mathbf{W}^{out}$) and attempts to crack the secret key via a comparative analysis of changes between them. For instance, by analyzing changes in the ciphertexts of images which differ by just a few pixels, it might be possible to obtain part of the mapping involved in encryption. Figures 2(a) and (f) show original images and Figs. 2(c) and (h) show slightly distorted versions where 1% of the pixels were randomly changed. The corresponding encrypted images (matrices $\mathbf{W}^{out}$) are visualized in Figs. 2(b), (g), (d), and (i). Small changes in the plaintext led to considerable changes in the ciphertext; these differences are visualized in Figs. 2(e) and (j). Thus, our system is sensitive to slight modifications of the plaintext, which renders known-plaintext attacks very difficult.
In ciphertext-only attacks, an attacker has access only to a set of ciphertexts but has some knowledge about the statistical distribution of plaintexts. Using frequency analysis of ciphertexts, for instance, exploiting the fact that "e" is the most frequent character in English texts, one can map the most frequent parts in a ciphertext to corresponding plaintexts. Figure 3 shows frequency distributions for the plaintexts and ciphertexts of the images "Lena" and "cat". Although the plaintext distributions of the two images differ, their ciphertext distributions are very similar. From these distributions it is evident that most of the elements in the ciphertext $\mathbf{W}^{out}$ are zero and that the nonzero elements are uniformly distributed. Thus, frequency analysis will be ineffective and the proposed system is robust against ciphertext-only attacks.
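A frequency analysis like the one underlying Fig. 3 can be reproduced by histogramming the entries of a ciphertext matrix; the helper below is a sketch of our own, not code from our implementation:

```python
import numpy as np

def ciphertext_histogram(W_out: np.ndarray, bins: int = 50):
    """Histogram the nonzero entries of a ciphertext matrix W_out; a flat
    distribution of nonzero values gives frequency analysis nothing to use."""
    values = W_out.ravel()
    nonzero = values[values != 0]
    counts, edges = np.histogram(nonzero, bins=bins)
    return counts, edges
```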
According to Shannon [10], diffusion and confusion are the two fundamental properties of a good cryptography system. A system that has the diffusion property is one where a small change in either plaintext or key causes a large change in the ciphertext. A system with the confusion property is one where the mapping between plaintext and ciphertext is complex. Our experimental results indicate that the proposed system has both these properties.
4.2 Performance
To evaluate the runtime performance of our proposed system, we determined average encryption and decryption times for messages of different sizes. Our results are shown in Fig. 4. For instance, encrypting and decrypting a 3 KB message each took less than one second, and runtimes were found to increase linearly with the message size. Our approach therefore scales well and can be used in real-time applications.
5 Conclusion
In this paper, we proposed a novel neural cryptography scheme based on the capability of echo state networks to memorize and reproduce sequences of input data. The proposed system was found to be robust against common security attacks and satisfies the fundamental cryptographic properties of diffusion and confusion. Moreover, our approach is scalable, suitable for real-time applications, and does not require special-purpose hardware (such as GPUs) for its computations.
References
 [1] Abadi, M., Andersen, D.G.: Learning to Protect Communications with Adversarial Neural Cryptography. arXiv:1610.06918 (2016)

 [2] Clark, M., Blank, D.: A Neural-Network Based Cryptographic System. In: Proc. Midwest Artificial Intelligence and Cognitive Science Conf. (1998)
 [3] Jäger, H.: The “Echo State” Approach to Analysing and Training Recurrent Neural Networks. Tech. Rep. 148, GMD (2001)
 [4] Jäger, H.: Short Term Memory in Echo State Networks. Tech. Rep. 152, GMD (2002)
 [5] Kanter, I., Kinzel, W., Kanter, E.: Secure Exchange of Information by Synchronization of Neural Networks. Europhysics Letters 57(1) (2002)
 [6] Klimov, A., Mityagin, A., Shamir, A.: Analysis of Neural Cryptography. In: Proc. Int. Conf. on Theory and Application of Cryptology and Information Security (2002)
 [7] Li, C., Li, S., Zhang, D., Chen, G.: ChosenPlaintext Cryptanalysis of a ClippedNeuralNetworkBased Chaotic Cipher. In: Int. Symp. on Neural Networks (2005)
 [8] Lian, S.: A Block Cipher Based on Chaotic Neural Networks. Neurocomputing 72(4–6) (2009)
 [9] Lukoševičius, M.: A Practical Guide to Applying Echo State Networks. In: Montavon, G., Orr, G., Müller, K.R. (eds.) Neural Networks: Tricks of the Trade, LNCS, vol. 7700. Springer (2012)
 [10] Shannon, C.E.: Communication Theory of Secrecy Systems. Bell System Technical Journal 28(4), 656–715 (1949)

 [11] Wang, X.Y., Yang, L., Liu, R., Kadir, A.: A Chaotic Image Encryption Algorithm Based on Perceptron Model. Nonlinear Dynamics 62(3) (2010)
 [12] Yu, W., Cao, J.: Cryptography Based on Delayed Chaotic Neural Networks. Physics Letters A 356(4–5) (2006)
 [13] Zhou, T., Liao, X., Chen, Y.: A Novel Symmetric Cryptography Based on Chaotic Signal Generator and a Clipped Neural Network. In: Proc. Int. Symp. on Neural Networks (2004)