Memory and Information Processing in Recurrent Neural Networks

04/23/2016 ∙ by Alireza Goudarzi, et al.

Recurrent neural networks (RNNs) are simple dynamical systems whose computational power has been attributed to their short-term memory. The short-term memory of RNNs has previously been studied analytically only for orthogonal networks, and only under an annealed approximation with uncorrelated input. Here, for the first time, we present an exact solution for the memory capacity and the task-solving performance as a function of the structure of a given network instance, enabling direct determination of the function–structure relation in RNNs. We calculate the memory capacity for arbitrary networks with exponentially correlated input and further relate it to the performance of the system on signal processing tasks in a supervised learning setup. We compute the expected error and the worst-case error bound as a function of the spectra of the network and the correlation structure of its inputs and outputs. Our results give an explanation for learning and generalization of task solving using short-term memory, which is crucial for building alternative computer architectures using physical phenomena based on the short-term memory principle.





References

  • (1) S. Ganguli, D. Huh, and H. Sompolinsky, Proc. Natl. Acad. Sci. USA 105, 18970–18975 (2008)
  • (2) P. Dominey, M. Arbib, and J. P. Joseph, J. Cogn. Neurosci. 7, 311–336 (1995)
  • (3) W. Maass, T. Natschläger, and H. Markram, Neural Comput. 14, 2531–2560 (2002)
  • (4) H. Jaeger and H. Haas, Science 304, 78–80 (2004)
  • (5) M. Lukoševičius, H. Jaeger, and B. Schrauwen, Künstliche Intelligenz 26, 365–371 (2012)
  • (6) D. Hansel and C. van Vreeswijk, J. Neurosci. 32, 4049–4064 (2012)
  • (7) J. P. Crutchfield, W. L. Ditto, and S. Sinha, Chaos 20, 037101 (2010)
  • (8) O. L. White, D. D. Lee, and H. Sompolinsky, Phys. Rev. Lett. 92, 148102 (2004)
  • (9) T. Toyoizumi, Neural Comput. 24, 2678–2699 (2012)
  • (10) L. Büsing, B. Schrauwen, and R. Legenstein, Neural Comput. 22, 1272–1311 (2010)
  • (11) A. Rodan and P. Tiňo, IEEE Trans. Neural Netw. 22, 131–144 (2011)
  • (12) S. Ganguli and H. Sompolinsky, Advances in Neural Information Processing Systems 23, 667–675 (2010)
  • (13) A. Goudarzi, C. Teuscher, N. Gulbahce, and T. Rohlf, Phys. Rev. Lett. 108, 128702 (2012)
  • (14) D. Snyder, A. Goudarzi, and C. Teuscher, Phys. Rev. E 87, 042808 (2013)
  • (15) H. O. Sillin, R. Aguilera, H. Shieh, A. V. Avizienis, M. Aono, A. Z. Stieg, and J. K. Gimzewski, Nanotechnology 24, 384004 (2013)
  • (16) A. Goudarzi and D. Stefanovic, Procedia Computer Science 41, 176–181 (2014)
  • (17) N. D. Haynes, M. C. Soriano, D. P. Rosin, I. Fischer, and D. J. Gauthier, Phys. Rev. E 91, 020801 (2015)
  • (18) K. Nakajima, T. Li, H. Hauser, and R. Pfeifer, J. R. Soc. Interface 11, 20140437 (2014)
  • (19) J. Bürger, A. Goudarzi, D. Stefanovic, and C. Teuscher, AIMS Materials Science 2, 530–545 (2015)
  • (20) Y. Katayama, T. Yamane, D. Nakano, R. Nakane, and G. Tanaka, Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH ’15), IEEE, pp. 23–24 (2015)
  • (21) K. Vandoorne, P. Mechet, T. Van Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, Nat. Commun. 5, 3541 (2014)
  • (22) M. C. Mackey and L. Glass, Science 197, 287–289 (1977)

Appendix A Computing the optimal readout weights using autocorrelation

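The recurrent neural network (RNN) model that we study in this paper is an echo state network (ESN) with a linear activation function. This system consists of an input-driven recurrent network of size $N$ and a linear readout layer trained to calculate a desired function of the input. Let $u(t)$ and $\omega$ denote a one-dimensional input at time $t$ and an input weight vector, respectively. Let $\Omega$ be an $N \times N$ recurrent weight matrix, $x(t)$ be an $N$-dimensional network state at time $t$, and $\Psi$ be the readout weight vector. The dynamics of the network and the output are described by:

$$x(t+1) = \Omega\, x(t) + \omega\, u(t),$$
$$y(t+1) = \Psi'\, x(t+1),$$

where the prime denotes the transpose and the readout weights that minimize the mean squared error with respect to the desired output $\hat{y}(t)$ are given by white2004 :

$$\Psi = \langle x x' \rangle^{-1} \langle x \hat{y} \rangle .$$

The value of the optimal readout weights depends on the covariance and cross-covariance components $\langle x x' \rangle$ and $\langle x \hat{y} \rangle$. Here we show that these can be computed exactly for any arbitrary system given by $\Omega$ and $\omega$, the autocorrelation of the input $R_{uu}(\tau) = \langle u(t)\, u(t-\tau) \rangle$, and the cross-correlation of input and output $R_{u\hat{y}}(\tau) = \langle u(t-\tau)\, \hat{y}(t+1) \rangle$.

We begin by noting that the explicit expression for the system state, starting from $x(0) = 0$, is given by:

$$x(t+1) = \sum_{k=0}^{t} \Omega^{t-k}\, \omega\, u(k).$$

Calculating $\Psi$ for a given problem requires the following input-output-dependent evaluations:

$$\langle x x' \rangle = \sum_{i,j=0}^{\infty} \Omega^{i}\, \omega\, R_{uu}(i-j)\, \omega' (\Omega')^{j},$$
$$\langle x \hat{y} \rangle = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, R_{u\hat{y}}(i).$$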
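The two evaluations above lend themselves to a direct numerical approximation. The following is a minimal sketch, assuming a linear network whose state covariance is the double sum of $\Omega^{i}\omega\,R_{uu}(i-j)\,\omega'(\Omega')^{j}$ and whose cross-covariance is the sum of $\Omega^{i}\omega\,R_{u\hat{y}}(i)$, both truncated at `K` terms. The function names, the truncation length `K`, the small ring network, and the choice of an exponentially correlated input with a lag-2 recall task are all illustrative assumptions, not values from the text.

```python
import numpy as np

def krylov_columns(Omega, omega, K):
    """Return the N x K matrix whose i-th column is Omega^i @ omega."""
    V = np.empty((len(omega), K))
    v = np.asarray(omega, dtype=float)
    for i in range(K):
        V[:, i] = v
        v = Omega @ v
    return V

def state_covariance(Omega, omega, R_uu, K=500):
    """Truncated sum_{i,j<K} Omega^i omega R_uu(i-j) omega' (Omega')^j."""
    V = krylov_columns(Omega, omega, K)
    lags = np.arange(K)
    R = R_uu(lags[:, None] - lags[None, :])  # K x K matrix of R_uu(i - j)
    return V @ R @ V.T

def cross_covariance(Omega, omega, R_uy, K=500):
    """Truncated sum_{i<K} Omega^i omega R_uy(i)."""
    return krylov_columns(Omega, omega, K) @ R_uy(np.arange(K))

def optimal_readout(Omega, omega, R_uu, R_uy, K=500):
    """Solve <xx'> Psi = <x y_hat> for the readout weights."""
    C = state_covariance(Omega, omega, R_uu, K)
    c = cross_covariance(Omega, omega, R_uy, K)
    return np.linalg.solve(C, c)

# Illustrative example (all numbers are assumptions):
alpha, lag = 0.5, 2
Omega = 0.9 * np.roll(np.eye(4), 1, axis=0)         # ring, spectral radius 0.9
omega = np.array([1.0, -1.0, 1.0, 1.0])             # hypothetical input weights
R_uu = lambda tau: np.exp(-alpha * np.abs(tau))     # exponential autocorrelation
R_uy = lambda i: np.exp(-alpha * np.abs(i - lag))   # target is the input delayed by `lag`
Psi = optimal_readout(Omega, omega, R_uu, R_uy)
```

The truncation is only reasonable when the spectral radius of the recurrent matrix is below one, so that the neglected terms decay geometrically.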
Appendix B Computing the total memory using autocorrelation

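Here we compute the memory function and the total memory of the recurrent neural network described in Appendix A for exponentially correlated input, where $R_{uu}(\tau) = e^{-\alpha|\tau|}$. The total memory of the system is given by the following summation over the memory function $m(\tau)$ white2004 :

$$\sum_{\tau} m(\tau) = \sum_{\tau} \langle x \hat{y}_\tau \rangle' \langle x x' \rangle^{-1} \langle x \hat{y}_\tau \rangle,$$

where $\hat{y}_\tau$ is the input with lag $\tau$, $\hat{y}_\tau(t+1) = u(t-\tau)$, and the input has unit variance because $R_{uu}(0) = 1$.

Computing $\langle x x' \rangle$ requires the evaluation of:

$$\langle x x' \rangle = \sum_{i,j=0}^{\infty} \Omega^{i}\, \omega\, R_{uu}(|i-j|)\, \omega' (\Omega')^{j},$$

where $\Omega$ and $\omega$ are the recurrent and input weights of Appendix A. This assumes an even correlation function, i.e., $R_{uu}(i-j) = R_{uu}(j-i)$. For numerical computation it is more convenient to perform the calculation as follows:

$$\langle x x' \rangle = \langle x x' \rangle_{i \ge j} + \langle x x' \rangle_{i \le j} - \langle x x' \rangle_{i = j},$$

where $\langle x x' \rangle_{i \ge j}$ is a partial sum of $\langle x x' \rangle$ satisfying $i \ge j$, $\langle x x' \rangle_{i \le j}$ is a partial sum of $\langle x x' \rangle$ satisfying $i \le j$, and $\langle x x' \rangle_{i = j}$ is a partial sum of $\langle x x' \rangle$ satisfying $i = j$, which is double counted and must be subtracted. We can substitute $\tau = |i-j|$ and evaluate $\langle x x' \rangle_{i \ge j}$ and $\langle x x' \rangle_{i = j}$ as follows: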


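$$\langle x x' \rangle_{i \ge j} = \sum_{j,\tau=0}^{\infty} \Omega^{j+\tau}\, \omega\, e^{-\alpha\tau}\, \omega' (\Omega')^{j} = U \left[ \left( (I - e^{-\alpha} D)^{-1} \bar{\Lambda} \right) \circ (\mathbf{1} - d d')^{\circ -1} \right] U',$$
$$\langle x x' \rangle_{i = j} = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, \omega' (\Omega')^{i} = U \left[ \bar{\Lambda} \circ (\mathbf{1} - d d')^{\circ -1} \right] U'.$$

The remaining partial sum follows by symmetry, $\langle x x' \rangle_{i \le j} = \langle x x' \rangle_{i \ge j}'$. Here $\Omega = U D U^{-1}$ is the eigendecomposition of the recurrent weight matrix $\Omega$, $\bar{\Lambda} = (U^{-1}\omega)(U^{-1}\omega)'$ with $\omega$ the input weight vector, $\mathbf{1}$ (the all-ones matrix) is the identity of the Hadamard product denoted by $\circ$, and $(\cdot)^{\circ -1}$ is a matrix inverse with respect to the Hadamard product. Here the trick is that $U^{-1}\omega$ takes the input to the basis of the connection matrix $\Omega$, allowing the dynamics to be described by the powers of the eigenvalues of $\Omega$, i.e., $D$. Since $D$ is diagonal, and hence symmetric, we can use the matrix identity $D^{i} \bar{\Lambda} D^{i} = \bar{\Lambda} \circ d^{i} (d^{i})'$, where $d$ is the main diagonal of $D$ and $d^{i}$ is its elementwise power. Summing over the powers of $D$ gives us $\sum_{i} D^{i} \bar{\Lambda} D^{i} = \bar{\Lambda} \circ (\mathbf{1} - d d')^{\circ -1}$.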
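The eigenbasis trick can be checked numerically against a brute-force sum. The sketch below assumes a diagonalizable recurrent matrix with spectral radius below one and an input autocorrelation $e^{-\alpha|\tau|}$; in the eigenbasis, entry $(m,n)$ of the $i \ge j$ partial sum is then $\bar{\omega}_m \bar{\omega}_n / \left[(1 - e^{-\alpha} d_m)(1 - d_m d_n)\right]$, with $d$ the eigenvalues and $\bar{\omega} = U^{-1}\omega$. The function names, the truncation length `K`, and the example ring network are assumptions for illustration.

```python
import numpy as np

def covariance_closed_form(Omega, omega, alpha):
    """State covariance for R_uu(tau) = exp(-alpha*|tau|) via the eigenbasis."""
    d, U = np.linalg.eig(Omega)                          # Omega = U diag(d) U^{-1}
    w_bar = np.linalg.solve(U, np.asarray(omega, dtype=complex))
    Lam = np.outer(w_bar, w_bar)                         # plain (non-conjugate) outer product
    geom = 1.0 / (1.0 - np.outer(d, d))                  # Hadamard inverse of (1 - d d')
    decay = 1.0 / (1.0 - np.exp(-alpha) * d)             # diagonal of (I - e^{-alpha} D)^{-1}
    M_ge = decay[:, None] * Lam * geom                   # partial sum with i >= j
    M_eq = Lam * geom                                    # partial sum with i == j
    M = M_ge + M_ge.T - M_eq                             # the i <= j part is the transpose
    return (U @ M @ U.T).real                            # imaginary part is round-off only

def covariance_truncated(Omega, omega, alpha, K=400):
    """Brute-force double sum truncated at K terms, for comparison."""
    V = np.empty((len(omega), K))
    v = np.asarray(omega, dtype=float)
    for i in range(K):
        V[:, i] = v
        v = Omega @ v
    lags = np.arange(K)
    R = np.exp(-alpha * np.abs(lags[:, None] - lags[None, :]))
    return V @ R @ V.T

# Illustrative example (values are assumptions):
Omega = 0.9 * np.roll(np.eye(5), 1, axis=0)
omega = np.array([1.0, 1.0, 1.0, 1.0, -1.0])
C_closed = covariance_closed_form(Omega, omega, alpha=0.3)
C_brute = covariance_truncated(Omega, omega, alpha=0.3)
```

The closed form costs one eigendecomposition instead of a double sum, which is the practical reason for working in the eigenbasis.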

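The covariance of the network states and the expected output is given by:

$$\langle x \hat{y}_\tau \rangle = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, R_{uu}(|i-\tau|) = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, e^{-\alpha|i-\tau|}.$$

For $\alpha \to \infty$, the signal becomes i.i.d. and the calculations simplify as follows (Goudarzi, 2014a):

$$\langle x x' \rangle = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, \omega' (\Omega')^{i}, \qquad \langle x \hat{y}_\tau \rangle = \Omega^{\tau} \omega.$$

The total memory capacity can be calculated by summing over $\tau$:

$$\sum_{\tau} m(\tau) = \sum_{\tau=0}^{\infty} \langle x \hat{y}_\tau \rangle' \langle x x' \rangle^{-1} \langle x \hat{y}_\tau \rangle = \mathrm{Tr}\left( \langle x x' \rangle^{-1} \sum_{\tau=0}^{\infty} \langle x \hat{y}_\tau \rangle \langle x \hat{y}_\tau \rangle' \right).$$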
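The memory function can be evaluated numerically in a few lines. This is a sketch under stated assumptions: the memory at lag $\tau$ is taken to be $\langle x \hat{y}_\tau \rangle' \langle x x' \rangle^{-1} \langle x \hat{y}_\tau \rangle$ divided by the input variance, the target is the input delayed by $\tau$ steps, and all infinite sums are truncated at `K` terms. The function name, `max_lag`, `K`, and the example network are hypothetical choices for illustration.

```python
import numpy as np

def memory_function(Omega, omega, R_uu, max_lag=200, K=400):
    """m(tau) for tau = 0..max_lag-1, using sums truncated at K terms."""
    V = np.empty((len(omega), K))
    v = np.asarray(omega, dtype=float)
    for i in range(K):                       # column i holds Omega^i @ omega
        V[:, i] = v
        v = Omega @ v
    idx = np.arange(K)
    C = V @ R_uu(idx[:, None] - idx[None, :]) @ V.T        # <x x'>
    taus = np.arange(max_lag)
    Cxy = V @ R_uu(idx[:, None] - taus[None, :])           # column tau is <x y_tau>
    var_u = R_uu(np.array([0]))[0]                         # input variance R_uu(0)
    return np.sum(Cxy * np.linalg.solve(C, Cxy), axis=0) / var_u

# Illustrative example (values are assumptions):
Omega = 0.9 * np.roll(np.eye(5), 1, axis=0)                # ring, spectral radius 0.9
omega = np.array([1.0, 1.0, 1.0, 1.0, -1.0])               # hypothetical input weights
iid = lambda tau: (tau == 0).astype(float)                 # uncorrelated input
expo = lambda tau: np.exp(-0.3 * np.abs(tau))              # exponentially correlated input

m_iid = memory_function(Omega, omega, iid)
m_exp = memory_function(Omega, omega, expo)
total_iid, total_exp = m_iid.sum(), m_exp.sum()
```

For uncorrelated input the trace form of the total memory reduces to the trace of an identity matrix, so `total_iid` should come out close to the number of nodes, which makes a convenient sanity check before moving to correlated input.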
Appendix C Experimental Setup for Memory Task

For our experiment on the memory capacity of a network under exponentially correlated input, we used the following setup. We generated long sample inputs $u(t)$ with an exponentially decaying autocorrelation function. To generate exponentially correlated input, we draw samples from a uniform distribution over a fixed interval and pass them through a low-pass filter with a fixed smoothing factor. We normalize and center $u(t)$ so that it has zero mean and unit variance. The resulting normalized samples have an exponential autocorrelation with decay exponent $\alpha$, i.e., $R_{uu}(\tau) = e^{-\alpha|\tau|}$. To validate our calculations, we use a network of $N$ nodes in a ring topology with identical weights, rescaled to a fixed spectral radius. The input weights $\omega$ are created by sampling the binomial distribution and multiplying by a constant scale factor. The scale of the input weights does not affect the memory and the performance in linear systems, and therefore we adopt this convention for generating $\omega$ throughout the paper. We also fixed the number of samples, the washout period, and the regularization factor.
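The setup described above can be sketched as follows. Every numeric value here is an assumption: the sampling interval $[-1, 1]$, the smoothing factor, the burn-in length, the mapping of binomial draws to $\pm 1$, and the input-weight scale are placeholders, and the first-order exponential smoother is one common choice of low-pass filter rather than a confirmed detail of the experiment. For this filter with smoothing factor $a$, the autocorrelation is $(1-a)^{|\tau|}$, i.e., a decay exponent of $-\ln(1-a)$.

```python
import numpy as np

def exponentially_correlated_input(T, smoothing, rng, burn_in=1000):
    """Low-pass filter uniform noise, then center and normalize to unit variance."""
    r = rng.uniform(-1.0, 1.0, size=T + burn_in)   # assumed sampling interval
    s = np.empty_like(r)
    acc = 0.0
    for t, r_t in enumerate(r):
        acc = (1.0 - smoothing) * acc + smoothing * r_t   # first-order smoother
        s[t] = acc
    u = s[burn_in:]                                # drop the start-up transient
    return (u - u.mean()) / u.std()

def ring_network(N, spectral_radius):
    """Ring with identical weights; its spectral radius equals the weight."""
    return spectral_radius * np.roll(np.eye(N), 1, axis=0)

def input_weights(N, scale, rng):
    """Binomial draws mapped to +/-1 and scaled (mapping and scale are assumptions)."""
    return scale * (2.0 * rng.binomial(1, 0.5, size=N) - 1.0)

# Illustrative usage with placeholder values:
rng = np.random.default_rng(0)
u = exponentially_correlated_input(10_000, smoothing=0.2, rng=rng)
Omega = ring_network(20, 0.9)
omega = input_weights(20, 0.1, rng)
alpha = -np.log(1.0 - 0.2)    # decay exponent implied by the smoothing factor
```

Comparing the empirical autocorrelation of `u` at a few lags with $e^{-\alpha|\tau|}$ is a quick way to confirm that the generated signal has the intended correlation structure.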

Appendix D Experimental Setup for Topological Study

A long-standing question in recurrent neural networks is how the structure of a network affects its memory and task-solving performance. Our derivation lets us compute the optimal readout layer for an arbitrary network. Here we describe the calculations we performed to examine the effect of the structure of the network on its memory and task-solving performance. To this end, we use networks of three different sizes and systematically vary the randomness and the spectral radius. We start from a uniform-weight ring topology and incrementally add randomness. The results for each combination of randomness and spectral radius are averaged over multiple instances. This averaging is necessary even for the fully regular ring because the input weights are randomly generated and, although their scaling does not affect the result, their exact values do (ganguli2008).
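A sweep of this kind could be organized as in the sketch below. The text does not say here how randomness is added to the ring, so the scheme used (adding Gaussian noise of standard deviation `sigma` to every entry of a ring matrix and rescaling to the target spectral radius) is purely an assumption, as are the function names, the $\pm 1$ input weights, and the `score` callback, which stands in for whatever quantity is being averaged (for example the total memory).

```python
import numpy as np

def perturbed_ring(N, sigma, spectral_radius, rng):
    """Ring plus Gaussian noise, rescaled to a target spectral radius (assumed scheme)."""
    ring = np.roll(np.eye(N), 1, axis=0)
    W = ring + sigma * rng.standard_normal((N, N))
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return spectral_radius * W / rho

def sweep(score, N, sigmas, radii, instances, seed=0):
    """Average score(Omega, omega) over random instances for each (sigma, radius)."""
    rng = np.random.default_rng(seed)
    out = np.zeros((len(sigmas), len(radii)))
    for a, sigma in enumerate(sigmas):
        for b, lam in enumerate(radii):
            vals = []
            for _ in range(instances):
                Omega = perturbed_ring(N, sigma, lam, rng)
                # Input weights are redrawn per instance, so averaging matters even at sigma = 0.
                omega = 2.0 * rng.binomial(1, 0.5, size=N) - 1.0
                vals.append(score(Omega, omega))
            out[a, b] = np.mean(vals)
    return out

# Illustrative usage with a trivial score (the spectral radius itself):
radius = lambda Omega, omega: np.max(np.abs(np.linalg.eigvals(Omega)))
grid = sweep(radius, N=10, sigmas=[0.0, 0.5], radii=[0.5, 0.9], instances=3)
```

Passing the scoring function as an argument keeps the sweep independent of how memory or task performance is computed.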

Appendix E Computing the optimal readout weights using power spectrum

The calculations in Appendix A for the optimal readout layer of a recurrent network may be described more generally in terms of the power spectrum of the input signal. Here we assume the setup in Appendix A and derive an expression for the optimal readout layer using the power spectra of the input and output.

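We start with the standard calculation of $\langle x x' \rangle$ and $\langle x \hat{y} \rangle$:

$$\langle x x' \rangle = \sum_{i,j=0}^{\infty} \Omega^{i}\, \omega\, R_{uu}(i-j)\, \omega' (\Omega')^{j}, \qquad \langle x \hat{y} \rangle = \sum_{i=0}^{\infty} \Omega^{i}\, \omega\, R_{u\hat{y}}(i).$$

We replace the correlation functions by their spectral representations,

$$R_{uu}(\tau) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{uu}(f)\, e^{i f \tau}\, df,$$
$$R_{u\hat{y}}(\tau) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{u\hat{y}}(f)\, e^{i f \tau}\, df,$$

where $S_{uu}(f)$ is the power spectrum of the input and $S_{u\hat{y}}(f)$ is the cross-spectrum of input and output, which gives

$$\langle x x' \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{uu}(f)\, (I - e^{i f} \Omega)^{-1}\, \omega\, \omega'\, (I - e^{-i f} \Omega')^{-1}\, df,$$
$$\langle x \hat{y} \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{u\hat{y}}(f)\, (I - e^{i f} \Omega)^{-1}\, \omega\, df.$$

The geometric series in $e^{i f}\Omega$ converge because the spectral radius of $\Omega$ is smaller than one.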
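The spectral route can be cross-checked numerically against the time-domain sum. The sketch assumes the convention $R_{uu}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{uu}(f)\, e^{i f \tau}\, df$, under which the state covariance becomes $\frac{1}{2\pi}\int S_{uu}(f)\, h(f)\, h(f)^{H}\, df$ with $h(f) = (I - e^{i f}\Omega)^{-1}\omega$. The test signal (autocorrelation $\rho^{|\tau|}$ with $\rho = e^{-\alpha}$, whose spectrum is $(1-\rho^2)/(1 - 2\rho\cos f + \rho^2)$), the grid size `M`, the truncation `K`, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def covariance_time_domain(Omega, omega, alpha, K=400):
    """Truncated double sum over Omega^i omega R_uu(i-j) omega' (Omega')^j."""
    V = np.empty((len(omega), K))
    v = np.asarray(omega, dtype=float)
    for i in range(K):
        V[:, i] = v
        v = Omega @ v
    lags = np.arange(K)
    R = np.exp(-alpha * np.abs(lags[:, None] - lags[None, :]))
    return V @ R @ V.T

def covariance_frequency_domain(Omega, omega, alpha, M=2048):
    """Average of S_uu(f) h(f) h(f)^H over a uniform frequency grid."""
    rho = np.exp(-alpha)
    N = len(omega)
    f = 2.0 * np.pi * np.arange(M) / M
    S = (1.0 - rho**2) / (1.0 - 2.0 * rho * np.cos(f) + rho**2)  # spectrum of rho^|tau|
    C = np.zeros((N, N), dtype=complex)
    I = np.eye(N)
    for f_k, S_k in zip(f, S):
        h = np.linalg.solve(I - np.exp(1j * f_k) * Omega, omega)  # (I - e^{if} Omega)^{-1} omega
        C += S_k * np.outer(h, h.conj())
    return (C / M).real      # the mean over the grid approximates (1/2pi) * integral

# Illustrative example (values are assumptions):
Omega = 0.9 * np.roll(np.eye(5), 1, axis=0)
omega = np.array([1.0, 1.0, 1.0, 1.0, -1.0])
C_time = covariance_time_domain(Omega, omega, alpha=0.3)
C_freq = covariance_frequency_domain(Omega, omega, alpha=0.3)
```

Because the integrand is smooth and periodic, a uniform grid converges quickly, so a few thousand frequency points are typically more than enough for networks of this size.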
Appendix F Memory capacity expressed in terms of power spectrum

Here we use the derivation in Appendix E and compute the memory function and the total memory of the system. Let and so that


We find that


The matrix is given by


and the matrix is given by:


The total memory is then given by: