Learning Continuous Chaotic Attractors with a Reservoir Computer

by Lindsay M. Smith et al.
University of Pennsylvania

Neural systems are well known for their ability to learn and store information as memories. Even more impressive is their ability to abstract these memories to create complex internal representations, enabling advanced functions such as the spatial manipulation of mental representations. While recurrent neural networks (RNNs) are capable of representing complex information, the exact mechanisms of how dynamical neural systems perform abstraction are still not well-understood, thereby hindering the development of more advanced functions. Here, we train a 1000-neuron RNN – a reservoir computer (RC) – to abstract a continuous dynamical attractor memory from isolated examples of dynamical attractor memories. Further, we explain the abstraction mechanism with new theory. By training the RC on isolated and shifted examples of either stable limit cycles or chaotic Lorenz attractors, the RC learns a continuum of attractors, as quantified by an extra Lyapunov exponent equal to zero. We propose a theoretical mechanism of this abstraction by combining ideas from differentiable generalized synchronization and feedback dynamics. Our results quantify abstraction in simple neural systems, enabling us to design artificial RNNs for abstraction, and leading us towards a neural basis of abstraction.





I Introduction

Biological and artificial neural networks have the ability to make generalizations from only a few examples Zhang et al. (2011); Faulkner et al. (2008); Dunn and Wong (2012); Craik and Bialystok (2006); Tacchetti et al. (2018); Moser et al. (2008); Ifft et al. (2013). For instance, both types of networks demonstrate object invariance: the ability to recognize an object even after it has undergone translation or transformation Guyonneau et al. (2006); Zou et al. (2017). What is surprising about this invariance is not that neural systems can map a set of inputs to the same output. Rather, what is surprising is that they can first sustain internal representations of objects, and then abstract these representations to include translations and transformations. Hence, beyond simply memorizing static, discrete examples Hopfield (1982), neural systems have the ability to abstract their memories along a continuum of information by observing isolated examples Seung (1998). However, the precise mechanisms of such abstraction remain unknown, limiting the principled design and training of neural systems.

To make matters worse, much of the information represented by neural networks is not static, but dynamic. As a biological example, a songbird’s representation of song is inherently time-varying, and can be continuously sped up and slowed down through external perturbations Fee and Scharff (2010). In artificial networks, recurrent neural networks (RNNs) can store a history of temporal information such as language Mikolov et al. (2010), dynamical trajectories Jaeger (2010); Sussillo and Abbott (2009), and climate Nadiga (2021) to more accurately classify and predict future events. To harness the power of RNNs for processing temporal information, efforts have focused on developing powerful training algorithms such as backpropagation through time (BPTT) Lillicrap and Santoro (2019) and neural architectures such as long short-term memory (LSTM) networks Hochreiter and Schmidhuber (1997), alongside physical realizations in neuromorphic computing chips Furber (2016). Unfortunately, the dramatic increase in computational capability is accompanied by a similarly dramatic increase in the difficulty of understanding such systems, severely limiting their designability and generalizability beyond specific datasets.

To better understand the mechanisms behind neural representations of temporal information, the field has turned to dynamical systems. Starting with theories of synchronization between coupled dynamical systems Davison et al. (2016); Pecora and Carroll (1990), theories of generalized synchronization Rulkov et al. (1995) and invertible generalized synchronization Lu and Bassett (2020) provide intuition and conditions for when a neural network uniquely represents the temporal trajectory of its inputs, and when this representation can recover the original inputs to recurrently store them as memories Lu et al. (2018). These theories hinge on important ideas and tools such as delay embedding Garcia and Almeida (2005), Lyapunov exponents Dawson et al. (1994), and dimensionality Young (1982); Frederickson et al. (1983), which quantify crucial properties of time-varying representations. However, it is not yet known precisely how neural systems abstract such time-varying representations. Accordingly, the field is limited in its understanding of abstraction and meta-learning in existing neural systems Kumar et al. (2020); Schweighofer and Doya (2003); Santiago (2004); Feldkamp et al. (1997), and restricted in its ability to design neural systems for abstraction.

Here, we address this knowledge gap by providing a mechanism for the abstraction of time-varying attractor memories in a reservoir computer (RC) Lukoševičius and Jaeger (2009). First, we demonstrate that a neural network can observe low-dimensional inputs and create higher-dimensional abstractions, thereby learning a continuum of representations from a few examples. Then, we develop new theory to explain the mechanism of this abstraction by extending prior work Kim et al. (2021): we explicitly write the differential response of the RC to a differential change in the input, thereby giving a quantitative form to ideas of differentiable generalized synchronization Hunt et al. (1997). We quantify this abstraction by demonstrating that successful abstraction is driven by the acquisition of an additional zero Lyapunov exponent in the RC’s dynamics, and we study the role of the RC’s spectral radius and time constant in its ability to abstract dynamics. These results enable the development of more interpretable and designable methods in machine learning, and provide a quantitative hypothesis and measure of abstraction from neural dynamics.

II Mathematical Framework

Figure 1: Schematic of a reservoir computer learning a limit cycle memory. (a) Time series of a limit cycle that drives the RNN reservoir to the state of the limit cycle. Weighted sums of the reservoir states are trained to reproduce the original time series (b) by creating the matrix $W$. (c) The reservoir feeds the weighted sums $W\mathbf{r}(t)$ back as its input, closing the feedback loop in the RC. (d) The RC now evolves autonomously along a trajectory that closely follows the expected dynamics of the original limit cycle. Here, color represents time.

To study the ability of neural networks to process and represent time-varying information as memories, we use a simple nonlinear dynamical system from reservoir computing:

$$\frac{1}{\gamma}\,\dot{\mathbf{r}}(t) = -\mathbf{r}(t) + g\big(A\mathbf{r}(t) + B\mathbf{x}(t) + \mathbf{d}\big), \tag{1}$$

where $\mathbf{r}(t)$ is a vector that represents the state of the $N$ reservoir neurons, $\mathbf{x}(t)$ is the vector of inputs into the reservoir, $\mathbf{d}$ is a constant vector of bias terms, $A$ is the $N \times N$ matrix of connections between neurons, $B$ is the matrix of weights mapping inputs to neurons, $g$ is a sigmoidal function that we take to be the hyperbolic tangent, $g(\cdot) = \tanh(\cdot)$, and $\gamma$ is a time constant.

Throughout the results, we use an $N = 1000$-neuron network, such that $\mathbf{r} \in \mathbb{R}^{1000}$. We set $A$ to be sparse, where each non-zero entry of $A$ is a random number drawn from $-1$ to $1$, and then scale $A$ such that the absolute value of its largest eigenvalue is $\rho$, the spectral radius of the network. In general, each entry of $B$ was drawn randomly from $-1$ to $1$ and multiplied by a scalar coefficient set to 0.1; the one exception is the parameter sweep in Subsection IV.1, where this coefficient was varied systematically. Each entry of the bias term $\mathbf{d}$ was drawn randomly from $-1$ to $1$ and multiplied by a bias amplification constant, which was set to 10 in all cases except the parameter sweep in Subsection IV.1.
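The reservoir construction described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. The sparsity level `density` is a hypothetical choice, since the text does not state it, and the function name `make_reservoir` is ours. The defaults for the input-weight scale (0.1) and the bias amplification constant (10) follow the values given above.

```python
import numpy as np


def make_reservoir(n=1000, n_inputs=2, rho=1.0, density=0.1,
                   input_scale=0.1, bias_scale=10.0, seed=0):
    """Return (A, B, d) for Eq. 1.

    `density` (fraction of nonzero entries in A) is a hypothetical value;
    the other defaults follow the text.
    """
    rng = np.random.default_rng(seed)
    # Sparse recurrent matrix: nonzero entries drawn uniformly from [-1, 1].
    mask = rng.random((n, n)) < density
    A = np.where(mask, rng.uniform(-1.0, 1.0, size=(n, n)), 0.0)
    # Rescale so that the largest |eigenvalue| equals the spectral radius rho.
    A *= rho / np.max(np.abs(np.linalg.eigvals(A)))
    # Input weights and biases: uniform in [-1, 1] times their scale factors.
    B = input_scale * rng.uniform(-1.0, 1.0, size=(n, n_inputs))
    d = bias_scale * rng.uniform(-1.0, 1.0, size=n)
    return A, B, d
```

With the default `n=1000`, this gives a reservoir of the size used throughout. For the limit cycle, `n_inputs=2`, and for the Lorenz system, `n_inputs=3`.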

To study the ability of the reservoir to form representations and abstractions of temporal data, we must define the data to be learned. Following prior work in teaching reservoirs to represent temporal information Kim et al. (2021); Lu et al. (2018); Jaeger (2010); Sussillo and Abbott (2009), we will use dynamical attractors as the memories. The first memory that we use is a stable limit cycle that evolves according to


To test the reservoir’s ability to learn and abstract more complex memories, the second memory that we use is the chaotic Lorenz attractor Lorenz (1963) that evolves according to


By driving the reservoir in Eq. 1 with the time series $\mathbf{x}(t)$ generated from either the stable limit cycle in Eq. 2 or the chaotic Lorenz system in Eq. 3, we obtain the response of the reservoir neurons, $\mathbf{r}(t)$. In our experiments, we first drive the reservoir and evolve the input memory for 50 seconds to create a transient phase, which we discard; this allows the RC and the input memory to evolve far enough away from the randomly chosen initial conditions. Then, we drive the reservoir with the input memory for another 100 seconds to create a learning phase. Because we use a time step of $\Delta t = 0.001$ s, this process creates a learning-phase time series of 100,000 points.
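Below is a minimal sketch of the driving protocol. It assumes forward-Euler integration of Eq. 1; the text does not state the integrator, so that choice is ours. A 0.001 s time step is what 100,000 points over 100 s implies, so the 50 s transient and the 100 s learning phase correspond to 50,000 and 100,000 steps. The names `drive_reservoir` and `learning_phase` are hypothetical.

```python
import numpy as np

DT = 0.001                          # time step (s)
N_TRANSIENT = int(round(50 / DT))   # 50 s transient phase, discarded
N_LEARN = int(round(100 / DT))      # 100 s learning phase, kept


def drive_reservoir(A, B, d, gamma, inputs, dt=DT, r0=None):
    """Forward-Euler integration of Eq. 1,
    (1/gamma) dr/dt = -r + tanh(A r + B x + d),
    for inputs of shape (T, n_inputs). Returns states of shape (T, N)."""
    n = A.shape[0]
    r = np.zeros(n) if r0 is None else np.array(r0, dtype=float)
    states = np.empty((len(inputs), n))
    for t, x in enumerate(inputs):
        r = r + dt * gamma * (-r + np.tanh(A @ r + B @ x + d))
        states[t] = r
    return states


def learning_phase(states):
    """Drop the transient and keep the learning-phase states."""
    return states[N_TRANSIENT:N_TRANSIENT + N_LEARN]
```

If `dt * gamma <= 1`, each Euler update is a convex combination of the previous state and a `tanh` value. Starting from zero, the states therefore stay within $[-1, 1]$.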

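To store the attractor time series as memories, prior work in reservoir computing has demonstrated that it is sufficient to first train an output matrix $W$ that maps reservoir states to a copy of the input according to

$$W = \arg\min_{W'} \sum_t \big\lVert W'\mathbf{r}(t) - \mathbf{x}(t) \big\rVert^2, \qquad W\mathbf{r}(t) \approx \mathbf{x}(t), \tag{4}$$

and then to perform feedback by replacing the inputs with the output of the reservoir, $\mathbf{x}(t) = W\mathbf{r}(t)$. This feedback generates a new system that evolves autonomously according to

$$\frac{1}{\gamma}\,\dot{\mathbf{r}}(t) = -\mathbf{r}(t) + g\big((A + BW)\mathbf{r}(t) + \mathbf{d}\big). \tag{5}$$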
We evolve this new system for 500 seconds to create a prediction phase.
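The training step (Eq. 4) and the closed-loop system (Eq. 5) can be sketched as below. We assume a small ridge penalty for numerical stability. The text does not specify one, and setting `ridge=0` reduces the fit to ordinary least squares. The function names and the forward-Euler integration are illustrative assumptions.

```python
import numpy as np


def train_output(R, X, ridge=1e-6):
    """Fit W so that W r(t) ~ x(t) (Eq. 4).

    R: (T, N) reservoir states; X: (T, n_inputs) inputs.
    Solves (R^T R + ridge I) W^T = R^T X; the ridge term is an assumption.
    """
    N = R.shape[1]
    Wt = np.linalg.solve(R.T @ R + ridge * np.eye(N), R.T @ X)
    return Wt.T


def run_autonomous(A, B, d, W, gamma, r0, n_steps, dt=0.001):
    """Forward-Euler integration of the feedback system (Eq. 5),
    in which the input x(t) is replaced by the output W r(t)."""
    A_fb = A + B @ W                      # closed-loop recurrent matrix
    r = np.array(r0, dtype=float)
    outputs = np.empty((n_steps, W.shape[0]))
    for t in range(n_steps):
        r = r + dt * gamma * (-r + np.tanh(A_fb @ r + d))
        outputs[t] = W @ r                # predicted attractor trajectory
    return outputs
```

For the 500 s prediction phase at a 0.001 s step, `n_steps` would be 500,000.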

As a demonstration of this process, we show a schematic of the reservoir being driven by the stable limit cycle input $\mathbf{x}(t)$ (Fig. 1a, blue). This generates the reservoir time series $\mathbf{r}(t)$ (Fig. 1a, gold), which is subsequently used to train a matrix $W$ such that $W\mathbf{r}(t)$ copies the input (Fig. 1a, red). The training input, $\mathbf{x}(t)$ (Fig. 1b, blue), and the training output, $W\mathbf{r}(t)$ (Fig. 1b, red), are plotted together and are indistinguishable. After training, we perform feedback by replacing the reservoir inputs, $\mathbf{x}(t)$, with the outputs, $W\mathbf{r}(t)$ (Fig. 1c), and observe that the output of the autonomous reservoir remains a limit cycle (Fig. 1d). Can this simple process be used not only to store memories, but also to abstract them? If so, by what mechanism?

In what follows, we answer these questions by extending the framework to multiple isolated inputs. Specifically, rather than use only one attractor time series $\mathbf{x}(t)$, we will use a finite number of translated attractor time series

$$\mathbf{x}_c(t) = \mathbf{x}(t) + c\,\mathbf{a} \tag{6}$$

for $c \in \{c_1, \dots, c_K\}$, where $\mathbf{a}$ is a constant vector that sets the direction of translation and $c$ is the translation variable. We will use these time series to drive the reservoir to generate a finite number of neural responses $\mathbf{r}_c(t)$. By concatenating all of the inputs and reservoir states along the time dimension into single time series, $X$ and $R$, respectively, we train an output matrix $W$ according to Eq. 4 that maps all of the reservoir states to all of the translated inputs. Finally, using $W$, we perform feedback according to Eq. 5.
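Eq. 6 and the concatenation step amount to the bookkeeping sketched below. This sketch rests on some assumptions. The translation direction `a` and the shift values in the example lines are hypothetical. In practice, each entry of `states` would come from driving the reservoir separately with the corresponding translated input, after discarding its transient.

```python
import numpy as np


def translated_inputs(x, a, shifts):
    """Eq. 6: x_c(t) = x(t) + c a for each translation value c.

    x: (T, n) base attractor time series; a: (n,) constant direction."""
    a = np.asarray(a, dtype=float)
    return [x + c * a for c in shifts]


def concatenate_examples(inputs, states):
    """Stack the K examples along the time axis to form X and R for Eq. 4."""
    X = np.concatenate(inputs, axis=0)
    R = np.concatenate(states, axis=0)
    return X, R


# Hypothetical example: a unit circle translated along a = (1, 0).
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
x = np.column_stack([np.cos(t), np.sin(t)])
shifts = [-2.0, -1.0, 0.0, 1.0, 2.0]      # K = 5 illustrative shift values
examples = translated_inputs(x, [1.0, 0.0], shifts)
```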

III Differential learning drives abstraction

Figure 2: Successful abstraction in learning a continuous limit cycle memory. (a) Five shifted limit cycles are learned by the reservoir as five isolated examples. (b) 2D plot of the predicted output of the autonomous reservoir whose initial state has been prepared between the 5 training examples. The shift magnitude (the distance of each translation) of the initial state is colored from green to black. (c) To visualize the abstraction that occurred in an additional dimension along the direction of the translation, we show a 3D plot of the predicted reservoir time series projected onto the vector $\langle d\mathbf{r} \rangle$ in Eq. 11, together with the first two principal components after removing the projection. The cutout highlights the "height" of the continuous attractor, formed from the abstraction along the $\langle d\mathbf{r} \rangle$ axis.

To teach the reservoir to generate higher-dimensional representations of isolated inputs, we train it to copy translations of an attractor memory. First, we consider the time series of a stable limit cycle generated by Eq. 2, $\mathbf{x}(t)$, and we create $K = 5$ shifted time series, $\mathbf{x}_c(t)$ for $c \in \{c_1, \dots, c_5\}$, according to Eq. 6 (Fig. 2a). We then use these time series to drive the reservoir according to Eq. 1 to generate the reservoir time series $\mathbf{r}_c(t)$. We concatenate the inputs and reservoir time series into $X$ and $R$, respectively, and train the output matrix $W$ according to Eq. 4 to generate the autonomous feedback reservoir that evolves according to Eq. 5.

To test whether the reservoir has learned a higher-dimensional continuum of limit cycles rather than only the five isolated examples, we evolve the autonomous reservoir at intermediate values of the translation variable $c$. Specifically, we first prepare the reservoir state by driving the non-autonomous reservoir in Eq. 1 with limit cycles at intermediate translations (i.e., $\mathbf{x}_c(t)$ for values of $c$ lying between the training values) for 50 seconds, until any transient dynamics from the initial reservoir state have decayed. This generates a set of final reservoir states, one for each intermediate $c$. We then use these final reservoir states as the initial states of the autonomous feedback reservoir in Eq. 5. Finally, we evolve the autonomous reservoir and plot the outputs in green in Fig. 2b. As can be seen, the autonomous reservoir whose initial state has been prepared at an intermediate shift (a translation in position) continues to evolve about a stable limit cycle at that shift.
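The test at intermediate shifts works in two stages. First, drive the open-loop reservoir (Eq. 1) with an intermediate translation to set its state. Then hand that state to the closed-loop system (Eq. 5). This is a minimal illustration assuming forward-Euler integration. `prepare_state` and `predict_from_shift` are hypothetical names, and `W` would come from training on the isolated examples.

```python
import numpy as np


def prepare_state(A, B, d, gamma, x_shifted, dt=0.001):
    """Drive Eq. 1 from r = 0 with an intermediate translation x_shifted (T, n),
    e.g. 50 s of data, and return the final reservoir state."""
    r = np.zeros(A.shape[0])
    for x in x_shifted:
        r = r + dt * gamma * (-r + np.tanh(A @ r + B @ x + d))
    return r


def predict_from_shift(A, B, d, W, gamma, x_shifted, n_steps, dt=0.001):
    """Prepare the state at an intermediate shift, then evolve Eq. 5 autonomously."""
    r = prepare_state(A, B, d, gamma, x_shifted, dt)
    A_fb = A + B @ W                      # closed-loop recurrent matrix
    outputs = np.empty((n_steps, W.shape[0]))
    for t in range(n_steps):
        r = r + dt * gamma * (-r + np.tanh(A_fb @ r + d))
        outputs[t] = W @ r
    return outputs
```

If abstraction has succeeded, the outputs for an intermediate shift should stay near the correspondingly translated attractor. They should not drift toward one of the training examples.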

III.1 Differential mechanism of learning

Now that we have numerically demonstrated the higher-dimensional abstraction of lower-dimensional attractors, we will uncover the underlying theoretical mechanism in two steps: first by studying the response of the reservoir to different inputs, and then by studying the consequence of the training process.

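First, we compute perturbations of the reservoir state, $d\mathbf{r}(t)$, in response to perturbations of the input, $d\mathbf{x}(t)$, by linearizing the dynamics about the trajectories $\mathbf{r}(t)$ and $\mathbf{x}(t)$ to yield

$$\frac{1}{\gamma}\,d\dot{\mathbf{r}}(t) = -d\mathbf{r}(t) + (\mathbf{g}' \odot A)\,d\mathbf{r}(t) + (\mathbf{g}' \odot B)\,d\mathbf{x}(t), \tag{7}$$

where $\mathbf{g}' = g'\big(A\mathbf{r}(t) + B\mathbf{x}(t) + \mathbf{d}\big)$ is the derivative of $g$ evaluated at $A\mathbf{r}(t) + B\mathbf{x}(t) + \mathbf{d}$. Here $\mathbf{g}' \odot A$ (or $\mathbf{g}' \odot B$) denotes the matrix obtained by multiplying the $i$-th row of $A$ (or $B$) by the $i$-th element of $\mathbf{g}'$. Differentiable generalized synchronization Hunt et al. (1997) guarantees that if $d\mathbf{x}$ is infinitesimal and constant, then $d\mathbf{r}$ is also infinitesimal and evolves according to Eq. 7. Fortuitously, the differential change in our translated inputs is precisely infinitesimal and constant. It is given by the derivative of Eq. 6 as $d\mathbf{x} = \mathbf{a}\,dc$. We substitute this derivative into Eq. 7 to yield

$$\frac{1}{\gamma}\,d\dot{\mathbf{r}}(t) = -d\mathbf{r}(t) + (\mathbf{g}' \odot A)\,d\mathbf{r}(t) + (\mathbf{g}' \odot B)\,\mathbf{a}\,dc. \tag{8}$$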
Crucially, this system is linear: if a shift of $dc$ yields a perturbed reservoir trajectory $d\mathbf{r}(t)$, then a shift of $\alpha\,dc$ yields the perturbed reservoir trajectory $\alpha\,d\mathbf{r}(t)$ for any scalar $\alpha$. Hence, we can already begin to see the mechanism of abstraction: any scalar multiple of the differential input, $\mathbf{a}\,dc$, yields the same scalar multiple of $d\mathbf{r}(t)$ as a valid perturbed trajectory.
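The linearity argument can be checked numerically with finite differences. Drive the same reservoir with the base input and with inputs translated by $\epsilon$ and $2\epsilon$ along the translation direction, then compare the response differences. To linear order, doubling the shift should double $d\mathbf{r}(t)$. This sketch uses a small, randomly generated toy reservoir; all sizes and scales are hypothetical.

```python
import numpy as np


def response(A, B, d, gamma, x, dt=0.001):
    """Forward-Euler trajectory of Eq. 1 from r = 0 for inputs x of shape (T, n)."""
    r = np.zeros(A.shape[0])
    out = np.empty((len(x), A.shape[0]))
    for t, xt in enumerate(x):
        r = r + dt * gamma * (-r + np.tanh(A @ r + B @ xt + d))
        out[t] = r
    return out


rng = np.random.default_rng(0)
N, gamma = 50, 10.0
A = rng.uniform(-1, 1, (N, N))
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))    # toy spectral radius 0.8
B = 0.1 * rng.uniform(-1, 1, (N, 2))
d = rng.uniform(-1, 1, N)
t = np.arange(5000) * 0.001
x = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
a, eps = np.array([1.0, 0.0]), 1e-4                 # translation direction, small dc

r_base = response(A, B, d, gamma, x)
dr_1 = response(A, B, d, gamma, x + eps * a) - r_base
dr_2 = response(A, B, d, gamma, x + 2 * eps * a) - r_base
# Relative deviation from exact linear scaling (dr_2 = 2 dr_1); O(eps) if Eq. 8 holds.
rel_err = np.linalg.norm(dr_2 - 2 * dr_1) / np.linalg.norm(dr_2)
```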

To complete the abstraction mechanism, we note that the trained output matrix $W$ learns precisely the inverse map. If Eq. 8 maps scalar multiples of $\mathbf{a}\,dc$ to scalar multiples of $d\mathbf{r}(t)$, then $W$ maps scalar multiples of $d\mathbf{r}(t)$ back to scalar multiples of $\mathbf{a}\,dc$. This inverse map can be learned because our 5 training examples are spaced closely together (Fig. 2a), which allows the trained output matrix to map differential changes in $\mathbf{r}$ to differential changes in $\mathbf{x}$. Hence, not only does $W\mathbf{r}(t) \approx \mathbf{x}(t)$, but $W$ also learns

$$W\,d\mathbf{r}(t) \approx d\mathbf{x} = \mathbf{a}\,dc. \tag{9}$$
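The consequence of this differential learning is seen in the evolution of the perturbation of the autonomous feedback reservoir, obtained by substituting Eq. 9 into Eq. 8:

$$\frac{1}{\gamma}\,d\dot{\mathbf{r}}(t) = -d\mathbf{r}(t) + (\mathbf{g}' \odot A)\,d\mathbf{r}(t) + (\mathbf{g}' \odot B)\,W\,d\mathbf{r}(t) = -d\mathbf{r}(t) + \big(\mathbf{g}' \odot (A + BW)\big)\,d\mathbf{r}(t). \tag{10}$$

This is exactly the linearization of the feedback system in Eq. 5. Because $W\mathbf{r}(t) \approx \mathbf{x}(t)$, the argument of $g'$ along the feedback trajectory matches its open-loop value.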
If the training examples are close enough to learn the differential relation in Eq. 9, then any perturbed trajectory, $d\mathbf{r}(t)$, generated by Eq. 8 is a valid trajectory in the feedback system to linear order. Further, any scalar multiple of $d\mathbf{r}(t)$ is also a valid perturbed trajectory in the feedback system.

Hence, by training the output matrix to copy nearby examples, and thereby learning the differential relation between $d\mathbf{r}$ and $d\mathbf{x}$, we encode scalar multiples of $d\mathbf{r}(t)$ as a linear subspace of valid perturbation trajectories. It is precisely this encoded subspace of valid perturbation trajectories that we call the higher-dimensional abstraction of the lower-dimensional input. In addition to the 2-dimensional limit cycle input, the reservoir encodes the subspace comprising scalar multiples of the perturbation trajectory as a third dimension. To visually represent this third dimension, we take the average of the perturbation vector across time,

$$\langle d\mathbf{r} \rangle = \frac{1}{T}\int_0^T d\mathbf{r}(t)\,dt, \tag{11}$$
and project all of the autonomous reservoir trajectories onto this vector. We remove this projection from the trajectories, compute the first two principal components of what remains, and plot the projection against these two principal components (Fig. 2c). As can be seen, the shift in the limit cycle is encoded along the $\langle d\mathbf{r} \rangle$ direction. Graphically and numerically, we have confirmed our theoretical mechanism of abstraction using a continuous limit cycle memory.
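The visualization in Fig. 2c can be sketched in four steps. Normalize the time-averaged perturbation vector of Eq. 11 (written here as $\langle d\mathbf{r} \rangle$) and project the trajectories onto it. Then remove that component and take the first two principal components of the remainder via an SVD. This is an illustrative reconstruction; the function name and the centering step before PCA are our assumptions.

```python
import numpy as np


def abstraction_coordinates(states, dr):
    """Return (T, 3) coordinates: two principal components of the trajectories
    after removing the <dr> component, plus the projection onto <dr> (Eq. 11).

    states: (T, N) autonomous reservoir states pooled over all shifts.
    dr:     (T', N) perturbation trajectory d r(t)."""
    v = dr.mean(axis=0)
    v = v / np.linalg.norm(v)               # unit vector along <dr>
    proj = states @ v                       # coordinate along <dr>
    rest = states - np.outer(proj, v)       # remove the <dr> component
    rest = rest - rest.mean(axis=0)         # center before PCA
    _, _, Vt = np.linalg.svd(rest, full_matrices=False)
    pcs = rest @ Vt[:2].T                   # first two principal components
    return np.column_stack([pcs, proj])
```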

IV Abstraction as the acquisition of a Lyapunov Exponent equal to zero

Figure 3: Obtaining a Lyapunov spectrum from a Lorenz attractor. (a) A 3D plot of the Lyapunov perturbation orbits (colored), which check the stability of a trajectory, about the Lorenz attractor (black, gray). The orbits are obtained by evolving perturbations along the Jacobian of the Lorenz system evaluated at each point, followed by an orthonormalization. (b) A plot of the three Lyapunov exponents over time for the Lorenz system; their time averages yield the estimated Lyapunov exponents.

Now that we have a mechanism of abstraction, we seek a simple method to quantify this abstraction in higher-dimensional systems that do not permit an intuitive graphical representation like that in Fig. 2c. In the chaotic Lorenz system with a fractal orbit (Fig. 3a), it can be difficult to visually determine whether the prediction output is part of a given input example or lies between two input examples. Hence, we would like a measure of the presence of perturbations along a trajectory, $d\mathbf{r}(t)$, that neither grow nor shrink along the direction of linearly scaled perturbations. If these perturbations neither grow nor shrink, they represent a stable trajectory that neither collapses into another trajectory nor devolves into chaos.

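To measure this abstraction, we compute the Lyapunov spectrum of the RC. Conceptually, the Lyapunov spectrum measures the stability of different trajectories along an attractor. To compute it, we first generate an orbit $\mathbf{y}(t)$ along the attractor of a $D$-dimensional dynamical system, $\dot{\mathbf{y}} = \mathbf{f}(\mathbf{y})$. We then evaluate the Jacobian at every point along the orbit, $J(t) = \partial\mathbf{f}/\partial\mathbf{y}\,\big|_{\mathbf{y}(t)}$. Finally, we evolve orbits of infinitesimal perturbations, $\boldsymbol{\delta}_i(t)$ for $i = 1, \dots, D$, along the time-varying Jacobian as

$$\dot{\boldsymbol{\delta}}_i(t) = J(t)\,\boldsymbol{\delta}_i(t). \tag{12}$$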
Along these orbits, the direction and magnitude of each $\boldsymbol{\delta}_i$ will change based on the linearly stable and unstable directions of the Jacobian $J(t)$. To capture these changes along orthogonal directions, after each time step of evolution along Eq. 12, we arrange the perturbation vectors as the columns of a matrix $\Delta = [\boldsymbol{\delta}_1, \dots, \boldsymbol{\delta}_D]$. We then perform a Gram-Schmidt orthonormalization to obtain an orthonormal basis of perturbation vectors $\hat{\boldsymbol{\delta}}_1, \dots, \hat{\boldsymbol{\delta}}_D$. In this way, $\hat{\boldsymbol{\delta}}_1$ eventually points along the least stable direction, $\hat{\boldsymbol{\delta}}_2$ along the second least stable direction, and $\hat{\boldsymbol{\delta}}_D$ along the most stable direction. The evolution of the three perturbation vectors of the Lorenz system is shown in Fig. 3a.
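The orthonormalization step can be sketched with modified Gram-Schmidt. The first column is only normalized, and each later column is orthogonalized against the earlier ones. This preserves the ordering of the basis, which is what lets the first vector settle onto the least stable direction. The function name is ours. A library QR factorization such as `numpy.linalg.qr` yields the same basis up to column signs.

```python
import numpy as np


def gram_schmidt(M):
    """Modified Gram-Schmidt on the columns of M.

    Returns Q (orthonormal columns, same ordering as M) and the norms
    removed at each step (the diagonal of the triangular factor R)."""
    M = np.array(M, dtype=float)
    n_cols = M.shape[1]
    Q = np.empty_like(M)
    norms = np.empty(n_cols)
    for i in range(n_cols):
        v = M[:, i].copy()
        for j in range(i):
            v -= (Q[:, j] @ v) * Q[:, j]   # remove component along earlier vectors
        norms[i] = np.linalg.norm(v)
        Q[:, i] = v / norms[i]
    return Q, norms
```

The returned `norms` are the per-step stretching factors. Summing their logarithms over time and dividing by the elapsed time gives the standard discrete (Benettin-style) estimate of the exponents. This is an alternative to the projection formula that follows.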

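To calculate the Lyapunov spectrum, we compute the projection of each normalized perturbation vector onto its image under the Jacobian. This gives the instantaneous Lyapunov exponent (LE) over time Balcerzak et al. (2018),

$$\lambda_i(t) = \hat{\boldsymbol{\delta}}_i(t)^{\top} J(t)\,\hat{\boldsymbol{\delta}}_i(t), \tag{13}$$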
and the final LE is given by the time average $\lambda_i = \lim_{T \to \infty} \frac{1}{T}\int_0^T \lambda_i(t)\,dt$ (Fig. 3b). Every continuous-time dynamical system with bounded, non-fixed-point dynamics has at least one zero Lyapunov exponent. It corresponds to a perturbation along the direction of the flow that neither grows nor shrinks on average (Fig. 3b, red). In a chaotic system like the Lorenz system, there is also a positive LE corresponding to an orthogonal perturbation that grows on average (Fig. 3b, blue). Finally, a negative LE corresponds to an orthogonal perturbation that decays on average (Fig. 3b, yellow). As can be seen in the plot of trajectories, the orbit associated with the negative LE is directed transverse to the plane that roughly defines the “wings” of the attractor. Any deviation from the plane of the wings therefore quickly collapses back onto the wings (Fig. 3a).
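Here is a compact sketch of this procedure for the Lorenz system. It integrates the orbit together with the perturbation vectors using RK4, re-orthonormalizes them with a QR factorization (equivalent to Gram-Schmidt up to signs), and time-averages $\hat{\boldsymbol{\delta}}_i^\top J \hat{\boldsymbol{\delta}}_i$. Several choices are ours: the classical Lorenz parameters ($\sigma = 10$, $r = 28$, $b = 8/3$), the RK4 integrator, and the step size. We write $r$ instead of the usual $\rho$ to avoid a clash with the spectral radius.

```python
import numpy as np

SIGMA, R_LZ, BETA = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz parameters (assumed)


def lorenz(y):
    return np.array([SIGMA * (y[1] - y[0]),
                     y[0] * (R_LZ - y[2]) - y[1],
                     y[0] * y[1] - BETA * y[2]])


def lorenz_jacobian(y):
    return np.array([[-SIGMA, SIGMA, 0.0],
                     [R_LZ - y[2], -1.0, -y[0]],
                     [y[1], y[0], -BETA]])


def tangent_rhs(state):
    """Orbit plus Eq. 12 for all perturbation vectors (columns of Q)."""
    y, Q = state[:3], state[3:].reshape(3, 3)
    return np.concatenate([lorenz(y), (lorenz_jacobian(y) @ Q).ravel()])


def rk4_step(f, s, dt):
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def lyapunov_spectrum(y0, dt=0.001, t_transient=10.0, t_average=100.0):
    """Estimate the Lorenz LEs as time averages of q_i^T J q_i."""
    y = np.array(y0, dtype=float)
    Q = np.eye(3)                                    # orthonormal perturbation frame
    n_transient = int(round(t_transient / dt))
    n_average = int(round(t_average / dt))
    total = np.zeros(3)
    for step in range(n_transient + n_average):
        if step >= n_transient:
            J = lorenz_jacobian(y)
            total += np.diag(Q.T @ J @ Q) * dt       # instantaneous exponents
        state = rk4_step(tangent_rhs, np.concatenate([y, Q.ravel()]), dt)
        y = state[:3]
        Q, _ = np.linalg.qr(state[3:].reshape(3, 3))  # re-orthonormalize
    return total / (n_average * dt)
```

For any orthonormal frame, $\sum_i \hat{\boldsymbol{\delta}}_i^\top J \hat{\boldsymbol{\delta}}_i = \operatorname{tr} J = -(\sigma + 1 + b)$. The estimates therefore always sum to $-41/3 \approx -13.67$ here, which is a useful sanity check. The individual exponents converge more slowly as `t_average` grows.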

Using the Lyapunov spectrum, we hypothesize that the reservoir’s abstraction of an attractor memory will appear as an additional LE equal to zero. By training on nearby examples, the reservoir acquires the perturbation direction $d\mathbf{r}$, which neither grows nor decays on average, because all scalar multiples of $d\mathbf{r}(t)$ are valid perturbation trajectories to linear order according to Eq. 10. Hence, the acquisition of such a perturbation direction should present itself as an additional LE equal to zero.
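The hypothesis can be turned into a simple test. Count the exponents within a tolerance of zero and compare the result against the count expected for a single attractor memory, which is one (along the flow) for both a limit cycle and the Lorenz attractor. The tolerance `tol` is a hypothetical threshold that would need to reflect the numerical accuracy of the LE estimates. The exponent values in the example lines are made-up illustrations, not results from this work.

```python
import numpy as np


def spectrum_signature(exponents, tol=0.05):
    """Count (positive, near-zero, negative) exponents; |lambda| <= tol counts as zero."""
    le = np.asarray(exponents, dtype=float)
    n_pos = int(np.sum(le > tol))
    n_zero = int(np.sum(np.abs(le) <= tol))
    n_neg = int(np.sum(le < -tol))
    return n_pos, n_zero, n_neg


def shows_abstraction(exponents, n_zero_single=1, tol=0.05):
    """True if there is exactly one more near-zero exponent than a single
    attractor memory has."""
    return spectrum_signature(exponents, tol)[1] == n_zero_single + 1


# Made-up illustrative spectra (not results from this work).
single_lorenz = [0.9, 0.0, -3.0, -5.0, -14.5]
abstracted_lorenz = [0.9, 0.01, -0.02, -3.0, -14.5]
```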

IV.1 Abstraction depends on spectral radius and time constant

Figure 4: LEs of an RC after learning a continuous limit cycle. Heat maps of the first 4 LEs of the RC as functions of the spectral radius ($\rho$) and the time constant ($\gamma$).

With this mechanism of abstraction in RNNs, we provide a concrete implementation of our theory and study its limits. Our RNNs in Eq. 1 and Eq. 5 depend on several parameters. The spectral radius $\rho$, the time constant $\gamma$, the bias term $\mathbf{d}$, the weighting of the input matrix $B$, and the number and spacing of the training examples all affect whether abstraction can successfully occur. We quantify the effect of varying these parameters on the RC’s ability to abstract different inputs via the Lyapunov spectrum analysis.

We focus on the parameters that determine the internal dynamics of the RC: $\rho$ and $\gamma$. The RC is a carefully balanced system whose internal speed is set by $\gamma$. If $\gamma$ is too small, the system is too slow to react to the inputs; conversely, if $\gamma$ is too large, the system responds too quickly to retain a history of the input. Thus, we hypothesize that an intermediate $\gamma$ will yield optimal abstraction, and that the optimal range of $\gamma$ will vary with the time scale of the input. Like the time constant, the spectral radius $\rho$ is known to affect the success of learning, as it controls the excitability of the RNN Sussillo and Abbott (2009). For abstraction to succeed, the RNN needs intermediate values of $\rho$ and $\gamma$, so that it learns the input signals with an excitability and reaction speed suited to the input attractor memory.

To find the ideal parameter regime for abstracting a limit cycle attractor memory, we performed a parameter sweep on $\gamma$ from 2.5 to 25.0 in increments of 2.5, and on $\rho$ from 0.2 to 2.0 in increments of 0.2. All other parameters in the closed- and open-loop reservoir equations were held constant. To measure the success of the abstraction, we calculated the first 4 LEs of the RC, looking for values of $\lambda_1$ and $\lambda_2$ equal to 0, and values of $\lambda_3$ and $\lambda_4$ that are negative. For this continuous limit cycle memory, we found that the best parameter regime was $\rho = \ldots$ and $\gamma = \ldots$ (Fig. 4). Then, we tested the weighting of the input matrix, $B$, while holding $\rho$, $\gamma$, and all other parameters constant, and found that an optimal scaling of $B$ is between 0.001 and 0.1. We performed a similar test for the bias term, $\mathbf{d}$, which yielded an optimal scaling between 1.0 and 20.0. These parameter ranges indicate that a careful balance of $\rho$ and $\gamma$, along with $B$ and $\mathbf{d}$, is necessary to achieve abstraction. More generally, our approach to defining these parameter ranges provides a principled method for tuning future RNNs to learn different attractor memories.
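The sweep grid and the limit-cycle success criterion can be organized as below. This sketch covers only the bookkeeping. `rc_exponents` is a hypothetical user-supplied function that would train the RC at a given $(\rho, \gamma)$ and return its leading LEs. The tolerance `tol` is an assumed threshold for "equal to 0".

```python
import numpy as np

GAMMAS = np.round(np.arange(1, 11) * 2.5, 10)   # 2.5, 5.0, ..., 25.0
RHOS = np.round(np.arange(1, 11) * 0.2, 10)     # 0.2, 0.4, ..., 2.0


def limit_cycle_abstracted(les, tol=0.05):
    """Criterion above: lambda_1, lambda_2 ~ 0 and lambda_3, lambda_4 < 0."""
    l1, l2, l3, l4 = les[:4]
    return abs(l1) <= tol and abs(l2) <= tol and l3 < -tol and l4 < -tol


def sweep(rc_exponents, tol=0.05):
    """rc_exponents(rho, gamma) -> leading LEs of the trained closed-loop RC
    (hypothetical). Returns a boolean success map of shape (len(GAMMAS), len(RHOS))."""
    success = np.zeros((len(GAMMAS), len(RHOS)), dtype=bool)
    for i, gamma in enumerate(GAMMAS):
        for j, rho in enumerate(RHOS):
            success[i, j] = limit_cycle_abstracted(rc_exponents(rho, gamma), tol)
    return success
```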

IV.2 Abstraction of chaotic memories

Figure 5: LEs of an RC after learning a continuous Lorenz attractor. Heat maps of the first 5 LEs of the RC as functions of the spectral radius ($\rho$) and the time constant ($\gamma$).

While limit cycles provide an intuitive conceptual demonstration of abstraction, real neural networks such as the human brain learn more complex memories that involve a larger number of parameters. These include natural and chaotic attractors such as weather phenomena Lorenz (1963) or diffusion Pathak et al. (2017). Chaotic attractors pose a more complex memory for the reservoir to learn, so it is nontrivial to show that the reservoir is able to abstract from several chaotic attractors to learn one single continuous chaotic attractor. By again analyzing the Lyapunov spectrum, we can quantify successful abstraction. As seen in Fig. 3, a chaotic dynamical system is characterized by at least one positive Lyapunov exponent. In the case of the Lorenz attractor, the first Lyapunov exponent is positive ($\lambda_1 > 0$), the second is equal to zero, and the third is negative ($\lambda_3 < 0$). Hence, when the RC learns a single Lorenz attractor, the first LE is positive, the second LE is zero, and the rest of the spectrum is increasingly negative. For a successful, continuous abstraction of the Lorenz attractor, we expect the first LE to be positive. It should be followed by not one but two LEs equal to zero, and then by increasingly negative LEs.

To test this acquisition of an LE equal to zero, we trained the RNN to learn multiple chaotic attractor memories, focusing on the Lorenz attractor. To find the ideal parameter regime for learning a continuous Lorenz attractor memory from many discrete examples, we again performed a parameter sweep on $\gamma$ from 2.5 to 25.0 in increments of 2.5, and on $\rho$ from 0.2 to 2.0 in increments of 0.2. All other parameters in the closed- and open-loop reservoir equations were held constant. To measure the success of abstraction, we calculated the first 5 LEs of the RC. We looked for $\lambda_1$ to be positive, $\lambda_2$ and $\lambda_3$ to equal zero, and $\lambda_4$ and $\lambda_5$ to be increasingly negative. For this continuous Lorenz attractor memory, we found that the best parameter regime was $\rho$ between 0.6 and 1.2 with $\gamma = 25$ (Fig. 5). Hence, we demonstrate that in addition to simple limit cycle attractors, RNNs can successfully abstract much more complex and unstable chaotic attractor memories, supporting the generality of our theory.

V Discussion

Reservoir computing has been gaining substantial traction, and significant advances have been made in many domains of application. These include numerical advances in adaptive rules for training reservoirs using evolutionary algorithms Ferreira and Ludermir (2009); Ferreira et al. (2013) and neurobiologically inspired spike-timing-dependent plasticity Paugam-Moisy et al. (2008). In tandem, physical implementations of reservoir computing in photonic Katumba et al. (2017); Salehi and Dehyadegari (2014); Röhm et al. (2019), memristive Merkel et al. (2014), and neuromorphic Donati et al. (2018) systems provide low-power alternatives to traditional computing hardware. Each application is accompanied by its own unique set of theoretical considerations and limitations Köster et al. (2020). This emphasizes the need for underlying analytical mechanisms that make meaningful generalizations across such a wide range of systems.

In this work, we provide such a mechanism for the abstraction of a continuum of attractor memories from discrete examples, and put forth the acquisition of an additional zero Lyapunov exponent as a quantitative measure of success. Moreover, because the differential mechanism of learning we uncover is general, we expect it to apply broadly to the learning of chaotic attractor memories. Our investigation simplifies both the network used and the memories learned. Even so, we show that the underlying mechanism of abstraction remains the same as we increase the complexity of the memory learned (e.g., from discrete to continuous, and from non-chaotic to chaotic).

Our work motivates several new avenues of inquiry. First, it would be of interest to examine the theoretical and numerical mechanisms for abstracting more complex transformations. Second, it would be of interest to embark on a systematic study of two questions: the spacing between discrete examples that is necessary to learn a differential attractor rather than discrete attractors, and the phase transition of abstraction. Third and finally, ongoing and future efforts could seek to determine the role of noise in both the RNN and input dynamics for abstracting high-dimensional continuous attractors from scattered low-dimensional, discrete attractors. Because different RNNs are better suited to learn and abstract different inputs, we expect that this work will inform studies of how to design specialized RNNs for better abstraction of particular dynamical attractors.

VI Conclusion

Here we show that an RNN can successfully learn time-varying attractor memories. We demonstrate this process with both limit cycle and Lorenz attractor inputs. We then show the RNN several discrete examples of these attractors, translated from each other by a small distance. We find that the neural network is able to abstract to a higher dimension and learn a continuous attractor memory that connects all of the discrete examples together. This process of abstraction can be quantified by the acquisition of an additional exponent equal to zero in the Lyapunov spectrum of the RC’s dynamics. Because this mechanism is identified in a simple, analytically tractable model, it may inform future improvements to the algorithms and methods used in machine learning. More broadly, our findings provide new hypotheses regarding how humans construct abstractions from real-world inputs to their neural networks.

VII Acknowledgments

LMS acknowledges support from the University Scholars Program at the University of Pennsylvania. JZK acknowledges support from the NIH T32-EB020087, PD: Felix W. Wehrli, and the National Science Foundation Graduate Research Fellowship No. DGE-1321851. DSB acknowledges support from the NSF through the University of Pennsylvania Materials Research Science and Engineering Center (MRSEC) DMR-1720530, as well as the Paul G. Allen Family Foundation, and a grant from the Army Research Office (W911NF-16-1-0474). The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

VIII Citation Diversity Statement

We would like to include a citation diversity statement following a recent proposal Zurn et al. (2020). Recent work in several fields of science has identified a bias in citation practices such that papers from women and other minority scholars are under-cited relative to the number of such papers in the field Mitchell et al. (2013); Dion et al. (2018); Caplar et al. (2017); Maliniak et al. (2013); Dworkin et al. (2020); Bertolero et al. (2020); Wang et al. (2021); Chatterjee and Werner (2021); Fulvio et al. (2021). Here we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, race, ethnicity, and other factors. First, we obtained the predicted gender of the first and last author of each reference by using databases that store the probability of a first name being carried by a woman Dworkin et al. (2020); Zhou et al. (2020). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 6.67% woman (first)/woman (last), 16.4% man/woman, 13.33% woman/man, and 63.6% man/man. This method is limited in that a) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and b) it cannot account for intersex, non-binary, or transgender people. Second, we obtained the predicted racial/ethnic category of the first and last author of each reference by using databases that store the probability of a first and last name being carried by an author of color Ambekar et al. (2009); Sood and Laohaprapanon (2018). By this measure (and excluding self-citations), our references contain 15.94% author of color (first)/author of color (last), 20.06% white author/author of color, 20.17% author of color/white author, and 43.83% white author/white author. This method is limited in that a) the names and Florida Voter Data used to make the predictions may not be indicative of racial/ethnic identity, and b) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names. We look forward to future work that could help us to better understand how to support equitable practices in science.

IX References


  • A. Ambekar, C. Ward, J. Mohammed, S. Male, and S. Skiena (2009) Name-ethnicity classification from open sources. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 49–58. Cited by: §VIII.
  • M. Balcerzak, D. Pikunov, and A. Dabrowski (2018) The fastest, simplified method of Lyapunov exponents spectrum estimation for continuous-time dynamical systems. Nonlinear Dynamics 94 (4), pp. 3053–3065. Cited by: §IV.
  • M. A. Bertolero, J. D. Dworkin, S. U. David, C. L. Lloreda, P. Srivastava, J. Stiso, D. Zhou, K. Dzirasa, D. A. Fair, A. N. Kaczkurkin, B. J. Marlin, D. Shohamy, L. Q. Uddin, P. Zurn, and D. S. Bassett (2020) Racial and ethnic imbalance in neuroscience reference lists and intersections with gender. bioRxiv. Cited by: §VIII.
  • N. Caplar, S. Tacchella, and S. Birrer (2017) Quantitative evaluation of gender bias in astronomical publications from citation counts. Nature Astronomy 1 (6), pp. 0141. Cited by: §VIII.
  • P. Chatterjee and R. M. Werner (2021) Gender disparity in citations in high-impact journal articles. JAMA Netw Open 4 (7), pp. e2114509. Cited by: §VIII.
  • F. I. M. Craik and E. Bialystok (2006) Cognition through the lifespan: mechanisms of change. Trends in Cognitive Sciences 10 (3), pp. 131–138. Cited by: §I.
  • E. N. Davison, B. Dey, and N. E. Leonard (2016) Synchronization bound for networks of nonlinear oscillators. In 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1110–1115. Cited by: §I.
  • S. Dawson, C. Grebogi, T. Sauer, and J. A. Yorke (1994) Obstructions to shadowing when a Lyapunov exponent fluctuates about zero. Physical Review Letters 73 (14), pp. 1927. Cited by: §I.
  • M. L. Dion, J. L. Sumner, and S. M. Mitchell (2018) Gendered citation patterns across political science and social science methodology fields. Political Analysis 26 (3), pp. 312–327. Cited by: §VIII.
  • E. Donati, M. Payvand, N. Risi, R. Krause, K. Burelo, G. Indiveri, T. Dalgaty, and E. Vianello (2018) Processing EMG signals using reservoir computing on an event-based neuromorphic system. In 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 1–4. Cited by: §V.
  • F. A. Dunn and R. O. L. Wong (2012) Diverse Strategies Engaged in Establishing Stereotypic Wiring Patterns among Neurons Sharing a Common Input at the Visual System’s First Synapse. Journal of Neuroscience 32 (30), pp. 10306–10317. Cited by: §I.
  • J. D. Dworkin, K. A. Linn, E. G. Teich, P. Zurn, R. T. Shinohara, and D. S. Bassett (2020) The extent and drivers of gender imbalance in neuroscience reference lists. bioRxiv (https://www.biorxiv.org/content/early/2020/01/11/2020.01.03.894378.full.pdf). Cited by: §VIII.
  • R. L. Faulkner, M.-H. Jang, X.-B. Liu, X. Duan, K. A. Sailor, J. Y. Kim, S. Ge, E. G. Jones, G.-l. Ming, H. Song, and H.-J. Cheng (2008) Development of hippocampal mossy fiber synaptic outputs by new neurons in the adult brain. Proceedings of the National Academy of Sciences 105 (37), pp. 14157–14162. Cited by: §I.
  • M. S. Fee and C. Scharff (2010) The Songbird as a Model for the Generation and Learning of Complex Sequential Behaviors. ILAR Journal 51 (4), pp. 362–377. Cited by: §I.
  • L. A. Feldkamp, G. V. Puskorius, and P. C. Moore (1997) Adaptive behavior from fixed weight networks. Information Sciences 98 (1), pp. 217–235. Cited by: §I.
  • A. A. Ferreira, T. B. Ludermir, and R. R. De Aquino (2013) An approach to reservoir computing design and training. Expert Systems with Applications 40 (10), pp. 4172–4182. Cited by: §V.
  • A. A. Ferreira and T. B. Ludermir (2009) Genetic algorithm for reservoir computing optimization. In 2009 International Joint Conference on Neural Networks, pp. 811–815. Cited by: §V.
  • P. Frederickson, J. L. Kaplan, E. D. Yorke, and J. A. Yorke (1983) The Liapunov dimension of strange attractors. Journal of Differential Equations 49 (2), pp. 185–207. Cited by: §I.
  • J. M. Fulvio, I. Akinnola, and B. R. Postle (2021) Gender (im)balance in citation practices in cognitive neuroscience. J Cogn Neurosci 33 (1), pp. 3–7. Cited by: §VIII.
  • S. Furber (2016) Large-scale neuromorphic computing systems. Journal of Neural Engineering 13 (5), pp. 051001. Cited by: §I.
  • S. P. Garcia and J. S. Almeida (2005) Multivariate phase space reconstruction by nearest neighbor embedding with different time delays. Physical Review E 72 (2), pp. 027205. Cited by: §I.
  • R. Guyonneau, H. Kirchner, and S. J. Thorpe (2006) Animals roll around the clock: the rotation invariance of ultrarapid visual processing. Journal of Vision 6 (10), pp. 1–1. Cited by: §I.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. Cited by: §I.
  • J. J. Hopfield (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (8), pp. 2554–2558. Cited by: §I.
  • B. R. Hunt, E. Ott, and J. A. Yorke (1997) Differentiable generalized synchronization of chaos. Physical Review E 55 (4), pp. 4029. Cited by: §I, §III.1.
  • P. J. Ifft, S. Shokur, Z. Li, M. A. Lebedev, and M. A. L. Nicolelis (2013) A Brain-Machine Interface Enables Bimanual Arm Movements in Monkeys. Science Translational Medicine 5 (210), pp. 210ra154. Cited by: §I.
  • H. Jaeger (2010) The “echo state” approach to analysing and training recurrent neural networks – with an Erratum note. GMD Report 1 (148), pp. 1–47. Cited by: §I, §II.
  • A. Katumba, M. Freiberger, P. Bienstman, and J. Dambre (2017) A multiple-input strategy to efficient integrated photonic reservoir computing. Cognitive Computation 9 (3), pp. 307–314. Cited by: §V.
  • J. Z. Kim, Z. Lu, E. Nozari, G. J. Pappas, and D. S. Bassett (2021) Teaching recurrent neural networks to infer global temporal structure from local examples. Nature Machine Intelligence 3 (4), pp. 316–323. Cited by: §I, §II.
  • F. Köster, D. Ehlert, and K. Lüdge (2020) Limitations of the recall capabilities in delay-based reservoir computing systems. Cognitive Computation, pp. 1–8. Cited by: §V.
  • S. Kumar, I. Dasgupta, J. D. Cohen, N. D. Daw, and T. L. Griffiths (2020) Meta-learning of compositional task distributions in humans and machines. arXiv preprint arXiv:2010.02317. Cited by: §I.
  • T. P. Lillicrap and A. Santoro (2019) Backpropagation through time and the brain. Current Opinion in Neurobiology 55, pp. 82–89. Cited by: §I.
  • E. N. Lorenz (1963) Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences 20 (2), pp. 130–141. Cited by: §II, §IV.2.
  • Z. Lu and D. S. Bassett (2020) Invertible generalized synchronization: a putative mechanism for implicit learning in neural systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (6), pp. 063133. Cited by: §I.
  • Z. Lu, B. R. Hunt, and E. Ott (2018) Attractor reconstruction by machine learning. Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (6), pp. 061104. Cited by: §I, §II.
  • M. Lukoševičius and H. Jaeger (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127–149. Cited by: §I.
  • D. Maliniak, R. Powers, and B. F. Walter (2013) The gender citation gap in international relations. International Organization 67 (4), pp. 889–922. Cited by: §VIII.
  • C. Merkel, Q. Saleh, C. Donahue, and D. Kudithipudi (2014) Memristive reservoir computing architecture for epileptic seizure detection. Procedia Computer Science 41, pp. 249–254. Cited by: §V.
  • T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur (2010) Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association. Cited by: §I.
  • S. M. Mitchell, S. Lange, and H. Brus (2013) Gendered citation patterns in international relations journals. International Studies Perspectives 14 (4), pp. 485–492. Cited by: §VIII.
  • E. I. Moser, E. Kropff, and M. Moser (2008) Place Cells, Grid Cells, and the Brain’s Spatial Representation System. Annual Review of Neuroscience 31 (1), pp. 69–89. Cited by: §I.
  • B. T. Nadiga (2021) Reservoir computing as a tool for climate predictability studies. Journal of Advances in Modeling Earth Systems 13 (4), pp. e2020MS002290. Cited by: §I.
  • J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott (2017) Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (12), pp. 121102. Cited by: §IV.2.
  • H. Paugam-Moisy, R. Martinez, and S. Bengio (2008) Delay learning and polychronization for reservoir computing. Neurocomputing 71 (7-9), pp. 1143–1158. Cited by: §V.
  • L. M. Pecora and T. L. Carroll (1990) Synchronization in chaotic systems. Physical Review Letters 64 (8), pp. 821. Cited by: §I.
  • A. Röhm, L. Jaurigue, and K. Lüdge (2019) Reservoir computing using laser networks. IEEE Journal of Selected Topics in Quantum Electronics 26 (1), pp. 1–8. Cited by: §V.
  • N. F. Rulkov, M. M. Sushchik, L. S. Tsimring, and H. D. I. Abarbanel (1995) Generalized synchronization of chaos in directionally coupled chaotic systems. Physical Review E 51 (2), pp. 980–994. Cited by: §I.
  • M. R. Salehi and L. Dehyadegari (2014) Optical signal processing using photonic reservoir computing. Journal of Modern Optics 61 (17), pp. 1442–1451. Cited by: §V.
  • R. A. Santiago (2004) Context discerning multifunction networks: reformulating fixed weight neural networks. In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Vol. 1, pp. 189–194. Cited by: §I.
  • N. Schweighofer and K. Doya (2003) Meta-learning in reinforcement learning. Neural Networks 16 (1), pp. 5–9. Cited by: §I.
  • H. S. Seung (1998) Learning continuous attractors in recurrent networks. In Advances in Neural Information Processing Systems, pp. 654–660. Cited by: §I.
  • G. Sood and S. Laohaprapanon (2018) Predicting race and ethnicity from the sequence of characters in a name. arXiv preprint arXiv:1805.02109. Cited by: §VIII.
  • D. Sussillo and L. F. Abbott (2009) Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron 63 (4), pp. 544–557. Cited by: §I, §II, §IV.1.
  • A. Tacchetti, L. Isik, and T. A. Poggio (2018) Invariant Recognition Shapes Neural Representations of Visual Input. Annual Review of Vision Science 4 (1), pp. 403–422. Cited by: §I.
  • X. Wang, J. D. Dworkin, D. Zhou, J. Stiso, E. B. Falk, D. S. Bassett, P. Zurn, and D. M. Lydon-Staley (2021) Gendered citation practices in the field of communication. Annals of the International Communication Association. Cited by: §VIII.
  • L. Young (1982) Dimension, entropy and Lyapunov exponents. Ergodic Theory and Dynamical Systems 2 (1), pp. 109–124. Cited by: §I.
  • Z. Zhang, Y.-Y. Jiao, and Q. Sun (2011) Developmental maturation of excitation and inhibition balance in principal neurons across four layers of somatosensory cortex. Neuroscience 174, pp. 10–25. Cited by: §I.
  • D. Zhou, E. J. Cornblath, J. Stiso, E. G. Teich, J. D. Dworkin, A. S. Blevins, and D. S. Bassett (2020) Gender diversity statement and code notebook v1.0. Cited by: §VIII.
  • X. Zou, Z. Ji, X. Liu, Y. Mi, K. M. Wong, and S. Wu (2017) Learning a continuous attractor neural network from real images. In International Conference on Neural Information Processing, pp. 622–631. Cited by: §I.
  • P. Zurn, D. S. Bassett, and N. C. Rust (2020) The citation diversity statement: a practice of transparency, a way of life. Trends in Cognitive Sciences 24 (9), pp. 669–672. Cited by: §VIII.