I Experimental study: a relevant analogous system
i.1 Experimental setup and data
In order to gather time series of seismic-like activity, we collect experimental data while mechanically cycling a nylon thread, loose Stokinette knit, which was recently shown  to display such rich response. We use a setup similar to that reported in . The 83x83 stitch knitted fabric is clamped on its upper and lower rows by means of screws holding each stitch individually, imposing a constant spacing between them along the corresponding rows. Starting from an initial configuration with height and width , we perform a tensile test using an Instron (model 5965) dynamometer mounted with a load cell. It consists in varying cyclically the elongation of the fabric between the jaws, between and .
The mechanical response - the force signal - is recorded at high frequency (25 Hz) during the stretching phase on a shorter elongation range, between and . To approach the quasi-static deformation limit in the interval , we impose a constant loading speed and set it at a small value . To reduce the duration of the experiment, we fix outside this measurement window.
This protocol yields a value of the force every 0.04s, with a cycle duration of 2000 s, leading to 50000 points of mechanical response per cycle. We collected data from 20 experiments of approximately 30 cycles each, which corresponds to roughly to 25 millions force points. The raw data for one cycle is shown in Fig. 2.
i.2 Data preprocessing
Upon stretching, the mechanical response of the fabric displays large fluctuations around an affine response that is removed by a linear detrending.
The amplitude of these fluctuations is however non-stationary over the cycles, signalling the wear of the fabric over time. In order to build a robust normalization of this signal, we first remove the temporal mean of the data and scale it by its standard deviation. However, these quantities are different from on cycle to the next. Furthermore, within a given cycle, we observe that the amplitude of the fluctuations also increases linearly with the extension, presumably because the fabric has stored more potential energy. We thus apply a second such normalization, where the statistics are collected over a sliding time window over all cycles, which achieves a signalwith a stationary distribution.
i.3 Event statistics
Characterization of events – Upon closer inspection, the normalized fluctuations exhibit a complex, "stick-slip"-like behavior, with linear loading phases interrupted by abrupt drops, as can be seen in Fig. 3. These force drops, or "slip-events", were shown in  to be concomitant and correlated in amplitude with time-intermittent, spatially extended fault-like plastic events.
Gutemberg-Richter statistics – We are interested in these drops in the normalized loading force, as they are the equivalent of the seismic "quakes" that threaten populations, and as such are at the heart of our prediction endeavor. We therefore define the quantity at each time step by:
where we characterise each drop by its amplitude , see Fig. 3. Following this construction, more than 93% of ’s values are zeros. The events themselves, corresponding to non-zero values in , are roughly power-law distributed, as shown in Fig. 4. More precisely, the event distribution displays three different regimes : noisy structureless events of low amplitude, a scale-invariant regime that extends over slightly more than 3 decades with an exponent of -1.3, and an exponential cut-off for very large and rare events. This distribution is characteristic of the avalanching dynamics found in seismic activity  and references within.
This long-tailed statistics suggests to define classes of event severity in a logarithmic scale. As an illustrative example, we will pick five classes centered on the decimal magnitudes in the event number count distribution, as shown in Fig. 5. Class zero events correspond to noise, extended to a small event class 1. Then medium-sized events (class 2) and large events (class 3) potentially qualify as non-severe and severe respectively, while class 4 corresponds to very rare, catastrophic quakes.
Ii Supervised machine learning
In this section we describe in details the machine learning procedure we designed in order to predict various notions of danger imminence based on past measurements. We do time series forecasting using an artificial neural network architecture called ResNet Neural Network (NN), which is a well-known feature extractor based on a large number of convolutional layers with residual connexions .
ii.1 Building the labels
Because most of the ’s are zeros, the NN would only learn to predict zero without a proper rebalancing of the dataset. On the other hand, using a naively rebalanced dataset might lead to an over prediction of extreme events. We deal with this issue by introducing a future horizon window of size (in time steps) and predicting the severity of future events within that window. Such an aggregation of future events has indeed the automatic effect of partially rebalancing the dataset in a meaningful way. We have considered three different ways for aggregating the events in the near future:
Target 1 (T1):
This target focuses only on the largest event in the near future discarding the temporal information,
Target 2 (T2):
aggregates the amplitudes of events in the near future,
Target 3 (T3):
Thanks to exponentially decaying weights, this target integrates information on both amplitudes of events and their time of occurrence.
These scalar targets are then mapped to discrete labels, denoted Y(t) in the following, with value from 0 to N-1 corresponding to classes matching the events statistics discussed earlier - see Fig. 5 for details. The training dataset is then balanced with respect to these labels prior to training. We have tested different choices of time steps. These values are chosen because they roughly coincide with the average time delay between events from class 1 to 3.
Note that our approach is somewhat hybrid (via the use of ) with respect to other choices of the literature, where, eg. the prediction task focused on either:
time to failure: predict the time before next event of given class as in 
future amplitude: predict the amplitude of next event regardless of when it happens.
ii.2 Inputs and model architecture
The input of the NN is built by stacking together three 1D arrays:
Here, is a past horizon that we set to 256 time steps. We feed the neural network with both the past raw data, the past amplitude drops, and the correct past targets (with an appropriate shift so that the input data contains no information about the future).
One may object that contains some future data because its value, when is non zero, corresponds to the amplitude of an event that is just beginning. However, this does not introduce future biases because what we try to predict is , ie. the upcoming delta’s, and not the force signal itself. We verified this claim by simply removing that data from the input, which resulted in similar performance.
We investigated different NN architectures and found that convolutionnal-based NNs achieved the best results. We will thus only report the results obtained with a ResNet18 with 1D convolutions on the 3 input channels. We used AdaDelta as optimizer with an initial learning rate of and a learning rate decay of
every 50 epochs for 300-epoch runs. The model also includes weights decay set to.
The whole dataset of 27M sequences was split in 22M of them for training, 2.5M for the validation, and 2.5M for testing. The training sequences are then subsampled in order to enforce equiprobability of the associated labels. The test set is however not subsampled since we want to probe our model’s abilities on realistic data.
We trained and evaluated models for the three different targets T1, T2, T3, the three future window , and also varying the number of classes N from 2 to 5, resulting in 36 models in total.
One example of learning progress is displayed in Fig. 6 for classes. In that case, the accuracy for both the training and validation set reaches 75%, while the accuracy on the test set is around 74%. This two classes model is however not very helpful in order to track our ability to predict extreme events only. But upon increasing , the meaning of the accuracy value increasingly fails to capture that ability, due in particular to the proportion of events in each class being highly imbalanced.
We thus also report the F1-score, which is an harmonic mean of recall, and precision (see appendix B for definitions), two additional and complementary metrics. Both these indicators, displayed in Fig. 7, are normalized by the value they would assume with a random prediction. This figure shows that the NN is indeed able to learn and generalize from the training set, and we learn that increasing the number of classes or the future horizon typically leads to better performance. The target , which is simply the sum of the slip amplitudes in the future horizon, also seems to be mildly favored.
This result is confirmed by the direct study of false positive and false negative rates restricted to the two edge classes, noise versus extreme events only, as seen in Fig. 8. We see there that increasing the number of event classes greatly reduces both the risk of predicting low class events when a major event is in fact going to happen, and the risk of overestimating the danger, potentially leading to needless countermeasures. However, the interpretation of the results becomes increasingly complicated.
Furthermore, although these false positive and negative rates may seem very low (around 1% for the best model in the bottom left corner of Fig. 8), this is still far from being perfect because extreme events have themselves an even lower occurrence rate (around ten times smaller).
As stated in the introduction, these predictions are not handy to read or use, and a quantitative assessment of their differential quality is still lacking at this point.One way around these difficulties consists in defining exactly what we mean by useful prediction: within the georisk context, a prediction is useful if it allows policy-makers to make better informed decisions, in order in particular to save more lives while sparing the local economic activity as much as possible. In the next section we thus describe a second layer of (reinforcement) learning meant to design a risk-management policy based on the predictions of this first layer. The idea is that upon inspection of the past predictions of the model, we may on the one hand be able to compare the different models in terms of a single risk-based metric, and on the second hand be able to improve the decision of evacuating or not the city with respect to some naive policy by taking advantage of the dynamics of the past predictions of event severity.
Iii Game-theoretical framework
iii.1 Introducting KnitCity
In this section, we introduce KnitCity, a virtual city that is subjected to "knitquakes" based on the crackling dynamics extracted from the mechanical response of the knitted fabric. KnitCity laboratory keeps track of all past seismic data, and has access to all past predictions of the Neural Network designed previously.
At each time step, the mayor must decide whether to evacuate the city or not. Such a decision is a trade-off between the risk of exposing citizens to catastrophic quakes resulting in a severe human loss and the risk of evacuating the city for nothing, to which we may associate a "social cost", related to the corresponding economic loss for example. Such a social cost will be denoted for each day spent out of the city.
On the other hand, we model the human cost with the following weights corresponding respectively to events from class 0 to 4:
where is a free parameter. Such a modelling choice reflects the fact that for actual earthquakes, there is a threshold below which human cost most often vanishes. Also, increasing by an order of magnitude the damages caused by class 4 events with respect to class 3 events reflects the fact that amplitude drops, which embody the events severity, increase exponentially with class label. Since the ratio is the only parameter that drives the trade-off between casualties and time out, can be arbitrarily set to 1. There are then two limiting behaviors: if goes to infinity, the optimal choice is to always stay out of the city, and to always stay in if goes to zero. Although the precise value of
cannot be estimated a priori since it is based on social considerations, it must lie somewhere in between these two limits for otherwise there would be nothing to learn.
Using Eq. (5) and the event statistics, we may estimate typical values of by requiring that a random policy that evacuates the city percent of the time yields a social cost of the same order of magnitude as the human cost. Values of around or less (see results section below) led to the order of magnitude: .
A decision policy can be evaluated by a reward given by the sum over an episode (defined by some number of contiguous time steps, see below) :
where (0 or -1) is the social cost determined by action at time , and is the human cost that depends on both the action and the actual event :
In the following, we shall also use two more quantities in order to represent policy performances. First we define
the average life saving rate and
the average population retention rate. We may then define two dimensional diagrams in which, for example, the policy consisting of always staying in the city has coordinates (see Fig. 9), while the policy consisting in always evacuating the city has coordinates in the () space. An agent that randomly evacuates the city at some rate should on average expect casualties proportional to and thus the set of all possible random policies must lie on average on the straight line. Below that line are policies worse than random.
The optimal policy that consists in evacuating the city only when the damages are larger than the social cost can be evaluated by simply looking into the future: we find and also very close to 1 since dangerous events are very scarce. If we moreover add that the evacuation rate is fixed and continuously varying from 0 to 1, then we may define "an optimal frontier" as being the green straight lines in Fig. 9.
In that space, the actual reward by time step can also be mapped. It is easily shown to be via the previous equations, for two constants and that can be computed on the signal statistics and on . Models can thus be ranked with respect to the average reward per time step; constant reward are decreasing straight lines in that space, see e.g. Fig. 9.
iii.2 Training a decision making agent
Using the NN’s predictions of section 3, we first manually define a naive policy based only the last available prediction, where we evacuate at every alert beyond the threshold , ie. every time an event of class 3 or 4 is predicted:
Results of such a naive policy are shown in Fig.10. The natural question is however whether we can go beyond the naive policy by leveraging the sequence of past NN’s predictions through the training of a reinforcement learning agent.
We therefore trained an unsupervised model consisting in a Categorical Deep-Q-Learner 
belonging to the wide class of Reinforcement Learning (RL) Agents that have achieved remarkable performance in many areas that need the time component and memory to be taken into account : from games to control, auto-pilots, etc. Such a network learns the probability distribution function of the expected rewards for a given action and a current state, and acts by sampling it.
We use a distributional Q-function with discreet action of 51 atoms, with 2 hidden layers of 64 neurons. As optimizer we use Adam with anrate, and a discounting factor of .
We subsampled the test set of the previous section III. B in a set of 75 episodes, or "games" of 5000 time step each (such that on average, two major events of class 4 are present in each game), and validate on unseen data grouped in 50 episodes of 5000 steps. All models are then tested on 180 new unseen episodes of the same number of steps. We trained multiple RL agents that had access to a varying number of past predictions.
We first fix to 20, and train the RL agent for the 36 models obtained in the previous section, with both 1, 4 and 16 past predictions made available to the RL agent. We will denote RL(1), RL(4), … and so on these models. In terms of the average reward per step, the results in Fig. 10 clearly shows that RL agents improve a lot the naive policy. Notice also that some of the models fail to converge to an interesting policy but instead converge to the "always in" policy (especially RL(1) models).
Zooming in on the best performing models with a reward per time step below , we can return a list of best models, Irrespective of the number of past data fed to the RL agent, the best model is systematically the one that uses classes with target 3 (exponential decay [see Eq. (4)]) and a future horizon of 20 or 40.
Exploring further our results, we now fix the model (target 3 and ), and vary in the range and for past predictions made available to the RL agent. We got the the results of Fig. 11.
Finally, we report on the effect of varying the parameter . This trend is shown in Fig. 12. Points here are trained models with target 3 and 5 classes, for different future horizon and different RL(n), but are grouped only by mu-values. This enables us to see quite a smooth evolution from always-in (low ) to always-out policies in accordance with our expectations.
Again, this figure emphasizes that there is no unique way to define an optimal policy when predictions are not perfect; instead we get a family of trained decision agents that are dependent. However the point is that the framework we introduced now provides with both quantitative and qualitative tools to explore and rank decision policies, and most importantly it enables us to compare models between one another, and even between experiments of completely different nature.
In an effort to gain insights into the immediate future crackling activity in the dynamometric signal of a knitted fabric, we have established a general framework, applicable de facto to any time series prediction for which a clear motivation can be rationalized, in which we can on the one hand compare through a common rating the predictive power of the predictor, and on the other explicitly construct a step by step action plan answering the initial motivation by efficiently exploiting the predictions. This development allows in particular to rank the relevance of the various targets, past and future horizons used in neural network predictors for the risk management in a model seismic environment. Encouragingly, our analysis clearly shows that these predictors yield some valuable information about the future.
Of course many challenges remain to be tackled within our approach. Extensive optimization of hyperparameters would certainly allow substantial refinement both in the quality of the predictor and that of the decision-making agent: our work only provides a proof of concept of the type of insights that can be obtained within the model environment of KnitCity, and was not fully fine-tuned. Building on our findings, it would be worth further investigating the role of the number of output classes, even going to the regression limit.
Furthermore, our dataset allows for spatially resolved predictions , which would of course also be beneficial within the geoseismic context as satellite imaging enables such analysis: this approach is high on our list of future priorities. However, establishing a quantitative correspondance between the space and time scales in our system and those in the geophysical context would be required if we were to export the machinery introduced above to real-world situations, and this preoccupation will be at the heart of our future work on spatially resolved prediction.
We thank Sebastien Moulinet for his assistance in controling the experiment. We acknowledge the PaRis Artificial Intelligence Research InstitutE and Floran Krzakala for financial support.
Appendix A: Knitted fabric
The fabrics were crafted using a Toyota KS858 single bed knitting machine and were made of a nylon-based mono-filament (Stroft GTM) of diameter and length . All samples are composed of stitches with an average lateral and longitudinal size of, respectively, 3.9 and 2.8 mm. The yarn Young’s modulus was measured using a tensile test, yielding a bending modulus .
Appendix B: Metrics
We recall the definition of standard classification metrics for reference.
- Silver et al.  D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., arXiv preprint arXiv:1712.01815 (2017).
- Mnih et al.  V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, arXiv preprint arXiv:1312.5602 (2013).
- Vinyals et al.  O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, et al., Nature 575, 350 (2019).
- Devlin et al.  J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, arXiv preprint arXiv:1810.04805 (2018).
- Li et al.  J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky, arXiv preprint arXiv:1606.01541 (2016).
- Ruiz-del Solar and Loncomilla  J. Ruiz-del Solar and P. Loncomilla, in Deep Learning in Computer Vision (CRC Press, 2020) pp. 211–232.
- Shallue and Vanderburg  C. J. Shallue and A. Vanderburg, The Astronomical Journal 155, 94 (2018).
- Radovic et al.  A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Nature 560, 41 (2018).
- Ribli et al.  D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai, Scientific reports 8, 1 (2018).
- Senior et al.  A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. Nelson, A. Bridgland, et al., Nature 577, 706 (2020).
- Jing et al.  Y. Jing, Y. Bian, Z. Hu, L. Wang, and X.-Q. S. Xie, The AAPS journal 20, 1 (2018).
- Zhao et al.  Y. Zhao, H. Liu, K. Xie, E. K. Salje, and X. Jiang, Crystals 9 (2019).
- Ribeiro et al.  H. V. Ribeiro, L. S. Costa, L. G. A. Alves, P. A. Santoro, S. Picoli, E. K. Lenzi, and R. S. Mendes, Phys. Rev. Lett. 115, 025503 (2015).
- Geller et al.  R. J. Geller, D. D. Jackson, Y. Y. Kagan, and F. Mulargia, Science 275, 1616 (1997).
- Crampin and Gao  S. Crampin and Y. Gao, Geophysical Journal International 180, 1124 (2010).
- Bonamy et al.  D. Bonamy, S. Santucci, and L. Ponson, Physical Review Letters 101, 045501 (2008).
- Varotsos et al.  P. A. Varotsos, N. V. Sarlis, and E. S. Skordas, EPL (Europhysics Letters) 132, 29001 (2020).
- Viitanen et al.  L. Viitanen, M. Ovaska, M. J. Alava, and P. Karppinen, JSTAT 2017 (5), 053401.
- Ramos et al.  O. Ramos, E. Altshuler, and K. J. Måløy, Physical Review Letters 102, 078701 (2009).
- Antonaglia et al.  J. Antonaglia, W. J. Wright, X. Gu, R. R. Byer, T. C. Hufnagel, M. LeBlanc, J. T. Uhl, and K. A. Dahmen, Physical Review Letters 112, 155501 (2014).
- Sun et al.  B. A. Sun, H. B. Yu, W. Jiao, H. Y. Bai, D. Q. Zhao, and W. H. Wang, Physical Review Letters 105, 035501 (2010).
- Madhu et al.  B. Madhu, M. A. Rahman, A. Mukherjee, M. Z. Islam, R. Roy, and L. E. Ali, Journal of Computer and Communications 9, 78 (2021).
- Wu et al.  N. Wu, B. Green, X. Ben, and S. O’Banion, arXiv:2001.08317 [cs, stat] (2020), arXiv: 2001.08317.
- Panakkat and Adeli  A. Panakkat and H. Adeli, International Journal of Neural Systems 17, 13 (2007).
- Rouet-Leduc et al.  B. Rouet-Leduc, C. Hulbert, N. Lubbers, K. Barros, C. J. Humphreys, and P. A. Johnson, Geophysical Research Letters 44, 9276 (2017).
- Johnson et al.  P. A. Johnson, B. Rouet-Leduc, L. J. Pyrak-Nolte, G. C. Beroza, C. J. Marone, C. Hulbert, A. Howard, P. Singer, D. Gordeev, D. Karaflos, C. J. Levinson, P. Pfeiffer, K. M. Puk, and W. Reade, PNAS 118, e2011362118 (2021).
- Asim et al.  K. Asim, F. Martínez-Álvarez, A. Basit, and T. Iqbal, Natural Hazards 85, 471 (2017).
- Mohankumar and Sangeetha [2018a] K. Mohankumar and K. Sangeetha, IJCSE 6, 200 (2018a).
- Mohankumar and Sangeetha [2018b] K. Mohankumar and K. Sangeetha, IJCSE 6, 200 (2018b).
- Corbi et al.  F. Corbi, L. Sandri, J. Bedford, F. Funiciello, S. Brizzi, M. Rosenau, and S. Lallemand, Geophysical Research Letters 46, 1303 (2019).
- Huang et al.  J. Huang, X. Wang, Y. Zhao, C. Xin, and H. Xiang, Neural Network World 28, 149 (2018).
- Rouet-Leduc et al.  B. Rouet-Leduc, C. Hulbert, I. McBrearty, and P. A. Johnson, Geophysical Research Letters 47, 10.1029/2019GL085870 (2020), arXiv: 1906.08033.
- Poincloux et al. [2018a] S. Poincloux, M. Adda-Bedia, and F. Lechenault, Physical Review Letters 121, 058002 (2018a).
- Poincloux et al. [2018b] S. Poincloux, M. Adda-Bedia, and F. Lechenault, Physical Review X 8, 021075 (2018b).
He et al. 
K. He, X. Zhang, S. Ren, and J. Sun, in
Proceedings of the IEEE conference on computer vision and pattern recognition(2016) pp. 770–778.
- Mallouhy et al.  R. Mallouhy, C. A. Jaoude, C. Guyeux, and A. Makhoul, in 2019 ICT-DM (2019) pp. 1–7, iSSN: 2643-6868.
- Bellemare et al.  M. G. Bellemare, W. Dabney, and R. Munos, arXiv:1707.06887 [cs, stat] (2017), arXiv: 1707.06887.