Identifying (anti-)skyrmions while they form

by Jack Y. Araz et al.
Durham University

We use a Convolutional Neural Network (CNN) to identify the relevant features in the thermodynamical phases of a simulated three-dimensional spin-lattice system with ferromagnetic and Dzyaloshinskii-Moriya (DM) interactions. Such features include (anti-)skyrmions, merons, and helical and ferromagnetic states. We use a multi-label classification framework, which is flexible enough to accommodate states that mix different features and phases. We then train the CNN to predict the features of the final state from snapshots of intermediate states of the simulation. The trained model allows identifying the different phases reliably and early in the formation process. Thus, the CNN can significantly speed up the phase diagram calculations by predicting the final phase before the spin-lattice Monte Carlo sampling has converged. We show the prowess of this approach by generating phase diagrams with significantly shorter simulation times.





I Introduction

Chiral magnets with Dzyaloshinskii-Moriya (DM) interactions [1, 2] present a rich set of phases in which topologically non-trivial structures arise. Among them, one finds skyrmions [3, 4], with unit topological charge; antiskyrmions, which are counterparts with opposite charge; and other objects with fractional charge, such as merons. These structures have been observed experimentally in a variety of materials [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. The interest in these objects goes beyond the determination of their fundamental properties, as they have applications in the field of spintronics [20, 21, 22, 23, 24, 14].

One of the critical theoretical tools for studying chiral magnets is the implementation of Monte Carlo simulations for a discretized version of them. It has been shown in Ref. [25] that a 3D cubic spin lattice model with ferromagnetic and DM interactions can correctly reproduce the experimentally-determined phase diagram for materials that support Bloch skyrmions. An exploration of the consequences of varying both the strength and the internal structure of the DM interaction was performed in Ref. [26]. Changing their structure in this system generates Neel skyrmions and antiskyrmions, while the strength controls the size of the corresponding objects.

As these simulations produce more varied and complex states and are used to map larger parameter spaces, it becomes crucial to develop automatic methods for dealing with the data they generate. The supervised machine learning framework provides an excellent toolbox to do this. As long as they are adequately selected, machine learning models can be trained with a relatively small set of samples and then used to analyze any new data generated through simulations automatically. In particular, Convolutional Neural Networks (CNNs) have been successfully applied to this type of task: to identify the phase [27, 28, 29], the topological charge [30] or the DM interaction [31] from 2D images of a spin lattice; and to find the phase from videos of the lattice [32].

In this work, we use a CNN in a multi-label approach to identify features such as skyrmions, merons, helical and ferromagnetic states, or hexagonal arrangements of the skyrmions. Several of these features can coexist in the same sample. The corresponding phase can then be inferred from the features. Furthermore, this approach accounts for states that mix two or more phases and allows for identifying features that may only appear in a small region of the image. We also apply this model to predict the features of the final state of a simulation from images of intermediate states. We then use it to construct a phase diagram by running the simulations for very short times and using the CNN to predict the final phase.

The rest of this paper is organized as follows. In Section II we review the Monte Carlo simulations that we perform for the generation of the training and test data. The data set itself, and the particular CNN that we use for it are described in Section III. We show our results in Section IV, and give our conclusions in Section V.

II Simulations

The system we simulate is a chiral magnet with ferromagnetic and DM interactions. From a coarse-grained perspective, it can be described by a local continuum Hamiltonian for the magnetisation field. We discretise it on a 3D cubic spin lattice with periodic boundary conditions, with a unit vector $\mathbf{n}_{\mathbf{r}}$ at each lattice position $\mathbf{r}$. To leading order in the lattice spacing, the Hamiltonian of this discrete system consists of the following nearest-neighbour interactions:

$$\mathcal{H} = -\sum_{\mathbf{r}}\sum_{i=1}^{3} \left[ J \, \mathbf{n}_{\mathbf{r}} \cdot \mathbf{n}_{\mathbf{r}+\hat{e}_i} + D \, \hat{\mathbf{D}}_i \cdot \left( \mathbf{n}_{\mathbf{r}} \times \mathbf{n}_{\mathbf{r}+\hat{e}_i} \right) \right] - B \sum_{\mathbf{r}} n^z_{\mathbf{r}}, \qquad (1)$$

where $\hat{\mathbf{D}}_i$ is the DM interaction vector for the lattice direction $\hat{e}_i$, $B$ is the magnitude of an external magnetic field applied along the $z$ direction, and $J$ and $D$ are parameters controlling the strength of the ferromagnetic and DM interactions, respectively. All these parameters of the lattice system are dimensionless and proportional to the corresponding parameters of the physical system. Following Ref. [25], we correct the anisotropies generated by the finite lattice spacing by introducing next-to-nearest-neighbour interactions of the same form as the nearest-neighbour ones in Eq. (1), with adequate coefficients.
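As a concrete reference, the nearest-neighbour part of Eq. (1) can be evaluated on a periodic lattice as follows. This is a minimal NumPy sketch: the function name and array layout are our own choices, and the next-to-nearest-neighbour correction mentioned above is omitted.

```python
import numpy as np

def lattice_energy(n, J, D, D_hat, B):
    """Nearest-neighbour energy of the spin lattice in Eq. (1).

    n     : array of shape (L, L, L, 3), unit spins on a periodic cubic lattice
    D_hat : three 3-vectors, the DM structure for each lattice direction
    (Next-to-nearest-neighbour anisotropy corrections are omitted in this sketch.)
    """
    E = 0.0
    for i in range(3):
        nn = np.roll(n, -1, axis=i)                              # neighbour along direction i
        E -= J * np.sum(n * nn)                                  # ferromagnetic exchange
        E -= D * np.sum(np.cross(n, nn) @ np.asarray(D_hat[i]))  # DM term
    E -= B * np.sum(n[..., 2])                                   # Zeeman term, field along z
    return E
```

For a fully polarised state along $z$, only the exchange and Zeeman terms contribute, which gives a quick sanity check on the implementation.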

The DM interactions for different materials take different forms as functions of the lattice direction $\hat{e}_i$. Here, we select the structure that generates antiskyrmions, which is [26]:

$$\hat{\mathbf{D}}_x = \hat{x}, \qquad \hat{\mathbf{D}}_y = -\hat{y}, \qquad \hat{\mathbf{D}}_z = 0. \qquad (2)$$
When we extract data from the simulation to train the CNN, we will only use the $z$ component of the spins. Since Bloch/Neel skyrmions and antiskyrmions have similar distributions for the $z$ component, the choice we have made for the DM interaction does not lead to a loss of generality: the approach we use in Section III would perform similarly for Bloch/Neel skyrmions.

At a finite temperature $T$, the probability of finding the system in a state with energy $E$ is proportional to $e^{-E/T}$. As for the other parameters, $T$ is here dimensionless and proportional to the physical temperature of the system. To reproduce this probability distribution in our simulation, we use the Metropolis algorithm: the system is initialized in a random state and then updated iteratively by choosing a random spin and changing it to a new random direction with probability

$$p = \min\left(1, \; e^{-\Delta E/T}\right), \qquad (3)$$

where $\Delta E$ is the difference between the system's energy after the potential change and the current one. To speed up the process, we divide the lattice into three non-interacting sublattices and update all spins in each sublattice in parallel, using a GPU, as described in Ref. [26].
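A single Metropolis update can be sketched as follows. For brevity this illustration includes only the ferromagnetic and Zeeman contributions to the local energy difference; the DM and next-to-nearest-neighbour terms would enter in the same way. The function names are our own.

```python
import numpy as np

def random_unit_vector(rng):
    v = rng.standard_normal(3)
    return v / np.linalg.norm(v)

def metropolis_step(spins, J, B, T, rng):
    """One Metropolis update of a random spin on a periodic (L, L, L, 3) lattice.

    Sketch: only the ferromagnetic and Zeeman terms enter dE here."""
    L = spins.shape[0]
    r = rng.integers(0, L, size=3)
    old, new = spins[tuple(r)].copy(), random_unit_vector(rng)
    # Sum of the six nearest neighbours of site r (periodic boundaries)
    nb = sum(spins[tuple((r + d) % L)]
             for d in np.array([(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                (0, -1, 0), (0, 0, 1), (0, 0, -1)]))
    dE = -J * np.dot(new - old, nb) - B * (new[2] - old[2])
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        spins[tuple(r)] = new   # accept with probability min(1, e^{-dE/T})
    return spins
```

At low temperature, repeated updates drive a random initial configuration towards low-energy, field-aligned states, as expected from the acceptance rule.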

A rescaling of the parameters $J$, $D$, $B$ and $T$ by the same factor leaves the system invariant. We are thus free to fix the value of any of them without loss of generality; we pick $J = 1$. We set the remaining parameters $D$, $B$ and $T$ to constant values inside a given simulation, but vary them across different simulations.

We perform two types of procedures during a simulation, which we call thermalization stages and averaging stages. A thermalization stage consists of 1000 lattice sweeps, where a lattice sweep is an update of all of the spins in the lattice. An averaging stage involves taking the average of 200 lattice configurations, with 50 sweeps performed between each consecutive pair. In a simulation, we alternate thermalization and averaging stages, for a total of 20 of each. The number of sweeps we take in both kinds of stages is small compared to their typical size in other applications. As a result, the sequence of averaged configurations obtained from the averaging stages gives us a collection of snapshots showing the dynamical evolution of the system as the final state forms.
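The alternating schedule can be summarised in a few lines of Python. This is a structural sketch with hypothetical helper names: `sweep(lattice)` stands for one full lattice sweep performed in place.

```python
import numpy as np

def run_schedule(lattice, sweep, n_stages=20, therm_sweeps=1000,
                 n_configs=200, gap=50):
    """Alternate thermalization and averaging stages; return one averaged
    snapshot per stage. `sweep(lattice)` updates every spin once, in place."""
    snapshots = []
    for _ in range(n_stages):
        for _ in range(therm_sweeps):       # thermalization stage
            sweep(lattice)
        acc = np.zeros_like(lattice)
        for _ in range(n_configs):          # averaging stage
            for _ in range(gap):            # decorrelating sweeps between configs
                sweep(lattice)
            acc += lattice
        snapshots.append(acc / n_configs)
    return snapshots
```

The list returned by `run_schedule` is exactly the sequence of 20 averaged snapshots used below for training.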

III Data preparation and training

To generate our data set, we run 1000 Monte Carlo simulations with the values of $D$, $B$ and $T$ randomly chosen from uniform distributions over fixed intervals. As described in Section II, we obtain 20 equally-spaced 3D snapshots for each simulation. We perform an additional averaging stage over 2000 configurations for the final state. For each snapshot (and for the final state), we take a 2D slice of the lattice at constant $z$ and keep only the $z$ component of the corresponding spins. This is enough information to identify the relevant objects in each case. We refer to the set of 20 2D snapshots plus the final 2D image obtained in this way as a sample.

To each sample, we attach a set of labels, chosen from the five presented in Table 1 and representing the features observed in the final image. One or more of these labels can be assigned to the same sample. This is represented as a vector of 5 binary components for each sample, which we call the label vector: a component equal to 1 means that the corresponding label is present, while 0 indicates its absence. We remove 50 samples that do not present a clear structure with features matching any of the ones in Table 1. We then standardise the entire data set via RobustScaler from the scikit-learn package.
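For instance, the label vector for a sample can be built as follows (a sketch; the ordering of the Table 1 labels within the vector is our own choice):

```python
import numpy as np

LABELS = ["antiskyrmion", "meron", "helix", "ferromagnetic", "hexagonal"]

def label_vector(observed):
    """Binary multi-label vector: 1 if the feature is observed in the final image."""
    return np.array([int(lab in observed) for lab in LABELS])

# A sample showing antiskyrmions in a hexagonal arrangement:
label_vector({"antiskyrmion", "hexagonal"})   # -> array([1, 0, 0, 0, 1])
```

Because labels are independent binary components rather than mutually exclusive classes, mixed states are represented naturally.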

We perform a separate training procedure for each snapshot index from 1 to 20. Among the 950 samples we have, we use 190 for testing purposes and 20% of the remaining samples for validation. We augment the training data by a factor of 10 by applying a shift by a random amount in the horizontal and the vertical direction, wrapping around the edges. This has several related advantages. First, it increases the effective size of the data set by a factor of 10, the number of different shifts performed. Second, it enforces that the CNN learns the translational symmetry that the system possesses. Finally, it prevents over-fitting to particular characteristics localised at specific positions in the snapshots selected for training.
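The shift augmentation can be sketched with `np.roll`, which implements the periodic wrap-around directly (function names are ours):

```python
import numpy as np

def periodic_shift(image, rng):
    """Shift a 2D snapshot by a random amount in each direction,
    wrapping around the edges (the lattice has periodic boundaries)."""
    dy = rng.integers(0, image.shape[0])
    dx = rng.integers(0, image.shape[1])
    return np.roll(image, (dy, dx), axis=(0, 1))

def augment(images, labels, n_shifts=10, seed=0):
    """Replace each (image, label) pair by n_shifts randomly shifted copies."""
    rng = np.random.default_rng(seed)
    aug_x = np.array([periodic_shift(im, rng)
                      for im in images for _ in range(n_shifts)])
    aug_y = np.array([lab for lab in labels for _ in range(n_shifts)])
    return aug_x, aug_y
```

Since the shift is a symmetry of the periodic lattice, the label vector of each shifted copy is unchanged.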

The network architecture consists of a single convolution block followed by a fully connected layer. The convolution block starts with a convolutional layer with 16 filters and a stride of four in each direction of the 2D image, wrapped with a ReLU activation function and with its weights regularised via L2 with a weight of 0.01. It is followed by a max-pooling layer, which keeps the pixel with the maximum value within each square patch of pixels; the max-pooling layer is sandwiched between two batch normalisation layers. The flattened output of the convolution block is then fed into a fully connected layer with 16 nodes which, again, is wrapped with a ReLU activation function and regularised with the same L2 regulariser, and is followed by a dropout layer with 30% probability. The network provides a five-dimensional output wrapped with a sigmoid activation function.
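To make the data flow concrete, the following NumPy sketch traces a single snapshot through this layer sequence with random weights. Batch normalisation is omitted, and the 30x30 input size and 3x3 kernel are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def conv2d(x, w, stride):
    """Valid convolution of a 2D image x with filters w of shape (F, k, k)."""
    F, k, _ = w.shape
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    out = np.zeros((F, out_h, out_w))
    for f in range(F):
        for i in range(out_h):
            for j in range(out_w):
                out[f, i, j] = np.sum(
                    x[i*stride:i*stride + k, j*stride:j*stride + k] * w[f])
    return out

def maxpool(x, s=2):
    """Keep the maximum within each s-by-s patch of every feature map."""
    F, H, W = x.shape
    x = x[:, :H - H % s, :W - W % s].reshape(F, H // s, s, W // s, s)
    return x.max(axis=(2, 4))

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(snapshot, p):
    h = relu(conv2d(snapshot, p["conv"], stride=4))   # 16 filters, stride 4
    h = maxpool(h).reshape(-1)                        # batch norm omitted here
    h = relu(h @ p["w1"] + p["b1"])                   # dense layer, 16 nodes
    return sigmoid(h @ p["w2"] + p["b2"])             # 5 sigmoid label outputs

rng = np.random.default_rng(0)
snap = rng.standard_normal((30, 30))                  # illustrative input size
p = {"conv": 0.1 * rng.standard_normal((16, 3, 3)),   # illustrative 3x3 kernel
     "w1": 0.1 * rng.standard_normal((16 * 3 * 3, 16)), "b1": np.zeros(16),
     "w2": 0.1 * rng.standard_normal((16, 5)), "b2": np.zeros(5)}
probs = forward(snap, p)                              # five label probabilities
```

In a practical implementation this forward pass would be expressed in a deep-learning framework; the sketch only fixes the shapes and the layer order.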

For training, we use the cross-entropy loss function, which we minimize using the Adam algorithm [33], with a learning rate that decays by a factor of 2 whenever the validation loss does not improve for 20 epochs. We perform a maximum of 200 training epochs for each snapshot, with the training terminated early if the validation loss has not improved for 50 epochs. As mentioned above, to preserve the simplicity of the application, we use one architecture for all snapshots. However, since the last snapshots include more information than the initial ones, we observed overtraining within a small number of epochs when training on the earlier snapshots. Hence, we keep the network from before it starts to overtrain.

Label: Description
antiskyrmion: A circular region of spins anti-aligned with the external magnetic field.
meron: A wall of spins (anti-aligned with the external magnetic field) that ends.
helix: At least two contiguous walls of spins anti-aligned with the external magnetic field.
ferromagnetic: A region of spins aligned with the external magnetic field, either filling the full snapshot or at least having a size larger than the typical distance between antiskyrmions.
hexagonal: An arrangement of antiskyrmions forming a hexagonal lattice filling the entire snapshot.
Table 1: Labels used in the dataset. For each label (except hexagonal), if an object matching the description is observed, the corresponding element of the label vector is set to 1; otherwise, it is set to 0.

IV Results

In Fig. 1, we show how the final accuracy changes as the network is trained on different snapshots from the simulations. We have performed the training independently for each snapshot, with the same architecture for all of them. In order to estimate the bias originating from the optimisation algorithm, the training is repeated 10 times for every snapshot, with independent initialisations. We present the results as central points representing the mean accuracy over the 10 runs, together with error bars showing one standard deviation from the mean. Each label is assigned if the corresponding network output is above a threshold of 50% [1]. The colours distinguish the accuracies for the different labels defined in Table 1: antiskyrmion, meron, helix, ferromagnetic and hexagonal are represented by red, green, blue, purple and orange, respectively.

[1] The network output is not a calibrated certainty measure: an output of 90% does not mean that the network is absolutely certain of the labelling. Here, we simply use the network output as a threshold on a complex observable designed by the network.

Figure 1: The accuracy of the CNN predictions as a function of the index of the snapshot used. Each point represents the mean accuracy of ten independent training runs, with error bars of one standard deviation. A label is assigned if and only if the output probability is greater than 50%.

As Fig. 1 shows, the performance of the network for most labels improves as it is trained on snapshots generated with longer Monte Carlo times until the snapshot index is around 5–10. This automatically identifies the point at which the simulation can be stopped: at snapshot 10, the network can already predict each label with the same accuracy as for the final one.

For a fixed snapshot index, there is a hierarchy in the network's performance across the different classes. With its arrangement of aligned spins, the ferromagnetic label is the easiest to identify, giving an accuracy of around 99% for all snapshots. The antiskyrmion, meron and helix features come next, all with a similar final accuracy of approximately 95-96%. The detection of the hexagonal arrangement is less precise, with a maximal accuracy close to 90%. This is to be expected, since it is the only label that depends not on the local features of the images but on their global structure.
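The per-label accuracies quoted here correspond to a computation of the following form, with predictions thresholded at 50% and each label scored independently (a sketch; array and function names are ours):

```python
import numpy as np

def per_label_accuracy(outputs, truth, threshold=0.5):
    """Accuracy of thresholded network outputs, computed independently per label.

    outputs, truth: arrays of shape (n_samples, n_labels)."""
    preds = (outputs > threshold).astype(int)
    return (preds == truth).mean(axis=0)

# Two samples, two labels, for illustration:
outputs = np.array([[0.9, 0.1], [0.4, 0.7]])
truth = np.array([[1, 0], [1, 1]])
per_label_accuracy(outputs, truth)   # -> array([0.5, 1. ])
```

Scoring each label separately is what allows the hierarchy between label accuracies to be read off directly from Fig. 1.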

We illustrate the evolution of the network predictions for three samples in Fig. 2. The first row shows the remarkable performance of the network for the early snapshots: to the naked eye, the first snapshot looks like it contains antiskyrmions with strong deformations, which could even be tagged as merons, and no hexagonal structure. Intuitively, it is not clear from this what the evolution will be. Despite this, the network can predict that the final phase will contain only antiskyrmions, not merons, arranged in a hexagonal lattice. Finally, we stress the importance of training the network on the snapshot index on which it is expected to be used: the network trained on the final image, which achieves high accuracy in its own domain, performs worse here than the one specialised on the first snapshot index, giving the incorrect [antiskyrmion, meron] prediction.

Figure 2: Examples of snapshots from different simulations. Each row is divided into three columns, showing the first snapshot, the tenth snapshot, and the final image. The bottom label on the final image shows the truth label of the image; the remaining labels show the mean CNN outputs with output probability greater than 50%.

In the second row of Fig. 2, one can see the first snapshot with structures that the noise makes hard to classify by hand. One of them appears to be a wall of spins in the bottom left of the image, which could be misidentified as a bimeron being formed. This ends up disappearing in later snapshots. The network correctly predicts this by attaching a single [antiskyrmion] label to it. However, in this case, one needs to wait until snapshot 10 for the network to predict the hexagonal arrangement of antiskyrmions. Finally, in the third row of Fig. 2, we show an example with meron/helical features which is correctly classified from the beginning, despite the noise present there.

As an example of the practical application of our approach, we use it to generate a phase diagram from the results of simulations that are stopped at early times, and compare it to the results of the full simulations. First, we generate samples in the same way as the training set, but this time for an equally-spaced grid of points in the $(T, B)$ plane at a fixed value of $D$. The temperature varies between 0.05 and 1.25, in steps of 0.05, while the external magnetic field goes from 0 to 0.4, in steps of 0.02. Then, we apply the previously-trained networks for snapshot indices 6 and 20 to the corresponding snapshots and generate a phase diagram. To obtain smoother borders for each region in both diagrams, we take the average of the predictions over blocks of neighbouring points in the grid. We then assign a colour/label to each point as follows:

  • Points with a network output above 0.5 for the antiskyrmion label are coloured in red and labelled “Antiskyrmion”.

  • If the outputs for both the antiskyrmion and the ferromagnetic labels are above 0.5, both antiskyrmions and regions of significant size with spins aligned with the external magnetic field are found, which means that the antiskyrmions are not closely packed. Thus, the point is labelled “Antiskyrmion gas”, and a lighter red colour is used.

  • Points whose only network output above 0.5 is the ferromagnetic one are labelled “Ferromagnetic” and assigned a grey colour.

  • Similarly, those points whose only network output above 0.5 is the helical one are labelled “Helical” and assigned a blue colour.

  • Finally, we select a lower threshold of 0.25 for the meron output. If this is reached, independently of the other labels, the corresponding point is labelled “Meron” and coloured light salmon. This is done to show a region where merons might appear, in conjunction with antiskyrmions, helical-phase walls, or both. The choice of a lower threshold is made to show this region more clearly: for a threshold of 0.5, it is smaller, and it almost disappears at snapshot index 19.

The points that do not fit any of these criteria are left in white colour, signalling that they could not be classified confidently in any of these phases.
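The colouring rules above amount to the following decision function (a sketch; the dictionary keys follow the Table 1 labels and the function name is ours):

```python
def assign_phase(out, thr=0.5):
    """Map the averaged network outputs of a grid point to a phase-diagram label.

    `out` maps each Table 1 label to its mean output; None means unclassified."""
    if out["meron"] > 0.25:                 # lower threshold, applied first
        return "Meron"
    if out["antiskyrmion"] > thr:
        return "Antiskyrmion gas" if out["ferromagnetic"] > thr else "Antiskyrmion"
    above = {k for k, v in out.items() if v > thr}
    if above == {"ferromagnetic"}:
        return "Ferromagnetic"
    if above == {"helix"}:
        return "Helical"
    return None                             # left white in the diagram
```

Applying this function to every grid point, and colouring accordingly, reproduces the construction of the diagrams in Fig. 3.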

We show the resulting phase diagrams in Fig. 3. Since the schedule consists of thermalising at constant values of $T$ and $B$, starting from a random lattice configuration, these diagrams correspond to what may be obtained experimentally by rapidly cooling from a high temperature directly to the target point.

Figure 3: Top: phase diagram for a constant-$(T, B)$ schedule, obtained from short (snapshot 6) simulations followed by network prediction. Bottom: the same diagram, using instead long (snapshot 20) simulations. The main regions of a typical phase diagram (pure helical, pure ferromagnetic, and the complete region where antiskyrmions are found) are identified correctly from the early stages of the simulation. However, longer simulations are needed to accurately resolve the regions of occurrence of finer details, such as the mixed states that our approach allows us to identify.

We first notice the following differences between the phase diagrams for the two selected snapshots: the shrinking of the “Meron” region and the appearance of the “Antiskyrmion gas” one. The shrinking of the “Meron” region can be explained by the presence, in the early stages, of structures that are classified as merons but later disappear in favour of antiskyrmions. This is partially mitigated by training the network on the early snapshots, as shown in some of the examples in Fig. 2, but the effect persists to a smaller degree. Another feature that can only be found at larger Monte Carlo times is the mixed antiskyrmion-ferromagnetic state, the “Antiskyrmion gas”, which is not detectable at early stages due to noise. We conclude that the simulations need to be run for the entire time we considered to resolve these details.

However, the rest of both diagrams approximately agree. Concretely, the regions for the helical and ferromagnetic phase and the regions where antiskyrmions are detected are similar. This means that to identify where these features emerge, one only needs to simulate snapshot 6. Since these are the most critical features, and the only ones present in a typical phase diagram, running short simulations and using the CNN to predict a final state is a viable option for many applications.

V Conclusions

We have shown that a multi-label machine learning approach allows us to identify complex structures and mixed states in the results of 3D Monte Carlo simulations of spin lattices with DM interactions. We have trained a CNN using 2D lattice slices to detect antiskyrmions, merons, helical-phase walls, regions with ferromagnetic arrangements of spins, and hexagonal lattices of antiskyrmions. We have only used the $z$ component of the lattice spins, which is similar among antiskyrmions and Bloch/Neel skyrmions. Although we have applied this approach to a version of the DM interactions that supports antiskyrmions, it would perform similarly for skyrmions of both types.

In addition to directly identifying these features, we have used this framework to predict their emergence from the early stages of the simulation. The CNNs trained on the first few snapshots, with labels given by the final configuration, can predict final features even in cases in which it is not intuitively apparent from an inspection by the naked eye of the corresponding images.

One of the applications of the early-snapshot CNNs is in shortening simulation times. Thanks to them, one can stop the simulation at early stages and predict the relevant features. As an example, we have constructed phase diagrams for snapshots 6 and 20 at a fixed value of $D$. Although long simulation times are needed to resolve fine details involving mixtures of merons and antiskyrmions, or whether antiskyrmions are in a crystal or a gas phase, we find that the main phases, antiskyrmion, helical, and ferromagnetic, can be detected with significantly shorter simulations.