Spin-transfer torque magnetoresistive random access memory (STT-MRAM) has been proposed as a non-volatile replacement for random access memory that offers high speed, low power consumption, non-volatility and unlimited endurance [1, 2, 3, 4, 5]. One of the primary obstacles to its widespread deployment is physical scaling, due to an increased error rate that accompanies smaller volumes of storage cells. A memory device should switch quickly and reliably when switching is intended and otherwise maintain its current state. However, thermal fluctuations in the magnetization orientation can sometimes induce unwanted switching during either storage or an attempted read event, or failure to switch during an attempted write event. These contribute to the write soft error rate (WSER), read soft error rate (RSER), and retention failure rate . The expected values of WSER and RSER in STT-MRAM should not exceed the order of without error correction . Due to the importance of these extremely small rates in quantifying the viability of experimental STT-MRAM configurations, analytical and computational techniques that facilitate their calculation are critically important.
One approach makes use of the Fokker-Planck equation (FPE) describing the time evolution of the switching probability [7, 8, 9]. In the macrospin approximation, treating each STT-MRAM bit as a single magnetic domain, the FPE can be solved directly or can be further approximated by the Brown-Kramers formula, which overestimates the RSER for short read times . However, both the effects of spatial variations in the magnetization across a single memory cell and interactions between adjacent cells obviously cannot be captured in the macrospin approximation. These effects increase in importance as the size of each cell exceeds the scale of above nm in lateral size and must therefore be taken into account to support development of magnetic nanodevices at this scale . Direct numerical simulations of the FPE increase exponentially in computational cost as the dimension of the coupled system of macrospins is increased, and this high cost is exacerbated by the presence of boundary layers associated with the small size of thermal fluctuations. Efficient computational methods are therefore needed to provide a means of determining these small switching probabilities and rates.
The most common approach to computing switching probabilities uses sampling to provide an empirical estimate of the quantity of interest. However, the extremely low switching probabilities and rates relevant to micromagnetic devices make naive Monte Carlo studies essentially impossible. A common approach to recover the tails of the probability distribution from Monte Carlo simulations is via extrapolation (see, e.g.,[11, 12, 13]). However, this may introduce large uncontrolled inaccuracies due to failure of the fitting form to capture the asymptotic behavior of the probability distribution in the small noise limit [14, 15]. Alternatively, variance reduction techniques such as importance splitting attempt to concentrate the samples generated on those with a higher likelihood of registering a rare event of interest [16, 17, 18, 19, 20].
Here, we demonstrate that accurate switching probabilities and error rates of STT-MRAM devices can be computed efficiently using another variance reduction technique known as importance sampling (IS) in an ensemble of biased Monte Carlo simulations. We intentionally choose a particularly simple micromagnetic setting to clearly illustrate the IS methodology in the context of STT-MRAM modeling. The main point of this paper is to demonstrate that with the help of IS the events with extremely low probability of occurrence can be accessed via direct Monte Carlo sampling with an additional post-processing step. This, together with the simplicity of its implementation could make the IS based approach a powerful tool in assisting the design of the next generation of spintronic nanodevices.
Our paper is organized as follows. In Section II, we review IS and the strategies for optimal choices of bias. In Section III we formulate the stochastic micromagnetic model for a single macrospin or a system of exchange coupled macrospins, and then show how to apply IS to sample switching probabilities in the macrospin approximation. In Section IV, we carry out IS simulations for a single macrospin and a coupled system of two indentical macrospins with different choices of biases to demonstrate the power and efficiency of IS in the context of STT-MRAM applications. Finally, in Section V we briefly summarize our findings.
Ii Importance sampling
Suppose we wish to estimate the probability
of a system driven by random variableproducing an event with indicator function . Here, denotes the expected value with respect to the density , such that
A naive Monte Carlo method uses independent draws from to approximate according to
The coefficient of variation (CV), which measures the statistical error of the sample, is given by
For , smallness of requires . In the present case, this necessitates an impractically large number of simulations to produce a reasonable estimate of the probability. The idea behind IS to sample from an alternative probability density that depends on a bias
chosen to increase the likelihood of the event of interest. An unbiased estimator is then recovered by weighting each result according to
where is called the likelihood ratio, and assumes that whenever The resulting CV is then given by
where denotes the expectation with respect to and . Expression (5) suggests that a “good” bias to use in IS keeps close to 1.
In many cases of interest, including the one discussed here, the underlying physical model is described by a stochastic differential equation (SDE)
where , for some , is a randomly evolving state variable, is its deterministic drift, is the noise strength matrix, is an infinitesimal increment of an -dimensional Brownian motion, and is the noise strength. In many situations of interest the noise is weak: . If we wished to observe the state exhibiting a behavior that is very rare for typical realizations of the Brownian motion , we would have to simulate this equation a very large number of times, even more so to produce an accurate probability estimate from these empirical observations. In the case of SDEs, the IS technique makes rare events happen more often by introducing a bias to the mean of the noise increment , such that the coefficient of variation of the estimate is reduced at the same time [21, 22]. This provides new paths that evolve according to
where stands for the Euclidean inner product in , is the realization of the noise that produced , and the last integral is understood in the Itô sense. This expression may then be incorporated into the IS estimator (4) to recover an unbiased probability estimate. We emphasize that in the limit one recovers from (4) the exact value of for the original, unbiased process. The effect of good biasing is simply to concentrate the runs sampled in (4) on those with highest likelihood of activating the indicator function, thereby producing an accurate estimate of with vastly fewer biased Monte Carlo runs.
Over finite-time horizons, an effective bias can be obtained by minimizing the Freidlin-Wentzell large deviation action. Namely, for a given time horizon , current time , current state and the set of targeted outcomes one looks for the minimizer of the functional
Over infinite-time horizons, i.e., when , a convenient reparametrization allows the action in (9) to be minimized with respect to arclength rather than time. In this case, one can choose , where is obtained from the minimizer of the functional ,
among all absolutely continuous paths satisfying and . Here . The infinite-time bias is then given by 
One of the strategies discussed below is the use of IS with infinite-time bias functions to obtain switching probabilities over finite time horizons. This strategy is based on the observation that, when the characteristic speed obtained by dividing the domain radius by the time horizon is small relative to the maximum speed of the infinite-time minimizing path, the finite-time and infinite-time minimizing paths are nearly identical outside of small neighborhoods around the dynamic fixed points. As will be seen in Sec. IV, this strategy is effective for intermediate times but does not correctly promote the long periods spent near the stable fixed point in the true dynamics. This manifests in reduced efficiency of the IS strategy in these cases. To address this phenomenon, we turn off the biasing for values of within a diffusion length of the stable fixed point, which leads to a significant improvement of sampling efficiency.
Iii Micromagnetic framework and thermally induced switching
We consider a region occupied by a ferromagnetic film with in-plane shape and thickness , i.e., , characterized by saturation magnetization , exchange stiffness and an in-plane uniaxial anisotropy constant , at temperature in energy units. To model the free layer of an in-plane STT-MRAM cell, we use the stochastic Landau-Lifshitz-Gilbert equation [28, 29]
for the unit magnetization vector, written in the Landau-Lifshitz form:
where time and lengths are measured in the units of and , respectively, is Gilbert damping, is the gyromagnetic ratio and is the spin-transfer torque. The effective field is given by
is the leading order thin film micromagnetic energy measured in the units of , and is the quality factor . The thermal fluctuation term
is a delta-correlated, suitably regularized three-dimensional spatiotemporal Gaussian white noise, with noise strength
The spin-transfer torque is given by
where and are dimensionless Slonczewski and field-like torque strengths . Here is the density of electric current passing perpendicularly through the film, is the elementary charge, is the spin polarization efficiency, is the relative strength of the field-like spin torque, and is the spin-polarization direction. In this paper, we consider , i.e., when the spin current is polarized along the easy axis in the film plane, as is the case in the basic in-plane spin valve [2, 34, 35].
) is an ordinary differential equation. In this case, where is the area of in the units of , and the effective field is given by
where , and is a three-dimensional Brownian motion. The noise coefficient is consistent with the Gibbs distribution, in which plays the role of the dimensionless temperature.
When the dimensionless parameters satisfy
i.e., in soft materials with relatively high damping and low spin torques, the magnetization is always constrained to lie almost entirely in the film plane [36, 37, 38]. In this case the system (13) may be simplified to an equation for the angle such that (see Appendix):
, , and the unit of time is now . We point out that even though (20) was obtained for an in-plane STT-MRAM cell, exactly the same equation arises in the modeling of perpendicular cells . Therefore, a direct comparison with the results obtained by Butler et al. , who used the analysis of the Fokker-Planck equation is also possible.
Equation (20) with will be used as the simplest example of a stochastic micromagnetic model with bistability, for which several IS biasing strategies will be illustrated. A more realistic model of an STT-MRAM cell would need to incorporate spatial heterogeneity within the cell, which may be captured by considering a system of exchange-coupled macrospins associated with the magnetization in each of the polycrystalline grains. If each grain has a dimensionless area , , and are such that the magnetization in each grain is , then (20) may be generalized to [39, 40, 41]
using Heisenberg exchange with dimensionless strengths for the interactions between the grains, with being uncorrelated Brownian motions. Note that this equation is a stochastic version of a gradient system governed by an effective potential
and obeying detailed balance. The coupling coefficients are non-zero only for the nearest neighbors and may in principle be determined from the geometric characteristics of the individual grains. For simplicity, in this paper we will limit ourselves to the consideration of the case of two identical macrospins only, which may correspond, e.g., to exchange-coupled synthetic bilayers .
The single macrospin drift term defined in (21) has stable fixed points and , separated by unstable fixed points , where . Therefore, for a trajectory starting close to at a switching event for the time horizon would be defined as one in which is close to at . For the purposes of this paper, we consider a switching event to have occurred if for some , starting with , i.e., the macrospin changes its direction along the easy axis. Similarly, for the system of two coupled macrospins governed by (22) and starting with , we define a switching event to have occurred if for some , i.e., at least one macrospin changes its direction along the easy axis.
The exact finite time switching probability for a single macrospin exhibited by (20) can be computed by solving the backward Fokker-Planck equation for the probability that a trajectory starting at a given value of reached the value of by time . This probability satisfies
for , with initial and boundary conditions
respectively. In particular, the probability of having switched by time , given the initial state is equal to .
As an alternative to solving (24), we incorporate IS in the estimation of by sampling controlled dynamics
among all with and . For sufficiently large time horizons, we set instead , where the ininfite-time bias is obtained from the minimizer of the infinite-time action . In the single macrospin case the latter is simply given by a straight line segment, resulting in a particularly simple explicit form of the bias:
To account for switching events, we stop the trajectory at time as soon as the switching criterion is satisfied, or otherwise set . The likelihood ratio is recovered from (8), which in this case is
where either or , depending on whether we use the finite- or infinite-time bias function, respectively, and is the solution of (26) with a particular realization of the noise.
The following sections describe results obtained from IS simulations of macrospin and coupled-spin systems using physical parameters drawn from Ref.  for the purpose of comparison. All simulations use the Euler-Maruyama method with a fixed time step .
The IS results presented below are obtained using bias functions based on either finite- or infinite-time minimizers of the action given by (9). Infinite-time bias functions are based on (28) in the macrospin case, and therefore do not require additional computation. Infinite-time bias functions for the coupled-spin system are obtained by minimizing the action in (9) through the geometric minimum action method (GMAM) with 50 gridpoints . Finite-time bias functions are obtained by minimizing the action in (9) through a combination of Newton’s method for the associated Euler-Lagrange equation and the improved adaptive minimum action method , with gridpoints in the macrospin case and in the coupled-spin case.
Iv-a Single macrospin
The discretized version of (26) reads explicitly
where , for , and
are independent and drawn from the standard normal distribution. The value ofis chosen so that either for all and , or , i.e., we stop the simulation if a switching event occurs at time . The likelihood ratio corresponding to (III) is then given explicitly by
The figures below are generated using switching probabilities and their coefficients of variation computed by applying formulas (4) and (5), respectively, to ensembles of runs, where for runs that generate a switching event and otherwise.
Figures 1 and 2 show the RSER as a function of time for seven values of between and with thermal stability factors and , respectively. Runs at both temperatures are included here to facilitate comparison with the results presented in Ref. . Both show a comparison between numerical solutions of backward FPE (24) and IS simulations for the macrospin model. For in Fig. 1 and in Fig. 2, infinite-time bias functions are used for IS while for in Fig. 1 and in Fig. 2, finite-time bias functions are used. Agreement between the FPE and IS results is excellent throughout the range of times and currents, and the IS results are internally consistent between the finite- and infinite-time bias functions used at and in Fig. 1, and used at in Fig. 2.
Fig. 1 (b) shows the CV for the IS estimates in Fig. 1 (a), and Fig. 2 (b) shows those for the estimates shown in Fig. 2 (a). The CV values for the IS estimates range from approximately to . We note that with an inappropriate choice of bias the CV can be an imperfect measure of accuracy for Monte Carlo estimates using variance reduction . However, this is precluded by our choice of an asymptotically optimal bias function that is based on large deviation theory [46, 24], as can also be seen from the excellent agreement with the solutions to FPE (24). The low CV values obtained for such extremely small probabilities with moderate sample size of are therefore a clear demonstration of the efficiency of the bias functions used here.
In both figures the CVs are observed to increase when is decreased from to , indicating that the infinite-time bias function used for these runs becomes progressively less efficient at capturing the switching events as the time horizon shrinks. The application of finite-time bias functions for smaller times lowers the CVs as expected. Furthermore, Figs. 1 (b) and 2 (b) show a similar pattern in the CVs generated by IS with infinite-time bias functions, where the CVs also increase for large times . Fig. 3 illustrates this sudden increase in CV in the context of longer times, as well as a histogram of switching times, and the time evolution of the spread in values of likelihood ratio.
This decrease in efficiency of the infinite-time bias function for large but finite time horizons is due to the fact that exits have a natural finite time scale dictated by diffusion near the fixed points with the action minimizer bridging the gap between them. When the time horizon of the simulation is large relative to this time scale, it allows for exit events that hover near the stable fixed point before exiting just prior to the horizon time. These events occur with considerably higher likelihood under the unbiased dynamics than under the biased dynamics, leading to a very large likelihood ratio that causes them to dominate the CV computation. Since the diffusion time grows as the noise strength decreases, this phenomenon can be regarded as a finite-noise effect, and it indeed vanishes as . We address this issue for finite noise by turning off the biasing near the stable fixed point, i.e., for , with the results plotted for different values of in Fig. 4. It is clearly seen that as increases, the anomalous behavior for large horizon time is mitigated, at the expense of sampling efficiency for small horizon times.
Iv-B Two coupled identical macrospins: model
To demonstrate that the IS method can also be effective in coupled systems, we simulate two spins with identical volume and dynamics given by (22), which in this case is explicitly
where is the ferromagnetic exchange coupling strength favoring parallel alignment of the two spins. The initial conditions are . Recall that as a switching criterion we adopt that at least one of the angles reaches in absolute value.
At finite temperature, the rare events of switching for the coupled spin system occur along the maximum likelihood paths, which are also the minimum energy paths of the system. When the system undergoes a transition, it switches by coherent rotation for strongly coupled spins, asymmetric coherent rotation for weakly coupled spins, or single particle reversal for extremely weakly coupled spin . More precisely, in the limit of infinite coupling strength , the coupled spin system collapses to the macrospin model with , and the most probable path terminating at is identical to that of a single macrospin. As the coupling strength decreases, the dynamics of the coupled spin system changes significantly, ultimately leading to spins that evolve independently. Note that in -dimensional coupled spin systems, the sequence of bifurcations from single macrospin dynamics to -fold macrospin dynamics as decreases from infinity  further exacerbates the challenge in finding appropriate bias functions.
Iv-C Two coupled identical macrospins: biased dynamics
In contrast to the single macrospin case, for two coupled macrospins an exact analytical bias function is no longer available even for infinite-time biasing. Therefore, it is necessary to obtain numerical bias functions by minimizing the Freidlin-Wentzell action (9) with terminal condition for some .
For finite-time bias the action functional is given explicitly by
and the corresponding finite-time bias is
where is the minimizer of satisfying and .
For infinite-time bias, we minimize
and express the bias as
Fig. 5 (a) shows IS estimates of switching probabilities obtained using finite-time bias functions and infinite-time bias functions with a suitable cutoff near the origin applied to the coupled system with non-dimensional temperature and coupling strength . The coupling strength is an example of strong coupling, for which the spins rotate coherently along optimal switching paths. The open circles and open diamonds denote estimates generated by IS using infinite-time and finite-time bias functions, respectively. With sampling size of , the infinite-time bias functions allow us to sample switching probabilities for with ranging from to . For , finite-time bias functions are used with IS. For infinite-time biasing the bias is switched off, i.e., is set to zero, when the effective potential from (23) of a configuration falls below that of with chosen to minimize the CV.
Fig. 5 (b) shows the CVs becoming larger as time decreases, indicating that the infinite-time bias functions become less efficient. The effectiveness of using finite-time bias functions is evident in the CV values in Fig. 5 (b), where the CV values with open diamonds at time are much smaller than the CV values with open circles at times and . This improved effectiveness comes with a cost of computing updated finite-time bias functions at each time step.
Similar results for a weakly coupled system are shown in Fig. 6. Fig. 6 (a) shows the switching probabilities as a function of time for different read currents with non-dimensional temperature and coupling strength . The coupling strength is an example of weak coupling, for which the spins rotate incoherently along optimal switching paths. With a sampling size of , IS using infinite-time bias functions allows us to sample switching probabilities for with ranging from to . As in Fig. 5, Fig. 6 (b) shows a decrease in efficiency of IS using infinite-time bias functions as decreases.
Fig. 6 (a) also shows switching probabilities generated using naive MC simulations with a sample size of for read current at various pulse durations. Naive MC simulations at this sample size fail to accurately capture probabilities less than while IS is able to estimate probabilities as low as . This is reflected in Fig. 6 (b) where the CV is seen to diverge as the probability estimate decreases. Finally, a few representative switching trajectories corresponding to the results in Figs. 5 and 6 are shown in Fig. 7.
Fig. 8 shows a comparison between the CV for IS with sample size of and the CV for naive MC with sample size of for . For time the CVs for naive MC exceed the CVs for IS, by a factor ranging from 1 to 6 as the value of is decreased. This means that the number of IS samples required to achieve an estimate with the same or better accuracy than an estimate generated by naive MC is smaller by several orders of magnitude. For example, for switching probability by time , where the CV ratio is about , al least naive MC samples are required to achieve the same accuracy of an IS estimate generated using only . This contrast is even more stark for smaller applied currents with much lower associated switching probabilities, where the computational effort required by naive MC simulations is prohibitive. Sampling-based probability estimates are only available using IS at these parameter values.
Iv-D Failure to switch (WSER)
In this section, we estimate the WSER using MC and IS for both the macrospin model and two coupled-spin system, where the current is set sufficiently high as to drive the origin unstable, such that a switching event is expected to occur in the majority of random realizations. The two-spin state is considered to have switched when for some
Fig. 9 shows a comparison between the numerical solution of the FPE and simulation results using IS for a single macrospin. Independent realizations of the macrospin evolution were computed with the same initial conditions, currents , and , and independent thermal noise with . The estimates obtained using IS samples show good agreement with the numerical solution of the backward FPE for moderate and long times. For the two-spin system, IS samples are used to estimate non-switching probabilities.
In conclusion, we have applied IS to estimate error rates in reading and writing spin-torque memory devices in the macrospin limit and for the case of two coupled spins, using a variety of applied currents and thermal stability factors. We have demonstrated how IS is able to compute probabilities well below those computable using naive MC simulations, while producing accurate estimates with improved efficiency in cases where the probability is computable using naive MC but still very small. Depending on the time horizon of the read or write event relative to the drift dynamics of the spin system, IS simulations using infinite- or finite-time bias functions are appropriate. Infinite-time bias functions require only moderate computational cost; however, the increased cost can be significant in computing finite-time bias functions. Further improvement of the infinite-time biasing can be achieved by introducing a threshold turn the bias on only some distance away from the metastable equilibrium whose thermal stability is investigated.
In this appendix we derive a reduced model for the macrospin dynamics of the in-plane magnetization component in the regime of (19), which is done in the spirit of [36, 37, 38]. Our starting point is (13) for a macrospin , in which satisfies (18). Introducing cylindrical coordinates, we can write
Then, changing the unit of time to we arrive, after some algebra and a change from Stratonovich to Itô formulation, to the following Itô SDEs for and :
where and are two uncorrelated Brownian motions and
We now assume that , while , and , with , and of order unity. In this case any deviations of from zero are strongly suppressed. Therefore, linearizing the above equations in and introducing , we obtain