Realistic complex systems often involve multiphysics and multiscale processes. Their effective mathematical modeling and simulations are challenging subjects under current study by many scientists. Since the invention of calculus, continuum models in the form of differential equations have been effective models of many physical processes, particularly when the underlying physical quantities are smooth and slowly changing. Mathematical descriptions based on differential operators connect instantaneous rates of changes of relevant quantities together, thus are local, since they involve only local information in an infinitesimal neighborhood. Partial differential equations (PDEs) are also used to model rapid and abrupt changes. An example relevant to our discussion here is given by linear second order elliptic equations with highly-oscillatory and multiscale coefficients. Such equations are popular models of diffusion and transport processes in highly heterogeneous environment, though their analytical studies and numerical simulations are more difficult to carry out, in comparison to equations having smoothly varying coefficients. As possible remedies, multiscale techniques like homogenization methods have been developed to capture effective properties with either analytical means or computational algorithms based on numerical homogenization[6, 30, 31, 46, 49, 60]. Meanwhile, in recent years, there have been growing interests and capabilities in the nonlocal modeling of complex processes that exhibit singularities/anomalies and multiple scales. Examples include nonlocal elasticity models like the Eringen model and peridynamics [8, 37].
The main aim of this work is to explore the connections between the subjects of homogenization and nonlocal modeling and the relevant computational issues. Homogenizations of multiscale problems, both analytical and numerical, are effective ways to achieve model reduction and coarse graining, while nonlocality is a generic feature of the latter. We intend to formalize this widely accepted understanding so as to promote the cross fertilization of ideas from the two research fronts. For example, we hope that homogenization may help characterizing the nature and form of the nonlocal interactions hypothesized in nonlocal models while studies of nonlocality may help developing more effective numerical methods for homogenization.
Our discussion is done through a review of some existing works in the literature and some illustrative new examples. It ranges from the study of deformation of lattice models to transport, diffusion and wave propagation in highly heterogeneous media, with the goal of building connections between nonlocal surrogate models and homogenization, model reduction and multiscale modeling. We largely limit ourselves to simple linear models to convey the key messages while pointing out some challenges and open questions concerning more complex and nonlinear systems for future research. We start with some general earlier results relating local and nonlocal equations, which we have not seen in the homogenization or nonlocal literature.
1.1. Local and nonlocal operators: general theorems and dispersion relation
A differential operator, acting on a function , is a linear operator that can be performed locally by taking infinitesimal changes of , offering its local character. In contrast, nonlocal operators can mean all operators that are not local. Let us review some general theorems on linear operators, and see that differential operators are the only local operators, and all the other linear operators are nonlocal in nature. Let and be open sets of and respectively. Any continuous function can be viewed as the kernel of an integral operator ,
Conversely, the way to characterize operators having such a kernel is done within the theory of distributions. The Schwartz kernel theorem says that there is a one-to-one correspondence between distributions and continuous linear maps from and . denotes the space of smooth functions with compact support, and denotes the space of distributions.
One may use the support of the kernel to characterize the range of interactions of : for that have singular support at , is a local operator. For example, leads to the identity map while where denotes distributional gradient operator leads to . On the other hand, is a nonlocal operator if the support of goes beyond the diagonal. For example, leads to a shift operator . In fact, it was shown in [42, Chapter 5] that the kernel of a continuous linear map is supported by the diagonal if and only if is in the form of
where and the sum in (1.1) is locally finite.
An alternative description of an operator being local is the property that the support of is a subset of the support of , for any suitable function . An interesting discussion, initiated by Peetre  is that differential operators can be characterized through such notion of locality. The original Peetre theorem says that a linear operator with is in the form of (1.1) with coefficient . Although the Peetre theorem looks like a variant of the Schwartz’s work presented previously, it is actually a deeper theorem. First, it says that if the operator maps to a better space, then the coefficient in (1.1) lives in a better space. Second, no continuity of the map is assumed in the Peetre theorem, so the continuity is an automatic fact out of locality. So the Peetre theorem characterizes differential operators only through locality. Generalizations of Peetre theorem can be found in [10, 64]. In particular, the conclusion of locally finite summation in (1.1) can actually be replaced with globally finite summation, namely the sum in (1.1) is for with a certain fixed number .
From the above discussions, we see that a linear operator is either local or nonlocal. Moreover, local operators are differential operators of finite order. Let us also remark that the notions of local and nonlocal in practical modeling are relative to the objects that they operate on: the equation is local, but can be seen as nonlocal. Indeed, generically, model reductions of local models often lead to nonlocality and nonlocal models often get localized by closure relations or the introduction of additional state variables . More discussions on these aspects can be found in later sections.
Based on the Peetre theorem, we have an important observation on the Fourier symbol of , or alternatively the dispersion relation associated with . For a spatially homogeneous operator , that is, the kernel given by the Schwartz kernel theorem is translation invariant. Peetre’s result shows that Fourier symbol of must be a polynomial of the wave number, if the operator is local. Furthermore, once the Fourier symbol of is not a polynomial of the wave number, the operator with an translation invariant kernel must be nonlocal. This simple fact out of the Peetre theorem will be used again and again in section 2.
To end this subsection, let us also mention that the generalized Laplacian, or more precisely, the general theory of Dirichlet forms is given in the work of Beurling and Deny , and it is shown that a Dirichlet form may involve local, nonlocal and self-interacting part. It is then demonstrated by Mosco  that the limits of Dirichlet forms of local differential operators in general may not be local forms. Therefore, given the generic nonlocal limiting forms of the local problems, one may anticipate that nonlocal limits and related numerical algorithms could play important roles in both theoretical analysis and practical applications. In the next, we will demonstrate that nonlocality is a generic feature in the context of homogenization. More discussions are followed in section 4.
1.2. Homogenization and multiscale modeling
Homogenization theory is concerned with the averaged behavior of heterogeneous differential equations with rapidly oscillating coefficients. In other words, homogenization aims to obtain the macroscopic or “homogenized” or “effective” equations from systems with a heterogeneous structure at a microscopic scale. Mathematically, the homogenization problem could be described as follows. Let be a family of differential operators indexed by the small parameter and has coefficients that are oscillatory on the -scale. Consider the problem
where is given. The question is to find a homogeneous effective equation
such that in some topology as .
We will consider two classical cases of second order differential equations with oscillatory coefficients. The first is one dimensional and the corresponding homogenized equation will be used in section 2.2 to derive a homogenized equation, which is nonlocal. The second is a multidimensional elliptic equation, which will be used as a model in the discussion of numerical homogenization in section 3.1. Let us illustrate our point by the simple one-dimensional example given also in . Consider the problem:
where and is 1-periodic. Notice that
The solution of (1.4) is exactly given by
where is determined by the boundary condition at and it is given by
By using the fact that , the limit of as is exactly given by
where is the constant that gives . It is now obvious that (1.5) satisfies the homogenized equation given by
where . This result will be used in section 2.2 to derive a nonlocal homogenized problem.
Consider a general elliptic problem
where the matrix is assumed to be symmetric, bounded measurable and -periodic, where denotes the unit cube. In addition, satisfies the uniform ellipticity condition. By the classical homogenization theory (), the homogenized problem is also an elliptic problem
with coefficient matrix given by
where is obtained by solving the cell problem
This is the classical homogenization case where the original heterogeneous differential equation (1.6) is local and the resulting homogenized equation (1.7) remains local. The process is however nonlocal and involves solving the cell problem (1.9) over a finite domain.
More generally, theoretical tools such as -, -, and -convergence are developed for the analysis of broader settings of homogenization, such as those without assumption of periodicity and those involving nonlinearities, see [20, 46, 69]. We note that the nonlinear homogenization theory is less developed. Already in  nonlinear homogenization is discussed but the analysis follows the standard linear derivation quite closely, resulting in local PDE effective equations. For more modern examples see, for example, the survey . The homogenized equations there are however also based on local operators.
2. Nonlocal homogenization limit
One key motivation of this work is to further explore the connection between nonlocality and homogenization/reduction of complex models. We will present in this and the following sections specific examples for which the homogenized operators, in contrast to (1.5) and (1.7), are nonlocal even if the original operators are local. The local limits exhibited in (1.5) and (1.7) may thus be seen as very special cases, although these are also cases that have been extensively studied in numerical homogenization . We now present some examples where nonlocality is essential, in the context of homogenization. Other examples can be found in, for instance, [5, 11, 12, 75].
2.1. Memory effect through homogenization
Homogenization problems of differential equations often involve coefficients that are highly oscillatory and only weakly convergent. Nonlocality may thus arise due to the fact that nonlinear functions and weak limit may not commute in general. A classical example that often has been cited is given by Tartar in  which considered the following equation,
Its solution is explicitly given by
The main point in  is that if converges only weakly as , to some , then the weak limit of may not be given by . Instead, the weak limit cannot be generically expressed by with any function and is given as
for certain family of probability measures. With a simple example that , where is given by
we see easily , which cannot be expressed by with any function . In , Laplace transform in is used to match the limiting function with the solution of following nonlocal in time equation
with a coefficient and a memory kernel . For the above example where is given by (2.2), the memory kernel is found to be an exponential function and the equation for the limiting function is given as
We note that although the resulting equation for appears essentially nonlocal, it can be localized, not with directly, rather by defining an additional configuration variable. For example, with
The introduction of an extended configuration space is one of the popular closure techniques that has also been used quite effectively in localized nonlocal models, such as in viscoelastiticity. More discussions on this can be found later in section 3.4.
2.2. Partial differential equation
We simplify an example given in  demonstrating that nonlocal operator may exist as the homogenized limit of differential equations.  considers an operator in the cylinder , where and are two elliptic operators in
with oscillatory coefficients. The homogenized operator is found through Fourier transform in. It is said in  that the Fourier symbol, as , is in general not a polynomial, and thus corresponds to a nonlocal operator in the variable. Here we given a simple example reminiscent of the one in  and show that nonlocal operator indeed exists as the limit of differential equations. Consider the following equation
where satisfies zero Dirichlet boundary condition in the variable and periodic in the variable. Let
where is 1-periodic and
Denote to be the Fourier transform of in the variable, then we have
Note that if the following equation is true
then the right hand side of the last formula in (2.8) is equal to . However, (2.9) is generally not true for . Indeed, if is nonnegative, then from Hölder’s inequality, we have always for and the equality holds only if is a constant almost everywhere. In general, we have the following expansion based on the equation (2.8),
The term in the parentheses on the right hand side of the last equation given above can be seen as the nonzero correction if (2.9) is violated. In general, , as expressed above, is not a polynomial of . Therefore, by the discussions in section 1.1, the homogenized equation of (2.6) contains a nonlocal operator in the variable.
2.3. Nonlocal effective wave equation
So far we have been discussing homogenization in a classical sense, namely that we exemplify the process of finding the limiting function of as as well as the equation it satisfies. From a broader view of homogenization, it is not necessary that the effective equation of a highly heterogeneous equation given as (1.2) should be an -independent equation given as (1.3). It satisfies the purpose of homogenization as long as one could find an approximation of (1.2) that does not carry the oscillatory behavior and thus is easier for numerical implementation. In this spirit, we give in the following an example where an -dependent effective equation is derived for wave equation in heterogeneous media, and in fact it serves as a better approximation of the original problem than the one given by the classical homogenization theory. Consider wave propagation through a periodic medium, which is given by the the equation
complemented with initial condition
where is given by (1.8). It turns out that the effective model (2.12) gives a good approximation of (2.10) for short times of observation [6, 9, 21, 62]. In a recent work , the authors showed that for initial-Dirichlet boundary value problems, for in a fixed time window, independent of . However, when the size of the time window is large, the important dispersive feature of (2.10) is not predicted by the the effective model (2.12). Santosa and Symes  were the first that gave a dispersive effective model by using Bloch wave expansion. Loosely speaking, their model is of the form , with , and the effective model can well approximate (2.10) when time scale . There are two drawbacks of such an effective model. The first is that the equation is actually ill-posed due to the term, although it can be made well-posed by a classical Boussinesq trick ([15, 38]). The second is that although the dispersive effective model in  does better than the non-dispersive effective model for large time scale, the approximation is still within the time scale .
Our main point here is that by allowing the time for given , it is necessary to have a spatially homogeneous nonlocal operator such that the equation
should be the spatially nonlocal operator with Fourier symbol corresponding to the first eigenvalue of the operator. And the equation (2.13) is naturally well-posed.
For any real vector, there exists a countable number of solutions of (2.14) in the form
with eigenvalue . is called the quasi-periodic Bloch wave and . The quantity , as a function of k, can be thought of as dispersion relations for the -th mode Bloch wave. The Bloch waves form a basis for , and each (complex-valued) function can be expanded
where . Moreover, the Parseval’s identity also holds. Please see  for more details. Now the spectral analysis for can be similarly defined by rescaled quantities:
There are two key observations in . The first observation is that eigenmodes with can be neglected, in the sense that by defining
we have for . The second observation is that can be further simplified by replacing and with the Fourier transform and the Fourier mode respectively. Let
then we have for . See 
for more discussions on the spatial norm in these estimates. Now if the initial condition (2.11) is chosen such that is supported on , then it is obvious that (2.17) is the solution to (2.13) with initial condition (2.11), where is an operator with Fourier symbol that matches for . More specifically, if we let
then the kernel should be chosen such that for , the equality holds
In fact, in order for (2.18) to be satisfied, one only need to determine a kernel according to
then by the scaling in (2.15), the kernel can be determined by a rescaling of :
Since is always real, we can further assume that is an even function. With out loss of generality, if we normalize the integral of to be , and extend smoothly to for large outside , then we may find the function from its Fourier transform determined by (2.19). The smoothness of for large naturally indicates the possibility of a kernel with a compact support or fast decay. Hence, the equation (2.19) in fact gives us a way to construct nonlocal models out of the dispersion relation of heterogeneous materials and there could be more than one solution to (2.19).
Notice that, in the case discussed here, we have
From the discussions in section 1.1, in order for to be a local operator, it is necessary that its Fourier symbol is a polynomial of wave number . However, in general is not a polynomial of , for example,  shows the dispersion relation for 1d composite materials where it is expressed in terms of trigonometric functions. Therefore is in general a nonlocal operator. In , is written as a Taylor expansion around where the coefficients involving solving a sequence of cell problems. It is also found in 
that all the odd powers in the expansion ofvanish. One can also extend to the complex plane, and it is shown in  that for the 1d Schrödinger operator (which is similar in our case for 1d), , as functions of complex variable , are branches of multivalued analytic functions. General queries for the analytic properties of periodic elliptic operators can be found in .
In addition, let us remark that one may study robust discretization schemes for (2.13) with the nonlocal operator of the form (2.20) in the spirit of asymptotically compatible schemes with respect to the parameter .
At last, it is interesting to see some striking similarities between the analysis above regarding wave propagation operator and the derivation of absorbing boundary conditions in  even if homogenization is not involved in the latter. Absorbing or far field or radiation boundary conditions are used to artificially limit the size of a computational domain without changing the solution of a PDE. In , such accurate conditions were derived in terms of pseudodifferential operators, which are nonlocal in the space-time boundary. The same concept is applicable to nonlocal problems  where the notion of boundary conditions are appropriated extended to accommodate nonlocal interactions [23, 24]. For PDEs, a hierarchy of local approximative boundary conditions based on differential operators can be given for computational efficiency, which are of increasing order for more accurate approximations of the nonlocal operator 
. Similar to the discussion above, some of these local approximations give ill posed problems and some well posed ones. For absorbing boundary conditions higher order Taylor expansion of the relevant dispersion relation may give ill posed initial-boundary value problems but Padé approximations could lead to well posed problems.
3. Nonlocality through model reduction, numerical homogenization
Nonlocality is a generic feature of model reduction . Mathematically this can be easily seen from simple examples like boundary integral formulation of elliptic equations. This is also evident in many applications, for example, the effective properties, being mechanical, thermal or electrical, of a heterogeneous medium may be strongly influenced by nonlocal effect [12, 59].
We will give a couple of examples from numerical homogenization and coarse graining of linear models to illustrate how nonlocality arises naturally and further characterize the nature of the nonlocal interactions.
3.1. Nonlocality through projection, Schur complements
We discuss some recent development of numerical homogenization for which projection and Schur complements play an important role [35, 36, 39, 56]. The projection-based numerical homogenization starts with a finite dimensional approximation of the differential equation of (1.3) on a very fine grid, and tries to project the solution onto the a coarser scale. In [22, 35], the coarse scale system is found by using the Schur complement of the fine scale system. It is known that the Schur complement of a sparse matrix, obtained from the finite dimensional approximation of differential operators, is no longer a sparse matrix. So the coarse scale matrix is essentially a nonlocal operator at the the discrete level. On the other hand, it is also observed in  that the coefficients of the resulting dense matrix has exponential decay measured by the distance to the diagonal. More recently,  uses a generalized finite element method based on orthogonal subspace decomposition and the method is re-interpreted in  involving a discrete integral operator. In the basic version of of their method, the basis functions have a global support but decay exponentially so it motivates the use of localized basis functions and therefore their method is called the localized orthogonal decomposition (LOD) method. Clearly, the two groups of studies both point to the fact that numerical homogenization results in a discrete nonlocal operator with exponential decay in the involved integral kernel under certain assumptions. Here we will establish the fact that the basic version of the subspace decomposition method in  is in fact exactly the same as the projection method in  using Schur complements. Note that the discussions in these works are also related to the variational multiscale method .
Consider a variational problem defined on a fine space , namely with
where is a symmetric bilinear form. It is also written as
where is the corresponding linear functional such that . Now let be a subspace of . Then the decomposition represents the splitting of a function into a coarse scale function in and its details in the orthogonal complement . Following the notations in , we define to be the orthogonal projection from to and . Then the system is written into
where . The homogenized operator is the Schur complement given by
and the homogenized right hand side is
The homogenized equation is given by
where . Now the basic version of the LOD method in  is as follows. Given any , the corrector is defined such that
The homogenized problem is now to find such that
Let us make several remarks for (3.3). First, if we define a space of functions of the form , then (3.3) can be seen as a generalized finite element method with this special finite dimensional space. Second, the corrector function is given using a family of global basis functions, and thus (3.3) is interpreted as a discrete integral operator in . The process of using local basis functions that approximates the global ones is called in [56, 39] the localization. Third, using (3.2), (3.1) is equivalent to
On the other hand, the definition of corrector in (3.2) is the same as
and it is equivalent to . So we have
and the right hand side of (3.4) is
It is observed in  that the global basis functions decay exponentially away from the node they are associated with, which is in accordance with the observation in  that the non-zero entries in the Schur complement have exponential decay. This justifies the use of local basis functions where they are constructed from cell problems on local patches with Dirichlet boundary conditions. This is also related to the oversampling technique in the context of multiscale finite element method that will be discussed in the following subsection. Finally, we remark that the exponential decaying property is true under the assumption that the bilinear forms are uniformly bounded and coercive. If the heterogeneous coefficients contain scales with very high contrast, then the exponential decaying property may be lost. This may be due to the fact that the intrinsic nonlocality could be developed under such cases. In fact,  gives an example of a sequence of local problems with coefficients that develop singular measures in the limit, and the sequence is shown to converge to a nonlocal problem.
3.2. MsFEM and HMM
These are two numerical methodologies that also are influenced by analytical homogenization based with cell problems. The methods are however more general and do not explicitly rely on cell problems. The similarity with homogenization is that there is a macroscale and microscales and a nonlocal operation of cell problem type. Most rigorous analysis for convergence of these methods is furthermore done in the setting of heterogeneous differential equations with a microstructure and based homogenization theory as in section 1.2.
We will first consider the MsFEM. The goal in this method is to solve a differential equation with a finite element method when the coefficients in the differential equation have a microstructure as in equation (1.6). Standard piecewise polynomial basis functions require element sizes , which are smaller than to resolve the oscillations. The key to the success of MsFEM is the development of multiscale basis functions, that are specific for each differential equation. The formulation with these new basis functions allows for efficient approximation even if their element size is larger than . The computation of the new basis functions is based on standard finite elements of size over domains of size that also approximates the original differential operator. Here is smaller than , which is typically smaller than . The relation between and is plotted in Figure 1. Since it is difficult to determine the accurate boundary conditions for computing the basis functions, the oversampling technique is often used, which means that the domain above is a bit larger than . The error created by the boundary conditions decays exponentially with the increase of oversampling layers . For efficiency this step can be done in parallel and ahead of the final finite element solve on the macroscale. The computation of the multiscale elements is a nonlocal operation since a solution operator is involved and thus has strong similarities to the homogenization process as outlined in section 1.2.
The HMM framework is not restricted to FEM discretization and can, for example, couple a finite volume method on the macroscale and molecular dynamics on the microscale. For such a case it is similar to the nonlocal version of the quasicontinuum method . We will here for comparison focus on the use of FEM for both micro and macro scale. Finite element HMM (FEM-HMM) is less ambitious than MsFEM in that the target is an approximation of the effective or homogenized solution and not the full multiscale solution. This means computing in (1.7) instead of in (1.6). It is however more ambitious in that the microscale simulations are concentrated on small subsets of the full computational domain. The subsets are of size , which is smaller than the coarse scale elements, see Figure 2. This means that much more extreme range of scales can be approximated but it also requires much clearer scale separation or gaps in the scales of the full multiscale problem.
Since the target is we will assume that something is known about an effective equation. For example, the structure of (1.7) is known but not the stiffness matrix . It is then possible to formulate a FEM discretization of (1.7) but for the explicit components of homogenized matrix . The size of the elements can be of the order as in MsFEM. This macroscale simulation is then coupled to a microscale simulation based on (1.6) with element size , which provides the missing components that are needed to determine the stiffness matrix. This microscale computation is done on small domains where the components are required typically around the location of the quadrature points in the macroscale FEM. The micro problem is usually complemented with periodic or Dirichlet boundary conditions. A more thoughtful way of imposing boundary conditions for the micro problem is to take into account the far field and create artificial boundary conditions that does not change the solution of a PDE when restricting it to small domains, see e.g. .
3.3. A linear lattice model as an illustration
Following the discussion on the Schur complement given previously, we can further provide a simple illustration using coarse grained models of linear lattice models as done in . More specifically,  considered a lattice model involving next nearest neighbor interactions associated with the energy,
where denotes a one-dimensional uniform lattice with being the lattice space and and being the force constants satisfying the so-called phonon stability condition , and .
The variational equation corresponding to (3.7
) is given by the finite difference equation
which can be viewed as a discretization of the Poisson equation for a one-dimensional elastic bar. A coarse lattice can be defined by selecting a representative atom from each group of atoms, as done in the local version of the quasicontinuum method . Let us denote the selected representative atoms by
where represents the spatial scale of interest. The original model on the full lattice can be reduced to a model on by eliminating the non-representative atoms in that are between neighboring atoms in . The reduced model can be written as,
This procedure is similar to the Schur complement reduction described earlier in section 3.1. As reflected by properties of the nonlocal kernel function , a main observation in  is that the reduced model on the coarse lattice is nonlocal: the displacement of all the representative atoms are in principle coupled with every other representative atom. Indeed, it was shown in  that the resulting nonlocal interaction through the coarse-graining has no compact support but decays faster than any algebraic power, that is, for any . The numerical test in fact suggested that decays exponentially.
Moreover, in the case where , the Equation (3.8) preserves the discrete maximum principle so that is an even function with a zero sum, positive at the origin but negative away from the it. Furthermore, with a numerically verified postulation on , it was shown in  that there exists a function such that,
holds for all sufficiently large , we let for . Then, in the experiments given in , it was shown that as increases, the rescaled function indeed converges to a fixed function. Thus, serves as a parameter for the characteristic range of the resulting nonlocal interactions derived through a coarse graining process.
3.4. Memory effect and nonlocality through Mori-Zwanzig
One can generalize the discussion of model reduction via Schur complement to the dynamic case which naturally lead to the appearance of nonlocal in time memory effect. A discussion of the same nature is given by the so-called Mori-Zwanzig formalism which exemplified the nonlocal memory effect in coarse-grained dynamic systems [58, 76]. The subject is of much importance in numerical simulations, for example, it is tied to the choice of thermo-stat in molecular dynamics simulations and dissipative Brownian particle dynamics [47, 13]. More recent discussions and applications can be found in [14, 65, 54, 41].
Adopting similar notations used before, we let be an orthogonal projection operator and . Starting from a simple autonomous system with a linear operator defined on the state space and , a direct decomposition leads to a coupled system
Assuming that the reduced quantity of interest (QoI) is , by elimination and substitution, we can get
The resulting differential integral equation contains the second term on the right hand side that reflects the nonlocal in time memory effect and possible nonlocal interaction in the state variables even if itself is local. Let us also remark that if is a fast variable, then the nonlocal equation (3.12) may be approximated by a differential equation again. Take the example in section 2.1. We know that the reduced equation of the system (2.5) is given by the nonlocal in time equation (2.4). However, if we assume is a fast variable by considering a modified version of (2.5):
For a nonlinear system, a similar derivation can be made. We let the coarse grained vairables be the QoI and work with the associated Liouville equation for given by where is the time-independent linear Liouville operator.
Formally, we have but this cannot be directly calculated since contains information on all the state variables, not just . Thus, we take
and apply the Dyson’s identity
This is similar to (3.12), with no reduction done yet from the original system.
Next, by making a choice of , such as the projection operator given by Mori  where denotes the correlation operator (or can be given as a conditional expectation with respect to ), we may rewrite the equation (3.14) as
On the right hand side of (3.15), the first term
gives the Markovian term operating only on the QoI and depending only on the current time. The third term is commonly treated as the stochastic fluctuation. This is where the reduction begins to take effect. The Mori-Zwanzig formalism ensures a thermodynamically consistent approach through appropriate coupling with the second term.
Specifically, by defining a memory kernel satisfying
we see that the second term on the right hand side of (3.15) indeed becomes