In our Big Data era, and as the interactions between physics-based modeling and data-driven modeling intensify, we will encounter, with increasing frequency, situations where different researchers arrive at different data-driven models of the same phenomenon, even if they use the same training data. The same neural network, trained by the same research group, on the same data, with the same choice of inputs and outputs, the same loss function, and even the same training algorithm, but with different initial conditions and/or different random seeds, will give different learned networks, even though their outputs may be practically indistinguishable on the training set. This becomes much more interesting when the same phenomenon is observed through different sensors, and one learns a predictive model of the same phenomenon based on different input variables: how will we realize that two very different predictive models of the same phenomenon “have learned the same thing”? In other words, how can we establish, in a data-driven way, that two models are transformations of each other? The problem then becomes: if we suspect that two neural networks embody models of the same phenomenon, how do we test this hypothesis? How do we establish the invertible, smooth transformation that embodies this equivalence? And, in a second stage, can we test how far (in input space) this transformation will successfully generalize? Our goal is to implement and demonstrate a data-driven approach to establishing “when two neural networks have learned the same thing”, as our title indicates: the types of observations of the networks that can be used, and an algorithm that allows us to construct and test the transformation between them. Our work is based on nonlinear manifold learning techniques, and, in particular, Diffusion Maps using a Mahalanobis-like metric (Singer et al., 2009; Berry and Sauer, 2015; Dsilva et al., 2016).
This is an attempt at “gauge invariant data mining”—processing observations in a way that maps them to an intrinsic description, where the transformation between different models is easy to perform (in our case, it can be a simple, orthogonal transformation between intrinsic descriptions). In effect, we construct data-driven nonlinear observers of each network based on the other network. We remind the reader that when we talk about “observations of the networks”, we do not mean their parameters after training; we mean observations of the “action” of the trained networks on input data: (some of) the resulting internal activations. These observations can be based on either output neurons or internal neurons of the networks, which we assume to be smooth functions over a finite dimensional input manifold. Whitney’s theorem then guides us in selecting an upper bound on how many of these functions are needed to preserve the manifold topology, and to embed the input manifold itself (Whitney, 1936).
We now briefly introduce the concept of transformations between neural networks. Details of the approach can be found in section 3 (also see figure 1). In general, we proceed as follows: (1) Two smooth functions, the tasks and , are defined on the same data set (for example, two separate denoising tasks defined on the same set of images). We will also, importantly, discuss cases of tasks defined on different data sets. In that case, when each data set lives on a separate (but assumed diffeomorphic) manifold, we will discuss conditions that enable our construction, turning the diffeomorphism into an isometry. (2) We separately train two networks to approximate the tasks (for example, two deep, convolutional, denoising auto-encoder networks). (3) Then, we use the output of “a few” internal neurons of each of the networks as an embedding space for the input manifold; the minimum number is dictated through Whitney’s theorem; (4) Using the Mahalanobis metric (see §3.2 for details) in a Diffusion Map kernel applied to these internal “intermediate outputs” yields two intrinsic embeddings of them that agree up to an orthogonal transformation ; (5) The transformation between activations of the two networks can then be defined through (see figure 1), i.e., we map from the selected activations of network into (a) the (Mahalanobis) Diffusion Map representation of these activations with , then (b) map through an orthogonal transformation to the Diffusion Map representation of network activations, and then finally (c) invert the network Diffusion Map to obtain the activations of network (see figure 1). Since the selected neurons are enough to embed the input manifold, all other internal (or final output layer) activations can be then approximated as functions on the embedding.
2 Related work
With the advent of accessible large-scale SIMD processors and GPUs, efficient backpropagation, and the wider adoption of “twists” like ReLU activations, dropout, and convolutional architectures, neural networks and AI/ML research in general experienced a “third spring” in the early 2000s, which is still going strong today. While the bulk of the work was in supervised methods, this boom also led to greater interest in unsupervised learning methods, in particular highly fruitful new work on neural generative modelling. Kingma and Welling (2014) and later Salimans et al. (2015) began this thread with their work on variational autoencoders (VAEs), which upgrade traditional autoencoders to generative models. Different autoencoders, trained on observations of the same system, will give very different latent spaces; these can be thought of as different parametrizations of the same manifold. Variational autoencoders are then a way to “harmonize” several different autoencoders, by enforcing the same prior (distribution over the latent variables). In Goodfellow et al. (2014), the “adversarial learning” framework was applied to the VAE task, with separate generator and discriminator networks being trained. This work was significantly extended by Arjovsky et al. (2017). The idea of establishing a correspondence between different learned representations also motivates our work. Broadly, we construct invertible transformations between different sets of observations arising from activations of different neural networks. When the construction is successful, we can state that “the networks have learned the same task” over our data set; we can then explore the generalization of these transformations beyond the training set.
The idea of a transformation between systems has been broadly utilized in the manifold learning community following the seminal paper by Singer and Coifman (2008). Berry and Sauer (2015) discuss the possibility to represent arbitrary diffeomorphisms between manifolds using such a kernel approach. Both papers employ the diffusion map framework (Coifman and Lafon, 2006). Recently, a paper on a neural network approach for such isometries was uploaded to arXiv (Peterfreund et al., 2020), as an alternative to diffusion maps (see also the works of McQueen et al. (2016); Schwartz and Talmon (2019) for approximations of isometries).
3 Transformations between artificial neural networks
Here, we introduce mathematical notation, define the problem, and then illustrate the approach through examples of increasing complexity. We also introduce the concept of using the Mahalanobis metric with Diffusion Maps to construct the transformation, and discuss its use for neural networks.
3.1 Mathematical notation and problem formulation
We start by defining two smooth functions, the tasks and , with . The input domain of both tasks is the manifold . We assume that is a compact, -dimensional Riemannian manifold, embedded in Euclidean space , , with the metric induced by the embedding. In our first example is simply and the metric is just the Euclidean metric on ; it is only later, when we want to transform between networks defined on different manifolds, that considerations of the (possibly different) metrics on these manifolds become important, since our Mahalanobis approach is based on neighborhoods of the data points. We will then write for the volume form on induced by , and write for a given, fixed measure on (the “sampling measure”), which we assume to be absolutely continuous with respect to and normalized so that . We assume that the measure has a density , , that is bounded away from zero on our compact manifold . That means sampling points through “covers the manifold well”.
Let be a finite collection of points on , sampled from the measure ; for our first example, this is just the uniform measure on . Two artificial neural networks , approximate respectively the tasks and , such that for all , We purposefully omit assumptions on how the two networks generalize for data not contained in , because we want to test these generalization properties later. We will later also discuss how to transform between networks trained on different input datasets.
We then consider the output of every individual neuron in the two networks as a real valued function on the d-dimensional input manifold. If every layer has at least neurons, the topology of the input manifold will, generically, be preserved while it is passed from layer to layer. In this case, all neurons everywhere in the network can be considered generic observables of the input manifold, and so picking any of them as coordinates is enough to create an embedding space for the input manifold.
This means that our transformation function can map from any neurons in one network to any neurons in the other network, invertibly, for data on the manifold.
As an illustration of the theoretical concept, consider a line segment . If we pick a single polynomial with the parameter chosen randomly in , it cannot be used as an embedding of , because it folds over at the point . However, if we pick three of these random polynomials as three new coordinates for , then is an embedding of into almost surely (i.e., with probability one). The fact that this holds for arbitrary smooth manifolds and coordinates follows from the work of Whitney (1936), with probabilistic arguments by Sauer et al. (1991). In general, 2d+1 generic observations provide a guarantee; if one is lucky, one may be able to create a useful embedding with fewer (but of course at least d!) observations.
In general, if any layer (in particular, the output) of a network has less than neurons, (or if, say, several weights become identically zero so that the output of several neurons in a layer is constant, and no longer a generic function over the input) then the topology of the input manifold may not be preserved, and the manifold will be “folded” or “collapsed”. In this case, a transformation to the other network is no longer possible when we only use observations of neurons in or after that layer. We demonstrate this in §4.1.
We will write for the map of the input of network to internal neurons that we select for our transformation, and for the mapping from the input to the output. These neurons must constitute generic observations of the input manifold; they can in principle lie at any internal (some possibly even at the input or output) layer. Figure 2 illustrates that in the network we can pick (red) neurons as the output of our map only in layers where all previous layers have at least neurons.
The problem of finding transformations between neural networks can now be formulated as follows.
Based on output data only. Construct an invertible map such that ,
Based on internal activation data. Construct an invertible map , with , such that ,
Different input data. Assume the tasks and are defined on two isometric Riemannian manifolds and (such that there exists an isometry with ). Construct the map , , such that for ,
Here, denotes the push-forward of the metric by .
3.2 Constructing the transformation
In this section, we explain how to construct the transformation between the networks in a consistent way. The crucial step in our approach is to employ spectral representations of the internal activation space of the two networks. These representations are constructed through manifold learning techniques introduced by Singer and Coifman (2008); Berry and Sauer (2015): Given a metric and its push-forward metric by a diffeomorphism , the map can be reconstructed up to a linear, orthogonal map. The reconstruction can even be done in a data-driven way, employing diffusion map embeddings (Coifman and Lafon, 2006) and a so-called “Mahalanobis distance” (Singer and Coifman, 2008; Dsilva et al., 2016).
The diffusion maps algorithm utilizing the Mahalanobis distance is described in algorithm 1. We often only have non-linear observations of points on the manifold , given by an observation function . In our case, the tasks and in particular the neural network maps provide such non-linear observations of the input manifold . Such maps usually distort the metric on , so that the original geometry is not preserved. We can still parametrize with its original geometry if we employ the following approach: In the kernel used by the Mahalanobis Diffusion Map algorithm, the similarity between points includes information about the observation functions (here, the ) that have to be invertible on their image. For ,
where is the Jacobian matrix of the inverse transformation at the point . The product , on which the entire procedure hinges, can be approximated through observations of data point neighborhoods: typically in the form of a covariance, obtained, for example, by short bursts of a stochastic dynamical system on , subsequently observed through (Singer et al., 2009). In algorithm 1, we do not specify how the neighborhoods for the computation of the covariance matrix are obtained. We do so separately for each of the computational experiments in section 4 (see also the discussion by Dietrich et al. (2020)).
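The following is a minimal sketch of a Mahalanobis Diffusion Map in NumPy. It is not a reproduction of algorithm 1. It assumes the squared distance in the form commonly used with this kernel (Singer and Coifman, 2008), d²(y_i, y_j) = ½ (y_i − y_j)ᵀ (C_i⁺ + C_j⁺)(y_i − y_j), where C_i is the local covariance of the observations around point i. The observation maps `obs_a` and `obs_b`, the burst width, the kernel scale, and the density normalization are illustrative assumptions. The two maps stand in for triplets of neuron activations of two networks that share a one-dimensional input.

```python
import numpy as np

def truncated_pinv(cov, rank):
    """Pseudoinverse of a symmetric PSD matrix, keeping its `rank` largest eigenvalues."""
    w, v = np.linalg.eigh(cov)           # eigenvalues in ascending order
    w, v = w[-rank:], v[:, -rank:]
    return (v / w) @ v.T

def mahalanobis_sq(y, covs, rank):
    """d2[i, j] = 0.5 * (y_i - y_j)^T (C_i^+ + C_j^+) (y_i - y_j)."""
    inv = np.stack([truncated_pinv(c, rank) for c in covs])
    diff = y[:, None, :] - y[None, :, :]                     # shape (n, n, k)
    term = np.einsum("ijk,ikl,ijl->ij", diff, inv, diff)     # uses C_i^+ only
    return 0.5 * (term + term.T)                             # term.T supplies C_j^+

def diffusion_map(d2, eps, n_coords):
    """Leading nontrivial diffusion map coordinates for squared distances d2."""
    kernel = np.exp(-d2 / eps)
    q = kernel.sum(axis=1)
    kernel = kernel / np.outer(q, q)                 # reduce sampling-density effects
    deg = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(deg, deg))       # symmetric conjugate of the Markov matrix
    w, v = np.linalg.eigh(sym)
    order = np.argsort(w)[::-1]
    psi = v[:, order] / np.sqrt(deg)[:, None]        # eigenvectors of the Markov matrix
    return psi[:, 1:n_coords + 1]                    # drop the constant eigenvector

# Toy data (assumption): a 1D input observed through two hypothetical smooth maps into R^3.
def obs_a(x):
    return np.stack([np.sin(2 * x), x**2, np.exp(x)], axis=-1)

def obs_b(x):
    return np.stack([x**3 + x, np.cos(3 * x), np.tanh(2 * x - 1)], axis=-1)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
bursts = x[:, None] + 0.01 * rng.normal(size=(200, 400))   # shared input neighborhoods

embeddings = []
for obs in (obs_a, obs_b):
    clouds = obs(bursts)                                   # shape (200, 400, 3)
    covs = [np.cov(cloud, rowvar=False) for cloud in clouds]
    d2 = mahalanobis_sq(obs(x), covs, rank=1)
    embeddings.append(diffusion_map(d2, eps=0.1 * np.median(d2), n_coords=1)[:, 0])

corr = np.corrcoef(embeddings[0], embeddings[1])[0, 1]
print(f"correlation between the two intrinsic coordinates: {corr:+.3f}")
```

In this toy setting the two intrinsic coordinates should agree up to sign, which is the one-dimensional case of an orthogonal map. Truncating the pseudoinverse to the intrinsic dimension is a design choice made here for numerical stability: sampled covariances of a curve in a three-dimensional observation space are only approximately rank one.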
The reconstruction of the eigenspace up to an orthogonal map is justified through the following argument from Moosmüller et al. (2020): using the distance (1) is effectively using the metric on the data, turning 2015; Rosenberg, 1997). Therefore, an isometry between the base manifolds and turns into an orthogonal map in eigenfunction coordinates of the manifolds (Berry and Sauer, 2015).
Finally, to construct the transformation between neural networks using the diffusion map approach, we proceed as described in algorithm 2. Essentially, we generate embeddings of the input manifolds of the two networks using their internal neurons, and then employ algorithm 1 to construct a consistent representation for each of these embeddings. Since these representations are invariant up to an orthogonal map, we can construct our transformation after estimating this final map, which only requires a small number of “common measurements” (it is easier to parametrize linear maps than general nonlinear ones!).
4 Demonstrating transformations between neural networks
We now demonstrate the transformation concept in three separate examples. Section 4.1 illustrates a simple transformation between two networks with the same one-dimensional input, trained on the same task. Section 4.2 then broadens the scope to networks trained on different one-dimensional inputs. Section 4.3 shows how to construct a transformation using observations of internal activations of two networks defined on two-dimensional spaces. Section 4.4 describes our most intricate example, with images as input for two deep auto-encoder networks with several convolutional layers. We show how to map between internal activations of these networks and how to reconstruct the output of one network from the input of the other. Table 1 contains metrics and parameters for the computational experiments.
|Experiment||§ 4.1||§ 4.2||§ 4.3||§ 4.4|
(at 64×64 resolution)
|Architecture||1-1-1, 1-8-8-8-1||1-3-1||2-5-5-5-5-2||CNN (see text)|
|: Nbhd. size||1,024||(analytic covariance)||431|
|: # Nbhds.||512||100,000||649|
Table 1: Data and parameters of computational experiments. If not otherwise specified, we use the same parameters for both networks in a column; otherwise, different network hyperparameters are given on different lines. Activation functions are except for §4.4, as described in the text.
4.1 Transformations between output layers of simple networks
In this first example, we demonstrate the idea of transformations between neural networks in a very simple caricature: Two tasks with , are approximated by two neural networks . We choose the tasks to be identical, , and defined by The training data set for the two networks is also the same, We follow algorithm 2 to construct the map between the one-dimensional output of both networks. The local neighborhoods of points in the training domain are generated through -balls around each point, with .
Figure 3(a) shows the task as well as the evaluations of the two networks on the training data in as well as their extrapolation to all of . The networks approximate well on the training data, but extrapolate differently beyond it. This extrapolation issue is further illustrated in Fig. 3(b), where we can see that it is impossible to map from network back to network over all of , because the output of is not an invertible function of the output of over all of . However, if we use as observables the one or eight neuron(s) in the final hidden layers of the two networks (see Fig. 4(a), and architectures in Table 1), we can easily construct the transformation over all of using algorithm 2 (for the result see Fig. 4(b)). Fig. 4(c) shows the reconstruction of the selected neuron activations of Network 2 based on our transformation and our Network 1 activation observations. It is worth noting that any other activation of Network 2 neurons can also be approximated as a function of the intrinsic representation, e.g., through Geometric Harmonics; other types of approximation of these functions (e.g., neural networks, Gaussian Processes, or even just nearest neighbor interpolation) are also possible. For the purposes of Fig. 4(c), we compute through univariate, linear interpolation. Note that for Network 1 we can “get by” with a single observation and not the “guaranteed” observations: the architecture and the activation function are so simple that this single observation is one-to-one with the single input (does not “fold” over the input).
4.2 Transformation between internal neurons of networks with different inputs
Here, we demonstrate the transformation concept through a simple mapping between two networks trained on the same task, but with different, albeit still one-dimensional inputs. This is similar to experiment §4.3, but in a much lower-dimensional ambient space, where we can easily visualize all embeddings. Table 1 contains metrics and parameters for this example.
Define tasks identical to with . We use training data and , which are related by the nonlinear transformation (see figure 5). This transformation is not an isometry, but we assume that we have access to neighborhoods created by sampling in domain 1, and then transforming to domain 2 through (see figure 5).
The neural networks approximate the task well on their training data (see figure 6). To construct the transformation between their internal neurons, we sample small neighborhoods of 15 points for each data point on the domains of the two networks (inside, but also far outside the training data set, on ). The neighborhoods on the domain of the second network are created by transforming points on the domain of the first through . It is important to note that the data for our construction of the transformation between the two networks do not need to be evaluated at exactly “corresponding points”, obtained from each other through ; correctly sampled neighborhoods are enough. That means we do not even need access to the map , in principle, as long as we are given correct neighborhood information around each point by any other means, e.g., through a consistent sampling procedure (Singer et al., 2009; Dietrich et al., 2020; Moosmüller et al., 2020). We then compute covariance matrices in the space of some triplets of internal neurons of the networks, after evaluating the networks on each point within every data point neighborhood. We do not need to store the output values of the networks; the activation values of three neurons anywhere across the network architecture are enough ( here, since the input manifold is one-dimensional). The covariance matrices are then used to construct the Mahalanobis distance between the data points. As we use three internal neurons for each network, the matrices are 3×3 and all have rank 1. As we demonstrate in §4.2, if the inputs are isometric, the covariance matrices with respect to the inputs do not need to be generated in such a way for the approach to work: we can directly compute by automatically differentiating through the network.
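To illustrate why these covariance matrices have rank 1, the sketch below compares a covariance built from the Jacobian of three activations with one estimated from a sampled neighborhood. The three tanh neurons, their random weights, the evaluation point, and the burst width are hypothetical. A central finite difference stands in for automatic differentiation so that the example needs only NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
w_hidden = rng.normal(size=3)   # hypothetical weights of three tanh neurons
b_hidden = rng.normal(size=3)   # hypothetical biases

def activations(x):
    """Three hidden activations for scalar or array-valued one-dimensional input x."""
    return np.tanh(np.multiply.outer(x, w_hidden) + b_hidden)

def jacobian_fd(f, x, h=1e-5):
    """Central finite-difference Jacobian of f at scalar x, as a column vector."""
    return ((f(x + h) - f(x - h)) / (2.0 * h)).reshape(-1, 1)

x0, sigma = 0.3, 1e-3

# Covariance predicted by the linearization: sigma^2 * J J^T (a 3x3 outer product).
jac = jacobian_fd(activations, x0)
cov_analytic = sigma**2 * (jac @ jac.T)

# Covariance estimated from a sampled input neighborhood pushed through the neurons.
samples = activations(x0 + sigma * rng.normal(size=5000))
cov_sampled = np.cov(samples, rowvar=False)

rank = np.linalg.matrix_rank(jac @ jac.T, tol=1e-10)
rel_err = np.linalg.norm(cov_sampled - cov_analytic) / np.linalg.norm(cov_analytic)
print("rank of J J^T:", rank)
print("relative difference between sampled and analytic covariance:", rel_err)
```

The outer product of a 3×1 Jacobian with itself has a single nonzero eigenvalue, so the 3×3 covariance has rank equal to the input dimension, and its pseudoinverse only measures displacements along the embedded curve.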
An embedding of the activations data over the domain for the two networks, using diffusion maps and the Mahalanobis distance, results in an embedding of the line segment, related only by an orthogonal map (here, a flip through multiplication by , see figure 7(c)). If we do not use the Mahalanobis distance, the embeddings of the input domain into the activation spaces of the two networks generically induce different metrics; then, even though the diffusion map algorithm will result in parametrizations of the two curves, they cannot easily be mapped to each other (figure 7(b)): they are related by an arbitrary diffeomorphism, which, in general, a few corresponding points are not enough to approximate.
4.3 Toward more complicated networks: mapping between vector fields
We now demonstrate that we can also transform between networks that were trained to approximate vector fields on two-dimensional Euclidean space. We denote the input coordinates for network 1 by and for network 2 by . Figure 8(a) shows the vector fields. Footnote 1: The vector fields used for the tasks of this experiment are defined via
4.4 Transformations between deep convolutional networks with high-dimensional input
In order to demonstrate our transformation between networks on a more realistic example, we generate two datasets of images (high-dimensional representations) with a low-dimensional intrinsic space. Namely, we rotate a 3D model horse, and render images from two camera angles (see figure 9).
For training, we additionally create augmented data by adding to each pixel uniformly at random, where , pixel values are chosen in , and we re-threshold pixels to this range after augmentation. We divide the range of object rotation into 3,600 bins, but only train on images in .
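The augmentation step can be sketched as follows. The noise amplitude, the symmetric noise interval, and the pixel range [0, 1] are assumptions made only for this illustration, since the exact values are not recoverable from the text above.

```python
import numpy as np

def augment(images, amplitude, rng, low=0.0, high=1.0):
    """Add i.i.d. uniform noise to every pixel, then clip back to [low, high]."""
    noise = rng.uniform(-amplitude, amplitude, size=images.shape)
    return np.clip(images + noise, low, high)

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 64, 64))   # stand-in for rendered images
noisy = augment(clean, amplitude=0.1, rng=rng)    # amplitude is a hypothetical value
```

In a denoising setup of this kind, `noisy` would be the network input and `clean` the reconstruction target.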
The functions and in this example are denoising tasks for images obtained from the two cameras. To approximate them, we train two convolutional autoencoders to reconstruct the input images, with the reconstruction target being the noiseless original images when noise-augmented images are the input. We minimize a pixelwise binary cross-entropy loss, to which we add the term , where and are, respectively, the batchwise minimum and maximum of the bottleneck layer activations, and and are our desired values for this range. This pinning term does not affect any of our other results, but improves legibility of the plots.
Our encoders have interleaved two-dimensional convolutional layers ( filter sizes , channel counts
The decoders have interleaved convolutional layers (with reversed filter sizes and channel counts to produce a pyramid of same-shape tensors) and unlearned nearest-neighbor resizing operations in the decoding portion to output tensors of the same shapes as the input. All activations are ReLU except for the final layer, which is sigmoid. We use Adam with Nesterov momentum with a batch size of 128, for 220 epochs, with a learning rate of. Training generally takes less than five minutes per autoencoder on an Nvidia RTX 2080 Ti GPU, with a training loss of and validation loss of .
We did not perform a systematic hyperparameter search, and we did not attempt several obvious improvements that might give better performance on the autoencoder tasks, since the network-to-network transformation was the actual focus of this work.
In order to compute the inverse covariances for the Mahalanobis distances described in algorithm 1, for each (high-dimensional) image we evaluate the (low-dimensional) encodings for the pre-sampled images to the left and to the right in , where is taken s.t. never indexes outside the training data ( and is as given in Table 1). We then evaluate the covariance of this cloud of low-dimensional encodings, and use the pseudoinverse of this as .
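A minimal sketch of this windowed covariance computation is given below, assuming the encodings are stored in rotation order as rows of an array. The function name, the window half-width, and the synthetic encodings are hypothetical stand-ins for the encoder outputs and the value in Table 1.

```python
import numpy as np

def window_inverse_covariances(encodings, half_width):
    """Pseudoinverse covariance of each encoding's neighbors in rotation order.

    Only centers whose full window lies inside the data are used, so the
    window never indexes outside the sampled range.
    """
    n = len(encodings)
    centers = np.arange(half_width, n - half_width)
    inv_covs = np.stack([
        np.linalg.pinv(np.cov(encodings[i - half_width:i + half_width + 1], rowvar=False))
        for i in centers
    ])
    return centers, inv_covs

# Synthetic stand-in: a noisy curve in a three-dimensional encoding space.
rng = np.random.default_rng(2)
angles = np.linspace(0.0, 1.0, 60)
encodings = np.stack([np.cos(angles), np.sin(angles), angles**2], axis=1)
encodings = encodings + 1e-3 * rng.normal(size=encodings.shape)

centers, inv_covs = window_inverse_covariances(encodings, half_width=5)
```

With noisy encodings the default cutoff of `np.linalg.pinv` keeps all directions. Truncating the pseudoinverse to the intrinsic dimension may be preferable in practice, but that choice is not specified in the text.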
To complete the example, we compute “cross-decodings” by (1) evaluating an image from the first dataset on the first encoder, (2) interpolating this to its corresponding Mahalanobis diffusion map embedding, (3) finding the nearest embedding in the other network’s diffusion map, (4) interpolating this to the second network’s encoding, and finally (5) evaluating the second network’s decoder with this encoding. The result is shown in Fig. 10. When the first image is rotated by sweeping through the training range (not shown here), the reconstructed second view rotates correspondingly. Note that the map from one embedding to the other is an orthogonal transformation (here, simply multiplication by or ), and can, in general, be computed as an unscaled SVD , using only a small set of correspondences (here, just two pairs of corresponding points are enough to determine the sign). This is what allowed us to perform Step 3 in “cross-decoding” above.
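One standard way to estimate an orthogonal map from a few corresponding points is the orthogonal Procrustes (Kabsch-type) solution via an SVD. The sketch below is an illustration under that assumption and is not necessarily the exact procedure used for the experiments. The function name and the test data are hypothetical.

```python
import numpy as np

def orthogonal_map(source, target):
    """Orthogonal matrix O minimizing ||source @ O - target||_F for paired rows."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt          # singular values are discarded: no scaling is applied

# Recover a known orthogonal map from five corresponding points in two dimensions.
rng = np.random.default_rng(3)
q_true, _ = np.linalg.qr(rng.normal(size=(2, 2)))
source = rng.normal(size=(5, 2))
target = source @ q_true
q_est = orthogonal_map(source, target)

# One-dimensional embeddings: two corresponding pairs determine the sign.
flip = orthogonal_map(np.array([[-0.4], [0.7]]), np.array([[0.4], [-0.7]]))
```

For one-dimensional embeddings the only orthogonal maps are multiplication by +1 or −1, so the SVD step reduces to reading off the sign of the inner product between corresponding coordinates.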
In this work we formulated and implemented algorithms that construct transformation functions between observations of different deep neural networks in an attempt to establish whether these networks embodied realizations of the same phenomenon or model. We considered observations of the activations of both output neurons and internal neurons of the networks; we assumed these are smooth (nonconstant!) functions over an input manifold. This allowed us to employ Whitney’s theorem as an upper bound on how many of these functions are needed to preserve the input manifold topology. With the transformation, we could map from the input of one network, through its activations and the transformation, to the state (activations) and output of the other network. We also explored the generalization properties of the networks and how our transformation fails—as we explore input space—when the observations cannot be mapped to each other any more. Our transformations go beyond the probabilistic correspondences typically imposed by VAEs, towards point-wise correspondences between manifolds.
Open challenges: (1) Our approach hinges on being able to observe not just a single input processed by the network, but also how a small neighborhood of this input is processed by the network. Obtaining such neighborhoods on the input manifold in a way consistent across the two networks is a nontrivial component of the work—it constitutes part of our observation process, of the way data used for the training of the two networks have been collected. Our work suggests that protocols for data collection that provide such additional information can be truly beneficial in model building (Moosmüller et al., 2020; Dietrich et al., 2020); (2) Which internal neurons to pick for the input space embedding? In principle, any neurons would be enough, but in practice, different choices vary widely in the curvature they induce on the embedding, leading to different numerical conditioning of the problem; (3) How to deal with intrinsically high-dimensional input manifolds, or widely varying sampling densities? These would necessitate large training data sets. Recent work on Local Conformal Autoencoders (Peterfreund et al., 2020) and their empirically observed good extrapolation performance may prove helpful.
6 Broader Impact
The construction of transformation functions between different neural networks (and different data-driven, possibly even physically informed, models in general) has wide-ranging implications, since it enables us to calibrate different data-driven models to each other. It also holds the promise of allowing us to improve qualitatively correct (but quantitatively inaccurate) models by calibrating them to experimental data. Starting with one model and calibrating it to another is, in effect, a form of transfer learning and domain adaptation.
It may be possible to calibrate different models to each other over a large portion of input space, yet the calibration may fail for inputs far away from the training set. Our procedure allows us to explore the way this calibration (the ability to qualitatively generalize) fails, by locating singularities in the transformation as we move away from the training set, exploring input space. Systematically exploring the nature and onset of these singularities, and what they imply about the nature of the underlying physics is, we believe, an important frontier for data-driven modeling research.
This work was partially supported by the DARPA PAI program. F.D. would also like to thank the Department of Informatics at the Technical University of Munich for their support. It is a pleasure to acknowledge discussions with Drs. J. Bello-Rivas and J. Gimlett.
- Wasserstein Generative Adversarial Networks. International Conference on Machine Learning, pp. 10.
- Time-scale separation from diffusion-mapped delay coordinates. SIAM Journal on Applied Dynamical Systems 12 (2), pp. 618–649.
- Local kernels and the geometric structure of data. Applied and Computational Harmonic Analysis 40, pp. 439–469.
- Reduced models in chemical kinetics via nonlinear data-mining. Processes 2 (1), pp. 112–140.
- Diffusion maps. Applied and Computational Harmonic Analysis 21 (1), pp. 5–30.
- Manifold learning for organizing unstructured sets of process observations. Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (4), pp. 043108.
- Parsimonious representation of nonlinear dynamical systems through manifold learning: a chemotaxis case study. Applied and Computational Harmonic Analysis 44 (3), pp. 759–773.
- Data-driven reduction for a class of multiscale fast-slow stochastic dynamical systems. SIAM Journal on Applied Dynamical Systems 15 (3), pp. 1327–1351.
- Generative Adversarial Networks. Neural Information Processing Systems. arXiv:1406.2661.
- A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A 32 (5), pp. 922–923.
- Auto-Encoding Variational Bayes. International Conference on Learning Representations. arXiv:1312.6114.
- Nearly isometric embedding by relaxation. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 2631–2639.
- A geometric approach to the transport of discontinuous densities. arXiv:1907.08260, accepted in SIAM-UQ.
- LOCA: local conformal autoencoder for standardized data coordinates. arXiv:2004.07234.
- The Laplacian on a Riemannian manifold. Cambridge University Press.
- Efficient variants of the ICP algorithm. Proceedings Third International Conference on 3-D Digital Imaging and Modeling.
- Markov Chain Monte Carlo and Variational Inference: Bridging the Gap. International Conference on Machine Learning, pp. 9.
- Embedology. Journal of Statistical Physics 65 (3), pp. 579–616.
- Intrinsic isometric manifold learning with application to localization. SIAM Journal on Imaging Sciences 12 (3), pp. 1347–1391.
- Non-linear independent component analysis with diffusion maps. Applied and Computational Harmonic Analysis 25 (2), pp. 226–239.
- Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proceedings of the National Academy of Sciences 106, pp. 16090–16095.
- Differentiable manifolds. The Annals of Mathematics 37 (3), pp. 645.