Flow-based Generative Models for Learning Manifold to Manifold Mappings

by   Xingjian Zhen, et al.
University of Wisconsin-Madison

Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there are also no modality transfer/translation models for manifold-valued data whereas numerous such methods based on generative models are available for natural images. This paper addresses this gap, motivated by a need in brain imaging – in doing so, we expand the operating range of certain generative models (as well as generative models for modality transfer) from natural images to images with manifold-valued measurements. Our main result is the design of a two-stream version of GLOW (flow-based invertible generative models) that can synthesize information of a field of one type of manifold-valued measurements given another. On the theoretical side, we introduce three kinds of invertible layers for manifold-valued data, which are not only analogous to their functionality in flow-based generative models (e.g., GLOW) but also preserve the key benefits (determinants of the Jacobian are easy to calculate). For experiments, on a large dataset from the Human Connectome Project (HCP), we show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions (ODF) from diffusion tensor images (DTI), where the latter has a 5× faster acquisition time but at the expense of worse angular resolution.





Code Repositories


This is the official repository of “Flow-based Generative Models for Learning Manifold to Manifold Mappings” (AAAI 2021).



Many measurements in computer vision and machine learning appear in a form that does not satisfy common Euclidean geometry assumptions. Operating on data samples that live in structured spaces often leads to situations where even simple operations such as distances, angles, and inner products need to be redefined: while Euclidean operations may occasionally suffice, the error progressively increases depending on the curvature of the space at hand Feragen et al. (2015). One encounters such data quite often: shapes Chang et al. (2015), surface normal directions Straub et al. (2015), graphs and trees Scarselli et al. (2008); Kipf and Welling (2016), as well as probability distribution functions Srivastava et al. (2007) are some common examples in vision and computer graphics Bruno et al. (2005); Huang et al. (2019a). Symmetric positive definite matrices Moakher (2005); Jayasumana et al. (2013), rotation matrices Kendall and Cipolla (2017), samples from a sphere Koppers and Merhof (2016), subspaces/Grassmannians Huang et al. (2018); Chakraborty et al. (2017), and a number of other algebraic objects are key ingredients in the design of efficient algorithms in computer vision and medical image analysis, as well as in the development or theoretical analysis of various machine learning problems. While a mature literature exists on extending classical models such as principal components analysis Dunteman (1989), Kalman filtering Haykin (2004); Grewal (2011), and regression Fletcher (2013) to the manifold data regime, identifying ways in which deep neural network (DNN) models can be adapted to leverage and utilize the geometry of such data has only recently become a prominent research topic Bronstein et al. (2017); Chakraborty et al. (2018b); Kondor and Trivedi (2018); Huang et al. (2018); Huang and Van Gool (2017). This research direction has already provided convolutional neural networks for various types of manifold measurements Masci et al. (2015a, b) as well as sequential models such as LSTM Hochreiter and Schmidhuber (1997)/GRU Cho et al. (2014) for manifold settings Jain et al. (2016); Pratiher et al. (2018); Chakraborty et al. (2018c); Zhen et al. (2019).

The results in the literature, so far, on harnessing the power of DNNs for better analysis of manifold or structured data are impressive, but most approaches are discriminative in nature. In other words, the goal is to characterize the conditional distribution $p(y \mid x)$ based on the predictor variables or features $x$; here, $x$ is manifold-valued and the responses or labels $y$ are Euclidean. The technical thrust is on the design of mechanisms to specify $p(y \mid x)$ so that it respects the geometry of the data space. In contrast, work on the generative side is very sparse, and to our knowledge, only a couple of methods for a few specific manifolds have been proposed thus far Brehmer and Cranmer (2020); Rey et al. (2019); Miolane and Holmes (2020). What this means is that our ability to approximate the full joint probability distribution when the data are manifold-valued remains limited. As a result, the numerous application settings where generative models have shown tremendous promise, namely, semi-supervised learning, data augmentation Antoniou et al. (2017); Radford et al. (2015), and synthesis of new image samples by modifying a latent variable Kingma and Dhariwal (2018); Sun et al. (2019), as well as numerous others, currently cannot be evaluated for domains with data types that are not Euclidean vector-valued data.

GANs for Manifold data: what is challenging?

There are some reasons why generative models have sparingly been applied to manifold data. A practical consideration is that many application areas where manifold data are common, such as shape analysis and medical imaging, often cannot provide the sample sizes needed to train off-the-shelf generative models such as Generative adversarial networks (GANs) Goodfellow et al. (2014) and Variational auto-encoders (VAEs) Kingma and Welling (2013); Doersch (2016). There are also several issues on the technical side. Consider the case where a data sample corresponds to an image where each pixel is a manifold variable (such as a covariance matrix). This means that each sample lives on a product space of the manifold of covariance matrices. Attempting to leverage state-of-the-art methods for GANs, such as Wasserstein GANs (WGANs) Arjovsky et al. (2017), will involve, as a first step, defining appropriate generators that take uniformly distributed samples on a product space of manifolds and transform them into “realistic” samples which are also samples on a product space of manifolds. In principle, this can be attempted via recent developments extending spherical CNNs or other architectures for manifold data Chakraborty et al. (2018a). Next, one would not only need to define optimal transport Fathi and Figalli (2010) or Wasserstein distances Huang et al. (2019b) in complicated spaces, but also develop new algorithms to approximate such distances (e.g., Sinkhorn iterations) to make the overall procedure computationally feasible. An interesting attempt to do so was described in Huang et al. (2019b). In that paper, Huang et al. introduced a WGAN-based generative model that can generate low-resolution, low-dimension manifold-valued images. On the other hand, VAEs are mathematically more convenient for such data, and as a result, a few recent works show how they can be used for manifold-valued data Miolane and Holmes (2020). While these methods inherit VAE's advantages such as ease of synthesis, VAEs are known to suffer from optimization challenges as well as a tendency to generate smoothed samples. It is not clear how the numerical issues, in particular, will be amplified once we move to manifold data, where the core operations of calculating geodesics and distances, evaluating derivatives, and so on, must also invoke numerical optimization routines.

Contributions. Instead of GANs or VAEs, we use flow-based generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018), which enable latent variable inference and log-likelihood evaluation. It turns out, as we will show in our development shortly, that the set of key components (and layers) needed in flow-based generative models, with certain mathematical/procedural adjustments, extends nicely to the manifold setting. The goal of this work is to describe our theoretical developments and show promising experiments in brain imaging applications involving manifold-valued data.


This subsection briefly summarizes some differential geometric concepts/notations we will use. The reader will find a more comprehensive treatment in Boothby (1986).

Definition 1 (Riemannian manifold and metric)

Let $(\mathcal{M}, g)$ be an orientable complete Riemannian manifold with a Riemannian metric $g$, i.e., $g_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$ is a bi-linear symmetric positive definite map, where $T_x\mathcal{M}$ is the tangent space of $\mathcal{M}$ at $x$. Let $d : \mathcal{M} \times \mathcal{M} \to [0, \infty)$ be the distance induced from the Riemannian metric $g$.

Definition 2

Let $x \in \mathcal{M}$ and $r > 0$. Define $\mathcal{B}(x, r) = \{ y \in \mathcal{M} : d(x, y) < r \}$ to be an open ball at $x$ of radius $r$.

Definition 3 (Local injectivity radius Groisser (2004))

The local injectivity radius $r_{\mathrm{inj}}(x)$ is defined as $r_{\mathrm{inj}}(x) = \sup \{ r : \mathrm{Exp}_x|_{\mathcal{B}(\mathbf{0}, r) \subset T_x\mathcal{M}} \text{ is defined and is a diffeomorphism onto its image} \}$ at $x \in \mathcal{M}$. The injectivity radius Manton (2004) of $\mathcal{M}$ is defined as $r_{\mathrm{inj}}(\mathcal{M}) = \inf_{x \in \mathcal{M}} r_{\mathrm{inj}}(x)$.

Figure 1: Schematic description of an exemplar manifold $\mathcal{M}$ and the corresponding tangent space $T_p\mathcal{M}$ at a “pole” $p$.

Within $\mathcal{B}(x, r)$, where $r \leq r_{\mathrm{inj}}(x)$, the mapping $\mathrm{Exp}_x^{-1} : \mathcal{B}(x, r) \to T_x\mathcal{M} \cong \mathbb{R}^n$ is called the inverse Exponential/Log map, where $n$ is the dimension of $\mathcal{M}$. For each point $y \in \mathcal{M}$, there exists an open ball $\mathcal{B}(y, r_y)$ for some $r_y > 0$ such that $r_y \leq r_{\mathrm{inj}}(y)$, where $\mathrm{Exp}_y^{-1}$ is a diffeomorphism on $\mathcal{B}(y, r_y)$. Thus, we can cover $\mathcal{M}$ by an indexed (possibly infinite) cover $\{ (\mathcal{B}(y_\alpha, r_\alpha), \mathrm{Exp}_{y_\alpha}^{-1}) \}_{\alpha \in I}$. This set is an example of a chart on $\mathcal{M}$; for an example, see Krauskopf et al. (2007) and also Fig. 1.

For notational simplicity, we will denote a chart covering $\mathcal{M}$ by $\{ (U_\alpha, \phi_\alpha) \}_{\alpha \in I}$, since in general, we can use an arbitrary chart instead of an inverse Exponential map. Note that the domains for two chart maps may not necessarily be disjoint.
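Concretely, on the unit sphere the chart map and its inverse have closed forms. The following is a minimal numerical sketch (helper names are ours, not the paper's code) of the Log/Exp pair playing the role of $\phi$ and $\phi^{-1}$:

```python
import numpy as np

def sphere_log(p, x):
    """Inverse Exponential (Log) map on the unit sphere: the chart phi.

    Maps x (within the injectivity radius pi of p) to the tangent
    space T_p, identified with a Euclidean subspace."""
    cos_t = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_t)            # geodesic distance d(p, x)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = x - cos_t * p                   # component of x orthogonal to p
    return theta * v / np.linalg.norm(v)

def sphere_exp(p, v):
    """Exponential map: the inverse chart phi^{-1}."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p.copy()
    return np.cos(theta) * p + np.sin(theta) * v / theta

# Round trip: Exp_p(Log_p(x)) recovers x inside the injectivity radius.
p = np.array([0.0, 0.0, 1.0])                    # the "pole" anchor point
x = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
v = sphere_log(p, x)
x_rec = sphere_exp(p, v)
```

Note that the Log map output lies in the tangent plane (it is orthogonal to the pole), which is what lets subsequent layers apply ordinary matrix operations there.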

Given a differentiable function $F$ defined as $F : \mathcal{M} \to \mathcal{N}$, where $\{ (U_\alpha, \phi_\alpha) \}$ and $\{ (V_\beta, \psi_\beta) \}$ are the chart coverings of $\mathcal{M}$ and $\mathcal{N}$ respectively, and $F = \psi_\beta^{-1} \circ \widetilde{F} \circ \phi_\alpha$ for some differentiable $\widetilde{F}$, the Jacobian of $F$ (denoted by $\widetilde{J}_F$) is defined as:

$$\widetilde{J}_F(x) = \frac{\partial \widetilde{F}(u)}{\partial u}\Big|_{u = \phi_\alpha(x)}. \qquad (1)$$

The reason for the peculiar notation is that the derivative $\partial F / \partial x$ cannot be defined directly on manifold-valued data, so the usual Jacobian $J_F$ is not meaningful: we use the notation $\widetilde{J}_F$ to acknowledge this difference. Also note that $\widetilde{J}_F$ and $J_F$ are the same only when (1) using the global charts for the space and (2) $x$ and $F(x)$ lie on the same manifold.

Definition 4 (Group of isometries of $\mathcal{M}$ ($I(\mathcal{M})$))

A diffeomorphism $\gamma : \mathcal{M} \to \mathcal{M}$ is an isometry if it preserves distance, i.e., $d(\gamma(x), \gamma(y)) = d(x, y)$ for all $x, y \in \mathcal{M}$. The set $I(\mathcal{M})$ of all isometries of $\mathcal{M}$ forms a group with respect to function composition.

Rather than writing an isometry as a function $\gamma$, we will write it as a group action. Henceforth, let $G$ denote the group $I(\mathcal{M})$, and for $g \in G$, $x \in \mathcal{M}$, let $g \cdot x$ denote the result of applying the isometry $g$ to the point $x$. Similar to the terminology in Chakraborty et al. (2018c), we will use the term “translation” to denote the group action $g \cdot x$. This is due to the distance-preserving property and is inspired by the analogy with the Euclidean space.
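As a concrete illustration (our own code, not the authors'): on the SPD manifold with the affine-invariant metric, congruence by an invertible matrix, $g \cdot X = g X g^{\mathsf{T}}$, is such an isometry, so this "translation" leaves pairwise distances unchanged:

```python
import numpy as np

def spd_logm(S):
    # matrix logarithm of a symmetric positive definite matrix via eigh
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_dist(X, Y):
    # affine-invariant distance: ||logm(X^{-1/2} Y X^{-1/2})||_F
    w, V = np.linalg.eigh(X)
    Xinv_sqrt = (V * (w ** -0.5)) @ V.T
    return np.linalg.norm(spd_logm(Xinv_sqrt @ Y @ Xinv_sqrt))

def act(g, X):
    # group action of GL(n) on SPD(n) by congruence
    return g @ X @ g.T

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
X = A @ A.T + 3 * np.eye(3)          # a well-conditioned SPD(3) point
B = rng.standard_normal((3, 3))
Y = B @ B.T + 3 * np.eye(3)
g = rng.standard_normal((3, 3))      # almost surely invertible

d_before = spd_dist(X, Y)
d_after = spd_dist(act(g, X), act(g, Y))   # equal: g acts by isometry
```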

Flow-based generative models

In this section, we will introduce flow-based generative models for manifold-valued data. We will first describe the Euclidean formulation and specify which components need to be generalized to get the manifold-valued formulation.

Flow-based generative models: Euclidean case

Flow-based generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018); Yang et al. (2019) aim to maximize the log-likelihood of the input data from an unknown distribution. The idea involves mapping the unknown distribution in the input space to a known distribution in the latent space using an invertible function $f$. At a high level, sampling from a known distribution is simpler, so an invertible $f$ can help draw samples from the input-space distribution.

Let $\{x_i\}_{i=1}^{N}$ be i.i.d. samples drawn from an unknown distribution $p^{*}(x)$. Let this unknown distribution be parameterized by $\theta$. In the rest of the paper, we use $p_\theta(x)$ as a proxy for $p^{*}(x)$. We learn $\theta$ over a dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$. We maximize the likelihood of the model given the dataset by minimizing the equivalent formulation of the negative log-likelihood:

$$\mathcal{L}(\mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} -\log p_\theta(x_i).$$
But to minimize the above expression, we need to know $p_\theta$. One way to bypass this problem is to learn a mapping from a known distribution in the latent space. Let the latent space be $\mathcal{Z}$. Then, the generative step is given by $z \sim p_Z(z)$, $x = g_\theta(z)$. Here $p_Z$ can be a Gaussian distribution,

$$p_Z(z) = \mathcal{N}(z; \mathbf{0}, \mathbf{I}).$$
Let $f_\theta$ be the inverse of $g_\theta$. For a normalizing flow Rezende and Mohamed (2015), the map $f_\theta$ is composed of a sequence of invertible functions $f_\theta = f_1 \circ f_2 \circ \cdots \circ f_K$. Hence, we have

$$x \xrightarrow{f_1} h_1 \xrightarrow{f_2} h_2 \;\cdots\; \xrightarrow{f_K} z.$$

Using $f_\theta$ and the change-of-variables formula, the log-likelihood of $x$ is

$$\log p_\theta(x) = \log p_Z(z) + \sum_{k=1}^{K} \log \left| \det \left( \frac{\partial h_k}{\partial h_{k-1}} \right) \right|, \qquad h_0 = x, \; h_K = z.$$
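The change-of-variables computation can be sanity-checked numerically with a single affine "flow" (a minimal sketch with our own names, not the paper's code):

```python
import numpy as np

# One invertible affine flow f(x) = s*x + b mapping data to a standard
# normal latent. log p(x) = log N(f(x); 0, 1) + log|det df/dx|.
def log_prob_flow(x, s, b):
    z = s * x + b                                   # z = f(x)
    log_pz = -0.5 * (z**2 + np.log(2.0 * np.pi))    # log N(z; 0, 1), per dim
    log_det = np.log(np.abs(s))                     # log |det df/dx|, per dim
    return np.sum(log_pz + log_det)

# Closed-form check: if z ~ N(0,1) and x = (z - b)/s, then
# x ~ N(-b/s, 1/s^2), so both expressions must agree exactly.
x = np.array([0.3, -1.2])
s, b = 2.0, 0.5
mu, sigma = -b / s, 1.0 / abs(s)
closed_form = np.sum(-0.5 * ((x - mu) / sigma) ** 2
                     - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
```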


In Kingma and Dhariwal (2018), the GLOW model is composed of three different layers whose Jacobians are triangular matrices, simplifying the log-determinant:

Figure 2: The basic block of GLOW Kingma and Dhariwal (2018). The color represents the mean while the shape represents the standard deviation. The target distribution on the latent space is the “Grape” rounded rectangles. Actnorm normalizes the data to almost “Grape” rounded rectangles, while the disturbance part is “Lavender”. The 1×1 convolution organizes the channels. Affine Coupling operates on half of the channels to fit the target distribution. The “⊙” here is the element-wise multiplication, while “×” is the matrix multiplication; “+” and “−” have their usual definitions.

The three layers in the basic GLOW block (shown in Fig. 2), summarized in Table 1, are all invertible functions. These are (a) Actnorm, (b) Invertible 1×1 convolution, and (c) Affine Coupling layers. Note that the data is squeezed before it is fed into the block. Then, the data is split as in Dinh et al. (2016).

(a) Actnorm normalizes the input to be zero-mean with identity standard deviation. In (6), the scale $s$ and bias $b$ are initialized from the data and then trained independently.

(b) 1×1 convolution applies the invertible matrix $W$ on the channel dimension. In (7), $x, y \in \mathbb{R}^{h \times w \times c}$ and $W \in \mathbb{R}^{c \times c}$, where $h \times w$ is the resolution of the input variables while $c$ is the number of channels.

(c) Affine Coupling uses the idea of split+concatenation. In (8), the input variable $x$ is split along the channel dimension into $x_a, x_b$, and then $y_a, y_b$ are concatenated to get the final output $y$. Here, $s$ (and $t$) are real-valued matrices of the same dimension as $x_a$ for element-wise scaling (and translation).

Figure 3: Transfer from the source manifold $\mathcal{M}_1$ to the target manifold $\mathcal{M}_2$ with the generative model, with the detail of the blocks of our model. The meanings of colors and shapes are the same as in Fig. 2, while all variables lie on the manifold instead of Euclidean space. The major difference between our manifold-valued GLOW and the original GLOW model is that we use a tangent space transformation before and after every operator. Different from Fig. 2, there is no element-wise multiplication. The addition-like symbol here denotes the group operation on the manifold-valued data, while “×” is again the matrix multiplication in the tangent space.
Actnorm:          y = s ⊙ x + b                                          (6)
1×1 convolution:  y = W x                                                (7)
Affine Coupling:  x_a, x_b = split(x);  (log s, t) = NN(x_b);
                  y_a = s ⊙ x_a + t;  y_b = x_b;  y = concat(y_a, y_b)   (8)
Table 1: Definition of Actnorm, 1×1 convolution and Affine Coupling layers in the basic GLOW block, following Kingma and Dhariwal (2018). ⊙ is the elementwise multiplication. The function NN(·) is a nonlinear mapping.

In Kingma and Dhariwal (2018), the authors use a closed form for the inverse of these layers. Notice that calculating the determinant of the Jacobian is simple for all these layers except the affine coupling layer in (8) (Table 1). Since $y_b = x_b$, the Jacobian is triangular, and the Jacobian determinant is simply $\prod_j s_j$ (i.e., the log-determinant is $\mathrm{sum}(\log |s|)$).
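The coupling computation and its triangular-Jacobian log-determinant can be sketched as a minimal Euclidean toy (NN(·) stubbed with a fixed nonlinear map; names are ours):

```python
import numpy as np

def nn(xb):
    # stand-in for the learned NN(): predicts log-scale and shift from x_b
    log_s = np.tanh(xb)           # bounded log-scale for stability
    t = 0.5 * xb
    return log_s, t

def coupling_forward(x):
    xa, xb = np.split(x, 2)
    log_s, t = nn(xb)
    ya = np.exp(log_s) * xa + t   # y_a = s ⊙ x_a + t
    yb = xb                       # y_b = x_b (identity half)
    log_det = np.sum(log_s)       # triangular Jacobian: det = prod(s)
    return np.concatenate([ya, yb]), log_det

def coupling_inverse(y):
    ya, yb = np.split(y, 2)
    log_s, t = nn(yb)             # y_b == x_b, so NN sees the same input
    xa = (ya - t) * np.exp(-log_s)
    return np.concatenate([xa, yb])

x = np.array([0.7, -0.4, 1.1, 0.2])
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)       # exact inverse without inverting NN()
```

The inverse never differentiates or inverts the network itself, which is exactly why the layer remains cheap to invert.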


Next Steps: With the description above, we can now list the key operational components in (6)-(8) and the log-likelihood computation, which we need to modify for our manifold-valued extension.

Key ingredients: In (6) and (8), the operators are (i) elementwise multiplication for the scaling $s$ and (ii) the addition of the bias $b$. (iii) In (7), we require invertible matrices $W$. (iv) Finally, to compute the log-likelihood, we need the calculation of the derivative (the Jacobian determinant). Thus we can verify that the key ingredients to define the model in GLOW are (i) elementwise multiplication; (ii) addition of bias; (iii) an invertible matrix; (iv) derivative calculation. In theory, if we can modify these components from Euclidean space to manifolds, we will obtain a flow-based generative model on a Riemannian manifold. Observe that (i) and (iii) are matrix multiplications, which are non-trivial to define on a manifold. Following Def. 3, we can use the chart map to map the manifold to a subspace of $\mathbb{R}^n$ where a matrix multiplication can be used. This also provides a way to address item (iv) based on the chart map. In (1), we show how to compute the Jacobian of a differentiable function from one manifold to another, with respect to the charts of the manifolds. For item (ii), adding a bias can be viewed as a “translation” in the Euclidean space, and in Def. 4 we define the translation on manifold-valued data using the group action. With these in hand, we are ready to present our proposed manifold version of these layers next.

Flow-based models: Riemannian manifold case

Actnorm        1×1 convolution        Affine Coupling
Table 2: Definition of Actnorm, 1×1 convolution and Affine Coupling layers in our ManifoldGLOW block, with the forward function on the top and the reverse function on the bottom. Here $\phi$ and $\phi^{-1}$ are the Chart Map and its inverse. $S$ is a diagonal matrix, so its inverse can be computed elementwise. $g^{-1}$ represents the inverse of the group action. The matrix $W$ is chosen as a rotation matrix; thus, $W^{-1} = W^{\mathsf{T}}$.

We will now introduce the manifold counterpart of the key operations. See Table 2 for a summary of functions.

(a) Actnorm. Let $h \times w$ be the spatial resolution and $c$ be the channel size, with $x \in \mathcal{M}^{h \times w \times c}$. We modify (6) to manifold-valued data using the operators we mentioned above in Key ingredients. The bias term is replaced by the group operator $g$, while the multiplication is replaced by a diagonal matrix $S$ applied in the space after the chart mapping $\phi$. The layer function is defined as in (10).

The determinant of the Jacobian can be computed as in (16). In general, $h \times w$ can be a tuple, i.e., for 3D data, it is a 3-dimensional tuple.


(b) 1×1 convolution. We define a 1×1 convolution to offer the flexibility of interaction between channels. Here $W$ is a $c \times c$ matrix applied after the chart mapping $\phi$. In general, we can learn any invertible $W$, i.e., a full-rank matrix like in (7). But in practice, maintaining full rank is a hard constraint and $W$ may become unbounded. As a regularization, we choose $W$ to be a rotation matrix. This layer function is defined as in (11) using the same notation as in (7).

The determinant of the Jacobian can be computed as in (17). Notice that for $W$ a rotation matrix, $\det W = 1$, so the contribution from $W$ to the log-determinant is $0$.
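One convenient way to realize this (an assumption about a natural parameterization; the paper only requires $W$ to be a rotation) is to take the matrix exponential of a skew-symmetric matrix, after which the determinant contribution vanishes and the inverse is the transpose:

```python
import numpy as np

def rotation_from_skew(params, c):
    """Build a rotation W = expm(A) from the free entries of a
    skew-symmetric A (a hypothetical parameterization for illustration)."""
    A = np.zeros((c, c))
    iu = np.triu_indices(c, k=1)
    A[iu] = params
    A = A - A.T                       # skew-symmetric: A^T = -A
    W = np.eye(c)                     # truncated power series for expm(A)
    term = np.eye(c)
    for k in range(1, 30):
        term = term @ A / k           # term = A^k / k!
        W = W + term
    return W

c = 3
W = rotation_from_skew(np.array([0.3, -0.2, 0.8]), c)
log_det_contrib = np.log(np.abs(np.linalg.det(W)))   # vanishes for rotations
```

Since $W W^{\mathsf{T}} = I$, inverting the layer needs only a transpose, and the parameters stay bounded by construction.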


(c) Affine Coupling. For manifold-valued data, given $x \in \mathcal{M}^{h \times w \times c}$ (where $h \times w$ and $c$ are the spatial and channel resolutions), we first split the data along the channel dimension, i.e., partition $x$ into two parts denoted by $x_a$ and $x_b$, where $x_a, x_b \in \mathcal{M}^{h \times w \times c/2}$. From (8), we need to modify the scaling and translation. Here, the scaling matrix $S$ and the group element $g$ are predicted from $x_b$ by the two neural networks of the coupling layer. These two operators play the same roles as in (8): scaling and translation. We need $S$ to be full rank. If needed, one may use constraints like orthogonality or a bounded matrix for numerical stability. After performing the coupling, we simply combine $y_a$ and $x_b$ to get $y$ as our output. This function is defined in (12).

Determinant of the Jacobian: similar to the Euclidean case in (8), observe that $\widetilde{J}$ involves taking the gradient of a neural network! But fortunately, we only require the determinant of the Jacobian matrix, and the independence of $y_b$ ($= x_b$) from $x_a$ saves the calculation of the network gradient, since $\partial y_b / \partial x_a = 0$ makes the Jacobian block-triangular. Thus, given $S$, the Jacobian determinant is given as in (18).


Distribution on the latent space: After the cascaded functional transformations described above, we transform $x$ to the latent variable $z$ on the manifold. We define a Gaussian distribution on $\mathcal{M}$, namely $\mathcal{N}_{\mathcal{M}}(\mu, \Sigma)$, by inducing a multi-variate Gaussian distribution from the tangent space $T_\mu \mathcal{M}$ as

$$z = \mathrm{Exp}_\mu(v), \qquad v \sim \mathcal{N}(\mathbf{0}, \Sigma),$$

where $\mu \in \mathcal{M}$ and $\Sigma \in \mathrm{SPD}$ (SPD denotes a symmetric positive definite matrix).
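A small sketch of this construction on the 2-sphere (anchor $\mu$ at the north pole; helper names are ours): sample in the tangent plane, then push through the exponential map:

```python
import numpy as np

def sphere_exp(p, v):
    # exponential map on the unit sphere (v lies in the tangent plane at p)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p.copy()
    return np.cos(theta) * p + np.sin(theta) * v / theta

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])                  # anchor point on S^2
Sigma = np.array([[0.05, 0.0], [0.0, 0.02]])    # SPD covariance in T_mu
L = np.linalg.cholesky(Sigma)

samples = []
for _ in range(1000):
    u = L @ rng.standard_normal(2)              # tangent coordinates
    v = np.array([u[0], u[1], 0.0])             # embed T_mu in R^3
    samples.append(sphere_exp(mu, v))
samples = np.array(samples)                     # all points stay on S^2
```

Every draw lands exactly on the manifold, and the samples concentrate around $\mu$ with spread controlled by $\Sigma$.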

Learning mappings between manifolds

We can now ask the question: can we draw manifold-valued data conditioned on another manifold-valued sample? Due to the invertibility of our generative model, this seems possible, since all we need to develop, in addition to what has been covered, is a scheme to sample data from Euclidean space conditioned on a vector-valued input.

Recently, extensions of the GLOW model (in a Euclidean setup) have been used to generate samples from one space conditioned on another; see Sun et al. (2019). In this section, we roughly follow Sun et al. (2019) by using connections in a latent space, but in a manifold setting, to generate a sample from a manifold $\mathcal{M}_2$ conditioned on a sample on manifold $\mathcal{M}_1$. The underlying assumption is that there exists a (smooth) function from $\mathcal{M}_1$ to $\mathcal{M}_2$. The generation steps are as follows.

  1. Given variables $x_1$ and $x_2$ with the dimensions of the manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$ being $n_1$ and $n_2$ respectively, we use the two parallel GLOW models (as discussed above) to get the corresponding latent spaces. Let them be denoted by $z_1$ and $z_2$ respectively.

  2. After getting the respective latent spaces, we need to fit a distribution on them. Since we wish to generate samples from $\mathcal{M}_2$, the distribution on the corresponding latent space must be induced from the variables in $\mathcal{M}_1$, i.e., the latent space for $z_1$. We do not have any constraint on the distribution parameters for $z_1$, so we use a Gaussian distribution with a fixed $\mu_1$ and $\Sigma_1$ on $z_1$. The parameters for the Gaussian distribution on $z_2$ are defined as functions of $z_1$. Formally, we define the distribution using (19), where $\mu_2 = f_\mu(z_1)$ and $\Sigma_2 = f_\Sigma(z_1)$. Here, the two functions $f_\mu$ and $f_\Sigma$ are modeled using a neural network. The scheme is shown in Fig. 3.
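Step 2 can be sketched with toy conditioning heads (the shapes and the names $f_\mu$, $f_\Sigma$ are our own stand-ins, not the released architecture): small maps from $z_1$ produce the mean and a positive standard deviation for the Gaussian on $z_2$'s latent space:

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 8, 6                                     # toy latent dimensions
W_mu = 0.1 * rng.standard_normal((d2, d1))
W_sig = 0.1 * rng.standard_normal((d2, d1))

def f_mu(z1):
    # mean head: conditions the target latent's center on z1
    return np.tanh(W_mu @ z1)

def f_sigma(z1):
    # scale head: exp keeps the per-dimension std strictly positive
    return np.exp(W_sig @ z1)

def sample_z2(z1):
    # reparameterized draw from N(f_mu(z1), diag(f_sigma(z1)^2))
    mu, sigma = f_mu(z1), f_sigma(z1)
    return mu + sigma * rng.standard_normal(d2)

z1 = rng.standard_normal(d1)
z2 = sample_z2(z1)        # would be decoded by the second GLOW stream
```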

Specific examples of manifolds. Finally, in order to implement (10), (11) and (12) mentioned in the previous sections, the basic operations specific to a manifold are (a) the choice of distance $d$, (b) the isometry group $G$, and (c) the chart map and its inverse, $\phi$ and $\phi^{-1}$. We use three types of non-Euclidean Riemannian manifolds in the experiments presented in this work (including the supplement section): (a) the hypersphere, (b) the space of positive real numbers, and (c) the space of symmetric positive definite matrices (SPD). We give the explicit formulation for the operations in Table 3.

Table 3: The explicit formulation for the basic operations. Here $p$ is an anchor point for the chart map, which can be one of the poles. $\mathrm{SO}(n)$ is the group of special orthogonal matrices. Chol is the Cholesky decomposition.


We demonstrate the experimental results of our model using two setups. First, we generate texture images based on local covariances, which serves as a sanity-check evaluation relative to another generative model for manifold-valued data available at this time. The second experiment, which is our main scientific focus, generates orientation distribution function (ODF) images Hess et al. (2006) using diffusion tensor imaging (DTI) Basser et al. (1994); Alexander et al. (2007). Note that in this setting we construct the DTI scans from under-sampled diffusion directions. This makes the generation of ODF conditioned on the DTI scans challenging and motivates us to tackle this problem using our proposed framework.

Baseline. Very recently, the ℳ-flow Brehmer and Cranmer (2020) was introduced, which provides a generative model for manifold-valued data. ℳ-flow uses an encoder to encode the manifold-valued data in the high-dimensional space into a low-dimensional Euclidean space. During generation, the model generates the low-dimensional Euclidean data and warps it back to the manifold in the high-dimensional space. The benefit of this method is that it can learn the dimension of the unknown manifold, including for natural images like ImageNet Deng et al. (2009). But for a known Riemannian manifold, the dimension of the manifold is fixed. For example, the $n$-sphere is of dimension $n$, while $\mathrm{SPD}(n)$ is of dimension $n(n+1)/2$. Thus, for a known Riemannian manifold, ℳ-flow learns the chart using an encoder neural network and applies all the operations in the learned space with the (known) dimension. Another interesting recent proposal, manifoldWGAN Huang et al. (2019b), showed that it is possible to generate low-resolution fields of SPD matrices using WGAN. Due to the involved calculations needed by WGAN, extending it to high-dimensional manifold-valued data, including ODF (which lies on a high-dimensional hypersphere), will require non-trivial changes. Further, manifoldWGAN in its current form does not deal with conditioning the generated images on another manifold-valued image, but this is an interesting future direction to explore.

Now, we present experiments for generating texture images before moving to the more challenging ODF generation task.

Generating texture images

The earth texture images dataset was introduced in Yu et al. (2019) and is divided into train and test sets. All images are augmented by random transformations and cropped to a fixed size. Our goal here is to generate texture images based on the local covariances of the three (R, G, B) channels. So the two spaces are $\mathrm{SPD}(3)$ (for the covariance matrices) and the Euclidean space of RGB images (for the texture images). Since ℳ-flow can only take Euclidean data as the “conditioning variable”, we vectorize the local covariances as the condition variable for ℳ-flow. The dimension of the learned space for ℳ-flow follows the default configuration from StyleGAN Karras et al. (2020). For our case, we build two parallel manifold-GLOW streams; after a fixed number of blocks, the spatial resolution is reduced by half. In the latent space, we train a residual network to map the distribution of $z_1$ to $z_2$. Example results are shown in Fig. 4. Even in this simple setting, due to the encoder in ℳ-flow, the generated images lose sharpness. Our model uses the information of the local covariances to generate superior texture images.

Figure 4: Generated images from (a) ℳ-flow, (b) ours, and (c) the ground truth. The condition is the local covariances of the RGB channels.

Main focus: Diffusion MRI dataset

Our main focus is the conditional synthesis of structural brain image data. Diffusion-weighted magnetic resonance imaging (dMRI) is an MR imaging modality which measures the diffusion of water molecules at a voxel, and is used to understand brain structural connectivity. Diffusion tensor imaging (DTI), a type of dMRI Basser et al. (1994); Alexander et al. (2007), measures the restricted diffusion of water along only three canonical directions at each voxel. The measurement at each voxel is a $3 \times 3$ symmetric positive definite (SPD) matrix (i.e., manifold-valued data). If multi-shell acquisition capabilities are available, we can obtain a richer acquisition; here, each voxel is an orientation distribution function (ODF) Hess et al. (2006) which describes the diffusivity in multiple directions (less lossy compared to DTI). By symmetrically/equally sampling points on the continuous distribution function Garyfallidis et al. (2014), each measurement is a high-dimensional vector with non-negative entries that sum to $1$. Using the square-root parameterization Brody and Hughston (1998); Srivastava et al. (2007), the data at each voxel lies on the positive part of a unit hypersphere.
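The square-root parameterization is simple to state in code (a sketch with a toy 4-bin ODF; real ODFs use many more sampled directions):

```python
import numpy as np

def odf_to_sphere(odf):
    """Square-root parameterization: a discretized ODF (non-negative,
    summing to 1) maps to a point on the positive orthant of the unit
    sphere, since ||sqrt(odf)||_2^2 = sum(odf) = 1."""
    odf = np.asarray(odf, dtype=float)
    assert np.all(odf >= 0) and np.isclose(odf.sum(), 1.0)
    return np.sqrt(odf)

def sphere_to_odf(q):
    # inverse map: squaring the coordinates recovers the ODF
    return q ** 2

odf = np.array([0.1, 0.25, 0.05, 0.6])   # toy 4-direction ODF
q = odf_to_sphere(odf)                   # unit vector on the sphere
```

This is what lets the per-voxel ODF be treated with the hypersphere operations of Table 3.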

We seek to generate a 3D brain image where each voxel is an ODF from the corresponding DTI image (where each voxel is an SPD matrix). To make the setup more challenging (and scientifically interesting), we generate the DTI images only from randomly under-sampled diffusion directions. We now explain (a) the rationale for the application, (b) the data description, (c) the model setup, and (d) the evaluations. Note that in the experiment, since we draw samples from the distribution on the latent space, conditioned on DTI, to get the target representation, we call it generation rather than reconstruction.

Why is generating ODF from DTI important? For dMRI, different types of acquisitions involve longer or shorter acquisition times. Higher angular resolution images (e.g., ODF) involve a substantially longer multi-shell acquisition per scan than DTI, and this is problematic, especially for children and the elderly. To shorten the acquisition time with minimal compromise in image quality, we require mechanisms to transform data acquired from shorter acquisitions (DTI) to a higher angular resolution image: a field (or image) of ODFs. This serves as our main motivation.

However, (a) the per-voxel degrees of freedom for the ODF representation (which lies on a high-dimensional hypersphere) are far greater than for DTI (which lies on $\mathrm{SPD}(3)$, with only $6$ degrees of freedom); hence, it is an ill-posed problem. And (b) it requires mathematical tools to “transform” from one manifold (DTI representation) to another (ODF representation) while preserving structural information. Now, we describe some details of the data and models and present the results.

Dataset Age Gender
22-25 26-30 31-35 36+ Female Male
All 224 467 364 10 575(54.0%) 490(46.0%)
Train 178 370 295 9 463(54.3%) 389(45.7%)
Test 46 97 69 1 112(52.6%) 101(47.4%)
Table 4: The demographics used in the study.

Dataset. The dataset for our method is the Human Connectome Project (HCP) Van Essen et al. (2013). The total number of subjects with diffusion measurements available is 1065: 852 were used as the training set and 213 as the test set. Demographic details are reported in Table 4 (please see Van Essen et al. (2013) for more details of the dataset). All raw dMRI images are pre-processed with the HCP diffusion pipeline with FSL's ‘eddy’ Andersson and Sotiropoulos (2016). After correction, ODF and DTI pairs were obtained using the Diffusion Imaging in Python (DIPY) toolbox Garyfallidis et al. (2014). Due to the memory requirements of the model and the 3D nature of medical data, generation of an ODF image of the entire brain at once remains out of reach at this point; hence we resize the original data to a lower resolution, but the process can proceed in a sliding-window fashion as well.

Figure 5: The transformation from DTI to ODF. Both are generated from dMRI. But there might not be dMRI available in some situations. Thus, we want to train the network to transfer DTI to ODF. The latent space is the Gaussian distribution variable.
Figure 6: (a) Generated ODF from the corresponding DTI. Each pair here contains the input DTI (top) and the generated ODF (bottom). (b) The distribution of reconstruction error over the testing population. (c) Reconstruction error over the test population shows that the generated image is closest to its own corresponding image (diagonal dominance).

Reduction in memory costs. Since entire 3D models for brain images are still too large to fit into GPU memory, we need to further simplify the model without sacrificing too much performance. Recently, NanoFlow Lee et al. (2020) was introduced to reduce the number of parameters for sequential data processing. The assumption behind NanoFlow is that the Affine Coupling layer, if fully trained, can estimate the distribution of any part of the input data in a fixed order. There is some performance drop compared with training different Affine Coupling layers for different parts of the data, but the gain from reducing the parameters is significant. Thus, in our setup, due to the large 3D input, we apply the NanoFlow trick to DTI and ODF separately. For example, for the DTI data, we first split the entire input into slices. We can then share the two neural networks in the Affine Coupling layer among these slices: their input is the conditioning part of each slice, while their outputs are the estimated mean and variance of the remaining part, respectively. Due to the shared weights, the number of parameters drops, making training feasible for our 3D DTI and ODF setups.
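The weight-sharing idea can be illustrated with a minimal numpy sketch of an affine coupling layer. This is a toy, not the paper's manifold-valued layer: the "networks" are fixed random linear maps standing in for the learned scale and translation estimators, and all sizes are arbitrary. The point is that one shared pair of networks processes every slice, yet each slice's transform remains exactly invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared pair of "networks" (here: fixed random linear maps) playing
# the roles of the scale and translation estimators in the Affine Coupling
# layer; NanoFlow-style sharing uses this single pair for all slices
# instead of a separate pair per slice. Sizes are illustrative.
d = 8  # features per half-slice
W_s = 0.1 * rng.standard_normal((d, d))
W_t = 0.1 * rng.standard_normal((d, d))

def coupling_forward(x1, x2):
    # x1 conditions the transform of x2; log-scale kept small for stability.
    log_s, t = x1 @ W_s, x1 @ W_t
    return x1, x2 * np.exp(log_s) + t

def coupling_inverse(y1, y2):
    log_s, t = y1 @ W_s, y1 @ W_t
    return y1, (y2 - t) * np.exp(-log_s)

# The same weights process every slice of the split input.
slices = [rng.standard_normal((4, 2 * d)) for _ in range(5)]
for s in slices:
    x1, x2 = s[:, :d], s[:, d:]
    y1, y2 = coupling_forward(x1, x2)
    xr1, xr2 = coupling_inverse(y1, y2)
    assert np.allclose(xr2, x2)  # invertible, with shared parameters
```

The log-determinant of the Jacobian also stays cheap under sharing: it is still just the sum of `log_s` entries per slice.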

Model Setup. To set up our model, we first build two flow-based streams, one for DTI and one for ODF. Then, in the latent space, we train a transformation operating between the Gaussian-distributed variable on one manifold and the Gaussian-distributed variable on the other. This architecture, with two flow-based models and the transformation module, can be jointly trained as shown in Fig. 5. We use basic blocks of our manifold GLOW and, after every few blocks, reduce the resolution by half; this setup is the same for both DTI and ODF. We use residual network blocks to map the latent space from DTI to ODF. The samples are presented to the model in paired form, i.e., a DTI image (a field of SPD matrices) and the corresponding ODF image (a field of ODFs). To reduce the number of parameters for this 3D data, we use an idea similar to NanoFlow Lee et al. (2020), sharing the Affine Coupling layers within the DTI and ODF streams separately. As a comparison, for the baseline ℳ-flow model, the learned manifold dimension must be specified separately for DTI and ODF. While ℳ-flow could be trained for our texture experiments, here the memory requirements are quite large: the number of parameters required for ℳ-flow far exceeds that of our model. A similar situation arises in the Euclidean-space version of GLOW, which also does not leverage the intrinsic Riemannian metric; its memory cost therefore exceeds that of natural images. This is infeasible even on clusters, and results from these baselines are therefore very difficult to obtain.
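The latent-space mapping between the two streams can be sketched as a stack of residual blocks. This is an illustrative numpy toy under stated assumptions: the weights are random placeholders (in the actual model this module is trained jointly with the two flow streams), and the latent dimensionality and block count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # latent dimensionality (illustrative)

# Hypothetical residual blocks mapping the DTI latent code to the ODF
# latent code; weights here are random stand-ins for trained parameters.
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(3)]

def latent_transform(z_dti):
    z = z_dti
    for W in Ws:                    # residual blocks: z <- z + f(z)
        z = z + np.tanh(z @ W)
    return z

z_dti = rng.standard_normal((2, d))  # latent codes from the DTI stream
z_odf = latent_transform(z_dti)      # fed to the inverse ODF stream
assert z_odf.shape == z_dti.shape
```

At generation time, a DTI image is pushed through the DTI stream to its latent code, mapped by this module, and then decoded by running the ODF stream in reverse.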

Choice of metrics. We report the “reconstruction error” using the distance in Table 3. Although the task here is generation, measuring reconstruction error assesses how “similar” the original ODF is to the generated ODF, which is generated directly from the corresponding DTI representation. We also perform a group-difference analysis to identify statistically different regions across groups (grouped by a dichotomous variable). Since HCP only includes healthy subjects (HCP Aging is smaller), we perform a group-difference test based on gender, i.e., male vs. female. We evaluate the overlap: whether the group-wise different regions on the generated/reconstructed data agree with those on the actual ODF images.

Figure 7: The p-value map for one of the ROIs of the entire full-resolution brain scan. We show that our proposed method can generate meaningful ODF with respect to group-level differences.

Generation results. We present quantitative and qualitative results for generating ODF from its DTI representation. In Fig. 6(a), we show a few example slices of the given DTI and the generated ODF. Overall, the reconstruction error was low. Since perceptually comparing fidelity between generated and ground-truth images is difficult, we perform the following quantitative analysis: (a) a histogram of the reconstruction error over all test subjects (shown in Fig. 6(b)); (b) an error matrix showing how similar the generated ODF image is to the other “incorrect” samples of the population, to assess whether the generated ODF is distinctive across different samples (shown in Fig. 6(c)). From the histogram in Fig. 6(b), we see that the reconstruction error is consistently low over the entire test population. We generate Fig. 6(c) as follows: for each subject in the test population, we randomly select a set of samples (subjects) from the population and compute their reconstruction error with the generated ODF. This gives us a square error matrix (similar to a confusion matrix); Fig. 6(c) shows the average over multiple runs, where lighter shades mean a larger reconstruction error. Ideally, we should see a dark diagonal, which is approximately what the plot depicts. This suggests that, over the test population, the generation is meaningful (preserves structures) and distinctive (maintains variability across subjects). Only a few experiments on generation of dMRI data are described in the literature Huang et al. (2019b); Anctil-Robitaille et al. (2020). While Huang et al. (2019b) shows the ability to generate 2D DTI slices, the techniques described here can operate on 3D ODF data and should offer improvements.
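The error-matrix construction and the diagonal-dominance check can be sketched as follows. This is a toy with synthetic stand-ins for the generated and ground-truth ODF images (flattened to vectors) and a plain Euclidean norm in place of the manifold distance; `err[i, j]` is the reconstruction error between subject i's generated ODF and subject j's true ODF.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10  # number of test subjects (illustrative)

# Synthetic stand-ins: each generated image is a noisy copy of its own
# subject's ground truth, so it should be closest to that subject.
generated = rng.standard_normal((n, 64))
truth = generated + 0.05 * rng.standard_normal((n, 64))

# Pairwise reconstruction-error matrix (rows: generated, cols: truth).
err = np.linalg.norm(generated[:, None, :] - truth[None, :, :], axis=-1)

# Diagonal dominance: each generated image matches its own subject best.
assert np.all(err.argmin(axis=1) == np.arange(n))
```

Plotting `err` with lighter shades for larger values reproduces the qualitative read of Fig. 6(c): a dark diagonal against a lighter background.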

Group difference analysis. We now quantitatively measure whether the reconstruction is good enough that the generated samples can serve as a proxy for downstream statistical analysis and yield improvements over the same analysis performed on DTI. We run permutation testing with a large number of independent permutations and compute the per-voxel p-value to identify voxels that are statistically different between the groups for the following settings: (a) original ODF, (b) generated ODF, (c) DTI, (d) the fractional anisotropy (FA) representation (a commonly used summary of DTI). Both DTI and FA are commonly used for assessing statistically significant differences across genders Menzler et al. (2011); Kanaan et al. (2012). But since ODF contains more structural information than either FA or DTI, our generated ODF should be able to pick up more statistically significant regions than DTI or FA. We evaluate the intersection of the significant regions with those of the original ODF (which contains the most information), using the intersection over union (IoU) measure. For the whole brain, FA has IoU 0.04 and DTI has IoU 0.16, while the generated ODF has IoU 0.22. The generated ODF thus has a larger intersection with the statistically significant regions of the original ODF and offers improvements over DTI. This provides some evidence that the generated ODF preserves the signal that differs across the male/female groups. We also show a zoomed-in example of an ROI for the full-resolution images in Fig. 7. The p-values for different ROIs agree in both the original ODF and our generated ODF, indicating consistency of our results, at least in terms of the regions identified in downstream statistical analysis. Note that the analysis on the real ODF images serves as the ground truth.
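The per-voxel permutation test and the IoU comparison above can be sketched with numpy. This is a simplified toy: a two-sample difference-in-means test on a scalar summary per voxel (the paper tests manifold-valued data), with synthetic groups in which the first few voxels truly differ.

```python
import numpy as np

rng = np.random.default_rng(3)

def perm_pvalues(a, b, n_perm=1000, rng=rng):
    """Per-voxel permutation test for a group difference in means.
    a, b: (subjects, voxels) arrays of per-voxel summary values."""
    obs = np.abs(a.mean(0) - b.mean(0))
    pooled = np.concatenate([a, b])
    count = np.zeros(a.shape[1])
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        pa, pb = pooled[perm[:len(a)]], pooled[perm[len(a):]]
        count += np.abs(pa.mean(0) - pb.mean(0)) >= obs
    return (count + 1) / (n_perm + 1)  # add-one to avoid zero p-values

def iou(sig_a, sig_b):
    """Intersection over union of two boolean significance maps."""
    return np.logical_and(sig_a, sig_b).sum() / np.logical_or(sig_a, sig_b).sum()

# Toy data: 20 voxels, the first 5 truly differ between the groups.
g1 = rng.standard_normal((30, 20)); g1[:, :5] += 2.0
g2 = rng.standard_normal((30, 20))
p = perm_pvalues(g1, g2)
print(iou(p < 0.05, np.arange(20) < 5))
```

In the paper's setting, `p` would be computed per representation (original ODF, generated ODF, DTI, FA), and each representation's significance map would be compared against the original ODF's map via `iou`.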


A number of deep neural network formulations have been extended to manifold-valued data in the last two years. While most of these developments are based on models such as CNNs or RNNs, in this work we study the generative regime: we introduce a flow-based generative model on Riemannian manifolds. We show that the three types of layers in such models, Actnorm, invertible convolution, and Affine Coupling, can be generalized/adapted to manifold-valued data in a way that preserves invertibility. We also show that, with a transformation in the latent space between two manifolds, we can generate manifold-valued data based on information from another manifold. We demonstrate good generation results for the ODF representation given DTI on the Human Connectome Project dataset. While the current formulation shows mathematical feasibility and promising results, additional work on the methodological and implementation side is needed to reduce the runtime to a level where the tools can be deployed in scientific labs.


This research was supported in part by grant 1RF1AG059312-01A1 and NSF CAREER RI #1252725.


  • A. L. Alexander, J. E. Lee, M. Lazar, and A. S. Field (2007) Diffusion tensor imaging of the brain. Neurotherapeutics 4 (3), pp. 316–329. Cited by: Main focus: Diffusion MRI dataset, Experiments.
  • B. Anctil-Robitaille, C. Desrosiers, and H. Lombaert (2020) Manifold-aware cyclegan for high resolution structural-to-dti synthesis. arXiv preprint arXiv:2004.00173. Cited by: Main focus: Diffusion MRI dataset.
  • J. L. Andersson and S. N. Sotiropoulos (2016) An integrated approach to correction for off-resonance effects and subject movement in diffusion mr imaging. Neuroimage 125, pp. 1063–1078. Cited by: Main focus: Diffusion MRI dataset.
  • A. Antoniou, A. Storkey, and H. Edwards (2017) Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340. Cited by: Introduction.
  • M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In International conference on machine learning, pp. 214–223. Cited by: Introduction.
  • P. J. Basser, J. Mattiello, and D. LeBihan (1994) MR diffusion tensor spectroscopy and imaging. Biophysical journal 66 (1), pp. 259–267. Cited by: Main focus: Diffusion MRI dataset, Experiments.
  • W. M. Boothby (1986) An introduction to differentiable manifolds and riemannian geometry. Vol. 120, Academic press. Cited by: Preliminaries.
  • J. Brehmer and K. Cranmer (2020) Flows for simultaneous manifold learning and density estimation. arXiv preprint arXiv:2003.13913. Cited by: Introduction, Experiments.
  • D. C. Brody and L. P. Hughston (1998) Statistical geometry in quantum mechanics. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454 (1977), pp. 2445–2475. Cited by: Main focus: Diffusion MRI dataset.
  • M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: Introduction.
  • R. Bruno, M. Conti, and E. Gregori (2005) Mesh networks: commodity multihop ad hoc networks. IEEE communications magazine 43 (3), pp. 123–131. Cited by: Introduction.
  • R. Chakraborty, M. Banerjee, and B. C. Vemuri (2018a) A cnn for homogeneous riemannian manifolds with applications to neuroimaging. arXiv preprint arXiv:1805.05487. Cited by: Introduction.
  • R. Chakraborty, J. Bouza, J. Manton, and B. C. Vemuri (2018b) Manifoldnet: a deep network framework for manifold-valued data. arXiv preprint arXiv:1809.06211. Cited by: Introduction.
  • R. Chakraborty, S. Hauberg, and B. C. Vemuri (2017) Intrinsic grassmann averages for online linear and robust subspace learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6196–6204. Cited by: Introduction.
  • R. Chakraborty, C. Yang, X. Zhen, M. Banerjee, D. Archer, D. Vaillancourt, V. Singh, and B. Vemuri (2018c) A statistical recurrent model on the manifold of symmetric positive definite matrices. In Advances in Neural Information Processing Systems, pp. 8883–8894. Cited by: Introduction, Preliminaries.
  • A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. (2015) Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012. Cited by: Introduction.
  • K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259. Cited by: Introduction.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: Experiments.
  • L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803. Cited by: Flow-based generative models: Euclidean case.
  • C. Doersch (2016) Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908. Cited by: Introduction.
  • G. H. Dunteman (1989) Principal components analysis. Sage. Cited by: Introduction.
  • A. Fathi and A. Figalli (2010) Optimal transportation on non-compact manifolds. Israel Journal of Mathematics 175 (1), pp. 1–59. Cited by: Introduction.
  • A. Feragen, F. Lauze, and S. Hauberg (2015) Geodesic exponential kernels: when curvature and linearity conflict. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3032–3042. Cited by: Introduction.
  • P. T. Fletcher (2013) Geodesic regression and the theory of least squares on riemannian manifolds. International journal of computer vision 105 (2), pp. 171–185. Cited by: Introduction.
  • E. Garyfallidis, M. Brett, B. Amirbekian, A. Rokem, S. Van Der Walt, M. Descoteaux, and I. Nimmo-Smith (2014) Dipy, a library for the analysis of diffusion mri data. Frontiers in neuroinformatics 8, pp. 8. Cited by: Main focus: Diffusion MRI dataset, Main focus: Diffusion MRI dataset.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: Introduction.
  • M. S. Grewal (2011) Kalman filtering. Springer. Cited by: Introduction.
  • D. Groisser (2004) Newton’s method, zeroes of vector fields, and the riemannian center of mass. Advances in Applied Mathematics 33 (1), pp. 95–135. Cited by: Definition 3.
  • S. Haykin (2004) Kalman filtering and neural networks. Vol. 47, John Wiley & Sons. Cited by: Introduction.
  • C. P. Hess, P. Mukherjee, E. T. Han, D. Xu, and D. B. Vigneron (2006) Q-ball reconstruction of multimodal fiber orientations using the spherical harmonic basis. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 56 (1), pp. 104–117. Cited by: Main focus: Diffusion MRI dataset, Experiments.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: Introduction.
  • R. Huang, M. Rakotosaona, P. Achlioptas, L. Guibas, and M. Ovsjanikov (2019a) OperatorNet: recovering 3d shapes from difference operators. arXiv preprint arXiv:1904.10754. Cited by: Introduction.
  • Z. Huang and L. Van Gool (2017) A riemannian network for spd matrix learning. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: Introduction.
  • Z. Huang, J. Wu, and L. Van Gool (2018) Building deep networks on grassmann manifolds. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: Introduction.
  • Z. Huang, J. Wu, and L. Van Gool (2019b) Manifold-valued image generation with wasserstein generative adversarial nets. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3886–3893. Cited by: Introduction, Main focus: Diffusion MRI dataset, Experiments.
  • A. Jain, A. R. Zamir, S. Savarese, and A. Saxena (2016) Structural-rnn: deep learning on spatio-temporal graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5308–5317. Cited by: Introduction.
  • S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi (2013) Kernel methods on the riemannian manifold of symmetric positive definite matrices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 73–80. Cited by: Introduction.
  • R. A. Kanaan, M. Allin, M. Picchioni, G. J. Barker, E. Daly, S. S. Shergill, J. Woolley, and P. K. McGuire (2012) Gender differences in white matter microstructure. PloS one 7 (6). Cited by: Main focus: Diffusion MRI dataset.
  • T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119. Cited by: Generating texture images.
  • A. Kendall and R. Cipolla (2017) Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5974–5983. Cited by: Introduction.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: Introduction.
  • D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pp. 10215–10224. Cited by: Introduction, Introduction, Figure 2, Flow-based generative models: Euclidean case, Flow-based generative models: Euclidean case, Flow-based generative models: Euclidean case.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: Introduction.
  • R. Kondor and S. Trivedi (2018) On the generalization of equivariance and convolution in neural networks to the action of compact groups. arXiv preprint arXiv:1802.03690. Cited by: Introduction.
  • S. Koppers and D. Merhof (2016) Direct estimation of fiber orientations using deep learning in diffusion imaging. In International Workshop on Machine Learning in Medical Imaging, pp. 53–60. Cited by: Introduction.
  • B. Krauskopf, H. M. Osinga, and J. Galán-Vioque (2007) Numerical continuation methods for dynamical systems. Springer. Cited by: Preliminaries.
  • S. Lee, S. Kim, and S. Yoon (2020) NanoFlow: scalable normalizing flows with sublinear parameter complexity. arXiv preprint arXiv:2006.06280. Cited by: Main focus: Diffusion MRI dataset, Main focus: Diffusion MRI dataset.
  • J. H. Manton (2004) A globally convergent numerical algorithm for computing the centre of mass on compact lie groups. In ICARCV 2004 8th Control, Automation, Robotics and Vision Conference, 2004., Vol. 3, pp. 2211–2216. Cited by: Definition 3.
  • J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst (2015a) Geodesic convolutional neural networks on riemannian manifolds. In Proceedings of the IEEE international conference on computer vision workshops, pp. 37–45. Cited by: Introduction.
  • J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst (2015b) Shapenet: convolutional neural networks on non-euclidean manifolds. Technical report Cited by: Introduction.
  • K. Menzler, M. Belke, E. Wehrmann, K. Krakow, U. Lengler, A. Jansen, H. Hamer, W. H. Oertel, F. Rosenow, and S. Knake (2011) Men and women are different: diffusion tensor imaging reveals sexual dimorphism in the microstructure of the thalamus, corpus callosum and cingulum. Neuroimage 54 (4), pp. 2557–2562. Cited by: Main focus: Diffusion MRI dataset.
  • N. Miolane and S. Holmes (2020) Learning weighted submanifolds with variational autoencoders and riemannian variational autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14503–14511. Cited by: Introduction, Introduction.
  • M. Moakher (2005) A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications 26 (3), pp. 735–747. Cited by: Introduction.
  • S. Pratiher, S. Chattoraj, S. Agarwal, and S. Bhattacharya (2018) Grading tumor malignancy via deep bidirectional lstm on graph manifold encoded histopathological image. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 674–681. Cited by: Introduction.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: Introduction.
  • L. A. P. Rey, V. Menkovski, and J. W. Portegies (2019) Diffusion variational autoencoders. arXiv preprint arXiv:1901.08991. Cited by: Introduction.
  • D. J. Rezende and S. Mohamed (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770. Cited by: Introduction, Flow-based generative models: Euclidean case, Flow-based generative models: Euclidean case.
  • F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2008) The graph neural network model. IEEE Transactions on Neural Networks 20 (1), pp. 61–80. Cited by: Introduction.
  • A. Srivastava, I. Jermyn, and S. Joshi (2007) Riemannian analysis of probability density functions with applications in vision. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Cited by: Introduction, Main focus: Diffusion MRI dataset.
  • J. Straub, J. Chang, O. Freifeld, and J. Fisher III (2015) A dirichlet process mixture model for spherical data. In Artificial Intelligence and Statistics, pp. 930–938. Cited by: Introduction.
  • H. Sun, R. Mehta, H. H. Zhou, Z. Huang, S. C. Johnson, V. Prabhakaran, and V. Singh (2019) DUAL-glow: conditional flow-based generative model for modality transfer. In Proceedings of the IEEE International Conference on Computer Vision, pp. 10611–10620. Cited by: Introduction, Learning mappings between manifolds.
  • D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W. H. Consortium, et al. (2013) The wu-minn human connectome project: an overview. Neuroimage 80, pp. 62–79. Cited by: Main focus: Diffusion MRI dataset.
  • G. Yang, X. Huang, Z. Hao, M. Liu, S. Belongie, and B. Hariharan (2019) Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4541–4550. Cited by: Flow-based generative models: Euclidean case.
  • N. Yu, C. Barnes, E. Shechtman, S. Amirghodsi, and M. Lukac (2019) Texture mixer: a network for controllable synthesis and interpolation of texture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12164–12173. Cited by: Generating texture images.
  • X. Zhen, R. Chakraborty, N. Vogt, B. B. Bendlin, and V. Singh (2019) Dilated convolutional neural networks for sequential manifold-valued data. In Proceedings of the IEEE International Conference on Computer Vision, pp. 10621–10631. Cited by: Introduction.