Dual_Manifold_GLOW
This is the official webpage of Flow-based Generative Models for Learning Manifold to Manifold Mappings, AAAI 2021.
Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there are also no modality transfer/translation models for manifold-valued data whereas numerous such methods based on generative models are available for natural images. This paper addresses this gap, motivated by a need in brain imaging – in doing so, we expand the operating range of certain generative models (as well as generative models for modality transfer) from natural images to images with manifold-valued measurements. Our main result is the design of a two-stream version of GLOW (flow-based invertible generative models) that can synthesize information of a field of one type of manifold-valued measurements given another. On the theoretical side, we introduce three kinds of invertible layers for manifold-valued data, which are not only analogous to their functionality in flow-based generative models (e.g., GLOW) but also preserve the key benefits (determinants of the Jacobian are easy to calculate). For experiments, on a large dataset from the Human Connectome Project (HCP), we show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions (ODF) from diffusion tensor images (DTI), where the latter has a 5× faster acquisition time but at the expense of worse angular resolution.
Many measurements in computer vision and machine learning appear in a form that does not satisfy common Euclidean geometry assumptions. Operating on data where the data samples live in structured spaces often leads to situations where even simple operations such as distances, angles and inner products need to be redefined: while occasionally, Euclidean operations may suffice, the error progressively increases depending on the curvature of the space at hand Feragen et al. (2015). One encounters such data quite often – shapes Chang et al. (2015), surface normal directions Straub et al. (2015), graphs and trees Scarselli et al. (2008); Kipf and Welling (2016) as well as probability distribution functions Srivastava et al. (2007) are some common examples in vision and computer graphics Bruno et al. (2005); Huang et al. (2019a). Symmetric positive definite matrices Moakher (2005); Jayasumana et al. (2013), rotation matrices Kendall and Cipolla (2017), samples from a sphere Koppers and Merhof (2016), subspaces/Grassmannians Huang et al. (2018); Chakraborty et al. (2017), and a number of other algebraic objects are key ingredients in the design of efficient algorithms in computer vision and medical image analysis as well as in the development or theoretical analysis of various machine learning problems. While a mature literature on extending classical models such as principal components analysis Dunteman (1989); Haykin (2004); Grewal (2011) and regression Fletcher (2013) to such a manifold data regime is available, identifying ways in which deep neural network (DNN) models can be adapted to leverage and utilize the geometry of such data has only become a prominent research topic recently Bronstein et al. (2017); Chakraborty et al. (2018b); Kondor and Trivedi (2018); Huang et al. (2018); Huang and Van Gool (2017). This research direction has already provided convolutional neural networks for various types of manifold measurements
Masci et al. (2015a, b) as well as sequential models such as LSTM Hochreiter and Schmidhuber (1997)/GRU Cho et al. (2014) for manifold settings Jain et al. (2016); Pratiher et al. (2018); Chakraborty et al. (2018c); Zhen et al. (2019).
The results in the literature, so far, on harnessing the power of DNNs for better analysis of manifold or structured data are impressive, but most approaches are discriminative in nature. In other words, the goal is to characterize the conditional distribution of the responses or labels given the predictor variables or features; here the features are manifold-valued and the responses or labels are Euclidean. The technical thrust is on the design of mechanisms to specify this conditional distribution so that it respects the geometry of the data space. In contrast, work on the generative side is very sparse, and to our knowledge, only a couple of methods for a few specific manifolds have been proposed thus far Brehmer and Cranmer (2020); Rey et al. (2019); Miolane and Holmes (2020). What this means is that our ability to approximate the full joint probability distribution when the data are manifold-valued remains limited. As a result, the numerous application settings where generative models have shown tremendous promise, namely, semi-supervised learning, data augmentation Antoniou et al. (2017); Radford et al. (2015) and synthesis of new image samples by modifying a latent variable Kingma and Dhariwal (2018); Sun et al. (2019), as well as numerous others, currently cannot be evaluated for domains with datatypes that are not Euclidean vector-valued data.
GANs for Manifold data: what is challenging?
There are some reasons why generative models have sparingly been applied to manifold data. A practical consideration is that many application areas where manifold data are common, such as shape analysis and medical imaging, often cannot provide the sample sizes needed to train off-the-shelf generative models such as Generative adversarial networks (GANs) Goodfellow et al. (2014) and Variational autoencoders (VAEs) Kingma and Welling (2013); Doersch (2016). There are also several issues on the technical side. Consider the case where a data sample corresponds to an image where each pixel is a manifold variable (such as a covariance matrix). This means that each sample lives on a product space of the manifold of covariance matrices. Attempting to leverage state-of-the-art methods for GANs such as Wasserstein GANs (WGANs) Arjovsky et al. (2017) will involve, as a first step, defining appropriate generators that take uniformly distributed samples on a product space of manifolds and transform them into “realistic” samples which are also samples on a product space of manifolds. In principle, this can be attempted via recent developments by extending spherical CNNs or other architectures for manifold data Chakraborty et al. (2018a). Next, one would not only need to define optimal transport Fathi and Figalli (2010) or Wasserstein distances Huang et al. (2019b) in complicated spaces, but also develop new algorithms to approximate such distances (e.g., Sinkhorn iterations) to make the overall procedure computationally feasible. An interesting attempt to do so was described in Huang et al. (2019b). In that paper, Huang et al. introduced a WGAN-based generative model that can generate low-resolution low-dimension manifold-valued images. On the other hand, VAEs are mathematically more convenient in comparison for such data, and as a result, a few recent works show how they can be used for dealing with manifold-valued data Miolane and Holmes (2020). While these methods inherit VAE’s advantages such as ease of synthesis, VAEs are known to suffer from optimization challenges as well as a tendency to generate smoothed samples. It is not clear how the numerical issues, in particular, will be amplified once we move to manifold data where the core operations of calculating geodesics and distances, evaluating derivatives, and so on, must also invoke numerical optimization routines.
Contributions. Instead of GANs or VAEs, the use of flow-based generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018) will enable latent variable inference and log-likelihood evaluation. It turns out, as we will show in our development shortly, that the key components (and layers) needed in flow-based generative models, with certain mathematical/procedural adjustments, extend nicely to the manifold setting. The goal of this work is to describe our theoretical developments and show promising experiments in brain imaging applications involving manifold-valued data.
This subsection briefly summarizes some differential geometric concepts/notations we will use. The reader will find a more comprehensive treatment in Boothby (1986).
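Let $(\mathcal{M}, g)$ be an orientable complete Riemannian manifold with a Riemannian metric $g$, i.e., for all $x \in \mathcal{M}$, $g_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$ is a bilinear symmetric positive definite map, where $T_x\mathcal{M}$ is the tangent space of $\mathcal{M}$ at $x \in \mathcal{M}$. Let $d : \mathcal{M} \times \mathcal{M} \to [0, \infty)$ be the distance induced from the Riemannian metric $g$.
Let $p \in \mathcal{M}$, $r > 0$. Define $B_r(p) = \{q \in \mathcal{M} \mid d(p, q) < r\}$ to be an open ball at $p$ of radius $r$.
The local injectivity radius at $p \in \mathcal{M}$ is defined as $r_{\mathrm{inj}}(p) = \sup\{r \mid \mathrm{Exp}_p : B_r(0) \subset T_p\mathcal{M} \to \mathcal{M}$ is defined and is a diffeomorphism onto its image at $p\}$. The injectivity radius Manton (2004) of $\mathcal{M}$ is defined as $r_{\mathrm{inj}}(\mathcal{M}) = \inf_{p \in \mathcal{M}} r_{\mathrm{inj}}(p)$.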
Within , where , the mapping , is called the inverse Exponential/Log map, is the dimension of . For each point , there exists an open ball for some such that , where . Thus, we can cover by an indexed (possibly infinite) cover . This set is an example of a chart on ; for an example, see Krauskopf et al. (2007) and also Fig. 1.
For notational simplicity, we will denote a chart covering by , since in general, we can use an arbitrary chart instead of an inverse Exponential map. Note that the domain for two chart maps may not necessarily be disjoint.
Given a differentiable function defined as , where and are the functions in the chart covering and respectively and for some differentiable , the Jacobian of (denoted by ) is defined as:
(1) 
The reason for the peculiar notation is that the derivative cannot be defined on manifoldvalued data, so is not meaningful: we use the notation to acknowledge this difference. Also note that are the same only when (1) using the global charts for space and (2) and are on the same manifold.
A diffeomorphism $\phi : \mathcal{M} \to \mathcal{M}$ is an isometry if it preserves distance, i.e., $d(\phi(x), \phi(y)) = d(x, y)$ for all $x, y \in \mathcal{M}$. The set $I(\mathcal{M})$ of all isometries of $\mathcal{M}$ forms a group with respect to function composition.
Rather than writing an isometry as a function $\phi$, we will write it as a group action. Henceforth, let $G$ denote the group $I(\mathcal{M})$, and for $g \in G$, $x \in \mathcal{M}$, let $g \cdot x$ denote the result of applying the isometry $g$ to point $x$. Similar to the terminologies in Chakraborty et al. (2018c), we will use the term “translation” to denote the group action $g \cdot x$. This is due to the distance preserving property and is inspired by the analogy from the Euclidean space.
In this section, we will introduce flow-based generative models for manifold-valued data. We will first describe the Euclidean formulation and specify which components need to be generalized to get the manifold-valued formulation.
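Flow-based generative models Rezende and Mohamed (2015); Kingma and Dhariwal (2018); Yang et al. (2019) aim to maximize the log-likelihood of the input data from an unknown distribution. The idea involves mapping the unknown distribution in the input space to a known distribution in the latent space using an invertible function, $f_\theta$. At a high level, sampling from a known distribution is simpler, so an invertible $f_\theta$ can help draw samples from the input space distribution.
Let $\{x_i\}_{i=1}^{N}$ be i.i.d. samples drawn from an unknown distribution $p^*(x)$. Let this unknown distribution be parameterized by $\theta$. In the rest of the paper, we use $p_\theta(x)$ as a proxy for $p^*(x)$. We learn $\theta$ over a dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$. We maximize the likelihood of the model given the dataset by minimizing the equivalent formulation of negative log-likelihood as:
$$\mathcal{L}(\mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} -\log p_\theta(x_i) \tag{2}$$
But to minimize the above expression, we need to know $p_\theta(x)$. One way to bypass this problem is to learn a mapping from a known distribution in the latent space. Let the latent space be $\mathcal{Z}$. Then, the generative step is given by $x = g_\theta(z)$ with $z \in \mathcal{Z}$. Here $p_\theta(z)$ can be a Gaussian distribution $\mathcal{N}(z; 0, I)$.
Let $f_\theta$ be the inverse of $g_\theta$. For normalizing flow Rezende and Mohamed (2015), the $f_\theta$ is composed of a sequence of invertible functions $f_\theta = f_1 \circ f_2 \circ \cdots \circ f_K$. Hence, we have
$$x \overset{f_1}{\longleftrightarrow} h_1 \overset{f_2}{\longleftrightarrow} h_2 \cdots \overset{f_K}{\longleftrightarrow} z$$
Using $h_0 = x$ and $h_K = z$, the log-likelihood of $p_\theta(x)$ is
$$\log p_\theta(x) = \log p_\theta(z) + \log \left| \det \left( dz / dx \right) \right| \tag{3}$$
$$\phantom{\log p_\theta(x)} = \log p_\theta(z) + \sum_{i=1}^{K} \log \left| \det \left( dh_i / dh_{i-1} \right) \right| \tag{4}$$
In Kingma and Dhariwal (2018), the GLOW model is composed of three different layers whose Jacobian is a triangular matrix, simplifying the log-determinant:
$$\log \left| \det \left( dh_i / dh_{i-1} \right) \right| = \operatorname{sum} \left( \log \left| \operatorname{diag} \left( dh_i / dh_{i-1} \right) \right| \right) \tag{5}$$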
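The change-of-variables computation can be made concrete with a small sketch. The example below is not the paper's model: it assumes a toy flow built from elementwise affine layers (the names `affine_flow_forward`, `scales` and `shifts` are illustrative), whose Jacobians are diagonal, so each log-determinant is just a sum of logs of diagonal entries as in the triangular case described above.

```python
import numpy as np

def affine_flow_forward(x, scales, shifts):
    """Map x to z through K elementwise affine layers h_i = s_i * h_{i-1} + b_i.

    Returns z and the accumulated log|det Jacobian| for every sample.
    """
    h = x
    logdet = np.zeros(x.shape[0])
    for s, b in zip(scales, shifts):
        h = s * h + b
        # An elementwise map has a diagonal Jacobian, so its log-determinant
        # is the sum of log|diagonal entries|.
        logdet += np.sum(np.log(np.abs(s)))
    return h, logdet

def log_likelihood(x, scales, shifts):
    """log p(x) = log N(z; 0, I) + sum_i log|det(dh_i / dh_{i-1})|."""
    z, logdet = affine_flow_forward(x, scales, shifts)
    d = x.shape[1]
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    return log_pz + logdet

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
scales = [np.array([2.0, 0.5, 1.5]), np.array([0.25, 3.0, 1.0])]
shifts = [np.array([0.1, -0.2, 0.0]), np.array([1.0, 0.0, -1.0])]
ll = log_likelihood(x, scales, shifts)
```

Because the composed map here is itself affine, the result can be checked against the closed-form Gaussian density it induces on the input space, which is a convenient sanity test before moving to learned layers.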
The three layers in the basic GLOW block (shown in Fig. 2), summarized in Table 1, are all invertible functions. These are (a) Actnorm, (b) Invertible convolution, and (c) Affine Coupling Layers. Note that the data is squeezed before it is fed into the block. Then, the data is split as in Dinh et al. (2016).
(a) Actnorm normalizes the input to have zero mean and unit standard deviation. In (6), are initialized from the data and then trained independently.
(b) convolution applies the invertible matrix on the channel dimension. In (7), and where is the resolution of the input variables while is the number of channels.
(c) Affine Coupling uses the idea of split+concatenation. In (8), the input variable is split along the channel to , and then are concatenated to get the final output . Here, (and ) are real-valued matrices of the same dimension as for elementwise scaling (and translation).
Table 1: The three layers of the basic GLOW block: Actnorm (6), Invertible convolution (7), and Affine Coupling (8).
In Kingma and Dhariwal (2018), authors use a closed form for the inverse of these layers. Notice that calculating the determinant of the Jacobian is simple for all these layers except the affine coupling layer in (8) (Table 1). Since , the Jacobian determinant is .
(9) 
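As a rough illustration of the three Euclidean layers, the sketch below follows the standard formulation of Kingma and Dhariwal (2018) in NumPy. The function names, the channels-last layout `(h, w, c)`, the toy network `toy_nn`, and the choice of which half is transformed are assumptions made for this example, not details taken from the tables here.

```python
import numpy as np

def actnorm(x, s, b):
    """Per-channel scale and bias; x has shape (h, w, c), s and b have shape (c,)."""
    h, w, _ = x.shape
    return s * x + b, h * w * np.sum(np.log(np.abs(s)))

def actnorm_inverse(y, s, b):
    return (y - b) / s

def conv1x1(x, W):
    """Apply the invertible c x c matrix W to the channel vector at each pixel."""
    h, w, _ = x.shape
    # slogdet returns (sign, log|det|); only the log-magnitude is needed.
    return x @ W.T, h * w * np.linalg.slogdet(W)[1]

def conv1x1_inverse(y, W):
    return y @ np.linalg.inv(W).T

def affine_coupling(x, nn):
    """Scale and shift the first half of the channels using the second half."""
    c = x.shape[-1]
    xa, xb = x[..., : c // 2], x[..., c // 2 :]
    log_s, t = nn(xb)
    ya = np.exp(log_s) * xa + t
    # y_b = x_b does not depend on x_a, so the Jacobian is block triangular
    # and its log-determinant reduces to sum(log s).
    return np.concatenate([ya, xb], axis=-1), np.sum(log_s)

def affine_coupling_inverse(y, nn):
    c = y.shape[-1]
    ya, yb = y[..., : c // 2], y[..., c // 2 :]
    log_s, t = nn(yb)
    return np.concatenate([(ya - t) * np.exp(-log_s), yb], axis=-1)

def toy_nn(xb):
    """Hypothetical stand-in for the coupling network (assumes an even channel count)."""
    return np.tanh(xb), 0.5 * xb

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2, 4))
s, b = np.array([1.5, 0.5, 2.0, 1.0]), rng.normal(size=4)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
W = Q @ np.diag([2.0, 1.0, 0.5, 3.0])  # invertible by construction, |det W| = 3

y1, ld1 = actnorm(x, s, b)
y2, ld2 = conv1x1(y1, W)
y3, ld3 = affine_coupling(y2, toy_nn)
x_rec = actnorm_inverse(conv1x1_inverse(affine_coupling_inverse(y3, toy_nn), W), s, b)
```

The round trip `x_rec` recovers `x` up to floating-point error, and none of the three log-determinants requires differentiating through `toy_nn`.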
Next Steps: With the description above, we can now list the key operational components of the Euclidean flow-based model above, which we need to modify for our manifold-valued extension.
Key ingredients: In (6) and (8), the operators are (i) elementwise multiplication for and (ii) the addition of bias for . (iii) In (7), we require invertible matrices. (iv) Finally, to compute the log-likelihood, we need the calculation of the derivative in the log-likelihood expression of the Euclidean case. Thus we can verify that the key ingredients to define the model in GLOW are (i) elementwise multiplication; (ii) addition of bias; (iii) invertible matrix; (iv) derivative calculation. In theory, if we can modify those components from Euclidean space to manifolds, we will obtain a flow-based generative model on a Riemannian manifold. Observe that (i) and (iii) are matrix multiplications, which are nontrivial to define on a manifold. In Def. 3, we can use the chart map to map the manifold to a subspace of Euclidean space where a matrix multiplication can be used. This also provides a way to solve item (iv) based on the chart map. In (1), we show how to compute the Jacobian of a differentiable function from one manifold to another, with respect to the charts of the manifolds. For item (ii), adding a bias can be viewed as a “translation” in the Euclidean space, while in Def. 4 we define the translation on manifold-valued data using the group action. With these in hand, we are ready to present our proposed manifold version of these layers next.
Table 2: Manifold versions of the Actnorm (10), convolution (11), and Affine Coupling (12) layers.
We will now introduce the manifold counterpart of the key operations. See Table 2 for a summary of functions.
(a) Actnorm. Let be the spatial resolution and be the channel size, . We modify (6) to manifold-valued data using the operators we mentioned above in Key ingredients. The bias term is replaced by the group operators while the multiplication is replaced by the diagonal matrix of size in the space after chart mapping . The layer function is defined as in (10).
Determinant of the Jacobian can be computed as shown below in (16). In general, can be a tuple, i.e., for 3D data, it is a 3 dimensional tuple.
(16) 
(b) convolution. We define a convolution to offer the flexibility of interaction between channels. Here is a matrix applied after chart mapping . In general, we can learn any , i.e., a full rank matrix like in (7). But in practice, maintaining full rank is a hard constraint and may become unbounded. As a regularization, we choose to be a rotation matrix. This layer function is defined as in (11) using the same notation as in (7).
Determinant of the Jacobian can be computed as shown below in (17). Notice that for to be a rotation matrix, the contribution from is .
(17) 
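The benefit of restricting the channel-mixing matrix to a rotation can be seen numerically: its determinant is 1, so the layer adds nothing to the log-determinant. The sketch below builds a rotation from unconstrained parameters with the Cayley transform; this parameterization is an assumption chosen for illustration, and the text here does not say how the rotation is parameterized in the actual model.

```python
import numpy as np

def rotation_from_params(theta, c):
    """Build a c x c rotation matrix from c*(c-1)/2 free parameters.

    Uses the Cayley transform R = (I - A)^{-1} (I + A) of a skew-symmetric A,
    which always yields an orthogonal matrix with determinant +1.
    """
    A = np.zeros((c, c))
    A[np.triu_indices(c, k=1)] = theta
    A = A - A.T  # skew-symmetric: A^T = -A
    I = np.eye(c)
    return np.linalg.solve(I - A, I + A)

rng = np.random.default_rng(0)
c = 4
theta = rng.normal(size=c * (c - 1) // 2)
R = rotation_from_params(theta, c)
sign, logabsdet = np.linalg.slogdet(R)
```

Since `I - A` is invertible for every skew-symmetric `A`, the parameters can be optimized freely without a separate full-rank constraint, which is the practical motivation for this kind of regularization.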
(c) Affine Coupling. For manifold-valued data, given (where and are spatial and channel resolutions), we first split the data along the channel dimension, i.e., partition into two parts denoted by and , where . From (8), we need to modify the scaling and translation. Here, and . These two operators play the same roles as in (8), scaling and translation. We need to be full rank. If needed, one may use constraints like orthogonality or bounded matrix for numerical stability. After performing the coupling, we simply combine and to get as our output. This function is defined in (12).
Determinant of the Jacobian can be computed as follows. Similar to the Euclidean case, observe that involves taking the gradient of a neural network. Fortunately, we only require the determinant of the Jacobian matrix, and the independence of on saves the calculation of since . Thus, given , the Jacobian determinant is given as
(18) 
Distribution on the latent space: After the cascaded functional transformations described above, we transform to the latent space . We define a Gaussian distribution on , namely , by inducing a multivariate Gaussian distribution from as
(19) 
where and (SPD denotes a symmetric positive definite matrix).
We can now ask the question: can we draw manifold-valued data conditioned on another manifold-valued sample? Due to the nature of the invertibility of our generative model, this seems to be possible since all we need to develop, in addition to what has been covered, is a scheme to sample data from Euclidean space conditioned on a vector-valued input.
Recently, extensions of the GLOW model (in a Euclidean setup) have been used to generate samples from space conditioned on space , see Sun et al. (2019). In this section, we roughly follow Sun et al. (2019) by using connections in a latent space but in a manifold setting to generate a sample from a manifold , conditioned on a sample on manifold . The underlying assumption is that there exists a (smooth) function from to . The generation steps are as follows.
Given variables and with the dimension of the manifolds and to be and respectively, we use the two parallel GLOW models (as discussed above) to get the corresponding latent space. Let it be denoted by and respectively.
After getting the respective latent spaces, we need to fit a distribution on it. Since we wish to generate samples from , the distribution on the respective latent space must be induced from the variables in , i.e., the latent space for . We do not have any constraint on the distribution parameters for , so, we use a Gaussian distribution with a fixed and on . The parameters for the Gaussian distribution on are defined as functions of . Formally, we define using (19), where, and . Here, the two functions and are modeled using a neural network. The scheme is shown in Fig. 3.
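A minimal sketch of this conditioning step, under simplifying assumptions: both latent codes are treated as plain Euclidean vectors (i.e., after the chart map), the covariance is diagonal, and a tiny two-layer network stands in for the mean and covariance networks. All names (`init_mlp`, `conditional_params`, `z_x`, `z_y`) are illustrative and not taken from the paper.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, d_out):
    """Parameters of a small network that outputs a mean and a log-variance."""
    return {
        "W1": rng.normal(scale=0.1, size=(d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(scale=0.1, size=(d_hidden, 2 * d_out)),
        "b2": np.zeros(2 * d_out),
    }

def conditional_params(params, z_x):
    """Predict the Gaussian parameters of the target latent from the source latent."""
    hidden = np.tanh(z_x @ params["W1"] + params["b1"])
    out = hidden @ params["W2"] + params["b2"]
    mu, log_var = np.split(out, 2, axis=-1)
    return mu, log_var

def conditional_nll(params, z_x, z_y):
    """Negative log-likelihood of z_y under N(mu(z_x), diag(exp(log_var(z_x))))."""
    mu, log_var = conditional_params(params, z_x)
    per_dim = log_var + (z_y - mu) ** 2 / np.exp(log_var) + np.log(2.0 * np.pi)
    return 0.5 * np.sum(per_dim, axis=-1)

def sample_z_y(params, z_x, rng):
    """Draw a target latent conditioned on the source latent."""
    mu, log_var = conditional_params(params, z_x)
    return mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)

rng = np.random.default_rng(0)
params = init_mlp(rng, d_in=6, d_hidden=16, d_out=5)
z_x = rng.normal(size=(3, 6))   # latent codes of the conditioning stream
z_y = rng.normal(size=(3, 5))   # latent codes of the target stream
nll = conditional_nll(params, z_x, z_y)
z_y_sample = sample_z_y(params, z_x, rng)
```

At generation time, a sample such as `z_y_sample` would be pushed back through the inverse of the target-side flow to obtain a manifold-valued output; that step is omitted here.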
Specific examples of manifolds. Finally, in order to implement (10), (11) and (12) mentioned in the previous sections, the basic operations specific to a manifold are (a) the choice of distance, , (b) the isometry group, , and (c) the chart map and its inverse, . We use three types of non-Euclidean Riemannian manifolds in the experiments presented in this work (including the supplement section): (a) the hypersphere, (b) the space of positive real numbers, and (c) the space of symmetric positive definite matrices (). We give the explicit formulation for the operations in Table 3.
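Table 3 is not reproduced here, so the following sketch only illustrates commonly used chart maps for two of the listed manifolds: the Riemannian Log/Exp maps on the unit hypersphere and the matrix logarithm/exponential on SPD matrices. These are standard textbook choices and are an assumption for illustration; they may differ from the exact operations used in the paper.

```python
import numpy as np

def sphere_log(p, q):
    """Inverse Exponential (Log) map at p on the unit sphere; q must not be antipodal to p."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)  # geodesic distance between p and q
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (q - cos_t * p)

def sphere_exp(p, v):
    """Exponential map at p applied to a tangent vector v (v orthogonal to p)."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p.copy()
    return np.cos(norm) * p + np.sin(norm) * v / norm

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(L):
    """Matrix exponential of a symmetric matrix; the result is SPD."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(0)
p = np.array([0.0, 0.0, 1.0])
q = rng.normal(size=3)
q /= np.linalg.norm(q)
v = sphere_log(p, q)
q_rec = sphere_exp(p, v)

B = rng.normal(size=(3, 3))
S = B @ B.T + 3.0 * np.eye(3)  # SPD by construction
S_rec = spd_exp(spd_log(S))
```

In both cases the chart sends a manifold point to a Euclidean vector or symmetric matrix, where matrix multiplication is well defined, and the inverse chart returns the result to the manifold.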
We demonstrate the experimental results of our model using two setups. First, we generate texture images based on the local covariances, which serves as a sanity check evaluation relative to another generative model for manifoldvalued data available at this time. The second experiment, which is our main scientific focus, generates orientation distribution function (ODF) images Hess et al. (2006) using diffusion tensor imaging (DTI) Basser et al. (1994); Alexander et al. (2007). Note that, in this setting we construct the DTI scans from undersampled diffusion directions. This makes the generation of ODF conditioned on the DTI scans challenging and motivated us to tackle this problem using our proposed framework.
Baseline. Very recently, the M-flow Brehmer and Cranmer (2020) was introduced, which provides a generative model for manifold-valued data. M-flow uses an encoder to encode the manifold-valued data in the high-dimensional space into a low-dimensional Euclidean space. During generation, the model will generate the low-dimensional Euclidean data and warp it back to the manifold in the high-dimensional space. The benefit of this method is that it can learn the dimension of the unknown manifold, including natural images like ImageNet Deng et al. (2009). But for a known Riemannian manifold, the dimension of the manifold is fixed. For example, is of dimension , while is of dimension . Thus, for a known Riemannian manifold, M-flow learns the chart using an encoder neural network and applies all the operations in the learned space with (known) dimension . Another interesting recent proposal, manifoldWGAN, Huang et al. (2019b) showed that it is possible to generate resolution matrices using WGAN. Due to the involved calculations needed by WGAN, extending it to high-dimension manifold-valued data including ODF () will require nontrivial changes. Further, manifoldWGAN in its current form does not deal with conditioning the generated images on other manifold-valued data, but this is an interesting future direction to explore.
Now, we present experiments for generating texture images before moving to the more challenging ODF generation task.
The earth texture images dataset was introduced in Yu et al. (2019). The train (and test) set have (and ) images. All images are augmented by random transformations and cropping to size . Our goal here is to generate texture images based on the local covariances of the three (R, G, B) channels. So the two manifolds are (for covariance matrix) and (for texture images). Since M-flow can only take Euclidean data as the “conditioning variable”, we vectorize the local covariances as the condition variable for M-flow. The dimension of the learned space for M-flow is chosen as (default configuration from StyleGAN Karras et al. (2020)). For our case, we build two parallel manifoldGLOW with blocks on each side. After every blocks, the spatial resolution is reduced to half. In the latent space, we train a residual network with residual blocks to map the distribution of the to . Example results are shown in Fig. 4. Even in this simple setting, due to the encoder in the M-flow, the generated images lose sharpness. Our model uses the information of the local covariances to generate superior texture images.
Our main focus is the conditional synthesis of structural brain image data. Diffusion-weighted magnetic resonance imaging (dMRI) is an MR imaging modality which measures the diffusion of water molecules at a voxel, and is used to understand brain structural connectivity. Diffusion tensor imaging (DTI), a type of dMRI Basser et al. (1994); Alexander et al. (2007), measures the restricted diffusion of water along only three canonical directions at each voxel. The measurement at each voxel is a symmetric positive definite (SPD) matrix (i.e., manifold-valued data). If multi-shell acquisition capabilities are available, we can obtain a richer acquisition; here, each voxel is an orientation distribution function (ODF) Hess et al. (2006) which describes the diffusivity in multiple directions (less lossy compared to DTI). By symmetrically/equally sampling points on the continuous distribution function Garyfallidis et al. (2014), each measurement is a D vector (non-negative entries; sum to ). Using the square root parameterization Brody and Hughston (1998); Srivastava et al. (2007), the data at each voxel lies on the positive part of manifold.
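The square root parameterization mentioned above can be illustrated in a few lines: a discretized ODF with non-negative entries that sum to one is mapped to its elementwise square root, which has unit Euclidean norm and therefore lies on the positive orthant of a unit hypersphere. The function names and the 6-bin toy ODF below are illustrative assumptions; the geodesic distance used is the standard arc length on the sphere.

```python
import numpy as np

def odf_to_sphere(odf):
    """Normalize a non-negative vector to sum to 1 and take its elementwise square root."""
    odf = np.asarray(odf, dtype=float)
    odf = odf / odf.sum(axis=-1, keepdims=True)
    return np.sqrt(odf)

def sphere_to_odf(psi):
    """Invert the parameterization by squaring each entry."""
    return psi ** 2

def sphere_distance(psi1, psi2):
    """Arc-length (geodesic) distance between two points on the unit sphere."""
    inner = np.clip(np.sum(psi1 * psi2, axis=-1), -1.0, 1.0)
    return np.arccos(inner)

rng = np.random.default_rng(0)
odf = rng.random(6)            # toy ODF sampled at 6 directions
odf = odf / odf.sum()
psi = odf_to_sphere(odf)
odf_rec = sphere_to_odf(psi)

# Two ODFs with disjoint support are orthogonal on the sphere.
a = odf_to_sphere([1.0, 0.0])
b = odf_to_sphere([0.0, 1.0])
dist_ab = sphere_distance(a, b)
```

Working on the sphere rather than directly on the probability simplex is what makes closed-form geodesics and Log/Exp maps available for the ODF stream.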
We seek to generate a 3D brain image where each voxel is an ODF from the corresponding DTI image (each voxel is an SPD matrix). To make the setup more challenging (and scientifically interesting), we generate the DTI images only from randomly undersampled diffusion directions. We now explain the (a) rationale for the application, (b) data description, (c) model setup, and (d) evaluations. Note that in the experiment, since we draw samples from the distribution on the latent space, conditioned on DTI, to get the target representation, we call it generation rather than reconstruction.
Why is generating ODF from DTI important? For dMRI, different types of acquisitions involve longer/shorter acquisition times. Higher spatial resolution images (e.g., ODF) involve a longer acquisition time (– mins per scan versus mins for an ODF multi-shell scan) and this is problematic, especially for children and the elderly. To shorten the acquisition time with minimal compromise in the image quality, we require mechanisms to transform data acquired from shorter acquisitions (DTI) to a higher spatial resolution image: a field (or image) of ODFs. This serves as our main motivation.
However, (a) the per voxel degrees of freedom for ODF representation is (lies on ) while for DTI is (lies on ). Hence, it is an ill-posed problem. (b) It requires mathematical tools to “transform” from one manifold (DTI representation) to another (ODF representation) while preserving structural information. Now, we describe some details of the data and models, and present the results.
Table 4: Demographic details (age and gender) of the HCP subjects.
Dataset | Age 22-25 | Age 26-30 | Age 31-35 | Age 36+ | Female | Male
All | 224 | 467 | 364 | 10 | 575 (54.0%) | 490 (46.0%)
Train | 178 | 370 | 295 | 9 | 463 (54.3%) | 389 (45.7%)
Test | 46 | 97 | 69 | 1 | 112 (52.6%) | 101 (47.4%)
Dataset. The dataset for our method is the Human Connectome Project (HCP) Van Essen et al. (2013). The total number of subjects with diffusion measurements available is 1065: 852 were used as training and 213 as the test set. Demographic details are reported in Table 4 (please see Van Essen et al. (2013) for more details of the dataset). All raw dMRI images are preprocessed with the HCP diffusion pipeline with FSL’s ‘eddy’ Andersson and Sotiropoulos (2016). After correction, ODF and DTI pairs were obtained using the Diffusion Imaging in Python (DIPY) toolbox Garyfallidis et al. (2014). Due to the memory requirements of the model and the 3D nature of medical data, generation of an ODF image of the entire brain at once remains out of reach at this point, hence we resize the original data into but the process can proceed in a sliding window fashion as well.
Reduction in the memory costs. Since the entire 3D models for brain images are still too large to fit into the GPU memory, we need to further simplify the model without sacrificing the performance too much. Recently, NanoFlow Lee et al. (2020) was introduced to reduce the number of parameters for sequential data processing. The assumption of NanoFlow is that the Affine Coupling layer, if fully trained, can estimate the distribution for any parts of the input data in a fixed order. There will be some performance drop compared with training different Affine Coupling layers for different parts of the data, but the gain from reducing the parameters is significant. Thus, in our setup, due to the large 3D input, we apply the NanoFlow trick for DTI and ODF separately. For example, for the DTI data, we first split the entire data into slices called . Then we can share the two neural networks and in the Affine Coupling layer among these slices. The input of two neural networks and in Affine Coupling layer would be , while the output will be the estimated mean and variance of respectively. Due to sharing weights, the number of parameters is reduced and training becomes feasible for our 3D DTI and ODF setups.
Model Setup. In order to set up our model, we first build two flow-based streams for DTI and ODF separately. Then, in the latent space, we train a transformation operating between the Gaussian distribution variable on the manifold and the Gaussian distribution variable on the manifold . This architecture with two flow-based models and the transformation module can be jointly trained as shown in Fig. 5. We use basic blocks of our manifold GLOW, and after every blocks, reduce the resolution by half. This setup is the same for both DTI and ODF. We use residual network blocks to map the latent space from DTI to ODF. The samples are presented to the model in paired form, i.e., a DTI image (field of SPD matrices) and a corresponding ODF image (a field of ODFs). To reduce the number of parameters for this 3D data, we use a similar idea as NanoFlow Lee et al. (2020) that shares the Affine Coupling layer for DTI and ODF separately, with setting . As a comparison, for the baseline model M-flow, the learned dimension will be where for DTI and for ODF. While M-flow could be trained for our texture experiments, here, the memory requirements are quite large: the number of parameters required for M-flow and our model are and respectively. A similar situation arises in the Euclidean space version of GLOW which also does not leverage the intrinsic Riemannian metric: therefore, the memory cost will be more than the natural images which have dimension . This is infeasible even on clusters and therefore, results from these baselines are very difficult to obtain.
Choice of metrics. We will use “reconstruction error” using the distance in Table 3. Although the task here is generation, measuring reconstruction error assesses how “similar” the original ODF is to the generated ODF, generated directly from the corresponding DTI representation. We also perform a group difference analysis to identify statistically different regions across groups (grouped by a dichotomous variable). Since HCP only includes healthy subjects (HCP aging is smaller), we can perform a group difference test based on gender, i.e., male vs. female. We evaluate overlap: how/whether groupwise different regions on the generated/reconstructed data agree with those on the actual ODF images.
Generation results. We present quantitative and qualitative results for generation of ODF from its DTI representation. In Fig. 6(a), we show a few example slices from the given DTI and the generated ODF. Overall, the reconstruction error was . Since perceptually comparing fidelity between generated and ground truth images is difficult, we perform the following quantitative analysis: (a) a histogram of the reconstruction error over all test subjects (shown in Fig. 6(b)), and (b) an error matrix showing how similar the generated ODF image is with the other “incorrect” samples of the population. The goal is to assess if the generated ODF is distinctive across different samples (shown in Fig. 6(c)). From the histogram presented in Fig. 6(b), we can see that the reconstruction error is consistently low over the entire test population. Now, we generate Fig. 6(c) as follows. For each subject in the test population, we randomly select samples (subjects) from the population and compute the reconstruction error with the generated ODF. This gives us a matrix (similar to the confusion matrix). Fig. 6(c) shows the average of runs: lighter shades mean a larger reconstruction error. So, we should ideally see a dark diagonal, which is approximately depicted in the plot. This suggests that for the test population, the generation is meaningful (preserves structures) and distinctive (maintains variability across subjects). There are only a few experiments described in the literature on generation of dMRI data Huang et al. (2019b); Anctil-Robitaille et al. (2020). While Huang et al. (2019b) shows the ability to generate 2D () DTI, the techniques described here can operate on 3D ODF () data and should offer improvements.
Group difference analysis. We now quantitatively measure if the reconstruction is good enough so that the generated samples can be a good proxy for downstream statistical analysis and yield improvements over the same analysis performed on DTI. We run permutation testing with independent runs and compute the per-voxel p-value to see which voxels were statistically different between the groups for the following settings: (a) original ODF, (b) generated ODF, (c) DTI, and (d) fractional anisotropy (FA) representation (a commonly used summary of DTI). Both DTI and FA are commonly used for assessing statistically significant differences across genders Menzler et al. (2011); Kanaan et al. (2012). But since ODF contains more structural information than either the FA or DTI, our generated ODF should be able to pick up more statistically significant regions over DTI or FA. We evaluate the intersection of significant regions with the original ODF (the original ODF contains the most information). We compute the intersection over union (IoU) measure. For the whole brain, FA has IoU 0.04, while DTI has IoU 0.16. The generated ODF has IoU 0.22. We see that the generated ODF has a larger intersection in the statistically significant regions with the original ODF and offers improvements over DTI.
This provides some evidence that the generated ODF preserves the signal that differs across the male/female groups. We also show a zoomed-in example of an ROI for the full-resolution images in Fig. 7. The values for different ROIs are all in both the original ODF and our generated ODF, indicating consistency of our results, at least in terms of regions identified in downstream statistical analysis. Note that the analysis on the real ODF images serves as the ground truth.
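To make the two quantitative checks above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each image has been reduced to a Euclidean array, uses mean squared error as a stand-in for the reconstruction error on the manifold, and uses the absolute difference of group means as the per-voxel test statistic. The function names and the parameters `n_samples`, `n_runs`, and `n_permutations` are hypothetical.

```python
import numpy as np


def averaged_error_matrix(generated, ground_truth, n_samples, n_runs, seed=0):
    """Average cross-subject error matrix over several random subsets.

    Entry (i, j) compares the generated image of sampled subject i with the
    ground-truth image of sampled subject j, so a low diagonal means each
    generated image is closest to its own subject.
    Assumption: mean squared error replaces the manifold distance.
    """
    rng = np.random.default_rng(seed)
    generated = np.asarray(generated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    n_subjects = generated.shape[0]
    total = np.zeros((n_samples, n_samples))
    for _ in range(n_runs):
        # Draw a random subset of subjects without repetition.
        idx = rng.choice(n_subjects, size=n_samples, replace=False)
        gen = generated[idx].reshape(n_samples, -1)
        ref = ground_truth[idx].reshape(n_samples, -1)
        # Broadcast to all (generated i, reference j) pairs.
        diff = gen[:, None, :] - ref[None, :, :]
        total += (diff ** 2).mean(axis=-1)
    return total / n_runs


def permutation_pvalues(values, labels, n_permutations=1000, seed=0):
    """Per-voxel permutation p-values for a two-group difference.

    values: (n_subjects, n_voxels) array with one scalar per voxel.
    labels: (n_subjects,) boolean group membership.
    Assumption: the statistic is the absolute difference of group means.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def statistic(lab):
        return np.abs(values[lab].mean(axis=0) - values[~lab].mean(axis=0))

    observed = statistic(labels)
    exceed = np.zeros(values.shape[1])
    for _ in range(n_permutations):
        # Shuffle group labels and count statistics at least as extreme.
        exceed += statistic(rng.permutation(labels)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (1.0 + exceed) / (1.0 + n_permutations)


def iou(mask_a, mask_b):
    """Intersection over union of two boolean significance masks."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)
```

As a usage note, thresholding the p-values of two settings at a chosen level (for instance `p < 0.05`, an assumed value) and passing the two boolean masks to `iou` yields the kind of overlap score reported above. The broadcasted pairwise difference in `averaged_error_matrix` is convenient for small arrays; for full-resolution 3D images a loop over pairs would use far less memory.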
A number of deep neural network formulations have been extended to manifold-valued data in the last two years. While most of these developments are based on models such as CNNs or RNNs, in this work we study the generative regime: we introduce a flow-based generative model on the Riemannian manifold. We show that the three types of layers in such models (Actnorm, invertible convolution, and affine coupling) can be generalized or adapted for manifold-valued data in a way that preserves invertibility. We also show that, with the transformation in the latent space between the two manifolds, we can generate manifold-valued data based on the information from another manifold. We demonstrate good generation results in the representation of ODF given DTI on the Human Connectome dataset. While the current formulation shows mathematical feasibility and promising results, additional work on the methodological and the implementation side is needed to reduce the runtime to a level where the tools can be deployed in scientific labs.
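As a reminder of what preserving invertibility means for one of these layers, here is a minimal sketch of the standard Euclidean affine coupling layer used in flow-based models, assuming plain vectors rather than manifold-valued data. It is not the manifold-valued layer introduced in this work; `toy_scale_shift` is a hypothetical stand-in for the learned network that predicts the scale and shift.

```python
import numpy as np


def toy_scale_shift(x_a):
    """Hypothetical stand-in for the learned network: returns (log_scale, shift)."""
    return np.tanh(x_a), 0.5 * x_a


def coupling_forward(x, scale_shift_fn):
    """Affine coupling: keep the first half, transform the second half."""
    x_a, x_b = np.split(x, 2, axis=-1)  # requires an even last dimension
    log_s, t = scale_shift_fn(x_a)
    y_b = x_b * np.exp(log_s) + t
    # The Jacobian is triangular, so its log-determinant is a simple sum.
    log_det = log_s.sum(axis=-1)
    return np.concatenate([x_a, y_b], axis=-1), log_det


def coupling_inverse(y, scale_shift_fn):
    """Exact inverse: the untouched half reproduces the same scale and shift."""
    y_a, y_b = np.split(y, 2, axis=-1)
    log_s, t = scale_shift_fn(y_a)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b], axis=-1)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift without inverting the network itself, which is the property the manifold-valued layers are designed to retain.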
This research was supported in part by grant 1RF1AG05931201A1 and NSF CAREER RI #1252725.
Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: Introduction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6196–6204. Cited by: Introduction.
On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259. Cited by: Introduction.
Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908. Cited by: Introduction.
Thirty-First AAAI Conference on Artificial Intelligence. Cited by: Introduction.
Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5974–5983. Cited by: Introduction.
A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications 26 (3), pp. 735–747. Cited by: Introduction.
Riemannian analysis of probability density functions with applications in vision. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Cited by: Introduction, Main focus: Diffusion MRI dataset.
Texture mixer: a network for controllable synthesis and interpolation of texture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12164–12173. Cited by: Generating texture images.