As the population lives longer, Alzheimer's disease (AD) has become a major public health concern, with the number of patients in the U.S. alone expected to reach 13.8 million by 2050 [Brookmeyer2007ADforcast]. However, current therapies for dementia due to AD often fail because interventions come too late, or because they address targets with secondary effects that are less relevant to disease initiation [Hyman2011Amyloid]. The accumulation of beta-amyloid (Aβ) plaques in the human brain is one of the hallmarks of AD, and preclinical AD is now viewed as a gradual process that precedes the onset of clinical symptoms. Aβ positivity is treated as a precursor of anatomical abnormalities such as atrophy and of functional changes such as hypometabolism/hypoperfusion.
It is generally agreed that accurate presymptomatic diagnosis and preventative treatment of AD could have enormous public health benefits. Brain Aβ pathology can be measured using positron emission tomography (PET) with amyloid-sensitive radiotracers, or in cerebrospinal fluid (CSF). However, these invasive and expensive measurements are less attractive to subjects in the preclinical stage, and PET scanning is also not widely available in clinics. Therefore, there is strong interest in developing structural magnetic resonance imaging (MRI) biomarkers, which are widely accessible, cost-effective and extensively used in AD clinical research, to predict brain amyloid burden [Ansart:amyloidosis]. Tosun et al. [Tosun2014MRI] combine MRI-based measures of cortical shape and cerebral blood flow to predict amyloid status in early-MCI individuals. Pekkala et al. [Pekkala2020Amyloid] use brain MRI measures, such as the volumes of the cortical gray matter, hippocampus, accumbens, thalamus and putamen, to identify Aβ positivity in cognitively unimpaired (CU) subjects. Meanwhile, a univariate imaging biomarker would be highly desirable for clinical use and randomized clinical trials (RCT) [Shi:TPAMI20, Tu:NEIN20]. Although a variety of studies have investigated univariate biomarkers with sMRI analysis, there is limited research on developing a univariate biomarker to predict brain amyloid burden; such a biomarker would enrich our understanding of the relationship between brain atrophy and AD pathology, and thus benefit the assessment of disease burden, progression and treatment effects.
In this paper, we propose to use the worst transportation (WT) cost as a univariate biomarker to predict brain amyloid burden. Specifically, we compare the population statistics of WT costs between Aβ+ AD and Aβ− CU subjects. The newly proposed WT map transports one measure to the other in the least economical way, in contrast to the optimal transportation (OT) map, which finds the most economical way to transport one measure to the other. Similar to the OT map, the WT map is the gradient of a strictly concave function, which also satisfies the Monge-Ampère equation. Furthermore, like the OT map, the WT map can be computed by convex optimization with a geometric variational approach [gu2013theory, Su2015Registration]. Intuitively, the OT/WT maps are solely determined by the Riemannian metric of the cortical surface, and they reflect the intrinsic geometric properties of the brain. Therefore, they can serve as continuous and refined shape difference measures.
The contributions of this paper include: (i) we generalize the Brenier theorem from the OT map to the newly proposed WT map, and rigorously show that the WT map is the gradient of a concave function that satisfies the Monge-Ampère equation. To the best of our knowledge, this is the first work on the WT map in medical imaging research; (ii) we propose an efficient and robust algorithm to compute the WT map based on computational geometry, and validate it with geometrically complicated human cerebral cortical surfaces; (iii) our extensive experimental results show that the proposed WT cost performs better than the OT cost in discriminating Aβ+ AD patients from Aβ− CU subjects. This surprising result may help broaden univariate imaging biomarker research by opening up a new theme.
2 Theoretical Results
In this section, we briefly review the theoretical foundation of optimal transportation, then generalize the Brenier theorem and Yau’s theorem to the worst transportation.
2.1 Optimal Transportation Map
Suppose $\Omega, \Omega^*$ are domains in Euclidean space $\mathbb{R}^d$, with probability measures $\mu$ and $\nu$ respectively, satisfying the equal total mass condition $\mu(\Omega) = \nu(\Omega^*)$. The density functions are $f$ and $g$, i.e., $d\mu = f(x)dx$ and $d\nu = g(y)dy$. A transportation map $T: \Omega \to \Omega^*$ is measure preserving if for any Borel set $B \subset \Omega^*$, $\mu(T^{-1}(B)) = \nu(B)$, denoted as $T_\# \mu = \nu$.
Monge raised the optimal transportation map problem [villani2008optimal]: given a transportation cost function $c: \Omega \times \Omega^* \to \mathbb{R}_{\ge 0}$, find a measure-preserving map $T: \Omega \to \Omega^*$ that minimizes the total transportation cost
$$\mathcal{C}(T) = \int_\Omega c(x, T(x)) \, d\mu(x).$$
The minimizer is called the optimal transportation map (OT map). The total transportation cost of the OT map is called the OT cost.
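As a toy illustration (hypothetical data, not from the paper): between two equal-mass discrete point sets, a measure-preserving map is a permutation, and the discrete Monge problem reduces to an assignment problem, which `scipy.optimize.linear_sum_assignment` solves exactly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))        # source points, each carrying mass 1/6
Y = rng.normal(size=(6, 2)) + 2.0  # target points, each carrying mass 1/6

# cost[i, j] = (1/2)|x_i - y_j|^2, the quadratic transportation cost
cost = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)

rows, cols = linear_sum_assignment(cost)   # the optimal permutation
ot_cost = cost[rows, cols].sum() / len(X)  # mass-weighted OT cost
```

By construction, the OT cost is no larger than the cost of any other permutation, e.g. the identity pairing.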
Theorem 2.1 (Brenier [Brenier_1991])
Given the measures $\mu$ and $\nu$ with compact supports $\Omega$ and $\Omega^*$ with equal total mass $\mu(\Omega) = \nu(\Omega^*)$, the corresponding density functions $f$ and $g$, and the cost function $c(x, y) = \frac{1}{2}|x - y|^2$, then the optimal transportation map from $\mu$ to $\nu$ exists and is unique. It is the gradient of a convex function $u: \Omega \to \mathbb{R}$, the so-called Brenier potential. $u$ is unique up to adding a constant, and the OT map is given by $T = \nabla u$.
If the Brenier potential $u$ is $C^2$, then by the measure-preserving condition, it satisfies the Monge-Ampère equation
$$\det D^2 u(x) = \frac{f(x)}{g(\nabla u(x))}, \quad (1)$$
where $D^2 u$ is the Hessian matrix of $u$.
2.2 Worst Transportation Map
With the same setup, the worst transportation problem can be formulated as follows: given the transportation cost function $c: \Omega \times \Omega^* \to \mathbb{R}_{\ge 0}$, find a measure-preserving map $T: \Omega \to \Omega^*$, $T_\# \mu = \nu$, that maximizes the total transportation cost
$$\mathcal{C}(T) = \int_\Omega c(x, T(x)) \, d\mu(x).$$
The maximizer is called the worst transportation map (WT map). The transportation cost of the WT map is called the worst transportation cost between the measures. In the following, we generalize the Brenier theorem to the WT map.
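Continuing the toy discrete setting above (hypothetical data), the WT map is the same assignment problem with the objective maximized; `linear_sum_assignment(..., maximize=True)` returns the least economical permutation, whose cost always dominates the OT cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))   # source points, mass 1/6 each
Y = rng.normal(size=(6, 2))   # target points, mass 1/6 each
cost = 0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)

r_ot, c_ot = linear_sum_assignment(cost)                  # most economical
r_wt, c_wt = linear_sum_assignment(cost, maximize=True)   # least economical
ot_cost = cost[r_ot, c_ot].sum() / len(X)
wt_cost = cost[r_wt, c_wt].sum() / len(X)
```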
Theorem 2.2 (Worst Transportation Map)
Given the probability measures $\mu$ and $\nu$ with compact supports $\Omega$ and $\Omega^*$ respectively, with equal total mass $\mu(\Omega) = \nu(\Omega^*)$, and assume the corresponding density functions are $f$ and $g$ and the cost function is $c(x, y) = \frac{1}{2}|x - y|^2$; then the worst transportation map exists and is unique. It is the gradient of a concave function $\varphi$, where $\varphi$ is the worst Brenier potential function, unique up to adding a constant. The WT map is given by $T = \nabla \varphi$. Furthermore, if $\varphi$ is $C^2$, then it satisfies the Monge-Ampère equation in Eqn. (1).
Suppose $T: \Omega \to \Omega^*$ is a measure-preserving map, $T_\# \mu = \nu$. Consider the total transportation cost
$$\int_\Omega \frac{1}{2}|x - T(x)|^2 \, d\mu(x) = \int_\Omega \frac{1}{2}|x|^2 \, d\mu(x) + \int_{\Omega^*} \frac{1}{2}|y|^2 \, d\nu(y) - \int_\Omega \langle x, T(x) \rangle \, d\mu(x),$$
where the first two terms on the right-hand side are constants independent of $T$. Therefore, maximizing the transportation cost is equivalent to minimizing $\int_\Omega \langle x, T(x) \rangle \, d\mu(x)$. With the Kantorovich formulation, this is equivalent to finding the transportation plan $\rho$ minimizing
$$\int_{\Omega \times \Omega^*} \langle x, y \rangle \, d\rho(x, y), \quad (\pi_1)_\# \rho = \mu, \ (\pi_2)_\# \rho = \nu,$$
where $\pi_1, \pi_2$ are the projections from $\Omega \times \Omega^*$ onto $\Omega$ and $\Omega^*$ respectively. By duality, this is equivalent to maximizing the energy $E(\varphi, \psi) = \int_\Omega \varphi \, d\mu + \int_{\Omega^*} \psi \, d\nu$ over the functional space $\{(\varphi, \psi) : \varphi(x) + \psi(y) \le \langle x, y \rangle\}$. Now we define the c-transform,
$$\varphi^c(y) = \min_{x \in \Omega} \left( \langle x, y \rangle - \varphi(x) \right). \quad (2)$$
Fixing $x$, the map $y \mapsto \langle x, y \rangle - \varphi(x)$ is a linear function; hence $\varphi^c$ is the lower envelope of a family of linear functions, and thus a concave function satisfying a Lipschitz condition (since the gradient of each linear function is $x \in \Omega$ and $\Omega$ is bounded). We construct a sequence of function pairs $(\varphi_k, \psi_k)$, where $\psi_k = \varphi_k^c$ and $\varphi_{k+1} = \psi_k^c$. Then the energy $E(\varphi_k, \psi_k)$ increases monotonically, and the Lipschitz function pairs converge to a pair $(\varphi, \psi)$, which is the maximizer of $E$. Since $\varphi$ and $\psi$ are c-transforms of each other, we have
$$\varphi(x) + \psi(y) \le \langle x, y \rangle, \quad \psi = \varphi^c, \ \varphi = \psi^c.$$
This shows the existence of the solution.
From the definition of the c-transform in Eqn. (2), we obtain $\varphi(x) + \varphi^c(y) \le \langle x, y \rangle$, with equality when $x$ attains the minimum in Eqn. (2). Since $\varphi$ is concave and almost everywhere differentiable, at such a point the first-order condition gives $y = \nabla \varphi(x)$, which implies that $T(x) = \nabla \varphi(x)$. Therefore, the WT map is the gradient of the worst Brenier potential $\varphi$.
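A 1D numeric sketch of the c-transform in Eqn. (2) (the grid discretization is our own illustration): the transform is a lower envelope of linear functions, hence concave, and the double transform dominates the original potential pointwise.

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 201)   # grid on the source domain
ys = np.linspace(-1.0, 1.0, 201)   # grid on the target domain
phi = -xs ** 2                     # an arbitrary starting potential

# c-transform: psi(y) = min_x (x*y - phi(x)), a lower envelope of
# linear functions of y, hence concave
psi = np.min(xs[None, :] * ys[:, None] - phi[None, :], axis=1)

# double c-transform dominates the original: phi^cc >= phi pointwise
phicc = np.min(ys[None, :] * xs[:, None] - psi[None, :], axis=1)
```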
Next, we show the uniqueness of the WT map. Suppose there are two maximizers $(\varphi_1, \psi_1)$ and $(\varphi_2, \psi_2)$; because $E$ is linear, $\left(\tfrac{\varphi_1 + \varphi_2}{2}, \tfrac{\psi_1 + \psi_2}{2}\right)$ is also a maximizer. Assume
$$\nabla \varphi_1 \ne \nabla \varphi_2 \text{ on a set of positive } \mu\text{-measure}. \quad (3)$$
If $\nabla \varphi_1(x) \ne \nabla \varphi_2(x)$ at a point $x$ where both potentials are differentiable, then the two pairs cannot attain the equality $\varphi_i(x) + \psi_i(y) = \langle x, y \rangle$ at the same $y$, so the averaged pair cannot be tight at $x$ and attains a strictly smaller energy. But the averaged pair is also a maximizer; this contradicts Eqn. (3). This shows the uniqueness of the WT map.
Finally, $\varphi$ is concave, so by Alexandrov's theorem [Alexandrov2005Convex] it is twice differentiable almost everywhere. Moreover, the WT map is measure preserving, $(\nabla \varphi)_\# \mu = \nu$; thus, by the change of variables formula, we have
$$\det D^2 \varphi(x) = \frac{f(x)}{g(\nabla \varphi(x))}.$$
This completes the proof.
2.3 Geometric Variational Method
If the Brenier potential is not smooth, we can still define the Alexandrov solution. The sub-gradient of a convex function $u$ at $x$ is defined as
$$\partial u(x) = \left\{ p \in \mathbb{R}^d : u(z) \ge u(x) + \langle p, z - x \rangle, \ \forall z \right\}.$$
The sub-gradient defines a set-valued map $\partial u: x \mapsto \partial u(x)$. We can use the sub-gradient to replace the gradient map in Eqn. (1), and define
Definition 1 (Alexandrov Solution)
If a convex function $u$ satisfies
$$\mu\left( (\partial u)^{-1}(B) \right) = \nu(B), \quad \forall \text{ Borel set } B \subset \Omega^*,$$
then $u$ is called an Alexandrov solution to the Monge-Ampère equation Eqn. (1).
The work of [gu2013theory] establishes a geometric variational approach for computing the Alexandrov solution of the optimal transportation problem.
Semi-discrete OT/WT maps
Suppose the source measure is $(\Omega, \mu)$, where $\Omega$ is a compact convex domain with non-empty interior in $\mathbb{R}^d$ and the density function $f$ is continuous (Fig. 1(a) gives an example). The target discrete measure is defined as $\nu = \sum_{i=1}^n \nu_i \delta_{y_i}$, where the $y_i$'s are distinct points with $\nu_i > 0$ and $\sum_{i=1}^n \nu_i = \mu(\Omega)$ (Fig. 1(c) shows an example with the discrete measure coming from the 3D surface of Fig. 1(b)). Alexandrov [Alexandrov2005Convex] proved that there exists a height vector $h = (h_1, \dots, h_n)$, so that the upper envelope of the hyperplanes $\pi_i(x) = \langle x, y_i \rangle + h_i$ gives an open convex polytope $P(h)$, and the $\mu$-volume of the projection of the $i$-th facet of $P(h)$ onto $\Omega$ equals $\nu_i$. Furthermore, this convex polytope is unique up to a vertical translation. In fact, Yau's work [gu2013theory] pointed out that the Alexandrov convex polytope $P(h)$, or equivalently the upper envelope $u_h(x) = \max_i \{ \langle x, y_i \rangle + h_i \}$, is exactly the Brenier potential, whose gradient gives the OT map shown in Fig. 1(d).
Theorem 2.3 (Yau et al. [gu2013theory])
Let $\Omega \subset \mathbb{R}^d$ be a compact convex domain, $\{y_1, \dots, y_n\}$ be a set of distinct points in $\mathbb{R}^d$ and $f: \Omega \to \mathbb{R}$ be a positive continuous function defining the measure $\mu$, $d\mu = f(x)dx$. Then for any $\nu_1, \dots, \nu_n > 0$ with $\sum_{i=1}^n \nu_i = \int_\Omega f(x) dx$, there exists a height vector $h = (h_1, \dots, h_n)$, unique up to adding a constant $(c, \dots, c)$, so that $\mu(W_i(h)) = \nu_i$ for all $i$, where $W_i(h)$ is the projection onto $\Omega$ of the $i$-th facet of the upper envelope of the hyperplanes $\pi_i(x) = \langle x, y_i \rangle + h_i$. The height vector $h$ is exactly the minimum of the following convex function
$$E(h) = \int_0^h \sum_{i=1}^n \mu(W_i(\eta)) \, d\eta_i - \sum_{i=1}^n \nu_i h_i \quad (4)$$
on the open convex set (admissible solution space)
$$H = \left\{ h : \mu(W_i(h)) > 0, \ \forall i \right\} \cap \left\{ h : \sum_{i=1}^n h_i = 0 \right\}. \quad (5)$$
Furthermore, the gradient map $\nabla u_h$ minimizes the quadratic cost $\frac{1}{2} \int_\Omega |x - T(x)|^2 f(x) \, dx$ among all the measure-preserving maps $T$, $T_\# \mu = \nu$.
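The variational characterization can be sketched numerically (this is our own Monte-Carlo approximation under a uniform-square assumption, not the paper's exact geometric algorithm): gradient descent on the convex energy in Eqn. (4) drives the upper-envelope cell masses toward the prescribed target masses.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((4, 2))            # target points y_i (hypothetical)
nu = np.full(4, 0.25)             # prescribed masses nu_i
S = rng.random((20000, 2))        # samples from the uniform source measure

h = np.zeros(4)                   # height vector
for _ in range(600):
    # upper envelope: each sample belongs to the cell argmax_i <x,y_i>+h_i
    cell = np.argmax(S @ Y.T + h, axis=1)
    w = np.bincount(cell, minlength=4) / len(S)  # estimated cell masses
    h -= 0.3 * (w - nu)           # gradient step on the convex energy E(h)
```

At convergence, each cell carries (approximately) its prescribed mass, which is the condition $\mu(W_i(h)) = \nu_i$ of the theorem up to Monte-Carlo noise.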
Fig. 1: (a) the source measure; (b) the 3D surface; (c) the induced target discrete measure; (d) the OT map; (e) the WT map.
Theorem 2.4 (Semi-Discrete Worst Transportation Map)
Let $\Omega \subset \mathbb{R}^d$ be a compact convex domain, $\{y_1, \dots, y_n\}$ be a set of distinct points in $\mathbb{R}^d$ and $f: \Omega \to \mathbb{R}$ be a positive continuous function defining the measure $\mu$, $d\mu = f(x)dx$. Then for any $\nu_1, \dots, \nu_n > 0$ with $\sum_{i=1}^n \nu_i = \int_\Omega f(x) dx$, there exists a height vector $h = (h_1, \dots, h_n)$, unique up to adding a constant $(c, \dots, c)$, so that $\mu(W_i(h)) = \nu_i$ for all $i$, where $W_i(h)$ is now the projection of the $i$-th facet of the lower envelope of the hyperplanes $\pi_i(x) = \langle x, y_i \rangle + h_i$. The height vector $h$ is exactly the maximum of the following concave function
$$E(h) = \int_0^h \sum_{i=1}^n \mu(W_i(\eta)) \, d\eta_i - \sum_{i=1}^n \nu_i h_i \quad (6)$$
on the open convex set (admissible solution space) $H$ defined in Eqn. (5). Furthermore, the gradient map $\nabla u_h$ maximizes the quadratic cost $\frac{1}{2} \int_\Omega |x - T(x)|^2 f(x) \, dx$ among all the measure-preserving maps $T$, $T_\# \mu = \nu$.
Given the height vector $h = (h_1, \dots, h_n)$, we construct the upper convex hull of the dual points $\{(y_i, -h_i)\}_{i=1}^n$ (see Fig. 2(d)); each vertex $(y_i, -h_i)$ corresponds to a hyperplane $\pi_i(x) = \langle x, y_i \rangle + h_i$. The convex hull is dual to the lower envelope of the hyperplanes (see Fig. 2(c)), which is the graph of the concave function $u_h(x) = \min_i \{ \langle x, y_i \rangle + h_i \}$. The projection of the lower envelope induces a farthest power diagram (see Fig. 2(c)) with cells $W_i(h)$, where $\nabla u_h|_{W_i(h)} = y_i$.
Fig. 2: (a) the OT envelope; (b) the OT convex hull; (c) the WT envelope; (d) the WT concave hull.
The $\mu$-volume of each cell $W_i(h)$ is defined as
$$w_i(h) = \mu(W_i(h) \cap \Omega) = \int_{W_i(h) \cap \Omega} f(x) \, dx. \quad (7)$$
Similar to Lemma 2.5 in [gu2013theory], by direct computation we can show that the symmetric relation holds:
$$\frac{\partial w_i(h)}{\partial h_j} = \frac{\partial w_j(h)}{\partial h_i}. \quad (8)$$
This shows that the differential form $\omega = \sum_{i=1}^n w_i(h) \, dh_i$ is a closed one-form. As in [gu2013theory], by the Brunn-Minkowski inequality, one can show that the admissible height space $H$ in Eqn. (5) is convex and simply connected. Hence $\omega$ is exact, the energy $E(h) = \int_0^h \omega$ is well defined, and the off-diagonal elements of its Hessian matrix are given by
$$\frac{\partial^2 E}{\partial h_i \partial h_j} = \frac{\partial w_i(h)}{\partial h_j}, \quad i \ne j, \quad (9)$$
which equals the $\mu$-length of the shared edge $W_i(h) \cap W_j(h) \cap \Omega$ divided by $|y_i - y_j|$, and hence is non-negative.
Since the total volume of all the cells is the constant $\mu(\Omega)$, i.e., $\sum_{i=1}^n w_i(h) = \mu(\Omega)$, we obtain
$$\frac{\partial^2 E}{\partial h_i^2} = \frac{\partial w_i(h)}{\partial h_i} = -\sum_{j \ne i} \frac{\partial w_i(h)}{\partial h_j} \le 0. \quad (10)$$
Therefore, the Hessian matrix is negative definite on $H$ (restricted to the hyperplane $\sum_i h_i = 0$), and the energy is strictly concave on $H$. After adding a linear term, the energy in Eqn. (6),
$$E(h) = \int_0^h \sum_{i=1}^n w_i(\eta) \, d\eta_i - \sum_{i=1}^n \nu_i h_i,$$
is still strictly concave.
The gradient of $E$ is given by
$$\nabla E(h) = \left( w_1(h) - \nu_1, \ \dots, \ w_n(h) - \nu_n \right). \quad (11)$$
On the boundary of $H$, there is an empty cell $W_i(h)$, and the $i$-th component of the gradient is $w_i(h) - \nu_i = -\nu_i < 0$, which points to the interior of $H$. This shows that the unique global maximum of the energy is in the interior of $H$. At the maximum point $h^*$, the gradient $\nabla E(h^*)$ is zero and $w_i(h^*) = \nu_i$ for all $i$. Thus, $\nabla u_{h^*}$ is the unique solution to the semi-discrete worst transportation problem, as shown in Fig. 1(e).
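The same kind of Monte-Carlo sketch adapts to the worst transportation map (again our own approximation under a uniform-square assumption, not the paper's geometric algorithm): replace the upper envelope by the lower envelope, and ascend the concave energy of Eqn. (6) along the gradient in Eqn. (11).

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.random((4, 2))            # target points y_i (hypothetical)
nu = np.full(4, 0.25)             # prescribed masses nu_i
S = rng.random((20000, 2))        # samples from the uniform source measure

h = np.zeros(4)
for _ in range(600):
    # lower envelope: each sample belongs to the cell argmin_i <x,y_i>+h_i
    cell = np.argmin(S @ Y.T + h, axis=1)
    w = np.bincount(cell, minlength=4) / len(S)  # farthest-cell masses
    h += 0.3 * (w - nu)           # ascent step on the concave energy
```

Note the two sign changes relative to the OT sketch: argmin instead of argmax, and a gradient ascent instead of descent.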
3 Computational Algorithms
This section gives a unified algorithm to compute both the optimal and the worst transportation maps based on convex geometry [Berg2008CG].
3.1 Basic Concepts from Computational Geometry
A hyperplane in $\mathbb{R}^{d+1}$ is represented as $\pi(x) = \langle x, y \rangle + h$. Given a family of hyperplanes $\{\pi_i(x) = \langle x, y_i \rangle + h_i\}_{i=1}^n$, their upper envelope is the graph of the function $u(x) = \max_i \pi_i(x)$; their lower envelope is the graph of the function $v(x) = \min_i \pi_i(x)$. The Legendre dual of $u$ is defined as $u^*(y) = \max_x \left( \langle x, y \rangle - u(x) \right)$; the c-transform of $v$ is defined as
$$v^c(y) = \min_x \left( \langle x, y \rangle - v(x) \right).$$
Each hyperplane $\pi_i$ has a dual point in $\mathbb{R}^{d+1}$, namely $\pi_i^* = (y_i, -h_i)$. The graph of $u^*$ is the lower convex hull of the dual points $\{\pi_i^*\}$, and the graph of $v^c$ is the upper convex hull of the dual points. (i) The projection of the upper envelope induces a nearest power diagram $\mathcal{D}$ of $\Omega$, whose cell $W_i$ collects the points where $\pi_i$ attains the maximum; the projection of the lower convex hull induces a nearest weighted Delaunay triangulation $\mathcal{T}$ of $\{y_i\}$.
(ii) The projection of the lower envelope induces a farthest power diagram $\mathcal{D}^*$ of $\Omega$, and the projection of the upper convex hull induces a farthest weighted Delaunay triangulation $\mathcal{T}^*$. $\mathcal{D}$ and $\mathcal{T}$ are dual to each other, namely $y_i$ connects to $y_j$ in $\mathcal{T}$ if and only if $W_i$ is adjacent to $W_j$ in $\mathcal{D}$. Similarly, $\mathcal{D}^*$ and $\mathcal{T}^*$ are also dual to each other. Fig. 2 shows these basic concepts.
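A quick numeric check of the envelope/power-diagram correspondence (toy data of our own): the cell assignment from the upper envelope agrees with a nearest power diagram whose power weights are read off from the heights.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(5, 2))        # plane slopes y_i
h = rng.normal(size=5)             # plane heights h_i
X = rng.normal(size=(1000, 2))     # query points

# cell assignment from the upper envelope max_i <x, y_i> + h_i
upper = np.argmax(X @ Y.T + h, axis=1)

# equivalent nearest power diagram: argmin_i |x - y_i|^2 - w_i,
# with power weights w_i = |y_i|^2 + 2 h_i (expand the square to verify)
w = (Y ** 2).sum(axis=1) + 2 * h
power = np.argmin(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2) - w,
                  axis=1)
```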
3.2 Algorithms based on Computational Geometry
The algorithm in Alg. 1 mainly optimizes the energy in the admissible solution space $H$ using Newton's method. At the beginning, the height vector $h$ is initialized to an interior point of $H$. At each step, the convex hull of the dual points $\{(y_i, -h_i)\}$ is constructed. For the OT (WT) map, the lower (upper) convex hull is projected to induce a nearest (farthest) weighted Delaunay triangulation of the $y_i$'s. Each vertex $(y_i, -h_i)$ of the convex hull corresponds to a supporting plane $\pi_i(x) = \langle x, y_i \rangle + h_i$; each face of the convex hull is dual to a vertex of the envelope, which is the intersection point of the supporting planes of its corners. For the OT (WT) map, the lower (upper) convex hull is dual to the upper (lower) envelope, and the upper (lower) envelope induces the nearest (farthest) power diagram. The relationship between the convex hulls and the envelopes is shown in Fig. 2.
Then we compute the $\mu$-volume of each power cell using Eqn. (7); the gradient of the energy in Eqn. (6) is given by Eqn. (11). The Hessian matrix can be constructed using Eqn. (9) for the off-diagonal elements and Eqn. (10) for the diagonal elements; the Hessian matrices of the OT map and the WT map differ by a sign. Then we solve the following linear system to find the update direction $d$,
$$\mathrm{Hess}(E) \, d = -\nabla E(h).$$
Next, we need to determine the step length $\lambda$, such that $h + \lambda d$ is still in the admissible solution space $H$ in Eqn. (5). Firstly, we set an initial step length $\lambda$. Then we compute the power diagram induced by $h + \lambda d$. If some cells disappear in the diagram, then $h + \lambda d$ exceeds the admissible space. In this case, we shrink $\lambda$ by half, $\lambda \leftarrow \lambda / 2$, and recompute the power diagram with $h + \lambda d$. We repeat this process to find an appropriate step length and update $h \leftarrow h + \lambda d$. We repeat the above procedure until the norm of the gradient is less than a prescribed threshold. As a result, the upper (lower) envelope is the Brenier potential, and the desired OT (WT) map sends each nearest (farthest) power cell $W_i$ to the corresponding point $y_i$.
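The step-length control above can be sketched as a backtracking helper (a hypothetical function with a Monte-Carlo feasibility check, not the paper's implementation): halve $\lambda$ until every power cell of $h + \lambda d$ keeps positive mass.

```python
import numpy as np

def backtrack(h, d, cell_masses, lam0=1.0, max_halvings=50):
    """Halve the step length until every cell of h + lam*d keeps
    positive mass, i.e. the update stays in the admissible space."""
    lam = lam0
    for _ in range(max_halvings):
        if np.all(cell_masses(h + lam * d) > 0):
            return lam
        lam *= 0.5
    raise RuntimeError("no admissible step length found")

# toy setting: upper-envelope cells of three planes over [-1, 1]^2
rng = np.random.default_rng(3)
Y = np.array([[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]])
S = rng.uniform(-1.0, 1.0, size=(20000, 2))

def cell_masses(h):
    cell = np.argmax(S @ Y.T + h, axis=1)
    return np.bincount(cell, minlength=len(Y)) / len(S)

h = np.zeros(3)
d = np.array([-20.0, 0.0, 0.0])   # a step that empties cell 0 at lam = 1
lam = backtrack(h, d, cell_masses)
```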
In order to compute the power diagram, we need to compute the convex hull [Berg2008CG]. The conventional method [Su2015Registration] computes the convex hull from scratch at each iteration, which is the most time-consuming step in the algorithm pipeline. In fact, at the later stages of the optimization, the combinatorial structure of the convex hull changes little. Therefore, in the proposed method, we only locally update the connectivity of the convex hull. Basically, we check the local power Delaunay property of each edge and push the non-Delaunay edges onto a stack. While the stack is non-empty, we pop the top edge and check whether it is locally power Delaunay; if it is, we continue, otherwise we flip it. Furthermore, if the flip causes overlapping triangles, we flip it back. By repeating this procedure, we finally update the convex hull and project it to the weighted Delaunay triangulation. If, in the end, the stack is empty but there are still edges violating the local power Delaunay property, then the height vector is outside the admissible space and some power cells are empty. In this scenario, we reduce the step length by half and try again.
With a piecewise linear source density, we need to compute the $\mu$-area of the power cells and the $\mu$-length of the power diagram edges. The source measure is represented by a piecewise linear density function defined on a triangulation, as shown in Fig. 3(a). Therefore, we need to compute the overlay (Fig. 3(c)) of the triangulation (Fig. 3(a)) and the power diagram (Fig. 3(b)) in each iteration. This step is technically challenging: naively computing the intersection between each triangle and each power cell has very high complexity. Therefore, we use a Bentley-Ottmann type sweep-line approach [Bentley-Ottmann1979] to improve the efficiency. Basically, all the planar points are sorted so that the lower-left points precede the upper-right ones. Then, for each cell, we find the minimal vertex and the maximal vertex. A sweep line goes from left to right: when the sweep line hits the minimal vertex of a cell, the cell is born; when the sweep line passes the maximal vertex, the cell dies. We also maintain a data structure storing all the alive triangles and cells, and compute the intersections among them. This greatly improves the computational efficiency.
Fig. 3: (a) the piecewise linear (PL) source density; (b) the power diagram; (c) the overlay.
4 Experimental Results
To show the practicality of our framework for structural MR images, as well as its robustness over large brain image datasets, we use the WT cost to statistically discriminate Aβ+ AD patients from Aβ− CU subjects.
Data Preparation Brain sMRI data are obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database [ANDI], from which we use 81 Aβ+ AD patients and 110 Aβ− CU subjects. The ADNI florbetapir PET data are processed using the AVID pipeline [Navitsky:AD18] and later converted to centiloid scales. A centiloid cutoff of 37.1 is used to determine amyloid positivity [Fleisher:an2011]. The sMRIs are preprocessed using FreeSurfer [freesurfer] to reconstruct the pial cortical surfaces, and we only use the left cerebral surfaces. From each surface, we remove the corpus callosum region, which carries little morphometry information related to AD, so that the final surface becomes a topological disk. Further, we compute the conformal mapping from the surface to the planar disk with the discrete surface Ricci flow method [WangRiccTMI]. To eliminate the Möbius ambiguity, we map the vertex with the largest coordinate to the origin of the disk, and the vertex with the largest coordinate on the boundary to a fixed position on the boundary circle.
Computation Process In the experiment, we randomly select one subject from the Aβ− CU group as the template, and then compute both the OT costs and the WT costs from the template to all the other cortical surfaces. The source measure is piecewise linearly defined on the parameter space of the template, namely the planar disk: the measure of each triangle on the disk equals the area of the corresponding triangle on the template surface, and the total source measure is then normalized. For each target surface, the measure is defined on the planar points, the images of the surface vertices under the conformal map; the discrete measure at each vertex is given by one third of the total area of its adjacent faces on the target surface. After normalization, the summation of the discrete measures equals the total measure of the planar source. Then we compute both the OT cost and the WT cost from the planar source measure induced by the template surface to the target discrete measures given by the ADs and the CUs. Finally, we run a permutation test with 50,000 random assignments of subjects to groups to estimate the statistical significance of both measurements. Furthermore, we also compute surface areas and cortical volumes as measurements for comparison, and the same permutation test is applied. The p-value for the WT cost is 2e-5, which indicates that the WT-based univariate biomarker is statistically significant between the two groups and may serve as a reliable indicator to separate them. It is worth noting that this p-value is also far better than those of the surface area, the cortical volume and the OT cost, as shown in Tab. 1.
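The group-difference test can be sketched as follows (synthetic stand-in values and a two-sided difference-of-means statistic of our own choosing; we use 2,000 permutations instead of 50,000 for brevity):

```python
import numpy as np

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference of group means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= observed
    return (hits + 1) / (n_perm + 1)   # add-one correction

# synthetic stand-ins for per-subject WT costs (81 "AD", 110 "CU")
rng = np.random.default_rng(42)
ad = rng.normal(1.5, 0.3, 81)
cu = rng.normal(1.0, 0.3, 110)
p = permutation_test(ad, cu)
```

With well-separated synthetic groups, the p-value is bounded below by $1/(n_{\text{perm}} + 1)$, which is why more permutations are needed to resolve very small p-values such as 2e-5.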
Tab. 1: Permutation test p-values of the surface area, the cortical volume, the OT cost and the WT cost.
Our results show that the WT cost is promising as an AD imaging biomarker. Unexpectedly, it performs far better than the OT cost. A plausible explanation is that, although human brain cortical shapes are generally homogeneous, AD-induced atrophy may have common patterns that are consistently exaggerated by the WT cost, whereas the OT cost is robust to these changes. More experiments and theoretical study are warranted to validate this observation.
In this work, we propose a new algorithm to compute the WT cost and validate its potential as a new AD imaging biomarker. In the future, we will validate our framework with more brain images and offer more theoretical interpretation of its improved statistical power.