1 Introduction
Since the dawn of medical image analysis, researchers have used 3D imaging to capture the structure of the brain. Over the last decade, this community has seen the emergence of deep learning due to its power to capture local structure. One of the major challenges in dealing with medical images is the misalignment of the collected data. Although this can be addressed by image registration, registration techniques incur errors that propagate to the subsequent analysis. One way to address this issue is to make the model invariant to geometric transformations, which entails data augmentation along with an increase in model complexity and training time. This motivates the exploration of alternative representations of 3D structure. The pointcloud is an efficient way to represent 3D structures
[1, 2] because it directly encodes important geometric properties of the shape. Due to the lack of a smooth topology, standard convolution cannot be applied to pointclouds. One popular approach to point convolution [3] is to divide the pointcloud into voxels and then extract features using 3D convolution. However, this method suffers from the possible sparsity of pointclouds, which results in many empty voxels. One possible solution is to use a multilayer perceptron (MLP) to extract features from each point
[1] or from a local neighborhood around each point [2]. Unfortunately, all these methods are susceptible to random rotations, which makes them incapable of efficiently dealing with the error caused by rigid registration. In this work, we address this problem by developing an inherently rotation invariant model. In recent years, several researchers [4, 5] have proposed methods to discriminate between subjects with and without dementia. Popular approaches include using the 3D volume of the region of interest (ROI) and analyzing the shape of the anatomical structures of interest. To do so, researchers have either proposed techniques to map the 3D volume into a high-dimensional space [4] or mapped the shape of the anatomical structure onto a complex projective space, i.e., Kendall’s shape space [5].
In this work, we use our proposed approach to discriminate between demented and non-demented subjects on the publicly available OASIS dataset [6] based on the shape of the corpus callosum (as shown in Fig. 2). It is well accepted that thinning of the corpus callosum is related to dementia. We first segment the corpus callosum structure from the 3D image. Then we use a pointcloud sampled from the corpus callosum structure as the input to our proposed method.
Motivation: In medical imaging, analysis is highly sensitive to the error caused by rigid registration due to the presence of large rotations in the collected data. This dictates the necessity of a rotation invariant model for medical imaging. However, achieving this invariance on 3D image scans is computationally expensive, as it entails data augmentation. This motivates us to represent 3D image scans with pointclouds. Nonetheless, due to the lack of a local neighborhood structure, it is not possible to use standard convolution on a 3D pointcloud. Therefore, we define a rotation invariant convolution operator on the pointcloud with the topology induced from the sphere.
Our proposed method first constructs a sphere around each point and collects responses from the pointcloud. The responses are collected based on the inner products between the grid points on the sphere and the points in the pointcloud. We use spherical convolution on the collected responses to extract rotation-invariant features. Finally, we aggregate the invariant responses over the entire pointcloud and use this feature to classify demented and non-demented subjects. A schematic of our proposed method is shown in Fig. 1.
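The response collection described above can be sketched as follows. This is a minimal illustration under our own assumptions (an equiangular direction grid, each neighbour contributing to its best-aligned grid direction weighted by the inner product), not the authors' exact implementation; `radius` and the grid resolution are hypothetical parameters.

```python
import numpy as np

def sphere_grid(n_theta=8, n_phi=8):
    """Grid of unit directions on the sphere (equiangular sampling)."""
    theta = np.linspace(0, np.pi, n_theta)                  # polar angle
    phi = np.linspace(0, 2 * np.pi, n_phi, endpoint=False)  # azimuth
    t, p = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p),
                     np.sin(t) * np.sin(p),
                     np.cos(t)], axis=-1)
    return dirs.reshape(-1, 3)                              # (n_theta*n_phi, 3)

def spherical_response(center, cloud, radius, grid):
    """For one sphere centered at `center`, accumulate a response in each
    grid direction from the cloud points inside the ball of given radius.
    Each neighbour contributes to the grid direction it aligns with best,
    weighted by the inner product of its unit direction with that grid point."""
    resp = np.zeros(len(grid))
    offsets = cloud - center
    dists = np.linalg.norm(offsets, axis=1)
    mask = (dists > 0) & (dists <= radius)
    for v, d in zip(offsets[mask], dists[mask]):
        dots = grid @ (v / d)                               # inner products
        resp[np.argmax(dots)] += np.max(dots)
    return resp
```

Rotating the cloud about `center` permutes/rotates this spherical signal rather than changing it arbitrarily, which is what the equivariance claims below rely on.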
The salient features of our proposed method are: (1) Previous methods dealing with pointclouds either use an MLP to define “convolution” or use a locally flat structure to define “convolution”. To the best of our knowledge, this is the first attempt to define a convolution operator on a 3D pointcloud. (2) In order to achieve geometric invariance, the popular approach is to use data augmentation, which naturally increases training time and model complexity. Here, we propose an “augmentation-free” rotation invariant convolutional neural network (CNN) for pointclouds. To the best of our knowledge, this is also the first attempt to model medical images using 3D pointclouds. Our model is much leaner because of the induced spherical topology on the pointcloud, instead of mapping the pointcloud into a high-dimensional space. (3) We perform an experimental evaluation in terms of classifying subjects with and without dementia on the publicly available OASIS dataset.
2 Proposed algorithm
In this work, we propose a rotation invariant CNN using an induced spherical topology on the pointcloud. Though the formulation described below can be stated in general dimensions, in this work we restrict ourselves to 3D pointclouds. We divide our proposed algorithm into five key components.
Computing the centroid of a pointcloud: Given a pointcloud $\mathcal{X}$, we compute its centroid as the member of $\mathcal{X}$ nearest to the mean of $\mathcal{X}$. Formally, let $\bar{x}$ denote the mean of $\mathcal{X}$; then the centroid $c$ is defined as $c = \Pi_{\mathcal{X}}(\bar{x})$, where $\Pi_{\mathcal{X}}(\bar{x}) = \operatorname{argmin}_{x \in \mathcal{X}} \|x - \bar{x}\|$ is the projection of $\bar{x}$ onto the set $\mathcal{X}$.
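In code, this centroid computation is a one-liner (a direct transcription of the definition above, not a learned component):

```python
import numpy as np

def centroid(cloud):
    """Return the member of the pointcloud nearest to its Euclidean mean."""
    mean = cloud.mean(axis=0)
    return cloud[np.argmin(np.linalg.norm(cloud - mean, axis=1))]
```

Because the centroid is itself a member of the pointcloud, rotating the pointcloud maps the centroid to its rotated image, which keeps the subsequent construction rotation equivariant.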
Extracting the “attention” from a pointcloud: Given the pointcloud, we extract the region of interest, i.e., the “attention”, as a subset of the pointcloud as follows: (a) Compute the directional part of the vector from the centroid to each point. (b) Pass each such vector through a fully-connected layer to get a confidence score for selecting the corresponding point. (c) Define a random variable following a multinomial distribution with the confidence scores as its parameter. (d) Draw samples from this random variable. We call the resulting subset our region of interest.

Finding the convex hull: We extract the convex hull using the following scheme, as it is useful for capturing the global structure of a geometric shape. At each point of the region of interest, we center a sphere of a fixed radius and collect the response from the pointcloud; this yields, at each point, a function on the sphere. The response at each grid point of the sphere aggregates the contributions of the pointcloud points lying inside the corresponding ball.
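The sampling steps (a)–(d) above can be sketched as follows. The linear scoring function (`weight`, `bias`) merely stands in for the learned fully-connected layer, and the subset size `k` is an assumed hyperparameter:

```python
import numpy as np

def attention_subset(cloud, center, weight, bias, k, rng):
    """Steps (a)-(d): score the unit direction of each point from the
    centroid, normalize the scores into a multinomial parameter, and
    sample k distinct points as the region of interest."""
    dirs = cloud - center                                   # (a) directions
    dirs /= np.maximum(np.linalg.norm(dirs, axis=1, keepdims=True), 1e-12)
    scores = dirs @ weight + bias                           # (b) confidences
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                                    # (c) multinomial parameter
    idx = rng.choice(len(cloud), size=k, replace=False, p=probs)  # (d) sample
    return cloud[idx]
```

A softmax normalization is our assumption; the paper only states that the confidences parameterize a multinomial distribution.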
Proposition 1.
If the pointcloud is rotated by a rotation matrix, then the corresponding responses are also rotated by the same matrix.
This representation can be viewed as placing an omnidirectional camera at each point and collecting the responses in each viewing direction. This analogy makes one wonder: is it necessary to use one camera per point inside the region of attention? Obviously, for a dense pointcloud the answer is no, and hence we incorporate the following downsampling strategy: (a) on each sphere centered at a point of the region of interest, collect the omnidirectional responses from the pointcloud; (b) choose the spheres with the largest responses and keep the corresponding index set.
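The downsampling in (a)–(b) reduces to a top-k selection over aggregate responses; a sketch, where `k` is an assumed hyperparameter and `responses` holds the per-sphere grids collected in the previous step:

```python
import numpy as np

def select_hull_points(cloud, responses, k):
    """Keep the k points whose spheres collected the largest total
    omnidirectional response; `responses` has shape (n_points, n_grid)."""
    totals = responses.sum(axis=1)                  # aggregate per sphere
    idx = np.argsort(totals)[-k:][::-1]             # top-k, largest first
    return idx, cloud[idx]
```

Summing over the grid is one plausible aggregation of the omnidirectional response; the paper does not specify the exact reduction.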
Proposition 2.
The selected point set lies on the convex hull of the pointcloud.
Extracting features with spherical convolution: After constructing the sphere at each selected point, we use spherical convolution to extract rotation equivariant features. We define spherical convolution as follows:
Definition 1 ($S^2$ convolution). Given a signal $f: S^2 \to \mathbb{R}$ and a learnable kernel $\psi: S^2 \to \mathbb{R}$, the spherical convolution is defined as $[\psi \star f](R) = \int_{S^2} \psi(R^{-1}x)\, f(x)\, dx$ for $R \in SO(3)$ [7].

As the output of spherical convolution is a function from $SO(3)$ to $\mathbb{R}$, we need an $SO(3)$ convolution in order to develop a deep CNN architecture. We use the standard group convolution with respect to the Haar measure, which we define as follows.
Definition 2 ($SO(3)$ convolution). Given a signal $f: SO(3) \to \mathbb{R}$ and a learnable kernel $\psi: SO(3) \to \mathbb{R}$, we define the convolution operator as $[\psi \star f](R) = \int_{SO(3)} \psi(Q^{-1}R)\, f(Q)\, dQ$. Here, $dQ$ is the volume density on $SO(3)$ with respect to the Haar measure. As is obvious from the definition of group convolution, this operator is equivariant to the action of $SO(3)$.
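Discretizing the $SO(3)$ integral faithfully requires a quadrature over rotation space (as in [7]), which is beyond a short sketch. The equivariance property itself, however, holds for group convolution on any compact group; below is the same construction on the finite cyclic group $Z_n$, where the Haar measure reduces to the counting measure:

```python
import numpy as np

def cyclic_conv(f, psi):
    """Group convolution on Z_n: (psi * f)[r] = sum_q psi[(r - q) % n] f[q].
    The finite-group analogue of the SO(3) convolution above."""
    n = len(f)
    return np.array([sum(psi[(r - q) % n] * f[q] for q in range(n))
                     for r in range(n)])
```

Shifting the input by any group element shifts the output by the same element, mirroring the $SO(3)$ equivariance claimed in the text.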
We extract rotation equivariant features by using one $S^2$ convolution followed by $SO(3)$ convolutions. In between convolutions, we use ReLU and batch normalization operations. After these convolution layers, the output is equivariant to rotations, i.e., rotating the input pointcloud rotates the output feature maps correspondingly. We use an invariant layer after the convolution layers to compute the integrated response over $SO(3)$. The purpose of this invariant layer is to make the entire network invariant to rotations.
Aggregating features: We extract rotation invariant spherical features for each point on the convex hull. In order to classify the pointcloud, we aggregate these invariant features collected over the convex hull using global max-pooling. Thus, each pointcloud is identified with a single feature vector. We then use a single fully-connected layer for classification. An overview of our proposed method is given in Fig. 3.
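The aggregation and classification head can be sketched in a few lines; `fc_weight` and `fc_bias` are the parameters of the single fully-connected layer (placeholders here, not trained values):

```python
import numpy as np

def classify(hull_features, fc_weight, fc_bias):
    """Global max-pooling over per-point invariant features on the
    convex hull, followed by one fully-connected layer."""
    pooled = hull_features.max(axis=0)              # (feat_dim,)
    return pooled @ fc_weight + fc_bias             # class scores
```

Max-pooling over the hull makes the final feature independent of the ordering of the points, so the head is permutation invariant on top of the rotation invariance supplied by the earlier layers.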
In the next section, we will give the data description and the experimental details.
3 Experimental results
This section consists of the data description followed by the details of experimental validation.
Data description: We use the OASIS dataset [6] to address the classification of demented vs. non-demented subjects using our proposed framework. The dataset contains at least two MR brain scans per subject, and for each patient the scans are separated by at least one year. The dataset contains patients of both sexes. In order to avoid gender effects, we take the MR scans of male patients alone from three visits, which results in a dataset of MR scans of subjects with and without dementia. We first compute an atlas (using the method in [9]) from the MR scans of the patients without dementia. After rigidly registering each MR scan to the atlas, we segment out the corpus callosum region from each scan. We represent the shape of the corpus callosum as a 3D pointcloud.
Experimental details:
Below, we provide the experimental details, including the choice of hyperparameters, the data augmentation scheme, and an ablation study.
Parameter selection: We select the convex-hull points sampled from the region of interest of each pointcloud. We train with the Adam optimizer. We use a spherical CNN model with one $S^2$ and one $SO(3)$ convolution layer. In Fig. 4, we show representative pointclouds from demented and non-demented subjects with the corresponding attention areas and convex-hull points.
Data augmentation: As the number of samples is much smaller than the number of model parameters, we use explicit data augmentation as follows: we apply a uniformly random downsampling scheme to select a subset of the points of each pointcloud. This significantly increases the number of training samples, making learning feasible. As claimed before, our model is rotation invariant, so explicit rotation augmentation is unnecessary. In Fig. 5, we show that our model is indeed invariant to rotations.
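Reproducing Fig. 5 requires the full trained network, but the kind of check it performs can be sketched with a surrogate: any rotation-invariant descriptor (here, sorted distances from the mean, chosen by us purely for illustration) must be unchanged under a random rotation of the input.

```python
import numpy as np

def random_rotation(rng):
    """Random rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))                        # make the factorization unique
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1                               # ensure det(q) = +1
    return q

def invariant_signature(cloud):
    """Surrogate rotation-invariant feature: sorted distances from the mean."""
    return np.sort(np.linalg.norm(cloud - cloud.mean(axis=0), axis=1))
```

The same test applied to the network's pooled feature vector is what the invariance plot in Fig. 5 demonstrates.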
Results and ablation study: We report the classification accuracy together with the sensitivity and specificity. If we remove the “attention” module, the classification accuracy drops, which clearly indicates the usefulness of the “attention” module in this work.
4 Conclusions
Pointclouds help in understanding 3D geometric shapes in medical data. In this work, we propose an “augmentation-free” rotation invariant CNN for pointclouds and apply it to the public OASIS dataset to classify corpus callosum shapes. The core of our method is the proposed rotation invariant convolution on pointclouds, obtained by inducing a topology from the sphere, together with an attention mechanism that focuses on specific parts of the 3D shape. Our method achieves superior performance with a comparatively lean model. In the future, we will focus on a unified end-to-end trainable model for segmenting the ROI from pointclouds extracted from brain images.
References

[1] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.

[2] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas, “PointNet++: Deep hierarchical feature learning on point sets in a metric space,” in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.

[3] Yin Zhou and Oncel Tuzel, “VoxelNet: End-to-end learning for point cloud based 3D object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490–4499.

[4] Rudrasis Chakraborty, Monami Banerjee, and Baba C. Vemuri, “Statistics on the space of trajectories for longitudinal data analysis,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), IEEE, 2017, pp. 999–1002.

[5] Prasanna Muralidharan and P. Thomas Fletcher, “Sasaki metrics for analysis of longitudinal data on manifolds,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 1027–1034.

[6] Anthony F. Fotenos, A. Z. Snyder, L. E. Girton, J. C. Morris, and R. L. Buckner, “Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD,” Neurology, vol. 64, no. 6, pp. 1032–1039, 2005.

[7] Taco Cohen, Mario Geiger, Jonas Köhler, and Max Welling, “Convolutional networks for spherical signals,” arXiv preprint arXiv:1709.04893, 2017.

[8] Rudrasis Chakraborty, Monami Banerjee, and Baba C. Vemuri, “H-CNNs: Convolutional neural networks for Riemannian homogeneous spaces,” arXiv preprint arXiv:1805.05487, 2018.

[9] Brian B. Avants, Nick Tustison, and Gang Song, “Advanced normalization tools (ANTs),” Insight Journal, vol. 2, pp. 1–35, 2009.