Since the dawn of medical image analysis, researchers have used 3D imaging to capture the structure of the brain. Over the last decade, the community has seen the emergence of deep learning due to its power to capture local structure. One of the major challenges in dealing with medical images is the misalignment of the collected data. Although this can be addressed by image registration, registration techniques incur errors that propagate to the subsequent analysis. One way to address this issue is to make the model invariant to geometric transformations, which entails data augmentation together with an increase in model complexity and training time. This motivates the exploration of alternative representations of 3D structure. A point-cloud is an efficient way to represent 3D structures [1, 2] because of its important geometric properties. Due to the lack of a smooth topology, standard convolution cannot be applied on a point-cloud. One popular approach to point convolution is to divide the point-cloud into voxels and then extract features using 3D convolution. However, this method suffers from the possible sparsity of point-clouds, which results in many empty voxels. One possible solution is to use a multi-layer perceptron (MLP) to extract features from each point or from a local neighborhood around each point. Unfortunately, all of these methods are susceptible to random rotations, which makes them incapable of efficiently dealing with the error caused by rigid registration. In this work, we address this problem by developing an inherently rotation-invariant model.
In recent years, several researchers [4, 5] have proposed methods to discriminate between subjects with and without dementia. Popular approaches include using the 3D volume of a region of interest (ROI) and analyzing the shape of the anatomical structures of interest. To do so, researchers have either proposed techniques to map the 3D volume into a high-dimensional space or mapped the shape of the anatomical structure onto the complex projective space, i.e., Kendall's shape space.
In this work, we use our proposed approach to discriminate between demented and non-demented subjects on the publicly available OASIS dataset based on the shape of the corpus callosum (as shown in Fig. 2). It is well accepted that thinning of the corpus callosum is related to dementia. We first segment the corpus callosum from the 3D image. Then we use a point-cloud sampled from the corpus callosum as the input to our proposed method.
Motivation: In medical imaging, the analysis is highly sensitive to the error caused by rigid registration, due to the presence of large rotations in the collected data. This dictates the need for a rotation-invariant model in medical imaging. However, achieving this invariance on 3D image scans is computationally expensive, as it entails data augmentation. This motivates us to represent 3D image scans with point-clouds. Nonetheless, due to the lack of a local neighborhood structure, it is not possible to use standard convolution on a 3D point-cloud. Therefore, we define a rotation-invariant convolution operator on the point-cloud with a topology induced from the sphere.
Our proposed method first constructs a sphere around each point and collects responses from the point-cloud. The responses are collected based on the inner product between the grid points on the sphere and the points in the point-cloud. We use spherical convolution on the collected responses to extract rotation-invariant features. Finally, we aggregate the invariant responses over the entire point-cloud and use this feature to classify demented and non-demented subjects. A schematic of our proposed method is shown in Fig. 1.
The salient features of our proposed method are:
(1) Previous methods dealing with point-clouds either use an MLP to define "convolution" or use a locally flat structure to define "convolution". To the best of our knowledge, this is the first attempt to define a rotation-invariant convolution operator directly on a 3D point-cloud.
(2) In order to achieve geometric invariance, the popular approach is to use data augmentation, which naturally increases the training time and model complexity. Here, we propose an "augmentation-free" rotation-invariant convolutional neural network (CNN) for point-clouds. To the best of our knowledge, this is the first attempt to model medical images using 3D point-clouds. Our model is much leaner because of the induced spherical topology on the point-cloud, instead of mapping the point-cloud into a high-dimensional space.
(3) We perform an experimental evaluation in terms of classifying subjects with and without dementia on the publicly available OASIS dataset.
2 Proposed algorithm
In this work, we propose a rotation-invariant CNN using an induced spherical topology on the point-cloud. Although the formulation described below can be applied on $\mathbb{R}^n$ for any $n$, in this work we restrict ourselves to $\mathbb{R}^3$. We divide our proposed algorithm into five key components.
Computing the centroid of a point-cloud: Given the point-cloud $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^3$, we compute the centroid of the point-cloud to be the point of $X$ nearest to the mean of $X$. Formally, let us denote the centroid of $X$ by $c$. Then $c$ is defined as
$$ c = \Pi_{X}\!\left( \frac{1}{N} \sum_{i=1}^{N} x_i \right), $$
where $\Pi_{X}(y)$ is the projection of $y$ onto the set $X$, i.e., the point of $X$ closest to $y$.
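A minimal NumPy sketch of this centroid rule (the member of the cloud nearest to the Euclidean mean); the function and array names are illustrative, not from the paper:

```python
import numpy as np

def cloud_centroid(points):
    """Centroid of a point-cloud: the member point nearest to the mean.

    points: (N, 3) array of 3D coordinates.
    Returns the (3,) coordinates of the chosen point.
    """
    mean = points.mean(axis=0)                      # Euclidean mean of the cloud
    dists = np.linalg.norm(points - mean, axis=1)   # distance of each point to the mean
    return points[np.argmin(dists)]                 # projection of the mean onto the set
```

Note that the result is always an actual point of the cloud, unlike the raw mean, which generally is not.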
Extracting the "attention" from a point-cloud: Given the point-cloud $X$ with centroid $c$, we extract the region of interest, i.e., the "attention", as a subset $\widetilde{X} \subset X$ as follows: (a) Compute the directional part of the vector from $c$ to each point $x_i$, i.e., $v_i = (x_i - c)/\|x_i - c\|$. (b) Pass the vector $v_i$ through a fully-connected layer to get the confidence $p_i$ for selecting $x_i$. (c) Define a random variable following a multinomial distribution with $(p_1, \dots, p_N)$ as the parameter. (d) Draw samples from this random variable to generate $\widetilde{X}$. We call this subset our region of interest.
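The sampling steps above can be sketched as follows. This is a simplified illustration, not the trained model: the fully-connected layer is stood in for by a single weight vector and bias, and `rng`, `weight`, and `bias` are assumed names:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_subset(points, centroid, weight, bias, m):
    """Sample an "attention" subset of m points from the cloud.

    points: (N, 3) cloud; centroid: (3,) centre point.
    weight: (3,) and bias: scalar stand in for the fully-connected layer.
    """
    vecs = points - centroid
    dirs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12)  # directional parts
    logits = dirs @ weight + bias                  # one-unit FC layer -> score per point
    probs = np.exp(logits) / np.exp(logits).sum()  # confidences as a multinomial parameter
    idx = rng.choice(len(points), size=m, replace=False, p=probs)
    return points[idx]                             # the region of interest
```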
Finding the convex hull: We extract the convex hull using the following scheme, as it is useful for capturing the global structure of a geometric shape. At each point $x \in \widetilde{X}$, we center a sphere with radius $r$ and collect the response from the set $\widetilde{X}$. This gives, at each point $x$, a function $f_x : S^2 \to \mathbb{R}$. Now, given a grid point $u \in S^2$ on the sphere centered at $x$, we compute the response
$$ f_x(u) = \sum_{y \in B_r(x) \cap \widetilde{X},\, y \neq x} \left\langle u, \frac{y - x}{\|y - x\|} \right\rangle, $$
where $B_r(x)$ is the ball with radius $r$ centered at $x$.
If the point-cloud is rotated by a matrix $R \in SO(3)$, then the corresponding responses are also rotated by $R$.
This representation can be viewed as putting an omni-directional camera at each point and collecting the responses in each viewing direction. This analogy makes one wonder: is there a need for $|\widetilde{X}|$ cameras, where $|\widetilde{X}|$ is the number of points inside the region of attention? For a dense point-cloud the answer is clearly no, and hence we incorporate a downsampling strategy as follows: (a) on each sphere centered at $x \in \widetilde{X}$, collect the omni-directional responses from $\widetilde{X}$; (b) choose the top $k$ spheres with the largest responses. Let the chosen index set be $\mathcal{I}$.
The selected point set $\{x_i\}_{i \in \mathcal{I}}$ lies on the convex hull of the point-cloud.
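A sketch of the response collection and top-$k$ downsampling, assuming (as described above) that the response at each grid direction is the sum of inner products with the unit directions towards neighbours inside the ball; the grid and scoring rule here are illustrative:

```python
import numpy as np

def sphere_responses(points, grid, radius):
    """For each point, collect a spherical response f_x(u) from its neighbours.

    points: (N, 3) cloud; grid: (G, 3) unit directions on S^2; radius: ball size.
    Returns an (N, G) array of responses.
    """
    out = np.zeros((len(points), len(grid)))
    for i, x in enumerate(points):
        diff = points - x
        dist = np.linalg.norm(diff, axis=1)
        mask = (dist > 0) & (dist <= radius)      # neighbours in the ball, excluding x
        if mask.any():
            dirs = diff[mask] / dist[mask, None]  # unit directions towards neighbours
            out[i] = (grid @ dirs.T).sum(axis=1)  # inner product with each grid direction
    return out

def top_k_points(responses, k):
    """Keep the k spheres with the largest total response (the downsampling step)."""
    scores = np.abs(responses).sum(axis=1)
    return np.argsort(scores)[::-1][:k]           # index set of the selected points
```

In practice the grid would be a proper equiangular sampling of $S^2$ so the responses can feed the spherical convolution that follows.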
Extracting features with spherical convolution: After constructing a sphere at each selected point, we use spherical convolution to extract rotation-equivariant features. We define spherical convolution as follows:
Definition 1 ($S^2$ convolution). Given $f : S^2 \to \mathbb{R}$ (the signal) and $k : S^2 \to \mathbb{R}$ (the learnable kernel), the spherical convolution is defined as $(f \star k)(g) = \int_{S^2} f(u)\, k(g^{-1} u)\, du$ for $g \in SO(3)$.
As the output of the $S^2$ convolution is a function from $SO(3)$ to $\mathbb{R}$, we need an $SO(3)$ convolution in order to develop a deep CNN architecture. We use the standard group convolution with respect to the Haar measure, which we define as follows.
Definition 2 ($SO(3)$ convolution). Given $f : SO(3) \to \mathbb{R}$ (the signal) and $k : SO(3) \to \mathbb{R}$ (the learnable kernel), we define the convolution operator as
$$ (f \star k)(g) = \int_{SO(3)} f(h)\, k(g^{-1} h)\, d\mu(h). $$
Here, $\mu$ is the volume density on $SO(3)$ with respect to the Haar measure. As is obvious from the definition of group convolution, this operator is equivariant to the action of $SO(3)$.
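The equivariance of the group convolution follows from the left-invariance of the Haar measure; writing $(L_R f)(h) = f(R^{-1} h)$ for the rotation of a signal by $R$, a short derivation is:

```latex
\begin{align}
  \big((L_R f) \star k\big)(g)
    &= \int_{SO(3)} f(R^{-1} h)\, k(g^{-1} h)\, d\mu(h) \\
    &= \int_{SO(3)} f(h')\, k\big((R^{-1} g)^{-1} h'\big)\, d\mu(h')
       \qquad (h' = R^{-1} h) \\
    &= (f \star k)(R^{-1} g)
     = \big(L_R (f \star k)\big)(g),
\end{align}
```

where the substitution in the second line uses the invariance of $\mu$ under left translation by $R$.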
We extract the rotation-equivariant features by using one $S^2$ convolution and multiple $SO(3)$ convolutions. In between convolutions, we use ReLU and batch-norm operations. After these convolution layers, the output is equivariant to rotations, i.e., if the input is rotated as $f \mapsto L_R f$, then the output is $L_R \mathcal{F}(f)$, where $\mathcal{F}$ denotes the convolution layers and $L_R f$ is the rotation of $f$ by $R$.
We use an invariant layer after the convolution layers to compute the integrated response over $SO(3)$. The purpose of this invariant layer is to make the entire network invariant to rotations.
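In a discretized implementation, integrating over $SO(3)$ amounts to a weighted reduction over the rotation-grid axis of the feature map; a minimal sketch, with assumed array shapes:

```python
import numpy as np

def invariant_layer(feature_map, weights=None):
    """Integrate an SO(3) feature map over the group to get a rotation-invariant vector.

    feature_map: (G, C) array -- the signal sampled on a grid of G rotations,
    with C channels; weights: optional (G,) quadrature weights for the Haar measure.
    """
    if weights is None:
        weights = np.full(len(feature_map), 1.0 / len(feature_map))
    return weights @ feature_map  # (C,) integrated response
```

With uniform weights the output is unchanged by any permutation of the rotation grid, which is the discrete analogue of invariance under rotation of the input.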
Aggregating features: We extract rotation-invariant spherical features for each point on the convex hull. To classify the point-cloud, we aggregate these invariant features over the convex hull. Given the extracted features, one per convex-hull point, we use global max-pooling over the convex hull. Thus, we identify each point-cloud with a single feature vector. We then use a single fully-connected layer to classify. An overview of our proposed method is given in Fig. 3.
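The aggregation and classification step can be sketched as below; the weight and bias arrays stand in for the trained fully-connected layer and are illustrative:

```python
import numpy as np

def classify_cloud(point_features, fc_weight, fc_bias):
    """Aggregate per-point invariant features and classify the whole cloud.

    point_features: (K, C) invariant features for the K convex-hull points.
    fc_weight: (C, num_classes); fc_bias: (num_classes,).
    """
    pooled = point_features.max(axis=0)    # global max-pooling over the hull
    logits = pooled @ fc_weight + fc_bias  # single fully-connected layer
    return logits.argmax()                 # predicted class label
```

Max-pooling makes the aggregate independent of the ordering (and, up to sampling, the count) of hull points, so the whole pipeline stays permutation- and rotation-invariant.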
In the next section, we will give the data description and the experimental details.
3 Experimental results
This section consists of the data description followed by the details of experimental validation.
Data description: We use the OASIS dataset to address the classification of demented vs. non-demented subjects with our proposed framework. This dataset contains at least two MR brain scans per subject, aged between to years old. For each patient, scans are separated by at least one year. The dataset contains patients of both sexes. To avoid gender effects, we take the MR scans of male patients alone from three visits, which results in a dataset containing MR scans of subjects with dementia and subjects without dementia. This gives scans for the subjects with dementia and scans for the subjects without dementia. We first compute an atlas (using the method in ) from the MR scans of the patients without dementia.
After rigidly registering each MR scan to the atlas, we segment out the corpus callosum region from each scan. We represent the shape of the corpus callosum as a 3D point-cloud.
Below, we provide the experimental details, covering the choice of hyperparameters, the data augmentation scheme, and an ablation study.
Parameter selection: Each point-cloud in the dataset consists of points. We select points on the convex hull, sampled from the region of interest. We use the Adam optimizer with batch size and learning rate . We use a spherical CNN model with one $S^2$ and one $SO(3)$ convolution layer, where the input bandwidths are and and the output bandwidths are and , respectively. The numbers of output channels used are and , respectively. The total number of parameters in this model is . In Fig. 4, we show representative point-clouds from demented and non-demented subjects with the corresponding attention area and convex-hull points.
Data augmentation: As the number of samples is much smaller than the number of parameters, we use explicit data augmentation as follows: we use a uniformly random downsampling scheme to select a subset of the points in each cloud. This significantly increases the number of training samples, which makes learning feasible. As claimed before, our model is rotation invariant, so explicit rotation augmentation does not make sense. In Fig. 5, we show that our model is indeed invariant to rotations.
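The downsampling augmentation can be sketched as below; the subset size and number of copies are placeholders for the paper's (unstated here) values:

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample_augment(points, n_keep, n_copies):
    """Create several training samples from one cloud by uniform random downsampling.

    points: (N, 3) cloud; n_keep: points kept per sample; n_copies: samples produced.
    """
    return [points[rng.choice(len(points), size=n_keep, replace=False)]
            for _ in range(n_copies)]
```

Because each draw is a uniformly random subset, the augmented samples cover the shape evenly while multiplying the effective training-set size.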
Results and ablation study: We achieve classification accuracy with sensitivity and specificity of and , respectively. If we remove the "attention" module, the classification accuracy drops to . This clearly indicates the usefulness of the "attention" module used in this work.
Point-clouds help in understanding 3D geometric shapes in medical data. In this work, we propose an "augmentation-free" rotation-invariant CNN for point-clouds and apply it to the public OASIS dataset for classifying corpus callosum shapes. The core of our method is the proposed rotation-invariant convolution on point-clouds, obtained by inducing a topology from the sphere, together with an attention mechanism that focuses on specific parts of 3D shapes. Our method achieves strong performance with a comparatively lean model. In the future, we will focus on a unified, end-to-end trainable model for segmenting the ROI from point-clouds extracted from brain images.
- [1] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
- [2] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas, "PointNet++: Deep hierarchical feature learning on point sets in a metric space," in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.
- [3] Yin Zhou and Oncel Tuzel, "VoxelNet: End-to-end learning for point cloud based 3D object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490–4499.
- [4] Rudrasis Chakraborty, Monami Banerjee, and Baba C Vemuri, "Statistics on the space of trajectories for longitudinal data analysis," in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 2017, pp. 999–1002.
- [5] Prasanna Muralidharan and P Thomas Fletcher, "Sasaki metrics for analysis of longitudinal data on manifolds," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 1027–1034.
- [6] Anthony F Fotenos, AZ Snyder, LE Girton, JC Morris, and RL Buckner, "Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD," Neurology, vol. 64, no. 6, pp. 1032–1039, 2005.
- [7] Taco Cohen, Mario Geiger, Jonas Köhler, and Max Welling, "Convolutional networks for spherical signals," arXiv preprint arXiv:1709.04893, 2017.
- [8] Rudrasis Chakraborty, Monami Banerjee, and Baba C Vemuri, "H-CNNs: Convolutional neural networks for Riemannian homogeneous spaces," arXiv preprint arXiv:1805.05487, vol. 2, 2018.
- [9] Brian B Avants, Nick Tustison, and Gang Song, "Advanced normalization tools (ANTs)," Insight Journal, vol. 2, pp. 1–35, 2009.