IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning
Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of point-based and mesh-based classification and segmentation models available. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clipping operation in medicine, as well as in other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: https://github.com/intra3d2019/IntrA.
Intracranial aneurysm is a life-threatening disease, and its surgical treatments are complicated. Timely diagnosis and preoperative examination are necessary to formulate treatment strategies and surgical approaches. Currently, the primary treatment method is clipping the neck of an aneurysm to prevent it from rupturing, as shown in Figure 2. Decisions on the position and posture of the clip are still highly dependent on "clinical judgment" based on the experience of physicians. In a surgery support system for intracranial aneurysms that simulates real-life neurosurgery and teaches neurosurgical residents [1], the accuracy of aneurysm segmentation is the most crucial part because it is used to extract the neck of an aneurysm, that is, the boundary line of the aneurysm.
Based on 3D surface models, the diagnosis of an aneurysm can be much more accurate than with 2D images. The edge of the aneurysm is much clearer for doctors, and the complicated and time-consuming annotation of a mass of 2D images is avoided. There are many reports of automatic diagnosis and segmentation of aneurysms based on medical images, including intracranial aneurysm (IA) and abdominal aortic aneurysm (AAA) [23, 29, 37]; however, few reports have been published based on 3D models. This is not only because data collection in medicine is inefficient, subjective, and challenging to share, but also because such work requires joint knowledge of computer science and medical science.
Objects with arbitrary shapes are ubiquitous, and a non-Euclidean manifold reveals more critical information than Euclidean geometry, as with the complex topologies of brain tissue in neuroscience [4]. However, the study of 2D magnetic resonance angiography (MRA) images confines the selection to 3D neural networks based on pixels and voxels, which also omits the information from manifolds. Therefore, we propose an open-access 3D intracranial aneurysm dataset to solve the above issues and to promote the application of deep learning models in medical science. The point- and mesh-based models exhibit excellent generalization abilities for 3D deep learning tasks in our experiments.
Our main contributions are:
We propose an open dataset that consists of 3D aneurysm segments with segmentation annotations, automatically generated blood vessel segments, and complete models of scanned blood vessels of the brain. All annotated aneurysm segments are processed as manifold meshes.
We develop tools to generate 3D blood vessel segments from complete models and to annotate a 3D aneurysm model interactively. The data processing pipeline is also introduced.
We evaluate the performance of various state-of-the-art 3D deep learning methods on our dataset to provide benchmarks for the classification (diagnosis) and segmentation of intracranial aneurysms. Furthermore, we analyze the different features of each method from the results obtained.
We extensively survey free open-access online datasets, both medical and non-medical, as well as 3D deep learning algorithms.
Medical datasets. Although data collection is challenging, data sharing is common in medical research, because the large-scale samples required to surmount the complexity and heterogeneity of many diseases are unattainable for a single research institute. Several medical datasets have been published online for collaboration on finding treatment solutions, for example, the integrated dataset MedPix [31], the bone X-ray dataset MURA (musculoskeletal radiographs) [35], the brain neuroimaging dataset Open Access Series of Imaging Studies (OASIS) [13], and the Medical Segmentation Decathlon [38]. Moreover, Harvard GSP is an MRI dataset intended to address complex questions regarding the relationship between the brain and behavior [5], and the SCR database was constructed for the automatic segmentation of anatomical structures in chest radiographs [41].
Data collection is also critical for single categories of disease. The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions [28]. The aim of the Indian Diabetic Retinopathy Image Dataset (IDRiD) is to evaluate algorithms for the automated detection and grading of diabetic retinopathy and diabetic macular edema using retinal fundus images [19]. EyePACS is a retinal image database comprising diverse populations with various degrees of diabetic retinopathy [11]. Additionally, the Autism Brain Imaging Data Exchange (ABIDE) targets autism spectrum disorder (ASD) [2]. To date, almost all of these datasets consist of 2D medical images.
Non-medical 3D datasets. In recent years, 3D model datasets have been introduced in computer vision and computer graphics research alongside the development of deep learning algorithms, for instance, CAD model datasets such as ModelNet [47], ShapeNet [6], the COSEG dataset [44], and the ABC dataset [22], and 3D printing model datasets such as Thingi10K [50] and the human model dataset [30]. Various 3D deep learning tasks are widely carried out on these datasets.

A 3D model has four kinds of representations: projected views, voxels, point clouds, and meshes. Methods based on projected views or voxels are implemented conveniently using structures similar to 2D convolutional neural networks (CNNs). A point cloud or mesh gives a more accurate representation of a 3D shape; however, new convolution structures are required.
Projected view. Su et al. proposed a multi-view CNN to recognize 3D shapes [40]. Kalogerakis et al. combined image-based fully convolutional networks (FCNs) and surface-based conditional random fields (CRFs) to yield coherent segmentation of 3D shapes [20].
Voxel. Çiçek et al. introduced 3D U-Net for volumetric segmentation that learns from sparsely annotated volumetric images [7]. Wang et al. presented O-CNN, an octree-based convolutional neural network, for 3D shape analysis [42]. Graham et al. designed new sparse convolutional operations to process spatially-sparse 3D data, called submanifold sparse convolutional networks (SSCNs) [14]. Wang and Lu proposed VoxSegNet to extract discriminative features encoding detailed information under limited resolution [45]. Le and Duan proposed PointGrid, a 3D convolutional network that integrates point and grid representations [25].
Points. Qi et al. proposed PointNet, making it possible to input 3D points directly into a neural network [33]; they then introduced a hierarchical network, PointNet++, to learn local features [34]. Based on these pioneering works, many new convolution operations were proposed. Wu et al. treated convolution kernels as nonlinear functions of the local coordinates of 3D points, composed of weight and density functions, named PointConv [46]. Li et al. presented PointCNN, which can leverage spatially local correlation in data represented densely in grids for feature learning [27]. Xu et al. designed the filter as a product of a simple step function that captures local geodesic information and a Taylor polynomial, named SpiderCNN [48]. Moreover, SO-Net models the spatial distribution of a point cloud by building a Self-Organizing Map (SOM) [26]. Su et al. presented SPLATNet for processing point clouds, which directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice [39]. Zhao et al. proposed 3D point-capsule networks [49]. Wang et al. proposed the dynamic graph CNN (DGCNN) [43].

Mesh. Maron et al. applied a convolution operator to sphere-type shapes using a global seamless parameterization to a planar flat-torus [30]. Hanocka et al. utilized the unique properties of the triangular mesh for direct analysis of 3D shapes, named MeshCNN [15]. Feng et al. regard the polygon faces as the unit and split their features into spatial and structural features, called MeshNet [12].
Our dataset includes complete models with aneurysms, generated vessel segments, and annotated aneurysm segments, as shown in Figure 3. The 3D models of entire brain vessels are collected by reconstructing scanned 2D MRA images of patients. We do not publish the raw 2D MRA images for reasons of medical ethics. Blood vessel segments are generated automatically from the complete models, including healthy vessel segments and aneurysm segments, for diagnosis; an aneurysm can be divided into segments that can verify the automatic diagnosis. Aneurysm segments are divided and annotated manually by medical experts; the scale of each aneurysm segment is based on the needs of preoperative examination. The details are described in the next section. Furthermore, a geodesic distance matrix is computed and included for each annotated 3D segment, because geodesic distance expresses the shape of vessels more accurately than Euclidean distance. For a model with n points, the matrix is saved as an n x n matrix, shortening the training computation time.
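A geodesic distance matrix of this kind can be approximated by shortest paths along the mesh edge graph. This is only an illustrative sketch (the released matrices use a fast heat-method approximation, not this edge-graph shortcut, and the function name is our own):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_matrix(vertices, faces):
    """Approximate all-pairs geodesic distances by shortest paths
    along mesh edges (a rough stand-in for heat-method geodesics)."""
    # collect the three undirected edges of every triangle, deduplicated
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.unique(np.sort(e, axis=1), axis=0)
    w = np.linalg.norm(vertices[e[:, 0]] - vertices[e[:, 1]], axis=1)
    n = len(vertices)
    g = csr_matrix((w, (e[:, 0], e[:, 1])), shape=(n, n))
    return dijkstra(g, directed=False)   # dense (n, n) distance matrix

# toy example: a single right triangle
v = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
f = np.array([[0, 1, 2]])
d = geodesic_matrix(v, f)   # d[0, 1] = 1.0, d[1, 2] = sqrt(2)
```

For an annotated segment with n points, the resulting n x n matrix is exactly the shape of the matrices shipped with the dataset.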
Our data have several characteristics common to medical data: 1) Small but diverse. The amount of data is not large compared to other released CAD model datasets; however, it includes diverse shapes and scales of intracranial aneurysms as well as different numbers of vessel branches. 2) Unbalanced. The numbers of points belonging to the aneurysm and to the healthy vessel parts are imbalanced, depending on the shape of the aneurysm, and the numbers of 3D aneurysm segments and healthy vessel segments are not equal, because aneurysms are usually much smaller than the entire brain vasculature.
Challenge. Our dataset was collected by experts rather than lay annotators. Intact 3D models have to be restored from the reconstructed data manually, as shown in Figure 4. Besides, annotating the necks of aneurysms requires years of clinical experience in complex situations. Also, the raw 3D models are not manifold; we cleaned the surface meshes to create an ideal dataset for algorithm research.
Statistics and analysis. The statistics of our dataset are shown in Figure 5. We count the number of points in each segment to characterize, to some extent, the differences between shapes, since the points are mostly uniformly distributed on the surfaces. The point counts of the 1909 generated segments vary over a wide range. Our dataset includes all types of intracranial aneurysms in medicine: bifurcation type, trunk type, blister type, and combined type. The shapes of the aneurysms are diverse both in geometry and topology; six selected aneurysm segments are shown on the right of Figure 3. Besides, we calculated the size of an aneurysm as the ratio between the diagonal distances of the global segment and of the aneurysm part, instead of the real size of the parent vessel or the aneurysm.

We developed annotation tools and segment generation tools to assist in constructing our dataset.
Annotation. Users draw an intended boundary by clicking several points; the connection between two points is determined by the shortest path. After users create a closed boundary line, they annotate the aneurysm part by selecting a point inside it. The enclosed area is calculated automatically by propagation from the selected point to the boundary line along the surface meshes. With support for multiple boundary lines, the annotation tool can also be used to separate both the aneurysm part and the aneurysm segment manually, as shown in Figure 6.
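The propagation step can be sketched as a breadth-first flood fill over the vertex adjacency graph that stops at boundary vertices. This is a minimal illustration; the function and its arguments are hypothetical, not the released tool:

```python
from collections import deque

def select_enclosed_region(faces, boundary, seed):
    """Flood-fill vertex selection from `seed`, terminating at the
    closed `boundary` vertex loop (illustrative sketch)."""
    # build vertex adjacency from triangle edges
    adj = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    boundary = set(boundary)
    seen, queue = {seed}, deque([seed])
    while queue:
        u = queue.popleft()
        if u in boundary:          # boundary vertices stop propagation
            continue
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen | boundary         # enclosed area plus its boundary line

# strip of three triangles; vertices {2, 3} form the separating boundary
region = select_enclosed_region([(0, 1, 2), (2, 1, 3), (2, 3, 4)],
                                boundary=[2, 3], seed=0)
# -> {0, 1, 2, 3}: the seed's side of the strip, plus the boundary
```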
Vessel segment generation. Vessel segments are generated by randomly picking points on the complete models and selecting the neighboring area whose geodesic distance along the vessel is smaller than a threshold. We also manually select points to increase the number of segments containing an aneurysm. To construct an ideal dataset, the few samples that are ambiguous or include only trivial components were removed using our visualization tool.
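The geodesic-threshold neighborhood selection can be sketched as a single-source Dijkstra search over the mesh edge graph, pruned once paths exceed the radius. The function and its parameters are illustrative, not the released tool:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def extract_segment(vertices, faces, seed, radius):
    """Return vertex indices whose along-surface (edge-graph) distance
    from `seed` is below `radius` (illustrative sketch)."""
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.unique(np.sort(e, axis=1), axis=0)
    w = np.linalg.norm(vertices[e[:, 0]] - vertices[e[:, 1]], axis=1)
    n = len(vertices)
    g = csr_matrix((w, (e[:, 0], e[:, 1])), shape=(n, n))
    # `limit` prunes the search: distances beyond it come back as inf
    d = dijkstra(g, directed=False, indices=seed, limit=radius)
    return np.flatnonzero(np.isfinite(d))

# four vertices spaced 1 apart; radius 1.5 keeps only vertices 0 and 1
verts = np.array([[0., 0, 0], [1., 0, 0], [2., 0, 0], [3., 0, 0]])
tris = np.array([[0, 1, 2], [1, 2, 3]])
seg = extract_segment(verts, tris, seed=0, radius=1.5)
```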
3D reconstruction and restoration. Our data were acquired by Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) of human brains. Using the single-threshold method [17], each complete 3D model is reconstructed from the sliced 2D images. The aneurysm segments are separated and restored interactively using the multi-threshold method [21] by two neurosurgeons, then processed by Gaussian smoothing. This image processing was conducted in the life-sciences software Amira 2019 (Thermo Fisher Scientific, MA, USA) and took about 50 workdays in total.
Generation and annotation. Using our generation and annotation tools, blood vessel segments were obtained and classified, and the segmentation annotation of the aneurysm segments was completed; a neurosurgeon finished the annotation in 8 hours.
Data cleaning and remeshing. The reconstructed 3D models are noisy and non-manifold. Huang et al. [18] described an algorithm that generates a manifold surface for 3D models; however, this method does not remove isolated components and significantly changes the shape of the model. Therefore, we use filter tools in MeshLab to remove duplicate faces and vertices and to separate pieces in the data manually, which ensures that the models have no non-manifold edges. MeshLab also generates the normal vector at each point. The geodesic matrix is computed by solving the heat equation on the surface, using a fast approximate geodesic distance method [8].

Diagnosis (classification). The diagnosis of an aneurysm can be considered as a classification problem between aneurysm and healthy vessel segments. From the 3D brain model of a patient, vessel segments are generated by our tools; the diagnosis is then completed by classifying which segments contain aneurysms.
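As a minimal illustration of the manifoldness defects this cleaning step removes, one can count mesh edges shared by more than two triangles. This is a plain NumPy sketch, not the MeshLab filters actually used:

```python
import numpy as np

def nonmanifold_edges(faces):
    """Return edges incident to more than two triangles, the classic
    non-manifold defect (illustrative check only)."""
    # every triangle contributes its three edges, sorted per edge
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    e = np.sort(e, axis=1)
    uniq, counts = np.unique(e, axis=0, return_counts=True)
    return uniq[counts > 2]

# two triangles sharing edge (1, 2), plus a third "fin" on the same edge
f = np.array([[0, 1, 2], [1, 3, 2], [1, 2, 4]])
bad = nonmanifold_edges(f)   # -> [[1, 2]]
```

A clean manifold mesh returns an empty array here, which is the property the remeshed dataset guarantees.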
Part segmentation. Our annotated 3D models present a precise boundary for each aneurysm to support segmentation research. The data are easy to convert to any 3D representation used by various deep learning algorithms.
We selected state-of-the-art methods as the benchmarks for classification and segmentation on our dataset. We implemented dataset interfaces for the original implementations by the authors and kept the same hyperparameters and loss functions as in the original papers. A detailed explanation of the implementation of each method is given in the supplementary material. We tested these methods by 5-fold cross-validation: the shuffled data were divided into 5 subsamples, identical for each method, with 4 subsamples used as training data and 1 as test data. The experiments were carried out on PCs with GeForce RTX 2080 Ti (x2) and GeForce GTX 1080 Ti (x1) GPUs. The net training time of all methods was over 92 hours.

Six methods were selected for the classification benchmarks: PointNet [33], PointNet++ (PN++) [34], PointCNN [27], SpiderCNN [48], the self-organizing network (SO-Net) [26], and the dynamic graph CNN (DGCNN) [43]. We combined the generated blood vessel segments and the manually segmented aneurysms, 2025 samples in total, as the dataset for testing the classification accuracy and F1-score of each method. The experimental results are shown in Table 1.
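The evaluation protocol above (shuffle once, split into 5 identical subsamples, rotate the test fold) can be sketched as follows, assuming the 2025 combined samples:

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Shuffle indices once, split into 5 folds, and yield each
    (train, test) pair in turn -- a sketch of 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test

splits = list(five_fold_splits(2025))   # 5 pairs of (1620 train, 405 test)
```

Because the permutation is fixed by the seed, every method sees exactly the same subsamples, matching the "same for each method" requirement.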
Table 1 reports, for each network and input size, the classification accuracy on healthy vessel segments (V., %) and aneurysm segments (A., %), together with the F1-score. The evaluated configurations are: PN++ (input 512, 1024, 2048), SpiderCNN (512, 1024, 2048), SO-Net (512, 1024, 2048), PointCNN (512, 1024, 2048), DGCNN (512/10, 1024/20, 2048/40), and PointNet (512, 1024, 2048).
We selected 11 networks to provide segmentation benchmarks: PointGrid [25]; two kinds of submanifold sparse convolutional networks (SSCNs), fully-connected (SSCN-F) and U-Net-like (SSCN-U) [14]; PointNet [33]; two kinds of PointNet++ [34], one with normals as input (PN++) and one with normals and geodesic distances (PN++g); PointConv [46]; PointCNN [27]; SpiderCNN [48]; MeshCNN [15]; and SO-Net [26]. 116 annotated aneurysm segments were used for evaluating these methods. The test on each subsample was repeated 3 times, and the final results were the mean values of the best results. We assessed each method using two indexes: the Jaccard Index (JI), also known as Intersection over Union (IoU), and the Sørensen-Dice Coefficient (DSC). The results of segmentation are shown in Figure 7 and Table 2.

For classification, PN++ with 1024 sampled points has the highest accuracy on aneurysms, and PointCNN with 2048 sampled points has the highest accuracy on arteries and the best F1-score. The accuracy and F1-score of almost all methods showed an increasing tendency as more sampled points were provided; however, SpiderCNN attained its highest aneurysm detection rate and F1-score at 1024 sampled points. The majority of misclassified 3D models contained small or incomplete aneurysms that are hard to distinguish from healthy blood vessels.
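The two segmentation metrics, JI/IoU and DSC, can be computed from binary per-point labels (1 = aneurysm, 0 = healthy vessel) as:

```python
import numpy as np

def iou_dsc(pred, truth):
    """Jaccard Index (IoU) and Sorensen-Dice Coefficient (DSC)
    for a binary per-point labeling."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / union
    dsc = 2 * inter / (pred.sum() + truth.sum())
    return iou, dsc

iou, dsc = iou_dsc([1, 1, 0, 0], [1, 0, 1, 0])
# intersection = 1, union = 3  ->  IoU = 1/3, DSC = 1/2
```

The two metrics are monotonically related (DSC = 2*IoU / (1 + IoU)), which is why the rankings in Table 2 largely agree between them.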
Methods based on points. The segmentation methods based on point clouds obtained good results, at the same level as their results on ShapeNet [6]. SO-Net showed excellent performance on the IoU and DSC of aneurysms, while PointConv had the best results on the parent blood vessels. PN++ had the third-best performance and the fastest training speed (5 s per epoch, converging at approximately epoch 115 on a GTX 1080 Ti). Meanwhile, PointCNN had the slowest training speed (24 s per epoch, converging at approximately epoch 500 on a GTX 1080 Ti) and moderate segmentation accuracy. SpiderCNN did not match its performance on ShapeNet, and its 95% confidence interval (CI95) was unusually wide. Besides the methods mentioned in Section 4.2, we also tried 3D point-capsule networks [49], but they classified every point as healthy blood vessel, which shows their limited generalization across datasets.

Resolution of voxels. Methods based on voxels achieved relatively low IoU and DSC on each fold. The performance of SSCN grew as the resolution was increased from 24 to 40 (resolution 24 was the authors' default in the released code), but its average IoU fluctuated by about 8%, which is pronounced compared to other methods (about 2%). The PointGrid paper recommends particular grid and sample-count parameters; however, we noticed that a different combination achieved the highest scores.
Commonly poorly segmented 3D models. Most models were segmented excellently, as in the top two rows of Figure 8. However, the accuracy dropped when the aneurysm occupied a small size ratio of the segment, as in the third and fourth rows, while the segmentation performance for aneurysms with a large size ratio was satisfactory. The fifth row shows a special segment with 2 aneurysms; although most methods failed to segment it, PointConv and PN++ with geodesic information maintained good performance.
Geodesic information. Compared to other CAD model datasets, the complex shapes of blood vessels pose a different challenge in part segmentation. Methods based on points usually use the Euclidean distance to estimate the relevance between points; however, this is not ideal for our dataset. For example, PN++ misclassified aneurysm points close to the blood vessels even with normal information, as shown in the last row of Figure 8, whereas, by using geodesic distance, it learned a more exact spatial structure. PointConv also segmented this case well; its excellent performance can be attributed to the network learning the parameters of its spatial filters. In addition, MeshCNN segmented every aneurysm decently even though its overall performance is not the best, which owes to its convolution on meshes providing information on manifolds.
Table 2 reports, for each network, the IoU and DSC (%) on the vessel (V.) and aneurysm (A.) parts, each with a 95% confidence interval (CI95). The evaluated configurations, grouped by input type, are:

Point: SO-Net (input 512, 1024, 2048), PointConv (512, 1024, 2048), PN++g (512, 1024, 2048), PN++ (512, 1024, 2048), PointCNN (512, 1024, 2048), SpiderCNN (512, 1024, 2048), PointNet (512, 1024, 2048).

Mesh: MeshCNN (input 750, 1500, 2250).

Voxel: SSCN-F (resolution 24, 32, 40), SSCN-U (24, 32, 40), PointGrid (16/2, 16/4, 32/2).
In this paper, we introduced a 3D dataset of intracranial aneurysms annotated by experts for geometric deep learning networks. The developed tools and data processing pipeline are also released. Furthermore, we evaluated and analyzed state-of-the-art methods for 3D object classification and part segmentation on our dataset. The existing methods tend to be less effective on complex objects, even though they perform well on the segmentation of common ones. It is possible to further improve the performance and generalization of networks when geodesic or connectivity information on 3D surfaces is accessible. The introduction of our dataset can be instructive for the development of new geometric deep learning architectures for medical datasets.
In future work, we will keep adding processed real data to our dataset. Besides, we will verify the feasibility of synthetic data for data augmentation, which could significantly improve the efficiency of data collection. We hope more deep learning networks will be applied in medical practice.