Reliable information about fiber characteristics in short-fiber reinforced polymers (SFRP) is much needed for process optimization during the product development phase. The influence of fiber characteristics on the mechanical properties of SFRP composites is of particular interest and significance for manufacturers [Fu and Lauke(1996)]. The recent development of X-ray computed tomography (CT) for nondestructive quality control has made it possible to scan such materials and retrieve 3D spatial information about SFRPs. Fiber extraction is the first step towards any further analysis of an SFRP material. However, the spatial resolution of a scan is a limiting factor that makes fiber extraction a difficult problem.
Acquiring scans in high resolution is time consuming and costly. Therefore, in this work we consider only scans acquired by a CT system with low resolution. The methods currently in use are usually based on hand-designed features. Since fibers can be described as long, cylindrically shaped objects, the most widely used family of fully automatic methods is based on Hessian eigenvalues. Using a set of Hessian-based filters at a number of scales, a confidence map of fiber occurrence can be produced [Frangi et al.(1998)Frangi, Niessen, Vincken, and Viergever]. To extract individual fibers, a template matching [Pinter et al.(2016)Pinter, Bertram, and Weidenmann] [Fast et al.(2015)Fast, Scott, Bale, and Cox] or a watershed splitting and skeletonisation technique [Sencu et al.(2016)Sencu, Yang, Wang, Withers, Rau, Parson, and Soutis] [Zhang et al.(2011)Zhang, Li, Yang, Wang, and Liu] is then applied. However, the performance of these methods degrades severely if the resolution is too low, to the point where they fail to produce meaningful results [Konopczyński et al.(2017)Konopczyński, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser]. A deep learning method has already been shown to produce more accurate results than Hessian-based techniques for semantic segmentation of fibers at low CT resolution [Konopczyński et al.(2018)Konopczyński, Rathore, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser].
Deep learning architectures have been successfully applied to semantic segmentation problems for both natural 2D images and 3D CT volumes [Long et al.(2015)Long, Shelhamer, and Darrell] [Christ et al.(2016)Christ, Elshaer, Ettlinger, Tatavarty, Bickel, Bilic, and Sommer]. Similar solutions have been found for the problem of 2D instance segmentation. The Faster R-CNN [Ren et al.(2015)Ren, He, Girshick, and Sun] and Mask R-CNN [He et al.(2017)He, Gkioxari, Dollár, and Girshick] architectures are examples of region-proposal-based techniques that are the state of the art on common scene-understanding datasets like COCO [Lin et al.(2014)Lin, Maire, Belongie, Hays, Perona, Ramanan, and Zitnick] or ImageNet [Deng et al.(2009)Deng, Dong, Socher, Li, Li, and Fei-Fei]. However, it is not clear how this approach can be extended to 3D volumetric data with densely packed objects like fibers in SFRP. This is why, for our 3D problem, we have opted for alternative deep learning methods for instance segmentation. Numerous works explore different ideas for 2D datasets. An interesting idea that could be extended to 3D volumes has been proposed by [Bai and Urtasun(2017)]: reformulate the problem of instance segmentation as learning a mapping to a watershed energy. For the final output, a watershed transform is then applied to obtain the instances. Unfortunately, this method is not applicable to our problem, because fibers are usually too thin to find a border. Another promising idea, proposed by [Romera-Paredes and Torr(2016)], is to combine convolutional neural networks (CNN) with recurrent neural networks (RNN). The recurrent structures are used to keep track of objects that have already been found and to exclude these regions from further analysis by the algorithm.
In this work we propose a novel deep learning architecture for automatic extraction (instance segmentation) of fibers from low resolution 3D X-ray computed tomography scans of short glass fiber reinforced polymers. The sketch of the method is presented in Fig. 1.
We explore and discuss the performance of the presented method achieved by training on a low resolution SFRP CT scan and compare it to a standard watershed splitting and skeletonisation technique. We test the importance of the semantic segmentation branch by replacing it with a ground truth semantic segmentation. To the best of our knowledge, this is the first attempt at using deep embedding learning for the task of instance segmentation on 3D volumetric data. The proposed method is also the first to successfully retrieve single-fiber segmentation from a low resolution SFRP CT scan, while the standard methods produce unacceptable results. We base our method on an embedding learning approach [Weinberger and Saul(2009)] [Brabandere et al.(2017)Brabandere, Neven, and Gool]. The idea is to use a special embedding layer which is placed at the end of a given deep network. The network is then trained with a special loss function on the final embedding layer which encourages a particular structure in the embedding space: pixels belonging to the same class should be close, whereas pixels belonging to different classes should be far apart (in the Euclidean metric of the embedding space). The method as a learnable loss function was first mentioned by [Weinberger and Saul(2009)], and was then used with some modifications in the deep learning architectures of [Brabandere et al.(2017)Brabandere, Neven, and Gool] and [Fathi et al.(2017)Fathi, Wojna, Rathod, Wang, Song, Guadarrama, and Murphy]. These methods achieved competitive performance on 2D datasets compared with the R-CNN based state of the art.
In the problem of fiber segmentation, the network learns a mapping of each voxel and its surroundings from the input to an embedding space in which voxels belonging to one fiber are separated from voxels belonging to another. Unfortunately, there is one drawback to this method. Because of memory limitations, such a network can process only one small sub-volume of a volume at a time. Each time the network processes a sub-volume, it assigns an arbitrary index to each fiber region. Because of that, we cannot do a simple merge as is usually done for a semantic segmentation problem, where the output is the probability of belonging to a certain class. For a semantic segmentation mask, one can take a simple average over overlapping regions in order to merge sub-volumes into a full volume. Therefore, in order to produce an instance segmentation for a full CT scan, we propose a post-processing algorithm which, during the prediction phase, merges the overlapping predictions of small blocks into a consistent prediction for the entire CT scan.
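The contrast drawn above between merging semantic and instance outputs can be illustrated with a small sketch. It is a one-dimensional toy under assumed names (`merge_probabilities`, `blocks`, `offsets` are hypothetical and not from the paper): overlapping probability maps can be averaged voxel by voxel, whereas instance indices cannot.

```python
def merge_probabilities(blocks, offsets, length):
    """Average per-voxel probabilities from overlapping 1D blocks."""
    total = [0.0] * length
    count = [0] * length
    for block, start in zip(blocks, offsets):
        for k, p in enumerate(block):
            total[start + k] += p
            count[start + k] += 1
    # Voxels not covered by any block keep probability 0.
    return [t / c if c else 0.0 for t, c in zip(total, count)]


# Two blocks of length 4 that overlap on positions 2 and 3.
blocks = [[0.9, 0.8, 0.6, 0.2], [0.4, 0.0, 0.1, 0.7]]
merged = merge_probabilities(blocks, offsets=[0, 2], length=6)

# The same trick fails for instance labels: if one block calls a fiber 1
# and its neighbor calls the same fiber 7, the average (4) is meaningless.
```

Averaging works here only because a probability has the same meaning in every block; an instance index is arbitrary per block, which is why a dedicated merging step is needed.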
Similar to the work of [Brabandere et al.(2017)Brabandere, Neven, and Gool], we have extended the Fully Convolutional Network (FCN) architecture [Long et al.(2015)Long, Shelhamer, and Darrell], designed for semantic segmentation tasks, to produce embeddings by using an extra output. The extra output could be attached at the very end of the backbone of the semantic segmentation network as in [Neven et al.(2017)Neven, Brabandere, Georgoulis, Proesmans, and Gool], but in our setup we decided to use two sub-networks. One is responsible for computing the semantic segmentation mask, and the other for computing the embedding of voxels. The network can be trained separately for embedding and semantic segmentation using the corresponding outputs, or trained for both tasks at the same time. We will refer to the two sub-networks as the semantic segmentation branch and the embedding learning branch. The semantic segmentation branch outputs a confidence map indicating whether a given voxel belongs to a fiber. The embedding learning branch outputs voxels coded in the embedding space. During the training phase, the architecture is trained only on the outputs of the semantic segmentation branch and the embedding learning branch, using the specified loss functions.
During the prediction phase, a clustering step generates clusters corresponding to individual fibers. The clustering method is applied to the embedded voxels which have a high confidence of being a fiber based on the output from the semantic segmentation branch. The outputs from the two branches and the region on which the clustering is computed are presented in Fig. 2. The clusters are then mapped back to the spatial domain, creating a label volume in which each voxel is assigned an integer label corresponding to the fiber instance it is a part of. To make the method usable on volumes of any size, we propose a greedy merging algorithm. The network produces outputs for overlapping sub-volumes of the input volume, which are then merged into a full volume. The detailed architecture of the network is presented in Fig. 3. In the following sections we describe the above steps in more detail.
2.1 Semantic Segmentation
The semantic segmentation branch is a standard FCN for semantic segmentation. We have used an architecture that has been designed for the task of semantic fiber segmentation [Konopczyński et al.(2018)Konopczyński, Rathore, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser]. The output of the branch is penalized by the standard voxel-wise binary cross entropy loss $L_{bce}$, as is common for semantic segmentation tasks. It is defined as:
$$L_{bce} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $y_i$ are the true binary labels, $\hat{y}_i$ are the predicted labels, and $N$ is the number of voxels. During the prediction phase, the output is thresholded at value 0.5 in order to produce binary masks. An example slice of an output of the branch is shown in Fig. 2 (b).
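As a minimal sketch of this step, the snippet below computes a voxel-wise binary cross entropy and the 0.5 threshold on flat lists of values. The function names, the clipping constant `eps`, and the choice to average over voxels rather than sum are assumptions made for illustration, not details taken from the paper.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean voxel-wise binary cross entropy over flat lists."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def threshold_mask(y_pred, threshold=0.5):
    """Binarize the confidence map as done in the prediction phase."""
    return [1 if p > threshold else 0 for p in y_pred]


y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.6, 0.4]
loss = binary_cross_entropy(y_true, y_pred)
mask = threshold_mask(y_pred)
```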
2.2 Embedding Learning Loss
The output of the embedding branch is a representation of the sub-volume in an embedding space. The architecture of the branch is identical to the semantic segmentation branch. The only differences are the number of output channels in the final convolutional layer and the loss function. In the semantic segmentation task, the output is a volume with two channels, one for the foreground and one for the background. In the embedding learning task, the output has as many channels as the dimensionality of the embedding space (a hyperparameter of the algorithm). An example visualizing feature maps of the embeddings is shown in Fig. 4.
The loss function penalizes voxels of different instances that are too close to each other in the embedding space and encourages voxels of the same instance to be close. As a result, the network maps the voxels into the embedding space, such that voxels that belong to the same fiber should be placed next to each other and form easily separable clusters.
We find that the loss function introduced by [Brabandere et al.(2017)Brabandere, Neven, and Gool], inspired by the work of [Weinberger and Saul(2009)] and extended by us to 3D, works best for our problem. Even though we have extended the problem to 3D and have used data that contains a high number of objects compared to common scene-understanding problems, the method does not seem to be affected by that. The loss consists of three terms: $L_{var}$, which keeps voxels belonging to the same object close to each other, $L_{dist}$, which forces a minimal distance between clusters of different objects, and $L_{reg}$, which regularizes the cluster centers to be close to the origin. The terms are defined as:
$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\lVert \mu_c - x_i \rVert - \delta_v\right]_+^2$$
$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1 \\ c_B \neq c_A}}^{C}\left[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\right]_+^2$$
$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert$$
where $C$ is the number of objects in the ground truth patch (clusters), $N_c$ is the number of voxels that correspond to the object $c$, $x_i$ is the embedding in the final embedding layer, $\mu_c$ is the mean of the embedding of object $c$, $\lVert \cdot \rVert$ is the norm, and $[x]_+ = \max(0, x)$. The parameters $\delta_v$ and $\delta_d$ are used to control the desired positions of the clusters. The final loss for the embedding learning is a weighted sum of the previous components:
$$L = \alpha L_{var} + \beta L_{dist} + \gamma L_{reg}$$
where $\alpha$, $\beta$ and $\gamma$ control the strength of the corresponding term. An example slice of an output of the branch is visualized in Fig. 2 (c). Note that the loss is computed only on the voxels that belong to the foreground fibers. It is the task of the semantic segmentation branch to find the correct position of the fibers.
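The three-term loss described in this section can be sketched in plain Python as follows. This is an illustrative reading of the discriminative loss of De Brabandere et al., not the authors' implementation: the function name, the Euclidean distance, and the margin defaults `delta_v=0.5` and `delta_d=1.5` are assumptions, while the weights 1, 1 and 0.001 follow the values quoted in the training details.

```python
import math
from collections import defaultdict


def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """Pull/push/regularize loss over foreground voxels only.

    embeddings: list of embedding vectors, one per foreground voxel
    labels:     list of instance ids, same length as embeddings
    """
    groups = defaultdict(list)
    for x, c in zip(embeddings, labels):
        groups[c].append(x)
    dim = len(embeddings[0])
    # Cluster center = mean embedding of each instance.
    means = {c: [sum(x[k] for x in xs) / len(xs) for k in range(dim)]
             for c, xs in groups.items()}
    n_clusters = len(means)

    # Variance term: pull voxels to within delta_v of their own center.
    l_var = 0.0
    for c, xs in groups.items():
        l_var += sum(max(0.0, math.dist(means[c], x) - delta_v) ** 2
                     for x in xs) / len(xs)
    l_var /= n_clusters

    # Distance term: push centers of different instances 2 * delta_d apart.
    l_dist = 0.0
    for a in means:
        for b in means:
            if a != b:
                gap = 2 * delta_d - math.dist(means[a], means[b])
                l_dist += max(0.0, gap) ** 2
    if n_clusters > 1:
        l_dist /= n_clusters * (n_clusters - 1)

    # Regularization term: keep centers close to the origin.
    l_reg = sum(math.dist(m, [0.0] * dim) for m in means.values()) / n_clusters

    return alpha * l_var + beta * l_dist + gamma * l_reg


labels = [1, 1, 2, 2]
separated = discriminative_loss([[0, 0], [0, 0], [4, 0], [4, 0]], labels)
collapsed = discriminative_loss([[0, 0], [0, 0], [0, 0], [0, 0]], labels)
```

With these margins, two tight clusters four units apart incur only the small regularization penalty, while two instances collapsed onto the same point are penalized heavily by the distance term.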
Fig. 5: Visualization of the clustering steps of the method. (a) Masked embeddings form clusters in a multi-dimensional embedding space (visualized by t-SNE). (b) DBSCAN clusters the embedding representation and assigns a different index (color) to each fiber (cluster), with black crosses for outliers. (c) Clusters are then mapped back to the spatial domain; shown is a corresponding example slice of the mapping, with red pixels for outliers. (d) The watershed algorithm is then applied as a post-processing step to fill the outliers.
As discussed in the previous section, the semantic segmentation output creates a confidence map indicating whether a given voxel belongs to a fiber. A clustering is then applied to the embedded voxels with a high confidence of being fibers. An example input slice of one of the feature maps of the embedding is shown in Fig. 2 (d). In this work, we found DBSCAN [Ester et al.(1996)Ester, Kriegel, Sander, and Xu] to work best on the SFRP dataset. In contrast to Mean Shift used in [Brabandere et al.(2017)Brabandere, Neven, and Gool], DBSCAN does not make assumptions about the shape of the clusters. We apply clustering only in the prediction phase because the instance segmentation loss function does not require the instance segmentation map. Note that DBSCAN does not necessarily assign a label to all voxels. Voxels that were not assigned to any cluster are marked as outliers. The clusters are then mapped back to the spatial domain, creating an instance segmentation map. Outliers are extrapolated based on their neighborhood in the spatial domain by use of the watershed algorithm, using the clustering labels as seeds. An example visualization of the described steps with the help of t-SNE [Maaten and Hinton(2008)] is shown in Fig. 5.
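The clustering and mapping-back steps can be sketched as below. This is a deliberately small, quadratic-time DBSCAN written for readability, with hypothetical names (`dbscan`, `cluster_foreground`) and placeholder values for `eps` and `min_samples`; the watershed filling of outliers is not shown, and voxel coordinates are one-dimensional for brevity.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, -1 for outliers."""
    n = len(points)
    neighbours = [[j for j in range(n)
                   if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n              # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1           # provisional outlier
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: expand
        cluster += 1
    return labels


def cluster_foreground(embedding, mask, eps=0.5, min_samples=3):
    """Cluster embeddings of foreground voxels and map labels back.

    embedding: dict voxel coordinate -> embedding vector
    mask:      dict voxel coordinate -> 0/1 semantic prediction
    Returns dict voxel -> id (0 background, -1 outlier, fibers from 1).
    """
    coords = [v for v in embedding if mask[v]]
    labels = dbscan([embedding[v] for v in coords], eps, min_samples)
    out = {v: 0 for v in embedding}
    for v, lab in zip(coords, labels):
        out[v] = lab + 1 if lab >= 0 else -1
    return out


embedding = {(0,): (0.0, 0.0), (1,): (0.1, 0.0), (2,): (0.0, 0.1),
             (3,): (9.0, 9.0),
             (4,): (5.0, 5.0), (5,): (5.1, 5.0), (6,): (5.0, 5.1),
             (7,): (20.0, 20.0)}
mask = {(0,): 1, (1,): 1, (2,): 1, (3,): 0,
        (4,): 1, (5,): 1, (6,): 1, (7,): 1}
instances = cluster_foreground(embedding, mask)
```

In this toy input, voxel `(3,)` is background according to the mask and is never clustered, and voxel `(7,)` has no nearby embeddings, so it is left as an outlier for a later filling step.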
Finally, inference is performed on small overlapping sub-volumes of the entire volume. Each sub-volume uses its own label IDs for fibers, so fiber identities are not consistent across sub-volumes. To overcome this problem, we have designed a merging algorithm which joins label IDs among the sub-volumes based on the spatial distance of fibers in the overlapping regions. The algorithm is applied to each sub-volume and recursively processes one fiber at a time, looking at neighboring sub-volumes whose overlapping regions contain objects close to the fiber of interest. The merging procedure is described in more detail in Algorithm 1.
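One way to picture the relabeling problem is the sketch below. It is not the Algorithm 1 referred to above: it swaps the spatial-distance criterion for a simpler shared-voxel count and uses a union-find structure, and all names (`merge_block_labels`, `min_shared`) are hypothetical. Coordinates are one-dimensional integers for brevity.

```python
from collections import Counter


def merge_block_labels(blocks, min_shared=1):
    """Give consistent global ids to labels from overlapping blocks.

    blocks: list of dicts, global voxel coordinate -> local label (> 0)
    Returns dict voxel coordinate -> global id.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every (block index, local label) pair starts as its own fiber.
    for b, block in enumerate(blocks):
        for lab in set(block.values()):
            parent[(b, lab)] = (b, lab)

    # Join labels from different blocks that share voxels in the overlap.
    for b1 in range(len(blocks)):
        for b2 in range(b1 + 1, len(blocks)):
            shared = Counter((blocks[b1][v], blocks[b2][v])
                             for v in blocks[b1] if v in blocks[b2])
            for (l1, l2), n in shared.items():
                if n >= min_shared:
                    union((b1, l1), (b2, l2))

    # Relabel every voxel with a compact global id.
    global_id, merged = {}, {}
    for b, block in enumerate(blocks):
        for v, lab in block.items():
            root = find((b, lab))
            merged[v] = global_id.setdefault(root, len(global_id) + 1)
    return merged


# Block A covers positions 0-5, block B covers positions 4-9.
block_a = {0: 1, 2: 3, 3: 3, 4: 3, 5: 3}
block_b = {4: 1, 5: 1, 6: 1, 8: 2, 9: 2}
merged = merge_block_labels([block_a, block_b])
```

Because unions are transitive, a scheme like this leans towards merging rather than splitting: two touching fibers that share a label in any one overlap end up with a single global id.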
We have evaluated the proposed setup on two hand-annotated regions of low resolution CT scans of SFRP composites acquired by a Nikon MCT225 X-ray CT system from [Konopczyński et al.(2017)Konopczyński, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser]. The scans exhibit typical artifacts and have low but isotropic resolution. The parts from which the scans were acquired were manufactured by micro injection molding using PBT-10% GF, a commercial polybutylene terephthalate PBT (BASF, Ultradur B4300 G2) reinforced with short glass fibers (10% in weight). The volumes have been hand annotated with center lines and processed by a watershed algorithm to create the instance segmentation ground truth. Both volumes have dimensions of 62 × 260 × 260 voxels with approx. 6,500 fibers each. Fibers have a diameter of 10-14 µm (2-3 voxels) and are approx. 1.1 mm long. One scan is used for training, while the other is used only for testing.
Fig. 6 panels: Raw volume, Ground truth, CC, Our method.
3.2 Training details
The volumes have been normalized to have unit variance and zero mean. Additionally, most of the air voxels surrounding the specimen have been removed by a simple thresholding method. We have trained and evaluated the network on sub-volumes from the training volume. The sub-volumes are randomly flipped and rotated (by 90, 180 or 270 degrees) during the training phase. As mentioned in the introduction and shown in Fig. 3, for the backbones of both the semantic segmentation and the embedding learning branch we have used the architecture proposed in [Konopczyński et al.(2018)Konopczyński, Rathore, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser], designed for semantic fiber segmentation. It is a 3D FCN with standard residual units [He et al.(2016)He, Zhang, Ren, and Sun] [Ioffe and Szegedy(2015)] but with no max-pooling, in order to keep the resolution of the already very thin fibers.
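The preprocessing and augmentation described above might look roughly like the following sketch. It is an assumption-laden illustration: the helper names are hypothetical, rotations are restricted to the slice plane, and volumes are plain nested lists rather than arrays.

```python
import random
import statistics


def normalize(values):
    """Scale a flat list of intensities to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def rot90(slice2d):
    """Rotate a 2D slice (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*slice2d)][::-1]


def augment(volume, rng):
    """Apply one random flip/rotation to every slice of a volume.

    volume: list of square 2D slices; rng: random.Random instance.
    """
    k = rng.randrange(4)          # 0, 90, 180 or 270 degrees
    flip = rng.random() < 0.5
    out = []
    for s in volume:
        for _ in range(k):
            s = rot90(s)
        if flip:
            s = [row[::-1] for row in s]
        out.append(s)
    return out


normalized = normalize([1.0, 2.0, 3.0, 4.0])
augmented = augment([[[1, 2], [3, 4]]], random.Random(0))
```

In practice the same random flip and rotation would have to be applied to the input sub-volume and to its ground truth labels, so that voxels and annotations stay aligned.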
The embedding learning is not stable when trained from random initialization. Therefore, we have first trained the semantic segmentation branch for 20,000 iterations and saved the weights. Then, we have used these weights as an initialization for the embedding learning branch and trained it for another 20,000 iterations. The loss used for training the embedding learning branch uses the semantic ground truth masks.
It would also have been an option to share the embeddings and weights for both tasks. Such a setup is reported to slightly increase the performance of both semantic and instance segmentation [Neven et al.(2017)Neven, Brabandere, Georgoulis, Proesmans, and Gool]. However, in our setup, we have found the above two-stage training to work better. We use 16 feature embedding maps and set the weights of the first two loss terms ($\alpha$ and $\beta$) to 1 and the weight of the regularization term ($\gamma$) to 0.001. Optimization has been done with the Adam optimizer [Kingma and Ba(2014)] with an initial learning rate of 0.001. During the prediction phase, the algorithm processes sub-volumes of the test volume that overlap in each direction. The post-processing merging algorithm merges the overlapping sub-volumes and produces the final instance segmentation volume.
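As a metric, we use the Adjusted Rand Index (ARI) [Hubert and Arabie(1985)] to measure the performance of instance segmentation. We find it more informative in the context of SFRP data than the mean average precision (mAP). Defining the ground truth labels as clusters $X = \{X_1, \dots, X_r\}$ and the corresponding predicted labels as clusters $Y = \{Y_1, \dots, Y_s\}$, the Adjusted Rand Index is:
$$ARI = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] / \binom{n}{2}}{\frac{1}{2}\left[\sum_i\binom{a_i}{2} + \sum_j\binom{b_j}{2}\right] - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right] / \binom{n}{2}}$$
where $n_{ij} = |X_i \cap Y_j|$, $a_i = \sum_j n_{ij}$, $b_j = \sum_i n_{ij}$, and $n$ is the number of voxels in the volume. The ARI is at most 1, where 1 means a perfect match between the algorithm output and the ground truth mask, and it is close to 0 for a random labeling.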
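For reference, the Adjusted Rand Index of Hubert and Arabie can be computed from two flat label lists as in the sketch below, using the standard contingency-table form. The function names and the `background=0` convention are assumptions; `foreground_ari` reflects the choice, stated in the evaluation, to score only voxels that belong to the ground truth instance mask.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI between two labelings given as equally long flat lists."""
    n = len(true_labels)
    if n < 2:
        return 1.0                     # nothing to compare
    pairs = Counter(zip(true_labels, pred_labels))   # contingency table
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:          # degenerate: one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def foreground_ari(true_labels, pred_labels, background=0):
    """ARI restricted to voxels that are foreground in the ground truth."""
    keep = [(t, p) for t, p in zip(true_labels, pred_labels)
            if t != background]
    return adjusted_rand_index([t for t, _ in keep], [p for _, p in keep])


perfect = adjusted_rand_index([1, 1, 2, 2], [5, 5, 9, 9])
all_merged = adjusted_rand_index([1, 1, 2, 2], [1, 1, 1, 1])
masked = foreground_ari([0, 0, 1, 1, 2, 2], [3, 0, 1, 1, 2, 2])
```

The score is invariant to how the ids are named, which is why a prediction with labels 5 and 9 matches a ground truth with labels 1 and 2 perfectly, while merging both fibers into one label drops the score to 0.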
| Setup | Mean ARI | Merged ARI |
|---|---|---|
| Embedding Learning + true semantic | 0.9129 | 0.7817 |
| Connected Components + true semantic | 0.3614 | 0.2534 |
We have compared our method to a standard skeletonization followed by connected component analysis and the Watershed method [Zhang et al.(2011)Zhang, Li, Yang, Wang, and Liu]. In the method, a binary erosion is first applied on the semantic mask, which serves as seeds after connected component analysis for a watershed segmentation algorithm. See Fig. 6 for a visual comparison. We have also evaluated the importance of a good semantic segmentation mask. We provide results for both our method and connected components given the semantic segmentation computed by the semantic segmentation branch as well as using the ground truth semantic segmentation.
We therefore compare four different setups. Embedding Learning is our method, where the final instance segmentation is produced given the semantic segmentation mask from the semantic segmentation branch. Embedding Learning + true semantic is our method using the ground truth semantic segmentation mask instead of the (imperfect) one produced by the method's own branch. Connected Components and Connected Components + true semantic are the connected component method applied to the output of the semantic segmentation branch and to the ground truth semantic mask, respectively.
We provide two results in Table 1 for each setup. In the first column, the mean ARI is the mean ARI of all the sub-volumes in the test volume without the merging step. In the second column one can see the score computed over the entire volume after the post-processing merging step which we call a merged ARI. We report the ARI score only for the voxels that belong to the ground truth instance segmentation mask. Including the background voxels would artificially increase the score.
While the standard method clearly fails even when using the true semantic segmentation mask, the proposed method produces meaningful results in all cases. When reasoning on small overlapping patches, the proposed method achieves an average ARI score of 0.9048. The merging algorithm has trouble with ambiguity between two neighboring outputs and favors merging over splitting. As a result, two fibers are merged into one when they are too close to each other. After the merging post-processing step, the ARI score decreases to 0.6529.
In this work, we proposed a deep 3D fully convolutional architecture together with a set of post-processing steps for the problem of single fiber segmentation from CT scans of SFRP. We extend the less common approach of embedding learning to the task of 3D instance segmentation. We explain in detail the steps of the method together with the post-processing and merging procedure. We show that our method outperforms the traditional skeletonization and watershed method. We expect our findings to be applicable to a wide variety of volumetric data and not only to fiber composites.
- [Bai and Urtasun(2017)] M. Bai and R. Urtasun. Deep watershed transform for instance segmentation.
- [Brabandere et al.(2017)Brabandere, Neven, and Gool] B. De Brabandere, D. Neven, and L. Van Gool. Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551, 2017.
- [Christ et al.(2016)Christ, Elshaer, Ettlinger, Tatavarty, Bickel, Bilic, and Sommer] P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, M. Bickel, P. Bilic, and W. H. Sommer. Automatic liver and lesion segmentation in ct using cascaded fully convolutional neural networks and 3d conditional random fields. International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 415–423, 2016.
- [Deng et al.(2009)Deng, Dong, Socher, Li, Li, and Fei-Fei] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
- [Ester et al.(1996)Ester, Kriegel, Sander, and Xu] M. Ester, H. P. Kriegel, J. Sander, and X. Xu. Density-based spatial clustering of applications with noise. Int. Conf. Knowledge Discovery and Data Mining, 1996.
- [Fast et al.(2015)Fast, Scott, Bale, and Cox] T. Fast, A. E. Scott, H. A. Bale, and B. N. Cox. Topological and euclidean metrics reveal spatially nonuniform structure in the entanglement of stochastic fiber bundles. Journal of materials science, 50(6):2370–2398, 2015.
- [Fathi et al.(2017)Fathi, Wojna, Rathod, Wang, Song, Guadarrama, and Murphy] A. Fathi, Z. Wojna, V. Rathod, P. Wang, H. O. Song, S. Guadarrama, and K. P. Murphy. Semantic instance segmentation via deep metric learning. arXiv preprint arXiv:1703.10277, 2017.
- [Frangi et al.(1998)Frangi, Niessen, Vincken, and Viergever] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever. Multiscale vessel enhancement filtering. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 1998.
- [Fu and Lauke(1996)] S. Y. Fu and B. Lauke. Effects of fiber length and fiber orientation distributions on the tensile strength of short-fiber-reinforced polymers. Composites Science and Technology, 56(10):1179–1190, 1996.
- [He et al.(2016)He, Zhang, Ren, and Sun] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [He et al.(2017)He, Gkioxari, Dollár, and Girshick] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask r-cnn. International Conference on Computer Vision, pages 2980–2988, 2017.
- [Hubert and Arabie(1985)] L. Hubert and P. Arabie. Comparing partitions. Journal of classification, 2(1):193–218, 1985.
- [Ioffe and Szegedy(2015)] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- [Kingma and Ba(2014)] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [Konopczyński et al.(2017)Konopczyński, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser] T. Konopczyński, J. Rathore, T. Kröger, L. Zheng, C. S. Garbe, S. Carmignato, and J. Hesser. Reference setup for quantitative comparison of segmentation techniques for short glass fiber ct data. Conference on Industrial Computed Tomography, 2017.
- [Konopczyński et al.(2018)Konopczyński, Rathore, Rathore, Kröger, Zheng, Garbe, Carmignato, and Hesser] T. Konopczyński, D. Rathore, J. Rathore, T. Kröger, L. Zheng, C. S. Garbe, S. Carmignato, and J. Hesser. Fully convolutional deep network architectures for automatic short glass fiber semantic segmentation from ct scans. Conference on Industrial Computed Tomography, 2018.
- [Lin et al.(2014)Lin, Maire, Belongie, Hays, Perona, Ramanan, and Zitnick] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, and C. L. Zitnick. Microsoft coco: Common objects in context. European conference on computer vision, pages 740–755, 2014.
- [Long et al.(2015)Long, Shelhamer, and Darrell] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
- [Maaten and Hinton(2008)] L. V. D. Maaten and G. Hinton. Visualizing data using t-sne. Journal of machine learning research, 9:2579–2605, 2008.
- [Neven et al.(2017)Neven, Brabandere, Georgoulis, Proesmans, and Gool] D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool. Fast scene understanding for autonomous driving. arXiv preprint arXiv:1708.02550, 2017.
- [Pinter et al.(2016)Pinter, Bertram, and Weidenmann] P. Pinter, B. Bertram, and K. A. Weidenmann. A novel method for the determination of fibre length distributions from uct-data. Conference on Industrial Computed Tomography, 2016.
- [Ren et al.(2015)Ren, He, Girshick, and Sun] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, pages 91–99, 2015.
- [Romera-Paredes and Torr(2016)] B. Romera-Paredes and P. H. S. Torr. Recurrent instance segmentation. European Conference on Computer Vision, pages 312–329, 2016.
- [Sencu et al.(2016)Sencu, Yang, Wang, Withers, Rau, Parson, and Soutis] R. M. Sencu, Z. Yang, Y. C. Wang, P. J. Withers, C. Rau, A. Parson, and C. Soutis. Generation of micro-scale finite element models from synchrotron x-ray ct images for multidirectional carbon fibre reinforced composites. Composites Part A: Applied Science and Manufacturing, 91:85–95, 2016.
- [Weinberger and Saul(2009)] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244, 2009.
- [Zhang et al.(2011)Zhang, Li, Yang, Wang, and Liu] X. Zhang, D. Li, W. Yang, J. Wang, and S. Liu. A fast segmentation method for high-resolution color images of foreign fibers in cotton. Composites Part A: Applied Science and Manufacturing, 78(1):71–79, 2011.