Prostate cancer is a major cause of cancer mortality among men. Multiparametric MRI (mpMRI) is used both in the diagnosis and in the treatment of prostate cancer. Both uses require a segmentation of the prostate, which takes considerable expertise to produce by hand. Automating this task could therefore greatly benefit clinical practice, possibly even enabling population-wide pre-emptive screening. In particular, it would be useful to segment two zones within the prostate, the transition zone (TZ) and the peripheral zone (PZ), since the guidelines for mpMRI diagnosis of cancer differ between them. This is challenging, especially because the border between the two zones is subtle and hard to segment.
Automatic segmentation has improved greatly in recent years, most notably through the use of convolutional neural networks such as UNet [cicek]. A variant, VNet, has been applied to full prostate segmentation [milletari], and recently a 3D version of UNet has been used for multi-zonal prostate segmentation [germonda].
In cardiac imaging, autoencoders have been used to incorporate prior knowledge into neural networks, with some positive results [oktay]. We hypothesize that the same technique can be used to increase the accuracy of automatic multi-label prostate segmentation.
The dataset consists of 64 3D T2-weighted MRI volumes of the prostate and surrounding region from the 2016 Detection Archive [dataset]. In these volumes both the TZ and the PZ are annotated by hand; see Figure 1 for an example.
The original images are too large to fit in memory, so they are cropped and rescaled; see Table 1.
During training, the data is augmented with small translations, left-right flips, isotropic expansions, elastic deformations, and rotations.
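As an illustration of two of these augmentations, the following is a minimal NumPy sketch of a left-right flip combined with a small in-plane translation, applied identically to the image and its label map. The axis order (x, y, z), the choice of x as the left-right axis, the `max_shift` value, and the helper names are all assumptions made for this example; elastic deformations, expansions, and rotations are not shown.

```python
import numpy as np

def shift_zero_fill(volume, dx, dy):
    """Translate a volume by (dx, dy) voxels in-plane, filling exposed voxels with 0."""
    out = np.zeros_like(volume)
    nx, ny = volume.shape[:2]
    src_x = slice(max(0, -dx), min(nx, nx - dx))
    dst_x = slice(max(0, dx), min(nx, nx + dx))
    src_y = slice(max(0, -dy), min(ny, ny - dy))
    dst_y = slice(max(0, dy), min(ny, ny + dy))
    out[dst_x, dst_y] = volume[src_x, src_y]
    return out

def random_flip_and_shift(image, labels, rng, max_shift=2):
    """Apply the same random flip and translation to an image and its labels.

    Assumes arrays are ordered (x, y, z), that x is the left-right axis, and
    that label 0 means background. max_shift is a hypothetical value in voxels.
    """
    if rng.random() < 0.5:
        image = image[::-1, :, :]
        labels = labels[::-1, :, :]
    dx, dy = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift_zero_fill(image, dx, dy), shift_zero_fill(labels, dx, dy)

rng = np.random.default_rng(0)
image = rng.random((36, 36, 18))
labels = (image > 0.5).astype(np.int64)
aug_image, aug_labels = random_flip_and_shift(image, labels, rng)
```

Using one set of random parameters for both arrays keeps the annotation aligned with the image after augmentation.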
|dimension||original voxels||original voxel size||rescaled voxels||rescaled voxel size|
|x||384 or 640||0.5mm||36||3mm|
|y||384 or 640||0.5mm||36||3mm|
The main network used for segmentation is based on the 3D-UNet architecture, with one modification: to reflect the anisotropy of the MRI volumes, some 3D convolutions were replaced by 2D convolutions. [cicek, germonda]
In an attempt to improve on this network, an autoencoder is added.
An autoencoder is a neural network that consists of two parts: an encoder, which reduces a given segmentation to a lower-dimensional encoding, and a decoder, which aims to reconstruct the original segmentation from the encoding as accurately as possible.
Since the encoding is smaller than the input, the encoder has to capture the most important features of the data. The lower-dimensional encoding can therefore be used as a summary of the global properties of a segmentation.
The autoencoder used here is fully convolutional and reduces the 36x36x18 segmentation to a 9x9x5 encoding; see Figure 5 for an example.
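The text does not say how the reduction from 36x36x18 to 9x9x5 is reached. One configuration that produces exactly these sizes, shown here purely as an assumption, is two downsampling steps with stride 2 and "same"-style padding, so that each dimension is halved and rounded up. The helper name is hypothetical.

```python
import math

def downsampled_shape(shape, n_steps, stride=2):
    """Spatial shape after n_steps stride-`stride` downsamplings with ceil rounding.

    The number of steps and the rounding mode are assumptions, chosen because
    they reproduce the sizes quoted in the text.
    """
    for _ in range(n_steps):
        shape = tuple(math.ceil(s / stride) for s in shape)
    return shape

input_shape = (36, 36, 18)
encoding_shape = downsampled_shape(input_shape, n_steps=2)
print(encoding_shape)  # (9, 9, 5)

# Spatial positions per channel before and after encoding.
n_input = math.prod(input_shape)        # 23328
n_encoding = math.prod(encoding_shape)  # 405
```

Per channel this is a reduction from 23328 to 405 spatial positions; the number of channels in the encoding is not given in the text.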
This autoencoder is trained on the manual annotations in the dataset for 100 epochs, using binary crossentropy loss and an Adam optimizer.
The main metric used to evaluate performance is the DICE score, given by

$$\mathrm{DICE}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|},$$

where $X$ is the prediction and $Y$ the ground truth.
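For a single label, the DICE score can be computed from binary masks as sketched below. This is a generic NumPy illustration, not the authors' evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np

def dice_score(pred, truth):
    """DICE overlap between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

pred = np.array([1, 1, 0, 0])
truth = np.array([1, 0, 1, 0])
print(dice_score(pred, truth))  # 0.5
```

For a multi-label segmentation, the score would be evaluated once per zone, with the mask for each zone obtained by comparing the label map against that zone's label value.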
The autoencoder could reconstruct the TZ with an average DICE of 0.95 and the PZ with a DICE of 0.85.
During training of the 3D-UNet, the pre-trained encoder is used to add an extra global loss, as seen in Figure 6. This global loss is added to the pixel-wise loss, where the pixel-wise loss has a weight factor of 1 and the encoder-generated global loss a weight factor of 0.2.
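The weighting of the two loss terms can be sketched as follows. The text does not state how the global loss compares the encodings, so the mean squared difference used here is an assumption, and `toy_encoder` is a hypothetical stand-in for the pre-trained encoder.

```python
import numpy as np

PIXEL_WEIGHT = 1.0   # weight of the pixel-wise loss (from the text)
GLOBAL_WEIGHT = 0.2  # weight of the encoder-based global loss (from the text)

def global_loss(encoder, pred, truth):
    """Distance between the encodings of prediction and ground truth.

    Mean squared difference is an assumed choice of distance.
    """
    return float(np.mean((encoder(pred) - encoder(truth)) ** 2))

def total_loss(pixel_loss, encoder, pred, truth):
    """Weighted sum of the pixel-wise loss and the global loss."""
    return PIXEL_WEIGHT * pixel_loss + GLOBAL_WEIGHT * global_loss(encoder, pred, truth)

def toy_encoder(segmentation):
    """Hypothetical placeholder: averages over the last axis."""
    return segmentation.mean(axis=2)

pred = np.ones((4, 4, 2))
truth = np.zeros((4, 4, 2))
loss = total_loss(0.5, toy_encoder, pred, truth)  # 1.0 * 0.5 + 0.2 * 1.0
```

Because the encoder is pre-trained, it would be kept fixed while the 3D-UNet is trained, so that the global term only shapes the segmentation network.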
The pixel-wise loss is calculated by weighted categorical crossentropy, where the background has weight 1, the TZ weight 2, and the PZ weight 6, in order to compensate for label imbalance.
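A minimal NumPy sketch of such a class-weighted categorical crossentropy is given below. The channel order (background, TZ, PZ), the channels-last layout, and the normalization by a plain mean over voxels are assumptions; the weights are those given in the text.

```python
import numpy as np

# Class weights from the text, in the assumed order: background, TZ, PZ.
CLASS_WEIGHTS = np.array([1.0, 2.0, 6.0])

def weighted_categorical_crossentropy(probs, onehot, weights=CLASS_WEIGHTS, eps=1e-7):
    """Mean over voxels of -w_c * log(p_c), with c the true class of each voxel.

    probs and onehot have shape (..., n_classes), with classes on the last axis.
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    per_voxel = -np.sum(weights * onehot * np.log(probs), axis=-1)
    return float(per_voxel.mean())

# Three voxels, one of each class, with a uniform prediction of 1/3 per class.
onehot = np.eye(3)
probs = np.full((3, 3), 1.0 / 3.0)
loss = weighted_categorical_crossentropy(probs, onehot)
```

With these weights, a misclassified PZ voxel contributes six times as much to the loss as a misclassified background voxel with the same predicted probability.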
The 3D-UNet was trained for 300 epochs, using an Adam optimizer, a learning rate of 0.0001, and L2 kernel regularization.
First the 3D-UNet was trained without the extra loss provided by the encoder, and the results were compared to a 3D-UNet that was trained with the extra encoder-based loss.
The 3D-UNet that was trained with the encoder obtained slightly better results.
|3D-UNet trained with encoder||0.85||0.67|
In this work we applied convolutional autoencoders to aid the training of a 3D-UNet in multi-label prostate segmentation. This did increase the segmentation accuracy, but only slightly.
One reason the improvement is small could be that the images are already reduced considerably in size before they are passed to the 3D-UNet. It could be studied further whether the autoencoder has more impact on performance for larger images.
Another possible improvement would be to gradually decrease the weight of the encoder-based loss while training the 3D-UNet, since the contribution of the encoder-based global loss is largest at the beginning of training.
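One way such a schedule could look is a linear decay of the global-loss weight over the training run. This was not part of the experiments; the linear form, the end value of 0, and the function name are assumptions, while the starting weight of 0.2 and the 300 epochs are taken from the training setup described earlier.

```python
def global_loss_weight(epoch, n_epochs=300, start=0.2, end=0.0):
    """Linearly interpolate the global-loss weight from `start` to `end`.

    epoch is 0-based; values outside [0, n_epochs - 1] are clamped.
    """
    frac = min(max(epoch / (n_epochs - 1), 0.0), 1.0)
    return start + frac * (end - start)

# Weight at the start, middle, and end of training.
weights = [global_loss_weight(e) for e in (0, 150, 299)]
```

Other decay shapes, such as stepwise or exponential, would fit the same idea; which one works best would have to be determined experimentally.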