Real-time cine Magnetic Resonance Imaging (MRI) has enabled fast and accurate visual guidance in various cardiac interventions, such as aortic valve replacement [15], cardiac electroanatomic mapping and ablation [17], electrophysiology for atrial arrhythmias [22], intracardiac catheter navigation [7], and myocardial chemoablation [18]. In these applications, it is strongly desirable to segment the temporal frames on-the-fly, satisfying both throughput and latency requirements. The throughput should exceed the cine MRI reconstruction rate of 22 frames per second (FPS) [19, 13]. The latency should be no more than 50 ms to avoid visually noticeable lags [2]. Most of the existing segmentation methods [14, 29, 28, 23, 27, 26], however, focus on accuracy. To handle cardiac border ambiguity and large variations among target objects from different patients, these methods incur high computation cost. Hence their inference latency and throughput are far from meeting the real-time requirements, and they can only be applied offline.
MSU-Net [24] was proposed in MICCAI’19 as the first framework achieving real-time segmentation of 3D cardiac cine MRI. It uses a canonical form distribution to describe the multiple input frames in a snippet of cine MRI so that only a single pass through the network is needed for all the frames in the snippet. While MSU-Net increases the throughput drastically, the inference latency also increases to well above 50 ms because of the need for input clustering, i.e., the inference is carried out only after all the frames in a snippet have arrived. When MSU-Net is applied to real-time cine MRI segmentation, such significant visual lags jeopardize the effectiveness of visual guidance in cardiac intervention.
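The two requirements, and the reason snippet-based inference can violate the latency bound, can be made concrete with a little arithmetic. The sketch below is illustrative only: the snippet length and the per-pass inference times are hypothetical values, not figures from MSU-Net; only the 22 FPS and 50 ms thresholds come from the text.

```python
RECON_FPS = 22.0        # cine MRI reconstruction rate stated in the text
MAX_LATENCY_MS = 50.0   # latency bound stated in the text


def snippet_latency_ms(snippet_len, infer_ms, fps=RECON_FPS):
    """Worst-case latency of the first frame in a snippet.

    The first frame waits for the remaining (snippet_len - 1) frames to
    arrive before the single inference pass over the snippet can start.
    """
    wait_ms = (snippet_len - 1) * 1000.0 / fps
    return wait_ms + infer_ms


def snippet_throughput_fps(snippet_len, infer_ms):
    """Frames segmented per second when one pass handles a whole snippet."""
    return snippet_len * 1000.0 / infer_ms


def meets_realtime(latency_ms, throughput_fps):
    """Check both real-time requirements at once."""
    return latency_ms <= MAX_LATENCY_MS and throughput_fps > RECON_FPS


# Hypothetical snippet-based setting: 5 frames per snippet, 100 ms per pass.
lat = snippet_latency_ms(5, 100.0)
fps = snippet_throughput_fps(5, 100.0)
print(round(lat, 1), fps, meets_realtime(lat, fps))

# Hypothetical per-frame setting: 1 frame per pass, 40 ms per pass.
lat1 = snippet_latency_ms(1, 40.0)
fps1 = snippet_throughput_fps(1, 40.0)
print(lat1, fps1, meets_realtime(lat1, fps1))
```

Under these assumed numbers the snippet-based setting has ample throughput but fails the latency bound, because the waiting time alone exceeds 50 ms, whereas the per-frame setting satisfies both requirements.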
As a popular computational method for decomposing a multivariate signal into additive independent non-Gaussian signals (bases), Independent Component Analysis (ICA) has been widely used in image processing applications such as noise reduction, image separation in medical data [5, 3], and image decomposition [20]. Through the unmixing process in ICA, any image patch out of a given image can be represented by a linear combination of a set of independent bases of the same size as the image patch. In the mixing process, the original image can be reconstructed using the bases with proper coefficients.
In this paper, based on a new interpretation of ICA for learning (Section 2), we propose ICA-UNet, a novel model that not only achieves highly accurate 3D cardiac cine MRI segmentation results, but also attains both high throughput and low latency. Specifically, an input temporal frame in the cine MRI is decomposed into independent bases and a mixing tensor, composed of the coefficient tensors of all the bases, by a light-weight ICA-encoder. Such an ICA-encoder mimics the unmixing operation in ICA. A U-Net like architecture is trained to learn the transform of the mixing tensor from its original function of image reconstruction to the target function of image segmentation. As such, the transformed mixing tensors can be mixed with the bases through light-weight ICA-decoders to get the desirable features for final segmentation evaluation. Because the coefficient tensors that compose the mixing tensor are much smaller in size than the input frame and can be processed in parallel due to the independence between their corresponding bases, significant latency reduction can be achieved.
Experimental results show that, compared with the state-of-the-art real-time cardiac cine MRI segmentation method MSU-Net, ICA-UNet achieves much higher Dice scores for all cardiac classes with up to 12.6× latency reduction. More specifically, the latency of ICA-UNet is below 50 ms while its throughput is still above 22 FPS, which implies that ICA-UNet is the first method meeting the real-time performance requirements in terms of both throughput and latency for MRI-guided cardiac intervention with no visually noticeable lags. In fact, the accuracy achieved by ICA-UNet is on a par with state-of-the-art methods that focus on accuracy and can only run offline because of their complexity.
It has long been recognized that Independent Component Analysis (ICA) can be used to extract features (bases) from images. Following a similar setup, we can partition an image into a set of smaller image patches such that each image patch can be represented as a linear combination of independent basis image patches along with their coefficients. Compactly put, for a set of input image patches arranged as a matrix in which each row vector represents an input image patch, the goal of ICA is to estimate the unmixing matrix such that the realizations of the bases are as mutually independent as possible (the unmixing process), while the reconstruction of the input image patches is as close to the original patches as possible (the mixing process). The matrix used for the reconstruction is called the mixing matrix, and it equals the pseudo-inverse of the unmixing matrix. There are different ICA algorithms and implementations; one popular implementation is FastICA. In the rest of the paper, we extend these matrix terms to tensors, namely the mixing tensor and the basis tensor, because of the multi-dimensional nature of the input images. Each channel in the basis tensor represents a basis, and the number of channels equals the basis dimension of ICA. The corresponding coefficient tensor of each basis can be extracted from the mixing tensor. We will also use "basis tensor" and "bases" interchangeably for simplicity of discussion.
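The relation between the unmixing matrix, the bases, and the mixing matrix can be illustrated numerically. The following is a minimal sketch, not an ICA solver: the unmixing matrices are random rather than estimated (for example by FastICA), the sizes are hypothetical, and the names `X`, `W`, `S`, and `A` as well as the convention that the bases are obtained as `W @ X` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_patches, patch_dim = 6, 16                   # hypothetical sizes
X = rng.normal(size=(n_patches, patch_dim))    # each row is one image patch


def unmix_and_remix(X, W):
    """Apply an unmixing matrix W and reconstruct with its pseudo-inverse."""
    S = W @ X                 # bases: one row per basis, same size as a patch
    A = np.linalg.pinv(W)     # mixing matrix = pseudo-inverse of W
    X_hat = A @ S             # each patch = linear combination of the bases
    return S, A, X_hat


# NOTE: both W matrices are random stand-ins, not ICA estimates.
W_full, _ = np.linalg.qr(rng.normal(size=(n_patches, n_patches)))  # 6 bases
W_low = rng.normal(size=(3, n_patches))                            # 3 bases

_, _, X_full = unmix_and_remix(X, W_full)
S_low, A_low, X_low = unmix_and_remix(X, W_low)

print(S_low.shape, A_low.shape)        # (3, 16) (6, 3)
print(np.allclose(X, X_full))          # full set of bases: exact reconstruction
print(np.linalg.norm(X - X_low) > 0)   # fewer bases: lossy reconstruction
```

Each row of `A_low` holds the coefficients of one patch, which is the matrix counterpart of the coefficient tensors discussed in the text. The reconstruction error with fewer bases is the lossy behaviour of conventional unmixing.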
ICA admits an interesting interpretation: both the mixing tensor and the bases can be considered feature representations of the input image, with the bases being more fundamental to the input image and the mixing tensor being more related to a particular application such as reconstruction of the input image. In other words, bases can be treated as well-behaved image features that can be reused for different applications, while the mixing tensor can be treated as the weights of a simple fully connected layer used to reconstruct the input image.
With such an insight, we wonder whether we can learn to transform the mixing tensor so that it can be utilized for a different set of applications (with the help of the bases) beyond the original image reconstruction. As illustrated in Fig. 1(a), the original mixing tensor for image reconstruction is transformed to that for the target application, which, after mixing with the bases, can be used to get the desirable target features for final evaluation of the target application. As the bases are shared, only the mixing tensor, which is composed of the coefficient tensors of all the bases, needs to be transformed. During this process, the coefficient tensors of different bases can be computed in parallel due to the independence between the bases, and each of them is much smaller than the original input. Thus significant latency reduction can be achieved. Since the mixing tensor still exhibits spatial patterns, a conventional image-oriented deep neural network such as U-Net can be used as the backbone to learn the transform.
In conventional ICA, the unmixing operation is lossy, which affects the accuracy of the downstream application (such as image reconstruction), and it runs as a separate optimization process, which can be quite time-consuming. Since the learning of the target mixing tensor is also an optimization problem, why not combine the ICA process with the learning process as a joint end-to-end training process, so that we can not only mitigate the impact of the lossy unmixing operation on accuracy, but also eliminate one separate optimization process? This motivation drives us to propose a lightweight neural-network-based ICA encoder and decoder to mimic the unmixing and mixing operations in ICA. We further integrate them with a U-Net backbone so that they can be trained end-to-end.
Following the motivation in Section 2, Fig. 1(b) gives a conceptual illustration of our ICA-UNet. Its detailed architecture is shown in Fig. 2, where a superscript denotes the time stamp of the input frame. A summary of all the notations used in this section is also included in the figure. ICA-UNet is mainly made of four types of modules: the ICA-encoder, the contracting blocks, the expanding blocks, and the ICA-decoders. The number of contracting/expanding blocks is a hyperparameter that affects accuracy and speed, as will be shown in the experiments.
1) ICA-encoder: The ICA-encoder extracts both the statistically independent basis tensor and the associated initial mixing tensor from the input image. Instead of running a standard ICA process where the image needs to be explicitly partitioned and an explicit iterative optimization is needed to obtain the mixing tensor and the realization of the bases, we propose to use a neural network to obtain them as a function of the input image.
For an input frame with a given depth, height, and width, we choose the basis dimension, the size of the initial mixing tensor, and the size of the independent basis tensor. The size of the basis tensor also depends on the output channel width of the transposed convolution between the concatenated mixing tensor and the basis tensor in the ICA-decoder. Each channel in the basis tensor corresponds to an independent basis. The corresponding coefficient tensor of each basis can be obtained from the mixing tensor. To reduce computation, the extraction of the mixing tensor and the basis tensor shares some layers for low-level feature extraction before the features are split channel-wise.
After both tensors are obtained, the mixing tensor is forwarded to the following contracting block, and the basis tensor is forwarded directly to each ICA-decoder block for the mixing operation.
The objective function of ICA-encoder, which is used for regulating the optimization towards sparsity, independence and accuracy, can be expressed as
where the weights of the three loss terms are all set to 1.0 in our experiments. The first term reflects sparsity through the L1 norm. The second term reflects independence through negentropy, computed with a constant parameter and an element-wise average. The third term is the reconstruction loss. We adopt transposed convolution as the mixing operation, so the L2 distance between the reconstructed frame and the original frame should be minimized.
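The three-term structure described above can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's Equation (1): the log-cosh contrast function, the squared-difference negentropy approximation, the global standardization of the bases, the use of means instead of sums, and all function and argument names are choices made for this sketch.

```python
import numpy as np


def log_cosh(u, a=1.0):
    """Contrast function G(u) = log(cosh(a*u)) / a, computed stably."""
    au = np.abs(a * np.asarray(u, dtype=float))
    return (au + np.log1p(np.exp(-2.0 * au)) - np.log(2.0)) / a


def ica_encoder_loss(M, B, x, x_hat, a=1.0, weights=(1.0, 1.0, 1.0)):
    """Sparsity + independence + reconstruction (assumed form)."""
    w_sparse, w_indep, w_rec = weights
    # Sparsity of the mixing tensor: L1 norm, averaged over elements.
    sparsity = np.mean(np.abs(M))
    # Independence: negentropy approximated by (E[G(y)] - E[G(nu)])^2,
    # where nu is standard Gaussian; E[G(nu)] is estimated by sampling.
    y = (B - B.mean()) / (B.std() + 1e-8)
    nu = np.random.default_rng(0).normal(size=100_000)
    negentropy = (log_cosh(y, a).mean() - log_cosh(nu, a).mean()) ** 2
    # Reconstruction: squared L2 distance, averaged over elements.
    reconstruction = np.mean((x_hat - x) ** 2)
    # Larger negentropy means less Gaussian bases, so it is subtracted.
    return w_sparse * sparsity - w_indep * negentropy + w_rec * reconstruction


rng = np.random.default_rng(1)
M = rng.normal(size=(4, 8))        # hypothetical mixing tensor
B = rng.laplace(size=(4, 16))      # hypothetical basis tensor
x = rng.normal(size=(32,))         # hypothetical input frame (flattened)
loss_exact = ica_encoder_loss(M, B, x, x)
loss_off = ica_encoder_loss(M, B, x, x + 1.0)
print(loss_off - loss_exact)       # only the reconstruction term changes
```

With all three weights equal to 1.0, as in the text, a worse reconstruction raises the loss while sparser coefficients and less Gaussian bases lower it.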
2) Contracting blocks: The contracting blocks of ICA-UNet are designed to further propagate the mixing tensor and generate the learned ones in a multi-resolution manner. As shown in Fig. 2, the contracting part is made of a sequence of contracting blocks. Each contracting block takes the mixing tensor from the preceding stage as input, propagates it sequentially through a downsampling module (i.e., convolution with stride 2) and convolution modules, and outputs a mixing tensor at a lower resolution, which is then forwarded to the next contracting block as well as the corresponding expanding block.
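To see how the resolution shrinks along the contracting path, the spatial size after each stride-2 downsampling module can be computed with the standard convolution output-size formula. The kernel size of 3, padding of 1, input size, and block count below are assumptions for illustration; only the stride of 2 comes from the text.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def contracting_shapes(shape, num_blocks):
    """Spatial shapes after each stride-2 downsampling module."""
    shapes = []
    for _ in range(num_blocks):
        shape = tuple(conv_out(n) for n in shape)
        shapes.append(shape)
    return shapes


# Hypothetical mixing-tensor size (depth, height, width) and block count.
print(contracting_shapes((8, 64, 64), 3))
# [(4, 32, 32), (2, 16, 16), (1, 8, 8)]
```

Each block halves every spatial dimension, so the number of blocks directly controls how many resolutions are available to the expanding path.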
3) Expanding blocks: The expanding blocks are designed to process the features generated by the contracting blocks and forward the outputs to the ICA-decoder blocks and the next expanding block (or concatenation block). An expanding block has two sub-tasks: upsampling the mixing tensor (by the upsample block), and calculating dimension-aligned correlation features from the neighbouring frames (by the concatenation block). Note that Fig. 2 uses distinct notation for the mixing tensors in the upsampling path and those in the downsampling path.
We use transposed convolution to upsample the mixing tensor, which yields outputs at various resolutions. Taking the outputs of the contracting blocks for two neighbouring frames, we calculate the temporal correlation features between the mixing tensors of the two frames at each resolution. The obtained correlation features explicitly provide matching information from the neighbouring frames for more accurate segmentation. The correlation features are then concatenated with the upsampled mixing tensor and forwarded through a convolution module (conv-bn-leakyrelu) for dimension reduction. These computations are carried out in the concatenation block, as shown in Fig. 2.
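One common way to compute such matching information is a correlation layer that takes channel-wise dot products between features of one frame and displaced features of the other. The sketch below is a simplified 1D version under assumptions: whether the paper uses exactly this form, the displacement range `max_disp`, and the normalization by the channel count are not stated in the text.

```python
import numpy as np


def correlation_1d(f1, f2, max_disp):
    """Correlate two feature maps of shape (C, L) over displacements.

    Returns an array of shape (2 * max_disp + 1, L); row d + max_disp holds
    the normalized dot product of f1[:, i] and f2[:, i + d].
    """
    C, L = f1.shape
    out = np.zeros((2 * max_disp + 1, L))
    for d in range(-max_disp, max_disp + 1):
        for i in range(L):
            j = i + d
            if 0 <= j < L:                       # skip out-of-range positions
                out[d + max_disp, i] = f1[:, i] @ f2[:, j] / C
    return out


# Toy features: position i of frame 1 carries the one-hot vector e_i, and
# frame 2 is frame 1 shifted right by one position.
f1 = np.eye(4)
f2 = np.roll(f1, 1, axis=1)
corr = correlation_1d(f1, f2, max_disp=1)
print(corr.argmax(axis=0)[:3])   # displacement index 2 means d = +1
```

The peak along the displacement axis recovers the shift between the two frames, which is the kind of matching cue that is concatenated with the mixing tensor before dimension reduction.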
4) ICA-decoder: The ICA-decoder block is designed to mimic the mixing operation between the concatenated mixing tensor and the basis tensor, as in standard ICA, to generate the output for evaluation. As discussed earlier, transposed convolution is used as the mixing operation. The transposed convolution acts both as upsampling for evaluation and as the multiplicative projection between the basis tensor and each value in the mixing tensor, which helps reduce both the parameter size and the computation load.
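Why a transposed convolution can play the role of the ICA mixing step is easiest to see in one dimension: every coefficient scales a copy of its basis, and the scaled copies are pasted into the output at strided positions. The sketch below is a toy version with hypothetical values, in which the bases act as the kernels; the actual ICA-decoder operates on multi-channel 3D tensors.

```python
import numpy as np


def transposed_conv1d(coeffs, kernels, stride):
    """1D transposed convolution summed over channels.

    coeffs:  (K, L) array, one coefficient sequence per basis
    kernels: (K, k) array, one basis of length k per channel
    """
    K, L = coeffs.shape
    k = kernels.shape[1]
    out = np.zeros((L - 1) * stride + k)
    for c in range(K):
        for i in range(L):
            # Each coefficient scales its basis; the result is added at i*stride.
            out[i * stride:i * stride + k] += coeffs[c, i] * kernels[c]
    return out


M = np.array([[1.0, 2.0, 0.0],     # coefficients of basis 0
              [0.0, 1.0, -1.0]])   # coefficients of basis 1
B = np.array([[1.0, 1.0],          # basis 0
              [1.0, -1.0]])        # basis 1
y = transposed_conv1d(M, B, stride=2)
print(y)   # [ 1.  1.  3.  1. -1.  1.]
```

With the stride equal to the basis length, the output is exactly the sum of Kronecker products of each coefficient sequence with its basis, i.e., a patch-wise linear combination of the bases, and its length is twice that of the input, which is the upsampling effect mentioned above.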
After mixing, the mixed features are propagated through a convolution module as the output for evaluation. From to we can obtain a total of multi-resolution segmentation outputs for evaluations, denoted as , . After we get from , we forward , , , , and to the final concatenation block and its corresponding ICA-decoder, where we obtain the output with the same size as the original input. Thus, we obtain a total of outputs in multi-resolution for evaluation.
Objective function: We evaluate the outputs from the decoder blocks against the multi-resolution ground truth. The overall objective function is
where the first group of terms contains the evaluation losses of the multi-resolution outputs of the decoder, each with its own loss weight, and the last term is the ICA-encoder loss from Equation (1), also with its own weight. Cross-entropy against rescaled versions of the ground truth is used to calculate the evaluation losses.
Latency reduction analysis
For the inference of a frame, the mixing tensor, as the input to the backbone, is composed of the coefficient tensors, each of which is smaller than the original frame that a regular U-Net would take as input. In addition, these coefficient tensors can be handled in parallel (i.e., a new form of task parallelism) due to the independence between the bases. Note that the processing of each coefficient tensor can still utilize any existing parallelization techniques such as model or operator parallelization [8, 6, 4, 21] by applying them to the backbone. Therefore, significant latency reduction can be achieved.
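The task-parallel structure can be sketched as a map over independent inputs. In the toy example below, `backbone` is a hypothetical stand-in for the network applied to one coefficient tensor, and Python threads only illustrate the structure (for CPU-bound work they do not give a speedup); on a GPU the same idea would typically be realized by batching or concurrent kernels.

```python
from concurrent.futures import ThreadPoolExecutor


def backbone(coeff):
    """Hypothetical stand-in for the backbone applied to one coefficient tensor."""
    return [2.0 * v + 1.0 for v in coeff]


def run_serial(coeff_tensors):
    """Process the coefficient tensors one after another."""
    return [backbone(c) for c in coeff_tensors]


def run_parallel(coeff_tensors, workers=4):
    """Process the coefficient tensors as independent tasks."""
    # No task reads another task's output, so no synchronization is needed;
    # map() returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(backbone, coeff_tensors))


# Three toy coefficient tensors, one per basis.
coeff_tensors = [[float(i + j) for j in range(4)] for i in range(3)]
print(run_serial(coeff_tensors) == run_parallel(coeff_tensors))   # True
```

Because the tasks are independent, the parallel result matches the serial one, and the critical path is the time to process a single small tensor rather than the whole frame.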
Experiment Setup: We evaluate our model on an extended ACDC MICCAI 2017 challenge dataset made available by MSU-Net [24], with labels on all the frames in the training data. We compare our ICA-UNet with MSU-Net, the state-of-the-art real-time cine MRI segmentation method, on the same dataset. We perform 5-fold cross-validation and use the average Dice score (the higher the better) to evaluate the segmentation accuracy. To further see how the accuracy of ICA-UNet compares against the state-of-the-art offline segmentation methods that achieve high accuracy but lose real-time performance, we evaluate the test data by submitting the segmentation results of ED and ES instants to the ACDC online evaluation platform [1].
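For reference, the Dice score of one class is twice the overlap between prediction and ground truth divided by the sum of their sizes. The sketch below computes it for integer label maps; the label values and the convention of returning 1.0 when a class is absent from both maps are assumptions for illustration, not the ACDC platform's implementation.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for a single class label."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0   # assumed convention: class absent from both maps
    return 2.0 * np.logical_and(p, t).sum() / denom


# Toy 2D label maps with hypothetical labels (0 = background, 1 and 2 = classes).
pred = np.array([[0, 1, 1],
                 [2, 2, 0]])
target = np.array([[0, 1, 0],
                   [2, 2, 2]])
print(dice_score(pred, target, 1))   # 2 * 1 / (2 + 1)
print(dice_score(pred, target, 2))   # 2 * 2 / (2 + 3)
```

Averaging this quantity over classes and over the cross-validation folds gives the kind of average Dice score reported in the experiments.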
All the methods were implemented in PyTorch and trained from scratch with the same hyperparameters and optimizer settings. MSU-Nets were based on the implementations by [24]. All the networks are fully parallelized using CUDA/CuDNN. All experiments were run on a machine with 16 cores of Intel Xeon E5-2620 v4 CPU, 256 GB of memory, and an NVIDIA Tesla P100 GPU.
|Methods||Dice score||Hausdorff (mm)|
Performance on real-time segmentation: The results of ACDC 3D cardiac cine MRI segmentation are shown in Table 1. We can see that ICA-UNet increases the Dice score by 0.061 (RV), 0.050 (MYO), 0.051 (LV), and 0.053 (average), respectively, compared with the best results achieved by MSU-Nets. ICA-UNets also achieve smaller Dice score variations than MSU-Nets in most cases. In terms of throughput, although both ICA-UNets and MSU-Nets can satisfy the real-time requirement of 22 FPS, only ICA-UNets can meet the real-time latency requirement (below 50 ms), being up to 12.6× faster than MSU-Nets. In summary, ICA-UNet not only achieves the best Dice score, but is also the only real-time segmentation method that can simultaneously meet the real-time throughput and latency requirements for visual guidance of cardiac interventions.
From the table, we can also see that the number of convolutional decoder blocks is an effective tuning knob for the tradeoff between Dice score and speed. A higher number of blocks results in higher Dice scores at the cost of slightly reduced throughput and increased latency. A visualization of segmentation results by ICA-UNet along with the corresponding ground truth is shown in Fig. 3.
Accuracy vs. state-of-the-art offline methods: To see how the accuracy of ICA-UNet compares with state-of-the-art offline segmentation methods, which do not satisfy real-time requirements, we further verify our ICA-UNet on ED and ES instants of the ACDC test data. The evaluation results reported by the ACDC online evaluation platform [1], in terms of both Dice score and Hausdorff distance, are shown in Table 2. The results from the best approaches in the literature, including GridNet [29], Ω-Net [23], and ensemble U-Net [14], are also included for reference. With complex network structures, the latency and throughput of these methods are far from the real-time requirements. In contrast, we see that the accuracy of ICA-UNet comes very close to these state-of-the-art results while meeting the real-time throughput and latency requirements.
Inspired by ICA, ICA-UNet decomposes temporal frames in 3D cardiac cine MRI into independent bases and the corresponding coefficient tensors, which are much smaller in size and help the network learn better. Experimental results show that, compared with the state of the art, ICA-UNet is the only 3D cardiac cine MRI segmentation method that can satisfy both real-time throughput and latency requirements with comparable (if not better) accuracy.
- [1] ACDC challenge, https://www.creatis.insa-lyon.fr/Challenge/acdc/
- [2] Annett, M., Ng, A., Dietz, P., Bischof, W., Gupta, A.: How low should we go? Understanding the perception of latency while inking. In: 2014 Graphics Interface. pp. 167–174 (2014)
- [3] Bronstein, A.M., Bronstein, M.M., Zibulevsky, M., Zeevi, Y.Y.: Sparse ICA for blind separation of transmitted and reflected images. International Journal of Imaging Systems and Technology 15(1), 84–91 (2005)
- [4] Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., Shelhamer, E.: cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014)
- [5] Delorme, A., Sejnowski, T., Makeig, S.: Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. Neuroimage 34(4), 1443–1449 (2007)
- [6] Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M., Van Essen, B.: Improving strong-scaling of CNN training by exploiting finer-grained parallelism. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). pp. 210–220. IEEE (2019)
- [7] Gaspar, T., Piorkowski, C., Gutberlet, M., Hindricks, G.: Three-dimensional real-time MRI-guided intracardiac catheter navigation. European Heart Journal 35(9), 589–589 (2014)
- [8] Gholami, A., Azad, A., Jin, P., Keutzer, K., Buluc, A.: Integrated model, batch, and domain parallelism in training neural networks. In: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures. pp. 77–86 (2018)
- [9] Hoyer, P.O., Hyvärinen, A.: Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems 11(3), 191–210 (2000)
- [10] Hyvärinen, A., Karhunen, J., Oja, E.: Independent component analysis, vol. 46. John Wiley & Sons (2004)
- [11] Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Networks 13(4-5), 411–430 (2000)
- [12] Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2462–2470 (2017)
- [13] Iltis, P.W., Frahm, J., Voit, D., Joseph, A.A., Schoonderwaldt, E., Altenmüller, E.: High-speed real-time magnetic resonance imaging of fast tongue movements in elite horn players. Quantitative Imaging in Medicine and Surgery 5(3), 374 (2015)
- [14] Isensee, F., Jaeger, P.F., Full, P.M., Wolf, I., Engelhardt, S., Maier-Hein, K.H.: Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In: International Workshop on Statistical Atlases and Computational Models of the Heart. pp. 120–129. Springer (2017)
- [15] McVeigh, E.R., Guttman, M.A., Lederman, R.J., Li, M., Kocaturk, O., Hunt, T., Kozlov, S., Horvath, K.A.: Real-time interactive MRI-guided cardiac surgery: Aortic valve replacement using a direct apical approach. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 56(5), 958–964 (2006)
- [16] Olshausen, B.A., Field, D.J.: Natural image statistics and efficient coding. Network: Computation in Neural Systems 7(2), 333–339 (1996)
- [17] Radau, P.E., Pintilie, S., Flor, R., Biswas, L., Oduneye, S.O., Ramanan, V., Anderson, K.A., Wright, G.A.: Vurtigo: visualization platform for real-time, MRI-guided cardiac electroanatomic mapping. In: International Workshop on Statistical Atlases and Computational Models of the Heart. pp. 244–253. Springer (2011)
- [18] Rogers, T., Mahapatra, S., Kim, S., Eckhaus, M.A., Schenke, W.H., Mazal, J.R., Campbell-Washburn, A., Sonmez, M., Faranesh, A.Z., Ratnayaka, K., et al.: Transcatheter myocardial needle chemoablation during real-time magnetic resonance imaging: a new approach to ablation therapy for rhythm disorders. Circulation: Arrhythmia and Electrophysiology 9(4), e003926 (2016)
- [19] Schaetz, S., Voit, D., Frahm, J., Uecker, M.: Accelerated computing in magnetic resonance imaging: Real-time imaging using nonlinear inverse reconstruction. Computational and Mathematical Methods in Medicine 2017 (2017)
- [20] Starck, J.L., Elad, M., Donoho, D.L.: Image decomposition via the combination of sparse representations and a variational approach. IEEE Transactions on Image Processing 14(10), 1570–1582 (2005)
- [21] Vasudevan, A., Anderson, A., Gregg, D.: Parallel multi channel convolution using general matrix multiplication. In: 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP). pp. 19–24. IEEE (2017)
- [22] Vergara, G.R., Vijayakumar, S., Kholmovski, E.G., Blauer, J.J., Guttman, M.A., Gloschat, C., Payne, G., Vij, K., Akoum, N.W., Daccarett, M., et al.: Real-time magnetic resonance imaging–guided radiofrequency atrial ablation and visualization of lesion formation at 3 Tesla. Heart Rhythm 8(2), 295–303 (2011)
- [23] Vigneault, D.M., Xie, W., Ho, C.Y., Bluemke, D.A., Noble, J.A.: Ω-Net (omega-net): fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks. Medical Image Analysis 48, 95–106 (2018)
- [24] Wang, T., Xiong, J., Xu, X., Jiang, M., Yuan, H., Huang, M., Zhuang, J., Shi, Y.: MSU-Net: Multiscale statistical U-Net for real-time 3D cardiac MRI video segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 614–622. Springer (2019)
- [25] Wang, T., Xiong, J., Xu, X., Shi, Y.: SCNN: A general distribution based statistical convolutional neural network with application to video object detection. arXiv preprint arXiv:1903.07663 (2019)
- [26] Xu, X., Lu, Q., Yang, L., Hu, S., Chen, D., Hu, Y., Shi, Y.: Quantization of fully convolutional networks for accurate biomedical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8300–8308 (2018)
- [27] Xu, X., Wang, T., Shi, Y., Yuan, H., Jia, Q., Huang, M., Zhuang, J.: Whole heart and great vessel segmentation in congenital heart disease using deep neural networks and graph matching. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 477–485. Springer (2019)
- [28] Yan, W., Wang, Y., Li, Z., Van Der Geest, R.J., Tao, Q.: Left ventricle segmentation via optical-flow-net from short-axis cine MRI: preserving the temporal coherence of cardiac motion. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 613–621. Springer (2018)
- [29] Zotti, C., Luo, Z., Lalande, A., Jodoin, P.M.: Convolutional neural network with shape prior applied to cardiac MRI segmentation. IEEE Journal of Biomedical and Health Informatics (2018)