Multi-channel MR Reconstruction (MC-MRRec) Challenge – Comparing Accelerated MR Reconstruction Models and Assessing Their Generalizability to Datasets Collected with a Different Number of Receiver Coils

11/10/2020 · by Youssef Beauferris, et al.

The 2020 Multi-channel Magnetic Resonance Reconstruction (MC-MRRec) Challenge had two primary goals: 1) compare different MR image reconstruction models on a large dataset and 2) assess the generalizability of these models to datasets acquired with a different number of receiver coils (i.e., multiple channels). The challenge had two tracks: Track 01 focused on assessing models trained and tested with 12-channel data. Track 02 focused on assessing models trained with 12-channel data and tested on both 12-channel and 32-channel data. While the challenge is ongoing, here we describe the first edition of the challenge and summarise submissions received prior to 5 September 2020. Track 01 had five baseline models and received four independent submissions. Track 02 had two baseline models and received two independent submissions. This manuscript provides relevant comparative information on the current state-of-the-art of MR reconstruction and highlights the challenges of obtaining generalizable models that are required prior to clinical adoption. Both challenge tracks remain open and will provide an objective performance assessment for future submissions. Subsequent editions of the challenge are proposed to investigate new concepts and strategies, such as the integration of potentially available longitudinal information during the MR reconstruction process. An outline of the proposed second edition of the challenge is presented in this manuscript.


1 Introduction

Magnetic resonance (MR) imaging is a commonly used diagnostic imaging modality. It is a non-invasive technique that provides images with excellent soft-tissue contrast. Brain MR imaging, for example, produces a wealth of information, which often leads to definitive diagnosis of a number of neurological conditions, such as cancer and stroke. MR data acquisition occurs in the Fourier or spatial-frequency domain, more commonly referred to as k-space. Image reconstruction consists of transforming the acquired k-space raw data into interpretable images. Traditionally, data are collected following the Nyquist sampling theorem, and a simple inverse Fourier transform operation is sufficient to reconstruct an image. The physics underlying the MR data acquisition process, however, makes fully sampled acquisitions inherently slow. This fact represents a crucial drawback when MR imaging is compared to other medical imaging modalities and impacts both patient tolerance of the procedure and throughput.
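To make the mechanics concrete, the following sketch (NumPy; the random k-space array is a toy stand-in for acquired raw data) reconstructs an image from fully sampled k-space with a centred 2D inverse Fourier transform:

```python
import numpy as np

def ifft2c(kspace):
    """Centred 2D inverse FFT: k-space -> image for Nyquist-sampled data."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))

# Toy stand-in for acquired raw data; real k-space comes from the scanner.
kspace = np.random.randn(256, 256) + 1j * np.random.randn(256, 256)
image = np.abs(ifft2c(kspace))  # magnitude image for display
```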

Parallel imaging (PI) (sense; grappa; deshmane2012parallel) and compressed sensing (CS) (lustig2007; liang2009accelerating) are two proven approaches able to reconstruct high-fidelity images from sub-Nyquist sampled acquisitions. PI techniques leverage the spatial information available across multiple, spatially distinct receiver coils to allow reconstruction of undersampled k-space data. PI techniques such as generalized autocalibrating partially parallel acquisition (GRAPPA) (grappa), which operates in the k-space domain, and sensitivity encoding for fast MR imaging (SENSE) (sense), which works in the image domain, are currently used clinically. CS methods leverage image sparsity properties to allow reconstruction of undersampled k-space data. Some CS techniques, such as compressed SENSE (liang2009accelerating), have also seen clinical adoption.

Those PI and CS methods that have been approved for routine clinical use are generally restricted to relatively conservative acceleration factors. Currently employed MR scanning protocols, even those that use PI and CS, typically require between 30 and 45 minutes per patient procedure. Longer procedural times lessen the likelihood of patient acceptance and likely increase susceptibility to motion artifacts. Relatively long procedural times, combined with the high cost of installing an MR scanner and ongoing operational and maintenance costs, also make MR examinations relatively expensive. Finally, long procedure times and the limited number of installed scanners in many jurisdictions can render these procedures less accessible (e.g., wait times of 9 weeks in Canada (waittimes)).

In 2016, the first deep-learning-based MR image reconstruction models were presented (wang2016; sun2016). The excellent initial results obtained by these models caught the attention of the MR imaging community, and subsequently, dozens of deep-learning-based MR reconstruction models were proposed; cf. (hammernik2018; sun2016; wang2016; schlemper2017; dedmari2018complex; Schlemper2018StochasticDC; kwon2017parallel; RN323; dautomap; pawar2019deep; eo2018kiki; qin2019convolutional; mardani2019deep; zhang2018multi; semantic_interpretability; akccakaya2019scan; souza2018hybrid; RN305; RN307; RN289; RN311; zengK2019; sriram2020grappanet; zhou2020dudornet; hosseini2020dense) for a partial listing. Many of these studies demonstrated superior quantitative results from deep-learning-based methods compared to non-deep-learning-based MR reconstruction algorithms (hammernik2018; schlemper2017; knoll2020advancing). Deep-learning-based methods were found to be capable of simultaneously leveraging the spatial information across multiple receiver coils and image sparsity (i.e., PI and CS combined). These new methods are also capable of accelerating MR examinations more than traditional PI and CS methods, and there is good evidence that they can accelerate MR examinations by substantial factors (zbontar2018fastmri). Image reconstruction is considered by many a frontier of machine learning (wang2018image).

A significant drawback that hinders progress in the MR reconstruction field is the lack of benchmark datasets, which makes comparison of different methods challenging. The fastMRI effort (zbontar2018fastmri) is an excellent initiative that provides large volumes of raw MR k-space data. The initial release of the fastMRI dataset provided two-dimensional (2D) MR acquisitions of the knee; a subsequent release added brain MR data. The Calgary-Campinas (souza2017) initiative contains numerous sets of brain imaging data and, for MR reconstruction experiments, also provides MR raw data from three-dimensional (3D) acquisitions. The k-space datasets correspond to either 12- or 32-channel data.

The goals of the first edition of the Multi-channel MR Reconstruction (MC-MRRec) challenge were to help fill that gap in terms of MR reconstruction benchmarks, facilitate comparison of different reconstruction models, better understand the challenges related to clinical adoption of these models, and investigate the upper limits of MR acceleration. The specific objectives of the first edition of the challenge (2020) were:

  1. Compare different MR reconstruction models on a large dataset, and

  2. Assess the generalizability of these models to datasets acquired with a different number of channels.

The results presented in this report correspond to challenge submissions received up to 27 October 2020. These results provide a relevant performance summary of state-of-the-art MR reconstruction approaches, including different model architectures, processing strategies, and emerging metrics for training and assessing reconstruction models. This edition of the MC-MRRec challenge continues as an online challenge and is accepting new submissions.

Subsequent editions of the challenge will expand on these efforts and investigate pertinent new concepts and strategies. In the second edition of the challenge, we plan to investigate the integration of potentially available longitudinal information during MR image reconstruction (known as longitudinally integrated MR imaging, LI-MRI).

2 Materials and Methods

2.1 Calgary-Campinas Raw MR Dataset

The data used in this challenge were acquired as part of the Calgary Normative Study (mccreary2020calgary), a multi-year, longitudinal project that investigates normal human brain ageing by acquiring quantitative MR imaging data using a protocol approved by our local research ethics board. Raw data from T1-weighted volumetric imaging were anonymized and incorporated into the Calgary-Campinas (CC) dataset (souza2017). The dataset currently provides k-space data from 167 3D, T1-weighted, gradient-recalled echo, 1 mm isotropic sagittal acquisitions collected on a clinical 3-T MR scanner (Discovery MR750; General Electric Healthcare, Waukesha, WI). The brain scans are from presumed healthy subjects.

The datasets were acquired using either a 12-channel (117 scans) or 32-channel (50 scans) receiver coil. Acquisition parameters were TR/TE/TI = 6.3 ms / 2.6 ms / 650 ms (93 scans) or TR/TE/TI = 7.4 ms / 3.1 ms / 400 ms (74 scans), with 170 to 180 contiguous 1.0-mm slices. The acquisition matrix for each channel was $N_x \times N_y \times N_z$, where $N_x$, $N_y$, and $N_z$ denote the readout, phase-encode, and slice-encode directions, respectively. In the slice-encode ($N_z$) direction, only 85% of the k-space data was collected; the remainder (15% of 170-180) was zero-filled. This partial acquisition technique is typical and represents a very realistic situation in MR imaging. Because k-space undersampling only occurs in the phase-encode and slice-encode directions, the 1D inverse Fourier transform (iFT) along the readout direction had already been performed, and hybrid ($x$, $k_y$, $k_z$) datasets were provided. This pre-processing effectively allows the MR reconstruction problem to be treated as a 2D problem (in $k_y$ and $k_z$). The partial Fourier reference data were reconstructed by taking the 2D iFT along the $k_y$-$k_z$ plane for each individual channel and combining the channels using the conventional square-root sum-of-squares algorithm (larsson2003snr).
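A minimal sketch of this reference reconstruction pipeline is shown below (NumPy). The (channels, x, ky, kz) array layout is an assumption for illustration, not necessarily the layout of the released files:

```python
import numpy as np

def rss_reference(hybrid_kspace):
    """
    Per-channel 2D inverse FFT over the ky-kz plane, then square-root
    sum-of-squares (RSS) channel combination, as described above.

    hybrid_kspace: complex array, assumed shape (channels, x, ky, kz).
    """
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(hybrid_kspace, axes=(-2, -1)), axes=(-2, -1)),
        axes=(-2, -1),
    )
    return np.sqrt((np.abs(coil_images) ** 2).sum(axis=0))  # combine channels
```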

2.2 MC-MRRec Challenge Description

The MC-MRRec challenge was launched in February 2020. We plan for this challenge to be ongoing and propose to use a combination of sessions at meetings and virtual sessions, supplemented by periodic on-line updates. The challenge is readily extensible and new challenge tracks are expected to be added in further editions.

The initial results of the first edition of the challenge were released on 9 July 2020, during the 2020 Medical Imaging with Deep Learning virtual conference (https://2020.midl.io/). After the conference, the challenge continued online; it continues to receive submissions and to update its online leader-board (https://sites.google.com/view/calgary-campinas-dataset/mr-reconstruction-challenge). On average, it takes 48 hours from receipt of a submission to an update of the leader-board.

The first edition of the MC-MRRec challenge was split into two separate tracks. Teams could decide whether to submit a solution to just one track or to both tracks. Each track has a separate leader-board. The tracks were:

  • Track 01: Teams had access to 12-channel data to train and validate their models. Models submitted were evaluated only using the 12-channel test data.

  • Track 02: Teams had access to 12-channel data to train and validate their models. Models submitted were evaluated on both the 12-channel and 32-channel test data.

In both tracks, the goal was to assess MR reconstruction quality and the loss of high-frequency details, which is especially noticeable at the higher acceleration rate. By having two separate tracks, we hoped to determine whether a generic reconstruction model trained on data from one coil would have decreased performance when applied to data from another coil. Two MR acceleration factors (R) were tested: R = 5 and R = 10. These factors were chosen intentionally to exceed the acceleration factors typically used clinically. A Poisson disc distribution sampling scheme, in which the centre of k-space was fully sampled within a circle of radius 16 pixels to preserve the low-frequency phase information, was used to achieve these acceleration factors.
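The sketch below approximates this sampling scheme (NumPy). It reproduces the fully sampled 16-pixel-radius central disc and a smoothly decaying sampling density, but uses independent random draws rather than a true Poisson disc sampler (which also enforces a minimum distance between samples); the density function and matrix size are assumptions for illustration.

```python
import numpy as np

def undersampling_mask(ny, nz, acceleration, center_radius=16, seed=0):
    """Binary ky-kz sampling mask with a fully sampled central disc."""
    rng = np.random.default_rng(seed)
    ky, kz = np.meshgrid(np.arange(ny) - ny / 2.0,
                         np.arange(nz) - nz / 2.0, indexing="ij")
    r = np.sqrt(ky ** 2 + kz ** 2)
    density = 1.0 / (1.0 + r / center_radius)  # heuristic decay with radius
    density = np.clip(density * (ny * nz / acceleration) / density.sum(), 0.0, 1.0)
    mask = rng.random((ny, nz)) < density
    mask[r <= center_radius] = True            # fully sampled centre circle
    return mask

mask = undersampling_mask(218, 170, acceleration=5)  # assumed matrix size
# Undersampled k-space is obtained by element-wise multiplication: kspace * mask
```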

Teams that participated in the challenge had to agree to the posted Challenge Code of Conduct, to make their code publicly available, and to provide a two-page maximum description of their method or a reference, if their method had been published. The provided method descriptions are available as supplementary material. Participating teams are also encouraged to submit independent manuscripts describing their methods.

The training, validation, and test split of the dataset is summarized in Table 1. The initial 50 and last 50 slices of each test-set volume were removed because they have little anatomy present. The fully sampled k-space data of the training and validation sets were made public for teams to develop their models. Pre-undersampled k-space data corresponding to the test sets were provided to the teams for accelerations of R = 5 and R = 10.

Coil          Category      # of datasets   # of slices
12-channel    Train         47              12,032
              Validation    20              5,120
              Test          50              7,800
32-channel    Test          50              7,800

Table 1: Summary of the raw MR k-space datasets used in the first edition of the challenge. The numbers of slices in the test sets are counted after removal of the initial 50 and last 50 slices of each volume, which have little anatomy present.

2.3 Evaluation Metrics

In order to measure the quality of the image reconstructions, three commonly used quantitative performance metrics were selected: peak signal-to-noise ratio (pSNR), structural similarity (SSIM) index (wang2004), and visual information fidelity (VIF) (sheikh2006). The choice of performance metrics is challenging, and it is recognized that objective measures like pSNR, SSIM, and VIF may not correlate well with subjective human image quality assessments. Nonetheless, these metrics provide a broad basis to assess model performance in this challenge.

The pSNR is a metric commonly used for MR reconstruction assessment and consists of the log ratio between the maximum value of the reference reconstruction and the root mean squared error (RMSE):

$$\mathrm{pSNR}(v, \hat{v}) = 20\log_{10}\left(\frac{\max(v)}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(v_i - \hat{v}_i)^2}}\right) \qquad (1)$$

where $v$ is the reference image, $\hat{v}$ is the reconstructed image, and $N$ is the number of pixels in the image. Higher pSNR values represent higher-fidelity image reconstructions. However, pSNR does not take into consideration factors involved in human vision. For this reason, an increased pSNR can suggest that a reconstruction is of higher quality when it may not be better perceived by the human visual system.
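A direct translation of Eq. (1), as a sketch in NumPy, assuming both images are real-valued arrays normalized as described later in this section:

```python
import numpy as np

def psnr(reference, reconstruction):
    """pSNR per Eq. (1): log ratio of the reference maximum to the RMSE."""
    rmse = np.sqrt(np.mean((reference - reconstruction) ** 2))
    return 20.0 * np.log10(reference.max() / rmse)
```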

Unlike pSNR, SSIM and VIF are metrics that attempt to model aspects of the human visual system. SSIM considers perceptual factors like luminance, contrast, and structural information. SSIM is computed using:

$$\mathrm{SSIM}(w, \hat{w}) = \frac{(2\mu_w\mu_{\hat{w}} + c_1)(2\sigma_{w\hat{w}} + c_2)}{(\mu_w^2 + \mu_{\hat{w}}^2 + c_1)(\sigma_w^2 + \sigma_{\hat{w}}^2 + c_2)} \qquad (2)$$

where $w$ and $\hat{w}$ represent corresponding image windows from the reference image and the reconstructed image, respectively; $\mu_w$ and $\sigma_w$ represent the mean and standard deviation inside the reference window $w$; $\mu_{\hat{w}}$ and $\sigma_{\hat{w}}$ represent the mean and standard deviation inside the reconstructed window $\hat{w}$; and $\sigma_{w\hat{w}}$ is the covariance between the two windows. The constants $c_1$ and $c_2$ are used to avoid numerical instability. SSIM values for non-negative images are within $[0, 1]$, where 1 indicates two identical images.
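In practice SSIM is rarely re-implemented from scratch; scikit-image provides a standard implementation. The snippet below is a sketch (the random arrays are placeholders for real slices); data_range=1.0 assumes the challenge normalization by the reference maximum:

```python
import numpy as np
from skimage.metrics import structural_similarity

reference = np.random.rand(218, 170)  # placeholder slice; replace with real data
reconstruction = reference + 0.01 * np.random.randn(218, 170)

# SSIM is computed over sliding windows and averaged across the image.
ssim_value = structural_similarity(reference, reconstruction, data_range=1.0)
```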

The VIF metric is based on natural scene statistics (simoncelli2001; wilson2008). VIF models natural scene statistics using a Gaussian scale mixture in the wavelet domain, and additive white Gaussian noise is used to model the human visual system (HVS). The natural scene of the reference image is modeled by its wavelet components ($C$), and the HVS is modeled by adding zero-mean white Gaussian noise in the wavelet domain ($N$), which results in the perceived reference image ($E = C + N$). In the same way, the reconstructed image, called the distorted image, is modeled by a natural scene model ($D$) and an HVS noise model ($N'$), leading to the perceived distorted image ($F = D + N'$). VIF is given by the ratio of the mutual information terms $I(C; F)$ and $I(C; E)$:

$$\mathrm{VIF} = \frac{I(C; F)}{I(C; E)} \qquad (3)$$

where $I(\cdot\,;\cdot)$ represents mutual information.

Mason et al. (mason2019comparison) investigated the VIF metric for assessing MR reconstruction quality. Their results indicated that it has a stronger correlation with subjective radiologist opinion about MR image quality than other metrics like pSNR and SSIM. VIF's Gaussian noise variance was set as recommended in (mason2019comparison). All metrics were computed slice-by-slice on the test set. The reference and reconstructed images were normalized by dividing them by the maximum value of the reference image.
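A sketch of this slice-by-slice evaluation protocol is given below (NumPy/scikit-image). The volume layout (slices on the first axis) is an assumption, and VIF is omitted because it requires a dedicated implementation; a public one (e.g., from the sewar package) could be substituted in the loop:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_volume(reference_vol, recon_vol):
    """Normalize by the reference maximum, then average metrics over slices."""
    scale = reference_vol.max()
    ref, rec = reference_vol / scale, recon_vol / scale
    psnrs, ssims = [], []
    for ref_slice, rec_slice in zip(ref, rec):  # assumes slices on axis 0
        psnrs.append(peak_signal_noise_ratio(ref_slice, rec_slice, data_range=1.0))
        ssims.append(structural_similarity(ref_slice, rec_slice, data_range=1.0))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```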

2.4 Ranking Criteria

The ranking criteria are based on the three metrics described previously (pSNR, SSIM, VIF). For each slice in the test set and for the R = 5 and R = 10 acceleration factors, the three metrics were extracted and their average values computed. For each metric, teams were ranked from best to worst average value; note that these rankings are computed across acceleration factors. The next step was to compute a composite score using the following weighted sum of the per-metric ranks:

$$\mathrm{score} = w_{\mathrm{pSNR}}\,\mathrm{rank}_{\mathrm{pSNR}} + w_{\mathrm{SSIM}}\,\mathrm{rank}_{\mathrm{SSIM}} + w_{\mathrm{VIF}}\,\mathrm{rank}_{\mathrm{VIF}} \qquad (4)$$

where $\mathrm{rank}_{\mathrm{pSNR}}$, $\mathrm{rank}_{\mathrm{SSIM}}$, and $\mathrm{rank}_{\mathrm{VIF}}$ are a team's ranks for the corresponding metrics and $w_{\mathrm{pSNR}}$, $w_{\mathrm{SSIM}}$, and $w_{\mathrm{VIF}}$ are the corresponding weights. The final challenge ranking was obtained by ordering the composite scores for each team from the lowest to the highest value. The choice of weights has a strong influence on the final ranking; in this challenge, we gave priority to the metrics that try to match features of human perception, so SSIM and VIF were weighted more heavily than pSNR.
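Under this reading of Eq. (4), the ranking procedure can be sketched as follows (the weight values are illustrative placeholders, not the challenge's exact weights):

```python
import numpy as np

def final_ranking(metrics, weights):
    """
    Rank teams per metric (1 = best, assuming higher metric values are
    better), combine ranks with the weighted sum of Eq. (4), and order
    teams from lowest (best) to highest composite score.
    """
    n_teams = len(next(iter(metrics.values())))
    score = np.zeros(n_teams)
    for name, values in metrics.items():
        order = np.argsort(-np.asarray(values))  # team indices, best to worst
        ranks = np.empty(n_teams)
        ranks[order] = np.arange(1, n_teams + 1)  # best team gets rank 1
        score += weights[name] * ranks
    return np.argsort(score)                      # final order, best first

order = final_ranking(
    {"pSNR": [30.1, 29.5, 31.2], "SSIM": [0.91, 0.93, 0.92], "VIF": [0.80, 0.85, 0.83]},
    {"pSNR": 0.2, "SSIM": 0.4, "VIF": 0.4},       # illustrative weights only
)
```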

2.5 Models

Track 01 of the challenge had five baseline models selected from the literature: the zero-filled reconstruction, the U-net model (RN254), the WW-net model (souza2020dual), the hybrid-cascade model (souza19a), and a parallel-hybrid model. Track 01 also received four independent submissions, from teams ResoNNance (lonning2019recurrent), The Enchanted, TUMRI, and M-L UNICAMP. Links to the source code for all models and descriptions of the unpublished methods are available in the supplementary material.

Track 02 of the challenge had two baseline models: the zero-filled reconstruction and the U-net model. Teams ResoNNance and The Enchanted also submitted their models to Track 02 of the challenge.

3 Results

To date, four teams have participated in challenge Track 01 and two teams in challenge Track 02. For comparison, five baseline models were included in Track 01 and two in Track 02 to help establish a performance measurement baseline.

3.1 Track 01

The quantitative results for Track 01 and the rankings for this track are shown in Table 2 and Table 3, respectively. There were nine entries (five baseline and four submitted) in Track 01. For R = 5, team ResoNNance achieved the best SSIM and pSNR metrics, while achieving the third best VIF metric. Team The Enchanted obtained the best VIF metric and the second best SSIM value, but only the seventh best pSNR measure. Team TUMRI achieved the second best VIF score.

For R = 10, team The Enchanted achieved the best VIF and the second best SSIM metrics, but the eighth best pSNR score. Team ResoNNance achieved the best SSIM and pSNR scores and the second best VIF measure. As anticipated, the metric scores indicated worse performance for each method at R = 10 compared to R = 5.

In the final ranking, which combines the R = 5 and R = 10 metrics, team ResoNNance was ranked first, followed by team The Enchanted. Although team The Enchanted achieved the best overall VIF metric and second best overall SSIM, it ranked only seventh in terms of pSNR. Team ResoNNance had more consistent results across the metrics and was ranked first in terms of SSIM and pSNR and second in terms of VIF. Team TUMRI was ranked third overall; it achieved the second best pSNR, third best VIF, and fifth best SSIM score. Three of the four submitted models were ranked above the best performing baseline approach (WW-net) in Track 01. Representative reconstructions resulting from the different models for R = 5 and R = 10 are depicted in Figure 1 and Figure 2, respectively.

3.2 Track 02

Two teams submitted responses to the Track 02 challenge, and their results were compared to two baseline techniques. The results for Track 02 using the 12-channel test dataset are summarized in Table 4. Team ResoNNance achieved the best SSIM and pSNR for both R = 5 and R = 10, and the second best VIF for both acceleration factors. Team The Enchanted achieved the best VIF score for R = 5 and R = 10, obtained the second best SSIM for both factors, and achieved the second best and third best pSNR for R = 5 and R = 10, respectively. Representative reconstructions resulting from the different models for R = 5 and R = 10 using the 12-channel test dataset are depicted in Figure 3 and Figure 4, respectively.

The results for Track 02 using the 32-channel test set are summarized in Table 5. For the 32-channel test dataset, team ResoNNance obtained the best results across all metrics and acceleration factors. Team The Enchanted was ranked second in terms of SSIM and VIF and third in terms of pSNR for both R = 5 and R = 10. Representative reconstructions resulting from the different models for R = 5 and R = 10 using the 32-channel test set are depicted in Figure 5 and Figure 6, respectively. The Track 02 rankings are summarized in Table 6.

4 Discussion

The first track of the challenge compared nine different reconstruction models (Table 2). As expected, the zero-filled reconstruction, which does not involve any training from the data, universally had the poorest results. The second worst technique was the U-net model, which uses as input the channel-wise zero-filled reconstruction and tries to recover a high-fidelity image. The employed U-net (RN254) model did not include any data consistency steps. The remaining seven models all had some form of data consistency step, which seems to be an essential step for high-fidelity reconstruction, as has been pointed out in (schlemper2017; eo2018kiki).
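The data consistency idea referenced above can be sketched in its simplest, hard-replacement form, where measured k-space samples overwrite the network's prediction; learned models often use softer, weighted variants of this step:

```python
import numpy as np

def hard_data_consistency(pred_kspace, measured_kspace, mask):
    """Keep measured k-space values where sampled; keep predictions elsewhere."""
    return np.where(mask, measured_kspace, pred_kspace)
```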

The Parallel-hybrid and M-L UNICAMP models both explored parallel architectures that operated both in the k-space and image domains. Parallel-hybrid was ranked seventh and M-L UNICAMP was tied for fifth position.

The top ranked methods were the cascaded networks (Hybrid-cascade, WW-net, TUMRI, The Enchanted) or the recurrent method (ResoNNance). The top two models were both submitted networks: ResoNNance and The Enchanted. Both of these teams estimated coil sensitivities and combined the channels, which made these models flexible and capable of working with datasets acquired with an arbitrary number of channels. Both were image-domain methods.

The other highly ranked deep learning models used an approach that receives all channels as input, which makes these models tailored to a specific coil configuration (i.e., number of channels). Though the methods that combined the channels before reconstruction achieved the best results so far, it is still unclear whether this approach is superior to models that do not combine the channels before reconstruction; a recent work (sriram2020grappanet) indicated that the latter approach may be advantageous. All of the models submitted to the MC-MRRec challenge had a relatively narrow input convolutional layer (e.g., 64 filters), which may have resulted in the loss of relevant information; in (sriram2020grappanet), 15-channel data were used and the first layer had 384 filters.

TUMRI, WW-net, and Hybrid-cascade are hybrid methods that operate in both the k-space and image domains. TUMRI is the only method that implemented a complex-valued network.

We also noted the large variability in the rankings across metrics (Table 2). For example, for R = 10, team The Enchanted had the best VIF and second best SSIM metrics but was ranked only eighth in terms of pSNR. This variability reinforces the value of organized challenges, which concisely summarise the results of multiple methods using a consistent set of multiple metrics. Studies that use a single image quality metric are potentially problematic because the chosen measure may mask performance issues. While imperfect, the use of a composite score attempts to reduce this inherent variability by combining multiple metrics.

Careful inspection of the MR image reconstructions (cf. Figure 1 and Figure 2) indicates that the models proposed by teams ResoNNance and The Enchanted are capable of reconstructing background noise that is similar to the background noise present in the reference reconstruction. This observation led us to question whether the metrics we used are best suited to determine reconstruction quality. Given a noisy reference image, a noise-free reconstruction will potentially achieve lower pSNR, SSIM, and VIF than the same reconstruction with added noise. This outcome runs contrary to human visual perception, where noise impacts image quality negatively and is, in general, undesired. One potential solution to mitigate this issue is to mask out the noise in the image background before computing the metrics.
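One possible form of this masking mitigation is sketched below (SciPy; the threshold and morphological clean-up are assumptions, not a validated protocol):

```python
import numpy as np
from scipy import ndimage

def foreground_mask(reference, threshold_fraction=0.05):
    """Estimate a brain/foreground mask from the reference image intensity."""
    mask = reference > threshold_fraction * reference.max()
    mask = ndimage.binary_closing(mask, iterations=3)  # fill small holes
    mask = ndimage.binary_opening(mask, iterations=2)  # remove isolated noise
    return mask

# Metrics would then be computed on reference * mask and reconstruction * mask.
```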

All trainable baseline models and the model submitted by M-L UNICAMP used mean squared error as their cost function. The model submitted by TUMRI was trained using a combination of multi-scale SSIM (msssim) and VIF as its cost function. The model submitted by team The Enchanted had two training stages: 1) the model was trained using mean squared error as the cost function, with the coil-combined, complex-valued, fully sampled reference as the target; and 2) their Down-Up network (yu2019deep) received as input the absolute value of the reconstruction obtained in the previous stage, with the square-root sum-of-squares fully sampled reconstruction as the reference. The Down-Up network was trained using SSIM as the loss function.

The model submitted by team ResoNNance used a combination of SSIM and the $\ell_1$ norm as the training loss function, a combination that has been shown to be effective for image restoration (zhao2016loss). Because the background in the images is quite substantial and SSIM is a bounded metric computed across image patches, models trained using SSIM as part of their loss function try to match the background noise in their reconstructions. This may explain why the models submitted by teams The Enchanted and ResoNNance were able to preserve the noise pattern in their reconstructions.

For R = 5, the top three models (TUMRI, The Enchanted, and ResoNNance) produced the most visually pleasing reconstructions and also had the top performing metrics. The reconstruction from team TUMRI missed details of the brain sulci in some cases (Figure 1). For R = 10, the reconstructed images lost much of the high-frequency detail and clearly have little clinical value (Figure 2). It is important to emphasize that R = 5 and R = 10 in the challenge are relative to the 85% of k-space that was sampled in the slice-encode direction. If we consider the equivalent full k-space, these acceleration factors would be approximately 5.9 and 11.8. Based on the Track 01 results, we would say that an acceleration between 5 and 6 might be feasible to incorporate into a clinical setting for a single-sequence MR image reconstruction model. Further analysis of the image reconstructions by radiologists is needed to better assess clinical value before reaching a definite conclusion.

Although VIF has been shown to have a strong correlation with radiologist opinion of image quality (mason2019comparison), the radiologists who participated in that study did not know which images were the fully sampled references. Their assessment of image quality therefore might not correlate well with the underlying structure being imaged, which raises the question of whether VIF is an appropriate metric to assess MR image reconstruction quality.

The second track of the challenge compared four different reconstruction models (Table 4 and Table 5). Teams The Enchanted and ResoNNance achieved the best overall results. For the 12-channel test set (Figure 3 and Figure 4), the results were similar to those they obtained in Track 01 of the challenge. More interesting are the results for the 32-channel test set. Though the metrics for the 32-channel test set are higher than for the 12-channel test set, visual inspection of the reconstructed images makes clear that the 32-channel reconstructions are of poorer quality than the 12-channel reconstructions (Figure 5 and Figure 6). This fact raises concerns about the generalizability of reconstruction models across different coils. Potential approaches to mitigate this issue are to include representative data collected with different coils in the training and validation sets or to develop data augmentation strategies that simulate data acquired under different coil configurations.

Though the generalization of learned MR image reconstruction models and their potential for transfer learning have been previously assessed (knoll2019assessment), the results from Track 02 of our challenge indicate that there is still room for improvement.

5 Future Directions

The MC-MRRec challenge is intended to provide an extensible framework to assess reconstruction models. It is also intended to be responsive to the changing needs of the scientific community it serves. We plan for subsequent editions of the challenge to investigate new concepts and strategies.

We propose that the second edition of the MC-MRRec challenge explore the use of previously acquired image data to help guide image reconstruction. MR images acquired over time are continuously revisited to guide patient care decisions. Nevertheless, potentially available longitudinal information is wholly disregarded during MR acquisition and reconstruction. Integrating longitudinal information during acquisition and reconstruction has the potential to make MR examinations more efficient, both by making them faster and by detecting longitudinal image changes seamlessly during acquisition and reconstruction. The longitudinal information referred to here is across MR scan sessions, not across MR sequences acquired within a session; methods already exist that address the related challenge of multi-sequence MR reconstruction (xiang2018deep; zhou2020dudornet).

Faster MR examinations resulting from undersampled acquisitions will reduce MR examination costs and make MR imaging more accessible to the population as a screening tool. Currently, MR examinations are interpreted mainly by radiologists, and it can take several days before they produce their medical reports. Faster MR imaging will exacerbate this issue by producing larger volumes of data that need to be analyzed. Thus, it is crucial to integrate analysis tools during acquisition and reconstruction to make the whole MR workflow more efficient, not just in terms of acquisition speed but also the speed with which the images are analyzed by radiologists. In this paradigm, a person scanned for the first time would be eligible for a baseline acceleration, while a subject who has had a previous MR examination would be eligible for a higher acceleration. Also, for subjects who have been scanned before, longitudinal change maps would be produced along with the usual reconstructed images. A preliminary investigation using the U-net model to incorporate longitudinal information improved the MR reconstruction results (souza2020enhanced). The dataset used in that study corresponded to brain scans of older, presumed normal subjects who were scanned several years apart; brain changes due to normal ageing were expected to be seen on these scans.

The second edition of the MC-MRRec challenge will include Track 03, which will investigate this longitudinally integrated MR imaging (LI-MRI) paradigm. Past subject-specific MR information is readily accessible through the picture archiving and communication systems (PACS) present in hospitals and clinics. A summary of the LI-MRI paradigm is depicted in Figure 7. Track 03 of the MC-MRRec challenge will officially launch and start receiving submissions in April 2021.

6 Summary

The MC-MRRec challenge provided an objective benchmark for assessing brain MR image reconstruction and the generalizability of models across datasets collected using different coils. Track 01 compared nine models and Track 02 compared four models. The results indicated that current models do not generalize well across datasets collected using different coils. This finding points to a promising research field in the coming years that is highly relevant for the potential clinical adoption of deep-learning-based MR image reconstruction models. The MC-MRRec continues as an online challenge, and future editions will add further tracks (i.e., Track 03, LI-MRI). The organizers of the challenge will also incorporate more data into the challenge datasets, which will potentially allow deeper models to be trained. We also expect radiologists to assess the image reconstructions in future challenge editions.

Acknowledgements

We thank the Seaman Family MR Research Centre and the Canadian Institutes of Health Research (CIHR, FDN-143298, PI: Frayne) for supporting the Calgary Normative Study and acquiring the raw datasets. The CIHR also provided ongoing operating support for this project. We also acknowledge the infrastructure funding provided by the Canada Foundation for Innovation (CFI). The organizers of the challenge also acknowledge Nvidia for providing a Titan V graphics processing unit and Amazon Web Services for providing computational infrastructure that was used by some of the teams to develop their models.

The Medical Imaging with Deep Learning (MIDL 2020) conference organizers are thanked for nimbly moving to a virtual format and providing a platform for the first edition of this challenge. YB was supported by an Alberta Innovates (AI) Summer Studentship. RF holds the Hopewell Professorship in Brain Imaging. RS was supported by the T Chen Fong postdoctoral fellowship and the Canada Open Neuroscience Platform research scholar award (funded by Brain Canada).

References