fastMRI
A large-scale dataset of both raw MRI measurements and clinical MRI images.
view repo
Accelerating Magnetic Resonance Imaging (MRI) by taking fewer measurements has the potential to reduce medical costs, minimize stress to patients and make MRI possible in applications where it is currently prohibitively slow or expensive. We introduce the fastMRI dataset, a large-scale collection of both raw MR measurements and clinical MR images, that can be used for training and evaluation of machine-learning approaches to MR image reconstruction. By introducing standardized evaluation criteria and a freely-accessible dataset, our goal is to help the community make rapid advances in the state of the art for MR image reconstruction. We also provide a self-contained introduction to MRI for machine learning researchers with no medical imaging background.
READ FULL TEXT VIEW PDFA large-scale dataset of both raw MRI measurements and clinical MRI images.
MRAugment: physics-aware data augmentation for deep learning based accelerated MRI reconstruction
None
Self-Supervised MRI Reconstruction
The excellent soft tissue contrast and flexibility of magnetic resonance imaging (MRI) makes it a very powerful diagnostic tool for a wide range of disorders, including neurological, musculoskeletal, and oncological diseases. However, the long acquisition time in MRI, which can easily exceed 30 minutes, leads to low patient throughput, problems with patient comfort and compliance, artifacts from patient motion, and high exam costs.
As a consequence, increasing imaging speed has been a major ongoing research goal since the advent of MRI in the 1970s. Increases in imaging speed have been achieved through both hardware developments (such as improved magnetic field gradients) and software advances (such as new pulse sequences). One noteworthy development in this context is parallel imaging, introduced in the 1990s, which allows multiple data points to be sampled simultaneously, rather than in a traditional sequential order [39, 26, 9].
The introduction of compressed sensing (CS) in 2006 [2, 23] promised another breakthrough in the reduction of MR scan time. At their core, CS techniques speed up the MR acquisition by acquiring less measurement data than has previously been required to reconstruct diagnostic quality images. Since undersampling of this kind violates the Nyquist-Shannon sampling theorem, aliasing artifacts are introduced which must be eliminated in the course of image reconstruction. This can be achieved by incorporating additional a priori knowledge during the image reconstruction process.
The last two years have seen the rapid development of machine learning approaches for MR image reconstruction, which hold great promise for further acceleration of MR image acquisition [10, 48, 11, 35, 60]. Some of the first work on this subject was presented at the 2016 annual meeting of the International Society for Magnetic Resonance in Medicine (ISMRM). The 2017 ISMRM annual meeting included, for the first time, a dedicated session on machine learning for image reconstruction, and presentations on the subject at the 2018 annual meeting spanned multiple focused sessions, including a dedicated category for abstracts.
Despite this substantial increase in research activity, the field of MR image reconstruction still lacks large-scale, public datasets with consistent evaluation metrics and baselines. Many MR image reconstruction studies use datasets that are not openly available to the research community. This makes it challenging to reproduce and validate comparisons of different approaches, and it restricts access to work on this important problem to researchers associated with or cooperating with large academic medical centers where such data is available.
In contrast, research in computer vision applications such as object classification has greatly benefited from the availability of large-scale datasets associated with challenges such as the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [31]. Such challenges have served as a catalyst for the recent explosion in research activity on deep learning
[21].The goal of the fastMRI dataset is to provide a first step towards enabling similar breakthroughs in the machine-learning-based reconstruction of accelerated MR images. In this work we describe the first large-scale release of raw MRI data that includes 1,594 volumes, consisting of 56,987 slices^{1}^{1}1A slice corresponds to one image., associated with in vivo examinations from a range of MRI systems. In addition, we are releasing processed MR images in the DICOM format from 10,000 knee examinations from a representative clinical patient population, consisting of more than 1.2 million slices.
Prior to providing details about the dataset and about target reconstruction tasks with associated benchmarks, we begin with a brief primer on MR image acquisition and reconstruction, in order to enable non-MRI-experts to get up to speed quickly on the information content of the dataset. In general, both the fastMRI dataset and this paper aim to connect the data science and the MRI research communities, with the overall goal of advancing the state of the art in accelerated MRI.
MR imaging is an indirect process, whereby cross-sectional images of the subject’s anatomy are produced from frequency and phase measurements instead of direct, spatially-resolved measurements. A measuring instrument, known as a receiver coil, is placed in proximity to the area to be imaged (Figure 1). During imaging, a sequence of spatially- and temporally-varying magnetic fields, called a “pulse sequence,” is applied by the MRI machine. This induces the body to emit resonant electromagnetic response fields which are measured by the receiver coil. The measurements typically correspond to points along a prescribed path through the multidimensional Fourier-space representation of an imaged body. This Fourier space is known as k-space in the medical imaging community. In the most basic usage of MR imaging, the full Fourier-space representation of a region is captured by a sequence of samples that tile the space up to a specified maximum frequency.
The spatially-resolved image
can be estimated from the full k-space
by performing an inverse multidimensional Fourier transform:
(1) |
where is a noise-corrupted estimate of the true image .
The number of samples captured in k-space is a limiting factor for the speed of MR imaging. Fewer samples can be captured by sampling up to a lower maximum frequency, however this produces images of lower spatial resolution. An alternative undersampling approach involves omitting some number of k-space samples within a given maximum frequency range, which then results in aliasing artifacts. In order to remove these artifacts and infer the true underlying spatial structure of the imaged subject, one may apply a number of possible reconstruction strategies.
In parallel MR imaging, multiple receiver coils are used, each of which produces a separate k-space measurement matrix. Each of these matrices is different, since the view each coil provides of the imaged volume is modulated by the differential sensitivity that coil exhibits to MR signal arising from different regions. In other words, each coil measures Fourier components of the imaged volume multiplied by a complex-valued position-dependent coil sensitivity map . The measured k-space signal for coil in an array of coils is given by
(2) |
where the multiplication is entry-wise. This is illustrated in Figure (b)b, which shows the absolute value of the inverse discrete Fourier transform (DFT) of fully-sampled complex-valued k-space signals for each coil in a 15-element coil array. Each coil is typically highly sensitive in one region, and its sensitivity falls off significantly in other regions.
If the sensitivity maps are known, and the k-space sampling is full (i.e., satisfying the Nyquist sampling condition), then the set of linear relations between and each defines a linear system that is overdetermined by a factor of . It may be inverted using a pseudoinverse operation to produce a reconstruction of , as long as the linear system is full rank. The quality of this reconstruction will depend on the measurement noise, since the signal-to-noise ratio is poor in parts of the volume where the coil sensitivity is low.
In accelerated parallel imaging, each coil’s k-space signal is undersampled. As long as the total number of measurements across all coils exceeds the number of image voxels to be reconstructed, an unregularized least squares solution can still be used, leading to a theoretical -fold speedup over fully-sampled single-coil imaging. Each extra coil effectively produces an additional “sensitivity-encoded” measurement of the volume [26], which augments the frequency and phase encoded measurements obtained from the sequential application of magnetic field gradients in the MR pulse sequence. Estimates of coil sensitivity patterns, required for inversion of the undersampled multi-coil linear system, may be generated from separate low-resolution calibration scans. They may also be derived directly from the k-space measurements by fully sampling a comparatively small central region of k-space, which corresponds to low spatial frequencies.
In practice, the use of sub-sampling results in significant amplification of noise, and regularization is usually needed. In cases where a tight imaging field of view is used, or at imaging depths exceeding the dimensions of the individual coils, the sensitivity patterns of different coils become somewhat linearly dependent, thereby lowering the effective rank of the linear system, increasing noise amplification associated with the inverse operation, and limiting the maximum practical acceleration. As a result, in the clinic, parallel imaging acceleration factors are typically on the order of two to three.
Classical approaches to MRI reconstruction solve a regularized inverse optimization problem to find the spatially-resolved image from the sub-sampled k-space data, in both the single-coil and the multi-coil case. We describe the classical approach in more detail in Section 6. In the machine learning approach, a reconstruction function
(3) |
is learned from input and output pair tuples drawn from a population. The goal is to find a function that minimizes the risk (i.e., expected loss) over the population distribution:
We discuss error metrics that may be used as loss functions
in Section 5. In practice this optimization problem must be approximated with the empirical risk using a sample from the population, with respect to a loss function :(4) |
The availability of public datasets has played an important role in advancing research in medical imaging, providing benchmarks to compare different approaches and leading to more impactful contributions. Early works such as DDSM [13], SLIVER07 [14] and CAUSE07 [8] triggered increasing efforts to collect new larger-scale biomedical datasets, which resulted in over one hundred public releases (counting the entries on https://grand-challenge.org/) to advance medical image analysis research. The vast majority of these datasets, which include a range of medical imaging modalities, are designed to test the limits of current methods in the tasks of segmentation, classification, and detection. Datasets such as BraTS [24], LUNA [37], ChestX-ray [50], DeepLesion [55], and Camelyon [1] are widely used in research with clinical applications.
However, the current lack of large-scale reference standards for MR image reconstruction hinders progress in this important area. Most research uses synthetic k-space data that is not directly acquired but rather obtained from post-processing of already-reconstructed images [5, 38, 57, 56, 27]. Research using small-scale proprietary raw k-space datasets is also common [15, 36, 35, 33, 22].
In order to address the above-mentioned shortcomings, recent efforts have been devoted to collecting and publicly releasing datasets containing raw (unprocessed) k-space data; see, e.g., [34, 11]. However, the size of these existing datasets remains small. Although these datasets might provide a valuable test bed for signal processing methods, larger datasets are required to fully realize the potential of deep learning. Table 1 lists publicly available MR datasets containing raw k-space data.
Dataset | Volumes | Body part | MR scan type |
---|---|---|---|
NYU dataset [11] | 100 | knee | PD, T2 |
Stanford dataset 2D FSE | 89 | knee | PD |
Stanford dataset 3D FSE [34] | 20 | knee | PD |
Stanford undersampled dataset | 38 | knee | PD |
fastMRI dataset | 1594 | knee | PD |
The fastMRI dataset (http://fastmri.med.nyu.edu/) contains four types of data from MRI acquisitions of knees.
unprocessed complex-valued multi-coil MR measurements.
combined k-space data derived from multi-coil k-space data in such as way as to approximate single-coil acquisitions, for evaluation of single-coil reconstruction algorithms.
real-valued images reconstructed from fully-sampled multi-coil acquisitions using the simple root-sum-of-squares method detailed below. These may be used as references to evaluate the quality of reconstructions.
spatially-resolved images for which the raw data was discarded during the acquisition process. These images are provided to represent a larger variety of machines and settings than are present in the raw data.
This data was designed to enable two distinct types of tasks:
Single-coil reconstruction task: reconstruct images approximating the ground-truth from undersampled single-coil data.
Multi-coil reconstruction task: reconstruct images approximating the ground-truth from undersampled multi-coil data.
For each task we provide an official split of the k-space data and ground-truth images into training and validation subsets that contain fully-sampled acquisitions, as well as test and challenge subsets which contain k-space data that have been subjected to undersampling masks as described below. Ground-truth images are not being released for the test and challenge datasets. During training of a machine-learning model, the training k-space data should be programmatically masked following the same procedure. The challenge subsets are not being released at the time of writing and are reserved for future challenges associated with the fastMRI dataset.
The rationale for having a single-coil reconstruction task (and for providing simulated single-coil data), even though reconstruction from multi-coil data is expected to be more precise, is twofold: (i) to lower the barrier of entry for researchers who may not be familiar with MRI data, since the use of a single coil removes a layer of complexity, and (ii) to include a task that is relevant for the single-coil MRI machines still in use throughout the world.
The DICOM images may be useful as additional data for training. Their distribution is different from that of the ground-truth images, since they were acquired with a larger diversity of scanners, manners of acquisition, reconstruction methods, and post-processing algorithms, so the application of transfer-learning techniques may be necessary. Most DICOM images are the result of accelerated parallel imaging acquisitions and corresponding reconstructions, with image quality that differs from that of putative fully-sampled acquisitions and reconstructions. The ground-truth images may, in many cases, represent a higher standard of image quality than the clinical gold standard, for which full sampling is not routine or even practical.
Curation of the datasets described here was part of a study approved by the NYU School of Medicine Institutional Review Board. Raw data was anonymized via conversion to the vendor-neutral ISMRMRD format [18]. DICOM data was anonymized using the RSNA clinical trial processor. We performed manual inspection of each DICOM image for the presence of unexpected protected health information (PHI), manual checking of metadata in raw data files, as well as spot checking of all metadata and image content.
Multi-coil raw data was stored for 1,594 scans acquired for the purpose of diagnostic knee MRI. For each scan, a single fully sampled MRI volume was acquired on one of three clinical 3T systems (Siemens Magnetom Skyra, Prisma and Biograph mMR) or one clinical 1.5T system (Siemens Magnetom Aera). Data acquisition used a 15 channel knee coil array and conventional Cartesian 2D TSE protocol employed clinically at NYU School of Medicine. The dataset includes data from two pulse sequences, yielding coronal proton-density weighting with (PDFS, 798 scans) and without (PD, 796 scans) fat suppression (see Figure 3). Sequence parameters are, as per standard clinical protocol, matched as closely as possible between the systems.
System | Number of scans |
---|---|
Skyra 3T | 663 |
Prisma 3T | 83 |
Biograph mMR 3T | 153 |
Aera 1.5T | 695 |
The following sequence parameters were used: Echo train length 4, matrix size , in-plane resolution mmmm, slice thickness 3mm, no gap between slices. Timing varied between systems, with repetition time (TR) ranging between 2200 and 3000 milliseconds, and echo time (TE) between 27 and 34 milliseconds.
We used an emulated single-coil (ESC) methodology to simulate single-coil data from a multi-coil acquisition [43]. ESC computes a complex-valued linear combination of the responses from multiple coils, with the linear combination fitted to the ground-truth root-sum-of-squares reconstruction in the least-squares sense.
The root-sum-of-squares reconstruction method applied to the fully sampled k-space data [28] provides the ground truth for the multi-coil dataset. The single-coil dataset includes two ground truth reconstructions, which we denote ESC and RSS. The ESC ground truth is given by the inverse Fourier transform of the single-coil data, and the RSS ground truth is given by the root-sum-of-squares reconstruction computed on the multi-coil data that were used to generate the virtual single-coil k-space data. All ground truth images are cropped to the central pixel region to compensate for readout-direction oversampling that is standard in clinical MR examinations.
The root-sum-of-squares approach [28] is one of the most commonly-used coil combination methods in clinical imaging. It first applies the inverse Fourier Transform to the k-space data from each coil:
(5) |
where is the k-space data from the th coil and is the th coil image. Then, the individual coil images are combined voxel by voxel as follows:
(6) |
where is the final image estimate and
is the number of coils. The root-sum-of-squares image estimate is known to converge to the optimal, unbiased estimate of
in the high-SNR limit [20].Volumes | Slices | |||
---|---|---|---|---|
Multi-coil | Single-coil | Multi-coil | Single-coil | |
training | 973 | 973 | 34,742 | 34,742 |
validation | 199 | 199 | 7,135 | 7,135 |
test | 118 | 108 | 4,092 | 3,903 |
challenge | 104 | 92 | 3,810 | 3,305 |
Each volume is randomly assigned to one of the following six component datasets: training, validation, multi-coil test, single-coil test, multi-coil challenge, or single-coil challenge. Table 3
shows the number of volumes assigned to each dataset. The training and validation datasets may be used to fit model parameters or to determine hyperparameter values. The test dataset is used to compare the results across different approaches. To ensure that models do not overfit to the test set, the ground truth reconstructions are not publicly released for this set. Evaluation on the test set is accomplished by uploading results to the public leaderboard at
http://fastmri.org/. The challenge portion of the dataset will be forthcoming.A volume from the train or validation dataset is used in both the single-coil and multi-coil tracks, whereas a volume from the test or challenge dataset is only used in either the single-coil or the multi-coil track. Volumes were only included in a single test or challenge set to ensure information from one could not be used to help the result in another.
Volumes in the test and challenge datasets contain undersampled k-space data. The undersampling is performed by retrospectively masking k-space lines from a fully-sampled acquisition. k-space lines are omitted only in the phase encoding direction, so as to simulate physically realizable accelerations in 2D data acquisitions. The same undersampling mask is applied to all slices in a volume, with each case consisting of a single volume. In order to provide diverse undersampling patterns across the datasets, the undersampling mask is chosen randomly for each case. First, the overall acceleration factor is set randomly either to four or to eight (representing a four-fold or an eight-fold acceleration, respectively), with equal probability for each. The undersampling mask is then generated by first including some number of adjacent lowest-frequency k-space lines to provide a fully-sampled k-space region. When the acceleration factor equals four, the fully-sampled central region includes 8% of all k-space lines; when it equals eight, 4% of all k-space lines are included. The remaining k-space lines are included uniformly at random, with the probability set so that, on average, the undersampling mask achieves the desired acceleration factor. Figure
4 depicts two randomly-undersampled k-space trajectories. Random undersampling is performed in order to meet the general conditions for compressed sensing [2, 23], for fair comparison of learned reconstruction algorithms with traditional sparsity-based regularizers.In addition to the scanner raw data described above, the fastMRI dataset includes DICOM data from 10,000 clinical knee MRI scans. These images represent a wider variety of scanners and pulse sequences than those represented in the collection of raw data. Each MR exam for which DICOM images are included typically consisted of five clinical pulse sequences:
Coronal proton-density weighting without fat suppression,
Coronal proton-density weighting with fat suppression,
Sagittal proton-density weighting without fat suppression,
Sagittal T2 weighting with fat suppression, and
Axial T2 weighting with fat suppression.
The two coronal sequences have the same basic specifications (matrix size, etc) as the sequences associated with raw data. The sagittal and axial sequences have different matrix sizes and have no direct correspondence to the sequences represented in raw data.
The Fourier transformation of an image from a DICOM file does not directly correspond to the originally measured raw data, due to the inclusion of additional post-processing steps in the vendor-specific reconstruction pipeline. Most of the DICOM images are also derived from accelerated acquisitions and are reconstructed with parallel imaging algorithms, since this baseline acceleration represents the current clinical standard. The image quality of DICOM images, therefore, is not equivalent to that of the ground truth images directly associated with fully sampled raw data. The DICOM images are distinct from the validation, test, or challenge sets.
The assessment of MRI reconstruction quality is of paramount relevance to develop and compare machine learning and medical imaging systems [51, 53, 3, 58]. The most commonly used evaluation metrics in the MRI reconstruction literature [3] include (normalized) mean squared error, which measures pixel-wise intensity differences between reconstructed and reference images, and signal-to-noise ratio, which measures the degree to which image information rises above background noise. These metrics are appealing because they are easy to understand and efficient to compute. However, they both evaluate pixels independently, ignoring the overall image structure.
Additional metrics have been introduced in the literature to capture structural distortion [41, 6, 58]. For example, the structural similarity index [53] and its extended version, multiscale structural similarity [52], provide a mechanism to assess the perceived quality of an image using local image patches.
The most recent developments in the computer vision literature leverage pretrained deep neural networks to measure the perceptual quality of an image by computing differences at the representation level
[19], or by means of a downstream task such as classification [32].In the remainder of this section, we review the definitions of the commonly-used metrics of normalized mean square error, peak signal-to-noise ratio, and structural similarity. As is discussed later, while we expect these metrics to serve as a familiar starting point, we also hope that the fastMRI dataset will enable robust investigations into improved evaluation metrics as well as improved reconstruction algorithms.
The normalized mean square error
(NMSE) between a reconstructed image or image volume represented as a vector
and a reference image or volume is defined as(7) |
where is the squared Euclidean norm, and the subtraction is performed entry-wise. In this work we report NMSE values computed and normalized over full image volumes rather than individual slices, since image-wise normalization can result in strong variations across a volume.
NMSE is widely used, and we recommend that it be reported as the primary measure of reconstruction quality for experiments on the fastMRI dataset. However, due to the many downsides of NMSE, such as a tendency to favor smoothness rather than sharpness, we recommend also reporting additional metrics such as those described below.
The peak signal-to-noise ratio (PSNR) represents the ratio between the power of the maximum possible image intensity across a volume and the power of distorting noise and other errors:
(8) |
Here is the reconstructed volume, is the target volume, is the largest entry in the target volume , is the mean square error between and defined as and is the number of entries in the target volume . Higher values of PSNR (as opposed to lower values of NMSE) indicate a better reconstruction.
The structural similarity (SSIM) index measures the similarity between two images by exploiting the inter-dependencies among nearby pixels. SSIM is inherently able to evaluate structural properties of the objects in an image and is computed at different image locations by using a sliding window. The resulting similarity between two image patches and is defined as
(9) |
where and are the average pixel intensities in and , and
are their variances,
is the covariance between and and and are two variables to stabilize the division; and . For SSIM values reported in this paper, we choose a window size of , we set , , and define as the maximum value of the target volume, .Along with releasing the fastMRI data, we detail two reference approaches to be used as reconstruction baselines: a classical non-machine learning approach, and a deep-learning approach. Each of these baselines has versions tailored for single-coil or multi-coil data. The “classical” baselines are comprised of reconstruction methods developed by the MRI community over the last 30+ years. These methods have been extensively tested and validated, and many have demonstrated robustness sufficient for inclusion in the clinical workflow. By comparison, machine learning reconstruction methods are relatively new in MRI, and deep-learning reconstruction techniques in particular have emerged only in the past few years. We include some deliberately rudimentary deep-learning models as starting points, with the expectation that future learning algorithms will provide markedly improved performance.
In the single-coil imaging setting, the task is to reconstruct an image, , from k-space observations, . In the presence of undersampling, the vector has a length smaller than that of . Therefore there are, in principle, infinitely many possibilities for that can be mapped onto a single . The advent of compressed sensing [2, 23] provided a framework for reconstruction of images from undersampled data that closely approximate images derived from fully-sampled data, subject to sparsity constraints. Compressed sensing theory requires the images in question to be sparse in some transform domain. Two common examples are to assume sparsity in the wavelet domain, or to assume sparsity of the spatial gradients of the image. The particular assumption impacts the mathematical formulation of the reconstruction problem, either in the cost function or through a regularization term.
More concretely, the sparse reconstruction approach consists of finding an image whose Fourier space representation is close to the measured k-space matrix at all measured spatial frequencies, yet at the same time minimizes a sparsity-inducing objective that penalizes unnatural reconstructions:
(11) |
Here, is a projection function that zeros out entries that are masked, and is a specified small threshold value. In most applications it is easier to work with a soft penalty instead of a constraint, so the Lagrangian dual form of Equation 11 is used instead, with penalty parameter :
(12) |
For a convex regularizer , there exists, for any choice , a value such that these two formulations have equivalent solutions.
The most common regularizers used for MRI are:
The penalty works best when the MR images are sparse in image space, for instance in vascular imaging (e.g., Yamamoto et al. [54]). This is not the case for most MRI applications. The total-variation (TV) penalty encourages sparsity in the spatial gradients of the reconstructed image, as given by a local finite-difference approximation [30] (Figure (g)g). The TV regularizer can be very effective for some imaging protocols, but it also has a tendency to remove detail (Figure (h)h). The penalty encourages sparsity in the discrete wavelet transform of the image. Most natural images exhibit significant sparsity when expressed in a wavelet basis. The most commonly used transform is the Multiscale Daubechies (DB2) transform (Figure (e)e). To date, due to their computational complexity as well as their tendency to introduce compression artifacts or oversmoothing, compressed sensing approaches have taken some time to gain acceptance in the clinic, though commercial implementations of compressed sensing are currently beginning to appear.
The single-coil classical baseline provided with the fastMRI dataset was adopted from the widely-used open-source BART toolkit (Appendix B), using total variation as the regularizer. We ran the optimization algorithm for 200 iterations on each slice independently.
Single-coil classical baseline (TV model) applied to validation data | |||||||
---|---|---|---|---|---|---|---|
Acceleration | Regularization Weight | NMSE | PSNR | SSIM | |||
PD | PDFS | PD | PDFS | PD | PDFS | ||
4-fold | 0.0355 | 0.0919 | 30.2 | 27.6 | 0.637 | 0.506 | |
0.0342 | 0.0916 | 30.4 | 27.6 | 0.641 | 0.505 | ||
0.0287 | 0.09 | 31.4 | 27.7 | 0.645 | 0.494 | ||
0.0313 | 0.0993 | 30.9 | 27.3 | 0.575 | 0.399 | ||
1 | 0.0522 | 0.124 | 28.5 | 26.2 | 0.526 | 0.327 | |
8-fold | 0.0708 | 0.118 | 27.1 | 26.4 | 0.551 | 0.417 | |
0.0699 | 0.118 | 27.1 | 26.4 | 0.553 | 0.416 | ||
0.063 | 0.117 | 27.7 | 26.4 | 0.564 | 0.408 | ||
0.0537 | 0.117 | 28.4 | 26.5 | 0.55 | 0.357 | ||
1 | 0.0742 | 0.132 | 26.9 | 25.9 | 0.538 | 0.333 |
Table 4 summarizes the results of applying this method to the single-coil validation data with different regularization strengths and different acceleration factors. These results indicate that NMSE and PSNR metrics are highly (inversely) correlated and generally favor models with stronger regularization than SSIM does. Stronger regularization generally results in smoother images that lack the fine texture of the ground truth images. A regularization parameter of 0.01 yields the best results for 4-fold acceleration in most cases, whereas the higher 8-fold acceleration gets slightly better results with a regularization parameter of 0.1.
When multiple receiver coils are used, the reconstruction process must combine information from multiple channels into one image. Multi-coil acquisitions currently represent the norm in clinical practice, for two principal reasons: they provide increased SNR, as compared with single-coil acquisitions, over extended fields of view, and they enable acceleration via parallel imaging. Equation 2 in Section 2.1 describes the forward model for parallel imaging. The SENSE formulation [26] of parallel image reconstruction involves direct inversion of this forward model, via a suitable pseudoinverse. Leveraging the convolution property of the Fourier Transform reveals the following convolution relationship:
(13) |
Here is the Fourier Transform of the coil sensitivity pattern and denotes the convolution operation. The GRAPPA/SMASH formulation of parallel image reconstruction [39, 9] involves filling in missing k-space data via combinations of acquired k-space data within a defined convolution kernel, prior to inverse Fourier transformation.
Either formulation requires estimates of the coil sensitivity information in or , which may be derived either from a separate reference scan or directly from the acquired undersampled k-space data itself. Reference scan methods are often used in the SENSE formulation, whereas GRAPPA formulations are typically self-calibrating, relying on subsets of fully-sampled data generally in central k-space regions.
The parallel imaging techniques described above may be combined productively with compressed sensing, via the use of sparsity-based regularizers. For example, one may extend Equation 12 in Section 6.1 above to include multi-coil data as follows:
(14) |
Various methods may be used to solve this reconstruction problem. One frequently-used method is the ESPIRiT approach [45], which harmonizes parallel imaging and compressed sensing in a unified framework.
As was the case for the classical single-coil baseline, the classical multi-coil baseline provided with the fastMRI dataset was adopted from the BART toolkit (Appendix B). In the multi-coil case, the ESPIRiT algorithm was used to estimate coil sensitivities, and to perform parallel image reconstruction in combination with compressed sensing using a total-variation regularizer.
Multi-coil classical baseline (TV model) applied to validation data | |||||||
---|---|---|---|---|---|---|---|
Acceleration | Regularization Weight | NMSE | PSNR | SSIM | |||
PD | PDFS | PD | PDFS | PD | PDFS | ||
4-fold | 0.0246 | 0.0972 | 31.6 | 27.4 | 0.677 | 0.53 | |
0.0222 | 0.0951 | 32.1 | 27.5 | 0.693 | 0.554 | ||
0.0198 | 0.0971 | 32.6 | 27.5 | 0.675 | 0.588 | ||
0.0251 | 0.109 | 31.3 | 27 | 0.633 | 0.538 | ||
8-fold | 0.0494 | 0.114 | 28.2 | 26.5 | 0.61 | 0.505 | |
0.0447 | 0.112 | 28.6 | 26.6 | 0.626 | 0.524 | ||
0.0352 | 0.109 | 29.6 | 26.8 | 0.642 | 0.551 | ||
0.0389 | 0.114 | 29.2 | 26.7 | 0.632 | 0.527 |
Results using this baseline model are summarized in Table 5. The experimental setup is identical to the single-coil scenario, except that we compare the reconstructions with the root-sum-of-squares ground truth instead of the ESC ground truth.
Various deep-learning techniques based on Convolutional Neural Networks have recently been proposed to tackle the problem of reconstructing MR images from undersampled k-space data
[10, 48, 11, 35, 60, 17, 12]. Many of these proposed methods are based on the U-Net architecture introduced in [29]. U-Net models and their variants have successfully been used for many image-to-image prediction tasks including MRI reconstruction [17, 12] and image segmentation [29].The U-Net single-coil baseline model included with the fastMRI data release (Figure 6) consists of two deep convolutional networks, a down-sampling path followed by an up-sampling path. The down-sampling path consists of blocks of two 33 convolutions each followed by instance normalization [46]
and Rectified Linear Unit (ReLU) activation functions. The blocks are interleaved by down-sampling operations consisting of max-pooling layers with stride 2 which halve each spatial dimension. The up-sampling path consists of blocks with a similar structure to the down-sampling path, interleaved with bilinear up-sampling layers which double the resolution between blocks. Each block consists of two 3
3 convolutions with instance normalization [46] and ReLU activation layers. In contrast to the down-sampling path, the up-sampling path concatenates two inputs to the first convolution in each block: the up-sampled activations from the previous block, together with the activations that follow the skip connection from the block in the down-sampling path with the same resolution (horizontal arrows in Figure 6). At the end of the up-sampling path, we include a series of 11 convolutions that reduce the number of channels to one without changing the spatial resolution.For the single-coil MRI reconstruction case, the zero-filled image is used as the input to the model. The zero-filled image is obtained by first inserting zeros at the location of all unobserved k-space values, applying a two-dimensional Inverse Fourier Transform (IFT) to the result, and finally computing the absolute value. The result is center cropped to remove any readout and phase oversampling. Using the notation from section 6.1, the zero-filled image is given by , where is the linear operator corresponding to the center cropping and is the two-dimensional IFT.
The entire network is trained on the training data in an end-to-end manner to minimize the mean absolute error with respect to corresponding ground truth images. Let be the function computed by the U-Net model, where represents the parameters of the model. Then the training process corresponds to the following optimization problem:
(15) |
where the ground truths are obtained using the ESC method described in Section 4.3
. Our particular single-coil U-Net baseline model was trained on 973 image volumes in the training set, using the RMSProp algorithm
[42]. We used an initial learning rate of 0.001, which was multiplied by 0.1 after 40 epochs, after which the model was trained for an additional 10 epochs. During training, we randomly sampled a different mask for each training example in each epoch independently using the protocol described in Section
4.6 for the test data. At the end of each epoch, we recorded the NMSE on the validation data. After training, we picked the model that achieved the lowest validation NMSE.Single-coil U-Net baseline applied to validation data | ||||||||
---|---|---|---|---|---|---|---|---|
Acceleration | Channels | #Params | NMSE | PSNR | SSIM | |||
PD | PDFS | PD | PDFS | PD | PDFS | |||
4-fold | 32 | 3.35M | 0.0161 | 0.0531 | 33.78 | 29.90 | 0.81 | 0.631 |
64 | 13.39M | 0.0157 | 0.0528 | 33.90 | 29.9 | 0.813 | 0.633 | |
128 | 53.54M | 0.0154 | 0.0525 | 34.01 | 29.95 | 0.815 | 0.634 | |
256 | 214.16M | 0.0154 | 0.0525 | 34.00 | 29.95 | 0.815 | 0.636 | |
8-fold | 32 | 3.35M | 0.0283 | 0.0698 | 31.13 | 28.6 | 0.754 | 0.555 |
64 | 13.39M | 0.0272 | 0.0693 | 31.30 | 28.63 | 0.758 | 0.558 | |
128 | 53.54M | 0.0265 | 0.0686 | 31.44 | 28.68 | 0.761 | 0.558 | |
256 | 214.16M | 0.0261 | 0.0682 | 31.5 | 28.71 | 0.762 | 0.559 |
Table 6 presents the results from running trained U-Net models of different capacities on the single-coil validation data. These results indicate that the trained U-Net models perform significantly better than the classical baseline method. The best U-Net models obtain 40-50% relative improvement over the classical methods (see Table 4) in terms of NMSE.
The performance of the U-Net models continues to increase with increasing model capacity, and even the largest model with over 200 million parameters is unable to overfit the training data. These improvements begin to saturate after 50 million parameters for the simpler 4-fold acceleration case. However, for the more challenging 8-fold acceleration task, the largest model performs significantly better than the smaller models. This suggests that models with very large capacities trained on large amounts of data can enable high acceleration factors.
Single-coil classical and U-Net baselines applied to test data | ||||
Model | Acceleration | NMSE | PSNR | SSIM |
Classical Model (Total Variation) | 4-fold | 0.0479 | 30.69 | 0.603 |
8-fold | 0.0795 | 27.12 | 0.469 | |
Aggregate | 0.0648 | 28.77 | 0.531 | |
U-Net | 4-fold | 0.0320 | 32.22 | 0.754 |
8-fold | 0.0480 | 29.45 | 0.651 | |
Aggregate | 0.0406 | 30.7 | 0.699 |
Table 7 compares the performance of the classical and the U-Net baseline models for the single-coil task, as applied to the test dataset. For the classical baseline model, we chose the best regularization weights for each modality and for each acceleration factor based on the validation data results, resulting in a regularization weight of 0.1 for 8-fold acceleration on Proton Density without fat suppression and 0.01 for every other case. For the U-Net baseline model, we chose the model with the largest capacity.
In the multi-coil MRI reconstruction task, we have one set of undersampled k-space measurements from each coil, and a different zero-filled image can be computed from each coil. These coil images can be combined using the root-sum-of-squares algorithm. Let be the zero-filled image from coil . With defined as in Equation 6, the U-Net model described in Section 6.3 can be used for the multi-coil reconstruction task by simply feeding this combined image in as input: The model is trained to minimize the mean absolute error loss similarly to the single-coil task. The training procedure is also identical to the single-coil case except that the root-sum-of-squares image is used as the ground truth as described in Section 4.4.
Multi-coil U-Net baseline applied to validation data | ||||||||
---|---|---|---|---|---|---|---|---|
Acceleration | Channels | #Params | NMSE | PSNR | SSIM | |||
PD | PDFS | PD | PDFS | PD | PDFS | |||
4-fold | 32 | 3.35M | 0.0066 | 0.0122 | 36.7 | 35.97 | 0.9192 | 0.8595 |
64 | 13.39M | 0.0063 | 0.0120 | 36.95 | 36.11 | 0.9224 | 0.8615 | |
128 | 53.54M | 0.0057 | 0.0113 | 37.38 | 36.33 | 0.9266 | 0.8641 | |
256 | 214.16M | 0.0054 | 0.0112 | 37.58 | 36.39 | 0.9287 | 0.8655 | |
8-fold | 32 | 3.35M | 0.0144 | 0.0197 | 33.31 | 33.82 | 0.8778 | 0.8213 |
64 | 13.39M | 0.0136 | 0.0198 | 33.56 | 33.93 | 0.8825 | 0.8238 | |
128 | 53.54M | 0.0123 | 0.0179 | 34.01 | 34.25 | 0.8892 | 0.8277 | |
256 | 214.16M | 0.0120 | 0.0181 | 34.12 | 34.23 | 0.8915 | 0.8286 |
As is the case for the single-coil task, the multi-coil U-Net baselines substantially outperform the classical baseline models (compare Table 8 with Table 5). Note that this is true despite the fact that the multi-coil U-Net baseline defined above does not take coil sensitivity information into account, and therefore neither includes a direct parallel image reconstruction nor accounts for sparsity or other correlations among coils. Models that incorporate coil sensitivity information are expected to perform better than the current multi-coil U-Net baselines. Table 8 shows, once again, that the performance of the U-Net models improves with model size, with the largest U-Net baseline model providing the best performance.
Multi-coil classical and U-Net baselines applied to test data | ||||
Model | Acceleration | NMSE | PSNR | SSIM |
Classical Model (Total Variation) | 4-fold | 0.0503 | 30.88 | 0.628 |
8-fold | 0.0760 | 28.25 | 0.593 | |
Aggregate | 0.0633 | 29.54 | 0.610 | |
U-Net | 4-fold | 0.0106 | 35.91 | 0.904 |
8-fold | 0.0171 | 33.57 | 0.858 | |
Aggregate | 0.0139 | 34.7 | 0.881 | |
Table 9 compares the performance of the classical and the U-Net baseline models for the multi-coil task, as applied to the test dataset. For the classical baseline model, we chose the best regularization weights for each modality and for each acceleration factor based on the validation data results, resulting in a regularization weight of 0.001 for 4-fold undersampling for Proton Density with Fat Suppression and 0.01 for every other case. For the U-Net baseline model, we chose the model with the largest capacity.
To appreciate the value of the dataset size, we study how model performance scales with the amount of data used to train a model. To this end, we trained several U-Net models with varying model capacities on different sized subsets of the training data. Figure 7 shows the SSIM metric computed on the validation data for the multi-coil task. It is evident from these results that training with larger amounts of data yields substantial improvements in the quality of reconstructions, which highlights the need for the release of large datasets like fastMRI.
As mentioned in Section 4.6.1, the fastMRI dataset also includes a large set of DICOM images that can be used as additional training data. It is possible that the baseline U-Net models could be improved further by making use of this additional data.
MR image reconstruction is an inverse problem, and thus it has many connections to inverse problems in the computer vision literature [40, 7, 4, 47]
, such as super-resolution, denoising and in-painting. In all of these inverse problems, the goal is to recover a high-dimensional ground truth image from a lower-dimensional measurement. Such ill-posed problems are very difficult to solve since there exists an infinite number of high-dimensional images that can result in the same-low dimensional measurement. In order to simplify the problem, an assumption is often made that only a small number of high-resolution images would correspond to natural images
[4]. Given that MRI reconstruction is a similar inverse problem, we hope that the computer vision community, as well as the medical imaging community, will find our dataset beneficial.In the clinical setting, radiologists use MRI to search for abnormalities, make diagnoses, and recommend treatment options. Thus, contrary to many computer vision problems where small texture changes might not necessarily alter the overall satisfaction of the observer, in MRI reconstruction, extra care should be taken to ensure that the human interpreter is not misled by a very plausible but not necessarily correct reconstruction. This is especially important as image generation techniques increase in their ability to generate photo-realistic results [49]. Therefore some research effort should be devoted to look for solutions that, by design, ensure correct diagnosis, and we hope that our dataset will provide a testbed for new ideas in these directions as well.
An important question in MRI reconstruction is the choice of the evaluation metric. The current consensus in the MRI community is that global metrics, such as NMSE, SSIM and PSNR, do not necessarily capture the level of detail required for proper evaluation of MRI reconstruction algorithms [25, 16]. A natural question arises: what would the optimal metric be? An ideal MRI reconstruction algorithm should produce sharp, trustworthy images, that ultimately ensure the proper radiologic interpretation. While our dataset will help ensure consistent evaluation, we hope that it will also trigger research on MRI reconstruction metrics. This goal will be impossible to achieve without clinical studies involving radiologists evaluating fully-sampled and undersampled MRI reconstructions to make sure that both images lead to the same diagnosis.
Although this dataset provides an excellent entry point for machine learning methods for MR reconstruction, there are some aspects of MR imaging that we have not yet considered here. Physical effects such as spin relaxation, eddy currents and field distortions are not at present explicitly accounted for in our retrospective undersampling approaches or our baseline models. The manifestation of these effects depends upon the object being imaged, the MRI scanner used, and even the sampling pattern selected. Extending the results from methods developed for this challenge to the clinic remains an open problem, but we believe the provision of this dataset is an important first step on the path to this goal.
In this work we detailed the fastMRI dataset: the largest raw MRI dataset to be made publicly available to date. Previous public datasets have focused on post-processed magnitude images for specific biologic and pathologic questions. Although our dataset was originally acquired for a focused task, the inclusion of raw k-space data allows methods to be developed for the imaging pipeline itself, in principle allowing them to be applied on any MRI scanner for any imaging task.
In addition to the data, we provide evaluation metrics and baseline algorithms to aid the research community in assessing new approaches. Consistent evaluation of MRI reconstruction techniques is provided by a leaderboard using held-out test data.
We hope that the availability of this dataset will accelerate research in MR image reconstruction, and will serve as a benchmark during training and validation of new algorithms.
We acknowledge grant support from the National Institutes of Health under grants NIH R01 EB024532 and NIH P41 EB017183. We would also like to thank Michela Paganini and Mark Tygert.
ESPIRiT -an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA.
Magnetic resonance in medicine, 71(3), 2014.Conference on Computer Vision and Pattern Recognition (CVPR)
, 2018.ISMRMRD files were converted into simpler HDF5 files that store the entire k-space in a single tensor. One HDF5 file was created per volume. The HDF5 files share the following common attributes:
Acquisition protocol (either CORPD or CORPDFS, indicating coronal proton density with or without fat saturation, respectively. See Figure 3).
The XML header copied verbatim from the ISMRMRD file that was used to generate the HDF5 file. It contains information about the scanner, field of view, dimensions of k-space, and sequence parameters.
A unique string identifying the examination, and substituting anonymously for the patient identification.
The Euclidean norm and the largest entry of the target volume. For the multi-coil track the target volume is stored in reconstruction_rss. For the single-coil track the target volume is stored in reconstruction_esc. These two attributes are only available in the training and validation datasets.
Acceleration factor of the undersampled k-space trajectory (either 4 or 8). This attribute is only available in the test dataset.
The number of low-frequency k-space lines in the undersampled k-space trajectory. This attribute is only available in the test dataset.
The rest of this section describes the format of the HDF5 files for the multi-coil and single-coil tracks.
Training dataset for the multi-coil track. The HDF5 files contain the following tensors:
Multi-coil k-space data. The shape of the kspace tensor is (number of slices, number of coils, height, width).
root-sum-of-squares reconstruction of the multi-coil k-space data cropped to the center region. The shape of the reconstruction_rss tensor is (number of slices, 320, 320).
Validation dataset for the multi-coil track. The HDF5 files have the same structure as the HDF5 files in multicoil_train.tar.gz.
Test dataset for the multi-coil track. The HDF5 files contain the following tensors:
Undersampled multi-coil k-space. The shape of the kspace tensor is (number of slices, number of coils, height, width).
Defines the undersampled Cartesian k-space trajectory. The number of elements in the mask tensor is the same as the width of k-space.
Training dataset for the single-coil track. The HDF5 files contain the following tensors:
Emulated single-coil k-space data. The shape of the kspace tensor is (number of slices, height, width).
root-sum-of-squares reconstruction of the multi-coil k-space that was used to derive the emulated single-coil k-space cropped to the center region. The shape of the reconstruction_rss tensor is (number of slices, 320, 320).
The inverse Fourier transform of the single-coil k-space data cropped to the center region. The shape of the reconstruction_esc tensor is (number of slices, 320, 320).
Validation dataset for the single-coil track. The HDF5 files have the same structure as the HDF5 files in singlecoil_train.tar.gz.
Test dataset for the single-coil track. The HDF5 files contain the following tensors:
Undersampled emulated single-coil k-space. The shape of the kspace tensor is (number of slices, height, width).
Defines the undersampled Cartesian k-space trajectory. The number of elements in the mask tensor is the same as the width of k-space.
The Berkeley Advanced Reconstruction Toolbox (BART) [44] ^{2}^{2}2Version 0.4.03 https://mrirecon.github.io/bart/ contains implementations of standard methods for coil sensitivity estimation and undersampled MR image reconstruction incorporating parallel imaging and compressed sensing. We used this tool to produce the classical baseline MSE estimates, as well as the illustrations in Figure 2. In this section we provide a brief introduction to the tool sufficient for reproducing our baseline results. We will use as an example a 640x368 undersampled MRI scan with 15 coils. The target region is a central region which will be cropped to after reconstruction.
BART provides a command line interface which acts on files in a simple storage format. Each multidimensional array is stored in a pair of files, a header file .hdr
and a data file .cfl
. The header file contains the dimensions of the array given in ASCII. In our running example, this should be input.hdr
:
1 640 368 15
The CFL file contains the raw data in column-major order, stored as complex float values. Missing k-space values are indicated by 0 entries. BART provides Python and MATLAB interfaces for reading and writing this format.
When working with k-space data with BART, it is simplest to use data in ”centered” form, where the low frequency values are in the center of the image, and the high frequency values are at the edges. Most FFT libraries output the data in uncentered form. BART provides a tool for conversion:
bart fftshift 7 input output
The input and output are specified without file extensions. The value 7 above is a bitmask indicating the image is stored in axis 0,1,2 (1+2+4) of the input array. This bitmask is used in the commands that follow also.
Uncentered k-space data is easily identified by comparing the magnitude of the corners versus the center of the array. Centered FFTs of natural data will have the largest magnitudes near the center of the array when plotted.
Parallel MR imaging is often performed as a two-step process consisting of coil-sensitivity estimation, then reconstruction assuming the estimated sensitivity maps are exact. BART implements this approach through the ecalib
and pics
commands. The coil-sensitivity maps can be estimated using the ESPIRiT approach using the command
bart ecalib
-m1
Produce a single set of sensitivity maps
-r26
Number of fully sampled reference lines
input output_sens
The central reference region is used by BART to estimate the coil sensitivities. This area is also known as the auto-calibration region. The number of lines used in our masking procedure is a percentage of the k-space width, as described in Section 4.2.
Given the estimated coil sensitivities, a reconstruction using TV regularization can be performed with
bart pics
-d4
Debug log level, use 0 for no stdout output
-i200
Optimization iterations
-R T:7:0:0.05
Use TV (T) with regularizer strength 0.05, with bitmask 7
input output_sens output
The output of this command is in CFL format. It can be converted to a PNG using bart toimg
. When using L1 wavelet regularization, the character ”W” should be used in the R
option, with the additional -m
argument to ensure that ADMM is used.