Models of signals and images based on sparsity, low-rank, and other properties are useful in image and video processing. In ill-posed or ill-conditioned inverse problems, it is often useful to employ signal models that reflect known or assumed properties of the latent images. Such models are often used to construct appropriate regularization. For example, the sparsity of images in wavelet or discrete cosine transform (DCT) domains has been exploited for image and video reconstruction tasks [1, 2, 3]. In particular, the learning of such models has been explored in various settings [4, 5, 6, 7, 8, 9, 10], where they may potentially outperform fixed models since they adapt to signals and signal classes.
There has been growing interest in such dictionary learning-based image restoration or reconstruction methods [11, 12, 13]. For example, in blind compressed sensing [10, 12, 13], a dictionary for the underlying image or video is estimated together with the image from undersampled measurements. This allows the dictionary to adapt to the current data, which may enable learning novel features, providing improved reconstructions.
For inverse problems involving large-scale or streaming data, e.g., in interventional imaging or restoring (e.g, denoising, inpainting) large or streaming videos, etc., it is often critical to obtain reconstructions in an online or time-sequential (or data-sequential) manner to limit latency. Batch methods that process all the data at once are often prohibitively expensive in terms of time and memory usage, and infeasible for streaming data. Methods for online learning of dictionary and sparsifying transform models from streaming signals including noisy data have been recently proposed [14, 15, 16, 17, 18] and shown to outperform state-of-the-art methods [19, 20, 21]
including deep learning methods for video denoising.
This paper focuses on methods for dynamic image reconstruction from limited measurements such as video or medical imaging data. One such important class of dynamic image reconstruction problems arises in dynamic magnetic resonance imaging (dMRI), where the data are inherently or naturally undersampled because the object is changing as the data (samples in k-space or Fourier space of the object acquired sequentially over time) is collected. Various techniques have been proposed for reconstructing such dynamic image sequences from limited measurements [22, 23, 24]. Such methods may achieve improved spatial or temporal resolution by using more explicit signal models compared to conventional approaches (such as k-space data sharing in dMRI, where data is pooled in time to make sets of k-space data with sufficient samples, such as in the form of a Casorati matrix ); these methods typically achieve increased accuracy at the price of increased computation.
While some reconstruction techniques are driven by sparsity models and assume that the image sequence is sparse in some transform domain or dictionary , other methods exploit low-rank or other kinds of sophisticated models [25, 26, 27, 28, 29, 24, 30]. For example, L+S methods [24, 30] assume that the image sequence can be decomposed as the sum of low-rank and sparse (either directly sparse or sparse in some known transform domain) components that are estimated from measurements. Dictionary learning-based approaches including with low-rank models provide promising performance for dynamic image reconstruction [12, 31, 32]. Although these methods allow adaptivity to data and provide improved reconstructions, they involve batch processing and typically expensive computation and memory use. Next, we outline our contributions, particularly a new online framework that address these issues of state-of-the-art batch reconstruction methods.
This paper investigates a framework for Online Adaptive Image Reconstruction, dubbed OnAIR, exploiting learned dictionaries. In particular, we model spatiotemporal patches of the underlying dynamic image sequence as sparse in an (unknown) dictionary, and propose a method to jointly and sequentially (over time) estimate the dictionary, sparse codes, and images from streaming measurements. Various constraints are considered for the dictionary model such as a unitary matrix, or a matrix whose atoms/columns are low-rank when reshaped into spatio-temporal matrices. The proposed OnAIR algorithms involve simple and efficient updates and require a small, fixed amount of data to be stored in memory at a time, greatly reducing the computation and memory demands during processing compared to conventional batch methods that process all of the data simultaneously.
Numerical experiments demonstrate the effectiveness of the proposed OnAIR methods for performing video inpainting from subsampled and noisy pixels, and dynamic MRI reconstruction from very limited k-t space measurements. The experiments show that our proposed methods are able to learn dictionaries adaptively from corrupted measurements with important representational features that improve the quality of the reconstructions compared to non-adaptive schemes. Moreover, the OnAIR methods provide better image quality than batch schemes with a single adapted dictionary since they allow the dictionary to change over time for changing (dynamic) videos. At the same time, they require much lower runtimes than adaptive batch (or offline) reconstructions.
Short versions of this work appeared recently in  and . This paper extends those initial works by exploring OnAIR methods with multiple constraints on the learned models including a novel Unitary Dictionary (OnAIR-UD) constraint, and a Low-rank Dictionary atom (OnAIR-LD) constraint. Such constraints may offer a degree of algorithmic efficiency and robustness to artifacts in limited data settings. For example, the updates per time instant in OnAIR-UD are simpler and non-sequential, enabling it to be faster in practice than the other OnAIR schemes. Importantly, this paper reports extensive numerical experiments investigating and evaluating the performance of the proposed online methods in multiple inverse problems, and comparing their performance against recent related online and batch methods in the literature. Finally, we also compare the performance of the OnAIR methods to an oracle online scheme, where the dictionary is learned offline from the ground truth data, and show that the proposed online learning from highly limited and corrupted streaming measurements performs comparably to the oracle scheme.
The rest of this paper is organized as follows. Section II reviews the dictionary learning framework that forms the basis of our online schemes. Section III describes our formulations for online adaptive image reconstruction and Section IV presents algorithms for solving the problems. Section V presents extensive numerical experiments that demonstrate the promising performance of our proposed methods in inverse problem settings such as video inpainting and dynamic MRI reconstruction from highly limited data. Finally, Section VI concludes with proposals for future work.
Ii Dictionary Learning Models
Here, we briefly review some prior works on dictionary learning that helps build our OnAIR framework. Given a set of signals (or vectorized image patches) that are represented as columns of a matrix, the goal of dictionary learning (DL) is to learn a dictionary and a matrix of sparse codes such that . Traditionally, the DL problem is often formulated  as follows:
where and denote the th column of and the th column of , respectively, and denotes a target sparsity level for each signal. Here, the “norm” counts the number of non-zero entries of its argument, and the columns of are set to unit norm to avoid scaling ambiguity between and . Alternatives to (1) exist that replace the “norm” constraint with other sparsity-promoting constraints, or enforce additional properties on the dictionary [36, 37, 38], or enable online dictionary learning .
Dictionary learning algorithms [39, 4, 40, 41, 14, 42] typically attempt to solve (1) or its variants in an alternating manner by performing a sparse coding step (updating ) followed by a dictionary update step (updating ). Methods such as K-SVD  also partially update the coefficients in in the dictionary update step, while some recent methods update the variables jointly and iteratively . Algorithms for (1) typically repeatedly update (involves an NP-hard sparse coding problem) and tend to be computationally expensive.
The DINO-KAT learning problem  is an alternative dictionary learning framework that imposes a low-rank constraint on reshaped dictionary atoms (columns). The corresponding problem formulation is
where the “norm” counts the total number of nonzeros in . The operator reshapes dictionary atoms into matrices of size for some and such that , and is the maximum allowed rank for each reshaped atom. The dimensions of the reshaped atoms can be chosen on an application-specific basis. For example, when learning from 2D image patches, the reshaped atoms can have the dimensions of the 2D patch. In the case where spatiotemporal (3D) patches are vectorized and extracted from dynamic data, the atoms could be reshaped into space-time (2D) matrices. Spatiotemporal patches of videos often have high temporal correlation, so they may be well represented by a dictionary with low-rank space-time (reshaped) atoms . The parameter in (2) controls the overall sparsity of . Penalizing the overall or aggregate sparsity of enables variable sparsity levels across training signals (a more flexible model than (1)). The constraints for prevent pathologies (e.g., unbounded algorithm iterates) due to the non-coercive objective . In practice, we set very large, and the constraint is typically inactive. We proposed an efficient algorithm in  for Problem (2) that requires less computation than algorithms such as K-SVD.
Another alternative to the DL problems in (1) and (2) involves replacing the constraints in (2) with the unitary constraint , where is the identity matrix. Learned unitary operators work well in image denoising and more general reconstruction problems [45, 46]. Moreover, algorithms for learning unitary dictionaries tend to be computationally cheap . Our OnAIR framework exploits some of the aforementioned dictionary structures.
Iii Problem Formulations
This section formulates data-driven online image reconstruction. First, we propose an online image reconstruction framework based on an adaptive dictionary regularizer as in (2). Let denote the sequence of dynamic image frames to be reconstructed. We assume that noisy, undersampled linear measurements of these frames are observed. We process the streaming measurements in minibatches, with each minibatch containing measurements of consecutive frames. Let
denote the vectorized version of the 3D tensor obtained by (temporally) concatenatingconsecutive frames of the unknown dynamic images. In practice, we construct the sequence using a sliding window (over time) strategy, which may involve overlapping or non-overlapping minibatches. We model the spatiotemporal (3D) patches of each as sparse with respect to a latent dictionary . Under this model, we propose to solve the following online dictionary learning-driven image reconstruction problem for each time
Here indexes time, and denotes the (typically undersampled) measurements that are related to the underlying frames (that we would like to reconstruct) through the linear sensing operator . For example, in video inpainting, samples a subset of pixels in , or in dynamic MRI, corresponds to an undersampled Fourier encoding. The operator is a patch extraction matrix that extracts the th spatiotemporal patch from as a vector. A total of (possibly) overlapping 3D patches are assumed. Matrix with is the synthesis dictionary to be learned and is the unknown sparse code for the th patch of , with . Matrix has as its columns, and its aggregate (total) sparsity is penalized. The weights are regularization parameters that control the relative adaptive dictionary regularization and sparsity of , respectively, in the model.
Problem (P1) has a form often used for online optimization [14, 47]. The objective is a weighted average of the function , with the constraints defining the feasible sets for and the sparse codes. Problem (P1) jointly estimates the adaptive dictionary model for the patches of together with the underlying image frames. Note that, for each time index , we solve (P1) only for the latest group of frames and the latest sparse coefficients , while the previous images and sparse coefficients are set to their estimates from previous minibatches (i.e., and for ). However, the dictionary is adapted to all the spatiotemporal patches observed up to time .111We emphasize the global dependence of on all previous data by using the optimization variable within the objective rather than a time-indexed variable as done for the variables and . Assuming , the objective in (P1) with respect to acts as a surrogate (upper bound) for the usual empirical (batch) risk function [14, 47] that uses the optimal reconstructions and sparse codes (i.e., those that minimize the cost) for all the data. An exponential forgetting factor with is used for the terms in (P1), and is a normalization constant for the objective. The forgetting factor diminishes the influence of “old” data on the dictionary adaptation process. When the dynamic object or scene changes slowly over time, a large (close to 1) is preferable so that past information has more influence and vice versa.
As written in (P1), the dictionary is updated based on patches from all previous times; however, the proposed algorithm does not need to store this information during optimization. Indeed, our algorithm in Section IV computes only a few constant-sized matrices that contain the necessary cumulative (over time) information to solve (P1).
When minibatches and do not overlap (i.e., no common frames), each frame is reconstructed exactly once in its corresponding window in (P1). However, it is often beneficial to construct the ’s using an overlapping sliding window strategy , in which case a frame may be reconstructed in multiple windows (minibatches of frames). In this case, we independently produce estimates for each time index as indicated in (P1), and then we produce a final estimate of the underlying frame by computing a weighted average of the reconstructions of that frame from each window in which it appeared. We found empirically that an exponentially -weighted average (similar to that in (P1)) performed better than alternatives such as an unweighted average of the estimates from each window or using only the most recent reconstruction from the latest window.
In (P1), we imposed a low-rank constraint on the dictionary atoms. As an alternative, we consider constraining the dictionary to be a unitary matrix. The online optimization problem in this case is as follows:
where all terms in (P2) are defined as in (P1). Note that (P2) does not require the -norm constraints on the sparse coefficients because the unitary constraint on the dictionary precludes the possibility of repeated dictionary atoms, which was the motivation for including these constraints in (P1) .
Iv Algorithms and Properties
This section presents the algorithms for Problems (P1) and (P2) and their properties. We propose an alternating minimization-type scheme for (P1) and (P2) and exploit the online nature of the optimization to minimize the costs efficiently. At each time index , we alternate a few times between updating while holding fixed (the dictionary learning step) and then updating with held fixed (the image update step). For each , we use a warm start for (initializing) the alternating approach. We initialize the dictionary with the most recent dictionary (). Frames of that were estimated in the previous (temporal) windows are initialized with the most recent
-weighted reconstructions, and new frames are initialized using simple approaches (e.g., interpolation in the case of inpainting). Initializing the sparse coefficientswith the codes estimated in the preceding window () worked well in practice. All updates are performed efficiently and with modest memory usage as will be shown next. Figure 1 provides a graphical flowchart depicting our proposed online scheme. The next subsections present the alternating scheme for each time in more detail.
Iv-a Dictionary Learning Step for (P1)
Let . Minimizing (P1) with respect to yields the following optimization problem:
where is the matrix whose columns contain the patches for , and is the th column of . We use a block coordinate descent approach  (with few iterations) to update the sparse coefficients and atoms (columns of ) in (3) sequentially. For each , we first minimize (3) with respect to keeping the other variables fixed (sparse coding step), and then we update keeping the other variables fixed (dictionary atom update step). These updates are performed in an efficient online manner as detailed next.
Iv-A1 Sparse Coding Step
Minimizing (3) with respect to leads to the following subproblem:
where the matrix
where is the elementwise hard thresholding operator that sets entries with (complex) magnitude less than to zero and leaves other entries unaffected, is a length- vector of ones, and denote elementwise multiplication and elementwise minimum respectively, and is computed elementwise, with denoting the phase. We do not construct in (6) explicitly; rather we efficiently compute the matrix-vector product based on the most recent estimates of each quantity using sparse matrix-vector operations .
Iv-A2 Dictionary Atom Update Step
Here, we minimize (3) with respect to . This update uses past information via the forgetting factor . Let and denote the -weighted patches and sparse coefficients, respectively, and let and denote the matrices formed by stacking the ’s horizontally and ’s vertically, respectively, for times 1 to . Finally, define using the most recent estimates of all variables, with denoting the th column of . Using this notation, the minimization of (3) with respect to becomes
Let be the rank-
truncated singular value decomposition (SVD) of the matrixthat is obtained by computing the leading singular vectors and singular values of the full SVD . Then a solution to (7) is given by 
where is any matrix222In our experiments, we set to be the reshaped first column of the identity matrix, which worked well. of appropriate dimension with rank at most such that .
The main computation in (8) is computing , since the SVD of the small (i.e., space-time matrix with typically) matrix has negligible computational cost. In principle, the matrix-vector multiplication depends on all past information processed by the streaming algorithm; however, it can be recursively computed using constant time and memory. Indeed, observe that
where denotes the th element of a vector . The vectors and depend on all previous data, but they can be recursively computed over time as
where for each column index , the matrix is understood to contain the latest versions of the sparse codes already updated (sequentially) during the dictionary learning step. Using these recursive formulas, the product can be readily computed each time in our algorithm. Thus, the update in (8) can be performed in a fully online manner (i.e., without storing all the past data).
Alternatively, we can collect the vectors and as columns of matrices and , and perform the following recursive updates once at the end of of the overall algorithm for (P1) (i.e., once per minibatch of frames):
Here denotes the final sparse codes estimates from the algorithm for (P1). In this case, when performing the inner update (9), the contributions of the two terms on the right hand side of (10) are incorporated separately. The matrices and are small, constant-sized matrices whose dimensions are independent of the time index and the dimensions of the frame sequence, so they are stored for efficient use in the next minibatch. Moreover, the matrix is sparse, so all the matrix-matrix multiplications in (11) (or the matrix-vector multiplications in (10)) are computed using efficient sparse operations.
Iv-B The Unitary Dictionary Variation
In the case of (P2), unlike for (P1), we do not perform block coordinate descent over the columns of and . Instead we minimize (P2) with respect to each of the matrices and directly and exploit simple closed-form solutions for the matrix-valued updates. The following subsections describe the solutions to the and update subproblems.
Iv-B1 Sparse Coding Step
Since is a unitary matrix, minimizing (P2) with respect to yields the following subproblem:
where is again the matrix whose th column is . The solution to (12) is given by the simple elementwise hard-thresholding operation
Iv-B2 Dictionary Update Step
Minimizing (P2) with respect to results in the following optimization:
where is a full SVD of . Similarly as in (11), let . The matrix can be recursively updated over time according to (11) (or (10)), so the dictionary update step can be performed efficiently and fully online.
Iv-C Image Update Step
Minimizing (P1) or (P2) with respect to the minibatch yields the following quadratic sub-problem:
Problem (17) is a least squares problem with normal equation
In applications such as video denoising or inpainting, the matrix pre-multiplying in (18) is a diagonal matrix that can be efficiently pre-computed and inverted. More generally, in inverse problems where the matrix pre-multiplying (i.e., in particular) in (18) is not diagonal or readily diagonalizable (e.g., in dynamic MRI with multiple coils), we minimize (17) by applying an iterative optimization method. One can use any classical algorithm in this case, such as the conjugate gradient (CG) method. For the experiments shown in Section V, we used a few iterations (indexed by ) of the simple proximal gradient method [49, 50] with updates of the form
where and the proximal operator of a function is . The proximal operation in (19) corresponds to a simple least squares problem that is solved efficiently by inverting a diagonal matrix (arising from the normal equation of the proximal operation), which is pre-computed. A constant step-size suffices for convergence . Moreover, the iterations (19) monotonically decrease the objective in (17) when a constant step size is used .
Fig. 2 summarizes the overall OnAIR algorithms for the online optimization Problems (P1) and (P2). In practice, we typically use more iterations for reconstructing the first minibatch of frames, to create a good warm start for further efficient online optimization. The initial in the algorithm can be set to an analytical dictionary (e.g., based on the DCT or wavelets) and we set the initial
to a zero matrix.
|OnAIR Algorithms for (P1) and (P2)|
|Inputs : measurement sequence , weights and , rank , upper bound , forgetting factor , number of dictionary learning iterations , number of proximal gradient iterations , and the number of outer iterations per minibatch .|
|Outputs : reconstructed dynamic image sequence .|
|Initialization: Initial (first) estimates of new incoming (measured) frames, initial and at , and .|
|For = , do|
|Warm Start: Set , , and is set as follows: frames estimated in previous minibatches are set to a -weighted average of those estimates and new frames are set as in .|
|For = repeat|
|Recursive Updates: Update and using (11).|
|Output Updates: Set , and . For frames not occurring in future windows, is set to a -weighted average of estimates over minibatches.|
Iv-D Computational Cost and Convergence
The computational cost for each time index of the proposed algorithms for solving the online image reconstruction problems (P1) and (P2) scales as , where with assumed, and is the number of (overlapping) patches in each temporal window. The cost is dominated by various matrix-vector multiplications. Assuming each window’s length , the memory (storage) requirement for the proposed algorithm scales as , which is the space required to store the image patches of when performing the updates for (P1) or (P2). Typically the minibatch size is small (to allow better tracking of temporal dynamics), so the number and maximum temporal width of 3D patches in each window are also small, ensuring modest memory usage for the proposed online methods.
While the overall computational cost and memory requirements for each time index are similar for the algorithms for (P1) and (P2), the simple matrix-valued forms of the dictionary and sparse code updates for (P2) result in practice, in a several-fold decrease in actual runtimes due to optimizations inherent to matrix-valued computations in modern linear algebra libraries.
The proposed algorithms involve either exact block coordinate descent updates or for example, proximal gradient iterations (with appropriate step size) when the matrix pre-multiplying in (18) is not readily diagonalizable. These updates are guaranteed to monotonically decrease the objectives in (P1) and (P2) for each time index . Whether the overall iterate sequence produced by the algorithms also converges over time (see e.g., ) is an interesting open question that we leave for future work.
|% Missing Pixels||Coastguard||Bus||Flower Garden|
V Numerical Experiments
This section presents extensive numerical experiments illustrating the usefulness of the proposed OnAIR methods. We consider two inverse problem applications in this work: video reconstruction (inpainting) from noisy and subsampled pixels, and dynamic MRI reconstruction from highly undersampled k-t space data. The algorithm for (P1) is dubbed OnAIR-LD when the parameter is lower than its maximum setting (i.e., low-rank dictionary atoms) and is dubbed OnAIR-FD (i.e., full-rank dictionary atoms) otherwise. The algorithm for (P2) is dubbed OnAIR-UD (unitary dictionary). The following subsections discuss the applications and results. A link to software to reproduce our results will be provided at http://web.eecs.umich.edu/~fessler/.
V-a Video Inpainting
First, we consider video inpainting or reconstruction from subsampled and potentially noisy pixels. We work with the publicly available videos333The data is available at http://www.cs.tut.fi/~foi/GCF-BM3D/. provided by the authors of the BM4D method . We process the first frames of each video at native resolution. We measure a (uniform) random subset of the pixels in each frame of the videos, and also simulate additive (zero mean) Gaussian noise for the measured pixels in some experiments. The proposed OnAIR methods are then used to reconstruct the videos from their corrupted (noisy and/or subsampled) measurements.
The parameters for the proposed OnAIR methods are chosen as follows. We used a sliding (temporal) window of length
frames with a temporal stride of 1 frame, to reconstruct (maximally overlapping) minibatches of frames. In each window, we extractedoverlapping spatiotemporal patches with a spatial stride of 2 pixels. We learned a square dictionary, and the operator reshaped dictionary atoms into space-time matrices. We ran the OnAIR algorithms for (P1) and (P2) for iterations in each temporal window (minibatch), with inner iteration (of block coordinate descent or alternation) for updating , and used a direct reconstruction as per (18) in the image update step. A forgetting factor was observed to work well. We ran the algorithms for more () iterations for the first minibatch of data, for which the initial was the discrete cosine transform (DCT) matrix and the initial sparse codes were zero. Newly arrived frames were first initialized (i.e., in the first minibatch they occur) using 2D cubic interpolation. We simulated various levels of subsampling of the videos (with and without noise), and we chose or full-rank atoms in (P1), which outperformed low-rank atoms in the experiments here. The videos in this section have substantial temporal motion, so allowing full-rank atoms enabled the algorithm to better learn the dynamic temporal features of the data. Section V-B demonstrates the usefulness of low-rank atoms in (P1). We tuned the weights and for (P1) and (P2) to achieve good reconstruction quality at an intermediate undersampling factor (70% missing pixels) for each video, and used the same parameters for other factors.
We measure the performance of our methods using the (3D) peak signal-to-noise ratio (PSNR) metric that is computed as the ratio of the peak pixel intensity in the video to the root mean square reconstruction error relative to the ground truth video. All PSNR values are expressed in decibels (dB).
We compare the performance of the proposed OnAIR-FD () and OnAIR-UD methods for (P1) and (P2) with that of an identical online algorithm but which uses a fixed DCT dictionary. We also produce reconstructions using a “batch” version of the method for (P1)  (with ) that processes all video frames jointly.444This batch learning-based reconstruction method is equivalent to the proposed online method for (P1) with set to the total number of frames in the video. For each comparison method, we used the same patch dimensions, initializations, etc., and tuned the parameters and of each method individually. The one exception is that we used a spatial patch stride of 4 pixels for the Batch Learning method rather than the 2 pixel spatial stride used for the online methods. This change was made because the batch method is memory intensive—it extracts and processes image patches from the entire video concurrently—so it was necessary to process fewer patches to make the computation feasible. We ran the batch learning-based reconstruction for 20 iterations, which worked well. Finally, we also compare the OnAIR reconstructions to baseline reconstructions produced by 2D (frame-by-frame cubic) interpolation and 3D interpolation (using the natural neighbor method in MATLAB).
|% Missing Pixels||Coastguard (25 dB PSNR)|
Table I lists the PSNR values (in dB) for the various reconstruction methods for different levels of subsampling (from 50% to 90% missing pixels) of three videos (without added noise) from the BM4D dataset. Table II shows analogous results for the Coastguard video when i.i.d. zero-mean Gaussian noise with 25 dB PSNR was added before sampling the pixels. The proposed OnAIR-FD method typically provides the best PSNRs at higher undersampling factors, and the proposed OnAIR-UD method (with a more structured/unitary555This also corresponds to a unitary sparsifying transform model , with denoting the sparsifying transform or operator. ) performs better at lower undersampling factors.
The best PSNRs achieved by the OnAIR methods are typically better (by up to 1.4 dB) than for the Batch Learning scheme. Both OnAIR variations also typically outperform (by up to 2 dB) the online method with a fixed DCT dictionary (the initial in our methods), suggesting that the dictionaries learned by the proposed methods are successfully adapting to and reconstructing underlying features of the videos.
Figs. 3 and 4 show the original and reconstructed frames for a few representative frames of the Coastguard and Flower Garden videos. Results for multiple methods are shown. Fig. 3 shows that the proposed OnAIR-FD method produces visually more accurate reconstructions of the texture in the waves and also produces fewer artifacts near the boats in the water and the rocks on the shore. Fig. 4 illustrates that the proposed OnAIR method produces a sharper reconstruction with less smoothing artifacts than the online method with a fixed DCT dictionary and the batch learning-based reconstruction method, and it is less noisy than the interpolation-based reconstruction.
Fig. 5 shows the frame-by-frame (2D) PSNRs for the Coastguard video inpainted from 70% missing pixels, using various methods. Clearly the proposed OnAIR-FD method achieves generally higher PSNRs across frames of the video. The overall trends in PSNR over frames are similar across the methods and are due to motion in the original videos, with more motion generally resulting in lower PSNRs.
Fig. 6 shows a representative example of a (final) learned dictionary produced by the proposed OnAIR-FD method for the Bus video, along with the initial (at ) DCT dictionary. The dictionaries each contain 320 atoms, each of which is an space-time tensor. We visualize each atom by plotting the first slice of the atom and also plotting the profile from a vertical slice through the middle of each atom tensor. The (first slice) images show that the learned dictionary has adapted to both smooth and sharp gradients in the image, and the dynamic (evolved) nature of the profiles shows that the dictionary atoms have adapted to temporal trends in the data.
To assess the relative efficiency of the OnAIR methods compared to the Batch Learning method, we measured runtimes on the Coastguard video with the same fixed patch sizes, patch strides, and numbers of iterations for each method.666Experiments were performed in MATLAB R2017b on a 2016 MacBook Pro with a 2.7 GHz Intel Core i7 processor and 16 GB RAM. In this case, the OnAIR-FD and OnAIR-UD methods were 1.8x and 4.0x faster, respectively, than the Batch Learning method. In addition, the OnAIR methods typically require much fewer iterations (per minibatch) compared to the batch method to achieve their best reconstruction accuracies, so, in practice, when utilizing half as many outer iterations, the OnAIR-FD and OnAIR-UD methods were approximately 3.6x and 8.0x faster, respectively, than the batch scheme.
V-B Dynamic MRI Reconstruction
Here, we demonstrate the usefulness of the proposed OnAIR methods for reconstructing dynamic MRI data from highly undersampled k-t space measurements. We work with the multi-coil (12-element coil array) cardiac perfusion data  and the PINCAT data [12, 52] from prior works. For the cardiac perfusion data, we retrospectively undersampled the k-t space using variable-density random Cartesian undersampling with a different undersampling pattern for each time frame, and for the PINCAT data we used pseudo-radial sampling with a random rotation of radial lines between frames. The undersampling factors tested are shown in Table III. For each dataset, we obtained reconstructions using the proposed OnAIR methods, the online method but with a fixed DCT dictionary, and the batch learning-based reconstruction (based on (P1)) method . We also ran the recent L+S  and k-t SLR 
methods, two sophisticated batch methods for dynamic MRI reconstruction that process all the frames jointly. Finally, we also computed a baseline reconstruction in each experiment by performing zeroth order interpolation across time at non-sampled k-t space locations (by filling such locations with the nearest non-zero entry along time) and then backpropagating the filled k-t space to image space by pre-multiplying with thecorresponding to fully sampled data. The first estimates () of newly arrived frames in our OnAIR methods are also computed via such zeroth order interpolation, but using only the estimated (i.e., already once processed and reconstructed) nearest (older) frames.
For the online schemes, we used spatiotemporal patches with frames per temporal window (minibatch) and a (temporal) window stride of 1 frame. We extracted overlapping patches in each minibatch using a spatial stride of 2 pixels along each dimension. We learned a square dictionary whose atoms when reshaped into space-time matrices had rank , which worked well in our experiments. A forgetting factor of was observed to work well. We ran the online schemes for outer iterations777Prior to the first outer iteration for each minibatch in (P1), the sparse coefficients were updated using block coordinate descent iterations to allow better adaptation to new patches. per minibatch, with iteration in the dictionary learning step and proximal gradient steps in the image update step, respectively. We used outer iterations for the first minibatch to warm start the algorithm and used an initial DCT , and the initial sparse codes were zero. After one complete pass of the online schemes over all the frames, we performed another pass over the frames, using the reconstructed frames and learned dictionary from the first pass as the first initializations in the second pass. The additional pass gives a slight image quality improvement over the first (a single pass corresponds to a fully online scheme) as will be shown later.
For the batch dictionary learning-based reconstruction method, we used the same patch dimensions, strides, and initializations as for the online methods. We ran the batch method for 50 iterations. For the L+S and k-t SLR methods, we used the publicly available MATLAB implementations from  and , respectively, and ran each method to convergence. The regularization parameters (weights) for all the methods here were tuned for each dataset by sweeping them over a range of values and selecting values that achieved good reconstruction quality at intermediate k-t space undersampling factors. We measured the dMRI reconstruction quality using the normalized root mean square error (NRMSE) metric expressed as percentage that is computed as
where is a candidate reconstruction and is the reference reconstruction (e.g., computed from “fully” sampled data).
|OnAIR-LD (2 passes)||10.2%||12.8%||14.8%||16.7%||18.1%||18.0%|
|OnAIR-LD (1 pass)||10.2%||12.9%||14.8%||16.6%||18.3%||18.1%|
V-B2 Results and Comparisons
Table III shows the reconstruction NRMSE values obtained using various methods for the cardiac perfusion and PINCAT datasets at several undersampling factors. The proposed OnAIR-LD method achieves lower NRMSE values in almost every case compared to the L+S and k-t SLR methods, the online DCT scheme, the baseline reconstruction, and the batch learning-based reconstruction scheme. Unlike the batch schemes (i.e., L+S, k-t SLR, and the batch learning-based reconstruction) that process or learn over all the data jointly, the OnAIR-LD scheme only directly processes data corresponding to a minibatch of frames at any time. Yet OnAIR-LD outperforms the other methods because of its memory (via the forgetting factor) and local temporal adaptivity (or tracking). These results show that the proposed OnAIR methods are well-suited for processing streaming data.
Fig. 7 shows the reconstructions and reconstruction error maps (magnitudes displayed) for some representative frames from the cardiac perfusion and PINCAT datasets at 12x and 7x undersampling, respectively. The error maps indicate that the proposed OnAIR-LD scheme often produces fewer artifacts compared to the existing methods.
Fig. 8 shows that the OnAIR-LD scheme also typically achieves better frame-by-frame NRMSE compared to the other methods. Finally, Fig. 9 shows the reconstructed profiles for various methods obtained by extracting the same vertical line segment from each reconstructed frame of the PINCAT data and concatenating them. The online DCT scheme and the batch methods L+S and k-t SLR show line-like or additional smoothing artifacts that are not produced by the proposed OnAIR method.
Table IV investigates the properties of the proposed OnAIR methods in more detail using the cardiac perfusion data. Specifically, it compares the NRMSE values produced by the OnAIR-LD scheme with one or two passes over the data, the OnAIR-UD method, and the online method with a fixed DCT dictionary. In addition, we ran the online method but with a fixed “oracle” dictionary learned from patches of the reference (true) reconstruction by solving the DINO-KAT learning problem (2). The oracle dictionary was computed based on the “fully” sampled data, so it can be viewed as the “best” dictionary that one could learn from the undersampled dataset. From Table IV, we see that the NRMSE values achieved by the OnAIR-LD scheme with two passes are within 0.0% - 0.5% of the oracle NRMSE values, which suggests that the proposed scheme is able to learn dictionaries with good representational (recovery) qualities from highly undersampled data. Moreover, the performance of the OnAIR-LD scheme with a single pass is almost identical to that with two passes, suggesting promise for fully online dMRI reconstruction. The OnAIR-LD method outperformed the OnAIR-UD scheme for the cardiac perfusion data indicating that temporal low-rank properties better characterize the dataset than unitary models. All the OnAIR schemes outperformed the nonadaptive yet online DCT-based scheme.
To assess the relative efficiency of the OnAIR methods compared to the Batch Learning method, we measured runtimes on the PINCAT dataset with the same parameter settings for each method used to generate the results in Table III.888Experiments were performed in MATLAB R2017b on a 2016 MacBook Pro with a 2.7 GHz Intel Core i7 processor and 16 GB RAM. With these settings, the OnAIR-LD and OnAIR-UD methods were 9.8x and 40.4x faster, respectively, than the Batch Learning method. One key reason for these significant performance improvements is that 50 outer iterations were required by the batch method while only 7 outer iterations were required by the online methods (after the first minibatch) to achieve the reported accuracies; this result suggests that the online dictionary adaptation performed by the OnAIR methods is both computationally efficient and allows the model to better adapt to the underlying structure of the data. The additional speedup of the OnAIR-UD method with respect to OnAIR-LD is attributable to the relative efficiency of the matrix-valued updates of the unitary method compared to the less optimized block coordinate descent iterations over the columns of and prescribed by OnAIR-LD.
Fig. 10 shows an example of a learned dictionary produced by the proposed OnAIR-LD method on the PINCAT dataset at 9x undersampling, which is compared with the initial DCT dictionary. The dictionaries have atoms, each a complex-valued space-time tensor. Since the OnAIR LD dictionary atoms have rank when reshaped into space-time matrices, we directly display the real and imaginary parts of the first () slice of each learned atom. The initial DCT is also displayed similarly. The (eventual) learned dictionary has clearly evolved significantly from the initial DCT atoms and has adapted to certain smooth and sharp textures at various orientations in the data.
Recall that we chose full-rank () atoms in the video inpainting experiments, while here we chose low-rank () atoms. Intuitively, low-rank atoms are a better model for the dynamic MRI data because the videos have high temporal correlation and rank- atoms are necessarily constant in their temporal dimension. Conversely, the videos from Section V-A contained significant camera motion and thus dictionary atoms with more temporal variation (i.e., higher rank) enabled more accurate reconstructions.
This paper has presented a framework for online estimation of dynamic image sequences by learning dictionaries. Various properties were also studied for the learned dictionary such as a unitary property, and low-rank atoms, which offer additional efficiency or robustness to artifacts. The proposed OnAIR algorithms sequentially and efficiently update the images, dictionary, and sparse coefficients of image patches from streaming measurements. Importantly, our algorithms can process arbitrarily long video sequences with constant memory usage over time. Our numerical experiments demonstrated that the proposed methods produce accurate reconstructions for video inpainting and dynamic MRI reconstruction. The proposed methods may also be suitable for other inverse problems, including medical imaging applications such as interventional imaging, and other video-processing tasks from computer vision. We hope to investigate these application domains as well as demonstrate potential real-time applicability for OnAIR approaches in future work.
-  N. Rajpoot, Z. Yao, and R. Wilson, “Adaptive wavelet restoration of noisy video sequences,” in International Conference on Image Processing, vol. 2, 2004, pp. 957–960 Vol.2.
-  D. Rusanovskyy and K. Egiazarian, “Video denoising algorithm in sliding 3D DCT domain,” in Advanced Concepts for Intelligent Vision Systems: 7th International Conference, ACIVS 2005, 2005, pp. 618–625.
-  M. Lustig, D. L. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
-  M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on signal processing, vol. 54, no. 11, pp. 4311–4322, 2006.
-  M. Yaghoobi, T. Blumensath, and M. Davies, “Dictionary learning for sparse approximations with the majorization method,” IEEE Transaction on Signal Processing, vol. 57, no. 6, pp. 2178–2191, 2009.
-  S. Ravishankar and Y. Bresler, “Learning sparsifying transforms,” IEEE Trans. Signal Process., vol. 61, no. 5, pp. 1072–1086, 2013.
-  ——, “Learning doubly sparse transforms for images,” IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 4598–4612, 2013.
-  B. Wen, S. Ravishankar, and Y. Bresler, “Learning flipping and rotation invariant sparsifying transforms,” in 2016 IEEE International Conference on Image Processing, 2016, pp. 3857–3861.
-  J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. on Image Processing, vol. 17, no. 1, pp. 53–69, 2008.
-  S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, 2011.
-  M. Protter and M. Elad, “Image sequence denoising via sparse and redundant representations,” IEEE Trans. on Image Processing, vol. 18, no. 1, pp. 27–36, 2009.
-  S. G. Lingala and M. Jacob, “Blind compressive sensing dynamic MRI,” IEEE Transactions on Medical Imaging, vol. 32, no. 6, pp. 1132–1145, 2013.
-  S. Ravishankar, B. E. Moore, R. R. Nadakuditi, and J. A. Fessler, “Efficient learning of dictionaries with low-rank atoms,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), December 2016, pp. 222–226.
-  J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” J. Mach. Learn. Res., vol. 11, pp. 19–60, 2010.
-  S. Ravishankar, B. Wen, and Y. Bresler, “Online sparsifying transform learning – Part I: Algorithms,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 625–636, 2015.
-  S. Ravishankar and Y. Bresler, “Online sparsifying transform learning – part II: Convergence analysis,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 637–646, 2015.
-  B. Wen, S. Ravishankar, and Y. Bresler, “Video denoising by online 3D sparsifying transform learning,” in 2015 IEEE International Conference on Image Processing (ICIP), 2015, pp. 118–122.
-  ——, “VIDOSAT: High-dimensional sparsifying transform learning for online video denoising,” IEEE Transactions on Image Processing, 2018, to appear.
-  H. Guo and N. Vaswani, “Video denoising via online sparse and low-rank matrix decomposition,” in IEEE Statistical Signal Processing Workshop (SSP), June 2016, pp. 1–5.
-  C. Sutour, C. Deledalle, and J. Aujol, “Adaptive regularization of the NL-means: Application to image and video denoising,” IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3506–3521, Aug 2014.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, July 2017.
-  M. Lustig, J. M. Santos, D. L. Donoho, and J. M. Pauly, “k-t SPARSE: High frame rate dynamic MRI exploiting spatio-temporal sparsity,” in Proc. ISMRM, 2006, p. 2420.
-  H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, , and J. C. Ye, “k-t FOCUSS: A general compressed sensing framework for high resolution dynamic MRI,” Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.
-  R. Otazo, E. Candès, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magnetic Resonance in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
-  Z. P. Liang, “Spatiotemporal imaging with partially separable functions,” in IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2007, pp. 988–991.
-  J. P. Haldar and Z. P. Liang, “Spatiotemporal imaging with partially separable functions: A matrix recovery approach,” in IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010, pp. 716–719.
-  B. Zhao, J. P. Haldar, C. Brinegar, and Z. P. Liang, “Low rank matrix recovery for real-time cardiac MRI,” in IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2010, pp. 996–999.
H. Pedersen, S. Kozerke, S. Ringgaard, K. Nehrke, and W. Y. Kim, “k-t PCA: Temporally constrained k-t BLAST reconstruction using principal component analysis,”Magnetic Resonance in Medicine, vol. 62, no. 3, pp. 706–716, 2009.
-  J. Trzasko and A. Manduca, “Local versus global low-rank promotion in dynamic MRI series reconstruction,” in Proc. ISMRM, 2011, p. 4371.
-  E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–11:37, 2011.
-  J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic mr data reconstruction,” IEEE Transactions on Medical Imaging, vol. 33, no. 4, pp. 979–994, 2014.
-  S. Ravishankar, B. E. Moore, R. R. Nadakuditi, and J. A. Fessler, “Low-rank and adaptive sparse signal (LASSI) models for highly accelerated dynamic imaging,” IEEE Transactions on Medical Imaging, vol. 36, no. 5, pp. 1116–1128, 2017.
-  B. E. Moore and S. Ravishankar, “Online data-driven dynamic image restoration using DINO-KAT models,” in IEEE International Conference on Image Processing, September 2017, pp. 3590–3594.
-  S. Ravishankar, B. E. Moore, R. R. Nadakuditi, and J. A. Fessler, “Efficient online dictionary adaptation and image reconstruction for dynamic MRI,” in Asilomar Conference on Signals, Systems, and Computers, October 2017, pp. 835–839.
-  R. Gribonval and K. Schnass, “Dictionary identification–sparse matrix-factorization via -minimization,” IEEE Trans. Inform. Theory, vol. 56, no. 7, pp. 3523–3539, 2010.
-  D. Barchiesi and M. D. Plumbley, “Learning incoherent dictionaries for sparse approximation using iterative projections and rotations,” IEEE Transactions on Signal Processing, vol. 61, no. 8, pp. 2055–2065, 2013.
I. Ramirez, P. Sprechmann, and G. Sapiro, “Classification and clustering via
dictionary learning with structured incoherence and shared features,” in
Proc. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2010, 2010, pp. 3501–3508.
-  R. Rubinstein, M. Zibulevsky, and M. Elad, “Double sparsity: Learning sparse dictionaries for sparse signal approximation,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1553–1564, 2010.
-  K. Engan, S. Aase, and J. Hakon-Husoy, “Method of optimal directions for frame design,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999, pp. 2443–2446.
-  M. Yaghoobi, T. Blumensath, and M. Davies, “Dictionary learning for sparse approximations with the majorization method,” IEEE Transaction on Signal Processing, vol. 57, no. 6, pp. 2178–2191, 2009.
-  K. Skretting and K. Engan, “Recursive least squares dictionary learning algorithm,” IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2121–2130, 2010.
-  L. N. Smith and M. Elad, “Improving dictionary learning: Multiple dictionary updates and coefficient reuse,” IEEE Signal Processing Letters, vol. 20, no. 1, pp. 79–82, Jan 2013.
-  A. Rakotomamonjy, “Direct optimization of the dictionary learning problem,” IEEE Transactions on Signal Processing, vol. 61, no. 22, pp. 5495–5506, 2013.
-  S. Ravishankar, R. R. Nadakuditi, and J. A. Fessler, “Efficient sum of outer products dictionary learning (SOUP-DIL) and its application to inverse problems,” IEEE Transactions on Computational Imaging, vol. 3, no. 4, pp. 694–709, Dec 2017.
-  S. Ravishankar and Y. Bresler, “Closed-form solutions within sparsifying transform learning,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 5378–5382.
-  ——, “Data-driven learning of a union of sparsifying transforms model for blind compressed sensing,” IEEE Transactions on Computational Imaging, vol. 2, no. 3, pp. 294–309, 2016.
L. Bottou, “On-line learning in neural networks.” New York, NY, USA: Cambridge Univ Pr, 1998, ch. Online learning and stochastic approximations, pp. 9–42.
-  J. C. Gower, “Generalized procrustes analysis,” Psychometrika, vol. 40, no. 1, pp. 33–51, 1975.
-  P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Modeling & Simulation, vol. 4, no. 4, pp. 1168–1200, 2005.
-  N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
-  M. Maggioni, V. Katkovnik, K. Egiazarian, and A. Foi, “Nonlocal transform-domain filter for volumetric data denoising and reconstruction,” IEEE transactions on image processing, vol. 22, no. 1, pp. 119–133, 2013.
-  B. Sharif and Y. Bresler, “Adaptive real-time cardiac MRI using paradise: Validation by the physiologically improved NCAT phantom,” in IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2007, pp. 1020–1023.
-  S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: kt slr,” IEEE transactions on medical imaging, vol. 30, no. 5, pp. 1042–1054, 2011.
-  R. Otazo, “L+S reconstruction matlab code,” http://cai2r.net/resources/software/ls-reconstruction-matlab-code, 2014, [Online; accessed Mar. 2015].
-  S. G. Lingala, “k-t SLR: MATLAB package,” http://user.engineering.uiowa.edu/~jcb/software/ktslr_matlab/Software.html, 2011, [Online; accessed Mar. 2015].