1 Introduction
With the rapid progress in Magnetic Resonance Imaging (MRI), there are a multitude of mechanisms to generate tissue contrast that are associated with various anatomical or functional features. However, acquiring a complete multimodal set of high-resolution images is constrained by scanning costs, scanner availability, scanning time, and patient comfort. In addition, in long-term longitudinal studies such as ADNI [24], the scanner or acquisition protocol changes over time. In these situations, it is not uncommon to have images of the same subject obtained from different sources, or to be confronted with missing or corrupted data from earlier time points. Moreover, high-resolution (HR) 3D medical imaging usually requires long breath-hold and repetition times, which lead to long scanning times that are challenging or infeasible in clinical routine. Acquiring low-resolution (LR) images and/or skipping some imaging modalities altogether is then not uncommon. In all such scenarios, it is highly desirable to be able to generate HR data of the desired target modality from the given LR data of another modality.
The relevant literature in this area can be divided into super-resolution (SR) reconstruction from single or multiple image modalities and cross-modality (image) synthesis (CMS). On the one hand, SR is typically concerned with improving visual quality or overcoming the resolution limits of the acquired image data. This problem is generally underdetermined and ill-posed, so its solution is not unique. To mitigate this, the solution space needs to be constrained by incorporating strong priors. Prior information comes, for example, in the form of smoothness assumptions, as in interpolation-based SR [20, 28]. State-of-the-art methods mostly adopt either external or internal data to guide the learning algorithms [25, 30]. On the other hand, because optimal image representations vary across modalities, an image model learned from data of one modality may not be optimal for a different modality. How to reveal the relationship between different representations of the underlying image information is a major open research issue. To synthesize one modality from another, recent CMS methods have used, to name a few, non-parametric nearest neighbor (NN) search [8], nonlinear regression forests [19], coupled dictionary learning [26], and convolutional neural networks (CNNs) [10]. Although these algorithms achieve remarkable results, most of them suffer from the fundamental limitations of supervised learning and/or patch-based synthesis. Supervised approaches require a large number of training image pairs, which is impractical in many medical imaging applications. Patch-based synthesis suffers from inconsistencies introduced during the fusion step in areas where patches overlap.
In this paper, we propose a weakly-supervised convolutional sparse coding method with an application to neuroimaging that utilizes a small set of registered multimodal image pairs and solves the SR and CMS problems simultaneously. Rather than factorizing each patch into a linear combination of patches drawn from a dictionary built under sparsity constraints (sparse coding), requiring a training set of fully registered multimodal image pairs, or requiring the same sparse code to be used for both modalities, we build a unified learning model that automatically learns a joint representation for heterogeneous data (e.g., different resolutions, modalities, and relative poses). This representation is learned in a common feature space that preserves the local consistency of the images. Specifically, we utilize the co-occurrence of texture features across both domains. A manifold ranking method picks features of the target domain from the most similar subjects in the source domain. Once the correspondence between images in different domains is established, we work directly on a whole-image representation that intrinsically respects local neighborhoods. Furthermore, we learn a mapping function that links the representations of the two modalities. We call the proposed method WEakly-supErvised joiNt convolutIonal sparsE coding (WEENIE) and perform extensive experiments to verify its performance.
The main contributions of this paper are as follows: 1) This is the first attempt to jointly solve the SR and CMS problems in 3D medical imaging using weakly-supervised joint convolutional sparse coding; 2) To exploit unpaired images from different domains during the learning phase, we propose a hetero-domain image alignment term, which identifies correspondences across the source and target domains and is invariant to pose transformations; 3) To map LR and HR cross-modality image pairs, we propose joint learning based on convolutional sparse coding that includes a maximum mean discrepancy (MMD) term; 4) Extensive experimental results show that the proposed model outperforms state-of-the-art methods in terms of both reconstruction error and visual quality measures.
2 Related Work
With the goal of transferring modality information from the source domain to the target domain, recent developments in CMS, such as texture synthesis [6, 10, 13], face photo-sketch synthesis [9, 36], and multimodal retrieval [23, 29], have shown promising results. In this paper, we focus on image super-resolution and cross-modality synthesis, so we only review related methods in these two areas.
Image Super-Resolution: The purpose of image SR is to reconstruct an HR image from its LR counterpart. According to the image priors they use, image SR methods can be grouped into two main categories: interpolation-based methods and learning-based methods driven by external or internal data. Interpolation-based SR works, including classic bilinear [21] and bicubic [20] interpolation and some follow-up methods [28, 41], estimate a denser HR grid as a weighted average of local neighbors. Most modern image SR methods have shifted from interpolation to learning. These methods focus on learning a compact dictionary or manifold space that relates LR/HR image pairs, and presume that the high-frequency (HF) details lost in LR images can be predicted by learning from either external datasets or internal self-similarity. External data-driven SR approaches [3, 7, 38] exploit a mapping between LR and HR image pairs from a specified external dataset. In the pioneering work of Freeman et al. [7], the NN of an LR patch is found, and its corresponding HR patch is used to estimate HF details in a Markov network. Chang et al. [3] projected multiple NNs of the local geometry from the LR feature space onto the HR feature space to estimate the HR embedding. Furthermore, sparse coding-based methods [27, 38] were explored to learn a pair of dictionaries for LR and HR patch pairs. Wang et al. [35] and Huang et al. [14] further suggested modeling the relationship between LR and HR patches in a feature space to relax the strong constraint that both share the same sparse code. Recently, an efficient CNN-based approach was proposed in [5], which directly learns an end-to-end mapping between LR and HR images, performing complex nonlinear regression. Internal data-driven SR methods can be built using similarity search [25] and/or a scale-space pyramid of the given image itself [15].
Cross-Modality Synthesis: In parallel, various CMS methods have been proposed for synthesizing unavailable modality data from available source images, especially in the medical imaging community [26, 33, 34]. One of the well-established modality transformation approaches is the example-based learning method of Freeman et al. [8]. Given a patch of a test image, several NNs with similar properties are picked from the source image space to reconstruct the target one using Markov random fields. Roy et al. [26] used sparse coding for MR contrast synthesis, assuming that cross-modality patch pairs have the same representations and can be used directly to train dictionaries that estimate the contrast of the target modality. A similar approach was also used in [17]. In [1], a canonical correlation analysis-based approach was proposed to yield a feature space that captures the underlying common structures of co-registered data for better correlation of dictionary pairs. More recently, a location-sensitive deep network [33] has been put forward that explicitly uses voxel image coordinates by incorporating image intensities and spatial information into a deep network for synthesis.
Gatys et al. [10] introduced a CNN-based algorithm for artistic style transfer, in which new images are generated by performing a pre-image search in high-level image content to match generic feature representations of example images. Moreover, most of the aforementioned CMS algorithms rely on strictly registered pairs to train their models. As argued in [34], it would be preferable to use an unsupervised approach that can handle the input data without requiring it to be paired.
3 Weakly-Supervised Joint Convolutional Sparse Coding
3.1 Preliminaries
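Convolutional Sparse Coding (CSC) was previously introduced in the context of modeling receptive fields, and later generalized to image processing, where the representation of an entire image is computed as the sum of a set of convolutions with dictionary filters. The goal of CSC is to remedy the shortcomings of conventional patch-based sparse coding methods by removing shift variations, giving a consistent approximation of local neighborhoods over whole images. Concretely, given the vectorized image $\mathbf{x}$, the problem of generating a set of $K$ vectorized filters for sparse feature maps is solved by minimizing an objective function that combines the squared reconstruction error and the $\ell_1$-norm penalty on the representations:

$$\arg\min_{\mathbf{f},\mathbf{z}}\ \frac{1}{2}\Big\|\mathbf{x} - \sum_{k=1}^{K}\mathbf{f}_k * \mathbf{z}_k\Big\|_2^2 + \lambda\sum_{k=1}^{K}\|\mathbf{z}_k\|_1 \quad \text{s.t. } \|\mathbf{f}_k\|_2^2 \le 1,\ \forall k \in \{1,\dots,K\}, \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^{m}$ is an image in vector form, $\mathbf{f}_k \in \mathbb{R}^{d}$ refers to the $k$-th filter in vector form, $\mathbf{z}_k$ is the sparse feature map corresponding to $\mathbf{f}_k$, sized so that $\mathbf{f}_k * \mathbf{z}_k$ approximates $\mathbf{x}$, $\lambda$ controls the $\ell_1$ penalty, and $*$ denotes the 2D convolution operator. $\mathbf{f} = [\mathbf{f}_1^\top,\dots,\mathbf{f}_K^\top]^\top$ and $\mathbf{z} = [\mathbf{z}_1^\top,\dots,\mathbf{z}_K^\top]^\top$ are the $K$ filters and feature maps stacked into single column vectors, respectively. Here, the inequality constraint on each vectorized filter $\mathbf{f}_k$ prevents the filter from absorbing all the energy of the system.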
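To make Eq. (1) concrete, the following sketch evaluates its two terms for a 1D signal rather than a 2D image. It is a minimal illustration under stated assumptions: "valid" convolution with feature maps of length $m + d - 1$ (so each $\mathbf{f}_k * \mathbf{z}_k$ has the length $m$ of $\mathbf{x}$), plain Python lists, and hypothetical helper names (`conv1d_valid`, `csc_objective`, `filters_feasible`) that do not come from the paper.

```python
# Minimal 1D sketch of the CSC objective in Eq. (1); the paper works with 2D images.

def conv1d_valid(f, z):
    """'Valid' 1D convolution: a map of length m + d - 1 yields m output samples."""
    d = len(f)
    m = len(z) - d + 1
    return [sum(f[i] * z[n + d - 1 - i] for i in range(d)) for n in range(m)]


def csc_objective(x, filters, maps, lam):
    """Return (data term, l1 term) of 0.5*||x - sum_k f_k * z_k||^2 + lam * sum_k ||z_k||_1."""
    recon = [0.0] * len(x)
    for f, z in zip(filters, maps):
        for n, v in enumerate(conv1d_valid(f, z)):
            recon[n] += v
    data = 0.5 * sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    sparsity = lam * sum(abs(v) for z in maps for v in z)
    return data, sparsity


def filters_feasible(filters, tol=1e-12):
    """Check the constraint ||f_k||_2^2 <= 1 for every filter (with a float tolerance)."""
    return all(sum(v * v for v in f) <= 1.0 + tol for f in filters)
```

For example, the filter `[0.0, 1.0]` is a pure one-sample shift, so the map `[1.0, 2.0, 3.0, 0.0]` reconstructs `x = [1.0, 2.0, 3.0]` exactly and only the $\ell_1$ term remains.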
Similar to the original sparse coding problem, Zeiler et al. [39] proposed solving the CSC problem in Eq. (1) by alternately optimizing one variable while fixing the other in the spatial domain. Recent advances in fast convolutional sparse coding (FCSC) [2] have shown that feature learning can be solved efficiently and explicitly by embedding CSC in an alternating direction method of multipliers (ADMM) framework in the Fourier domain.
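The Fourier domain helps because of the convolution theorem: a convolution becomes a component-wise product of spectra. The sketch below checks this numerically for a 1D signal with a naive DFT (an FFT computes the same values faster). It assumes circular (wrap-around) boundaries, which is what a DFT product implements and why boundary padding is needed in practice; the function names are illustrative only.

```python
import cmath


def dft(v):
    """Naive O(n^2) discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning only real parts (all signals here are real)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def circular_conv(f, z):
    """Spatial-domain circular convolution of a short filter f with a map z."""
    n = len(z)
    return [sum(f[i] * z[(t - i) % n] for i in range(len(f))) for t in range(n)]


def fourier_conv(f, z):
    """Same result via the convolution theorem: zero-pad f, multiply spectra element-wise."""
    n = len(z)
    f_hat = dft(list(f) + [0.0] * (n - len(f)))
    z_hat = dft(z)
    return idft([a * b for a, b in zip(f_hat, z_hat)])
```

Both functions agree up to floating-point error, but only the Fourier version turns every convolution into $n$ independent scalar products, which is what makes the per-frequency linear solves in FCSC cheap.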
3.2 Problem Formulation
The simultaneous SR and cross-modality synthesis problem can be formulated as follows: given a three-dimensional LR image $\mathbf{X}$ of modality $\mathcal{M}_1$, the task is to infer from $\mathbf{X}$ a target 3D image $\mathbf{Y}$ that is as similar as possible to the HR ground truth of the desired modality $\mathcal{M}_2$. Suppose that we are given a group of LR images of modality $\mathcal{M}_1$, i.e., $\mathcal{X} = \{\mathbf{X}_1,\dots,\mathbf{X}_P\} \in \mathbb{R}^{m\times n\times t\times P}$, and a set of HR images of modality $\mathcal{M}_2$, i.e., $\mathcal{Y} = \{\mathbf{Y}_1,\dots,\mathbf{Y}_Q\} \in \mathbb{R}^{m\times n\times t\times Q}$. $P$ and $Q$ are the numbers of samples in the two training sets, $m$ and $n$ denote the dimensions of the axial view of each image, and $t$ is the size of the image along the z-axis. Moreover, in both training sets, the subjects of the source modality $\mathcal{M}_1$ are mostly different from those of the target modality $\mathcal{M}_2$; that is, only a small number of the data are paired, while most of them are unpaired. The difficulty of the problem therefore depends on how heterogeneous the two domains are (e.g., in resolution and modality) and how well they fit each other. To bridge image appearances across heterogeneous representations, we first automatically establish a one-to-one correspondence between the data in $\mathcal{X}$ and $\mathcal{Y}$, and then use the aligned data to jointly learn a pair of filters, assuming that there exists a mapping function $\mathbf{W}$ that associates and predicts cross-modality data in the projected common feature space. In this paper, we focus on synthesizing MRI of the human brain. An overview of the proposed method is depicted in Fig. 1.
Notation: For simplicity, we denote matrices and 3D images by uppercase bold letters (e.g., image $\mathbf{X}$), vectors and vectorized 2D images by lowercase bold letters (e.g., filter $\mathbf{f}$), and scalars by lowercase letters (e.g., the filter index $k$). Image $\mathbf{X}$ with modality $\mathcal{M}_1$, called the source modality, belongs to the source domain, and $\mathbf{Y}$ with modality $\mathcal{M}_2$, called the target modality, belongs to the target domain.
3.3 Hetero-Domain Image Alignment
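The design of an alignment $\mathcal{A}$ from $\mathcal{X}$ to $\mathcal{Y}$ requires extracting common components from the LR/HR images together with some measure of correlation between the two modalities. In the SR literature, common components are usually obtained by extracting high-frequency (HF) edges and texture features from LR/HR images, respectively [3, 38]. In this paper, we adopt first- and second-order derivatives involving horizontal and vertical gradients as the features for LR images, $\mathbf{X}^{hf}_p = \mathbf{G} * \mathbf{X}_p$, with $\mathbf{G} = [\mathbf{g}^1_1, \mathbf{g}^2_1; \mathbf{g}^1_2, \mathbf{g}^2_2]$, where each gradient has the same length along the z-axis as the input image and $\mathbf{g}^1_1 = [-1, 0, 1]$, $\mathbf{g}^2_1 = (\mathbf{g}^1_1)^\top$, $\mathbf{g}^1_2 = [1, 0, -2, 0, 1]$, and $\mathbf{g}^2_2 = (\mathbf{g}^1_2)^\top$. For HR images, HF features are obtained by directly subtracting the mean value, i.e., $\mathbf{Y}^{hf}_q = \mathbf{Y}_q - \mathrm{mean}(\mathbf{Y}_q)$. To define the hetero-domain image alignment term $\mathcal{A}$, we assume that the intrinsic structures of a subject's brain MRI are similar across modalities in the HF space, even though images of different modalities are typically described by different features. Once the HF features of both domains are obtained, we can build a cross-modality data alignment (in particular, a unilateral cross-modality matching can be regarded as a special case of [16]). To this end, we define a subject-specific transformation matrix $\mathbf{A}$ as

$$\mathbf{A} = \begin{bmatrix} \mathcal{K}(\mathbf{X}^{hf}_1, \mathbf{Y}^{hf}_1) & \cdots & \mathcal{K}(\mathbf{X}^{hf}_1, \mathbf{Y}^{hf}_Q) \\ \vdots & \ddots & \vdots \\ \mathcal{K}(\mathbf{X}^{hf}_P, \mathbf{Y}^{hf}_1) & \cdots & \mathcal{K}(\mathbf{X}^{hf}_P, \mathbf{Y}^{hf}_Q) \end{bmatrix}, \tag{2}$$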
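where $\mathcal{K}(\cdot,\cdot)$ measures the similarity between each pair of HF data in $\mathcal{X}$ and $\mathcal{Y}$ from their distance, computed by the Gaussian kernel

$$\mathcal{K}(\mathbf{X}^{hf}_p, \mathbf{Y}^{hf}_q) = \exp\left(-\frac{\|\mathbf{X}^{hf}_p - \mathbf{Y}^{hf}_q\|_F^2}{2\sigma^2}\right), \tag{3}$$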
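where $\sigma$ determines the width of the Gaussian kernel. To establish a one-to-one correspondence across the two domains, for each element of $\mathcal{X}$ we keep only the most relevant image in $\mathcal{Y}$, i.e., the one with maximum $\mathcal{K}$, and discard the rest of the elements:

$$\mathbf{A}_{pq} = \begin{cases} \max(\mathbf{A}_{p,:}), & \text{if } \mathbf{A}_{pq} = \max(\mathbf{A}_{p,:}), \\ \text{(left blank)}, & \text{otherwise}, \end{cases} \tag{4}$$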
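where $\max(\mathbf{A}_{p,:})$ denotes the maximum element of the $p$-th row of $\mathbf{A}$. We further set each retained maximum to 1 and all blank elements to 0, so that $\mathbf{A}$ becomes a binary matrix. Since $\mathbf{A}$ is computed in a subject-specific manner, each subject in $\mathcal{X}$ is connected to exactly one target subject, namely the one with the most similar brain structures. Hence, hetero-domain images can be treated as registered pairs, i.e., $\{\mathbf{X}_p, \mathbf{Y}^{\mathcal{A}}_p\}_{p=1}^{P}$, by constructing the virtual correspondence $\mathbf{Y}^{\mathcal{A}}_p = \sum_{q=1}^{Q}\mathbf{A}_{pq}\mathbf{Y}_q$.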
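A minimal sketch of the alignment in Eqs. (2)–(4) follows. Assumptions: HF features are given as flattened, equal-length lists for both domains (how the multi-channel LR gradient features are brought to the same length as the mean-subtracted HR features is not spelled out above, so the example simply takes precomputed source features), the squared Euclidean distance is used inside the Gaussian kernel, and ties are broken by the first maximum. All function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Eq. (3): exp(-||a - b||^2 / (2 sigma^2)) for equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def hf_features_hr(image):
    """HR high-frequency features: subtract the mean value of the (flattened) image."""
    mu = sum(image) / len(image)
    return [v - mu for v in image]


def alignment_matrix(src_feats, tgt_feats, sigma=1.0):
    """Kernel matrix A (Eq. 2), keep each row's maximum (Eq. 4), then binarize."""
    A = [[gaussian_kernel(x, y, sigma) for y in tgt_feats] for x in src_feats]
    binary = []
    for row in A:
        best = max(range(len(row)), key=lambda q: row[q])  # first maximum wins ties
        binary.append([1 if q == best else 0 for q in range(len(row))])
    return binary


def virtual_pairs(A, targets):
    """Y^A_p = sum_q A_pq Y_q: the matched target image for every source subject."""
    return [[sum(A[p][q] * targets[q][i] for q in range(len(targets)))
             for i in range(len(targets[0]))]
            for p in range(len(A))]
```

Because each row of the binary matrix contains a single 1, `virtual_pairs` simply selects the matched target subject for every source subject; the kernel only drives the matching and never rescales the images.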
3.4 Objective Function
For image modality transformation, coupled sparse coding [18, 38] has important advantages, such as reliable learning of corresponding dictionary pairs and lower memory cost. However, the arbitrarily aligned bases, each related to only a small part of the image, may lead to shifted versions of the same structures or to inconsistent representations across overlapping patches. CSC [39] addresses this problem with a global decomposition framework based on the whole image. Inspired by CSC and the benefits of coupled sparsity [18], we introduce a joint convolutional sparse coding method in a weakly-supervised setting for hetero-domain images. The small number of originally registered pairs carries the intrinsic relationship between $\mathcal{X}$ and $\mathcal{Y}$, while the majority of unpaired data is introduced to exploit and enhance the diversity of the original learning system.
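Assume that the aforementioned alignment approach leads to a perfect correspondence between $\mathcal{X}$ and $\mathcal{Y}$, such that each aligned pair of images carries approximately identical information (exactly identical for co-registered data). Moreover, to facilitate image mappings in a joint manner, we require the sparse feature maps of each pair of corresponding source and target images to be associated. That is, we suppose that there exists a mapping function $\mathbf{W}$ that converts the feature maps of LR images of modality $\mathcal{M}_1$ into those of their HR versions of modality $\mathcal{M}_2$. Given $\mathcal{X}$ and $\mathcal{Y}$, we propose to learn a pair of filters with corresponding feature maps and a mapping function, together with the alignment term, by

$$\begin{aligned}
\arg\min_{\mathbf{F}^x,\mathbf{F}^y,\mathbf{Z}^x,\mathbf{Z}^y,\mathbf{W}}\ & \frac{1}{2}\Big\|\mathbf{X}^{\mathcal{A}} - \sum_{k=1}^{K}\mathbf{f}^x_k * \mathbf{z}^x_k\Big\|_F^2 + \frac{1}{2}\Big\|\mathbf{Y}^{\mathcal{A}} - \sum_{k=1}^{K}\mathbf{f}^y_k * \mathbf{z}^y_k\Big\|_F^2 \\
& + \gamma\|\mathbf{Z}^y - \mathbf{W}\mathbf{Z}^x\|_F^2 + \lambda\big(\|\mathbf{Z}^x\|_1 + \|\mathbf{Z}^y\|_1\big) + \beta\|\mathbf{W}\|_F^2 \\
\text{s.t.}\ & \|\mathbf{f}^x_k\|_2^2 \le 1,\ \|\mathbf{f}^y_k\|_2^2 \le 1,\ \forall k \in \{1,\dots,K\},
\end{aligned} \tag{5}$$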
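where $\mathbf{z}^x_k$ and $\mathbf{z}^y_k$ are the $k$-th sparse feature maps that estimate the aligned data terms $\mathbf{X}^{\mathcal{A}}$ and $\mathbf{Y}^{\mathcal{A}}$ when convolved with the $k$-th filters $\mathbf{f}^x_k$ and $\mathbf{f}^y_k$ of a fixed spatial support, $\forall k \in \{1,\dots,K\}$. Concretely, $\mathbf{X}^{\mathcal{A}}$ denotes the aligned image from $\mathcal{X}$ with low resolution and modality $\mathcal{M}_1$, and $\mathbf{Y}^{\mathcal{A}}$ denotes the aligned image from $\mathcal{Y}$ with high resolution and modality $\mathcal{M}_2$. The convolution operation is denoted by the $*$ operator, and $\|\cdot\|_F$ denotes the Frobenius norm, chosen to induce the convolutional least-squares approximate solution. $\mathbf{F}^x$ and $\mathbf{F}^y$ collect all filters, while $\mathbf{Z}^x$ and $\mathbf{Z}^y$ collect the corresponding feature maps of the source and target domains, respectively. The alignment $\mathcal{A}$ enforces the correspondence for the unpaired auxiliary subjects. The mapping function $\mathbf{W}$ is modeled as a linear projection between $\mathbf{Z}^x$ and $\mathbf{Z}^y$, obtained by solving a least-squares problem (i.e., $\min_{\mathbf{W}}\|\mathbf{Z}^y - \mathbf{W}\mathbf{Z}^x\|_F^2$). The parameters $\lambda$, $\gamma$, and $\beta$ balance sparsity, the association mapping between feature maps, and the regularization of the mapping, respectively.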
It is worth noting that $\mathcal{A}$ may not be perfect, since the HF feature alignment in Eq. (4), which matches first- and second-order derivatives of $\mathcal{X}$ against mean-subtracted $\mathcal{Y}$, is not reliable enough for very heterogeneous domains; this leads to suboptimal filter pairs and inaccurate results. To overcome this problem, we need additional constraints that ensure the correctness of the registered image pairs produced by the alignment. In general, when the feature difference is substantially large, some subjects of the source domain are not particularly related to any target subject, even in the HF subspace. Thus, a procedure that assesses the divergence of each registered subject pair should be combined with the aforementioned joint learning model to handle this difficult setting. Recent works [4, 22, 42] have performed instance/domain adaptation by measuring the divergence between data distributions with the maximum mean discrepancy (MMD) criterion. We follow this idea and employ the empirical MMD as a non-parametric distribution measure in the reproducing kernel Hilbert space (RKHS) to handle the hetero-domain image pair mismatch problem. This is done by minimizing the difference between the distributions of aligned subjects while keeping dissimilar "registered" pairs (i.e., those with discrepant distributions) apart in the sparse feature map space:

$$\sum_{k=1}^{K}\left\|\frac{1}{P}\sum_{p=1}^{P}\mathbf{z}^x_{k,p} - \frac{1}{P}\sum_{p=1}^{P}\mathbf{z}^y_{k,p}\right\|_{\mathcal{H}}^2 = \sum_{k=1}^{K}\mathrm{tr}\left(\mathbf{Z}_k\mathbf{M}\mathbf{Z}_k^\top\right), \tag{6}$$
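where $\mathcal{H}$ denotes the RKHS, $\mathbf{z}^x_{k,p}$ and $\mathbf{z}^y_{k,p}$ are the paired $k$-th sparse feature maps of $\mathbf{X}^{\mathcal{A}}_p$ and $\mathbf{Y}^{\mathcal{A}}_p$ with $p = 1,\dots,P$, $\mathbf{Z}_k = [\mathbf{z}^x_{k,1},\dots,\mathbf{z}^x_{k,P},\mathbf{z}^y_{k,1},\dots,\mathbf{z}^y_{k,P}]$ is the $k$-th element of $\mathbf{Z}$, and $\mathbf{M}$ denotes the MMD matrix, computed as

$$\mathbf{M}_{ij} = \begin{cases} \dfrac{1}{P^2}, & \text{if } \mathbf{z}_i \text{ and } \mathbf{z}_j \text{ belong to the same domain}, \\[4pt] -\dfrac{1}{P^2}, & \text{otherwise}. \end{cases} \tag{7}$$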
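The identity in Eq. (6) can be checked numerically. This sketch assumes a linear kernel (so the RKHS mean embedding is just the mean of the flattened feature maps), a single filter index $k$, and the MMD matrix of Eq. (7) with entries $\pm 1/P^2$; the helper names are hypothetical.

```python
def mean_vector(maps):
    """Element-wise mean of a list of equal-length flattened feature maps."""
    n = len(maps)
    return [sum(m[i] for m in maps) / n for i in range(len(maps[0]))]


def mmd_from_means(zx, zy):
    """Left side of Eq. (6) for one k with a linear kernel: ||mean(z^x) - mean(z^y)||^2."""
    mx, my = mean_vector(zx), mean_vector(zy)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


def mmd_matrix(P):
    """Eq. (7): +1/P^2 for same-domain pairs, -1/P^2 for cross-domain pairs (2P x 2P)."""
    return [[(1.0 if (i < P) == (j < P) else -1.0) / P ** 2 for j in range(2 * P)]
            for i in range(2 * P)]


def mmd_from_trace(zx, zy):
    """Right side of Eq. (6): tr(Z_k M Z_k^T) = sum_ij M_ij <z_i, z_j>."""
    Z = list(zx) + list(zy)  # first P maps from the source domain, then P from the target
    M = mmd_matrix(len(zx))
    return sum(M[i][j] * sum(a * b for a, b in zip(Z[i], Z[j]))
               for i in range(len(Z)) for j in range(len(Z)))
```

The trace form is what makes the term convenient in optimization: it is a quadratic in the feature maps with a fixed matrix $\mathbf{M}$, so it adds a linear term to the gradient.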
3.5 Optimization
We propose a three-step optimization strategy for efficiently tackling the objective function in Eq. (8) (termed WEENIE and summarized in Algorithm 1), since this unified multi-variable framework is not jointly convex in $\{\mathbf{F}^x,\mathbf{F}^y\}$, $\{\mathbf{Z}^x,\mathbf{Z}^y\}$, and $\mathbf{W}$. It is, however, convex with respect to each of them when the remaining variables are fixed.
3.5.1 Computing Convolutional Sparse Coding
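Optimization involving only the sparse feature maps $\mathbf{Z}^x$ and $\mathbf{Z}^y$ is solved after initializing the filters $\mathbf{F}^x$, $\mathbf{F}^y$ and the mapping function $\mathbf{W}$ ($\mathbf{W}$ is initialized as an identity matrix). Besides the original CSC formulation, we have additional terms associated with data alignment and divergence reduction in the common feature space. Eq. (8) is first converted into two regularized sub-CSC problems. Each of these problems contains an $\ell_1$ penalty term, which is not rotation invariant and therefore cannot be carried over directly to the Fourier domain, where the $\ell_2$ terms would be preserved by Parseval's theorem. Recent approaches [2, 12] work around this problem by introducing two auxiliary variables, $\mathbf{t}$ and $\mathbf{s}$, and enforcing the splitting through equality constraints. To facilitate component-wise multiplications, we solve the convolution subproblem [2] in the Fourier domain², derived within the ADMM framework:

$$\begin{aligned}
\arg\min_{\hat{\mathbf{z}}^x,\hat{\mathbf{z}}^y,\mathbf{t}^x,\mathbf{t}^y}\ & \frac{1}{2}\Big\|\hat{\mathbf{x}}^{\mathcal{A}} - \sum_{k=1}^{K}\hat{\mathbf{f}}^x_k\odot\hat{\mathbf{z}}^x_k\Big\|_2^2 + \frac{1}{2}\Big\|\hat{\mathbf{y}}^{\mathcal{A}} - \sum_{k=1}^{K}\hat{\mathbf{f}}^y_k\odot\hat{\mathbf{z}}^y_k\Big\|_2^2 + \lambda\sum_{k=1}^{K}\big(\|\mathbf{t}^x_k\|_1 + \|\mathbf{t}^y_k\|_1\big) \\
& + \gamma\|\mathbf{Z}^y - \mathbf{W}\mathbf{Z}^x\|_F^2 + \sum_{k=1}^{K}\mathrm{tr}\big(\mathbf{Z}_k\mathbf{M}\mathbf{Z}_k^\top\big) \\
\text{s.t.}\ & \mathbf{t}^x_k = \Phi^{-1}\hat{\mathbf{z}}^x_k,\ \mathbf{t}^y_k = \Phi^{-1}\hat{\mathbf{z}}^y_k,\ \mathbf{s}^x_k = \mathbf{S}\Phi^{-1}\hat{\mathbf{f}}^x_k,\ \mathbf{s}^y_k = \mathbf{S}\Phi^{-1}\hat{\mathbf{f}}^y_k,\ \|\mathbf{s}^x_k\|_2^2 \le 1,\ \|\mathbf{s}^y_k\|_2^2 \le 1,\ \forall k,
\end{aligned} \tag{9}$$

where $\hat{\ }$ applied to any symbol denotes its discrete Fourier transform (DFT), e.g., $\hat{\mathbf{x}}^{\mathcal{A}} = \Phi\,\mathrm{vec}(\mathbf{X}^{\mathcal{A}})$, with $\Phi$ the Fourier transform operator; $\odot$ represents the Hadamard (component-wise) product, $\Phi^{-1}$ is the inverse DFT, and $\mathbf{S}$ projects a filter onto a small spatial support. By introducing the slack variables $\mathbf{t}^x_k$, $\mathbf{t}^y_k$ and $\mathbf{s}^x_k$, $\mathbf{s}^y_k$, the loss function can be treated as the sum of multiple subproblems with additional equality constraints.

² The fast Fourier transform (FFT) is used to solve the relevant linear system, which gives substantially better asymptotic performance than solving it in the spatial domain.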
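The splitting idea can be illustrated on a much smaller problem. The sketch below runs ADMM for single-filter 1D convolutional sparse coding: the quadratic $\mathbf{z}$-update is solved independently per frequency in the Fourier domain, the $\ell_1$ term is handled by soft-thresholding the slack variable $\mathbf{t}$, and a scaled dual variable $\mathbf{u}$ enforces $\mathbf{t} = \mathbf{z}$. It is a simplified stand-in, not the full Eq. (9): it omits the second domain, the mapping and MMD terms, and the support projection $\mathbf{S}$, assumes circular boundaries, and uses a naive DFT; the names and defaults (`lam`, `rho`, `iters`) are illustrative assumptions.

```python
import cmath


def dft(v):
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def circular_conv(f, z):
    n = len(z)
    return [sum(f[i] * z[(t - i) % n] for i in range(len(f))) for t in range(n)]


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: element-wise shrinkage toward zero."""
    return [max(abs(a) - tau, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def objective(x, f, z, lam):
    r = circular_conv(f, z)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, r)) + lam * sum(abs(a) for a in z)


def admm_sparse_code(x, f, lam=0.05, rho=1.0, iters=100):
    """ADMM for min 0.5*||x - f (*) z||^2 + lam*||t||_1  s.t.  t = z (single filter)."""
    n = len(x)
    f_hat = dft(list(f) + [0.0] * (n - len(f)))
    x_hat = dft(x)
    t = [0.0] * n  # sparse copy of z (the l1 term acts here)
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # z-update: argmin 0.5*||x - f (*) z||^2 + rho/2*||z - (t - u)||^2, per frequency
        v_hat = dft([a - b for a, b in zip(t, u)])
        z_hat = [(fk.conjugate() * xk + rho * vk) / (abs(fk) ** 2 + rho)
                 for fk, xk, vk in zip(f_hat, x_hat, v_hat)]
        z = idft(z_hat)
        # t-update: soft-thresholding in the spatial domain
        t = soft_threshold([a + b for a, b in zip(z, u)], lam / rho)
        # dual update enforcing t = z
        u = [a + b - c for a, b, c in zip(u, z, t)]
    return t
```

The division by $|\hat{f}_k|^2 + \rho$ is the scalar counterpart of the linear system that FCSC solves per frequency when several filters are involved.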
3.5.2 Training Filters
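As in standard CSC methods, we alternately optimize the convolutional least-squares term for the filter pairs $\mathbf{f}^x$ and $\mathbf{f}^y$, followed by an $\ell_1$-regularized least-squares term for the corresponding sparse feature maps $\mathbf{z}^x$ and $\mathbf{z}^y$. The filter pairs can be learned in a fashion similar to the feature-map subproblem. With $\mathbf{Z}^x$, $\mathbf{Z}^y$, and $\mathbf{W}$ fixed, we update the corresponding filter pairs $\mathbf{f}^x_k$ and $\mathbf{f}^y_k$ as

$$\begin{aligned}
\arg\min_{\mathbf{f}^x,\mathbf{f}^y}\ & \frac{1}{2}\Big\|\mathbf{X}^{\mathcal{A}} - \sum_{k=1}^{K}\mathbf{f}^x_k * \mathbf{z}^x_k\Big\|_F^2 + \frac{1}{2}\Big\|\mathbf{Y}^{\mathcal{A}} - \sum_{k=1}^{K}\mathbf{f}^y_k * \mathbf{z}^y_k\Big\|_F^2 \\
\text{s.t.}\ & \|\mathbf{f}^x_k\|_2^2 \le 1,\ \|\mathbf{f}^y_k\|_2^2 \le 1,\ \forall k.
\end{aligned} \tag{10}$$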
The optimization problem in Eq. (10) can be solved by a one-by-one update strategy [35] through an augmented Lagrangian method [2].
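The paper solves Eq. (10) with an augmented Lagrangian method; as a simpler, swapped-in technique, the sketch below uses projected gradient descent for one 1D filter with its feature map fixed, projecting onto the unit $\ell_2$ ball after each step so that the constraint $\|\mathbf{f}_k\|_2^2 \le 1$ always holds. Circular boundaries, the step size, the iteration count, and all names are assumptions chosen for the toy example.

```python
def circular_conv(f, z):
    n = len(z)
    return [sum(f[i] * z[(t - i) % n] for i in range(len(f))) for t in range(n)]


def project_unit_ball(f):
    """Project a filter onto {f : ||f||_2 <= 1}, the feasible set of Eq. (10)."""
    norm = sum(a * a for a in f) ** 0.5
    return list(f) if norm <= 1.0 else [a / norm for a in f]


def filter_loss(x, f, z):
    r = circular_conv(f, z)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, r))


def update_filter(x, f, z, step=0.05, iters=200):
    """Projected gradient descent on 0.5*||x - f (*) z||^2 with the feature map z fixed."""
    n, d = len(x), len(f)
    f = project_unit_ball(f)
    for _ in range(iters):
        resid = [a - b for a, b in zip(circular_conv(f, z), x)]
        # d/df_i of the loss: correlation of the residual with the shifted feature map
        grad = [sum(resid[t] * z[(t - i) % n] for t in range(n)) for i in range(d)]
        f = project_unit_ball([a - step * g for a, g in zip(f, grad)])
    return f
```

The step size must stay below $2/L$, where $L$ is the largest squared magnitude of the feature map's spectrum; for the sparse toy maps used here, 0.05 is comfortably small.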
3.5.3 Learning Mapping Function
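Finally, $\mathbf{W}$ can be learned by fixing $\mathbf{F}^x$, $\mathbf{F}^y$ and $\mathbf{Z}^x$, $\mathbf{Z}^y$:

$$\arg\min_{\mathbf{W}}\ \gamma\|\mathbf{Z}^y - \mathbf{W}\mathbf{Z}^x\|_F^2 + \beta\|\mathbf{W}\|_F^2, \tag{11}$$

which is a ridge regression problem with the regularization term $\beta\|\mathbf{W}\|_F^2$. Setting the gradient with respect to $\mathbf{W}$ to zero gives the analytical solution $\mathbf{W} = \mathbf{Z}^y\mathbf{Z}^{x\top}\big(\mathbf{Z}^x\mathbf{Z}^{x\top} + \frac{\beta}{\gamma}\mathbf{I}\big)^{-1}$, where $\mathbf{I}$ is an identity matrix.
3.6 Synthesis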
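Once the training stage is completed, yielding a set of filter pairs $\mathbf{f}^x$, $\mathbf{f}^y$ and the mapping $\mathbf{W}$, we can synthesize the desired HR version of style $\mathcal{M}_2$ for a given test image $\mathbf{X}$ in domain $\mathcal{X}$. This is done by computing the sparse feature maps $\mathbf{Z}^x$ of $\mathbf{X}$ with respect to the filters $\mathbf{f}^x$ and mapping them to the expected feature maps $\mathbf{Z}^y$ via $\mathbf{W}$, i.e., $\mathbf{Z}^y = \mathbf{W}\mathbf{Z}^x$. The desired HR image of the target modality is then obtained as the sum of the converted sparse feature maps convolved with the target filters (termed SRCMS and summarized in Algorithm 2):

$$\mathbf{Y} = \sum_{k=1}^{K}\mathbf{f}^y_k * \mathbf{z}^y_k. \tag{12}$$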
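The last two steps, learning $\mathbf{W}$ by ridge regression (Eq. (11)) and synthesizing with Eq. (12), can be sketched together. Assumptions: $\mathbf{Z}^x$ and $\mathbf{Z}^y$ are $K \times L$ matrices whose rows are flattened feature maps (so $\mathbf{W}$ is $K \times K$, consistent with initializing it as an identity matrix), 1D "valid" convolution stands in for the 2D case, and the source feature maps of the test image are taken as already computed. The helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def learn_mapping(Zx, Zy, beta, gamma):
    """Ridge solution W = Zy Zx^T (Zx Zx^T + (beta/gamma) I)^{-1} of Eq. (11)."""
    A = matmul(Zx, transpose(Zx))
    for i in range(len(A)):
        A[i][i] += beta / gamma
    B = matmul(Zy, transpose(Zx))
    # A is symmetric, so W A = B  <=>  A W^T = B^T
    return transpose(solve(A, transpose(B)))


def conv1d_valid(f, z):
    d = len(f)
    return [sum(f[i] * z[n + d - 1 - i] for i in range(d)) for n in range(len(z) - d + 1)]


def synthesize(Zx, W, filters_y):
    """Eq. (12): map source feature maps with W, then sum target-filter convolutions."""
    Zy = matmul(W, Zx)  # K x L mapped feature maps, one per row
    out = None
    for f, z in zip(filters_y, Zy):
        y = conv1d_valid(f, z)
        out = y if out is None else [a + b for a, b in zip(out, y)]
    return out
```

With a very small $\beta/\gamma$ and noise-free maps, `learn_mapping` recovers the mapping that generated $\mathbf{Z}^y$; a larger ratio shrinks $\mathbf{W}$ toward zero, trading fit for stability.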
4 Experimental Results
We conduct the experiments on two datasets, i.e., the IXI (http://brain-development.org/ixi-dataset/) and NAMIC brain multi-modality (http://hdl.handle.net/1926/1687) datasets. Following [11, 35, 38], the LR counterparts are directly downsampled from their HR ground truths by bicubic interpolation, and boundaries are padded (with eight pixels) to avoid the boundary effects of the Fourier-domain implementation. The regularization parameters $\sigma$, $\lambda$, $\beta$, and $\gamma$ are empirically set to 1, 0.05, 0.1, and 0.15, respectively. The optimization variables $\mathbf{F}^x$, $\mathbf{F}^y$, $\mathbf{Z}^x$, and $\mathbf{Z}^y$ are randomly initialized with Gaussian noise, following [2]. Generally, a larger number of filters leads to better results; to balance computational complexity and result quality, we learn 800 filters following [11]. In our experiments, we use a more challenging split, taking half of each dataset (processed into weakly co-registered data) for training and the remaining half for testing. To the best of our knowledge, no previous work has been specifically designed to perform SR and cross-modality synthesis simultaneously by learning from weakly-supervised data. Thus, for a fair comparison we adapt existing works as baselines, which fall into two categories: (1) brain MRI SR; and (2) SR and cross-modality synthesis, where the compared models perform the two tasks one after the other. For evaluation, we adopt the widely used PSNR and SSIM [37] indices to objectively assess the quality of the synthesized images.
Experimental Data: The IXI dataset consists of 578 healthy subjects scanned at three hospitals with different systems (i.e., a Philips 3T system, a Philips 1.5T system, and a GE 3T system). Here, we use 180 Proton Density-weighted (PDw) MRI subjects for image SR, and both the PDw and registered T2-weighted (T2w) MRI scans of all subjects for the main SRCMS experiments. Further, we conduct SRCMS experiments on the processed NAMIC dataset, which consists of 20 subjects in both T1-weighted (T1w) and T2w modalities. As mentioned, we leave half of each dataset out for cross-validation. We randomly select 30 registered subject pairs for IXI and 3 registered subject pairs for NAMIC from the training half of the corresponding dataset, and process the remaining training data to be unpaired.
Notably, existing cross-modality synthesis methods for brain imaging require preprocessing, i.e., skull stripping and/or bias correction, as done in [34, 26]. We follow these procedures and further examine whether preprocessing (especially skull stripping) is always helpful for brain image synthesis.
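For reference, PSNR as commonly defined is $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The sketch below computes it over flattened voxel lists; which peak value MAX is used is not stated above, so taking the maximum of the ground-truth image by default is an assumption (a fixed intensity range can be passed instead). SSIM [37] is omitted for brevity, and the function name is illustrative.

```python
import math


def psnr(reference, estimate, peak=None):
    """PSNR in dB: 10 * log10(peak^2 / MSE) over all voxels (flattened lists)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and have the same number of voxels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    if peak is None:
        peak = max(reference)  # assumption: peak taken from the ground-truth image
    return 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR depends on the chosen peak, values are only comparable across methods when the same convention and intensity range are used for all of them.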
4.1 Brain MRI Super-Resolution
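Table 1: Average PSNR and SSIM of brain MRI SR (scaling factor 2) on the IXI PDw test subjects.
Metric (avg.) | ScSR [38] | Zeyde [40] | ANR [31] | NE+LLE [3] | A+ [32] | CSCSR [11] | WEENIE
PSNR (dB) | 31.63 | 33.68 | 34.09 | 34.00 | 34.55 | 34.60 | 35.13
SSIM | 0.9654 | 0.9623 | 0.9433 | 0.9623 | 0.9591 | 0.9604 | 0.9681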
For image SR, we focus on the PDw subjects of the IXI dataset and compare the proposed WEENIE model with several state-of-the-art SR approaches: the sparse coding-based SR method (ScSR) [38], the anchored neighborhood regression method (ANR) [31], the neighbor embedding + locally linear embedding method (NE+LLE) [3], Zeyde's method [40], the convolutional sparse coding-based SR method (CSCSR) [11], and the adjusted anchored neighborhood regression method (A+) [32]. We perform image SR with scaling factor 2 and show visual results on an example slice in Fig. 2. The quantitative results of the different methods are shown in Fig. 3, and the average PSNR and SSIM over all 95 test subjects are listed in Table 1. For brain image SR, the proposed method obtains the best PSNR and SSIM values. These improvements suggest that MMD-regularized joint learning on CSC is more effective than both the classic sparse coding-based methods and the other state-of-the-art methods, i.e., combining MMD with joint CSC improves the representation power of the learned filter pairs.
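Table 2: Average PSNR and SSIM of simultaneous SR and cross-modality synthesis on the IXI and NAMIC datasets (+PRE: with preprocessing, i.e., skull stripping and bias correction).
IXI
Task | PD→T2 | T2→PD | PD→T2+PRE | PD→T2+PRE | PD→T2+PRE | T2→PD+PRE | T2→PD+PRE | T2→PD+PRE
Method | WEENIE | WEENIE | MIMECS | WEENIE(reg) | WEENIE | MIMECS | WEENIE(reg) | WEENIE
PSNR (dB) | 37.77 | 31.77 | 30.60 | 30.93 | 33.43 | 29.85 | 30.29 | 31.00
SSIM | 0.8634 | 0.8575 | 0.7944 | 0.8004 | 0.8552 | 0.7503 | 0.7612 | 0.8595
NAMIC
Task | T1→T2 | T1→T2 | T1→T2 | T1→T2 | T2→T1 | T2→T1 | T2→T1 | T2→T1
Method | MIMECS | VeUS | VeS | WEENIE | MIMECS | VeUS | VeS | WEENIE
PSNR (dB) | 24.36 | 26.51 | 27.14 | 27.30 | 27.26 | 27.81 | 29.04 | 30.35
SSIM | 0.8771 | 0.8874 | 0.8934 | 0.8983 | 0.9166 | 0.9130 | 0.9173 | 0.9270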
4.2 Simultaneous Super-Resolution and Cross-Modality Synthesis
To comprehensively test the robustness of the proposed WEENIE method, we perform SRCMS on both datasets in six groups of experiments: (1) synthesizing an SR T2w image from an LR PDw acquisition and (2) vice versa; (3) generating an SR T2w image from an LR PDw input using preprocessed data (i.e., skull stripping and bias correction) and (4) vice versa; (5) synthesizing an SR T1w image from an LR T2w subject and (6) vice versa. The first four sets of experiments are conducted on the IXI dataset, while the last two are evaluated on the NAMIC dataset. The state-of-the-art synthesis methods include Vemulapalli's supervised approach (VeS) [34], Vemulapalli's unsupervised approach (VeUS) [34], and the MR image example-based contrast synthesis (MIMECS) approach [26]. However, Vemulapalli's methods cannot be applied directly to our problem, because they contain only the cross-modality synthesis stage, as used on the NAMIC dataset; accordingly, the original data (without degradation) are used as input to all of Vemulapalli's methods. MIMECS takes image SR into account and adopts two independent steps (i.e., synthesis + SR) to solve the problem. We compare the results of the proposed method trained only on registered image pairs, denoted WEENIE(reg) (which directly shows the benefit of involving unpaired data), and trained on all training images with/without preprocessing, against MIMECS, VeUS, and VeS in the above six cases, and show examples in Fig. 4 for visual inspection. The advantage of our method over MIMECS is visible, e.g., in white matter structures as well as in the overall intensity profile. The quantitative results are shown in Figs. 5 and 6, and the averaged values are summarized in Table 2. The performance of our algorithm is consistent across both datasets, reaching the best PSNR and SSIM for almost all subjects.
5 Conclusion
In this paper, we proposed a novel weakly-supervised joint convolutional sparse coding (WEENIE) method for simultaneous super-resolution and cross-modality synthesis (SRCMS) in 3D MRI. Unlike conventional joint learning approaches based on sparse representation in a supervised setting, WEENIE requires only a small set of registered image pairs and automatically aligns the correspondence for auxiliary unpaired images to span the diversity of the original learning system. Using the designed hetero-domain alignment term, a set of filter pairs and the mapping function were jointly optimized in a common feature space. Furthermore, we integrated our model with a divergence minimization term to enhance robustness. Benefiting from the consistency prior, WEENIE directly employs the whole image, which naturally captures the correlation between local neighborhoods. As a result, the proposed method can be applied to both the brain image SR and SRCMS problems. Extensive results showed that WEENIE achieves superior performance compared with state-of-the-art methods.
References
 [1] K. Bahrami, F. Shi, X. Zong, H. W. Shin, H. An, and D. Shen. Hierarchical reconstruction of 7tlike images from 3t mri using multilevel cca and group sparsity. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 659–666. Springer, 2015.

[2]
H. Bristow, A. Eriksson, and S. Lucey.
Fast convolutional sparse coding.
In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pages 391–398, 2013.  [3] H. Chang, D.Y. Yeung, and Y. Xiong. Superresolution through neighbor embedding. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2004.
 [4] L. Chen, W. Li, and D. Xu. Recognizing rgb images by learning from rgbd data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1418–1425, 2014.
 [5] C. Dong, C. C. Loy, K. He, and X. Tang. Image superresolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2016.
 [6] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 341–346. ACM, 2001.
 [7] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Examplebased superresolution. IEEE Computer graphics and Applications, 22(2):56–65, 2002.
 [8] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning lowlevel vision. International journal of computer vision, 40(1):25–47, 2000.
 [9] X. Gao, N. Wang, D. Tao, and X. Li. Face sketch–photo synthesis and retrieval using sparse representation. IEEE Transactions on circuits and systems for video technology, 22(8):1213–1226, 2012.
 [10] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016.
 [11] S. Gu, W. Zuo, Q. Xie, D. Meng, X. Feng, and L. Zhang. Convolutional sparse coding for image superresolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 1823–1831, 2015.
 [12] F. Heide, W. Heidrich, and G. Wetzstein. Fast and flexible convolutional sparse coding. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5135–5143. IEEE, 2015.
[13] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 327–340. ACM, 2001.
[14] D.-A. Huang and Y.-C. Frank Wang. Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 2496–2503, 2013.
[15] J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5197–5206. IEEE, 2015.
[16] Y. Huang, F. Zhu, L. Shao, and A. F. Frangi. Color object recognition via cross-domain learning on RGB-D images. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1672–1677. IEEE, 2016.
[17] J. E. Iglesias, E. Konukoglu, D. Zikic, B. Glocker, K. Van Leemput, and B. Fischl. Is synthesizing MRI contrast useful for inter-modality analysis? In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 631–638. Springer, 2013.
[18] K. Jia, X. Wang, and X. Tang. Image transformation based on learning dictionaries across image spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):367–380, 2013.
 [19] A. Jog, S. Roy, A. Carass, and J. L. Prince. Magnetic resonance image synthesis through patch regression. In 2013 IEEE 10th International Symposium on Biomedical Imaging, pages 350–353. IEEE, 2013.
[20] R. Keys. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(6):1153–1160, 1981.
[21] X. Li and M. T. Orchard. New edge-directed interpolation. IEEE Transactions on Image Processing, 10(10):1521–1527, 2001.
[22] M. Long, G. Ding, J. Wang, J. Sun, Y. Guo, and P. S. Yu. Transfer sparse coding for robust image representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 407–414, 2013.
[23] F. Monay and D. Gatica-Perez. Modeling semantic aspects for cross-media image indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1802–1817, 2007.
[24] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett. The Alzheimer's Disease Neuroimaging Initiative. Neuroimaging Clinics of North America, 15(4):869–877, 2005.
[25] F. Rousseau and the Alzheimer's Disease Neuroimaging Initiative. A non-local approach for image super-resolution using intermodality priors. Medical Image Analysis, 14(4):594–605, 2010.
[26] S. Roy, A. Carass, and J. L. Prince. Magnetic resonance image example-based contrast synthesis. IEEE Transactions on Medical Imaging, 32(12):2348–2363, 2013.
[27] A. Rueda, N. Malpica, and E. Romero. Single-image super-resolution of brain MR images using overcomplete dictionaries. Medical Image Analysis, 17(1):113–132, 2013.
 [28] L. Shao and M. Zhao. Order statistic filters for image interpolation. In 2007 IEEE International Conference on Multimedia and Expo, pages 452–455. IEEE, 2007.
[29] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.
[30] Y. Tang and L. Shao. Pairwise operator learning for patch-based single-image super-resolution. IEEE Transactions on Image Processing, 26(2):994–1003, 2017.
[31] R. Timofte, V. De Smet, and L. Van Gool. Anchored neighborhood regression for fast example-based super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 1920–1927, 2013.
[32] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126. Springer, 2014.
[33] H. Van Nguyen, K. Zhou, and R. Vemulapalli. Cross-domain synthesis of medical images using efficient location-sensitive deep network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 677–684. Springer, 2015.
[34] R. Vemulapalli, H. Van Nguyen, and S. Kevin Zhou. Unsupervised cross-modal synthesis of subject-specific scans. In Proceedings of the IEEE International Conference on Computer Vision, pages 630–638, 2015.
[35] S. Wang, L. Zhang, Y. Liang, and Q. Pan. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2216–2223. IEEE, 2012.
[36] X. Wang and X. Tang. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):1955–1967, 2009.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[38] J. Yang, J. Wright, T. S. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
[39] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus. Deconvolutional networks. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2528–2535. IEEE, 2010.
[40] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International Conference on Curves and Surfaces, pages 711–730. Springer, 2010.
[41] L. Zhang and X. Wu. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Transactions on Image Processing, 15(8):2226–2238, 2006.
[42] F. Zheng, Y. Tang, and L. Shao. Hetero-manifold regularisation for cross-modal hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.