I Introduction
Big data sets—produced by scientific experiments or projects—often contain heterogeneous data obtained by capturing a physical process or object using diverse sensing modalities [2]. The result is a rich set of signals, heterogeneous in nature but strongly correlated because they are generated by a common underlying phenomenon. Multimodal signal processing and analysis is thus gaining momentum in various research disciplines, ranging from medical diagnosis [3] to remote sensing and computer vision [4]. In particular, the analysis of high-resolution multimodal digital acquisitions of paintings in support of art scholarship has proved a challenging field of research. Examples include the numerical characterization of brushstrokes [5, 6] for the authentication or dating of paintings, canvas thread counting [7, 8, 9] with applications in art forensics, and the (semi-)automatic detection and digital inpainting of cracks [10, 11, 12]. The Lasting Support project has focused on the investigation of the Ghent Altarpiece (1432), also known as The Adoration of the Mystic Lamb, a polyptych on wood panel painted by Jan and Hubert van Eyck. One of the most admired and influential masterpieces in the history of art, it has given rise to many puzzling questions for art historians. Currently, the Ghent Altarpiece is undergoing a major conservation and restoration campaign that is planned to end in 2017. The panels of the masterpiece were documented with various imaging modalities, amongst which visual macrophotography, infrared macrophotography, infrared reflectography, and X-radiography [12]. A massive visual data set (comprising over 2 TB of data) has been compiled by capturing small areas of the polyptych separately and stitching the resulting image blocks into one image per panel [13].
X-ray images are common tools in painting investigation, since they reveal information about the composition of the materials, the variations in paint thickness, and the support, as well as the cracks and losses in the ground and paint layers. The problem we address in this paper relates to the outer side panels, namely, the panels showing near life-sized depictions of Adam and Eve, shown in Fig. 1. Because X-rays penetrate the whole panel, the scans of these panels are a mixture of the paintings on each side of the panel as well as the wood panel itself. The presence of all these components makes the reading of the X-ray image difficult for art experts, who would welcome an effective approach to separate the components.
The task of separating a mixture of signals into its constituent components is a popular field of research. Most work addresses the blind source separation (BSS) problem, where the goal is to retrieve the different sources from a given linear mixture. Several methods attempt to solve the BSS problem by imposing constraints on the sources' structure. Independent component analysis (ICA) [14] commonly assumes that the components are independent and non-Gaussian, and attempts to separate them by minimizing the mutual information [15]. Non-negative matrix factorization is another approach, in which it is assumed that the sources are non-negative (or they are transformed to a non-negative representation) [16]. In an alternative path, the problem has been cast into a Bayesian framework, where either the sources are viewed as latent variables [17], or the problem is solved by maximizing the joint posterior density of the sources [18]. Under the Bayesian methodology, spatial smoothing priors (via, for example, Markov random fields) have been used to regularize blind image separation problems [19]. These assumptions do not fit our particular problem, as both components have similar statistical properties and they are certainly not statistically independent.

Sparsity is another source prior heavily exploited in BSS [20, 21], as well as in various other inverse problems, such as compressed sensing [22, 23, 24, 25], denoising [26], and deconvolution [27]. Morphological component analysis (MCA), in particular, is a state-of-the-art sparsity-based regularization method, initially designed for the single-mixture problem [28, 20] and then extended to the multi-mixture case [29]. The crux of the method is the basic assumption that each component has its own characteristic morphology; namely, each has a highly sparse representation over a set of bases (or dictionaries), while being highly non-sparse over the dictionaries of the other components. Prior work in digital painting analysis has employed MCA to remove cradling artifacts from X-ray images of paintings on a panel [30]. The cradling and painting components have very different morphologies, captured by different predefined dictionaries: complex wavelets [31] provide a sparse representation for the smooth X-ray image, whereas shearlets [32] were used to represent the texture of the wood grain. Alternatively, dictionaries can be learned from a set of training signals; several algorithms have been proposed to construct dictionaries, including the method of optimal directions (MOD) [33] and the K-SVD algorithm [34]. Both utilize the orthogonal matching pursuit (OMP) [35] method for sparse decomposition, but they differ in the way they update the dictionary elements while learning. Recently, multi-mixture MCA has been combined with K-SVD, resulting in a method where dictionaries are learned adaptively while separating [36].

However, in our particular separation problem we have a simple mixture of two X-ray components that are morphologically very similar [see Fig. 1]. Hence, as we will show in the experimental section, simply using fixed or learned dictionaries is insufficient to discriminate one component from the other. Unlike prior work, in our setup we have access to high-quality photographic material from each side of the panel that can be used to assist the X-ray image separation process.
In this work, we propose a novel method to separate X-ray images from a single mixture by using images of another modality as side information. Our contributions are as follows:

• We present a new model based on parsimonious representations, which captures both the inherent similarities and the discrepancies among heterogeneous correlated data. The model decomposes the data into a sparse component that is common to the different modalities and a sparse component that uniquely describes each data type. Our model enables the formulation of appropriately regularized convex optimization procedures that address the separation problem at hand.

• We propose a novel dictionary learning approach that trains dictionaries coupling the images from the different modalities. Our approach introduces a new modified OMP algorithm that is tailored to our data model.

• We devise a novel method that ignores craquelure pixels—namely, pixels that visualize cracks in the surface of paintings—when learning coupled dictionaries. Ignoring these pixels avoids contaminating the dictionaries with high-frequency noise, thereby leading to higher separation performance. Our approach bears similarities with inpainting approaches, e.g., [25]; it is, however, different in the way the dictionary learning problem is posed and solved.

• We devise a novel multiscale image separation strategy that is based on a recursive decomposition of the mixed X-ray and visual images into low-pass and high-pass bands. As such, the method enables the accurate separation of high-resolution images even when a local sparsity prior is assumed. Our approach differs from existing multiscale dictionary learning methods [37, 38, 25] not only by considering imaging data gleaned from diverse modalities but also in the way the multiscale decomposition is constructed.

• We conduct experiments using synthetic and real data, proving that the use of side information is crucial in the separation of X-ray images of double-sided paintings.
In the remainder of the paper: Section II reviews related work and Section III poses our source separation with side information problem. Section IV describes the proposed coupled dictionary learning algorithm. Section V presents our method that ignores cracks when learning dictionaries, and Section VI elaborates on our single- and multi-scale approaches to X-ray image separation. Section VII presents the evaluation of our algorithms, while Section VIII draws our conclusions.
II Related Work
II-A Source Separation
Adhering to a formal definition, MCA [28, 20] decomposes a source or image mixture $x = \sum_{i=1}^{N} x_i$, with $x, x_i \in \mathbb{R}^n$, into its constituents $x_i$, under the assumption that each $x_i$ admits a sparse decomposition in a different overcomplete dictionary $\Phi_i \in \mathbb{R}^{n \times d}$, $i = 1, \dots, N$. Namely, each component can be expressed as $x_i = \Phi_i \alpha_i$, where $\alpha_i \in \mathbb{R}^d$ is a sparse vector comprising a few nonzero coefficients: $\|\alpha_i\|_0 \ll d$, with $\|\cdot\|_0$ denoting the $\ell_0$ pseudo-norm. The BSS problem is thus addressed as [28, 20]

$$\min_{\alpha_1, \dots, \alpha_N} \sum_{i=1}^{N} \|\alpha_i\|_0 \quad \text{subject to} \quad x = \sum_{i=1}^{N} \Phi_i \alpha_i. \tag{1}$$
Unlike the BSS problem, informed source separation (ISS) methods utilise some form of prior information to aid the task at hand. ISS methods are tailored to the application they address (to the best of our knowledge, they have been applied only to audio mixtures [39, 40]). For instance, an encoding/decoding framework is proposed in [39], where the sources are mixed at the encoder and the mixtures are sent to the decoder together with side information that is embedded by means of quantization index modulation (QIM) [41]. Unlike these methods, we propose a generic source separation framework that incorporates side information gleaned from a correlated heterogeneous source by means of a new dictionary learning method that couples the heterogeneous sources.
II-B Dictionary Learning
Dictionary learning factorizes a matrix $S \in \mathbb{R}^{n \times m}$, composed of $m$ training signals, into the product $\Phi A$ as

$$\min_{\Phi, A} \|S - \Phi A\|_F^2 \quad \text{subject to} \quad \|\alpha_j\|_0 \le s, \; j = 1, \dots, m, \tag{2}$$

where $A = [\alpha_1, \dots, \alpha_m]$ contains the sparse vectors corresponding to the signals and $\|\cdot\|_F$ is the Frobenius norm of a matrix. The columns of the dictionary $\Phi$ are typically constrained to have unit norm so as to improve the identifiability of the dictionary. To solve Problem (2), which is non-convex, Olshausen and Field [42] proposed to iterate between a step that learns the sparse codes and a step that updates the dictionary elements. The same strategy is followed in subsequent studies [43, 33, 34, 44, 45]. Alternatively, polynomial-time algorithms that are guaranteed to reach a globally optimal solution appear in [46, 47].
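This alternating strategy can be sketched as follows; this is a minimal MOD-style illustration in NumPy, with function names, sizes, and the stopping rule chosen by us for illustration (not taken from any of the cited algorithms verbatim):

```python
import numpy as np

def omp(y, D, s):
    """Greedy orthogonal matching pursuit: approximate y with s atoms of D."""
    residual = y.astype(float)
    support, alpha = [], np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(s):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha[support] = coef
    return alpha

def dictionary_learning(S, d, s, iters=10, seed=0):
    """Alternate a sparse-coding step (OMP per signal) with a MOD-style
    closed-form dictionary update, as in Problem (2)."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((S.shape[0], d))
    Phi /= np.linalg.norm(Phi, axis=0)
    A = np.zeros((d, S.shape[1]))
    for _ in range(iters):
        # sparse-coding step: one OMP problem per training signal
        A = np.column_stack([omp(S[:, j], Phi, s) for j in range(S.shape[1])])
        # dictionary update step: least-squares fit, then re-normalize atoms
        Phi = S @ np.linalg.pinv(A)
        norms = np.linalg.norm(Phi, axis=0)
        norms[norms < 1e-12] = 1.0
        Phi /= norms
        A *= norms[:, None]          # keep the product Phi @ A unchanged
    return Phi, A
```

In practice K-SVD [34] replaces the joint least-squares update with per-atom SVD updates; the sketch above uses the simpler MOD update only to make the two alternating steps concrete.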
In order to capture multiscale traits in natural signals, a method to construct multiscale dictionaries was presented in [25]. The multiscale representation was obtained by using a quadtree decomposition of the learned dictionary. Alternatively, the work in [38, 37] applied dictionary learning in the domain of a fixed multiscale operator (wavelets). In our approach we follow a different multiscale strategy, based on a pyramid decomposition, similar to the Laplacian pyramid [48].
There exist dictionary learning approaches designed to couple multimodal data. Monaci et al. [49] proposed an approach to learn basis functions representing audiovisual structures. The approach, however, enforces synchrony between the different modalities. Alternatively, Yang et al. [50, 51] considered the problem of learning two dictionaries $\Phi_1$ and $\Phi_2$ for two families of signals $y_1$ and $y_2$, coupled by a mapping function $\mathcal{F}$ [with $y_2 = \mathcal{F}(y_1)$]. The constraint was that the sparse representation of $y_1$ in $\Phi_1$ is the same as that of $y_2$ in $\Phi_2$. The application targeted was image super-resolution, where $y_1$ (resp. $y_2$) is the low- (resp. high-) resolution image. The study in [4] followed a similar approach, with the difference that the mapping function was applied to the sparse codes, i.e., $\alpha_2 = \mathcal{F}(\alpha_1)$, rather than the signals. Jia et al. [52] proposed dictionary learning via the concept of group sparsity so as to couple the different views in human pose estimation. Our coupled dictionary learning method is designed to address the challenges of the targeted source separation application and, as such, the model we consider to represent the correlated sources is fundamentally different from previous work. Moreover, we extend coupled dictionary learning to the multiscale case and we provide a way to ignore certain noisy parts of the training signals (corresponding to cracks in our case).
III Image Separation with Side Information
We denote by $x_1$ and $x_2$ two vectorized X-ray image patches that we wish to separate from each other given a mixture patch $m = x_1 + x_2$, where $x_1, x_2, m \in \mathbb{R}^n$. Let $v_1$ and $v_2$ be the co-located (visual) image patches of the front and back of the painting. These patches play the role of side information that aids the separation. The use of side information has proven beneficial in various inverse problems [53, 54, 55, 56, 57, 58, 59]. In this work, we consider the signals to obey (superpositions of) sparse representations in some dictionaries:

$$v_i = \Phi_v z_i \tag{3}$$

and

$$x_i = \Phi_x z_i + \Phi_u u_i, \quad i = 1, 2, \tag{4}$$

where $z_i \in \mathbb{R}^d$, with $\|z_i\|_0 \le s_z$ and $s_z \ll d$, denotes the sparse component that is common to the images in the visible and the X-ray domain with respect to the dictionaries $\Phi_v, \Phi_x \in \mathbb{R}^{n \times d}$, respectively. The parameter $s_z$ denotes the sparsity of the vector $z_i$. Moreover, $u_i \in \mathbb{R}^d$, with $\|u_i\|_0 \le s_u$, denotes the sparse innovation component of the X-ray image, obtained with respect to the dictionary $\Phi_u \in \mathbb{R}^{n \times d}$. The common components express global features and structural characteristics that underlie both modalities. The innovation components capture parts of the signal specific to the X-ray modality, that is, traces of the wooden panel or even footprints of the vertical and horizontal wooden slats attached to the back of the painting. We acknowledge the relation of our model to the sparse common component and innovations model that captures intra- and inter-signal correlation of physical signals in wireless sensor networks [56, 60]. Our approach is, however, more generic, since we decompose the signals in learnt dictionaries rather than fixed canonical bases, as in [60].
Given the proposed model and provided that the dictionaries $\Phi_v$, $\Phi_x$, and $\Phi_u$ are known, the corresponding X-ray separation problem can be formulated as

$$\min_{z_1, z_2, u_1, u_2} \|z_1\|_1 + \|z_2\|_1 + \|u_1\|_1 + \|u_2\|_1 \quad \text{subject to} \quad m = \Phi_x (z_1 + z_2) + \Phi_u (u_1 + u_2), \; v_1 = \Phi_v z_1, \; v_2 = \Phi_v z_2, \tag{5}$$

where we applied convex relaxation by replacing the $\ell_0$ pseudo-norm with the $\ell_1$ norm, denoted as $\|\cdot\|_1$. Problem (5) is underdetermined, namely, $u_1$ and $u_2$ cannot be distinguished due to the symmetry in the constraints. A solution to the unmixing problem can be obtained when $u_1 = u_2 = u$, which is formally written as

$$\min_{z_1, z_2, u} \|z_1\|_1 + \|z_2\|_1 + \|u\|_1 \quad \text{subject to} \quad m = \Phi_x (z_1 + z_2) + 2\Phi_u u, \; v_1 = \Phi_v z_1, \; v_2 = \Phi_v z_2. \tag{6}$$

Problem (6) boils down to basis pursuit, which is solved by convex optimization tools, e.g., [61]. The assumption $u_1 = u_2$ is not only practical but is also motivated by the actual problem: since the paintings are mounted on the same wooden panel, the sparse components that decompose the X-ray images via the dictionary $\Phi_u$ are expected to be the same.
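As a numerical illustration, the equality-constrained Problem (6) can be approximated by moving the constraints into quadratic penalties and applying a proximal-gradient (ISTA) solver. The sketch below is our own relaxation, not the basis-pursuit solver used in the paper; the penalty form, the weight lam, and all function names are assumptions made for illustration:

```python
import numpy as np

def ista_l1(y, D, lam, iters=500):
    """ISTA for min_s 0.5 * ||y - D s||_2^2 + lam * ||s||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(iters):
        g = s - (D.T @ (D @ s - y)) / L      # gradient step
        s = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return s

def separate_with_side_info(m, v1, v2, Phi_x, Phi_v, Phi_u, lam=0.01, iters=500):
    """Penalty relaxation of Problem (6): the mixture and both side-information
    equalities become quadratic data terms; the sparsity terms are l1."""
    n, d = Phi_x.shape
    du = Phi_u.shape[1]
    Z0 = np.zeros((n, d))
    y = np.concatenate([m, v1, v2])
    # stacked dictionary encoding the three constraints of (6)
    D = np.block([[Phi_x, Phi_x, 2.0 * Phi_u],
                  [Phi_v, Z0,    np.zeros((n, du))],
                  [Z0,    Phi_v, np.zeros((n, du))]])
    s = ista_l1(y, D, lam, iters)
    z1, z2, u = s[:d], s[d:2 * d], s[2 * d:]
    return Phi_x @ z1 + Phi_u @ u, Phi_x @ z2 + Phi_u @ u
```

With a small lam the penalty solution approaches the constrained one; an off-the-shelf basis-pursuit solver, as in [61], would enforce the equalities exactly.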
IV Coupled Dictionary Learning Algorithm
In order to address the source separation with side information problem, we learn the coupled dictionaries $\Phi_v$, $\Phi_x$, and $\Phi_u$ by using image patches sampled from visual and X-ray images of single-sided panels, which do not suffer from superposition phenomena. The images were registered using the algorithm described in [62]. Let $V, X \in \mathbb{R}^{n \times m}$ represent a set of $m$ co-located vectorized visual and X-ray patches, each containing $n$ pixels. As per our model in (3) and (4), the columns of $V$ and $X$ are decomposed as

$$V = \Phi_v Z \tag{7a}$$
$$X = \Phi_x Z + \Phi_u U, \tag{7b}$$

where we collect the common components into the columns of the matrix $Z$ and the innovation components into the columns of $U$. We formulate the coupled dictionary learning problem as

$$\min_{\Phi_v, \Phi_x, \Phi_u, Z, U} \|V - \Phi_v Z\|_F^2 + \|X - \Phi_x Z - \Phi_u U\|_F^2 \quad \text{subject to} \quad \|z_j\|_0 \le s_z, \; \|u_j\|_0 \le s_u, \; j = 1, \dots, m. \tag{8}$$
Similar to related work [34, 25, 38], we solve Problem (8) by alternating between a sparse-coding step and a dictionary update step. Particularly, given initial estimates for the dictionaries $\Phi_v$, $\Phi_x$, and $\Phi_u$—in line with prior work [34] we use the overcomplete discrete cosine transform (DCT) for initialization—we iterate between a sparse-coding step

$$\min_{Z, U} \|V - \Phi_v Z\|_F^2 + \|X - \Phi_x Z - \Phi_u U\|_F^2 \quad \text{subject to} \quad \|z_j\|_0 \le s_z, \; \|u_j\|_0 \le s_u, \; j = 1, \dots, m, \tag{9}$$

which is performed to learn the sparse codes with the dictionaries fixed, and a dictionary update step

$$\min_{\Phi_v, \Phi_x, \Phi_u} \|V - \Phi_v Z\|_F^2 + \|X - \Phi_x Z - \Phi_u U\|_F^2, \tag{10}$$

which updates the dictionaries given the calculated sparse codes.
which updates the dictionaries given the calculated sparse codes. The algorithm iterates between these steps until no additional iteration reduces the value of the cost function below a chosen threshold, or until a predetermined number of iterations is reached. In what follows, we provide details regarding the solution of the problem at each stage.
Sparse-coding step. Problem (9) decomposes into $m$ problems, each of which can be solved in parallel:

$$\min_{z_j, u_j} \left\| \begin{bmatrix} v_j \\ x_j \end{bmatrix} - \begin{bmatrix} \Phi_v & 0 \\ \Phi_x & \Phi_u \end{bmatrix} \begin{bmatrix} z_j \\ u_j \end{bmatrix} \right\|_2^2 \quad \text{subject to} \quad \|z_j\|_0 \le s_z, \; \|u_j\|_0 \le s_u. \tag{11}$$

To address each of the problems in (11), we propose a greedy algorithm that constitutes a modification of the OMP method [see Algorithm 1]. Our method adapts OMP [35] to solve

$$\min_{\alpha} \|y - D\alpha\|_2^2 \quad \text{subject to} \quad \|\alpha_{\Gamma_z}\|_0 \le s_z, \; \|\alpha_{\Gamma_u}\|_0 \le s_u, \tag{12}$$

where $\alpha_{\Gamma_z}$ [resp., $\alpha_{\Gamma_u}$] denotes the components of the vector $\alpha$ indexed by the index set $\Gamma_z$ (resp., $\Gamma_u$), with $\Gamma_z \cap \Gamma_u = \emptyset$ and $\Gamma_z \cup \Gamma_u = \{1, \dots, 2d\}$. Each subproblem in (11) translates to (12) by replacing $y = [v_j^\top, x_j^\top]^\top$, $D = \begin{bmatrix} \Phi_v & 0 \\ \Phi_x & \Phi_u \end{bmatrix}$, and $\alpha = [z_j^\top, u_j^\top]^\top$.
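Since Algorithm 1 itself is not reproduced in this excerpt, the following is our own sketch of the idea behind (12): plain OMP, except that the greedy selection respects separate sparsity budgets on the two index sets (here passed as idx_z and idx_u, our names for the sets of indices):

```python
import numpy as np

def momp(y, D, idx_z, idx_u, s_z, s_u):
    """Modified-OMP sketch for (12): greedy atom selection as in OMP, but
    with separate sparsity budgets s_z and s_u on the two index sets."""
    residual = y.astype(float)
    support = []
    n_z = n_u = 0
    set_z, set_u = set(idx_z), set(idx_u)
    coef = np.zeros(0)
    for _ in range(s_z + s_u):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf           # never reselect an atom
        if n_z >= s_z:
            corr[list(set_z)] = -np.inf       # budget on the first set exhausted
        if n_u >= s_u:
            corr[list(set_u)] = -np.inf       # budget on the second set exhausted
        k = int(np.argmax(corr))
        support.append(k)
        if k in set_z:
            n_z += 1
        else:
            n_u += 1
        # least-squares refit on the current support, as in standard OMP
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```

The only deviation from OMP [35] is the masking of a whole index set once its budget is spent, which is what keeps the common and innovation parts separately sparse.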
Dictionary update step. Problem (10) can be written as

$$\min_{\Phi_v, \Theta} \|V - \Phi_v Z\|_F^2 + \|X - \Theta W\|_F^2, \tag{13}$$

where $\Theta = [\Phi_x \; \Phi_u]$ and $W = [Z^\top \; U^\top]^\top$. Problem (13) decouples into two (independent) problems, that is,

$$\min_{\Phi_v} \|V - \Phi_v Z\|_F^2 \tag{14}$$

and

$$\min_{\Theta} \|X - \Theta W\|_F^2. \tag{15}$$

Provided that $Z$ and $W$ are full row-rank, each of these problems has a closed-form solution, namely, $\Phi_v = V Z^\top (Z Z^\top)^{-1}$ and $\Theta = X W^\top (W W^\top)^{-1}$. When $Z$ and $W$ are rank-deficient, (14) and (15) have multiple solutions, from which we select the one with minimal Frobenius norm. This is done by taking a thin singular value decomposition of $Z$ and $W$ and calculating $\Phi_v = V Z^{+}$ and $\Theta = X W^{+}$, where $(\cdot)^{+}$ denotes the Moore–Penrose pseudoinverse.
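A sketch of the closed-form update, under the notation assumed above; the pseudoinverse covers both the full-rank and the rank-deficient (minimum-Frobenius-norm) cases:

```python
import numpy as np

def update_dictionaries(V, X, Z, U):
    """Closed-form dictionary update for (14)-(15) via the Moore-Penrose
    pseudoinverse: least-squares solution, and the minimum Frobenius-norm
    one when Z or W = [Z; U] is rank-deficient."""
    Phi_v = V @ np.linalg.pinv(Z)          # solves (14)
    W = np.vstack([Z, U])
    Theta = X @ np.linalg.pinv(W)          # solves (15)
    d = Z.shape[0]
    Phi_x, Phi_u = Theta[:, :d], Theta[:, d:]
    return Phi_v, Phi_x, Phi_u
```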
V Weighted Coupled Dictionary Learning
Visual and X-ray images of paintings contain a large number of pixels that depict cracks, that is, fine patterns of dense cracking formed within the materials. When these pixels are taken into account, the learned dictionaries comprise atoms that correspond to high-frequency components and, as a consequence, the reconstructed images are contaminated by high-frequency noise. In order to improve the separation performance, our objective is to obtain dictionaries that ignore pixels representing cracks. To identify such pixels, we generate binary masks indicating the location of cracks by applying our method in [10]. Each sampled image patch may contain a variable number of crack pixels, meaning that each column of the data matrix contains a different number of meaningful entries. To address this issue, we introduce a weighting scheme that assigns a weight of 0 or 1 to the pixels that do or do not correspond to cracks, respectively. These crack-induced weights are included using a Hadamard product, namely, our model in (7) is modified to
$$M \odot V = M \odot (\Phi_v Z) \tag{16a}$$
$$M \odot X = M \odot (\Phi_x Z + \Phi_u U), \tag{16b}$$

where the matrix $M$ has exactly the same dimensions as $V$ and $X$ and its entries are 0 or 1 depending on whether a pixel is part of a crack or not, respectively. We now formulate the weighted coupled dictionary learning problem as

$$\min_{\Phi_v, \Phi_x, \Phi_u, Z, U} \|M \odot (V - \Phi_v Z)\|_F^2 + \|M \odot (X - \Phi_x Z - \Phi_u U)\|_F^2 \quad \text{subject to} \quad \|z_j\|_0 \le s_z, \; \|u_j\|_0 \le s_u, \; j = 1, \dots, m. \tag{17}$$
Similar to (8), the solution for (17) is obtained by alternating between a sparsecoding and a dictionary update step.
Sparse-coding step. Similar to (11), the sparse-coding problem decomposes into $m$ problems that can be solved in parallel:

$$\min_{z_j, u_j} \left\| \begin{bmatrix} m_j \odot v_j \\ m_j \odot x_j \end{bmatrix} - \begin{bmatrix} \mathrm{diag}(m_j)\,\Phi_v & 0 \\ \mathrm{diag}(m_j)\,\Phi_x & \mathrm{diag}(m_j)\,\Phi_u \end{bmatrix} \begin{bmatrix} z_j \\ u_j \end{bmatrix} \right\|_2^2 \quad \text{subject to} \quad \|z_j\|_0 \le s_z, \; \|u_j\|_0 \le s_u, \tag{18}$$

where $m_j$ denotes column $j$ of the mask matrix $M$. To address each of the subproblems in (18), we use the mOMP algorithm described in Algorithm 1, as each subproblem in (18) reduces to (12) by replacing $y = [(m_j \odot v_j)^\top, (m_j \odot x_j)^\top]^\top$, $D = \begin{bmatrix} \mathrm{diag}(m_j)\,\Phi_v & 0 \\ \mathrm{diag}(m_j)\,\Phi_x & \mathrm{diag}(m_j)\,\Phi_u \end{bmatrix}$, and $\alpha = [z_j^\top, u_j^\top]^\top$.
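The reduction of the weighted subproblem (18) to the generic form (12) amounts to zeroing masked rows; a small sketch with helper names of our own choosing:

```python
import numpy as np

def weighted_stack(vj, xj, mj, Phi_v, Phi_x, Phi_u):
    """Build the pair (y, D) of (12) for the weighted subproblem (18):
    rows corresponding to crack pixels (mask value 0) are zeroed out in
    both the stacked signal and the stacked dictionary."""
    n, d = Phi_x.shape
    du = Phi_u.shape[1]
    w = mj.astype(float)[:, None]          # column of 0/1 weights
    y = np.concatenate([mj * vj, mj * xj])
    D = np.block([[w * Phi_v, np.zeros((n, du))],
                  [w * Phi_x, w * Phi_u]])
    return y, D
```

Because the same mask multiplies the signal and the dictionary rows, any sparse code that explains the unmasked pixels is left untouched by the crack pixels.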
Dictionary update step. The dictionary update problem is now written as

$$\min_{\Phi_v, \Theta} \|M \odot (V - \Phi_v Z)\|_F^2 + \|M \odot (X - \Theta W)\|_F^2, \tag{19}$$

where $\Theta = [\Phi_x \; \Phi_u]$ and $W = [Z^\top \; U^\top]^\top$, and it decouples into:

$$\min_{\Phi_v} \|M \odot (V - \Phi_v Z)\|_F^2 \tag{20a}$$
$$\min_{\Theta} \|M \odot (X - \Theta W)\|_F^2. \tag{20b}$$

We present only the solution of the first problem, since the solution of the other follows the same logic. Specifically, we express the Frobenius norm in (20a) as a sum of $\ell_2$-norm terms, each corresponding to a vectorized training patch:

$$\sum_{j=1}^{m} \|m_j \odot (v_j - \Phi_v z_j)\|_2^2. \tag{21}$$

By replacing the Hadamard product with multiplication by the diagonal matrix $\mathrm{diag}(m_j)$, (21) can be written as

$$\sum_{j=1}^{m} \|\mathrm{diag}(m_j)(v_j - \Phi_v z_j)\|_2^2. \tag{22}$$

To minimize the expression in (22), we take the derivative with respect to the dictionary $\Phi_v$ and set it to zero:

$$\sum_{j=1}^{m} \mathrm{diag}(m_j)(\Phi_v z_j - v_j) z_j^\top = 0. \tag{23}$$

Before proceeding with the method to solve (23), we recall that the entries of $M$ are either 0 or 1. To avoid dividing by zero when solving (23), we have to update the rows of the dictionary matrix one by one. Specifically, for the $r$-th row of $\Phi_v$, we consider the matrix $Z_{(r)} = [z_j : j \in S_r]$, where $S_r$ is the support^{1} of the $r$-th row of $M$, and $z_j$ is the $j$-th column of $Z$. We also create a vector $v_{(r)} = [V_{r,j} : j \in S_r]$, where $V_{r,j}$ is the $(r,j)$-th entry of $V$. Provided that $Z_{(r)} Z_{(r)}^\top$ is invertible, the $r$-th row of $\Phi_v$ (which we denote by the row-vector $\phi_r$) is given by

$$\phi_r = v_{(r)}^\top Z_{(r)}^\top \big(Z_{(r)} Z_{(r)}^\top\big)^{-1}. \tag{24}$$

^{1}The support is defined by the indices where the $r$-th row of $M$ is equal to 1.
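The row-by-row update in (24) can be sketched as follows (assuming, as discussed next in the text, that each $Z_{(r)} Z_{(r)}^\top$ is invertible):

```python
import numpy as np

def weighted_row_update(V, Z, M):
    """Weighted dictionary update, one row at a time as in (24): the r-th
    row of the dictionary is fit by least squares using only the training
    patches whose mask is 1 at pixel r."""
    n, m = V.shape
    d = Z.shape[0]
    Phi = np.zeros((n, d))
    for r in range(n):
        Sr = np.flatnonzero(M[r])          # support of the r-th mask row
        Zr = Z[:, Sr]                      # codes of the usable patches
        G = Zr @ Zr.T                      # assumed invertible (see text)
        Phi[r] = np.linalg.solve(G, Zr @ V[r, Sr])
    return Phi
```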
If each $z_j$ is drawn randomly, $Z_{(r)} Z_{(r)}^\top$ is invertible with probability 1 whenever the cardinality of $S_r$ is at least equal to the number of rows of $Z_{(r)}$. Although in practice each $z_j$ is not randomly drawn, we still obtain an invertible $Z_{(r)} Z_{(r)}^\top$ by guaranteeing that the number of training samples is large enough.

VI Single- and Multi-Scale Separation Methods
VI-A Single-Scale Approach
Given the trained coupled dictionaries, the source separation method described in Section III is applied locally, per overlapping patch of the X-ray image. Let the co-located patches from the mixed X-ray and the two corresponding visual images be denoted as $m^{(i,j)}$, $v_1^{(i,j)}$, and $v_2^{(i,j)}$, respectively. Each patch contains $n$ pixels and has top-left coordinates

$$(1 + i\delta, \; 1 + j\delta), \quad i = 0, \dots, \big\lfloor (H - \sqrt{n})/\delta \big\rfloor, \quad j = 0, \dots, \big\lfloor (W - \sqrt{n})/\delta \big\rfloor,$$

where $\delta$ is the overlap step-size, $\lfloor \cdot \rfloor$ is the floor function, and $H$ and $W$ are the image height and width, respectively. Prior to separation, the DC value is removed from the pixels in each overlapping patch and the residual values are vectorized. The solution of Problem (6) yields the sparse components $\hat z_1$, $\hat z_2$, and $\hat u$ corresponding to the patch with coordinates $(i, j)$. The texture of each separated patch is then reconstructed following the model in (4), that is, $\hat x_1 = \Phi_x \hat z_1 + \Phi_u \hat u$ and $\hat x_2 = \Phi_x \hat z_2 + \Phi_u \hat u$. In certain cases, the $\Phi_u \hat u$ component may capture parts of the actual content; for example, vertical brush strokes can be misinterpreted as the wood texture of the panel. In this case, we can choose to skip the $\Phi_u \hat u$ component; namely, we can reconstruct the texture of the X-ray patches as $\hat x_1 = \Phi_x \hat z_1$ and $\hat x_2 = \Phi_x \hat z_2$. The DC values are weighted according to the DC values of the co-located patches in the visual images and then added back to the corresponding separated X-ray patches. Finally, each pixel in each separated X-ray image is recovered as the average of the co-located pixels in the overlapping patches.
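The overlapping-patch bookkeeping of the single-scale approach (extraction on a regular grid, averaging of co-located pixels at reassembly) can be sketched as:

```python
import numpy as np

def extract_patches(img, p, step):
    """Overlapping p-by-p patches with top-left corners on a regular grid."""
    H, W = img.shape
    coords = [(i, j) for i in range(0, H - p + 1, step)
                     for j in range(0, W - p + 1, step)]
    patches = np.stack([img[i:i+p, j:j+p].ravel() for (i, j) in coords])
    return patches, coords

def reassemble(patches, coords, shape, p):
    """Recover an image by averaging co-located pixels over all patches."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for patch, (i, j) in zip(patches, coords):
        acc[i:i+p, j:j+p] += patch.reshape(p, p)
        cnt[i:i+p, j:j+p] += 1
    return acc / np.maximum(cnt, 1)
```

In the actual pipeline, the separation of Problem (6) is applied to each vectorized patch between these two steps; the sketch only shows the grid and the averaging.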
VI-B Multi-Scale Approach
Due to the restricted patch size in comparison to the high resolution of the X-ray image, the DC values of all patches carry a considerable amount of the total image energy. In the single-scale approach, these DC values are common to the two separated X-rays, thereby leading to poor separation. To address this issue, we devise a multi-scale image separation approach. In contrast with the techniques in [37, 38, 25], the proposed multi-scale approach performs a pyramid decomposition of the mixed X-ray and visual images, that is, the images are recursively decomposed into low-pass and high-pass bands. The decompositions at scale $\sigma$ are constructed as follows. The images at scale $\sigma$—where we use the notation $m^{(\sigma)}$, $v_1^{(\sigma)}$, $v_2^{(\sigma)}$ to refer to the mixed X-ray and the two visuals, respectively—are divided into overlapping patches, each of $n$ pixels. Each patch has top-left coordinates

$$(1 + i\delta_\sigma, \; 1 + j\delta_\sigma), \quad i = 0, \dots, \big\lfloor (H_\sigma - \sqrt{n})/\delta_\sigma \big\rfloor, \quad j = 0, \dots, \big\lfloor (W_\sigma - \sqrt{n})/\delta_\sigma \big\rfloor,$$

where $\delta_\sigma$ is the overlap step-size, and $H_\sigma$ and $W_\sigma$ are the height and width of the image decomposition at scale $\sigma$. The DC value is extracted from each patch, thereby constructing the high-frequency band of the image at scale $\sigma$. The aggregated DC values comprise the low-pass component of the image, with one value per grid position. The low-pass component is then decomposed further at the subsequent scale ($\sigma + 1$). The pyramid decomposition is schematically sketched in Fig. 2 and exemplified in Fig. 3.
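One level of this pyramid can be sketched as follows; the function and variable names are ours, and the sketch shows only the DC/texture split described above:

```python
import numpy as np

def dc_pyramid_level(img, p, step):
    """One pyramid level: split each patch into its mean (DC) and a zero-mean
    texture part. The grid of DC values is the low-pass band handed to the
    next scale; the zero-mean patches form the high-frequency band."""
    H, W = img.shape
    rows = list(range(0, H - p + 1, step))
    cols = list(range(0, W - p + 1, step))
    texture, dc = [], []
    for i in rows:
        for j in cols:
            patch = img[i:i+p, j:j+p]
            mu = patch.mean()
            dc.append(mu)                  # low-pass sample for this patch
            texture.append(patch - mu)     # high-frequency residual
    lowpass = np.array(dc).reshape(len(rows), len(cols))
    return lowpass, texture
```

Calling this function recursively on the returned lowpass band produces the pyramid; reconstruction follows the reverse path, as described in the text.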
The texture of the mixed X-ray image at scale $\sigma$ is separated patch by patch by solving Problem (6). The texture of each separated patch is then reconstructed as $\hat x_1 = \Phi_x^{(\sigma)} \hat z_1 + \Phi_u^{(\sigma)} \hat u$ and $\hat x_2 = \Phi_x^{(\sigma)} \hat z_2 + \Phi_u^{(\sigma)} \hat u$; or, alternatively, as $\hat x_1 = \Phi_x^{(\sigma)} \hat z_1$ and $\hat x_2 = \Phi_x^{(\sigma)} \hat z_2$. Note that the dictionary learning process can be applied per scale, yielding a triplet of coupled dictionaries per scale $\sigma$. In practice, due to the lack of training data in the higher scales, dictionaries are learned only from the low-scale decompositions and then copied to the higher scales.
The separated Xray images are finally reconstructed by following the reverse operation: descending the pyramid, the separated component at the coarser scale is upsampled and added to the separated component of the finer scales.
VII Experiments
VII-A Experiments with Synthetic Data
As in [33, 34], we first evaluate the performance of our coupled dictionary learning algorithm—described in Section IV—and our source separation with side information method (see Section III) using synthetic data. Firstly, we generate synthetic signals $v_i$, $x_i$ according to the model in (3), (4) using random dictionaries and then, given the data, we assess whether the algorithm recovers the original dictionaries. The random dictionaries $\Phi_v$, $\Phi_x$, and $\Phi_u$ contain entries drawn from the standard normal distribution and their columns are normalized to have unit $\ell_2$ norm. Given the dictionaries, sparse vectors $z_i$ and $u_i$ were produced, comprising $s_z$ and $s_u$ nonzero coefficients, respectively, distributed uniformly and placed in random and independent locations. Combining the dictionaries and the sparse vectors according to the model in (7) yields the correlated data signals $v_i$ and $x_i$, to which white Gaussian noise with a varying signal-to-noise ratio (SNR) has been added.

TABLE I: Percentage of recovered dictionary atoms per dictionary for various SNR values.

SNR [dB]      40      35       30       25       20       15       —
$\Phi_v$      96%     95.18%   95.38%   95.65%   95.20%   90.42%   12.53%
$\Phi_x$      96.78%  95.97%   96.53%   96.50%   95.48%   74.35%   0.27%
$\Phi_u$      92.95%  91.90%   91.73%   91.27%   91.50%   88.25%   3.07%
TABLE II: Normalized mean-squared error of the separated signals for various SNR values.

SNR [dB]      40       35       30       25       20       15
$\hat x_1$    0.0094   0.0307   0.0734   0.1215   0.2125   0.5854
$\hat x_2$    0.0097   0.0222   0.0818   0.0857   0.2516   0.2781
To retrieve the initial dictionaries, we apply the proposed method of Section IV with the dictionaries initialised randomly and the maximum number of iterations set to 100—experimental evidence has shown that this value strikes a good balance between complexity and dictionary identifiability. To compare the retrieved dictionaries with the original ones, we adhere to the approach in [34]: per generating dictionary, we sweep through its columns and identify the closest column in the retrieved dictionary. The distance between two columns is measured as $1 - |\phi_i^\top \hat\phi_j|$, where $\phi_i$ is the $i$-th column in the original dictionary $\Phi_v$, $\Phi_x$, or $\Phi_u$, and $\hat\phi_j$ is the $j$-th column in the corresponding recovered dictionary. Similar to [34], a distance of less than 0.01 signifies a success. The percentage of successes per dictionary and for various SNR values is reported in Table I. The results, which are averaged over the trials, show that for very noisy data the dictionary identifiability performance is low. However, for SNR values higher than 20 dB, the percentage of recovered dictionary atoms is up to 96.78%. The obtained performance is systematic for different dictionary and signal sizes as well as for different sparsity levels.
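The atom-matching criterion above can be sketched as follows (success when the distance $1 - |\phi_i^\top \hat\phi_j|$ drops below 0.01, as in [34]; the function name is ours):

```python
import numpy as np

def recovery_rate(Phi_true, Phi_hat, tol=0.01):
    """For each ground-truth atom, find the closest learned atom under the
    distance 1 - |phi^T phi_hat| (unit-norm columns assumed) and count the
    fraction of successes, in percent."""
    hits = 0
    for k in range(Phi_true.shape[1]):
        dists = 1.0 - np.abs(Phi_hat.T @ Phi_true[:, k])
        if dists.min() < tol:
            hits += 1
    return 100.0 * hits / Phi_true.shape[1]
```

The absolute value makes the criterion insensitive to the sign ambiguity of dictionary atoms, and the minimum over learned atoms makes it insensitive to column permutations.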
In a second stage, given the learned dictionaries, we separate signal pairs $x_1, x_2$ from mixtures $m = x_1 + x_2$ by solving Problem (6), using the corresponding pair $v_1, v_2$ as side information. The pairs are taken from the correlated data signals $v_i$ and $x_i$, to which white Gaussian noise with a varying SNR has been added. Table II reports the normalized mean-squared error (NMSE) between the reconstructed signals—defined by $\hat x_i = \Phi_x \hat z_i + \Phi_u \hat u$—and the original signals, that is, $\mathrm{NMSE} = \|x_i - \hat x_i\|_2^2 / \|x_i\|_2^2$, $i = 1, 2$. The results show that at low and moderate noise levels the reconstruction error is very low. When the noise increases, the recovery performance drops; this is to be expected, as the noise affects both the dictionary learning and the generation of the mixtures.
VII-B Experiments with Real Data
We consider eight image pairs—each consisting of an X-ray scan and the corresponding photograph—taken from digital acquisitions [12] of single-sided panels of the Ghent Altarpiece (1432). Furthermore, we are given access to eight crack masks (one per visual/X-ray image pair) that indicate the pixel positions referring to cracks (these masks were obtained using our method in [10]). Fig. 4 depicts two such pairs with the crack masks, one visualizing a face and the other a piece of fabric. An example X-ray mixture together with its two registered visual images, corresponding to the two sides of the painting, is depicted in Fig. 5.
Firstly, adhering to the single-scale approach described in Section VI-A, we train a dictionary triplet $(\Phi_v, \Phi_x, \Phi_u)$ using our method in Section IV, with training patches of $n$ pixels each and fixed sparsity levels $s_z$ and $s_u$. The separated X-rays that correspond to the mixture in Fig. 5 are depicted in the first column of Fig. 6. We observe that our single-scale approach separates the texture of the X-rays; this is demonstrated by the accurate separation of the cracks. Still, however, the low-pass band content is not properly split over the images; namely, part of the cloth and the face are present in both separated images. Next, we apply the multi-scale framework. Dictionary triplets are trained for the first three scales, and the dictionaries of the third scale are reused for the fourth. The visualizations in the second column of Fig. 6 show that, compared to the single-scale approach, the multi-scale method properly discriminates the low-pass frequency content of the two images (most of the cloth is allocated to "Separated Side 1" while the face is only visible in "Separated Side 2"), thereby leading to a higher separation performance. Finally, we also construct dictionary triplets according to our weighted dictionary learning method in Section V, with the remaining dictionary learning parameters as before. It is worth mentioning that, in order to obtain a solution in (24), the number of training samples needs to be larger than the total dimension of the dictionary: we need at least as many samples as dictionary atoms to update the columns and, correspondingly, more to update the rows once crack pixels are discarded. The visual results in the third column of Fig. 6 corroborate that the quality of the separation is improved when the dictionaries are learned from non-crack pixels only. Indeed, with this configuration, the separated images are not only smoother but the separation is also more prominent.
It is worth mentioning that the results of our method depicted in Fig. 6 are obtained without including the $\Phi_u \hat u$ component during the reconstruction; namely, we reconstructed each X-ray patch as $\hat x_1 = \Phi_x \hat z_1$ and $\hat x_2 = \Phi_x \hat z_2$. The visual results of our method when including the $\Phi_u \hat u$ component during the reconstruction are depicted in Fig. 7. These results are obtained with the same dictionaries that yield the result in the third column of Fig. 6. By comparing the two reconstructions, we can make the following observations. First, the $\Phi_u \hat u$ component successfully expresses the X-ray-specific features, such as the wood grain, visualized by the periodic vertical stripes in the X-ray scan. The reconstruction of these stripes is much more evident in Fig. 7. Secondly, in this case, the $\Phi_u \hat u$ component also captures parts of the actual content that we wish to separate. For example, we can discern a faint outline of the eye in Fig. 7(a), as well as a fold of fabric appearing in Fig. 7(b).
We compare our best performing multi-scale approach (namely, the one that omits cracks when learning dictionaries) against the state-of-the-art MCA method [28, 20]. Two configurations of the latter are considered. Based on prior work [30], in one configuration we use fixed dictionaries, namely, the discrete wavelet and curvelet transforms applied on blocks of pixels. Inherently, the low-frequency content cannot be split by MCA and it is divided equally between both retrieved components. In the other configuration, we learn dictionaries with K-SVD using the same training X-ray images as in the previous experiment. One dictionary is trained on the X-ray images depicting fabric and the other on the images of faces. The K-SVD parameters are the same as the ones used in our method. Furthermore, the same multi-scale strategy is applied to the configuration of MCA with K-SVD-trained dictionaries. The results are depicted in Fig. 8 and Fig. 9. Note that the third column in Fig. 8 and Fig. 9 is without and with taking the $\Phi_u \hat u$ component into account, respectively. It is clear that MCA with fixed dictionaries can only separate based on morphological properties; for example, the wood grain of the panel is captured entirely by the curvelets and not by the wavelets. It is, however, unsuitable for separating painted content—it is evident that part of the cloth and face appear in both separated components. Furthermore, MCA with K-SVD dictionaries is also unable to separate the X-ray content. Nevertheless, we do observe that most cracks are captured by the face dictionary, as more cracks are present in that type of content. Unlike both state-of-the-art configurations of MCA, the proposed method separates the X-ray content accurately (the cloth is always depicted on "Separated Side 1" while the face is only visible in "Separated Side 2"), leading to better visual performance. These results corroborate the benefit of using side information, by means of photographs, to separate mixtures of X-ray images.
VII-C Experiments on Simulated Mixtures
Due to the lack of proper ground-truth data, we generate simulated X-ray image mixtures in an attempt to assess our method in an objective manner. To this end, we utilized X-ray images from single-sided panels, depicting content similar to the mixtures in Fig. 5(c) and (f). We generated mixtures by summing these independent X-ray images and dividing the sum by two, to bring the mixture to the same range as the independent components; we then assessed the separation performance of the proposed method vis-à-vis MCA with either fixed or K-SVD-trained dictionaries. For this set of experiments, fixed-size patches were considered and the parameters of the different methods were kept the same as in the previous section. Table III reports the quality of the reconstructed X-ray components in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [63]. The proposed method clearly outperforms the alternative state-of-the-art methods in both PSNR and SSIM. Compared to MCA with fixed dictionaries, the proposed method improves the quality of the separation by up to 1.26 dB in PSNR and 0.074 in SSIM for “Mixture 3”. The maximum gains against MCA with K-SVD-trained dictionaries, again for “Mixture 3”, are 1.13 dB and 0.189. While we realize that PSNR and SSIM are not necessarily the right image-quality metrics in this scenario, they do demonstrate objectively the improvements that our method brings over the state of the art.
TABLE III: PSNR and SSIM of the X-ray components reconstructed from the simulated mixtures.

                         Mixture 1        Mixture 2        Mixture 3        Mixture 4        Mixture 5
Method       Image       PSNR     SSIM    PSNR     SSIM    PSNR     SSIM    PSNR     SSIM    PSNR     SSIM
MCA fixed    X-ray 1     25.69    0.7941  30.87    0.9003  27.28    0.7915  27.99    0.7972  26.96    0.8473
             X-ray 2     25.50    0.8134  30.73    0.8818  27.15    0.8198  27.86    0.8628  26.78    0.8068
MCA trained  X-ray 1     26.04    0.8245  31.07    0.8381  28.13    0.7703  27.56    0.7783  27.24    0.8258
             X-ray 2     25.83    0.8485  31.15    0.8189  27.23    0.6966  27.41    0.8464  27.05    0.7927
Proposed     X-ray 1     26.21    0.8583  31.91    0.9072  28.54    0.8656  28.31    0.8266  27.34    0.8592
             X-ray 2     26.00    0.8759  31.75    0.8892  28.36    0.8859  28.16    0.8921  27.14    0.8329
(PSNR in dB.)
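The mixture simulation and PSNR evaluation described above can be sketched as follows. This is a simplified illustration: the SSIM computation of [63] is omitted, and in practice a library implementation (e.g. `skimage.metrics.structural_similarity`) would be used for it.

```python
import numpy as np

def simulate_mixture(xray1, xray2):
    """Simulate a double-sided panel X-ray: sum the two independent
    single-sided X-ray images and divide by two, which keeps the
    mixture in the same intensity range as its components."""
    return (np.asarray(xray1, dtype=np.float64)
            + np.asarray(xray2, dtype=np.float64)) / 2.0

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio of an estimate against the reference image."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(estimate, dtype=np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

A separation algorithm is then scored by comparing each separated output against the known single-sided ground-truth image, e.g. `psnr(xray1, separated1)` and `psnr(xray2, separated2)`.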
VIII Conclusion
We have proposed a novel sparsity-based regularization method for source separation guided by side information. Our method learns dictionaries that couple registered acquisitions from diverse modalities, and comes in both a single-scale and a multiscale framework. The proposed method is applied to the separation of X-ray images of wooden panels that are painted on both sides, using the photographs of each side as side information. Experiments on real data, consisting of digital acquisitions of the Ghent Altarpiece (1432), verify that the use of side information can be highly beneficial in this application. Furthermore, because of the high resolution of the data relative to the restricted patch size, the multiscale version of the proposed algorithm improves the quality of the results significantly. We also observed experimentally that omitting the high-frequency crack pixels from the dictionary learning process results in smoother and visually more pleasing separations. Finally, the superiority of our method over the state-of-the-art MCA technique [20, 21, 36] was validated visually on real data and objectively on simulated X-ray image mixtures.
Acknowledgment
Miguel Rodrigues acknowledges valuable feedback from Jonathon Chambers.
References
 [1] N. Deligiannis, J. F. C. Mota, B. Cornelis, M. R. D. Rodrigues, and I. Daubechies, “X-ray image separation via coupled dictionary learning,” in IEEE Int. Conf. Image Process. (ICIP) [Available: arXiv:1605.06474], 2016, pp. 1–5.
 [2] M. Chen, S. Mao, and Y. Liu, “Big data: A survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014.
 [3] D. Zhang and D. Shen, “Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease,” NeuroImage, vol. 59, no. 2, pp. 895–907, 2012.

 [4] S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2216–2223.
 [5] C. R. Johnson, E. Hendriks, I. J. Berezhnoy, E. Brevdo, S. M. Hughes, I. Daubechies, J. Li, E. Postma, and J. Z. Wang, “Image processing for artist identification,” IEEE Signal Process. Mag., vol. 25, no. 4, pp. 37–48, 2008.
 [6] N. van Noord, E. Hendriks, and E. Postma, “Toward discovery of the artist’s style: Learning to recognize artists by their artworks,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 46–54, July 2015.
 [7] D. Johnson, C. Johnson Jr., A. Klein, W. Sethares, H. Lee, and E. Hendriks, “A thread counting algorithm for art forensics,” in IEEE Digital Signal Processing Workshop and IEEE Signal Processing Education Workshop (DSP/SPE), Jan 2009, pp. 679–684.
 [8] H. Yang, J. Lu, W. Brown, I. Daubechies, and L. Ying, “Quantitative canvas weave analysis using 2D synchrosqueezed transforms: Application of timefrequency analysis to art investigation,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 55–63, July 2015.

 [9] L. van der Maaten and R. Erdmann, “Automatic thread-level canvas analysis: A machine-learning approach to analyzing the canvas of paintings,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 38–45, July 2015.
 [10] B. Cornelis, T. Ružić, E. Gezels, A. Dooms, A. Pižurica, L. Platiša, J. Cornelis, M. Martens, M. De Mey, and I. Daubechies, “Crack detection and inpainting for virtual restoration of paintings: The case of the Ghent Altarpiece,” Signal Process., 2012.
 [11] B. Cornelis, Y. Yang, J. Vogelstein, A. Dooms, I. Daubechies, and D. Dunson, “Bayesian crack detection in ultra high resolution multimodal images of paintings,” arXiv preprint arXiv:1304.5894, 2013.
 [12] A. Pizurica, L. Platisa, T. Ruzic, B. Cornelis, A. Dooms, M. Martens, H. Dubois, B. Devolder, M. De Mey, and I. Daubechies, “Digital image processing of the Ghent Altarpiece: Supporting the painting’s study and conservation treatment,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 112–122, 2015.
 [13] B. Cornelis, A. Dooms, J. Cornelis, F. Leen, and P. Schelkens, “Digital painting analysis, at the cross section of engineering, mathematics and culture,” in Eur. Signal Process. Conf. (EUSIPCO), 2011, pp. 1254–1258.
 [14] A. Hyvärinen, J. Karhunen, and E. Oja, Independent component analysis. John Wiley & Sons, 2004, vol. 46.

 [15] A. Hyvärinen, “Gaussian moments for noisy independent component analysis,” IEEE Signal Process. Lett., vol. 6, no. 6, pp. 145–147, 1999.
 [16] P. Smaragdis, C. Févotte, G. Mysore, N. Mohammadiha, and M. Hoffman, “Static and dynamic source separation using nonnegative factorizations: A unified view,” IEEE Signal Process. Mag., vol. 31, no. 3, pp. 66–75, 2014.
 [17] J. W. Miskin, “Ensemble learning for independent component analysis,” in Advances in Independent Component Analysis, 2000.
 [18] D. B. Rowe, “A Bayesian approach to blind source separation,” Journal of Interdisciplinary Mathematics, vol. 5, no. 1, pp. 49–76, 2002.
 [19] K. Kayabol, E. Kuruoğlu, and B. Sankur, “Bayesian separation of images modeled with MRFs using MCMC,” IEEE Trans. Image Process., vol. 18, no. 5, pp. 982–994, 2009.
 [20] J. Bobin, J.L. Starck, J. Fadili, and Y. Moudden, “Sparsity and morphological diversity in blind source separation,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2662–2674, 2007.
 [21] M. Zibulevsky and B. Pearlmutter, “Blind source separation by sparse decomposition in a signal dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, 2001.
 [22] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, 2006.
 [23] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [24] C. Guillemot and O. Le Meur, “Image inpainting: Overview and recent advances,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 127–144, 2014.
 [25] J. Mairal, G. Sapiro, and M. Elad, “Learning multiscale sparse representations for image and video restoration,” Multiscale Modeling & Simulation, vol. 7, no. 1, pp. 214–241, 2008.
 [26] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, 2006.
 [27] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., vol. 57, pp. 1413–1541, 2004.
 [28] J.L. Starck, M. Elad, and D. Donoho, “Redundant multiscale transforms and their application for morphological component separation,” Advances in Imaging and Electron Physics, vol. 132, pp. 287–348, 2004.
 [29] J. Bobin, Y. Moudden, J.L. Starck, and M. Elad, “Morphological diversity and source separation,” IEEE Signal Process. Lett., vol. 13, no. 7, pp. 409–412, 2006.
 [30] R. Yin, D. Dunson, B. Cornelis, B. Brown, N. Ocon, and I. Daubechies, “Digital cradle removal in Xray images of art paintings,” in IEEE Int. Conf. Image Process. (ICIP), Oct 2014, pp. 4299–4303.
 [31] N. G. Kingsbury, “The dual-tree complex wavelet transform: A new technique for shift invariance and directional filters,” in Proc. 8th IEEE DSP Workshop, vol. 8, 1998, p. 86.
 [32] K. Guo and D. Labate, “Optimally sparse multidimensional representation using shearlets,” SIAM journal on Mathematical Analysis, vol. 39, no. 1, pp. 298–318, 2007.
 [33] K. Engan, S. O. Aase, and J. Hakon Husoy, “Method of optimal directions for frame design,” in IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 1999, pp. 2443–2446.
 [34] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
 [35] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
 [36] V. Abolghasemi, S. Ferdowsi, and S. Sanei, “Blind separation of image sources via adaptive dictionary learning,” IEEE Trans. Image Process., vol. 21, no. 6, pp. 2921–2930, 2012.
 [37] R. Yan, L. Shao, and Y. Liu, “Nonlocal hierarchical dictionary learning using wavelets for image denoising,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 4689–4698, 2013.
 [38] B. Ophir, M. Lustig, and M. Elad, “Multiscale dictionary learning using wavelets,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 1014–1024, 2011.
 [39] A. Liutkus, J. Pinel, R. Badeau, L. Girin, and G. Richard, “Informed source separation through spectrogram coding and data embedding,” Signal Process., vol. 92, no. 8, pp. 1937–1949, 2012.
 [40] S. Gorlow and S. Marchand, “Informed separation of spatial images of stereo music recordings using second-order statistics,” in IEEE Int. Workshop Machine Learning for Signal Processing (MLSP), 2013, pp. 1–6.
 [41] B. Chen and G. W. Wornell, “Quantization index modulation: A class of provably good methods for digital watermarking and information embedding,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1423–1443, 2001.
 [42] B. A. Olshausen and D. J. Field, “Emergence of simplecell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
 [43] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski, “Dictionary learning algorithms for sparse representation,” Neural Computation, vol. 15, no. 2, pp. 349–396, 2003.
 [44] W. Chen, I. Wassell, and M. R. Rodrigues, “Dictionary design for distributed compressive sensing,” IEEE Signal Process. Lett., vol. 22, no. 1, pp. 95–99, 2015.
 [45] H. Zayyani, M. Korki, and F. Marvasti, “Dictionary learning for blind one bit compressed sensing,” IEEE Signal Process. Lett., vol. 23, no. 2, pp. 187–191, 2016.
 [46] D. A. Spielman, H. Wang, and J. Wright, “Exact recovery of sparselyused dictionaries,” in Conference on Learning Theory (COLT), 2012.
 [47] S. Arora, R. Ge, and A. Moitra, “New algorithms for learning incoherent and overcomplete dictionaries,” arXiv preprint arXiv:1308.6273, 2013.
 [48] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, 1983.
 [49] G. Monaci, P. Jost, P. Vandergheynst, B. Mailhe, S. Lesage, and R. Gribonval, “Learning multimodal dictionaries,” IEEE Trans. Image Process., vol. 16, no. 9, pp. 2272–2283, 2007.
 [50] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, 2010.
 [51] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467–3478, 2012.
 [52] Y. Jia, M. Salzmann, and T. Darrell, “Factorized latent spaces with structured sparsity,” in Advances in Neural Information Processing Systems, 2010, pp. 982–990.
 [53] N. Vaswani and W. Lu, “Modified-CS: Modifying compressive sensing for problems with partially known support,” IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4595–4607, 2010.
 [54] J. F. C. Mota, N. Deligiannis, and M. R. D. Rodrigues, “Compressed sensing with prior information: Optimal strategies, geometry, and bounds,” arXiv preprint arXiv:1408.5250, 2014.
 [55] ——, “Compressed sensing with side information: Geometrical interpretation and performance bounds,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2014, pp. 512–516.
 [56] F. Renna, L. Wang, X. Yuan, J. Yang, G. Reeves, R. Calderbank, L. Carin, and M. R. Rodrigues, “Classification and reconstruction of highdimensional signals from lowdimensional noisy features in the presence of side information,” arXiv preprint arXiv:1412.0614, 2014.
 [57] J. Scarlett, J. S. Evans, and S. Dey, “Compressed sensing with prior information: Informationtheoretic limits and practical decoders,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 427–439, 2013.
 [58] E. Zimos, J. F. C. Mota, M. R. D. Rodrigues, and N. Deligiannis, “Bayesian compressed sensing with heterogeneous side information,” in IEEE Data Compression Conf. (DCC), 2016.
 [59] M. A. Khajehnejad, W. Xu, A. S. Avestimehr, and B. Hassibi, “Weighted ℓ1 minimization for sparse recovery with prior information,” in IEEE Int. Symp. Inf. Theory (ISIT), 2009, pp. 483–487.
 [60] M. B. Wakin, M. F. Duarte, S. Sarvotham, D. Baron, and R. G. Baraniuk, “Recovery of jointly sparse signals from few random projections,” in Proc. Neural Inform. Process. Syst., (NIPS), 2005.
 [61] E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
 [62] I. Arganda-Carreras, C. O. S. Sorzano, R. Marabini, J. M. Carazo, C. O. de Solorzano, and J. Kybic, “Consistent and elastic registration of histological sections using vector-spline regularization,” in Computer Vision Approaches to Medical Image Analysis, May 2006, pp. 85–95.
 [63] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.