X-ray image separation via coupled dictionary learning

05/20/2016 ∙ by Nikos Deligiannis, et al.

In support of art investigation, we propose a new source separation method that unmixes a single X-ray scan acquired from double-sided paintings. Unlike prior source separation methods, which are based on statistical or structural incoherence of the sources, we use visual images taken from the front- and back-side of the panel to drive the separation process. The coupling of the two imaging modalities is achieved via a new multi-scale dictionary learning method. Experimental results demonstrate that our method succeeds in the discrimination of the sources, while state-of-the-art methods fail to do so.




1 Introduction

The analysis and enhancement of high-resolution digital acquisitions of paintings is becoming a popular field of research [1, 2]. Prior work includes the removal of canvas artifacts in high-resolution photographs [3], the removal of cradling in X-ray images of paintings on panel [4], as well as the detection and digital removal of cracks  [5].

In this work, we propose a novel framework to separate X-ray images taken from double-sided paintings. A famous piece of art that contains panels painted on both sides is the Ghent Altarpiece (1432) created by Jan and Hubert van Eyck. In preparation for its restoration, the masterpiece was digitized by means of various modalities: visual photography, infrared photography and reflectography, and X-radiography [6]. The latter is a powerful tool for art investigation, as it reveals information about the structural condition of the painting. However, X-ray scans of double-sided paintings are very cluttered, which makes them difficult for art experts to read. The reason is that these images contain information from both sides of the painting as well as its support (wood structure or canvas).

Prior work on unmixing signals focuses mostly on the blind source separation (BSS) problem, where the task is to retrieve the different signal sources from one or more linear mixtures. Independent component analysis (ICA) [7], where the sources are assumed to be statistically independent, and nonnegative matrix factorization, where the sources are considered or transformed into a nonnegative representation [8], are representative methods to solve the BSS problem. Alternative solutions adhere to a Bayesian formulation, via, for example, Markov random fields [9]. Sparsity is another source prior, heavily exploited in BSS problems [10, 11], with morphological component analysis (MCA) being a state-of-the-art method. The assumption in MCA is that each source has a different morphology; namely, it has a sparse representation over one set of bases (a dictionary), while being non-sparse over other dictionaries. The dictionaries can be pre-defined, for instance, the wavelet or the discrete cosine transform (DCT), or learned from a set of training signals. Seminal dictionary learning works include the method of optimal directions (MOD) [12] and the K-SVD algorithm [13], both utilizing the orthogonal matching pursuit (OMP) [14] method to perform sparse signal decomposition. Recently, MCA has been combined with K-SVD, thus enabling dictionaries to be learned while separating [15].

The assumptions in previous source separation methods do not fit our problem, as both sources have similar morphological and statistical traits. In this work, we propose a novel method to perform separation of X-ray images of paintings by using images of another modality as side information. Our approach consists of two steps: 1) learning multi-scale dictionaries from photographs and X-rays of single-sided panels (in which the X-rays are not mixed), and 2) separating the given mixed X-ray from a double-sided panel, using those dictionaries and the photographs from each side. Previous work has used coupled dictionary learning to address problems in audio-visual analysis [16], super-resolution [17], photo-sketch synthesis [18], and human pose estimation [19]. Besides the application domain, our method differs from prior work in the way we model the correlation between the sources. Experimental evidence shows that our method outperforms the state-of-the-art MCA technique, configured with either fixed or trained dictionaries.

2 Image Separation with Side Information

We start by describing MCA, as the state-of-the-art sparsity-based source separation method, and afterwards we introduce the proposed method which, unlike the former, makes use of side information. First, let us denote by $x_1 \in \mathbb{R}^n$ and $x_2 \in \mathbb{R}^n$ two vectorized X-ray image patches that we wish to separate from a given X-ray scan patch $m = x_1 + x_2$.

Morphological Component Analysis. Assume that each $x_i$ admits a sparse decomposition in a different overcomplete dictionary $\Phi_i \in \mathbb{R}^{n \times d_i}$, $i = 1, 2$; namely, each component can be expressed as $x_i = \Phi_i z_i$, where $z_i \in \mathbb{R}^{d_i}$ is a sparse vector comprising a few non-zero coefficients: $\|z_i\|_0 = s_i \ll d_i$, with $\|\cdot\|_0$ denoting the $\ell_0$ pseudo-norm. MCA [10, 11] decomposes the mixture by approximately solving the following optimization problem:

$$(\hat{z}_1, \hat{z}_2) = \arg\min_{z_1, z_2} \; \|z_1\|_0 + \|z_2\|_0 \quad \text{s.t.} \quad m = \Phi_1 z_1 + \Phi_2 z_2. \tag{1}$$

A typical approximation consists of replacing the $\ell_0$ pseudo-norm with the $\ell_1$-norm.
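The $\ell_1$ relaxation can be illustrated with a small NumPy sketch. It is not the solver used in the paper: it assumes a penalized (Lagrangian) form of the relaxed problem, solved with plain iterative soft-thresholding (ISTA), and a toy pair of dictionaries (an orthonormal DCT and the identity) chosen only because smooth and spiky signals are morphologically distinct. The names `mca_l1`, `dct_dictionary`, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal DCT-II basis; each column is one atom."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D.T

def soft_threshold(u, tau):
    """Proximal operator of tau * ||u||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def objective(m, Phi, z, lam):
    """Penalized l1 objective: 0.5 * ||m - Phi z||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((m - Phi @ z) ** 2) + lam * np.sum(np.abs(z))

def mca_l1(m, Phi1, Phi2, lam=0.05, n_iter=200):
    """ISTA on the stacked dictionary [Phi1 Phi2] (assumed penalized form)."""
    Phi = np.hstack([Phi1, Phi2])
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ z - m)
        z = soft_threshold(z - step * grad, step * lam)
    d1 = Phi1.shape[1]
    return Phi1 @ z[:d1], Phi2 @ z[d1:], z

# Toy mixture: two DCT atoms (smooth part) plus two spikes.
n = 64
Phi1, Phi2 = dct_dictionary(n), np.eye(n)
x1 = 2.0 * Phi1[:, 3] + 1.0 * Phi1[:, 10]
x2 = np.zeros(n)
x2[[20, 45]] = [3.0, -2.0]
m = x1 + x2

x1_hat, x2_hat, z_hat = mca_l1(m, Phi1, Phi2)
Phi = np.hstack([Phi1, Phi2])
start_cost = objective(m, Phi, np.zeros(2 * n), 0.05)
final_cost = objective(m, Phi, z_hat, 0.05)
```

Such a scheme can only succeed when the two dictionaries describe different morphologies; when both sources look alike, as in the double-sided painting case, the sparsity prior alone gives the solver no reason to assign content to one component rather than the other.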
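
Source Separation with Side Information. The use of side information has proven beneficial in various inverse problems [20, 21, 22]. Following this logic, we show how side information can help in separating mixtures whose sources have similar characteristics. In our particular problem, we consider side information signals $y_1$ and $y_2$ formed by the co-located visual image patches of the front and the back of the painting. Both the X-ray and side information signals admit a sparse decomposition in given dictionaries, namely,

$$y_i = \Psi^c z_{ic}, \qquad x_i = \Phi^c z_{ic} + \Phi v, \qquad i = 1, 2, \tag{2}$$

so that the mixed X-ray patch $m = x_1 + x_2$ reads

$$m = \Phi^c z_{1c} + \Phi^c z_{2c} + 2 \Phi v, \tag{3}$$

where $z_{ic} \in \mathbb{R}^{\gamma}$, with $\|z_{ic}\|_0 = s_z \ll \gamma$, denotes the sparse component that is common to the visual and X-ray images with respect to dictionaries $\Psi^c, \Phi^c \in \mathbb{R}^{n \times \gamma}$. Moreover, $v \in \mathbb{R}^{d}$, with $\|v\|_0 = s_v \ll d$, denotes the sparse innovation component of the X-ray image, obtained with respect to dictionary $\Phi \in \mathbb{R}^{n \times d}$. The common components express the structure underlying both the X-ray and natural images, while the innovation component captures X-ray specific parts of the signal (e.g., traces of the wooden panel). The separation problem is now formulated as the following problem:

$$\min_{z_{1c}, z_{2c}, v} \; \|z_{1c}\|_0 + \|z_{2c}\|_0 + \|v\|_0 \quad \text{s.t.} \quad y_1 = \Psi^c z_{1c}, \;\; y_2 = \Psi^c z_{2c}, \;\; m = \Phi^c z_{1c} + \Phi^c z_{2c} + 2 \Phi v. \tag{4}$$

The relaxed version of Problem (4), in which the $\ell_0$ pseudo-norm is replaced by the $\ell_1$-norm, boils down to Basis Pursuit, which is solved by convex optimization tools, e.g., [23].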
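To see why the relaxed problem is an instance of Basis Pursuit, it may help to stack the three equality constraints into one linear system $A u = b$ with a single unknown vector $u$ that concatenates the two common components and the innovation component. The sketch below does only that stacking. The symbol names (`Psi_c`, `Phi_c`, `Phi`, `z1`, `z2`, `v`) and the factor 2 on the innovation term are assumptions that follow from reading the innovation component as shared by both X-ray patches; the helper `stack_separation_system` is hypothetical.

```python
import numpy as np

def stack_separation_system(Psi_c, Phi_c, Phi, y1, y2, m):
    """Stack y1 = Psi_c z1, y2 = Psi_c z2, m = Phi_c z1 + Phi_c z2 + 2 Phi v
    into A u = b, where u = [z1; z2; v]."""
    n, gamma = Psi_c.shape
    d = Phi.shape[1]
    zeros_g = np.zeros((n, gamma))
    zeros_d = np.zeros((n, d))
    A = np.block([
        [Psi_c, zeros_g, zeros_d],    # visual patch, side 1
        [zeros_g, Psi_c, zeros_d],    # visual patch, side 2
        [Phi_c, Phi_c, 2.0 * Phi],    # mixed X-ray patch
    ])
    b = np.concatenate([y1, y2, m])
    return A, b

def random_sparse(rng, size, k):
    """Random vector with exactly k non-zero entries."""
    out = np.zeros(size)
    out[rng.choice(size, size=k, replace=False)] = rng.standard_normal(k)
    return out

# Synthetic data that follows the assumed model exactly.
rng = np.random.default_rng(0)
n, gamma, d = 16, 32, 24
Psi_c = rng.standard_normal((n, gamma))
Phi_c = rng.standard_normal((n, gamma))
Phi = rng.standard_normal((n, d))
z1, z2, v = random_sparse(rng, gamma, 3), random_sparse(rng, gamma, 3), random_sparse(rng, d, 2)

y1, y2 = Psi_c @ z1, Psi_c @ z2
x1, x2 = Phi_c @ z1 + Phi @ v, Phi_c @ z2 + Phi @ v
m = x1 + x2

A, b = stack_separation_system(Psi_c, Phi_c, Phi, y1, y2, m)
u_true = np.concatenate([z1, z2, v])
```

Any Basis Pursuit solver that accepts a matrix and a right-hand side can then be run on `A` and `b`; the first and second blocks of the returned vector give the common components used to rebuild each side.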

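Input: signal $w$, dictionary $\Theta$ with columns $\theta_k$, disjoint index sets $\mathcal{I}$ and $\mathcal{J}$, sparsity thresholds $s_z$ and $s_v$.
Output: sparse vector $\alpha$.
Initialize residual: $r_0 = w$.
Total sparsity of vector $\alpha$: $s = s_z + s_v$.
Counters for the sparsity of $\alpha_{\mathcal{I}}$ and $\alpha_{\mathcal{J}}$: $\ell_z = 0$, $\ell_v = 0$.
Set of non-zero elements of $\alpha$: $\Omega = \emptyset$.
for $t = 1, \dots, s$ do
     Sort the indices $k$, corresponding to the columns $\theta_k$ of $\Theta$, such that $|\langle \theta_k, r_{t-1} \rangle|$ are in descending order. Put the ordered indices in the vector $q$.
     Set $\mathit{found} = \text{false}$ and auxiliary iterator $\mathit{it} = 0$.
     while $\mathit{found} = \text{false}$ do
          $\mathit{it} = \mathit{it} + 1$.
          Find index that corresponds to value of $\mathit{it}$: $k = q[\mathit{it}]$.
          if $k \in \mathcal{I}$ AND $\ell_z < s_z$ then
               Set $\mathit{found} = \text{true}$ and increase $\ell_z$: $\ell_z = \ell_z + 1$.
          else
               if $k \in \mathcal{J}$ AND $\ell_v < s_v$ then
                    Set $\mathit{found} = \text{true}$ and increase $\ell_v$: $\ell_v = \ell_v + 1$.
               end if
          end if
     end while
     Update the set of non-zero elements of $\alpha$, i.e., $\Omega = \Omega \cup \{k\}$, and the matrix of chosen atoms: $\Theta_\Omega = [\Theta_\Omega \;\; \theta_k]$.
     Solve: $\alpha_\Omega = \arg\min_{\alpha} \|w - \Theta_\Omega \alpha\|_2^2$.
     Calculate the new residual: $r_t = w - \Theta_\Omega \alpha_\Omega$.
end for
Algorithm 1 Modified Orthogonal Matching Pursuit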

3 Coupled Dictionary Learning Algorithm

We train coupled dictionaries, $\Psi^c$, $\Phi^c$, $\Phi$, by using image patches sampled from registered visual and X-ray images of single-sided panels, which do not suffer from superposition phenomena. Let $Y, X \in \mathbb{R}^{n \times t}$ represent a set of $t$ co-located vectorized visual and X-ray patches, each containing $n$ pixels. We assume that the columns of $X$ and $Y$ can be decomposed as in (2), and we collect their common components into the columns of the matrix $Z \in \mathbb{R}^{\gamma \times t}$ and their innovation components into the columns of $V \in \mathbb{R}^{d \times t}$. We formulate the coupled dictionary learning problem as

$$\min_{\Psi^c, \Phi^c, \Phi, Z, V} \; \|Y - \Psi^c Z\|_F^2 + \|X - \Phi^c Z - \Phi V\|_F^2 \quad \text{s.t.} \quad \|z_\tau\|_0 \le s_z, \;\; \|v_\tau\|_0 \le s_v, \;\; \tau = 1, \dots, t, \tag{5}$$

where $z_\tau$, $v_\tau$ are the sparse vector-columns of matrices $Z$ and $V$, $\tau$ runs over the columns of $Z$ and $V$, and $s_z$, $s_v$ are thresholds on the sparsity level. Given initial estimates for the dictionaries (we use the overcomplete DCT to initialize them), Problem (5) is solved by iterating between a sparse-coding step, where the dictionaries are fixed, and a dictionary update step, in which the coefficients are fixed, as in [13, 12].

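Given fixed dictionaries, the sparse coding problem decomposes into $t$ problems that can be solved in parallel:

$$(z_\tau^{k+1}, v_\tau^{k+1}) = \arg\min_{z_\tau, v_\tau} \left\| \begin{bmatrix} y_\tau \\ x_\tau \end{bmatrix} - \begin{bmatrix} \Psi^c_k & 0 \\ \Phi^c_k & \Phi_k \end{bmatrix} \begin{bmatrix} z_\tau \\ v_\tau \end{bmatrix} \right\|_2^2 \quad \text{s.t.} \quad \|z_\tau\|_0 \le s_z, \;\; \|v_\tau\|_0 \le s_v, \tag{6}$$

where we used $y_\tau$, $x_\tau$, $z_\tau$, and $v_\tau$ to represent column $\tau$ of $Y$, $X$, $Z$, and $V$, respectively, and $k$ counts the iterations. To address each of the sub-problems in (6), we propose a greedy algorithm that constitutes a modification of the OMP method [see Algorithm 1]. Our method adapts OMP [14] to solve:

$$\min_{\alpha} \; \|w - \Theta \alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha_{\mathcal{I}}\|_0 \le s_z, \;\; \|\alpha_{\mathcal{J}}\|_0 \le s_v, \tag{7}$$

where $\alpha_{\mathcal{I}}$ [resp., $\alpha_{\mathcal{J}}$] denotes the components of vector $\alpha$ indexed by the index set $\mathcal{I}$ (resp., $\mathcal{J}$), with $\mathcal{I} \cap \mathcal{J} = \emptyset$, $\mathcal{I} \cup \mathcal{J} = \{1, \dots, \gamma + d\}$, where $\gamma$ and $d$ are the lengths of $z_\tau$ and $v_\tau$. Each sub-problem in (6) translates to (7) by replacing: $w = \begin{bmatrix} y_\tau \\ x_\tau \end{bmatrix}$, $\Theta = \begin{bmatrix} \Psi^c_k & 0 \\ \Phi^c_k & \Phi_k \end{bmatrix}$, and $\alpha = \begin{bmatrix} z_\tau \\ v_\tau \end{bmatrix}$, with $\mathcal{I}$ indexing the entries of $z_\tau$ and $\mathcal{J}$ those of $v_\tau$.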
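The greedy procedure of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading, not the authors' code: the function name `modified_omp` and its argument names are hypothetical, and the explicit skip of already selected atoms is an added safeguard (after the least-squares step their correlation with the residual is numerically zero, so they would rank last anyway). The toy check uses an orthonormal dictionary, which is an assumption made only so that exact recovery is guaranteed.

```python
import numpy as np

def modified_omp(w, Theta, idx_z, idx_v, s_z, s_v):
    """Greedy sparse coding with separate sparsity budgets on two index sets.

    idx_z, idx_v: disjoint collections of column indices of Theta.
    s_z, s_v: maximum number of non-zeros allowed within each set.
    """
    n_atoms = Theta.shape[1]
    in_z = np.zeros(n_atoms, dtype=bool)
    in_z[list(idx_z)] = True
    in_v = np.zeros(n_atoms, dtype=bool)
    in_v[list(idx_v)] = True

    residual = np.asarray(w, dtype=float).copy()
    support, coef = [], np.zeros(0)
    used_z = used_v = 0

    for _ in range(s_z + s_v):
        # Rank atoms by absolute correlation with the current residual.
        order = np.argsort(-np.abs(Theta.T @ residual))
        chosen = None
        for k in order:
            if k in support:  # added safeguard, not a step of Algorithm 1
                continue
            if in_z[k] and used_z < s_z:
                used_z, chosen = used_z + 1, k
                break
            if in_v[k] and used_v < s_v:
                used_v, chosen = used_v + 1, k
                break
        if chosen is None:
            break
        support.append(int(chosen))
        # Least-squares fit on the chosen atoms, then refresh the residual.
        coef, *_ = np.linalg.lstsq(Theta[:, support], w, rcond=None)
        residual = w - Theta[:, support] @ coef

    alpha = np.zeros(n_atoms)
    alpha[support] = coef
    return alpha

# Toy check with an orthonormal dictionary (assumption for guaranteed recovery).
rng = np.random.default_rng(0)
Theta, _ = np.linalg.qr(rng.standard_normal((8, 8)))
idx_z, idx_v = range(0, 5), range(5, 8)
alpha_true = np.zeros(8)
alpha_true[[0, 3, 6]] = [2.0, -1.5, 1.0]
w = Theta @ alpha_true

alpha_hat = modified_omp(w, Theta, idx_z, idx_v, s_z=2, s_v=1)
alpha_tight = modified_omp(w, Theta, idx_z, idx_v, s_z=1, s_v=1)
```

With the tighter budget `s_z=1`, the second-strongest atom (index 3) is passed over because the budget of its index set is exhausted, and the atom from the other set (index 6) is taken instead. This is the behaviour that distinguishes the modified procedure from standard OMP with a single sparsity level.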

Figure 1: Image set cropped from a double-sided panel of the altarpiece, on which we assess the proposed method; (a) photograph of side 1, (b) photograph of side 2; (c) corresponding X-ray image. The resolution is pixels.

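Given fixed sparse coefficients, the dictionary update problem decouples into two (independent) problems, that is,

$$\Psi^c_{k+1} = \arg\min_{\Psi^c} \|Y - \Psi^c Z\|_F^2, \qquad \bar{\Phi}_{k+1} = \arg\min_{\bar{\Phi}} \|X - \bar{\Phi} \bar{Z}\|_F^2, \tag{8}$$

where $\bar{\Phi} = [\Phi^c \;\; \Phi]$ and $\bar{Z} = [Z^T \;\; V^T]^T$, with $Z$ and $V$ holding the current sparse codes. Each of these problems has a closed-form solution.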
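A minimal sketch of such a closed-form update is given below, assuming each sub-problem is an unconstrained least-squares fit solved with the pseudo-inverse, as in MOD [12]. The function name `update_dictionaries` and the variable names are hypothetical, and any column normalization that a full implementation would typically apply afterwards is omitted.

```python
import numpy as np

def update_dictionaries(Y, X, Z, V):
    """Least-squares dictionary update for fixed coefficient matrices.

    Y, X: visual and X-ray training patches, one patch per column.
    Z, V: common and innovation coefficients, one code per column.
    """
    gamma = Z.shape[0]
    # Visual dictionary: minimize ||Y - Psi_c Z||_F.
    Psi_c = Y @ np.linalg.pinv(Z)
    # X-ray dictionaries: minimize ||X - [Phi_c Phi] [Z; V]||_F jointly.
    Phi_bar = X @ np.linalg.pinv(np.vstack([Z, V]))
    return Psi_c, Phi_bar[:, :gamma], Phi_bar[:, gamma:]

# Noise-free toy data; dense codes are used so the solution is unique.
rng = np.random.default_rng(1)
n, gamma, d, t = 6, 4, 3, 50
Psi_true = rng.standard_normal((n, gamma))
Phi_c_true = rng.standard_normal((n, gamma))
Phi_true = rng.standard_normal((n, d))
Z = rng.standard_normal((gamma, t))
V = rng.standard_normal((d, t))
Y = Psi_true @ Z
X = Phi_c_true @ Z + Phi_true @ V

Psi_c, Phi_c, Phi = update_dictionaries(Y, X, Z, V)
```

The X-ray dictionaries have to be updated together because the same X-ray patch is explained by both of them; the visual dictionary only shares the coefficients, so its update is independent.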

Figure 2: Visual evaluation of the proposed multi-scale method in the separation of the X-ray in Fig. 1(c); (left) separated side 1, (right) separated side 2. The competing methods are: (1st row) MCA with fixed dictionaries [4], (2nd row) multi-scale MCA with K-SVD, (3rd row) Proposed.

4 X-ray Image Separation Method

Because of complexity, dictionaries are learned for small image patches, usually with dimensions of pixels; namely, we adhere to a local sparsity prior. However, due to the high resolution of the images, patches of that size cannot fully capture large structures. Hence, we propose a multi-scale image separation approach that is based on a pyramid decomposition of the images. Our multi-scale strategy is as follows: The images at scale $l$ (where we use the notation $M^l$ and $Y_1^l$, $Y_2^l$ to refer to the mixed X-ray and the two visuals, respectively) are divided into overlapping patches, each of size $\sqrt{n_l} \times \sqrt{n_l}$ pixels. Each patch has top-left coordinates

$$(\epsilon_l \cdot i, \; \epsilon_l \cdot j), \qquad 0 \le i < \lfloor H_l / \epsilon_l \rfloor, \quad 0 \le j < \lfloor W_l / \epsilon_l \rfloor,$$

where $\epsilon_l$ is the overlap step-size, and $H_l$ and $W_l$ are the height and width of the image decomposition at scale $l$. The DC value is extracted from each patch, thereby constructing the high frequency band of the image at scale $l$. The aggregated DC values comprise the low-pass component of the image, the resolution of which is $\lfloor H_l / \epsilon_l \rfloor \times \lfloor W_l / \epsilon_l \rfloor$ pixels. The low-pass component is then decomposed further at the subsequent scale ($l + 1$). The texture of the mixed X-ray image at scale $l$ is separated patch-per-patch by solving Problem (4). The texture of each separated patch is then reconstructed as $\hat{x}_1 = \Phi^c \hat{z}_{1c}$ and $\hat{x}_2 = \Phi^c \hat{z}_{2c}$, where $\Phi^c$ is the X-ray dictionary of the common component and $\hat{z}_{1c}$, $\hat{z}_{2c}$ are the recovered common sparse codes. Namely, we omit the innovation component [see (2)] during reconstruction, as this is common to the two X-rays (experimental observation revealed that including the innovation component leads to poorer visual quality of the separation). The separated X-ray images are finally reconstructed by following the reverse operation: descending the pyramid, the separated component at the coarser level is up-sampled and added to the separated component of the finer scales.
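The analysis side of this pyramid (patch extraction, DC removal, and formation of the low-pass image) can be sketched as below. Several choices are assumptions made for illustration only: the patch side `p` and step `eps` are free parameters, the DC value is taken as the patch mean, one patch is extracted per step position, and the image is padded by edge replication so that border patches are complete. The function name `split_scale` is hypothetical, and the separation and synthesis (up-sampling) stages are not shown.

```python
import numpy as np

def split_scale(image, p, eps):
    """Split one pyramid level into a low-pass image and zero-mean patches.

    Returns
    -------
    low  : (H // eps, W // eps) array holding one DC value per patch.
    high : (H // eps, W // eps, p, p) array of patches with the DC removed.
    """
    H, W = image.shape
    rows, cols = H // eps, W // eps
    # Edge padding (an assumption) so that border patches are complete.
    pad = max(p - eps, 0)
    padded = np.pad(image.astype(float), ((0, pad), (0, pad)), mode="edge")

    low = np.empty((rows, cols))
    high = np.empty((rows, cols, p, p))
    for i in range(rows):
        for j in range(cols):
            patch = padded[i * eps:i * eps + p, j * eps:j * eps + p]
            low[i, j] = patch.mean()        # DC value of the patch
            high[i, j] = patch - low[i, j]  # texture (high-frequency band)
    return low, high

# Two-level decomposition of a random test image.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
low1, high1 = split_scale(image, p=8, eps=4)  # finest scale
low2, high2 = split_scale(low1, p=4, eps=2)   # next, coarser scale

# A constant image has no texture: all information sits in the DC values.
flat_low, flat_high = split_scale(np.full((16, 16), 3.0), p=8, eps=4)
```

Each level shrinks the image by the step size, so a small fixed patch at a coarse level covers a large region of the original image; this is how small dictionaries can still account for large structures.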

As a final note, the dictionary learning process is applied per scale, yielding a triple of coupled dictionaries $(\Psi^c_l, \Phi^c_l, \Phi_l)$ per scale $l$. Due to the lack of training data in the coarser scales, dictionaries are typically learned on the finer scales and then re-used in the coarsest scale.

5 Experiments

We assess our method on different crops, with dimensions of , taken from the digital acquisitions [6] of one double-sided panel of the Ghent Altarpiece (1432). An example X-ray image we aim to separate and the two corresponding visual images from each side of the panel are depicted in Fig. 1. We apply the multi-scale framework, where we use scales with parameters , , and . Dictionary triplets , each with dimension of , are trained for the first two layers and the dictionaries of the second layer are extrapolated to the third. We use patches from digital acquisitions of the single-sided panels of the altarpiece and set and .

To demonstrate the benefit of using side information, we compare our method against two configurations of MCA [10, 11]. In the first one we use the discrete wavelet and curvelet transforms on blocks of pixels [4]; the low-frequency content is divided between the two components. In the second configuration we use K-SVD to train two dictionaries: one on X-ray images depicting cloth and the other on images depicting faces, both being types of content also found in the X-ray mixtures. The K-SVD method is extended with our multi-scale strategy and the same parameters are used. As no ground truth data is available, we first resort to visual comparisons. The results, depicted in Fig. 2, show that MCA with fixed dictionaries can only separate based on morphological properties; for example, the wood grain of the panel is captured entirely by curvelets and not by the wavelets. It is, however, unsuited to separating painted content. MCA with K-SVD dictionaries is also unable to separate the X-ray content, as the dictionaries are not sufficiently discriminative. The results using our method show the benefit of incorporating side information in the separation problem. Towards a more objective comparison, we measure the structural similarity (SSIM) index [24] between the two separated components, where lower SSIM values indicate less similarity and hence better separation. The results on two additional X-ray scans from the same painting, reported in Table 1, confirm the better separation performance of our method, as indicated by the lowest SSIM values.

                   MCA fixed    MCA trained    Proposed
X-ray mixture 1    0.9249       0.7385         0.1681
X-ray mixture 2    0.9603       0.8341         0.6034

Table 1: Similarity scores (obtained with the SSIM [24] metric) between the separated components.
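For readers who want to reproduce this kind of score, the sketch below computes a simplified, single-window variant of SSIM over two whole images. It is an assumption-laden stand-in: the index of [24] averages the same expression over local windows, so values from this global version are not comparable with those in Table 1. The function name `global_ssim` is hypothetical; the constants 0.01 and 0.03 are the defaults given in [24].

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM between two images of identical shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

# Identical components score 1; unrelated components score close to 0.
rng = np.random.default_rng(0)
side1 = rng.random((64, 64))
side2 = rng.random((64, 64))
score_same = global_ssim(side1, side1)
score_diff = global_ssim(side1, side2)
```

Used as a separation measure, a score near 1 means that both outputs still contain essentially the same mixed content, whereas a low score means that the two outputs share little structure.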

6 Conclusion

We have proposed a novel sparsity-based regularization method for source separation guided by side information. Our method is based on a new multi-scale algorithm that learns dictionaries coupling multi-modal data. We apply the proposed method to separate X-ray images of paintings with content on both sides of their panel, where photographs of each side are used as side information. Experiments with real data from digital acquisitions of the Ghent Altarpiece (1432) demonstrate that our method outperforms the state-of-the-art MCA technique [10, 11, 15].


  • [1] L. van der Maaten and R.G. Erdmann, “Automatic thread-level canvas analysis: A machine-learning approach to analyzing the canvas of paintings,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 38–45, July 2015.
  • [2] N. van Noord, E. Hendriks, and E. Postma, “Toward discovery of the artist’s style: Learning to recognize artists by their artworks,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 46–54, July 2015.
  • [3] B. Cornelis, A. Dooms, J. Cornelis, and P. Schelkens, “Digital canvas removal in paintings,” Signal Process., vol. 92, no. 4, pp. 1166–1171, 2012.
  • [4] R. Yin, D. Dunson, B. Cornelis, B. Brown, N. Ocon, and I. Daubechies, “Digital cradle removal in X-ray images of art paintings,” in IEEE ICIP, 2014, pp. 4299–4303.
  • [5] B. Cornelis, T. Ružić, E. Gezels, A. Dooms, A. Pižurica, L. Platiša, J. Cornelis, M. Martens, M. De Mey, and I. Daubechies, “Crack detection and inpainting for virtual restoration of paintings: The case of the Ghent Altarpiece,” Signal Process., 2012.
  • [6] A. Pizurica, L. Platisa, T. Ruzic, B. Cornelis, A. Dooms, M. Martens, H. Dubois, B. Devolder, M. De Mey, and I. Daubechies, “Digital image processing of the Ghent Altarpiece: Supporting the painting’s study and conservation treatment,” IEEE Signal Process. Mag., vol. 32, no. 4, pp. 112–122, 2015.
  • [7] A. Hyvärinen, J. Karhunen, and E. Oja, Independent component analysis, vol. 46, John Wiley & Sons, 2004.
  • [8] P. Smaragdis, C. Févotte, G. Mysore, N. Mohammadiha, and M. Hoffman, “Static and dynamic source separation using nonnegative factorizations: A unified view,” IEEE Signal Process. Mag., vol. 31, no. 3, pp. 66–75, 2014.
  • [9] K. Kayabol, E. Kuruoğlu, and B. Sankur, “Bayesian separation of images modeled with MRFs using MCMC,” IEEE Trans. Image Process., vol. 18, no. 5, pp. 982–994, 2009.
  • [10] J. Bobin, J.-L. Starck, J. Fadili, and Y. Moudden, “Sparsity and morphological diversity in blind source separation,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2662–2674, 2007.
  • [11] M. Zibulevsky and B. Pearlmutter, “Blind source separation by sparse decomposition in a signal dictionary,” Neural computation, vol. 13, no. 4, pp. 863–882, 2001.
  • [12] K. Engan, S. O. Aase, and J. Hakon-Husoy, “Method of optimal directions for frame design,” in IEEE ICASSP, 1999, pp. 2443–2446.
  • [13] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
  • [14] J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
  • [15] V. Abolghasemi, S. Ferdowsi, and S. Sanei, “Blind separation of image sources via adaptive dictionary learning,” IEEE Trans. Image Process., vol. 21, no. 6, pp. 2921–2930, 2012.
  • [16] G. Monaci, P. Jost, P. Vandergheynst, B. Mailhe, S. Lesage, and R. Gribonval, “Learning multimodal dictionaries,” IEEE Trans. Image Process., vol. 16, no. 9, pp. 2272–2283, 2007.
  • [17] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467–3478, 2012.
  • [18] S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in IEEE CVPR, 2012, pp. 2216–2223.
  • [19] Y. Jia, M. Salzmann, and T. Darrell, “Factorized latent spaces with structured sparsity,” in Advances in Neural Information Processing Systems, 2010, pp. 982–990.
  • [20] N. Vaswani and W. Lu, “Modified-cs: Modifying compressive sensing for problems with partially known support,” IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4595–4607, 2010.
  • [21] J. F. C. Mota, N. Deligiannis, and M. R. D. Rodrigues, “Compressed sensing with prior information: Optimal strategies, geometry, and bounds,” arXiv preprint arXiv:1408.5250, 2014.
  • [22] J. F. C. Mota, N. Deligiannis, and M. R. D. Rodrigues, “Compressed sensing with side information: Geometrical interpretation and performance bounds,” in IEEE GlobalSIP, 2014, pp. 512–516.
  • [23] E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
  • [24] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.