Multi-Focus Image Fusion Via Coupled Sparse Representation and Dictionary Learning

by Rui Gao, et al.

We address the multi-focus image fusion problem, where multiple images captured with different focal settings are to be fused into an all-in-focus image of higher quality. Algorithms for this problem must account for the characteristics of the source images, including both focused and blurred features. However, most sparsity-based approaches use a single dictionary in the focused feature space to describe multi-focus images and ignore the representations in the blurred feature space. Here, we propose a multi-focus image fusion approach based on coupled sparse representation. The approach exploits the facts that (i) the patches in a given training set can be sparsely represented by a pair of overcomplete dictionaries related to the focused and blurred categories of images; and (ii) merging such representations leads to a more flexible and therefore better fusion strategy than one based on just selecting the sparsest representation in the original image estimate. By jointly learning the coupled dictionary, we enforce the similarity of sparse representations in the focused and blurred feature spaces, and then introduce a fusion approach that combines these representations to generate an all-in-focus image. We also discuss the advantages of the fusion approach based on coupled sparse representation and present an efficient algorithm for learning the coupled dictionary. Extensive experimental comparisons with state-of-the-art multi-focus image fusion algorithms validate the effectiveness of the proposed approach.




I Introduction

Over the last several decades, considerable attention has been given to the multi-focus image fusion problem [1, 2, 3, 4, 5]. Multi-focus image fusion is an effective post-processing technique for combining multiple images captured with different focal distances into an all-in-focus image, without sacrificing image quality, and at the same time without using specialized optic sensors [6, 7, 8]. The problem is of high importance in many fields, ranging from remote sensing to medical imaging [9, 10, 11, 12], especially for addressing the demand for cost minimization of optical sensors/cameras.

Looking at recent approaches, sparsity and overcompleteness have been successfully used for computational image fusion [13, 14, 15, 16, 17, 18, 19]. These methods exploit the fact that patches of natural images can be compactly represented using an overcomplete dictionary as a linear combination of only a few atoms. This means that the vector of weighting coefficients for the atoms is sparse.

Many image processing applications have benefited remarkably from the above approach with a single learned overcomplete dictionary. A coupled dictionary-based approach [20, 21, 22, 23, 24], however, has not been used before for multi-focus image fusion, except in our preliminary conference contribution [26]. Indeed, the aforementioned image fusion methods [13, 14, 15, 16, 17, 18] directly learn and exploit a single overcomplete dictionary in a single feature space in order to describe multiple images which contain both the focused and blurred categories of image features. Hence, these methods ignore the sparse representations in the blurred feature space and limit the sparsity of the vector of coefficients, which in turn limits the accuracy of fusing the coefficients. These disadvantages, which stem from the restriction to a single feature space, motivate us to perform fusion in two feature spaces. In particular, instead of learning a single overcomplete dictionary from the focused features only, this paper proposes learning two dictionaries over the focused and blurred feature spaces, and then using the pair of dictionaries to perform fusion via sparse representation in both spaces. In this way, we exploit the existing structure shared by all available multi-focus images, correlate the representations over the two feature spaces, and improve the fusion performance. We use the method proposed in our previous work [25] for learning two correlated dictionaries representing the focused and blurred feature spaces. Our approach, based on dictionaries learned from two feature spaces and fusion via sparse representations over both spaces, proves to be more accurate than traditional methods based on a single learned dictionary.

In this paper, we extend the coupled dictionary learning approach based on sparse and redundant representations to the problem of fusing multi-focus images. A coupled overcomplete dictionary is expected to lead to a more compact representation of the focused and blurred categories of images. A weighted max-$\ell_1$-norm strategy can then be used to find the focused patches needed to reconstruct an all-in-focus image.

The paper presents both algorithmic developments and simulation results for multi-focus image fusion. We formulate the multi-focus image fusion problem as a problem of obtaining an all-in-focus image from multi-focus input images based on their sparse representations over a coupled dictionary (of focused and blurred dictionaries). Moreover, we develop a fusion procedure that finds an accurate decision map based on the sparse representations of multi-focus source images, which are obtained using a coupled dictionary found by our novel coupled dictionary learning algorithm. The all-in-focus image is then reconstructed using such decision map and the source image patches.

There are major differences between the approach in this paper and that of the conference paper [26], including the way the sparse representation is obtained. An averaging operator is used for fusing sparse coefficients in [26], while here it is omitted since a coupled dictionary is used for fusion instead of two separate dictionaries. Also, the all-in-focus image is reconstructed here using the exact patches from the source images that are found to be in focus by our selection operator, instead of using their sparse representations as in [26].

We use bold capital letters for matrices, for example, we denote -th multi-focus and all-in-focus images as the following matrices of pixels and , respectively. Similarly, the matrices and denote the focus and blurred dictionaries, respectively. The operators and sets are denoted using calligraphic letters, for instance, represents the fusion operator.

Images are processed by patches, because adapting a dictionary to large images is impractical. Image patches are extracted using the sliding window technique (moving through images starting from the top-left corner to the bottom-right corner). Then the image patch of the size pixels e.g., is ordered lexicographically as a column vector and denoted as . For notational simplicity and without loss of generality, we hereafter drop the indices marking the patch position in an image, and we denote patches of -th multi-focus image and all-in-focus image just as and , respectively.
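The sliding-window extraction described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `extract_patches`, the `step` parameter, and the choice of column-major stacking for the lexicographic ordering are assumptions made for the example.

```python
import numpy as np

def extract_patches(image, patch_size=8, step=1):
    """Slide a patch_size x patch_size window over a 2-D image, from the
    top-left to the bottom-right corner, and return the vectorized patches
    as the columns of a matrix."""
    rows, cols = image.shape
    columns = []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patch = image[r:r + patch_size, c:c + patch_size]
            # Assumption: lexicographic ordering is taken as column stacking.
            columns.append(patch.flatten(order="F"))
    return np.stack(columns, axis=1)

# Toy 6 x 6 image with 4 x 4 patches and a one-pixel step.
image = np.arange(36, dtype=float).reshape(6, 6)
patches = extract_patches(image, patch_size=4, step=1)
print(patches.shape)  # (16, 9)
```

A step smaller than the patch size gives overlapping patches, which is the setting used later for suppressing artifacts on patch boundaries.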

The vector norm for is the standard -norm, and denotes the operator that counts the number of non-zero entries in a vector. For a matrix , we define the Frobenius norm as . The symbol represents element-wise product of matrices, denotes the forward finite difference operator on the vertical and horizontal directions, and stands for the transpose operation.

The remainder of the paper is organized as follows. Section II gives the problem description and summarizes some general assumptions and a solution approach. Section III gives a detailed explanation of our fusion procedure. Simulation results are provided in Section IV. Finally, we conclude the paper in Section V.

II Problem Description and Solution Approach

Consider the problem of constructing/reconstructing a high quality all-in-focus image from a set of multi-focus source images , which can be abstractly written in the form of the following fusing process


where stands for a fusing operator and is the noise. The fusion goal is to obtain from . It is assumed here for simplicity that each multi-focus image is captured for the same scene and all multi-focus images are properly aligned. (This assumption is typical in the literature focused on image fusion algorithm design, but proper alignment of images is also an important practical issue.) Another typical assumption is that at each position, one of the input image patches is (the most) in-focus. Hence, the problem is to find the most focused patch among all available corresponding patches extracted from the multi-focus inputs.

Fig. 1: Block-diagram of the procedure for reconstructing an all-in-focus image from the given set of multi-focus images via coupled sparse representations in focus and blurred spaces and coupled dictionary learning.

As a solution approach, Fig. 1 shows the block-diagram of the proposed procedure for constructing/reconstructing an all-in-focus image from a given set of multi-focus images . In this block-diagram, the block “Patch extraction” represents the above described simple process of extracting patches from the available multi-focus images . Because visible artifacts may occur on patch boundaries, overlapping patches that include pixels of neighboring patches are typically used to suppress such artifacts. The input to the block “Coupled dictionary learning” is a set of two available subsets of training patches, i.e., . Here and are the subsets of manually labeled (classified) in-focus and blurred training patches, which are extracted from corresponding parts of available multi-focus images. (Note that the method proposed in the paper can be straightforwardly extended to the case when image patches are labeled at different focal depths into more than two sets of patches, but then the main cost will be labeling the patches.) The output of the block “Coupled dictionary learning” is the coupled dictionary representing the in-focus and blurred image feature spaces. The method proposed in our previous work [25] for learning two correlated dictionaries representing focused and blurred feature spaces suits this purpose well. Indeed, the existing coupled dictionary learning algorithms have been developed for solving another inverse problem, namely super resolution, where the objective is to find an accurate transform from blurred inputs to unknown focused image patches.

The procedure of fusing multi-focus images is presented in Fig. 1 by three sub-blocks. The sub-block “Sparse approximation” represents the procedure of finding sparse representations of image patches over the coupled dictionary . The selection operation, represented by the sub-block “Selection”, finds the focused patches and the mask (a matrix containing the index of the in-focus patch at each position). Given the image patches and the mask, the in-focus patches are found and the initial all-in-focus image is constructed, as shown by the sub-block “Reconstruction of initial image ”.

The averaging nature of such all-in-focus image reconstruction from its overlapping patches may cause some blurring, mostly around the edges, where each of the source images is focused on one side and is blurred on the other side. Indeed, the patches that cross the focus boundaries and scattered noise in the decision map introduce some blurredness around those edges and fade the small details. Thus, a global reconstruction may need to be performed optionally in order to restore the contrast resolution of the initial estimation .

III Fusion via Coupled Sparse Representation

Using the coupled dictionary learning method in [25] the two dictionaries and , which are better representatives for, respectively, the focused and blurred feature spaces, can be obtained. Considering the characteristics of learned dictionaries and , we can infer that for the residuals and , representing a pair of corresponding focused and blurred features, the inequalities


hold. We next propose a greedy method to find the sparse representations representing corresponding patches over the coupled dictionary (horizontal concatenation of and ), knowing that the greedy methods solve the problem


to find the best matching atom approximating the residual at iteration . Then based on (2) and (3), we can instantly deduce that


In this section, a fusion approach based on (4) and using a coupled learned dictionary is proposed.

III-A Proposed Local Fusion Method

The local fusion operation as presented in Fig. 1 consists of three operations, namely, the sparse approximation, selection, and mask generation. Then the proposed local fusion approach can be mathematically formulated as


where is the mask operator, is the selection operator, and is the sparse approximation operator.

The operator can be formulated as the following -norm minimization problem


Note that problem (6) can be efficiently solved by many existing greedy methods, e.g., conventional orthogonal matching pursuit (OMP) algorithm [41].
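To make the greedy solution concrete, the following is a minimal NumPy sketch of an error-constrained OMP over a coupled dictionary. It assumes that problem (6) seeks the sparsest coefficient vector whose squared approximation error stays below a tolerance; the function name `omp`, the stopping parameters, and the random dictionaries standing in for the learned focused and blurred dictionaries are illustrative assumptions only.

```python
import numpy as np

def omp(D, x, tol=1e-10, max_atoms=None):
    """Greedy sparse approximation of x over the columns (atoms) of D.

    Stops when the squared residual norm drops to `tol` or below, or when
    `max_atoms` atoms have been selected.
    """
    n_features, n_atoms = D.shape
    if max_atoms is None:
        max_atoms = min(n_features, n_atoms)
    alpha = np.zeros(n_atoms)
    support = []
    coeffs = np.zeros(0)
    residual = x.astype(float).copy()
    while len(support) < max_atoms and residual @ residual > tol:
        # Pick the atom that best matches the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    if support:
        alpha[support] = coeffs
    return alpha

# Hypothetical coupled dictionary: horizontal concatenation of two
# random stand-ins for the focused and blurred dictionaries.
rng = np.random.default_rng(0)
D_focus = rng.standard_normal((16, 32))
D_blur = rng.standard_normal((16, 32))
D = np.hstack([D_focus, D_blur])
D /= np.linalg.norm(D, axis=0, keepdims=True)

x = 2.0 * D[:, 3] - 1.5 * D[:, 40]  # patch built from one atom of each part
alpha = omp(D, x)
print(np.count_nonzero(alpha), np.linalg.norm(x - D @ alpha))
```

The first half of `alpha` then holds the coefficients over the focused atoms and the second half those over the blurred atoms, which is the segmentation used by the selection step.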

To deal with the effect of luminance, the mean intensities are removed from all image patches before sparse approximation. Moreover, all vectors in , and are normalized to have Frobenius norm of one. After obtaining each vector of sparse representation for the corresponding patch , the selection operator is applied in accordance with (5) to find the sparse vector of coefficients that represents the most focused image patch .
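The preprocessing step can be sketched as below. This is a hedged illustration: the function name `remove_mean_and_normalize` and the small constant `eps`, which guards against division by zero for perfectly flat patches, are assumptions of the example and do not appear in the paper. For a column vector the Frobenius norm coincides with the Euclidean norm, which is what the code computes.

```python
import numpy as np

def remove_mean_and_normalize(X, eps=1e-12):
    """Remove the mean intensity of every patch (column of X) and scale
    each centered patch to unit norm."""
    means = X.mean(axis=0, keepdims=True)
    centered = X - means
    norms = np.linalg.norm(centered, axis=0, keepdims=True)
    # A flat patch has zero norm after centering; eps keeps it at zero.
    normalized = centered / np.maximum(norms, eps)
    return normalized, means, norms

# Two vectorized patches as columns: one textured, one perfectly flat.
X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0],
              [6.0, 5.0]])
Z, means, norms = remove_mean_and_normalize(X)
print(Z.mean(axis=0), np.linalg.norm(Z, axis=0))
```

Returning the means and norms keeps the sketch reversible, although the fusion described here reuses the original source patches for reconstruction.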

According to the max-$\ell_1$-norm rule [16], the most focused sparse vector of coefficients from the set of vectors is the vector with the highest activity level, where the activity level is measured by the $\ell_1$-norm. In the coupled dictionary framework, based on (4), we extend the max-$\ell_1$-norm rule to the following weighted $\ell_1$-norm, where the weight for the $\ell_1$-norm of coefficients that correspond to the focused dictionary is greater than the weight for the $\ell_1$-norm of coefficients that correspond to the blurred dictionary. Thus, the selection operator finds the focused patch and its corresponding index by solving the following problem


where and are the weights corresponding to the focused and blurred subspaces, respectively, , and and denote two segments of that correspond to and . (When there are more than two learned dictionaries, the proposed selection operator can be extended by segmenting based on the number of dictionaries and weighting those segments in proportion to the focus level of the dictionaries.)
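The selection rule can be illustrated with a short sketch. It assumes that the two weights sum to one, so that a single parameter `w_focus` above 0.5 expresses the preference for focused atoms; the default of 0.55 is only an example value inside the range examined in Section IV-D, and the function name `select_focused` is hypothetical.

```python
import numpy as np

def select_focused(alphas, n_focus_atoms, w_focus=0.55):
    """Pick the source whose sparse code has the largest weighted l1 activity.

    alphas: (2K, N) array; column i is the sparse code of the patch from
            source image i over the coupled dictionary. The first
            n_focus_atoms rows belong to focused atoms, the rest to blurred.
    """
    w_blur = 1.0 - w_focus  # assumption: the weights sum to one
    focus_l1 = np.abs(alphas[:n_focus_atoms, :]).sum(axis=0)
    blur_l1 = np.abs(alphas[n_focus_atoms:, :]).sum(axis=0)
    activity = w_focus * focus_l1 + w_blur * blur_l1
    return int(np.argmax(activity)), activity

# Two candidate patches coded over 3 focused and 3 blurred atoms.
alphas = np.array([
    [1.0, 0.2],   # focused atoms
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],   # blurred atoms
    [0.0, 1.4],
    [0.5, 0.0],
])
best, activity = select_focused(alphas, n_focus_atoms=3)
plain_l1_choice = int(np.argmax(np.abs(alphas).sum(axis=0)))
print(best, plain_l1_choice)  # 0 1
```

In this toy case the unweighted rule would pick the second patch because of its large coefficient on a blurred atom, while the weighted rule prefers the first patch, whose activity comes mainly from focused atoms.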

Then, the all-in-focus patch is found by applying the linear mask operator as follows


Going across the whole image, all in-focus image patches are found separately one by one, then by placing them at their positions and averaging the overlapping pixels, the initial approximation of all-in-focus image is reconstructed.
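The placing-and-averaging step can be sketched as follows. The sketch assumes the same scan order and column-major patch vectorization as a simple sliding-window extraction; the function name `reconstruct_from_patches` and the `step` parameter are assumptions for illustration.

```python
import numpy as np

def reconstruct_from_patches(patches, image_shape, patch_size, step=1):
    """Place vectorized patches back at their positions and average the
    pixels covered by more than one patch."""
    rows, cols = image_shape
    accum = np.zeros(image_shape)
    counts = np.zeros(image_shape)
    idx = 0
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patch = patches[:, idx].reshape(patch_size, patch_size, order="F")
            accum[r:r + patch_size, c:c + patch_size] += patch
            counts[r:r + patch_size, c:c + patch_size] += 1
            idx += 1
    # Pixels never covered (possible when step > 1) are left at zero.
    return accum / np.maximum(counts, 1)

# Round trip on a toy image: extract overlapping patches, then rebuild.
image = np.arange(25, dtype=float).reshape(5, 5)
p = 3
patches = np.stack(
    [image[r:r + p, c:c + p].flatten(order="F")
     for r in range(5 - p + 1) for c in range(5 - p + 1)],
    axis=1,
)
restored = reconstruct_from_patches(patches, image.shape, p)
print(np.allclose(restored, image))  # True
```

When the patches come from different source images, as selected by the mask, the averaged overlaps no longer agree exactly, which is the source of the mild blurring near focus boundaries discussed in Section II.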

III-B Global Reconstruction

A global reconstruction, as optionally suggested at the end of Section II, can be achieved by applying total variation (TV) regularization, which is commonly used in natural image analysis. Thus, placing the TV prior on the magnitude of the image gradients, the global reconstruction problem can be written as


where takes the form of TV, denotes the discretization of the gradient for -th element, defined as with linear operators and representing finite difference approximations of the first-order horizontal and vertical partial derivatives [30]. Similar to the approaches in [30, 31, 32], optimization problem (9) can be efficiently solved by the alternating directions method of multipliers (ADMM) [33, 35], which decomposes a large scale global problem into a series of smaller local subproblems. The resulting after applying global reconstruction is then taken as the final estimate of the all-in-focus image.
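As an illustration of the regularizer only, the TV of an image can be evaluated with forward finite differences as sketched below. The isotropic form (the Euclidean norm of the horizontal and vertical differences at each pixel) and the zero difference at the last row and column are assumptions of this example; the sketch does not implement the ADMM solver.

```python
import numpy as np

def total_variation(image):
    """Isotropic total variation using forward finite differences.

    Assumption: differences at the last column/row are set to zero.
    """
    img = np.asarray(image, dtype=float)
    dh = np.zeros_like(img)
    dv = np.zeros_like(img)
    dh[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal differences
    dv[:-1, :] = img[1:, :] - img[:-1, :]   # vertical differences
    return float(np.sqrt(dh ** 2 + dv ** 2).sum())

flat = np.ones((4, 4))
step_edge = np.zeros((4, 4))
step_edge[:, 2:] = 1.0                      # one vertical edge of height 1
print(total_variation(flat), total_variation(step_edge))  # 0.0 4.0
```

A sharp edge contributes its height once per row, so TV penalizes scattered noise strongly while preserving a small number of clean edges.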

III-C Algorithm of Fusion via Coupled Sparse Representation

In summary, when the underlying dictionaries and are known, the fusion via local sparse representations is first calculated. Then the global reconstruction is employed for enhancing the contrast resolution of the reconstructed all-in-focus image. The overall algorithm for multi-focus image fusion via sparse representation is summarized as Algorithm 1.

Algorithm 1 Image fusion via sparse representation.
Input: Multi-focus source images and learned coupled dictionary
1:  Obtain vectorized image patches ;
2:  Remove the mean intensities and normalize all image patches as: , , ;
3:  for each set of :
4:   Find by solving (6);
5:   Find by applying the selection operator (7);
6:   Find the in-focus patch using (8);
7:  end for
8:  Form the initial estimate of all-in-focus image ;
9:  (Optional) Perform the global reconstruction using (9).
Output: The all-in-focus image .

IV Experimental Results

IV-A Experimental Setup

In this section, we evaluate the proposed approach for image fusion and compare it, both visually and quantitatively, to some existing state-of-the-art approaches. We also discuss various factors that influence the performance.

The quantitative assessments are based on two non-reference-based image fusion performance metrics, namely, normalized mutual information (NMI) and  [37], and two reference-based metrics, namely, structural similarity index (SSIM) [39] and mean square error (MSE).

Fig. 2: Learning data [29]: The red and green rectangles show parts of images used as blurred and focused learning data, respectively.

The proposed approach is compared to the following existing state-of-the-art multi-focus image fusion algorithms: discrete wavelet transform-based image fusion approach (DWT) [34], sparse representation “choose-max”-based image fusion approach (SR-CM) [16], sparse representation “choose-max”-based image fusion via trained dictionary using K-SVD (SR-KSVD), multi-focus image fusion based on principal component analysis (PCA) [40], multi-focus image fusion using dictionary-based sparse representation of focus measures (SR-FM) [17], and multi-focus image fusion using dense SIFT (DSIFT) [38].

Fig. 3: Input 1: gray-scale multi-focus images [17]: (a) Clocks, (b) Lab, (c) Pepsi, (d) Disk, (e) Jug, and (f) Doll.
Fig. 4: Input 2: triple color multi-focus images [29]: (a) Diver, (b) Keyboard, (c) Folders, and (d) Seals.
Fig. 5: Visual comparison between separately learned dictionaries: (a) and (b) , and coupled dictionary: (c) and (d) .

Throughout all experiments, the parameters used in the methods are set as follows. For the DWT method, the source images are decomposed to levels and the wavelet basis “db1” is applied. In DSIFT, an orientation histogram with 8 bins is used for quantizing the gradient information, and the feature vector is of . In the implementation of the SR-FM method, the Laplacian energy is calculated as the local focus measure and max-pooling is used for feature aggregation. Also, the reconstruction phase is performed using overlapping patches, as the segmentation method applied to the decision map in [17] is not the focus of our work. All methods are assessed without applying any post-processing technique, since the problem of refining the decision map is independent of the fusion method.

For fair comparison, all sparsity based algorithms are implemented using the same patch and dictionary size of and ( for coupled), pixel overlap between neighboring patches and the tolerance error of . In addition, for dictionary learning, we execute 10 multiple dictionary update cycles.

For the proposed method, the visual results before and after global reconstruction are given separately. For global reconstruction, the ADMM algorithm of [27] is used with the regularization parameter , updating parameter , internal parameter .

The learning data includes 30,000 pairs of patches taken from the image parts indicated by rectangles in Fig. 2. The images used for learning are taken from the Lytro dataset [29]. One coupled dictionary is learned and used for all experiments.

The input data includes six pairs of gray-scale multi-focus images (see Fig. 3) taken from the standard multi-focus dataset [17] and four triple series (see Fig. 4) of color multi-focus images taken from the Lytro dataset. For the gray-scale inputs, the available reference images in dataset [17] are used as perfectly fused all-in-focus images for measuring MSE and SSIM. The size of gray-scale image pairs Doll, Clocks and Pepsi is , Lab and Disk are and Jug is . The color images are of size . All experiments are performed on a PC running an Intel(R) Xeon(R) 3.40GHz CPU.

IV-B Coupled versus Separately Learned Dictionaries

The coupled dictionary used in the experiments and two dictionaries separately learned over the same focused and blurred learning data using K-SVD are visualized in Fig. 5 for comparison.

Fig. 6: Comparison of the fusion performance of the proposed method over the coupled dictionary and over separately learned dictionaries.
Fig. 7: Comparing masks obtained using (c) single dictionary (SR-KSVD) and (d) coupled dictionary from source images in (a) and (b). The images are taken from the Lytro multi-focus dataset [29].
Fig. 8: Fusion result for multi-focus images “Clocks”, obtained by methods: DWT (a), PCA (b), SR-FM (c), DSIFT (d), SR-CM (e), SR-KSVD (f), proposed method (g), and the proposed method after global reconstruction (h).

The pairwise correlations between the atoms of and (see Figs. 5.(c) and (d)) are obtained by enforcing identical sparse representations through the dictionary learning [25]. These pairwise correlations ensure that and represent corresponding focused and blurred features.

Fig. 9: Fusion result for multi-focus images “Doll”, in the same order as Fig. 8.
Fig. 10: Fusion result for multi-focus images “Pepsi”, in the same order as Fig. 8.

Note that when and are learned separately, in accordance with the max-$\ell_1$-norm rule [16], the sparse representations of blurred data are sparser, so they contain fewer non-zero entries of larger amplitude (the sizes of focused/blurred patches are normalized). Thus, the correlations between the atoms of and their corresponding blurred features are larger than those between the atoms of and their corresponding focused features.

To empirically demonstrate the effectiveness of using coupled dictionary learning instead of a single dictionary, we show in Fig. 6 the NMI and results of fusion using coupled and separately learned dictionaries for all images in the gray-scale dataset. The figure clearly shows that the results obtained using the coupled dictionary are superior in all cases.

Moreover, the masks obtained using the proposed and SR-KSVD methods are compared in Fig. 7. It can be seen in the mask produced by the SR-KSVD method (see Fig. 7(c)) that the excessive bias toward selecting the patch with the largest $\ell_1$-norm as the focused one leads to wrong decisions around edges where sub-blurred and sub-focused patches need to be fused. This is because using one dictionary that only represents focused features limits the sparsity of the representations of sub-blurred patches and leads to a larger $\ell_1$-norm for those sparse representations.

However, the proposed method approximates all patches over both the focused and blurred feature spaces ( and ); then, using a weighted $\ell_1$-norm, it finds the image patch with the highest contribution from the focused features (the atoms of ) in its sparse approximation as the most focused. Moreover, by jointly learning and , two balanced models of the two feature spaces are obtained. Thus, the accuracy of the fusion operation is improved. It can be seen from the mask obtained by the proposed method (see Fig. 7(d)) that the excessive error has been reduced to a high degree.

IV-C Comparison Results

The proposed method can be used for fusion of multifocus image sets , where can be any number. The fusion rule (7) will take one patch as the most focused at each position to reconstruct the all-in-focus image. Here, the experiments are performed for double and triple input series ( and ). For color images, the mask is obtained for gray-scale version of the input images and then it is used for fusion of each of the three layers in RGB (red, green and blue) format.
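The handling of color inputs can be sketched as below. For brevity the sketch works with a per-pixel index map rather than the patch-level mask with overlap averaging used by the method, and the function name `fuse_color_with_map` is hypothetical; it only illustrates how one decision map computed on gray-scale versions is reused for all three RGB layers.

```python
import numpy as np

def fuse_color_with_map(color_images, index_map):
    """Copy every pixel (all three RGB layers at once) from the source
    image named by index_map at that position.

    color_images: list of (H, W, 3) arrays, one per source image.
    index_map:    (H, W) integer array with values in 0..len(color_images)-1.
    """
    fused = np.zeros_like(color_images[0])
    for k, img in enumerate(color_images):
        chosen = index_map == k   # boolean (H, W) mask for source k
        fused[chosen] = img[chosen]
    return fused

# Three toy 2 x 2 color images with constant intensities 10, 20 and 30.
sources = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (10, 20, 30)]
index_map = np.array([[0, 1],
                      [2, 0]])
fused = fuse_color_with_map(sources, index_map)
print(fused[:, :, 0])
```

Because the same map is applied to the red, green and blue layers, no color shift is introduced at positions where the decision switches between sources.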

Fig. 11: Fusion result for multi-focus images “Diver”: (a) SR-KSVD, (b) proposed.
Fig. 12: Fusion result for multi-focus images “Keyboard”: (a) SR-KSVD, (b) proposed.
Methods      Measures   Clocks    Lab       Pepsi     Disk      Jug       Doll
DWT [34]     NMI        0.9847    1.0027    1.0079    0.8129    0.8497    0.8553
             [37]       0.6600    0.5487    0.6587    0.5102    0.5048    0.6184
             SSIM       0.9403    0.9372    0.9362    0.9068    0.8871    0.9211
             MSE        32.2172   60.1514   43.2703   94.3113   51.6737   46.3669
PCA [40]     NMI        1.0276    1.0270    1.0610    0.8372    0.8854    0.8965
             [37]       0.6939    0.5651    0.6752    0.5352    0.5083    0.6355
             SSIM       0.9572    0.9468    0.9351    0.9226    0.9048    0.9418
             MSE        24.8221   54.4139   29.4576   80.9688   45.1700   36.2968
DSIFT [38]   NMI        1.0015    1.0726    0.9943    0.9217    0.8836    0.8921
             [37]       0.7017    0.7269    0.7243    0.6938    0.7714    0.7416
             SSIM       0.8649    0.9003    0.8955    0.8779    0.9661    0.9817
             MSE        32.1295   17.55     19.1050   45.9702   4.3322    6.7304
SR-FM [17]   NMI        1.1100    1.0573    1.1764    0.8878    0.9490    1.0935
             [37]       0.7462    0.6900    0.7577    0.6380    0.7174    0.7380
             SSIM       0.9451    0.8153    0.9296    0.8325    0.9490    0.9862
             MSE        5.5989    12.0835   5.8016    30.7394   19.3786   7.4314
SR-CM [16]   NMI        1.1188    1.1079    1.1063    0.9460    1.0630    1.0547
             [37]       0.7301    0.7058    0.7290    0.7052    0.7656    0.7402
             SSIM       0.8813    0.7843    0.8229    0.8367    0.9609    0.9817
             MSE        1.8879    7.4700    3.9962    11.0090   3.3700    3.3617
SR-KSVD      NMI        1.1658    1.1235    1.1685    0.9821    1.1417    1.0517
             [37]       0.7557    0.7295    0.7613    0.7206    0.7766    0.7454
             SSIM       0.9527    0.8400    0.9258    0.8667    0.9925    0.9888
             MSE        1.6457    7.7026    3.6903    11.0777   3.2798    3.4843
Proposed     NMI        1.1833    1.1733    1.1803    1.0176    1.1513    1.1475
             [37]       0.7578    0.7340    0.7678    0.7247    0.7786    0.7490
             SSIM       0.9565    0.8214    0.9464    0.8716    0.9926    0.9928
             MSE        1.3048    5.6800    2.8415    6.3516    1.7500    2.9864
TABLE I: Objective evaluation of fusion performance for input dataset 1. Results are ranked by colors as follows. Red is the best, blue is the second best, and green, the third best.

Methods      Measures   Diver     Keyboard  Folders   Seals
SR-KSVD                 0.5882    0.5820    0.6123    0.6337
                        1.3522    1.0798    1.4685    1.4206
Proposed                0.6247    0.6207    0.6544    0.6825
                        1.4644    1.1831    1.5984    1.5103
TABLE II: Objective evaluation of fusion performance for input dataset 2. Best results are shown in bold.

Fig. 13: Fusion performance vs. for and .

Fig. 14: Fusion performance vs. for .

Fig. 15: Fusion performance vs. for .

The representative fusion results for three pairs of images: Clocks, Doll, and Pepsi, and the two triple series: Diver and Keyboard, are shown in Figs. 8-12, respectively. All the figures also include the magnified details.

Visually inspecting the results from the gray-scale dataset, we can see that the DWT method results in blocking artifacts, and that fusion using the PCA method shows an excessive blurring effect in all three cases, although the images are relatively smooth. Although the fused images produced using the DSIFT method have high contrast resolution in all cases, the misaligned decision map results in blocking artifacts that considerably reduce the visual quality of the images. These blocking artifacts are less visible in Figs. 9 and 10 compared to Fig. 8, which can be explained by the low robustness of the DSIFT method to different levels of blurredness in multi-focus source images.

The other four methods, namely SR-FM, SR-CM, SR-KSVD, and the proposed method, give smooth fused images. Looking more carefully at the details and magnified parts (quantitative evaluations are also given later), it can be seen, however, that the proposed method yields the best results. The effectiveness of the proposed method is more visible in the fusion results for the triple series, where the inputs are larger and more diverse in terms of focus level. We compared the proposed method to its closest competitor, SR-KSVD. It can clearly be seen from the magnified details that the proposed method performs better. For example, in the fusion results for the image Diver (see Fig. 11), the cloud in the red rectangle and the trace of water in the green rectangle, and in the results for Keyboard (see Fig. 12) the hair strand in the green rectangle, are only visible in the results obtained by the proposed method.

It can also be seen in all three visualized gray-scale cases that the image resulting from applying ADMM-based global reconstruction to the fused image is significantly better than the image before global reconstruction in terms of contrast resolution and visibility of details.

In addition to the visual comparison, Tables I and II summarize the quantitative evaluations for the methods tested on the datasets in Figs. 3 and 4. It can be seen from both tables that the proposed approach produces the best quantitative results in terms of and NMI in all cases, which means that it reduces the blocking artifacts and artificial distortions and combines significant edge information into the fused images, while showing the highest fidelity in preserving the pixel intensities of the source images. The lowest MSE values obtained in all gray-scale experiments mean that the results are the closest to the reference images, which shows the high accuracy of the proposed selection operator.

IV-D Effects of Main Parameters

The following three main parameters influence the fusion performance: patch size , tolerance error , and weight parameter in (7). To study the effects of these parameters, we run the proposed method on the whole gray-scale dataset and average the results for NMI and . Fig. 13 shows the averaged results for weighting parameter . It can be seen that the best results are achieved for between 0.54 and 0.56. The effect of different tolerance error values on fusion performance is shown in Fig. 14. As can be seen, the best results are attained for , while NMI increases for larger values. This observation shows that larger values of decrease the scattered error and so lead to better NMI results; however, they increase the bias (uniformly wrong selected areas in the decision map) and reduce the . Fig. 15 shows that the fusion performance improves slightly with increasing patch size. However, computation over larger patches increases the run time and computational costs, thus we run all other experiments using .

V Conclusion

We have proposed a fusion algorithm for combining multiple images with different focal settings into one all-in-focus image. We first have formalized the physical process of capturing multi-focus images, and then developed a basic model based on the idea of sparse representation of the all-in-focus image using a coupled overcomplete dictionary. This approach is straightforwardly extendable to the case of multiple (more than two) dictionaries. Using the coupled dictionary from the focused and blurred feature spaces, we have developed an efficient and accurate fusing approach, and have demonstrated that the proposed approach preserves the edge and structural information of the source images well; drastically reduces the blocking artifacts, circle blurring, and artificial distortions; and shows in general better results than the existing fusion methods, including state-of-the-art methods.

VI Acknowledgment

The authors would like to thank Rui Gao for performing some preliminary studies that finally led to this paper, although the particular results of this paper did not follow from her preliminary studies.


  • [1] T. Wan, C. Zhu, and Z. Qin, “Multifocus Image Fusion Based on Robust Principal Component Analysis,” Pattern Recognit. Lett., vol. 34, no. 9, pp. 1001–1008, Jul. 2013.
  • [2] Y. Liu, S. Liu, and Z. Wang, “Multi-focus Image Fusion With Dense SIFT,” Inf. Fusion, vol. 23, pp. 139–155, May 2015.
  • [3] S. Pertuz, D. Puig, M. A. Garcia, and A. Fusiello, “Generation of All-in-Focus Images by Noise-robust Selective Fusion of Limited Depth-of-field Images,” IEEE Trans. Image Process., vol. 22, no. 3, pp. 1242–1251, Mar. 2013.
  • [4] J. Tian and L. Chen, “Multi-focus Image Fusion Using Wavelet-Domain Statistics,” in Proc. IEEE Int. Conf. Image Process., Hong Kong, 2010, pp. 1205–1208.
  • [5] J. Tian, L. Chen, L. Ma, and W. Yu, “Multi-focus image fusion using a bilateral gradient-based sharpness criterion,” Opt. Commun., vol. 284, no. 1, pp. 80–87, Jan. 2011.
  • [6] M. Subbarao, T. Choi, and A. Nikzad, “Focusing techniques,” Opt. Eng., vol. 32, pp. 2824–2836, Mar. 1993.
  • [7] M. Born and E. Wolf, Principles of Optics. Cambridge Univ. Press, 1999.
  • [8] Q. Zhang and B. L. Guo, “Multifocus image fusion using the nonsubsampled contourlet transform,” Signal Process., vol. 89, pp. 1334–1346, Jul. 2009.
  • [9] F. Nencini, A. Garzelli, S. Baronti, and L. Alparone, “Remote sensing image fusion using the curvelet transform,” Inf. Fusion, vol. 8, no. 2, pp. 143–156, Apr. 2007.
  • [10] G. Pajares and J. Cruz, “A wavelet-based image fusion tutorial,” Pattern Recognit., vol. 37, no. 9, pp. 1855–1872, Sep. 2004.
  • [11] O. Rockinger, “Image sequence fusion using a shift-invariant wavelet transform,” in Proc. IEEE Int. Conf. Image Process., Santa Barbara, CA, 1997, pp. 288–291.
  • [12] V. D. Calhoun and T. Adali, “Feature-based fusion of medical imaging data,” IEEE Trans. Inf. Technol. Biomedicine, vol. 13, no. 5, pp. 711–720, Sep. 2009.
  • [13] T. Wan, N. Canagarajah, and A. Achim, “Compressive image fusion,” in Proc. IEEE Int. Conf. Image Process., San Diego, CA, 2008, pp. 1308–1311.
  • [14] T. Wan, Z. Qin, C. Zhu, and R. Liao, “A robust fusion scheme for multifocus images using sparse features,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., British Columbia, Canada, 2013, pp. 1957–1961.
  • [15] H. Li, L. Li, and J. Zhang, “Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering,” Opt. Commun., vol. 342, pp. 1–11, May 2015.
  • [16] B. Yang and S. Li, “Multifocus image fusion and restoration with sparse representation,” IEEE Trans. Instrum. Meas., vol. 59, no. 4, pp. 884–892, Apr. 2010.
  • [17] M. Nejati, S. Samavi, and S. Hirani, “Multi-focus image fusion using dictionary-based sparse representation,” Inf. Fusion, vol. 25, pp. 72–84, Sep. 2015.
  • [18] Q. Zhang and M. D. Levine, “Robust multi-focus image fusion using multi-task sparse representation and spatial context,” IEEE Trans. Image Process., vol. 25, no. 5, pp. 2045–2058, Mar. 2016.
  • [19] R. Gao, S. A. Vorobyov, and H. Zhao, “Image fusion with cosparse analysis operator,” IEEE Signal Process. Lett., vol. 24, no. 7, pp. 943–947, Jul. 2017.
  • [20] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
  • [21] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. Huang, “Coupled dictionary training for image super-resolution,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467–3478, Aug. 2012.
  • [22] J. Sadasivan, S. Mukherjee, and C. S. Seelamantula, “Joint dictionary training for bandwidth extension of speech signals,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., Shanghai, China, 2016, pp. 5925–5929.
  • [23] S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Rhode Island, USA, 2012, pp. 2216–2223.
  • [24] T. Peleg and M. Elad, “A statistical prediction model based on sparse representations for single image super-resolution,” IEEE Trans. Image Process., vol. 23, no. 6, pp. 2569–2582, Jun. 2014.
  • [25] F. G. Veshki and S. A. Vorobyov, “A Fast Dictionary Learning Method for Coupled Feature Space Learning,” arXiv preprint arXiv:1904.06968, Apr. 2019.
  • [26] R. Gao, S. A. Vorobyov, and H. Zhao, “Multi-focus image fusion via coupled dictionary training,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process., Shanghai, China, 2016, pp. 1666–1670.
  • [27] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-Play ADMM for image restoration: Fixed point convergence and applications,” IEEE Trans. Comput. Imag., vol. 3, pp. 84–98, Nov. 2016.
  • [28] R. Rubinstein, M. Zibulevsky, and M. Elad, “Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit,” CS Technion, vol. 40, no. 8, pp. 1–15, Apr. 2008.
  • [29] ¡¿.
  • [30] A. Lanza, S. Morigi, and F. Sgallari, “Convex image denoising via non-convex regularization,” in Scale Sp. Var. Methods Comput. Vis., vol. 9087, Springer, 2015, pp. 666–677.
  • [31] M. Nikolova, S. Esedoglu, and T. F. Chan, “Algorithms for finding global minimizers of image segmentation and denoising models,” SIAM J. Appl. Math., vol. 66, no. 5, pp. 1632–1648, Jun. 2006.
  • [32] A. Parekh and I. W. Selesnick, “Convex denoising using non-convex tight frame regularization,” IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1786–1790, Oct. 2015.
  • [33] J. Eckstein and D. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Math. Program., vol. 55, no. 3, pp. 293–318, Nov. 1992.
  • [34] J. Tian and L. Chen, “Adaptive multi-focus image fusion using a wavelet-based statistical sharpness measure,” Signal Process., vol. 92, no. 9, pp. 2137–2146, Sep. 2012.
  • [35] S. Xie and S. Rahardja, “Alternating direction method for balanced image restoration,” IEEE Trans. Image Process., vol. 21, no. 11, pp. 4557–4567, Nov. 2012.
  • [36] L. N. Smith and M. Elad, “Improving dictionary learning: multiple dictionary updates and coefficient reuse,” IEEE Signal Process. Lett., vol. 20, no. 1, pp. 79–82, Jan. 2013.
  • [37] C. Xydeas and V. Petrović, “Objective image fusion performance measure,” Electron. Lett., vol. 36, no. 4, pp. 308–309, Feb. 2000.
  • [38] Y. Liu, S. Liu, and Z. Wang, “Multi-focus image fusion with dense SIFT,” Inf. Fusion, vol. 23, pp. 139–155, May 2015.
  • [39] Z. Wang, A. Bovik, E. Simoncelli, and H. Sheikh, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, pp. 600–612, Apr. 2004.
  • [40] U. Patil and U. Mudengudi, “Image fusion using hierarchical PCA,” in Proc. IEEE Int. Conf. Image Inf. Process., Shimla, India, 2011, pp. 1–6.
  • [41] J. A. Tropp and A. C. Gilbert, “Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.