Image fusion combines information from different images into a single image that is more informative and better suited to subsequent processing. Figure 1 shows an example in which both the target and more details of the scene can be seen in the fused image. Many image fusion algorithms have been proposed; based on the level at which fusion is performed, they can generally be divided into pixel-level, feature-level, and decision-level approaches. Image fusion can also be performed in either the spatial domain or the transform domain. Based on application area, image fusion technology can be grouped into several types, namely medical image fusion [11, 39], multi-focus image fusion [36, 23, 41], remote sensing image fusion , multi-exposure image fusion [30, 32], and visible and infrared image fusion . Among these types, visible and infrared image fusion is one of the most frequently used, because it can be applied in many tasks, for instance object tracking [42, 43, 15], object detection , and biometric recognition [13, 1].
|Name||Image/Video pairs||Image type||Resolution||Year||Results||Code library|
|OSU Color-Thermal Database||6 video pairs||RGB, Infrared||320 × 240||2005||No||No|
|TNO||63 image pairs||multispectral||Various||2014||No||No|
|VLIRVDIF||24 video pairs||RGB, Infrared||720 × 480||2019||No||No|
|VIFB||21 image pairs||RGB, Infrared||Various||2020||Yes||Yes|
However, current research on visible and infrared image fusion suffers from several problems that severely hinder the development of the field. First, there is no well-recognized visible and infrared image fusion dataset on which performance can be compared under the same standard. It is therefore common for different images to be used in experiments across the literature, which makes it difficult to compare the performance of different algorithms. Second, although the source code of many image fusion algorithms has been made publicly available, for example FusionGAN  and DenseFuse , the input and output formats of most algorithms differ, which makes large-scale performance evaluation inconvenient. Third, it is crucial to evaluate state-of-the-art fusion algorithms to demonstrate their strengths and weaknesses and to help identify future research directions for designing more robust algorithms. However, although many evaluation metrics have been proposed for fused images, none of them is better than all the others. As a result, researchers normally choose the few metrics that support their methods, which further complicates objective comparison.
To solve these issues, in this work we build a visible and infrared image fusion benchmark (VIFB) that includes 21 pairs of visible and infrared images, 20 publicly available fusion algorithms and 13 evaluation metrics to facilitate the evaluation task.
The main contributions of this paper lie in the following aspects:
Dataset. We created a dataset containing 21 pairs of visible and infrared images. These image pairs cover a wide range of environments and conditions, such as indoor, outdoor, low illumination, and over-exposure. Therefore, the dataset is able to test the generalization ability of fusion algorithms.
Code library. We collected 20 recent image fusion algorithms and integrated them into a code library, which can be easily used to run algorithms and compare performance. Most of these algorithms were published within the last five years. An interface is designed so that any image fusion algorithm written in Matlab can be integrated into VIFB easily.
Comprehensive performance evaluation. We implemented 13 evaluation metrics in VIFB to comprehensively compare fusion performance. We ran the 20 collected algorithms on the proposed dataset and performed a comprehensive comparison. All results are made available for interested readers to use.
|Method||Year||Publication||Category|
|MSVD ||2011||Defense Science Journal||Multi-scale|
|GFF ||2013||IEEE Transactions on Image Processing||Multi-scale|
|MST_SR ||2015||Information Fusion||Hybrid|
|RP_SR ||2015||Information Fusion||Hybrid|
|NSCT_SR ||2015||Information Fusion||Hybrid|
|CBF ||2015||Signal, image and video processing||Multi-scale|
|ADF ||2016||IEEE Sensors Journal||Multi-scale|
|GFCE ||2016||Applied Optics||Multi-scale|
|Hybrid_MSD ||2016||Information Fusion||Hybrid|
|TIF ||2016||Infrared Physics & Technology||Saliency-based|
|GTF ||2016||Information Fusion||Other|
|FPDE ||2017||International Conference on Information Fusion||Subspace-based|
|IFEVIP ||2017||Infrared Physics & Technology||Other|
|VSMWLS ||2017||Infrared Physics & Technology||Saliency-based|
|DLF ||2018||International Conference on Pattern Recognition||DL-based|
|CNN ||2018||International Journal of Wavelets, Multiresolution and Information Processing||DL-based|
|MGFF ||2019||Circuits, Systems, and Signal Processing||Multi-scale|
|ResNet ||2019||Infrared Physics & Technology||DL-based|
|Metric||Full name||+/-||Metric||Full name||+/-|
|AG||Average gradient||+||RW||Relatively wrap||-|
|CE||Cross entropy||-||RMSE||Root mean squared error||-|
|EI||Edge intensity||+||QAB/F||Edge based similarity measurement||+|
|FD||Figure definition||+||SF||Spatial frequency||+|
|MI||Mutual information||+||SSIM||Structural similarity index measure||+|
2 Related Work
In this section, we briefly review recent visible and infrared image fusion algorithms. In addition, we summarize existing visible and infrared image datasets.
2.1 Visible-infrared fusion methods
In recent years, many visible and infrared image fusion methods have been proposed. Before deep learning was introduced to the image fusion community, the main image fusion methods could generally be grouped into several categories according to their underlying theories, namely multi-scale transform-, sparse representation-, subspace-, and saliency-based methods, hybrid models, and other methods.
Recently, deep learning has been introduced to image fusion and can help solve several important problems in the field. First, deep learning can provide better features than handcrafted ones. Second, deep learning can learn adaptive weights for image fusion, which is crucial in many fusion rules. Regarding methods, convolutional neural networks (CNN) [10, 23, 41, 39, 32], generative adversarial networks (GAN) , and Siamese networks [16] have been explored for image fusion. Apart from fusion methods themselves, image quality assessment, which is critical in fusion performance evaluation, has also benefited from deep learning . It is foreseeable that image fusion technology will continue to develop in the direction of machine learning, and an increasing number of such research results will appear.
2.2 Existing datasets
Although research on image fusion began many years ago, there is still no well-recognized and commonly used dataset in the visible and infrared image fusion community. This differs from visual tracking, where several well-known benchmarks, such as OTB [37, 38] and VOT , have been proposed and widely adopted. As a result, different image pairs are commonly used across the visible and infrared image fusion literature, which makes objective comparison difficult.
At the moment, several visible and infrared image fusion datasets exist, including the OSU Color-Thermal Database (http://vcipl-okstate.org/pbvs/bench/), the TNO Image Fusion Dataset (https://figshare.com/articles/TN_Image_Fusion_Dataset/1008029), and VLIRVDIF (http://www02.smt.ufrj.br/ fusion/). The main information about these datasets is summarized in Table 1. Apart from OSU, the number of image pairs in TNO and VLIRVDIF is not small. However, the lack of a code library, evaluation metrics, and results on these datasets makes it difficult to gauge the state of the art based on them.
3 Visible and Infrared Image Fusion Benchmark
3.1 Dataset
The dataset in VIFB, which serves as a test set, includes 21 pairs of visible and infrared images. These images cover a wide range of environments and working conditions, such as indoor, outdoor, low illumination, and over-exposure. Each pair of visible and infrared images has been strictly registered to ensure that image fusion can be performed successfully. The dataset contains images of various resolutions, such as 320 × 240, 630 × 460, 512 × 184, and 452 × 332. Some example images from the dataset are given in Fig. 2. The images were collected by the authors from the Internet (https://www.ino.ca/en/solutions/video-analytics-dataset/) and a fusion tracking dataset .
3.2 Baseline algorithms
In recent years, many algorithms have been proposed for visible and infrared image fusion. However, only some of the papers provide source code. Besides, these codes have different input and output interfaces and may require different running environments. These factors hinder using them to produce results and compare performance.
In the VIFB benchmark, we integrate 20 recently published visible-infrared image fusion algorithms, including MSVD , GFF , MST_SR , RP_SR , NSCT_SR , CBF , ADF , GFCE , HMSD_GF , Hybrid-MSD , TIF , GTF , FPDE , IFEVIP , VSM_WLS , DLF , LatLRR , CNN , MGFF , and ResNet . Table 2 lists more details about these algorithms. At the moment, all image fusion algorithms integrated in VIFB are written in Matlab. Note that some algorithms can only fuse gray-scale images, while others can fuse color images.
These algorithms cover almost every kind of visible-infrared fusion algorithm, and most were proposed in the last five years, so they represent the development of the field to some extent. We will continue to add more algorithms to VIFB in the future.
For the convenience of users, we designed an interface through which any visible-infrared fusion algorithm written in Matlab can be added to the benchmark easily. For methods whose source code is not in Matlab, we also designed an interface to integrate their fusion results into VIFB so that they can be compared with the other algorithms.
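VIFB's actual interface is written in Matlab; as a rough illustration of the idea of wrapping every algorithm behind one uniform call signature, a minimal sketch in Python follows (all names here are hypothetical, not VIFB's API):

```python
from dataclasses import dataclass
from typing import Callable, Dict
import numpy as np

@dataclass
class FusionAlgorithm:
    """One benchmark entry: every algorithm is reduced to the same
    signature, (visible, infrared) -> fused image."""
    name: str
    run: Callable[[np.ndarray, np.ndarray], np.ndarray]

def register(registry: Dict[str, FusionAlgorithm], name: str,
             fuse_fn: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> None:
    """Add an algorithm to the benchmark under the common interface."""
    registry[name] = FusionAlgorithm(name, fuse_fn)

# A trivial pixel-averaging "fusion" used only to exercise the interface.
registry: Dict[str, FusionAlgorithm] = {}
register(registry, "average",
         lambda vis, ir: (vis.astype(float) + ir.astype(float)) / 2)
```

With such a registry, the benchmark can loop over all registered algorithms and all image pairs and compute every metric uniformly, which is the inconvenience the differing per-paper interfaces create.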
3.3 Evaluation metrics
Numerous evaluation metrics for visible-infrared image fusion have been proposed, such as mutual information, spatial frequency, and cross entropy. However, none of them is better than all the others. Therefore, in the literature, authors normally choose and present the few metrics that support their methods, which may not allow comprehensive performance comparison. In VIFB, we implement 13 evaluation metrics. It is convenient to compute all of these metrics for each method in VIFB, thus making performance comparison easy. All evaluation metrics implemented in VIFB are listed in Table 3. Here we introduce only a few of them; introductions to the other metrics are given in the supplementary material.
Mutual information (MI).
MI  is used to measure the amount of information that is transferred from the source images to the fused image. It is defined as:
$$MI = MI_{VF} + MI_{IF},$$
where $MI_{VF}$ and $MI_{IF}$ denote the information transferred from the visible and infrared images to the fused image, respectively. The subscript $F$ denotes the fused image. Specifically, $MI_{XF}$ is defined as follows:
$$MI_{XF} = \sum_{x,f} h_{X,F}(x,f) \log_2 \frac{h_{X,F}(x,f)}{h_X(x)\, h_F(f)},$$
where $X = V$ for the visible image and $X = I$ for the infrared image, $h_X(x)$ and $h_F(f)$ are the marginal histograms of the source image $X$ and the fused image $F$, respectively, and $h_{X,F}(x,f)$ is the joint histogram of the source image $X$ and the fused image $F$. A large MI value means good fusion performance, since considerable information has been transferred to the fused image.
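As an illustration (a sketch, not VIFB's Matlab implementation), the histogram-based MI metric can be computed in Python with NumPy:

```python
import numpy as np

def mutual_information(x, f, bins=256):
    """MI between one source image x and the fused image f,
    estimated from their joint gray-level histogram."""
    # Joint histogram, normalized into a joint probability distribution.
    h_xf, _, _ = np.histogram2d(x.ravel(), f.ravel(), bins=bins)
    p_xf = h_xf / h_xf.sum()
    p_x = p_xf.sum(axis=1)   # marginal of the source image
    p_f = p_xf.sum(axis=0)   # marginal of the fused image
    nz = p_xf > 0            # skip empty cells to avoid log(0)
    return np.sum(p_xf[nz] * np.log2(p_xf[nz] / np.outer(p_x, p_f)[nz]))

def mi_metric(vis, ir, fused, bins=256):
    # MI = MI_VF + MI_IF: information transferred from both sources.
    return (mutual_information(vis, fused, bins)
            + mutual_information(ir, fused, bins))
```

Note that the histogram estimate is biased upward for small images relative to the bin count, so the number of bins is a parameter choice rather than part of the definition.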
Root mean squared error (RMSE).
RMSE is defined as:
$$RMSE = \frac{RMSE_{VF} + RMSE_{IF}}{2},$$
where $RMSE_{VF}$ denotes the dissimilarity between the visible and fused images, and $RMSE_{IF}$ is the dissimilarity between the infrared and fused images. $RMSE_{XF}$ is defined as:
$$RMSE_{XF} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(X(i,j) - F(i,j)\big)^2},$$
where $X = V$ for the visible image and $X = I$ for the infrared image, and $M$ and $N$ are the width and height of the images, respectively. If the fused image has little error and distortion, the RMSE value will be small.
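A sketch of this metric in Python (illustrative only, not VIFB's Matlab code):

```python
import numpy as np

def rmse_pair(src, fused):
    """RMSE between one source image and the fused image."""
    src = np.asarray(src, dtype=float)
    fused = np.asarray(fused, dtype=float)
    return np.sqrt(np.mean((src - fused) ** 2))

def rmse_metric(vis, ir, fused):
    # Average of the visible-fused and infrared-fused dissimilarities.
    return 0.5 * (rmse_pair(vis, fused) + rmse_pair(ir, fused))
```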
Spatial frequency (SF).
SF  measures the gradient distribution of an image and thus reveals its detail and texture. It is defined as:
$$SF = \sqrt{RF^2 + CF^2},$$
where $RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N} \big(F(i,j) - F(i,j-1)\big)^2}$ is the row frequency and $CF = \sqrt{\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=1}^{N} \big(F(i,j) - F(i-1,j)\big)^2}$ is the column frequency. A large SF value indicates rich edges and textures, and thus good fusion performance.
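A compact Python sketch (illustrative; normalization conventions for the difference counts vary slightly across papers):

```python
import numpy as np

def spatial_frequency(f):
    """SF = sqrt(RF^2 + CF^2) from row/column gray-level differences."""
    f = np.asarray(f, dtype=float)
    # Row frequency: horizontal neighbor differences.
    rf = np.sqrt(np.mean(np.diff(f, axis=1) ** 2))
    # Column frequency: vertical neighbor differences.
    cf = np.sqrt(np.mean(np.diff(f, axis=0) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)
```

A constant image has SF = 0, while a checkerboard, whose every neighboring pair of pixels differs, maximizes it for a given dynamic range.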
More information about evaluation metrics can be found in .
4 Experiments
This section presents experimental results on the VIFB dataset. Sections 4.1 and 4.2 present qualitative and quantitative performance comparisons, respectively. Section 4.3 compares the runtime of each algorithm. All experiments were performed on a desktop equipped with an NVIDIA GTX 1080Ti GPU and an i7-8700K CPU. The default parameters reported by the corresponding authors of each algorithm are used. Note that due to page limits, we present only part of the results here; more fusion results are provided in the supplementary material.
4.1 Qualitative performance comparison
Qualitative evaluation methods are important in fusion quality assessment; they assess the quality of fused images on the basis of the human visual system. Figure 3 presents the qualitative comparison of the 20 fusion methods on the fight image pair. In this image pair, several people are in the shadow of a car and thus cannot be seen clearly in the visible image, while they can be seen in the infrared image. As can be seen, these people are visible in almost all fused images. However, the fused images obtained by some algorithms contain more artifacts. These include ADF, CBF, DLF, FPDE, GFCE, HMSD_GF, Hybrid_MSD, IFEVIP, MST_SR, MSVD, NSCT_SR, ResNet, PR_SR, TIF, and VSMWLS. Besides, the fused images produced by GTF, ResNet, and PR_SR do not preserve much of the detail contained in the visible image. Figure 3 indicates that the fused images obtained by GFF and MGFF look more natural to human perception and preserve more details.
Figure 4 shows the qualitative comparison of the 20 methods on the manlight image pair. It can be seen that in many fused images the people around the car are still invisible or unclear, such as those produced by ADF, CNN, GFCE, HMSD_GF, Hybrid_MSD, and IFEVIP. Some other fused images contain artifacts that are not present in the original images, such as those obtained by CBF, GFCE, and NSCT_SR. The results indicate that DLF, FPDE, MGFF, and MST_SR give better subjective fusion performance on the manlight case.
4.2 Quantitative performance comparison
Table 4 presents the average value of the 13 evaluation metrics for all methods on the 21 image pairs. As can be seen, the LatLRR method obtains the best overall performance, with 4 best values and 3 second-best values. NSCT_SR, GFCE, and IFEVIP also show relatively good overall performance. However, the table clearly indicates that there is no dominant fusion method that beats the others on all or most evaluation metrics. Besides, the three deep learning-based methods, namely DLF, CNN, and ResNet, do not show very competitive overall performance. This differs from fields such as tracking and detection, which are almost dominated by deep learning-based approaches.
To further illustrate the quantitative comparison of fusion performance, nine metric results of all 20 methods on the 21 image pairs are presented in Figure 5.
4.3 Runtime comparison
The runtimes of all algorithms integrated in VIFB are listed in Table 5. As can be seen, the runtimes of image fusion methods vary significantly from one to another, even for methods in the same category. For instance, both TIF and LatLRR are saliency-based methods, but the runtime of LatLRR is more than 2000 times that of TIF. Besides, multi-scale methods are generally fast, while deep learning-based algorithms are slower than the others even with the help of a GPU. The fastest deep learning-based method, i.e. ResNet, takes 2.89 seconds to fuse one image pair. It should be mentioned that all three deep learning-based algorithms in VIFB do not update their models online but use pretrained models instead.
One important application area of visible and infrared image fusion is RGB-infrared fusion tracking [42, 43], where tracking speed is vital for practical applications. As pointed out in , if an image fusion algorithm is very time-consuming, such as LatLRR  and NSCT_SR , it is not feasible to build a real-time fusion tracker on top of it. In fact, most image fusion algorithms listed in Table 5 are computationally expensive from a tracking perspective.
5 Concluding Remarks
In this paper, we present a visible and infrared image fusion benchmark (VIFB), which includes a dataset of 21 image pairs, a code library consisting of 20 algorithms, 13 evaluation metrics, and all results. To the best of our knowledge, this is the first visible and infrared image fusion benchmark to date. The benchmark facilitates better understanding of state-of-the-art image fusion approaches and provides a platform for gauging new methods.
We carried out large-scale experiments based on VIFB to evaluate the performance of all integrated fusion algorithms, from which we draw several observations on the status of visible and infrared image fusion. First, unlike some other fields in computer vision where deep learning is almost the dominant method, such as object tracking and detection, various kinds of methods are still frequently used in visible and infrared image fusion. Second, although the number of deep learning-based image fusion methods is increasing, their performance does not yet show superiority over non-learning algorithms. However, due to its strong representation ability and end-to-end property, we believe deep learning-based image fusion will be an important research direction in the future. Third, the computational efficiency of visible and infrared image fusion algorithms still needs to be improved for real-time applications, such as tracking and detection.
We will continue to extend the dataset and code library of VIFB to contain more image pairs and fusion algorithms. We will also implement more evaluation metrics in VIFB. We hope that VIFB can serve as a good starting point for researchers who are interested in visible and infrared image fusion.
-  Syed Mohd Zahid Syed Zainal Ariffin, Nursuriati Jamil, and Puteri Norhashimah Megat Abdul Rahman. Can thermal and visible image fusion improves ear recognition? In Proceedings of the 8th International Conference on Information Technology, pages 780–784, 2017.
-  Durga Prasad Bavirisetti and Ravindra Dhuli. Fusion of infrared and visible sensor images based on anisotropic diffusion and karhunen-loeve transform. IEEE Sensors Journal, 16(1):203–209, 2016.
-  Durga Prasad Bavirisetti and Ravindra Dhuli. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Physics & Technology, 76:52–64, 2016.
-  Durga Prasad Bavirisetti, Gang Xiao, and Gang Liu. Multi-sensor image fusion based on fourth order partial differential equations. In 2017 20th International Conference on Information Fusion (Fusion), pages 1–9. IEEE, 2017.
-  Durga Prasad Bavirisetti, Gang Xiao, Junhao Zhao, Ravindra Dhuli, and Gang Liu. Multi-scale guided image and video fusion: A fast and efficient approach. Circuits, Systems, and Signal Processing, 38(12):5576–5605, Dec 2019.
-  James W Davis and Vinay Sharma. Background-subtraction using contour-based fusion of thermal and visible imagery. Computer vision and image understanding, 106(2-3):162–182, 2007.
-  Andreas Ellmauthaler, Carla L Pagliari, Eduardo AB da Silva, Jonathan N Gois, and Sergio R Neves. A visible-light and infrared video database for performance evaluation of video/image fusion methods. Multidimensional Systems and Signal Processing, 30(1):119–143, 2019.
-  Ahmet M Eskicioglu and Paul S Fisher. Image quality measures and their performance. IEEE Transactions on communications, 43(12):2959–2965, 1995.
-  Hassan Ghassemian. A review of remote sensing image fusion methods. Information Fusion, 32:75–89, 2016.
-  Haithem Hermessi, Olfa Mourali, and Ezzeddine Zagrouba. Convolutional neural network-based multimodal image fusion via similarity learning in the shearlet domain. Neural Computing and Applications, pages 1–17, 2018.
-  Alex Pappachen James and Belur V Dasarathy. Medical image fusion: A survey of the state of the art. Information Fusion, 19:4–19, 2014.
-  Xin Jin, Qian Jiang, Shaowen Yao, Dongming Zhou, Rencan Nie, Jinjin Hai, and Kangjian He. A survey of infrared and visual image fusion methods. Infrared Physics & Technology, 85:478–501, 2017.
-  Seong G. Kong, Jingu Heo, Besma R. Abidi, Joonki Paik, and Mongi A. Abidi. Recent advances in visual and infrared face recognition - a review. Computer Vision and Image Understanding, 97(1):103–135, 2005.
-  Matej Kristan, Jiri Matas, Aleš Leonardis, Tomas Vojir, Roman Pflugfelder, Gustavo Fernandez, Georg Nebehay, Fatih Porikli, and Luka Čehovin. A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2137–2155, Nov 2016.
-  Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, and Jin Tang. Rgb-t object tracking: benchmark and baseline. Pattern Recognition, page 106977, 2019.
-  Hui Li and Xiaojun Wu. Densefuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5):2614–2623, 2018.
-  Hui Li and Xiaojun Wu. Infrared and visible image fusion using latent low-rank representation. arXiv preprint arXiv:1804.08992, 2018.
-  Hui Li, Xiao-Jun Wu, and Tariq S Durrani. Infrared and visible image fusion with resnet and zero-phase component analysis. Infrared Physics & Technology, 102:103039, 2019.
-  Hui Li, Xiao-Jun Wu, and Josef Kittler. Infrared and visible image fusion using a deep learning framework. 24th International Conference on Pattern Recognition, 2018.
-  Shutao Li, Xudong Kang, Leyuan Fang, Jianwen Hu, and Haitao Yin. Pixel-level image fusion: A survey of the state of the art. Information Fusion, 33:100–112, 2017.
-  Shutao Li, Xudong Kang, and Jianwen Hu. Image fusion with guided filtering. IEEE Transactions on Image processing, 22(7):2864–2875, 2013.
-  Yu Liu, Xun Chen, Juan Cheng, Hu Peng, and Zengfu Wang. Infrared and visible image fusion with convolutional neural networks. International Journal of Wavelets, Multiresolution and Information Processing, 16(03):1850018, 2018.
-  Yu Liu, Xun Chen, Hu Peng, and Zengfu Wang. Multi-focus image fusion with a deep convolutional neural network. Information Fusion, 36:191–207, 2017.
-  Yu Liu, Xun Chen, Zengfu Wang, Z Jane Wang, Rabab K Ward, and Xuesong Wang. Deep learning for pixel-level image fusion: Recent advances and future prospects. Information Fusion, 42:158–173, 2018.
-  Yu Liu, Shuping Liu, and Zengfu Wang. A general framework for image fusion based on multi-scale transform and sparse representation. Information Fusion, 24:147–164, 2015.
-  Jiayi Ma, Chen Chen, Chang Li, and Jun Huang. Infrared and visible image fusion via gradient transfer and total variation minimization. Information Fusion, 31:100–109, 2016.
-  Jiayi Ma, Yong Ma, and Chang Li. Infrared and visible image fusion methods and applications: A survey. Information Fusion, 45:153–178, 2019.
-  Jiayi Ma, Wei Yu, Pengwei Liang, Chang Li, and Junjun Jiang. FusionGAN: A generative adversarial network for infrared and visible image fusion. Information Fusion, 48:11–26, 2019.
-  Jinlei Ma, Zhiqiang Zhou, Bo Wang, and Hua Zong. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Physics & Technology, 82:8–17, 2017.
-  Kede Ma, Kai Zeng, and Zhou Wang. Perceptual quality assessment for multi-exposure image fusion. IEEE Transactions on Image Processing, 24(11):3345–3356, 2015.
-  V. P. S. Naidu. Image fusion technique using multi-resolution singular value decomposition. Defence Science Journal, 61(5):479–484, 2011.
-  K Ram Prabhakar, V Sai Srikar, and R Venkatesh Babu. Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, pages 4724–4732, 2017.
-  Guihong Qu, Dali Zhang, and Pingfan Yan. Information measure for performance of image fusion. Electronics letters, 38(7):313–315, 2002.
-  B. K. Shreyamsha Kumar. Image fusion based on pixel significance using cross bilateral filter. Signal, Image and Video Processing, 9(5):1193–1204, Jul 2015.
-  Helene Torresan, Benoit Turgeon, Clemente Ibarra-Castanedo, Patrick Hebert, and Xavier P Maldague. Advanced surveillance systems: combining video and thermal imagery for pedestrian detection. In Thermosense XXVI, volume 5405, pages 506–516. International Society for Optics and Photonics, 2004.
-  Zhaobin Wang, Yide Ma, and Jason Gu. Multi-focus image fusion using pcnn. Pattern Recognition, 43(6):2003–2016, 2010.
-  Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online object tracking: A benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2411–2418, 2013.
-  Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1834–1848, 2015.
-  Kaijian Xia, Hongsheng Yin, and Jiangqiang Wang. A novel improved deep convolutional neural network model for medical image fusion. Cluster Computing, pages 1–13, 2018.
-  Qingsen Yan, Dong Gong, and Yanning Zhang. Two-stream convolutional networks for blind image quality assessment. IEEE Transactions on Image Processing, 28(5):2200–2211, 2018.
-  Xiang Yan, Syed Zulqarnain Gilani, Hanlin Qin, and Ajmal Mian. Unsupervised deep multi-focus image fusion. pages 1–11, 2018.
-  Xingchen Zhang, Gang Xiao, Ping Ye, Dan Qiao, Junhao Zhao, and Shengyun Peng. Object fusion tracking based on visible and infrared images using fully convolutional siamese networks. In Proceedings of the 22nd International Conference on Information Fusion. IEEE, 2019.
-  Xingchen Zhang, Ping Ye, Shengyun Peng, Jun Liu, Ke Gong, and Gang Xiao. Siamft: An rgb-infrared fusion tracking method via fully convolutional siamese networks. IEEE Access, 7:122122–122133, 2019.
-  Yu Zhang, Lijia Zhang, Xiangzhi Bai, and Li Zhang. Infrared and visual image fusion through infrared feature extraction and visual information preservation. Infrared Physics & Technology, 83:227–237, 2017.
-  Zhiqiang Zhou, Mingjie Dong, Xiaozhu Xie, and Zhifeng Gao. Fusion of infrared and visible images for night-vision context enhancement. Applied optics, 55(23):6480–6490, 2016.
-  Zhiqiang Zhou, Bo Wang, Sun Li, and Mingjie Dong. Perceptual fusion of infrared and visible images through a hybrid multi-scale decomposition with gaussian and bilateral filters. Information Fusion, 30:15–26, 2016.