TransFuse: A Unified Transformer-based Image Fusion Framework using Self-supervised Learning

by Linhao Qu, et al.

Image fusion is a technique that integrates information from multiple source images with complementary content to improve the richness of a single image. Due to insufficient task-specific training data and the lack of corresponding ground truth, most existing end-to-end image fusion methods easily fall into overfitting or tedious parameter-optimization processes. Two-stage methods avoid the need for large amounts of task-specific training data by training an encoder-decoder network on large natural-image datasets and utilizing the extracted features for fusion, but the domain gap between natural images and the different fusion tasks results in limited performance. In this study, we design a novel encoder-decoder based image fusion framework and propose a destruction-reconstruction based self-supervised training scheme to encourage the network to learn task-specific features. Specifically, we propose three destruction-reconstruction self-supervised auxiliary tasks for multi-modal image fusion, multi-exposure image fusion and multi-focus image fusion, based on pixel-intensity non-linear transformation, brightness transformation and noise transformation, respectively. To encourage the different fusion tasks to promote each other and to increase the generalizability of the trained network, we integrate the three self-supervised auxiliary tasks by randomly choosing one of them to destroy a natural image during model training. In addition, we design a new encoder that combines a CNN and a Transformer for feature extraction, so that the trained model can exploit both local and global information. Extensive experiments on multi-modal, multi-exposure and multi-focus image fusion tasks demonstrate that our proposed method achieves state-of-the-art performance in both subjective and objective evaluations. The code will be publicly available soon.
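The destruction step described above can be sketched as follows: one of the three auxiliary destructions (non-linear intensity transform, brightness transform, or noise transform) is chosen at random and applied to a natural image before the encoder-decoder is trained to reconstruct the original. This is only a minimal illustration; the specific transform families and parameter ranges below are assumptions for demonstration, not the paper's exact settings.

```python
import numpy as np


def intensity_destroy(img, rng):
    # Pixel-intensity non-linear transformation (multi-modal auxiliary task):
    # apply a random gamma curve to the image (assumed range, illustrative only).
    gamma = rng.uniform(0.5, 2.0)
    return np.clip(img, 0.0, 1.0) ** gamma


def brightness_destroy(img, rng):
    # Brightness transformation (multi-exposure auxiliary task):
    # globally rescale intensities to mimic over-/under-exposure.
    scale = rng.uniform(0.3, 1.7)
    return np.clip(img * scale, 0.0, 1.0)


def noise_destroy(img, rng):
    # Noise transformation (multi-focus auxiliary task):
    # add Gaussian noise to degrade local detail.
    sigma = rng.uniform(0.05, 0.2)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)


def destroy(img, rng):
    # Randomly pick one of the three destructions per training sample,
    # so a single network learns features useful for all three fusion tasks.
    ops = [intensity_destroy, brightness_destroy, noise_destroy]
    op = ops[rng.integers(len(ops))]
    return op(img, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))          # stand-in natural image in [0, 1]
    destroyed = destroy(img, rng)        # network would be trained to recover img
    print(destroyed.shape, destroyed.min() >= 0.0, destroyed.max() <= 1.0)
```

During training, `(destroyed, img)` pairs serve as input and reconstruction target, so no task-specific fusion ground truth is required.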


