
Towards Generalization on Real Domain for Single Image Dehazing via Meta-Learning

Learning-based image dehazing methods are essential for enhancing the reliability of autonomous systems. Due to the domain gap between synthetic and real domains, the internal information learned from synthesized images is usually sub-optimal in real domains, leading to a severe performance drop of dehazing models. Driven by its ability to explore internal information from a few unseen-domain samples, meta-learning is commonly adopted to address this issue via test-time training, which is hyperparameter-sensitive and time-consuming. In contrast, we present a domain generalization framework based on meta-learning to dig out representative and discriminative internal properties of real hazy domains without test-time training. To obtain representative domain-specific information, we attach two entities, termed the adaptation network and the distance-aware aggregator, to our dehazing network. The adaptation network assists in distilling domain-relevant information from a few hazy samples and caching it into a collection of features. The distance-aware aggregator strives to summarize the generated features and filter out misleading information for more representative internal properties. To enhance the discrimination of the distilled internal information, we present a novel loss function called domain-relevant contrastive regularization, which encourages the internal features generated from the same domain to be more similar and those from diverse domains to be more distinct. The generated representative and discriminative features are regarded as external variables of our dehazing network to regress a particular and powerful function for a given domain. Extensive experiments on real hazy datasets, such as RTTS and URHI, validate that our proposed method has superior generalization ability over the state-of-the-art competitors.


1 Introduction

The control and navigation of autonomous systems rely heavily on accurate visual perception (Ifqir et al., 2022; Gróf et al., 2022; Tang et al., 2021), in which clear images play a significant role (Zhao et al., 2022b, a; Wu et al., 2022). However, when affected by haze, images captured outdoors usually suffer from low contrast and poor visibility, resulting in severe performance degradation of perception models in autonomous systems. Thus, it is important to recover clear images from their hazy counterparts, a task known as image dehazing, to enhance the reliability of autonomous systems.

Recently, benefiting from advances in deep learning, a great number of learning-based image dehazing methods have been proposed, which provide a solution to promote the robustness of autonomous systems on foggy days (Zhang et al., 2020a; Tang et al., 2022). Nevertheless, the optimization of these dehazing models requires a large quantity of paired hazy/clear images, which are laborious and expensive to collect in practice. To address this issue, numerous synthetic datasets (Li et al., 2019, 2020; Sakaridis et al., 2018; Zheng et al., 2021) have been spawned, where hazy images are synthesized from clear counterparts. However, these synthesized hazy images are biased depictions of real scenarios, leading to a domain gap between synthetic and real samples. Thus, dehazing models learned from these synthetic samples (Li et al., 2019, 2020; Sakaridis et al., 2018; Zheng et al., 2021) frequently suffer from a severe performance drop on real images, as the internal information learned from synthetic domains is usually sub-optimal in real domains.

Meta-learning is capable of digging out internal information from a few samples of target domains, so that a model trained only on source domains can quickly adapt to target domains (Zhang et al., 2020a; Sun et al., 2022). Therefore, applying meta-learning to single image dehazing is a promising direction for improving the generalization ability on real domains. Meta-learning can be categorized into metric-based, optimization-based and model-based approaches (Huisman et al., 2021; Chen et al., 2022), where optimization-based techniques (Finn et al., 2017; Liu et al., 2019a; Sun et al., 2020) are usually employed in image restoration (Soh et al., 2020; Chi et al., 2021) to explore the internal information of new scenarios. Nonetheless, these methods (Soh et al., 2020; Chi et al., 2021) depend heavily on test-time training, which requires carefully designed hyperparameters (i.e., iteration steps) for unseen test samples (Gao et al., 2022) and incurs additional computational cost (Huisman et al., 2021), hindering the real-time application of autonomous systems (Kaleli, 2020; Zhang et al., 2022; Van Dooren et al., 2022). Different from these works (Soh et al., 2020; Chi et al., 2021), we seek to deal with domain generalization by leveraging model-based meta-learning methods (Huisman et al., 2021; Chen et al., 2022; Garnelo et al., 2018a; Zhang et al., 2021; Ye and Yao, 2022), specifically adaptive risk minimization (ARM) (Zhang et al., 2021), which effectively avoids test-time training and enables internal learning (Chi et al., 2021; Zhang et al., 2019) on real domains with a few real-world hazy samples.

To capture the internal information of a given domain, Zhang et al. (2021) add two entities, termed the adaptation network and the average aggregator, to the prediction network. The adaptation network assists in distilling internal information from a few samples and caching it into a collection of features. The average aggregator strives to summarize the generated features to grasp domain-specific internal properties. The summarized features are regarded as external variables of the prediction network to regress a particular function for the given domain. Intuitively, the more representative and discriminative the extracted domain-specific information is, the more capable the regressed prediction function is of coping with samples in the given domain. However, directly applying ARM (Zhang et al., 2021) to image dehazing raises several limitations. Firstly, the adaptation network is made of vanilla convolution layers, which may lead to unreliable internal information, as the inputs in image dehazing are generally covered by haze. Secondly, the average aggregator treats each sample equally and makes the external variables vulnerable to outliers, since the features encoded from outliers usually fail to grasp representative domain-specific information. Thirdly, the adaptation network and the aggregator only focus on intra-domain information and leave inter-domain information unused, so that discriminative domain-specific information fails to be captured sufficiently.

To address the first issue, we embed context-gated convolution (CG-Conv) layers (Lin et al., 2020) into the adaptation network, which enhances the reliability of features by fusing context information from entire images. For the second challenge, we propose a non-parametric distance-aware aggregator to suppress the misleading information of outliers, based on the observation that the features of outliers are usually located far away from those of normal samples. The proposed aggregator reweights the internal features of diverse samples according to their relative distances, so that the misleading information from outliers is weakened and the representative information of normal samples is enhanced. For the third dilemma, intra- and inter-domain information is incorporated and a domain-relevant contrastive regularization is presented, which attempts to make the internal information distilled from the same domain more similar and that from diverse domains more distinct. By embedding the domain-relevant contrastive regularization into our framework, the intra-domain homogeneity and inter-domain heterogeneity can be grasped, which further enhances the representativeness and discrimination of the external variables. Comprehensive experiments are conducted to demonstrate the generalization ability of our dehazing model on real domains.

In summary, the main contributions are listed as follows:

  • A domain generalization framework is proposed for single image dehazing, which can generalize to real hazy images effectively without test-time training.

  • A distance-aware aggregator is presented to capture more representative information of normal samples and suppress the misleading one of outliers.

  • A domain-relevant contrastive regularization is presented, which facilitates the regressed dehazing function to capture more discriminative internal information of domains.

  • The experiments on real hazy images demonstrate that our proposed framework is superior to the state-of-the-art learning-based dehazing methods (Yang et al., 2022; Guo et al., 2022).

The rest of this paper is organized as follows. Section 2 reviews some leading-edge studies related to both single image dehazing and meta-learning in image restoration. Section 3 gives a detailed overview of our dehazing framework. Section 4 illustrates the implementation details and experimental results. Section 5 summarizes this paper and discusses the future work.

2 Related Work

2.1 Learning-based Single Image Dehazing

The past decade has witnessed the emergence of a large number of learning-based image dehazing algorithms. Li et al. (2017) predict transmission maps and atmospheric lights jointly in a unified CNN architecture and generate haze-free images through a variant of the atmospheric scattering model (McCartney, 1976; Narasimhan and Nayar, 2000, 2002). More learning-based methods (Liu et al., 2019b; Dong et al., 2020a; Qin et al., 2020; Wu et al., 2021; Guo et al., 2022) tend to generate haze-free images directly, as estimating intermediate variables (i.e., transmission maps) may give rise to cumulative errors (Cai et al., 2016; Ren et al., 2016; Zhang and Patel, 2018; Lee et al., 2020). Liu et al. (2019b) abandon the estimation of transmission maps and design an end-to-end CNN to conduct dehazing. This method (Liu et al., 2019b) fails to leverage features from different scales, which drives Dong et al. (2020a) to dig out the correlations of multi-level features. Taking the importance of diverse features and pixels into account, Qin et al. (2020) employ channel attention and pixel attention to treat each feature and each pixel unequally. To balance performance and computational cost, Wu et al. (2021) devise a compact network architecture and introduce contrastive learning to suppress unexpected predictions. Driven by the ability of Transformers to model long-range feature dependencies, Guo et al. (2022) integrate Transformer and CNN for single image dehazing.

Learning-based approaches require a great deal of paired data to optimize their models, which has spawned various synthetic datasets (Li et al., 2019, 2020; Sakaridis et al., 2018; Zheng et al., 2021). These hazy images with diverse densities are synthesized from clear ones by adjusting the scattering coefficient and atmospheric light via the atmospheric scattering model (McCartney, 1976; Narasimhan and Nayar, 2000, 2002). Nevertheless, due to the inaccurate estimation of depths and the complexity of the real imaging mechanism, synthetic hazy images fail to depict real-world hazy scenes reliably, resulting in a domain gap between synthetic and real-world hazy images. Although a model trained on synthetic datasets (Li et al., 2019, 2020; Sakaridis et al., 2018; Zheng et al., 2021) can grasp the internal information of synthetic domains, it frequently suffers from a performance drop on real hazy images, as the features learned from synthetic domains are sub-optimal in real domains due to the domain gap.

Figure 1: The overview of the proposed framework.
Symbol | Meaning
$\mathcal{D}$ | Domain set of hazy samples
$D_i$ | The $i$-th domain
$\mathcal{T}_j$ | The $j$-th task
$x$ | Hazy images
$y$ | Haze-free images
$z_j$ | Task-specific parameter of $\mathcal{T}_j$
$z_i^{*}$ | Optimal domain-specific parameter of $D_i$
$v$ | Preliminary parameter
$A$ | Adaptation network
$F$ | Dehazing network
$\phi$ | Neural weights of the adaptation network
$\theta$ | Neural weights of the dehazing network
$K$ | Number of sample pairs in each task
$B$ | Number of sampled tasks
$N$ | Number of domains in $\mathcal{D}$
$M$ | Number of preliminary parameters of a task
Table 1: The nomenclature of the symbols involved in this paper.

2.2 Meta-Learning in Image Restoration

Meta-learning, also known as learning to learn, targets adapting to a new scenario rapidly from a limited number of samples, and has been applied to diverse computer vision tasks with significant breakthroughs in recent years. Meta-learning is often categorized into metric-based, optimization-based, and model-based techniques (Huisman et al., 2021), where optimization-based algorithms, especially model-agnostic meta-learning (MAML) (Finn et al., 2017) and its variants (Liu et al., 2019a; Sun et al., 2020), are widely employed in image restoration to refine the model parameters at test time for improving the generalization capability. Soh et al. (2020) apply MAML to image super-resolution to obtain an optimal model initialization, based on which the model can adapt to unseen samples within several test-time training steps. Chi et al. (2021) adopt an auxiliary reconstruction task to optimize the model indirectly to deal with blurred images caused by unseen kernels. To tackle multi-domain learning in image dehazing, Liu et al. (2022) conduct test-time training to enable adaptation to specific domains.

However, the effectiveness of test-time training depends heavily on manually designed hyperparameters (i.e., iteration steps and learning rates), which may vary among different test samples and lead to under-fitting on unseen target images (Gao et al., 2022). In addition, test-time training increases the computational costs and runtime of the model (Huisman et al., 2021; Liu et al., 2022), resulting in low efficiency in practical applications. To tackle these issues, this paper resorts to model-based meta-learning approaches (Garnelo et al., 2018a, b; Zhang et al., 2021), especially ARM (Zhang et al., 2021), to modify the model without sensitive and time-consuming test-time training. Note that although both our work and that of Liu et al. (2022) are based on meta-learning, the problem definitions differ: Liu et al. (2022) require that test hazy images come from the same domains as the training ones, while we attempt to overcome the out-of-distribution challenge with test hazy images from unseen real domains.

3 Methodology

3.1 Framework Overview

Figure 2: The process of obtaining task-specific parameters.

In this paper, we present a novel framework for single image dehazing, which can deal with out-of-distribution domain generalization and enable internal learning on real domains without test-time training. As shown in Figure 1, our proposed framework includes an adaptation network $A$ and a dehazing network $F$. $\phi$ and $\theta$ stand for the neural parameters of $A$ and $F$, respectively, and $z$ denotes the external variable that serves as an additional input of $F$. Among them, $\phi$ and $\theta$ are fixed after meta-training and are shared across domains, while $z$ varies with the domain properties of the input hazy images and is specific to a particular domain.

Assume that there are $N$ domains $\mathcal{D} = \{D_1, \dots, D_N\}$ with hazy and haze-free pairs $(x, y)$, and that the optimal external variables of the domains are represented as domain-specific parameters $z_1^{*}, \dots, z_N^{*}$. Intuitively, $z_i^{*}$ embodies the most representative and discriminative internal information of the corresponding domain. Given a specific domain $D_i$ sampled from $\mathcal{D}$ and a task $\mathcal{T}_j$ sampled from $D_i$, our aim is to estimate a task-specific parameter $z_j$ from $\mathcal{T}_j$ to approximate $z_i^{*}$, so that the regressed dehazing function $F(\cdot; \theta, z_j)$ is capable of handling samples in $D_i$ commendably. We adopt $A$ to dig out internal information related to $z_i^{*}$ from each sample in $\mathcal{T}_j$ and cache the distilled information into a series of features $\{v_k\}_{k=1}^{M}$, which we call preliminary parameters in this paper. These preliminary parameters are then aggregated into $z_j$ via a permutation-invariant operation (Garnelo et al., 2018a, b; Zhang et al., 2021; Ye and Yao, 2022). In this way, distinct and powerful dehazing functions can be obtained for a batch of $B$ corresponding tasks without test-time training, where the tasks are randomly sampled from various domains in $\mathcal{D}$. The nomenclature of the symbols is listed in Table 1.
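To make the data flow concrete, the following PyTorch-style sketch traces one task through the framework; the module interfaces (an adaptation network mapping images to preliminary parameters, a dehazing network accepting the external variable $z$ as an extra input) are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

class MetaDehazer(nn.Module):
    """One task of K same-domain hazy samples -> one regressed dehazing function.

    adapt_net and dehaze_net keep their weights fixed and shared across domains
    after meta-training; only the aggregated z changes from task to task.
    """

    def __init__(self, adapt_net: nn.Module, dehaze_net: nn.Module, aggregate):
        super().__init__()
        self.adapt_net = adapt_net      # A: sample -> preliminary parameter v_k
        self.dehaze_net = dehaze_net    # F: (hazy image, z) -> haze-free image
        self.aggregate = aggregate      # permutation-invariant summary of {v_k}

    def forward(self, hazy: torch.Tensor) -> torch.Tensor:
        # hazy: (K, 3, H, W), all drawn from the same domain.
        v = self.adapt_net(hazy)                           # (K, C, h, w)
        z = self.aggregate(v)                              # (C, h, w), task-specific
        z = z.unsqueeze(0).expand(hazy.size(0), -1, -1, -1)
        return self.dehaze_net(hazy, z)                    # (K, 3, H, W)
```

Passing `lambda v: v.mean(dim=0)` as `aggregate` recovers ARM's average aggregator; Section 3.3 replaces it with the distance-aware variant.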

3.2 Adaptation Network

As exhibited in Figure 1, the task-specific parameter $z_j$ of $\mathcal{T}_j$ is obtained through our adaptation network $A$ as well as our aggregator. $A$ explores the domain properties of the samples in $\mathcal{T}_j$ and stores them in a series of preliminary parameters $\{v_k\}$. The aggregator summarizes the internal information hidden in $\{v_k\}$ to obtain $z_j$. In this section, we focus on our designed $A$ and discuss our aggregator in the next section.

Figure 3: UMAP feature visualization of 12000 images randomly sampled from the OTS, ITS and RTTS datasets (Li et al., 2019), which are denoted as label 0, label 1 and label 2, respectively. The features are generated by a pretrained ResNet101 (He et al., 2016). Some outliers of the OTS dataset are highlighted by the green dotted circles.

The architecture of $A$ is displayed in Figure 2. Since the internal information of input images is generally covered by haze, directly employing vanilla convolution layers, which adopt a local perspective to extract domain information, may lead to unreliable preliminary parameters with poor internal information. In this work, CG-Conv (Lin et al., 2020) is introduced to compose the principal part of $A$. By adopting CG-Conv layers, the extracted internal information is capable of integrating context information from entire images, so as to access more robust preliminary parameters with richer domain properties. In particular, $A$ consists of two CG-Conv layers, each followed by batch normalization (BN) and a ReLU activation function, where the BN layer is employed to accelerate the convergence of the network. Finally, a conventional convolution layer outputs the preliminary parameter $v_k$ for each input sample.
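A minimal sketch of the adaptation network follows. The genuine CG-Conv of Lin et al. (2020) modulates the convolution kernels themselves with a global-context vector; since reproducing it faithfully is beyond this sketch, we substitute a lightweight squeeze-and-excitation-style gate as a stand-in, keeping the two-block-plus-output-convolution layout described above. All channel widths are assumptions.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Simplified surrogate for a CG-Conv layer: a vanilla convolution whose
    output is gated by a global-context descriptor of the whole image.
    The real CG-Conv (Lin et al., 2020) instead modulates the kernel weights."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global context of the image
            nn.Conv2d(in_ch, out_ch, 1), nn.Sigmoid()
        )
        self.post = nn.Sequential(nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.post(self.conv(x) * self.gate(x))

class AdaptationNet(nn.Module):
    """Two context-gated blocks followed by a plain convolution that emits
    the preliminary parameter v_k for each input sample."""

    def __init__(self, in_ch: int = 3, mid_ch: int = 32, out_ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            GatedConvBlock(in_ch, mid_ch),
            GatedConvBlock(mid_ch, mid_ch),
            nn.Conv2d(mid_ch, out_ch, 3, padding=1),  # outputs v_k
        )

    def forward(self, x):        # x: (K, 3, H, W)
        return self.body(x)      # v: (K, out_ch, H, W)
```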

3.3 Distance-Aware Aggregation

Apart from normal samples, there are some outliers in hazy domains. The features generated from outliers are usually located far away from those of normal samples, as shown in the green dotted circles in Figure 3, and fail to capture representative domain-specific internal properties. Thus, it is necessary for the aggregator to suppress the misleading information of outliers. A commonly-used method to aggregate $\{v_k\}_{k=1}^{M}$ into $z_j$ is to treat the internal properties extracted from each sample equally and employ an average operation (Garnelo et al., 2018a, b; Zhang et al., 2021; Ye and Yao, 2022):

$z_j = \frac{1}{M} \sum_{k=1}^{M} v_k. \quad (1)$

However, when encountering outliers, the average aggregator may lead $z_j$ to deviate from $z_i^{*}$, as depicted in Figure 4(a) and (b).

Figure 4: Comparison between the average operation and our distance-aware aggregation. The blue ellipse represents the distribution of $z_j$ estimated from samples in $\mathcal{T}_j$.

Considering the feature distribution of outliers and normal samples, we propose a non-parametric distance-aware aggregation operation to alleviate the adverse effects caused by outliers. Specifically, we aim to reduce the contribution of outliers to $z_j$ and heighten that of normal samples. We first calculate the average distance $d_k$ between the $k$-th preliminary parameter $v_k$ and the remaining ones of $\mathcal{T}_j$:

$d_k = \frac{1}{M-1} \sum_{l=1, l \neq k}^{M} \| v_k - v_l \|_2, \quad (2)$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm. Thus, we can obtain a set of distance values $\{d_k\}_{k=1}^{M}$ for the samples in $\mathcal{T}_j$. The larger the value of $d_k$, the higher the probability that $v_k$ can be regarded as an outlier and the more the weight coefficient of $v_k$ needs to be reduced. Then, we reset the weights of $\{v_k\}$ according to the computed $\{d_k\}$, and obtain $z_j$ by summing the reweighted preliminary parameters:

$z_j = \sum_{k=1}^{M} w_k v_k, \quad w_k = \frac{\exp(-d_k)}{\sum_{l=1}^{M} \exp(-d_l)}. \quad (3)$

In this way, $z_j$ can capture more representative internal information from normal samples and less misleading information from outliers, which encourages $z_j$ to be located closer to $z_i^{*}$, as illustrated in Figure 4(c).
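A sketch of the distance-aware aggregation under one concrete weighting choice: Eq. (2)'s average distances are turned into weights by a softmax over their negatives, which realizes the required monotone down-weighting of outliers (the softmax form itself is our assumption).

```python
import torch

def distance_aware_aggregate(v: torch.Tensor) -> torch.Tensor:
    """Aggregate preliminary parameters while down-weighting outliers.

    v: (M, C, h, w) preliminary parameters of one task (assumes M > 1).
    Returns the task-specific parameter z of shape (C, h, w).
    """
    m = v.size(0)
    flat = v.reshape(m, -1)                                   # (M, D)
    # Average L2 distance from each v_k to the remaining ones, Eq. (2);
    # the self-distance on the diagonal is zero and drops out of the sum.
    d = torch.cdist(flat, flat, p=2).sum(dim=1) / (m - 1)
    # Outliers (large d_k) receive small weights; normal samples large ones.
    w = torch.softmax(-d, dim=0)                              # (M,)
    return (w.view(m, 1, 1, 1) * v).sum(dim=0)                # Eq. (3)
```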

3.4 Domain-Relevant Contrastive Regularization

To regress a more discriminative and representative task-specific parameter, in this paper, we incorporate inter-domain information with intra-domain properties. We note that each task-specific parameter $z_j$ is a linear combination of the corresponding preliminary parameters and can be regarded as a collection of internal features. It is reasonable to assume that task-specific parameters generated from the same domain have more similar internal features, while those from different domains have more distinct features. Based on this assumption, we introduce contrastive learning (He et al., 2020; Chen et al., 2020) and propose a novel domain-relevant contrastive regularization to capture the intra-domain homogeneity and inter-domain heterogeneity. Figure 5 provides an example to depict our domain-relevant contrastive regularization.

For the sake of measuring the feature similarities of task-specific parameters generated from non-aligned tasks, we draw lessons from studies in image-to-image translation (Zhang et al., 2020c; Zhan et al., 2021) and employ the contextual loss:

$\ell_{cx}(z_a, z_b) = -\log \Big( \frac{1}{M_f} \sum_{n} \max_{m} \mathrm{CX}(z_a^{m}, z_b^{n}) \Big), \quad (4)$

where $m$ and $n$ are the indexes of the feature maps $z_a^{m}$ in $z_a$ and $z_b^{n}$ in $z_b$, respectively, $M_f$ is the number of feature maps, and $\mathrm{CX}(\cdot, \cdot)$ is the contextual similarity measurement commonly adopted in image-to-image translation (Mechrez et al., 2018).
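A compact sketch of this contextual distance, adapted from the formulation of Mechrez et al. (2018), is given below; the bandwidth value and the centering step are assumptions carried over from that work.

```python
import torch
import torch.nn.functional as F

def cx_dist(za: torch.Tensor, zb: torch.Tensor, h: float = 0.5,
            eps: float = 1e-5) -> torch.Tensor:
    """Contextual distance between two parameters viewed as bags of feature
    maps; smaller output means more similar.

    za, zb: (C, H, W); each of the C maps is flattened into one feature vector.
    h is the bandwidth of the similarity kernel.
    """
    X = za.flatten(1)                              # (C, H*W)
    Y = zb.flatten(1)
    X = F.normalize(X - X.mean(dim=0), dim=1)      # centered cosine features
    Y = F.normalize(Y - Y.mean(dim=0), dim=1)
    d = 1.0 - X @ Y.t()                            # (C, C) cosine distances d_mn
    d_rel = d / (d.min(dim=1, keepdim=True).values + eps)   # relative distances
    w = torch.exp((1.0 - d_rel) / h)
    cx = w / w.sum(dim=1, keepdim=True)            # contextual similarity CX_mn
    # For every feature map of zb, find its best match in za, then average.
    return -torch.log(cx.max(dim=0).values.mean() + eps)    # Eq. (4)
```

This function plays the role of $\ell_{cx}$ in the regularization below.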

Figure 5: An example to illustrate the domain-relevant contrastive regularization, where $z_a$ and $z_b$ are generated from the same domain and the remaining task-specific parameters are from other domains.

The remaining question is how to choose "positive" and "negative" pairs. Suppose that merely $z_a$ and $z_b$ in the batch are from the same domain $D_i$, while the others are from diverse domains except $D_i$. Our idea for the "positive" pair is to pick the more representative one of $z_a$ and $z_b$ and adopt it to guide the other to capture more representative internal information. To this end, we embed an additional classifier into our framework, as exhibited in Figure 1, which takes the task-specific parameters as inputs and attempts to predict their confidence scores. The higher the confidence score, the more representative the parameter is and the easier it is to be classified into the corresponding domain. If the confidence score of $z_a$ is higher than that of $z_b$, $z_a$ will be selected to assist $z_b$ in learning more representative internal information, and vice versa. For the "negative" pairs, the unpicked parameter is grouped with each task-specific parameter generated from the other domains to enhance its discrimination. Therefore, our domain-relevant contrastive regularization can be deduced as:

$\mathcal{L}_{dcr} = \frac{\ell_{cx}(\hat{z}, z^{+})}{\sum_{z^{-}} \ell_{cx}(\hat{z}, z^{-}) + \epsilon}, \quad (5)$

where the picked parameter $z^{+}$ is the one of $z_a$ and $z_b$ with the higher confidence score, $\hat{z}$ is the unpicked one, $z^{-}$ ranges over the task-specific parameters from other domains, and $\epsilon$ is a constant to avoid situations where the denominator becomes zero.
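Given the contextual distance above, the pair-selection logic could look as follows; the use of `detach()` on the guiding parameter (so that only the less confident one is pulled) is a design assumption, not something the paper specifies.

```python
import torch

def dcr_loss(z_a, z_b, conf_a, conf_b, negatives, eps=1e-7):
    """Domain-relevant contrastive regularization, Eq. (5) (illustrative).

    z_a, z_b:   task-specific parameters from the same domain.
    conf_a/b:   domain-classifier confidence scores for z_a / z_b.
    negatives:  task-specific parameters generated from other domains.
    cx_dist:    contextual distance of Eq. (4); smaller = more similar.
    """
    # The more confidently classified parameter guides the other one.
    guide, follower = (z_a, z_b) if conf_a >= conf_b else (z_b, z_a)
    pos = cx_dist(follower, guide.detach())            # pull towards the guide
    neg = sum(cx_dist(follower, z_n.detach()) for z_n in negatives)
    return pos / (neg + eps)                           # ratio form of Eq. (5)
```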

3.5 Loss Function

Besides the domain-relevant contrastive regularization $\mathcal{L}_{dcr}$, four other loss functions are employed in our experiments, including the pixel-wise loss $\mathcal{L}_{pix}$ (Dong et al., 2020b; Qin et al., 2020), the structure similarity loss $\mathcal{L}_{ssim}$ (Dong et al., 2020b), the contrastive regularization loss $\mathcal{L}_{cr}$ (Wu et al., 2021) and the cross entropy loss $\mathcal{L}_{ce}$. Therefore, the overall optimization function of a batch of sampled tasks is defined as

$\mathcal{L} = \mathcal{L}_{pix} + \lambda_1 \mathcal{L}_{ssim} + \lambda_2 \mathcal{L}_{cr} + \lambda_3 \mathcal{L}_{ce} + \lambda_4 \mathcal{L}_{dcr}, \quad (6)$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the trade-off weights.

3.5.1 Pixel-wise Loss.

Pixel-wise loss is employed to quantify the pixel-wise distance between the generated image and the ground truth:

$\mathcal{L}_{pix} = \frac{1}{BK} \sum_{j=1}^{B} \sum_{k=1}^{K} \| \hat{y}_{j,k} - y_{j,k} \|_1, \quad (7)$

where $\hat{y}_{j,k}$ is the haze-free image generated from the $k$-th sample in the $j$-th task.

3.5.2 Structure Similarity Loss.

Structure similarity loss quantifies the distance between the restored image and the ground truth in terms of brightness and contrast. It is defined as

$\mathcal{L}_{ssim} = \frac{1}{BK} \sum_{j=1}^{B} \sum_{k=1}^{K} \big( 1 - \mathrm{SSIM}(\hat{y}_{j,k}, y_{j,k}) \big), \quad (8)$

where $\mathrm{SSIM}(\cdot, \cdot)$ stands for the operation that calculates the structural similarity of two images.

3.5.3 Contrastive Regularization Loss.

We also employ the contrastive regularization (Wu et al., 2021) to further improve the quality of the restored images in the representation space:

$\mathcal{L}_{cr} = \sum_{i=1}^{n} \omega_i \frac{\| G_i(y) - G_i(\hat{y}) \|_1}{\| G_i(x) - G_i(\hat{y}) \|_1}, \quad (9)$

where $G$ denotes the fixed pre-trained feature extractor and $G_i$ is the $i$-th feature map from the feature extractor. $n$ is the number of feature maps, and $\omega_i$ is the balancing term of the contrastive regularization loss that relates to the $i$-th feature map.

3.5.4 Cross Entropy Loss.

Cross entropy loss is employed to train our classifier, which is defined as:

$\mathcal{L}_{ce} = -\sum_{i=1}^{N} p_i \log \frac{e^{q_i}}{\sum_{l=1}^{N} e^{q_l}}, \quad (10)$

where $p_i$ and $e^{q_i} / \sum_{l} e^{q_l}$ are the given probability and the estimated probability of domain $D_i$, respectively, $q_i$ is the classifier output for $D_i$, and $e$ is the Euler number.
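Putting Section 3.5 together, a sketch of the overall objective follows; the pairing of the trade-off weights with the individual losses, the L1 form of the pixel loss, and the `pytorch_msssim` dependency are assumptions, while the layer weights for $\mathcal{L}_{cr}$ follow Wu et al. (2021).

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim    # third-party SSIM, assumed dependency

CR_WEIGHTS = (1/32, 1/16, 1/8, 1/4, 1.0)   # per-layer weights from Wu et al. (2021)

def total_loss(pred, gt, hazy, logits, labels, dcr, vgg_feats,
               lambdas=(0.5, 0.1, 1.0, 0.5)):
    """Overall objective of Eq. (6), with an assumed loss-weight pairing.

    pred, gt, hazy: restored, ground-truth and hazy images, (B*K, 3, H, W).
    logits, labels: domain-classifier outputs and domain indices.
    dcr:            precomputed domain-relevant contrastive term, Eq. (5).
    vgg_feats:      callable returning the list of frozen VGG19 feature maps.
    """
    l_pix = F.l1_loss(pred, gt)                        # Eq. (7)
    l_ssim = 1.0 - ssim(pred, gt, data_range=1.0)      # Eq. (8)
    # Eq. (9): pull pred towards gt and away from the hazy input, layer-wise.
    l_cr = sum(w * F.l1_loss(fy, fp) / (F.l1_loss(fx, fp) + 1e-7)
               for w, fy, fp, fx in zip(CR_WEIGHTS, vgg_feats(gt),
                                        vgg_feats(pred), vgg_feats(hazy)))
    l_ce = F.cross_entropy(logits, labels)             # Eq. (10)
    l1, l2, l3, l4 = lambdas
    return l_pix + l1 * l_ssim + l2 * l_cr + l3 * l_ce + l4 * dcr
```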

4 Experiments

Methods | Year | Real | RTTS BRISQUE | RTTS NIQE | URHI BRISQUE | URHI NIQE | ESPW BRISQUE | ESPW NIQE | #Param | Runtime(s)
Hazy | - | - | 37.011* | 3.583* | 33.531 | 4.128 | 21.081 | 2.859 | - | -
DAD | 2020 | ✓ | 32.727* | 3.672* | - | - | 29.728 | 3.439 | 54.59M | 0.010
PSD | 2021 | ✓ | 25.239* | 3.077* | - | - | 23.616 | 3.259 | 33.11M | 0.024
AOD-Net | 2017 | - | 35.466 | 3.636 | 34.077 | 3.605 | 21.911 | 3.288 | 0.002M | 0.004
GDN | 2019 | - | 28.086 | 3.200 | 27.941 | 4.924 | 20.613 | 2.861 | 0.96M | 0.014
MSBDN | 2020 | - | 28.743* | 3.154* | 26.617 | 3.079 | 22.352 | 2.939 | 31.35M | 0.021
FFA-Net | 2020 | - | 30.183 | 3.050 | 26.141 | 3.265 | 22.753 | 2.959 | 4.68M | 0.087
AECR-Net | 2021 | - | 28.594 | 3.139 | 25.879 | 3.251 | 17.840 | 2.964 | 2.61M | 0.028
4kD | 2021 | - | 27.254 | 3.149 | 24.853 | 4.975 | 19.116 | 3.085 | 34.55M | 0.094
D4 | 2022 | - | 29.536 | 3.174 | 27.429 | 3.065 | 18.987 | 2.906 | 10.70M | 0.032
DeHamer | 2022 | - | 30.986 | 3.197 | 28.202 | 3.023 | 20.723 | 2.889 | 4.63M | 0.066
Ours | 2022 | - | 26.521 | 2.997 | 24.465 | 2.941 | 17.726 | 2.887 | 31.58M | 0.062

Table 2: Quantitative comparison of the SOTA dehazing models on real hazy datasets (Li et al., 2019; Ancuti et al., 2018, 2020).
  • “Real” denotes the access of real hazy images during the training stage.

  • “*” represents that the results are obtained from the existing paper (Chen et al., 2021).

Figure 6: Visual comparisons with conventional learning-based methods on real-world hazy images from RTTS dataset (Li et al., 2019).

4.1 Implementation Details

4.1.1 Datasets

RESIDE (Li et al., 2019) is a widely-used dataset for single image dehazing, which consists of five subsets, including Indoor Training Set (ITS), Outdoor Training Set (OTS), Synthetic Object Testing Set (SOTS), Real Task-driven Testing Set (RTTS) and Unannotated Real Hazy Images (URHI). Among them, ITS, OTS, and SOTS are synthesized by adjusting the values of scattering coefficient and atmospheric light artificially, while RTTS and URHI are taken in real hazy scenarios directly. In our experiments, we select 6000 pairs of images from ITS and OTS, respectively, and create our training set that is composed of 12000 pairs of synthetic samples. We define each synthetic set as a particular domain, as the average depth errors are different among diverse datasets (Li et al., 2019). Therefore, our training set can be structured into two domains. To evaluate the proposed method on real hazy images, RTTS and URHI are adopted in our experiments, where RTTS is composed of 4322 real hazy images and URHI consists of 4809 ones. We also evaluate our model on other real hazy images employed by some previous work (Fattal, 2014; He et al., 2010), which are denoted as ESPW in this paper.

4.1.2 Training Details

In our experiments, the coefficients $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are set to 0.5, 0.1, 1 and 0.5, respectively, to balance the value of each loss function. The constant $\epsilon$ in Eq. (5) is set to a small positive value. The feature extractor employed for the contrastive regularization loss is the frozen pre-trained VGG19, and the features are selected from the 1st, 3rd, 5th, 9th and 13th layers of the feature extractor with the coefficients $\omega_i$ set to 1/32, 1/16, 1/8, 1/4 and 1, respectively (Wu et al., 2021). Our proposed model is implemented in PyTorch and trained by the Adam optimizer. In each iteration, the model samples 2 tasks, and each task consists of 4 synthetic hazy images from the same domain. The initial learning rate is set to 0.0002 for the whole network. The inputs are cropped to a fixed size, and the training data is augmented by means of random rotation and random flip. For the network architecture, we follow the previous work MSBDN (Dong et al., 2020a) due to its compact architecture and effective performance. All experiments are conducted on an NVIDIA Tesla V100 GPU.
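The optimization setup could then be wired up as in the skeleton below; `model`, `loader` and `vgg_feats` are placeholders for the framework, the task sampler and the frozen VGG19 extractor, and the Adam betas are assumed defaults, while the learning rate and the 2-tasks-of-4-samples batch composition follow the text.

```python
import torch

# model, loader and vgg_feats are placeholders (see above); only the
# hyperparameters below are taken from the paper's training details.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4,
                             betas=(0.9, 0.999))    # betas: assumed defaults

for hazy, clear, domain_id in loader:  # each batch: 2 tasks x 4 same-domain pairs
    pred, logits, dcr = model(hazy, domain_id)      # assumed model interface
    loss = total_loss(pred, clear, hazy, logits, domain_id, dcr, vgg_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```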

Figure 7: Visual comparisons with conventional learning-based methods on real-world hazy images from URHI dataset (Li et al., 2019).
Figure 8: Visual comparisons with conventional learning-based methods on real hazy images from ESPW dataset (Fattal, 2014; He et al., 2010).

4.1.3 Competitors and Evaluation Metrics

Our proposed model is compared with open-source, state-of-the-art learning-based algorithms, including AOD-Net (Li et al., 2017), GridDehazeNet (GDN) (Liu et al., 2019b), MSBDN (Dong et al., 2020a), FFA-Net (Qin et al., 2020), AECR-Net (Wu et al., 2021), 4kDehazing (4kD) (Zheng et al., 2021), D4 (Yang et al., 2022) and DeHamer (Guo et al., 2022). Some image dehazing models based on domain adaptation, which have been pre-trained or fine-tuned on a large number of real hazy images, are also involved to assess our proposed model, such as DAD (Shao et al., 2020) and PSD (Chen et al., 2021). The results of the competitors are taken from existing papers where available; otherwise, they are generated with the pre-trained models provided by their authors.

Due to the absence of paired samples, the commonly-used structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) cannot be applied to assess the dehazing results on RTTS, URHI (Li et al., 2019) and ESPW (Fattal, 2014; He et al., 2010). Thus, we resort to blind evaluation metrics to provide a quantitative comparison on real hazy images. In particular, we adopt the blind/referenceless image spatial quality evaluator (BRISQUE) and the natural image quality evaluator (NIQE) in the following experiments. The lower the values of BRISQUE and NIQE, the higher the quality of the restored images.

4.2 Comparison with State-of-The-Art Methods

Figure 9: Visual comparisons with domain adaptation-based methods on real hazy images (Li et al., 2019).
Figure 10: Ablation study of our proposed model with different settings on the RTTS dataset (Li et al., 2019).

4.2.1 Quantitative Comparison

We first leverage BRISQUE and NIQE to compare the performance of our proposed model with the state-of-the-art competitors. Table 2 reveals the quantitative results of each participant on the RTTS, URHI (Li et al., 2019) and ESPW (Fattal, 2014; He et al., 2010) datasets. The best results of dehazing models trained merely on synthetic samples are shown in bold, and the runtime of each method is obtained by averaging the cost over 1000 images of the same size. Due to their exposure to the URHI dataset during training or fine-tuning, DAD (Shao et al., 2020) and PSD (Chen et al., 2021) are only evaluated on the RTTS and ESPW datasets. It can be clearly observed that our proposed model performs better than the other conventional learning-based algorithms when trained only on synthetic data. In addition, our model is competitive with the domain adaptation-based algorithms, which have access to real hazy images before testing. Compared with our backbone MSBDN (Dong et al., 2020a), our model achieves considerable performance gains on real hazy images with a small amount of additional parameters and runtime cost.

Methods | Real | mAP (%) | Gain
Hazy | - | 63.32 | -
DAD | ✓ | 65.02 | +1.70
PSD | ✓ | 65.84 | +2.52
AOD-Net | - | 60.45 | -2.87
GDN | - | 63.59 | +0.27
MSBDN | - | 65.16 | +1.84
FFA-Net | - | 64.44 | +1.12
AECR-Net | - | 65.39 | +2.07
4kD | - | 64.53 | +1.21
D4 | - | 63.44 | +0.12
DeHamer | - | 64.63 | +1.31
Ours | - | 65.55 | +2.23
Table 3: Quantitative comparison of object detection on RTTS dataset (Li et al., 2019).

4.2.2 Qualitative Comparison

In this section, we provide qualitative results to further evaluate the performance of our proposed model. Figure 6, Figure 7 and Figure 8 exhibit the visual comparisons of our model and the conventional learning-based algorithms on the RTTS, URHI and ESPW datasets (Li et al., 2019), respectively. It can be seen that our proposed model obtains higher-quality dehazing results from both global and local perspectives, restoring images with less color distortion and more detailed textures. Our model builds on MSBDN (Dong et al., 2020a) yet is capable of restoring clearer objects with less residual haze, which demonstrates that utilizing the internal information of real domains helps boost model performance on real hazy images. Compared with our model, the competing algorithms struggle more with thick haze and color shift in hazy inputs, leading to erroneously handled areas and a darker visual appearance in their restored images. We also exhibit the dehazing results of the algorithms based on domain adaptation in Figure 9. It can be clearly found that although our model is neither pre-trained nor fine-tuned on real hazy images, its restored image has less residual haze than that of PSD (Chen et al., 2021) and visual effects similar to those of DAD (Shao et al., 2020).

Settings | URHI BRISQUE | URHI NIQE
Baseline | 26.537 | 2.998
Baseline + A(CNN) + Avg | 26.470 | 2.983
Baseline + A(CG) + Avg | 26.012 | 2.964
Baseline + A(CG) + Dist | 25.791 | 2.955
Baseline + A(CG) + Dist + DCR (our full model) | 24.465 | 2.941
Table 4: Ablation study of our proposed model with different settings on URHI dataset (Li et al., 2019).
Figure 11: Visual comparisons on other similar tasks. The inputs of (a) and (d) are downloaded from the Internet, and the inputs of (b) and (c) are from the relevant existing datasets of nighttime image dehazing (Li et al., 2015; Zhang et al., 2017, 2020b) and underwater image enhancement (Islam et al., 2020).

4.2.3 Evaluation on Object Detection

We further evaluate the generalization ability of the dehazing algorithms according to the performance improvement they bring to an object detection model. We adopt the RTTS dataset (Li et al., 2019), which is composed of real hazy images with annotations of object categories as well as bounding boxes. We use different dehazing algorithms to restore and enhance the images of RTTS (Li et al., 2019) separately to improve the quality of the test samples. We leverage YOLOv3 (Redmon and Farhadi, 2018) to detect objects on the generated haze-free results and calculate the mean average precision (mAP). Table 3 shows the detection accuracy and performance gains of YOLOv3 (Redmon and Farhadi, 2018) on the diverse dehazing results. It can be noticed that the mAP declines significantly on the images restored by AOD-Net (Li et al., 2017), suggesting that AOD-Net (Li et al., 2017) tends to cause performance degradation of the object detection model. Moreover, compared with the remaining learning-based competitors, the mAP value achieved with our model reaches the top accuracy, which implies that the images generated by our model are more amenable to downstream perception. Furthermore, in terms of the evaluation driven by this downstream visual task, our model is comparable to the domain adaptation-based methods (Shao et al., 2020; Chen et al., 2021).

4.3 Ablation Study

In this section, we conduct comprehensive ablation studies on the RTTS dataset (Li et al., 2019) to illustrate the effectiveness of the different elements in our proposed model. Figure 10 and Table 4 exhibit the qualitative and quantitative results generated by our proposed model with diverse settings, respectively. Taking the differences in loss functions into account, we retrain MSBDN (Dong et al., 2020a) with our loss functions (denoted as the baseline), so that the evaluation interference caused by loss functions can be eliminated. A(CNN) and A(CG) stand for the adaptation network built with conventional convolutions and with CG-Conv, respectively. Avg and Dist denote the average and the distance-aware aggregation, respectively. DCR is the domain-relevant contrastive regularization. It can be clearly observed in Figure 10 that merely adopting the adaptation network and the average aggregator leaves residual haze in the restored image. The presented distance-aware aggregator significantly alleviates the residual-haze effect but darkens the generated images, leading to poor visual perception. Instead, the combination of the proposed adaptation network, the distance-aware aggregator and the domain-relevant contrastive regularization produces clearer and higher-quality images. Table 4 also illustrates the effectiveness of our proposed elements.

4.4 Application to Other Low-Level Tasks

Apart from single image dehazing, we explore some other low-level computer vision tasks, including single image deraining, nighttime image dehazing, underwater image enhancement, and single image glare removal, to evaluate the performance gains of our model compared with the backbone MSBDN (Dong et al., 2020a). Figure 11 shows the results generated by MSBDN (Dong et al., 2020a) and our model. It can be seen that our modifications to MSBDN (Dong et al., 2020a) not only improve the performance on real hazy samples, but also enhance the generalization capability on other similar low-level tasks.

5 Conclusions

In this work, we propose a domain generalization framework via model-based meta-learning for single image dehazing. By combining both our adaptation network and distance-aware aggregator with the dehazing network, our model can dig out representative internal information from a specific real domain. In addition, we present a domain-relevant contrastive regularization to facilitate external variables to capture more discriminative information of domains, contributing to a more powerful dehazing function for the given domain. The extensive experiments demonstrate that our proposed model outperforms the state-of-the-art competitors on real hazy domains.

References

  • Ancuti et al. (2020) Ancuti, C.O., Ancuti, C., Timofte, R., 2020. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 444–445.

  • Ancuti et al. (2018) Ancuti, C.O., Ancuti, C., Timofte, R., De Vleeschouwer, C., 2018. O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 754–762.
  • Cai et al. (2016) Cai, B., Xu, X., Jia, K., Qing, C., Tao, D., 2016. DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing 25, 5187–5198.
  • Chen et al. (2022) Chen, R., Gao, N., Vien, N.A., Ziesche, H., Neumann, G., 2022. Meta-learning regrasping strategies for physical-agnostic objects. arXiv preprint arXiv:2205.11110 .
  • Chen et al. (2020) Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: Proceedings of the International Conference on Machine Learning, pp. 1597–1607.

  • Chen et al. (2021) Chen, Z., Wang, Y., Yang, Y., Liu, D., 2021. PSD: Principled synthetic-to-real dehazing guided by physical priors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7180–7189.
  • Chi et al. (2021) Chi, Z., Wang, Y., Yu, Y., Tang, J., 2021. Test-time fast adaptation for dynamic scene deblurring via meta-auxiliary learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9137–9146.
  • Dong et al. (2020a) Dong, H., Pan, J., Xiang, L., Hu, Z., Zhang, X., Wang, F., Yang, M.H., 2020a. Multi-scale boosted dehazing network with dense feature fusion, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2157–2167.
  • Dong et al. (2020b) Dong, Y., Liu, Y., Zhang, H., Chen, S., Qiao, Y., 2020b. FD-GAN: Generative adversarial networks with fusion-discriminator for single image dehazing, in: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10729–10736.

  • Fattal (2014) Fattal, R., 2014. Dehazing using color-lines. ACM Transactions on Graphics 34, 1–14.
  • Finn et al. (2017) Finn, C., Abbeel, P., Levine, S., 2017. Model-agnostic meta-learning for fast adaptation of deep networks, in: Proceedings of the International Conference on Machine Learning, pp. 1126–1135.
  • Gao et al. (2022) Gao, N., Ziesche, H., Vien, N.A., Volpp, M., Neumann, G., 2022. What matters for meta-learning vision regression tasks?, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14776–14786.
  • Garnelo et al. (2018a) Garnelo, M., Rosenbaum, D., Maddison, C., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y., Rezende, D., Eslami, S., 2018a. Conditional neural processes, in: Proceedings of the International Conference on Machine Learning, pp. 1704–1713.
  • Garnelo et al. (2018b) Garnelo, M., Schwarz, J., Rosenbaum, D., Viola, F., Rezende, D., Eslami, S., Teh, Y., 2018b. Neural processes. arXiv:1807.01622 .
  • Gróf et al. (2022) Gróf, T., Bauer, P., Watanabe, Y., 2022. Positioning of aircraft relative to unknown runway with delayed image data, airdata and inertial measurement fusion. Control Engineering Practice 125, 105211.
  • Guo et al. (2022) Guo, C.L., Yan, Q., Anwar, S., Cong, R., Ren, W., Li, C., 2022. Image dehazing transformer with transmission-aware 3D position embedding, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5812–5820.
  • He et al. (2020) He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738.
  • He et al. (2010) He, K., Sun, J., Tang, X., 2010. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 2341–2353.
  • He et al. (2016) He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • Huisman et al. (2021) Huisman, M., Van Rijn, J.N., Plaat, A., 2021. A survey of deep meta-learning. Artificial Intelligence Review 54, 4483–4541.
  • Ifqir et al. (2022) Ifqir, S., Combastel, C., Zolghadri, A., Alcalay, G., Goupil, P., Merlet, S., 2022. Fault tolerant multi-sensor data fusion for autonomous navigation in future civil aviation operations. Control Engineering Practice 123, 105132.
  • Islam et al. (2020) Islam, M., Xia, Y., Sattar, J., 2020. Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters 5, 3227–3234.
  • Kaleli (2020) Kaleli, A., 2020. Development of the predictive based control of an autonomous engine cooling system for variable engine operating conditions in SI engines: design, modeling and real-time application. Control Engineering Practice 100, 104424.
  • Lee et al. (2020) Lee, B., Lee, K., Oh, J., Kweon, I., 2020. CNN-based simultaneous dehazing and depth estimation, in: Proceedings of the IEEE International Conference on Robotics and Automation, pp. 9722–9728.
  • Li et al. (2017) Li, B., Peng, X., Wang, Z., Xu, J., Feng, D., 2017. AOD-Net: All-in-one dehazing network, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 4770–4778.
  • Li et al. (2019) Li, B., Ren, W., Fu, D., Tao, D., Feng, D., Zeng, W., Wang, Z., 2019. Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing 28, 492–505.
  • Li et al. (2020) Li, R., Zhang, X., You, S., Li, Y., 2020. Learning to dehaze from realistic scene with a fast physics-based dehazing network. arXiv:2004.08554 .
  • Li et al. (2015) Li, Y., Tan, R., Brown, M., 2015. Nighttime haze removal with glow and multiple light colors, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 226–234.
  • Lin et al. (2020) Lin, X., Ma, L., Liu, W., Chang, S., 2020. Context-gated convolution, in: Proceedings of the European Conference on Computer Vision, Springer. pp. 701–718.
  • Liu et al. (2022) Liu, H., Wu, Z., Li, L., Salehkalaibar, S., Chen, J., Wang, K., 2022. Towards multi-domain single image dehazing via test-time training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5831–5840.
  • Liu et al. (2019a) Liu, S., Davison, A., Johns, E., 2019a. Self-supervised generalisation with meta auxiliary learning. Advances in Neural Information Processing Systems 32.
  • Liu et al. (2019b) Liu, X., Ma, Y., Shi, Z., Chen, J., 2019b. GridDehazeNet: Attention-based multi-scale network for image dehazing, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7314–7323.
  • McCartney (1976) McCartney, E., 1976. Optics of the atmosphere: Scattering by molecules and particles. John Wiley and Sons: New York .
  • Mechrez et al. (2018) Mechrez, R., Talmi, I., Zelnik-Manor, L., 2018. The contextual loss for image transformation with non-aligned data, in: Proceedings of the European Conference on Computer Vision, pp. 768–783.
  • Narasimhan and Nayar (2000) Narasimhan, S., Nayar, S., 2000. Chromatic framework for vision in bad weather, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 598–605.
  • Narasimhan and Nayar (2002) Narasimhan, S., Nayar, S., 2002. Vision and the atmosphere. International Journal of Computer Vision 48, 233–254.
  • Qin et al. (2020) Qin, X., Wang, Z., Bai, Y., Xie, X., Jia, H., 2020. FFA-Net: Feature fusion attention network for single image dehazing, in: Proceedings of the AAAI Conference on Artificial Intelligence.
  • Redmon and Farhadi (2018) Redmon, J., Farhadi, A., 2018. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 .
  • Ren et al. (2016) Ren, W., Liu, S., Zhang, H., Pan, J., Cao, X., Yang, M., 2016. Single image dehazing via multi-scale convolutional neural networks, in: Proceedings of the European Conference on Computer Vision, Springer. pp. 154–169.

  • Sakaridis et al. (2018) Sakaridis, C., Dai, D., Gool, L., 2018. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision 126, 973–992.
  • Shao et al. (2020) Shao, Y., Li, L., Ren, W., Gao, C., Sang, N., 2020. Domain adaptation for image dehazing, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2808–2817.
  • Soh et al. (2020) Soh, J., Cho, S., Cho, N., 2020. Meta-transfer learning for zero-shot super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3516–3525.

  • Sun et al. (2022) Sun, Q., Yen, G.G., Tang, Y., Zhao, C., 2022. Learn to adapt for monocular depth estimation. arXiv preprint arXiv:2203.14005 .
  • Sun et al. (2020) Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A., Hardt, M., 2020. Test-time training with self-supervision for generalization under distribution shifts, in: Proceedings of the International Conference on Machine Learning, pp. 9229–9248.
  • Tang et al. (2022) Tang, Y., Zhao, C., Wang, J., Zhang, C., Sun, Q., Zheng, W., Du, W., Qian, F., Kurths, J., 2022. An overview of perception and decision-making in autonomous systems in the era of learning. IEEE Transactions on Neural Networks and Learning Systems.

  • Tang et al. (2021) Tang, Z., Cunha, R., Cabecinhas, D., Hamel, T., Silvestre, C., 2021. Quadrotor going through a window and landing: An image-based visual servo control approach. Control Engineering Practice 112, 104827.
  • Van Dooren et al. (2022) Van Dooren, S., Duhr, P., Amstutz, A., Onder, C.H., 2022. Optimal control of real driving emissions. Control Engineering Practice 127, 105269.
  • Wu et al. (2021) Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., Xie, Y., Ma, L., 2021. Contrastive learning for compact single image dehazing, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10551–10560.
  • Wu et al. (2022) Wu, J., Jin, Z., Liu, A., Yu, L., Yang, F., 2022. A hybrid deep-Q-network and model predictive control for point stabilization of visual servoing systems. Control Engineering Practice 128, 105314.
  • Yang et al. (2022) Yang, Y., Wang, C., Liu, R., Zhang, L., Guo, X., Tao, D., 2022. Self-augmented unpaired image dehazing via density and depth decomposition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2037–2046.
  • Ye and Yao (2022) Ye, Z., Yao, L., 2022. Contrastive conditional neural processes, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9687–9696.
  • Zhan et al. (2021) Zhan, F., Yu, Y., Cui, K., Zhang, G., Lu, S., Pan, J., Zhang, C., Ma, F., Xie, X., Miao, C., 2021. Unbalanced feature transport for exemplar-based image translation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15028–15038.
  • Zhang et al. (2020a) Zhang, C., Wang, J., Yen, G.G., Zhao, C., Sun, Q., Tang, Y., Qian, F., Kurths, J., 2020a. When autonomous systems meet accuracy and transferability through AI: A survey. Patterns 1, 100050.
  • Zhang and Patel (2018) Zhang, H., Patel, V., 2018. Densely connected pyramid dehazing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3194–3203.
  • Zhang et al. (2022) Zhang, H., Zhao, C., Ding, J., 2022. Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model. Control Engineering Practice 127, 105302.
  • Zhang et al. (2017) Zhang, J., Cao, Y., Fang, S., Kang, Y., Wen Chen, C., 2017. Fast haze removal for nighttime image using maximum reflectance prior, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7418–7426.
  • Zhang et al. (2020b) Zhang, J., Cao, Y., Zha, Z., Tao, D., 2020b. Nighttime dehazing with a synthetic benchmark, in: Proceedings of the 28th ACM International Conference on Multimedia, pp. 2355–2363.
  • Zhang et al. (2021) Zhang, M., Marklund, H., Dhawan, N., Gupta, A., Levine, S., Finn, C., 2021. Adaptive risk minimization: Learning to adapt to domain shift. Advances in Neural Information Processing Systems 34.
  • Zhang et al. (2020c) Zhang, P., Zhang, B., Chen, D., Yuan, L., Wen, F., 2020c. Cross-domain correspondence learning for exemplar-based image translation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5143–5153.
  • Zhang et al. (2019) Zhang, T., Fu, Y., Wang, L., Huang, H., 2019. Hyperspectral image reconstruction using deep external and internal learning, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8559–8568.
  • Zhao et al. (2022a) Zhao, C., Tang, Y., Sun, Q., 2022a. Unsupervised monocular depth estimation in highly complex environments. IEEE Transactions on Emerging Topics in Computational Intelligence, 1–10.
  • Zhao et al. (2022b) Zhao, C., Zhang, Y., Poggi, M., Tosi, F., Guo, X., Zhu, Z., Huang, G., Tang, Y., Mattoccia, S., 2022b. MonoViT: Self-supervised monocular depth estimation with a vision transformer.
  • Zheng et al. (2021) Zheng, Z., Ren, W., Cao, X., Hu, X., Wang, T., Song, F., Jia, X., 2021. Ultra-high-definition image dehazing via multi-guided bilateral learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16180–16189.