
Domain Decorrelation with Potential Energy Ranking

07/25/2022
by   Sen Pei, et al.

Machine learning systems, especially methods based on deep learning, enjoy great success in modern computer vision tasks under experimental settings. Generally, these classic deep learning methods are built on the i.i.d. assumption, supposing the training and test data are drawn independently from an identical distribution. However, the i.i.d. assumption rarely holds in real-world scenarios, which leads to sharp performance decay of deep learning algorithms. Behind this, domain shift is one of the primary factors to blame. To tackle this problem, we propose using Potential Energy Ranking (PoER) to decouple the object feature and the domain feature (i.e., appearance feature) in given images, promoting the learning of label-discriminative features while filtering out the irrelevant correlations between the objects and the background. PoER helps the neural networks to capture label-related features which contain the domain information first in shallow layers and then distills the label-discriminative representations out progressively, enforcing the neural networks to be aware of the characteristics of objects and background, which is vital to the generation of domain-invariant features. PoER reports superior performance on domain generalization benchmarks, improving the average top-1 accuracy by at least 1.20% compared to existing methods. Moreover, we use PoER in the ECCV 2022 NICO Challenge [https://nicochallenge.com], achieving top place with only a vanilla ResNet-18. The code has been made available at https://github.com/ForeverPs/PoER.


Introduction

Deep learning methods have proven increasingly effective in many difficult machine learning tasks, such as large-scale image classification Cun et al. (1990); Krizhevsky et al. (2012); Simonyan and Zisserman (2015); He et al. (2016); Huang et al. (2017), object detection Girshick et al. (2014); Redmon et al. (2016); Liu et al. (2016); Carion et al. (2020), and image generation Kingma and Welling (2014); Dinh et al. (2015); Goodfellow et al. (2020); Mirza and Osindero (2014); Gulrajani et al. (2017); Ho et al. (2020). Generally, the human-surpassing performance that deep neural networks enjoy benefits greatly from the i.i.d. assumption, which supposes the training and test data are drawn independently from the same distribution. Unfortunately, in open-world scenarios, it is difficult to guarantee that the i.i.d. assumption always holds, which leads to sharp performance decay in the presence of inputs from unseen domains. Formally, this problem is termed domain generalization (DG). Given several labeled domains (a.k.a. source domains), DG aims to train classifiers only with data from these labeled source domains that can generalize well to any unseen target domains. Different from the closely related domain adaptation (DA) task, DG has no access to data from target domains, while DA can use such data for finetuning, namely the adaptation step.

(a) category-wise
(b) domain-wise
Figure 1: Feature distribution of the proposed PoER. A vanilla ResNet-18 He et al. (2016) is trained on the PACS Li et al. (2017) dataset, and the images above show the category-wise and domain-wise feature distributions. We remove the conventional classification head and perform clustering on the output features. (a): Feature distribution across different categories. It is clear that the final outputs of PoER are pure label-related representations. (b): Feature distribution across domains. PoER first makes the feature extractor aware of the characteristics of both label-related and domain-related information, then filters out the domain appearance progressively with the stacked convolutional blocks, achieving feature alignment for better generalization ability.

Commonly, a straightforward way to deal with the domain generalization problem is to collect as much data as possible from diverse domains for training. However, this solution is costly and impractical in some fields, and no matter how much data is collected, deep neural networks deployed in open-world scenarios will inevitably encounter out-of-distribution (OOD) inputs never exposed during the training phase. Beyond this solution, the schemes aiming to mitigate the negative effects of domain shifts can be roughly divided into three categories: data augmentation schemes, representation learning techniques, and optimization methods. Data augmentation schemes such as Zhou et al. (2020a, b) and Zhou et al. (2021b) mainly generate auxiliary synthetic data for training, improving the robustness and generalization ability of classifiers. Representation learning, represented by feature alignment Ganin et al. (2015), enforces the neural networks to capture domain-invariant features, removing the irrelevant correlations between objects and background (i.e., domain appearance). There is ample literature in this line of research, such as Shankar et al. (2018); Tzeng et al. (2014); Muandet et al. (2013); Sun and Saenko (2016) and Zhang et al. (2021). Apart from the previously mentioned research lines, as stated in Shen et al. (2021), optimization methods that are both model agnostic and data structure agnostic are established to guarantee worst-case performance under domain shifts. From an overall view, our proposed PoER belongs to the representation learning methods: it decouples the features of domains and objects progressively from the perspective of energy, making the classifiers aware of the characteristics of domains and objects, which in turn promotes the generation of domain-invariant features.

As the saying goes, know yourself and know your enemy, and you can fight a hundred battles with no danger of defeat. The main drawback of existing representation learning methods is that they pay much attention to the generation of domain-invariant features before knowing the characteristics of the domain itself. By comparison, PoER first makes the classifiers explicitly capture label-discriminative features containing domain information in shallow layers, and then, with the distillation ability of the stacked convolutional blocks, filters out the irrelevant correlations between objects and domains. From the perspective of potential energy, PoER enforces the features with identical domain labels to have lower energy differences (i.e., pair-potential) in shallow layers. Similarly, in deeper convolutional layers, the features are more label-related Zhou et al. (2021b) instead of domain-related, and PoER penalizes the classifiers if features with identical category labels are pushed far away from each other, i.e., exhibit greater pair-potential. The key contributions of this paper are summarized as follows:

  • A plug-and-play regularization term, namely PoER, is proposed for mitigating the negative effects of domain shifts, filtering out the irrelevant correlations between objects and domains, and allowing better generalization of deep neural networks. PoER can be easily combined with mainstream neural network architectures.

  • The proposed PoER reports superior performance on domain generalization benchmarks, reducing classification error by at least 1.20% compared with existing techniques. PoER is parameter-free and stable during training, promoting the learning of domain-invariant features.

  • We tackle the domain generalization problem from a new perspective, i.e., potential energy, wishing to introduce more insights to this line of research.

Figure 2: Domain generalization on the NICO dataset. NICO He et al. (2021) contains two super-categories, animals and vehicles. Animals are composed of 10 child categories while vehicles contain 9 sub-classes. Within each sub-class, there are 10 different domains. In the image above, backgrounds in different colors indicate different domains. Moreover, the newly released NICO++ Zhang et al. (2022) is a similar DG dataset.

Related Work

Domain Augmentation Schemes.

This line of research argues that diverse training data is the key to more generalizable neural networks. In DDAIG Zhou et al. (2020a), adversarial training schemes are used for generating images from unseen domains that serve as auxiliary data, boosting the generalization ability of conventional neural networks. In MixStyle Zhou et al. (2021b), InstanceNorm Ulyanov et al. (2016) and AdaIN Huang and Belongie (2017) are used for extracting domain-related features. The mixing operation between these features results in representations from novel domains, increasing the domain diversity of the training data (i.e., source domains). More recently, Style Neophile Kang et al. (2022) synthesizes novel styles constantly during training, addressing the limitation and maximizing the benefit of style augmentation. In SSAN Wang et al. (2022), different content and style features are reassembled into a stylized feature space, expanding the diversity of labeled data. Similarly, EFDM Zhang et al. (2022) proposes to match the empirical Cumulative Distribution Functions (eCDFs) of image features, mapping the representations from unseen domains to a specific feature space.

Figure 3: The proposed PoER framework. In the shallow layers, neural networks extract feature representations containing both label-related and domain-related information, and PoER pushes the features from different domains far away from each other, making the neural networks aware of the characteristics across domains. Following this, with the stacked convolutional blocks, PoER progressively enforces the features with identical category labels to be close to each other regardless of domain labels, filtering out the irrelevant correlation between objects and domains. The distilled, pure label-related feature is finally used for classification. In the image above, we use data within two categories and four domains as an example for depiction. The pair-potential describes the difference of potential energy between any given feature pairs.

Domain-invariant Representation Learning.

This is another research line for dealing with the domain generalization problem from the perspective of representation learning. In Ganin et al. (2015), the deep features are promoted to be discriminative for the main learning task and invariant with respect to the shift across domains. In Tzeng et al. (2014); Shankar et al. (2018) and Li et al. (2022), the neural networks are guided to extract domain-invariant features which can mitigate the effects of domain shifts. In Akada et al. (2022), self-supervised manners are extended for learning domain-invariant representations in depth estimation, yielding better generalization ability. Nguyen et al. (2021) obtains a domain-invariant representation by enforcing the representation network to be invariant under all transformation functions among domains. Also in LIRR Li et al. (2021a), the main idea is to simultaneously learn invariant representations and risks under the setting of Semi-DA. More recently, in CCT-Net Zhou et al. (2021), the authors employ confidence weighted pooling (CWP) to obtain coarse heatmaps which help generate category-invariant characteristics, enabling transferability from the source domain to the target domain. In PDEN Li et al. (2021b), multiple domains are progressively generated in order to simulate various photometric and geometric transforms in unseen domains. Benefiting from the contrastive learning scheme, the learned domain-invariant representations are well clustered.

Preliminaries

Problem Statement.

Suppose $\mathcal{X}$, $\mathcal{D}$, and $\mathcal{Y}$ are the space of raw training data, domain labels, and category labels respectively. A classifier $f_\theta$ parameterized by $\theta$ is defined as $f_\theta: \mathcal{X} \to \mathcal{Y}$. Domain generalization aims to sample data only from the joint distribution over $(\mathcal{X}, \mathcal{D}, \mathcal{Y})$ of the source domains for training while obtaining a model $f_\theta$ which can generalize well to unseen domains. The domain labels may or may not be used during training. Under the DG setting, $f_\theta$ has no access to data from target domains, which differs from the DA task.

Background of Potential Energy.

It is generally acknowledged that energy is stored in objects due to their position in a field, namely potential energy. Pair potential is a function that describes the difference in potential energy between two given objects. A cluster of objects achieves stability when the pair potential between its members is fairly low. Inspired by this principle, we treat the representations and the feature space of neural networks as objects and the potential field respectively. The classifier is expected to achieve stability where the energy difference (i.e., pair-potential) within the same domains and the same labels is lower. Accordingly, we enforce the neural network to capture representations explicitly containing the domain and the label information and, with the stacked convolutional blocks, to filter out the irrelevant correlation between label-related objects and the appearance. This means the pair potential increases across data with different category labels while decreasing across domains with identical category labels.

Methods

In this section, we first detail our energy-based modeling, elaborating the methodology and insights of PoER. Following this, we give the training and inference pipeline for clarity and reproducibility.

Energy-based Modeling

We treat the feature space as a potential field and describe the feature maps from different layers as $\mathbf{z}$. Formally, $d(\cdot,\cdot)$ is a metric function which measures the distance (i.e., potential difference) between any given feature pairs. Picking the $\ell_2$ distance as the measurement, the potential difference is obtained as:

$d(\mathbf{z}_i, \mathbf{z}_j) = \frac{1}{n}\,\lVert \mathbf{z}_i - \mathbf{z}_j \rVert_2^2$  (1)

where $\mathbf{z}_i$ and $\mathbf{z}_j$ represent the flattened feature maps extracted from identical layers of the neural network, and $n$ is their dimensionality. With an energy kernel $E(\cdot,\cdot)$, the potential difference in Eq. (1) is mapped to the pair-potential as shown below:

$E(\mathbf{z}_i, \mathbf{z}_j) = \exp\!\left(\frac{d(\mathbf{z}_i, \mathbf{z}_j)}{\sigma}\right)$  (2)

where $\sigma$ is a hyper-parameter and equals 1 by default. In the shallow layers, PoER pushes the features from identical categories and domains close to each other while keeping those from different categories or domains apart, aiming to make the neural networks aware of the characteristics of domains and objects. To build this awareness, PoER employs a margin-based ranking loss to supervise the discrimination across different domains and categories. Intuitively, features with identical category and domain labels should have lower pair-potential than those with different category or domain labels.
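To make the energy-based modeling concrete, here is a minimal NumPy sketch of the potential difference and the energy kernel, assuming a mean squared $\ell_2$ distance and an exponential kernel; the exact normalization in the paper may differ, and the function names are illustrative:

```python
import numpy as np

def potential_difference(z_i, z_j):
    # Eq. (1): mean squared l2 distance between flattened feature maps
    # (assumed normalization by the feature dimensionality n)
    z_i, z_j = np.ravel(z_i), np.ravel(z_j)
    return np.sum((z_i - z_j) ** 2) / z_i.size

def pair_potential(z_i, z_j, sigma=1.0):
    # Eq. (2): exponential energy kernel; sigma equals 1 by default
    return np.exp(potential_difference(z_i, z_j) / sigma)
```

Identical features yield a pair-potential of 1, and the potential grows monotonically with the feature distance.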

Suppose $\mathbf{z}_{c_i,d_j}$ is a feature representation from category $c_i$ and domain $d_j$, and $\mathbf{z}_{c_k,d_l}$ indicates a feature with category label $c_k$ and domain label $d_l$. Note that $c_i \neq c_k$ and $d_j \neq d_l$. To formalize the idea of PoER, we build the ranking order across these features as follows:

$E(\mathbf{z}_{c_i,d_j}, \mathbf{z}'_{c_i,d_j}) < E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_i,d_l}) < E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_k,d_j}) < E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_k,d_l})$  (3)

where $\mathbf{z}'_{c_i,d_j}$ indicates another feature with the identical category and domain label as $\mathbf{z}_{c_i,d_j}$. With the margin-based ranking loss, we combine the feature pairs above for calculating the pair-wise ranking loss. Formally, we have:

$\mathcal{L}_{rank}^{1} = \max\bigl(0,\; E(\mathbf{z}_{c_i,d_j}, \mathbf{z}'_{c_i,d_j}) - E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_i,d_l}) + m\bigr)$  (4)
$\mathcal{L}_{rank}^{2} = \max\bigl(0,\; E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_i,d_l}) - E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_k,d_j}) + m\bigr)$  (5)
$\mathcal{L}_{rank}^{3} = \max\bigl(0,\; E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_k,d_j}) - E(\mathbf{z}_{c_i,d_j}, \mathbf{z}_{c_k,d_l}) + m\bigr)$  (6)

where the margin $m$ is a positive scalar. The complete ranking loss in shallow layers is depicted as $\mathcal{L}_{rank} = \mathcal{L}_{rank}^{1} + \mathcal{L}_{rank}^{2} + \mathcal{L}_{rank}^{3}$.
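The margin-based ranking over the ordered energy levels can be sketched as hinge penalties on each adjacent pair of the chain. The pair-by-pair formulation and function names below are illustrative assumptions, not the paper's exact implementation:

```python
def margin_ranking_loss(e_low, e_high, margin=1.0):
    # Hinge penalty: zero once e_low is smaller than e_high by at least `margin`
    return max(0.0, e_low - e_high + margin)

def poer_ranking_loss(e_same, e_cross_domain, e_cross_class, e_cross_both,
                      margin=1.0):
    # Enforce the ranking chain: same cat & domain < same cat, diff domain
    # < diff cat, same domain < diff cat & domain, penalizing each
    # adjacent pair of energies that violates the order
    return (margin_ranking_loss(e_same, e_cross_domain, margin)
            + margin_ranking_loss(e_cross_domain, e_cross_class, margin)
            + margin_ranking_loss(e_cross_class, e_cross_both, margin))
```

When the four energies already satisfy the chain with the given margin, the loss vanishes; any inversion contributes a positive penalty.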

With the stacked convolutional blocks, PoER filters the domain-related information out progressively. In deeper layers, we enforce the features to distribute compactly intra-class and discretely inter-class, ignoring domain labels. We use $\mathbf{z}_{c_i}$ and $\mathbf{z}_{c_k}$ to denote data with category labels $c_i$ and $c_k$ respectively. Therefore, given a feature $\mathbf{z}_{c_i}$ from category $c_i$, the cluster loss is formulated as follows:

$\mathcal{L}_{cluster} = \max\bigl(0,\; E(\mathbf{z}_{c_i}, \mathbf{z}'_{c_i}) - E(\mathbf{z}_{c_i}, \mathbf{z}_{c_k}) + m\bigr)$  (7)

The PoER regularization is the sum of the aforementioned loss functions, i.e., $\mathcal{L}_{PoER} = \mathcal{L}_{rank} + \mathcal{L}_{cluster}$. Moreover, this regularization term can be easily calculated within each batch of the training data, since all of the combinations stated above exist there.
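As a rough illustration of the batch-wise cluster regularization, the sketch below enumerates intra-class and inter-class pairs inside one batch; the brute-force triple loop and the averaging are simplifications for clarity, not the paper's implementation:

```python
import numpy as np

def batch_cluster_loss(features, labels, margin=1.0, sigma=1.0):
    # Cluster loss over a batch: intra-class pair-potentials should
    # undercut inter-class ones by `margin`, regardless of domain labels.
    energy = lambda a, b: np.exp(np.mean((a - b) ** 2) / sigma)
    n = len(features)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(n):            # j: another sample of the same class
            for k in range(n):        # k: a sample of a different class
                if j != i and labels[j] == labels[i] and labels[k] != labels[i]:
                    loss += max(0.0, energy(features[i], features[j])
                                     - energy(features[i], features[k]) + margin)
                    count += 1
    return loss / max(count, 1)
```

For a batch where same-class features coincide and other-class features lie far away, the loss is zero; swapping the roles produces a positive penalty.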

Distance-based Classification

We classify the data in the GCPL Yang et al. (2018) manner since we regularize the features in neural networks. Let $\mathbf{z}$ be the feature from the penultimate layer of the discriminative model, such as a conventional ResNet-18, with shape $(1, n)$, and let $\mathbf{P}$ be the learnable prototypes with shape $(C, M, n)$, where $C$ and $M$ are the numbers of classes and prototypes per class. The distance is calculated between $\mathbf{z}$ and $\mathbf{P}$ along the last dimension of the prototypes, yielding a distance matrix $\mathbf{D}$ of shape $(C, M)$. For calculating the categorical cross-entropy loss, we pick the minimal distance within the prototypes of each class, i.e., $\mathbf{d}$ of shape $(C,)$ is acquired by selecting the minimum value along the last dimension of $\mathbf{D}$. Formally, the predicted probability that the given data belongs to category $k$ is built as:

$p(y = k \mid \mathbf{z}) = \frac{\exp(-d_k)}{\sum_{c=1}^{C} \exp(-d_c)}$  (8)

Suppose $\mathbb{1}(\cdot)$ is an indicator function that equals 1 if and only if the corresponding label of $\mathbf{z}$ is $k$ and 0 otherwise, and $y$ is the category label of feature $\mathbf{z}$. Therefore, the distance-based classification loss is formulated as:

$\mathcal{L}_{cls} = -\sum_{k=1}^{C} \mathbb{1}(y = k)\,\log p(y = k \mid \mathbf{z})$  (9)

The overall loss function of distance-based classification with PoER is of the form:

$\mathcal{L} = \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{PoER}$  (10)

where $\lambda$ is a robust hyper-parameter for balancing the regularization term and the classification error.
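A minimal sketch of the distance-based classification, assuming squared $\ell_2$ distances to the prototypes and a softmax over their negatives (the function names are illustrative):

```python
import numpy as np

def prototype_probs(z, prototypes):
    # z: (n,) feature; prototypes: (C, M, n) with M prototypes per class.
    # Squared l2 distance to every prototype, min over the M prototypes of
    # each class, then a softmax over the negated per-class distances.
    d2 = np.sum((prototypes - z) ** 2, axis=-1)   # (C, M)
    d_min = d2.min(axis=-1)                       # (C,)
    logits = -d_min
    e = np.exp(logits - logits.max())             # numerically stable softmax
    return e / e.sum()

def distance_ce_loss(z, prototypes, label):
    # Cross-entropy on the distance-based probabilities
    return -np.log(prototype_probs(z, prototypes)[label])
```

A feature near a class's prototype gets the highest probability for that class, and the cross-entropy loss is accordingly small for the true label.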

Training and Inference

We detail the training and inference pipeline in this section for easy reproduction and a clear understanding. As stated in the previous section, $\mathcal{X}$, $\mathcal{D}$, and $\mathcal{Y}$ are the space of image data, domain labels, and category labels respectively. We sample data from their joint distribution for training. Suppose $f$ is a feature extractor that returns the flattened features of each block in the neural network. The feature extractor needs no classification head since we employ the distance-based cluster manner for identification. The training and inference processes are summarized in Algorithm 1.

Input: training data sampled over $(\mathcal{X}, \mathcal{D}, \mathcal{Y})$, neural network $f$
Output: trained neural network $f$
1 while Training do
2       Sample a batch of data from the joint distribution;
3       Get the features from each block of $f$;
4       Calculate $\mathcal{L}_{rank}$ for features from the first three blocks of $f$ in a pair-wise manner;
5       Calculate $\mathcal{L}_{cluster}$ for features from the remaining blocks of $f$, including the last one;
6       Calculate $\mathcal{L}_{cls}$ with the feature from the last block of $f$, summing the losses up as in Eq. (10);
7       Update the parameters of $f$ with a gradient descent method.
8 while Inference do
9       Sample data from the testing set;
10      Get the feature from the last block of $f$;
11      Calculate the distance between the feature and the prototypes with Eq. (1);
12      Classify the given data using Eq. (8).
Algorithm 1 Potential energy ranking for the DG task.

It is worth noting that the classification is based only on the feature from the last block of $f$, while PoER performs regularization on features from all blocks.

Experiments

Experimental Setup

We use ResNet-18 He et al. (2016) without the classification head as the feature extractor $f$, and the backbone is pre-trained on ImageNet Russakovsky et al. (2015). $f$ has 5 blocks including the top convolutional layer. We reduce the dimension of the output feature of ResNet from 512 to 128 with a linear block. In summary, our $f$ returns 6 flattened features in total. The learning rate starts from 1e-4 and halves every 70 epochs. The batch size is set to 128. The hyper-parameter $\lambda$ in Eq. (10) is set to 0.1 during the first 70 epochs and 0.2 afterwards. Only RandomHorizontalFlip and ColorJitter are adopted as data augmentation schemes. The AdamW optimizer is used for training. Mean-std normalization is applied based on the ImageNet statistics. GCPL Yang et al. (2018) uses the same settings as stated above, and all other methods employ their default official settings. We store the models after the first 10 epochs based on the top-1 accuracy on the validation set. The number of prototypes is set to 3.

Dataset

We consider 4 benchmarks for evaluating the performance of our proposed PoER, namely PACS Li et al. (2017), VLCS Ghifary et al. (2015b), Digits-DG Zhou et al. (2021b), and Office-Home Venkateswara et al. (2017). On the NICO He et al. (2021) dataset, we report only the limited results of recent methods we collected. The datasets mentioned below can be downloaded via Dassl Zhou et al. (2021a).

PACS contains RGB images belonging to 7 categories within 4 domains: Photo, Art, Cartoon, and Sketch. Under DG settings, the model has no access to the target domain, and therefore the dataset is split into three parts used for training, validation, and test. We use the split file provided in EntropyReg Zhao et al. (2020). The training and validation sets are data from the source domains while the test set is sampled from the target domain. We pick classifiers based on the validation metric.

Office-Home contains images belonging to 65 categories within 4 domains which are artistic, clip art, product, and the real world. Following DDAIG Zhou et al. (2020a), we randomly split the source domains into 90% for training and 10% for validation, reporting the metrics on the leave-one-out domain using the best-validated model.

Digits-DG is a mixture of 4 datasets, namely MNIST LeCun et al. (1998), MNIST-M Ganin et al. (2015), SVHN Netzer et al. (2011), and SYN Ganin et al. (2015). All images are resized to a common resolution. The reported metrics use the leave-one-domain-out manner for evaluation.

VLCS contains images from 5 categories within 4 domains which are Pascal VOC2007 Everingham et al. (2010), LabelMe Russell et al. (2008), Caltech Fei-Fei et al. (2004), and SUN09 Choi et al. (2010). We randomly split the source domains into 70% for training and 30% for validation following Ghifary et al. (2015b), reporting metrics on the target domain using the best-validated classifier.

NICO consists of natural images within 10 domains, 8 of which are treated as the source and 2 as the target. Following Zhang et al. (2021), we randomly split the data into 90% for training and 10% for validation, reporting metrics on the held-out domains with the best-validated model.

Metrics

We report top-1 classification accuracy on the aforementioned datasets. To avoid occasionality, each setting is measured over 5 runs. We also give the 95% confidence intervals calculated as $\bar{x} \pm 1.96\,s/\sqrt{r}$, where $\bar{x}$, $s$, and $r$ are the mean, standard deviation, and number of runs of the top-1 accuracy. Some previous methods report no 95% confidence intervals, and for those we therefore give only the top-1 classification accuracy.
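Assuming the usual normal approximation, the interval half-width can be computed as:

```python
import math

def ci95_halfwidth(std, runs):
    # 95% confidence half-width: 1.96 * s / sqrt(r), where s is the
    # standard deviation and r the number of runs (normal approximation;
    # the exact expression is an assumption reconstructed from context)
    return 1.96 * std / math.sqrt(runs)
```

For example, with 5 runs and a standard deviation of 0.5 in top-1 accuracy, the half-width is roughly 0.44 points.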

Evaluation on Domain Generalization Benchmarks

We report the main results of domain generalization on common benchmarks in this section. If not specified, the ResNet-18 He et al. (2016) is adopted as the backbone across different techniques. We use Avg. to represent the average top-1 classification accuracy over different domains.

Leave-one-domain-out results on PACS.

Since we have collected only limited results with 95% confidence intervals, we report the mean top-1 accuracy over 5 runs. Methods shown in the upper part of Table 1 use AlexNet Krizhevsky et al. (2012) as the backbone while those in the following part, in gray background, use ResNet-18 He et al. (2016). The vanilla counterpart of PoER is GCPL Yang et al. (2018). PoER improves the top-1 classification accuracy by up to 2.32% over its vanilla counterpart and by 0.48% over the existing state-of-the-art methods. Methods below are arranged in increasing order of average top-1 accuracy. A, C, P, and S in Table 1 indicate Art, Cartoon, Photo, and Sketch.

Methods A. C. P. S. Avg.
D-MATE Ghifary et al. (2015a) 60.27 58.65 91.12 47.68 64.48
M-ADA Qiao et al. (2020) 61.53 68.76 83.21 58.49 68.00
DBADG Li et al. (2017) 62.86 66.97 89.50 57.51 69.21
MLDG Li et al. (2018a) 66.23 66.88 88.00 58.96 70.01
Feature-critic Li et al. (2019b) 64.89 71.72 89.94 61.85 71.20
CIDDG Li et al. (2018c) 66.99 68.62 90.19 62.88 72.20
MMLD Matsuura et al. (2020) 69.27 72.83 88.98 66.44 74.38
MASF Dou et al. (2019) 70.35 72.46 90.68 67.33 75.21
EntropyReg Zhao et al. (2020) 71.34 70.29 89.92 71.15 75.67
MMD-AAE Li et al. (2018b) 75.20 72.70 96.00 64.20 77.03
CCSA Motiian et al. (2017) 80.50 76.90 93.60 66.80 79.45
ResNet-18 He et al. (2016) 77.00 75.90 96.00 69.20 79.53
StableNet Zhang et al. (2021) 80.16 74.15 94.24 70.10 79.66
JiGen Carlucci et al. (2019) 79.40 75.30 96.00 71.60 80.50
CrossGrad Shankar et al. (2018) 79.80 76.80 96.00 70.20 80.70
DANN Ganin et al. (2015) 80.20 77.60 95.40 70.00 80.80
Epi-FCR Li et al. (2019a) 82.10 77.00 93.90 73.00 81.50
MetaReg Balaji et al. (2018) 83.70 77.20 95.50 70.30 81.70
GCPL Yang et al. (2018) 82.64 75.02 96.40 73.36 81.86
EISNet Wang et al. (2020) 81.89 76.44 95.93 74.33 82.15
L2A-OT Zhou et al. (2020c) 83.30 78.20 96.20 73.60 82.83
MixStyle Zhou et al. (2021b) 84.10 78.80 96.10 75.90 83.70
PoER (Ours) 85.30 77.69 96.42 77.30 84.18
Table 1: Leave-one-domain-out results on PACS dataset without 95% confidence intervals. The methods in gray background use ResNet-18 as the backbone while other methods employ AlexNet for feature extraction.

Leave-one-domain-out results on OfficeHome dataset.

We report the mean top-1 accuracy and 95% confidence interval results on OfficeHome. Some of the following results are from DDAIG Zhou et al. (2020a). We use the same method as stated in DDAIG to split the source domains into 90% for training and 10% for validation. The images in OfficeHome are colorful RGB images of varying scales. The short edge of all images is first resized to 227, maintaining the aspect ratio, and then the training inputs are obtained through RandomResizedCrop with shape 224. In Table 2, CCSA, MMD-AAE, and D-SAM are from Motiian et al. (2017), Li et al. (2018b), and D'Innocente et al. (2018), and other methods have been introduced before. As stated in the previous section, the vanilla counterpart of PoER is GCPL. It can be found that PoER reduces the classification error by a clear margin of 1.3% and 1.2% compared to its vanilla counterpart and the state-of-the-art method DDAIG respectively.

Method Artistic Clipart Product Real World Avg.
ResNet-18 58.9±0.3 49.4±0.1 74.3±0.1 76.2±0.2 64.7
CCSA 59.9±0.3 49.9±0.4 74.1±0.2 75.7±0.2 64.9
MMD-AAE 56.5±0.4 47.3±0.3 72.1±0.3 74.8±0.2 62.7
CrossGrad 58.4±0.7 49.4±0.4 73.9±0.2 75.8±0.1 64.4
D-SAM 58.0 44.4 69.2 71.5 60.8
JiGen 53.0 47.5 71.5 72.8 61.2
GCPL 58.3±0.1 51.9±0.1 74.1±0.2 76.7±0.1 65.3
DDAIG 59.2±0.1 52.3±0.3 74.6±0.3 76.0±0.1 65.5
PoER (ours) 59.1±0.2 53.4±0.3 74.9±0.2 79.1±0.3 66.6
Table 2: Leave-one-domain-out results on OfficeHome dataset with 95% confidence intervals. No confidence intervals are reported in the original paper of D-SAM and JiGen.

Domain generalization results on NICO dataset.

NICO is different from the aforementioned datasets. It consists of two super-categories, namely Animal and Vehicle, including 19 sub-classes in total. Moreover, the domains of each sub-class differ from each other; in a nutshell, NICO contains 19 classes spanning 65 domains. For each class, we randomly select 2 domains as the target while the remaining 8 domains are treated as the source. Within the source domains, we further split the data into 90% for training and 10% for validation. The metrics are reported on the target domains with the best-validated models. RSC indicates the algorithm from Huang et al. (2020). No pre-trained weights are used in Table 3. PoER outperforms the existing methods by a remarkable margin of 2.83%.

Method M-ADA MMLD ResNet-18 JiGen RSC StableNet PoER (ours)
NICO 40.78 47.18 51.71 54.42 57.59 59.76 62.62
Table 3: Domain generalization results on NICO dataset.

Leave-one-domain-out results on Digits-DG dataset.

Digits-DG consists of four different datasets containing digits with different appearances. Following DDAIG Zhou et al. (2020a), all images are resized to the same resolution with RGB channels. For the MNIST dataset, we replicate the gray channel three times to construct color images. As stated in Zhou et al. (2020a), we randomly pick 600 images per class in these four datasets. Images are split into 90% for training and 10% for validation. The leave-one-domain-out protocol is used for evaluating the domain generalization performance. All images in the held-out domain are tested for reporting metrics. Table 4 reports the 95% confidence intervals of PoER and its comparisons with some existing domain generalization methods. It is clear that PoER surpasses previous techniques on most domains by a large margin, reducing the classification error by up to 4.23% and achieving new state-of-the-art domain generalization performance with only a vanilla ResNet-18 backbone. All methods shown in Table 4 have been presented in previous sections.

Method MNIST MNIST-M SVHN SYN Avg.
ResNet-18 95.8±0.3 58.8±0.5 61.7±0.5 78.6±0.6 73.7
CCSA 95.2±0.2 58.2±0.6 65.5±0.2 79.1±0.8 74.5
MMD-AAE 96.5±0.1 58.4±0.1 65.0±0.1 78.4±0.2 74.6
CrossGrad 96.7±0.1 61.1±0.5 65.3±0.5 80.2±0.2 75.8
GCPL 96.3±0.1 58.7±0.5 70.2±0.3 80.5±0.3 76.4
DDAIG 96.6±0.2 64.1±0.4 68.6±0.6 81.0±0.5 77.6
PoER (ours) 97.2±0.4 60.1±0.3 75.6±0.4 94.4±0.3 81.8
Table 4: Leave-one-domain-out results on Digits-DG with 95% confidence intervals.

Leave-one-domain-out results on VLCS dataset.

VLCS consists of four common datasets. All methods shown in Table 5 can be found in Table 1 in detail. VOC indicates the Pascal VOC dataset. Part of the following results is from StableNet Zhang et al. (2021). Following Zhao et al. (2020), the leave-one-domain-out protocol is used for evaluation. The source domains are split into 70% for training and 30% for validation. The best-validated model reports domain generalization performance on all images from the held-out target domain. PoER gives better classification accuracy on Caltech, surpassing current techniques by a clear margin of 1.44%.

Method VOC LabelMe Caltech SUN09 Avg.
DBADG 69.99 63.49 93.64 61.32 72.11
ResNet-18 67.48 61.81 91.86 68.77 72.48
JiGen 70.62 60.90 96.93 64.30 73.19
MMLD 71.96 58.77 96.66 68.13 73.88
CIDDG 73.00 58.30 97.02 68.89 74.30
EntropyReg 73.24 58.26 96.92 69.10 74.38
GCPL 67.01 64.84 96.23 69.43 74.38
RSC 73.81 62.51 96.21 72.10 76.16
StableNet 73.59 65.36 96.67 74.97 77.65
PoER (ours) 69.96 66.41 98.11 72.04 76.63
Table 5: Leave-one-domain-out results on VLCS dataset. PoER gets lower metrics on Pascal VOC and SUN09 while reporting superior performance on Caltech.

To clarify, the short edge of images is resized to 512, and then we randomly crop squares with shape 512 for training.

Ablation Study

Ablation on the weight $\lambda$ of PoER.

In Eq. (10), we introduce a hyper-parameter $\lambda$ for balancing the classification loss and the energy ranking loss. We set this parameter from 0.0 to 0.9 with a step of 0.1 to test its sensitivity. To clarify, $\lambda = 0$ corresponds to the vanilla counterpart of PoER, i.e., GCPL. We use the PACS benchmark and set the number of prototypes to 2. The ablation results are shown in Table 6.

0.0 0.1 0.2 0.3 0.4
PACS Avg. 81.86 83.92 84.20 84.16 84.16
0.5 0.6 0.7 0.8 0.9
PACS Avg. 84.13 84.06 84.12 83.97 83.60
Table 6: Ablation results on the balancing hyper-parameter.

From Table 6, we find that the performance is best when the weight is set to 0.2. Considering training stability, we set it to 0.1 in the first 70 epochs and to 0.2 otherwise.
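This schedule can be written as a small helper (the function names and the `total_loss` wrapper are our illustration; Eq.10 defines the actual objective):

```python
def poer_weight(epoch, warmup_epochs=70, warmup_value=0.1, value=0.2):
    """Balancing weight for the energy ranking loss: 0.1 during the
    first 70 epochs for training stability, 0.2 afterwards."""
    return warmup_value if epoch < warmup_epochs else value

def total_loss(cls_loss, rank_loss, epoch):
    # total objective = classification loss + weight * ranking loss (Eq.10)
    return cls_loss + poer_weight(epoch) * rank_loss
```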

Ablation on the number of prototypes.

In Eq.8, we set the learnable prototypes as a tensor of shape (number of classes) × (number of prototypes per class) × (feature dimensionality), where the dimensionality is that of the feature output by the last block. We test the impact of the number of prototypes on classification performance from 1 to 10 with a step of 1. Table 7 reports the results on the PACS benchmark.
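The prototype layout described above can be sketched as follows (the shape letters and the nearest-prototype prediction rule are our illustration of the distance-based classification built on Eq.8):

```python
import numpy as np

def init_prototypes(n_classes, n_protos, dim, seed=0):
    """Learnable prototypes stored as a (C, K, D) tensor: C classes,
    K prototypes per class, D-dimensional features (letters are ours)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_classes, n_protos, dim))

def predict(feature, prototypes):
    # distance-based classification: assign a sample to the class of its
    # nearest prototype among all C*K prototypes
    d = np.linalg.norm(prototypes - feature, axis=-1)  # (C, K) distances
    return int(np.argmin(d.min(axis=1)))
```

In training these would be `nn.Parameter`s updated by gradient descent; here they are plain arrays to keep the sketch self-contained.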

prototypes  1      2      3      4      5      6      7      8      9      10
PACS Avg.   82.40  83.97  84.18  84.12  84.07  84.20  84.18  84.19  84.19  84.21
Table 7: Ablation results on the number of prototypes.

From Table 7, we find that more prototypes guarantee better classification results to some extent. Considering both performance and computational efficiency, we set the number of prototypes to 3 by default, which yields metrics only marginally lower than the best ones.

Ablation on the position of PoER.

In Eq.6 and Eq.7, we propose calculating the ranking loss in the shallow layers while performing clustering regularization in the deeper layers. Taking ResNet-18 as an example, we extract features from the first three blocks (including the first convolutional layer) to calculate the ranking loss in Eq.6, while the features extracted from the following blocks are treated as the deeper layers for computing the clustering loss in Eq.7. We test the combinations of the loss functions: the classification loss alone, and the classification loss combined with the ranking loss, with the clustering loss, and with both; the full combination corresponds to PoER. Recall that the classification loss alone indicates the vanilla counterpart of PoER, i.e., GCPL. We use the default settings of the balancing weight and the number of prototypes as stated in previous sections. The ablation results on different combinations of loss functions are presented in Table 8; metrics are evaluated on the PACS benchmark. From Table 8, we find that both the ranking loss and the clustering loss help to improve the domain generalization ability of conventional neural networks. Note that the ranking loss can be treated as a self-supervised signal for boosting domain generalization performance.
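The split of losses by depth can be sketched as follows (the per-block feature list and the two loss callables are placeholders standing in for the paper's Eq.6 and Eq.7 terms):

```python
def poer_losses(block_features, ranking_loss, clustering_loss, shallow=3):
    """Apply the ranking loss to features from the first `shallow` blocks
    (Eq.6) and the clustering loss to the remaining, deeper blocks (Eq.7).
    For ResNet-18, `block_features` would hold one tensor per block."""
    rank = sum(ranking_loss(f) for f in block_features[:shallow])
    cluster = sum(clustering_loss(f) for f in block_features[shallow:])
    return rank, cluster
```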

Loss       cls. only  + ranking  + clustering  + ranking & clustering
PACS Avg.  81.86      83.72      82.60         84.18
Table 8: Ablation results on different combinations of the proposed loss functions. Notably, the group with the ranking loss surpasses the one with the clustering loss by up to 1.12%, suggesting the importance of the domain ranking proposed in PoER.

Auxiliary Visualization Results

In this section, we present the feature distributions within an identical category from the shallow layers to the deeper ones. From these distributions, one can grasp the insight behind our proposed PoER directly and effortlessly.

Figure 4: Visualization of the feature distribution in each layer. The images above show feature distributions with an identical category label from block 1 to block 6 (from left to right and from top to bottom).

From the feature distributions of each block shown in Figure 4, it can be concluded that in the first three blocks (the first row), PoER employs the domain ranking loss to make the neural network aware of the characteristics across domains, separating the features from different domains. In the following blocks (the second row), PoER filters out the domain-related information for clustering, pulling features with identical category labels close together regardless of their domains. As stated at the beginning, PoER learns the characteristics of each domain and category before generating domain-invariant features, laying the foundation for the distillation of pure label-related features.

Conclusion

In this paper, we propose using PoER to make the classifier aware of the characteristics of different domains and categories before generating domain-invariant features. We find and verify that PoER is vital for improving the generalization ability of models across domains. PoER reports superior results on multiple domain generalization benchmarks compared to existing techniques, achieving state-of-the-art performance. Insights into the proposed idea are given both statistically and visually. We hope the energy perspective presented here can inspire future work.

References

  • H. Akada, S. F. Bhat, I. Alhashim, and P. Wonka (2022) Self-supervised learning of domain invariant features for depth estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3377–3387. Cited by: Domain-invariant Representation Learning..
  • Y. Balaji, S. Sankaranarayanan, and R. Chellappa (2018) MetaReg: towards domain generalization using meta-regularization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1006–1016. External Links: Link Cited by: Table 1.
  • N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko (2020) End-to-end object detection with transformers. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Vol. 12346, pp. 213–229. External Links: Link, Document Cited by: Introduction.
  • F. M. Carlucci, A. D’Innocente, S. Bucci, B. Caputo, and T. Tommasi (2019) Domain generalization by solving jigsaw puzzles. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 2229–2238. External Links: Link, Document Cited by: Table 1.
  • M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky (2010) Exploiting hierarchical context on a large database of object categories. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 129–136. External Links: Link, Document Cited by: Dataset.
  • Y. L. Cun, B. Boser, J. S. Denker, D. Henderson, and L. D. Jackel (1990) Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, pp. 396–404. Cited by: Introduction.
  • A. D’Innocente and B. Caputo (2018) Domain generalization with domain-specific aggregation modules. In Pattern Recognition - 40th German Conference, GCPR 2018, Stuttgart, Germany, October 9-12, 2018, Proceedings, T. Brox, A. Bruhn, and M. Fritz (Eds.), Lecture Notes in Computer Science, Vol. 11269, pp. 187–198. External Links: Link, Document Cited by: Leave-one-domain-out results on OfficeHome dataset..
  • L. Dinh, D. Krueger, and Y. Bengio (2015) NICE: non-linear independent components estimation. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Workshop Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: Introduction.
  • Q. Dou, D. C. de Castro, K. Kamnitsas, and B. Glocker (2019) Domain generalization via model-agnostic learning of semantic features. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 6447–6458. External Links: Link Cited by: Table 1.
  • M. Everingham, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman (2010) The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88 (2), pp. 303–338. External Links: Link, Document Cited by: Dataset.
  • L. Fei-Fei, R. Fergus, and P. Perona (2004) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2004, Washington, DC, USA, June 27 - July 2, 2004, pp. 178. External Links: Link, Document Cited by: Dataset.
  • Y. Ganin and V. S. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, F. R. Bach and D. M. Blei (Eds.), JMLR Workshop and Conference Proceedings, Vol. 37, pp. 1180–1189. External Links: Link Cited by: Introduction, Domain-invariant Representation Learning., Dataset, Table 1.
  • M. Ghifary, W. B. Kleijn, M. Zhang, and D. Balduzzi (2015a) Domain generalization for object recognition with multi-task autoencoders. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 2551–2559. External Links: Link, Document Cited by: Table 1.
  • M. Ghifary, W. B. Kleijn, M. Zhang, and D. Balduzzi (2015b) Domain generalization for object recognition with multi-task autoencoders. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pp. 2551–2559. External Links: Link, Document Cited by: Dataset, Dataset.
  • R. B. Girshick, J. Donahue, T. Darrell, and J. Malik (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pp. 580–587. External Links: Link, Document Cited by: Introduction.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2020) Generative adversarial networks. Commun. ACM 63 (11), pp. 139–144. External Links: Link, Document Cited by: Introduction.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.), pp. 5767–5777. External Links: Link Cited by: Introduction.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: Figure 1, Introduction, Experimental Setup, Leave-one-domain-out results on PACS., Evaluation on Domain Generalization Benchmarks, Table 1.
  • Y. He, Z. Shen, and P. Cui (2021) Towards non-iid image classification: a dataset and baselines. Pattern Recognition 110, pp. 107383. Cited by: Figure 2, Dataset.
  • J. Ho, A. Jain, and P. Abbeel (2020) Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: Introduction.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: Introduction.
  • X. Huang and S. J. Belongie (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 1510–1519. External Links: Link, Document Cited by: Domain Augmentation Schemes..
  • Z. Huang, H. Wang, E. P. Xing, and D. Huang (2020) Self-challenging improves cross-domain generalization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part II, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Vol. 12347, pp. 124–140. External Links: Link, Document Cited by: Domain generalization results on NICO dataset..
  • J. Kang, S. Lee, N. Kim, and S. Kwak (2022) Style neophile: constantly seeking novel styles for domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7130–7140. Cited by: Domain Augmentation Schemes..
  • D. P. Kingma and M. Welling (2014) Auto-encoding variational bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), External Links: Link Cited by: Introduction.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25. Cited by: Introduction, Leave-one-domain-out results on PACS..
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), pp. 2278–2324. External Links: Link, Document Cited by: Dataset.
  • B. Li, Y. Shen, Y. Wang, W. Zhu, D. Li, K. Keutzer, and H. Zhao (2022) Invariant information bottleneck for domain generalization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, pp. 7399–7407. Cited by: Domain-invariant Representation Learning..
  • B. Li, Y. Wang, S. Zhang, D. Li, K. Keutzer, T. Darrell, and H. Zhao (2021a) Learning invariant representations and risks for semi-supervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1104–1113. Cited by: Domain-invariant Representation Learning..
  • D. Li, Y. Yang, Y. Song, and T. M. Hospedales (2017) Deeper, broader and artier domain generalization. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 5543–5551. External Links: Link, Document Cited by: Figure 1, Dataset, Table 1.
  • D. Li, Y. Yang, Y. Song, and T. Hospedales (2018a) Learning to generalize: meta-learning for domain generalization. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32. Cited by: Table 1.
  • D. Li, J. Zhang, Y. Yang, C. Liu, Y. Song, and T. M. Hospedales (2019a) Episodic training for domain generalization. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 1446–1455. External Links: Link, Document Cited by: Table 1.
  • H. Li, S. J. Pan, S. Wang, and A. C. Kot (2018b) Domain generalization with adversarial feature learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 5400–5409. External Links: Link, Document Cited by: Leave-one-domain-out results on OfficeHome dataset., Table 1.
  • L. Li, K. Gao, J. Cao, Z. Huang, Y. Weng, X. Mi, Z. Yu, X. Li, and B. Xia (2021b) Progressive domain expansion network for single domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 224–233. Cited by: Domain-invariant Representation Learning..
  • Y. Li, X. Tian, M. Gong, Y. Liu, T. Liu, K. Zhang, and D. Tao (2018c) Deep domain generalization via conditional invariant adversarial networks. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss (Eds.), Lecture Notes in Computer Science, Vol. 11219, pp. 647–663. External Links: Link, Document Cited by: Table 1.
  • Y. Li, Y. Yang, W. Zhou, and T. Hospedales (2019b) Feature-critic networks for heterogeneous domain generalization. In International Conference on Machine Learning, pp. 3915–3924. Cited by: Table 1.
  • W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C. Berg (2016) SSD: single shot multibox detector. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, B. Leibe, J. Matas, N. Sebe, and M. Welling (Eds.), Lecture Notes in Computer Science, Vol. 9905, pp. 21–37. External Links: Link, Document Cited by: Introduction.
  • T. Matsuura and T. Harada (2020) Domain generalization using a mixture of multiple latent domains. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 11749–11756. External Links: Link Cited by: Table 1.
  • M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. CoRR abs/1411.1784. External Links: Link, 1411.1784 Cited by: Introduction.
  • S. Motiian, M. Piccirilli, D. A. Adjeroh, and G. Doretto (2017) Unified deep supervised domain adaptation and generalization. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp. 5716–5726. External Links: Link, Document Cited by: Leave-one-domain-out results on OfficeHome dataset., Table 1.
  • K. Muandet, D. Balduzzi, and B. Schölkopf (2013) Domain generalization via invariant feature representation. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, JMLR Workshop and Conference Proceedings, Vol. 28, pp. 10–18. External Links: Link Cited by: Introduction.
  • Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. Cited by: Dataset.
  • A. T. Nguyen, T. Tran, Y. Gal, and A. G. Baydin (2021) Domain invariant representation learning with domain density transformations. Advances in Neural Information Processing Systems 34, pp. 5264–5275. Cited by: Domain-invariant Representation Learning..
  • F. Qiao, L. Zhao, and X. Peng (2020) Learning to learn single domain generalization. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 12553–12562. External Links: Link, Document Cited by: Table 1.
  • J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 779–788. External Links: Link, Document Cited by: Introduction.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. External Links: Document Cited by: Experimental Setup.
  • B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman (2008) LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 77 (1-3), pp. 157–173. External Links: Link, Document Cited by: Dataset.
  • S. Shankar, V. Piratla, S. Chakrabarti, S. Chaudhuri, P. Jyothi, and S. Sarawagi (2018) Generalizing across domains via cross-gradient training. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: Introduction, Domain-invariant Representation Learning., Table 1.
  • Z. Shen, J. Liu, Y. He, X. Zhang, R. Xu, H. Yu, and P. Cui (2021) Towards out-of-distribution generalization: A survey. CoRR abs/2108.13624. External Links: Link, 2108.13624 Cited by: Introduction.
  • K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, Cited by: Introduction.
  • B. Sun and K. Saenko (2016) Deep CORAL: correlation alignment for deep domain adaptation. In Computer Vision - ECCV 2016 Workshops - Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III, G. Hua and H. Jégou (Eds.), Lecture Notes in Computer Science, Vol. 9915, pp. 443–450. External Links: Link, Document Cited by: Introduction.
  • E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell (2014) Deep domain confusion: maximizing for domain invariance. CoRR abs/1412.3474. External Links: Link, 1412.3474 Cited by: Introduction, Domain-invariant Representation Learning..
  • D. Ulyanov, A. Vedaldi, and V. S. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. CoRR abs/1607.08022. External Links: Link, 1607.08022 Cited by: Domain Augmentation Schemes..
  • H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan (2017) Deep hashing network for unsupervised domain adaptation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5385–5394. External Links: Link, Document Cited by: Dataset.
  • S. Wang, L. Yu, C. Li, C. Fu, and P. Heng (2020) Learning from extrinsic and intrinsic supervisions for domain generalization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IX, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Vol. 12354, pp. 159–176. External Links: Link, Document Cited by: Table 1.
  • Z. Wang, Z. Wang, Z. Yu, W. Deng, J. Li, T. Gao, and Z. Wang (2022) Domain generalization via shuffled style assembly for face anti-spoofing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4123–4133. Cited by: Domain Augmentation Schemes..
  • H. Yang, X. Zhang, F. Yin, and C. Liu (2018) Robust classification with convolutional prototype learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 3474–3482. External Links: Link, Document Cited by: Distance-based Classification, Experimental Setup, Leave-one-domain-out results on PACS., Table 1.
  • X. Zhang, P. Cui, R. Xu, L. Zhou, Y. He, and Z. Shen (2021) Deep stable learning for out-of-distribution generalization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp. 5372–5382. External Links: Link, Document Cited by: Introduction, Dataset, Leave-one-domain-out results on VLCS dataset., Table 1.
  • X. Zhang, Y. He, R. Xu, H. Yu, Z. Shen, and P. Cui (2022) NICO++: towards better benchmarking for domain generalization. External Links: 2204.08040 Cited by: Figure 2.
  • Y. Zhang, M. Li, R. Li, K. Jia, and L. Zhang (2022) Exact feature distribution matching for arbitrary style transfer and domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8035–8045. Cited by: Domain Augmentation Schemes..
  • S. Zhao, M. Gong, T. Liu, H. Fu, and D. Tao (2020) Domain generalization via entropy regularization. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin (Eds.), External Links: Link Cited by: Dataset, Leave-one-domain-out results on VLCS dataset., Table 1.
  • K. Zhou, Y. Yang, T. M. Hospedales, and T. Xiang (2020a) Deep domain-adversarial image generation for domain generalisation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 13025–13032. External Links: Link Cited by: Introduction, Domain Augmentation Schemes., Dataset, Leave-one-domain-out results on OfficeHome dataset., Leave-one-domain-out results on Digits-DG dataset..
  • K. Zhou, Y. Yang, T. M. Hospedales, and T. Xiang (2020b) Learning to generate novel domains for domain generalization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XVI, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Vol. 12361, pp. 561–578. External Links: Link, Document Cited by: Introduction.
  • K. Zhou, Y. Yang, T. M. Hospedales, and T. Xiang (2020c) Learning to generate novel domains for domain generalization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XVI, A. Vedaldi, H. Bischof, T. Brox, and J. Frahm (Eds.), Lecture Notes in Computer Science, Vol. 12361, pp. 561–578. External Links: Link, Document Cited by: Table 1.
  • K. Zhou, Y. Yang, Y. Qiao, and T. Xiang (2021a) Domain adaptive ensemble learning. IEEE Transactions on Image Processing (TIP). Cited by: Dataset.
  • K. Zhou, Y. Yang, Y. Qiao, and T. Xiang (2021b) Domain generalization with mixstyle. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, External Links: Link Cited by: Introduction, Introduction, Domain Augmentation Schemes., Dataset, Table 1.
  • Y. Zhou, L. Huang, T. Zhou, and L. Shao (2021) CCT-net: category-invariant cross-domain transfer for medical single-to-multiple disease diagnosis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8260–8270. Cited by: Domain-invariant Representation Learning..