Progressive Graph Learning for Open-Set Domain Adaptation

06/22/2020 · Yadan Luo et al.

Domain shift is a fundamental problem in visual recognition which typically arises when the source and target data follow different distributions. The existing domain adaptation approaches which tackle this problem work in the closed-set setting with the assumption that the source and the target data share exactly the same classes of objects. In this paper, we tackle a more realistic problem of open-set domain shift where the target data contains additional classes that are not present in the source data. More specifically, we introduce an end-to-end Progressive Graph Learning (PGL) framework where a graph neural network with episodic training is integrated to suppress underlying conditional shift and adversarial learning is adopted to close the gap between the source and target distributions. Compared to the existing open-set adaptation approaches, our approach guarantees to achieve a tighter upper bound of the target error. Extensive experiments on three standard open-set benchmarks show that our approach significantly outperforms the state of the art in open-set domain adaptation.


1 Introduction

While deep learning has made remarkable advances across a wide variety of machine-learning tasks and applications, it is commonly assumed that the training and test data are drawn from the same distribution. In practice, however, this assumption can be violated due to a number of factors, such as the change of lighting conditions, background, environment, or data modalities, which is referred to as the domain shift problem.

Unsupervised Domain Adaptation (UDA) approaches tackle the domain shift problem by aligning the training (source) and test (target) distributions, and can be roughly divided into statistical matching (Baktashmotlagh et al., 2013; Tzeng et al., 2014; Long et al., 2013, 2015) and adversarial learning (Ganin et al., 2016; Tzeng et al., 2017; Long et al., 2017; Ghifary et al., 2016) methods. The theoretical analysis of UDA has been widely studied (Ben-David et al., 2006; Mansour et al., 2009; Zhang et al., 2019), providing rigorous error bounds on the target data.

Existing UDA algorithms are developed under the assumption that the source and target domains share an identical group of classes. Such a scenario typically refers to a closed-set setting, which could be hardly guaranteed in real-world applications. Therefore, a more realistic Open-set adaptation setting has been introduced recently (Saito et al., 2018) which allows the target data to contain an additional “unknown” category, covering all irrelevant classes not present in the source domain.

The core idea of unsupervised open-set domain adaptation (OUDA) approaches (Busto and Gall, 2017; Baktashmotlagh et al., 2019; Saito et al., 2018; Liu et al., 2019; Feng et al., 2019) is to learn a classifier from a larger hypothesis space for both the shared and unknown classes in the source and target domains. According to (Ben-David et al., 2006; Mansour et al., 2009), the target error is bounded by the source risk, the discrepancy distance across the domains, the shared error coming from the conditional shift (Zhao et al., 2019), and the open-set risk. The open-set risk contributes the most to the error bound when a large percentage of the data is unknown.

While promising, the existing OUDA approaches (Busto and Gall, 2017; Saito et al., 2018; Baktashmotlagh et al., 2019; Liu et al., 2019; Feng et al., 2019) lack an essential theoretical analysis of the aforementioned partial risks and of the upper bound on the target risk, thereby overlooking potential avenues for improvement and risking biased solutions. With the aim of minimising these partial risks and achieving a tighter error bound for open-set adaptation, we combine the following four strategies in an end-to-end progressive learning framework:

  1. To suppress the source risk, we decompose the original hypothesis space $\mathcal{H}$ into two subspaces $\mathcal{H}_k$ and $\mathcal{H}_u$, where $\mathcal{H}_k$ includes classifiers for the shared classes of the source and target domains and $\mathcal{H}_u$ is specific to classifying unknowns in the target domain. With a restricted size of the subspace $\mathcal{H}_u$, the possibility of misclassifying source data as unknowns is reduced.

  2. To control the open-set risk, we adopt the progressive learning paradigm (Bengio et al., 2009), where the target samples with low classification confidence are gradually rejected from the target domain and inserted as the pseudo-labeled unknown set in the source domain. This mechanism suppresses the potential negative transfer where the private representations across domains are falsely aligned.

  3. We address conditional shift (Zhao et al., 2019) at both the sample and manifold levels in a transductive setting. Specifically, we design an episodic training scheme and align conditional distributions across domains by gradually replacing the source data with the pseudo-labeled known data in each episode. We learn class-specific representations by aggregating the source and target features and passing episodes through deep graph neural networks.

  4. Our algorithm is seamlessly equipped with an adversarial domain discriminator, which effectively closes the gap between the source and target marginal distributions for the known categories.

We apply our method to three challenging open-set object recognition benchmarks, i.e., Office-Home, VisDA-17, and Syn2Real-O, and confirm its superiority over the existing state-of-the-art open-set domain adaptation approaches.

2 Preliminaries

In this section, we introduce the notations, problem settings and the theoretical definitions and analysis for the tasks of closed-set and open-set unsupervised domain adaptation.

Definition 1.

Closed-set Unsupervised Domain Adaptation (UDA). Let $\mathcal{D}_s$ and $\mathcal{D}_t$ be the distributions of the source domain and the target domain, respectively. The corresponding label spaces for both domains are equal, i.e., $\mathcal{Y}_s = \mathcal{Y}_t = \{1, \dots, C\}$, where $C$ is the number of classes. The ultimate goal is to learn an optimal classifier $h^* \in \mathcal{H}$ for the target domain, based on the labeled source data and the unlabeled target data, where $\mathcal{H}$ is the hypothesis space of classifiers.

Definition 2.

Open-set Unsupervised Domain Adaptation (OUDA) (Saito et al., 2018). Assume that we have the labeled source data $\mathcal{S} = \{(\mathbf{x}^s_i, y^s_i)\}_{i=1}^{n_s}$ and unlabeled target data $\mathcal{T} = \{\mathbf{x}^t_j\}_{j=1}^{n_t}$, where $\mathcal{D}_s$ is the joint probability distribution of the source domain and $\mathcal{D}^X_t$ the marginal distribution of the target domain, with $n_s$ and $n_t$ indicating the sizes of the source and target datasets, respectively. With training samples drawn i.i.d. from both domains, the goal is to learn an optimal target classifier $h^*$. Here the target label space $\mathcal{Y}_t = \mathcal{Y}_s \cup \{C+1\}$ includes the additional unknown class $C+1$, which is not present in the source label space $\mathcal{Y}_s$.

The source risk $\varepsilon_s(h)$ and target risk $\varepsilon_t(h)$ of a classifier $h \in \mathcal{H}$ with respect to the source distribution $\mathcal{D}_s$ and the target distribution $\mathcal{D}_t$ are given by,

$$\varepsilon_s(h) = \mathbb{E}_{(\mathbf{x},y)\sim\mathcal{D}_s}\big[\ell(h(\mathbf{x}), y)\big] = \sum_{c=1}^{C} \pi^s_c\, \varepsilon_{s,c}(h), \qquad \varepsilon_t(h) = \mathbb{E}_{(\mathbf{x},y)\sim\mathcal{D}_t}\big[\ell(h(\mathbf{x}), y)\big] = \sum_{c=1}^{C+1} \pi^t_c\, \varepsilon_{t,c}(h),\qquad(1)$$

where $\pi^s_c$ and $\pi^t_c$ are class-prior probabilities of the source and target distributions, respectively. The bounded loss function $\ell(\cdot,\cdot)$ satisfies symmetry and the triangle inequality. In particular, the partial risks $\varepsilon_{s,c}(h)$ and $\varepsilon_{t,c}(h)$ can be defined as,

$$\varepsilon_{s,c}(h) = \mathbb{E}_{\mathbf{x}\sim\mathcal{D}^c_s}\big[\ell(h(\mathbf{x}), c)\big], \qquad \varepsilon_{t,c}(h) = \mathbb{E}_{\mathbf{x}\sim\mathcal{D}^c_t}\big[\ell(h(\mathbf{x}), c)\big],\qquad(2)$$

where $\mathcal{D}^c_s$ and $\mathcal{D}^c_t$ denote the class-conditional distributions for class $c$.

Before introducing the generalization bound for open-set domain adaptation, it is crucial to define a discrepancy measure between the source and target domains:

Definition 3.

Discrepancy Distance (Mansour et al., 2009). For any $h, h' \in \mathcal{H}$, the discrepancy between the distributions of the source and target domains can be formulated as:

$$\mathrm{disc}_{\ell}(\mathcal{D}_s, \mathcal{D}_t) = \sup_{h, h' \in \mathcal{H}} \Big| \mathbb{E}_{\mathbf{x}\sim\mathcal{D}_s}\big[\ell(h(\mathbf{x}), h'(\mathbf{x}))\big] - \mathbb{E}_{\mathbf{x}\sim\mathcal{D}_t}\big[\ell(h(\mathbf{x}), h'(\mathbf{x}))\big] \Big|.\qquad(3)$$
Theorem 2.1.

Open-set Domain Adaptation Upper Bound (Fang et al., 2019). Given the hypothesis space $\mathcal{H}$ with the mild condition that the constant function $\mathbf{1}_{C+1} \in \mathcal{H}$, for $\forall h \in \mathcal{H}$, the expected error on target samples is bounded as,

$$\varepsilon_t(h) \le \underbrace{\varepsilon_s(h)}_{\text{source risk}} + \underbrace{\mathrm{disc}_{\ell}(\mathcal{D}_s, \mathcal{D}_t)}_{\text{discrepancy distance}} + \underbrace{\Lambda}_{\text{shared error}} + \underbrace{\Delta_o}_{\text{open-set risk}},\qquad(4)$$

where the shared error $\Lambda = \min_{h \in \mathcal{H}} \varepsilon_s(h) + \varepsilon_t(h)$ and $\Delta_o$ is the open-set risk associated with recognizing the unknown class $C+1$. The proof can be found in the supplementary material.

Remark.

To compute the error upper bound for closed-set unsupervised domain adaptation, Theorem 2.1 can be reduced to:

$$\varepsilon_t(h) \le \varepsilon_s(h) + \mathrm{disc}_{\ell}(\mathcal{D}_s, \mathcal{D}_t) + \Lambda,\qquad(5)$$

where $\pi^t_{C+1} = 0$ and, consequently, $\Delta_o = 0$.

According to Equation (4), the target error is bounded by four terms, which opens four directions for improvement:

  • Source risk $\varepsilon_s(h)$. Assuming that the source domain does not include any unknown samples, a part of the source risk can be avoided, which in turn lowers the error upper bound. This direction is rarely investigated in the existing open-set domain adaptation literature.

  • Discrepancy distance $\mathrm{disc}_{\ell}(\mathcal{D}_s, \mathcal{D}_t)$. Minimizing the discrepancy distance between the source and the target domains has been well investigated in recent years with statistics-based (Gretton et al., 2006) and adversarial-based approaches (Ganin et al., 2016).

  • Shared error of the joint ideal hypothesis $\Lambda$. $\Lambda$ tends to be large when conditional shift is encountered, i.e., when the class-wise conditional distributions are not aligned even though the marginal distributions are.

  • Open-set risk $\Delta_o$. When a large percentage of the data is unknown ($\pi^t_{C+1} \to 1$), this term contributes the most to the error bound. As shown in Equation (4), it can be interpreted as the mis-classification rate for the unknown samples.

Figure 1: Proposed PGL framework. Circles indicate the source data and triangles are the target data. Different colors indicate different classes. By alternating between Steps 2 and 3, we progressively achieve the optimal classification model for the shared classes and pseudo-labeling function for rejecting the unknowns.

3 Progressive Open-Set UDA

Aiming to minimise the four partial risks mentioned above, we reformulate the open-set unsupervised domain adaptation in a progressive way, and as such, we redefine the task at hand as follows.

Definition 4.

Progressive Open-Set Unsupervised Domain Adaptation (POUDA). Given the labeled source data $\mathcal{S}$ and unlabeled target data $\mathcal{T}$, the main goal is to learn an optimal target classifier $h^* \in \mathcal{H}_k$ for the shared classes and a pseudo-labeling function $\hat{y}$ for the unknown class $C+1$.

Assume the target set will be pseudo-labeled through $T$ steps, so that the enlarging factor for each step is defined as $\alpha = 1/T$. As long as the hypothesis $h$ and the pseudo-labeling function $\hat{y}$ share the same feature extraction part, we can decompose the shared hypothesis into $h = g \circ \phi$, with a backbone $\phi$ and a shared classifier $g$, and define the pseudo-labeling function at the $t$-th step in line with $g$'s prediction:

$$\hat{y}_t(\mathbf{x}^t_j) = \begin{cases} C+1, & \text{if } R(p_j) \le \theta^t_u, \\ \arg\max_{c \in \{1,\dots,C\}} g\big(\phi(\mathbf{x}^t_j)\big)_c, & \text{if } R(p_j) \ge \theta^t_k, \end{cases}\qquad(6)$$

with $p_j = \max_c g(\phi(\mathbf{x}^t_j))_c$ the maximum predicted probability of $\mathbf{x}^t_j$, and $\theta^t_u = \alpha t\, \varrho\, n_t$ and $\theta^t_k = n_t - \alpha t\, (1-\varrho)\, n_t$ being the index-based thresholds to classify the unknown and known samples. The hyperparameter $\varrho \in [0,1]$ measures the openness of the given target set as the ratio of unknown samples. $R(\cdot)$ is a global ranking function which ranks the predicted probabilities in ascending order and returns the sorted index as an output. The pseudo-labeling function thus assigns the predicted shared class to the likely known samples, and $C+1$ to the unknown ones.
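As an illustration, below is a minimal NumPy sketch of this index-based rule; the function and argument names (pseudo_label_step, enlarge_factor, openness) are ours, and the threshold arithmetic reflects our reading of Eq. (6) rather than the authors' released code.

```python
import numpy as np

def pseudo_label_step(probs, t, enlarge_factor, openness):
    """Index-based pseudo-labeling at step t (a sketch of Eq. (6)).

    probs: (n_t, C) softmax outputs of the shared classifier on target data.
    Returns an (n_t,) array: entry C marks 'unknown' (the C+1-th class),
    -1 marks 'not yet labeled', other entries are predicted known classes.
    """
    n_t, C = probs.shape
    conf = probs.max(axis=1)                      # max predicted probability
    rank = np.argsort(np.argsort(conf))           # ascending rank of each sample
    n_unk = int(np.ceil(enlarge_factor * t * openness * n_t))        # bottom ranks
    n_kwn = int(np.ceil(enlarge_factor * t * (1 - openness) * n_t))  # top ranks

    labels = np.full(n_t, -1, dtype=np.int64)     # -1: left for later steps
    labels[rank < n_unk] = C                      # least confident -> unknown
    top = rank >= n_t - n_kwn                     # most confident -> known class
    labels[top] = probs[top].argmax(axis=1)
    return labels
```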

In our case, the upper bound of the expected target risk is formulated in the following theorem,

Theorem 3.1.

POUDA Error Bound. Given the hypothesis space $\mathcal{H}_k$, for $\forall h \in \mathcal{H}_k$ and the pseudo-labeling function $\hat{y}$ of Eq. (6), with the condition that the openness $\varrho$ of the target set is fixed, the expected error on the target samples is bounded as:

$$\varepsilon_t(h, \hat{y}) \le \underbrace{\varepsilon_s(h)}_{\text{source risk}} + \underbrace{\mathrm{disc}_{\ell}(\mathcal{D}_s, \mathcal{D}_t)}_{\text{discrepancy distance}} + \underbrace{\Lambda'}_{\text{shared error}} + \underbrace{\Delta_{po}}_{\text{progressive open-set risk}},\qquad(7)$$

where the shared error $\Lambda' = \min_{h \in \mathcal{H}_k} \varepsilon_s(h) + \varepsilon_t(h)$ and the progressive open-set risk $\Delta_{po}$ depends on the probability that target samples are pseudo-labeled by $\hat{y}$ (refer to the supplementary material for the proof).

Remark.

For $\alpha \in (0, 1]$ and $T \ge 1$, the following inequality holds,

$$\Delta_{po} \le \Delta_o.\qquad(8)$$

We can thus observe that our progressive learning framework achieves a tighter upper bound than the conventional open-set domain adaptation framework.

4 Methodology

In this section, we go through the details of the proposed Progressive Graph Learning (PGL) framework as illustrated in Figure 1. Our approach is mainly motivated by two aspects: alleviating the shared error $\Lambda'$ and effectively controlling the progressive open-set risk $\Delta_{po}$.

Minimizing the shared error $\Lambda'$. Conditional shift (Zhao et al., 2019) is the most significant obstacle to finding a joint ideal classifier for the source and target data; it arises when the class-conditional distributions of the input features substantially differ across the domains. That is, with unaligned class-conditional distributions of the source and the target, there is no guarantee of finding an ideal shared classifier for both domains. Therefore, we address the conditional shift in a transductive setting from two perspectives:

  • Sample-level: Motivated by (Vinyals et al., 2016; Snell et al., 2017), we adopt the episodic training scheme (Section 4.1), and leverage the source samples from each class to “support” predictions on unlabeled data in each episode. With a labeled set that grows through pseudo-labeling (Section 4.2), we progressively update the training episodes by replacing the source samples with pseudo-labeled target samples (Section 4.3).

  • Manifold-level: To regularize the class-specific manifold, we construct $L$-layer Graph Neural Networks (GNNs) on top of the backbone network (e.g., ResNet), which consist of paired node update networks and edge update networks. The source nodes and pseudo-labeled target nodes from the same class are densely connected, aggregating information through multiple layers.

Controlling the progressive open-set risk $\Delta_{po}$. As discussed in Section 4.2, we iteratively squeeze the index-based thresholds $\theta^t_u$ and $\theta^t_k$ to approximate the optimal threshold, as illustrated in Figure 2. Since the thresholds are mainly determined by the enlarging factor $\alpha$, we can always seek a proper value of $\alpha$ to alleviate the mis-classification error and the subsequent negative transfer. Our experimental results characterize the trade-off between computational complexity and performance improvement.

4.1 Initial Episodic Training with GNNs

Firstly, we denote the initial episodic formulation of a batch input as $\{\mathcal{E}_b\}_{b=1}^{B}$, with $B$ as the batch size. Each episode in the batch consists of two parts, i.e., the source episode randomly sampled from each class and the target episode randomly sampled from the target set. All instances in a mini-batch form an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Each vertex $v_i \in \mathcal{V}$ is associated with a source or a target feature, and the edge $e_{ij} \in \mathcal{E}$ between nodes $v_i$ and $v_j$ measures the node affinity. The integrated GNNs are naturally able to perform transductive inference, taking advantage of the labeled source data and unlabeled target data. The propagation rules for the edge update and node update are elaborated in the following subsections.
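The episode construction can be sketched as follows; build_episode, its one-sample-per-class support and its dense adjacency are our illustrative assumptions, written only to show how a mini-batch of source and target instances forms a graph.

```python
import torch

def build_episode(src_feats, src_labels, tgt_feats, n_classes, n_query):
    """Form one episode: one source sample per class plus random target samples.
    A sketch of the episodic batch of Section 4.1 (assumes every class occurs
    at least once in src_labels), not the authors' exact sampler."""
    support_idx = torch.stack([
        torch.nonzero(src_labels == c).squeeze(1)[
            torch.randint(0, int((src_labels == c).sum()), (1,))]
        for c in range(n_classes)]).squeeze(1)
    query_idx = torch.randperm(tgt_feats.size(0))[:n_query]
    nodes = torch.cat([src_feats[support_idx], tgt_feats[query_idx]], dim=0)
    # Dense undirected graph: every pair of nodes is connected; the edge
    # features (affinities) are produced by the edge-update network of Eq. (9).
    n = nodes.size(0)
    adj = torch.ones(n, n) - torch.eye(n)
    return nodes, adj, support_idx, query_idx
```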

4.1.1 Edge Update

The generic propagation rule for the normalized edge features at the $l$-th layer can be defined as,

$$A^{(l)} = D^{-\frac{1}{2}} \big(\hat{A}^{(l)} + I\big) D^{-\frac{1}{2}}, \qquad \hat{A}^{(l)}_{ij} = \sigma\Big(f^{(l)}_e\big(\mathbf{v}^{(l-1)}_i, \mathbf{v}^{(l-1)}_j; \theta_e\big)\Big),\qquad(9)$$

with $\sigma(\cdot)$ being the sigmoid function, $D$ the degree matrix of $\hat{A}^{(l)} + I$, $I$ the identity matrix, and $f^{(l)}_e$ the non-linear network parameterized by $\theta_e$.
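A minimal PyTorch sketch of this edge update, under our reading of Eq. (9), is given below; the module name EdgeUpdate and the pairwise absolute-difference features fed to f_e are our assumptions.

```python
import torch
import torch.nn as nn

class EdgeUpdate(nn.Module):
    """Sketch of Eq. (9): a small non-linear network f_e scores every node
    pair, and the affinity matrix (plus self-loops) is symmetrically
    normalized with its degree matrix."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.f_e = nn.Sequential(nn.Linear(dim, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, v):                      # v: (N, dim) node features
        diff = (v.unsqueeze(1) - v.unsqueeze(0)).abs()       # (N, N, dim) pairs
        a_hat = torch.sigmoid(self.f_e(diff)).squeeze(-1)    # (N, N) affinities
        a_hat = a_hat + torch.eye(v.size(0), device=v.device)  # add I (self-loops)
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)              # D^{-1/2} from degrees
        return d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
```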

4.1.2 Node Update

Similarly, the propagation rule for the node features at the $l$-th layer is defined as,

$$\mathbf{v}^{(l)}_i = f^{(l)}_v\Big(\Big[\textstyle\sum_{j \in \mathcal{N}(i)} A^{(l)}_{ij}\, \mathbf{v}^{(l-1)}_j \ ;\ \mathbf{v}^{(l-1)}_i\Big];\ \theta_v\Big),\qquad(10)$$

with $\mathcal{N}(i)$ being the neighbor set of the node $v_i$, $[\,\cdot\,;\,\cdot\,]$ the concatenation operation, and $f^{(l)}_v$ the network consisting of two convolutional layers, LeakyReLU activations and dropout layers. The node embedding is initialized with the representations extracted from the backbone embedding model, i.e., $\mathbf{v}^{(0)}_i = \phi(\mathbf{x}_i)$.
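Analogously, a sketch of the node update of Eq. (10) could look as follows; the exact convolutional layout of f_v and the dropout rate are assumed, not taken from the released code.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """Sketch of Eq. (10): aggregate neighbors with the normalized edge map A,
    concatenate with the node's own feature, and transform with f_v (two 1x1
    convolutions with LeakyReLU and dropout, as described in the text)."""
    def __init__(self, dim, out_dim, p_drop=0.5):
        super().__init__()
        self.f_v = nn.Sequential(
            nn.Conv1d(2 * dim, out_dim, kernel_size=1), nn.LeakyReLU(),
            nn.Dropout(p_drop),
            nn.Conv1d(out_dim, out_dim, kernel_size=1), nn.LeakyReLU(),
            nn.Dropout(p_drop))

    def forward(self, v, adj):                 # v: (N, dim), adj: (N, N)
        agg = adj @ v                          # neighborhood aggregation
        h = torch.cat([agg, v], dim=-1)        # concatenation [agg ; v]
        return self.f_v(h.t().unsqueeze(0)).squeeze(0).t()  # (N, out_dim)
```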

4.1.3 Joint Optimization

Domain Adaptation. We exploit an adversarial loss to align the distributions of the source and target features extracted from the backbone network $\phi$. Specifically, a domain classifier $D$ is trained to discriminate between features coming from the source or target domain, while the backbone $\phi$ acts as a generator trained to fool the discriminator $D$. The following two-player minimax game is expected to reach an equilibrium, resulting in domain-invariant features:

$$\min_{\phi} \max_{D}\ \mathcal{L}_{adv} = \mathbb{E}_{\mathbf{x}\sim\mathcal{D}_s}\big[\log D(\phi(\mathbf{x}))\big] + \mathbb{E}_{\mathbf{x}\sim\mathcal{D}_t}\big[\log\big(1 - D(\phi(\mathbf{x}))\big)\big].$$
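One common way to realize this minimax game in a single backward pass is a gradient reversal layer (Ganin et al., 2016); the sketch below follows that recipe and is an assumption about the implementation, not necessarily the authors' exact one.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal (Ganin et al., 2016): identity in the forward pass,
    negated gradient in the backward pass, so the backbone learns to fool
    the domain classifier while the classifier learns to discriminate."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

def adversarial_loss(domain_clf, feat_s, feat_t):
    """Binary domain-classification loss on reversed features; minimizing it
    w.r.t. all parameters realizes the minimax game above (a sketch)."""
    bce = nn.BCEWithLogitsLoss()
    logit_s = domain_clf(GradReverse.apply(feat_s))
    logit_t = domain_clf(GradReverse.apply(feat_t))
    return bce(logit_s, torch.ones_like(logit_s)) + \
           bce(logit_t, torch.zeros_like(logit_t))
```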

Node Classification. By decomposing the shared hypothesis into a feature learning module $\phi$ and a shared classifier $g$, we train both networks to classify the source node embeddings. To alleviate the inherent class-imbalance issue, we adopt the focal loss to down-weigh the loss assigned to correctly-classified examples:

$$\mathcal{L}_{node} = -\sum_{l=1}^{L} \mathbb{E}_{(\mathbf{x}_i, y_i)\sim\mathcal{S}}\Big[\big(1 - p^{(l)}_{i,y_i}\big)^{\gamma} \log p^{(l)}_{i,y_i}\Big],$$

with the focusing hyperparameter $\gamma$ and $p^{(l)}_{i,y_i}$ being the predicted probability of the true class $y_i$ computed from the node embedding at the $l$-th node update layer. The total loss combines the losses from all $L$ layers to improve the gradient flow in the lower layers.
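For concreteness, a standard single-layer focal loss can be written as follows; treating gamma = 2 as the default is our assumption.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal re-weighting of cross-entropy: well-classified samples (high p_y)
    are down-weighted by (1 - p_y)^gamma; gamma is the focusing
    hyperparameter mentioned in the text."""
    log_p = F.log_softmax(logits, dim=-1)
    log_p_y = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true class
    p_y = log_p_y.exp()
    return (-(1.0 - p_y).pow(gamma) * log_p_y).mean()
```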

Edge Classification. Based on the given labels of the source data, we construct the ground-truth edge map $A^{*}$, where $A^{*}_{ij} = 1$ if $v_i$ and $v_j$ belong to the same class, and $A^{*}_{ij} = 0$ otherwise. The edge update networks are trained by minimizing the following binary cross-entropy loss:

$$\mathcal{L}_{edge} = -\sum_{l=1}^{L} \mathbb{E}_{i,j}\Big[A^{*}_{ij} \log A^{(l)}_{ij} + \big(1 - A^{*}_{ij}\big) \log\big(1 - A^{(l)}_{ij}\big)\Big].$$
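A sketch of this edge supervision, assuming the per-layer affinity maps are collected in a list, is given below.

```python
import torch
import torch.nn.functional as F

def edge_loss(edge_maps, labels):
    """BCE between predicted affinities and the ground-truth edge map A*,
    where A*_ij = 1 iff nodes i and j share a class label (labeled nodes only).
    edge_maps: list of (N, N) affinity matrices, one per GNN layer."""
    a_star = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # (N, N)
    return sum(F.binary_cross_entropy(a.clamp(1e-6, 1 - 1e-6), a_star)
               for a in edge_maps)
```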

Final Objective Function. Formally, our ultimate goal is to learn the optimal parameters for the proposed model,

$$\min_{\theta_{\phi}, \theta_v, \theta_e, \theta_g} \max_{D}\ \mathcal{L}_{node} + \lambda_1 \mathcal{L}_{edge} + \lambda_2 \mathcal{L}_{adv},\qquad(11)$$

with $\lambda_1$ and $\lambda_2$ the coefficients of the edge loss and the adversarial loss, respectively.

4.2 Pseudo-Labeling in Progressive Paradigm

With the optimal model parameters obtained at the $t$-th step, we freeze the model and feed all the target samples in the forward direction, as shown in Step 2 of Figure 1. Then, we rank the maximum likelihoods produced by the shared classifier $g$ in ascending order. Giving priority to the “easier” samples with relatively high/low confidence scores, we select samples to enlarge the pseudo-labeled unknown set and known set (refer to Eq. (6)):

$$\mathcal{T}^{t}_{u} = \mathcal{T}^{t-1}_{u} \cup \hat{\mathcal{T}}^{t}_{u}, \qquad \mathcal{T}^{t}_{k} = \mathcal{T}^{t-1}_{k} \cup \hat{\mathcal{T}}^{t}_{k},\qquad(12)$$

Note that $\hat{\mathcal{T}}^{t}_{k}$ and $\hat{\mathcal{T}}^{t}_{u}$ are the newly annotated known set and unknown set, respectively, and the pseudo-label is given by $\hat{y}_t$. To find a proper value of the enlarging factor $\alpha$, we have two options: aggressively setting a large value of $\alpha$ accomplishes the progressive paradigm in fewer steps, but at the price of potentially noisy and unreliable pseudo-labeled candidates; on the contrary, choosing a small value of $\alpha$ results in a steady increase of the model performance at a higher computational cost.
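Putting Eq. (6) and Eq. (12) together, the progressive loop can be sketched as follows; train_one_step and predict_proba are hypothetical stand-ins for the episodic training of Section 4.1 and a frozen forward pass, and pseudo_label_step is the sketch from Section 3.

```python
# A sketch of the progressive paradigm (Eq. (12)), assuming T = 1/alpha steps.
# alpha: enlarging factor; openness: the hyperparameter rho of Eq. (6).
known_set, unknown_set = {}, set()
T = int(1 / alpha)
for t in range(1, T + 1):
    model = train_one_step(model, source_data, known_set, target_data)  # Sec. 4.1/4.3
    probs = model.predict_proba(target_data)          # frozen forward pass
    labels = pseudo_label_step(probs, t, alpha, openness)
    for j, y in enumerate(labels):
        if y == probs.shape[1]:                       # rejected as unknown
            unknown_set.add(j)
        elif y >= 0:                                  # newly annotated known sample
            known_set[j] = y
```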

Figure 2: An illustration of the progressive learning to construct the pseudo-labeled target set. $\theta^{*}$ indicates the ideal threshold for classifying known and unknown samples.

4.3 Episodic Update with Mix-up Strategy

We mix the source data with the samples from the updated pseudo-labeled known set at the $t$-th step, and construct new episodes for the $(t+1)$-th step, as depicted in Step 3 of Figure 1. In particular, we randomly replace the source samples with pseudo-labeled known data with a probability $p$. Each episode in the new batch consists of three parts,

$$\mathcal{E}^{t+1} = \mathcal{E}_{s} \cup \mathcal{E}^{t}_{k} \cup \mathcal{E}_{t}, \qquad \mathcal{E}^{t}_{k} \sim P^{t}_{k},\qquad(13)$$

with $P^{t}_{k}$ being the conditional distribution of the pseudo-labeled known set at the $t$-th step. Then, we update the model parameters according to Equation (11) and repeat the pseudo-labeling with the newly constructed episodes until convergence.
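A sketch of this stochastic replacement is shown below; mixup_episode, p_replace and the class-indexed dictionary pseudo_known are our illustrative names.

```python
import random

def mixup_episode(src_episode, pseudo_known, p_replace):
    """Sketch of the episodic update of Section 4.3: each source sample is
    swapped for a pseudo-labeled known target sample of the same class with
    probability p_replace. pseudo_known maps class -> list of target samples."""
    mixed = []
    for x, y in src_episode:
        if pseudo_known.get(y) and random.random() < p_replace:
            mixed.append((random.choice(pseudo_known[y]), y))  # target stand-in
        else:
            mixed.append((x, y))                               # keep source sample
    return mixed
```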

5 Experiments

In this section, we quantitatively compare our proposed model against various domain adaptation baselines on the Office-Home, Syn2Real-O and VisDA-17 datasets. The baselines include three open-set domain adaptation methods, namely ATI-λ (Busto and Gall, 2017), OSBP (Saito et al., 2018) and STA (Liu et al., 2019); two closed-set domain adaptation methods, MMD (Gretton et al., 2006) and DANN (Ganin et al., 2016); and a basic ResNet-50 (He et al., 2016) deep classification model. To apply the baseline methods that cannot reject unknowns by themselves (ATI-λ, MMD, DANN, ResNet-50) in the open-set setting, we follow (Liu et al., 2019) and reject unknown outliers from the target data using OSVM (Jain et al., 2014).

Evaluation Metrics: To evaluate the proposed method and the baselines, we utilize three widely used measures (Saito et al., 2018; Liu et al., 2019), i.e., the overall accuracy (ALL), the normalized accuracy for all classes (OS) and the normalized accuracy for the known classes only (OS*):

$$\mathrm{OS} = \frac{1}{C+1} \sum_{c=1}^{C+1} \frac{\big|\{\mathbf{x} \in \mathcal{T}_c : h(\mathbf{x}) = c\}\big|}{|\mathcal{T}_c|}, \qquad \mathrm{OS}^{*} = \frac{1}{C} \sum_{c=1}^{C} \frac{\big|\{\mathbf{x} \in \mathcal{T}_c : h(\mathbf{x}) = c\}\big|}{|\mathcal{T}_c|},\qquad(14)$$

with $\mathcal{T}_c$ being the set of target samples in the $c$-th class, and $h$ the classifier. In our case, we use the shared classifier $g$ for the known classes and the pseudo-labeling function $\hat{y}$ for the unknown one.
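For reference, the three measures can be computed as follows; the sketch assumes every class appears at least once in the target ground truth.

```python
import numpy as np

def open_set_metrics(y_true, y_pred, n_known):
    """ALL, OS and OS* as per-class normalized accuracies (Eq. (14) sketch).
    Classes 0..n_known-1 are known; class n_known plays the role of the
    unknown class C+1. Assumes every class occurs at least once in y_true."""
    all_acc = float((y_true == y_pred).mean())
    per_class = np.array([(y_pred[y_true == c] == c).mean()
                          for c in range(n_known + 1)])
    os_acc = float(per_class.mean())              # normalized over C+1 classes
    os_star = float(per_class[:n_known].mean())   # known classes only
    return all_acc, os_acc, os_star
```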

Implementation Details: A PyTorch implementation of our approach is available in an anonymized repository (https://github.com/BUserName/PGL). In our experiments, we employ ResNet-50 (He et al., 2016) or VGGNet (Simonyan and Zisserman, 2015) pre-trained on ImageNet as the backbone network. For VGGNet, we only fine-tune the parameters of the FC layers. The networks are trained with the ADAM optimizer with weight decay. The learning rates for the GNNs and the backbone module are initialized separately and decayed by a fixed factor at regular intervals. The dropout rate and the depth $L$ of the GNN are fixed across all experiments, and the loss coefficients $\lambda_1$ and $\lambda_2$ are set empirically.

5.1 Datasets

Office-Home (Venkateswara et al., 2017) is a challenging domain adaptation benchmark, which comprises 15,500 images from 65 categories of everyday objects. The dataset consists of 4 domains: Art (Ar), Clipart (Cl), Product (Pr), and Real-World (Rw). Following the same splits used in (Liu et al., 2019), we select the first 25 classes in alphabetical order as the known classes, and group the rest of the classes as the unknown.

VisDA-17 (Peng et al., 2017) is a cross-domain dataset with 12 categories in two distinct domains. The Synthetic domain consists of 152,397 synthetic images generated by 3D rendering, and the Real domain contains 55,388 real-world images from the MSCOCO dataset (Lin et al., 2014). Following the same protocol used in (Saito et al., 2018; Liu et al., 2019), we construct the known set with 6 categories and group the remaining 6 categories as the unknown set.

Syn2Real-O (Peng et al., 2018) is the most challenging synthetic-to-real testbed, which is constructed from the VisDA-17. The Syn2Real-O dataset significantly increases the openness to 0.9 by introducing additional unknown samples in the target domain. According to the official setting, the Synthetic source domain contains training data from the VisDA-17 as the known set, and the target domain Real includes the test data from the VisDA-17 (known set) plus 50k images from irrelevant categories of MSCOCO dataset (unknown set).

Method Ar→Cl Ar→Pr Ar→Rw Cl→Rw Cl→Pr Cl→Ar Pr→Ar Pr→Cl Pr→Rw Rw→Ar Rw→Cl Rw→Pr Avg.
(OS OS* reported for each transfer task)
ResNet+OSVM 37.5 38.7 42.2 42.6 49.2 51.4 53.8 55.5 48.5 50.0 39.2 40.3 53.4 55.1 43.5 44.8 70.6 72.9 65.6 67.4 49.5 50.8 72.7 75.1 52.1 53.7
DANN+OSVM 52.3 52.1 71.3 72.4 82.3 83.8 73.2 74.5 62.8 64.1 61.4 62.3 63.5 64.5 46.0 46.3 77.2 78.3 70.5 71.3 55.5 56.2 79.1 80.7 66.2 67.2
ATI-λ+OSVM 53.1 54.2 68.6 70.4 77.3 78.1 74.3 75.3 66.7 68.3 57.8 59.1 61.2 62.6 53.9 54.1 79.9 81.1 70.0 70.8 55.2 55.4 78.3 79.4 66.4 67.4
OSBP 56.1 57.2 75.8 77.8 83.0 85.4 75.5 77.2 69.2 71.3 64.6 65.9 64.6 65.3 48.3 48.7 79.5 81.6 72.1 73.5 54.3 55.3 80.2 81.9 68.6 70.1
STA 58.1 - 71.6 - 85.0 - 75.8 - 69.3 - 63.4 - 65.2 - 53.1 - 80.8 - 74.9 - 54.4 - 81.9 - 69.5 -
STA† 46.6 45.9 67.0 67.2 76.2 76.6 64.9 65.2 57.7 57.6 50.2 49.3 49.5 48.4 42.9 40.8 76.6 77.3 68.7 68.6 46.0 45.4 73.9 74.5 60.0 59.8
PGL 61.6 63.3 77.1 78.9 85.9 87.7 82.8 85.9 72.0 73.9 68.8 70.2 72.2 73.7 58.4 59.2 82.6 84.8 78.6 81.5 65.0 68.8 83.0 84.8 74.0 76.1
Table 1: Recognition accuracies (%, OS and OS*) on 12 pairs of source/target domains from the Office-Home benchmark using ResNet-50 as the backbone. Ar: Art, Cl: Clipart, Pr: Product, Rw: Real-World. † indicates our re-implementation with the officially released code.
Method Aer Bic Bus Car Hor Kni Mot Per Pla Ska Tra Tru UNK OS OS*
ResNet (He et al., 2016)+OSVM 29.7 39.2 49.9 54.0 76.8 22.2 71.2 32.6 75.1 21.5 65.2 0.6 45.2 44.9 44.8
DANN (Ganin et al., 2016)+OSVM 50.8 44.1 19.0 58.5 76.8 26.6 68.7 50.5 82.4 21.1 69.7 1.1 33.6 46.3 47.4
OSBP (Saito et al., 2018) 75.5 67.7 68.4 66.2 71.4 0.0 86.0 3.2 39.4 23.2 68.1 3.7 79.3 50.1 47.7
STA (Liu et al., 2019) 64.1 70.3 53.7 59.4 80.8 20.8 90.0 12.5 63.2 30.2 78.2 2.7 59.1 52.7 52.2
PGL 81.5 68.3 74.2 60.6 91.9 45.4 92.2 41.0 87.9 67.5 79.2 6.4 49.6 65.5 66.8
Table 2: Recognition accuracies (%) for open-set domain adaptation experiments on the Syn2Real-O (ResNet-50).

5.2 Results and Analysis

As reported in Table 1, Table 2, and Table 3, we clearly observe that our method PGL consistently outperforms the state-of-the-art results in mean accuracy (OS) on the Office-Home, Syn2Real-O and VisDA-17 benchmarks. Note that our proposed approach provides significant performance gains on the more challenging Syn2Real-O and VisDA-17 datasets, which require knowledge transfer across different modalities. This phenomenon can also be observed in the transfer sub-tasks with a large domain shift, e.g., Rw→Cl and Pr→Ar in Office-Home, which demonstrates the strong adaptation ability of the proposed framework. To study the validity of the progressive paradigm and the early stopping strategy, we provide detailed graphs of our test performance per training step (OS, OS* and ALL scores) in the supplementary material.

Method Bic Bus Car Mot Tra Tru UNK OS OS*
MMD+OSVM 39.0 50.1 64.2 79.9 86.6 16.3 44.8 54.4 56.0
DANN+OSVM 31.8 56.6 71.7 77.4 87.0 22.3 41.9 55.5 57.8
ATI-λ 46.2 57.5 56.9 79.1 81.6 32.7 65.0 59.9 59.0
OSBP 51.1 67.1 42.8 84.2 81.8 28.0 85.1 62.9 59.2
STA 52.4 69.6 59.9 87.8 86.5 27.2 84.1 66.8 63.9
PGL 93.5 93.8 75.7 98.8 96.2 38.5 68.6 80.7 82.8
Table 3: Performance comparisons on the VisDA-17 (VGGNet).
Model UNK ALL OS OS*
PGL w/o Progressive 43.6 44.8 54.4 55.3
PGL w NLL 48.6 49.7 56.9 57.6
PGL w/o GNNs 49.2 50.3 57.8 58.5
PGL w/o Mix-up 49.8 51.3 62.5 63.6
PGL 49.6 51.5 65.5 66.8
Table 4: Ablation performance on the Syn2Real-O (ResNet-50). “w” indicates with and “w/o” indicates without.
Enlarging Factor α    Syn2Real-O    Office-Home (Ar→Cl)
                      OS OS*        OS OS*
63.0 63.3 59.9 61.1
64.5 65.7 60.7 61.6
65.6 66.5 61.8 63.1
Table 5: Performance comparisons w.r.t. varying enlarging factor α on Syn2Real-O and Office-Home (Ar→Cl) with ResNet-50.

Ablation Study: To investigate the impact of the derived progressive paradigm, the GNNs, the node classification loss, and the mix-up strategy, we compare four variants of the PGL model on the Syn2Real-O dataset, as shown in Table 4. Except for PGL w/o Progressive, which is trained in a single step, all experiments are conducted under the default setting of hyperparameters. PGL w/o Progressive corresponds to the model directly trained with one step, followed by the pseudo-labeling function for classifying the unknown samples. As shown in Table 4, without the progressive learning strategy, the OS result of PGL w/o Progressive drops significantly by 16.9%, because it does not leverage the pseudo-labeled target samples, leading to a failure in minimizing the shared error at the sample level. In PGL w NLL, the focal loss of the node classification objective is replaced with the negative log-likelihood (NLL) loss, resulting in the OS performance dropping from 65.5% to 56.9%. Due to the absence of the focal re-weighting, the model tends to assign more pseudo-labels to easy-to-classify samples, which consequently hinders effective graph learning in the episodic training process. In PGL w/o GNNs, we used ResNet-50 as the backbone for feature learning, which triggers a 12.5% OS performance drop compared to the graph learning model. The inferior results reveal that the GNN module can learn the class-wise manifold, which mitigates potential noise and perturbation by aggregating the neighboring information. PGL w/o Mix-up refers to the model that constructs episodes without taking any pseudo-labeled target data. We observe that the OS performance of PGL w/o Mix-up is 4.6% lower than that of the proposed model, confirming that progressively replacing the source samples with pseudo-labeled target samples can alleviate the side effect of conditional shift.

Figure 3: The t-SNE visualization of feature distributions on the Rw→Ar task (Office-Home) with the ResNet-50 backbone.
Figure 4: Performance comparisons w.r.t. (a) varying openness of Syn2Real-O (ResNet-50); (b) varying loss coefficients λ1 and λ2 on the Ar→Cl task (Office-Home) with the ResNet-50 backbone.

Robustness Analysis to Varying Openness: To verify the robustness of the proposed PGL, we conduct experiments on Syn2Real-O with varying openness. The openness is defined as the ratio of unknown samples to all samples in the entire target set, which explicitly reflects the level of challenge. The results of OSBP, STA and the proposed PGL are depicted in Figure 4(a). Note that OSBP and our PGL approach empirically set a hyperparameter (ϱ in our case) to control the openness, while STA automatically generates the soft weight in an adversarial way, which inevitably results in performance fluctuation. We observe that PGL consistently outperforms its counterparts by a large margin, which confirms its resistance to changes in openness.

Sensitivity to Loss Coefficients λ1 and λ2: We show the sensitivity of our approach to the edge loss coefficient λ1 and the adversarial loss coefficient λ2 in Figure 4(b). We vary the value of one loss coefficient within (0, 2] at a time, while fixing the other to its default setting. Two observations can be drawn from Figure 4(b): the OS score becomes stable when the loss coefficients lie within the interval [0.7, 2]; when λ1 = 0 or λ2 = 0, the model performance drops markedly, which verifies the importance of the edge supervision and adversarial learning in our framework.

Sensitivity to Enlarging Factor α: We further study the effect of the enlarging factor α, which controls the growth speed of the pseudo-labeled set, as shown in Table 5. We note that the proposed model with a smaller value of α consistently performs better on both the Syn2Real-O and Office-Home datasets. This supports our theoretical finding that the progressive open-set risk can be controlled by consecutively classifying unknown samples. At the cost of longer training time, this strategy also provides more reliable pseudo-labeled candidates for learning the shared classifier, preventing potential error accumulation in subsequent steps.

t-SNE Visualization. To intuitively showcase the effectiveness of OUDA approaches, we visualize the feature distributions with t-SNE after extracting features from the baseline models (ResNet-50, DANN, OSBP, STA) and our proposed PGL. As shown in Figure 3, compared with ResNet-50 and DANN, the open-set domain adaptation methods generally achieve a better separation between the known (in blue and red) and unknown (in grey) categories. STA achieves a better alignment between the source and target distributions than OSBP, while PGL obtains a clearer class-wise classification boundary, benefiting from our graph neural networks and the mix-up strategy. Please refer to the supplementary material for more t-SNE visualizations.

Edge Map Visualization. To further analyze the validity of the edge update networks, we extract the learned edge map $A^{(1)}$ from PGL with a single-layer GNN on the Syn2Real-O dataset. As visualized in Figure 5(b), a large value of $A^{(1)}_{ij}$ corresponds to a high degree of correlation between nodes $v_i$ and $v_j$, which resembles the pattern of the ground-truth edge label displayed in Figure 5(a).

Figure 5: Visualization of edge features on the Syn2Real-O. Left: the binary ground-truth label map. Right: the learned edge map from the proposed edge update networks. Best viewed in color.

6 Conclusion

We have addressed the open-set domain shift problem at both the sample and manifold levels by controlling the open-set risk. Experiments show that our proposed progressive graph learning framework performs consistently well on challenging open-set object recognition benchmarks with significant domain discrepancy and conditional shift.

Acknowledgements

This work was partially supported by ARC DP 190102353.

References

  • M. Baktashmotlagh, M. Faraki, T. Drummond, and M. Salzmann (2019) Learning factorized representations for open-set domain adaptation. In Proc. Int. Conference on Learning Representations (ICLR).
  • M. Baktashmotlagh, M. T. Harandi, B. C. Lovell, and M. Salzmann (2013) Unsupervised domain adaptation by domain invariant projection. In Proc. Int. Conference on Computer Vision (ICCV).
  • S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira (2006) Analysis of representations for domain adaptation. In Proc. Advances in Neural Information Processing Systems (NeurIPS).
  • Y. Bengio, J. Louradour, R. Collobert, and J. Weston (2009) Curriculum learning. In Proc. Int. Conference on Machine Learning (ICML).
  • P. P. Busto and J. Gall (2017) Open set domain adaptation. In Proc. Int. Conference on Computer Vision (ICCV).
  • Z. Fang, J. Lu, F. Liu, J. Xuan, and G. Zhang (2019) Open set domain adaptation: theoretical bound and algorithm. arXiv preprint arXiv:1907.08375.
  • Q. Feng, G. Kang, H. Fan, and Y. Yang (2019) Attract or distract: exploit the margin of open set. In Proc. Int. Conference on Computer Vision (ICCV).
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. S. Lempitsky (2016) Domain-adversarial training of neural networks. Journal of Machine Learning Research, pp. 59:1–59:35.
  • M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In Proc. European Conference on Computer Vision (ECCV).
  • A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. J. Smola (2006) A kernel method for the two-sample-problem. In Proc. Advances in Neural Information Processing Systems (NeurIPS).
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • L. P. Jain, W. J. Scheirer, and T. E. Boult (2014) Multi-class open set recognition using probability of inclusion. In Proc. European Conference on Computer Vision (ECCV).
  • T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: common objects in context. In Proc. European Conference on Computer Vision (ECCV).
  • H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang (2019) Separate to adapt: open set domain adaptation via progressive separation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • M. Long, Y. Cao, J. Wang, and M. I. Jordan (2015) Learning transferable features with deep adaptation networks. In Proc. Int. Conference on Machine Learning (ICML).
  • M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu (2013) Transfer feature learning with joint distribution adaptation. In Proc. Int. Conference on Computer Vision (ICCV).
  • M. Long, H. Zhu, J. Wang, and M. I. Jordan (2017) Deep transfer learning with joint adaptation networks. In Proc. Int. Conference on Machine Learning (ICML).
  • Y. Mansour, M. Mohri, and A. Rostamizadeh (2009) Domain adaptation: learning bounds and algorithms. In Proc. Conference on Learning Theory (COLT).
  • X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko (2017) VisDA: the visual domain adaptation challenge. arXiv preprint arXiv:1710.06924.
  • X. Peng, B. Usman, K. Saito, N. Kaushik, J. Hoffman, and K. Saenko (2018) Syn2Real: a new benchmark for synthetic-to-real visual domain adaptation. arXiv preprint arXiv:1806.09755.
  • K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada (2018) Open set domain adaptation by backpropagation. In Proc. European Conference on Computer Vision (ECCV).
  • K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In Proc. Int. Conference on Learning Representations (ICLR).
  • J. Snell, K. Swersky, and R. S. Zemel (2017) Prototypical networks for few-shot learning. In Proc. Advances in Neural Information Processing Systems (NeurIPS).
  • E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474.
  • H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan (2017) Deep hashing network for unsupervised domain adaptation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra (2016) Matching networks for one shot learning. In Proc. Advances in Neural Information Processing Systems (NeurIPS).
  • Y. Zhang, T. Liu, M. Long, and M. I. Jordan (2019) Bridging theory and algorithm for domain adaptation. In Proc. Int. Conference on Machine Learning (ICML).
  • H. Zhao, R. T. des Combes, K. Zhang, and G. J. Gordon (2019) On learning invariant representations for domain adaptation. In Proc. Int. Conference on Machine Learning (ICML).