Attract or Distract: Exploit the Margin of Open Set

08/06/2019 ∙ by Qianyu Feng, et al. ∙ University of Technology Sydney

Open set domain adaptation aims to diminish the domain shift across domains that share only part of their classes: there exist unknown target samples outside the knowledge of the source domain. Compared to the closed-set setting, how to separate the unknown (unshared) class from the known (shared) ones plays a key role. However, previous methods did not emphasize the semantic structure of the open set data, which may introduce bias into the domain alignment and confuse the classifier around the decision boundary. In this paper, we exploit the semantic structure of open set data from two aspects: 1) Semantic Categorical Alignment, which aims to achieve good separability of target known classes by categorically aligning the centroids of the target with those of the source; 2) Semantic Contrastive Mapping, which aims to push the unknown class away from the decision boundary. Empirically, we demonstrate that our method performs favourably against the state-of-the-art methods on representative benchmarks, e.g. the Digit and Office-31 datasets.


1 Introduction

Recent days have witnessed the advancement of many computer vision tasks [21, 37, 38, 15, 30, 24, 16]. The success achieved can be largely attributed to a sufficient amount of labeled in-domain data. However, it is common that test data come from a distribution different from that of the training data. Such domain shift may heavily degrade model performance. Domain adaptation deals with this issue by diminishing the discrepancy between the two domains. The widely considered closed-set setting assumes that both domains share the same set of underlying categories. In practice, however, it is common that some unshared (unknown) classes exist in the target domain. Methods developed for closed-set domain adaptation may not be trivially transferred to such an open-set setting.

In this paper, we focus on open set visual domain adaptation, which aims to deal with the domain shift and the identification of unknown objects simultaneously, in the absence of target domain labels. Compared to closed-set domain adaptation, how to separate the unknown class from the known ones plays a key role. Up to now, open set recognition still remains an open issue. Busto et al. [3], who first raised the problem, proposed to deal with open-set domain adaptation as an assignment task. [1] separated the unknown according to whether a sample can be reconstructed with the shared features or not. While the above methods use part of the labeled data from uninteresting classes as unknown samples, it is not possible to represent all the unknown categories in the wild. Another setting was raised by Saito et al. [22], where unknown samples exist only in the target domain, which is closer to a realistic scenario. Saito et al. regarded the unknown samples as a separate class and used an adversarial loss to distinguish them. It is worth noting that the existence of unknown samples hinders the alignment across domains. Meanwhile, the inter-class misalignment across domains also makes it harder to distinguish the unknown samples.

Figure 1: Visualization of the data distribution with the proposed method. Left: Data before adaptation, with domain shift and unknown samples. Middle: Neighbors from the same class are pulled closer, while samples from the unknown class are pushed away. Right: With the proposed method, representations become more discriminative. Samples from the target domain are better aligned within the corresponding neighborhood or distracted away from the known classes.

Considering the aforementioned problem, we take the semantic structure of open set data into account to make the unknown class more separable and thus improve the model’s predictive ability on target domain data. Specifically, we focus on enlarging two kinds of margins: 1) the margin across the known classes and 2) the margin between the unknown samples and the known classes. For the first one, we aim to make the known classes more separable; for the second one, we expect to push the unknown class away from the decision boundary. As shown in Fig. 1, during training, samples coming from different domains but within the same class (e.g. the blue circle and the red circle) “attract” each other. For each domain, the margin between different known classes (e.g. the blue circle and the blue star) is enlarged. Moreover, the samples of the unknown class (e.g. the irregular polygons) are “distracted” from the samples of the known classes.

We propose using semantic categorical alignment (SCA) and semantic contrastive mapping (SCM) to achieve our goal. For semantic categorical alignment, due to the absence of target annotations, we indirectly promote the separability across target known classes by categorically aligning their centers with those in the source domain. For the source domain, although the separability across known classes can be achieved to an extent by imposing the cross-entropy loss on the labeled data, we explicitly model and enhance such separability with the contrastive-center loss [4]. Empirically, we demonstrate that our method leads to more discriminative features and benefits the semantic alignment across domains.

Although semantic categorical alignment helps align the decision boundary across the two domains, there may still exist confusing samples of the unknown class lying near the decision boundary. Thus we propose using semantic contrastive mapping to push the unknown class away from the boundary. In detail, we design the contrastive loss to make the margin between the unknown and known classes larger than that between known classes.

As the target labels are not available, we use the predictions of the network at each iteration as the hypothesis of target labels to perform the semantic categorical alignment and the semantic contrastive mapping. We start training from the source-trained model to give a good initialization of the target label hypothesis. Although the hypothesis of target labels may not be accurate, SCA and SCM are themselves robust to such noisy labels. Empirically, we find that the estimated SCA/SCM loss works as a good proxy to improve the model’s performance on the target domain data.

In a nutshell, our contributions can be summarized as

  • We propose using semantic categorical alignment to achieve good separability of target known classes and semantic contrastive mapping to push the unknown class away from the decision boundary. Both benefit the adaptation performance noticeably.

  • Our method performs favourably against the state-of-the-art methods on two representative benchmarks, i.e. on the Digit datasets it achieves 84.3% accuracy on average, 1.9% higher than the state-of-the-art; on the Office-31 dataset, we achieve 89.7% with AlexNet and 89.1% with VGG.

2 Related work

Domain adaptation for visual recognition aims to bridge the knowledge gap across different domains. Approaches for open set recognition attempt to identify unknown samples while recognizing samples from known classes. In a real scenario, data not only come from diverse domains but also vary over a wide range of categories. This paper focuses on dealing with the overlap of these two problems.

Many methods [25, 26, 10, 39, 2, 36, 19, 42, 27, 43] have been proposed for unsupervised domain adaptation (UDA), including deep network based ones. These works bring significant results for the closed-set setting and can be grouped into the following aspects. Distribution-based learning. Many approaches aim to learn features invariant to domains with a distance metric [25, 26, 40], e.g., KL-divergence, Maximum Mean Discrepancy (MMD), or the Wasserstein distance, but they neglect the alignment of the conditional distribution. The categorical information can be exploited to align domains at a fine-grained level together with pseudo-labels. The marginal and conditional distributions can also be jointly aligned with the combined MMD proposed by [33]. [35, 14, 32, 34] pay attention to the discriminative property of the representation. This paper is also related to work [18, 12, 5, 28, 7, 8, 41] that considers categorical semantic compactness and separability at the same time. Task-oriented learning. Approaches [39, 2, 9] tend to align the domain discrepancy in an adversarial style. Ganin et al. [9] proposed to learn domain-invariant features by using an adversarial loss which reverses the gradients during back-propagation. Bousmalis et al. [2] enabled the network to separate the generated features into a domain-specific subspace and a domain-invariant subspace.

Figure 2: Framework of the proposed method. There are three modules: Adversarial Domain Adaptation (ADA), Semantic Categorical Alignment (SCA) and Semantic Contrastive Mapping (SCM). SCA aims to learn discriminative representation and align samples from the same category across domains. SCM attempts to distract unknown samples away from all the known categories. All the modules are trained simultaneously and work together to better categorize each known class and unknown class.

The aforementioned methods tackle domain adaptation in the closed-set scenario. Inspired by recent work on open-set recognition, the problem of Open Set Domain Adaptation (OSDA) was raised by Busto et al. [3]. With several classes treated as unknown samples, Busto et al. proposed to solve this problem by learning a mapping across domains through an assignment problem over the target samples. As mentioned above, it is not possible to cover all the unknown samples with selected categories. Another setting, in which the unknown class does not exist in the source domain, was raised by Saito et al. [22]. Regarding the unknown as a separate class, they enable the network to align features of the known classes and reject the unknown samples at the same time. [1] tried to separate unknown samples from the known classes by disentangling the representation into private and shared parts. They showed that samples from the known classes can be reconstructed with the shared features while the unknown samples cannot.

Our method also regards the unknown samples as an “unknown” class. Differently, our method is devoted to solving open-set domain adaptation by enhancing the discriminative property of the representation, aligning similar samples in the target with the source domain while pushing the unknown samples away from all the known classes.

3 Method

3.1 Overall Architecture

The crucial problems of open set domain adaptation consist of two aspects, i.e., aligning the target known samples with the known samples in the source domain, and separating the unknown samples in the target from the target known samples. To solve these two problems, we design the following modules. 1) Adversarial Domain Adaptation (ADA). Based on a cross-entropy loss, ADA aims to initially align samples in the target with source known samples or classify them as unknown. 2) Semantic Categorical Alignment (SCA). This module consists of two parts: first, a contrastive-center loss that compacts the representation of samples from the same class; second, a cross-domain center loss that aligns the distribution of the same class between source and target. 3) Semantic Contrastive Mapping (SCM). With a contrastive loss, SCM encourages the known samples in the target to move closer to the corresponding centroids in the source, while keeping the unknown samples away from all the known classes.

The overall framework of our method is illustrated in Fig. 2. It consists of an encoder $E$, a feature generator $G$ and a discriminator $D$. The image encoder $E$ is a pretrained CNN that extracts semantic features, which may still carry the domain variance. The feature generator $G$ is composed of a stack of fully-connected (FC) layers and aims to transform the image representation into a task-oriented feature space. The discriminator $D$ classifies each sample, based on the generated representation, into a category.
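To make the module layout concrete, the sketch below gives one plausible PyTorch instantiation of the generator and discriminator described above; the layer sizes, the AlexNet backbone and the constant names (FEAT_DIM, HIDDEN_DIM, NUM_KNOWN) are illustrative assumptions rather than the authors' exact configuration.

```python
import torch.nn as nn
from torchvision import models

# Hypothetical sizes: the paper does not fix them at this point in the text.
FEAT_DIM, HIDDEN_DIM, NUM_KNOWN = 4096, 500, 10   # K known classes + 1 unknown output

class Generator(nn.Module):
    """Task-oriented feature generator G: an FC layer with BatchNorm and LeakyReLU."""
    def __init__(self, in_dim=FEAT_DIM, out_dim=HIDDEN_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.BatchNorm1d(out_dim),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Discriminator D: classifies a generated feature into K known classes + 'unknown'."""
    def __init__(self, in_dim=HIDDEN_DIM, num_known=NUM_KNOWN):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_known + 1)

    def forward(self, f):
        return self.fc(f)

# Encoder E: an ImageNet-pretrained CNN (e.g. AlexNet) with its last FC layer removed,
# so that it outputs 4096-d semantic features.
encoder = models.alexnet(pretrained=True)
encoder.classifier = nn.Sequential(*list(encoder.classifier.children())[:-1])
```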

3.2 Adversarial Domain Adaptation

Suppose $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ is a set of labeled images sampled from the source domain, in which each image $x_i^s$ is paired with a label $y_i^s$. Another set of images $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$ derives from the target domain. Different from $\mathcal{D}_s$, each image in $\mathcal{D}_t$ is unlabelled and may come from unknown classes. The goal of open set domain adaptation is to classify each input image into $K+1$ classes, where $K$ denotes the number of known classes. All samples from unknown classes are expected to be assigned to the $(K{+}1)$-th, unknown, class.

We leverage an adversarial training method to initially align samples in the target with source known samples or reject them as unknown. Specifically, the discriminator $D$ is trained to separate the source domain and the target domain, whereas the feature generator $G$ tries to minimize the difference between the source and the target. When the discriminator fails to figure out which domain a sample comes from, the generator has learned a domain-invariant representation.

We use the cross-entropy loss together with the softmax function to classify the known source samples,

$\mathcal{L}_{cls} = -\,\mathbb{E}_{(x^s, y^s) \sim \mathcal{D}_s}\, \log p\left(y = y^s \mid x^s\right)$   (1)

Following [22], in an attempt to build a boundary for the unknown samples, we utilize a binary cross-entropy loss,

$\mathcal{L}_{adv} = -\,\mathbb{E}_{x^t \sim \mathcal{D}_t}\left[\, t \log p\left(y = K{+}1 \mid x^t\right) + (1 - t) \log\left(1 - p\left(y = K{+}1 \mid x^t\right)\right)\right]$   (2)
where $t$ is the boundary value for the unknown class, set to 0.5 as in [22].

By the gradient reversal layer [9], we can flip the sign of the gradient during the backward pass, which allows us to update the parameters of $G$ and $D$ simultaneously. Then, the objective of the ADA module can be formulated as

$\min_{D}\; \mathcal{L}_{cls} + \mathcal{L}_{adv}, \qquad \min_{G}\; \mathcal{L}_{cls} - \mathcal{L}_{adv}$   (3)

The ADA module only initially aligns samples in the target with source known samples and learns a rough boundary between the known and the unknown.
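As a concrete illustration of the ADA module, the sketch below implements a gradient reversal layer in the style of [9] together with the source classification loss and the unknown-boundary binary cross-entropy; the function names and the probability clamping are our own assumptions, and the boundary value t = 0.5 follows [22].

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer [9]: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

def ada_losses(logits_src, labels_src, logits_tgt, unknown_index, t=0.5):
    """Source classification loss (Eq. 1) and unknown-boundary BCE on target samples (Eq. 2).

    `unknown_index` is the index of the (K+1)-th, 'unknown', output.
    The target logits are assumed to have been produced from grad_reverse()-ed
    features, so that G and D play the adversarial game of Eq. (3).
    """
    l_cls = F.cross_entropy(logits_src, labels_src)
    p_unk = torch.softmax(logits_tgt, dim=1)[:, unknown_index].clamp(1e-6, 1.0 - 1e-6)
    l_adv = -(t * torch.log(p_unk) + (1.0 - t) * torch.log(1.0 - p_unk)).mean()
    return l_cls, l_adv
```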

3.3 Semantic Categorical Alignment

We try to address the issues remaining in ADA by further exploring the semantic structure of open-set data. To separate the unknown class from the known ones in the target domain, we should 1) make each known class more concentrated and the alignment between the source and the target more accurate, and 2) push the unknown class away from the decision boundary. In this section, we aim to solve the first problem.
We introduce Semantic Categorical Alignment (SCA), which aims to compact the representation of known classes and distinguish each known class from the others. There are two steps in SCA. First, the contrastive-center loss [4] is adopted to enhance the discriminative property of the generated features of source samples. Second, each known-class centroid in the target is aligned with the corresponding class centroid in the source domain. In this way, the representations of source samples become more discriminative, and meanwhile the known target centroids are aligned more accurately.
To compact the source samples that belong to the same class in the feature space, we apply the following contrastive-center loss to the source samples,

$\mathcal{L}_{ct\text{-}c} = \frac{1}{2} \sum_{i=1}^{N} \frac{\left\lVert G(x_i^s) - c^s_{y_i} \right\rVert_2^2}{\sum_{j \neq y_i} \left\lVert G(x_i^s) - c^s_j \right\rVert_2^2 + \delta}$   (4)

where $N$ denotes the number of samples in a mini-batch during the training procedure, $x_i^s$ denotes the $i$-th training sample from the source domain, and $c^s_j$ denotes the centroid of class $j$ in the source domain. $\delta$ is a constant used to prevent a zero denominator and is kept at a fixed default value in our experiments.
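For reference, a minimal sketch of the contrastive-center loss of Eq. (4), assuming the class centroids are maintained externally and passed in as a tensor; the default value of delta is an assumption.

```python
import torch

def contrastive_center_loss(features, labels, centroids, delta=1.0):
    """Contrastive-center loss [4]: pull each source sample toward its own class centroid
    while pushing it away from the other class centroids.

    features:  (N, d) generated features of source samples in a mini-batch
    labels:    (N,)   ground-truth known-class labels (long tensor)
    centroids: (K, d) current source class centroids
    delta:     constant preventing a zero denominator (the value here is an assumption)
    """
    dists = torch.cdist(features, centroids, p=2) ** 2      # (N, K) squared distances
    intra = dists.gather(1, labels.view(-1, 1)).squeeze(1)  # distance to own centroid
    inter = dists.sum(dim=1) - intra                        # sum of distances to the others
    return 0.5 * (intra / (inter + delta)).sum()
```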

To align the two centroids of a known class $k$ between the source and target, we try to minimize the distance between the pair of centroids $(c_k^s, c_k^t)$, where $c_k^s$ and $c_k^t$ represent the centroids of class $k$ in the source and target domain, respectively.

Due to the randomness and deviation in each mini-batch, we align the global centroids (calculated from all samples) instead of the local centroids (calculated from a mini-batch). However, it is not easy to directly obtain the global centroids. We propose to partially update them with the local centroids at every iteration, according to their cosine similarities to the centroids in the source domain. Specifically, we first compute the initial global centroids based on the prediction of the pretrained model as follows,

$c_k = \frac{1}{N_k} \sum_{i:\, y_i = k} G(x_i)$   (5)

where $N_k$ denotes the number of samples with label $k$. The pretrained model is trained on the source domain within the supervised classification paradigm. For the target samples, we use the predictions of the network as pseudo labels. In each iteration, we compute a set of local centroids $\hat{c}_k$ as the class-wise average of the mini-batch samples. Then, the source centroid and target centroid are updated with re-weighting as follows,

$c_k^{s} \leftarrow \rho_k\, c_k^{s} + (1 - \rho_k)\, \hat{c}_k^{s}$   (6)
$c_k^{t} \leftarrow \rho_k\, c_k^{t} + (1 - \rho_k)\, \hat{c}_k^{t}$   (7)

where $\rho_k$ is a re-weighting coefficient computed from the cosine similarity between the local centroid and the corresponding source centroid $c_k^s$. Finally, the categorical center alignment loss is formulated as follows,

$\mathcal{L}_{align} = \sum_{k=1}^{K} \left\lVert c_k^s - c_k^t \right\rVert_2^2$   (8)

The benefits of SCA are intuitive: 1) The contrastive-center loss, i.e., Eq. (4), enhances the compactness of the representations and also enlarges the inter-class margins. 2) The categorical center alignment loss, Eq. (8), guarantees that the centroids of the same class are aligned between the source domain and the target domain. 3) The dynamic centroid update ensures that SCA aligns the global and up-to-date categorical distributions. Furthermore, the re-weighting technique down-weights incorrect pseudo-labels and therefore alleviates their accumulated error.
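The sketch below illustrates one plausible implementation of the centroid bookkeeping in SCA: per-class centroids as in Eq. (5), a partial update of the global centroids with the local mini-batch ones, and the cross-domain alignment of Eq. (8). The concrete form of the re-weighting coefficient is an assumption (a simple function of the cosine similarity to the source centroid); the paper's precise rule may differ.

```python
import torch
import torch.nn.functional as F

def class_centroids(features, labels, num_classes):
    """Per-class mean feature (Eq. 5); classes absent from the batch get a zero centroid."""
    centroids = torch.zeros(num_classes, features.size(1),
                            device=features.device, dtype=features.dtype)
    counts = torch.zeros(num_classes, device=features.device, dtype=features.dtype)
    centroids.index_add_(0, labels, features)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=features.dtype))
    return centroids / counts.clamp(min=1.0).unsqueeze(1)

def update_centroids(global_c, local_c, source_c):
    """Partially update global centroids with local (mini-batch) centroids.

    The mixing weight rho is derived from the cosine similarity between the local
    centroid and the corresponding source centroid (an assumed concrete form of
    Eqs. 6-7); classes absent from the current batch should ideally be skipped.
    """
    rho = 0.5 * (1.0 + F.cosine_similarity(local_c, source_c, dim=1)).unsqueeze(1)  # in [0, 1]
    return rho * global_c + (1.0 - rho) * local_c

def center_alignment_loss(source_c, target_c):
    """Categorical center alignment (Eq. 8): match per-class centroids across domains."""
    return ((source_c - target_c) ** 2).sum()
```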

3.4 Semantic Contrastive Mapping

SCA aligns the centroids of the same class between the source domain and the target domain. For the non-centroid samples in the target domain, we employ a contrastive loss function to encourage the known samples to move closer to their centroids and to enforce the unknown samples to stay far away from all the centroids of known classes. In this way, we can align the non-centroid samples in the target domain. We refer to this process as Semantic Contrastive Mapping (SCM).

Since the pseudo labels of target samples are not totally correct, we select reliable samples whose classification probabilities exceed a threshold. SCM aims to reduce the distance between the reliable known samples and their centroids, while enlarging the distance between the reliable unknown samples and all centroids. The resulting loss $\mathcal{L}_{scm}$ is

(9)

where $z$ equals 0 if the target sample is predicted as a known class, and 1 otherwise. The first distance term measures how far a target known sample is from the corresponding source centroid, while the second measures how far a target unknown sample is from all the known source centroids. Inspired by the energy-based model in [13], the distance functions are designed as follows,

(10)
(11)

where the similarity between a sample and a centroid is measured by cosine similarity. To ensure an efficient and accurate measurement of the distances, we also use a hyper-parameter to re-weight the distances calculated in the loss. The margin is categorical and adaptive, measuring the radius of the neighborhood of class $k$; it is defined as follows,

(12)
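As a concrete illustration of the ideas behind Eqs. (9)-(12), the sketch below implements a generic energy-based contrastive mapping in their spirit [13]: reliable target samples predicted as known are pulled toward their source centroids, and reliable samples predicted as unknown are pushed until they lie at least a margin away from every known centroid. The confidence threshold, the cosine-based distance and the margin computation are illustrative assumptions rather than the paper's exact formulas.

```python
import torch
import torch.nn.functional as F

def scm_loss(tgt_feats, tgt_probs, source_c, conf_thresh=0.9):
    """Semantic contrastive mapping, sketched as a Hadsell-style contrastive loss [13].

    tgt_feats: (N, d) generated target features
    tgt_probs: (N, K+1) softmax outputs, the last column being the 'unknown' class
    source_c:  (K, d) source centroids of the known classes
    """
    conf, pred = tgt_probs.max(dim=1)
    reliable = conf > conf_thresh                       # keep only confident pseudo-labels
    if not reliable.any():
        return tgt_feats.new_zeros(())

    feats, pred = tgt_feats[reliable], pred[reliable]
    num_known = source_c.size(0)
    is_unknown = pred == num_known                      # z = 1 for predicted-unknown samples

    # Cosine distance between each reliable sample and every known source centroid: (n, K)
    dist = 1.0 - F.cosine_similarity(feats.unsqueeze(1), source_c.unsqueeze(0), dim=2)

    # Adaptive per-class margin: here the mean cosine distance between a class centroid
    # and the other centroids (an illustrative stand-in for Eq. 12).
    c_dist = 1.0 - F.cosine_similarity(source_c.unsqueeze(1), source_c.unsqueeze(0), dim=2)
    margin = c_dist.sum(dim=1) / max(num_known - 1, 1)  # (K,)

    loss = feats.new_zeros(())
    if (~is_unknown).any():
        own = dist[~is_unknown].gather(1, pred[~is_unknown].view(-1, 1)).squeeze(1)
        loss = loss + own.mean()                        # attract: pull toward own centroid
    if is_unknown.any():
        hinge = (margin.unsqueeze(0) - dist[is_unknown]).clamp(min=0.0)
        loss = loss + hinge.mean()                      # distract: push beyond the margin
    return loss
```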

3.5 Objective

0:  Input: Batches of labeled samples from the source domain and batches of unlabeled samples from the target domain, processed as mini-batches during training
0:  Output: Parameters of the network
1:  1st Stage
2:  Pretrain the encoder and the discriminator on the labeled source data and update the network parameters
3:  2nd Stage
4:  Initialize the training step e
5:  while not converged do
6:     Calculate the current global centroids of the source and the target domain
7:     for each mini-batch iteration do
8:        Update the source and target centroids by using Eq. 6 and Eq. 7
9:        Calculate the distances between target samples and the source centroids
10:       Select reliable target samples
11:       Calculate the distances between the reliable target samples and the source centroids
12:       Train the network by optimizing the loss in Eq. 13 and update the parameters
13:       Increase the training step e
14:     end for
15:  end while
Algorithm 1 Exploit the Margin of Open Set; e denotes the training step and the inner loop runs over the mini-batch iterations of one step.

In the proposed method, considering the intra-class compactness and inter-class separability, we design the two modules SCA and SCM based on the adversarial learning in ADA. Formally, the final objective is defined in Eq. 13.

$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{adv} + \lambda_{1}\, \mathcal{L}_{ct\text{-}c} + \lambda_{2}\, \mathcal{L}_{align} + \lambda_{3}\, \mathcal{L}_{scm}$   (13)
where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are trade-off hyper-parameters balancing the SCA and SCM terms against the ADA objective.

In each iteration, the network updates the class centroids and the network parameters simultaneously. The overall procedure is shown in Algorithm 1. SCA attempts to enlarge the margins between known classes in the source and to categorically align the centroids across domains. SCM attempts to align all the known target samples with their source neighborhoods, while keeping the distance between unknown samples and the centroids of known classes around an adaptively determined margin. With SCA, the discriminator in ADA has access to more discriminative representations and well-aligned semantic features. On the other hand, SCM helps distinguish the unknown samples from the known classes.
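Putting the modules together, one training iteration might look like the sketch below; the loss weights stand in for the (unspecified) trade-off hyper-parameters, the helpers are the ones sketched in the previous sections, and, for brevity, the alignment loss here is computed on mini-batch centroids while the running global centroids are only refreshed as bookkeeping, whereas the paper aligns the global centroids.

```python
import torch

def train_step(E, G, D, optimizer, src_x, src_y, tgt_x,
               src_centroids, tgt_centroids, num_known,
               w_ctc=1.0, w_align=1.0, w_scm=1.0):
    """One optimization step combining ADA, SCA and SCM (a sketch of Eq. 13).

    Uses grad_reverse, ada_losses, contrastive_center_loss, class_centroids,
    update_centroids, center_alignment_loss and scm_loss from the earlier sketches.
    The weights w_* stand in for the paper's trade-off hyper-parameters.
    """
    f_src = G(E(src_x))
    f_tgt = G(E(tgt_x))

    # ADA: source classification + adversarial unknown boundary (gradient reversed).
    logits_src = D(f_src)
    logits_tgt = D(grad_reverse(f_tgt))
    l_cls, l_adv = ada_losses(logits_src, src_y, logits_tgt, unknown_index=num_known)

    # Pseudo-labels for target samples come from the current classifier.
    tgt_probs = torch.softmax(D(f_tgt), dim=1)
    tgt_pred = tgt_probs.argmax(dim=1)
    known_mask = tgt_pred < num_known

    # SCA: discriminative source features + cross-domain centroid alignment.
    l_ctc = contrastive_center_loss(f_src, src_y, src_centroids)
    local_src = class_centroids(f_src, src_y, num_known)
    local_tgt = class_centroids(f_tgt[known_mask], tgt_pred[known_mask], num_known)
    l_align = center_alignment_loss(local_src, local_tgt) if known_mask.any() \
        else f_tgt.new_zeros(())

    # SCM: attract reliable known target samples, distract reliable unknown ones.
    l_scm = scm_loss(f_tgt, tgt_probs, src_centroids)

    loss = l_cls + l_adv + w_ctc * l_ctc + w_align * l_align + w_scm * l_scm
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Bookkeeping: partially refresh the running global centroids with the local ones.
    with torch.no_grad():
        src_centroids = update_centroids(src_centroids, local_src.detach(), src_centroids)
        tgt_centroids = update_centroids(tgt_centroids, local_tgt.detach(), src_centroids)
    return loss.item(), src_centroids, tgt_centroids
```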

4 Experiments

SVHN-MNIST USPS-MNIST MNIST-USPS Average
Method OS OS* ALL UNK OS OS* ALL UNK OS OS* ALL UNK OS OS* ALL UNK
OSVM 54.3 63.1 37.4 10.5 43.1 32.3 63.5 97.5 79.8 77.9 84.2 89.0 59.1 57.7 61.7 65.7
MMD+OSVM 55.9 64.7 39.1 12.2 62.8 58.9 69.5 82.1 80.0 79.8 81.3 81.0 68.0 68.8 66.3 58.4
BP+OSVM 62.9 75.3 39.2 0.7 84.4 92.4 72.9 0.9 33.8 40.5 21.4 44.3 60.4 69.4 44.5 15.3
OSDA+BP[22] 63.0 59.1 71.0 82.3 92.3 91.2 94.4 97.6 92.1 94.9 88.1 78.0 82.4 81.7 84.5 85.9
Ours w/o SCA 65.6 61.6 73.9 85.4 93.6 95.4 87.9 82.8 86.5 86.1 88.1 88.5 81.9 81.0 83.3 85.6
Ours w/o SCM 65.5 61.0 74.8 87.8 92.5 93.8 87.1 81.1 84.6 84.0 86.0 87.7 80.9 79.6 82.6 85.5
Ours 68.6 65.5 75.3 84.3 93.1 95.2 92.8 91.7 91.3 92.0 90.7 87.8 84.3 84.2 86.3 87.9
Table 1: Accuracy (%) of experiments on Digit dataset.
Figure 3: A comparison between the existing method and the proposed method. First row: visualization of features of the state-of-the-art method OSDA+BP [22]. Second row: visualization of features generated by our method. The left two columns show features of source and target in SVHN → MNIST; the right two columns show features of source and target in MNIST → USPS. The color red in the target represents the unknown class.

4.1 Setup

In this section, we evaluate the proposed method on the open set domain adaptation task using two benchmarks, i.e., the Digit datasets and Office-31 [31], considering the setting where unknown samples only exist in the target domain. We compare the performance of our method with OSDA+BP [22] and other baselines, including Open-set SVM (OSVM) [17] and methods combined with OSVM, e.g., Maximum Mean Discrepancy (MMD) [11], BP [9] and ATI-λ [3]. OSVM classifies a test sample into the unknown class when its predicted probability is lower than a threshold for all other classes. OSVM also requires no unknown samples in the source domain during training. MMD+OSVM combines OSVM with the MMD-based network in [25]; MMD is a discrepancy metric used to match the distributions across domains. BP+OSVM combines OSVM with a domain classifier, BP [9], which is a representative of adversarial learning applied to unsupervised domain adaptation.

Digits We begin by exploring three Digit datasets, i.e. SVHN [29], MNIST [23] and USPS [23]. SVHN contains colored digit images of size 32×32, where more than one digit may appear in a single image. MNIST includes 28×28 grey digit images and USPS consists of 16×16 grey digit images. We conduct three common adaptation scenarios: SVHN to MNIST, USPS to MNIST and MNIST to USPS.

Office-31 [31] is a standard benchmark for domain adaptation. There are three distinct domains: Amazon (A) with 2,817 images collected from online merchants, Webcam (W) with 795 images of low resolution and DSLR (D) with 498 images of high resolution. Each domain shares 31 categories with the others. We examine the full set of transfer scenarios in our experiments.

Implementation For the Digit datasets, we employ the same architecture as [22]. For Office-31, we employ two representative CNN architectures, AlexNet [21] and VGGNet [37], to extract the visual features, and we initialize the feature extractor from the ImageNet [6] pretrained model. For both the generator and the classifier, we use a one-layer FC followed by Leaky-ReLU and Batch Normalization. For both benchmarks, we first train our model with the labeled source domain data. All networks are trained with the Adam [20] optimizer with weight decay. The initial learning rates are set separately for the Digit and Office-31 datasets and decrease following a cosine ramp-down schedule. The hyper-parameters are kept fixed across all the experiments. Following [3], we report the accuracy averaged over classes as OS and OS*: the average accuracy over all classes including the unknown one is denoted as OS, and the accuracy measured only on the known classes of the target domain is denoted as OS*. All reported results are averaged over three independent runs.
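For clarity, the sketch below shows how the OS and OS* scores described above can be computed from per-class accuracies, assuming the unknown class is the last label index.

```python
import numpy as np

def os_scores(y_true, y_pred, num_known):
    """Class-averaged accuracies.

    OS  : average per-class accuracy over the K known classes plus the unknown class.
    OS* : average per-class accuracy over the K known classes only.
    The unknown class is assumed to carry label `num_known` and to appear in y_true.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = []
    for c in range(num_known + 1):            # known classes 0..K-1, unknown class K
        mask = y_true == c
        if mask.any():
            per_class.append(float((y_pred[mask] == c).mean()))
    os_all = float(np.mean(per_class))
    os_star = float(np.mean(per_class[:-1]))  # drop the unknown class
    return os_all, os_star
```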

Adaptation Scenario
A-D A-W D-A D-W W-A W-D AVG
OS OS* OS OS* OS OS* OS OS* OS OS* OS OS* OS OS*
Method w/o unknown classes in source domain (AlexNet)
OSVM 59.6 59.1 57.1 55.0 14.3 5.9 44.1 39.3 13.0 4.5 62.5 59.2 40.6 37.1
MMD + OSVM 47.8 44.3 41.5 36.2 9.9 0.9 34.4 28.4 11.5 2.7 62.0 58.5 34.5 28.5
BP+OSVM 40.8 35.6 31.0 24.3 10.4 1.5 33.6 27.3 11.5 2.7 49.7 44.8 29.5 22.7
ATI-λ [3] + OSVM 72.0 - 65.3 - 66.4 - 82.2 - 71.6 - 92.7 - 75.0 -
OSDA+BP[22] 76.6 76.4 70.1 69.1 62.5 62.3 94.4 94.6 82.3 82.2 96.8 96.9 80.4 80.2
Ours w/o SCA 87.8 89.0 85.6 87.1 74.2 73.8 97.2 98.1 74.9 73.9 98.5 99.0 86.5 87.0
Ours w/o SCM 89.8 91.2 88.0 90.6 77.8 77.9 97.6 98.6 75.1 75.0 98.0 99.3 87.7 88.8
Ours 91.0 92.7 89.5 89.6 81.8 83.0 97.8 98.8 78.7 81.4 98.5 99.7 89.7 90.7
Method w/o unknown classes in source domain (VGGNet)
OSVM 82.1 83.9 75.9 75.8 38.0 33.1 57.8 54.4 54.5 50.7 83.6 83.3 65.3 63.5
MMD + OSVM 84.4 85.8 75.6 75.7 41.3 35.9 61.9 58.7 50.1 45.6 84.3 83.4 66.3 64.2
BP+OSVM 83.1 84.7 76.3 76.1 41.6 36.5 61.1 57.7 53.7 49.9 82.9 82.0 66.4 64.5
OSDA+BP[22] 85.8 85.8 76.9 76.6 89.4 91.5 96.0 96.6 83.4 83.1 97.1 97.3 88.0 88.5
Ours 90.1 92.0 86.4 87.7 81.6 88.4 97.9 99.8 80.3 82.6 98.2 99.3 89.1 91.6
Table 2: Accuracy (%) of each method on scores of OS and OS*. A, D and W correspond to Amazon, DSLR and Webcam respectively. The ablation versions of our method w/o SCA and w/o SCM are also reported.

4.2 Results on Digit Dataset

In the three Digit datasets, the digits 0–4 are chosen as known classes. Samples from the known classes make up the source samples. In the target samples, the digits 5–9 are regarded as one unknown class. Besides the OS and OS* scores, we also report the total accuracy over all target samples and the accuracy of the unknown class, denoted as ALL and UNK, respectively.

As shown in Table 1, our method produces competitive results compared to other methods. Our method outperforms the other methods on SVHN → MNIST, MNIST → USPS and the average scores. It is shown that our method is better at recognizing the unknown samples (5–9) while maintaining the performance of identifying the known classes. For SVHN → MNIST, the semantic gap is large, as several digits may appear in a single SVHN image; thus the accuracies of SVHN → MNIST are lower than those of the other two scenarios. Our method outperforms existing methods on the average scores. For instance, our approach achieves 87.9% in the average score of the unknown class, which is 2.0% higher than OSDA+BP [22]. The learned features obtained by the trained models are visualized in Fig. 3. We observe that the distribution of unknown samples is more decentralized in SVHN → MNIST because of the large divergence between the two domains. Compared with OSDA+BP [22], the proposed method better centralizes samples of the same known class and distinguishes unknown samples from known ones.

4.3 Results on Office-31 Dataset

We compare our method with other works on the Office-31 dataset following [3]. There are 31 classes in this dataset, and the first 10 classes in alphabetical order are selected as known classes. Following [22], classes 21–31 are selected as unknown samples, which exist only in the target domain. We evaluate all 6 transfer tasks: A → D, A → W, D → A, D → W, W → A, W → D. The improvement on some hard transfer tasks demonstrates the effectiveness and value of the proposed method.

Results on Office-31 are shown in Table 2. With features extracted by AlexNet, our method significantly outperforms the state-of-the-art methods except on W → A. Our approach achieves 89.7% in the OS score based on AlexNet, surpassing the state-of-the-art by 9.3%. For OS*, our approach improves the state-of-the-art result from 80.2% to 90.7%. Based on VGG, our method reaches 89.1% on the OS score and 91.6% on the OS* score, outperforming the state-of-the-art method by 1.1% and 3.1%, respectively. Especially under the harder scenarios of A → D, A → W and D → A, our method brings large improvements.

4.4 Ablation Study

Figure 4: (a): A comparison of the behavior of our method with different re-weighting values in the contrastive loss. (b): A comparison between the static margin and our adaptive margin. (c): Distances between the centroid of class “backpack” (labeled 0) in the target and the centroids of classes 0–5 in the source. (d): Performance of the proposed method under different ratios of unknown samples.

For a straightforward understanding of the proposed method, we further evaluate each module via ablation experiments on Digit datasets. We alternately remove the SCA and SCM from our model. Results are reported in Table 1 above our final results. A decrease in performance is observed when removing SCA or SCM. Particularly, when the discriminative learning (w/o SCA) or contrastive learning (w/o SCM) is ablated, the accuracy of OS* or UNK or both of them will decrease significantly. We reconfirm the effect of each module based on AlexNet in the experiments of Office-31. Results in Table 2 also indicate the importance of learning the discriminative representation and contrastive mapping simultaneously. It indicates that the two modules take effect jointly. The discriminative representation helps to push away the unknown samples while the distraction of unknown samples assists the alignment of known categories.
Effect of adaptive margin. As the margin in the contrastive loss is designed to be adaptive, we also observe the behavior of the model with a static margin on A → D. We choose several constant margin values for comparison. When the margin is equal to 0, the contrastive term in the objective only aligns all the target samples predicted as known with the corresponding centroids in the source. When the margin is assigned a large value, the model tends to penalize all the target samples predicted as unknown with a large loss. According to the results in Fig. 4(b), the accuracies of OS and OS* trend downward when using a constant margin, while the accuracy of UNK rises when the margin is 20 and 60. To further investigate how the distances between categories change during training, we visualize the distances between the centroid of the known class “Backpack” (labeled 0) in the target and the centroids of the known classes in the source domain. Results are shown in Fig. 4(c). As training proceeds, the distance between the centroids of class 0 in the source and target domains declines, indicating that the distributions of the two domains are aligned for class 0. Meanwhile, the distances between the target centroid of class 0 and the other source classes increase with the training epochs. In spite of the reduced discrepancy between the two domains, the distances to the other source classes keep growing. This demonstrates that it is improper to use a static margin to measure the energy for pulling unknown samples apart. With the alignment across domains and the separation between different classes, the radius of the neighborhood of each class changes, which also explains why the adaptive margin produces higher scores than the static margin.


Effect of re-weighting the contrastive loss. There is another hyper-parameter which re-weights the distances in the contrastive loss. We conduct experiments with different values of this hyper-parameter. As shown in Fig. 4(a), our approach achieves the best results when it is equal to 0.5. This parameter smooths the weights calculated with cosine similarity. When it is 0, the re-weighting term is equal to 1, which means the contrastive loss is calculated without re-weighting. This reveals that the re-weighting term helps the model better measure the distances between unknown samples and the source centroids.

Effect of the ratio of unknown samples. In this section, we investigate the robustness of our method under different ratios of unknown samples in the target data. In Fig. 4(d), we compare our results with BP+OSVM and OSDA+BP under varying ratios, where the unknown samples are randomly sampled according to the ratio. The accuracy of our model fluctuates little and always stays above the baseline methods, which implies the robustness of our method.

5 Conclusion

In this paper, we focus on the open set domain adaptation setting where only source domain annotations are available and the unknown class exists in the target. To better identify the unknown class and meanwhile diminish the domain shift, we take the semantic margin of open set data into account through semantic categorical alignment and semantic contrastive mapping, aiming to make the known classes more separable and to push the unknown class away from the decision boundary, respectively. Empirically, we demonstrate that our method is comparable to or better than the state-of-the-art methods on representative open set benchmarks, i.e. Digits and Office-31. The effectiveness of each component of our method is also verified. Our results imply that explicitly taking the semantic margin of open set data into account is beneficial, and it is promising to explore this direction further in the future.

References

  • [1] M. Baktashmotlagh, M. Faraki, T. Drummond, and M. Salzmann (2019) Learning factorized representations for open-set domain adaptation. In ICLR, Cited by: §1, §2.
  • [2] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, Cited by: §2.
  • [3] P.P. Busto and J. Gall (2017) Open set domain adaptation. In ICCV, Cited by: §1, §2, §4.1, §4.1, §4.3, Table 2.
  • [4] C. Qi and F. Su (2017) Contrastive-center loss for deep neural networks. In 2017 IEEE International Conference on Image Processing (ICIP), pp. 2851–2855. Cited by: §1, §3.3.
  • [5] B. Chen, W. Deng, and H. Shen (2018) Virtual class enhanced discriminative embedding learning. In Advances in Neural Information Processing Systems, pp. 1942–1952. Cited by: §2.
  • [6] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In CVPR, Cited by: §4.1.
  • [7] H. Fan, X. Chang, D. Cheng, Y. Yang, D. Xu, and A. G. Hauptmann (2017) Complex event detection by identifying reliable shots from untrimmed videos. In ICCV, Cited by: §2.
  • [8] H. Fan, L. Zheng, C. Yan, and Y. Yang (2018) Unsupervised person re-identification: clustering and fine-tuning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 14 (4), pp. 83. Cited by: §2.
  • [9] Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In ICML, Cited by: §2, §3.2, §4.1.
  • [10] M. Ghifary, W. B. Kleijn, M. Zhang, D. Balduzzi, and W. Li (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In ECCV, Cited by: §2.
  • [11] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola (2007) A kernel method for the two-sample-problem. In NIPS, pp. 513–520. Cited by: §4.1.
  • [12] Q. Guan and Y. Huang (2018) Multi-label chest x-ray image classification via category-wise residual attention learning. Pattern Recognition Letters. Cited by: §2.
  • [13] R. Hadsell, S. Chopra, and Y. LeCun (2006) Dimensionality reduction by learning an invariant mapping. In CVPR, Cited by: §3.4.
  • [14] P. Haeusser, T. Frerix, A. Mordvintsev, and D. Cremers (2017) Associative domain adaptation. In ICCV, Cited by: §2.
  • [15] K. He, G. Gkioxari, P. Dollár, and R. Girshick (2017) Mask r-cnn. In ICCV, Cited by: §1.
  • [16] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §1.
  • [17] L.P. Jain, W.J. Scheirer, and T.E. Boult (2014) Multi-class open set recognition using probability of inclusion. In ECCV, Cited by: §4.1.
  • [18] G. Kang, L. Jiang, Y. Yang, and A. G. Hauptmann (2019) Contrastive adaptation network for unsupervised domain adaptation. In CVPR, Cited by: §2.
  • [19] G. Kang, L. Zheng, Y. Yan, and Y. Yang (2018) Deep adversarial attention alignment for unsupervised domain adaptation: the benefit of target expectation maximization. In ECCV, Cited by: §2.
  • [20] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: §4.1.
  • [21] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In NIPS, pp. 1097–1105. Cited by: §1, §4.1.
  • [22] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada (2018) Open set domain adaptation by backpropagation. In ECCV, Cited by: §1, §2, §3.2, Figure 3, §4.1, §4.1, §4.2, §4.3, Table 1, Table 2.
  • [23] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §4.1.
  • [24] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In CVPR, Cited by: §1.
  • [25] M. Long, Y. Cao, J. Wang, and M. I. Jordan (2015) Learning transferable features with deep adaptation networks. In ICML, Cited by: §2, §4.1.
  • [26] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2016) Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pp. 136–144. Cited by: §2.
  • [27] Y. Luo, P. Liu, T. Guan, J. Yu, and Y. Yang (2019) Significance-aware information bottleneck for domain adaptive semantic segmentation. In ICCV, Cited by: §2.
  • [28] Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang (2019) Taking a closer look at domain shift: category-level adversaries for semantics consistent domain adaptation. In CVPR, Cited by: §2.
  • [29] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A.Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. In NIPS, Cited by: §4.1.
  • [30] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) You only look once: unified, real-time object detection. In CVPR, Cited by: §1.
  • [31] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. In ECCV, Cited by: §4.1, §4.1.
  • [32] K. Saito, Y. Ushiku, T. Harada, and K. Saenko (2018) Adversarial dropout regularization. In ICLR, Cited by: §2.
  • [33] K. Saito, Y. Ushiku, and T. Harada (2017) Asymmetric tri-training for unsupervised domain adaptation. In ICML, Cited by: §2.
  • [34] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, Cited by: §2.
  • [35] O. Sener, H. O. Song, A. Saxena, and S. Savarese (2016) Learning transferrable representations for unsupervised domain adaptation. In Advances in Neural Information Processing Systems, pp. 2110–2118. Cited by: §2.
  • [36] R. Shu, H. H. Bui, H. Narui, and S. Ermon (2018) A dirt-t approach to unsupervised domain adaptation. ICLR. Cited by: §2.
  • [37] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In ICLR, Cited by: §1, §4.1.
  • [38] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In CVPR, Cited by: §1.
  • [39] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In CVPR, Cited by: §2.
  • [40] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474. Cited by: §2.
  • [41] Z. Zheng, L. Zheng, M. Garrett, Y. Yang, and Y. Shen (2017) Dual-path convolutional image-text embedding with instance loss. arXiv preprint arXiv:1711.05535. Cited by: §2.
  • [42] Z. Zhong, L. Zheng, Z. Luo, S. Li, and Y. Yang (2019) Invariance matters: exemplar memory for domain adaptive person re-identification. In CVPR, Cited by: §2.
  • [43] Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y. Yang (2019) CamStyle: a novel data augmentation method for person re-identification. IEEE Transactions on Image Processing 28, pp. 1176–1190. Cited by: §2.