I Introduction
Domain Adaptation (DA) methods aim to train a target-domain classifier with data in source and target domains [lu2015transfer]. Based on the availability of labels in the target domain (i.e., fully-labeled, partially-labeled, and unlabeled), DA falls into three categories: supervised DA [motiian2017unified, zuo2018fuzzy01, zuo2017granular], semi-supervised DA [pereira2018semi, saito2019semi, zuo2018fuzzy02], and unsupervised DA (UDA) [liu2017heterogeneous, fang2019unsupervised]. In practice, UDA methods have been deployed to solve diverse real-world problems, such as object recognition [gopalan2011domain, kan2014domain], cross-domain recommendation [zhang2017cross], and sentiment analysis [liu2020heterogeneous].
There are two common settings in UDA: unsupervised closed set domain adaptation (UCSDA) and unsupervised open set domain adaptation (UOSDA). UCSDA is the classical scenario in which the source and target domains share the same label set. By contrast, in UOSDA, the target domain contains some unknown classes that are not observed in the source domain, and the data with unknown classes are called unknown target data. In Fig. 1, the source domain contains four known classes (i.e., monitor, mug, stapler, and calculator), but the target domain contains some unknown classes in addition to the classes in the source domain.
UOSDA is more general than UCSDA, since the label sets are usually not consistent between the source and target domains in real-world scenarios. Namely, the target domain may contain classes that are not observed in the source domain. For example, a classifier trained with images of various kinds of cats is likely to encounter the image of a dog or another animal in practice. In this case, UCSDA methods are unable to distinguish the unseen animals (i.e., unknown classes). UOSDA methods, however, can establish a boundary between known classes and unknown classes.
Panareda et al. [panareda2017open] were the first to propose the UOSDA setting, but in their formulation the source domain also contains some unknown classes. Since it is expensive and prohibitive to obtain data labeled as unknown classes in the source domain, Saito et al. [saito2018open] proposed a new UOSDA setting in which the source domain only contains known classes. In this paper, we focus on the same setting as Saito's paper, which is more realistic [saito2018open, fang2019open].
In UOSDA, we aim to train a target-domain classifier with labeled data in the source domain and unlabeled data in the target domain. The trained classifier is expected to accurately 1) recognize unknown target data, and 2) classify the other target data. Existing UOSDA methods can be divided into two groups: shallow methods and deep methods. For shallow methods, a recent work [fang2019open] proved an upper bound on the target-domain risk, which provides a theoretical guarantee for the design of a shallow UOSDA method. For deep methods, since [long2013transfer, yosinski2014transferable, DBLP:conf/icml/DonahueJVHZTD14] have shown that DNNs can learn more transferable features, researchers have presented DNN-based methods to address the UOSDA problem [saito2018open, feng2019attract, liu2019separate]. Nevertheless, these deep UOSDA methods lack theoretical guarantees. Thus, bridging the theoretical bound and deep algorithms is both necessary and important for addressing the UOSDA problem.
In order to train an effective target-domain classifier, Fang et al. [fang2019open] have proven an upper bound of the target-domain risk (Eq. (14)) for the UOSDA problem and propose a shallow UOSDA method. Specifically, the bound consists of four terms: the source-domain risk, the distributional discrepancy between domains, the open set difference, and a constant. The open set difference, an important term in the upper bound, is leveraged to measure the risk of a classifier on unknown target data. The shallow method in [fang2019open] trains a target-domain classifier by minimizing the empirical estimate of the upper bound.
However, the theoretical bound presented in [fang2019open] is not adaptable to flexible classifiers (i.e., deep neural networks (DNNs)). In Fig. 2, we show that if the classifier is a DNN, the accuracy in the target domain (OS in Fig. 2(b)) drops significantly (yellow line in Fig. 2(b)) when minimizing the empirical estimate of the upper bound. This phenomenon confirms that we cannot simply combine the existing theoretical bound with deep algorithms to address the UOSDA problem.
To reveal the nature of this phenomenon, we show that the lower bound of the distributional discrepancy is the negative of the open set difference. Since DNNs are very flexible and the empirical open set difference can be negative, the empirical open set difference is quickly minimized to a large negative value (yellow line in Fig. 2(a)). Based on the lower bound of the distributional discrepancy, if the empirical open set difference is a large negative number, the distributional discrepancy is bounded below by a large positive number. Consequently, we fail to align the distributions of the two domains, resulting in very low accuracy on the target domain (yellow line in Fig. 2(b)).
In this paper, we propose a new upper bound of target-domain risk for UOSDA (Eq. (20)), which includes four terms: the source-domain risk, an amended open set difference, the conditional distributional discrepancy between domains, and a constant. The amended open set difference limits the descent of the original open set difference by a constant, which promptly prevents the lower bound of the distributional discrepancy between the two domains from increasing significantly. Fig. 2 shows that minimizing the empirical estimate of the new upper bound achieves higher accuracy (green line in Fig. 2(b)).
Then, we propose a new principle-guided deep UOSDA method that trains DNNs by minimizing empirical estimates of the new upper bound. The network structure is shown in Fig. 3. We employ a generator to extract features from input data, a classifier to classify input data, and a domain discriminator to assist distribution alignment. The overall objective function consists of the source classification loss, a binary adversarial loss, a domain adversarial loss, and the empirical amended open set difference. Specifically, the source classification loss and the empirical amended open set difference are minimized by gradient descent, and a gradient reverse layer is adopted for the adversarial losses.
To effectively align the distributions of data with known classes, we propose a novel open-set conditional adversarial training strategy based on the tensor product between the feature representation and the label prediction, which captures the multimodal structure of the distributions. According to [song2009hilbert, long2018conditional], it is important to capture the multimodal structures of distributions using the cross-covariance dependency between features and classes. However, existing deep UOSDA methods align distributions by either a binary adversarial net [saito2018open, feng2019attract] or a multi-binary classifier [liu2019separate], which is not adequate for distributions with multimodal structure. Furthermore, this novel training strategy also pushes unknown target data away from data with known classes. As shown in Fig. 2(b), the novel distribution alignment strategy further boosts the performance of the classifier.
To validate the efficacy of the proposed method, we conduct extensive experiments on several standard benchmark datasets spanning a variety of transfer tasks. Compared to existing shallow and deep UOSDA methods, our method shows state-of-the-art performance on digit recognition (MNIST, SVHN, USPS), object recognition (Office-31, Office-Home) and face recognition (PIE). The main contributions of this paper are:

A new theoretical bound on the target-domain risk for UOSDA is proposed. It is essential because the existing bound does not apply to flexible classifiers (i.e., DNNs). This work thus bridges the gap between the existing theoretical bound and deep algorithms for the UOSDA problem.

A UOSDA method based on DNNs is proposed under the guidance of the proposed theoretical bound. With this theoretical guarantee, the method can estimate the risk of the classifier on unknown data better than existing deep methods.

A novel open-set conditional adversarial training strategy is proposed to ensure that our method aligns the distributions of the two domains better than existing UOSDA methods.

Experiments on Digits, Office-31, Office-Home, and PIE show that the OS accuracy of our method significantly outperforms all baselines, demonstrating that our method achieves state-of-the-art performance.
This paper is organized as follows. Section II reviews work related to UCSDA, open set recognition, and UOSDA. Section III introduces the notations and defines our problem. Section IV demonstrates the motivation of this paper. Theoretical results and the proposed method are presented in Section V. Experimental results and analyses are provided in Section VI. Finally, Section VII concludes the paper.
II Related Work
Unsupervised open set domain adaptation is a combination of unsupervised closed set domain adaptation and open set recognition. In this section, we present a systematic review of related studies.
II-A Closed Set Domain Adaptation
In [ben2007analysis], a theoretical bound for UCSDA is given, which indicates that minimizing the source risk and the distributional discrepancy is the key to the UCSDA problem. Based on this insight, there are two kinds of methods for UCSDA: one employs a distributional discrepancy measure to quantify the domain gap [pan2010domain]; the other uses an adversarial training strategy [long2018conditional].
Transfer Component Analysis (TCA) [pan2010domain] utilizes MMD [gretton2012kernel] to learn domain-invariant features by aligning the marginal distributions. Meanwhile, Joint Distribution Adaptation (JDA) [long2013transfer] aligns the marginal and conditional distributions simultaneously. In order to simplify the training of a classifier, Easy Transfer Learning (EasyTL) [wang2019easy] exploits intra-domain information to obtain a non-parametric feature representation and classifier. CORrelation ALignment (CORAL) [sun2016return] aligns the second-order statistics of the source and target domains to minimize the domain divergence. Manifold Embedded Distribution Alignment (MEDA) [wang2018visual] performs dynamic distribution alignment in a Grassmann manifold subspace.
Meanwhile, deep neural networks have also been introduced into domain adaptation and have achieved competitive performance in UCSDA. Deep Adaptation Networks (DAN) [long2015learning] employs the multi-kernel MMD (MK-MMD) to align the features of the higher layers of AlexNet. Deep CORAL is the extension of the shallow CORAL method to deep neural networks. Wasserstein Distance Guided Representation Learning (WDGRL) [shen2018wasserstein] employs the Wasserstein distance to learn an invariant representation in deep neural networks.
Representative adversarial-training-based methods are Domain-Adversarial Training of Neural Networks (DANN) [ganin2016domain] and Conditional Adversarial Domain Adaptation (CDAN) [long2018conditional]. DANN employs a domain discriminator to recognize which domain data come from and deceives the domain discriminator by changing the features, so that an invariant representation can be learned during the adversarial process. Furthermore, CDAN utilizes the tensor product between the feature and the classifier prediction to capture multimodal information, together with an entropy condition to control the uncertainty of the classifier. However, these methods can only cope with the UCSDA problem and are unable to address the UOSDA problem.
II-B Open Set Recognition
This setting allows some unknown classes to appear in the target domain, but there is no distributional discrepancy between domains. Open Set SVM [jain2014multi] rejects the unknown classes via a fixed threshold. Open Set Nearest Neighbor (OSNN) [junior2017nearest] extends the nearest neighbor classifier to recognize unknown classes. Bendale et al. [bendale2016towards] introduce a layer named OpenMax to estimate the probability that an input is recognized as an unknown class in DNNs. However, these methods do not consider the distributional discrepancy, so they are also unable to address the UOSDA problem.
II-C Open Set Domain Adaptation
Busto et al. [panareda2017open] were the first to propose the setting of UOSDA. They employed a method named Assign-and-Transform-Iteratively (ATI) to assign labels to target data using a distance matrix between the target data and the source class centers, and aligned the distributions through a mapping matrix. In their setting, however, the source domain contains some unknown classes to assist the classifier in recognizing unknown data. Since obtaining unknown samples in the source domain is expensive and time-consuming, Open Set Backpropagation (OSBP) [saito2018open] assumes a more realistic and more challenging scenario in which the source domain has no unknown classes. An adversarial network is used to recognize unknown samples and align the distributions during backpropagation.
Based on OSBP, Feng et al. [feng2019attract] proposed a method named SCI_SCM, which utilizes the semantic structure among data to align the distributions of known classes and push unknown classes away from known classes. Separate to Adapt (STA) [liu2019separate] utilizes a coarse-to-fine weighting mechanism to separate unknown samples from the target domain. In Distribution Alignment with Open Difference (DAOD) [fang2019open], a theoretical bound is proposed for UOSDA and a risk estimator is used to recognize unknown target data.
However, existing deep UOSDA methods lack theoretical guidance, and the upper bound in [fang2019open] is not applicable to DNNs, which causes a large distributional discrepancy (details are given in Section IV). Clearly, for UOSDA, there is a gap between the existing theoretical bound and deep algorithms. In this paper, we aim to fill this gap.
III Preliminary and Notations
The definitions of the UOSDA problem and some important concepts are introduced in this section. The notations used in this paper are summarized in Table I.
III-A Definitions and Problem Setting
Important definitions are presented as follows.
Definition 1 (Domain [fang2019open]).
Given a feature space and a label space, a domain is a joint distribution over the two spaces, where the random variables take values in the feature space and the label space respectively.
In Definition 1, the feature space and the label space contain the image sets of the corresponding random variables. In this paper, we refer to the feature-space random variable as the feature vector and the label-space random variable as the label. Based on this definition, we have:
Definition 2 (Domains for Open Set Domain Adaptation [fang2019open]).
Given a feature space and two label spaces, the source and target domains have different joint distributions over the feature space and their respective label spaces, where the source label space is contained in the target label space.
From the definitions above, we note that: 1) this paper focuses on the homogeneous situation, so the source and target feature vectors belong to the same space; and 2) the target label set contains the source label set. The classes that appear only in the target label set are the unknown target classes, while the classes shared with the source label set are the known classes. Thus, the UOSDA problem is:
Problem 1 (Unsupervised Open Set Domain Adaptation (UOSDA) [fang2019open]).
Given labeled samples drawn i.i.d. from the joint distribution of the source domain and unlabeled samples drawn i.i.d. from the marginal distribution of the target domain, the aim of UOSDA is to find a target classifier such that
1) classifies the known target samples into the correct known classes;
2) recognizes the unknown target samples as unknown.
According to the definition of the problem, the target-domain classifier only needs to recognize unknown target data as unknown and classify the other target data. It is not necessary to classify the unknown target data further; all unknown target data are recognized as one "unknown class". In general, each label is encoded as a one-hot vector whose c-th coordinate indicates the c-th class, and one additional coordinate is reserved for the unknown class.
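The label encoding above can be sketched as follows; the function name and the choice of four known classes are illustrative, not taken from the paper.

```python
import numpy as np

def one_hot(label, num_known):
    # Classes 0..num_known-1 are the known classes; index num_known is
    # the single catch-all "unknown" class, so every label vector has
    # num_known + 1 entries.
    v = np.zeros(num_known + 1)
    v[label] = 1.0
    return v

UNKNOWN = 4  # with 4 known classes, index 4 plays the role of "unknown"
assert one_hot(2, 4).tolist() == [0.0, 0.0, 1.0, 0.0, 0.0]   # known class 2
assert one_hot(UNKNOWN, 4).tolist() == [0.0, 0.0, 0.0, 0.0, 1.0]  # unknown
```

All unknown target samples share the single last-coordinate label, which is exactly why the classifier only has to separate them from the known classes rather than tell them apart.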
Notation  Description  Notation  Description

feature space  source, target joint distributions
source, target label sets  source, target marginal distributions
random variables on the feature space  open set difference
random variables on the label spaces
source, target risks  partial risk on known target classes
one-hot vector (class c)  partial risk on unknown target classes
feature transformation, classifier  risks that samples are regarded as unknown
hypothesis space, set of classifiers  class-prior probability for unknown class

sample from a distribution  empirical distribution, empirical risk
discrepancy distance  tensor discrepancy distance
III-B Concepts and Notations
It is necessary to introduce some important concepts and notations before demonstrating our main results. Unless otherwise specified, all the following notations are used consistently throughout this paper without further explanations.
III-B1 Notations for distributions
For simplicity, we use shorthand notations for the source and target joint distributions and, similarly, for the source and target marginal distributions. We also write the target conditional distribution for the known classes, the target conditional distribution for the unknown classes, and the class-prior probability of the unknown target classes.
Given a feature transformation:
(1) 
the induced distributions related to the source and target distributions are
(2) 
Lastly, for any distribution, the corresponding empirical distribution is the one constructed from its observed samples.
III-B2 Risks and Partial Risks
In learning theory, risks and partial risks are two important concepts, which are briefly explained below.
Following the notations in [DBLP:conf/icml/0002LLJ19], consider a multi-class classification task with a hypothesis space of classifiers
(3) 
Let
(4) 
be the loss function. For convenience, we also require the loss to satisfy the following conditions, which are used in Theorem 1:
1. the loss is symmetric and satisfies the triangle inequality;
2. the loss is zero if and only if the two labels are equal;
3. the loss equals one whenever the two labels are distinct one-hot vectors.
Many losses satisfy the above conditions, such as the 0-1 loss.
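Conditions 1-3 can be checked mechanically for the 0-1 loss; the short sketch below verifies symmetry, the triangle inequality, and the identity condition over all triples of one-hot labels (three classes here, purely for illustration).

```python
import numpy as np
from itertools import product

def zero_one(y1, y2):
    # 0-1 loss on label vectors: 0 iff the vectors coincide, else 1.
    return 0.0 if np.array_equal(y1, y2) else 1.0

labels = [np.eye(3)[i] for i in range(3)]  # all one-hot vectors, 3 classes
for a, b, c in product(labels, repeat=3):
    assert zero_one(a, b) == zero_one(b, a)                   # condition 1
    assert zero_one(a, c) <= zero_one(a, b) + zero_one(b, c)  # condition 1
    assert (zero_one(a, b) == 0.0) == np.array_equal(a, b)    # condition 2
    if not np.array_equal(a, b):
        assert zero_one(a, b) == 1.0                          # condition 3
```

The exhaustive loop works because one-hot labels form a small finite set; for continuous-valued losses the same conditions would have to be argued analytically.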
Then the risks of the classifier with respect to the loss under the source and target distributions are given by
(5) 
The partial risk of the classifier for the known target classes is
(6) 
and the partial risk of the classifier for the unknown target classes is
(7) 
Lastly, we denote
(8) 
as the risks that samples are regarded as belonging to the unknown class.
Given a risk, it is convenient to use the corresponding empirical risk computed from samples.
III-B3 Discrepancy Distance
How to measure the difference between domains plays a critical role in domain adaptation. To this end, a well-known distribution distance has been proposed to measure the difference between distributions.
Definition 3 (Distributional Discrepancy [DBLP:conf/colt/MansourMR09]).
Let a hypothesis space contain a set of functions defined on a feature space, let a loss function be given, and let two distributions be defined on the feature space. The discrepancy distance between the two distributions over the hypothesis space is
In this paper, we use a tighter distance named the tensor discrepancy distance, which was first proposed in [long2018conditional]. The tensor discrepancy distance can further extract the multimodal structure of distributions, so that knowledge related to the learned classifier and the pseudo labels can be utilized during the distribution alignment process.
We consider the following tensor mapping:
(9) 
Then we induce two important distributions:
(10) 
Using the tensor mapping, we construct a new hypothesis set:
(11) 
Then the distance between the two induced distributions is:
(12) 
where is the sign function.
It is easy to prove that under conditions 1-3 for the loss and for any classifier, we have
(13) 
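The tensor mapping at the heart of this distance pairs every feature coordinate with every class probability. The sketch below shows the flattened outer product for a single sample; the feature dimension, class count, and values are all illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a single logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def tensor_map(feature, logits):
    # Flattened outer product g(x) ⊗ h(g(x)): each entry couples one
    # feature coordinate with one class probability, which is what lets
    # a discriminator see the multimodal (class-conditional) structure.
    return np.outer(feature, softmax(logits)).ravel()

f = np.array([0.5, -1.0, 2.0])       # a 3-d feature g(x), illustrative
p = np.array([2.0, 0.1, 0.1, 0.1])   # 4-class logits, illustrative
t = tensor_map(f, p)
assert t.shape == (12,)              # dimension = feature_dim * num_classes
```

Because the class probabilities sum to one, the map redistributes rather than rescales the feature: summing the twelve entries recovers the sum of the original three feature coordinates.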
III-B4 Existing Theoretical Bound
Fang et al. [fang2019open] first proposed a theoretical bound for UOSDA:
(14) 
There are four main terms: the source risk, the distributional discrepancy, a constant, and the open set difference. The fourth term, the open set difference, is designed to estimate the risk of the classifier on unknown data.
IV Motivation
In UOSDA, the targetdomain classifier aims to accurately recognize unknown target data and classify the other target data. Since the knowledge about unknown classes is missing, the classifier is likely to be confused about the boundary between known and unknown target data. Thus, recognizing unknown target data plays a critical role in addressing the UOSDA problem.
In order to obtain an effective target-domain classifier, Fang et al. [fang2019open] have proven an upper bound (Eq. (14)) for UOSDA and proposed a shallow method based on the bound. The bound consists of four terms: the source-domain risk, the distributional discrepancy, the open set difference, and a constant. In particular, the open set difference, as an important term, is leveraged to estimate the risk of the classifier on unknown target data.
In order to verify whether the open set difference works in DNNs, we introduced the open set difference into DNNs and conducted a group of experiments on the task Ar→Cl in Office-Home. The model consists of a backbone (ResNet-50), a generator (two linear layers), and a classifier (one linear layer). It is evident that this classifier is very flexible. As shown in Fig. 2, the empirical open set difference converges to a negative value (refer to the yellow line in Fig. 2(a)), and the OS accuracy, i.e., the average accuracy over all classes including the unknown class (Eq. (29)), decreases significantly as the empirical open set difference converges to a negative value.
To reveal the nature of this phenomenon, we first investigate the distributional discrepancy and discover that it has a lower bound. Specifically, the distributional discrepancy is greater than the negative of the open set difference (Eq. (18)). Based on this lower bound, if the value of the open set difference is a large negative number, then the distributional discrepancy is greater than a large positive number. Hence, we may fail to align the distributions. In fact, experiments have shown that the empirical open set difference may converge to a large negative value if we introduce the open set difference into DNNs.
Clearly, there is a gap between the existing theoretical bound and DNNs. In order to bridge the theoretical bound and deep algorithms, in this paper we propose a new practical upper bound (Eq. (20)) for UOSDA that applies to DNNs. The amended open set difference in the new bound effectively overcomes the defect of the original open set difference. As shown in Fig. 2, the amended open set difference guarantees that the risk of the classifier on unknown data always stays above the prescribed lower bound (refer to the green line in Fig. 2(a)). Furthermore, the amended open set difference significantly outperforms the original open set difference (refer to the green line in Fig. 2(b)).
To sum up, the existing upper bound is not compatible with DNNs. That is why we propose a new upper bound that contains an amended risk estimator, the amended open set difference. Details of the new upper bound and the amended estimator are given in Section V.
V The Proposed Method
In this section, we first propose a theoretical bound for UOSDA that applies to DNNs. Under the guidance of the bound, we then propose a UOSDA method based on DNNs.
Notation  Description 

cross-entropy, mean-square-error loss functions
set of predicted unknown target data with high confidence
set of predicted known target data with high confidence
number of source data
number of target data
number of predicted unknown target data with high confidence
number of predicted known target data with high confidence
source data  
target data 
V-A Theoretical Results
V-A1 An Analysis for Open Set Difference
Eq. (15) is the open set difference:
(15) 
where the two risks are defined in Eq. (8). The positive term is used to recognize unknown data, and the negative term is designed to prevent known data from being classified as unknown. By combining these two terms, the classifier can recognize unknown target samples. According to [fang2019open], the open set difference satisfies the following inequality:
(16) 
The proof of Eq. (16) can be found in Proposition 1 of Appendix A. Note that
(17) 
hence, the distributional discrepancy is greater than the negative of the open set difference:
(18) 
Theoretically, the optimized open set difference should not converge to a large negative value; otherwise, it is impossible to eliminate the distributional discrepancy. In practice, however, the empirical open set difference may converge to a large negative value (see Fig. 2). As a result, the distributional discrepancy may remain large.
V-A2 Amended Open Set Difference
Based on the analyses above, we correct the open set difference to avoid this problem. According to Eq. (18), the open set difference is lower bounded. One possibility is to limit the open set difference from below by a small negative constant. Hence, we propose an amended risk estimator, the amended open set difference, to overcome the defect of the original open set difference:
(19) 
If we optimize the empirical amended open set difference, we can guarantee that it always stays above the prescribed constant. Lastly, combining Eqs. (12) and (13) with Eq. (19), we develop a new theoretical bound for UOSDA.
Theorem 1.
Given a feature transformation, a loss function satisfying conditions 1-3 introduced in Section III-B2, a non-negative constant, and a hypothesis space with the mild condition that it contains the constant vector-valued functions, then for any classifier in the hypothesis space, we have
(20) 
where the source and target risks are defined in Eq. (5), the unknown-classification risks are defined in Eq. (8), and the partial risk is defined in Eq. (6).
Proof.
The proof is given in Appendix A. ∎
It is notable that the theoretical bound introduced in Theorem 1 has two main differences from the learning bound introduced by [fang2019open]. The first is the amended open set difference. As mentioned before, the amended open set difference is designed to eliminate the distributional discrepancy blow-up caused by the original open set difference when the model is based on DNNs. The other difference is that we use the tensor distributional discrepancy to estimate the domain difference. The tensor distributional discrepancy has two advantages over the distributional discrepancy (Definition 3): 1) it is tighter (see Eq. (13)); and 2) it can extract the multimodal structure of distributions, so that knowledge related to the learned classifier and the pseudo labels can be utilized during distribution alignment [long2018conditional].
V-B Method Description
According to Theorem 1, we formally present our method (see Fig. 3), which consists of three parts. Part 1) Binary adversarial domain adaptation. Following [saito2018open], we employ a binary adversarial module to find a rough boundary between the class-known data (known data) and the class-unknown data (unknown data); this module thus provides target samples with high confidence for the other modules. Part 2) The amended open set difference. It is leveraged to estimate the risk of the classifier on unknown data so that the classifier can accurately recognize the unknown target data. Part 3) Conditional adversarial domain adaptation. Existing deep UOSDA methods ignore the importance of the multimodal structure of distributions while aligning the distributions of known classes. Based on the tensor distributional discrepancy, we design a novel open-set conditional adversarial strategy to align the distributions of known classes. Notations used in this section are summarized in Table II.
V-B1 Binary adversarial domain adaptation (BADA)
According to our theoretical bound, the first term is the source risk. For the source domain, the labels are available, so we utilize a cross-entropy loss for the classification of source samples:
(21) 
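The source classification term can be sketched as the standard mean cross-entropy below; the logits, labels, and three-class setup are illustrative stand-ins, not the paper's network outputs.

```python
import numpy as np

def softmax(z):
    # Row-wise numerically stable softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def source_ce(logits, labels):
    # Mean cross-entropy over labeled source samples; `labels` holds
    # integer indices into the known classes.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

logits = np.array([[5.0, 0.0, 0.0],   # confidently class 0
                   [0.0, 5.0, 0.0]])  # confidently class 1
loss = source_ce(logits, np.array([0, 1]))
assert loss < 0.05  # confident, correct predictions give a small loss
```

This is the only term of the objective that uses ground-truth labels; every other term below operates on the unlabeled target data.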
For the target domain, it is imperative to recognize the unknown target data before aligning the distributions. Following [saito2018open], we employ a binary cross-entropy loss and a gradient reverse layer between the generator and the classifier to find a boundary between the known data and the unknown data:
(22) 
where the term involved is the output of the hypothesis function corresponding to the unknown class.
The minimax game is shown in Section V-C. During adversarial training, the classifier attempts to minimize this loss, while the generator attempts to maximize it. Therefore, recognition of unknown data is achieved during adversarial training.
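The binary adversarial term can be sketched as an OSBP-style binary cross-entropy on the predicted unknown-class probability, with a fixed target of 0.5 setting the boundary the two players fight over; the function name and the value t = 0.5 follow the construction in [saito2018open] but should be read as an illustrative sketch rather than the paper's exact code.

```python
import numpy as np

def binary_adv_loss(p_unknown, t=0.5):
    # Binary cross-entropy between the predicted probability that a
    # target sample is "unknown" and a fixed target t. The classifier
    # minimizes this (pulling p toward t); the generator, fed reversed
    # gradients, maximizes it (pushing p toward 0 or 1), which is what
    # separates confidently-known from confidently-unknown samples.
    p = np.clip(p_unknown, 1e-12, 1.0 - 1e-12)
    return float(-t * np.log(p) - (1.0 - t) * np.log(1.0 - p))

# The loss is smallest at p = t and grows as p moves toward 0 or 1.
assert binary_adv_loss(0.5) < binary_adv_loss(0.9)
assert binary_adv_loss(0.5) < binary_adv_loss(0.1)
```

At p = 0.5 the loss equals log 2, the minimum over p for the symmetric target, which is why a sample sitting exactly on the boundary gives neither player an advantage.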
However, this module can only find a coarse boundary between the known data and the unknown data and cannot accurately recognize the unknown target data. Table VI verifies that binary adversarial domain adaptation alone cannot achieve satisfactory performance. Therefore, we employ the amended open set difference to recognize unknown target data more accurately and the open-set conditional adversarial strategy to further align the distributions.
V-B2 Amended open set difference
The principle of the amended open set difference is demonstrated in Sections IV and V-A. We introduce it here to recognize unknown target data. According to Eq. (19), we can calculate the empirical amended open set difference by:
(23) 
Without further label information, the constant in Eq. (19) cannot be evaluated accurately; thus, we introduce a hyperparameter to replace it. This hyperparameter is analyzed in Section VI.
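Since the paper's exact formula is elided here, the following is a hedged structural sketch of the amended estimator: the open set difference is a positive term minus a negative term, and the amendment clamps the result from below at a small negative constant so a flexible network cannot drive it arbitrarily negative. The argument names and the clamp-by-max form are assumptions for illustration.

```python
def clipped_open_set_diff(risk_target_unknown, risk_source_unknown, delta):
    # Empirical open set difference = positive term - negative term.
    # Clamping it from below at -delta stops a flexible model (a DNN)
    # from driving it to a large negative value, which by the lower
    # bound in Eq. (18) would force the distributional discrepancy
    # to be at least as large a positive value.
    raw = risk_target_unknown - risk_source_unknown
    return max(raw, -delta)

assert clipped_open_set_diff(0.1, 0.9, delta=0.3) == -0.3  # clamp active
assert clipped_open_set_diff(0.5, 0.2, delta=0.3) == 0.3   # clamp inactive
```

The clamp only changes behavior in the regime that caused the failure in Fig. 2(a); whenever the raw estimate is above the threshold, the amended and original estimators coincide.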
V-B3 Conditional adversarial domain adaptation
Here we utilize the tensor distributional discrepancy to align the distributions of the known classes across domains. First, the empirical representations of the two induced distributions can be written as follows:
(24) 
where is the set of target data from the known classes and is the Dirac measure.
Then, motivated by DANN [ganin2016domain] and CDAN [long2018conditional], we can reformulate the tensor distributional discrepancy between the known classes as follows:
(25) 
where is the domain discriminator designed to classify domains.
Since the target data are unlabeled, Eq. (25) cannot be computed directly. Thanks to the pseudo labels provided by BADA, we leverage them in place of the true labels. Since these pseudo labels are not completely accurate, we only select samples with a confidence of at least 0.9. We then formulate the domain adversarial loss function below.
(26) 
where the set denotes the target samples from known classes with high confidence.
The domain adversarial loss is minimized over the discriminator and maximized over the generator. The gradient reverse layer between the generator and the discriminator results in the discriminator becoming confused about whether data come from the source or the target domain. The minimax game is shown in Section V-C. The discriminator aims to identify which domain the input data belong to, while the generator aims to deceive the discriminator by changing the features of the input data. Distribution alignment is achieved during this process.
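The conditioning described above can be sketched as follows: each sample is represented by the flattened outer product of its feature and its (pseudo-)label prediction, and a discriminator scores that joint vector. The linear discriminator `w`, the batch construction, and all dimensions are illustrative assumptions; in the paper the discriminator is a network trained through a gradient reverse layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_adv_loss(feats, probs, w, domain):
    # Each row of `joint` is the flattened outer product feature ⊗
    # prediction, so the discriminator sees class-conditional structure
    # rather than features alone. `domain` is 1 for source, 0 for target.
    joint = np.stack([np.outer(f, p).ravel() for f, p in zip(feats, probs)])
    d = np.clip(sigmoid(joint @ w), 1e-12, 1.0 - 1e-12)
    return float(-np.mean(domain * np.log(d) + (1 - domain) * np.log(1 - d)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 3))                 # toy 3-d features
probs = rng.dirichlet(np.ones(4), size=8)       # toy 4-class predictions
w = rng.normal(size=12)                         # toy linear discriminator
domain = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # first half "source"
loss = cond_adv_loss(feats, probs, w, domain)
```

Training would alternate between descending this loss in `w` (sharpening the discriminator) and ascending it in the features (confusing it), with the sign flip supplied by the gradient reverse layer.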
Furthermore, the unknown data may disturb the distribution alignment of the known data. Thus, the unknown data should be pushed away from the known data to prevent them from affecting distribution alignment. We construct the loss function below. It is worth noting that there is no gradient reversal between the generator and the discriminator during backpropagation for this loss.
(27) 
where the set denotes the unknown target samples with high confidence.
In this subsection, we construct a domain discriminator to align the distributions of the known data via a tensor product, which captures the multimodal structure of the distributions. Furthermore, we construct a loss function that pushes the unknown data away from the known data to prevent the unknown data from affecting distribution alignment.
V-C Training Procedure
Combining Eqs. (21), (22), (23), (26) and (27), we solve the UOSDA problem with the following minimax game:
(28) 
We introduce the gradient reverse layer for adversarial learning. The whole training procedure is shown in Algorithm 1. First, we initialize the parameters of the generator, the classifier and the domain discriminator (line 1). In each epoch, we divide the data into mini-batches (lines 4-5). Then we calculate the source risk, the binary adversarial loss and the open set difference according to Eqs. (21), (22) and (23) (lines 6-7). After selecting the target samples with high confidence (line 8), we calculate the two domain adversarial losses according to Eqs. (26) and (27) (line 9). Finally, the parameters are updated via the SGD optimizer (line 10).

With the proposed method, binary adversarial domain adaptation finds a coarse boundary between the known data and the unknown data. Furthermore, the open set difference adequately estimates the risk of the classifier on unknown data, which enables the classifier to accurately recognize unknown target data. We then further align the distributions of the known data and push the unknown data away from the known data using the domain discriminator. Combining these three modules, we can adequately solve the UOSDA problem.
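The structure of the training procedure can be sketched as follows. Every name here is hypothetical and all losses are stubbed with constants; the sketch only mirrors the order of the steps in Algorithm 1 (initialization, mini-batches, loss computation, confident-sample selection, SGD update), not the paper's actual implementation.

```python
def train(epochs=2, n_batches=3, lr=0.01):
    """Skeleton of the training procedure with stubbed losses."""
    params = {"G": 0.0, "C": 0.0, "D": 0.0}       # line 1: initialize G, C, D
    for epoch in range(epochs):
        batches = range(n_batches)                 # lines 4-5: mini-batches
        for b in batches:
            # lines 6-7: source risk, binary adversarial loss,
            # open set difference (constants stand in for Eqs. (21)-(23))
            src_risk, bin_adv, open_diff = 0.3, 0.2, 0.1
            # line 8: select confident target samples (omitted here)
            # line 9: domain adversarial losses, Eqs. (26) and (27)
            dom_known, dom_unknown = 0.15, 0.05
            total = src_risk + bin_adv + open_diff + dom_known + dom_unknown
            # line 10: SGD update (toy step on every parameter group)
            for k in params:
                params[k] -= lr * total
    return params
```

In a real implementation each stub would be a differentiable loss and the update would be one optimizer step per mini-batch; the gradient reverse layer handles the maximization side of the minimax game inside backpropagation.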
VI Experiments and Evaluations
In this section, we conduct extensive experiments on standard benchmark datasets (including 41 transfer tasks) to demonstrate the effectiveness of our method. Several state-of-the-art UOSDA methods, namely ATI [panareda2017open], OSBP [saito2018open], SCA_SCM [feng2019attract], STA [liu2019separate] and DAOD [fang2019open], are employed as our baselines.
VI-A Datasets
Digits contains three digit datasets: MNIST (M) [lecun1998gradient], SVHN (S) [netzer2011reading] and USPS (U) [hull1994database]. As in previous works [saito2018open], we construct three open set domain adaptation tasks: S→M, M→U and U→M. Following the protocol of [saito2018open], we select a subset of the digit classes as the known classes and the remaining classes as the unknown classes of the target domain.
Office31 [saenko2010adapting] is an object recognition dataset consisting of three domains with slight discrepancy: amazon (A), dslr (D) and webcam (W). Each domain contains 31 kinds of objects, so there are six open set domain adaptation tasks on Office31: A→D, A→W, D→A, D→W, W→A and W→D. We follow the open set protocol of [saito2018open], selecting the first several classes in alphabetical order as the known classes and the remaining classes as the unknown classes of the target domain.
OfficeHome [venkateswara2017deep] is an object recognition dataset containing four domains with more obvious domain discrepancy than Office31: Artistic (Ar), Clipart (Cl), Product (Pr) and RealWorld (Rw). Each domain contains 65 kinds of objects, so there are 12 open set domain adaptation tasks on OfficeHome: Ar→Cl, Ar→Pr, Ar→Rw, …, Rw→Pr. Following the standard protocol, we choose the first several classes as the known classes and the remaining classes as the unknown classes of the target domain.
PIE [Rasouli_2019_ICCV] is a face recognition dataset containing face images with multifarious poses, illuminations and expressions. Following the protocol of [fang2019open], we perform open set domain adaptation among five poses: PIE1 (left pose), PIE2 (upward pose), PIE3 (downward pose), PIE4 (frontal pose) and PIE5 (right pose), selecting a subset of the classes as the known classes and the remaining classes as the unknown classes of the target domain. We construct 20 open set domain adaptation tasks, i.e., PIE1→PIE2, PIE1→PIE3, …, PIE5→PIE4.
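The open set protocols above share one preprocessing step: every target sample whose class lies outside the known label set is relabeled to a single catch-all unknown class. A minimal sketch, with hypothetical class indices and a hypothetical `UNKNOWN` sentinel:

```python
UNKNOWN = -1  # single catch-all label for all unknown classes

def make_open_set_target(samples, known_classes):
    """Relabel target samples: keep known-class labels (used only for
    evaluation) and map every other class to the single unknown label."""
    known = set(known_classes)
    return [(x, y if y in known else UNKNOWN) for x, y in samples]

# Hypothetical target data: (feature, class) pairs with classes 0-5,
# where classes 0-2 are treated as known.
target = [(0.1, 0), (0.4, 3), (0.7, 2), (0.9, 5)]
relabeled = make_open_set_target(target, known_classes=[0, 1, 2])
# classes 3 and 5 collapse to the unknown label
```

This is why the evaluation below treats "unknown" as one extra class regardless of how many unseen classes the target domain actually contains.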
VI-B Implementation
Network structure. For the Digits tasks, we employ convolutional neural networks similar to those of [shu2018a, saito2018open] for S→M and the other tasks, respectively, and train the DNNs from scratch. For Office31, we leverage VGGNet [simonyan2014very] as the backbone to extract image features, with two fully-connected layers as the generator and one fully-connected layer as the classifier. For OfficeHome, we leverage ResNet [he2016deep] as the backbone; the network structures of the generator and the classifier are the same as for Office31. PIE already provides valid features for all images, so a CNN is not necessary, and we adopt a generator and classifier similar to those for Office31. Details about the networks can be found in Appendix B. In the same manner as [saito2018open, feng2019attract], we do not update the parameters of the backbone during training.

Parameter setting. There are two important parameters in the proposed method. The first is set to the same value in all experiments, because the distributional discrepancy gradually decreases during domain adaptation and the parameter should be greater than or equal to its limiting value when the distributional discrepancy vanishes. The second is set separately: one value for Office31, another for Digits and OfficeHome, and a third for PIE; when the distributional discrepancy is relatively large, we advise that this parameter be smaller for stable training. All experimental results are accuracies averaged over three independent runs.
VI-C Baselines
We compare our method with five UOSDA methods: ATI [panareda2017open], OSBP [saito2018open], SCA_SCM [feng2019attract], STA [liu2019separate] and DAOD [fang2019open]. We briefly introduce these baselines below.
ATI [panareda2017open] employs integer programming to assign labels to target samples and a mapping matrix to align the distributions.
OSBP [saito2018open] employs a classifier to align the distributions of the known-class data in the source and target domains, and an adversarial network that rejects unknown samples based on the predicted probabilities of target samples.
SCA_SCM [feng2019attract] aligns the class centroids between the source and target domains and pushes unknown samples away from the known classes to achieve good performance.
STA [liu2019separate] utilizes a coarse-to-fine weighting mechanism to separate unknown samples from the target domain while simultaneously achieving distribution alignment.
DAOD [fang2019open] trains a target-domain classifier by minimizing Eq. (14); its open set difference term is used to estimate the risk of the classifier on unknown classes.
VI-D Evaluation Metrics
Following previous works [panareda2017open, saito2018open, fang2019open], we employ the two metrics below to evaluate our method. OS: accuracy averaged over all classes, including the unknown class. OS*: accuracy averaged over the known classes only.
(29) 
where is the target classifier, and is the set of target samples with label .
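OS and OS* can be computed as macro-averages of per-class accuracies, differing only in whether the unknown class is included. Below is a minimal pure-Python sketch of that computation; the `UNKNOWN` label, the function name and the toy predictions are hypothetical, not the paper's code.

```python
UNKNOWN = -1  # label used for the single unknown class

def os_metrics(y_true, y_pred, known_classes):
    """Per-class accuracies averaged two ways:
    OS  : over the known classes plus the unknown class,
    OS* : over the known classes only."""
    def class_acc(c):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if not idx:
            return None  # class absent from the target data
        return sum(y_pred[i] == c for i in idx) / len(idx)

    known_accs = [a for a in (class_acc(c) for c in known_classes)
                  if a is not None]
    os_star = sum(known_accs) / len(known_accs)
    unk_acc = class_acc(UNKNOWN)
    all_accs = known_accs + ([unk_acc] if unk_acc is not None else [])
    os_val = sum(all_accs) / len(all_accs)
    return os_val, os_star

# Toy labels: class 0 accuracy 2/2, class 1 accuracy 1/2, unknown 1/2.
y_true = [0, 0, 1, 1, UNKNOWN, UNKNOWN]
y_pred = [0, 0, 1, 0, UNKNOWN, 1]
os_val, os_star = os_metrics(y_true, y_pred, known_classes=[0, 1])
```

Macro-averaging matters here: because every unknown class counts as one class, a classifier cannot inflate OS by simply labeling the whole target domain as known (or as unknown).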
VI-E Results
Table III: Accuracy (%) on the Digits tasks.
Dataset  ATI  OSBP  SCA_SCM  STA  DAOD  OURS
  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*
S→M  67.6  66.5  63.1  59.1  68.6  65.5  76.9  75.4  –  –  82.9  82.6
M→U  86.8  89.6  92.1  94.9  91.3  92.0  93.0  94.9  –  –  93.4  94.6
U→M  82.4  81.5  92.3  91.2  93.1  95.2  92.2  91.3  –  –  90.7  92.7
Average  78.9  79.2  82.4  81.7  84.3  84.2  87.3  87.2  –  –  89.0  90.0
Table IV: Accuracy (%) on Office31 (top) and OfficeHome (bottom).
Dataset  ATI  OSBP  SCA_SCM  STA  DAOD  OURS
  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*
A→D  79.8  86.8  85.8  85.8  90.1  92.0  88.6  92.8  89.2  91.1  96.0  97.5
A→W  86.4  93.0  76.9  76.6  86.4  87.7  91.9  94.3  90.5  91.9  92.5  93.7
D→A  75.0  81.5  89.4  91.5  81.6  88.4  73.4  74.3  75.4  73.6  85.3  86.0
D→W  91.7  98.6  96.0  96.6  97.9  99.8  96.5  99.5  98.6  100.0  98.4  100.0
W→A  75.8  82.0  83.4  83.1  80.3  82.6  71.3  71.3  75.6  74.7  83.2  83.9
W→D  91.5  99.3  97.1  97.3  98.2  99.3  95.4  100.0  98.6  99.3  98.6  100.0
Average  83.4  90.2  88.0  88.5  89.1  91.6  86.2  88.7  88.0  88.4  92.3  93.5
Ar→Cl  53.1  54.2  53.1  53.3  58.9  59.9  57.0  59.3  55.4  55.3  61.6  62.8
Ar→Pr  68.6  70.4  68.4  69.2  73.4  74.4  67.2  69.5  71.8  72.6  76.6  78.3
Ar→Rw  77.3  78.1  78.0  79.1  79.2  80.2  79.1  81.9  77.6  78.2  83.2  85.0
Cl→Ar  57.8  59.1  57.9  58.2  60.6  61.5  59.1  61.3  59.2  59.1  62.2  62.8
Cl→Pr  66.7  68.3  71.6  72.4  67.5  68.4  63.4  65.9  70.1  70.8  71.0  72.2
Cl→Rw  74.3  75.3  71.4  72.3  74.8  75.8  72.7  75.5  77.0  77.8  77.7  79.0
Pr→Ar  61.2  62.6  59.6  61.0  63.8  64.7  63.8  65.2  65.8  66.7  64.6  65.4
Pr→Cl  53.9  54.1  55.7  56.9  58.1  59.0  56.5  58.6  59.1  60.0  60.0  60.8
Pr→Rw  79.9  81.1  82.1  83.9  77.7  78.7  80.1  82.4  82.2  84.1  81.5  82.9
Rw→Ar  70.0  70.8  66.5  68.2  67.3  68.2  69.3  71.3  70.5  71.3  70.6  71.6
Rw→Cl  55.2  55.4  57.8  59.2  55.8  56.7  57.5  59.2  57.8  58.4  58.8  59.6
Rw→Pr  78.3  79.4  78.6  80.8  77.7  78.6  79.4  82.2  80.6  81.8  81.3  82.8
Average  66.4  67.4  66.7  67.9  67.9  68.8  67.1  69.4  68.9  69.6  70.8  71.9 
Dataset  ATI  OSBP  SCA_SCM  STA  DAOD  OURS
  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*
P1→P2  41.9  44.0  64.2  66.6  60.7  60.9  54.2  55.0  56.5  57.3  76.4  78.1
P1→P3  53.6  56.3  66.4  69.1  65.7  66.0  67.7  68.8  52.2  53.1  75.7  77.4
P1→P4  64.6  67.9  76.2  80.0  79.5  80.3  81.6  83.6  82.4  85.2  89.6  91.6
P1→P5  43.3  45.4  49.1  50.2  45.7  45.3  42.4  41.7  46.1  47.3  57.2  58.0
P2→P1  56.7  59.5  52.9  54.2  63.6  65.2  51.0  51.6  68.1  69.7  81.6  83.9
P2→P3  53.6  56.3  61.5  63.5  66.9  68.5  58.3  59.0  69.9  71.7  76.5  78.3
P2→P4  73.5  77.1  90.4  92.9  91.2  93.6  78.6  80.6  88.2  91.2  94.0  96.4
P2→P5  34.9  36.7  45.1  45.9  45.3  46.0  39.6  39.6  49.4  49.8  51.8  52.6
P3→P1  66.9  68.4  61.3  61.0  75.2  77.3  69.2  70.7  66.6  68.3  82.7  85.0
P3→P2  52.4  55.0  64.1  64.6  68.9  70.7  59.5  61.0  68.5  70.4  76.0  78.0
P3→P4  70.5  74.0  74.7  76.9  86.6  89.1  77.6  79.8  83.9  87.1  84.9  87.2
P3→P5  44.8  47.1  46.3  46.7  59.7  61.0  46.3  46.7  52.3  53.3  62.8  64.2
P4→P1  63.7  66.8  67.2  68.7  85.7  86.9  84.4  86.6  84.4  87.1  93.1  95.4
P4→P2  74.4  78.1  82.2  85.0  90.0  91.3  89.7  92.5  82.4  84.8  93.9  96.2
P4→P3  58.7  61.7  66.9  67.6  86.0  87.1  81.6  84.4  77.6  80.0  85.1  86.9
P4→P5  46.2  48.5  61.7  63.8  63.2  63.6  68.8  71.0  59.9  61.3  71.3  72.7
P5→P1  30.2  23.5  64.2  66.6  54.3  55.7  61.2  62.6  59.2  60.6  62.8  64.3
P5→P2  34.9  36.7  35.4  35.8  48.8  49.7  49.8  50.0  35.0  34.8  50.2  51.1
P5→P3  39.9  41.9  45.1  46.3  58.7  60.0  46.5  46.3  44.6  44.4  69.2  70.8
P5→P4  55.8  58.6  52.2  53.5  71.1  73.0  70.2  71.7  68.6  70.3  80.2  82.4
Average  53.0  55.2  61.4  62.9  68.3  69.6  63.9  65.2  64.8  66.4  75.8  77.5 
Dataset  A→D  A→W  D→A  D→W  W→A  W→D  Avg
  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*  OS  OS*
BADA  85.8  85.8  76.9  76.6  89.4  91.5  96.0  96.6  83.4  83.1  97.1  97.3  88.0  88.5 
BADA+  92.7  93.3  89.8  90.6  81.6  81.7  98.0  99.5  83.6  78.9  98.5  100.0  89.9  90.7 
BADA+c  92.2  94.1  87.6  89.0  81.5  84.1  97.7  100.0  80.3  83.4  97.3  100.0  89.5  91.8 
BADA++c  94.1  94.6  89.2  89.7  83.2  83.4  98.5  100.0  83.3  81.9  98.6  100.0  90.9  91.7 
BADA+  95.5  97.0  92.6  94.0  82.3  82.6  98.0  99.5  83.4  79.5  98.4  100.0  91.0  92.2 
OURS  96.0  97.5  92.5  93.7  85.3  86.0  98.4  100.0  83.2  83.9  98.6  100.0  92.3  93.5 
Results on the three tasks of the Digits datasets are shown in Table III. Our method achieves the best average performance (89.0% on OS and 90.0% on OS*) across the three tasks. Moreover, compared to U→M and M→U, S→M is more challenging, since there is a larger distributional discrepancy between S and M. Even on this most difficult task, our method still outperforms the best baseline, STA, by 6.0% and 7.2% on OS and OS*, respectively. It is worth noting that DAOD is a shallow method that cannot extract features with a convolutional neural network, so it is not compared on Digits. The results of ATI are taken from [liu2019separate].
Results on standard benchmark object datasets (Office31 and OfficeHome) are recorded in Table IV. For Office31, our method significantly outperforms baselines among