Universal Source-Free Domain Adaptation

by Jogendra Nath Kundu, et al.

There is a strong incentive to develop versatile learning techniques that can transfer the knowledge of class-separability from a labeled source domain to an unlabeled target domain in the presence of a domain-shift. Existing domain adaptation (DA) approaches are not equipped for practical DA scenarios as a result of their reliance on the knowledge of source-target label-set relationship (e.g. Closed-set, Open-set or Partial DA). Furthermore, almost all prior unsupervised DA works require coexistence of source and target samples even during deployment, making them unsuitable for real-time adaptation. Devoid of such impractical assumptions, we propose a novel two-stage learning process. 1) In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift. To achieve this, we enhance the model's ability to reject out-of-source distribution samples by leveraging the available source data, in a novel generative classifier framework. 2) In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps, with no access to the previously seen source samples. To this end, in contrast to the usage of complex adversarial training regimes, we define a simple yet effective source-free adaptation objective by utilizing a novel instance-level weighting mechanism, named as Source Similarity Metric (SSM). A thorough evaluation shows the practical usability of the proposed learning framework with superior DA performance even over state-of-the-art source-dependent approaches.





1 Introduction

Deep learning models have proven to be highly successful over a wide variety of tasks [20, 35]. However, a majority of these are heavily dependent on access to a huge amount of labeled data to achieve a reliable level of generalization. A recognition model trained on a certain distribution of labeled samples (source domain) often fails to generalize [7] when deployed in a new environment (target domain) with discrepancy in the data distribution [43]. Unsupervised Domain Adaptation (DA) algorithms seek to minimize this discrepancy without accessing the target label information, either by learning a domain invariant feature representation [26, 21, 9, 45], or by learning independent transformations [28, 32] to a common latent representation through adversarial distribution matching [46, 22].

Figure 1: We address unsupervised domain adaptation in absence of source data (source-free), without any category-gap knowledge (universal). A lock indicates “no access” during adaptation.

Most of the existing approaches [38, 56, 46] assume a shared label set between the source and the target domains (i.e. $\mathcal{C}_s = \mathcal{C}_t$), known as Closed-Set DA (Fig. 2A). Though this assumption helps gain various insights for DA algorithms [2], it rarely holds true in real-world scenarios. Recently, researchers have independently explored two broad adaptation settings by partly relaxing the Closed-Set assumption (see Fig. 2A). In the first kind, Partial DA [54, 5, 6], the target label space is considered as a subset of the source label space (i.e. $\mathcal{C}_t \subset \mathcal{C}_s$). This setting is more suited for large-scale universal source datasets, which will almost always subsume the label set of a wide range of target domains. However, the availability of such a large-scale source is highly questionable for a wide range of input domains. In the second kind, Open-set DA [39, 1, 10], the target label space is considered as a superset of the source label space (i.e. $\mathcal{C}_s \subset \mathcal{C}_t$). The major challenge in this setting is to detect target samples from the unobserved categories (similar to detection of out-of-distribution samples [31]) in a fully-unsupervised scenario. Apart from the above two extremes, certain works define a partly mixed scenario by allowing a “private” label set for both source and target domains (i.e. $\mathcal{C}_s \setminus \mathcal{C}_t \neq \emptyset$ and $\mathcal{C}_t \setminus \mathcal{C}_s \neq \emptyset$) but with extra supervision such as few-shot labeled data [30] or the knowledge of common categories [4].

Most of the prior approaches [46, 39, 5] consider each scenario in isolation and propose independent solutions. Thus, the knowledge of the relationship between the source and the target label space (category-gap) is required to carefully choose whether to apply Closed-set, Open-set or Partial DA algorithm for the problem in hand. Furthermore, all the prior unsupervised DA works require the coexistence of source and target samples even during deployment, hence are not source-free. This is highly impractical, as labeled source data may not be accessible after deployment due to several reasons. Many datasets are withheld due to privacy concerns (e.g. biometric data) [29] or simply due to the proprietary nature of the dataset. Moreover, in real-time deployment scenarios [51], training on the entire source data is not feasible due to computational limitations. Even otherwise, an accidental loss (e.g. data corruption) of the source data renders the prior unsupervised DA methods non-viable for a future model adaptation [25]. Acknowledging these issues, we aim to formalize a unified solution for unsupervised DA completely devoid of these limitations. Our problem setting is illustrated in Fig. 1 (note source-free and universal).

The available DA techniques heavily rely on the adversarial discriminative [46, 56, 38] strategy. Thus, they require access to the source samples to reliably characterize the source domain distribution. Clearly, such approaches are not equipped to operate in a source-free setting. Though a generative model can be used as a memory-network [41, 3] to realize source-free adaptation, such a solution is not scalable for large-scale source datasets (e.g. ImageNet [36]), as it introduces unnecessary additional parameters along with the associated training difficulties [40]. As a novel alternative, we hypothesize that, to facilitate source-free adaptation, the source model should have the ability to reject samples that are out of the source data distribution [14].

In general, fully-discriminative deep models have a tendency to over-generalize for regions not covered by the training set, hence are highly confident in their predictions even for negative samples [24]. Though this problem can be addressed by training the source model on a negative source dataset, a wrong choice of negative data makes the model incapable of rejecting unknown target samples encountered after deployment [42]. Aiming towards a data-free setting, we hypothesize that the target samples have similar local part-based features as found in the source data, which also holds for novel target categories as encountered in Open-set DA. For example, consider an animal classification model (see Fig. 2B) where the deployed environment contains novel target categories unobserved in the source dataset (e.g. Giraffe). Here, the composition of local regions (e.g. body-parts) between pairs of source images drawn from different categories (e.g. Seahorse and Tiger) can be used to synthetically generate hypothetical negative classes which can act as a proxy for the unobserved animal categories. Such synthetic samples are a better approximation of the expected characteristics (e.g. long-neck) in the deployed target environment, as compared to samples from other unrelated datasets.

In summary, we propose a convenient DA framework, which is equipped to address Universal Source-Free Domain Adaptation. A thorough evaluation shows the practical usability of our approach with superior DA performance even over state-of-the-art source dependent approaches, across a variety of unknown label-set relationships.

Figure 2: a) Various label-set relationships (category-gap). b) Composite image as a reliable negative sample.
Figure 3: Latent space cluster arrangement during adaptation (see Section 3.1.1).

2 Related work

We briefly review the available domain adaptation methods under the three major divisions according to the assumption on label-set relationship. a) Closed-set DA. The cluster of prior closed-set DA works focuses on minimizing the domain gap at the latent space either by minimizing well-defined statistical distance functions [49, 8, 55, 37] or by formalizing it as an adversarial distribution matching problem [46, 17, 27, 16, 15] inspired by the Generative Adversarial Nets [11]. Certain prior works [41, 57, 15] use the GAN framework to explicitly generate target-like images translated from the source image samples, which is also regarded as pixel-level adaptation [3] in contrast to other feature level adaptation works [32, 46, 26, 28]. b) Partial DA. [5] proposed to achieve adversarial class-level matching by utilizing multiple domain discriminators furnishing a class-level and an instance-level weighting for individual data samples. [54] proposed to utilize importance weights for source samples depending on their similarity to the target domain data using an auxiliary discriminator. To effectively address the problem of negative-transfer [50], [6] employed a single discriminator to achieve both adversarial adaptation and class-level weighting of source samples. c) Open-set DA. [39] proposed a more general open-set DA setting without accessing the knowledge of source-private labels in contrast to [33]. They extended the classifier to accommodate an additional “unknown” class, which is adversarially trained against other source classes to detect target-private samples. d) Universal DA. [52] proposed the Universal DA setting, which requires no prior knowledge of label-set relationship (see Fig. 2A), similar to our proposed setting, but considers access to both source and target samples during adaptation.

3 Proposed approach

Our approach to solve the source-free domain adaptation problem is broadly divided into a two stage process. Note, source-free DA means the adaptation step is source-free. See Supplementary for a notation table.

a) Procurement stage. In this stage, we have a labeled source dataset, $\mathcal{D}_s = \{(x_s, y_s) : x_s \sim p, y_s \in \mathcal{C}_s\}$, where $p$ is the distribution of source samples and $\mathcal{C}_s$ denotes the label-set of the source domain. Here, the prime objective is to equip the model for a future source-free adaptation, where the model will encounter an unknown domain-shift and category-gap in the target domain. To achieve this we rely on an artificially generated negative dataset, $\mathcal{D}_n = \{(x_n, y_n) : x_n \sim p_n, y_n \in \mathcal{C}_n\}$, where $p_n$ is the distribution of negative source samples such that $\mathcal{C}_n \cap \mathcal{C}_s = \emptyset$.

b) Deployment stage. After obtaining a trained model from the Procurement stage, the model will have its first encounter with the unlabeled target domain samples from the deployed environment. We denote the unlabeled target data by $\mathcal{D}_t = \{x_t : x_t \sim q\}$, where $q$ is the distribution of target samples. Note that the source dataset $\mathcal{D}_s$ from the Procurement stage is inaccessible during adaptation in the Deployment stage. Suppose that $\mathcal{C}_t$ is the label-set of the target domain. In the Universal setting [52], we do not have any knowledge of the relationship between $\mathcal{C}_t$ and $\mathcal{C}_s$. Nevertheless, without loss of generality, we define the shared labels as $\mathcal{C} = \mathcal{C}_s \cap \mathcal{C}_t$ and the private label-sets for the source and the target domains as $\overline{\mathcal{C}}_s = \mathcal{C}_s \setminus \mathcal{C}_t$ and $\overline{\mathcal{C}}_t = \mathcal{C}_t \setminus \mathcal{C}_s$ respectively.
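The label-set bookkeeping above can be illustrated with a small sketch. It is for exposition only: in the Universal setting the target label-set is unknown at adaptation time, so the function below (a hypothetical helper, `category_gap`, not part of the paper) could only be run by someone who has ground-truth target labels.

```python
def category_gap(source_labels, target_labels):
    """Split two label sets into shared and private parts and name the relationship."""
    cs, ct = set(source_labels), set(target_labels)
    shared = cs & ct            # labels present in both domains
    source_private = cs - ct    # labels only in the source domain
    target_private = ct - cs    # labels only in the target domain
    if not source_private and not target_private:
        kind = "closed-set"
    elif not target_private:
        kind = "partial"        # target labels are a strict subset of source labels
    elif not source_private:
        kind = "open-set"       # source labels are a strict subset of target labels
    else:
        kind = "mixed"          # both domains have private labels
    return shared, source_private, target_private, kind


shared, src_priv, tgt_priv, kind = category_gap(
    ["tiger", "seahorse", "zebra"], ["tiger", "zebra", "giraffe"]
)
print(kind, sorted(shared), sorted(src_priv), sorted(tgt_priv))
```

In this toy example both domains carry a private class, which is the case a universal method has to handle without being told.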

Figure 4: A) Simulated labeled negative samples using randomly created spline segments (in pink), B) Proposed architecture, C) Procurement stage promotes intra-class compactness with inter-class separability.

3.1 Learning in the Procurement stage

3.1.1. Challenges. The available DA techniques heavily rely on the adversarial discriminative [46, 38] strategy. Thus, they require access to the source data to reliably characterize the source distribution. Further, these approaches are not equipped to operate in a source-free setting. Though a generative model can be used as a memory-network [41, 3] to realize source-free adaptation, such a solution is not scalable for large-scale source datasets (e.g. ImageNet [36]), as it introduces unnecessary additional parameters alongside the associated training difficulties [40]. This calls for a fresh analysis of the requirements beyond the existing solutions.

In a general DA scenario, with access to source samples in the Deployment stage (specifically for Open-set or Partial DA), a widely adopted approach is to learn domain invariant features. In such approaches, the placement of source category clusters is learned in the presence of unlabeled target samples, which obliquely provides supervision regarding the relationship between $\mathcal{C}_s$ and $\mathcal{C}_t$. For instance, in case of Open-set DA, the source clusters may have to disperse to make space for the clusters from the target-private $\overline{\mathcal{C}}_t$ (see Fig. 3A to 3B). Similarly, in partial DA, the source clusters may have to rearrange themselves to keep all the target shared clusters ($\mathcal{C}$) separated from the source private $\overline{\mathcal{C}}_s$ (see Fig. 3A to 3C). However, in a completely source-free framework, we do not have the liberty to leverage such information as source and target samples never coexist during training. Motivated by the adversarial discriminative DA technique [46], we hypothesize that inculcating the ability to reject samples that are out of the source data distribution can facilitate future source-free domain alignment using this discriminatory knowledge. Therefore, in the Procurement stage the overarching objective is two-fold.

  • Firstly, we must aim to learn a certain placement of source clusters best suited for all kinds of category-gap scenarios acknowledging the fact that, a source-free scenario does not allow us to modify the placement in the presence of target samples during adaptation (Fig. 3D).

  • Secondly, the model must have the ability to reject out-of-distribution samples, which is an essential requirement for unsupervised adaptation under domain-shift.

3.1.2. Solution.

In the presence of source data, we aim to restrain the model’s domain and category bias which is generally inculcated as a result of the over-confident supervised learning paradigms. To achieve this goal, we adopt two regularization strategies viz. i) utilization of a labeled simulated negative source dataset to generalize for the latent regions not covered by the given positive source samples (see Fig. 4C) and ii) regularization via generative modeling.

How to configure the negative source dataset? While configuring $\mathcal{D}_n$, the following key properties have to be met. Firstly, latent clusters formed by the negative categories must lie in-between the latent clusters of positive source categories to enable a higher degree of intra-class compactness with inter-class separability (Fig. 4C). Secondly, the negative source samples must enrich the source domain distribution without forming a new domain by themselves. This rules out the use of Mixup [53] or adversarial noise [44] as negative samples in this scenario. Thus, we propose the following method to synthesize the desired negative source dataset.

1:input: , ; , , : Parameters of , and respectively.
2:initialization: pretrain using cross-entropy loss on followed by initialization of the sample mean and covariance (at -space) of for from class ;
3:for  do
4:     ; ; ; for ;
5:     , and where , are the indices of ground-truth class ,
6:     ; ;
7:     , where
8:     Update , , by minimizing , , , and alternatively using separate optimizers.
9:     if   then
10:          Recompute the sample mean () and covariance () of for from class ;
11:          (For : generate fresh latent-simulated negative samples using the updated priors)      
Algorithm 1 Training algorithm in the Procurement stage

Image-composition. One of the key characteristics shared between the samples from source and unknown target domain is the semantics of the local part-related features specifically for image-based object recognition tasks. Relying on this assumption, we propose a systematic procedure to simulate the samples of $\mathcal{D}_n$ by randomly compositing local regions between a pair of images drawn from the source dataset (see Fig. 4A and Suppl. Algo. 1). These composite samples created on image pairs from different positive source classes are expected to lie in-between the two source clusters in the latent space, thus introducing a combinatorial amount of new class labels i.e. $|\mathcal{C}_n| = \binom{|\mathcal{C}_s|}{2}$.

This approach is motivated from and conforms with the observation in the literature, that one can indeed generate semantics for new classes using the known classes [23, 48]. Intuitively, from the perspective of combining features, when local parts from two different positive source classes are combined, the resulting image would tend to produce activations for both the classes (due to the presence of salient features from both classes). Thus, the sample would fall near the decision boundary in-between the two clusters in the latent space. Alternatively, from the perspective of discarding features, as we mask-out regions in a source image (Fig. 4), the activation in the corresponding class reduces. Thus, the model would be less confident for such samples, thereby emulating the characteristics of a negative class.
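To make the combinatorial growth of negative labels concrete, the following sketch enumerates one negative class per unordered pair of distinct positive classes. The id convention (negative ids placed after the positive ids) and the helper name `negative_class_index` are assumptions made for illustration, not details taken from the paper.

```python
from itertools import combinations
from math import comb


def negative_class_index(positive_classes):
    """Map each unordered pair of distinct positive classes to one negative class id."""
    pairs = combinations(sorted(positive_classes), 2)
    offset = len(positive_classes)  # assumed convention: negative ids follow positive ids
    return {pair: offset + i for i, pair in enumerate(pairs)}


classes = ["seahorse", "tiger", "zebra"]
neg_ids = negative_class_index(classes)
print(neg_ids)

# The number of negative classes is "n choose 2" in the number of positive classes.
print(len(neg_ids) == comb(len(classes), 2))
print(comb(1000, 2))  # pairs available for a 1000-class source such as ImageNet
```

The last line shows why a large source label-set makes it necessary to subsample the negative classes rather than use all pairs.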

Training procedure. The generative source classifier is divided into three stages; i) backbone-model $M$, ii) feature extractor $F_s$, and iii) classifier $D$ (see Fig. 4B). The output of the backbone-model is denoted as $v = M(x)$, where $x$ is drawn from either $\mathcal{D}_s$ or $\mathcal{D}_n$. Following this, the outputs of $F_s$ and $D$ are represented as $u$ and $d$ respectively. $D$ outputs a $K$-dimensional logit vector denoted as $d^{(k)}$ for $k = 1, 2, \dots, K$, where $K = |\mathcal{C}_s| + |\mathcal{C}_n|$. The individual class probabilities, $\hat{y}^{(k)}$, are obtained by applying softmax over the logits i.e. $\hat{y}^{(k)} = \sigma^{(k)}(D \circ F_s \circ M(x))$, where $\circ$ denotes function composition, $\sigma$ denotes the softmax activation and the superscript $(k)$ denotes the class-index.

Additionally, we define priors only for the positive source classes, $P(u_s | c_i) = \mathcal{N}(u_s | \mu_{c_i}, \Sigma_{c_i})$ (for $i = 1, 2, \dots, |\mathcal{C}_s|$) at the intermediate embedding $u_s = F_s \circ M(x_s)$. Here, the parameters of the normal distributions are computed during training as shown in line-10 of Algo. 1. A cross-entropy loss over these prior distributions is defined as $\mathcal{L}_p$ (line-7 in Algo. 1), that effectively enforces intra-class compactness with inter-class separability (Fig. 4C).
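A minimal sketch of class priors with a cross-entropy over them is given below. It makes several simplifying assumptions that are not taken from the paper: diagonal covariances with a small variance floor, class log-densities used as the logits of the cross-entropy, and plain Python lists in place of network embeddings. All function names are hypothetical.

```python
import math


def fit_diag_gaussian(embeddings):
    """Per-dimension sample mean and variance of a list of embedding vectors."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    # A small floor keeps the variance strictly positive (assumption for stability).
    var = [sum((e[d] - mean[d]) ** 2 for e in embeddings) / n + 1e-6 for d in range(dim)]
    return mean, var


def log_density(u, mean, var):
    """Log of a diagonal Gaussian density evaluated at embedding u."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(u, mean, var))


def prior_cross_entropy(u, true_class, priors):
    """Cross-entropy of the true class under a softmax over class log-densities."""
    scores = [log_density(u, m, v) for m, v in priors]
    top = max(scores)  # subtract the maximum before exponentiating (log-sum-exp)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return -(scores[true_class] - log_norm)


# Two toy positive classes with well-separated embeddings.
priors = [fit_diag_gaussian([[0.0, 0.1], [0.2, -0.1]]),
          fit_diag_gaussian([[3.0, 3.1], [2.8, 2.9]])]
loss_near = prior_cross_entropy([0.1, 0.0], 0, priors)  # embedding at its own class mean
loss_far = prior_cross_entropy([2.9, 3.0], 0, priors)   # embedding at the other class mean
print(loss_near < loss_far)
```

Minimizing such a loss pulls each embedding towards its own class prior and away from the others, which is the compactness and separability effect described above.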

Motivated by the generative variational auto-encoder (VAE) setup [19], we introduce a decoder $G$, which minimizes the cyclic reconstruction loss selectively for the samples from positive source categories and for randomly drawn samples from the corresponding class priors (i.e. the two reconstruction losses in line-6 of Algo. 1). This, along with a lower weightage for the negative source categories (i.e. the weight applied at the cross-entropy loss in line-6 of Algo. 1), is incorporated to deliberately bias the model towards the positive source samples, considering the level of unreliability of the generated negative dataset.

3.2 Learning in the Deployment stage

3.2.1. Challenges. We hypothesize that the large number of negative source categories along with the positive source classes i.e. $\mathcal{C}_s \cup \mathcal{C}_n$ can be interpreted as a universal source dataset, which can subsume the label-set of a wide range of target domains. Moreover, we seek to realize a unified adaptation algorithm, which can work for a wide range of category-gaps. However, a forceful adaptation of target samples to positive source categories will cause target-private samples to be classified as an instance of the source private or the common label-set, instead of being classified as "unknown", i.e. one of the negative categories in $\mathcal{C}_n$.

3.2.2. Solution. In contrast to domain agnostic architectures [52, 5, 38], we resort to an architecture supporting domain specific features [46], as we must avoid disturbing the placement of source clusters obtained from the Procurement stage. This is an essential requirement to retain the task-dependent knowledge gathered from the source dataset. Thus, we introduce a domain specific feature extractor denoted as $F_t$, whose parameters are initialized from the fully trained $F_s$ (see Fig. 4B). Further, we aim to exploit the learned generative classifier from the Procurement stage to serve the purpose of the separate ad-hoc networks (critic or discriminator) utilized by the prior works [52, 6].

a) Source Similarity Metric (SSM). For each target sample $x_t$, we define a weighting factor $w(x_t)$ called the SSM. A higher value of this metric indicates $x_t$’s similarity towards the positive source categories, specifically inclined towards the common label space $\mathcal{C}$. Similarly, a lower value of this metric indicates $x_t$’s similarity towards the negative source categories $\mathcal{C}_n$, showing its inclination towards the private target labels $\overline{\mathcal{C}}_t$. Consider the distributions of source and target samples with labels in $\overline{\mathcal{C}}_s$ and $\overline{\mathcal{C}}_t$ respectively, and likewise the distributions of samples from the source and target domains belonging to the shared label-set $\mathcal{C}$. Then, the SSM for the positive and negative source samples should lie on the two extremes, forming the inequality:


To formalize the SSM criterion we rely on the class probabilities defined at the output of the source model only for the positive class labels, i.e. $\hat{y}^{(k)}$ for $k = 1, 2, \dots, |\mathcal{C}_s|$. Note that $\hat{y}^{(k)}$ is obtained by performing softmax over $|\mathcal{C}_s| + |\mathcal{C}_n|$ categories as discussed in the Procurement stage. Finally, the SSM $w(x_t)$ and its complement $w'(x_t)$ are defined as,


We hypothesize that this definition will satisfy Eq. 1, as a result of the generative learning strategy adopted in the Procurement stage. In Eq. 2 the exponent is used to further amplify separation between target samples from the shared label-set and those from the private label-set (Fig. 5A).
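The following sketch shows one weighting that matches the verbal description: a softmax over all categories, restricted to the positive classes, and passed through an exponent. The specific form used here, the exponential of the largest positive-class probability and of its complement, is an assumption chosen for illustration and should not be read as the exact definition in Eq. 2. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def source_similarity(logits, num_positive):
    """Return (w, w_complement) from logits ordered as positive then negative classes."""
    probs = softmax(logits)               # softmax over all positive and negative categories
    p_max = max(probs[:num_positive])     # most likely positive source class
    w = math.exp(p_max)                   # larger when the sample looks source-like
    w_complement = math.exp(1.0 - p_max)  # larger when it looks like a negative class
    return w, w_complement


# Two positive classes followed by two negative classes (toy logits).
w_shared, wc_shared = source_similarity([6.0, 0.5, 0.2, 0.1], num_positive=2)
w_private, wc_private = source_similarity([0.3, 0.2, 5.0, 0.4], num_positive=2)
print(w_shared > w_private, wc_shared < wc_private)
```

Under this assumed form both weights lie between 1 and e, so no target sample is ignored outright; the exponent only stretches the gap between source-like and negative-like samples.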

b) Source-free domain adaptation. To perform domain adaptation, the objective function aims to move the target samples with higher SSM value towards the clusters of positive source categories and vice-versa at the frozen source embedding, $u$-space (from the Procurement stage). To achieve this, only the parameters of the $F_t$ network are allowed to be trained in the Deployment stage. However, the decision of weighting the loss on target samples towards the positive or negative source clusters is computed using the source feature extractor i.e. the SSM in Eq. 2. We define the deployment model as $D \circ F_t \circ M$ using the target feature extractor, with softmax predictions over $K = |\mathcal{C}_s| + |\mathcal{C}_n|$ categories obtained as $\hat{z}^{(k)} = \sigma^{(k)}(D \circ F_t \circ M(x_t))$. Thus, the primary loss function for adaptation is defined as,


Additionally, in the absence of label information, there would be uncertainty in the predictions as a result of distributed class probabilities. This leads to a higher entropy for such samples. Entropy minimization [12, 28] is adopted in such scenarios to move the target samples close to the highly confident regions (i.e. positive and negative cluster centers from the Procurement stage) of the classifier’s feature space. However, it has to be done separately for positive and negative source categories based on the SSM values of individual target samples to effectively distinguish the target-private set from the full target dataset. To achieve this, we define two different class probability vectors separately for the positive and negative source classes (Fig. 4B) as,


We obtain the entropy of the target samples for the positive source classes as and for the negative classes as . Subsequently, the entropy minimization is formulated as,


Thus, the final loss function for adaptation is $\mathcal{L}_d = \mathcal{L}_{d1} + \beta \mathcal{L}_{d2}$. Here $\beta$ is a hyper-parameter controlling the importance of entropy minimization during adaptation.
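The two-part objective described above can be sketched for a single target sample as follows. This is a hedged reading of the prose, not the paper's equations: the primary term is assumed to be a weighted negative log of the probability mass on the positive and on the negative classes, the entropy term renormalizes each group separately, and the values of `w`, `w_complement` and `beta` in the example are arbitrary.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptation_loss(probs, num_positive, w, w_complement, beta):
    """Weighted pull towards positive or negative clusters plus weighted entropy."""
    eps = 1e-12
    pos, neg = probs[:num_positive], probs[num_positive:]
    pos_mass, neg_mass = sum(pos), sum(neg)
    # Primary term: reward probability mass on the side that the weights favour.
    primary = -w * math.log(pos_mass + eps) - w_complement * math.log(neg_mass + eps)
    # Entropy term: renormalise each group and sharpen it separately.
    h_pos = entropy([p / (pos_mass + eps) for p in pos])
    h_neg = entropy([p / (neg_mass + eps) for p in neg])
    secondary = w * h_pos + w_complement * h_neg
    return primary + beta * secondary


# A source-like sample (large w): mass on positive classes should cost less.
loss_on_positive = adaptation_loss([0.7, 0.2, 0.05, 0.05], 2, w=2.5, w_complement=1.0, beta=0.1)
loss_on_negative = adaptation_loss([0.05, 0.05, 0.7, 0.2], 2, w=2.5, w_complement=1.0, beta=0.1)
print(loss_on_positive < loss_on_negative)
```

Because the weights come from the frozen source model, only the predictions of the trainable target branch change while this loss is minimized.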

4 Experiments

We perform a thorough evaluation of the proposed universal source-free domain adaptation framework against prior state-of-the-art methods across multiple datasets. We also provide a comprehensive ablation study to establish generalizability of the approach across a variety of label-set relationships and justification of the various model components.

4.1 Experimental Setup

a) Datasets. We resort to the experimental settings followed by [52] (UAN). Office-Home [47] dataset consists of images from 4 different domains - Artistic (Ar), Clip-art (Cl), Product (Pr) and Real-world (Rw). VisDA2017 [34] dataset comprises 12 categories with synthetic (S) and real (R) domains. Office-31 [37] dataset contains images from 3 distinct domains - Amazon (A), DSLR (D) and Webcam (W). To evaluate scalability, we use ImageNet-Caltech with 84 common classes (following [52]).

b) Simulation of labeled negative samples. To simulate negative samples for training in the Procurement stage, we first sample a pair of images, each from different categories of $\mathcal{C}_s$, to create unique negative classes in $\mathcal{C}_n$. Note that we impose no restriction on how the hypothetical classes are created (e.g. one can composite non-animal with animal). A random mask is defined which splits the images into two complementary regions using a quadratic spline passing through a central image region (see Suppl. Algo. 1). Then, the negative image is created by merging alternate mask regions as shown in Fig. 4A. For the I→C task of ImageNet-Caltech, the source domain ImageNet (I), having 1000 classes, results in a large number of possible negative classes (i.e. $\binom{1000}{2}$). We address this by randomly selecting only 600 of these negative classes for ImageNet (I), and 200 negative classes for Caltech (C) in the task C→I.
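The masking step can be sketched roughly as below. This is not the procedure of Suppl. Algo. 1: it assumes a single quadratic curve through three random control points (with the middle one kept in the central region) instead of a proper spline, images stored as nested lists of equal size with at least three rows, and the hypothetical helper names `quadratic_through` and `composite`.

```python
import random


def quadratic_through(p0, p1, p2):
    """Lagrange quadratic x = f(y) through three (y, x) control points."""
    (y0, x0), (y1, x1), (y2, x2) = p0, p1, p2

    def f(y):
        return (x0 * (y - y1) * (y - y2) / ((y0 - y1) * (y0 - y2))
                + x1 * (y - y0) * (y - y2) / ((y1 - y0) * (y1 - y2))
                + x2 * (y - y0) * (y - y1) / ((y2 - y0) * (y2 - y1)))

    return f


def composite(img_a, img_b, rng):
    """Merge two equally sized images along a random quadratic boundary."""
    h, w = len(img_a), len(img_a[0])
    # Random end points; the middle point is constrained to the central region.
    boundary = quadratic_through(
        (0, rng.uniform(0, w)),
        (h / 2, rng.uniform(0.4 * w, 0.6 * w)),
        (h - 1, rng.uniform(0, w)),
    )
    out = []
    for y in range(h):
        cut = boundary(y)
        # Pixels left of the boundary come from img_a, the rest from img_b.
        out.append([img_a[y][x] if x < cut else img_b[y][x] for x in range(w)])
    return out


rng = random.Random(0)
a = [["A"] * 8 for _ in range(8)]  # stand-in for an image of one source class
b = [["B"] * 8 for _ in range(8)]  # stand-in for an image of another source class
negative = composite(a, b, rng)
for row in negative:
    print("".join(row))
```

Constraining the middle control point keeps both source images visible in the composite, so the negative sample carries local parts from each class.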

| Method | SF | Ar→Cl | Ar→Pr | Ar→Rw | Cl→Ar | Cl→Pr | Cl→Rw | Pr→Ar | Pr→Cl | Pr→Rw | Rw→Ar | Rw→Cl | Rw→Pr | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet [13] | ✗ | 59.37 | 76.58 | 87.48 | 69.86 | 71.11 | 81.66 | 73.72 | 56.30 | 86.07 | 78.68 | 59.22 | 78.59 | 73.22 |
| IWAN [54] | ✗ | 52.55 | 81.40 | 86.51 | 70.58 | 70.99 | 85.29 | 74.88 | 57.33 | 85.07 | 77.48 | 59.65 | 78.91 | 73.39 |
| PADA [54] | ✗ | 39.58 | 69.37 | 76.26 | 62.57 | 67.39 | 77.47 | 48.39 | 35.79 | 79.60 | 75.94 | 44.50 | 78.10 | 62.91 |
| ATI [33] | ✗ | 52.90 | 80.37 | 85.91 | 71.08 | 72.41 | 84.39 | 74.28 | 57.84 | 85.61 | 76.06 | 60.17 | 78.42 | 73.29 |
| OSBP [39] | ✗ | 47.75 | 60.90 | 76.78 | 59.23 | 61.58 | 74.33 | 61.67 | 44.50 | 79.31 | 70.59 | 54.95 | 75.18 | 63.90 |
| UAN [52] | ✗ | 63.00 | 82.83 | 87.85 | 76.88 | 78.70 | 85.36 | 78.22 | 58.59 | 86.80 | 83.37 | 63.17 | 79.43 | 77.02 |
| Ours USFDA | ✓ | 63.35 | 83.30 | 89.35 | 70.96 | 72.34 | 86.09 | 78.53 | 60.15 | 87.35 | 81.56 | 63.17 | 88.23 | 77.03 |
Table 1: Average per-class accuracy () for universal-DA tasks on Office-Home dataset (with ). Scores for the prior works are directly taken from UAN [52]. Here, SF denotes support for source-free adaptation.
Figure 5: Ablative analysis on the task A→D (Office-31). A) Histogram of SSM values of $x_t$ separately for target-private and target-shared samples at the Procurement iteration 100 (top) and 500 (bottom). B) The sensitivity curve for $\beta$ shows marginally stable adaptation accuracy for a wide-range of values. C) A marginal increase in $T_{unk}$ is observed with increase in $|\mathcal{C}_n|$.

4.2 Evaluation Methodology

a) Average accuracy on Target dataset, $T_{avg}$. We resort to the evaluation protocol proposed in the VisDA2018 Open-Set Classification challenge. Accordingly, all the target-private classes are grouped into a single "unknown" class and the metric reports the average of per-class accuracy over the resulting classes. In our framework, a target sample is marked as "unknown" if it is classified (by the arg-max over the predicted class probabilities) into any of the negative classes. In contrast, UAN [52] relies on a sample-level weight to mark a target sample as "unknown" based on a sensitive threshold hyperparameter. Also note that our method is truly source-free during adaptation, while all other methods have access to the full source-data.

b) Accuracy on Target-Unknown data, $T_{unk}$. We evaluate the target unknown accuracy, $T_{unk}$, as the proportion of actual target-private samples (i.e. those with labels in $\overline{\mathcal{C}}_t$) being classified as "unknown" after adaptation. Note that UAN [52] does not report $T_{unk}$, which is a crucial metric to evaluate the vulnerability of the model after its deployment in the target environment. The $T_{avg}$ metric fails to capture this as a result of class-imbalance in the Open-set scenario [39]. Hence, to realize a common evaluation ground, we train the UAN implementation provided by the authors [52] and denote it as UAN* in further sections of this paper. We observe that the UAN [52] training algorithm is often unstable with a decreasing trend of $T_{avg}$ and $T_{unk}$ over increasing training iterations. We thus report the mean and standard deviation of the peak values of $T_{avg}$ and $T_{unk}$ achieved by UAN*, over 5 separate runs on Office-31 dataset (see Table 2).
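The two metrics can be computed along the following lines. The sketch assumes integer class ids, with positive source classes numbered from 0 to `num_positive - 1` and every larger predicted id treated as a negative class; it also averages only over classes that actually occur in the target data. These conventions and the function name `target_metrics` are illustrative assumptions.

```python
def target_metrics(true_labels, pred_labels, shared_classes, num_positive):
    """Return (average per-class accuracy, unknown accuracy) on target data."""
    unknown = "unknown"
    # Ground truth: any label outside the shared set is target-private.
    truth = [t if t in shared_classes else unknown for t in true_labels]
    # Prediction: any negative-class index (>= num_positive) counts as unknown.
    preds = [unknown if p >= num_positive else p for p in pred_labels]
    hits, totals = {}, {}
    for t, p in zip(truth, preds):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    per_class = {c: hits[c] / totals[c] for c in totals}
    t_avg = sum(per_class.values()) / len(per_class)
    t_unk = per_class.get(unknown, 0.0)
    return t_avg, t_unk


# Shared classes 0 and 1, source-private class 2, target-private classes 7 and 8.
true = [0, 0, 1, 1, 7, 7, 8, 8]
pred = [0, 2, 1, 1, 5, 0, 4, 3]
t_avg, t_unk = target_metrics(true, pred, shared_classes={0, 1}, num_positive=3)
print(t_avg, t_unk)
```

Reporting both numbers matters because a model can keep a reasonable average while rarely flagging target-private samples, which the second metric exposes directly.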

c) Implementation Details. We implement our network in PyTorch and use ResNet-50 [13] as the backbone-model $M$, pre-trained on ImageNet [36] in line with UAN [52]. The complete architecture of other components is provided in the Supplementary. We denote our approach as USFDA. A sensitivity analysis of the major hyper-parameters used in the proposed framework is provided in Fig. 5B-C, and Suppl. Fig. 2B. In all our ablations across the datasets, we fix the hyperparameter values as and . We utilize Adam optimizer [18] with a fixed learning rate of for training in both the Procurement and the Deployment stages. For the implementation of UAN*, we use the hyper-parameter value , as specified by the authors for the task A→D in the Office-31 dataset.

4.3 Discussion

a) Comparison against prior arts. We compare our approach with UAN [52], and other prior methods. The results are presented in Tables 1-2. Our approach yields state-of-the-art results even in a source-free setting on several tasks. Particularly in Table 2, we present $T_{unk}$ on various datasets and also report the mean and standard-deviation for both the accuracy metrics computed over 5 random initializations in the Office-31 dataset (the last six rows). Our method is able to achieve much higher $T_{unk}$ than UAN* [52], highlighting our superiority as a result of the novel learning approach incorporated in both Procurement and Deployment stages. We also perform a characteristic comparison of algorithm complexity in terms of the number of learnable parameters and training time; a) Procurement: [11.1M, 380s], b) Deployment: [3.5M, 44s], c) UAN [52]: [26.7M, 450s] (in a consistent setting). The significant computational advantage in the Deployment stage makes our approach highly suitable for real-time adaptation. In contrast to UAN, the proposed framework offers a much simpler adaptation algorithm devoid of networks such as an adversarial discriminator and additional finetuning of the ResNet-50 backbone.

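| Metric | Method | SF | Office-31 A→W | Office-31 D→W | Office-31 W→D | Office-31 A→D | Office-31 D→A | Office-31 W→A | Office-31 Avg | VisDA S→R | ImNet-Caltech I→C | ImNet-Caltech C→I |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $T_{avg}$ | ResNet [13] | ✗ | 75.94 | 89.60 | 90.91 | 80.45 | 78.83 | 81.42 | 82.86 | 52.80 | 70.28 | 65.14 |
| $T_{avg}$ | IWAN [54] | ✗ | 85.25 | 90.09 | 90.00 | 84.27 | 84.22 | 86.25 | 86.68 | 58.72 | 72.19 | 66.48 |
| $T_{avg}$ | PADA [54] | ✗ | 85.37 | 79.26 | 90.91 | 81.68 | 55.32 | 82.61 | 79.19 | 44.98 | 65.47 | 58.73 |
| $T_{avg}$ | ATI [33] | ✗ | 79.38 | 92.60 | 90.08 | 84.40 | 78.85 | 81.57 | 84.48 | 54.81 | 71.59 | 67.36 |
| $T_{avg}$ | OSBP [39] | ✗ | 66.13 | 73.57 | 85.62 | 72.92 | 47.35 | 60.48 | 67.68 | 30.26 | 62.08 | 55.48 |
| $T_{avg}$ | UAN [52] | ✗ | 85.62 | 94.77 | 97.99 | 86.50 | 85.45 | 85.12 | 89.24 | 60.83 | 75.28 | 70.17 |
| $T_{avg}$ | UAN* | ✗ | 83.00 ± 1.8 | 94.17 ± 0.3 | 95.40 ± 0.5 | 83.43 ± 0.7 | 86.90 ± 1.0 | 87.18 ± 0.6 | 88.34 | 54.21 | 74.77 | 71.51 |
| $T_{avg}$ | Ours USFDA | ✓ | 85.56 ± 1.6 | 95.20 ± 0.3 | 97.79 ± 0.1 | 88.47 ± 0.3 | 87.50 ± 0.9 | 86.61 ± 0.6 | 90.18 | 63.92 | 76.85 | 72.13 |
| $T_{unk}$ | UAN* | ✗ | 20.72 ± 11.7 | 53.53 ± 2.4 | 51.57 ± 5.0 | 34.43 ± 3.3 | 51.88 ± 4.8 | 43.11 ± 1.3 | 42.54 | 19.68 | 33.43 | 31.24 |
| $T_{unk}$ | Ours USFDA | ✓ | 73.98 ± 7.5 | 85.64 ± 2.2 | 80.00 ± 1.1 | 82.23 ± 2.7 | 78.59 ± 3.2 | 75.52 ± 1.5 | 79.32 | 36.25 | 51.21 | 48.76 |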
Table 2: on Office-31 (with ), VisDA (with ), and ImageNet-Caltech (with ). Scores for the prior works are directly taken from UAN [52]. SF denotes support for source-free adaptation.
Figure 6: Comparison across varied label-set relationships for the task A→D in Office-31 dataset. A) Visual representation of label-set relationships and $T_{avg}$ at the corresponding instances for B) UAN* [52] and C) our source-free model. Effectively, the direction along x-axis (blue horizontal arrow) characterizes increasing Open-set complexity. The direction along y-axis (red vertical arrow) shows increasing complexity of Partial DA scenario. The pink diagonal arrow denotes the effect of decreasing shared label space.

b) Does SSM satisfy the expected inequality? The effectiveness of the proposed learning algorithm in source-free deployment relies on the formulation of SSM, which is expected to satisfy Eq. 1. Fig. 5A shows histograms of the SSM, plotted separately for samples from the target-shared (blue) and target-private (red) label spaces. The success of this metric is attributed to the generative nature of the Procurement stage, which enables the source model to distinguish the marginally more negative target-private samples from the samples of the shared label space.
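The kind of check behind Fig. 5A can be sketched as follows: split per-sample SSM values by whether the sample belongs to the shared or the private label space, histogram each group, and compare the group means. This is a minimal illustration under stated assumptions: the scores and flags are made-up placeholders, the [0, 1] score range is assumed, and comparing group means is only a simple proxy for the inequality in Eq. 1.

```python
def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Count values into equal-width bins over [lo, hi] (range is an assumption)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[max(idx, 0)] += 1                    # clamp values below lo into the first bin
    return counts


def split_scores(scores, is_shared):
    """Separate scores into (shared, private) lists using boolean flags."""
    shared = [s for s, flag in zip(scores, is_shared) if flag]
    private = [s for s, flag in zip(scores, is_shared) if not flag]
    return shared, private


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical SSM values and ground-truth flags (not taken from the paper).
    scores = [0.91, 0.85, 0.40, 0.77, 0.15, 0.30, 0.88, 0.22]
    is_shared = [True, True, False, True, False, False, True, False]
    shared, private = split_scores(scores, is_shared)
    print("shared :", histogram(shared, bins=5))
    print("private:", histogram(private, bins=5))
    print("mean(shared) > mean(private):", mean(shared) > mean(private))
```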

c) Sensitivity to hyper-parameters. As we tackle DA in a source-free setting while also intending to generalize across varied category-gaps, a low sensitivity to hyperparameters would further enhance the practical usability of our approach. To this end, we fix certain hyperparameters for all our experiments (also in Fig. 6C), even across datasets. One can thus treat them as global constants, leaving a single hyperparameter to tune, since varying one while fixing the others yields a complementary effect on the regularization in the Procurement stage. A thorough analysis, reported in Suppl. Fig. 2, demonstrates a reasonably low sensitivity of our model to these hyperparameters.

d) Generalization across category-gap. One of the key objectives of the proposed framework is to operate effectively without knowledge of the label-set relationship. To evaluate this in the most compelling manner, we propose the tabular form shown in Fig. 6A. We vary the number of private classes for the target and the source along the x-axis and y-axis, respectively. We compare the metric at the corresponding table instances, shown in Fig. 6B-C. The results clearly highlight the superiority of the proposed framework, specifically for the more practical scenarios (close to the diagonal instances) as compared to the unrealistic Closed-set setting.
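The grid of label-set relationships can be enumerated with a few lines of code. This sketch assumes, for illustration only, that the total number of categories is held constant at the 31 classes of Office-31, so that the shared set shrinks as either private set grows. The step size, the maximum number of private classes, and the scenario names returned by `scenario` are hypothetical choices.

```python
def scenario(src_private, tgt_private):
    """Name the label-set relationship for given private-class counts."""
    if src_private == 0 and tgt_private == 0:
        return "Closed-set"
    if tgt_private == 0:
        return "Partial"       # only the source has private classes
    if src_private == 0:
        return "Open-set"      # only the target has private classes
    return "Mixed"             # both domains have private classes


def label_set_grid(total_classes, max_private, step):
    """Enumerate (source_private, target_private, shared) configurations.

    Assumption for this sketch: the total number of categories is fixed, so
    shared = total_classes - source_private - target_private.
    """
    grid = []
    for src_private in range(0, max_private + 1, step):      # y-axis
        row = []
        for tgt_private in range(0, max_private + 1, step):  # x-axis
            shared = total_classes - src_private - tgt_private
            if shared > 0:
                row.append((src_private, tgt_private, shared))
        grid.append(row)
    return grid


if __name__ == "__main__":
    for row in label_set_grid(total_classes=31, max_private=20, step=10):
        print([(s, t, c, scenario(s, t)) for s, t, c in row])
```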

e) DA in absence of shared categories. In universal adaptation, we seek to transfer the knowledge of the "class-separability criterion" obtained from the source domain to the deployed target environment. More concretely, this refers to the segregation of data samples based on some expected characteristics, such as classifying objects according to their pose, color, or shape. To quantify this, we consider an extreme case in which the source and target share no categories (A→D in Office-31). Allowing access to a single labeled target sample from each target category, we compute a one-shot recognition accuracy (assigning a cluster index or class label by using the one-shot samples as cluster centers). We obtain 64.72% accuracy for the proposed framework, as compared to 13.43% for UAN*. This strongly validates our superior knowledge-transfer capability, which results from the generative classifier with labeled negative samples complementing the target-private categories.
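The one-shot evaluation described above can be sketched as a nearest-center assignment: each target feature receives the label of the closest one-shot sample, and accuracy is the fraction of correct assignments. The feature vectors and class names below are hypothetical placeholders, and the use of Euclidean distance is an assumption, since the distance measure is not specified here.

```python
import math


def nearest_center(feature, centers):
    """Return the label whose one-shot center is closest (Euclidean) to the feature."""
    return min(centers, key=lambda label: math.dist(feature, centers[label]))


def one_shot_accuracy(features, labels, centers):
    """Fraction of samples assigned to their true class via the nearest one-shot center."""
    hits = sum(nearest_center(f, centers) == y for f, y in zip(features, labels))
    return hits / len(features)


if __name__ == "__main__":
    # Hypothetical 2-D features; real features would come from the adapted model.
    centers = {"mug": (0.0, 0.0), "lamp": (4.0, 4.0)}
    features = [(0.5, -0.2), (3.6, 4.1), (2.5, 2.4), (0.1, 0.9)]
    labels = ["mug", "lamp", "mug", "mug"]
    print(one_shot_accuracy(features, labels, centers))
```

In this toy run the third sample lies closer to the "lamp" center than to its true "mug" center, so three of the four samples are assigned correctly.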

f) Dependency on the simulated negative dataset. Conceding that a combinatorial number of negative labels can be created, we evaluate the scalability of the proposed approach by varying the number of negative classes used in the Procurement stage, as reported on the x-axis of Fig. 5C. For one of these settings, marked separately in Fig. 5C, we synthetically generate random negative features at the intermediate feature level, which are at least 3-σ away from each of the positive source priors. We then use these feature samples, along with the positive image samples, to train a Procurement model with a single negative class. The results are reported in Fig. 5C on the A→D task of the Office-31 dataset, with the category relationship in line with the setting in Table 2. We observe an acceptable drop in accuracy as the number of negative classes decreases, which validates the scalability of the approach to large-scale classification datasets (such as ImageNet). Similarly, we also evaluated our framework by combining three or more images to form such negative classes. However, we found that with an increasing number of negative classes, the model under-fits the positive source categories (similar to Fig. 5C, where accuracy reduces beyond a certain limit because of over-regularization).
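One way to generate random features that stay at least 3-σ away from every positive class prior is rejection sampling, sketched below. Several details are assumptions made for illustration and are not taken from the paper: each prior is modeled as a diagonal Gaussian given by a mean and a standard deviation per dimension, distance is the std-normalized Euclidean distance to the mean, and candidates are drawn uniformly from a box `[low, high]` in every dimension.

```python
import random


def is_far_from_all(feature, priors, k=3.0):
    """True if the feature lies at least k standard deviations from every class prior.

    Assumption: each prior is a (mean, std) pair of per-dimension lists, and
    distance is the std-normalized Euclidean distance to the mean.
    """
    for mean, std in priors:
        d2 = sum(((x - m) / s) ** 2 for x, m, s in zip(feature, mean, std))
        if d2 < k * k:
            return False
    return True


def sample_negative_features(priors, n, low, high, k=3.0, seed=0, max_tries=100000):
    """Rejection-sample up to n uniform random features that are far from all priors."""
    rng = random.Random(seed)
    dim = len(priors[0][0])
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        candidate = [rng.uniform(low, high) for _ in range(dim)]
        if is_far_from_all(candidate, priors, k):
            negatives.append(candidate)
    return negatives


if __name__ == "__main__":
    # Two hypothetical 2-D class priors with unit standard deviation.
    priors = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
    negatives = sample_negative_features(priors, n=5, low=-10.0, high=10.0)
    for feature in negatives:
        print([round(x, 2) for x in feature])
```

The accepted samples could then be pooled into a single negative class, in the spirit of the setting described above.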

5 Conclusion

We have introduced a novel Universal Source-Free Domain Adaptation framework that addresses practical domain adaptation scenarios devoid of any assumption on the source-target label-set relationship. In the proposed two-stage framework, learning in the Procurement stage is found to be crucial, as it aims to exploit the knowledge of class-separability in the most general form, with enhanced robustness to out-of-distribution samples. Besides this, the success of the Deployment stage is attributed to the well-designed learning objectives, which effectively utilize the source similarity criterion. This work can serve as a pilot study towards learning efficient inheritable models in the future.

Acknowledgements. This work is supported by a Wipro PhD Fellowship (Jogendra) and a grant from Uchhatar Avishkar Yojana (UAY, IISC_010), MHRD, Govt. of India. We would also like to thank Ujjawal Sharma (IIT Roorkee) for assisting with the implementation of prior arts.

References


  • [1] M. Baktashmotlagh, M. Faraki, T. Drummond, and M. Salzmann (2019) Learning factorized representations for open-set domain adaptation. In ICLR, Cited by: §1.
  • [2] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira (2007) Analysis of representations for domain adaptation. In NeurIPS, Cited by: §1.
  • [3] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, Cited by: §1, §2, §3.1.
  • [4] P. P. Busto and J. Gall (2017) Open set domain adaptation. In ICCV, Cited by: §1.
  • [5] Z. Cao, M. Long, J. Wang, and M. I. Jordan (2018) Partial transfer learning with selective adversarial networks. In CVPR, Cited by: §1, §1, §2, §3.2.
  • [6] Z. Cao, L. Ma, M. Long, and J. Wang (2018) Partial adversarial domain adaptation. In ECCV, Cited by: §1, §2, §3.2.
  • [7] Y. Chen, W. Chen, Y. Chen, B. Tsai, Y. Frank Wang, and M. Sun (2017) No more discrimination: cross city adaptation of road scene segmenters. In ICCV, Cited by: §1.
  • [8] L. Duan, I. W. Tsang, and D. Xu (2012) Domain transfer multiple kernel learning. TPAMI 34 (3), pp. 465–479. Cited by: §2.
  • [9] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §1.
  • [10] Z. Ge, S. Demyanov, Z. Chen, and R. Garnavi (2017) Generative openmax for multi-class open set classification. In BMVC, Cited by: §1.
  • [11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NeurIPS, Cited by: §2.
  • [12] Y. Grandvalet and Y. Bengio (2005) Semi-supervised learning by entropy minimization. In NeurIPS, Cited by: §3.2.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, Cited by: §4.2, Table 1, Table 2.
  • [14] D. Hendrycks, M. Mazeika, and T. Dietterich (2019) Deep anomaly detection with outlier exposure. In ICLR, Cited by: §1.
  • [15] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell (2018) Cycada: cycle-consistent adversarial domain adaptation. In ICLR, Cited by: §2.
  • [16] L. Hu, M. Kan, S. Shan, and X. Chen (2018) Duplex generative adversarial network for unsupervised domain adaptation. In CVPR, Cited by: §2.
  • [17] G. Kang, L. Zheng, Y. Yan, and Y. Yang (2018) Deep adversarial attention alignment for unsupervised domain adaptation: the benefit of target expectation maximization. In ECCV, Cited by: §2.
  • [18] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  • [19] D. P. Kingma and M. Welling (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. Cited by: §3.1.
  • [20] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In NeurIPS, Cited by: §1.
  • [21] A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, B. Freeman, and G. Wornell (2018) Co-regularized alignment for unsupervised domain adaptation. In NeurIPS, Cited by: §1.
  • [22] J. N. Kundu, N. Lakkakula, and R. V. Babu (2019) UM-adapt: unsupervised multi-task adaptation using adversarial cross-task distillation. In ICCV, Cited by: §1.
  • [23] C. H. Lampert, H. Nickisch, and S. Harmeling (2009) Learning to detect unseen object classes by between-class attribute transfer. In CVPR, Cited by: §3.1.
  • [24] K. Lee, H. Lee, K. Lee, and J. Shin (2018) Training confidence-calibrated classifiers for detecting out-of-distribution samples. In ICLR, Cited by: §1.
  • [25] Z. Li and D. Hoiem (2017) Learning without forgetting. TPAMI 40 (12), pp. 2935–2947. Cited by: §1.
  • [26] M. Long, Y. Cao, J. Wang, and M. Jordan (2015) Learning transferable features with deep adaptation networks. In ICML, Cited by: §1, §2.
  • [27] M. Long, Z. Cao, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In NeurIPS, Cited by: §2.
  • [28] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2016) Unsupervised domain adaptation with residual transfer networks. In NeurIPS, Cited by: §1, §2, §3.2.
  • [29] R. G. Lopes, S. Fenu, and T. Starner (2017) Data-free knowledge distillation for deep neural networks. In LLD Workshop at NeurIPS, Cited by: §1.
  • [30] Z. Luo, Y. Zou, J. Hoffman, and L. F. Fei-Fei (2017) Label efficient learning of transferable representations across domains and tasks. In NeurIPS, Cited by: §1.
  • [31] A. Malinin and M. Gales (2018) Predictive uncertainty estimation via prior networks. In NeurIPS, Cited by: §1.
  • [32] J. Nath Kundu, P. Krishna Uppala, A. Pahuja, and R. Venkatesh Babu (2018) Adadepth: unsupervised content congruent adaptation for depth estimation. In CVPR, Cited by: §1, §2.
  • [33] P. Panareda Busto and J. Gall (2017) Open set domain adaptation. In ICCV, Cited by: §2, Table 1, Table 2.
  • [34] X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko (2018) Visda: the visual domain adaptation challenge. In CVPR workshops, Cited by: §4.1.
  • [35] S. Ren, K. He, R. Girshick, and J. Sun (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In NeurIPS, Cited by: §1.
  • [36] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. IJCV 115 (3), pp. 211–252. Cited by: §1, §3.1, §4.2.
  • [37] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. In ECCV, Cited by: §2, §4.1.
  • [38] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In CVPR, Cited by: §1, §1, §3.1, §3.2.
  • [39] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada (2018) Open set domain adaptation by backpropagation. In ECCV, Cited by: §1, §1, §2, §4.2, Table 1, Table 2.
  • [40] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training gans. In NeurIPS, Cited by: §1, §3.1.
  • [41] S. Sankaranarayanan, Y. Balaji, C. D. Castillo, and R. Chellappa (2018) Generate to adapt: aligning domains using generative adversarial networks. In CVPR, Cited by: §1, §2, §3.1.
  • [42] A. Shafaei, M. Schmidt, and J. Little (2019) A Less Biased Evaluation of Out-of-distribution Sample Detectors. In BMVC, Cited by: §1.
  • [43] H. Shimodaira (2000) Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference 90 (2), pp. 227–244. Cited by: §1.
  • [44] R. Shu, H. Bui, H. Narui, and S. Ermon (2018) A DIRT-t approach to unsupervised domain adaptation. In ICLR, Cited by: §3.1.
  • [45] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko (2015) Simultaneous deep transfer across domains and tasks. In ICCV, Cited by: §1.
  • [46] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In CVPR, Cited by: §1, §1, §1, §1, §2, §3.1, §3.1, §3.2.
  • [47] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan (2017) Deep hashing network for unsupervised domain adaptation. In CVPR, Cited by: §4.1.
  • [48] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. (2016) Matching networks for one shot learning. In NeurIPS, Cited by: §3.1.
  • [49] X. Wang and J. Schneider (2014) Flexible transfer learning under support and model shift. In NeurIPS, Cited by: §2.
  • [50] Z. Wang, Z. Dai, B. Póczos, and J. Carbonell (2019) Characterizing and avoiding negative transfer. In CVPR, Cited by: §2.
  • [51] A. Wu, W. Zheng, X. Guo, and J. Lai (2019) Distilled person re-identification: towards a more scalable system. In CVPR, Cited by: §1.
  • [52] K. You, M. Long, Z. Cao, J. Wang, and M. I. Jordan (2019) Universal domain adaptation. In CVPR, Cited by: §2, §3.2, §3, Figure 6, §4.1, §4.2, §4.2, §4.2, §4.3, Table 1, Table 2.
  • [53] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz (2018) Mixup: beyond empirical risk minimization. In ICLR, Cited by: §3.1.
  • [54] J. Zhang, Z. Ding, W. Li, and P. Ogunbona (2018) Importance weighted adversarial nets for partial domain adaptation. In CVPR, Cited by: §1, §2, Table 1, Table 2.
  • [55] K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang (2013) Domain adaptation under target and conditional shift. In ICML, Cited by: §2.
  • [56] W. Zhang, W. Ouyang, W. Li, and D. Xu (2018) Collaborative and adversarial network for unsupervised domain adaptation. In CVPR, Cited by: §1, §1.
  • [57] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, Cited by: §2.