ProxyMix: Proxy-based Mixup Training with Label Refinery for Source-Free Domain Adaptation

by   Yuhe Ding, et al.

Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Owing to privacy concerns and heavy data transmission, source-free UDA, which exploits the pre-trained source model instead of the raw source data for target learning, has been gaining popularity in recent years. Some works attempt to recover the unseen source domain with generative models, which, however, introduces additional network parameters. Other works fine-tune the source model with pseudo labels, but noisy pseudo labels may misguide the decision boundary, leading to unsatisfactory results. To tackle these issues, we propose an effective method named Proxy-based Mixup training with label refinery (ProxyMix). First of all, to avoid additional parameters and explore the information in the source model, ProxyMix defines the weights of the classifier as the class prototypes and then constructs a class-balanced proxy source domain from the nearest neighbors of the prototypes to bridge the unseen source domain and the target domain. To improve the reliability of pseudo labels, we further propose a frequency-weighted aggregation strategy to generate soft pseudo labels for unlabeled target data. The proposed strategy exploits the internal structure of target features, pulls target features towards their semantic neighbors, and increases the weights of samples from low-frequency classes during gradient updating. With the proxy domain and the reliable pseudo labels, we employ two kinds of mixup regularization, i.e., inter- and intra-domain mixup, to align the proxy and the target domain and to enforce the consistency of predictions, thereby further mitigating the negative impact of noisy labels. Experiments on three 2D image benchmarks and one 3D point cloud object recognition benchmark demonstrate that ProxyMix yields state-of-the-art performance for source-free UDA tasks.



I Introduction

Fig. 1: The motivation of ProxyMix, which aligns the unseen source domain and target domain by two aspects: 1) aligning the proxy and target domain; and 2) refining the pseudo labels.

The standard practice in the deep learning era, learning with massively labeled data, becomes expensive and laborious in many real-world scenarios. Besides, the learned models often generalize poorly to new unlabeled domains due to the domain discrepancy [1]. Hence, considerable efforts are devoted to unsupervised domain adaptation (UDA) [10, 24, 12, 31], which aims to transfer knowledge from a labeled source dataset to an unlabeled target dataset. In recent years, UDA methods have been widely explored in various tasks such as image classification [12] and semantic segmentation [53]. The key problem of UDA is to alleviate the gap across different domains. Prior UDA methods mainly fall into three paradigms. The first paradigm aims to pull the statistical moments of different feature distributions closer [72, 6], and the second paradigm introduces adversarial training with additional discriminators [12, 54]. The last paradigm adopts various regularizations on the target network outputs, such as self-training or entropy-related objectives [76, 9]. Despite the impressive progress, the source data is always necessary during domain alignment, which might raise data privacy concerns nowadays.

This practical demand directly motivates a novel UDA setting named source-free domain adaptation (SFDA) [26, 23], where only the well-trained source model, instead of the well-annotated source dataset, is provided to the target domain. Existing SFDA methods are either generation-based or pseudo label-based. The generation-based methods [23, 51, 41] introduce extra generative modules to recover the unseen source domain at the image level or the feature level, and then address the problem from a UDA perspective. Nevertheless, generative modules introduce additional parameters, and the recovered virtual source domain usually suffers from mode collapse, which results in low-quality images or features. The pseudo label-based methods [41, 28, 60, 16] label the target samples based on the present model's predictions or feature structure. However, due to the extreme domain shift, noisy labels are inescapable and result in inaccurate decision boundaries.

Fig. 2: The accuracies per task of proxy source domain on Office-home.

To address the issues above (additional parameters and noisy labels), we propose a new and effective method called Proxy-based Mixup training with label refinery (ProxyMix) to deal with the source-free domain adaptation problem. To bridge the gap between the unseen source domain and the target domain while avoiding extra parameters, we first select a portion of source-similar samples from the target domain, rather than synthesizing virtual images, to construct a proxy source domain. Specifically, we define the weights of the source classifier as the class prototypes [36], then select the nearest neighbors of each class prototype in angle space to construct the proxy source domain. Prior methods that build a proxy source domain primarily employ an entropy criterion [28, 11], which selects samples with lower entropy for each class from pseudo-labeled target data. In practice, as shown in Fig. 2, we observe that the mean accuracy of our angle-induced proxy source domain is clearly higher than that obtained with the entropy criterion. Another significant benefit is that our pseudo labels are determined by the corresponding prototype, rather than by the predictions of the source model, allowing us to create a class-balanced proxy source domain.

To improve the reliability of pseudo labels, we propose a frequency-weighted aggregation pseudo-labeling strategy (FA) as the pseudo label refinery. FA includes three operations applied to the predictions: sharpening, re-weighting, and aggregation. Specifically, to avoid ambiguity, we first sharpen the predictions of the classifier. At the same time, we take the frequency of each class into account and re-weight the probability of each class, to increase the contribution of low-frequency classes and avoid bias towards majority and easy classes in the target domain during gradient updating. Then we introduce a non-parametric neighborhood aggregation strategy to pull the unlabeled target features close to their semantic neighbors, aiming to reduce the impact of outlier noisy labels and compact the semantic clusters.

With the proxy source domain, we tackle the challenging SFDA problem in a semi-supervised style with the aid of refined pseudo labels. To align the proxy and target domains while alleviating the negative consequences of noisy labels, two mixup regularizations [73, 3, 2, 4], i.e., inter-domain and intra-domain mixup, are incorporated into our framework. They enforce the model to maintain consistency, thus improving its robustness against noisy labels. As illustrated in Fig. 1, the FA strategy refines the pseudo labels and compacts the feature clusters, while the mixup training aligns the two domains, yielding clear decision boundaries.

To summarize, the main contributions of this work are threefold:

  • We propose a simple yet effective method, ProxyMix, for source-free domain adaptation, which aims to discover a proxy source domain and utilize mixup training to implicitly bridge the gap between the target domain and the unseen source domain.

  • To obtain a reliable proxy source domain, we exploit the network weights of the source model and select source-like samples from the target domain in an efficient and accurate way.

  • To refine the noisy pseudo labels during alignment, we further propose a new frequency-weighted aggregation strategy, compacting the target feature clusters and avoiding bias to majority and easy classes.

We conduct ablation studies to verify the contribution and effectiveness of both the proxy source domain construction and the pseudo label refinery. Extensive results on four datasets further validate that ProxyMix yields comparable or superior performance to the state-of-the-art SFDA methods.

II Related Work

II-A Unsupervised Domain Adaptation (UDA)

UDA aims to transfer knowledge from a label-rich source domain to an unlabeled target domain. UDA problems can be classified into four cases according to the relationship between the source and target domain, i.e., closed-set [44], partial-set [5], open-set [35], and universal [71]. As a typical example of transfer learning, UDA provides methods to bridge domain gaps for various applications such as object recognition [30, 12, 10, 22, 24, 62] and semantic segmentation [53, 76]. The most prevailing paradigm for UDA is to extract domain-invariant features to align different domains while preserving the category information from the labeled source domain. Roughly speaking, existing feature-level domain alignment methods fall into two categories. The first line [12, 54, 31] aligns representations by fooling a domain discriminator through adversarial training, while the second line [30, 48] directly minimizes different discrepancy metrics (e.g., statistical moments) to match the feature distributions. Besides, another line [15] focuses on image space alignment and converts the target image into a source-style image (and vice versa). By contrast, output-level regularization methods [9, 17] achieve implicit domain alignment by forcing the target outputs to be diverse one-hot encodings.

[27] proposes an auxiliary classifier for target data to obtain high-quality pseudo labels, and [29] introduces cycle self-training, which uses target pseudo labels to train another head and enforces it to perform well on the source domain. [63, 59] are the two most closely related works, which introduce mixup training into adversarial UDA. However, our method does not require access to source data and develops a new pseudo label refinery strategy instead of focusing on the mixing manner.

II-B Source-free Domain Adaptation (SFDA)

SFDA aims to tackle the domain adaptation problem without accessing the raw source data. Before the deep learning era, a number of transfer learning works [68, 52, 18, 8, 25] without source data were already empirically successful. In recent years, pioneering works [26, 23] discover that the well-trained source model conceals sufficient source knowledge for the following target adaptation stage, and [26] provides a clear definition of this problem. The last two years have witnessed an increasing number of SFDA approaches [41, 28, 60, 16], most of which are generation-based [23, 51, 41] or self-training-based [26, 69] methods. Generation-based methods [51, 41, 23, 66, 11] generate virtual high-level features of the source domain to bridge the unseen source and target distributions. Self-training-based methods seek to refine the source model with self-supervised techniques, among which pseudo-labeling [26, 69] is the most extensively employed. [60, 16] learn from target samples by distinct variants of contrastive learning. [69] mines hidden structure information, such as neighbor features, to obtain the pseudo labels. However, generating source samples usually introduces additional modules such as generators or discriminators, while pseudo-labeling might produce wrong labels due to domain shift; both harm the adaptation procedure. Another practice [66, 11, 28] is to select part of the target data as a pseudo source domain to compensate for the unseen source domain. A typical method is the entropy criterion [28], which constructs the pseudo source domain by estimating a split ratio from the mean and maximum entropy of the target dataset, and then uses the split ratio to choose the lower-entropy samples within each class of the pseudo-labeled target data. The entropy criterion provides a proxy source domain with a huge number of samples. However, because of hard classes and domain shift, it suffers from a severe class-imbalance problem. Although [11] attempts to tackle this problem by simply choosing the same number of samples for each class, some hard classes contain no data, so the class-imbalance problem is unavoidable. Unlike the previous works, our method builds the proxy source domain directly from the target domain using the source classifier weights, which is flexible and works well for SFDA. Besides, our mixup training strategy is also different from theirs: it transfers the label information from the proxy source to the unlabeled target domain.

II-C Semi-Supervised Learning (SSL)

SSL aims to combine supervised and unsupervised learning, leveraging a vast amount of unlabeled data together with limited labeled data to improve the performance of the classifier in scenarios where labeled data is scarce [55]. In contrast to domain adaptation, SSL assumes that the labeled and unlabeled samples come from the same domain. SSL has flourished in recent years [57, 40, 21]. Temporal ensembling [19] introduces self-ensembling, forming a consensus prediction of the unknown labels from the outputs of the network-in-training at different epochs; MixMatch [3] proposes a holistic approach that guesses labels for data-augmented unlabeled examples and mixes labeled and unlabeled data using mixup; ReMixMatch [2] aligns the distributions of labeled and unlabeled data; FixMatch [47] demonstrates the strong performance of consistency regularization combined with pseudo labels; AdaMatch [4] proposes a unified approach to unsupervised domain adaptation, semi-supervised learning, and semi-supervised domain adaptation. Existing methods demonstrate the usefulness of mixup training in aligning distributions, and the growing popularity of SSL motivates us to convert the SFDA problem into an SSL problem. However, such methods rely on true labels, which provide strong and diverse supervision but are not available in our task. Our data is pseudo-labeled, with little diversity and a lot of noise, so these semi-supervised learning approaches cannot be directly applied to our problem.

Fig. 3: Overview of ProxyMix on solving source-free domain adaptation. We treat the weights of the classifier as class prototypes to choose a series of confident samples to construct a class-balanced proxy source domain. Then the proxy source samples participate in two types of mixup training based on the proposed frequency-weighted soft label.

III Methodology

This paper mainly follows the problem definition of SHOT [26] and focuses on a multi-class visual classification task. We aim to learn a target model that predicts the label of each input target image, given only the unlabeled target data and the well-trained source model. The model consists of two modules: the feature extractor and the classifier.

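Following the standard paradigm of SFDA [26], as a preliminary, we train the source model by minimizing the cross-entropy loss with the label smoothing [34] technique. In this loss, the smoothed label is a convex combination of the one-hot encoding of the source label and the uniform distribution over the classes, weighted by the smoothing parameter, and the prediction is the soft-max output of the network's logit vector, whose dimension equals the number of classes.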
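The label-smoothed cross-entropy described above can be illustrated for a single sample with a minimal plain-Python sketch. The function names and the default smoothing value of 0.1 are assumptions made for illustration; the text only states that a smoothing parameter is used.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_label(label, num_classes, alpha=0.1):
    """Mix the one-hot encoding of `label` with the uniform distribution.

    `alpha` is the smoothing parameter (0.1 is an assumed default).
    """
    return [(1.0 - alpha) * (1.0 if k == label else 0.0) + alpha / num_classes
            for k in range(num_classes)]


def smoothed_cross_entropy(logits, label, alpha=0.1):
    """Cross entropy between the smoothed label and the soft-max prediction."""
    probs = softmax(logits)
    target = smooth_label(label, len(logits), alpha)
    return -sum(t * math.log(p) for t, p in zip(target, probs))
```

With `alpha=0.0` the function reduces to the ordinary cross-entropy with a one-hot label, which makes the effect of smoothing easy to compare.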


During adaptation, we directly initialize the target model with the well-trained source model, then freeze the classifier and fine-tune the feature extractor, so that the target features are implicitly aligned with the unseen source features under the same hypothesis. It is worth noting that, for simplicity and generality, we do not adopt the special normalization design of SHOT [26].

III-A Proxy Source Domain Construction by Prototypes

Recently, semi-supervised learning approaches [3, 2] have also shown impressive achievements on the UDA problem, and Rukhovich et al. [42] even won the VisDA competition in 2019 by directly exploiting MixMatch [3]. Inspired by them, we construct the proxy source domain by pseudo-labeling a portion of confident samples (source-similar samples), and try to solve the SFDA task in a semi-supervised style. Since the source data is unavailable, we mine the source information from the source model. Previous works [50, 70] leverage the weights of the classifier as class prototypes in other fields and obtain positive results. Another classical work [36] shows that the weight vectors of a well-trained last-layer classifier converge to a high-dimensional geometric structure that maximally separates the pair-wise angles of all classes in the classifier. Inspired by these works, it is natural to select the nearest neighbors of the classifier's weights in angle space to construct the proxy source domain. Concretely, we first define the weight vectors of the classifier as the class prototypes, one per category. We then use each class prototype as a cluster centroid to search for, and pseudo-label, its nearest samples in the unlabeled target domain, which together form the proxy source domain.


For each class, we choose the samples with minimum distance to its prototype; the number of samples selected per class is a hyper-parameter. To prevent the negative consequences caused by class imbalance, we select the same number of samples for each class. The distance between a target feature and a class prototype is measured in angle space, and we use the cosine similarity by default. For these proxy source data, we directly calculate the cross-entropy loss with label smoothing, in which the target is the smoothed version of the one-hot encoding of the pseudo label.
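The prototype-based selection can be sketched as follows, assuming the target features and the classifier weight vectors are given as plain Python lists of non-zero vectors. The names `build_proxy_source` and `per_class` are hypothetical, and the sketch does not treat the case where one sample is selected for two classes, which the text does not discuss.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_proxy_source(features, prototypes, per_class):
    """Select, for every class, the `per_class` target samples closest in angle.

    features   : list of target feature vectors
    prototypes : list of classifier weight vectors, one per class
    Returns a list of (sample_index, pseudo_label) pairs; every class
    contributes the same number of samples, so the result is class-balanced.
    """
    proxy = []
    for k, prototype in enumerate(prototypes):
        # Rank all target samples by similarity to the k-th prototype.
        ranked = sorted(range(len(features)),
                        key=lambda i: cosine_similarity(features[i], prototype),
                        reverse=True)
        # The pseudo label is the class of the prototype, not a model prediction.
        proxy.extend((i, k) for i in ranked[:per_class])
    return proxy
```

Because the label comes from the prototype that selected the sample, each class receives exactly `per_class` entries regardless of how the source model's predictions are distributed.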

III-B Pseudo-labeling by Frequency-weighted Aggregation (FA)

Pseudo-labeling is a heuristic approach to semi-supervised learning, which progressively treats the predictions on unlabeled data as true labels, and often employs the cross-entropy loss during training. However, in an unsupervised learning setting, the class distribution is unknown, and the model is biased towards easy classes. To mitigate the imbalance and sensitivity of pseudo labels, inspired by several classical works [27, 61], we propose a new pseudo label refinery strategy to obtain reliable soft pseudo labels in the presence of domain shift. Specifically, we adjust the class distribution of the prediction to alleviate the class imbalance, and then we use the center of the semantic neighbors as the pseudo label, rather than depending on a single prediction. This compacts the cluster by pulling the unlabeled target features closer to their semantic neighbors, resulting in a clear classification boundary. Note that hard labels reinforce the confidence of the current model while losing some information. Hence we use the soft predictions rather than one-hot vectors as the pseudo labels, which provide more distribution information and reduce the negative effect of corrupted one-hot labels.

Fig. 4: Illustration of the frequency-weighted strategy as label refinery. We first sharpen the predictions to the second power, and then normalize the predictions by the frequency per class.

Neighborhood Aggregation. To leverage the local data structure, we employ the neighborhood aggregation strategy, which is based on the idea of message passing via neighbors, to adjust the predictions of the input target data. Concretely, we construct a large memory bank to store both the features and the predictions of target data. During pseudo-labeling, for each sample in the current mini-batch, we retrieve its nearest neighbors from the memory bank according to the features, and calculate the soft label of the sample by aggregating the predictions of these feature-level neighbors.

The aggregation runs over the index set of the neighbors of the sample and uses the frequency-weighted predictions of these neighbors stored in the bank. We next explain how these predictions are obtained.
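A minimal sketch of the neighborhood aggregation, assuming that the aggregation is a plain average and that neighbors are ranked by cosine similarity (the text only states that the predictions of feature-level neighbors are aggregated). The memory bank is represented by two parallel lists, and all names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def aggregate_soft_label(query_feature, bank_features, bank_predictions,
                         num_neighbors):
    """Average the stored predictions of the nearest entries in the bank.

    bank_features    : stored feature vectors of target samples
    bank_predictions : stored (frequency-weighted) predictions, same order
    """
    # Rank bank entries by similarity to the query feature.
    ranked = sorted(range(len(bank_features)),
                    key=lambda j: cosine_similarity(query_feature,
                                                    bank_features[j]),
                    reverse=True)
    neighbors = ranked[:num_neighbors]
    num_classes = len(bank_predictions[0])
    # Class-wise mean over the selected neighbors (assumed aggregation rule).
    return [sum(bank_predictions[j][k] for j in neighbors) / len(neighbors)
            for k in range(num_classes)]
```

Since every stored prediction sums to one, their average is again a valid probability vector and can be used directly as a soft label.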

Frequency-weighted prediction. As illustrated in Fig. 4, to avoid ambiguity, we first sharpen the output predictions. Moreover, because of class imbalance, the network is empirically skewed towards the majority classes, so we further multiply the predictions by a weight based on the frequency of each class. Specifically, given the soft-max output predictions, the frequency-weighted predictions are obtained by raising each class probability to the second power and normalizing it by the frequency of that class, where the soft cluster frequencies are calculated from the current batch of samples. Through this operation, we expect to achieve class balance in the predictions. At each iteration, we update the features and predictions associated with the data in the corresponding locations in the memory bank.
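The frequency-weighted refinement can be sketched as below. Two details are assumptions that are consistent with Fig. 4 but not spelled out in the text: the soft cluster frequency of a class is taken as the sum of its predicted probabilities over the batch, and each refined prediction is renormalized to sum to one. The function name is hypothetical.

```python
def frequency_weighted(batch_probs):
    """Sharpen predictions and down-weight frequent classes.

    batch_probs : list of soft-max outputs (strictly positive), one per sample
    """
    num_classes = len(batch_probs[0])
    # Soft frequency of class k: total probability mass it receives in the batch.
    freq = [sum(p[k] for p in batch_probs) for k in range(num_classes)]
    refined = []
    for p in batch_probs:
        # Square each probability (sharpening), then divide by the class frequency.
        weighted = [p[k] ** 2 / freq[k] for k in range(num_classes)]
        total = sum(weighted)
        # Renormalize so the refined prediction is a probability vector.
        refined.append([w / total for w in weighted])
    return refined
```

In a batch dominated by one class, a sample that is only mildly confident in that class can have its mass shifted towards the rare class, which is the intended balancing effect.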

III-C Domain Alignment by Mixup Training

Two mixup training procedures are incorporated in our method. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels, regularizing the network to behave linearly in-between training samples. Prior works have demonstrated the effectiveness of mixup training on UDA and SSL tasks [73, 3, 2, 42]. Such a simple regularization can improve generalization and robustness to noisy labels, so it is suitable for pseudo label-based unsupervised learning tasks. Inspired by these methods, with the prototype-induced pseudo source domain and the target domain, we introduce two different regularizations via mixup training.

Inter-domain Mixup. To align the proxy source domain and the target domain, we employ inter-domain mixup regularization. MixMatch [3] mixes the labeled data with both the unlabeled data and the labeled data itself. However, the “labeled” data in our case is not completely trustworthy. As a result, we do not apply mixup among the proxy source samples, but only between the proxy source domain and the target domain, constructing virtual training samples as convex combinations of a proxy source sample and a target sample, and of their labels. For the proxy source sample, the label is the one-hot encoding of its pseudo label; for the target sample, it is the soft label calculated by Eq. (4). The mixup coefficient is sampled from a Beta distribution.

Then we adopt the KL divergence between the mixed label and the prediction on the mixed sample as the soft-label classification loss.
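A minimal sketch of inter-domain mixup on a single pair of samples. The Beta parameter 0.75, the direction of the KL divergence (mixed label as the reference distribution), and all function names are assumptions for illustration; inputs are plain lists standing in for images or features.

```python
import math
import random


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred) for two discrete distributions."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, pred) if t > 0)


def inter_domain_mix(x_proxy, y_proxy_onehot, x_target, y_target_soft,
                     beta_param=0.75, rng=random):
    """Mix one proxy source sample with one target sample.

    The proxy label is one-hot, the target label is the refined soft label.
    `beta_param` is an assumed value for the Beta distribution parameter.
    """
    lam = rng.betavariate(beta_param, beta_param)
    x_mix = mix(x_proxy, x_target, lam)
    y_mix = mix(y_proxy_onehot, y_target_soft, lam)
    return x_mix, y_mix, lam
```

In training, `kl_divergence(y_mix, model_prediction_on_x_mix)` would play the role of the soft-label classification loss, where the prediction comes from the network being adapted.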

Input: Target dataset; well-trained source model, consisting of the feature extractor and the classifier.
1:  Build the proxy source domain by Eq. (2);
2:  Initialize the feature memory bank and the prediction memory bank;
3:  repeat
4:     Randomly sample a batch of target data from the target dataset and a batch of proxy source data from the proxy source domain;
5:     Obtain the soft labels of the target batch by Eq. (4);
6:     Update the target model by Eq. (8);
7:     Update the corresponding features and predictions of the target batch in the feature bank and the prediction bank;
8:  until Iterations are exhausted.
Output: New model.
Algorithm 1: The proposed ProxyMix.

SF  Method          Ar→Cl  Ar→Pr  Ar→Re  Cl→Ar  Cl→Pr  Cl→Re  Pr→Ar  Pr→Cl  Pr→Re  Re→Ar  Re→Cl  Re→Pr   Avg.
    No Adapt.        46.1   67.0   74.3   52.0   62.7   64.3   53.8   42.1   73.7   67.0   47.7   78.2   60.7
✗   MCD [46]         48.9   68.3   74.6   61.3   67.6   68.8   57.0   47.1   75.1   69.1   52.2   79.6   64.1
✗   CDAN [31]        50.7   70.6   76.0   57.6   70.0   70.0   57.4   50.9   77.3   70.9   56.7   81.6   65.8
✗   SAFN [65]        52.0   71.7   76.3   64.2   69.9   71.9   63.7   51.4   77.1   70.9   57.1   81.5   67.3
✗   SymNets [74]     47.7   72.9   78.5   64.2   71.3   74.2   64.2   48.8   79.5   74.5   52.6   82.7   67.6
✗   MDD [75]         54.9   73.7   77.8   60.0   71.4   71.8   61.2   53.6   78.1   72.5   60.2   82.3   68.1
✗   TADA [58]        53.1   72.3   77.2   59.1   71.2   72.1   59.7   53.1   78.4   72.4   60.0   82.9   67.6
✗   BNM [9]          52.3   73.9   80.0   63.3   72.9   74.9   61.7   49.5   79.7   70.5   53.6   82.2   67.9
✗   BDG [67]         51.5   73.4   78.7   65.3   71.5   73.7   65.1   49.7   81.1   74.6   55.1   84.8   68.7
✗   SRDC [49]        52.3   76.3   81.0   69.5   76.2   78.0   68.7   53.8   81.7   76.3   57.1   85.0   71.3
✗   RSDA-MSTN [13]   53.2   77.7   81.3   66.4   74.0   76.5   67.9   53.0   82.0   75.8   57.8   85.4   70.9
✗   ATDOC [27]       60.2   77.8   82.2   68.5   78.6   77.9   68.4   58.4   83.1   74.8   61.5   87.2   73.2
✓   SSFT-SSD [66]    51.7   76.0   79.9   66.8   75.8   77.2   63.9   52.1   80.6   73.5   57.1   83.0   69.8
✓   VDM-DA [51]      59.3   75.3   78.3   67.6   76.0   75.9   68.8   57.7   79.6   74.0   61.1   83.6   71.4
✓   CPGA [41]        59.3   78.1   79.8   65.4   75.5   76.4   65.7   58.0   81.0   72.0   64.4   83.3   71.6
✓   SHOT [26]        57.1   78.1   81.5   68.0   78.2   78.1   67.4   54.9   82.2   73.3   58.8   84.3   71.8
✓   PS [11]          57.8   77.3   81.2   68.4   76.9   78.1   67.8   57.3   82.1   75.2   59.1   83.4   72.1
✓   NRC [69]         57.7   80.3   82.0   68.1   79.8   78.6   65.3   56.4   83.0   71.0   58.6   85.6   72.2
✓   [60]             58.4   79.0   82.4   67.5   79.3   78.9   68.0   56.2   82.9   74.1   60.5   85.0   72.8
✓   ProxyMix         59.3   81.0   81.6   65.8   79.7   78.1   67.0   57.5   82.7   73.1   61.7   85.6   72.8

TABLE I: Classification accuracies (%) of state-of-the-art methods on Office-home [56] (ResNet-50). SF denotes source-free. We use Bold to highlight the best and underline to highlight the second best among source-free methods.

SF  Method            A→D    A→W    D→A    D→W    W→A    W→D   Avg.  Avg. (excl. D↔W)
    No Adapt.        77.3   73.8   59.9   96.5   60.7   98.4   77.8  67.9
✗   MCD [46]         92.2   88.6   69.5   98.5   69.7  100.0   86.5  80.0
✗   CDAN [31]        92.9   94.1   71.0   98.6   69.3  100.0   87.7  81.8
✗   MDD [75]         90.4   90.4   75.0   98.7   73.7   99.9   88.0  82.4
✗   BNM [9]          90.3   91.5   70.9   98.5   71.6  100.0   87.1  81.1
✗   DMRL [59]        93.4   90.8   73.0   99.0   71.2  100.0   87.9  82.1
✗   BDG [67]         93.6   93.6   73.2   99.0   72.0  100.0   88.5  83.1
✗   MCC [17]         95.6   95.4   72.6   98.6   73.9  100.0   89.4  84.4
✗   SRDC [49]        95.8   95.7   76.7   99.2   77.1  100.0   90.8  86.3
✗   RWOT [64]        94.5   95.1   77.5   99.5   77.9  100.0   90.8  86.3
✗   RSDA-MSTN [13]   95.8   96.1   77.4   99.3   78.9  100.0   91.1  87.1
✗   ATDOC [27]       95.4   94.6   77.5   98.1   77.0   99.7   90.4  86.1
✓   SHOT [26]        94.0   90.1   74.7   98.4   74.3   99.9   88.6  83.3
✓   SSFT-SSD [66]    95.2   95.0   72.7   98.7   73.5  100.0   89.2  84.1
✓   NRC [69]         96.0   90.8   75.3   99.0   75.0  100.0   89.4  84.3
✓   HCL [16]         94.7   92.5   75.9   98.2   77.7  100.0   89.8  85.2
✓   CPGA [41]        94.4   94.1   76.0   98.4   76.6   99.8   89.9  85.3
✓   ProxyMix         95.4   96.7   75.1   98.5   75.4   99.8   90.1  85.6

TABLE II: Classification accuracies (%) on Office-31 [43] (ResNet-50). The last value in each row is the mean over all tasks except D→W and W→D.

Intra-domain Mixup. To mine the inner structure of the target domain, we also adopt mixup regularization between different target data. As is typical in many SSL methods, we use data augmentation on target data. Specifically, for each mini-batch of target data, we concatenate it with its augmented version. Then we mix up the concatenated batch and its shuffled version to construct the virtual training samples, again as convex combinations of the paired samples and of their soft labels, both calculated by Eq. (4). We then formulate the intra-domain mixup regression loss on these virtual samples.


Note here we use square loss. Unlike the cross entropy loss used in Eq. (6), it is bounded and more robust due to the insensitivity to corrupted labels.
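The intra-domain mixup and the square loss can be sketched as follows. The list `inputs` stands for the target mini-batch already concatenated with its augmented version, the square loss is averaged over classes (an assumed normalization), and all names are hypothetical.

```python
import random


def squared_error(pred, target):
    """Mean squared difference between predicted and target class probabilities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def intra_domain_mix(inputs, soft_labels, lam, rng=random):
    """Mix every target sample with a randomly shuffled partner from the same batch.

    inputs      : target batch concatenated with its augmented version
    soft_labels : refined soft labels of `inputs`, in the same order
    lam         : mixup coefficient in [0, 1]
    """
    order = list(range(len(inputs)))
    rng.shuffle(order)  # order[i] is the partner of sample i
    mixed_inputs, mixed_labels = [], []
    for i, j in enumerate(order):
        mixed_inputs.append([lam * a + (1.0 - lam) * b
                             for a, b in zip(inputs[i], inputs[j])])
        mixed_labels.append([lam * a + (1.0 - lam) * b
                             for a, b in zip(soft_labels[i], soft_labels[j])])
    return mixed_inputs, mixed_labels
```

For probability vectors, every term of `squared_error` lies in [0, 1], so the loss is bounded, in line with the robustness argument above.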

SF  Method            plane  bicycle      bus      car    horse    knife    mcycl   person    plant   sktbrd    train    truck  Per-class
    No Adapt.          63.2     10.4     47.6     73.0     46.9      4.5     66.4     15.6     62.1     17.7     88.5      7.2       41.9
✗   ADR [45]           94.2     48.5     84.0     72.9     90.1     74.2     92.6     72.5     80.8     61.8     82.2     28.8       73.5
✗   CDAN [31]          85.2     66.9     83.0     50.8     84.2     74.9     88.1     74.5     83.4     76.0     81.9     38.0       73.9
✗   CDAN+BSP [7]       92.4     61.0     81.0     57.5     89.0     80.6     90.1     77.0     84.2     77.9     82.1     38.4       75.9
✗   SAFN [65]          93.6     61.3     84.1     70.6     94.1     79.0     91.8     79.6     89.9     55.6     89.0     24.4       76.1
✗   SWD [20]           90.8     82.5     81.7     70.5     91.7     69.5     86.3     77.5     87.4     63.6     85.6     29.2       76.4
✗   MDD [75]              -        -        -        -        -        -        -        -        -        -        -        -       74.6
✗   DMRL [59]             -        -        -        -        -        -        -        -        -        -        -        -       75.5
✗   MCC [17]           88.7     80.3     80.5     71.5     90.1     93.2     85.0     71.6     89.4     73.8     85.0     36.9       78.8
✗   STAR [33]          95.0     84.0     84.6     73.0     91.6     91.8     85.9     78.4     94.4     84.7     87.0     42.2       82.7
✗   RWOT [64]          95.1     80.3     83.7     90.0     92.4     68.0     92.5     82.2     87.9     78.4     90.4     68.2       84.0
✗   ATDOC [27]         93.0     77.4     83.4     62.3     91.5     88.4     91.8     77.1     90.9     86.4     85.8     48.2       81.4
✓   SSFT-SSD [66]      95.4     86.5     79.3     51.5     92.9     94.5     82.1     79.7     90.0     87.1     87.8     57.9       82.1
✓   SHOT [26]          94.3     88.5     80.1     57.3     93.1     94.9     80.7     80.3     91.5     89.1     86.3     58.2       82.9
✓   HCL [16]           93.3     85.4     80.7     68.5     91.0     88.1     86.0     78.6     86.6     88.8     80.0     74.7       83.5
✓   PS [11]            95.3     86.2     82.3     61.6     93.3     95.7     86.7     80.4     91.6     90.9     86.0     59.5       84.1
✓   [60]               94.0     87.8     85.6     66.8     93.7     95.1     85.8     81.2     91.6     88.2     86.5     56.0       84.3
✓   VDM-DA [51]        96.9     89.1     79.1     66.5     95.7     96.8     85.4     83.3       96     86.6     89.5     56.3       85.1
✓   NRC [69]           96.8     91.3     82.4     62.4     96.2     95.9     86.1     80.6     94.8     94.1     90.4     59.7       85.9
✓   CPGA [41]          95.6     89.0     75.4     64.9     91.7     97.5     89.7     83.8     93.9     93.4     87.7     69.0       86.0
✓   ProxyMix           95.4     81.7     87.2     79.9     95.6     96.8     92.1     85.1     93.4     90.3     89.1     42.2       85.7

TABLE III: Classification accuracies (%) on the large-scale synthesized-to-real dataset VisDA [37] (ResNet-101).

SF  Method                M→S  M→ScanNet        S→M  S→ScanNet  ScanNet→M  ScanNet→S       Avg.
    No Adapt.            21.5       21.7       18.5       29.5       18.8       25.8       22.6
✗   MMD [32]             57.5       27.9       40.7       26.7       47.3       54.8       42.5
✗   DANN [12]            58.7       29.4       42.3       30.5       48.1       56.7       44.2
✗   ADDA [54]            61.0       30.5       40.4       29.3       48.9       51.1       43.5
✗   MCD [46]             62.0       31.0       41.4       31.3       46.8       59.3       45.3
✗   PointDAN [39]        64.2       33.0       47.6       33.9       49.1       64.1       48.7
✓   VDM-DA [51]          58.4       30.9       61.0       40.8       45.3       61.8       49.7
✓   NRC [69]             64.8       25.8       59.8       26.9       70.1       68.1       52.6
✓   ProxyMix             65.2       22.4       60.8       30.8       81.2       64.2       54.1

TABLE IV: Classification accuracies (%) on the 3D point cloud dataset PointDA-10 [39] (PointNet [38]). The results except ours are from NRC [69] and PointDAN [39].

III-D Overall Objective

Combining the proxy source classification loss and the two types of mixup loss, our overall objective is their weighted sum, in which two trade-off parameters balance the losses. The overall pipeline of ProxyMix is summarized in Algorithm 1.

IV Experiment

Datasets. We conduct the experiments on four popular benchmark datasets: (1) Office-31 [43] is a standard domain adaptation dataset consisting of three distinct domains, i.e., Amazon (A), DSLR (D) and Webcam (W), with 31 categories in the shared label space. The numbers of images per domain are 2,817 (A), 498 (D) and 795 (W), so the dataset is severely imbalanced across domains. (2) Office-home [56] is a medium-sized domain adaptation dataset with 15,500 images collected from four domains: Art (Ar), Clipart (Cl), Product (Pr), and Real-World (Re). There are 65 categories per domain, far more than in Office-31. (3) VisDA [37] is a large-scale challenging dataset that consists of a 12-class synthetic-to-real object recognition task. The source domain involves 152k synthetic images produced by rendering 3D models under various conditions. The target domain contains 55k images collected from real-world scenes. (4) PointDA-10 [39] is a commonly used 3D point cloud dataset extracted from three popular 3D object/scene datasets, i.e., ModelNet (M), ShapeNet (S), and ScanNet, for cross-domain 3D object recognition. Each domain contains its own training and testing sets. We train our models on the training sets of the source and target domains, and report the test results on the target domain's test set.

Baselines. We compare ProxyMix with the state-of-the-art source-free domain adaptation methods: SHOT [26], CPGA [41], [60], HCL [16], NRC [69], SSFT-SSD [66], PS [11]. Moreover, to illustrate the effectiveness of ProxyMix, we further compare our method with the state-of-the-art UDA methods: SymNets [74], TADA [58], BNM [9], BDG [67], SRDC [49], RSDA-MSTN [13], ADR [45], CDAN [31], CDAN+BSP [7], SAFN [65], SWD [20], MDD [75], DMRL [59], MCC [17], STAR [33], RWOT [64], ATDOC [27], MMD [32], DANN [12], ADDA [54], MCD [46], PointDAN [39]. We use bold to highlight the best results and underline to highlight the second best results among source-free methods.

Implementation Details. We implement our method in PyTorch. For the network architecture, we adopt ResNet [14] pretrained on ImageNet as the backbone, and replace the original fully connected layer with a bottleneck layer followed by a task-specific linear layer. In the source model training stage, we use the SGD optimizer with separate learning rates for the backbone and for the bottleneck and classifier. In the target adaptation stage, we use the SGD optimizer for the backbone and freeze the fully connected classification layer. The numbers of epochs are set to 30, 50, 5 in the training stage and 50, 50, 1 in the adaptation stage for Office-31, Office-home and VisDA, respectively. For PointDA-10, we follow the open-source code of NRC [69] and use PointNet [38] as the backbone network, trained with the Adam optimizer for 100 epochs in each stage. For the hyper-parameters, considering the confidence of the pseudo labels, we scale two of them by a ratio that increases linearly from 0 to 1 with the current iteration number. We also set the Beta distribution parameter used in mixup, along with the remaining hyper-parameters, for Office-31, Office-home, PointDA-10 and VisDA. All results are averages over three random runs with seeds {0, 1, 2}.
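The linear warm-up of the hyper-parameters described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `linear_ramp` and `scaled_weight` are hypothetical, and the base weight passed in is a placeholder because the actual values are not recoverable from this text.

```python
def linear_ramp(iteration: int, max_iterations: int) -> float:
    """Ratio that grows linearly from 0 to 1 as training progresses."""
    if max_iterations <= 0:
        raise ValueError("max_iterations must be positive")
    # Clamp so that out-of-range iteration counts still give a valid ratio.
    return min(max(iteration / max_iterations, 0.0), 1.0)


def scaled_weight(base_weight: float, iteration: int, max_iterations: int) -> float:
    """Weight actually applied at this iteration (base weight times the ratio)."""
    return base_weight * linear_ramp(iteration, max_iterations)
```

With this schedule, terms that depend on pseudo labels contribute little early in adaptation, when the pseudo labels are least reliable, and reach their full weight only at the end.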

IV-A Comparison Results

2D image datasets. We first compare our method with the state-of-the-art methods on the 2D image datasets in Tables I, II, and III. The results of the other methods are taken from their original papers. On Office-home, we achieve the best results on three tasks and the highest mean accuracy, demonstrating the effectiveness of ProxyMix on multi-class classification with a medium-sized dataset. On Office-31, we also achieve the highest mean accuracy among SFDA methods, validating the efficacy of ProxyMix on small datasets with fewer categories. On VisDA, we achieve the best results on four individual tasks and a mean accuracy comparable to the state of the art. The weaker performance on VisDA relative to the first two datasets may arise because the proxy source domain is too small relative to the entire dataset, which biases the network towards the proxy source domain. In summary, ProxyMix achieves competitive accuracy across the three benchmarks, with results similar to those of the state-of-the-art SFDA methods A²Net [60] (ICCV-21) and NRC [69] (NeurIPS-21) and the UDA method ATDOC [27] (CVPR-21). These results demonstrate the efficacy of the proposed method in dealing with domain-imbalanced, multi-class and large-scale challenges.

3D point cloud dataset. To explore how well ProxyMix generalizes to 3D data, we also report the results on the PointDA-10 dataset in Table IV. Without any extra modules, our method achieves the highest average accuracy on the benchmark, even compared with UDA methods and the 3D point cloud domain adaptation method PointDAN [39].

Choices of soft label    Office-31    Office-home    VisDA
MixMatch [3]             88.4         72.4           83.0
ReMixMatch [2]           88.1         71.3           80.2
ATDOC [27]               88.5         72.2           84.7
Ours                     90.1         72.8           85.7
TABLE V: Analysis of different soft pseudo labels.

Variants                 Office-31    Office-home    VisDA
w/o aggregation          88.4         71.3           82.4
w/ aggregation (Ours)    90.1         72.8           85.7
TABLE VI: Analysis of aggregation strategy.

Method             Office-31    Office-home    VisDA
Random-selected    83.9         69.0           81.9
Entropy-guided     86.3         70.5           72.6
Ours               90.1         72.8           85.7
TABLE VII: Analysis of different selection methods of proxy source samples.

Office-31    Office-home    VisDA
83.5         66.3           69.6
89.1         72.4           78.5
86.7         65.8           84.9
89.3         72.3           78.4
89.9         71.3           84.7
90.1         72.8           85.7
TABLE VIII: Ablation study on the loss functions (last row: full ProxyMix).

Fig. 5: The accuracy curve of the task Ar→Cl on Office-home.

IV-B Empirical Analysis

To explore the effectiveness of the proposed pseudo-labeling strategy, the aggregation strategy, and the construction method of the proxy source domain, we conduct a series of ablation analyses on the three commonly used 2D image classification datasets Office-31, Office-home and VisDA. We then explore the influence of the three loss functions in our method, the training stability, and the sensitivity to the important hyper-parameters. We also show t-SNE visualizations for the task Ar→Cl to illustrate how the features change during adaptation.

Effectiveness of the proposed frequency-weighted aggregation soft pseudo label. Our frequency-weighted aggregation strategy (FA) is a soft pseudo label generation method. To verify its influence, we compare our method with three label refinery strategies. 1) MixMatch [3] calculates the soft pseudo label by directly sharpening and normalizing the predictions. 2) ReMixMatch [2] sharpens the predictions first, then multiplies them by a distribution alignment ratio calculated from the current batch of samples. 3) ATDOC [27] only uses the highest probabilities multiplied by a balancing ratio, so the resulting values do not sum to 1, which is not conducive to the calculation of the KL divergence. Therefore, we normalize the predictions of ATDOC in our experiments. The results in Table V demonstrate that the proposed frequency-weighted aggregation module effectively improves the reliability of the soft labels.
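The two generic operations that the compared strategies build on, sharpening and ratio-based reweighting followed by normalization, can be sketched as below. This is an illustrative sketch only: the function names, the temperature value 0.5, and the way the two class marginals are supplied to `align` are assumptions made for the example, and it does not reproduce the frequency-weighted variant proposed in this paper.

```python
def normalize(p):
    """Rescale non-negative scores so that they sum to 1."""
    total = sum(p)
    return [v / total for v in p]


def sharpen(p, temperature=0.5):
    """Sharpening: raise each probability to the power 1/T, then renormalize.

    A temperature below 1 pushes the distribution towards its largest entry.
    """
    return normalize([v ** (1.0 / temperature) for v in p])


def align(p, target_marginal, running_marginal, eps=1e-8):
    """Reweight each class by target_marginal / running_marginal, then renormalize.

    Classes that the model predicts more often than desired are down-weighted.
    """
    scaled = [v * t / (r + eps)
              for v, t, r in zip(p, target_marginal, running_marginal)]
    return normalize(scaled)
```

For example, `sharpen([0.6, 0.4])` moves more mass onto the first class, while `align([0.5, 0.5], [0.5, 0.5], [0.8, 0.2])` shifts mass towards the second class, which the running marginal marks as under-predicted.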

Fig. 6: Sensitivity of hyper-parameters of task Ar→Cl on Office-home. (a), (b) Influence of the two weight hyper-parameters; (c) influence of the number N of samples per class in the proxy source domain.
Fig. 7: The t-SNE visualization of task Ar→Cl on Office-home. (a) and (b): the unseen source features (blue points) and the target features (red points) before and after adaptation, respectively. (c) and (d): the target features before and after adaptation, respectively. For clarity, we select the first 10 of the 65 classes of Office-home.

Effectiveness of the aggregation strategy. Our aggregation technique pulls unlabeled target data towards their semantic neighbors, which exploits the structural information of the target domain and mitigates the detrimental effects of noisy labels. To demonstrate the usefulness of the aggregation strategy, Table VI reports a variant of ProxyMix without it. The accuracy of the standard ProxyMix is higher than that of the variant without aggregation, demonstrating that using the center of the semantic neighbors as the pseudo label is effective and reliable.
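The idea of using the center of a sample's semantic neighbors as its soft label can be sketched as follows. This is a simplified illustration under stated assumptions: cosine similarity as the neighbor metric, exclusion of the sample itself, the value of `k`, and the name `aggregate_soft_labels` are all choices made for the example, and the frequency weighting used by the full method is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def aggregate_soft_labels(features, probs, k=2):
    """Replace each sample's prediction by the mean prediction of its k nearest neighbors."""
    refined = []
    for i, feat in enumerate(features):
        # Rank all other samples by similarity to sample i (most similar first).
        sims = sorted(
            ((cosine(feat, other), j) for j, other in enumerate(features) if j != i),
            reverse=True,
        )
        neighbors = [j for _, j in sims[:k]]
        if not neighbors:
            # No other sample available: keep the original prediction.
            refined.append(list(probs[i]))
            continue
        num_classes = len(probs[i])
        refined.append([
            sum(probs[j][c] for j in neighbors) / len(neighbors)
            for c in range(num_classes)
        ])
    return refined
```

In a small example where one sample sits among neighbors of class 0 but is itself predicted as class 1, the aggregated label moves back to class 0, which is the noise-correcting behavior described above.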

Analysis of the construction method of proxy source domain. To study the influence of the proposed construction method of the class-balanced proxy source domain, we compare ProxyMix with commonly used methods, i.e., the randomly-selected criterion, the entropy-guided criterion, and the baseline method. 1) Randomly-selected: to ensure fairness, we randomly select N samples per class from the target data, based on the classification results of the source model, to generate a class-balanced proxy source domain. Because fewer than N samples may be available for some difficult classes, we fill the shortfall with samples chosen at random from other classes. 2) Entropy-guided: following other works [28], we calculate the mean entropy of the source model's predictions on the full target dataset and obtain a split ratio from the size of the subset of samples whose prediction entropy satisfies the thresholding condition. We then compute the class distribution according to the predictions of the source model and select the lowest-entropy samples for each class. The results are shown in Table VII. Random selection performs unsatisfactorily because of the poor confidence of the source model before adaptation. Although the entropy criterion reflects the confidence of the prediction, it exacerbates the class imbalance problem and biases the model towards the easier classes, so it falls short of our method. The proposed prototype-induced method achieves the highest accuracy. We take both confidence and class balance into consideration and, as illustrated in Fig. 2, we observe that the accuracy of our proxy source domain is higher than that obtained with the entropy criterion.
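The entropy-guided baseline can be sketched roughly as below. The exact thresholding condition and the per-class sample counts are not recoverable from this text, so the sketch assumes that a sample counts as confident when its entropy is below the mean, and it takes the per-class counts as an explicit argument. All function names are hypothetical.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy of one predicted class distribution."""
    return -sum(v * math.log(v + eps) for v in p)


def entropy_split(probs):
    """Return indices with below-mean entropy and their share of the dataset.

    Assumption: 'confident' means entropy strictly below the dataset mean.
    """
    ents = [entropy(p) for p in probs]
    mean_ent = sum(ents) / len(ents)
    confident = [i for i, e in enumerate(ents) if e < mean_ent]
    return confident, len(confident) / len(probs)


def select_lowest_entropy_per_class(probs, per_class):
    """For each predicted class c, keep the per_class[c] lowest-entropy samples."""
    by_class = {}
    for i, p in enumerate(probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(predicted, []).append((entropy(p), i))
    return {
        c: [i for _, i in sorted(items)[: per_class.get(c, 0)]]
        for c, items in by_class.items()
    }
```

Because the selection follows the predicted class distribution, classes that the source model rarely predicts with confidence receive few or no samples, which is the imbalance issue noted above.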

Ablation studies on the proposed loss functions. To investigate the proposed loss functions, we show the results of variants with different combinations of loss functions in Table VIII. As shown, without the proxy source domain classification loss, the accuracy on Office-31 drops the most. The accuracy on Office-home is influenced more by the inter-domain mixup loss. For the large-scale dataset VisDA, the intra-domain mixup loss contributes a lot. The effectiveness of the two mixup losses also illustrates, from another perspective, the reliability of the proposed frequency-weighted soft labels.
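The mixup operation underlying both the inter-domain and the intra-domain loss can be sketched as follows. This is a minimal illustration of standard mixup [73], not the authors' implementation: the Beta parameter value 0.3 is a placeholder, since the value used in the paper is not recoverable from this text, and the function name `mixup` and the plain-list inputs are assumptions.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.3, rng=random):
    """Convexly combine two samples and their (soft) labels.

    lam is drawn from Beta(alpha, alpha); alpha=0.3 is an assumed placeholder.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam
```

For inter-domain mixup, the first pair would come from the proxy source domain with its label and the second from the target domain with its soft pseudo label; for intra-domain mixup, both pairs would be target samples with soft pseudo labels. The model is then trained to predict the mixed label on the mixed input.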

Training stability. We show the accuracy curve of task Ar→Cl on Office-home in Fig. 5. The accuracy rises quickly during training and then converges, as expected, which indicates that the training procedure of ProxyMix is stable and reliable.

Sensitivity of hyper-parameters. To better understand the effects of the two weight hyper-parameters and N, we explore their performance sensitivity on the single task Ar→Cl on Office-home in Fig. 6. The accuracies fluctuate only slightly around the chosen weight values in (a) and (b). The results for the proxy source domain scale, provided in (c), show that the accuracy also changes only slightly around the chosen N. Overall, ProxyMix is not sensitive to these hyper-parameters.

t-SNE visualization. To evaluate the effectiveness of ProxyMix, we show the t-SNE visualization of target features on task Ar→Cl in Fig. 7. To validate the effectiveness of domain alignment, we show the features of the unseen source domain (blue points) and the target domain (red points) in (a) and (b). As expected, the distribution of the target features is closer to that of the source features after adaptation. We also show the target feature distribution of the first 10 classes of Office-home in (c) and (d). Benefiting from our frequency-weighted aggregation strategy, the feature clusters after adaptation are compact, and the classification boundary is clear.

V Conclusion

In this paper, we focus on the source-free domain adaptation problem and propose a simple yet effective method named Proxy-based Mixup training with label refinery (ProxyMix). Specifically, we treat the weights of the fully connected layer as class prototypes and use them to choose confident samples that form a class-balanced proxy source domain. Label information is then expected to flow from the proxy source domain to the unlabeled target domain via mixup training. To enhance mixup training, we further introduce a new pseudo label refinery strategy, which combines frequency-weighted sharpening and neighborhood aggregation to obtain reliable soft predictions for unlabeled target data. Experiments on four popular benchmarks demonstrate the effectiveness of ProxyMix without access to source data. Although our method outperforms several UDA methods that rely on source data, removing all noisy labels in an unsupervised manner remains difficult. We believe that our work is a step in that direction, and we hope it inspires others in the UDA community.

References


  • [1] S. Ben-David, J. Blitzer, K. Crammer, F. Pereira, et al. (2007) Analysis of representations for domain adaptation. Proc. NeurIPS. Cited by: §I.
  • [2] D. Berthelot, N. Carlini, E. D. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel (2020) Remixmatch: semi-supervised learning with distribution alignment and augmentation anchoring. In Proc. ICLR, Cited by: §I, §II-C, §III-A, §III-C, §IV-B, TABLE V.
  • [3] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. Raffel (2019) Mixmatch: a holistic approach to semi-supervised learning. In Proc. NeurIPS, Cited by: §I, §II-C, §III-A, §III-C, §III-C, §IV-B, TABLE V.
  • [4] D. Berthelot, R. Roelofs, K. Sohn, N. Carlini, and A. Kurakin (2022) Adamatch: a unified approach to semi-supervised learning and domain adaptation. In Proc. ICLR, Cited by: §I, §II-C.
  • [5] Z. Cao, M. Long, J. Wang, and M. I. Jordan (2018) Partial transfer learning with selective adversarial networks. In Proc. CVPR, Cited by: §II-A.
  • [6] C. Chen, Z. Fu, Z. Chen, S. Jin, Z. Cheng, X. Jin, and X. Hua (2020) Homm: higher-order moment matching for unsupervised domain adaptation. In Proc. AAAI, Cited by: §I.
  • [7] X. Chen, S. Wang, M. Long, and J. Wang (2019) Transferability vs. discriminability: batch spectral penalization for adversarial domain adaptation. In Proc. ICML, Cited by: TABLE III, §IV.
  • [8] B. Chidlovskii, S. Clinchant, and G. Csurka (2016) Domain adaptation in the absence of source domain data. In Proc. KDD, Cited by: §II-B.
  • [9] S. Cui, S. Wang, J. Zhuo, L. Li, Q. Huang, and Q. Tian (2020) Towards discriminability and diversity: batch nuclear-norm maximization under label insufficient situations. In Proc. CVPR, Cited by: §I, §II-A, TABLE I, TABLE II, §IV.
  • [10] P. Dai, P. Chen, Q. Wu, X. Hong, Q. Ye, Q. Tian, C. Lin, and R. Ji (2021) Disentangling task-oriented representations for unsupervised domain adaptation. IEEE Transactions on Image Processing 31, pp. 1012–1026. Cited by: §I, §II-A.
  • [11] Y. Du, H. Yang, M. Chen, J. Jiang, H. Luo, and C. Wang (2021) Generation, augmentation, and alignment: a pseudo-source domain based method for source-free domain adaptation. arXiv preprint arXiv:2109.04015. Cited by: §I, §II-B, TABLE I, TABLE III, §IV.
  • [12] Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In Proc. ICML, Cited by: §I, §II-A, TABLE IV, §IV.
  • [13] X. Gu, J. Sun, and Z. Xu (2020) Spherical space domain adaptation with robust pseudo-label loss. In Proc. CVPR, Cited by: TABLE I, TABLE II, §IV.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proc. CVPR, Cited by: §IV.
  • [15] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell (2018) Cycada: cycle-consistent adversarial domain adaptation. In Proc. ICML, Cited by: §II-A.
  • [16] J. Huang, D. Guan, A. Xiao, and S. Lu (2021) Model adaptation: historical contrastive learning for unsupervised domain adaptation without source data. Proc. NeurIPS. Cited by: §I, §II-B, TABLE II, TABLE III, §IV.
  • [17] Y. Jin, X. Wang, M. Long, and J. Wang (2020) Minimum class confusion for versatile domain adaptation. In Proc. ECCV, Cited by: §II-A, TABLE II, TABLE III, §IV.
  • [18] I. Kuzborskij and F. Orabona (2013) Stability and hypothesis transfer learning. In Proc. ICML, Cited by: §II-B.
  • [19] S. Laine and T. Aila (2017) Temporal ensembling for semi-supervised learning. In Proc. ICLR, Cited by: §II-C.
  • [20] C. Lee, T. Batra, M. H. Baig, and D. Ulbricht (2019) Sliced wasserstein discrepancy for unsupervised domain adaptation. In Proc. CVPR, Cited by: TABLE III, §IV.
  • [21] J. Li, S. Wu, C. Liu, Z. Yu, and H. Wong (2019) Semi-supervised deep coupled ensemble learning with classification landmark exploration. IEEE Transactions on Image Processing 29, pp. 538–550. Cited by: §II-C.
  • [22] J. Li, E. Chen, Z. Ding, L. Zhu, K. Lu, and H. T. Shen (2020) Maximum density divergence for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11), pp. 3918–3930. Cited by: §II-A.
  • [23] R. Li, Q. Jiao, W. Cao, H. Wong, and S. Wu (2020) Model adaptation: unsupervised domain adaptation without source data. In Proc. CVPR, Cited by: §I, §II-B.
  • [24] S. Li, S. Song, G. Huang, Z. Ding, and C. Wu (2018) Domain invariant and class discriminative feature learning for visual domain adaptation. IEEE Transactions on Image Processing 27 (9), pp. 4260–4273. Cited by: §I, §II-A.
  • [25] J. Liang, R. He, Z. Sun, and T. Tan (2019) Distant supervised centroid shift: a simple and efficient approach to visual domain adaptation. In Proc. CVPR, Cited by: §II-B.
  • [26] J. Liang, D. Hu, and J. Feng (2020) Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In Proc. ICML, Cited by: §I, §II-B, TABLE I, TABLE II, TABLE III, §III, §III, §III, §IV.
  • [27] J. Liang, D. Hu, and J. Feng (2021) Domain adaptation with auxiliary target domain-oriented classifier. In Proc. CVPR, Cited by: §II-A, §III-B, TABLE I, TABLE II, TABLE III, §IV-A, §IV-B, TABLE V, §IV.
  • [28] J. Liang, D. Hu, Y. Wang, R. He, and J. Feng (2021) Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §I, §I, §II-B, §IV-B.
  • [29] H. Liu, J. Wang, and M. Long (2021) Cycle self-training for domain adaptation. In Proc. NeurIPS, Cited by: §II-A.
  • [30] M. Long, Y. Cao, J. Wang, and M. Jordan (2015) Learning transferable features with deep adaptation networks. In Proc. ICML, Cited by: §II-A.
  • [31] M. Long, Z. Cao, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In Proc. NeurIPS, Cited by: §I, §II-A, TABLE I, TABLE II, TABLE III, §IV.
  • [32] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu (2013) Transfer feature learning with joint distribution adaptation. In Proc. ICCV, Cited by: TABLE IV, §IV.
  • [33] Z. Lu, Y. Yang, X. Zhu, C. Liu, Y. Song, and T. Xiang (2020) Stochastic classifiers for unsupervised domain adaptation. In Proc. CVPR, Cited by: TABLE III, §IV.
  • [34] R. Müller, S. Kornblith, and G. Hinton (2019) When does label smoothing help?. Proc. NeurIPS. Cited by: §III.
  • [35] P. Panareda Busto and J. Gall (2017) Open set domain adaptation. In Proc. ICCV, Cited by: §II-A.
  • [36] V. Papyan, X. Han, and D. L. Donoho (2020) Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences 117 (40), pp. 24652–24663. Cited by: §I, §III-A.
  • [37] X. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Wang, and K. Saenko (2017) Visda: the visual domain adaptation challenge. arXiv preprint arXiv:1710.06924. Cited by: TABLE III, §IV.
  • [38] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) Pointnet: deep learning on point sets for 3d classification and segmentation. In Proc. CVPR, Cited by: TABLE IV, §IV.
  • [39] C. Qin, H. You, L. Wang, C. J. Kuo, and Y. Fu (2019) Pointdan: a multi-scale 3d domain adaption network for point cloud representation. In Proc. NeurIPS, Cited by: TABLE IV, §IV-A, §IV, §IV.
  • [40] Y. Qin, H. Wu, X. Zhang, and G. Feng (2021) Semi-supervised structured subspace learning for multi-view clustering. IEEE Transactions on Image Processing 31, pp. 1–14. Cited by: §II-C.
  • [41] Z. Qiu, Y. Zhang, H. Lin, S. Niu, Y. Liu, Q. Du, and M. Tan (2021) Source-free domain adaptation via avatar prototype generation and adaptation. In Proc. IJCAI, Cited by: §I, §II-B, TABLE I, TABLE II, TABLE III, §IV.
  • [42] D. Rukhovich and D. Galeev (2019) Mixmatch domain adaptaion: prize-winning solution for both tracks of visda 2019 challenge. arXiv preprint arXiv:1910.03903. Cited by: §III-A, §III-C.
  • [43] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. In Proc. ECCV, Cited by: TABLE II, §IV.
  • [44] K. Saenko, B. Kulis, M. Fritz, and T. Darrell (2010) Adapting visual category models to new domains. In Proc. ECCV, Cited by: §II-A.
  • [45] K. Saito, Y. Ushiku, T. Harada, and K. Saenko (2018) Adversarial dropout regularization. In Proc. ICLR, Cited by: TABLE III, §IV.
  • [46] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In Proc. CVPR, Cited by: TABLE I, TABLE II, TABLE IV, §IV.
  • [47] K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C. Li (2020) Fixmatch: simplifying semi-supervised learning with consistency and confidence. In Proc. NeurIPS, Cited by: §II-C.
  • [48] B. Sun and K. Saenko (2016) Deep coral: correlation alignment for deep domain adaptation. In Proc. ECCV Workshops, Cited by: §II-A.
  • [49] H. Tang, K. Chen, and K. Jia (2020) Unsupervised domain adaptation via structurally regularized deep clustering. In Proc. CVPR, Cited by: TABLE I, TABLE II, §IV.
  • [50] K. Tanwisuth, X. Fan, H. Zheng, S. Zhang, H. Zhang, B. Chen, and M. Zhou (2021) A prototype-oriented framework for unsupervised domain adaptation. Proc. NeurIPS. Cited by: §III-A.
  • [51] J. Tian, J. Zhang, W. Li, and D. Xu (2021) VDM-da: virtual domain modeling for source data-free domain adaptation. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §I, §II-B, TABLE I, TABLE III, TABLE IV.
  • [52] T. Tommasi, F. Orabona, and B. Caputo (2010) Safety in numbers: learning categories from few examples with multi model knowledge transfer. In Proc. CVPR, Cited by: §II-B.
  • [53] Y. Tsai, W. Hung, S. Schulter, K. Sohn, M. Yang, and M. Chandraker (2018) Learning to adapt structured output space for semantic segmentation. In Proc. CVPR, Cited by: §I, §II-A.
  • [54] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In Proc. CVPR, Cited by: §I, §II-A, TABLE IV, §IV.
  • [55] J. E. Van Engelen and H. H. Hoos (2020) A survey on semi-supervised learning. Machine Learning 109, pp. 373–440. Cited by: §II-C.
  • [56] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan (2017) Deep hashing network for unsupervised domain adaptation. In Proc. CVPR, Cited by: TABLE I, §IV.
  • [57] X. Wang, D. Kihara, J. Luo, and G. Qi (2020) EnAET: a self-trained framework for semi-supervised and supervised learning with ensemble transformations. IEEE Transactions on Image Processing 30, pp. 1639–1647. Cited by: §II-C.
  • [58] X. Wang, L. Li, W. Ye, M. Long, and J. Wang (2019) Transferable attention for domain adaptation. In Proc. AAAI, Cited by: TABLE I, §IV.
  • [59] Y. Wu, D. Inkpen, and A. El-Roby (2020) Dual mixup regularized learning for adversarial domain adaptation. In Proc. ECCV, Cited by: §II-A, TABLE II, TABLE III, §IV.
  • [60] H. Xia, H. Zhao, and Z. Ding (2021) Adaptive adversarial network for source-free domain adaptation. In Proc. CVPR, Cited by: §I, §II-B, TABLE I, TABLE III, §IV-A, §IV.
  • [61] J. Xie, R. Girshick, and A. Farhadi (2016) Unsupervised deep embedding for clustering analysis. In Proc. ICML, Cited by: §III-B.
  • [62] B. Xu, Z. Zeng, C. Lian, and Z. Ding (2022) Few-shot domain adaptation via mixup optimal transport. IEEE Transactions on Image Processing 31, pp. 2518–2528. Cited by: §II-A.
  • [63] M. Xu, J. Zhang, B. Ni, T. Li, C. Wang, Q. Tian, and W. Zhang (2020) Adversarial domain adaptation with domain mixup. In Proc. AAAI, Cited by: §II-A.
  • [64] R. Xu, P. Liu, L. Wang, C. Chen, and J. Wang (2020) Reliable weighted optimal transport for unsupervised domain adaptation. In Proc. CVPR, Cited by: TABLE II, TABLE III, §IV.
  • [65] R. Xu, G. Li, J. Yang, and L. Lin (2019) Larger norm more transferable: an adaptive feature norm approach for unsupervised domain adaptation. In Proc. ICCV, Cited by: TABLE I, TABLE III, §IV.
  • [66] H. Yan, Y. Guo, and C. Yang (2021) Source-free unsupervised domain adaptation with surrogate data generation. In Proc. BMVC, Cited by: §II-B, TABLE I, TABLE II, TABLE III, §IV.
  • [67] G. Yang, H. Xia, M. Ding, and Z. Ding (2020) Bi-directional generation for unsupervised domain adaptation. In Proc. AAAI, Cited by: TABLE I, TABLE II, §IV.
  • [68] J. Yang, R. Yan, and A. G. Hauptmann (2007) Cross-domain video concept detection using adaptive svms. In Proc. ACM-MM, Cited by: §II-B.
  • [69] S. Yang, J. van de Weijer, L. Herranz, S. Jui, et al. (2021) Exploiting the intrinsic neighborhood structure for source-free domain adaptation. Proc. NeurIPS. Cited by: §II-B, TABLE I, TABLE II, TABLE III, TABLE IV, §IV-A, §IV, §IV.
  • [70] Y. Yang, L. Xie, S. Chen, X. Li, Z. Lin, and D. Tao (2022) Do we really need a learnable classifier at the end of deep neural network?. arXiv preprint arXiv:2203.09081. Cited by: §III-A.
  • [71] K. You, M. Long, Z. Cao, J. Wang, and M. I. Jordan (2019) Universal domain adaptation. In Proc. CVPR, Cited by: §II-A.
  • [72] W. Zellinger, T. Grubinger, E. Lughofer, T. Natschläger, and S. Saminger-Platz (2017) Central moment discrepancy (cmd) for domain-invariant representation learning. In Proc. ICLR, Cited by: §I.
  • [73] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz (2018) Mixup: beyond empirical risk minimization. Proc. ICLR. Cited by: §I, §III-C.
  • [74] Y. Zhang, H. Tang, K. Jia, and M. Tan (2019) Domain-symmetric networks for adversarial domain adaptation. In Proc. CVPR, Cited by: TABLE I, §IV.
  • [75] Y. Zhang, T. Liu, M. Long, and M. Jordan (2019) Bridging theory and algorithm for domain adaptation. In Proc. ICML, Cited by: TABLE I, TABLE II, TABLE III, §IV.
  • [76] Y. Zou, Z. Yu, B. Kumar, and J. Wang (2018) Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In Proc. ECCV, Cited by: §I, §II-A.