Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data

10/07/2021
by   Jiaxing Huang, et al.
Nanyang Technological University

Unsupervised domain adaptation aims to align a labeled source domain and an unlabeled target domain, but it requires access to the source data, which often raises concerns about data privacy, data portability and data transmission efficiency. We study unsupervised model adaptation (UMA), also known as Unsupervised Domain Adaptation without Source Data, an alternative setting that aims to adapt source-trained models towards target distributions without accessing the source data. To this end, we design an innovative historical contrastive learning (HCL) technique that exploits the historical source hypothesis to make up for the absence of source data in UMA. HCL addresses the UMA challenge from two perspectives. First, it introduces historical contrastive instance discrimination (HCID) that learns from target samples by contrasting their embeddings, which are generated by the currently adapted model and the historical models. With the source-trained and earlier-epoch models as the historical models, HCID encourages UMA to learn instance-discriminative target representations while preserving the source hypothesis. Second, it introduces historical contrastive category discrimination (HCCD) that pseudo-labels target samples to learn category-discriminative target representations. Instead of globally thresholding pseudo labels, HCCD re-weights pseudo labels according to their prediction consistency across the current and historical models. Extensive experiments show that HCL outperforms and complements state-of-the-art methods consistently across a variety of visual tasks (e.g., segmentation, classification and detection) and setups (e.g., close-set, open-set and partial adaptation).


1 Introduction

Deep neural networks (DNNs) huang2017densely ; szegedy2015going ; he2016deep have achieved great success in various computer vision tasks chen2017deeplab ; noh2015learning ; ren2015faster ; redmon2016you ; huang2017densely ; szegedy2015going ; he2016deep but often generalize poorly to new domains due to the inter-domain discrepancy ben2007analysis . Unsupervised domain adaptation (UDA) tzeng2017adversarial ; luo2019taking ; tsai2018learning ; saito2017adversarial ; saito2018maximum ; vu2019advent ; tsai2019domain ; zou2018unsupervised ; zou2019confidence ; hoffman2018cycada ; sankaranarayanan2018learning ; li2019bidirectional ; hong2018conditional ; yang2020fda addresses the inter-domain discrepancy by aligning the source and target data distributions, but it requires access to the source-domain data, which often raises concerns about data privacy, data portability, and data transmission efficiency.

In this work, we study unsupervised model adaptation (UMA), an alternative setting that aims to adapt source-trained models to fit the target data distribution without accessing the source-domain data. Under the UMA setting, the only information carried forward is a portable source-trained model, which is usually much smaller than the source-domain data and can be transmitted more efficiently liang2020we ; li2020model ; li2020free ; sivaprasad2021uncertainty ; liu2021source , as illustrated in Table 1. Beyond that, the UMA setting also effectively alleviates concerns about data privacy and intellectual property. On the other hand, the absence of the labeled source-domain data makes domain adaptation much more challenging and susceptible to collapse.

To this end, we develop historical contrastive learning (HCL) that aims to make up for the absence of source data by adapting the source-trained model to fit the target data distribution without forgetting the source hypothesis, as illustrated in Fig. 1. HCL addresses the UMA challenge from two perspectives. First, it introduces historical contrastive instance discrimination (HCID) that learns from target samples by comparing their embeddings generated by the current model (as queries) and those generated by historical models (as keys): a query is pulled close to its positive keys while pushed apart from its negative keys. HCID can thus be viewed as a new type of instance contrastive learning for the task of UMA with historical models, which learns instance-discriminative target representations without forgetting the source-domain hypothesis. Second, it introduces historical contrastive category discrimination (HCCD) that pseudo-labels target samples for learning category-discriminative target representations. Instead of globally thresholding the predicted pseudo labels, HCCD re-weights the pseudo labels according to their consistency across the current and historical models.

Table 1: Source data have much larger sizes than source-trained models (storage in MB, comparing the source dataset against the source-trained model for GTA5 and SYNTHIA in semantic segmentation, Cityscapes in object detection, and VisDA17 in image classification).

The proposed HCL tackles UMA with three desirable features: 1) it introduces historical contrast and achieves UMA without forgetting the source hypothesis; 2) HCID works at the instance level, which encourages learning instance-discriminative target representations that are locally smooth and generalize well to new domains zhao2021what ; 3) HCCD works at the category level, which encourages learning category-discriminative target representations that are well aligned with the objective of down-stream tasks.

The contributions of this work can be summarized in three aspects. First, we investigate memory-based learning for unsupervised model adaptation, which learns discriminative representations for unlabeled target data without forgetting the source hypothesis. To the best of our knowledge, this is the first work that explores memory-based learning for the task of UMA. Second, we design historical contrastive learning, which introduces historical contrastive instance discrimination and category discrimination, the latter of which is naturally aligned with the objective of UMA. Third, extensive experiments show that the proposed historical contrastive learning outperforms and complements state-of-the-art methods consistently across a variety of visual tasks and setups.

2 Related Works

Our work is closely related to several branches of research in unsupervised model adaptation, domain adaptation, memory-based learning and contrastive learning.

Unsupervised model adaptation aims to adapt a source-trained model to fit target data distributions without accessing source-domain data. This problem has attracted increasing attention recently with a few pioneer studies, each of which focuses on a specific visual task. For example, liang2020we ; liang2021source freeze the classifier of the source-trained model and perform information maximization on target data for classification model adaptation. li2020model tackles classification model adaptation with a conditional GAN that generates training images with target-alike styles and source-alike semantics. li2020free presents a self-entropy descent algorithm to improve model adaptation for object detection. sivaprasad2021uncertainty reduces the uncertainty of target predictions (by the source-trained model) for segmentation model adaptation. liu2021source introduces data-free knowledge distillation to transfer source-domain knowledge for segmentation model adaptation. Despite the different designs for different tasks, the common motivation of these studies is to make up for the absence of source data in domain adaptation. kurmi2021domain and yeh2021sofa tackle source-free domain adaptation in a generative manner by generating samples from the source classes and generating reference distributions, respectively.

We tackle the absence of source data with a memory mechanism that encourages memorizing the source hypothesis during model adaptation. Specifically, we design historical contrastive learning that learns target representations by contrasting historical and currently evolved models. To the best of our knowledge, this is the first work that explores a memory mechanism for UMA. In addition, HCL is the first generic UMA method that works for a variety of visual tasks and adaptation setups.

Domain adaptation is related to UMA but requires access to labeled source data in training. Most existing work handles UDA via three typical approaches. The first exploits adversarial training to align source and target distributions in the feature, output or latent space tzeng2017adversarial ; luo2019taking ; tsai2018learning ; chen2018road ; zhang2017curriculum ; saito2017adversarial ; saito2018maximum ; vu2019advent ; huang2021semi ; tsai2019domain ; huang2020contextual ; guan2021uncertainty ; zhang2021detr ; huang2021mlan . The second employs self-training that generates pseudo labels to learn from unlabeled target data iteratively zou2018unsupervised ; saleh2018effective ; zhong2019invariance ; zou2019confidence ; huang2021cross ; guan2021scale . The third leverages image translation to modify image styles to reduce domain gaps hoffman2018cycada ; sankaranarayanan2018learning ; li2019bidirectional ; Zhang2019lipreading ; hong2018conditional ; yang2020fda ; huang2021fsdr ; huang2021rda ; zhang2021spectral . In addition, huang2021category proposes a categorical contrastive learning method for domain adaptation.

Memory-based learning has been studied extensively. Memory networks weston2014memory , as one of the early efforts, explore the use of external modules to store memory for supervised learning. Temporal ensembling laine2016temporal , as well as a few following works tarvainen2017mean ; chen2018semi , extends the memory mechanism to semi-supervised learning: it employs historical hypotheses/models to regularize the current model and produces stable and competitive predictions. Mean Teacher tarvainen2017mean leverages moving-average models as the memory model to regularize the training, and a similar idea was extended to UDA french2017self ; zheng2019unsupervised ; cai2019exploring . Mutual learning zhang2018deep has also been proposed for learning among multiple peer student models.

The aforementioned methods require labeled data in training. They do not work for UMA due to the absence of supervision from the labeled source data, either collapsing in training or helping little in model adaptation performance. We design innovative historical contrastive learning to make up for the absence of the labeled source data, with more details presented in the ensuing sections.

Figure 1: Illustration of unsupervised domain adaptation, unsupervised model adaptation and the proposed historical contrastive learning, which exploits the historical source hypothesis (or memorized knowledge) to make up for the absence of source supervision in the process of UMA. Here the historical source hypothesis could be the original source hypothesis (trained using the labeled source data only) or the adapted source hypothesis (trained in the last epoch). We use both the source-trained model and the previous-epoch model in our implementation.

Contrastive learning wu2018unsupervised ; ye2019unsupervised ; he2020momentum ; misra2020self ; zhuang2019local ; hjelm2018learning ; oord2018representation ; tian2019contrastive ; chen2020simple learns discriminative representations from multiple views of the same instance. It works with a dictionary look-up mechanism he2020momentum , where a given image is augmented into two views, a query and a key, and the query should match its designated key over a set of negative keys from other images. Existing work can be broadly classified into three categories based on dictionary creation strategies. The first creates a memory bank wu2018unsupervised to store all the keys from the previous epoch. The second builds an end-to-end dictionary ye2019unsupervised ; tian2019contrastive ; chen2020simple that generates keys using samples from the current mini-batch. The third employs a momentum encoder he2020momentum that generates keys on-the-fly with a momentum-updated encoder. We design historical contrastive learning that generates keys with historical models.

Other related source-free adaptation works. cha2021co considers supervised continual learning from previously learned tasks to a new task; it learns representations using a contrastive learning objective and preserves learned representations using a self-supervised distillation step, where the contrastively learned representations are more robust against catastrophic forgetting in supervised continual learning. kundu2020universal addresses a source-free universal domain adaptation problem, where the classes in the target domain are not guaranteed to be the same as in the source domain, whereas we focus on close-set DA. kundu2020towards proposes a simple yet effective solution to realize inheritable models suitable for the open-set source-free DA problem, whereas we focus on close-set DA.

3 Historical Contrastive Learning

This section presents the proposed historical contrastive learning that memorizes the source hypothesis to make up for the absence of source data, as illustrated in Fig. 2. The proposed HCL consists of two key designs. The first is historical contrastive instance discrimination, which encourages learning instance-discriminative target representations that generalize well to new domains zhao2021what . The second is historical contrastive category discrimination, which encourages learning category-discriminative target representations that are well aligned with the objective of visual recognition tasks. More details are described in the ensuing subsections.

Figure 2: The proposed historical contrastive learning consists of two key designs, historical contrastive instance discrimination (HCID) and historical contrastive category discrimination (HCCD). HCID learns from target samples by contrasting their embeddings generated by the current model (as queries) and historical models (as keys), which learns instance-discriminative target representations. HCCD pseudo-labels target samples to learn category-discriminative target representations, where the pseudo labels are re-weighted adaptively according to the prediction consistency across the current and historical models.

3.1 Historical Contrastive Instance Discrimination

The proposed HCID learns from unlabeled target samples via contrastive learning over their embeddings generated by the current and historical models: positive pairs (embeddings of the same sample) are pulled close while negative pairs (embeddings of different samples) are pushed apart. It is a new type of contrastive learning for UMA, which preserves the source-domain hypothesis by generating positive keys from historical models. HCID works at the instance level and encourages learning instance-discriminative target representations that are locally smooth zhao2021what and generalize well to new domains.

HCID loss. Given a query sample $x^q$ and a set of key samples $\{x^k_i\}_{i=0}^{K}$, HCID employs the current model $\theta^t$ to encode the query $q = f_{\theta^t}(x^q)$, and a historical encoder $\theta^h$ to encode the keys $k_i = f_{\theta^h}(x^k_i)$. With the encoded embeddings, HCID is achieved via a historical contrastive loss, minimization of which pulls $q$ close to its positive key while pushing it apart from all other (negative) keys:

$$\mathcal{L}_{\mathrm{HCID}} = -\log \frac{\exp(q \cdot k_{+}/\tau)\, w_{+}}{\sum_{i=0}^{K} \exp(q \cdot k_i/\tau)\, w_i}, \qquad (1)$$

where $\tau$ is a temperature parameter wu2018unsupervised and $w_i$ indicates the reliability of each key $k_i$, with which we re-weight the similarity loss of each key to encourage memorizing well-learnt instead of poorly-learnt historical embeddings. In this work, we use the classification entropy of the historical prediction to estimate the reliability of each key, where a lower entropy indicates a more reliable key. The positive key $k_{+}$ is the augmentation of the query sample, and all the rest are negative keys.

Remark 1

Note that the loss in Eq. 1, which we call HisNCE, has a similar form as the InfoNCE loss oord2018representation ; he2020momentum . InfoNCE can actually be viewed as a special case of HisNCE, where all the queries and keys are encoded by the current model (i.e., $\theta^h = \theta^t$) and the reliability is fixed (i.e., $w_i = 1$). For HisNCE, we assign each key a reliability score to encourage memorizing the well-learnt historical embeddings only. It is also worth noting that Eq. 1 only shows historical contrast with one historical model for simplicity. In practice, we could employ multiple historical models to comprehensively distill (memorize) the well-learnt embeddings from them. In this work, we adopt two historical models: the source-trained model, which provides the original source hypothesis for long-term memorization, and the model of the previous epoch, which provides the evolved source hypothesis for short-term memorization. This is achieved by computing Eq. 1 multiple times independently.
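To make the formulation concrete, below is a minimal PyTorch sketch of the HCID loss in Eq. 1. It is a sketch under stated assumptions, not the reference implementation: the function names are illustrative, and the exact entropy-based reliability weighting is an assumed form since the paper's formula is not reproduced in this text.

```python
# Minimal sketch of HCID (Eq. 1); names and the reliability weighting are illustrative assumptions.
import torch
import torch.nn.functional as F


def reliability_from_entropy(logits):
    """Assumed reliability score: keys whose historical classifier output has
    lower entropy are treated as better learnt and weighted higher."""
    p = F.softmax(logits, dim=1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1)  # (K,)
    return torch.exp(-entropy)                            # in (0, 1]


def hcid_loss(q, keys, key_logits, pos_idx, tau=0.07):
    """q: (D,) query embedding from the current model.
    keys: (K, D) key embeddings from a frozen historical model.
    key_logits: (K, C) historical classifier outputs used for reliability.
    pos_idx: index of the positive key (the augmented view of the query sample)."""
    q = F.normalize(q, dim=0)
    keys = F.normalize(keys, dim=1)
    w = reliability_from_entropy(key_logits)              # (K,) reliability weights
    sim = torch.exp(keys @ q / tau) * w                   # weighted similarity terms
    return -torch.log(sim[pos_idx] / sim.sum())
```

When multiple historical models are used (e.g., the source-trained model and the previous-epoch model), this loss would simply be computed once per historical model and the results combined, as stated in Remark 1.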

3.2 Historical Contrastive Category Discrimination

We design HCCD to generate pseudo labels conditioned on a historical consistency, i.e., the prediction consistency across the current and historical models. HCCD can be viewed as a new type of self-training, where pseudo labels are weighted by the historical consistency instead of global thresholding. It works at the category level and encourages learning category-discriminative target representations that are aligned with the objective of visual recognition tasks in UMA.

Historical contrastive pseudo label generation. Given an unlabeled sample, the current model predicts $p^t$ (as the query) and the historical models predict $p^h$ (as the keys), where each prediction is a $C$-class probability vector. The pseudo label $\hat{y}$ of the sample is the predicted category label, and its historical consistency is computed from the agreement between the query prediction $p^t$ and the key predictions $p^h$.

HCCD loss. Given the unlabeled data with current prediction $p^t$ and its historical contrastive pseudo label $\hat{y}$, HCCD performs self-training on target data via a weighted cross-entropy loss:

$$\mathcal{L}_{\mathrm{HCCD}} = -\, c \sum_{k=1}^{C} \hat{y}_k \log p^t_k, \qquad (2)$$

where $c$ is the per-sample historical consistency and we use it to re-weight the self-training loss. If the predictions of a sample across the current and historical models are consistent, we consider it a well-learnt sample and increase its influence in self-training. Otherwise, we consider it a poorly-learnt sample and decrease its influence in self-training.
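The following is a minimal PyTorch sketch of the historical-consistency-weighted self-training objective in Eq. 2. The concrete consistency measure used here (the probability the historical model assigns to the current model's predicted category) is one plausible instantiation assumed for illustration, not necessarily the paper's exact formula.

```python
# Minimal sketch of HCCD (Eq. 2); the consistency measure is an assumed instantiation.
import torch
import torch.nn.functional as F


def hccd_loss(cur_logits, hist_logits):
    """cur_logits: (N, C) predictions of the current model on target samples.
    hist_logits: (N, C) predictions of a frozen historical model on the same samples."""
    p_cur = F.softmax(cur_logits, dim=1)
    p_hist = F.softmax(hist_logits, dim=1)
    pseudo = p_cur.argmax(dim=1)                                     # pseudo labels from the current model
    # Historical consistency: how strongly the historical model agrees with
    # the current model's predicted category (assumed form).
    consistency = p_hist.gather(1, pseudo.unsqueeze(1)).squeeze(1)   # (N,)
    ce = F.cross_entropy(cur_logits, pseudo, reduction="none")       # per-sample cross-entropy
    return (consistency * ce).mean()                                 # consistency-weighted self-training loss
```

Samples whose current and historical predictions agree thus contribute more to the gradient, which mirrors the re-weighting described above.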

3.3 Theoretical Insights

The two designs in Historical Contrastive Learning (HCL) are inherently connected with probabilistic models and are convergent under certain conditions:

Proposition 1

The historical contrastive instance discrimination (HCID) can be modelled as a maximum likelihood problem optimized via Expectation Maximization.

Proposition 2

The HCID is convergent under certain conditions.

Proposition 3

The historical contrastive category discrimination (HCCD) can be modelled as a classification maximum likelihood problem optimized via Classification Expectation Maximization.

Proposition 4

The HCCD is convergent under certain conditions.

The proofs of Propositions 1, 2, 3 and 4 are provided in the appendix.

Method SF Road SW Build Wall Fence Pole TL TS Veg. Terrain Sky PR Rider Car Truck Bus Train Motor Bike mIoU
CBST zou2018unsupervised 91.8 53.5 80.5 32.7 21.0 34.0 28.9 20.4 83.9 34.2 80.9 53.1 24.0 82.7 30.3 35.9 16.0 25.9 42.8 45.9
AdaptSeg tsai2018learning 86.5 36.0 79.9 23.4 23.3 23.9 35.2 14.8 83.4 33.3 75.6 58.5 27.6 73.7 32.5 35.4 3.9 30.1 28.1 42.4
AdvEnt vu2019advent 89.4 33.1 81.0 26.6 26.8 27.2 33.5 24.7 83.9 36.7 78.8 58.7 30.5 84.8 38.5 44.5 1.7 31.6 32.4 45.5
IDA pan2020unsupervised 90.6 37.1 82.6 30.1 19.1 29.5 32.4 20.6 85.7 40.5 79.7 58.7 31.1 86.3 31.5 48.3 0.0 30.2 35.8 46.3
CRST zou2019confidence 91.0 55.4 80.0 33.7 21.4 37.3 32.9 24.5 85.0 34.1 80.8 57.7 24.6 84.1 27.8 30.1 26.9 26.0 42.3 47.1
CrCDA huang2020contextual 92.4 55.3 82.3 31.2 29.1 32.5 33.2 35.6 83.5 34.8 84.2 58.9 32.2 84.7 40.6 46.1 2.1 31.1 32.7 48.6
RDA huang2021rda 91.2 44.7 82.5 27.7 24.7 36.7 36.5 24.7 84.9 38.1 78.4 62.2 27.4 84.9 39.3 46.7 12.9 31.9 36.7 48.0
CaCo huang2021category 91.9 54.3 82.7 31.7 25.0 38.1 46.7 39.2 82.6 39.7 76.2 63.5 23.6 85.1 38.6 47.8 10.3 23.4 35.1 49.2
UR sivaprasad2021uncertainty 92.3 55.2 81.6 30.8 18.8 37.1 17.7 12.1 84.2 35.9 83.8 57.7 24.1 81.7 27.5 44.3 6.9 24.1 40.4 45.1
+HCL 92.2 54.1 81.7 34.2 25.4 37.9 35.8 29.8 84.1 38.0 83.9 59.1 27.1 84.6 33.9 41.9 16.2 27.7 44.7 49.1
SFDA liu2021source 91.7 52.7 82.2 28.7 20.3 36.5 30.6 23.6 81.7 35.6 84.8 59.5 22.6 83.4 29.6 32.4 11.8 23.8 39.6 45.8
+HCL 92.3 54.5 82.6 33.1 26.2 38.9 37.9 31.7 83.5 38.1 84.4 60.9 30.0 84.5 32.6 41.2 14.2 26.4 43.2 49.3
HCID 89.5 53 80.3 33.9 22.9 36.2 32.7 23.8 82.3 36.5 73.7 60.0 22.4 83.8 28.9 34.7 13.5 21.2 38.0 45.6
HCCD 91.0 53.6 81.5 32.4 23.1 36.9 32.3 26.3 82.8 37.2 80.4 58.5 25.0 82.5 29.9 34.2 15.5 23.2 40.5 46.7
HCL 92.0 55.0 80.4 33.5 24.6 37.1 35.1 28.8 83.0 37.6 82.3 59.4 27.6 83.6 32.3 36.6 14.1 28.7 43.0 48.1
Table 2: Experiments on semantic segmentation task GTA5 → Cityscapes (SF: source free).
Method SF Road SW Build Wall* Fence* Pole* TL TS Veg. Sky PR Rider Car Bus Motor Bike mIoU mIoU*
CBST zou2018unsupervised 68.0 29.9 76.3 10.8 1.4 33.9 22.8 29.5 77.6 78.3 60.6 28.3 81.6 23.5 18.8 39.8 42.6 48.9
AdaptSeg tsai2018learning 84.3 42.7 77.5 - - - 4.7 7.0 77.9 82.5 54.3 21.0 72.3 32.2 18.9 32.3 - 46.7
AdvEnt vu2019advent 85.6 42.2 79.7 8.7 0.4 25.9 5.4 8.1 80.4 84.1 57.9 23.8 73.3 36.4 14.2 33.0 41.2 48.0
IDA pan2020unsupervised 84.3 37.7 79.5 5.3 0.4 24.9 9.2 8.4 80.0 84.1 57.2 23.0 78.0 38.1 20.3 36.5 41.7 48.9
CRST zou2019confidence 67.7 32.2 73.9 10.7 1.6 37.4 22.2 31.2 80.8 80.5 60.8 29.1 82.8 25.0 19.4 45.3 43.8 50.1
CrCDAhuang2020contextual 86.2 44.9 79.5 8.3 0.7 27.8 9.4 11.8 78.6 86.5 57.2 26.1 76.8 39.9 21.5 32.1 42.9 50.0
UR sivaprasad2021uncertainty 59.3 24.6 77.0 14.0 1.8 31.5 18.3 32.0 83.1 80.4 46.3 17.8 76.7 17.0 18.5 34.6 39.6 45.0
+HCL 76.7 33.7 78.7 7.2 0.1 34.4 23.2 31.6 80.5 84.3 54.4 26.6 79.5 35.9 24.8 34.4 44.1 51.1
SFDA liu2021source 67.8 31.9 77.1 8.3 1.1 35.9 21.2 26.7 79.8 79.4 58.8 27.3 80.4 25.3 19.5 37.4 42.4 48.7
+HCL 78.2 35.3 79.6 7.3 0.2 37.7 21 30.9 80.4 83.3 59.8 29.4 79.2 34.2 24.5 38.9 45.0 51.9
HCL 80.9 34.9 76.7 6.6 0.2 36.1 20.1 28.2 79.1 83.1 55.6 25.6 78.8 32.7 24.1 32.7 43.5 50.2
Table 3: Experiments on semantic segmentation task SYNTHIA → Cityscapes (SF: source free).

4 Experiments

This section presents experiments including datasets, implementation details, evaluations of the proposed HCL in semantic segmentation, object detection and image classification tasks as well as the discussion of its desirable features.

4.1 Datasets

UMA for semantic segmentation is evaluated on two challenging tasks: GTA5 richter2016playing → Cityscapes cordts2016cityscapes and SYNTHIA ros2016synthia → Cityscapes. GTA5 has 24,966 synthetic images and shares 19 categories with Cityscapes. SYNTHIA contains 9,400 synthetic images and shares 16 categories with Cityscapes. Cityscapes has 2,975 and 500 real-world images for training and validation, respectively.

UMA for object detection is evaluated on two tasks: Cityscapes → Foggy Cityscapes sakaridis2018semantic and Cityscapes → BDD100k yu2018bdd100k . Foggy Cityscapes is derived by applying simulated fog to the Cityscapes images. BDD100k shares the detection categories listed in Table 5 with Cityscapes, and we evaluate on its daytime subset as in xu2020exploring ; saito2019strong ; chen2018domain for fair comparisons.

UMA for image classification is evaluated on two benchmarks: VisDA17 peng2018visda and Office-31 saenko2010adapting . VisDA17 has synthetic images as the source domain and real images as the target domain, with 12 shared categories. Office-31 has images from three domains, including 2,817 from Amazon, 795 from Webcam and 498 from DSLR, with 31 shared categories. Following zou2019confidence ; saenko2010adapting ; sankaranarayanan2018generate , the evaluation is on six adaptation tasks: A→W, D→W, W→D, A→D, D→A, and W→A.

4.2 Implementation Details

Semantic segmentation: Following vu2019advent ; zou2018unsupervised , we employ DeepLab-V2 chen2017deeplab with ResNet-101 he2016deep as the segmentation model. We adopt SGD bottou2010large with momentum and weight decay, where the learning rate is decayed by a polynomial annealing policy chen2017deeplab .
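As a concrete illustration of this training setup, the snippet below sketches an SGD optimizer with polynomial learning-rate annealing. The specific hyper-parameter values (base learning rate, momentum, weight decay, power) are assumptions for illustration, not the paper's reported settings.

```python
# Sketch of SGD with polynomial learning-rate annealing; hyper-parameter values are assumptions.
import torch


def build_optimizer(model, base_lr=2.5e-4, momentum=0.9, weight_decay=5e-4):
    # Plain SGD as used for the segmentation model (values illustrative).
    return torch.optim.SGD(model.parameters(), lr=base_lr,
                           momentum=momentum, weight_decay=weight_decay)


def poly_lr(optimizer, base_lr, cur_iter, max_iter, power=0.9):
    """Polynomial annealing policy: lr = base_lr * (1 - cur_iter / max_iter) ** power."""
    lr = base_lr * (1.0 - cur_iter / max_iter) ** power
    for group in optimizer.param_groups:
        group["lr"] = lr
```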

Object detection: Following xu2020exploring ; saito2019strong ; chen2018domain , we adopt Faster R-CNN ren2015faster with VGG-16 szegedy2015going as the detection model. We use SGD bottou2010large with momentum and weight decay. The learning rate is kept constant for an initial number of iterations and then decreased for the remaining iterations.

Image classification: Following zou2019confidence ; saenko2010adapting ; sankaranarayanan2018generate , we adopt ResNet-101 and ResNet-50 he2016deep as the classification models for VisDA17 and Office-31, respectively. We use SGD bottou2010large with momentum and weight decay.

Method SF person rider car truck bus train mcycle bicycle mAP
MAF he2019multi 28.4 39.5 43.9 23.8 39.9 33.3 29.2 33.9 34.0
SCDA zhu2019adapting 33.5 38.0 48.5 26.5 39.0 23.3 28.0 33.6 33.8
DA chen2018domain 25.0 31.0 40.5 22.1 35.3 20.2 20.0 27.1 27.6
MLDA Xie_2019_ICCV 33.2 44.2 44.8 28.2 41.8 28.7 30.5 36.5 36.0
DMA kim2019diversify 30.8 40.5 44.3 27.2 38.4 34.5 28.4 32.2 34.6
CAFA hsu2020every 41.9 38.7 56.7 22.6 41.5 26.8 24.6 35.5 36.0
SWDA saito2019strong 36.2 35.3 43.5 30.0 29.9 42.3 32.6 24.5 34.3
CRDA xu2020exploring 32.9 43.8 49.2 27.2 45.1 36.4 30.3 34.6 37.4
CaCo huang2021category 38.3 46.7 48.1 33.2 45.9 37.6 31.0 33.0 39.2
SFOD li2020free 25.5 44.5 40.7 33.2 22.2 28.4 34.1 39.0 33.5
+HCL 39.3 46.7 48.6 32.9 46.2 38.2 33.9 36.9 40.3
HCL 38.7 46.0 47.9 33.0 45.7 38.9 32.8 34.9 39.7
Table 4: Experiments on object detection task Cityscapes → Foggy Cityscapes (SF: source free).
Method SF person rider car truck bus mcycle bicycle mAP
DA chen2018domain 29.4 26.5 44.6 14.3 16.8 15.8 20.6 24.0
SWDA saito2019strong 30.2 29.5 45.7 15.2 18.4 17.1 21.2 25.3
CRDA xu2020exploring 31.4 31.3 46.3 19.5 18.9 17.3 23.8 26.9
CaCo huang2021category 32.7 32.2 50.6 20.2 23.5 19.4 25.0 29.1
SFOD li2020free 32.4 32.6 50.4 20.6 23.4 18.9 25.0 29.0
+HCL 33.9 34.4 52.8 22.1 25.3 22.6 26.7 31.1
HCL 32.7 33.2 52.0 21.3 25.6 21.5 26.0 30.3
Table 5: Experiments on object detection task Cityscapes → BDD100k (SF: source free).

4.3 Unsupervised Domain Adaptation for Semantic Segmentation

We evaluated the proposed HCL on the UMA-based semantic segmentation tasks GTA5 → Cityscapes and SYNTHIA → Cityscapes. Tables 2 and 3 show experimental results in mean Intersection-over-Union (mIoU). We can see that HCL outperforms state-of-the-art UMA methods by large margins. In addition, HCL is complementary to existing UMA methods, and incorporating it (denoted by “+HCL”) improves the existing UMA methods clearly and consistently. Furthermore, HCL even achieves competitive performance as compared with state-of-the-art UDA methods (labeled by ✗ in the column SF), which require access to the labeled source data in training.

Ablation studies. We conduct ablation studies of the proposed HCL on the UMA-based semantic segmentation task GTA5 → Cityscapes. As the bottom of Table 2 shows, either HCID or HCCD alone achieves competitive performance. In addition, HCID and HCCD offer orthogonal self-supervision signals: HCID focuses on instance-wise discrimination between queries and historical keys, while HCCD focuses on category-wise discrimination among samples with different pseudo category labels. The two designs are thus complementary, and combining them in HCL produces the best segmentation.

4.4 Unsupervised Domain Adaptation for Object Detection

We evaluated the proposed HCL on the UMA-based object detection tasks Cityscapes → Foggy Cityscapes and Cityscapes → BDD100k. Tables 4 and 5 show experimental results. We can observe that HCL clearly outperforms the state-of-the-art UMA method SFOD. In addition, incorporating HCL into SFOD improves detection consistently across both tasks. Similar to the semantic segmentation experiments, HCL achieves competitive performance as compared with state-of-the-art UDA methods (labeled by ✗ in the column SF), which require access to labeled source data in training.

4.5 Unsupervised Domain Adaptation for Image Classification

We evaluate the proposed HCL on the UMA-based image classification benchmarks VisDA17 and Office-31. Tables 6 and 7 show experimental results. We can observe that HCL clearly outperforms state-of-the-art UMA methods. In addition, incorporating HCL into state-of-the-art UMA methods further improves the classification accuracy consistently on both benchmarks. Similar to the semantic segmentation and object detection experiments, HCL achieves competitive performance as compared with state-of-the-art UDA methods (labeled by ✗), which require access to labeled source data in training.

Method SF Aero Bike Bus Car Horse Knife Motor Person Plant Skateboard Train Truck Mean
MMD long2015learning 87.1 63.0 76.5 42.0 90.3 42.9 85.9 53.1 49.7 36.3 85.8 20.7 61.1
DANN ganin2016domain 81.9 77.7 82.8 44.3 81.2 29.5 65.1 28.6 51.9 54.6 82.8 7.8 57.4
ENT grandvalet2005semi 80.3 75.5 75.8 48.3 77.9 27.3 69.7 40.2 46.5 46.6 79.3 16.0 57.0
MCD saito2018maximum 87.0 60.9 83.7 64.0 88.9 79.6 84.7 76.9 88.6 40.3 83.0 25.8 71.9
CBST zou2018unsupervised 87.2 78.8 56.5 55.4 85.1 79.2 83.8 77.7 82.8 88.8 69.0 72.0 76.4
CRST zou2019confidence 88.0 79.2 61.0 60.0 87.5 81.4 86.3 78.8 85.6 86.6 73.9 68.8 78.1
CaCo huang2021category 90.4 80.9 69.1 66.7 88.8 80.0 86.5 78.1 87.3 87.0 80.3 75.8 80.9
3C-GAN li2020model 94.8 73.4 68.8 74.8 93.1 95.4 88.6 84.7 89.1 84.7 83.5 48.1 81.6
+HCL 93.8 86.6 84.1 74.3 93.2 95.0 88.4 85.0 90.4 85.2 84.5 49.8 84.2
SHOT liang2020we 93.7 86.4 78.7 50.7 91.0 93.5 79.0 78.3 89.2 85.4 87.9 51.1 80.4
+HCL 94.3 87.0 82.6 70.6 92.0 93.2 87.0 80.6 89.6 86.8 84.6 58.7 83.9
HCL 93.3 85.4 80.7 68.5 91.0 88.1 86.0 78.6 86.6 88.8 80.0 74.7 83.5
Table 6: Experiments on image classification benchmark VisDA17 (SF: source free).
Method SF A→W D→W W→D A→D D→A W→A Mean
DAN long2015learning 80.5 97.1 99.6 78.6 63.6 62.8 80.4
RTN long2016unsupervised 84.5 96.8 99.4 77.5 66.2 64.8 81.6
DANN ganin2016domain 82.0 96.9 99.1 79.7 68.2 67.4 82.2
ADDA tzeng2017adversarial 86.2 96.2 98.4 77.8 69.5 68.9 82.9
JAN long2017deep 85.4 97.4 99.8 84.7 68.6 70.0 84.3
CBST zou2018unsupervised 87.8 98.5 100 86.5 71.2 70.9 85.8
CRST zou2019confidence 89.4 98.9 100 88.7 72.6 70.9 86.8
CaCo huang2021category 90.9 99.1 100.0 94.2 73.2 76.6 89.0
3C-GAN li2020model 93.7 98.5 99.8 92.7 75.3 77.8 89.6
+HCL 93.4 99.3 100.0 94.6 77.1 79.0 90.6
SHOT liang2020we 91.2 98.3 99.9 90.6 72.5 71.4 87.3
+HCL 92.8 99.0 100.0 94.4 76.1 78.3 90.1
HCL 92.5 98.2 100.0 94.7 75.9 77.7 89.8
Table 7: Experiments on image classification benchmark Office-31 (SF: source free).
Partial-set DA SF A→C A→P A→R C→A C→P C→R P→A P→C P→R R→A R→C R→P Mean
IWAN zhang2018importance 53.9 54.5 78.1 61.3 48.0 63.3 54.2 52.0 81.3 76.5 56.8 82.9 63.6
SAN cao2018partial 44.4 68.7 74.6 67.5 65.0 77.8 59.8 44.7 80.1 72.2 50.2 78.7 65.3
ETN cao2019learning 59.2 77.0 79.5 62.9 65.7 75.0 68.3 55.4 84.4 75.7 57.7 84.5 70.5
SAFN xu2019larger 58.9 76.3 81.4 70.4 73.0 77.8 72.4 55.3 80.4 75.8 60.4 79.9 71.8
SHOT liang2020we 57.9 83.6 88.8 72.4 74.0 79.0 76.1 60.6 90.1 81.9 68.3 88.5 76.8
+HCL 66.9 85.5 92.5 78.3 77.2 87.1 78.3 65.1 90.7 82.4 68.7 88.4 80.1
HCL 65.6 85.2 92.7 77.3 76.2 87.2 78.2 66.0 89.1 81.5 68.4 87.3 79.6
Open-set DA SF A→C A→P A→R C→A C→P C→R P→A P→C P→R R→A R→C R→P Mean
ATI-λ panareda2017open 55.2 52.6 53.5 69.1 63.5 74.1 61.7 64.5 70.7 79.2 72.9 75.8 66.1
OSBP saito2018open 56.7 51.5 49.2 67.5 65.5 74.0 62.5 64.8 69.3 80.6 74.7 71.5 65.7
OpenMax bendale2016towards 56.5 52.9 53.7 69.1 64.8 74.5 64.1 64.0 71.2 80.3 73.0 76.9 66.7
STA liu2019separate 58.1 53.1 54.4 71.6 69.3 81.9 63.4 65.2 74.9 85.0 75.8 80.8 69.5
SHOT liang2020we 62.5 77.8 83.9 60.9 73.4 79.4 64.7 58.7 83.1 69.1 62.0 82.1 71.5
+HCL 64.2 78.3 83.0 61.1 72.2 79.6 65.5 59.3 80.6 80.1 72.0 82.8 73.2
HCL 64.0 78.6 82.4 64.5 73.1 80.1 64.8 59.8 75.3 78.1 69.3 81.5 72.6
Table 8: Experiments on image classification benchmark Office-Home under the setup of partial-set DA (domain adaptation) and open-set DA.

4.6 Discussion

Generalization across computer vision tasks: We study how HCL generalizes across computer vision tasks by evaluating it on three representative tasks: semantic segmentation, object detection and image classification. Experiments in Tables 2–7 show that HCL achieves competitive performance consistently across all three visual tasks without any task-specific optimization or fine-tuning. To the best of our knowledge, HCL is the first generic UMA method that works for a variety of visual tasks.

Complementarity studies: We study the complementarity of the proposed HCL by incorporating it into existing UMA methods. Experiments in Tables 2–7 (the rows denoted by “+HCL”) show that incorporating HCL boosts the existing UMA methods consistently across different visual tasks.

Feature visualization: This paragraph presents the t-SNE maaten2008visualizing visualization of feature representations on the GTA5 → Cityscapes model adaptation task. We compare HCL with two state-of-the-art UMA methods, i.e., “UR” sivaprasad2021uncertainty and “SFDA” liu2021source , and Fig. 3 shows the visualization. We can observe that HCL learns desirable instance-discriminative (i.e., smooth) yet category-discriminative representations because it incorporates two key designs that work in a complementary manner: 1) HCID works at the instance level, which encourages learning instance-discriminative target representations that are locally smooth and generalize well to new domains zhao2021what ; 2) HCCD works at the category level, which encourages learning category-discriminative target representations that are well aligned with the objective of down-stream visual tasks. In addition, qualitative illustrations are provided in Fig. 4. It can be observed that our proposed HCL clearly outperforms UR and SFDA.

Generalization across learning setups: We study how HCL generalizes across learning setups by evaluating it on two adaptation setups, i.e., partial-set adaptation and open-set adaptation. Experiments in Table 8 show that HCL achieves competitive performance consistently across both setups with little setup-specific optimization or fine-tuning.

Figure 3: The t-SNE maaten2008visualizing visualization of feature representations on the GTA5 → Cityscapes model adaptation task (panels, left to right: UR sivaprasad2021uncertainty , SFDA liu2021source , HCL (Ours)): each color stands for a category of samples (image pixels), with a digit marking the category center. The proposed HCL outperforms “UR” and “SFDA” qualitatively by generating instance-discriminative (i.e., smooth) yet category-discriminative representations for unlabeled target data.
Figure 4: Qualitative illustrations and comparison on the domain adaptive semantic segmentation task GTA5 → Cityscapes (panels, left to right: UR sivaprasad2021uncertainty , SFDA liu2021source , HCL (Ours), Ground Truth). Our historical contrastive learning (HCL) exploits the historical source hypothesis to make up for the absence of source data in UMA, which produces better qualitative results (i.e., semantic segmentation) by preserving the source hypothesis.

5 Conclusion

In this work, we studied historical contrastive learning, an innovative UMA technique that exploits the historical source hypothesis to make up for the absence of source data in UMA. We achieve historical contrastive learning with the novel designs of historical contrastive instance discrimination and historical contrastive category discrimination, which learn discriminative representations for target data while simultaneously preserving the source hypothesis. Extensive experiments over a variety of visual tasks and learning setups show that HCL outperforms and complements state-of-the-art techniques consistently. Moving forward, we will explore memory-based learning in other transfer learning tasks.

Acknowledgement

This study is supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).

References

  • [1] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In Advances in neural information processing systems, pages 137–144, 2007.
  • [2] Abhijit Bendale and Terrance E Boult. Towards open set deep networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1563–1572, 2016.
  • [3] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
  • [4] Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Lingyu Duan, and Ting Yao. Exploring object relation in mean teacher for cross-domain detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11457–11466, 2019.
  • [5] Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Michael I Jordan. Partial transfer learning with selective adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2724–2732, 2018.
  • [6] Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, and Qiang Yang. Learning to transfer examples for partial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2985–2994, 2019.
  • [7] Hyuntak Cha, Jaeho Lee, and Jinwoo Shin. Co2l: Contrastive continual learning. arXiv preprint arXiv:2106.14413, 2021.
  • [8] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2017.
  • [9] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
  • [10] Yanbei Chen, Xiatian Zhu, and Shaogang Gong. Semi-supervised deep learning with memory. In Proceedings of the European Conference on Computer Vision (ECCV), pages 268–283, 2018.
  • [11] Yuhua Chen, Wen Li, Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3339–3348, 2018.
  • [12] Yuhua Chen, Wen Li, and Luc Van Gool. Road: Reality oriented adaptation for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7892–7901, 2018.
  • [13] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
  • [14] Geoffrey French, Michal Mackiewicz, and Mark Fisher. Self-ensembling for visual domain adaptation. arXiv preprint arXiv:1706.05208, 2017.
  • [15] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
  • [16] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems, pages 529–536, 2005.
  • [17] Dayan Guan, Jiaxing Huang, Shijian Lu, and Aoran Xiao. Scale variance minimization for unsupervised domain adaptation in image segmentation. Pattern Recognition, 112:107764, 2021.
  • [18] Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu, and Yanpeng Cao. Uncertainty-aware unsupervised domain adaptation in object detection. IEEE Transactions on Multimedia, 2021.
  • [19] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
  • [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [21] Zhenwei He and Lei Zhang. Multi-adversarial faster-rcnn for unrestricted object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6668–6677, 2019.
  • [22] R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670, 2018.
  • [23] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei Efros, and Trevor Darrell. Cycada: Cycle-consistent adversarial domain adaptation. In International Conference on Machine Learning, pages 1989–1998, 2018.
  • [24] Weixiang Hong, Zhenzhen Wang, Ming Yang, and Junsong Yuan. Conditional generative adversarial network for structured domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1335–1344, 2018.
  • [25] Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, and Ming-Hsuan Yang. Every pixel matters: Center-aware feature alignment for domain adaptive object detector. In European Conference on Computer Vision, pages 733–748. Springer, 2020.
  • [26] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
  • [27] Jiaxing Huang, Dayan Guan, Shijian Lu, and Aoran Xiao. Mlan: Multi-level adversarial network for domain adaptive semantic segmentation. arXiv preprint arXiv:2103.12991, 2021.
  • [28] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Category contrast for unsupervised domain adaptation in visual tasks. arXiv preprint arXiv:2106.02885, 2021.
  • [29] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Cross-view regularization for domain adaptive panoptic segmentation. arXiv preprint arXiv:2103.02584, 2021.
  • [30] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Fsdr: Frequency space domain randomization for domain generalization. arXiv preprint arXiv:2103.02370, 2021.
  • [31] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Rda: Robust domain adaptation via fourier adversarial attacking. arXiv preprint arXiv:2106.02874, 2021.
  • [32] Jiaxing Huang, Dayan Guan, Aoran Xiao, and Shijian Lu. Semi-supervised domain adaptation via adaptive and progressive feature alignment. arXiv preprint arXiv:2106.02845, 2021.
  • [33] Jiaxing Huang, Shijian Lu, Dayan Guan, and Xiaobing Zhang. Contextual-relation consistent domain adaptation for semantic segmentation. In European Conference on Computer Vision, pages 705–722. Springer, 2020.
  • [34] Taekyung Kim, Minki Jeong, Seunghyeon Kim, Seokeon Choi, and Changick Kim. Diversify and match: A domain adaptive representation learning paradigm for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12456–12465, 2019.
  • [35] Jogendra Nath Kundu, Naveen Venkat, R Venkatesh Babu, et al. Universal source-free domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4544–4553, 2020.
  • [36] Jogendra Nath Kundu, Naveen Venkat, Ambareesh Revanur, R Venkatesh Babu, et al. Towards inheritable models for open-set domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12376–12385, 2020.
  • [37] Vinod K Kurmi, Venkatesh K Subramanian, and Vinay P Namboodiri. Domain impression: A source data free domain adaptation method. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 615–625, 2021.
  • [38] Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
  • [39] Rui Li, Qianfen Jiao, Wenming Cao, Hau-San Wong, and Si Wu. Model adaptation: Unsupervised domain adaptation without source data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9641–9650, 2020.
  • [40] Xianfeng Li, Weijie Chen, Di Xie, Shicai Yang, Peng Yuan, Shiliang Pu, and Yueting Zhuang. A free lunch for unsupervised domain adaptive object detection without source data. arXiv preprint arXiv:2012.05400, 2020.
  • [41] Yunsheng Li, Lu Yuan, and Nuno Vasconcelos. Bidirectional learning for domain adaptation of semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6936–6945, 2019.
  • [42] Jian Liang, Dapeng Hu, and Jiashi Feng. Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation. In International Conference on Machine Learning, pages 6028–6039. PMLR, 2020.
  • [43] Jian Liang, Dapeng Hu, Yunbo Wang, Ran He, and Jiashi Feng. Source data-absent unsupervised domain adaptation through hypothesis transfer and labeling transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
  • [44] Hong Liu, Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Qiang Yang. Separate to adapt: Open set domain adaptation via progressive separation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2927–2936, 2019.
  • [45] Yuang Liu, Wei Zhang, and Jun Wang. Source-free domain adaptation for semantic segmentation. arXiv preprint arXiv:2103.16372, 2021.
  • [46] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In International Conference on Machine Learning, pages 97–105, 2015.
  • [47] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pages 136–144, 2016.
  • [48] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Deep transfer learning with joint adaptation networks. In International conference on machine learning, pages 2208–2217. PMLR, 2017.
  • [49] Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2507–2516, 2019.
  • [50] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.
  • [51] Ishan Misra and Laurens van der Maaten. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6707–6717, 2020.
  • [52] Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1520–1528, 2015.
  • [53] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  • [54] Fei Pan, Inkyu Shin, Francois Rameau, Seokju Lee, and In So Kweon. Unsupervised intra-domain adaptation for semantic segmentation through self-supervision. arXiv preprint arXiv:2004.07703, 2020.
  • [55] Pau Panareda Busto and Juergen Gall. Open set domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 754–763, 2017.
  • [56] Xingchao Peng, Ben Usman, Neela Kaushik, Dequan Wang, Judy Hoffman, and Kate Saenko. Visda: A synthetic-to-real benchmark for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 2021–2026, 2018.
  • [57] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  • [58] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
  • [59] Stephan R Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games. In European conference on computer vision, pages 102–118. Springer, 2016.
  • [60] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M Lopez. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3234–3243, 2016.
  • [61] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European conference on computer vision, pages 213–226. Springer, 2010.
  • [62] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. Adversarial dropout regularization. arXiv preprint arXiv:1711.01575, 2017.
  • [63] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6956–6965, 2019.
  • [64] Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3723–3732, 2018.
  • [65] Kuniaki Saito, Shohei Yamamoto, Yoshitaka Ushiku, and Tatsuya Harada. Open set domain adaptation by backpropagation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 153–168, 2018.
  • [66] Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9):973–992, 2018.
  • [67] Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, and Jose M Alvarez. Effective use of synthetic data for urban scene semantic segmentation. In European Conference on Computer Vision, pages 86–103. Springer, 2018.
  • [68] Swami Sankaranarayanan, Yogesh Balaji, Carlos D Castillo, and Rama Chellappa. Generate to adapt: Aligning domains using generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8503–8512, 2018.
  • [69] Swami Sankaranarayanan, Yogesh Balaji, Arpit Jain, Ser Nam Lim, and Rama Chellappa. Learning from synthetic data: Addressing domain shift for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3752–3761, 2018.
  • [70] Prabhu Teja Sivaprasad and Francois Fleuret. Uncertainty reduction for model adaptation in semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), number CONF. IEEE, 2021.
  • [71] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • [72] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in neural information processing systems, pages 1195–1204, 2017.
  • [73] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849, 2019.
  • [74] Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, and Manmohan Chandraker. Learning to adapt structured output space for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7472–7481, 2018.
  • [75] Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, and Manmohan Chandraker. Domain adaptation for structured output via discriminative patch representations. In Proceedings of the IEEE International Conference on Computer Vision, pages 1456–1465, 2019.
  • [76] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167–7176, 2017.
  • [77] Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, and Patrick Pérez. Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2517–2526, 2019.
  • [78] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916, 2014.
  • [79] Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742, 2018.
  • [80] Rongchang Xie, Fei Yu, Jiachao Wang, Yizhou Wang, and Li Zhang. Multi-level domain adaptive learning for cross-domain detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2019.
  • [81] Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, and Xiu-Shen Wei. Exploring categorical regularization for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11724–11733, 2020.
  • [82] Ruijia Xu, Guanbin Li, Jihan Yang, and Liang Lin. Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1426–1435, 2019.
  • [83] Yanchao Yang and Stefano Soatto. Fda: Fourier domain adaptation for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4085–4095, 2020.
  • [84] Mang Ye, Xu Zhang, Pong C Yuen, and Shih-Fu Chang. Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6210–6219, 2019.
  • [85] Hao-Wei Yeh, Baoyao Yang, Pong C Yuen, and Tatsuya Harada. Sofa: Source-data-free feature alignment for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 474–483, 2021.
  • [86] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687, 2(5):6, 2018.
  • [87] Jing Zhang, Zewei Ding, Wanqing Li, and Philip Ogunbona. Importance weighted adversarial nets for partial domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8156–8164, 2018.
  • [88] Jingyi Zhang, Jiaxing Huang, and Shijian Lu. Spectral unsupervised domain adaptation for visual recognition. arXiv preprint arXiv:2106.06112, 2021.
  • [89] Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, and Shijian Lu. Da-detr: Domain adaptive detection transformer by hybrid attention. arXiv preprint arXiv:2103.17084, 2021.
  • [90] Xiaobing Zhang, Haigang Gong, Xili Dai, Fan Yang, Nianbo Liu, and Ming Liu. Understanding pictograph with facial features: End-to-end sentence-level lip reading of chinese. In AAAI, pages 9211–9218, 2019.
  • [91] Yang Zhang, Philip David, and Boqing Gong. Curriculum domain adaptation for semantic segmentation of urban scenes. In Proceedings of the IEEE International Conference on Computer Vision, pages 2020–2030, 2017.
  • [92] Ying Zhang, Tao Xiang, Timothy M Hospedales, and Huchuan Lu. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4320–4328, 2018.
  • [93] Nanxuan Zhao, Zhirong Wu, Rynson W. H. Lau, and Stephen Lin. What makes instance discrimination good for transfer learning? In International Conference on Learning Representations, 2021.
  • [94] Zhedong Zheng and Yi Yang. Unsupervised scene adaptation with memory regularization in vivo. arXiv preprint arXiv:1912.11164, 2019.
  • [95] Zhun Zhong, Liang Zheng, Zhiming Luo, Shaozi Li, and Yi Yang. Invariance matters: Exemplar memory for domain adaptive person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 598–607, 2019.
  • [96] Xinge Zhu, Jiangmiao Pang, Ceyuan Yang, Jianping Shi, and Dahua Lin. Adapting object detectors via selective cross-domain alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 687–696, 2019.
  • [97] Chengxu Zhuang, Alex Lin Zhai, and Daniel Yamins. Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6002–6012, 2019.
  • [98] Yang Zou, Zhiding Yu, Xiaofeng Liu, BVK Kumar, and Jinsong Wang. Confidence regularized self-training. In Proceedings of the IEEE International Conference on Computer Vision, pages 5982–5991, 2019.
  • [99] Yang Zou, Zhiding Yu, BVK Vijaya Kumar, and Jinsong Wang. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In Proceedings of the European Conference on Computer Vision (ECCV), pages 289–305, 2018.