Appearance changes caused by lighting, seasonal, and weather conditions pose a significant challenge for outdoor robots relying on machine learning models for perception. While these models provide high performance in their training domain, visual shifts occurring in the environment can result in significant deviations from the training distribution, severely reducing accuracy during deployment. Commonly, this challenge is partially counteracted by employing additional training methods to render these models invariant to their application domain.
Recent state-of-the-art approaches which address this challenge operate by training deep neural networks within an adversarial domain adaptation (ADA) framework. These approaches are characterised by the optimisation of potentially multiple encoders with the objective of confusing a domain discriminator operating on their output [3, 4, 5], in addition to their main objective. The main intuition behind this framework is that by training the encoder to obtain a domain invariant embedding, we allow the main supervised task to be robust to changes in the application domain.
Recent successes based on adversarial domain adaptation have achieved state-of-the-art performance on toy datasets [3, 6, 7, 8] as well as real-world applications for autonomous driving within changing environmental conditions [9, 5]. However, domains with a significant difference in appearance, such as day and night, continue to present a substantial challenge. We conjecture that the observed change in environmental conditions in many domains of continuous deployment (e.g. in autonomous driving) is to some extent the result of a gradual process which accumulates to produce massive differences over extended periods. This work exploits the incremental changes observed throughout deployment to continuously counteract the domain shift by updating discriminator and encoder incrementally while observing the visual shift (as illustrated in Figure 1).
During domain adaptation training, existing methods rely on data from the source domain to ensure training the discriminator based on a balanced distribution of source and target data to prevent overfitting to the target domain. These approaches therefore require the storage of potentially massive datasets, which can provide a challenge in particular for mobile applications with limited available memory.
Similar to work on synthetic dataset extension [10], we remove this requirement by training a generative adversarial network (GAN) [11] to imitate the marginal encoder feature distribution in the source domain. We empirically demonstrate that domain adaptation via aligning target encoder features with GAN generated samples instead of source domain feature embeddings results in only a minor performance reduction. Crucially, this means that the deployment module is fully independent of the size of the source domain dataset, enabling application on mobile platforms with limited memory resources.
The approach is evaluated both on synthetic and real-world data. An artificial dataset is created with direct control over the number of intermediate domains and the strength of the incremental shifts for illustration and demonstration purposes. The subsequent real-world evaluation focuses on a drivable-path segmentation task for autonomous driving based on segments from the Oxford RobotCar Dataset [12] with different illumination conditions from different times of day.
The contributions of our work are as follows:
- Introduction of a method for incremental unsupervised domain adaptation for platforms deployed in continuously changing environments.
- Presentation of an additional method to remove the requirement of retaining extensive amounts of source data by modelling the feature representation of source domain data with a generative model.
- Quantitative investigation of the influence of dividing the adaptation task into incremental alignment over smaller shifts based on a synthetic toy example.
- Application of the proposed method to the real-world task of drivable-terrain segmentation and proof of feasibility for online application in the context of run-time evaluation on an NVIDIA GPU.
II Related Work
Continuously changing environment appearances have been a long-standing challenge for robot deployment as shifts between training and deployment data can seriously degrade model performance. Considerable efforts have been focused on designing and comparing various feature transformations with the goal of creating representations invariant to environmental change [13]. Other approaches address the problem through retaining multiple experiences [14] or synthesising images between discrete domains [15]. However, it is unclear how these systems can efficiently scale to a continuous shift in the domain distribution.
In recent years, there has been a steady trend towards applying deep networks for various robotics tasks, where early layers act as a feature encoder with a supervised loss for the desired task on the output of the network. Unfortunately, even such powerful models still suffer from the problem of shifts in domain appearance. This has prompted a number of works which try to address this issue [3, 9, 16, 5].
The possibility to directly optimise complete feature representations via backpropagation for domain invariance or target-source mappings has led to significant success of deep architectures in this field. Long et al. [2] focus on minimising the Maximum Mean Discrepancy for the feature distributions of multiple layers of the network architecture. Rozantsev et al. [17] extend in a similar direction and impose a penalty for deviations in the network parameters across domains. Sun et al. [18] align second order statistics of layer activations for source and target domains. Hoffman et al. [9] match the label statistics between the true source and predicted target labels for semantic classification.
Another family of methods consists of adversarial approaches, which rely on training a domain discriminator to classify the domains underlying an encoder’s feature distribution. While adversarial training techniques have been shown to be notoriously unstable and difficult to optimise, there has been a substantial body of work towards improving their stability, including more dominant use of the confusion loss and, more recently, the Wasserstein GAN framework.
All of the above-mentioned works treat the unsupervised domain adaptation problem as a batch transition without exploiting the temporal coherence commonly available to robots in continuous deployment. Continuous refinement has been actively researched in supervised learning for many years (e.g. [20, 21, 22]), yet there has been little work on such methods for unsupervised domain adaptation. One notable exception is the work by Hoffman et al. [23], which addresses the problem with predefined features and focuses on the challenges of aligning to a continuously reshaping target domain. This work seeks to extend the recently developed approach of adversarial domain adaptation to a continuously evolving target domain by capitalising on the perpetual observations made by a robot.
Incremental Adversarial Domain Adaptation addresses the problem of degraded model performance due to continuously shifting environmental conditions. This includes changes caused by weather and lighting occurring in outdoor scenarios. Compared to the regular single-step domain adaptation paradigm, we benefit in applications building on continual deployment through exploitation of the incremental changes that integrate to large domain shifts. Continuously observable lighting or seasonal shifts in outdoor robotics and other applications constitute a prime example for this paradigm.
The approach extends adversarial domain adaptation approaches aiming to facilitate learning a feature encoding which is invariant with respect to the origin domain of its input data. In this way, the method enables the application of a supervised module trained only on source domain data to incoming unsupervised data from the application domain as depicted in Figure 2.
In comparison to existing methods [3, 7, 5] which frame the task of unsupervised domain adaptation as a one-step approach between distinct source and target domains, IADA treats the incoming data as a stream of incrementally changing domains. Exploiting access to data from these incremental changes facilitates alignment over greater overall shifts between the target and source domains. The encoder and discriminator models are updated gradually to enable alignment for each incrementally shifted domain.
Adversarial domain adaptation generally tends to be hyperparameter-search intensive as it is affected, in addition to the adversarial min-max problem, by the potential conflict between the domain invariance objective and the supervised objective. Intuitively, by dividing the domain alignment procedure into smaller incremental shifts, we simplify the overall task, which can minimise the loss of relevant information.
The training procedure is split into two principal segments: offline supervised optimisation on source domain data and the unsupervised domain adaptation procedure, which potentially can be run online during platform deployment as displayed in Figure 2.
Hereinafter, let $\theta_M$ be the parametrisation of module $M$ and $x$ the incoming images. Source and target domains are represented with subscripts $S$ and $T$ respectively. The supervised training procedure optimises the supervised module $M_{sup}$, which produces the predicted label $\hat{y}$, as well as the source domain encoder $E_S$ based on a supervised task (e.g. classification or segmentation as in Section IV).
The parameters of both of these modules remain unchanged during training for domain adaptation, which enables us to keep source performance unaffected (an approach previously suggested for regular ADA). Only the target encoder $E_T$ and discriminator $D$ are trained via their respective objectives $\mathcal{L}_{E_T}$ and $\mathcal{L}_D$ in Equations 1 and 2 to align the target and source encoder feature spaces. Let $f_S = E_S(x_S)$ and $f_T = E_T(x_T)$ respectively denote the feature encoding of source and target images $x_S$ and $x_T$.
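The two objectives referred to above can be illustrated with a minimal sketch. It assumes a binary domain discriminator that outputs the probability of a feature originating from the source domain, a standard binary cross-entropy for the discriminator, and a confusion loss in the common sense of a cross-entropy against a uniform domain label. The function names and this exact formulation are assumptions made for illustration and are not taken from the equations of this work.

```python
import math


def discriminator_loss(p_source, p_target):
    """Binary cross-entropy of a domain discriminator (assumed form).

    p_source / p_target hold the predicted probability of "source" for
    source features and target features respectively.
    """
    loss_source = -sum(math.log(p) for p in p_source) / len(p_source)
    loss_target = -sum(math.log(1.0 - p) for p in p_target) / len(p_target)
    return loss_source + loss_target


def confusion_loss(p_target):
    """Cross-entropy between the discriminator output and a uniform label.

    The target encoder minimises this term; it is smallest when the
    discriminator assigns probability 0.5 to every target feature.
    """
    total = sum(0.5 * math.log(p) + 0.5 * math.log(1.0 - p) for p in p_target)
    return -total / len(p_target)
```

Under these assumptions the confusion loss reaches its minimum of log 2 when the discriminator cannot tell the domains apart, while the discriminator loss decreases as its predictions become more confident and correct.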
The target encoder weights are initialised with parameters from the source encoder trained on the supervised task. These initial parameters are subsequently adapted to align to the currently encountered target data by optimising both the target encoder and discriminator using the objectives in Equations 1 and 2 respectively. Intuitively, this procedure entails using the optimised parameters from the previously encountered target domain as initialisation for adapting to the current domain. The currently encountered unsupervised data is utilised to fill a buffer from which batches are continuously sampled for the domain adaptation training procedure.
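The warm-start and buffering procedure described above can be sketched as follows. This is an illustrative skeleton only: `TargetBuffer`, `adapt_incrementally`, the buffer capacity, the batch size, and the user-supplied `adapt_step` callable (standing in for one joint update of target encoder and discriminator) are hypothetical names and values, not part of the original implementation.

```python
import random
from collections import deque


class TargetBuffer:
    """Fixed-size FIFO buffer holding the most recent unlabelled target samples."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, sample):
        # Once the buffer is full, the oldest sample is dropped automatically.
        self._items.append(sample)

    def sample(self, batch_size):
        # Draw without replacement; never ask for more than is stored.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


def adapt_incrementally(source_params, domain_streams, adapt_step,
                        capacity=4, batch_size=2):
    """Warm-start from the source encoder and keep adapting across domains."""
    target_params = dict(source_params)  # initialise from the source encoder
    buffer = TargetBuffer(capacity)
    for stream in domain_streams:  # domains arrive in order of increasing shift
        for sample in stream:
            buffer.add(sample)
            batch = buffer.sample(batch_size)
            # Parameters carry over from one domain to the next.
            target_params = adapt_step(target_params, batch)
    return target_params
```

Because the parameters are never reset between streams, each incremental domain starts from the solution found for the previous one, which is the property the paragraph above relies on.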
III-A Source Distribution Modelling
As IADA’s benefits apply in the context of continuously deployed platforms, the inherent requirement of ADA-based methods to retain potentially large amounts of source training data can constrain its application. Limitations on the weight and size of platforms often lead to restrictions on computational resources including data storage. To counteract this requirement, we additionally extend our method with a GAN-based approach to mimic the source domain’s feature distribution, thus rendering the approach independent of the amount of source data during the domain adaptation task.
More concretely, we optimise a generator $G$ which maps from $n$-dimensional, normally distributed noise $z$ to approximate the feature distribution in our source domain during the offline training step. While the original GAN framework [11] aims at mimicking natural images, our approach simply aims to imitate the feature encoding of images (displayed in Figure 3). The resulting objectives for generator and discriminator are displayed in Equations 3 and 4. Let $f_G$ denote the generator features generated as $f_G = G(z)$.
Subsequently, during the domain adaptation procedure, the target encoder is optimised to align to the feature distribution of the GAN, whose parameters remain static to model the source domain. Instead of optimising the discriminator to classify between source and target domain, in this scenario it learns to distinguish between synthetically generated source features and actual target features encoding target images. Target encoder and discriminator are optimised towards their respective objectives in Equations 5 and 6.
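The way generated features replace stored source data during adaptation can be sketched as below. The linear generator is a deliberately simple, hypothetical stand-in for a trained and frozen GAN generator, and `make_generator`, `discriminator_batch`, and the label convention (1 for synthetic source features, 0 for target features) are assumptions for illustration.

```python
import random


def make_generator(noise_dim, feature_dim, seed=0):
    """Return a frozen, hypothetical linear generator mapping noise to features."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
               for _ in range(feature_dim)]

    def generate(noise):
        # The weights are captured once and never updated during adaptation.
        return [sum(w * z for w, z in zip(row, noise)) for row in weights]

    return generate


def discriminator_batch(generate, target_features, noise_dim, rng):
    """Pair every target feature with one generated source-like feature."""
    batch = []
    for feature in target_features:
        noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        batch.append((generate(noise), 1))  # label 1: synthetic source feature
        batch.append((feature, 0))          # label 0: real target feature
    return batch
```

The balanced batch is built from noise samples alone, so no source images or source features need to be kept on the platform at deployment time.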
Similar to IADA, we utilise all models, which are trained in the source domain, as initialisation for training in the target domain. For SDM, this procedure additionally includes the discriminator. Lastly, the deployment setup is equivalent to IADA.
Our evaluation is split into two parts: we first investigate a toy scenario with artificially induced domain shift for the purpose of visualisation and clarification, then we demonstrate performance gains in a continuous deployment scenario for drivable-path segmentation for autonomous mobility.
The evaluation compares IADA against its one-step counterpart ADA, and furthermore investigates the influence of source domain modelling based on Section III-A. The evaluation metric depends on the supervised task in the respective target domains: classification accuracy for the toy example and mean average precision for the drivable-path segmentation task.
While ADA only utilises the final target domain, IADA has access to all incremental domains. To evaluate whether IADA’s advantages simply stem from the reliance on a larger dataset, we additionally introduce ADA Union. The method combines all target domains into a single dataset and performs regular ADA with respect to this union over all target domains.
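The difference between the three compared settings lies only in which target data each adaptation step sees. A small sketch makes this explicit; the function name and the string identifiers are hypothetical.

```python
def adaptation_schedule(method, target_domains):
    """Return the target datasets each method adapts to, one list per step."""
    if method == "ADA":
        # A single adaptation step towards the final target domain only.
        return [[target_domains[-1]]]
    if method == "ADA Union":
        # A single adaptation step towards the pooled target domains.
        return [list(target_domains)]
    if method == "IADA":
        # One adaptation step per incrementally shifted domain, in order.
        return [[domain] for domain in target_domains]
    raise ValueError(f"unknown method: {method}")
```

For domains `["t1", "t2", "t3"]`, ADA and ADA Union each perform one step (on `t3` and on all three domains respectively), whereas IADA performs three consecutive steps.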
IV-A Toy Example: Incrementally Transformed MNIST
To quantify the benefits of IADA in relation to the strength of domain shifts and the number of intermediate domains, we first evaluate the approach in a scenario based on increasingly, synthetically deformed versions of the popular MNIST dataset.
We create additional, transformed copies of the original dataset with height-rescaled digits of between 0.9 and 0.5 times the original height, which are visualised in Figure 4. These synthetically transformed domains enable us to create a scenario with full control over the underlying domain shift and ensure that the occurring changes can be observed and utilised for domain adaptation in arbitrary detail.
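A height rescaling of this kind can be sketched in a few lines. The sketch assumes nearest-neighbour resampling of the image rows and vertical centring on a zero-padded canvas of the original size; neither detail is specified in the text, and `rescale_height` is a hypothetical helper operating on images given as lists of rows.

```python
def rescale_height(image, scale):
    """Shrink the content of `image` vertically, keeping the canvas size."""
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must lie in (0, 1]")
    height, width = len(image), len(image[0])
    new_height = max(1, round(height * scale))
    # Nearest-neighbour resampling: pick the source row for each output row.
    resized = [image[min(height - 1, int(row / scale))]
               for row in range(new_height)]
    # Centre the shrunken digit vertically and pad with background rows.
    top = (height - new_height) // 2
    bottom = height - top - new_height
    padded = [[0] * width for _ in range(top)]
    padded += [list(row) for row in resized]
    padded += [[0] * width for _ in range(bottom)]
    return padded
```

Applying the function with factors between 0.9 and 0.5 to every image of the dataset would yield a family of increasingly deformed copies of the kind described above.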
We employ a Network-in-Network-like architecture [24] with exponential linear activation functions [25], splitting after the last hidden layer and applying a discriminator with 2 hidden layers of 512 neurons each. All parameters before the split are duplicated for source and target encoders while the supervised module consists of the last fully connected layer. The adversarial loss is weighted by a factor of 0.001 for domain adaptation as well as for training the GAN as part of the source domain modelling step.
Table I shows the target domain classification accuracy of 1-step adaptation methods against their incremental counterparts which continue optimising the target encoder across domains with incrementally increasing domain shift. Furthermore, we test the methods in combination with GAN-based Source Domain Modelling (SDM) introduced in Section III-A.
As displayed in Table I, all domain adaptation approaches outperform pure source-optimised models, while incremental domain adaptation provides additional benefits over regular ADA. Utilising the union of all target domains as target domain for regular ADA increases model accuracy in the final target domain only slightly above using only one target domain and still performs worse than IADA. Finally, the SDM variants of all approaches only result in minor performance reductions, while reducing memory requirements significantly.
To investigate classification accuracy as a function of the number of available intermediate domains, the MNIST digits are rescaled further to 0.3 of the original height and evaluated with varying numbers of equally spread intermediate domains with IADA and IADA SDM. We chose to increase the deformation in comparison to the earlier experiment to increase the complexity of the domain adaptation task and enable evaluating a wider range of sub-domain discretisations.
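Equally spread intermediate domains correspond to equally spaced rescaling factors between the source (factor 1.0) and the final target. The helper below is a hypothetical illustration; in particular, counting the final target domain among the `num_domains` is an assumption.

```python
def intermediate_scales(final_scale, num_domains):
    """Equally spaced height factors ending at `final_scale`."""
    if num_domains < 1:
        raise ValueError("at least one target domain is required")
    step = (1.0 - final_scale) / num_domains
    # The source domain (factor 1.0) is excluded; the last entry is the target.
    return [1.0 - step * (index + 1) for index in range(num_domains)]
```

With `final_scale=0.5` and five domains this reproduces, up to floating-point rounding, the factors 0.9 to 0.5 of the earlier experiment; with `final_scale=0.3` the number of domains can be varied freely.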
Separating larger shifts into incremental steps as displayed in Figure 5 enables us to address the problem with a curriculum of easier tasks. Above a certain threshold, however, target performance remains consistent with a further increase in the number of target domains. For the domain adaptation from MNIST to its rescaled copy, the benefits of incremental domain adaptation saturate at around 10 to 20 intermediate domains. However, more complex transformations may benefit from an even finer division into incremental steps.
IV-B Free Space Segmentation
An area of active research in autonomous driving is the detection of traversable terrain. Especially when utilising images as input, methods often rely on collecting data in all possible deployment domains and weather conditions.
We evaluate IADA in this context for a drivable-path segmentation method based on segments of the Oxford RobotCar dataset [12]. The employed path segmentation labels are generated in a self-supervised setting based on [26]. The dataset consists of approximately hour-long driving sessions from different days collected over the course of a year. Based on the nature of the dataset, we approximate the scenario of continuous application by picking five datasets to represent different daylight conditions from morning to evening and train on a labelled source dataset based on morning data as seen in Figure 6.
The resulting 5 intermediate domains were chosen to represent incremental change in lighting conditions and serve as a proxy for the online deployment scenario. Each domain consists of about 2000 images, rescaled for the evaluation to a size of 128 by 320 pixels. Pixel-wise segmentation labels for training are available only for the source domain, while the approach utilises test labels for the evaluation in all domains.
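To give a sense of scale for the storage discussion, the raw size of one such domain can be estimated from the figures above. The estimate assumes uncompressed images with three colour channels and one byte per value, which the text does not state; `raw_image_bytes` is a hypothetical helper.

```python
def raw_image_bytes(num_images, height, width, channels=3, bytes_per_value=1):
    """Uncompressed size of an image set in bytes (assumed RGB, 8 bit)."""
    return num_images * height * width * channels * bytes_per_value


per_domain = raw_image_bytes(2000, 128, 320)
print(f"{per_domain / 2**20:.1f} MiB per domain")  # 234.4 MiB per domain
```

The estimate grows linearly with the number of retained images, whereas a feature generator has a fixed size independent of the dataset.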
For all segmentation tasks, we employ an adaptation of the ENet [27] architecture, which presents a compromise between model performance and size. The architecture focuses on strong segmentation accuracy as well as reasonable computational requirements, which makes it a strong contender for online deployment on mobile platforms. For the discriminator, we split the ENet architecture just before the upsampling stages (see [27]) and employ an additional 4-layer convolutional discriminator. Similar to Section IV-A, we duplicate all parameters before the architecture split to be utilised as source and target encoders.
The results for drivable-path segmentation are presented in Table II. Similar to the application on the synthetic domain shift dataset in Section IV-A, IADA outperforms one-step domain adaptation. ADA with respect to the union over all incremental target domains is more accurate than with only the final domain but not as accurate as IADA. Again, SDM slightly reduces the target performance but renders the storage of significant source datasets unnecessary.
In real-world scenarios, we cannot ensure smoothness over the appearance changes and the turning-on of street lights for the final target domain indeed represents a step change in our environment. It is to be expected that more continuous domain shifts would increase the advantages of IADA as displayed in the context of synthetic data in Section IV-A.
In comparison to the toy example, the combination of all target domains only leads to minor improvement over the regular application of ADA with only the final target domain. The final target domain’s instantaneous change in lighting due to the switching-on of the street lights leads to significant differences to previous target domains. This scenario renders domain adaptation to the union over all target domains less efficient and emphasises the importance of an incremental method, which focuses on the current target domain.
With computation times of approximately 26 minutes for the adaptation to a new incremental target domain on an NVIDIA GeForce GTX Titan Xp GPU, we can potentially deploy the system on vehicles to adapt to the currently encountered domain at a rate of about 55 times a day in continuous deployment. The extension with source domain modelling reaches computation times of 29 minutes resulting in nearly 50 updates per day.
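The update rates quoted above follow directly from the adaptation times, assuming back-to-back adaptation runs over a full 24 hours; `updates_per_day` is a hypothetical helper used only for this check.

```python
def updates_per_day(minutes_per_update):
    """Number of consecutive adaptation runs that fit into 24 hours."""
    return 24 * 60 / minutes_per_update


print(f"IADA:     {updates_per_day(26):.1f} updates per day")  # 55.4
print(f"IADA SDM: {updates_per_day(29):.1f} updates per day")  # 49.7
```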
While IADA’s principal benefits are based on continuous access to the incremental shifts between source and target domains, the evaluation for drivable-path segmentation with our offline datasets builds on a sequence of distinct target domains extracted from the Oxford RobotCar Dataset. The approach can be extended easily to more continuous alignment to the online perceived data domain via the utilisation of sliding window sampling during deployment. Interestingly, it was shown in Section IV-A that the benefits of dividing the target domains further for IADA can saturate when the intermediate domains are becoming increasingly similar.
IADA relies on access to the incremental shifts in the appearance of our environment. With limited access or step-wise changes in the perceived environment the approach degrades to regular adversarial domain adaptation. In particular, this paradigm becomes visible in our segmentation datasets where the turning-on of the streetlights leads to an instantaneous change in the appearance of the environment.
However, this instantaneous domain shift caused by the final domain’s lighting change further emphasises the benefits of an incremental approach over simply using the union over all target domains for regular domain adaptation. IADA significantly outperforms this method as it specifically optimises for the currently relevant target domain.
All experimental results noted in our work are based on the confusion loss for domain adaptation. An adaptation of the Wasserstein GAN framework [28] for domain adaptation leads to (on average) slightly more stable training and a statistically insignificant improvement in performance. However, we focused on the confusion loss formulation as it leads to a significantly shorter training duration, given the additional critic training rounds required for the WGAN framework.
While increased computational effort might not be critical for server-side computation, it can limit applications of embedded systems. In the context of cloud computing or larger platforms with significant data storage volumes, the minor accuracy loss can be prevented when applying the original formulation for IADA in Section III.
VI Conclusion and Future Work
We present a method for addressing the task of domain adaptation in an incremental fashion, adapting to a continuous stream of changing target domains. Furthermore, we introduce an approach for source domain modelling, training a GAN to approximate the feature distribution in the source domain to render the domain adaptation step independent of retaining large amounts of source data. Both methods are first evaluated on synthetically rescaled versions of MNIST digits, for illustration purposes and with full control over the number of intermediate domains. Furthermore, we empirically demonstrate their performance on the real-world task of drivable-path segmentation in the context of autonomous driving.
The field of continual training during deployment provides many possible benefits as models can be adapted to the currently encountered environment and learn from data unavailable during offline training. However, the approach also opens up new security challenges. The well-known problem of perpetrators introducing adversarial samples to the system could lead to not only corruption of the current prediction but prolonged distortion of the model. This area represents an essential direction for further research on defending against adversarial examples. Further indispensable extensions of this work include addressing the additional problem of catastrophic forgetting in lifelong-learning scenarios. This direction has the potential to further reduce computational requirements as it will discard the necessity to readapt to once encountered target domains.
The authors would like to acknowledge the support of the UK’s Engineering and Physical Sciences Research Council (EPSRC) through the Programme Grant EP/M019918/1 and the Doctoral Training Award (DTA) as well as the support of the Hans-Lenze-Foundation. Additionally, the donation from NVIDIA of the Titan Xp GPU used in this work is gratefully acknowledged.
- [1] Sivalogeswaran Ratnasingam and Steve Collins. Study of the photodetector characteristics of a camera for color constancy in natural scenes. JOSA A, 27(2):286–294, 2010.
- [2] Mingsheng Long, Jianmin Wang, and Michael I. Jordan. Deep transfer learning with joint adaptation networks. CoRR, abs/1605.06636, 2016.
- [3] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky, Urun Dogan, Marius Kloft, Francesco Orabona, and Tatiana Tommasi. Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research, 17:1–35, 2016.
- [4] Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, and Mario Marchand. Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446, 2014.
- [5] Markus Wulfmeier, Alex Bewley, and Ingmar Posner. Addressing appearance change in outdoor robotics with adversarial domain adaptation. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2017.
- [6] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. Domain separation networks. In Advances in Neural Information Processing Systems, pages 343–351, 2016.
- [7] Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Xingchao Peng, Sergey Levine, Kate Saenko, and Trevor Darrell. Towards adapting deep visuomotor representations from simulated to real environments. arXiv preprint arXiv:1511.07111, 2015.
- [8] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. arXiv preprint arXiv:1702.05464, 2017.
- [9] Judy Hoffman, Dequan Wang, Fisher Yu, and Trevor Darrell. FCNs in the wild: Pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649, 2016.
- [10] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from simulated and unsupervised images through adversarial training. CoRR, abs/1612.07828, 2016.
- [11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [12] Will Maddern, Geoff Pascoe, Chris Linegar, and Paul Newman. 1 Year, 1000km: The Oxford RobotCar Dataset. The International Journal of Robotics Research (IJRR), 36(1):3–15, 2017.
- [13] Stephanie Lowry, Niko Sünderhauf, Paul Newman, John J Leonard, David Cox, Peter Corke, and Michael J Milford. Visual place recognition: A survey. IEEE Transactions on Robotics, 32(1):1–19, 2016.
- [14] Winston Churchill and Paul Newman. Experience-based Navigation for Long-term Localisation. The International Journal of Robotics Research (IJRR), 2013.
- [15] Peer Neubert, Niko Sünderhauf, and Peter Protzel. Appearance change prediction for long-term navigation across seasons. In Mobile Robots (ECMR), 2013 European Conference on, pages 198–203. IEEE, 2013.
- [16] Sungeun Hong, Woobin Im, Jongbin Ryu, and Hyun S Yang. SSPP-DAN: Deep domain adaptation network for face recognition with single sample per person. arXiv preprint arXiv:1702.04069, 2017.
- [17] Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. Beyond sharing weights for deep domain adaptation. arXiv preprint arXiv:1603.06432, 2016.
- [18] Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. CoRR, abs/1612.01939, 2016.
- [19] Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. In NIPS 2016 Workshop on Adversarial Training. In review for ICLR, volume 2016, 2017.
- [20] Aniruddha Kembhavi, Behjat Siddiquie, Roland Miezianko, Scott McCloskey, and Larry S Davis. Incremental multiple kernel learning for object recognition. In Computer Vision, 2009 IEEE 12th International Conference on, pages 638–645. IEEE, 2009.
- [21] Vidit Jain and Erik Learned-Miller. Online domain adaptation of a pre-trained cascade of classifiers. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 577–584. IEEE, 2011.
- [22] Alex Bewley, Vitor Guizilini, Fabio Ramos, and Ben Upcroft. Online self-supervised multi-instance segmentation of dynamic objects. In Robotics and Automation (ICRA), 2014 IEEE International Conference on, pages 1296–1303. IEEE, 2014.
- [23] Judy Hoffman, Trevor Darrell, and Kate Saenko. Continuous manifold based adaptation for evolving visual domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 867–874, 2014.
- [24] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
- [25] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.
- [26] Dan Barnes, William P. Maddern, and Ingmar Posner. Find your own way: Weakly-supervised segmentation of path proposals for urban autonomy. CoRR, abs/1610.01238, 2016.
- [27] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, and Eugenio Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147, 2016.
- [28] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.