1 Introduction
Automated plant identification aims to recognize plant species from images. Machine learning models typically require massive labeled training data, but this requirement cannot be met in plant identification because labels for real-world plant images are sparse. Therefore, we propose to transfer knowledge from an existing auxiliary labeled herbarium domain to the field photo domain with limited or no labels. However, due to the phenomenon of data bias or domain shift [11], classification models do not generalize well from an existing herbarium domain to a novel field photo domain.
Domain adaptation (DA) has been proposed to leverage knowledge from an abundant labeled source domain to learn an effective predictor for the target domain with few or no labels, while mitigating the domain shift problem [16, 19, 17, 20]. In this paper, we focus on unsupervised domain adaptation (UDA), where the target domain has no labels. Since the field photo domain has fewer classes, and its classes are a subset of the classes of the source herbarium domain, we investigate partial domain adaptation (PDA) for the PlantCLEF 2020 Challenge.
Recently, deep neural network methods have been widely used in the domain adaptation problem. Notably, adversarial learning, embedded in deep neural networks, has proven effective at learning feature representations that minimize the discrepancy between the source and target domains [14, 9]. Inspired by the generative adversarial network (GAN) [6], adversarial learning also contains a feature extractor and a domain discriminator. The domain discriminator learns to distinguish the source domain from the target domain, while the feature extractor learns domain-invariant representations to fool the domain discriminator [10, 18, 9]. The target domain risk (the error of the target domain) is expected to be minimized via minimax optimization. Cao et al. presented adversarial learning for PDA, which alleviates negative transfer by down-weighting the data of outlier source classes when training the source classifier and the domain adversary, while positive transfer is improved by matching the feature distributions in the shared label space [2]. Similarly, the example transfer network was proposed to jointly learn domain-invariant representations and a progressive weighting scheme that quantifies the transferability of source examples. The model improves positive transfer through relevant examples and mitigates negative transfer by identifying irrelevant examples [3].
Although many methods have been proposed for partial domain adaptation, they still suffer from two challenges: (1) the models are evaluated on small datasets and transfer less well to large-scale datasets, and (2) the feature consistency of the two domains is ignored.
To address the aforementioned challenges, we aggregate three different loss functions in one framework: source domain classification loss, adversarial learning loss, and feature consistency loss to reduce the discrepancy between the two domains. Moreover, our model is evaluated on a large-scale plant identification dataset to better estimate its generalization ability.
Our contributions are threefold:

We propose a novel adversarial consistent learning network (ACL) for PDA, which adversarially minimizes the domain discrepancy between the source and target domains and maintains domain-invariant features;

The proposed adversarial learning loss and feature consistency loss can distinguish the target domain from the source domain and preserve the fine-grained feature transition between the two domains;

We impose shared category selection to filter out the irrelevant categories in the source domain. By down-weighting the irrelevant categories in the source domain, we can reduce negative transfer from the source domain to the target domain.
Experimental results show that ACL achieves higher classification accuracy than several baseline methods and yields promising results on the PlantCLEF 2020 Challenge.
2 Dataset
PlantCLEF 2020 is the large-scale dataset of the PlantCLEF 2020 task [5], organized in the context of the LifeCLEF 2020 challenge [8]. Fig. 1 shows some challenging images in this dataset. The herbarium domain contains 320,750 images of 997 species, and the number of images per species is unbalanced. The training set consists of herbarium sheets, whereas the test set is composed of field pictures. The validation set consists of two domains: herbarium_photo_associations and photos. The herbarium_photo_associations domain includes 1,816 images from 244 species. This domain contains both herbarium sheets and field pictures for a subset of species, which enables learning a mapping between the herbarium sheets domain and the field pictures domain. The photo domain has 4,482 images from 375 species; its images are plant pictures taken in the field, which is similar to the test dataset. The test dataset contains 3,186 unlabeled images. Due to the significant difference between herbarium sheets and real photos, it is extremely difficult to identify the correct class.
Table 1: Statistics of the PlantCLEF 2020 dataset.
Domain | Number of Samples | Number of Classes
Herbarium (H) | 320,750 | 997
Herbarium_photo_associations (A) | 1,816 | 244
Photo (P) | 4,482 | 375
Test (T) | 3,186 | -
We exclude the classId “108335” from the photo domain because the herbarium domain, which supplies the source classes, does not contain this category. As a result, eight images are excluded from the photo domain. The statistics of the PlantCLEF 2020 dataset are listed in Tab. 1.
3 Methods
3.1 Motivation
Previous partial domain adaptation methods [2, 3] evaluated their models on small datasets (e.g., Office-31), and these models generalize less well to large-scale datasets. In addition, the feature consistency of the source and target domains is not well addressed in PDA.
In this paper, we present our approach: adversarial consistent learning (ACL) on partial domain adaptation. It aligns the feature distributions of the source and target domains in the shared categories and guarantees feature consistency across the two domains. Importantly, ACL identifies irrelevant source categories automatically by down-weighting class importance. Evaluation on the large-scale PlantCLEF 2020 challenge dataset shows the generalizability of our model.
3.2 Problem and notation
For unsupervised domain adaptation, we are given a source domain $\mathcal{D}_S = \{(x_S^i, y_S^i)\}_{i=1}^{N_S}$ of $N_S$ labeled samples across the set of categories $C_S$ and a target domain $\mathcal{D}_T = \{x_T^j\}_{j=1}^{N_T}$ of $N_T$ samples without any labels ($Y_T$ is unknown) across the set of categories $C_T$. For partial domain adaptation, the number of categories in $C_T$ is less than the number of categories in $C_S$, and $C_T \subset C_S$. The samples $X_S$ and $X_T$ obey the marginal distributions $P(X_S)$ and $P(X_T)$. The conditional distributions of the two domains are denoted as $P(Y_S \mid X_S)$ and $P(Y_T \mid X_T)$. Due to the discrepancy between the two domains, the distributions are assumed to be different, i.e., $P(X_S) \neq P(X_T)$ and $P(Y_S \mid X_S) \neq P(Y_T \mid X_T)$. Our ultimate goal is to learn a classifier $f$ under a feature extractor $G$ that selects the shared categories between the two domains and ensures a lower generalization error in the target domain.
3.3 Deep features extraction
To avoid the large computational cost of training on the large-scale PlantCLEF 2020 challenge datasets, we instead focus on deep features from pretrained models. Following Zhang and Davison [17], the deep features are extracted from the last fully connected layer of the pretrained model. Each feature vector corresponds to one plant image, so the source domain and the target domain are each represented by the feature vectors extracted from their images.
3.4 Source classifier
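The task in the source domain is trained using the typical cross-entropy loss in the following equation:
$$\mathcal{L}_S = -\frac{1}{N_S} \sum_{i=1}^{N_S} \sum_{c=1}^{|C_S|} y_{S,c}^{i} \, \log f_c\big(G(x_S^i)\big) \tag{1}$$
where $y_{S,c}^{i} \in [0, 1]$ is the ground-truth probability of class $c$ for the $i$-th element of the source domain $S$, $N_S$ is the number of source samples, $|C_S|$ is the number of source categories, $f$ is the classifier in Fig. 2 applied to the output of the feature extractor $G$, and $f_c(G(x_S^i))$ is the predicted probability of class $c$.
3.5 Adversarial domain loss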
In general adversarial learning, the system learns a mapping from the source domain to the target domain. Given the feature representation produced by the feature extractor, we can learn a discriminator that distinguishes the two domains using the following loss function:
(2)  
However, Eq. 2 only guarantees that the source domain data will be close to the target data (the source-to-target direction); it does not ensure that the target data will be close to the source data. We hence introduce another mapping from the target domain to the source domain in Eq. 3 and train it with the same adversarial loss as in Eq. 2.
(3) 
For the source-to-target mapping, the source domain and the target domain each receive their own domain label, which corresponds to Domain Label 1 in Fig. 2. Meanwhile, for the target-to-source mapping, new labels are assigned to the source domain and the target domain, which corresponds to Domain Label 2 in Fig. 2. Therefore, we define the adversarial learning loss as:
(4)  
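The two-direction adversarial loss described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a binary cross-entropy domain loss, assumes the first labeling is source = 1 and target = 0 with the second labeling swapped, and assumes the two directional losses are simply summed. The function names are hypothetical.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def domain_loss(src_probs, tgt_probs, src_label, tgt_label):
    """Mean BCE of discriminator outputs under one domain labeling."""
    losses = [bce(p, src_label) for p in src_probs]
    losses += [bce(p, tgt_label) for p in tgt_probs]
    return sum(losses) / len(losses)


def bidirectional_adversarial_loss(d1_src, d1_tgt, d2_src, d2_tgt):
    """Sum of the two directional domain losses (assumed combination).

    d1_*: discriminator outputs under the first labeling
          (assumed: source = 1, target = 0).
    d2_*: discriminator outputs under the swapped labeling
          (assumed: source = 0, target = 1).
    """
    loss_first = domain_loss(d1_src, d1_tgt, src_label=1, tgt_label=0)
    loss_second = domain_loss(d2_src, d2_tgt, src_label=0, tgt_label=1)
    return loss_first + loss_second
```

When both discriminators output 0.5 for every sample, that is, when they cannot tell the domains apart, each directional loss equals ln 2.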
3.6 Feature consistency loss
To encourage the source domain and target domain information to be preserved during adversarial learning, we propose a feature consistency loss in our model. Details of the feature reconstruction layers are shown in Fig. 2; the reconstruction layers directly follow the feature extractor in the shared layers, and they aim to reconstruct the extracted features and maintain the invariant features during the conversion process. The feature consistency loss is defined as:
(5)  
The loss is built on the mean squared error, which calculates the difference between the true features and the reconstructed features.
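As a minimal sketch of a feature consistency loss of this kind, the snippet below computes the mean squared error between input features and their reconstructions. Summing the source and target terms is an assumption made for illustration, and the function names are hypothetical.

```python
def mse(features, reconstructed):
    """Mean squared error between two equally shaped lists of feature vectors."""
    if len(features) != len(reconstructed):
        raise ValueError("inputs must contain the same number of vectors")
    total, count = 0.0, 0
    for f_row, r_row in zip(features, reconstructed):
        if len(f_row) != len(r_row):
            raise ValueError("feature vectors must have the same length")
        for f, r in zip(f_row, r_row):
            total += (f - r) ** 2
            count += 1
    return total / count


def feature_consistency_loss(src, src_rec, tgt, tgt_rec):
    """Assumed form: source reconstruction error plus target reconstruction error."""
    return mse(src, src_rec) + mse(tgt, tgt_rec)
```

A perfect reconstruction gives a loss of zero, so minimizing this term pushes the shared layers to keep the information needed to recover the original features.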
3.7 Shared categories selection
In PDA, the set of target domain labels is a subset of the source domain labels, i.e., $C_T \subset C_S$. In the PlantCLEF challenge, the irrelevant label set $C_S \setminus C_T$ (source labels that do not appear in the target domain) is far larger than the target label set. If we use all elements of the source domain distribution to match the target domain distribution, it will cause negative transfer, since the target domain will also be forced to match the irrelevant labels. Therefore, it is important to identify the shared categories between the source and target domains.
To address this challenge, we reweight the source domain label set by reducing the influence of the irrelevant label set. During training, we obtain the predicted probabilities for the target domain samples, which give a probability for each source label. The set of irrelevant source labels and the target label set are disjoint, and the target data are significantly dissimilar to the source data in the irrelevant label set. Therefore, the probability of the irrelevant categories should be sufficiently small and can be ignored. We then define the weight vector as:
(6) 
where the weight vector has one entry per source category. The irrelevant categories will have a much smaller weight than the shared categories. We then suppress any element of the weight vector that is less than a sufficiently small number. By reducing the weight of the irrelevant categories, the shared categories are emphasized and negative transfer is mitigated. The weight vector is applied in both the source classifier and the domain discriminator over the source domain data, as shown in the following objective function.
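The reweighting described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' exact formula: the weights are taken as the average of the target predictions per source class, they are normalized by their maximum (a convention borrowed from partial adversarial domain adaptation [2]), small weights are set to zero, and the threshold value 1e-3 is hypothetical.

```python
import math


def class_weights(target_probs, threshold=1e-3):
    """Average target predictions per source class, normalize, suppress small ones.

    target_probs: one probability vector over the source classes per target sample.
    threshold: hypothetical cut-off below which a class weight is set to zero.
    """
    n = len(target_probs)
    num_classes = len(target_probs[0])
    avg = [sum(row[c] for row in target_probs) / n for c in range(num_classes)]
    top = max(avg)
    weights = [a / top for a in avg]  # largest weight becomes 1.0
    return [w if w >= threshold else 0.0 for w in weights]


def weighted_cross_entropy(source_probs, source_labels, weights, eps=1e-12):
    """Mean cross-entropy over source samples, scaled by the weight of the true class."""
    total = 0.0
    for probs, label in zip(source_probs, source_labels):
        total += -weights[label] * math.log(max(probs[label], eps))
    return total / len(source_labels)
```

A source sample whose class never receives probability mass from the target predictions gets weight zero and therefore does not contribute to the source classification loss.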
3.8 Overall objective
We combine the three aforementioned loss functions to formalize our objective function:
(7)  
where the two tradeoff parameters balance the different loss functions. Our model ultimately minimizes the difference during the transition from the source domain to the target domain and from the target domain to the source domain. Meanwhile, it maximizes the ability to distinguish the two domains.
3.9 Gradients of shared layers
The shared layers consist of the feature extractor and the feature reconstruction layers. The feature extractor contains two dense layers, a ReLU activation layer, and a dropout layer. The dense layers have 1000 and 997 units, respectively, and the dropout rate is 0.5. The feature reconstruction layers consist of a ReLU activation layer, a dropout layer, and a dense layer with 1000 units. The shared layers are jointly optimized by the source classification loss, the adversarial domain loss, and the feature consistency loss.
The shared encoder, the class label classifier, the domain label predictor, and the feature consistency regressor each have their own output and parameters. The shared layers are therefore optimized by the gradients of all three losses, and the parameters in the shared layers are updated by the following equation:
(8)  
The update uses a learning rate and the adaptation factor of the gradient reversal layer (GRL) [4].
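A hedged sketch of such an update is given below. It assumes that the adversarial gradient enters with a reversed sign scaled by the adaptation factor, as in standard gradient reversal training, and that the other two gradients are added unchanged. The schedule in `grl_factor` is the commonly used one from the gradient reversal literature and is not taken from this paper; all names are hypothetical.

```python
import math


def grl_factor(progress, gamma=10.0):
    """Adaptation factor rising smoothly from 0 toward 1 as training progress goes 0 -> 1.

    Assumed schedule: 2 / (1 + exp(-gamma * progress)) - 1.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def update_shared(theta, grad_cls, grad_adv, grad_con, lr, lam):
    """One gradient step on the shared parameters.

    grad_cls: gradient of the source classification loss.
    grad_adv: gradient of the adversarial domain loss (sign-reversed, scaled by lam).
    grad_con: gradient of the feature consistency loss.
    """
    return [
        t - lr * (g_cls - lam * g_adv + g_con)
        for t, g_cls, g_adv, g_con in zip(theta, grad_cls, grad_adv, grad_con)
    ]
```

Early in training the factor is close to zero, so the shared layers are driven mostly by the classification and consistency gradients; the adversarial signal is phased in as training progresses.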
3.10 Theoretical Analysis
We now formalize the error bound of our model. The ACL model is trained with both the labeled source domain and the unlabeled target domain. The error over the source domain and the target domain in our model is then formally written as:
(9) 
where the target-domain term uses the predicted labels of the target domain. The two terms denote the expected risk over the source domain and the target domain with respect to the ground truth labels and the predicted labels, respectively, measured with the L1 norm.
During training, we expect this error to be close to the error that evaluates the classifier with the true target domain labels. The smaller the difference between these two errors, the better the model performs and the more the discrepancy between the two domains is reduced. Existing domain adaptation theory shows that the risk in the target domain can be minimized by bounding the source risk and the discrepancy between the source and target domains [1]. Therefore, the generalization error bound of our model is given in the following lemma.
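Lemma 1. Let $h$ be a hypothesis in a class $\mathcal{H}$. Then, following [1],
$$R_T(h) \le R_S(h) + \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) + \lambda \tag{10}$$
where $R_S(h)$ and $R_T(h)$ are the risks of $h$ on the training (source) and test (target) data, $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)$ is the divergence of the training and test data in the hypothesis space $\mathcal{H}$, $\lambda = R_S(h^*) + R_T(h^*)$ is the adaptability, which quantifies the error of the ideal hypothesis on the training and test data and should be small, and $h^*$ is the optimal hypothesis obtained by minimizing the joint error in Eq. 11.
$$h^* = \arg\min_{h \in \mathcal{H}} \big( R_S(h) + R_T(h) \big) \tag{11}$$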
In Lemma 1, the generalization bound of our model consists of three terms: the training data error, the data discrepancy, which is estimated by the disagreement of hypotheses in the hypothesis space, and the adaptability of the ideal joint hypothesis. In the ACL model, the first term is measured by Eq. 1. The domain discrepancy is assessed by the adversarial learning loss and the feature consistency loss. Furthermore, ACL finds the ideal hypothesis and reduces the training error in each iteration. Hence, our model can find a minimal bound for the two domains. In other words, ACL can implicitly minimize the target domain risk, the domain discrepancy, and the adaptability of the true hypothesis in terms of the hypothesis space.
4 Experiments
4.1 Implementation details
As mentioned above, the deep features are extracted from the last fully connected layer [15, 17]. Each feature vector corresponds to one plant image. Therefore, the feature representation of each domain, namely herbarium (H), herbarium_photo_associations (A), photo (P), and test (T), has one feature vector per image in that domain (Tab. 1). In the experiment, our task is to reduce the error in the target domain (real-world plant images), i.e., the photo domain or the test domain, so our evaluation focuses on domain P and domain T. Since herbarium_photo_associations (A) is important for bridging the two domains, we include domain A in the training procedure to form a new source domain, H + A, which consists of domain herbarium (H) and domain A. We then train the model on these extracted feature vectors. In Tab. 2, H → P denotes learning knowledge from domain H and applying it to domain P.
The parameters of ACL are first tuned based on the performance on domain P, while the model is trained with the H + A domain. We then apply these parameters to domain T and submit the predictions to the challenge for evaluation. Our implementation is based on Keras. The parameter settings cover the two tradeoff parameters and the learning rate, with a batch size of 128, 1000 iterations, and the Adam optimizer. The details of the layers are shown in Fig. 3.
We also compare our results with two domain adaptation methods: DANN [4] and ADDA [14]. In addition, we extract features from four well-trained models (ResNet50 [7], InceptionV3 [13], InceptionResnetV2 [12], and NASNetLarge [21]), which are trained on the large-scale ImageNet dataset. We then feed these extracted features into the shared layers and optimize the objective function in Eq. 7.
4.2 Results
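Table 2: Accuracy (%) on the photo domain (P) for three transfer tasks.
Task | A → P | H → P | H+A → P
DANN [4] | 1.07 | 1.85 | 2.01
ADDA [14] | 2.95 | 3.05 | 3.43
ResNet50 | 2.96 | 4.83 | 6.97
InceptionV3 | 3.02 | 5.93 | 7.95
InceptionResnetV2 | 3.73 | 7.07 | 8.43
NASNetLarge (without shared categories selection) | 3.84 | 7.92 | 8.18
NASNetLarge | 5.98 | 8.64 | 9.67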
The performance on the photo domain is shown in Tab. 2. We report the accuracy over the whole photo domain, i.e., the fraction of target images whose predicted label matches the ground-truth label. Features extracted from NASNetLarge combined with our ACL architecture achieve the highest performance across all three tasks. The two domain adaptation baselines have relatively lower performance in all three tasks. One reason is that these two methods have weak feature extractors and do not exclude the irrelevant categories in the source domain, which might cause negative transfer. Moreover, as the ImageNet models improve, we can extract better features from plant images, which leads to the high performance of the NASNetLarge ACL model. In addition, we conduct an ablation study in which we train the best NASNetLarge model without the shared categories selection (the lower-scoring NASNetLarge row in Tab. 2). Its results on all three tasks are lower than those of the full NASNetLarge model, which indicates that the shared categories selection is useful in our model. These experiments demonstrate the effectiveness of the ACL model in finding the invariant features of the two domains.
In the final stage of the PlantCLEF 2020 Challenge, our solutions were evaluated by the organizers using the test domain data. As shown in Tab. 3, our method achieved a mean reciprocal rank (MRR) of 0.032 on the whole test domain and an MRR of 0.016 on the subset of the test domain, placing 4th in the contest.
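Table 3: Mean reciprocal rank (MRR) of the participating teams on the test domain.
Team | Full test set | Subset of the test set
ITCR PlantNet | 0.180 | 0.052
Neuon AI | 0.121 | 0.107
UWB | 0.039 | 0.007
LU (ours) | 0.032 | 0.016
Domain | 0.031 | 0.015
To Be | 0.028 | 0.016
SSN | 0.008 | 0.003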
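For reference, the mean reciprocal rank used in the challenge evaluation can be computed as in the sketch below. It assumes one ranked list of predicted class identifiers per test item and scores an item as zero when the true class is absent from its list; the official evaluation protocol of the organizers may differ in such details, and the function name is hypothetical.

```python
def mean_reciprocal_rank(ranked_predictions, true_labels):
    """Average of 1 / rank of the true label over all items.

    ranked_predictions: for each item, class ids ordered from most to least likely.
    true_labels: the ground-truth class id of each item.
    Items whose true label is missing from the ranking contribute 0.
    """
    total = 0.0
    for ranking, truth in zip(ranked_predictions, true_labels):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(true_labels)
```

Under this definition, an MRR of 0.032 corresponds to the correct species appearing, on average, around rank 31 of the returned list.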
5 Discussion
There are two compelling advantages of the ACL model. First, we adopt the adversarial consistent learning paradigm, which maintains the domain-invariant features from the source domain to the target domain and vice versa. Second, we reduce the weight of irrelevant categories in the source domain, which mitigates negative transfer during training. Although the performance of our model is better than several baseline methods, the highest accuracy on the photo domain is less than 10%, which shows that transfer to real-world images remains difficult. One underlying reason is that the PlantCLEF 2020 Challenge datasets are difficult: there are significant differences between the herbarium domain and the photo domain, as shown in Fig. 1. Another reason is a weakness of our model: since we train only on deep features instead of raw images to reduce the computational requirements, some features might be ignored during training. The performance of the model could be improved if we trained the architecture with raw images.
6 Conclusion
In this paper, we propose an adversarial consistent learning network on partial domain adaptation, termed ACL, to overcome limitations in finding proper shared categories and guaranteeing the feature consistency of the two domains. Our model is optimized by minimizing a three-component loss function. With each component of our model in place, explicit domain-invariant features are maintained through the cross-domain training scheme. Experimental results demonstrate that our proposed model yields promising results on the PlantCLEF 2020 Challenge.
References
[1] (2010) A theory of learning from different domains. Machine Learning 79 (1–2), pp. 151–175.
[2] (2018) Partial adversarial domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 135–150.
[3] (2019) Domain adversarial reinforcement learning for partial domain adaptation. arXiv preprint arXiv:1905.04094.
[4] (2014) Domain adaptive neural networks for object recognition. In Pacific Rim International Conference on Artificial Intelligence, pp. 898–904.
[5] (2020) Overview of LifeCLEF plant identification task 2020. In CLEF Working Notes 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece.
[6] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
[7] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
[8] (2020) Overview of LifeCLEF 2020: a system-oriented evaluation of automated species identification and species distribution prediction. In Proceedings of CLEF 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece.
[9] (2019) Transferable adversarial training: a general approach to adapting deep classifiers. In International Conference on Machine Learning, pp. 4013–4022.
[10] (2018) Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems, pp. 1647–1657.
[11] (2010) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359.
[12] (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
[13] (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
[14] (2017) Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176.
[15] (2018) Automated identification of hookahs (waterpipes) on Instagram: an application in feature extraction using convolutional neural network and support vector machine classification. Journal of Medical Internet Research 20 (11), pp. e10513.
[16] (2019) Modified distribution alignment for domain adaptation with pre-trained Inception ResNet. arXiv preprint arXiv:1904.02322.
[17] (2020) Impact of ImageNet model selection on domain adaptation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision Workshops, pp. 173–182.
[18] (2019) Domain-symmetric networks for adversarial domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5031–5040.
[19] (2019) Transductive learning via improved geodesic sampling. In Proceedings of the 30th British Machine Vision Conference.
[20] (2020) Domain adaptation for object recognition using subspace sampling demons. Multimedia Tools and Applications.
[21] (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8697–8710.