Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems

by   Dong Chen, et al.

Precision weed management offers a promising solution for sustainable cropping systems through the use of chemical-reduced or non-chemical robotic weeding techniques that apply suitable control tactics to individual weeds. Accurate identification of weed species therefore plays a crucial role in such systems, enabling precise, individualized weed treatment. This paper presents a first comprehensive evaluation of deep transfer learning (DTL) for identifying common weeds specific to cotton production systems in the southern United States. A new dataset for weed identification was created, consisting of 5187 color images of 15 weed classes collected under natural lighting conditions and at varied weed growth stages in cotton fields during the 2020 and 2021 field seasons. We evaluated 27 state-of-the-art deep learning models through transfer learning and established an extensive benchmark for the considered weed identification task. DTL achieved high classification accuracy, with F1 scores exceeding 95% across models; ResNet101 achieved the best F1 score of 99.1%, and 14 of the 27 models achieved F1 scores exceeding 98.0%. However, the performance on minority weed classes with few training samples was less satisfactory for models trained with a conventional, unweighted cross entropy loss function. To address this issue, a weighted cross entropy loss function was adopted, which achieved substantially improved accuracies for minority weed classes. Furthermore, a deep learning-based cosine similarity metric was employed to analyze the similarity among weed classes, assisting in the interpretation of classifications. Both the code for model benchmarking and the weed dataset are made publicly available, which are expected to be a valuable resource for future research in weed identification and beyond.




1 Introduction

Weeds are critical threats to crop production; potential crop yield loss due to weeds is estimated at 43% on a global scale Oerke (2006). In cotton production, poor weed management can lead to yield losses of up to 90% Manalil et al. (2017). Weed control is traditionally performed with machines or by hand weeding. With the advent of transgenic, glyphosate-tolerant crops since 1996, over 90% of U.S. farm land for field crops such as cotton is planted with herbicide-resistant seeds Service (2015), and weed control has become predominantly reliant on herbicide application Duke (2015); Pandey et al. (2021). Intensive, blanket, broadcast application of herbicides, however, has adverse environmental impacts and facilitates the evolution of herbicide-resistant weeds (e.g., Palmer Amaranth and Waterhemp), which in turn substantially increases management costs Norsworthy et al. (2012).

Precision weed management (PWM) has recently emerged as a promising solution for sustainable, effective weed control, which incorporates sensors, computer systems and robotics into cropping systems Young et al. (2014). By recognizing the biological attributes of different weed species, PWM enables precise and minimum necessary treatments according to site-specific demand and targeting individual weeds or a small cluster Gerhards and Christensen (2003), which can lead to significant reduction in the consumption of herbicides and other resources. For instance, a robotic weeder can spray a particular type or volume of herbicide or use mechanical weeder or lasers to treat specific weed species, avoiding unnecessary application to crops, bare soil or plant residuals Barnes et al. (2021). Therefore, successful implementation of integrated, precise weed control strategies relies on accurate identification, localization and monitoring of weeds. Currently, machine vision and robotic technology for automated weed control have been demonstrated in certain speciality crops Fennimore and Cutulle (2019). However, commercial-scale applicability to row crops such as cotton in varying growing conditions has yet to be evaluated or demonstrated. Lack of a robust machine vision system capable of weed recognition with accuracy exceeding 95% in unstructured field conditions has been identified as one of the most critical technological bottlenecks towards full realization of automated weeding Westwood et al. (2018). The key to addressing this bottleneck thus lies in the development of image analysis and modeling algorithms of high and robust performance.

Image analysis methods based on the extraction of color and texture features, followed by thresholding or supervised modeling, are widely used for weed classification and detection Wang et al. (2019); Meyer and Neto (2008). A variety of color indices that accentuate plant greenness have been proposed for separating weeds from soil backgrounds Meyer and Neto (2008); Woebbecke et al. (1995). Color indices developed from empirical observations, however, are not robust enough for images acquired under variable field lighting conditions Hamuda et al. (2016). In Bawden et al. (2017), texture features including local binary patterns and covariance features were used for weed classification on a robotic platform, achieving an accuracy of 92.3% on a dataset containing 40 images of 6 weed species. Local shape and edge orientation features were used in Ahmad et al. (2018) for discriminating monocot and dicot weeds, achieving an overall accuracy of 98.4% with AdaBoost and Naïve Bayes. In Bakhshipour and Jafari (2018), Fourier descriptors and invariant moments were extracted and fed into a support vector machine for classifying four common weeds in sugarbeet fields, resulting in an accuracy of 93.3%. Despite promising results, these color or texture feature-based approaches require engineering hand-crafted features for given weed detection/classification tasks, which may not adapt satisfactorily to more diverse imaging conditions.

Recently, data-driven methods such as deep learning (DL), e.g., convolutional neural networks (CNNs), have been researched for weed classification and detection Hasan et al. (2021). CNNs are able to capture spatial and temporal dependencies of images through shared-weight filters and can be trained end-to-end without explicit feature extraction O'Shea and Nash (2015), empowering neural networks to adaptively discover the underlying class-specific patterns and the most discriminative features. In Dyrmann et al. (2016), a CNN model trained on a dataset containing 10413 images of 22 plant species at early growth stages achieved a classification accuracy of up to 98%. A graph-based DL architecture with multi-scale graph representations was developed in Hu et al. (2020) for weed classification, achieving an accuracy of 98.1% on the DeepWeeds dataset Olsen et al. (2019). While successful, training such DL models from scratch is very time-consuming and resource-intensive, requiring high-performance computation units and large-scale, high-quality annotated image datasets, which may not be readily available.

Transfer learning, a methodology that transfers knowledge across domains, can greatly reduce training time and the dependence on massive training data by reusing already-trained models for new problems Zhuang et al. (2020). Deep transfer learning (DTL, i.e., transferring DL models) therefore only involves fine-tuning model parameters on new datasets in the target domain. DTL has recently been investigated for weed identification. In Olsen et al. (2019), two pretrained DL models were fine-tuned and tested on the DeepWeeds dataset, achieving average accuracies above 95%. In Espejo-Garcia et al. (2020a), the authors found that fine-tuning DL models on agricultural datasets helped reduce training epochs while improving model accuracy. They fine-tuned four DL models on the Plant Seedlings Dataset Giselsson et al. (2017) and the Early Crop Weeds Dataset Espejo-Garcia et al. (2020b) and improved the classification accuracy by 0.51% and 1.89%, respectively. In Suh et al. (2018), six pretrained DL models were adopted to classify sugarbeet and volunteer potato images, achieving a best accuracy of up to 98.7%. In Ahmad et al. (2021), three pretrained CNN models were used for weed classification, achieving 98.8% accuracy in classifying four weed species in corn and soybean fields. These studies, however, experimented with only a small number of DL models. Given active developments in DL model architectures Khan et al. (2020), it would benefit the research community to evaluate a broad range of state-of-the-art DL models on weed identification, so as to facilitate informed selection of high-performance models in terms of accuracy, training time, model complexity, and inference speed.

Despite transfer learning strategies, large volumes of annotated image data are highly desirable for powering DL models in visual categorization tasks Sun et al. (2017). Currently, the dearth of such datasets remains a crucial hurdle to exploiting the potential of DL and advancing machine vision systems for precision agriculture Lu and Young (2020); Library (2021). In weed detection, achieving high accuracy and robustness requires a dataset that adequately represents important weed species and accounts for the variations associated with environmental factors (e.g., soil types and characteristics, field light, shadows) as well as growth-stage-related morphological or physiological variations. Recently, Lu and Young (2020) reviewed 15 publicly available weed image datasets dedicated to weed control, such as DeepWeeds Olsen et al. (2019), the Early Crop Weeds Dataset Espejo-Garcia et al. (2020b), and the Open Plant Phenotyping Database Leminen Madsen et al. (2020), among others. Most of these datasets target a small number of weed species, with images acquired in a single growing season at geographically similar field sites. No image datasets of weeds specific to cotton production systems have been published so far.

In this paper, we present a new weed dataset collected in cotton fields in multiple southern U.S. states over the two consecutive seasons of 2020 and 2021. We establish a comprehensive benchmark of a large set of DL architectures for weed classification on the new dataset. This research is expected to provide a valuable reference for future research on developing machine vision systems for cotton weed control and beyond. The contributions of this paper are highlighted as follows:

  1. The presentation of a unique, diverse weed dataset consisting of 5187 images of 15 weed classes specific to the U.S. cotton production systems.

  2. A comprehensive evaluation and benchmark of 27 state-of-the-art DL models through transfer learning for multi-class weed identification.

  3. A novel DL-based cosine similarity metric for assisting in the interpretation of DL output and a weighted loss function for improving classification accuracies for minority weed classes.

2 Materials and Methods

2.1 Cotton Weed Dataset

RGB (Red-Green-Blue) images of weed plants were collected from cotton fields using either smartphones or hand-held digital color cameras. For the sake of image diversity, following the recommendations in Lu and Young (2020), images were captured from different view angles, under natural field light conditions, at varying stages of weed growth, and at different locations across the U.S. cotton belt states (primarily in North Carolina and Mississippi). Regular visits to cotton fields were conducted throughout June to August in the growing seasons of 2020 and 2021 for weed image collection. In 2020, images were mainly acquired in the cotton fields of North Carolina State University research stations, including Central Crops Research Station (Clayton, NC), Upper Coastal Plain Research Station (Rocky Mount, NC) and Cherry Research Farm (Goldsboro, NC). In 2021, more weed images were acquired in cotton fields of R. R. Foil Plant Science Research Center (Starkville, MS) and Black Belt Experiment Station (Brooksville, MS) of Mississippi State University. To create a diverse, large-scale dataset, weed scientists at different institutions were invited to participate in the image collection effort. A Google form was created and shared for uploading weed images and associated metadata (e.g., weed species, field sites, weather conditions).

Figure 1: Bar plot of the cotton weed dataset. Images per weed class are randomly partitioned into 65%-20%-15% splits of train, validation and test subsets, respectively. Numbers above the bars represent the total number of images for the corresponding weed classes.

The acquired images were first annotated for weed species by weed experts during image submission through the Google form; the received images were then annotated by trained individuals, and the final annotations were examined again by experts to ensure annotation quality. Images containing multiple classes of weeds were cropped so that each resultant image contained a single weed class. The weed classes were defined by the common names of weed plants. At the time of writing, the entire dataset contains more than 10000 images of over 50 weed species, which will be documented in detail in a future study. The weed dataset used here for benchmarking DL models consists of a total of 5187 images of 15 common weed classes. The image number for each weed class is shown in Fig. 1. It should be noted that all weed classes, except Morningglory, correspond to single weed species. The images of different Morningglory species (e.g., Ivy Morningglory, Pitted Morningglory, Entireleaf Morningglory and Tall Morningglory) were grouped together as a single weed class because of their similarity in weed management. Overall, Morningglory, Carpetweed, Palmer Amaranth, Waterhemp and Purslane are the five major classes in terms of image number, whereas weed species like Crabgrass, Swinecress and Spurred Anoda correspond to minority classes. The present dataset thus has unbalanced classes. Class imbalance generally poses a challenge to machine learning modeling, which will be discussed in Sections 2.4 and 3.2.

Fig. 2 shows example images from the cotton weed dataset. Images within the same weed class show large variations in leaf color and morphology, soil background and field light conditions, which are desirable for building models robust to varying image conditions or dataset shift. These variations differ among weed classes; despite distinct identifying characteristics, some weed classes exhibit relatively high similarities in plant morphology. For instance, some young Morningglory and Spurred Anoda seedlings have similar broad leaves, and the latter is also similar to Prickly Sida in terms of toothed leaf margins. Goosegrass and Crabgrass are both grassy weeds that grow prostrate on the ground, with similar leaf shapes. Palmer Amaranth and Waterhemp, both pigweed species, may look similar and are difficult to distinguish from each other. These similarities may contribute to errors in weed identification by DL models. A quantitative DL-based similarity measure, along with a similarity matrix, will be discussed in later sections (see Sections 2.5 and 3.3) to characterize the similarity among weed classes.

Figure 2: Example images from the cotton weed dataset. Each row displays randomly selected images from each of the 15 weed classes.

2.2 Transfer learning

Deep transfer learning (DTL) starts with a DL model pre-trained on a large-scale dataset (e.g., ImageNet Deng et al. (2009)) and then fine-tunes the model on a new dataset from the specific domain of interest Zhuang et al. (2020). For the weed classification task in this study, we replace the last fully-connected (FC) layer of each DL model with a layer of 15 neurons, corresponding to the number of weed classes in the cotton weed dataset.

A literature review was conducted to select appropriate DL models for weed identification. The main selection criteria were the demonstrated performance of the models in visual categorization tasks in the computer vision community and the availability of their source-code implementations. As a result, a suite of 27 state-of-the-art CNN models of different architectures, summarized in Table 2, was selected for classifying the cotton weed images. Some of these models, including Xception Chollet (2017), VGG16 Simonyan and Zisserman (2014), ResNet50 He et al. (2016), InceptionV3 Szegedy et al. (2016) and DenseNet Huang et al. (2017), have recently been evaluated for classifying weeds in other cropping systems Espejo-Garcia et al. (2020b); Olsen et al. (2019); Ahmad et al. (2021). The majority of these models, such as EfficientNet Tan and Le (2019) and MnasNet Tan et al. (2019), remain to be evaluated for weed classification tasks.

The DL models were trained with a conventional cross entropy (CE) loss function:

$$\mathcal{L}_{CE} = -\sum_{i=1}^{K} y_i \log(\hat{y}_i) \qquad (1)$$

where $\hat{\mathbf{y}} = [\hat{y}_1, \dots, \hat{y}_K]$ is the vector of the Softmax output layer Goodfellow et al. (2018), indicating the predicted probabilities of the 15 weed classes. Here $K = 15$ is the number of weed classes, and $y_i$ denotes the true probability of the $i$th class, i.e., $y_i = 1$ if the image belongs to class $i$ and $y_i = 0$ otherwise.
For model development and evaluation, the cotton weed dataset was randomly partitioned into three subsets: 65% for training, 20% for validation and 15% for testing, as shown in Fig. 1. All training and validation images were resized to a fixed size before being fed into the DL models (two other image sizes were also examined, but the chosen size was found to be better in terms of accuracy and speed). The image pixel intensities per color channel were normalized for enhanced image recognition performance Koo and Cha (2017). In addition, for better model accuracy, real-time data augmentation was conducted by randomly rotating and flipping images during the training process.

Because of the random nature of the dataset partition, it is desirable to run model training and testing multiple times to obtain a reliable estimate of model performance Raschka (2018). In this study, DL models were trained with 5 replications, using different random seeds that were shared by all the models, and the mean accuracies on test data were computed for performance evaluation. All models were trained for 50 epochs (found sufficient for modeling the weed data) with the SGD (stochastic gradient descent) optimizer and a momentum of 0.9. The learning rate was initially set to 0.001 and dynamically decreased by a factor of 0.1 every 7 epochs to stabilize model training. The DL framework PyTorch (version 1.9) with Torchvision (version 0.10.0) Paszke et al. (2019) was used for model training, in which a multiprocessing package was employed with 32 CPU cores to speed up training. The experiments were performed on an Ubuntu 20.04 server with an AMD 3990X 64-Core CPU and a GeForce RTX 3090Ti GPU (24 GB GDDR6X memory). Readers are referred to the open-source code for the detailed implementation of transfer learning for the 27 DL models.
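The optimizer and learning-rate schedule described above map directly onto PyTorch's `SGD` and `StepLR`. A minimal sketch (the `nn.Linear` stand-in and the empty epoch body are placeholders, not the paper's training loop):

```python
import torch
import torch.nn as nn

# Stand-in for a fine-tuned DL model; in practice this would be one of the
# 27 pretrained CNNs with its FC head replaced.
model = nn.Linear(10, 15)

# Hyper-parameters from the text: SGD with momentum 0.9, initial lr 0.001,
# decayed by a factor of 0.1 every 7 epochs, for 50 epochs total.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(50):
    # ... forward pass, loss.backward() and optimizer.step() would go here ...
    scheduler.step()  # decay the learning rate on schedule
```

After each block of 7 epochs the learning rate drops by one order of magnitude, which is what stabilizes the later stages of fine-tuning.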

2.3 Performance Metrics

The performance of the DL models in weed identification was evaluated in terms of number of model parameters, training and inference times, confusion matrix and F1-score.

2.3.1 Number of Model Parameters

In this study, pretrained DL models were fine-tuned by updating all model parameters for the weed classification task. Thus the number of model parameters refers to all the weights (and biases) in the network that are updated during training through back-propagation. The parameter count is a direct measure of model complexity: networks with more parameters potentially require greater deployment memory and incur longer training and inference times (see Subsection 2.3.2).
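Counting trainable parameters is a one-liner in PyTorch; a small sketch with a toy two-layer network:

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Total number of trainable weights and biases, i.e., everything
    updated through back-propagation during fine-tuning."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Toy example: Linear(4, 8) has 4*8 + 8 = 40 parameters,
# Linear(8, 15) has 8*15 + 15 = 135, so 175 in total.
toy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 15))
print(count_parameters(toy))  # 175
```

Applied to the fine-tuned models, this is how figures like "23.5M" for ResNet50 in Table 2 are obtained.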

2.3.2 Training and Inference Times

The training time is the time required to train a DL model with prescribed model configurations and computing resources. The training time depends on factors such as model architecture, number of model parameters, data size, hyper-parameters, DL framework as well as computing hardware. The training time is an important consideration where development time and resources are constrained.

A trained DL model is used to make predictions, a process known as inference. The inference time (i.e., latency) is a crucial consideration when deploying DL models for real-time applications (e.g., in-field weed identification); it is the time a trained DL model takes to make a prediction given an input image. In this paper, for reliable estimation, the inference time was measured as the average time needed to predict 30 weed images randomly selected from the testing dataset.
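The averaging protocol above can be sketched as follows; this is an illustrative helper (our own naming), timing single-image forward passes and averaging:

```python
import time
import torch
import torch.nn as nn


@torch.no_grad()
def mean_inference_time_ms(model: nn.Module, images) -> float:
    """Average per-image prediction latency in milliseconds over a set of
    test images, mirroring the paper's protocol of averaging 30 predictions."""
    model.eval()
    total = 0.0
    for img in images:
        start = time.perf_counter()
        _ = model(img.unsqueeze(0))  # batch containing a single image
        total += (time.perf_counter() - start) * 1000.0
    return total / len(images)
```

In the paper's setup, `images` would be 30 tensors randomly drawn from the test split, and the model would run on the GPU (where one would also synchronize CUDA before reading the clock).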

2.3.3 Confusion Matrix and F1-score

The confusion matrix on testing images, which provides the accuracy for each class while revealing detailed misclassifications, was presented to show the classification results for individual weed classes. The classification accuracy was measured by the F1 score. For the multi-class weed classification, the micro-averaged F1 score Yang and Liu (1999) was calculated as the classification accuracy. In micro-averaging, the per-class classifications are aggregated across classes to compute the micro-averaged precision P and recall R by counting the total true positives, false positives and false negatives, and the harmonic mean of P and R then gives the micro-averaged F1 score:

$$\text{Micro-F1} = \frac{2PR}{P + R}$$
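The micro-averaging just described can be written out directly, pooling true positives, false positives and false negatives over all classes (a minimal pure-Python sketch; in practice a library routine such as scikit-learn's `f1_score` with `average="micro"` does the same):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN counts over all classes, then take
    the harmonic mean of the pooled precision and recall."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Note that for single-label multi-class problems like this one, every false positive for one class is a false negative for another, so the micro-averaged F1 coincides with the overall accuracy.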
2.4 Weighted Cross Entropy Loss

The CE loss function defined in Eqn. 1 does not account for the class imbalance encountered in the cotton weed dataset (Fig. 1). Training with the CE loss may therefore result in large classification errors for minority weed classes (e.g., Spurred Anoda). To mitigate this issue, a weighted cross entropy (WCE) loss function Phan and Yamamoto (2020) was introduced, which re-weights the per-class loss terms according to the number of images in each weed class:

$$\mathcal{L}_{WCE} = -\sum_{i=1}^{K} w_i\, y_i \log(\hat{y}_i) \qquad (4)$$

where $\mathbf{w} = [w_1, \dots, w_K]$ is a weighting vector that assigns an individualized penalty to each class, preferentially placing larger weights on minority classes. The conventional CE loss without considering class imbalance corresponds to a weighting vector of ones (the CE column in Table 1). In this study, an inverse-proportion weighting strategy Phan and Yamamoto (2020) was adopted to assign the weight to the $i$th weed class as follows:

$$w_i = \frac{N_{\max}}{N_i} \qquad (5)$$

where $N_i$ denotes the number of images for the $i$th weed class and $N_{\max}$ represents the maximum number of images among classes, i.e., 1115 (for Morningglory). As a result, weed classes with fewer images are assigned relatively greater weights; for example, the weight for Spurred Anoda is 18.3 (1115/61). This strategy, which enforces larger penalties on misclassifications of minority classes, can potentially enhance the classification accuracy for these classes. In preliminary testing, it was observed that the direct inversion of image ratios may lead to sub-optimal performance; hence the final adopted weights were fine-tuned and empirically set as shown in the WCE column in Table 1. Other choices of weighting strategies are discussed in Section 4.1.
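The inverse-proportion weights of Eqn. 5 can be computed directly from the per-class image counts in Table 1; a small sketch:

```python
# Per-class image counts from Table 1.
counts = {
    "Morningglory": 1115, "Carpetweed": 763, "Palmer Amaranth": 689,
    "Waterhemp": 451, "Purslane": 450, "Nutsedge": 273, "Eclipta": 254,
    "Spotted Spurge": 234, "Sicklepod": 240, "Goosegrass": 216,
    "Prickly Sida": 129, "Ragweed": 129, "Crabgrass": 111,
    "Swinecress": 72, "Spurred Anoda": 61,
}

n_max = max(counts.values())  # 1115 (Morningglory)
# Eqn. 5: w_i = N_max / N_i, so minority classes get larger weights.
weights = {cls: n_max / n for cls, n in counts.items()}

print(round(weights["Spurred Anoda"], 2))  # 18.28
```

In PyTorch, the resulting vector would be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` to obtain the WCE loss of Eqn. 4 (with the empirically fine-tuned values of the WCE column substituted where they differ).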

| Weed class | # of images | CE | Eqn. 5 | WCE |
| --- | --- | --- | --- | --- |
| Morningglory | 1115 | 1.0 | 1.0 | 1.4 |
| Carpetweed | 763 | 1.0 | 1.463 | 2.05 |
| Palmer Amaranth | 689 | 1.0 | 1.62 | 2.27 |
| Waterhemp | 451 | 1.0 | 2.47 | 3.46 |
| Purslane | 450 | 1.0 | 2.48 | 3.47 |
| Nutsedge | 273 | 1.0 | 4.09 | 5.72 |
| Eclipta | 254 | 1.0 | 4.39 | 4.39 |
| Spotted Spurge | 234 | 1.0 | 4.77 | 4.77 |
| Sicklepod | 240 | 1.0 | 4.65 | 4.65 |
| Goosegrass | 216 | 1.0 | 5.16 | 5.16 |
| Prickly Sida | 129 | 1.0 | 8.64 | 8.64 |
| Ragweed | 129 | 1.0 | 8.64 | 8.64 |
| Crabgrass | 111 | 1.0 | 10.05 | 10.05 |
| Swinecress | 72 | 1.0 | 15.49 | 15.49 |
| Spurred Anoda | 61 | 1.0 | 18.28 | 18.28 |

Table 1: Weighting coefficients of the weed classes for the CE loss, Eqn. 5 and the WCE loss function.

2.5 Deep Learning-based Similarity Measure

To assist in the interpretation of DL classifications, an inter-class (or within-class) analysis was conducted by quantifying the similarity of the images of weed classes. Euclidean distance is the most commonly used measure of inter-class similarity, but it is sensitive to varying image conditions (e.g., variable ambient light, variations in camera view angle and position), which are typical of the cotton weed images collected under natural field conditions. Cosine similarity (CS), which measures the cosine of the angle between two vectors and is thus not sensitive to magnitude, offers an effective alternative to the Euclidean distance Xia et al. (2015).

In this study, we employed a DL-based CS measure to quantify inter-class similarities. A DL model was used as a feature extractor to obtain hierarchically learnt, high-level representations of weed images, based on which the CS was calculated between two weed classes. Specifically, the VGG11 Simonyan and Zisserman (2014) model was trained on the cotton weed dataset through DTL, and the output of the first FC layer was taken as the feature vector, which is of length 4096 (the output size of that FC layer in the VGG11 network Simonyan and Zisserman (2014)). While other DL models could also be used for feature extraction, VGG11 was chosen because it achieved the best trade-off between classification performance and training time (see Table 2), particularly with high accuracies for minority weed classes (see Table 3). Given the extracted features for any two weed classes, the CS was calculated as follows Xia et al. (2015):

$$CS(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}$$

where $\mathbf{a}$ and $\mathbf{b}$ are two feature vectors extracted by the VGG11 model; we randomly sample pairs of images from the two weed classes of interest and compute the average similarity value. CS values range from -1 to 1, where 1 means the two feature vectors are perfectly similar (parallel) and -1 means they point in exactly opposite directions.
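The cosine similarity itself is a few lines; a minimal pure-Python sketch (in practice the 4096-dimensional VGG11 feature vectors would be used):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; insensitive to the
    vectors' magnitudes, unlike the Euclidean distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Scaling a vector leaves the similarity unchanged, illustrating the
# magnitude-insensitivity that motivates CS over Euclidean distance:
a = [1.0, 2.0, 3.0]
assert abs(cosine_similarity(a, [2.0, 4.0, 6.0]) - 1.0) < 1e-9
```

For two weed classes, this function would be evaluated on randomly sampled image pairs and the results averaged, as described above.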

3 Experimental Results

Figure 3: Training and inference time vs. number of parameters. DL models from the same family are labeled with the same marker.
Figure 4: Training accuracy and loss curves for the deep learning models.

3.1 Deep Learning Model Performance

Table 2 summarizes the number of model parameters, training and inference times, and F1 scores of the 27 selected DL models. There is a large variation in the number of parameters across the models, ranging from 0.74 M (million) in SqueezeNet to 139.6 M in VGG19. Depending on the model architecture, the training time ranged from 37 min to 144 min. Models with more parameters tended to require longer training times (see Fig. 3, left) because of increased model complexity. The inference times also exhibited an increasing trend with the number of parameters (see Fig. 3, right), although with notably smaller differences among models, ranging from 188 ms to 256 ms. Inference is mainly a forward-propagation process that requires no parameter estimation and is thus far more efficient than training. In particular, models including AlexNet, SqueezeNet, GoogleNet, ResNet18, ResNet50, the VGGs and the MobileNets required inference times of less than 200 ms, translating into a prediction speed of over 5 frames per second. Overall, the DL models show good potential for deployment in real-time weed identification.

Figure 4 shows the training accuracy and loss curves of the DL models. All models exhibited promising training performance in terms of fast convergence, low training losses and high training accuracies (F1 scores). Training accuracies tended to plateau after 10 epochs at a level exceeding 90%. Regarding test accuracies (Table 2), ResNet101 achieved the best overall F1 score of 99.1%, followed by ResNet50 with F1 = 99.0%. Twelve other models gave F1 scores exceeding 98%, such as the three DenseNet variants, DPN68 and MobilenetV3-large, among others, and the top-10 models achieved an average F1 score of 98.71%. On the other hand, three models, AlexNet, SqueezeNet and MnasNet, yielded the lowest F1 scores, close to or less than 96%, although they were all highly efficient to train and fast at inference.

| Index | Model | Parameter Number | Training Time | Training F1-Score | Testing F1-Score | Inference Time (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | AlexNet Krizhevsky et al. (2012) | 57.1M | 37m 2s | 95.4 ± 0.2 | 95.3 ± 0.4 | 188.5 ± 2.2 |
| 2 | SqueezeNet Iandola et al. (2016) | 0.743M | 46m 7s | 96.4 ± 0.2 | 95.8 ± 0.5 | 187.3 ± 1.6 |
| 3 | GoogleNet Szegedy et al. (2015) | 5.6M | 52m 28s | 94.7 ± 0 | 97.8 ± 0.3 | 196.3 ± 0.5 |
| 4 | Xception Chollet (2017) | 20.8M | 89m 9s | 94.7 ± 0.2 | 97.5 ± 0.4 | 211.3 ± 1.8 |
| 5 | DPN68 Chen et al. (2017) | 11.8M | 79m 10s | 98.5 ± 0.1 | **98.8 ± 0.2** | 219.0 ± 6.9 |
| 6 | MnasNet Tan et al. (2019) | 3.1M | 51m 3s | 91.8 ± 0.2 | 96.0 ± 0.4 | 191.2 ± 2.0 |
| 7 | ResNet18 He et al. (2016) | 11.2M | 47m 30s | 96.9 ± 0.1 | 98.1 ± 0.2 | 188.9 ± 0.9 |
| 8 | ResNet50 He et al. (2016) | 23.5M | 73m 17s | 98.0 ± 0.1 | **99.0 ± 0.1** | 195.6 ± 0.4 |
| 9 | ResNet101 He et al. (2016) | 42.5M | 92m 55s | 98.3 ± 0.1 | **99.1 ± 0.2** | 207.0 ± 0.6 |
| 10 | VGG11 Simonyan and Zisserman (2014) | 128.8M | 67m 46s | 97.3 ± 0.1 | 98.1 ± 0.2 | 194.1 ± 1.3 |
| 11 | VGG16 Simonyan and Zisserman (2014) | 134.3M | 99m 25s | 97.7 ± 0.2 | 98.1 ± 0.3 | 195.7 ± 1.4 |
| 12 | VGG19 Simonyan and Zisserman (2014) | 139.6M | 112m 41s | 97.9 ± 0.1 | 97.9 ± 0.2 | 197.2 ± 1.4 |
| 13 | Densenet121 Huang et al. (2017) | 7.0M | 75m 40s | 97.9 ± 0.1 | **98.7 ± 0.1** | 212.4 ± 0.8 |
| 14 | Densenet161 Huang et al. (2017) | 26.5M | 133m 42s | 98.4 ± 0.1 | **98.9 ± 0.4** | 227.4 ± 0.5 |
| 15 | Densenet169 Huang et al. (2017) | 12.5M | 85m 1s | 98.1 ± 0.1 | **98.9 ± 0.3** | 226.8 ± 0.5 |
| 16 | Inception v3 Szegedy et al. (2016) | 24.4M | 73m 50s | 96.7 ± 0 | **98.4 ± 0.3** | 206.3 ± 0.4 |
| 17 | Inception v4 Szegedy et al. (2017) | 41.2M | 120m 42s | 95.9 ± 0.1 | 98.1 ± 0.4 | 235.4 ± 0.8 |
| 18 | Inception-ResNet v2 Szegedy et al. (2017) | 54.3M | 124m 36s | 94.0 ± 0.2 | 97.6 ± 0.4 | 255.9 ± 1.4 |
| 19 | MobilenetV2 Sandler et al. (2018) | 2.2M | 53m 27s | 97.4 ± 0.1 | **98.4 ± 0.1** | 191.1 ± 0.8 |
| 20 | MobilenetV3-small Howard et al. (2019) | 1.5M | 41m 27s | 94.5 ± 0.2 | 96.6 ± 0.1 | 193.1 ± 1.2 |
| 21 | MobilenetV3-large Howard et al. (2019) | 4.2M | 49m 4s | 96.6 ± 0.1 | **98.6 ± 0.2** | 193.8 ± 2.0 |
| 22 | EfficientNet-b0 Tan and Le (2019) | 4.0M | 63m 39s | 93.0 ± 0.1 | 97.4 ± 0.4 | 202.0 ± 5.6 |
| 23 | EfficientNet-b1 Tan and Le (2019) | 6.5M | 77m 8s | 93.8 ± 0.2 | 97.3 ± 0.4 | 203.8 ± 0.8 |
| 24 | EfficientNet-b2 Tan and Le (2019) | 7.7M | 78m 56s | 94.1 ± 0.2 | 97.8 ± 0.1 | 204.5 ± 1.7 |
| 25 | EfficientNet-b3 Tan and Le (2019) | 10.7M | 92m 51s | 95.0 ± 0.2 | **98.2 ± 0.1** | 211.3 ± 1.2 |
| 26 | EfficientNet-b4 Tan and Le (2019) | 17.6M | 113m 12s | 94.1 ± 0.2 | 97.8 ± 0.2 | 216.3 ± 1.3 |
| 27 | EfficientNet-b5 Tan and Le (2019) | 28.4M | 144m 44s | 94.1 ± 0.3 | 97.4 ± 0.1 | 224.1 ± 1.5 |

Table 2: Performance of 27 state-of-the-art deep learning models on the cotton weed dataset. Variations in training time were negligible, so its standard deviation is not included. The top-10 testing F1 scores are highlighted in bold. "M" stands for million.

The confusion matrices on the test data for all the DL models are available on our GitHub page. Due to space constraints, we only show the confusion matrices for one top F1-score model, ResNet-101, and one low-performing model, MnasNet (MnasNet1.0), in Fig. 5 and Fig. 6, respectively. ResNet-101 yielded perfect classifications for 12 out of 15 weed classes, although it misclassified 3%, 4% and 20% of the images of Goosegrass, Palmer Amaranth and Spurred Anoda, respectively. Spurred Anoda was the most challenging weed class to distinguish from the others: ResNet-101 achieved a classification accuracy of 80% for this species, misclassifying 20% of the weed as Prickly Sida. The MnasNet model achieved an accuracy of only 20% in the identification of Spurred Anoda, as shown in Fig. 6, misclassifying 60% and 20% of the weed as Prickly Sida and Palmer Amaranth, respectively. The poor accuracies are presumably because this class has the smallest number of images in the dataset (61, as shown in Fig. 1). Similarly low accuracies were also observed by MnasNet for other minority weed classes such as Crabgrass and Ragweed, with accuracies of 88% and 80%, respectively. To improve the performance of DL models on the minority weed classes, the proposed WCE loss function is discussed next.
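The per-class accuracies quoted above are simply the row-normalized diagonal entries of a confusion matrix. A minimal sketch with a hypothetical 3-class matrix (not the paper's actual counts):

```python
import numpy as np

def per_class_accuracy(cm: np.ndarray) -> np.ndarray:
    """Row-normalized diagonal: fraction of each true class predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
cm = np.array([
    [18, 1, 1],   # class 0: 18 of 20 correct
    [0, 20, 0],   # class 1: 20 of 20 correct
    [1, 3, 16],   # class 2: 16 of 20 correct
])
acc = per_class_accuracy(cm)  # per-class accuracies: 0.9, 1.0, 0.8
```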

Figure 5: The confusion matrix of the ResNet-101 model on the test dataset.
Figure 6: The confusion matrix of the MnasNet model on the test dataset.

3.2 Performance Improvement with the WCE Loss

Fig. 7 shows the confusion matrix achieved by the MnasNet model trained with the WCE loss function (Eqn. 4). The WCE-based model achieved remarkable improvements over its counterpart (Fig. 6) trained with the regular CE function (Eqn. 1) in classifying minority weed classes. The classification accuracy of Spurred Anoda jumped from 20% to 80%, and the accuracies for Crabgrass and Ragweed were improved from 88% to 94% and from 80% to 95%, respectively.
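A minimal numpy sketch of such a weighted cross entropy. The inverse-frequency weighting shown here is one common choice and is only assumed to approximate the paper's Eqn. 5; in PyTorch the same effect is obtained by passing a weight tensor to `torch.nn.CrossEntropyLoss(weight=...)`.

```python
import numpy as np

def wce_loss(logits, labels, class_counts):
    """Weighted cross entropy: each sample's loss is scaled by a weight
    inversely proportional to its class frequency (assumed weighting)."""
    counts = np.asarray(class_counts, dtype=float)
    weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return np.mean(weights[labels] * per_sample)

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
# A minority class (e.g. 61 images, as for Spurred Anoda) receives a
# larger weight than the majority classes.
loss = wce_loss(logits, labels, class_counts=[1000, 1000, 61])
```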

Figure 7: The confusion matrix of the MnasNet model with the weighted cross entropy loss strategy evaluated on the test dataset.

Table 3 compares the classification accuracies of five selected models (including the aforementioned MnasNet) using the CE loss and the WCE loss. The confusion matrices for all the DL models trained with the CE and WCE losses separately are available on our GitHub page. Emphasis here is placed on classifying two majority weeds, Morningglory and Waterhemp, and two minority weeds, Crabgrass and Spurred Anoda. Considerable improvements were achieved by all these models for the minority weed classes. Notably, in addition to MnasNet, EfficientNet-b2 and Xception achieved improvements of 40% and 20% in identifying Spurred Anoda, respectively, compared to their counterparts trained with the CE loss.

Despite the improvements on minority classes, models such as Xception, MnasNet and EfficientNet-b2 showed a slightly decreased accuracy for Morningglory. This is because the WCE strategy, which places stronger weights on minority classes, may negatively affect the classification of majority classes. Nonetheless, the significant improvements on the minority classes outweighed the decreased accuracy on the majority classes, leading to overall improvements in F1-score for these models. In particular, DenseNet161 achieved an overall F1-score of 99.24%, outperforming ResNet101, which achieved the best accuracy (99.1%) among all the CE-based models. VGG11 saw a slight decrease in the overall F1-score, but it is encouraging that the model achieved 100% classification accuracy for Spurred Anoda, which has only 61 images in the weed dataset.

| Model | Morningglory (CE / WCE) | Waterhemp (CE / WCE) | Crabgrass (CE / WCE) | Spurred Anoda (CE / WCE) | Overall F1-score (CE / WCE) |
|---|---|---|---|---|---|
| DenseNet161 | 100 / 100 | 98.53 / 100 | 100 / 100 | 70 / 80 | 98.85 / 99.24 |
| Xception | 100 / 98.21 | 100 / 100 | 94.12 / 100 | 50 / 70 | 97.58 / 97.96 |
| MnasNet | 98.81 / 95.83 | 97.06 / 97.06 | 88.24 / 94.12 | 20 / 80 | 95.67 / 96.06 |
| EfficientNet-b2 | 98.81 / 97.02 | 100 / 100 | 94.12 / 100 | 40 / 80 | 97.71 / 97.96 |
| VGG11 | 99.4 / 97.02 | 98.53 / 100 | 100 / 94.12 | 90 / 100 | 97.84 / 97.07 |

Table 3: Performance comparison between the cross entropy (CE) loss and the weighted cross entropy (WCE) loss for each weed class and the overall F1-score (%).

3.3 Weed Similarity Analysis

Fig. 8 shows an inter-class CS (cosine similarity) matrix based on the features extracted by the VGG11 model (with the CE loss) (Section 2.5). The CS matrix helps explain misclassifications by DL models among weed classes. Weed classes that share more common features tended to have higher CS values. For example, Goosegrass and Crabgrass, which are both grassy weeds in the Poaceae family, had a CS of 0.69, greater than their similarities with all other weeds. This high CS is in agreement with the classification errors observed between the two classes (see Fig. 5, Fig. 6 and Fig. 7). For the ResNet101 model, for instance, all of the 3% misclassifications for Goosegrass were due to misclassifying the weed as Crabgrass (Fig. 5). Spurred Anoda and Prickly Sida are another pair of similar weeds; both have toothed leaf margins and are members of the Mallow family. The globally highest CS of 0.73 was observed between these two classes. Their strong similarity, along with the fact that Prickly Sida has more than twice as many images as Spurred Anoda, explains the significant proportion of Spurred Anoda images misclassified as Prickly Sida (see Fig. 5, Fig. 6 and Fig. 7).
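An inter-class CS matrix of this kind can be computed from the mean feature vector of each class. A minimal sketch with made-up toy features (the paper extracts the actual features from VGG11; the data below is purely illustrative):

```python
import numpy as np

def class_similarity_matrix(features, labels):
    """Cosine similarity between the mean feature vectors of each class."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    means /= np.linalg.norm(means, axis=1, keepdims=True)  # L2-normalize
    return means @ means.T  # entry (i, j) = cosine similarity of classes i and j

# Toy features: classes 0 and 1 are similar, class 2 is dissimilar.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([1.0, 0.0, 0.0], 0.1, size=(10, 3)),
    rng.normal([0.9, 0.1, 0.0], 0.1, size=(10, 3)),
    rng.normal([0.0, 0.0, 1.0], 0.1, size=(10, 3)),
])
labels = np.repeat([0, 1, 2], 10)
cs = class_similarity_matrix(features, labels)
# Diagonal entries are 1 (each class is perfectly similar to itself),
# and cs[0, 1] is much larger than cs[0, 2].
```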

Figure 8: The similarity matrix achieved by the deep learning-based similarity measure scheme (Section 2.5) on the cotton weed dataset. The diagonal entries indicate the perfect similarity of each class with itself.

4 Discussion and Future Research

In this section, we discuss two potential approaches to improving the performance on minority weed classes, which will be investigated in future studies.

4.1 Weighted Loss Functions

The WCE loss function (Eqn. 4) improves on the CE loss by adaptively assigning weights to individual weed classes to account for class imbalance. In addition to the weighting in Eqn. 5, there are other weighting schemes Phan and Yamamoto (2020) and cost-sensitive methods Khan et al. (2017) to cope with imbalanced data.

The class-balanced (CB) loss introduced in Cui et al. (2019) re-balances the classification loss based on the effective number of samples for each class.

The CB loss is defined as:

CB(p, y) = ((1 − β) / (1 − β^{n_y})) · L(p, y),

where L(p, y) is the classification loss (e.g., the CE loss) for a sample of class y, n_y is the number of training samples of class y, and β ∈ [0, 1) is a hyperparameter. When β = 0, the CB loss is equivalent to the CE loss, and β → 1 corresponds to re-weighting by inverse class frequency, which enables us to smoothly adjust the class-balanced term between no re-weighting and re-weighting by inverse class frequency Cui et al. (2019).
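The class-balanced re-weighting term can be sketched as follows; this is a minimal numpy version of the weighting in Cui et al. (2019), normalized so the weights average to one, and the class counts below are illustrative:

```python
import numpy as np

def cb_weights(class_counts, beta):
    """Class-balanced weights, (1 - beta) / (1 - beta**n_y) per Cui et al. (2019),
    normalized so the weights sum to the number of classes."""
    counts = np.asarray(class_counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w * len(counts) / w.sum()

# Illustrative counts: two majority classes and one 61-image minority class.
counts = [762, 689, 61]
w = cb_weights(counts, beta=0.999)  # the minority class gets the largest weight
```

With `beta=0` every weight is 1 (plain CE); as `beta` approaches 1, the weights approach inverse class frequency.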

Focal loss (FL), which was originally proposed in Lin et al. (2017), offers another promising alternative for imbalanced learning, and is calculated as follows:

FL(p_t) = −(1 − p_t)^γ log(p_t),

where p_t is the model's estimated probability for the true class and (1 − p_t)^γ is called the modulating factor, which down-weights the contributions of easy examples or majority classes during training while rapidly focusing on challenging classes that have few images. Here, γ ≥ 0 is the focusing parameter, and the FL loss reduces to the conventional CE loss when γ = 0.
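A minimal numpy sketch of the focal loss on predicted class probabilities (not the authors' implementation; the probabilities below are illustrative):

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """FL(p_t) = -(1 - p_t)**gamma * log(p_t): the modulating factor
    (1 - p_t)**gamma down-weights well-classified (high p_t) examples."""
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))

probs = np.array([[0.9, 0.05, 0.05],   # easy example: p_t = 0.9
                  [0.3, 0.6, 0.1]])    # harder example: p_t = 0.6
labels = np.array([0, 1])
# With gamma = 0 the focal loss reduces to the ordinary cross entropy.
assert np.isclose(focal_loss(probs, labels, gamma=0.0),
                  np.mean(-np.log([0.9, 0.6])))
```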

In future research, we will experiment and evaluate these weighted loss functions for improved classification of minority weed classes.

4.2 Data Augmentation

In this paper, although the DL models overall achieved remarkable weed identification accuracy, some models that have proven powerful in visual categorization tasks, such as EfficientNet Tan and Le (2019), did not perform as well as expected, especially on minority weed classes. This is likely because these models rely heavily on large-scale data to be sufficiently optimized while avoiding overfitting Shorten and Khoshgoftaar (2019). One intuitive solution is to collect more images for the under-performing weed classes. Unfortunately, images of many weed species may be difficult to collect due to unpredictable weather conditions and limited access to a diversity of field sites.

Data augmentation (DA) offers an effective means to address the insufficiency of physically collected image data. In DA, a suite of techniques Shorten and Khoshgoftaar (2019), such as geometric transformations, color space augmentations and generative adversarial networks (GANs), can be used to enhance the size and quality of training images, such that deep learning models can be trained on the artificially expanded dataset and thereby gain better performance. In particular, GANs have received increasing attention, representing a novel framework for generative modeling through adversarial training Creswell et al. (2018). Recently, GAN methods have been investigated for weed classification tasks Espejo-Garcia et al. (2021) to address the lack of large-scale domain datasets. In this paper, we also applied geometric transformation methods such as random rotation and pixel normalization, but did not fully explore the potential of DA techniques in the classification of weed images, which will be a subject of future research.
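The geometric transformations mentioned above can be sketched with plain numpy (in practice a library such as torchvision.transforms would be used; this toy version only flips and rotates by multiples of 90°):

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate an H x W x C image array (geometric DA)."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                  # horizontal flip
    image = np.rot90(image, k=rng.integers(4))    # random 0/90/180/270 deg rotation
    return image

rng = np.random.default_rng(42)
img = np.arange(2 * 2 * 3).reshape(2, 2, 3)       # tiny dummy "image"
augmented = [augment(img, rng) for _ in range(4)]  # four augmented variants
```

Each variant contains exactly the same pixel values as the original, rearranged, so the label is preserved while the apparent pose varies.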

5 Conclusion

In this study, a first, comprehensive benchmark of a suite of 27 DL models was established through transfer learning for multi-class identification of common weeds specific to cotton production systems. A dedicated dataset consisting of 5187 images of 15 weed classes was created by collecting images under natural light conditions and at varied weed growth stages from a diversity of cotton fields in southern U.S. states over two growing seasons. DTL proved effective for achieving high weed classification accuracies (F1-score > 95%) within reasonably short training times (< 2.5 hours). ResNet101 was the best-performing model with the highest F1-score of 99.1%, and the top-10 models resulted in an average F1-score of 98.71%. A WCE loss function was proposed for model training, in which individualized weights were assigned to weed classes to account for class imbalance; it achieved substantial improvements in classifying minority weed classes. A DL-based cosine similarity metric was found useful for assisting the interpretation of misclassifications. Both the source code for model development and evaluation and the weed dataset were made publicly accessible to the research community. This study provides a good foundation for the informed choice of DL models for weed classification tasks, and can benefit precision agriculture research at large.

Authorship Contribution

Dong Chen: Formal analysis, Software, Writing - original draft; Yuzhen Lu: Conceptualization, Investigation, Data curation, Supervision, Writing - review & editing; Zhaojiang Li: Resources, Writing - review & editing; Sierra Young: Data curation, Writing - review & editing.


Acknowledgments

This work was supported in part by Cotton Incorporated award 21-005. The authors thank Dr. Camp Hand and Dr. Edward Barnes for contributing weed images and Dr. Charlie Cahoon for the assistance in weed identification. We also thank Mr. Shea Hoffman and Mr. Vinay Kumar for helping label the weed images.


  • A. Ahmad, D. Saraswat, V. Aggarwal, A. Etienne, and B. Hancock (2021) Performance of deep learning models for classifying and detecting common weeds in corn and soybean production systems. Computers and Electronics in Agriculture 184, pp. 106081. Cited by: §1, §2.2.
  • J. Ahmad, K. Muhammad, I. Ahmad, W. Ahmad, M. L. Smith, L. N. Smith, D. K. Jain, H. Wang, and I. Mehmood (2018) Visual features based boosted classification of weeds for real-time selective herbicide sprayer systems. Computers in Industry 98, pp. 23–33. Cited by: §1.
  • A. Bakhshipour and A. Jafari (2018) Evaluation of support vector machine and artificial neural networks in weed detection using shape features. Computers and Electronics in Agriculture 145, pp. 153–160. Cited by: §1.
  • E. Barnes, G. Morgan, K. Hake, J. Devine, R. Kurtz, G. Ibendahl, A. Sharda, G. Rains, J. Snider, J. M. Maja, et al. (2021) Opportunities for robotic systems and automation in cotton production. AgriEngineering 3 (2), pp. 339–363. Cited by: §1.
  • O. Bawden, J. Kulk, R. Russell, C. McCool, A. English, F. Dayoub, C. Lehnert, and T. Perez (2017) Robot for weed species plant-specific management. Journal of Field Robotics 34 (6), pp. 1179–1199. Cited by: §1.
  • Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng (2017) Dual path networks. arXiv preprint arXiv:1707.01629. Cited by: Table 2.
  • F. Chollet (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. Cited by: §2.2, Table 2.
  • A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath (2018) Generative adversarial networks: an overview. IEEE Signal Processing Magazine 35 (1), pp. 53–65. Cited by: §4.2.
  • Y. Cui, M. Jia, T. Lin, Y. Song, and S. Belongie (2019) Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9268–9277. Cited by: §4.1, §4.1.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §2.2.
  • S. O. Duke (2015) Perspectives on transgenic, herbicide-resistant crops in the united states almost 20 years after introduction. Pest management science 71 (5), pp. 652–657. Cited by: §1.
  • M. Dyrmann, H. Karstoft, and H. S. Midtiby (2016) Plant species classification using deep convolutional neural network. Biosystems engineering 151, pp. 72–80. Cited by: §1.
  • B. Espejo-Garcia, N. Mylonas, L. Athanasakos, and S. Fountas (2020a) Improving weeds identification with a repository of agricultural pre-trained deep neural networks. Computers and Electronics in Agriculture 175, pp. 105593. Cited by: §1.
  • B. Espejo-Garcia, N. Mylonas, L. Athanasakos, S. Fountas, and I. Vasilakoglou (2020b) Towards weeds identification assistance through transfer learning. Computers and Electronics in Agriculture 171, pp. 105306. Cited by: §1, §1, §2.2.
  • B. Espejo-Garcia, N. Mylonas, L. Athanasakos, E. Vali, and S. Fountas (2021) Combining generative adversarial networks and agricultural transfer learning for weeds identification. Biosystems Engineering 204, pp. 79–89. Cited by: §4.2.
  • S. A. Fennimore and M. Cutulle (2019) Robotic weeders can improve weed control options for specialty crops. Pest management science 75 (7), pp. 1767–1774. Cited by: §1.
  • R. Gerhards and S. Christensen (2003) Real-time weed detection, decision making and patch spraying in maize, sugarbeet, winter wheat and winter barley. Weed research 43 (6), pp. 385–392. Cited by: §1.
  • T. M. Giselsson, M. Dyrmann, R. N. Jørgensen, P. K. Jensen, and H. S. Midtiby (2017) A Public Image Database for Benchmark of Plant Seedling Classification Algorithms. arXiv preprint. Cited by: §1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2018) Softmax units for multinoulli output distributions. In Deep Learning, MIT Press. Cited by: §2.2.
  • E. Hamuda, M. Glavin, and E. Jones (2016) A survey of image processing techniques for plant extraction and segmentation in the field. Computers and Electronics in Agriculture 125, pp. 184–199. Cited by: §1.
  • A. M. Hasan, F. Sohel, D. Diepeveen, H. Laga, and M. G. Jones (2021) A survey of deep learning techniques for weed detection from images. Computers and Electronics in Agriculture 184, pp. 106067. Cited by: §1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §2.2, Table 2.
  • A. Howard, M. Sandler, G. Chu, L. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al. (2019) Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324. Cited by: Table 2.
  • K. Hu, G. Coleman, S. Zeng, Z. Wang, and M. Walsh (2020) Graph weeds net: a graph-based deep learning method for weed recognition. Computers and Electronics in Agriculture 174, pp. 105520. Cited by: §1.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §2.2, Table 2.
  • F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer (2016) SqueezeNet: alexnet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360. Cited by: Table 2.
  • A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi (2020) A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review 53 (8), pp. 5455–5516. Cited by: §1.
  • S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri (2017) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE transactions on neural networks and learning systems 29 (8), pp. 3573–3587. Cited by: §4.1.
  • K. Koo and E. Cha (2017) Image recognition performance enhancements using image normalization. Human-centric Computing and Information Sciences 7 (1), pp. 1–11. Cited by: §2.2.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25, pp. 1097–1105. Cited by: Table 2.
  • S. Leminen Madsen, S. K. Mathiassen, M. Dyrmann, M. S. Laursen, L. Paz, and R. N. Jørgensen (2020) Open plant phenotype database of common weeds in denmark. Remote Sensing 12 (8), pp. 1246. Cited by: §1.
  • E. Library (2021) High quality plant datasets for your AI application or research project, in one place. Cited by: §1.
  • T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pp. 2980–2988. Cited by: §4.1.
  • Y. Lu and S. Young (2020) A survey of public datasets for computer vision tasks in precision agriculture. Computers and Electronics in Agriculture 178, pp. 105760. Cited by: §1, §2.1.
  • S. Manalil, O. Coast, J. Werth, and B. S. Chauhan (2017) Weed management in cotton (gossypium hirsutum l.) through weed-crop competition: a review. Crop Protection 95, pp. 53–59. Cited by: §1.
  • G. E. Meyer and J. C. Neto (2008) Verification of color vegetation indices for automated crop imaging applications. Computers and electronics in agriculture 63 (2), pp. 282–293. Cited by: §1.
  • J. K. Norsworthy, S. M. Ward, D. R. Shaw, R. S. Llewellyn, R. L. Nichols, T. M. Webster, K. W. Bradley, G. Frisvold, S. B. Powles, N. R. Burgos, et al. (2012) Reducing the risks of herbicide resistance: best management practices and recommendations. Weed science 60 (SP1), pp. 31–62. Cited by: §1.
  • K. O’Shea and R. Nash (2015) An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458. Cited by: §1.
  • E. Oerke (2006) Crop losses to pests. The Journal of Agricultural Science 144 (1), pp. 31–43. Cited by: §1.
  • A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, et al. (2019) DeepWeeds: a multiclass weed species image dataset for deep learning. Scientific reports 9 (1), pp. 1–12. Cited by: §1, §1, §1, §2.2.
  • P. Pandey, H. N. Dakshinamurthy, and S. N. Young (2021) Autonomy in detection, actuation, and planning for robotic weeding systems. Transactions of the ASABE, pp. 0. Cited by: §1.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. Advances in neural information processing systems 32, pp. 8026–8037. Cited by: §2.2.
  • T. H. Phan and K. Yamamoto (2020) Resolving class imbalance in object detection with weighted cross entropy losses. arXiv preprint arXiv:2006.01413. Cited by: §2.4, §4.1.
  • S. Raschka (2018) Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808. Cited by: §2.2.
  • M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520. Cited by: Table 2.
  • U. E. R. Service. (2015) Genetically engineered varieties of corn, upland cotton, and soybeans, by state and for the united states, 2000–15. Adoption of genetically engineered crops. Cited by: §1.
  • C. Shorten and T. M. Khoshgoftaar (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6 (60), pp. 1–48. Cited by: §4.2, §4.2.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §2.2, §2.5, Table 2.
  • H. K. Suh, J. Ijsselmuiden, J. W. Hofstee, and E. J. van Henten (2018) Transfer learning for the classification of sugar beet and volunteer potato under field conditions. Biosystems engineering 174, pp. 50–65. Cited by: §1.
  • C. Sun, A. Shrivastava, S. Singh, and A. Gupta (2017) Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision, pp. 843–852. Cited by: §1.
  • C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence. Cited by: Table 2.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9. Cited by: Table 2.
  • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §2.2, Table 2.
  • M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le (2019) Mnasnet: platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2820–2828. Cited by: §2.2, Table 2.
  • M. Tan and Q. Le (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pp. 6105–6114. Cited by: §2.2, Table 2, §4.2.
  • A. Wang, W. Zhang, and X. Wei (2019) A review on weed detection using ground-based machine vision and image processing techniques. Computers and electronics in agriculture 158, pp. 226–240. Cited by: §1.
  • J. H. Westwood, R. Charudattan, S. O. Duke, S. A. Fennimore, P. Marrone, D. C. Slaughter, C. Swanton, and R. Zollinger (2018) Weed management in 2050: perspectives on the future of weed science. Weed science 66 (3), pp. 275–285. Cited by: §1.
  • D. M. Woebbecke, G. E. Meyer, K. Von Bargen, and D. A. Mortensen (1995) Color indices for weed identification under various soil, residue, and lighting conditions. Transactions of the ASAE 38 (1), pp. 259–269. Cited by: §1.
  • P. Xia, L. Zhang, and F. Li (2015) Learning similarity with cosine similarity ensemble. Information Sciences 307, pp. 39–52. Cited by: §2.5, §2.5.
  • Y. Yang and X. Liu (1999) A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 42–49. Cited by: §2.3.3.
  • S. L. Young, G. E. Meyer, and W. E. Woldt (2014) Future directions for automated weed management in precision agriculture. In Automation: The future of weed control in cropping systems, pp. 249–259. Cited by: §1.
  • F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He (2020) A comprehensive survey on transfer learning. Proceedings of the IEEE 109 (1), pp. 43–76. Cited by: §1, §2.2.