Semi-supervised dry herbage mass estimation using automatic data and synthetic images

10/26/2021
by   Paul Albert, et al.
Insight Centre for Data Analytics

Monitoring species-specific dry herbage biomass is an important aspect of pasture-based milk production systems. Being aware of the herbage biomass in the field enables farmers to manage surpluses and deficits in herbage supply, as well as using targeted nitrogen fertilization when necessary. Deep learning for computer vision is a powerful tool in this context as it can accurately estimate the dry biomass of a herbage parcel using images of the grass canopy taken using a portable device. However, the performance of deep learning comes at the cost of an extensive, and in this case destructive, data gathering process. Since accurate species-specific biomass estimation is labor intensive and destructive for the herbage parcel, we propose in this paper to study low supervision approaches to dry biomass estimation using computer vision. Our contributions include: a synthetic data generation algorithm to generate data for a herbage height aware semantic segmentation task, an automatic process to label data using semantic segmentation maps, and a robust regression network trained to predict dry biomass using approximate biomass labels and a small trusted dataset with gold standard labels. We design our approach on a herbage mass estimation dataset collected in Ireland and also report state-of-the-art results on the publicly released Grass-Clover biomass estimation dataset from Denmark. Our code is available at https://git.io/J0L2a

1 Introduction

Local monitoring of the biomass composition of grassland has great potential to improve the reasonable use of fertilizers on dairy farms. Nitrogen over-fertilization has detrimental effects on the environment such as the pollution of underground water or nearby rivers and a reduction in crop yield [3, 25, 39]. Clover is an important ally in reducing the need for nitrogen fertilization as it influences the impact of the fertilization process [52, 41]. Mapping the density of the clover content in grassland enables targeted fertilization (as opposed to uniform fertilization), which allows farmers to anticipate the amount of nitrogen required and to limit over-fertilization. Balanced amounts of clover also have an important role to play in the final dry feed for the cow, as sufficient amounts of clover in the cow feed increase food intake and augment dairy production [38].

Figure 1: Overview of the dry herbage mass prediction task 

Species phenotyping proposes a direct application of computer vision where a canopy view of the objects is passed to an algorithm tasked with a computer vision problem. Some examples of these tasks include semantic segmentation [51, 18, 36], object counting [27, 6], classification [14, 17, 37], object detection [24, 47] and regression [38, 40]. The principal limitation when applying deep learning approaches to species phenotyping remains the large amount of annotated data required. Lower supervision alternatives using semi-supervised or unsupervised approaches can lower the annotation burden and enable a stronger convergence than using a small number of annotated images alone. In the case of grass/clover biomass estimation this is even more important, as the annotation process is destructive. To accurately measure biomass the region of interest has to be cut, separated, and weighed in a laboratory whereas the collection of un-annotated images is fast and simple.

In this paper, we use a large collection of unlabeled images together with a small annotated subset to improve the accuracy of a dry herbage mass predicting convolutional neural network (CNN, see Figure 1). We first learn a weakly-supervised semantic segmentation network on synthetic images to estimate the species density in the herbage. We then use the segmentation masks to generate automatic biomass labels for the unlabeled images using a simple regression algorithm. Finally, we train a convolutional neural network on a mix of the automatically labeled data and a small number of manually labeled examples to improve the regression accuracy over training on the small number of manually labeled examples alone. We construct our algorithm on an Irish dry herbage mass dataset [19] and validate our results on a publicly available dry biomass dataset [51] collected in Denmark.

Our contributions are:

  1. A herbage height aware, weakly supervised, semantic segmentation algorithm trained on synthetic images that is used to automatically label data;

  2. An algorithm leveraging automatically labeled images to improve grass/clover/weed dry herbage mass estimation;

  3. A detailed study of the importance of the low supervision elements for the final accuracy of our algorithm, and a comparison against the state-of-the-art on a publicly available dataset.

2 Related work

2.1 Image analysis for plant phenotyping

Plant phenotyping and dry matter prediction are excellent domains for the application of image analysis approaches since they enable insight to be extracted from the environment in a non-destructive manner. Existing works explore a variety of computer vision applications for plant phenotyping and in this section we review some of the most relevant to our work. Weed detection aims at localizing unwanted weeds to ultimately remove them by hand or using a robot. Common approaches include employing color filtering, edge detection, and area classification [44, 43, 54]; using color features to train random forest algorithms and support vector machines [21, 22]; or using neural networks to semantically segment images [29]. Fruit or vegetable detection and counting reduces human labor by enabling automatic fruit treatment or collection on the farm. Examples include tomato segmentation and counting using a convolutional neural network [1], large scale fruit detection in trees [47], or real-time detection using a lightweight neural network [9].

Some approaches use UAV imaging as opposed to ground-level image capture, introducing a fast solution to mapping weeds in a field [15, 22]. As well as using RGB images alone, additional sensors can be added to reduce the difficulty of the phenotyping task [35] such as radar or lidar [30].

2.2 Species biomass estimation from canopy view images

Biomass estimation from canopy view images aims at providing solutions for targeted fertilization in fields. This opens the way for automated fertilization, reducing costs for the farmer and reducing water pollution due to over-fertilization [3, 52]. The heavy occlusions present in canopy images (see Figure 1) pose significant challenges as the biomass estimate should account for elements hidden from the canopy view.

Himstedt et al. [20] study the biomass of clover in a legume-grass mixture and demonstrate that morphological filtering and color segmentation can reliably detect the clover in the mixture. The authors were then able to accurately predict the clover biomass in a controlled environment under the assumption that the total biomass is known. Mortensen et al. [38] propose to segment the grass-clover mixture using color filtering and edge detection before employing a linear regressor to learn the mapping between the coverage area of each species and the dry biomass content. The authors were then able to directly predict the dry biomass of each element from the image alone. The cow feed scenario presents the added constraint of estimating dry biomass from an image of the fresh pasture.

Skovsen et al. [50] propose an improvement over previous work by using a neural network to segment images, and then fitting a linear regressor to the detected species percentages to predict the biomass percentages. To train the neural network, a synthetic dataset is generated using sample crops of relevant species pasted on a soil background. This allows the authors to generate an unlimited amount of training images with ground truth from a similar visual domain for their segmentation algorithm. Based on this work, the GrassClover dataset challenge [51] asks entrants to improve the authors' baseline using the synthetic images together with a large collection of unlabeled real images and a small set of manually labeled real images.

2.3 Semantic segmentation on synthetic images and domain adaptation

Semantic segmentation aims at predicting the object class that each pixel in an image belongs to [16]. The human annotation required for semantic segmentation tasks is extensive, often requiring several hours per image [33]. This makes training strategies using fewer human annotated images attractive. Synthetic images promise to solve part of the problem by providing an unlimited amount of perfectly segmented training images. Popular synthetic datasets for semantic segmentation include the Grand Theft Auto V (GTA V) [45] and SYNTHIA [46] datasets, which create synthetic images of cities using graphics engines.

Although the large quantity of labeled data allows a semantic segmentation neural network to converge on a synthetic dataset, the results need to generalize to real world data. Domain adaptation aims at learning domain agnostic features that can generalize from synthetic data to the real world. Domain adaptation strategies can be applied at different stages in a network: input adaptation, feature adaptation, or output adaptation. Input adaptation strategies aim at transforming synthetic images to look more realistic by applying a realistic style on synthetic images, often using a Generative Adversarial Network [60, 49, 48, 12].

Feature adaptation approaches aim to discover domain invariant (or aligned) features. Chen et al. [11] propose to use a maximum square loss to enforce a linear gradient increase between easier and harder classes. Luo et al. [34] use a significance aware adversarial information bottleneck. Chen et al. [13] propose a knowledge distillation approach by matching network activations to a network pretrained on ImageNet.

Output adaptation techniques constrain the network prediction directly to enforce better generalization. This can be achieved using adversarial approaches where the predictions made on synthetic and real data should be indistinguishable to a discriminator network [8], or by enforcing low entropy (more confident) predictions [56]. Batch normalization fine tuning, where the batch normalization parameters are tuned on the real images before evaluation, has also been shown to be a simple but effective domain transfer strategy [32]. For a more detailed study of domain adaptation for semantic segmentation, we refer the reader to the review of Toldo et al. [55].

2.4 Semi-supervised learning and label noise

Training computer vision algorithms with limited supervision aims at learning representative features for a downstream task with little to no supervision. In the scope of this paper we train models using a small annotated subset together with a large amount of un-annotated images, which we refer to as a semi-supervised learning scenario. We additionally review the label noise literature, which tackles the scenario of approximately labeled data and is related to the automatic labels we use in this paper.

Semi-supervised learning aims at learning robust features to solve a task using limited annotations. Annotations are necessary in supervised learning to compute the weights of a neural network using gradient descent on a loss computed using the ground truth annotations. In the case of semi-supervised learning, only part of the dataset has been annotated by humans and the rest of the images are unlabeled. Iteratively approximating labels for the unlabeled data is an error-prone task as the errors made by the network will be amplified (confirmation bias [5]). State-of-the-art semi-supervised learning uses consistency regularization mechanisms where labels are guessed using multiple views of a sample (different data augmentations) [7], sometimes coupled with pseudo-labeling [53].

Label noise research proposes robust algorithms to mitigate approximate labeling. Approximate labeling can occur when a dataset is created from web queries [31] or when labels are inferred using label propagation [2]. Solutions for training a neural network on datasets with label noise include lowering the contribution of noisy labels in the training loss [23], correcting the label using the network prediction [4], meta-learning inspired corrections [57], monitoring feature space consistency [42], or robust data augmentation [59].

3 Biomass prediction in grass-clover pastures

This section introduces the semi-supervised learning problem of dry biomass estimation of grass-clover pastures, the datasets used, the synthetic image generation process, the automatic labelling pipeline, and our automatic label robust biomass regression algorithm.

Figure 2: Herbage height aware semantic segmentation on synthetic images 

3.1 Semi-supervised biomass estimation in grass-clover pastures

We consider here a semi-supervised regression problem with a small set of labeled canopy images of grass and clover, each paired with a label vector that holds one value per species to predict. The small labeled set is complemented by a much larger set of unlabeled images with no corresponding labels. The complete dataset used to train the network is the union of the labeled and unlabeled sets. This paper aims to solve the dry biomass prediction problem from images using a convolutional neural network, using the unlabeled images to improve the regression accuracy.

3.2 Grass clover dry biomass datasets

We consider two different dry biomass prediction datasets, both centered around grass and clover biomass prediction. The first dataset, which we will refer to as the Irish dataset [19], consists of images labeled with: total dry herbage mass (kg DM/ha), dry grass biomass percentage (%), dry clover biomass percentage (%), and dry weed biomass percentage (%). We study here the low supervision version of the dataset, which includes a small set of fully annotated images (split into training and validation subsets) and an additional set of unlabeled images. The images were collected using a high resolution Canon camera in Ireland in the summer of 2020.

The second dataset is the GrassClover dataset [51], which contains images labeled with: dry grass biomass percentage (%), dry white clover biomass percentage (%), dry red clover biomass percentage (%), and dry weed biomass percentage (%). Contrary to the Irish dataset, the GrassClover dataset distinguishes between red and white clover species but does not target the direct estimation of the dry herbage biomass (kg DM/ha). The fully annotated images are complemented by unlabeled images without corresponding ground truth. The dataset was collected in Denmark in 2017 and 2018.

Figure 3: Cropped out samples for every species

3.3 Herbage height aware semantic segmentation on synthetic images 

The task we aim to solve in this section is twofold: first, to predict a semantic segmentation of the herbage into grass, clover (possibly split into red and white), and weeds; and second, to predict a herbage height map. Since human annotation of ground truth for semantic segmentation can take up to several hours per image [33] and since a pixel specific herbage height is difficult to estimate in practice, we propose (similar to [51]) to train our semantic segmentation network on a synthetically generated dataset. We generate the synthetic semantic segmentation images together with their pixel-accurate synthetic segmentation ground truth using manually cropped out elements from the unlabeled images. In accordance with the low supervision scope of this paper, we only crop out a limited number of samples (see Figure 3) and collect bare soil images to paste elements onto. The bare soil images are collected at the same site and using the same equipment as Hennessey et al. [19] during the summer of 2021.

Figure 4: Automatic labeling from semantic segmentation

To produce images similar to the real images we aim to make predictions for, we respect the species ratio in images by pasting each species with a probability that follows the average dry biomass distribution observed in the training dataset for grass, clover, and weeds. We draw the probability of each species to be pasted from a Dirichlet distribution with one component per species (grass, clover, weeds). Once the species has been decided, we randomly draw a sample for this category and apply a series of transformations to increase the diversity of the synthetic images. The transformations include (uniform) random rotation, random Gaussian blur, random brightness change, and random resizing. Finally, we select a random center location to paste the sample on the background image and paste a mask of the sample’s label at the same location on the ground truth map.

We additionally approximate the herbage height in the synthetic images as the number of successive elements pasted on a pixel. In the rest of the paper, this approximation made on synthetic images will be referred to as herbage height. For example, if three samples have been pasted at the same pixel (clover on top of grass on top of clover), we define the un-normalized herbage height as 3 for the given pixel. Once the synthetic dataset has been fully generated, we compute the 75th percentile of the herbage height over every pixel in all generated images (allowing us to filter outliers) and use this value to clip overly high herbage height numbers and produce a normalized herbage height between 0 and 1 for every pixel in every synthetic image. The normalized herbage height becomes the ground truth target for the herbage height prediction of the segmentation network.

Additionally, we found that the quality of the segmentation learnt by the network is best when the number of elements pasted per image stays within a bounded range (randomly varied across images); beyond this range the synthetic images become overly cluttered. Images are generated at a fixed resolution. The RGB images are stored in the JPEG format, the grayscale ground truth maps are stored as PNG images, and the herbage height matrix is stored as a compressed numpy array. Figure 4 illustrates the automatic labeling pipeline.
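
The generation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: rectangular patches stand in for the cropped plant samples, the rotation, blur, brightness, and resizing steps are omitted, and the function names, the Dirichlet parameters `alpha`, the image size, and the number of pasted elements are all hypothetical values chosen for the example.

```python
import numpy as np


def generate_synthetic_sample(rng, size=64, n_elements=20, alpha=(6.0, 3.0, 1.0)):
    """Paste random rectangular stand-ins for plant crops on an empty canvas.

    Returns a label map (0 = soil, 1 = grass, 2 = clover, 3 = weeds) and an
    un-normalized height map counting how many elements cover each pixel.
    All default values are hypothetical.
    """
    labels = np.zeros((size, size), dtype=np.uint8)
    height = np.zeros((size, size), dtype=np.float32)
    # Per-image species probabilities drawn from a Dirichlet distribution.
    species_probs = rng.dirichlet(alpha)
    for _ in range(n_elements):
        species = rng.choice(3, p=species_probs) + 1
        h, w = rng.integers(4, size // 2, size=2)
        cy, cx = rng.integers(0, size, size=2)  # random center location
        y0, y1 = max(cy - h // 2, 0), min(cy + h // 2 + 1, size)
        x0, x1 = max(cx - w // 2, 0), min(cx + w // 2 + 1, size)
        labels[y0:y1, x0:x1] = species  # the newest element ends up on top
        height[y0:y1, x0:x1] += 1.0     # one more layer at these pixels
    return labels, height


def normalize_heights(height_maps, percentile=75.0):
    """Clip every map at a dataset-wide percentile and rescale to [0, 1]."""
    all_pixels = np.concatenate([h.ravel() for h in height_maps])
    # Guard against a zero clip value on nearly empty datasets (our addition).
    clip_value = max(float(np.percentile(all_pixels, percentile)), 1.0)
    return [np.clip(h, 0.0, clip_value) / clip_value for h in height_maps]


rng = np.random.default_rng(0)
samples = [generate_synthetic_sample(rng) for _ in range(5)]
normalized = normalize_heights([h for _, h in samples])
```

Computing the percentile over the whole dataset rather than per image keeps the normalized heights comparable between sparse and cluttered images.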

3.4 Generating synthetic images suitable for herbage mass estimation 

To concurrently solve the tasks of semantically segmenting the herbage images and estimating the herbage height for every pixel in the images, we propose a herbage height aware semantic segmentation network consisting of a single feature extractor coupled with two decoder branches (see Figure 2). We concurrently train the species segmentation branch using a pixel-level cross-entropy loss:

$$\mathcal{L}_{seg} = -\frac{1}{P} \sum_{p=1}^{P} \sum_{c=1}^{C} y_{p,c} \log \hat{y}_{p,c}$$

where $P$ is the total amount of pixels in the images, $C$ is the number of classes, $\hat{y}_{p,c}$ is the softmaxed prediction of the network and $y_{p,c}$ are the synthetic segmentation labels. The herbage height branch is trained using a root mean square error (RMSE) loss:

$$\mathcal{L}_{h} = \sqrt{\frac{1}{P} \sum_{p=1}^{P} \left(h_{p} - \hat{h}_{p}\right)^{2}}$$

where $h_{p}$ is the ground truth synthetic height label and $\hat{h}_{p}$ is the network prediction (sigmoid). The total training loss of the segmentation network combines the two terms $\mathcal{L}_{seg}$ and $\mathcal{L}_{h}$.
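
As an illustration of the two training objectives, the sketch below evaluates a pixel-level cross-entropy and a height RMSE with NumPy. The function names are hypothetical, the cross-entropy is averaged over pixels, and the two terms are combined as an unweighted sum; both of these choices are assumptions made for the example, since the exact normalization and weighting are not recoverable from the text.

```python
import numpy as np


def pixel_cross_entropy(logits, labels):
    """Mean pixel-level cross-entropy.

    logits: float array of shape (C, H, W); labels: int array of shape (H, W).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    # Pick the log-probability of the ground truth class at every pixel.
    return float(-log_softmax[labels, rows, cols].mean())


def height_rmse(height_logits, target):
    """RMSE between the sigmoid of the height branch output and the target."""
    pred = 1.0 / (1.0 + np.exp(-height_logits))  # sigmoid keeps values in (0, 1)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def segmentation_loss(logits, labels, height_logits, target):
    # Assumption: unweighted sum of the two branch losses.
    return pixel_cross_entropy(logits, labels) + height_rmse(height_logits, target)


# Uninformative predictions on a 2 x 2 image with 4 classes.
logits = np.zeros((4, 2, 2))
labels = np.array([[0, 1], [2, 3]])
height_logits = np.zeros((2, 2))          # sigmoid(0) = 0.5
target = np.full((2, 2), 0.5)
loss = segmentation_loss(logits, labels, height_logits, target)
```

With uniform logits the cross-entropy equals the logarithm of the number of classes, and the height term vanishes because the sigmoid output matches the target.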

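| Method | HRMSE Total | HRMSE Grass | HRMSE Clover | HRMSE Weeds | HRMSE Avg. | HRAE | RMSE Grass | RMSE Clover | RMSE Weeds | RMSE Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Simple DA | 357.35 | 328.66 | 55.74 | 26.75 | 137.05 | 35.26 | 8.11 | 6.87 | 3.22 | 6.07 |
| + ColorJitter | 319.92 | 289.32 | 60.81 | 31.40 | 127.18 | 35.46 | 8.63 | 7.68 | 3.55 | 6.62 |
| + BN tuning | 284.60 | 258.34 | 51.92 | 27.05 | 112.44 | 31.79 | 6.49 | 4.94 | 3.24 | 4.89 |

Table 1: Importance of data augmentation and batch normalization tuning when training on synthetic images. HRMSE is in kg DM/ha; HRAE and RMSE are in %.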

3.5 Automatic label prediction from species density estimations

The herbage height aware semantic segmentation network allows us to reduce the complexity of the biomass prediction problem by simplifying the input domain from high resolution real RGB images to the surface area occupied by each species in the canopy as well as an estimated herbage height map. From there, we compute the relative area occupied by each species in the canopy (in %) and the predicted herbage height over each image and train a simple ridge regression algorithm, using the small number of labeled images, to predict approximate labels for the unlabeled images. This intermediate task allows us to generate accurate automatic labels for the unlabeled images even if the number of labeled images is very limited.
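
The automatic labeling step amounts to fitting a regularized linear model on a handful of labeled feature vectors and applying it to the unlabeled ones. The following sketch uses the closed-form ridge solution in NumPy; the function names, the regularization value `alpha`, and the randomly generated features are hypothetical and only serve to show the mechanics.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    features: (N, F) coverage and height features from the segmentation output.
    targets:  (N, T) biomass labels of the small labeled set.
    """
    x_mean, y_mean = features.mean(axis=0), targets.mean(axis=0)
    xc, yc = features - x_mean, targets - y_mean
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(features, weights, intercept):
    return features @ weights + intercept


# Hypothetical data: 30 labeled and 200 unlabeled images, 5 features, 4 targets.
rng = np.random.default_rng(0)
true_w, true_b = rng.normal(size=(5, 4)), rng.normal(size=4)
labeled_x = rng.random((30, 5))
labeled_y = labeled_x @ true_w + true_b
unlabeled_x = rng.random((200, 5))

weights, intercept = fit_ridge(labeled_x, labeled_y, alpha=1e-8)
automatic_labels = predict(unlabeled_x, weights, intercept)
```

Because the model has only a few parameters per target, it can be fitted on very few labeled images, which is what makes this intermediate step attractive in a low supervision setting.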

3.6 Regression on automatic labels with a trusted subset

Although the biomass information can be directly predicted using the automatic annotation process (as done in Skovsen et al. [51]), we attempt to decrease the regression error further by solving the regression problem directly from the RGB images using a convolutional neural network trained on both human-labeled and automatically labeled image datasets: the labeled images coupled with ground truth labels (the trusted set) and the unlabeled images coupled with approximate labels (the automatically labeled set). The regression network is trained to predict the biomass composition (%) and the dry herbage mass (kg DM/ha) from RGB images alone; the synthetic images are only used by the segmentation network to help predict the automatic labels for the unlabeled images. To ensure that the regression network will not over-fit to incorrect approximate labels, we use three mechanisms. First, we over-sample the trusted data to ensure that a fixed percentage of trusted labels is always presented to the network in every mini-batch. Second, we use a label perturbation strategy where we randomly perturb the automatic labels to avoid over-fitting incorrect targets, and to avoid penalizing the network for making a prediction slightly different from the incorrect automatic label. In practice, we randomly perturb the label by two times the observed RMSE of the automatic labels on the validation set. Finally, we find that vertical flipping and random grayscaling of the input images are useful augmentations that preserve the full herbage information of the image and help further decrease validation error.
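
The first two mechanisms can be illustrated with the standard library alone. The sketch below is one possible reading of the text, not the authors' code: the function names, the trusted fraction of 25%, the batch size, and the RMSE values are hypothetical, and the perturbation is assumed to be uniform noise bounded by two times the RMSE for each target.

```python
import random


def sample_mixed_batch(trusted, automatic, batch_size, trusted_fraction, rng):
    """Build a mini-batch with a fixed share of trusted samples.

    Trusted samples are drawn with replacement, which over-samples the small
    trusted set; automatic samples are drawn without replacement.
    """
    n_trusted = max(1, round(batch_size * trusted_fraction))
    n_auto = batch_size - n_trusted
    batch = [(x, y, True) for x, y in rng.choices(trusted, k=n_trusted)]
    batch += [(x, y, False) for x, y in rng.sample(automatic, k=n_auto)]
    rng.shuffle(batch)
    return batch


def perturb_label(label, rmse, rng, scale=2.0):
    """Add uniform noise in [-scale * rmse, +scale * rmse] to each target.

    Assumption: the noise distribution is uniform; the text only states the
    magnitude of the perturbation.
    """
    return [y + rng.uniform(-scale * r, scale * r) for y, r in zip(label, rmse)]


rng = random.Random(0)
trusted = [(f"trusted_{i}.jpg", [50.0, 30.0, 20.0]) for i in range(5)]
automatic = [(f"auto_{i}.jpg", [55.0, 25.0, 20.0]) for i in range(100)]

batch = sample_mixed_batch(trusted, automatic, batch_size=32,
                           trusted_fraction=0.25, rng=rng)
validation_rmse = [4.0, 4.0, 3.0]  # hypothetical per-target RMSE of the automatic labels
noisy = perturb_label([55.0, 25.0, 20.0], validation_rmse, rng)
```

In a training loop, only the samples flagged as automatic would be passed through `perturb_label`, so that the trusted targets stay exact.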

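| Features | HRMSE Total | HRMSE Grass | HRMSE Clover | HRMSE Weeds | HRMSE Avg. | HRAE | RMSE Grass | RMSE Clover | RMSE Weeds | RMSE Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| HL | 351.54 | 332.88 | 51.34 | 28.29 | 137.50 | 41.61 | 6.82 | 6.20 | 3.25 | 5.42 |
| SL | 310.68 | 279.98 | 57.48 | 28.15 | 121.87 | 34.18 | 7.61 | 5.20 | 3.24 | 5.35 |
| HL + SL | 315.20 | 288.52 | 53.37 | 28.11 | 123.33 | 34.33 | 6.49 | 4.91 | 3.23 | 4.88 |
| HL + SL + H | 284.60 | 258.34 | 51.92 | 27.05 | 112.44 | 31.79 | 6.49 | 4.94 | 3.24 | 4.89 |

Table 2: Ablation study for predicting approximate labels. We report the biomass prediction errors on a held-out validation set. HL: hard labels, SL: soft labels, H: herbage height. HRMSE is in kg DM/ha; HRAE and RMSE are in %.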

4 Experiments

4.1 Training details 

We use two different neural networks to solve two distinct tasks. For the semantic segmentation network, we use a state-of-the-art architecture, DeepLabV3+ [10], where we duplicate the decoder to create the herbage height branch. The network is trained and validated on synthetic images. We use a ResNet34 [26] as the feature extractor, initialized on ImageNet [28], with an output stride of 16 for both training and testing. We use the “poly” learning rate schedule [10]. For the base data augmentation we resize images on the short side, randomly crop a square, randomly flip horizontally, and normalize the images.
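
For reference, the "poly" schedule commonly used with DeepLab-style networks decays the learning rate polynomially towards zero over training. The sketch below uses the usual power of 0.9; the function name, the base learning rate, and the number of steps are hypothetical values chosen for the example.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay: base_lr * (1 - step / total_steps) ** power."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return base_lr * (1.0 - step / total_steps) ** power


# Hypothetical base learning rate and training length.
schedule = [poly_lr(0.1, step, 100) for step in range(101)]
```

The schedule starts at the base learning rate, decreases monotonically, and reaches zero at the final step.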

For the regression network, we use a ResNet18 network [58] pretrained on ImageNet to solve the regression problem from RGB images directly. We train for 100 epochs and divide the learning rate by 2 at two fixed epochs during training. We use the same base data augmentation as for the segmentation network but at a lower resolution. For the strong(er) data augmentation, we add random vertical flipping and random grayscaling.

We use the Irish dataset [19] in its low supervision configuration, with separate training, validation, and test splits, for our exploratory studies, and generate 1000 synthetic images to train the segmentation network according to the process described in Section 3.4. We validate our results on the GrassClover dataset [51] and use all of the fully annotated biomass images, dividing them into training and validation subsets; we use the images withheld for the CodaLab challenge (https://competitions.codalab.org/competitions/21122) for testing. We make use of a random selection of the synthetic images generated by the authors of the dataset to train the segmentation network, keeping extra images for validation. We do not train the herbage height branch on the GrassClover dataset.

To evaluate the performance of the algorithms, we report the RMSE when predicting the dry biomass species percentage for both the Irish and GrassClover datasets. For the Irish dataset, we additionally report the RMSE of the global herbage mass prediction (HRMSE, kg DM/ha), the herbage relative absolute error (HRAE, in %) and the HRMSE specific to each species (kg DM/ha).
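
The metrics can be written down compactly. In the sketch below, RMSE follows its standard definition, while the relative absolute error is assumed to be the mean of the absolute error divided by the ground truth, expressed in percent; this definition of HRAE and the function names are assumptions, and the herbage mass values are made up for the example.

```python
import math


def rmse(pred, true):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def relative_absolute_error(pred, true):
    """Mean of |pred - true| / true, in percent (assumed HRAE definition)."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


# Hypothetical total herbage mass values in kg DM/ha.
true_mass = [1000.0, 2000.0]
pred_mass = [1100.0, 1800.0]
hrmse = rmse(pred_mass, true_mass)
hrae = relative_absolute_error(pred_mass, true_mass)
```

The relative error complements the RMSE because a fixed error in kg DM/ha matters more on a sparse parcel than on a dense one.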

4.2 Semantic segmentation on synthetic images

To encourage the segmentation network to learn robust features that will generalize to unseen real images, we augment the synthetic images using color jittering and Gaussian blur. Furthermore, once the network has converged on the synthetic dataset and before predicting on the real images, we perform batch normalization tuning on the real images, which is a common domain adaptation strategy [32]. An ablation study on the importance of the data augmentation and batch normalization tuning is given in Table 1, where we use the best performing regression algorithm from Section 4.3.

| Method | HRMSE Total | HRMSE Grass | HRMSE Clover | HRMSE Weeds | HRMSE Avg. | HRAE | RMSE Grass | RMSE Clover | RMSE Weeds | RMSE Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 284.60 | 258.34 | 51.92 | 27.05 | 112.44 | 31.79 | 6.49 | 4.94 | 3.24 | 4.89 |
| T | 249.48 | 253.63 | 45.62 | 32.67 | 110.64 | 21.67 | 6.28 | 5.07 | 3.94 | 5.10 |
| A | 258.00 | 239.81 | 46.51 | 27.74 | 104.69 | 23.48 | 5.72 | 5.20 | 3.29 | 4.74 |
| T + A | 245.04 | 233.34 | 34.94 | 26.32 | 98.20 | 21.60 | 4.70 | 4.45 | 3.17 | 4.11 |
| + random GS | 234.25 | 217.55 | 37.57 | 27.72 | 94.28 | 21.55 | 4.66 | 4.47 | 3.27 | 4.13 |
| + trusted oversampling | 232.08 | 220.09 | 35.93 | 26.34 | 94.12 | 21.36 | 4.33 | 4.17 | 3.15 | 3.88 |
| + random perturbation | 229.93 | 216.23 | 35.79 | 26.05 | 92.69 | 19.96 | 4.22 | 4.21 | 3.10 | 3.84 |

Table 3: Ablation study on training with approximate labels. We report results on the validation set using the linear regression baseline (LR), training on the trusted data only (T), the automatic data only (A), or a combination of both (T + A). GS: grayscaling. HRMSE is in kg DM/ha; HRAE and RMSE are in %.
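| Method | Grass | Clover Total | Clover White | Clover Red | Weeds | Avg. |
|---|---|---|---|---|---|---|
| Skovsen et al. [51] | 9.05 | 9.91 | 9.51 | 6.68 | 6.50 | 8.33 |
| Narayanan et al. [40] | 8.64 | 8.73 | 8.16 | 10.11 | 6.95 | 8.52 |
| Trusted data | 10.28 | 10.32 | 9.24 | 9.54 | 7.37 | 9.35 |
| + Automatic data | 8.78 | 8.35 | 7.72 | 7.35 | 7.17 | 7.87 |

Table 4: Results on the GrassClover test set (RMSE, in %).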

4.3 Regression from species coverage 

We compare different sets of simple features to extract from the segmentation masks as well as the importance of the herbage height prediction when estimating the dry herbage mass. For features directly related to the dry biomass percentages, we compare averaging the most confident prediction for every pixel only (hard label, HL), averaging the full softmax prediction at each pixel (soft label, SL), or using the two sets of features jointly (HL+SL). In the regression model each feature is the average of the observations over the whole image: 4 features (soil %, grass %, clover %, weeds %) for HL or SL (8 for HL+SL), and 1 feature for the herbage height.
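
The hard label and soft label features can be computed directly from the softmax output of the segmentation network. The sketch below is an illustration with a hypothetical function name and a toy 2 x 2 prediction; it assumes the class order soil, grass, clover, weeds and concatenates HL, SL, and the optional mean height into one feature vector.

```python
import numpy as np


def coverage_features(probs, height=None):
    """Image-level features from a softmax map of shape (C, H, W).

    HL: fraction of pixels whose most confident class is c.
    SL: mean softmax probability of class c over the image.
    The optional height map contributes its mean as one extra feature.
    """
    n_classes = probs.shape[0]
    n_pixels = probs[0].size
    hard = np.bincount(probs.argmax(axis=0).ravel(), minlength=n_classes) / n_pixels
    soft = probs.mean(axis=(1, 2))
    parts = [hard, soft]
    if height is not None:
        parts.append(np.array([height.mean()]))
    return np.concatenate(parts)


# Toy prediction: top row mostly grass, bottom row mostly clover.
probs = np.full((4, 2, 2), 0.1)
probs[1, 0, :] = 0.7
probs[2, 1, :] = 0.7
height = np.array([[0.2, 0.4], [0.6, 0.8]])

features = coverage_features(probs, height)
```

On this toy input the hard features assign all of the canopy to grass and clover, whereas the soft features retain the residual probability given to soil and weeds, which is the extra information that SL provides over HL.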

We fit a least squares regularized (ridge) regression algorithm to all features and train on the small subset of annotated images before evaluating on the validation set (Table 2). First, we report the RMSE of the total herbage mass (kg DM/ha), as well as of the detailed grass/clover/weed herbage mass estimation (kg DM/ha). Second, we report the relative error for the total herbage mass (HRAE, %) and the RMSE for the relative dry biomass estimation (%) for the grass/clover/weeds. We notice that using SL is better than HL when predicting the herbage mass, demonstrating the benefit of capturing the full softmax information over the max prediction only. The precision of HL is still beneficial as we observe a good improvement in terms of dry biomass percentage RMSE when the two sets of features are coupled. When adding the information about the herbage height, a decrease in HRMSE is observed, validating the importance of the herbage height module in the segmentation architecture.

4.4 Biomass prediction using automatic labels and a trusted subset

We use the automatic labels to enhance the generalization of the regression CNN in order to improve over the linear regression from the predictions of the segmentation network, especially in terms of herbage mass prediction. Table 3 reports the ablation study showing how the additional mechanisms we introduce allow us to be robust to the approximate automatic regression labels. The reported metrics are described in Section 4.1. We also compare the performance of the regression network against the linear regression from the predictions of the segmentation network.

4.5 Comparison against other works on the GrassClover dataset 

We evaluate the improvements of our approach on the publicly released GrassClover dataset [51]. The target metrics for this dataset are limited to the dry biomass percentages, for which we report the RMSE. Table 4 reports the performance of our algorithm with and without automatic labels on the test set available on the CodaLab challenge (https://competitions.codalab.org/competitions/21122) and compares against the best available results. We report a lower RMSE on average than the methods we compare against and show that our algorithm is capable of using unlabeled images to reduce the biomass estimation error for every species over training on the small trusted subset alone.

5 Conclusion

This paper proposes to improve upon existing low supervision baselines in dry grass clover biomass prediction by making use of unlabeled images. To do so, we first train a herbage height aware semantic segmentation network on synthetic images that we use to generate automatic labels for the unlabeled data using a small set of labeled images. We then train a regression CNN on RGB images directly using the automatic labels to improve the accuracy over using the trusted data alone. We demonstrate the importance of our herbage height aware segmentation network when predicting dry herbage masses from canopy view images as well as the noise robust mechanisms we use to train on automatically labeled data. We improve over our baseline on the Irish dry herbage biomass dataset and set a new state-of-the-art performance level on the publicly available GrassClover dataset.

References

  • [1] M. Afonso, H. Fonteijn, F. S. Fiorentin, D. Lensink, M. Mooij, N. Faber, G. Polder, and R. Wehrens (2020) Tomato Fruit Detection and Counting in Greenhouses Using Deep Learning. Frontiers in plant science. Cited by: §2.1.
  • [2] P. Albert, D. Ortego, E. Arazo, N.E. O’Connor, and K. McGuinness (2021) ReLaB: Reliable Label Bootstrapping for Semi-Supervised Learning. In International Joint Conference on Neural Networks (IJCNN), Cited by: §2.4.
  • [3] F. Albornoz (2016) Crop responses to nitrogen overfertilization: A review. Scientia horticulturae. Cited by: §1, §2.2.
  • [4] E. Arazo, D. Ortego, P. Albert, N. O’Connor, and K. McGuinness (2019) Unsupervised Label Noise Modeling and Loss Correction. In International Conference on Machine Learning (ICML), Cited by: §2.4.
  • [5] E. Arazo, D. Ortego, P. Albert, N.E. O’Connor, and K. McGuinness (2020) Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning. In International Joint Conference on Neural Networks (IJCNN), Cited by: §2.4.
  • [6] T. W. Ayalew, J. R. Ubbens, and I. Stavness (2020) Unsupervised Domain Adaptation for Plant Organ Counting. In European Conference on Computer Vision, Cited by: §1.
  • [7] D. Berthelot, N. Carlini, E. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel (2020) ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring. In International Conference on Learning Representations (ICLR), Cited by: §2.4.
  • [8] M. Biasetton, U. Michieli, G. Agresti, and P. Zanuttigh (2019) Unsupervised domain adaptation for semantic segmentation of urban scenes. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Cited by: §2.3.
  • [9] K. Bresilla, G. D. Perulli, A. Boini, B. Morandi, L. Corelli Grappadelli, and L. Manfrini (2019) Single-shot convolution neural networks for real-time fruit detection within the tree. Frontiers in plant science. Cited by: §2.1.
  • [10] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In European Conference on Computer Vision (ECCV), Cited by: §4.1.
  • [11] M. Chen, H. Xue, and D. Cai (2019) Domain adaptation for semantic segmentation with maximum squares loss. In IEEE International Conference on Computer Vision (ICCV), Cited by: §2.3.
  • [12] Y. Chen, W. Li, X. Chen, and L. V. Gool (2019) Learning semantic segmentation from synthetic data: A geometrically guided input-output adaptation approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.3.
  • [13] Y. Chen, W. Li, and L. Van Gool (2018) Road: Reality oriented adaptation for semantic segmentation of urban scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.3.
  • [14] E. David, S. Madec, P. Sadeghi-Tehran, H. Aasen, B. Zheng, S. Liu, N. Kirchgessner, G. Ishikawa, K. Nagasawa, and M. A. Badhon (2020) Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods. Plant Phenomics. Cited by: §1.
  • [15] A. Etienne and D. Saraswat (2019) Machine learning approaches to automate weed detection by UAV based sensors. In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping, Cited by: §2.1.
  • [16] M. Everingham and J. Winn (2011) The pascal visual object classes challenge 2012 (voc2012) development kit. Pattern Analysis, Statistical Modelling and Computational Learning, Tech. Rep. Cited by: §2.3.
  • [17] T. M. Giselsson, R. N. Jørgensen, P. K. Jensen, M. Dyrmann, and H. S. Midtiby (2017) A public image database for benchmark of plant seedling classification algorithms. arXiv:1711.05458. Cited by: §1.
  • [18] S. Haug and J. Ostermann (2014) A crop/weed field image dataset for the evaluation of computer vision based precision agriculture tasks. In European Conference on Computer Vision, Cited by: §1.
  • [19] D. Hennessy, M. Saad, B. Mac Namee, N.E. O’Connor, K. McGuinness, P. Albert, B. Narayanan, and A. O’Connor (2021) Using image analysis and machine learning to estimate sward clover content. In European Grassland Federation Symposium, Cited by: §1, §3.2, §3.3, §4.1.
  • [20] M. Himstedt, T. Fricke, and M. Wachendorf (2012) The benefit of color information in digital image analysis for the estimation of legume contribution in legume–grass mixtures. Crop Science. Cited by: §2.2.
  • [21] N. Islam, M. Rashid, S. Wibowo, S. Wasimi, A. Morshed, C. Xu, and S. Moore (2020) Machine learning based approach for Weed Detection in Chilli field using RGB images. In International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Cited by: §2.1.
  • [22] N. Islam, M. M. Rashid, S. Wibowo, C. Xu, A. Morshed, S. A. Wasimi, S. Moore, and S. M. Rahman (2021) Early Weed Detection Using Image Processing and Machine Learning Techniques in an Australian Chilli Farm. Agriculture. Cited by: §2.1, §2.1.
  • [23] L. Jiang, D. Huang, M. Liu, and W. Yang (2020) Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels. In International Conference on Machine Learning (ICML), Cited by: §2.4.
  • [24] Y. Jiang, C. Li, A. H. Paterson, and J. S. Robertson (2019) DeepSeedling: deep convolutional network and Kalman filter for plant seedling detection and counting in the field. Plant methods. Cited by: §1.
  • [25] X. Ju, X. Liu, F. Zhang, and M. Roelcke (2004) Nitrogen fertilization, soil nitrate accumulation, and policy recommendations in several agricultural regions of China. AMBIO: a Journal of the Human Environment. Cited by: §1.
  • [26] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep Residual Learning for Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §4.1.
  • [27] S. Khaki, N. Safaei, H. Pham, and L. Wang (2021) Wheatnet: A lightweight convolutional neural network for high-throughput image-based wheat head detection and counting. arXiv:2103.09408. Cited by: §1.
  • [28] A. Krizhevsky, I. Sutskever, and G. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NeurIPS), Cited by: §4.1.
  • [29] P. Lameski, E. Zdravevski, V. Trajkovik, and A. Kulakov (2017) Weed detection dataset with RGB images taken under variable light conditions. In International Conference on ICT Innovations, Cited by: §2.1.
  • [30] M. A. Lefsky, W. B. Cohen, G. G. Parker, and D. J. Harding (2002) Lidar remote sensing for ecosystem studies: Lidar, an emerging remote sensing technology that directly measures the three-dimensional distribution of plant canopies, can accurately estimate vegetation structural attributes and should be of particular interest to forest, landscape, and global ecologists. BioScience. Cited by: §2.1.
  • [31] W. Li, L. Wang, W. Li, E. Agustsson, and L. Van Gool (2017) WebVision Database: Visual Learning and Understanding from Web Data. arXiv: 1708.02862. Cited by: §2.4.
  • [32] Y. Li, N. Wang, J. Shi, J. Liu, and X. Hou (2016) Revisiting batch normalization for practical domain adaptation. In International Conference on Learning Representations Workshop (ICLRW), Cited by: §2.3, §4.2.
  • [33] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: Common objects in context. In European conference on computer vision (ECCV), Cited by: §2.3, §3.3.
  • [34] Y. Luo, P. Liu, T. Guan, J. Yu, and Y. Yang (2019) Significance-aware information bottleneck for domain adaptive semantic segmentation. In IEEE International Conference on Computer Vision (ICCV), Cited by: §2.3.
  • [35] C. L. McCarthy, N. H. Hancock, and S. R. Raine (2010) Applied machine vision of plants: a review with implications for field deployment in automated farming operations. Intelligent Service Robotics. Cited by: §2.1.
  • [36] A. Milioto, P. Lottes, and C. Stachniss (2018) Real-time semantic segmentation of crop and weed for precision agriculture robots leveraging background knowledge in CNNs. In IEEE international conference on robotics and automation (ICRA), Cited by: §1.
  • [37] M. Minervini, A. Fischbach, H. Scharr, and S. A. Tsaftaris (2016) Finely-grained annotated datasets for image-based plant phenotyping. Pattern recognition letters. Cited by: §1.
  • [38] A. K. Mortensen, H. Karstoft, K. Søegaard, R. Gislum, and R. N. Jørgensen (2017) Preliminary results of clover and grass coverage and total dry matter estimation in clover-grass crops using image analysis. Journal of Imaging. Cited by: §1, §1, §2.2.
  • [39] F. Nájera, Y. Tapia, C. Baginsky, V. Figueroa, R. Cabeza, and O. Salazar (2015) Evaluation of soil fertility and fertilisation practices for irrigated maize (Zea mays L.) under Mediterranean conditions in central Chile. Journal of soil science and plant nutrition. Cited by: §1.
  • [40] B. Narayanan, M. Saadeldin, P. Albert, K. McGuinness, and B. Mac Namee (2020) Extracting pasture phenotype and biomass percentages using weakly supervised multi-target deep learning on a small dataset. In Irish Machine Vision and Image Processing conference, Cited by: §1, Table 4.
  • [41] D. Nyfeler, O. Huguenin-Elie, M. Suter, E. Frossard, J. Connolly, and A. Lüscher (2009) Strong mixture effects among four species in fertilized agricultural grassland led to persistent and consistent transgressive overyielding. Journal of Applied Ecology. Cited by: §1.
  • [42] D. Ortego, E. Arazo, P. Albert, N. O’Connor, and K. McGuinness (2020) Towards Robust Learning with Different Label Noise Distributions. In International Conference on Pattern Recognition (ICPR), Cited by: §2.4.
  • [43] A. Paikekari, V. Ghule, R. Meshram, and V. Raskar (2016) Weed detection using image processing. International Research Journal of Engineering and Technology (IRJET). Cited by: §2.1.
  • [44] C. A. Pulido-Rojas, M. A. Molina-Villa, and L. E. Solaque-Guzmán (2016) Machine vision system for weed detection using image filtering in vegetables crops. Revista Facultad de Ingeniería Universidad de Antioquia. Cited by: §2.1.
  • [45] S. R. Richter, V. Vineet, S. Roth, and V. Koltun (2016) Playing for data: Ground truth from computer games. In European conference on computer vision, Cited by: §2.3.
  • [46] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez (2016) The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.3.
  • [47] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, and C. McCool (2016) Deepfruits: A fruit detection system using deep neural networks. Sensors. Cited by: §1, §2.1.
  • [48] S. Sankaranarayanan, Y. Balaji, A. Jain, S. N. Lim, and R. Chellappa (2017) Unsupervised domain adaptation for semantic segmentation with gans. arXiv: 1711.06969. Cited by: §2.3.
  • [49] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb (2017) Learning from simulated and unsupervised images through adversarial training. In IEEE conference on computer vision and pattern recognition (CVPR), Cited by: §2.3.
  • [50] S. Skovsen, M. Dyrmann, J. Eriksen, R. Gislum, H. Karstoft, and R. N. Jørgensen (2018) Predicting dry matter composition of grass clover leys using data simulation and camera-based segmentation of field canopies into white clover, red clover, grass and weeds. In International Conference on Precision Agriculture, Cited by: §2.2.
  • [51] S. Skovsen, M. Dyrmann, A. K. Mortensen, M. S. Laursen, R. Gislum, J. Eriksen, S. Farkhani, H. Karstoft, and R. N. Jørgensen (2019) The GrassClover image dataset for semantic and hierarchical species understanding in agriculture. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Cited by: §1, §1, §2.2, §3.2, §3.3, §3.6, §4.1, §4.5, Table 4.
  • [52] K. Søegaard (2009) Nitrogen fertilization of grass/clover swards under cutting or grazing by dairy cows. Acta Agriculturae Scandinavica Section B–Soil and Plant Science. Cited by: §1, §2.2.
  • [53] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. Cubuk, A. Kurakin, H. Zhang, and C. Raffel (2020) FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. arXiv: 2001.07685. Cited by: §2.4.
  • [54] J. Tang, X. Chen, R. Miao, and D. Wang (2016) Weed detection using image processing under different illumination for site-specific areas spraying. Computers and Electronics in Agriculture. Cited by: §2.1.
  • [55] M. Toldo, A. Maracani, U. Michieli, and P. Zanuttigh (2020) Unsupervised domain adaptation in semantic segmentation: a review. Technologies. Cited by: §2.3.
  • [56] T. Vu, H. Jain, M. Bucher, M. Cord, and P. Pérez (2019) Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §2.3.
  • [57] N. Vyas, S. Saxena, and T. Voice (2020) Learning Soft Labels via Meta Learning. arXiv: 2009.09496. Cited by: §2.4.
  • [58] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv: 1605.07146. Cited by: §4.1.
  • [59] H. Zhang, M. Cisse, Y.N. Dauphin, and D. Lopez-Paz (2018) mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations (ICLR), Cited by: §2.4.
  • [60] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE international conference on computer vision (ICCV), Cited by: §2.3.