A review of the recent literature shows there is great optimism that advances in sensors (1; 2; 3; 4), robotics (5; 6; 7; 8; 9; 10), and machine learning (11; 12; 13; 14; 15; 16) will bring new innovations destined to increase agricultural production and global food security. Whether one speaks more broadly of precision agriculture, digital agriculture or Agriculture 4.0 (in reference to the anticipated fourth agricultural revolution), the confluence of these technologies in particular could lead, for example, to automated methods of weeding, disease evaluation, plant care, and phenotyping (16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26). Such capabilities would increase crop yields and expedite breeding programs, while minimizing inputs (e.g. water, fertilizer, herbicide, pesticide) and reducing the impact on the environment.
Prototypes of autonomous vehicles performing farming tasks in the field exist already (9; 10; 27; 28). However, putting the “brains” into such agents is still a hard challenge and success is limited to a crop’s specifics and the task at hand. Machine learning (ML) utilizing convolutional neural networks (CNNs) holds great promise for image-based location and identification tasks in agriculture. The capabilities of CNNs have improved vastly in recent years (29; 30; 31) and are now used as solutions to previously difficult problems such as object detection within images (32; 33), automatic image annotation (34), self-driving cars (35) and automated map production (36).
While there are many different CNN architectures and training methods, a general rule of thumb is the following: A model’s capability to identify objects in previously unseen data (called generalizing) depends significantly on the amount of data the model has seen during training (29; 37). As a result, an inadequate amount of high-quality training data, in particular labeled data, is often the bottleneck in developing ML-based applications, a fact underscored by many authors working in plant sciences and agriculture (11; 12; 13; 17; 18; 19; 21; 22; 23; 24; 25; 26). This problem is magnified by the circumstance that each application is likely to require its own specific training data, especially given the very wide variety of plant appearances, e.g. tillering versus ripening, healthy versus diseased, crop versus weed. For example, training CNNs to distinguish oats from their wild counterpart, which is responsible for an annual loss of up to $500 million in the Province of Manitoba alone (according to https://www.gov.mb.ca/agriculture/crops/weeds/wild-oats.html), would certainly require a qualitatively and quantitatively rich dataset of labeled images of all variants.
The need for labeled training data is often satisfied by manual annotation, which is typically achieved in one of two ways. If the classification problem is common knowledge, it can be crowdsourced, as is done through platforms such as Mechanical Turk (38) and ReCaptcha (39). Conversely, if the classification problem requires expert knowledge, crowdsourcing will not be reliable and annotation must be performed by experts only. Both methods have been suggested for labeling plant images (12; 22; 24; 25), and although there are tools available to ease the process (40; 41; 42), manual annotation is both cost- and time-intensive and usually leads to comparably small datasets on the order of a few thousand images. As a workaround to having large, labeled datasets, several strategies, such as transfer learning with smaller labeled datasets (12; 43; 44) or unsupervised learning with unlabeled data (18), are being explored. Given the preference for large, labeled datasets, data augmentation is also being employed and ranges from simple modifications of existing images (e.g. rotation, translation, flipping, and scaling) (12) to generating synthetic images (11; 12; 44).
In an effort to produce large quantities of high-quality training data for machine learning applications in agriculture, we have developed an embedded system to automatically generate and label images of real plants. This system—henceforth referred to as EAGL-I (Embedded Automated Generator of Labeled Images)—is, in a nutshell, a robotically moved camera that takes pictures of known plants at known locations from a large variety of known positions and angles. Since we have full information and control over where on the image the plants are located, we can automatically identify and label them. As a result, EAGL-I can generate labeled data at the rate of thousands to tens of thousands of images per day, with minimal human interaction and no dependence on crowdsourcing or expert knowledge.
Although automated plant imaging systems already exist, their primary purpose has been to capture and compare phenotypic information and growth metrics. This is typically achieved through overhead shots only and requires close-to-zero variance in imaging conditions to ensure a high accuracy in extracting plant characteristics. This is at odds with the type of datasets needed to train machine learning algorithms for plant classification. In this case, one is interested in a rich dataset, with a wide variety of images falling under the same label. Variety can be achieved through differences in parameters such as imaging angle (Fig 1), camera-to-plant distance, lighting conditions, time of day, growth stage, and the use of different plants of the same cultivar or species. One must also include different plants with different growing characteristics. For example, corn (a fast-growing, tall grass) is very different from dandelion (a ground-hugging rosette), but one still needs examples of both (and indeed others) in the same training set to identify crop versus weed with the highest possible accuracy. EAGL-I has the capabilities to incorporate all these differences and is, to the best of our knowledge, the only imaging system fully dedicated to the goal of generating machine learning datasets for plant classification.
The contributions of this paper are the following:
We designed an imaging system to create labeled datasets for training machine learning models.
This system has a high imaging rate and autonomously labels the imaged plants, offering an alternative to cost-intensive manual labeling.
The system can image plants from any angle and at different distances, thus producing the variety needed for training datasets.
A wide variety of plants can be imaged and there is full freedom in their arrangement in the coverable volume.
As a proof of concept, we generated a dataset of different weeds commonly found in the Province of Manitoba, trained a CNN with it, and evaluated the resulting model on previously unseen data.
The rest of the paper is structured as follows. Section 2 describes the EAGL-I’s parts, specifications, and mode of operation. Section 3 describes data generation and defines the imaging rate of EAGL-I. It also lists the parameters we used in production to generate a training dataset. In Section 4, we characterize that dataset and use it to train a CNN to distinguish dicots from monocots. Section 5 concludes the paper and discusses planned improvements to the system and future work.
|Component||Manufacturer||Model||Specifications|
|X, Y, and Z Actuators||Macron Dynamics||MSA-628||Volume covered (mm): 1150 x 840 x 710|
|Planetary Gearbox||Servo-Elements||MPS-60-005||5:1 ratio|
|Stepper Motors (XYZ)||Servo-Elements||ST24-1.8-297||NEMA 24; 24 V DC, 2.8 A; 1.8° step angle; 2.7 Nm rated torque; stepper drives integrated|
|Controller of X, Y, Z Actuators||Arduino||Uno Rev3||Microcontroller: ATmega328P; clock speed: 16 MHz; max. pulse rate: 4000 pulses/second|
|RGB-Camera||GoPro||Hero 7 Black||Res: px; used in linear mode, no zoom: FOV = 98.7°; file format: jpg|
|Servo Motors (pan-tilt)||Dynamixel||MX-28T||11.1-14.8 V, 1.4 A; 0.088° step angle; 2.5 Nm stall torque|
|AC/DC Converter||Mean Well USA||LRS-350-36||Output: 36 V, 9.7 A, 350 W|
2 System Overview
Table 1 gives an overview of the EAGL-I hardware.
The system is set up in a gantry configuration (Fig 2), such that the gantry head can be moved in all three dimensions of a volume measuring 115 x 84 x 71 cm. Two actuators per axis provide movement in the X-Y-plane and a fifth actuator raises or lowers the gantry head. For safety and repeatability, we equipped the actuators with limit and homing switches. The normally closed limit switches prevent the actuators from moving beyond their bounds. When the switches trigger (or lose power), the whole system shuts off immediately and stays off until a manual reset. The homing switches counteract possible drifts or slips of the actuators. An Arduino Uno controls the gantry system’s actuators, with power supplied by a 350-W AC/DC converter.
On the gantry head we attached a pan-tilt system followed by an RGB camera. An Arduino-compatible micro-controller powers and controls the pan-tilt system via two servo motors, allowing the camera to be rotated through any combination of azimuthal and polar angles (360° pan, 180° tilt). The camera itself is powered by a commercial 20-Ah power bank that can support its imaging process for over 8 hours and is easily swapped out.
3 Data Production
The imaging rate of EAGL-I is mainly determined by two contributions: the duration of the robotic movement and the image processing time of the camera, each of which is discussed separately below.
3.1 Robotic Movement
The camera is moved by the xyz-gantry and the pan-tilt subsystem. Since panning and tilting the camera happens in parallel to the movement in x, y, and z (and is almost always faster), we can neglect that contribution to the imaging rate. We control the actuators close to the maximal pulse rate the Arduino Uno can output (4000 pulses per second). This translates into a movement speed of
$$v = \frac{f \, d \, s}{g \, m},$$
where $f$ is the pulse rate, $d$ is the distance traveled per revolution of the actuator in millimeters, $s$ is the fraction of a full revolution made by 1 step of the stepper motors, $g$ is the gearbox’s reduction ratio, and $m$ is a factor determined by the stepping mode. For full-stepping mode $m = 1$, whereas half-stepping means $m = 2$. The controller uses a linear acceleration and deceleration profile to ease in and out of the actuators’ movements. Overall, then, we have a nearly proportional relationship between pulse rate and travel speed. Furthermore, all three axes can be moved in parallel or one after the other.
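The relation between pulse rate and travel speed described above can be sketched in a few lines of Python. This is a minimal illustration, not the controller firmware: the step angle (1.8°), gearbox ratio (5:1), and maximal pulse rate (4000 pulses per second) are taken from Table 1, whereas the travel per actuator revolution `MM_PER_REV` is a hypothetical value chosen only for the example, and the function and parameter names are our own.

```python
def travel_speed_mm_per_s(pulse_rate, mm_per_rev, step_fraction,
                          gear_ratio, pulses_per_full_step=1):
    """Linear travel speed of a stepper-driven actuator behind a gearbox.

    pulses_per_full_step is 1 for full-stepping and 2 for half-stepping.
    """
    # Motor revolutions per second produced by the pulse train.
    motor_rev_per_s = pulse_rate * step_fraction / pulses_per_full_step
    # The gearbox reduces the rotation that reaches the actuator.
    actuator_rev_per_s = motor_rev_per_s / gear_ratio
    return actuator_rev_per_s * mm_per_rev


STEP_FRACTION = 1.8 / 360.0  # 1.8 degree step angle (Table 1)
GEAR_RATIO = 5.0             # 5:1 planetary gearbox (Table 1)
MM_PER_REV = 150.0           # ASSUMPTION: hypothetical travel per revolution

full_step_speed = travel_speed_mm_per_s(4000, MM_PER_REV, STEP_FRACTION, GEAR_RATIO)
half_step_speed = travel_speed_mm_per_s(4000, MM_PER_REV, STEP_FRACTION, GEAR_RATIO,
                                        pulses_per_full_step=2)
print(f"full-stepping: {full_step_speed:.1f} mm/s, "
      f"half-stepping: {half_step_speed:.1f} mm/s")
```

The sketch ignores the acceleration and deceleration ramps, so it describes the peak speed rather than the average speed over a short move.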
When the camera is moved to a new position and orientation, it is useful to pause before triggering it to take an image. This allows vibrations to settle down; not doing so might result in blurry images, especially when using longer exposure times.
When going through many different camera positions in sequence, the order in which those positions are visited is at least as important as the speed with which the camera is moved. To obtain a general optimal solution one would have to solve a three-dimensional traveling salesman problem (TSP), which is a well-known NP-hard problem in combinatorial optimization. In our typical application, we would have to solve the TSP for thousands of different positions. While still feasible, we settled for a nested zig-zag algorithm, as depicted in Fig 3, which offers a straightforward method to keep travel times between successive camera positions short.
The cuboid-shaped volume through which the gantry system can move the camera is divided into slabs of equal width along its X-axis. Those slabs are all subdivided into equally wide columns along the Y-axis. Now, starting at the bottom of the first column (containing the coordinate system’s origin), we move the gantry head to the position inside that column with the smallest Z-value. From there we move upwards through the positions with the next-largest Z-values inside that column (ties in Z-values are resolved arbitrarily). Note that small movements in X- and Y-direction are still happening, but are limited by the column’s boundaries. We keep moving upwards until reaching the highest position inside the column. From there we continue to the next column in positive Y-direction and reverse the procedure: we start with the position having the largest Z-value and descend through the column. We keep zig-zagging through the first slab’s columns until we reach the end of its last column. From there we move to the second slab in positive X-direction. We continue a zig-zag motion working our way through the columns, but this time, when we change columns we move in negative Y-direction, until having traversed the entire second slab. We continue those zig-zag motions from slab to slab, until each position has been visited.
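The ordering described above can be sketched as follows. This is a minimal illustration under our own assumptions: positions are `(x, y, z)` tuples, the slab and column widths are free parameters, and the function name `nested_zigzag` is hypothetical. Slabs or columns that contain no positions are simply skipped.

```python
def nested_zigzag(positions, slab_width, column_width):
    """Order (x, y, z) camera positions slab by slab and column by column."""
    # Sort every position into its slab (along X) and column (along Y).
    slabs = {}
    for pos in positions:
        slab = int(pos[0] // slab_width)
        column = int(pos[1] // column_width)
        slabs.setdefault(slab, {}).setdefault(column, []).append(pos)

    route = []
    ascending = True  # direction of the Z sweep, flipped after every column
    for slab_number, slab in enumerate(sorted(slabs)):
        # Even slabs are traversed in positive Y-direction, odd ones in negative.
        columns = sorted(slabs[slab], reverse=(slab_number % 2 == 1))
        for column in columns:
            in_column = sorted(slabs[slab][column], key=lambda p: p[2],
                               reverse=not ascending)
            route.extend(in_column)
            ascending = not ascending
    return route


# Small example: two slabs, two columns each, two heights per column.
example = [(x, y, z) for x in (10, 110) for y in (10, 110) for z in (0, 50)]
for step in nested_zigzag(example, slab_width=100, column_width=100):
    print(step)
```

The routine only sorts the positions; it does not attempt to minimize the total travel distance, which would require a TSP solver.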
3.2 Imaging Process
The imaging process is initiated by sending an HTTP request to the camera over WiFi. The delay to send and process the request is negligible (on the order of a few milliseconds) and thus is of no concern for the imaging rate. The time to perform the imaging itself depends on the camera settings and lighting conditions. In our indoor setup, without additional light sources and a maximal ISO of 200, the camera needs approximately 2.7 seconds to take an image. Allowing a higher exposure index would reduce that time, but also introduce grain to the image. Additional lighting will reduce the exposure time, but is presently not a main concern.
Images can be downloaded from the camera via a USB or WiFi connection. In either case, one can retrieve each image directly after it has been captured or retrieve all images in bulk after the system has gone through each of its positions. Retrieving the images in bulk decouples the imaging procedure from retrieving the data. By doing so, any delays or problems when transferring the images do not interfere with collecting the images. For the sake of automation, we value image collection higher than the data retrieval, since data generation takes much longer than its retrieval and thus is harder to repeat.
Depending on the application, an easy way to increase the imaging rate is to crop several subimages from a single image taken at a given position. In our application (generating single plant training data) this is a valid approach and can increase the imaging rate by up to one order of magnitude. Cropping out subimages results in different image sizes, which could be considered a drawback for some applications, but is rarely so in machine learning. Fig 4 shows an example of cropping several images from a master image.
3.3 Production Settings
We define average production times $t_m$ and $t_s$ for master- and subimages, respectively, as follows:
$$t_m = \frac{T_i + T_d}{N_m}, \qquad t_s = \frac{T_i + T_d + T_c}{N_s},$$
where $T_i$ is the total time required to produce $N_m$ master images (including robotic movements), $T_d$ is the time to bulk download all master images from the camera to the computer, and $T_c$ is the time required to crop out a total of $N_s$ subimages from the master images.
To create a training dataset, we have performed runs with the system on a daily basis under the settings listed in Table 2. This resulted in $t_m \approx 7$ s and $t_s \approx 4.8$ s (Table 2). Those settings are conservative, and during testing we have achieved shorter production times. Imaging at such fast rates comes at a cost of image quality, however. First, the shorter exposure time increases the ISO needed, which in turn introduces grain to the image. Second, to achieve maximal imaging rates, we have to pack plants in a tighter arrangement under the system. That can lead to overlap in the bounding boxes, meaning there are cases in which we can see plant material of neighboring plants in the images. Both points have to be accounted for when using the data as training sets in machine learning. Higher grain in the image masks detailed features, and plant material from neighboring plants brings in unwanted features that do not correlate with the actual plant in the image. Image quality and imaging speed are two defining factors for the datasets that can be produced by EAGL-I and often have to be traded off for one another.
|Setting||Value|
|Parallel X, Y, Z Movement||No|
|Peak Pulse Rate||3000 pulses/s|
|Acceleration Rate||10000 pulses/s²|
|Pause before Camera Trigger||3 seconds|
|Routing Algorithm||Nested Zig-Zag|
|Imaging Time||Approx. 2.7 seconds|
|Image Download||WiFi, in bulk|
|Time for Imaging||3 hours, 25 minutes|
|Download Time||46 minutes|
|Cropping Time||34 minutes|
|Imaging Rate (Images)||Approx. 7 s/image|
|Imaging Rate (Subimages)||Approx. 4.8 s/image|
|Size on Disk||8-9 GB|
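As a quick plausibility check, the run times listed in Table 2 can be added up and compared with the totals reported for the Weedling dataset in Section 4 (10 runs, 34,666 subimages, 47 hours and 30 minutes). The sketch below assumes that every run took the times listed in Table 2 and that the per-subimage time counts imaging, download, and cropping; both are our assumptions, and the variable names are our own.

```python
def seconds(hours=0, minutes=0):
    """Convert hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


# Per-run times from Table 2.
T_IMAGING = seconds(hours=3, minutes=25)
T_DOWNLOAD = seconds(minutes=46)
T_CROPPING = seconds(minutes=34)

# Dataset totals from Section 4.
N_RUNS = 10
N_SUBIMAGES = 34666

run_total_s = T_IMAGING + T_DOWNLOAD + T_CROPPING
dataset_total_h = N_RUNS * run_total_s / 3600
avg_s_per_subimage = N_RUNS * run_total_s / N_SUBIMAGES

print(f"one run: {run_total_s / 3600:.2f} h")
print(f"ten runs: {dataset_total_h:.2f} h")
print(f"average time per subimage: {avg_s_per_subimage:.2f} s")
```

One run adds up to 4 hours and 45 minutes, so ten runs give the 47 hours and 30 minutes reported for the dataset, and the resulting average per subimage lies close to the approximate rate listed in Table 2.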
3.4 Cropping and Labeling Subimages
Different methods are available to us for cropping out a single plant from a master image. In the following we give a roadmap for two approaches based on image processing and CNNs, respectively. For our system we instead chose a third approach that relies on spatial information alone.
An image processing approach relies on color differences between the plant, the soil, and the image’s background. With segmentation algorithms we could identify the plants inside the image and construct a minimal bounding box around each of them. We describe a similar process in Subsection 3.5. In a second step the segments would have to be matched to the plants’ known positions to assign the correct label.
Alternatively, one could consider machine learning techniques themselves for cropping and labeling subimages. This approach, however, can only be applied once a sufficiently trained model is available. Here a two-step procedure could be employed: First, a model is trained to define bounding boxes in the image for each plant. These bounding boxes would again be matched to the plants’ known positions for labeling. Then, a second model could be bootstrapped that not only finds bounding boxes, but also labels them by recognizing the plants shown. Keeping in mind that creating such models is ultimately the purpose of EAGL-I, we encounter a “chicken or egg” problem.
If more than one plant is captured in an image, both approaches mentioned above have to rely on the plants’ spatial information at one point or another to correctly match labels with subimages. Only after achieving the goal that EAGL-I was built for can we discard spatial information completely while still correctly labeling subimages. On the other hand, spatial information is always available to us and is sufficient for cropping and labeling subimages. This motivates the purely geometric approach we have implemented in our system. It calculates the plants’ coordinates inside the image from their known relative position and angle to the camera. As a result, labeling subimages becomes trivial. Furthermore, the method is robust, as we do not have to rely on the stability of an image processing pipeline or a machine learning algorithm’s accuracy.
To calculate the bounding box around the plant we define a sequence of linear transformations that match the plant’s real-world coordinates (world frame) with the plant’s xy-position inside the image (image frame). The net transform is
$$T = T_{ci} \circ T_{wc}.$$
Here $T_{wc}$ is the linear transformation from world frame to camera frame, i.e. a frame in which the camera is the origin pointing in positive x-direction. Thus, this transformation consists of a translation, depending on the gantry head position and the displacement due to the pan-tilt system, and a rotation due to panning and tilting the camera. The transformation $T_{ci}$ converts the camera frame to the image frame, meaning that the objects inside the camera’s field of view are projected onto the xy-coordinates of the image. For this we calculate bearing and elevation of the object’s position from the camera. Using these angles we map the object to xy-coordinates (given in pixels), depending on the camera’s resolution and field of view. To calculate the object’s size in the image frame we calculate its subtended angle from the camera. To this end, we replace the plant, for the purpose of the calculation, by a sphere with a radius large enough that the plant is contained inside of it. For full details on these transformations, we refer to our code in (59).
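The chain from world frame to image frame can be illustrated with a short sketch. It is not the implementation referenced in (59) but a simplified stand-in under stated assumptions: the displacement introduced by the pan-tilt system is ignored, the projection is assumed to be ideally rectilinear, the 98.7° field of view from Table 1 is assumed to be the horizontal one, and the 4000 x 3000 pixel resolution used in the example is a hypothetical value. All function names are our own.

```python
import math


def world_to_camera(point, cam_pos, pan_deg, tilt_deg):
    """Express a world-frame point in a frame with the camera at the origin,
    looking along +x (y to the left, z up). Positive tilt looks upward."""
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    # Undo the pan (rotation about the vertical axis).
    x1 = math.cos(pan) * dx + math.sin(pan) * dy
    y1 = -math.sin(pan) * dx + math.cos(pan) * dy
    # Undo the tilt (rotation about the camera's horizontal axis).
    x2 = math.cos(tilt) * x1 + math.sin(tilt) * dz
    z2 = -math.sin(tilt) * x1 + math.cos(tilt) * dz
    return x2, y1, z2


def focal_length_px(width_px, hfov_deg):
    """Focal length in pixels for an ideal rectilinear projection."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def camera_to_image(p_cam, width_px, height_px, hfov_deg):
    """Project a camera-frame point to pixel coordinates (origin upper left)."""
    x, y, z = p_cam
    if x <= 0:
        raise ValueError("point lies behind the camera")
    bearing = math.atan2(y, x)                    # angle to the left of the axis
    elevation = math.atan2(z, math.hypot(x, y))   # angle above the axis
    f = focal_length_px(width_px, hfov_deg)
    u = width_px / 2 - f * math.tan(bearing)
    v = height_px / 2 - f * math.tan(elevation) / math.cos(bearing)
    return u, v


def sphere_radius_px(p_cam, radius, width_px, hfov_deg):
    """Approximate pixel radius of a bounding sphere near the image center."""
    distance = math.sqrt(sum(c * c for c in p_cam))
    half_angle = math.asin(radius / distance)     # half of the subtended angle
    return focal_length_px(width_px, hfov_deg) * math.tan(half_angle)


# Example: camera 1000 mm above the floor, looking straight down (tilt -90).
WIDTH, HEIGHT, HFOV = 4000, 3000, 98.7   # ASSUMPTION: hypothetical resolution
plant_cam = world_to_camera((0.0, 100.0, 0.0), (0.0, 0.0, 1000.0), 0.0, -90.0)
u, v = camera_to_image(plant_cam, WIDTH, HEIGHT, HFOV)
r = sphere_radius_px(plant_cam, 50.0, WIDTH, HFOV)
print(f"bounding box: x in [{u - r:.0f}, {u + r:.0f}], y in [{v - r:.0f}, {v + r:.0f}]")
```

Because the plant’s identity is known for each target position, the label can be attached to the resulting bounding box directly.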
Given that we place plants on the floor (meaning the z-coordinate is known), we can also invert the projection and the transform to map the position and size of objects in image frame back to world frame. This inversion effectively allows us to determine any (sufficiently flat) object’s x- and y-position from a single overhead image taken by the system itself.
We want to point out that following a geometric approach to locate the plants comes with its own challenges: It relies on precise and accurate movements of the camera and location of the targets. Accuracy and precision of our robot’s movements turned out to be sufficient for this approach. To achieve good positioning of the targets, we measured and marked 12 target locations that we use repeatedly. The system can also generate new target locations and mark them with a laser. This allows us to not be limited to a fixed set of positions. A second challenge to a geometric approach is lens distortion, i.e. deviations from a perfect rectilinear projection from camera frame to image frame. Such distortions usually appear at the image frame’s margins. We counteract them by using relatively large spheres to approximate the plants imaged. Other countermeasures would be to measure the distortions and use software correction before cropping the subimages, to simply not use subimages that lie too close to the image’s margins, or to use digital zoom, which effectively reduces the field of view to an area with only negligible lens distortions.
3.5 Image Postprocessing
As mentioned above, EAGL-I produces images against a neutral blue background. This enables and simplifies image processing techniques. To demonstrate that, we performed a simple color-based background removal on the example in Fig 5. To this end, we used the PlantCV library for Python (60), which itself is based on OpenCV (61). In a first step we convert the image from the usual RGB color space to the CIELAB color space, in which the b-channel ranges from low values for blue pixels to high values for yellow pixels. Fig 5b shows the b-channel of our example as a grayscale image; the blue background appears dark, whereas the plant and soil are bright shades of gray. With a binary threshold filter based on this channel we keyed out the plant as shown in Fig 5c. Additional filters can be applied to remove artifacts and to smooth the edges of the thresholding operation (e.g. dilating, filling holes, eroding, Gaussian blur).
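To make the role of the b-channel concrete, the following sketch computes the CIELAB b-value of single sRGB pixels with the standard conversion formulas (D65 white point) and applies a binary threshold. It deliberately uses plain Python instead of PlantCV or OpenCV, so it is a stand-in for the idea rather than the pipeline used for Fig 5. The threshold of 0 and the example pixel colors are hypothetical. Note that 8-bit OpenCV images store the b-channel shifted by 128.

```python
def lab_b_channel(r, g, b):
    """CIELAB b-value of an 8-bit sRGB pixel (negative: blue, positive: yellow)."""
    def to_linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB to CIE XYZ (only Y and Z are needed for the b-channel).
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    # Reference white D65: Yn = 1.0, Zn = 1.08883.
    return 200.0 * (f(y / 1.0) - f(z / 1.08883))


def foreground_mask(pixels, threshold=0.0):
    """True for pixels whose b-value exceeds the threshold (plant or soil)."""
    return [lab_b_channel(*px) > threshold for px in pixels]


# Hypothetical sample pixels: blue background, green leaf, brown soil.
samples = [(30, 80, 200), (60, 140, 40), (90, 60, 40)]
for px, keep in zip(samples, foreground_mask(samples)):
    print(px, f"b = {lab_b_channel(*px):.1f}", "foreground" if keep else "background")
```

On real images the threshold would be tuned to the lighting and the background color at hand.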
Since the camera positioning can be repeated precisely, a second technique to key out the plant also becomes available: background subtraction. For this technique a second picture is taken from the same position and angle but without the plant. This image, which contains the background only, can be subtracted from the image containing the plant, leaving the plant itself.
Further image processing can be employed to separate the dark soil from the green plant or to extract morphological information. Those techniques are widely deployed in the area of (high throughput) phenotyping. For those techniques we refer to (60) and PlantCV’s online documentation.
4 The Weedling Dataset
As proof of concept we have generated a labeled dataset of seedlings of eight weeds that are common in Manitoban fields. This dataset (62) allows us to test systems that lie downstream in the development pipeline, in particular databases and the training of machine learning algorithms.
We chose weed species as targets, as they are of general interest and can be found amongst virtually every cash-crop in the field. The reasons to focus on a rather early growing stage are several. Using seedlings allows us to grow more individuals in rotation, discarding older plants for newer ones. This results in a richer dataset, compared to imaging a smaller number of individuals over their full life cycle. Furthermore, we can image more plants in parallel, thus achieving a higher imaging rate, if they are small. Lastly, a rather important and pragmatic argument is that the identification (and eradication) of weeds is most critical in the early stages of crop growth when plants are small and a canopy has not yet developed.
To generate the dataset we used the production settings as given in Table 2. In 10 runs we generated 34,666 subimages of weeds in a total imaging time of 47 hours and 30 minutes. Setting up the system to perform a single run requires personal attendance of roughly 15 minutes, after which the system continues autonomously and does not need further supervision. All images were taken in front of the blue background (Figs. 1, 2 and 4) to ease image processing and segmentation. Furthermore, the uniform background helps in the training process to focus the model on the plant features and eliminate random correlations. Table 3 gives an overview of the dataset’s characteristics.
|Weed||Number of Images*|
|Echinochloa crus-galli (Barnyard Grass)||8621|
|Cirsium arvense (Canada Thistle)||4706|
|Brassica napus (Volunteer Canola)||6723|
|Taraxacum officinale (Dandelion)||4797|
|Persicaria spp. (Smartweed)||870|
|Fallopia convolvulus (Wild Buckwheat)||4621|
|Avena fatua (Wild Oat)||1218|
|Setaria pumila (Yellow Foxtail)||3110|
*Variations are due to different germination success
Each image of the dataset is accompanied by two additional files. The first is a copy of the original image that contains bounding boxes corresponding to the cropped out subimages. The second is a JSON-file containing the following metadata fields:
version: A version number differentiating file formats; this dataset’s version is 1.5 and differs from earlier test sets in the number of data fields and formatting style.
file_name: A unique image identifier of the form yyyymmddhhmmss-pose#.jpg, where the first 14 digits encode year, month, day, hour, minute, and second of when the image was captured. The number after pose denotes the position of a specific data-acquisition run.
bb_file_name: A unique identifier for a copy of the master image with bounding boxes drawn on it. The format is equal to the one in file_name but with a -bb attached after the pose number.
date and time: Date and time at which the picture was taken
room and institute: Abbreviated location of where EAGL-I was set up.
camera and lens: Information about the camera being used. In the case that there is no specific lens information the lens field can be used for model information (in our case we use camera = GoPro and lens = Hero 7 Black)
camera_pose: A literal containing the camera position in terms of X, Y, and Z coordinates, polar-, and azimuthal angle.
bounding_boxes: A list of objects containing information for all cropped subimages, containing the following fields for each such image:
plant_id: A unique identifier for each plant, consisting of the first letters of its scientific name and a number, for example: echcru002
label: The common name label, for example: BarnyardGrass
scientific_name: For example Echinochloa crus-galli
position_id: Denoting the positional ID at which the plant was located
subimage_file_name: A unique subimage identifier of the form yyyymmddhhmmss#.jpg, where # is the position ID that ensures uniqueness
date_planted: The day we put the plant’s seed in soil
x_min, x_max, y_min, y_max: The subimage’s coordinates in the parent image, given as a percentage. The minimum values denote the image’s upper left corner, whereas the maximum values denote the lower right corner; this conforms to the directions as defined in the OpenCV library, which is used for our image processing pipeline
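The metadata fields listed above can be pictured as a nested record. The sketch below builds such a record in Python and converts one bounding box to pixel coordinates. The field names follow the list; every value is an illustrative placeholder and not taken from the dataset. The keys inside `camera_pose`, the use of fractions between 0 and 1 for the box coordinates, and the 4000 x 3000 pixel image size are assumptions made for the example.

```python
import json

# Illustrative placeholder record; values are NOT taken from the dataset.
record = {
    "version": "1.5",
    "file_name": "20200101120000-pose17.jpg",
    "bb_file_name": "20200101120000-pose17-bb.jpg",
    "date": "2020-01-01",
    "time": "12:00:00",
    "room": "placeholder-room",
    "institute": "placeholder-institute",
    "camera": "GoPro",
    "lens": "Hero 7 Black",
    # ASSUMPTION: key names inside camera_pose are hypothetical.
    "camera_pose": {"x": 300, "y": 400, "z": 500, "polar": 45, "azimuthal": 90},
    "bounding_boxes": [
        {
            "plant_id": "echcru002",
            "label": "BarnyardGrass",
            "scientific_name": "Echinochloa crus-galli",
            "position_id": 3,
            "subimage_file_name": "202001011200003.jpg",
            "date_planted": "2019-12-15",
            # ASSUMPTION: coordinates stored as fractions of width and height.
            "x_min": 0.25, "x_max": 0.5, "y_min": 0.4, "y_max": 0.7,
        }
    ],
}


def bbox_to_pixels(box, width_px, height_px):
    """Convert fractional box coordinates to (left, top, right, bottom) pixels."""
    return (round(box["x_min"] * width_px), round(box["y_min"] * height_px),
            round(box["x_max"] * width_px), round(box["y_max"] * height_px))


serialized = json.dumps(record, indent=2)
restored = json.loads(serialized)
print(bbox_to_pixels(restored["bounding_boxes"][0], 4000, 3000))
```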
Since the available imaging perspectives of a plant depend on where it is located, we have sorted the position IDs into two classes. In the first class, four of the positions lie on the edge of the volume that the gantry system can cover. That limits the camera poses from which we can image those positions to half a cylinder. The second class of positions lies in the interior of the coverable volume, resulting in a half-sphere of possible camera poses to image from. See Fig 6 for a visualization of the two different classes. The subimages are sorted by these two location classes and saved into respective subfolders.
4.1 Training a CNN
Here we establish the value of data collected with the EAGL-I system by training a CNN to sort plant images into one of two distinct classes.
4.1.1 Model and Task
The specific task we pose to the network is to differentiate between grasses and non-grasses. As representatives for grasses we have barnyard grass, wild oats and yellow foxtail. We chose this task (in contrast to other classification challenges like identifying each species by itself or, for example, differentiating the cash crop canola from weeds) for two reasons: First, a significant portion of our training images includes seedlings that have not yet grown their first true leaves. Since all non-grasses in our datasets are dicots, a visual distinction between grasses and non-grasses is possible even during their earliest growing stages. Second, a key question to answer is how the data generation process has to be improved such that models trained on the respective data generalize to new scenarios. For this it is instrumental to test the models on external data. Defining this rather general task allows us to run the model on a wider variety of data, specifically on images of plants that we did not have access to when generating the training set.
To perform this task we train a model based on the established ResNet architecture (63) with 50 layers and randomly initialized weights. We average and normalize the input images to enhance the actual differences between the pictures, which are the plants (in contrast to the rather uniform blue background). To counteract the slight imbalance between the two classes, we introduce a class weight for each of the two classes.
We used 80% of the data for training, reserved 20% as validation data, and repeated training over the entirety of the training set 50 times (each pass forming an epoch). The validation accuracy converged satisfactorily, reaching 99.71% after 50 epochs (average of 99.79% and a variance below 0.025% over the last 20 epochs). The evolution of the validation accuracy per trained epoch is graphed in Fig 7.
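To illustrate the class imbalance that the class weights address, the sketch below derives the two class sizes from the image counts in Table 3 and computes inverse-frequency weights. This weighting scheme is one common choice and is shown here only as an assumption; it is not necessarily the definition used in our training. The function name is hypothetical.

```python
# Image counts per species from Table 3.
IMAGE_COUNTS = {
    "Barnyard Grass": 8621,
    "Canada Thistle": 4706,
    "Volunteer Canola": 6723,
    "Dandelion": 4797,
    "Smartweed": 870,
    "Wild Buckwheat": 4621,
    "Wild Oat": 1218,
    "Yellow Foxtail": 3110,
}
GRASSES = {"Barnyard Grass", "Wild Oat", "Yellow Foxtail"}


def inverse_frequency_weights(class_counts):
    """ASSUMPTION: weight = total / (number of classes * class count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * count)
            for name, count in class_counts.items()}


class_counts = {
    "grass": sum(n for s, n in IMAGE_COUNTS.items() if s in GRASSES),
    "non_grass": sum(n for s, n in IMAGE_COUNTS.items() if s not in GRASSES),
}
class_weights = inverse_frequency_weights(class_counts)

print(class_counts)
print({name: round(w, 3) for name, w in class_weights.items()})
```

With these weights, both classes contribute equally to the loss in total, even though non-grasses make up the larger share of the images.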
4.1.2 Results in Different Scenarios
To test our network’s capabilities of distinguishing monocots from dicots, we presented it with new, previously unseen data. To investigate how well the model generalizes, we tested it in the following scenarios that increasingly differ from the training data:
Images of the same species taken by the EAGL-I system, but with new individual plants. Those images differ from the training set only in having different individuals of the same species.
Images of the same species taken by the EAGL-I system, but under randomized camera angles and distances.
Images of the same species outside EAGL-I’s environment with a neutral background taken by a smartphone camera.
A collection of Arabidopsis and tobacco plant images under lab conditions produced by Minervini et al. (51).
A collection of field data of sugar beets produced by Haug and Ostermann (64).
A collection of plant seedling images produced by Giselsson et al. (65).
The results for the different scenarios are given in Table 4.
|Scenario||Size of Test Set||Correctly Identified||Accuracy|
|EAGL-I camera, same species, same angles||3494||3437||98.37%|
|EAGL-I camera, same species, randomized angles||520||513||98.65%|
|Neutral Background, smartphone, same species||56||50||89.29%|
|Minervini et al. (51)||347||283||81.56%|
|Haug and Ostermann (64), field data, sugar beets||162 (of 494)||120||74.07%|
|Giselsson et al. (65) field data, different species||500 (of 5539)||316||63.20%|
In the first scenario an accuracy of 8.37% was achieved, indicating that the model generalizes to new plants of the same species imaged under the same circumstances. The model we used has converged on the training data and might even show first signs of overfitting. For example, if we apply the model that is available after 40 epochs of training, the accuracy, the accuracy on the test data increases by 0.5% to 98.89%. This is a sign that improving the accuracy further requires a richer dataset, a more complex model, or a combination of both.
When we randomize the positions from which we take images, we see no significant impact on the model’s overall accuracy. From this we conclude that the variety of angles covered in our training sets is sufficient for the model to be insensitive to imaging angles (such as profile shots or overhead shots) when distinguishing grasses from non-grasses.
For images taken by smartphone with a neutral background, a high accuracy above 89% is still achieved. The model thus generalizes to new imaging conditions, although with reduced accuracy (which is to be expected). However, this test set is significantly smaller, since its generation required manual capture and labeling. Thus, the accuracy on the test data could deviate from the model’s accuracy on a larger set of similar images. To give a more complete picture of where the model’s true accuracy lies, we calculated a Clopper-Pearson confidence interval for this accuracy.
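As a minimal sketch of how such an interval can be obtained for the smartphone test set (50 of 56 images correct, per Table 4), the following Python code inverts the binomial tail probabilities by bisection. The 95% confidence level and the helper names `binom_cdf`, `bisect_root`, and `clopper_pearson` are assumptions made for illustration. They are not taken from our analysis pipeline.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p).

    Uses a direct sum, which is fine for test sets of a few hundred
    images but overflows for n in the thousands.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iterations=100):
    """Find the root of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, confidence=0.95):
    """Exact two-sided interval for a binomial proportion k/n.

    The default confidence level of 0.95 is an assumption.
    """
    alpha = 1.0 - confidence
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: alpha / 2.0 - (1.0 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2.0)
    return lower, upper


# Smartphone scenario from Table 4: 50 correct out of 56 images.
lower, upper = clopper_pearson(50, 56)
print(f"accuracy 50/56 = {50 / 56:.4f}, interval = [{lower:.4f}, {upper:.4f}]")
```

Because the interval is exact rather than based on a normal approximation, it remains valid for small test sets and for accuracies close to 100%, which is the situation here.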
We now explore how our model generalizes to data produced by others for species that are not represented in our training set. The dataset in (51) consists of 285 images of Arabidopsis plants and 62 tobacco plant images. The images are all taken top-down and show the plants at different growing stages. The dataset was taken with phenotyping applications in mind and contains images of dicots only. On the overall data we achieve an accuracy of 81.56%, which in this case coincides with the fraction of plants classified as dicots. This is a strong demonstration that our model can generalize to species not included in the training data. If we break the test data down by species, we see that the model performs even better on the Arabidopsis images (91.23%), while performing rather poorly on the tobacco images (37.1%). This tells us that the training set we generated is missing dicots that are morphologically similar to tobacco plants, and that we need to include these to achieve a more robust model.
As a next step to test how far our binary classifier generalizes, we applied it to the dataset provided in (64). This dataset consists of field data taken in a sugar beet field and features crop and weed plants. Since the annotations do not specify the weeds, we only use images that show sugar beets (a dicot). The original data in (64) shows several plants per image. Thus, we used the metadata provided by the authors to crop out the sugar beet plants. Still, in many of those cropped images neighboring plants overlap into the cropped section. This is a challenge for our model, which was trained on single plant images only. The test data also features a natural background (dirt), in contrast to the rather homogeneous backgrounds of the images we trained and tested on before. On the aforementioned subset our model achieves an accuracy of 74.07%. While not perfect, this shows that the model already has some capacity to generalize to new lighting and background conditions and to a species it was not trained on.
Finally, we applied our model to the dataset given in (65). This dataset is very challenging for various reasons: First, the contrast between plant and background is not as pronounced as in our training set or the other test sets. Second, the data contains many plants at their earliest growing stages and as a result some images have a resolution as small as 49 x 49 pixels (see Fig 8 for an example of a high- and low-resolution image). Third, as in the previous test dataset, the images contain multiple plants, though the authors of (65) have ensured that only one species is present in each image. Fourth, the dataset contains only species that are not present in our training data. Still, our goal to distinguish monocots from dicots remains. To this end, we sorted the plants in (65) into two categories: maize, wheat, blackgrass, and loose silky-bent represent monocots, whereas sugar beet, mayweed, chickweed, shepherd’s purse, cleavers, charlock, fat hen, and cranesbill comprise the set of dicots. To test the model we chose the 250 images with the highest resolution from each class. The achieved accuracy is 63.2%. Although this value does not lie far above 50%, it is still significant, as it shows that the model generalizes to some extent to data that shares only small similarities with the training data. A first step to improve accuracy would be to detect and crop out the plants in the test data before classification. This reduces the number of artifacts and ensures that each image contains only a single plant. Another improvement for this specific test data would be achieved by generating training data more suitable to the task, meaning imaging the species used in Ref. (65) and focusing on overhead shots. As presented in Subsection 3.5, the blue background in the training images can be replaced by images of the granulate appearing in the images of Ref. (65) to achieve an even higher similarity to the test data.
This idea of creating training data that resembles the data we can expect in an application is exactly the raison d’être of the EAGL-I system.
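The statement above that 316 correct classifications out of 500 is significantly better than chance can be checked with a one-sided exact binomial test. The sketch below is an illustration and not necessarily the test used for this paper. It assumes that a classifier guessing at random would be correct with probability 0.5, which is reasonable here because the test set contains 250 images per class. The function name `p_value_above_chance` is hypothetical.

```python
from fractions import Fraction
from math import comb


def p_value_above_chance(correct, total):
    """One-sided exact binomial p-value: P(X >= correct) for X ~ Binomial(total, 0.5).

    Integer arithmetic keeps the tail sum exact; only the final
    conversion to float rounds.
    """
    tail = sum(comb(total, i) for i in range(correct, total + 1))
    return float(Fraction(tail, 2**total))


# Scenario from Table 4: 316 of 500 seedling images classified correctly.
p_value = p_value_above_chance(316, 500)
print(f"P(at least 316 of 500 correct by guessing) = {p_value:.3g}")
```

A very small p-value means the observed accuracy is unlikely to arise from guessing alone, even though it is far below the accuracy reached on data resembling the training set.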
5 Conclusion and Future Work
In this paper we described the construction, operation, and utility of an embedded system (EAGL-I) that can automatically generate and label large datasets of plant images for machine learning applications in agriculture. Human interaction is reduced to selecting the plants to image and placing them inside the system’s imaging volume. EAGL-I can create a wide diversity of datasets as there are no limitations in plant placement, camera angle, or distance between camera and plant within this volume. Furthermore, the use of blue keying fabric as a background enables additional image processing techniques such as background removal and image segmentation. The system’s performance was demonstrated along several dimensions. With a subimage production time of s, we produced a dataset of over 34,000 labeled images of assorted weeds that are common in the Province of Manitoba. We subsequently used that dataset to train a simple convolutional neural network for distinguishing monocots from dicots, which in turn was tested on a variety of other datasets with quite favorable results.
We see the EAGL-I system as an important stepping stone to enabling new ML-based technologies in agriculture, such as automated weeding, that will require large amounts of labeled training data. Our system also provides opportunities to pursue research questions that were not accessible before. For example, with the ability to generate a quasi-unlimited source of data ourselves, we can investigate how the quantity and quality of training data influence machine learning models. Normally the amount of training data for a problem is hard-capped and acts as an observation limit for this type of research.
There are many other directions for improvements and future work for the EAGL-I system, of which we mention a few here.
Lighting: The addition of programmable LED lighting elements is planned and will allow us to customize lighting conditions on a per-image basis, if desired. This will enable an even wider variety of images to be collected by simulating different lighting scenarios, e.g. sunny, cloudy, or evening hours.
System design and dimensions: EAGL-I is presently limited to taking images inside its coverable volume, which puts hard limits on the number and size of plants that can be imaged in a given run. This leads to research questions about the design of imaging systems that are specific to the creation of labeled data. The challenge, then, is to design a system that can produce a wide variety of images (preferably of a wide variety of plants differing in size and growing pattern) at a small cost and high imaging rate. The gantry architecture of EAGL-I is simple and functional, but may not be optimal. One direction we are considering is mounting linear actuators and cameras directly to the walls and ceiling of a growth chamber.
Three-dimensional plant data: Since we have full control over the camera position, we should be able to use software to reconstruct 3-dimensional plant models from 2-dimensional images taken from different angles. This could be a simple depth map extracted from two or more images via parallax or a 3D point cloud combining more images. Alternatively, we can mount different imaging systems, such as stereoscopic cameras, to the gantry head in order to generate 3D data directly.
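To illustrate the parallax idea, the sketch below applies the standard pinhole-camera relation between depth and disparity, Z = f · B / d. It assumes two images taken with parallel optical axes from gantry positions that differ only by a sideways shift B, a focal length f expressed in pixels, and a disparity d already measured for a matched plant feature. The function name and all numeric values are hypothetical and do not describe the EAGL-I camera.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Return the distance (in metres) from the camera plane to a matched point.

    focal_length_px: focal length in pixels (assumed known from calibration)
    baseline_m:      sideways camera shift between the two images, in metres
    disparity_px:    horizontal shift of the same feature between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the camera")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical example: 1000 px focal length, camera moved 0.1 m sideways,
# and a leaf tip that shifts by 50 px between the two images.
depth_m = depth_from_disparity(1000.0, 0.1, 50.0)
print(f"estimated depth: {depth_m:.2f} m")
```

Since the gantry reports the camera position for every image, the baseline would be known without additional calibration targets. Finding matching features between the two images is the harder part and is not covered by this sketch.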
Detection and imaging of plant organs: Often one is interested in specific parts or organs of a plant, such as wheat spikes. To image these effectively, we have to solve the problem of pointing the camera at the desired organ for each plant. To achieve this we could combine machine learning techniques with our imaging system to bootstrap a training dataset for identifying specific plant organs. From there we can use a model to automatically move the camera into close proximity of the wheat spikes, say, and capture high-resolution images. Both the training set for identification and the image dataset of high-resolution wheat spikes would be valuable for subsequent applications such as phenotyping, blight detection and crop evaluation in the field.
6 Data Availability
The dataset and model described in Section 4 are publicly available (62). The production of much larger future datasets is underway and will include Canadian crop plants, such as wheat, canola, soybean, and pulses. We presently envision depositing these datasets at the Federated Research Data Repository (https://www.frdr-dfdr.ca/repo/) through a data management plan developed with the tools provided by the Portage Network (https://portagenetwork.ca).
The authors would like to thank the following people for their support, generosity and vision: Ezzat Ibrahim for establishing the Dr. Ezzat A. Ibrahim GPU Educational Lab at the University of Winnipeg, which we used extensively for the computing resources needed here; Rafael Otfinowski, Karina Kachur and Tabitha Wood for providing us with seeds, plants and laboratory space to develop our prototypes and datasets; Jonathan Ziprick for many helpful conversations about the gantry system and actuators; and Russ Mammei and Jeff Martin for allowing us to use their magnetic field mapping system as the first prototype of EAGL-I.
- (1) Vazquez–Arellano M, Griepentrog HW, Reiser D, Paraforos DS. 3-D imaging systems for agricultural applications–A review. Sensors (Basel). 2016; 16(5):618.
- (2) Narvaez FY, Reina G, Torres-Torriti M, Kantor G, Cheein FA. A survey of ranging and imaging techniques for precision agriculture phenotyping. IEEE ASME Trans Mechatron. 2017; 22(6):2428–39.
- (3) Antonacci A, Arduini F, Moscone D, Palleschi G, Scognamiglio V. Nanostructured (Bio)sensors for smart agriculture. Trends Analyt Chem. 2018; 98:95–103.
- (4) Khanna A, Kaur S. Evolution of Internet of Things (IoT) and its significant impact in the field of Precision Agriculture. Comput Electron Agric. 2019;157:218–231.
- (5) Oberti R, Shapiro A. Advances in robotic agriculture for crops. Biosyst Eng. 2016; 146:1–2.
- (6) Bechar A, Vigneault C. Agricultural robots for field operations: Concepts and components. Biosyst Eng. 2016; 149:94–111.
- (7) Bechar A, Vigneault C. Agricultural robots for field operations. Part 2: Operations and systems. Biosyst Eng. 2017; 153:110–28.
- (8) Duckett T, Pearson S, Blackmore S, Grieve B, Smith M. White paper–Agricultural robotics: The future of robotic agriculture EPSRC, 2018 International Robotics Showcase. UK-RAS White Papers, EPSRC UK-Robotics and Autonomous Systems Network. Retrieved May 6, 2020, from https://arxiv.org/ftp/arxiv/papers/1806/1806.06762.pdf
- (9) Shamshiri RR, Weltzien C, Hameed IA, Yule IJ, Grift TE, Balasundram SK, et al. Research and development in agricultural robotics: a perspective of digital farming. International Journal of Agricultural and Biological Engineering. 2018;11:1–14.
- (10) Relf-Eckstein JE, Ballantyne AT, Phillips PWB. Farming Reimagined: a case study of autonomous farm equipment and creating an innovation opportunity space for broadacre smart farming. NJAS - Wageningen Journal of Life Sciences. 2019;90-91:100307.
- (11) Lobet G. Image analysis in plant science: publish then perish. Trends Plant Sci. 2017; 22(7):559–66.
- (12) Waldchen J, Rzanny M, Seeland M, Mader P. Automated plant species identification – Trends and future directions. PLoS Comput Biol. 2018; 14(4):e1005993.
- (13) Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D. Machine learning in agriculture: A review. Sensors (Basel). 2018; 18(8):2674.
Kamilaris A, Prenafeta-Boldú FX. Deep learning in agriculture: A survey. Comput Electron Agric. 2018;147:70–90.
- (16) Jha K, Doshi A, Patel P, Shah M. A comprehensive review on automation in agriculture using artificial intelligence. Artificial Intelligence in Agriculture. 2019;2:1–12.
- (17) Binch A, Fox CW. Controlled comparison of machine vision algorithms for Rumex and Urtica detection in grassland. Comput Electron Agric. 2017; 140:123–38.
- (18) Bah DM, Hafiane A, Canals R. Deep learning with unsupervised data labelling for weed detection in line crops in UAV images. Remote Sens (Basel). 2018; 10(11), 1690.
- (19) Bosilj P, Duckett T, Cielniak G. Analysis of morphology-based features for classification of crop and weeds in precision agriculture. IEEE Robot Autom Lett. 2018; 3(4):2950–56.
- (20) Barbedo JGA. Digital image processing techniques for detecting, quantifying and classifying plant diseases. SpringerPlus. 2013;2:660.
- (21) Fahlgren N, Gehan MA, Baxter I. Lights, camera, action: high-throughput plant phenotyping is ready for a close-up. Curr Opin Plant Biol. 2015; 24:93–99.
- (22) Singh A, Ganapathysubramanian B, Singh AK, Sarkar S. Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci. 2016; 21(2):110–24.
- (23) Shakoor N, Lee S, Mockler TC. High throughput phenotyping to accelerate crop breeding and monitoring of diseases in the field. Curr Opin Plant Biol. 2017; 38:184.
- (24) Gehan MA, Kellogg EA. High-throughput phenotyping. Am J Bot. 2017; 104(4):505–08.
- (25) Giuffrida MV, Chen F, Scharr H, Tsaftaris SA. Citizen crowds and experts: observer variability in image-based plant phenotyping. Plant Methods. 2018; 14:12.
- (26) Tardieu F, Cabrera-Bosquet L, Pridmore T, Bennett M. Plant phenomics, from sensors to knowledge. Curr Biol. 2017; 27(15):R770–83.
- (27) Lottes P, Hoeferlin M, Sander S, Müter M, Schulze P, Stachniss LC. An effective classification system for separating sugar beets and weeds for precision farming applications. Proc IEEE Int Conf Robot Autom; 2016. p. 5157–5163.
- (28) Ünal I, Topakci M. Design of a Remote-Controlled and GPS-Guided Autonomous Robot for Precision Farming. International Journal of Advanced Robotic Systems. 2015;12:194.
- (29) Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis. 2015;115:211–252.
- (30) LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444.
- (31) He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proc IEEE Int Conf Comput Vis. 2015; pp. 1026–1034.
- (32) Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv Neural Inf Process Syst. 2015. p. 91–99.
- (33) Taigman Y, Yang M, Ranzato M, Wolf L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit; 2014.
- (34) Vinyals O, Toshev A, Bengio S, Erhan D. Show and Tell: A Neural Image Caption Generator. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit; 2015.
- (35) Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, et al. End to End Learning for Self-Driving Cars. arXiv:1604.07316 [Preprint]. 2016. Available from: https://arxiv.org/abs/1604.07316
- (36) Henry CJ, Storie CD, Palaniappan M, Alhassan V, Swamy M, Aleshinloye D, et al. Automated LULC map production using deep neural networks. Int J Remote Sens. 2019;40:4416–4440.
- (37) Sun C, Shrivastava A, Singh S, Gupta A. Revisiting unreasonable effectiveness of data in deep learning era. Proc IEEE Int Conf Comput Vis. 2017. p. 843–852.
- (38) Buhrmester M, Kwang T, Gosling SD. Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality data? Perspect Psychol Sci. 2016;6:3–5.
- (39) Schenk E, Guittard C, et al. Crowdsourcing: What can be Outsourced to the Crowd, and Why. Workshop on open source innovation, Strasbourg, France. 2009; 72:3.
- (40) Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: a database and web-based tool for image annotation. Int J Comput Vis. 2008;77:157–173.
- (41) Rapson CJ, Seet BC, Naeem MA, Lee JE, Al-Sarayreh M, Klette R. Reducing the pain: A novel tool for efficient ground-truth labelling in images. Proc IEEE IVCNZ; 2018. p. 1–9.
- (42) Dutta A, Zisserman A. The VIA annotation software for images, audio and video. Proc ACM Int Conf Multimed; 2019. p. 2276–2279.
- (43) Ubbens JR, Stavness I. Deep Plant Phenomics: A deep learning platform for complex plant phenotyping tasks. Front Plant Sci. 2017; 8:1190.
- (44) Ubbens J, Cieslak M, Prusinkiewicz P, Stavness I. The use of plant models in deep learning: an application to leaf counting in rosette plants. Plant Methods. 2018; 14:6.
- (45) Crimmins MA, Crimmins TM. Monitoring plant phenology using digital repeat photography. Environ Manage. 2008; 41:949–958.
- (46) Tisné S, Serrand Y, Bach L, Gilbault E, Ben Ameur R, Balasse H, et al. Phenoscope: an automated large-scale phenotyping platform offering high spatial homogeneity. Plant J. 2013;74:534–544.
- (47) Granier C, Aguirrezabal L, Chenu K, Cookson SJ, Dauzat M, Hamard P, et al. PHENOPSIS, an automated platform for reproducible phenotyping of plant responses to soil water deficit in Arabidopsis thaliana permitted the identification of an accession with low sensitivity to soil water deficit. New Phytol. 2006;169:623–635.
- (48) Jansen M, Gilmer F, Biskup B, Nagel KA, Rascher U, Fischbach A, et al. Simultaneous phenotyping of leaf growth and chlorophyll fluorescence via GROWSCREEN FLUORO allows detection of stress tolerance in Arabidopsis thaliana and other rosette plants. Funct Plant Biol. 2009;36:902–914.
- (49) Chéné Y, Rousseau D, Lucidarme P, Bertheloot J, Caffier V, Morel P, et al. On the use of depth camera for 3D phenotyping of entire plants. Comput Electron Agric. 2012;82:122–127.
- (50) Dobrescu A, Scorza LC, Tsaftaris SA, McCormick AJ. A Do-It-Yourself phenotyping system: measuring growth and morphology throughout the diel cycle in rosette shaped plants. Plant Methods. 2017;13:95.
- (51) Minervini M, Fischbach A, Scharr H, Tsaftaris SA. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognit Lett. 2015; p. 1–10. Available from: https://www.plant-phenotyping.org/datasets-home
- (52) Minervini M, Giuffrida MV, Perata P, Tsaftaris SA. Phenotiki: an open software and hardware platform for affordable and easy image-based phenotyping of rosette-shaped plants. Plant J. 2017;90:204–216.
- (53) Bai G, Ge Y, Hussain W, Baenziger PS, Graef G. A multi-sensor system for high throughput field phenotyping in soybean and wheat breeding. Comput Electron Agric. 2016;128:181–192.
- (54) Jiang Y, Li C, Paterson AH. High throughput phenotyping of cotton plant height using depth images under field conditions. Comput Electron Agric. 2016;130:57–68.
- (55) Barker J, Zhang N, Sharon J, Steeves R, Wang X, Wei Y, et al. Development of a field-based high-throughput mobile phenotyping platform. Comput Electron Agric. 2016;122:74–85.
- (56) Jimenez-Berni JA, Deery DM, Rozas-Larraondo P, Condon ATG, Rebetzke GJ, James RA, et al. High throughput determination of plant height, ground cover, and above-ground biomass in wheat with LiDAR. Front Plant Sci. 2018;9:237.
- (57) Story D, Kacira M. Design and Implementation of a Computer Vision-Guided Greenhouse Crop Diagnostics System. Mach Vision Appl. 2015;26:495–506.
- (58) Lee U, Chang S, Putra GA, Kim H, Kim DH. An automated, high-throughput plant phenotyping system using machine learning-based plant segmentation and image analysis. PLoS One. 2018;13:1–17.
- (59) Beck MB, Liu CY. EAGL-I [software]. 2020 May [cited 2020 May] Available from: https://github.com/UWDigitalAg/EAGL-I
- (60) Gehan MA, Fahlgren N, Abbasi A, Berry JC, Callen ST, Chavez L, et al. PlantCV v2: Image analysis software for high-throughput plant phenotyping. PeerJ. 2017;5.
- (61) Bradski G, Kaehler A. Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, Inc.; 2008.
- (62) Beck MA, Liu CY, Bidinosti CP, Henry CJ, Godee CM. The weedling dataset. Available from: https://doi.org/10.5061/dryad.gtht76hhz
- (63) He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit; 2016. p. 770–778.
- (64) Haug S, Ostermann J. A Crop/Weed Field Image Dataset for the Evaluation of Computer Vision Based Precision Agriculture Tasks. Comput Vis ECCV; 2015. p. 105–116. Available from: http://dx.doi.org/10.1007/978-3-319-16220-1_8.
- (65) Giselsson TM, Dyrmann M, Jørgensen RN, Jensen PK, Midtiby HS. A Public Image Database for Benchmark of Plant Seedling Classification Algorithms. arXiv preprint. 2017; Available from: https://vision.eng.au.dk/plant-seedlings-dataset