Agriculture has been one of the most critical industries since the dawn of human civilization. From manual labor thousands of years ago to modern large-scale, machine-driven farming, the industry continues to evolve to feed a growing population with limited resources. Within agriculture, soybeans (Glycine max) rank among the most important crops worldwide (lee2015soybean; pagano2016importance). Soybeans are not only a key source of protein for humans and livestock but also a key ingredient in biodiesel and vegetable oils. In 2019, United States soybean exports totaled 19 billion USD, and the global soybean market has been estimated at over 50 billion USD annually since 2015 (USDA2019Soy).
Given the economic impact of soybeans, breeding companies in both the private and public sectors endeavor to release improved soybean varieties to meet market demands, with the expectation that new varieties out-yield the previous generation. Superior varieties are identified through years of observed field trials. Within all soybean breeding pipelines, the relative maturity of a soybean variety is an important characteristic for determining optimal growing environments, based mostly on latitude (ortel2020soybean). Identifying a soybean’s relative maturity enables plant breeders to make optimal placement decisions to ensure a soybean reaches its maximum growth potential. That is, placing a soybean variety in the correct latitude band allows for the proper accumulation of sunlight needed for optimal growth. If a variety is not placed in its optimal growing environment, the soybean may freeze before harvest, ultimately wasting precious resources and yield potential. For a commercial breeding organization, at the end of each growing season, advancement decisions must be made to determine which soybeans progress to another year of field testing and which are discarded. These advancement meetings are typically considered the backbone of any breeding organization, and a company’s success depends on identifying the best crops to move forward. It is common for these decisions to encompass 40,000 genetically different soybean varieties, and having key information is the difference between a confident decision and an educated guess. Soybean relative maturity is one of the most important factors (in addition to yield) in determining whether a soybean is advanced. Therefore, it is vital that this information is collected accurately and efficiently before these decisions are made.
To determine the relative maturity of a soybean variety, soybean field trials are manually screened approximately once or twice per week by soybean physiology experts to determine the day of the year that 95% of soybean pods within a plot have reached their mature brown color (fehr1977stages). However, for a global breeding organization, manually screening soybean fields is labor-intensive, time-consuming, and prone to human error, making the data collection process a difficult task. Of the 40,000 genetically different soybean varieties, only a fraction will have their relative maturity identified, leading to less optimal advancement decisions.
In recent years, digital and data-driven breeding/farming has led agriculture to a new era (yost2011decision; aubert2012enabler; kurkalova2017sustainable; karimzadeh2019data; shahhosseini2020improved; moeinizade2019LAS; MoeinizadeMTLAS; moeinizade2021TI; Gorkem2020land; amini2021look; han2021dynamic). Satellite imaging, live GPS tracking, drone-based fertilizing and screening, micro-sensors in the soil, etc., are all examples of new techniques that have brought significant changes to the agricultural industry. Modern image capturing devices such as unmanned aerial vehicles (UAVs) allow researchers to gather images of crop fields at a faster rate with reduced dependency on error-prone, manual labor. Combined with modern analytical techniques, these images provide crop level data that breeders and farmers can use to enhance decision making (nguyen2020monitoring; peng2020evaluation; khaki2020yieldnet). A scalable image capturing technology has the ability to increase information throughput as well as reduce human error. However, the adoption of a new process leads to a new set of challenges, such as when and how frequently images should be collected and how algorithms can be trained to best convert images into useful data for decision making. Oftentimes, the turnaround from the last drone flight to the advancement decision deadline is a couple of days. Therefore, any established data pipeline needs to be scalable to provide efficient data turnaround that is accurate for end goal needs.
Within machine learning, deep neural networks (DNN) are commonly used models that contain many sequentially stacked layers, allowing the model to learn features that encode information from the image. Using a DNN, features are learned automatically from input data without manual feature creation. With increased computing power, automating image analysis is possible for agricultural tasks such as image classification, object detection, and object counting. These approaches often rely on a special type of DNN called a convolutional neural network (CNN). CNNs excel at image-based tasks because they take advantage of the spatial structure of the pixels. Additionally, CNNs require fewer learned parameters, achieve superior computational performance compared to traditional machine learning, and reduce the risk of overfitting (o2015introduction). Given the flexibility and strength of CNNs, it comes as no surprise that they are used across a vast range of domains and applications (heinrich2021process; liu2021survey; tang2021model).
Cornerstone deep learning frameworks such as RetinaNet (alon2019tree), Mask-RCNN (yu2019fruit), and YOLO (mosley2020image) have been applied to detect sorghum heads, count strawberries, and classify tree species. Also, given that commercial corn is a staple crop around the world, used for animal feed and biofuels, it is no surprise that there is substantial work combining corn images and deep learning (khaki2020high; khaki2020convolutional; khaki2021deepcorn). Aside from deep learning, many other researchers have combined traditional statistical modeling, machine learning, and image processing with agricultural data (singh2016machine; naik2017real; moeinizade2018stochastic; dobbels2019soybean; moeinizade2020complementarity; pothen2020detection; LI2020304; Austin; shahhosseini2019maize; shahhosseini2020forecasting; shahhosseini2021coupling). With the rapid adoption of analytical techniques in plant breeding, this large body of recent work illustrates that large-scale analysis of imagery for improved decision making is possible.
Soybean relative maturity is an excellent example of a trait that can be measured using image analysis techniques. Recent efforts have used drone-based imagery to estimate days to soybean maturity, that is, the day of the year on which 95% of pods within a plot have turned their mature brown color. These approaches have relied on multiple linear regression (christenson2016predicting), LOESS regression (Austin), segmented regression (narayanan2019improving), and partial least squares regression (zhou2019estimation). Each of these methods condenses each RGB image to a single value, such as the normalized difference vegetation index or the green leaf index. However, such an approach loses information by summarizing a complete image into a single number. Moreover, to date, trevisan2020high is the only work that has used a CNN to estimate soybean maturity dates. However, that work does not account for the temporal relationship between drone flight dates and simply uses an image classification approach. Therefore, any extension of the literature requires addressing the temporal relationship between drone images, such as by using a long short-term memory (LSTM) recurrent neural network (yu2019review). Like a CNN, an LSTM is a special type of deep neural network; it processes sequences of data points and therefore naturally lends itself to time series analysis. Given that important decisions depend on the accuracy of these predictions, it is vital to obtain any edge in prediction accuracy.
Given this information, our research aims to:

1. Create a temporal, image-based deep learning model (CNN-LSTM) to estimate soybean maturity.
2. Determine how our model’s accuracy changes under differing temporal frequencies of flights.
3. Create a data-driven support system that informs plant breeders when sufficient UAV imagery has been captured for a soybean variety advancement decision.
Ideally, throughout a soybean’s growth cycle, a drone flight would be conducted each day. However, in practice, due to limited resources (time, labor, money, etc.) and varying weather conditions, drone images may only be taken once or twice a week at most. Therefore, the goal of this paper is to create a system that lets commercial breeding organizations decide when to best conduct drone flights and identify when enough flights have been taken for an accurate soybean maturity date estimate, ultimately aiding advancement decisions. Additionally, a prediction within two days of the actual maturity date is sufficient for a confident advancement decision.
In this section, we introduce an end-to-end framework to systematically estimate the relative maturity of soybeans using computer vision and deep learning techniques. Our proposed framework comprises four phases: data extraction and assembly, data pre-processing, feature extraction, and prediction. The data extraction and assembly phase first extracts plot images from each flight’s full-field orthomosaic and then assembles the time series of plot images for a given date. That is, each variety is snipped into a single image and then sequenced together by drone flight date. The data pre-processing phase includes rotating and resizing images to make the data consistent. In the feature extraction phase, time-distributed convolutional neural networks automatically extract features from the time series of images for each variety. Finally, long short-term memory recurrent neural networks predict the relative maturity of soybeans from the features extracted in the previous phase. Figure 1 illustrates this process for a single flight orthomosaic (one drone flight on a single day) covering numerous soybean varieties. At each stage, we make use of open-source software for our end-to-end process. In practice, this entire pipeline is completed efficiently, so that plant breeders have sufficient data to make the necessary soybean advancement decisions.
2.1 Proposed deep learning model
The deep learning model is structured as a hybrid: for each plot snip, time-distributed CNNs are used for deep feature extraction, and an LSTM captures the sequential behavior of the time series data. Time-distributed layers have the advantage of applying convolution to each image in the time series independently.
In detail, each image is first passed to a set of convolutional and max pooling layers to produce a fixed-length vector representation. Convolutional layers act as a feature extractor that automatically learns specific details about the image, in contrast with traditional machine learning methods, where features must be handcrafted by a domain expert. Our feature extractor comprises four convolutional layers and two max pooling layers. Stacking convolutions helps extract increasingly detailed structure from the images: the first convolutional layers learn simple features such as colors and edges, while deeper convolutions identify shading patterns, areas of interest, and pixel densities. At a high level, the max pooling layers reduce the dimensions of the learned feature maps, and therefore the number of parameters to learn, leading to lower computational demand. In short, the convolutional layers reduce an image to a single feature vector that can be read into a learning algorithm.
(See Table 3 for more details about the network architecture.) The outputs of the CNN for the sequence of plot images are passed to an LSTM model. Long short-term memory (LSTM) is a special type of recurrent neural network (RNN) that is well suited to making predictions from time series data. LSTMs solve the vanishing gradient problem encountered in classical RNNs by using information gates to store useful information and forget unnecessary information. The output of the LSTM layer is passed to an output layer, which finally estimates the relative maturity. More details about the experiment design are presented in Section 3.3.
2.1.1 Loss Function
The network is trained with the Huber loss:
\[
L_{\delta}(y, \hat{y}) =
\begin{cases}
\frac{1}{2}\left(y - \hat{y}\right)^2 & \text{if } \lvert y - \hat{y} \rvert \le \delta, \\
\delta \left( \lvert y - \hat{y} \rvert - \frac{\delta}{2} \right) & \text{otherwise.}
\end{cases}
\]
Here \(y\) is the true label and \(\hat{y}\) is the prediction. For residuals smaller than \(\delta\), this function is quadratic; for larger residuals, it is linear. Moreover, \(\delta\) is a hyper-parameter that can be tuned. This loss function combines the advantages of mean squared error (MSE) loss and mean absolute error (MAE) loss in a piecewise manner.
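As a concrete illustration, the piecewise behavior above can be sketched in a few lines of NumPy (a minimal sketch, not the training implementation):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for residuals <= delta, linear beyond."""
    e = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * e ** 2
    linear = delta * (e - 0.5 * delta)
    return np.where(e <= delta, quadratic, linear).mean()
```

For a residual of 0.5 days the loss is quadratic (0.125 with delta = 1), while a 3-day residual falls on the linear branch (2.5), limiting the influence of outlier plots.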
We compare the performance of our proposed deep learning model against a recent state-of-the-art benchmark created by Austin, where a local regression model is fitted to the RGB color transformation values over time. That is, for each plot and drone flight date, an RGB transformation is performed, and the output is fed into the regression model. Austin demonstrated that using the mean green leaf index (GLI) of each plot combined with locally estimated scatterplot smoothing (LOESS) regression results in higher correlations between the predicted and ground-truth maturity days. The LOESS model combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression by fitting linear models to localized subsets of data determined by a nearest neighbor algorithm (LOESScleveland1988locally). In this study, the LOESS model was implemented using the “lowess” function from the “statsmodels” package in Python to interpolate the GLI values extracted from the time series of images. This function implements the algorithm described in LOESScleveland1979robust. The GLI index is calculated as
\[
\mathrm{GLI} = \frac{2G - R - B}{2G + R + B},
\]
where R, G, and B are the mean values of the red, green, and blue channels, respectively, for each image. Finally, the relative maturity date is predicted as the day whose value is closest to a predefined threshold, where the threshold value corresponds to a level of greenness. Austin suggested that a GLI threshold of 0.02 was optimal based on their extensive testing. However, this value can vary with the environment and equipment, and optimizing a single threshold is not scalable. This is one limitation we hope to address with our CNN-LSTM approach.
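A minimal sketch of this benchmark is given below, assuming plot images arrive as NumPy RGB arrays; the `frac` smoothing parameter is our assumption, not a value reported by Austin:

```python
import numpy as np

def green_leaf_index(image):
    """Mean GLI of an RGB image array of shape (H, W, 3)."""
    r, g, b = (image[..., i].astype(float).mean() for i in range(3))
    return (2 * g - r - b) / (2 * g + r + b)

def predict_maturity_day(days, gli_values, threshold=0.02):
    """Smooth the per-flight GLI series with LOESS and return the day
    whose smoothed GLI is closest to the greenness threshold."""
    from statsmodels.nonparametric.smoothers_lowess import lowess
    smoothed = lowess(gli_values, days, frac=0.6, return_sorted=False)
    return days[int(np.argmin(np.abs(np.asarray(smoothed) - threshold)))]
```

For a pure green image (R = B = 50, G = 100), the GLI is (200 − 100)/(200 + 100) = 1/3; as a plot browns, G drops and the index decays toward the threshold.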
3 Experiments and Results
In this section, we present the data sets used in this study, the deep learning model’s hyper-parameter tuning, and the results for both the deep learning model and the LOESS model.
3.1 Data Sets
Data from 6 different locations across the United States in two growing seasons (2018 and 2019) are used in this study. All data (orthomosaics, plot delineations, and ground truth) except environment 3 (Elkhart, IA, 2019) were obtained from the University of Minnesota soybean breeding project (dataset). The environment 3 data set was provided by a commercial breeding organization. Table 1 includes more details about the data sets used in this study. Environments 3 and 6 have the largest number of plots (soybean varieties), almost twice the number of plots in the other environments. During the growing season, a commercial breeding organization can have upwards of 40,000 genetically different soybeans planted in 200 geographically different locations (ranging from Mississippi to Canada). Due to the labor requirements and plant breeding expertise needed to accurately observe soybean maturity, it is challenging to obtain extensive ground-truth data.
| Environment | Location | Year | Number of plots | Planting date | UAS Platform |
|---|---|---|---|---|---|
| 1 | Waseca, MN | 2018 | 874 | May 17 | DJI Phantom 3 Pro |
| 2 | Lamberton, MN | 2018 | 796 | May 16 | DJI Phantom 3 Pro |
| 3 | Elkhart, IA | 2019 | 1686 | April 24 | DJI Phantom 3 Pro |
| 4 | Waseca, MN | 2019 | 688 | May 15 | DJI Phantom 3 Pro |
| 5 | Lamberton, MN | 2019 | 896 | May 16 | DJI Phantom 3 Pro |
| 6 | Rosemount, MN | 2019 | 1410 | May 26 | DJI Inspire 1 |
Each data set includes orthomosaics, plot boundary delineations, ground control points, and ground-truth notes taken by soybean physiology experts who visited plots and manually observed the soybeans’ relative maturity. To extract individual plots from the orthomosaic images, we used a Python script to parse the plot boundaries, and we tagged each image with the drone flight date and a unique identifier (the soybean’s name). We performed this process for each orthomosaic across all flight dates, as shown in Table 2. In total, we processed 31,750 unique images. It should be noted that the number of plots is not exactly the same as in the previous work by Austin, since we kept outliers in the data sets to test the robustness of our proposed models, whereas Austin removed them.
| Environment | Flight dates |
|---|---|
| 1 | Sep 6, Sep 13, Sep 20, Sep 27, Oct 8 |
| 2 | Sep 5, Sep 14, Sep 17, Sep 25, Oct 4 |
| 3 | Sep 7, Sep 13, Sep 20, Sep 26, Oct 3 |
| 4 | Sep 6, Sep 14, Sep 21, Sep 27, Oct 7 |
| 5 | Sep 6, Sep 14, Sep 18, Sep 25, Oct 4 |
| 6 | Sep 6, Sep 13, Sep 20, Sep 27, Oct 7 |
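For the plot-extraction step described above, a minimal sketch using Pillow is shown below. In practice the pixel `bounds` would be derived from the plot-delineation shapefiles and ground control points; that coordinate conversion is omitted here, and the file names are hypothetical:

```python
from PIL import Image

def snip_plot(orthomosaic_path, bounds, out_path):
    """Crop a single plot from a full-field orthomosaic.

    `bounds` is a (left, upper, right, lower) tuple of pixel coordinates
    for the plot boundary; the snip is saved as its own image so it can
    be tagged with the flight date and variety identifier.
    """
    with Image.open(orthomosaic_path) as ortho:
        ortho.crop(bounds).save(out_path)
```

Running this once per plot boundary per flight date yields the time series of plot snips assembled in the data extraction phase.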
The relative maturity days are calculated as the number of calendar days after August 31. That is, our model estimates a numeric value that we add to August 31 to obtain a month-day estimate of the maturity date. Figure 4 visualizes the distribution of relative maturity days for each environment. As shown in Figure 4, the distributions vary in shape and spread. The medians of relative maturity days across the six environments are 20, 24, 15, 25, 26, and 27, respectively. As expected, environment 3 (with the earliest planting date) has the lowest median, that is, the earliest maturity date.
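This convention is easy to invert with the standard library (the year argument is whichever growing season the data came from):

```python
from datetime import date, timedelta

def maturity_date(relative_days, year):
    """Convert a relative-maturity value (days after August 31)
    into a calendar date for the given growing season."""
    return date(year, 8, 31) + timedelta(days=round(relative_days))
```

For example, a relative maturity of 20 in the 2019 season corresponds to September 20, 2019.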
Flights span from mid-August to mid-October, and flight dates vary across environments. To be consistent, we selected weekly flights over a 5-week period covering four weeks of September and the first week of October for each environment. We train our models on weekly and bi-weekly flights, using 5 and 3 images respectively, to investigate the effect of less frequent flights on prediction performance. In practice, if accurate performance can be obtained from fewer flights, information is gained faster, allowing plant breeders sufficient time to make advancement decisions. Operationally, reducing the number of drone flights would present significant savings in both time and drone operating costs. In an ideal scenario, a breeding organization would fly drones every day, but due to restrictive factors (weather, time, labor, etc.), flights can only be conducted at most once or twice a week. A system that yields accurate and confident results from fewer drone flights would have numerous positive outcomes for a breeding organization.
3.2 Data Processing
A number of pre-processing steps are applied to the images before the prediction task. The first step is to resize all images to a fixed width and length. The original plot images are rectangular, with widths varying from 57 to 65 pixels and lengths from 146 to 280 pixels. We tried different sizes and found that the prediction is not very sensitive to the image size, since we mostly extract features related to color and shape. Therefore, we resized all images to a fixed size of 256 × 64 pixels (matching the network input size in Table 3) for consistency.
Next, we applied data augmentation techniques to test the robustness of the proposed deep learning model against image variation caused by cloudiness and the relative position of the camera and sun. These techniques include changing the brightness and contrast and blurring some images. Data augmentation was applied to 20% of the images. To change the brightness of an image, we added a constant to each pixel. Similarly, contrast was changed by multiplying each pixel by a constant. To blur images, we used Gaussian smoothing to remove noise.
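These three perturbations can be sketched with NumPy alone; the kernel radius and clipping behavior are our assumptions, as the study's exact augmentation parameters are not reproduced here:

```python
import numpy as np

def adjust_brightness(img, offset):
    """Brightness: add a constant to every pixel, clipped to [0, 255]."""
    return np.clip(img.astype(int) + offset, 0, 255).astype(np.uint8)

def adjust_contrast(img, factor):
    """Contrast: multiply every pixel by a constant, clipped to [0, 255]."""
    return np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)

def gaussian_blur(img, sigma=1.0, radius=2):
    """Blur each channel with a separable 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    out = img.astype(float)
    for axis in (0, 1):  # convolve along rows, then columns
        out = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, out)
    return out.astype(np.uint8)
```

Applying these to a random 20% of the plot snips simulates the lighting and focus variation a drone camera encounters in the field.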
3.3 Design of Experiments
The CNN-LSTM model consists of four time-distributed convolutional layers of 32 filters each and two LSTM layers. The detailed structure of the network is provided in Table 3. In the CNN, downsampling is performed by max pooling with a stride of 2. The output of the last max pooling layer is flattened and used as the input features for the LSTM layers. Each LSTM layer has 256 hidden units. The first LSTM layer returns a sequence that is fed to the second LSTM layer, which outputs a vector. Finally, a dense layer with 1 neuron is applied to estimate the relative maturity of each plot. After trying different network designs, we found this architecture to provide the best overall performance. It has a total of 948,449 trainable parameters.
The weights of the network are initialized using Xavier initialization (glorot2010understanding), and δ in the Huber loss function is set to 1. To determine this value, we applied a grid search over different values of δ and chose the optimal one. The loss function is optimized using mini-batch stochastic gradient descent with a batch size of 64 (lecun1998gradient). The optimization algorithm is Adaptive Moment Estimation (Adam) with tuned learning and decay rates (kingma2014adam). The model is trained for 300 iterations.
| Layer name | Filter size | # Filters | Stride | Padding | Input size |
|---|---|---|---|---|---|
| Time distributed Conv2D | – | 32 | 2 | Same | (5, 256, 64, 3) |
| Time distributed Conv2D | – | 32 | 2 | Same | (5, 128, 32, 32) |
| Time distributed MaxPooling2D | – | – | 2 | – | (5, 64, 16, 32) |
| Time distributed Conv2D | – | 32 | 2 | Same | (5, 32, 8, 32) |
| Time distributed Conv2D | – | 32 | 2 | Same | (5, 16, 4, 32) |
| Time distributed MaxPooling2D | – | – | 2 | – | (5, 8, 2, 32) |
| Time distributed Flatten | – | – | – | – | (5, 4, 1, 32) |
| Dense (1 neuron) | – | – | – | – | 256 |
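A sketch of this architecture in Keras is given below, assuming 3×3 convolution filters and ReLU activations (neither is specified in Table 3); the parameter count will therefore differ from the reported 948,449:

```python
from tensorflow.keras import layers, models

def build_cnn_lstm(seq_len=5, h=256, w=64, filters=32, lstm_units=256):
    """CNN-LSTM sketch: a per-frame CNN wrapped in TimeDistributed,
    followed by two LSTM layers and a 1-neuron regression head."""
    # Per-frame feature extractor: four strided convs + two max pools,
    # reproducing the spatial sizes listed in Table 3.
    cnn = models.Sequential([
        layers.Input((h, w, 3)),
        layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),  # (4, 1, 32) -> 128 features per frame
    ])
    return models.Sequential([
        layers.Input((seq_len, h, w, 3)),
        layers.TimeDistributed(cnn),          # apply CNN to each flight image
        layers.LSTM(lstm_units, return_sequences=True),
        layers.LSTM(lstm_units),              # final hidden state only
        layers.Dense(1),                      # relative maturity (days)
    ])
```

The model maps a (5, 256, 64, 3) stack of plot snips to a single relative-maturity value.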
According to Table 1, there are 6,350 plots in total. We randomly select 85% of the plots as the input data and use the rest as test data to evaluate performance. Since the number of plots varies across environments, we kept approximately 15% of the plots per environment as test data. As such, the test size across the six environments is 117, 112, 260, 104, 134, and 226, respectively. The input data is split randomly into train and validation sets; the validation set (10% of the input data) is used to monitor the training process.
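The split described above can be sketched as follows. This is a simplified global version (the study stratifies the 15% test split by environment), and the random seed is our choice:

```python
import numpy as np

def split_indices(n_plots, test_frac=0.15, val_frac=0.10, seed=0):
    """Randomly split plot indices into train/validation/test sets.

    val_frac is taken from the non-test ("input") portion, matching
    the text: 85% input, of which 10% is held out for validation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_plots)
    n_test = int(round(test_frac * n_plots))
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(round(val_frac * len(rest)))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

Shuffling before splitting ensures each subset mixes early- and late-maturing varieties.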
3.4 Performance Evaluation Metrics
To measure the performance of the proposed model, we use the mean absolute error (MAE) and mean squared error (MSE) metrics, defined as
\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2.
\]
Here \(y_i\) denotes the true relative maturity for the \(i\)-th plot, \(\hat{y}_i\) denotes the predicted relative maturity, and \(n\) is the total number of plots.
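Both metrics are one-liners in NumPy:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in days of relative maturity."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    """Mean squared error; penalizes large misses more heavily."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return ((y_true - y_pred) ** 2).mean()
```

For example, predictions of 21 and 22 against true maturities of 20 and 24 give an MAE of 1.5 days and an MSE of 2.5.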
3.5 Results
In this section, we provide the final results for the proposed CNN-LSTM model and the benchmark (LOESS model) and compare their performance.
3.5.1 CNN-LSTM Performance
The CNN-LSTM model is trained using weekly and bi-weekly flight images. Figure 6 demonstrates the performance of the deep learning model across the six environments using 5 images on a weekly basis. According to the test performance, the R² between the ground truth and estimated relative maturity days is higher than 0.8 for all environments except environment 6, which achieved an R² of 0.5 on the test data. Having more outliers and a different UAV platform resulted in decreased performance for this environment.
Environment 3 has the lowest mean absolute error (less than 1 day) among all environments. Environments 1, 4, and 5 achieved a mean absolute error of less than 2 days, whereas environments 2 and 6 have a mean absolute error of almost 2 days.
Figure 7 presents training and test results across the six environments using 3 images. As expected, in most cases R² decreased relative to the weekly model: environments 2, 3, 4, 5, and 6 have lower R² for both train and test sets. However, environment 1 performed slightly better on all three metrics, which can be explained by highly correlated images.
As previously mentioned, the LOESS model depends on a threshold value chosen through trial and error. Therefore, we estimate the LOESS model’s performance using 9 different threshold values between 0.01 and 0.09. Detailed results are available in Tables 5 and 6 (see appendix). The threshold value corresponding to the best performance differs across environments; in some cases, the optimal threshold even differs with the chosen metric (see Env 2 and Env 5 in Table 6). This sensitivity to the threshold makes the LOESS model difficult to use in practice. For soybean field trials placed across the entire Midwestern United States, identifying a different threshold value for each environment is not practical and makes scalable implementation difficult.
Table 4 summarizes CNN-LSTM and LOESS performance for both weekly (using 5 images) and bi-weekly (using 3 images) analysis. It should be noted that the LOESS model is fitted to the time series of each plot separately, whereas the deep learning model fits a universal CNN-LSTM on training data and uses the test data for evaluation.
In all cases, the CNN-LSTM model achieved higher R² and lower MAE and MSE values, except for environment 1, where LOESS performs slightly better than the CNN-LSTM on the test set. For environment 6, the LOESS model is not able to predict any variation in the response (R² ≈ 0), whereas the CNN-LSTM model explains almost 50% of the variation. Furthermore, decreasing the number of input images from 5 to 3 affected the performance of the LOESS model more than that of the CNN-LSTM model.
Table 4: Summary of CNN-LSTM and LOESS performance for weekly and bi-weekly analysis (columns: Environment, Metric, Method (Weekly), Method (Bi-weekly)).
4 Discussion and Implications
For a commercial plant breeding organization, identifying the maturity date of soybeans is critical. When a soybean’s relative maturity is not classified correctly, it affects harvest operations, can cause seed germination issues, affects advancement decisions, and contributes to a loss in genetic gain. We have shown that our CNN-LSTM robustly estimates a soybean’s maturity date within 2 days of the actual date, without the ambiguity of the LOESS model. That is, the model is robust to outliers and does not require setting arbitrary thresholds for prediction. Even using just three drone flights taken two weeks apart, we achieve high prediction accuracy.
This work demonstrates that drone flights do not need to be conducted every day: even if a drone flight can only be conducted at two-week intervals, predictions remain accurate. This presents a cost-saving measure, and operational specialists no longer have to rush to the field to capture frequent drone flights or manually traverse the fields.
Additionally, given the geographical considerations of the environment locations, we also notice that some locations may need more drone flights than others, on average. For example, Environments 2 and 6 need at least 5 drone flights, whereas for Environments 1, 3, 4, and 5, three drone flights are sufficient. This discrepancy can be attributed to differences in land topography, weather, or soil conditions. This naturally leads to other plant breeding questions connecting soybean genetics to the environment and farm management practices, all of which are important considerations for soybean advancement.
Moreover, from the perspective of the plant breeder, knowing that an advancement decision can be made after three drone flights for a subset of locations is reassuring. Once a subset of locations has enough information for a confident decision, breeders can focus their time gathering data about the rest of their fields. Moreover, since additional drone flights only marginally improve accuracy for a subset of fields, a plant breeder can work with the operations team to create a flight schedule that is flexible yet allows enough data collection for the fields known to require more flights. Additionally, needing fewer flights per field allows more plots to be captured. As stated previously, it is common for a commercial organization to have 40,000 genetically different soybeans growing at a single time, yet our training data set consists of only 1,650 soybeans. Therefore, with less demand on flight frequency, organizations can capture more data with fewer flights. That is, instead of having relative maturity data for only a small subset of varieties, organizations can consider expanding drone programs to cover all their varieties. This leads to better decision making when determining the next soybean to advance to the next stage and, ultimately, to be grown by farmers.
In this study, we demonstrated how an applied deep learning system can help aid soybean breeding decisions. To achieve this, we developed an end-to-end framework to estimate the relative maturity of soybeans from a time series of UAV images. A hybrid deep learning model was proposed to extract features from images and capture the temporal behavior of the time series data. Analyses were conducted for 6,350 plots across six environments in Minnesota and Iowa over two growing seasons and two flight frequencies (weekly and bi-weekly). The deep learning model estimates relative maturity with less than 2 days of error for five of the six environments; the sixth environment has less than 3 days of error.
In predicting soybean relative maturity, color is the most important feature, since a soybean is mature when 95% of its pods have turned brown. Therefore, a successful model should detect color properly. In a deep learning context, color is a simple feature that can be detected by the earlier layers. This explains why we can achieve such good performance using only four convolutional layers in our proposed model.
To evaluate our estimations, we compared the deep learning results with a benchmark: a local regression model (LOESS) that models the greenness decay over time. We demonstrated the sensitivity of this method to its predefined threshold value. Although the local regression model is simple and fast, optimizing the threshold value can be analytically challenging and computationally expensive. The method maps each image to a single value (a vegetation index) and does not retain all image features as CNNs do. Furthermore, a separate model must be fitted to the time series of each plot, which means the model cannot generalize to new data.
One advantage of the CNN-LSTM model is its robustness to data quality issues (e.g., dark and blurred images). Another is its good performance with less frequent flights. Furthermore, our proposed method can generalize to data from new environments.
With the accessibility of satellite imagery, future research in this area can expand on using satellite imagery to estimate soybean maturity. Moreover, given this framework, another research path would be to identify other soybean traits or traits from other crops that can be analyzed from drone imagery with the ultimate goal of improving decision making.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was partially supported by Syngenta Company.