## I Introduction

While the demand for food increases and the need for environmentally conscious farming practices is clearer than ever before, precision agriculture promises the ability to reduce input costs, while simultaneously maximizing yield and reducing environmental impact. But to live up to these promises, phenotype data must be collected for individual plants on an extremely large scale [htpp]

. The current solution, which is to estimate the phenotype of an entire field from a manually measured sample, is prone to sampling bias and does not scale. Requiring human intervention to produce data slows the rate at which data can be collected and makes every data point costly in terms of both time and money. The remote sensing and agricultural robotics communities have been working to automate this process, but obtaining accurate measurements of individual plants in an unstructured environment is a multifaceted problem with many challenging intermediate steps. Currently proposed solutions utilize specialized sensors and computationally expensive algorithms that make these solutions unable to perform at the scale of entire fields. But instead of performing all operations on a large scale, it may be possible to collect data on a large scale and then split the data into small, yet semantically meaningful, units that can be processed individually. This would allow data to be captured at a large scale and then broken into individual plants to be processed by a computationally expensive phenotyping system, thus avoiding the need to feed large amounts of data into phenotyping systems while simultaneously simplifying the task they need to accomplish. It is therefore necessary to find a way of splitting a field of plants into individual plants in a computationally inexpensive way.

In this paper, we present and evaluate a number of algorithms developed to roughly segment individual plants from one another over an entire field. The algorithms are all defined as slight modifications of a commonly used clustering technique so that their computational complexity is low and well defined. The use of common data clustering techniques as an algorithmic basis constrains the new algorithms to be relatively data and species agnostic while maintaining a known behavior when scaling up to large input sizes. These algorithms allow data collection and phenotype measurement to happen at different scales so the computationally expensive phenotyping algorithms can operate on small portions of the point cloud containing individual plants in order to maintain a lower runtime.

Because the plant separation problem is just a small part of what is to be a long and complex system for phenotype measurement, the required inputs and outputs of each algorithm are kept as general as possible. Each algorithm takes in a point cloud of a group of plants and produces a partition of the input points into sets roughly corresponding

to individual plants. It is important to note that since this step is meant as a pre-clustering step to produce clusters that will be further processed, the partitioning of plants does not have to be precise and a rough approximation is adequate as long as the algorithm scales. The input point clouds are required to be rotated such that the gravity vector of the world frame is pointing in the

direction (up is direction), but the coordinate system is otherwise unconstrained. They do not require row, color, or radiometric information. Only the three spatial dimensions are necessary for each point.The algorithms defined in this work are more general than any other method of separating plants in point clouds of crop fields. Their independence from coordinate system, training data, row information, and radiometric or color information make them the most widely applicable algorithms available for the task of plant segmentation. They do not require a specific sensor like Lidar or an RGB camera and they do not require any training data like common machine learning approaches. One of them is even unit agnostic, avoiding the need to be scaled into known units as is commonly required for reconstructions created via structure-from-motion. The algorithms defined here are not the final solution to the problem they address, but they beat the current state-of-the-art by a large margin, are scalable enough to be used on large-scale data, general enough to be a starting point for most similar applications, and simple enough to be adapted to specific problems.

## Ii Related Work

Separating plants in a point cloud for use in phenotype measurement applications is a well explored problem. There are a few different formulations of the problem, some of which are comparable to the formulation used in this paper and some that are not, requiring specific sensor data. But most existing solutions are all either too complex to scale to point sets the size of an entire field or they require specific information only available with expensive, specialized equipment.

Specialized sensors like lidar have proven useful for this problem [lidar, lidar_FRCNN, lidar_VCNN, lidar_cluster], but lidar sensors remain prohibitively expensive when compared to more commonly available sensors like RGB cameras, which are commonly used for crop surveying. In [lidar], radiometric data from the lidar sensor is directly utilized, meaning the algorithm is tied to a single sensing modality. In contrast, [lidar_FRCNN] and [lidar_VCNN]

do not use radiometric information so they can presumably be implemented with different sensors, but they instead utilize deep learning methods that are unable to scale well. Of the works using lidar, the work of

[lidar_cluster] is the most general as they use lidar only to obtain the point cloud and then use a region growing clustering method that incorporates row and plant spacing parameters to separate plants. The resulting plant detection algorithm performs well on simulated data but the performance deteriorates on data from real crop fields, achieving an average detection rate of only 60%.While using lidar produces a more accurate and complete reconstruction of the environment, it has not been the only approach to crop reconstruction. Some groups have instead opted to use stereo vision or structure-from-motion as they do not require expensive sensors. But due to the increased noise and non-uniform sampling imposed by these reconstruction techniques, the approach to the problem is usually slightly different. The work of [voxel_lab] only operates on data collected in laboratory conditions, [manual] uses manually separated plant data, and [just_leaf] skips the plant level altogether and just segments leaves. In [jay2015field] they claim the ability to segment well separated plants form field data, but the method of separating plants from one another is not detailed enough to be reproducible. The workflow proposed in [rain] uses structure-from-motion and then a skeletonization procedure before finally clustering the point cloud skeleton in order to achieve sparse detection and sampled segmentations of plants in a large scale point cloud. This work achieves state-of-the-art results with scalable performance but does so on a skeletonized representation of the point cloud. The algorithm used is easily applicable to the original point cloud though, if the skeletonization is not run and the algorithm is performed on the entire point cloud rather than on a sampling, the complexity is still only which matches the complexity of all the algorithms proposed in this paper and achieves reasonable results. In fact, when extending the RAIN algorithm defined in [rain] to run without randomization, a similarity with the Quickshift algorithm [quickshift] was noticed and became the basis of this paper.

Quickshift [quickshift] is a density-based, mode finding, clustering algorithm. It is similar to Mean-Shift [meanshift] in that points are clustered with those that are drawn to the same mode, but it is not iterative and only evaluates the density function times. This makes Quickshift very efficient as its complexity is dominated by spatial neighborhood queries which can be performed quickly with the help of spatial data structures like a D-Tree. With this paper we sought to exploit the similarity between RAIN and Quickshift to define a more accurate and generalized plant clustering algorithm. In this way we switch from using algorithms made specifically for an application to using a slightly modified version of a more general and well vetted clustering algorithm.

## Iii Algorithms

Noticing how RAIN [rain] takes advantage of point cloud topology and approximates common density based clustering methods, we define a few similar agglomerative clustering algorithms using Quickshift [quickshift] as a template. The first of these is a non-randomized adaptation of the RAIN algorithm to be used for comparison while the others are slight modifications of Quickshift or Quickshift++ [quickshift++].

### Iii-a Non-random RAIN

The first algorithm we define is a non-randomized version of the RAIN algorithm from [rain]. The purpose being that RAIN is the current state-of-the-art in this domain but it only partitions a sampling of the points and therefore does not directly match the formulation used in this work. So with this algorithm we remove the random sampling part of the algorithm and run the procedure on all points in the point cloud. In this way it is directly comparable to the other algorithms in this work while maintaining its state-of-the-art performance. The algorithm takes as input a neighborhood distance and a point cloud . For each point , we assign to the cluster of the point in with the minimum coordinate where , the neighborhood of points within distance of the point . Assuming this neighborhood can be computed with a spatial data structure like a D-tree in approximately time, this means the entire algorithm operates in . This adaptation leaves the adapted algorithm with only one parameter, , which is a neighborhood distance threshold in 3D space.

### Iii-B Z-Quickshift

Since the RAIN algorithm clusters each point with its lowest ( dimension) neighbor within some neighborhood, we can instead define an algorithm to cluster each point with its nearest lower neighbor. This modification preserves the topology following analogy used in [rain] of a raindrop running down a plant while formulating the algorithm in a way that fits the mold of Quickshift [quickshift]. Similar to the non-randomized RAIN defined earlier, this algorithm maintains the single distance parameter governing neighborhood searches. This algorithm can be implemented as a simple modification the density function of Quickshift to . The modified density function is extremely simple to compute and does not require neighborhood information, but it does not change the computational complexity of Quickshift which is dominated by neighborhood finding. The modified Quickshift is dubbed Z-Quickshift in this work and also has a time complexity of .

### Iii-C Ground Density Quickshift

Instead of utilizing the vertical structure of the plant, it was noted that viewing the structure of the plant in the horizontal plane gives a very clear picture of the separation between plants and takes advantage of the vertical structure of the plant to cluster points that are close in the projected space. This structure can be seen in Fig. 2. This is conducive to clustering points that are close in the projected space but may not be close in 3D, such as points on opposite ends of the plant stem. To take advantage of the observable structure in the ground plane we can project all points into the ground plane where we can then perform standard Quickshift in 2D. For this algorithm we utilize the density function defined in [quickshift++] as it can be performed with the neighborhood queries which must be computed for Quickshift anyway, allowing a runtime of . The density function has a single integer parameter determining the kernel size. This gives GD Quickshift a total of two parameters, and .

### Iii-D Ground Density Quickshift++

Projecting the data into the ground plane before clustering easily separates vertical structures but will also place leaves in the same cluster if they overlap in the 2D space even if they are far apart in 3D. To combat this we could instead use the densities determined in 2D and cluster the points in 3D, but this creates many small clusters along the length of the stem rather than keeping them all in the same cluster. As a workaround, we can instead use Quickshift++ [quickshift++]

and cluster the densest regions of nearby points in the 2D density space as an initialization. We can then cluster the remaining points in 3D to ensure points are only in the same cluster if they are close in 3D or were in an extremely dense region in the 2D projection (probably a stem). Due to the initialization, there is no need for a distance parameter making this algorithm completely agnostic to scale. Instead, Quickshift++ uses a parameter

to identify the densest regions of points for the initialization. Along with the parameter used for the density function, GD Quickshift++ has a total of two parameters. The initialization of clusters runs in so the final algorithm is also .## Iv Data

To evaluate the algorithms from Section III, both synthetic data and real-world sensor data are used. The data and algorithms have been made publicly available^{1}^{1}1https://github.com/hennels/CropPreClustering to facilitate the replication of results and continued development by the community.

Synthetic data was created by taking the individual plant reconstructions provided in [corn50] and organizing them into a large scale field of plants. This allows us to have a known ground truth partitioning of the point cloud with which to quantitatively compare against. Two large scale point clouds are synthesized, one containing 100 plants and the other containing 400 plants. The ground truth partitions are encoded in the point clouds using color information.

A small dataset of reconstructions from production corn fields was also compiled. A UAV with an attached RGB camera flew at a low altitude over corn fields collecting video. Afterwards, the video was used to build a 3D reconstruction of the corn plants using Pix4D to perform structure-from-motion. The dense central region of each reconstruction was then extracted and transformed such that the plant stems were parallel with the -axis. The data was collected in three different fields over two different growing seasons with corn plants spanning stages from V3 to V6. Qualitative results from only one point cloud are shown in this paper but the reader is encouraged to test the algorithms on the other point clouds to see additional results.

## V Evaluation

Algorithm | |||
---|---|---|---|

Non-random RAIN | ✓ | ||

Z Quickshift | ✓ | ||

GD Quickshift | ✓ | ✓ | |

GD Quickshift++ | ✓ | ✓ |

In order to evaluate the algorithms proposed in this paper we use both quantitative and qualitative methods. For the synthetic data with ground truth partitions we are able to directly compare the results of the algorithms with the ground truth after finding a bipartite correspondence between predicted and ground truth clusters. To compare partitions the intersection over union (IoU) is used and a bipartite matching between predicted and ground truth clusters is computed such that the sum of IoUs is maximized. This makes sure each estimated cluster is only compared against a single ground truth cluster. For the real-world data where ground truth is not available, we show qualitative results for each algorithm using a variety of parameters. To evaluate, we run each algorithm on the same point cloud, with the same parameters, and view the differences. Since each algorithm requires different parameters (see Table I), we notice that many of the algorithms are most sensitive to the neighborhood distance threshold , so this is the parameter we vary. For , a few values were tried and one with reasonable performance on all point clouds () is shown. The GD Quickshift++ algorithm was found to give good results for any value of so a value of is used for all work shown in this paper. For the best performing algorithm we are able to compare manually countable metrics on real-world data such as the ground truth number of plants in the point cloud vs the total number of clusters and the number of clusters containing more than one plant because the errors are few enough to be manually counted.

Since these are pre-clustering algorithms, it is important to note that the necessary performance is dependant on what the subsequent processing algorithms can handle. Ideally, a plant pre-clustering algorithm should be able to separate plants and keep all plants whole in the process, but these goals are often in opposition to one another and it is difficult to accomplish both well. So depending on the subsequent processing algorithm, an over or under-fragmented reconstruction may be preferable and the parameters could be adjusted to produce such a partition.

### V-a Quantitative Evaluation

Algorithm | Clusters | mean IoU | median IoU |
---|---|---|---|

Ground Truth | 400 | - | - |

Non-random RAIN | 474 | 62.1% | 68.4% |

Z-Quickshift | 419 | 62.9% | 72.7% |

GD Quickshift | 397 | 81.0% | 81.7% |

GD Quickshift++ | 401 | 84.3% | 85.1% |

Algorithm | Clusters | mean IoU | median IoU |
---|---|---|---|

Ground Truth | 100 | - | - |

Non-random RAIN | 102 | 54.9% | 65.8% |

Z-Quickshift | 102 | 64.9% | 81.3% |

GD Quickshift | 116 | 81.9% | 82.7% |

GD Quickshift++ | 100 | 85.7% | 87.1% |

For quantitative evaluation on the synthetic data, each algorithm was run with a range of parameters. The results in Tables II and III show the best results for each algorithm that produced a number of clusters within 20% of the ground truth value on the two synthetic point clouds. Due to the bipartite matching between estimated and ground truth clusters, the IoU values can be significantly biased if the number of estimated clusters differs significantly from the number of ground truth clusters which is the reason for the 20% bound. In both tables II and III Ground Density Quickshift++ outperforms the other algorithms on all metrics and thus this is the algorithm we also evaluate quantitatively on real-world data. After applying Ground Density Quickshift++ to each of our real-world point clouds the number of plants and the number of clusters containing more than one plant were manually counted in each. The number of clusters with more than one plant is a metric measuring the number of cases that the algorithm was unable to separate plants. All of these occurrences were clusters with only two plants, most of which were double planting events as is shown in Fig. 5. The number of clusters found minus the total number of plants in the point cloud is a measure of how many extraneous clusters the algorithm produces. Ideally, the number of multi-plant clusters and the number of extra clusters should both be zero. While we considered any cluster not containing a plant of the cultivated crop extraneous, it was noted that many of these clusters were on weeds in the field, so they may be desirable for applications such as weed monitoring or herbicide prescription planning. The results are shown in Table IV. From these values we can see that the total number of clusters produced by Ground Density Quickshift++ is only slightly higher than the total number of plants and the number of multi-plant clusters is relatively small for all three point clouds.

Dataset | Total Plants | Total clusters | Multi-plant Clusters |
---|---|---|---|

St. Paul V3 | 136 | 144 | 4 |

St. Pual V6 | 103 | 107 | 6 |

Dji V4 | 90 | 92 | 3 |

### V-B Qualitative Evaluation

In Figures 3 and 4 qualitative results on the St. Paul V3 dataset are shown. From these images it is clear that non-randomized RAIN (left column Fig. 3) is unable to completely separate plants with any of the sampled parameters and often produces more than one cluster on each plant. The behavior changes quickly from over-fragmentation to under-fragmentation as the neighborhood distance threshold increases. Z Quickshift (center column Fig. 3) gives slightly cleaner segmentations but maintains the fragmentation sensitivity to the distance parameter. Ground Density Quickshift (right column Fig. 3) gives comparatively much cleaner segmentations and seems less sensitive to the neighborhood distance threshold as the over and under-fragmentation are well balanced for a majority of the input values. The Ground Density Quickshift++ result, which is shown in Fig. 4 as it does not utilize a neighborhood distance threshold, is the cleanest of any of the algorithms tested and is very insensitive to input parameters. Each plant in the Ground Density Quickshift++ result is part of just a single cluster while the ground points are placed in relatively few clusters. Performance of Ground Density Quickshift++ on a larger portion of the same dataset can be seen in Fig. 1.

## Vi Discussion

Although all the algorithms presented in this work have a similar structure, they have very different behaviors as is seen in Section V. The non-random RAIN algorithm does not group nearby points well because it’s goal groups points with their lowest neighbor within some distance. This does not ensure that points cluster with their immediate neighbors and often results in a periodic clustering pattern as shown on the left side of Fig. 6. Similarly, Z-Quickshift can also produce layered structures because it may not cluster points that are close together if there are different local minima at the bottom of the plant (Fig. 6 right). Both of these algorithms show the need for points to be clustered with neighbors as long as they are part of the same structure. This motivates the use of the density function used in the Ground Density algorithms. Both of the ground density function based approaches work much better in general and produce extremely clean clusters that mostly represent individual plants.

The main difference between Ground Density Quickshift and Ground Density Quickshift++ is the presence or absence of the neighborhood distance threshold. The neighborhood parameter governs how close two points need to be to be clustered together. This means it will have to be less than the distance between the nearest two plants if they are to be separated, but this is difficult in some situations like the double planting events shown in Fig. 5. The absence of the threshold in Ground Density Quickshift++ lets the threshold differ across the point cloud. Plants are not evenly distributed in a field, so typically the threshold would need to be lower in the row of plants and larger in the gap between rows to minimize the number of extraneous clusters. Requiring a constant threshold ensures that either plants will not be separated or there will be a large number of extraneous clusters. One of these alternatives may be permissible because they are able to be handled by subsequent algorithms but this deficiency simply adds algorithmic complexity later in the processing pipeline. Ground Density Quickshift++ does not require a constant threshold so it is better able to adapt to these different plant densities. This explains its good performance on the areas of different density seen in Figs. 4 and 1 where it has many clusters close together in the row of plants but very few in the gap between plant rows.

Without the explicit neighborhood distance threshold, Ground Density Quickshift++ is independent of the scale of the point cloud meaning a point cloud produced via structure-from-motion does not need to be scaled to world units before processing. The other three algorithms all require the neighborhood distance threshold and thus the choice of is dependant on the units of the point cloud. So in order to use consistent parameters, point clouds would need to be scaled to world units. Even though Ground Density Quickshift++ is independent of scale, it is dependent on the sampling density of the point cloud due to its density function. If scale or point cloud density is easier to control, this difference may inform which algorithm may be of use in a given application.

## Vii Conclusion

In this paper we proposed and evaluated a range of scalable clustering algorithms designed to give good results, even at the scale of entire fields. The best of these, Ground Density Quickshift and Ground Density Quickshift++, both perform better than the current state-of-the-art on scalable individual plant segmentation. While the performance differs slightly, all algorithms in this work have a time complexity of approximately . We also provided a synthetic dataset with ground truth segmentations and a real-world dataset on which to test the proposed algorithms and to foster further development in this domain.

This work enables the development of a high throughput plant phenotyping system by providing algorithms to act as a first step to break up a reconstruction into manageable parts representing individual plants. More specialized algorithms will need to be developed to operate on the resulting clusters, but many machine learning techniques have already shown promising results in such niche contexts. The main contribution of this work is not only that the proposed algorithms beat the current state-of-the-art method with the same time complexity, but also the generality and thus applicability of the proposed algorithms. As long as plants have a single vertical structure, one of the Ground Density algorithms should be able to find and segment them. Although the algorithms were not tested on plant structures besides corn, the weeds in the corn fields that were segmented can be taken as evidence that these algorithms may also extend to other plant species as well.

## Viii Acknowledgements

A special thanks goes to Dimitris Zermas and Sentera, Inc. for processing imagery. USDA/NIFA has supported this work through the grant 2020-67021-30755. This work was partially supported by the National Science Foundation through grants #CNS-1439728, #CNS-1544887, and #CNS-1939033.

Comments

There are no comments yet.