1 Introduction
The structures of different plant species pose a range of challenges for 3D scanning and reconstruction. Various solutions to the issue of digitally imaging plants have been reported. For plants with larger leaves and simple structures (such as maize, sorghum, cereal seedlings) it is possible to capture a small number of digital images from various viewing angles (typically 3), analyse these in 2D, then develop a relationship between these 2D poses and the (destructively measured) leaf area and biomass of the species [1]. However, commercial systems using this approach are relatively expensive, with closed and proprietary analysis software; these have generally been deployed in large phenomics centres with conveyor-based systems [2]. The 2D approach has difficulty in resolving concavities, leaf overlap and other occlusions; many of the powerful image analysis tools which can be applied to 3D meshes (e.g., volumetric-based shape recognition and organ tracking) are more difficult in 2D [3].
Laser scanning, e.g., light detection and ranging (LIDAR), has been applied to plant digitisation, but reconstructing a mesh from a point cloud sufficiently dense to capture thin, narrow leaves is computationally intensive. While this approach has been applied successfully to forestry [4, 5] and to statistical analysis of canopies, it is not well suited to extracting single-plant attributes [4, 6]. Full-waveform LIDAR is extremely expensive; simpler machine vision LIDAR systems of sufficient resolution can cost tens of thousands of dollars. Structured light approaches using affordable sensors, such as the Kinect gaming sensor or camera and video projector setups, do not offer the resolution or spatial repeatability to cope with complex plant structures [7, 8].
Recently, approaches using multiple images from a larger number of viewing angles have yielded promising results [3, 9, 10, 11, 12, 13], either by using a silhouette-based 3D mesh reconstruction method or patch-based stereo reconstruction transforming the images to point clouds. Silhouette-based reconstruction is prone to errors in camera calibration and silhouette extraction. Image acquisition for 3D reconstruction remains largely manual [14], limiting the speed and accuracy of 3D reconstruction.
Existing background removal techniques for silhouette extraction [15, 16] are not reliable for complex plant structures. For current patch-based methods, reconstruction quality is usually poor due to both weak textures (patterns on leaves) and thin structures. In both cases, high-accuracy 3D reconstruction requires a very rigid imaging station, and the engineering required for sensor integration is costly. Recent work has included semi-automated approaches to modelling the plants [17, 18], increasing completeness and visual quality at the expense of throughput.
To address the shortcomings of existing scanning systems, we describe “PlantScan Lite”, an affordable automated 3D imaging platform to accurately digitise complex plant structures. This can be readily built for less than AU$8000 of components (including DSLR cameras). The system has a number of novel elements including the use of high-resolution digital single-lens reflex (DSLR) cameras synchronised with a turntable; a high-accuracy, easy-to-set-up camera calibration; and an accurate background removal method for 3D reconstruction.
2 Methods
PlantScan Lite uses a four-step process of 3D scanning and reconstruction (Fig. 1):

Image acquisition. Multiple-view images of a calibration target, a plant of interest and the background are captured using a turntable and tilting cameras.

Camera calibration estimates camera parameters for all captured images.

Image processing corrects for optical distortion and extracts plant silhouettes (background removal).

3D reconstruction. This paper focuses on the visual hull reconstruction method which uses silhouette images and corresponding camera parameters and poses to create a 3D mesh model. The model is then processed to extract plant geometry information.
2.1 Image acquisition system
PlantScan Lite’s image acquisition system consists of (Fig. 2):

Two Canon DSLR cameras (1, 2) at an angle of approximately 40 degrees. The cameras are powered by AC adapter kits and connect to a computer through USB cables for tethered capture.

Two aluminum frames, one tilted (3) and one fixed to the ground (4). The tilted frame mounts the cameras and moves them up and down. The frames join together with two hinges, where the axis of rotation crosses that of the turntable near the middle of a plant to be scanned. Two guiding bars (5) are attached to the lower frame (4) to keep the tilted frame moving on a vertical plane.

Two Phidget bipolar stepper motors (6, 7); one drives the turntable, the other moves the upper frame and the two cameras via a threaded rod (8). Both stepper motors have a gearbox to increase the rotation resolution (0.018 degree/step for the turntable) and torque (18 kg·cm for the threaded rod). The turntable controller is synchronised with the cameras via software based on the Phidget Python library [19] and Piggyphoto [20].
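As a rough sketch of the synchronisation logic, the geared resolution above fixes the mapping from pan increments to motor steps. The helper below is illustrative only (the real controller also triggers the tethered cameras at each stop via the Phidget and Piggyphoto libraries):

```python
# Illustrative helper: convert a capture schedule of pan angles into
# turntable motor steps, using the geared resolution of 0.018 degree/step.

STEP_DEG = 0.018  # turntable resolution after the gearbox, degrees per step

def capture_schedule(num_views, sweep_deg=360.0):
    """Return (angle, steps_from_previous_stop) pairs for one sweep."""
    schedule = []
    prev_steps = 0
    for i in range(num_views):
        angle = i * sweep_deg / num_views
        total_steps = round(angle / STEP_DEG)
        schedule.append((angle, total_steps - prev_steps))
        prev_steps = total_steps
    return schedule

# 36 stops, 10 degrees apart; with two cameras this yields 72 images per sweep.
sched = capture_schedule(36)
print(sched[1])  # second stop: 10 degrees, about 10 / 0.018 = 556 steps
```

At each scheduled stop the controller would step the motor by the given count, wait for the plant to settle, then fire both cameras.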
2.2 Camera calibration for turntable
Fig. 3 shows a schematic description of the system with a calibration target and a plant (notation follows [21]). The chessboard target has a local coordinate system {T} with a pose relative to the world coordinate system {W}, which is fixed to the turntable. A camera at tilt position j has a local coordinate system {K_j} and a pose relative to the world coordinates. Standard mono camera calibration only provides the camera pose relative to the target at angles where the chessboard pattern is visible to the camera. The aim of camera calibration for the turntable is to find intrinsic parameters and poses for all cameras at all positions. Three constraints are considered: a) the turntable rotation angle is known; b) images captured by the same camera have the same intrinsic parameters; and c) different extrinsic parameters are assigned to different tilt positions, as if they belonged to different cameras forming a vertical arc. The calibration process consists of (1) mono and stereo camera calibration, (2) plane and circle fitting, (3) camera-to-world pose derivation, and (4) optimisation of all camera parameters. These steps are detailed below.
2.2.1 Mono and stereo camera calibration
The OpenCV library provides mono and stereo camera calibration routines [22, 23]. The calibration procedure starts by taking images of a chessboard target at different angles and distances. Corners of the squares on the chessboard target are detected. To make this detection efficient, images of the chessboard calibration target are repeatedly scaled down by 50% in a pyramid fashion, to approximately 1K × 1K resolution at the top level. The corner positions are detected at the top level (lowest resolution) using OpenCV’s function findChessboardCorners, then tracked with subpixel accuracy using the function cornerSubPix on the higher-resolution images at lower levels of the pyramid.
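The coarse-to-fine search can be sketched as follows. This is a hypothetical helper showing only the pyramid bookkeeping; the actual detection calls OpenCV's findChessboardCorners at the top level and cornerSubPix at each finer level:

```python
# Illustrative pyramid bookkeeping for coarse-to-fine chessboard detection:
# halve the image until roughly 1K x 1K, detect corners at the coarsest
# level, then map the detections back towards full resolution.

def pyramid_levels(width, height, top_max=1024):
    """Number of 50% downscalings so that max(width, height) <= top_max."""
    levels = 0
    while max(width, height) > top_max:
        width, height = width // 2, height // 2
        levels += 1
    return levels

def to_full_res(corner_xy, levels):
    """Map a corner found at the coarsest level back to full resolution."""
    scale = 2 ** levels
    return (corner_xy[0] * scale, corner_xy[1] * scale)

levels = pyramid_levels(5760, 3840)        # e.g. a 22 MP DSLR frame
print(levels)                              # 3: 5760 -> 2880 -> 1440 -> 720
print(to_full_res((100.0, 50.0), levels))  # coarse corner scaled up by 2**3
```

In the real pipeline the scaled-back coordinates seed a subpixel refinement (cornerSubPix) at each successively finer level rather than jumping straight to full resolution.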
Fig. 4 describes transformations between different coordinate systems. Thin arrows represent coordinate vectors. A pair of orthogonal arrows denotes a coordinate system. Thick arrows show the relative pose between two coordinate systems. The coordinates (in mm) of a corner p with respect to the chessboard target coordinate system are represented as x_T. The position of p relative to a camera coordinate system is x_K. The relationship between x_T and x_K is expressed as:

x_K = T_{KT} ∘ x_T (1)

where T_{KT} = [R_{KT} | t_{KT}], with R_{KT} a rotation matrix and t_{KT} a translation vector. This rotation-translation matrix represents the extrinsic parameters of camera K relative to target T. The rotation matrix can be represented as an angle-axis rotation vector, so the extrinsic parameters have only 6 independent components. The “∘” operator is a matrix-to-vector multiplication (applied in homogeneous coordinates).
Point p forms an image on the camera sensor at pixel coordinates (u, v). An extended pinhole camera model with radial optical distortion is used to describe the relationship between x_K = (x, y, z) and (u, v):

x' = x / z,  y' = y / z (2)

x'' = x'(1 + k_1 r² + k_2 r⁴),  y'' = y'(1 + k_1 r² + k_2 r⁴),  r² = x'² + y'² (3)

u = f_x x'' + c_x,  v = f_y y'' + c_y (4)

where (f_x, f_y) is the focal length, (c_x, c_y) the optical centre on the image, and (k_1, k_2) the radial distortion coefficients. The vector κ = (f_x, f_y, c_x, c_y, k_1, k_2) represents the intrinsic camera parameters.
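Assuming the standard OpenCV-style distortion model above, the projection of equations (2)-(4) can be sketched as:

```python
import numpy as np

# Sketch of the extended pinhole projection: a 3D point in camera
# coordinates is normalised, radially distorted, then mapped to pixels.
# kappa packs the intrinsics (fx, fy, cx, cy, k1, k2).

def project(x_cam, kappa):
    fx, fy, cx, cy, k1, k2 = kappa
    x, y, z = x_cam
    xp, yp = x / z, y / z                 # eq. (2): normalised coordinates
    r2 = xp * xp + yp * yp
    d = 1.0 + k1 * r2 + k2 * r2 * r2      # eq. (3): radial distortion factor
    u = fx * (xp * d) + cx                # eq. (4): pixel coordinates
    v = fy * (yp * d) + cy
    return np.array([u, v])

kappa = (3000.0, 3000.0, 1500.0, 1000.0, -0.05, 0.01)
# A point on the optical axis lands exactly at the optical centre (cx, cy):
print(project((0.0, 0.0, 1.0), kappa))
```

The illustrative intrinsic values here are placeholders, not calibrated values from the paper.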
To calibrate the camera, multiple images of the same target are captured at different pan angles θ_i and tilt angles φ_j, where i and j index the pan and tilt positions respectively. Mono camera calibration, using OpenCV’s calibrateCamera (based on [24]), takes the lists of detected image corners and their corresponding target coordinates x_T, and computes the intrinsic parameters κ for the camera and the extrinsic parameters T_{KT_i} for each of the images.
Fig. 4B shows the multiple-camera setup with an additional camera K' at a different tilt position. The same point p is seen by camera K' at coordinates x_{K'}. The transformation between the two camera coordinate systems gives:

x_{K'} = T_{K'K} ∘ x_K (5)

The transformation between cameras K and K' is equal to the stereo transformation T_{K'K} (using OpenCV’s stereoCalibrate). If K and K' are the same camera at different tilt angles φ_j and φ_{j+1}, the transformation becomes T_{K_{j+1}K_j}. The “·” operator denotes matrix-to-matrix multiplication. The transformation can be applied repeatedly between successive camera pairs at different tilt positions, e.g. T_{K_{j+2}K_j} = T_{K_{j+2}K_{j+1}} · T_{K_{j+1}K_j}. Given the stereo transformations between successive cameras and the pose of the first camera, the poses of the other cameras are also obtained.
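Under the assumption that poses and stereo transformations are stored as 4×4 homogeneous matrices, chaining successive tilt positions might look like the following sketch (function names are illustrative):

```python
import numpy as np

# Chaining stereo transformations between successive tilt positions,
# assuming each pose/transform is a 4x4 homogeneous matrix [[R, t], [0, 1]].

def invert(T):
    """Closed-form inverse of a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def chain_poses(T_first, stereo_transforms):
    """Given the pose of the first camera and the stereo transforms
    T_{K(j+1)K(j)} between successive pairs, return the poses of all
    cameras along the tilt arc."""
    poses = [T_first]
    for T_step in stereo_transforms:
        # next pose = previous pose composed with the inverse stereo step
        poses.append(poses[-1] @ invert(T_step))
    return poses

# Toy example: each tilt step shifts the camera 0.5 units along z.
T_step = np.eye(4)
T_step[2, 3] = 0.5
poses = chain_poses(np.eye(4), [T_step, T_step])
print(poses[2][:3, 3])  # translation accumulates to (0, 0, -1)
```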
2.2.2 Estimation of axis and centre of rotation
The location of the world coordinate system {W}, fixed to the rotation axis, is found in two steps (Fig. 5): (1) compute the normal vector n of the plane containing the rotation orbit of the target’s corners, and (2) fit a circle to find the centre of rotation o. The normal vector and centre of rotation define the world coordinates.
First, the rotation axis is estimated from the rotational motion of the chessboard target. From the extrinsic parameters T_{KT_i}, the same chessboard corner is seen as moving on a circular orbit, as shown in Fig. 5A. Note that the chessboard pattern and its positions can only be detected in 1/4 to 1/3 of the target images around the circular orbit. The equation of the orbit plane is expressed as:

n · (x − c) = 0 (6)

where c is the centroid (or the mean) of all positions x on the same orbit, and n is the normal vector of the plane.
Equation (6) can be rewritten in matrix form A n = 0, where A is a matrix whose rows are the chessboard positions relative to the centroid of all the positions. Vector n is the singular vector corresponding to the smallest singular value obtained from a Singular Value Decomposition of matrix A.

Second, to find the centre of rotation of the calibration target, a different coordinate system {W'} is used (Fig. 5B). {W'} is in fact equivalent to the camera coordinate system with a rotation transformation applied such that the y-axis is parallel to n. In {W'}, a 2D circle can be fitted onto the target point orbit and the centre of rotation can be obtained. The relationship between x_{W'} and x_K is:

x_{W'} = R_{W'K} ∘ x_K (7)

The transformation R_{W'K} is a rotation matrix whose angle-axis rotation vector (θ, u) can be obtained as:
θ = arccos(n · ŷ) (8)

u = (n × ŷ) / ‖n × ŷ‖ (9)

where θ is the rotation angle and u is the unit vector around which the rotation is applied to turn n to the y-axis ŷ of {W'}.
R_{W'K} is computed from (θ, u) by Rodrigues’ formula:

R_{W'K} = I + sin θ [u]× + (1 − cos θ)[u]×² (10)

where [u]× denotes the skew-symmetric cross-product matrix of u.
After applying the rotation transformation to the target positions, a circle can be fitted to the (x, z) coordinates by a linear least-squares algorithm [25]. This fitting gives the centre of the orbit o = (o_x, o_y, o_z), with o_y being the averaged y-component of the target point positions in the {W'} coordinate system.
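The plane fit, alignment rotation and circle fit of this subsection can be sketched with NumPy as follows (a toy reconstruction of equations (6)-(10) plus the linear least-squares circle fit; function names are illustrative):

```python
import numpy as np

# Toy sketch of Section 2.2.2: fit the orbit-plane normal by SVD, rotate so
# the normal aligns with the y-axis (Rodrigues), then fit a 2D circle to the
# (x, z) coordinates by linear least squares.

def skew(u):
    return np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])

def align_normal_to_y(points):
    c = points.mean(axis=0)
    # plane normal = singular vector for the smallest singular value
    n = np.linalg.svd(points - c)[2][-1]
    y = np.array([0.0, 1.0, 0.0])
    if n @ y < 0:
        n = -n                              # pick the normal pointing "up"
    axis = np.cross(n, y)
    s = np.linalg.norm(axis)
    if s < 1e-12:
        return points                       # already aligned with y
    theta = np.arccos(np.clip(n @ y, -1.0, 1.0))   # eq. (8)
    axis = axis / s                                  # eq. (9)
    K = skew(axis)
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K  # eq. (10)
    return points @ R.T

def fit_circle_xz(points):
    """Linear least-squares circle fit [25] in the x-z plane."""
    x, z = points[:, 0], points[:, 2]
    A = np.column_stack([2 * x, 2 * z, np.ones(len(x))])
    b = x * x + z * z
    ox, oz, _ = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([ox, points[:, 1].mean(), oz])   # centre o = (ox, oy, oz)

# Synthetic orbit: radius 100 around centre (5, 2, -3) in a level plane.
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = np.column_stack([5 + 100 * np.cos(t),
                       np.full_like(t, 2.0),
                       -3 + 100 * np.sin(t)])
print(fit_circle_xz(align_normal_to_y(pts)))  # recovers the centre (5, 2, -3)
```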
Now the world coordinate system {W} is set at the centre of rotation o, with its axes parallel to those of {W'}; the transformation from {W'} to {W} is:

x_W = T_{WW'} ∘ x_{W'} (11)

where T_{WW'} = [I | −o]. As a result, the pose of camera K relative to {W} can be expressed as:

T_{WK} = T_{WW'} · T_{W'K} (12)

 = [R_{W'K} | −o] (13)

where T_{W'K} = [R_{W'K} | 0] from equation (7).
2.2.3 Estimation of camera poses relative to the world coordinate system fixed to the turntable axis
Since the pose T_{WK} of the camera at tilt position j is obtained, the pose of any additional camera K' relative to the world coordinate system can be obtained from a given stereo transformation T_{K'K}, as shown in Fig. 6A:

T_{WK'} = T_{WK} · T_{K'K}⁻¹ (14)
Similarly, the pose of the target relative to the world coordinate system is expressed as:

T_{WT} = T_{WK} · T_{KT} (15)
We are interested in the reverse direction, obtaining target point positions in the world coordinate system:

x_W = T_{WT} ∘ x_T (16)

 = T_{WK} · T_{KT_i} ∘ x_T (17)

 = T_{WW'} · T_{W'K} · T_{KT_i} ∘ x_T (18)
Since there are multiple estimates of T_{WT_i} for different rotation angles θ_i, the pose T_{WT_0} for zero rotation angle is obtained by applying the inverse of the rotation to the corresponding pose:

T_{WT_0} = R_y(−θ_i) · T_{WT_i} (19)

R_y(θ) = rot(θ ŷ) (20)

where rot(θ ŷ) is the rotation matrix obtained from the angle-axis rotation vector θ ŷ (equation (10)), and ŷ is the y-axis of {W}.
Since the world coordinate system {W} and the target are fixed, the camera coordinate system needs to move in a circle around the y-axis of {W} to represent the correct relative motion seen by the camera, as shown in Fig. 6B. The pose of camera K for rotation angle θ_i is:

T_{WK_i} = R_y(−θ_i) · T_{WK} (21)
The pose of camera K' for rotation angle θ_i is:

T_{WK'_i} = R_y(−θ_i) · T_{WK'} (22)

 = R_y(−θ_i) · T_{WK} · T_{K'K}⁻¹ (23)
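The per-angle pose generation can be sketched as follows, assuming 4×4 homogeneous matrices (rot_y and turntable_poses are hypothetical names):

```python
import numpy as np

# Generating per-angle camera poses: since the target is fixed to the world
# frame, a turntable rotation by theta is modelled as the camera orbiting by
# -theta around the world y-axis. T_WK is the camera pose at zero rotation.

def rot_y(theta):
    """4x4 homogeneous rotation about the world y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return T

def turntable_poses(T_WK, angles_deg):
    return [rot_y(np.radians(-a)) @ T_WK for a in angles_deg]

# Toy camera 1 unit from the axis along +z at zero rotation:
T_WK = np.eye(4)
T_WK[2, 3] = 1.0
poses = turntable_poses(T_WK, [0, 90, 180, 270])
print(poses[2][:3, 3])  # at 180 degrees the camera sits at (0, 0, -1)
```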
2.2.4 Optimisation to refine camera parameters
A nonlinear least-squares optimisation is applied to refine the estimates of the camera intrinsic parameters, the angle-axis rotation and translation vectors of the camera poses at the different tilt angles, and the target pose. The optimisation minimises the pixel distance between the target corners projected into the cameras and the corners detected on the actual images.
The estimated position of a chessboard corner relative to camera K at rotation angle θ_i is:

x̂_K = T_{WK_i}⁻¹ · T_{WT_0} ∘ x_T (24)

A pinhole camera projection with radial distortion is applied with equations (2), (3) and (4) to obtain the corresponding image point. This is applied to the other tilt positions and the second camera. The squared distance between the image projections of the estimated corners and their detected image positions is minimised.
2.3 Background subtraction
Plants with large leaves can cast strong shadows, so a simple image threshold will not completely remove the background. We found that the shadow removal algorithm based on a static background proposed in [15] performs background removal for thin leaves more accurately and with less computation than other techniques [26, 27, 28]. Here, we extend the technique of [15] to the LAB color space [29], further improving background removal accuracy.
Suppose L, A and B are the luminance and two color channels of the current image, and L_b, A_b and B_b the corresponding channels of the background image. Three error functions applied at each pixel position are defined as follows:

E_L = |L − L_b| (25)

E_C = |A − A_b| + |B − B_b| (26)

E_T = |∇L − ∇L_b| (27)
These error functions E_L, E_C and E_T represent the differences in luminance, color and texture respectively. An overall score is computed to determine whether a pixel is foreground or background:

E = α E_L + β E_C + γ E_T (28)

where the values of α, β and γ are found empirically.
A threshold τ is applied to E to separate background and foreground; for our images, suitable values of τ, α, β and γ were found empirically. Unlike [15], our proposed technique allows for segmentation of dark objects (such as the plant pots), controlled via one of the coefficients. Fig. 7 shows an example of background removal using our proposed algorithm.
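A minimal sketch of the scoring follows, with simple absolute differences standing in for the three error terms (the exact functions follow [15]; the coefficients and threshold below are illustrative defaults, not the tuned values):

```python
import numpy as np

# Illustrative LAB background score: absolute luminance/colour differences
# plus a crude gradient term as the texture difference, combined linearly
# and thresholded. All coefficient values here are placeholders.

def foreground_mask(L, A, B, Lb, Ab, Bb,
                    alpha=1.0, beta=1.0, gamma=1.0, tau=20.0):
    e_lum = np.abs(L - Lb)                     # luminance difference
    e_col = np.abs(A - Ab) + np.abs(B - Bb)    # colour difference
    gL, gLb = np.gradient(L), np.gradient(Lb)  # texture via image gradients
    e_tex = np.abs(gL[0] - gLb[0]) + np.abs(gL[1] - gLb[1])
    score = alpha * e_lum + beta * e_col + gamma * e_tex
    return score > tau

# Synthetic test: a bright patch on an otherwise unchanged background.
L = np.zeros((8, 8)); L[3:6, 3:6] = 50.0
Z = np.zeros((8, 8))
mask = foreground_mask(L, Z, Z, Z, Z, Z)
print(mask[4, 4], mask[0, 0])  # patch centre is foreground, corner is not
```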
2.4 3D reconstruction
2.4.1 Bounding box estimation
The bounding box of the subject is obtained in two steps:

An initial 3D bounding box is estimated based on silhouettes from the most horizontal camera view. These silhouette images are overlapped into a single image and a 2D bounding box is computed (Fig. 8). The Y axis and the origin of the world coordinate system are projected onto this overlapped image. The crossing points of the projected axis with the 2D bounding box are mapped back to the turntable axis in 3D space to obtain the vertical extent of the box. The back-projection of the rectangle width to the world origin gives a single magnitude for the horizontal extents.

A refined bounding box is calculated from a 3D reconstruction at a low voxel resolution using the initial bounding box; this takes only a few seconds to compute. In particular, we found that no thin parts of the plant are missing when reconstructed at low resolution. As a result, the refined bounding box tightly contains the 3D space of the plant.
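The refinement step amounts to taking the extent of the occupied low-resolution voxels. A minimal sketch, assuming a boolean occupancy grid and a known discretisation of the initial box (names are illustrative):

```python
import numpy as np

# Tight bounding box from a low-resolution carve: the box is the extent of
# the occupied voxels mapped back to world units. `occupied` is a boolean
# voxel grid; `bbox_min` and `voxel_size` describe its discretisation.

def refine_bbox(occupied, bbox_min, voxel_size):
    idx = np.argwhere(occupied)
    lo = bbox_min + idx.min(axis=0) * voxel_size
    hi = bbox_min + (idx.max(axis=0) + 1) * voxel_size
    return lo, hi

grid = np.zeros((16, 16, 16), dtype=bool)
grid[2:5, 4:12, 6:8] = True               # a small occupied block
lo, hi = refine_bbox(grid, np.array([-80.0, -80.0, -80.0]), 10.0)
print(lo, hi)  # the tight box around the occupied voxels, in world units
```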
2.4.2 Volume reconstruction
In this work, the 3D plant was reconstructed using a visual-hull volume carving approach. This method recovers the sharp and thin structures common to plants (although a major drawback is that it cannot correctly recover concave surfaces, making reconstructions of curved surfaces such as leaves thicker than they should be). There may be plant movement induced by air circulation or mechanical vibration; this can be accounted for by some tuning during reconstruction. The reconstruction method consists of 3 steps:

A 3D volume equal to the bounding box is generated and split into voxels. Each voxel is projected into the silhouettes and its 2D signed distance to the nearest boundary of each silhouette is calculated; a negative distance means the voxel projects outside that silhouette. To accommodate some uncertainty in the silhouettes and plant movement, a voxel is only removed if it is outside more than a fixed number of silhouettes (3 is chosen in this paper). The voxels remaining at the end of this process form a 3D hull model of the plant. An octree structure is used for voxel removal from the lowest resolution to the highest [30], giving a 3× speed-up as compared to processing all voxels of a full-resolution 3D volume.

Removal of the pot and pot carrier. Since we are only interested in the plant, the pot and pot carrier need to be removed to simplify mesh analysis as well as to reduce the mesh size. One method is to subtract the voxels inside a bounding tapered cylinder around the pot and pot carrier. Fig. 9 shows a snapshot of the mesh of one of the plastic plants before and after pot subtraction. This needs to be done before the 3D meshing step to produce a clean, watertight mesh.

3D meshing by marching cubes from the remaining voxels. Each grid point is checked against its 8 surrounding voxels. The occupancy pattern of the 8 surrounding voxels is matched against a 256-element lookup table to determine whether the grid point is on or close to the surface, so that a polygon can be created from this grid point and nearby grid points.
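The carving step above can be sketched as follows. For clarity this toy version uses binary inside/outside tests instead of signed distances, and no octree; `projections` supplies one world-to-pixel function per silhouette (all names are illustrative):

```python
import numpy as np

# Minimal visual-hull carving sketch: each voxel centre is projected into
# every silhouette and removed when it falls outside more than `tol` of them.

def carve(voxel_centres, silhouettes, projections, tol=0):
    keep = np.ones(len(voxel_centres), dtype=bool)
    for i, x in enumerate(voxel_centres):
        outside = 0
        for sil, proj in zip(silhouettes, projections):
            u, v = proj(x)
            in_bounds = 0 <= v < sil.shape[0] and 0 <= u < sil.shape[1]
            if not in_bounds or not sil[v, u]:
                outside += 1
        keep[i] = outside <= tol
    return keep

# Toy scene: a 2x2 square silhouette seen from two orthogonal views.
sil = np.zeros((4, 4), dtype=bool)
sil[1:3, 1:3] = True
centres = np.array([[x, y, z]
                    for x in range(4) for y in range(4) for z in range(4)])
projections = [lambda p: (int(p[0]), int(p[1])),   # front view: u=x, v=y
               lambda p: (int(p[2]), int(p[1]))]   # side view:  u=z, v=y
keep = carve(centres, [sil, sil], projections, tol=0)
print(int(keep.sum()))  # only voxels with x, y, z all in {1, 2} survive
```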
2.5 Mesh segmentation and geometry
Mesh segmentation algorithms assign a unique label to all vertices of the 3D mesh that belong to the same region. This paper uses a simplified version of the “hybrid” segmentation pipeline previously presented in [3], based primarily on a constrained region-growing algorithm. In short, the curvature and normals of the 3D mesh were precomputed. A user-defined curvature threshold was provided to find large “flat” regions (e.g., broad leaves) for use as seed regions. A curvature-constrained region growing was then performed from each seed region. The geometry (area, width, length, and circumference) of each segmented leaf was then extracted using the approach outlined in [3]. The result for the large plastic plant is shown in Fig. 10.
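The seed-and-grow idea can be sketched on a toy adjacency graph (thresholds and names are illustrative; the full pipeline of [3] operates on mesh curvature and normals):

```python
from collections import deque

# Toy curvature-constrained region growing: vertices whose curvature is below
# a seed threshold start regions, which grow across edges while curvature
# stays below a larger growth threshold. `adjacency` maps vertex -> neighbours.

def grow_regions(curvature, adjacency, seed_thresh=0.1, grow_thresh=0.3):
    labels = {v: None for v in curvature}
    next_label = 0
    for seed in sorted(curvature, key=curvature.get):  # flattest first
        if labels[seed] is not None or curvature[seed] > seed_thresh:
            continue
        queue = deque([seed])
        labels[seed] = next_label
        while queue:
            v = queue.popleft()
            for n in adjacency[v]:
                if labels[n] is None and curvature[n] <= grow_thresh:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1
    return labels

# A 5-vertex chain with a high-curvature crease in the middle splits in two:
curv = {0: 0.05, 1: 0.2, 2: 0.5, 3: 0.2, 4: 0.05}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(grow_regions(curv, adj))  # vertex 2 stays unlabelled; two regions form
```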
3 Results
Fig. 11 shows reconstructions of different plants with different complexity and leaf shapes. 3D meshes of plants with thin and narrow leaves are reconstructed with excellent geometric agreement, although there is a minor discrepancy at the tips of the leaves. For visual comparison, the pots are included in this figure; however, they are removed (as in Fig. 9) before geometry measurement.
The same plastic plants are also reconstructed twice with different numbers of input images to see how this affects the reconstruction quality. Without tilting the cameras, it took 3 minutes for the two cameras to capture a total of 72 images, as compared to 30 minutes to capture 360 images with the two cameras moved through 5 tilt positions (the movement taking half of the total scanning time). A visual comparison (not shown in the paper) reveals no obvious differences between the two meshes of the same plant.
The leaves of the large plastic plant (bottom of Fig. 9) were dissected and scanned to measure their length, width, perimeter and area. The 12 leaves, grouped into 3 sizes, are shown in Fig. 12 and Tab. I. Since the leaves mostly curve along their length, the length measurement is likely to be affected; the width of the leaves was therefore chosen for validating the reconstruction accuracy, as it is less affected by the curving.
A quantitative comparison between the two cases is shown in Fig. 13A and B. The ground truth obtained from the 2D scans of the dissected leaves (Fig. 12) is graphed against the measurements obtained from the 3D meshes of the plant. To fit into the plots, the perimeter values are halved and the area values are square-rooted. It can be seen that the measurements agree well with the ground truth. The average relative error of the measurements is 4.0% for 72 images and 3.3% for 360 images.
Leaf | Length (mm) | Width (mm) | Perim. (mm) | Area (mm²)
B    | 152.6       | 74.5       | 350.89      | 8535.6
C    | 186.9       | 96.6       | 436.16      | 13303.4
D    | 221.0       | 115.6      | 519.1       | 18868.0
4 Conclusion and discussion
We have presented a complete system for automatic high-resolution 3D plant phenotyping. Several technical solutions in camera calibration, image processing and 3D reconstruction have been proposed to achieve high-accuracy 3D mesh models. Notably, we proposed a camera calibration procedure that uses a standard chessboard calibration target that is easy to make and use in a production environment. We also proposed an extension of foreground segmentation to the LAB color space for improved segmentation accuracy for plants with thin leaves, as commonly found in major crop plants.
The system captures high-quality images with accurate camera poses for image-based 3D reconstruction algorithms. The quantitative measurements using the 3D visual hull algorithm provide an estimate of the accuracy of the whole system. We showed that useful metrics such as leaf width, length and area can be obtained with high accuracy from the 3D mesh models. Fast scanning takes only 3 minutes (72 images) per plant and still produces reasonable measurements (4% error). More images per plant (360) are required for better accuracy (3.3% error), especially for complex plant structures, but take 5 to 10 times longer to scan.
Future work includes calibration using both pan and tilt axes, so that the camera pose can be obtained for an arbitrary pair of pan-tilt rotation angles. This would enable more flexible scanning trajectories than a circular rotation with a fixed number of images per tilt angle.
Acknowledgments
Chuong Nguyen acknowledges the support of ARC DP120103896 and CE140100016 through the ARC Centre of Excellence for Robotic Vision (http://www.roboticvision.org), the CSIRO OCE Postdoctoral Scheme and the National Collaborative Research Infrastructure Strategy (NCRIS) project “Australian Plant Phenomics Centre”. Thanks to Dr. Geoff Bull of the High Resolution Plant Phenomics Centre for his valuable feedback on the manuscript.
References
 [1] K. Rajendran, M. Tester, and S. J. Roy, “Quantifying the three main components of salinity tolerance in cereals,” Plant, cell & environment, vol. 32, no. 3, pp. 237–249, 2009.
 [2] Danforth, “Improve human condition through plant science,” 2016. [Online]. Available: https://www.danforthcenter.org
 [3] A. Paproki, X. Sirault, S. Berry, R. Furbank, and J. Fripp, “A novel mesh processing based technique for 3d plant analysis,” BMC Plant Biology, vol. 12, 2012.
 [4] D. L. Jupp, D. Culvenor, J. Lovell, G. Newnham, A. Strahler, and C. Woodcock, “Estimating forest LAI profiles and structural parameters using a ground-based laser called ‘Echidna®’,” Tree Physiology, vol. 29, no. 2, pp. 171–181, 2009.
 [5] X. Yang, A. H. Strahler, C. B. Schaaf, D. L. Jupp, T. Yao, F. Zhao, Z. Wang, D. S. Culvenor, G. J. Newnham, and J. L. Lovell, “Three-dimensional forest reconstruction and structural parameter retrievals using a terrestrial full-waveform lidar instrument (Echidna®),” Remote Sensing of Environment, vol. 135, pp. 36–51, 2013.
 [6] D. Deery, J. Jimenez-Berni, H. Jones, X. Sirault, and R. Furbank, “Proximal remote sensing buggies and potential applications for field-based phenotyping,” Agronomy, vol. 4, no. 3, pp. 349–379, 2014.
 [7] C. V. Nguyen, S. Izadi, and D. Lovell, “Modeling kinect sensor noise for improved 3d reconstruction and tracking,” Second Joint 3dim/3dpvt Conference: 3d Imaging, Modeling, Processing, Visualization & Transmission (3dimpvt 2012), pp. 524–530, 2012.
 [8] Y. Y. Li, X. C. Fan, N. J. Mitra, D. Chamovitz, D. Cohen-Or, and B. Q. Chen, “Analyzing growing plants from 4d point cloud data,” ACM Transactions on Graphics, vol. 32, no. 6, 2013.
 [9] X. Sirault, J. Fripp, A. Paproki, P. Kuffner, C. Nguyen, R. Li, H. Daily, J. Guo, and R. Furbank, “Plantscan: a three-dimensional phenotyping platform for capturing the structural dynamic of plant development and growth,” in Proceedings of the 7th International Conference on Functional-Structural Plant Models, Saariselkä, Finland, 2013, pp. 9–14.

 [10] A. Tabb, “Shape from silhouette probability maps: reconstruction of thin objects in the presence of silhouette extraction and calibration error,” 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 161–168, 2013.
 [11] L. Lou, Y. Liu, M. Sheng, J. Han, and J. H. Doonan, A cost-effective automatic 3D reconstruction pipeline for plants using multi-view images. Springer, 2014, pp. 221–230.
 [12] M. P. Pound, A. P. French, E. H. Murchie, and T. P. Pridmore, “Automated recovery of threedimensional models of plant shoots from multiple color images,” Plant Physiology, vol. 166, no. 4, pp. 1688–U801, 2014.
 [13] T. T. Nguyen, D. C. Slaughter, N. Max, J. N. Maloof, and N. Sinha, “Structured light-based 3d reconstruction system for plants,” Sensors, vol. 15, no. 8, pp. 18587–18612, 2015.
 [14] T. Duan, S. Chapman, E. Holland, G. Rebetzke, Y. Guo, and B. Zheng, “Dynamic quantification of canopy structure to characterize early plant vigour in wheat genotypes,” Journal of Experimental Botany, 2016.
 [15] K. H. Lo, M. T. Yang, and R. Y. Lin, “Shadow removal for foreground segmentation,” Advances in Image and Video Technology (Lecture Notes in Computer Science, vol. 4319), pp. 342–352, 2006.
 [16] N. D. Campbell, G. Vogiatzis, C. Hernández, and R. Cipolla, “Automatic object segmentation from calibrated images,” in Visual Media Production (CVMP), 2011 Conference for. IEEE, 2011, pp. 126–137.
 [17] L. Quan, P. Tan, G. Zeng, L. Yuan, J. D. Wang, and S. B. Kang, “Image-based plant modeling,” ACM Transactions on Graphics, vol. 25, no. 3, pp. 599–604, 2006.
 [18] K. X. Yin, H. Huang, P. X. Long, A. Gaissinski, M. L. Gong, and A. Sharf, “Full 3d plant reconstruction via intrusive acquisition,” Computer Graphics Forum, vol. 35, no. 1, pp. 272–284, 2016.
 [19] Phidgets, “Language - Python - Phidgets support,” 2016. [Online]. Available: http://www.phidgets.com/docs/Language__Python
 [20] A. Dumitrache, “Piggyphoto,” 2011. [Online]. Available: https://github.com/alexdu/piggyphoto
 [21] P. Corke, Robotics, vision and control: fundamental algorithms in MATLAB. Springer, 2011, vol. 73.
 [22] G. Bradski, “The opencv library,” Dr Dobbs Journal, vol. 25, no. 11, pp. 120–126, 2000.
 [23] OpenCV, “Camera calibration and 3d reconstruction,” 2016. [Online]. Available: docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
 [24] Z. Zhang, “A flexible new technique for camera calibration,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 11, pp. 1330–1334, 2000.
 [25] I. D. Coope, “Circle fitting by linear and nonlinear leastsquares,” Journal of Optimization Theory and Applications, vol. 76, no. 2, pp. 381–388, 1993.
 [26] P. KaewTraKulPong and R. Bowden, An improved adaptive background mixture model for real-time tracking with shadow detection. Springer, 2002, pp. 135–144.

 [27] Z. Zivkovic, “Improved adaptive gaussian mixture model for background subtraction,” in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 2. IEEE, 2004, pp. 28–31.
 [28] Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.
 [29] Wikipedia, “Lab color space,” 2016. [Online]. Available: https://en.wikipedia.org/wiki/Lab_color_space
 [30] R. Szeliski, “Rapid octree construction from image sequences,” CVGIP: Image understanding, vol. 58, no. 1, pp. 23–32, 1993.