Joint Multi-Leaf Segmentation, Alignment and Tracking from Fluorescence Plant Videos

by Xi Yin, et al.

This paper proposes a novel framework for fluorescence plant video processing. The plant research community is interested in leaf-level photosynthetic analysis within a plant. A prerequisite for such analysis is to segment all leaves, estimate their structures, and track them over time. We identify this as a joint multi-leaf segmentation, alignment, and tracking problem. First, leaf segmentation and alignment are applied to the last frame of a plant video to find a number of well-aligned leaf candidates. Second, leaf tracking is applied to the remaining frames by transforming the leaf candidates from the previous frame. We formulate two optimization problems, for leaf alignment and tracking respectively, whose objective functions share several terms. A quantitative evaluation framework is formulated to evaluate the performance of our algorithm with four metrics. Two models are learned, to predict the alignment accuracy and to detect tracking failure respectively, in order to provide guidance for subsequent plant biology analysis. The limitations of our algorithm are also studied. Experimental results show the effectiveness, efficiency, and robustness of the proposed method.





1 Introduction

Plants are the major organisms that can produce biomass and oxygen by absorbing solar energy. One key problem in plant growth study is to understand the photosynthetic activities of plants under various external stimuli or genetic variations. For this purpose, plant researchers conduct large-scale experiments in a chamber, as shown in the left part of Fig. 1, where the temperature and light conditions can be controlled and ceiling-mounted fluorescence cameras capture images of the plants during their growth period [1]. The pixel intensity of a fluorescence image indicates the photosynthetic efficiency (PE) of the plant. With such a high-throughput imaging system, the massive amount of resulting visual data calls for advanced visual analysis to study a wide range of plant physiological problems [2].

Fig. 1: Leaf-level plant analysis. Given a fluorescence plant video captured during its growth period, our algorithm performs multi-leaf segmentation, alignment, and tracking jointly, i.e., it estimates unique, temporally consistent labels for all leaves together with the structure of each leaf, such as its tips.

Leaves at different developmental ages may respond differently to changes in environmental conditions [3]. For example, biologists may be interested in the heterogeneity of PE among leaves and over time, and in whether the PE of younger leaves is more sensitive to changes in light conditions. Therefore, it is important to provide a leaf-level visual analysis that answers essential questions such as how many leaves there are, what their structures are, and how they change over time. These problems are the main focus of this paper.

As shown in Fig. 1, given a fluorescence video as input, our algorithm performs multi-leaf segmentation, alignment, and tracking jointly. Specifically, leaf segmentation [4] detects the boundary of each leaf and hence the total number of leaves in a plant. Leaf alignment [5] estimates the leaf structure by aligning labeled leaf templates to the image. Leaf tracking [6] associates the leaves over time. This multi-leaf analysis is challenging for several reasons. First, fluorescence images have low resolution, so the leaves are very small and even humans can find it hard to recognize every leaf clearly. Second, leaves overlap to various degrees, which poses significant challenges for estimating their boundaries and structures. Third, the leaves of a single plant may exhibit various shapes, sizes, and orientations, which also change over time. Effective algorithms must therefore handle all of these challenges.

To the best of our knowledge, no previous study has addressed leaf segmentation, alignment, and tracking simultaneously from plant videos. To solve this new problem, we develop two optimization-based algorithms, for multi-leaf alignment and tracking respectively. Leaf alignment builds on the well-known Chamfer matching (CM) algorithm [7], which aligns one object instance in an image with a given template. However, classical CM may not work well for aligning multiple overlapping leaves. Motivated by crowd segmentation work [8], where both the number and the locations of pedestrians are estimated simultaneously, this paper proposes a novel framework to jointly align multiple leaves in an image. We first generate a large set of leaf templates with various shapes, sizes, and orientations. Applying all templates to the edge map of a plant image yields the same number of transformed leaf templates. Our leaf alignment is then an optimization process that selects a subset of these leaf candidates to best explain the test image.

While leaf alignment works well for one image, applying it to every video frame independently does not provide tracking, i.e., associating aligned leaves over time. Therefore, we formulate leaf tracking in one frame as the problem of transforming multiple aligned leaf candidates from the previous frame. Because plants grow slowly, the tracking optimization initialized with the results of the previous frame can converge quickly, which improves both leaf association and computational efficiency.

To predict the alignment accuracy and the tracking performance, we learn two quality prediction models. We also develop a quantitative analysis with three metrics that evaluates multi-leaf segmentation, alignment, and tracking performance simultaneously, using labeled leaf structures of fluorescence plant videos. The experimental results demonstrate the effectiveness and robustness of the proposed approach.

In summary, this paper has four main contributions:

We identify a novel computer vision problem of joint leaf segmentation, alignment, and tracking from fluorescence plant videos. We collect a dataset for this novel problem and make it publicly available to facilitate future research and comparison.

We propose two optimization processes for leaf alignment and tracking respectively. By optimizing designed objective functions, our method estimates the leaf number and structure over time effectively.

We build two quality prediction models to predict the alignment accuracy of a leaf in each frame and to detect tracking failure of a leaf over time; these models are used during the tracking process.

We set up a quantitative evaluation framework with three metrics to jointly evaluate the performance of segmentation, alignment, and tracking.

Compared to our earlier work [5, 9], four main changes have been made. First, one term is added to the objective function for leaf alignment and one term is modified in the objective function for leaf tracking; the resulting method is shown to be superior to [5, 9] on a larger dataset. Second, we build two quality prediction models to estimate the alignment accuracy and tracking performance of every leaf in each frame. Third, we enhance the performance evaluation procedure so that errors caused by tracking failure do not influence the alignment accuracy. Fourth, we study the limitations of our tracking algorithm and find that it is very robust to leaf template transformation.

Fig. 2: Overview of the joint leaf segmentation, alignment, and tracking method.

2 Prior Work

Leaves have been studied extensively in computer graphics. For example, a shape and appearance model of leaves [10] is proposed to render photo-realistic images of plants, and a data-driven leaf synthesis approach [11] produces realistic reconstructions of dense foliage. However, these models cannot be readily applied to fluorescence images, which lack leaf appearance information.

Computer vision has prior work on tasks such as leaf segmentation [12, 4], alignment [13, 5], tracking [6, 9], and identification [14, 15, 16]. However, most prior work focuses on only one or two of these tasks. In contrast, our method addresses leaf segmentation, alignment, and tracking simultaneously.

Image segmentation is a well-studied topic with extensive prior work. For example, a marker-controlled watershed segmentation method [17] is introduced to segment leaf images with complicated backgrounds. Teng et al. [4] develop a leaf segmentation and classification system for natural images that relies on manual assistance from humans. A similar system is also developed using 3D points from a depth camera [12]. Existing work on leaf segmentation targets images with either a single leaf on a clean background [18, 13] or a single dominant leaf in a natural setting [19, 20].

In contrast, the multiple overlapping leaves in our application make it hard to isolate the segmentation and alignment problems. Therefore, we solve these two problems simultaneously using a novel extension of Chamfer matching (CM) [7]. The well-known CM is widely used to align two images based on their edge maps. However, CM, its extensions [21, 22], and other image alignment methods, e.g., ASM [23], AAM [24], are all designed to align a single object instance within a test image. Our work extends CM to align multiple potentially overlapping object instances in an image.

Leaf tracking models the leaf transformations over time. A probabilistic parametric active contours model [6] is applied for leaf segmentation and tracking to automatically measure the average temperature of leaves. However, the leaves in those images are well separated without any overlap, and the active contours are initialized from ground-truth segments. In contrast, our leaf alignment framework can handle overlapping leaves, and leaf tracking is initialized from the leaf candidates of the previous frame without using any ground-truth labels.

With respect to tracking performance evaluation, there are many different measures used by various authors. A recent study [25] narrows down the set of potential measures to only two complementary ones: the average overlap and failure rate. However, this is the performance evaluation for tracking alone. We develop a novel and comprehensive evaluation scheme to measure the performance of multi-leaf joint segmentation, alignment, and tracking.

3 Our Method

As shown in Fig. 2, given a fluorescence plant video, we first apply leaf segmentation and alignment simultaneously to the last frame of the video to find a number of well-aligned leaf candidates. Leaf tracking can then be treated as an alignment problem for all leaf candidates, initialized from the previous frame. During tracking, a leaf candidate whose size falls below a threshold is deleted, and a new candidate is generated and added for tracking whenever a sufficiently large region of the plant image mask is not explained by the existing leaf candidates. Two prediction models are learned and applied to all leaf candidates in real time to predict the alignment quality and the tracking performance, respectively. For clarity, the notation used in this paper is summarized in Table I.

Notation Definition
2D coordinates of an edge map
2D coordinates of a leaf template
2D coordinates of a transformed leaf template
a row vector of a plant image mask
a leaf template mask
a row vector of a transformed template mask
numbers of leaf template shapes, sizes, and orientations
the total number of leaf templates (shapes × sizes × orientations)
objective functions for alignment and tracking
the distance transform image of an edge map
the diagonal length of the test image
the center of a plant image
the center of a leaf candidate
a collection of transformed templates
a matrix collecting all transformed template masks
a 0-1 indicator vector over the transformed templates
vectors of CM distances and angle differences of the transformed templates
a constant used in the smooth step function
numbers of estimated and labeled leaves in a frame
a collection of selected leaf candidates
a set of transformation parameters
the transformation parameters of one leaf candidate
estimated and labeled tips of one leaf
estimated and labeled tips of one frame
collections of estimated and labeled tips of all videos
the total number of labeled leaves
the tip-based error normalized by leaf length
a matrix of leaf correspondence
a matrix of tip-based errors in one frame
a collection of the tip-based error matrices of all labeled frames
the number of leaves without correspondence
vectors storing tip-based errors in Algorithm 2
a threshold for performance evaluation
performance metric: unmatched leaf rate
performance metric: landmark error
performance metric: tracking consistency
the quality score predicting alignment accuracy
the quality score predicting tracking failure
features for learning the quality prediction models
weights used in the alignment objective
weights used in the tracking objective
step sizes in the gradient descent of the alignment and tracking objectives
the smallest leaf size our algorithm can process
TABLE I: Notations.

3.1 Multi-leaf Alignment Algorithm

Our multi-leaf alignment algorithm mainly consists of two steps, as shown in Fig. 2. First, a set of leaf templates is applied to the edge map of a test image to find an over-complete set of transformed leaf templates. Second, we formulate an optimization process to estimate an optimal subset of leaf candidates according to a joint objective function.

3.1.1 Candidate nomination via chamfer matching

Chamfer matching (CM) is a well-known technique for finding the best alignment between two edge maps. Let $U = \{\mathbf{u}_i\}$ and $V = \{\mathbf{v}_j\}$ be the edge maps of a template and a test image, respectively. The CM distance is the average distance from each point in $U$ to its nearest edge point in $V$:

$$ d_{CM}(U, V) = \frac{1}{|U|} \sum_{\mathbf{u}_i \in U} \min_{\mathbf{v}_j \in V} \| \mathbf{u}_i - \mathbf{v}_j \|_2, \tag{1} $$

where $|U|$ is the number of edge points in $U$. The CM distance can be computed efficiently via a precomputed distance transform image $DT_V$, which stores the distance from each coordinate to its nearest edge point in $V$. During Chamfer matching, the edge template is superimposed on $DT_V$, and the average of the values sampled at the template edge points equals the CM distance, i.e., $d_{CM}(U, V) = \frac{1}{|U|} \sum_{\mathbf{u}_i \in U} DT_V(\mathbf{u}_i)$.
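The equivalence between the direct CM distance and the distance-transform lookup can be checked with a small sketch. It assumes edge points are integer (x, y) pixel coordinates inside the image and uses a brute-force Euclidean distance transform for clarity; a practical implementation would use a linear-time distance transform. All function names are hypothetical.

```python
import math


def distance_transform(edge_points, height, width):
    """Brute-force Euclidean distance transform: dt[y][x] is the distance from
    pixel (x, y) to the nearest edge point. Fine for toy image sizes only."""
    return [[min(math.hypot(x - vx, y - vy) for vx, vy in edge_points)
             for x in range(width)]
            for y in range(height)]


def chamfer_direct(template_points, edge_points):
    """Average distance from each template edge point to its nearest test edge point."""
    total = sum(min(math.hypot(ux - vx, uy - vy) for vx, vy in edge_points)
                for ux, uy in template_points)
    return total / len(template_points)


def chamfer_via_dt(template_points, dt):
    """Same quantity, obtained by sampling the precomputed distance transform
    at the template edge points (the template is superimposed on dt)."""
    return sum(dt[uy][ux] for ux, uy in template_points) / len(template_points)


V = [(2, 2), (3, 2), (4, 2)]   # toy test-image edge points
U = [(2, 3), (3, 3), (4, 4)]   # toy template edge points, already placed in the image
dt = distance_transform(V, height=6, width=6)
print(chamfer_direct(U, V), chamfer_via_dt(U, dt))   # both 4/3
```

The distance transform is computed once per test image; afterwards each template placement costs only one lookup per template edge point, which is what makes the exhaustive search over templates and locations affordable.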

Given a test image with multiple leaves, we first apply the Sobel edge detector to generate an edge map and its distance transform image. The basic idea of leaf alignment is to transform the 2D edge coordinates of a template from the leaf template space to a new set of 2D coordinates in the test image space such that the CM distance is small, i.e., the leaf template is well aligned with the edge map of the test image.

Fig. 3: Forward and backward warping.

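Image warping: Two types of transformations are involved in our multi-leaf alignment framework, as shown in Fig. 3. Both include scaling, rotation, and translation. Let $\mathbf{W}(U; \mathbf{p})$ be a forward warping function that maps the 2D edge points $U$ of a template from the template space to the test image space, parameterized by $\mathbf{p} = [\theta, r, t_x, t_y]^\top$:

$$ \mathbf{W}(U; \mathbf{p}) = r \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \left( U - \bar{\mathbf{c}} \right) + \bar{\mathbf{c}} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}, \tag{2} $$

where $\theta$ is the in-plane rotation angle, $r$ is the scaling factor, and $t_x$ and $t_y$ are the translations along the $x$ and $y$ axes, respectively; the leaf center and the translation are subtracted from or added to every point of $U$. $\bar{\mathbf{c}}$ is the center of the leaf, i.e., the average of all coordinates in $U$. Including $\bar{\mathbf{c}}$ allows us to model leaf scaling and rotation with respect to the individual leaf center, a typical phenomenon in plant growth.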
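The forward warp can be sketched as follows, assuming it scales by r and rotates by theta about the leaf center (the mean of the template points) and then translates by (tx, ty). Points are (x, y) tuples and the function names are hypothetical.

```python
import math


def leaf_center(points):
    """Average of all template coordinates (the leaf center)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def forward_warp(points, theta, r, tx, ty):
    """Map template edge points to the test image: scale by r and rotate by
    theta about the leaf center, then translate by (tx, ty)."""
    cx, cy = leaf_center(points)
    c, s = math.cos(theta), math.sin(theta)
    warped = []
    for x, y in points:
        dx, dy = x - cx, y - cy              # coordinates relative to the leaf center
        warped.append((r * (c * dx - s * dy) + cx + tx,
                       r * (s * dx + c * dy) + cy + ty))
    return warped


warped = forward_warp([(0, 0), (2, 0)], theta=math.pi / 2, r=2.0, tx=10.0, ty=0.0)
print(warped)   # approximately [(11, -2), (11, 2)]
```

Because rotation and scaling act about the leaf center, the warped leaf center is simply the original center shifted by (tx, ty), which keeps the four parameters easy to interpret during tracking.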

The second transformation is backward warping, the inverse of Eqn. 2, which maps coordinates from the image space to the template space. Applying it to the matrix of all pixel coordinates of the test image gives their corresponding coordinates in the template space. The purpose of this backward warping is to generate a vector with one element per test-image pixel, which is the warped version of the original template mask.
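Backward warping can be sketched in the same spirit: each test-image pixel is mapped back into template space by undoing the translation, rotation, and scaling, and the template mask is sampled there. The sketch assumes the forward warp scales and rotates about the leaf center before translating, uses nearest-neighbour sampling, and treats pixels that fall outside the template as background; all names are hypothetical.

```python
import math


def backward_warp_point(x, y, theta, r, tx, ty, center):
    """Inverse of the forward warp: undo the translation, then the rotation
    (rotate by -theta), then the scaling, all about the leaf center."""
    cx, cy = center
    dx, dy = x - cx - tx, y - cy - ty
    c, s = math.cos(theta), math.sin(theta)
    return (cx + (c * dx + s * dy) / r, cy + (-s * dx + c * dy) / r)


def warp_mask(template_mask, theta, r, tx, ty, center, height, width):
    """Warped template mask as a raster-scanned list of height * width values."""
    th, tw = len(template_mask), len(template_mask[0])
    out = []
    for y in range(height):                  # raster scan: row by row
        for x in range(width):
            u, v = backward_warp_point(x, y, theta, r, tx, ty, center)
            iu, iv = math.floor(u + 0.5), math.floor(v + 0.5)   # nearest neighbour
            inside = 0 <= iv < th and 0 <= iu < tw
            out.append(template_mask[iv][iu] if inside else 0)
    return out


mask = warp_mask([[1, 1], [1, 1]], theta=0.0, r=1.0, tx=2.0, ty=1.0,
                 center=(0.5, 0.5), height=4, width=4)
print(mask)
```

Warping backward rather than forward guarantees that every test-image pixel receives exactly one value, so the warped masks of all templates have the same length and can be stacked into one matrix.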

Fig. 4: Leaf template scaling and rotation from basic template shapes.

Transformed templates and leaf candidates: Because leaf shapes in a test image vary widely, it is infeasible to match all leaves with only one template. We manually select a set of basic templates (first row in Fig. 4) with representative leaf shapes from the plant videos and compute the edge map and mask of each.

When aligning the basic templates to a test image, we synthesize an over-complete set of transformed templates by selecting a discrete set of scaling factors and rotation angles. This yields the array of leaf templates shown in Fig. 4, with one template for every combination of basic shape, leaf size, and orientation, which is expected to cover all potential leaf configurations in the test image. Each template is slid over all possible locations on the distance transform image, and the location with the minimum CM distance provides the translation and the optimal CM distance for that template. The yellow and green points in Fig. 4 are the two labeled leaf tips of each template; they are used to find the corresponding leaf tips in the test image according to Eqn. 2. Therefore, with the manually selected basic templates and the exhaustively enumerated sizes and orientations, the total number of transformed templates is the number of basic templates times the numbers of sizes and orientations. For each transformed template, we record the 2D edge coordinates of its basic template, the warped template mask, the transformation parameters, the CM distance, and the estimated leaf tips.

Note that this collection is an over-complete set of transformed leaf templates that includes the true leaf candidates as a subset. Hence, the critical question is how to select this subset of candidates from the collection based on suitable objectives.
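The candidate-nomination step described above can be sketched as an exhaustive loop over shapes, scales, and orientations, in which each rotated and scaled template is slid over every integer position of the distance transform image and the position with the smallest CM distance is kept. This is a simplified, hypothetical sketch: templates are lists of integer (x, y) edge points, only translations that keep the template inside the image are tried, warped points are rounded to the nearest pixel, and each record keeps only the shape index, scale, angle, translation, and CM distance rather than the full record described above.

```python
import itertools
import math


def distance_transform(edge_points, height, width):
    """Brute-force Euclidean distance transform (toy sizes only)."""
    return [[min(math.hypot(x - vx, y - vy) for vx, vy in edge_points)
             for x in range(width)] for y in range(height)]


def best_translation(points, dt):
    """Slide integer points over all offsets that keep them inside dt and
    return (minimum CM distance, (tx, ty))."""
    height, width = len(dt), len(dt[0])
    xs, ys = [x for x, _ in points], [y for _, y in points]
    best = (math.inf, None)
    for ty in range(-min(ys), height - max(ys)):
        for tx in range(-min(xs), width - max(xs)):
            d = sum(dt[y + ty][x + tx] for x, y in points) / len(points)
            if d < best[0]:
                best = (d, (tx, ty))
    return best


def nominate(basic_templates, scales, angles, dt):
    """One transformed template per (shape, scale, angle) combination."""
    records = []
    for shape, r, theta in itertools.product(range(len(basic_templates)), scales, angles):
        pts = basic_templates[shape]
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate and scale about the leaf center; round half up to pixels.
        warped = [(math.floor(r * (c * (x - cx) - s * (y - cy)) + 0.5),
                   math.floor(r * (s * (x - cx) + c * (y - cy)) + 0.5)) for x, y in pts]
        cm, t = best_translation(warped, dt)
        records.append({"shape": shape, "scale": r, "angle": theta,
                        "translation": t, "cm": cm})
    return records


dt = distance_transform([(x, 3) for x in range(2, 6)], height=8, width=8)
segment = [(0, 0), (1, 0), (2, 0), (3, 0)]           # one toy basic "leaf" shape
records = nominate([segment], scales=[1.0], angles=[0.0, math.pi / 2], dt=dt)
best = min(records, key=lambda rec: rec["cm"])
print(len(records), best["angle"], best["translation"], best["cm"])
```

Every combination produces a record, including badly fitting ones such as the vertical orientation here; discarding them is left to the subsequent subset selection rather than to a threshold at this stage.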

3.1.2 Objective function

The goal of leaf segmentation and alignment is to discover the correct number of leaves and to estimate the structure of each leaf precisely. If the candidates are well selected, there should be neither redundant nor missing leaves (Fig. 5 (a,b)). Moreover, each leaf candidate should be well aligned with the edge map, i.e., have a small CM distance, and its long axis should point toward the plant center. This rationale leads to a four-term objective function for the joint selection of all leaf candidates, which seeks the minimal number of leaf candidates, with small CM distances and small angle differences, that best covers the test image mask.

Fig. 5: Undesired (a,b) and desired (c) solutions.

The objective function is defined on an indicator vector $\mathbf{x} \in \{0,1\}^K$, where $K$ is the total number of transformed templates, $x_k = 1$ means that the $k$-th transformed template is selected, and $x_k = 0$ otherwise. Hence $\mathbf{x}$ uniquely specifies a combination of transformed templates. Its L1 norm $\|\mathbf{x}\|_1$ is the number of selected leaf candidates, which forms the first term.

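We concatenate the CM distances of all $K$ transformed templates into a vector $\mathbf{d}$. With $\mathbf{x}$ denoting the 0-1 indicator vector of selected templates, the second term, i.e., the average CM distance of the selected leaf candidates, is:

$$ \frac{\mathbf{d}^\top \mathbf{x}}{\|\mathbf{x}\|_1}. \tag{3} $$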

Fig. 6: The process of generating and .

The third term compares the synthesized mask with the test image mask. As shown in Fig. 6, given a test image, we obtain its image mask by foreground segmentation and convert it into a row vector by raster scan, with one element per pixel. Similarly, each warped template mask is a vector of the same length whose elements are 1 in the leaf region and 0 elsewhere. The warped masks of all transformed templates are stacked into a matrix. Multiplying the indicator vector by this matrix yields the synthesized mask, except that pixels covered by several selected templates take values larger than 1. To map these values into the range from 0 to 1, so that they are comparable with the test image mask, we apply a smooth approximation of the step function, denoted h(·), similar to prior image alignment work [26, 27],


where a constant c controls how closely h(·) approximates the step function. The actual step function cannot be used here because it is not differentiable and is thus difficult to optimize. The constant inside the parentheses is a flip point: values below it are pushed toward 0 and values above it toward 1. Therefore, the third term becomes:


One property of Arabidopsis plants is that the long axes of most leaves point toward the center of the plant. To exploit this domain-specific knowledge, the fourth term encourages the rotation angle of each transformed leaf template to agree with the direction from the leaf center to the plant center. Figure 7 shows the geometry of this angle difference. Let $\mathbf{c}$ and $\bar{\mathbf{c}}$ be the geometric centers of the plant and of a leaf, i.e., the average coordinates of all points in the plant image mask and in the leaf, respectively; let $\delta = \|\mathbf{c} - \bar{\mathbf{c}}\|_2$ be the distance between them; and let $\Delta\theta$ be the difference between the rotation angle $\theta$ of the transformed template and the direction of $\mathbf{c} - \bar{\mathbf{c}}$. Since this property is more pronounced for leaves far from the plant center, we weight the angle difference by $\delta$ and normalize it by the image size. The weighted angle difference is

$$ a = \frac{\delta \, \Delta\theta}{l}, \tag{6} $$

where $l$ is the diagonal length of the test image. We compute $a$ for all $K$ transformed templates and concatenate the values into a vector $\mathbf{a}$. With $\mathbf{x}$ denoting the 0-1 indicator vector of selected templates, the fourth term is the average weighted angle difference of the selected leaf candidates, $\mathbf{a}^\top \mathbf{x} / \|\mathbf{x}\|_1$.
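The weighted angle difference and the averaging over selected templates can be sketched as follows. The sketch assumes that the rotation angle is measured in the same frame as the direction from the leaf center to the plant center (so an angle equal to that direction means the leaf points at the plant center) and wraps the difference into [0, pi]; this angle convention and the function names are assumptions.

```python
import math


def weighted_angle_difference(theta, leaf_center, plant_center, diag):
    """Angle difference between the leaf orientation theta and the direction from
    the leaf center to the plant center, weighted by their distance and
    normalised by the image diagonal length."""
    dx = plant_center[0] - leaf_center[0]
    dy = plant_center[1] - leaf_center[1]
    phi = math.atan2(dy, dx)                    # direction leaf center -> plant center
    diff = abs(theta - phi) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)        # wrap into [0, pi]
    return math.hypot(dx, dy) * diff / diag


def average_over_selected(values, x):
    """Average of per-template values over the selected templates (x_k = 1)."""
    return sum(v * xk for v, xk in zip(values, x)) / sum(x)


a = [weighted_angle_difference(t, (10, 0), (0, 0), diag=20.0)
     for t in (math.pi, math.pi / 2)]
print(a)                                   # approximately [0.0, pi / 4]
print(average_over_selected(a, [1, 1]))    # approximately pi / 8
```

The same averaging helper gives the second term when applied to the vector of CM distances instead of the angle differences.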

Fig. 7: The angle difference. The long axis of a leaf should point toward the plant center.

Finally, our objective function (Eqn. 7) is a weighted combination of these four terms, with three weights (Table I) balancing their contributions. The four terms jointly provide guidance on what constitutes an optimal combination of leaf candidates.

3.1.3 Gradient descent based optimization

The optimization minimizes the objective function in Eqn. 7. Exhaustive search over all subsets of transformed templates is infeasible due to its high computational cost, and integer programming cannot be applied because of the nonlinear function in the third term. Instead, we propose a suboptimal, gradient descent-based optimization, which is possible owing to the smoothness of Eqn. 7. Specifically, the derivative of the objective function with respect to the indicator vector is:


where the sign function returns the sign of each element of a vector, and the division of vectors is element-wise.

All elements of the indicator vector are initialized to 1, i.e., all transformed templates are initially selected. In each iteration of gradient descent, the indicator vector is updated by subtracting its gradient scaled by a step size. The element with the largest gradient is then chosen, since it has a relatively large influence on minimizing the objective function, and we check whether fixing it to 0 or to 1 yields a smaller objective value. Once an element has been fixed, its value remains unchanged in later iterations. Hence the total number of iterations equals the number of transformed leaf templates. In the end, every element is either 0 or 1, and the elements equal to 1 give the selected combination of leaf candidates.
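The greedy, gradient-guided selection can be sketched as follows. Several pieces are assumptions rather than the paper's definitions: a logistic curve with steepness c = 20 and flip point 0.5 stands in for h(·); the mask term is taken as the L1 mismatch normalized by the foreground area; the three weights are assumed to multiply the CM, mask, and angle terms; the gradient is computed by central differences instead of the analytic derivative; the gradient step is clipped to [0, 1]; and "largest gradient" is read as the largest magnitude among the unfixed elements. All names are hypothetical.

```python
import math


def h(s, c=20.0, flip=0.5):
    # ASSUMPTION: a logistic curve stands in for the smooth step function h(.)
    return 1.0 / (1.0 + math.exp(-c * (s - flip)))


def j1(x, cm, ang, masks, image_mask, lam=(1.0, 5.0, 1.0)):
    """Hypothetical alignment objective: number of selected templates
    + lam[0] * average CM distance + lam[1] * mask mismatch
    + lam[2] * average weighted angle difference."""
    n = max(sum(x), 1e-9)                       # guard against an empty selection
    avg_cm = sum(xk * dk for xk, dk in zip(x, cm)) / n
    avg_ang = sum(xk * ak for xk, ak in zip(x, ang)) / n
    synth = [sum(xk * m[p] for xk, m in zip(x, masks)) for p in range(len(image_mask))]
    # ASSUMPTION: L1 mismatch normalised by the foreground area of the image mask.
    mismatch = sum(abs(h(s) - t) for s, t in zip(synth, image_mask)) / sum(image_mask)
    return sum(x) + lam[0] * avg_cm + lam[1] * mismatch + lam[2] * avg_ang


def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient (stands in for the analytic derivative)."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad


def select_candidates(f, num_templates, eta=0.1):
    x = [1.0] * num_templates                   # all templates initially selected
    fixed = [False] * num_templates
    for _ in range(num_templates):              # one element is fixed per iteration
        g = numeric_grad(f, x)
        for k in range(num_templates):
            if not fixed[k]:                    # gradient step, clipped to [0, 1]
                x[k] = min(1.0, max(0.0, x[k] - eta * g[k]))
        free = [k for k in range(num_templates) if not fixed[k]]
        k_star = max(free, key=lambda k: abs(g[k]))
        x0, x1 = list(x), list(x)
        x0[k_star], x1[k_star] = 0.0, 1.0
        x[k_star] = 0.0 if f(x0) < f(x1) else 1.0   # fix to whichever value is better
        fixed[k_star] = True
    return [int(v) for v in x]


# Two identical templates, each exactly covering the plant mask: one is redundant.
image_mask = [1, 1, 0, 0]
f = lambda x: j1(x, cm=[0.0, 0.0], ang=[0.0, 0.0],
                 masks=[[1, 1, 0, 0], [1, 1, 0, 0]], image_mask=image_mask)
print(select_candidates(f, 2))   # [0, 1]: exactly one template is kept
```

In this toy case the first fixed element is set to 0 because the remaining, partially selected duplicate still covers the mask, and the second is then set to 1 because removing it would leave the mask unexplained; the count term and the mask term pull in opposite directions exactly as intended.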

This joint leaf segmentation and alignment is applied to the last frame of a plant video, and the resulting leaf candidates are used for leaf tracking in the remaining frames. Each selected leaf candidate is represented by its basic leaf template and its transformation parameters: transforming the basic template with these parameters yields a leaf candidate that is well aligned with the edge map of the frame.

3.2 Multi-leaf Tracking Algorithm

Leaf tracking aims to assign the same leaf ID to the same leaf throughout the whole video. One way to track all leaves is to apply the leaf alignment framework to every frame of the video and then build leaf correspondences between consecutive frames. However, tracking consistency can suffer because the leaf segmentation results may differ from frame to frame. Therefore, given the slow plant growth between consecutive frames, we formulate leaf tracking as an optimization problem based on template transformation.

3.2.1 Objective function

Similar to the objective function in Eqn. 7, we formulate a three-term objective function parameterized by a set of transformation parameters $\mathbf{P} = \{\mathbf{p}_i\}_{i=1}^{N}$, where $\mathbf{p}_i$ holds the transformation parameters of leaf candidate $i$ and $N$ is the number of leaf candidates.

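The first term updates the transformation parameters so that the transformed leaf candidates are well aligned with the edge map $V$ of the current frame. It is computed as the average CM distance over the $N$ leaf candidates:

$$ \frac{1}{N} \sum_{i=1}^{N} d_{CM}\big(\mathbf{W}(U_i; \mathbf{p}_i), V\big), \tag{9} $$

where $U_i$ is the edge map of the basic template of leaf candidate $i$, $\mathbf{W}(\cdot; \mathbf{p}_i)$ is the forward warping with parameters $\mathbf{p}_i$, and $d_{CM}$ is the CM distance.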


The second term encourages the synthesized mask of all transformed candidates to be similar to the mask of the current frame. The synthesized mask of one transformed leaf candidate is its backward-warped template mask, and the second term (Eqn. 10) compares the sum of these warped masks over all candidates with the frame mask.


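As in Eqn. 6, the third term is the average weighted angle difference over the $N$ leaf candidates:

$$ \frac{1}{N} \sum_{i=1}^{N} a_i, \tag{11} $$

where $a_i$ is the weighted angle difference of leaf candidate $i$ under its current transformation.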


Finally, the tracking objective function (Eqn. 12) is a weighted combination of these three terms, with two weighting parameters (Table I) balancing their contributions.

Note the differences between the alignment objective (Eqn. 7) and the tracking objective (Eqn. 12). Since the number of leaves is fixed during tracking, the leaf-count term is not needed in Eqn. 12. In Eqn. 7 we apply the smooth step function h(·) because the values of the synthesized mask can be very large when all transformed templates are initially selected, and h(·) maps every element of the synthesized mask into the range from 0 to 1. During tracking, in contrast, the number of leaves is fixed and relatively small, so h(·) is not needed: the synthesized mask is already comparable to the test image mask.

3.2.2 Gradient descent based optimization

Given the objective function in Eqn. 12, our goal is to estimate the transformation parameters that minimize it. Since the objective involves texture warping, this is a nonlinear optimization problem without a closed-form solution, and we solve it with gradient descent. The derivative of the first (CM) term with respect to the transformation parameters of a leaf candidate can be written as:


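where the gradient images of the distance transform image along the x and y axes are sampled at the warped edge points. These two gradient images only need to be computed once per frame. The partial derivatives of the warped coordinates can be easily computed from Eqn. 2 with respect to the rotation angle, the scaling factor, and the two translations separately.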
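For the translation parameters the chain rule is especially simple: shifting every warped point by one pixel in x changes each sampled distance value by the x-gradient of the distance transform at that point, so the derivative of the average CM distance with respect to t_x is the mean x-gradient at the warped points (and likewise for t_y). The sketch below assumes central-difference gradient images (one-sided at the borders) and nearest-pixel sampling; rotation and scaling would additionally need the derivatives of the warped coordinates from Eqn. 2. Names are hypothetical.

```python
import math


def gradient_images(dt):
    """Central-difference gradients of the distance transform along x and y
    (one-sided at the image borders; assumes at least 2 rows and 2 columns)."""
    h, w = len(dt), len(dt[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            gx[y][x] = (dt[y][x1] - dt[y][x0]) / (x1 - x0)
            gy[y][x] = (dt[y1][x] - dt[y0][x]) / (y1 - y0)
    return gx, gy


def cm_translation_gradient(warped_points, gx, gy):
    """Derivatives of the average CM distance w.r.t. tx and ty: a unit translation
    moves every warped point by one pixel, so each derivative is the mean
    gradient sampled (nearest pixel) at the warped points."""
    n = len(warped_points)
    px = [(math.floor(x + 0.5), math.floor(y + 0.5)) for x, y in warped_points]
    return (sum(gx[y][x] for x, y in px) / n, sum(gy[y][x] for x, y in px) / n)


# Toy distance transform of a horizontal edge along y = 3 (8 rows, 6 columns).
dt = [[float(abs(y - 3)) for x in range(6)] for y in range(8)]
gx, gy = gradient_images(dt)
print(cm_translation_gradient([(1, 5), (2, 5), (3, 6)], gx, gy))   # (0.0, 1.0)
```

A positive y-derivative here means that moving the template further down increases the CM distance, so a gradient step pushes it back toward the edge at y = 3.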

Similarly, the derivative of the second (mask) term with respect to the transformation parameters is:


where the gradient images of the template mask along the x and y axes are used, and the partial derivatives of the backward-warped coordinates can be computed from the inverse function of Eqn. 2.

The derivative with respect to one of the transformation parameters is more involved than with respect to the other three. For clarity, we give this derivative as an example:


During optimization, the parameter set is initialized with the transformation parameters of the leaf candidates from the previous frame, and the parameters of each leaf are updated at every iteration by a gradient step with per-parameter step sizes. The iteration stops when the objective changes very little or the maximum number of iterations is reached. Note that this is a joint multi-leaf optimization problem, because the objective, through the synthesized mask, involves all leaf candidates at once.
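The tracking iteration can be sketched as below. The structure follows the text: initialization from the previous frame, a joint gradient step on every leaf's [theta, r, tx, ty] with per-parameter step sizes, and a stop on small change or a maximum iteration count. However, the objective is a toy quadratic standing in for the tracking objective of Eqn. 12, and the gradient is computed numerically rather than with the analytic derivatives above; all names are hypothetical.

```python
def track_frame(objective, params_prev, step_sizes, max_iter=200, tol=1e-10, eps=1e-6):
    """Jointly refine every leaf's [theta, r, tx, ty] by gradient descent,
    starting from the previous frame's parameters."""
    params = [list(p) for p in params_prev]          # initialise from previous frame
    prev_val = objective(params)
    for it in range(max_iter):
        grads = []
        for p in params:                              # central-difference gradient
            g = []
            for j in range(len(p)):
                saved = p[j]
                p[j] = saved + eps
                f_plus = objective(params)
                p[j] = saved - eps
                f_minus = objective(params)
                p[j] = saved
                g.append((f_plus - f_minus) / (2 * eps))
            grads.append(g)
        for p, g in zip(params, grads):               # joint update of all leaves
            for j in range(len(p)):
                p[j] -= step_sizes[j] * g[j]
        val = objective(params)
        if abs(prev_val - val) < tol:                 # little change: stop
            return params, it + 1
        prev_val = val
    return params, max_iter                           # maximum iteration reached


# Toy stand-in for the tracking objective: squared distance to target parameters.
targets = [[0.1, 1.2, 1.0, -1.0], [0.4, 1.1, 3.0, 2.0]]


def toy_objective(params):
    return sum((a - b) ** 2 for p, t in zip(params, targets) for a, b in zip(p, t))


params, iters = track_frame(toy_objective, [[0.0, 1.0, 0.0, 0.0], [0.5, 1.0, 2.0, 2.0]],
                            step_sizes=[0.25] * 4)
print(iters, params)
```

Because the starting point comes from the previous frame, which differs only slightly from the current one in slow-growing plants, few iterations are typically needed; in the real objective the gradients of all leaves are coupled through the synthesized mask.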

3.2.3 Leaf candidates update

Given a multi-day fluorescence video, we apply the leaf alignment algorithm to the last frame to generate the initial set of leaf candidates and then track them backward toward the first frame. Due to plant growth and leaf occlusion, the number of ground-truth leaves may vary over the video. If the size of a leaf candidate in a frame falls below a threshold (the smallest leaf size our algorithm can process), it is removed from the set of leaf candidates.

Conversely, new leaves must be detected and added to the set of leaf candidates. To do this, we compute the synthesized mask of all leaf candidates and subtract it from the test image mask to obtain a residue image for each frame. Connected component analysis is applied to find components larger than the smallest leaf size. We then apply a subset of leaf templates to the edge map of the residue image to find a new leaf candidate, which is recorded and tracked in the remaining frames. Figure 8 shows one example.
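The residue-and-components step can be sketched as follows, assuming binary masks given as lists of rows, "subtract" taken as "in the plant mask but not covered by any candidate", and 4-connectivity for the connected component analysis. Matching templates to the edge map of each surviving region is not shown. Names are hypothetical.

```python
from collections import deque


def new_leaf_regions(image_mask, candidate_masks, min_size):
    """Residue = plant mask minus the union of candidate masks; return its
    4-connected components with more than min_size pixels as (y, x) lists."""
    h, w = len(image_mask), len(image_mask[0])
    residue = [[1 if image_mask[y][x] and not any(m[y][x] for m in candidate_masks) else 0
                for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not residue[y][x] or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])   # breadth-first flood fill
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and residue[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > min_size:             # ignore tiny residue blobs
                regions.append(sorted(component))
    return regions


image = [[1, 1, 0, 0, 0],
         [1, 1, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 1, 0, 0, 0]]
covered = [[[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]]
print(new_leaf_regions(image, covered, min_size=2))   # one 4-pixel region
```

The size threshold filters out thin slivers left by slightly misaligned candidates, so only regions large enough to be a leaf trigger a new candidate.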

Fig. 8: Generating a new leaf candidate in tracking.

3.3 Quality Prediction

While many computer vision algorithms strive for perfect performance, unsatisfactory or failed results are inevitable on challenging test samples, and it is a critical goal for a computer vision algorithm to be aware of such cases. One approach is to predict the quality of the results, similar to quality estimation for fingerprints [28] and faces [29]. The key tasks in our work are leaf alignment, i.e., estimating the two tips of each leaf, and leaf tracking, i.e., keeping leaf identities consistent over time. Therefore, we learn two quality prediction models, one to predict the alignment accuracy and one to detect tracking failure.

3.3.1 Alignment quality

Let the alignment accuracy of a leaf indicate how well its two tips are aligned. We consider which factors may influence the estimation of the two tips. First, the CM distance is an overall measure of how well the template and the test image are aligned. Second, a well-aligned leaf candidate should have a large overlap with the test image mask and a small overlap with neighboring leaves. Third, the leaf area, angle, and distance to the plant center may influence the alignment result. Therefore, for each leaf in each frame we extract a 6-dimensional feature vector from the alignment result: the CM distance, the overlap ratio with the test image mask, the overlap ratio with the other leaves, the leaf area normalized by the area of the test image mask, the angle difference, and the distance to the plant center. A linear regression model is learned on $n$ training leaves with ground-truth alignment accuracy $q_i$ and feature vectors $\mathbf{f}_i$ by minimizing the squared error

$$ \min_{\mathbf{w}} \sum_{i=1}^{n} \left( q_i - \mathbf{w}^\top \mathbf{f}_i \right)^2, $$

where $\mathbf{w}$ is a 6-dimensional vector of feature weights. The learned model is then applied to predict the alignment accuracy of each leaf.
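Assuming the objective is ordinary least squares (no regularization, and no bias term unless a constant feature is appended), the model can be fitted by solving the normal equations. The sketch below uses plain-Python Gaussian elimination and a toy three-feature dataset instead of the six features listed above; names are hypothetical.

```python
def fit_least_squares(features, targets):
    """Solve the normal equations (F^T F) w = F^T q by Gaussian elimination."""
    d = len(features[0])
    A = [[sum(f[i] * f[j] for f in features) for j in range(d)] for i in range(d)]
    b = [sum(f[i] * q for f, q in zip(features, targets)) for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda row: abs(A[row][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, d):
            factor = A[row][col] / A[col][col]
            for k in range(col, d):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):                # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, f):
    """Predicted alignment accuracy for one feature vector."""
    return sum(wi * fi for wi, fi in zip(w, f))


F = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 0, 1]]   # toy features
q = [0.5, -1.0, 2.0, 1.5, 3.0]                                   # toy targets
w = fit_least_squares(F, q)
print(w, predict(w, [1, 1, 1]))   # w close to [0.5, -1.0, 2.0]
```

With six features the system stays tiny, so solving the normal equations directly is adequate; a library least-squares solver would be the more robust choice for ill-conditioned features.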

3.3.2 Tracking quality

Due to the limitations of our algorithm, a tracked leaf may drift to the location of an adjacent leaf, which results in tracking inconsistency. We call this a tracking failure. One example can be found in Fig. 9, where one leaf replaces another in the third frame. The goal of tracking quality prediction is to detect the moment when tracking starts to fail. We denote the tracking quality of a leaf by a binary label that distinguishes tracking failure from tracking success.

Similar to Section 3.3.1, we first extract the alignment feature vector of a particular leaf. This feature alone cannot predict the tracking performance because it does not take the tracking results over time into account. Therefore, we compare the feature of the current frame with that of a reference frame a fixed number of frames earlier. Since a tracking failure may cause abnormal changes in leaf area, angle, distance to the center, etc., we also compute the leaf angle difference, the leaf center distance, and the leaf overlap ratio between the current frame and the reference frame. Finally, the tracking feature combines the alignment features of the current and reference frames with the leaf angle difference, the leaf center distance, and the leaf overlap ratio. Given a training set of such features with their tracking-quality labels, an SVM classifier is learned as the tracking quality model.
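The tracking-quality feature can be sketched as follows, assuming that the feature concatenates the alignment features of the current and reference frames with the three change measures, that the overlap ratio is intersection over union of the two leaf masks, and that angles are in radians. Leaves are plain dictionaries and the names are hypothetical. The resulting vectors, with their failure/success labels, would then be passed to a standard SVM implementation, which is not shown.

```python
import math


def tracking_features(fa_cur, fa_ref, leaf_cur, leaf_ref):
    """Concatenate the alignment features of the current and reference frames
    with three change measures between the two frames. Each leaf is a dict with
    'angle' (radians), 'center' (x, y), and 'pixels' (a set of (x, y))."""
    d_angle = abs(leaf_cur["angle"] - leaf_ref["angle"]) % (2 * math.pi)
    d_angle = min(d_angle, 2 * math.pi - d_angle)          # wrap into [0, pi]
    d_center = math.dist(leaf_cur["center"], leaf_ref["center"])
    inter = len(leaf_cur["pixels"] & leaf_ref["pixels"])
    union = len(leaf_cur["pixels"] | leaf_ref["pixels"])
    overlap = inter / union if union else 0.0               # intersection over union
    return list(fa_cur) + list(fa_ref) + [d_angle, d_center, overlap]


cur = {"angle": 0.1, "center": (0.0, 0.0), "pixels": {(0, 0), (1, 0)}}
ref = {"angle": 0.4, "center": (3.0, 4.0), "pixels": {(1, 0), (2, 0)}}
feat = tracking_features([0.0] * 6, [1.0] * 6, cur, ref)
print(len(feat), feat[-3:])   # 15 features; the last three are about [0.3, 5.0, 0.333]
```

A leaf that jumps onto its neighbor typically shows a large center distance and a low overlap with its own earlier mask, which is the kind of abrupt change these three measures are meant to expose.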

4 Performance Evaluation

Leaf segmentation aims to detect the correct number of leaves in a test image, leaf alignment aims to correctly estimate the two tips of every leaf, and leaf tracking aims to keep leaf IDs consistent over the video. To quantitatively evaluate joint leaf segmentation, alignment, and tracking, we therefore need ground truth for the number of leaves in each frame, the two tips of every leaf, and the IDs of all leaves in the video.

As shown in Fig. 9, we label the two tips of individual leaves and manually assign their IDs in several frames of each video. The labels of one frame are recorded as a matrix with one row per labeled leaf, each row holding the tip coordinates of that leaf. The label matrices of all labeled frames of all labeled videos form one collection, and the total number of labeled leaves is the sum of the numbers of labeled leaves over all these frames.

Fig. 9: Step 1 of Algorithm 2 on one video with several labeled frames. The per-frame outputs are accumulated over all video frames and used in step 2 to compute the three metrics. One frame also illustrates the process of Algorithm 1.
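Algorithm 1: Build leaf correspondence (leafMatch).
Input: Estimated leaf tips matrix and labeled leaf tips matrix of one frame.
Output: Number of leaves without correspondence, tip-based errors of the matched leaf pairs, and the leaf correspondence.
1. For every estimated leaf and every labeled leaf, compute their tip-based error normalized by the labeled leaf length, forming the error matrix.
2. Repeat as many times as the smaller of the numbers of estimated and labeled leaves: select the smallest error whose row and column have not been used, record the leaf pair and its error, and mark the row and column as used.
3. Set the number of leaves without correspondence to the absolute difference between the numbers of estimated and labeled leaves.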
Algorithm 2: Performance evaluation process.
Input: Tracking results , label results .
Output: , , and .

During the template transformation process, the points corresponding to the transformed template tips in become the estimated leaf tips . Similar to the data structure of , the tracking results of all videos over the labeled frames can be written as . Given and , Algorithm 2 details our performance evaluation, which is also illustrated by a synthetic example in Fig. 9.

We start by building the frame-to-frame leaf correspondence, as shown in Algorithm 1 and the red dotted box in Fig. 9. To match estimated leaves with labeled leaves, a matrix is computed that records the tip-based error between the tips of each estimated leaf and those of every labeled leaf, normalized by the labeled leaf length:


We build the leaf correspondence by finding the minimum errors in that share no row or column. This results in leaf pairs and leaves without correspondence. Finally, Algorithm 1 outputs the number of unmatched leaves , recording the tip-based errors, and recording the leaf correspondence. This frame-to-frame correspondence is built on all frames, and the results are accumulated in and . We build the video-to-video leaf correspondence using the accumulated . and are the tip-based errors of leaf pairs under frame-to-frame correspondence and video-to-video correspondence, respectively. The difference between and comes from estimated leaf : although it is well aligned with labeled leaf in the third frame, it has no leaf correspondence when all frames are considered together.
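The frame-to-frame matching step can be sketched as below. Because the normalization equation is not reproduced in this text, the sketch assumes the tip-based error is the mean of the two tip distances divided by the labeled leaf length, e(i, j) = (||t̂1_i - t1_j|| + ||t̂2_i - t2_j||) / (2 ||t1_j - t2_j||), and it reads "minimum errors that share no row or column" as a greedy selection. The names `tip_error` and `leaf_match` are hypothetical; counting unmatched labeled leaves as the first output is also an assumption.

```python
import math


def tip_error(est, lab):
    """Assumed tip-based error between one estimated and one labeled leaf.

    Each leaf is ((x1, y1), (x2, y2)): outer tip, then inner tip.
    """
    (e1, e2), (l1, l2) = est, lab
    length = math.dist(l1, l2)  # labeled leaf length used for normalization
    return (math.dist(e1, l1) + math.dist(e2, l2)) / (2.0 * length)


def leaf_match(est_leaves, lab_leaves):
    """Greedy one-to-one matching on the error matrix (one reading of Algorithm 1).

    Returns (number of unmatched labeled leaves, matched errors, (est, lab) pairs).
    """
    errs = [[tip_error(e, l) for l in lab_leaves] for e in est_leaves]
    free_rows = set(range(len(est_leaves)))
    free_cols = set(range(len(lab_leaves)))
    pairs, matched_errors = [], []
    while free_rows and free_cols:
        # Smallest remaining error whose row and column are both still unused.
        err, i, j = min((errs[i][j], i, j) for i in free_rows for j in free_cols)
        pairs.append((i, j))
        matched_errors.append(err)
        free_rows.remove(i)
        free_cols.remove(j)
    return len(free_cols), matched_errors, pairs
```

A greedy pass is simple and matches the textual description; an optimal assignment (e.g., the Hungarian method) is a possible alternative, but the text does not say which one the authors used.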

We define a threshold to operate on and , and compute three metrics by varying . The unmatched leaf rate is the percentage of unmatched leaves w.r.t. the total number of labeled leaves . Note that unmatched leaves come from two sources: leaves without correspondence, and corresponding leaf pairs whose tip-based errors are larger than . The landmark error is the average of all tip-based errors in that are smaller than . The tracking consistency is the percentage of leaf pairs whose tip-based errors in are smaller than , w.r.t. . Together, these three metrics evaluate the performance of our algorithm in leaf segmentation (), alignment (), and tracking ().
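A sketch of the three metrics at one threshold value follows. Since the symbols are lost in this text, the sketch makes explicit assumptions: the unmatched leaf rate and landmark error use the frame-to-frame errors, the tracking consistency uses the video-to-video errors, and both rates are normalized by the total number of labeled leaves. The function name `evaluate` is hypothetical.

```python
def evaluate(num_labeled, num_no_corr, frame_errors, video_errors, tau):
    """Return (unmatched leaf rate, landmark error, tracking consistency) at threshold tau.

    num_labeled:  total number of labeled leaves
    num_no_corr:  labeled leaves that received no correspondence at all
    frame_errors: tip-based errors of frame-to-frame leaf pairs
    video_errors: tip-based errors of video-to-video leaf pairs
    """
    good = [e for e in frame_errors if e < tau]
    # Unmatched = no correspondence, or a pair whose error is not below tau.
    unmatched = num_no_corr + (len(frame_errors) - len(good))
    unmatched_rate = unmatched / num_labeled
    landmark_error = sum(good) / len(good) if good else 0.0
    consistency = sum(1 for e in video_errors if e < tau) / num_labeled
    return unmatched_rate, landmark_error, consistency


# Sweeping tau produces one point per threshold on each performance curve.
curves = [evaluate(4, 1, [0.1, 0.2, 0.5], [0.1, 0.5, 0.6], t) for t in (0.15, 0.3, 0.7)]
```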

Note that only is considered in [9]. While it is reasonable to use to compute , it is unfair to use to compute , because leaf is still well aligned in the third frame even though it has no video-to-video correspondence.

5 Experiments and Results

5.1 Dataset and Templates

To study photosynthetic efficiency under different light conditions over days, our dataset includes videos, each captured from a unique Arabidopsis plant by a fluorescence camera. Each video has frames, with image resolutions ranging from to . For each video, we label the two tips of all visible leaves in frames, each being the middle frame of one day. In total, we label leaves. The collection of all labeled tips is denoted as .

To generate leaf templates, we select leaves with representative shapes and label the two tips of each leaf. As shown in Fig. 4, the basic leaf templates are manually rotated to be vertical. We select scales for each leaf shape to ensure that the scaled templates cover all possible leaf sizes in the dataset. Each scaled leaf template is rotated every in the space. Thus, the total number of leaf templates is with .¹

¹ The dataset, labels and templates used in this paper are publicly available at
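The template bank is the Cartesian product of shapes, scales, and rotation steps, so its size is the product of the three counts. A minimal sketch, assuming each vertical base template is represented only by its two tips centered at the origin (a hypothetical representation; real templates also carry edge points) and that `step_deg` is an integer number of degrees:

```python
import math


def transform_tip(p, scale, angle_deg):
    """Scale, then rotate, a tip point about the template center (the origin)."""
    a = math.radians(angle_deg)
    x, y = scale * p[0], scale * p[1]
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))


def build_template_bank(base_tips, scales, step_deg):
    """Enumerate every (shape, scale, rotation) combination.

    base_tips: list of (outer_tip, inner_tip) pairs of vertical base templates.
    """
    bank = []
    for shape_id, (outer, inner) in enumerate(base_tips):
        for s in scales:
            for angle in range(0, 360, step_deg):
                bank.append({
                    "shape": shape_id,
                    "scale": s,
                    "angle": angle,
                    "tips": (transform_tip(outer, s, angle), transform_tip(inner, s, angle)),
                })
    return bank
```

For example, 2 shapes, 3 scales, and a 30° step give 2 × 3 × 12 = 72 templates.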

5.2 Experimental Setup

For one plant video, our proposed method applies the alignment optimization to the last frame to generate a set of leaf candidates, which are then tracked toward the first frame. Another option is to apply the alignment method to all frames independently and build the leaf correspondence based on the leaf center distances between two frames. We compare our algorithm with the baseline Chamfer matching (CM), our prior work [5] and [9], and manual results. Below we describe the setup of our method and of the comparison methods.

Proposed Method templates are applied to the edge map of the last video frame to generate the same number of transformed templates. We first narrow down the search space by deleting transformed templates whose overlap ratio with the test image mask is less than . The remaining ones are used to calculate , , and in Eqn. 7. We experimentally set the parameters to: , and . After leaf alignment, leaf candidates are generated and saved as , which initializes leaf tracking toward the first video frame.
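The pruning step can be sketched as a simple filter. Assumptions, not taken from the paper: each transformed template is a hypothetical set of (row, col) pixels, and its overlap ratio is the fraction of its pixels that fall inside the plant mask.

```python
def prune_by_mask(templates, plant_mask, min_ratio):
    """Keep transformed templates that lie mostly on the plant.

    templates: dict mapping a template name to a set of (row, col) pixels.
    plant_mask: set of (row, col) pixels of the test image mask.
    Assumed overlap ratio: |template & mask| / |template|.
    """
    kept = {}
    for name, pixels in templates.items():
        if pixels and len(pixels & plant_mask) / len(pixels) >= min_ratio:
            kept[name] = pixels
    return kept
```

Since this check is much cheaper than computing the full objective terms, running it first shrinks the pool before the costlier optimization.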

Leaf tracking iteratively updates according to Eqn. 12. The parameters for leaf tracking are set to: , and . The iteration stops when the change in is small or the number of iterations exceeds . The area of the smallest leaf is pixels. The tracking procedure updates , which then initializes the next frame. We record the estimated tip coordinates of all leaf candidates in the labeled frames as .
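The stopping rule and the small-leaf deletion can be sketched independently of Eqn. 12, which is not reproduced here. In this sketch, `update` is a hypothetical stand-in for one Eqn. 12 step acting on a flat list of transformation parameters, "little change" is taken as the largest absolute parameter change falling below `tol`, and candidates are hypothetical dicts with an `area` key.

```python
def run_tracking(params, update, tol, max_iter):
    """Apply `update` until the parameter change is below tol or max_iter is reached.

    Returns (final parameters, number of iterations used).
    """
    for it in range(1, max_iter + 1):
        new = update(params)
        change = max(abs(a - b) for a, b in zip(new, params))
        params = new
        if change < tol:
            return params, it
    return params, max_iter


def drop_small(candidates, min_area):
    """Delete leaf candidates whose area falls below the smallest leaf area (pixels)."""
    return [c for c in candidates if c["area"] >= min_area]
```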

Fig. 10: (a) Alignment optimization where transformed leaf templates are deleted iteratively; (b) Tracking optimization where leaf candidates are transformed iteratively to align with the edge map ( iterations are used in this synthetic example). Numbers under images are the iteration number. Yellow/green dots are the estimated outer/inner leaf tips. Red contour is . Blue box encloses the edge points matching . The number on a leaf is the leaf ID. Best viewed in color.
Fig. 11: Qualitative results: (a) manual labels; (b) baseline CM; (c) [9]; (d) proposed method; and (e) manual results. Each column is one labeled frame in the video (day/frame).

Baseline Chamfer Matching The basic idea of CM is to align a single object in an image. To align multiple leaves in a plant image, we design the baseline CM as an iterative procedure that aligns each leaf separately. At each iteration, we apply all templates to the edge map of the test image to find a large pool of transformed leaf templates, which is the same as the first step of our multi-leaf alignment. The transformed template with the minimum CM distance is selected as a leaf candidate. We then update the edge map by deleting the edge points matched by the selected leaf candidate. The iteration continues until of the edge points have been deleted. We apply this method to the labeled frames of each video and build the leaf correspondence based on leaf center locations.
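The iterative baseline can be sketched with a brute-force Chamfer distance (the mean distance from each template point to its nearest edge point). This is an illustration under stated assumptions: the templates are a hypothetical pre-computed pool already placed in image coordinates, a template is not re-selected once chosen, and `match_radius` is a hypothetical tolerance for deciding which edge points a selected template "matches". A practical implementation would use a distance transform instead of brute force.

```python
import math


def chamfer_distance(template_pts, edge_pts):
    """Mean distance from each template point to its nearest edge point (brute force)."""
    return sum(min(math.dist(t, e) for e in edge_pts) for t in template_pts) / len(template_pts)


def baseline_cm(templates, edge_pts, stop_fraction, match_radius=1.0):
    """Greedy iterative Chamfer matching, selecting one leaf per iteration.

    Returns the indices of the selected templates, in selection order.
    """
    remaining = set(edge_pts)
    total = len(remaining)
    selected = []
    while remaining and (total - len(remaining)) / total < stop_fraction:
        pool = [k for k in range(len(templates)) if k not in selected]
        if not pool:
            break
        best = min(pool, key=lambda k: chamfer_distance(templates[k], remaining))
        matched = {e for e in remaining
                   if any(math.dist(e, t) <= match_radius for t in templates[best])}
        if not matched:
            break  # the best template explains no remaining edges; stop
        selected.append(best)
        remaining -= matched  # delete matched edge points from the edge map
    return selected
```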

Multi-leaf Alignment [5] We compare the proposed method with our earlier work [5], which does not have the term in . Its optimization process is the same as our proposed leaf alignment on the last frame. We apply [5] to all labeled frames and build the leaf correspondence based on leaf center distances.
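Building correspondence from leaf center distances, as used for the per-frame comparison methods, can be sketched as a greedy nearest-center assignment. The greedy order and the optional `max_dist` gate are assumptions for illustration; the text does not specify the exact matching rule.

```python
import math


def match_by_centers(prev_centers, cur_centers, max_dist=float("inf")):
    """Greedy one-to-one matching of leaves in two frames by center distance.

    Returns a dict mapping current-frame leaf index -> previous-frame leaf index.
    Current leaves left out of the dict would start new leaf IDs.
    """
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev_centers)
        for j, c in enumerate(cur_centers)
    )
    used_prev, used_cur, mapping = set(), set(), {}
    for d, i, j in pairs:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if i not in used_prev and j not in used_cur:
            mapping[j] = i
            used_prev.add(i)
            used_cur.add(j)
    return mapping
```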

Multi-leaf Tracking [9] We also compare the proposed method with [9]. The differences are the new in Eqn. 7, the modified in Eqn. 12, and the scheme for generating new leaf candidates during tracking (Fig. 8).

Manual Results To find the upper bound of our algorithm, we use the groundtruth labels to find the optimal set of . Specifically, for each labeled leaf, we select the leaf candidate with the smallest tip-based error among the transformed templates generated in the first step of our leaf alignment.

Fig. 12: Performance comparison of , , and vs. .

5.3 Experimental Results

Qualitative Results We apply all methods to the dataset of videos. Figure 10 shows the iterative results of leaf alignment and tracking. Figure 11 shows the results on the labeled frames within one video. These results illustrate that our method performs substantially better than the baseline CM, [5], and [9].

Since the baseline CM only considers the CM distance to segment each leaf separately, its leaf candidates are more likely to fail around the edge points, resulting in large landmark errors. While [9] can keep the leaf ID consistent, it has no scheme for generating new leaf candidates during tracking (e.g., leaf in Fig. 11). Our proposed method works perfectly on this video: its segmentation matches the labels and all leaves are well tracked. Leaf is deleted when it becomes too small. Note that the manual result is not always perfect, due to the finite number of templates. In contrast, our tracking method allows template transformation under any parameters in , rather than a finite set.

Quantitative Results We vary the threshold in and generate the performance curves for all methods, as shown in Fig. 12. Notably, our method maintains a lower landmark error and a higher tracking consistency while segmenting more leaves. When is relatively small, i.e., when we place very strict requirements on the accuracy of tip estimation, all methods work well for easy-to-align leaves. As increases, more hard-to-align leaves with relatively large tip-based errors are considered well aligned and contribute to the landmark error and the tracking consistency . Therefore, detecting more leaves generally results in larger and . However, with , the proposed method achieves a lower unmatched leaf rate than [5, 9], and a tracking consistency that is higher than [9] and higher than [5], without increasing the landmark error . This is mainly due to the enhanced objective functions and the new scheme for adding and deleting leaf candidates during tracking.

The manual result is the upper bound of our algorithm. As expected, will be and will be as increases, because we enforce the correspondence of all labeled leaves. However, will not be due to the finite number of templates. Overall, the proposed method performs much better than the baseline CM and our prior work. Nevertheless, there is still a gap between the proposed method and the manual results, which calls for future research.

Quality Prediction Two models are learned: a linear regression model for alignment quality prediction and an SVM classifier for tracking quality prediction.

(1) Alignment quality model: Data samples for evaluating our alignment quality model are selected from in Algorithm 2, which contains the tip-based errors of all leaf pairs, of which are less than . We select samples from for each interval of tip-based error within . Sample duplication is employed when the number of samples in a particular interval is less than . All samples with tip-based error larger than are also selected, but without duplication. Finally, we select samples and extract features for each sample. We assign so that the model output lies in the range of , and for all samples with . We randomly select samples as the test set, and the remaining samples are used to learn the regression model in Eqn. 16. Figure 13 (a) shows the result of the model on both training and test samples.
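The interval-balanced sampling can be sketched as follows. Assumptions: intervals are equal-width bins below an upper bound, bins with enough samples are randomly subsampled, and smaller non-empty bins are filled by repeating their samples so every original sample appears at least once. The name `stratified_samples` and its parameters are hypothetical.

```python
import random


def stratified_samples(errors, per_bin, bin_width, upper, seed=0):
    """Balance tip-based error samples across equal-width intervals below `upper`.

    Samples with error >= upper are all kept once, without duplication.
    """
    rng = random.Random(seed)
    n_bins = int(round(upper / bin_width))
    bins = [[] for _ in range(n_bins)]
    tail = []
    for e in errors:
        if e >= upper:
            tail.append(e)
        else:
            bins[min(int(e // bin_width), n_bins - 1)].append(e)
    chosen = []
    for b in bins:
        if len(b) >= per_bin:
            chosen.extend(rng.sample(b, per_bin))  # subsample a crowded interval
        elif b:
            reps = per_bin // len(b) + 1
            chosen.extend((b * reps)[:per_bin])  # duplicate a sparse interval
    return chosen + tail
```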

Fig. 13: (a) Alignment quality model applied to both training and test samples; (b) Tracking quality model applied to one video: the SVM classifier output (top row), the result after Gaussian filtering and thresholding (bottom row), and the two lines are the labeled starting and ending frames.

The coefficient of determination $R^2$ is calculated to measure how well the model fits our data, and is defined as:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

where $y_i$ and $\hat{y}_i$ are the target and predicted values of the $i$-th test sample, and $\bar{y}$ is the mean of the targets. In our model, and the correlation coefficient over all test samples is . Both values indicate a high correlation between the predicted and target values. This quality model is used to predict the alignment accuracy and to generate one predicted curve for each leaf in a video, as shown in Fig. 2.
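Both fit statistics can be computed directly. A minimal sketch of the standard coefficient of determination, R² = 1 - SS_res / SS_tot, and the Pearson correlation coefficient, written with the standard library only; the function names are illustrative.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

The two can disagree: a constant offset in the predictions lowers R² but leaves the correlation at 1, which is why reporting both is informative.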

(2) Tracking quality model: We manually go through all tracking videos and find videos that have a tracking failure of one leaf. As the goal of the tracking quality model is to detect when a tracking failure starts, we label the two frames where the failure starts and ends in each video. The starting frame is when a leaf candidate starts to move toward its neighboring leaves. The ending frame is when a leaf candidate totally overlaps another leaf. Among these failure samples, the shortest tracking failure lasts frames and the average length is frames.

We select - frames near the ending frame as negative training samples with class label , and frames evenly distributed before the failure starts as positive training samples with class label . Features are extracted as discussed in Sec. 3.3.2 and used to train the SVM classifier. The learned model is applied to all frames to predict the tracking quality. Figure 13 (b) shows an example of the output. We apply a Gaussian filter to remove outliers and discard predicted failures shorter than frames (the shortest length of the failure samples).
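The post-processing of the per-frame SVM output can be sketched as below. Assumptions: failure frames carry the negative class, so a smoothed score below `threshold` marks failure; the Gaussian kernel is truncated at three sigma with edge clamping; and `sigma` must be positive. The function names are hypothetical.

```python
import math


def gaussian_smooth(signal, sigma):
    """1-D Gaussian smoothing with edge clamping (kernel truncated at 3 sigma)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(kernel)
    n = len(signal)
    out = []
    for i in range(n):
        acc = sum(w * signal[min(max(i + k, 0), n - 1)] for k, w in zip(offsets, kernel))
        out.append(acc / norm)
    return out


def failure_segments(scores, sigma, threshold, min_len):
    """Smooth SVM scores, mark frames below threshold as failure, and drop
    failure runs shorter than min_len frames. Returns (start, end) pairs, end exclusive."""
    smooth = gaussian_smooth(scores, sigma)
    segs, start = [], None
    for i, s in enumerate(smooth + [float("inf")]):  # sentinel closes a trailing run
        if s < threshold and start is None:
            start = i
        elif s >= threshold and start is not None:
            if i - start >= min_len:
                segs.append((start, i))
            start = None
    return segs
```

In this sketch an isolated negative frame is absorbed by the smoothing, while a sustained negative run survives and is then checked against the minimum length.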

We compare the first frame of a predicted failure with that of a labeled failure. When their distance is less than frames (the average length of the failure samples), it is counted as a true detection; otherwise, it is a false detection. Using the leave-one-video-out testing scheme, the quality model generates true detections and false detections over labeled failures. Similarly, this quality model is applied during tracking and outputs a prediction curve for each leaf (shown in Fig. 2).
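The detection criterion can be sketched as a count over predicted failure start frames. Letting each labeled failure be claimed by at most one prediction is an assumption for illustration; the text only states the distance rule. The function name is hypothetical.

```python
def count_detections(predicted_starts, labeled_starts, max_gap):
    """Count (true, false) detections of tracking failure.

    A predicted failure is a true detection if its first frame lies within
    max_gap frames of a labeled failure start not already claimed.
    """
    unused = list(labeled_starts)
    tp = fp = 0
    for p in predicted_starts:
        hit = next((l for l in unused if abs(p - l) < max_gap), None)
        if hit is None:
            fp += 1
        else:
            tp += 1
            unused.remove(hit)
    return tp, fp
```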

Limitation Analysis Any vision algorithm has its limitations, so it is important to explore the limitations of the proposed method in alignment and tracking. First, an interesting question in multi-leaf alignment is to what extent our alignment method can correctly identify leaves in overlapping regions. We answer this question using a simple synthetic example. As shown in Fig. 14, our method performs well when the percentage of overlap is less than . Otherwise, it identifies two leaves as one, which appears reasonable when the percentage is higher (e.g., ).

Fig. 14: Leaf alignment results on synthetic leaves with various amount of overlap. From left to right, the overlap ratio w.r.t. the smaller leaf is , , , , , and respectively.
Fig. 15: Mean tip-based error with different initializations. The axes on top of the figures show the initial tip-based errors of transforming all leaf candidates.
Fig. 16: Example results: the first row shows the initialization, and the second row shows the tracking results.

Second, leaf tracking normally starts with a very good initialization of leaf candidates from the previous frame. Another interesting question is to what extent our tracking method can succeed with bad initializations. To study this, one frame is selected from videos with good tracking performance. We change the transformation parameters in to synthesize different amounts of distortion and apply the proposed tracking algorithm to these frames. For the translation parameter, we define as the translation ratio, and the direction is randomly selected. A leaf candidate is deleted only if it shrinks to a single point, in which case its tip-based error is set to . We compute the average tip-based error of all leaf candidates in one frame.
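The synthetic distortion of an initialization can be sketched as a perturbation of one candidate's parameters. Assumptions, not from the paper: candidates are hypothetical dicts with `cx`, `cy`, `theta` (degrees), `scale`, and `length`, and the translation ratio is taken relative to the leaf length, with a uniformly random direction.

```python
import math
import random


def distort(params, d_theta, s_factor, shift_ratio, rng):
    """Return a distorted copy of one leaf candidate's transformation parameters."""
    phi = rng.uniform(0.0, 2.0 * math.pi)  # random shift direction
    step = shift_ratio * params["length"]  # assumed: ratio of the leaf length
    return {
        "cx": params["cx"] + step * math.cos(phi),
        "cy": params["cy"] + step * math.sin(phi),
        "theta": (params["theta"] + d_theta) % 360.0,
        "scale": params["scale"] * s_factor,
        "length": params["length"] * s_factor,
    }
```

Passing an explicit `random.Random(seed)` keeps each distortion level reproducible across runs.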

By varying the rotation angle , scaling factor , and shift ratio , we generate the performance curves in Fig. 15, which shows the average and range of tip-based errors for all frames. As shown in Fig. 15, our proposed tracking method can reduce the initial tip-based error to a small amount. It is most robust to and most sensitive to .

Figure 16 shows some examples. For rotation angles less than , our method works well for different amounts of leaf rotation. For the scaling factor, as long as the leaf candidate is not too small, our method is very robust, even if we enlarge the original leaf candidates to be times larger. Our method is sensitive to the translation ratio because the shifting direction is randomly selected and leaf candidates are very likely to shift to the locations of neighboring leaves. Changing the initialization of and for separate leaves (leaf in Fig. 16) leads to better performance than for neighboring leaves (leaf in Fig. 16), because neighboring leaves overlap with each other and therefore influence the tracking performance. Overall, as the distortion increases, the average tip-based error increases, although some leaf candidates can still be well aligned.

Results of Efficiency Table II lists the average execution time of each method. Our method is more efficient than the baseline CM and [5]. It is slightly slower than [9] because we update and detect new leaf candidates during the tracking process. The times are measured with a Matlab implementation on a conventional laptop.

Methods            Baseline CM   [5]     [9]    Proposed
Time (sec./image)  51.28         16.42   1.98   2.15
TABLE II: Computational efficiency comparison (sec./image).

6 Conclusions

In this paper, we identify a new computer vision problem of leaf segmentation, alignment, and tracking from fluorescence plant videos. Leaf alignment and tracking are two optimization problems based on Chamfer matching and leaf template transformation. Two models are learned to predict the quality of leaf alignment and tracking in real time. A quantitative evaluation algorithm is designed to evaluate the performance. The limitations of our algorithm are studied and experimental results show the effectiveness, efficiency, and robustness of the proposed method.

With the leaf boundary and structure information over time, the photosynthetic efficiency can be computed for each leaf, which paves the way for leaf-level photosynthetic analysis. In the future, 3D leaf alignment and tracking will be studied in order to ultimately model the interaction between 3D leaves and light rays. Note that very little domain knowledge of plants is used in our alignment and tracking optimization problems, and none in the evaluation process. Therefore, the proposed method and the evaluation scheme are potentially applicable to other multi-object alignment and tracking problems.


  • [1] Ladislav Nedbal and John Whitmarsh, “Chlorophyll fluorescence imaging of leaves and fruits,” in Chlorophyll a Fluorescence, pp. 389–407. Springer, 2004.
  • [2] Xu Zhang, Ronald J. Hause, and Justin O. Borevitz, “Natural genetic variation for growth and development revealed by high-throughput phenotyping in Arabidopsis thaliana,” G3: Genes, Genomes, Genetics, vol. 2, no. 1, pp. 29–34, 2012.
  • [3] C. P. Chen, X. G. Zhu, and S. P. Long, “The effect of leaf-level spatial variability in photosynthetic capacity on biochemical parameter estimates using the Farquhar model: A theoretical analysis,” Plant Physiology, vol. 148, no. 2, pp. 1139–1147, 2008.
  • [4] Chin-Hung Teng, Yi-Ting Kuo, and Yung-Sheng Chen, “Leaf segmentation, its 3D position estimation and leaf classification from a few images with very close viewpoints,” in Image Analysis and Recognition, vol. 5627, pp. 937–946. Springer, 2009.
  • [5] Xi Yin, Xiaoming Liu, Jin Chen, and David M Kramer, “Multi-leaf alignment from fluorescence plant images,” in IEEE Winter Conf. on Applications of Computer Vision (WACV), Steamboat Springs CO, Mar. 2014.
  • [6] Jonas Vylder, Daniel Ochoa, Wilfried Philips, Laury Chaerle, and Dominique Straeten, “Leaf segmentation and tracking using probabilistic parametric active contours,” in Computer Vision/Computer Graphics Collaboration Techniques, vol. 6930, pp. 75–85. Springer, 2011.
  • [7] Harry G. Barrow, Jay M. Tenenbaum, Robert C. Bolles, and Helen C. Wolf, “Parametric correspondence and Chamfer matching: Two new techniques for image matching,” Tech. Rep., DTIC Document, 1977.
  • [8] Bastian Leibe, Edgar Seemann, and Bernt Schiele, “Pedestrian detection in crowded scenes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR). IEEE, 2005, vol. 1, pp. 878–885.
  • [9] Xi Yin, Xiaoming Liu, Jin Chen, and David M Kramer, “Multi-leaf tracking from fluorescence plant videos,” in Proc. Int. Conf. Image Processing (ICIP), Paris, France, Oct. 2014.
  • [10] Long Quan, Ping Tan, Gang Zeng, Lu Yuan, Jingdong Wang, and Sing Bing Kang, “Image-based plant modeling,” in ACM SIGGRAPH, Boston, Massachusetts, 2006, pp. 599–604, ACM.
  • [11] Derek Bradley, Derek Nowrouzezahrai, and Paul Beardsley, “Image-based reconstruction and synthesis of dense foliage,” ACM Trans. Graph., vol. 32, no. 4, pp. 74, 2013.
  • [12] Yann Chéné, David Rousseau, Philippe Lucidarme, Jessica Bertheloot, Valérie Caffier, Philippe Morel, Étienne Belin, and François Chapeau-Blondeau, “On the use of depth camera for 3D phenotyping of entire plants,” Computers and Electronics in Agriculture, vol. 82, pp. 122–127, 2012.
  • [13] Guillaume Cerutti, Laure Tougne, Julien Mille, Antoine Vacavant, and Didier Coquin, “Understanding leaves in natural images–a model-based approach for tree species identification,” Comput. Vision and Image Understanding, vol. 117, no. 10, pp. 1482–1501, 2013.
  • [14] Sofiene Mouine, Itheri Yahiaoui, and Anne Verroust-Blondet, “Advanced shape context for plant species identification using leaf image retrieval,” in Proc. ACM Int. Conf. Multimedia Retrieval (ICMR), Hong Kong, China, 2012, pp. 49:1–49:8, ACM.
  • [15] Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David W. Jacobs, W. John Kress, Ida C. Lopez, and João VB. Soares, “Leafsnap: A computer vision system for automatic plant species identification,” in Proc. European Conf. Computer Vision (ECCV), pp. 502–516. Springer, 2012.
  • [16] Guillaume Cerutti, Laure Tougne, Julien Mille, Antoine Vacavant, Didier Coquin, et al., “A model-based approach for compound leaves understanding and identification,” in Proc. Int. Conf. Image Processing (ICIP), 2013, pp. 1471–1475.
  • [17] Xiao-Feng Wang, De-Shuang Huang, Ji-Xiang Du, Huan Xu, and Laurent Heutte, “Classification of plant leaf images with complicated background,” Applied mathematics and computation, vol. 205, no. 2, pp. 916–926, 2008.
  • [18] Guillaume Cerutti, Laure Tougne, Antoine Vacavant, and Didier Coquin, “A parametric active polygon for leaf segmentation and shape estimation,” in Advances in Visual Computing, pp. 202–213. Springer, 2011.
  • [19] Xianghua Li, Hyo-Haeng Lee, and Kwang-Seok Hong, “Leaf contour extraction based on an intelligent scissor algorithm with complex background,” in 2nd International Conference on Future Computers in Education, 2012, pp. 215–220.
  • [20] Antoine Vacavant, Tristan Roussillon, Bertrand Kerautret, and Jacques-Olivier Lachaud, “A combined multi-scale/irregular algorithm for the vectorization of noisy digital contours,” Comput. Vision and Image Understanding, vol. 117, no. 4, pp. 438–450, 2013.
  • [21] Ming-Yu Liu, Oncel Tuzel, Ashok Veeraraghavan, and Rama Chellappa, “Fast directional Chamfer matching,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR). IEEE, 2010, pp. 1696–1703.
  • [22] Tianyang Ma, Xingwei Yang, and Longin Jan Latecki, “Boosting Chamfer matching by learning Chamfer distance normalization,” in Proc. European Conf. Computer Vision (ECCV), pp. 450–463. Springer, 2010.
  • [23] T. Cootes, C. Taylor, and A. Lanitis, “Active shape models: Evaluation of a multi-resolution method for improving image search,” in Proc. British Machine Vision Conf. (BMVC), York, UK, Sept. 1994, vol. 1, pp. 327–336.
  • [24] Iain Matthews and Simon Baker, “Active appearance models revisited,” Int. J. Comput. Vision, vol. 60, no. 2, pp. 135–164, 2004.
  • [25] Luka Čehovin, Matej Kristan, and Aleš Leonardis, “Is my new tracker really better than yours?,” in IEEE Winter Conf. on Applications of Computer Vision (WACV), 2014.
  • [26] Xiaoming Liu, Ting Yu, Thomas Sebastian, and Peter Tu, “Boosted deformable model for human body alignment,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, June 2008, pp. 1–8.
  • [27] Xiaoming Liu, “Discriminative face alignment,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1941–1954, 2009.
  • [28] Eyung Lim, Xudong Jiang, and Weiyun Yau, “Fingerprint quality and validity analysis,” in Proc. Int. Conf. Image Processing (ICIP). IEEE, 2002, vol. 1, pp. 469–472.
  • [29] Kamal Nasrollahi and Thomas B. Moeslund, “Face quality assessment system in video sequences,” in Biometrics and Identity Management, pp. 10–18. Springer, 2008.