Automated Image Analysis Framework for the High-Throughput Determination of Grapevine Berry Sizes Using Conditional Random Fields

12/15/2017 ∙ by Ribana Roscher, et al. ∙ University of Bonn

The berry size is one of the most important fruit traits in grapevine breeding. Non-invasive, image-based phenotyping promises a fast and precise method for monitoring grapevine berry size. In the present study, an automated image analysis framework was developed to estimate the size of grapevine berries from images in a high-throughput manner. The framework includes (i) the detection of circular structures that are potentially berries and (ii) the classification of these structures into the classes 'berry' and 'non-berry' using a conditional random field. The approach uses the concept of one-class classification, since only the target class 'berry' is of interest and needs to be modeled. Moreover, the classification is carried out with an automated active learning approach, i.e. no user interaction is required during the classification process, and the process adapts automatically to changing image conditions, e.g. illumination or berry color. The framework was tested on three datasets consisting of 139 images in total. The images were taken in an experimental vineyard at different stages of grapevine growth according to the BBCH scale. The mean berry size of a plant estimated by the framework correlates with the manually measured berry size with a correlation coefficient of 0.88.




1 Introduction

Grapevine (Vitis vinifera L. subsp. vinifera) is one of the oldest and economically most important fruit crops. Grapevines are highly susceptible to various diseases such as powdery and downy mildew, which require intensive plant protection. Hence, grapevine breeders around the world select for new cultivars that are highly disease resistant, well adapted to the climate and of high quality (Töpfer et al. (2011)). Because grapevines are cultivated as perennial plants, traits such as fruit traits can only be evaluated in the vineyard and are highly influenced by environmental factors. Their evaluation therefore requires several repetitions. Up to now, phenotyping of grapevines in vineyards has been carried out by visual estimation using the BBCH scale (Bloesch and Viret (2008)) or OIV descriptors (OIV (2001)). This is very time-consuming, requires a lot of expertise and is expensive. The resulting data are subjective, which makes subsequent analyses, such as the identification of new Quantitative Trait Loci (QTL), more difficult. Accurate phenotyping is the key tool for future plant breeding. Objectivity, automation and precision of phenotypic data evaluation are crucial in order to reduce the phenotyping bottleneck.

Digital image analysis and image interpretation techniques are a promising technology for high-throughput phenotyping. They can (a) increase the number of phenotyped samples, (b) improve the quality of recording and (c) minimize error variation. Low-level analysis tasks such as finding geometric objects (e.g. Peng et al. (2007); Chan and Shen (2005)), as well as tasks involving higher-level semantic information, have been addressed in the literature for various applications. In particular, higher-level knowledge about the context and the spatial arrangement of objects was shown early on to be beneficial for object detection and semantic image segmentation (e.g. Bar and Ullman (1996); Biederman et al. (1982); Palmer (1975)). A well-established way to incorporate this knowledge is the conditional random field, introduced by Lafferty et al. (2001). It has been used, for example, by Gould et al. (2008), Galleguillos et al. (2008) and Rabinovich et al. (2007) to incorporate semantic context between detected objects of different pre-defined classes. A different approach was taken by Lafarge et al. (2010) and Descombes et al. (2009), who extract different kinds of geometric objects with point processes, yielding an optimal object configuration. Such approaches assume that the objects are disconnected from each other and that the background is distinct enough for the objects to be clearly visible (Lempitsky and Zisserman (2010)). This assumption does not always hold, and it holds even less often for phenotyping in the field.

One challenge in digital image analysis for high-throughput phenotyping is that only one target class, such as ’berry’, is of interest. Other classes, which are necessary for multi-class classification, are hard to gather and in many cases cannot be specified due to their high intra- and inter-class variability. To overcome this problem, the concept of one-class classification has been introduced, which distinguishes one target class from all other classes without explicitly defining them (e.g. Khan and Madden (2010); Tax (2001); Moya and Hostetler (1993)). In this framework, conditional random fields and a one-class classifier are combined in order to find objects that belong to the target class ’berry’. Similar to Song et al. (2013), who use a conditional random field to model temporal dependencies in a one-class dataset, this framework exploits information about the spatial arrangement of berries in clusters. Moreover, the framework uses an active learning approach (Settles (2010)) which defines the one-class dataset from scratch in each image. This has the advantage that no user interaction is required during the classification process. In addition, the process adapts automatically to changing conditions, e.g. illumination or berry color.

Image-based detection of grapes is known from precision viticulture. For example, Nuske et al. (2011) detect and count berries for yield estimation, Berenstein et al. (2010) detect and localize berry clusters for selective spraying, and Mazzetto et al. (2010) monitor canopy health and vigour using optical and analogue sensors. Image-based phenotyping in vineyards that supports the identification of new molecular markers for grapevine breeding requires a more detailed detection and survey of small structures, e.g. single grapevine berries. The grapevine berry size is one of the most important target fruit traits in viticulture (Fanizza et al. (2005); Cabezas et al. (2006); Costantini et al. (2008)), and grapevine cultivars should preferably have berries of uniform size (Beslic et al. (2009)). In general, the berry diameter is estimated by experts using OIV descriptor number 221 (OIV (2001)). This descriptor classifies the berry size into five classes (class 1: very narrow berries up to about  mm; class 2: narrow berries about  mm; class 3: medium berries about  mm; class 4: wide berries about  mm; and class 5: very wide berries about  mm and more). Visual estimates of the berry diameter are subjective, resulting in error variation between different people. In addition, a precision of only  mm can be achieved, which is too inaccurate for precise berry size QTL calculations. Moreover, manually estimating sufficient numbers of berries is very time-consuming, so the classification of berry size is only feasible for selected breeding material. Differences in berry size of only  mm have to be resolved on thousands of grapevines within a few days (to ensure comparability of records), which is possible using image-based approaches.
The framework presented in the current study aims at an automated estimation of the size of grapevine berries from single images, which were taken in an experimental vineyard at different developmental stages. This includes the detection of representative berries and the determination of their diameters.

The field experiments, the plant material and the images are introduced in Section 2.1 and Section 2.2. Section 2.3 presents the proposed framework and its parts, which are explained in more detail in Section 2.4. The experiments and the obtained results are presented and discussed in Section 3. The paper concludes in Section 4.

2 Material and Methods

2.1 Plant Material

Field experiments were conducted during the growing season of 2012. Tests involved rows of the Vitis vinifera ssp. vinifera cultivars ’Riesling’, ’Pinot Blanc’, ’Pinot Noir’ and ’Dornfelder’ at the experimental vineyard of Geilweilerhof located in Siebeldingen, Germany (N 49°21.747, E 8°04.678). Fifteen plants per cultivar were used for image acquisition and the measurement of reference data.

2.2 Image acquisition and reference measurements

Images were acquired with a single-lens reflex (SLR) camera (Canon® EOS 60D). Camera calibration was performed according to Abraham and Hau (1997) with a wide-angle lens of  mm equivalent focal length. Images (8-bit RGB, pixel) of grapevines were captured in the vineyard from a distance of about  m at three different plant development stages, BBCH 75, BBCH 81 and BBCH 89 (Bloesch and Viret (2008)). The images were acquired under natural illumination in field conditions with manually controlled exposure and were saved for offline processing. Reference measurements were conducted manually in parallel to image acquisition. For this purpose, 50 berries per plant, cultivar and BBCH stage were randomly selected, and their diameters were measured with an electronic calliper (Insize® Co.LTD, Conrad electronics SE, Hirschau, Germany). In order to transform measurements in the images from pixel to mm, colored labels with a width of  mm (Roth® GmbH, Karlsruhe, Germany) were fixed on the wires in the vineyard.

2.3 Framework

A five-step framework was developed in Matlab® (Mathworks, Ismaning, Germany) in order to extract phenotypic data from images (Figure 1). The steps include various image analysis tools and interpretation methods, which are explained in more detail in Section 2.4. The challenge for the framework is to detect as many berries as possible, in order to extract a representative amount of phenotypic data. At the same time, the rate of falsely detected berries must be kept as low as possible to ensure a high quality of the extracted data.

Figure 1: Image analysis framework for automated detection and measurement of grapevine berries. In (Step 1) the images are pre-processed for (Step 2), in which circles for a reference set and a candidate set are detected. Various complementary features used for the classification are extracted in (Step 3). The detected candidate berries are classified as either ’berry’ or ’non-berry’ in (Step 4). Finally, in (Step 5) the berry sizes are determined.
(Step 1) Pre-processing

The image is adjusted automatically regarding brightness, color and contrast in order to compensate for illumination effects. For this, the image is converted into the YIQ color space and adjusted there. In this color space, Y is the luminance, and I and Q contain the chrominance information. Moreover, the contrast is stretched.
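The conversion step can be illustrated with a minimal Python sketch. It assumes RGB values scaled to [0, 1] and uses the standard NTSC RGB-to-YIQ coefficients. The contrast stretch shown is a plain linear min-max mapping of the luminance Y. The exact brightness and color adjustments made by the Matlab package the authors used are not specified here, so the function names (`rgb_to_yiq`, `stretch`, `preprocess`) are hypothetical.

```python
# Hypothetical sketch: RGB -> YIQ with the standard NTSC coefficients and a
# linear contrast stretch of the luminance channel Y.

RGB_TO_YIQ = (
    (0.299, 0.587, 0.114),    # Y: luminance
    (0.596, -0.274, -0.322),  # I: chrominance
    (0.211, -0.523, 0.312),   # Q: chrominance
)


def rgb_to_yiq(pixel):
    """Convert one (r, g, b) tuple with values in [0, 1] to (y, i, q)."""
    return tuple(sum(c * v for c, v in zip(row, pixel)) for row in RGB_TO_YIQ)


def stretch(values, low=0.0, high=1.0):
    """Linearly map the range [min(values), max(values)] onto [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def preprocess(pixels):
    """Return the contrast-stretched luminance of a flat list of RGB pixels."""
    yiq = [rgb_to_yiq(p) for p in pixels]
    return stretch([y for y, _, _ in yiq])
```

For example, `preprocess([(0.2, 0.2, 0.2), (0.4, 0.4, 0.4), (0.6, 0.6, 0.6)])` maps the three gray levels to approximately 0, 0.5 and 1.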

(Step 2) Detection of circular structures (see Section 2.4.1)

Two sets of circles are determined using circular Hough transform (Peng et al. (2007)):

  • Automated detection of reference circles: Reference berries are image patches that show distinct circular structures. The most dominant circles in an image are assumed to be berries, which can then be used as training data in the classification process. The circle detector is therefore applied with strict constraints, i.e. the detector returns only very distinct circles.

  • Automated detection of berry candidates: Candidates for grapevine berries are all image patches that contain at least a weak circular structure potentially showing a berry. The candidates are extracted by the circle detector using weak constraints, i.e. the detector also returns circles with low responses. The reference set is a subset of the candidate set, and all candidates represent the test data for the classification process. The test data are classified into the classes ’berry’ and ’non-berry’.
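The relation between the two sets can be sketched as follows. The sketch assumes that each detected accumulator peak carries a score and that "strict" and "weak" constraints are simply a high and a low score threshold. The function `split_circles` and its thresholds are illustrative, not the authors' parameters. Because the reference set is filtered from the candidates, it is a subset by construction.

```python
def split_circles(peaks, strict, weak):
    """Split accumulator peaks into a reference set and a candidate set.

    peaks: list of (row, col, radius, score) tuples (hypothetical format).
    strict, weak: assumed score thresholds, strict >= weak.
    Every reference circle is also a candidate because strict >= weak.
    """
    if strict < weak:
        raise ValueError("strict threshold must not be below the weak one")
    candidates = [p for p in peaks if p[3] >= weak]          # weak constraints
    reference = [p for p in candidates if p[3] >= strict]    # strict constraints
    return reference, candidates
```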

(Step 3) Feature extraction (see Section 2.4.2)

Complementary features, namely color, histogram-of-oriented-gradients (HoG) features (Dalal and Triggs (2005)) and gist features (Oliva and Torralba (2001)), are extracted from image patches around the detected circles. The high-dimensional features are transformed into a low-dimensional feature space and used as the input for the classification process.

(Step 4) Classification of the image patches (see Section 2.4.3)

The classification of the image patches is performed in two steps.

  • Estimation of posterior probabilities: In order to estimate posterior probabilities, feature-wise thresholds are derived from the training data. After the application of the thresholds to the test data, the output is transformed with a sigmoid function into probabilities.

  • Application of a conditional random field (Lafferty et al. (2001)): A conditional random field is used to classify the extracted features of the candidates into the classes ’berry’ and ’non-berry’. It uses the estimated posterior probabilities and prior knowledge about the spatial arrangement of berries, i.e. that grapevine berries are arranged in clusters and have similar features such as color.

(Step 5) Determination of the berry size (see Section 2.4.4)

The size of the berries in the image is derived from the diameters of the detected circles classified as ’berry’, using a single scale factor that transforms pixel values into mm. A more accurate result could be obtained with a depth map, which assigns an individual depth to each pixel rather than one depth to all pixels.

2.4 Image Analysis and Interpretation Methods

In the following section, the proposed berry detection framework is introduced in more detail. Vectors are denoted with small bold symbols, and matrices (with their elements and column vectors) with capital symbols. Calligraphic symbols are used for sets. The elements (scalars or vectors) of a set can be collected in a vector or a matrix by concatenation, using the same letter of the alphabet.

2.4.1 Circle Detection

The circular Hough transform presented by Peng et al. (2007) is utilized with some modifications in order to detect circular shaped objects like berries in images and estimate their position  and radius . The values of possible diameters are restricted to a range .

(a) Original image
(b) Gradients
(c) Accumulation array
(d) Signature curve
(e) Detected Circle
Figure 2: Illustrated steps of the detection of circular structures. The gradients of the original image are obtained using a Sobel filter. The gradients are transformed into an accumulation array, in which the bright peak indicates the position of the circle center. Using the signature curve, the radius of the circle can be obtained.

First, a Sobel filter is applied to the gray-valued image I, yielding the gradient image $G_v$ in vertical direction and $G_h$ in horizontal direction. The gradient magnitude is given element-wise by $G = \sqrt{G_v^2 + G_h^2}$.
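Below is a minimal pure-Python sketch of this step. It assumes the gray-valued image is a list of rows and that borders are handled by replicating edge pixels, since the text does not state the border treatment. `sobel` returns the vertical and horizontal gradient images and their element-wise magnitude. All names are hypothetical.

```python
import math

SOBEL_H = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to horizontal change
SOBEL_V = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to vertical change


def correlate3(img, kernel):
    """3x3 correlation with replicated borders; img is a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)   # clamp row index
                    cc = min(max(c + dc, 0), w - 1)   # clamp column index
                    acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out


def sobel(img):
    """Return (G_v, G_h, G), where G is the element-wise gradient magnitude."""
    gv = correlate3(img, SOBEL_V)
    gh = correlate3(img, SOBEL_H)
    g = [[math.hypot(a, b) for a, b in zip(rv, rh)] for rv, rh in zip(gv, gh)]
    return gv, gh, g
```

For an image with a vertical intensity step, such as rows `[0, 0, 1, 1]`, only the horizontal gradient responds, and the magnitude peaks at the two columns adjacent to the step.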

In order to find circular structures, the gradient field is converted into an accumulation array A of the same size. For this, G is first thresholded, yielding a binary image. In a second step, each non-zero element of the binary image votes for several positions in the accumulation array with certain weights. High values in the accumulation array indicate centers of circles.

In contrast to Peng et al. (2007), in this framework the threshold is determined automatically using a standard deviation ridge detector (Hidayat and Green (2009)). The output of the standard deviation ridge detector is denoted by S. Each pixel value of S is the standard deviation of I calculated in the local neighborhood of that pixel. The largest values indicate boundaries between regions with different textures. The threshold is determined by testing several values and correlating each resulting binary image with S; the largest correlation coefficient indicates the best threshold. In this way, enough gradients are suppressed to remove noise, while distinct gradients remain for the search for circular structures. A fixed threshold would not lead to good results, because changing image conditions such as illumination and berry color change the magnitudes of the gradients.
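The automatic threshold selection can be sketched as follows. Several choices are assumptions made for illustration: the window radius of S, the list of tested thresholds, and the use of the Pearson correlation between the flattened binary image and S. Hidayat and Green (2009) may define the detector differently.

```python
import math


def local_std(img, radius=1):
    """Standard deviation of img in a (2*radius+1)^2 window around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(r - radius, 0), min(r + radius + 1, h))
                    for cc in range(max(c - radius, 0), min(c + radius + 1, w))]
            mean = sum(vals) / len(vals)
            out[r][c] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_threshold(grad_mag, s_map, candidates):
    """Pick the threshold whose binary image correlates best with S."""
    g = [v for row in grad_mag for v in row]
    s = [v for row in s_map for v in row]
    best_t, best_rho = None, -2.0
    for t in candidates:
        binary = [1.0 if v > t else 0.0 for v in g]
        rho = pearson(binary, s)
        if rho > best_rho:
            best_t, best_rho = t, rho
    return best_t, best_rho
```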

Each non-zero value in the binary image votes for several positions in the accumulation array. These are all pixel coordinates on the line segment defined by the gradient direction and the range of possible radii. Because the gradient directions point either towards the circle center or away from it, the sign of the gradient is ignored and the vote is added in both directions. For each vote, the weight is derived from the output of the standard deviation ridge detector S. The votes are accumulated, and peaks in the accumulation array indicate probable positions of circle centers, as can be seen in Figure 2. In order to find distinct peaks, the array is smoothed and a local maximum filter is applied. Finally, the array is thresholded twice: a high threshold yields the reference circles and a low threshold yields the candidate circles.
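The voting scheme can be sketched as follows. The sketch assumes a list-of-rows representation, integer radii from `r_min` to `r_max`, and nearest-pixel rounding of vote positions. Smoothing, local maximum filtering and the two final thresholds are omitted. `hough_votes` is a hypothetical name, not Peng et al.'s implementation.

```python
import math


def hough_votes(binary, gv, gh, weights, r_min, r_max):
    """Accumulate circle-center votes along both gradient directions.

    binary, gv, gh, weights: lists of rows of equal size. Each edge pixel votes
    for all positions at distances r_min..r_max along +/- its gradient
    direction, and each vote is weighted by weights[r][c] (e.g. values of S).
    """
    h, w = len(binary), len(binary[0])
    acc = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not binary[r][c]:
                continue
            norm = math.hypot(gv[r][c], gh[r][c])
            if norm == 0:
                continue
            dr, dc = gv[r][c] / norm, gh[r][c] / norm
            for sign in (1, -1):              # the gradient sign is ignored
                for rad in range(r_min, r_max + 1):
                    rr = round(r + sign * rad * dr)
                    cc = round(c + sign * rad * dc)
                    if 0 <= rr < h and 0 <= cc < w:
                        acc[rr][cc] += weights[r][c]
    return acc
```

In a toy case with four edge pixels at distance 3 around a center, all pointing outward, the four votes meet at the center and form the single peak of the accumulation array.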

The radii of the detected circle centers are determined using so-called signature curves (Figure 2(d)). The signature curve of a detected circle center is a function of the radius. Its value for a given radius is computed from the gradients supporting a circle of that radius. The more distinct the circular structure at a specific radius, the higher the response. A more detailed description of the signature curves can be found in Peng et al. (2007).
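A simplified signature curve can be sketched as follows. For each tested radius, it samples points on a circle and averages the absolute projection of the gradient onto the radial direction. This is an illustrative stand-in under that assumption; Peng et al. (2007) define the curve in more detail. The names `signature_curve` and `best_radius` and the sample count are hypothetical.

```python
import math


def signature_curve(gv, gh, center, radii, samples=64):
    """Response per radius: mean radial gradient support on the circle."""
    h, w = len(gv), len(gv[0])
    cr, cc = center
    curve = []
    for rad in radii:
        total, count = 0.0, 0
        for k in range(samples):
            a = 2 * math.pi * k / samples
            ur, uc = math.sin(a), math.cos(a)          # radial unit vector
            r, c = round(cr + rad * ur), round(cc + rad * uc)
            if 0 <= r < h and 0 <= c < w:
                # projection of the gradient onto the radial direction
                total += abs(gv[r][c] * ur + gh[r][c] * uc)
                count += 1
        curve.append(total / count if count else 0.0)
    return curve


def best_radius(gv, gh, center, radii):
    """Radius at which the signature curve peaks."""
    curve = signature_curve(gv, gh, center, radii)
    return radii[max(range(len(radii)), key=curve.__getitem__)]
```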

Since the images are very cluttered and the gradient directions can be noisy, an additional step, not present in Peng et al. (2007), refines the positions and radii of the detected circles. For this, a sliding window approach is used, i.e. a localized search over space and scale. For each detected circle, a small accumulation array with three dimensions is built: shift of the circle center in vertical direction, shift of the circle center in horizontal direction, and scale of the radius. Starting from the current position and radius of a circle, the circle center is shifted in both directions within a given range, and the radius is scaled within a given range. The adjusted radius must lie within the restricted range introduced at the beginning of this section. For each shift and scale, a circle with discrete pixel coordinates is constructed. The sum of the pixel values of S at these coordinates is used as the weight in the accumulation array. The peak in the accumulation array indicates the shift and scale of the detected circle. One example can be seen in Figure 3.
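The refinement can be sketched as a brute-force search over integer center shifts and a list of radius scale factors. As described above, each circle is scored by the sum of S over its discretized pixels. The shift range, the scale list and the sampling density of the hypothetical helper `circle_pixels` are assumptions.

```python
import math


def circle_pixels(r0, c0, radius, samples=90):
    """Distinct integer pixel coordinates on a discretized circle."""
    pts = set()
    for k in range(samples):
        a = 2 * math.pi * k / samples
        pts.add((round(r0 + radius * math.sin(a)), round(c0 + radius * math.cos(a))))
    return pts


def refine_circle(s_map, r0, c0, radius, max_shift, scales, r_range):
    """Return the (row, col, radius) maximizing the sum of S on the circle."""
    h, w = len(s_map), len(s_map[0])
    best, best_score = (r0, c0, radius), float("-inf")
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            for s in scales:
                rad = radius * s
                if not (r_range[0] <= rad <= r_range[1]):
                    continue  # adjusted radius must stay in the allowed range
                score = sum(s_map[r][c]
                            for r, c in circle_pixels(r0 + dr, c0 + dc, rad)
                            if 0 <= r < h and 0 <= c < w)
                if score > best_score:
                    best, best_score = (r0 + dr, c0 + dc, rad), score
    return best
```

Starting from a slightly misplaced guess such as `(9, 11, 5)`, the search recovers `(10, 10, 5.0)` for a synthetic S that is non-zero only on the ring of radius 5 around `(10, 10)`.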

(a) Detected circles without the refinement
(b) Detected circles after the refinement
Figure 3: The circle centers and radii of the detected circles (left image) are refined using a sliding window approach. The detected circles in the right image fit the borders of the grapevine berries better than the output of the circular Hough transform (left image).

In summary, the circle detector uses a coarse-to-fine strategy. First, the positions and radii of the circles are roughly determined using a circular Hough transform. Afterwards, they are adjusted using a sliding window approach.

The detected circle centers and radii of the reference circles and candidates are collected in the reference set and the candidate set, respectively. The reference set is a subset of the candidate set.

The training and test data for the classification are derived from the detected circles. All candidates are to be classified and thus form the test data. Training data are necessary to learn a classification model with which each candidate can be classified. Since no training data are given in advance, an active learning approach is used. This approach assumes that most of the detected reference circles are berries and can thus be labeled as the class ’berry’. To be robust against sporadic falsely detected reference circles, not all reference circles are used as training data, which will be explained in more detail in Section 2.4.3. The training data are actively acquired from scratch by the circle detector for each new image. As a result, the classification model automatically adapts to changing conditions such as illumination or berry color.

2.4.2 Feature Extraction

Within the framework, color features, gradient-based histogram-of-oriented-gradients (HoG) features and gist features describing the dominant spatial structure are used. Because these highly complementary features are treated separately, they can be given different weights in the conditional random field. This ensures the best possible discrimination of patches into the classes ’berry’ and ’non-berry’.

Square image patches are defined around the circle centers, with a size that depends on the radius of the respective detected circle. All patches are resized to a uniform size to ensure an equal feature dimension for all candidates. The dimensions of the HoG and gist features are independent of the patch size. The resizing is, however, necessary for the color features, which are used in vectorized form.

Color Features

The berries do not differ significantly in color from leaves and grass, which make up most of the background. Nevertheless, color features can be used to discriminate circles on berries from circles positioned on canes, artificial background objects and the ground. RGB color features are extracted and vectorized, so that a feature vector of fixed dimension is assigned to each candidate.
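Patch extraction, resizing and color vectorization can be sketched as follows. The actual patch sizes are not given here, so the patch side of 2r + 1 pixels, the nearest-neighbor resampling and the default output size are assumptions. The helper names are hypothetical.

```python
def extract_patch(img, r0, c0, radius):
    """Square patch of side 2*radius+1 around (r0, c0), borders replicated."""
    h, w = len(img), len(img[0])
    return [[img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]
             for c in range(c0 - radius, c0 + radius + 1)]
            for r in range(r0 - radius, r0 + radius + 1)]


def resize_nearest(patch, size):
    """Nearest-neighbor resize of a square patch to size x size."""
    n = len(patch)
    return [[patch[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def color_feature(img, r0, c0, radius, size=8):
    """Vectorized RGB values of the resized patch (length 3 * size * size)."""
    patch = resize_nearest(extract_patch(img, r0, c0, radius), size)
    return [channel for row in patch for pixel in row for channel in pixel]
```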

Histogram-of-Oriented-Gradients Features

Besides the color features, gradient information is used, represented as histogram-of-oriented-gradients (HoG) features (Dalal and Triggs (2005)). The structure of an image patch is described by the distribution of the magnitudes and directions of its gradients, and the circular structure of a berry yields a characteristic HoG descriptor. Here, the image patches are convolved with a Sobel filter and divided into a fixed number of square regions of equal size. In each region, a histogram of the unsigned gradient directions with a fixed number of bins is computed, where each gradient casts a weighted vote for the histogram. The histogram entries are concatenated and vectorized, yielding a feature vector assigned to each candidate.
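Here is a compact HoG sketch under simplifying assumptions. The patch gradients are given (e.g. from a Sobel filter), and each gradient votes for a single unsigned orientation bin, weighted by its magnitude. The block normalization and bin interpolation of Dalal and Triggs (2005) are omitted. The numbers of cells and bins are placeholders.

```python
import math


def hog(gv, gh, cells=2, bins=9):
    """Unsigned-orientation HoG of a square patch given its gradient images.

    The patch is split into cells x cells regions. Each gradient votes for one
    of `bins` orientation bins in [0, pi), weighted by its magnitude.
    """
    n = len(gv)
    step = n // cells
    feature = []
    for i in range(cells):
        for j in range(cells):
            hist = [0.0] * bins
            for r in range(i * step, (i + 1) * step):
                for c in range(j * step, (j + 1) * step):
                    mag = math.hypot(gv[r][c], gh[r][c])
                    angle = math.atan2(gv[r][c], gh[r][c]) % math.pi  # unsigned
                    hist[min(int(angle / math.pi * bins), bins - 1)] += mag
            feature.extend(hist)   # concatenate the per-cell histograms
    return feature
```

Because the orientation is taken modulo pi, gradients pointing left and right fall into the same bin. This mirrors the unsigned directions described above.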

Gist Features

Gist features are used to represent the dominant spatial structure of a patch or image, such as roughness or openness. Following Oliva and Torralba (2001), a gist descriptor is built based on a very low-dimensional representation of the scene called the spatial envelope. To describe the patch, a discrete Fourier transform is performed, yielding the amplitude spectrum of the gray-valued image. The amplitude spectrum gives information about the structure of the image, such as the orientation and smoothness of object contours. Additionally, the energy spectrum is derived, which is the squared amplitude spectrum. Instead of applying the Fourier transform to the whole image, a windowed Fourier transform is applied to uniformly arranged, overlapping image parts. The gist descriptor is built from the values of the energy spectrum of each part, yielding a feature vector assigned to each candidate.
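The windowed-spectrum idea can be illustrated with a simplified, gist-like descriptor. A naive 2D DFT is computed for overlapping square windows, and the energy is summed into radial frequency bands, skipping the DC term. This stand-in is chosen for illustration only. The Oliva and Torralba implementation used in the paper is organized differently (by scales and orientations), and the window size, stride and band count here are assumptions.

```python
import cmath
import math


def dft2_energy(win):
    """Energy spectrum |F(u, v)|^2 of a square window via a naive 2D DFT."""
    n = len(win)
    energy = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(win[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                    for x in range(n) for y in range(n))
            energy[u][v] = abs(s) ** 2
    return energy


def gist_like(gray, win=4, bands=2):
    """Concatenate band energies of overlapping windows (stride win // 2)."""
    n = len(gray)
    stride = win // 2
    feature = []
    for r0 in range(0, n - win + 1, stride):
        for c0 in range(0, n - win + 1, stride):
            block = [row[c0:c0 + win] for row in gray[r0:r0 + win]]
            energy = dft2_energy(block)
            hist = [0.0] * bands
            for u in range(win):
                for v in range(win):
                    # signed frequencies: high indices map to negative ones
                    fu = u if u <= win // 2 else u - win
                    fv = v if v <= win // 2 else v - win
                    radius = math.hypot(fu, fv) / (win / 2)  # 0 (DC) .. ~1.41
                    if radius == 0:
                        continue                             # skip the DC term
                    band = min(int(radius * bands / math.sqrt(2)), bands - 1)
                    hist[band] += energy[u][v]
            feature.extend(hist)
    return feature
```

A constant patch produces near-zero energies, while a fine checkerboard puts all of its energy into the highest band.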

Transformation of the Features

One challenge for one-class classification is the choice of features for the target class. Associated with this is the definition of a suitable decision boundary that distinguishes the target objects from all other objects. Therefore, instead of using the features in the original feature space, they are transformed into a lower-dimensional space, in which thresholds defining the decision boundary are easier to find. From the decision boundaries, posterior probabilities for each candidate can be derived.

The feature transformation follows the idea of correlation coefficient clustering proposed by Hsu and Hsieh (2010), in which data points with similar features are grouped into clusters based on their mutual correlation coefficients. Since it can be assumed that berries in one image have similar features, the new feature of the i-th candidate is derived from its median correlation to the reference patches

$x_i^{k} = \operatorname{median}_{j \in \mathcal{R}} \; \rho\big(\mathbf{f}_i^{k}, \mathbf{f}_j^{k}\big),$

where $\rho(\mathbf{f}_i^{k}, \mathbf{f}_j^{k})$ is the correlation coefficient between the candidate feature vector $\mathbf{f}_i^{k}$ and the reference feature vector $\mathbf{f}_j^{k}$, and $\mathcal{R}$ is the set of reference circles. The feature type (rgb, HoG or gist) is denoted generically by $k$. Small values of $x_i^{k}$ indicate a faint resemblance to the reference patches and high values a close resemblance, i.e. candidates with high feature values are most probably berries. The median is used in order to robustly define a threshold. For the purpose of finding a representative amount of berries, this threshold should meet the following conditions. Circles with correlation coefficients larger than the threshold are almost certainly berries. Circles with correlation coefficients smaller than the threshold are not berries or are doubtful berries.
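The feature transformation amounts to a median over Pearson correlations and can be sketched as follows, assuming feature vectors are plain lists of equal length. `median_correlation` is a hypothetical name. Constant vectors are given a correlation of 0 to avoid division by zero, which is an added assumption.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation coefficient (0 if one sequence is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def median_correlation(candidate, references):
    """Transformed feature: median correlation of one candidate's feature
    vector with the feature vectors of all reference patches."""
    return statistics.median([pearson(candidate, ref) for ref in references])
```

A candidate that resembles most of the reference patches keeps a high value even if a few reference circles are outliers, which is the robustness the median provides.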

The RGB, HoG and gist features introduced above are treated separately in the classification process. This allows each to be assigned its own weight in the conditional random field, depending on its discriminative power. However, an alternative representation as one concatenated feature vector is also possible.

2.4.3 Classification

Since not all candidates detected by the circle detector are berries, a classification problem is formulated. Each candidate is assigned either to the target class ’berry’ or to the class ’non-berry’, which comprises all other objects. The classification is done via a conditional random field, which uses posterior probabilities derived from the features as well as prior knowledge about the spatial relations between candidates.

Estimation of Posterior Probabilities of the Test data

In order to derive posterior probabilities for the target class ’berry’ and all other objects, denoted by ’non-berry’, two steps must be performed. First, for each kind of feature, a threshold must be found by which each candidate can be assigned, based on its feature vector, to one class with a certain degree of confidence. Second, the confidence is transformed into probabilities.

The threshold $t_k$ for a feature type $k$ is chosen to be the $p$-percentile of the set of all correlation coefficients between pairs of reference circles. In other words, $t_k$ is the value such that $p$ percent of all these correlation coefficients are smaller than $t_k$. The larger $p$ is, the more candidates will be classified as ’berry’. The percentile is used in order to be robust against noise, against effects such as occlusions, clutter and illumination changes, and against incorrectly classified reference circles.

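This can be formulated probabilistically by stating that all candidates whose feature value is smaller than the threshold get a probability smaller than 0.5 and are unlikely to be ’berry’. The probabilities are obtained by a sigmoid transformation

$P(y_i = \text{berry} \mid x_i^{k}) = \dfrac{1}{1 + \exp\!\big(-\alpha_k \, (x_i^{k} - t_k)\big)},$

where $x_i^{k}$ is the transformed feature of candidate $i$ for feature type $k$, $t_k$ is the corresponding threshold, and $\alpha_k$ defines the sharpness of the probabilities.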
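The threshold and the sigmoid can be sketched together. The sketch uses a nearest-rank percentile over all pairwise reference correlations, so that at least p percent of the coefficients are at most the threshold. It then applies a logistic function that equals 0.5 at the threshold. The percentile convention, the sharpness value `alpha=10.0` and all function names are assumptions.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient (0 if one sequence is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def percentile(values, p):
    """Nearest-rank percentile: smallest value with >= p percent at or below it."""
    ordered = sorted(values)
    k = max(math.ceil(p / 100 * len(ordered)) - 1, 0)
    return ordered[k]


def feature_threshold(ref_features, p=50):
    """t_k: p-percentile of the correlations between all pairs of references."""
    rhos = [pearson(a, b) for a, b in combinations(ref_features, 2)]
    return percentile(rhos, p)


def posterior_berry(x, t, alpha=10.0):
    """Sigmoid posterior: 0.5 at the threshold, below 0.5 for x < t."""
    return 1.0 / (1.0 + math.exp(-alpha * (x - t)))
```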

Conditional Random Field

A conditional random field (CRF, Lafferty et al. (2001)) is an undirected graphical model, which is used here to incorporate prior knowledge about the spatial relations between the candidates. Neighboring candidates tend to have the same class label (’berry’ or ’non-berry’) if their features closely resemble each other. An irregular graph structure is therefore introduced, which models the connections between these candidates. In addition to this and to the posterior probabilities described in Section 2.4.3, the average distance to neighboring candidates is used within the model to prevent isolated circles from being classified as ’berry’.

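The conditional random field model is defined by the energy

$E(\mathbf{y}) = \sum_{i} \Big[ -\lambda_1 \log P(y_i \mid x_i^{\mathrm{rgb}}) - \lambda_2 \log P(y_i \mid x_i^{\mathrm{HoG}}) - \lambda_3 \log P(y_i \mid x_i^{\mathrm{gist}}) - \lambda_4 \log P(y_i \mid d_i) \Big] + \lambda_5 \sum_{(i,j) \in \mathcal{N}} \psi(y_i, y_j, \mathbf{z}_i, \mathbf{z}_j).$

The class labels of the circles are given by $y_i$, which are either ’berry’ or ’non-berry’. The first three unary terms are the negative logarithms of the posteriors $P(y_i \mid x_i^{k})$ described in Section 2.4.3. The fourth unary term is the negative logarithm of a probability based on the average distance $d_i$ to neighboring circles. The final labeling of the candidates is assumed to be smooth within the image, i.e. neighboring candidates tend to have the same class label. This prior knowledge is introduced by means of a data-dependent Potts model $\psi$ in the fifth, binary term. The variable $\mathbf{z}_i$ is the concatenation of all features used in the unary terms, which are transformed to a common range. The set of spatial neighbors is denoted by $\mathcal{N}$. The variables $\lambda_1, \dots, \lambda_5$ are the weights between the terms. Graph cut (Boykov et al. (2001)) is used to find the best labeling $\hat{\mathbf{y}} = \arg\min_{\mathbf{y}} E(\mathbf{y})$.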

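Besides the posterior probabilities described in Section 2.4.3, the average distance to neighboring candidates is used. Because berries are grouped in clusters, isolated circles are more likely to belong to the class ’non-berry’. Therefore, the fourth unary term is introduced, which models the probability of a circle belonging to the class ’berry’ or ’non-berry’ given its isolation. From the neighborhood, the mean distance from each candidate to its neighbors can be derived as

$d_i = \dfrac{1}{N_i} \sum_{j \in \mathcal{N}_i} \lVert \mathbf{p}_i - \mathbf{p}_j \rVert,$

where $\mathcal{N}_i$ is the set of neighbors of candidate $i$, $N_i$ is the number of neighbors and $\mathbf{p}_i$ is the position of the circle center. The probability that a candidate belongs to the class ’berry’ is given by a cropped sigmoid function

$P(y_i = \text{berry} \mid d_i) = \min\Big(0.5, \; \dfrac{1}{1 + \exp\big(\beta \, (d_i - d_0)\big)}\Big), \qquad P(y_i = \text{non-berry} \mid d_i) = 1 - P(y_i = \text{berry} \mid d_i),$

with $\beta$ controlling the sharpness. The value $d_0$ is set to 3 times the current median diameter of the reference circles. The sigmoid function is cropped because isolated candidates with no nearby neighbors are probably not berries, whereas candidates positioned close to others are not necessarily berries. Their probability is set to 0.5, so that the other terms decide whether these candidates are ’berry’ or ’non-berry’.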
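The distance-based prior can be sketched as follows. The sketch assumes that neighbor lists are already available, that candidates without neighbors receive probability 0, and that the cropped sigmoid takes the form min(0.5, logistic). `beta` is a hypothetical sharpness parameter. `d0` is computed as three times the median reference diameter, as stated above.

```python
import math
import statistics


def mean_neighbor_distance(points, neighbors):
    """d_i: mean Euclidean distance from point i to its graph neighbors."""
    return [statistics.fmean(math.dist(points[i], points[j]) for j in nbrs)
            if nbrs else math.inf
            for i, nbrs in enumerate(neighbors)]


def distance_offset(ref_diameters):
    """d0: three times the current median diameter of the reference circles."""
    return 3.0 * statistics.median(ref_diameters)


def distance_prior(d, d0, beta=1.0):
    """Cropped sigmoid P(berry | d): at most 0.5, decaying towards 0 beyond d0."""
    if math.isinf(d):
        return 0.0                         # no neighbors at all
    z = min(beta * (d - d0), 700.0)        # clamp to avoid overflow in exp
    return min(0.5, 1.0 / (1.0 + math.exp(z)))
```

Close neighbors never push a candidate above 0.5, so they leave the decision to the other terms, while isolated candidates are driven towards ’non-berry’.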

The binary term encourages neighboring circles with similar features to receive the same class label. Consider patches showing berries whose features are ambiguous regarding their class label (e.g. due to illumination effects or occlusions). If their neighbors are berries with similar features, the binary term guides the decision towards the correct label ’berry’.

In order to define the neighbors of each candidate, an irregular graph structure is derived from a Voronoi diagram whose sites are the positions of the circles (see Figure 4(a)). Adjacent cells thus indicate neighboring candidates. Figure 4(b) shows four candidates and their neighbors, which were obtained from the Voronoi diagram.
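Computing a Voronoi diagram in pure Python is lengthy, so the sketch below uses a simpler, plainly different construction for illustration: the Gabriel graph. The Gabriel graph is a subgraph of the Delaunay triangulation, which is the dual of the Voronoi diagram. It therefore connects only a subset of the Voronoi neighbors. With SciPy available, `scipy.spatial.Voronoi` or `scipy.spatial.Delaunay` would give the full adjacency. `gabriel_neighbors` is a hypothetical name.

```python
def gabriel_neighbors(points):
    """Neighbor lists of the Gabriel graph (a subgraph of the Delaunay graph).

    Points i and j are connected if no other point lies strictly inside the
    circle whose diameter is the segment between them. O(n^3), small n only.
    """
    n = len(points)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            cx, cy = (xi + xj) / 2, (yi + yj) / 2          # circle center
            r2 = ((xi - xj) ** 2 + (yi - yj) ** 2) / 4      # squared radius
            if all((x - cx) ** 2 + (y - cy) ** 2 >= r2
                   for k, (x, y) in enumerate(points) if k not in (i, j)):
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs
```

For the flat triangle `(0, 0), (4, 0), (2, 1)`, the Delaunay triangulation connects all three points, but the Gabriel graph drops the long base edge. This illustrates that it is only an approximation of the Voronoi neighborhood used in the paper.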

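The binary term is modeled via the Euclidean distance between the concatenated features of two neighboring circles,

$\Delta_{ij} = \lVert \mathbf{z}_i - \mathbf{z}_j \rVert = \sqrt{\mathbf{z}_i^\top \mathbf{z}_i - 2\, \mathbf{z}_i^\top \mathbf{z}_j + \mathbf{z}_j^\top \mathbf{z}_j},$

which involves the dot product $\mathbf{z}_i^\top \mathbf{z}_j$ of both vectors. The concatenation of all features of candidate $i$ is defined as $\mathbf{z}_i = \big[x_i^{\mathrm{rgb}}, x_i^{\mathrm{HoG}}, x_i^{\mathrm{gist}}, d_i / \bar{d}\,\big]^\top$. Here $x_i^{k}$ are the transformed features, $d_i$ is the mean distance to the neighbors, and $\bar{d}$ is the median distance between the candidates, used to scale $d_i$ to a similar range as the other features. The term is only considered if two neighbors $i$ and $j$ get the same label. If two neighboring candidates closely resemble each other in their features, they are likely to have the same class label. A faint resemblance, however, does not automatically indicate that their class labels differ.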
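For a handful of candidates, the energy minimization can be sketched by exhaustive search over all labelings. This is a swapped-in substitute for the graph cut used in the paper and is only feasible for very small graphs. The pairwise term is an assumption: equal labels of neighbors receive a reward exp(-Δ) that decays with the feature distance Δ. This matches the description that the term only applies to equal labels. Up to a constant, it is equivalent to a submodular Potts penalty on unequal labels, which graph cut can handle. All names and weights are hypothetical.

```python
import math
from itertools import product


def crf_energy(labels, unary_probs, weights, feats, edges, lam_pair):
    """Energy of a labeling (1 = 'berry', 0 = 'non-berry').

    unary_probs[i][t]: P(berry) of candidate i from term t (e.g. rgb, HoG,
    gist, distance); weights[t]: weight of term t; feats[i]: concatenated
    feature vector z_i; edges: neighbor pairs (i, j).
    """
    eps = 1e-9
    e = 0.0
    for i, y in enumerate(labels):
        for t, p in enumerate(unary_probs[i]):
            q = p if y == 1 else 1.0 - p
            e -= weights[t] * math.log(max(q, eps))   # weighted -log posterior
    for i, j in edges:
        if labels[i] == labels[j]:
            # assumed pairwise form: reward equal labels, more for similar features
            e -= lam_pair * math.exp(-math.dist(feats[i], feats[j]))
    return e


def best_labeling(unary_probs, weights, feats, edges, lam_pair):
    """Exhaustive minimization; only feasible for a handful of candidates."""
    n = len(unary_probs)
    return min(product((0, 1), repeat=n),
               key=lambda ys: crf_energy(ys, unary_probs, weights, feats,
                                         edges, lam_pair))
```

Consider a toy pair of neighbors. A candidate with posteriors of 0.45 is labeled ’non-berry’ on its own, but it flips to ’berry’ when linked to a confident neighbor with identical features.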

(a) Voronoi diagram for the derivation of the graph structure
(b) Irregular graph structure
Figure 4: Spatial relations between the candidates. Each candidate is connected to its neighbors (right image), derived from neighboring cells in the Voronoi diagram (left image). As an example, the right image shows four candidates and their connections as black lines. All spatial relations define the irregular graph structure within the conditional random field.

In the following the set of candidates which are labeled with ’berry’ is denoted as .

2.4.4 Determination of the berry size

The detected circle diameters are obtained in pixels. In order to transform the berry diameters into mm, a colored label of  mm width was fixed to the steel wire and used to calculate the conversion ratio between mm and pixels. The ratio is given by $s = w_{\mathrm{mm}} / w_{\mathrm{px}}$, where $w_{\mathrm{mm}}$ is the known label width and $w_{\mathrm{px}}$ [pixel] is the width of the colored label observed in the image. The radius $r_{\mathrm{px}}$ [pixel] of a circle in the image can then be converted to $r_{\mathrm{mm}} = s \, r_{\mathrm{px}}$. In these experiments, the colored label is measured manually in the image. However, this process can be automated, e.g. if a stereo camera system with a fixed and known baseline is used.
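The conversion is a single ratio, as the following minimal sketch shows. The label width of 20 mm in the usage note is a placeholder, not the value used in the experiments. A single scale factor implicitly assumes that the berries lie at roughly the same depth as the label.

```python
def mm_per_pixel(label_width_mm, label_width_px):
    """Scale s = w_mm / w_px from a reference label of known width."""
    if label_width_px <= 0:
        raise ValueError("label width in pixels must be positive")
    return label_width_mm / label_width_px


def radius_to_diameter_mm(radius_px, scale):
    """Berry diameter in mm from a circle radius in pixels."""
    return 2.0 * radius_px * scale
```

With a placeholder label, `mm_per_pixel(20.0, 80.0)` gives 0.25 mm per pixel, so a circle with a radius of 30 pixels corresponds to a diameter of 15 mm.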

3 Experiments and Results

3.1 Experimental Setup

The experiments are conducted with the proposed framework, which is implemented in Matlab®. The images are resized to  and the luminance, chrominance and contrast are adjusted using a Matlab® file package available online. The HoG features are extracted with a custom Matlab® implementation, using  cells and  orientations of unsigned gradients per cell. The gist features are extracted with the implementation of Oliva and Torralba (2001), which is available for download. The number of cells for the windowed Fourier transform is set to , the number of scales to  and the number of orientations per scale to . The parameters of the conditional random field are determined experimentally and set to , , , and . The weights are chosen according to the discriminative power of each feature. Alternatively, the parameters can be learned, for example by maximum likelihood estimation (see Korč and Förstner (2008)). The value  defining the percentile is set to , which corresponds to the median. If no circles were classified as 'berry', the value was increased to .
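For readers unfamiliar with HoG, the following simplified sketch computes per-cell orientation histograms of unsigned gradients: angles are folded into [0, π), so a dark-to-bright and a bright-to-dark edge vote for the same bin. It omits the vote interpolation and block normalization of Dalal and Triggs (2005), treats the number of cells and bins as free parameters, and is not the authors' Matlab® implementation; `hog_unsigned` is a hypothetical name.

```python
import math


def hog_unsigned(img, cells_per_side, n_bins):
    """Simplified HoG on a grayscale image given as a list of rows.

    The image is split into cells_per_side x cells_per_side cells; each cell
    accumulates gradient magnitudes into n_bins orientation bins over [0, pi).
    """
    h, w = len(img), len(img[0])
    cell_h, cell_w = h // cells_per_side, w // cells_per_side
    hist = [[0.0] * n_bins for _ in range(cells_per_side * cells_per_side)]
    bin_width = math.pi / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences; border pixels are skipped for simplicity.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            b = min(int(angle / bin_width), n_bins - 1)
            cy = min(y // cell_h, cells_per_side - 1)
            cx = min(x // cell_w, cells_per_side - 1)
            hist[cy * cells_per_side + cx][b] += mag
    return [v for cell in hist for v in cell]  # concatenated feature vector


# Hypothetical 8x8 image with a vertical edge between columns 3 and 4, and its inverse.
edge = [[1.0 if x >= 4 else 0.0 for x in range(8)] for _ in range(8)]
flipped = [[1.0 - v for v in row] for row in edge]
print(hog_unsigned(edge, 2, 9) == hog_unsigned(flipped, 2, 9))  # unsigned: same descriptor
```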

For the evaluation, the mean diameter and the standard deviation are compared with the manual reference measurements. All estimated diameters are rounded to  mm steps and represented in a histogram, from which the occurrence and frequency of the estimated diameters can be read directly.
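Rounding to a fixed step and counting can be sketched as follows. The step width is a parameter because its value in the experiments is not reproduced here, and the diameters in the usage line are hypothetical. The sketch also reports the histogram peak next to the mean, since the discussion of the results argues that the histogram is more informative than the mean alone. The helper names are hypothetical.

```python
from collections import Counter
from statistics import mean


def diameter_histogram(diameters_mm, step_mm):
    """Round each diameter to the nearest multiple of step_mm and count occurrences.

    Note: Python's round() resolves exact ties to the even neighbor.
    """
    rounded = [round(d / step_mm) * step_mm for d in diameters_mm]
    return dict(sorted(Counter(rounded).items()))


def histogram_peak(hist):
    """Bin with the highest frequency (ties resolved toward the smaller diameter)."""
    return max(sorted(hist), key=lambda b: hist[b])


# Hypothetical estimated diameters of the berries detected in one image.
diam = [9.8, 10.1, 10.2, 10.4, 9.9, 11.6, 12.3]
hist = diameter_histogram(diam, 0.5)
print(hist, "peak:", histogram_peak(hist), "mean:", round(mean(diam), 2))
```

Here a few large detections pull the mean above the dominant bin, which the histogram makes visible.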

3.2 Results and Discussion

Three important stages of grapevine berry development were investigated: 1) BBCH 75, berries pea-sized; 2) BBCH 81, beginning of ripening, when berries start softening; and 3) BBCH 89, end of ripening and time of harvest. Table 1 compares the manual measurements with the results obtained by the framework for these datasets.

Stage    Cultivar     Mean #   Manual [mm]   Framework [mm]   MD [mm]   MAD [mm]
BBCH 75  Riesling       71.1   8.5 (1.1)     9.5 (0.7)            0.9        1.0
BBCH 75  Pinot Blanc    67.1   8.6 (1.2)     8.7 (0.8)           -0.1        0.7
BBCH 75  Pinot Noir     63.4   7.8 (1.4)     9.8 (1.0)            2.0        2.0
BBCH 75  Dornfelder     83.1   8.9 (1.1)     10.1 (0.5)           1.2        1.2
BBCH 81  Riesling      156.7   11.8 (1.2)    11.8 (0.7)           0.0        0.6
BBCH 81  Pinot Blanc    75.1   11.9 (1.2)    12.4 (0.4)           0.6        0.6
BBCH 81  Pinot Noir    148.9   10.1 (1.6)    12.3 (1.0)           2.1        2.1
BBCH 81  Dornfelder    202.7   12.1 (1.1)    13.2 (0.8)           1.1        1.2
BBCH 89  Riesling      102.7   13.5 (1.3)    14.2 (1.2)           1.1        1.5
BBCH 89  Pinot Blanc   232.2   13.5 (1.2)    13.9 (0.6)           0.4        0.5
BBCH 89  Pinot Noir     90.0   11.8 (1.8)    13.4 (0.9)           1.6        1.7
BBCH 89  Dornfelder    112.7   15.4 (1.4)    15.2 (1.5)          -0.1        1.0
Table 1: Berry sizes at three developmental stages of grapevine: 1) BBCH 75, berries pea-sized; 2) BBCH 81, beginning of ripening, when berries start softening; and 3) BBCH 89, end of ripening and time of harvest. Reported are the mean number of detected berries (Mean #), the manually measured mean diameter with its standard deviation in parentheses, the mean diameter estimated by the framework with its standard deviation in parentheses, the mean difference between the estimated diameters and the manual measurements (MD), and the mean absolute difference between them (MAD).

The average difference over all images showing berries at BBCH 75 is  mm, at BBCH 81  mm and at BBCH 89  mm; the corresponding average absolute differences are  mm,  mm and  mm. Averaged over all growth stages, the difference is  mm for 'Riesling',  mm for 'Pinot Blanc',  mm for 'Pinot Noir' and  mm for 'Dornfelder'; the corresponding average absolute differences are  mm,  mm,  mm and  mm. Thus, the average (absolute) differences are similar across growth stages but vary between cultivars. The mean berry size of a plant estimated by the framework correlates with the manual measurements with a coefficient of 0.88, as illustrated in Figure 5.
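The two error measures of Table 1 and the correlation can be computed per image as sketched below. The sign convention MD = estimate − manual matches the table, where positive values indicate overestimation (e.g. 'Pinot Noir'). The correlation is assumed here to be Pearson's coefficient; the function names and input values are hypothetical.

```python
import math


def mean_difference(estimated, manual):
    """MD: average of (estimate - manual); positive means overestimation."""
    return sum(e - m for e, m in zip(estimated, manual)) / len(manual)


def mean_absolute_difference(estimated, manual):
    """MAD: average of |estimate - manual|."""
    return sum(abs(e - m) for e, m in zip(estimated, manual)) / len(manual)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-image mean diameters in mm.
manual = [8.0, 10.0, 12.0]
framework = [9.0, 9.0, 14.0]
print(mean_difference(framework, manual), mean_absolute_difference(framework, manual))
```

Over- and underestimates cancel in MD but not in MAD, which is why a value such as MD = -0.1 mm can coexist with MAD = 0.7 mm ('Pinot Blanc', BBCH 75) and why Table 1 reports both.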

Figure 5: Correlation plot of the manually measured and the image-based berry diameters per image ()

The framework overestimates the berry diameter for nearly all datasets. The main reason is that large berries are generally more likely to be detected than small ones, since their structure is more distinct, whereas the berries for the manual reference measurements were selected randomly. This could also explain the particularly large overestimation of  mm for 'Pinot Noir'. In 2012, the berry sizes within a grape cluster varied especially strongly for 'Pinot Noir' compared with 'Riesling', 'Pinot Blanc' and 'Dornfelder', as reflected by the standard deviations of the manually measured mean diameters given in parentheses in Table 1. Taking the mean of the framework's estimates therefore leads to an overestimation, since larger diameters receive a higher weight owing to their more frequent occurrence in the detection result. This effect is influenced by the variation in berry size, the compactness and arrangement of the cluster, and the color of the berries. A histogram of diameters is therefore more informative than the mean alone when interpreting the results.
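The argument that size-dependent detection inflates the mean can be made concrete with a small worked example. Assume, purely for illustration, three berry sizes that occur equally often and detection probabilities that increase with the diameter; the expected mean of the detected berries is then the detection-weighted mean, which exceeds the true mean. The function name and all numbers are hypothetical.

```python
def expected_detected_mean(diameters, detection_prob):
    """Expected mean diameter among detected berries when berry k is found with prob. p_k."""
    total = sum(detection_prob)
    return sum(p * d for p, d in zip(detection_prob, diameters)) / total


# Hypothetical: equally frequent sizes, larger berries detected more reliably.
diameters = [8.0, 10.0, 12.0]  # true mean 10.0 mm
probs = [0.2, 0.5, 0.8]
print(round(expected_detected_mean(diameters, probs), 2))  # 10.8
```

With equal detection probabilities the bias vanishes, so the overestimation grows with both the spread of berry sizes and the strength of the size dependence.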

In general, OIV descriptors are used to classify grapevine traits; the berry diameter is assessed with OIV descriptor No. 221. In contrast to the proposed image-based framework, this descriptor assigns the berry size to only five classes covering all expected sizes from  mm up to  mm, so the resulting classification lacks precision. Moreover, besides requiring experts, visual estimates of the berry diameter are subjective, so the results vary between observers. However, precise berry size data ( mm accuracy) are required from mapping populations of several hundred individual plants in order to enable fine mapping of QTL regions or to determine the berry size of cultivars before harvest. Non-invasive image capture of plants in the field, followed by the automated image analysis framework, enables a more comprehensive phenotypic analysis in a high-throughput manner. It also allows the phenotypic evaluation of several plants per genotype or cultivar, providing several biological replicates.

Moreover, it should be noted that manually measuring a sufficient number of berries is very time-consuming and usually not feasible within regular breeding programs. In the conducted experiments, manually measuring the reference berries directly in the field with a digital caliper took  minutes per plant, so that precise berry size data could be recorded by hand from  grapevines per hour. In comparison, capturing one image per grapevine takes  seconds, which means that image acquisition in the field is about  times faster than manual measurement. Apart from providing the captured images, the framework requires no user interaction and analyzes the images automatically. The analysis can therefore run in parallel with the routine work of the breeding program and also allows retrospective analysis. At present, the program needs about  minutes per image on a standard computer.
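The throughput comparison is simple arithmetic. Since the measured times are not reproduced in this text, the sketch takes them as parameters; the values in the usage line are hypothetical placeholders, not the study's numbers, and the function names are hypothetical.

```python
def plants_per_hour(minutes_per_plant):
    """Number of plants that can be measured manually in one hour."""
    return 60.0 / minutes_per_plant


def speedup(manual_minutes_per_plant, capture_seconds_per_plant):
    """How many times faster image capture is than manual caliper measurement."""
    return (manual_minutes_per_plant * 60.0) / capture_seconds_per_plant


# Hypothetical placeholders: 6 min of caliper work vs. 10 s to capture one image per plant.
print(plants_per_hour(6.0), speedup(6.0, 10.0))
```

The image analysis time is deliberately left out of the speedup, since it runs unattended and does not occupy staff in the field.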

Figure 6: Differences between the estimated mean berry diameters and the manual measurements. Panels: (a) to (c) Riesling, (d) to (f) Pinot Blanc, (g) to (i) Pinot Noir and (j) to (l) Dornfelder, each at BBCH 75, 81 and 89.

Figure 6 shows the differences between the estimated mean berry sizes and the manual measurements for all cultivars and growth stages. In most cases, the mean diameters differ by no more than  mm. The largest differences occur when small berries are not distinct enough to be detected by the circle detector. The variation within the plots can be explained by the differing illumination and berry visibility in each image. A reliable evaluation therefore requires acquiring several images of a grapevine, or images of several grapevines of the same cultivar at around the same time, and averaging their results.

(a) Original image of Pinot Blanc (BBCH 81)
(b) Original image of Pinot Blanc (BBCH 89)
(c) Classification result of Pinot Blanc (BBCH 81) without binary term
(d) Classification result of Pinot Blanc (BBCH 89) without binary term
(e) Classification result of Pinot Blanc (BBCH 81) with CRF
(f) Classification result of Pinot Blanc (BBCH 89) with CRF
Figure 7: Top row: Original images. Second and third rows: Example results showing the candidates with position and radius (circles) and their classification; red candidates are classified as 'berry' and blue candidates as 'non-berry'. Second row: Classification result without the binary term. Third row: Classification result with the CRF, i.e. unary and binary terms.

Figure 7 shows the classification results and the obtained berry sizes for two images of the cultivar 'Pinot Blanc'. The first row shows the original images, the second row the classification result using only the unary terms, i.e. without prior knowledge about the arrangement of berries, and the third row the classification result using the conditional random field. Using the conditional random field with unary and binary terms yields slightly better results for BBCH 75 and BBCH 81, but a significantly better result for BBCH 89. Isolated 'berry' detections are eliminated, and many circles whose features alone indicate 'non-berry' are correctly classified as 'berry' once prior knowledge about the arrangement of the berries is introduced. These results are representative of the classifications obtained for the other images used in the experiments: the conditional random field contributes most for images showing BBCH 89, whereas for BBCH 75 the classifications with and without the conditional random field are similar. One reason is that more reference circles and candidates are found for BBCH 89 than for BBCH 75, so a more suitable neighborhood graph can be built and the assumption that neighboring candidates tend to have the same class label is best fulfilled. Another reason is that the BBCH 89 images were taken on sunny days, so that backlighting produces many distinct structures in the background whose features resemble those of berries, yielding detected circles that are incorrectly classified as 'berry'. These incorrectly classified circles are spread over the whole image rather than clustered, and can therefore be eliminated using neighboring circles that are clearly classified as 'non-berry'.
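How neighborhood information removes isolated false 'berry' labels, and promotes berry-like clusters, can be illustrated with a generic pairwise model. The sketch below swaps in a simple Potts-style penalty for differing neighbor labels and iterated conditional modes (ICM) for inference; neither is claimed to be the paper's binary term or optimizer, and the functions `local_energy` and `icm`, the weight and all costs are hypothetical. Labels are 0 = 'non-berry' and 1 = 'berry'; unary entries are costs, so lower is better.

```python
def local_energy(i, label, unary, neighbors, labels, weight):
    """Unary cost plus a Potts penalty for every neighbor with a different label."""
    disagreements = sum(1 for j in neighbors[i] if labels[j] != label)
    return unary[i][label] + weight * disagreements


def icm(unary, edges, weight, max_iter=20):
    """Iterated conditional modes: greedily relabel one candidate at a time.

    unary[i][l]: cost of label l for candidate i (0 = 'non-berry', 1 = 'berry').
    edges: undirected neighbor pairs (i, j), e.g. from the Voronoi graph.
    weight: hypothetical penalty per neighbor pair with different labels.
    """
    neighbors = {i: [] for i in range(len(unary))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from the unary-only decision (the "without binary term" result).
    labels = [min(range(len(costs)), key=costs.__getitem__) for costs in unary]
    for _ in range(max_iter):
        changed = False
        for i in range(len(unary)):
            best = min(
                range(len(unary[i])),
                key=lambda l: local_energy(i, l, unary, neighbors, labels, weight),
            )
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # local minimum reached
    return labels


# Candidate 0 looks slightly berry-like but is surrounded by clear non-berries,
# like an isolated background structure; all values are hypothetical.
unary = [[0.6, 0.4], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]]
edges = [(0, 1), (0, 2), (0, 3)]
print(icm(unary, [], 0.5), icm(unary, edges, 0.5))
```

Without edges the isolated candidate keeps its 'berry' label; with the neighbor graph, the penalty for disagreeing with three 'non-berry' neighbors outweighs its slight unary preference.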

(a) Berry sizes of Pinot Blanc (BBCH 81)
(b) Berry sizes of Pinot Blanc (BBCH 89)
Figure 8: Histogram of determined berry diameters. The histograms show a wide range of diameters but a concentration around a peak.

As Figure 8 shows, all histograms span a wide range of diameters but are concentrated around a dominant peak. Because berry development varies within each cluster, and hence across the vineyard, the histogram allows a more meaningful interpretation than the mean of the detected berries alone.

A basic assumption of the approach is that the silhouette of the berries is characterized by strong image gradients. If this assumption is violated, the circle detection module fails to detect the berries' boundaries properly or to define a suitable reference set; this is the major limitation of the approach. The problem can be overcome by using additional lighting when taking the images.

In some cases, it cannot be guaranteed that enough berries are visible in the image, e.g. if image acquisition is fully automated. In this case, the detection of reference circles fails or gives only poor results. This step can be replaced by using manually selected berries taken from other images. However, the selected training data need to be representative of the candidates. To demonstrate the applicability of this alternative, the experiments presented here were also conducted with 150 manually selected green berries of BBCH 81 as reference berries. Using only gist features yields the best results, with mean absolute differences ranging from  mm for 'Pinot Blanc' and 'Dornfelder' to  mm for 'Pinot Noir' at BBCH 75, from  mm for 'Pinot Blanc' and 'Pinot Noir' to  mm for 'Dornfelder' at BBCH 81, and from  mm for 'Pinot Noir' to  mm for 'Dornfelder' at BBCH 89. The results deteriorate for larger berries, because the reference berries were collected at an earlier stage. To overcome this problem, the reference berries should be chosen to roughly match the current berry size. Nevertheless, the approach appears promising, since even dark berries could be detected using green reference berries.

A shortcoming of the proposed framework is the conversion from pixels to mm. It can be defined more accurately using a camera system with known interior and relative orientation. In contrast to the current approach, where a single scale is used for the whole image, a depth map can then be derived to define a pixelwise scale. Initial experiments on computing depth maps show promising results, since this approach models the grapevine as a 3D structure rather than a single plane (see Figure 9). The depth maps were computed using the patch-based multi-view stereo software proposed by Furukawa and Ponce (2010), and the orientation was obtained using the approach of Abraham and Hau (1997). The depth maps would also enable the removal of far-off background, restricting the set of circles to those at a distance of about  m.
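With a calibrated camera system, a pixelwise scale follows from the pinhole model: an object spanning s pixels at depth Z has a metric size of about s * Z / f, with the focal length f in pixels; for a rectified stereo pair with baseline B, depth follows from disparity d as Z = f * B / d. The sketch below applies both relations together with the suggested depth cut-off. It assumes a rectified, calibrated setup and berries roughly facing the camera; the function names, calibration values and circles are hypothetical and not part of the described software.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (Z in the unit of the baseline)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def diameter_mm_at_depth(diameter_px, depth_mm, focal_px):
    """Pinhole model: metric size = pixel size * depth / focal length (in pixels)."""
    return diameter_px * depth_mm / focal_px


def keep_foreground(circles, max_depth_mm):
    """Drop circles whose depth exceeds the cut-off (far-off background)."""
    return [c for c in circles if c["depth_mm"] <= max_depth_mm]


# Hypothetical calibration and detections.
f_px, baseline = 1000.0, 100.0
circles = [
    {"diameter_px": 20.0, "depth_mm": depth_from_disparity(f_px, baseline, 200.0)},  # 500 mm
    {"diameter_px": 12.0, "depth_mm": depth_from_disparity(f_px, baseline, 20.0)},  # 5000 mm
]
for c in keep_foreground(circles, max_depth_mm=1500.0):
    print(diameter_mm_at_depth(c["diameter_px"], c["depth_mm"], f_px))
```

Each circle thus gets its own scale, instead of the single label-based factor used for the whole image.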

(a) One image captured with the camera system
(b) Computed depth map
Figure 9: Left: One image from a camera system with known interior and relative orientation. Right: The depth information, given in m, is color-coded: red pixels indicate large distances, blue pixels small distances, and gray pixels missing depth values, which can be assumed to be background.

4 Conclusion

The paper proposed a high-throughput image analysis framework that non-invasively detects grapevine berries and determines their size in mm from RGB images taken in vineyards. The framework automatically detects berries in images by first finding circular structures and then classifying them into the classes 'berry' and 'non-berry' using the concept of one-class classification. The classification uses an active learning framework and a conditional random field. The experiments showed that the framework detects a representative number of berries, yielding a reliable quantity of phenotypic data, while keeping the rate of falsely detected berries as low as possible to ensure a high quality of the extracted data. The obtained results showed a mean difference of about  mm to the manual reference measurements and a correlation of 0.88 between the mean berry size and the manual reference measurements.


  • Abraham and Hau (1997) Abraham, S., Hau, T., 1997. Towards autonomous high-precision calibration of digital cameras. In: Proceedings of SPIE Annual Meeting, Videometrics V. Vol. 3174. pp. 82–93.
  • Bar and Ullman (1996) Bar, M., Ullman, S., 1996. Spatial context in recognition. Perception 25 (3), 343–352.
  • Berenstein et al. (2010) Berenstein, R., Shahar, O., Shapiro, A., Edan, Y., 2010. Grape clusters and foliage detection algorithms for autonomous selective vineyard sprayer. Intelligent Service Robotics 3 (4), 233–243.
  • Beslic et al. (2009) Beslic, Z., Todic, S., Sivcev, B., 2009. Inheritance of yield components and quality of grape in hybridization of grapevine cultivars. Acta Horticulturae 827, 501–503.
  • Biederman et al. (1982) Biederman, I., Mezzanotte, R., Rabinowitz, J., 1982. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology 14 (2), 143–177.
  • Bloesch and Viret (2008) Bloesch, B., Viret, O., 2008. Stades phénologiques repères de la vigne. Revue suisse de viticulture, arboriculture, horticulture 40 (6), 1–4.
  • Boykov et al. (2001) Boykov, Y., Veksler, O., Zabih, R., 2001. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (11), 1222–1239.
  • Brown et al. (1999) Brown, M., Moore, J., Fenn, P., McNew, R., 1999. Comparison of leaf disk, greenhouse, and field screening procedures for evaluation of grape seedlings for downy mildew resistance. HortScience 34 (2), 331–333.
  • Cabezas et al. (2006) Cabezas, J., Cervera, M., Ruiz-Garcia, L., Carreno, J., Martinez-Zapater, J., 2006. A genetic analysis of seed and berry weight in grapevine. Genome 49 (12), 1572–1585.
  • Chan and Shen (2005) Chan, T., Shen, J., 2005. Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
  • Costantini et al. (2008) Costantini, L., Battilana, J., Lamaj, F., Fanizza, G., Grando, M., 2008. Berry and phenology-related traits in grapevine (Vitis vinifera L.): From quantitative trait loci to underlying genes. BMC Plant Biology 8 (1), 38.
  • Dalal and Triggs (2005) Dalal, N., Triggs, B., 2005. Histograms of oriented gradients for human detection. In: IEEE Proc. Computer Vision and Pattern Recognition. pp. 886–893.
  • Descombes et al. (2009) Descombes, X., Minlos, R., Zhizhina, E., 2009. Object extraction using a stochastic birth-and-death dynamics in continuum. Journal of Mathematical Imaging and Vision 33 (3), 347–359.
  • Dickscheid et al. (2008) Dickscheid, T., Läbe, T., Förstner, W., 2008. Benchmarking automatic bundle adjustment results. In: Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS). Beijing, China.
  • Fanizza et al. (2005) Fanizza, G., Lamaj, F., Costantini, L., Chaabane, R., Grando, M., 2005. QTL analysis for fruit yield components in table grapes (Vitis vinifera). Theoretical and Applied Genetics 111 (4), 658–664.
  • Furukawa and Ponce (2010) Furukawa, Y., Ponce, J., 2010. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8), 1362–1376.
  • Galleguillos et al. (2008) Galleguillos, C., Rabinovich, A., Belongie, S., 2008. Object categorization using co-occurrence, location and appearance. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 1–8.
  • Gould et al. (2008) Gould, S., Rodgers, J., Cohen, D., Elidan, G., Koller, D., 2008. Multi-class segmentation with relative location prior. International Journal of Computer Vision 80 (3), 300–316.
  • Hidayat and Green (2009) Hidayat, R., Green, R., 2009. Real-time texture boundary detection from ridges in the standard deviation space. In: British Machine Vision Conference.
  • Hsu and Hsieh (2010) Hsu, H.-H., Hsieh, C.-W., 2010. Feature selection via correlation coefficient clustering. Journal of Software 5 (12), 1371–1377.
  • Khan and Madden (2010) Khan, S. S., Madden, M. G., 2010. A survey of recent trends in one class classification. In: Proc. Irish Conference on Artificial Intelligence and Cognitive Science (AICS'09). Springer-Verlag, Berlin, Heidelberg, pp. 188–197.
  • Korč and Förstner (2008) Korč, F., Förstner, W., 2008. Approximate parameter learning in conditional random fields: An empirical investigation. In: Pattern Recognition. Springer, pp. 11–20.
  • Lafarge et al. (2010) Lafarge, F., Gimel'farb, G., Descombes, X., 2010. Geometric feature extraction by a multimarked point process. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (9), 1597–1609.
  • Lafferty et al. (2001) Lafferty, J. D., McCallum, A., Pereira, F. C. N., 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proc. International Conference on Machine Learning.
  • Lempitsky and Zisserman (2010) Lempitsky, V., Zisserman, A., 2010. Learning to count objects in images. In: Proc. Neural Information Processing Systems.
  • Longo et al. (2010) Longo, D., Pennisi, A., Bonsignore, R., Muscato, G., Schillaci, G., 2010. A multifunctional tracked vehicle able to operate in vineyards using GPS and laser range-finder technology. In: International Conference: Work safety and risk prevention in agro-food and forest systems.
  • Lorenz et al. (1994) Lorenz, D., Eichhorn, K., Bleiholder, H., Klose, R., Meier, U., Weber, E., 1994. Phänologische Entwicklungsstadien der Weinrebe (Vitis vinifera L. ssp. vinifera). Viticulture and Enology Science 49, 66–70.
  • Mazzetto et al. (2010) Mazzetto, F., Calcante, A., Mena, A., Vercesi, A., 2010. Integration of optical and analogue sensors for monitoring canopy health and vigour in precision viticulture. Precision Agriculture 11 (6), 636–649.
  • Moya and Hostetler (1993) Moya, M., Koch, M., Hostetler, L., 1993. One-class classifier networks for target recognition applications. In: Proc. World Congress on Neural Networks.
  • Nuske et al. (2011) Nuske, S., Achar, S., Bates, T., Narasimhan, S., Singh, S., September 2011. Yield estimation in vineyards by visual grape detection. In: Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems.
  • OIV (2001) OIV, 2001. 2nd edition of the OIV descriptor list for grape varieties and Vitis species.
  • Oliva and Torralba (2001) Oliva, A., Torralba, A., 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision 42 (3), 145–175.
  • Palmer (1975) Palmer, S. E., 1975. The effects of contextual scenes on the identification of objects. Memory & Cognition 3 (5), 519–526.
  • Peng et al. (2007) Peng, T., Balijepalli, A., Gupta, S., LeBrun, T., 2007. Algorithms for on-line monitoring of micro spheres in an optical tweezers-based assembly cell. Journal of Computing and Information Science in Engineering 7 (4), 330–338.
  • Rabinovich et al. (2007) Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., Belongie, S., 2007. Objects in context. In: Proc. International Conference on Computer Vision. IEEE, pp. 1–8.
  • Settles (2010) Settles, B., 2010. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.
  • Song et al. (2013) Song, Y., Wen, Z., Lin, C.-Y., Davis, R., August 2013. One-class conditional random fields for sequential anomaly detection. In: Proc. International Joint Conference on Artificial Intelligence (IJCAI). Beijing, China.
  • Tax (2001) Tax, D., 2001. One-class classification: concept-learning in the absence of counter-examples. Ph.D. thesis, Delft University of Technology.
  • Töpfer et al. (2011) Töpfer, R., Hausmann, L., Harst, M., Maul, E., Zyprian, E., Eibach, R., 2011. New horizons for grapevine breeding. Methods in Temperate Fruit Breeding. Fruit, Vegetable and Cereal Science and Biotechnology 5 (Special Issue 1), 79–100.