1 Introduction
Adaptive thresholding is one of the most used technique in many applications because it is fast to compute and when combined with previous filters, it gives robust decision rules for pattern recognition. Among many other techniques of thresholding [1]; Köhler’s method computes a curve of contrast of the region boundaries in an image [2]. The contrast steps correspond to the local maxima of the curve and can be extracted for (multi)thresholding of the image. This is useful for many applications: industrial, biomedical, video, etc [3, 4, 5]. However, computing Köhler’s method is time consuming; almost 1 minute using a C++ implementation on a current computer with an image acquired by a recent camera (18 million pixels, fig. 1). The purpose of this paper, is to introduce and implement a new algorithm for Köhler’s thresholding method faster than the existing algorithms and making it useful for applications requiring fast or realtime processing (e.g. video thresholding, large datasets) [6].
Previously, two attempts were made to speed up the computation of Kölher’s method. Zeboudj [7, 8] used mathematical morphology operations to give a similar version of Köhler’s method. However, his approach was efficient on specific devices which are not available any more. The other one, from Hautière [9, 3], consists of making the computation on a reduced part of the neighbourhood and to precalculate some intermediate images of minimum and maximum between the image translated horizontally and vertically in order to compute the contrast. However, this algorithm does not introduce any parallelisation.
In this paper, after a reminder on Köhler’s method and on parallel computing in Mathematical Morphology [10, 11, 12]; we will introduce a parallel algorithm for Köhler’s method using line translations of the image. We will also propose to compute the contrast on a reduced neighbourhood as in [9]. Eventually, we will compare an implementation of our algorithm, using vectorisation with SIMD instructions [13] and multicore (i.e. parallel) processing with OpenMP [14], to other implementations of Köhler’s method.
2 Prerequisites
Let us remind Köhler’s method and the acceleration of Mathematical Morphology operations. An image is a function defined on a discrete domain with values in , and for 8 bits images. We denote the location of a pixel and its value. In the sequel, we will use the 4neighbourhood of pixels. For bidimensional images, we can also use the 8 or the 6neighbourhood [12] with insignificant differences [3].
2.1 Köhler’s method
Let us remind Köhler’s method [2, 4]. Given a greylevel image and a threshold , two classes are generated by : and . A boundary is also generated:
(1) 
For each couple of pixels of , Köhler associates a contrast defined as:
(2) 
which is the minimum of the two steps (in terms of contrast) generated by the threshold between and . The average contrast of the boundary is defined as:
(3) 
is the cardinality (number of elements) of and the summation is made on the couples of pixels belonging to . This generates a curve of contrasts for all the possible thresholds . The optimal threshold is selected as:
(4) 
In figure 1, we have extracted the 6 most significant thresholds (i.e. the local maxima) from the contrast curve. These multiple thresholds give an efficient simplification (i.e. compression) of the image grey levels: passing from 256 to 7 grey levels. The 7 grey levels corresponds to the mean value of the pixels for each class of the segmentation.
(a) Initial image (256 classes)  (c) Segmented image (7 classes) 
(c) Contrast curve 
A direct C++ implementation consists of computing for each threshold , the contrast of the boundary . It has a duration of 53 s using an image of size pixels and a processor Intel®Core^{TM} i7 CPU 4702HQ, 2.20 GHz, 4 cores, 8 threads. As the algorithm is not parallel, a single thread is used. For realtime applications, or big datasets, a faster algorithm is needed.
2.2 Accelerating operations on a neighbourhood
In Mathematical Morphology, for operations on a neighbourhood some acceleration methods exists. With a symmetric structuring element (such as the one associated to the 4neighbours), the morphological dilation corresponds to the Minkowski addition [15, 10, 11, 12, 16]:
(5) 
is the set translated by the vector . A direct implementation of a dilation, by computing the union on the neighbourhood of each pixel (fig. 2 (a)), will lead to an algorithm of complexity of [17]. is the number of pixels in the image and is the cardinality of . However, the Minkowksi addition (i.e. the dilation) has the property to distribute the union [18, 11]: . This property has important technological consequences as a dilation can be computed elements by elements of the structuring element , before combining the intermediate results by union. Therefore,
(6) 
An implementation by translating all pixel by the vectors and checking if they belong to the set will have a complexity of [17]. As images are stored in computer memory as unidimensional arrays, an efficient implementation [19] consists of translating the lines instead of the pixels (fig. 2 (b)).
(a)  (b) 
As the operations in each line are independent from the other lines (eq. 6), a parallel programming method is used, namely OpenMP [14], which is designed for multiprocessor/core, shared memory computers. Each thread is computed by a single core and all cores share a common memory. In addition, vectorised data are used with SIMD instructions [13] in each thread. Vector operations using SIMD instructions allow multiple data to be processed with a single instruction of the processor while scalar operations use one instruction to process each individual data. The size of the SIMD registers being of 128 bits, 16 operations on integers of 8 bits are performed at the same time instead of a single one. In recent compilers, the vectorisation is performed automatically after activation of the right option. In [19], a dilation with a square structuring element of size 3 pixels in a 8 bit image (of size pixels) is accelerated by a factor 136 with a neighbourhood implementation by comparison to a line implementation with parallelisation and vectorisation (45 ms versus 0.33 ms, Intel®Core^{TM} i3 CPU M330, 2.13 GHz, 2 cores, 4 threads). Let us introduce an efficient method to compute Köhler’s method.
3 A fast algorithm for Köhler’s method
3.1 Accelerating the computation of Köhler’s contrast
A direct implementation of Köhler’s approach (section 2.1) is not designed for parallel and vector processing. Kölher’s contrast (eq. 2) is summed on boundaries which are computed by the set difference between a morphological dilation of the set and this same set: . The structuring element corresponds to the 4neighbours. The direct implementation has a complexity of (i.e. the complexity of a dilation, , multiplied by the complexity of scanning the pixel pairs of the boundary , ). A direct acceleration would consist of computing the boundary with an accelerated morphological dilation, as presented above. However, such approach does not reduce the complexity of the algorithm. For this purpose, we propose first to perform the translation of the image lines, which is suited for parallel processing, and then to compute the contribution to the contrast of each pixel pairs, for each threshold. The complexity decreases to , i.e. the product of the number of pixels by the number of grey levels. The algorithm 1 presents our approach.
3.2 Reduction of the neighbourhood size
(a)  (b) 
In order to gain an additional factor 2, let us show that it is the same to make the computation on a “half” neighbourhood as on the 4neighbourhood (fig. 3). The idea has been introduced in [9, 3]. Here, the equality of the two approaches is demonstrated.
When scanning the boundary with the 4neighbourhood , only the pixels , such as , are contributing to the boundary contrast. However, both pixels and are neighbours:
. Therefore, both (ordered) pairs
and are scanned and only one pair is contributing to the contrast between the (unordered) set of points(7) 
Let us remove the order condition between the grey levels, and define the absolute pair contrast . When scanning both pairs without the grey level order, the contrast between the set of points , is:
(8) 
Using the “half”neighbourhood allows to scan only one pair of pixels. Therefore, we obtain our result:
(9) 
3.3 Implementation: parallelisation and vectorisation
In order to make parallel the algorithm, the computation of the contrast can be performed independently line by line. For each parallel thread , two arrays of length , (contrast) and (counter), need to be created at the beginning of the parallel process (line 6, alg. 1). At the end of the parallel process (line 19), they are grouped by summation in two arrays (contrast) and (counter). The parallel programming language used is OpenMP in C++. Instead of being performed between single numbers, several operations can be performed using arrays, allowing the vectorisation of the data. The following operations are vectorised and processed using SIMD instructions: the line translation (line 9), the computation of the minimum (line 11) and the maximum (line 12) between the arrays and , the computation of the contrast (line 14) and of the counter (line 15) and the normalisation of the contrast (line 20).
4 Results
We now compare the duration of the direct implementation to the fast algorithm. We have used a processor Intel®Core^{TM} i7 CPU 4702HQ, 2.20 GHz, 4 cores, 8 threads with 16Gb RAM. Using the image “Tulips” (fig. 1) with a current camera resolution of pixels and the fast algorithm with parallelisation, the computation of Köhler’s method is made in 0.13 s (tab. 1) instead of 53 s, with a gain factor of 405. With images of former resolution ( pixels), such as Lenna image; the direct method takes 0.69s and the fast method 0.005s with a gain factor of 126. Therefore, the necessity of using a faster algorithm instead of direct implementation becomes essential to process images with current resolution. Other experiments have confirmed this result.
Name  Lenna  Tulips 

Size (pixels)  
Direct (D)  6.90e01 s  5.29e+01 s 
Fast (B)  1.62e02 s  4.86e01 s 
Fast (A)  5.46e03 s  1.30e01 s 
Gain (B vs. D)  42  109 
Gain (A vs. D)  126  405 
Let us try Köhler’s method with a video of a car from the dataset YFCC100M (Yahoo Flickr Creative Commons 100M) [6, 20]. In figure 4, two frames and their segmentations in two classes are shown. The direct implementation segments the video at a rate of 1 frame per second while the fast implementation (with parallelisation) processes 97 frames per second, which is faster than the 25 frames/s of the video (tab. 2). Therefore, the fast algorithm for Köhler’s method is suited for real time video processing.
Name  YFCC100M (car) 

Size (pixels)  
Number of frames  640 
Direct (D)  0.99 frames/s 
Fast (A)  96.96 frames/s 
Gain (A vs. D)  98 
(a) frame 101  (b) frame 251 
(c) segmented frame 101  (c) segmented frame 251 
5 Conclusion and perspectives
A faster algorithm for Köhler’s thresholding has been introduced with a lower complexity, , than the direct approach, . It is designed to benefit from the capacities of processors: multicore processing with OpenMP and vector processing using SIMD instructions. Results show that with an image of 18 million pixels the duration is reduced by a factor 405 (from 53 s to 0.13 s) and that a video can be processed at a rate of 97 frames/s instead of 1 frame/s. Importantly, this algorithm is suited for applications requiring realtime or fast processing: video, industrial, large databases, etc. Its practical interest is to be combined with previous transforms: a lowpass filter, a mathematical morphology transform [11, 21, 17, 22] or a map of colour distances [23, 24]. In future works, the influence on the method of different contrasts will be presented (already studied), such as the contrasts defined in the Logarithmic Image Processing framework [4, 5].
References
 [1] H. Cai, Z. Yang, X. Cao, W. Xia, and X. Xu, “A new iterative triclass thresholding technique in image segmentation,” IEEE Transactions on Image Processing, vol. 23, no. 3, pp. 1038–1046, March 2014.
 [2] Ralf Kohler, “A segmentation system based on thresholding,” Computer Graphics and Image Processing, vol. 15, no. 4, pp. 319 – 338, 1981.

[3]
N. Hautière, R. Labayrade, and D. Aubert,
“Realtime disparity contrast combination for onboard estimation of the visibility distance,”
IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 2, pp. 201–212, June 2006.  [4] M. Jourlin, M. Carré, J. Breugnot, and M. Bouabdellah, “Chapter 7  Logarithmic image processing: Additive contrast, multiplicative contrast, and associated metrics,” in Advances in Imaging and Electron Physics, Peter W. Hawkes, Ed., vol. 171, pp. 357 – 406. Elsevier, 2012.
 [5] M. Jourlin, “Chapter three  metrics based on logarithmic laws,” in Logarithmic Image Processing: Theory and Applications, Michel Jourlin, Ed., vol. 195 of Advances in Imaging and Electron Physics, pp. 61 – 113. Elsevier, 2016.
 [6] Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and LiJia Li, “YFCC100M: The new data in multimedia research,” Commun. ACM, vol. 59, no. 2, pp. 64–73, Jan. 2016.
 [7] Rachid Zéboudj and Michel Jourlin, Filtrage, seuillage automatique, contraste et contours du prétraitement à l’analyse d’image, Ph.D. thesis, Université de SaintEtienne, France, 1988, Thèse de doctorat Sciences. Informatique.
 [8] M. Coster and J.L. Chermant, Précis d’analyse d’images, CNRS plus. Presses du CNRS, 1989.
 [9] Nicolas Hautière and Michel Jourlin, “Measurement of local contrast in images, application to the measurement of visibility distance through use of an onboard camera,” Traitement du Signal, vol. 23, no. 2, pp. 145–158, 2006.
 [10] G. Matheron, Eléments pour une théorie des milieux poreux, Masson, Paris, 1967.
 [11] J. Serra and N.A.C. Cressie, Image analysis and mathematical morphology: Vol. 1, Academic Press, London, 1982.
 [12] Pierre Soille, Morphological Image Analysis: Principles and Applications, SpringerVerlag New York, Inc., Secaucus, NJ, USA, 2 edition, 2003.
 [13] Paul Cockshott and Kenneth Renfrew, SIMD Programming Manual for Linux and Windows, Springer Publishing Company, Incorporated, 1st edition, 2010.
 [14] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, Number vol. 10 in Scientific Computation Series. MIT Press, 2008.
 [15] Hermann Minkowski, “Volumen und oberfläche,” Mathematische Annalen, vol. 57, pp. 447–495, 1903.
 [16] L. Najman and H. Talbot, Mathematical Morphology, ISTE. Wiley, 2013.
 [17] Thierry Géraud, Hugues Talbot, and Marc Van Droogenbroeck, Algorithms for Mathematical Morphology, pp. 323–353, John Wiley & Sons, Inc., 2013.
 [18] H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Grundlehren der mathematischen Wissenschaften. Springer, 1957.
 [19] Matthieu Faessel and Michel Bilodeau, “SMIL: simple morphological image library,” in Séminaire Performance et Généricité, LRDE, Mar. 2013.
 [20] “Video of a car from the YFCC100M dataset,” www.flickr.com/photos/32109282@N00/4078577031.
 [21] J. Serra, Image analysis and mathematical morphology: Theoretical advances, vol. 2, Academic Press, 1988.
 [22] G. Noyel, J. Angulo, and D. Jeulin, “A new spatiospectral morphological segmentation for multispectral remotesensing images,” International Journal of Remote Sensing, vol. 31, no. 22, pp. 5895–5920, 2010.
 [23] G. Noyel and M. Jourlin, “Asplünd’s metric defined in the logarithmic image processing (LIP) framework for colour and multivariate images,” in Image Processing (ICIP), 2015 IEEE International Conference on, Sept 2015, pp. 3921–3925.
 [24] Guillaume Noyel and Michel Jourlin, “Spatiocolour Asplünd ’s metric and Logarithmic Image Processing for Colour Images (LIPC),” in CIARP2016  XXI IberoAmerican Congress on Pattern Recognition, Lima, Peru, Nov. 2016, International Association for Pattern Recognition (IAPR).
Comments
There are no comments yet.