Computationally Efficient Implementation of Convolution-based Locally Adaptive Binarization Techniques

10/11/2012 ∙ by Ayatullah Faruk Mollah, et al.

One of the most important steps of document image processing is binarization. The computational requirements of locally adaptive binarization techniques make them unsuitable for devices with limited computing facilities. In this paper, we present a computationally efficient implementation of convolution based locally adaptive binarization techniques that keeps the performance comparable to the original implementation. The computational complexity has been reduced from $O(W^2N^2)$ to $O(WN^2)$ where $W \times W$ is the window size and $N \times N$ is the image size. Experiments over benchmark datasets show that the computation time has been reduced by 5 to 15 times depending on the window size, while memory consumption remains the same as that of the state-of-the-art algorithmic implementation.




1 Introduction

Document image binarization has been studied extensively over the past decades. It is one of the most important steps of any document processing system. It can be defined as the process of converting a multi-chromatic digital image into a bi-chromatic one. A multi-chromatic image, also called a color image, consists of color pixels, each of which is represented by a combination of three basic color components, viz. red ($R$), green ($G$) and blue ($B$). The range of values for each of these color components is 0-255. So, the corresponding gray scale value $g(x,y)$ for a pixel located at $(x,y)$ may be obtained by using Eq. 1.

$$g(x,y) = w_r R(x,y) + w_g G(x,y) + w_b B(x,y) \tag{1}$$

where $w_r$ = 0.299, $w_g$ = 0.587 and $w_b$ = 0.114. As $w_r + w_g + w_b = 1$, the range of $g(x,y)$ is also 0-255. So, a gray scale image can be represented as a matrix of gray level intensities $I_g = [g(x,y)]_{M \times N}$ where $M$ and $N$ denote the number of rows, i.e. the height of the image, and the number of columns, i.e. the width of the image, respectively. Similarly, a binarized image can be represented as $I_b = [b(x,y)]_{M \times N}$ such that $b(x,y) \in$ {0, 255}.
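The weighted sum of Eq. 1 can be illustrated with a short sketch. It is only an illustration in plain Python: the function name `to_gray`, the rounding to the nearest integer and the tiny sample image are assumptions made here, not part of the original work.

```python
# Weights quoted in the text; they sum to 1, so gray levels stay within 0-255.
W_R, W_G, W_B = 0.299, 0.587, 0.114


def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of gray levels.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [
        [int(round(W_R * r + W_G * g + W_B * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


# Hypothetical 1x3 image: a white, a black and a pure red pixel.
sample = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
gray = to_gray(sample)
print(gray)
```

Because the three weights sum to 1, a white pixel maps to 255 and a black pixel to 0, so the gray level never leaves the 0-255 range.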

Techniques developed so far for document image binarization fall into two categories: global binarization techniques and locally adaptive binarization techniques. In the first case, all pixels of the image are binarized with a single threshold $T$ as shown in Eq. 2, where $g(x,y)$ is the gray level and $b(x,y)$ is the binarized value of the pixel at $(x,y)$. A number of such techniques [1]-[4] have been developed, of which Otsu’s technique [4] has been found to be the best in a study conducted by Trier et al. [5]-[6].

$$b(x,y) = \begin{cases} 0 & \text{if } g(x,y) \le T \\ 255 & \text{otherwise} \end{cases} \tag{2}$$

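Global binarization techniques in general produce good results for noise-free, homogeneous document images of good quality. However, they fail to properly binarize images with uneven illumination and noise. Locally adaptive binarization techniques evolved to overcome this problem by binarizing each pixel with a pixel-specific threshold $t(x,y)$ as shown in Eq. 3, where $g(x,y)$ is the gray level and $b(x,y)$ is the binarized value of the pixel at $(x,y)$.

$$b(x,y) = \begin{cases} 0 & \text{if } g(x,y) \le t(x,y) \\ 255 & \text{otherwise} \end{cases} \tag{3}$$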

A good number of such adaptive techniques can be found in the literature [7]-[14]. Among these, Niblack’s technique [11] was found to be the best in the same study by Trier et al. [5]-[6]. Later, more advanced techniques were designed, some of which have been reported in [15]-[19]. Sauvola’s [15] text binarization method (TBM) is one of them. This method calculates the threshold $t(x,y)$ from the mean $m(x,y)$ and standard deviation $s(x,y)$ of the gray levels of the pixels within a window around the subject pixel, as described in Eq. 4.

$$t(x,y) = m(x,y) \left[ 1 + k \left( \frac{s(x,y)}{R} - 1 \right) \right] \tag{4}$$

where $k$ is a positive constant and $R$ is the dynamic range of the standard deviation. Many locally adaptive binarization techniques, including Sauvola’s [15], are convolution based. As a result, their computational complexity and computation time are very high. So far, binarization techniques have been evaluated on the basis of binarization accuracy only [20]-[21]. However, a study of the computational requirements of these algorithms is also needed, especially for real-time systems and resource-constrained computing devices such as cell-phones, Personal Digital Assistants (PDA), iPhones, iPod-Touch, etc.

The present work is an attempt to reduce the computational complexity of convolution based binarization techniques while retaining comparable accuracy. The computational complexity of a global binarization technique is usually $O(N^2)$ where $N \times N$ is the image size. In convolution based binarization techniques, selecting the threshold for each pixel requires computing the mean and standard deviation of the gray level intensities of the surrounding pixels within the window. So, computing the threshold for a single pixel has a complexity of $O(W^2)$ where $W \times W$ is the window size, and the overall complexity is $O(W^2N^2)$ for an image of $N \times N$ pixels.
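To make the per-pixel cost concrete, the following is a minimal plain-Python sketch of a full-window Sauvola binarization, using the threshold t = m * (1 + k * (s / R - 1)). It is not the authors' implementation: the function name, the defaults `k=0.5` and `R=128`, the clipping of windows at the image border and the convention that pixels brighter than the threshold become 255 are all assumptions of this sketch.

```python
import math


def sauvola_naive(gray, w=3, k=0.5, R=128):
    """Binarize `gray` (rows of gray levels) using the full w x w window.

    Each pixel visits up to w*w neighbours, which gives the O(W^2 N^2)
    overall cost. Windows are clipped at the border (an assumption).
    """
    h, wd = len(gray), len(gray[0])
    half = w // 2
    out = [[0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            # Collect every pixel of the (clipped) window around (x, y).
            vals = [
                gray[j][i]
                for j in range(max(0, y - half), min(h, y + half + 1))
                for i in range(max(0, x - half), min(wd, x + half + 1))
            ]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / R - 1))
            out[y][x] = 255 if gray[y][x] > t else 0
    return out


# Hypothetical 3x5 image: one dark stroke pixel on a light background.
page = [
    [200, 200, 200, 200, 200],
    [200, 200, 50, 200, 200],
    [200, 200, 200, 200, 200],
]
binary = sauvola_naive(page)
```

The inner list comprehension is the part that grows quadratically with the window size, which is why larger windows quickly become expensive.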

As the study in [22] shows, handheld/mobile devices may not be capable of running algorithms of $O(W^2N^2)$ complexity within an affordable time. Even on desktop computers, the time taken by such algorithms may be unsatisfactory to many users. In [23], Shafait et al. suggested an implementation of such algorithms with a computational complexity of $O(N^2)$. They proposed a faster calculation of $m(x,y)$ and $s(x,y)$ using integral images, but at the cost of 5 to 6 times more memory. In the present work, we propose a novel implementation of convolution based binarization algorithms which has a computational complexity of $O(WN^2)$ and does not require any additional memory. Experimental results on publicly available standard datasets and on our own dataset are also presented.
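The integral-image idea can be sketched as follows. This is a generic summed-area-table illustration in plain Python, not the code of Shafait et al. [23]; the function names and the border clipping are assumptions. It shows why the window mean and standard deviation cost a constant number of lookups per pixel, and why two extra tables (sums and sums of squares) must be stored.

```python
import math


def integral(img, power=1):
    """Summed-area table of img**power with a zero first row and column."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 with four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def local_mean_std(img, y, x, w, ii, ii_sq):
    """Mean and standard deviation of the clipped w x w window at (x, y)."""
    h, wd = len(img), len(img[0])
    half = w // 2
    y0, y1 = max(0, y - half), min(h, y + half + 1)
    x0, x1 = max(0, x - half), min(wd, x + half + 1)
    n = (y1 - y0) * (x1 - x0)
    m = window_sum(ii, y0, x0, y1, x1) / n
    var = window_sum(ii_sq, y0, x0, y1, x1) / n - m * m
    return m, math.sqrt(max(var, 0.0))


# Hypothetical 3x5 image: one dark stroke pixel on a light background.
page = [
    [200, 200, 200, 200, 200],
    [200, 200, 50, 200, 200],
    [200, 200, 200, 200, 200],
]
ii, ii_sq = integral(page, 1), integral(page, 2)
m, s = local_mean_std(page, 1, 2, 3, ii, ii_sq)
```

The table of squared intensities grows much faster than the image values themselves, which is the reason wider integer types, and hence more memory, are needed in a fixed-width implementation.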

2 Present Work

The computation of the mean $m(x,y)$ and standard deviation $s(x,y)$ for each pixel $(x,y)$ is the most time-consuming operation in convolution based locally adaptive binarization techniques. If a window of size $W \times W$ pixels is taken around a pixel $(x,y)$, then the set of window pixels, $S$, will have $W^2$ elements. The performance of such methods depends heavily on the size of the window. The window size is decided on the basis of pattern stroke and pattern size. It cannot be made arbitrarily small.

A possible way to reduce the execution time of such binarization methods is to reduce the number of pixels in $S$ by considering only the pixels that effectively contribute to the computation of the mean and variance within the window. In the present work, we reduce this number by sampling pixels from $S$ following some geometrical order to form a reduced set $S'$.

Different geometric structures can be defined to select the contributive pixels. A few such geometric structures are shown in Fig. 1. $S'$ contains the pixels corresponding to the black boxes marked in the geometric structures of Fig. 1. It may be observed that in $S$, the number of foreground pixels is much smaller than that of the background pixels for windows of the same size around both foreground and background pixels. The mean and standard deviation computed from $S'$ are denoted as $m'(x,y)$ and $s'(x,y)$ respectively.

Figure 1: Various geometric structures for selection of representative pixels in a window

It is evident that $S'$ is a very small subset of $S$ for all the geometric structures of Fig. 1. In this context, the formulation of Sauvola’s method is as given in Eq. 5.

$$t'(x,y) = m'(x,y) \left[ 1 + k' \left( \frac{s'(x,y)}{R'} - 1 \right) \right] \tag{5}$$

where $t'(x,y)$ denotes the threshold calculated from the reduced set $S'$ for the pixel $(x,y)$, $R'$ is the dynamic range of $s'(x,y)$ and $k'$ is a positive constant.
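The reduced-set idea can be sketched in plain Python as below. The geometric structures of Fig. 1 are not reproduced in this text, so the sampling pattern used here, the centre row and centre column of the window, is a hypothetical stand-in and not necessarily one of GS-1 to GS-6. The defaults `k=0.5` and `R=128` and the border handling are likewise assumptions. The point of the sketch is that the sampled set has 2W-1 pixels, so the per-pixel cost grows linearly with W.

```python
import math


def plus_offsets(w):
    """Hypothetical structure: centre row and centre column of the window."""
    half = w // 2
    offsets = {(0, d) for d in range(-half, half + 1)}
    offsets |= {(d, 0) for d in range(-half, half + 1)}
    return sorted(offsets)


def sauvola_reduced(gray, w=3, k=0.5, R=128, offsets=None):
    """Sauvola-style binarization using only the sampled window pixels."""
    h, wd = len(gray), len(gray[0])
    if offsets is None:
        offsets = plus_offsets(w)
    out = [[0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            # Sampled pixels that fall inside the image; (0, 0) is always
            # among the offsets, so the list is never empty.
            vals = [
                gray[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < wd
            ]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + k * (s / R - 1))
            out[y][x] = 255 if gray[y][x] > t else 0
    return out


# Hypothetical 3x5 image: one dark stroke pixel on a light background.
page = [
    [200, 200, 200, 200, 200],
    [200, 200, 50, 200, 200],
    [200, 200, 200, 200, 200],
]
reduced = sauvola_reduced(page)
```

For a 20x20 window, this pattern would sample about 40 pixels instead of 400, which is where the saving in computation time comes from.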

3 Experimental Results

The proposed implementation has been tested on the printed as well as handwritten images used for benchmarking the performance of various binarization techniques in the Document Image Binarization Contest (DIBCO) 2009 [24] and the Handwritten Document Image Binarization Contest (H-DIBCO) 2010 [25]. It has also been tested on our own dataset (CMATERdb-6), which contains 5 representative images. The first is a handwritten Bengali document in which the text on the rear side is visible from the front side and the image is unevenly illuminated. The second is an old historical printed Bengali document. The third is an image of printed English text captured from a notice board by a cell-phone camera. The fourth is an old printed document containing text in multiple fonts and font sizes. The last is a business card image captured by a cell-phone camera.

Ground truth data for the DIBCO and H-DIBCO datasets are publicly available. We have prepared the ground truth data for the images of CMATERdb-6 dataset. The original and ground truth images of this dataset are publicly available for research purposes at [26]. These three datasets together have 25 representative images containing various kinds of degradations and deformations. The results obtained with the proposed implementation have been compared with the original/algorithmic implementation of Sauvola’s binarization method, since it is one of the best convolution-based binarization methods. Results obtained with Niblack as well as Otsu’s binarization technique have also been given.

3.1 Performance Analysis

The current implementation starts from the conjecture that the threshold value $t'(x,y)$ obtained from $m'(x,y)$ and $s'(x,y)$ is not considerably different from $t(x,y)$ computed from $m(x,y)$ and $s(x,y)$. As a result, the performance remains comparable. The binarized result obtained with the new threshold may not be exactly the same as that obtained with $t(x,y)$, but experiments show that the results obtained with the presented technique serve the purpose of binarization very well.

Comparing the output images obtained using the various geometric structures of Fig. 1(a-f) with their ground truth images, we find the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The definition of F-Measure (FM) in terms of recall rate (R) and precision rate (P) is given in Eq. 6.

$$FM = \frac{2 \times R \times P}{R + P} \tag{6}$$

where $R = \frac{TP}{TP + FN}$ and $P = \frac{TP}{TP + FP}$. In an ideal situation, i.e. when the output image is identical to the ground truth image, $R$, $P$ and $FM$ should all be 100%. While calculating the F-Measure, the best combination of window size ($W$) and $k$ has been considered in all cases.
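The F-Measure computation can be sketched as follows. The helper names are hypothetical, and treating black (0) pixels as the positive, i.e. text, class is an assumption of this sketch rather than something stated in the text.

```python
def confusion_counts(output, truth, foreground=0):
    """Count TP, FP and FN, treating `foreground` pixels as positives."""
    tp = fp = fn = 0
    for out_row, gt_row in zip(output, truth):
        for o, g in zip(out_row, gt_row):
            if o == foreground and g == foreground:
                tp += 1
            elif o == foreground:
                fp += 1
            elif g == foreground:
                fn += 1
    return tp, fp, fn


def f_measure(tp, fp, fn):
    """F-Measure from recall and precision; assumes tp > 0."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 * recall * precision / (recall + precision)


# Hypothetical 2x3 output and ground truth images.
output = [[0, 0, 255], [255, 0, 255]]
truth = [[0, 255, 255], [0, 0, 255]]
tp, fp, fn = confusion_counts(output, truth)
fm = f_measure(tp, fp, fn)
```

True negatives do not enter the F-Measure, so large background regions that are trivially classified correctly do not inflate the score.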

Table 1 shows F-Measures achieved with Otsu, Niblack, Sauvola and proposed implementations of Sauvola’s method for DIBCO image dataset. It contains 5 (1-5) printed and 5 (6-10) handwritten images. Bold cells represent the highest F-Measure achieved for the corresponding image. It may be noted that the highest mean F-Measure (91.13%) has been achieved with GS-3. Moreover, mean F-Measures achieved with GS-4, GS-5 and GS-6 are greater than that of Sauvola’s method. F-Measure with GS-2 is equal to that of Sauvola.

| Image | Otsu | Niblack | Sauvola | GS-1 | GS-2 | GS-3 | GS-4 | GS-5 | GS-6 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 91.06 | 88.12 | 91.64 | 91.18 | 91.83 | 91.88 | 91.61 | 91.91 | 91.71 |
| 2 | 96.56 | 94.76 | 96.39 | 95.58 | 96.27 | 96.16 | 96.14 | 96.40 | 96.25 |
| 3 | 96.71 | 88.65 | 95.82 | 94.90 | 95.83 | 95.58 | 95.68 | 95.96 | 95.77 |
| 4 | 82.59 | 90.29 | 92.93 | 91.70 | 93.02 | 92.88 | 92.56 | 92.93 | 92.86 |
| 5 | 89.58 | 85.59 | 89.81 | 88.49 | 89.85 | 89.36 | 89.33 | 89.87 | 89.62 |
| 6 | 90.85 | 92.70 | 92.34 | 91.64 | 91.96 | 92.15 | 91.77 | 91.78 | 91.98 |
| 7 | 86.15 | 75.02 | 86.65 | 89.64 | 88.99 | 89.41 | 89.21 | 89.06 | 89.23 |
| 8 | 84.11 | 88.19 | 87.99 | 88.08 | 88.02 | 89.08 | 88.74 | 88.46 | 88.92 |
| 9 | 40.56 | 86.20 | 88.62 | 88.09 | 88.65 | 89.24 | 88.52 | 88.79 | 88.99 |
| 10 | 28.04 | 85.62 | 85.38 | 86.36 | 83.13 | 85.59 | 85.13 | 85.05 | 84.81 |
| Mean | 78.62 | 87.51 | 90.76 | 90.57 | 90.76 | 91.13 | 90.89 | 91.02 | 91.01 |

Table 1: F-Measures (%) achieved with different techniques/implementations for DIBCO images (Bold cells represent the highest F-Measure for the corresponding image)

Similar to Table 1, Table 2 shows the F-Measures achieved with H-DIBCO images. This dataset contains 10 representative handwritten document images. The proposed implementations have achieved the highest F-Measure for 6 images out of 10. The implementation referred to as GS-3 alone has yielded 3 of the highest F-Measures. Although the mean F-Measure is highest for Sauvola, the F-Measures of the proposed implementations are close to it.

| Image | Otsu | Niblack | Sauvola | GS-1 | GS-2 | GS-3 | GS-4 | GS-5 | GS-6 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 91.47 | 90.98 | 91.23 | 89.39 | 90.82 | 89.71 | 90.28 | 91.37 | 90.57 |
| 2 | 88.18 | 88.46 | 89.03 | 89.86 | 88.32 | 89.47 | 89.53 | 88.64 | 89.26 |
| 3 | 84.36 | 81.78 | 85.64 | 84.16 | 85.01 | 84.36 | 84.45 | 85.15 | 84.63 |
| 4 | 85.62 | 89.80 | 89.67 | 89.29 | 89.82 | 89.84 | 89.45 | 89.60 | 89.69 |
| 5 | 88.28 | 84.57 | 92.26 | 92.91 | 93.20 | 93.51 | 93.19 | 93.23 | 93.38 |
| 6 | 80.38 | 84.38 | 84.09 | 84.04 | 83.77 | 84.11 | 84.42 | 84.33 | 84.33 |
| 7 | 90.12 | 89.57 | 90.87 | 90.76 | 90.69 | 91.13 | 90.84 | 90.79 | 91.09 |
| 8 | 85.68 | 88.32 | 88.23 | 87.27 | 88.01 | 88.29 | 88.30 | 88.12 | 88.32 |
| 9 | 81.28 | 88.43 | 88.42 | 88.40 | 87.88 | 87.88 | 87.85 | 87.92 | 87.92 |
| 10 | 79.25 | 87.67 | 87.60 | 87.90 | 86.00 | 85.91 | 85.54 | 85.67 | 85.66 |
| Mean | 85.46 | 87.40 | 88.70 | 88.40 | 88.35 | 88.42 | 88.39 | 88.48 | 88.49 |

Table 2: F-Measures (%) achieved with different techniques/implementations for H-DIBCO images (Bold cells represent the highest F-Measure for the corresponding image)

| Image | Otsu | Niblack | Sauvola | GS-1 | GS-2 | GS-3 | GS-4 | GS-5 | GS-6 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 88.00 | 89.71 | 89.98 | 90.10 | 90.09 | 90.16 | 90.05 | 89.96 | 90.14 |
| 2 | 88.88 | 88.68 | 89.04 | 88.97 | 89.07 | 89.05 | 89.06 | 89.07 | 89.07 |
| 3 | 91.93 | 92.89 | 93.41 | 93.00 | 93.37 | 93.46 | 93.29 | 93.38 | 93.46 |
| 4 | 99.04 | 95.64 | 98.02 | 97.01 | 97.42 | 97.90 | 97.80 | 97.82 | 97.94 |
| 5 | 91.06 | 90.94 | 91.65 | 91.40 | 91.63 | 91.70 | 91.81 | 91.77 | 91.80 |
| Mean | 91.78 | 91.57 | 92.42 | 92.10 | 92.32 | 92.45 | 92.40 | 92.40 | 92.48 |

Table 3: F-Measures (%) achieved with different techniques/implementations for CMATERdb-6 images (Bold cells represent the highest F-Measure for the corresponding image)

Table 3 shows the F-Measures achieved with CMATERdb-6 images. It is noteworthy that highest mean F-Measure i.e. 92.48% has been achieved for GS-6 whereas the mean F-Measure for Sauvola’s method is 92.42%. It may also be noted that Otsu’s global binarization method has given the highest F-Measure for the fourth image.

Figure 2: Mean F-Measures computed for all images of the 3 benchmarking datasets with various techniques and implementations

Figure 3: Sample images and binarized results with various techniques. (a,e,i) Image #7 of [24], #2 of [25] and #4 of [26] respectively, (b,f,j) Binarized images with Sauvola’s method, (c,g,k) Binarized images with GS-3, GS-1 and GS-3 respectively, (d,h,l) Ground truth images

A comparison of the mean F-Measures achieved for all 25 images of the 3 datasets with the various techniques is shown in Fig. 2. The highest F-Measure, i.e. 90.31%, is achieved with GS-3. The F-Measures achieved with all the present implementations are greater than that of Niblack. Three implementations, viz. GS-1, GS-2 and GS-4, have yielded F-Measures slightly lower than that of Sauvola, and the remaining implementations, viz. GS-3, GS-5 and GS-6, have yielded F-Measures slightly higher than that of Sauvola. This shows that the results of the proposed implementations are comparable with the result of Sauvola. Fig. 3 shows some sample images from the above datasets and their binarized images for some of the techniques.
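The overall figures can be cross-checked from the "Mean" rows of Tables 1 to 3 by weighting each dataset mean by its number of images (10, 10 and 5). The sketch below does this in plain Python. Since it starts from rounded table means, its results are approximations of the values plotted in Fig. 2, and the variable names are illustrative only.

```python
METHODS = ["Otsu", "Niblack", "Sauvola",
           "GS-1", "GS-2", "GS-3", "GS-4", "GS-5", "GS-6"]

# (number of images, mean F-Measures in %) copied from the "Mean" rows.
DATASET_MEANS = [
    (10, [78.62, 87.51, 90.76, 90.57, 90.76, 91.13, 90.89, 91.02, 91.01]),  # DIBCO
    (10, [85.46, 87.40, 88.70, 88.40, 88.35, 88.42, 88.39, 88.48, 88.49]),  # H-DIBCO
    (5, [91.78, 91.57, 92.42, 92.10, 92.32, 92.45, 92.40, 92.40, 92.48]),   # CMATERdb-6
]


def pooled_means(dataset_means):
    """Image-count-weighted mean F-Measure per method over all datasets."""
    total = sum(n for n, _ in dataset_means)
    return {
        name: sum(n * means[i] for n, means in dataset_means) / total
        for i, name in enumerate(METHODS)
    }


overall = pooled_means(DATASET_MEANS)
best = max(overall, key=overall.get)
for name in METHODS:
    print(f"{name:8s} {overall[name]:.2f}")
```

Under this weighting, GS-3 comes out highest at about 90.31%, and Sauvola's pooled mean falls between those of GS-4 and GS-5, in line with the ordering described above.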

3.2 Computational Complexity and Computation Time

The proposed technique calculates the threshold for each pixel in $O(W)$ time. So, the computational complexity of the proposed technique is $O(WN^2)$. A plot of the mean computation times of Niblack, Sauvola and the proposed techniques is shown in Fig. 4, measured on a moderately powerful notebook (DualCore T2370, 1.73 GHz, 1GB RAM, 1MB L2 Cache). It may be observed from Fig. 4 that the computation time of the proposed technique is much lower than that of Niblack’s and Sauvola’s implementations.

Figure 4: Plot of mean computation times of Niblack, Sauvola and proposed techniques (for the images of resolution 1024x768 with 20x20 window size)

3.3 Memory Consumption

As storing a pixel of a gray scale image requires 1 byte of memory, an image of $N \times N$ pixels requires $N^2$ bytes of memory. The algorithmic implementation of Niblack’s and Sauvola’s techniques makes a copy of the image before binarizing its pixels by convolving the window. So, the memory consumption of this algorithm is $2N^2 + c_1$ bytes where $c_1$ is a constant.

The implementation proposed by Shafait et al. [23] is faster, but it requires additional memory. It prepares two integral images from the given image: one for the intensity values and the other for the squares of the intensity values. To store these integral images with 32 bit and 64 bit integers respectively, we need $4N^2$ and $8N^2$ bytes of memory. So, the memory consumption in this case is $12N^2 + c_2$ bytes where $c_2$ is another constant. It may be noted that this memory consumption is 6 times higher than that of the algorithmic implementation.

The memory consumption of our implementation can be given as $2N^2 + c_3$ bytes where $c_3$ is another constant. It may be noted that this implementation requires no additional memory compared to the original/algorithmic implementation.
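As a rough worked example of these figures, the sketch below computes the dominant memory terms for a 1024x768 image, the resolution used for Fig. 4. It assumes 1 byte per gray pixel, one working copy of the image for the window-based implementations, and 4-byte and 8-byte entries for the two integral images. Constant overheads are ignored, and the function name is hypothetical.

```python
def memory_bytes(width, height):
    """Dominant memory terms in bytes, ignoring constant overheads."""
    pixels = width * height
    return {
        "image": pixels,                          # 1 byte per gray pixel
        "image_plus_copy": 2 * pixels,            # window-based implementations
        "integral_images": 4 * pixels + 8 * pixels,  # 32-bit and 64-bit tables
    }


usage = memory_bytes(1024, 768)
ratio = usage["integral_images"] / usage["image_plus_copy"]
for name, size in usage.items():
    print(f"{name:16s} {size / 2**20:.2f} MiB")
```

Under these assumptions the two integral images take 9 MiB against 1.5 MiB for the image and its copy, which gives the factor of 6 mentioned above.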

4 Conclusion

In this paper, we have presented a novel implementation of convolution based locally adaptive binarization techniques. Both the computational complexity and the computation time are significantly reduced while keeping the performance close to that of the ordinary implementation. The computational complexity has been reduced from $O(W^2N^2)$ to $O(WN^2)$ and the computation time has been reduced by 5 to 15 times depending on the window size. At the same time, memory consumption is the same as that of the original implementation. This type of implementation is especially useful in image analysis and document processing for real-time systems and for handheld mobile devices with limited computational facilities. As camera based applications on mobile devices have become considerably more common, the presented technique should be highly useful.

Acknowledgments

We are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER), Jadavpur University, for providing infrastructural support for this research work. The first author is also thankful to the School of Mobile Computing and Communication (SMCC) for providing him with a fellowship.


References

  • [1] Abutaleb, A.S.: Automatic thresholding of gray-level pictures using two-dimensional entropy. Computer Vision, Graphics, and Image Processing, 47, 22–32 (1989)
  • [2] Kapur, J.N., Sahoo, P.K., Wong, A.K.C.: A new method for gray level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing, 29, 273–285 (1985)
  • [3] Kittler, J., Illingworth, J.: Minimum error thresholding. Pattern Recognition, 19(1), 41–47 (1986)
  • [4] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Systems, Man, and Cybernetics, 9(1), 62–66 (1979)
  • [5] Trier, Ø.D., Taxt, T.: Evaluation of binarization methods for document images. IEEE Trans. Pattern Anal. Mach. Intell., 17(3), 312–315 (1995)
  • [6] Trier, Ø.D., Jain, A.K.: Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell., 17(12), 1191–1201 (1995)
  • [7] Bernsen, J.: Dynamic thresholding of grey-level images. In: Eighth Int’l Conf. on Pattern Recognition, pp. 1251–1255, Paris (1986)
  • [8] Nakagawa, Y., Rosenfeld, A.: Some experiments on variable thresholding. Pattern Recognition, 11(3), 191–204 (1979)
  • [9] Eikvil, L., Taxt, T., Moen, K.: A fast adaptive method for binarization of document images. In: First Int’l Conf. on Document Analysis and Recognition, pp. 435–443, Saint-Malo, France (1991)
  • [10] Mardia K.V., Hainsworth, T.J.: A spatial thresholding method for image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 10(6), 919–927 (1988)
  • [11] Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, pp. 115–116 (1986)
  • [12] White, J.M., Rohrer, G.D.: Image thresholding for optical character recognition and other applications requiring character image extraction. IBM J. Research and Development, 27(4), 400–411 (1983)
  • [13] Parker, J.R.: Gray level thresholding in badly illuminated images. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(8), 813–819 (1991)
  • [14] Trier, Ø.D., Taxt, T.: Improvement of integrated function algorithm for binarization of document images. Pattern Recognition Letters, 16(3), 277–283 (1995)
  • [15] Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recognition, 33, 225–236 (2000)
  • [16] Seeger, M., Dance, C.: Binarizing camera images for OCR. In: 6th Int’l Conf. on Document Analysis and Recognition, pp. 54–58 (2001)
  • [17] Wolf, C., Jolion, J.M., Chassaing, F.: Text localization, enhancement and binarization in multimedia documents. In: Int’l Conf. on Pattern Recognition, pp. 1037–1040 (2002)
  • [18] Gatos, B., Pratikakis, I., Perantonis, S.J.: Adaptive degraded document image binarization. Pattern Recognition, 39(3), 317–327 (2006)
  • [19] Shin, K.T., Jang, I.H., Kim, N.C.: Block adaptive binarization of ill-conditioned business card images acquired in a PDA using a modified quadratic filter. IET Image Processing, 1(1), 56–66 (2007)
  • [20] Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 Document Image Binarization Contest (DIBCO 2009). In: 10th Int’l Conf. on Document Analysis and Recognition, pp. 1375–1382, Spain (2009)
  • [21] Pratikakis, I., Gatos, B., Ntirogiannis, K.: H-DIBCO 2010 – Handwritten Document Image Binarization Competition. In: 12th Int’l Conf. on Frontiers in Handwriting Recognition, pp. 727–732, India (2010)
  • [22] Dunlop, M.D., Brewster, S.A.: The Challenge of Mobile Devices for Human Computer Interaction. Personal and Ubiquitous Computing, 6(4), 235–236 (2002)
  • [23] Shafait, F., Keysers, D., Breuel, T.M.: Efficient implementation of local adaptive thresholding techniques using integral images. In: Document Recognition and Retrieval XV, San Jose, USA (2008)
  • [24] DIBCO 2009 Benchmarking Dataset,
  • [25] H-DIBCO 2010 Benchmarking Dataset,
  • [26] CMATER Database Repository,