1 Introduction
Document image binarization has been an extensively studied topic over the past decades. It is one of the most important steps of any document processing system. It can be defined as the process of converting a multichromatic digital image into a bichromatic one. A multichromatic image, also called a color image, consists of color pixels, each of which is represented by a combination of three basic color components, viz. red (R), green (G) and blue (B). The range of values for each of these color components is 0–255. So, the corresponding gray scale value I(x, y) for a pixel located at (x, y) may be obtained by using Eq. 1.
I(x, y) = W_R · R(x, y) + W_G · G(x, y) + W_B · B(x, y)    (1)
where W_R = 0.299, W_G = 0.587 and W_B = 0.114. As W_R + W_G + W_B = 1, the range of I(x, y) is also 0–255. So, a gray scale image can be represented as a matrix of gray level intensities I = [I(x, y)] of size H × W, where H and W denote the number of rows, i.e. the height of the image, and the number of columns, i.e. the width of the image, respectively. Similarly, a binarized image can be represented as B = [B(x, y)] of size H × W such that B(x, y) ∈ {0, 255}.
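As a minimal sketch, the conversion of Eq. 1 can be written as follows; the weights come from the text above, while the function name `to_gray` is our own.

```python
# Luminance weights of Eq. 1 (they sum to 1, so the output stays in 0..255).
W_R, W_G, W_B = 0.299, 0.587, 0.114

def to_gray(r, g, b):
    """Map an (R, G, B) pixel, each component in 0..255, to a gray level in 0..255."""
    return round(W_R * r + W_G * g + W_B * b)
```

Since the weights sum to 1, full white maps to 255 and full black to 0.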
Techniques developed so far for document image binarization are categorized into two types: global binarization techniques and locally adaptive binarization techniques. In the first case, all pixels constituting the image are binarized with a single threshold t, as shown in Eq. 2. A number of such techniques [1]–[4] have been developed, of which Otsu's technique [4] has been found to be the best in a study conducted by Trier et al. [5], [6].
B(x, y) = 255 if I(x, y) ≥ t, and 0 otherwise    (2)
Global binarization techniques in general produce good results for noise-free, homogeneous document images of good quality. But they fail to properly binarize images with uneven illumination or noise. Locally adaptive binarization techniques evolved to overcome this problem by binarizing each pixel with a pixel-specific threshold t(x, y), as shown in Eq. 3.
B(x, y) = 255 if I(x, y) ≥ t(x, y), and 0 otherwise    (3)
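The contrast between Eqs. 2 and 3 can be sketched as follows; the function names are our own, and an image is represented as a plain list of rows of gray levels.

```python
def binarize_global(image, t):
    """Eq. 2: the whole image shares one threshold t."""
    return [[255 if pixel >= t else 0 for pixel in row] for row in image]

def binarize_adaptive(image, threshold_of):
    """Eq. 3: threshold_of(x, y) supplies a pixel-specific threshold t(x, y)."""
    height, width = len(image), len(image[0])
    return [[255 if image[y][x] >= threshold_of(x, y) else 0
             for x in range(width)]
            for y in range(height)]
```

Any of the adaptive techniques discussed below amounts to a particular choice of `threshold_of`.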
Quite a good number of such adaptive techniques can be found in the literature [7]–[14]. Among these, the best one has been found to be Niblack's [11] in the same study by Trier et al. [5], [6]. Later, more advanced techniques have been designed, and some of them have been reported in [15]–[19]. Sauvola's [15] text binarization method (TBM) is one of them. This method calculates the threshold t(x, y) from the mean m(x, y) and standard deviation s(x, y) of the gray levels of the pixels within a window around the subject pixel, as described in Eq. 4.

t(x, y) = m(x, y) · [1 + k · (s(x, y) / R − 1)]    (4)
where k is a positive constant and R is the dynamic range of the standard deviation. A good number of the locally adaptive binarization techniques, including Sauvola's [15], are convolution based. As a result, the computational complexity and computation time of such techniques are very high. So far, binarization techniques have been evaluated on the basis of binarization accuracy only [20], [21]. But a study of the computational requirements of these algorithms is also needed, especially for real-time systems and low-resource computing devices such as cellphones, Personal Digital Assistants (PDAs), iPhones, iPod Touches, etc.
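Eq. 4 can be sketched over the pixels of a single window as follows. The default parameter values k = 0.5 and R = 128 are commonly cited choices for Sauvola's method, not values fixed by this paper.

```python
from statistics import mean, pstdev

def sauvola_threshold(window_pixels, k=0.5, R=128):
    """Eq. 4 over the gray levels inside one window:
    t = m * (1 + k * (s / R - 1))."""
    m = mean(window_pixels)
    s = pstdev(window_pixels)  # population standard deviation of the window
    return m * (1 + k * (s / R - 1))
```

For a flat window (s = 0) the threshold drops to m · (1 − k), which is what makes the method robust to background regions.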
The present work is an attempt to reduce the computational complexity of convolution based binarization techniques while retaining comparable accuracies. The computational complexity of a global binarization technique is usually O(N), where N is the number of pixels in the image. In the case of convolution based binarization techniques, selection of the threshold for each pixel requires computation of the mean and standard deviation of the gray level intensities of the surrounding pixels within the window. So, computation of the individual threshold value for each pixel has a complexity of O(w²), where w × w is the window size, and the overall complexity is O(Nw²) for an image of N pixels.
As the study in [22] shows, handheld/mobile devices may not be capable of running algorithms of complexity O(Nw²) within affordable time. Even the time taken by such algorithms on desktop computers may be unsatisfactory to many. In [23], Shafait et al. have suggested an implementation of such algorithms with a computational complexity of O(N). They proposed a faster calculation of m(x, y) and s(x, y) using integral images, but at the cost of 5 to 6 times more memory. In the present work, we propose a novel implementation of convolution based binarization algorithms which has a computational complexity of O(Nw) and does not require any additional memory. Experimental results on publicly available standard datasets and on our own dataset are also presented.
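The integral-image idea of Shafait et al. [23] can be sketched as follows; the function names are ours. A summed-area table lets the sum over any window, and hence m(x, y), be read off in constant time; a second table built the same way from squared intensities gives s(x, y).

```python
def integral_image(image):
    """Summed-area table: ii[y][x] = sum of image[0..y][0..x]."""
    height, width = len(image), len(image[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def window_sum(ii, x0, y0, x1, y1):
    """Sum over the inclusive rectangle (x0, y0)-(x1, y1) in O(1),
    using at most four lookups into the summed-area table."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total
```

The speed comes at the memory cost noted above, since both tables must be kept alongside the image.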
2 Present Work
The computation of the mean m(x, y) and standard deviation s(x, y) for each pixel (x, y) is the most time consuming operation in convolution based locally adaptive binarization techniques. If a window of size w × w pixels is taken around a pixel (x, y), then the set of window pixels, S, will have w² elements. The performance of such methods is heavily dependent on the size of the window. The window size is decided on the basis of pattern stroke width and pattern size; it cannot be made arbitrarily small.
A possible way to reduce the execution time of such binarization methods is to reduce the number of pixels in S by considering only the pixels which effectively contribute to the computation of the mean and variance within the window. In our present work, we have tried to reduce this number by sampling pixels from S following some geometric order to form a reduced set S′ ⊂ S. Different geometric structures can be defined to select the contributive pixels. A few such geometric structures are shown in Fig. 1; S′ contains the pixels corresponding to the black boxes marked in these structures. It may be observed that, for windows of the same size around both foreground and background pixels, the number of foreground pixels in S′ is much smaller than that of background pixels. The mean and standard deviation computed from S′ are denoted as m′(x, y) and s′(x, y) respectively.
It is evident that S′ is a very small subset of S for all the geometric structures of Fig. 1. In this context, the formulation of Sauvola's method becomes as given in Eq. 5.
t′(x, y) = m′(x, y) · [1 + k · (s′(x, y) / R′ − 1)]    (5)
where t′(x, y) denotes the threshold calculated from the reduced set S′ for the pixel (x, y), R′ is the dynamic range of s′(x, y) and k is a positive constant.
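The scheme can be sketched with one illustrative geometric structure: a cross of half-width w around (x, y). This cross is our own stand-in for the structures of Fig. 1, which we cannot reproduce here, and the function names are ours; the point is only that |S′| grows linearly with w, versus w² for the full window.

```python
from statistics import mean, pstdev

def cross_samples(image, x, y, w):
    """A reduced set S': the pixels on the horizontal and vertical arms of a
    cross of half-width w around (x, y), clipped at the image border.
    At most 4*w + 1 pixels, i.e. O(w) rather than O(w**2)."""
    height, width = len(image), len(image[0])
    samples = []
    for d in range(-w, w + 1):
        if 0 <= x + d < width:
            samples.append(image[y][x + d])      # horizontal arm
        if d != 0 and 0 <= y + d < height:
            samples.append(image[y + d][x])      # vertical arm
    return samples

def reduced_threshold(image, x, y, w, k=0.5, R=128):
    """Eq. 5: Sauvola's formula evaluated on S' instead of the full window."""
    s_prime = cross_samples(image, x, y, w)
    return mean(s_prime) * (1 + k * (pstdev(s_prime) / R - 1))
```

Swapping `cross_samples` for another sampling pattern reproduces the other structures, without changing `reduced_threshold`.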
3 Experimental Results
The proposed implementation has been tested on the printed as well as handwritten images used for benchmarking the performance of various binarization techniques in the recent Document Image Binarization Contest (DIBCO) 2009 [24] and Handwritten Document Image Binarization Contest (H-DIBCO) 2010 [25]. It has also been tested on our own dataset (CMATERdb6). This dataset contains 5 representative images. The first one is of a handwritten Bengali document in which the text on the rear side is visible from the front side and the image is unevenly illuminated. The second image is of an old historical printed Bengali document. The third one is an image of printed English text captured from a notice board with a cellphone camera. The fourth image is of an old printed document having text in multiple fonts and font sizes. The last one is a cellphone camera captured business card image.
Ground truth data for the DIBCO and H-DIBCO datasets are publicly available. We have prepared the ground truth data for the images of the CMATERdb6 dataset; the original and ground truth images of this dataset are publicly available for research purposes at [26]. These three datasets together have 25 representative images containing various kinds of degradations and deformations. The results obtained with the proposed implementation have been compared with the original/algorithmic implementation of Sauvola's binarization method, since it is one of the best convolution based binarization methods. Results obtained with Niblack's as well as Otsu's binarization techniques have also been given.
3.1 Performance Analysis
The current implementation starts from the conjecture that the threshold value t′(x, y) obtained from m′(x, y) and s′(x, y) is not considerably different from t(x, y) computed from m(x, y) and s(x, y). As a result, the performance remains comparable. The binarized result obtained with the new threshold may not be exactly the same as that obtained with t(x, y), but experiments show that the results obtained with the presented technique serve the purpose of binarization very well.
Comparing the output images obtained using the various geometric structures of Fig. 1(a–f) with their ground truth images, we find the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The definition of the F-Measure (FM) in terms of the recall rate (R) and precision rate (P) is given in Eq. 6.
FM = (2 · R · P) / (R + P)    (6)
where R = TP / (TP + FN) × 100% and P = TP / (TP + FP) × 100%. In an ideal situation, i.e. when the output image is identical to the ground truth image, R, P and FM should all be 100%. While calculating the F-Measure, the best combination of window size (w) and k has been considered in all cases.
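The evaluation metric of Eq. 6 can be computed directly from the pixel counts; the function name is ours.

```python
def f_measure(tp, fp, fn):
    """Eq. 6: FM = 2 * R * P / (R + P), with recall R and precision P in percent.
    tp, fp, fn are pixel counts from comparing output to ground truth."""
    recall = 100.0 * tp / (tp + fn)
    precision = 100.0 * tp / (tp + fp)
    return 2 * recall * precision / (recall + precision)
```

A perfect binarization (fp = fn = 0) yields FM = 100%, matching the ideal case described above.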
Table 1 shows the F-Measures achieved with Otsu's, Niblack's, Sauvola's and the proposed implementations of Sauvola's method for the DIBCO image dataset. It contains 5 printed (1–5) and 5 handwritten (6–10) images. Bold cells represent the highest F-Measure achieved for the corresponding image. It may be noted that the highest mean F-Measure (91.13%) has been achieved with GS3. Moreover, the mean F-Measures achieved with GS4, GS5 and GS6 are greater than that of Sauvola's method, and the mean F-Measure with GS2 is equal to that of Sauvola.
Table 1. F-Measures (%) for the DIBCO dataset.
Image  Otsu  Niblack  Sauvola  GS1  GS2  GS3  GS4  GS5  GS6
1  91.06  88.12  91.64  91.18  91.83  91.88  91.61  91.91  91.71 
2  96.56  94.76  96.39  95.58  96.27  96.16  96.14  96.40  96.25 
3  96.71  88.65  95.82  94.90  95.83  95.58  95.68  95.96  95.77 
4  82.59  90.29  92.93  91.70  93.02  92.88  92.56  92.93  92.86 
5  89.58  85.59  89.81  88.49  89.85  89.36  89.33  89.87  89.62 
6  90.85  92.70  92.34  91.64  91.96  92.15  91.77  91.78  91.98 
7  86.15  75.02  86.65  89.64  88.99  89.41  89.21  89.06  89.23 
8  84.11  88.19  87.99  88.08  88.02  89.08  88.74  88.46  88.92 
9  40.56  86.20  88.62  88.09  88.65  89.24  88.52  88.79  88.99 
10  28.04  85.62  85.38  86.36  83.13  85.59  85.13  85.05  84.81 
Mean  78.62  87.51  90.76  90.57  90.76  91.13  90.89  91.02  91.01 
Similar to Table 1, Table 2 shows the F-Measures achieved with the H-DIBCO images. It contains 10 representative handwritten document images. The proposed implementations have achieved the highest F-Measures for 6 images out of 10; the implementation referred to as GS3 alone has yielded 3 of the highest F-Measures. Although the mean F-Measure is highest in the case of Sauvola, the F-Measures of the proposed implementations are close to it.
Table 2. F-Measures (%) for the H-DIBCO dataset.
Image  Otsu  Niblack  Sauvola  GS1  GS2  GS3  GS4  GS5  GS6
1  91.47  90.98  91.23  89.39  90.82  89.71  90.28  91.37  90.57 
2  88.18  88.46  89.03  89.86  88.32  89.47  89.53  88.64  89.26 
3  84.36  81.78  85.64  84.16  85.01  84.36  84.45  85.15  84.63 
4  85.62  89.80  89.67  89.29  89.82  89.84  89.45  89.60  89.69 
5  88.28  84.57  92.26  92.91  93.20  93.51  93.19  93.23  93.38 
6  80.38  84.38  84.09  84.04  83.77  84.11  84.42  84.33  84.33 
7  90.12  89.57  90.87  90.76  90.69  91.13  90.84  90.79  91.09 
8  85.68  88.32  88.23  87.27  88.01  88.29  88.30  88.12  88.32 
9  81.28  88.43  88.42  88.40  87.88  87.88  87.85  87.92  87.92 
10  79.25  87.67  87.60  87.90  86.00  85.91  85.54  85.67  85.66 
Mean  85.46  87.40  88.70  88.40  88.35  88.42  88.39  88.48  88.49 
Table 3. F-Measures (%) for the CMATERdb6 dataset.
Image  Otsu  Niblack  Sauvola  GS1  GS2  GS3  GS4  GS5  GS6
1  88.00  89.71  89.98  90.10  90.09  90.16  90.05  89.96  90.14 
2  88.88  88.68  89.04  88.97  89.07  89.05  89.06  89.07  89.07 
3  91.93  92.89  93.41  93.00  93.37  93.46  93.29  93.38  93.46 
4  99.04  95.64  98.02  97.01  97.42  97.90  97.80  97.82  97.94 
5  91.06  90.94  91.65  91.40  91.63  91.70  91.81  91.77  91.80 
Mean  91.78  91.57  92.42  92.10  92.32  92.45  92.40  92.40  92.48 
Table 3 shows the F-Measures achieved with the CMATERdb6 images. It is noteworthy that the highest mean F-Measure, i.e. 92.48%, has been achieved with GS6, whereas the mean F-Measure for Sauvola's method is 92.42%. It may also be noted that Otsu's global binarization method has given the highest F-Measure for the fourth image.
A comparison of the mean F-Measures achieved for all 25 images of the 3 datasets with the various techniques is shown in Fig. 2. The highest mean F-Measure, i.e. 90.31%, is achieved with GS3. The F-Measures achieved with all the present implementations are greater than that of Niblack. Three implementations, viz. GS1, GS2 and GS4, have yielded F-Measures slightly less than that of Sauvola, and the remaining implementations, viz. GS3, GS5 and GS6, have yielded slightly higher F-Measures than Sauvola. This shows that the results of the proposed implementations are comparable with those of Sauvola. Fig. 3 shows some sample images from the above datasets along with their binarized outputs for some of the techniques.
3.2 Computational Complexity and Computation Time
The proposed technique calculates the threshold for each pixel in O(w) time, so the computational complexity of the proposed technique is O(Nw). A plot of the mean computation times of Niblack's, Sauvola's and the proposed techniques is shown in Fig. 4, measured on a moderately powerful notebook (Dual-Core T2370, 1.73 GHz, 1 GB RAM, 1 MB L2 Cache). It may be observed from Fig. 4 that the computation time of the proposed technique is much less than those of Niblack's and Sauvola's implementations.
3.3 Memory Consumption
As storing a pixel of a gray scale image requires 1 byte of memory, an image of N pixels requires N bytes of memory. The algorithmic implementation of Niblack's and Sauvola's techniques makes a copy of the image before binarizing its pixels by convolving the window. So, the amount of memory consumed by this algorithm is (2N + c) bytes, where c is a small constant.
The implementation proposed by Shafait et al. [23] is faster, but it requires additional memory. It prepares two integral images from the given image: one for the intensity values and the other for the squares of the intensity values. To store these integral images with 32 bit and 64 bit integers respectively, we need 4N and 8N bytes of memory. So, the amount of memory consumed in this case is (13N + c′) bytes, where c′ is another constant. It may be noted that this memory consumption is about 6 times higher than that of the algorithmic implementation.
The memory consumption of our implementation can be given as (2N + c″) bytes, where c″ is another constant. It may be noted that this implementation requires no additional memory compared to the original/algorithmic implementation.
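The memory accounting described above can be made concrete for an example image size; N here is an arbitrary illustrative pixel count, and the small per-algorithm constants are ignored.

```python
# Byte counts for a gray-scale image of N pixels (1 byte per pixel).
N = 1024 * 768

algorithmic = 2 * N            # original image + working copy
shafait = N + 4 * N + 8 * N    # image + 32-bit and 64-bit integral images
proposed = 2 * N               # same footprint as the algorithmic version
```

The ratio shafait / algorithmic is 6.5, consistent with the roughly 6-fold overhead noted above, while the proposed implementation matches the algorithmic footprint exactly.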
4 Conclusion
In this paper, we have presented a novel implementation of convolution based locally adaptive binarization techniques. Both the computational complexity and the computation time are significantly reduced while keeping the performance close to that of the ordinary implementation. The computational complexity has been reduced from O(Nw²) to O(Nw), and the computation time has been reduced by 5 to 15 times depending on the window size. At the same time, the memory consumption is the same as that of the original implementation. This type of implementation is especially useful in image analysis and document processing systems for real-time applications and on handheld mobile devices having limited computational facilities. As the trend of designing camera based applications for mobile devices has grown considerably in recent times, the presented technique will be highly useful.
Acknowledgments.
We are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER), Jadavpur University for providing infrastructural support for the research work. The first author is also thankful to the School of Mobile Computing and Communication (SMCC) for providing fellowship to him.
References

[1] Abutaleb, A.S.: Automatic thresholding of gray-level pictures using two-dimensional entropy. Computer Vision, Graphics, and Image Processing, 47, 22–32 (1989)
[2] Kapur, J.N., Sahoo, P.K., Wong, A.K.C.: A new method for gray level picture thresholding using the entropy of the histogram. Computer Vision, Graphics, and Image Processing, 29, 273–285 (1985)
[3] Kittler, J., Illingworth, J.: Minimum error thresholding. Pattern Recognition, 19(1), 41–47 (1986)
[4] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Systems, Man, and Cybernetics, 9(1), 62–66 (1979)
[5] Trier, Ø.D., Taxt, T.: Evaluation of binarization methods for document images. IEEE Trans. Pattern Anal. Mach. Intell., 17(3), 312–315 (1995)
[6] Trier, Ø.D., Jain, A.K.: Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell., 17(12), 1191–1201 (1995)
[7] Bernsen, J.: Dynamic thresholding of grey-level images. In: Eighth Int'l Conf. on Pattern Recognition, pp. 1251–1255, Paris (1986)
[8] Nakagawa, Y., Rosenfeld, A.: Some experiments on variable thresholding. Pattern Recognition, 11(3), 191–204 (1979)
[9] Eikvil, L., Taxt, T., Moen, K.: A fast adaptive method for binarization of document images. In: First Int'l Conf. on Document Analysis and Recognition, pp. 435–443, Saint-Malo, France (1991)
[10] Mardia, K.V., Hainsworth, T.J.: A spatial thresholding method for image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 10(6), 919–927 (1988)
[11] Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, pp. 115–116 (1986)
[12] White, J.M., Rohrer, G.D.: Image thresholding for optical character recognition and other applications requiring character image extraction. IBM J. Research and Development, 27(4), 400–411 (1983)
[13] Parker, J.R.: Gray level thresholding in badly illuminated images. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(8), 813–819 (1991)
[14] Trier, Ø.D., Taxt, T.: Improvement of integrated function algorithm for binarization of document images. Pattern Recognition Letters, 16(3), 277–283 (1995)
[15] Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recognition, 33, 225–236 (2000)
[16] Seeger, M., Dance, C.: Binarizing camera images for OCR. In: 6th Int'l Conf. on Document Analysis and Recognition, pp. 54–58 (2001)
[17] Wolf, C., Jolion, J.M., Chassaing, F.: Text localization, enhancement and binarization in multimedia documents. In: Int'l Conf. on Pattern Recognition, pp. 1037–1040 (2002)
[18] Gatos, B., Pratikakis, I., Perantonis, S.J.: Adaptive degraded document image binarization. Pattern Recognition, 39(3), 317–327 (2006)
[19] Shin, K.T., Jang, I.H., Kim, N.C.: Block adaptive binarization of ill-conditioned business card images acquired in a PDA using a modified quadratic filter. IET Image Processing, 1(1), 56–66 (2007)
[20] Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 Document Image Binarization Contest (DIBCO 2009). In: 10th Int'l Conf. on Document Analysis and Recognition, pp. 1375–1382, Spain (2009)
[21] Pratikakis, I., Gatos, B., Ntirogiannis, K.: H-DIBCO 2010 – Handwritten Document Image Binarization Competition. In: 12th Int'l Conf. on Frontiers in Handwriting Recognition, pp. 727–732, India (2010)
[22] Dunlop, M.D., Brewster, S.A.: The Challenge of Mobile Devices for Human Computer Interaction. Personal and Ubiquitous Computing, 6(4), 235–236 (2002)
[23] Shafait, F., Keysers, D., Breuel, T.M.: Efficient implementation of local adaptive thresholding techniques using integral images. In: Document Recognition and Retrieval XV, San Jose, USA (2008)
[24] DIBCO 2009 Benchmarking Dataset, http://users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark
[25] H-DIBCO 2010 Benchmarking Dataset, http://users.iit.demokritos.gr/~bgat/HDIBCO2010/benchmark
[26] CMATER Database Repository, http://code.google.com/p/cmaterdb