A Groundtruth Generation Tool for Document Images
In this paper, we propose a groundtruth generation tool with a graphical user interface. The tool annotates an input document image based on its foreground pixels. Foreground pixels are grouped together with user interaction to form labeling units, which are then labeled by the user with user-defined labels. The tool outputs an annotated image together with an XML file containing its metadata. The annotated data can then be used in different applications of document image analysis.
Document digitization has attracted attention for several years. Converting a document image into electronic format requires several types of document image analysis, typically including different types of segmentation, optical character recognition (OCR), etc. Numerous algorithms have been proposed to achieve these objectives, and their performance can be measured with the help of groundtruth. Data with groundtruth is of immense importance in document image analysis: it is required for training machine learning based algorithms, and it is also used to evaluate various algorithms. Groundtruth generation is a manual and time-consuming process. Hence, a groundtruth generation tool should be user friendly, reliable, effective, and capable of generating data conveniently.
Several groundtruth generation systems have been reported in the literature for producing benchmark datasets to evaluate competing algorithms. Pink Panther is one such groundtruth generator, mainly used for evaluating layout analysis. PerfectDoc is a groundtruth generation system for document images based on layout structures. Various other layout-based groundtruth generation tools have also been reported in the literature. These groundtruth generators support only rectangular regions for annotation; hence, they fail to generate groundtruth for documents with complex layouts.
A recent groundtruth generator supports annotation by generating polygonal regions. However, the tool is observed to be quite inefficient for larger images. PixLabeler is an example of a pixel-level groundtruth generator, and similar tools have also been reported in the literature. Pixel-level annotation is more general, but it takes more time to complete the annotation task.
In this paper, we propose a tool to annotate document images at the pixel level. The main objective of the tool is to annotate data efficiently, in less time. To this end, we provide a semi-automatic interactive platform for annotating document images. Since our main goal is to annotate foreground pixels, we first segment the foreground pixels from the background with user assistance. Next, we group the foreground pixels such that neighboring pixels of similar types get connected. Finally, each such group of pixels is annotated with a label from a predefined set.
The workflow of the proposed system, Anveshak, is shown in Figure 1. Several semi-automatic modules are implemented to speed up the annotation process.
We are mainly concerned with annotating the foreground pixels of a document image. A module is integrated into Anveshak to efficiently segment the foreground pixels from the background. This can be done with three thresholding techniques: first, based thresholding; second, a based adaptive thresholding technique; and third, Otsu's thresholding technique. The user can segment the foreground from the background using any of these three techniques. An example of the foreground-background separation module using based thresholding is shown in Figure 2.
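Of the three techniques, Otsu's method is the one named explicitly. The sketch below is a minimal pure-Python illustration of it, assuming an 8-bit grayscale image stored as nested lists and dark ink on a light page; the function names are hypothetical and this is not Anveshak's code (OpenCV itself offers the same method through `cv2.threshold` with the `cv2.THRESH_OTSU` flag). It picks the threshold that maximizes the between-class variance of the gray-level histogram, then marks pixels at or below it as foreground.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for a flat list of 8-bit gray values.

    Values <= t form one class and values > t the other; t maximizes the
    between-class variance of the two classes.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the gray values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, t):
    """Mark dark (ink) pixels as foreground (1) and the rest as background (0)."""
    return [[1 if p <= t else 0 for p in row] for row in image]


image = [[10, 12, 200],
         [11, 220, 210]]
t = otsu_threshold([p for row in image for p in row])
print(t, binarize(image, t))  # 12 [[1, 1, 0], [1, 0, 0]]
```

When several thresholds give the same variance, the strict comparison keeps the lowest one, which is why the dark cluster {10, 11, 12} is split off at t = 12.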
Anveshak has a unique technique for predefining labeling units. Labeling units are generated using based morphological operations. The morphological operations included in Anveshak are erosion, dilation, closing, opening, gap filling, and smoothing.
A labeling unit is a collection of foreground pixels grouped together using a suitable morphological operator. Pixels can be grouped by choosing one of the morphological operations erosion, dilation, closing, or opening, for which the user selects a suitable size and type of structuring element. A user can also group pixels with a smoothing operation, where the run-length parameter is chosen interactively. Foreground pixels can also be grouped using a gap filling operation, where the user selects the gap size in the horizontal and vertical directions. An instance of Anveshak generating labeling units is shown in Figure 3.
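The paper names these grouping operations without giving their exact definitions. As a rough illustration only, the sketch below shows two of them on a 0/1 image stored as nested lists: binary dilation with a rectangular structuring element (assumed odd-sized so that it is symmetric about its centre), and a run-length smoothing in the spirit of the classical RLSA that fills short background runs between foreground pixels. Treating gap filling as a horizontal pass with one gap size followed by a vertical pass with another (`smooth_rows` then `smooth_cols`) is our assumption, and all function names are hypothetical. A production version would more likely use OpenCV's `cv2.dilate` or `cv2.morphologyEx` with `cv2.getStructuringElement`.

```python
def dilate(binary, elem_h, elem_w):
    """Binary dilation of a 0/1 image with an elem_h x elem_w rectangular
    structuring element centred on each pixel (sizes must be odd)."""
    if elem_h % 2 == 0 or elem_w % 2 == 0:
        raise ValueError("structuring element sizes must be odd")
    rows, cols = len(binary), len(binary[0])
    ah, aw = elem_h // 2, elem_w // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if binary[y][x]:
                # Stamp the (symmetric) element around each foreground pixel.
                for yy in range(max(0, y - ah), min(rows, y + ah + 1)):
                    for xx in range(max(0, x - aw), min(cols, x + aw + 1)):
                        out[yy][xx] = 1
    return out


def smooth_rows(binary, max_gap):
    """Horizontal run-length smoothing: fill background runs of at most
    max_gap pixels that lie between two foreground pixels in a row."""
    out = [row[:] for row in binary]
    for row in out:
        last_fg = None  # column of the previous foreground pixel
        for x in range(len(row)):
            if row[x]:
                if last_fg is not None and x - last_fg - 1 <= max_gap:
                    for k in range(last_fg + 1, x):
                        row[k] = 1
                last_fg = x
    return out


def smooth_cols(binary, max_gap):
    """Vertical counterpart of smooth_rows, computed by transposition."""
    transposed = [list(col) for col in zip(*binary)]
    return [list(col) for col in zip(*smooth_rows(transposed, max_gap))]


print(smooth_rows([[1, 0, 0, 1, 0, 0, 0, 0, 1]], 2))  # [[1, 1, 1, 1, 0, 0, 0, 0, 1]]
print(dilate([[0, 0, 0], [0, 1, 0], [0, 0, 0]], 1, 3))  # [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
```

In both cases the user-chosen parameter (element size, run length, gap size) directly controls how aggressively neighbouring characters merge into one labeling unit, which is why the paper makes it interactive.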
After the pixels are grouped, the contour of each group is obtained using the method described in . Each contour is then approximated by a polygon using the Douglas-Peucker algorithm. The resulting polygons are the basic units for annotation in Anveshak. An example of a collection of labeling units is shown in Figure 4, where each labeling unit is shown in a unique color.
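The Douglas-Peucker step can be illustrated with the following minimal pure-Python sketch; it is not Anveshak's implementation (OpenCV provides the same algorithm as `cv2.approxPolyDP`), and the function names are hypothetical. The algorithm keeps the two endpoints of a polyline, finds the interior point farthest from the segment joining them, and recurses on both halves only if that distance exceeds the tolerance `epsilon`. For a closed contour, the sketch simply closes the ring by repeating the first point, which makes the first split occur at the point farthest from the start.

```python
import math


def _perp_dist(p, a, b):
    """Distance from p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify an open polyline; the two endpoints are always kept."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every interior point is within tolerance
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def simplify_contour(contour, epsilon):
    """Closed-contour variant: close the ring, simplify, drop the repeat."""
    return douglas_peucker(contour + [contour[0]], epsilon)[:-1]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(simplify_contour(square, 0.5))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

A larger `epsilon` yields coarser polygons with fewer vertices, which makes the later point-in-polygon tests cheaper at the cost of boundary accuracy.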
Anveshak comes with some predefined labels, and the tool provides options to add and delete labels, as shown in Figure 5. After defining all the labels, the user can annotate the labeling units of the input document with them. A unique index number and a color are assigned to each label; these are used in the later stages of annotation.
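The label bookkeeping described above can be pictured as a small registry that maps each label name to an index and a display color. The sketch below is hypothetical: the class name, the palette, reserving index 0 for unlabeled pixels, and never reusing the index of a deleted label are all our assumptions, not details given in the paper. Colors are drawn from a fixed palette and repeat once it is exhausted.

```python
class LabelTable:
    """Hypothetical label registry: each label gets a unique index and a color.

    Index 0 is reserved here for unlabeled pixels (an assumption).
    """

    PALETTE = [(255, 0, 0), (0, 160, 0), (0, 0, 255),
               (255, 165, 0), (128, 0, 128), (0, 160, 160)]

    def __init__(self, names=()):
        self.labels = {}      # name -> (index, (r, g, b))
        self._next_index = 1
        for name in names:
            self.add(name)

    def add(self, name):
        if name in self.labels:
            raise ValueError(f"label {name!r} already exists")
        # Colors cycle through the palette; indices stay unique.
        color = self.PALETTE[(self._next_index - 1) % len(self.PALETTE)]
        self.labels[name] = (self._next_index, color)
        self._next_index += 1
        return self.labels[name]

    def delete(self, name):
        # Indices are never reused, so an index already written into the
        # output image cannot silently change meaning.
        del self.labels[name]


table = LabelTable(["text", "logo"])
table.delete("text")
print(table.add("stamp"))  # (3, (0, 0, 255))
```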
The overall annotation process is summarized in the flow chart in Figure 6. Labeling units can be annotated in two ways, as shown in Figure 7. In the first, the user labels the unlabeled units one by one with the predefined labels: an unlabeled unit is displayed in a window and the user is prompted for its label. This process continues until every unit is labeled, or until the user chooses to label units by selecting a region of interest (ROI).
The second way of labeling units is to select an ROI, which can then be annotated with the defined labels. First, all units lying completely within the selected ROI are determined. These units can then be labeled in three different modes (Figure 8). A user can annotate all units within the ROI with one label, updating all of them with the selected label. Another mode labels all units belonging to the selected ROI with a particular type. Lastly, a user can label each unit within the selected ROI individually. The pixels belonging to a labeling unit are updated with the unique index of the unit's label, and their color is updated with the color of that label. Whether a pixel belongs to a particular labeling unit is determined with a point-in-polygon test. At each stage of the annotation process, the updated color image is displayed, where labeled pixels are shown in the color of the corresponding label and unlabeled pixels are shown in their original color.
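The paper does not say which point-in-polygon test is used; the even-odd (ray casting) rule below is one standard choice, and OpenCV also provides `cv2.pointPolygonTest` for this purpose. The sketch assumes a rectangular ROI given as `(x0, y0, x1, y1)`, polygons with integer vertex coordinates, and an index image stored as nested lists with 0 meaning unlabeled; all function names are hypothetical. Because a rectangle is convex, a unit lies completely inside the ROI exactly when all of its vertices do.

```python
def point_in_polygon(x, y, poly):
    """Even-odd (ray casting) test: is (x, y) inside polygon poly?

    poly is a list of (x, y) vertices; points exactly on an edge may be
    classified either way.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def unit_in_roi(poly, roi):
    """True if every vertex of poly lies inside roi = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    return all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in poly)


def paint_unit(index_img, poly, label_index):
    """Write label_index into every pixel of index_img inside poly."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    # Only scan the polygon's bounding box, clipped to the image.
    for y in range(max(0, min(ys)), min(len(index_img), max(ys) + 1)):
        for x in range(max(0, min(xs)), min(len(index_img[0]), max(xs) + 1)):
            if point_in_polygon(x, y, poly):
                index_img[y][x] = label_index


index_img = [[0] * 6 for _ in range(6)]   # 0 = unlabeled
unit = [(1, 1), (3, 1), (3, 3), (1, 3)]
if unit_in_roi(unit, (0, 0, 4, 4)):
    paint_unit(index_img, unit, 7)
print(sum(row.count(7) for row in index_img))  # 4
```

Pixels on the unit's right and bottom edges are not painted here because of the test's tie-breaking, which is one reason a real tool may prefer a test that reports on-edge points explicitly.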
The annotation process continues until all labeling units are marked. After annotation is complete, the user is asked whether he/she wants to update any label or finalize the labels. Once the labels are finalized, the output labeled image and its corresponding XML file are generated. An example of the different stages of labeling is shown in Figure 9.
Anveshak is implemented in , using a cross-platform application framework for the graphical user interface, with customized modules developed using OpenCV. An image is annotated through the user interface, and after completion a single image in format is generated. Each pixel of the output image holds an index corresponding to a particular annotation label.
The metadata of the image is stored in an XML file, which also includes information about the source image and the annotated image. In the XML file, each index corresponds to the unique pixel value of a particular label in the annotated image. Examples of two different annotated images and their corresponding XML files are shown in Figures 9 (c) and (d) and Figures 10 (a) and (b), respectively. Anveshak has been tested by annotating images from the dataset reported in . The annotators observed that labeling could be performed much more easily and quickly than with PixLabeler or .
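The paper does not publish its XML schema, so the following is only a sketch of what such a metadata file could contain: the source image, the annotated image, and, for each label, the index that serves as its pixel value in the annotated image. Every element and attribute name here (`annotation`, `source_image`, `annotated_image`, `labels`, `label`, `index`, `name`, `color`) is hypothetical, as are the file names in the usage line. It uses only the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def build_metadata(source_image, annotated_image, labels):
    """Serialize hypothetical annotation metadata to an XML string.

    labels maps a label name to (index, (r, g, b)); the index is the pixel
    value used for that label in the annotated image.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "source_image").text = source_image
    ET.SubElement(root, "annotated_image").text = annotated_image
    labels_el = ET.SubElement(root, "labels")
    # Emit labels in index order so the file reads like a lookup table.
    for name, (index, (r, g, b)) in sorted(labels.items(), key=lambda kv: kv[1][0]):
        ET.SubElement(labels_el, "label", index=str(index), name=name,
                      color=f"#{r:02x}{g:02x}{b:02x}")
    return ET.tostring(root, encoding="unicode")


xml_text = build_metadata("page_src", "page_gt",
                          {"logo": (2, (0, 160, 0)), "text": (1, (255, 0, 0))})
print(xml_text)
```

Keeping the index-to-label mapping in the XML file, rather than in the image, lets the annotated image stay a plain single-channel index map that downstream evaluation code can read directly.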
The present implementation of Anveshak supports only one annotation per block. In many scenarios, mainly in the case of overlapping regions, it is desirable to have multiple annotations per block, and we plan to support this in the future. The present implementation of Anveshak is available online (http://www.facweb.iitkgp.ernet.in/~jay/anveshak/anveshak.html).
Anveshak has been used to generate groundtruth for the dataset reported in . The images in the dataset contain various regions such as logos, headers, text, signatures, headlines, and bold text. However, the original dataset provides annotations only for stamp regions. The dataset consists of scanned images at , , and resolutions. Of these, images contain non-overlapping regions. Anveshak has been used to annotate these images of resolution, and the groundtruth data is available online (http://www.facweb.iitkgp.ernet.in/~jay/anveshak_gt/anveshak_gt.html). The images were annotated by users with Anveshak; on average, there are labels and segments per image. The users were first trained to annotate data on one randomly chosen image. The average time taken by a user to annotate an image with Anveshak is about minutes. The annotated dataset has been used in the works reported in and .
The primary goal of Anveshak is to annotate an input document image efficiently. The tool produces an annotated image along with an XML file containing its metadata. We have developed a user-friendly groundtruth generation tool with semi-automatic modules that speed up the annotation process. We hope that Anveshak will serve the document analysis community effectively by simplifying the groundtruth generation procedure.
This work is partly funded by TCS research scholar program and partly by Ministry of Communications & Information Technology, Government of India; MCIT 11(19)/ 2010-HCC (TDIL) dt. 28-12-2010.
Stamp and logo detection from document images by finding outliers. In , pages 1–4, Dec 2015.