Design Identification of Curve Patterns on Cultural Heritage Objects: Combining Template Matching and CNN-based Re-Ranking

05/17/2018 ∙ by Jun Zhou, et al. ∙ 2

The surfaces of many cultural heritage objects were embellished with various patterns, especially curve patterns. In practice, most of the unearthed cultural heritage objects are highly fragmented, e.g., sherds of potteries or vessels, and each of them only shows a very small portion of the underlying full design, with noise and deformations. The goal of this paper is to address the challenging problem of automatically identifying the underlying full design of curve patterns from such a sherd. Specifically, we formulate this problem as template matching: curve structure segmented from the sherd is matched to each location with each possible orientation of each known full design. In this paper, we propose a new two-stage matching algorithm, with a different matching cost in each stage. In Stage 1, we use a traditional template matching, which is highly computationally efficient, over the whole search space and identify a small set of candidate matchings. In Stage 2, we derive a new matching cost by training a dual-source Convolutional Neural Network (CNN) and apply it to re-rank the candidate matchings identified in Stage 1. We collect 600 pottery sherds with 98 full designs from the Woodland Period in Southeastern North America for experiments and the performance of the proposed algorithm is very competitive.


I Introduction

Millions of archived cultural heritage objects such as bone, pottery, shell, wood, and cloth are very precious records in archeology. Many of these objects are embellished with various man-made patterns, especially curve patterns, and the designs of these patterns provide important information to archaeologists. However, most of these cultural heritage objects are highly fragmented, e.g., potsherds rather than whole vessels, and each of them only shows a small portion of the underlying full design of patterns. For example, Figure 1(a) shows one pottery sherd from the Woodland Period in Southeastern North America (300-600 AD), when the Native Americans had a tradition of decorating pottery by stamping a carved wooden paddle on the pottery surface. The full curve pattern carved on the wooden paddle, as shown in Fig. 1(c), is the underlying design, which can be used to build chronologies, to track trade networks, and to reconstruct aspects of style and the creative process. An important problem in archeology is to automatically and quickly identify the underlying design from the partial pattern shown on the surface of a sherd [1].

Fig. 1: An illustration of identifying the underlying design for a pottery sherd. (a) A sherd’s RGB and depth images. (b) The curve structure segmented from the sherd. From top to bottom are the segmented curve structure of the sherd and the mask derived from the sherd boundary, and the segmented curve structure masked by the sherd boundary. (c) A database of known designs, where the true design with the best matching is highlighted in the red box. Original design reproduced with permission, courtesy of Frankie Snow, South Georgia State College.

In this paper, we investigate this important problem by focusing on curve patterns, where the underlying full design and the partial pattern on the sherd are curve structures, as shown in Fig. 1. More specifically, after decades of efforts from archaeologists, many full designs have been revealed, reconstructed and archived for different periods and regions in archeology, such as the Woodland Period in Southeastern North America. This way, the design identification problem can be formulated as identifying the best matched design for a sherd by segmenting the curve structure from the sherd and then matching it against a set of known designs. As illustrated in Fig. 1, we can match the curve structure segmented from a sherd, as shown in Fig. 1(b), against each location, with each possible orientation, of each known design, and then select the design with the lowest matching cost as the matched design. This exhaustive matching procedure identifies not only the matched design, but also the matched location and orientation on the matched design. Note that, scale transform is usually not considered in the matching since both sherds and full designs have known size in real world and their matchings are not scale invariant, i.e., two designs are considered different in archeology even if they are identical after a uniform scale transform.

Fig. 2: An illustration of various noise and deformations in the curve structures segmented from sherds: (a) A sherd with missing curve segments due to erosion; (b) A sherd with noisy curves due to weathering and/or shallow paddle stamping on rough pottery surface; (c) A sherd with deformed curve patterns due to the drying process in making the pottery. For each case, we show the sherd’s RGB image (top-left), the depth image (top-right), the segmented curve structure (bottom-left) and the corresponding portion in the underlying design (bottom-right).

Based on this problem formulation, the key issue is then the definition of an appropriate cost in matching the curve structure segmented from a sherd to a location of a full design, with a specified orientation. This problem is nontrivial in the proposed archeology application for two reasons. First, the exhaustive matching against each possible location and orientation of each design leads to a very large search space. To avoid an overly slow algorithm, we require the matching cost to be very efficient to compute for each possible solution in the search space. Second, compared with the underlying design, the curve structure segmented from the sherd usually contains strong noise and deformations caused by the drying process in making many of these objects, by many years of erosion and sediment under the earth, and by the imperfection of the curve-segmentation algorithms, as illustrated by several examples in Fig. 2.

To address this problem, in this paper we propose a new two-stage matching algorithm, with a different matching cost in each stage. In Stage 1, we propose to use a classical template matching method, which is highly computationally efficient, over the whole search space to identify a small set of candidate matchings on all the known designs. This simple matching cost can help efficiently reduce the search space of solutions. In Stage 2, we further derive a new matching cost by training a dual-source Convolutional Neural Network (CNN) and apply this more computationally intensive matching to re-rank the candidate matchings identified in Stage 1. Through supervised learning, we expect that various kinds of noise and deformations in the segmented curve structures, as shown in Fig. 2, can be implicitly identified and suppressed in computing the CNN-based matching cost.

In the experiments, we collected the images of 600 pottery sherds excavated from archaeological sites associated with the Woodland period paddle-stamping tradition, together with their corresponding 98 full paddle designs, to evaluate the performance of the proposed two-stage matching algorithm. Based on the Cumulative Matching Characteristics (CMC) ranking metric, the proposed algorithm can achieve a much higher accuracy than several other existing matching algorithms that are chosen for performance comparison in the experiment.

II Related Work

Many previous works on image processing of cultural heritage objects are focused on fragment classification and matching, which aims to recognize sherds that are originally from the same object, such as a pottery vessel, and then assemble these sherds to reconstruct the original 3D object. For example, Durham et al. [2] applied a generalized Hough transform to perform artifact retrieval and then assembled them based on edge detection. Smith et al. [3] proposed a method for thin ceramic sherd classification based on color and texture characteristics using total different geometry energies. Schurmans et al. [4] studied the sherd uniformity and standardization using points or areas from a 3D model and the profile of the studied shapes. Different from these works, in this paper our goal is to identify the design of curve patterns on a sherd and the sherds identified with the same design are usually not from the same object, e.g., the same vessel. As a result, we could not assemble them for 3D reconstruction.

Curve-structure matching has been a long studied problem in computer vision and image processing. By thinning all the curve structure to one-pixel wide, many shape matching algorithms have been developed for matching curve structures. For example, Belongie et al. [5] proposed a shape context approach, which builds a log-polar histogram around each sampled curve point and then uses this histogram as the feature to match two curve structures. Barrow et al. [6] proposed the widely used Chamfer matching algorithm by pre-computing a distance map for efficiently locating one curve structure on another curve structure. By treating curve structure as binary images, image-based matching algorithms can be directly applied here for matching curve structures. For example, Brunelli [7] developed a template matching method by treating one as a convolution mask over the other. In Perceptual Hash (pHash) [8], each image was coded into a 64-bit fingerprint number, which was then used as features to compare and match images. However, many of these existing matching algorithms are sensitive to noise and deformation present in the curve structures segmented from sherds.

In principle, local feature matching [9] can also be used for handling curve-structure matching – we can extract a set of image features, such as the Gabor [10], the local binary patterns (LBP) [11], the histogram of oriented gradients (HOG) [12] or the scale-invariant feature transform (SIFT) [13], from both the image of curve structure segmented from a sherd and the image of the design, and match them by identifying a set of corresponding features. However, most of these local feature detectors are developed for color or gray-scale images and could not accurately capture the critical local features in binary images of curve structures. Furthermore, these image feature detectors and the corresponding feature descriptors are not robust enough to the noise and deformation present in curve structures segmented from sherds.

In Stage 2 of the proposed method, we employ a dual-source CNN for matching two curve structures. Deep neural network based image matching has been studied by several research groups in recent years. For example, a patch-based local image matching network, called MatchNet [14], was developed to jointly learn the feature representation and the matching function from image data. In [15], after exploring multiple neural networks, a CNN-based model called DeepCompare was developed to match a variety of images based on their appearance. In [16], Lin et al. designed a CNN-based model for fast image retrieval using binary hash codes. In [17], Wang et al. proposed a deep ranking approach by learning fine-grained image similarity. Several networks have been proposed in related applications. For example, AlexNet [18] claimed victory at the LSVRC2012 classification contest. ResNet [19], GoogleNet [20] and VGGNet [21] later drew a lot of attention in large scale image recognition and object detection. However, these methods are mainly developed for matching color or gray-scale images and show degraded performance in matching binary images of curve structures with noise and deformation.

In the later experiments, we include several of these existing matching algorithms for performance evaluation and comparison.

III Proposed Method

The full pipeline of the proposed method is illustrated in Fig. 3.

Fig. 3: An illustration of the full pipeline of the proposed method. (a) Curve structure segmentation from a sherd. (b) Stage 1: template matching with all the designs for selecting a small set of candidate matchings of the input sherd. (c) Stage 2: CNN-based re-ranking of the candidate matchings. For (b) and (c), from top to bottom the resulting matchings are shown in ascending order with respect to their costs. Correctly matching design is shown in red box, which is ranked low in Stage 1 but ranked at the top in Stage 2. Original design reproduced with permission, courtesy of Frankie Snow, South Georgia State College.

Given the image of a sherd, we first segment the curve structure in the form of a binary image masked by the sherd boundary. We then develop a two-stage algorithm for design identification. 1) Template matching. In this stage, we apply a simple template matching to match the curve structure segmented from the sherd over all the locations, with all possible orientations, of all the designs and select a small set of candidate matchings. 2) CNN-based re-ranking. In this stage, we derive a new matching cost by training a dual-source Convolutional Neural Network (CNN) and apply it to re-rank the candidate matchings for final matching designs, as well as the matching locations and orientations. In the following, we first introduce the curve structure segmentation from the image of a sherd, and then elaborate on the two stages of the proposed algorithm.

III-A Curve-Structure Segmentation from a Sherd

Generally speaking, extracting curve structures from the surface of a sherd is a typical low-level image segmentation problem. However, the erosion and sediment usually make the curve structures on the sherd very weak and blurred, which substantially increases the difficulty in accurately segmenting them. In this paper, we use the excavated pottery sherds associated with the Woodland period for experiments and we found that it is very difficult to extract these curve structures from the camera-taken RGB images of these sherds. Given that these curve structures are stamped on the surfaces of pottery vessels by carved paddles, curve structures usually show larger depth than the adjacent non-curve surface. Therefore, in archeology, 3D scanners are usually applied to achieve the 3D depth image of the sherd surface, as illustrated in Fig. 4, and the curve structures are then segmented directly from the depth image.

Fig. 4: An illustration of scanning sherds for depth images: (a) RGB image of a sherd, (b) setup of a 3D scanner, (c) 3D point cloud of the sherd surface obtained by the 3D scanner, and (d) depth image of the sherd surface: pixel intensity represents the depth value at a location.

However, after being buried under the earth for thousands of years, together with possible shallow stamping in making the vessel, the curve structures can still be difficult to segment even from the scanned high-resolution depth images. In our previous work [22], we proposed a new CNN-based algorithm to more accurately and reliably segment the stamped curve structures from the depth images of the sherds, by learning and incorporating the implied curve geometry, such as curve smoothness and parallelism, in the underlying designs. Specifically, we train a Fully Convolutional Network (FCN) to detect the skeletons of the curve structures in the depth images. Then, we train a dense prediction convolutional network to identify and prune false positive skeleton pixels. Finally, we recover the curve width by a scale-adaptive thresholding algorithm to get the final segmentation of curve structures. Figure 5 shows the sample results after each step of this algorithm. We also extract the boundary of the sherd, indicated by red contours in Fig. 5. The sherd boundary provides a mask to exclude all the information outside the sherd boundary from matching in the later two-stage algorithm.

Fig. 5: An illustration of segmenting curve structures from sample sherds: (a) depth images of sherds, where darker pixels have larger depths. (b) FCN-extracted curve skeletons, (c) Refined curve skeletons by using a dense prediction CNN, and (d) final segmented curve structures with recovered curve width, masked by the sherd boundaries (indicated by red contours).

It has been shown in [22] that, this CNN-based algorithm can segment the curve structure from a sherd much more accurately than other low-level and high-level image segmentation algorithms. However, the segmented curves are far from perfect, because of the strong noise and shallow stampings on the unearthed sherds, as shown in Fig. 2. In particular, many segmentation errors may occur near the sherd boundary, where the depth information is more noisy due to the erosion and sediment. Furthermore, as mentioned above, the curve structure segmented from a sherd may show deformation from its underlying design due to the drying process in making the object. In the following, we elaborate on the proposed two-stage matching algorithm that is robust to noise, errors and deformation present in the segmented curve structures.

III-B Stage 1: Template Matching

In Stage 1, we treat the design identification as a traditional template matching problem. The curve structure segmented from a sherd is taken as a template, which is a binary image with mask . The known designs, each of which is a binary image , , are taken as sources. At location with orientation on design , we adopt the simple template matching cost

(1)

where is the template with translation and rotation angle . In this stage of the algorithm, we calculate the matching cost exhaustively for the whole search space – for design , the translation contains all the pixel locations in and the orientation angle covers all the 360 integer degrees in the range of . This way, just for design , we need to calculate the matching cost times with being the size of , i.e., the number of pixels in . Combining all designs, the whole search space for calculating the matching cost in this stage is , which can be very large when the number of designs increases and the size of the design increases. But given the high simplicity of this matching cost on the binary images, its calculation over the whole search space can be still achieved very quickly by applying fast cross-correlation of and  [23].

Given the noise, errors and deformation in the template and the simplicity of the matching cost Eq. (1), the globally optimal matching over the whole search space may not be the desired correct matching. Therefore, in this stage we select a small set of candidate matchings with relatively low template matching costs of Eq. (1), with an expectation that the desired ground-truth matching is included in these candidate matchings. The final matching and design identification will be determined by a more refined matching cost, which we will discuss later in Stage 2.
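As a rough illustration of the exhaustive Stage 1 search, the Python sketch below builds a cost volume over rotations and translations with FFT-based cross-correlation. Since the body of Eq. (1) is not reproduced in this text, the cost is assumed here to be the fraction of masked pixels on which the rotated template and the design disagree. The function name `template_cost_volume`, the use of SciPy, and the requirement that the template sit on a canvas large enough to rotate without clipping are also assumptions of the sketch, not details of the authors' implementation.

```python
import numpy as np
from scipy import ndimage, signal


def template_cost_volume(design, template, mask, angles):
    """Return cost[a, y, x]: fraction of masked pixels where the template,
    rotated by angles[a], disagrees with the design patch at top-left (y, x)."""
    design = design.astype(float)
    volume = []
    for angle in angles:
        # Nearest-neighbour rotation keeps the images binary; reshape=False keeps
        # the canvas size, so the canvas must be large enough to avoid clipping.
        t = ndimage.rotate(template.astype(float), angle, reshape=False, order=0)
        m = ndimage.rotate(mask.astype(float), angle, reshape=False, order=0)
        t = t * m  # ignore anything outside the sherd mask
        # For binary images: sum(m * (t - d)^2) = sum(t) + sum(m * d) - 2 * sum(t * d),
        # and both sums over the design are cross-correlations.
        overlap = signal.correlate(design, t, mode="valid", method="fft")
        covered = signal.correlate(design, m, mode="valid", method="fft")
        mismatches = np.rint(t.sum() + covered - 2.0 * overlap)
        volume.append(mismatches / max(m.sum(), 1.0))
    return np.stack(volume)
```

Calling the function with `angles=range(360)` would cover the integer-degree orientations described above; the two correlations per angle are what keep the per-design cost manageable.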

For simplicity, for a given template, we select candidate matchings on each design , . In selecting the candidate matchings, we need to avoid the inclusion of spatially adjacent ones, e.g., the matchings and on the same design since they may show very similar matching cost. To address this problem, we employ the following “non-minimum suppression” strategy to select the candidate matchings on the design .

  1. On the design , compute the template matching cost over the whole search space. This leads to a 3D matrix with dimensions of , , and .

  2. Find the local minimums of this 3D matrix, i.e., all the elements that satisfy

    with , , and take value in . On the design, we apply rotation symmetry, i.e.,

    for padding along the dimension of

    and the traditional float maximum padding along the dimensions of and . An example is shown in Fig. 6.

  3. Select the of these local minimums with the lowest matching cost as the candidate matchings on design . If there are less than local minimums, we simply take all the local minimums as the candidate matchings on design .
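The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the cost volume is indexed as `cost[angle, y, x]`, the local-minimum window is taken to be 3 along every axis (the window size is not recoverable from this text), `+inf` stands in for the float-maximum padding, and `select_candidates` is a hypothetical helper name.

```python
import numpy as np
from scipy import ndimage


def select_candidates(cost, k, window=3):
    """Return up to k local minima of cost[a, y, x] as (cost, a, y, x), lowest first."""
    cost = np.asarray(cost, dtype=float)
    # Wrap around along the angle axis (rotation symmetry) and pad the two
    # spatial axes with +inf so that border elements can still be local minima.
    local_min = ndimage.minimum_filter(
        cost,
        size=window,
        mode=("wrap", "constant", "constant"),
        cval=np.inf,
    )
    idx = np.argwhere(cost <= local_min)
    order = np.argsort(cost[tuple(idx.T)], kind="stable")[:k]
    return [(float(cost[tuple(i)]), *map(int, i)) for i in idx[order]]
```

If a design has fewer than `k` local minima, the slice simply returns all of them, which mirrors the fallback in step 3.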

Fig. 6: 3D non-minimum suppression for selecting the candidate matchings. Local minimums are defined over a window over the matrix constructed by the template matching costs computed over the whole search space.

This way, for each design , we have candidate matchings (the degenerate case of fewer than local minimums on a design makes no difference to the later algorithm development), which we denote as , and . In the following section, we introduce a CNN-based algorithm to re-rank these candidate matchings for the final matching and design identification.

III-C Stage 2: CNN-based Re-ranking

For the curve structure segmented from a sherd, the template matching in Stage 1 provides us candidate matchings, i.e., candidate matchings on each design. The remaining problem is to rank these candidate matchings to locate the best matchings, with which we can identify the best matched designs for the input sherd. The simplest way is to directly use the template matching cost in Eq. (1) to rank these candidate matchings. However, as mentioned above, the noise, errors, and deformation in the template may prevent the use of this simple template-matching cost from making the correct ranking and identifying the underlying design. As shown in Fig. 7, a false matching in the candidates may show lower template-matching cost of Eq. (1) than the true matching, i.e., the true location and orientation on the true design.

Fig. 7: Two examples of ranking candidate matchings using the template-matching cost of Eq. (1) with each row showing an example. (a) Depth images of sherds, (b) templates, i.e., the curve structures segmented from the sherds, (c-g) top five ranked candidate matchings (from the first to the fifth) and the corresponding matching costs. For each design, only the matched portion masked by the corresponding sherd boundary is displayed. Red boxes indicate the desired true matchings.

We can see that the incorrect rankings may be caused by different kinds of degradations such as noise, errors and deformation of the template. For example, the segmentation error near the sherd boundary may be high, as shown in the top row of Fig. 7. Furthermore, the curve structure in certain areas of the sherd may be totally missing due to shallow stamping in making the object, as shown in the bottom row of Fig. 7. It is very difficult to find and explicitly model all the possible degradations and propose specific strategies to tackle these degradations. Instead, in this stage of the algorithm, we propose to develop a CNN-based algorithm to learn features that are robust to these degradations in a supervised way. We then use these robust features to calculate the matching cost and re-rank the candidate matchings derived in Stage 1 for final design identification.

For each candidate matching , it actually matches the curve structure segmented from the sherd, i.e., and an identical-size patch of the design defined by the location and orientation of and the same mask , as illustrated in Fig. 7(c). By setting the intensity of all the pixels outside the mask to be zero, all we need is a robust matching cost between binary images and . To solve this problem, we propose a dual-source CNN with two identical sub-networks as shown in Fig. 8(a). These two sub-networks take and as the inputs, respectively. Each sub-network consists of a sequence of convolution, max pooling layers and a global average pooling layer (GAP) for feature learning, as detailed in Table I. We implement this dual-source CNN by truncating AlexNet [18], as shown in Fig. 8(b), to “conv4” layer and replacing all layers after “conv4” layer with a GAP layer. Both inputs, i.e., and are re-sized to pixels, before being fed to the sub-network.

Fig. 8: An illustration of the dual-source CNN architectures: (a) the proposed CNN and (b) AlexNet.

Name    Type                    Configuration
GAP     GlobalAveragePooling
conv4   Convolution
conv3   Convolution
pool2   MaxPooling
conv2   Convolution
pool1   MaxPooling
conv1   Convolution
data    Input                   binary image

TABLE I: Configuration of each sub-network; stand for the number of outputs, kernel size, stride and padding size respectively.

In the CNN training, we use a set of curve-structure image pairs with ground-truth labels of true or false matching and train the CNN by minimizing the contrastive loss

where if is a true matching and otherwise, is a margin for separation, and is the learned feature of an input after the “GAP” layer.
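For readers who want a concrete form, the sketch below implements one widely used variant of the contrastive loss in NumPy. The formula is an assumption: the loss expression itself is not reproduced in this text, so the code uses the common squared-distance form with a hinge on the margin, with the label equal to 1 for a true matching and 0 otherwise. The default margin of 0.5 follows the experimental settings reported later.

```python
import numpy as np


def contrastive_loss(feat_a, feat_b, labels, margin=0.5):
    """Mean contrastive loss over a batch of feature pairs.

    feat_a, feat_b: arrays of shape (batch, dim) from the two sub-networks.
    labels: 1 for a true matching, 0 for a false matching.
    """
    labels = np.asarray(labels, dtype=float)
    d = np.linalg.norm(np.asarray(feat_a) - np.asarray(feat_b), axis=1)
    pull = labels * d ** 2                                      # true pairs: pull together
    push = (1.0 - labels) * np.maximum(margin - d, 0.0) ** 2    # false pairs: push apart
    return float(np.mean(pull + push) / 2.0)
```

False pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on the false matchings that still look similar to the template.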

One issue in training the proposed CNN is the insufficient number of positive training data, i.e., the matched pair of images . For each sherd, we can only construct one positive training sample by pairing the template with its ground-truth design, cropped at the ground-truth location with the ground-truth orientation. In practice, we only have a small number of sherds with known ground-truth designs and insufficient training data may lead to over-fitting in CNN training. By following previous work on data augmentation, we take the following strategies to increase the number of positive training samples. 1) Given a positive matching sample , we simultaneously rotate both of them by an angle to construct a new positive matching sample. More specifically, we set the angle to be , , , , , , , and . 2) Given a positive matching sample , we can simultaneously flip both of them, horizontally or vertically or non-flipping, to construct new positive samples. 3) We apply FishEye transformation to each positive sample by different exponents (1, 1.25, 1.5 and 1.75) to construct new positive samples. By combining these three strategies, for each sherd with its known ground-truth design, as well as the matched location and orientation, we can construct 96 positive training samples.
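The way the three strategies combine can be sketched by enumerating the augmentation settings. The flip options and FishEye exponents come from the text; the eight rotation angles at 45° steps are an assumption, chosen only so that the grid has the 96 combinations mentioned for the testing stage. The FishEye warp itself is not implemented here, and `augmentation_grid` and `flip_pair` are hypothetical helper names.

```python
from itertools import product

import numpy as np

ROTATION_ANGLES = [45 * i for i in range(8)]   # assumption: eight evenly spaced angles
FLIPS = ["none", "horizontal", "vertical"]
FISHEYE_EXPONENTS = [1, 1.25, 1.5, 1.75]


def augmentation_grid():
    """All (angle, flip, exponent) settings applied to both images of a pair."""
    return list(product(ROTATION_ANGLES, FLIPS, FISHEYE_EXPONENTS))


def flip_pair(template, patch, flip):
    """Apply the same flip to the template and the design patch."""
    if flip == "horizontal":
        return template[:, ::-1], patch[:, ::-1]
    if flip == "vertical":
        return template[::-1, :], patch[::-1, :]
    return template, patch
```

Applying every transform to both images of a pair at once is what keeps an augmented true matching a true matching.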

Similarly, in the testing stage, we also apply the same strategies to compute a more reliable CNN-based matching cost. More specifically, given a pair of candidate matching , we construct 96 candidate matchings using the combined rotation, flipping and FishEye transformation as mentioned above. We then compute the CNN-based matching cost for each of these 96 pairs, by feeding the input pair of images to the two sub-networks of the trained CNN, respectively. We find that the final features is usually inappropriate for measuring this CNN-based matching cost since the above training is based on binary classification, i.e., the two inputs are either matched or not matched. Instead, in the testing stage, the resulting matching cost is expected to be a real number. Therefore, we define their CNN-based matching cost using Euclidean norm. For example, for the original pair without any rotation, flipping, and FishEye transformation, this matching cost is computed by

(2)

where is the learned feature of an input of the trained CNN as shown in Fig. 8. For each of the 96 pairs derived from , we calculate their matching cost and denote their average as . We perform re-ranking of the candidate matchings according to this averaged matching cost . The highest ranked candidate matchings, i.e., the ones with the lowest average matching costs, provide the matched designs and the matched location and orientation on these designs.
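A minimal sketch of the averaged re-ranking cost follows, assuming the cost of one pair is the Euclidean norm of the difference between the two learned feature vectors, as the text describes. The trained sub-network is abstracted as a caller-supplied `feature_fn`, and the names `euclidean_cost`, `averaged_cost` and `rerank` are hypothetical.

```python
import numpy as np


def euclidean_cost(feature_fn, template, patch):
    """Euclidean distance between the learned features of one image pair."""
    return float(np.linalg.norm(feature_fn(template) - feature_fn(patch)))


def averaged_cost(feature_fn, pairs):
    """Average cost over the augmented (template, patch) pairs of one candidate."""
    costs = [euclidean_cost(feature_fn, t, p) for t, p in pairs]
    return sum(costs) / len(costs)


def rerank(feature_fn, candidates):
    """candidates maps a candidate id to its list of augmented pairs.

    Returns the candidate ids sorted by ascending averaged cost."""
    scores = {cid: averaged_cost(feature_fn, pairs) for cid, pairs in candidates.items()}
    return sorted(scores, key=scores.get)
```

In practice `feature_fn` would wrap a forward pass through the trained sub-network up to the GAP layer, and each candidate would carry its 96 augmented pairs.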

The whole algorithm is summarized as shown in Algorithm 1.

  Input: A sherd depth image; the known designs
  Segment the input sherd image to obtain the curve structure and the mask
  for each design do
     for each translation do
        for each rotation do
           Calculate the template matching cost by Eq. (1)
        end for
     end for
     Select the candidate matchings on the design using “non-minimum suppression”
  end for
  for each design do
     for each candidate matching on the design do
        Construct 96 pairs of inputs from the candidate matching by data augmentation
        Compute the average CNN-based matching cost
     end for
  end for
  Re-rank the candidate matchings based on the average CNN-based matching cost
Algorithm 1 Algorithm for design identification.

IV Experiment

In this section, we conduct experiments to validate the effectiveness of the proposed method. We first quantitatively evaluate the performance of the proposed method in terms of CMC metric and compare it against eight existing methods. Then, we conduct an ablation study to justify the usefulness of both stages in the proposed matching algorithm.

IV-A Dataset and Settings

For our study, we collected 600 pottery sherds that were excavated in various archaeological sites located in Southeastern North America. These 600 sherds represent 98 unique paddle designs. Each sherd only displays one design, while the same design may be applied to the surface of multiple sherds. We used a linear array 3D laser scanner, NextEngine, to get the point cloud of a sherd surface with the resolution of 100 points per . Then its depth image is constructed by following the same resolution, i.e., each pixel in the depth image covers an area of . The scanner is placed about 9 inches above the sherd and is perpendicular to the platform where the sherd is seated. The size of the collected depth image ranges from pixels to pixels, while the size of design image ranges from pixels to pixels. Archaeologists helped identify the true matching and the true design for each sherd, which we use as ground truth in our experiments. Curve structures are segmented from the depth image of each sherd using our previously developed method as described in Section III-A.

For computing the matching cost in Eq. (2) as shown in Fig. 8. For each sherd, we use template matching to select three candidate matchings on each design. In Stage 2, we randomly select 300 sherds for training the CNN. More specifically, with three candidate matchings on each of the 98 designs, these 300 training sherds generate 300 × 3 × 98 = 88,200 candidate matchings in the form of image pairs in Stage 1. Among them, true matchings are taken as positive training samples while the false matchings are taken as negative training samples. We then use the remaining 300 sherds for testing. Similarly, these 300 testing sherds also generate 88,200 candidate matchings in the form of image pairs in Stage 1. The trained CNN is used to re-rank the candidate matchings generated by each testing sherd for identifying the matched design of this testing sherd.

In Stage 2, the proposed re-ranking CNN is initialized by the pre-trained AlexNet model. It is trained by Stochastic Gradient Descent (SGD) with a batch size of 32, momentum of 0.9, and weight decay of 0.0005. The base learning rate is , and it decreases slowly in the training process. We set the maximum number of iterations to 50,000 and the margin for separation to 0.5.

IV-B Design Identification Performance

In this paper, we use the Cumulative Matching Characteristics (CMC) ranking metric to evaluate the design-identification performance. For each sherd , we match it against all designs, generating candidate matchings. We then use the average CNN-based matching cost to re-rank these candidate matchings. In the re-ranking result, for the candidate matchings from the same design, we only keep the one with the smallest cost . This way, we get a re-ranking result of candidate matchings, each from one different design, with the ascending cost . The Rank- CMC value is the percentage of the testing sherds with correctly identified designs among the top candidate matchings in the re-ranking results. By varying from 1 to , we can draw a CMC curve based on the corresponding Rank- CMC values. The higher the CMC curve, the better the identification performance.
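The CMC computation described above can be sketched in a few lines of Python. The helper names `rank_designs` and `cmc_curve` and the `(cost, design)` tuple layout are assumptions made for illustration.

```python
def rank_designs(candidates):
    """Keep the lowest-cost candidate per design and sort designs by that cost.

    candidates: iterable of (cost, design_id) pairs for one sherd."""
    best = {}
    for cost, design in candidates:
        if design not in best or cost < best[design]:
            best[design] = cost
    return sorted(best, key=best.get)


def cmc_curve(rankings, true_designs, max_rank):
    """Rank-r CMC values for r = 1..max_rank.

    rankings: one ranked design list per testing sherd.
    true_designs: the ground-truth design of each testing sherd."""
    hits = [0] * max_rank
    for ranking, truth in zip(rankings, true_designs):
        if truth in ranking[:max_rank]:
            first = ranking.index(truth)
            for r in range(first, max_rank):
                hits[r] += 1
    return [h / len(true_designs) for h in hits]
```

Because a sherd that is correct at rank r is also counted at every larger rank, the resulting curve is non-decreasing and reaches 1 once every true design appears in its ranking.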

To evaluate the effectiveness of the proposed method for design identification, we select eight existing matching algorithms for comparison: Template Matching [7], Chamfer Matching [6], Shape Context [5], Nearest Neighbor [24], pHash [8], Gabor [10], DeepCompare [15] and MatchNet [14]. The experiments are conducted on the same testing dataset with 300 sherds.

Template Matching directly uses the matching cost in Eq. (1) to find the best matched designs, as well as their locations and orientations. In Chamfer Matching, the sherd curve structure and each design are first thinned to one-pixel-wide skeletons. The sherd skeleton is then translated and rotated to match the design skeleton in terms of the Chamfer distance, Eq. (3), evaluated under a spatial transform. The Chamfer matching cost is then defined as the minimal Chamfer distance over all possible transforms, including all translations and rotations. Chamfer Matching can be computed efficiently by building a distance map for the design skeleton.
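To make the Chamfer matching cost concrete, the sketch below computes a standard Chamfer distance (mean distance from each transformed template point to its nearest target point) and minimizes it over a small discrete set of rotations and translations. It is a simplified illustration, not the implementation used in the experiments: it uses a brute-force nearest-point search instead of a distance map, and the function names, angle set, and shift grid are assumptions made for this example.

```python
import math


def chamfer_distance(template_pts, target_pts):
    """Mean distance from each template point to its nearest target point."""
    total = 0.0
    for (x, y) in template_pts:
        total += min(math.hypot(x - tx, y - ty) for (tx, ty) in target_pts)
    return total / len(template_pts)


def transform(points, angle_deg, dx, dy):
    """Rotate points about the origin by angle_deg, then translate by (dx, dy)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + dx, s * x + c * y + dy) for (x, y) in points]


def chamfer_cost(template_pts, target_pts, angles, shifts):
    """Minimal Chamfer distance over a discrete set of rotations and translations."""
    best_cost, best_pose = float("inf"), None
    for angle in angles:
        for (dx, dy) in shifts:
            moved = transform(template_pts, angle, dx, dy)
            d = chamfer_distance(moved, target_pts)
            if d < best_cost:
                best_cost, best_pose = d, (angle, dx, dy)
    return best_cost, best_pose


# Toy example: a horizontal 3-point segment must be rotated and shifted
# to land on a vertical 3-point segment.
template = [(0, 0), (1, 0), (2, 0)]
target = [(5, 3), (5, 4), (5, 5)]
angles = [0, 90, 180, 270]
shifts = [(dx, dy) for dx in range(0, 7) for dy in range(0, 7)]
cost, pose = chamfer_cost(template, target, angles, shifts)
```

With a precomputed distance map of the target skeleton, the inner nearest-point search becomes a table lookup, which is what makes Chamfer Matching efficient in practice.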

As in Chamfer Matching, Shape Context also uses the two skeletons for matching. Since this is a partial matching and Shape Context is rotation invariant, we slide the sherd skeleton over the design skeleton and calculate the shape-context matching at each sliding location to find the best matching locations. We directly use the Shape Context implementation, as well as its matching cost, from the OpenCV package (https://docs.opencv.org/3.0-beta/modules/shape/doc/shape_distances.html).

Nearest Neighbor, pHash, Gabor, DeepCompare and MatchNet take the candidate matchings selected by Stage 1 of the proposed method and re-rank them using their respective matching costs. The same CMC ranking metric is then computed for each of them for performance evaluation. Specifically, for Nearest Neighbor, we directly calculate the intensity difference between a pair of inputs as their matching cost. For pHash, we use the radial hash as the hash function, with parameters following the setting in [8]; it is implemented using the pHash library (https://www.phash.org/). For Gabor, we construct Gabor features using the Gabor filter from the OpenCV package (https://docs.opencv.org/3.0-beta/modules/imgproc/doc/filtering.html), with a bank of filters covering multiple orientation angles and wavelengths. The matching cost is then defined as the Euclidean distance between these Gabor features. For MatchNet, we employ its original network architecture and training parameters, and fine-tune the model trained on the “Yosemite” dataset (https://github.com/hanxf/matchnet) with the above training dataset. DeepCompare has multiple networks. We choose the 2-channel deep network introduced in [15], with the base model trained on the “Yosemite” dataset (https://github.com/szagoruyko/cvpr15deepcompare). We then fine-tune the model with the Canny edge images generated from the “Yosemite” dataset.

CMC curves of the proposed method and the eight comparison methods are shown in Fig. 9. The proposed method achieves the best CMC performance and outperforms the second-best matching method by 17.3% on the Rank-1 CMC value. Figure 10 shows the identification results of the proposed method and the eight comparison methods on a curve structure segmented from a degraded sherd. The proposed method matches the true design at CMC Rank 1, while the comparison methods do not. In this figure, we only display the portion masked by the sherd boundary for the top five matched designs. True matchings are highlighted in red boxes.

Fig. 9: CMC curves of the proposed method and the eight comparison matching methods.
Fig. 10: The top 5 matched designs (from top to the bottom) identified by (a) the proposed method, (b) Template Matching, (c) Nearest Neighbor, (d) MatchNet, (e) DeepCompare, (f) Shape Context, (g) Chamfer matching, (h) Gabor and (i) pHash, respectively. True designs are highlighted in the red box. Original design reproduced with permission, courtesy of Frankie Snow, South Georgia State College.

One interesting finding in Fig. 9 is that the Template Matching and Nearest Neighbor methods produce much better results than Chamfer Matching and Shape Context. We believe the major reason is that thinning the curve structures to skeletons makes Chamfer Matching and Shape Context very sensitive to image and structure noise. Neither DeepCompare nor MatchNet learns features well suited to the proposed curve-pattern matching with noise and deformations, so using them for re-ranking does not produce satisfactory results. We also found that handcrafted features, such as the Gabor feature, cannot capture spatial information well and thus perform poorly in our application.

IV-C Usefulness of Each Stage

Intuitively, either of the two stages in the proposed matching algorithm could be replaced by an alternative or omitted. To justify the usefulness of each stage, we perform several additional experiments in which we modify or remove one stage of the proposed algorithm and then check the influence on the identification performance.

Removing Stage 1: Stage 1 of the proposed method uses a highly computationally efficient matching method, template matching, to exhaustively go over the whole search space, which consists of all possible translations and rotations that align the sherd to a design. To justify its usefulness, we can bypass Stage 1 by directly applying the CNN to each solution in the whole search space. Our experiment shows that the proposed CNN matching has much higher computational complexity than the template matching in Stage 1. For example, just matching one sherd curve structure to 3,750 design patches with the proposed CNN takes 57.895 seconds. At this rate, matching such a sherd curve structure to a single design over the whole search space takes 7.24 hours on a DELL Precision T7600 workstation with an Intel Xeon E5-2650 CPU, 32GB memory and an nVidia Tesla K20 GPU card. In contrast, exhaustive template matching over the same search space takes only 2.178 seconds on the same computer. As the numbers of designs and sherds and the image resolutions increase, running the CNN exhaustively over the whole search space becomes impractical.
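The reported timings can be cross-checked with a short calculation. The sketch below assumes that the CNN matching time scales linearly with the number of design patches, which is an assumption of this example rather than a statement from the experiments; the variable names are illustrative.

```python
# Reported timings from the text above.
cnn_seconds_per_batch = 57.895        # one sherd against 3,750 design patches
patches_per_batch = 3750
cnn_hours_per_design = 7.24           # exhaustive CNN matching on one design
template_seconds_per_design = 2.178   # exhaustive template matching on one design

cnn_seconds_per_design = cnn_hours_per_design * 3600
# Assuming linear scaling, how many patches does the exhaustive search cover?
implied_batches = cnn_seconds_per_design / cnn_seconds_per_batch
implied_patches = implied_batches * patches_per_batch
# How much faster is exhaustive template matching on the same search space?
speedup = cnn_seconds_per_design / template_seconds_per_design

print(f"implied search-space size: about {implied_patches:,.0f} patches")
print(f"template matching speedup: about {speedup:,.0f}x")
```

Under this assumption, the search space for one design contains roughly 1.7 million location and orientation hypotheses, and exhaustive template matching is about four orders of magnitude faster than exhaustive CNN matching.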

Furthermore, we found that the trained CNN may not be able to correctly locate the true locations and orientations on the matched designs when it is applied exhaustively over the whole search space, i.e., when Stage 1 is removed. To validate this finding, for each sherd in the testing dataset, we randomly select 39 masked design patches in each design, located within a range around the location and orientation with the best template matching in that design. We also include the masked design patches with the best template matching in each design. When we apply the proposed CNN matching to the sherds in the testing dataset and the design patches selected above, the Rank-1 CMC value drops from 67.3% to 29.3%. As shown by an example in Fig. 11, the CNN matching cost is not the minimum when the sherd is matched to the true location with the true orientation in its true design. We believe this is due to the limited training samples used for the proposed CNN: both positive and negative training samples are candidate matchings selected in Stage 1, and they cannot represent all possible matchings in the whole search space. To make the CNN work over the whole search space, we would have to use a much larger number of training samples. This is difficult since we only have a relatively small number of positive training samples.

Fig. 11: An illustration of the inappropriateness of applying the trained CNN over the whole search space. (a) Curve structure of a sherd, (b) matching to the true location and orientation, and (c) matching to another location and orientation, which yields a lower CNN matching cost. Original design reproduced with permission, courtesy of Frankie Snow, South Georgia State College.

Modifying Stage 2: Stage 2 of the proposed method employs the average CNN-based matching cost to re-rank all the candidate matchings generated in Stage 1. In Stage 2, we truncate AlexNet at the “conv4” layer and add a GAP layer to reduce the feature map dimension. In principle, the learned features from any layer of AlexNet, shown in Fig. 8(b), can be used to calculate the matching cost. In practice, we truncate all the layers after “conv4” because the features from the “conv4” layer achieve the best identification performance. To validate this choice, we conducted comparison experiments by replacing the proposed CNN with AlexNet and using features from every layer of AlexNet to define the CNN-based matching cost in Eq. (2). The AlexNet was trained using the same training dataset. Note that, different from the original AlexNet, we replace the last softmax layer with a “feat” layer to reduce the feature map dimension, since the original AlexNet was designed for multiple classes, while our application needs a binary classifier. Figure 12 shows the Rank-1 CMC values when using features from different CNN layers.

Fig. 12: Rank-1 CMC values of AlexNet on the 300 testing sherds using learned features from each layer of AlexNet.

We can see that the highest Rank-1 CMC value is achieved when using the features from the “conv4” layer, not the features from the final “feat” layer, although the CNN was trained to minimize the loss after the “feat” layer. The major reason is that the CNN is trained as a binary classifier, while the matching cost is a real value reflecting the similarity between two input images. Different layers measure image similarity at different levels of detail. From “conv1” to “feat”, the features become increasingly abstract. However, we want image similarity to be measured at the level of geometric structure, which may be better reflected at a hidden layer than at the final layer. Therefore, we believe it is reasonable to construct the proposed CNN by truncating AlexNet at the “conv4” layer.

In Stage 2, we use a data augmentation strategy to improve the testing performance by averaging the matching cost over 96 pairs of images. We conducted a comparison experiment by removing this data augmentation strategy. Our experiment shows that data augmentation in testing improves the Rank-1 CMC value from 60.7% to 67.3%. Note that this experiment justifies the use of data augmentation in testing. In the training of the CNN, data augmentation is always used, and its effectiveness has been verified by many previous works [18, 25, 26].
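The idea of averaging the matching cost over augmented image pairs at test time can be sketched as follows. This is only an illustration of the averaging step: the flip-based augmentation, the function names, and the toy cost are hypothetical stand-ins, and they do not reproduce the 96 augmented pairs or the CNN cost used in the experiments.

```python
def averaged_matching_cost(cost_fn, sherd_views, design_views):
    """Average a pairwise cost over corresponding augmented (sherd, design) views."""
    if len(sherd_views) != len(design_views) or not sherd_views:
        raise ValueError("need equally many, non-empty augmented views")
    costs = [cost_fn(s, d) for s, d in zip(sherd_views, design_views)]
    return sum(costs) / len(costs)


def flip_views(image):
    """Hypothetical augmentation: original, horizontal flip, vertical flip, both."""
    h = [row[::-1] for row in image]
    v = image[::-1]
    hv = [row[::-1] for row in v]
    return [image, h, v, hv]


def toy_cost(a, b):
    """Stand-in for the CNN matching cost: mean absolute pixel difference."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy 2x2 binary images (nested lists), applying the same augmentation to both.
sherd = [[0, 1], [1, 0]]
design = [[0, 1], [1, 1]]
cost = averaged_matching_cost(toy_cost, flip_views(sherd), flip_views(design))
```

The toy cost is invariant to flips, so every view yields the same value here; a learned CNN cost generally is not invariant, which is why averaging over views can stabilize the ranking.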

IV-D Running Time

To identify the underlying design of a sherd, we need to match the curve structure segmented from the sherd against all 98 designs. Table II reports the running time for identifying the design of one sherd, averaged over the 300 testing sherds. The experiments are conducted on a DELL Precision T7600 workstation with an Intel Xeon E5-2650 CPU, 32GB memory and an nVidia Tesla K20 GPU card. We can see that the proposed method runs faster than Chamfer Matching and Shape Context because it uses the efficient Template Matching in Stage 1 to reduce the search space. However, it is less efficient than Template Matching due to the additional re-ranking in Stage 2. Nearest Neighbor, pHash, Gabor, MatchNet and DeepCompare take less running time than the proposed method because they do not use data augmentation.

Method               Running time (hours)
Template Matching    0.969
Chamfer Matching     29.90
Shape Context        4.90
Nearest Neighbor     0.969
pHash                0.972
Gabor                0.970
MatchNet             0.970
DeepCompare          0.970
Proposed             1.129
TABLE II: Average running time of identifying the underlying design of a sherd using the proposed method and eight comparison methods.

V Conclusion

In this paper, we explored an important and challenging task in archeology: identifying the underlying curve design on the surface of a highly fragmented and degraded pottery sherd. We developed a new 2-stage template matching algorithm to match the curve structure segmented from a sherd to a set of known designs. In Stage 1, we used a computationally efficient template-matching algorithm to select a small set of candidate matchings on all the designs. In Stage 2, we developed a new CNN-based model to re-rank the candidate matchings for identifying the underlying matched designs. In the experiment, we validated the proposed method by using 600 real sherds together with their corresponding 98 designs from the Woodland Period in Southeastern North America. Comparison to eight existing matching methods verified that the proposed method can achieve a new state-of-the-art performance with reasonable computation time.

Acknowledgment

This research was supported by the National Science Foundation Archaeology and Archaeometry Grant Program (1658987), the National Center for Preservation Technology and Training Grants Program (P16AP00373) and University of South Carolina Social Sciences Grant Program. We would like to show our gratitude to Professor Frankie Snow at South Georgia State College for sharing his pearls of wisdom and design images with us during the course of this research. We also thank Dr. Matthew Compton, Curator of the R. M. Bogan Repository at Georgia Southern University for generously sharing his collection, and our colleague Professor Scot Keith for encouraging the pursuit of this research.

References

  • [1] J. Zhou, H. Yu, K. Smith, C. Wilder, H. Yu, and S. Wang, “Identifying designs from incomplete, fragmented cultural heritage objects by curve-pattern matching,” Journal of Electronic Imaging, vol. 26, no. 1, pp. 011022–011022, 2017.
  • [2] P. Durham, P. Lewis, and S. Shennan, “Artefact matching and retrieval using the Generalised Hough Transform,” in Computer Applications and Quantitative Methods in Archaeology, 1995, pp. 25–30.
  • [3] P. Smith, D. Bespalov, A. Shokoufandeh, and P. Jeppson, “Classification of archaeological ceramic fragments using texture and color descriptors,” in IEEE Conference on Computer Vision and Pattern Recognition - Workshops, 2010, pp. 49–54.
  • [4] U. Schurmans, A. Razdan, A. Simon, M. Marzke, P. McCartney, D. Van Alfen, G. Jones, M. Zhu, D. Liu, M. Bae et al., “Advances in geometric modeling and feature extraction on pots, rocks and bones for representation and query via the Internet,” BAR International Series, vol. 1016, pp. 191–204, 2002.
  • [5] S. Belongie, J. Malik, and J. Puzicha, “Shape Context: A new descriptor for shape matching and object recognition,” in Advances in Neural Information Processing Systems, 2001, pp. 831–837.
  • [6] H. Barrow, J. Tenenbaum, R. Bolles, and H. Wolf, “Parametric correspondence and Chamfer matching: Two new techniques for image matching,” in International Joint Conference on Artificial Intelligence, vol. 2, 1977, pp. 659–663.
  • [7] R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice.   John Wiley & Sons, 2009.
  • [8] C. Zauner, Implementation and Benchmarking of Perceptual Image Hash Functions.   Upper Austria University of Applied Sciences, 2010.
  • [9] M. Hassaballah, A. Abdelmgeid, and H. Alshazly, “Image features detection, description and matching,” in Image Feature Detectors and Descriptors, 2016, pp. 11–45.
  • [10] B. Manjunath and W. Ma, “Texture features for browsing and retrieval of image data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837–842, 1996.
  • [11] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.
  • [12] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 886–893.
  • [13] M. Brown and D. Lowe, “Automatic panoramic image stitching using invariant features,” International Journal of Computer Vision, vol. 74, no. 1, pp. 59–73, 2007.
  • [14] X. Han, T. Leung, Y. Jia, R. Sukthankar, and A. Berg, “MatchNet: Unifying feature and metric learning for patch-based matching,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3279–3286.
  • [15] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4353–4361.
  • [16] K. Lin, H. Yang, J. Hsiao, and C. Chen, “Deep learning of binary hash codes for fast image retrieval,” in IEEE Conference on Computer Vision and Pattern Recognition - Workshops, 2015, pp. 27–35.
  • [17] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu, “Learning fine-grained image similarity with deep ranking,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1386–1393.
  • [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
  • [21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Computing Research Repository, vol. abs/1409.1556, 2014.
  • [22] Y. Lu, J. Zhou, J. Chen, J. Wang, K. Smith, W. Colin, and S. Wang, “Curve-structure segmentation from depth maps: A CNN-based approach and its application to exploring cultural heritage objects,” AAAI Conference on Artificial Intelligence, 2018.
  • [23] J. Lewis, “Fast normalized cross-correlation,” in Vision Interface, vol. 10, no. 1, 1995, pp. 120–123.
  • [24] R. Weber, H. Schek, and S. Blott, “A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,” in International Conference on Very Large Data Bases, vol. 98, 1998, pp. 194–205.
  • [25] L. Yaeger, R. Lyon, and B. Webb, “Effective training of a neural network character classifier for word recognition,” in Advances in Neural Information Processing Systems, 1997, pp. 807–816.
  • [26] I. Masi, A. Trần, T. Hassner, J. Leksut, and G. Medioni, “Do we really need to collect millions of faces for effective face recognition?” in European Conference on Computer Vision, 2016, pp. 579–596.