
Automatic inspection of cultural monuments using deep and tensor-based learning on hyperspectral imagery

by   Ioannis N. Tzortzis, et al.

In Cultural Heritage, hyperspectral images are commonly used since they provide extended information regarding the optical properties of materials. However, processing such high-dimensional data is challenging for the machine learning techniques to be applied. In this paper, we propose a Rank-R tensor-based learning model to identify and classify material defects on Cultural Heritage monuments. In contrast to conventional deep learning approaches, the proposed high-order tensor-based learning demonstrates greater accuracy and robustness against overfitting. Experimental results on real-world data from UNESCO protected areas indicate the superiority of the proposed scheme compared to conventional deep learning models.



1 Introduction

Cultural Heritage (CH) assets suffer from man-made hazards and natural disasters [15, 17], as UNESCO declares in World Heritage in Danger [5]. Therefore, CH entities require regular inspection for defects, material deterioration and structural deformation. Currently, inspection is performed on-site and manually by experts, which is a tedious, time-consuming and costly task. Motivated by the advances in hyperspectral sensing and deep learning, in this paper we propose an automatic non-invasive approach for the inspection of CH assets.

A typical deep learning model includes a large number of tunable parameters to be learnt, which in turn requires a large number of training samples. However, the collection of labeled data is an expensive and tedious process, especially for CH applications. This is due to the fact that (i) complementary approaches (invasive, non-invasive) should be carried out to verify the type and the degree of material deterioration and/or structural deformation and (ii) the captured hyperspectral image data should be annotated by expert engineers.

Recently, tensor-based learning has emerged as a powerful alternative for classifying hyperspectral image data [10, 12, 14, 11]. These works (i) apply a canonical decomposition of the model parameters to reduce the number of trainable weights, and thus the number of samples required for efficient training, and (ii) retain the raw (tensor) form of the data to fully exploit the structural information encoded across the available data modes (dimensions). Currently, tensor learning has mostly been applied to remote sensing agricultural scenarios using satellite data. It should be mentioned that the scale of remote sensing data is of the order of kilometers, which makes such data unsuitable for detecting defects in CH assets. These tensor-based works are motivated by the property of hyperspectral imaging to act as a material detector, since different materials absorb or reflect light differently. However, CH asset inspection requires high discrimination sensitivity, since the model must distinguish small deformations, defects and cracks which could be catastrophic for the stability of the asset. Additionally, the scale of the analysis is of the order of a few centimeters, imposing challenges both for performance and computational demands.

Figure 1: The proposed framework for automatic defect detection on CH assets using a Rank-R tensor-learning model.

In this study, we introduce a tensor-based learning system capable of detecting and classifying defects on CH monuments. Our proposed system exploits the Rank-R FNN architecture [12] to overcome the main drawback of deep learning models, namely the requirement for large amounts of training data, without compromising detection accuracy. Our system detects and classifies defects using hyperspectral images captured with ground sensors, giving civil engineers the opportunity to assess, in a non-invasive way and in real time, the impact of defects on the structure of CH monuments.

To sum up, the novelty of our paper lies in the following directions. First, it introduces a high-order tensor-based learning system using ground hyperspectral data for defect detection and classification in CH structures. Second, it uses far fewer training samples than conventional deep learning approaches, significantly reducing annotation effort and computational demands. Third, it introduces a new real-world dataset captured at UNESCO protected areas for evaluating the proposed scheme.

1.1 Related Works

Defect detection with hyperspectral data: The authors in [1] show that spectral imaging techniques can offer several possibilities in exploiting optical properties of materials, by taking into account specific characteristics of the spectral bands. The study in [2] tries to address the problem of crack detection on paintings. Towards this direction, distance-based spectral mathematical morphology was used, offering a vector and full-band processing approach. Additionally, several top-hat transformations were utilized to assist the task. This work also confirms the capacity of hyperspectral imaging to offer additional useful and essential information. However, none of the aforementioned approaches is combined with deep and tensor-based learning schemes to improve classification performance across difficult real-world CH paradigms, nor can they discriminate not only the defects but also the defect types, as we do in the present study.

Inspection of CH assets: The study in [3] describes a data fusion pipeline for the delayering of X-ray fluorescence (XRF) images. At first, visible hyperspectral reflectance data (RIS) is clustered in pigment mixtures. Then, a synthetic surface XRF image is formed by calculating the mean XRF response across all clusters. Finally, the surface and subsurface correlated features are identified by subtracting the synthetic surface XRF from the full image. In [16], the combination of hyperspectral imaging with advanced signal processing techniques is proposed as a tool to assist the artwork authentication procedure. According to the authors, this is achieved by applying classification techniques on coloured pigments. The Support Vector Machine (SVM) algorithm was used in combination with the “one-against-one” technique in order to handle the multi-class problem. In [8], the authors exploit 3D textured models in combination with hyperspectral imagery to define specific areas of degradation. Although these works utilize state-of-the-art deep learning techniques, they require a large number of samples for the training process. In stark contrast, the proposed tensor-based learning model requires a noticeably lower number of training samples.

Figure 2: a) Sample image from the dataset, b) the corresponding annotation, c) CNN prediction (TWS=9, TS=50), d) CNN prediction (TWS=21, TS=400), e) Rank-R FNN prediction (TWS=9, TS=50), f) Rank-R FNN prediction (TWS=21, TS=400), g) Spectral response of representative pixels from each class.
Figure 3: In the top row, the graphs show the 95% confidence interval of the overall test accuracy for both models and different settings of the TS and TWS parameters. In the bottom row, the mean accuracy per class is presented, for both models and different settings of the TS and TWS parameters.

2 Methodology

2.1 Problem Formulation

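Our problem can be seen as a multi-class classification task. Let us denote as $C$ the number of classes. Then, we can detect $C-1$ different types of defects plus one class of pixels that correspond to no defect.

Let us denote as $\mathbf{X}_i$ the information describing the $i$-th pixel of a hyperspectral image, and as $\mathbf{y}_i \in \{0,1\}^{C}$, such that $\sum_{c=1}^{C} y_{i,c} = 1$, the ground truth label vector for the same pixel. Then given a collection

$$\mathcal{D} = \{(\mathbf{X}_i, \mathbf{y}_i)\}_{i=1}^{N} \tag{1}$$

the problem of automatic defect detection boils down to estimating a function $f_{\theta}$, such that

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(\mathbf{X}_i), \mathbf{y}_i\big) \tag{2}$$

In Eq. (2), $\mathcal{L}$ stands for the cross-entropy loss and $\theta$ is the set of parameters that determine the form of $f_{\theta}$.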
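As a small illustration of the loss in Eq. (2), the following sketch computes the cross-entropy between predicted class probabilities and ground truth label vectors. It assumes one-hot labels and averages over pixels rather than summing (the two differ only by a constant factor); the helper names `one_hot` and `cross_entropy` and the toy probabilities are hypothetical and not taken from the paper.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot label vectors (assumed label format)."""
    y = np.zeros((len(labels), num_classes))
    y[np.arange(len(labels)), labels] = 1.0
    return y

def cross_entropy(probs, y, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    return float(-np.mean(np.sum(y * np.log(probs + eps), axis=1)))

# Toy example: 3 pixels and 4 classes, with made-up predicted probabilities.
labels = np.array([0, 2, 3])
y = one_hot(labels, num_classes=4)
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],   # confident and correct
    [0.25, 0.25, 0.25, 0.25],   # uninformative prediction
    [0.10, 0.10, 0.10, 0.70],   # confident and correct
])
loss = cross_entropy(probs, y)
```

Only the probability assigned to the true class of each pixel contributes to the loss, so the uninformative second prediction dominates the value in this toy case.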

2.2 Rank-R Tensor Based Learning

To efficiently detect defects, each $\mathbf{X}_i$ should carry both spectral and spatial information describing the $i$-th pixel. In other words, each $\mathbf{X}_i$ should carry information about the spectral response of pixel $i$ and information about the $i$-th pixel’s neighbors. Towards this direction, with $\mathbf{X}_i \in \mathbb{R}^{s \times s \times b}$ we represent a square patch of dimensions $s \times s \times b$ of the hyperspectral image centered at the $i$-th pixel. Parameter $s$ stands for the height and width of the patch, and $b$ for the number of spectral bands. This way, each $\mathbf{X}_i$ is a 3rd-order tensor encoding both the spatial and spectral information of pixel $i$.

To address the problem formulated in the previous section, we represent the function $f_{\theta}$ by a tensor-based machine learning model utilizing high-order canonical decomposition. We call this model Rank-R Feed Forward Tensor-based Neural Network (Rank-R FNN) since it exploits tensor operators in a feedforward structure. Particularly, Rank-R FNN is a neural network with one hidden layer which consists of $Q$ hidden neurons. The Rank-R FNN weights connecting the input to the hidden layer are tensors satisfying the Rank-$R$ Canonical-Polyadic decomposition [7]:

$$\mathbf{W}^{(q)} = \sum_{r=1}^{R} \mathbf{w}_1^{(q,r)} \circ \mathbf{w}_2^{(q,r)} \circ \mathbf{w}_3^{(q,r)}$$

for $q = 1, \dots, Q$ with $\mathbf{W}^{(q)} \in \mathbb{R}^{s \times s \times b}$ and $\mathbf{w}_1^{(q,r)}, \mathbf{w}_2^{(q,r)} \in \mathbb{R}^{s}$, $\mathbf{w}_3^{(q,r)} \in \mathbb{R}^{b}$. Superscript $(q)$ denotes that these weights connect the input to the $q$-th neuron of the hidden layer, and the “$\circ$” operator stands for the vector outer product. The output of the Rank-R FNN for the $c$-th class is

$$p_c = \sigma\big(\langle \mathbf{v}^{(c)}, \mathbf{u} \rangle\big)$$

where $\mathbf{v}^{(c)} \in \mathbb{R}^{Q}$ collects the weights between the hidden layer and the $c$-th output neuron, $\sigma(\cdot)$ denotes the softmax activation function, and

$$\mathbf{u} = [u_1, \dots, u_Q]^{\top}, \qquad u_q = g\big(\langle \mathbf{W}^{(q)}, \mathbf{X}_i \rangle\big)$$

for $\mathbf{u}$ to be the output of the hidden layer activated by function $g(\cdot)$. In this study, we use and compare the Rank-R FNN and the CNN in [13], since both models exploit the spatio-spectral information of pixels and can be used for pixel-wise hyperspectral image classification tasks. Given a collection of training data in the form of relation (1), we estimate the set of parameters $\theta$ of the employed models using the backpropagation algorithm [9] with the Adam gradient-based optimizer [6]. Fig. 1 presents our overall approach.
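The forward pass described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the symbol names (`s` for the patch side, `b` for the number of bands, `R` for the rank, `Q` for the number of hidden neurons, `C` for the number of classes), the sigmoid hidden activation, the random weights and the values of `R` and `Q` are all choices made for this example. The sketch also shows that the inner product with a rank-R weight tensor can be computed directly from the factor vectors, and how many weights the decomposition saves per hidden neuron.

```python
import numpy as np

def cp_weight(factors):
    """Rebuild the full s x s x b weight tensor from rank-R CP factors.

    factors = (A, B, Cf) with shapes (R, s), (R, s), (R, b); the result is
    the sum over r of the outer products A[r] o B[r] o Cf[r].
    """
    A, B, Cf = factors
    return np.einsum("ri,rj,rk->ijk", A, B, Cf)

def hidden_unit(X, factors):
    """Inner product <W, X> computed from the factors, without forming W."""
    A, B, Cf = factors
    return float(np.einsum("ijk,ri,rj,rk->", X, A, B, Cf))

def rank_r_fnn_forward(X, hidden_factors, V):
    """One hidden layer with CP-structured weights, followed by softmax.

    hidden_factors: list of Q factor triples; V: (C, Q) output weights.
    The sigmoid hidden activation is an assumption of this sketch.
    """
    z = np.array([hidden_unit(X, f) for f in hidden_factors])
    u = 1.0 / (1.0 + np.exp(-z))          # hidden layer output
    logits = V @ u
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
s, b, R, Q, C = 9, 42, 2, 5, 4            # R and Q are hypothetical choices
X = rng.random((s, s, b))                 # one input patch
hidden_factors = [
    (rng.normal(scale=0.1, size=(R, s)),
     rng.normal(scale=0.1, size=(R, s)),
     rng.normal(scale=0.1, size=(R, b)))
    for _ in range(Q)
]
V = rng.normal(size=(C, Q))
probs = rank_r_fnn_forward(X, hidden_factors, V)

full_params = s * s * b                   # weights per neuron without decomposition
cp_params = R * (s + s + b)               # weights per neuron with rank-R factors
```

For a 9 × 9 patch with 42 bands, a full weight tensor has 3402 entries per hidden neuron, whereas the rank-2 factors in this example hold 120, which illustrates why the decomposition lowers the number of trainable weights.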

3 Experimental Results

3.1 Dataset description

We use a new dataset consisting of hyperspectral images depicting part of the ancient walls of the Saint Nicolas fortress, located in the UNESCO Heritage Medieval city of Rhodes, Greece. The 6 images of the dataset were collected using the HyperView sensing platform [4] by 3D-one, which combines the information from one Visual (VIS) snapshot camera and one Near Infrared (NIR) snapshot camera. Each hyperspectral image consists of 1016 × 1820 pixels and 42 spectral bands. Moreover, each image is accompanied by a pixel-based annotated ground truth image (see Fig. 2.b), produced by CH experts. Four different classes are considered, depicted in Fig. 2 with different colors: class 0 represents the salt defects (yellow), class 1 shows the non-significantly defected areas (light blue), class 2 depicts minor deterioration (orange) and class 3 (red) shows major deterioration. Fig. 2.g presents the spectral response of representative pixels from each class.

3.2 Pre-processing pipeline

The hyperspectral images are partitioned into tensor objects. The sampling unit selects random samples from the tensor objects to create the training and the test sets. A permutation process is also applied on the training set to reduce bias effects. We normalize each hyperspectral image in a band-wise manner using min-max normalization to restrict the pixels’ responses at each band to $[0, 1]$. After normalization, we split each image into patches of dimension $s \times s \times b$, with $s$ the patch height and width and $b$ the number of spectral bands (tensorization step in Fig. 1). During our experiments we set parameter $s$ equal to 9, 15 and 21 to investigate the effect of patch height and width on the defect detection accuracy. The generated patches of all the available hyperspectral images are aggregated into a single set. From this set, we randomly select a number of samples per class for training the models (the proposed tensor-based model and the compared one), while the rest of the patches are used as a test set for evaluating the defect detection performance. We have created the training set by randomly selecting 50, 100, 200 and 400 samples per class to investigate the impact of training set size on the performance of the learning models. We repeat the aforementioned hold-out cross validation scheme 10 times and report the average model classification accuracy and 95% confidence intervals.
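The pre-processing steps above (band-wise min-max normalization, patch extraction and per-class random sampling) might look roughly as follows. This is a sketch on a small synthetic image, not the pipeline used for the paper: the function names, the decision to skip border pixels whose patch would fall outside the image, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def minmax_per_band(img):
    """Scale every spectral band of an (H, W, B) image to [0, 1]."""
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / np.maximum(hi - lo, 1e-12)

def extract_patches(img, labels, s):
    """Collect the s x s x B patch centred at each pixel (s odd).

    Border pixels without a full neighbourhood are skipped; this border
    handling is an assumption of the sketch.
    """
    h = s // 2
    H, W, _ = img.shape
    patches, y = [], []
    for r in range(h, H - h):
        for c in range(h, W - h):
            patches.append(img[r - h:r + h + 1, c - h:c + h + 1, :])
            y.append(labels[r, c])
    return np.stack(patches), np.array(y)

def split_per_class(y, n_per_class, rng):
    """Draw n_per_class random training indices per class; the rest is the test set."""
    train = np.concatenate([
        rng.choice(np.flatnonzero(y == k), size=n_per_class, replace=False)
        for k in np.unique(y)
    ])
    rng.shuffle(train)                     # permute the training set
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test

# Synthetic stand-in for one image: 20 x 20 pixels, 5 bands, 4 classes.
rng = np.random.default_rng(0)
H, W, B, s = 20, 20, 5, 9
img = rng.random((H, W, B)) * 100.0
labels = (np.arange(H)[:, None] + np.arange(W)[None, :]) % 4
norm = minmax_per_band(img)
patches, y = extract_patches(norm, labels, s)
train_idx, test_idx = split_per_class(y, n_per_class=5, rng=rng)
```

Repeating the `split_per_class` call with different random seeds corresponds to the repeated hold-out scheme described in the text.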

3.3 Performance Evaluation

In this section, we evaluate the defect detection performance in terms of model classification accuracy and investigate the impact of patch size (parameter $s$) and training set size (50, 100, 200 and 400 samples per class). Two essential hyper-parameters are examined: Tensor Window Size (TWS) and Tensor Samples (TS). Twelve different experiments are designed and performed. Three distinct values of TWS (9, 15 and 21) are used and, for each value, four increasing values of TS are selected (50, 100, 200 and 400). The models are trained for 100 epochs and the results are calculated on the test set. Each experiment is repeated ten times to ensure the validity of the results.
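The experimental grid and the summary statistics over repeated runs can be sketched with the Python standard library. The twelve (TWS, TS) pairs follow directly from the text. The use of a Student-t interval for the 95% confidence level is an assumption of this sketch, since the paper does not state how its intervals are computed, and the accuracy values below are placeholders rather than results from the paper.

```python
import itertools
import math
import statistics

TWS_VALUES = (9, 15, 21)           # Tensor Window Size: patch height and width
TS_VALUES = (50, 100, 200, 400)    # Tensor Samples: training samples per class

# Every combination of window size and training set size: 3 x 4 = 12 experiments.
experiments = list(itertools.product(TWS_VALUES, TS_VALUES))

def mean_and_ci95(accuracies, t_crit=2.262):
    """Mean and 95% confidence half-width over repeated runs.

    t_crit = 2.262 is the two-sided Student-t critical value for 9 degrees
    of freedom, i.e. 10 repetitions; the t interval itself is an assumption.
    """
    m = statistics.mean(accuracies)
    half = t_crit * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return m, half

# Placeholder accuracies for the 10 repetitions of one experiment (not real results).
placeholder_runs = [0.70, 0.72, 0.71, 0.69, 0.73, 0.70, 0.71, 0.72, 0.68, 0.74]
mean_acc, half_width = mean_and_ci95(placeholder_runs)
```

Reporting `mean_acc ± half_width` for each of the twelve configurations gives one point with its interval per (TWS, TS) pair.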

In the top row of Fig. 3 we present the overall accuracy of the proposed tensor-based model compared with the state-of-the-art CNN-based model used in the literature for hyperspectral image classification [13]. The accuracy is presented for different numbers of training samples (TS ranging from 50 to 400 samples per class) and TWS ranging from 9 to 21. In this figure, the colored shaded areas around the lines stand for the standard deviation value. As is observed, the tensor-based model presents the best accuracy, while retaining its performance for a small number of training samples. In particular, for TWS=9, the accuracy of the state-of-the-art CNN varies from 45% to 65% depending on the TS value, while the accuracy of the proposed tensor model ranges from 65% to 78%. It should be mentioned that, as the number of training samples (TS) increases, the accuracy of the CNN becomes similar to that of the tensor model. However, the CNN model presents much higher standard deviation across the repetitions of the experiments, implying much lower robustness compared to the proposed tensor model. The bottom row of Fig. 3 presents the accuracy of the proposed tensor model and the compared CNN across the different defect classes for TWS values of 9, 15 and 21 respectively. As is observed, the proposed tensor model outperforms the CNN approach over the different classes.

In Fig. 2, a visual representation of both models’ predictions is shown, along with the original image and the corresponding ground truth annotation. The results are depicted for different numbers of training samples (TS) and window sizes (TWS). Even the worst case of Rank-R FNN (Fig. 2.e), which uses a small number of training samples, is more accurate than the best CNN approach (Fig. 2.d). This reveals the robustness of the proposed tensor scheme for a small number of training samples.

4 Conclusion

In this work, we introduced the Rank-R tensor-based learning model for the detection of different defect types on CH monuments using hyperspectral images. This model is compared against a state-of-the-art CNN. According to the results, the proposed method achieves a higher accuracy score, even for a low number of training samples, while its standard deviation is lower than that of the CNN. In general, the proposed Rank-R FNN achieves an accuracy score about 20% higher than that of the CNN approach.


  • [1] A. Alexopoulou, A. A. Kaminari, and A. Moutsatsou (2019) Multispectral and hyperspectral studies on Greek monuments, archaeological objects and paintings on different substrates. Achievements and limitations. In Transdisciplinary Multispectral Modeling and Cooperation for the Preservation of Cultural Heritage, A. Moropoulou, M. Korres, A. Georgopoulos, C. Spyrakos, and C. Mouzakis (Eds.), Cham, pp. 443–461.
  • [2] H. Deborah, N. Richard, and J. Y. Hardeberg (2015) Hyperspectral crack detection in paintings. In 2015 Colour and Visual Computing Symposium (CVCS), pp. 1–6.
  • [3] L. D. Fiske, A. K. Katsaggelos, M. C. G. Aalders, M. Alfeld, M. Walton, and O. Cossairt (2021) A data fusion method for the delayering of x-ray fluorescence images of painted works of art. In 2021 IEEE International Conference on Image Processing (ICIP), pp. 3458–3462.
  • [4] A. S. Garea, Á. Ordóñez, D. B. Heras, and F. Argüello (2016) HypeRvieW: an open source desktop application for hyperspectral remote-sensing data processing. International Journal of Remote Sensing 37 (23), pp. 5533–5550.
  • [5] (2021) ICOMOS world heritage in danger. UNESCO Report.
  • [6] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [7] T. G. Kolda and B. W. Bader (2009) Tensor decompositions and applications. SIAM Review 51 (3), pp. 455–500.
  • [8] P. Kolokoussis, M. Skamantzari, S. Tapinaki, V. Karathanassi, and A. Georgopoulos (2021) 3D and hyperspectral data integration for assessing material degradation in medieval masonry heritage buildings. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 43, pp. 583–590.
  • [9] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444.
  • [10] K. Makantasis, A. D. Doulamis, N. D. Doulamis, and A. Nikitakis (2018) Tensor-based classification models for hyperspectral data analysis. IEEE Transactions on Geoscience and Remote Sensing 56 (12), pp. 6884–6898.
  • [11] K. Makantasis, A. Doulamis, N. Doulamis, and A. Voulodimos (2019) Common mode patterns for supervised tensor subspace learning. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2927–2931.
  • [12] K. Makantasis, A. Georgogiannis, A. Voulodimos, I. Georgoulas, A. Doulamis, and N. Doulamis (2021) Rank-R FNN: a tensor-based learning model for high-order data classification. IEEE Access 9, pp. 58609–58620.
  • [13] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 4959–4962.
  • [14] K. Makantasis, A. Voulodimos, A. Doulamis, N. Bakalos, and N. Doulamis (2021) Space-time domain tensor neural networks: an application on human pose classification. In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4688–4695.
  • [15] K. Papachristou, N. Dimitriou, A. Drosou, G. Karagiannis, and D. Tzovaras (2018) Realistic texture reconstruction incorporating spectrophotometric color correction. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 415–419.
  • [16] A. Polak, T. Kelman, P. Murray, S. Marshall, D. J. Stothard, N. Eastaugh, and F. Eastaugh (2017) Hyperspectral imaging combined with data classification techniques as an aid for artwork authentication. Journal of Cultural Heritage 26, pp. 1–11.
  • [17] I. Rallis, N. Bakalos, N. Doulamis, A. Voulodimos, A. Doulamis, and E. Protopapadakis (2019) Learning choreographic primitives through a bayesian optimized bi-directional lstm model. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1940–1944.