1 Introduction
Cultural Heritage (CH) assets suffer from man-made hazards and natural disasters [15, 17], as UNESCO reports in World Heritage in Danger [5]. Therefore, CH entities require regular inspection for defects, material deterioration and structural deformation. Currently, inspection is performed on-site and manually by experts, which is a tedious, time-consuming and costly task. Motivated by the advances in hyperspectral sensing and deep learning, in this paper we propose an automatic non-invasive approach for the inspection of CH assets.
A typical deep learning model includes a large number of tunable parameters, which in turn requires a large number of training samples. However, collecting labeled data is an expensive and tedious process, especially for CH applications. This is due to the fact that (i) complementary approaches (invasive and non-invasive) should be carried out to verify the type and the degree of material deterioration and/or structural deformation, and (ii) the captured hyperspectral image data should be annotated by expert engineers.
Recently, tensor-based learning has emerged as a powerful alternative for classifying hyperspectral image data [10, 12, 14, 11]. These works (i) apply a canonical decomposition of the model parameters to reduce the number of trainable weights, and thus the number of samples required for efficient training, and (ii) retain the raw (tensor) form of the data to fully exploit the structural information encoded across the available data modes (dimensions). So far, tensor learning has mostly been applied to remote sensing agricultural scenarios using satellite data. The scale of remote sensing data is of the order of kilometers, which makes it unsuitable for detecting defects in CH assets. These works are motivated by the property of hyperspectral imaging to act as a material detector, since different materials absorb or reflect light differently. However, CH asset inspection requires high discrimination sensitivity, since the model must distinguish small deformations, defects and cracks that could be catastrophic for the stability of the structure. Additionally, the scale of the analysis is of the order of a few centimeters, imposing challenges in terms of both performance and computational demands.
In this study, we introduce a tensor-based learning system capable of detecting and classifying defects on CH monuments. The proposed system exploits the Rank-R FNN architecture [12] to overcome the main drawback of deep learning models, namely the need for large amounts of training data, without compromising detection accuracy. It detects and classifies defects using hyperspectral images captured with ground sensors, giving civil engineers the opportunity to assess the impact of defects on the structure of CH monuments non-invasively and in real time.
To sum up, the contribution of this paper is threefold. First, it introduces a high-order tensor-based learning system using ground hyperspectral data for defect detection and classification in CH structures. Second, it uses far fewer training samples than conventional deep learning approaches, significantly reducing annotation effort and computational demands. Third, it introduces a new real-world dataset captured at UNESCO-protected areas for evaluating the proposed scheme.
1.1 Related Works
Defect detection with hyperspectral data: The authors in [1] show that spectral imaging techniques offer several possibilities for exploiting the optical properties of materials by taking into account specific characteristics of the spectral bands. The study in [2] addresses the problem of crack detection on paintings. To this end, distance-based spectral mathematical morphology is used, offering a vector and full-band processing approach, and several top-hat transformations are utilized to assist the task. This work also confirms the capacity of hyperspectral imaging to offer additional useful information. However, none of the aforementioned approaches is combined with deep or tensor-based learning schemes to improve classification performance on difficult real-world CH cases, and none discriminates both the defects and the defect types, as we do in the present study.
Inspection of CH assets: The study in [3] describes a data fusion pipeline for the delayering of X-ray fluorescence (XRF) images. At first, visible hyperspectral reflectance data (RIS) is clustered into pigment mixtures. Then, a synthetic surface XRF image is formed by calculating the mean XRF response across all clusters. Finally, the surface and subsurface correlated features are identified by subtracting the synthetic surface XRF from the full image. In [16], the combination of hyperspectral imaging with advanced signal processing techniques is proposed as a tool to assist the artwork authentication procedure. According to the authors, this is achieved by applying classification techniques to coloured pigments. The Support Vector Machine (SVM) algorithm is used in combination with the "one-against-one" technique in order to handle the multi-class problem. In [8], the authors exploit 3D textured models in combination with hyperspectral imagery to define specific areas of degradation. Although these works utilize state-of-the-art deep learning techniques, they require a large number of training samples. In stark contrast, the proposed tensor-based learning model requires a noticeably smaller number of training samples.
2 Methodology
2.1 Problem Formulation
Our problem can be seen as a multi-class classification task. Let us denote as $C$ the number of classes. Then, we can detect $C-1$ different types of defects plus one class of pixels that correspond to no defect.
Let us denote as $X_i$ the information describing the $i$-th pixel of a hyperspectral image, and as $t_i \in \{0,1\}^{C}$, such that $\sum_{j=1}^{C} t_{i,j} = 1$, the ground truth label vector for the same pixel. Then given a collection
$$S = \{(X_i, t_i)\}_{i=1}^{N} \tag{1}$$
the problem of automatic defect detection boils down to estimating a function $f_{\theta}$, such that
$$\theta = \arg\min_{\theta'} \sum_{i=1}^{N} \ell\big(f_{\theta'}(X_i), t_i\big). \tag{2}$$
In Eq. (2), $\ell$ stands for the cross-entropy loss and $\theta$ is the set of parameters that determine the form of $f_{\theta}$.
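As a minimal sketch of the objective in Eq. (2), the snippet below evaluates the summed cross-entropy loss for two pixels with one-hot labels. The function name `cross_entropy`, the clipping constant `eps` and the toy probabilities are illustrative assumptions and are not taken from the original implementation.

```python
import math


def cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy between a predicted class distribution and a one-hot label."""
    # Probabilities are clipped at eps so that log(0) is never evaluated.
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


# Toy example with 4 classes (illustrative values, not results from the paper).
predictions = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
labels = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]

total_loss = sum(cross_entropy(p, t) for p, t in zip(predictions, labels))
print(f"summed cross-entropy: {total_loss:.4f}")
```

Because the labels are one-hot, only the probability assigned to the true class contributes to the loss of each pixel.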
2.2 Rank-R Tensor-Based Learning
To efficiently detect defects, each $X_i$ should carry both spectral and spatial information describing the $i$-th pixel. In other words, each $X_i$ should carry information about the spectral response of pixel $i$ and information about the neighbors of the $i$-th pixel. Towards this direction, with $X_i \in \mathbb{R}^{s \times s \times b}$ we represent a square patch of dimensions $s \times s \times b$ of the hyperspectral image centered at the $i$-th pixel. Parameter $s$ stands for the height and width of the patch, and $b$ for the number of spectral bands. This way, each $X_i$ is a 3rd-order tensor encoding both the spatial and spectral information of pixel $i$.
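The patch construction described above can be sketched with NumPy as follows. The function name `extract_patch`, the reflect-padding at the image borders and the toy image size are assumptions made for illustration; the text does not state how border pixels are treated.

```python
import numpy as np


def extract_patch(image, row, col, s):
    """Return the s x s x b patch centered at pixel (row, col).

    `image` has shape (height, width, bands) and `s` is assumed to be odd.
    Borders are handled by reflect-padding (an assumption of this sketch).
    """
    half = s // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # After padding, pixel (row, col) sits at (row + half, col + half),
    # so the slice below is centered on it.
    return padded[row:row + s, col:col + s, :]


# Toy image: 20 x 30 pixels with 42 spectral bands (random values).
rng = np.random.default_rng(0)
image = rng.random((20, 30, 42))
patch = extract_patch(image, row=5, col=7, s=9)
print(patch.shape)  # (9, 9, 42)
```

Padding the whole image on every call keeps the sketch short; a practical pipeline would pad once and then slice all patches from the padded array.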
To address the problem formulated in the previous section, we represent the function $f_{\theta}$ by a tensor-based machine learning model utilizing high-order canonical decomposition. We call this model Rank-$R$ Feed-Forward Tensor-based Neural Network (Rank-$R$ FNN), since it exploits tensor operators in a feed-forward structure. In particular, the Rank-$R$ FNN is a neural network with one hidden layer consisting of $Q$ hidden neurons. The Rank-$R$ FNN weights connecting the input to the hidden layer are tensors satisfying the Rank-$R$ Canonical-Polyadic decomposition [7]:
$$w^{(q)} = \sum_{r=1}^{R} w^{(q)}_{1,r} \circ w^{(q)}_{2,r} \circ w^{(q)}_{3,r} \tag{3}$$
for $q = 1, \dots, Q$, with $w^{(q)}_{1,r}, w^{(q)}_{2,r} \in \mathbb{R}^{s}$ and $w^{(q)}_{3,r} \in \mathbb{R}^{b}$, $r = 1, \dots, R$. Superscript $(q)$ denotes that these weights connect the input to the $q$-th neuron of the hidden layer, and the "$\circ$" operator stands for the vector outer product. The output of the Rank-$R$ FNN for the $k$-th class is
$$y_k = \sigma\big(\langle v^{(k)}, u \rangle\big) \tag{4}$$
where $v^{(k)}$ collects the weights between the hidden layer and the $k$-th output neuron, $\sigma$ denotes the softmax activation function, and $u = [u_1, \dots, u_Q]^{T}$ with
$$u_q = g\big(\langle w^{(q)}, X_i \rangle\big) \tag{5}$$
for $u$ to be the output of the hidden layer activated by function $g$. In this study, we use and compare the Rank-$R$ FNN and the CNN in [13], since both models exploit the spatio-spectral information of pixels and can be used for pixel-wise hyperspectral image classification tasks. Given a collection of training data in the form of relation (1), we estimate the set of parameters of the employed models using the backpropagation algorithm [9] with the Adam gradient-based optimizer [6]. Fig. 1 presents our overall approach.
3 Experimental Results
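For reference in the experiments that follow, the Rank-R FNN forward pass of Section 2.2 (Eqs. (3)-(5)) can be sketched with NumPy as below. This is only an illustrative sketch. The parameter layout (`w1`, `w2`, `w3`, `v`), the `tanh` hidden activation, the random initialisation and the values chosen for the number of hidden neurons `Q` and the rank `R` are assumptions, since the text does not specify them.

```python
import numpy as np


def init_params(s, b, Q, R, C, rng):
    """Randomly initialise CP factors and output weights (illustrative only)."""
    return {
        "w1": rng.normal(scale=0.1, size=(Q, R, s)),  # factors along patch height
        "w2": rng.normal(scale=0.1, size=(Q, R, s)),  # factors along patch width
        "w3": rng.normal(scale=0.1, size=(Q, R, b)),  # factors along spectral bands
        "v": rng.normal(scale=0.1, size=(C, Q)),      # hidden-to-output weights
    }


def forward(X, params):
    """Class probabilities for one patch X of shape (s, s, b)."""
    # Inner product between X and each rank-R weight tensor, computed from the
    # CP factors without materialising the full s x s x b tensor.
    pre = np.einsum("ijk,qri,qrj,qrk->q", X, params["w1"], params["w2"], params["w3"])
    u = np.tanh(pre)                 # hidden activation g (tanh is assumed)
    logits = params["v"] @ u         # one score per class
    logits = logits - logits.max()   # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()           # softmax over the C classes


rng = np.random.default_rng(0)
s, b, Q, R, C = 9, 42, 8, 2, 4       # Q and R are hypothetical choices
params = init_params(s, b, Q, R, C, rng)
X = rng.random((s, s, b))
probs = forward(X, params)

# Input-to-hidden trainable weights: CP factors versus full weight tensors.
cp_weights = Q * R * (2 * s + b)
full_weights = Q * s * s * b
print(probs.shape, cp_weights, full_weights)
```

With these hypothetical settings the CP-factorised input layer has 960 trainable weights, against 27216 for unconstrained weight tensors of the same shape, which illustrates why the decomposition lowers the number of samples needed for training.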
3.1 Dataset description
We use a new dataset consisting of hyperspectral images depicting part of the ancient walls of the Saint Nicolas fortress, located in the UNESCO Heritage Medieval City of Rhodes, Greece. The 6 images of the dataset were collected using the HyperView sensing platform [4] by 3Done, which combines the information from one Visual (VIS) snapshot camera and one Near Infrared (NIR) snapshot camera. Each hyperspectral image consists of 1016 x 1820 pixels and 42 spectral bands. Moreover, each image is accompanied by a pixel-based annotated ground truth image (see Fig. 2.b), produced by CH experts. Four different classes are considered, depicted in Fig. 2 with different colors: class 0 represents salt defects (yellow), class 1 non-significantly defected areas (light blue), class 2 minor deterioration (orange) and class 3 major deterioration (red). Fig. 2.g presents the spectral response of representative pixels from each class.
3.2 Preprocessing pipeline
The hyperspectral images are partitioned into tensor objects. The sampling unit selects random samples from the tensor objects to create the training and the test sets. A permutation is also applied to the training set to reduce bias effects. We normalize each hyperspectral image in a band-wise manner using min-max normalization to restrict the pixel responses at each band to $[0, 1]$. After normalization, we split each image into patches of dimension $s \times s \times b$ (tensorization step in Fig. 1). During our experiments we set parameter $s$ equal to 9, 15 and 21 to investigate the effect of patch height and width on the defect detection accuracy. The generated patches of all the available hyperspectral images are aggregated into a single set. From this set, we randomly select a number of samples per class for training the models (the proposed tensor-based model and the compared one), while the rest of the patches are used as a test set for evaluating the defect detection performance. We create the training set by randomly selecting 50, 100, 200 and 400 samples per class to investigate the impact of training set size on the performance of the learning models. We repeat this hold-out validation scheme 10 times and report the average model classification accuracy and 95% confidence intervals.
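The normalization and sampling steps described above might look as follows in NumPy. This is a hedged sketch: the function names `minmax_per_band` and `split_per_class`, the guard constant `eps` and the random toy data are assumptions and do not come from the original pipeline.

```python
import numpy as np


def minmax_per_band(image, eps=1e-12):
    """Scale every spectral band of an (H, W, B) image to the range [0, 1]."""
    lo = image.min(axis=(0, 1), keepdims=True)
    hi = image.max(axis=(0, 1), keepdims=True)
    # eps only guards against division by zero for constant bands.
    return (image - lo) / np.maximum(hi - lo, eps)


def split_per_class(labels, n_per_class, rng):
    """Pick n_per_class random samples per class for training; the rest is the test set."""
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.append(rng.choice(idx, size=n_per_class, replace=False))
    train = rng.permutation(np.concatenate(train))  # shuffle the training set
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test


rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16, 42))      # toy image with 42 bands
normalized = minmax_per_band(image)

labels = rng.integers(0, 4, size=1000)     # toy patch labels for 4 classes
train_idx, test_idx = split_per_class(labels, n_per_class=50, rng=rng)
print(normalized.min(), normalized.max(), train_idx.size, test_idx.size)
```

Sampling a fixed number of patches per class yields a balanced training set, while the test set keeps the class proportions of the remaining data.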
3.3 Performance Evaluation
In this section, we evaluate the defect detection performance in terms of model classification accuracy and investigate the impact of patch size (parameter $s$) and training set size (50, 100, 200 and 400 samples per class). Two essential hyperparameters are examined: Tensor Window Size (TWS) and Tensor Samples (TS). Twelve experiments are designed and performed: three distinct values of TWS (9, 15 and 21) are used and, for each value, four increasing values of TS are selected (50, 100, 200 and 400). The models are trained for 100 epochs and the results are calculated on the test set. Each experiment is repeated ten times to ensure the validity of the results.
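The experimental grid and the reporting of a mean accuracy with a 95% confidence interval over ten repetitions can be sketched as follows. The use of a Student's t interval, the helper name `mean_and_ci95` and the accuracy values are assumptions for illustration; the listed accuracies are placeholders and not results from the paper.

```python
import itertools
import math
import statistics

TWS_VALUES = (9, 15, 21)          # patch height and width
TS_VALUES = (50, 100, 200, 400)   # training samples per class
N_REPEATS = 10


def mean_and_ci95(values, t_crit=2.262):
    """Mean and half-width of a 95% confidence interval.

    The default t_crit is the two-sided 95% Student's t critical value for
    9 degrees of freedom, so it is only valid for 10 repetitions.
    """
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# All (TWS, TS) combinations examined in the evaluation.
experiments = list(itertools.product(TWS_VALUES, TS_VALUES))
print(len(experiments))  # 12

# Placeholder accuracies for one configuration (NOT results from the paper).
accuracies = [0.70, 0.72, 0.68, 0.71, 0.69, 0.73, 0.70, 0.71, 0.69, 0.72]
mean, half_width = mean_and_ci95(accuracies)
print(f"accuracy: {mean:.3f} +/- {half_width:.3f}")
```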
In Fig. 2(a)-2(c) we present the overall accuracy of the proposed tensor-based model compared with the state-of-the-art CNN-based model used in the literature for hyperspectral image classification [13]. The accuracy is presented for different numbers of training samples (TS ranging from 50 to 400 samples per class) and for TWS ranging from 9 to 21. In this figure, the colored shaded areas around the lines stand for the standard deviation. As observed, the tensor-based model achieves the best accuracy, while retaining its performance for small numbers of training samples. In particular, for TWS=9, the accuracy of the CNN varies from 45% to 65% depending on the TS value, while the accuracy of the proposed tensor model ranges from 65% to 78%. As the number of training samples (TS) increases, the accuracy of the CNN approaches that of the tensor model. However, the CNN presents a much higher standard deviation across the repetitions of the experiments, implying much lower robustness than the proposed tensor model. Fig. 2(d)-2(f) presents the accuracy of the proposed tensor model and the compared CNN across the different defect classes for TWS values of 9, 15 and 21, respectively. As observed, the proposed tensor model outperforms the CNN approach over the different classes.
In Fig. 2, a visual representation of the predictions of both models is shown, along with the original image and the corresponding ground truth annotation. The results are depicted for different numbers of training samples (TS) and window sizes (TWS). Even the worst case of the Rank-R FNN (Fig. 2.e), which uses a small number of training samples, is more accurate than the best CNN result (Fig. 2.d). This reveals the robustness of the proposed tensor scheme for a small number of training samples.
4 Conclusion
In this work, we introduced the Rank-R tensor-based learning model for the detection of different defect types on CH monuments using hyperspectral images. The model is compared against a state-of-the-art CNN. According to the results, the proposed method achieves a higher accuracy score, even for a small number of training samples, while its standard deviation is lower than that of the CNN. In general, the proposed Rank-R FNN improves the accuracy score by 20% over the CNN approach.
References
 [1] (2019) Multispectral and hyperspectral studies on Greek monuments, archaeological objects and paintings on different substrates. Achievements and limitations. In Transdisciplinary Multispectral Modeling and Cooperation for the Preservation of Cultural Heritage, A. Moropoulou, M. Korres, A. Georgopoulos, C. Spyrakos, and C. Mouzakis (Eds.), Cham, pp. 443–461.
 [2] (2015) Hyperspectral crack detection in paintings. In 2015 Colour and Visual Computing Symposium (CVCS), pp. 1–6.
 [3] (2021) A data fusion method for the delayering of X-ray fluorescence images of painted works of art. In 2021 IEEE International Conference on Image Processing (ICIP), pp. 3458–3462.
 [4] (2016) HypeRvieW: an open source desktop application for hyperspectral remote-sensing data processing. International Journal of Remote Sensing 37 (23), pp. 5533–5550.
 [5] (2021) ICOMOS world heritage in danger. UNESCO Report.
 [6] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
 [7] (2009) Tensor decompositions and applications. SIAM Review 51 (3), pp. 455–500.
 [8] (2021) 3D and hyperspectral data integration for assessing material degradation in medieval masonry heritage buildings. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 43, pp. 583–590.
 [9] (2015) Deep learning. Nature 521 (7553), pp. 436–444.
 [10] (2018) Tensor-based classification models for hyperspectral data analysis. IEEE Transactions on Geoscience and Remote Sensing 56 (12), pp. 6884–6898.
 [11] (2019) Common mode patterns for supervised tensor subspace learning. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2927–2931.
 [12] (2021) Rank-R FNN: a tensor-based learning model for high-order data classification. IEEE Access 9, pp. 58609–58620.
 [13] (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 4959–4962.
 [14] (2021) Space-time domain tensor neural networks: an application on human pose classification. In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4688–4695.
 [15] (2018) Realistic texture reconstruction incorporating spectrophotometric color correction. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 415–419.
 [16] (2017) Hyperspectral imaging combined with data classification techniques as an aid for artwork authentication. Journal of Cultural Heritage 26, pp. 1–11.
 [17] (2019) Learning choreographic primitives through a Bayesian optimized bidirectional LSTM model. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 1940–1944.