With the rapid development of sensor data acquisition, signal processing, data storage and analysis, Structural Health Monitoring (SHM) systems have broad application prospects for machinery, high-rise buildings, long-span bridges, etc. Many complex structures, such as the Tsing Ma Bridge in Hong Kong, China, the Z24 Bridge in Switzerland, and the Caijia Jialing River Bridge in Chongqing [zhou2017health], have deployed SHM systems. A large number of monitoring sensors, such as accelerometers, energy meters, temperature and humidity gauges, and strain gauges, are installed at key positions of structures in order to detect and analyze structural defects. In SHM, structural damage detection based on acceleration vibration signals is one of the most important tasks, which aims to detect damage based on changes in important structural parameters, such as natural frequency, stiffness, damping ratio and modal vibration mode [zhouhong2014damage].
Recent advances in deep learning [chen2017gpu, lecun2015deep]
can model complex nonlinear relationships and have demonstrated superior performance across a wide range of domains, such as computer vision (CV), natural language processing (NLP) [dau2020recommendation], and stochastic configuration networks (SCNs) [wang2017stochasticFA]. Stochastic configuration networks [wang2017stochastic] have been widely used for large-scale data analytics, as they can deal with heterogeneous features through a fast decorrelated neuro-ensemble. Convolutional neural network (CNN) based models [chen2018exploiting, han2019convolutional] have been successfully utilized to extract spatial features of images, which are usually 2D data, and have achieved promising results in CV tasks such as image classification [shi20173d], image segmentation [SeSeNet], and object detection. CNNs can extract features because of two key properties: spatially shared weights and spatial pooling. In contrast, recurrent neural network (RNN) based methods can generate and address memories of arbitrary-length sequences of input patterns [Zhang2017A]
. In principle, an RNN maps the entire history of previous inputs to target vectors, keeping a memory of previous inputs in the network's internal state. RNNs are usually utilized for supervised learning tasks with sequential input data and target outputs, such as sentiment classification [chen2019gatedsen]. GRU is a simple yet powerful variant of RNNs for sequence modeling tasks due to its gated mechanism [cho2014properties, chung2014empirical, fanta2020sitgru]. GRUs are carefully designed to memorize historical information and fuse current states, new inputs and historical information together in a recurrently gated way.
The successful applications of deep learning have inspired several attempts to address the challenge of structural damage identification. These works can be mainly classified into three categories, multilayer perceptron (MLP), CNN, and RNN based methods, as discussed in the following. (i) Guo et al. used a sparse coding neural network as the feature extraction model and an MLP as the classifier for structural damage identification. (ii) Abdeljaber et al. [abdeljaber2017real] proposed a structural damage feature extraction and recognition model based on CNNs. Bao et al. [bao2019computer] proposed a CNN based method which first converted the time-series data collected by SHM into images, and then utilized a CNN to learn the features of the converted images. (iii) Zhao et al. [zhao2017machine] proposed an RNN based feature extraction method for collected time-series data to identify machine conditions.
These pioneering attempts have shown superior performance for structural damage detection compared with traditional prediction methods. However, none of these works considers the spatial relations among different sensors and the temporal sequential relations simultaneously. In addition, RNN-based approaches are slow and have difficulty exploring data with very long-term sequential dependencies due to vanishing gradients during backpropagation, while CNN based approaches have high memory consumption and do not produce smooth and interpretable latent representations that can be reused for downstream tasks.
In this work, we propose a Hierarchical CNN and GRU framework (HCG) that jointly harnesses the capabilities of CNNs and gated recurrent units (GRUs) to capture the complex nonlinear relations in both space and time for structural damage identification. HCG consists of two levels of models that capture hierarchical levels of features. The high-level model uses GRUs to capture the evolving long-term sequential dependencies of the sensor data, while the low-level model, implemented with CNNs, captures the short-term sequential dependencies of the data and the interactions among sensors. Experimental results show that HCG has the following advantages over existing approaches:
HCG has a significant performance improvement over existing deep learning models on both the IASC-ASCE structural health monitoring benchmark and the scale-model three-span continuous rigid frame bridge structure datasets;
Compared with RNN-based approaches, HCG is 1.5 times faster in terms of training time;
Compared with CNN-based approaches, HCG requires roughly 10% of the data memory and allows for easy latent feature extraction.
2 Related Work
In this section, we review the related developments in the structural damage detection domain. For many years, research on structural damage detection methods based on high-frequency vibration signals has received extensive attention from academic and industrial communities. These works can be mainly divided into model-driven and data-driven methods [diez2016clustering].
2.1 Model-driven Method
Previous model-driven methods utilize mathematical models and physical theorems to discretize the structures. However, these methods have limitations in the establishment and modification of complex structural models and in the simulation of real excitation conditions [neves2017structural]. Moreover, for large-scale structures, the natural frequencies can vary considerably across seasons. Since the structure is always in a state of unknown excitation, some methods that have achieved good results in the field of mechanical damage detection, such as the wavelet transform and the Hilbert-Huang transform (HHT), are also subject to certain restrictions. Model-driven methods cannot update the models with online measured data and therefore cannot be applied flexibly. Model modification is mostly a mathematical process whose physical interpretation is not obvious, which demands manual intervention and judgment. Hence, it is difficult to quantitatively identify the states of the structures [prvsic2017nature]
. In recent years, with the rapid development of intelligent algorithms such as statistical machine learning and deep neural networks, data-driven structural damage identification and state analysis methods have become a research hotspot in SHM, which are used to extract damage sensitive indicators and perform pattern recognition directly from structural sensing data.
2.2 Shallow Models of Data-driven Methods
In the data-driven domain, shallow learning models, such as support vector machines (SVMs), the k-nearest neighbor method and random forests, have been studied for structural damage detection. For example, Alamdari et al. [alamdari2019multi] proposed a multi-data fusion method for structural damage detection, which combined the time-series response data sets of multiple sensors, such as acceleration data, strain gauge data, and environmental data, through data tensors and data extraction; an SVM was then used to perform damage classification. Chen et al. [chen2014semi] proposed a classification framework based on semi-supervised learning for structural damage classification. Carden et al. proposed a statistical classification method based on structural time-series responses. Tibaduiza et al. [tibaduiza2013damage] proposed a damage detection method using principal component analysis (PCA) and self-organizing maps, where PCA was used to construct the initial baseline model based on the data collected in different test stages. All these methods are classified as traditional shallow machine learning models.
2.3 Deep Models of Data-driven Methods
Deep learning aims to imitate the way the human brain interprets data through abstract essential features, establishing a deep network structure similar to the analytical learning mechanism of the human brain and characterizing multi-layer abstract features of the data through a layer-by-layer learning mechanism. This way of extracting features makes it well suited to solving practical problems. Therefore, exploring deep learning for structural damage detection has become a hot research topic [Pathirage@2018Structural]. Guo et al. used a sparse coding neural network as the feature extraction model and an MLP as the classifier for structural damage identification. Abdeljaber et al. [abdeljaber2017real] proposed a structural damage feature extraction and recognition model based on one-dimensional CNNs. Yu et al. [yu2019novel] proposed a structural damage identification and localization method based on a deep CNN model. Abdeljaber et al. [abdeljaber20181] proposed an improved CNN classification model and carried out experimental verification on the public IASC-ASCE Benchmark dataset, showing that the method requires only a small amount of training data. Bao et al. [bao2019computer] proposed an anomaly detection method based on computer vision and deep learning, which converted the time-series data collected by SHM into images and then used these images as the training set for deep CNNs. However, the existing damage detection methods based on deep neural networks have not yet considered damage feature extraction from the two dimensions of acceleration time series and spatial correlations among the data.
3 Problem Definition and Analysis
In this section, we illustrate the problem of structural damage detection and its modeling in detail, followed by an explicit analysis of the problem.
3.1 Problem Description
The states of structures can be altered by normal aging due to usage, by the environment, by accidental events, etc. However, the states must remain in the range specified by the design. Structural damage detection aims to give a diagnosis of the “state” of the constituent materials of a structure and of its constituent parts over the life of the structure. The sensors on the structure record its historical states, which can be used for prognosis, such as the evolution of damage, the residual life, and so on.
3.2 Problem Modeling
In this work, we aim to address the problem of structural damage detection based on historical sensor signals. The modeling of the problem is illustrated in Fig. 1. Sensors are usually deployed on a bridge or another structure to collect data about its states, and each sensor generates raw time-series data. The time-series data obtained by all the sensors are concatenated to form the input matrix of our model, as shown in Fig. 1(c).
Our goal is to estimate the structural damage based on all the sensory data. The structural damage can be divided into several categories, such as Healthy, Damage case 1, Damage case 2, and Damage case 3. Hence, we formulate our problem as a classification problem. Formally, given $N$ sensors in total, the $i$-th sensor records a time series $s^i = (s_1^i, s_2^i, \dots, s_T^i)$, where $s_t^i$ denotes the sensory value recorded at timestamp $t$, and $T$ denotes the length of the sequence. We suppose that all the sensors are synchronized. Then, we concatenate all the time-series data to form the input matrix $X = [s^1; s^2; \dots; s^N] \in \mathbb{R}^{T \times N}$, where $[\cdot\,;\cdot]$ is the concatenation operation, and the row $x_t \in \mathbb{R}^{N}$ represents all the sensor values at timestamp $t$. Then, the output $\hat{y} = f(X)$, which represents the estimated damage category, can be formulated with $f$ being the deep neural network we need to design.
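As a minimal sketch of this formulation (variable names and the toy data are illustrative, not the authors' code), the input matrix can be assembled in NumPy:

```python
import numpy as np

def build_input_matrix(sensor_series):
    """Stack N synchronized sensor time series of length T into a T x N input matrix."""
    # Each element of sensor_series is a 1-D array of length T from one sensor.
    X = np.stack(sensor_series, axis=1)  # shape (T, N)
    return X

# Hypothetical example: 3 sensors, 5 synchronized timestamps each.
series = [np.arange(5, dtype=float) * (i + 1) for i in range(3)]
X = build_input_matrix(series)
```

Each row of `X` then corresponds to $x_t$, the readings of all sensors at one timestamp.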
3.3 Problem Analysis
In this section, we first demonstrate that the spatial and temporal dependencies need to be considered simultaneously for structural damage detection. Take the SHM of bridges as an example: sensors are installed at different positions on a bridge. Generally, different positions in the structure may bear different degrees of force, and adjacent positions usually withstand similar forces. Therefore, the data generated by adjacent sensors often exhibit similar patterns and have dependencies on each other. Meanwhile, the signals collected by a sensor at one time will affect that sensor's signals at the following time intervals. We can observe that the data are affected by spatial and temporal factors simultaneously, which inspires us to design an appropriate model that can learn and extract the spatio-temporal features jointly. However, by applying either a single CNN or a single GRU model, the spatio-temporal features cannot be extracted jointly for structural damage detection.
Moreover, due to the notorious gradient vanishing problem, GRU and LSTM usually fail to capture very long-term correlations in practice. In this work, we propose a hierarchical CNN and GRU framework to address these issues, where the low-level CNN based model is utilized to extract the spatial and short-term temporal dependencies, while the high-level GRU model is leveraged to learn the long-term temporal dependencies.
4 Hierarchical CNN and GRU framework (HCG)
4.1 Architecture of HCG
The architecture of HCG is shown in Fig. 2. Our hierarchical model has a low-level convolutional component that learns from interactions among sensors and the short-term temporal dependencies, and a high-level recurrent component that handles information across long-term temporal dependencies.
The inputs of HCG are the time-series data of multiple sensors, while the outputs are the predicted states of the structures. First, the raw sensory time-series data are organized into the input matrix as discussed in Section 3.2 and then fed into our proposed convolutional component. Second, the convolutional component learns the spatial and short-term temporal features. Third, the outputs of the convolutional component are fed into the recurrent component to learn the long-term temporal dependencies. Finally, a softmax layer is connected to the latent feature vectors generated by the recurrent component to predict the damage state of the structure.
4.2 Convolutional Component
As discussed in Section 3.3, it is crucial to model both the spatial and temporal dependencies in the sensory signals. Convolutional neural network (CNN) is powerful in capturing the spatial correlations and repeating patterns, and has been widely used in image classification, object tracking and video processing [simonyan2014very], etc.
A typical convolution layer contains several convolutional filters. Given the input matrix defined in Section 3.2, we use these filters to learn the spatial correlations between sensors and the short-term temporal patterns. We let the width of the kernels be the same as the number of sensors, so that the kernels are able to capture the spatial correlations among all the sensors. The length of the kernels is kept relatively short to capture the short-term temporal patterns. The convolution operation then slides along the time dimension only, as illustrated in Fig. 3. The convolutional component finally outputs a sequence in which each element is a latent vector representing the patterns captured at that moment. Zero-padding is used to ensure that the lengths of the input and output are the same.
Formally, the convolutional layer can be formulated as:

$$h_t = \sigma\left(W * X_{t:t+k-1} + b\right),$$

where $X_{t:t+k-1}$ denotes the values of all the sensors at times $t, \dots, t+k-1$, $W$ denotes a convolution kernel with size $k \times N$, $b$ is a bias term, the function $\sigma$ is the activation function, and $H = (h_1, h_2, \dots, h_T)$ denotes the output sequence of our convolutional component. Each element $h_t \in \mathbb{R}^{d}$, where $d$ is the number of kernels, denotes the latent representation of the spatial and short-term features at time $t$.
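A hedged NumPy sketch of this full-width convolution follows; the end-of-sequence zero-padding, the kernel layout `(d, k, N)` and the ReLU activation are assumptions for illustration, not the authors' exact implementation:

```python
import numpy as np

def conv_component(X, kernels, bias=None):
    """Full-width temporal convolution: each kernel spans all N sensors
    and a short window of k timestamps; zero-padding keeps output length T."""
    T, N = X.shape
    d, k, kN = kernels.shape  # d kernels, each of size (k, N)
    assert kN == N, "kernel width must equal the number of sensors"
    Xp = np.vstack([X, np.zeros((k - 1, N))])  # zero-pad at the end
    H = np.empty((T, d))
    for t in range(T):
        window = Xp[t:t + k]  # (k, N) short-term window across all sensors
        H[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    if bias is not None:
        H += bias
    return np.maximum(H, 0.0)  # ReLU activation (assumed)
```

The sliding loop makes explicit that the convolution moves only along the time axis while covering every sensor at once.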
4.3 Recurrent Component
Recurrent neural networks (RNNs) have recently demonstrated promising results in many machine learning tasks, especially when the input and/or output is a sequence of variables. GRU is a simple yet powerful variant of RNNs for time-series prediction due to its gated mechanism [chen2019gated]. GRUs are carefully designed to memorize historical information and fuse current states, new inputs and historical information together in a recurrently gated way.
The outputs generated by the convolutional component are fed into the GRU to extract the long-term temporal dependencies. The computing process of the GRU unit at time $t$ is shown in Fig. 4 and formulated in Equ. 4:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),$$
$$\tilde{h}_t = \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1})\right), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $h_t$ is the hidden state of the GRU generated at iteration step $t$ and $h_{t-1}$ is the hidden state from iteration step $t-1$, $x_t$ is the hidden feature generated by the convolutional component, $\tilde{h}_t$ is the candidate hidden state, $r_t$ and $z_t$ are the reset gate and update gate at time $t$, respectively, $W_z, W_r, W_h, U_z, U_r, U_h$ are the learned parameters, and $\odot$ is the element-wise multiplication of tensors.
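The gated update above can be sketched as a single NumPy step (the parameter names in the dictionary `P` are hypothetical, and biases are omitted for brevity):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU step: z_t and r_t gate how much history is kept vs. overwritten."""
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev)               # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev)               # reset gate
    h_tilde = np.tanh(P["Wh"] @ x_t + P["Uh"] @ (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                     # fused new state
```

Iterating `gru_step` over the sequence produced by the convolutional component yields the long-term hidden states.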
By recurrently connecting the GRU cells, our recurrent component can process complex sequence data. Then, we use the hidden state at the last timestamp of the top-layer GRU to predict the damage category. We use a fully connected network and a softmax layer to generate the final output of HCG. Formally, the predicted category is formulated as:

$$o = W_o h_T^{(L)} + b_o, \qquad \hat{y} = \mathrm{softmax}(o),$$

where $o$ is the output of the fully connected layer, $L$ is the number of GRU layers, $C$ denotes the number of categories, $W_o \in \mathbb{R}^{C \times d}$ and $b_o \in \mathbb{R}^{C}$ are trainable parameters, $h_T^{(L)}$ denotes the hidden state of the $L$-th GRU layer at timestamp $T$, and $\hat{y}_c$ is the predicted probability for the $c$-th category.
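A small sketch of this prediction head, with an illustrative 2-dimensional hidden state and 3 damage categories (the weights here are made up for the example):

```python
import numpy as np

def predict_category(h_last, W, b):
    """Fully connected layer plus softmax over the last GRU hidden state."""
    o = W @ h_last + b           # one logit per damage category
    e = np.exp(o - o.max())      # numerically stable softmax
    return e / e.sum()

# Hypothetical toy head: hidden state of size 2, 3 damage categories.
probs = predict_category(np.array([1.0, -1.0]),
                         np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]),
                         np.zeros(3))
```

The index of the largest entry of `probs` is taken as the predicted damage category.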
4.4 Loss Function
In the training process, we adopt the mean squared error (L2-loss) as the objective function; the corresponding optimization objective is formulated as:

$$\min_{\Theta} \; \frac{1}{M} \sum_{m=1}^{M} \left\| y_m - \hat{y}_m \right\|_2^2,$$

where $\Theta$ denotes the parameter set of our model, $M$ is the number of training samples, $y_m$ is the ground truth of the damage state, and $\hat{y}_m$ is the prediction of our model.
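Assuming one-hot ground-truth vectors and predicted probability vectors (an illustrative sketch of the objective, not the training loop itself):

```python
import numpy as np

def l2_loss(Y_true, Y_pred):
    """Mean squared error between one-hot ground truth and predicted
    probability vectors, averaged over the M training samples (rows)."""
    M = Y_true.shape[0]
    return np.sum((Y_true - Y_pred) ** 2) / M
```

In practice an optimizer such as Adam (used in Section 5.2.3) would minimize this quantity over the model parameters.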
5 Performance Evaluation and Discussions
In this section, we evaluate our proposed HCG on two datasets including a Three-span Continuous Rigid Frame Bridge dataset (TCRF Bridge Dataset) and the phase I IASC-ASCE Structural Health Monitoring Benchmark dataset (IASC-ASCE Benchmark Dataset). First, the details of the datasets will be presented. Then, the experimental settings and implementation details will be illustrated. After that, the experimental results compared with other baselines will be presented and discussed.
5.1 Datasets
The two popular datasets in the domain are described in detail as follows.
5.1.1 TCRF Bridge Dataset
The real Three-span Continuous Rigid Frame Bridge (TCRF Bridge) structure is shown in Fig. 5; its main spans are 98m + 180m + 98m and its total length is 377.30m. The bridge adopts a single-box, single-cell girder structure. The top slab of the box girder is 12.5m wide, and the bottom slab is 6.5m wide.
In our experiments, we use a scale model of the bridge in which the main bridge, piers and abutments are constructed following the same scaling ratio. The stiffness degradation of the bridge structure is then simulated by applying a concentrated force at the span of the continuous rigid frame bridge to crack the floor. In order to monitor the changes in the structural state, acceleration sensors have been installed on the scale model, including vertical measuring points at the bottom of the beam and horizontal measuring points on the web. We tow a trolley on the bridge deck to simulate the dynamic load process, and the acceleration responses are monitored by the sensors, as shown in Fig. 6.
We apply a concentrated force on the main span of the scale model that leads to cracks in the middle floor of the main span. We use these cracks to represent the structural damages. The degree of damage depends on the strength of the concentrated force. In our experiment settings, we have four kinds of structural damage states, which are shown in Table 1.
DC0: No damage in the bridge structure.
DC1: One crack in the bridge.
DC2: Two cracks in the bridge.
DC3: Two larger cracks.
When a car passes over the bridge deck, the acceleration signals of each sensor are collected at a sampling frequency in the kHz range. The signals are quite different for different damage states. For example, the acceleration curves at the second measuring point under different structural damage states are shown in Fig. 7. The figure shows that the sensory data fluctuate around zero, which accords with a stationary time series. It can be observed that the response data are obviously different among the different damage states. The HCG model is used to learn the spatial and temporal features of these sensory data.
5.1.2 IASC-ASCE Benchmark Dataset
The phase I IASC-ASCE Structural Health Monitoring Benchmark dataset is based on a simulated structure and is widely used for structural damage classification. The primary purpose of the IASC-ASCE benchmarks is to offer a common platform on which numerous researchers can apply various SHM methods to an identical structure and compare their performance. The benchmarks comprise two phases, i.e., Phase I and Phase II, each with simulated and experimental benchmarks. In this study, the phase I IASC-ASCE Structural Health Monitoring Benchmark dataset (IASC-ASCE Benchmark Dataset) is adopted. The simulated structure is shown in Fig. 8.
For the IASC-ASCE Benchmark structure, the number of degrees of freedom (DOF) is 120, and the mass distribution is symmetric. The damage states are shown in Table 2. Sensors are installed on each floor of the middle columns along both sides. A shaker on the roof provides the excitation signals. The response acceleration signals are gathered from the sensors on the columns of each floor at a fixed sampling frequency. Then, the collected sensory data are fed to HCG as the inputs.
DC1: Remove all inclined supports from the first floor.
DC2: Remove all braces in the 1st and 3rd stories.
DC3: Remove an oblique support from the first floor.
DC4: Remove one oblique support from the first floor and one from the third floor.
DC5: Damage plus a relaxed element (first floor beam element) to the left.
DC6: The area of a certain inclined support on the first floor is reduced.
5.2 Experimental Setup
5.2.1 Experiment Settings
The two datasets are divided into training, validation and test sets. Keras is used to build our models. We train and evaluate all the methods on a server with Tesla P100 GPUs and E5-2620V4 CPUs.
5.2.2 Evaluation metrics
We adopt four general evaluation metrics, Accuracy, Precision, Recall and F1 score, to evaluate HCG and the compared baselines:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where $TP$, $FN$, $FP$ and $TN$ denote true positives, false negatives, false positives and true negatives, respectively.
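For reference, the four metrics can be computed from the confusion counts as follows (a straightforward sketch, not the authors' evaluation script):

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For multi-class damage states, these quantities are typically computed per class and then averaged.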
5.2.3 Compared Baselines
We compare HCG with a variety of baseline methods, which are summarized as follows. To ensure a fair comparison, for all deep learning based models, we adjust the number of layers and hidden units such that all the models have a very similar number of trainable parameters. All the deep learning based models are trained with the Adam optimizer, using the same learning rate and batch size.
DNN: A fully connected neural network with softmax activation, built by stacking several hidden layers.
CNN [abdeljaber2017real]: A stacked CNN, with the same convolutional kernel size used for all the layers.
GRU: A GRU model built by stacking layers of the GRU cells described in Equ. 4.
LSTM: We improve the method in [zhao2017machine] by adopting an LSTM model built by stacking layers of LSTM cells.
HCG: For the convolutional component, the size of each convolution kernel is (k, N), where N denotes the number of sensors, and several convolutional layers are stacked. For the recurrent component, stacked GRU layers are connected. Finally, two fully connected layers with softmax activation generate the final prediction.
5.3 Results of the TCRF Bridge Dataset
Table 3 summarizes the experimental results of our proposed HCG and other baselines on the TCRF Bridge Dataset. We run all the baselines multiple times and report the average results with standard deviations. HCG obtains better results compared with the single CNN and GRU models, which shows the effectiveness of considering the spatial and temporal dependencies together. HCG also outperforms the DNN and LSTM based models. Overall, our proposed HCG outperforms all the other baselines, including DNN, CNN, LSTM and GRU, on all the evaluation metrics.
Fig. 9 illustrates the loss and accuracy curves during training for the deep-learning based baselines and our HCG. We can clearly observe that our proposed HCG converges more quickly than the other competitors, which demonstrates HCG's superiority, as HCG can capture both the spatial and temporal dependencies in the sensory data.
In order to directly show the ability of the models to identify the damage states, the confusion matrices of CNN, GRU and HCG are presented according to the final results on the test set. As shown in Fig. 10, the proposed HCG model classifies most damage states well, while its classification performance for damage states 3 and 4 is only moderate, because the data characteristics of these two states are similar. The final accuracy of HCG is 94.04%.
5.4 Performance Analysis of Hyper-parameters in the TCRF Bridge Dataset
In order to further demonstrate that the advantages of the proposed HCG method do not stem from a single set of selected hyper-parameters, we compare and analyze the performance of different hyper-parameters, including the network structure and the number of neurons, for DNN, CNN, LSTM, RNN, and HCG on the TCRF Bridge Dataset.
We first conduct a set of experiments to study the effect of the network structure. We adopt the network structures shown in Table 4 to measure and compare the performance of the DNN, CNN, LSTM, RNN, and HCG models. In the table, the column headers present the numbers of neural network layers for the models. The number of neurons is kept the same across all the models for a fair comparison.
We then conduct another set of experiments to study the effect of the number of neurons. We adopt the layer setting that achieves the highest accuracy in the previous experiment. We choose different numbers of neurons for all the models. The results are listed in Table 5, where each configuration lists the numbers of neurons for the layers, respectively.
Model configurations (neurons per layer): [40, 70, 32, 32], [40, 70, 32, 64], [40, 70, 64, 64], [40, 70, 64, 100].
From both tables, it can be concluded that HCG achieves higher accuracy than the other neural network models across different hyper-parameter settings. The HCG model learns from the interactions among sensors and the short-term temporal dependencies at the low level, and handles long-term temporal dependencies at the high level, which gives it a clear advantage.
5.5 Results of the IASC-ASCE Benchmark Dataset
Table 6 summarizes the experimental results on the IASC-ASCE Benchmark Dataset. Fig. 11 illustrates the loss and accuracy curves during training for the deep-learning based baselines and our HCG. Similar to the results on the TCRF Bridge Dataset, HCG also shows its advantages over the compared baselines.
5.6 Performance Analysis of Hyper-parameters in the IASC-ASCE Benchmark Dataset
As with the TCRF Bridge Dataset, in order to further demonstrate that the advantages of the proposed HCG method do not stem from a single set of selected hyper-parameters, we compare and analyze the performance of different hyper-parameters, including the network structure and the number of neurons, for DNN, CNN, LSTM, RNN, and HCG on the IASC-ASCE Benchmark Dataset.
We first conduct a set of experiments to study the effect of the network structure. We adopt the network structures shown in Table 7 to measure and compare the performance of the DNN, CNN, LSTM, RNN, and HCG models. In the table, the column headers denote the numbers of neural network layers for the models. The number of neurons is kept the same across all the models for a fair comparison.
We then conduct another set of experiments to study the effect of the number of neurons. We adopt the layer setting that achieves the highest accuracy in the previous experiment. We choose different numbers of neurons for all the models. The results are listed in Table 8.
Model configurations (neurons per layer): [32, 32, 32, 64], [32, 32, 64, 64], [32, 64, 64, 64], [64, 64, 64, 64].
From both tables, it can be concluded that HCG achieves higher accuracy than the other neural network models across different hyper-parameter settings.
5.7 Running Time and Model Sizes
Table 9 summarizes the running time and the model sizes of the different methods. We report the time needed to finish one training epoch for all the deep learning models and observe that HCG is roughly 1.5 times faster than the GRU-based models, which demonstrates that HCG is generally much faster than GRU. Moreover, adding the hierarchical structure in HCG only slightly affects the computational speed.
GPU memory consumption consists of the memory to store the model parameters, which is the same for all the models in our experiments, and the memory to store the input data. From Table 9, we can observe that our proposed HCG has the smallest model size compared with CNN, LSTM, and GRU.
Structural damage detection has become an interdisciplinary area of interest for various engineering fields. In this work, we propose a novel hierarchical deep CNN and GRU framework, termed HCG, for structural damage detection. HCG is prominent in capturing both the spatial and temporal features of the sensory data. In HCG, we propose a low-level convolutional component to learn the spatial and short-term temporal features, and a high-level recurrent component to learn the long-term temporal dependencies. We have conducted extensive experiments on the IASC-ASCE Benchmark and a three-span continuous rigid frame bridge structure dataset. The experimental results demonstrate that our HCG outperforms other methods for structural damage detection. To further improve computational efficiency, advanced machine learning techniques, such as stochastic configuration networks [lu2020ensemble, wang2017stochastic, wang2017stochasticFA] and graph convolutional neural networks [guo2019attention], will be considered in future research work.
This work is supported by the Science and Technology Planning Project of Yunnan Province (2017IB025), the Science and Technology Research Program of Chongqing Municipal Education Commission of China (KJQN201800705, KJQN201900726), and the Breeding Program of National Natural Science Foundation in Chongqing Jiaotong University (PY201834).