Construction Cost Index Forecasting: A Multi-feature Fusion Approach

by Tianxiang Zhan, et al.

The construction cost index (CCI) is an important indicator in the construction industry, and predicting it has great practical significance. This paper combines information fusion with machine learning and proposes a Multi-feature Fusion (MFF) framework for time series forecasting. MFF uses a sliding window algorithm and a proposed function sequence to convert the time series into a feature sequence for information fusion. MFF replaces the traditional information fusion method with machine learning, which greatly improves the CCI prediction effect. MFF is of great significance to CCI and time series forecasting.


1 Introduction

The construction cost index (CCI) is an indicator that reflects construction cost, and it is a research hotspot in the fields of construction and finance. Predicting CCI is meaningful and necessary, and effectively improving the accuracy of CCI prediction is one of the research goals. CCI data is a time series, and there are many forecasting methods for time series, including statistical methods, fuzzy forecasting methods Parida et al. (2017); Li and Deng (2021); Liu et al. (2020), complex network methods Zhan and Xiao (2021); Liu and Deng (2019); Mao and Xiao (2019), evidence theory methods Zhan and Xiao, machine learning methods Zhou et al. (2021); Huang et al. (2021), and so on.

In order to improve the prediction effect of CCI, this paper combines the ideas of information fusion and machine learning. Information fusion is a technology that fuses information from different sources to synthesize target data. It is often used for intelligent decision-making Xiao (2019); He and Xiao (2021), time series analysis and so on. The traditional information fusion method is fixed: it applies the same fusion rule to different data. This paper uses the multi-layer perceptron (MLP) in machine learning to replace the traditional information fusion method, and changes the fusion parameters according to the characteristics of the data to make the fusion result closer to the target result. This paper proposes a Multi-feature Fusion Framework (MFF) to predict CCI.

MFF generates a CCI feature sequence through the proposed sliding window and function sequence. The feature sequence saves the feature information of the CCI slices, and MFF fuses this feature information into the required prediction data. Here MLP replaces the traditional information fusion method, which further improves the prediction effect.

The structure of this paper is as follows: the second section introduces some basic theories of MFF, the third section is the definition of MFF, the fourth section shows the effect of predicting CCI and the analysis of CCI prediction, and the fifth section summarizes the paper.

2 Preliminaries

This section includes the basic theory of MFF. It supposes that the time series is as follows.

The time series are treated as raw data as shown in Fig.1. The length of the time series is .

Figure 1: The example of the raw data

2.1 Sliding window and time slice set

Sliding window is a method in machine learning. By setting a fixed window size, data can be sliced by sliding. Assuming that the window size is a fixed integer (, here is taken as an example), the process of sliding the window is shown in Fig.2.

Figure 2: The process of sliding window
Definition 1.

The definition of the Sliding Window is as follows:

where time slice means a continuous subsequence of the original time series and the definition of the Time Slice is as follows:

Definition 2.

The time slice set is the union of time slices generated by the time series through the sliding window as shown in Fig.3.

Figure 3: The generation of time slice set
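The slicing described in this subsection can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `time_slice_set`, the toy series, and the assumption that the window advances one time point per step are all introduced here for the example.

```python
from typing import List, Sequence


def time_slice_set(series: Sequence[float], window_size: int) -> List[List[float]]:
    """Return every contiguous slice of length `window_size`.

    Assumption: the window moves forward by one time point per step.
    """
    if window_size < 1 or window_size > len(series):
        raise ValueError("window_size must be between 1 and len(series)")
    return [list(series[start:start + window_size])
            for start in range(len(series) - window_size + 1)]


# Toy series of five values sliced with a window of three.
slices = time_slice_set([10, 11, 13, 12, 15], window_size=3)
print(slices)
```

Under the one-step assumption, a series of five values and a window of three yields three overlapping slices, so each interior time point appears in more than one slice.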

2.2 Multilayer perceptron

The multi-layer perceptron (MLP) is extended from the perceptron learning algorithm (PLA) Tang et al. (2015). The multilayer perceptron can effectively enhance the robustness of machine learning and alleviate the problem of overfitting. The structure of MLP is shown in Fig.4 below.

Figure 4: The structure of MLP

2.3 Mean squared error loss function

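Mean squared error (MSE) is a loss function in machine learning Pavlov et al. (2021). The mean squared error is defined as follows.

$$\mathrm{MSE}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

where $x$ is the input, $y$ is the target, $n$ is the number of elements in each of them, and the shapes of $x$ and $y$ are the same.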
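As a small worked example, the mean squared error is the average of the squared element-wise differences between input and target. The sketch below uses plain Python lists; the function name `mse` and the sample values are chosen here for illustration only.

```python
from typing import Sequence


def mse(inputs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean of squared element-wise differences; both sequences must have the same length."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same shape")
    return sum((x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


# Squared differences are 0, 0 and 4, so the loss is 4 / 3.
loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(loss)
```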

2.4 Adam method

Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments Kingma and Ba (2015); Loshchilov and Hutter (2017). Adam is simple to implement, computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited to problems that are large in terms of data or parameters Kingma and Ba (2015); Loshchilov and Hutter (2017). The pseudo code of Adam is as follows Kingma and Ba (2015); Loshchilov and Hutter (2017), and the good default parameters of Adam are shown in Tab.1 Kingma and Ba (2015); Loshchilov and Hutter (2017).

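Require: $\alpha$: Step size
Require: $\beta_1, \beta_2 \in [0,1)$: Exponential decay rates for the moment estimates
Require: $f(\theta)$: Stochastic objective function with parameters $\theta$
Require: $\theta_0$: Initial parameter vector
$m_0 \leftarrow 0$, $v_0 \leftarrow 0$, $t \leftarrow 0$
while $\theta_t$ not converged do
     $t \leftarrow t + 1$; $g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$
     $m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t$; $v_t \leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$
     $\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$; $\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$
     $\theta_t \leftarrow \theta_{t-1} - \alpha \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$
return $\theta_t$
Algorithm 1 Adam method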
Parameter             Meaning                                                    Good default settings
$\alpha$              Step size                                                  0.001
$\beta_1, \beta_2$    Exponential decay rates for the moment estimates           0.9, 0.999
$\epsilon$            Term added to the denominator                              $10^{-8}$
$f(\theta)$           Stochastic objective function with parameters $\theta$     \
Table 1: The meaning and good default settings of Adam parameters
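To make the update rule concrete, the following is a minimal pure-Python sketch of Adam as described by Kingma and Ba (2015), with their suggested defaults as keyword defaults. The function name `adam_minimize`, the fixed number of steps used in place of a convergence test, the quadratic objective, and the step size 0.05 in the usage are assumptions made for this illustration.

```python
import math
from typing import Callable, List, Sequence


def adam_minimize(grad: Callable[[List[float]], List[float]],
                  theta0: Sequence[float],
                  alpha: float = 0.001, beta1: float = 0.9,
                  beta2: float = 0.999, eps: float = 1e-8,
                  steps: int = 1000) -> List[float]:
    """Run `steps` Adam updates starting from `theta0` and return the final parameters."""
    theta = list(theta0)
    m = [0.0] * len(theta)  # first-moment (mean) estimate
    v = [0.0] * len(theta)  # second-moment (uncentred variance) estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Illustrative objective f(x, y) = (x - 3)^2 + (y + 1)^2 with minimum at (3, -1).
def quadratic_grad(p: List[float]) -> List[float]:
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]


result = adam_minimize(quadratic_grad, [0.0, 0.0], alpha=0.05, steps=2000)
print(result)
```

Because each coordinate keeps its own moment estimates, the effective step is scaled per parameter, which is the property the text refers to as adaptivity.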

2.5 Cyclical learning rates

Cyclical learning rates (CLR) is a method of dynamically adjusting the learning rate in machine learning Smith (2017). CLR eliminates the need for experiments to find the best value and schedule for the global learning rate. Instead of decreasing the learning rate monotonically, CLR lets the learning rate vary cyclically between reasonable boundary values. The parameters and schematic diagram of CLR are shown in Tab.2 and Fig.5 Smith (2017).

Parameter             Meaning
Base learning rate    Lower learning rate boundaries in the cycle for each parameter group
Max learning rate     Upper learning rate boundaries in the cycle for each parameter group
Step size up          Number of training iterations in the increasing half of a cycle
Step size down        Number of training iterations in the decreasing half of a cycle
Table 2: The meaning of CLR parameters

Figure 5: Schematic diagram of CLR Smith (2017)
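A hedged sketch of how these four parameters determine the learning rate at a given training iteration is shown below. It assumes the simplest (triangular) policy from Smith (2017), with linear increase and linear decrease; the function name `cyclical_lr` and the numeric boundaries in the usage are illustrative and are not the values used in this paper.

```python
from typing import Optional


def cyclical_lr(iteration: int, base_lr: float, max_lr: float,
                step_size_up: int, step_size_down: Optional[int] = None) -> float:
    """Triangular cyclical learning rate for a 0-based training iteration."""
    if step_size_down is None:
        step_size_down = step_size_up
    position = iteration % (step_size_up + step_size_down)
    if position < step_size_up:
        fraction = position / step_size_up  # increasing half of the cycle
    else:
        fraction = 1.0 - (position - step_size_up) / step_size_down  # decreasing half
    return base_lr + (max_lr - base_lr) * fraction


# One full cycle with 4 iterations up and 4 iterations down (illustrative boundaries).
schedule = [cyclical_lr(i, base_lr=0.001, max_lr=0.01, step_size_up=4) for i in range(9)]
print(schedule)
```

The rate starts at the base value, reaches the maximum after `step_size_up` iterations, and returns to the base value at the start of the next cycle.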

3 Multi-feature Fusion

3.1 Step 1: Input time series

The input of MFF is the time series . The time series is as follows:

where is used as an index and does not exist in the form of tuples.

3.2 Step 2: Slice time series

When generating a time slice set, MFF needs to determine the size of the sliding window $Ws$. The calculation process of the time slice set is as follows:

When generating a time slice set, the setting of $Ws$ needs to be considered. The number of time slices is . An excessively large $Ws$ results in fewer slices and fewer learning samples. If $Ws$ is too small, each sample can only reflect short-term characteristics of the time series. is default parameters. The shape of is .

3.3 Step 3: Input function sequence

In step 3, MFF needs to complete the preprocessing of the time slice set and convert the time slice into a feature sequence. The function sequence is a converter that converts the time slice into a feature sequence as shown in Fig.6.

Figure 6: Example of feature conversion (window size=8, there are four functions in the function sequence)
Definition 3.

Function sequence is a set of functions, defined as follows:

where is the number of functions in the function sequence , is a time slice and transfers which is in the shape to feature which is in the shape of .

After the function sequence is input, the time slice set is converted to the feature sequence set as follows:

represents the feature value generated by the function in the time slice . The shape of feature sequence set is .
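The conversion from time slices to feature sequences can be sketched as applying a list of functions to each slice. The example below is only an illustration: the names `FUNCTION_SEQUENCE`, `feature_sequence` and `feature_sequence_set` are hypothetical, only three simple functions are used, and the population standard deviation is an assumption since the text does not say which variant is meant.

```python
import statistics
from typing import Callable, List, Sequence


def value_range(time_slice: Sequence[float]) -> float:
    """Maximum minus minimum of the slice."""
    return max(time_slice) - min(time_slice)


# Each function maps a whole time slice to a single feature value.
FUNCTION_SEQUENCE: List[Callable[[Sequence[float]], float]] = [
    statistics.mean,
    statistics.pstdev,  # assumption: population standard deviation
    value_range,
]


def feature_sequence(time_slice: Sequence[float]) -> List[float]:
    """Apply every function of the function sequence to one time slice."""
    return [float(f(time_slice)) for f in FUNCTION_SEQUENCE]


def feature_sequence_set(slices: Sequence[Sequence[float]]) -> List[List[float]]:
    """One feature sequence per time slice."""
    return [feature_sequence(s) for s in slices]


features = feature_sequence_set([[1, 2, 3], [2, 3, 7]])
print(features)
```

Each slice, whatever its window size, is reduced to one value per function, so the feature sequence set has one row per slice and one column per function.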

3.4 Step 4: Multilayer perceptron: forward propagation

In MFF, MLP has four layers: input layer, hidden layer 1, hidden layer 2 and output layer. The numbers of nodes in the last three layers are $M$, $N$ and 1, as shown in Fig.7. Each feature sequence is input into the MLP, and a result is output. Whenever the result corresponding to a feature sequence is generated, the MLP back-propagates and optimizes its parameters. One forward propagation and one back propagation together are called an epoch. Each epoch updates the result as follows:

Figure 7: The structure of MLP in the MFF
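A forward pass through such a four-layer network can be sketched without any library. The code below is an illustration under stated assumptions: the ReLU activation on the hidden layers, the uniform weight initialization, and the names `init_layer` and `forward` are not taken from the paper; the hidden sizes 8 and 5 follow the MFF(8,5) configuration reported in Tab.4, and the input size 6 matches the six functions of Tab.3.

```python
import random
from typing import Dict, List, Sequence

Layer = Dict[str, list]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Fully connected layer with small random weights and zero biases (assumed initialization)."""
    return {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out}


def forward(layers: Sequence[Layer], features: Sequence[float]) -> float:
    """Propagate one feature sequence through the network and return the scalar output."""
    activation = list(features)
    for index, layer in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(layer["W"], layer["b"])]
        is_output = index == len(layers) - 1
        # Assumption: ReLU on hidden layers, identity on the output layer.
        activation = z if is_output else [max(0.0, value) for value in z]
    return activation[0]


rng = random.Random(0)
mlp = [init_layer(6, 8, rng), init_layer(8, 5, rng), init_layer(5, 1, rng)]
prediction = forward(mlp, [1.0, 0.5, 0.2, 0.3, 0.1, 0.4])
print(prediction)

# Hand-checkable case: hidden value relu(3 - 1) = 2, output 2 * 2 + 1 = 5.
tiny = [{"W": [[1.0, -1.0]], "b": [0.0]}, {"W": [[2.0]], "b": [1.0]}]
print(forward(tiny, [3.0, 1.0]))
```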

3.5 Step 5: Multilayer perceptron: back propagation and parameter optimization

In MFF, each epoch needs back propagation and parameter optimization. The loss function of MFF is MSE, and the target is the value at the time node that follows the current time slice. After the loss is calculated in each epoch, the parameters of MFF are updated by back propagation with the Adam algorithm and CLR. When initializing MFF, it is necessary to input the upper and lower limits of the learning rate, which is dynamically adjusted between them by the CLR algorithm during training. MFF does not use the plain gradient descent traditionally used to train MLP, but uses the Adam algorithm, which accelerates training and improves its effect.

The loss and model parameters calculated in each epoch are saved in a set. In MFF, the number of epochs is set in advance. After the back propagation and parameter optimization of each epoch by the Adam algorithm and CLR, MFF returns to Step 4 for the next epoch of training. When the last epoch is completed, the set of model parameters with the smallest loss is selected for prediction. The process of MFF is shown in Fig.8 and the pseudo code of MFF is given in Algorithm 2.

3.6 Step 6: Predict

In MFF, the model parameters with the smallest loss are applied to the MLP, and then the time series that needs to be predicted is input into MFF to complete the prediction.

Figure 8: The process of MFF

Require: Time series
Require: Sliding window size
Require: Function sequence
Require: Number of epochs
Require: Shape of MLP
Slice the time series by the sliding window algorithm
Generate the feature sequence set through the time slice set and the function sequence
Train: set the feature sequence set as the training set
for Epoch = 1 to the number of epochs do
     MLP: forward propagation
     MLP: back propagation and parameter optimization by Adam and CLR algorithm
     Save model parameters and loss in the model set
end for
Predict: apply the model with minimum loss in the MLP
Input the last feature sequence and output the result
Algorithm 2
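The training pattern of the pseudo code, namely saving the loss and parameters of every epoch and keeping the set with the smallest loss, can be sketched as follows. To stay short and dependency-free, this sketch deliberately swaps in a single linear neuron trained by plain gradient descent in place of the MLP and Adam; only the cyclical learning rate and the keep-the-best selection mirror the procedure above. The function name `train_keep_best`, the toy data and all hyperparameter values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, List[float], float]  # (loss, weights, bias)


def train_keep_best(features: Sequence[Sequence[float]], targets: Sequence[float],
                    epochs: int = 200, base_lr: float = 0.001,
                    max_lr: float = 0.05, step_size: int = 20):
    """Train a linear stand-in model and return the best (loss, w, b) plus the full history."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(targets)
    history: List[Record] = []
    for epoch in range(epochs):
        # Forward propagation and MSE loss for the current parameters.
        errors = [sum(wi * xi for wi, xi in zip(w, x)) + b - y
                  for x, y in zip(features, targets)]
        loss = sum(e * e for e in errors) / n
        history.append((loss, list(w), b))  # save parameters together with their loss
        # Back propagation: gradient of the MSE with respect to w and b.
        grad_w = [2.0 * sum(e * x[j] for e, x in zip(errors, features)) / n
                  for j in range(len(w))]
        grad_b = 2.0 * sum(errors) / n
        # Triangular cyclical learning rate between base_lr and max_lr.
        position = epoch % (2 * step_size)
        fraction = position / step_size if position < step_size else 2.0 - position / step_size
        lr = base_lr + (max_lr - base_lr) * fraction
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    best_loss, best_w, best_b = min(history, key=lambda record: record[0])
    return best_loss, best_w, best_b, history


# Toy data generated by y = 2x + 1.
best_loss, best_w, best_b, history = train_keep_best(
    [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
print(best_loss, best_w, best_b)
```

Selecting by minimum recorded loss rather than taking the last epoch matters with a cyclical learning rate, because the loss need not decrease monotonically while the rate is rising.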

4 Experiment

4.1 Data set description

Engineering News Record (ENR) is a monthly publication that publishes the CCI Shahandashti and Ashuri (2013); Hwang (2011). CCI has been studied by many civil engineers and cost analysts because it contains vital building industry price information. The CCI data set includes a total of 295 data values of construction costs from January 1990 to July 2014.

4.2 Experiment preprocessing

For CCI, MFF needs to determine the size of a window, in the experiment as an example. At the same time, the last data is used as the target of the penultimate time point, without sliding window. A total of 116 time slices were generated, and there were 116 corresponding feature sequences. In this experiment, 116 pieces of data are divided into experimental set and test set according to the ratio of .

The choice of function is variable. In this experiment, the MFF function sequence is composed of 6 functions. The function names and definitions are shown in Tab.3. Also, the number of nodes of MLP is set to and the max epoch is 10000 in this experiment. In the CLR algorithm, the base learning rate is and the max learning rate is .

Function              Definition
Index                 The order of time nodes in the current slice
Mean                  Average of the time series
Standard deviation    Standard deviation of the time series
Distance              Time series maximum minus minimum
ApEn                  Approximate entropy of time series Montesinos et al. (2018)
Degree                The sum of the degrees of the visibility graph Lacasa et al. (2008)
Table 3: Function sequence in the experiment
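Of the functions in Tab.3, the Degree feature is the least self-explanatory, so a sketch may help. It assumes the natural visibility graph of Lacasa et al. (2008), in which two time points are connected when every point between them lies strictly below the straight line joining them. The function name `visibility_degrees` and the toy slices are illustrative, and whether the authors use this exact variant is an assumption.

```python
from typing import List, Sequence


def visibility_degrees(values: Sequence[float]) -> List[int]:
    """Degree of each node in the natural visibility graph of a time slice."""
    n = len(values)
    degrees = [0] * n
    for a in range(n):
        for b in range(a + 1, n):
            # Points a and b see each other if all intermediate points
            # lie strictly below the line segment connecting them.
            visible = all(
                values[c] < values[b] + (values[a] - values[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                degrees[a] += 1
                degrees[b] += 1
    return degrees


# The middle value 1 does not block the view between 3 and 2.
print(sum(visibility_degrees([3, 1, 2])))
# The middle value 3 blocks the view between 1 and 2.
print(visibility_degrees([1, 3, 2]))
```

The Degree feature of a slice would then be the sum of the returned list, which equals twice the number of edges in the graph.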

4.3 Experimental results and comparison

In order to verify the prediction effect of MFF, classic machine learning regression methods are selected for comparison: Decision Tree Regression (DTR) Hastie et al. (2009), Ordinary least squares Linear Regression (Linear) Hutcheson (2011), Lasso model fit with Least Angle Regression (Lasso) Taylor et al. (2014), Bayesian Ridge Regression (Bayesian) Xu et al. (2020) and Logistic Regression (Logistic) Hosmer Jr et al. (2013); Defazio et al. (2014). In addition, methods commonly used for time series, namely Simple Moving Average (SMA) (K=1) Guan and Zhao (2017), Autoregressive Integrated Moving Average model (ARIMA) Tseng et al. (2002b) and Seasonal Autoregressive Integrated Moving Average model (Seasonal ARIMA) Tseng et al. (2002a), are also used for comparison. To compare the predictions of each method, five measures of error are used: mean absolute difference (MAD), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), root mean square error (RMSE), and normalized root mean squared error (NRMSE):

where is the predicted value, is the true value and N is the total number of .
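Three of the error measures have standard textbook forms and can be sketched directly. The code below assumes those standard definitions and reports MAPE as a percentage; NRMSE is left out because the normalization used in this paper cannot be recovered from the text. The function names and sample values are illustrative.

```python
import math
from typing import Sequence


def mad(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute difference between predictions and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent (assumed convention)."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


predicted = [102.0, 198.0, 305.0]
actual = [100.0, 200.0, 300.0]
print(mad(predicted, actual), mape(predicted, actual), rmse(predicted, actual))
```

MAD and RMSE are in the units of the index itself, while MAPE is scale-free, which is why the first and fourth columns of the error tables are much larger than the percentage columns.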

Fig.9 shows the prediction of MFF. The predicted value of MFF is close to the actual value, and the prediction effect is good.

Figure 9: Prediction of MFF

Fig.10 shows the prediction comparison between MFF and other methods. The prediction curve of MFF is closer to the actual value than those of the other methods, so MFF shows a clear advantage over them. The prediction errors of MFF and other methods are shown in Tab.4.

Figure 10: Comparison of MFF and other methods

Method            MAD        MAPE (%)   SMAPE (%)   RMSE       NRMSE
SMA(K=1)          43.7391    0.4582     0.4566      55.8180    256.9233
ARIMA             38.6931    0.4055     0.4044      47.7177    214.7822
Seasonal ARIMA    45.3349    0.4769     0.4753      54.8709    240.0670
DTR               58.3954    0.6117     0.6089      71.7173    368.8740
Linear            30.3914    0.3172     0.3163      39.6220    187.2685
Lasso             30.9693    0.3232     0.3224      40.1681    189.9494
Bayesian          30.8234    0.3218     0.3209      39.8843    188.1513
Logistic          47.8696    0.5016     0.4996      60.6755    279.2818
MFF(8,5)          22.2877    0.2318     0.2316      29.2458    131.5833
Table 4: Forecast error of MFF and comparison method

4.4 Additional experiment

In order to further show the prediction effect of MFF, different MLP parameters are tested here. Both M and N were tested from 1 to 20, giving a total of 400 models. An intuitive diagram of the prediction effect of the 400 models is shown in Fig.11. The top 10 models by prediction effect and their errors are shown in Tab.5.

M    N     MAD        MAPE (%)   SMAPE (%)   RMSE       NRMSE
3    8     19.6209    0.2041     0.2041      26.9679    121.3342
2    9     20.0718    0.2090     0.2089      27.1621    122.2081
1    6     21.1527    0.2204     0.2202      27.9791    125.8839
5    13    22.0341    0.2293     0.2292      29.2146    131.4428
3    16    22.1317    0.2303     0.2301      29.1754    131.2663
1    7     22.1498    0.2306     0.2305      29.2729    131.7052
8    5     22.2877    0.2318     0.2316      29.2458    131.5833
8    9     23.1081    0.2405     0.2402      29.9967    134.9616
1    20    23.6854    0.2463     0.2460      30.7693    138.4374
9    2     23.7177    0.2468     0.2465      30.5634    137.5111
Table 5: Top 10 models with the best prediction results of MFF(M,N)

Figure 11: Prediction effect of the 400 MFF(M,N) models

4.5 Analysis

Compared with other methods, the improvement of MFF’s prediction effect comes from the following aspects:

  1. MFF processes the time series through a sliding window method, so that the same time point appears in several time slices and is learned multiple times, and the different characteristics of that time point in different time slices are preserved.

  2. In MFF, the function sequence contains methods that generate features of the time series from multiple aspects. Through information fusion, MFF fuses these different features into the prediction target.

  3. In MFF, feature fusion uses machine learning instead of traditional information fusion. Machine learning can fuse the existing data into the target data more flexibly than a fixed fusion method.

  4. In MFF, the Adam and CLR algorithms are used for back propagation and parameter optimization of MLP, which improves the training effect while increasing the robustness and efficiency of MLP.

5 Conclusion

This paper proposed the MFF method to predict CCI. By combining information fusion and machine learning, the prediction effect of MFF exceeds that of most pure machine learning regression methods. MFF contributes to CCI prediction and to time series forecasting. Future work will continue to improve MFF and to explore time series forecasting methods based on information fusion and machine learning.


Acknowledgments

The authors greatly appreciate the reviewers’ suggestions and the editor’s encouragement. This research is supported by the National Natural Science Foundation of China (No.62003280).

Conflict of Interests

The authors declare that there is no conflict of interest.


References

  • A. Defazio, F. Bach, and S. Lacoste-Julien (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in neural information processing systems, pp. 1646–1654.
  • S. Guan and A. Zhao (2017) A two-factor autoregressive moving average model based on fuzzy fluctuation logical relationships. Symmetry 9 (10), pp. 207.
  • T. Hastie, R. Tibshirani, and J. Friedman (2009) The elements of statistical learning.
  • Y. He and F. Xiao (2021) Conflicting management of evidence combination from the point of improvement of basic probability assignment. International Journal of Intelligent Systems 36 (5), pp. 1914–1942.
  • D. W. Hosmer Jr, S. Lemeshow, and R. X. Sturdivant (2013) Applied logistic regression. Vol. 398, John Wiley & Sons.
  • Y. Huang, Y. Gao, Y. Gan, and M. Ye (2021) A new financial data forecasting model using genetic algorithm and long short-term memory network. Neurocomputing 425, pp. 207–218.
  • G. D. Hutcheson (2011) Ordinary least-squares regression. L. Moutinho and GD Hutcheson, The SAGE dictionary of quantitative management research, pp. 224–228.
  • S. Hwang (2011) Time series models for forecasting construction costs using time series indexes. Journal of Construction Engineering and Management 137 (9), pp. 656–662.
  • D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • L. Lacasa, B. Luque, F. Ballesteros, J. Luque, and J. C. Nuno (2008) From time series to complex networks: the visibility graph. Proceedings of the National Academy of Sciences 105 (13), pp. 4972–4975.
  • H. Li and Y. Deng (2021) Local volume dimension: A novel approach for important nodes identification in complex networks. International Journal of Modern Physics B 35 (05), pp. 2150069.
  • F. Liu and Y. Deng (2019) A fast algorithm for network forecasting time series. IEEE Access 7, pp. 102554–102560.
  • G. Liu, F. Xiao, C. Lin, and Z. Cao (2020) A fuzzy interval time-series energy and financial forecasting model using network-based multiple time-frequency spaces and the induced-ordered weighted averaging aggregation operation. IEEE Transactions on Fuzzy Systems 28 (11), pp. 2677–2690.
  • I. Loshchilov and F. Hutter (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
  • S. Mao and F. Xiao (2019) A novel method for forecasting construction cost index based on complex network. Physica A: Statistical Mechanics and its Applications 527, pp. 121306.
  • L. Montesinos, R. Castaldo, and L. Pecchia (2018) On the use of approximate entropy and sample entropy with centre of pressure time-series. Journal of neuroengineering and rehabilitation 15 (1), pp. 1–15.
  • A. Parida, R. Bisoi, P. Dash, and S. Mishra (2017) Times series forecasting using chebyshev functions based locally recurrent neuro-fuzzy information system. International Journal of Computational Intelligence Systems 10 (1), pp. 375–393.
  • G. Pavlov, A. Maydeu-Olivares, and D. Shi (2021) Using the standardized root mean squared residual (srmr) to assess exact fit in structural equation models. Educational and Psychological Measurement 81 (1), pp. 110–130.
  • S. M. Shahandashti and B. Ashuri (2013) Forecasting engineering news-record construction cost index using multivariate time series models. Journal of Construction Engineering and Management 139 (9), pp. 1237–1243.
  • L. N. Smith (2017) Cyclical learning rates for training neural networks. In 2017 IEEE winter conference on applications of computer vision (WACV), pp. 464–472.
  • J. Tang, C. Deng, and G. Huang (2015) Extreme learning machine for multilayer perceptron. IEEE transactions on neural networks and learning systems 27 (4), pp. 809–821.
  • J. Taylor, R. Lockhart, R. J. Tibshirani, and R. Tibshirani (2014) Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint arXiv:1401.3889.
  • F. Tseng, G. Tzeng, et al. (2002a) A fuzzy seasonal arima model for forecasting. Fuzzy Sets and Systems 126 (3), pp. 367–376.
  • F. Tseng, H. Yu, and G. Tzeng (2002b) Combining neural network model with seasonal time series arima model. Technological forecasting and social change 69 (1), pp. 71–87.
  • F. Xiao (2019) Multi-sensor data fusion based on the belief divergence measure of evidences and the belief entropy. Information Fusion 46, pp. 23–32.
  • W. Xu, X. Liu, F. Leng, and W. Li (2020) Blood-based multi-tissue gene expression inference with bayesian ridge regression. Bioinformatics 36 (12), pp. 3788–3794.
  • T. Zhan and F. Xiao. A fast evidential approach for stock forecasting. International Journal of Intelligent Systems.
  • T. Zhan and F. Xiao (2021) A novel weighted approach for time series forecasting based on visibility graph. arXiv preprint arXiv:2103.13870.
  • H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang (2021) Informer: beyond efficient transformer for long sequence time-series forecasting. In Proceedings of AAAI.