1 Introduction
The construction cost index (CCI) is an indicator that reflects construction costs, and it is a research hotspot in the fields of construction and finance. Predicting the CCI is both meaningful and necessary, and effectively improving the prediction accuracy of the CCI is one of the goals of this research. CCI data form a time series, and many forecasting methods exist for time series, including statistical methods, fuzzy forecasting methods Parida et al. (2017); Li and Deng (2021); Liu et al. (2020), complex network methods Zhan and Xiao (2021); Liu and Deng (2019); Mao and Xiao (2019), evidence theory methods Zhan and Xiao , machine learning methods Zhou et al. (2021); Huang et al. (2021), and so on.
In order to improve the prediction effect of CCI, this paper combines the ideas of information fusion and machine learning. Information fusion is a technology that fuses information from different sources to synthesize target data. It is often used for intelligent decision-making Xiao (2019); He and Xiao (2021), time series analysis, and so on. The traditional information fusion method is fixed: it applies the same method to learn from different data. This paper uses the multilayer perceptron (MLP) from machine learning to replace the traditional information fusion method, changing the fusion parameters according to the characteristics of the data so that the fusion result is closer to the target result. This paper proposes a Multifeature Fusion Framework (MFF) to predict the CCI.
MFF generates CCI feature sequences through the proposed sliding window and function sequence. A feature sequence saves the feature information of a CCI slice, and this feature information is fused into the required prediction data. The MLP here replaces the traditional information fusion method, which further improves the prediction effect.
The structure of this paper is as follows: the second section introduces the basic theories behind MFF, the third section defines MFF, the fourth section shows the CCI prediction results and their analysis, and the fifth section concludes the paper.
2 Preliminaries
This section introduces the basic theory used by MFF. Suppose the time series is as follows.
The time series is treated as the raw data, as shown in Fig.1. The length of the time series is .
2.1 Sliding window and time slice set
The sliding window is a method in machine learning: by setting a fixed window size, the data can be sliced by sliding. Assuming that the window size is a fixed integer (, taken here as an example), the process of sliding the window is shown in Fig.2.
Definition 1.
The definition of the Sliding Window is as follows:
where time slice means a continuous subsequence of the original time series and the definition of the Time Slice is as follows:
Definition 2.
The time slice set is the union of time slices generated by the time series through the sliding window as shown in Fig.3.
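As an illustrative sketch (not the paper's own code; the function and variable names are assumptions), the sliding window of Definitions 1–2 can be written as:

```python
def time_slice_set(series, ws):
    """Collect every contiguous sub-sequence (time slice) of length ws
    obtained by sliding a window over the series."""
    if ws > len(series):
        raise ValueError("window size exceeds series length")
    return [series[i:i + ws] for i in range(len(series) - ws + 1)]

# a series of length 5 with window size 3 yields 5 - 3 + 1 = 3 slices
slices = time_slice_set([1, 2, 3, 4, 5], ws=3)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```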
2.2 Multilayer perceptron
The multilayer perceptron (MLP) is developed from the perceptron learning algorithm (PLA) Tang et al. (2015). The multilayer perceptron can effectively enhance the robustness of machine learning and alleviate the problem of overfitting. The structure of the MLP is shown in Fig.4 below.
2.3 Mean squared error loss function
Mean squared error (MSE) loss function is a common loss function in machine learning Pavlov et al. (2021). The mean squared error is defined as follows:

$\mathrm{MSE}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2$

where $x$ is the input, $y$ is the target, and the shapes of $x$ and $y$ are the same.
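For concreteness, a minimal pure-Python version of this loss (assuming the input and target are equal-length sequences) is:

```python
def mse(x, y):
    """Mean squared error between input x and target y of the same shape."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same shape")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

loss = mse([1.0, 2.0], [1.0, 4.0])  # (0 + 4) / 2 = 2.0
```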
2.4 Adam method
Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments Kingma and Ba (2015); Loshchilov and Hutter (2017). Adam is simple to implement, computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters Kingma and Ba (2015); Loshchilov and Hutter (2017). The pseudo code of Adam is as follows Kingma and Ba (2015); Loshchilov and Hutter (2017), and good default parameters of Adam are shown in Tab.1 Kingma and Ba (2015); Loshchilov and Hutter (2017).

Parameter  Meaning  Good default setting

α  Step size  0.001
β1, β2  Exponential decay rates for the moment estimates  0.9, 0.999
ε  Term added to the denominator for numerical stability  1e-8
f(θ)  Stochastic objective function with parameters θ  \
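A minimal sketch of the Adam update for a single scalar parameter (illustrative only; the defaults mirror Tab.1, and the toy objective below is an assumption for demonstration):

```python
import math

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: keep exponential moving averages of the gradient (first moment)
    and the squared gradient (second moment), correct their bias, then step."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# minimise f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta = adam(lambda th: 2 * (th - 3), theta0=0.0, alpha=0.1)
```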
2.5 Cyclical learning rates
Cyclical learning rates (CLR) is a method of dynamically adjusting the learning rate in machine learning Smith (2017). CLR eliminates the need for experiments to find the best value and schedule for the global learning rate. Instead of reducing the learning rate monotonically, CLR lets the learning rate fluctuate between reasonable boundary values on a regular basis. The parameters and a schematic diagram of CLR are shown in Tab.2 and Fig.5 Smith (2017).
Parameter  Meaning 

Base learning rate  Lower learning rate boundaries in the cycle for each parameter group 
Max learning rate  Upper learning rate boundaries in the cycle for each parameter group 
Step size up  Number of training iterations in the increasing half of a cycle 
Step size down  Number of training iterations in the decreasing half of a cycle 
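A sketch of the triangular CLR policy implied by Tab.2 (a hypothetical helper, not the paper's code):

```python
def clr(iteration, base_lr, max_lr, step_size_up, step_size_down=None):
    """Triangular cyclical learning rate: rise linearly from base_lr to
    max_lr over step_size_up iterations, then fall back over step_size_down."""
    if step_size_down is None:
        step_size_down = step_size_up
    pos = iteration % (step_size_up + step_size_down)
    if pos < step_size_up:
        frac = pos / step_size_up                           # increasing half
    else:
        frac = 1.0 - (pos - step_size_up) / step_size_down  # decreasing half
    return base_lr + frac * (max_lr - base_lr)

# with base 0.001, max 0.006 and step size 100, iteration 100 sits at the peak
lr = clr(100, base_lr=0.001, max_lr=0.006, step_size_up=100)  # 0.006
```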
3 Multifeature Fusion
3.1 Step 1: Input time series
The input of MFF is the time series . The time series is as follows:
where is used as an index and does not exist in the form of tuples.
3.2 Step 2: Slice time series
When generating the time slice set, MFF needs to determine the sliding window size Ws. The calculation process of the time slice set is as follows:
The setting of Ws needs to be considered. The number of time slices is . If Ws is too large, fewer slices are generated and there are fewer learning samples; if Ws is too small, each sample can only reflect short-range characteristics of the time series. is the default setting. The shape of the time slice set is .
3.3 Step 3: Input function sequence
In step 3, MFF needs to complete the preprocessing of the time slice set and convert each time slice into a feature sequence. The function sequence is the converter that performs this conversion, as shown in Fig.6.
Definition 3.
Function sequence is a set of functions, defined as follows:
where is the number of functions in the function sequence , is a time slice, and transfers , which is in the shape of , to the feature , which is in the shape of .
After the function sequence is input, the time slice set is converted to the feature sequence set as follows:
represents the feature value generated by the function in the time slice . The shape of feature sequence set is .
3.4 Step 4: Multilayer perceptron: forward propagation
In MFF, the MLP has four layers: an input layer, hidden layer 1, hidden layer 2 and an output layer. The numbers of nodes in the three layers are , , and 1, as shown in Fig.7. Each feature sequence is input into the MLP, and a result is output. Whenever the result corresponding to a feature sequence is generated, back propagation is performed and the parameters are optimized. One forward propagation together with one back propagation is called an epoch. Each epoch updates the result as follows:
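A forward pass of such a four-layer MLP can be sketched with NumPy (the ReLU activation, the random initialisation, and the layer sizes below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, m, n):
    """Random weights/biases for input -> hidden1 (m) -> hidden2 (n) -> 1."""
    return [(0.1 * rng.standard_normal((n_in, m)), np.zeros(m)),
            (0.1 * rng.standard_normal((m, n)), np.zeros(n)),
            (0.1 * rng.standard_normal((n, 1)), np.zeros(1))]

def forward(params, x):
    """Forward propagation: ReLU on the hidden layers, linear output."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # hidden layers
    w, b = params[-1]
    return h @ w + b                    # single scalar prediction

params = init_mlp(n_in=6, m=8, n=5)     # six features, as in the experiment
y = forward(params, np.ones(6))         # array of shape (1,)
```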
3.5 Step 5: Multilayer perceptron: back propagation and parameter optimization
In MFF, each epoch involves back propagation and parameter optimization. The loss function of MFF is MSE, and the target is the next time node's value after the current time slice. After the loss is calculated in each epoch, the parameters of MFF are back-propagated and optimized by the Adam algorithm and CLR. When initializing MFF, the upper and lower limits of the learning rate must be input; the learning rate is then dynamically adjusted by the CLR algorithm during training. MFF does not use the traditional gradient descent method of MLP but uses the Adam algorithm for gradient descent, which accelerates machine learning and strengthens its effect.
The loss and model parameters calculated in each epoch are saved in a set. In MFF, the number of epochs is a variable set in advance. After each epoch's back propagation and parameter optimization with Adam and CLR, MFF returns to Step 4 for the next epoch of training. When the last epoch is completed, the set of training parameters with the smallest loss is selected for prediction. The process of MFF is shown in Fig.8 and the pseudo code of MFF is as follows.
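The keep-the-best-parameters logic of Step 5 can be sketched generically (here `train_epoch` is a hypothetical callback standing in for one forward/backward pass with the Adam/CLR update; it is not part of the paper):

```python
import copy

def fit(params, train_epoch, n_epochs):
    """Run n_epochs and retain the parameter set with the smallest loss.
    train_epoch(params) is assumed to update params in place (back
    propagation + Adam/CLR step) and return that epoch's loss."""
    best_loss, best_params = float("inf"), copy.deepcopy(params)
    for _ in range(n_epochs):
        loss = train_epoch(params)
        if loss < best_loss:
            best_loss, best_params = loss, copy.deepcopy(params)
    return best_params, best_loss

# dummy run: losses 3.0, 1.0, 2.0 -> the epoch-2 parameters are kept
losses = iter([3.0, 1.0, 2.0])
best_params, best_loss = fit({}, lambda p: next(losses), n_epochs=3)  # best_loss == 1.0
```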
3.6 Step 6: Predict
In MFF, the model parameters with the smallest loss are applied to the MLP, and then the time series that needs to be predicted is input into MFF to complete the prediction.
4 Experiment
4.1 Data set description
Engineering News-Record (ENR) is a monthly publication that publishes the CCI Shahandashti and Ashuri (2013); Hwang (2011). The CCI has been studied by many civil engineers and cost analysts because it contains vital building-industry price information. The CCI data set includes a total of 295 construction cost values from January 1990 to July 2014.
4.2 Experiment preprocessing
For the CCI, MFF needs to determine the window size; is used as an example in this experiment. At the same time, the last data point is used as the target of the penultimate time point, without a sliding window. A total of 116 time slices were generated, with 116 corresponding feature sequences. In this experiment, the 116 pieces of data are divided into an experimental set and a test set according to the ratio of .
The choice of functions is variable. In this experiment, the MFF function sequence is composed of 6 functions, whose names and definitions are shown in Tab.3. The numbers of MLP nodes are set to and the maximum number of epochs is 10000. In the CLR algorithm, the base learning rate is and the max learning rate is .
Function  Definition 

Index  The order of time nodes in the current slice 
Mean  Average of the time series 
Standard deviation  Standard deviation of the time series 
Distance  Time series maximum minus minimum 
ApEn  Approximate entropy of time series Montesinos et al. (2018) 
Degree  The sum of the degrees of the visibility graph Lacasa et al. (2008) 
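Some of the Tab.3 features are straightforward to compute; a sketch of three of them is below (ApEn and the visibility-graph degree require dedicated implementations and are omitted; the function name and the choice of population standard deviation are assumptions):

```python
import statistics

def simple_features(ts):
    """Mean, standard deviation and max-minus-min distance of one time slice."""
    return {
        "mean": statistics.fmean(ts),
        "std": statistics.pstdev(ts),   # population standard deviation (assumed)
        "distance": max(ts) - min(ts),
    }

feats = simple_features([1.0, 2.0, 3.0])  # mean 2.0, distance 2.0
```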
4.3 Experimental results and comparison
In order to verify the prediction effect of MFF, the classic machine learning regression methods Decision Tree Regression (DTR) Hastie et al. (2009), Ordinary Least Squares Linear Regression (Linear) Hutcheson (2011), Lasso model fit with Least Angle Regression (Lasso) Taylor et al. (2014), Bayesian Ridge Regression (Bayesian) Xu et al. (2020) and Logistic Regression (Logistic) Hosmer Jr et al. (2013); Defazio et al. (2014) are selected for comparison. At the same time, the commonly used time series methods Simple Moving Average (SMA) (K=1) Guan and Zhao (2017), Autoregressive Integrated Moving Average model (ARIMA) Tseng et al. (2002b) and Seasonal Autoregressive Integrated Moving Average model (Seasonal ARIMA) Tseng et al. (2002a) are also used as comparison methods. To compare the predictions of each method, five measures of error are used: mean absolute difference (MAD), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), root mean square error (RMSE), and normalized root mean squared error (NRMSE), where is the predicted value, is the true value and N is the total number of predictions.
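Four of these measures can be implemented directly as a reference sketch (MAPE and SMAPE are returned as fractions here, while scaling conventions differ across papers; NRMSE, whose normalisation varies, is omitted):

```python
def error_measures(pred, true):
    """MAD, MAPE, SMAPE and RMSE between predictions and true values.
    MAPE/SMAPE are fractions; multiply by 100 for percentages."""
    n = len(pred)
    mad = sum(abs(p - t) for p, t in zip(pred, true)) / n
    mape = sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / n
    smape = sum(2 * abs(p - t) / (abs(p) + abs(t)) for p, t in zip(pred, true)) / n
    rmse = (sum((p - t) ** 2 for p, t in zip(pred, true)) / n) ** 0.5
    return {"MAD": mad, "MAPE": mape, "SMAPE": smape, "RMSE": rmse}

e = error_measures([2.0, 4.0], [1.0, 5.0])  # MAD 1.0, RMSE 1.0
```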
Fig.9 shows the prediction of MFF. The predicted value of MFF is close to the actual value, and the prediction effect is good.
Fig.10 shows the prediction comparison between MFF and the other methods. The prediction curve of MFF is closer to the actual values than those of the other methods, giving MFF a clear advantage. The prediction errors of MFF and the other methods are shown in Tab.4.
MAD  MAPE  SMAPE  RMSE  NRMSE  

SMA(K=1)  43.7391  0.4582  0.4566  55.8180  256.9233 
ARIMA  38.6931  0.4055  0.4044  47.7177  214.7822 
Seasonal ARIMA  45.3349  0.4769  0.4753  54.8709  240.0670 
DTR  58.3954  0.6117  0.6089  71.7173  368.8740 
Linear  30.3914  0.3172  0.3163  39.6220  187.2685 
Lasso  30.9693  0.3232  0.3224  40.1681  189.9494 
Bayesian  30.8234  0.3218  0.3209  39.8843  188.1513 
Logistic  47.8696  0.5016  0.4996  60.6755  279.2818 
MFF(8,5)  22.2877  0.2318  0.2316  29.2458  131.5833 
4.4 Additional experiment
In order to further show the prediction effect of MFF, the experimental results for different MLP parameters are tested here. Both M and N were tested from 1 to 20, giving a total of 400 models. An intuitive diagram of the prediction effect of the 400 models is shown in Fig.11. At the same time, the top 10 models and their errors are shown in Tab.5.
M  N  MAD  MAPE  SMAPE  RMSE  NRMSE 

3  8  19.6209  0.2041  0.2041  26.9679  121.3342 
2  9  20.0718  0.2090  0.2089  27.1621  122.2081 
1  6  21.1527  0.2204  0.2202  27.9791  125.8839 
5  13  22.0341  0.2293  0.2292  29.2146  131.4428 
3  16  22.1317  0.2303  0.2301  29.1754  131.2663 
1  7  22.1498  0.2306  0.2305  29.2729  131.7052 
8  5  22.2877  0.2318  0.2316  29.2458  131.5833 
8  9  23.1081  0.2405  0.2402  29.9967  134.9616 
1  20  23.6854  0.2463  0.2460  30.7693  138.4374 
9  2  23.7177  0.2468  0.2465  30.5634  137.5111 
4.5 Analysis
Compared with other methods, the improvement of MFF’s prediction effect comes from the following aspects:

MFF processes the time series through a sliding window, so that the same time point appears in multiple time slices for repeated learning, and the different characteristics that the same time point exhibits in different time slices are preserved.

In MFF, the function sequence contains methods for generating characteristics of the time series from multiple perspectives. Through information fusion, MFF fuses the different features into the prediction target.

In MFF, feature fusion uses machine learning instead of a traditional information fusion method. Machine learning can fuse existing data into the target data more flexibly, rather than fusing through a fixed method.

In MFF, the Adam and CLR algorithms are used for back propagation and parameter optimization of MLP, which improves the training effect while increasing the robustness and efficiency of MLP.
5 Conclusion
This paper proposed the MFF method to predict the CCI. By combining information fusion and machine learning, the prediction effect of MFF exceeds that of most pure machine learning regression methods. MFF contributes to CCI and time series forecasting research. In the future, MFF will continue to be improved, and time series forecasting methods based on information fusion and machine learning will be further explored.
Acknowledgment
The authors greatly appreciate the reviewers’ suggestions and the editor’s encouragement. This research is supported by the National Natural Science Foundation of China (No.62003280).
Conflict of Interests
The authors declare that there is no conflict of interest.
References
 SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, pp. 1646–1654. Cited by: §4.3.
 A two-factor autoregressive moving average model based on fuzzy fluctuation logical relationships. Symmetry 9 (10), pp. 207. Cited by: §4.3.
 The elements of statistical learning. pp. 33. Cited by: §4.3.

 Conflicting management of evidence combination from the point of improvement of basic probability assignment. International Journal of Intelligent Systems 36 (5), pp. 1914–1942. Cited by: §1.
 Applied logistic regression. Vol. 398, John Wiley & Sons. Cited by: §4.3.

 A new financial data forecasting model using genetic algorithm and long short-term memory network. Neurocomputing 425, pp. 207–218. Cited by: §1.
 Ordinary least-squares regression. L. Moutinho and GD Hutcheson, The SAGE Dictionary of Quantitative Management Research, pp. 224–228. Cited by: §4.3.
 Time series models for forecasting construction costs using time series indexes. Journal of Construction Engineering and Management 137 (9), pp. 656–662. Cited by: §4.1.
 Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.4.
 From time series to complex networks: the visibility graph. Proceedings of the National Academy of Sciences 105 (13), pp. 4972–4975. Cited by: Table 3.
 Local volume dimension: A novel approach for important nodes identification in complex networks. International Journal of Modern Physics B 35 (05), pp. 2150069. Cited by: §1.
 A fast algorithm for network forecasting time series. IEEE Access 7, pp. 102554–102560. Cited by: §1.
 A fuzzy interval time-series energy and financial forecasting model using network-based multiple time-frequency spaces and the induced ordered weighted averaging aggregation operation. IEEE Transactions on Fuzzy Systems 28 (11), pp. 2677–2690. Cited by: §1.
 Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101. Cited by: §2.4.
 A novel method for forecasting construction cost index based on complex network. Physica A: Statistical Mechanics and its Applications 527, pp. 121306. Cited by: §1.
 On the use of approximate entropy and sample entropy with centre of pressure timeseries. Journal of neuroengineering and rehabilitation 15 (1), pp. 1–15. Cited by: Table 3.
 Time series forecasting using Chebyshev functions based locally recurrent neuro-fuzzy information system. International Journal of Computational Intelligence Systems 10 (1), pp. 375–393. Cited by: §1.
 Using the standardized root mean squared residual (SRMR) to assess exact fit in structural equation models. Educational and Psychological Measurement 81 (1), pp. 110–130. Cited by: §2.3.
 Forecasting Engineering News-Record construction cost index using multivariate time series models. Journal of Construction Engineering and Management 139 (9), pp. 1237–1243. Cited by: §4.1.
 Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. Cited by: Figure 5, §2.5.
 Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems 27 (4), pp. 809–821. Cited by: §2.2.
 Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint arXiv:1401.3889. Cited by: §4.3.
 A fuzzy seasonal ARIMA model for forecasting. Fuzzy Sets and Systems 126 (3), pp. 367–376. Cited by: §4.3.
 Combining neural network model with seasonal time series ARIMA model. Technological Forecasting and Social Change 69 (1), pp. 71–87. Cited by: §4.3.
 Multi-sensor data fusion based on the belief divergence measure of evidences and the belief entropy. Information Fusion 46, pp. 23–32. Cited by: §1.
 Blood-based multi-tissue gene expression inference with Bayesian ridge regression. Bioinformatics 36 (12), pp. 3788–3794. Cited by: §4.3.
 A fast evidential approach for stock forecasting. International Journal of Intelligent Systems. Cited by: §1.
 A novel weighted approach for time series forecasting based on visibility graph. arXiv preprint arXiv:2103.13870. Cited by: §1.
 Informer: beyond efficient transformer for long sequence time-series forecasting. In Proceedings of AAAI. Cited by: §1.