1 Introduction
In this paper, we describe a comprehensive pipeline with two methods to predict a user's mood on the next day from data collected on previous days. Based on the predicted mood, we then recommend hotels to the user. The methods are compared with a benchmark that simply predicts the next day's mood to be the same as the previous day's. The first method uses a Support Vector Machine (SVM) to predict the user's mood; the second is a Recurrent Neural Network (RNN). Both achieve reasonable prediction accuracy.
Pre-processing the dataset is a very important step in data mining, and it is usually closely related to the prediction task. For a clear description, this document covers the pre-processing of the dataset in Section 2. The experiments are implemented in R, using libraries such as e1071 (SVM) and rnn.
2 Pre-process the Raw Data
2.1 Data Analysis
- Reading and Understanding Data
In this section we use R to process the dataset, owing to its many support libraries for data mining. First, before processing the data, it is necessary to understand the meaning of the variables and the values in the dataset. The dataset contains variables and corresponding values collected from users' smartphones. A user's mood is related to the variables recorded on the preceding days. However, the data for some variables are unrelated to the users' mood, or are unusable due to damage and/or insufficiency. The objective of this section is therefore to pre-process the dataset from its original form into a wrapped dataset that can be fed to the predictive model.
- Pre-process the Dataset
In this section, we present how we pre-process the data. First, to make the data structure clearer, we organize the dataset as in Table 1, where the values are grouped by id, time, and variables. In addition, we can analyze the mood of a user within a day, as in Figure 1, which shows the user's mood dynamics. Knowing the mood dynamics within a day helps to predict the user's mood on the next day. We therefore processed the dataset into the new structure shown in Fig. 3. To build the predictive model, we need to summarise the variable values per day in a format suitable as input to the classifiers; we therefore average the values of the variables over each day.
Figure 1: The mood of a user over one day. The user's average mood stays stable within [7,8].

Figure 2: The predictive model procedure.

id | time | variables | mood |
---|---|---|---|
AS14.01 | 2014-02-26 | … | 6.25 |
AS14.01 | 2014-02-27 | … | 6.33 |
AS14.01 | 2014-03-21 | … | 6.2 |
AS14.01 | 2014-03-22 | … | 6.4 |
AS14.01 | 2014-03-23 | … | 6.8 |

Table 1: The restructured dataset.

Figure 3: A snapshot of the data structure, with values grouped by id, time, and variables. The data in the red rectangle are unusable because they contain too many NA values.

As Figure 3 shows, some variables in the dataset have very few recorded values. We consider these unusable and remove them. Although the dataset is then much tidier, it is still not good enough to serve as training and test data, so we also remove days on which only a few variable values were recorded. At this point the dataset is usable for training and testing, as shown in Fig. 4.
Figure 4: A snapshot of the usable data structure. The values are grouped by id, time, and variables. There are no NAs and no duplicated ids across variables.
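The grouping-and-averaging step described above can be sketched as follows (a minimal Python illustration with made-up records; the paper's actual implementation is in R):

```python
from collections import defaultdict

# Hypothetical raw records: (id, date, variable, value), several per day.
records = [
    ("AS14.01", "2014-02-26", "mood", 6.0),
    ("AS14.01", "2014-02-26", "mood", 6.5),
    ("AS14.01", "2014-02-26", "screen", 120.0),
    ("AS14.01", "2014-02-27", "mood", 6.33),
]

# Group by (id, date, variable) and average the values within each day.
groups = defaultdict(list)
for uid, date, var, value in records:
    groups[(uid, date, var)].append(value)

daily = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(daily[("AS14.01", "2014-02-26", "mood")])  # 6.25
```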
To pre-process the dataset so that it is usable for data mining, we face many challenges in the original dataset, such as missing values and outliers [che2013big]. Pre-processing is an essential and important step in data mining because of the variety of possible defects in the original data. Here we show several examples of the techniques we used in our experiments. Missing values in the original dataset are a common problem that has to be solved in data mining, so we first illustrate how we handle them.
2.2 The Set-up of Features
In Fig. 4, we choose some variables with enough samples as usable features. The data in Fig. 4 have complete values for these variables across the different dates and ids. They are tidy data that, in both format and information content, can be fed to the model for training and testing.
In addition, we divide the dataset into two parts, 90% and 10%, as the training and testing samples respectively. To build the predictive model shown in Fig. 2, we aggregate the history to create attributes that can be used in machine learning approaches such as the SVM [lan2018ICARCV] and RNN employed in this document. We use the average mood during the last five days as a predictor; Fig. 5 shows how this new feature for the classifiers is created.

2.3 Rationale
Regarding the rationale for the choice of the final attributes, in this assignment we mainly consider the quality and quantity of the dataset. We have to filter out damaged data that would likely train an incorrect predictive model or decrease prediction accuracy. Therefore, we remove days on which many variables have missing values, as well as variables with outliers.
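The five-day average-mood feature described in Section 2.2 can be sketched as follows (a minimal Python illustration with a made-up mood series; the paper's actual implementation is in R):

```python
# Hypothetical daily average-mood series for one user.
mood = [6.2, 6.4, 6.8, 7.0, 6.6, 7.1, 6.9]

def last_five_avg(series, t):
    # Feature for day t: the average mood over (up to) the previous five days.
    window = series[max(0, t - 5):t]
    return sum(window) / len(window)

# Predict day index 5 from the moods on days 0..4.
feature = last_five_avg(mood, 5)
print(round(feature, 2))  # 6.6
```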
3 Learn prediction models
In this section, we describe the two predictive models and the benchmark. We do not focus on the details of the models, because we use standard R libraries for the SVM and RNN.
3.1 Model Variant 1
First, we adopted the Support Vector Machine (SVM) as the predictive model. In R, a variety of libraries can be used to implement an SVM; we used e1071 because it is easy to use. The main parameter settings of the SVM are shown in Table 2.
parameters | scale | type | kernel | degree | gamma | coef0 | cost | class.weights | epsilon |
---|---|---|---|---|---|---|---|---|---|
setting | 1 | C-classification | linear | 3 | 1 | 0 | 1 | 1 | 0.1 |

Table 2: The main parameter settings of the SVM.
Having pre-processed the data, we only need to train and test on the resulting samples. We used the variables from Section 2 as the input of the SVM model and the mood value as its output. 90% of the samples were used to train the SVM model, and the remaining 10% were used to test its accuracy.
We first measured the accuracy of the trained SVM model on the training sample; the accuracy was then verified again on the testing sample. Table 3 shows the predicted moods of the users on the next day when testing on the training sample. In Table 3, 568 samples were used to train the SVM predictive model. Mood values from 5 to 8 are predicted as values from 3 to 9, and 467 samples are predicted correctly.
results_train | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|
5 | 0 | 1 | 4 | 1 | 0 | 0 | 0 |
6 | 1 | 2 | 9 | 71 | 11 | 4 | 0 |
7 | 0 | 0 | 0 | 23 | 313 | 31 | 2 |
8 | 0 | 0 | 0 | 0 | 13 | 79 | 0 |

Table 3: Prediction results on the training sample.
Furthermore, we verified the accuracy of the SVM predictive model on the test sample. In Section 2 we set aside 100 samples for testing. The results are shown in Table 4, with the statistical results in Fig. 7; 81 samples were correctly predicted.
result_test | 3 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|
6 | 1 | 2 | 10 | 2 | 0 |
7 | 0 | 1 | 5 | 55 | 3 |
8 | 0 | 0 | 0 | 5 | 16 |

Table 4: Prediction results on the testing sample.
Last, we tested the accuracy of the benchmark, which assumes that the user's mood is the same as on the previous day; its accuracy is 62.3%. The comparison is shown in Table 5.
prediction | result_train | result_test | benchmark |
---|---|---|---|
accuracy | 0.822 | 0.810 | 0.623 |

Table 5: Accuracy comparison between the SVM model and the benchmark.
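As a sanity check, the SVM accuracies can be recomputed from the confusion counts in Tables 3 and 4; a small Python sketch (counts transcribed from the tables, with sample totals of 568 and 100 taken from the text):

```python
# Rows: actual mood; columns: predicted mood (transcribed from Tables 3 and 4).
def correct_count(cols, table):
    # Sum the cells where the predicted label equals the actual one.
    return sum(row[cols.index(actual)] for actual, row in table.items())

train_cols = [3, 4, 5, 6, 7, 8, 9]
train = {5: [0, 1, 4, 1, 0, 0, 0],
         6: [1, 2, 9, 71, 11, 4, 0],
         7: [0, 0, 0, 23, 313, 31, 2],
         8: [0, 0, 0, 0, 13, 79, 0]}

test_cols = [3, 5, 6, 7, 8]
test = {6: [1, 2, 10, 2, 0],
        7: [0, 1, 5, 55, 3],
        8: [0, 0, 0, 5, 16]}

print(correct_count(train_cols, train))      # 467 correct; 467/568 ~ 0.822
print(correct_count(test_cols, test))        # 81 correct; 81/100 = 0.81
```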
3.2 Model Variant 2
For this variant of the model, we incorporate a Recurrent Neural Network (RNN) to exploit the temporal characteristics of the dataset. To do so, we first pre-process the data somewhat: we replaced all unavailable values (those recorded as 'NA') with the value of the previous data point. This way, we can use more data points and do not have to discard any. Moreover, it seems reasonable to equate these values to their previously measured ones, since all variables are measured several times a day and it is plausible that their values do not change substantially from one data point to the next.
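This NA replacement is a forward fill, which can be sketched as follows (a minimal Python illustration with None standing in for NA; the paper's actual implementation is in R):

```python
def forward_fill(values):
    # Replace each missing value (None) with the last observed value.
    # Note: a missing value at the very start has no predecessor and stays None.
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled

print(forward_fill([6.2, None, None, 6.8, None]))  # [6.2, 6.2, 6.2, 6.8, 6.8]
```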
Now that we have a full dataset with no missing values, we can aggregate the data over days. This gives us daily averages for each variable, which are needed to predict the average mood on the next day. At the same time, we keep all days on which the mood variable is measured and delete the days for which the mood is not observed, doing this for each individual separately. By handling each individual separately, we avoid throwing away data that we could in fact use for certain individuals. If, for instance, the mood is measured for individual 1 on only 10 dates but for individual 2 on 15 dates, we avoid discarding 5 dates for individual 2. As a next step, we find, among the days on which at least the mood has been measured, those on which the most variables have been recorded, and discard the remaining days. This is again done for each individual separately, resulting in the following numbers of observations.
ID | Observations | ID | Observations |
---|---|---|---|
AS14.01 | 18 | AS14.19 | 21 |
AS14.02 | 11 | AS14.20 | 12 |
AS14.03 | 17 | AS14.23 | 11 |
AS14.05 | 21 | AS14.24 | 18 |
AS14.06 | 12 | AS14.25 | 14 |
AS14.07 | 14 | AS14.26 | 15 |
AS14.08 | 19 | AS14.27 | 14 |
AS14.09 | 9 | AS14.28 | 13 |
AS14.12 | 15 | AS14.29 | 14 |
AS14.13 | 17 | AS14.30 | 14 |
AS14.14 | 11 | AS14.31 | 12 |
AS14.15 | 12 | AS14.32 | 13 |
AS14.16 | 14 | AS14.33 | 11 |
AS14.17 | 15 |
As a final step in preparing to fit an RNN, we scale all variables to a common interval, which is necessary for the RNN to converge (faster). In the end, we scale our predictions back to their original scale, so that we obtain predictions of the mood on the scale we actually observe.
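This scaling step and its inverse can be sketched as min-max scaling (a Python illustration; we assume the unit interval [0, 1], which logistic-sigmoid RNNs typically require; the paper's actual implementation is in R):

```python
def minmax_scale(xs):
    # Map values linearly onto [0, 1]; return the bounds for inversion later.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs], lo, hi

def minmax_inverse(scaled, lo, hi):
    # Map scaled predictions back to the original mood scale.
    return [s * (hi - lo) + lo for s in scaled]

mood = [6.0, 7.0, 8.0]
scaled, lo, hi = minmax_scale(mood)
print(scaled)                          # [0.0, 0.5, 1.0]
print(minmax_inverse(scaled, lo, hi))  # [6.0, 7.0, 8.0]
```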
For every individual we then train and test an RNN, where we eventually settled on a learning rate of , hidden layers in the network, iterations, the logistic sigmoid as activation function, and stochastic gradient descent as the updating rule. For testing the individual RNNs, we used of the available data (rounded to the nearest integer) and the remaining (rounded to the nearest integer) for training. As an example, we present the results of this training and testing phase for individual AS14.08 below. Note that for each individual, the random number generator in the training phase was initialized with set.seed(2204).
Figure 8: Error plot and prediction plot for individual AS14.08.
From the results in Figure 8, the errors made in classifying the data decrease rather steeply as the iterations progress. The corresponding prediction plot shows the actual mood values in the test set against the values predicted by the trained RNN. One might worry from the error plot that the RNN is overfitting the data, since the errors become so small, but the prediction plot shows that the RNN predicts the mood for the following day reasonably well. This means our RNN is not overfitting in this case and can reasonably predict the following day's mood for unknown cases. If we train our network on the entire dataset, we can also see that we adequately capture the mood of the following day for the known cases.
Figure 9: Predictions of the following day's mood when the RNN is trained on the entire dataset.
The results of the predictions in Figure 9 show that we actually capture the mood of the following day with rather high accuracy. As expected, the same qualitative pattern for the classification errors as in the earlier training phase arises when we use the entire dataset, which explains why we are able to predict the mood of the following day quite precisely. Moreover, this pattern of predictions and errors is consistent across all individuals. As a selection of our results, we show the prediction plots for 3 individuals below, namely AS14.08, AS14.16 and AS14.24.
[Prediction plots for individuals AS14.08, AS14.16 and AS14.24]
From these plots we indeed see that our predictions match the observed values rather closely and seem quite accurate. This is confirmed when we inspect the RMSE of the predictions for each individual, given in the following table.
ID | RMSE | ID | RMSE |
---|---|---|---|
AS14.01 | 0.4013390 | AS14.19 | 0.4711004 |
AS14.02 | 0.2142030 | AS14.20 | 0.3762178 |
AS14.03 | 0.4265163 | AS14.23 | 0.2910442 |
AS14.05 | 1.0045396 | AS14.24 | 0.2965332 |
AS14.06 | 0.2778267 | AS14.25 | 0.3889919 |
AS14.07 | 0.2246084 | AS14.26 | 0.2770717 |
AS14.08 | 0.5791765 | AS14.27 | 0.3803674 |
AS14.09 | 0.5235971 | AS14.28 | 0.2794391 |
AS14.12 | 0.3836072 | AS14.29 | 0.5301942 |
AS14.13 | 0.4786648 | AS14.30 | 0.2453128 |
AS14.14 | 0.2647090 | AS14.31 | 0.3185494 |
AS14.15 | 0.2972429 | AS14.32 | 0.2421542 |
AS14.16 | 1.0184591 | AS14.33 | 0.3605297 |
AS14.17 | 0.4683970 |
We see that these RMSEs are all fairly close to zero for each individual. All things considered, the predicted values match the actual values rather closely for each individual. The RNN method thus seems to adequately incorporate the temporal aspects of the dataset at the individual level.
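For reference, the RMSE reported above is the root of the mean squared difference between observed and predicted moods; a minimal Python sketch with made-up values:

```python
import math

def rmse(actual, predicted):
    # Root mean squared error between observed and predicted moods.
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

print(round(rmse([7.0, 6.5, 8.0], [6.8, 6.6, 7.7]), 3))  # ~0.216
```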
3.3 Model Variant 3
In this model variant, we simply predict that the average mood on the next day is the same as on the current day. The prediction plots below show the actual and predicted values for 3 individuals, namely AS14.08, AS14.16 and AS14.24.
[Prediction plots of the naive benchmark for individuals AS14.08, AS14.16 and AS14.24]
We can see that this naive approach of simply predicting that the average mood stays constant (the same as on the previous day) does not produce results as good as those of the RNNs. The following table shows the corresponding RMSE for each individual under this naive approach.
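The naive benchmark amounts to shifting the observed mood series by one day; a minimal Python sketch with made-up values:

```python
def naive_predict(mood):
    # Prediction for day t+1 is the observed mood on day t,
    # so the predictions are the series without its last element.
    return mood[:-1]

mood = [6.2, 6.4, 6.8, 7.0]
print(naive_predict(mood))  # [6.2, 6.4, 6.8], predicting days 2..4
```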
ID | RMSE | ID | RMSE |
---|---|---|---|
AS14.01 | 3.802594 | AS14.19 | 4.642377 |
AS14.02 | 6.134013 | AS14.20 | 3.289039 |
AS14.03 | 2.906315 | AS14.23 | 3.544714 |
AS14.05 | 5.314184 | AS14.24 | 5.064912 |
AS14.06 | 4.985479 | AS14.25 | 3.896437 |
AS14.07 | 10.497090 | AS14.26 | 6.547519 |
AS14.08 | 4.977672 | AS14.27 | 5.096676 |
AS14.09 | 5.282255 | AS14.28 | 5.410381 |
AS14.12 | 4.506662 | AS14.29 | 4.377468 |
AS14.13 | 7.353911 | AS14.30 | 2.966854 |
AS14.14 | 4.541047 | AS14.31 | 3.009430 |
AS14.15 | 3.139621 | AS14.32 | 4.637708 |
AS14.16 | 4.735328 | AS14.33 | 7.052462 |
AS14.17 | 3.596140 |
From this table we also see that the RNNs perform much better than the naive approach. All things considered, the naive approach does not seem appropriate to adopt and can be considered a 'clueless' method: if we had no idea how to approach the problem, it would be the standard worst-case baseline for producing predictions. The naive approach can therefore indeed serve as a benchmark model.
4 Conclusion
Hotel recommendation systems are a popular research field. This paper provides a comprehensive pipeline for researchers to build such a system, from the raw data to the specific application. Although the results show that the two methods achieve a successful prediction system, they are only basic machine learning approaches, and many other approaches merit further exploration. For instance, evolutionary approaches have been applied in many areas [lan2020time]. Neuroevolution has been applied to evolving neural networks for real-time computer vision [lan2019evolving] and to evolutionary robotics [lan2019simulated, lan2019learning, lan2019evolutionary, lan2018directed]. Convolutional neural networks generally achieve remarkable performance in many areas [lan2016convolution], and we aim to apply them in this pipeline. Knowledge graphs are a popular method applied in many domains [Liu2020Influence, liu2019evidence], such as finance, medicine, biology, question answering, storing research information, and in particular recommendation systems. We will therefore use knowledge graphs to design the hotel recommendation system in future work. In addition, signal compression [lan2016bayesian, lan2017development, lan2016development] is an interesting technology for pre-processing raw data. These approaches are the directions in which we aim to extend this pipeline.