Location-Centered House Price Prediction: A Multi-Task Learning Approach

01/07/2019 · by Guangliang Gao et al. · Nanjing University, NetEase, Inc., RMIT University

Accurate house price prediction is of great significance to various real estate stakeholders such as house owners, buyers, investors, and agents. We propose a location-centered prediction framework that differs from existing work in terms of data profiling and prediction model. Regarding data profiling, we define and capture a fine-grained location profile powered by a diverse range of location data sources, such as a transportation profile (e.g., distance to the nearest train station), an education profile (e.g., school zones and rankings), a suburb profile based on census data, and a facility profile (e.g., nearby hospitals and supermarkets). Regarding the choice of prediction model, we observe that existing approaches either consider the entire house data for modeling, or split the entire data and model each partition independently. However, the latter ignores the relatedness between partitions, and there may not be sufficient training samples per partition for all prediction scenarios. We address this problem by conducting a careful study of exploiting the Multi-Task Learning (MTL) model. Specifically, we map the strategies for splitting the entire house data to the ways tasks are defined in MTL, aligning each resulting partition with a task. Furthermore, we select specific MTL-based methods with different regularization terms to capture and exploit the relatedness between tasks. Based on real-world house transaction data collected in Melbourne, Australia, we design extensive experimental evaluations, and the results indicate a significant superiority of MTL-based methods over state-of-the-art approaches. Meanwhile, we conduct an in-depth analysis of the impact of task definitions and method selections in MTL on prediction performance, and demonstrate that the impact of task definitions far exceeds that of method selections.




1 Introduction

With the improvement of people’s living standards, the demand for houses increases. In the United States (https://www.statista.com/statistics/226144/us-existing-home-sales/), house sales have grown by 34% in the last decade and reached a record high of 5.51 million last year. In Australia (https://tradingeconomics.com/australia/new-home-sales), house sales have increased by 36% since 2013. House price prediction [1, 2] has therefore attracted widespread attention because the prediction outcomes can help various real estate stakeholders make more informed decisions. For example, buyers use house price prediction to search for candidate houses that match their financial capabilities. Similarly, house owners need it to monitor the market and seek the best opportunity to sell. Moreover, real estate sales agents rely on house price prediction to help customers identify market trends, and prediction accuracy has become an important criterion for measuring the credibility of house sales agents.

House price is considered to be related to various house features. In general, these features fall into two categories: non-geographical features, such as the number of bedrooms and floor space area, and geographical features, such as the distance to the city center and the quality of nearby schools. Research on house price prediction therefore defines appropriate models that fit these features to predict house price. The hedonic price model [3, 4], proposed from the perspective of economics, is the most typical representative and has been studied extensively in the house price prediction literature [5, 6, 7]. However, it is primarily used for analyzing the relationship between house price and house features, typically via regression methods. In recent years, with the extensive application of machine learning in various fields, house price prediction using machine learning methods, such as ANN (Artificial Neural Network) [8], SVM (Support Vector Machine) [9, 10], and AdaBoost (Adaptive Boosting) [11], has also received increasing attention.

A careful examination of these studies reveals that modeling is usually carried out from a global view, i.e., implemented directly on the entire house data. However, the prediction performance of such approaches is not satisfactory, especially as the scale of the house data increases. The primary reason is that, from the global view, the weights of the various house features are assumed to be constant in the formulation of house price. In fact, the impact of house features on house price varies from house to house. For example, transportation features may have a greater impact on house price in the suburbs than in urban centers, while the difference in house price between school districts and non-school districts is concentrated mainly on education features.

Numerous studies [12, 13, 14, 15] have also demonstrated that the location of a house and its surrounding community have a significant impact on its price, though at a very coarse granularity. As a consequence, researchers began splitting the entire house data into several partitions and designing prediction models for each partition individually. However, no matter which splitting strategy is used, two challenges limit further improvements in prediction performance along this line: (1) independent modeling ignores the relatedness among the partitions; (2) each model uses only the data of its corresponding partition, so data sparsity, especially in partitions where the number of original samples is small, is a serious problem.

To address the aforementioned issues, we exploit the framework of Multi-Task Learning (MTL) to model the house price prediction problem. MTL [16, 17, 18, 19] is a type of transfer learning: when there are multiple related tasks and each task has limited training samples, the model can enhance performance by extracting and utilizing information shared among the tasks. MTL has been applied in many domains, including time series analysis [20], stock selection [21], event forecasting [22, 23], disease progression modeling [24, 25], and water quality prediction [26]. By using MTL, we inherit the principle of modeling the house price prediction problem from local views, because the entire house data is grouped into multiple tasks. Moreover, MTL considers the relatedness among tasks, so the shortcomings of insufficient samples and independent modeling can also be addressed.

There are two key points in MTL: (i) how to define tasks and (ii) how to characterize the relatedness among tasks. The first point is generally data-driven: different strategies for defining tasks result in different task sets, which in turn determine the relatedness among tasks. We split the entire house data and define each resulting partition as a task, so a splitting strategy is equivalent to a strategy for defining tasks in MTL. The second point is method-driven: once a multi-task problem is formulated, the design of a specific method determines the level of learning. We select MTL-based methods with regularization terms to capture and utilize the relatedness among tasks, where different regularization terms represent different levels of learning. To this end, in this paper we present an organized study on exploiting MTL for the problem of house price prediction. The specific contributions of our work are as follows:

  • We formulate house price prediction as an MTL problem. To the best of our knowledge, this work is the first to use MTL to solve the house price prediction problem, providing a new perspective on the study of house price prediction. Additionally, our work also enriches the application fields of MTL.

  • We define and capture a fine-grained location profile powered by a diverse range of location data sources. We observe that the location of a house plays a critical role in house price prediction. Therefore, we focus on enriching the location-driven house features and grouping them into four fine-grained profiles, namely house, education, transportation, and facility.

  • We demonstrate the superiority of MTL-based methods over state-of-the-art approaches on real-world house data. Based on our house data, we evaluate the prediction performance of MTL-based methods and five state-of-the-art approaches. Experimental results show that MTL-based methods consistently outperform these competing approaches.

  • We conduct an in-depth analysis on the impact of task definitions and method selections on prediction performance. We design two categories of strategies to define tasks and select three MTL-based methods with different regularization terms. By comparing their corresponding prediction performance, we reveal that the impact of task definitions on prediction performance far exceeds that of method selections.

The remainder of this paper is organized as follows. We review the related work in Section 2, and Section 3 provides a comprehensive profiling of our location-centered house data. The MTL-based house price prediction is described in Section 4, and Section 5 shows our experimental results. Finally, we conclude the paper and give guidelines for our future work in Section 6.

2 Related Work

TABLE I: Summary of our data profiles and comparisons with most of the existing house price prediction work ([5], [6], [7], [8], [11], [12], [13], [15], [27], [28], [29], [30], [31], [32]). The comparison covers the scale of the data (on the order of 1,000, 10,000, or 100,000 records) and the features used: floor area and number of bedrooms; geo-information, address, and suburb; air conditioning, water, heating, and views; nearby schools, school districts, and school rankings; nearby public transport and travel time to work; hospitals and shops, and distance to the nearest hospitals. Our location-centered house data is more comprehensive in terms of house transaction records and house features than those used in the literature.

A house is usually treated as a heterogeneous good, defined by a bundle of utility-bearing features [27, 33]. Therefore, house price can be considered a quantitative representation of a set of these features. Over the past decades, a large number of studies have examined the relationship between house price and house features. For example, Król [6] investigated the relationship between the price of an apartment and its significant features based on the results of hedonic analysis in Poland. The work of [7] discussed which house features have negative or positive effects on the value of a house in Turkey. Kryvobokov and Wilhelmsson [28] derived weights for the relative importance of location features that influence the market values of apartments in Donetsk, Ukraine. Ottensmann et al. [29] compared measures of location using both distance and travel time to the CBD and to multiple employment centers, to understand how residence location relative to employment location affects house price in Indianapolis, Indiana, USA. Ozalp and Akinci [30] determined the housing and environmental features that were effective on residential real estate sale prices in Artvin, Turkey.

Based on this broad study of the relationship between house price and various house features, house price prediction approaches estimate house price from input house features. Depending on whether an approach relies on a global model, the existing house price prediction methods can be divided into two categories.

The global model predicts house price from a range of its constituent features and is usually fit directly on the entire house data. Much work has been done along this line. Selim [8] examined the determinants of house price in Turkey using the hedonic model and demonstrated that an artificial neural network can be a better alternative for predicting house price in Turkey. Gu et al. [9] proposed a hybrid genetic algorithm and support vector machine approach (G-SVM) to predict house price; cases from China demonstrated the predictive ability of the method. Wang et al. [10] proposed a novel SVM-based model to predict the average house price in different years, and demonstrated that the PSO algorithm can effectively determine the parameters of the SVM. Park and Bae [11] developed a general prediction model based on machine learning methods such as C4.5, RIPPER, Naive Bayes, and AdaBoost, and compared their classification accuracy. The reason global modeling is widely adopted in house price prediction is obvious: it is easy to apply and can reveal the comparative size of the effects of various features on house price. However, global modeling ignores the impact of house location and surroundings on house price, so its prediction performance is often unsatisfactory as the scale of the house data increases.

Recent studies have focused on house price prediction from local views, which has gradually become a serious alternative to and extension of conventional house price modeling approaches. Among these studies, Bourassa et al. [12] compared alternative methods for taking spatial dependence into account in house price prediction, and concluded that a geostatistical model with disaggregated submarket variables performed best. Case et al. [13] investigated the hedonic model and three spatial models, using out-of-sample prediction accuracy for comparison; their results indicated the importance of incorporating the nearest neighbor transactions for predicting housing values. Gerek [14] designed two different adaptive neuro-fuzzy approaches for prediction, namely ANFIS with grid partition (ANFIS-GP) and ANFIS with subtractive clustering (ANFIS-SC), and the results indicated that the performance of ANFIS-GP was slightly better than that of ANFIS-SC. Montero et al. [15] considered parametric and semi-parametric spatial hedonic model variants to reflect spatial effects in house price. The proposed model is a mixed model that accounts for spatial autocorrelation, spatial heterogeneity, and (smooth and nonparametrically specified) nonlinearities using the penalized splines methodology. Their results suggest that nonlinear models are the best strategies for house price prediction.

Although the house price prediction problem has been widely studied, our work differs significantly from most of the existing work in the following aspects. First, as shown in Table I, our house data (detailed in the next section) is more adequate in terms of house transaction records and house features than the house data used in the literature, which enables us to better explore the impact of various house features on house price. Second, the existing studies, whether global or local, can be summarized as traditional Single-Task Learning (STL), while we use MTL to study house price prediction. To the best of our knowledge, our work is the first to adopt MTL for house price prediction. In STL, each task is considered independent and learned individually; in MTL, tasks are learned simultaneously by exploiting the relatedness between them. Combined with our previous statements, the principle of MTL helps us better model the house price prediction problem from a finer-grained local view.

3 House Data Profiling

In this section, we describe the comprehensive real estate dataset used in this study. We first describe four location-centered profiles, each of which contains a variety of features. Then we analyze the house data to further motivate introducing MTL to predict house prices.

3.1 Data description

We utilize house data from Melbourne (http://www.realestate.com.au/) [34, 35], one of the largest cities in Australia. The data includes houses sold in Melbourne’s metropolitan area since 2011. We extract 136,394 house transaction records from Jan. 2015 to Jan. 2018 to generate the dataset for this study. The house features are divided into four profiles to better observe the impact of different types of features on house price: house profile, education profile, transportation profile, and facility profile.

House profile. In this group, we choose seven relevant features about the house itself. The number of bedrooms, the number of bathrooms, and land size are the most common basic features. The number of parking spaces has become increasingly related to price in the booming housing market, so we include this feature. Considering the possible correlation between price and income, we also include family weekly income as an independent feature. Moreover, geographical information has a great impact on house price. In the vast majority of cases, consumers are more concerned with general locations than with detailed addresses, so we choose two regional features, the statistical area level (http://www.abs.gov.au/; the geographical areas defined for processing and releasing Australian census data) and the postal code, to reflect the impact of geographical information. Table II reports the number of partitions at the different statistical area levels and for the postal code.

Statistical area level #Partitions
Postal code 547
TABLE II: Number of partitions at the different statistical area levels. The coarsest-grained level splits the area into the smallest number of partitions, while the finest-grained level splits it into the largest number of partitions.

Education profile. In recent years, educational resources have received increasing attention. We therefore either find the exact primary and secondary school districts (http://melbourneschoolzones.com/; top 20% of schools) that each house belongs to, or map the house to its nearest primary and secondary schools. We use the corresponding school rankings (http://bettereducation.com.au/) as four features to examine their impact on house price.

Transportation profile. Since transportation networks have always been of great concern, we set up six features: (1) the distance and walking time from each house to its nearest train station, using the Google Maps API (http://developers.google.com/maps/); (2) the distance and travel time between each pair of train stations, based on GTFS (General Transit Feed Specification) data (http://www.data.vic.gov.au/data/dataset/); and (3) the distance and self-driving time from the house to the city center, i.e., the main Central Business District (CBD), using the Google Maps API.

Facility profile. Proximity to facilities such as shops, hospitals, clinics, and supermarkets may affect house price as well. Therefore, we introduce four features, one per facility type, describing the distance between a given house and its nearest facility of that type, computed with the Google Maps API.
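As an illustrative sketch of how such nearest-facility distance features can be derived, the following substitutes the great-circle (haversine) distance for the Google Maps API used in the paper; the coordinates are made-up examples:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_facility_distance(house, facilities):
    """Distance from a house to its closest facility of one type."""
    return min(haversine_km(house[0], house[1], f[0], f[1]) for f in facilities)

# Hypothetical coordinates near the Melbourne CBD.
house = (-37.8136, 144.9631)
supermarkets = [(-37.8170, 144.9650), (-37.8000, 144.9500)]
d = nearest_facility_distance(house, supermarkets)
```

In the actual pipeline the road-network distance from a routing service would replace the haversine approximation, but the feature construction (minimum over facilities of one type) is the same.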

Table III summarizes all selected house features and their definitions, along with some important statistics of our dataset. The value we want to predict, i.e., the house price at a given sales time, is the target. Figure 1a describes the trend of the average house price in each month over the three years. The average house price clearly fluctuates over time, which indicates that house price is time-sensitive. Location is also one of the important factors shaping the price of a house; without loss of generality, we choose the statistical-area-based partitions here. As shown in Figure 1b, the average house price differs across partitions. Therefore, spatial dependence also clearly exists in house price.

Category Name of features Descriptions Min. Max. Median Std. Dev.
House The number of bedrooms 1 5
The number of bathrooms 1 3
The number of parking spaces 1 5
The land size of the house () 340 2500 708.79 291.73
Family weekly income () 935 2836 1553.91 387.96
Statistical area level SA1 SA4
Postal code 3000 3996
Education School district where the house is located 1 100
School closest to the house 1 500
The ranking of primary school 3 500
The ranking of secondary school 1 500
Transportation Distance to the nearest train station () 23 5040 2026.22 1186.78
Walking time to the nearest train station () 1 126 34.75 20.36
The train distance to city center () 1300 82600 35239.52 14594.42
The train time to city center () 6 101 48.33 16.64
The self-driving distance to city center () 1245 83497 35109.47 14678.34
The self-driving time to city center () 10 120 47.98 18.58
Facility The distance to nearest shopping center () 5 4999 1643.19 961.48
The distance to nearest hospital () 15 5000 1832.83 1102.66
The distance to nearest clinic () 8 4999 957.21 745.01
The distance to nearest supermarket () 25 5000 1452.63 847.71
The sales time of the houses Jan. 2015 Jan. 2018
The sales price of the houses () 262 2090 680.54 353.97
TABLE III: List of selected house features and statistics of our house data.
Fig. 1: Three-year average house price trend by month and by partition, respectively. Average house price fluctuates over time and varies by partition.

3.2 Data insight

Because of the time sensitivity and spatial dependence of house price described above, researchers [12, 13, 31, 15] intuitively split the house data and model each partition individually. However, such modeling usually does not handle the house price prediction problem well. The two challenges that affect prediction performance have been mentioned in the previous sections; we now elaborate on them by analyzing the house data.

The first reason is that no matter which splitting strategy is adopted, it is difficult to ensure that the number of samples allocated to each generated partition is sufficient. We choose two partition schemes as cases. Figure 2 shows the number of samples per partition during the three years. The number of samples varies greatly across partitions. Moreover, as the split is further refined, the number of samples in each partition becomes generally small; in areas where the number of original samples is already insufficient, the impact of splitting is even more pronounced, which reduces prediction performance (see the empirical results for the STL-based approaches in Tables VI and VII in the Experiments section).
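The per-partition sample counts behind this observation can be sketched as follows; the partition identifiers and prices are toy values standing in for the real dataset:

```python
from collections import Counter

# Toy transaction records: (partition_id, price). Partition ids stand in for
# statistical areas or postal codes; the real data has 136,394 records.
records = [("3000", 910_000), ("3000", 1_200_000), ("3052", 750_000),
           ("3052", 680_000), ("3052", 800_000), ("3996", 430_000)]

counts = Counter(pid for pid, _ in records)
# Partitions with few samples are where independent per-partition models
# suffer most from data sparsity.
sparse = [pid for pid, n in counts.items() if n < 2]
```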

Fig. 2: Number of samples per partition under the two partition schemes. The number of samples varies greatly across partitions. As the split is further refined, the number of samples is generally small.

Another important reason is that independent modeling ignores the relatedness between partitions. In fact, partitions are not completely independent, and there are many explicit or implicit connections between them. For example, two geographically separate partitions may belong to the same school district; when analyzing the impact of schools on house price, the two partitions should therefore be merged rather than separated. To better illustrate this intuition, we count the number of features that belong to multiple partitions under the two partition schemes. The results are summarized in Table IV. Features belonging to multiple partitions are very common, and the phenomenon becomes more apparent with further splitting. This also indicates the importance of preserving the relatedness between partitions when modeling the house price prediction problem.

n=1 n=2 n≥3 n=1 n=2 n≥3
PRIMARY 46 50 225 34 30 257
SECONDARY 40 32 148 24 23 173
123 70 25 11 33 174
289 64 17 84 84 202
328 83 13 94 87 243
2133 112 7 1386 633 233
361 62 12 65 101 269
TABLE IV: Number of features that belong to only one partition (n=1), two partitions (n=2), and at least three partitions (n≥3) under the two partition schemes.

Based on the above data analysis, we identify the two challenges that affect the prediction performance of existing approaches from the data view. Meanwhile, considering the inherent relatedness of the partitions due to the consistency of the house features, we cast the house price prediction problem as an MTL problem. Within the MTL framework, we can not only reflect the relatedness between partitions well, but also resolve the dilemma of insufficient samples in some partitions.

4 MTL-based House Price Prediction

In this section, we first describe our house price prediction problem and provide preliminaries about MTL. Then we formulate the problem of MTL for house price prediction. To facilitate our illustration, the notations used throughout this paper are presented in Table V.

Notations Explanations
i, m a task (partition), the number of tasks (partitions)
Δt, t a time interval, a timestamp
h prediction horizon
n_i number of samples for task i in a time interval
N_i number of samples for task i in k time intervals
d number of features
X_i, Y_i training input and output for task i
w_i, W feature weight parameter for task i, weight matrix over all tasks
L, Ω empirical loss and regularization term
ρ_ij ratio of average house price for two tasks i and j
TABLE V: Notations and explanations.

4.1 Problem description and preliminaries

Consider that the house data contains m partitions and that in each time interval Δt (e.g., a month or a quarter), each partition i has n_i house transaction records. Given a timestamp t, the objective of our house price prediction is to predict the price of the houses that appear in each partition from t to t+h, based on the historical transaction data collected before t, where h is a specific prediction horizon. Note that multiple timestamps correspond to multiple predictions.
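The windowed construction of per-task training data described above can be sketched as follows; the record layout and helper name are illustrative, not from the original implementation:

```python
def training_window(records, t, k):
    """Collect, per task, the samples from the k time intervals before t.

    records: dict task_id -> list of (timestamp, features, price).
    Returns task_id -> (X, Y) with X a list of feature vectors, Y the prices.
    """
    out = {}
    for task, rows in records.items():
        window = [(x, y) for (ts, x, y) in rows if t - k <= ts < t]
        if window:
            xs, ys = zip(*window)
            out[task] = (list(xs), list(ys))
    return out

# Toy data: two tasks, integer timestamps standing in for months.
records = {"A": [(1, [3, 1], 700), (2, [4, 2], 950), (5, [2, 1], 500)],
           "B": [(2, [3, 2], 820)]}
train = training_window(records, t=3, k=2)   # uses intervals 1 and 2 only
```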

In this paper, we treat each house price prediction as a problem jointly learned by multiple tasks. There are two key points in formulating such an MTL problem: defining the tasks and characterizing the relatedness among them. The first point is usually determined by the information in the data and the specific application scenario. For example, in the widely used case of predicting student performance in schools, the 139 schools involved were defined as 139 tasks. For the second point, methods with regularization terms are generally used to formulate the relatedness among tasks, and different regularization terms represent different formulations. For example, the ℓ2,1-norm means that all tasks share a common set of representations.

Here, we define the partitions contained in the house data as tasks, and each partition is aligned with a task. Thus, the two key points in MTL-based house price prediction become how to construct the tasks and what methods are used to model the relatedness among these tasks. Next, we will introduce our strategies for each of these two issues.

4.2 Task definition

In the literature [25, 23, 26], tasks are usually uniquely identified and given directly, thus the impact of task definitions on performance has also not received sufficient attention. However, as an essential element of using MTL, there are various ways to define tasks, even in the same application scenario. For example, in the case of predicting student performance in schools as described above, in addition to defining each school as a task, we can also group adjacent schools into one task. Obviously, this change in task definition will affect the subsequent steps of the MTL formulation. Therefore, for our house price prediction problem, we propose two categories of strategies for task definition, and explore the impact of different task definitions on prediction performance.

4.2.1 Defining tasks based on one single profile

Existing STL-based house price prediction approaches [12, 31, 15] can be grouped into this category, usually from the perspective of geographical factors. For example, we can define the area of a postal code as one task. One reason for this definition is that geographical factors are the most intuitive expression of house price differences, so task definition along this line is the most common one. Another reason is that the house data used by the above studies includes limited features that affect house price, making it difficult to find more ways to define tasks. Considering that our data set contains a (much richer) variety of house features, we conduct a wide range of task definition attempts and select the following four cases as representatives of this category.

In the house profile, we use the statistical area levels to split the house data. The four area levels give four splitting strategies, and one partition at a given level is defined as a task at that level. For example, one of the levels has 17 partitions, so we obtain 17 tasks at that level. Similarly, we also consider task definition based on the postal code, where one postal code partition corresponds to one task.

In the education profile, we employ the concept of school districts to split the house data, and each school district serves as one task. The primary and secondary school districts lead to two splitting strategies. In addition, we note that the attention to the school district is closely related to the ranking of the school. Therefore, we mainly focus on the school districts of top schools.

In the transportation profile, there is no obvious perspective for defining tasks compared with the previous two profiles. Considering that distance/time is an important criterion for measuring transportation conditions, we define each train station as the centroid of a task and determine the scope of the task by specifying a distance/time threshold to the train station. Houses that share the same nearest train station and whose distance/time to that station does not exceed the pre-specified threshold therefore belong to the same task.

In the facility profile, we group houses according to the similarity of their facilities. Specifically, our dataset covers four types of facilities, so we use four criteria for measuring similarity: sharing one, two, three, or four facility types. Given a criterion and the facility types involved, we group the corresponding houses into one task. For example, under the one-type criterion with the market as the facility, we group houses that share the same nearest market into one task; under the two-type criterion with shop and hospital, we group houses that share the same nearest shop and hospital into one task.
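The facility-based grouping can be sketched as follows; the house and facility identifiers are hypothetical, and the function name is ours:

```python
from collections import defaultdict

def tasks_by_shared_facilities(houses, facility_types):
    """Group houses whose nearest facilities coincide for the given types."""
    tasks = defaultdict(list)
    for house_id, nearest in houses.items():
        key = tuple(nearest[ft] for ft in facility_types)
        tasks[key].append(house_id)
    return dict(tasks)

# Toy mapping: each house to its nearest facility of each of the four types.
houses = {
    "h1": {"shop": "s1", "hospital": "p1", "clinic": "c1", "market": "m1"},
    "h2": {"shop": "s1", "hospital": "p1", "clinic": "c2", "market": "m1"},
    "h3": {"shop": "s2", "hospital": "p1", "clinic": "c1", "market": "m2"},
}
one_type = tasks_by_shared_facilities(houses, ["market"])            # share one type
two_type = tasks_by_shared_facilities(houses, ["shop", "hospital"])  # share two types
```

Tightening the criterion (more facility types in the key) yields more, smaller tasks, which is exactly the sparsity trade-off discussed below.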

4.2.2 Defining tasks based on multiple profiles

The above task definitions extract one feature profile at a time as a guideline. Such definitions not only constrain the differences in the specific feature profile among the houses in each task, but also ensure relatedness among tasks in terms of that profile. However, the relatedness among tasks that depends on a single feature profile is relatively weak. By introducing more feature profiles as guidelines for defining tasks, the relatedness among the resulting tasks can be strengthened; however, as the definition is refined, the number of houses in each task may become insufficient.

To guarantee a sufficient number of houses for each task while enhancing the relatedness between tasks, we consider six cases obtained by combining any two of the above four task definitions, where each resulting partition corresponds to a task:

  1. statistical regions and school districts;

  2. statistical regions and transportation areas;

  3. statistical regions and neighbor facilities;

  4. school districts and transportation areas;

  5. school districts and neighbor facilities;

  6. transportation areas and neighbor facilities.
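Each combined case amounts to keying tasks on a pair of partition attributes; a minimal sketch, with hypothetical attribute names and values:

```python
def combined_tasks(houses, key_a, key_b):
    """Define one task per (key_a, key_b) pair, e.g. (statistical region, school district)."""
    tasks = {}
    for house_id, attrs in houses.items():
        tasks.setdefault((attrs[key_a], attrs[key_b]), []).append(house_id)
    return tasks

# Toy example for case 1: statistical regions and school districts.
houses = {"h1": {"region": "SA2-101", "school": "primA"},
          "h2": {"region": "SA2-101", "school": "primA"},
          "h3": {"region": "SA2-101", "school": "primB"}}
t = combined_tasks(houses, "region", "school")
```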

4.3 The MTL model

In this paper, we regard the MTL-based house price prediction problem as a multi-task regression problem. Given a timestamp t, we extract the transaction records in the previous k time intervals to construct the training input X_i ∈ R^{N_i×d} and output Y_i ∈ R^{N_i} for each task i. Here, N_i is the number of transaction records of task i in the k time intervals, d is the number of house features, and Y_i contains the actual house prices. Thus, for each task we want to infer a linear function f_i(x) = w_i^T x, where x ∈ R^d and w_i ∈ R^d. Let W = [w_1, …, w_m] ∈ R^{d×m} denote the weight matrix over the m tasks. One typical MTL model estimates W by minimizing the following objective function:

min_W ∑_{i=1}^{m} L(Y_i, X_i w_i) + Ω(W),

where L is the empirical loss (e.g., the squared loss) and Ω(W) is the regularization term that controls the common information shared among tasks.

There can be various choices of regularization terms to fit the above objective function, and the specific choice is based on the identification of the relatedness among the defined tasks. In our house price prediction problem, it is too strict or even unrealistic to use only one type of regularization term because there are various task definitions. Moreover, the purpose of this paper is to study the application of MTL for the house price prediction problem, rather than designing a sophisticated MTL-based method to fit all task definitions. Therefore, we choose three different regularization terms to model the relatedness between tasks, and thus investigate the impact of different MTL-based methods on prediction performance.

The first way is to constrain the models of all tasks to be close to each other. The $\ell_1$-norm regularization is widely used because it reduces model complexity and supports feature learning by introducing sparsity into the model; a common simplification of the $\ell_1$-norm in MTL is that the parameter controlling the sparsity is shared among all tasks. The objective function can then be defined as:

$$\min_{W} \; \sum_{i=1}^{m} \| X_i w_i - y_i \|_2^2 + \lambda \| W \|_1,$$

where $\lambda$ is the parameter that controls sparsity.
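For intuition, the $\ell_1$ penalty enters a proximal-gradient solver through element-wise soft thresholding. The following is a minimal sketch under our own notation (a d x m weight matrix W), not the paper's implementation.

```python
import numpy as np

def soft_threshold(W, tau):
    """Prox of tau * ||W||_1: shrink every entry of W toward zero by tau."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

# Toy weight matrix; tau would be (step size * lambda) inside a solver.
W = np.array([[1.5, -0.2],
              [-3.0, 0.6]])
W_shrunk = soft_threshold(W, 0.5)
```

Entries whose magnitude is below tau are zeroed, which is how the penalty induces sparsity.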

The second way is to assume all tasks share a common yet latent representation, such as a common set of features or a common subspace. This motivates group sparsity, and the $\ell_{2,1}$-norm regularization is usually used to implement this assumption. The objective function can be expressed as:

$$\min_{W} \; \sum_{i=1}^{m} \| X_i w_i - y_i \|_2^2 + \lambda \| W \|_{2,1},$$

where $\lambda$ is the parameter controlling the group sparsity, and $\| W \|_{2,1}$ sums the $\ell_2$-norms of the rows of $W$.
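The corresponding proximal operator zeroes out entire rows (features) of W at once, which is exactly what group sparsity means here. A minimal sketch under our notation:

```python
import numpy as np

def l21_prox(W, tau):
    """Prox of tau * sum_j ||W[j, :]||_2: row-wise group shrinkage."""
    row_norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    scale = np.maximum(1.0 - tau / row_norms, 0.0)
    return scale * W

W = np.array([[3.0, 4.0],    # row norm 5   -> scaled by (1 - 1/5) = 0.8
              [0.3, 0.4]])   # row norm 0.5 <= tau -> whole row zeroed
W_grouped = l21_prox(W, 1.0)
```

A zeroed row means the corresponding feature is discarded jointly for all tasks.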

Besides these two most common methods, we also consider enforcing the relatedness between tasks by adding graph regularization. Specifically, the structural relatedness among tasks is represented by a graph: each task is a node, and two nodes are connected by a weighted edge. The overall objective function can be described as:

$$\min_{W} \; \sum_{i=1}^{m} \| X_i w_i - y_i \|_2^2 + \lambda_1 \sum_{i<j} \omega_{ij} \| w_i - w_j \|_2^2 + \lambda_2 \| W \|_{2,1},$$

where $\omega_{ij}$ is the connection strength between nodes (tasks) $i$ and $j$, and $\lambda_1$ and $\lambda_2$ are parameters for graph regularization and group sparsity, respectively.

We define $\omega_{ij}$ as the ratio of the average house prices of the two nodes to measure the structural relatedness between tasks $i$ and $j$. Intuitively, the larger $\omega_{ij}$ is, the more the graph regularization term forces $w_i$ to be close to $w_j$. Meanwhile, the closer $w_i$ and $w_j$ are, the more similar the average house prices of these two nodes should be, i.e., $\omega_{ij}$ tends to 1. Thus, we compute $\omega_{ij}$ as follows:

$$\omega_{ij} = \frac{\min(\bar{p}_i, \bar{p}_j)}{\max(\bar{p}_i, \bar{p}_j)},$$

where $\bar{p}_i$ denotes the average house price of task $i$.
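A minimal sketch of this edge-weight computation; the bounded min/max form of the ratio is our reading of the text (it keeps the weight in (0, 1] and drives it toward 1 when the two task averages are similar), and the average prices are toy values.

```python
def edge_weight(avg_price_i, avg_price_j):
    """Structural relatedness of two tasks from their average prices."""
    return min(avg_price_i, avg_price_j) / max(avg_price_i, avg_price_j)

# Illustrative per-task average prices.
avg_prices = {"task_a": 800_000.0, "task_b": 1_000_000.0, "task_c": 810_000.0}
w_ab = edge_weight(avg_prices["task_a"], avg_prices["task_b"])
w_ac = edge_weight(avg_prices["task_a"], avg_prices["task_c"])
```

Tasks with nearly equal average prices (a and c) get a weight close to 1, so their models are pulled together more strongly.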
All of the above optimization problems can be solved using the accelerated gradient descent method [36]. In this paper, we apply the implementation of the accelerated gradient descent method included in the MALSAR [37] package to solve the optimization efficiently.
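MALSAR itself is a MATLAB package; as an illustration only, an accelerated (FISTA-style) proximal gradient loop for the $\ell_{2,1}$-regularized objective can be sketched in Python as follows. The fixed step size, loss scaling, and toy data are our simplifications, not the package's API.

```python
import numpy as np

def l21_prox(W, tau):
    """Row-wise group shrinkage: prox of tau * sum_j ||W[j, :]||_2."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return np.maximum(1.0 - tau / norms, 0.0) * W

def mtl_l21(Xs, ys, lam=0.1, n_iter=300):
    """FISTA for min_W sum_i 0.5*||X_i w_i - y_i||^2 + lam*||W||_{2,1}."""
    d, m = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, m))
    Z = W.copy()
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)  # 1/Lipschitz bound
    t = 1.0
    for _ in range(n_iter):
        grad = np.column_stack([X.T @ (X @ Z[:, i] - y)
                                for i, (X, y) in enumerate(zip(Xs, ys))])
        W_new = l21_prox(Z - step * grad, step * lam)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2   # momentum update
        Z = W_new + ((t - 1) / t_new) * (W_new - W)
        W, t = W_new, t_new
    return W

# Two related toy tasks that share the same two informative features.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(50, 4)) for _ in range(2)]
true_w = np.array([[1.0, 1.2], [-2.0, -1.8], [0.0, 0.0], [0.0, 0.0]])
ys = [X @ true_w[:, i] for i, X in enumerate(Xs)]
W = mtl_l21(Xs, ys, lam=0.5)
```

The recovered W keeps the two shared informative rows and shrinks the two irrelevant rows toward zero jointly across both tasks.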

5 Experiments

In this section, we comprehensively evaluate the performance of our methodology against five STL-based approaches for house price prediction, analyze in depth the impact of task definitions and method selections on MTL-based house price prediction, and examine the prediction performance of each task individually, in order to demonstrate that:

  • MTL-based methods can significantly outperform the STL-based approaches. (Section 5.3)

  • The impact of task definitions on prediction performance far exceeds that of method selections. (Section 5.4)

  • Using MTL to preserve the relatedness among tasks brings clear advantages. (Section 5.5)

5.1 Training and test sets

The data set studied in this experiment has been detailed in Section 3. Given the three-year records of sold houses, we first obtain several tasks based on a strategy for defining tasks. Then we evaluate the performance of our methodology by predicting the price of the houses in each task per month, i.e., the prediction interval is one month. In each prediction, we create a training set and a test set for each task. The training set consists of the information tabulation (features, sale price) of the houses sold in the months preceding the selected month, and the test set consists of the data samples in the selected month. Meanwhile, we use the semi-logarithmic form of the sale price to fit the data in the training and test sets.
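The rolling monthly split described above can be sketched as follows; the column names, the toy records, and the default of three training months are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def monthly_split(df, predict_month, k=3):
    """Train on the k months before predict_month; test on predict_month."""
    months = pd.PeriodIndex(df["sale_month"], freq="M")
    target = pd.Period(predict_month, freq="M")
    train = df[(months >= target - k) & (months < target)].copy()
    test = df[months == target].copy()
    for part in (train, test):
        part["log_price"] = np.log(part["price"])  # semi-logarithmic form
    return train, test

# Toy transactions.
df = pd.DataFrame({
    "sale_month": ["2015-01", "2015-02", "2015-03", "2015-03", "2015-04"],
    "price": [650_000, 700_000, 720_000, 680_000, 710_000],
})
train, test = monthly_split(df, "2015-04", k=3)
```

Here the January-to-March sales form the training set and the April sales form the test set.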

In order to analyze the impact of the number of months used for training on prediction, we use the split of the whole data set according to one selected task definition as a case and define the December of each year as the prediction horizon. We first count the number of samples for each task in December and extract those tasks whose number of samples exceeds the first quartile (1/4) of the data distribution as the test set. Then we collect data samples from the previous one month up to the previous eleven months as different training sets and fit them using linear regression to predict the price of the houses in December. The prediction performance is shown in Figure 3a, from which we have two main observations. (1) The prediction error increases with the number of previous months. This trend is not obvious in 2017; as mentioned earlier, the fluctuation of the average monthly house price in 2017 is not significant. (2) The previous three months are the most appropriate choice, especially in 2017. In 2015 and 2016, although this does not seem to be the best option, considering that shorter training periods may make the prediction performance less stable, it is reasonable to use the data from the previous three months as training data. Therefore, in the following experiments, unless stated otherwise, the number of previous months used for training is set to 3.

In summary, we have a total of 36 predictions, one per month over the three years. We supplemented the house data for the last three months of 2014 to ensure that the first three months of 2015 are predictable. The number of samples in the training and test sets for each prediction is shown in Figures 3b to 3d.

Fig. 3: Figure (a) shows the impact of the number of previous months used in the training set on the prediction performance for December each year. Figure (b) to Figure (d) show the number of samples in the training and test sets for each prediction.

5.2 Performance metrics

To evaluate the prediction performance of different methods, we employ two categories of evaluation metrics in the experiments.

The first category contains two widely used prediction evaluation measures: Root Mean Squared Error (RMSE) [38], which takes the square root of the average of the squared prediction errors, and Mean Absolute Error (MAE) [39], which averages the absolute error of each predicted result. For each task $i$, these two measures are defined as:

$$\mathrm{RMSE}_i = \sqrt{\frac{1}{n_i}\sum_{k=1}^{n_i}\left(y_k - \hat{y}_k\right)^2}, \qquad \mathrm{MAE}_i = \frac{1}{n_i}\sum_{k=1}^{n_i}\left|y_k - \hat{y}_k\right|,$$

where $y_k$ ($\hat{y}_k$) is the actual (estimated) house price and $n_i$ is the number of observations for task $i$. A better prediction results in a smaller value for both measures.
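Both measures are straightforward to compute; the values below are toy semi-log prices for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))

y_true = [13.2, 13.5, 13.1]   # toy actual semi-log prices
y_pred = [13.0, 13.6, 13.3]   # toy predicted semi-log prices
```

Since RMSE squares the errors before averaging, it penalizes large individual errors more heavily than MAE does.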

Given a task definition, we record the metric values of the methods on all tasks and use the mean as the performance of a method in a prediction. Similarly, the mean performance over all predictions is used as the overall performance of a method under this task definition. To make the performance comparison statistically sound, we also use Wilcoxon’s rank sum test at a significance level of 0.05 [40].
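The significance test can be run with SciPy's implementation of Wilcoxon's rank sum test; the two per-prediction error lists below are fabricated for illustration.

```python
from scipy.stats import ranksums

# Illustrative per-prediction errors of two methods.
errors_a = [0.19, 0.20, 0.18, 0.21, 0.19, 0.20]
errors_b = [0.25, 0.27, 0.26, 0.24, 0.28, 0.26]

stat, p_value = ranksums(errors_a, errors_b)
significantly_different = p_value < 0.05  # test at the 0.05 level
```

A p-value below 0.05 means the two error distributions differ significantly, so the gap between the methods is unlikely to be chance.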

The second category is the Win-Loss-Draw record [41]. It is a comparative descriptive statistic: the three values are, respectively, the number of data sets for which one method obtains better, worse, or equal performance than another method on a given measure. These summaries compare the performance of two methods across different data sets and indicate any systematic underlying advantage of one of the methods.
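Counting a Win-Loss-Draw record reduces to a pairwise comparison over data sets (lower error wins); the paired error lists below are illustrative.

```python
def win_loss_draw(errors_a, errors_b, tol=1e-9):
    """Count how often method A beats, loses to, or ties method B."""
    win = sum(a < b - tol for a, b in zip(errors_a, errors_b))
    loss = sum(a > b + tol for a, b in zip(errors_a, errors_b))
    draw = len(errors_a) - win - loss
    return win, loss, draw

# Per-data-set errors of method A vs. method B (toy values).
record = win_loss_draw([0.19, 0.22, 0.20, 0.21],
                       [0.21, 0.22, 0.19, 0.25])
```

The small tolerance treats near-identical errors as draws rather than spurious wins.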

5.3 Evaluation on overall prediction performance

We compare our methodology with the following five baseline approaches:

(1) Lasso Regression [42] is a linear model with $\ell_1$-norm regularization. It tends to prefer solutions that use a small number of features, improving the prediction accuracy and interpretability of the model. Increasing the penalty parameter produces more zeros in the feature coefficients.

(2) Ridge Regression [43] is also a linear model, which adds an $\ell_2$-norm regularization penalty. Although such a formulation loses some information and reduces the accuracy of the fit, the regression coefficients obtained are more realistic and reliable. The penalty parameter controls the extent of the loss.

(3) Support Vector Regression [44] is the natural extension of the large-margin kernel methods of support vector machines, used for classification, to regression analysis. It seeks to minimize an upper bound of the generalization error instead of the empirical error. The choice of kernel function, kernel coefficient, and penalty parameter determines its prediction performance.

(4) AdaBoost [45] is an algorithm that improves performance by using an ensemble of weak learners to create a strong learner. The outputs of the weak learners are combined into a weighted sum that represents the final output of the boosted learner. There are many variants of the AdaBoost algorithm depending on the choice of weak learners; here, we use AdaBoost.R2.

(5) Random Forest also belongs to the category of ensemble learning algorithms. Each base learner has the same weight of influence, and overfitting is reduced by combining them. Decision trees are often used as the base learners of the ensemble; in this study, the base learner is a regression tree. The number of trees in the ensemble is its design parameter.

We use the Python scikit-learn library [47] to implement the above approaches. There are various parameters and options in each approach. Specifically, we set the kernel function of Support Vector Regression to a Gaussian function, and the weak learner in AdaBoost.R2 is a decision tree regressor. The remaining parameters of each approach are determined by 5-fold cross-validation.
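A minimal sketch of this baseline setup with scikit-learn follows. The parameter grids and synthetic data are illustrative assumptions; note that `AdaBoostRegressor`'s default weak learner is already a depth-3 decision tree regressor.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for one task's training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=60)

baselines = {
    # 5-fold CV over illustrative penalty grids.
    "lasso": GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]}, cv=5),
    "ridge": GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5),
    # Gaussian (RBF) kernel for SVR.
    "svr": GridSearchCV(SVR(kernel="rbf"), {"C": [1.0, 10.0]}, cv=5),
    # AdaBoost.R2 with its default decision tree regressor weak learner.
    "adaboost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "rf": GridSearchCV(RandomForestRegressor(random_state=0),
                       {"n_estimators": [50, 100]}, cv=5),
}
for model in baselines.values():
    model.fit(X, y)
predictions = {name: model.predict(X[:3]) for name, model in baselines.items()}
```

Each `GridSearchCV` wrapper refits the estimator with the cross-validated best parameters before predicting.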

Tables VI and VII present the RMSE and MAE values for all tested approaches under different task definitions; the last row shows the mean of the RMSE and MAE for these approaches. We use the best-performing approach under each task definition as a benchmark and compare the approaches using Wilcoxon’s rank sum test, from which we have three main observations. (1) The RMSE and MAE values for all approaches fluctuate insignificantly, indicating that the performance of these approaches is stable. (2) The three MTL-based methods outperform all five baseline approaches, indicating that the tasks are not independent and that capturing their relatedness improves learning performance. (3) Among the MTL-based methods, the impact of the choice of method on overall performance is limited. Although the graph-regularized method enhances the relatedness between tasks, which may help improve performance, the experimental results show that this improvement is not significant.

5.4 Performance evaluation on different task definitions

We try two categories of task definitions, i.e., single-profile and multiple-profile ones. For each task definition, we evaluate three MTL-based methods that capture the relatedness between tasks in house price prediction. Given an MTL method, we choose the task definition under which it performs best as a benchmark, and compare its performance under the various task definitions by Wilcoxon’s rank sum test.

5.4.1 Task definitions based on one single profile

The results are summarized in Table VIII. It can be clearly seen that, no matter which method is involved, the rank sum test results between its performance under the various task definitions and its best performance are similar. For the first method, compared to the task definition with the best performance, significantly different task definitions exist in all four single profiles and are mainly concentrated in the education profile. The second method behaves very similarly. For the third method, although the differences between the education profile and the best scenario are still evident, the performance on the other profiles improves.

Comparing these experimental results, we can find: (1) Task definitions in the house price prediction can be diverse. Traditional

Category Task definition strategies
House 0.219 0.226 0.192* 0.268 0.260 0.338 0.257 0.244
0.191 0.191 0.189* 0.306 0.280 0.278 0.272 0.225
0.203 0.207 0.190* 0.350 0.310 0.273 0.303 0.223
0.203 0.206 0.191* 0.383 0.366 0.244 0.365 0.214
Education [1, 10] 0.244 0.228 0.209* 0.361 0.326 0.325 0.318 0.255
[1, 20] 0.259 0.244 0.219* 0.345 0.296 0.351 0.299 0.281
[1, 30] 0.267 0.253 0.207* 0.347 0.296 0.372 0.295 0.260
[1, 40] 0.247 0.239 0.202* 0.345 0.302 0.364 0.297 0.263
[1, 50] 0.262 0.252 0.210* 0.348 0.316 0.353 0.308 0.257
[1, 10] 0.263 0.248 0.229* 0.252 0.250 0.377 0.249 0.274
[1, 20] 0.242 0.229 0.213* 0.255 0.300 0.337 0.285 0.257
[1, 30] 0.235 0.227 0.211* 0.350 0.302 0.340 0.268 0.247
[1, 40] 0.262 0.251 0.218* 0.351 0.307 0.341 0.285 0.277
[1, 50] 0.258 0.247 0.216* 0.354 0.314 0.339 0.290 0.258
Transportation [0, 1000] 0.213 0.211* 0.221 0.351 0.294 0.245 0.270 0.255
[0, 2000] 0.202 0.201* 0.204 0.314 0.313 0.244 0.311 0.230
[0, 3000] 0.197* 0.197 0.198 0.323 0.322 0.243 0.322 0.221
[0, 4000] 0.194* 0.195 0.194 0.295 0.293 0.250 0.292 0.217
[0, 5000] 0.195* 0.196 0.197 0.306 0.303 0.248 0.303 0.220
Facility _S () 0.199 0.200 0.193* 0.328 0.323 0.262 0.224 0.221
_H () 0.193 0.192* 0.192 0.329 0.324 0.263 0.224 0.221
_G () 0.201 0.201 0.194* 0.371 0.361 0.229 0.260 0.226
_M () 0.189 0.188* 0.193 0.316 0.314 0.249 0.216 0.222
_S, H 0.193 0.192* 0.194 0.350 0.332 0.243 0.221 0.222
_S, G 0.186 0.185* 0.192 0.338 0.336 0.223 0.232 0.228
_S, M 0.192 0.191* 0.194 0.341 0.359 0.240 0.235 0.222
_H, G 0.188 0.187* 0.194 0.328 0.338 0.224 0.234 0.228
_H, M 0.190 0.189* 0.193 0.349 0.337 0.237 0.225 0.223
_G, M 0.188 0.187* 0.194 0.335 0.337 0.225 0.234 0.231
_S, H, G 0.186 0.185* 0.192 0.347 0.332 0.219 0.228 0.230
_S, H, M 0.190 0.189* 0.194 0.346 0.364 0.232 0.260 0.227
_S, G, M 0.186 0.185* 0.192 0.332 0.332 0.221 0.219 0.230
_H, G, M 0.186 0.185* 0.194 0.346 0.334 0.221 0.230 0.230
_S, H, G, M 0.185 0.184* 0.192 0.351 0.323 0.217 0.240 0.231
, [1, 40] 0.194* 0.201 0.194 0.476 0.476 0.266 0.476 0.206
, [1, 30] 0.195 0.205 0.193* 0.443 0.378 0.273 0.377 0.208
, [0, 4000] 0.190* 0.192 0.195 0.373 0.353 0.240 0.350 0.196
, M 0.190* 0.190 0.191 0.356 0.339 0.242 0.337 0.194
, S, M 0.193 0.190* 0.195 0.452 0.382 0.237 0.378 0.197
, S, H, M 0.191 0.188* 0.194 0.544 0.483 0.230 0.478 0.197
[1, 40], [0, 4000] 0.190* 0.191 0.195 0.587 0.559 0.234 0.454 0.197
[1, 30], [0, 4000] 0.191* 0.194 0.197 0.562 0.437 0.240 0.431 0.199
[1, 40], M 0.191 0.189* 0.195 0.562 0.437 0.240 0.431 0.198
[1, 40], S, M 0.193 0.190* 0.196 0.582 0.584 0.234 0.433 0.200
[1, 40], S, H, M 0.191 0.187* 0.195 0.639 0.621 0.226 0.418 0.200
[1, 30], M 0.193* 0.193 0.193 0.617 0.614 0.246 0.470 0.197
[1, 30], S, M 0.193 0.191* 0.195 0.614 0.612 0.237 0.479 0.198
[1, 30], S, H, M 0.192 0.190* 0.195 0.629 0.617 0.231 0.445 0.200
[0, 4000], M 0.192 0.188* 0.198 0.444 0.394 0.228 0.388 0.198
[0, 4000], S, M 0.193 0.188* 0.199 0.685 0.516 0.224 0.462 0.200
[0, 4000], S, H, M 0.193 0.188* 0.199 0.678 0.661 0.219 0.471 0.203
Mean of overall performance 0.205 0.203 0.200 0.403 0.378 0.263 0.320 0.220
TABLE VI: Evaluation of RMSE among all tested approaches under two categories of task definitions. The first two columns show the specific task definitions. For example, House, SA4 means that one partition at the SA4 level is a task; Education, PSCH_RANK (PRIMARY SCHOOL RANK) [1, 20] means that for the top 20 primary schools, each school district is a task; similarly, SSCH_RANK (SECONDARY SCHOOL RANK) [1, 40] means that for the top 40 secondary schools, each school district is a task; Transportation, STN_DIS (DISTANCE TO STATION) [0, 4000] means that each station is a task, and houses within 4,000 meters belong to that task; Facility, SHARED2_S, M means that houses with the same shop and market belong to the same task. The last eight columns show the RMSE for all tested approaches. In particular, values in bold with an asterisk indicate the benchmark under each task definition, and values in bold only indicate that the p-value of the rank sum test is greater than 0.05.
Category Task definition strategies
House 0.169 0.177 0.148* 0.208 0.203 0.268 0.200 0.190
0.149 0.150 0.147* 0.231 0.215 0.220 0.208 0.175
0.157 0.161 0.147* 0.261 0.236 0.216 0.230 0.173
0.156 0.161 0.148* 0.288 0.278 0.193 0.277 0.166
Education [1, 10] 0.178 0.171 0.150* 0.273 0.235 0.257 0.236 0.185
[1, 20] 0.193 0.187 0.164* 0.270 0.225 0.278 0.235 0.205
[1, 30] 0.199 0.194 0.157* 0.272 0.228 0.295 0.231 0.197
[1, 40] 0.187 0.184 0.154* 0.269 0.269 0.289 0.232 0.202
[1, 50] 0.197 0.192 0.160* 0.275 0.242 0.281 0.236 0.195
[1, 10] 0.197 0.192 0.178* 0.195 0.191 0.299 0.190 0.205
[1, 20] 0.184 0.178 0.166* 0.194 0.215 0.267 0.207 0.195
[1, 30] 0.181 0.178 0.166* 0.276 0.225 0.277 0.209 0.189
[1, 40] 0.198 0.194 0.167* 0.274 0.230 0.270 0.219 0.206
[1, 50] 0.195 0.192 0.165* 0.278 0.239 0.267 0.226 0.197
Transportation [0, 1000] 0.167* 0.167 0.174 0.260 0.229 0.197 0.222 0.199
[0, 2000] 0.158* 0.159 0.160 0.237 0.238 0.195 0.238 0.180
[0, 3000] 0.155* 0.155 0.155 0.239 0.243 0.194 0.244 0.173
[0, 4000] 0.153 0.154 0.152* 0.222 0.223 0.197 0.223 0.170
[0, 5000] 0.153* 0.155 0.154 0.229 0.230 0.196 0.230 0.172
Facility _S () 0.154 0.156 0.149* 0.252 0.243 0.208 0.175 0.172
_H () 0.150 0.151 0.149* 0.252 0.243 0.209 0.175 0.171
_G () 0.155 0.156 0.149* 0.277 0.275 0.184 0.203 0.178
_M () 0.150* 0.150 0.152 0.230 0.233 0.197 0.170 0.171
_S, H 0.151* 0.152 0.151 0.275 0.264 0.195 0.172 0.174
_S, G 0.150* 0.150 0.153 0.266 0.265 0.181 0.184 0.181
_S, M 0.151* 0.151 0.151 0.270 0.287 0.191 0.188 0.173
_H, G 0.151 0.150* 0.155 0.259 0.267 0.182 0.187 0.181
_H, M 0.149* 0.150 0.151 0.275 0.268 0.189 0.178 0.174
_G, M 0.150* 0.150 0.155 0.265 0.268 0.182 0.187 0.184
_S, H, G 0.151* 0.151 0.155 0.272 0.264 0.180 0.180 0.185
_S, H, M 0.151 0.151 0.135* 0.271 0.277 0.187 0.207 0.178
_S, G, M 0.150* 0.150 0.155 0.264 0.264 0.180 0.171 0.185
_H, G, M 0.151 0.150* 0.156 0.272 0.265 0.181 0.181 0.185
_S, H, G, M 0.151 0.150* 0.156 0.274 0.243 0.178 0.190 0.186
, [1, 40] 0.151* 0.158 0.151 0.397 0.396 0.212 0.396 0.164
, [1, 30] 0.151 0.160 0.150* 0.378 0.312 0.217 0.311 0.165
, [0, 4000] 0.149* 0.152 0.153 0.281 0.269 0.192 0.266 0.156
, M 0.148* 0.149 0.149 0.263 0.255 0.192 0.253 0.152
, S, M 0.151* 0.151 0.152 0.357 0.298 0.189 0.295 0.156
, S, H, M 0.152 0.151* 0.153 0.480 0.390 0.185 0.387 0.157
[1, 40], [0, 4000] 0.151* 0.152 0.154 0.511 0.492 0.187 0.358 0.157
[1, 30], [0, 4000] 0.151* 0.154 0.155 0.504 0.371 0.191 0.366 0.158
[1, 40], M 0.150* 0.150 0.153 0.504 0.371 0.192 0.366 0.156
[1, 40], S, M 0.153 0.152* 0.155 0.509 0.508 0.189 0.368 0.159
[1, 40], S, H, M 0.153 0.151* 0.155 0.541 0.521 0.184 0.337 0.160
[1, 30], M 0.151* 0.152 0.151 0.518* 0.515 0.196 0.380 0.155
[1, 30], S, M 0.152* 0.152 0.154 0.514 0.511 0.191 0.387 0.158
[1, 30], S, H, M 0.153 0.152* 0.155 0.525 0.518 0.187 0.347 0.159
[0, 4000], M 0.152 0.150* 0.156 0.345 0.307 0.184 0.302 0.156
[0, 4000], S, M 0.159 0.152* 0.159 0.573 0.418 0.181 0.374 0.160
[0, 4000], S, H, M 0.156 0.154* 0.161 0.569 0.556 0.179 0.383 0.164
Mean of overall performance 0.160 0.158 0.155 0.323 0.300 0.210 0.254 0.175
TABLE VII: Evaluation of MAE among all tested approaches under two categories of task definitions. The first two columns show the specific task definitions. For example, House, SA4 means that one partition at the SA4 level is a task; Education, PSCH_RANK (PRIMARY SCHOOL RANK) [1, 20] means that for the top 20 primary schools, each school district is a task; similarly, SSCH_RANK (SECONDARY SCHOOL RANK) [1, 40] means that for the top 40 secondary schools, each school district is a task; Transportation, STN_DIS (DISTANCE TO STATION) [0, 4000] means that each station is a task, and houses within 4,000 meters belong to that task; Facility, SHARED2_S, M means that houses with the same shop and market belong to the same task. The last eight columns show the MAE for all tested approaches. In particular, values in bold with an asterisk indicate the benchmark under each task definition, and values in bold only indicate that the p-value of the rank sum test is greater than 0.05.
Category Task definition strategies RMSE MAE
House 0.219 0.226 0.192 0.169 0.177 0.148
0.191 0.191 0.189* 0.149 0.150 0.147
0.203 0.207 0.190 0.157 0.161 0.147
0.203 0.206 0.191 0.156 0.161 0.148
Education [1, 10] 0.244 0.228 0.209 0.178 0.171 0.150
[1, 20] 0.259 0.244 0.219 0.193 0.187 0.164
[1, 30] 0.267 0.253 0.207 0.199 0.194 0.157
[1, 40] 0.247 0.239 0.202 0.187 0.184 0.154
[1, 50] 0.262 0.252 0.210 0.197 0.192 0.160
[1, 10] 0.263 0.248 0.229 0.197 0.192 0.178
[1, 20] 0.242 0.229 0.213 0.184 0.178 0.166
[1, 30] 0.235 0.227 0.211 0.181 0.178 0.166
[1, 40] 0.262 0.251 0.218 0.198 0.194 0.167
[1, 50] 0.258 0.247 0.216 0.195 0.192 0.165
Transportation [0, 1000] 0.213 0.211 0.221 0.167 0.167 0.174
[0, 2000] 0.202 0.201 0.204 0.158 0.159 0.160
[0, 3000] 0.197 0.197 0.198 0.155 0.155 0.155
[0, 4000] 0.194 0.195 0.194 0.153 0.154 0.152
[0, 5000] 0.195 0.196 0.197 0.153 0.155 0.154
Facility _S () 0.199 0.200 0.193 0.154 0.156 0.149
_H () 0.193 0.192 0.192 0.150 0.151 0.149
_G () 0.201 0.201 0.194 0.155 0.156 0.149
_M () 0.189 0.188 0.193 0.150 0.150 0.152
_S, H 0.193 0.192 0.194 0.151 0.152 0.151
_S, G 0.186 0.185 0.192 0.150 0.150 0.153
_S, M 0.192 0.191 0.194 0.151 0.151 0.151
_H, G 0.188 0.187 0.194 0.151 0.150 0.155
_H, M 0.190 0.189 0.193 0.149 0.150 0.151
_G, M 0.188 0.187 0.194 0.150 0.150 0.155
_S, H, G 0.186 0.185 0.192 0.151 0.151 0.155
_S, H, M 0.190 0.189 0.194 0.151 0.151 0.135*
_S, G, M 0.186 0.185 0.192 0.150 0.150 0.155
_H, G, M 0.186 0.185 0.194 0.151 0.150 0.156
_S, H, G, M 0.185* 0.184* 0.192 0.151 0.150 0.156
, [1, 40] 0.194 0.201 0.194 0.151 0.158 0.151
, [1, 30] 0.195 0.205 0.193 0.151 0.160 0.150
, [0, 4000] 0.190 0.192 0.195 0.149 0.152 0.153
, M 0.190 0.190 0.191 0.148* 0.149* 0.149
, S, M 0.193 0.190 0.195 0.151 0.151 0.152
, S, H, M 0.191 0.188 0.194 0.152 0.151 0.153
[1, 40], [0, 4000] 0.190 0.191 0.195 0.151 0.152 0.154
[1, 30], [0, 4000] 0.191 0.194 0.197 0.151 0.154 0.155
[1, 40], M 0.191 0.189 0.195 0.150 0.150 0.153
[1, 40], S, M 0.193 0.190 0.196 0.153 0.152 0.155
[1, 40], S, H, M 0.191 0.187 0.195 0.153 0.151 0.155
[1, 30], M 0.193 0.193 0.193 0.151 0.152 0.151
[1, 30], S, M 0.193 0.191 0.195 0.152 0.152 0.154
[1, 30], S, H, M 0.192 0.190 0.195 0.153 0.152 0.155
[0, 4000], M 0.192 0.188 0.198 0.152 0.150 0.156
[0, 4000], S, M 0.193 0.188 0.199 0.159 0.152 0.157
[0, 4000], S, H, M 0.193 0.188 0.199 0.156 0.154 0.161
TABLE VIII: Evaluation of RMSE and MAE among three MTL-based methods under two categories of task definitions. The first two columns show the specific task definitions; for example, House, Transportation: SA3, [0, 4000] means that the tasks are defined based on these two profiles, and houses in the same SA3 region and within 4,000 meters of the same station belong to the same task. The last six columns show the RMSE and MAE for the three MTL-based methods. In particular, values in bold with an asterisk indicate the benchmark under each method, and values in bold only indicate that the p-value of the rank sum test is greater than 0.05.

prediction approaches usually consider geographical factors, but the experimental results show that other factors can also deliver good prediction performance, even beyond geographical factors. (2) The optimal MTL-based method differs across task definitions. Although graph regularization enhances the relatedness between tasks, such a setting is too strict under some task definitions and reduces the prediction performance; for example, in the transportation profile, the other two methods perform comparably to or better than the graph-regularized one. (3) In terms of prediction performance, the choice of task definitions has a greater impact than the choice of MTL-based methods; for example, the performance on the facility profile is superior to that on the education profile regardless of which MTL-based method is used.

5.4.2 Task definitions based on multiple profiles

We choose the following task definitions as representatives of each profile and assemble them in pairs to redefine the tasks. The first is from the statistical regions. The second is [1, 40] and [1, 30] from the primary and secondary school districts, respectively. The third is [0, 4000] from the transportation areas. The fourth is from the neighbor facilities: sharing the market (M), the shop and market (S, M), and the shop, hospital, and market (S, H, M), respectively. The prediction

Task definition strategies Group
(0, 1/4] 6/11/0 5/12/0 9/8/0 0/17/0 0/17/0 1/16/0 0/17/0 0/17/0
(1/4, 1/2] 6/9/1 4/12/0 9/6/1 0/16/0 0/16/0 0/16/0 0/16/0 0/16/0
(1/2, 3/4] 12/4/0 3/13/0 4/12/0 0/16/0 0/16/0 0/16/0 0/16/0 0/16/0
(3/4, 1] 10/4/2 0/16/0 4/10/2 0/16/0 0/16/0 0/16/0 0/16/0 0/16/0
[1, 30] (0, 1/4] 1/4/0 0/5/0 3/2/0 0/5/0 0/5/0 1/4/0 0/5/0 0/5/0
(1/4, 1/2] 3/2/0 4/1/0 1/4/0 0/5/0 1/4/0 0/5/0 1/4/0 1/4/0
(1/2, 3/4] 2/1/0 2/1/0 0/3/0 0/3/0 1/2/0 0/3/0 1/2/0 0/3/0
(3/4, 1] 3/2/0 4/1/0 0/5/0 0/5/0 2/3/0 4/1/0 2/3/0 1/4/0
[0, 4000] (0, 1/4] 24/29/1 24/29/1 18/35/1 1/53/0 2/52/0 14/39/1 1/53/0 17/36/1
(1/4, 1/2] 20/34/0 20/34/0 22/30/2 0/54/0 0/54/0 6/48/0 0/54/0 21/31/2
(1/2, 3/4] 25/26/3 25/26/3 15/38/1 0/54/0 0/54/0 2/52/0 0/54/0 34/20/0
(3/4, 1] 18/36/0 18/36/0 28/24/2 0/54/0 0/54/0 1/52/1 0/54/0 9/41/4
(0, 1/4] 52/53/2 61/42/4 17/85/5 1/106/0 2/105/0 26/80/1 3/104/0 47/57/3
(1/4, 1/2] 39/68/1 47/57/4 33/73/2 0/108/0 1/107/0 6/101/1 1/107/0 46/60/2
(1/2, 3/4] 33/71/2 39/65/2 44/59/3 0/106/0 0/106/0 2/104/0 0/106/0 33/69/4
(3/4, 1] 29/72/5 28/76/2 51/48/7 0/106/0 0/106/0 1/105/0 0/106/0 22/78/6

TABLE IX: Win/Loss/Draw records obtained by comparing all tested approaches with the benchmark under four different task definitions. The first two columns show the selected task definitions and the quantile-based groupings; for example, (1/4, 1/2] means that the number of samples for the tasks in this group lies between the first and second quartiles of the distribution of sample counts over all tasks. The last eight columns show the Win/Loss/Draw scores for all tested approaches. Those in bold correspond to the best scenarios.

(a) Task definition based on
(b) Task definition based on [1, 30]
(c) Task definition based on [0, 4000]
(d) Task definition based on
Fig. 4: Evaluation of the prediction performance for four methods in each task. The x-axis corresponds to the task indexes under the specified task definition. The y-axis corresponds to the RMSE and the number of samples per task, respectively.

performance of MTL-based methods under each of the above task definitions is also summarized in Table VIII.

We have the following observations. (1) The performance differences between the various task definitions are not significant. This is because definitions based on multiple profiles are fine-grained, so the resulting tasks are similar, which makes the differences in performance small. (2) The performance of the first two methods is close to that of the graph-regularized method, and even better under some task definitions. This indicates that the room for optimizing performance by enhancing the relatedness among tasks is limited when the task definitions are sufficiently refined. (3) Compared with a single profile, the overall performance based on multiple profiles is generally better. This again indicates that task definitions have a greater impact on prediction performance than MTL-based method selection.

Discussion. Based on the above analysis, we not only validate the impact of task definitions and method selections on prediction performance, but also demonstrate that the impact of task definitions far exceeds that of method selections. In addition, we note that the MTL-based method using graph regularization usually performs well when the task definitions are not sufficiently refined (definitions based on a single profile). However, when the task definitions are very refined (definitions based on multiple profiles), the performance of the MTL-based methods using general regularization ($\ell_1$-norm, $\ell_{2,1}$-norm) is good enough, which deserves further investigation via more data sources.

5.5 Performance evaluation on each task

Why can MTL-based prediction methods achieve good performance? We take a further look into the MTL model by investigating the quality of prediction for each task within a prediction. In particular, we extract the task definitions based on the statistical regions, the school districts with rank [1, 30], the transportation areas with distance [0, 4000], and the neighbor facilities sharing the same market (M) as four cases. For each task definition, we first compute the distribution of the number of data samples over all tasks, and then divide the tasks into four groups according to the first quartile (1/4), the second quartile (1/2), and the third quartile (3/4). Finally, we choose one approach as the benchmark and use RMSE to compute the Win-Loss-Draw records for all tested approaches in each group.
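The quartile-based grouping can be sketched directly with NumPy; the per-task sample counts below are illustrative.

```python
import numpy as np

def quartile_groups(sample_counts):
    """Split task indexes into four groups by quartiles of sample count."""
    counts = np.asarray(sample_counts, dtype=float)
    q1, q2, q3 = np.quantile(counts, [0.25, 0.5, 0.75])
    edges = [-np.inf, q1, q2, q3, np.inf]
    # Group g holds tasks whose count falls in (edges[g], edges[g+1]].
    return {g: [i for i, c in enumerate(counts)
                if edges[g] < c <= edges[g + 1]]
            for g in range(4)}

# Toy sample counts for eight tasks.
groups = quartile_groups([5, 12, 30, 80, 7, 45, 22, 60])
```

Group 0 corresponds to (0, 1/4] (the tasks with the fewest samples) and group 3 to (3/4, 1].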

The results are summarized in Table IX. We find: (1) In the groups with fewer samples, i.e., (0, 1/4] and (1/4, 1/2], the performance of MTL-based methods is significantly better than that of STL-based approaches. This confirms the advantages of the MTL model. (2) In the groups with enough samples, i.e., (1/2, 3/4] and (3/4, 1], the performance of MTL-based methods is still good, and under some task definitions STL-based approaches even outperform them. However, STL-based approaches ignore the relatedness between tasks, so an approach that behaves well in these groups does not fit the other groups. Therefore, the overall performance of MTL-based methods remains superior.

To better understand the conclusions of the above quantitative evaluation, we illustrate the per-task performance of four approaches under the four task definitions. As shown in Figure 4, we extract 30 tasks under each task definition and plot the number of samples and the prediction performance of the four approaches for each task. Note that the results for some secondary school districts cannot be shown in Figure 4b due to the absence of house transactions. Taking Figure 4a as an example, it can be clearly seen that: (1) In tasks with fewer samples, e.g., task indexes below 10, the performance of MTL-based methods is consistently better than that of the other methods. (2) The performance gap narrows as the number of samples in a task increases, e.g., for task indexes above 22, but the performance of MTL-based methods remains competitive.

In summary, our MTL-based house price prediction is robust. It guarantees the prediction performance when data samples are sufficient, and when the samples are insufficient, it optimizes the prediction performance by exploiting the relatedness between tasks. As a result, the overall performance is improved.

6 Conclusions and Future Work

In this paper, we carried out an in-depth study of the application of MTL to the house price prediction problem. In terms of data profiling, we defined and captured a fine-grained location profile powered by a diverse range of location data sources. In terms of the prediction model, there are two key points in the implementation of MTL-based house price prediction: task definitions and method selections. Accordingly, we designed two categories of strategies to define tasks based on various house features, and selected three general MTL-based methods with different regularization terms to capture and utilize the relatedness between tasks. Through extensive experimental evaluations, we first demonstrated that modeling based on MTL can significantly improve the overall performance of house price prediction. Then we illustrated that the diversity of task definitions is conducive to the MTL formulation of the house price prediction problem. Finally, we revealed that the impact of task definitions on prediction performance far exceeds that of method selections.

In the future, we will extend our methodology to adaptively learn task definitions, and we plan to explore non-linear models rather than focusing only on linear models. We will also investigate house recommendation based on the outcomes of the price prediction.


This work was partially supported by ARC under Grants DP170102726, DP180102050, DP170102231, DP160103595, and the National Natural Science Foundation of China (NSFC) under Grants 61728204, 91646204, and 71571093. Zhifeng Bao is a recipient of a Google Faculty Award.


  • [1] J. F. Kain and J. M. Quigley, “Measuring the value of housing quality,” Journal of the American Statistical Association, vol. 65, no. 330, pp. 532–548, 1970.
  • [2] R. Schulz and A. Werwatz, “A state space model for berlin house prices: Estimation and economic interpretation,” The Journal of Real Estate Finance and Economics, vol. 28, no. 1, pp. 37–57, 2004.
  • [3] K. J. Lancaster, “A new approach to consumer theory,” Journal of Political Economy, vol. 74, no. 2, pp. 132–157, 1966.
  • [4] S. Rosen, “Hedonic prices and implicit markets: product differentiation in pure competition,” Journal of political economy, vol. 82, no. 1, pp. 34–55, 1974.
  • [5] A. Can, “Specification and estimation of hedonic housing price models,” Regional science and urban economics, vol. 22, no. 3, pp. 453–474, 1992.
  • [6] A. Król, “Application of hedonic methods in modelling real estate prices in poland,” in Data Science, Learning by Latent Structures, and Knowledge Discovery, 2013, pp. 501–511.
  • [7] R. YAYAR and D. DEMİR, “Hedonic estimation of housing market prices in turkey,” Erciyes Üniversitesi İktisadi ve İdari Bilimler Fakültesi Dergisi, no. 43, pp. 67–82, 2014.
  • [8] H. Selim, “Determinants of house prices in turkey: Hedonic regression versus artificial neural network,” Expert Systems with Applications, vol. 36, no. 2, pp. 2843–2852, 2009.
  • [9] J. Gu, M. Zhu, and L. Jiang, “Housing price forecasting based on genetic algorithm and support vector machine,” Expert Systems with Applications, vol. 38, no. 4, pp. 3383–3386, 2011.
  • [10] X. Wang, J. Wen, Y. Zhang, and Y. Wang, “Real estate price forecasting based on svm optimized by pso,” Optik-International Journal for Light and Electron Optics, vol. 125, no. 3, pp. 1439–1443, 2014.
  • [11] B. Park and J. K. Bae, “Using machine learning algorithms for housing price prediction: The case of fairfax county, virginia housing data,” Expert Syst. Appl., vol. 42, no. 6, pp. 2928–2934, 2015.
  • [12] S. Bourassa, E. Cantoni, and M. Hoesli, “Predicting house prices with spatial dependence: a comparison of alternative methods,” Journal of Real Estate Research, vol. 32, no. 2, pp. 139–159, 2010.
  • [13] B. Case, J. Clapp, R. Dubin, and M. Rodriguez, “Modeling spatial and temporal house price patterns: A comparison of four models,” The Journal of Real Estate Finance and Economics, vol. 29, no. 2, pp. 167–191, 2004.
  • [14] I. H. Gerek, “House selling price assessment using two different adaptive neuro-fuzzy techniques,” Automation in Construction, vol. 41, pp. 33–39, 2014.
  • [15] J.-M. Montero, R. Mínguez, and G. Fernández-Avilés, “Housing price prediction: parametric versus semi-parametric spatial hedonic models,” Journal of Geographical Systems, vol. 20, no. 1, pp. 27–55, Jan 2018.
  • [16] A. Argyriou, T. Evgeniou, and M. Pontil, “Multi-task feature learning,” in Advances in Neural Information Processing Systems, 2007, pp. 41–48.
  • [17] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, Jul 1997.
  • [18] T. Evgeniou and M. Pontil, “Regularized multi–task learning,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2004, pp. 109–117.
  • [19] Y. Zhang and Q. Yang, “A survey on multi-task learning,” CoRR, vol. abs/1707.08114, 2017.
  • [20] B. Chidlovskii, “Multi-task learning of time series and its application to the travel demand,” CoRR, vol. abs/1712.08164, 2017.
  • [21] J. Ghosn and Y. Bengio, “Multi-task learning for stock selection,” in Advances in Neural Information Processing Systems, 1996, pp. 946–952.
  • [22] N. Jaques, S. Taylor, E. Nosakhare, A. Sano, and R. Picard, “Multi-task learning for predicting health, stress, and happiness,” in NIPS Workshop on Machine Learning for Healthcare, 2016.
  • [23] L. Zhao, Q. Sun, J. Ye, F. Chen, C.-T. Lu, and N. Ramakrishnan, “Multi-task learning for spatio-temporal event forecasting,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2015, pp. 1503–1512.
  • [24] S. Emrani, A. McGuirk, and W. Xiao, “Prognosis and diagnosis of parkinson’s disease using multi-task learning,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2017, pp. 1457–1466.
  • [25] J. Zhou, L. Yuan, J. Liu, and J. Ye, “A multi-task learning formulation for predicting disease progression,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2011, pp. 814–822.
  • [26] Y. Liu, Y. Zheng, Y. Liang, S. Liu, and D. S. Rosenblum, “Urban water quality prediction based on multi-task multi-view learning,” in

    Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence

    , 2016, pp. 2576–2581.
  • [27] G.-Z. Fan, S. E. Ong, and H. C. Koh, “Determinants of house price: A decision tree approach,” Urban Studies, vol. 43, no. 12, pp. 2301–2315, 2006.
  • [28] M. Kryvobokov and M. Wilhelmsson, “Analysing location attributes with a hedonic model for apartment prices in donetsk, ukraine,” International Journal of Strategic Property Management, vol. 11, no. 3, pp. 157–178, 2007.
  • [29] J. R. Ottensmann, S. Payton, and J. Man, “Urban location and housing prices within a hedonic model,” Journal of Regional Analysis & Policy, vol. 38, no. 1, pp. 19–35, 2008.
  • [30] A. Y. Ozalp and H. Akinci, “The use of hedonic pricing method to determine the parameters affecting residential real estate prices,” Arabian Journal of Geosciences, vol. 10, no. 24, p. 535, 2017.
  • [31] M. Kuntz and M. Helbich, “Geostatistical mapping of real estate prices: an empirical comparison of kriging and cokriging,” International Journal of Geographical Information Science, vol. 28, no. 9, pp. 1904–1921, 2014.
  • [32] A. S. Adair, J. N. Berry, and W. S. McGreal, “Hedonic modelling, housing submarkets and residential valuation,” Journal of property Research, vol. 13, no. 1, pp. 67–83, 1996.
  • [33] H. Kuşan, O. Aytekin, and İlker Özdemir, “The use of fuzzy logic in predicting house selling price,” Expert Systems with Applications, vol. 37, no. 3, pp. 1808 – 1813, 2010.
  • [34] M. Li, Z. Bao, T. Sellis, and S. Yan, “Visualization-aided exploration of the real estate data,” in Databases Theory and Applications.   Springer International Publishing, 2016, pp. 435–439.
  • [35] M. Li, Z. Bao, T. Sellis, S. Yan, and R. Zhang, “Homeseeker: A visual analytics system of real estate data,” Journal of Visual Languages & Computing, vol. 45, pp. 1 – 16, 2018.
  • [36] S. Ruder, “An overview of gradient descent optimization algorithms,” CoRR, vol. abs/1609.04747, 2016.
  • [37] J. Zhou, J. Chen, and J. Ye, “Malsar: Multi-task learning via structural regularization,” Arizona State University, vol. 21, 2011.
  • [38] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance,” Climate research, vol. 30, no. 1, pp. 79–82, 2005.
  • [39] T. Chai and R. R. Draxler, “Root mean square error (rmse) or mean absolute error (mae)?–arguments against avoiding rmse in the literature,” Geoscientific model development, vol. 7, no. 3, pp. 1247–1250, 2014.
  • [40] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics bulletin, vol. 1, no. 6, pp. 80–83, 1945.
  • [41] G. I. Webb, “Multiboosting: A technique for combining boosting and wagging,” Machine Learning, vol. 40, no. 2, pp. 159–196, 2000.
  • [42] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
  • [43] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  • [44] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,” Neural Information Processing-Letters and Reviews, vol. 11, no. 10, pp. 203–224, 2007.
  • [45] D. P. Solomatine and D. L. Shrestha, “Adaboost. rt: a boosting algorithm for regression problems,” Neural Networks, vol. 2, pp. 1163–1168, 2004.
  • [46] A. Liaw, M. Wiener et al., “Classification and regression by randomforest,” R news, vol. 2, no. 3, pp. 18–22, 2002.
  • [47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011.