With the development of the mobile Internet and the rise of the sharing economy, all kinds of spatial crowdsourcing (SC) platforms have become popular, where online crowd workers use their phones to take part in and complete offline crowdsourcing tasks in the physical world. Typical SC platforms include Gigwalk (www.gigwalk.com), TaskRabbit (www.taskrabbit.com) and gMission (gmission.github.io).
One fundamental issue in SC is task assignment, namely assigning crowdsourcing tasks to suitable crowd workers. Generally speaking, there are two kinds of tasks. The first kind is micro tasks, which can be completed by any single worker, such as taking photos and delivering things. The second kind is specialty-aware tasks, such as repairing a house and organizing a party, where crowd workers with different kinds of skills are needed to work collaboratively and finish the task. For micro-task assignment, there are many existing works and we refer the readers to  for more details. In this paper we focus on specialty-aware task assignment.
Existing works [3, 4] on specialty-aware task assignment assume that each crowd worker has multiple skills and receives a unified fee if s/he is employed, which is not very practical because (1) workers often have unbalanced workloads, (2) workers may be confused about what they should do in a task and (3) the payment and the workload often do not match. To address these drawbacks, in this paper we propose the Specialty-Aware Task Assignment (SATA) problem, where each crowd worker specifies a fee for each of her/his skills so that the payment is proportional to the workload.
We then illustrate the SATA problem with a motivating example of organizing a party.
|Tasks||Lists of required skills|
|(music), (barbecue), (lights)|
|Workers||Skills and fees|
Suppose we have three tasks of throwing parties, each with a different style and thus different kinds of work to be done. For example, party 1 is a mini one and only needs music and drinks, while party 3 is ceremonious and requires music, drinks, barbecue, lights and a stage. The skill lists of the three tasks are shown in Table 1. Besides, we have some workers shown in Table 2, each with different skills and corresponding fees. For example, if is required to finish the music job (), s/he will be paid 3. In addition, each worker receives a transportation fee, which equals the distance from the worker to the assigned task times a global unit price. For example, Figure 1 shows the locations of tasks and workers, and if the global unit price is 0.5, the transportation fees for assigning to is .
Motivated by the example above, we formalize the SATA problem, which aims to efficiently assign crowd workers to specialty-aware tasks so as to maximize the total utility of the assignment. Note that existing works either focus on assigning workers to micro tasks to optimize different goals, or assume that the workers have a unified fee. Thus, their methods cannot be directly adopted to solve our problem.
In this paper, we first prove the NP-hardness of the SATA problem, which indicates that an optimal solution is unlikely to be computable in polynomial time. Therefore, we propose two efficient and effective heuristics to solve it.
To summarize, we make the following contributions.
We formally define a new task assignment problem in spatial crowdsourcing, called the Specialty-Aware Task Assignment (SATA) problem.
We prove the SATA problem is NP-hard, and develop two efficient heuristics to solve it.
We verify the effectiveness and efficiency of the proposed methods with extensive experiments on real and synthetic datasets.
The rest of the paper is organized as follows. We define the SATA problem and prove its NP-hardness in Section 2. Section 3 presents our two heuristic algorithms. Section 4 discusses extensive experimental results on both synthetic and real datasets. We review related work in Section 5 and conclude in Section 6.
2 Problem Definition
We first introduce two basic concepts, namely Worker and Task. Then, we introduce how to calculate the reward of a worker and the utility of a task. Finally, we formally define the Specialty-Aware Task Assignment (SATA) problem.
Definition 1 (Worker)
A worker $w$ is defined as $w = \langle l_w, S_w, F_w \rangle$, where $l_w$ is the location of $w$, which can be described by longitude and latitude, $S_w$ is the list of skills that $w$ masters, and $F_w$ is the list of fees for each skill in $S_w$.
Similar to the definition of a worker, a task is formally defined as follows.
Definition 2 (Task)
A task $t$ is defined as $t = \langle l_t, S_t, B_t \rangle$, where $l_t$ is the location of $t$, which can be described by longitude and latitude, $S_t$ is the list of skills that are needed to complete $t$ collaboratively, and $B_t$ is the total monetary budget of $t$.
Briefly, a worker’s reward includes two parts: (1) transportation fee, which is directly proportional to the distance between the worker and the task; (2) labor fee, which is the sum of the fees for the skills used to perform a task.
Definition 3 (Reward of Worker)
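The reward of worker $w$ for performing task $t$ equals $r(w, t) = c \cdot d(w, t) + \sum_{s \in S'_w} f_w(s)$, where $d(w, t)$ is the distance between $w$ and $t$, which can be Euclidean distance or road network distance, $c$ is a global parameter representing the unit transportation fee, $S'_w$ is the set of skills that $w$ uses to perform the task, and $f_w(s)$ is the fee that $w$ charges for skill $s$.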
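As a minimal sketch of Definition 3, the reward can be computed as the transportation fee plus the labor fee. The names `Worker`, `reward` and `unit_fee` are illustrative and do not come from the paper, and Euclidean distance on planar coordinates is assumed for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    location: tuple  # (x, y); planar coordinates are an assumption
    fees: dict       # skill name -> fee asked for that skill


def reward(worker, task_location, used_skills, unit_fee):
    """Transportation fee plus labor fee for the skills actually used."""
    distance = math.dist(worker.location, task_location)
    labor_fee = sum(worker.fees[s] for s in used_skills)
    return unit_fee * distance + labor_fee


# Hypothetical worker and task location, for illustration only.
w = Worker(location=(0.0, 0.0), fees={"music": 3.0, "drinks": 2.0})
r = reward(w, task_location=(3.0, 4.0), used_skills={"music"}, unit_fee=0.5)
print(r)  # 0.5 * 5.0 + 3.0 = 5.5
```

Only the skills in `used_skills` contribute to the labor fee, so a worker who owns several skills is paid just for the ones the task uses.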
We define the utility of a task as follows.
Definition 4 (Utility of Task)
The utility of task $t$ is defined as $U_t = B_t - R_t$, where $B_t$ is the budget of the task and $R_t$ is the sum of the rewards of the workers assigned to $t$, if $t$ is completed. If $t$ cannot be finished, the utility is zero.
We finally define our problem as follows.
Definition 5 (Specialty-Aware Task Assignment (SATA) Problem)
Given a set of tasks $T$, a set of workers $W$ and a global unit transportation fee $c$, the problem is to assign workers to tasks so as to maximize the total utility of the completed tasks, subject to the following constraints:
Specialty Constraint: a task can be completed as long as the workers assigned to it cover the required skills of the task;
Budget Constraint: the total rewards of the workers assigned to a task cannot exceed the task's total budget.
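The utility definition and the two constraints can be checked together for a single task, as in the sketch below. The function name `task_utility` and the input layout are hypothetical, and treating an over-budget team as yielding zero utility is an assumption made here for illustration.

```python
def task_utility(budget, required_skills, assignment):
    """Utility of one task.

    assignment: list of (skills_used, reward) pairs, one per assigned worker.
    """
    covered = set()
    total_reward = 0.0
    for skills_used, worker_reward in assignment:
        covered |= set(skills_used)
        total_reward += worker_reward
    if not set(required_skills) <= covered:
        return 0.0  # specialty constraint fails: the task cannot be completed
    if total_reward > budget:
        return 0.0  # assumption: an over-budget team is treated as infeasible
    return budget - total_reward


# Hypothetical numbers, for illustration only.
team = [({"music"}, 3.0), ({"drinks"}, 2.5)]
print(task_utility(10.0, {"music", "drinks"}, team))  # 4.5
```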
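We then prove the hardness of the SATA problem.
The SATA problem is NP-hard.
We prove the claim through a reduction from the set cover problem.
We first introduce the set cover problem. Given a universe $U = \{e_1, \dots, e_n\}$ and a collection of subsets $S_1, \dots, S_m \subseteq U$, each $S_i$ is associated with a cost $\omega_i$. The set cover problem is to find an index set $I \subseteq \{1, \dots, m\}$ that minimizes $\sum_{i \in I} \omega_i$ subject to $\bigcup_{i \in I} S_i = U$.
We next show how to transform an instance of the set cover problem into an instance of our SATA problem. We create a single task $t$ that requires the skills $S_t = U$ and has an unbounded budget $B_t$ (any finite $B_t \geq \sum_{i=1}^{m} \omega_i$ suffices, since the budget constraint then never binds). For each subset $S_i$ we create a worker $w_i$ whose skill list is $S_i$. The fees of all skills are zero, and we place $w_i$ so that its transportation fee for performing $t$ is $c \cdot d(w_i, t) = \omega_i$. For this instance of SATA, we seek a set of workers that covers $S_t$ and maximizes the utility of $t$, namely $B_t - \sum_{i \in I} \omega_i$, which is equivalent to minimizing $\sum_{i \in I} \omega_i$. In this way, we reduce the set cover problem to our SATA problem. As the set cover problem is known to be NP-hard, the SATA problem is also NP-hard.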
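The reduction can be illustrated on a toy instance, as sketched below. The helper names are hypothetical. Each worker's distance is stored directly instead of a location, which is a simplification, and the brute-force search is only there to check the correspondence on a tiny input.

```python
from itertools import combinations


def set_cover_to_sata(universe, subsets, costs, unit_fee=1.0):
    """Build a single-task SATA instance from a weighted set cover instance."""
    budget = sum(costs) + 1.0  # large enough that the budget never binds
    task = {"skills": set(universe), "budget": budget}
    workers = [
        # All skill fees are zero; the distance encodes the subset's cost.
        {"skills": set(s), "fees": {k: 0.0 for k in s}, "distance": cost / unit_fee}
        for s, cost in zip(subsets, costs)
    ]
    return task, workers


def best_utility(task, workers, unit_fee=1.0):
    """Exhaustively search all teams (exponential; tiny inputs only)."""
    best = 0.0
    for size in range(1, len(workers) + 1):
        for team in combinations(workers, size):
            covered = set().union(*(w["skills"] for w in team))
            if covered >= task["skills"]:
                paid = sum(unit_fee * w["distance"] for w in team)
                best = max(best, task["budget"] - paid)
    return best


universe = {1, 2, 3, 4}
subsets = [{1, 2}, {3, 4}, {1, 2, 3, 4}]
costs = [1.0, 1.0, 3.0]
task, workers = set_cover_to_sata(universe, subsets, costs)
min_cover_cost = task["budget"] - best_utility(task, workers)
print(min_cover_cost)  # 2.0, the cost of the cover {1, 2} + {3, 4}
```

Maximizing the utility of the single task and minimizing the cover cost pick the same team, which is the correspondence the proof relies on.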
In this section, we give two efficient heuristic algorithms to solve the SATA problem.
3.1 Total Budget Based Algorithm
Our first algorithm is called the Total Budget Based Algorithm (TBA). The main idea is that we always try to assign workers to the task with the largest budget first. When assigning workers to a task, we follow the greedy algorithm for the set cover problem.
The procedure of TBA is shown in Algorithm 1. The algorithm takes the set of workers and the set of tasks as input and returns an assignment between them, as shown in lines 1-2. In line 3, the algorithm sorts the tasks in descending order of their budgets and saves the sorted result. In lines 4-13, for each task in the sorted list, we follow the greedy algorithm for the set cover problem to assign workers. Specifically, in line 5, we find worker with minimum . Note that here considers all possible subsets of . In lines 6-7, we update , and . In lines 9-11, if is , which means it can be completed, we break the loop and start to assign workers to the next task.
Back to our running example in Example 1. TBA first finds the task with the largest total budget, which is . Then it starts to assign workers to . As has the minimal of 2, we first assign to . After assigning , 's list of skills has not been covered, thus we assign to with of . We finally assign to and the total reward paid to , and is . Thus, the utility of is . Similarly, we assign workers to and successively, and the final utility of TBA is 21.08.
Complexity. If we take the maximum number of skills a worker may have as a constant, the time complexity of TBA is .
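The following Python sketch shows one way the TBA procedure could look. It is not the paper's Algorithm 1. The selection rule used here (smallest reward per newly covered skill, over all subsets of a worker's useful skills) is an assumed reading of the greedy set cover step, and the rules that a worker serves at most one task and that a task is skipped when its budget would be exceeded are also assumptions. All names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def nonempty_subsets(items):
    items = sorted(items)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def assign_task(task, workers, free, unit_fee):
    """Greedy set-cover style assignment for one task.

    Returns (team, total_reward), or None if the task cannot be completed
    within its budget using the workers whose ids are in `free`.
    """
    uncovered = set(task["skills"])
    team, paid = [], 0.0
    available = set(free)
    while uncovered:
        best = None  # (reward per covered skill, worker id, skills used, reward)
        for wid in sorted(available):
            w = workers[wid]
            useful = uncovered.intersection(w["fees"])
            travel = unit_fee * math.dist(w["loc"], task["loc"])
            for subset in nonempty_subsets(useful):
                r = travel + sum(w["fees"][s] for s in subset)
                cand = (r / len(subset), wid, subset, r)
                if best is None or cand < best:
                    best = cand
        if best is None or paid + best[3] > task["budget"]:
            return None  # assumption: give up on this task
        _, wid, subset, r = best
        team.append((wid, subset))
        paid += r
        uncovered -= set(subset)
        available.remove(wid)
    return team, paid


def tba(tasks, workers, unit_fee):
    free = set(workers)
    result, total_utility = {}, 0.0
    # Tasks with the largest total budget are served first.
    for task in sorted(tasks, key=lambda t: t["budget"], reverse=True):
        outcome = assign_task(task, workers, free, unit_fee)
        if outcome is None:
            continue
        team, paid = outcome
        result[task["id"]] = team
        free -= {wid for wid, _ in team}
        total_utility += task["budget"] - paid
    return result, total_utility


tasks = [
    {"id": "t1", "loc": (0, 0), "skills": {"music", "drinks"}, "budget": 20.0},
    {"id": "t2", "loc": (10, 0), "skills": {"music"}, "budget": 8.0},
]
workers = {
    "w1": {"loc": (0, 0), "fees": {"music": 3.0, "drinks": 4.0}},
    "w2": {"loc": (0, 3), "fees": {"drinks": 1.0}},
    "w3": {"loc": (10, 4), "fees": {"music": 2.0}},
}
result, total = tba(tasks, workers, unit_fee=0.5)
print(result)
print(total)  # 18.5
```

On this toy input, `t1` is served by `w2` (drinks) and `w1` (music) for a total reward of 5.5, and `t2` by `w3` for a reward of 4.0.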
3.2 Average Budget Based Algorithm
The TBA algorithm only considers the total budget of tasks. However, a large budget may simply result from a large number of skills required by the task. Thus, in this subsection, we propose another algorithm, called the Average Budget Based Algorithm (ABA). The main idea is that we first compute the average budget of every task, and prefer to assign workers to tasks with a larger average budget.
The pseudocode of ABA is shown in Algorithm 2. The biggest difference between TBA and ABA lies in line 3. In ABA, we first sort the tasks by average budget, which is defined as $B_t / |S_t|$, that is, the budget of a task divided by the number of skills it requires. The procedure for assigning workers to a given task is the same as in TBA, and is shown in lines 4-13.
Back to our running example in Example 1. Different from TBA, ABA first finds the task with the largest average budget, which is . Then it assigns workers to . As has the minimal of 2, we first assign to . After assigning , we find 's list of skills has been covered, thus the total utility is . Similarly, we next assign workers to and , and the final utility of ABA is 25.78.
Complexity. If we take the maximum number of skills a worker may have as a constant, the time complexity of ABA is also .
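The only change ABA makes is the order in which tasks are processed. The sketch below contrasts the two sort keys on hypothetical budgets (these values are not the ones from the running example), assuming the average budget is the total budget divided by the number of required skills.

```python
# Hypothetical tasks; budgets are made up for illustration.
tasks = [
    {"id": "t1", "budget": 30.0,
     "skills": {"music", "drinks", "barbecue", "lights", "stage"}},
    {"id": "t2", "budget": 16.0, "skills": {"music", "drinks"}},
    {"id": "t3", "budget": 21.0, "skills": {"music", "barbecue", "lights"}},
]


def total_budget(task):
    return task["budget"]


def average_budget(task):
    # Assumption: average budget = total budget / number of required skills.
    return task["budget"] / len(task["skills"])


tba_order = [t["id"] for t in sorted(tasks, key=total_budget, reverse=True)]
aba_order = [t["id"] for t in sorted(tasks, key=average_budget, reverse=True)]
print(tba_order)  # ['t1', 't3', 't2']
print(aba_order)  # ['t2', 't3', 't1']
```

Here the task with the largest total budget also needs the most skills, so it drops to last place once the budget is averaged per skill.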
4.1 Experiment Setup
We use real and synthetic datasets to evaluate our algorithms. The real data comes from CSTO (http://www.csto.com/), which is a task outsourcing platform. In the CSTO dataset, each task is associated with the set of skills needed to complete the software development task, and each coder is associated with a set of skills and an average price, which can be deduced from the historical data. Since the CSTO data is not associated with location information, we generate the distance of each coder from the task from a uniform distribution. For the synthetic data, based on observations from the real dataset, the prices of the skills owned by a worker and the budgets of tasks each follow a Gaussian distribution. Statistics of the synthetic data are shown in Table 3, where we mark our default settings in bold font.
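|Parameter|Values|
|Number of tasks|100, 300, 500, 700, 900|
|Number of workers|1000, 3000, 5000, 7000, 9000|
|Global unit transportation fee|0.1, 0.3, 0.5, 0.7, 0.9|
|Average budget of tasks|60, 80, 100, 120, 140|
|Variance of the price of skills|10, 15, 20, 25, 30|
|Total number of skills|10, 20, 30, 40, 50|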
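A synthetic instance with Gaussian skill prices and Gaussian task budgets could be drawn as sketched below. Every parameter name, the clamping of negative draws to zero, and the number of skills per task or worker are assumptions made for illustration. Locations are omitted, and the values passed in the usage line are arbitrary rather than the paper's defaults.

```python
import math
import random


def make_synthetic(num_tasks, num_workers, num_skills,
                   avg_budget, budget_variance,
                   avg_price, price_variance, seed=0):
    """Draw a synthetic SATA instance (all parameter names are hypothetical)."""
    rng = random.Random(seed)
    skills = [f"s{i}" for i in range(num_skills)]
    tasks = []
    for i in range(num_tasks):
        required = rng.sample(skills, rng.randint(1, min(5, num_skills)))
        # Gaussian budget, clamped at zero (the clamp is an assumption).
        budget = max(0.0, rng.gauss(avg_budget, math.sqrt(budget_variance)))
        tasks.append({"id": f"t{i}", "skills": set(required), "budget": budget})
    workers = []
    for j in range(num_workers):
        owned = rng.sample(skills, rng.randint(1, min(3, num_skills)))
        # One Gaussian price per owned skill, clamped at zero as well.
        fees = {s: max(0.0, rng.gauss(avg_price, math.sqrt(price_variance)))
                for s in owned}
        workers.append({"id": f"w{j}", "fees": fees})
    return tasks, workers


tasks, workers = make_synthetic(num_tasks=5, num_workers=20, num_skills=10,
                                avg_budget=100, budget_variance=25,
                                avg_price=10, price_variance=20)
print(len(tasks), len(workers))  # 5 20
```

Fixing the seed makes each parameter sweep repeatable, which helps when comparing algorithms on identical inputs.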
4.2 Experiment Results
In this subsection, we test the performance of our proposed algorithms under different parameter settings. We evaluate our two heuristic algorithms, TBA and ABA, and a baseline algorithm in terms of total utility score, running time and memory cost, and study the effect of varying parameters on the performance of the algorithms. The baseline algorithm uses a simple random strategy, which assigns workers to tasks randomly. The algorithms are implemented in CodeBlocks 16.1, and the experiments are performed on a machine with an Intel(R) Core(TM) i5 2.50 GHz CPU and 8 GB of main memory.
Effects of the number of tasks. The results of varying the number of tasks are presented in Fig. (a) to (c). First, we can observe that the utility increases as the number of tasks increases, which is reasonable as more tasks are available. We can also observe that TBA and ABA are much better than the baseline algorithm, and that TBA has an advantage over ABA. As for running time, TBA and ABA are slower than the baseline because they sort the tasks and search for a more economical schedule, but the extra running time is acceptable given the better utility. Moreover, TBA is faster than ABA because it is easier to find suitable workers for each task. The three algorithms do not differ much in memory consumption.
Effects of the number of workers. The results of varying the number of workers are presented in Fig. (a) to (c). We can observe that the utility, running time and memory consumption generally increase as the number of workers increases, which is reasonable as more workers need to be assigned. Again, we can see that TBA is better than ABA in terms of utility and running time.
Effects of the global unit transportation fee. The results of varying the unit transportation fee are presented in Fig. (a) to (c). We can see that the utility and running time decrease as the unit transportation fee increases, because transportation becomes more expensive and fewer workers can be assigned to distant tasks.
Effects of the average budget of tasks. The results are presented in Fig. (a) to (c). We can first see from the figure that the utility increases as the average budget increases. There are no large differences in running time and memory consumption across the different average budgets.
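The paper does not spell out the random baseline, so the sketch below is only one possible reading: workers are visited in a random order and added to a task whenever they cover a skill that is still missing. Fees and budgets are ignored here for brevity, and all names and data are hypothetical.

```python
import random


def random_baseline(tasks, worker_skills, seed=0):
    """Assign workers to tasks in random order until each task is covered."""
    rng = random.Random(seed)
    free = list(worker_skills)
    rng.shuffle(free)
    assignment = {}
    for task in tasks:
        uncovered = set(task["skills"])
        team = []
        for wid in free:
            useful = uncovered & worker_skills[wid]
            if useful:
                team.append(wid)
                uncovered -= useful
            if not uncovered:
                break
        if not uncovered:  # keep the team only if the task is fully covered
            assignment[task["id"]] = team
            free = [wid for wid in free if wid not in team]
    return assignment


tasks = [
    {"id": "t1", "skills": {"music", "drinks"}},
    {"id": "t2", "skills": {"music"}},
]
worker_skills = {
    "w1": {"music", "drinks"},
    "w2": {"drinks"},
    "w3": {"music"},
}
print(random_baseline(tasks, worker_skills, seed=1))
```

Unlike the greedy heuristics, this strategy never compares rewards, which is consistent with it being fast but yielding lower utility.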
Effects of the variance of the price of different skills. The results are presented in Fig. (a) to (c). We can see from the figures that TBA and ABA perform much better than the baseline algorithm as the price increases. The running time and memory consumption do not vary much across the different price settings.
Effects of the total number of skills. The results are presented in Fig. (a) to (c). First, we can observe that the utility and memory consumption do not change greatly as the number of skills increases. Then, we can see that the running time increases as the number of skills increases, which is reasonable because it is harder to find suitable workers to finish a task when there are more kinds of skills.
Conclusion. In terms of utility, TBA is better than ABA, and both TBA and ABA perform much better than the baseline algorithm. As for running time, the baseline algorithm is the fastest, but the speed of TBA and ABA is acceptable in most circumstances. Moreover, TBA is faster than ABA.
5 Related Work
In this section, we review related work in two categories, namely task assignment and the team formation problem.
5.1 Task Assignment in Spatial Crowdsourcing
The research on task assignment in spatial crowdsourcing mainly includes two parts: micro-task assignment and specialty-aware task assignment.
Micro tasks are spatial tasks that can be completed by any single worker.  is the first work on task assignment in spatial crowdsourcing, whose optimization objective is to maximize the total number of assigned tasks.  is the first work focusing on the online scenario of task assignment, and studies the two-sided online task assignment problem, whose goal is to maximize the total utility score of the assignment.  also focuses on the online scenario and considers the influence of work space on task assignment, whose goal is to maximize the total utility score.  studies the problem of online minimum weighted bipartite matching, which can be used in online task assignment.  considers the problem of flexible online matching where workers can be scheduled if no task is assigned.  recommends routes dynamically for workers to deal with online tasks, and the goal is to maximize the total utility.  assigns tasks to workers while trading off quality and latency of task completion.  proposes a match-based approach to solve the dynamic pricing problem in spatial crowdsourcing.  takes the destinations of workers into consideration to perform task assignment.  considers performing online task assignment while preserving the privacy of tasks and workers when the server is untrusted. [12, 19] propose a real-time framework for task assignment. The difference between our work and the aforementioned works is that they focus on micro tasks, which can be completed by a single worker, whereas we study the assignment of specialty-aware tasks, which have requirements on the skills of workers and usually have to be completed by multiple workers collaboratively.
[4, 5] recommend top-k teams with the minimum cost for a specialty-aware task.  studies assigning workers to specialty-aware tasks to maximize the total utility score. The difference between our work and  is that in our work workers specify a fee for each of their skills, whereas in  workers only have a unified fee, which is not practical.
5.2 Team Formation Problem
A closely related topic is the team formation problem , whose goal is to find a team of experts with the minimum cost, according to the skills and social relationships of the users.  studies the online version of the team formation problem, where the issue of workload balance is also considered.  studies another variant of the team formation problem where the capacity constraint of experts is considered. The difference between our problem and the team formation problem and its variants is that we do not consider the social relationships between users and focus on task assignment.
In this paper we study the problem of Specialty-Aware Task Assignment (SATA) in spatial crowdsourcing, where the tasks have requirements on skills, and the workers specify fees for each of their skills. The goal is to maximize the total utility of the task assignment between tasks and workers. We prove the SATA problem is NP-hard. To solve the problem, we propose two efficient and effective heuristic algorithms. We conduct extensive experiments on both synthetic and real-world datasets to evaluate our algorithms. The experiment results show that our solutions are efficient and effective.
[1] Anagnostopoulos, A., Becchetti, L., Castillo, C., Gionis, A., Leonardi, S.: Online team formation in social networks. In: WWW 2012. pp. 839–848
[2] Chen, Z., Fu, R., Zhao, Z., Liu, Z., Xia, L., Chen, L., Cheng, P., Cao, C.C., Tong, Y., Zhang, C.J.: gMission: A general spatial crowdsourcing platform. PVLDB 7(14), 1629–1632 (2014)
[3] Cheng, P., Lian, X., Chen, L., Han, J., Zhao, J.: Task assignment on multi-skill oriented spatial crowdsourcing. TKDE 28(8), 2201–2215 (2016)
[4] Gao, D., Tong, Y., She, J., Song, T., Chen, L., Xu, K.: Top-k team recommendation in spatial crowdsourcing. In: WAIM 2016. pp. 191–204
[5] Gao, D., Tong, Y., She, J., Song, T., Chen, L., Xu, K.: Top-k team recommendation and its variants in spatial crowdsourcing. DSE 2(2), 136–150 (2017)
[6] Kazemi, L., Shahabi, C.: GeoCrowd: enabling query answering with spatial crowdsourcing. In: GIS 2012. pp. 189–198
[7] Lappas, T., Liu, K., Terzi, E.: Finding a team of experts in social networks. In: SIGKDD 2009. pp. 467–476
[8] Majumder, A., Datta, S., Naidu, K.: Capacitated team formation problem on social networks. In: SIGKDD 2012. pp. 1005–1013
[9] Musthag, M., Ganesan, D.: Labor dynamics in a mobile micro-task market. In: CHI 2013
[10] Song, T., Tong, Y., Wang, L., She, J., Yao, B., Chen, L., Xu, K.: Trichromatic online matching in real-time spatial crowdsourcing. In: ICDE 2017. pp. 1009–1020
[11] Tao, Q., Zeng, Y., Zhou, Z., Tong, Y., Chen, L., Xu, K.: Multi-worker-aware task planning in real-time spatial crowdsourcing. In: DASFAA 2018
[12] To, H., Fan, L., Tran, L., Shahabi, C.: Real-time task assignment in hyperlocal spatial crowdsourcing under budget constraints. In: PerCom 2016. pp. 1–8
[13] To, H., Shahabi, C., Xiong, L.: Privacy-preserving online task assignment in spatial crowdsourcing with untrusted server. In: ICDE 2018
[14] Tong, Y., Chen, L., Shahabi, C.: Spatial crowdsourcing: Challenges, techniques, and applications. PVLDB 10(12), 1988–1991 (2017)
[15] Tong, Y., She, J., Ding, B., Chen, L., Wo, T., Xu, K.: Online minimum matching in real-time spatial data: Experiments and analysis. PVLDB 9(12), 1053–1064 (2016)
[16] Tong, Y., She, J., Ding, B., Wang, L., Chen, L.: Online mobile micro-task allocation in spatial crowdsourcing. In: ICDE 2016. pp. 49–60
[17] Tong, Y., Wang, L., Zhou, Z., Chen, L., Du, B., Ye, J.: Dynamic pricing in spatial crowdsourcing: A matching-based approach. In: SIGMOD 2018
[18] Tong, Y., Wang, L., Zhou, Z., Ding, B., Chen, L., Ye, J., Xu, K.: Flexible online task assignment in real-time spatial data. PVLDB 10(11), 1334–1345 (2017)
[19] Tran, L., To, H., Fan, L., Shahabi, C.: A real-time framework for task assignment in hyperlocal spatial crowdsourcing. TIST 9(3), 37 (2018)
[20] Vazirani, V.V.: Approximation Algorithms. Springer Science & Business Media (2013)
[21] Zeng, Y., Tong, Y., Chen, L., Zhou, Z.: Latency-oriented task completion via spatial crowdsourcing. In: ICDE 2018
[22] Zhao, Y., Li, Y., Wang, Y., Su, H., Zheng, K.: Destination-aware task assignment in spatial crowdsourcing. In: CIKM 2017. pp. 297–306