This paper presents an online reinforcement learning framework to solve the quality-cost-aware task allocation problem in multi-attribute social sensing applications. Social sensing has emerged as a new sensing paradigm in pervasive and mobile computing applications where humans (or devices on their behalf) collectively report measurements about the physical world wang2019age; wang2015social. Examples of social sensing applications include air quality and environment monitoring in smart cities using mobile devices wang2015ccs, reporting of malfunctioning urban infrastructure using geotagging yu2015mobile, and damage assessment in disaster response using online social media wang2012truth. In social sensing applications, participants perform sensing tasks at assigned locations to collect different attributes of the measured variables that are of interest to the application zhang2018robust. For example, in an urban air quality sensing application, participants are tasked to measure various air quality attributes (e.g., PM2.5, PM10, CO2) at different locations of the city to estimate the overall air quality and identify potential health risks. We refer to this category of applications as multi-attribute social sensing applications.
In multi-attribute social sensing applications, there exists a fundamental tradeoff between data quality and sensing (task allocation) cost wang2015ccs; zhang2016incentives. In particular, it is essential to obtain comprehensive and accurate measurements to ensure the desired data quality of the social sensing applications. Such a dedicated data collection process often incurs a high sensing cost (e.g., more incentives to recruit participants to perform sensing tasks) wang2015ccs. However, the high sensing cost may not always be affordable to applications with a finite budget jaimes2012location. Therefore, a key challenge for social sensing applications is to find a task allocation strategy (i.e., decide when and where to collect sensing data) that achieves an optimized tradeoff between data quality and sensing cost. We refer to this problem as the quality-cost-aware task allocation problem. Current solutions to this problem primarily focus on identifying an optimal set of sensing locations (i.e., cells) at which to collect measurements to minimize the overall sensing error wang2015ccs; zhang2018real; tong2016online; zhou2015qoata; hsieh2015inferring; zhang2017expertise; yu2015quality; liu2016cost; liu2018survey. However, these solutions cannot be directly adapted to solve our task allocation problem due to three challenges that have not been fully addressed: online task allocation, multi-attribute constrained optimization, and nonuniform task allocation cost. We elaborate on them below.
Online Task Allocation. Many social sensing applications are delay sensitive and require a timely response to meet the application requirement hsieh2015inferring. For example, during a hurricane, it is crucial for the application to decide when and where data should be collected to provide real-time situation awareness about the disaster. However, online task allocation in social sensing is challenging due to the large spatial-temporal dynamics of the measured variables and the opportunistic nature of social sensing participants hsieh2015inferring; zhang2018real. This problem becomes more challenging in a multi-attribute sensing scenario where the values of all attributes change simultaneously. Several task allocation methods have been developed to address similar problems wang2015ccs; ahmed2011distance. However, these methods have a few important limitations. First, existing models largely ignore the high dynamics of social sensing applications and allocate sensing tasks to cells one by one until the data quality requirement is met wang2015ccs. Second, current solutions do not explicitly consider the correlation between different attributes, thus leading to sub-optimal task allocation solutions.
Multi-attribute Constrained Optimization. We observe that different sensing attributes often have different spatio-temporal distributions that affect the task allocation decisions zhang2018real. For example, a locally optimal task allocation strategy for a particular sensing attribute may not be the globally optimal strategy across all attributes. Furthermore, different sensing attributes may have inherent and complex dependencies. For example, PM2.5 and CO2 are often found to be correlated in a social sensing application that measures the air quality of a city zhang2018real. It is a nontrivial task to design a task allocation strategy that can effectively identify the optimal set of sensing cells to minimize the sensing error across multiple interdependent sensing attributes with diversified distributions.
Nonuniform Task Allocation Cost. The task allocation cost in social sensing is often related to the incentives needed to motivate a participant to travel from one sensing location to another kazemi2012geocrowd. Such task allocation cost often has a non-uniform distribution (e.g., different travel distances lead to different amounts of incentives), which adds additional complexity to the optimized task allocation problem zhao2016spatial. For example, in the social sensing application shown in Figure 1, a participant at location C may be assigned to collect air quality readings at two possible locations: A and B. The sensing measurements collected at B will reduce the overall sensing error more significantly. However, the task allocation cost at B is also higher than at A because the travel distance between C and B is larger than the one between C and A (i.e., d(C, B) > d(C, A)). The question is to which location we should send the participant to perform the sensing task. To answer this question, the task allocation scheme needs to carefully explore the tradeoff between the data quality and the nonuniform task allocation cost.
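To make this tradeoff concrete, here is a minimal sketch of the decision faced by the participant in the example above. All coordinates, error reductions, and the tradeoff weight `lam` are made-up illustrative values, not quantities from the paper:

```python
import math

def travel_cost(src, dst, cost_per_km=1.0):
    """Task allocation cost proportional to Euclidean travel distance."""
    return cost_per_km * math.dist(src, dst)

def net_benefit(error_reduction, src, dst, lam=0.5):
    """Expected sensing-error reduction discounted by the travel cost."""
    return error_reduction - lam * travel_cost(src, dst)

# Participant at C; candidate sensing locations A (near) and B (far).
C, A, B = (0.0, 0.0), (1.0, 0.0), (4.0, 0.0)
score_A = net_benefit(0.8, C, A)   # 0.8 - 0.5 * 1.0 = 0.30
score_B = net_benefit(1.5, C, B)   # 1.5 - 0.5 * 4.0 = -0.50
best = "B" if score_B > score_A else "A"
```

Although B reduces the sensing error more, the higher travel cost makes A the better assignment under this (assumed) linear tradeoff.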
In this paper, we develop a Quality-Cost-Aware Online Task Allocation (QCO-TA) scheme to address the above challenges under a principled online reinforcement learning framework. To address the online task allocation challenge, we develop an online learning algorithm that dynamically estimates the priority of each sensing cell for different sensing attributes at each cycle. To address the multi-attribute constrained optimization challenge, we develop a Bayesian inference scheme that judiciously combines the priority estimations of all sensing attributes into a comprehensive priority score that identifies the cells that effectively reduce the overall sensing error across different sensing attributes. To address the nonuniform task allocation cost challenge, we develop a principled reinforcement learning method that explicitly considers the nonuniform task allocation cost and learns the optimal set of sensing cells for task allocation. Finally, we evaluate the QCO-TA scheme on a real-world social sensing dataset: the Piemonte Air Quality Dataset. The results show that our scheme significantly outperforms the state-of-the-art baselines in both sensing accuracy and cost.
We choose the online and reinforcement learning framework to address the quality-cost-aware task allocation problem in multi-attribute social sensing applications for two main reasons. First, the sensing measurements in social sensing applications are often collected in real time aggarwal2013social. The online learning technique is a great fit for such application scenarios because it is capable of dynamically adjusting the task allocation decisions based on the streaming sensing measurements. This is in contrast to batch-based learning techniques, which often require a large amount of high-quality training data a priori that is not available in our problem setting. Second, our quality-cost-aware task allocation problem aims to find a task allocation strategy that achieves an optimized trade-off between data quality and sensing cost given incomplete sensing measurements (i.e., due to a finite sensing budget). Reinforcement learning is a goal-oriented learning technique that fits our problem nicely. In particular, it provides a data-driven solution that learns to achieve a complex objective (i.e., Equation (2)) given a set of partially available sensing measurements. The reinforcement learning solution sharply contrasts with classical optimization techniques (e.g., linear programming, dynamic programming), which often require a complete set of sensing measurements to learn the optimized trade-off between data quality and sensing cost.
However, such a complete sensing dataset is often not available due to limited sensing resources and coverage in social sensing applications wang2015ccs.
A preliminary version of this work was published in zhang2018optimizing. We refer to the scheme developed in that work as the Online Optimized Multi-attribute Task Allocation (OO-MTA) scheme. The current paper is a significant extension of the previous work in the following aspects. First, we extend our previous model by explicitly exploring the optimized tradeoff between data quality and sensing cost in multi-attribute social sensing applications. In contrast, OO-MTA only focuses on optimizing data quality and does not take the sensing cost into consideration. Second, we develop a novel reinforcement learning algorithm that explicitly addresses the nonuniform task allocation cost challenge identified in this paper. Third, we add a new set of experiments to explicitly evaluate the performance of all compared schemes in terms of both data quality and sensing cost. Fourth, we compare our scheme with more recent task allocation schemes, including OO-MTA, and demonstrate the performance gains achieved by the QCO-TA scheme over all baselines. Finally, we extend the related work with a new discussion of cost-aware task allocation schemes and the differences between QCO-TA and those schemes (Section 2).
2 Background and Related Work
2.1 Social Sensing
Social sensing has emerged as a new application paradigm due to the proliferation of portable devices and ubiquitous Internet connectivity wang2015social. A recent survey of social sensing can be found in wang2019age. Social sensing has been widely used in environment sensing hsieh2015inferring, traffic monitoring zhang2018risksens, emergency and disaster response zhang2017constraint, social sensor profiling zhang2018opinion, point-of-interest (POI) identification zhang2017large, clickbait video detection shang2019towards, and abnormal event identification giridhar2016clarisense+. Quality-cost-aware optimal task allocation in social sensing remains an open challenge that has not been fully addressed wang2015ccs. This paper addresses the quality-cost-aware task allocation problem in a more challenging scenario where the measured variables have multiple dependent attributes and nonuniform sensing costs.
2.2 Task Allocation
Task allocation with sparse resources has been well studied in the mobile crowdsensing literature wang2015ccs; zhang2018real; tong2016online; vance2019towards; zhang2018cooperative; zhou2015qoata; hsieh2015inferring; zhang2017expertise; yu2015quality; liu2016cost; zhang2019integrated; zhang2019heteroedge. These techniques can be classified into two main categories based on their primary objectives. 1) Resource and Cost Reduction: For example, Wang et al.
developed a data quality aware task allocation scheme that leverages active learning and Bayesian inference techniques to allocate sensing tasks to a limited number of crowd sensors to reduce the overall sensing cost wang2015ccs. Zhang et al. developed a bottom-up task allocation scheme where mobile sensors bid for tasks to minimize energy cost using a game-theoretic approach zhang2018real. Tong et al. proposed a two-phase online task allocation scheme to reduce the task allocation cost in real-time crowdsourcing systems tong2016online. 2) Quality-of-Service (QoS) Improvement: For example, Zhou et al. developed a budget-aware task allocation scheme that maximizes the quality of sensing data under the constraints imposed by the physical distance between tasks zhou2015qoata. Hsieh et al. developed a greedy task allocation scheme to allocate sensors to cells that would generate minimum entropy to improve the inference accuracy hsieh2015inferring. Zhang et al. proposed an expertise-aware task allocation scheme to ensure the quality of the collected data in mobile crowdsourcing systems by inferring the expertise of task participants through truth analysis zhang2017expertise. There also exist a few solutions that explore both cost and QoS. For example, Yu et al. proposed a quality and budget aware task allocation scheme to improve the data quality of spatial crowdsourcing systems given application-specific budget limitations yu2015quality. Liu et al. developed an information distribution aware task allocation framework to minimize the task allocation cost of mobile crowdsourcing applications while ensuring the social fairness (e.g., task load balancing) of each participant liu2016cost.
Our work is clearly different from the above solutions in several important aspects. First, we explicitly consider the multi-attribute sensing problem, which is more challenging than the single-attribute problem addressed in the above literature. In the multi-attribute sensing problem, different sensing attributes often have different and correlated spatial-temporal distributions that can lead to inconsistent or even conflicting task allocation decisions zheng2013u. For example, sensing measurements that significantly improve the estimation accuracy of a specific sensing attribute may not be equally helpful for other sensing attributes. Moreover, it is not a trivial task to model the complex dependencies between different sensing attributes and understand how such dependencies affect the globally optimized task allocation solution. Second, the goal of our solution is to achieve a complex optimization objective (i.e., jointly optimize the trade-off between the nonuniform task allocation cost and the sensing data quality). This objective becomes more challenging when we consider the incomplete sensing measurements introduced by a finite sensing budget in our problem. Such incomplete sensing measurements often provide inadequate evidence to explore the aforementioned trade-off between data quality and sensing cost in multi-attribute social sensing applications.
2.3 Online Learning
Our work is also related to online learning techniques, which have been applied to decision-making problems in social sensing applications feng2010online; zhang2017maintenance; rajan2013crowdcontrol; zhang2018light; xu2017online; zhang2018scalable. In particular, online learning learns to make sequential decisions to achieve the desired quality-of-service of an application and dynamically adjusts the learning process based on the streaming data received at each step. For example, Rajan et al. developed an online learning task scheduling scheme that coordinates the real-time execution of crowd tasks through the learning of crowd performance rajan2013crowdcontrol. Feng et al.
developed an online learning algorithm to detect abnormal behavior patterns in crowds using the online self-organization mapping technique feng2010online. Zhang et al.
proposed an online learning framework to improve the efficiency of competence-based knowledge compression in machine learning zhang2017maintenance. Xu et al. developed an efficient online learning algorithm for dynamic workload offloading (to the centralized cloud) to minimize the system delay and operation cost xu2017online. To the best of our knowledge, the QCO-TA scheme is one of the first approaches to leverage online learning techniques to address the quality-cost-aware multi-attribute task allocation problem in social sensing.
2.4 Reinforcement Learning
Finally, our work also bears some relevance to reinforcement learning techniques, which have been applied in recommendation systems, intelligent transportation systems, computer vision, natural language processing, and control theory zhang2017dynamic; xu2018zero; supancic2017tracking; branavan2009reinforcement; abbeel2007application. In particular, reinforcement learning learns to optimize the desired objective of an application by maximizing the cumulative reward received from the environment when exploring the application-specific search space. For instance, Zhang et al. developed a scholar collaboration and recommendation system via competitive multi-agent reinforcement learning zhang2017dynamic. Xu et al. applied a deep reinforcement learning approach in intelligent transportation systems to improve the control robustness of autonomous driving vehicles xu2018zero. Supancic III et al. presented a reinforcement learning based decision making framework to continuously track objects in streaming videos supancic2017tracking. Branavan et al. developed a new reinforcement learning approach to effectively map natural language instructions to executable actions branavan2009reinforcement. Abbeel et al. proposed a reinforcement learning based autonomous helicopter flight control system using differential dynamic programming abbeel2007application. To the best of our knowledge, the QCO-TA scheme is among the first frameworks to leverage reinforcement learning techniques to explore the optimized tradeoff between data quality and sensing cost in multi-attribute social sensing applications.
3 Problem Statement
In this section, we formulate the problem of quality-cost-aware task allocation in multi-attribute social sensing applications. We first define a few terms that will be used in the problem statement.
Sensing Cell: We divide the target area of the multi-attribute social sensing task into disjoint cells, where each cell represents a subarea of interest. In particular, we define $C = \{c_1, c_2, ..., c_N\}$ to represent the set of sensing cells in the target area, $N$ to be the total number of sensing cells, and $c_i$ to be the $i$-th sensing cell in the target area.
Sensing Cycle: A sensing cycle is a period of time in which participants perform one round of sensing tasks. We define $T$ to be the total number of sensing cycles, and $t$ to index the $t$-th sensing cycle.
Sensing Attribute: In social sensing applications, the measured variables often have multiple sensing attributes. We define $K$ to be the total number of sensing attributes, and $k$ to index the $k$-th attribute.
Consider a social sensing application where the goal is to monitor the air quality index of a city by tasking people to collect sensing measurements at different locations. In this case, a sensing cell is a neighborhood where the sensing values stay relatively stable spatially wang2015ccs. A sensing cycle reflects the frequency of sensing data updates (e.g., hourly, daily). In order to estimate the air quality index, participants collect a set of sensing attributes (e.g., NO2, CO, PM2.5 and PM10) that are associated with the measured variables (i.e., the air quality index at different cells).
Real Sensing Value ($X$): We define the $K \times N \times T$ matrix $X$ to represent the ground-truth sensing values of the measured variables. In particular, $X^k$ is the ground-truth sensing value of attribute $k$, and $x^k_{i,t}$ is the ground-truth sensing value of attribute $k$ in cell $c_i$ at cycle $t$.
Collected Sensing Value ($Y$): We define a $K \times N \times T$ matrix $Y$ to represent the collected sensing values of the measured variables. In particular, $Y^k$ is the collected sensing value of attribute $k$, and $y^k_{i,t}$ is the collected sensing value of attribute $k$ in cell $c_i$ at cycle $t$.
Inferred Sensing Value ($\hat{X}$): We define a $K \times N \times T$ matrix $\hat{X}$ to represent the inferred sensing values of the measured variables, produced by inference algorithms that leverage the collected sensing values. In particular, $\hat{X}^k$ is the inferred sensing value of attribute $k$, and $\hat{x}^k_{i,t}$ is the inferred sensing value of attribute $k$ in cell $c_i$ at cycle $t$.
Sensing Error ($e$): We define the sensing error to be the mean absolute error between the inferred sensing value and the real sensing value. In particular, we have $e^k_{i,t} = |\hat{x}^k_{i,t} - x^k_{i,t}|$, where $e^k_{i,t}$ is the inference error of attribute $k$ in cell $c_i$ at cycle $t$.
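The sensing error of one attribute over a set of cells can be sketched as a plain mean-absolute-error computation (the function name is ours, not the paper's):

```python
def sensing_error(inferred, real):
    """Mean absolute error between inferred and ground-truth sensing
    values of one attribute over a set of cells (or cycles)."""
    if len(inferred) != len(real):
        raise ValueError("series must have equal length")
    return sum(abs(h - r) for h, r in zip(inferred, real)) / len(real)
```

For example, `sensing_error([1.0, 2.0, 3.0], [1.5, 1.0, 3.0])` averages the per-cell absolute errors 0.5, 1.0, and 0.0.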
Number of Participants ($M$): We define $M$ to be the total number of participants that can be assigned sensing tasks in the application. In social sensing, we observe that the number of available participants is often much smaller than the number of sensing cells due to budget and resource limitations wang2015ccs, i.e., $M \ll N$. We denote $p_j$ as the $j$-th participant. In addition, we assume the participants in our multi-attribute social sensing applications to be collaborative, i.e., participants agree to perform all assigned tasks during the application period as long as compensation is provided.
Task Allocation Cost ($\phi$) (we use the terms task allocation cost and sensing cost interchangeably in this paper): We define the task allocation cost to be the compensation that covers the cost of a participant traveling from one sensing cell to another. We consider the cost to be proportional to the travel distance of a participant in this paper. In particular, we have

$$\phi_{j,t} = \eta \cdot d_{j,t} \quad (1)$$

where $\phi_{j,t}$ and $d_{j,t}$ are the task allocation cost and travel distance of participant $p_j$ at sensing cycle $t$, and $\eta$ is a constant unit cost. In this paper, we choose the above simplified cost model; however, we observe that the cost function can be readily extended with additional cost/incentive designs and mobility models zhang2015incentives to accommodate the specific requirements of a social sensing application. For example, we can extend Equation (1) by modeling the diversified response times and transportation expenses of different sensing participants, e.g., participants may choose different means of transportation (e.g., walk, bike, drive, bus) to travel between sensing cells depending on personal preference or the availability of transportation options.
The goal of our online quality-cost-aware task allocation in multi-attribute social sensing applications is to make real-time task allocation decisions that optimize the tradeoff between the overall sensing error over all sensing attributes of the measured variables and the task allocation costs over all participants. Formally, our problem is defined as:

$$\min_{S_1, ..., S_T} \sum_{t=1}^{T} \Big( \sum_{k=1}^{K} f^k\big(\sum_{i=1}^{N} e^k_{i,t}\big) + \sum_{j=1}^{M} \phi_{j,t} \Big) \quad \text{s.t. } |S_t| \leq M, \; \forall t \quad (2)$$

where $T$ denotes the total number of sensing cycles of interest, and $S_t$ represents the cells that are allocated to the participants for sensing tasks at cycle $t$. Considering that different sensing attributes have different ranges of values, we define $f^k(\cdot)$ as the normalization function that normalizes the sensing error for attribute $k$.
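The per-cycle contribution to this objective can be sketched as follows. The normalization functions and the tradeoff weight `lam` are assumed illustrative choices, not the paper's calibrated values:

```python
def cycle_objective(attr_errors, normalizers, costs, lam=1.0):
    """One sensing cycle's contribution to the quality-cost objective:
    normalized sensing error summed over attributes, plus the total
    task allocation cost weighted by an assumed tradeoff factor lam."""
    quality = sum(f(e) for f, e in zip(normalizers, attr_errors))
    return quality + lam * sum(costs)

# Example: two attributes with different value ranges, each normalized
# by a simple range-based function.
norm = [lambda e: e / 10.0, lambda e: e / 500.0]
obj = cycle_objective([2.0, 100.0], norm, costs=[1.0, 0.5], lam=0.1)
```

Without normalization, the attribute with the larger value range (here the second one) would dominate the objective.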
The above problem is NP-hard since each of its two objectives (i.e., minimizing the sensing error or minimizing the sensing cost) can be reduced to the Knapsack problem (i.e., one of Karp's 21 NP-complete problems) pisinger1995minimal. (The detailed proof of NP-hardness of the proposed optimization problem can be found in the Appendix.) In this paper, we develop a QCO-TA scheme that judiciously explores the tradeoff between data quality and sensing cost and identifies an optimized task allocation strategy to jointly optimize the sensing error and cost. The details of the QCO-TA scheme are discussed in the next section.
4 The QCO-TA Scheme
In this section, we present the Quality-Cost-Aware Online Task Allocation (QCO-TA) scheme to address the problem formulated in the previous section using a principled online reinforcement learning framework. The QCO-TA scheme consists of three modules: 1) a Single-Attribute Priority Estimation (SPE) module, 2) a Multi-Attribute Priority Integration (MPI) module, and 3) a Nonuniform-Cost-Aware Task Selection (NTS) module. The overall architecture of the QCO-TA scheme is shown in Figure 2.
4.1 Single-Attribute Priority Estimation (SPE)
In this subsection, we present the single-attribute priority estimation module that addresses the online task allocation challenge discussed in the introduction. We first define a few terms that will be used in this module.
Task Priority: We define the task priority as the order in which the sensing cells are selected for task allocation given a particular sensing attribute (since each task is associated with a particular sensing cell, we use task and cell priority interchangeably in the rest of the paper). A task with the highest priority will be selected first.
Priority Score: We further define the priority score as a scalar to quantify the task priority defined above.
In particular, the SPE module estimates the priority score of each cell for a given sensing attribute and dynamically updates the estimations based on the collected sensing values from previous cycles using an online learning algorithm. To compute the task priority of each cell for a given attribute in real time, we need to know which cell's sensing value, if collected, would be the most helpful in reducing the sensing error. This problem has been proven to be NP-hard without knowing the real sensing values of the cells in advance wang2015ccs. To solve this problem, we develop an efficient approximation algorithm that considers two factors directly related to the sensing error of a cell: 1) uncertainty: the estimation confidence of the sensing values in a cell from a given inference algorithm (e.g., KNN, SVR) wang2015ccs; 2) representativeness: how accurately the sensing value of the target cell can be used to represent the values of its neighboring cells pan2005finding. In QCO-TA, we use temporal entropy and spatial mutual information to estimate the uncertainty and representativeness of a sensing cell, respectively.
Temporal Entropy ($TE$): We define temporal entropy to quantify the uncertainty of the inferred sensing value of a sensing cell as follows:

$$TE^k_{i,t} = \mathbb{H}\big(P(\hat{x}^k_{i,t})\big)$$

where $TE^k_{i,t}$ is the temporal entropy of cell $c_i$ at cycle $t$ for attribute $k$, and $\hat{x}^k_{i,t}$ is the inferred sensing value of cell $c_i$ at cycle $t$ for attribute $k$ (defined in Definition 6). $P(\cdot)$ is the distribution (e.g., normal distribution) of the inferred sensing value, and $\mathbb{H}(\cdot)$ is the function that calculates the differential entropy of that distribution friedman2001elements. Intuitively, a high temporal entropy of a cell indicates that the inference algorithm is uncertain about its inferred sensing values in that cell, and vice versa.
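Under the normal-distribution example above, the differential entropy has a closed form that depends only on the spread of the predictive distribution; a minimal sketch:

```python
import math

def temporal_entropy_normal(sigma):
    """Differential entropy of a normal predictive distribution:
    H = 0.5 * ln(2 * pi * e * sigma^2); it grows with the spread
    sigma, matching the intuition that wider predictions are more
    uncertain."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
```

For instance, a cell whose inferred value has standard deviation 2.0 receives a higher temporal entropy than one with standard deviation 1.0.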
Spatial Mutual Information ($SMI$): We define the spatial mutual information of a sensing cell to be the aggregated mutual information between the target cell and the rest of the cells:

$$SMI^k_{i,t} = \sum_{j=1, j \neq i}^{N} MI(\hat{x}^k_{i,t}, \hat{x}^k_{j,t})$$

where $SMI^k_{i,t}$ represents the spatial mutual information of cell $c_i$ at cycle $t$ for attribute $k$, and $MI(\cdot, \cdot)$ is the function that calculates the mutual information of inferred sensing values between different cells ross2014mutual. In particular, $MI(\hat{x}^k_{i,t}, \hat{x}^k_{j,t})$ is the mutual information between cells $c_i$ and $c_j$. Intuitively, a high spatial mutual information of a cell indicates that the sensing values of that cell, if selected, can be used to significantly reduce the inference error of the other cells.
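To make the estimator concrete, the following is a minimal plug-in sketch (not the paper's implementation, which follows ross2014mutual): it bins each cell's inferred value history with equal-width bins and sums the pairwise mutual information against all other cells:

```python
from collections import Counter
import math

def mutual_information(xs, ys, bins=4):
    """Plug-in MI estimate between two cells' value histories,
    after equal-width binning of each series."""
    def discretize(vs):
        lo, hi = min(vs), max(vs)
        w = (hi - lo) / bins or 1.0  # guard against constant series
        return [min(int((v - lo) / w), bins - 1) for v in vs]
    dx, dy = discretize(xs), discretize(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    return sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def spatial_mutual_information(target, others, bins=4):
    """SMI of a target cell: MI summed over all other cells."""
    return sum(mutual_information(target, o, bins) for o in others)
```

The plug-in estimate is crude for short histories; it is only meant to show the structure of the SMI computation.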
We then combine the temporal entropy ($TE$) and spatial mutual information ($SMI$) to compute the priority score ($PS$) that determines the task priority of each sensing cell for task allocation as follows:

$$PS^k_{i,t} = \alpha_t \cdot TE^k_{i,t} + \beta_t \cdot SMI^k_{i,t}$$

where $PS^k_{i,t}$ represents the priority score of cell $c_i$ at sensing cycle $t$ given sensing attribute $k$. $\alpha_t$ and $\beta_t$ are the weights for temporal entropy and spatial mutual information at cycle $t$, respectively. The values of $\alpha_t$ and $\beta_t$ are tuned based on the requirements of specific applications.
4.2 Multi-Attribute Priority Integration (MPI)
In this subsection, we describe the Multi-Attribute Priority Integration (MPI) module to address the multi-attribute constrained optimization challenge. First, we formally define a comprehensive ranking score as follows.
Unified Priority Score ($UPS$): We define the $UPS$ to be the weighted summation of the priority scores of all sensing attributes generated by the SPE module as follows:

$$UPS_{i,t} = \sum_{k=1}^{K} w^k_t \cdot PS^k_{i,t}$$

where $UPS_{i,t}$ is the unified priority score for cell $c_i$ at cycle $t$ and $K$ is the number of sensing attributes. $w^k_t$ is the weight for attribute $k$ in cycle $t$, and $PS^k_{i,t}$ is the priority score of cell $c_i$ at cycle $t$ for attribute $k$.
The key question is how to dynamically compute the weights for all attributes at each cycle so that the aggregated sensing error (defined in Equation (2)) is minimized by exploiting the dependencies between attributes. To solve this problem, we develop an exponentially weighted online learning algorithm that dynamically updates the weights of all attributes based on the collected sensing values in real time. In particular, we have the following updating rule for the weights:

$$w^k_{t+1} = \frac{w^k_t \cdot \exp(-\gamma \cdot \ell^k_t)}{\sum_{k'=1}^{K} w^{k'}_t \cdot \exp(-\gamma \cdot \ell^{k'}_t)}$$

where $w^k_{t+1}$ and $w^k_t$ are the weights for attribute $k$ at cycles $t+1$ and $t$, respectively. $\gamma$ is the learning rate parameter that directly controls the scale of the weight assigned to each sensing attribute. $\ell^k_t$ is the loss function that measures the sensing error between the inferred sensing value ($\hat{x}^k_{i,t}$, defined in Definition 6) and the real sensing value ($x^k_{i,t}$, defined in Definition 4) in the current cycle $t$. The intuition behind this weight updating function is that it increases the weights of the sensing attributes that contribute less sensing error.
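An exponentially weighted (Hedge-style) update of this kind can be sketched in a few lines; the function name and default learning rate are ours:

```python
import math

def update_weights(weights, losses, gamma=0.5):
    """Exponentially weighted update: each attribute's weight is
    multiplied by exp(-gamma * loss) and the result is renormalized,
    so attributes with smaller loss gain relative weight."""
    raw = [w * math.exp(-gamma * l) for w, l in zip(weights, losses)]
    total = sum(raw)
    return [r / total for r in raw]
```

Starting from equal weights, an attribute with zero loss ends up weighted more heavily than one with loss 1.0 after a single cycle.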
The challenging part of computing the above weight updating function is how to calculate the loss function, since we do not have the real sensing values of all cells due to the budget limitation. To address this problem, we apply Bayesian inference to estimate the loss function as follows:
$$\hat{\ell}^k_t = F^{-1}_{D^k_t}(p)$$

where $\hat{\ell}^k_t$ is the estimated loss given the collected sensing values $y^k_t$ and the inferred sensing values $\hat{x}^k_t$. $F^{-1}_{D^k_t}(\cdot)$ is the inverse of the cumulative distribution function of the distribution $D^k_t$, where $D^k_t$ is the distribution of the mean absolute error (MAE) between $y^k_t$ and $\hat{x}^k_t$ for attribute $k$ in the current cycle $t$. We assume $D^k_t$ follows the normal distribution, which is a common assumption for the MAE in social sensing applications wang2015ccs. In addition, $p$ is a probability threshold that determines the level of approximation between the loss and the estimated loss. It is usually set to be higher than 0.95.
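Under the normal assumption above, the loss estimate is simply a quantile of the fitted MAE distribution; a minimal sketch using the standard library's normal inverse CDF (parameter names are ours):

```python
from statistics import NormalDist

def estimate_loss(mae_mean, mae_std, p=0.95):
    """Estimated loss as the p-quantile (inverse CDF) of the assumed
    normal distribution of the MAE, so the true loss is covered with
    probability at least p."""
    return NormalDist(mu=mae_mean, sigma=mae_std).inv_cdf(p)
```

Raising the threshold `p` makes the estimate more conservative (larger), as expected from an upper-quantile bound.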
4.3 Nonuniform-Cost-Aware Task Selection (NTS)
In this subsection, we describe the Nonuniform-Cost-Aware Task Allocation (NTS) module to address the nonuniform task allocation cost challenge. In particular, the NTS module judiciously integrates the nonuniform task allocation costs with the unified ranking score generated by the MPI module to explore the optimized tradeoff between the data quality and sensing cost through a principled reinforcement learning framework. We first define a quality-cost-aware ranking score as follows.
$$QRS_i^t = f\left(UPS_i^t,\; c_{j,i}^t\right)$$
where $QRS_i^t$ is the quality-cost-aware ranking score for cell $i$ at cycle $t$, $UPS_i^t$ is the unified priority score of cell $i$ at cycle $t$, $c_{j,i}^t$ is the cost for a participant to move from the current cell $j$ to cell $i$ to perform the sensing task at cycle $t$, and $f(\cdot)$ is the mapping function that integrates the UPS score and the nonuniform sensing cost. Intuitively, a high QRS value indicates that a cell, if selected, would most likely reduce the overall sensing error with minimal sensing cost, and vice versa.
The key challenge now is how to design an effective mapping function $f(\cdot)$ to compute the QRS score so that the aggregated sensing error and the overall sensing cost can be jointly optimized as indicated in Equation (2). To solve this problem, we develop a principled reinforcement learning algorithm that iteratively explores the optimal tradeoff between the data quality and the task allocation cost. We first define a few key terms that will be used in our reinforcement learning framework.
State ($s$): We define a state $s$ as a sensing cell (i.e., state $s_i$ corresponds to cell $i$) that is considered as a candidate for task allocation. Each state carries a state value $V(s)$, which indicates the priority of the corresponding cell to be selected for a sensing task allocation. In particular, we define $V^t(s_i)$ to be the state value for state $s_i$ at sensing cycle $t$.
We initialize the state value for each sensing cell with the UPS score generated by MPI at each sensing cycle (i.e., $V^t(s_i) = UPS_i^t$). The state value is then dynamically updated by our reinforcement learning algorithm to explicitly consider the nonuniform task allocation cost, as elaborated below.
Action ($a$): We define an action $a_{j \to i}$ as the move of a participant who travels from sensing cell $j$ to cell $i$ to perform the assigned sensing task in two consecutive sensing cycles.
A participant can take an action to move from the current cell $j$ to a new cell $i$, or stay at the current cell (i.e., $a_{j \to j}$) to perform the sensing task. The goal of our reinforcement learning algorithm is to learn the optimal action for each participant so that the aggregated sensing error and the overall sensing cost can be jointly optimized.
Reward ($R$): We define the reward $R(a_{j \to i})$ for action $a_{j \to i}$ to be inversely proportional to the task allocation cost (i.e., the travel distance between the two cells) defined in Definition 9 as:
$$R(a_{j \to i}) = \frac{\alpha}{c_{j,i}^t}$$
where $\alpha$ is a scaling parameter whose value is tuned based on the specific requirements of the application. The rewards are used to dynamically update the state values in our reinforcement learning algorithm.
Using the above definitions, we leverage a Bellman optimality equation bradtke1995reinforcement to integrate the nonuniform task allocation cost with the unified ranking score generated by MPI as follows:
where $a_{j \to i}$ is the action of moving from sensing cell $j$ to cell $i$ for a sensing task at cell $i$, and $R(a_{j \to i})$ is the reward for that action. $\gamma$ is the discount parameter that controls the updating rate of the state values; it is usually set to a value less than 1 to ensure the desired performance of the reinforcement learning algorithm. The intuition of the above equation is to assign a higher priority score (i.e., state value) to a cell that leads to a lower overall sensing error with minimal sensing cost.
Using the Bellman optimality equation, we can learn the optimized state value for each sensing cell by iteratively updating each state value until all state values converge as follows:
where $V_{old}(s_i)$ is the state value for sensing cell $i$ in the previous iteration, and $V_{new}(s_i)$ is the updated state value for sensing cell $i$ in the current iteration. $\epsilon$ is the threshold that stops the learning process once $|V_{new}(s_i) - V_{old}(s_i)| < \epsilon$ holds for all cells; it is usually set to a small value (e.g., less than 0.1) to balance the convergence speed and the accuracy of the learned model. We take the learned optimized state value $V^*(s_i)$ as the overall ranking score $QRS_i^t$ for sensing cell $i$ at sensing cycle $t$ as follows:
Finally, each participant is allocated to move from the current cell to the new cell that yields the highest QRS value as follows:
$$i^* = \mathop{\arg\max}_{i \notin \mathcal{A}} \; QRS_i^t$$
where $\mathcal{A}$ is the set of sensing cells that have already been allocated to participants in the current sensing cycle.
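To make the learning loop concrete, the following is a minimal sketch under simplifying assumptions: cells are 2-D points, the reward is a scaling constant divided by the travel distance (with the full reward for staying put), and the UPS score enters each backup as a per-state bonus. The function name, the exact backup form, and the parameter values are illustrative assumptions, not the paper's specification.

```python
import math

def nts_allocate(cells, ups, allocated, alpha=1.0, gamma=0.9, eps=0.01):
    """Value-iteration sketch of the NTS module: state values start from
    the UPS scores and are refined with distance-based rewards; the
    participant is then assigned the unallocated cell of highest value."""
    # cells: {cell_id: (x, y)}; ups: {cell_id: UPS score from the MPI module}
    def reward(src, dst):
        d = math.dist(cells[src], cells[dst])
        return alpha if d == 0 else alpha / d  # staying put earns the full reward

    values = dict(ups)  # initialize state values with the UPS scores
    while True:
        delta = 0.0
        for i in cells:
            # Bellman-style backup over all reachable cells j
            best = ups[i] + max(reward(i, j) + gamma * values[j] for j in cells)
            delta = max(delta, abs(best - values[i]))
            values[i] = best
        if delta < eps:  # stop once every state value has converged
            break
    # greedy assignment among the cells not yet allocated this cycle
    candidates = [c for c in cells if c not in allocated]
    return max(candidates, key=lambda c: values[c])
```

With `gamma` below 1 the backup is a contraction, so the sweep terminates; cells that are both high-priority and cheap to reach end up with the largest values.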
5 Evaluation
In this section, we evaluate the performance of the QCO-TA scheme through a real-world social sensing application. We compare the performance of QCO-TA with state-of-the-art task allocation baselines. The evaluation results show that QCO-TA significantly outperforms the baselines in terms of both sensing accuracy and task allocation cost.
5.1 Dataset
Piemonte Air Quality Dataset: In our evaluation, we use a social sensing dataset published by Blangiardo et al. blangiardo2015spatial (https://sites.google.com/a/r-inla.org/stbook/datasets). This dataset consists of daily air quality measurements across 24 locations (i.e., cells) in Piemonte, Italy (as shown in Figure 3(a)). The measurements include the following air quality related attributes: wind speed, temperature, emission rates of primary aerosols, and particulate matter (PM10). We choose this dataset because i) it contains multiple sensing attributes of the measured variable (i.e., the air quality); and ii) the measurements have large spatial-temporal dynamics (Figure 3(b)), which makes our problem more challenging to solve. The sensing cycle is set to one day for this application.
5.2 Inference Algorithm
In the experiment, we select the following inference algorithms to work with the task allocation schemes to estimate the sensing values of the cells that are not selected for sensing in each cycle.
K-Nearest Neighbour (KNN): KNN estimates the missing value of a cell by averaging the collected sensing values from the $k$ nearest cells of the target cell.
Inverse Distance Weighting (IDW): IDW estimates the missing value of a cell by calculating the weighted average of the collected sensing values from its $n$ closest neighbors, where the weights are proportional to the reciprocal of the spatial distances between the target cell and its neighbors.
Support Vector Regression (SVR): SVR first establishes a prediction model with the collected sensing values from the selected cells using a support vector machine, and then applies the prediction model to infer the missing value of the target cell chen2014short.
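For instance, the IDW estimator can be written compactly as follows. This is a sketch; the coordinate representation and the parameter defaults are our assumptions.

```python
import math

def idw_estimate(target, observed, n_neighbors=3, power=1.0):
    """Inverse Distance Weighting: weighted average of the n closest
    observed cells, with weights proportional to 1 / distance**power."""
    # observed: list of ((x, y), value) pairs for the sensed cells
    dists = []
    for (x, y), value in observed:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value  # the target cell itself was sensed
        dists.append((d, value))
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:n_neighbors]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```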
5.3 Baseline Algorithms
We choose several representative task allocation schemes as the baselines.
OO-MTA: The OO-MTA scheme is a simplified version of the QCO-TA scheme that selects the sensing cells solely based on the unified priority score (defined in Definition 12) from our previous work zhang2018optimizing. The OO-MTA scheme does not consider the nonuniform task allocation cost.
GPS-TA: GPS-TA (Greedy Priority Selection Task Allocation) is a greedy task allocation algorithm that selects the top $g$ cells with the highest priorities for each sensing attribute for task allocation hsieh2015inferring, where $g$ is equal to the number of participants divided by the number of attributes.
EWA-TA: EWA-TA (Equal Weighted Aggregation Task Allocation) is a task allocation scheme that generates the overall priority of a cell by calculating the mean of the cell's priorities across all sensing attributes and then selects the top-ranked cells for task allocation.
UNS-TA: UNS-TA (Uniform Sampling Task Allocation) is a task allocation scheme that uniformly samples cells from all cells for sensing task allocation in each cycle, where each cell has an equal probability of being selected for task allocation ho2012online.
5.4 Evaluation Metrics
In our evaluation, we define the following metrics to evaluate the task allocation performance of all compared schemes.
Aggregated Sensing Error (ASE): We define the Aggregated Sensing Error (ASE) to be the aggregated sensing error over all sensing attributes of the measured variable. Specifically, we define:
$$ASE = \frac{1}{M \cdot X \cdot T} \sum_{k=1}^{M} \sum_{i=1}^{X} \sum_{t=1}^{T} N_k\left(e_{k,i}^t\right)$$
where $M$ is the number of sensing attributes, $X$ is the number of sensing cells, and $T$ is the number of sensing cycles. $e_{k,i}^t$ is the sensing error for attribute $k$ of cell $i$ at cycle $t$ as defined in Definition 7, and $N_k(\cdot)$ is the normalization function for the sensing error of attribute $k$ as defined in Equation (2).
Average Task Allocation Cost (ATC): We define the Average Task Allocation Cost (ATC) as follows:
$$ATC = \frac{1}{P \cdot T} \sum_{p=1}^{P} \sum_{t=1}^{T} c_p^t$$
where $P$ is the number of participants and $c_p^t$ is the task allocation cost for participant $p$ at sensing cycle $t$ as defined in Definition 9. In our evaluation, the task allocation cost is measured by the physical distance a participant travels between the source and destination cells.
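Both metrics are straightforward averages; a sketch of their computation follows, where the data layout and the normalization interface are our assumptions.

```python
def aggregated_sensing_error(errors, normalizers):
    """ASE: errors[k][i][t] is the sensing error of attribute k at cell i
    and cycle t; normalizers[k] is the normalization function for k."""
    total, count = 0.0, 0
    for k, per_attribute in enumerate(errors):
        for per_cell in per_attribute:
            for e in per_cell:
                total += normalizers[k](e)
                count += 1
    return total / count

def average_task_allocation_cost(costs):
    """ATC: costs[p][t] is the travel distance of participant p at cycle t."""
    flat = [c for per_participant in costs for c in per_participant]
    return sum(flat) / len(flat)
```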
5.5 Evaluation Results
In this subsection, we present the results of our QCO-TA scheme and the compared baselines on the real-world social sensing dataset. In the experiment, we evaluate the performance of all compared schemes by varying the number of sensing attributes of the measured variable. In particular, we change the number of attributes from two to four based on the number of attributes available in the dataset (i.e., we have four attributes in total). For a given number of sensing attributes, we evaluate the performance of all compared schemes using the aggregated sensing error and average task allocation cost metrics defined in Equation (15) and Equation (16), respectively. We also evaluate the performance of all schemes by changing the number of participants. Specifically, we vary the number of participants from 8 to 14 in consideration of the number of sensing cells in our dataset (i.e., $X = 24$).
5.5.1 Evaluation Results on Data Quality
[Table 1: aggregated sensing error of all compared schemes with 2, 3, and 4 sensing attributes]
The results on aggregated sensing error are shown in Table 1. We observe that the QCO-TA scheme outperforms all of the baselines by achieving the smallest sensing error. The performance gain achieved by the QCO-TA scheme is consistent over different inference algorithms and different numbers of sensing attributes. For example, the performance gains achieved by QCO-TA over the best-performing baseline for $P = 8$, $P = 10$, $P = 12$, and $P = 14$ are 4.8%, 4.4%, 3.0%, and 3.1%, respectively, under the KNN inference algorithm when the number of sensing attributes is 2. Such performance gains are achieved by two key designs in the proposed QCO-TA scheme. First, the SPE module (described in Section 4.1) judiciously uses temporal entropy and spatial mutual information to estimate the priority of cells for different sensing attributes. Two types of sensing cells are often selected for sensing tasks to reduce the overall sensing error: i) sensing cells with high uncertainty in their inferred sensing values, and ii) representative sensing cells whose sensing values can be used to represent those of their neighboring cells. Second, the MPI module (described in Section 4.2) utilizes principled exponential weighted online learning to integrate the priority estimations of different sensing attributes. In particular, the MPI module explicitly exploits the dependencies between the attributes and carefully increases the weights of the sensing attributes that contribute less to the overall sensing error. Additionally, we observe that the sensing error of all schemes generally decreases as the number of participants increases. This is because a larger number of participants allows the schemes to collect sensing values from more cells, which reduces the errors of the inference algorithms of all compared schemes. These results demonstrate that the QCO-TA scheme can minimize the sensing error of social sensing applications with multiple sensing attributes in comparison with the state-of-the-art baselines.
5.5.2 Evaluation Results on Task Allocation Cost
[Table 2: average task allocation cost of all compared schemes with 2, 3, and 4 sensing attributes]
The results on average task allocation cost are shown in Table 2. We observe that the QCO-TA scheme outperforms all baselines by achieving the lowest task allocation cost. The performance gains achieved by QCO-TA over the best-performing baseline for $P = 8$, $P = 10$, $P = 12$, and $P = 14$ are 6.458 km, 3.645 km, 2.633 km, and 6.279 km of travel distance, respectively, under the KNN inference algorithm when the number of sensing attributes is 2. Similar performance gains are also observed for different inference algorithms and different numbers of sensing attributes. Such performance gains of the QCO-TA scheme are achieved by the key design of the principled reinforcement learning framework (the NTS module described in Section 4.3), which directly learns the optimized tradeoff between the data quality and the nonuniform task allocation cost. In particular, the NTS module iteratively updates the proposed Bellman optimality equation (defined in Equation 11) to assign a higher priority score to a cell that leads to a lower overall sensing error with minimal sensing cost. The reinforcement learning process ensures that the QCO-TA scheme always selects the sensing cells that incur the lowest sensing cost under the premise of sensing quality assurance. In summary, the above results demonstrate the capability of QCO-TA to achieve the goal of quality-cost-aware task allocation (defined in Equation 2). We acknowledge that our evaluation only considers the travel distance of participants as the task allocation cost. However, in real-world applications, additional factors can affect the task allocation cost (e.g., the time spent by participants and their mode of commute). We plan to further validate the QCO-TA scheme with a more comprehensive cost model in future work.
5.5.3 Affordability of the QCO-TA Scheme
Finally, we study the affordability of the QCO-TA scheme by examining its performance over multiple sensing cycles, as shown in Figure 4. We observe that the QCO-TA scheme requires only a small number of sensing cycles at the beginning to learn a task allocation strategy that optimizes the system performance (in terms of data quality). The results demonstrate that our scheme can efficiently learn the optimal task allocation strategy within a limited number of sensing cycles.
6 Conclusion
This paper develops the QCO-TA scheme to solve the quality-cost-aware task allocation problem in multi-attribute social sensing applications. In the QCO-TA scheme, we develop a single-attribute priority estimation module to estimate the priority score of each cell for a given sensing attribute, a multi-attribute priority integration module to integrate the priority scores of all sensing attributes into a unified ranking score for task allocation, and a nonuniform-cost-aware task allocation module to explore the optimized tradeoff between data quality and sensing cost. The evaluation results on a real-world data trace demonstrate that the QCO-TA scheme achieves significant performance gains in terms of both data quality and sensing cost compared to state-of-the-art baselines in various application scenarios.
7.1 The Proof of NP-Hardness of the Proposed Task Allocation Problem
We prove that the quality-cost-aware task allocation problem is NP-hard via a reduction from a well-known NP-hard problem, i.e., the bounded knapsack problem. Let us consider a simplified version of the quality-cost-aware task allocation problem, where the goal is only to optimize the overall sensing error given a predefined task allocation cost constraint.
Proof of NP-hardness: The bounded knapsack problem is a known NP-complete problem pisinger1995minimal. We perform the transformation as follows:
Consider a bounded knapsack problem with a set of $n$ items, where each item $i$ comes with a weight $w_i$ and a value $v_i$. The goal is to select a subset of at most $m$ items from the $n$ items to maximize the overall value of the selected items while ensuring that the overall weight of the selected items stays under the knapsack's weight capacity $W$.
Let us convert the bounded knapsack problem to the simplified task allocation problem as follows: consider a set of $n$ sensing cells, where selecting sensing cell $i$ requires a sensing cost of $w_i$ and reduces the sensing error by $v_i$. The goal is to select a subset of at most $m$ sensing cells from the $n$ cells to maximize the reduced sensing error while ensuring that the overall sensing cost stays under the predefined threshold $W$.
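The correspondence underlying the conversion can be summarized as follows, where $w_i$, $v_i$, $m$, and $W$ denote the item weight, item value, cardinality bound, and capacity of the knapsack instance:

```latex
\begin{align*}
\text{item } i &\;\longleftrightarrow\; \text{sensing cell } i\\
\text{item weight } w_i &\;\longleftrightarrow\; \text{sensing cost of cell } i\\
\text{item value } v_i &\;\longleftrightarrow\; \text{sensing-error reduction of cell } i\\
\text{knapsack capacity } W &\;\longleftrightarrow\; \text{task allocation cost threshold}\\
\text{cardinality bound } m &\;\longleftrightarrow\; \text{number of selectable cells}
\end{align*}
```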
By solving the task allocation problem, we obtain the answer to the bounded knapsack problem. In particular, if we can select the sensing cells that maximize the reduced sensing error while keeping the overall sensing cost under the predefined threshold, then the corresponding items in the bounded knapsack problem maximize the overall value while keeping the overall weight under the knapsack's weight capacity.
The above reduction shows that the simplified task allocation problem is at least as hard as the bounded knapsack problem, and it is therefore NP-hard. Since the simplified problem is a special case of our quality-cost-aware task allocation problem, the quality-cost-aware task allocation problem is also NP-hard.