Crowdsourcing is widely used in procuring labels and services for traditional AI applications. Often many of the tasks crowdsourced are more readily accomplished by humans than computers. An additional advantage is the scalable and cost-effective nature of crowdsourcing. However, typical crowdsourcing platforms may not consider several important aspects of traditional planning such as ensuring work completion within a strict deadline and with assured guarantees on the quality.
As a motivating example for this paper, consider a sequence of jobs arriving online, where each job corresponds to translating a large document that has to be completed within a deadline and with an assured level of accuracy. It may not be possible for a single worker to accomplish this job, so the requester could split the job into tasks (at the chapter, section, or any other level) and allocate each task to a crowd worker. Due to the very nature of the task, a worker, if employed for a long duration, might start committing errors. We refer to the duration for which an agent works without committing any error as the time to failure (TTF). Workers also differ in the time taken to complete the entire job (if the entire job is executed by a single worker). The time taken by a worker to complete the job all by himself is called the job completion time (JCT) of that worker. Each worker incurs a certain cost to complete the entire job. Note that the workers are heterogeneous in terms of their costs, JCT, and TTF. Moreover, the JCT and TTF of the workers are stochastic. An additional non-trivial challenge arises when crowd workers are strategic and may misrepresent their costs in the hope of gaining higher utility. This setting also occurs in other problems such as tagging a large repository of images, audio transcription, etc.
In this work, we consider jobs which (a) arrive online, (b) are divisible (into tasks), (c) have strict completion deadlines, and (d) are to be completed with an assured accuracy. We propose a multi-armed bandit (MAB) mechanism which learns the two parameters (mean job completion time (MJCT) and mean time to failure (MTTF)) of the workers while eliciting their privately held costs truthfully. We show that the proposed MAB mechanism minimizes the regret while meeting the deadline and accuracy requirements on every job. The following are the specific contributions of this work.
Non-strategic, with learning: We look at the problem of allocating divisible online jobs to crowd workers so as to meet the constraints on deadline and accuracy (Section 4). The underlying optimization problem turns out to be non-trivial since the parameters MJCT and MTTF of the workers are unknown. We overcome this challenge by devising a biparameter learning scheme based on the Robust UCB algorithm . Further, we embed this learning scheme into our social welfare maximizing algorithm, which we refer to as SW-GREEDY.
Regret Analysis: In Section 6, we show, for non-strategic as well as strategic settings, that the number of jobs for which a non-optimal worker set is chosen, is upper bounded by (Theorem 2), where is the total number of jobs to be completed. Moreover, once an optimal worker set is selected, the allocation algorithm converges asymptotically to an efficient allocation, ensuring that the average regret goes to zero in the limit (Theorem 3).
Simulations: Finally, we show the practical efficacy of our learning mechanism via simulations in Section 7.
2 Previous Work
We now look at previous work related to our setting. We group the relevant literature based on whether or not crowd workers are strategic.
In the non-strategic case, most of the work in crowdsourcing has focused on models for aggregating labels and building classifiers [13, 12]. Many efforts also address problems similar to the one considered in our paper. Faradani et al.  look at the design of pricing schemes dependent on the completion times of the workers; the strategic nature of the workers is not considered. The problem of completing tasks within a deadline is also investigated by Yu et al. , who consider a setting where workers delegate tasks to other workers when they are unable to complete the work within a deadline. Here, the costs of the workers are assumed to be known and the workers are non-strategic. Under a different setup, Ding et al.  look at the budgeted multi-armed bandit problem where two parameters, stochastic costs and stochastic rewards, are learnt. However, they do not consider strategic workers.
In the strategic case, Chandra et al. 
look at allocating indivisible tasks to strategic crowd workers under deadline constraints, with the assumption that the reliability (in terms of completion of the task) of the agents is common knowledge and not estimated. Singer and Mittal  and Biswas et al.  look at pricing mechanisms in the presence of budget constraints and task completion deadlines; however, the heterogeneity with respect to the time to failure is not modeled. Tran-Thanh et al.  look at crowdsourcing classification tasks with the goal of trading off the cost and the accuracy of the estimation. However, the TTF and JCT of the workers are not modeled. Choosing an optimal worker set in order to obtain an assured accuracy level has been studied by Jain et al. ; their allocation algorithm makes use of the multi-armed bandit abstraction of Auer et al. , and a version of their allocation algorithm was designed for the case where workers are strategic with respect to bidding their costs. However, their setting does not consider the completion of tasks within a deadline. The problem of allocating tasks concurrently to several workers in order to meet deadlines is studied by Gerding et al. , who use a variant of the VCG mechanism to elicit the costs truthfully from the workers. They consider stochastic completion times of tasks but do not consider the time to failure during allocation.
Our work differs from all the work listed above in that we design an allocation scheme to complete jobs within a deadline while simultaneously learning the mean completion time as well as the mean time to failure of the workers. We also design a mechanism to elicit the costs of the workers truthfully.
3 The Model
Let denote the set of crowd workers (also referred to as agents) available to the requester. A sequence of homogeneous jobs arrives at the platform, one at a time. Following are some of the design issues pertaining to the requester.
Deadline: The clock starts ticking for a job as soon as it arrives. We use to denote the deadline. The deadline on each job is an upper bound on the duration, starting from the arrival of that job, before which the job is required to be completed in expectation.
Task creation: The requester can divide a current job into a certain number of tasks so as to facilitate completion of the job by the deadline . We use to denote the fraction of the job assigned as a task to the worker . Therefore, and . We assume arbitrary division of a given job into tasks for ease of exposition. However, this assumption can be relaxed to capture meaningful constraints such as the size of the task.
Threshold on probability of failure for tasks: A worker is more likely to commit an error if he works for a longer duration on a task. We say a worker has failed when he commits an error. We use to denote (the common) threshold on probability of failure for any task. This threshold allows the requester to control the overall “quality” of the job.
Job Completion Time (JCT):
A worker has a stochastic job completion time, which is the time he requires to complete the entire job by himself. The JCT of a worker is a random variable with a fixed but unknown mean, which we refer to as the mean job completion time (MJCT). The requester wishes to learn the MJCT of each worker. If is the MJCT of worker , then the task allocation will meet the deadline constraint in expectation if .
Time to Failure (TTF): A worker is also characterized by a stochastic time to failure, which denotes the duration for which the worker would work without a failure. Like the JCT, the TTF also has a fixed yet unknown mean, which the requester wishes to learn. If is the CDF of the TTF of agent , who works for an expected duration on the task given by the fraction of the job, the requirement on the threshold probability of failure dictates .
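Since the TTF is later modeled as exponential, the accuracy constraint on a single worker can be made concrete. The sketch below uses hypothetical names (`mean_ttf` for the worker's MTTF, `alpha` for the failure-probability threshold) and assumes the exponential model from Section 3:

```python
import math

def failure_probability(mean_ttf: float, duration: float) -> float:
    """CDF of an exponential TTF with the given mean, evaluated at the
    expected working duration: P(fail within duration)."""
    return 1.0 - math.exp(-duration / mean_ttf)

def max_safe_duration(mean_ttf: float, alpha: float) -> float:
    """Longest expected working duration d with failure probability <= alpha:
    1 - exp(-d / mean_ttf) <= alpha  =>  d <= -mean_ttf * ln(1 - alpha)."""
    return -mean_ttf * math.log(1.0 - alpha)
```

For instance, a worker with a mean TTF of 10 hours can be assigned roughly `-10 * ln(0.95) ≈ 0.51` hours of expected work when the threshold is 5%.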
Cost Incurred: Worker has a privately held cost which represents the cost incurred by worker to complete the job entirely on his own. Therefore, the cost involved to complete fraction of the job by the worker is .
Goal of Optimization Problem: The constraints on the deadline and on the threshold probability of failure for every task have to be met in a cost-optimal way for every online job . Thus, the underlying optimization problem for the entire collection of jobs is given by eq. 1.
As mentioned earlier, the JCT and the TTF of the workers are stochastic in nature. We assume the JCT of each worker follows a log-normal distribution with a fixed but unknown mean, while the TTF of each worker follows an exponential distribution with a fixed but unknown mean.
Remark 1 (Choice of Distributions).
The choice of the log-normal distribution is due to its wide applicability in the social sciences and economics for modeling similar quantities. However, any suitable non-negative random variable whose distribution is sub-Gaussian (or sub-exponential) may be used. As discussed, errors in this setting are introduced by longer working durations on a task. This is analogous to the modeling of failure as a function of time, via exponential distributions, in the biological, computer systems, and reliability literature. Hence, we model the TTF of the workers as exponential.
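A minimal sketch of this generative model, assuming illustrative parameter names (`log_mu`, `log_sigma` for the log-space parameters of the log-normal JCT, `mean_ttf` for the exponential MTTF):

```python
import random

def sample_worker_times(log_mu: float, log_sigma: float,
                        mean_ttf: float, rng: random.Random):
    """Draw one (JCT, TTF) pair for a worker: JCT from a log-normal
    distribution with the given log-space parameters, TTF from an
    exponential distribution with the given mean."""
    jct = rng.lognormvariate(log_mu, log_sigma)       # mean exp(mu + sigma^2/2)
    ttf = rng.expovariate(1.0 / mean_ttf)             # mean = mean_ttf
    return jct, ttf
```

Both draws are non-negative, matching the interpretation of completion times and failure times as durations.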
The optimization problem stated in eq. 1 involves a learning scheme along with cost minimization across all the online jobs. However, due to independence across the jobs, the problem can be decomposed into a sequential cost minimization problem corresponding to each job (). Therefore, in eq. 1 the summation over the jobs can be omitted. This enables us to use in place of for the sequential optimization problem for each job.
4 The Case of Non-Strategic Workers
We first study the scenario where the costs incurred by the workers are common knowledge. If the means ( and ) are known to the requester, no feasible allocation to the worker should exceed . The additional requirement on accuracy requires that the probability of a worker failing in the duration does not exceed . This is equivalent to the constraint where is the CDF of the random variable TTF of worker which we model as the exponential distribution with mean . On simplification, the requester’s optimization problem reduces to eq. 2.
In practice, and are not known and need to be learnt. We make use of the multi-armed bandit (MAB) abstraction for learning these parameters. More specifically, since and are sub-exponential distributions, we appeal to the Robust UCB technique . While -UCB algorithm 
is a regret-minimizing scheme for learning the mean of sub-Gaussian distributions, for heavy-tailed distributions (e.g., log-normal and exponential) Robust UCB has been shown to be regret minimizing. We adopt the Robust UCB scheme with the truncated empirical mean as the estimator.
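A sketch of the truncated empirical mean, following the estimator analysed by Bubeck et al. for heavy-tailed bandits: sample `i` is kept only if it lies below a growing threshold determined by a bound `u` on the `(1+eps)`-th raw moment and a confidence level `delta`. The parameter names here are illustrative:

```python
import math

def truncated_mean(samples, u: float, eps: float, delta: float) -> float:
    """Truncated empirical mean: sample i is counted only when
    |X_i| <= (u * i / log(1/delta))**(1/(1+eps)); large outliers from
    the heavy tail are discarded rather than averaged in."""
    log_inv_delta = math.log(1.0 / delta)
    total = 0.0
    for i, x in enumerate(samples, start=1):
        if abs(x) <= (u * i / log_inv_delta) ** (1.0 / (1.0 + eps)):
            total += x
    return total / len(samples)
```

With well-behaved samples and a generous moment bound the estimator reduces to the plain empirical mean; a single extreme outlier is simply truncated away.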
4.1 Difficulty in Learning
If a worker , allocated a fraction of the job, takes time for completion, then is a sample from the distribution log-normal(). Therefore, every allocation contributes one such sample to the Robust UCB algorithm estimating . However, for estimating , each sample fed to the Robust UCB algorithm must correspond to a failure, and this is not practical since we do not observe a failure at every instance of allocation. To handle this difficulty, we propose a surrogate random variable. Consider the experiment where a worker is allocated a task (a fraction of a job) on which the worker spends a duration of at least . The experiment is deemed to have failed if the worker fails in the first duration of the allocation; otherwise it is deemed a success. Let be the number of such independent experiments until a failure is encountered. We propose to use the random variable to construct a sample from exponential(). To obtain such a sample, for every job , we observe for a duration to see if any of the allocated workers have failed. Let be the number of contiguous instances (of jobs) of allocation during which a worker does not fail in the interval . Note that is a sample from . Therefore, the value forms a sample of interest. Once a sample is obtained, is reset and the process is repeated to collect more samples. By Lemma 1, the expectation of the surrogate random variable coincides with in the limit.
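The surrogate construction above can be sketched as follows, under the exponential TTF model. `theta` (the MTTF) and `tau` (the observation window per allocation) are hypothetical names for the elided symbols; each window independently fails with probability `1 - exp(-tau/theta)`, and the geometric count of windows until a failure, scaled by `tau`, has expectation `tau / p`, which tends to `theta` as `tau` shrinks (the content of Lemma 1):

```python
import math
import random

def surrogate_ttf_sample(theta: float, tau: float, rng: random.Random) -> float:
    """One surrogate sample: observe each allocation for a window tau;
    a worker with exponential TTF (mean theta) fails within a window with
    probability p = 1 - exp(-tau/theta). Count contiguous non-failing
    windows until the first failure and scale the count by tau."""
    p = 1.0 - math.exp(-tau / theta)
    n = 1
    while rng.random() >= p:   # success: no failure observed in this window
        n += 1
    return n * tau             # expectation tau/p -> theta as tau -> 0
```

Averaging many such surrogate samples therefore recovers the MTTF up to a bias that vanishes with the window size.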
By definition, . Note, and therefore, .
where eq. 3 follows by applying L'Hôpital's rule. ∎
4.2 SW-GREEDY: A Greedy Allocation
The workers are indexed in increasing order of their costs, and each worker is allocated the largest possible fraction that does not violate the constraints in eq. 2, until the entire job is allocated.
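The greedy rule above can be sketched as follows. This is a simplified view in which each worker is summarized by a cost and a precomputed per-worker cap, i.e., the largest feasible fraction under the deadline and accuracy constraints; both names are illustrative:

```python
def sw_greedy(workers):
    """workers: list of (cost, cap) pairs, where cap is the largest job
    fraction the worker can take without violating the constraints.
    Allocates greedily in increasing order of cost until the whole job
    (fraction 1.0) is covered; returns {worker_index: fraction}, or
    None if the job cannot be fully allocated."""
    alloc, remaining = {}, 1.0
    # sort worker indices by cost (t = (index, (cost, cap)))
    for i, (cost, cap) in sorted(enumerate(workers), key=lambda t: t[1][0]):
        if remaining <= 0:
            break
        f = min(cap, remaining)
        if f > 0:
            alloc[i] = f
            remaining -= f
    return alloc if remaining <= 1e-12 else None
```

The cheapest workers are saturated first, so any feasible allocation returned is cost minimal among allocations respecting the caps.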
The constraint involves means which are unknown. As mentioned earlier, we use Robust UCB to learn estimates of and . Here, and are the upper confidence indices while and are the lower confidence indices of the MJCT and MTTF respectively, obtained from Robust UCB; and are the empirical estimates of the MJCT and MTTF respectively for worker . We could substitute or as the estimate for in our constraint. A higher value of enforces a lower allocation to worker than a lower value of does. Hence we refer to as a pessimistic estimate for . By a similar reasoning, we refer to as the pessimistic estimate for . The use of the pessimistic estimates ensures that the constraint in eq. 2 is satisfied even under the true underlying means.
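A sketch of how the pessimistic estimates bound a single worker's feasible fraction, assuming the exponential TTF model and illustrative names (`mjct_ucb` for the upper confidence index of the MJCT, `mttf_lcb` for the lower confidence index of the MTTF, `alpha` for the failure threshold):

```python
import math

def max_feasible_fraction(deadline: float, alpha: float,
                          mjct_ucb: float, mttf_lcb: float) -> float:
    """Largest job fraction f that pessimistically satisfies both
    constraints for one worker:
      deadline:  f * MJCT <= deadline
      accuracy:  1 - exp(-f * MJCT / MTTF) <= alpha
    using the upper index for MJCT and the lower index for MTTF."""
    by_deadline = deadline / mjct_ucb
    by_accuracy = -mttf_lcb * math.log(1.0 - alpha) / mjct_ucb
    return max(0.0, min(1.0, by_deadline, by_accuracy))
```

Because the true MJCT lies below its upper index and the true MTTF above its lower index (with high probability), any fraction returned here also satisfies the constraints under the true means.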
The allocation algorithm discussed above ensures that the social welfare regret of the learning scheme is optimized, hence we refer to the above allocation as SW-GREEDY (Algorithm 1). The social welfare is defined as follows.
Social Welfare: Social welfare of a feasible (i.e. satisfying eq. 2) allocation is the sum of valuations of the agents under that allocation. In this setting, the valuation of a crowd agent is . Therefore, social welfare is given by .
Every worker is paid an amount equal to the cost incurred, i.e. , where is the allocation to agent given by Algorithm 1.
Remark 2 (Pessimistic Selection).
The fundamental philosophy underlying the UCB family of algorithms is “optimism under uncertainty.” Intuitively, this optimism ensures adequate exploration relative to a naive scheme which simply uses the empirical estimate. In our work, we do not use this philosophy explicitly; however, due to the greedy nature of the allocation scheme, the pessimistic allocation set is a superset of the optimistic allocation set.
5 The Case of Strategic Workers: TD-UCB
Here, before an allocation is performed, the agents announce their bids. These bids may or may not be equal to their true private costs. We denote the bid profile by , where is the bid of agent and denotes the collection of bids of all agents except agent . In order to ensure that the agents bid their costs truthfully, we introduce a mechanism TD-UCB. The allocation rule remains the same as the one for the case where the workers are non-strategic. We use the allocation given in Algorithm 1 replacing the input costs with the bids.
5.1 Payment Scheme
Let denote a tuple of allocation and performance of the allocated workers for the job . The learning until job is captured in the history . In order to specify the payment scheme, we require the notion of ‘externality’ imposed by an agent on another. We denote the externality imposed by agent on as , which signifies the additional fraction of the job allocated to the agent in the absence of agent . The externality for the job depends on the bid profile as well as the history of allocations till job . Let be the agent with the largest reported bid in the worker set chosen by the allocation scheme. Figure 1 provides a schematic diagram indicating the position of the bids and the agents chosen by our algorithm.
Formally, the externality is defined as
Remark 3 (Notation).
All the mechanism side parameters such as or are a function of . Similarly, the agent side parameters such as utility depend on the tuple . Note that the agent side parameters have an additional dependency on the true cost . Whenever clear from the context, we drop one or more of these dependencies for ease of notation.
Remark 4 (Externality).
Our mechanism is an externality-based scheme like the VCG mechanism. We now set about the task of proving that the mechanism is truthful, regret minimizing, and individually rational, while learning the associated stochastic parameters. Earlier works have shown the non-triviality involved in the design of such learning mechanisms [2, 7].
5.2 Properties of TD-UCB Mechanism
Utility of an Agent: The utility of an agent in this setting is the difference between the valuation of an allocation and the payment made. The utility is given by the following.
Dominant Strategy Incentive Compatible (DSIC): A mechanism is DSIC if the utility , where and are the bid and true cost incurred by the worker respectively, is the bid profile of all agents other than .
A DSIC mechanism ensures that an agent obtains the highest utility by bidding his true cost, irrespective of the bids of other agents.
Ex-post Individually Rational (IR): A mechanism is ex-post individually rational if , .
An IR mechanism ensures that for every agent, the utility obtained from truthful bidding of the costs is non-negative.
The TD-UCB mechanism is DSIC and IR.
IR is immediate and follows from the definition of the payment scheme of the mechanism (eq. 5).
We prove the DSIC property by examining different possible scenarios of allocation for an agent. In each of these scenarios, we compute the utilities with truthful bids as against strategic misreports of bids.
For performing any job , utility of a worker is defined as follows.
where and are the allocation and the payment to the worker respectively. We consider the following three possible scenarios for the position of each worker in the increasing order of the bids of the workers. In this proof, we refer to the set of workers with a non-zero task allocation as the active set. Throughout the proof, we denote by the active set of allocated workers when agent bids his true cost , and by the active set when the agent bids untruthfully.
In this scenario, when the agent bids truthfully,
When the worker reports his cost truthfully (i.e., ), he does not receive any allocation and therefore . We now consider the following two cases when he misreports his cost.
Overbid of cost () :
Since , a higher bid would only place the agent at a position in the revised ranking order. At the position , the allocation to him would again be zero, that is, , and thereby the utility from overbidding would be the same as the utility from truthful bidding. Hence, he does not benefit from overbidding his cost.
Underbid of cost ():
Here there could be two possibilities:
: This scenario is identical to case 1(a) shown above and hence there is no incentive for the agent to bid in this manner.
: With such a bid, the agent is able to enter the active set of allocated workers.
Let the position of the agent in the new active set be , that is, , and the agent with the highest bid in is . Therefore, by underbidding his cost, agent is able to move the workers out of the active set. We now show that such a bid does not fetch agent an increased utility. As per the payment structure,
The second term in item 2b is zero because, in the absence of agent , the agents can complete the current job . Therefore, even with an underbid, has no externality on agents . The third term in item 2b is also zero, as the allocation under truthful bidding was enough for agents to complete the job. Hence, in the absence of , the allocation with the underbid is met by the externality sum. By underbidding, the agent is therefore able to obtain the portions of the job which would have been allocated to . For all such agents , , since in the absence of these agents would have received an allocation. But note that , and so these agents contribute a negative utility. Therefore the net utility .
Case 2: .
When agent bids truthfully, the active set is as follows:
and the payment to agent
Overbid of cost ():
Here we look at two possible values of the range of the bids.
An overbid such that agent no longer belongs to the active set : At the position , the allocation to him is zero, that is, and thereby the utility from overbidding would be less than the utility from truthful bidding. Hence, he does not benefit from overbidding his cost in this manner.
An overbid such that agent remains in the active set but brings in other higher cost agents into the active set: Suppose the active set contains the agents in addition to the set , such that, without loss of generality,
The payment to agent with overbid is,
Since the agents have moved before in the ordering of the bids, those agents do not contribute further to . However, for the agents , because the same proportion of the job must be reassigned to the agent when bids as well as when is truthful. The first term in section 5.2 therefore strictly exceeds the first term in item 1b. We now show that the second terms in section 5.2 and item 1b are equal. Observe that,
Underbid of cost ( ): Note that in this scenario, there are the following two possibilities.
The active set = . The agent moves to a new position , that is, . Without loss of generality, we can consider that the agent with the highest bid in is now agent . The ordering of the agents is now,
By our payment structure,
since the additional allocation that gets due to the underbid would, in the absence of , be allocated to the last agent in . A simple substitution in item 2a shows that the second terms in section 5.2 and item 2a are equal. Therefore .
The active set due to underbidding by agent is smaller than the active set due to truthful bidding by agent : this means that some agents get removed from . Suppose the agents get pushed out of the active set . Then, by an argument similar to case 2(b)(i) above, , but . Therefore these agents contribute a negative utility and hence .
Overbid of cost ():
If the agent bids a higher cost, the position of in the ranking order changes to one of the following.
: The allocation to the worker remains the same as when he is truthful, that is, . Our payment structure ensures that the payment and hence .
: In this case, agent ends up losing a part of to the worker . This scenario is analogous to Case 2 (a) (ii) where a worker who bids truthfully would have been at the last position , but by overbidding ends up sharing his allocation with other agents. Therefore .
: Here, agent does not receive any allocation and thereby his payment as well as utility are both zero.
Underbid of cost ():
Upon bidding a lower cost, the agent moves further up in the ranking order, that is . The allocation also does not change, that is, . Our payment structure ensures that the payment and hence .
Future rounds: If the agent ignores the loss incurred in the current job and chooses to manipulate the current bid for future utility, the resulting argument reduces to one of the above three cases. ∎
6 Regret Analysis
In the strategic as well as the non-strategic setting, the underlying optimization problem involves parameters that are learnt in tandem. Hence, regret is an important notion, which we analyse in this section. Following are some relevant definitions. A problem instance in this space is characterized by a set of crowd agents , the vector of their costs, the mean vectors (), and the design parameters: the deadline () and the accuracy threshold ().
Optimal worker set: For a problem instance with all the parameters known, in the solution to the optimization problem of eq. 2, we refer to the set of agents allocated a non-zero fraction of the job as the optimal worker set.
Optimal allocation: We refer to the solution of eq. 2 as the optimal allocation.
-Separation: Let be the agent in the optimal worker set with the highest bid. In the optimal allocation (social welfare maximizing), all workers’ allocation except would meet the constraints in eq. 2 with equality. We refer to the -separation as the additional fraction of the job which agent can take without violating any of the constraints. As all the stochastic parameters in this space are continuous, almost surely .
Regret: A learning mechanism in this space suffers a loss in social welfare due to either a) non-optimal set selection or b) due to suboptimal allocation within the optimal set. Formally, regret of a mechanism , is given by
where is the allocation to the agent for the job by the mechanism .
We use the truncated empirical estimator within our Robust UCB scheme. By an invocation of the Bernstein inequality, with high confidence (probability for the job ), the true mean lies within the Robust UCB and LCB indices (see Lemma 1 in ). With enough samples, the symmetric indices of the Robust UCB scheme shrink enough that no agents beyond the optimal set are required to meet the spill-over, even with the pessimistic strategy used.
The TD-UCB mechanism selects an optimal set after the job .
We denote as the costliest agent in the optimal set. Let denote the allocations when the means are known. Consider ,
denotes the additional fraction of work the agent can take up without violating the constraints. Following is a sufficient condition on when the set selected by the pessimistic estimate matches the optimal set.
We denote and as the Robust UCB indices of the JCT and the TTF of a worker in the active set. Recall that the active set for job is the set of agents allocated a non-zero fraction of the job. The expression for the pessimistic allocation at job is given by
The allocation is determined by equality in eq. 13 whenever the chosen set is not optimal. We analyse this allocation via two cases to determine the job at which the condition in eq. 12 is met.
Case (i): or Consider,