# Task Recommendation in Crowdsourcing Based on Learning Preferences and Reliabilities


## 1 Introduction

In a typical crowdsourcing platform, a worker may be recommended a wide variety of tasks to choose from [1, 2, 3]. For example, on Amazon Mechanical Turk (MTurk) and CrowdFlower, tasks can include labeling the content of an image, determining whether or not sentences extracted from movie reviews are positive, or determining whether or not a website contains adult content. In these examples, the different tasks require different sets of skills from a worker. In the image labeling task, a worker who is good at visual recognition will perform reliably, whereas identifying whether a movie review is positive requires a worker with language skills and knowledge of the nuances of the particular language in which the review is written. Workers may also have different interests, and may choose not to accept tasks that they are qualified for. Therefore, the crowdsourcing platform needs to find a way to best match the available tasks to the most suitable workers to improve the likelihood of obtaining high-quality solutions to the tasks.

Another line of work utilizes the expectation-maximization approach to estimate the tasks’ solutions and the workers’ reliabilities. The aforementioned approaches are all offline, i.e., the task assignments are made without taking into account additional information about the workers’ performance and behavior that can be gleaned while the workers complete tasks sequentially over time.

Our MAB formulation is related to, but different from, the risk-averse MAB problem considered in [35, 36, 37], which uses mean-variance [38, 39] as its risk objective. The differences between our MAB formulation and the risk-averse MAB are: (i) the worker in our problem may choose not to accept a task at each time step, leading to no reward at that time step, whereas in the risk-averse MAB, such an option is not available; and (ii) the risk-averse MAB penalizes the variance of the reward, while we use the variance of the reward estimator in formulating our regret. This is because, as explained above, the reward at each time step in a crowdsourcing platform is unobservable, so our regret depends on how accurately the expected reward can be estimated from gold tasks, rather than on the reward variance.

The rest of this paper is organized as follows. In Section 2, we present our system model and assumptions. In Section 3, we derive the optimal order of the regret, and in Section 4 we introduce three strategies that achieve this order. In Section 5, we present simulations to compare the performance of our approaches. Section 6 concludes the paper.

Notations: We use $\mathrm{Bern}(q)$ to denote the distribution of a Bernoulli random variable $X$ with $\mathbb{P}(X=1)=q$, and $\mathrm{Bin}(m,q)$ to denote the distribution of the sum of $m$ independent such variables. The notation $\stackrel{d}{=}$ denotes equality in distribution. We use $\mathbb{Z}^+$ to denote the set of positive integers and $E^c$ to denote the complement of the event $E$. The indicator function $1_E = 1$ if and only if the event $E$ occurs. We let $(x)^+ = \max\{x, 0\}$. For non-negative functions $f(n)$ and $g(n)$, we write $f(n) = O(g(n))$ if $\limsup_{n\to\infty} f(n)/g(n) < \infty$, $f(n) = \Omega(g(n))$ if $g(n) = O(f(n))$, and $f(n) = \Theta(g(n))$ if both $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

## 2 System Model

We consider a crowdsourcing platform where each task belongs to one of $K$ categories. The platform recommends a task from some category $k \in \{1,\ldots,K\}$ to a worker at each time $t$. The worker has different reliabilities and preferences for different categories. The worker’s reliability for category $k$ is the probability $p_k$ of completing a task from category $k$ correctly, and his preference refers to the probability $q_k$ of accepting a task from category $k$. We let $A_{k,i} = 1$ if the worker accepts the $i$-th task recommended to him from category $k$, and $A_{k,i} = 0$ otherwise. We let $X_{k,i} = 1$ if the worker completes the task correctly, and $X_{k,i} = 0$ otherwise. For all $k$ and $i$, we assume that $A_{k,i} \sim \mathrm{Bern}(q_k)$ are independent and identically distributed. Similarly, $X_{k,i} \sim \mathrm{Bern}(p_k)$ are independent and identically distributed. We assume that $\{A_{k,i}\}$ and $\{X_{k,i}\}$ are independent.

We model the task recommendation problem as a MAB problem. However, because $X_{k,i}$ is typically unobservable (the system does not know the task solution a priori), we include the use of gold tasks in our recommendation. A gold task is a task whose solution is known a priori to the system, so the platform can verify whether the worker has completed it correctly. A reward is obtained only if the worker accepts and completes a non-gold task correctly.
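To make this model concrete, here is a minimal simulation sketch of such a worker; the `Worker` class, its parameters, and the `respond` method are our own illustrative names, not constructs from the paper.

```python
import random

class Worker:
    """Simulated worker with per-category acceptance and reliability."""

    def __init__(self, q, p, seed=0):
        self.q = q                      # q[k]: probability of accepting a category-k task
        self.p = p                      # p[k]: probability of completing it correctly
        self.rng = random.Random(seed)

    def respond(self, k):
        """Offer a category-k task; return (A, X) as in the model above."""
        a = 1 if self.rng.random() < self.q[k] else 0   # A_{k,i} ~ Bern(q_k)
        x = 1 if self.rng.random() < self.p[k] else 0   # X_{k,i} ~ Bern(p_k), independent of A
        return a, x

# Example with K = 3 categories; category 1 maximizes q_k * p_k.
worker = Worker(q=[0.9, 0.8, 0.5], p=[0.6, 0.85, 0.9])
print(worker.respond(1))
```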

Let $Y_{k,i} = A_{k,i}X_{k,i}$, whose distribution is $\mathrm{Bern}(q_kp_k)$. We denote the task category that maximizes $q_kp_k$ as $*$. For each time step $i$, let $G_{k,i}$ denote the event that the $i$-th task from category $k$ recommended to the worker is a gold task. Let

$$g_k(t) = \sum_{i=1}^{t} A_{k,i} 1_{G_{k,i}} \tag{1}$$

be the number of completed gold tasks from category $k$ up to time $t$. If the worker completes a non-gold task, he is rewarded in proportion to $p_k$, the probability that he has completed the task correctly. However, since we do not know $p_k$ a priori, the crowdsourcing platform estimates it using the empirical mean over the completed gold tasks:

$$\bar{X}_k(t) = \frac{1}{g_k(t)} \sum_{i=1}^{t} A_{k,i} X_{k,i} 1_{G_{k,i}}. \tag{2}$$

Due to the uncertainty in the estimate (2), we assume that the reward for the $i$-th non-gold task from category $k$, if the worker completes it, is given by

$$\left(p_k - \beta\,\frac{\sigma_k^2}{g_k(t)}\right)^+, \tag{3}$$

where $\beta \ge 0$ is a predefined weight that quantifies the importance of the estimator $\bar{X}_k(t)$’s accuracy to the crowdsourcing platform, and $\sigma_k^2 = p_k(1-p_k)$ is the variance of a task reward from category $k$. The quantity $\sigma_k^2/g_k(t)$ is the variance of $\bar{X}_k(t)$ conditioned on $g_k(t)$.
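The following sketch shows how the estimate (2) and the penalized reward (3) could be computed from gold-task outcomes. One assumption on our part: since $\sigma_k^2 = p_k(1-p_k)$ depends on the unknown $p_k$, we use the plug-in estimate $\bar{X}_k(1-\bar{X}_k)$, whereas (3) is stated with the true variance.

```python
def gold_estimate(gold_outcomes):
    """Empirical mean (2) over completed gold tasks, plus a plug-in estimate
    of sigma_k^2 = p_k * (1 - p_k) (our assumption; the reward (3) in the
    text uses the true variance)."""
    g = len(gold_outcomes)
    p_hat = sum(gold_outcomes) / g
    return p_hat, p_hat * (1.0 - p_hat)

def penalized_reward(p_k, sigma2_k, g_k, beta):
    """Reward (3) for a completed non-gold task from category k:
    (p_k - beta * sigma2_k / g_k)^+."""
    return max(p_k - beta * sigma2_k / g_k, 0.0)

p_hat, s2_hat = gold_estimate([1, 1, 0, 1])              # 3 of 4 gold tasks correct
print(penalized_reward(p_hat, s2_hat, g_k=4, beta=1.0))  # 0.703125
```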

For each $k$ and time $n$, let $N_k(n)$ denote the number of tasks recommended from category $k$ up to time $n$. We define the cumulative reward function at time $n$ as:

$$r(n) = \mathbb{E}\sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} A_{k,t}\left(X_{k,t} - \beta\,\frac{\sigma_k^2}{g_k(t)}\right)^+ 1_{G_{k,t}^c} = \mathbb{E}\sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} q_k\left(p_k - \beta\,\frac{\sigma_k^2}{g_k(t)}\right)^+ 1_{G_{k,t}^c}, \tag{4}$$

since $A_{k,t}$, $X_{k,t}$ and $G_{k,t}$ are independent. Note that the coefficient $\beta$ can also be interpreted as the Lagrange multiplier for a constrained optimization problem in which the reward

$$\mathbb{E}\sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} q_k p_k 1_{G_{k,t}^c}$$

is maximized subject to a constraint on the uncertainty in the estimator (2).

To avoid dividing by zero in (2), i.e., to ensure $g_k(t) \ge 1$, we assume that the worker completes one gold task from each category before the recommendation starts; this can be done through a calibration process in the crowdsourcing platform.

Maximizing the reward function (4) is equivalent to minimizing the regret function at time $n$, defined as:

$$R(n) = nq_*p_* - r(n). \tag{5}$$

For simplicity, we assume that every task category is non-empty at each time step. This is a reasonable assumption since the task pool in a crowdsourcing platform is updated constantly, and a single task may be assigned to more than one worker. In the following, we use the terms “task category” and “arm” interchangeably in our MAB formulation.
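For simulation purposes, the regret (5) of any strategy can be scored against the oracle term $nq_*p_*$; here is a minimal helper, under the same assumptions as the sketches above:

```python
def regret(n, q, p, realized_reward):
    """Regret (5): R(n) = n * q_* p_* - r(n), with q_* p_* = max_k q_k p_k and
    realized_reward the cumulative reward (4) collected by a strategy."""
    oracle = n * max(qk * pk for qk, pk in zip(q, p))
    return oracle - realized_reward

# Example: a strategy collecting reward 600 over n = 1000 steps when the best
# category has q_* p_* = 0.8 * 0.9 = 0.72 incurs regret 720 - 600 = 120.
print(regret(1000, q=[0.8, 0.5], p=[0.9, 0.9], realized_reward=600.0))
```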

## 3 Optimal Regret Order

In this section, we first show that the optimal order of the regret function (5) is $\Theta(\sqrt{n})$, where $n$ is the number of time steps. We then propose, in Section 4, three recommendation strategies, all of which achieve the optimal regret order.

###### Theorem 1.

The optimal order of the regret function (5) is $\Theta(\sqrt{n})$, where $n$ is the number of time steps.

###### Proof:

Let

$$f(n) = \sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} 1_{G_{k,t}} \tag{6}$$

be the total number of gold tasks recommended till time $n$. Then, from (4), we have

$$r(n) = \mathbb{E}\sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} q_k\left(p_k - \beta\,\frac{\sigma_k^2}{g_k(t)}\right)^+ 1_{G_{k,t}^c} \le \mathbb{E}\sum_{k=1}^{K}\sum_{t=1}^{N_k(n)} \left(q_*p_* - \beta\,\frac{q_k\sigma_k^2}{f(n)}\right)^+ 1_{G_{k,t}^c} \le \mathbb{E}\left[(n - f(n))\left(q_*p_* - \frac{\beta\min_k q_k\sigma_k^2}{f(n)}\right)\right]^+. \tag{7}$$

From (7) and (5), with $a = \beta\min_k q_k\sigma_k^2$, when $n$ is sufficiently large, we then have

$$\begin{aligned} R(n) &\ge nq_*p_* - \mathbb{E}\left[(n - f(n))\left(q_*p_* - \frac{a}{f(n)}\right)\right]^+ \\ &\ge \mathbb{E}\left[\min\left\{\frac{an}{f(n)} + q_*p_*f(n) - a,\; nq_*p_*\right\}\right] \tag{8} \\ &\ge \min\left\{2\sqrt{aq_*p_*n} - a,\; nq_*p_*\right\} \tag{9} \\ &= 2\sqrt{aq_*p_*n} - a, \tag{10} \end{aligned}$$

where the inequality in (9) follows because $\frac{an}{f(n)} + q_*p_*f(n) \ge 2\sqrt{aq_*p_*n}$ with probability $1$, by the AM-GM inequality. The theorem is now proved. ∎

## 4 Order Optimal Strategies

In this section, we propose three strategies that achieve the optimal regret order of $\Theta(\sqrt{n})$ in Theorem 1, and discuss the advantages of each.

### 4.1 Greedy Recommendation Strategy

In our greedy recommendation strategy (GR), we divide time into epochs. In each epoch $r \ge 1$, a single task category or arm is chosen, and all tasks in that epoch are drawn from that category. In the first $K$ epochs, each of which consists of a single time step, the worker completes a gold task from each of the $K$ categories. Subsequently, the $r$-th epoch, where $r > K$, consists of $\tau(r) - \tau(r-1)$ time steps, where $\tau(r) = \lceil \alpha r^2 \rceil$, and $\alpha > 0$ is a fixed constant. In each of these epochs, we set the first task to be a gold task, while for all other time steps in the epoch, the tasks recommended are non-gold tasks chosen from a particular task category. For each $r > K$, the task category is chosen based on an $\epsilon_r$-greedy policy [27] as follows.

Let $\Delta_k = q_*p_* - q_kp_k$, and choose $d$ so that $0 < d \le \min_{k \ne *} \Delta_k$. For each $r \ge 1$, let $\epsilon_r = \min\left\{1, \frac{cK}{d^2r}\right\}$, where $c > 0$ is a fixed constant. At the beginning of each epoch $r$, where $r > K$, we choose

$$z_r = \operatorname*{argmax}_k \bar{Y}_k(T_k(r-1)), \tag{11}$$

where $T_k(r)$ is the number of epochs within the first $r$ epochs in which category $k$ is chosen,

$$\bar{Y}_k(g) = \frac{1}{g}\sum_{i=1}^{g} A_{k,j_i} X_{k,j_i}, \tag{12}$$

and $j_1 < j_2 < \cdots < j_g$ are the indices of the first $g$ gold tasks recommended from category $k$. Then, with probability $1 - \epsilon_r$, we let $z_r$ be the task category chosen for epoch $r$, and with probability $\epsilon_r$, we let the task category for epoch $r$ be a randomly chosen category. The GR strategy is summarized in Algorithm 1.
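Since Algorithm 1 is not reproduced here, the following is a sketch of GR as we read the description above, with `worker` following the hypothetical `Worker` interface from Section 2; the default constants assume $c > 5$ (as in Theorem 2 below) and that $d$ satisfies $0 < d \le \min_{k \ne *}\Delta_k$.

```python
import math
import random

def gr_strategy(worker, K, n, alpha=1.0, c=6.0, d=0.1, seed=1):
    """Sketch of the GR strategy; details are our reading of the text."""
    rng = random.Random(seed)
    ax_sum = [0.0] * K   # sum of A*X over gold tasks recommended from arm k
    offered = [0] * K    # number of gold tasks recommended from arm k
    tau = lambda r: math.ceil(alpha * r * r)
    t, r = 0, 0
    while t < n:
        r += 1
        if r <= K:                                   # first K epochs: one gold task each
            arm, epoch_len = r - 1, 1
        else:
            eps = min(1.0, c * K / (d * d * r))      # epsilon_r
            ybar = [ax_sum[k] / offered[k] for k in range(K)]
            arm = (max(range(K), key=ybar.__getitem__)   # greedy choice z_r in (11)
                   if rng.random() > eps else rng.randrange(K))
            epoch_len = tau(r) - tau(r - 1)
        for i in range(min(epoch_len, n - t)):
            a, x = worker.respond(arm)
            if i == 0:                               # first task of each epoch is gold
                offered[arm] += 1
                ax_sum[arm] += a * x
            t += 1
    return [ax_sum[k] / offered[k] for k in range(K)]    # estimates of q_k p_k
```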

###### Theorem 2.

Suppose that $c > 5$. Then, GR has regret of order $O(\sqrt{n})$, where $n$ is the number of time steps.

###### Proof:

For each arm $k \ne *$ and epoch $r$, we denote by $P_{k,r}$ the probability that arm $k$ is chosen in epoch $r$. From Theorem 3 in [27], if $c > 5$ and $0 < d \le \min_{k \ne *}\Delta_k$, we have

$$P_{k,r} \le \frac{c}{d^2 r} + o(r^{-1}), \tag{13}$$

so that

$$P_{*,r} \ge 1 - \frac{c(K-1)}{d^2 r} - o(r^{-1}). \tag{14}$$

Let $M$ be the number of epochs completed till the time step $n$, i.e., $M$ is the largest integer satisfying $\tau(M) \le n$. We have $M = \Theta(\sqrt{n})$. Let $n_k(r)$ be the number of category-$k$ gold tasks completed in the first $r$ epochs. Since a single gold question is recommended in each epoch, we have $g_{c_r}(t) \ge n_{c_r}(r)$ for all time steps $t$ in epoch $r$, where $c_r$ denotes the category chosen in epoch $r$. From (4), we obtain

$$\begin{aligned} r(n) &\ge \mathbb{E}\sum_{r=K+1}^{M} q_{c_r}\left(\tau(r) - \tau(r-1)\right)\left(p_{c_r} - \beta\,\frac{\sigma_{c_r}^2}{n_{c_r}(r)}\right)^+ \\ &\ge \mathbb{E}\sum_{r=K+1}^{M} q_*(2\alpha r - \alpha - 1)^+\left(p_* - \beta\,\frac{\sigma_*^2}{n_*(r)}\right)1_{\{c_r = *\}} \\ &= \sum_{r=K+1}^{M} q_*(2\alpha r - \alpha - 1)^+\left(p_*P_{*,r} - \mathbb{E}\left[\beta\,\frac{\sigma_*^2}{n_*(r)}\right]\right). \tag{15} \end{aligned}$$

Recall also that $T_*(r)$ is the number of epochs within the first $r$ epochs in which arm $*$ was chosen. Since GR assumes that the worker completes a gold task from arm $*$ before the recommendation starts, we have

$$n_*(r) - 1 \mid T_*(r) \sim \mathrm{Bin}(T_*(r) - 1, q_*),$$

and

$$\begin{aligned} \mathbb{E}\left[\frac{1}{n_*(r)}\right] &= \mathbb{E}\left[\mathbb{E}\left[\frac{1}{n_*(r)} \,\middle|\, T_*(r)\right]\right] \\ &= \mathbb{E}\sum_{j=0}^{T_*(r)-1} \frac{1}{1+j}\binom{T_*(r)-1}{j} q_*^j (1-q_*)^{T_*(r)-1-j} \\ &= \mathbb{E}\sum_{j=0}^{T_*(r)-1} \frac{1}{T_*(r)}\binom{T_*(r)}{j+1} q_*^j (1-q_*)^{T_*(r)-1-j} \\ &= \mathbb{E}\,\frac{1}{T_*(r)q_*}\sum_{j=0}^{T_*(r)-1}\binom{T_*(r)}{j+1} q_*^{j+1} (1-q_*)^{T_*(r)-1-j} \\ &= \mathbb{E}\,\frac{1}{T_*(r)q_*}\left(1 - (1-q_*)^{T_*(r)}\right) \\ &\le \mathbb{E}\left[\frac{1}{T_*(r)q_*}\right]. \tag{16} \end{aligned}$$
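The key step in (16) is the identity $\mathbb{E}[1/(1+J)] = (1-(1-q_*)^{T})/(Tq_*)$ for $J \sim \mathrm{Bin}(T-1, q_*)$. A quick Monte Carlo check of this identity and the final bound (our verification, not part of the paper):

```python
import random

def mc_inverse_moment(T, q, trials=200_000, seed=0):
    """Monte Carlo estimate of E[1/(1+J)] with J ~ Bin(T-1, q)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        j = sum(rng.random() < q for _ in range(T - 1))   # one Bin(T-1, q) draw
        total += 1.0 / (1 + j)
    return total / trials

T, q = 10, 0.7
closed_form = (1 - (1 - q) ** T) / (T * q)
# The Monte Carlo value matches the closed form, which is at most 1/(T*q).
print(mc_inverse_moment(T, q), closed_form, 1 / (T * q))
```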

From (14), (15) and (16), we obtain

$$\begin{aligned} r(n) &\ge \sum_{r=K+1}^{M} q_*p_*(2\alpha r - \alpha - 1)^+\left(1 - O(r^{-1})\right) - \sum_{r=K+1}^{M} (2\alpha r - \alpha - 1)^+\,\mathbb{E}\left[\frac{\beta\sigma_*^2}{T_*(r)}\right] \\ &\ge q_*p_*\alpha M^2 - O(M) - \sum_{r=K+1}^{M} (2\alpha r - \alpha - 1)^+\,\mathbb{E}\left[\frac{\beta\sigma_*^2}{T_*(r)}\right] \\ &\ge nq_*p_* - O(\sqrt{n}) - \sum_{r=K+1}^{M} (2\alpha r - \alpha - 1)^+\,\mathbb{E}\left[\frac{\beta\sigma_*^2}{T_*(r)}\right]. \tag{17} \end{aligned}$$

We next prove the following lemma.

###### Lemma 1.

$\mathbb{E}\left[\dfrac{1}{T_*(r)}\right] = O(r^{-1})$.

We have

$$\begin{aligned} \mathbb{E}\left[\frac{1}{T_*(r)}\right] &= \mathbb{E}\left[\frac{1}{T_*(r)} \,\middle|\, T_*(r) \le \frac{r}{K}\right]\mathbb{P}\left(T_*(r) \le \frac{r}{K}\right) + \mathbb{E}\left[\frac{1}{T_*(r)} \,\middle|\, T_*(r) > \frac{r}{K}\right]\mathbb{P}\left(T_*(r) > \frac{r}{K}\right) \\ &\le \mathbb{P}\left(T_*(r) \le \frac{r}{K}\right) + O(r^{-1}). \tag{18} \end{aligned}$$

Therefore, to show the lemma, it suffices to show that the tail probability $\mathbb{P}(T_*(r) \le r/K)$ is of order $o(r^{-1})$.

Consider $k \ne *$. Let $T_k^B(r)$ be the number of times $z_i = k$ for $i \le r$, and $T_k^R(r)$ be the number of times arm $k$ was randomly chosen for $i \le r$. From the union bound, we have

$$\mathbb{P}\left(T_k(r) \ge \frac{r}{K}\right) \le \mathbb{P}\left(T_k^R(r) \ge \frac{r}{2K}\right) + \mathbb{P}\left(T_k^B(r) \ge \frac{r}{2K}\right). \tag{19}$$

From Hoeffding’s inequality, we obtain

$$\mathbb{P}\left(T_k^R(r) \ge \frac{r}{2K}\right) \le e^{-2\left(\frac{1}{2K} - \mathbb{E}[T_k^R(r)]/r\right)^2 r} \le e^{-2\left(\frac{1}{2K} - O\left(\frac{c\ln r}{d^2 r}\right)\right)^2 r} = e^{-O(r)}, \tag{20}$$

where the second inequality follows because the probability of randomly choosing arm $k$ in epoch $j$ is not more than $\epsilon_j/K \le c/(d^2j)$, and

$$\mathbb{E}[T_k^R(r)] \le \sum_{j=1}^{r} \frac{c}{d^2 j} \le \frac{c(\ln r + 1)}{d^2}.$$

In order to bound the second term in (19), we use a similar argument as that in [40]. Let $s(r) = \lceil r/(2K) \rceil$ and $u(r) = \frac{1}{2K}\sum_{i=1}^{s(r)} \epsilon_i$. Define the following events:

$$\begin{aligned} E_1 &= \left\{\bar{Y}_k(j) \le q_kp_k + \frac{\Delta_k}{2},\ \text{for all } j \in [s(r), r]\right\}, \\ E_2 &= \left\{\bar{Y}_*(j) \ge q_*p_* - \frac{\Delta_k}{2},\ \text{for all } j \in (u(r), r]\right\}, \\ E_3 &= \left\{T_*(s(r)) > u(r)\right\}. \end{aligned}$$

Under the event $E_1 \cap E_2 \cap E_3$, we have for all $i \ge s(r)$,

$$r \ge T_*(i) \ge T_*(s(r)) > u(r),$$

which implies that

$$\bar{Y}_*(T_*(i)) \ge q_*p_* - \frac{\Delta_k}{2} > q_kp_k + \frac{\Delta_k}{2} \ge \bar{Y}_k(j), \tag{21}$$

for all $j \in [s(r), r]$. Since $T_k^B(s(r)) \le s(r)$, we have $T_k^B(r) \le s(r)$ because otherwise there exists $i > s(r)$ such that $z_i = k$ and $\bar{Y}_k(T_k(i-1)) \ge \bar{Y}_*(T_*(i-1))$, which contradicts (21). Therefore, $\{T_k^B(r) > s(r)\} \subset E_1^c \cup E_2^c \cup E_3^c$, and

$$\mathbb{P}\left(T_k^B(r) > s(r)\right) \le \mathbb{P}(E_1^c \cup E_2^c \cup E_3^c) \le \mathbb{P}(E_1^c) + \mathbb{P}(E_2^c) + \mathbb{P}(E_3^c), \tag{22}$$

where the last inequality is the union bound. We next bound each of the terms on the right-hand side of (22) separately. For the first term, we have

$$\mathbb{P}(E_1^c) \le \sum_{j=s(r)}^{r} \mathbb{P}\left(\bar{Y}_k(j) > q_kp_k + \frac{\Delta_k}{2}\right) \le \sum_{j=s(r)}^{r} e^{-\Delta_k^2 j/2} \le \frac{e^{-\Delta_k^2 r/(4K)}}{1 - e^{-\Delta_k^2/2}} = o(r^{-1}), \tag{23}$$

where the second inequality follows from Hoeffding’s inequality. Similarly, for the second term on the right-hand side of (22), we obtain

$$\mathbb{P}(E_2^c) \le \sum_{j=u(r)}^{r} \mathbb{P}\left(\bar{Y}_*(j) < q_*p_* - \frac{\Delta_k}{2}\right) \le \sum_{j=u(r)}^{r} e^{-\Delta_k^2 j/2} = o(r^{-1}), \tag{24}$$

since $\Delta_k \ge d$ and, as noted below in (25), $u(r)$ grows logarithmically in $r$ with coefficient $c/d^2$, where $c > 5$.

From the proof of Theorem 3 in [27], we have $u(r) \ge \frac{c}{d^2}\ln\frac{rd^2e^{1/2}}{2cK^2}$, and

$$\begin{aligned} \mathbb{P}(E_3^c) &\le \mathbb{P}\left(T_*^R(s(r)) \le u(r)\right) \\ &= \mathbb{P}\left(T_*^R(s(r)) \le \frac{1}{2K}\sum_{i=1}^{s(r)}\epsilon_i\right) \\ &\le e^{-\frac{1}{5}\cdot\frac{1}{2K}\sum_{i=1}^{s(r)}\epsilon_i} \\ &= e^{-u(r)/5} \\ &\le \left(\frac{2cK^2}{rd^2e^{1/2}}\right)^{c/(5d^2)} = o(r^{-1}), \tag{25} \end{aligned}$$

where the third inequality follows from Bernstein’s inequality (see (13) of [27] for a detailed proof), and the last equality holds because $c > 5 \ge 5d^2$.

Applying (23), (24) and (25) to the right-hand side of (22), we obtain $\mathbb{P}(T_k^B(r) > s(r)) = o(r^{-1})$. From (19) and (20), we then have

$$\mathbb{P}\left(T_k(r) \ge \frac{r}{K}\right) = o(r^{-1}),$$

which yields

$$\mathbb{P}\left(T_*(r) \le \frac{r}{K}\right) = \mathbb{P}\left(\sum_{k \ne *} T_k(r) \ge \frac{r(K-1)}{K}\right) \le \sum_{k \ne *} \mathbb{P}\left(T_k(r) \ge \frac{r}{K}\right) = o(r^{-1}), \tag{26}$$

and Lemma 1 is proved.

From Lemma 1 and (17), we obtain

$$\begin{aligned} r(n) &\ge nq_*p_* - O(\sqrt{n}) - \sum_{r=K+1}^{M} (2\alpha r - \alpha - 1)^+\,O(r^{-1}) \\ &\ge nq_*p_* - O(\sqrt{n}) - O(M) \\ &\ge nq_*p_* - O(\sqrt{n}), \end{aligned}$$

since $M = \Theta(\sqrt{n})$. Therefore, $R(n) = O(\sqrt{n})$, and the proof of Theorem 2 is now complete. ∎

### 4.2 Uniform-Pulling Recommendation Strategy

In this subsection, we propose another strategy, which we call the Uniform-Pulling Recommendation (UR) strategy. We again divide time into epochs, with the $r$-th epoch containing $\tau(r) - \tau(r-1)$ time steps, where $\tau(r) = \lceil \alpha r^2 \rceil$ as before. In the first $K$ time steps of the $r$-th epoch, where $r \ge 1$, we first recommend $K$ gold tasks, one from each of the $K$ task categories. Next, we choose arm $z_r$ according to (11), where now $T_k(r) = r$ for all $k$. In the subsequent $\tau(r) - \tau(r-1) - K$ time steps, we recommend a non-gold task from arm $z_r$ at each time step. The UR strategy is summarized in Algorithm 2.
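Below is a sketch of UR under the same hypothetical `Worker` interface; since Algorithm 2 is not shown in the text, the details are our reading of the description above.

```python
import math

def ur_strategy(worker, K, n, alpha=1.0):
    """Sketch of the UR strategy: every epoch starts with one gold task per
    category, then commits to the empirically best arm for the rest of it."""
    ax_sum = [0.0] * K                      # sum of A*X over gold tasks per arm
    tau = lambda r: math.ceil(alpha * r * r)
    t, r = 0, 0
    while t < n:
        r += 1
        for k in range(K):                  # K gold tasks, one per category
            if t >= n:
                break
            a, x = worker.respond(k)
            ax_sum[k] += a * x
            t += 1
        arm = max(range(K), key=ax_sum.__getitem__)   # z_r in (11); T_k(r) = r
        for _ in range(min(max(tau(r) - tau(r - 1) - K, 0), n - t)):
            worker.respond(arm)             # non-gold tasks from the chosen arm
            t += 1
    return [s / r for s in ax_sum]          # estimates of q_k p_k
```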

###### Theorem 3.

UR has regret of order $O(\sqrt{n})$, where $n$ is the number of time steps.

###### Proof:

The proof uses a similar argument and the same notation as the proof of Theorem 2. For each epoch $r$, we have

$$\begin{aligned} P_{*,r} &\ge 1 - \sum_{k \ne *} \mathbb{P}\left(\bar{Y}_k(r) \ge \bar{Y}_*(r)\right) \\ &= 1 - \sum_{k \ne *} \mathbb{P}\left(\bar{Y}_k(r) - \bar{Y}_*(r) + \Delta_k \ge \Delta_k\right) \\ &\ge 1 - \sum_{k \ne *} e^{-2\Delta_k^2 r} \\ &\ge 1 - o(r^{-1}), \tag{27} \end{aligned}$$

where the penultimate inequality follows from Hoeffding’s inequality.

We choose $M$ to be the largest integer satisfying $\tau(M) \le n$, so that $M = \Theta(\sqrt{n})$. We then have, from (15) and (16),

$$\begin{aligned} r(n) &\ge \sum_{r=2}^{M} q_*(2\alpha r - \alpha - 1)^+\left(p_*P_{*,r} - \mathbb{E}\left[\frac{\beta\sigma_*^2}{T_*(r)}\right]\right) \\ &\ge \sum_{r=2}^{M} q_*p_*(2\alpha r - \alpha - 1)^+\left(1 - o(r^{-1})\right) - \sum_{r=2}^{M} (2\alpha r - \alpha - 1)^+\,\frac{\beta\sigma_*^2}{r} \\ &\ge q_*p_*\alpha M^2 - O(M) - \sum_{r=2}^{M} (2\alpha r - \alpha - 1)^+\,\frac{\beta\sigma_*^2}{r} \\ &\ge nq_*p_* - O(\sqrt{n}), \end{aligned}$$

where the second inequality follows from (27). Therefore, $R(n) = O(\sqrt{n})$, and the proof of Theorem 3 is complete. ∎

### 4.3 ϵ-First Recommendation Strategy

The $\epsilon$-first strategy was designed for the budget-limited MAB problem [41], where the budget refers to the total number of time steps. In the $\epsilon$-first strategy, the first $\epsilon$ fraction of the budget is used to explore the reward distributions of all arms, while the remaining $1 - \epsilon$ fraction is spent on the empirically best arm found in the initial exploration phase. To apply the $\epsilon$-first strategy to our task recommendation problem, we let all tasks in the exploration phase be gold tasks, with $H$ gold tasks recommended from each arm, so that $\epsilon = KH/n$. Uniform pulling of arms is used in the exploration phase. The $\epsilon$-first strategy is summarized in Algorithm 3, where $c_H$ denotes the arm that maximizes $\bar{Y}_k(H)$ after the exploration phase.
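Here is a sketch of the $\epsilon$-first strategy as described, with $H = \Theta(\sqrt{n})$ as required by Theorem 4 below; the specific choice $H = \lceil\sqrt{n}/K\rceil$ is ours, and Algorithm 3 itself is not reproduced in the text.

```python
import math

def eps_first_strategy(worker, K, n):
    """Sketch of the epsilon-first strategy: explore with K*H gold tasks
    (epsilon = K*H/n of the budget), then exploit the empirically best arm."""
    H = max(1, math.ceil(math.sqrt(n) / K))       # H = Theta(sqrt(n)), our choice
    ax_sum = [0.0] * K
    t = 0
    for k in range(K):                            # exploration phase: all gold tasks
        for _ in range(H):
            a, x = worker.respond(k)
            ax_sum[k] += a * x
            t += 1
    c_H = max(range(K), key=ax_sum.__getitem__)   # empirically best arm
    while t < n:                                  # exploitation: non-gold tasks only
        worker.respond(c_H)
        t += 1
    return c_H
```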

###### Theorem 4.

If $H = \Theta(\sqrt{n})$, then the $\epsilon$-first strategy has regret of order $O(\sqrt{n})$, where $n$ is the number of time steps.

###### Proof:

Let $P_{*,H} = \mathbb{P}(c_H = *)$. Similar to (27) in the proof of Theorem 3, it can be shown that $P_{*,H} \ge 1 - o(n^{-1/2})$. From (4) and (16), we obtain

$$\begin{aligned} r(n) &\ge \mathbb{E}\left[q_{c_H}(n - KH)\left(p_{c_H} - \frac{\beta\sigma_{c_H}^2}{Hq_{c_H}}\right)^+\right] \\ &\ge \mathbb{E}\left[q_*(n - KH)\left(p_* - \frac{\beta\sigma_*^2}{Hq_*}\right)1_{\{c_H = *\}}\right] \\ &\ge q_*(n - KH)\left(p_*P_{*,H} - \frac{\beta\sigma_*^2}{Hq_*}\right) \\ &\ge nq_*p_* - O(\sqrt{n}), \tag{28} \end{aligned}$$

since $H = \Theta(\sqrt{n})$. Therefore, $R(n) = O(\sqrt{n})$, and the proof is complete. ∎

### 4.4 Discussions

The $\epsilon$-first strategy assumes that the total number of tasks to be recommended to each worker is known beforehand, while the other two strategies do not need such an assumption. If both UR and the $\epsilon$-first strategy recommend the same total number of gold tasks, then the $\epsilon$-first strategy achieves a smaller regret than UR. This is because the $\epsilon$-first strategy recommends all the gold tasks in the initial exploration phase, leading to a better estimate of the reward distributions before any non-gold tasks are recommended. However, in a practical crowdsourcing platform, knowing the total number of tasks for a worker may not be feasible, as a worker is not guaranteed to be active or to remain in the system. An alternative is to use a hybrid UR and $\epsilon$-first strategy, where the initial exploration phase in each epoch of the UR strategy is set to be an $\epsilon$ fraction of the total number of time steps in that epoch.

On the other hand, UR can become more costly than GR when the number of arms $K$ is large. If UR and GR recommend the same total number of gold tasks to the worker, GR tends to recommend more gold tasks from the best task category in the long run, leading to a smaller estimation variance in (2) for the best category and a smaller asymptotic regret than UR.

In both the GR and UR strategies, we choose $\tau(r) = \lceil \alpha r^2 \rceil$, which determines how frequently gold tasks are recommended to the worker. A more general choice is $\tau(r) = \lceil \alpha r^\gamma \rceil$ for $\gamma > 1$. We now show that $\gamma = 2$ is essentially the only choice that gives optimal order regret in GR and UR. Let $G(t)$ be the number of gold tasks recommended till time $t$. Then, since GR and UR recommend a fixed number of gold tasks in each epoch, we have $G(t) = \Theta(r)$ if $t$ is within epoch $r$. The number of time steps up to and including epoch $r$ is $\tau(r) = \Theta(r^\gamma)$.