Crowdsourcing has emerged in recent years as a paradigm for leveraging human intelligence and activity at large scale. It offers a distributed and cost-effective approach to obtaining needed content, information, or services by soliciting contributions from an undefined set of people, instead of assigning a job to designated employees [1, 2]. Over the past decade, numerous successful crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), Yahoo! Answers, and Upwork, have emerged. With the help of crowdsourcing platforms and the power of the crowd, crowdsourcing is becoming increasingly popular, as it provides an efficient and cheap way of obtaining solutions to complex tasks that are currently beyond computational capabilities but possible for humans [6, 7, 8].
Over the past decade, techniques for securing crowdsourcing operations have been expanding steadily, as has the number of crowdsourcing applications. However, owing to the openness of crowdsourcing, users on a platform have the opportunity to exhibit antisocial behaviors, and crowdsourcing loses its luster when collective efforts are derailed or severely hindered by elaborate sabotage [10, 11]. As part of crowdsourcing, service exchange applications have proliferated as a medium that allows users to exchange valuable services. In a typical service exchange application, a user plays a dual role: as a client who submits his requirement to a crowdsourcing platform, and as a server who chooses to devote a high or low level of effort to a job and provides solutions to the client in exchange for rewards. Since providing services incurs costs to servers in terms of power, time, bandwidth, privacy leakage, etc., rational and self-interested users are more inclined to devote a low level of effort when acting as a server, and to seek services from others as a client rather than provide services as a server. Under such circumstances, non-cooperative behavior among self-interested users decreases their social welfare, which constitutes a social dilemma. Therefore, an increased level of cooperation is socially desirable for service exchange in crowdsourcing platforms.
The main reason why users in the above service exchange game have an incentive not to cooperate is the absence of punishment for such malicious behaviors. Self-interested users continually adjust their strategies over time to maximize their own utilities; however, they receive no direct and immediate benefit from choosing to be a server and devoting a high level of effort to provide high-quality services to other users (as clients). This conflict leads to an inevitable outcome: many users tend either to act as clients requesting services, or to act as servers who devote only a low level of effort and provide low-quality services. Thus, an important function of the crowdsourcing platform is to provide a good incentive mechanism for service exchange. There is an urgent need to stimulate cooperation among self-interested users in crowdsourcing, under which users are compelled to follow the social norm so that the inefficiency of the socially undesirable equilibrium is overcome; i.e., if a user chooses to be a server in the first stage and provides high-quality services in the second stage, then he should be rewarded immediately; otherwise, he should be punished.
Incentives are key to the success of crowdsourcing, as it heavily depends on the level of cooperation among self-interested users. There are two types of incentives: monetary and non-monetary. Monetary incentive mechanisms motivate individuals to provide high-quality services through monetary or matching rewards in the form of micropayments, which in principle can achieve the social optimum by internalizing the external effects of self-interested individuals. One line of work presents a game-theoretic model of an all-pay contest in crowdsourcing and investigates whether multiple prizes can maximize contest revenue. Although monetary incentives are, in some sense, the best and easiest way to motivate people, several challenges prevent monetary incentives from succeeding in service exchange applications. First, it is difficult to price the small services (e.g., answers, knowledge, resources, etc.) being exchanged between users, as these are not real goods. Deploying auctions to set prices may reduce them to a certain degree, but it may cause implementation complexity, high delay, and currency inflation. Second, as pointed out in prior work, "free-riding" may happen when rewards are paid before services are provided: a server always has an incentive to take the reward without devoting enough effort. Conversely, if rewards are paid after the service exchange is completed, "false-reporting" may arise, since the client has an incentive to lower or refuse rewards to servers by lying about the outcome of the task. Third, although a monetary scheme is simple to design, it often requires a complex accounting infrastructure, which introduces computational overhead and substantial communication, and is thus difficult to implement in practice [21, 22].
In addition to monetary incentives, some applications employ various non-monetary incentive types, such as natural incentives, personal development, solidary incentives, material incentives, etc. Among these, rating protocols (a form of solidary incentive), originally proposed by Kandori, have been shown to work effectively as incentive mechanisms that enforce cooperation in crowdsourcing platforms [13, 16, 25, 12, 24]. Generally speaking, a rating protocol labels each user with a rating based on his past behaviors, indicating his social status in the system. Users with different ratings are treated differently by the users they interact with, and the rating of a user who complies with (resp. deviates from) the social norm goes up (resp. down). Hence, a user with a high/low rating can be rewarded/punished by other users in the platform who have had no past interactions with him. Furthermore, using ratings as a summary record of a user requires significantly less information to be maintained. The rating protocol therefore has the potential to form the basis of successful incentive mechanisms for service exchange in crowdsourcing platforms. Motivated by the above considerations, this paper is devoted to the study of incentive mechanisms based on rating protocols.
However, several major obstacles prevent existing rating protocols from being directly applied to incentive provision for service exchange in crowdsourcing: (i) users have asymmetric service requirements and can freely and frequently change the partners they interact with on most crowdsourcing platforms, which results in asymmetric interactions among users that are more difficult to model and analyze [27, 28]; (ii) taking into account the service capability of users and the spatial/temporal requirements of tasks, the framework of anonymous random matching games, in which each user is repeatedly matched with different partners over time for service exchange, is inappropriate [12, 16]; (iii) the user population is large, and users are anonymous and not sufficiently patient, especially since users with bad ratings may attempt to leave and rejoin the system as new members to avoid punishment (i.e., whitewashing) [24, 29]; (iv) in the presence of imperfect monitoring, a user's rating may be wrongly updated, which affects the design of the rating protocol and causes social welfare loss [13, 25].
In this paper, we take the above features of service exchange in crowdsourcing into consideration and propose a game-theoretic framework for designing and analyzing a class of rating-protocol-based incentive mechanisms, in order to stimulate cooperation among self-interested users and maximize social welfare. To the best of our knowledge, updating the ratings of both matched users in the service exchange game (which we call a two-sided rating) is rarely tackled in the literature. Using game theory to analyze how cooperation can be enforced and how social welfare can be maximized under the designed two-sided rating protocol, we rigorously analyze how users' behaviors are influenced by intrinsic parameters, design parameters, and users' evaluations of their individual long-term utilities, in order to characterize the optimal design that maximizes users' utilities and enforces cooperation among them. The main contributions of this paper are summarized as follows:
We model the service exchange problem as an asymmetric two-stage game, and show that an inefficient outcome arises when no user cooperates with any other, so that zero social welfare is obtained at the myopic equilibrium, which is a social dilemma.
We develop the first game-theoretic design of two-sided rating protocols to stimulate cooperation among self-interested users, which consists of a recommended strategy and a rating update rule. The recommended strategy recommends a desirable behavior chosen from three predefined recommended plans according to intrinsic parameters, while the rating update rule involves the update of ratings of both users, and uses differential punishments that punish users with different ratings differently.
We formulate the problem of designing an optimal two-sided rating protocol that maximizes social welfare among all sustainable rating protocols, provide design guidelines for determining whether a sustainable two-sided rating protocol exists under a given recommended strategy, and design an algorithm that achieves low-complexity computation via a two-stage procedure in which each stage consists of two steps (we call this a two-stage two-step procedure), executed in an alternating manner.
We use simulation results to demonstrate how intrinsic parameters (i.e., costs, imperfect monitoring, users' patience) affect the optimal recommended strategies, how the design parameters characterize the optimal design of various protocols, and the performance gain of the proposed optimal two-sided rating protocol.
The remainder of this article is organized as follows. Section II describes the service exchange dilemma game with two-sided rating protocols. Section III formulates the problem of designing an optimal two-sided rating protocol. Section IV provides the optimal design of two-sided rating protocols. Section V presents evaluation results to illustrate key features of the designed protocol. Finally, conclusions are drawn in Section VI.
II System Models
II-A Service Exchange Dilemma Game
As illustrated in Figure 1, we consider a crowdsourcing-based service exchange system consisting of a platform and several users on the Internet, where each user can offer services to other users. Examples of services include sensing tasks, expert knowledge, information resources, computing power, storage space, etc. The crowdsourcing process can be described as follows: on the one hand, each user chooses to become either a service requester (i.e., client) or a service provider (i.e., server); on the other hand, a client generates a service request that is sent to a matched server, and the server devotes a high or low level of effort to provide the requested service to the client.
We model this process using uniform random matching; that is, each user in the community is involved in two matches in every period, one as a client and the other as a server; each user is equally likely to receive exactly one request in every period; and the matching is independent across periods. Note that the user with whom a user interacts as a client can be different from the one with whom he interacts as a server, reflecting asymmetric interests between a pair of users at a given instant. This model well approximates the matching process between users in large-scale crowdsourcing systems, where users interact in an ad-hoc fashion and interactions are constructed randomly over time.
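The matching process described above can be sketched in a few lines of code. The snippet below is a minimal illustration (the function and variable names are ours, not the paper's): drawing a uniform random permutation of the users gives every user exactly one incoming request per period, so each user acts once as a client and once as a server.

```python
import random

def match_period(n_users, rng):
    """One period of uniform random matching: a uniform random permutation
    assigns each client exactly one server, so every user acts once as a
    client and once as a server per period.  (Self-matches, a negligible
    edge case in the large-population model, could be resampled in practice.)"""
    servers = list(range(n_users))
    rng.shuffle(servers)
    # pairs[i] = (user i acting as client, the server handling i's request)
    return list(enumerate(servers))

rng = random.Random(0)
pairs = match_period(5, rng)
```

Drawing an independent permutation in every period reproduces the assumption that matchings in different periods are independent.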
In this model, a user decides whether or not to request service (i.e., chooses to be a client or a server); if the user chooses to be a server, he strategically determines his service quality (devoting a high or low level of effort). Note that the decisions are sequential: the role-selection decision is made first, and the service-quality decision is made next. We model this interaction as a sequential, two-stage game. In the first stage, a user chooses an action from a binary set: either "choosing to be a client" (requesting service) or "choosing to be a server" (offering service). In the second stage, the server faces a binary choice between being whole-hearted and half-hearted in providing the service, while the client has no choice; the server's action set consists of "high level of effort" and "low level of effort".
We assume that requesting service is costly: choosing to be a client incurs a cost. If the server devotes a high level of effort to fulfill the client's request, the client receives a service benefit, while the server suffers a service cost. If the server devotes a low level of effort, both users receive zero payoffs. Obviously, the server's action determines the payoffs of both users. After a server takes an action, the client sends a report about the server's action to the third-party device or infrastructure that manages users' rating scores. Owing to imperfect monitoring, the report may be inaccurate, either because the client is incapable of an accurate assessment or because of some system error; that is, with a small probability the opposite of the server's actual action is reported, and vice versa. (In this paper, we focus on the situation in which the probability of errors in the first stage of the game is approximated by 0, because when the probability of an erroneous report is very small, errors occurring in the first stage are easy to detect and correct in time.) Assuming a binary set of reports, it is without loss of generality to restrict the error probability to be smaller than 1/2, because when it equals 1/2, reports are completely random and contain no meaningful information about users' actions. For convenience, Table I lists the frequently used notations in this paper.
We now find the subgame perfect equilibrium of the two-stage game. Each pair of first-stage decisions by the users results in a different second-stage game. We first compute expected utilities in the second-stage game, and then move back to compute expected utilities when both users choose their actions in the first stage before knowing their productivities. The payoff matrix for the first-stage game is depicted in Table II. The detailed computation process is given in Appendix A.
In summary, for any choice of parameters, only the non-cooperative action profile can be a Nash equilibrium of the service exchange game. When every user chooses his action to maximize his current payoff myopically, an inefficient outcome arises in which every user receives zero payoff: a social dilemma. Under the current framework, nobody will take the initiative to help others, and nobody can expect to receive help from others.
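The dilemma can be illustrated numerically. In the sketch below, `b` and `c` are hypothetical stand-ins for the paper's service benefit and service cost parameters; the point is only that a myopic server's best response is low effort, which drives every user's payoff to zero.

```python
def best_response_server(c):
    """A myopic server compares one-shot payoffs: high effort costs c > 0 and
    brings him no immediate benefit, while low effort costs nothing."""
    payoffs = {"H": -c, "L": 0.0}
    return max(payoffs, key=payoffs.get)

b, c = 3.0, 1.0          # hypothetical benefit and cost, with b > c > 0
action = best_response_server(c)
client_payoff = b if action == "H" else 0.0
```

Even though mutual cooperation would yield a positive net surplus `b - c` per exchange, myopic play delivers zero to everyone, which is exactly the social dilemma stated above.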
TABLE I: Frequently used notations.
- cost for choosing to be a client.
- cost for devoting a high level of effort to fulfill the client's request.
- service benefit if the service request is fulfilled.
- probability that errors occur in the second-stage game.
- discount factor denoting users' patience.
- set of rating labels.
- recommended strategy for a server.
- rating update rule.
- strength of reward imposed on a server with a given rating.
- strength of punishment imposed on a server with a given rating.
- strength of reward imposed on a client with a given rating.
- strength of punishment imposed on a client with a given rating.
- ratio of the number of times a user chooses to be a client versus a server.
- service quality reported by a client.
- actual service quality devoted by a server.
- stationary distribution of rating labels.
- expected one-period utility of a user.
- expected long-term utility of a user.
- social welfare under the rating protocol.
II-B Two-sided Rating Protocols
We consider a two-sided rating protocol that consists of a recommended strategy and a rating update rule. The recommended strategy prescribes the contingent plan, chosen according to intrinsic parameters, that the server should follow based on the ratings of both himself and his client. Here, we focus on one plan; two other plans will be introduced in the latter half of this article. The rating update rule involves updating the ratings of both users depending on their past actions as a server or a client, and uses differential punishments that punish users with different ratings differently. To the best of our knowledge, two-sided rating protocols in crowdsourcing are rarely tackled in the literature. In the following, we give a formal definition of a two-sided rating protocol.
A two-sided rating protocol is represented as a 5-tuple consisting of a set of binary rating labels, a social strategy, a client/server ratio, a recommended strategy, and a rating update rule.
denotes the set of binary rating labels, where 0 is the bad rating, and 1 is the good rating.
represents the adopted social strategy for a user with rating , where .
denotes the rate at which a user with a given rating chooses to become a client versus a server, capturing both his historical and current choices.
defines the strategy which the server with rating should select when faced with the client with rating .
can be denoted by a tuple , where updates the rating of a server based on his current rating, his matched client’s rating, the reported strategy and the recommended strategy as follows:
specifies how a client’s rating should be updated based only on his current rating as follows:
We characterize the erroneous report by a mapping , where 0 and 1 represent “L” and “H”, respectively.
gives the probability distribution over reports, where each entry is the probability that the client reports received service quality "r" given the server's provided service quality "q".
Remark: A schematic representation of the rating update rule is provided in Figure 2. Given a rating protocol, each user is tagged with a binary rating label representing his social status. Obviously, the higher the rating, the better the user's social status. Users' ratings are stored and updated by the system administrator based on the strategies adopted by each user in the transactions he is engaged in. The rating scheme can update a user's rating at the end of each transaction or at the beginning of the next transaction. Under the rating update rules (2) and (3), a server will be assigned rating 1 with the reward probability, and rating 0 with the complementary probability, if the service quality reported by the client is no lower than the recommended service quality; otherwise, he will be assigned rating 0 with the punishment probability, and rating 1 with the complementary probability. Similarly, a client will be assigned rating 1 with the corresponding reward probability if his client/server rate satisfies the prescribed condition; otherwise, he will be assigned rating 0 with the corresponding punishment probability. These probabilities can thus be referred to as the strengths of reward imposed on servers and clients when they cooperate, and as the strengths of punishment imposed on servers when they do not devote high effort and on clients when they expect to receive excessive service from others rather than serve others.
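The stochastic server-side update described in the remark can be prototyped directly. In the sketch below, `p_reward` and `p_punish` play the roles of the reward and punishment strengths (the function and parameter names are ours, not the paper's):

```python
import random

def update_server_rating(complied, p_reward, p_punish, rng):
    """If the reported quality meets the recommended quality, the server is
    rewarded: his next rating is 1 with probability p_reward (else 0).
    Otherwise he is punished: his next rating is 0 with probability p_punish
    (else 1)."""
    if complied:
        return 1 if rng.random() < p_reward else 0
    return 0 if rng.random() < p_punish else 1

rng = random.Random(1)
n = 20000
freq_reward = sum(update_server_rating(True, 0.8, 0.5, rng) for _ in range(n)) / n
```

A symmetric client-side update, with its own reward and punishment strengths keyed to the client/server rate, completes the two-sided rule.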
Definition 1 describes a simple two-sided rating protocol which assigns binary rating labels to users, and provides a binary choice of whether devoting a high level of effort or a low level of effort in providing the service. Although other more elaborate two-sided rating protocols (as discussed in Section VI) may be considered, we show that this simple one is effective to overcome the inefficiency of the service exchange dilemma in crowdsourcing.
III Problem Formulation
III-A Stationary Rating Distribution
Given a two-sided rating protocol, suppose that every user always follows the given recommended strategy and keeps his chosen client/server rate fixed in every period. As time passes, users' ratings are updated, and thus the distribution of users' ratings in the system evolves over time. Let the fraction of users with each rating at the beginning of period t be given; the transition to the next period is then determined by the rating update rule, taking into account the rate at which a user chooses to be a client and the error probability, as shown in the following expressions:
Setting the rating distributions of consecutive periods equal, the stationary distribution can be derived as follows.
Since the coefficients in the equations that define a stationary distribution are independent of the recommended strategy that users should follow, the stationary distribution is also independent of the recommended strategy, as can be seen from Eq.(6). Thus, we write the stationary distribution in a form that emphasizes its dependence on the rating update rule.
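With binary ratings, the stationary distribution of Eq.(6) is that of a two-state Markov chain and has a simple closed form. The sketch below uses our own notation `p01`, `p10` for the per-period up/down transition probabilities induced by the update rule and the error probability:

```python
def stationary(p01, p10):
    """Stationary distribution (eta0, eta1) of a two-state rating chain, where
    p01 is the per-period probability that a 0-user becomes a 1-user and p10
    the probability that a 1-user becomes a 0-user.
    Detailed balance eta0 * p01 = eta1 * p10 with eta0 + eta1 = 1 gives:"""
    eta1 = p01 / (p01 + p10)
    return 1.0 - eta1, eta1

eta0, eta1 = stationary(0.3, 0.1)
```

Applying the one-period transition to `(eta0, eta1)` leaves it unchanged, which is the defining property of the stationary distribution.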
III-B Sustainable Conditions
The purpose of designing a social norm is to induce each user to follow the recommended strategy and keep his chosen rate in every period. We call a user who complies with such a social norm a "compliant user", while a user who deviates from the social norm is called a "non-compliant user". A compliant user will be rewarded; on the contrary, a non-compliant user will be punished in order to regulate his behavior. Since we consider a non-cooperative scenario, it is important to check whether a user can improve his long-term payoff by a unilateral deviation. Note that a unilateral deviation by an individual user does not affect the evolution of rating scores, and thus the stationary distribution, because we consider a continuum of users.
Consider the cost paid by a server who is matched with a client and follows the recommended strategy: he pays the service cost if the recommendation prescribes high effort, and zero otherwise. Similarly, the benefit received by a client matched with a server following the recommended strategy equals the service benefit if the recommendation prescribes high effort, and zero otherwise. Since we consider uniform random matching, the expected one-period payoff of a user under a rating protocol and a chosen rate, before he is matched, is given by
To evaluate the long-term payoff of a compliant user, we use the discounted sum criterion in which the long-term payoff of a user is given by the expected value of the sum of discounted period payoffs starting from the current period. Let be the transition probability that a -user becomes a -user in the next period under a rating protocol when he follows the recommended strategy and selects the chosen rate , which can be expressed as
The expected long-term utility of a -user is the infinite horizon discounted sum of his expected one-period utility with his expected future payoff multiplied by a common discount factor , which can be computed by solving the following recursive equation:
where the discount factor is the rate at which a user discounts his future payoff and reflects his patience. (It is obvious that a larger discount factor reflects a more patient user, but no user is perfectly patient, as no one is willing to stay in the system forever.)
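Because there are only two rating states, the recursive equation above is a 2x2 linear system that can be solved in closed form. The following sketch uses our own notation: `u` for the one-period utilities, `P` for the rating transition matrix, and `delta` for the discount factor.

```python
def long_term_utilities(u, P, delta):
    """Solve v = u + delta * P v for v = (v0, v1): the expected long-term
    utilities of a 0-user and a 1-user, given one-period utilities u and the
    2x2 rating transition matrix P.  Direct solve of (I - delta*P) v = u."""
    a, b = 1 - delta * P[0][0], -delta * P[0][1]
    c, d = -delta * P[1][0], 1 - delta * P[1][1]
    det = a * d - b * c
    v0 = (d * u[0] - b * u[1]) / det
    v1 = (a * u[1] - c * u[0]) / det
    return v0, v1

u, P, delta = (1.0, 2.0), [[0.5, 0.5], [0.2, 0.8]], 0.9
v0, v1 = long_term_utilities(u, P, delta)
```

The returned pair satisfies the recursive equation exactly, which is what the test below checks.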
Notably, this quantity is a constant given the protocol parameters, which is very convenient for optimally designing the proposed two-sided rating protocols in the remainder of this paper. Since users always aim to strategically maximize their own benefits, they will find it in their own self-interest to comply with the social norm under a given two-sided rating protocol if and only if they cannot improve their long-term utilities by deviating. We call such a protocol a sustainable two-sided rating protocol, and give its formal definition as follows:
(Sustainable Two-sided Rating Protocols) A two-sided rating protocol is sustainable if and only if , for all , and .
In other words, a sustainable two-sided rating protocol should maximize a user's expected long-term utility at any period, such that no user can gain from a unilateral deviation, regardless of the rating of his matched partner, when every other user follows the recommended strategy and selects the prescribed rate. It is obvious that social welfare is maximized when compliant users choose to be a client and a server with equal probability. Checking whether a rating protocol is sustainable in the second stage using the preceding definition requires computing deviation gains for all possible recommended strategies. By employing the criterion of unimprovability in Markov decision theory, a user's strategic decision problem can be formulated as a Markov decision process under a two-sided rating protocol, where the state is the user's rating and the action is his chosen strategy. We thus establish the one-shot deviation principle for sustainable two-sided rating protocols, which provides simple conditions.
(One-Shot Deviation Principles) A two-sided rating protocol satisfies the one-shot deviation principle if and only if
For the "if" part: A user's expected long-term utility when he adopts the recommended strategy at every rating can be expressed as in Eq.(9) (here, we fix the chosen rate). If the user unilaterally deviates from the recommended strategy at some rating, his expected long-term utility becomes
where the term denotes the transition probability that a non-compliant server's rating evolves in the next period under the protocol, expressed as
By comparing these two payoffs and solving the following inequality:
If =0, then for each , , and , we have
If instead =1 and , then , and . Otherwise, if =0, then for each , self-interested users have no incentive to deviate from the recommended strategy. Hence, we have
For the "only if" part: Suppose the rating protocol satisfies the one-shot deviation principle; then clearly there are no profitable one-shot deviations. We prove the converse by showing that if the protocol does not satisfy the one-shot deviation principle, there is at least one profitable one-shot deviation. Since the utilities are bounded, this holds by the unimprovability property in Markov decision theory. ∎
Lemma 1 shows that if a user cannot gain by unilaterally deviating from the recommended strategy only in the current period and following it afterwards, then neither can he gain by switching to any other strategy, and vice versa. The first part of Eq.(14) can be interpreted as the current gain from deviating in the second stage, while the second part represents the discounted expected future loss due to the different transition probabilities incurred by the deviation.
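The one-shot deviation check of Lemma 1 compares, at each rating state, the compliant continuation payoff with the payoff of deviating once and conforming thereafter. A minimal sketch follows; all names and numbers are illustrative, not the paper's.

```python
def no_profitable_one_shot_deviation(u_c, row_c, u_d, row_d, delta, v):
    """At one rating state: compliant payoff u_c + delta * (row_c . v) versus
    the one-shot deviation payoff u_d + delta * (row_d . v), where v holds the
    long-term utilities and the rows are transition probabilities over ratings."""
    dot = lambda row: sum(p * vi for p, vi in zip(row, v))
    return u_c + delta * dot(row_c) >= u_d + delta * dot(row_d)

v, delta = (0.0, 10.0), 0.9
# strong differential punishment: deviating sends the user to rating 0 w.p. 0.9
ok = no_profitable_one_shot_deviation(-1.0, (0.1, 0.9), 0.0, (0.9, 0.1), delta, v)
# no punishment: deviating saves the effort cost at no future loss
bad = no_profitable_one_shot_deviation(-1.0, (0.1, 0.9), 0.0, (0.1, 0.9), delta, v)
```

The contrast between `ok` and `bad` illustrates why the punishment strengths must create a real gap in transition probabilities for the protocol to be sustainable.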
After analyzing sustainability conditions in the second stage, we step back to analyze sustainability conditions in the first stage, where both users choose their strategies before knowing their productivities. In the first stage, users decide the optimal chosen rate and follow the recommended strategy in their self-interest. Under the service exchange dilemma game, a user will find it optimal to choose to be a client in the first stage, as his revenue is maximized when his matched server follows the recommended strategy in the second stage, which yields a positive payoff for him; choosing to be a server, by contrast, incurs a cost. However, social welfare is maximized if and only if every user chooses to be a server or a client with the same probability, which we name the principle of fairness, inspired by prior work, and we derive incentive constraints that characterize sustainability conditions in the first stage, as shown in Lemma 2.
(The Principle of Fairness) A two-sided rating protocol satisfies the principle of fairness if and only if
For the "if" part: Assume that each user selects the prescribed rate in the first stage and adopts the recommended strategy in the second stage; then his expected long-term utility can be expressed as
where the term denotes the transition probability that a compliant user's rating evolves in the next period when he selects this rate in the first stage under the rating protocol, which can be found in Eq.(8).
A user can receive the benefit if and only if he chooses to be a client in the current period under the recommended strategy; otherwise, he suffers a cost. We now suppose that a user deviates from the prescribed rate in the current period and follows it afterwards. The rewards under both rates are the same, while a higher cost is suffered by selecting to be a server with a higher probability. Under this assumption, according to Eq.(9), we have
According to Eq.(8), we have
Hence, Eq.(19) can be rewritten as
From Eq.(10) we can derive that the coefficient is a constant; that is, the expected long-term utility is a monotonic function of the chosen rate, determined by the intrinsic parameters (i.e., the costs, the benefit, the error probability, and the discount factor) as well as the design parameters. If the utility is monotonically decreasing in the chosen rate, then no user will deviate from the prescribed rate, as it is his optimal choice. Therefore, we only need to check the case in which the utility is monotonically increasing in the chosen rate. It is obvious that the expected long-term utility then attains its maximum at rate 1. Without loss of generality, we now suppose that a user deviates to rate 1 in the current period and follows the prescribed rate afterwards; then his expected long-term utility can be expressed as
where the transition probability can be computed based on Eq.(3).
If =1, Eq.(25) can be rewritten as
If instead =0, Eq.(25) can be rewritten as
For the "only if" part: Suppose the protocol satisfies the principle of fairness; then clearly there are no profitable deviations in the first stage. We prove the converse by showing that if the protocol does not satisfy the principle of fairness, there is at least one profitable deviation. Since the RHS of Eq.(17) is bounded, this holds by the unimprovability property in Markov decision theory. ∎
Using the one-shot deviation principle and the principle of fairness, we can derive incentive constraints that characterize necessary and sufficient conditions for a two-sided rating protocol to be sustainable, as formalized in the next theorem.
A two-sided rating protocol is sustainable if and only if it satisfies both the one-shot deviation principle and the principle of fairness.
The proof follows directly from Lemmas 1 and 2 and is omitted here. ∎
III-C Optimization Problem with Constraints
Under a sustainable rating protocol, it is in each user's self-interest to devote a high level of effort (by the one-shot deviation principle) and to serve others (by the principle of fairness). Obviously, a sustainable two-sided rating protocol always achieves higher social welfare than a non-sustainable one, and hence it suffices to consider sustainable protocols when maximizing social welfare. We assume that the protocol designer is profit-seeking and aims to design a rating protocol that maximizes the expected one-period utility a user obtains in one transaction, which we define as the social welfare in this paper.
The two-sided rating protocol design problem can be formulated as follows:
It should be noted that the reward and punishment strengths in Eq.(2) must both be nonzero; otherwise Eq.(2) reduces to
which is independent of users’ behaviors, and thus cannot provide effective incentives.
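Conceptually, the constrained design problem can be attacked by brute force before deriving the analytical solution: enumerate candidate design parameters, keep those satisfying the sustainability constraints, and pick the welfare maximizer. The sketch below uses hypothetical callables `welfare` and `sustainable` in place of Eq.(30)'s objective and constraints, with a toy objective for illustration:

```python
from itertools import product

def best_sustainable(welfare, sustainable, grid):
    """Among all parameter tuples on the grid satisfying the sustainability
    constraints, return the welfare-maximizing one (None if infeasible)."""
    best, best_w = None, float("-inf")
    for params in grid:
        if sustainable(params):
            w = welfare(params)
            if w > best_w:
                best, best_w = params, w
    return best, best_w

# toy stand-in: maximize x*y subject to x + y <= 1 on a coarse grid
grid = list(product([i / 10 for i in range(11)], repeat=2))
best, w = best_sustainable(lambda p: p[0] * p[1],
                           lambda p: p[0] + p[1] <= 1.0, grid)
```

Such exhaustive search scales poorly in the number of design parameters, which is exactly why the low-complexity two-stage two-step procedure of the next section is needed.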
IV Optimal Design of Two-Sided Rating Protocols
In this section, we investigate the design of an optimal two-sided rating protocol that solves the design problem under a given recommended strategy, i.e., selecting the optimal rating update rule, which is determined by the design parameters. To characterize the optimal design, we investigate the impact of the design parameters on the social welfare and on satisfying the incentive constraints in Eq.(30).
IV-A Existence of a Sustainable Two-sided Rating Protocol
We first investigate whether there exists a sustainable two-sided rating protocol under the given recommended strategy, i.e., whether there exists a feasible solution to the design problem of Eq.(30).
A sustainable two-sided rating protocol under the recommended strategy exists if and only if
For the "if" part: Among the eight design parameters, some can be referred to as reward factors imposed on compliant users, while the others can be referred to as punishment factors imposed on non-compliant users. The incentive for a self-interested user to be compliant is maximized when all of the reward and punishment factors are maximized. Then, Eq.(30) can be transformed into
It is obvious that , hence, Eq.(33) can be revised as follows