
Iterative Bayesian Learning for Crowdsourced Regression
Crowdsourcing platforms have emerged as popular venues for purchasing human intelligence at low cost for large volumes of tasks. As many low-paid workers are prone to giving noisy answers, a fundamental question is how to identify the more reliable workers and exploit this heterogeneity to infer the true answers accurately. Despite significant research effort on classification tasks with discrete answers, little attention has been paid to regression tasks with continuous answers. The popular Dawid-Skene model for discrete answers enjoys algorithmic and mathematical simplicity owing to its low-rank structure, but it does not generalize to continuous-valued answers. To this end, we introduce a new probabilistic model for crowdsourced regression that captures the heterogeneity of workers, generalizing the Dawid-Skene model to the continuous domain. We design a message-passing algorithm for Bayesian inference inspired by the popular belief propagation algorithm. We showcase its performance first by proving that it achieves a near-optimal mean squared error, comparing it to an oracle estimator. Asymptotically, a tighter analysis shows that the proposed algorithm achieves the exactly optimal performance. We then present synthetic experiments confirming our theoretical predictions. As a practical application, we further emulate a crowdsourcing system reproducing the PASCAL Visual Object Classes datasets and show that denoising the crowdsourced data with the proposed scheme can significantly improve performance on the vision task.
02/28/2017 ∙ by Jungseul Ok, et al.
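As a rough illustration of the kind of inference involved, here is a minimal sketch that alternates between precision-weighted task estimates and per-worker noise estimates. This is a simplified stand-in, not the paper's message-passing algorithm; the data layout and variable names are illustrative.

```python
import numpy as np

def iterative_regression(answers, n_iter=20):
    """Alternate between reliability-weighted task estimates and
    per-worker precision updates.

    answers: dict mapping (task, worker) -> continuous answer.
    Returns estimated task values. A simplified sketch, not the
    paper's belief-propagation-style algorithm.
    """
    tasks = sorted({t for t, _ in answers})
    workers = sorted({w for _, w in answers})
    precision = {w: 1.0 for w in workers}   # inverse noise variance per worker
    estimate = {t: 0.0 for t in tasks}
    for _ in range(n_iter):
        # task step: precision-weighted average of the answers per task
        for t in tasks:
            obs = [(w, a) for (t2, w), a in answers.items() if t2 == t]
            num = sum(precision[w] * a for w, a in obs)
            den = sum(precision[w] for w, _ in obs)
            estimate[t] = num / den
        # worker step: re-estimate each worker's noise variance from residuals
        for w in workers:
            res = [(a - estimate[t]) ** 2
                   for (t, w2), a in answers.items() if w2 == w]
            precision[w] = 1.0 / (np.mean(res) + 1e-8)
    return estimate
```

With two accurate workers and one biased one, the biased worker's residuals inflate its variance estimate, so its answers are down-weighted and the task estimates converge toward the accurate workers' answers.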

Adiabatic Persistent Contrastive Divergence Learning
This paper studies the problem of parameter learning in probabilistic graphical models with latent variables, where the standard approach is the expectation-maximization algorithm alternating expectation (E) and maximization (M) steps. However, both the E and M steps are computationally intractable for high-dimensional data, while substituting a faster surrogate for either step to combat intractability can often cause failure to converge. We propose a new learning algorithm that is computationally efficient and provably converges to a correct optimum. Its key idea is to run only a few cycles of Markov chains (MC) in both the E and M steps. Such an idea of running incomplete MC has been well studied in the literature only for the M step, where it is called Contrastive Divergence (CD) learning. While known CD-based schemes approximate the gradients of the log-likelihood via the mean-field approach in the E step, our proposed algorithm obtains exact ones via MC algorithms in both steps, owing to multi-time-scale stochastic approximation theory. Despite its theoretical convergence guarantee, the proposed scheme may suffer from slow mixing of the MC in the E step. To tackle this, we also propose a hybrid approach applying both mean-field and MC approximations in the E step; the hybrid approach outperforms the bare mean-field CD scheme in our experiments on real-world datasets.
05/26/2016 ∙ by Hyeryung Jang, et al.
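The idea of an incomplete MC run can be seen in a generic persistent-CD update for a tiny binary RBM, where the chain state is carried across updates instead of being restarted at the data. This is a textbook PCD sketch (biases omitted for brevity), not the paper's multi-time-scale scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_step(W, v_data, v_chain, lr=0.05, k=1):
    """One persistent-CD update for a binary RBM with weight matrix W.

    Rather than restarting the Markov chain at the data, v_chain is
    carried over between updates; k Gibbs sweeps approximate the model
    expectation. A generic PCD sketch, not the paper's algorithm.
    """
    h_data = sigmoid(v_data @ W)             # E[h | v] on the data
    for _ in range(k):                        # short, incomplete MC run
        h = (rng.random(h_data.shape) < sigmoid(v_chain @ W)).astype(float)
        v_chain = (rng.random(v_data.shape) < sigmoid(h @ W.T)).astype(float)
    h_chain = sigmoid(v_chain @ W)
    # stochastic gradient of the log-likelihood: data term minus model term
    grad = v_data.T @ h_data - v_chain.T @ h_chain
    W += lr * grad / len(v_data)
    return W, v_chain
```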

Optimal Inference in Crowdsourced Classification via Belief Propagation
Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers. We study the problem of recovering the true labels from possibly erroneous crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap by introducing a tighter lower bound on the fundamental limit and proving that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest possible in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. Experimental results suggest that BP is close to optimal for all regimes considered and improves upon competing state-of-the-art algorithms.
02/11/2016 ∙ by Jungseul Ok, et al.
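For intuition about inference under the Dawid-Skene model, here is a simplified EM-style weighted-majority sketch for binary labels. It is not the paper's belief propagation algorithm, only a minimal illustration of how reliability weights sharpen a plain majority vote.

```python
def weighted_majority(labels, n_iter=10):
    """Iteratively estimate binary task labels (+1/-1) and worker
    reliability weights under a one-coin Dawid-Skene model.

    labels: dict mapping (task, worker) -> +1 or -1.
    A simplified EM-style sketch; the paper instead runs belief
    propagation on the task-worker graph.
    """
    tasks = {t for t, _ in labels}
    workers = {w for _, w in labels}
    weight = {w: 1.0 for w in workers}
    est = {}
    for _ in range(n_iter):
        # task step: weighted majority vote
        for t in tasks:
            s = sum(weight[w] * a for (t2, w), a in labels.items() if t2 == t)
            est[t] = 1 if s >= 0 else -1
        # worker step: reliability from agreement with current estimates
        for w in workers:
            agree = [a == est[t] for (t, w2), a in labels.items() if w2 == w]
            p = sum(agree) / len(agree)
            weight[w] = 2 * p - 1     # negative weight flips an adversary's votes
    return est
```

Note that a consistently wrong worker ends up with a negative weight, so its votes are effectively inverted rather than discarded.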

Learning Data Dependency with Communication Cost
In this paper, we consider the problem of recovering a graph that represents the statistical data dependency among nodes from a set of data samples generated by the nodes; this graph provides the basic structure for performing an inference task such as MAP (maximum a posteriori) estimation. This problem is referred to as structure learning. When nodes are spatially separated at different locations, running an inference algorithm requires a non-negligible amount of message passing, incurring communication cost. Because the learnt edge structure of the data dependency graph and the physical connectivity graph are often highly different, there is an inevitable trade-off between the accuracy of structure learning and the cost we must pay to perform a given message-passing-based inference task. We formalize this trade-off as an optimization problem whose output is a data dependency graph that jointly considers learning accuracy and message-passing cost. We focus on distributed MAP as the target inference task and consider two implementations, ASYNC-MAP and SYNC-MAP, which have different message-passing mechanisms and thus different cost structures. For ASYNC-MAP, we propose an optimal polynomial-time learning algorithm, motivated by the problem of finding a maximum-weight spanning tree. For SYNC-MAP, we first prove that the problem is NP-hard and propose a greedy heuristic. For both implementations, we then quantify how the probability that the resulting data graphs differ from the ideal data graph decays as the number of data samples grows, using the large deviation principle, where the decay rate is characterized by topological properties of both the original data dependency and physical connectivity graphs as well as the degree of the trade-off. We validate our theoretical findings through extensive simulations, which confirm a good match.
04/29/2018 ∙ by Hyeryung Jang, et al.
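The maximum-weight spanning tree subroutine can be illustrated Chow-Liu style, with edge weights given by empirical pairwise mutual information. Note that the paper's ASYNC-MAP objective additionally accounts for message-passing cost, which this sketch omits.

```python
import numpy as np
from itertools import combinations

def mutual_info(x, y):
    """Empirical mutual information (in nats) between binary sample vectors."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def max_weight_tree(samples):
    """Kruskal's algorithm on pairwise mutual information (Chow-Liu style).

    samples: (n_samples, n_nodes) binary array. Returns the tree's edges.
    Illustrates only the spanning-tree subroutine, without the paper's
    communication-cost term.
    """
    n = samples.shape[1]
    edges = sorted(((mutual_info(samples[:, i], samples[:, j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))
    def find(u):                      # union-find with path compression
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for w, i, j in edges:             # add heaviest edges that avoid cycles
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On samples from a Markov chain X0 - X1 - X2, the recovered tree is the chain itself, since adjacent pairs carry the highest mutual information.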

On the Posted Pricing in Crowdsourcing: Power of Bonus
In practical crowdsourcing systems such as Amazon Mechanical Turk, posted pricing is widely used due to its simplicity: a task requester publishes a pricing rule a priori, workers decide whether to accept and perform the task, and they are often paid according to the quality of their effort. One key ingredient of good posted pricing is recruiting more high-quality workers with less budget, for which the following two schemes are considered: (i) personalized pricing, which profiles users in terms of their quality and cost, and (ii) an additional bonus payment offered for more qualified task completion. Despite their potential benefits, it has been underexplored how much gain each of personalization and bonus payment, or both, actually provides to the requester. In this paper, we study the four possible combinations of posted pricing with and without personalization and bonus. We aim to analytically quantify when and how much these two ideas contribute to the requester's utility. To this end, we first derive the optimal personalized and common pricing schemes and analyze their computational tractability. Next, we quantify the utility gap between pricing with and without bonus payment in both schemes. We analytically prove that the impact of bonus is marginal in personalized pricing, whereas it is crucial in common pricing. Finally, we study the notion of Price of Agnosticity, which quantifies the utility gap between personalized and common pricing policies; it implies that a complex personalized pricing scheme with privacy concerns can be replaced by a simple common pricing scheme with bonus. We validate our analytical findings through extensive simulations and real experiments on Amazon Mechanical Turk, and provide additional implications useful for designing a pricing policy.
04/09/2018 ∙ by Suho Shin, et al.
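To make the common-pricing problem concrete under a strong simplification (workers accept whenever the price exceeds their private cost, with no bonus and no quality dependence, unlike the paper's model), the requester's expected utility per worker is (value - p) * F(p), where F is the cost CDF; this can be maximized by grid search.

```python
import numpy as np

def optimal_common_price(value, cost_cdf, grid):
    """Grid-search the common posted price maximizing expected requester
    utility (value - p) * F(p), where F is the workers' cost CDF.

    A textbook posted-pricing sketch only; the paper's model
    additionally ties payment to answer quality via bonuses.
    """
    utilities = [(value - p) * cost_cdf(p) for p in grid]
    return grid[int(np.argmax(utilities))]
```

For costs uniform on [0, 1] and task value 1, the utility (1 - p) * p peaks at p = 0.5, the familiar midpoint price.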

Simulation-based Distributed Coordination Maximization over Networks
In various online/offline multi-agent networked environments, the system can often benefit from coordinating the actions of two interacting agents, at some cost of coordination. In this paper, we first formulate an optimization problem that captures the coordination gain net of the node activation cost over networks. This problem is challenging to solve in a distributed manner, since the target gain is a function of the long-term time fraction of the inter-coupled activations of two adjacent nodes, and thus standard Lagrange duality theory is hard to apply to obtain a distributed decomposition as in standard Network Utility Maximization. We propose three simulation-based distributed algorithms, each with different update rules, all of which require only one-hop message passing and locally observed information. The key to their distributedness is a stochastic approximation method that runs an incomplete Markov chain simulation over time, yet provably guarantees convergence to the optimal solution. Next, we provide a game-theoretic framework that interprets our proposed algorithms from a different perspective. We artificially select the payoff function so that the game's Nash equilibrium is asymptotically equal to the socially optimal point, i.e., there is no Price of Anarchy. We show that two stochastically approximated variants of standard game-learning dynamics coincide with two of the algorithms developed from the optimization perspective. Finally, we demonstrate our theoretical findings on convergence, optimality, and further features such as the trade-off between efficiency and convergence speed through extensive simulations.
09/13/2018 ∙ by Hyeryung Jang, et al.

QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the centralized training with decentralized execution (CTDE) regime popularized recently. VDN and QMIX are representative examples that factorize the joint action-value function into individual ones for decentralized execution. However, they address only a fraction of factorizable MARL tasks due to structural constraints in their factorization, such as additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes a new approach: transforming the original joint action-value function into an easily factorizable one with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than previous methods do. Our experiments on multi-domain Gaussian-squeeze and modified predator-prey tasks demonstrate QTRAN's superior performance, with especially large margins in games whose payoffs penalize non-cooperative behavior more aggressively.
05/14/2019 ∙ by Kyunghwan Son, et al.
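The additive structural constraint that QTRAN relaxes can be seen in a toy VDN-style factorization: when the joint Q is a sum of per-agent utilities, each agent's greedy local action also maximizes the joint Q (the IGM property). A hypothetical sketch, not QTRAN itself:

```python
import numpy as np

def vdn_joint_q(individual_qs):
    """Additive (VDN-style) factorization: the joint Q is the sum of
    per-agent utilities over every joint action.

    individual_qs: list of 1-D arrays, one per agent.
    Returns the full joint Q tensor. Illustrates the structural
    constraint that QTRAN is designed to relax.
    """
    joint = np.zeros([len(q) for q in individual_qs])
    for idx in np.ndindex(*joint.shape):
        joint[idx] = sum(q[a] for q, a in zip(individual_qs, idx))
    return joint

def greedy_joint_action(individual_qs):
    """Decentralized greedy actions; under additivity these also
    maximize the joint Q (the IGM property)."""
    return tuple(int(np.argmax(q)) for q in individual_qs)
```

Functions that are not additive (or monotone) in the per-agent utilities break this equivalence, which is exactly the class of tasks VDN and QMIX cannot represent.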

Learning to Schedule Communication in Multi-agent Reinforcement Learning
Many real-world reinforcement learning tasks require multiple agents to make sequential decisions while interacting with one another, and well-coordinated actions among the agents are crucial to achieving the target goal. One way to accelerate coordination is to enable the agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario in which (i) the communication bandwidth is limited and (ii) the agents share the communication medium, so that only a restricted number of agents can use the medium simultaneously, as in state-of-the-art wireless networking standards. This calls for a form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcast their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines in two different applications, namely cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap between SchedNet and other mechanisms, such as those without communication or with vanilla scheduling methods, e.g., round robin, ranging from 32
02/05/2019 ∙ by Daewoo Kim, et al.
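The weight-based scheduling idea can be sketched as a top-k selection over learned importance weights; function names here are illustrative, not SchedNet's actual interface.

```python
import numpy as np

def schedule_top_k(weights, k):
    """Select the k agents allowed to broadcast this step, by descending
    importance weight. In SchedNet these weights are learned from each
    agent's partial observation; here they are just given as input.
    """
    order = np.argsort(weights)[::-1]       # indices by descending weight
    mask = np.zeros(len(weights), dtype=bool)
    mask[order[:k]] = True
    return mask

def aggregate_messages(messages, mask):
    """Concatenate only the scheduled agents' encoded messages;
    unscheduled agents contribute nothing this step."""
    return np.concatenate([m for m, on in zip(messages, mask) if on])
```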
Yung Yi