TeleCrowd: A Crowdsourcing Approach to Create Informal to Formal Text Corpora

by Vahid Masoumi, et al.

Crowdsourcing has recently been widely used as an alternative to traditional annotation, which is costly and usually done by experts. However, crowdsourcing tasks are not interesting in themselves, so combining them with game elements can increase both the motivation and the engagement of participants. In this paper, we propose a gamified crowdsourcing platform called TeleCrowd, built on Telegram Messenger so that its social power can serve as a base and facilitator for carrying out crowdsourcing projects. To evaluate the platform, we ran an experimental crowdsourcing project consisting of 500 informal Persian sentences, in which participants were asked either to provide candidates that were the formal equivalents of the sentences or to rate other candidates by upvoting or downvoting them. In this study, participants submitted 2700 candidates and 21000 votes, and a parallel dataset was built by taking the candidates with the highest points (the sum of their upvotes and downvotes) as the best candidates. In the evaluation, a BLEU score of 0.54 was achieved on the collected dataset, which shows that the proposed platform can be used to create large corpora. The platform is also highly efficient in time and cost compared with related work: the whole project took 28 days and cost 40 dollars.
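The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an upvote counts as +1 and a downvote as -1, that ties go to the candidate seen first, and that votes arrive as simple tuples. The function name `select_best_candidates` and the English placeholder sentences are hypothetical.

```python
from collections import defaultdict


def select_best_candidates(votes):
    """Pick, for each informal sentence, the candidate with the highest points.

    `votes` is an iterable of (sentence_id, candidate_text, vote) tuples.
    Assumption: vote is +1 for an upvote and -1 for a downvote, so a
    candidate's points are the sum of its votes.
    """
    points = defaultdict(int)
    for sentence_id, candidate, vote in votes:
        points[(sentence_id, candidate)] += vote

    best = {}  # sentence_id -> (candidate, points)
    for (sentence_id, candidate), score in points.items():
        # Strict comparison: on a tie, the candidate seen first is kept.
        if sentence_id not in best or score > best[sentence_id][1]:
            best[sentence_id] = (candidate, score)

    return {sid: cand for sid, (cand, _) in best.items()}


# Placeholder data (not taken from the TeleCrowd dataset).
example_votes = [
    (1, "I am going to go home.", +1),
    (1, "I am going to go home.", +1),
    (1, "I shall go to home.", +1),
    (1, "I shall go to home.", -1),
    (2, "What are you doing?", +1),
]

parallel = select_best_candidates(example_votes)
for sentence_id, formal in sorted(parallel.items()):
    print(sentence_id, formal)
```

Pairing each informal sentence with its selected candidate gives one row of the parallel dataset.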






