Many Non-Profit Organizations (NPOs) such as the United Nations Children’s Fund (UNICEF) provide humanitarian assistance in developing countries. The largest challenges for these organizations are unreachable zones where the real-time war situation or disaster level is extremely difficult to determine. Because local infrastructure is insufficient, disasters can only be monitored from the sky. Satellite sensors are widely deployed to report images of the monitored areas.
Automatic disaster monitoring has not yet achieved satisfactory results, while costly manual methods cannot satisfy real-time requirements. Therefore, the fields of human computation and crowdsourcing are investigating methods to harvest crowd wisdom. Game-With-A-Purpose (GWAP) is one representative approach, which converts time- and energy-consuming image processing problems into games in which players are motivated to contribute. Inspired by this approach, we present a novel large-scale crowdsourcing disaster monitoring system. The system analyzes tagged satellite pictures from players and then calculates the disaster level automatically. An algorithm based on directed graph centralities is presented to address the core issues of malicious player detection and disaster level calculation. Our method can be applied to other human computation systems in general. As justification, the mathematical correctness of the system is proved. Finally, we also discuss some limitations and relevant solutions for future work.
II Related Works
A human computation system is a paradigm for utilizing human processing power to solve problems that neither computers nor humans can solve independently [26, 34]. Most human computation systems can be seen as a form of crowdsourcing, which relies on the wisdom of crowds. Surowiecki identified four critical properties of the wisdom of crowds: diversity of opinion, independence, decentralization and aggregation. Oinas-Kukkonen further derived the theoretical foundation of the wisdom of crowds based on network analysis. For instance, PageRank was first proposed by Larry Page and applied to social network analysis. It is commonly used for expressing the stability of physical systems and the relative importance, the so-called centralities, of the nodes of a network. PageRank fulfills the four conditions of the wisdom of crowds mentioned above.
The fundamental theory for this paper is Game-With-A-Purpose (GWAP), which brings game theory into human computation systems [31, 33]. It outsources steps within a computational process to humans in an entertaining way, namely gamification, and has recently been considered a means of addressing large-scale data labeling costs in machine learning research [7, 8, 1]. Nevertheless, data collection mechanisms vary from game to game and must be designed with care. In long-term research, ESP and ARTigo have verified through years of operation that human inputs are valuable and meaningful, and that the two most important challenges in GWAP systems are game incentivization and malicious player detection.
Unfortunately, these representative GWAP-based human computation systems have the following issues: (1) They require two online players competing with each other, which may harm playability and causes trouble when players are lacking. (2) They only use the most frequently occurring tags, which cannot prevent large numbers of malicious players from attacking the system with meaningless tags. Moreover, manually managing the tag database is not feasible due to the high cost of human labor and the inevitable issue of system cold start. To deal with the lack of players, our system turns the multiplayer game into a game between new players and existing reliable players. Furthermore, we propose a malicious player detection algorithm based on directed graph centralities that requires only a single reliable player, which avoids the issue of cold start.
III Design and Models
In this section, we describe the overall design and the proposed models in detail. First, we present the system architecture and specify the most critical components: the player task generator (PTG), the player rating model (PRM) and the disaster evaluation model (DEM). With these components, the disaster monitoring system can handle common issues in human computation systems, such as system cold start and malicious player detection. It is also expandable and portable, and can be applied to other similar human computation systems.
III-A System Architecture and Functionalities
Figure 1 illustrates the architecture of our disaster monitoring system. The system uses two different databases. The player database (PlayerDB) stores gaming data, including each player’s properties and raw tagging inputs. The other database, ResultDB, persists the reliable players’ inputs that have been rated by our rating service. The overall data flow can be described as follows:
Player task generation: The PTG mixes the reliable gaming results from ResultDB with newly reported satellite images, and then assigns them to future players.
Malicious player detection: To be considered reliable, a player must pass the malicious player detection algorithm (see Algorithm 1) embedded in the PRM. The system then marks all the results from this player as reliable and sends them to the ranking service.
Disaster level evaluation: The system feeds the reliable players’ inputs into the DEM embedded in the ranking service, calculates the disaster level of the monitored region, and then persists it in ResultDB.
After these three major steps, a disaster level report can be retrieved from ResultDB.
In our game, a player can play an unlimited number of rounds, and each round contains image tagging tasks. In one task, the player is asked to tag images (see Figure 1(a)). The player draws a rectangle to select an area where a sign of danger or damage (such as fire or explosion) is discovered. System-suggested tags then pop up, and the player can select relevant ones by clicking on them (see Figure 1(b)). The player can also input new tags. The system analyzes the player input and creates a disaster level report (see Figure 1(c)) for this region, which can be used by NGOs and governments.
To describe and establish our models, we first introduce a few basic definitions in this subsection.
The region of interest (ROI) is an indicator that represents a player-selected two-dimensional region. The -th ROI from player in image at image creation time is denoted by .
Considering image implies its creation time (an image always contains its creation time), for convenience, is simplified as . For instance, Figure 3 shows some examples of ROIs in different images.
As we discussed in Section III, each tag can only be selected once, and players are allowed to input new tags for the selected ROIs.
Then, we define the ROI tag vector for the model:
Assuming the database stores different tags , , …, for a certain image , the tag vector of (the -th ROI in image of player ) is a vector that is denoted by the following formula:
where is the -th tag where , is the count of in a player task object, and equals to the number of tags.
For instance, for a certain image , 5 different tags , , , , were input by our game player. Assuming player selects the first ROI and inputs tags for : , , , , and player selects the first ROI and inputs tags for : . Then tag vector of is and tag vector of is .
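The tag vector construction can be illustrated with a minimal sketch. It assumes that, because each tag can be selected at most once per ROI, the count of a tag is either 0 or 1; the tag names and the helper `tag_vector` are hypothetical and only serve as an illustration.

```python
def tag_vector(known_tags, selected_tags):
    """Return a 0/1 vector over the tags stored for one image.

    Position k is 1 if the player attached the k-th known tag to the ROI,
    and 0 otherwise (assumption: a tag is selected at most once per ROI).
    """
    selected = set(selected_tags)
    return [1 if tag in selected else 0 for tag in known_tags]


# Hypothetical example: five tags are stored for the image.
known = ["fire", "smoke", "explosion", "flood", "rubble"]
vector_a = tag_vector(known, ["fire", "smoke", "explosion", "flood"])
vector_b = tag_vector(known, ["rubble"])
```

Both vectors have the same dimension (the number of tags stored for the image), which is what allows them to be compared later.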
III-C Player Task Generator
The PTG creates task images by combining images from the satellite and from ResultDB. A player task contains different images in random order, in which images are untagged new satellite images and the other images are tagged images from ResultDB. The PTG thus performs two generating steps:
The PTG splits a monitoring region into small image pieces and assigns a unique identifier to each piece (the reason is discussed in Section IV-B).
The PTG retrieves tagged images from ResultDB, then combines these two types of images to create a task for a new upcoming player.
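The two generating steps can be sketched as follows. This is only an illustration under stated assumptions: the tile size, the identifier scheme (`r{row}c{col}`), the dictionary layout and the function names are all hypothetical, and the numbers of new and tagged images per task are left as parameters.

```python
import random


def split_region(width, height, tile):
    """Step 1: cut a region into tile-sized pieces, each with a unique id."""
    pieces = []
    for row, top in enumerate(range(0, height, tile)):
        for col, left in enumerate(range(0, width, tile)):
            box = (left, top, min(left + tile, width), min(top + tile, height))
            # "tags" is None because new satellite pieces are untagged.
            pieces.append({"id": f"r{row}c{col}", "box": box, "tags": None})
    return pieces


def generate_task(new_pieces, tagged_images, n_new, n_tagged, rng=random):
    """Step 2: mix untagged pieces with tagged images in random order."""
    task = rng.sample(new_pieces, n_new) + rng.sample(tagged_images, n_tagged)
    rng.shuffle(task)
    return task
```

Shuffling the combined list means the player cannot tell from the position of an image whether it is a new piece or an already tagged control image.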
III-D Player Rating Model
The PRM is responsible for detecting malicious players. We adopt the basic idea of network centrality and use an eigenvector-based centrality score as the trust value of each player to identify malicious players among all players.
The model is established from an image-dependent perspective. For a certain image , consider a directed player rating graph (PRG) between the players who tagged the image . Each player is a node of the PRG, as illustrated in Figure 4.
Assuming the database stores different tags , , …, . The system weight vector , , …, of all tags can be calculated by the following Equation 2:
where is the count of in the system.
Assuming different tags , , …, were tagged in a certain image , the image weight vector is a vector for image that is composed by part of the system weight vector, which is denoted by , , , with , and .
For instance, the system has 2 different images. The first image is tagged by two players. One is , , and another is , ; The second image is tagged by three players, their results are: , , ; , , ; , , . Thus, the system currently has 5 different tags , , , , . Each tag has corresponding counts: ; Therefore the system weight vector is , , , , ; the image weight vector of the first image is , , since the first image only is tagged by , , , and the image weight vector of the second image is the same as the system weight vector since the second image is tagged by all exist tags.
holds the properties: a) , b) , and c) .
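Since Equation 2 is not reproduced here, the following is only a sketch under a loudly stated assumption: each system weight is taken to be the tag's share of all tag occurrences in the system, so that the weights are positive and sum to one. The image weight vector is then the part of the system weight vector restricted to the tags that occur in that image. The function names and tag names are hypothetical.

```python
from collections import Counter


def system_weights(all_tag_lists):
    """Assumed weighting: count of a tag divided by the total tag count."""
    counts = Counter(tag for tags in all_tag_lists for tag in tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


def image_weights(weights, image_tags):
    """Restrict the system weight vector to the tags seen in one image."""
    return {tag: weights[tag] for tag in image_tags}


# Hypothetical inputs: every inner list is one player's tags for one image.
weights = system_weights([["fire", "smoke"], ["fire", "rubble"]])
first_image = image_weights(weights, ["fire", "smoke"])
```

Under this assumption, frequently used tags receive larger weights, and a tag that was never used does not appear in the weight vector at all.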
The players ROI matching ratio (PRMR) is an importance measure that gives the ratio of the intersection surface of two different ROIs from player to the ROI surface from player in a certain image , which is denoted by the following formula:
where is the -th selected ROI from player , and is the surface area of .
The following inequality holds:
The players input tag correlation (PITC) is an importance measure that gives the ratio of the covariance of two different tag vectors from player to the covariance of from player with itself under the image weight vector , which is denoted by the following formula:
where is the weighted covariance between and , which is denoted by:
The definitions of PRMR and PITC share the same intent: measuring the asymmetric importance between player and player (namely how thinks of ).
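The definition of PRMR is inspired by the Jaccard index, also known as intersection over union (IoU) [28, 13], which is statistically used to compare the similarity and diversity of sample sets. Unlike IoU, we divide by the surface area of a single ROI only, which guarantees the asymmetry required for directed graph weights.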
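The difference between the matching ratio and IoU can be made concrete with a small sketch. It assumes axis-aligned rectangular ROIs given as `(x, y, width, height)` and, as an assumption about the direction of the ratio, divides by the area of the first ROI; the function names are hypothetical.

```python
def area(roi):
    """Surface area of an ROI given as (x, y, width, height)."""
    _, _, width, height = roi
    return width * height


def intersection_area(a, b):
    """Area of the overlap of two axis-aligned rectangles (0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_h = min(ay + ah, by + bh) - max(ay, by)
    return max(overlap_w, 0) * max(overlap_h, 0)


def matching_ratio(roi_i, roi_j):
    """Asymmetric ratio: overlap divided by the area of roi_i only."""
    return intersection_area(roi_i, roi_j) / area(roi_i)


def iou(roi_i, roi_j):
    """Symmetric Jaccard index, shown only for comparison."""
    inter = intersection_area(roi_i, roi_j)
    return inter / (area(roi_i) + area(roi_j) - inter)


large = (0, 0, 4, 4)
small = (0, 0, 2, 2)  # lies entirely inside `large`
```

Here `matching_ratio(large, small)` is 0.25 while `matching_ratio(small, large)` is 1.0, whereas IoU gives 0.25 in both directions. Because the overlap can never exceed the area of either rectangle, the ratio always lies between 0 and 1.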
The definition of PITC is inspired by the weighted Pearson correlation coefficient, which is a measure of the linear correlation between two variables. In our case, with the same intent as for PRMR, we drop the covariance of player from the denominator to guarantee the asymmetry required for directed graph weights.
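A minimal sketch of such an asymmetric, weighted correlation is given below. It assumes the usual weighted covariance (weighted mean of the products of deviations from the weighted means) and, as an assumption about which term is kept, divides by the weighted variance of the first vector only. How a constant tag vector (zero variance) is handled is not specified here, so the sketch simply raises an error; all names are hypothetical.

```python
def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_cov(x, y, w):
    """Weighted covariance of x and y under the weight vector w."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sum(w)


def tag_correlation(v_i, v_j, w):
    """Asymmetric ratio: cov(v_i, v_j) over cov(v_i, v_i) only."""
    var_i = weighted_cov(v_i, v_i, w)
    if var_i == 0:
        raise ValueError("constant tag vector: ratio is undefined in this sketch")
    return weighted_cov(v_i, v_j, w) / var_i


# Hypothetical tag vectors over four tags with uniform weights.
w = [0.25, 0.25, 0.25, 0.25]
v_i = [1, 0, 0, 0]
v_j = [1, 1, 0, 0]
```

With these inputs the ratio from `v_i` to `v_j` is about 0.667 while the ratio from `v_j` to `v_i` is 0.5, which illustrates the asymmetry; a vector compared with itself gives 1.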
Neither PRMR nor PITC is a distance metric, due to as well as .
The following inequality holds:
We now have the tools needed to define the edge weight of the PRG.
For a certain image , the edge weight of the PRG between player and is given by formula 9:
with player selected ROIs, player selected ROIs.
By the Perron-Frobenius theorem, our goal reduces to a calculation on the adjacency matrix of the PRG. Consequently, one can use the normalized adjacency matrix given by formula 10:
where is the image indicator.
Theorem 1 (Soundness).
The normalized adjacency matrix of the PRG of a certain image is irreducible, real, non-negative, and column-stochastic, with positive diagonal elements.
From the proof (in Appendix -F) of the positive diagonal elements property, one can observe that the number “2” is a translation that guarantees lies in the closed interval , which helps us prove this theorem.
According to the Perron-Frobenius theorem and Theorem 1, one can infer that there exists a unique eigenvector, , of (the Perron vector), whose unique eigenvalue is the spectral radius of (the Perron root), such that:
Therefore, we define the trust value of a player as follows:
The trust value of player on image is a score that equals the -th component of the Perron vector of the normalized PRG adjacency matrix .
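For intuition, the Perron vector of a small column-stochastic matrix can be approximated by power iteration. The sketch below is not the computation prescribed in this paper; the matrix entries are hypothetical, and scaling the vector so that its components sum to one is an assumption made for readability.

```python
def perron_vector(matrix, iterations=1000, tol=1e-12):
    """Approximate the Perron vector of a square non-negative matrix.

    `matrix[r][c]` is the entry in row r and column c. The result is
    scaled so that its components sum to 1 (an assumption of this sketch).
    """
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


# Hypothetical 2x2 column-stochastic matrix (each column sums to 1).
A = [[0.9, 0.5],
     [0.1, 0.5]]
trust = perron_vector(A)
```

For this matrix the iteration converges to approximately (5/6, 1/6): the first player receives most of the rating weight and therefore the higher trust value.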
This definition represents the rating score from player to player for a certain image , which is the same as the centrality of player . With the trust values of the players, we propose our classification algorithm:
The criterion for classifying a new player is that its trust value should not be less than the mean trust value of all players on the image, which means that the tagging performance of the new player should not be worse than that of former players. The acceptance threshold is a customizable parameter that can be set beforehand. For instance, if , the new player only needs to pass one single image of all tagged images; if (half of the images in the task), the new player has to pass all tagged images, which makes the system unbreakable if the system is initialized by a trusted group.
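The criterion can be sketched as a simple counting rule. The sketch assumes that the trust values of all players on each tagged image are already available, that the mean is taken over all players on the image including the new one, and that the acceptance threshold is expressed as a number of images; these choices and the function names are assumptions of this illustration.

```python
def passes_image(trust_values, new_index):
    """True if the new player's trust is at least the mean on this image."""
    mean = sum(trust_values) / len(trust_values)
    return trust_values[new_index] >= mean


def is_reliable(per_image_trust, new_index, threshold):
    """Accept the new player if enough tagged images are passed."""
    passed = sum(1 for tv in per_image_trust if passes_image(tv, new_index))
    return passed >= threshold


# Hypothetical trust values on three tagged images; index 1 is the new player.
per_image = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
```

In this example the new player passes two of the three images, so the player is accepted with a threshold of 1 or 2 and rejected with a threshold of 3.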
Note that a new player sometimes brings new tags into the system. This influences the tag vector calculation and makes the weight not computable, because the tag vectors of the new player and of the old player have unequal dimensions. We handle this issue as follows:
If a new player does not provide any new tag: directly perform the calculation with the algorithm;
If a new player provides new tags only: directly drop them because they are unreliable;
If a player provides both selected and new tags: a) perform the calculation with the algorithm without the new tags; b) merge and update all weight vectors via formula 9 if the player is reliable; c) otherwise drop the result and mark it as unreliable.
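The three cases above can be sketched as a single decision function. The callback `rate_player` stands in for the rating algorithm and is purely hypothetical, as are the function name and the tag names; the sketch only illustrates the control flow, not the weight update itself.

```python
def resolve_tags(player_tags, known_tags, rate_player):
    """Return the tags to keep, or None if the submission is dropped.

    `rate_player(selected)` is a hypothetical callback that runs the
    rating algorithm on already-known tags and returns True if reliable.
    """
    selected = [t for t in player_tags if t in known_tags]
    new = [t for t in player_tags if t not in known_tags]
    if not selected:
        return None  # new tags only: dropped as unreliable
    if not rate_player(selected):
        return None  # rated without the new tags and found unreliable
    return selected + new  # reliable: new tags are merged into the system


known = {"fire", "smoke"}
```

Only the tags that already exist in the system take part in the rating, so the tag vectors of the new player and the existing players keep equal dimensions during the calculation.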
III-E Disaster Evaluation Model
A monitoring region is composed of images . Each image contains a number of ROIs with , and each ROI is tagged with tags . The disaster level of a monitoring region is calculated as follows:
where is the surface area of a ROI, means the accumulated surface area of all ROIs that are tagged by , and is the surface area of image .
The disaster level is defined as a weighted area coverage. The is a surface ratio of the ROI over the monitoring area, and the is the corresponding weight of the ratio.
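As the formula itself is not reproduced here, the following is only one plausible reading of "weighted area coverage", stated as an assumption: for every tag, the accumulated area of the ROIs carrying that tag is multiplied by the tag weight, and the sum is divided by the total surface area of the region. The data layout, tag weights and function name are hypothetical.

```python
def disaster_level(images, weights):
    """Assumed reading: sum of weight * tagged ROI area over region area."""
    total_area = sum(image["area"] for image in images)
    covered = 0.0
    for image in images:
        for roi in image["rois"]:
            for tag in roi["tags"]:
                # Tags without a known weight contribute nothing.
                covered += weights.get(tag, 0.0) * roi["area"]
    return covered / total_area


# Hypothetical region of two images, one of which contains a tagged ROI.
region = [
    {"area": 100, "rois": [{"area": 20, "tags": ["fire", "smoke"]}]},
    {"area": 100, "rois": []},
]
level = disaster_level(region, {"fire": 0.5, "smoke": 0.25})
```

In this example the weighted covered area is 0.5 * 20 + 0.25 * 20 = 15 out of a region area of 200, which gives a level of 0.075.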
Theorem 2 (Denseness).
The disaster level is dense in the interval .
III-F Model Initialization
Due to the lack of users at the very beginning, cold start is a common problem in such human computation systems. This issue is normally solved by hiring people to create data manually. In this system, we only have to consider one initialization issue related to cold start.
The issue appears in the PTG. To initialize the whole system, we need to appoint an initial trusted group for the PTG, who shall produce enough initial trusted results, and provide a fixed predefined tag list (containing all of the most important keywords that need to be monitored); the PTG then assigns the tagged images to new upcoming players. Once a new player is included in the trusted group, all the relevant results from this player are considered reliable. The trusted group and the available dataset grow with the gradually growing number of reliable players and their reliable tags, as shown in Figure 5.
Thus, the only remaining issue is the minimum size of the initial trusted group. Our PRM is based on graph centrality calculation, which means we need a matrix of dimension at least two to perform the overall model calculation. Hence, counting the new player, the minimum size of the initial trusted group is 1. The initial trusted group (one person) and the new player then form a two-dimensional adjacency matrix that makes the model computable. For larger initial trusted groups, the trust value can simply be initialized to with being the size of the initial trusted group.
We have described the system architecture and three core models for task generation, malicious player detection and disaster level evaluation. Malicious player detection is essentially a classification problem in which our system determines the reliability of a new player based on the trust value. In this section, we discuss some issues for future work.
IV-A Simulated Evaluation
A typical performance metric for classification models is the receiver operating characteristic (ROC) curve, which plots the true positive rate (on the y-axis) against the false positive rate; the ideal area under the ROC curve is 1. Before we test the system with real users, one can generate a reasonable random dataset to test the performance of our classification model (PRM).
Our player has two different types of inputs: the ROI and its tag vector. For a reasonable player data entity, one has to define the ROI selection and its corresponding tag vector. To generate reasonable ROIs that simulate real user behavior, we first discuss target click behavior on a desktop. It has been modeled and shown that the distribution of click positions around a certain point follows a Gaussian distribution. Thus, from a frequentist view, the actual ROI(s) certainly exist. No matter where the user starts, according to Fitts' law and FFitts law, the starting click point should follow a normal distribution around the actual point, as shown in Figure 6. Similarly, the end point of the ROI selection should also follow a normal distribution.
Therefore, to generate ROI(s), let be the starting point of a player ROI and , be the height and width of this ROI; we then generate noise for the ROI starting points and landing points: where . For the parameter , one can use a maximum likelihood estimator to perform the inference over all manually selected ROI samples from the initial trusted group.
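A minimal sketch of this generation step is shown below. It assumes independent zero-mean Gaussian noise with a shared standard deviation on each coordinate of the starting and landing points, and it estimates that standard deviation with the maximum likelihood estimator for a zero-mean Gaussian (the root of the mean squared offset). The function names, the coordinate convention `(x, y, width, height)` and the independence assumption are choices of this illustration.

```python
import math
import random


def noisy_roi(x, y, width, height, sigma, rng):
    """Perturb the start and landing corners of an ROI with Gaussian noise."""
    x0 = x + rng.gauss(0, sigma)
    y0 = y + rng.gauss(0, sigma)
    x1 = x + width + rng.gauss(0, sigma)
    y1 = y + height + rng.gauss(0, sigma)
    # Sort the corners so that width and height stay non-negative.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return (left, top, right - left, bottom - top)


def estimate_sigma(offsets):
    """MLE of the standard deviation of a zero-mean Gaussian."""
    return math.sqrt(sum(d * d for d in offsets) / len(offsets))


rng = random.Random(42)
sample = noisy_roi(10, 20, 30, 40, sigma=2.0, rng=rng)
```

The offsets passed to `estimate_sigma` would be the differences between the corners selected by trusted players and the reference corners; with a noise level of zero the generated ROI equals the reference ROI.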
Generating the tag vector for a certain image is simpler than generating the ROI. A random pick from the tags of the initial trusted group is sufficient for the simulation, because these tags are trusted results and a partially random selection already introduces noise.
Eventually, one can apply this random dataset to evaluate the area under the ROC curve as an indication of the overall performance (the model shows good performance if the area approaches 1).
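The area under the ROC curve can be computed without plotting the curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below uses this rank-based formulation; treating the trust value as the score and "reliable" as the positive label is an assumption of this illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds truthy values for positive (reliable) samples and
    falsy values for negative (malicious) samples.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the area")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A perfect separation of reliable and malicious players gives an area of 1, while scores that carry no information about the label give an area close to 0.5.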
IV-B Data Leakage and Information Loss
To prevent leakage of data to malicious players, we intentionally cut the original satellite images into small segments. However, this method may cause information loss if some important ROIs are located at the intersection of two dividing lines. A possible solution is to consider a “half shifting” cut, as shown in Figure 7.
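One way to read the half shifting cut is to add a second tiling grid that is offset by half a tile in both directions, so that every intersection of dividing lines of the first grid falls inside a tile of the second grid. The sketch below follows this reading, which is an assumption; for brevity it only produces full tiles and ignores partial tiles at the image border.

```python
def tile_boxes(width, height, tile, offset=0):
    """Full tiles (left, top, right, bottom) of a grid starting at `offset`."""
    boxes = []
    for top in range(offset, height - tile + 1, tile):
        for left in range(offset, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


def half_shifted_cut(width, height, tile):
    """Regular grid plus a second grid shifted by half a tile."""
    regular = tile_boxes(width, height, tile)
    shifted = tile_boxes(width, height, tile, offset=tile // 2)
    return regular + shifted


boxes = half_shifted_cut(4, 4, 2)
```

For a 4 by 4 image with tiles of size 2, the regular grid has four tiles whose dividing lines cross at (2, 2); the shifted grid adds the tile (1, 1, 3, 3), which contains that crossing point in its interior.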
Our PRG network is image dependent, which means that each calculated disaster level may become invalid once the region image is outdated. We assume the satellite takes pictures of the monitoring area at intervals. However, our model only calculates the disaster level at a single moment, so the disaster level needs to be re-evaluated when a new image is generated. If none of the new images gets evaluated, then the disaster level will not be updated. The disaster level of a certain region over time is essentially a non-stationary process, so time series prediction methods can be applied to the disaster level time series.
Since most parts of the earth are lakes, forests, deserts and so on, players may encounter several consecutive rounds in which no ROI is available. This will decrease the playability and enjoyment of the game. A possible solution is to pre-filter these images from the image database.
In this paper, we explored a GWAP-based disaster monitoring system. We first proposed a player rating model based on eigenvector centrality to calculate the trust value of a player, and then proposed an algorithm for malicious player detection. As justification, we proved the mathematical correctness of this model. We then calculated the regional disaster level in the disaster evaluation model. We also dealt with the general problem of system cold start by introducing an initial trusted group. Our system design can also be applied to other similar human computation systems. Furthermore, we discussed theoretical evaluation criteria for this system, and then proposed corresponding solutions for the issues of data leakage, information loss and game playability.
The authors would like to thank Prof. François Bry and Prof. Andreas Butz for their valuable input; we also thank our colleague Yingding Wang for his inspiration on system design, algorithm rationalization and system evaluation. Finally, we thank Huimin An for his inspiration on the Bayesian perspective, which helped us handle human inputs with new tags successfully.
-  (2018) Large-scale information extraction using rules, machine learning, and human computation. The University of Wisconsin-Madison. Cited by: §II.
-  (2013) FFitts law: modeling finger touch with fitts’ law. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1363–1372. Cited by: §IV-A.
-  (2001) Eigenvector-like measures of centrality for asymmetric relations. Social networks 23 (3), pp. 191–201. Cited by: §II.
-  (2013) Time series: theory and methods. Springer Science & Business Media. Cited by: §IV-C.
-  Flexible, high performance convolutional neural networks for image classification. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, Vol. 22, pp. 1237. Cited by: §III-E.
-  (2012) Multi-column deep neural networks for image classification. CoRR abs/1202.2745. External Links: Cited by: §III-E.
-  (2018) Quality control in crowdsourcing: a survey of quality attributes, assessment techniques, and assurance actions. ACM Computing Surveys (CSUR) 51 (1), pp. 7. Cited by: §II.
-  (2018) Introduction to artificial intelligence. Springer. Cited by: §II.
-  (1912) Über Matrizen aus nicht negativen Elementen.
-  (1963) Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). The Annals of mathematical statistics 34 (1), pp. 152–177. Cited by: §IV-A.
-  (2016)(Website) Note: https://www.theguardian.com/world/2016/dec/23/i-couldnt-take-anything-except-dignity-people-aleppo-syria-on-fleeing-city[Online; accessed 31-July-2017] Cited by: Fig. 2.
-  (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve.. Radiology 143 (1), pp. 29–36. Cited by: §IV-A.
-  (1901) Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat 37, pp. 547–579. Cited by: Remark 6.
-  (2008) A game-theoretic analysis of games with a purpose. In Proceedings of the 4th International Workshop on Internet and Network Economics, WINE ’08, Berlin, Heidelberg, pp. 342–350. External Links: Cited by: §II.
-  (1990) Maximum likelihood estimation and inference on cointegration—with applications to the demand for money. Oxford Bulletin of Economics and statistics 52 (2), pp. 169–210. Cited by: §IV-A.
-  (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1097–1105. Cited by: §III-E.
-  (2015) Learning theory and algorithms for forecasting non-stationary time series. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 541–549. Cited by: §IV-C.
-  (2002-10-01) Discretization: An Enabling Technique. Data Mining and Knowledge Discovery 6 (4), pp. 393–423. External Links: Cited by: Remark 2.
-  (1992) Fitts’ law as a research and design tool in human-computer interaction. Human-computer interaction 7 (1), pp. 91–139. Cited by: §IV-A.
-  (2018) Game theory and control. Annual Review of Control, Robotics, and Autonomous Systems 1, pp. 105–134.
-  (2006) The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. Business Economics 41 (4), pp. 63–65. Cited by: §II.
-  (2008) Network analysis and crowds of people as sources of new organisational knowledge. Knowledge Management: Theoretical Foundation, pp. 173–189. Cited by: §II.
-  (1999) The PageRank citation ranking: Bringing order to the web. Technical report Stanford InfoLab. Cited by: §II.
-  (1895) Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London 58, pp. 240–242. Cited by: §-D, Remark 7.
-  (1907) Zur theorie der matrices. Mathematische Annalen 64 (2), pp. 248–263.
-  (2009) A taxonomy of distributed human computation. Human-Computer Interaction Lab Tech Report, University of Maryland. Cited by: §II.
-  (2011) Human computation: a survey and taxonomy of a growing field. In Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1403–1412. Cited by: §II.
-  (1996) The probabilistic basis of Jaccard’s index of similarity. Systematic biology 45 (3), pp. 380–385. Cited by: Remark 6.
-  (2017)(Website) Note: https://www.unicef.org/appeals/syrianrefugees_sitreps.html[Online; accessed 31-July-2017] Cited by: §I.
-  (1994) The state of the world’s children. 1998. Unicef. Cited by: §I.
-  (2004) Labeling images with a computer game. In Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 319–326. Cited by: §II.
-  (2008) Designing games with a purpose. Communications of the ACM 51 (8), pp. 58–67. Cited by: §II.
-  (2006) Games with a purpose. Computer 39 (6), pp. 92–94. Cited by: §II.
-  (2016)(Website) Note: http://www.cs.cmu.edu/ biglou/[Online; accessed 31-July-2017] Cited by: §II.
-  (2013) ARTigo: building an artwork search engine with games and higher-order latent semantic analysis. In First AAAI Conference on Human Computation and Crowdsourcing, Cited by: §II.
-  (2012) Foundations of predictive analytics. CRC Press. Cited by: Remark 2.
-  (2002) Flood disaster monitoring and evaluation in China. Global Environmental Change Part B: Environmental Hazards 4 (2), pp. 33–43. Cited by: §I.