Automatic detection of passable roads after floods in remote sensed and social media data

This paper addresses the problem of flood classification and flood aftermath detection using both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge. The tasks are (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images from two complementary sources of information. For the first challenge, we mainly rely on object and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object and scene-level features are then combined using early, late and double fusion techniques. To identify whether or not it is possible for a vehicle to pass a road in satellite images, we rely on Convolutional Neural Networks and a transfer learning-based classification approach. The evaluation of the proposed methods is carried out on the large-scale datasets provided for the benchmark competition. The results demonstrate significant improvement in performance over recent state-of-the-art approaches.




1 Introduction

Natural disasters, such as floods, earthquakes and storms, may cause significant damage to both human life and infrastructure. In such adverse events, instant access to relevant information can support rescue operations and ultimately help mitigate the damage [2, 4]. With an idea of the scope of the damage inflicted by a disaster, government and non-government organizations can allocate their resources to the affected areas accordingly. In such situations, especially in flood events, information about the passability of roads, i.e., whether it is possible to travel through the affected areas, is a crucial element for emergency response and for the deployment of resources for rescue operations.

In this respect, social media has emerged as an important source of information and has proved very effective in emergency situations where news agencies could not provide information at all or in time [17, 54]. For instance, in [48], several situations have been reported where news agencies with conventional sources of information failed to report in a timely fashion simply due to the insufficient number of reporters covering the world. On the other hand, due to its wide geographical coverage and high spatial and multi-spectral resolution, satellite imagery has been widely used for the analysis of natural disasters and their impact on the environment [29, 35].

The joint use of social media and remotely sensed information has already been investigated in the literature for the analysis of natural disasters and their potential impact on the environment [2, 13, 28, 39]. To encourage research in this area, as well as to compare the performance of different software solutions, a task at the MediaEval benchmarking initiative has been run with particular attention to flood detection for two consecutive years, i.e., MediaEval 2017 [14] and MediaEval 2018 [15]. In MediaEval 2017, the challenge aimed at flood detection in images from social media and satellites. In MediaEval 2018, the challenge targeted the analysis of social media and satellite imagery for the detection of passable roads in flood-affected regions. The 2018 challenge is composed of two parts:

  1. FCSM: Flood Classification for Social Multimedia. This task is further divided into two sub-tasks aiming to predict:

    1. whether or not there is evidence of a flood in a given social media image and,

    2. if evidence of a flood exists in the image, whether it is possible to pass through the flooded road (passability).

  2. FDSI: Flood Detection in Satellite Imagery. This task aims to analyze the roads from satellite images, and predict whether or not it is possible for a vehicle to pass a road.

This paper addresses the MediaEval 2018 challenge. For FCSM (1), we rely on an ensemble of several deep models using three different fusion techniques. For FDSI (2), we rely on a Convolutional Neural Network (CNN) architecture and a transfer learning-based classification approach to identify passable roads in satellite imagery. The main contributions of the work can be summarized as:

  • We analyze flood-related images from social media and satellites to identify passable roads. In detail, we aim (a) to analyze whether the images provide evidence for road passability and (b) to differentiate between images showing passable vs. non-passable roads.

  • On the FCSM task, which is an important ingredient of several multimedia analysis application frameworks, we analyze the performance of four models from three well-known architectures, pre-trained on object and places datasets, both individually and in different combinations using three different fusion techniques, namely early, late and double fusion. We believe such a rigorous analysis will set a benchmark for future research on the topic.

  • For the analysis of satellite imagery in the FDSI task, we propose a CNN- and a transfer learning-based classification approach to identify passable roads in satellite imagery.

  • We have performed experiments on a challenging benchmark dataset provided for the benchmark competition, and show that better scores are achieved compared to the recent literature.

The rest of the paper is organized as follows. Related work is discussed in Section 2, followed by a detailed presentation of our proposed solution (Section 3). We show experimentally how our approach performs (Section 4), and we conclude and provide potential future research directions in Section 5.

2 Related Work

The literature on natural disasters detection and analysis can be roughly divided into two parts, namely (i) disaster detection in social media, and (ii) disaster detection in satellite imagery. In the next subsections, we provide a detailed review of the relevant literature in both domains.

2.1 Disasters detection in Social Media Images

In recent years, several applications have emerged that make use of data posted on social media platforms in combination with other available media streams (e.g., Google Street View imagery, OpenStreetMap maps), allowing 3D reconstruction of cities [46] and automatic discovery and geo-tagging of objects [34].

Geo-located and time-stamped data available in the form of text and visual content on social media have also been widely utilized for disaster event analysis to gather useful information to be used in rescue and rehabilitation [2]. To this aim, most of the approaches rely on two types of complementary information: visual content and the additional information associated with images in the form of metadata, such as user tags, geo-location and temporal information. For instance, in [53], users’ tags and other useful information from metadata are jointly utilized with visual features in an early fusion scheme. In [13], visual features extracted through deep models, pre-trained on ImageNet [21], are complemented by textual information, such as users’ tags, geo-location and temporal information along with the textual description. Both textual and visual features are evaluated individually and jointly by concatenating feature vectors.

Existing pre-trained models are also used in [3], where five different models from four state-of-the-art deep architectures, namely AlexNet [33], GoogleNet [49], VggNet [45] and ResNet [26], pre-trained on the large-scale ImageNet and Places datasets [57], were used. The basic insight of the paper was to combine object and scene-level features for the flood classification task. Individual Support Vector Machines (SVMs) are then trained on the features extracted through each model, followed by a fusion phase where three different late fusion techniques are used to combine the scores obtained through the individual classifiers along with a Random Forest classifier trained on textual features. Object and scene-level features are also used in [7, 10, 6] for the classification of flooded and non-flooded images in social media.

Tkachenko et al. [52] rely on hand-crafted visual features, such as the colour and edge directivity descriptor (CEDD) [18], color layout [30] and Gabor wavelets [12]. A more sophisticated solution has been proposed for textual information (i.e., description, title and users’ tags) relying on word embeddings trained on the entire YFCC100m dataset [50]. Each textual feature is extracted separately, and the features are then concatenated to form a single feature vector. Moreover, a machine translation technique has been employed to translate users’ tags into English. In [25], handcrafted visual features are concatenated into a single feature vector followed by dimensionality reduction and classification phases. Term Frequency-Inverse Document Frequency (TF-IDF) [44] weights are computed over users’ tags to represent textual features. In [19], handcrafted visual features along with textual information are used for the classification of flood-related images.

An active learning framework intended to collect, filter and analyze social media content for natural disasters has been proposed in [8]. For data collection, a publicly available system, namely AIDR [27], has been used to crawl social media platforms, followed by a crowd-sourcing activity for data annotation. A pre-trained model [45] is then fine-tuned on the annotated images for classification purposes.

2.2 Disaster events detection in Satellite Imagery

Being one of the most valuable sources of information for disaster analysis [2, 5, 32], a growing portion of research also aims at the detection and classification of natural disaster events in satellite imagery. Liu et al. [36] proposed a deep architecture along with a wavelet transformation-based pre-processing scheme for the identification of disaster affected areas in satellite imagery. Amit et al. [9] propose a CNN-based deep architecture composed of five weighted layers for landslides and flood detection in satellite imagery.

Benjamin et al. [13] approach flood detection in satellite imagery as an image segmentation task, for which a CNN-based framework with three different training strategies has been adopted. In order to remove a location bias caused by local changes in the images due to lighting conditions and other atmospheric distortions, the individual components of the provided satellite imagery, i.e., RGB and IR, are normalized before training the model. In [39], the authors exploit the diversity of different CNNs, which are mainly based on dilated convolution [55] and de-convolution [11], in a fusion framework. Initially, binary maps obtained with the individual models are concatenated and then used to train SVMs in order to analyze which individual model performs better and when. Finally, SVMs are trained on features/maps obtained with the combination of the best models to predict the final binary maps of the test images.

Ahmad et al. [3] tackle flood detection in satellite imagery as a generative problem and propose a framework based on Generative Adversarial Networks (GANs). The framework mainly relies on a GAN architecture, namely V-GAN [47], originally developed for retinal vessel segmentation. In order to adapt the architecture to the flood detection task, the top layer of the generative network is extended with a threshold mechanism to generate a binary segmentation mask of the flooded regions in satellite imagery. In another work from the same authors [2], the input layer is modified to support 4-channel input images (i.e., RGB and IR) and several experiments are conducted to evaluate the performance of the RGB and IR components individually and jointly. In [52], different indices, namely the Land Water Index (LWI), the Normalised Difference Vegetation Index (NDVI) and the Normalised Difference Water Index (NDWI), are selected from the spectral images. Subsequently, two different strategies based on supervised classification and unsupervised clustering techniques are adopted for the identification of flooded regions in satellite imagery. On the other hand, Avgrinak et al. [10] rely on the Mahalanobis distance [20] and some morphological operations for the task.

2.3 MediaEval 2018 challenge on Multimedia Satellite Task: Emergency Response for Flooding Events

The problem posed in the MediaEval 2018 challenge differs from the existing state-of-the-art, and focuses on a different perspective: detecting road passability in flood-affected areas from social media and satellite imagery. In this section, we briefly discuss the other approaches proposed for the benchmark competition.

Similar to the challenge posed in MediaEval 2017 [14], two tasks have been included in the MediaEval 2018 challenge. The majority of the solutions proposed for the challenge rely on deep architectures. For instance, Feng et al. [23] use several deep models pre-trained on the ImageNet dataset along with textual features extracted through fastText [16] for the FCSM task. In [37], the performance of several deep models is evaluated in a framework with a double-ended classifier and a compact loss function, treating the two sub-tasks as a one-class classification problem by individually training the models on the provided dataset. Moreover, data augmentation techniques are used to increase the number of training samples. On the other hand, textual features are represented through embeddings initialized with GloVe [40]. Another deep architecture-based method is presented in [31], where an existing deep model pre-trained on ImageNet is fine-tuned on the provided dataset. Moreover, the Bag of visual Words (BoW) model over the textual information is also used for the FCSM task. For the FDSI task, image patches of a fixed size in pixels are extracted around each of the two given points. Visual features are then extracted through RGB histograms with 16 bins per channel, followed by training and SVM-based classification of the test patches.

In [38], a framework containing two separate deep architectures (VggNet) for the evidence and passability sub-tasks has been proposed for the FCSM task. Each image is passed through both networks, which aim to predict evidence for road passability and to differentiate between passable and non-passable images, respectively. Moreover, stacked auto-encoders are used for the early fusion of textual and visual features. For the FDSI task, a ResNet model pre-trained on ImageNet is fine-tuned on the satellite imagery provided for the challenge. The authors in [56] utilize scene-level features extracted through a deep model pre-trained on the Places dataset for the evidence of road passability. For the second sub-task, differentiating between passable and non-passable roads, both object and scene-level features are extracted through deep models. Moreover, the presence of a boat is also used as an indication of non-passable roads. Hanif et al. [24] adopted an ensemble framework to jointly utilize local and global visual features for the FCSM task. Global features are extracted through several feature descriptors, while a CNN-based local feature descriptor is used for the extraction of local features. On the other hand, textual information is represented through term frequency-inverse document frequency (TF-IDF). Subsequently, an ensemble framework is used to jointly utilize these features for the FCSM task.

3 Proposed Solutions

In this section, we present our proposed solutions for both tasks. First, we describe the methodology proposed for the FCSM task, and then we provide the details of the methodology adopted for the FDSI task.

3.1 Methodology for FCSM Task

The first task is to analyze images from social media providing direct evidence for the passability of roads through conventional means, i.e., without needing boats or big vehicles such as trucks. The task can be divided into two sub-tasks, namely (i) identification of images providing evidence for road passability and (ii) differentiating between passable and non-passable images among the ones identified in the first step as providing evidence for road passability. Both steps are carried out sequentially. In Figure 1, we provide the flowchart of the first challenge showing the two sub-tasks of the FCSM task.

Figure 1: Flow chart of the first task showing the two sub-tasks of the FCSM task.

Figure 2 shows the block diagram of the methodology we adopted for the FCSM task. The proposed method is mainly composed of three phases, namely (i) feature extraction, (ii) classification, and (iii) fusion. For feature extraction, we rely on four different models, pre-trained on the ImageNet and Places datasets, from three state-of-the-art deep architectures, namely AlexNet [33], VggNet [45] and ResNet [26]. For classification we adopt an SVM. The basic motivation for the feature extraction through these deep models pre-trained on object and places datasets comes from our previous experience [6, 1], where object and scene-level features showed better performance when jointly utilized. In the final phase, we use three different fusion schemes to combine the capabilities of the four models in the FCSM task. In the next subsections, we provide a detailed description of each of the components of the methodology.

Figure 2: Block diagram of the proposed methodology of the three different techniques used for the FCSM task.

3.1.1 Feature Extraction and classification

In this phase, we rely on four different deep models for feature extraction. Two of the models (ResNet and VggNet) are pre-trained on ImageNet [21], and the other two (AlexNet and VggNet) are pre-trained on the Places dataset [57].

Features are extracted from the last fully connected layer of each model (i.e., Fc-7 for AlexNet and VggNet, Fc-1000 for ResNet). For VggNet, we use the architecture with 19 layers, and for ResNet the configuration with 50 layers. We use the models as feature descriptors without any re-training or fine-tuning. For feature extraction with all models, we used the Caffe toolbox. After feature extraction, SVMs are trained on the features extracted through each individual model. For classification, we use the default parameters of the Fit-Multiclass model from the MathWorks toolbox.

3.1.2 Fusion

To jointly utilize the capabilities of the individual models in the FCSM task, we rely on three different fusion techniques: early, late and double fusion. In the early fusion, we concatenate the features extracted through the different models. For the late fusion, in the current implementation, we simply average the results obtained through the individual models. For the third fusion technique, we combine the results obtained from the first two techniques in an additional late fusion step by averaging their scores.
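
The three fusion schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature vectors and the toy scores are hypothetical, and the score labelled as coming from an early-fusion classifier is simply assumed rather than produced by a trained SVM.

```python
from statistics import mean


def early_fusion(feature_sets):
    """Concatenate the per-model feature vectors of each sample."""
    n_samples = len(feature_sets[0])
    return [
        [value for features in feature_sets for value in features[i]]
        for i in range(n_samples)
    ]


def late_fusion(score_sets):
    """Average the per-sample scores of the individual classifiers."""
    return [mean(scores) for scores in zip(*score_sets)]


def double_fusion(early_scores, late_scores):
    """Average the early-fusion score with the late-fusion score."""
    return [(e + l) / 2 for e, l in zip(early_scores, late_scores)]


# Toy data (hypothetical): two models, two images, 2-D features per model.
features_a = [[0.1, 0.2], [0.3, 0.4]]
features_b = [[0.5, 0.6], [0.7, 0.8]]
fused_features = early_fusion([features_a, features_b])

# Hypothetical positive-class scores from two per-model classifiers.
scores_a = [0.8, 0.2]
scores_b = [0.6, 0.4]
late_scores = late_fusion([scores_a, scores_b])

# Assumed scores of a classifier trained on the concatenated features.
early_scores = [0.9, 0.1]
final_scores = double_fusion(early_scores, late_scores)
```

In this sketch the double fusion step only needs the two score lists, so any classifier that outputs a per-image score could take the place of the assumed values.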

3.2 Methodology for FDSI Task

For the FDSI task, we opted for a CNN and a transfer learning-based classification approach, previously validated in a different application domain in our work [41]. In fact, we initially tried to apply the well-performing GAN approach introduced in our previous works for flood detection in satellite imagery [2] and for medical imagery [42, 43]. We conducted an exhaustive set of experiments, but unfortunately could not achieve a road passability detection performance better than random label assignment. The reason for that is the limited size of the dataset (only 1,437 samples were provided in the development set). This, in combination with the large variety of landscapes, road types, types of obstacles, weather conditions, etc., prevents the GAN-based approach from training adequately and finding the key visual features required to reliably distinguish between flooded and non-flooded roads.

Figure 3 provides the block diagram of the methodology proposed for the FDSI task. This approach is based on the Inception v3 architecture [49] pre-trained on the ImageNet dataset [21] and the retraining method described in [22].

In this work, we froze all the basic convolutional layers of the network and only retrained the top fully connected layer with softmax activation after random initialization of its weights. The new fully connected layer was retrained using the RMSprop [51] optimizer, which allows an adaptive learning rate during the training process.

As the input for the CNN model, we used image patches extracted from the full images using the provided coordinates of the target road end points. Visual inspection of the generated road patches from the training dataset showed relatively good coverage of the road-related areas and sufficient coverage of the neighbourhood areas, giving enough visual information for the subsequent CNN-based analysis and classification.
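
As an illustration of patch extraction from two road end points, the following sketch crops the bounding box of the two points, enlarged by a margin and clamped to the image borders. The text does not specify the patch geometry, so the bounding-box strategy, the function names and the `margin` value are all assumptions made for this example.

```python
def crop_box(p1, p2, image_size, margin=32):
    """Return (left, top, right, bottom) covering both end points plus a margin.

    p1, p2: (x, y) pixel coordinates of the road end points.
    image_size: (width, height) of the full image.
    margin: hypothetical padding in pixels around the end points.
    """
    width, height = image_size
    left = max(min(p1[0], p2[0]) - margin, 0)
    top = max(min(p1[1], p2[1]) - margin, 0)
    right = min(max(p1[0], p2[0]) + margin, width)
    bottom = min(max(p1[1], p2[1]) + margin, height)
    return left, top, right, bottom


def crop(image, box):
    """Crop an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 128x128 single-channel image and two hypothetical end points.
image = [[0] * 128 for _ in range(128)]
box = crop_box((10, 20), (100, 60), image_size=(128, 128))
patch = crop(image, box)
```

A margin of this kind keeps some of the neighbourhood of the road in the patch, in line with the coverage observation above.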

Moreover, in order to increase the number of training samples, we also performed various augmentation operations on the images. Specifically, we performed horizontal and vertical flipping, and brightness changes within a bounded interval.
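
A minimal sketch of these augmentation operations is given below. It assumes single-channel images stored as lists of rows with 8-bit pixel values, and the brightness range `max_delta` is a hypothetical placeholder, since the interval actually used is not stated here.

```python
import random


def flip_horizontal(image):
    """Mirror each row of the image (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top-bottom flip)."""
    return [row[:] for row in image[::-1]]


def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clamped to the 8-bit range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng, max_delta=0.2):
    """Apply random flips and a random brightness change.

    max_delta is an assumed value, not the interval used in the paper.
    """
    out = image
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = flip_vertical(out)
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return adjust_brightness(out, factor)


sample = [[10, 20], [30, 40]]
augmented = augment(sample, random.Random(0))
```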

After the model has been retrained, we use it as a classifier that outputs a probability value for each of the two classes: passable and non-passable. The final passability decision is made by selecting the class with the higher probability. In case of equal class probabilities, we mark the road patch as non-passable.
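
The decision rule, including the tie-breaking towards the non-passable class, can be written in a few lines. The function name and the string labels are chosen for this example only.

```python
def classify_patch(p_passable, p_non_passable):
    """Pick the class with the higher probability; ties go to non-passable."""
    return "passable" if p_passable > p_non_passable else "non-passable"


labels = [
    classify_patch(0.6, 0.4),
    classify_patch(0.3, 0.7),
    classify_patch(0.5, 0.5),  # tie, marked as non-passable
]
```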

Figure 3: Block diagram of the proposed satellite image processing methodology used for the FDSI task.

4 Evaluation and Results

4.1 Dataset

The dataset for evaluation is provided by the MediaEval 2018 benchmark competition on the Multimedia Satellite task. In the challenge, two different collections containing images and associated metadata from social media and satellite imagery have been provided for the FCSM and FDSI tasks, respectively.

For the FCSM task, participants were provided with a collection of tweets along with associated images from three hurricanes, namely Harvey, Maria, and Irma, which occurred in 2017. The dataset was downloaded from Twitter during the events using keywords such as flooding and floods. The dataset also contains additional information, such as rainfall and climate predictions. The development and test sets are provided separately. The development set is composed of 7,387 tweets and associated images, while the test set contains a total of 3,683 tweets/images. The ground truth is provided in two separate files, one for each of the sub-tasks (i.e., evidence and passability). Moreover, the participants are also provided with a set of handcrafted visual features.

Satellite image patches of flooded areas from the three disaster events have been provided for the FDSI task. The FDSI dataset covers image patches from DigitalGlobe taken by the WorldView-3 satellite (0.3 m resolution). The dataset is provided in two files: one contains the cropped satellite images of flooded areas, while the other provides the ground-truth labels for road passability given two points on each satellite image. The dataset is split into development and test sets, containing 1,438 and 225 image patches, respectively.

4.2 Experimental results of the FCSM task

This section provides a detailed description of the conducted experiments, the results achieved, and their description and comparisons against the state-of-the-art on the FCSM task.

Table 1 provides the results of our first experiment, where we evaluate the performance of the individual deep models in terms of overall accuracy on both sub-tasks, namely (i) identification of images providing evidence for road passability, and (ii) differentiation between images showing passable vs. non-passable roads. The evaluation is carried out on the development set, allocating 60% and 40% of the images for training and testing, respectively. Overall, better results are obtained on the first sub-task, i.e., identification of images providing evidence for road passability. The performance for differentiating between passable and non-passable roads is generally lower, demonstrating a significantly higher complexity of that task. Although there is no significant difference in the performance of the models in identifying the images providing evidence of road passability, slightly better results are obtained overall with VggNet pre-trained on the Places dataset. Moreover, in differentiating between passable and non-passable roads, the models pre-trained on the Places dataset outperform the ones pre-trained on the object dataset, showing the importance of scene-level features in the task.

| Models | Evidence sub-task (accuracy in %) | Passability sub-task (accuracy in %) |
|---|---|---|
| AlexNet (Places) | 86.19 | 71.00 |
| VggNet (Places) | 86.79 | 71.85 |
| VggNet (ImageNet) | 86.79 | 69.85 |
| ResNet (ImageNet) | 85.05 | 69.28 |

Table 1: Evaluation of the individual models in terms of accuracy on the validation set.

In order to analyze the performance of the models on the individual classes of the dataset, we also provide the results of the models in terms of per-class accuracy in Table 2. In both sub-tasks, the accuracy on the negative samples (i.e., images with no evidence and non-passable images) is generally high. Significant variations in the performance of the individual models can be observed on positive samples (i.e., images providing evidence for passability and images of passable roads). These variations provide the basis for our second experiment, where we use three different fusion techniques to combine the capabilities of the individual models for a potential improvement in performance.

| Models | Evidence sub-task: Evidence class (accuracy in %) | Evidence sub-task: No-evidence class (accuracy in %) | Passability sub-task: Passable class (accuracy in %) | Passability sub-task: Non-passable class (accuracy in %) |
|---|---|---|---|---|
| AlexNet (Places) | 78.11 | 90.80 | 57.33 | 81.25 |
| VggNet (Places) | 80.00 | 90.71 | 57.66 | 82.50 |
| VggNet (ImageNet) | 79.88 | 90.77 | 55.66 | 80.50 |
| ResNet (ImageNet) | 77.41 | 89.49 | 52.00 | 82.25 |

Table 2: Evaluation of the individual models in terms of per-class accuracy on the validation set.

Table 3 provides the experimental results of our fusion techniques on both sub-tasks in terms of accuracy on the validation set. As can be seen, fusion significantly improves on the performance of the individual models. The double fusion method, which combines the results of both early and late fusion, secures a significant improvement of 6.07 and 3.71 percentage points over the best single fusion method on the evidence and passability sub-tasks, respectively.

| Methods | Evidence sub-task (accuracy in %) | Passability sub-task (accuracy in %) |
|---|---|---|
| Early Fusion | 88.81 | 77.00 |
| Late Fusion | 90.36 | 76.00 |
| Double Fusion | 96.43 | 80.71 |

Table 3: Evaluation results of the fusion experiment in terms of accuracy on the validation set.

We also provide comparisons against state-of-the-art in terms of mean F1 Score, used as an official evaluation metric in the benchmark competition. Since, on one side, we are interested in the evidence class in sub-task 1 and on the other side in the passable and non-passable classes in sub-task 2, the mean F1 score is computed as follows:

\[ \mathrm{Mean\_F1} = \frac{F1(\text{evidence AND passable}) + F1(\text{evidence AND non\_passable})}{2} \]
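
The metric above can be sketched in code as follows. This is an illustrative computation under an assumed label encoding (three string labels per image), not the official evaluation script of the benchmark.

```python
def f1_score(y_true, y_pred, positive):
    """F1 score of one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(y_true, y_pred):
    """Average the F1 scores of the two evidence classes."""
    classes = ("evidence_passable", "evidence_non_passable")
    return sum(f1_score(y_true, y_pred, c) for c in classes) / 2


# Hypothetical labels for six images.
EP, ENP, NE = "evidence_passable", "evidence_non_passable", "no_evidence"
y_true = [EP, EP, ENP, ENP, NE, NE]
y_pred = [EP, ENP, ENP, ENP, NE, EP]
score = mean_f1(y_true, y_pred)
```

In this toy example the passable class has an F1 of 0.5 and the non-passable class an F1 of 0.8, so the mean is 0.65. Images without evidence only influence the metric through false positives on the two evidence classes.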


Table 4 compares our proposed methods against the other methods submitted to the benchmark competition. Teams could submit up to five runs: visual information only in Run 1, textual information only in Run 2, a combination of textual and visual features in Run 3, and two runs (4 and 5) without any restriction on the modality of the information. Our method relies on visual information only. Our first run is based on late fusion, while in the fourth run we concatenate the features extracted through the individual models (early fusion). Our final run is based on double fusion. The conducted experiments show that the proposed solution is competitive with the available state-of-the-art. Our best run, with double fusion, improves significantly over most of the methods and achieves results comparable to the method proposed by Anastasia et al. [38], which shows the value of combining multiple deep models for classification.

| Methods | Visual | Textual | Multi-modal | Run 4 (visual) | Run 5 (visual) |
|---|---|---|---|---|---|
| Feng et al. [31] | 64.35 | 32.81 | 59.49 | 52.16 | 51.59 |
| Armin et al. [31] | 20.00 | 24.00 | - | 17.00 | 35.00 |
| Anastasia et al. [38] | 66.65 | 30.17 | 66.43 | 55.12 | 54.48 |
| Zhao et al. [56] | 63.88 | 12.86 | - | 63.13 | 63.89 |
| Hanif et al. [24] | 45.04 | 31.15 | 45.56 | - | - |
| Our method | 63.58 (late fusion) | - | - | 60.59 (early fusion) | 65.03 (double fusion) |

Table 4: Comparisons (F1 measure) against other methods from the benchmarking competition on the FCSM dataset. In the competition, up to 5 runs were allowed.

4.3 Experimental results of the FDSI Task

For the experimental setup of the FDSI task, we decided to perform only the two mandatory runs, which rely on the task-provided training data only. Due to the limited number of training samples available, training the deep architecture from scratch is not possible. Thus, we decided to perform two types of training for our transfer learning-based detection approach.

First, we implemented a classification pipeline that differs from common procedures: all the training samples are used as both the training and the validation set. Usually, for classification tasks, this would result in over-fitting of the model and an inability to correctly classify the test samples. However, for this specific task, the limited number of training epochs and the significant training data augmentation, in conjunction with the high variety of road patch samples, resulted in a normal training process. This allowed us to correctly retrain the last layer of the network and to produce a reasonable classifier even on such a limited training set.

The official F1-Score metric (see Table 5) on the non-passable road class for the first, "All-train", run is 62.30%. To verify our idea of using all the training data for both training and validation, we also performed a normal network training with a random 50/50 training/validation split of the development data. This second, "Half-train", run resulted in an F1-Score of 61.02%, which is slightly lower than that of the All-train run. This supports our idea of using the complete training dataset and heavy data augmentation to improve the classification performance on road patches.

Run Method F1 Score
1 All-train 62.30%
2 Half-train 61.02%
Table 5: Evaluation of our proposed approach for the FDSI task in terms of F1 Scores.
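For reference, the F1 score restricted to one class (here the non-passable class) can be computed as in the sketch below. The label encoding (1 for non-passable, 0 for passable) and the toy predictions are assumptions made for illustration; they are not data from the benchmark.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed encoding): 1 = non-passable road, 0 = passable road.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

score = f1_for_class(y_true, y_pred, positive=1)
```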

We also compare our method for the FDSI task against the state of the art in terms of F1 score in Table 6. Both of our runs outperform the best competing score of 57.00 [31], by 5.30 and 4.02 percentage points for Run 1 and Run 2, respectively.

Methods Performance (F1 Measure)
Run 1 Run 2 Run 3 Run 4 Run 5
Armin et al. [31] 57.00 32.00 39.00 56.00 57.00
Anastasia et al. [38] 56.45 - - - -
Our method 62.30 61.02 - - -
Table 6: Comparison against other state-of-the-art methods from the benchmarking competition on the FDSI dataset. Up to five runs were allowed in the competition.
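The reported margins can be reproduced from the values in Table 6 by comparing each of our runs with the single best competing score across all runs. The short sketch below does this arithmetic; the dictionary layout and variable names are illustrative choices only.

```python
# F1 scores (%) copied from Table 6; missing runs are simply omitted.
competing = {
    "Armin et al. [31]": [57.00, 32.00, 39.00, 56.00, 57.00],
    "Anastasia et al. [38]": [56.45],
}
ours = {"Run 1": 62.30, "Run 2": 61.02}

# Best score achieved by any competing method in any run.
best_competing = max(max(scores) for scores in competing.values())

# Margin of each of our runs over that best score, in percentage points.
margins = {run: round(score - best_competing, 2) for run, score in ours.items()}
```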

5 Conclusions and Future Work

In this paper, we addressed the challenging problem of detecting the passability of roads. In the social media image analysis, we mainly relied on deep features extracted through different pre-trained deep models, used individually and jointly through fusion. We observed that models pre-trained on the Places dataset achieve better results than those pre-trained on an object dataset, which shows the importance of scene-level information for the task. We also observed that object-level information complements the scene-level features well when the two are used jointly. Among the fusion methods, double fusion combines the capabilities of both early and late fusion and ultimately leads to better results. Given the improvement obtained with fusion techniques, in future work we aim to use optimization methods to assign more specific weights to the individual models.
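To make the idea of model-specific weights concrete, the sketch below shows one possible form of weighted late fusion, with double fusion modelled as adding the early-fusion classifier's score as one more voter. This is an assumption-laden illustration, not the implementation used in the paper: the function names, the scores, and the weights are all hypothetical, and the weights are set by hand rather than by an optimization method.

```python
def late_fusion(scores, weights=None):
    """Weighted average of per-model scores for a single image."""
    if weights is None:
        weights = [1.0] * len(scores)  # equal weights by default
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def double_fusion(early_score, model_scores, weights=None):
    """Assumed form of double fusion: the early-fusion classifier's score
    is fused with the individual model scores as an additional voter."""
    return late_fusion([early_score] + list(model_scores), weights)


# Hypothetical passability probabilities from three individual models
# and from one classifier trained on the concatenated (early-fused) features.
model_scores = [0.80, 0.60, 0.40]
early_score = 0.70

equal = late_fusion(model_scores)
weighted = late_fusion(model_scores, weights=[2.0, 1.0, 1.0])
combined = double_fusion(early_score, model_scores)
```

Raising the weight of the first model pulls the fused score toward that model's prediction, which is the effect an optimized weighting scheme would exploit.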

In the satellite sub-task, we found that a standard image segmentation approach alone is of no help, so we implemented a task-oriented CNN and transfer learning-based approach. This approach was able to classify image patches containing roads and achieved an F1-Score of 62.30% for the non-passable road class. In the future, we plan to implement advanced road network and flooding detection and segmentation using a combined CNN- and GAN-based approach pre-trained on existing annotated road network and flooded area datasets.


This research is partly supported by the ADAPT Centre for Digital Content Technology, which is funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.


  • [1] K. Ahmad, M. L. Mekhalfi, N. Conci, F. Melgani, and F. D. Natale. Ensemble of deep models for event recognition. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(2):51, 2018.
  • [2] K. Ahmad, K. Pogorelov, M. Riegler, N. Conci, and P. Halvorsen. Social media and satellites. Multimedia Tools and Applications, pages 1–39, 2018.
  • [3] K. Ahmad, K. Pogorelov, M. Riegler, N. Conci, and H. Pal. Cnn and gan based satellite and social media data fusion for disaster detection. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, 2017.
  • [4] K. Ahmad, M. Riegler, K. Pogorelov, N. Conci, P. Halvorsen, and F. De Natale. Jord: a system for collecting information and monitoring natural disasters by linking social media with satellite imagery. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, page 12. ACM, 2017.
  • [5] K. Ahmad, M. Riegler, A. Riaz, N. Conci, D.-T. Dang-Nguyen, and P. Halvorsen. The jord system: Linking sky and social multimedia data to natural disasters. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, pages 461–465. ACM, 2017.
  • [6] K. Ahmad, A. Sohail, N. Conci, and F. De Natale. A comparative study of global and deep features for the analysis of user-generated natural disaster related images. In 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), pages 1–5. IEEE, 2018.
  • [7] S. Ahmad, K. Ahmad, N. Ahmad, and N. Conci. Convolutional neural networks for disaster images retrieval. In Proceedings of the MediaEval 2017 Workshop (Sept. 13–15, 2017). Dublin, Ireland, 2017.
  • [8] F. Alam, F. Ofli, and M. Imran. Processing social media images by combining human and machine computing during crises. International Journal of Human–Computer Interaction, 34(4):311–327, 2018.
  • [9] S. N. K. B. Amit, S. Shiraishi, T. Inoshita, and Y. Aoki. Analysis of satellite images for disaster detection. In Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International, pages 5189–5192. IEEE, 2016.
  • [10] K. Avgerinakis, A. Moumtzidou, S. Andreadis, E. Michail, I. Gialampoukidis, S. Vrochidis, and I. Kompatsiaris. Visual and textual analysis of social media and satellite images for flood detection@ multimedia satellite task mediaeval 2017. In Proceedings of the Working Notes Proceeding MediaEval Workshop, Dublin, Ireland, pages 13–15, 2017.
  • [11] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (12):2481–2495, 2017.
  • [12] Y. Bai, L. Guo, L. Jin, and Q. Huang. A novel feature extraction method using pyramid histogram of orientation gradients for smile recognition. In Image Processing (ICIP), 2009 16th IEEE International Conference on, pages 3305–3308. IEEE, 2009.
  • [13] B. Bischke, P. Bhardwaj, A. Gautam, P. Helber, D. Borth, and A. Dengel. Detection of flooding events in social multimedia and satellite imagery using deep neural networks. In Proceedings of the Working Notes Proceeding MediaEval Workshop, Dublin, Ireland, pages 13–15, 2017.
  • [14] B. Bischke, P. Helber, C. Schulze, S. Venkat, A. Dengel, and D. Borth. The multimedia satellite task at mediaeval 2017: Emergence response for flooding events. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland, 2017.
  • [15] B. Bischke, P. Helber, Z. Zhao, J. de Bruijn, and D. Borth. The multimedia satellite task at mediaeval 2018: Emergency response for flooding events. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [16] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146, 2017.
  • [17] T. Brouwer, D. Eilander, A. Van Loenen, M. J. Booij, K. M. Wijnberg, J. S. Verkade, and J. Wagemaker. Probabilistic flood extent estimates from social media flood observations. Natural Hazards & Earth System Sciences, 17(5), 2017.
  • [18] S. A. Chatzichristofis and Y. S. Boutalis. Cedd: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In International Conference on Computer Vision Systems, pages 312–322. Springer, 2008.
  • [19] M. S. Dao, Q. N. M. Pham, D. Nguyen, and D. Tien. A domain-based late-fusion for disaster image retrieval from social media.
  • [20] R. De Maesschalck, D. Jouan-Rimbaud, and D. L. Massart. The mahalanobis distance. Chemometrics and intelligent laboratory systems, 50(1):1–18, 2000.
  • [21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
  • [22] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In Proc. of ICML, volume 32, pages 647–655, 2014.
  • [23] Y. Feng, S. Shebotnov, C. Brenner, and M. Sester. Ensembled convolutional neural network models for retrieving flood relevant tweets. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [24] M. Hanif, M. Tahir, and M. Rafi. Detection of passable roads using ensemble of global and local features. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [25] M. Hanif, M. A. Tahir, M. Khan, and M. Rafi. Flood detection using social media data and spectral regression based kernel discriminant analysis. In Proc. of the MediaEval 2017 Workshop (Sept. 13-15, 2017). Dublin, Ireland, 2017.
  • [26] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [27] M. Imran, C. Castillo, J. Lucas, P. Meier, and S. Vieweg. Aidr: Artificial intelligence for disaster response. In Proceedings of the 23rd International Conference on World Wide Web, pages 159–162. ACM, 2014.
  • [28] M. Jing, B. Scotney, S. Coleman, et al. Flood event image recognition via social media image and text analysis. In Signals and Systems Conference (ISSC), pages 4–9, 2016.
  • [29] J. Kansas, J. Vargas, H. G. Skatter, B. Balicki, and K. McCullum. Using landsat imagery to backcast fire and post-fire residuals in the boreal shield of saskatchewan: implications for woodland caribou management. International Journal of Wildland Fire, 25(5):597–607, 2016.
  • [30] E. Kasutani and A. Yamada. The mpeg-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval. In Image Processing, 2001. Proceedings. 2001 International Conference on, volume 1, pages 674–677. IEEE, 2001.
  • [31] A. Kirchknopf, D. Slijepcevic, M. Zeppelzauer, and M. Seidl. Detection of road passability from social media and satellite images. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [32] V. Klemas. Remote sensing of floods and flood-prone areas: an overview. Journal of Coastal Research, 31(4):1005–1013, 2014.
  • [33] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • [34] V. Krylov, E. Kenny, and R. Dahyot. Automatic discovery and geotagging of objects from street view imagery. Remote Sensing, 10(5):661, 2018.
  • [35] R. Lagerstrom, Y. Arzhaeva, P. Szul, O. Obst, R. Power, B. Robinson, and T. Bednarz. Image classification to support emergency situation awareness. Frontiers in Robotics and AI, 3:54, 2016.
  • [36] Y. Liu and L. Wu. Geological disaster recognition on optical remote sensing images using deep learning. Procedia Computer Science, 91:566–575, 2016.
  • [37] L. Lopez-Fuentes, A. Farasin, H. Skinnemoen, and P. Garza. Deep learning models for passability detection in flooded roads. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [38] A. Moumtzidou, P. Giannakeris, S. Andreadis, A. Mavropoulos, G. Meditskos, I. Gialampoukidis, K. Avgerinakis, and I. Kompatsiaris. A multimodal approach in estimating road passability through a flooded area using social media and satellite images. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [39] K. Nogueira, S. G. Fadel, Í. C. Dourado, R. d. O. Werneck, J. A. Muñoz, O. A. Penatti, R. T. Calumby, L. T. Li, J. A. dos Santos, and R. d. S. Torres. Exploiting convnet diversity for flooding identification. IEEE Geoscience and Remote Sensing Letters, 15(9):1446–1450, 2018.
  • [40] J. Pennington, R. Socher, and C. Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
  • [41] K. Pogorelov, S. L. Eskeland, T. de Lange, C. Griwodz, K. R. Randel, H. K. Stensland, D.-T. Dang-Nguyen, C. Spampinato, D. Johansen, M. Riegler, et al. A holistic multimedia system for gastrointestinal tract disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference, pages 112–123. ACM, 2017.
  • [42] K. Pogorelov, O. Ostroukhova, M. Jeppsson, H. Espeland, C. Griwodz, T. de Lange, D. Johansen, M. Riegler, and P. Halvorsen. Deep learning and hand-crafted feature based approaches for polyp detection in medical videos. In 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pages 381–386. IEEE, 2018.
  • [43] K. Pogorelov, O. Ostroukhova, A. Petlund, P. Halvorsen, T. de Lange, H. N. Espeland, T. Kupka, C. Griwodz, and M. Riegler. Deep learning and handcrafted feature based approaches for automatic detection of angiectasia. In Biomedical & Health Informatics (BHI), 2018 IEEE EMBS International Conference on, pages 365–368. IEEE, 2018.
  • [44] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information processing & management, 24(5):513–523, 1988.
  • [45] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [46] N. Snavely, S. M. Seitz, and R. Szeliski. Modeling the world from internet photo collections. International Journal of Computer Vision, 80, 2008.
  • [47] J. Son, S. J. Park, and K.-H. Jung. Retinal vessel segmentation in fundoscopic images with generative adversarial networks. arXiv preprint arXiv:1706.09318, 2017.
  • [48] B. Stelter and N. Cohen. Citizen journalists provided glimpses of mumbai attacks. The New York Times, 30, 2008.
  • [49] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [50] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L.-J. Li. Yfcc100m: the new data in multimedia research. Communications of the ACM, 59(2):64–73, 2016.
  • [51] T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2), 2012.
  • [52] N. Tkachenko, A. Zubiaga, and R. Procter. Wisc at mediaeval 2017: Multimedia satellite task. In Working Notes Proc. MediaEval Workshop, page 2, 2017.
  • [53] Y. Yang, H.-Y. Ha, F. Fleites, S.-C. Chen, and S. Luis. Hierarchical disaster image classification for situation report enhancement. In Information Reuse and Integration (IRI), 2011 IEEE International Conference on, pages 181–186. IEEE, 2011.
  • [54] J. Yin, A. Lampert, M. Cameron, B. Robinson, and R. Power. Using social media to enhance emergency situation awareness. IEEE Intelligent Systems, 27(6):52–59, 2012.
  • [55] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
  • [56] Z. Zhao, M. Larson, and N. Oostdijk. Exploiting local semantic concepts for flooding-related social image classification. In Proc. of the MediaEval 2018 Workshop, Sophia-Antipolis, France, Oct. 29-31, 2018.
  • [57] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using places database. In Advances in neural information processing systems, pages 487–495, 2014.