Train and Deploy an Image Classifier for Disaster Response

by   Jianyu Mao, et al.
Penn State University

With deep learning image classification becoming more powerful each year, its introduction to disaster response can increase the efficiency with which responders work. Using several neural network models, including AlexNet, ResNet, MobileNet, DenseNet, and a 4-layer CNN, we have classified flood disaster images from a large image data set with up to 79% accuracy. Our models and tutorials for working with the data set have created a foundation for others to classify other types of disasters contained in the images.



I Introduction

After Hurricane Maria struck Puerto Rico, researchers from MIT’s Lincoln Laboratory helped the Federal Emergency Management Agency (FEMA) assess the damage. Out of this effort, the MIT researchers created the Low Altitude Disaster Imagery (LADI) data set [1], a large collection of aerial disaster images. Initially, LADI focused on Atlantic hurricanes and coastal states along the Atlantic Ocean and Gulf of Mexico. However, this initial data set had various issues involving image sorting and images misidentified by recognition systems.

In any large-scale disaster scenario, teams of emergency responders like FEMA could save significant time and resources by reviewing conditions before arriving on site. The data set initially consisted of human- and machine-annotated aerial images collected by the Civil Air Patrol in support of various disaster responses from 2015 to 2019. Our project was organized into two goals that leveraged this data set. The first goal was to develop deep learning models for image classification using the LADI data set to prioritize flooding, debris, buildings, and other infrastructure. The second goal was to make our deep learning models publicly available so that potential end-users can adopt, modify, and even improve them.

II Method

II-A Data Processing

The LADI data set contains more than 200,000 data points, and each image is labeled with one of six categories: Damage, Rubble, Landslide, Flooding, Road Washout, and Fire. Because our goal is an image classification algorithm that correctly identifies disaster conditions, a sufficiently large and reliable data set is the most valuable resource for constructing an accurate deep learning model.

II-A1 Data Cleaning and Validation

Real-world data often contains incomplete, inconsistent, or missing values. If the data is corrupted, the model might fail to yield good results. To create a reliable data set, the main aim of our data cleaning is to identify and remove errors and duplicate data. This improves data quality and enables accurate decision making.

Besides data cleaning, we have also restricted the LADI data set to flooding and non-flooding images only. For this step, we work only with the metadata and label files. After generating the data set with only flooding information, we sampled 2000 images from it and stored them in a separate file for model implementation.

Steps for data cleaning and validation:

  • Extract labels with damage and infrastructure categories

  • Filter out infrastructure label with the label ’none’

  • Extract data with the label that contains ’flood’

  • Extract S3 URL data with the label that contains ’flood’

  • Extract URL data with the label that does not contain ’flood’
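The filtering steps above can be sketched in a few lines of plain Python. This is only an illustration: the record fields (`url`, `label`), the label strings, and the `s3://example-bucket/...` paths are hypothetical placeholders, not the actual schema of the LADI metadata and label files.

```python
# Hypothetical label records; the real LADI label files use their own schema.
records = [
    {"url": "s3://example-bucket/img1.jpg", "label": "damage:flood/water"},
    {"url": "s3://example-bucket/img2.jpg", "label": "damage:rubble"},
    {"url": "s3://example-bucket/img3.jpg", "label": "infrastructure:none"},
    {"url": "s3://example-bucket/img4.jpg", "label": "infrastructure:road"},
    {"url": "s3://example-bucket/img1.jpg", "label": "damage:flood/water"},  # duplicate
]


def split_flood_urls(records):
    """Return (flood_urls, non_flood_urls) after the cleaning steps."""
    # Keep damage and infrastructure labels, dropping 'none' infrastructure.
    kept = [
        r for r in records
        if r["label"].startswith(("damage", "infrastructure"))
        and r["label"] != "infrastructure:none"
    ]
    # Sets remove duplicate URLs.
    flood = {r["url"] for r in kept if "flood" in r["label"]}
    # An image with any flood label is never counted as non-flooding.
    non_flood = {r["url"] for r in kept if "flood" not in r["label"]} - flood
    return sorted(flood), sorted(non_flood)


flood_urls, non_flood_urls = split_flood_urls(records)
print(len(flood_urls), len(non_flood_urls))
```

Using sets for the URLs removes exact duplicates as a side effect, which covers the simplest part of the duplicate-removal goal described above.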

II-A2 Data Augmentation

Having a large data set is generally beneficial for the performance of a deep learning model. The transform functions in the TorchVision package [3] help increase the amount of relevant data in our data set.

We have used the following transform functions:

  • transforms.Resize(256): Resize the input image so that its smaller edge is 256 pixels, preserving the aspect ratio.

  • transforms.RandomRotation(10): Rotate the input image by a random angle between -10 and +10 degrees.

  • transforms.RandomCrop(250): Randomly crop the image to a size of 250 × 250 pixels.

  • transforms.RandomHorizontalFlip(): Horizontally flip the given PIL Image randomly with a given probability (50% if no parameter specified).

Fig. 1: Five commonly used TorchVision parameters for image augmentation

II-B Model

Deep learning models consist of diverse neural network architectures. Among them, Convolutional Neural Networks (CNNs) are most commonly used to analyze visual imagery and perform image classification tasks. Applying CNNs successfully to image classification requires a comprehensive understanding and use of digital image processing techniques. In this section, we introduce and illustrate the regular CNN architecture and advanced networks, such as AlexNet, ResNet, DenseNet and MobileNet, which are used in the experiments discussed in the next section.

II-B1 Convolutional Neural Network

A Convolutional Neural Network (CNN) [4] [5] is the most prevalent neural network model being used for image classification tasks. A CNN architecture consists of alternate convolutional layers and pooling layers that are followed by fully-connected layers to generate outputs. The structure of a CNN model is shown in Fig. 2.

Fig. 2: Structure of a Convolutional Neural Network.

  • Convolutional Layers: Convolutional layers convolve the input and pass the result to the next layer. The use of convolution operations is also the source of the name of this kind of architecture. Learning from each pixel with fully connected layers would require a very large number of free parameters (weights); convolutions reduce the number of free parameters and allow the network to be deeper.

  • Pooling Layers: Pooling layers reduce the dimensions of the data by combining the outputs of clusters from the previous layer into a single node in the next layer. Popular pooling options include max pooling and average pooling, which compute the maximum value and the average value of the clusters at the prior layer, respectively. Pooling reduces computational cost by shrinking the data passed to later layers, and therefore the number of free parameters those layers need, and it alleviates over-fitting by generalizing the input clusters for the following layers.

  • Fully Connected Layers: Fully connected layers connect the nodes from the previous layer to the nodes specified for the next layer. This is the final step to generalize the outputs from convolutional and pooling layers and provide outputs for image classification tasks.
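The three layer types can be seen together in a small PyTorch module. This is an illustrative sketch only: the class name `TinyCNN`, the channel counts, and the 32 × 32 input size are assumptions and do not reproduce the 4-layer CNN used in the experiments.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Illustrative CNN: alternating conv/pool layers, then a fully connected layer."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling: 32x32 -> 16x16
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling: 16x16 -> 8x8
        )
        # Fully connected layer maps the flattened features to class scores.
        self.classifier = nn.Linear(16 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


model = TinyCNN()
logits = model(torch.randn(4, 3, 32, 32))  # batch of 4 random RGB images
print(logits.shape)  # torch.Size([4, 2])
```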

The advantages of applying Convolutional Neural Networks to image classification are that they (1) require less prior processing work, e.g. feature extraction, (2) reduce dimensional complexity and computational cost, (3) mitigate the over-fitting problem and (4) provide human-level correctness.

II-B2 AlexNet

AlexNet [6] is considered one of the most influential architectures in computer vision. After it achieved a nearly 50% error rate reduction in the ImageNet challenge, it spurred many more papers employing CNNs and GPUs to accelerate deep learning.

The main improvements of AlexNet are implementing Rectified Linear Units (ReLUs) and Dropout Layers in the network architecture.

  • ReLU Layers: After the convolution operations performed by convolutional layers, it is conventional to apply a nonlinear layer (activation layer) to introduce non-linearity into the model. Since convolutions consist of linear operations such as multiplications and summations, it is important to make the model nonlinear for complex image classification tasks. Instead of traditional nonlinear functions such as tanh and sigmoid, AlexNet applies ReLUs (f(x) = max(0, x)), which cost much less computational time and alleviate the vanishing gradient problem without compromising much accuracy.

  • Dropout Layers: Over-fitting occurs during training when the parameters (weights) are tuned too closely to the training samples, resulting in a model that performs poorly on new samples. The idea of dropout is to randomly set a subset of the activations in a layer to 0. Dropout layers further alleviate over-fitting by assuming that a well-performing model should provide good classifications even if some random activations are dropped out.
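Both operations are simple enough to write out by hand. The sketch below uses plain Python lists and is only an illustration. The rescaling of surviving activations ("inverted dropout") follows the convention used by common frameworks and is an assumption here, not a detail taken from the text.

```python
import random


def relu(values):
    """ReLU: f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def dropout(values, p=0.5, rng=None):
    """Zero each activation with probability p and rescale the survivors.

    The 1 / (1 - p) rescaling (inverted dropout) is an assumed convention.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]

# Training-time dropout; which activations are zeroed depends on the seed.
dropped = dropout(activations, p=0.5, rng=random.Random(0))
```

Dropout is applied only during training; at inference time all activations are kept.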

II-B3 ResNet

ResNet [7] addresses a problem of deep networks: as the number of layers increases, accuracy saturates and then degrades. It does so with skip connections, also known as residual connections, in its identity blocks, which together with convolutional blocks form the basic blocks of its structure.

Fig. 3: Comparison of a standard block and a residual block.

As shown in Fig. 3, residual blocks add a connection between the output of the network layers and the features from the previous layers. Skip connections allow the features to be easily propagated through the network. The summation of the features from previous layers increases the accuracy of the network.
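A residual block of the kind described here can be sketched in PyTorch as follows. The class name `ResidualBlock` and the layer sizes are illustrative assumptions; the blocks in the actual ResNet variants also include versions that change the number of channels or the resolution.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Identity-style residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the block input to the transformed features.
        return self.relu(out + x)


block = ResidualBlock(16)
y = block(torch.randn(2, 16, 8, 8))
print(y.shape)  # torch.Size([2, 16, 8, 8])
```

Because the input is added back unchanged, the block only has to learn the residual F(x), which is what makes very deep stacks trainable.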

II-B4 DenseNet

Similar to ResNets, DenseNets [8] also use shortcut connections in the network structure. DenseNets extend the idea of skip connections to every layer and provide a much more densely connected architecture.

The fundamental difference is that DenseNets concatenate the feature maps from all preceding layers, whereas ResNets sum the features of the previous layers.
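The difference between the two ways of combining features shows up directly in the tensor shapes. The following sketch uses random tensors with assumed sizes purely for illustration.

```python
import torch

previous = torch.randn(1, 16, 8, 8)  # features from an earlier layer
current = torch.randn(1, 16, 8, 8)   # output of the current layer

# ResNet style: element-wise summation keeps the channel count at 16.
resnet_style = previous + current

# DenseNet style: concatenation along the channel axis gives 32 channels.
densenet_style = torch.cat([previous, current], dim=1)

print(resnet_style.shape)    # torch.Size([1, 16, 8, 8])
print(densenet_style.shape)  # torch.Size([1, 32, 8, 8])
```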

The advantages of DenseNet include (1) fewer parameters to train and (2) reduced computational cost. For instance, a ResNet with 101 layers can achieve an accuracy similar to that of a DenseNet with 201 layers. However, the DenseNet has only 45% of the number of parameters used in the ResNet and can be trained nearly twice as fast.

II-B5 MobileNet

Mobile devices are a massive market for deep learning models. Because of the trade-off between accuracy, which tends to grow with the number of layers, and memory cost, MobileNets [9] [10] have become popular for deployment on hardware.

The main idea of MobileNets is to use depth-wise separable convolutions, each consisting of a depth-wise convolution followed by a point-wise convolution, instead of the standard convolutions used in other CNN models, as represented in Fig. 4.

Fig. 4: Comparison of a standard convolution and a depthwise separable convolution in MobileNets.

MobileNets apply Batch Normalization (BN) and ReLUs after each convolution. When the kernel size of the convolution operation is 3 × 3, nearly 9 times less computation is required than with a standard convolution.
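The saving can be checked with a short multiply-accumulate count, assuming a 3 × 3 kernel. The cost formulas below follow the usual analysis of depth-wise separable convolutions; the channel counts and feature-map size are arbitrary values chosen for illustration.

```python
def standard_conv_cost(k, c_in, c_out, feature_size):
    """Multiply-accumulates of a standard k x k convolution."""
    return k * k * c_in * c_out * feature_size * feature_size


def separable_conv_cost(k, c_in, c_out, feature_size):
    """Multiply-accumulates of a depth-wise plus a point-wise (1 x 1) convolution."""
    depthwise = k * k * c_in * feature_size * feature_size
    pointwise = c_in * c_out * feature_size * feature_size
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 128 -> 256 channels, 14 x 14 feature map.
ratio = standard_conv_cost(3, 128, 256, 14) / separable_conv_cost(3, 128, 256, 14)
print(f"standard / separable cost: {ratio:.2f}x")
```

The cost ratio simplifies to 1 / (1/c_out + 1/k²), so with a 3 × 3 kernel it approaches 9 as the number of output channels grows, which matches the "nearly 9 times" figure.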

III Results and Discussion

In this section, we systematically evaluate the performance of each of the models introduced in the previous section, trained with the PyTorch framework [2]. The models we assess include a regular 4-layer CNN model, ResNet models with 34, 50 and 101 layers, respectively, a DenseNet model with 161 hidden layers, an AlexNet model and a MobileNetV2 model.

A data set containing 2000 samples is used for the training and testing processes of each model. The samples are randomly selected images with human-generated labels from the LADI data set, where half of the samples are labeled as “damage: flood/water” and the other half are labeled as other kinds of damage or no damage. The goal of training and testing different models with such a data set is to provide a binary classifier that determines whether an image contains flooding or not. Note: The data set is not a fixed database for all models. Each time before training a model, we randomly select 2000 samples with a fixed “flooding : non-flooding” ratio (50% : 50%) from the LADI data set. In this way, we mitigate the bias of over-fitting and under-fitting by feeding stochastically chosen samples into the models for each experiment.

We split our data set with 80% of samples (1600 images) for training and 20% of samples (400 images) for testing. We train each model for 30 epochs and test it accordingly.
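The sampling and splitting procedure can be sketched with the standard library. The function name `sample_and_split` and the file-name pools are hypothetical; only the 2000-sample size, the 50% : 50% class ratio and the 80% / 20% split come from the text.

```python
import random


def sample_and_split(flood, non_flood, total=2000, train_fraction=0.8, seed=None):
    """Draw a balanced random sample and split it into train and test sets."""
    rng = random.Random(seed)
    half = total // 2
    # Label 1 = flooding, label 0 = non-flooding.
    samples = [(item, 1) for item in rng.sample(flood, half)]
    samples += [(item, 0) for item in rng.sample(non_flood, half)]
    rng.shuffle(samples)
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical pools of image names standing in for the LADI image lists.
flood_pool = [f"flood_{i}.jpg" for i in range(5000)]
other_pool = [f"other_{i}.jpg" for i in range(5000)]

train, test = sample_and_split(flood_pool, other_pool, seed=0)
print(len(train), len(test))  # 1600 400
```

Leaving `seed` unset draws a fresh sample on every call, which corresponds to re-sampling the data set before each model is trained.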

In the testing process, we first get the machine generated labels by the model based on the predictions of our binary classifier. They are then compared to the ground truths generated by human beings in the LADI data set. The binary classifier returns label 1 for flooding images and 0 for non-flooding images. If the predicted labels match the ground truths, the detection of a flooding or a non-flooding image is successful. In this way, we get accuracy scores for different classifiers.
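The accuracy computation described above amounts to counting matching labels. A minimal sketch with made-up predictions and ground truths:

```python
def accuracy(predicted, ground_truth):
    """Fraction of predictions that match the human-generated labels."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label lists must have the same length")
    matches = sum(p == t for p, t in zip(predicted, ground_truth))
    return matches / len(ground_truth)


# Made-up labels: 1 = flooding, 0 = non-flooding.
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 0, 1, 1, 0]

score = accuracy(predicted, ground_truth)
print(score)  # 0.75
```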

Table I compares the accuracy and size of the models, each trained for 30 epochs on the randomly generated data set as a binary classifier for flood detection in images. The regular 4-layer CNN model performs the worst and is the largest of the seven trained models. The ResNet models achieve good accuracy and occupy relatively little memory. As the number of layers increases, both the accuracy and the size of the model increase. The ResNet 101 model achieved the best accuracy, 79%, among all the models we trained. DenseNet with 161 hidden layers obtained a satisfactory accuracy of 76% while maintaining a relatively small size. AlexNet, considered one of the most influential models in computer vision, also reaches 76% accuracy but at a much larger size of 539 MB. In contrast, MobileNetV2 does not achieve an outstanding accuracy score, but it has the smallest size, merely 17 megabytes, illustrating its potential for deployment on hardware such as mobile devices, embedded systems and web servers.

Model          Accuracy (%)   Size (MB)
4-Layer CNN    68             3794
ResNet34       72             163
ResNet50       75             180
ResNet101      79             325
AlexNet        76             539
DenseNet       76             203
MobileNetV2    73             17

TABLE I: Accuracy (%) and Size (MB) of 4-Layer CNN, ResNet34, ResNet50, ResNet101, AlexNet, DenseNet and MobileNetV2 Models Trained for 30 Epochs

Our next experiment examines the predictions of the trained model on our test samples. Fig. 5 presents the predictions of the ResNet 101 model, which obtains the best accuracy as shown in Table I, on 15 random images in the test set.

Fig. 5: Predictions of ResNet 101 Model on 15 test images.

Among these 15 test images, only 3 images are falsely classified: one in the second row, last column; another in the last row, second column; and the last one in the last row, last column. The image in the second row, last column includes highways, roads, buildings and land. Although it is labeled as flooding, either the flooding pattern is too subtle to be discovered or the image was incorrectly labeled by a human annotator. Similarly, the second falsely classified image, in the last row, second column, could be mislabeled. We can see that the water invades the boundary of the land, but the image is labeled as non-flooding. In this case, the classifier can also serve as a filter to find suspiciously labeled images and promote further data cleaning and enhancement of the LADI data set. The last wrongly classified image, in the last row, last column, demonstrates the limits of our current ResNet 101 classifier, which requires further training and improvement.

Accuracy is a good metric for measuring the proportion of correctly classified instances over all the samples in the test set. However, accuracy is not always the pivotal score for evaluating a classifier. In some cases, a classifier can achieve good accuracy but poor performance on real-world problems. Suppose a classifier always predicts 0 for a binary classification task with a test set containing 90% of samples labeled as 0 and 10% as 1. The accuracy is high, but this classifier will not perform well. To address this limitation of accuracy, Fig. 6 shows a confusion matrix of our ResNet 101 model with counts and ratios for True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN).


Fig. 6: Confusion matrix of ResNet 101 model.

From Fig. 6, we can see the 4 outcomes of a binary classification:

  • True Positives: data instances predicted as positive (flooding) that are actually positive (flooding).

  • False Positives: data instances predicted as positive (flooding) that are actually negative (non-flooding).

  • True Negatives: data instances predicted as negative (non-flooding) that are actually negative (non-flooding).

  • False Negatives: data instances predicted as negative (non-flooding) that are actually positive (flooding).
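The four outcomes can be counted directly from the predicted and ground-truth labels. The sketch below uses made-up labels, with 1 for flooding and 0 for non-flooding.

```python
def confusion_counts(predicted, ground_truth):
    """Count TP, FP, TN and FN for binary labels (1 = flooding)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, t in zip(predicted, ground_truth):
        if p == 1 and t == 1:
            counts["TP"] += 1  # predicted flooding, actually flooding
        elif p == 1 and t == 0:
            counts["FP"] += 1  # predicted flooding, actually non-flooding
        elif p == 0 and t == 0:
            counts["TN"] += 1  # predicted non-flooding, actually non-flooding
        else:
            counts["FN"] += 1  # predicted non-flooding, actually flooding
    return counts


predicted = [1, 0, 1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(predicted, ground_truth)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```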

Based on the four outcomes in the confusion matrix, we can use precision and recall metrics to evaluate the model.

  • Precision: ability of a classification model to return only relevant instances.

  • Recall: ability of a classification model to identify all relevant instances.

The equations for precision and recall are shown below:

\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \]

In our binary classification, precision is the ratio of the flooding samples correctly identified over the sum of the flooding samples correctly identified and the instances incorrectly identified as flooding. If the precision is high, the images that the classification model classified as positives are more likely to be actually positives. Recall is the ratio of the flooding samples correctly identified over the sum of the flooding samples correctly identified and the flooding samples incorrectly identified as non-flooding. If the recall is high, the classification model is more likely to capture all flooding images in the data set and label them as flooding.
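As a worked example, the sketch below computes both metrics from confusion-matrix counts and applies them to the always-0 classifier discussed earlier, using 100 hypothetical samples. Returning 0.0 when a denominator is zero is a convention chosen here, since precision is undefined when no positives are predicted.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN).

    Returns 0.0 for a metric whose denominator is zero (assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Always-0 classifier on 100 samples: 90 labeled 0 and 10 labeled 1.
tp, fp, tn, fn = 0, 0, 90, 10

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision, recall = precision_recall(tp, fp, fn)
print(accuracy, precision, recall)  # 0.9 0.0 0.0
```

The accuracy of 90% hides the fact that not a single flooding image is found, which the recall of 0 makes obvious.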

The precision and recall of our ResNet 101 model are 79.5% and 79.9%, respectively. The high precision and recall scores indicate our ResNet 101 model is a capable and precise flooding imagery classifier.

In this section, we demonstrate our experimental results of our models and discuss several interesting outputs. The results of our models are considered to be exceptional, but we expect further improvements of the classifiers as well as the LADI data set. The next section will give a summary of our project and offer a prospect of the future work.

IV Conclusion and Future Work

IV-A Conclusion

The LADI project is designed to develop a useful and efficient tool for responding quickly to a disaster based on imagery classification and detection. The model we developed would become part of this tool, detecting and classifying images in the LADI data set. Given the LADI data set, our model processes the input images and classifies them according to whether they include flooding or not. The results could then be used by disaster responders.

In this paper, we implemented a binary classifier for flooding imagery classification based on the LADI data set. We successfully trained various convolutional neural networks including a regular CNN model, an AlexNet, ResNets, a DenseNet and a MobileNet with satisfactory accuracy scores.

From our experimental results, we obtained a ResNet 101 model with the highest accuracy of 79% as well as precision and recall scores of nearly 80%, indicating the good performance of our CNN models in disaster imagery classification tasks. Our MobileNetV2 model takes only 17 megabytes, illustrating the potential of MobileNets for deployment on hardware devices.

By comparing human generated labels as the ground truths and the model predicted labels, we obtained the accuracy scores of various classifiers we have trained. By examining the True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN) in our classification results, we inspect our binary classifier more deeply. The outstanding precision and recall scores of our classifier indicate the capability and precision of our binary classifier.

IV-B Future Work

Our deep learning models are available publicly to enable potential end-users to adopt, modify, and improve our already existing models. Since we are the first team to develop classifiers for this flooding classification set, our code and documentation will be used in the future for a class taught by MIT.

In the future, focusing on improving the accuracy of MobileNet for later hardware deployment would be beneficial. MobileNetV2 achieved an accuracy of 73% with a size of only 17 MB. In comparison, our next smallest model is ResNet34 at 163 MB. This is an extremely large gap, making MobileNet the most suitable for deployment on embedded hardware.

Other future work may include extending our binary classifier to a multi-class classifier and a multi-label classifier. Furthermore, because some images have been falsely labeled by humans, our trained classifiers may also aid in finding suspicious human-generated labels. Our classifier can essentially help filter out images with mismatched labels for future data cleaning.

Future iterations should place emphasis on system deployment to embedded hardware. Using commercial embedded development platforms such as Raspberry Pi, Intel Neural Compute, or Google Edge TPU is highly recommended when developing the device that deploys the trained deep learning models. The embedded hardware should be able to perform online detection and classification, thereby enabling institutions such as FEMA to assess damage prior to arriving on-site. Drones or weather balloons are recommended for retrieving aerial views and images of the specified area.


We would like to thank Dr. Jeffrey Liu and Andrew Weinert for sponsoring this project, as well as providing the LADI dataset, weekly discussions, and guidance with working on the Deep Learning Models. We would also like to thank Dr. Marc Rigas for forming this team, providing weekly guidance, and looking over the direction of our work.