Hierarchical Classification of Kelps utilizing Deep Residual Features

06/26/2019 ∙ by Ammar Mahmood, et al.

Across the globe, remote image data is rapidly being collected for the assessment of benthic communities from shallow to extremely deep waters on continental slopes to the abyssal seas. Exploiting this data is presently limited by the time it takes for experts to identify organisms found in these images. With this limitation in mind, a large effort has been made globally to introduce automation and machine learning algorithms to accelerate both classification and assessment of marine benthic biota. One major issue lies with organisms that move with swell and currents, like kelps. This paper presents an automatic hierarchical classification method to classify kelps from images collected by autonomous underwater vehicles. The proposed kelp classification approach exploits learned image representations extracted from deep residual networks. These powerful and generic features outperform the traditional off-the-shelf CNN features, which have already shown superior performance over the conventional hand-crafted features. Experiments also demonstrate that the hierarchical classification method outperforms the common parallel multi-class classifications by a significant margin. Experimental results are provided to illustrate the efficient applicability of the proposed method to study the change in kelp cover over time for annually repeated AUV surveys.




1 Introduction

Kelp forests support diverse and productive ecological communities throughout temperate and arctic regions worldwide. Environmental anomalies such as cyclones, storms, marine heat waves and climate change have a detrimental effect on marine life, including kelps doney2012climate . Significant declines in kelp beds have been observed around the globe in the past decades, with the main drivers identified as eutrophication and climate change related environmental stressors. For instance, large-scale disappearance of kelp was observed in 2002 on the southern coast of Norway moy2012large . In Spain, large-scale reductions in two main species of kelp have also been observed since the 1980s fernandez2011retreat .

Similarly, kelp populations in Australia have decreased as a consequence of climate change driven environmental stressors. On the east coast of Tasmania, the coverage of giant kelp Macrocystis pyrifera in the present decade is 9% of the coverage in the 1940s johnson2011climate . This decline is consistent with the intrusion of warmer, nutrient-poor water from the East Australian Current, which now extends 350 km further south than in the 1940s ridgway2007long . Wernberg et al. wernberg2016climate reported a rapid climate-driven transition of kelp forests to seaweed turfs in the Australian temperate reef communities, with kelp forests showing a 100 km poleward contraction from their pre-heatwave distribution on the Western Australia coast. This trend is alarming for the numerous endemic species that rely on kelp forests for support. Loss of kelp forests is also a major threat for Australia’s fishing and tourism industries, which generate more than 10 billion Australian dollars per annum bennett2016great . There is thus a pressing and immediate need for monitoring programs to document changes in kelp dominated habitats along coastlines worldwide and especially in temperate Australia.

Autonomous underwater vehicles (AUVs) are emerging as highly effective tools for monitoring changes in marine environments, because (i) they can autonomously conduct non-destructive sampling in remote marine habitats; (ii) they can repeatedly survey the same spatial region to detect change over time; and (iii) they are fitted with a range of instrumentation to acquire both physical and biological data. AUVs have been used to monitor the marine benthos across temperate and tropical environments in Australia williams2012monitoring , smale2012regional ; to survey invasive pest species barrett2010autonomous ; to document rapid loss of corals associated with warming events smale2012regional , bridge2014variable ; to describe benthic community structure at depths greater than 1000 m sherman2009deep ; and to assess environmental impacts of the Deepwater Horizon oil spill camilli2010tracking . In a large-scale study of deep waters, the distribution patterns of kelp forests were investigated to provide useful insights on the effect of environmental changes on the kelp population marzinelli2015large . The survey took an extremely long time to complete because marine biologists had to manually classify the images and identify kelp in them.

AUV driven monitoring can generate vast amounts of imagery. For example, an AUV deployed in Western Australia collected more than 15,000 stereo image pairs each day and was deployed between 10 and 12 days each year smale2012regional . Manual analysis of such a large number of images per deployment (150,000 to 200,000 stereo image pairs) takes a significant amount of time and effort and is the major bottleneck in data acquisition from AUV surveys. In order to promptly identify changes in benthic species, especially dominant habitat formers (such as kelps and corals), it is necessary to match image-analysis time to surveying time so data can be analyzed rapidly and identification of change patterns can be accomplished. Automatic classification is critical to speed up image analysis and consequently automatic classification of benthic species has raised interest in ecologists and computer scientists (such as marcos2005classification ; denuelle2010kelp ; bewley2012automated ; beijbom2012automated ; mahmood2016coral ). Nonetheless, automated classification of AUV collected imagery is challenging because images are captured in dynamic shallow water with little to no control on lighting and significant variations in what is visible and how it is perceived.

Figure 1:

Evolution of classification pipelines (the most recent at the bottom). Off-the-shelf deep residual features have the potential to replace the previous classification pipelines and improve performance for marine image classification tasks. (SIFT: scale invariant feature transform, HOG: histograms of gradient, LBP: local binary patterns, CNN: convolutional neural networks, ResNet: residual networks.)

In this paper, we tackle the challenge of automatically annotating underwater imagery for the presence of kelp to detect changes in the coverage of Australian kelp forests. The common practice is to study the distribution and density of benthic species, which involves manually annotating a smaller dataset and then extrapolating these results to make inferences about the sites under study. Automating the process of determining kelp coverage will significantly decrease image processing times and will allow for large scale analysis of datasets and for early identification of changes in kelp cover. To automate this process, it is paramount to select appropriate features. In traditional computer vision tasks, the general trend has shifted from conventional hand-crafted features to off-the-shelf deep features razavian2014cnn . Hand-crafted features, which usually encode one aspect of data (i.e., color, shape or texture), were a popular choice as image representations for marine species recognition tasks in the works of marcos2005classification ; beijbom2012automated ; stokes2009automated ; pizarro2008towards . Moreover, given that hand-crafted features are designed specifically for the task at hand, they generally do not perform well when applied to a different task. Recently, Convolutional Neural Networks (CNNs) and features extracted from pre-trained CNNs have become the preferred choice for marine image classification tasks, e.g., mahmood2016coral ; beijbom2016improving ; mahmood2016automatic . These off-the-shelf features are image representations learned by a deep network trained on a larger dataset such as ImageNet. Off-the-shelf CNN features are generic and have shown better performance than hand-crafted features on a variety of image recognition tasks razavian2014cnn . In this paper, we propose to apply image representations extracted from deep residual networks (ResNets) to further improve the automatic annotation of benthic species. Besides better performance, one big advantage of ResNets is their faster training time and ease of optimization. Fig. 1 depicts the evolution of classification pipelines for automatic marine species annotation.

The main motivation for using ResNet as a base network to extract features for kelp classification is its superior performance over previous deep networks he2015deep . Moreover, the feature extraction is fast due to the low computational complexity of ResNets and the reduced number of floating point operations (FLOPs). Also, the feature extracted from ResNet is 2048-dimensional, which is half of the traditional 4096-dimensional feature vector of previous networks such as VGG16 simonyan2014very . These compact features result in reduced memory requirements for storing the features of large marine datasets.
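As a rough illustration of the storage saving, the sketch below estimates the memory needed to hold one feature vector per labelled point of the Benthoz15 dataset (407,968 points, Section 3.1.1). Storing each value as a 32-bit float is an assumption made here; the paper does not state the storage format, and `feature_storage_bytes` is a hypothetical helper.

```python
# Rough storage estimate for off-the-shelf features.
# Assumption: one float32 (4 bytes) per feature dimension.
BYTES_PER_VALUE = 4
NUM_POINTS = 407_968  # expert-labelled points in Benthoz15


def feature_storage_bytes(num_points: int, dim: int) -> int:
    """Bytes needed to store `num_points` feature vectors of length `dim`."""
    return num_points * dim * BYTES_PER_VALUE


drf_bytes = feature_storage_bytes(NUM_POINTS, 2048)  # ResNet-50 features
vgg_bytes = feature_storage_bytes(NUM_POINTS, 4096)  # VGG16 features

print(f"DRF:   {drf_bytes / 1e9:.2f} GB")
print(f"VGG16: {vgg_bytes / 1e9:.2f} GB")
```

Under this assumption the 2048-dimensional features need about 3.3 GB against about 6.7 GB for 4096-dimensional features, so the saving scales linearly with the number of labelled points.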

The main contributions of this paper are:

  1. We report the first application of deep learning for automated kelp coverage analysis for the marine science community.

  2. We propose a supervised kelp image classification method based on features extracted from deep residual networks, termed as Deep Residual Features (DRF).

  3. We compare the classification performance of the DRF with the widely used off-the-shelf CNN features for automatic annotation of kelps.

  4. We show that DRFs achieve a superior classification accuracy compared to previous methods for kelp classification.

  5. We compare hierarchical image classification with multi-class image classification and report the accuracies and mean f1-scores for two large datasets.

  6. We utilize our proposed method to automatically analyze kelp coverage across five regions of Rottnest Island in Western Australia.

  7. We demonstrate the performance of the proposed kelp coverage analysis technique using ground truth data provided by marine experts and show a high correlation with previously conducted manual surveys.

The paper is organized as follows. In Section 2, we briefly review related work. In Section 3, we present our proposed approach, explain the features extracted from deep networks, and then report the experimental results and kelp coverage analysis. In Section 4, we discuss the next steps required to implement our proposed method on a platform for rapid analysis of benthic images. Section 5 concludes this paper.

2 Related Work

2.1 Kelp Classification

Previous studies on automatic classification and segmentation of kelps in marine imagery were based on hand-crafted features (Table 1). To the best of our knowledge, deep networks or features extracted from deep networks have not yet been applied to solve this problem. Here we briefly summarize a few of the prominent studies focused on automating kelp identification.

Denuelle and Dunbabin denuelle2010kelp utilized a technique that employed generation of kelp probability maps using Haralick texture features across an entire image. They reported that supervised and unsupervised segmentation yielded similar results. Color imbalance resulted in a significant number of false positives, thus implying that the images collected must be diversified to cater for the various possible underwater lighting and visibility conditions. When compared to manual segmentation by experts, the results show good agreement.

Bewley et al. bewley2012automated presented a technique for the automatic detection of kelps using AUV gathered images. The proposed method used local image features which are fed to Support Vector Machines (SVM) to identify whether kelp is present in the image under examination. Comparison of several descriptors such as Local Binary Patterns (LBP) and Principal Component Analysis was carried out across multiple scales. This algorithm was tested on benthic data (collected from Tasmania in 2008), which contained 1258 images with 62,900 labels and 19 classes. The f1-score, which is the harmonic mean of precision and recall, was used to evaluate the performance of their proposed method:

$$ f_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} $$

A maximum f1-score of 0.69 was reported for kelps. It was also suggested that practical systems can be built to assist scientists with automatic identification of kelps. They also concluded that results could be improved by using combinations at multiple scales, finding superior descriptors and by using more supplementary AUV data. The study concluded that for a local geographical region, and for a particular species, sufficient generalization is possible.

This work was extended in bewley2015hierarchical for a multi-class classification problem in the presence of a taxonomical hierarchy. A local classifier was trained for each node of the hierarchy tree for LBP features and the classification results were compared through multiple hierarchy training methods. This algorithm achieved an f1-score of 0.75 for kelps and an overall mean f1-score of 0.197 for all 19 classes present in the dataset.

2.2 Deep Learning for Marine Species Recognition

In recent years, deep networks and off-the-shelf CNN features have become the first choice to tackle computer vision tasks. Only a handful of studies have developed marine species recognition methods based on deep learning. Beijbom et al. beijbom2016improving trained three and five-channel deep CNNs based on the CIFAR10 LeNet architecture krizhevsky2012imagenet to improve the classification performance for coral and non-coral species. Reflectance and fluorescence images were registered together to obtain a five-channel image, which improved the classification performance by a significant margin. This was the first reported study to employ training of deep networks (from scratch) for marine species recognition.

Off-the-shelf CNN features razavian2014cnn along with multi-scale pooling were first used for coral classification in mahmood2016coral on the Moorea Labelled Coral (MLC) dataset, which is a challenging dataset introduced in beijbom2012automated . This paper also explored a hybrid feature approach, combining CNN features with texton maps to further improve the classification accuracy on this dataset. Class imbalance is an additional problem which refers to the disproportionate difference in the amount of points allocated to some classes compared to others. This is a common issue in marine datasets, as some species are significantly more abundant than others. To address the class imbalance, a cost-sensitive learning approach was studied in khan2017cost using off-the-shelf CNN features for MLC dataset. In another study, features extracted from pre-trained deep networks were used to generate coral population maps for the Abrolhos Islands in Western Australia mahmood2016automatic . This study reported a trend of decreasing live coral cover in this region. This is consistent with the manual analysis of AUV images conducted by marine researchers smale2012regional ; bridge2014variable .

Deep residual networks (ResNets) are a special class of CNNs and are deeper, faster to train and easier to optimize than previous CNN architectures he2015deep . ResNets employ techniques such as residual learning and identity mapping for shortcut connections he2016identity , which enables them to overcome the limitations of traditional CNNs and outperform them in speed and accuracy. ResFeats, features extracted from the output of convolutional layers of a 50-layer ResNet (ResNet-50), were reported to improve the performance of different image classification tasks in mahmood2016resfeats , including coral classification on the MLC dataset. Although these features are computationally expensive large arrays, we chose to use the image representations extracted from the layers closer to the output end of ResNet-50 to reduce computation cost and alleviate the need for dimensionality reduction.

| Authors | Methods | Classes | Main Species |
| --- | --- | --- | --- |
| Marcos et al. marcos2005classification | Color histograms, local binary pattern (LBP) and a 3-layer neural network | 3 | Corals |
| Stokes and Deane stokes2009automated | Color histograms, discrete cosine transform and probability density based classifier | 18 | Corals, Macroalgae |
| Pizarro et al. pizarro2008towards | Color histograms, Gabor filter response, scale-invariant feature transform (SIFT) and a voting based classifier | 8 | Corals, Macroalgae |
| Beijbom et al. beijbom2012automated | Maximum response filter bank with SVM classifier | 9 | Corals, Macroalgae |
| Denuelle and Dunbabin denuelle2010kelp * | Haralick texture features with Mahalanobis distance classifier | 2 | Kelp |
| Bewley et al. bewley2012automated * | Principal Component Analysis (PCA) and LBP descriptors with SVM classifier | 19 | Corals, Algae and Kelp |
| Bewley et al. bewley2015hierarchical * | Hierarchical classification with PCA and LBP features | 19 | Corals, Algae and Kelp |
| Beijbom et al. beijbom2016improving | Deep neural network with reflectance and fluorescence images | 10 | Corals, Macroalgae |
| Mahmood et al. mahmood2016coral | Hybrid (CNN + handcrafted) features with a multilayer perceptron (MLP) network | 9 | Corals, Macroalgae |
| Mahmood et al. mahmood2016automatic | Off-the-shelf CNN features with SVM classifier | 2 | Corals, Macroalgae |

Table 1: A brief summary of methods for benthic image classification. Key: * studies that have reported results on kelps. The last three rows (Beijbom et al. beijbom2016improving and both Mahmood et al. entries) use methods based on deep learning.

3 Methods and Results

In this section, we outline the key components of our proposed method (Figure 2) and present the adopted experimental protocols.

3.1 Datasets

3.1.1 Benthoz15 Dataset

This Australian benthic data set (Benthoz15) bewley2015australian consists of an expert-annotated set of geo-referenced benthic images and associated sensor data. These images were captured by AUV Sirius during Australia’s Integrated Marine Observing System (IMOS) benthic monitoring program at multiple temperate locations (Table 2) around Australia williams2012monitoring . Marine experts manually annotated each of these images according to the CATAMI classification scheme. For each image, up to 50 randomly selected pixels were hand labelled using the Coral Point Count with Excel Extensions (CPCe) software package kohler2006coral . The whole dataset contains 407,968 expert labelled points, taken from 9,874 distinct images collected at different depths and sites over the past few years. There are 145 distinct class labels in this dataset, with pixel labels ranging from 2 to 98,380 per class. Of these 145 classes, 33 belong to macroalgae (MA) species, and 63,722 of the labelled points belong to the kelp class. Further details on the labeling methodology can be found in bewley2015australian .

| Site | Survey Year | # of Labels | # of Images |
| --- | --- | --- | --- |
| Abrolhos Islands | 2011, 2012, 2013 | 119,273 | 2,377 |
| Tasmania | 2008, 2009 | 88,900 | 1,778 |
| Rottnest Island | 2011 | 63,600 | 1,272 |
| Jurien Bay | 2011 | 55,050 | 1,101 |
| Solitary Islands | 2012 | 30,700 | 1,228 |
| Batemans Bay | 2010, 2012 | 24,825 | 993 |
| Port Stevens | 2010, 2012 | 15,600 | 624 |
| South East Queensland | 2010 | 10,020 | 501 |
| Total | - | 407,968 | 9,874 |
Table 2: Benthoz15 data.

3.1.2 Rottnest Island Dataset

The Rottnest Island dataset was also collected by AUV Sirius and contains 297,800 expert labelled points, taken from 5,956 distinct images collected at different depths from five sites around Rottnest Island from 2010 to 2013 (Table 3). Three of the five sites are labelled north (15 m, 25 m and 40 m depth) and two south (15 m and 25 m depth). There are 78 distinct class labels in this dataset, with pixel labels ranging from 2 to 155,776 per class. This strong imbalance makes the classification quite challenging. Of these 78 classes, 25 belong to macroalgae species, and 156,000 of the labelled points belong to the kelp class.

| Survey Year | # of Images | # of Labels | # of Classes |
| --- | --- | --- | --- |
| 2010 | 1,680 | 84,000 | 61 |
| 2011 | 1,680 | 84,000 | 55 |
| 2012 | 1,033 | 51,650 | 44 |
| 2013 | 1,563 | 78,150 | 55 |
| Total | 5,956 | 297,800 | 78 |
Table 3: Rottnest Island data.
Figure 2: The block diagram of our proposed framework.

3.2 Classification Methods

Deep residual features (DRF) are extracted from a 50-layer deep residual network (ResNet-50) he2015deep that is pre-trained on ImageNet. As detailed below, they are taken from the pooled output of the last convolutional block, which forms the input to the only fully connected layer of the network. Fig. 3 shows the architecture of the ResNet-50 deep network which we have used for feature extraction. The ResNet-50 is made up of five convolutional blocks stacked on top of each other (Figure 3). The convolutional blocks of a ResNet differ from those of traditional CNNs because of the introduction of a shortcut connection between the input and output of each block. Identity mappings, when used as shortcut connections in ResNets he2016identity , can lead to better optimization and reduced complexity. This in turn results in deeper ResNets which are faster to train and computationally less expensive than conventional CNNs, e.g., VGGnet simonyan2014very .
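The shortcut connection can be illustrated with a toy residual unit. The sketch below is not the ResNet-50 implementation: `residual_unit` and the stand-in `toy_transform` are hypothetical names, and a real unit uses convolutions, batch normalization and learned weights. It only shows that the unit output is the learned transform added to the unchanged input, followed by a ReLU.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def residual_unit(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return relu(F(x) + x); the identity shortcut carries x unchanged."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching input/output sizes")
    return relu([f + xi for f, xi in zip(fx, x)])


def toy_transform(v: Vector) -> Vector:
    # Hypothetical stand-in for the stacked convolutional layers F(x).
    return [0.5 * x - 1.0 for x in v]


out = residual_unit([2.0, -4.0, 0.0], toy_transform)
print(out)
```

If the transform outputs zeros, the unit simply passes the (rectified) input through, which is one intuition for why very deep stacks of such units remain easy to optimize.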

Figure 3: ResNet-50 architecture he2015deep shown with the residual units, the size of the filters and the outputs of each convolutional layer. DRF extracted from the last convolutional layer of this network is also shown. Key: The notation k × k, n in the convolutional layer block denotes a filter of size k and n channels. FC 1000 denotes the fully connected layer with 1000 neurons. The number on the top of the convolutional layer block represents the repetition of each unit. nClasses represents the number of output classes.

The image representations extracted from the fully connected layers of deep networks pre-trained on ImageNet razavian2014cnn capture the overall shape of the object contained in the region of interest. The features extracted from the deeper layers encode class specific properties (i.e., shape, texture and color) and give superior classification performance as compared to features from shallower layers zeiler2014visualizing . Hence, we propose to extract the features from the output of the last convolutional block of ResNet-50 (Figure 3). The output of the Conv5 block is a 7 × 7 × 2048 dimensional array (for the standard 224 × 224 input) and is used as input of the FC-1000 layer. This large array is, however, first converted to a 2048-dimensional vector by using a max-pool layer. We extract this 2048-dimensional vector and name it DRF. We do not use the FC-1000 layer for feature extraction because it is used as an output layer to classify the 1000 classes of the ImageNet dataset, which was used to pre-train this network. Our feature extraction method is different from the conventional method employed in previous deep networks such as VGGnet. The presence of multiple fully connected layers in the VGGnet makes the feature extraction straightforward. The only fully connected layer in ResNet is class specific to the ImageNet dataset. Therefore, we propose to use the output of the last convolutional block for DRF extraction.
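The pooling step that turns the Conv5 output into a DRF can be sketched as follows. This is a minimal pure-Python illustration, not the MatConvNet code used in the paper: `global_max_pool` is a hypothetical helper, and the tiny 2 × 2 × 3 array stands in for the much larger Conv5 output with 2048 channels.

```python
from typing import List


def global_max_pool(feature_map: List[List[List[float]]]) -> List[float]:
    """Collapse an H x W x C array to a C-dimensional vector by taking,
    for every channel, the maximum over all spatial positions."""
    num_channels = len(feature_map[0][0])
    pooled = [float("-inf")] * num_channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                if value > pooled[c]:
                    pooled[c] = value
    return pooled


# Toy 2 x 2 x 3 stand-in for the Conv5 output (H=2, W=2, C=3).
conv5 = [
    [[0.1, 0.0, 2.0], [0.4, 1.5, 0.0]],
    [[0.9, 0.2, 0.3], [0.0, 0.7, 1.1]],
]
drf = global_max_pool(conv5)
print(drf)
```

The length of the pooled vector equals the number of channels, which is why the feature has 2048 dimensions regardless of the spatial size of the Conv5 output.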

There are three different approaches described in silla2011survey to deal with the hierarchical classification problem:

  1. Flat Classification: This approach ignores the hierarchy and treats the problem as a parallel multi-class classification problem.

  2. Local Binary Classification: A binary classifier is trained for every node in the hierarchical tree of the given problem.

  3. Global Classification: A single classifier is trained for all classes and the hierarchical information is encoded in the data.

We have used the local binary classification technique in this paper to identify kelps from other taxa. This approach is easier to implement and more useful when all the nodes in the hierarchy are not labeled to a specific leaf node level. For example, some macroalgae are not labeled to the species level in the Benthoz15 dataset bewley2015australian . Moreover, this approach also allows for the use of different features, training sets and classifiers for each node of the hierarchy tree. The hierarchy tree for kelps is shown in Fig. 4.
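One common way to use local binary classifiers at prediction time is to walk down the hierarchy and stop at the first node whose classifier rejects the sample. The sketch below illustrates that control flow on the kelp branch only. The text does not spell out this step, so the top-down rule, the `Node` class, the node names and the threshold-style classifiers are all assumptions; in the paper each node classifier is a linear SVM trained on DRFs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Feature = List[float]


@dataclass
class Node:
    name: str
    # Binary classifier deciding whether a sample belongs to this node.
    accepts: Callable[[Feature], bool]
    # Next, more specific node on the kelp branch (None at the leaf).
    child: Optional["Node"] = None


def classify(feature: Feature, root: Node) -> str:
    """Walk down the branch; stop at the first node that rejects the sample."""
    label = "other biota"
    node: Optional[Node] = root
    while node is not None and node.accepts(feature):
        label = node.name
        node = node.child
    return label


# Hypothetical toy classifiers: thresholds on two feature dimensions.
kelp = Node("kelp", lambda f: f[1] > 0.5)
macroalgae = Node("macroalgae", lambda f: f[0] > 0.5, child=kelp)

print(classify([0.9, 0.8], macroalgae))
print(classify([0.9, 0.1], macroalgae))
print(classify([0.2, 0.9], macroalgae))
```

Because every node owns its classifier, each one could use different features or training sets, which is the flexibility noted above.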

Figure 4: Hierarchy tree for kelps in our benthic data. In each node, the 1st line shows the node number, the 2nd line shows the name of the species, and the 3rd and 4th lines show the number of labels belonging to that particular species in the Benthoz15 and Rottnest Island data respectively.

3.3 Training and Testing Protocols

In this paper, two training approaches are used, namely inclusive training and sibling training. In the inclusive training method, all the non-kelp samples from the entire dataset are treated as negative samples, i.e., nodes 1.2 and 1.1.2 in Fig. 4. In the sibling training method, however, only the non-kelp samples that fall under the macroalgae node are treated as negative, i.e., node 1.1.2 in Fig. 4. We use a linear Support Vector Machine (SVM) cortes1995support classifier because it has shown excellent performance with features extracted from deep networks razavian2014cnn . We performed 3-fold cross validation within the training set to optimize the SVM parameters, and mean performances are reported in Section 3.6.
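The difference between the two training policies comes down to which samples are used as negatives. In the sketch below the node identifiers 1.2 and 1.1.2 follow the numbering quoted above, while the identifier `1.1.1` for kelp, the `Sample` tuple and `build_training_set` are assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (sample id, hierarchy node of its expert label)

KELP_NODE = "1.1.1"              # assumed identifier of the kelp node
OTHER_MACROALGAE_NODE = "1.1.2"  # non-kelp macroalgae (sibling of kelp)
NON_MACROALGAE_NODE = "1.2"      # everything outside macroalgae


def build_training_set(samples: List[Sample], policy: str) -> Tuple[List[str], List[str]]:
    """Return (positive ids, negative ids) for the binary kelp classifier."""
    if policy == "inclusive":
        negative_nodes = {OTHER_MACROALGAE_NODE, NON_MACROALGAE_NODE}
    elif policy == "sibling":
        negative_nodes = {OTHER_MACROALGAE_NODE}
    else:
        raise ValueError(f"unknown policy: {policy}")
    positives = [sid for sid, node in samples if node == KELP_NODE]
    negatives = [sid for sid, node in samples if node in negative_nodes]
    return positives, negatives


samples = [("p1", "1.1.1"), ("p2", "1.1.2"), ("p3", "1.2"),
           ("p4", "1.1.1"), ("p5", "1.2")]
inclusive = build_training_set(samples, "inclusive")
sibling = build_training_set(samples, "sibling")
print(inclusive)
print(sibling)
```

Sibling training always yields a subset of the inclusive negatives, so its training set is never larger.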

3.4 Image Enhancement and Implementation Details

We applied a color channel stretch to each image in the dataset to reduce the effect of underwater color distortion. We calculated the averages of the lowest 1% and the highest 99% of the intensities for each color channel. The average of the lowest 1% intensities was subtracted from all the intensities in each respective channel and the negative values were set to zero. These intensities were then divided by the average of the highest 99% of the intensities. This process enhanced the color information of marine images as shown in Fig. 5.
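A minimal sketch of the stretch for a single color channel, written in plain Python rather than the MATLAB environment used in the paper. Two points are assumptions: the two averages are read here as the mean of the darkest 1% of pixels and the mean of the brightest 1% of pixels (those above the 99th percentile), and the result is clipped to 1.0, which the text does not mention. `stretch_channel` is a hypothetical name.

```python
from typing import List


def stretch_channel(values: List[float], percent: int = 1) -> List[float]:
    """Stretch one color channel.

    low  = mean of the darkest `percent` % of pixels
    high = mean of the brightest `percent` % of pixels (assumed reading)
    """
    ordered = sorted(values)
    k = max(1, len(ordered) * percent // 100)
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    # Subtract the low average and set negative values to zero.
    shifted = [max(0.0, v - low) for v in values]
    if high <= 0:
        return shifted
    # Divide by the high average; clipping to 1.0 is an added assumption.
    return [min(1.0, v / high) for v in shifted]


# Toy channel with intensities 50..249; a full image repeats this per channel.
channel = [float(v) for v in range(50, 250)]
stretched = stretch_channel(channel)
print(min(stretched), max(stretched))
```

Applying the function independently to the red, green and blue channels gives the per-channel behaviour described above.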

Figure 5: Enhancement of marine images with color channel stretch: (left) raw images and (right) enhanced images.

For feature extraction, we used a pre-trained ResNet-50 he2015deep deep network architecture in our experiments. We used the publicly available model of this network, which was pre-trained on the ImageNet dataset. We implemented our proposed method using MatConvNet vedaldi2015matconvnet and the SVM classifier using LIBLINEAR fan2008liblinear (Figure 2).

3.5 Experimental Settings and Evaluation Criteria

70% of images from each geographical location were used to form the training set for experiments carried out on the Benthoz15 dataset. However, for Rottnest Island data, the images from years 2010, 2011 and 2012 are included in the training set and the images from year 2013 form the testing set. We performed our experiments with three different classification approaches: flat classification and local binary classification with both inclusive and sibling training policies. The overall classification accuracy is not an effective measure of binary classifier performance for datasets exhibiting a skewed class distribution. Therefore, to evaluate the performance of our classifier, we have used four evaluation criteria: overall classification accuracy, mean f1-score (the average of f1-scores of each class involved in the test data), precision and recall values of kelp.
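The four criteria can be computed from predicted and expert labels as sketched below. The helper names are hypothetical, and the per-class f1-score is set to zero when a class has no true positives, which is a convention assumed here and not stated in the paper.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(y_true: List[str], y_pred: List[str], cls: str) -> Tuple[float, float, float]:
    """Precision, recall and f1-score of one class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def evaluate(y_true: List[str], y_pred: List[str], target: str = "kelp") -> Dict[str, float]:
    classes = sorted(set(y_true))  # classes involved in the test data
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    mean_f1 = sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)
    precision, recall, _ = precision_recall_f1(y_true, y_pred, target)
    return {"accuracy": accuracy, "mean_f1": mean_f1,
            "kelp_precision": precision, "kelp_recall": recall}


# Toy labels: 3 kelp points and 5 non-kelp points.
y_true = ["kelp", "kelp", "kelp", "other", "other", "other", "other", "other"]
y_pred = ["kelp", "kelp", "other", "kelp", "other", "other", "other", "other"]
scores = evaluate(y_true, y_pred)
print(scores)
```

In this toy case the accuracy (0.75) is higher than the kelp precision and recall (both 2/3), a small instance of why accuracy alone can flatter a classifier on skewed data.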

3.6 Classification Results

In this section, we report the results of three different types of features for the three training methods on the two datasets: (i) Maximum Response (MR) filter and texton maps of beijbom2012automated as baseline handcrafted features. We used a publicly available implementation of this method; (ii) CNN features extracted from a VGG16 network pretrained on ImageNet dataset simonyan2014very ; (iii) Our proposed DRFs extracted from a pretrained ResNet-50.

The DRF method matched or outperformed the traditional CNN features and the MR features on every criterion in both datasets: accuracy, mean f1-score, precision of kelps and recall of kelps (Tables 4 and 5). Additionally, hierarchical classification (sibling and inclusive) improved the mean f1-score and the recall of kelps compared to flat classification, while providing lower training times. Of the two hierarchical schemes, the sibling training method achieved the highest mean f1-score for both datasets. Because the f1-score is an evaluation metric based on both precision and recall, we recommend the sibling training method as the top performing practical method for classification and automated coverage analysis of kelps.

3.6.1 Benthoz15 Dataset

To highlight the superior classification performance of DRF, we have included a comparative study among DRF, the traditionally used CNN features extracted from VGGnet simonyan2014very and MR features (Table 4). The DRF method performs at least as well as both alternatives on every criterion for all three classification experiments. The lowest overall accuracy was achieved by the flat multi-class classification method. This is not surprising since there are 145 classes in this dataset. Additionally, a very low mean f1-score of 0.05 was observed, since many classes among the total 145 had very few samples for training and testing. Nonetheless, the flat classification method achieved the highest precision for kelps among all the three methods. Out of every 100 samples that this method labels as kelp, 71 are kelp according to the experts. However, this method resulted in the worst recall value of 65% (Table 4).

The best classification accuracy is achieved with the inclusive training method for which all the non-kelp samples are bundled together in the negative class. This training scheme achieves a mean f1-score of 0.79 which is similar to the highest f1-score of 0.80 obtained using the sibling training method (Table 4).

The sibling training method is more challenging than the inclusive training method because the negative samples only include macroalgae classes, and some of these classes are very similar to kelp in appearance. This accounts for a drop in classification accuracy from 90% to 83.4%. However, the sibling training method resulted in the highest mean f1-score and recall value for kelp. The sibling and inclusive training methods show comparable performance, but the sibling method is preferable because it has a lower training time than the inclusive method. Moreover, statistical testing supports the hypothesis that all three DRF classifiers are better than their VGG and MR counterparts at a significance level of 0.05 (50,000 samples were chosen at random for these tests from the respective testing sets).

| Method | Accuracy (%) | Mean f1-score | Precision of Kelps (%) | Recall of Kelps (%) |
| --- | --- | --- | --- | --- |
| MR: Flat | 51.6 | 0.03 | 64 | 59 |
| MR: Inclusive | 82.8 | 0.70 | 43 | 69 |
| MR: Sibling | 79.6 | 0.72 | 55 | 73 |
| VGG: Flat | 54.4 | 0.03 | 67 | 63 |
| VGG: Inclusive | 89.0 | 0.75 | 47 | 73 |
| VGG: Sibling | 82.1 | 0.76 | 57 | 75 |
| DRF: Flat | 57.6 | 0.05 | 71 | 65 |
| DRF: Inclusive | 90.0 | 0.79 | 58 | 73 |
| DRF: Sibling | 83.4 | 0.80 | 65 | 78 |
Table 4: A comparison of flat, inclusive and sibling classification methods for kelp classification on Benthoz15 dataset for MR, VGG and DRF methods. The flat classification focuses on all the classes present in the dataset whereas the inclusive and sibling classification only includes kelps and non-kelps. Mean f1-score corresponds to the average of the individual f1-score of each class involved in the experiment. Best scores are shown in bold font.

3.6.2 Rottnest Island Dataset

The DRF method was then applied to the Rottnest Island data, where it again matched or outperformed the VGG and MR features in all the classification experiments (Table 5). The hierarchical methods performed better than the flat classification method on all evaluation criteria except the precision of kelps, where the flat classifier scores highest. However, the flat classifier also yields the lowest recall of kelps. This is consistent with the results obtained on the Benthoz15 dataset. The mean f1-scores for the flat classifier are again very low, given that all 78 classes are classified at the same time. The sibling training method comes out as the best method with respect to accuracy, mean f1-score and recall of kelps. Moreover, the sibling training method is also the fastest method because it has fewer negative examples than the inclusive method.

Fine-tuning a deep network is also a popular approach for transfer learning azizpour2015generic . We also compared our proposed method with fine-tuning. The fine-tuning approach slightly outperformed our proposed method: we observed an accuracy gain of 0.5% for the Benthoz15 dataset and 0.8% for the Rottnest Island dataset. However, our proposed method is computationally much less expensive, requiring only a few hours to run, whereas fine-tuning a ResNet-50 on the Rottnest Island dataset took 2 days using an Nvidia Titan-X GPU.

Method | Accuracy (%) | Mean f1-score | Precision of Kelps (%) | Recall of Kelps (%)
MR: Flat | 52.9 | 0.02 | 90 | 62
MR: Inclusive | 73.2 | 0.70 | 77 | 74
MR: Sibling | 71.7 | 0.71 | 80 | 73
VGG: Flat | 58.6 | 0.02 | 95 | 65
VGG: Inclusive | 74.7 | 0.74 | 81 | 75
VGG: Sibling | 74.5 | 0.73 | 84 | 75
DRF: Flat | 59.0 | 0.03 | 95 | 66
DRF: Inclusive | 75.0 | 0.75 | 82 | 75
DRF: Sibling | 77.2 | 0.76 | 86 | 79
Table 5: A comparison of flat, inclusive and sibling classification methods for kelp classification on Rottnest Island dataset for MR, VGG and DRF methods. The flat classification focuses on all the classes present in the dataset whereas the inclusive and sibling classification only includes kelps and non-kelps. Mean f1-score corresponds to the average of the individual f1-score of each class involved in the experiment. Best scores are shown in bold font.
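The evaluation criteria used in Tables 4 and 5 can be computed as in the sketch below, which treats the mean f1-score as the unweighted average of the per-class f1-scores, as the captions describe. The function names and the toy labels are illustrative assumptions.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class that occurs."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[cls] = (precision, recall, f1)
    return scores


def mean_f1(scores):
    """Unweighted average of the per-class f1-scores."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Invented toy labels for a binary kelp / non-kelp experiment.
y_true = ["kelp", "kelp", "kelp", "non-kelp", "non-kelp", "non-kelp", "non-kelp", "non-kelp"]
y_pred = ["kelp", "kelp", "non-kelp", "kelp", "non-kelp", "non-kelp", "non-kelp", "non-kelp"]
scores = per_class_scores(y_true, y_pred)
print("kelp precision/recall:", scores["kelp"][:2])
print("mean f1:", round(mean_f1(scores), 3), "accuracy:", accuracy(y_true, y_pred))
```

Since every class contributes equally to the mean f1-score, a flat classifier that scores close to zero on many rare classes obtains a very low mean even when its overall accuracy is moderate.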

3.7 Kelp Coverage Analysis

We extended our method to estimate kelp cover for the Rottnest Island dataset. Kelp cover estimated from the annotations generated by our proposed method was compared to the cover based on expert classification (Figure 6; Table 6). Scatter plots were generated for each of the five sites and for all the data included in the 2013 test set. An important application of our proposed method is to estimate the population trends of kelp across spatial and time scales. To accomplish this task, we split the Rottnest Island data into sites and trained a classifier on this basis instead of years. The three sites from the north constitute the training set and the two southern sites form the test set.

The first sub-plot in Figure 6 shows kelp coverage for all of the data included in the test set. The slope of the line generated by linear regression is very close to the ideal case. This highlights the robustness of our proposed algorithm. The remaining sub-plots show kelp coverage for each of the five sites. These sub-plots show a good agreement between the annotations generated by our proposed method and the annotations provided by the human experts (Table 6). It is important to note that the DRF classification seems to over-estimate kelp cover at high percentages of cover and to under-estimate kelp cover at lower ones.
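The comparison described above can be sketched as follows: per-image cover is taken as the percentage of annotated points labelled kelp, and the estimated cover is regressed on the expert cover with ordinary least squares. The function names and the numbers are invented for illustration, and the coefficient of determination is included on the assumption that it is the goodness-of-fit value reported for each sub-plot.

```python
def percent_cover(point_labels, target="kelp"):
    """Percentage of annotated points in one image that carry the target label."""
    return 100.0 * sum(1 for label in point_labels if label == target) / len(point_labels)


def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# One image with 50 annotated points, 20 of them kelp (invented).
print(percent_cover(["kelp"] * 20 + ["sand"] * 30))

# Invented per-image covers (%): expert on the x axis, classifier on the y axis.
expert_cover = [10.0, 30.0, 50.0, 70.0, 90.0]
estimated_cover = [8.0, 27.0, 52.0, 75.0, 96.0]
slope, intercept, r_squared = linear_fit(expert_cover, estimated_cover)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

A slope of 1 with an intercept of 0 corresponds to the perfect-estimation line; a slope above 1 combined with a negative intercept would reproduce the pattern of higher estimates at high cover and lower estimates at low cover.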

Figure 6: Coverage estimation scatter plots for Rottnest Island Data for the DRF: Sibling Training experiment. Each dot indicates the estimated cover and the actual cover per image. The dashed green line represents the perfect estimation. The blue line on each plot is the linear regression model and the shaded area represents the 95% confidence intervals. The first plot is the aggregated plot of the remaining plots of the five sites included in the 2013 test data. The R² value for each sub-plot is shown in the respective title.

The estimated kelp coverage is not significantly different from the coverage calculated by the experts from the ground truth labels (Figure 7). This indicates the robustness of our proposed method for estimating kelp coverage. These results are beneficial to marine scientists since many surveys focus on estimating kelp coverage, which is an important metric to indicate the health of kelp forests.

Figure 8 shows the expert identified and estimated percent cover of kelp across years for sites 2 and 4. For site 2, a slight overestimation of kelp cover by the DRF classification is visible; however, no distinct trend of change across years is observable in either the manual or the automatic classification. The estimation of kelp cover for site 4, on the other hand, shows no overestimation and, as for site 2, no trend in kelp cover over the years.

Figure 7: Expert identified and estimated kelp coverage for all five sites of Rottnest Island data for year 2013.

Site | Depth and Location | Expert Identified (%) | Estimated (%) | R²
1 | 15m North | 52.65 | 60.19 | 0.84
2 | 15m South | 64.64 | 71.23 | 0.87
3 | 25m North | 62.44 | 72.32 | 0.83
4 | 25m South | 49.24 | 49.78 | 0.89
5 | 40m North | 44.60 | 43.28 | 0.85
Table 6: Expert identified and estimated kelp coverage for all five sites of Rottnest Island data for year 2013 along with the R² values.

Figure 8: Expert identified and estimated kelp coverage for the two southern sites of the Rottnest Island data. Left: Site 2, Right: Site 4.

4 Discussion

The use of AUVs to survey marine habitats has allowed scientists to investigate remote locations such as off-shore and deep sites, which are beyond the limits of traditional SCUBA diving. Nonetheless, the availability of data for ecological analysis does not keep pace with the efficiency of image collection, because image classification is performed manually by marine experts and is therefore time consuming and costly. Additionally, manual classification has other disadvantages such as observer discrepancies and biases. Automated analysis of imagery is thus essential to fully benefit from the advantages of remote surveying technologies such as AUVs. In this study, we have addressed this problem by evaluating an automated, machine learning based image classification method using Deep Residual Features (DRF) for a key marine benthic species: the kelp Ecklonia radiata.

We have demonstrated that the image representations extracted from pre-trained deep residual networks can be effectively used for marine image classification in general and kelps in particular. These powerful and generic features outperform traditional off-the-shelf CNN features, which have already shown superior performance over conventional hand-crafted features mahmood2016coral ; razavian2014cnn . We have demonstrated that the sibling and inclusive hierarchical training methods further enhance performance when compared to flat multi-class classification methods. Furthermore, estimations of kelp cover by automated DRF classification closely resemble those of manual expert classifications with the added advantage of faster processing times. This work provides evidence that automatic annotations may save resources and time while providing effective estimates of benthic cover.

One of the many challenges in estimating benthic cover through image analysis is the large amount of time required to manually classify the imagery. The average time for manual annotation with 50 sample points per image is 8 minutes, so a trained marine expert can annotate up to 8 images per hour. The proposed method is significantly less time consuming: it achieves an annotation rate of 1800 images per hour using an Nvidia Titan-X GPU, which is approximately 225 times faster than manual annotation by experts. Nonetheless, note that the proposed machine learning algorithm only classifies ‘kelp’ vs ‘non kelp’. Although it is faster, it is not yet trained to classify the 145 potential benthic classes. This paper evaluates the technique for a single class and presents a way forward to develop the methodology for other classes and faster processing times, which will allow scientists to promptly analyze changes in benthic community composition.
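The speed-up quoted above follows directly from the two annotation rates, as the short calculation below shows. The survey size of 10,000 images is an invented example, and the variable names are illustrative.

```python
MANUAL_MINUTES_PER_IMAGE = 8      # 50 sample points per image (from the text)
MANUAL_IMAGES_PER_HOUR = 8        # rounded rate stated in the text
AUTOMATIC_IMAGES_PER_HOUR = 1800  # reported rate on one GPU

# Rate implied by 8 minutes per image, before rounding.
unrounded_manual_rate = 60 / MANUAL_MINUTES_PER_IMAGE

speedup_rounded = AUTOMATIC_IMAGES_PER_HOUR / MANUAL_IMAGES_PER_HOUR
speedup_unrounded = AUTOMATIC_IMAGES_PER_HOUR / unrounded_manual_rate

survey_images = 10_000  # hypothetical survey size
manual_hours = survey_images / MANUAL_IMAGES_PER_HOUR
automatic_hours = survey_images / AUTOMATIC_IMAGES_PER_HOUR

print(f"manual rate from 8 min/image: {unrounded_manual_rate} images/hour")
print(f"speed-up with the rounded rate: {speedup_rounded:.0f}x")
print(f"speed-up with the unrounded rate: {speedup_unrounded:.0f}x")
print(f"{survey_images} images: {manual_hours:.0f} h manual vs {automatic_hours:.1f} h automatic")
```

The factor of 225 corresponds to the rounded manual rate of 8 images per hour; with the unrounded 7.5 images per hour the ratio would be 240, so the stated figure is slightly conservative.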

This method was also applied to a dataset to compare kelp coverage for multiple locations at different depths and from different years at Rottnest Island. The patterns observed showed differences in percent cover of the kelp Ecklonia radiata between sites (higher cover at shallower sites and lower cover at deeper sites) and no considerable change in kelp cover across years. These trends were similar to those observed in the manually classified data, once more confirming the usefulness of automated image classification methods. Interestingly, our results contrast with the trends of significant and continuous kelp decline since the 2011 marine heat wave in the region, reported in the literature wernberg2016climate for locations on the coast near Rottnest at depths of less than 15 meters. Kelp growing offshore and in deeper locations appears not to be impacted to the same extent by ocean warming as kelp on coastal shallow reefs. Note that further detailed analysis of our data is needed to confirm this. This emphasizes the importance of AUV surveys, since they provide information on offshore and deep locations which may be influenced by different factors to their inshore counterparts. The use of automated image analysis for processing AUV images will allow us to compare the patterns observed in deep and remote locations with patterns identified in shallow and inshore sites from readily available datasets obtained with SCUBA diving techniques.

Although the proposed DRF classification method allowed us to compare kelp cover at different sites and across different years, with only marginal differences from the estimates based on manual annotations, there were some errors associated with the proposed technique. We observed an over-prediction of kelp at high percentage cover and an under-prediction at low cover. Nonetheless, the over-prediction was smaller when the data was divided per site, and at some sites (4 and 5) it was negligible. Overall, the estimated kelp cover closely resembles manual classification and, taking into consideration the cost effectiveness of automated DRF classification methods, the error is inconsequential for AUV surveys, which target kelp populations over large spatial and temporal scales.

A comparison of the best overall accuracies of hierarchical classification across the two datasets shows that the Benthoz15 dataset has better classification accuracy than the Rottnest Island dataset: the absolute difference in classification accuracy is 12.8 percentage points in favour of Benthoz15. This substantial difference is possibly due to the high presence of the brown algae Scytothalia dorycarpa in the Rottnest Island data. Scytothalia dorycarpa is very similar to kelp in appearance and usually occurs in areas of the sea floor with high cover of kelp. Therefore, marine scientists may misclassify it as kelp in poor quality images. This misclassification is possible if the point falls on the edge of Scytothalia dorycarpa, where the boundary between the two species is not clear. The expert misclassification of Scytothalia dorycarpa as kelp may also explain the under-prediction of kelp by the DRF classification method at high percentage cover. The under-prediction of the automated classification is actually an overestimation of the kelp cover by the manual annotation method. The subjectivity in the classification is removed by the automated analysis, which uses several features to classify kelp. Figure 9 illustrates the similarity in appearance of these two species.

Poor quality images (low light and resolution) will also affect the manual classification of other classes of algae such as ‘turf matrix’, ‘fine branching red algae’ or other canopy forming brown algae. These and other algae classes are not as common as kelp at the sites surveyed at Rottnest Island. Thus, misclassification associated with manual annotations may also explain the over-prediction of kelp at low percentage covers. At low cover of kelp, a turf and foliose matrix of red algae occurs on the rocks. In areas of low kelp cover it is easy for an expert to distinguish kelp from other classes, but, perhaps due to the imbalance of the data used for training the classifier, other classes are sometimes classified as kelp, resulting in over-prediction by the DRF classification method. These issues highlight the need for larger training datasets for deep learning based automatic annotation. Extensive and comprehensive training sets will allow for better classifier training and give the opportunity to increase the amount of biota classified automatically (e.g. other algae species, corals, sponges, invertebrates such as sea urchins and lobsters). Future work will explore multi-class classification of marine species across diverse benthic habitats, so that methods based on deep learning algorithms can be applied to numerous ecological problems that include other marine species. Scientists who use data extracted from image classification should keep these considerations in mind when manually annotating images, since these datasets are extremely valuable for deep learning based automatic classification.

Figure 9: An example image from Rottnest Island Dataset with manual annotations showing similarity in appearance between Scytothalia dorycarpa (green) and the kelp Ecklonia radiata (blue).

5 Conclusion

The aim of this study was to investigate deep learning techniques for automatic annotation of kelp species in complex underwater scenery. Towards this end, we proposed a Deep Residual Features (DRF) based method to carry out this task and showed that it outperformed the widely adopted off-the-shelf CNN based classification. We also established that hierarchical classification with the sibling method gave superior results compared to the flat multi-class approach, with the added advantage of faster training times. Our results suggest that the proposed automatic kelp annotation method can significantly reduce the number of human-hours spent on manual annotation. Additionally, our proposed method can enhance the effectiveness of AUV monitoring campaigns by facilitating the early detection of changes in the population of key species through rapid image processing times, as demonstrated with examples from the Rottnest Island dataset. To conclude, the proposed DRF based automatic annotation of benthic images is to this date the most accurate machine learning technique for estimation of kelp cover.


Acknowledgements

This research was partially supported by Australian Research Council Grants (DP150104251 and DE120102960) and the Integrated Marine Observing System (IMOS) through the Department of Innovation, Industry, Science and Research (DIISR), National Collaborative Research Infrastructure Scheme. The authors also acknowledge NVIDIA for providing a Titan-X GPU for the experiments involved in this research. There are no conflicts of interest to disclose.

References



  • (1) S. C. Doney, M. Ruckelshaus, J. E. Duffy, J. P. Barry, F. Chan, C. A. English, H. M. Galindo, J. M. Grebmeier, A. B. Hollowed, N. Knowlton, et al., “Climate change impacts on marine ecosystems,” Annual Review of Marine Science, vol. 4, 2012.
  • (2) F. E. Moy and H. Christie, “Large-scale shift from sugar kelp (Saccharina latissima) to ephemeral algae along the south and west coast of Norway,” Marine Biology Research, vol. 8, no. 4, pp. 309–321, 2012.
  • (3) C. Fernández, “The retreat of large brown seaweeds on the north coast of Spain: the case of Saccorhiza polyschides,” European Journal of Phycology, vol. 46, no. 4, pp. 352–360, 2011.
  • (4) C. R. Johnson, S. C. Banks, N. S. Barrett, F. Cazassus, P. K. Dunstan, G. J. Edgar, S. D. Frusher, C. Gardner, M. Haddon, F. Helidoniotis, et al., “Climate change cascades: Shifts in oceanography, species’ ranges and subtidal marine community dynamics in eastern Tasmania,” Journal of Experimental Marine Biology and Ecology, vol. 400, no. 1, pp. 17–32, 2011.
  • (5) K. Ridgway, “Long-term trend and decadal variability of the southward penetration of the East Australian Current,” Geophysical Research Letters, vol. 34, no. 13, 2007.
  • (6) T. Wernberg, S. Bennett, R. C. Babcock, T. de Bettignies, K. Cure, M. Depczynski, F. Dufois, J. Fromont, C. J. Fulton, R. K. Hovey, et al., “Climate-driven regime shift of a temperate marine ecosystem,” Science, vol. 353, no. 6295, pp. 169–172, 2016.
  • (7) S. Bennett, T. Wernberg, S. D. Connell, A. J. Hobday, C. R. Johnson, and E. S. Poloczanska, “The ‘Great Southern Reef’: social, ecological and economic value of Australia’s neglected kelp forests,” Marine and Freshwater Research, vol. 67, no. 1, pp. 47–56, 2016.
  • (8) S. B. Williams, O. R. Pizarro, M. V. Jakuba, C. R. Johnson, N. S. Barrett, R. C. Babcock, G. A. Kendrick, P. D. Steinberg, A. J. Heyward, P. J. Doherty, et al., “Monitoring of benthic reference sites: using an autonomous underwater vehicle,” IEEE Robotics & Automation Magazine, vol. 19, no. 1, pp. 73–84, 2012.
  • (9) D. A. Smale, G. A. Kendrick, E. S. Harvey, T. J. Langlois, R. K. Hovey, K. P. Van Niel, K. I. Waddington, L. M. Bellchambers, M. B. Pember, R. C. Babcock, et al., “Regional-scale benthic monitoring for ecosystem-based fisheries management (EBFM) using an autonomous underwater vehicle (AUV),” ICES Journal of Marine Science, vol. 69, no. 6, pp. 1108–1118, 2012.
  • (10) N. Barrett, J. Seiler, T. Anderson, S. Williams, S. Nichol, and S. N. Hill, “Autonomous underwater vehicle (AUV) for mapping marine biodiversity in coastal and shelf waters: Implications for marine management,” in OCEANS 2010 IEEE-Sydney, pp. 1–6, IEEE, 2010.
  • (11) T. C. Bridge, R. Ferrari, M. Bryson, R. Hovey, W. F. Figueira, S. B. Williams, O. Pizarro, A. R. Harborne, and M. Byrne, “Variable responses of benthic communities to anomalously warm sea temperatures on a high-latitude coral reef,” PloS one, vol. 9, no. 11, p. e113079, 2014.
  • (12) A. D. Sherman and K. Smith, “Deep-sea benthic boundary layer communities and food supply: A long-term monitoring strategy,” Deep Sea Research Part II: Topical Studies in Oceanography, vol. 56, no. 19, pp. 1754–1762, 2009.
  • (13) R. Camilli, C. M. Reddy, D. R. Yoerger, B. A. Van Mooy, M. V. Jakuba, J. C. Kinsey, C. P. McIntyre, S. P. Sylva, and J. V. Maloney, “Tracking hydrocarbon plume transport and biodegradation at Deepwater Horizon,” Science, vol. 330, no. 6001, pp. 201–204, 2010.
  • (14) E. M. Marzinelli, S. B. Williams, R. C. Babcock, N. S. Barrett, C. R. Johnson, A. Jordan, G. A. Kendrick, O. R. Pizarro, D. A. Smale, and P. D. Steinberg, “Large-scale geographic variation in distribution and abundance of Australian deep-water kelp forests,” PloS one, vol. 10, no. 2, p. e0118390, 2015.
  • (15) M. S. A. Marcos, M. Soriano, and C. Saloma, “Classification of coral reef images from underwater video using neural networks,” Optics Express, vol. 13, no. 22, pp. 8766–8771, 2005.
  • (16) A. Denuelle and M. Dunbabin, “Kelp detection in highly dynamic environments using texture recognition,” in The Australasian Conference on Robotics & Automation (ACRA)(December 2010), 2010.
  • (17) M. Bewley, B. Douillard, N. Nourani-Vatani, A. Friedman, O. Pizarro, and S. Williams, “Automated species detection: An experimental approach to kelp detection from sea-floor AUV images,” in Proc Australas Conf Rob Autom, 2012.
  • (18) O. Beijbom, P. J. Edmunds, D. Kline, B. G. Mitchell, D. Kriegman, et al., “Automated annotation of coral reef survey images,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1170–1177, IEEE, 2012.
  • (19) A. Mahmood, M. Bennamoun, S. An, F. Sohel, F. Boussaid, R. Hovey, G. Kendrick, and R. Fisher, “Coral classification with hybrid feature representations,” in Image Processing (ICIP), 2016 IEEE International Conference on, pp. 519–523, IEEE, 2016.
  • (20) A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “CNN features off-the-shelf: an astounding baseline for recognition,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 512–519, IEEE, 2014.
  • (21) M. D. Stokes and G. B. Deane, “Automated processing of coral reef benthic images,” Limnology and Oceanography: Methods, vol. 7, no. 2, pp. 157–168, 2009.
  • (22) O. Pizarro, P. Rigby, M. Johnson-Roberson, S. B. Williams, and J. Colquhoun, “Towards image-based marine habitat classification,” in OCEANS 2008, pp. 1–7, IEEE, 2008.
  • (23) O. Beijbom, T. Treibitz, D. I. Kline, G. Eyal, A. Khen, B. Neal, Y. Loya, B. G. Mitchell, and D. Kriegman, “Improving automated annotation of benthic survey images using wide-band fluorescence,” Scientific reports, vol. 6, 2016.
  • (24) A. Mahmood, M. Bennamoun, S. An, F. Sohel, F. Boussaid, R. Hovey, G. Kendrick, and R. Fisher, “Automatic annotation of coral reefs using deep learning,” in OCEANS 2016 MTS/IEEE Monterey, pp. 1–5, IEEE, 2016.
  • (25) K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
  • (26) K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • (27) C. Cortes and V. Vapnik, “Support vector machine,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995.
  • (28) M. Bewley, N. Nourani-Vatani, D. Rao, B. Douillard, O. Pizarro, and S. B. Williams, “Hierarchical classification in AUV imagery,” in Field and service robotics, pp. 3–16, Springer, 2015.
  • (29) A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
  • (30) S. H. Khan, M. Hayat, M. Bennamoun, F. Sohel, and R. Togneri, “Cost sensitive learning of deep feature representations from imbalanced data,” IEEE Transactions on Neural Networks and Learning Systems, 2017.
  • (31) K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European Conference on Computer Vision, pp. 630–645, Springer, 2016.
  • (32) A. Mahmood, M. Bennamoun, S. An, and F. Sohel, “ResFeats: Residual network based features for image classification,” in Image Processing (ICIP), 2017 IEEE International Conference on, pp. 1597–1601, IEEE, 2017.
  • (33) M. Bewley, A. Friedman, R. Ferrari, N. Hill, R. Hovey, N. Barrett, O. Pizarro, W. Figueira, L. Meyer, R. Babcock, et al., “Australian sea-floor survey data, with images and expert annotations,” Scientific data, vol. 2, 2015.
  • (34) K. E. Kohler and S. M. Gill, “Coral point count with Excel extensions (CPCe): A Visual Basic program for the determination of coral and substrate coverage using random point count methodology,” Computers & Geosciences, vol. 32, no. 9, pp. 1259–1269, 2006.
  • (35) M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision, pp. 818–833, Springer, 2014.
  • (36) C. N. Silla Jr and A. A. Freitas, “A survey of hierarchical classification across different application domains,” Data Mining and Knowledge Discovery, vol. 22, no. 1-2, pp. 31–72, 2011.
  • (37) A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for MATLAB,” in Proceedings of the 23rd ACM international conference on Multimedia, pp. 689–692, ACM, 2015.
  • (38) R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of machine learning research, vol. 9, no. Aug, pp. 1871–1874, 2008.
  • (39) H. Azizpour, A. Sharif Razavian, J. Sullivan, A. Maki, and S. Carlsson, “From generic to specific deep representations for visual recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 36–45, 2015.