Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook

by   Morten Goodwin, et al.
Universitetet Agder

The deep learning revolution is touching all scientific disciplines and corners of our lives as a means of harnessing the power of big data. Marine ecology is no exception. These new methods provide analysis of data from sensors, cameras, and acoustic recorders, even in real time, in ways that are reproducible and rapid. Off-the-shelf algorithms can find, count, and classify species from digital images or video and detect cryptic patterns in noisy data. Using these opportunities requires collaboration across ecological and data science disciplines, which can be challenging to initiate. To facilitate these collaborations and promote the use of deep learning towards ecosystem-based management of the sea, this paper aims to bridge the gap between marine ecologists and computer scientists. We provide insight into popular deep learning approaches for ecological data analysis in plain language, focusing on the techniques of supervised learning with deep neural networks, and illustrate challenges and opportunities through established and emerging applications of deep learning to marine ecology. We use established and future-looking case studies on plankton, fishes, marine mammals, pollution, and nutrient cycling that involve object detection, classification, tracking, and segmentation of visualized data. We conclude with a broad outlook of the field's opportunities and challenges, including potential technological advances and issues with managing complex data sets.




1 Introduction

Marine ecosystems are complex, highly diverse, and productive, providing renewable resources to a growing human population. At the same time, the oceans are particularly sensitive to and impacted by anthropogenic stressors (Antão et al., 2020). As such, the scientific community strives to deliver up-to-date information about the state of marine ecosystems so that management decisions are well-informed. Ideally, such decisions use ecosystem-based management (EBM) approaches to preserve ecosystem health and productivity while allowing appropriate human use. EBM is especially relevant in densely populated coastal areas. During this period of rapid environmental change, EBM requires researchers to track ecological change and critical events when, and not well after, they occur. Fortunately, technological developments in observation methods have provided ecologists with a range of new tools for obtaining vast amounts of data from marine ecosystems over the last couple of decades. These include high-end cameras, echo sounders, and hydrophones, combined with various sensors to measure environmental parameters. Researchers can attach such technologies to cabled observatories or static rigs to assess temporal dynamics, or remotely or autonomously operated vehicles to evaluate spatial variability. However, because these technologies can produce an unprecedented amount of data, which has traditionally required manual processing, ecologists may be reluctant to adopt them as an alternative or supplement to traditional sampling techniques. For example, using traditional gear (e.g., nets and traps) to assess the abundance of fish has been an established sampling technique for centuries and is still used today. These methods are straightforward and efficient for manual data handling: as soon as the fish are caught and counted and the data are recorded, the data can be analyzed by the researchers. On the other hand, detecting and counting fish with cameras is less destructive to animals and habitat, provides a temporal dimension to the collected data, allows researchers to observe behaviour of animals and habitat use, and often provides a more representative estimate of species diversity and relative abundance (Bacheler et al., 2017). However, extracting all of this information from videos manually is a laborious task. Thus, automating this step would undoubtedly encourage more fish biologists to use cameras for data collection.

Many diverse fields of research are undergoing rapid change due to advances in the use of artificial intelligence (AI) for data interpretation. AI offers fast and accurate analysis of the large volumes of data collected by sensors, cameras, and other observation technologies. Off-the-shelf algorithms can now, with high precision, find, count, and classify organisms from digital images and real-time video (Knausgård et al., 2021; Lopez-Guede et al., 2020) and detect cryptic patterns in noisy images or acoustic data (Weinstein, 2018). An increasing number of marine ecologists embrace this opportunity, yet initiating collaborations across ecological and data science disciplines can be challenging for several reasons. First, exchanging the information needed to start a project between an ecologist and a computer scientist can involve a steep learning curve, because knowledge barriers and field-specific jargon can cloud otherwise fruitful discussions and halt progress. Second, ecologists unfamiliar with AI may not be aware of the opportunities available to address a particular problem. Before an ecologist approaches an AI expert, they may need to know about the possibilities and limitations of AI for the task at hand, how to prepare and annotate data sets, and what information to provide so that the computer scientist can identify the best AI method. Meanwhile, before advising on the possibilities, the computer scientist may find it challenging to understand the underlying ecological question, the data and its inherent variability/noisiness, how it is categorized, and what level of accuracy is needed. Thus, substantial investment in the interdisciplinary partnership is required in order to achieve a common understanding.

This paper aims to bridge the gap between marine ecologists and computer scientists to expedite the initial stages of collaboration. Here, we provide insight into the most popular and suitable AI techniques for ecological data analysis and describe technical concepts in plain language. AI is a general term referring to any artificial intelligence technique that can solve a complicated problem (Goodwin, 2020; Russell and Norvig, 2002). We focus on applicable and well-used methods, namely deep neural networks (DNNs), synonymous with “deep learning”, and learning with supervision (supervised machine learning). Supervised learning requires algorithms to be presented with datasets that have been labeled with accurate information on the region of interest, for example the presence or location of known species, objects, or sound. The algorithms learn to associate the labels with the examples (Christin et al., 2019). With enough training material, the algorithms can produce models that automatically recognize and identify new and unseen examples in other datasets without the need for new labels (LeCun et al., 2015). One of the biggest challenges for supervised learning is the demand for a large, labeled training dataset of sufficient quality to achieve high accuracy (Malde et al., 2020; Beyan and Browman, 2020). Close collaboration between ecologists and computer scientists would likely facilitate and accelerate the dedicated effort required to collect and label representative datasets (Weinstein, 2018; Schneider et al., 2019; Beyan and Browman, 2020).

This paper is organized as follows: Section 2 summarizes popular deep learning tools relevant for ecologists and explains standard AI terms. Section 3 describes three cases where AI has been applied to ecological data, namely, fish detection, classification, and tracking in underwater videos; image-based analysis for plankton monitoring; and acoustic monitoring of whales. Deep learning ecology research is not limited to these cases and we are confident that the deep learning toolset will have an even greater impact on emerging research areas in marine ecology. Therefore, Section 4 continues with four case studies where we see potential for deep learning to make an essential impact, including individual re-identification of fish using unique patterns; analysing fish vocal communication to understand mating behavior; ghost fishing gear detection; and determining the ecological functions of fishes. Finally, in Section 5, we discuss technological advances, complexity in data, and acceleration of data collection and labelling through open-source approaches.

Case study | Object detection: when to use | Object detection: possible method | Classification: when to use | Classification: possible method | Segmentation: when to use | Segmentation: possible method
Fish and species counting (C1) | Images with 1+ fish | YOLO with Ap A | Images with 0/1 fish / species | Squeeze-and-excitation with Ap B | Outlines of 1+ regions wanted | R-CNN with Ap C
Plankton analysis (C2) | Images with 1+ organisms | YOLO with Ap A | Images with single organisms | CNN with Ap B | Images with single organism (morphology) | R-CNN with Ap C
Marine bioacoustics (C3) | Spectrograms with 1+ calls | R-CNN with Ap D | Spectrograms with 1/0 calls | CNN with Ap F | Separation for 0+ calls in time series | RNN with Ap G or transformer with Ap G
Re-identification in fish populations (C4) | Images with 1+ fish | YOLO with Ap A | Images with 0/1 individuals | CNN with Ap B | Images with fish outlined | R-CNN with Ap C
Fish vocal communication (C5) | Spectrograms with 1+ individual calls | R-CNN with Ap D | Spectrograms with 1/0 calls | CNN with Ap F | Separation for 0+ calls in time series | RNN with Ap G or transformer with Ap G
Ghost fishing gear detection (C6) | Images with 1+ gear | R-CNN with Ap A | Images with 0/1 gear | CNN with Ap B | Areas with partially dissolved fishing nets | R-CNN with Ap C
Carbon cycling by fish (C7) | Images with 1+ life processes | R-CNN with Ap A | Images with 0/1 life processes | CNN with Ap B | Images or video with moving processes | YOLO with Ap C
Table 1: Machine learning approaches to ecological data applied (green) or explored (blue) in the case studies (C1-C7), and some alternatives (orange). Grey cells indicate no added benefit to using that approach for the task. Approaches: (Ap A): One label per region of interest, (Ap B): One label per image, (Ap C): Pixel-wise segmentation, (Ap D): Ground truth spectrograms with labeled region of interest, (Ap E): Labelled spectrograms with regions of interest, and (Ap F): Segmented time series data.

2 A non-comprehensive review of deep learning

AI is a broad concept, but the most commonly applied technique is machine learning. Machine learning is a set of algorithms that learn from an environment containing data such as images. The most common AI approach used in biology is supervised learning, which is when the data are labeled or categorized so that the algorithms can learn from the data. Conversely, unsupervised learning is when algorithms do not use labelled data but, instead, learn structure from the data themselves. A third family, reinforcement learning, learns behaviour that is reinforced when the algorithms continuously interact with an environment, such as playing a board game. Figure 1 illustrates the overall procedure for training and application of AI with supervised learning.

Figure 1: The workflow of AI based strategies. (1) The (yellow) column illustrates the training phase, in which labeled data is used to train the AI algorithm. (2) The first row (blue) shows that the performance of the trained AI is evaluated using a validation data set and the AI algorithm may be updated and refined in this process. (3) The bottom row (green) shows the application phase, using the AI on a test data set once the training and testing are completed.

Among the most popular and widely used AI algorithms are the family of artificial neural networks. A neural network is a set of human brain-inspired networks with artificial neurons and synapses that are trained to approximate an external function, typically mapping from input data (e.g., images) to labeled values or categories (e.g., classes). A neural network consists of a layer of input neurons connected to the input data and a layer of output neurons mapping to the values or categories to be predicted. It is common to have layers between the input and output, which are referred to as hidden layers. When a network has more than one hidden layer, it is referred to as deep learning (DL) or a deep neural network (DNN).

Neural networks, especially DL, are the go-to machine learning approach for categorizing and recognizing images and sound data. These techniques have won numerous pattern recognition and machine learning competitions for image and sound analytics (Tessler et al., 2017; Schmidhuber, 2015). In recent years, DL has become the predominant analytical technology in many domains, including health (Esteva et al., 2019), customer evaluation (Lessmann et al., 2019), and crisis management (Ben Lazreg et al., 2019a, b). Aquatic ecology has experienced the early stages of the same shift, where object detection and semantic segmentation are being used to identify and locate marine species in raw images, videos, and audio recordings for the purpose of species (Knausgård et al., 2021) and individual (Bogucki et al., 2019) classification, and to quantify abundance. Despite the domination of deeper over more shallow neural networks, there is no need to employ DL models exclusively. Depending on the complexity and the nature of the problem, various models with different depths can be utilized. For example, Kohonen networks, which consist of only one layer, are shallow but useful for biology-related classifications and visualisation (Suryanarayana et al., 2008). In addition to identifying and counting fish and other marine animals, there is enormous potential to apply DL to a wide range of data in coastal ecology (Grasso et al., 2019; Marre et al., 2020). In the following subsections, we will briefly go through the basics of DNN. A glossary of AI terms is summarized in Table 2.

Term | Definition
Accuracy | Fraction of correct classifications
Activation | A non-linear mathematical operation. It is often used to approximate “turning on” or “turning off” an artificial neuron
Area Under the Curve (AUC) | A summary of the ROC curve that shows the capacity of a supervised learning algorithm to distinguish between classes. A perfectly performing algorithm will have an AUC of 1
Attention | A deep learning technique to learn and indicate which sequence in a time series or which region of an image to pay attention to
Classification | Categorisation of input data into classes
Convolution | Mathematical operation that expresses the amount of overlap of one function as it is shifted over another function
Convolutional neural network | A neural network with convolutions, typically used for image classification
Deep learning / deep neural network | A neural network with more than one hidden layer
Encoder-decoder | A neural network that encodes the input data into an internal representation, followed by a neural network that decodes the internal representation, typically to a human-readable format
False negative rate | The rate of wrongly predicted negative values
False positive rate | The rate of wrongly predicted positive values
Feature extraction | An operation that selects or extracts values into features, typically from unprocessed data
Features | Valued characteristics, typically numeric or structural, representing the input data
Hidden layer | Any layer of neurons in between the input and the output layers
Hyper parameters | User-controlled parameters that influence the model, such as the number of layers
Layer | A set of neurons that takes data as input and typically does a combination of linear (synapses) and non-linear operations (activation)
Loss | A real number indicating the incorrectness of a single prediction; it is typically used to adjust the weights of the neural network
Machine learning | Trainable computer programs that learn the representation of data with an aim to predict never-before-seen data
Model | A representation of what a machine learning program has learned. In a neural network, the model is a combined structure consisting of the network and learned weights of the algorithm
Neural network | A brain-inspired machine learning technique with an input layer (features), one or more hidden layers, and an output layer (predictions)
Neuron | A node that combines input data with learned weights and provides a single output
Object detection | Recognition of the presence of an object instance in a location or area
Overfitting | When a model closely predicts the training data but fails to fit testing data
Weights | Real values in a neural network in which each parameter individually prioritises each data value, and that are updated in the learning process
Pattern | Common trends and regularities in the data, such as statistical trends, often unique for one category
Pattern recognition | Methods to detect patterns in input data
Precision | The frequency of true positives among all positive predictions
Receiver operating characteristic curve (ROC) | A graph displaying a supervised algorithm’s performance at all classification thresholds. Typically, it shows the relationship between the rate of true predictions and the rate of false predictions
Recall | The frequency of correctly identified positive values from all positive values in a data set
Recurrent network | Neural networks whose connections between nodes form a directed graph, used to detect patterns that often occur over a time series
Semantic segmentation | The process of partitioning images into labelled regions
Supervised learning | Machine learning that maps an input to a specific, often labelled, output
Synapses | Learned weights on the input data for a layer, i.e., how to prioritise the input features
Labelled training data | Data used for training the model. It is kept separate from testing and validation data
Testing data | Data used for independently evaluating the trained model. It is kept separate from training and validation data
True predictions | Model output that corresponds with the correct values
Underfitting | When a model has not reliably learned the patterns of the data
Unsupervised learning | Machine learning that finds patterns in unlabelled data
Validation data | Data used for verifying the model and tuning the hyper parameters during training
Table 2: Glossary table.

2.1 Deep Neural Networks

All neural networks are function approximators; they mimic the function presented in the training data and adapt to this function through an optimization process. During training, the network's weights, the many real values that connect the neurons and are followed by activations, are updated to match the training data. In more detail, the real-valued difference between the predicted output, $\hat{y}$, and the expected output, $y$, is referred to as the loss, which guides the training. For example, $y$ can be a list of image categories where each value in the vector relates a category to an image, and $\hat{y}$ is then the neural network’s predicted image categories. If the neural network is able to correctly predict image categories, $\hat{y}$ will be identical to $y$ and the loss will be zero. The goal of the training process is, generally speaking, to minimize the loss. However, the loss minimization should be done with care, since a small loss on the training data may indicate that DL has learned specific patterns for each example rather than general trends in the data (i.e., overfitting). To check for overfitting, separate validation and testing data sets are normally employed to independently evaluate the algorithm’s performance.

A properly trained network has active or inactive neurons that jointly match the training data and minimize the loss. This is analogous to a series of virtual dials that can be turned completely on, completely off, or somewhere in between, indicating the relevance of each feature. During training, the loss is propagated backward through the network so that each weight is adjusted in proportion to its contribution to the loss, scaled by a learning rate that is set as a hyperparameter. Hence, each neuron’s influence on the loss is matched with a corresponding adjustment of weights, and the adjustment is kept small by the learning rate. When the loss is propagated backwards, the dials are turned slightly in the direction that decreases the loss.
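The dial-turning picture can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a single neuron with one weight, a squared-error loss, a learning rate of 0.1, and toy data. None of these choices are taken from the case studies in this paper, and the function name `train_single_weight` is hypothetical.

```python
# Minimal sketch: one "dial" (weight) adjusted by gradient descent.
# Assumptions (illustrative only): a single linear neuron y_hat = w * x,
# a squared-error loss, and a learning rate of 0.1.

def train_single_weight(examples, learning_rate=0.1, epochs=50):
    """Fit y_hat = w * x to (x, y) pairs by repeatedly nudging w."""
    w = 0.0  # start with the dial turned off
    for _ in range(epochs):
        for x, y in examples:
            y_hat = w * x                      # forward pass: prediction
            gradient = 2.0 * (y_hat - y) * x   # d(loss)/dw for loss = (y_hat - y)^2
            w -= learning_rate * gradient      # small step that decreases the loss
    return w


# Toy data generated by the rule y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]
learned_w = train_single_weight(data)
print(round(learned_w, 3))
```

A real network repeats the same kind of update for every weight at once, with the gradients computed layer by layer from the output backwards.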

A neural network is considered shallow if it has one layer of input neurons, one layer of hidden neurons, and one output layer. The same network would be considered deep if it had more than one hidden layer, and very deep if it had more than 10 hidden layers. Any neuron that is not at the input layer combines a weighted sum from active neurons in the previous layer. The sum is then followed by an activation function for the next layer of neurons. Despite popular belief, the depth of the DL may not be proportional to the difficulty of the problem that it can solve. It is not always true that deeper networks solve more complicated issues than shallower networks. Some problems can be solved with shallow networks, but in many cases very deep models empirically outperform the shallow ones for image and sound categorisations. For example, a type of neural network called Residual Networks (sometimes abbreviated to ResNets) often has 18, 34, 50, or 101 layers. Usually, the deeper networks perform better image classification, but occasionally the most shallow network, with 18 layers, is sufficient and even more accurate than the deeper networks (Aloysius and Geetha, 2017).
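The weighted sum followed by an activation can be written out directly. The sketch below is illustrative only: the sigmoid activation, the layer sizes, and all weight values are assumptions chosen by hand rather than learned, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs, then activates."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs


def network_forward(inputs, layers):
    """Pass the data through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden -> 2 hidden -> 1 output.
# With two hidden layers it counts as "deep" under the definition above.
layers = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1]),
    ([[0.7, -0.6], [-0.3, 0.8]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
prediction = network_forward([1.0, 0.5, -1.0], layers)
print(prediction)
```

Adding depth here means appending more `(weights, biases)` pairs to `layers`; nothing else in the forward pass changes.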

A notable limitation of DL is its dependency on vast amounts of training data. The data requirement typically becomes a significant problem in supervised learning, as a successful application in most cases depends on large quantities of human-classified training examples. This challenge is especially pronounced in the marine biology domain, as the limited capacity of trained experts makes extensive and quality-assured labeled training databases hard to acquire. A beneficial property of deep unsupervised learning is its independence from labeled data. However, due to its unsupervised nature, the application area is rather limited in the marine biology domain and has mostly been confined to finding anomalies through re-identification (Dargan et al., 2019; Ferreira et al., 2020a) and data clustering.

Deep semi-supervised learning has emerged in recent years to mitigate the limitations of supervised and unsupervised learning. Semi-supervised approaches combine training on a small amount of labeled data with a subsequent training phase using large amounts of unlabeled data. In applications where there is often a lack of human-classified training data, semi-supervised learning is especially useful.

In the paragraphs below, we summarize typical problems relevant to marine ecology where DNN can be utilized as a promising solution.

2.1.1 Image classification

DNN is the de facto standard for machine vision, such as the categorization of images and video files. The most prominent approach among various DNNs is Convolutional Neural Networks (CNNs), which extract relevant features of an image for subsequent classification by a neural network through a series of two-dimensional mathematical convolutional operations with learnable filters of typical sizes 3×3, 5×5, and 7×7 applied to the image pixels. A CNN trained for classification of images finds the function that best maps the input of pixels to a class, e.g., presence of a fish, plankton or a rope in the photo (Figure 2). Note that the CNN generates small image blocks from the convolutions of overlapping data within each image. A CNN categorizes the image but does not output in which part of the image the object is located.
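To make the convolution step concrete, the following sketch slides one 3×3 filter over a tiny 5×5 image. It is a simplified illustration under stated assumptions: the filter values are set by hand (in a CNN they would be learned), there is no padding, the stride is 1, and the function name `convolve2d` is hypothetical. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the overlapping pixels and filter values, then sum.
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-set 3x3 filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is large where the filter overlaps the edge and zero where the image is uniform, which is the sense in which a filter extracts a feature.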

The first popularised CNN models were LeNet-1 to LeNet-5 (LeCun et al., 1995), which contain all the basic building blocks still used today. A major advancement came in 2012 with AlexNet (Krizhevsky et al., 2012), which achieved an error rate of about 15%, compared to about 26% for the competing non-neural-network architectures. These early models suffered from vanishing gradients, meaning that the signal used to update the weights gradually faded as additional layers were added. This limitation hindered the development of DNN and the performance of the DL models suffered. Later, major innovations included: 1) inception networks (Szegedy et al., 2015), which utilized parallel convolutions of different sizes, 2) residual architecture (He et al., 2016), which added skip connections to allow for an image to both be processed by convolutions and skipped through the network, and 3) Squeeze-and-Excitation networks (Hu et al., 2018), which introduced a method to add additional parameters to each convolutional block so that the model could adjust the weight of each block. Each of these innovations has enabled larger, more complex networks.

2.1.2 Object detection and semantic segmentation

Object detection extends CNN models by detecting regions of interest in the image (Figure 2). In addition to classification, a network trained for object detection can output the $x$- and $y$-location, width, and height of the object of interest. This information is then used to draw a bounding box around the object to be classified, e.g. a fish. In this way, a single image can be divided into multiple regions by generating several bounding boxes, allowing for many classes to be classified within a single image. In practice, this means that we can detect and count objects in an image or a video, e.g. the number of fish. The approach has been extended even further by pixel-wise detection and classification of the entire image. This approach scales down the image with convolutions and pooling operations, followed by reverse order scaling-up of the same image. This is known as an encoder-decoder architecture (Girshick et al., 2014) and allows for categorisation of every region in the image at a high level of detail.
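Once a detector has produced boxes, counting is a matter of bookkeeping. The sketch below assumes a hypothetical output format of (label, x, y, width, height, confidence) per detection, with (x, y) taken to be the top-left corner of the box; real detectors differ in their conventions, and the confidence threshold of 0.5 is an illustrative choice, not a recommendation from this paper.

```python
from collections import Counter

# Hypothetical detector output for one video frame:
# (label, x, y, width, height, confidence)
detections = [
    ("cod", 34, 50, 120, 60, 0.91),
    ("cod", 210, 80, 100, 55, 0.48),
    ("trout", 400, 130, 90, 40, 0.87),
]


def count_objects(detections, min_confidence=0.5):
    """Count detections per class, ignoring low-confidence boxes."""
    kept = [d for d in detections if d[5] >= min_confidence]
    return Counter(d[0] for d in kept)


def box_corners(x, y, width, height):
    """Convert (x, y, width, height) to (left, top, right, bottom),
    assuming (x, y) is the top-left corner of the box."""
    return (x, y, x + width, y + height)


counts = count_objects(detections)
print(counts["cod"], counts["trout"])
print(box_corners(34, 50, 120, 60))
```

Lowering `min_confidence` trades fewer missed animals for more false detections, so the threshold is best chosen against labelled validation footage.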

2.1.3 Individual identification

A Siamese Neural Network (SNN) (Koch et al., 2015) is a type of DL model that contains two identical sub-networks with the same layers, hyper parameters, and weights. The neuron weight updates are mirrored and so can be used to find the similarity of the inputs by comparing vector features. An SNN allows us to detect if two images are the same, e.g., two faces are of the same person or two fish photos are of the same fish taken at a different time. Hence, an SNN can classify a new class without re-training the entire network. Other features include robustness to class imbalance (i.e., data is unequally distributed between classes) and learning efficiency in the semantic similarities between images. However, SNNs need more training data and longer computational time than competing networks.

When training an SNN, a typical loss function used to detect differences in input is a so-called triplet loss, in which the baseline input is compared both with a positive and a negative example. A perfectly trained SNN should have a zero loss for the positive example and a large loss for the negative example. For example, when detecting individual marine animals, a comparison between pictures of the same animal should have a small loss, while a comparison between pictures of two different individuals of the same species should have a much larger loss. This approach can be used to identify if two pictures include the same individual and verify whether an image consists of an individual that is not part of the training data.
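Comparing a baseline with a positive and a negative example is commonly formulated as a triplet loss on the feature vectors produced by the two sub-networks. The sketch below uses one common formulation, max(d(anchor, positive) - d(anchor, negative) + margin, 0) with Euclidean distance; the margin of 1.0 and the three-dimensional feature vectors are assumptions for illustration, and in practice the vectors would come from the trained network rather than being written by hand.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the
    negative by at least `margin`; positive otherwise."""
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Hypothetical feature vectors for three fish photos.
anchor = [0.9, 0.1, 0.3]        # baseline photo of fish A
same_fish = [0.8, 0.2, 0.3]     # another photo of fish A
other_fish = [-0.5, 0.9, -0.4]  # photo of a different individual

well_separated = triplet_loss(anchor, same_fish, other_fish)
confused = triplet_loss(anchor, other_fish, same_fish)
print(well_separated, confused)
```

Swapping the positive and negative examples, as in `confused`, shows the penalty the network would receive if it placed a different individual closer to the baseline than the same one.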

Figure 2: Examples of classification, object detection, and pixel-wise segmentation with illustrations of the techniques applied to fish images or audio files.

Figure 2 provides examples of classification, object detection, and segmentation and how they are typically evaluated.

2.1.4 Audio signal classification

Audio signal classification (ASC) is one of the classic and most challenging audio signal processing fields. In brief, ASC comprises capturing appropriate features from an audio sequence and employing these features to distinguish the class that the sequence is most likely to fit. Depending on the application’s domain, one may predict a global signal class with a unique label or a subset of the possible classes with multiple labels. Traditionally, finding appropriate features and designing a suitable classifier are configured as separate procedures. This approach has several drawbacks, e.g. the extracted features might not be optimal for the classification objective and certain features may require prior human knowledge, are difficult to describe precisely, and can be subjective and unstable. As mitigation, DNN-based approaches are developed to perform feature extraction jointly with classification.

To increase the modeling capability, DNNs with different structures are usually employed individually or jointly, such as multiple convolutional, feed-forward, and/or recurrent networks (Goodfellow et al., 2016). Feed-forward neural networks have one-way information flow and so do not have feedback loops, whereas recurrent neural networks (RNNs) contain loops. Due to the feedback loops, RNNs can use information from earlier steps in a sequence to influence how they process later steps. There are different variations of RNN, such as long short-term memory (LSTM) and gated recurrent units (GRUs). More recently developed sequence-modelling approaches include attention (Chaudhari et al., 2019; Phan et al., 2019) and transformer-based strategies (Tay et al., 2020; Moritz et al., 2020). The convolutional concept can be employed together with RNN and attention to improve the performance of the system. For example, an attention-based convolutional RNN model is utilized for environmental sound classification (Zhang et al., 2020). That mechanism adopts frame-level attention to learn discriminatory feature representations for classification. For audio signals that are time series in nature, RNN is especially popular. In some cases, audio is transformed into spectrograms, an image representation of the audio which can be classified using a DNN or CNN. These mechanisms, alone or in combination, can be utilized for audio classification tasks in marine ecology-related applications. Figure 2 gives examples of classification, object detection, and data point segmentation with CNN and RNN networks for audio categorisation.
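The effect of a feedback loop can be seen in a recurrent unit reduced to a single number. This is a deliberately tiny sketch, assuming one scalar hidden state, a tanh activation, and hand-picked weights; practical RNNs, LSTMs, and GRUs use vectors of hidden units and learned weights, and the function name `run_simple_rnn` is hypothetical.

```python
import math


def run_simple_rnn(sequence, w_input, w_recurrent, bias=0.0):
    """Process a sequence one step at a time, feeding the previous
    hidden state back in through the recurrent weight."""
    hidden = 0.0
    states = []
    for x in sequence:
        hidden = math.tanh(w_input * x + w_recurrent * hidden + bias)
        states.append(hidden)
    return states


# Toy "audio": one loud sample followed by silence.
signal = [1.0, 0.0, 0.0]

with_memory = run_simple_rnn(signal, w_input=1.0, w_recurrent=0.8)
without_memory = run_simple_rnn(signal, w_input=1.0, w_recurrent=0.0)
print(with_memory)
print(without_memory)
```

With the recurrent weight set to zero the unit behaves like a feed-forward neuron and forgets the loud sample immediately, whereas the recurrent version still carries a trace of it during the silent steps.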

2.2 Evaluation criterion

To evaluate the performance of a trained model, different metrics are utilized by the different approaches, such as accuracy, precision, and recall (Figure 3). Accuracy is the ratio of correct classifications to the total number of classifications. Precision for the positive predictions is the ratio of true positive predictions over the sum of true positive and false positive predictions. The same concept applies to the precision of negative prediction. Recall is the ratio of true positive predictions over the sum of true positive and false negative predictions. A result of a DL algorithm may be precise but not accurate when results are biased but with small variance. A DL algorithm is considered valid if it is both accurate and precise.

For example, if the expected output is 5 images of cod and 5 images of trout, and the predicted output correctly identifies all cod and only 4 of the trout, with one trout wrongly identified as cod, the algorithm is correct 9 out of 10 times, yielding an overall accuracy of 90%. In this example, the precision for trout is 100%, i.e. every image predicted as trout is actually a trout, but the precision for cod is only 5/6 ≈ 83%, i.e. of the six images predicted as cod, only five are actually cod. The recall for cod is 100%, i.e. the algorithm identifies all cod, but the recall for trout is 4/5 = 80%, i.e. the algorithm only identifies 80% of the trout.
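The cod and trout scenario can be checked by counting true positives, false positives, and false negatives directly. The sketch below uses only the Python standard library; the helper name `precision_recall` and the list encoding of the ten images are illustrative choices.

```python
from collections import Counter

# Ten images: 5 cod and 5 trout. All cod are recognised; one trout is called cod.
actual = ["cod"] * 5 + ["trout"] * 5
predicted = ["cod"] * 5 + ["trout"] * 4 + ["cod"]

def precision_recall(actual, predicted, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    pairs = Counter(zip(actual, predicted))
    tp = pairs[(positive, positive)]
    fp = sum(n for (a, p), n in pairs.items() if p == positive and a != positive)
    fn = sum(n for (a, p), n in pairs.items() if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
cod_precision, cod_recall = precision_recall(actual, predicted, "cod")
trout_precision, trout_recall = precision_recall(actual, predicted, "trout")

print(f"accuracy={accuracy:.2f}")
print(f"cod:   precision={cod_precision:.2f} recall={cod_recall:.2f}")
print(f"trout: precision={trout_precision:.2f} recall={trout_recall:.2f}")
```

Six images are predicted as cod but only five of them are cod, so cod precision is 5/6, while only four of the five trout are found, so trout recall is 4/5.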

The metric used for performance evaluation depends on the data. Accuracy is most suitable if the data set is balanced, meaning an approximately equal number of examples in each class, and where false positives and false negatives have similar implications. But if the data set is imbalanced, which is typical for ecological data (e.g., some species are more common than others), precision or recall are better. A high precision means that few of the positive predictions are false positives, whereas a high recall means that the model detects most examples of the class in the total data set. The F1 score, a unified metric, is the harmonic mean of precision and recall and therefore encompasses both the false positives and false negatives. A rule of thumb is: if in doubt, evaluate your algorithm with the F1 score.
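The effect of class imbalance on these metrics can be illustrated with a small calculation. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), which is equivalent to 2TP / (2TP + FP + FN). The survey counts and the two models below are hypothetical and chosen only for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), written in the equivalent form 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Hypothetical survey: 95 images of a common species and 5 of a rare one.
n_common, n_rare = 95, 5
n_total = n_common + n_rare

# Model A labels every image "common": the rare species is never detected.
accuracy_a = n_common / n_total
f1_rare_a = f1_score(tp=0, fp=0, fn=n_rare)

# Model B finds 4 of the 5 rare images but raises 2 false alarms on common ones.
accuracy_b = ((n_common - 2) + 4) / n_total
f1_rare_b = f1_score(tp=4, fp=2, fn=1)

print(f"Model A: accuracy={accuracy_a:.2f}, F1 for rare species={f1_rare_a:.2f}")
print(f"Model B: accuracy={accuracy_b:.2f}, F1 for rare species={f1_rare_b:.2f}")
```

The two models have nearly the same accuracy, yet only the F1 score for the rare class shows that Model A is useless for detecting the rare species.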

Figure 3: Evaluation metrics, accuracy, precision, and recall, for classifications and predictions.

2.3 Data

There is no universally right answer as to how much data is needed; generally, the more data, the better. Learning an intricate pattern requires more data than learning a simpler one. For example, a DL model that classifies an image as either a sea trout or another fish species with clear morphological differences, such as a cod, may achieve near-perfect separation with relatively few samples. However, more data are likely to be required for a model to learn to distinguish sea trout from a closely related species with similarities in appearance, such as salmon, simply because that is a more complex task to learn.

One way to mitigate a lack of data is to use an existing model with weights pre-trained on other data sources, such as the ImageNet database (Deng et al., 2009a). The typical approach is to first train with an available, sizeable dataset and subsequently train with a smaller but more relevant dataset. In this way, the learning algorithms find the general image patterns from a big dataset (e.g., shapes, species patterns, face patterns) and the individual differences from the smaller dataset.

For a classification or object detection task, the dataset needs to be labeled (sometimes referred to as annotated), usually by a human expert (e.g., an ecologist). The labeled data is often referred to as the $y$ vector. An accurate classifier algorithm should correctly map the input, known as the $x$ vectors (e.g., images), to the appropriate $y$ vector (the labels). The predicted labels are often referred to as the $\hat{y}$ vector, regardless of whether the predictions are correct ($\hat{y}$ matches $y$) or incorrect ($\hat{y}$ does not match $y$).

The labels for a classification task are distinct for each input variable, such as a species of fish for each image. This requires manual categorization and labelling of a large set of images. For object detection and semantic segmentation, the labels must also indicate where in the image the object of interest is located. In the case of audio input for RNN and CNN classification, the start and stop times of all events of interest must be labelled in order to segment the data into relevant categories. If object detection is used on spectrograms of audio, the frequency bands must also be labelled, encasing the contours of interest in the spectrogram. As with images, labeled datasets also exist for audio and can be used for training when data is otherwise limited (e.g., the DCLDE 2015 data set for baleen whale social calls (Huang et al., 2016)).
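As an illustration of what such labels can look like in practice, the sketch below defines two simple record types: a bounding box for an object in an image, and a time and frequency extent for an event in an audio recording. The class names, field names, file name, and values are hypothetical; real annotation tools each define their own formats.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Label for one object in an image: what it is and where it is (pixels)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

@dataclass
class AudioEvent:
    """Label for one sound event: start/stop times and the frequency band it occupies."""
    label: str
    start_s: float
    end_s: float
    low_hz: float
    high_hz: float

    @property
    def duration_s(self):
        return self.end_s - self.start_s

# One image with two labelled fish (hypothetical file name and coordinates).
image_labels = {
    "frame_0001.png": [
        BoundingBox("cod", 40, 60, 180, 140),
        BoundingBox("trout", 220, 30, 330, 95),
    ]
}

# One labelled call in a recording (hypothetical times and frequencies).
audio_labels = [AudioEvent("whale_call", 12.5, 14.0, 30.0, 90.0)]

print(len(image_labels["frame_0001.png"]), "objects labelled in the image")
print("call duration:", audio_labels[0].duration_s, "s")
```

For plain classification only the `label` field would be needed; the position fields are what object detection and segmentation add.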

A labeled dataset is divided into three separate datasets, as illustrated in Figure 1: training, validation, and testing. The training set is used to train the model, meaning that the algorithm tries to find a mapping from the training set’s input vectors $x_{\text{train}}$ to the training set’s correct labels $y_{\text{train}}$. The validation set is then used to check whether the algorithm can map the validation set’s input vectors $x_{\text{val}}$ to the validation set’s correct labels $y_{\text{val}}$, which are kept separate from the training set. Finally, the test set is used to blindly verify that its own input vectors $x_{\text{test}}$ map to its labels $y_{\text{test}}$. This test is the final check of how well the algorithm can classify.
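A minimal sketch of such a three-way split is shown below. The 70/15/15 proportions, the fixed random seed, and the file names are assumptions made for illustration; the appropriate proportions depend on how much labeled data is available.

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle once, then cut into non-overlapping training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for the final check
    return train, val, test

# Hypothetical labeled dataset of (input, label) pairs.
labeled = [(f"image_{i:03d}.png", "cod" if i % 2 else "trout") for i in range(100)]

train, val, test = split_dataset(labeled)
print(len(train), len(val), len(test))
```

Shuffling before cutting avoids a split in which, for example, all images from one camera deployment end up in the same subset, although for strongly correlated data such as consecutive video frames a split by deployment or recording may be more appropriate.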

3 Established cases: identification and quantification of marine biodiversity

Figure 4: Established and emerging cases for deep learning in marine biology, from individuals to ecosystems. Data input type icons represent images/video (cases 1, 2, 4, and 6), audio (cases 3 and 5), and large-scale environmental monitoring data that is often stored on remote servers (i.e., “the cloud”; case 7). Kelp forest photo (bottom) credit: Frithjof Moy/Havforskningsinstituttet.

The application of DNN provides an alternative to laborious or repetitive manual tasks, such as processing data from underwater recording equipment. The following section presents three cases in ecological research where deep learning is already used to alleviate data processing and is likely to become the method of choice. These cases exemplify the DL methods described in Section 2:

  • Image and video classification to identify fish species and track movement,

  • Image-based analysis for monitoring of plankton, and

  • Passive acoustic monitoring of whales.

3.1 Case 1: detection, classification, and tracking of fishes in images and videos

Monitoring of fish populations and communities is a central activity within marine management and conservation. Traditional sampling methods to track population trends, estimate abundance, and infer movement patterns of fish have relied on studies that involve animal handling (i.e., fishing gear, individual tags, biologgers). These methods are not only invasive, but also time consuming. Developing and applying passive ways both to obtain the necessary data and to speed up analysis is therefore imperative. Today, automated detection, classification, and tracking of small-scale movements of fish through images and video are made possible with deep learning, an application well suited to this task.

When selecting AI approaches for monitoring, consider that a real-life underwater scenario typically involves multiple fishes present in the same image, which precludes the use of standard classification techniques. A solution to this problem is to introduce object detection before classification. The object detection step discriminates between individuals within an image and separates them, and in this way, prepares the image data for classification. Object detection and classification can be two completely separate steps in a pipeline (Knausgård et al., 2021; Connolly et al., 2021), or integrated as part of an object detector, such as YOLOv1-YOLOv4 (Redmon et al., 2016; Bochkovskiy et al., 2020; Yang et al., 2021; Shin et al., 2021; Jalal et al., 2020).

Detecting and counting species from still images and videos is relatively straightforward using standard DL object detection algorithms, as described in Section 2. However, a challenge with setting up a detection algorithm is that well-established object detection training datasets, such as COCO (Lin et al., 2014) and ImageNet (Deng et al., 2009b), include few images within the category of each species of fish and with very little variation in the background. Thus, the applicability of such datasets becomes somewhat limited. To increase the precision of detection suited to the specific use, one should instead train the DNN with images of fishes in their natural environment. Collecting and labelling relevant image and video data is therefore central to building a high-performance and robust fish detector. Public datasets are currently an integral part of this research, particularly for fish detection and species identification (e.g., Fish4Knowledge (Fisher et al., 2016), datasets of temperate fish species (Knausgård et al., 2021), and across species, location, and depths, as in NOAA fishery datasets (Link et al., 2015) and the OzFish dataset (Ditria et al., 2021)). The best performance by AI in species identification (i.e., classification) is achieved with a specialized CNN that only classifies species without detecting at the same time. The squeeze-and-excitation-based CNN presented by Knausgård et al. (2021) reached a classification accuracy of 99.27% on the Fish4Knowledge dataset (Fisher et al., 2016) and 87.74% on a second temperate species dataset.

Marine researchers often collect videos rather than still images and are interested in tracking the same animal across consecutive frames to obtain information on behaviour (e.g., to estimate swimming speed (Beyan et al., 2015)), or to ensure that the same fish is not counted multiple times (Lopez-Marcano et al., 2021). To continuously follow a moving object’s position in a video sequence, such as a swimming fish, object tracking can be used. One way of implementing tracking is to use a detection algorithm that feeds another tracking algorithm with position data. When tracking multiple objects (e.g., a school of fish), a track association decision needs to be made for each object (e.g., each individual fish). Thus, a complete tracking system typically consists of a detection algorithm, association of detection with tracks, and the actual tracking algorithm. In practice, tracking commonly involves Kalman filters or other recursive estimators to enable efficient dynamic tracking of objects (Ristic et al., 2004), including specific fish (Barreiros et al., 2021). Another emerging approach is to let DL solve the entire multi-class tracking problem in one step (Ciaparrone et al., 2020). This one-step approach typically results in a more homogeneous system, but with less fine-scale control than when applying well-understood recursive estimators. Further, a fully integrated CNN-tracking approach leaves less room for the user to include a priori information on expected fish dynamics and behaviour. A CNN-only approach will, however, completely avoid the meticulous tuning requirement of mathematical models and Kalman filter parameters.
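To show what the recursive estimator in such a pipeline does, the following is a minimal one-dimensional constant-velocity Kalman filter that smooths the horizontal position of a single detected fish across frames. It is a sketch under simplifying assumptions, not the method of any cited study: one coordinate only, one object (so no track association step), a diagonal process-noise term, and hand-picked noise variances. The class name and the synthetic detections are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity from noisy position measurements."""

    def __init__(self, pos, vel=0.0, process_var=0.01, meas_var=0.25):
        self.x = [pos, vel]                  # state estimate: [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance (assumed initial value)
        self.q = process_var                 # how much the motion may deviate per step
        self.r = meas_var                    # how noisy the detector positions are

    def predict(self, dt=1.0):
        pos, vel = self.x
        self.x = [pos + vel * dt, vel]
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]

    def update(self, measured_pos):
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                     # innovation variance (H = [1, 0])
        k0, k1 = p00 / s, p10 / s            # Kalman gain
        innovation = measured_pos - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]

# Synthetic detections: a fish moving 2 pixels per frame, with small detector jitter.
true_speed = 2.0
jitter = [0.1, -0.05, 0.0, 0.05, -0.1]
detections = [true_speed * t + jitter[t % len(jitter)] for t in range(1, 21)]

track = ConstantVelocityKalman1D(pos=0.0)
for z in detections:
    track.predict(dt=1.0)   # where should the fish be in this frame?
    track.update(z)         # correct the prediction with the detector output

estimated_pos, estimated_speed = track.x
print(f"position={estimated_pos:.2f} px, speed={estimated_speed:.2f} px/frame")
```

The variances `process_var` and `meas_var` are examples of the parameters that must be tuned when a recursive estimator is used; they are also where a priori knowledge about expected fish dynamics can be encoded.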

We see DL as an essential building block for automating image and video analysis where the goal is to quantify, classify, and track fish. DL can either be used in a modular pipeline with separate steps for detection (Knausgård et al., 2014), association, and track building, or as a complete solution to a multi-object tracking problem (Yang et al., 2021; Shin et al., 2021; Jalal et al., 2020). As these DL tools are adaptable for use with different ecosystems or species by virtue of the training datasets used, the potential for AI in monitoring is great.

3.2 Case 2: image-based analysis for plankton monitoring

Plankton are a highly diverse group with very different morphologies and sizes ranging from sub-micron to a few centimeters, or even a few meters (Lombard et al., 2019). Plankton are responsible for about 50% of global primary production (Field et al., 1998) and constitute the base of many marine food webs. Some species serve as bioindicators of ecosystem health, while others can form toxic blooms with adverse impacts on other marine life, including commercially important fishes. Therefore, tracking seasonal, interannual, and spatial changes in plankton composition and abundance is central to coastal monitoring. As such, an ever-increasing volume of plankton images is generated for monitoring each year. Various AI approaches have been developed to analyse this data and reduce manual processing. Plankton identification and counting are arguably some of the most useful examples of DL in marine biology. The ultimate goal is fully automated plankton classification without human biases (Culverhouse, 2007). This bias is not trivial, as human experts can only achieve 67-83% self-consistency during a difficult classification task (Culverhouse et al., 2003), although accuracy is much higher (90%) when working with natural plankton samples with many taxa which have variable classification difficulty (Luo et al., 2018).

Several systems for image acquisition and AI analysis of plankton are commercially available (Lombard et al., 2019), including in situ systems (e.g., Imaging FlowCytobot, VPR, IISIS) and those that image samples, fixed or fresh, on research vessels or in the laboratory (e.g., ZooCam, FlowCam). All approaches share the same basic principles: pictures are taken of the sampling volume and the objects are segmented (i.e., into individual organisms). Each segment is then classified into one of several pre-defined classes, typically taxonomic or functional groups, but living organisms are always separated from non-living particles. Besides the predicted classification, the algorithms can extract object features (e.g., length, width, equivalent spherical diameter), and therefore information on plankton community structure and function (e.g., normalized biomass size spectra (Wang et al., 2020)). Seasonal and interannual variability in plankton abundance and composition obtained using these image-based DL methods is comparable with traditional microscopy (e.g., FlowCam; Alvarez et al., 2014).

Initial plankton classification models were based on statistical approaches but soon transitioned into machine learning solutions (Kerr et al., 2020; Luo et al., 2018), including algorithms that classified plankton based on object features such as size or edge, for example Support-Vector Machine and Random Forest algorithms (Fischer et al., 2020; Faillettaz et al., 2016). These algorithms reach 70-90% accuracy in classification for the most abundant plankton groups, but rare or cryptic species can still be a problem. Simpler classifiers cannot extract the object features from the raw data and instead require these to be manually defined by ecologists, a cumbersome process. CNNs are being proposed to overcome these issues, such as collaborative CNNs with configurations to deal with class imbalance (e.g., where one type of plankton is much more frequent than another) (Kerr et al., 2020) or with a dynamically changing environment (dataset shift) using a supervised quantification scheme (Orenstein et al., 2020a). These CNNs achieve state-of-the-art 90% classification accuracy when classifying independent test sets (e.g., 97% accuracy classifying 0.1 million FlowCam images (Kerr et al., 2020)), although accuracy decreases with very large and diverse image sets (e.g., 83% accuracy for 52 million zooplankton images from IISIS (Briseño-Avena et al., 2020)). Other approaches to improve the accuracy of conventional CNNs include adding context data (e.g., sampling location and time) to the classifier (Ellen et al., 2019), using unsupervised clustering of data (Schroeder et al., 2020), or combining CNNs with Support-Vector Machine (SVM) classifiers (Cheng et al., 2020).

DL enables a whole new approach to plankton coastal monitoring by (semi-) automatic analysis of samples either in situ or in the lab (Wang et al., 2019). DL is used to monitor long-term, seasonal, and spatial changes in taxonomical groups (Briseño-Avena et al., 2020) and size spectra (Wang et al., 2020; Yu et al., 2016), to track plankton that serve as bioindicators of ecosystem health (Uusitalo et al., 2016), or as an early-warning system for harmful algal blooms that impact higher trophic levels and, ultimately, humans (Gorocs et al., 2018; Orenstein et al., 2020b). However, DL cannot replace a taxonomist for difficult identification tasks (e.g., identification of certain species or life stages of zooplankton or larval fish), and as such is not yet adequate for studies that require high taxonomic resolution. Experts are also required to create training sets and validate the results. Even so, manual hours can be reduced if training sets and analysis pipelines are made publicly available (Li et al., 2020; Chen, 2021; Schmid et al., 2021), as well as through the creation of global databases and training sets (e.g., Ecotaxa (Picheral et al., 2017)). Ultimately, the combination of traditional physical plankton sampling with autonomous platforms that combine image-based data with data from other sources (e.g., genomics, acoustics, pigments) appears to be the best way forward for coastal plankton monitoring studies (Gorsky et al., 2019; Lombard et al., 2019).

3.3 Case 3: passive acoustic monitoring of whales

The use of long-term underwater passive acoustic monitoring (PAM) recording has grown in the last couple of decades to become an indispensable tool for investigating relative population trends and temporal and spatial migration patterns of a wide range of whale species (Wiggins and Hildebrand, 2016).

For many years, the standard procedure for detecting and classifying whale calls from PAM recordings has been to retrieve the sound recording, use a software package like Triton (Wiggins and Hildebrand, 2007) to create spectrograms lasting 1-2 minutes, then have the spectrograms manually scanned for call contours by a trained data analyst. This method is not only highly labor-intensive, as PAM recordings can cover months, if not years, but the results are also subjective (Baumgartner and Mussoline, 2011). As many whale calls are highly stereotypical, algorithms like matched filtering (Giannakis and Tsatsanis, 1990) and spectrogram correlation (Mellinger and Clark, 1997) have successfully been developed for automated call detection. However, these methods tend to work poorly on calls with more variability in frequency modulation. Hence, manually scanning spectrograms continues to be used for many call types.

The manual procedure of visually scanning spectrograms for known call contours is very similar to the image classification process explained in Section 2.1.1. Further, sound classification using deep learning is becoming well established outside of marine aquatics (Piczak, 2015; Mushtaq et al., 2021; Sharma et al., 2020), which has led to significant interest in using CNN for automated whale call detection.

Among the whale calls recently being investigated using CNN are those of the beluga whale (Delphinapterus leucas) with an AUC of 0.9906 (Zhong et al., 2020), North Atlantic right whale (Eubalaena glacialis) with an AUC of 0.902 (Shiu et al., 2020), killer whales (Orcinus orca) with an AUC of 0.9523 (Bergler et al., 2019), and sperm whales (Physeter macrocephalus) with 99.5 percent accuracy in detecting sperm whale clicks in 650 spectrograms (Bermant et al., 2019). A drawback of CNN classification without object detection is that it does not relay information about where in the image an object is located. For example, when examining spectrograms where the x-axis is the timeline, no information is included about the call’s specific time, nor the number of calls, thus the CNN serves as a “presence” identification tool only. A work-around for this issue has been to make the spectrograms very small, covering only a short timeline (e.g., two seconds) (Bergler et al., 2019). When creating spectrograms, there needs to be an overlap between two consecutive spectrograms. Otherwise, a call located at the boundary between two spectrograms might be missed. Using short spectrograms combined with these overlaps can increase the amount of redundant data by up to 20% (Bergler et al., 2019) and thereby increase the computational cost by a similar amount.
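The trade-off between overlap and redundancy can be sketched with a few lines of arithmetic. The recording length, window length, overlap, and call timing below are assumed values for illustration and are not taken from the cited studies; times are kept in integer milliseconds to avoid rounding issues.

```python
def window_starts_ms(total_ms, window_ms, overlap_ms=0):
    """Start times of consecutive spectrogram windows over a recording."""
    hop_ms = window_ms - overlap_ms
    if hop_ms <= 0:
        raise ValueError("overlap must be shorter than the window")
    return list(range(0, total_ms - window_ms + 1, hop_ms))

def fully_contained(starts, window_ms, call_start_ms, call_end_ms):
    """True if at least one window contains the whole call."""
    return any(s <= call_start_ms and call_end_ms <= s + window_ms for s in starts)

recording_ms = 60_000          # one minute of audio (assumed)
window_ms = 2_000              # two-second spectrograms
overlap_ms = 400               # assumed overlap between neighbouring windows

plain = window_starts_ms(recording_ms, window_ms)
overlapping = window_starts_ms(recording_ms, window_ms, overlap_ms)

# A hypothetical call from 1.9 s to 2.3 s straddles the boundary at 2.0 s.
call = (1_900, 2_300)
missed_without_overlap = not fully_contained(plain, window_ms, *call)
caught_with_overlap = fully_contained(overlapping, window_ms, *call)

extra_windows = len(overlapping) / len(plain) - 1
print(len(plain), len(overlapping), f"{extra_windows:.0%} more windows to classify")
```

With these values the straddling call is only seen whole when windows overlap, at the cost of classifying more spectrograms for the same recording.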

Object detection, as described in Section 2.1.2, would solve these issues for whale call detection. For example, a custom-made region-based CNN for detecting regions of interest in combination with a transformed pre-trained CNN for further classifying the regions of interest was successfully trained and tested on the highly variable D call emitted by blue whales and 40 Hz calls emitted by fin whales (Balaenoptera physalus) (Rasmussen and Širović, 2021).

Looking to the future, use of AI generally, and DL specifically, in automated detection of whale calls in PAM recordings will undoubtedly benefit from the recent developments in neural architecture search (NAS) algorithms (Sun et al., 2019). This new technique of automatically developing network architecture from prefabricated blocks will cut down significantly on the work needed to adapt networks to fit specific species and calls, and make CNN more accessible for whale researchers. A general move from using CNNs to perform image recognition on spectrograms extracted from the PAM to using DL directly on the PAM is also anticipated. This can be done via recurrent networks like long short-term memory networks (Hochreiter and Schmidhuber, 1997) or a recently developed type of network called the transformer (Vaswani et al., 2017).

4 Emerging cases

A common theme of the established cases mentioned above is that they replace tasks currently conducted by humans, where using DL can reduce costs and labour and sometimes improve accuracy compared to human analysts. However, DL has the capacity to be applied to solve more complex tasks, detecting patterns in visual and acoustic data that are difficult for humans to reliably detect or discriminate. In this section, we illustrate novel research avenues in which we predict DL will be successfully applied in the near future.

4.1 Identifying and characterizing individual phenotypes

4.1.1 Case 4: visual re-identification of individuals in wild fish populations

Methods for individual identification are needed to answer many questions in animal behavior and ecology, such as growth, movement, and survival inferred from capture-recapture studies (Clutton-Brock and Sheldon, 2010). Currently, the most common approach is to mark animals with various physical identifiers to recognise individuals upon re-sight or re-capture, such as leg rings on birds, number scratching or paint on reptiles, or lip tattoos on larger carnivores. In marine and freshwater systems, capture-recapture studies on fish are most often performed using external number tags or radio-frequency identification (RFID) tags (Pine et al., 2003). However, trapping and tagging surveys are often costly, logistically challenging to conduct, and are intrusive to the animals.

A less invasive and more practical way forward for data collection is to use images or videos from wildlife cameras and perform DL image analysis by taking advantage of natural markings that make individuals identifiable (Schneider et al., 2019). Like humans, many animals have unique features in their individual appearance, such as intricate patterns of spots and stripes on the skin, fur, or feathers. A trained computer vision algorithm can distinguish between individuals as different classes, even when the identifying features are highly complex. CNNs have been trained to recognise individuals (individual re-identification [Re-ID]) from photos of animals across many taxa, including birds (e.g., 93.6% accuracy (Ferreira et al., 2020b)), turtles (e.g., 95% accuracy (Carter et al., 2014)), and terrestrial and marine mammals (e.g., 92.5% accuracy (Schofield et al., 2019)). Many fish species also have strong visual pigmentation, with stripes, spots, or mosaics in contrasting colors that can be clearly seen in images and video surveys (dala Corte and Moschetta, 2016; Hau and Sadovy de Mitcheson, 2019; Mucientes et al., 2019), particularly coastal fish like the corkwing wrasse (Symphodus melops; Figure 4). Therefore, development of Re-ID has potential to replace physical tagging for individual identification of teleost fish, and would also be of great value for monitoring, as it could be used to assess individual movement, behaviour, and growth. Re-ID could also solve the problem of double counting when individuals re-enter the field of view, thus improving video-based monitoring of abundance (Aguzzi et al., 2015; Campos-Candela et al., 2018; Perry et al., 2018).

As far as we are aware, Re-ID by CNN has not been tested in wild populations. One of the challenges preventing the widespread development of AI-based Re-ID is the need for photos or videos of known individuals, independently validated with high certainty, for the training and validation of the algorithm. One solution to this problem is collecting data by using remote detection systems, such as RFID technology, to identify individuals tagged with passive integrated transponders (PITs). By combining PIT-tagging with RFID and synchronized underwater cameras, a large, automatically labeled dataset of many individuals could be created over a relatively short time (Ferreira et al., 2020b; Schneider et al., 2019).

4.1.2 Case 5: inter- and intra-individual variability in fish vocal communications

Acoustic communication is a fundamental component of animal life, especially for aquatic species for which visual cues are not as effective (Tessler et al., 2017). For example, many fishes hear their species’ mating choruses from several kilometers away (Winn, 1964). Subtle variation in complex acoustic signals is challenging for humans to detect or interpret. Furthermore, using algorithms to detect patterns that defy human perception has technological limitations, including processing high volumes of noisy, real-time acoustic data. Using algorithms to detect acoustic signaling presents the additional challenge of source identification in moving animals. However, advances both in audio recording technologies and in DL algorithms that can detect and classify acoustic signals in natural settings have opened up new systems for study, both on land and at sea (Parsons et al., 2009). These technologies unlock the potential for understanding inter- and intra-individual variation in acoustic communication of fishes.

Marine mammals are relatively well studied in this respect, as vocalizations can be classified at the species, population, and even individual levels (e.g., Case 3). However, the diversity of fish vocalizations and how these vary within species are poorly understood. Moving beyond species-level to population- and individual-level classification of vocalizations is necessary to understand the ecological and evolutionary consequences of acoustic communication in fishes and the potential impacts of anthropogenic noise pollution on them. Likewise, a better understanding of intra-individual variation in communication is necessary for understanding the role of vocalization in fish behavior and personality.

A prime example is Atlantic cod (Gadus morhua), which use drumming vocalizations during social interactions, particularly during mating (Brawn, 1961). Yet, our understanding of inter- and intra-individual variation in drumming is limited. There is potential to catalog individual variation in sound production using DL algorithms (Deng and Yu, 2014). Fine-scale individual variation in fish sounds, especially without a priori knowledge, is beyond human perception. Thus, automating this task requires DL approaches that do not rely on labelled training sets. Specifically, CNNs can detect and classify fish sounds by implementing a transformer network (Deng and Yu, 2014). Transformer networks work solely on optimized attention and are currently state-of-the-art in translation tasks. The transformer network is rapidly replacing recurrent neural networks (RNN) previously used for this kind of task, as it solves two of the problems inherent in RNNs: 1) long computing times due to serial processing and 2) vanishing gradients (see Section 2.1.1).

4.2 Ecosystem

4.2.1 Case 6: ghost fishing gear detection

When fishing gear is lost, the continued mortality of fish, crustaceans, and other species caught in the gear is termed ghost fishing (Brown and Macfadyen, 2007). The problem is widespread and high rates of fish trap loss are reported (Vadziutsina and Riera, 2020). Using DL to detect and locate lost gear can greatly increase the efficiency of clean-up efforts, as human effort could then focus on retrieving gear (e.g., using remotely operated vehicles). Detection of ghost fishing gear has been achieved using side-scan sonar for data acquisition followed by feature cloud generation, which involves looking for objects in an image by identifying areas of high entropy, then clustering and noise reduction to separate the objects from noise by looking for clusters of the identified areas  (Labbe-Morissette and Gauthier, 2020).
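The idea of flagging high-entropy image regions can be illustrated with a small, self-contained sketch. This is not the implementation of Labbe-Morissette and Gauthier (2020): the tile size, entropy threshold, and synthetic image are assumptions, and the subsequent clustering and noise-reduction steps are omitted.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the grey-level distribution of a pixel list."""
    total = len(values)
    counts = Counter(values)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def high_entropy_tiles(image, tile=4, threshold=1.0):
    """Return the (row, col) corner of every tile whose entropy exceeds the threshold."""
    hits = []
    for r in range(0, len(image) - tile + 1, tile):
        for c in range(0, len(image[0]) - tile + 1, tile):
            pixels = [image[i][j] for i in range(r, r + tile) for j in range(c, c + tile)]
            if shannon_entropy(pixels) > threshold:
                hits.append((r, c))
    return hits

# Synthetic 8x8 "sonar image": flat seabed (value 10) with one textured 4x4 patch.
image = [[10] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        image[4 + i][4 + j] = 20 + 4 * i + j   # 16 distinct grey levels

candidates = high_entropy_tiles(image)
print("candidate object tiles:", candidates)
```

Flat background tiles contain a single grey level and therefore have zero entropy, whereas the textured patch is flagged; in a full pipeline, neighbouring flagged tiles would then be clustered into candidate objects.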

The next step is using autonomous object detection to extract the location of lost fishing gear. The detection of lost fishing nets using a towed underwater camera followed by automatic object detection has been achieved with a region-based CNN (R-CNN) (Politikos et al., 2021). In that study, fishing nets were detected with higher precision than any other type of marine litter. Detection of more types and features of fishing gear is of interest to researchers and clean-up efforts (e.g., whether the feature detected is a trap, fyke net, or ropes). Image classification may be an effective approach to provide this level of detail, where low resolution images are not usually a hindrance for successful image classification. As well as video, side-scan sonar on autonomously operated vehicles could provide the data needed for this approach. Towed underwater cameras may represent a low-cost option for data collection, whereas autonomously operated vehicles equipped with side-scan sonar represent a high-cost option.

4.2.2 Case 7: carbon cycling by fish

The ocean removes approximately one third of greenhouse gas emissions from the atmosphere, including carbon dioxide. The ocean carbon sink is driven by a physical and a biological pump. As well as plankton and bacteria, fishes contribute to the biological pump, with recent estimates suggesting 16 percent of sinking carbon could be due to fishes (Saba et al., 2021). However, the role of fish in the biological pump is not well understood (Martin et al., 2021). The data on fishes required to improve our understanding relates to metabolic use and excretion of consumed carbon and other nutrients; properties of carbon and nutrient outputs and their fate in the environment; habitat use and connectivity of ecosystems; and physical interactions with extrinsic carbon and nutrients in the environment. As well as advancing knowledge of the role of fishes, this knowledge could inform effective management approaches to maintaining or restoring ecosystem carbon function. As an emerging field, zoogeochemistry has the advantage that much of the relevant data are already published for other purposes. For example, metabolic rates and behavioral data are already published for many commercially important species through fisheries and climate change research. Using AI in this field has the potential to expedite a better understanding of fishes’ ecological functions, effects of human disturbance, and therefore potential management of important carbon sink habitats. Here we present a few of the options available to apply DL to zoogeochemistry research.

In habitats where visual sampling is possible, video images could be used with object detection, classification, and tracking to identify the presence or absence, behavior, and features of particles from fishes and their short-term fate (e.g., defecation, spawning, and whether material reaches and settles on the sea floor). This could inform estimates of the volume of carbon transferred into or out of a habitat by fishes, and the short-term fate of the carbon or nutrient they release. Methods that use AI computer vision to determine the connectivity of fish populations can also be of value in estimating carbon flow (Lopez-Marcano et al., 2021). The long-term fate of carbon and nutrients depends on physical, chemical, and biological conditions of the environment. Graph networks have recently been used to simulate the physical behavior of materials (Sanchez-Gonzalez et al., 2020). This technology has potential application to estimating the probable fate of carbon and nutrient outputs through simulations that combine oceanographic data with features of the carbon released by fishes. With many variables to consider, recent approaches to assessing carbon contained in sediments in different habitats include a combination of survey (acoustic and image-based) and bathymetry data, modeling, and remote ground-truthing (Hunt et al., 2020; Wilson et al., 2018). The current approach is manual, but there is potential for AI application to link habitats to carbon fates and make spatial and temporal estimates on cycling and sinking of carbon and nutrients. Graph networks (Sanchez-Gonzalez et al., 2020) could be applied to generate probable long-term fates of carbon and nutrient outputs using simulations based on video observations and environmental parameters such as season, temperature, currents, and maps of habitat type.
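To make the graph-network idea above more concrete, the following is a minimal sketch of the structure of one simulation step: particles are nodes, nearby particles are joined by edges, messages are computed along edges and summed at each receiving node, and the node state is then updated. It is not the learned simulator of Sanchez-Gonzalez et al. (2020). In a real graph network the edge and node functions are trained neural networks; here they are replaced by hand-written placeholders, and every name and parameter value (`stiffness`, `drag`, `sinking`, `radius`, `dt`) is a hypothetical choice made for illustration only.

```python
import math


def build_edges(positions, radius):
    """Return (sender, receiver) index pairs for particles closer than `radius`."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) < radius:
                edges.append((j, i))  # particle j sends a message to particle i
    return edges


def message_passing_step(positions, velocities, current, radius=1.0, dt=0.1,
                         stiffness=0.5, drag=1.0, sinking=0.05):
    """Advance all particles by one step using a single round of message passing.

    positions, velocities: lists of [x, y, z] lists; current: [x, y, z] water velocity.
    All physical constants are illustrative placeholders, not fitted values.
    """
    n = len(positions)
    aggregated = [[0.0, 0.0, 0.0] for _ in range(n)]

    # Edge function (placeholder for a learned network): a repulsive message
    # proportional to the relative displacement, summed at the receiver.
    for sender, receiver in build_edges(positions, radius):
        for axis in range(3):
            rel = positions[sender][axis] - positions[receiver][axis]
            aggregated[receiver][axis] += -stiffness * rel

    # Node function (placeholder for a learned network): combine the summed
    # messages with drag towards the ambient current and a constant sinking term,
    # then integrate with a simple Euler step.
    new_positions, new_velocities = [], []
    for i in range(n):
        accel = [aggregated[i][a] + drag * (current[a] - velocities[i][a])
                 for a in range(3)]
        accel[2] -= sinking
        vel = [velocities[i][a] + dt * accel[a] for a in range(3)]
        pos = [positions[i][a] + dt * vel[a] for a in range(3)]
        new_velocities.append(vel)
        new_positions.append(pos)
    return new_positions, new_velocities


if __name__ == "__main__":
    # Three hypothetical particles: two close together, one isolated.
    pos = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [10.0, 0.0, 0.0]]
    vel = [[0.0, 0.0, 0.0] for _ in pos]
    pos, vel = message_passing_step(pos, vel, current=[0.0, 0.0, 0.0])
    print(pos)
```

In an application of the kind discussed here, the placeholder functions would be learned from data, and the `current` argument would be supplied by oceanographic observations or models rather than set by hand.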

As mentioned in earlier cases, biological data for fishes are partially or fully available for commercially targeted species in online databases (e.g., FishBase). Such databases have been used to generate estimates of nutrient output from fishes, such as nitrogen and phosphorus (Schiettekatte et al., 2020). AI can be trained on these databases to estimate ecological and behavioural carbon flows, including on food webs and habitat use. This training could then be applied to generate estimates for species where ecological data are limited, such as deep-sea fishes. The research needs for deep-sea fishes are urgent, as commercial interest is increasing at the same time as the significance of these species in moving carbon from surface waters to the deep sea is beginning to be explored, but data collection methods are expensive, time consuming, and patchy (Martin et al., 2020; Bohan et al., 2011; Lyubchich and Woodland, 2019). In this instance, DL could be used to detect probable carbon flows by using logic-based machine learning.

5 Discussion

We are entering a new era in ocean research and management thanks to new technological developments in observational methods combined with AI-supported data analysis. Data collection, processing, and interpretation are at the core of ecological studies and biodiversity monitoring. Scientists are increasingly relying on indirect observations from various sensors generating large and complex data sets, especially in the aquatic environment. Thus, we envision that within a decade, marine researchers will firmly integrate AI and ML in data collection and analysis within most sub-fields of applied marine biology. This development will only continue to accelerate with new generations of biologists better educated in computer science and informatics (Weinstein, 2018).

Non-human, autonomous, and remote platforms such as cabled observatories, autonomous underwater vehicles or gliders, and ships of opportunity will have a pivotal role in ocean monitoring (Whitt et al., 2020). These platforms will record continuous, real-time information on water physics, chemistry, community composition, and biomass of plankton, fish, and other marine species. For example, long-term monitoring of harmful bloom-forming plankton species can be achieved using inexpensive imaging technology anchored to piers (Gorocs et al., 2018; Orenstein et al., 2020b). Similarly, changes in whale population trends and migrations can be investigated using passive acoustic monitoring (Szesciorka et al., 2020). These methods are likely to decrease reliance on manual analysis or direct sampling via more invasive, expensive, time-consuming, or labour-intensive traditional approaches. This new way of observing the ocean will generate large volumes of data that will only be feasible to analyze with the help of AI. Therefore, AI will play a key role in making routine processes more time-efficient and in reducing the manual work required. As an example, a trained data analyst currently needs 50 to 350 workdays to manually scan one year's worth of PAM recordings for whale calls. In contrast, the same task can be accomplished by a trained neural network in approximately four days (Bergler et al., 2019). Fully automated coastal monitoring systems will be faster and more efficient at detecting changes of interest, enabling, for example, warnings to the public where toxic algae are abundant and redirection of boat traffic where whales are moving across shipping routes. Altogether, this monitoring information will be valuable in the development of indicators and in integrated assessments to support ecosystem-based management (Tam et al., 2017).

It is important to emphasize that expert work will always be needed to create and correctly label training sets and revise the automated analysis, such as when new species enter the system. This anticipated demand underscores the need to develop interdisciplinary skills in researchers at all career stages, as well as the skills required to form fruitful collaborations (McDonald et al., 2018).
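
The workload figures quoted above (50 to 350 analyst workdays versus approximately four days for a trained network) can be turned into an approximate speed-up range. The short sketch below does only that arithmetic. It assumes that analyst workdays and network processing days can be compared one to one, and it ignores the time needed to label training data and train the network, so the result is indicative only. The function name is hypothetical.

```python
def speedup_range(manual_days_low, manual_days_high, automated_days):
    """Return the (low, high) ratio of manual effort to automated processing time."""
    return manual_days_low / automated_days, manual_days_high / automated_days


# Figures as quoted in the text: 50 to 350 workdays manually, about 4 days automated.
low, high = speedup_range(50, 350, 4)
print(f"Approximate speed-up: {low:.1f}x to {high:.1f}x")  # 12.5x to 87.5x
```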

Collaborative work based on an open-access and sharing culture (from model configurations to training sets) will be essential to advance this future. While this is common practice within AI communities, the culture of marine science is not as open. However, funding agencies, publishers, and institutions are increasingly enforcing open access for data generated via public funds. The FAIR Principles for scientific data management and stewardship are now widely adopted (Wilkinson et al., 2019). These emphasize improving the access, utility, and reuse of data by machines in addition to individual researchers. As such, they may play a vital role in applying AI to the marine domain. Some collaborative initiatives are underway to create global databases for plankton and benthic images and training sets (e.g., EcoTaxa (Picheral et al., 2017) and BIIGLE (Langenkämper et al., 2017)), as well as pipelines (Chen, 2021). Ultimately, we envision globally available libraries of images, videos, metadata, and more, similar to the open-access GenBank database for sequence information and associated metadata for genetic material hosted by the National Center for Biotechnology Information (NCBI) in the United States.

6 Conclusions and future directions

We have provided examples of how image and audio analysis are already used to analyze marine biodiversity distribution and dynamics in non-invasive ways, described emerging applications of AI, and looked at what the future of AI in marine ecology requires. The United Nations Decade of the Ocean has just started, with the aim of achieving “a healthy, safe, and resilient ocean for sustainable development by 2030 and beyond”. We have shown that AI will be key to achieving this goal by developing new technology to uncover new aspects of and potential threats to marine ecosystems’ structures and functions, thereby informing EBM. This new knowledge will directly address several of the key challenges identified for the Decade, from effective EBM and biodiversity conservation, to creating a digital representation of the ocean and delivering data, knowledge, and technology to all. The Decade of the Ocean initiative promotes global cooperation and interdisciplinary efforts at all levels, which are at the core of how AI-linked marine studies will progress. Where researchers have the opportunity to gather large amounts of complex ecological data, unfamiliarity with AI jargon and the latest developments should not prevent collaborations with data and computer scientists to support EBM of ocean resources during this time of rapid change.


Morten Goodwin is supported by the Norwegian Research Council HAVBRUK2 innovation project CreateView Project nr. 309784. Rebekah A. Oomen is supported by the James S. McDonnell Foundation 21st Century Postdoctoral Fellowship. Susanna Huneide Thorbjørnsen is supported by Handelens Miljøfond.


  • Antão et al. [2020] Laura H Antão, Amanda E Bates, Shane A Blowes, Conor Waldock, Sarah R Supp, Anne E Magurran, Maria Dornelas, and Aafke M Schipper. Temperature-related biodiversity change across temperate marine and terrestrial systems. Nature ecology & evolution, 4(7):927–933, 2020.
  • Bacheler et al. [2017] Nathan M. Bacheler, Nathan R. Geraldi, Michael L. Burton, Roldan C. Muñoz, and G. Todd Kellison. Comparing relative abundance, lengths, and habitat of temperate reef fishes using simultaneous underwater visual census, video, and trap sampling. Marine Ecology Progress Series, 574:141–155, 2017. ISSN 01718630. doi: 10.3354/meps12172.
  • Knausgård et al. [2021] Kristian Muri Knausgård, Arne Wiklund, Tonje Knutsen Sørdalen, Kim Tallaksen Halvorsen, Alf Ring Kleiven, Lei Jiao, and Morten Goodwin. Temperate fish detection and classification: A deep learning based approach. Applied Intelligence, pages 1–14, 2021.
  • Lopez-Guede et al. [2020] Lopez-Vazquez Lopez-Guede, Marini, Fanelli, Johnsen, and Aguzzi. Video image enhancement and machine learning pipeline for underwater animal detection and classification at cabled observatories. Sensors, 20(3):726, 2020.
  • Weinstein [2018] Ben G. Weinstein. A computer vision for animal ecology. Journal of Animal Ecology, 87(3):533–545, 2018.
  • Goodwin [2020] Morten Goodwin. AI: Myten om maskinene. Humanist forlag, 2020.
  • Russell and Norvig [2002] Stuart Russell and Peter Norvig. Artificial intelligence: a modern approach. 2002.
  • Christin et al. [2019] Sylvain Christin, Éric Hervet, and Nicolas Lecomte. Applications for deep learning in ecology. Methods in Ecology and Evolution, 10(10):1632–1644, 2019.
  • LeCun et al. [2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.
  • Malde et al. [2020] Ketil Malde, Nils Olav Handegard, Line Eikvil, and Arnt-Børre Salberg. Machine intelligence and the data-driven future of marine science. ICES Journal of Marine Science, 77(4):1274–1285, 2020.
  • Beyan and Browman [2020] Cigdem Beyan and Howard I Browman. Setting the stage for the machine intelligence era in marine science. ICES Journal of Marine Science, 77(4):1267–1273, 2020.
  • Schneider et al. [2019] Stefan Schneider, Graham W. Taylor, Stefan Linquist, and Stefan C. Kremer. Past, present and future approaches using computer vision for animal re-identification from camera trap data. Methods in Ecology and Evolution, 10(4):1151–1155, 2019. ISSN 2041210X. doi: 10.1111/2041-210X.13133.
  • Tessler et al. [2017] Chen Tessler, Shahar Givony, Tom Zahavy, Daniel Mankowitz, and Shie Mannor. A deep hierarchical approach to lifelong learning in minecraft. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
  • Schmidhuber [2015] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
  • Esteva et al. [2019] Andre Esteva, Alexandre Robicquet, Bharath Ramsundar, Volodymyr Kuleshov, Mark DePristo, Katherine Chou, Claire Cui, Greg Corrado, Sebastian Thrun, and Jeff Dean. A guide to deep learning in healthcare. Nature medicine, 25(1):24–29, 2019.
  • Lessmann et al. [2019] Stefan Lessmann, Johannes Haupt, Kristof Coussement, and Koen W De Bock. Targeting customers for profit: An ensemble learning framework to support marketing decision-making. Information Sciences, 2019.
  • Ben Lazreg et al. [2019a] Mehdi Ben Lazreg, Morten Goodwin, and Ole-Christoffer Granmo. An iterative information retrieval approach from social media in crisis situations. In 2019 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), pages 1–8. IEEE, 2019a.
  • Ben Lazreg et al. [2019b] Mehdi Ben Lazreg, Nadia Noori, Tina Comes, and Morten Goodwin. Not a target. a deep learning approach for a warning and decision support system to improve safety and security of humanitarian aid workers. In IEEE/WIC/ACM International Conference on Web Intelligence, pages 378–382, 2019b.
  • Bogucki et al. [2019] Robert Bogucki, Marek Cygan, Christin Brangwynne Khan, Maciej Klimek, Jan Kanty Milczek, and Marcin Mucha. Applying deep learning to right whale photo identification. Conservation Biology, 33(3):676–684, 2019.
  • Suryanarayana et al. [2008] Iragavarapu Suryanarayana, Antonio Braibanti, Rupenaguntla Sambasiva Rao, Veluri Anantha Ramam, Duvvuri Sudarsan, and Gollapalli Nageswara Rao. Neural networks in fisheries research. Fisheries Research, 92(2-3):115–139, 2008.
  • Grasso et al. [2019] Isabella Grasso, Stephen D Archer, Craig Burnell, Benjamin Tupper, Carlton Rauschenberg, Kohl Kanwit, and Nicholas R Record. The hunt for red tides: Deep learning algorithm forecasts shellfish toxicity at site scales in coastal maine. Ecosphere, 10(12):e02960, 2019.
  • Marre et al. [2020] Guilhem Marre, Cedric De Almeida Braga, Dino Ienco, Sandra Luque, Florian Holon, and Julie Deter. Deep convolutional neural networks to monitor coralligenous reefs: Operationalizing biodiversity and ecological assessment. Ecological Informatics, page 101110, 2020.
  • Aloysius and Geetha [2017] Neena Aloysius and M Geetha. A review on deep convolutional neural networks. In 2017 International Conference on Communication and Signal Processing (ICCSP), pages 0588–0592. IEEE, 2017.
  • Dargan et al. [2019] Shaveta Dargan, Munish Kumar, Maruthi Rohit Ayyagari, and Gulshan Kumar. A survey of deep learning and its applications: A new paradigm to machine learning. Archives of Computational Methods in Engineering, pages 1–22, 2019.
  • Ferreira et al. [2020a] André C Ferreira, Liliana R Silva, Francesco Renna, Hanja B Brandl, Julien P Renoult, Damien R Farine, Rita Covas, and Claire Doutrelant. Deep learning-based methods for individual recognition in small birds. Methods in Ecology and Evolution, 11(9):1072–1085, 2020a.
  • LeCun et al. [1995] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
  • Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25, pages 1097–1105. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
  • Szegedy et al. [2015] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Hu et al. [2018] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141, 2018.
  • Girshick et al. [2014] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.
  • Koch et al. [2015] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML deep learning workshop, volume 2. Lille, 2015.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning, volume 1. MIT press Cambridge, 2016.
  • Chaudhari et al. [2019] Sneha Chaudhari, Gungor Polatkan, Rohan Ramanath, and Varun Mithal. An attentive survey of attention models. arXiv preprint arXiv:1904.02874, 2019.
  • Phan et al. [2019] Huy Phan, Oliver Y Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, and Alfred Mertins. Spatio-temporal attention pooling for audio scene classification. Proc. INTERSPEECH, 2019.
  • Tay et al. [2020] Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey. arXiv preprint arXiv:2009.06732, 2020.
  • Moritz et al. [2020] Niko Moritz, Gordon Wichern, Takaaki Hori, and Jonathan Le Roux. All-in-one transformer: Unifying speech recognition, audio tagging, and event detection. Proc. Interspeech, pages 3112–3116, 2020.
  • Zhang et al. [2020] Zhichao Zhang, Shugong Xu, Shunqing Zhang, Tianhao Qiao, and Shan Cao. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing, 2020.
  • Deng et al. [2009a] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009a.
  • Huang et al. [2016] Ho Chun Huang, John Joseph, Ming Jer Huang, and Tetyana Margolina. Automated detection and identification of blue and fin whale foraging calls by combining pattern recognition and machine learning techniques. In OCEANS 2016 MTS/IEEE Monterey, pages 1–7. IEEE, 2016.
  • Connolly et al. [2021] Rod Connolly, David Fairclough, Eric Jinks, Ellen Ditria, Gary Jackson, Sebastian Lopez-Marcano, Andrew Olds, and Kristin Jinks. Improved accuracy for automated counting of a fish in baited underwater videos for stock assessment. bioRxiv, 2021.
  • Redmon et al. [2016] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  • Bochkovskiy et al. [2020] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
  • Yang et al. [2021] Xinting Yang, Song Zhang, Jintao Liu, Qinfeng Gao, Shuanglin Dong, and Chao Zhou. Deep learning for smart fish farming: applications, opportunities and challenges. Reviews in Aquaculture, 13(1):66–90, 2021.
  • Shin et al. [2021] Younghak Shin, Jeong Hyeon Choi, and Han Suk Choi. Deep learning based fish object detection and tracking for smart aqua farm. The Journal of the Korea Contents Association, 21(1):552–560, 2021.
  • Jalal et al. [2020] Ahsan Jalal, Ahmad Salman, Ajmal Mian, Mark Shortis, and Faisal Shafait. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecological Informatics, 57:101088, 2020.
  • Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
  • Deng et al. [2009b] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009b.
  • Fisher et al. [2016] Robert B Fisher, Yun-Heh Chen-Burger, Daniela Giordano, Lynda Hardman, Fang-Pang Lin, et al. Fish4Knowledge: collecting and analyzing massive coral reef fish video data, volume 104. Springer, 2016.
  • Link et al. [2015] Jason S Link, Roger B Griffis, and Danielle Shallin Busch. Noaa fisheries climate science strategy. 2015.
  • Ditria et al. [2021] Ellen M Ditria, Rod M Connolly, Eric L Jinks, and Sebastian Lopez-Marcano. Annotated video footage for automated identification and counting of fish in unconstrained seagrass habitats. Frontiers in Marine Science, 2021.
  • Beyan et al. [2015] Cigdem Beyan, Bastian J Boom, Jolanda M P Liefhebber, Kwang-tsao Shao, and Robert B Fisher. Natural swimming speed of Dascyllus reticulatus increases with water temperature. 72(August):2506–2511, 2015.
  • Lopez-Marcano et al. [2021] Sebastian Lopez-Marcano, Christopher J Brown, Michael Sievers, and Rod M Connolly. The slow rise of technology: Computer vision techniques in fish population connectivity. Aquatic Conservation: Marine and Freshwater Ecosystems, 31(1):210–217, 2021.
  • Ristic et al. [2004] B. Ristic, S. Arulampalam, and N. Gordon. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House radar library. Artech House, 2004. ISBN 9781580536318. URL https://books.google.no/books?id=cjFDngEACAAJ.
  • Barreiros et al. [2021] Marta de Oliveira Barreiros, Diego de Oliveira Dantas, Luís Claudio de Oliveira Silva, Sidarta Ribeiro, and Allan Kardec Barros. Zebrafish tracking using yolov2 and kalman filter. Scientific Reports, 11(1):3219, 2021. doi: 10.1038/s41598-021-81997-9. URL https://doi.org/10.1038/s41598-021-81997-9.
  • Ciaparrone et al. [2020] Gioele Ciaparrone, Francisco Luque Sánchez, Siham Tabik, Luigi Troiano, Roberto Tagliaferri, and Francisco Herrera. Deep learning in video multi-object tracking: A survey. Neurocomputing, 381:61–88, 2020.
  • Knausgård et al. [2014] Kristian Muri Knausgård, Arne Wiklund, Tonje Knutsen Sørdalen, Kim Halvorsen, Alf Ring Kleiven, Lei Jiao, and Morten Goodwin. Temperate fish detection and classification: a deep learning based approach. Applied Intelligence, 2014.
  • Lombard et al. [2019] Fabien Lombard, Emmanuel Boss, Anya M Waite, Meike Vogt, Julia Uitz, Lars Stemmann, Heidi M Sosik, Jan Schulz, Jean-Baptiste Romagnan, Marc Picheral, et al. Globally consistent quantitative observations of planktonic ecosystems. Frontiers in Marine Science, 6:196, 2019.
  • Field et al. [1998] Christopher B Field, Michael J Behrenfeld, James T Randerson, and Paul Falkowski. Primary production of the biosphere: integrating terrestrial and oceanic components. science, 281(5374):237–240, 1998.
  • Culverhouse [2007] Phil F. Culverhouse. Natural object categorization: Man versus machine. In Automated Taxon Identification in Systematics: Theory, Approaches and Applications, pages 25–46. CRC Press, MacLeod, n. edition, 2007.
  • Culverhouse et al. [2003] Phil F Culverhouse, Robert Williams, Beatriz Reguera, Vincent Herry, and Sonsoles González-Gil. Do experts make mistakes? a comparison of human and machine identification of dinoflagellates. Marine ecology progress series, 247:17–25, 2003.
  • Luo et al. [2018] Jessica Y Luo, Jean-Olivier Irisson, Benjamin Graham, Cedric Guigand, Amin Sarafraz, Christopher Mader, and Robert K Cowen. Automated plankton image analysis using convolutional neural networks. Limnology and Oceanography: Methods, 16(12):814–827, 2018.
  • Wang et al. [2020] N. Wang, J. Yu, B. Yang, H. Zheng, and B. Zheng. Vision-based in situ monitoring of plankton size spectra via a convolutional neural network. 45(2):511–520, 2020. ISSN 1558-1691. doi: 10.1109/JOE.2018.2881387.
  • Alvarez et al. [2014] Eva Alvarez, Marta Moyano, Angel Lopez-Urrutia, Enrique Nogueira, and Renate Scharek. Routine determination of plankton community composition and size structure: a comparison between FlowCAM and light microscopy. 36(1):170–184, 2014. ISSN 0142-7873. doi: 10.1093/plankt/fbt069. Type: Article.
  • Kerr et al. [2020] Thomas Kerr, James R Clark, Elaine S Fileman, Claire E Widdicombe, and Nicolas Pugeault. Collaborative deep learning models to handle class imbalance in flowcam plankton imagery. IEEE Access, 8:170013–170032, 2020.
  • Fischer et al. [2020] Alexis D. Fischer, Kendra Hayashi, Anna McGaraghan, and Raphael M. Kudela. Return of the “age of dinoflagellates” in monterey bay: Drivers of dinoflagellate dominance examined using automated imaging flow cytometry and long-term time series analysis. 65(9):2125–2141, 2020. ISSN 0024-3590. doi: 10.1002/lno.11443. URL https://aslopubs.onlinelibrary.wiley.com/doi/full/10.1002/lno.11443. Type: Article.
  • Faillettaz et al. [2016] Robin Faillettaz, Marc Picheral, Jessica Y Luo, Cédric Guigand, Robert K Cowen, and Jean-Olivier Irisson. Imperfect automatic image classification successfully describes plankton distribution patterns. Methods in Oceanography, 15:60–77, 2016.
  • Orenstein et al. [2020a] Eric C. Orenstein, Kasia M. Kenitz, Paul L. D. Roberts, Peter J. S. Franks, Jules S. Jaffe, and Andrew D. Barton. Semi- and fully supervised quantification techniques to improve population estimates from machine classifiers. 18(12):739–753, 2020a. ISSN 1541-5856. doi: 10.1002/lom3.10399. URL https://aslopubs.onlinelibrary.wiley.com/doi/full/10.1002/lom3.10399. Type: Article.
  • Briseño-Avena et al. [2020] Christian Briseño-Avena, Moritz S. Schmid, Kelsey Swieca, Su Sponaugle, Richard D. Brodeur, and Robert K. Cowen. Three-dimensional cross-shelf zooplankton distributions off the central oregon coast during anomalous oceanographic conditions. 188:102436, 2020. ISSN 00796611. doi: 10.1016/j.pocean.2020.102436. URL https://linkinghub.elsevier.com/retrieve/pii/S0079661120301750.
  • Ellen et al. [2019] Jeffrey S. Ellen, Casey A. Graff, and Mark D. Ohman. Improving plankton image classification using context metadata. 17(8):439–461, 2019. ISSN 1541-5856. doi: 10.1002/lom3.10324. URL https://aslopubs.onlinelibrary.wiley.com/doi/full/10.1002/lom3.10234. Type: Article.
  • Schroeder et al. [2020] Simon-Martin Schroeder, Rainer Kiko, and Reinhard Koch. MorphoCluster: Efficient annotation of plankton images by clustering. 20(11), 2020. doi: 10.3390/s20113060. Type: Article.
  • Cheng et al. [2020] Xuemin Cheng, Yong Ren, Kaichang Cheng, Jie Cao, and Qun Hao. Method for training convolutional neural networks for in situ plankton image recognition and classification based on the mechanisms of the human eye. 20(9), 2020. doi: 10.3390/s20092592. Type: Article.
  • Wang et al. [2019] Zhaohui Aleck Wang, Hassan Moustahfid, Amy V Mueller, Anna PM Michel, Matthew Mowlem, Brian T Glazer, T Aran Mooney, William Michaels, Jonathan S McQuillan, Julie C Robidart, et al. Advancing observation of ocean biogeochemistry, biology, and ecosystems with cost-effective in situ sensing technologies. Frontiers in Marine Science, 6:519, 2019.
  • Yu et al. [2016] Xinsheng Yu, Yajuan Wei, Minliang Zhu, and Zhangguo Zhou. Automated classification of zooplankton for a towed imaging system. In OCEANS 2016 - SHANGHAI, OCEANS-IEEE, 2016. ISBN 978-1-4673-9724-7. ISSN: 0197-7385 Type: Proceedings Paper.
  • Uusitalo et al. [2016] Laura Uusitalo, Jose A. Fernandes, Eneko Bachiller, Siru Tasala, and Maiju Lehtiniemi. Semi-automated classification method addressing marine strategy framework directive (MSFD) zooplankton indicators. 71:398–405, 2016. ISSN 1470-160X. doi: 10.1016/j.ecolind.2016.05.036. Type: Article.
  • Gorocs et al. [2018] Zoltan Gorocs, Miu Tamamitsu, Vittorio Bianco, Patrick Wolf, Shounak Roy, Koyoshi Shindo, Kyrollos Yanny, Yichen Wu, Hatice Ceylan Koydemir, Yair Rivenson, and Aydogan Ozcan. A deep learning-enabled portable imaging flow cytometer for cost-effective, high-throughput, and label-free analysis of natural water samples. 7, 2018. ISSN 2047-7538. doi: 10.1038/s41377-018-0067-0. Type: Article.
  • Orenstein et al. [2020b] Eric C. Orenstein, Devin Ratelle, Christian Briseno-Avena, Melissa L. Carter, Peter J. S. Franks, Jules S. Jaffe, and Paul L. D. Roberts. The scripps plankton camera system: A framework and platform for in situ microscopy. 18(11):681–695, 2020b. ISSN 1541-5856. doi: 10.1002/lom3.10394. Type: Article.
  • Li et al. [2020] Qiong Li, Xin Sun, Junyu Dong, Shuqun Song, Tongtong Zhang, Dan Liu, Han Zhang, and Shuai Han. Developing a microscopic image dataset in support of intelligent phytoplankton detection using deep learning. 77(4):1427–1439, 2020. ISSN 1054-3139. doi: 10.1093/icesjms/fsz171. Type: Article.
  • Chen [2021] Jianping Li, Zhenyu Yang, and Tao Chen. Dyb-planktonnet, 2021. URL https://dx.doi.org/10.21227/875n-f104.
  • Schmid et al. [2021] Moritz S Schmid, Dominic Daprano, Kyler M Jacobson, Christopher Sullivan, Christian Briseño-Avena, Jessica Y Luo, and Robert K Cowen. A Convolutional Neural Network based high- throughput image classification pipeline - code and documentation to process plankton underwater imagery using local HPC infrastructure and NSF’s XSEDE, May 2021. URL https://doi.org/10.5281/zenodo.4641158. This project was funded by the National Science Foundation under grant numbers OCE-1737399 and OCE-1419987, the National Aeronautics and Space Administration under grant number 80NSSC20M0008, the Belmont Forum (through NSF grant number 1927710), as well as the Extreme Science and Engineering Discovery Environment (XSEDE) under grant number OCE170012.
  • Picheral et al. [2017] Marc Picheral, S Colin, and Jean-Olivier Irisson. EcoTaxa—a tool for the taxonomic classification of images, 2017. URL http://ecotaxa.obs-vlfr.fr/.
  • Gorsky et al. [2019] Gabriel Gorsky, Guillaume Bourdin, Fabien Lombard, Maria Luiza Pedrotti, Samuel Audrain, Nicolas Bin, Emmanuel Boss, Chris Bowler, Nicolas Cassar, Loic Caudan, Genevieve Chabot, Natalie R. Cohen, Daniel Cron, Colomban De Vargas, John R. Dolan, Eric Douville, Amanda Elineau, J. Michel Flores, Jean Francois Ghiglione, Nils Haëntjens, Martin Hertau, Seth G. John, Rachel L. Kelly, Ilan Koren, Yajuan Lin, Dominique Marie, Clémentine Moulin, Yohann Moucherie, Stéphane Pesant, Marc Picheral, Julie Poulain, Mireille Pujo-Pay, Gilles Reverdin, Sarah Romac, Mathew B. Sullivan, Miri Trainic, Marc Tressol, Romain Troublé, Assaf Vardi, Christian R. Voolstra, Patrick Wincker, Sylvain Agostini, Bernard Banaigs, Emilie Boissin, Didier Forcioli, Paola Furla, Pierre E. Galand, Eric Gilson, Stéphanie Reynaud, Shinichi Sunagawa, Olivier P. Thomas, Rebecca Lisette Vega Thurber, Didier Zoccola, Serge Planes, Denis Allemand, and Eric Karsenti. Expanding tara oceans protocols for underway, ecosystemic sampling of the ocean-atmosphere interface during tara pacific expedition (2016–2018). 6:750, 2019. ISSN 2296-7745. doi: 10.3389/fmars.2019.00750. URL https://www.frontiersin.org/article/10.3389/fmars.2019.00750.
  • Wiggins and Hildebrand [2016] Sean M Wiggins and John A Hildebrand. Long-term monitoring of cetaceans using autonomous acoustic recording packages. In Listening in the Ocean, pages 35–59. Springer, 2016.
  • Wiggins and Hildebrand [2007] Sean M Wiggins and John A Hildebrand. High-frequency acoustic recording package (harp) for broad-band, long-term marine mammal monitoring. In 2007 Symposium on Underwater Technology and Workshop on Scientific Use of Submarine Cables and Related Technologies, pages 551–557. IEEE, 2007.
  • Baumgartner and Mussoline [2011] Mark F Baumgartner and Sarah E Mussoline. A generalized baleen whale call detection and classification system. The Journal of the Acoustical Society of America, 129(5):2889–2902, 2011.
  • Giannakis and Tsatsanis [1990] Georgios B Giannakis and Michail K Tsatsanis. Signal detection and classification using matched filtering and higher order statistics. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38(7):1284–1296, 1990.
  • Mellinger and Clark [1997] David K Mellinger and Christopher W Clark. Methods for automatic detection of mysticete sounds. Marine & Freshwater Behaviour & Phy, 29(1-4):163–181, 1997.
  • Piczak [2015] Karol J Piczak. Environmental sound classification with convolutional neural networks. In 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2015.
  • Mushtaq et al. [2021] Zohaib Mushtaq, Shun-Feng Su, and Quoc-Viet Tran. Spectral images based environmental sound classification using cnn with meaningful data augmentation. Applied Acoustics, 172:107581, 2021.
  • Sharma et al. [2020] Jivitesh Sharma, Ole-Christoffer Granmo, and Morten Goodwin. Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network. In Proc. Interspeech 2020, pages 1186–1190, 2020. doi: 10.21437/Interspeech.2020-1303. URL http://dx.doi.org/10.21437/Interspeech.2020-1303.
  • Zhong et al. [2020] Ming Zhong, Manuel Castellote, Rahul Dodhia, Juan Lavista Ferres, Mandy Keogh, and Arial Brewer. Beluga whale acoustic signal classification using deep learning neural network models. The Journal of the Acoustical Society of America, 147(3):1834–1841, 2020.
  • Shiu et al. [2020] Yu Shiu, KJ Palmer, Marie A Roch, Erica Fleishman, Xiaobai Liu, Eva-Marie Nosal, Tyler Helble, Danielle Cholewiak, Douglas Gillespie, and Holger Klinck. Deep neural networks for automated detection of marine mammal species. Scientific reports, 10(1):1–12, 2020.
  • Bergler et al. [2019] Christian Bergler, Hendrik Schröter, Rachael Xi Cheng, Volker Barth, Michael Weber, Elmar Nöth, Heribert Hofer, and Andreas Maier. Orca-spot: An automatic killer whale sound detection toolkit using deep learning. Scientific reports, 9(1):1–17, 2019.
  • Bermant et al. [2019] Peter C Bermant, Michael M Bronstein, Robert J Wood, Shane Gero, and David F Gruber. Deep machine learning techniques for the detection and classification of sperm whale bioacoustics. Scientific reports, 9(1):1–10, 2019.
  • Rasmussen and Širović [2021] Jeppe Have Rasmussen and Ana Širović. Automatic detection and classification of baleen whale social calls using convolutional neural networks. The Journal of the Acoustical Society of America, 149(5):3635–3644, 2021. doi: 10.1121/10.0005047. URL https://doi.org/10.1121/10.0005047.
  • Sun et al. [2019] Yanan Sun, Bing Xue, Mengjie Zhang, and Gary G Yen. Completely automated CNN architecture design based on blocks. IEEE Transactions on Neural Networks and Learning Systems, 31(4):1242–1254, 2019.
  • Hochreiter and Schmidhuber [1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • Vaswani et al. [2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.
  • Clutton-Brock and Sheldon [2010] Tim Clutton-Brock and Ben C. Sheldon. Individuals and populations: the role of long-term, individual-based studies of animals in ecology and evolutionary biology. Trends in Ecology & Evolution, 25(10):562–573, 2010. ISSN 01695347. doi: 10.1016/j.tree.2010.08.002. URL https://linkinghub.elsevier.com/retrieve/pii/S0169534710001849.
  • Pine et al. [2003] William E Pine, Kenneth H Pollock, Joseph E Hightower, Thomas J Kwak, and James A Rice. A review of tagging methods for estimating fish population size and components of mortality. Fisheries, 28(10):10–23, 2003.
  • Ferreira et al. [2020b] André C. Ferreira, Liliana R. Silva, Francesco Renna, Hanja B. Brandl, Julien P. Renoult, Damien R. Farine, Rita Covas, and Claire Doutrelant. Deep learning‐based methods for individual recognition in small birds. Methods in Ecology and Evolution, 11(9), 2020b. ISSN 2041-210X. doi: 10.1111/2041-210X.13436.
  • Carter et al. [2014] Steven JB Carter, Ian P Bell, Jessica J Miller, and Peter P Gash. Automated marine turtle photograph identification using artificial neural networks, with application to green turtles. Journal of experimental marine biology and ecology, 452:105–110, 2014.
  • Schofield et al. [2019] Daniel Schofield, Arsha Nagrani, Andrew Zisserman, Misato Hayashi, Tetsuro Matsuzawa, Dora Biro, and Susana Carvalho. Chimpanzee face recognition from videos in the wild using deep learning. Science Advances, 5(9):1–10, 2019. ISSN 23752548. doi: 10.1126/sciadv.aaw0736.
  • dala Corte and Moschetta [2016] Renato B. Dala-Corte, Júlia B. Moschetta, and Fernando G. Becker. Photo-identification as a technique for recognition of individual fish: A test with the freshwater armored catfish Rineloricaria aequalicuspis Reis & Cardoso, 2001 (Siluriformes: Loricariidae). Neotropical Ichthyology, 14(1), 2016. ISSN 19820224. doi: 10.1590/1982-0224-20150074.
  • Hau and Sadovy de Mitcheson [2019] Cheuk Yu Hau and Yvonne Sadovy de Mitcheson. A facial recognition tool and legislative changes for improved enforcement of the CITES Appendix II listing of the humphead wrasse, Cheilinus undulatus. Aquatic Conservation: Marine and Freshwater Ecosystems, 29(12), 2019. ISSN 10990755. doi: 10.1002/aqc.3199.
  • Mucientes et al. [2019] Gonzalo Mucientes, José Irisarri, and David Villegas-Ríos. Interannual fine-scale site fidelity of male ballan wrasse Labrus bergylta revealed by photo-identification and tagging. Journal of Fish Biology, 95(4), 2019. ISSN 0022-1112.
  • Aguzzi et al. [2015] J. Aguzzi, C. Doya, S. Tecchio, F. C. De Leo, E. Azzurro, C. Costa, V. Sbragaglia, J. Del Río, J. Navarro, H. A. Ruhl, J. B. Company, P. Favali, A. Purser, L. Thomsen, and I. A. Catalán. Coastal observatories for monitoring of fish behaviour and their responses to environmental changes. Reviews in Fish Biology and Fisheries, 25(3):463–483, 2015. ISSN 09603166. doi: 10.1007/s11160-015-9387-9.
  • Campos-Candela et al. [2018] Andrea Campos-Candela, Miquel Palmer, Salvador Balle, and Josep Alós. A camera-based method for estimating absolute density in animals displaying home range behaviour. Journal of Animal Ecology, 87(3):825–837, 2018. ISSN 13652656. doi: 10.1111/1365-2656.12787.
  • Perry et al. [2018] Diana Perry, Thomas A.B. Staveley, and Martin Gullström. Habitat connectivity of fish in temperate shallow-water seascapes. Frontiers in Marine Science, 4(1):1–12, 2018. ISSN 22967745. doi: 10.3389/fmars.2017.00440.
  • Winn [1964] H. E. Winn. The biological significance of fish sounds. In Marine Bioacoustics, 1964.
  • Parsons et al. [2009] Miles J Parsons, Robert D McCauley, Michael C Mackie, Paulus Siwabessy, and Alec J Duncan. Localization of individual mulloway (Argyrosomus japonicus) within a spawning aggregation and their behaviour throughout a diel spawning period. ICES Journal of Marine Science, 66(6):1007–1014, 2009.
  • Deng and Yu [2014] Li Deng and Dong Yu. Deep learning: methods and applications. Foundations and trends in signal processing, 7(3–4):197–387, 2014.
  • Brown and Macfadyen [2007] James Brown and Graeme Macfadyen. Ghost fishing in European waters: Impacts and management responses. Marine Policy, 31(4):488–504, 2007.
  • Vadziutsina and Riera [2020] Maria Vadziutsina and Rodrigo Riera. Review of fish trap fisheries from tropical and subtropical reefs: Main features, threats and management solutions. Fisheries Research, 223:105432, 2020.
  • Labbe-Morissette and Gauthier [2020] Guillaume Labbe-Morissette and Sylvain Gauthier. Unsupervised extraction of underwater regions of interest in side scan sonar imagery. Journal of Ocean Technology, 15(1), 2020.
  • Politikos et al. [2021] Dimitris V. Politikos, Elias Fakiris, Athanasios Davvetas, Iraklis A. Klampanos, and George Papatheodorou. Automatic detection of seafloor marine litter using towed camera images and deep learning. Marine Pollution Bulletin, 164:111974, 2021. ISSN 0025-326X. doi: 10.1016/j.marpolbul.2021.111974. URL https://www.sciencedirect.com/science/article/pii/S0025326X21000084.
  • Saba et al. [2021] Grace K Saba, Adrian B Burd, John P Dunne, Santiago Hernández-León, Angela H Martin, Kenneth A Rose, Joseph Salisbury, Deborah K Steinberg, Clive N Trueman, Rod W Wilson, et al. Toward a better understanding of fish-based contribution to ocean carbon flux. Limnology and Oceanography, 66(5):1639–1664, 2021.
  • Martin et al. [2021] Angela Helen Martin, Heidi Christine Pearson, Grace Kathleen Saba, and Esben Moland Olsen. Integral functions of marine vertebrates in the ocean carbon cycle and climate change mitigation. One Earth, 4(5):680–693, 2021.
  • Sanchez-Gonzalez et al. [2020] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph networks. In International Conference on Machine Learning, pages 8459–8468. PMLR, 2020.
  • Hunt et al. [2020] Corallie Hunt, Urška Demšar, Dayton Dove, Craig Smeaton, Rhys Cooper, and William EN Austin. Quantifying marine sedimentary carbon: a new spatial analysis approach using seafloor acoustics, imagery, and ground-truthing data in Scotland. Frontiers in Marine Science, 7:588, 2020.
  • Wilson et al. [2018] Robert J Wilson, Douglas C Speirs, Alessandro Sabatino, and Michael R Heath. A synthetic map of the north-west European shelf sedimentary environment for applications in marine science. Earth System Science Data, 10(1):109–130, 2018.
  • Schiettekatte et al. [2020] Nina MD Schiettekatte, Diego R Barneche, Sébastien Villéger, Jacob E Allgeier, Deron E Burkepile, Simon J Brandl, Jordan M Casey, Alexandre Mercière, Katrina S Munsterman, Fabien Morat, et al. Nutrient limitation, bioenergetics and stoichiometry: A new model to predict elemental fluxes mediated by fishes. Functional Ecology, 34(9):1857–1869, 2020.
  • Martin et al. [2020] Adrian Martin, Philip Boyd, Ken Buesseler, Ivona Cetinic, Hervé Claustre, Sari Giering, Stephanie Henson, Xabier Irigoien, Iris Kriest, Laurent Memery, et al. The oceans’ twilight zone must be studied now, before it is too late, 2020.
  • Bohan et al. [2011] David A Bohan, Geoffrey Caron-Lormier, Stephen Muggleton, Alan Raybould, and Alireza Tamaddoni-Nezhad. Automated discovery of food webs from ecological data using logic-based machine learning. PLoS One, 6(12):e29028, 2011.
  • Lyubchich and Woodland [2019] Vyacheslav Lyubchich and Ryan J Woodland. Using isotope composition and other node attributes to predict edges in fish trophic networks. Statistics & Probability Letters, 144:63–68, 2019.
  • Whitt et al. [2020] Christopher Whitt, Jay Pearlman, Brian Polagye, Frank Caimi, Frank Muller-Karger, Andrea Copping, Heather Spence, Shyam Madhusudhana, William Kirkwood, Ludovic Grosjean, Bilal Muhammad Fiaz, Satinder Singh, Sikandra Singh, Dana Manalang, Ananya Sen Gupta, Alain Maguer, Justin J. H. Buck, Andreas Marouchos, Malayath Aravindakshan Atmanand, Ramasamy Venkatesan, Vedachalam Narayanaswamy, Pierre Testor, Elizabeth Douglas, Sebastien de Halleux, and Siri Jodha Khalsa. Future vision for autonomous ocean observations. Frontiers in Marine Science, 7:697, 2020. ISSN 2296-7745. doi: 10.3389/fmars.2020.00697. URL https://www.frontiersin.org/article/10.3389/fmars.2020.00697.
  • Szesciorka et al. [2020] Angela R Szesciorka, Lisa T Ballance, Ana Širović, Ally Rice, Mark D Ohman, John A Hildebrand, and Peter JS Franks. Timing is everything: Drivers of interannual variability in blue whale migration. Scientific reports, 10(1):1–9, 2020.
  • Tam et al. [2017] Jamie C. Tam, Jason S. Link, Axel G. Rossberg, Stuart I. Rogers, Philip S. Levin, Marie-Joëlle Rochet, Alida Bundy, Andrea Belgrano, Simone Libralato, Maciej Tomczak, Karen van de Wolfshaar, Fabio Pranovi, Elena Gorokhova, Scott I. Large, Nathalie Niquil, Simon P. R. Greenstreet, Jean-Noel Druon, Jurate Lesutiene, Marie Johansen, Izaskun Preciado, Joana Patricio, Andreas Palialexis, Paul Tett, Geir O. Johansen, Jennifer Houle, and Anna Rindorf. Towards ecosystem-based management: identifying operational food-web indicators for marine ecosystems. ICES Journal of Marine Science, 74(7):2040–2052, 2017. ISSN 1054-3139. doi: 10.1093/icesjms/fsw230. URL https://doi.org/10.1093/icesjms/fsw230.
  • McDonald et al. [2018] Karlie S. McDonald, Alistair J. Hobday, Elizabeth A. Fulton, and Peter A. Thompson. Interdisciplinary knowledge exchange across scales in a globally changing marine environment. Global Change Biology, 24(7):3039–3054, 2018. doi: 10.1111/gcb.14168. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.14168.
  • Wilkinson et al. [2019] Mark D Wilkinson, Michel Dumontier, Ijsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, et al. The FAIR guiding principles for scientific data management and stewardship (vol 15, 160018, 2016). Scientific Data, 6, 2019.
  • Langenkämper et al. [2017] Daniel Langenkämper, Martin Zurowietz, Timm Schoening, and Tim W. Nattkemper. Biigle 2.0 - browsing and annotating large marine image collections. Frontiers in Marine Science, 4:83, 2017. ISSN 2296-7745. doi: 10.3389/fmars.2017.00083. URL https://www.frontiersin.org/article/10.3389/fmars.2017.00083.