The uncertainties around the state of global insect populations are largely due to data gaps, and more efficient methods for quantifying abundance and identifying invertebrates are urgently needed (seibold2019; wagner2020). Commonly used passive traps, such as Malaise traps, produce samples that are time-consuming to process. For this reason, samples are sometimes only weighed, as was the case in the study that triggered global attention to insect declines (hallmann2017). In other studies, specimens are lumped into larger taxonomic groups (timms2012; hoye2008; rich2013), or only specific taxa are identified (loboda2017; hansen2016). On the other hand, such traps help standardise efforts across sampling events and are often preferred in long-term monitoring. Hence, the time and expertise needed to process (sort, identify, count and potentially weigh) samples of insects and other invertebrates from passive traps remain a key bottleneck in entomological research. In light of the apparent global decline of many invertebrate taxa, and the Linnean shortfall (that only a small fraction of all species on Earth are described; hortal2015), more efficient ways of processing invertebrate samples are in high demand. Such methods should ideally 1) not destroy specimens, which could be new to the study area or even new to science, 2) count the abundance of individual species, and 3) estimate the biomass of such samples.
Reliable identification of species is pivotal, but because it is inherently slow and costly, traditional manual identification has caused bottlenecks in the bioassessment process. As the demand for biological monitoring grows, and the number of taxonomic experts declines (gaston2004), there is a need for alternatives to the manual processing and identification of monitoring samples (borja2013; nygard2016). While genetic approaches are gaining popularity and becoming standard tools in diversity assessments (raupach2010; keskin2014; dunker2016; aylagas2016; kermarrec2014; elbrecht2017; zimmermann2015), they are still expensive and are not yet suitable for producing reliable abundance data or estimates of biomass. Instead, machine learning methods could be used to replace or semi-automate the task of manual species identification.
Several computer-based identification systems for biological monitoring have been proposed and tested in the last two decades. While potamis2014 has classified birds based on sound and qian2011 have used acoustic signals to identify bark beetles, most computer-based identification systems use morphological features and image data for species prediction. schroder1995; weeks1997; liu2008; lequing2012; perre2016 and feng2016 have classified bees, butterflies, fruit flies and wasps based on features calculated from their wings. In aquatic research, automatic or semi-automatic systems have been developed to identify algae (e.g. santhi2013), zooplankton (e.g. dai2016; bochinski2018) and benthic macroinvertebrates (e.g. raitoharju2019b; arje2020). In recent years, iNaturalist, a citizen-science application and community for recording and sharing nature observations, has accumulated a notable database of taxa images for training state-of-the-art CNNs (vanhorn2018). However, such field photos will not provide the same accuracy as can be achieved in the lab under controlled light conditions.
Classification based on single 2D images can suffer from variations of the viewing angle and certain morphological traits being omitted. To overcome those limitations zhang2010 have proposed a method for structuring 3D insect models from 2D images. raitoharju2018 have presented an imaging system producing multiple images from two different angles for benthic macroinvertebrates. Using this latter imaging device and deep CNNs, arje2020 have achieved classification accuracy within the range of human taxonomic experts.
Our aims for this work were 1) to make a reproducible imaging system, 2) to test the importance of different camera settings, 3) to evaluate overall classification accuracy, and 4) to test the possibility of deriving biomass directly from geometrical features in images. To meet these objectives, we rebuilt the imaging system presented in raitoharju2018 using industry components to make it completely reproducible. It has been made light-proof to prevent stray light from affecting the images. We also developed a flushing mechanism to pass specimens through the imaging device. This is a critical improvement for automation, as explained below. For classification, we used Resnet-50 (he2016) and InceptionV3 (szegedy2016) CNNs. We tested different camera settings (exposure time and aperture) to find the optimal settings for species identification, and we explored the number of images per specimen necessary to achieve high classification accuracy. Finally, for a subset of species, we tested whether the area of a specimen derived directly from images taken by the device could serve as a proxy for the biomass (dry weight) of the specimen.
2 Materials and Methods
To facilitate the automation of specimen identification, biomass estimation and sorting of invertebrate specimens, we improved the prototype imaging system developed for automatic identification of benthic macroinvertebrates (raitoharju2018). We named the new device the BIODISCOVER machine, as an acronym for BIOlogical specimens Described, Identified, Sorted, Counted, and Observed using Vision-Enabled Robotics. The system comprises an aluminium box with two Basler ACA1920-155UC cameras and LD75 lenses with ×0.15 to ×0.35 magnification and five aperture settings (maximum aperture ratio of 1:3.8). The cameras are placed at a 90-degree angle to each other at two corners of the case, and in the other corners there are a high-power LED light (ODSX30-WHI Prox Light, which enables a maximum frame rate of 100 per second with exposure 1000) and a rectangular cuvette made of optical glass and filled with ethanol. The inside of the case is depicted in Fig. 1a. The case is rubber-sealed and has a lid to minimize light, shadows and other disturbances. The lid has an opening for the cuvette with a funnel for dropping specimens into the liquid. Fig. 1b shows the new refill system, which pumps ethanol into the cuvette.
The multiview imaging component is connected to a computer with integrated software, which controls all parts of the machine. The program uses calibration images to detect objects differing from the background and triggers the light and cameras to take images as the specimen sinks in the ethanol until it disappears from the assigned field of view of the cameras. The program detects the specimen and crops the images to be 496 pixels wide (defined by the width of the cuvette) and 496 pixels high while keeping the specimen vertically centered in the image. If a specimen exceeds the height of 496 pixels, the resulting images will be taller. The images are stored on the computer as PNG files.
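The vertical cropping rule described above can be illustrated with a short sketch. It is not the BIODISCOVER software: the function name `crop_rows`, the row-index convention and the clamping of the window to the image bounds are all assumptions made for illustration.

```python
def crop_rows(top, bottom, image_height, target=496):
    """Return (start, stop) row indices of a crop window.

    The specimen is assumed to span rows [top, bottom). The window is
    `target` rows high and vertically centred on the specimen; specimens
    taller than `target` keep their full height, giving a taller image.
    """
    if bottom - top >= target:
        return top, bottom
    start = (top + bottom) // 2 - target // 2
    # Assumption: shift the window back inside the frame if it overshoots.
    start = max(0, min(start, image_height - target))
    return start, start + target
```

For a frame that is at least 496 rows high, the function returns a 496-row window for small specimens and the specimen's own extent for larger ones.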
The BIODISCOVER machine enables imaging of multiple specimens before the cuvette has to be emptied and refilled. This is accommodated by a small area at the bottom of the cuvette, where the specimens are outside the field of view of the cameras. Once a sample is imaged, the software triggers the opening of a sliding plate, which acts as a valve and flushes the specimens into a container below the imaging device case. Several containers placed in a rack can be controlled by the software based on input from the classification algorithm used to identify species. This enables sorting of specimens into predefined classes based on size or taxonomy. In this way, the system can, for instance, separate large and small specimens for further molecular study, separate insect orders, or separate common and rare species. The system is described in Fig. 2. After the specimens have been flushed into the container for archiving, the pump in Fig. 1b is used to refill the cuvette with ethanol.
Prior to large-scale imaging of reference collections of specimens of known identity, it is important to test the camera settings. As we plan to use the BIODISCOVER machine to create a large image database covering both terrestrial and aquatic invertebrates, it is important to optimize the different settings of the device to ensure the best possible image quality of the database with regard to classification accuracy. For this purpose, we imaged a pilot dataset with nine different combinations of camera settings. To study the importance of lighting, we explored the effect of varying exposure values of and to study the effect of the focal length, we explored aperture values . Using the nine different combinations of camera settings, we imaged a dataset of nine terrestrial arthropod species collected at Narsarsuaq, South Greenland and identified by morphology using bocher2015: Bembidion grapii, Byrrhus fasciatus, Coccinella transversoguttata, Otiorhynchus arcticus, Otiorhynchus nodosus, Patrobus septentrionus, Quedius fellmanni, Xysticus deichmanni and Xysticus durus (see Fig. 3a). For the pilot data, we wanted to include both species that have clear visual differences and should be easily identifiable and species from the same genera that have similar morphological features and are more difficult to tell apart.
The resulting nine datasets include the same specimens, but the number of images varies depending on the camera settings since a longer exposure time decreases the frame rate. Fig. 3b shows example images of the same Coccinella transversoguttata specimen from each of the nine camera setting combinations. Some of the specimens were damaged during the imaging. Therefore, to have comparable results, we removed any specimens that were not present in all nine datasets. In addition, we performed a crude, initial check for outliers by calculating the mean of blue, green and red pixel values per species and making a list of all specimens that had mean pixel values further than three standard deviations from the species average. We then manually checked the images of those listed specimens and removed, e.g., images with only air bubbles or severed limbs. After this initial check, the number of images per specimen in the final data ranged from 1 to 376 (with 15 cases where a specimen had only 1 image). Table 1 gives details on the final data.
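The outlier screen described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes each specimen has already been summarised by its mean blue, green and red pixel values, and the function name `flag_outliers`, the input layout and the use of the population standard deviation are hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_outliers(specimens, n_sd=3.0):
    """Return IDs of specimens to inspect manually.

    specimens: list of (specimen_id, species, (mean_b, mean_g, mean_r)).
    A specimen is flagged if any channel mean lies more than n_sd
    standard deviations from the average of its species.
    """
    by_species = defaultdict(list)
    for _, species, bgr in specimens:
        by_species[species].append(bgr)

    flagged = []
    for specimen_id, species, bgr in specimens:
        group = by_species[species]
        for channel in range(3):
            values = [v[channel] for v in group]
            centre = mean(values)
            spread = pstdev(values)  # assumption: population SD
            if spread > 0 and abs(bgr[channel] - centre) > n_sd * spread:
                flagged.append(specimen_id)
                break
    return flagged
```

The function only produces a list for manual review; as in the text, the decision to remove images stays with a human checker.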
We split the data into training (), validation () and test () observations. As difficult specimens can introduce variation to the results, we performed the tests on 10 different random data divisions. If a specimen was selected for training, all the images of that specimen were used for training. To keep the results comparable between the different camera settings, we used the exact same training specimens for all camera settings. Respectively, the exact same validation and testing specimens were used for each camera setting combination. The number of images, exposure and aperture differed for the camera setting combinations but the specimens remained the same, i.e. if a difficult, atypical specimen of a certain species was selected for testing that same specimen was used for testing all the camera setting combinations, making the identification task equally difficult for all the settings.
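A specimen-level split of this kind can be sketched as below. The split proportions are not recoverable from the text, so the `fractions` default is a placeholder assumption, and the function names are hypothetical. The point of the sketch is that the split is made over specimen IDs, so all images of a specimen land in the same partition and the same partition can be reused for every camera setting.

```python
import random


def split_specimens(specimen_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle specimen IDs and cut them into train/validation/test lists.

    fractions is a placeholder; the proportions used in the study are
    not stated here.
    """
    ids = sorted(specimen_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def images_for(split_ids, images_by_specimen):
    """Collect every image of the chosen specimens for one camera setting."""
    return [img for sid in split_ids for img in images_by_specimen.get(sid, [])]


# Ten random divisions, each reused unchanged for every camera setting.
divisions = [split_specimens(range(100), seed=s) for s in range(10)]
```

Because the seed fixes the division, calling `images_for` with the same ID lists on each camera-setting dataset yields comparable training, validation and test sets.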
To examine whether the BIODISCOVER machine benefits from having two cameras shooting from different angles, we performed a test where, for each specimen, we counted the number of images captured by each of the cameras. To compare the two camera angles, we required an equal number of images from both cameras. For each specimen, we checked which camera had captured fewer images and randomly sampled the same number of images from the other camera as well. Finally, we randomly sampled that same number of images for each specimen, this time including images from both cameras. Thus, we obtained three datasets, each with the same total number of images. To account for variation in a single data split, we ran the test again on 10 data divisions into training, validation and test observations.
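The per-specimen sampling for the camera comparison can be sketched as follows. The function name `balanced_camera_sets` and the representation of images as list items are assumptions for illustration only.

```python
import random


def balanced_camera_sets(cam1, cam2, rng):
    """For one specimen, return three equally sized image samples:
    camera 1 only, camera 2 only, and a mix drawn from both cameras.

    The sample size is set by the camera that captured fewer images.
    """
    n = min(len(cam1), len(cam2))
    only1 = rng.sample(cam1, n)
    only2 = rng.sample(cam2, n)
    both = rng.sample(cam1 + cam2, n)
    return only1, only2, both
```

Applying the function to every specimen gives three datasets with identical image counts, so any difference in accuracy is not driven by the amount of data.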
For the classification task, we tested two widely used deep CNNs, namely Resnet-50 (he2016) and InceptionV3 (szegedy2016), both pretrained with the Imagenet database (deng2009). For each data division, we used the training observations to fine-tune the weights of the pretrained CNNs. To feed the images to the network, we scaled them all to 128 × 128 pixels. This caused slight distortion to specimens taller than 496 px, but the majority of the images () are square-shaped and thus remained undistorted. We used batch normalization, a batch size of 128, and a decaying learning rate, training the network for 50 epochs with each learning rate. The validation images were used to select optimal weights for the network. Finally, the test observations were used to test the final classification accuracy.
As we used multiple images per observation, we needed to define a decision rule to determine the final species of the observation based on the predictions for all the images. The simplest option was to use majority vote, i.e. the species that was predicted most often among the images of the specimen was chosen as the final prediction.
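The majority vote rule can be written in a few lines. The sketch below is illustrative; the function name is hypothetical, and ties are resolved in favour of the label that appears first among the images, which is an assumption since the tie-breaking rule is not described here.

```python
from collections import Counter


def majority_vote(image_predictions):
    """Return the label predicted most often among a specimen's images.

    Ties go to the label encountered first (an assumption).
    """
    counts = Counter(image_predictions)
    return counts.most_common(1)[0][0]
```

For example, a specimen whose images are predicted as two Xysticus durus and one Xysticus deichmanni is assigned to Xysticus durus.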
The BIODISCOVER machine derives geometric features from each image taken of each individual. These features include the area of the specimen in the image, which can be used for biomass prediction. For this purpose, we imaged three species of Diptera with the optimal camera settings and measured dry weight for a subset of these data (). The species included in this dataset were Dolichopus groenlandicus, Dolichopus plumipes and Tachina ampliforceps. The area was calculated from the images as the average per specimen. After imaging, each specimen was dried at 70°C for 48 hours and weighed on a scale to the nearest 0.0001 g to quantify dry weight. For biomass prediction, we log-transformed the data and fitted a linear mixed model to examine the relationship between the average area and dry weight, using species as a random factor. However, the model assumptions could not be met with the data; hence, we fitted separate generalized linear models for each species.
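The relationship between log-transformed area and dry weight can be illustrated with an ordinary least-squares fit. Note that this is a simplified stand-in and not the generalized linear models fitted in the study; the function name `fit_log_log` and the assumption that both variables are log-transformed are choices made for this sketch, which would be applied to one species at a time.

```python
import math


def fit_log_log(areas, dry_weights):
    """Least-squares fit of log(dry weight) = a + b * log(area).

    Returns (a, b, r_squared) for one species.
    """
    x = [math.log(v) for v in areas]
    y = [math.log(v) for v in dry_weights]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared
```

A predicted dry weight for a new specimen of the same species is then `math.exp(a + b * math.log(area))`.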
Our first objective was to find optimal camera settings for the imaging device for species identification. The average classification accuracy across 10 test sets is presented in Table 2 and Fig. 4. Based on the results from our pilot data, the optimal camera settings for both CNNs were exposure and aperture . The InceptionV3 network produced the highest classification accuracy with these camera settings. For InceptionV3, the best camera settings also yielded the second lowest standard deviation. The differences between the settings were small but we observed that decreasing aperture to 1:16 decreased the classification accuracy. For higher exposure, an initial decrease in aperture enhanced the results while decreasing aperture to 1:16 decreased classification accuracy. For exposure , even increasing aperture to 8 decreased classification accuracy. The optimal camera settings are intuitive as they provide sharp images while having as much light as possible.
In addition to majority vote, we used the weighted sum rule for CNN confidence values presented in raitoharju2019b. Using the weighted sum rule for confidence values for our image data gave varying results. For Resnet-50, the weighted sum rule produced slightly higher accuracies across all camera settings but for InceptionV3, it produced sometimes lower and sometimes higher accuracies with the highest value for exposure , aperture being almost identical to that of the majority vote rule. Hence, we used the majority vote decision rule in our classification results. The results for the weighted sum rule are shown in detail in Table 6 in the Appendix.
Table 3 gives the average training times (on a K80 GPU) for the different camera settings. The differences in training time were mainly due to the number of images (see Table 1), but they confirmed that the optimal camera settings were optimal with regard to training time as well. In addition to producing higher classification accuracy, the InceptionV3 network was also faster to train.
To test whether the BIODISCOVER machine benefits from having two cameras shooting from different angles, we performed a test on the data imaged with the optimal camera settings. The results are shown in Table 4. The classification accuracy was higher when using images from both cameras. In addition, for Resnet-50, the standard deviation was lower, meaning there is less variation in the classification accuracy due to the choice of test specimens. The classification accuracies in this test are slightly lower than in Table 2 because for this particular test we used fewer images per specimen (approximately 50%).
Table 4. Classification accuracy with images from Camera 1, Camera 2 and both cameras, for each CNN.
Once we had optimized the camera settings, we re-ran the InceptionV3 network with the data that also included the three Diptera species. The average classification accuracy over 10 test sets was 0.980. The individual classification decisions are summarized in a confusion matrix with the true species on the rows and the predicted species on the columns. Table 5 shows the normalized average confusion matrix over the 10 random data splits for the InceptionV3 CNN with the optimal camera settings. As for individual species, Bembidion grapii was the hardest to identify. Some of the specimens were misclassified as Patrobus septentrionus and Quedius fellmanni. In addition, Otiorhynchus arcticus and Otiorhynchus nodosus were often confused, as were Xysticus deichmanni and Xysticus durus. Other common classification errors were misclassifying Byrrhus fasciatus as Otiorhynchus nodosus and misclassifying Xysticus durus as Bembidion grapii. The species that performed poorly compared to the others are the species with the lowest number of images in the data. The accuracy could be improved by collecting more data on these species or by using data augmentation techniques.
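A normalized average confusion matrix of this kind can be computed as in the sketch below. The order of operations (row-normalising each split's matrix and then averaging across splits) is an assumption, as are the function names; the study may have aggregated the splits differently.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with true classes on rows and predictions on columns."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def row_normalise(matrix):
    """Divide each row by its total so that rows sum to 1."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(size)]
            for i in range(size)]
```

With row normalisation, each diagonal entry is the fraction of specimens of that species classified correctly, which makes species with few specimens directly comparable to common ones.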
When considering automated biomonitoring, one key factor is the time it takes to automatically determine the taxonomic identity of a specimen. Training of the network can take a long time, but it needs to be done only once, so we recommend using as much data as possible for the training. In taxa identification scenarios, optimising the time used for testing is more interesting. The number of images per specimen affects the total time of identification as each image needs a prediction. To optimize the number of images per specimen, we tested how this affects the classification accuracy. As the specimens had varying numbers of images, we tested with the maximum number of images per specimen, . If a specimen had fewer images, we used all of them. If a specimen had more images, we randomly sampled of them. Again, we ran this test on the 10 data splits imaged with the optimal camera settings. The results are shown in Fig. 5, where the dark blue line represents the average classification accuracy over 10 data splits and the lighter blue area is the standard deviation. The average number of images per specimen is 47, so while some specimens had over 100 images, the test accuracy stabilized at approximately 50 images. The same accuracy of approximately 96% could already be achieved with 20 images per specimen, but lower numbers of images increased the variation in the classification accuracy. While increasing the maximum number of images per specimen does increase the time for taxa predictions, testing time is not an issue. Even with a maximum of 100 images per specimen, the time taken to predict taxa for the entire test data was on average 40 seconds. However, fixing the maximum number of images per specimen would mean fewer images for the BIODISCOVER device to store on the computer, enabling a faster imaging process and saving computational resources.
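The capping of images per specimen used in this test can be sketched as follows. The function name `cap_images` and the parameter name `max_images` are hypothetical labels for the maximum described in the text.

```python
import random


def cap_images(images, max_images, rng):
    """Return at most max_images images for one specimen.

    Specimens with fewer images keep all of them; otherwise a random
    sample of max_images images is drawn without replacement.
    """
    if len(images) <= max_images:
        return list(images)
    return rng.sample(images, max_images)
```

Repeating the classification with increasing values of `max_images` on each data split gives an accuracy curve of the kind summarised in Fig. 5.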
Fig. 6 shows the results of the biomass prediction. The log-transformed average area was found to be a statistically significant predictor of dry weight for all three Diptera species. However, considering the R-squared values of the different models, the average area is a good predictor only for the largest species, Tachina ampliforceps (R-squared = 0.758). For the two small Dolichopus species, the relationships were weaker.
We have presented an image-based identification system (i.e. the BIODISCOVER machine) for insects and other invertebrates as an alternative to manual identification. We demonstrated a very high classification accuracy on a test set of images of 249 specimens of known identity belonging to one of 12 insect and spider species. We were also able to show that biomass of individual specimens could be predicted straight from information in the images. Together, our results pave the way for future non-destructive, automatic, image-based identification and biomass estimation of bulk invertebrate samples.
We imaged specimens of seven beetle, two spider, and three fly species with the BIODISCOVER machine with different values for exposure time and aperture settings and found that the best camera settings were obtained with an exposure time and an aperture . With these settings, we obtained a high test classification accuracy of 98.0%, demonstrating the great potential of the BIODISCOVER machine for use in species identification. For comparison, in arje2020, taxonomic experts achieved an accuracy of 93.9% with a dataset of 39 taxonomic groups. While adding more species to the data will increase the difficulty of the classification task (arje2020), data augmentation can be used to improve the results for rare species (raitoharju2016).
We tested predicting biomass from images on a subset of three fly species. We explored a joint mixed model for all species, but the small dataset restricted our final analysis to three species-wise generalized linear models. The average area of the specimen was a good predictor of dry weight for the largest species, Tachina ampliforceps, but the two smaller species would require more data for better results. For instance, by weighing more species of different sizes, it would be possible to quantify the uncertainty associated with using general relationships between area and dry weight constructed from multiple, related species (e.g. species belonging to the same family). The BIODISCOVER machine can easily be used with any animal small enough to fit into the cuvette. The imaging device comprises standard industry components, which makes it possible to build more copies of the BIODISCOVER machine. The flow-through and refill systems facilitate easy archiving of samples. Furthermore, the BIODISCOVER machine also saves metadata from the images, e.g. geometric features that can be used in automatic biomass predictions.
The imaging device is one of three components for automatic image-based species identification. We are currently working on implementing a) a computer-vision-enabled robotic arm to automatically detect insects in a bulk sample in a tray and choose among different tools to move individual specimens to the imaging device, and b) a sorting rack to place specimens in the preferred container after imaging based on, e.g., taxonomic identity, size or rarity. With these additions, the BIODISCOVER machine offers high-throughput, non-destructive taxonomic identification, size/biomass estimation, counting and further morphological data, while keeping the specimens intact. Given that the robotic arm is standard industry equipment, we are on the verge of producing a truly automated species identification system for invertebrates, both aquatic and terrestrial.
We would like to thank CSC for computational resources. TTH acknowledges funding from his VILLUM Experiment project “Automatic Insect Detection” (grant 17523) and an Aarhus University synergy grant.
6 Authors’ contributions
JÄ drafted the paper with contributions from TTH with the other authors providing feedback and approving the final manuscript. CM, TTH, MRJ and SAM designed and built the BIODISCOVER machine with inputs from KM and VT who had designed the prototype described in raitoharju2018. MSR imaged the arthropod specimens and JÄ, TTH, AI, MG and JR designed the classification experiments for the data.
The image data will be made publicly available on publication of the manuscript.