Whombat: An open-source annotation tool for machine learning development in bioacoustics

1. Automated analysis of bioacoustic recordings using machine learning (ML) methods has the potential to greatly scale biodiversity monitoring efforts. The use of ML for high-stakes applications, such as conservation research, demands a data-centric approach focused on carefully annotated and curated evaluation and training data that are relevant and representative. Creating annotated datasets of sound recordings presents a number of challenges, such as managing large collections of recordings with associated metadata, developing flexible annotation tools that can accommodate the diverse vocalization profiles of different organisms, and addressing the scarcity of expert annotators.
2. We present Whombat, a user-friendly, browser-based interface for managing audio recordings and annotation projects, with several visualization, exploration, and annotation tools. It enables users to quickly annotate, review, and share annotations, as well as visualize and evaluate a set of machine learning predictions on a dataset. The tool facilitates an iterative workflow in which user annotations and machine learning predictions feed back into one another to enhance model performance and annotation quality.
3. We demonstrate the flexibility of Whombat by showcasing two distinct use cases: a project aimed at enhancing automated UK bat call identification at the Bat Conservation Trust (BCT), and a collaborative effort between USDA Forest Service and Oregon State University researchers exploring bioacoustic applications and extending automated avian classification models in the Pacific Northwest, USA.
4. Whombat is a flexible tool that can effectively address the challenges of annotation for bioacoustic research. It can be used for individual and collaborative work, hosted on a shared server or accessed remotely, or run on a personal computer without the need for coding skills.


