audb – Sharing and Versioning of Audio and Annotation Data in Python

03/01/2023
by Hagen Wierstorf, et al.

Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is growing rapidly. audb is an open-source Python library that supports versioning and documentation of audio datasets. It aims to provide a simple, standardized user interface for publishing, maintaining, and accessing the annotations and audio files of a dataset. To store the data efficiently on a server, audb automatically resolves dependencies between versions of a dataset and uploads only newly added or altered files when a new version is published. The library supports partial loading of a dataset and local caching for fast access. audb is lightweight and can be used from any machine learning library. It supports the management of datasets on a single PC, within a university or company, or across a whole research community. audb is available at https://github.com/audeering/audb.
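
The idea of uploading only newly added or altered files can be illustrated with a minimal sketch. This is not audb's implementation or API; it assumes that each published version records a checksum per file, and the names `checksum`, `files_to_upload`, and the example file paths are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the MD5 hex digest of a file's content."""
    return hashlib.md5(data).hexdigest()


def files_to_upload(previous: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """List files that are new or whose content differs from the previous version.

    previous maps file path -> checksum recorded for the last published version.
    current maps file path -> raw content of the version about to be published.
    """
    return sorted(
        path
        for path, data in current.items()
        if previous.get(path) != checksum(data)
    )


# Hypothetical dataset contents, for illustration only.
v1 = {"audio/001.wav": b"aaa", "audio/002.wav": b"bbb"}
v1_checksums = {path: checksum(data) for path, data in v1.items()}

v2 = {
    "audio/001.wav": b"aaa",      # unchanged, can be reused from version 1
    "audio/002.wav": b"bbb-fix",  # altered, must be uploaded again
    "audio/003.wav": b"ccc",      # newly added, must be uploaded
}

changed = files_to_upload(v1_checksums, v2)
print(changed)  # ['audio/002.wav', 'audio/003.wav']
```

Under this assumption, unchanged files are stored once and referenced by later versions, so the server-side storage cost of a new version grows only with the files that actually changed.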


