Multimodal Fish Feeding Intensity Assessment in Aquaculture

09/10/2023
by Meng Cui et al.

Fish feeding intensity assessment (FFIA) aims to evaluate changes in fish appetite during the feeding process, which is vital in industrial aquaculture applications. The main challenges surrounding FFIA are two-fold. 1) Robustness: existing work has mainly leveraged single-modality (e.g., vision, audio) methods, which are highly sensitive to input noise. 2) Efficiency: FFIA models are generally expected to be deployed on devices with limited computational resources, which presents a challenge in terms of computational efficiency. In this work, we first introduce an audio-visual dataset, called AV-FFIA, consisting of 27,000 labeled audio and video clips that capture different levels of fish feeding intensity. To our knowledge, AV-FFIA is the first large-scale multimodal dataset for FFIA research. We then introduce a multimodal approach for FFIA that leverages single-modality pre-trained models and modality-fusion methods, with benchmark studies on AV-FFIA. Our experimental results indicate that the multimodal approach substantially outperforms single-modality approaches, especially in noisy environments. While multimodal approaches provide a performance gain for FFIA, they inherently increase the computational cost. To overcome this issue, we further present a novel unified model, termed U-FFIA: a single model capable of processing audio, visual, or audio-visual modalities, by leveraging modality dropout during training and knowledge distillation from single-modality pre-trained models. We demonstrate that U-FFIA achieves performance better than or on par with state-of-the-art modality-specific FFIA models, at significantly lower computational overhead. Our proposed U-FFIA approach thus enables a more robust and efficient method for FFIA, with the potential to contribute to improved management practices and sustainability in aquaculture.
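The two training ideas named in the abstract, modality dropout and knowledge distillation from single-modality teachers, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the stand-in encoders, feature dimensions, mean fusion, the four intensity classes, and the hyperparameters (`p_drop`, `temperature`, `kd_weight`) are all hypothetical choices made for the example.

```python
# Minimal sketch (assumptions, not the U-FFIA code): one model that accepts
# audio, video, or both, trained with modality dropout plus distillation
# from pre-trained single-modality teachers.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedFFIA(nn.Module):
    def __init__(self, dim=256, num_classes=4):
        super().__init__()
        self.audio_enc = nn.Linear(128, dim)  # stand-in for an audio backbone
        self.video_enc = nn.Linear(512, dim)  # stand-in for a visual backbone
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, audio=None, video=None):
        # Encode whichever modalities are present and fuse by averaging.
        feats = []
        if audio is not None:
            feats.append(self.audio_enc(audio))
        if video is not None:
            feats.append(self.video_enc(video))
        fused = torch.stack(feats, dim=0).mean(dim=0)
        return self.classifier(fused)

def training_step(model, audio, video, labels,
                  teacher_audio_logits, teacher_video_logits,
                  p_drop=0.5, temperature=2.0, kd_weight=0.5):
    # Modality dropout: with probability p_drop, keep only one modality,
    # so the same network learns audio-only, video-only, and AV inference.
    if random.random() < p_drop:
        if random.random() < 0.5:
            audio_in, video_in = audio, None
        else:
            audio_in, video_in = None, video
    else:
        audio_in, video_in = audio, video

    logits = model(audio_in, video_in)
    ce = F.cross_entropy(logits, labels)

    # Knowledge distillation: match softened targets derived from the
    # single-modality teachers' logits (averaged here for simplicity).
    soft_targets = F.softmax(
        (teacher_audio_logits + teacher_video_logits) / (2 * temperature),
        dim=-1)
    kd = F.kl_div(F.log_softmax(logits / temperature, dim=-1),
                  soft_targets, reduction="batchmean") * temperature ** 2
    return ce + kd_weight * kd
```

At inference time the trained model can then be called with audio only, video only, or both, which is what allows a single network to stand in for several modality-specific ones.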


