Automatic evaluation of herding behavior in towed fishing gear using end-to-end training of CNN and attention-based networks

03/21/2023
by Orri Steinn Guðfinnsson, et al.

This paper considers the automatic classification of herding behavior in the cluttered low-visibility environment that typically surrounds towed fishing gear. The paper compares three convolutional and attention-based deep action recognition network architectures trained end-to-end on a small set of video sequences captured by a remotely controlled camera and classified by an expert in fishing technology. The sequences depict a scene in front of a fishing trawl where the conventional herding mechanism has been replaced by directed laser light. The goal is to detect the presence of a fish in the sequence and classify whether or not the fish reacts to the lasers. A two-stream CNN model, a CNN-transformer hybrid, and a pure transformer model were trained end-to-end to achieve 63 task when compared to the human expert. Inspection of the activation maps learned by the three networks raises questions about the attributes of the sequences the models may be learning, specifically whether changes in viewpoint introduced by human camera operators that affect the position of laser lines in the video frames may interfere with the classification. This underlines the importance of careful experimental design when capturing scientific data for automatic end-to-end evaluation and the usefulness of inspecting the trained models.


