Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

by Laurie M. Heller, et al.

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally one that humans can and do perform. Current automated models of Machine Listening range from purely data-driven approaches to approaches that imitate human systems. In recent years, the most promising approaches have been hybrid: they are data-driven but informed by models of the perceptual, cognitive, and semantic processes of the human system. The guidance provided by models of human perception and domain knowledge enables better and more generalizable Machine Listening; conversely, the lessons learned from these models may be used to verify or improve our models of human perception themselves. This paper summarizes advances in the development of such hybrid approaches, ranging from Machine Listening models that are informed by models of peripheral (human) auditory processes to those that employ or derive semantic information encoded in relations between sounds. The research described herein was presented in a special session on "Synergy between human and machine approaches to sound/scene recognition and processing" at the 2023 ICASSP meeting.




