From Audio to Symbolic Encoding

02/26/2023
by Shenli Yuan, et al.

Automatic music transcription (AMT) aims to convert raw audio into a symbolic music representation. A fundamental problem in music information retrieval (MIR), AMT is considered difficult even for trained human experts because multiple harmonics overlap in the acoustic signal. Speech recognition, one of the most popular tasks in natural language processing, aims to translate spoken language into text. Because AMT and speech recognition are similar in nature (both translate an audio signal into a symbolic encoding), this paper investigates whether a generic neural network architecture could work on both tasks. We introduce a new neural network architecture built on top of the current state-of-the-art Onsets and Frames architecture and compare the performance of several of its variations on the AMT task. We also test our architecture on the speech recognition task. For AMT, our models produced better results than the model trained with the state-of-the-art architecture. For speech recognition, a similar architecture could be trained, but its results fell short of those of task-specific models.
