Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription
This paper presents a new approach to understanding how deep neural networks (DNNs) work by applying homomorphic signal processing techniques. Focusing on the task of multi-pitch estimation (MPE), it demonstrates an equivalence between a generalized cepstrum and a DNN in terms of their structure and functionality. This equivalence, together with pitch perception theories and the recently established rectified-correlations-on-a-sphere (RECOS) filter analysis, provides an alternative way of explaining the role of the nonlinear activation function and the multi-layer structure, both of which exist in a cepstrum and a DNN. To validate the efficacy of this approach, a new feature designed in the same fashion is proposed for the pitch salience function. The new feature outperforms the one-layer spectrum in the MPE task and, as predicted, it addresses the missing fundamental effect and achieves better robustness to noise.
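To make the structural analogy concrete, the following is a minimal sketch, not the paper's implementation, of a two-layer feature in which each layer applies a Fourier transform followed by a nonlinearity. The power-law exponents, the use of rectification (ReLU), the window, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def relu(x):
    """Rectification: keep positive values, zero out the rest."""
    return np.maximum(x, 0.0)

def two_layer_feature(frame, gamma_spec=0.5, gamma_ceps=0.5):
    """Hypothetical two-layer feature: (DFT -> nonlinearity) applied twice.

    gamma_spec and gamma_ceps are assumed power-law exponents, not values
    taken from the paper.
    """
    windowed = frame * np.hanning(len(frame))
    # Layer 1: linear transform (DFT), then a power-law nonlinearity.
    spectrum = np.abs(np.fft.fft(windowed)) ** gamma_spec
    # Layer 2: another linear transform (inverse DFT) maps frequency to lag,
    # then rectification and a second power-law nonlinearity.
    cepstrum = relu(np.real(np.fft.ifft(spectrum))) ** gamma_ceps
    return spectrum, cepstrum

def estimate_pitch_lag(cepstrum, sample_rate, f_min=120.0, f_max=380.0):
    """Return the lag (in samples) of the largest value within a pitch range."""
    lag_lo = int(np.ceil(sample_rate / f_max))
    lag_hi = int(np.floor(sample_rate / f_min))
    return lag_lo + int(np.argmax(cepstrum[lag_lo:lag_hi + 1]))

def missing_fundamental_frame(f0=200.0, sample_rate=8000, n_samples=2000):
    """Synthetic tone containing harmonics 2, 3 and 4 of f0 but not f0 itself."""
    t = np.arange(n_samples) / sample_rate
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in (2, 3, 4))

frame = missing_fundamental_frame()
spectrum, cepstrum = two_layer_feature(frame)
lag = estimate_pitch_lag(cepstrum, sample_rate=8000)
print("estimated pitch (Hz):", 8000 / lag)
```

In this synthetic case the first layer has almost no energy at the 200 Hz bin, because the fundamental is absent from the signal, while the second layer should peak near a lag of 40 samples (8000 / 200), which corresponds to the perceived pitch. This illustrates, under the assumptions above, why a multi-layer feature can recover a missing fundamental that a one-layer spectrum cannot.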