Attentive Max Feature Map for Acoustic Scene Classification with Joint Learning considering the Abstraction of Classes
The attention mechanism has been widely adopted in acoustic scene classification. However, we find that although attention improves performance, its exclusive emphasis on selected information tends to discard other information excessively. We propose a mechanism, referred to as the attentive max feature map, that combines two effective techniques, attention and the max feature map, to refine the attention mechanism and mitigate this phenomenon. Furthermore, we explore various joint learning methods that utilize the labels originally generated for subtask B (3 classes) in addition to the existing labels for subtask A (10 classes) of the DCASE2020 challenge. We expect that using the two kinds of labels simultaneously will be helpful because the labels of the two subtasks differ in their degree of abstraction. With the two proposed techniques applied, our system achieves state-of-the-art performance among single systems on subtask A. In addition, because the model has a complexity comparable to subtask B's requirement, it shows the possibility of developing a system that fulfills the requirements of both subtasks: generalization across multiple devices and low complexity.
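The abstract does not spell out how the two techniques are implemented, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. `max_feature_map` is the standard max feature map operation (split the channels in two halves and keep the element-wise maximum). `attentive_max_feature_map`, its weight matrix `w`, and the sigmoid channel gate are hypothetical illustrations of one way attention and the max feature map could be combined. `joint_loss` is likewise an assumed scheme: it derives 3-class probabilities by summing the 10-class probabilities within each DCASE2020 group (indoor, outdoor, transportation) and adds the two cross-entropy terms.

```python
import numpy as np

# DCASE2020 Task 1 scene classes (subtask A) and their coarse groups (subtask B).
SCENES = ["airport", "shopping_mall", "metro_station", "street_pedestrian",
          "public_square", "street_traffic", "tram", "bus", "metro", "park"]
# 0 = indoor, 1 = outdoor, 2 = transportation
FINE_TO_COARSE = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 1])


def max_feature_map(x):
    """Standard MFM: split channels (axis 1) in half, keep the element-wise max."""
    a, b = np.split(x, 2, axis=1)
    return np.maximum(a, b)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attentive_max_feature_map(x, w):
    """Hypothetical variant (assumption): gate each channel with a sigmoid
    attention weight, then let the two halves compete through the max.
    x: (batch, channels, freq, time); w: (channels, channels)."""
    desc = x.mean(axis=(2, 3))            # global average pooling per channel
    gate = sigmoid(desc @ w)              # channel attention weights in (0, 1)
    gated = x * gate[:, :, None, None]    # emphasize channels
    return max_feature_map(gated)         # competition instead of plain scaling


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def joint_loss(fine_logits, fine_labels, weight=0.5):
    """Assumed joint objective: 10-class cross-entropy plus a weighted
    3-class cross-entropy on probabilities summed within each coarse group."""
    p_fine = softmax(fine_logits)
    membership = np.eye(3)[FINE_TO_COARSE]     # (10, 3) one-hot group matrix
    p_coarse = p_fine @ membership             # (batch, 3)
    coarse_labels = FINE_TO_COARSE[fine_labels]
    idx = np.arange(len(fine_labels))
    ce_fine = -np.log(p_fine[idx, fine_labels]).mean()
    ce_coarse = -np.log(p_coarse[idx, coarse_labels]).mean()
    return ce_fine + weight * ce_coarse
```

In this sketch the coarse labels never need to be stored separately, since `FINE_TO_COARSE` recovers them from the 10-class labels. A system with a dedicated 3-class output head would be an equally plausible reading of "joint learning".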