Acoustic noise problems are becoming more prevalent as the quantity of industrial equipment increases. The attenuation of low-frequency noise is difficult and expensive for passive noise-control techniques such as enclosures, barriers, and silencers. In contrast to passive techniques, active noise control (ANC) involves the electro-acoustic generation of a sound field to cancel an unwanted existing sound field. ANC can therefore offer a lower-cost alternative for controlling low-frequency noise, which has attracted considerable interest from industry. When dealing with different types of noises, traditional ANC systems typically use adaptive algorithms to adjust the control filter coefficients to minimize the error signal. Among adaptive algorithms, the filtered-X least mean square (FxLMS) and filtered-X normalized least mean square (FxNLMS) algorithms are commonly used because they compensate for the delay introduced by the secondary path, which increases system robustness.
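To make the adaptive baseline concrete, the following is a minimal single-channel FxLMS sketch in numpy. All signals, path lengths, and the step size are illustrative, and the secondary-path estimate is assumed to be perfect; a real ANC controller would run this sample-by-sample in fixed-point or on a DSP.

```python
import numpy as np

def fxlms(x, d, s, mu=0.01, L=16):
    """Single-channel FxLMS sketch (illustrative, not a production controller).

    x  : reference signal (assumed equal to the primary noise)
    d  : disturbance at the error microphone
    s  : secondary-path impulse response (its estimate assumed perfect here)
    """
    w = np.zeros(L)                  # control filter coefficients
    x_hist = np.zeros(L)             # reference history for the control filter
    fx_hist = np.zeros(L)            # filtered-reference history for the update
    xs_hist = np.zeros(len(s))       # reference history for filtering by s
    y_hist = np.zeros(len(s))        # anti-noise history for the secondary path
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_hist = np.roll(x_hist, 1); x_hist[0] = x[n]
        xs_hist = np.roll(xs_hist, 1); xs_hist[0] = x[n]
        y = w @ x_hist                       # anti-noise output y(n)
        y_hist = np.roll(y_hist, 1); y_hist[0] = y
        e[n] = d[n] - s @ y_hist             # residual at the error microphone
        fx = s @ xs_hist                     # filtered reference x'(n)
        fx_hist = np.roll(fx_hist, 1); fx_hist[0] = fx
        w = w + mu * e[n] * fx_hist          # FxLMS coefficient update
    return w, e
```

The filtered-reference term `fx` is what distinguishes FxLMS from plain LMS: the update is correlated with the reference signal after it passes through the secondary-path model, which keeps the adaptation stable despite the acoustic delay.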
However, due to the inherent slow convergence and poor tracking ability of least mean square (LMS) based algorithms, FxLMS and FxNLMS are less capable of dealing with rapidly varying or non-stationary noises. Their slow responses may impair customers' perception of the noise-reduction effect. Fixed-filter ANC methods can be adopted to tackle slow convergence: the control filter coefficients are pre-trained rather than adaptively updated. However, a pre-trained control filter is only suitable for a specific noise type, which degrades noise-reduction performance for other types of noises. To rapidly select among different pre-trained control filters for different noise types, a selective fixed-filter active noise control (SFANC) method based on frequency-band matching was proposed.
Though the SFANC method selects the most suitable pre-trained control filter in response to different noise types, several critical parameters of the method can only be determined through trial and error. Given these limitations, deep learning techniques, particularly convolutional neural networks (CNNs) [8, 23, 11, 12], appear to be powerful tools for classifying noises in SFANC methods. Automatic learning of the SFANC algorithm's critical parameters via deep learning would broaden its applications in real-world scenarios.
With the learning ability of CNN models, the SFANC algorithm can automatically learn its parameters from noise datasets and select the best control filter for different noise types without extra human effort. Additionally, a CNN model implemented on a co-processor can decouple the computational load from the real-time noise controller. Therefore, in this paper, we compare the performance of several one-dimensional (1D) CNNs and two-dimensional (2D) CNNs in the SFANC method. Different network training strategies are also compared to choose the best one for training the networks. Experiments show that the CNN-based SFANC method not only achieves faster responses than FxLMS and FxNLMS but also exhibits good robustness. It is therefore expected to be useful for attenuating dynamic noises such as traffic and urban noise.
2 CNN-based SFANC Algorithm
The overall architecture of the CNN-based SFANC algorithm is depicted in Figure 1. Throughout the control process, the real-time controller conducts filtering to generate anti-noise while simultaneously sending the primary noise to a co-processor (e.g., a mobile phone). Given the primary noise, the co-processor employs a pre-trained CNN to produce the index of the most appropriate control filter and delivers it to the real-time controller. The controller then adjusts the control filter coefficients based on the received filter index. Notably, if the network is a 1D CNN, its input is the raw waveform; if the network is a 2D CNN, its input is the log-Mel spectrogram.
1 Concise Explanation of SFANC
An ANC process can be abstracted as a first-order Markov chain, as shown in Figure 2, where W^o represents the optimal control filter to attenuate the disturbance d(n). To achieve the best noise-reduction performance, the best control filter can be selected from a pre-trained filter set {W_1, W_2, ..., W_N}. Hence, the SFANC method can be represented as

W^o = argmin_{W_i} E[(d(n) - s(n) * (W_i * x(n)))^2],  (1)

where the argmin operator returns the input value that minimizes its output; *, x(n), and s(n) represent the linear convolution, the reference signal, and the impulse response of the secondary path, respectively. The reference signal is assumed to be the same as the primary noise.
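The filter-selection rule described above can be sketched directly in numpy: for each candidate in a (hypothetical) pre-trained filter set, synthesize the anti-noise, pass it through the secondary path, and pick the filter whose residual has minimum power.

```python
import numpy as np

def select_filter(x, d, s, filters):
    """Return the index of the pre-trained control filter minimizing
    the residual power, i.e. argmin over mean((d - s * (w * x))^2).

    x       : reference signal (assumed equal to the primary noise)
    d       : disturbance to be cancelled
    s       : secondary-path impulse response
    filters : list of pre-trained control filters (illustrative set)
    """
    errors = []
    for w in filters:
        y = np.convolve(x, w)[:len(x)]       # anti-noise  w * x
        e = d - np.convolve(y, s)[:len(x)]   # residual  d - s * (w * x)
        errors.append(np.mean(e ** 2))       # residual power for this filter
    return int(np.argmin(errors))
```

This brute-force evaluation is what the CNN classifier replaces: it learns to map the primary noise directly to the winning index without convolving against every candidate at run time.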
In practice, d(n) is typically regarded as a linear combination of the disturbances that the pre-trained filters are optimized for. Thus, Equation (1) is equivalent to

W^o = argmax_{W_i} P(W_i | x(n)),  (2)

which means that the selected control filter is the one with the maximum posterior probability given the reference signal x(n). Moreover, according to Bayes' theorem, the posterior probability can be replaced with a conditional probability, as

W^o = argmax_{W_i} P(x(n) | W_i) P(W_i),  (3)

which predicts the most suitable control filter straight from the primary noise x(n).
A classifier model can be developed to approximate P(W_i | x(n)) from a pre-recorded sampling set D = {(x_j, i_j)}, where i_j is the index of the best control filter for track x_j. θ denotes the parameters of the classifier and can be obtained through maximum likelihood estimation (MLE) as

θ* = argmax_θ Σ_j log P(i_j | x_j; θ).  (4)

Therefore, we can utilize deep learning approaches to learn the classifier model from the training set D.
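As a stand-in for the CNN, the MLE objective above can be illustrated with a plain linear softmax classifier trained by gradient ascent on the log-likelihood (equivalently, minimizing cross-entropy). The data, sizes, and learning rate below are illustrative only.

```python
import numpy as np

def train_softmax_classifier(X, y, n_classes, lr=0.1, epochs=200):
    """MLE for a linear softmax model p(y | x; theta):
    maximize sum_j log p(y_j | x_j; theta) by gradient ascent.
    X: (n, dim) features, y: (n,) integer class labels."""
    n, dim = X.shape
    theta = np.zeros((dim, n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ theta
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        theta += lr * X.T @ (onehot - p) / n          # log-likelihood gradient
    return theta
```

A CNN replaces the linear map `X @ theta` with a deep feature extractor, but the training objective, maximizing the conditional log-likelihood of the filter indices, is the same.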
2 CNN-based SFANC Algorithm
Motivated by prior work, this paper compares several 1D CNNs and 2D CNNs for classifying noises in the time domain and the frequency domain, respectively. A min-max operation first normalizes the network input:

x'(n) = x(n) / max(|x_max|, |x_min|),

where x_max and x_min denote the maximum and minimum values of x(n). This rescales the input into the range [-1, 1] while retaining the signal's negative part, which carries phase information; phase information is critical for ANC applications.
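As a concrete sketch of this normalization (assuming the form that divides the signal by the larger magnitude of its maximum and minimum, which maps the range into [-1, 1] without destroying the sign):

```python
import numpy as np

def normalize(x):
    """Rescale x into [-1, 1] while keeping its sign (phase) intact.
    Assumed form: divide by the larger magnitude of max(x) and min(x)."""
    scale = max(abs(x.max()), abs(x.min()))
    return x / scale if scale > 0 else x
```

Unlike a shift-and-scale min-max mapping to [0, 1], this pure scaling leaves zero crossings and the negative half-wave untouched, which is why phase information survives.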
A lightweight 1D CNN, illustrated in Figure 3, is proposed. Each residual block in the network comprises two convolutional layers, each followed by batch normalization and a ReLU non-linearity. A shortcut connection adds the block's input to its output, since residual architectures have been shown to be easy to optimize. Additionally, the network uses a broad receptive field (RF) in the first convolutional layer and narrow RFs in the remaining convolutional layers to fully exploit both global and local information.
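A numpy forward-pass sketch of such a residual block follows; the channel counts, kernel sizes, and the simplified inference-style batch normalization are illustrative, and all training machinery is omitted.

```python
import numpy as np

def conv1d_same(x, kernels):
    """'Same'-padded 1D convolution. x: (C_in, T), kernels: (C_out, C_in, K)
    with K odd; returns (C_out, T)."""
    c_out, c_in, k = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    out = np.zeros((c_out, t))
    for o in range(c_out):
        for i in range(c_in):
            # reverse the kernel so np.convolve computes cross-correlation
            out[o] += np.convolve(xp[i], kernels[o, i][::-1], mode="valid")[:t]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def batchnorm(x, eps=1e-5):
    """Per-channel normalization over time (inference-style sketch)."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, k1, k2):
    """conv -> BN -> ReLU -> conv -> BN, plus identity shortcut, then ReLU."""
    h = relu(batchnorm(conv1d_same(x, k1)))
    h = batchnorm(conv1d_same(h, k2))
    return relu(h + x)   # shortcut: add the block input to its output
```

The identity shortcut requires matching channel counts and "same" padding, which is why both convolutions here preserve the input shape.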
3 Training of CNNs
The primary and secondary paths used in the training stage of the control filters are band-pass filters with a frequency range of Hz–Hz. Broadband noises with the frequency ranges shown in Figure 4 are used to pre-train the control filters; the FxLMS algorithm is adopted to obtain the optimal control filter for each broadband noise because of its low computational complexity. The pre-trained control filters are then saved in the control-filter database.
A noise dataset including synthetic and real noise tracks is used in this work. Specifically, synthetic noise tracks and real noise tracks are used for training, real noise tracks for validation, and real noise tracks for testing. The synthetic noise tracks are randomly generated with various frequency bands, amplitudes, and background noise levels.
The SFANC system's sample rate is Hz, so each 1-second noise track consists of samples. Each 1-second noise track is taken as the primary noise to generate the disturbance, and the class label of a noise track corresponds to the index of the control filter that achieves the best noise-reduction performance on that disturbance.
The Adam algorithm was employed to optimize the networks during training, with the number of training epochs fixed in advance. Glorot initialization was used to avoid exploding or vanishing gradients. Additionally, to prevent overfitting, the weights of the CNNs were subjected to weight regularization.
1 Comparison of Different Training Schemes
Four different training schemes are compared for the proposed 1D CNN, with the results summarized in Table 1. According to Table 1, training first on the synthetic noise tracks and then fine-tuning on the real noise tracks achieves the highest testing accuracy. Note that simultaneously using the synthetic and real datasets for training does not obtain superior testing accuracy, since the characteristics of synthetic and real noises are quite different. Accordingly, in the SFANC system, the CNN models can first be trained on the synthetic dataset and then fine-tuned on the real noise dataset.
| Training Scheme | Testing Accuracy (%) |
| --- | --- |
| Only synthetic dataset | 46.4 |
| Only real dataset | 94.6 |
| Synthetic and real datasets used simultaneously | 94.5 |
| Synthetic dataset first, then fine-tuning on real dataset | 95.3 |
2 Comparison of Different Networks
Based on the above fine-tuning training scheme, we compared several 1D networks operating on raw acoustic waveforms: the proposed 1D CNN, M3, M5, M11, M18, and M34-res. Some lightweight 2D networks, including ShuffleNet V2, MobileNetV2, and an attention network, are also compared in the SFANC method. The performance of these networks on the real testing dataset is summarised in Table 2.
| Network | Testing Accuracy (%) | Network Parameters |
| --- | --- | --- |
| 1D Convolutional Neural Networks | | |
| Proposed 1D Network | 95.3 | 0.21M |
| 2D Convolutional Neural Networks | | |
As shown in Table 2, the proposed 1D network obtains the highest classification accuracy (95.3%) with the fewest network parameters among the 1D networks. Among the 2D networks, ShuffleNet V2 achieves classification accuracy similar to MobileNetV2 while requiring far fewer parameters; considering both testing accuracy and parameter count, ShuffleNet V2 therefore performs best on the testing dataset among the 2D networks. Compared to the proposed 1D network, ShuffleNet V2 obtains a slight improvement in classification accuracy but requires slightly more parameters. Hence, the proposed 1D network and ShuffleNet V2 perform best at classifying noises in the SFANC system. Both lightweight networks can be implemented on mobile platforms, but operating directly on raw waveform data is more convenient, so the proposed 1D network is preferred.
3 Non-stationary Noise Cancellation
This section applies the SFANC algorithm based on the proposed 1D network, the FxLMS algorithm, and the FxNLMS algorithm to attenuate a recorded aircraft noise. The aircraft noise is non-stationary, spans a frequency range of 50 Hz–14,000 Hz, and is not part of the training dataset. The FxLMS and FxNLMS algorithms use the same fixed step size and control-filter length. The noise-reduction results of the different ANC methods on the aircraft noise are shown in Figure 5.
From the results in Figure 5, we observe that the SFANC method responds to the aircraft noise much faster than the FxLMS and FxNLMS algorithms and consistently outperforms them throughout the noise-reduction process. In particular, the averaged noise-reduction level achieved by the SFANC algorithm is about 7 dB and 8 dB higher than those of FxLMS and FxNLMS, respectively. These results confirm that the SFANC method can rapidly select the most suitable pre-trained control filter for a given noise type, whereas the adaptive algorithms respond slowly because of their iterative coefficient updates.
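The averaged noise-reduction level used in such comparisons can be computed as the power ratio between the disturbance and the residual error over a chosen time window; a small helper (with an illustrative signature) is:

```python
import numpy as np

def noise_reduction_db(d, e, fs, t0, t1):
    """Averaged noise reduction over the window [t0, t1] seconds:
    10 * log10(disturbance power / residual-error power).

    d  : disturbance signal (ANC off)
    e  : residual error signal (ANC on)
    fs : sample rate in Hz
    """
    i0, i1 = int(t0 * fs), int(t1 * fs)
    return 10.0 * np.log10(np.mean(d[i0:i1] ** 2) / np.mean(e[i0:i1] ** 2))
```

For example, a residual whose amplitude is one tenth of the disturbance corresponds to a 20 dB noise-reduction level.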
Active noise control (ANC) technologies have been widely used to deal with low-frequency noise. However, adaptive ANC algorithms are typically limited by slow convergence. In this paper, CNNs are used to automatically select the best pre-trained control filter for different noises, and lightweight CNNs implemented on a co-processor decouple the computational load from the real-time noise controller. Numerical simulations show that the CNN-based SFANC method improves response time while maintaining low computational complexity and high robustness. Additionally, the effectiveness of the proposed 1D network and the fine-tuning training strategy is confirmed in the SFANC method. In future work, we will explore more efficient and robust deep-learning-based ANC algorithms.
- (2019) Urban sound tagging using convolutional neural networks. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), pp. 5–9.
- (2017) Very deep convolutional neural networks for raw waveforms. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 421–425.
- (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
- (1999) Understanding active noise cancellation. CRC Press.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- (1993) Fundamentals of statistical signal processing: estimation theory. Prentice-Hall, Inc.
- (1999) Active noise control: a tutorial review. Proceedings of the IEEE 87 (6), pp. 943–973.
- (2015) Deep learning. Nature 521 (7553), pp. 436–444.
- (2000) A Kalman filter approach to active noise control. In 2000 10th European Signal Processing Conference, pp. 1–4.
- (2020) On the robustness and training dynamics of raw waveform models. In INTERSPEECH, pp. 1001–1005.
- (2019) A robust single-sensor face and iris biometric identification system based on multimodal feature extraction network. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), pp. 1237–1244.
- (2021) An adaptive face-iris multimodal identification system based on quality assessment network. In MultiMedia Modeling, pp. 87–98.
- (2021) A deep feature fusion network based on multiple attention mechanisms for joint iris-periocular biometric recognition. IEEE Signal Processing Letters 28, pp. 1060–1064.
- (2022) A hybrid SFANC-FxNLMS algorithm for active noise control based on deep learning. IEEE Signal Processing Letters 29, pp. 1102–1106.
- (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131.
- (2015) Natural listening over headphones in augmented reality using adaptive filtering techniques. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (11), pp. 1988–2002.
- (2016) Selective active noise control system for open windows using sound classification. In INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Vol. 253, pp. 1921–1931.
- (2018) MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510–4520.
- (2019) Selective virtual sensing technique for multi-channel feedforward active noise control systems. In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8489–8493.
- (2020) Active noise control based on the momentum multichannel normalized filtered-x least mean square algorithm. In INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Vol. 261, pp. 709–719.
- (2020) Feedforward selective fixed-filter active noise control: algorithm and implementation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, pp. 1479–1492.
- (2022) Selective fixed-filter active noise control based on convolutional neural network. Signal Processing 190, pp. 108317.
- (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
- (2013) Multi-channel Kalman filters for active noise control. The Journal of the Acoustical Society of America 133 (4), pp. 2105–2115.
- (2020) Stochastic analysis of the filtered-x LMS algorithm for active noise control. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, pp. 2252–2266.