IIRNet
Direct design of biquad filter cascades with deep learning by sampling random polynomials.
Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response. We investigate training with different families of random filters, and find training with a variety of filter families enables better generalization when estimating real-world filters, using head-related transfer functions and guitar cabinets as case studies. We compare our method against existing methods including modified Yule-Walker and gradient descent and show IIRNet is, on average, both faster and more accurate.
Infinite impulse response (IIR) filters have a variety of applications, such as control systems, time series forecasting, and audio signal processing [4, 7, 37]. In audio applications, digital IIR filters are often used for equalisation, including tone matching, feedback reduction, and room compensation [28]. Classical methods for designing digital IIR filters are generally restricted to specific prototypes, e.g. designing a lowpass filter with minimum passband ripple [29]. However, some applications require designing a filter that achieves an arbitrary magnitude and/or phase response. Classical methods for this task include the modified Yule-Walker (MYW) estimation [2], least squares approaches [13, 16]
[26], Steiglitz-McBride [32], and gradient-based methods [3]. However, these approaches have drawbacks that may limit their application in scenarios that require high accuracy, fast estimation, or both. For example, while MYW can be performed quickly with a small number of operations, it may produce inaccurate results for more challenging target responses. On the other hand, iterative methods often provide greater accuracy and can be tailored with customized loss functions. However, this comes with higher computational cost due to the need for multiple gradient update operations. In addition, since this optimization process is generally non-convex, performance is often very sensitive to initial conditions and may suffer from getting stuck in local minima
[3, 23]. Recently, there has been interest in applying deep learning approaches to filter design. The parallels between recurrent neural networks (RNNs) and IIR filters have been exploited to learn arbitrary filters from data
[15, 25, 27]. While these networks simulate the sample-by-sample operations of a digital IIR filter, they can be slow and difficult to train due to their recursive nature, which requires many gradient steps through time. Other approaches have instead been trained to directly estimate the parameters of graphic [36] and parametric equalizers [23, 39] given a desired magnitude response. While these approaches avoid the need for iterative estimation, they are potentially restricted by the IIR filter prototypes they estimate. We aim to address these limitations by constructing a model capable of learning the mapping from a desired arbitrary magnitude response directly to the coefficients of an IIR filter, removing the need for iterative optimization. To achieve this, we propose IIRNet, a neural network trained with randomly generated filters to estimate a cascade of biquads given a desired magnitude response, as shown in Figure 1. We investigate methods for generating a diverse set of random filters using knowledge of the behaviour of random polynomials.
The contributions of this work are as follows. First, we propose IIRNet, a specialized, domain-inspired architecture for the stable training of a neural network that estimates biquad filter cascades. Second, we outline a training regime that generates random filters from a diverse range of families of random polynomials, and empirically demonstrate how training with different families impacts generalization to real-world tasks. Finally, we demonstrate that our model, combined with a training regime covering a range of random filter families, generalizes to real-world filter estimation and, on average, outperforms MYW and gradient-based techniques in terms of both run-time and accuracy.
An $N$th-degree digital IIR filter can be characterized by its transfer function as shown in (1).

$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{k=0}^{N} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}} \qquad (1)$$

For most applications, $a_0 = 1$. To facilitate numerical stability, these filters are often implemented as a cascade of second-order biquad sections, where

$$H(z) = \prod_{k=1}^{N/2} \frac{b_{0,k} + b_{1,k} z^{-1} + b_{2,k} z^{-2}}{1 + a_{1,k} z^{-1} + a_{2,k} z^{-2}} \qquad (2)$$

Should the poles and zeros of each biquad fall within the unit circle of the complex plane, the digital IIR filter is said to be minimum phase. The magnitude response of these filters can be calculated by evaluating $H(z)$ along the unit circle in the complex plane, $z = e^{j\omega}$, and taking the magnitude of the result.

$$|H(e^{j\omega})| = \left|\frac{B(e^{j\omega})}{A(e^{j\omega})}\right| \qquad (3)$$

In practice, the logarithm of this magnitude response, $20 \log_{10} |H(e^{j\omega})|$, is of interest.
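As a concrete illustration, the log-magnitude response of a biquad cascade can be evaluated directly on the unit circle with NumPy. This is a minimal sketch; the section layout `[b0, b1, b2, a0, a1, a2]` follows SciPy's SOS convention and is an assumed choice, not mandated by the paper.

```python
import numpy as np

def sos_log_magnitude(sos, n_freqs=512):
    """Log-magnitude response (dB) of a biquad cascade evaluated on the unit circle.

    sos: iterable of sections [b0, b1, b2, a0, a1, a2] (scipy-style layout).
    """
    w = np.linspace(0, np.pi, n_freqs, endpoint=False)
    zinv = np.exp(-1j * w)  # z^{-1} evaluated on the unit circle
    H = np.ones_like(zinv)
    for b0, b1, b2, a0, a1, a2 in sos:
        H *= (b0 + b1 * zinv + b2 * zinv**2) / (a0 + a1 * zinv + a2 * zinv**2)
    return 20 * np.log10(np.abs(H) + 1e-8)  # small floor avoids log(0)
```

For example, a pass-through section `[1, 0, 0, 1, 0, 0]` yields a flat response at roughly 0 dB.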
Training neural network filter estimators relies on the generation of a dataset of random digital IIR filters. While sampling random filters for this training process may appear straightforward, we found the sampling method plays an important role in generalization. Implicit in the process of random filter generation is a random sampling of polynomials, as demonstrated in (1). In this section we define several methods for sampling random polynomials of even degree and comment on their properties. Figure 2 shows the root placements produced by these polynomial sampling methods for 100 filters.
A. Polynomials with normal coefficients — Given a degree-$N$ polynomial $\sum_{k=0}^{N} a_k z^k$, sample each coefficient $a_k$ from the standard normal distribution $\mathcal{N}(0, 1)$. For sufficiently large $N$, the roots of this polynomial converge to the unit circle [8]. Most roots lie within a distance on the order of $1/N$ of the unit circle [30], and some roots lie even closer [20]. Roughly $\frac{2}{\pi}\log N$ roots fall on the real line [11], most of which are close to $-1$ and $+1$; the real root closest to the unit circle also lies at a comparable distance [21]. Much of this behaviour is unchanged if the coefficients are drawn from other distributions [9, 31, 34]. Under very general conditions, the zeros of these polynomials experience repulsion: as long as the polynomial and its derivative are not both likely to be small at the same time, the roots repel each other [34].

B. Biquads with normal coefficients — Given a desired polynomial order $N$, sample $N/2$ second-order polynomials, with coefficients independently sampled from $\mathcal{N}(0, 1)$, and multiply them together. This is a process where roots are sampled independently in pairs, which means the roots of the derivative polynomial are uniformly independent in the same way [24, 10]. A Monte Carlo simulation suggests that a majority of the roots sampled using this method are real.
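The real-root tendency of family (B) is easy to check empirically. The sketch below assumes monic sections $z^2 + bz + c$ with $b, c \sim \mathcal{N}(0,1)$; the paper's exact parameterization may differ.

```python
import numpy as np

# Assumed parameterization: monic sections z^2 + b z + c with b, c ~ N(0, 1).
rng = np.random.default_rng(0)
n = 50_000
b = rng.standard_normal(n)
c = rng.standard_normal(n)
# A quadratic has two real roots iff its discriminant b^2 - 4c is non-negative;
# each section contributes either two real roots or a conjugate pair.
frac_real = np.mean(b**2 - 4 * c >= 0)
print(f"fraction of sections with real roots: {frac_real:.3f}")
```

Since the discriminant is automatically non-negative whenever $c < 0$ (probability one half), the fraction is guaranteed to exceed 50%.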
C. Polynomials with uniformly sampled roots in the unit disk — Given a desired order $N$, sample $N/2$ roots in the complex plane using the following procedure: take $\theta$ uniform in $[0, 2\pi)$ and $r = \sqrt{u}$, where $u$ is uniform in $[0, 1]$. Then, select these roots' complex conjugates as the remaining $N/2$ roots. Similar to (B), the roots of the derivative polynomial are uniformly independent in the same way [24, 10], with no expected density of real roots given this sampling.
D. Polynomials with roots sampled uniformly in magnitude and argument — Given a desired polynomial order $N$, sample $N/2$ roots in the complex plane using the following procedure: take $r$ uniform in $[0, 1]$ and take $\theta$ uniform in $[0, 2\pi)$. Then, select these roots' complex conjugates as the remaining $N/2$ roots. Similar to (C), the roots of the derivative polynomial are uniformly independent in the same way [24, 10]. There is no expected density of real roots given this sampling. Compared to the roots sampled in (C), these roots will exhibit a greater density closer to the origin than to the unit circle.
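Families (C) and (D) differ only in how the root magnitude is drawn; the $r = \sqrt{u}$ mapping for area-uniform sampling in the sketch below is an assumption consistent with the density comparison above.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_roots(n_pairs, area_uniform=True):
    """Sample n_pairs roots inside the unit disk plus their conjugates.

    area_uniform=True  -> family (C): r = sqrt(u), uniform over the disk's area
    area_uniform=False -> family (D): r = u, uniform in magnitude (denser near the origin)
    """
    u = rng.uniform(0.0, 1.0, n_pairs)
    r = np.sqrt(u) if area_uniform else u
    theta = rng.uniform(0.0, 2.0 * np.pi, n_pairs)
    roots = r * np.exp(1j * theta)
    return np.concatenate([roots, roots.conj()])

roots_c = sample_roots(8)   # 16 roots -> a degree-16 polynomial
coeffs = np.poly(roots_c)   # conjugate pairing yields real coefficients
```

Because every root is paired with its conjugate, `np.poly` returns (numerically) real polynomial coefficients, as the filter construction requires.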
E. Characteristic polynomial of a random matrix — Given a desired polynomial order $N$, take a random $N \times N$ matrix whose entries are sampled i.i.d. from $\mathcal{N}(0, 1)$ and use its eigenvalues (rescaled by $1/\sqrt{N}$) as the roots of the desired polynomial [6, 19]. These roots exhibit a repulsion from one another [34]. The rescaled eigenvalues converge to the unit disk for various distributions of entries [33]. The characteristic polynomial of this random matrix has roughly $\sqrt{2N/\pi}$ real roots [5]. It is known this behaviour persists for a family of random variables whose first four moments match those of the Gaussian [35], but the question remains open in cases such as entries that are $\pm 1$ with equal probability [38].

F. Uniform parametric EQ — Given a desired polynomial order $N$, uniformly sample the parameters of a parametric EQ made up of one low shelf section, one high shelf section, and peaking filters [23]. The uniformly sampled parameters include each section's corner/center frequency, gain, and Q factor.
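Family (E) can be sketched in a few lines of NumPy. The expected real-eigenvalue count growing like $\sqrt{2N/\pi}$ is the asymptotic for i.i.d. Gaussian matrices cited above; the matrix size below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 32
A = rng.standard_normal((N, N))            # i.i.d. N(0, 1) entries
roots = np.linalg.eigvals(A) / np.sqrt(N)  # rescaled eigenvalues fill the unit disk (circular law)
n_real = int(np.sum(np.abs(roots.imag) < 1e-12))  # real eigenvalues of a real matrix
print(n_real)  # expected to grow like sqrt(2N / pi), about 4.5 for N = 32
```

Unlike families (C) and (D), these roots are not independent: the eigenvalue repulsion spreads them more evenly across the disk.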
Our goal is to train a neural network to learn a mapping that takes a desired magnitude response sampled at $K$ linearly spaced frequencies over $[0, f_s/2]$, where $f_s$ is the system sample rate, and estimates an $N$th-order digital IIR filter whose magnitude response matches the target. We fix $N$ to be even. This cascade of biquads can be represented by a scalar gain $g$ and a set of $N/2$ second-order sections comprised of complex poles $p_k$ and complex zeros $q_k$, where $k = 1, \dots, N/2$. Thus the network learns a mapping from the $K$ sampled magnitudes to $g$ and the set of poles and zeros, as shown in Figure 1. Each pole and zero is paired with its complex conjugate to ensure each biquad has real-valued coefficients. Thus the biquad takes the form
$$H_k(z) = \frac{(1 - q_k z^{-1})(1 - \bar{q}_k z^{-1})}{(1 - p_k z^{-1})(1 - \bar{p}_k z^{-1})} \qquad (4)$$
Estimating a single system gain $g$ rather than an individual gain for each second-order section reduces the total number of parameters without loss of generality, and was found to aid stability when training higher order models. Additionally, we constrain the system gain to $0 < g < 100$ to aid training stability by applying the sigmoid function to IIRNet's gain estimate and then multiplying by 100. To ensure a minimum phase filter, the estimated poles $p_k$ and zeros $q_k$ are rescaled according to [22] as shown in (5). To further stabilize training, a small constant $\epsilon$ was added to prevent root placement at the origin or on the unit circle.

$$\hat{p}_k = \frac{p_k \tanh(|p_k|)}{|p_k| + \epsilon} \qquad \hat{q}_k = \frac{q_k \tanh(|q_k|)}{|q_k| + \epsilon} \qquad (5)$$
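One common way to realize such a minimum-phase constraint at inference time (not necessarily the exact rescaling of [22]) is to reflect any root outside the unit circle to its conjugate reciprocal, which preserves the shape of the magnitude response up to a gain, and then shrink roots off the circle by a small epsilon:

```python
import numpy as np

def project_min_phase(roots, eps=1e-4):
    """Map roots strictly inside the unit circle.

    Roots outside the circle are reflected to their conjugate reciprocal, which
    leaves the magnitude response unchanged up to an overall gain; roots at or
    near the circle are shrunk by eps. This is one standard projection, not
    necessarily the exact rescaling used in [22].
    """
    roots = np.asarray(roots, dtype=complex).copy()
    outside = np.abs(roots) > 1.0
    roots[outside] = 1.0 / np.conj(roots[outside])
    mags = np.abs(roots)
    near = mags > 1.0 - eps
    roots[near] *= (1.0 - eps) / mags[near]
    return roots
```

For training, a smooth differentiable mapping is preferable to this hard projection, which is the motivation for the rescaling in (5).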
During training, the network is tasked with minimizing a loss function that measures the distance between the input and estimated magnitude responses. We used the mean squared error between the log magnitude responses of the estimated and target filters over a set of $K$ linearly spaced frequencies $\omega_k$.
$$\mathcal{L} = \frac{1}{K} \sum_{k=1}^{K} \left( 20\log_{10}|H(e^{j\omega_k})| - 20\log_{10}|\hat{H}(e^{j\omega_k})| \right)^2 \qquad (6)$$
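A minimal NumPy sketch of this loss, evaluating each biquad's response by dividing the DFTs of its zero-padded numerator and denominator coefficients (the FFT size and epsilon are illustrative choices):

```python
import numpy as np

def sos_response_fft(sos, n_fft=1024):
    """Complex response of a biquad cascade via zero-padded DFTs of each
    section's numerator and denominator (parallelizable, no recursion)."""
    H = np.ones(n_fft // 2, dtype=complex)
    for sec in np.atleast_2d(sos):
        num = np.fft.fft(sec[:3], n_fft)[: n_fft // 2]
        den = np.fft.fft(sec[3:], n_fft)[: n_fft // 2]
        H *= num / den
    return H

def log_mag_mse(sos_est, sos_target, n_fft=1024, eps=1e-8):
    """Mean squared error between log-magnitude responses, in dB^2."""
    to_db = lambda H: 20.0 * np.log10(np.abs(H) + eps)
    diff = to_db(sos_response_fft(sos_est, n_fft)) - to_db(sos_response_fft(sos_target, n_fft))
    return float(np.mean(diff**2))
```

Sections here follow the `[b0, b1, b2, a0, a1, a2]` layout; evaluating the response on a frequency grid this way avoids running the recursive difference equation sample by sample.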
The complex response of each second-order section was calculated by performing the discrete Fourier transform on the zero-padded numerator and denominator polynomials and dividing the result. This allows for parallelized computation, as opposed to the sample-based gradient optimization in previous works [15, 25]. The base IIRNet architecture is composed of linear layers with a fixed hidden dimension, each followed by layer normalization [1] and a LReLU activation. The final layer has no activation, and projects the hidden dimension to the number of filter parameters, which is a function of the filter order. We treat the estimation of complex values as the individual estimation of their real and imaginary components.

We considered two baselines to benchmark IIRNet against existing methods: the modified Yule-Walker [2]
method and a stochastic gradient descent (SGD) method. For the SGD approach we used the same biquad parameterization and loss function as IIRNet, but instead randomly initialized the filter parameters and optimized them over a number of gradient steps with a fixed learning rate. We then varied the number of gradient steps to observe the impact on run-time as well as accuracy.

Table 1: Estimation error (dB MSE) for models trained on each random filter family. Columns A-G are the random polynomial families; HRTF and Gtr. Cab. are real-world filters.

Training Method | A | B | C | D | E | F | G | HRTF | Gtr. Cab. | Avg
---|---|---|---|---|---|---|---|---|---|---
Modified Yule-Walker | 12.84 | 32.46 | 16.67 | 124.23 | 6.80 | 1.40 | 19.73 | 1.19 | 60.86 | 30.69
A. Normal coefficients | 4.38 | 6.80 | 6.22 | 23.11 | 1.42 | 1.11 | 5.07 | 1.35 | 6.73 | 6.24
B. Normal biquads | 13.19 | 2.70 | 0.21 | 1.29 | 2.14 | 0.57 | 2.64 | 2.40 | 6.86 | 3.55
C. Uniform disk | 193.81 | 328.79 | 0.08 | 1.19 | 8.91 | 50.42 | 83.32 | 263.06 | 1203.40 | 237.00
D. Uniform magnitude disk | 175.81 | 279.54 | 0.09 | 0.54 | 11.25 | 61.41 | 76.37 | 250.38 | 1111.05 | 218.49
E. Characteristic polynomial | 22.95 | 32.66 | 0.35 | 2.44 | 0.81 | 0.72 | 6.81 | 11.02 | 138.99 | 24.08
F. Uniform parametric EQ | 19.33 | 12.84 | 3.06 | 17.84 | 3.52 | 0.21 | 6.89 | 3.79 | 17.50 | 9.44
G. All families | 6.24 | 2.89 | 0.11 | 0.67 | 1.12 | 0.34 | 1.28 | 1.40 | 5.59 | 2.18
Table 2: Parameter count, run-time, and accuracy for each method. IIRNet variants are labeled by hidden size.

Method | Params. (M) | Time (ms) | All families (G), dB MSE | HRTF, dB MSE | Gtr. Cab., dB MSE
---|---|---|---|---|---
MYW | - | 9.00 | 19.73 | 1.19 | 60.86 |
SGD (1) | - | 7.75 | 2458.28 | 3165.43 | 5648.83 |
SGD (10) | - | 58.21 | 998.20 | 1393.29 | 2362.49 |
SGD (100) | - | 578.52 | 11.74 | 3.49 | 5.67 |
SGD (1000) | - | 5784.94 | 9.49 | 0.76 | 2.25 |
IIRNet 64 | 0.04 | 0.28 | 3.70 | 2.74 | 7.22
IIRNet 128 | 0.09 | 0.29 | 2.95 | 2.41 | 7.11 |
IIRNet 256 | 0.21 | 0.30 | 2.08 | 2.03 | 6.29 |
IIRNet 512 | 0.55 | 0.36 | 1.51 | 1.69 | 6.54 |
IIRNet 1024 | 1.63 | 0.71 | 1.29 | 1.39 | 5.54 |
IIRNet 2048 | 5.35 | 1.87 | 1.16 | 1.52 | 5.02 |
IIRNet 4096 | 19.1 | 4.65 | 1.11 | 1.38 | 5.86 |
We used AdamW [12, 17] and trained with a batch size of 128, where each epoch consists of newly sampled random filters, so that millions of filters are seen over the course of training. The target magnitude response was evaluated over linearly spaced frequencies. To aid stability, we clip all responses and then scale them to a fixed range. All models were trained with the same initial learning rate unless otherwise noted, and we decayed the learning rate by a fixed factor twice during training. We applied gradient clipping when the norm of the gradients exceeded a threshold. We conducted a set of three experiments to investigate the behaviour of IIRNet, training a total of 19 models. We provide code for these experiments along with pre-trained models at https://github.com/csteinmetz1/IIRNet.

Filter family — To investigate the impact of the random filter sampling method we trained 7 models, each on a different family of random filters (A-F) as described in Section 2.2, with the final model trained using all of the families together (G). For these experiments, all models used the same hidden size, and each was trained to estimate a fixed-order biquad cascade.
Model size — The size of the linear layers within IIRNet has a direct impact on the inference time, which is of interest for online and real-time applications. We investigated the impact of the model size on run-time and accuracy by training another 7 models using hidden sizes of 64, 128, 256, 512, 1024, 2048, and 4096. For these models we trained using an equal number of random filters from all of the families (G). All timings were performed on CPU and averaged over a total of 1000 runs, using a machine with an AMD Ryzen Threadripper 2920X.
Filter order — IIRNet predicts a fixed-order filter given a desired magnitude response, which means that a different model must be trained for each filter order. To investigate the performance of our approach as a function of the filter order, we trained another 5 models, varying both the order of the random filters used in training and the filter order estimated by IIRNet over orders 4, 8, 16, 32, and 64. These models used the same number of hidden units in each linear layer and were trained again with random filters from all families (G). Since we found training models that estimate higher order filters more unstable, we trained all of these models with a lower initial learning rate.
Three different sets of filters were used to evaluate the models. First, we evaluated using random filters from each of the 7 proposed random filter families (A-G). We then measured how models generalized to distributions of filters not seen during training, as well as matching the magnitude response of real-world filters such as measured head-related transfer functions (HRTFs) and guitar amplifier cabinets. Though phase is an integral part of the HRTF, some studies suggest that the HRTF can be reproduced within perceptual tolerance under certain conditions via a minimum phase magnitude response plus a delay match [14]. Guitar cabinets combine loudspeakers with guitar amplification circuits for use in creative settings within music production. The impulse response of these cabinets can then be used for digital emulation of the linear behaviour of these devices. In our experiments, 187 HRTFs were sourced from the IRCAM-Listen HRTF Dataset (http://recherche.ircam.fr/equipes/salles/listen/) and 32 guitar cabinet impulse responses were sourced from Kalthallen Cabs (https://cabs.kalthallen.de). All impulse responses were resampled to 16-bit 44.1 kHz, and a Savitzky-Golay filter [18] was used to smooth the magnitude responses before input to IIRNet.
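The smoothing step can be reproduced with SciPy's Savitzky-Golay filter. The window length, polynomial order, and the synthetic response below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
n = 512
w = np.linspace(0.0, np.pi, n)
# Toy stand-in for a measured log-magnitude response: a smooth trend plus noise.
trend = -12.0 * np.cos(2.0 * w)
mag_db = trend + rng.normal(0.0, 1.0, n)
# Savitzky-Golay smoothing before the response is passed to the estimator.
smoothed = savgol_filter(mag_db, window_length=31, polyorder=3)
```

Fitting a low-order polynomial in a sliding window suppresses measurement noise while preserving broad spectral features better than a plain moving average.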
Experiments with different random filter families in Table 1 show that training on a specific family of random filters resulted in the best performance when evaluating on that filter family. Furthermore, we found that training on certain families (A, B, F) rather than others (C, D, E) resulted in better performance on real-world filters. This supports our claim that the method for constructing random filters is a significant consideration in training this type of model. Notably, IIRNet trained on all filter families (G) achieved the lowest combined MSE across all datasets, indicating that training on multiple families is superior to training on any single family alone.
We also compared performance of these models against the MYW approach, as shown in the first row of Table 1. Here we used MYW to fit the desired response, specifying the same filter order as the target filter. This approach performs worse than IIRNet trained with (G) across all of the random polynomial families, as well as on the guitar cabinet responses. However, we find that MYW outperforms other methods on the HRTF dataset. These results point to MYW performing better when the overall range of the magnitude response is more limited, but this approach may struggle when the response has a much larger range in the magnitude space.
The run-time and accuracy of variants of IIRNet are compared to an SGD and MYW approximation on identical datasets in Table 2. Both the run-time and accuracy increase as we increase the size of IIRNet, as expected. All versions of IIRNet are both faster and more accurate across the set containing all random filter families (G) as compared to both SGD and MYW. On the real-world filter estimation tasks, SGD with 1000 iterations outperforms MYW and even the largest IIRNet model, but has a run time orders of magnitude higher. MYW beats all other approaches on the HRTF estimation task, but performs worse than even the smallest IIRNet model across all random filters families and guitar cabinet estimation.
Since IIRNet is trained to estimate filters of a fixed order, we evaluated how performance changed as a function of the estimation order. Table 3 demonstrates that, in general, increasing the estimated filter order of IIRNet improves estimation accuracy at all orders less than or equal to the training order. However, we found training models that estimate filters of higher order challenging, often leading to instability. As a result, the model trained to estimate 64th-order filters diverged, and hence performs worse than the 32nd-order model.
Table 3: Estimation error (dB MSE) as a function of the training filter order and the test filter order (on random filters from all families, G), along with real-world responses.

Train order | 4 | 8 | 16 | 32 | 64 | HRTF | Gtr. Cab.
---|---|---|---|---|---|---|---
4 | 1.21 | 7.65 | 20.30 | 75.28 | 196.19 | 11.77 | 19.20 |
8 | 0.37 | 1.59 | 6.20 | 24.98 | 80.10 | 6.08 | 11.96 |
16 | 0.22 | 0.68 | 2.13 | 9.55 | 34.76 | 1.97 | 6.12 |
32 | 0.17 | 0.39 | 0.98 | 4.82 | 21.32 | 0.66 | 1.92 |
64 | 1.96 | 2.07 | 2.69 | 7.49 | 22.61 | 3.29 | 4.70 |
While our results demonstrate that IIRNet produces accurate estimates of both unseen random and real-world filters, this approach has some limitations including fixed order filter estimates, consideration only of the magnitude response, and the inability to apply additional design constraints. Future work could investigate a formulation of the loss function that also considers phase, along with architectural adjustments that may support variable filter order estimation. However, it may be possible to address some of these limitations by using IIRNet simply as a method for generating an initial estimate that can be refined with more flexible iterative techniques.
We presented IIRNet, a neural network for the direct estimation of biquad filter cascades to match an arbitrary magnitude response. We investigated the performance of IIRNet using a diverse range of random filter families informed by knowledge of random polynomials. Performance was measured using a large dataset of random filters, head-related magnitude responses, and guitar cabinet magnitude responses. We demonstrated that training with a variety of random sampling methods performs best across datasets, outperforming all models trained with only a single random filter family. IIRNet is shown, on average, to perform faster and more accurate filter estimation compared to modified Yule-Walker and stochastic gradient descent, requiring no manual parameter tuning during the design process. Additionally, we demonstrated the accuracy-speed trade-off when varying the network size, and showed how training with higher order filters produces superior generalization performance across tasks.
This work was supported by the EPSRC UKRI Centre for Doctoral Training in Artificial Intelligence and Music (EP/S022694/1) and the Research and Development Division of Yamaha Corporation, Japan.
References (excerpt)

- Spectral estimation via the high-order Yule-Walker equations. IEEE Trans. Acoust., 30(5).
- Differentiable IIR filters for machine learning applications. In DAFx.
- End-to-end equalization with convolutional neural networks. In DAFx.