In pursuit of recreating the treasure trove of musical and non-musical sounds found in nature as well as realizing those stemming directly from human creativity, numerous audio synthesis strategies have been devised and developed over time. The additive synthesis technique, originating from the applications of Fourier theory, found use in the earliest synthesizer implementations like the Telharmonium (1897) [weidenaar] and the Hammond Organ (1935) [vail]. On the other hand, subtractive synthesis saw increased popularity in the 1960s and was employed in many popular synthesizers like the Moog Synthesizer [pekonen].
The Frequency Modulation (FM) Synthesis technique devised by John Chowning [chowning] took audio synthesis into newfound territories, with implementations like Yamaha’s DX7 (1983) achieving cult status in the music industry. FM synthesis paved the way for efficiently emulating the complex and dynamic spectra of naturally occurring sounds. Since then, wavetable [bristow] and granular synthesis [roads] techniques have also found wide acceptance in hardware and software synths.
Parallel to the aforementioned developments in audio engineering, the process control industry saw a similar advancement in the proportional-integral-derivative (PID) control strategy. Currently, PID controllers are the most commonly used means of achieving feedback control in industrial plants [lin]. They also find widespread use in automobile control, HVAC systems and robotics.
However, despite the popularity and robustness of PID control in various applications, its usability in the audio domain has not received significant research attention. Consequently, this study documents the application of the PID control strategy in synthesizing musical and non-musical sounds by defining the framework for the PID Synthesis (PIDS) technique. Additionally, an effort is made to analyze the waveforms and spectra generated using PIDS, thereby juxtaposing it with existing methods like FM and wavetable synthesis.
1.1 The PIDS Framework
The PID algorithm finds widespread use in various process control systems to perform feedback control, specifically, to bring the value of a process variable (PV) up to a value known as the setpoint (SP).
In most control applications, the SP is either a constant or changes infrequently and is equal to the desired value(s) to which the PV must be set. However, if the SP is allowed to vary as a curve, the PV can be made to follow it continuously, with the nature of this following determined by the values of $K_p$, $K_i$ and $K_d$, the respective gains of the Proportional (P), Integral (I) and Derivative (D) components.
If the frequency of the SP curve, referred to hereafter as the "artist", is f, the frequency of the resulting PV curve (the synthesized output) will also be f, provided the values of $K_p$ or $K_d$ are high enough. In this case, the nature of the artist and the magnitudes of the P, I and D gains (provided as user input or modulated programmatically) control the shape of the output. This lays the foundation of PIDS, a technique to generate signals which can be either Low-Frequency Oscillations (LFOs) when f is under 20 Hz or audible waveforms when f is in the audio frequency range. Some of the typical waveforms generated by PIDS are illustrated by fig. 1.
However, the PID algorithm does not intrinsically ensure that the output is restricted within the desired threshold. While settling about the SP, the PV is susceptible to overshoots. Depending on the PID gains set, such overshoots can assume enormous values. In such cases, the resulting audio signal produced by PIDS is likely to clip (i.e., its level exceeds 0 dB). As a result, the PIDS algorithm restricts the output range to [-1, 1].
Similarly, the range of the artist is also confined to [-1, 1]. Any sample outside this range is truncated to lie within it. Such truncations can cause undesirable effects in the form of high-frequency harmonics in the signal, leading to issues like aliasing. However, these are addressed in section 3.
Additionally, the error integral may quickly accumulate when the integral mode is active and the $K_i$ parameter is set to sufficiently high values. Such accumulations can exceed the data type range associated with the integral variable in software synthesizers and can lead to capacitors reaching their charge limits in analogue hardware synthesizers. This saturation condition, known as integral windup, is generally undesirable in control systems. However, when applied to PIDS, windups can lead to exciting waveforms. Consequently, the PIDS framework accommodates early limiting of the maximum integral value to hasten the windup phenomenon during signal synthesis. The characteristic differences in doing so are depicted in fig. 2.
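The loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the `integral_limit` argument, and the choice to add the controller output to the previous PV on every sample are all assumptions made for the example.

```python
def clamp(value, low=-1.0, high=1.0):
    """Restrict a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pids(artist, kp=0.0, ki=0.0, kd=0.0, integral_limit=None):
    """Follow a sequence of setpoints with a discrete PID loop.

    artist:         iterable of setpoint (SP) samples
    kp, ki, kd:     P, I and D gains
    integral_limit: optional early limit on the accumulated error
                    (hypothetical parameter, used here to force windup)
    Assumption: the controller output is added to the previous PV.
    """
    pv = 0.0
    integral = 0.0
    prev_error = 0.0
    output = []
    for sp in artist:
        sp = clamp(sp)                      # setpoints are confined to [-1, 1]
        error = sp - pv
        integral += error
        if integral_limit is not None:      # early limiting of the integral
            integral = clamp(integral, -integral_limit, integral_limit)
        derivative = error - prev_error     # difference of consecutive errors
        pv = clamp(pv + kp * error + ki * integral + kd * derivative)
        prev_error = error
        output.append(pv)                   # PV samples form the audio signal
    return output
```

Under these assumptions, `pids(square, kp=1.0)` reproduces the setpoints exactly, while smaller `kp` values make the output lag behind them.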
2 The Artist
In PIDS, the artist determines the instantaneous SP against which the PID algorithm generates an output sample. At higher values of $K_p$ and $K_d$, the frequency of this SP curve directly sets the frequency of the generated waveform. To synthesize an audio signal of frequency f, the wavelength required in terms of sample count is given by eq. 1.
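The referenced equation did not survive in this copy of the text. One plausible reading, offered only as an assumption, is that the cycle length in samples is the sampling rate divided by the target frequency. The helper below (a hypothetical name) rounds that ratio to a whole number of samples.

```python
def cycle_length(sample_rate, frequency):
    """Approximate number of samples in one wave cycle.

    Assumes the cycle length is sample_rate / frequency, rounded to the
    nearest whole sample (at least one sample per cycle).
    """
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return max(1, round(sample_rate / frequency))
```

Rounding introduces a small pitch error: at 44100 Hz, a 440 Hz request maps to 100 samples per cycle, which actually corresponds to 441 Hz.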
The PID scheme places no restrictions on the type of artist that can be supplied to it. However, to ensure that meaningful audio signals are generated for all possible inputs provided by a user, a PIDS implementation can define a set of skeletons that the artist can assume. Here, the degree of freedom offered to the user lies in selecting one of the available skeletons and then varying its subtleties to control the audio waveform produced as desired.
To provide user control for varying the subtleties, the concept of breakpoints is introduced. Breakpoints are the set of x- and y-coordinates through which the artist must strictly pass during a single wave cycle. For each breakpoint, the x-coordinate can assume any value in [0, 1] and represents the relative position of that point in the wave cycle. On the other hand, y-coordinates can have values within [-1, 1] and describe the amplitude of the artist at that point. The first and last breakpoints must be the same to ensure continuity of the artist across multiple wave cycles. However, this constraint need not be enforced in practice if the user prefers such sharp changes to contribute towards the synthesized output. In these situations, the manufacturer must put suitable anti-aliasing techniques in place to produce clean audio signals, as discussed in section 3. The remaining breakpoints are positioned through user input. By placing these intermediate breakpoints suitably, the finer aspects of the artist in terms of its amplitude, duty cycle and baseline may be controlled.
Additionally, how the artist traverses the intervals between consecutive breakpoints determines its skeleton (or shape). Conventionally, the PIDS framework abstracts the skeleton generation process from the user. Instead, the user is provided with the option of selecting one of the available skeletons, and the algorithm ensures that the resulting artist passes through all the set breakpoints. Except in some special cases, the artist thus created will have discontinuities at the intermediate breakpoints. In theory, there can be many such breakpoints: the higher their count, the better the flexibility in controlling the artist’s shape. However, in practice, the sampling rate of synthesis constrains the highest frequency that can be produced while ensuring the artist passes through all the set breakpoints; this upper limit may be obtained from eq. 2.
2.1 Implementational Choices
In the following section, some of the common artists compatible with PIDS will be discussed.
Linear artists (made up of linear skeletons) are the stock option provided by the PIDS framework. They are obtained by performing linear interpolation between consecutive breakpoints. Apart from the relative ease of generating linear artists, the per-sample change in the SPs produced is gradual. This ensures that the output shape does not deviate from the artist to a great extent. Additionally, it also reduces the possibility of sharp changes being present in the output. Therefore, the resulting audio signals contain far fewer high-frequency components than those of other artist types at the same audio frequency.
Consequently, implementing anti-aliasing for linear artists is relatively simple in most cases. However, linear artists may undergo a significant change in gradient at the breakpoints. While the PID algorithm negates the effect of these discontinuities to a great extent, at higher values of $K_p$ and $K_d$, it may be required that the anti-aliasing technique employed addresses their undesired effects.
Step skeletons are another convenient alternative to generate artists while providing considerable control to the user. Unlike linear artists, the SP remains constant between breakpoints in step artists. Hence, the PID algorithm has more liberty in setting the value of output samples when compared to linear artists. However, the setpoints can change sharply at the breakpoints, leading to discontinuities in the step artists. For higher values of $K_p$ and $K_d$, the output thus produced may also contain these discontinuities. Hence, appropriate anti-aliasing must account for the theoretically unbounded high-frequency content they add to the audio signal.
Step artists result in square waveforms. Therefore, by setting the position of breakpoints, the user essentially controls the duty cycle, which influences the output’s overall shape.
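The linear and step skeletons described above can be sketched as two small generators that turn a breakpoint list into one cycle of setpoints. The function names and the list-of-tuples breakpoint format are assumptions made for this illustration, not part of the framework's definition.

```python
def linear_artist(breakpoints, n_samples):
    """One cycle of setpoints, linearly interpolated between breakpoints.

    breakpoints: list of (x, y) pairs with x ascending in [0, 1] and
                 y in [-1, 1]; first x is 0.0 and last x is 1.0.
    """
    cycle = []
    for i in range(n_samples):
        x = i / n_samples                       # relative position in the cycle
        y = breakpoints[-1][1]
        for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
            if x0 <= x <= x1:
                if x1 == x0:                    # coincident breakpoints
                    y = y1
                else:
                    y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
                break
        cycle.append(y)
    return cycle


def step_artist(breakpoints, n_samples):
    """One cycle of setpoints, holding each breakpoint's y-value until
    the next breakpoint is reached."""
    cycle = []
    for i in range(n_samples):
        x = i / n_samples
        y = breakpoints[0][1]
        for bx, by in breakpoints:
            if bx <= x:                         # most recent breakpoint wins
                y = by
        cycle.append(y)
    return cycle
```

For example, `step_artist([(0.0, 1.0), (0.5, -1.0), (1.0, 1.0)], n)` gives a square wave with a 50% duty cycle; moving the middle breakpoint's x-value changes the duty cycle.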
Unlike linear and step artists, sinusoidal artists are continuous. Since sine waves contain no harmonics, the output synthesized using sinusoidal artists is less prone to aliasing at higher audio frequencies than that of their discontinuous counterparts. However, the concept of setting breakpoints is ineffective in the case of sine artists. Therefore, the tradeoff for simpler anti-aliasing is a marked decrease in configurability. While facilities can be provided to control the amplitude and phase (useful when using more than 2 PIDS together) of the sine waveform, the inability to set breakpoints might make sinusoidal artists less desirable when compared to some of the discontinuous alternatives.
Additionally, there exist other artist types that can be used with PIDS. For example, these may consist of skeletons of the inverse-exponential and parabolic types. However, they will not be discussed in detail as part of this research.
The output synthesized by PIDS against most of the aforementioned artist types contains high-frequency harmonic and inharmonic components. While these components may lead to natural-sounding or "organic" timbres, some of their frequencies can exceed the Nyquist frequency set by the sampling rate. Such components get folded over and appear as aliases at lower frequencies, mirrored about the axis of symmetry (given by the Nyquist frequency).
Subjectively, aliasing may be a desirable effect in terms of enriching the spectrum of sounds produced. However, the properties of audio signals containing aliased components depend on the sampling rate to a great extent. Hence the same signal might sound different when played by different audio players. Therefore, the PIDS framework incorporates anti-aliasing to keep synthesis solely dependent on the PID algorithm and consistent across audio playbacks.
Especially in the case of discontinuous artists, features like a step change in amplitude (occurring in step artists) and a sudden change in slope (occurring in linear artists) induce infinite harmonics that slowly decay to inaudible amplitudes, as seen in fig. 6. Therefore, the problem of aliasing appears right from the artist generation process itself. The phase-lag introduced by the PID algorithm at lower values of $K_p$ or when the I mode is activated moderates the effects of these aliased components to a great extent. However, PID induces high-frequency components of its own while generating samples against the artist. Some of the common causes may be oscillations produced about the SP and the output getting truncated in the [-1, 1] range at higher gain values.
As part of this research, numerous methods were prototyped and integrated into the PIDS framework to prevent aliasing intrinsically. However, most of them met with limited success. Finally, oversampling was employed as the means of anti-aliasing for PIDS synthesized signals. Initially, the output is generated at a sampling rate of 2, 4 or 8 times the playback rate. Next, the output is convolved with a windowed, low-pass FIR filter having a cutoff frequency of 20 kHz and sufficient transition bandwidth to remove all the high-frequency components susceptible to aliasing. Finally, the filtered output is downsampled back to the playback rate. A comparison of the output produced in various scenarios of oversampling is depicted in fig. 7. While higher degrees of oversampling reduce the occurrences of rolled-over components appearing in the final synthesized output, they may increase computational resource utilization and, therefore, may not be feasible with low-powered processors. In such cases, implementers must arrive at a suitable tradeoff between audio quality and processing costs.
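The oversample, filter and downsample chain can be sketched in plain Python as below. The Hamming window, the 101-tap length, the 44100 Hz playback rate and all function names are assumptions chosen for the example; the text only specifies a windowed low-pass FIR filter with a 20 kHz cutoff.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz             # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def fir_filter(signal, taps):
    """Convolve signal with taps, keeping the input length (zero-padded)."""
    half = len(taps) // 2
    filtered = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        filtered.append(acc)
    return filtered


def antialias(oversampled, factor, playback_rate=44100,
              cutoff_hz=20000.0, num_taps=101):
    """Low-pass filter an oversampled signal, then keep every factor-th sample."""
    taps = lowpass_fir(num_taps, cutoff_hz, playback_rate * factor)
    return fir_filter(oversampled, taps)[::factor]
```

A longer filter narrows the transition band at the cost of more multiplications per sample, which is the quality versus processing tradeoff noted above. In practice, an optimized convolution routine would replace the nested loops.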
4 Effect of PIDS Controls
Similar to PID control, the output synthesized by PIDS largely depends on the values of $K_p$, $K_i$ and $K_d$ provided as input by the user. To understand their working and effect on the output produced, a software version of PIDS is implemented in Python3. In each case (unless specified), the output is generated against a step artist of 440 Hz. The artist is portrayed by fig. (a)a, and its breakpoints are documented by table I.
4.1 Proportional Gain ($K_p$)
The $K_p$ parameter controls the amount of contribution the P-component of PID control has on the synthesized output. The P-component is a function of the error (i.e., difference) between instantaneous SP and PV. When the system is stable, higher values of $K_p$ ensure that this error is reduced more quickly. For PIDS, as $K_p$ increases, the output follows the artist more closely (the phase-lag between them reduces). Additionally, by controlling the extent of this lag, the $K_p$ parameter indirectly determines the amplitude of the synthesized output. As artist values change considerably at breakpoints, the lag present may prevent the output from reaching the breakpoint y-values at lower values of $K_p$. Figure (a)a illustrates the change of waveforms as $K_p$ increases from 0 to 1; the output increasingly resembles the artist.
Moreover, fig. (b)b shows the increase in the output amplitude on increasing $K_p$, as the output can follow the artist better. Figure 9 depicts the frequency spectrum of fig. 8. At lower values of $K_p$, the frequency distribution remains almost constant as the energy of each frequency component increases per the increasing amplitude. However, at higher $K_p$, the output starts resembling the step artist. Consequently, while the total energy remains the same, it gets more evenly distributed among the immediate harmonics (step artists have infinite odd harmonics).
4.2 Integral Gain ($K_i$)
The $K_i$ parameter controls the extent to which the I-component of PID control influences the synthesized output. Unlike the P-component, the historical values of errors are considered here, and corrective action is produced as a function of the error accumulated over time. The impact of accumulation is that the PV will eventually reach the SP as the integral (accumulated error) decays back to 0. In PIDS, this results in two observations.
Secondly, the process of integral decay and consequent accumulation in the opposite direction introduces high-frequency oscillations into the output. As evidenced by fig. (a)a, the higher the value of $K_i$, the higher the resulting output’s peak frequency. From a music producer’s perspective, noting this phenomenon is important as the pitch of the synthesized audio signal may be drastically higher than expected when playing a note. To prevent such cases, an alternative would be to add in a small amount of the P-component while using the I-component, thereby operating PIDS in the PI-mode. Moreover, as $K_i$ increases, the induced harmonics can exceed the Nyquist frequency, as evidenced for waveforms with $K_i$ greater than 0.6 in fig. (b)b. In these cases, the anti-aliasing mechanism kicks in and removes considerable energy from the synthesized output. Consequently, the average magnitude of such signals is lower than that of signals with lower $K_i$, as shown in fig. 10.
4.3 Derivative Gain ($K_d$)
The $K_d$ parameter controls the presence of the D-component of PID control in the synthesized output. In conventional controllers, the D-component acts on the derivative of errors to provide a lead compensation to account for the sluggish performance of larger process systems (that act as high-capacity integrators). In PIDS, the D-component is simply a function of the difference between consecutive errors. Consequently, when only $K_d$ is active and is less than one (negative feedback), the output of PIDS is a miniaturized form of the artist, the amplitude of the output being proportional to the value of $K_d$. This can be observed in fig. 12. However, at higher values, the system tends to become unstable (due to positive feedback), and the waveforms thus generated may not yield desirable audio signals.
Analyzing the frequency spectrum of purely D-mode waveforms provided by fig. 13, it is clear that increasing the value of $K_d$ does not modify the frequency distribution of the output (it shares the same spectrum as the artist). Overall, the $K_d$ parameter is more suitable to be used while the $K_p$ parameter is also activated. In such cases, $K_d$ tends to pull back the effect of $K_p$ by providing a reduced form of P-component contribution.
While PIDS can be operated to synthesize audio signals with just one of $K_p$, $K_i$ and $K_d$ being activated, its flexibility in terms of controlling the waveforms produced lies in using it with multiple parameters activated. This includes all the combinations of PID control: the PI, PD, ID and PID modes. The behaviour of PIDS in each of these multi-modes is the combined effect of the activated parameters, the individual contribution of each parameter to the output being proportional to the values of their relative gains.
Figure 14 illustrates the frequency distribution of the output produced in the PI mode, with one gain held constant at 0.6 while the other is swept from zero to one. The resulting outputs not only have considerable energy at the fundamental frequency of the artist (440 Hz) that is induced by the P-component but also have a significant high-frequency component introduced by the I-component.
As described in section 2, the set of breakpoint coordinates of an artist determines its subtleties and, therefore, the resulting output synthesized by PIDS. The user can control the y-values of the first and last breakpoints and the x- and y-values of the intermediate breakpoints. By modulating these values with time, a great degree of variation is achieved in the type of artist obtained. Consequently, the audio signal generated in this process may vary drastically as the modulation sweeps, producing an effect similar to wavetable synthesis discussed in detail in section 6.4. Figure 15 describes the changes in the PIDS artist (and hence, the output) as the breakpoints change for a linear artist.
5 Effect of Fundamental Artist Frequency
The fundamental frequency of PIDS output is dependent mainly on the fundamental frequency of the artist. At lower frequencies, the output can follow the artist more faithfully, and hence its peak frequency accurately matches the fundamental frequency. This can be observed in fig. (a)a and fig. 17, where the outputs were generated against a step artist having breakpoints given by table I. However, as this frequency increases for higher notes, the finer details of the synthesized waveforms get lost as the output struggles to match the faster oscillations of the artist. Consequently, for the same input provided to PIDS, the timbre generated at lower octaves may differ noticeably from that at higher octaves. Another effect observed in this case is that the waveforms’ amplitude also tends to fall as the note frequency increases, the reason being the loss of finer details and the band limit set by the anti-aliasing (removing most of the harmonics). This phenomenon is observed in fig. (b)b and occurs even for conventional sine, triangle and sawtooth oscillators. Additionally, such situations arise only for the notes belonging to the higher octaves. For the commonly used octaves of modern music, the PIDS-generated output is largely consistent and has the fundamental frequency characteristic of the note played.
6.1 Additive Synthesis
While a single instance of PIDS can generate a wide variety of waveforms, greater degrees of freedom are provided to the user by combining two or more PIDS outputs. In such cases of additive synthesis, two more user controls may be provided: a ratio adjuster to set the proportion by which the individual PIDS outputs are mixed to form the final output, and a detuner to transpose one of the PIDS outputs with respect to the other. Additionally, if the set of breakpoint y-values of all the PIDS artists is the same, the relative difference in their x-values can implement phase differences between them. Figure 18 depicts a simple example of additive synthesis, with one PIDS having a step artist and the other a linear artist.
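The two additional controls can be illustrated with a short sketch. Both helpers are hypothetical: the text does not specify how the ratio is applied or in what unit the detuner operates, so a linear crossfade and a detune amount in cents are assumed here.

```python
def detune_ratio(cents):
    """Frequency multiplier for a detune amount in cents (assumed unit);
    1200 cents correspond to one octave."""
    return 2.0 ** (cents / 1200.0)


def mix(first, second, ratio):
    """Blend two equally long signals.

    ratio is the share of `first` in the result, in [0, 1]; the
    remainder comes from `second` (assumed linear crossfade).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [ratio * a + (1.0 - ratio) * b for a, b in zip(first, second)]
```

Because the weights sum to one, the blend of two signals confined to [-1, 1] also stays within [-1, 1]. The detuned signal would be generated from an artist whose frequency is the base frequency multiplied by `detune_ratio(cents)`.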
6.2 Low Frequency Oscillator (LFO)
When used with frequencies under 20 Hz, PIDS effectively behaves as an LFO. Hence, its output can be used to modulate parameters like the pitch, gain and cutoff frequencies of other audio modules (including conventional synthesizers and effect processors). In PIDS LFOs, the user controls and underlying processing remain the same as for audio synthesis, the only crucial difference being an optional lack of anti-aliasing in LFOs. Aliasing may be allowed for two main reasons. Firstly, the harmonics generated for LFOs over the Nyquist frequency are significantly attenuated. Hence, the effect of any folding taking place is mostly inaudible. Secondly, implementing anti-aliasing in an LFO may not necessarily guarantee the absence of aliased components in the signal it is modulating (therefore, anti-aliasing must be performed directly on the modulated signal).
On the other hand, it must be noted that LFOs generated with the I-component dominating may generate high-frequency harmonics that get rolled over. In such situations, the implementers may choose to enable anti-aliasing as soon as the I-component dominates (if such foldovers produce undesirable effects). An example of a PIDS-generated LFO is demonstrated by fig. 19.
6.3 Alternative to FM Synthesis
FM Synthesis is widely known to simulate naturally occurring timbres by adding side frequencies (harmonic and inharmonic) in the generated waveform and dynamically varying them with time [chowning]. With the I-mode activated, PIDS can also exhibit similar behaviour. The I-component adds oscillations in the generated output, resulting in sidebands in the corresponding spectrum about the peak frequency; effectively, it distributes the energy of the peak among these sidebands. Additionally, the variation of $K_i$ (and other PIDS parameters) can alter the spectrum with time to recreate the effect of changing the modulation index in FM synthesizers; as $K_i$ increases, the distribution of energy among the upper sideband increases. An example of a PIDS spectrum similar to those produced by FM synthesizers is exhibited in fig. 20.
An important thing to note in this case is that the P- and D-components tend to oppose this FM synthesis-like behaviour induced by the I-component. As $K_p$ and $K_d$ increase with respect to $K_i$, the distribution of energy among the sidebands decreases and the energy increasingly accumulates at the peak/fundamental frequency.
6.4 Alternative to Wavetable Synthesis
Wavetable synthesis is one of the most popular forms of audio signal generation techniques in modern music. It involves modulating the selection of one of the available waveforms in a wavetable and interpolating between consecutive selections to smoothen the transitions [bristow]. As a result, the process leads to intriguingly complex waveforms. The P, I and D components, as well as the breakpoint coordinates and type of the artist used, greatly influence the resulting output synthesized by PIDS. Hence, modulating one or more of these parameters can return an effect similar to modulating a wavetable.
To this end, PIDS presents some advantages over conventional wavetable synthesis. Firstly, all transitions in waveform shapes take place in situ. Hence, no wavetable buffer is required to be maintained in memory, thereby benefiting processors having low memory or slow read speeds. Further, the number of waves in the PIDS "wavetable" is essentially a function of the sampling rate of the modulating signal. Due to the nature of PID control, the transitions between the waves will have inherent smoothness despite sudden changes in PIDS parameters, and thus, the need for digital interpolation is avoided.
There may be limitless possibilities from using PIDS as an alternative to wavetable synthesis, the influencing factors being the modulating signal used (which could be a conventional or PIDS LFO) and the parameters being modulated. A simple instance of the "wavetable" generated by linearly sweeping two PIDS parameters is seen in fig. 21.
7 Further Research Opportunities
7.1 Addressing unstable conditions
Similar to its applications in the process control industry [liptak], the PID control scheme in PIDS is also susceptible to operating as an unstable control system. Such situations occur when one or more of the $K_p$, $K_i$ and $K_d$ values are very high, leading the output to oscillate continuously between two values (i.e., a triangular waveform) at the Nyquist frequency. Ultimately, any anti-aliasing mechanism incorporated into PIDS tends to filter out this component completely, resulting in an inaudible signal like in the case of fig. (b)b. As of current progress in this research, it has not proven easy to estimate the exact gain values at which such conditions occur. A simple means to reduce the chances of instability may be to limit the range of inputs set for $K_p$, $K_i$ and $K_d$; the range of [0, 1] is a convenient option. However, such range limiting may not be sufficient since, for instance, PIDS may operate stably at a particular value of one gain below one but may become unstable at the same value when another gain is activated. Thus, the onus is placed on the user to ensure that the parameter inputs they provide do not tip PIDS over into instability. Special care must be taken during the automation of these parameters to ensure that PIDS is stable at all instances of modulation. However, a long-term solution must be found to objectively and completely remove the possibility of unstable conditions in PIDS.
7.2 Addressing the DC Component
Unlike conventional synthesizers, the PIDS algorithm cannot guarantee that the synthesized audio waveforms will be symmetric about the time axis. A significant low-frequency component is present in the spectrum for signals having asymmetries, known as the DC component or DC bias of that signal, as seen in fig. 23. The DC component can be a source of unwanted "click" sounds and distortions that may be amplified when using some effect processors or while exporting the signal in the MP3 format [dcoffset]. Hence, there may be an interest in removing it from the synthesized output. Some possible solutions in this regard may be implementing a high-pass filter or subtracting the moving average of the output from its instantaneous value. However, this study did not employ these strategies to address the DC component as they modify the actual PIDS-produced output, and there may be better solutions available to remove it intrinsically.
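For illustration only, the moving-average option mentioned above could look like the sketch below. It was not part of this study's implementation; the function name and the trailing-window formulation are assumptions.

```python
def remove_dc_moving_average(signal, window):
    """Subtract a trailing moving average from each sample.

    window: number of most recent samples averaged (for example, one
    wave cycle). Early samples use as many samples as are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    corrected = []
    running_sum = 0.0
    for i, sample in enumerate(signal):
        running_sum += sample
        if i >= window:
            running_sum -= signal[i - window]   # drop the oldest sample
        count = min(i + 1, window)
        corrected.append(sample - running_sum / count)
    return corrected
```

Choosing the window as a whole number of wave cycles removes the per-cycle mean once the window has filled, while leaving the waveform shape within each cycle unchanged apart from the offset.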
7.3 Identifying Intrinsic Anti-Aliasing Techniques
In the aforementioned PIDS framework, anti-aliasing is achieved by oversampling. While this strategy is sufficient in practical cases to reduce aliasing to a great extent, it seems unnatural and detached from the PID control technique. Additionally, the computational complexity of this method scales up with an increase in the magnitude of oversampling (that may be required for the higher audio frequencies) and the order of the filter used [kahles].
In audio waveforms, a significant source of high-frequency components prone to aliasing is discontinuities in the form of sharp changes in magnitude (found in square and sawtooth waves) or slope (found in triangle waves). This is true for the linear and step artists against which the PIDS output is synthesized. The PID algorithm can behave as a lag-compensator and negate the effect of high-frequency components in the artist. However, the algorithm also adds its own high frequencies to the output. Figure 24 represents these observations.
Therefore, effort must be made to identify strategies to reduce the PID-induced aliased components in the output and capitalize on its intrinsic low-pass filtering tendencies to execute a form of anti-aliasing naturally integrated into the PID control mechanism. As a starting point in this regard, implementing a small fraction of the integral mode (even when $K_i$ is zero) may be considered.
This study discussed the framework for a novel audio synthesis technique derived from the fundamentals of the PID controllers widely used in the process industry. Here, the goal was to lay out the foundation of the synthesis technique in terms of the nature of the synthesized signals that may be produced. Additionally, the concept of artists and their constituent breakpoints was introduced as a significant building block of the PIDS framework.
Efforts were also made to understand the effects of each control parameter on the synthesized output; these included $K_p$, $K_i$ and $K_d$, as well as the type of artist and the number of breakpoints used. From a practical perspective, the problems of aliasing and some common ways to handle them were analyzed.
Finally, the research delved into a high-level demonstration of the possible applications of PIDS. It is conducive to being used in additive synthesis and also as an LFO generator. It also presents itself as a capable alternative to FM and wavetable synthesis, emulating their effects with efficiency.
However, there are aspects of PIDS that call for concentrated research efforts. Identifying techniques to prevent instability and developing intrinsic forms of anti-aliasing are some of the future research directions required before employing PIDS in commercial synthesizers. Moreover, additional experimentation may be needed from a musician’s perspective to fully comprehend the possible applications of PIDS in real-world music production and performance.
As of now, PIDS is just a newfound, promising technique in audio synthesis. Nevertheless, with a focused approach to address the imperfections and optimize the implementation, it can influence future developments in generating musical and non-musical sounds and control signals.