# Estimates of the Reconstruction Error in Partially Redressed Warped Frames Expansions

In recent work, redressed warped frames have been introduced for the analysis and synthesis of audio signals with non-uniform frequency and time resolutions. In these frames, the allocation of frequency bands or time intervals of the elements of the representation can be uniquely described by means of a warping map. Inverse warping applied after time-frequency sampling provides the key to reduce or eliminate dispersion of the warped frame elements in the conjugate variable, making it possible, e.g., to construct frequency warped frames with synchronous time alignment through frequency. The redressing procedure is however exact only when the analysis and synthesis windows have compact support in the domain where warping is applied. This implies that frequency warped frames cannot have compact support in the time domain. This property is undesirable when online computation is required. Approximations in which the time support is finite are however possible, which lead to small reconstruction errors. In this paper we study the approximation error for compactly supported frequency warped analysis-synthesis elements, providing a few examples and case studies.

## Authors

• 2 publications
• 1 publication
• ### On the Decomposition of Multivariate Nonstationary Multicomponent Signals

With their ability to handle an increased amount of information, multiva...
03/31/2019 ∙ by Ljubisa Stankovic, et al. ∙ 0

• ### Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory

Audio alignment is a fundamental preprocessing step in many MIR pipeline...
08/04/2020 ∙ by Christopher Tralie, et al. ∙ 0

• ### Frequency, Acceptability, and Selection: A case study of clause-embedding

We investigate the relationship between the frequency with which verbs a...
04/08/2020 ∙ by Aaron Steven White, et al. ∙ 0

• ### Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

This paper proposes a method to improve the quality delivered by statist...
12/30/2019 ∙ by Thomas Drugman, et al. ∙ 0

• ### Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis

Objective: In this work the potential of non-invasive detection of knee ...
04/27/2020 ∙ by Costas Yiallourides, et al. ∙ 0

• ### Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

We present a learning-based method for extracting whistles of toothed wh...
05/18/2020 ∙ by Pu Li, et al. ∙ 0

• ### Geometry-aligned moving frames by removing spurious divergence in curvilinear mesh with geometric approximation error

The vertices of curvilinear elements usually lie on the exact domain. Ho...
12/15/2020 ∙ by Sehun Chun, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The availability of configurable time-frequency schemes is an asset for sound analysis and synthesis, where the elements of the representation can efficiently capture features of the signal. These include features of interpretation: e.g. glissandi and vibrati; human perception: e.g. non-uniform frequency sensitivity of the cochlea; music theory: e.g. scales (pentatonic, 12-tone, Indian scales with unequally spaced tones, etc.); physical effects: e.g. non-harmonic overtones in the low register of the piano or of percussion instruments and so forth. When used in the context of Music Information Retrieval, the adaptation of the representation to music scales is bound to improve. e.g. for instrument-, note-, chord- and overtone detection and recognition.

Traditionally, two extreme cases have been considered: Gabor expansions featuring uniform time and frequency resolutions and orthogonal wavelet expansions and frames featuring octave band allocation and constant uncertainty of the representation elements. In previous work [1, 2, 3, 4, 5, 6, 7], generalised Gabor frames have been constructed which allow for non-uniform time-frequency schemes with perfect reconstruction. In [3] the allocation of generalised Gabor atoms is specified according to a frequency or time warping map. In [8] the STFT redressing method is introduced, which, with the use of additional warping in time-frequency, shows under which conditions one can have generalised Gabor frames. These conditions stem from the interaction of sampling in time-frequency and frequency or time warping operators, which allow to incorporate the results in [3] in a more general context. It is shown that arbitrary allocation of the atoms is exactly possible in the so called painless case, i.e. in the case of finite time support of the windows for arbitrary time interval allocation and of finite frequency support of the windows for arbitrary frequency band allocation. Non-uniform frequency analysis by means of warping was introduced in [7]. Non-uniform Gabor frames with constant-Q were previously introduced in [1] , based on the theory developed in [2], where an ad hoc procedure was employed for their construction. In [3] we provided an alternative general method for their construction, using warping. In [8] the redressing method was introduced and applied to the general construction of non-stationary Gabor frames. In [9] the general theory was revisited with the real-time computational aspects in mind. There, the first approximate schemes were introduced without extensive testing.

Since online computation of the generalised Gabor analysis-synthesis is only possible with finite duration windows, the arbitrary frequency band allocation does not lead to an exact solution in applications that require real-time, while the arbitrary time interval allocation presents little or no problem. In [9] approximations leading to nearly exact representations were introduced.

In this paper we expand on the results in [9] and provide a study of the approximation error on a wide class of signals when finite duration windows are required in arbitrary frequency band allocation.

The paper is organised as follows. In Section 2 we review the concept of applying time and frequency warping to time-frequency representations, together with the redressing method, which involves a further warping operation in the time-frequency domain to reduce or eliminate dispersion. In Section 3 we introduce approximations suitable for the online computation of redressed frame expansions. In Section 4 the results of numerical experiments are shown, which provide estimations of the approximation error. In Section 5 we draw our conclusions.

## 2 Redressed Warped Gabor Frames

In this section we review the concepts leading to the definition of redressing in the context of frequency warped time-frequency representations. First we review the basic notions of STFT (Short-Time Fourier Transform) and Gabor frames. Then we move on to the definition of warped frames and then to the redressing procedure.

### 2.1 STFT and Gabor Frames

Gabor expansions can be considered as a form of sampling and exact reconstruction of the STFT. As is well-known, given a window and defining the time-shift operator and the modulation operator , the STFT is obtained by applying the operator to the signal :

 (1)

where the overbar denotes complex conjugation.

A Gabor system is generated by the kernel of by sampling the time-frequency plane :

 G(h,a,b)={TnaMqbh:n,q∈Z} (2)

where are sampling parameters.

The scalar products of the signal with the members of the Gabor system

 ⟨s,TnaMqbh⟩,n,q∈Z (3)

provide evaluations of the STFT (1) of a signal with window at the time-frequency grid of points , with . The question whether the signal can be reconstructed from these evaluations can be addressed by introducing the concept of frame.

A sequence of functions in the Hilbert space is called a frame if there exist both positive constant lower and upper bounds and , respectively, such that

 A∥s∥2≤∑l∈I|⟨s,ψl⟩|2≤B∥s∥2∀s∈H, (4)

where is the norm square or total energy of the signal. Frames generate signal expansions, i.e., the signal can be perfectly reconstructed from its projections over the frame.

A Gabor system that is a frame is called a Gabor frame. In this case, the signal can be reconstructed from the corresponding samples of the STFT (3). While not unique, reconstruction can be achieved with the help of a dual frame, which in turn is a Gabor frame generated by a dual window . Perfect reconstruction essentially depends on the choice of the window and the sampling grid. One can show that there exist no Gabor frames when . See [10] for more informations about Gabor frames.

### 2.2 Warped STFT and Gabor Frames

The warped STFT can be obtained by warping the signal prior to applying the STFT operator. In this paper we focus on pure frequency warping.

A frequency warping operator is completely characterised by a function composition operator , such that , in the frequency domain:

 W~θ=F−1WθF, (5)

where is the Fourier transform operator. The function is the frequency warping map, which transforms the Fourier transform of a signal into the Fourier transform of another signal . We affix the symbol over the map as a reminder that the map operates in the frequency domain.

If the warping map is one-to-one and almost everywhere differentiable then a unitary form of the warping operator can be defined by the following frequency domain action

 ^sfw(ν)=[ˆU~θs](ν)=√∣∣dθdν∣∣^s(θ(ν)), (6)

where denotes frequency. We assume henceforth that all warping maps are almost everywhere increasing so that the magnitude sign can be dropped from the derivative under the square root.

Given a frequency warping operator , the warped STFT is defined through the operator as follows

 (7)

which is indeed a warped version of (1), where is the adjoint of the warping operator. If the warping operator is unitary then we have . In that case, warping the signal prior to STFT is perfectly equivalent to perform STFT analysis with inversely frequency warped windows. The warped STFT is unitarily equivalent to the STFT so that a number of properties concerning conditioning and reconstruction hold [11].

The Fourier transforms of the frequency warped STFT analysis elements are

 ^~hτ,ν(f)=[ˆW~θ−1hτ,ν](f)=√dθ−1df^h(θ−1(f)−ν)e−j2πθ−1(f)τ, (8)

i.e., the warped STFT analysis elements are obtained from frequency warped modulated windows centred at frequencies . The windows are time-shifted with dispersive delay, where the group delay is .

Frequency warping generally disrupts the time organisation of signals due to the fact that the time-shift operator does not commute with the frequency warping operator [9].

From (4) it is easy to see that any unitary operation, in particular unitary warping, on a frame results in a new frame with the same frame bounds and [11]. Since the atoms are not generated by shifting and modulating a single window function, the resulting frames are not necessarily of the Gabor type. However, warping prior to conventional Gabor analysis and unwarping after Gabor synthesis always leads to perfect reconstruction.

Starting from a Gabor frame (analysis) and dual frame (synthesis) :

 φn,q=TnaMqbhγn,q=TnaMqbg, (9)

where and are dual windows, warped frames can be generated, following (8), by unwarping the analysis and synthesis frames. In the case of non-unitary warping, a frequency domain scaling operation is necessary in order to reconstruct the original signal. For the case of unitary warping we simply have:

 ~φn,q=U†~θφn,q=U~θ−1TnaMqbh~γn,q=U†~θγn,q=U~θ−1TnaMqbg, (10)

where is the frequency warped analysis frame and is the dual warped frame for the synthesis. With these definitions, one obtains the signal expansion

 s=∑n,q∈Z⟨s,~φn,q⟩~γn,q. (11)

Warped Gabor frames suffer from the same problem as the warped STFT. Indeed the Fourier transforms of the warped Gabor frame elements bear frequency dispersive delays so that dispersive time samples are produced by the direct application of the frequency warped frame analysis.

### 2.3 Redressing Methods

As shown in [8, 9], the dispersive delays intrinsic to the warped STFT can be redressed, i.e. made into constant delays in each analysis band if frequency unwarping is performed in the transformed time domain, i.e. with respect to time shift. In other words, instead of (7) we consider the similarity transformation on the STFT operator, which is time-shift covariant. In fact, one has:

 [ˆW~θ−1SW~θs](f,ν)=¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯^h0,0(θ−1(f)−ν)^s(f), (12)

which is in the form of a time-invariant filtering operation, corresponding to convolution in time domain. The filters are frequency warped versions of the modulated windows in the traditional STFT. The Fourier transform of the redressed analysis elements are

 (13)

which shows that the dispersive delays in the analysis elements (8) are brought back to non-dispersive delays.

In redressing warped Gabor frames one faces a further difficulty due to time-frequency sampling. In this case, inverse frequency warping can only be applied to sequences (with the respect to the time shift index) and may not perfectly reverse the dispersive effect of the original map on delays.

Unitary frequency warping in discrete time can be realised with the help of an orthonormal basis of constructed from an almost everywhere differentiable warping map that is one-to-one and onto , as follows:

 μm(n)=∫+12−12√dϑdνej2π(nϑ(ν)−mν)dν, (14)

where (see [12, 13, 14, 15, 16]). The map can be extended over the entire real axis as congruent modulo to a -periodic function.

Given any sequence in , the action of the discrete-time unitary warping operator is defined as follows:

 ~x(m)=[D~ϑx](m)=⟨x,μm⟩ℓ2(Z). (15)

In fact, the sequence in satisfies

 ^~x(ν)=√dϑdν^x(ϑ(ν)), (16)

where the symbol, when applied to sequences, denotes discrete-time Fourier transform. The sequences define the nucleus of the inverse unitary frequency warping operator . where .

In order to limit or eliminate time dispersion in the frequency warped Gabor expansion, the discrete-time frequency warping operator is applied to the time sequence of expansion coefficients over the warped Gabor frames. Since the operator is applied only on the time index, for generality, one can include dependency of the map and of the sequences on the frequency index . The process can be equivalently described by defining the redressed frequency warped Gabor analysis and synthesis frames as follows:

 ~~φn,q=D~ϑq−1~φ∙,q=∑mηn,q(m)~φm,q~~γn,q=D~ϑq−1~γ∙,q=∑mηn,q(m)~γm,q, (17)

obtaining:

 s=∑n,q∈(Z)⟨s,~~φn,q⟩~~γn,q. (18)

One can show [9] that the Fourier transforms of the redressed frame are

 ^~~φn,q(f)=A(f)^h(θ−1(f)−qb)e−j2πnϑq(aθ−1(f)), (19)

where

 A(f)=√dθ−1df√dϑqdν∣∣∣ν=aθ−1(f). (20)

Hence, dispersion is completely eliminated if

 ϑq(aθ−1(f))=dqf (21)

for any , where are positive constants controlling the time scale in each frequency band. In this case, the Fourier transforms of the redressed frame elements become:

 ^~~φn,q(f)=√dqa^h(θ−1(f)−qb)e−j2πndqf. (22)

When all are identical all the time samples are aligned to a uniform time scale throughout frequencies. If the are distinct, time realignment when displaying the non-uniform spectrogram is a simple matter of different time base or time scale for each frequency band.

Unfortunately, due to the discrete nature of the redressing warping operation, each map is constrained to be congruent modulo to a -periodic function, while the global warping map can be arbitrarily selected. Moreover, the functions must be one-to-one in each unit interval, therefore they can have at most an increment of there.

In Fig. 1 we illustrate the phase linearisation problem. There, the black curve is the amplitude scaled warping map and the grey curve represents the map , which is -periodic. Both maps are plotted in the abscissa . By amplitude scaling the warping map one can allow the values of the map to lie in the range of the discrete-time warping map . The amplitude scaling factors happen to be the new time sampling intervals of the redressed warped Gabor expansion.

In the “painless” case, which was hand picked in [3], the window is chosen to have compact support in the frequency domain. Through equation (19) and condition (21), the redressing method shows that, given any continuous and almost anywhere differentiable and increasing warping map, only in the painless case one can exactly eliminate the dispersive delays with the help of (17). In fact, in this case linearisation of the phase is only required within a finite frequency range given by the frequency support of the frame elements [8, 9], which is compatible with the periodicity constraint of the redressing maps .

In the general case, a perfect time realignment of the components is not guaranteed. Notice, however, that, by construction, the redressed warped Gabor systems are guaranteed to be frames for any choice of the maps satisfying the stated periodicity conditions, even when the phase is not completely linearised. Locally, within a certain band of the warped modulated windows it is possible to linearise the phase of the complex exponentials in (19). In the sequel we will refer to this band as the essential bandwidth since, hopefully, the magnitude of the Fourier transform of the window is negligible outside it, at least as a design goal.

Unfortunately, in general both the painless case and the partially redressed cases lead to infinite duration windows, which are undesirable when online computation is required. In the next we are going to propose some approximations and study the reconstruction error also through numerical experiments.

## 3 Real-time computation of the Warped Gabor Expansion

For real-time computation one needs to make assumptions on the signal, as well on the window functions in order to keep the computational load as low as possible.

By requiring the window to be real-valued and

to be odd, we obtain

= . If, additionally the input signal is real-valued, then the coefficients fulfil and thus we only need to compute about half the coefficients and frame elements. By enforcing shift-invariance of the warped frame elements (i.e. ), which we must do since otherwise the pre-computation of the frame elements and hence the real-time computation of the expansion would be impossible, it is sufficient to only store for non-negative values of q. It is advisable that the warping map is selected as a continuously differentiable function, since the error in the resynthesised signal in the frequency domain at the points of discontinuity of is very high. A strictly positive derivative helps the atoms not get too stretched in the time domain, which is undesirable in real time applications. To avoid aliasing we further require that and for some , where SR denotes the sampling rate. This ensures that the high frequencies are smoothly mapped back to the negative frequencies and avoids extra aliasing introduced by the approximation. By requiring the Fourier transform of the window to have an analytic expression, we can avoid numerical errors in the computation of the warped windows. We further only worked with windows which are dual to itself, i.e. which is actually not a necessary restriction.

### 3.1 Implementation Details

We implemented the approximate warped redressed Gabor analysis and synthesis as C externals interfaced to Pure-Data (32 bit and 64 bit). It runs both under Windows and Linux and can use multiple cores. The warped windows are computed using equation (22) and applying an IDFT of that data. The inner products are directly computed with a loop. Also for the synthesizing part, the sum is directly computed. It turned out that this is sufficiently fast for real-time computation and whence we made no use of fast convolution algorithm. Since the data rate of the analysis part is not uniform in time, PD’s signal connections are not suitable to transfer the coefficients. Therefore we use the signal connections to transfer pointers rather than signals.

### 3.2 Interface Details

For simplicity we require that the essential bandwidth B is a multiple of the frequency shift parameter , i.e. . Then, in order to obtain a frame the following conditions must be fulfilled as can be seen easily in Figure 1 [9, eq. (34) to (36)]:

 abK≤1 (23) dqBq≤1

where is the ess. bandwidth of the warped modulated window.

 Bq=θ(qb+B2)−θ(qb−B2) (24)

In the case of an exponential increasing warping map, it makes sense to set the frequency shifting parameter in a way that adjacent notes fall away from the windows main lobe in the frequency domain (If there is such a main lobe, as in the case of the raised cosine window). This on the other hand determines the window length , a minimal value for and the time shift parameter

 a=ThR (25)

with due to equation (23). We set for simplicity. Equations (23) are fulfilled by setting

 b =1aRCb dq ≃1BqCd (26)

where can be chosen inside the PD-patch. together with controls the oversampling, where controls the bandwidth in which the phase is linearised and also influences the oversampling. The are different for each band which results in a non-uniform data rate for each band. We remark that the numbers have to be chosen, such that , where SR is the sampling rate.

The number of bands we need for a given SR is

 qsup=θ−1(SR/2)b. (27)

The assumptions on the warping map ensure that this is a natural number.

Due to the warping, the supports of the windows are in general unbounded. Thus we compute the windows with a zero padded array of length of

( for compute), which is defined as

 Tc=Thθ′infCTc (28)

where can be chosen inside the PD-patch and . Due to the same reason it is indispensable to cut the windows after warping. To define a sensible atom length after warping, we set the atoms to zero after the point from which they were smaller than , with a constant which can be chosen inside the PD-patch. Afterwards the windows are truncated accordingly and with respect to the parameter , which defines the maximal desired window length. It is important that the windows are cut after aligning them to the time origin. The second approach suggested in [9] to obtain windows with finite length by computing only a linearised version of the warped windows, leads to very bad reconstruction.

### 3.3 Computational Costs

#### 3.3.1 Analysis

A rough estimation yields the following. Let denote the length of the -window . It will be clear from the context if denotes seconds or samples. To compute the inner product of that window with the signal one needs many real multiplications and summations. This has to be done every samples. Hence, per sample we have many floating point operations (flops) in average. If the essential bandwidth is not too large, then is proportional to and is proportional to since, linearising around yields and hence

 Bq=θ(qb+Kb2)−θ(qb−Kb2)≃θ′(qb)Kb ⇒dq=1Bq≃1θ′(qb)Kb (29)

Summing up over all windows we get the average number of operations per sample :

 Navg=∑q4Tqdq∼∑q4T0θ′(qb)Kbθ′(qb)1≃4Kθ−1(SR/2)T0 (30)

Where , defined above, denotes the number of bands and depends on . That means the complexity of the analysis part is proportional to , and and .

If there is a lower bound for the ’s, one can choose identical numbers for all bands. However, this increases the computational load tremendously, making real time computation impossible.

The above is an estimation of the average computational cost. In the worst case all inner products of the windows with the signal have to be computed starting at one frame. The number of operations for that frame is

 ∑q4Tq≃4∑qT0θ′(qb) (31)

However, if one processes the audio stream block wise, the worst case cannot arise for all samples in the block at once, because two worst case scenarios have a distance of at least (actually ), the next samples have for sure lower computational cost. Furthermore, the next samples have zero computational cost. Therefore the average costs are a suitable measure for the analysis process.

#### 3.3.2 Synthesis Part

The complexity of the synthesis part is the same as that of the analysis part. Furthermore, in the synthesis part the worst case scenario can be avoided, because only the parts of the next frame buffer have to be computed in real-time, the rest can be computed later. Nevertheless, our given implementation is not optimised in this direction.

#### 3.3.3 Memory Costs

Our algorithm for precomputing the windows needs cells. With slight modifications it only needs . The smallest possible number of cells being needed is . The analysis and synthesis algorithm both need at least a buffer of size , where denotes the window length of the longest windows used in the analysis and synthesis, and audiobuffer the buffer length in which the audio is processed.

The frame elements for our tests below needed between 50 and 400 MB.

## 4 Computational Error

The measured -numbers are the difference between the averaged RMS-amplitude in dB of the input signal and the analysed-synthesised output signal (i.e. negative signal to noise ratio). For comparison: 16 bit quantization (which is CD-quality) has  dB, 8 bit quantization has  dB. The tests were conducted with the PD-patch available at tommsch.com/science.php in real-time over a time of about 20 s with double precision floating point numbers and a sample rate of  kHz. We used the following stationary and non-stationary test signals

• white:White noise

• sine X: A pure sine tone with X Hz

• const: A constant signal

• clicks: Clicks with a spacing of 1 s

• beet: Beethoven - Piano Sonata op 31.2, length 25 s.

• speech: A man counting from one to twenty, length 20 s.

• fire: A firework, length 23 s.

• atom: A sample of sparse synthesised warped Gabor atoms which were also used for that specific test run.

Since our method would lead to perfect reconstruction if the windows were time-shift invariant, the behaviour of the algorithm for stationary signals over a long time is of greatest interest. The constant signal is of interest since it is the one with the lowest possible frequency and our algorithm may bear problems with low frequencies due to the necessary cutting. The clicks represent the other extreme point of signals. The atoms are of interest since they shows whether our algorithm has the ability to reproduce its atoms with high quality.

For the warping map, we used a function which is exponentially increasing between two frequencies and , namely where are constant parameters which can be set inside the PD-patch. For all the tests we set and . Outside of and between the map is linear. The function attains exactly Nyquist frequency, i.e. at . The frequencies , are chosen such, that the resulting map is . See Figure 4 for a plot of .

### 4.1 Raised Cosine Window

As proposed in [9] we first performed tests with a raised cosine window.

 h(t)={√2bRcosπtT−T2≤t≤+T20otherwise (32)

with Fourier-Transform

 ^h(ν)=√b2R(sinc(νT−12)+sinc(νT+12)) (33)

where T is the total duration of the window, is an integer, , , . See [9, Section 4] for an ansatz about how to compute these parameters. This window has a very slow decay in the frequency domain after warping:

 ^~~φ0,q(f)=√dqa^h(θ−1e(f)−qb)≃1log2f (34)

This means that either the warped windows are not bounded anymore (i.e.: ) or that they are discontinuous, which is clearly visible in the warped windows, a plot of one window can be seen in Figure 5. Hence the computation of the warped windows with the IDFT bears numerical errors. Also the test results with this window were suboptimal, yielding an error between  dB and  dB, depending on the chosen parameters and input signal.

Changing the parameter or the parameter had a significant effect on the error. In Figure 6 on can see the influence of . This can be expected, since the essential bandwidth, in which the phase is linearised, depends on that parameter. Since oversampling is directly proportional to the computational load (as computed in  (30)), for real-time applications there is a natural upper bound. provided good results. Too small and too big numbers for both gave worst results.

On the contrary, changing the parameter while preserving the oversampling factor and the parameter had no effect, as long .

The relation between the window length after cutting and the error was nearly proportional to the value of in a certain range, see Figure 8. From this observation one can determine a suitable cutting parameter . Changing the parameter has clearly only an influence on the for low frequencies.

The parameter had no big influence as long it was about was about twice the window length , see Figure 7

The warping map has a very big influence on the error. At points where the map is not smooth the error for these frequencies is order of magnitudes higher. This can be seen in Figures 9 and 10 for the sine with 30 Hz signal. At this point, our used warping map is only . For a warping map the error was again 10 dB higher.

The table in Figure 9 shows selected numerical results with well chosen parameters for the raised cosine window. All values in dB, values above  dB and below  dB are coloured. The number denotes the average computational average complexity (see above).

### 4.2 Gaussian Window

In order to obtain a window with proper decay in the frequency domain after warping, we used a Gaussian window. This window does not overlap-add to one. Hence higher overlap is necessary to minimize the deviation from one. The warped windows were still fast decaying in the time domain, resulting in the possibility to cut them much shorter then the warped raised-cosine windows which compensated the high computational load due to the high overlap.

Our used Gaussian window and its Fourier Transform is

 h(t)=C√b2Re−14t2T2,^h(t)=CT√bRe−t2T2 (35)

where T is the total duration of the window, the overlap factor is an integer, , , and is a constant factor used to approximate the overlap-add to one condition.

This window has a fast enough decay to ensure that the warped windows in the frequency domain still are in and hence their Inverse Fourier transformed (i.e. the warped windows in the time domain) are bounded and continuous. This window leads to significantly better results down to  dB which is the limit when using PD’s single precision numbers. The influence of the parameters on the error was the same as for the raised-cosine window.

The tables in Figure 10 show selected numerical results.

### 4.3 Comparison of these two windows

For the raised cosine a high number of bands must be used to achieve a small error. In Figure 8 the tests of the red dots are conducted using 92 bands, the tests with yellow dots were with 229 bands. Since the windows have a bad decay in the frequency domain, a high essential bandwidth has to be used too, which entails big overlap in time. and hence a very high average computational load, in our example k floating point operations per sample. If one uses similar parameters to the ones used for the Gaussian window in Figure 10, the error is in the range of  dB.

Gaussian windows on the other hand do not overlap add to one and hence are not dual to themselves. Hence we at first used a very high overlap to minimize the deviation from 1, which turned out to be not necessary later. The fast decay in the time domain allows to cut the windows much shorter than the raised cosine window. This decreases the computational load, in our example only 7.3k flops per sample in average. In Figure 8 one can see that for the Gaussian window, with proper parameters, the error is in the range of  dB.

## 5 Conclusions

We have introduced a novel, flexible, easy way to construct frames starting from a classical Gabor frame. Tests show, that even in the painful case, where perfect time realignment of the components is not guaranteed and hence our method does not lead to perfect reconstruction, the error can be made as small as the accuracy of single precision floating point numbers for a wide range of signals. This does not prove that the error is small for all signals, but gives a good estimation of the error to be expected. The transform is working in real time.

We are going to implement maps that can be arbitrarily defined, e.g., by means of interpolation of a selected number of points. We will also implement Gabor Multipliers

[17] in the redressed warped Gabor expansion (partially done already with the external winarym̃issing).

Since we already implemented this method as a Pure Data external, it is ready to use for audio-applications. The whole external explained in detail as well as the source code can be found online at  tommsch.com/science.php and in [18].

## 6 Acknowledgements

Many thanks to the great number of suggestions by the great number of anonymous reviewers on how to improve the paper.

## References

• [1] G. A. Velasco, N. Holighaus, M. Dörfler, and T. Grill, “Constructing an invertible constant-Q transform with non-stationary Gabor frames,” in Proceedings of the Digital Audio Effects Conference (DAFx-11), Paris, France, 2011, pp. 93–99.
• [2] P. Balazs, M. Dörfler, F. Jaillet, N. Holighaus, and G. A. Velasco, “Theory, implementation and applications of nonstationary Gabor Frames,” Journal of Computational and Applied Mathematics, vol. 236, no. 6, pp. 1481–1496, 2011.
• [3] G. Evangelista, M. Dörfler, and E. Matusiak, “Phase vocoders with arbitrary frequency band selection,” in Proceedings of the 9th Sound and Music Computing Conference, Copenhagen, Denmark, 2012, pp. 442–449.
• [4] N. Holighaus and C. Wiesmeyr, “Construction of warped time-frequency representations on nonuniform frequency scales, Part I: Frames,” ArXiv e-prints, Sep. 2014.
• [5] T. Twaroch and F. Hlawatsch, “Modulation and warping operators in joint signal analysis,” in Time-Frequency and Time-Scale Analysis, 1998. Proceedings of the IEEE-SP International Symposium on, Oct. 1998, pp. 9–12.
• [6] R. G. Baraniuk and D. L. Jones, “Warped wavelet bases: unitary equivalence and signal processing,” in Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on, Apr. 1993, vol. 3, pp. 320–323 vol.3.
• [7] C. Braccini and A. Oppenheim, “Unequal bandwidth spectral analysis using digital frequency warping,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 22, no. 4, pp. 236–244, Aug. 1974.
• [8] G. Evangelista, “Warped Frames: dispersive vs. non-dispersive sampling,” in Proceedings of the Sound and Music Computing Conference (SMC-SMAC-2013), Stockholm, Sweden, 2013, pp. 553–560.
• [9] G. Evangelista, “Approximations for Online Computation of Redressed Frequency Warped Vocoders,” in Proceedings of the Digital Audio Effects Conference (DAFx-14), Erlangen, Germany, 2014, pp. 1–7.
• [10] K. Gröchenig, Foundations of Time-Frequency Analysis, Applied and Numerical Harmonic Analysis. Birkhäuser Boston, 2001.
• [11] R. G. Baraniuk and D. L. Jones, “Unitary equivalence : A new twist on signal processing,” IEEE Transactions on Signal Processing, vol. 43, no. 10, pp. 2269–2282, Oct. 1995.
• [12] P. W. Broome, “Discrete orthonormal sequences,” Journal of the ACM, vol. 12, no. 2, pp. 151–168, Apr. 1965.
• [13] L. Knockaert, “On Orthonormal Muntz-Laguerre Filters,” IEEE Transactions on Signal Processing, vol. 49, no. 4, pp. 790–793, Apr. 2001.
• [14] G. Evangelista, “Dyadic Warped Wavelets,” Advances in Imaging and Electron Physics, vol. 117, pp. 73–171, Apr. 2001.
• [15] G. Evangelista and S. Cavaliere, “Frequency Warped Filter Banks and Wavelet Transform: A Discrete-Time Approach Via Laguerre Expansions,” IEEE Transactions on Signal Processing, vol. 46, no. 10, pp. 2638–2650, Oct. 1998.
• [16] G. Evangelista and S. Cavaliere, “Discrete Frequency Warped Wavelets: Theory and Applications,” IEEE Transactions on Signal Processing, vol. 46, no. 4, pp. 874–885, Apr. 1998, special issue on Theory and Applications of Filter Banks and Wavelets.
• [17] Hans G. Feichtinger and Krzysztof Nowak, A First Survey of Gabor Multipliers, pp. 99–128, Birkhäuser Boston, Boston, MA, 2003.
• [18] Thomas Mejstrik, “Real time computation of redressed frequency warped gabor expansion,” Master thesis, University of Music and Performing Arts Vienna, tommsch.com/science.php, 2015.