Minimum-Phase HRTF Modeling of Pinna Spectral Notches using Group Delay Decomposition

11/06/2017 ∙ by Sandeep Reddy C, et al. ∙ University of Surrey Indian Institute of Technology Kanpur 0

Accurate reconstruction of HRTFs is important in the design and development of high quality 3D binaural sound synthesis systems. Conventionally, minimum phase HRTF model development for reconstruction of HRTFs has been limited to minimum phase-pure delay models which ignore the all pass component of the HRTF. In this paper, a novel method for minimum phase HRTF modeling of Pinna Spectral Notches (PSNs) using group delay decomposition is proposed. The proposed minimum phase model captures the PSNs contributed by both the minimum phase and all pass component of the HRTF thus facilitating an accurate reconstruction of the HRTFs for all spatial angles which was hitherto not possible in the minimum phase-pure delay model. The purely minimum phase HRTF components and their corresponding spatial angles are first identified using a Fourier Bessel Series method that ensures a continuous evolution of the PSNs. The minimum phase-pure delay model is used to reconstruct HRTF for these spatial angles. Subsequently, the spatial angles which require both the minimum phase and all pass component of the HRTF are also identified. For these spatial angles, a second order all pass filter is cascaded with the minimum phase-pure delay HRTF model. The all pass filter designed in this work captures the spectral cues contributed by the all pass component of the HRTF. The minimum phase HRTF model proposed in this work is able to reconstruct accurate HRTFs since it is able to capture PSNs with a high degree of resolution. Performance of the proposed minimum phase HRTF model is evaluated by conducting experiments on PSN extraction, cross coherence analysis, and 3d binaural sound synthesis. Both objective and subjective evaluation results obtained using CIPIC, KEMAR, and the MiPS database are used to indicate the significance of the proposed model in 3D binaural sound synthesis.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 7

page 9

page 10

page 12

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Sound interaction with human body varies with its direction of arrival. This interaction can be uniquely characterised by a direction dependent transfer function called as Head Related Transfer Function (HRTF) begault20003 ; blauert . HRTFs are usually measured in an anechoic environment using a Loudspeaker-Microphone setup cipic ; gardner . These HRTFs measured for left and right ears can be utilised for synthesis of spatial sound. Binaural synthesis is a headphone based spatial audio technique utilizing HRTFs of both left and right ear. In binaural synthesis, the HRTF is typically implemented as cascade of a minimum phase component of HRTF and a pure delay called as Minimum phase-pure delay model kulkarni ; stanford . This representation has been traditionally used since it reduces the cost of binaural synthesis. But it is well known that, spectral notches of HRTF, particularly contributed by pinna called as Pinna Spectral Notches (PSN), are considered to be dominant cues in the perception of sound directionality Wright . Additionally, the contribution of these notches come from both minimum phase component and all pass component of HRTF. Therefore, an approximation of all pass component of HRTF using only a pure delay lead to loss in some of the spectral cues responsible for spatial sound perception. 111This paper is a preprint of a paper submitted to IET Signal Processing Journal. If accepted, the copy of record will be available at the IET Digital Library

Spectral notches of HRTFs have been traditionally associated with the nulls of the HRTF magnitude spectrum. They also manifest as transitions in the phase spectrum. Phase transitions are also been contributed by the all pass component of HRTFs. All these transitions in the phase spectrum appear as sharp valleys in the group delay spectrum which is the negative differential of phase spectrum

karan_MM . In raykar , a signal processing technique (LP-GD), employ autocorrelation, windowing, LP-residual and group delay operations on the HRTF to extract the location of the pinna spectral notches. As group delay operates on the phase of HRTF, the notches observed in the group delay spectrum are due to minimum phase component and all pass component. It is important to understand the contributions from each component so that spectral notches which are lost in the minimum phase-pure delay model can be identified and captured by a compensation filter. However earlier perceptual studies indicate that the relevance of these subtle cues (spectral variations of all pass components) are perceptually indistinguishable when binaural synthesis is performed from a fixed direction kulkarni

. But it is important to study the relevance of the sudden variations of all pass phase in continuously rendering binaural audio. In order to address this problem, it is important to understand the phase variations continuously in the 3D space. Fourier Bessel Series can be utilised for studying these phase variations. Fourier Bessel Series were earlier used for interpolating horizontal plane HRTFs

abhayahorz . However spatial variation of all pass phase using FBS for any circular plane has not been explored in an explicit manner. In this study, we attempt to express HRTFs of any circular plane in Fourier Bessel Series. Subsequently HRTFs of any intermediate directions are reconstructed and their phase variations are studied using Group Delay decomposition. This helps us in understanding the evolution of spectral notches continuously and thereby identify the HRTFs and its corresponding directions which are purely minimum phase in nature.

The primary contributions of this work are as follows. A novel minimum phase HRTF model using group delay decomposition is proposed for binaural sound synthesis. In this context, a HRTF group delay decomposition method for pure minimum phase HRTF identification using FBS method is first developed. Subsequently, a second order all pass filter is designed using specifications obtained by the phase variations of all pass component of HRTF. The HRTFs reconstructed by the proposed HRTF model with all pass filter are used for binaural sound synthesis using CIPIC, KEMAR, MiPS databases. Performance improvements are observed in terms of binaural sound perception.

The rest of the paper is organised in the following manner. Pure minimum phase HRTF identification using HRTF group delay decomposition is described in Section 2. A FBS method for identification of pure minimum phase HRTFs is also discussed herein. Subsequently, HRTF reconstruction and spectral notch extraction for CIPIC database is also discussed in this section. Section 3 describes the proposed HRTF model using group delay decomposition and all pass filter for binaural sound synthesis. Design of all pass filter is also discussed in this section. The performance of the proposed minimum phase HRTF model for PSN extraction and binaural sound synthesis over CIPIC, KEMAR and MiPS databases is discussed in Section 4. Section 5 concludes the paper.

2 HRTF Group Delay Decomposition for Identification of Pure Minimum Phase HRTF

Any transfer function can in general be decomposed into its minimum phase component and all pass component. Similarly, an HRTF measured from a direction can also be represented as a cascade of its minimum phase component and pure delay component as,

(1)

where is the minimum phase component of HRTF, and is the all pass component of HRTF. is termed as composite HRTF herein. In principle, the minimum phase component of HRTF is obtained by factoring the transfer function polynomial and identifying zeros and poles which are inside unit circle. But factoring large polynomials is not practical and therefore a non parametrical method of obtaining minimum phase component is used herein FILTERS07 ; FILTERSWEB07 . The flow diagram for computing minimum phase component of HRTF is illustrated in Figure 1.

indicates the Fourier transform operator and

indicates the folding operation which is performed on complex cepstrum using Equations 2 and 3

for even and odd N respectively.

ln(.)

exp(.)

Figure 1: Flow diagram for computing minimum phase component of HRTF
(2)
(3)
Figure 2: Magnitude and phase spectrum of subject 003 in the CIPIC database for angular direction . (a) Magnitude of minimum phase HRTF and (b) Unwrapped phase spectrum of composite HRTF, minimum phase HRTF and all pass component of HRTF

Figure 3: Figure illustrating spectral notches extracted using group delay decomposition. (a) Group delay spectrum of composite HRTF, (b) Group delay spectrum of minimum phase component of HRTF and (c) Group delay spectrum of all pass component of HRTF. HRTF corresponds to Subject 003 of CIPIC database for angular direction

A folding operation is performed to transform the non causal exponentials in the complex cepstrum to causal exponentials. In other words, exponentials due to the zeros and poles lying outside the unit circle are shifted to its conjugate reciprocals which lie inside the unit circle. After obtaining the minimum phase component, all pass component can be obtained as the ratio of composite HRTF and minimum-phase component of HRTF as in Equation 1. Fig. 2(a) illustrates the magnitude spectrum of composite HRTF. Fig. 2(b) illustrates the unwrapped phase spectrums of composite HRTF, minimum phase component of HRTF and all pass component of HRTF for Subject 003 of CIPIC database for angular direction . M1, M2 and M3 in Figure 2(a) indicate the nulls in the magnitude spectrum. These nulls manifested as small variations in the unwrapped phase spectrum of minimum phase component of HRTF which can be observed in Figure 2(b). This relation stems from the fact that the magnitude response and phase of the minimum phase systems are related using a Hilbert operator FILTERS07 ; oppenheim2010discrete . Further in this example, the all pass component also exhibit a phase variation. All these phase variations are manifested in the composite HRTF as the phase spectrum of composite HRTF is the sum of phase of minimum phase and all pass components. However some of the variations are indistinguishable in the phase spectrum. In order to resolve this ambiguity, group delay decomposition of HRTFs is performed, as the group delay spectrum exhibits sharper nulls when compared to the phase spectrum yegnanarayana1992significance ; kumar2014robust . In the rest of the section, group delay decomposition of HRTF and pure minimum Phase HRTF identification is discussed.

2.1 Decomposition of Group Delay of HRTF in to Minimum-phase and All Pass Components

The group delay function is defined as the negative derivative of phase spectrum. The additivity and high resolution property of group delay functions has been studied and applied widely in the domain of speech processing 4032772 . These two properties of group delay functions can be used in the accurate decomposition into minimum phase and all pass components. The group delay of the composite HRTF can be decomposed in to its minimum-phase and all pass components as,

(4)

Figure 4: Figures (a1,a2,a3) illustrate the group delay response of composite HRTF, Figures (b1,b2,b3) illustrates the group delay response of minimum-phase component of HRTF, Figures (c1,c2,c3) illustrates the group delay response of allpass component of HRTF for elevation angles respectively corresponding to Subject 003 of CIPIC database.

where, , , and are the group delay of composite, minimum phase and all pass components of HRTF respectively. Group delay of each individual component can be obtained as oppenheim2010discrete ,

(5)

where, and are the real and imaginary part of Fourier transform of respectively. It may be noted that, is the Head Related Impulse Response (HRIR). and are the real and imaginary part of Fourier transform of respectively. The group delay response of composite, minimum phase and all pass components of HRTF are computed using a LP-GD based method as discussed in raykar . The group delay response of all the three components is illustrated in Fig. 3(a)-(c) for Subject 003 of CIPIC database. Notches M2 and M3 in the group delay response of composite HRTF are clearly due to the nulls in the magnitude spectrum as shown in Fig. 2(a). Notch M1 of group delay response of HRTF is contributed by the phase variation of all pass component that can be seen in Fig. 2(b). The group delay magnitude of the composite HRTF at the notch frequencies can be attributed to the group delay additivity property given in Equation 4. The significance of the group delay decomposition in the identification of pure minimum phase HRTF is discussed in the ensuing section.

2.2 Significance of Group Delay Decomposition in Identification of Pure Minimum-phase HRTF

Group delay spectrum of composite HRTF is an addition of its minimum phase and all pass components as given in Equation 4. However, a composite HRTF is decomposed as the product of its minimum phase and all pass component as given in Equation 1. Hence, spectral notches can be resolved better in the group delay domain because of its additivity property kumar2014robust . Additivity property of the group delay spectrum also helps in resolving the closely spaced spectral notches of the composite HRTFs. In order to illustrate the significance of the group delay spectrum, consider HRTFs for elevation angles respectively of Subject 003 of CIPIC database. The group delay spectrum of the composite HRTF and, its minimum phase and all pass components, for all the three elevation angles are shown in Fig. 4. Three different cases are illustrated in Fig. 4. In the first case shown in Fig. 4 (a3)-(c3) that, spectral notches are contributed only by the minimum phase component and not by the all pass component. The second case can be categorised based on the location of spectral notches in the minimum phase and all pass components. It can be observed from Fig. 4 that, notches N1 and N3 in Fig. 4(a1) and 4

(a2) respectively indicate the spectral notches due to the all pass component of HRTF. Notches N2, N4 and N5 indicate the spectral notches contributed by the minimum phase component of HRTF. HRTFs can therefore be classified into Pure Minimum-phase HRTFs and Minimum phase-All pass HRTFs

2.2.1 Pure Minimum-phase HRTFs

HRTFs of spatial angles which are completely described by spectral notches due to the minimum phase component and not by the all pass component are termed as Pure-Minimum phase HRTFs. HRTFs of these directions can be modelled as a cascade of minimum phase component and a pure delay component. The resultant minimum phase HRTF preserves all spectral notches of the composite HRTF. HRTF exhibiting pure minimum phase behaviour is shown in Fig. 4 (a3)-(c3).

2.2.2 Minimum phase-All pass HRTFs

HRTFs of spatial angles which are completely described by spectral notches by both the minimum phase and all pass components of HRTF are called Minimum phase-All pass HRTFs. A minimum phase-All pass HRTF is shown in Figs. 4(a1)-(c1) and (a2)-(c2). HRTFs of these spatial directions cannot be modelled as a cascade of minimum phase component and pure delay component as the spectral notches of the all pass component cannot be captured by the minimum phase-pure delay model. Therefore it is important to develop a minimum phase HRTF model which can capture the spectral notches of the all pass component. Before introducing this model, identification of pure minimum HRTFs and its corresponding spatial angles using FBS method is discussed.

2.3 Indentification of Pure Minimum-phase HRTFs and Corresponding Spatial Angles using FBS Method

Identification of pure minimum-phase HRTFs and its corresponding spatial angles is an important step in the development of a new minimum phase HRTF model. Identification of pure minimum phase HRTFs of various spatial angles is performed herein, by analytical modelling of composite HRTFs. Fourier Bessel Series (FBS) is one method to model composite HRTFs of a circular plane. FBS method was earlier studied in the horizontal plane HRTF interpolation abhayahorz . However FBS method is used here to reconstruct HRTFs of any arbitrary spatial direction in a particular circular plane. Spectral notches are extracted for the reconstructed HRTFs, and the range of azimuthal and elevation angles for which all pass spectral notches exist are identified. The following sections details the FBS based HRTF modelling and identification of purely minimum phase HRTFs.

2.3.1 FBS based Modelling of Composite HRTF

Consider HRTFs of equally spaced discrete directions on any circle circumscribed over a sphere. Fourier series representation of these HRTFs is given by

(6)

Fourier series coefficients in Equation 6 are frequency dependent. These coefficients capture the resonances and notches present in HRTF. It has been shown in earlier work that spectral component of HRTF have similarities with the Bessel functions abhaya . Utilising this knowledge, is represented using Bessel functions as,

(7)

where, is the Bessel function of first kind and order n. are the positive roots of . Substituting Equation 7 in to Equation 6 gives a combined orthogonal representation of HRTFs as,

(8)

The complex coefficients are found by utilising the orthogonal property of Bessel functions. In Equation 8, for various values of k, Bessel functions are orthogonal functions. The orthogonal property of Bessel functions is given below.

(9)

Further for various values of m, the exponential terms are orthogonal. Therefore applying joint orthogonality property, coefficients can be found as,

(10)

Equation 2.3.1 consists of continuous functions and an integration operator. In practice the integral needs to be computed by discrete approximation of Equation 2.3.1. A discrete approximation of Equation 2.3.1 can be obtained by

(11)

where M and N denote the total number of discretised frequency bins and angular directions respectively. Computation of for all and is practically not possible. Therefore truncation to a finite value is required. This truncation is justified because the magnitude of coefficients is significant for only finite and values as illustrated in Fig. 5. It can be inferred from the Fig. 5 that, for and , magnitude is considerably very less. Using the truncation limits HRTFs can be reconstructed for any arbitrary elevation angles. In this manner HRTFs thus computed from FBS method can further be used to obtain the spectral notches for any arbitrary angle . In this work reconstruction is performed for all the elevation angles of all the 25 interaural circles of CIPIC database.

Figure 5: Amplitude of Fourier Bessel coefficient in the median plane for left pinna of subject 50 in CIPIC Database.

Figure 6: Group delay response of reconstructed HRTF for elevation angles ranging from to in steps of (top to bottom). Spectral notches are indicated using bold dots.

Figure 7: PSNs extracted for reconstructed HRTFs, and their minimum-phase and allpass components for elevation angles ranging from to . Elevation angle step size of is used.

In order to precisely estimate the spectral notches, group delay is computed using an LP-GD approach

raykar . This algorithm is used to obtain the location of spectral notches, and notch depth. In order to distinguish notches a threshold of -0.8dB is used in this work. Fig. 6 illustrates the group delay plot obtained for the HRTFs reconstructed for an elevation angle to in steps of . It can observed clearly that there is a monotonic increase of the depth of notch 1 and notch 5. It may be noted that, the HRTF reconstructed using FBS for any arbitrary elevation angle has a continuous evolution in terms of spectral notches with respect to its preceding and succeeding elevation angles.

Figure 8: Location of the all pass spectral notch in the audio frequency band from 20Hz to 20KHz extracted for the HRTFs of various subjects of CIPIC database.

2.3.2 Identification of Pure Minimum phase HRTFs from Composite HRTFs

FBS based modelling of composite HRTFs can be used to reconstruct composite HRTFs of any elevation angle in an interaural circle. This result of the FBS method indicate that it is possible to observe the spatial dynamics of the notches continuously in the 3D space. Hence it is also possible to observe the spatial dynamics of minimum phase and all pass spectral notches so that purely minimum phase HRTFs can be identified. An example illustrating the spatial dynamics of minimum phase and all pass spectral notches is given in Fig. 7. It can be understood from the first notch of all pass component that, FBS method can exactly identify the angular direction at which all pass spectral notches crosses the threshold. The steps for finding the pure minimum phase HRTFs and corresponding spatial angles is enumerated in Algorithm 1. The inputs to Algorithm 1 are the set of measured HRIRs. The algorithm utilizes the FBS method described in Section 2.3.1 to output and . is the range of all angles for which HRTFs exhibit purely minimum phase behaviour in a particular circular plane, and includes corresponding minimum-phase HRTFs. This algorithm can be repeated for all the azimuthal angles to identify the pure minimum phase HRTFs of all directions of 3D space.

Simulations are performed on HRTFs of CIPIC database across various subjects using proposed algorithm to identify the purely minimum phase HRTFs. Figure 8 indicates the depth of the all pass spectral notch for various directions. The color (Blue) in Figure 8 indicates the regions where HRTFs does not exhibit any spectral notch by the all pass component and therefore considered as purely minimum phase in nature. It can be observed that HRTFs corresponding to roughly azimuthal angles ranging from to and elevation angles ranging from to , exhibit pure minimum phase behaviour.

1:Input: , , Set of measured HRIRs
2:HRTF reconstruction using FBS
3:Extract minimum phase and all pass components and
4:Compute the group delay of all pass HRTFs, say .
5:if  then
6:     
7:     
8:     if  then
9:         
10:         Jump to Step 2
11:     else
12:         STOP
13:     end if
14:end if
15:Output: Set of purely minimum phase HRTFs , and its corresponding directions .
Algorithm 1 Algorithm to identify pure minimum phase HRTFs and its corresponding directions

3 A HRTF Model using Minimum-phase Decomposition for Binaural Sound Synthesis

In spatial directions where HRTFs do not exhibit pure minimum phase behaviour, there is a contribution of spectral notches from the all pass component. Therefore a new minimum phase HRTF model is studied so that the resultant HRTF captures all spectral cues. The minimum phase HRTF model is illustrated in Fig. 9. The model consists of three components that are cascaded in series. The first component, indicates the minimum phase component of composite HRTF. The second component indicates the pure delay, where indicates the onset time of the measured HRIRs. The third component is the second order all pass filter that models all pass component . It must be noted that, for , the all pass components do not exhibit spectral notches and hence the transfer function of APF has no zeros and poles. But for directions where , the transfer function contains zeros and poles. Hence the resultant transfer function of the proposed HRTF model is given by

The design of the all pass filter is discussed in the ensuing section.

Figure 9: Figure illustrates cascading of minimum phase component of HRTF, pure delay, and second order all pass filter.

Figure 10: Figure illustrating the variation of group delay magnitude of composite HRTF, minimum phase component and all pass component of HRTF respectively, and Figure illustrates composite HRTF constructed using second order all pass filter , minimum phase component of , and second order all pass filter respectively obtained using LP-GD approach.

3.1 Design of a Second Order All Pass Filter for Modelling All Pass Component of HRTF

All pass component of any transfer function exhibits a flat magnitude spectrum. The differences in the all pass component of various transfer functions can only be observed in the phase response. In general, a second order all pass transfer function can be expressed as,

(12)
(13)

where are conjugate reciprocal to each other and they correspond to locations of pole and zero respectively. are their corresponding magnitudes and angles. represent frequency location where the all pass filter exhibit a phase variation, and corresponds to sampling frequency. This transfer function exhibit flat magnitude spectrum, however phase and its corresponding group delay of the transfer function depends upon the magnitude and angle of the . The angle is determined as given in Equation 13. Magnitude is determined by the depth of notch that is observed in the group delay of all pass component. The phase and group delay spectrum of an all pass filter is related to its magnitude of the pole as follows.

(14)
(15)
(16)

It can be observed that the relation between the depth of the notch and the magnitude of is not linear. Location of the notches obtained using LP-GD approach are used for finding the angle and . For a fixed , and , the polynomial expression in can be solved to obtain the magnitude of pole . In this manner second order all pole filter is designed.

Experiments are performed using HRTFs of Subject 003 of CIPIC database for angular direction . Figure 10 illustrates the group delay of the composite HRTF, minimum phase component of HRTF, and all pass component of HRTF obtained using LP-GD approach. It can be noted that, the all pass component of HRTF consists of a spectral notch at frequency . As group delays of minimum phase and all pass components are additive, the spectral notch contributed by the all pass component is manifested at the same location in the composite HRTF. The HRTFs generated using the minimum phase-pure delay model does not capture the contribution of these notches that can be noted from Figure . Figure 10 indicates the group delay of second order all pass filter using obtained using LP-GD approach. Figure 10 indicate the group delay of the minimum phase component of HRTF. Figure 10 indicates the group delay of the final composite HRTF obtained using the proposed method. It can be noted that, all spectral notches of minimum phase or all pass components are well captured.

3.2 Significance of minimum phase HRTFs in binaural sound synthesis

Binaural synthesis is usually performed by filtering any monophonic sound with Left and Right HRTFs. It is important to understand the advantages of using minimum phase HRTFs as compared to composite HRTFs in the binaural sound synthesis. Minimum phase systems reduces the cost of binaural synthesis by shortening the length of FIR filters and reducing the components required in the linear decomposition methods. Hence for the directions whose HRTFs exhibit purely minimum phase nature, the cost of binaural synthesis will be less as they can be completely represented in minimum phase components. For other directions a second order all pass filter adds additional cost but comes at an advantage of capturing important spectral cues. So instead of approximating all the HRTFs using minimum phase component and pure delay, in this work we selectively identify the directions for minimum phase representation thereby reducing the complexity and also capturing the important cues that are relevant for spatial sound perception. Considering these advantages the HRTFs obtained using the proposed method are used in the binaural sound synthesis. The steps involved in obtaining the proposed HRTFs and the binaural sound synthesised using these HRTFs are enumerated in Algorithm 2. The ensuing section discusses the performance of the proposed HRTFs as compared to HRTFs obtained using minimum phase pure delay model.

1:Input: , , purely minimum-phase HRTFs , and its corresponding spatial angles , and monophonic sound .
2:Extract minimum phase and all pass components and
3:Compute the time delay using the onset time of HRIR.
4:Compute the zeros , and poles of second order all pass filter using all pass spectral notch specifications.
5:Using the zeros and poles obtain the second order all pass transfer function
6:if ,  then
7:     
8:else
9:     
10:end if
11:Compute the left and right minimum phase HRTFs and and its corresponding HRIRs and using Steps 2 to 10.
12:,  
13:Play the left and right channel output through headphones.
14:Output: HRTFs obtained using the proposed method , and , and Binaural sound.
Algorithm 2 Algorithm for Binaural Sound Synthesis using the proposed HRTF model

Figure 11: Variation of PSN frequencies with elevation angle for measured HRTF (a1,a2), proposed M-HRTF model HRTF (b1,b2), Minimum phase component of HRTF (c1,c2), and all pass component of HRTF (d1,d2) corresponding to Subjects (163, 119) respectively of CIPIC database

Figure 12: Pinna contours obtained from the PSNs extracted for measured HRIR (a1,a2), reconstructed HRIR using proposed method (b1,b2), minimum phase component of reconstructed HRIR (c1,c2), all pass component of reconstructed HRIR (d1,d2) are mapped to the pinna images available on CIPIC database for subjects 163, and 119 respectively.

4 Performance Evaluation

Performance evaluation of the proposed HRTF model is evaluated by extracting and investigating the quality of PSNs obtained from the HRIR. Coherence analysis between HRIR computed from the proposed model and the ground truth HRIR is performed. Finally binaural audio is synthesised using the HRIRs obtained herein. Subjective evaluations of the rendered binaural audio is performed and compared with HRIRs obtained from CIPIC, KEMAR and MiPS databases.

4.1 Databases Used

4.1.1 CIPIC Database

CIPIC database cipic contains HRIRs measured by following interaural polar coordinate system. CIPIC contains total of 1250 HRIRs with 25 lateral angles ranging from to and 50 elevation angles ranging from to . There are 50 HRIRs in both horizontal and median plane. Each HRIR is of 200 sample length and measured at a sampling frequency of 44100 Hz. In addition to HRIRs, CIPIC also includes left and right pinna images of subjects for whom the HRIR measurement has been performed.

4.1.2 KEMAR Database

KEMAR database gardner contains HRIRs measured by following vertical polar coordinate system . KEMAR contains HRIRs for elevations angles ranging from to with an angular increment of , and for azimuthal angles ranging from to with an angular increment of . KEMAR contains 28 HRIRs in median plane, and 72 HRIRs in the horizontal plane. Each HRIR is of 512 sample length and measured at a sampling frequency of 44100 Hz.

4.1.3 MiPS Database

MiPS database contains HRIRs measured for the sampling points chosen by combining the interaural and vertical polar coordinate as discussed in bionic . HRIRs are measured for every resolution in the vertical circles. MiPS database contains 36 HRIRs in both horizontal and median plane. Each HRIR is of 1024 sample length and measured at a sampling frequency of 44100 Hz.

4.2 Experiments on Median Plane PSN Extraction

As the proposed HRTF model is aimed at preserving the important spectral cues, pinna spectral notches are extracted for the HRTFs obtained from the proposed model and compared it with the PSNs of measured HRTFs and the PSNs of minimum phase-pure delay model. Further we also illustrate the spectral notches of the all pass components in order to illustrate the spectral notches that are lost due to the minimum phase approximation of HRTFs. In this work, PSNs are extracted using LP-GD method with a threshold for notch depth equal to -0.8. PSNs extracted for composite HRTFs (a1,a2), reconstructed HRTFs using proposed method (b1,b2), minimum phase component of composite HRTF (c1,c2), and all pass component of composite HRTF (d1,d2) in the median plane for elevations to are illustrated in Fig. 11. As compared to Fig. 11 (a1) and (a2), the spectral notches are smoother in Fig. 11 (b1) and (b2). This can be attributed to the FBS method of HRTF reconstruction, since this method facilitates continuous evolution of spectral notches. PSNs are observed to be discontinuous in both Fig. 11(c1),(c2) and (d1),(d2). This indicates that both minimum phase and all pass components do not exhibit well defined spectral notches for all elevation angles. Additionally it must be noted that, not all the directions in the median plane are pure minimum phase. PSNs extracted for the proposed HRTFs are smoother and evolve continuously. This smooth evolution observed in the PSN can be noted when PSNs are mapped to pinna images.

4.2.1 Mapping Pinna Spectral Notches to Pinna Images

The relation between PSNs and contours of the pinna has been studied using a two ray reflection model raykar ; PRTF . A similar mapping is performed for the composite HRTFs, reconstructed HRTFs using proposed method, minimum phase and all pass components of composite HRTFs. It can be observed from Fig. 12 that, notch distances vary smoothly for the reconstructed HRTFs as compared to distances obtained for composite HRTFs. Additionally in the proposed method, notch distances can be obtained for all the elevation angles in a seamless manner. An important observation in the mappings of minimum phase and all pass components is that, some of the spectral notches are unique to either minimum phase or all pass components. For example in Fig. 12(c1) some of the PSN mappings are missing in the middle of concha wall, where as in Fig. 12(d1), those PSN mappings can be observed. This indicates that, in some regions spectral notches contributed by the all pass components are directly related to the sound reflections from the concha wall. On the other hand the PSN mappings exhibited at the top of concha wall in Fig. 12(c1), is absent in Fig. 12(d1). These regions exhibit purely minimum phase behaviour of HRTFs. A similar observation can be made for Subject 163 of CIPIC database in Figs. 12(a2)-(d2). This indicates that ignoring all pass spectral notches would prune some of the sound reflections from the the concha wall for directions which do not exhibit purely minimum phase behaviour.

Figure 13: Normalized Cross Coherence (NCC) between HRIR for various elevation and azimuthal angles of CIPIC database. NCC between composite HRIR and HRIR obtained using Min-PD model is indicated in Blue. NCC between composite HRIR and HRIR obtained using proposed M-HRTF model is indicated in Red.

4.3 Cross Coherence Analysis

Minimum phase-pure delay model (Min-PD) ignores the phase contributed by the all pass component. The proposed HRTF model (M-HRTF) captures the contribution of both the pure minimum phase and minimum phase-all pass behaviour. In order to evaluate both these models, an objective measure called Normalised Cross Coherence (NCC) stanford is used herein. The expression for NCC is given by

(17)
(18)

where, , , indicate the ground truth HRIR, HRIR obtained from M-HRTF model, HRIR obtained from Min-PD model respectively. However there exists an onset time between minimum phase and composite HRIRs. Therefore the point at which there is a maximum coherence can be used as a measure to determine correlation between composite HRIRs and the HRIRs reconstructed from different models. HRIR cross coherence computed for all the directions of CIPIC database are illustrated in Fig. 13. It can be observed that, for elevation angles ranging from to the cross coherence is same for the proposed M-HRTF model and the Min-PD model. It is so because, these directions exhibit purely minimum phase behaviour. For the remaining directions we can observe a higher cross coherence for the M-HRTF model when compared to Min-PD model for most of the directions. Difference in the cross coherence is used as a measure to identify the model which exhibits higher coherence. This difference in normalised cross coherence can be defined as,

(19)

indicate the difference in the cross coherence obtained by M-HRTF and Min-PD model. A single dimensional K-means clustering is used to segregate the cross coherence values in to three categories. In Fig.

14, the first category as shown in color (Yellow) indicate HRIRs of these directions exhibit higher cross coherence for M-HRTF model. The second category as shown in color (Blue) indicate HRIRs of these directions exhibit the higher coherence for Min-PD model, and the third category as shown in color (green) indicate the nearly similar cross coherence with respect to HRIRs of both models. It can be observed that HRIRs obtained using M-HRTF model has higher coherence for a larger number of directions as compared to HRIRs obtained from Min-PD model.

Figure 14: Variation of for different elevation and azimuthal angles. Yellow regions indicate that the proposed M-HRTF model yields high cross coherence for majority of azimuthal and elevation angles.

4.4 Experiments on Binaural Sound Synthesis

HRTFs obtained from the proposed method are utilised for binaural sound synthesis. As median plane HRTFs are highly person specific, binaural sound synthesis is performed only in the horizontal plane. FBS method is used to reconstruct HRTFs for any angle in the 3D space. HRTFs are obtained by proposed method and Min-PD model for azimuthal angles with a resolution of . Measured HRTFs of CIPIC, KEMAR, and MiPS databases in the horizontal plane are also used for evaluations. A monaural sound signal is spatialised using HRTFs obtained from various methods. Binaural rendering is performed by computing block convolution, where, successive blocks of monophonic sound are convolved with HRIRs of successive directions in the 3D space. Subsequently these filtered sounds are concatenated using the overlap save method oppenheim2010discrete . The filtered signal using left and right HRIRs are played through headphones. The synthesised binaural audio is evaluated subjectively using Mean Opinion Scores (MOS). Fifteen subjects are selected in the evaluation. Following attributes are used for subjective evaluations sub_eval_3 ; SAQI .

  • Naturalness (N): How true to life the audio listening was?

  • Presence (Ps): Presence in audio source environment.

  • Preference (Pf): Degree of pleasantness or harshness.

  • Perception of Motion (PoM): The precision and correctness with which the trajectory of the source is perceived.

The corresponding MOS obtained for spatial sound are enumerated in Table 1. The difference between the MOS of PoM attribute across various methods is very low. But measured HRTFs and proposed method HRTFs has slightly better MOS when compared to MOS obtained for Min-PD HRTFs. The MOS of preference attribute is found to have higher value and is mostly same for all the methods. For naturalness attribute, measured HRTFs seem to provide better quality of rendered sound. In the presence attribute, measured HRTFs followed by proposed method HRTFs seem to have higher scores. Across all the attributes measured HRTF and proposed method HRTFs has higher scores when compared to the HRTFs obtained by the Min-PD model.

N Ps Pf PoM
CIPIC-Measured 4.1 4.2 4.2 4
CIPIC-Min-PD 3.8 3.8 4.2 3.5
CIPIC-Proposed 3.8 4.1 4.1 3.9
KEMAR-Measured 3.8 4 4.2 3.9
KEMAR-Min-PD 3.6 3.8 4.2 3.5
KEMAR-Proposed 3.6 4 4 3.8
MiPS-Measured 3.5 4.2 4.1 3.7
MiPS-Min-PD 3.5 3.7 4 3.3
MiPS-Proposed 3.5 3.7 4.1 3.7
Table 1: Mean Opinion Scores obtained for binaural synthesis in Horizontal plane

5 Conclusion

A new HRTF model is developed in this paper for accurate reconstruction of HRTFs. A novel group delay decomposition method is used in developing the proposed model which captures the pinna spectral notches due to both the minimum phase and all pass component of the HRTF. Selectively including the all pass filter for specific angles in to the proposed model improved the accuracy of capturing the HRTF phase response. Additional advantages of the proposed HRTF reconstruction model include a PSN extraction with a high degree of resolution and reduced cost of the binaural sound synthesis without compromising the important spectral cues. Future work will investigate the computational complexity of the proposed HRTF model in binaural sound synthesis applications. Development of real time binaural sound systems using the proposed HRTF model will also be investigated in future.

6 Acknowledgments

This work was funded in part by TCS Research Scholarship Program under project number TCS/CS/2011191E and in part by SERB, Dept. of Science and Technology, GoI under project number SERB/EE/2017242.

References

  • [1] Begault D. R. and Trejo L. J.: ’3-d sound for virtual reality and multimedia’, 2000.
  • [2] Blauert, J.: ’Spatial hearing: the psychophysics of human sound localization’, Cambridge, Mass. MIT Press, 1997. [Online]. Available: http://opac.inria.fr/record=b1126831.
  • [3] Algazi, V., Duda, R., Thompson, D., and Avendano, C.: ’The cipic hrtf database’, in IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, 2001, pp. 99-102.
  • [4] Jin, C. T., Guillon, P., Epain,N., Zolfaghari, R., Schaik, A.V., Tew, A. I., Hetherington, C., and Thorpe, J.: ’ Creating the sydney york morphological and acoustic recordings of ears database’, in IEEE Transactions on Multimedia, vol. 16, no. 1, pp. 37?46, Jan 2014.
  • [5] Kulkarni, A., Isabelle, S., and Colburn, H.: ’On the minimum-phase approximation of head-related transfer functions’ , in IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1995, pp. 84-87.
  • [6] Nam, J., Kolar, M., and Abel, J.: ’On the minimum-phase nature of head-related transfer functions’, in the 125th Audio Engineering Society Convention, AES. San Francisco, CA, USA: AES, 2008. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=14698.
  • [7] Wright, D., Hebrank, J. H., and Wilson, B.: ’Pinna reflections as cues for localization’, The Journal of the Acoustical Society of America, vol. 56, no. 3, 1974.
  • [8] Nathwani, K., Pandit, P., and Hegde, R. M.: ’Group delay based methods for speaker segregation and its application in multimedia information retrieval’, IEEE Transactions on Multimedia, vol. 15, no. 6, pp. 1326-1339, Oct 2013.
  • [9] Raykar, V. C., Duraiswami, R., and Yegnanarayana, B.: ’Extracting the frequencies of the pinna spectral notches in measured head related impulse responses’, The Journal of the Acoustical Society of America, vol. 118, no. 1, 2005.
  • [10] Abhayapala, T. D., Kennedy, R. A. and Zhang, W.: ’Horizontal plane hrtf reproduction using continuous fourier-bessel functions’, in Audio Engineering Society Conference: 31st International Conference: New Directions in High Resolution Audio, Jun 2007. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=13969
  • [11] Smith, J. O.: ’Introduction to Digital Filters with Audio Applications’, W3K Publishing, 2007. [Online]. Available: http://www.w3k.org/books/
  • [12] Smith, J. O.: ’Introduction to digital filters with audio applications’ , Available [Online]: http://ccrma.stanford.edu/ jos/filters.
  • [13] Oppenheim A. V., and Schafer, R. W., ’Discrete-time signal processing’, Pearson Higher Education, 2010.
  • [14] Yegnanarayana, B., and Murthy, H.A.: ’Significance of group delay functions in spectrum estimation’, IEEE Transactionson signal processing, vol. 40, no. 9, pp. 2281-2289, 1992. [Online]. Available: http://dx.doi.org/10.1007/s12046-011-0045-1
  • [15] Kumar, L., Tripathy, A., and Hegde, R. M.: ’Robust multi-source localization over planar arrays using music-group delay spectrum’, IEEE Transactions on Signal Processing, vol. 62, no. 17, pp. 4627-4636, 2014.
  • [16] Hegde, R. M., Murthy, H. A., and Gadde, V. R. R. ’Significance of the modified group delay feature in speech recognition’, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 190-202, Jan 2007.
  • [17] Zhang, W., Abhayapala, T. D., Kennedy, R. A., and Duraiswami, R. : ’Insights into head-related transfer function: Spatial dimensionality and continuous representation’, The Journal of the Acoustical Society of America, vol. 127, no. 4, 2010.
  • [18] Gardner, B. Martin K. et al.: ’Hrtf measurements of a kemar dummy-head microphone’, Massachusetts Institute of Technology, vol. 280, no. 280, pp. 1-7, 1994.
  • [19] Reddy S. and Hegde, R. M.: ’Design and development of bionic ears for rendering binaural audio’, in 2016 International Conference on Signal Processing and Communications (SPCOM), 2016.
  • [20] Spagnol, S., Geronazzo, M., and Avanzini, F., ’Fitting pinna-related transfer functions to anthropometry for binaural sound rendering’, in IEEE International Workshop on Multimedia Signal Processing (MMSP), Oct 2010, pp. 194-199.
  • [21] Guastavino C., and Katz, B. F. G.: ’Perceptual evaluation of multi-dimensional spatial audio reproduction’, The Journal of the Acoustical Society of America, vol. 116, no. 2, 2004.
  • [22] Lindau, A., Erbes, V., Lepa, S., Maempel, H.J., Brinkman, F., and Weinzierl, S.: ’A spatial audio quality inventory (saqi)’, Acta Acustica united with Acustica, vol. 100, no. 5, 2014.