Interaural Coherence Across Frequency Channels Accounts for Binaural Detection in Complex Maskers

by Bernhard Eurich, et al.
University of Oldenburg

Differences in interaural phase configuration between a target and a masker can lead to substantial binaural unmasking. This effect is reduced for masking noise with an interaural time difference (ITD). Adding a second noise with the opposite ITD further reduces binaural unmasking. Thus far, simulating the detection thresholds has required both a mechanism for internal ITD compensation and an increased binaural processing bandwidth. An alternative explanation for the reduction is that unmasking is impaired by the lower interaural coherence in off-frequency regions caused by the second masker (Marquardt and McAlpine 2009, JASA pp. EL177 - EL182). Based on this hypothesis, the current work proposes a quantitative multi-channel model using monaurally derived peripheral filter bandwidths and an across-channel incoherence interference mechanism. This mechanism differs from wider filters in that it has no effect when the masker coherence is constant across frequency bands. Combined with a monaural energy discrimination pathway, the model predicts the differences between single- and double-delayed noise, as well as four other data sets. It can help resolve the inconsistency that simulating some data sets requires wide filters while others require narrow filters.




1 Introduction

The detection of a pure tone in noise is facilitated by differences in the interaural phase between tone and noise (hirsh1948). The improvement in the detection threshold compared to the diotic case is referred to as the binaural masking level difference (BMLD). The maximum BMLD is observed when detecting an antiphasic pure tone target (Sπ) in an in-phase noise masker (N0). Adding an interaural time difference (ITD = ) to the masker has been observed to reduce the BMLD (langford1964). Explanations of the underlying interaural mechanisms have referred to the normalized cross-correlation function. For two complex-valued signals and of the form and , where is the instantaneous amplitude and the instantaneous angular frequency, it is defined as


and denote the energies of the signals, i.e., , and the asterisk the complex conjugate. The real part is the real-valued correlation function which is also that which (1) delivers for real-valued signals, such as sound pressure. The values at are referred to as correlation coefficients and written in the short forms and , respectively. Models evaluating the real-valued correlation coefficient explain BMLD as a function of (rabiner1966) and partial correlation obtained by mixing correlated and uncorrelated noise (robinson1963) well, but only for the special case where , i.e. if the real-valued correlation function has a (main or side) peak at .

The modulus can be described as the envelope of and provides a measure for the phase consistency between the two signals and . We refer to it as the temporal coherence function and at to the interaural coherence (see encke2021 for an in-depth discussion and visualization).

If , i.e. auto- instead of cross-correlation, is determined by the signal bandwidth (Wiener-Khinchin theorem, khintchine1934). In the auditory context, the signal bandwidth is the bandwidth after peripheral filtering. langford1964, rabiner1966 and encke2021 have shown that the ITD dependence of the BMLD can to a large extent be explained based on , if a basilar membrane mimicking filter with a bandwidth according to glasberg1990 has been applied to the signal.
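The link between bandwidth and coherence can be illustrated numerically. The sketch below assumes an idealized rectangular power spectrum (not the Gammatone shape used in the model) and evaluates the normalized correlation of a noise with its delayed copy as the normalized inverse Fourier transform of that spectrum. The function name and the example bandwidths of 79 Hz and 130 Hz around 500 Hz are chosen for illustration only.

```python
import numpy as np

def coherence_rect_band(tau_s, fc_hz, bw_hz, n=20001):
    """Complex correlation coefficient between a noise and its copy delayed by tau_s,
    assuming a flat power spectrum of width bw_hz centred at fc_hz (an idealization)."""
    f = np.linspace(fc_hz - bw_hz / 2, fc_hz + bw_hz / 2, n)
    # Wiener-Khinchin: the correlation function is the inverse Fourier
    # transform of the power spectrum; the mean normalizes by the band energy.
    return np.mean(np.exp(2j * np.pi * f * tau_s))

tau = np.array([0.5e-3, 1e-3, 2e-3, 4e-3])  # delays in seconds

# Numerical coherence (modulus) for a narrow and a wide band
narrow = np.array([abs(coherence_rect_band(t, 500.0, 79.0)) for t in tau])
wide = np.array([abs(coherence_rect_band(t, 500.0, 130.0)) for t in tau])

# Closed form for a rectangular band: |sinc(bw * tau)|, with np.sinc(x) = sin(pi x)/(pi x)
narrow_analytic = np.abs(np.sinc(79.0 * tau))
```

Under this assumption the wider band loses coherence faster with increasing delay, which is the dependence that bandwidth estimates from delayed-noise thresholds rely on.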

Other models rely on the hypothesis of jeffress1948 stating that the binaural system is able to compensate for the interaural masker delay by internal delays, referred to as delay lines (vanderheijden1999; stern1996; bernstein2018; bernstein2020). That is, the brain has access to the real-valued cross-correlation function over a considerable range of latency differences. To account for the decrease in BMLD with increasing , delay-line-based models reasonably assume a decreasing accuracy of the delay compensation, described by a function (e.g. colburn1977).

vanderheijden1999 introduced a new stimulus, aiming to isolate the additional unmasking facilitated by putative delay lines in addition to any unmasking resulting from a possible partial correlation at . First, they measured detection thresholds for a pure tone masked by broadband noise as a function of the ITD (single-delayed noise, SDN; rectangles in Figure 1d). This condition is equivalent to langford1964 (their Figures 1 and 2) and the corresponding BMLDs are very similar. They further generated a stimulus called double-delayed noise (DDN; diamonds in Figure 1d) by adding two noises, one with a positive and one with a negative ITD. For any given the conventional SDN and DDN have the same correlation coefficient. The DDN, however, limits the usefulness of the putative delay lines as internal delays can only compensate for the ITD of one noise.

vanderheijden1999 therefore used the decline of BMLD with in DDN to derive the binaural processing bandwidth. As outlined above, the two quantities are connected by the Wiener-Khinchin theorem. The term "binaural processing bandwidth" is used to leave open the possibility that this bandwidth differs from the respective basilar membrane filter characteristics, e.g., by some form of spectral integration in the binaural pathway. In contrast to previous attempts using SDN for processing bandwidth estimates (langford1964; rabiner1966), vanderheijden1999 could be sure that their calculation was not confounded by a more gradual BMLD decline caused by delay lines. That is, they prevented a potential underestimation of the processing bandwidth. In a final step, vanderheijden1999 attributed the larger BMLD in SDN, compared to DDN, to the correlation increase caused by the delay line (as illustrated by the dashed arrow in Figure 1d). To fit their model to the data, they (1) adjusted the filter bandwidth to account for the DDN thresholds and then (2) derived a function so that the predetermined filter bandwidth plus delay line accounted for the SDN thresholds. The peripheral filter that best fit their model to the data had an equivalent rectangular bandwidth (ERB) of to at . This bandwidth range is consistent with estimates from binaural notched-noise experiments (e.g. sondhi1966, kolarik2010). It does not, however, agree with SDN data fitted for different masker bandwidths by bernstein2020. There, despite the very similar method, an ERB of "not greater than " (for the filter centered on-frequency, i.e. at ) was required. This narrow processing bandwidth is in agreement with processing bandwidths obtained with SDN and without considering delay lines, which are (rabiner1966; langford1964; dietz2021). For comparison, “standard”, monaurally obtained ERBs at center frequency are (glasberg1990).

Figure 1: Panel a) Cross-correlogram of SDN with = . White and black areas represent maxima and minima of the cross-correlation functions, respectively. The white box highlights the frequency channel while the gray box highlights a channel centered at . b) Interaural cross-correlogram as in a) but for DDN. c) Continuous lines: Normalized Cross-power spectral density (CPSD) at  ms as a function of frequency, , as derived in (13) et seq.; Bars: Interaural coherence of the signals after peripheral Gammatone filtering. d) Thresholds of detection in SDN and DDN as a function of interaural delay from vanderheijden1999. The dashed lines symbolize the coherence-decline induced threshold increase determined by a processing bandwidth of ERB = (lower line) and for an ERB = (upper line). As denoted by the arrows, the data can be explained in two ways: (1: continuous arrow) The DDN thresholds are determined by the cross-correlation function at 500 Hz and a processing bandwidth . A delay line causes the lower SDN thresholds. (2: dotted arrow) The SDN thresholds are determined by the ITD-dependent coherence as derived from an ERB of . Off-frequency incoherence in DDN causes higher DDN thresholds.

marquardt2009 pointed out that the thresholds obtained with spectrally complex maskers, particularly with DDN, are not necessarily caused by a large peripheral filter bandwidth. They noted that the interaural coherence of SDN is fairly constant across frequency bands (see Figure 1a,c) while the coherence in DDN is spectrally modulated (Figure 1b,c). Its spectral shape is derived in the appendix. Depending on , the off-frequency coherence can thus be lower in DDN than in SDN while the on-frequency coherence is identical in both stimulus types. marquardt2009 hypothesized that "reduced interaural coherence appears detrimental to binaural detection, even outside the target’s frequency channel". If this is true, both SDN and DDN thresholds can potentially be explained using the same standard filter bandwidth with the lower off-frequency coherence elevating the corresponding DDN thresholds. Such a standard-filter-plus-off-frequency-impact concept implies that the gradual unmasking decline in SDN does not have to originate from delay lines because it is already explained by the coherence that is defined by the narrow filtered signal.
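The contrast between the two maskers can be sketched in a few lines. The example assumes that DDN is the sum of two independent, equal-power noises with ITDs of +tau and -tau and that the spectrum is flat, so that the normalized cross-power spectral density is evaluated pointwise in frequency before any peripheral filtering. The function names are illustrative and not taken from the source.

```python
import numpy as np

def sdn_normalized_cpsd(f_hz, tau_s):
    """Single-delayed noise: one noise carrying an ITD of tau_s."""
    return np.exp(2j * np.pi * f_hz * tau_s)

def ddn_normalized_cpsd(f_hz, tau_s):
    """Double-delayed noise: two independent, equal-power noises
    with ITDs of +tau_s and -tau_s (assumed composition)."""
    plus = np.exp(2j * np.pi * f_hz * tau_s)
    minus = np.exp(-2j * np.pi * f_hz * tau_s)
    return 0.5 * (plus + minus)  # equals cos(2*pi*f*tau): real-valued, modulated across frequency

f = np.array([250.0, 500.0, 750.0])  # analysis frequencies in Hz
tau = 1e-3                           # masker ITD in seconds

sdn_coherence = np.abs(sdn_normalized_cpsd(f, tau))  # modulus stays at 1 for all frequencies
ddn_coherence = np.abs(ddn_normalized_cpsd(f, tau))  # |cos(2*pi*f*tau)|
```

With an ITD of 1 ms, both stimuli are fully coherent at 500 Hz in this idealization, while DDN loses its coherence towards 250 Hz and 750 Hz. This is the off-frequency difference the hypothesis builds on.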

The concept of a detrimental incoherence interference may also help in reducing inconsistencies in estimating the processing bandwidth from other binaural detection experiments, e.g., band-widening versus notched-noise (nitschmann2013). This concept may enable models to account for the different stimulus types with a fixed standard filter bandwidth and without delay lines. Thus far, however, the hypothesis has only been tested using a very simplistic model. The simulations from marquardt2009 with an target lack adequate precision and there are no simulations for the target.

The goal of the present study was a model based on the standard-filter-plus-off-frequency-impact concept that can account for the critical vanderheijden1999 data, with an accuracy comparable to the wider-filter-plus-delay-line model by vanderheijden1999. This would resolve the present contradiction that filters need to be narrow to account for some and broad to account for other data.

The proposed model follows the approach of encke2021 that the coherence is responsible for the binaural system’s unmasking capabilities. A low coherence expresses itself in fluctuations in the interaural phase difference (IPD). The hypothesized across-channel incoherence interference is implemented in analogy to different currents in a bundle of electric cables: An alternating, i.e. fluctuating, current causes electromagnetic interference, while direct current does not. In terms of IPD fluctuations, this means that only frequency channels carrying less-coherent signals, i.e. stronger IPD fluctuations, affect the on-frequency channel. We evaluated our model with data from five different studies in three groups:

(1) vanderheijden1999 combined all key aspects required to revisit marquardt2009’s hypothesis: (a) The SDN thresholds are to be determined by the decay of with a 79 Hz-wide Gammatone filter. (b) The higher DDN thresholds are to be simulated by an across-channel incoherence interference but with the same 79 Hz on-frequency filter. The most important data points for judging this are at those ITDs where the on-frequency coherence of SDN and DDN is the same, but thresholds differ. For detection this is the case at = and , for detection at = and .

(2) marquardt2009 not only presented the above-mentioned hypothesis but also detection thresholds with SDN and DDN maskers that are spectrally surrounded by constant-IPD bands that either do or do not cause interaural incoherence at the transitions. Their reported differences impose a challenge for single-channel models that use a constant filter bandwidth.

(3) sondhi1966, holube1998, and kolarik2010 reported detection thresholds of an tone centered in an in-phase noise that is spectrally surrounded by antiphasic noise. These simulations are included for an additional discussion of the proposed narrow-filter-plus-channel-interaction concept, since larger binaural processing bandwidths have previously been derived from such data.

2 Description of the Model

Figure 2 shows the processing stages of the proposed model. It is designed as a multi-channel model through all stages, but the stages were implemented and tailored here to predict binaural-detection data with a pure-tone target. Briefly, the model builds on the analytical single-channel model approach of encke2021 and adds an across-channel interaction mechanism. It consists of a multi-channel binaural processing pathway and a monaural pathway. Both pathways compare multiple tokens of the processed signal representation of the condition-specific masker to the representation of signal plus masker. This comparison has been suggested to mimic a subject’s strategy of comparing a stimulus to a learned reference template (dau1996; dau1997b; jepsen2008; breebaart2001; bernstein2017a). Based on these comparisons, both pathways deliver a sensitivity index (d'). An optimal combination of the pathways’ estimates gives the overall estimate of the model (biberger2016).

Figure 2: Processing stages of the proposed model. See main text for details.

2.1 Peripheral Processing

The left and right input signals were processed with a fourth-order Gammatone filterbank that represents basilar-membrane bandpass filtering. The filterbank implementation by hohmann2002 was employed with a spacing of five filters per ERB in the range of to . The grid was defined by centering one filter at . This filter had an ERB of (glasberg1990) and was indexed with .
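The filter grid described above can be sketched with the ERB formulas of glasberg1990. This is an assumption-laden illustration, not the hohmann2002 implementation: the function names are hypothetical, and the frequency limits `f_lo` and `f_hi` are placeholder values because the range is not recoverable from the text.

```python
import numpy as np

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth after Glasberg and Moore (1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_number(f_hz):
    """Frequency in Hz mapped to the ERB-number scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(f_ref=500.0, f_lo=200.0, f_hi=1200.0, per_erb=5):
    """Grid with per_erb filters per ERB, one filter centred at f_ref (index 0).
    f_lo and f_hi are placeholders, not the limits used in the study."""
    e_ref = erb_number(f_ref)
    k_lo = int(np.ceil((erb_number(f_lo) - e_ref) * per_erb))
    k_hi = int(np.floor((erb_number(f_hi) - e_ref) * per_erb))
    k = np.arange(k_lo, k_hi + 1)            # channel index relative to the reference
    return k, erb_number_to_hz(e_ref + k / per_erb)

k, fc = centre_frequencies()
on_frequency_erb = erb_hz(500.0)             # roughly 79 Hz
```

The on-frequency value reproduces the 79 Hz bandwidth quoted for the 500 Hz channel.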

In order to focus on the influence of the above-mentioned key elements, the present implementation did not include any other peripheral processing such as low-pass filtering, power-law compression, or half-wave rectification.

2.2 Binaural Pathway

Right at the outset the correlation coefficient was derived from the analytical, i.e. complex-valued left and right signals and in the frequency channel , provided by the Gammatone filterbank:


where marks the temporal mean. This is a more efficient implementation of the definition in (1). The complex-valued correlation coefficient was used because it conveniently combines information about both the mean IPD as and about IPD fluctuations, i.e. interaural coherence . Theoretically, this information can be extracted from two (real-valued) correlation units, ideally with 90° IPD-tuning difference (Dietz2021a). As discussed in encke2021, this is meant to represent the ensemble information of binaural neurons in mammals (mcalpine2001; grothe2010), including their sensitivity to fast fluctuations (siveke2008; joris2006; vanderheijden2010). Sensitivity to the amount of IPD fluctuations makes this model class crucially different from most established models of binaural unmasking, where real-valued correlation is used (bernstein2017a; vanderheijden1999), which depends only on time-averaged IPDs or ITDs.
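As a minimal sketch of this stage, the complex correlation coefficient can be computed from temporal means of two analytic signals. The example skips the Gammatone filterbank and uses synthetic complex noise instead. Which of the two signals is conjugated is an assumption here, and the function name is illustrative.

```python
import numpy as np

def complex_correlation(left, right):
    """Normalized complex correlation coefficient of two analytic signals,
    computed from temporal means (conjugating the right signal is an assumption)."""
    num = np.mean(left * np.conj(right))
    den = np.sqrt(np.mean(np.abs(left) ** 2) * np.mean(np.abs(right) ** 2))
    return num / den

rng = np.random.default_rng(0)
n = 100_000

def complex_noise():
    """Synthetic stand-in for the analytic output of one filter channel."""
    return rng.standard_normal(n) + 1j * rng.standard_normal(n)

common, u_left, u_right = complex_noise(), complex_noise(), complex_noise()
ipd = 0.5  # interaural phase difference in radians

# Fully coherent pair: modulus 1, argument equal to the IPD
gamma_coherent = complex_correlation(common, common * np.exp(-1j * ipd))

# Equal-power common and independent parts: modulus near 0.5, same mean IPD
gamma_partial = complex_correlation(common + u_left,
                                    (common + u_right) * np.exp(-1j * ipd))
```

The modulus carries the interaural coherence and the argument carries the mean IPD, so both quantities are available from a single complex number.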

As pointed out in the Introduction, the novelty of the present model is the interference of IPD fluctuations across frequency channels. The term incoherence interference describes purely detrimental effects, i.e. only channels with lower coherence affect their neighbor, but not the other way around. Simulation therefore involves a restricted across-channel weighted average of the coherence : The are limited to the coherence of the channel for which the post-interference coherence is calculated, thus referred to as .


where symbolizes a function that weights the contribution of a channel to the resulting . The employed weighting function has an exponential decay described by


represents the decay parameter, normalized by the number of filters per ERB, . The double-exponential decay shape is an ad-hoc choice, inspired by descriptions of the channel interaction through spread of excitation in cochlear implants (bingabr2008; biesheuvel2016).
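The restricted weighted average can be sketched as follows. The exact weighting equation is not recoverable from the text, so the two-sided exponential `exp(-|k| / (b * per_erb))` used below is an assumed form, and the function names are hypothetical. The limitation step follows the description above: before averaging, every channel's coherence is capped at the coherence of the channel being evaluated.

```python
import numpy as np

def interference_weights(n_channels, centre, b, per_erb=5):
    """Assumed two-sided exponential window around channel `centre`;
    b is the decay parameter, scaled by the number of filters per ERB."""
    k = np.arange(n_channels) - centre
    return np.exp(-np.abs(k) / (b * per_erb))

def post_interference_coherence(coherence, b, per_erb=5):
    """Weighted across-channel average in which neighbours can only lower,
    never raise, the coherence of the evaluated channel."""
    coherence = np.asarray(coherence, dtype=float)
    out = np.empty_like(coherence)
    for f in range(coherence.size):
        w = interference_weights(coherence.size, f, b, per_erb)
        limited = np.minimum(coherence, coherence[f])  # cap at on-channel coherence
        out[f] = np.sum(w * limited) / np.sum(w)
    return out

flat = post_interference_coherence([0.9, 0.9, 0.9, 0.9, 0.9], b=0.5)
dip = post_interference_coherence([0.9, 0.9, 0.2, 0.9, 0.9], b=0.5)
```

For a spectrally constant coherence the output equals the input, which is the property that distinguishes this mechanism from simply widening the filter. In the second example, the incoherent channel lowers its neighbours but is not itself raised by them.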

For a low masker coherence, or at the – practically irrelevant – case of a positive SNR, adding a target with an IPD of relative to the masker can swap the mean IPD from the masker to that of the target. In special cases, the masker alone and masker plus target can have the same coherence, but differ in their mean IPD and thus in their correlation. Thus, the interaural coherence is not sufficient as a decision variable. Instead, , including both coherence and the mean IPD, is required. Therefore, the original mean IPD is now "given back" to the coherence after the interference and limitation stage, so that the model can operate on the complex correlation coefficient as suggested by encke2021.


Unity-limited measures such as coherence or correlation can be Fisher z-transformed, i.e. atanh-transformed, to approximate an equal-variance axis in perception (mcnemar1969; as often applied in psychophysics, e.g., luddemann2007; bernstein2017a, and in technical applications, e.g., just1994). As in encke2021, is multiplied by a model parameter to avoid an infinite sensitivity to deviations from a coherence of one. This is equivalent to adding uncorrelated noise to the two input signals. The decision variable of the binaural pathway is thus



is the Fisher z-transform applied to the modulus of while leaving the argument unchanged.

In the signal detection stage, the is obtained based on the difference between the ensemble averages of the representations of the noise alone, , and the representations of the target signal plus noise, :


The internal noise defines the absolute performance of the binaural model pathway (encke2021).
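A hedged sketch of the binaural decision stage follows. It assumes that the sensitivity index is the distance between the two ensemble-averaged decision variables divided by the standard deviation of the internal noise, in the spirit of encke2021. The function names are illustrative, and the default parameter values are the first row of Table 1.

```python
import numpy as np

def fisher_z_complex(gamma, rho_hat):
    """Scale the complex correlation by rho_hat, apply atanh to the modulus,
    and keep the argument (mean IPD) unchanged."""
    scaled = rho_hat * np.asarray(gamma, dtype=complex)
    return np.arctanh(np.abs(scaled)) * np.exp(1j * np.angle(scaled))

def binaural_dprime(gamma_noise, gamma_signal_plus_noise, rho_hat=0.91, sigma_bin=0.20):
    """Assumed form: distance between ensemble means in the transformed
    complex plane, in units of the internal-noise standard deviation."""
    z_n = np.mean(fisher_z_complex(gamma_noise, rho_hat))
    z_sn = np.mean(fisher_z_complex(gamma_signal_plus_noise, rho_hat))
    return float(np.abs(z_sn - z_n) / sigma_bin)

# Scaling by rho_hat < 1 keeps the transform finite even for a coherence of one
z_full = fisher_z_complex(1.0, 0.91)

# Same coherence, different mean IPD: still detectable, which is why the
# decision variable is complex rather than the coherence alone
d_phase_only = binaural_dprime([0.8], [0.8 * np.exp(0.3j)])
```

The last example mirrors the special case discussed above, where masker alone and masker plus target share the same coherence but differ in mean IPD.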

2.3 Monaural Pathway

For the monaural pathway, the power of the on-frequency filter channel was evaluated. It is half the squared mean of the envelope across the whole signal duration (biberger2016). The envelope is the modulus of the complex-valued filter output:


In the stimuli employed in this study, the power is identical in the left and right channels, thus it is sufficient to evaluate only one side.

The for the monaural detection follows the unequal variance model of signal detection theory (simpson1973; swets1986) with the averages, , and the standard deviations, , of each stimulus’ of signal plus masker or masker alone, respectively:


The processing accuracy of a signal-induced power change is limited by a model parameter that represents a stimulus-dependent internal noise with a Gaussian distribution of amplitudes and a variance of . The of the monaural path is therefore


2.4 Detector

The sensitivity indices of the binaural, , and monaural pathway, were combined in an optimal manner, as in ewert2000; furukawa2008; biberger2016. This assumed two independent indices. The output of the model is thus


The d' that corresponds to the experiment-specific detection thresholds was obtained via table lookup (numerical evaluation in hacker1979). It depends on the number of intervals as well as on the down-up rule of the alternative-forced-choice (AFC) procedure used in the simulated experiments. Evaluating the model successively with increasing level of the target signal delivered its condition-specific psychometric function. The logarithmic d' as a function of target level is a straight line. The predicted detection threshold was obtained from a straight line fitted to the logarithmic d'.
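The detector stage can be sketched in two small functions. The combination rule assumes independent pathways, as stated above. The threshold search fits a straight line to the logarithm of the sensitivity index as a function of level and solves for the level at which the target value is reached. The function names and the synthetic psychometric function are illustrative.

```python
import numpy as np

def combine_dprime(d_bin, d_mon):
    """Optimal combination of two independent sensitivity indices."""
    return np.sqrt(np.asarray(d_bin) ** 2 + np.asarray(d_mon) ** 2)

def threshold_level(levels_db, dprime, dprime_target):
    """Fit log10(d') versus level with a straight line and return the level
    at which the fitted line reaches dprime_target."""
    slope, intercept = np.polyfit(levels_db, np.log10(dprime), 1)
    return (np.log10(dprime_target) - intercept) / slope

# Synthetic psychometric function: d' = 1 at 50 dB, factor 10 per 20 dB
levels = np.arange(40.0, 61.0, 2.0)
dprime = 10.0 ** (0.05 * (levels - 50.0))

# 0.78 is the threshold criterion quoted for the 2-down 1-up procedure in Section 3.1
threshold = threshold_level(levels, dprime, dprime_target=0.78)
```

In the model the input to `threshold_level` would be the combined index from `combine_dprime`, evaluated at successively increasing target levels.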

3 Predictions of Binaural-Detection Datasets

In all experiments, a or tone was to be detected in a broadband Gaussian noise masker. Figures 3, 4 and 5 show the experimental data denoted by symbols, as well as the predictions of the proposed model as continuous lines. Three types of binaural-detection experiments were simulated, as described in detail in the following subsections.

Table 1 summarizes the predicted experimental conditions.

3.1 van der Heijden & Trahiotis 1999

Figure 3: Experimental data from vanderheijden1999 (symbols). The continuous lines show the predictions of the presented model including the across-channel incoherence interference. The dashed lines show predictions for DDN with a single-channel version, i.e. without interference, equivalent to encke2021. Upper panel: Detection thresholds with target; lower panel: with target.

In this arguably most central experiment, detection thresholds of an target tone (Figure 3, upper panel) as well as of an tone (Figure 3, lower panel) were measured as a function of the interaural masker delay, , in steps of . The bandwidth of the masker was . As outlined in the Introduction, the DDN consisted of two superimposed noises with opposite ITD. The experiment performed by vanderheijden1999 employed a four-interval, two-alternative forced-choice task (4I-2AFC; the first and fourth intervals always contained only the masker and served as cueing intervals). Their adaptive 2-down 1-up staircase procedure estimated the 70.7% correct-response threshold. This is equivalent to a d' of 0.78 at threshold. The continuous lines in Figure 3 show the simulations of the presented model, including the across-channel incoherence interference. From visual inspection, the simulations captured all effects from the experimental thresholds and the critical threshold differences between SDN and DDN at all ITDs under both conditions. Specifically, the critical and threshold difference at = in the and = in the condition, respectively, are precisely accounted for. This good correspondence is also reflected in the more than explained variance under both conditions, and RMS errors of less than . The dashed lines show simulations without the across-channel incoherence interference (single-channel version, equivalent to encke2021) but with all other model parameters unchanged. This shows that a large part of the threshold differences is already explained by differences in the on-frequency coherence. In much the same way as DDN coherence oscillates as a function of analysis frequency (Fig. 1c), it also fluctuates as a function of . Particularly at = , DDN is incoherent in the 500-Hz band, whereas SDN is almost fully coherent. This, and not the across-frequency process, causes the difference in the simulated thresholds at this ITD.
The across-frequency process only comes into play at those ITDs where the coherence at (on-frequency) is nearly identical in SDN and DDN (upper panel: = and ; lower panel: = and ).
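The relation between the tracked percent-correct level and the sensitivity index at threshold can be approximated in closed form. The source uses a table lookup (hacker1979); the sketch below instead assumes the textbook equal-variance Gaussian relation for a two-alternative forced-choice task, so small deviations from the tabulated values are expected. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def tracked_proportion_correct(n_down):
    """Proportion correct at which an n-down 1-up staircase converges."""
    return 0.5 ** (1.0 / n_down)

def dprime_2afc(proportion_correct):
    """Equal-variance 2AFC relation: d' = sqrt(2) * z(proportion correct)."""
    return sqrt(2.0) * NormalDist().inv_cdf(proportion_correct)

pc_2down = tracked_proportion_correct(2)   # about 0.707
pc_3down = tracked_proportion_correct(3)   # about 0.794
d_2down = dprime_2afc(pc_2down)
d_3down = dprime_2afc(pc_3down)
```

For the 2-down 1-up rule this approximation lands close to the value of 0.78 quoted above, and the 3-down 1-up rule used in the following experiment tracks a correspondingly higher sensitivity index.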

3.2 Marquardt & McAlpine 2009

Figure 4: Experimental data from marquardt2009 (symbols) and model predictions (lines).

The stimuli of this experiment contained SDN or DDN centered at the frequency of the target tone with a constant in the notch band. The notch was spectrally surrounded by constant-IPD bands with and , or vice versa. Thresholds are given as a function of the notch bandwidth. The resulting phase transitions between notch and flanking bands have been hypothesized to impair the detection if they cause a frequency region of low interaural coherence. The lower and upper frequency limits of the composite stimuli are and , respectively. The two-interval, two-alternative forced-choice task with a 3-down 1-up procedure estimated the 79.4% correct-response threshold. This corresponds to at the threshold predicted by the model.

In Figure 4, detection thresholds of the tone are shown as a function of the bandwidth of the inner band. Again, the model predicted all critical characteristics of the data. These are the at the full notch bandwidth (same as = in the condition in vanderheijden1999), the elevated SDN thresholds in the [-, , +] compared to the [+, , -] condition and the BMLD where the notch bandwidth is zero.

3.3 Experiments on the operating bandwidth in binaural detection

Several studies investigated the operating bandwidth in binaural detection using notched-noise binaural detection (sondhi1966; holube1998; kolarik2010). The masking noise is diotic (N0) in the notch band and antiphasic (Nπ) in the flanking bands. Detection thresholds of an target tone were again measured as a function of the notch bandwidth. Results are expressed as the difference between thresholds in the notched condition and the threshold without notch, i.e. . In Figure 5, the circles mark the threshold differences reported by kolarik2010 (centered condition), which represent averages across their three participants. The triangles show individual thresholds of the two participants in the study by holube1998 (rectangular condition). The gray diamonds show the data from sondhi1966. Our model predictions were based on the 2-down 1-up 2I-2AFC paradigm employed in kolarik2010, equivalent to at threshold. The black continuous line shows the model predictions with the same parameter settings as used to predict the detection thresholds in vanderheijden1999 (Figure 3b). The dotted black line shows model predictions without the across-channel interaction, so that detection was purely determined by the ERB = 79 Hz Gammatone filter centered at 500 Hz. Despite the large deviations between and within experiments, the model predictions involving the across-channel interaction captured the shape of the decreasing thresholds with increasing notch bandwidth.

Figure 5: Symbols denote data from notched-noise binaural detection experiments with the configuration as a function of the inner-band () bandwidth; continuous line: model prediction employing across-channel "incoherence interference"; dotted line: prediction with the model relying only on the on-frequency filter (ERB = 79 Hz, centered at 500 Hz).
Experiment | Signal / Noise / Variable | Max. coherence | Binaural internal-noise SD | Slope parameter | Monaural internal-noise SD | Var. exp. / % | RMSE / dB
vanderheijden1999 | | 0.91 | 0.20 | 0.50 | 0.40 | 94.9 | 0.85
 | | 0.86 | 0.17 | 0.65 | 0.40 | 92.4 | 0.84
marquardt2009 | 0 [+, , -] | 0.89 | 0.24 | 0.65 | 0.40 | 93.1 | 0.71
 | [-, , +] | | | | | 75.7 | 1.25
 | [+, , -] | | | | | 96.7 | 0.27
 | [-, , +] | | | | | 94.4 | 0.35
kolarik2010 | [, 0, ] | 0.91 | 0.20 | 0.50 | 0.40 | 97.1 | 0.67
Table 1: Summary of the simulated experiments and predictions. Columns 1 - 4: Simulated experiment, interaural configuration of the used target signal, that of the masking noise, independent variable (: masker ITD; : masker inner-band bandwidth). Columns 5 - 8: Used model parameters: : maximum coherence (internal noise); : standard deviation of the internal noise to determine the absolute performance of the binaural pathway; : slope parameter of the double-exponential across-channel interaction window (normalized by the number of filters per ERB); : standard deviation of the level-dependent internal noise to determine the acuity of the monaural pathway. Columns 9 - 10: Percentage of the variance in the data accounted for by the model; root mean squared error of the predictions.

4 Discussion

It has been shown that a large amount of binaural detection data can be explained purely on the basis of the coherence defined by a wide Gammatone filter at = (rabiner1966; encke2021). This accounts for experimentally obtained thresholds with fully coherent broadband noise maskers (hirsh1948; vandepar1999), for mixtures of correlated and uncorrelated noise (robinson1963; pollack1959; bernstein2014), and for experiments where interaural coherence is reduced by an ITD (langford1964; rabiner1966; bernstein2020). All these experiments have in common that the coherence and the phase relationship between noise and target are fairly constant across frequency bands. However, the on-frequency coherence does not account for thresholds obtained with maskers where these properties change substantially in filter bands that are relatively near to the target frequency. Specifically, the single-channel model version proposed in encke2021 is not able to predict all of the threshold differences between SDN and DDN (see the dashed lines in Fig. 3).
marquardt2009 hypothesized across-channel processing in the binaural system to explain the reduced binaural benefit under such conditions. Here, we extended the analytical model by encke2021 to a multi-channel numerical signal-processing model with interference of IPD fluctuations. With the interference applied only to the fluctuations, not the mean IPD, the proposed model differs from approaches assuming wider binaural filters (e.g. vanderheijden1999; kolarik2010). For stimuli with spectrally constant coherence and masker-target phase relations, e.g., SDN and all conditions simulated by encke2021, the incoherence interference is moot and the model operates on the standard filter bandwidths of its peripheral filterbank.
The dataset of vanderheijden1999 contains both SDN and DDN, and is therefore the critical challenge for binaural detection models (see footnote 1). Their correlation-based model precisely predicted DDN detection thresholds by fitting the filter bandwidth to the data, such that the resulting correlation at determined the DDN thresholds. The best-fitting peripheral filters had an ERB between 130 and at . The stronger unmasking with SDN was then attributed to ITD-compensating delay lines. Our model has the reverse rationale: The peripheral filter (ERB = 79 Hz) determines the coherence decline accounting for detection in SDN (rabiner1966; marquardt2009; dietz2021; encke2021). The suggested interference mechanism then reduces the effective coherence in DDN (marquardt2009), increasing detection thresholds. Both the model of vanderheijden1999 and ours simulate the data very accurately. Therefore, the discussion focuses on the consequences and plausibility of the two different concepts.

Footnote 1: The most comprehensive simulation of dichotic tone-in-noise detection thresholds using a cross-correlation-based model is by bernstein2017a. It is not expected to simulate the DDN detection thresholds of vanderheijden1999 with good accuracy, because an ERB of at least is necessary. Other DDN stimuli, used experimentally by bernstein2015, were included in the model test battery by bernstein2017a. Those DDN stimuli, however, differed in several ways from the former. First, the target frequency is , compared to 500 Hz in vanderheijden1999 and in all other studies simulated here. Second, instead of fixing the target tone to or , the target is delayed by the same amount as one of the two noises, i.e. . Such an approach is useful for SDN, as it ensures a constant difference between the IPDs of the noise and of the tone. For DDN, however, the IPD of the second noise relative to the tone is offset from by . This type of stimulus therefore causes an even more complex -dependence of threshold, which offers no advantage over the DDN from vanderheijden1999 for filter estimation. With both definitions, corresponding SDN and DDN stimuli can be generated only if is an integer or a half-integer multiple of the target period (i.e. , ). In bernstein2015 (their Figure 1, Panel a) these are the two data points at = 2 and . SDN and DDN thresholds are, however, very similar at those points. Third, the masker bandwidth is . For such a masker bandwidth similar to a peripheral filter width, neither vanderheijden1999 nor our model would predict a considerable threshold difference between SDN and DDN at = 2 and , since there are no off-frequency regions of considerably lower coherence.

The processing bandwidth dictates the temporal coherence and thus the decline of BMLD with increasing noise ITD in the absence of ITD compensation (rabiner1966; vanderheijden1999; dietz2021). To date, two of the arguably most comprehensive datasets of dichotic tone-in-noise detection, vanderheijden1999 and bernstein2020, are reported by their respective authors to have mutually exclusive requirements for the processing bandwidth (ERB = 130 vs. ERB at ).
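
The link between processing bandwidth and coherence decline can be illustrated numerically. The following is a minimal sketch, not the model used in this work: it assumes an ideal rectangular filter and a spectrally flat masker, for which the coherence of a single delayed noise is |sinc(B·τ)|. The function names and the ITD values are hypothetical. The narrow bandwidth uses the standard ERB formula of glasberg1990, and the wide one uses the 130 Hz value mentioned above.

```python
import numpy as np

def erb_glasberg_moore(fc_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc_hz (Glasberg & Moore 1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def sdn_coherence(itd_s, bandwidth_hz):
    """Coherence of flat noise with a single ITD after an ideal rectangular filter.

    Assumption: rectangular passband, so the band-averaged CPSD has modulus
    |sin(pi*B*tau) / (pi*B*tau)|, which is |np.sinc(B*tau)|.
    """
    return np.abs(np.sinc(bandwidth_hz * np.asarray(itd_s, dtype=float)))

# Hypothetical ITDs in seconds, chosen only for illustration.
itds = np.array([0.0, 1e-3, 2e-3, 4e-3])
narrow = sdn_coherence(itds, erb_glasberg_moore(500.0))  # standard peripheral filter
wide = sdn_coherence(itds, 130.0)                        # wider processing bandwidth

for tau, c_n, c_w in zip(itds, narrow, wide):
    print(f"ITD {tau * 1e3:.0f} ms: narrow {c_n:.3f}, wide {c_w:.3f}")
```

Under these assumptions the wider filter loses coherence faster with increasing ITD, which is why the choice of bandwidth determines how quickly the predicted binaural benefit declines when no ITD compensation is available.
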
A variety of studies aim to estimate the binaural processing bandwidth by means of dichotic tone-in-noise detection, but no consistent picture emerges. There is, for example, a difference in estimated processing bandwidth between band-widening and notched-noise BMLD data, and between two different interaural configurations (e.g., kolarik2010). Consequently, the topic remains controversial, and to date there is no clear consensus about either the underlying mechanism or the interpretation of the data (see, e.g., verhey2020 for review). Most of the recent binaural models, such as bernstein2017a and encke2021, assume a binaural processing bandwidth as narrow as the peripheral bandwidth. This is reasonable, considering direct measurements of the binaural processing bandwidth in ITD-sensitive inferior colliculus neurons in cats (mclaughlin2008): For (single) delayed noise, they found that damping of the cross-correlation function corresponds to the peripheral bandwidth at the respective center frequency. To develop a binaural model that also accounts for maskers that so far have been explained with a larger binaural processing bandwidth, like DDN, we built on the hypothesis of marquardt2009: We speculate that a low masker coherence in spectrally adjacent frequency bands reduces the ability of the auditory system to detect relatively smaller changes in IPD fluctuations caused by the target tone. This assumption, that listening is affected by the interaural incoherence in surrounding frequency regions, is related to binaural interference in detection tasks (bernstein1995) and to the more diffuse sound sensation. Hints towards across-channel processing in binaural unmasking were also found by ewert2017, predicting spatial release from masking (SRM) with an equalization-cancellation (EC) model. Assuming an across-channel dependence of the EC parameters increased the predictive power.
Conceptual similarity to the proposed across-channel incoherence interference can be found in models of monaural modulation processing: In piechowiak2007 and dau2013, modulation patterns interact across channels, while energetic spectral masking properties do not.

In contrast, in the case of narrow-band maskers, off-frequency channels can also improve detection, because they carry a similar signal-to-noise ratio (vandepar1999). The present single-channel detection stage does not exploit this effect (see encke2021 for details), which is only a matter of the final integration stage (breebaart2001a; breebaart2001b).

The binaural system’s sensitivity to changes in ITD depends on the baseline ITD, including an IPD dependence for pure tones (yost1974) and a reduced unmasking for compared to (hirsh1948). Delay-line models can account for this dependence with a corresponding function. However, they then incorrectly predict better unmasking with compared to when (breebaart1999). The present model has the reverse problem and cannot account for the former difference. This dependence is also expected when the precision of IPD encoding is derived from the Fisher information of neural population responses recorded in mammals (harper2004; encke2019). An angular compression of the decision variable space at large IPD is a possible model extension. With the present implementation, the binaural pathway parameters (, , ) had to be adjusted slightly between conditions with targets and conditions with targets (see Table 1).

5 Conclusion

The proposed binaural model with detrimental interaural incoherence interference operating across neighboring auditory filters simulates binaural detection thresholds from several data sets with spectrally complex maskers. Employing conventional auditory filters (glasberg1990), it predicts the reduced unmasking in double-delayed noise (vanderheijden1999) compared to conventional interaurally delayed noise. The concept can help to resolve the inconsistency inherent in the fact that binaural models require standard filter bandwidths for most data sets (bernstein2017a; bernstein2020), but at least 1.6 times wider filters for double-delayed noise (vanderheijden1999) and other spectrally complex maskers (verhey2020).
The main consequence of using a standard filter bandwidth is that the gradual decline of the binaural benefit with masker ITD can be simulated without internal ITD compensation, as first suggested by langford1964.


This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Projektnummer 352015383 – SFB 1330 B4.

Appendix A Derivation of cross-power spectral density in double-delayed noise

In DDN, two two-channel signals and with opposite ITDs, and , are summed. The cross-power spectral density (CPSD) functions for ensembles of such signals are


The amplitudes are 0.5, so that a DDN has the same energy as an SDN. Summation of the time signals is equivalent to a summation of their CPSD functions, which leads to


This resulting cosine pattern is determined by the sum of the CPSDs’ phases adding up or canceling each other at different frequencies. The modulus of this normalized CPSD represents the coherent energy of the signals as a function of frequency (gardner1992),
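
The derivation described in this appendix can be sketched in compact form. The notation below is an assumption made for illustration: f denotes frequency, ±τ the two opposite ITDs, S_1 and S_2 the normalized CPSDs of the two noises (flat spectrum within the band), and γ the normalized CPSD of the sum. The sign convention of the exponent is likewise assumed.

```latex
\begin{align}
  S_{1}(f) &= 0.5\, e^{+i 2\pi f \tau}, &
  S_{2}(f) &= 0.5\, e^{-i 2\pi f \tau}, \\
  \gamma(f) &= S_{1}(f) + S_{2}(f) = \cos(2\pi f \tau), \\
  |\gamma(f)| &= \left|\cos(2\pi f \tau)\right| .
\end{align}
```

With this convention the modulus equals one wherever the two noises have the same IPD and drops to zero wherever their IPDs differ by π.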


If is based on ensembles of signals, then . As a continuous function of it gives a coherence for any frequency representing an infinitesimally small bandwidth , illustrated as continuous lines in Figure 1c. The coherence for peripherally filtered, i.e. finite-bandwidth signals is an average of the frequencies’ normalized CPSDs . The coherence decreases with increasing interaural delay and increasing bandwidth , as illustrated by the bars in Figure 1c. Two superimposed noises with of are in phase at . At , however, they have IPDs of and , respectively. The coherence between left and right signals at is therefore zero.
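
The averaging described above can be checked numerically. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the auditory filter is replaced by an ideal rectangular band whose width is the ERB from the standard formula of glasberg1990, the masker spectrum is flat, and the ITD of 1 ms is a hypothetical value chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Hz) after Glasberg & Moore (1990)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def sdn_cpsd(f_hz, itd_s):
    """Normalized CPSD of a single delayed noise (assumed sign convention)."""
    return np.exp(2j * np.pi * f_hz * itd_s)

def ddn_cpsd(f_hz, itd_s):
    """Normalized CPSD of double-delayed noise: two noises, amplitudes 0.5, ITDs +/- itd_s."""
    return 0.5 * sdn_cpsd(f_hz, itd_s) + 0.5 * sdn_cpsd(f_hz, -itd_s)

def band_coherence(cpsd, fc_hz, bandwidth_hz, itd_s, n=2001):
    """Modulus of the normalized CPSD averaged over a rectangular band (assumption)."""
    f = np.linspace(fc_hz - bandwidth_hz / 2, fc_hz + bandwidth_hz / 2, n)
    return float(np.abs(np.mean(cpsd(f, itd_s))))

itd = 1e-3  # hypothetical ITD magnitude in seconds
for fc in (250.0, 500.0):
    bw = erb_hz(fc)
    c_sdn = band_coherence(sdn_cpsd, fc, bw, itd)
    c_ddn = band_coherence(ddn_cpsd, fc, bw, itd)
    print(f"fc = {fc:.0f} Hz, ERB = {bw:.1f} Hz: SDN {c_sdn:.3f}, DDN {c_ddn:.3f}")
```

With the hypothetical ITD of 1 ms, the two noises have IPDs of +π/2 and -π/2 at 250 Hz, so their contributions cancel in the DDN case while the SDN coherence stays high. This is the kind of off-frequency coherence minimum that the across-channel interference mechanism relies on.
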