Multi-Sound-Source Localization for Small Autonomous Unmanned Vehicles with a Self-Rotating Bi-Microphone Array

04/13/2018 ∙ by Deepak Gala, et al. ∙ New Mexico State University

While vision-based localization techniques have been widely studied for small autonomous unmanned vehicles (SAUVs), sound-source localization capability has not been fully enabled for SAUVs. This paper presents two novel approaches for SAUVs to perform multi-sound-source localization (MSSL) using only the interaural time difference (ITD) signal generated by a self-rotating bi-microphone array. The two proposed approaches are based on the DBSCAN and RANSAC algorithms, respectively, whose performances are tested and compared in both simulations and experiments. The results show that both approaches are capable of correctly identifying the number of sound sources along with their three-dimensional orientations in a reverberant environment.







I Introduction

Small autonomous unmanned vehicles (e.g., quadcopters and ground robots) have revolutionized civilian and military missions by creating a platform for observation and permitting access to locations that are too dangerous, too difficult, or too costly to send humans. These small vehicles have shown themselves to be remarkably capable in many applications, such as surveying and mapping, precision agriculture, search and rescue, traffic surveillance, and infrastructure monitoring, to name just a few.

The sensing capability of unmanned vehicles has been enabled by various sensors, such as RGB cameras, infrared cameras, LiDARs, RADARs, and ultrasound sensors. However, these mainstream sensors are subject to either lighting conditions or line-of-sight requirements. On the other end of the spectrum, sound sensors have the advantage of overcoming line-of-sight constraints and, thanks to their omnidirectional nature, provide a more efficient means for unmanned vehicles to acquire situational awareness.

Among the sensing tasks for unmanned vehicles, localization is of utmost significance [1]. While vision-based localization techniques have been developed based on cameras, sound source localization (SSL) has been achieved using microphone arrays with different numbers (e.g., 2, 4, 8, 16) of microphones. Although it has been reported that localization accuracy is enhanced as the number of microphones increases [2, 3], this comes at the price of algorithm complexity and hardware cost, especially the expense of the analog-to-digital converters (ADCs), which is proportional to the number of microphone channels. Moreover, arrays with a particular structure (e.g., linear, cubical, circular, etc.) are difficult to control, mount, and maneuver, which makes them unsuitable for use on small vehicles.

Humans and many other animals can locate sound sources with decent accuracy and responsiveness by using their two ears together with head rotations to avoid ambiguity (i.e., the cone of confusion) [4]. Recently, SSL techniques based on a self-rotating bi-microphone array have been reported in the literature [5, 6, 7, 8, 9]. Single-SSL (SSSL) techniques have been well studied using different numbers of microphones, while many reported techniques for multi-sound-source localization (MSSL) require large microphone arrays with specific structures, making them difficult to mount on small robots. Pioneering work on MSSL assumed the number of sources to be known beforehand [10, 11]. Some of these approaches [12, 13, 14] are based on sparse component analysis (SCA), which requires the sources to be W-disjoint orthogonal [15] (i.e., in each time-frequency component, at most one source is active), thereby making them unsuitable for reverberant environments. Pavlidi et al. [16] and Loesch et al. [17] presented SCA-based methods to count and localize multiple sound sources, but these require one sound source to be dominant over the others in a time-frequency zone. Clustering methods have also been used to conduct MSSL [13, 12, 14]. Catalbas et al. [18] presented an MSSL approach that deploys four microphones at the corners of a room and requires the sound sources to be present within its boundary. The technique was limited to localizing source orientations in the two-dimensional plane using K-medoids clustering, and the number of sound sources was estimated using the exhaustive elbow method, which is heuristic and computationally expensive. Traa et al. [19] presented an approach that converts the time delay between the microphones into the frequency domain so as to model the phase differences in each frequency bin of the short-time Fourier transform. Due to the linear relationship between phase difference and frequency, the data were then clustered using random sample consensus (RANSAC). In our previous work [9, 8], we developed an SSSL technique based on an extended Kalman filter and an MSSL technique based on a cross-correlation approach, which was very computationally expensive.

The contributions of this paper include two novel MSSL approaches for identifying the number of sound sources as well as localizing them in a three-dimensional (3D) environment. The rotation of the bi-microphone array generates an interaural time difference (ITD) signal whose data points form multiple discontinuous sinusoidal waveforms. In the first approach, a novel mapping mechanism is developed to convert the acquired ITD signal to an orientation domain. An unsupervised classification is then conducted using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN [20] is one of the most popular nonlinear clustering techniques. It can discover arbitrarily shaped clusters of densely grouped points in a data set and outperforms other clustering methods in the literature [21, 22].

The second novel approach for MSSL performs a sinusoidal regression on the ITD signal using a RANSAC-based method. Each of the sine waves in the ITD signal corresponds to a single sound source. The data points associated with each sine wave are separated by performing a repeated sinusoidal regression using RANSAC [23]. After a model is fitted in an iteration, the associated data points are removed from the ITD signal before the next iteration starts. The azimuth and elevation angles of the sound source are then determined for each fitted model. A threshold on the number of qualifying data points is then applied to determine the number of sound sources.

Both simulations and experiments were conducted to test the two proposed approaches. The results show that both approaches are capable of correctly identifying the number of sound sources and their 3D orientations in terms of azimuth and elevation angles. However, the RANSAC-based approach outperforms the DBSCAN-based approach in identifying the number of sound sources, while the DBSCAN-based approach outperforms the RANSAC-based approach in localization accuracy.

The rest of the paper is organized as follows. In Section II-B, the mathematical model for the ITD signal generated by the self-rotating microphone array is presented. In Section III, the mapping mechanism for regression is presented. Section IV presents the localization algorithm using DBSCAN clustering and Section V presents the RANSAC-based localization algorithm. Simulation and experimental results are presented and discussed in Section VI, and Section VII concludes the paper.

II Preliminaries

II-A Interaural Time Difference (ITD)

The ITD is the difference in the arrival times of a sound signal at two microphones and can be calculated using the cross-correlation technique [24, 25].

Figure 1: Interaural time difference (ITD) estimation between signals x1(t) and x2(t) using the cross-correlation technique.

Consider a single stationary sound source and two spatially separated microphones placed in an environment. Let x1(t) and x2(t) be the sound signals captured by the microphones in the presence of noise, which are given by [24]

x1(t) = s(t) + n1(t),
x2(t) = α s(t + τ) + n2(t),

where s(t) is the sound signal, n1(t) and n2(t) are real and jointly stationary random processes, τ denotes the time difference of s(t) arriving at the two microphones, and α is the signal attenuation factor due to the different traveling distances. It is commonly assumed that s(t) changes slowly and is uncorrelated with the noises n1(t) and n2(t) [24]. Figure 1 shows the process of ITD estimation between the signals x1(t) and x2(t), where the two blocks preceding the correlator could be scaling functions or pre-filters [24], which eliminate or reduce the effect of background noise and reverberations using various techniques [26, 27, 28, 29].

The cross-correlation function of x1(t) and x2(t) is given by

R(τ') = E[ x1(t) x2(t − τ') ],

where E[·] represents the expectation operator. The time difference of x1(t) and x2(t), i.e., the ITD, is the lag that maximizes the cross-correlation:

Δt = argmax_τ' R(τ').

The distance difference of the sound signal traveling to the two microphones is given by Δd = c Δt, where c is the sound speed and is usually selected to be 345 m/s on the Earth's surface.
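As a minimal sketch of this estimator (plain cross-correlation with no pre-filtering; the function name, sampling rate, and test signal are illustrative, not from the paper):

```python
import numpy as np

def estimate_itd(x1, x2, fs):
    """Estimate the ITD (in seconds) of x2 relative to x1 via the peak
    of their cross-correlation, i.e., the lag maximizing R."""
    corr = np.correlate(x2, x1, mode="full")       # R over all integer lags
    lags = np.arange(-(x1.size - 1), x2.size)      # lag axis in samples
    return lags[np.argmax(corr)] / fs              # peak lag -> seconds

# Synthetic check: x2 is an attenuated copy of x1 delayed by 5 samples.
fs = 8000
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
x1 = s
x2 = 0.8 * np.roll(s, 5)         # crude delayed copy (wrap-around negligible)
tau = estimate_itd(x1, x2, fs)   # approximately 5 / 8000 s
```

With pre-filters (as in generalized cross-correlation), the signals would be filtered before the `np.correlate` step; the peak-picking logic is unchanged.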

Remark 1.

For simplicity, the signal Δt is referred to as the ITD in what follows. The ITD is the only cue used in this paper for source counting and localization, and it is generated without using any of the scaling functions or pre-filters mentioned above.

II-B Mathematical Model for the ITD Signal

Before discussing the multi-source ITD signal collected by the self-rotating bi-microphone array, the single source ITD signal is first modeled. In this paper, the location of a single sound source is defined in a spherical coordinate frame, whose origin is assumed to coincide with the center of a ground robot.

Figure 2: Top-down view of the system.
Figure 3: 3D view of the system.
Figure 4: Top-down view of the plane containing triangle .

As shown in Figures 2 and 3, the left and right microphones, M_L and M_R, collect the acoustic signal generated by the sound source S. Let O be the center of the robot as well as of the bi-microphone array. The sound source location is represented by (r, θ, φ), where r is the distance between the source and the center of the robot, i.e., the length of segment OS, φ is the elevation angle defined as the angle between OS and the horizontal plane, and θ ∈ [0, 2π] is the azimuth angle defined as the angle measured clockwise from the robot heading vector, n, to the horizontal projection of OS. Letting the unit vector u be the orientation (heading) of the microphone array, γ be the angle between u and the horizontal projection of OS, and ψ be the angle between n and u, both following a right-hand rotation rule, we have

γ = θ − ψ.

In the shaded triangle shown in Figures 3 and 4, define Δd as the difference in the distances traveled by the sound to the two microphones. Based on the far-field assumption [30], we have

Δd ≈ d cos φ sin γ,

where d is the distance between the two microphones.


To avoid the cone of confusion [4] in SSL, the binaural microphone array needs to be rotated with a nonzero angular velocity [11]. Without loss of generality, in this paper we assume a clockwise rotation of the microphone array on the horizontal plane while the robot itself does not rotate throughout the entire estimation process, which implies that θ is constant.

The initial heading of the microphone array is configured to coincide with the heading of the robot, i.e., u(0) = n, which implies that γ(0) = θ. As the microphone array rotates with a constant angular velocity ω, the angle between the array heading and the horizontal projection of OS evolves as

γ(t) = ωt + θ,

and due to Equation (3) we have

Δd(t) = d cos φ sin(ωt + θ).

The resulting time-varying ITD due to Equation (4) is then given by

Δt(t) = Δd(t)/c = (d/c) cos φ sin(ωt + θ).

Because the microphone array rotates on the horizontal plane, φ does not change during the rotation for a stationary sound source. The resulting Δt(t) is a sinusoidal signal with amplitude A = (d/c) cos φ, which implies that

φ = arccos(cA/d).
It can be seen from Equation (6) that the phase angle of Δt(t) is the azimuth angle of the sound source. Therefore, localizing a stationary sound source equates to identifying the characteristics (i.e., the amplitude and the phase angle) of the sinusoidal signal Δt(t).

The ITD signal collected for multiple sound sources (as shown in Figure 11) forms a group of multiple discontinuous sinusoidal waveforms, each corresponding to a single sound source and satisfying the amplitude-elevation and phase-azimuth relationships mentioned above.
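A single source therefore traces one sinusoid per rotation, whose amplitude and phase recover the elevation and azimuth. A small sketch of that round trip (the microphone spacing, rotation period, and source angles below are assumed values for illustration only):

```python
import numpy as np

# Ideal single-source ITD trace over one rotation, using the relationship
# tau(t) = (d/c) * cos(phi) * sin(omega*t + theta).
d, c = 0.145, 345.0                     # mic spacing (m), sound speed (m/s) - assumed
omega = 2 * np.pi / 10.0                # one rotation per 10 s (assumed)

def itd(t, theta, phi):
    """Noise-free ITD of a source at azimuth theta, elevation phi (rad)."""
    return (d / c) * np.cos(phi) * np.sin(omega * t + theta)

theta, phi = np.deg2rad(60.0), np.deg2rad(30.0)
t = np.linspace(0.0, 10.0, 360)         # one full rotation, ~1-degree steps
tau = itd(t, theta, phi)

# Recover orientation from the trace: amplitude -> elevation, peak -> azimuth.
amplitude = tau.max()                           # ~ (d/c) * cos(phi)
phi_hat = np.arccos(amplitude * c / d)          # elevation from amplitude
theta_hat = np.pi / 2 - omega * t[np.argmax(tau)]   # peak at omega*t + theta = pi/2
```

With real data the peak is corrupted by noise, which is why the paper fits the sinusoid from point pairs instead of reading the peak directly.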

III Model for Mapping and Sinusoidal Regression

The signal Δt(t) in Equation (6) is sinusoidal, with its amplitude determined by the elevation angle and its phase angle corresponding to the azimuth angle of the sound source. Since the frequency ω of Δt(t) is the known rotational speed of the microphone array, the localization task (i.e., identifying θ and φ) reduces to estimating the amplitude and phase angle of Δt(t), i.e., A and θ. Consider a general form of Δt(t) expressed as

Δt(t) = a sin(ωt) + b cos(ωt),

where a = A cos θ and b = A sin θ, and we have

A = sqrt(a² + b²),
θ = arctan(b/a).

Consider two data points, Δt₁ and Δt₂, collected at two distinct time instants t₁ and t₂, respectively, and we have

Δt₁ = a sin(ωt₁) + b cos(ωt₁),
Δt₂ = a sin(ωt₂) + b cos(ωt₂),

which gives

a = [Δt₁ cos(ωt₂) − Δt₂ cos(ωt₁)] / sin(ω(t₁ − t₂)),
b = [Δt₂ sin(ωt₁) − Δt₁ sin(ωt₂)] / sin(ω(t₁ − t₂)).

According to (9) and (10), we then have

φ = arccos( (c/d) sqrt(a² + b²) ),
θ = arctan(b/a) + kπ,

where k is an integer chosen according to the signs of a and b to resolve the quadrant of θ.
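The two-point recovery of a and b amounts to solving a 2×2 linear system, which the sketch below does numerically (the rotation rate and coefficient values are arbitrary test numbers):

```python
import numpy as np

# Recover a and b in tau(t) = a*sin(w t) + b*cos(w t) from two samples:
# a 2x2 linear solve, equivalent to the closed-form two-point solution.
w = 2 * np.pi * 0.1
a_true, b_true = 0.3e-3, -0.2e-3

def solve_two_point(t1, tau1, t2, tau2, w):
    """Solve [sin(w t) cos(w t)] [a b]^T = tau at t1, t2 for (a, b)."""
    M = np.array([[np.sin(w * t1), np.cos(w * t1)],
                  [np.sin(w * t2), np.cos(w * t2)]])
    return np.linalg.solve(M, np.array([tau1, tau2]))

t1, t2 = 0.7, 2.3    # distinct instants; the system is nonsingular here
tau1 = a_true * np.sin(w * t1) + b_true * np.cos(w * t1)
tau2 = a_true * np.sin(w * t2) + b_true * np.cos(w * t2)
a, b = solve_two_point(t1, tau1, t2, tau2, w)

amplitude = np.hypot(a, b)     # = A, which maps to elevation via arccos(c*A/d)
phase = np.arctan2(b, a)       # azimuth estimate, up to the quadrant convention
```

The system is singular when sin(ω(t₁ − t₂)) = 0, i.e., when the two samples are half a rotation apart; such pairs must be rejected.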


IV DBSCAN-Based MSSL

In the DBSCAN algorithm [21], a random point from the data set is considered a core cluster point when more than MinPts points (including itself) exist within a distance ε of it (its epsilon ball). The cluster is then extended by checking all other points satisfying the ε and MinPts criteria, thereby letting the cluster grow. A new arbitrary point is then chosen and the process is repeated. A point that is not part of any cluster and has fewer than MinPts points in its epsilon ball is considered a "noise point". The DBSCAN technique is well suited for applications with noise and performs better than the K-means method, which requires prior knowledge of the number and approximate centroid locations of the clusters and can also fail in the presence of noisy data points.
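For concreteness, clustering mapped orientation points with DBSCAN might look as follows (using scikit-learn's implementation; the cluster centers, spreads, and the ε/MinPts values are invented for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Two dense clusters of mapped (azimuth, elevation) points plus sparse
# outliers, mimicking the orientation-domain data of the first stage.
c1 = rng.normal([1.0, 0.5], 0.02, size=(200, 2))
c2 = rng.normal([2.5, -0.3], 0.02, size=(200, 2))
noise = rng.uniform([-np.pi, -np.pi / 2], [np.pi, np.pi / 2], size=(60, 2))
pts = np.vstack([c1, c2, noise])

labels = DBSCAN(eps=0.1, min_samples=10).fit_predict(pts)
n_sources = len(set(labels)) - (1 if -1 in labels else 0)   # -1 marks noise
centroids = [pts[labels == k].mean(axis=0) for k in range(n_sources)]
```

Here `n_sources` plays the role of the source count and each centroid gives one estimated (azimuth, elevation) pair.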

The DBSCAN-based MSSL technique consists of two stages. In the first stage, the data points of the ITD signal are mapped to the orientation domain. The data set consisting of all the data points in the multi-source ITD signal contains not only inliers but also outliers, which produce undesired mapped locations. When the number of inliers is significantly greater than the number of outliers, highly dense clusters will form after a number of iterations. In the second stage, these clusters are detected using the DBSCAN technique by carefully selecting the parameters ε and MinPts. The number of clusters corresponds to the number of sound sources and the centroids of these clusters represent the locations of the sound sources.

1: Capture the ITD signal D for one full rotation of the bi-microphone array

2: Select ε and MinPts

3: Select the number of iterations N

4: FOR i = 1 to N DO

5: Randomly choose two points Δt₁ and Δt₂ from D, such that Δt₁ and Δt₂ do not equal zero simultaneously

6: Calculate a and b using Equations (13) and (14)

7: Calculate θ̂ and φ̂ using Equation (7), and add the mapped point (θ̂, φ̂) to the set P

8: END FOR

9: FOR j = 1 to N DO

10: Randomly choose a pair p = (θ̂, φ̂) from the set P

11: Calculate the distance between the chosen p and every other point in P

12: IF the number of points in the range ε is greater than MinPts

13: Label p as a core cluster point

14: ELSE label p as a noise point

15: END IF

16: END FOR

Algorithm 1 DBSCAN-Based MSSL

The complete DBSCAN-based MSSL algorithm is described in Algorithm 1. Two points in the data set are selected randomly and mapped into the orientation domain by calculating the angles θ̂ and φ̂ using Equations (6), (13), (7), and (14). A set P of these mapped points is then created and the detection of clusters is started. A point in P is randomly chosen and is decided to be a core cluster point or a noise point by checking the density-reachability criteria under the ε-MinPts condition [21]. The time complexity of Algorithm 1 is O(N²), where N is the number of iterations for mapping and clustering. The number of iterations, N, needs to be selected large enough for the algorithm to work efficiently.


V RANSAC-Based MSSL

The RANSAC algorithm [23] is able to estimate the parameters of a mathematical model from a data set that may contain a significantly large number of outliers. The input to the RANSAC algorithm includes a set of data, a parameterized model, and a distance threshold ε. In each iteration, a subset of the original data is randomly selected and used to fit the predefined parameterized model. All other data points in the original data set are then tested against the fitted model. A point is determined to be an inlier of the fitted model if its distance to the model is within ε. The process is repeated by selecting another random subset of the data. After a fixed number of iterations, the parameters of the best-fitting estimated model (the one with the maximum number of inliers) are selected.
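A generic sketch of this loop for the sinusoidal ITD model (2-point minimal samples, inlier counting against a residual threshold; the noise levels, thresholds, and iteration count are assumed values):

```python
import numpy as np

rng = np.random.default_rng(2)
w = 2 * np.pi * 0.1    # known rotation rate (assumed)

# Noisy samples of one sinusoid plus gross outliers (all levels invented).
t = np.linspace(0.0, 10.0, 200)
tau = 0.3e-3 * np.sin(w * t) + 0.2e-3 * np.cos(w * t)
tau_noisy = tau + rng.normal(0.0, 5e-6, t.size)
out_idx = rng.choice(t.size, 30, replace=False)
tau_noisy[out_idx] += rng.uniform(-4e-4, 4e-4, 30)

def ransac_sine(t, y, w, n_iter=200, eps=2e-5):
    """RANSAC fit of y = a*sin(w t) + b*cos(w t) from 2-point samples."""
    best, best_inliers = (0.0, 0.0), 0
    for _ in range(n_iter):
        i, j = rng.choice(t.size, 2, replace=False)
        M = np.array([[np.sin(w * t[i]), np.cos(w * t[i])],
                      [np.sin(w * t[j]), np.cos(w * t[j])]])
        if abs(np.linalg.det(M)) < 1e-6:
            continue                      # degenerate minimal sample
        a, b = np.linalg.solve(M, y[[i, j]])
        resid = np.abs(y - a * np.sin(w * t) - b * np.cos(w * t))
        inliers = int((resid < eps).sum())
        if inliers > best_inliers:        # keep the model with most support
            best, best_inliers = (a, b), inliers
    return best, best_inliers

(a, b), n_in = ransac_sine(t, tau_noisy, w)
```

Because the minimal sample has only two points, each iteration is cheap, and the inlier count directly measures how much of the ITD signal a candidate sinusoid explains.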

1: Capture the ITD signal D for one full rotation of the bi-microphone array

2: Select ε and the number of iterations N, and initialize n* = 0

3: WHILE there are samples in D

4: FOR i = 1 to N DO

5: Randomly choose Δt₁ and Δt₂ from D

6: Calculate a and b using Equations (13) and (14)

7: Calculate the fitted model Δt̂(t) = a sin(ωt) + b cos(ωt)

8: Calculate n = the number of points in D fitting Δt̂(t) within ε

9: IF n > n*

10: n* = n, a* = a, and b* = b

11: END IF

12: END FOR

13: Calculate θ̂ and φ̂ using Equation (7)

14: Record (θ̂, φ̂) and the inlier count n*

15: Remove the samples on Δt̂(t) within ε from D

16: END WHILE

Algorithm 2 RANSAC-Based MSSL

The RANSAC-based MSSL method is described in Algorithm 2. It can be seen from Equation (6) that the signal generated by the self-rotating bi-microphone array is sinusoidal. Two points from the ITD signal are selected randomly and a sine wave with the given frequency (i.e., the angular speed of the rotation, ω) is generated. The inlier count n represents the number of points whose distance to the fitted sine wave is less than ε, the threshold for a point to be considered an inlier. The points in D that belong to the fitted model according to the ε condition are then removed from D. This procedure is repeated for N iterations, and the parameters a and b are updated every time the number of inliers is greater than that in the previous iterations. The process is repeated until either all the points in D are examined or the iterations are completed. The time complexity of Algorithm 2 is O(NM), where M is the number of samples in the ITD signal. After the first few iterations, most of the data points have been removed, leaving the number of remaining samples very small compared to M. The number of iterations N should be chosen large enough to ensure a high probability that at least one of the randomly selected pairs of points contains no outliers.
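The fit-record-remove outer loop can be sketched as follows (two interleaved synthetic sources; the microphone spacing, rotation rate, thresholds, and minimum-support cutoff are assumed values, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
w = 2 * np.pi / 10.0          # one rotation per 10 s (assumed)
d_over_c = 0.145 / 345.0      # assumed mic spacing over sound speed

# Interleaved ITD samples from two sources: each time instant contributes a
# point on one source's sinusoid, as in the discontinuous multi-source ITD.
t = np.linspace(0.0, 10.0, 400)
srcs = [(np.deg2rad(40.0), np.deg2rad(20.0)),     # (azimuth, elevation)
        (np.deg2rad(-100.0), np.deg2rad(50.0))]
which = rng.integers(0, 2, t.size)
tau = np.array([d_over_c * np.cos(srcs[k][1]) * np.sin(w * ti + srcs[k][0])
                for ti, k in zip(t, which)])
tau += rng.normal(0.0, 2e-6, t.size)

def fit_once(t, y, n_iter=300, eps=1e-5):
    """One RANSAC pass: best (a, b) for y ~ a*sin(w t) + b*cos(w t)."""
    best, best_n = (0.0, 0.0), 0
    for _ in range(n_iter):
        i, j = rng.choice(t.size, 2, replace=False)
        M = np.array([[np.sin(w * t[i]), np.cos(w * t[i])],
                      [np.sin(w * t[j]), np.cos(w * t[j])]])
        if abs(np.linalg.det(M)) < 1e-6:
            continue
        a, b = np.linalg.solve(M, y[[i, j]])
        n = int((np.abs(y - a * np.sin(w * t) - b * np.cos(w * t)) < eps).sum())
        if n > best_n:
            best, best_n = (a, b), n
    return best, best_n

# Fit, record, remove inliers, repeat; stop when support falls below a cutoff.
sources, counts = [], []
tt, yy = t.copy(), tau.copy()
while tt.size > 20:
    (a, b), n = fit_once(tt, yy)
    if n < 20:
        break
    sources.append((np.arctan2(b, a) % (2 * np.pi),                   # azimuth
                    np.arccos(min(1.0, np.hypot(a, b) / d_over_c))))  # elevation
    keep = np.abs(yy - a * np.sin(w * tt) - b * np.cos(w * tt)) >= 1e-5
    tt, yy = tt[keep], yy[keep]
    counts.append(n)
```

Each pass consumes one sinusoid's points, so the loop naturally terminates once only stragglers remain; `counts` then feeds the confidence thresholding described next in the text.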

The number of sound sources is determined by carefully selecting a threshold, as shown in Figure 5. The confidence in the presence of a sound source depends on the inlier count n. The source with the maximum inlier count is considered qualified with 100 % confidence and the confidence values for the other sources are calculated relative to it. A source with a confidence value less than the threshold is considered to be noise and does not qualify as a sound source. Very weak sound signals will have few or no data points in the ITD signal and will be discarded.
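The relative-confidence rule can be illustrated in a few lines (the inlier counts and the 30 % cutoff below are hypothetical):

```python
# Relative-confidence source counting: the fitted model with the most
# inliers gets 100% confidence and the rest are scored against it.
counts = [182, 175, 90, 12]        # hypothetical inliers per fitted sinusoid
threshold = 0.30                   # assumed confidence cutoff

best = max(counts)
conf = [c / best for c in counts]               # best model -> 1.0
n_sources = sum(c >= threshold for c in conf)   # candidates that qualify
```

With these numbers the fourth candidate falls well below the cutoff and is rejected as noise.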

Figure 5: Confidence in the presence of a sound source.

VI Simulation and Experimental Results

Parameter Value
Dimension 20 m × 20 m × 20 m
Reflection coefficient (walls, floor, and ceiling) 0.5
Sound speed 345 m/s
Temperature 22 °C
Static pressure 29.92 mmHg
Relative humidity 38 %
Table I: Simulated room specifications

The Audio Array Toolbox [31] is used to establish an emulated rectangular room using the image method described in [32]. The robot was placed at the origin of the room. The sound sources and the microphones are assumed omnidirectional and the attenuation of the sound is calculated per the specifications in Table I. A number of recorded speech signals available at [33] were used as sound sources to test the technique. Different numbers of sound sources were placed at various azimuth and elevation angles at a fixed distance of 5 m, and the ITD signal was recorded by the rotating bi-microphone array, whose microphones were separated by approximately the distance between the ears of a human. The sound sources were separated by a minimum angular spacing in both azimuth and elevation. The ITD value was calculated and recorded at a fixed angular increment of the rotation. Zero-mean noise was added to this ITD signal in the simulations in order to account for sensor noise. These simulations were run on a high-performance cluster named Joker [34].

Figure 6: Experimental environment and the robotic platform based on a Kobuki turtlebot 2 with two MEMS microphones.

Numerous experiments were also conducted using a robotic platform in an indoor reverberant environment, as shown in Figure 6. Figure 7 shows the impulse response of the room. The sound sources were kept at a distance of a few meters from the center of the robot.

Figure 7: Impulse response of the room reverberation showing secondary peaks representing the reflections from the floor and the walls.
Figure 8: Sound source locations in the simulation.
Figure 9: Simulation results of the mapped data points in the orientation domain.
Parameters For simulations For experiments
m m
Threshold % %
Table II: Parameters for RANSAC-Based and DBSCAN-Based MSSL
No. of MAE (Sim) MAE (Expt) Avg
source(s) (deg) (deg) (deg) (deg) (deg)
4 1.73 3.20 3.71 5.66 3.58
3 0.8 4.68 1.77 5.78 3.26
2 1.06 2.18 2.89 3.92 2.51
1 0.94 0.57 2.27 0.35 1.03
Table III: Mean absolute error (MAE) for localization performed with DBSCAN-Based MSSL in simulation (Sim) and experiments (Expt) for different number of sound sources.
No. of MAE (Sim) MAE (Expt) Avg
source(s) (deg) (deg) (deg) (deg) (deg)
4 0.88 8.12 3.92 7.20 5.03
3 2.51 5.56 2.33 5.62 4.01
2 2.55 5.01 3.07 3.88 3.63
1 1.61 1.97 2.15 3.07 2.2
Table IV: Mean absolute error (MAE) for localization performed with RANSAC-Based MSSL in simulation (Sim) and experiments (Expt) for different number of sound sources.

Figure 8 shows four sound sources in the simulation placed at different orientations at a distance of 5 m from the robot at the origin. Figure 9 shows the estimation of the parameters θ̂ and φ̂ for every two points chosen from the multi-source signal at each iteration. The parameters used for the RANSAC-based and DBSCAN-based algorithms are listed in Table II. The ε value was chosen so that all sound sources are assumed to be sufficiently separated from each other. Tables III and IV show the simulation and experimental results of localization with the number of sound sources varying from one to four.

For each case, 1000 Monte Carlo simulation runs were performed using the two proposed approaches, respectively, with the specifications given in Table I. The simulations were run with one to five sources and the source-counting results are listed in Tables V and VI.

Figure 10: Simulation results showing clusters detected by DBSCAN-Based MSSL at positions corresponding to the sound source locations.

The clustering result using DBSCAN is shown in Figure 10.

Act K / Est K 1 2 3 4 5 6 7 8 9
1 944 56 0 0 0 0 0 0 0
2 11 902 65 15 7 0 0 0 0
3 1 61 847 47 38 5 1 0 0
4 12 39 126 687 82 36 11 6 1
5 4 17 80 183 595 110 6 3 1
Table V: Estimated vs actual number of sound source count for DBSCAN-Based MSSL in the simulated environment.
Act K / Est K 1 2 3 4 5 6 7 8 9
1 1000 0 0 0 0 0 0 0 0
2 7 991 2 0 0 0 0 0 0
3 0 56 898 46 0 0 0 0 0
4 0 4 104 888 4 0 0 0 0
5 0 0 1 85 750 139 17 6 2
Table VI: Estimated vs actual number of sound source count using RANSAC-Based MSSL in the simulation environment.

Figure 11 shows the simulation result of a sample run in which three sources were detected. The orientations estimated by the RANSAC-based algorithm closely matched those of the three sound sources. Since the ITD signal is noisy, any point sufficiently close (within ε) to a fitted sinusoid was assigned to it by the RANSAC algorithm. The ε value can be chosen depending on how close the sound sources may be to each other and on the noise level; the measured ITD signal had a limited signal-to-noise ratio (SNR). For a source to be considered a qualified sound source, a confidence threshold was selected empirically, with different values used in the simulations and in the experiments.

Figure 11: Estimation of each of signal from the multi-source signal using RANSAC-Based MSSL in simulation.
Figure 12: Average of simulation and experimental localization error by the DBSCAN-Based and RANSAC-Based MSSL for different number of sound sources.
Figure 13: Error in the number of sound source identification by DBSCAN-Based and RANSAC-Based MSSL in simulation.

As shown in Figure 12, the average orientation-localization error of the DBSCAN-based algorithm is lower than that of the RANSAC-based algorithm, which, however, generates comparatively more accurate results for source counting, as shown in Figure 13. In both simulations and experiments, the error of the elevation-angle estimation was found to be large for sources kept close to zero elevation, which coincides with the conclusion in [9]. The performance of both localization and source counting using the aforementioned techniques improves significantly as the number of rotations of the bi-microphone array increases. The sound sources are assumed to be active during the rotation of the bi-microphone array, with possible pauses such as in the case of speech signals.

VII Conclusion

Two novel techniques are presented for small autonomous unmanned vehicles (SAUVs) to perform multi-sound-source localization (MSSL) using a self-rotating bi-microphone array. The DBSCAN-based MSSL approach iteratively maps randomly chosen points in the ITD signal to the orientation domain, producing a data set for clustering. Clusters are then detected using density-based spatial clustering of applications with noise (DBSCAN); the number of clusters gives the number of sound sources and the centroids of these clusters determine the source locations. The second proposed technique uses random sample consensus (RANSAC) to iteratively estimate the parameters of a sinusoidal model from two randomly chosen data points of the ITD signal, and then applies a threshold to identify the qualifying sound sources. The simulation and experimental results show the effectiveness of both approaches in identifying the number and the orientations of the sound sources.


  • [1] J. Borenstein, H. Everett, and L. Feng, Navigating mobile robots: systems and techniques.   A K Peters Ltd., 1996.
  • [2] D. V. Rabinkin, “Optimum sensor placement for microphone arrays,” Ph.D. dissertation, RUTGERS The State University of New Jersey - New Brunswick, 1998.
  • [3] M. Brandstein and D. Ward, Microphone arrays: signal processing techniques and applications.   Springer Science & Business Media, 2013.
  • [4] H. Wallach, “On sound localization,” The Journal of the Acoustical Society of America, vol. 10, no. 4, pp. 270–274, 1939.
  • [5] S. Lee, Y. Park, and Y.-s. Park, “Three-dimensional sound source localization using inter-channel time difference trajectory,” International Journal of Advanced Robotic Systems, vol. 12, no. 12, p. 171, 2015.
  • [6] A. A. Handzel and P. Krishnaprasad, “Biomimetic sound-source localization,” IEEE Sensors Journal, vol. 2, no. 6, pp. 607–616, 2002.
  • [7] G. H. Eriksen, “Visualization tools and graphical methods for source localization and signal separation,” Master’s thesis, University of Oslo, Department of Informatics, 2006.
  • [8] X. Zhong, W. Yost, and L. Sun, “Dynamic binaural sound source localization with ITD cues: Human listeners,” The Journal of the Acoustical Society of America, vol. 137, no. 4, pp. 2376–2376, 2015.
  • [9] D. Gala, N. Lindsay, and L. Sun, “Three-dimensional sound source localization for unmanned ground vehicles with a self-rotational two-microphone array,” in Proceedings of the 5th International Conference of Control, Dynamic Systems, and Robotics (CDSR’18), June 2018 (accepted).
  • [10] L. Sun and Q. Cheng, “Indoor multiple sound source localization using a novel data selection scheme,” in 48th Annual Conference on Information Sciences and Systems (CISS).   IEEE, 2014, pp. 1–6.
  • [11] X. Zhong, L. Sun, and W. Yost, “Active binaural localization of multiple sound sources,” Robotics and Autonomous Systems, vol. 85, pp. 83–92, 2016.
  • [12] C. Blandin, A. Ozerov, and E. Vincent, “Multi-source TDOA estimation in reverberant audio using angular spectra and clustering,” Signal Processing, vol. 92, no. 8, pp. 1950–1960, 2012.
  • [13] M. Swartling, B. Sällberg, and N. Grbić, “Source localization for multiple speech sources using low complexity non-parametric source separation and clustering,” Signal Processing, vol. 91, no. 8, pp. 1781–1788, 2011.
  • [14] T. Dong, Y. Lei, and J. Yang, “An algorithm for underdetermined mixing matrix estimation,” Neurocomputing, vol. 104, pp. 26–34, 2013.
  • [15] O. Yilmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking,” IEEE Transactions on signal processing, vol. 52, no. 7, pp. 1830–1847, 2004.
  • [16] D. Pavlidi, A. Griffin, M. Puigt, and A. Mouchtaris, “Real-time multiple sound source localization and counting using a circular microphone array,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2193–2206, 2013.
  • [17] B. Loesch and B. Yang, “Source number estimation and clustering for underdetermined blind source separation,” in International Workshop on Acoustic Signal Enhancement (IWAENC), 2008.
  • [18] M. C. Catalbas and S. Dobrisek, “3D moving sound source localization via conventional microphones,” Elektronika ir Elektrotechnika, vol. 23, no. 4, pp. 63–69, 2017.
  • [19] J. Traa and P. Smaragdis, “Blind multi-channel source separation by circular-linear statistical modeling of phase differences,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2013, pp. 4320–4324.
  • [20] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Kdd, vol. 96, no. 34, 1996, pp. 226–231.
  • [21] C. D. Raj, “Comparison of K means K medoids DBSCAN algorithms using DNA microarray dataset,” International Journal of Computational and Applied Mathematics (IJCAM), 2017.
  • [22] N. Farmani, L. Sun, and D. J. Pack, “A scalable multitarget tracking system for cooperative unmanned aerial vehicles,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 4, pp. 1947–1961, Aug 2017.
  • [23] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
  • [24] C. Knapp and G. Carter, “The generalized correlation method for estimation of time delay,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 4, pp. 320–327, Aug 1976.
  • [25] M. Azaria and D. Hertz, “Time delay estimation by generalized cross correlation methods,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 280–285, 1984.
  • [26] P. Naylor and N. D. Gaubitch, Speech dereverberation.   Springer Science & Business Media, 2010.
  • [27] A. Spriet, L. Van Deun, K. Eftaxiadis, J. Laneau, M. Moonen, B. Van Dijk, A. Van Wieringen, and J. Wouters, “Speech understanding in background noise with the two-microphone adaptive beamformer beam in the nucleus freedom cochlear implant system,” Ear and hearing, vol. 28, no. 1, pp. 62–72, 2007.
  • [28] D. R. Gala, A. Vasoya, and V. M. Misra, “Speech enhancement combining spectral subtraction and beamforming techniques for microphone array,” in Proceedings of the International Conference and Workshop on Emerging Trends in Technology (ICWET), 2010, pp. 163–166.
  • [29] D. R. Gala and V. M. Misra, “SNR improvement with speech enhancement techniques,” in Proceedings of the International Conference and Workshop on Emerging Trends in Technology (ICWET).   ACM, 2011, pp. 163–166.
  • [30] “International Organization for Standardization (ISO), British, European and International Standards (BSEN), Noise emitted by machinery and equipment – Rules for the drafting and presentation of a noise test code,” 12001: 1997 Acoustics.
  • [31] K. D. Donohue, “Audio array toolbox,” [Online] Available: , 2017, Dec 22.
  • [32] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
  • [33] K. D. Donohue, “Audio systems lab experimental data - single-track single-speaker speech,” [Online] Available: http:// donohue/audio/Data/audioexpdata.htm , 2018, Feb 10.
  • [34] New Mexico State University, “High performance cluster, joker,” [Online] Available: http:// , 2017, Feb 10.