Broadband beamforming with microphone arrays is a key signal processing module in many consumer electronics products, e.g., smartphones and smart speakers [1, 2, 3]. The proliferation of microphone arrays, driven by decreasing hardware cost and superior speech enhancement performance, has made broadband beamforming a ubiquitous embedded technology, and its performance has a critical impact on the overall system.
A key requirement for broadband beamforming is to deliver consistent performance across several octaves of frequencies, e.g., Hz - kHz in the voiceband case. In most microphone array systems, the objective is speech enhancement rather than mere signal detection, as in the narrowband case. This poses hardware and algorithmic challenges in the design of microphone arrays and the underlying beamforming procedure. Filter-and-Sum (F&S) has been a standard approach for designing a broadband beamformer as an extension of a narrowband beamformer, by stitching together frequency-domain coefficients computed with narrowband beamforming techniques. Several narrowband beamforming techniques, with different objectives and assumptions, have become standard design techniques, e.g., Delay-and-Sum (D&S) [5, 2], Minimum-Variance-Distortionless-Response (MVDR) [6, 7], and subspace methods.
In this work, we do not address a particular beamformer design algorithm. Rather, the emphasis is on the acoustic modeling, which is common to all these techniques. Without loss of generality, an MVDR-based F&S beamformer with a robustness constraint is used as a case study for our analysis. At a given frequency and look-direction, the two key design parameters in almost all beamforming algorithms are the steering vector and the spatial coherence matrix. Proper design of these two parameters is the subject of this work.
In far-field models, the acoustic wave is usually approximated by plane waves, and the steering vector at the direction/frequency of a plane wave is defined as the acoustic pressure observed at the different microphones when the plane wave impinges on the microphone array. Near-field steering vectors can similarly be approximated by acoustic spherical waves. In general, the observed wavefield is the superposition of the incident wavefield and the scattered wavefield. A typical approximation of the steering vector is the free-field approximation, which assumes sound propagation in free field (at the speed of sound in air) and considers only the incident wavefield. This approximation is used almost universally in the microphone array literature because it yields closed-form formulae that simplify beamformer analysis. The main issue with the free-field approximation is that it ignores the impact of the device surface on the observed acoustic pressure, i.e., the scattered wavefield. As will be shown, this impact can significantly change the behavior of the microphone array at certain frequencies and angles.
A possible remedy is to rely on anechoic lab measurements to quantify the device response to incident waves. However, this is a time-consuming and costly solution, and imperfect experimental settings can lead to noticeable modeling errors, especially in near-field cases. In this work, we describe a simulation-based approach for acoustic modeling of microphone arrays on rigid surfaces, which solves the Helmholtz wave equation using the Finite Element Method (FEM) with a background wavefield that matches the incident wavefield. Prior works that studied the impact of the scattered field on microphone arrays used spherical harmonic decomposition for specific form-factors (e.g., sphere, cylinder) [10, 11, 12] (and references therein). However, these methods are restrictive in the choice of device form-factors (e.g., they do not cover modern smart-speaker form-factors) and of beamforming techniques. In comparison, the FEM method proposed in this paper provides three notable contributions: (i) a methodology to compute the steering vector for microphone arrays mounted on solid hard surfaces without the need for expensive anechoic chamber measurements, (ii) the ability to design any type of beamformer that relies on steering vectors, including the MVDR beamformer, the linearly constrained minimum variance (LCMV) beamformer [13, 14], and the polynomial beamformer, and (iii) extension of the proposed method to the generic device form-factors used for smart speakers.
The following notation is used throughout the paper. A bold lower-case letter denotes a column vector, while a bold upper-case letter denotes a matrix. $\mathbf{A}^T$ and $\mathbf{A}^H$ denote the transpose and conjugate transpose, respectively, of $\mathbf{A}$, and $A_{ij}$ is the matrix entry at position $(i,j)$. $(\theta,\phi)$ denote the polar and azimuth angles, respectively, in a spherical coordinate system. $M$ always refers to the number of microphones. $\boldsymbol{\Gamma}(\omega)$ denotes the noise coherence matrix, of size $M \times M$, at frequency $\omega$ (the dependency on $\omega$ is dropped whenever it is clear from the context). Additional notation is introduced when needed.
2.1 Wave Equation
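The acoustic wave equation is the governing equation for the propagation of sound waves, as small perturbations about equilibrium, in elastic fluids, e.g., air. The homogeneous wave equation has the form
$$\nabla^2 p(\mathbf{r},t) - \frac{1}{c^2}\frac{\partial^2 p(\mathbf{r},t)}{\partial t^2} = 0, \tag{1}$$
where $p(\mathbf{r},t)$ is the acoustic pressure at position $\mathbf{r}$ and time $t$, and $c$ is the speed of sound in the medium. In this work, we consider only the practical case of a homogeneous fluid with no viscosity.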
In practice, the wave equation is usually solved in the frequency domain, using the Helmholtz equation to find $P(\mathbf{r},\omega)$:
$$\nabla^2 P(\mathbf{r},\omega) + k^2 P(\mathbf{r},\omega) = 0, \tag{2}$$
where $k = \omega/c$ is the wave number. At steady state, the time-domain and frequency-domain solutions are Fourier pairs. In our modeling, we work only with the homogeneous Helmholtz equation under various boundary conditions. The boundary conditions are determined by the geometry and the acoustic impedance of the different boundaries. We assume the device has a rigid surface, which is therefore modeled as a sound-hard boundary (zero normal derivative of the total pressure).
2.2 Beamforming Strategies
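Beamforming is a microphone-array signal processing technique that emphasizes the user's speech from a desired look-direction (LD) while suppressing interference from other directions. The microphone signals are processed such that signals arriving from the look-direction are combined in phase, while signals arriving from other directions are combined out of phase. Denote the position of the $m$-th microphone by $\mathbf{r}_m$, and the signal acquired at the $m$-th microphone at frequency $\omega$ by $X_m(\omega)$. The signal acquired by the microphone array can then be expressed as
$$\mathbf{x}(\omega) = \left[X_1(\omega), X_2(\omega), \ldots, X_M(\omega)\right]^T. \tag{3}$$
Denoting the spectrum of the desired source signal by $S(\omega)$ and the ambient noise captured by the microphone array by $\mathbf{n}(\omega)$, we can express $\mathbf{x}(\omega)$ as
$$\mathbf{x}(\omega) = \mathbf{d}(\omega,\theta,\phi)\,S(\omega) + \mathbf{n}(\omega), \tag{4}$$
where $\mathbf{d}(\omega,\theta,\phi)$ is the frequency- and angle-dependent steering vector.
The beamformer design involves computing complex-valued weights for each frequency and microphone, denoted by $\mathbf{w}(\omega) = \left[W_1(\omega), \ldots, W_M(\omega)\right]^T$, which are then applied to $\mathbf{x}(\omega)$ to obtain the beamformer output $Y(\omega)$:
$$Y(\omega) = \mathbf{w}^H(\omega)\,\mathbf{x}(\omega) = \sum_{m=1}^{M} W_m^*(\omega)\,X_m(\omega). \tag{5}$$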
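The signal model and the beamformer output can be evaluated at a single frequency bin in a few lines of NumPy. This is a minimal sketch that assumes a free-field plane-wave steering vector with the phase convention $e^{jk\,\mathbf{u}^T\mathbf{r}_m}$, where $\mathbf{u}$ is a unit vector pointing toward the source; the array geometry, source spectrum, noise level, and delay-and-sum weights are hypothetical choices for illustration only.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in air [m/s]


def unit_vector(theta, phi):
    """Unit vector pointing toward the source (theta: polar, phi: azimuth, radians)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def free_field_steering(freq_hz, mic_pos, theta, phi, c=C_SOUND):
    """Free-field steering vector d(omega, theta, phi) with unit-magnitude entries, shape (M,)."""
    k = 2 * np.pi * freq_hz / c
    return np.exp(1j * k * (mic_pos @ unit_vector(theta, phi)))


rng = np.random.default_rng(0)
# Hypothetical 4-microphone linear array with 3 cm spacing along the x axis.
mic_pos = np.array([[0.00, 0.0, 0.0],
                    [0.03, 0.0, 0.0],
                    [0.06, 0.0, 0.0],
                    [0.09, 0.0, 0.0]])
M = mic_pos.shape[0]
freq, theta0, phi0 = 1000.0, np.pi / 2, 0.0

d = free_field_steering(freq, mic_pos, theta0, phi0)
S = 1.0 + 0.5j                                            # source spectrum at this bin
n = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
x = d * S + n                                             # signal model: x = d S + n

w = d / M                                                 # delay-and-sum weights: w^H d = 1
Y = np.vdot(w, x)                                         # Y = w^H x (np.vdot conjugates w)
```

Because every entry of a free-field steering vector has unit magnitude, the delay-and-sum weights $\mathbf{d}/M$ satisfy the distortionless condition $\mathbf{w}^H\mathbf{d} = 1$; for a total-field steering vector with unequal magnitudes, the analogous normalization would be $\mathbf{d}/(\mathbf{d}^H\mathbf{d})$.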
We are interested in using FEM modeling for the design of F&S beamformers that can be expressed as a constrained optimization problem whose solution provides the optimal beamformer filters. This covers various F&S beamformers, such as the MVDR, maximum-SNR, and LCMV beamformers. In this work, we use, without loss of generality, the MVDR beamformer with a robustness constraint to present our analysis.
2.3 Beamforming Metrics
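We use three metrics to assess the performance: array gain (AG), white noise gain (WNG), and microphone array channel capacity (MACC). The AG metric is defined as the improvement in signal-to-noise ratio (SNR) offered by the beamformer, i.e., $G_a = \mathrm{SNR}_{\mathrm{out}}/\mathrm{SNR}_{\mathrm{in}}$. After some algebraic manipulations, one can show that
$$G_a(\omega) = \frac{\left|\mathbf{w}^H(\omega)\,\mathbf{d}(\omega,\theta_0,\phi_0)\right|^2}{\mathbf{w}^H(\omega)\,\boldsymbol{\Gamma}(\omega)\,\mathbf{w}(\omega)}, \tag{6}$$
where $(\theta_0,\phi_0)$ denotes the look-direction, and $\boldsymbol{\Gamma}$ is the normalized noise correlation matrix with entries
$$\Gamma_{ij}(\omega) = \frac{1}{\sigma^2}\int_0^{2\pi}\!\!\int_0^{\pi} \Phi(\theta,\phi)\, d_i(\omega,\theta,\phi)\, d_j^*(\omega,\theta,\phi)\,\sin\theta\, d\theta\, d\phi, \tag{7}$$
where $\Phi(\theta,\phi)$ denotes the distribution of noise power as a function of $\theta$ and $\phi$, and
$$\sigma^2 = \int_0^{2\pi}\!\!\int_0^{\pi} \Phi(\theta,\phi)\,\sin\theta\, d\theta\, d\phi. \tag{8}$$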
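The coherence matrix can be evaluated numerically from any dictionary of steering vectors sampled over the sphere. The sketch below assumes the normalized form $\Gamma_{ij} = \sigma^{-2}\iint \Phi\, d_i d_j^* \sin\theta\, d\theta\, d\phi$ with $\sigma^2 = \iint \Phi \sin\theta\, d\theta\, d\phi$, and uses Gauss-Legendre quadrature in $\cos\theta$ with a uniform grid in $\phi$ (a choice made here, not necessarily the grid used in the paper). With $\Phi = 1$ and free-field steering vectors, it should reproduce the well-known spherically diffuse coherence $\sin(k d_{ij})/(k d_{ij})$, which serves as a sanity check. Function names and microphone positions are hypothetical.

```python
import numpy as np


def sphere_quadrature(n_theta=32, n_phi=64):
    """Directions and weights for integrating over the unit sphere.

    Gauss-Legendre nodes in cos(theta) and a uniform grid in phi; the weights
    already include the sin(theta) dtheta dphi surface element.
    """
    x, wx = np.polynomial.legendre.leggauss(n_theta)   # x = cos(theta) in [-1, 1]
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    theta_grid, phi_grid = np.meshgrid(np.arccos(x), phi, indexing="ij")
    weights = np.outer(wx, np.full(n_phi, 2 * np.pi / n_phi))
    return theta_grid.ravel(), phi_grid.ravel(), weights.ravel()


def coherence_matrix(steering, power, weights):
    """Normalized noise coherence matrix (assumed form of the AG normalization).

    steering: (Q, M) steering vectors sampled at the Q quadrature directions
    power:    (Q,) noise power distribution Phi(theta, phi)
    weights:  (Q,) quadrature weights
    """
    wp = weights * power
    sigma2 = wp.sum()                                      # total noise power
    return (steering.T * wp) @ steering.conj() / sigma2    # sum_q wp_q d_i d_j^* / sigma2


# Sanity check with hypothetical microphones and free-field steering vectors.
c, freq = 343.0, 2000.0
k = 2 * np.pi * freq / c
mics = np.array([[0.0, 0.0, 0.0], [0.04, 0.0, 0.0], [0.0, 0.03, 0.02]])
theta, phi, wq = sphere_quadrature()
u = np.stack([np.sin(theta) * np.cos(phi),
              np.sin(theta) * np.sin(phi),
              np.cos(theta)], axis=1)                      # (Q, 3) directions
D = np.exp(1j * k * (u @ mics.T))                          # (Q, M) free-field dictionary
Gamma = coherence_matrix(D, np.ones_like(wq), wq)          # Phi = 1: spherically diffuse

dist = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=-1)
Gamma_ref = np.sinc(k * dist / np.pi)                      # sin(kd)/(kd); np.sinc is normalized
```

The same `coherence_matrix` call applies unchanged to a total-field dictionary (for example, FEM-simulated steering vectors on the same directions), which is where the closed-form sinc expression no longer holds.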
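The WNG metric is the SNR improvement provided by the beamformer when the noise components at the microphones are spatially white, i.e., uncorrelated with equal power ($\boldsymbol{\Gamma} = \mathbf{I}$):
$$G_w(\omega) = \frac{\left|\mathbf{w}^H(\omega)\,\mathbf{d}(\omega,\theta_0,\phi_0)\right|^2}{\mathbf{w}^H(\omega)\,\mathbf{w}(\omega)}. \tag{9}$$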
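Both metrics reduce to ratios of quadratic forms, $G_a = |\mathbf{w}^H\mathbf{d}|^2/(\mathbf{w}^H\boldsymbol{\Gamma}\mathbf{w})$ and $G_w = |\mathbf{w}^H\mathbf{d}|^2/(\mathbf{w}^H\mathbf{w})$. A minimal sketch, assuming $\boldsymbol{\Gamma}$ and $\mathbf{d}$ are already available; the function names and the example steering vector are hypothetical. For delay-and-sum weights on a unit-magnitude steering vector, the WNG equals $M$, which gives a convenient check.

```python
import numpy as np


def array_gain(w, d, Gamma):
    """AG: |w^H d|^2 / (w^H Gamma w)."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, Gamma @ w))


def white_noise_gain(w, d):
    """WNG: |w^H d|^2 / (w^H w), i.e. the AG with Gamma = I."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))


def to_db(x):
    return 10.0 * np.log10(x)


# Delay-and-sum weights on a hypothetical unit-magnitude (free-field) steering vector.
M = 5
d = np.exp(1j * np.linspace(0.0, 2.0, M))
w = d / M
wng_db = to_db(white_noise_gain(w, d))          # 10*log10(M) for delay-and-sum
ag_white = array_gain(w, d, np.eye(M))          # equals the WNG when Gamma = I
```

By the Cauchy-Schwarz inequality, the WNG can never exceed $\mathbf{d}^H\mathbf{d}$, which equals $M$ in the free-field case; this upper bound matters when choosing a WNG constraint for robust designs.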
The MACC metric aims at providing a characterization of the microphone array that is independent of the beamformer realization. It is analogous to the MIMO channel capacity in wireless communication. If the source location is known, then the MACC is defined as
is the singular value decomposition of, and is the input power.
3 Acoustic Modeling
3.1 Acoustic Plane-Waves
Acoustic plane waves are a powerful tool for analyzing the wave equation, and they provide a good approximation of the wavefield emanating from a far-field point source. The acoustic pressure of a plane wave with wave vector $\mathbf{k}$ is defined at a point $\mathbf{r}$ in 3D space as
$$P_{\mathrm{in}}(\mathbf{r}) = e^{j\mathbf{k}^T\mathbf{r}}. \tag{11}$$
This is an exact solution of the homogeneous Helmholtz equation (2) whenever $\|\mathbf{k}\| = k$, and it approximates the solution of the inhomogeneous Helmholtz equation with a distant point source (note that, for a given $k$ in the Helmholtz equation, there are two degrees of freedom in choosing $\mathbf{k}$, namely its direction). Further, a general solution to the homogeneous Helmholtz equation can be approximated by a linear superposition of plane waves of different angles [17, 19, 20, 21]. These properties make acoustic plane waves a key tool in designing far-field beamformers for microphone arrays, where the microphone array response to each plane wave provides a sufficient set for the beamformer design.
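The dispersion condition $\|\mathbf{k}\| = k$ is easy to confirm numerically. The sketch below evaluates the Laplacian of the unit plane wave $e^{j\mathbf{k}^T\mathbf{r}}$ in (11) with a second-order central difference and reports the relative Helmholtz residual $|\nabla^2 P + k^2 P|/|k^2 P|$; the frequency, direction, test point, and step size are arbitrary assumptions.

```python
import numpy as np


def plane_wave(r, kvec):
    """Unit-amplitude plane wave P(r) = exp(j k^T r)."""
    return np.exp(1j * np.dot(kvec, r))


def helmholtz_residual(kvec, k, r, h=1e-4):
    """Central-difference estimate of |lap P + k^2 P| / |k^2 P| at point r."""
    lap = 0.0
    for axis in range(3):
        e = np.zeros(3)
        e[axis] = h
        lap += (plane_wave(r + e, kvec) - 2 * plane_wave(r, kvec)
                + plane_wave(r - e, kvec)) / h ** 2
    p = plane_wave(r, kvec)
    return abs(lap + k ** 2 * p) / abs(k ** 2 * p)


k = 2 * np.pi * 1000.0 / 343.0                  # wave number at 1 kHz in air
theta, phi = 0.7, 1.9                           # the two free direction parameters
direction = np.array([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)])
r0 = np.array([0.01, -0.02, 0.03])

res_ok = helmholtz_residual(k * direction, k, r0)         # ||k_vec|| = k: ~0
res_bad = helmholtz_residual(1.2 * k * direction, k, r0)  # ||k_vec|| != k: ~0.44
```

Only the length of the wave vector is constrained; any direction gives a valid solution, which is the two-degree-of-freedom remark above.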
When an incident plane wave impinges on the device, the total wavefield at each microphone of the array has the general form
$$P_{\mathrm{t}}(\mathbf{r}) = P_{\mathrm{in}}(\mathbf{r}) + P_{\mathrm{s}}(\mathbf{r}), \tag{12}$$
where $P_{\mathrm{in}}$ is the incident plane wave in (11), and $P_{\mathrm{t}}$ and $P_{\mathrm{s}}$ refer to the total and scattered wavefields, respectively. The total wavefield $P_{\mathrm{t}}$ at each microphone is computed by inserting (12) in the Helmholtz equation (2) and solving for $P_{\mathrm{s}}$ with appropriate boundary conditions. The details of this modeling are described in Section 3.2. It is evident from (11) that an incident plane wave has unit magnitude everywhere and thus carries no magnitude information; it is fully parameterized by its phase. This is not true for the scattered wavefield, which represents the reflections and diffractions due to the rigid device surface. The magnitude information in $P_{\mathrm{s}}$ is critical in resolving phase ambiguities due to the microphone array geometry.
If the microphone array is composed of discrete microphones in space, and the area of each microphone is much smaller than the wavelength, then a reasonable approximation is to set the scattered field $P_{\mathrm{s}} = 0$ in (12). This is referred to as the free-field approximation. In this case, the total wavefield $P_{\mathrm{t}}$ is fully determined by the wave vector $\mathbf{k}$ in (11) and the coordinates of each microphone. Clearly, the free-field approximation is not accurate if the microphone array is mounted on a rigid surface. Nevertheless, this approximation has been used almost universally in the literature for acoustic modeling in beamformer design. In the following sections, we show that the free-field approximation does not provide a good approximation of the total field in important practical cases.
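The phase-ambiguity argument can be made concrete. Under the free-field approximation, the steering vector depends only on the wave vector and the microphone coordinates, so for a hypothetical planar array (all microphones at $z = 0$) a source at polar angle $\theta$ and its mirror image at $\pi - \theta$ produce identical steering vectors; only a scattered field from a device body can break this symmetry. The geometry, frequency, and phase convention below are illustrative assumptions, not the array studied later.

```python
import numpy as np


def free_field_steering(k, mic_pos, theta, phi):
    """Free-field steering vector: incident plane wave only (scattered field set to 0)."""
    u = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
    return np.exp(1j * k * (mic_pos @ u))


k = 2 * np.pi * 3000.0 / 343.0
# Hypothetical planar array: a center microphone plus four on a 3 cm circle, all at z = 0.
ang = np.arange(4) * np.pi / 2
mic_pos = np.vstack([[0.0, 0.0, 0.0],
                     np.column_stack([0.03 * np.cos(ang), 0.03 * np.sin(ang), np.zeros(4)])])

d_up = free_field_steering(k, mic_pos, np.deg2rad(40), np.deg2rad(25))     # above the plane
d_down = free_field_steering(k, mic_pos, np.deg2rad(140), np.deg2rad(25))  # mirrored below
```

Both vectors have unit-magnitude entries and coincide exactly, so no beamformer built on them can distinguish the two directions.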
3.2 FEM Modeling
The modeling objective is to compute the total sound field in (12) at each microphone when the device is impinged upon by a plane wave. This resembles a physical measurement in an anechoic room with a distant point source. FEM is one of the standard approaches for solving the Helmholtz equation numerically. In our case, we need to solve the Helmholtz equation for the total wavefield at all frequencies of interest with a background plane wave. The device surface is modeled as a sound-hard boundary. A microphone is modeled as a point receiver on the surface if its area is much smaller than the wavelength; otherwise, its response is computed as the integral of the acoustic pressure over its area. To obtain a true background plane wave, the external boundary should be open and non-reflecting. In our model, the device is enclosed by a closed boundary, e.g., a cylindrical or spherical surface. To mimic an open-ended boundary, there are two choices: (i) a matched boundary, whose impedance is matched to the air impedance at the frequency of interest, and (ii) a perfectly matched layer, which defines a special absorbing domain that eliminates reflections and refractions back into the internal domain that encloses the device. A comparison of the two approaches is beyond the scope of this paper. The FEM solves for the total field $P_{\mathrm{t}}$ in (12), which is equivalent to solving only for the scattered field $P_{\mathrm{s}}$ after inserting the background plane-wave model (11) in the Helmholtz equation. The acoustics module of the COMSOL Multiphysics package is used for this numerical FEM solution, and the simulation is validated against exact and measured results on different form-factors. For example, Fig. 1 shows the total pressure field at two microphones on a spherical surface for both the analytical and the simulated solution; both the amplitude and phase responses closely match the analytical solution. Further, Fig. 2 shows an example of simulated and measured acoustic pressure for a rectangular microphone array mounted on a slanted cube. The plot shows the inter-channel response, i.e., $P_{\mathrm{t},m}(\omega)/P_{\mathrm{t},m_0}(\omega)$, where $m_0$ is a reference microphone. The phase difference between the simulated and measured responses is linear in frequency, which is expected when the device positions in the two cases are not perfectly aligned, since a small misalignment changes the relative propagation delays between microphones. For more comparisons between simulated/theoretical and measured acoustic pressure responses, one may refer to [25, 26].
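Fig. 1 compares the FEM result with an analytical solution on a sphere. One classical analytical result of this kind is the modal-series surface pressure on a rigid sphere; whether this exact expression underlies Fig. 1 is an assumption. The SciPy sketch below uses a unit incident amplitude, the $e^{-i\omega t}$ time convention (which may differ from the paper's), and a heuristic truncation. It also illustrates the point of Section 3.1: at low $ka$ the total pressure has essentially unit magnitude, as in the free-field model, while at high $ka$ the magnitude approaches pressure doubling on the side facing the source.

```python
import numpy as np
from scipy.special import eval_legendre, spherical_jn, spherical_yn


def rigid_sphere_surface_pressure(ka, gamma, n_extra=20):
    """Total surface pressure on a rigid sphere for a unit-amplitude plane wave.

    Modal series P_t = (i/(ka)^2) * sum_n (2n+1) i^n P_n(cos gamma) / h_n'(ka),
    exp(-i*omega*t) convention, outgoing h_n = j_n + i*y_n. gamma is the angle
    between the surface point and the propagation direction (gamma = pi faces the source).
    """
    n_max = int(np.ceil(ka)) + n_extra                      # heuristic truncation
    cos_g = np.cos(np.asarray(gamma, dtype=float))
    total = np.zeros(cos_g.shape, dtype=complex)
    for n in range(n_max + 1):
        dh = (spherical_jn(n, ka, derivative=True)
              + 1j * spherical_yn(n, ka, derivative=True))  # h_n'(ka)
        total += (2 * n + 1) * (1j ** n) * eval_legendre(n, cos_g) / dh
    return 1j * total / ka ** 2


angles = np.array([0.0, np.pi / 2, np.pi])                 # shadow, grazing, facing the source
p_low = rigid_sphere_surface_pressure(0.01, angles)         # ka << 1: sphere nearly transparent
p_high = rigid_sphere_surface_pressure(10.0, angles)       # ka >> 1: strong scattering
```

The angle-dependent magnitude of `p_high` is exactly the information that the free-field model discards.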
Note that the above procedure is not limited to plane waves, as we only need to specify the background pressure field, which could be, for example, a spherical wave for near-field modeling. The procedure is repeated over a grid of frequencies and incident angles to build a dictionary of total pressure responses that is used in the subsequent analysis.
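A one-dimensional toy version of the scattered-field formulation in Section 3.2 makes the roles of the two boundaries concrete: a sound-hard wall at $x = 0$ plays the device, a unit background plane wave travels toward it, and a matched-impedance outer boundary absorbs the outgoing scattered wave. This is a sketch with linear finite elements, not the COMSOL setup used in the paper; the domain length, element count, and $e^{-i\omega t}$ time convention are illustrative assumptions. The exact total field is $2\cos(kx)$, i.e., pressure doubling at the wall.

```python
import numpy as np


def fem_scattered_1d(k, length, n_el):
    """Linear-FEM solution of p_s'' + k^2 p_s = 0 on [0, length] (scattered-field form).

    exp(-i*omega*t) convention, background field p_in = exp(-i k x) travelling toward x = 0.
    x = 0:      sound-hard wall, d(p_in + p_s)/dx = 0  ->  p_s'(0) = i k
    x = length: matched impedance (outgoing wave),        p_s'(L) = i k p_s(L)
    """
    h = length / n_el
    n_nodes = n_el + 1
    A = np.zeros((n_nodes, n_nodes), dtype=complex)
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h        # element stiffness matrix
    me = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0     # element mass matrix
    for e in range(n_el):
        idx = np.array([e, e + 1])
        A[np.ix_(idx, idx)] += ke - k ** 2 * me
    A[-1, -1] -= 1j * k                                    # matched-boundary term
    b = np.zeros(n_nodes, dtype=complex)
    b[0] = -1j * k                                         # hard-wall term from p_s'(0) = i k
    x = np.linspace(0.0, length, n_nodes)
    return x, np.linalg.solve(A, b)


k = 2 * np.pi                                   # wavelength of 1 (arbitrary units)
x, p_s = fem_scattered_1d(k, length=2.0, n_el=400)
p_in = np.exp(-1j * k * x)                      # background (incident) field
p_t = p_in + p_s                                # total field; exact value is 2 cos(k x)
```

Replacing the matched boundary with a second hard wall (dropping the `A[-1, -1]` term) would trap the scattered wave and produce a spurious standing wave, which is why the outer boundary must be non-reflecting.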
4 Analysis of Free-Space Beamforming
To illustrate the benefits of FEM modeling, we use the MVDR beamformer with a robustness constraint, formulated as a constrained convex optimization problem:
$$\min_{\mathbf{w}}\ \mathbf{w}^H\boldsymbol{\Gamma}\mathbf{w} \quad \text{subject to} \quad \mathbf{w}^H\mathbf{d}(\omega,\theta_0,\phi_0) = 1, \quad \mathbf{w}^H\mathbf{w} \le \frac{1}{\gamma}, \tag{13}$$
where the first constraint is the distortionless constraint, and the second constraint is the WNG constraint, which imposes robustness on the beamformer design and is controlled through $\gamma$; given the distortionless constraint, it is equivalent to requiring a WNG of at least $\gamma$. Further, the WNG constraint enables a fairer comparison between the total-field and free-field beamformer designs because the WNG is bounded in both cases. Without loss of generality, we assume a spherically diffuse noise field. The optimization problem in (13) is solved using a convex optimization solver to obtain the beamformer weights $\mathbf{w}(\omega)$. Note that the proposed FEM-based method can be similarly extended to other beamformer designs, such as the standard MVDR, LCMV, and polynomial beamformers.
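The robust MVDR problem above (minimize $\mathbf{w}^H\boldsymbol{\Gamma}\mathbf{w}$ subject to $\mathbf{w}^H\mathbf{d} = 1$ and $\mathbf{w}^H\mathbf{w} \le 1/\gamma$) is solved in the text with a generic convex solver. As a lighter-weight alternative, the sketch below uses the classical diagonal-loading form of its solution (see, e.g., Cox et al. [7]): the optimum is $\mathbf{w} = (\boldsymbol{\Gamma}+\mu\mathbf{I})^{-1}\mathbf{d} / \big(\mathbf{d}^H(\boldsymbol{\Gamma}+\mu\mathbf{I})^{-1}\mathbf{d}\big)$ for some loading $\mu \ge 0$, and since the WNG increases monotonically with $\mu$, the smallest feasible loading can be found by bisection. This is a substitute technique, not the solver used in the paper; the function names, the search interval, and the example geometry are hypothetical.

```python
import numpy as np


def loaded_mvdr(Gamma, d, mu):
    """MVDR with diagonal loading: w = (Gamma + mu I)^-1 d / (d^H (Gamma + mu I)^-1 d)."""
    g = np.linalg.solve(Gamma + mu * np.eye(len(d)), d)
    return g / np.vdot(d, g)


def wng(w, d):
    """White noise gain |w^H d|^2 / (w^H w)."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))


def robust_mvdr(Gamma, d, wng_min_db, mu_max=1e6, iters=100):
    """Distortionless MVDR with WNG >= wng_min_db, via bisection on the loading level mu.

    Assumes Gamma is Hermitian positive definite. The constraint is feasible only if
    the threshold does not exceed d^H d (the WNG of w = d / (d^H d)).
    """
    gamma = 10.0 ** (wng_min_db / 10.0)
    if gamma > np.real(np.vdot(d, d)):
        raise ValueError("infeasible WNG constraint: threshold exceeds d^H d")
    w = loaded_mvdr(Gamma, d, 0.0)
    if wng(w, d) >= gamma:
        return w                                  # constraint inactive: plain MVDR
    lo, hi = 0.0, mu_max                          # assumes WNG(mu_max) >= gamma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if wng(loaded_mvdr(Gamma, d, mid), d) >= gamma:
            hi = mid
        else:
            lo = mid
    return loaded_mvdr(Gamma, d, hi)


# Example: hypothetical 5-microphone planar array in a spherically diffuse field.
c, freq = 343.0, 500.0
k = 2 * np.pi * freq / c
ang = np.arange(4) * np.pi / 2
mics = np.vstack([[0.0, 0.0, 0.0],
                  np.column_stack([0.03 * np.cos(ang), 0.03 * np.sin(ang), np.zeros(4)])])
dist = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=-1)
Gamma = np.sinc(k * dist / np.pi)                          # free-field diffuse coherence
d = np.exp(1j * k * (mics @ np.array([1.0, 0.0, 0.0])))    # look direction along +x
w = robust_mvdr(Gamma, d, wng_min_db=0.0)
```

Working with the loading level avoids calling a general-purpose solver per frequency bin; a convex-optimization package remains the more flexible choice when further constraints, such as LCMV-type constraints, are added.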
4.1 Analysis Methodology
The MVDR solution can be obtained from (13) by using $\mathbf{d}_{\mathrm{FF}}$ and $\mathbf{d}_{\mathrm{TF}}$ as the steering vectors for the free-field (FF) and total-field (TF) cases, respectively. To compute $\mathbf{d}_{\mathrm{TF}}$, we use analytical methods for canonical device shapes, such as the finite cylinder and the sphere. For a general device shape, the FEM tool is used to simulate the steering vectors on a uniform grid of azimuth and polar angles. Then, $\boldsymbol{\Gamma}$ is numerically computed from (7) and (8), with $\Phi(\theta,\phi) = 1$ for the spherically diffuse noise field.
We now compare the performance of the MVDR beamformer designed under the FF and TF assumptions. For our study, we use the setup in Fig. 3, which has five microphones on the top of a cylinder of height 130 mm and diameter 70 mm; the top surface of the cylinder has a spherically curved shape. No closed-form solution of the Helmholtz equation exists for this surface, which necessitates the proposed FEM method. The origin of the coordinate system coincides with the center microphone, with the $z$ axis pointing upwards and the $x$-$y$ plane parallel to the bottom face of the cylinder. The coordinates of the microphones are: , where mm and mm. Lastly, we set dB.
We evaluate the microphone array metrics under Free Field (FF) and Total Field (TF) setups for the array in Fig. 3 at two arrival angles: and . The results are summarized in Figs. 4-6 for the three performance metrics.
At the first arrival angle, the TF case is slightly better in AG and MACC, and its WNG is also better than that of the FF case. This is explained by noting that the steering vectors in the TF case vary in both phase and amplitude across microphones, whereas the FF steering vectors vary only in phase. The amplitude variations increase the spatial diversity in the TF case, which can be exploited to improve the spatial directivity of the beamformer. At the second arrival angle, the WNG of the TF case is better than that of the FF case over the full frequency range, and the AG of the TF case is noticeably better at all frequencies and significantly better above 2 kHz. Note that the WNG curves are lower-bounded by the threshold $\gamma$ because of the WNG constraint in the robust MVDR design. Note also that, in all cases, the MACC of the TF case is noticeably better than that of the FF case, because the FF model ignores the magnitude information, which provides an invaluable characterization of the look direction.
The large deviation of the FF performance at the second arrival angle is attributed to the magnitude of the scattered wavefield, which is ignored in the FF case. This is illustrated in Fig. 7, which shows the magnitude of the scattered wavefield at the five microphones for both angles (the background plane wave has the same magnitude in both cases). The scattered wavefield at the second angle is noticeably stronger than at the first, especially at high frequencies, and this is clearly reflected in the corresponding AG, WNG, and MACC behavior. This significant deviation of the free-field case demonstrates the limitation of this model and the necessity of incorporating the scattered-field component, through FEM modeling, in beamformer design.
5 Conclusion and Future Work
The free-field model does not provide accurate modeling for broadband beamformer design, especially when the scattered wavefield is significant. Therefore, designing beamformers based on free-field modeling can result in suboptimal performance. To mitigate this issue, we described a simulation-based framework for modeling the total wavefield, which is shown to noticeably improve the beamformer design. The model applies to any device surface, and it can be used for both near-field and far-field modeling by computing the steering vectors of spherical and plane waves, respectively. Future work will use the results of this work to develop novel design techniques for broadband beamformers and generic form-factors based on this realistic microphone array modeling. Additionally, we expand the array processing metrics and show a close match between simulated and measured beampatterns for our proposed method.
[1] Amit Chhetri, Philip Hilmes, Trausti Kristjansson, Wai Chu, Mohamed Mansour, Xiaoxue Li, and Xianxian Zhang, “Multichannel Audio Front-End for Far-Field Automatic Speech Recognition,” in 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, 2018, pp. 1527–1531.
[2] Michael Brandstein and Darren Ward, Microphone arrays: signal processing techniques and applications, Springer Science & Business Media, 2013.
[3] Jacob Benesty, Jingdong Chen, and Yiteng Huang, Microphone array signal processing, vol. 1, Springer Science & Business Media, 2008.
[4] Otis Lamont Frost, “An algorithm for linearly constrained adaptive array processing,” Proceedings of the IEEE, vol. 60, no. 8, pp. 926–935, 1972.
[5] Dan E Dudgeon, “Fundamentals of digital array processing,” Proceedings of the IEEE, vol. 65, no. 6, pp. 898–904, 1977.
[6] Jack Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proceedings of the IEEE, vol. 57, no. 8, pp. 1408–1418, 1969.
[7] Henry Cox, Robert Zeskind, and Mark Owen, “Robust adaptive beamforming,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 10, pp. 1365–1376, 1987.
[8] Heinrich Kuttruff, Room acoustics, CRC Press, fourth edition, 2000.
[9] Stig Larsson and Vidar Thomée, Partial differential equations with numerical methods, vol. 45, Springer Science & Business Media, 2008.
[10] Dmitry N Zotkin, Nail A Gumerov, and Ramani Duraiswami, “Incident field recovery for an arbitrary-shaped scatterer,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 451–455.
[11] Heinz Teutsch, Modal array signal processing: principles and applications of acoustic wavefield decomposition, vol. 348, Springer, 2007.
[12] Boaz Rafaely, Fundamentals of spherical array processing, vol. 8, Springer, 2015.
[13] H. L. Van Trees, Optimum Array Processing, Wiley, New York, 2002.
[14] Edwin Mabande, Adrian Schad, and Walter Kellermann, “Design of robust superdirective beamformers as a convex optimization problem,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. IEEE, 2009, pp. 77–80.
[15] E Mabande and Walter Kellermann, “Design of robust polynomial beamformers as a convex optimization problem,” in Proc. IEEE Int. Workshop Acoustic Echo, Noise Control (IWAENC), 2010, pp. 1–4.
[16] Lawrence E Kinsler, Austin R Frey, Alan B Coppens, and James V Sanders, Fundamentals of acoustics, Wiley, third edition, 1982.
[17] Earl G Williams, Fourier acoustics: sound radiation and nearfield acoustical holography, Academic Press, 1999.
[18] Mohamed F Mansour, “Information measures for microphone arrays,” arXiv preprint arXiv:1801.10128, 2018.
[19] Andrea Moiola, Ralf Hiptmair, and I Perugia, “Plane wave approximation of homogeneous Helmholtz solutions,” Zeitschrift für angewandte Mathematik und Physik, vol. 62, no. 5, pp. 809, 2011.
[20] Orhan Yilmaz and M Turhan Taner, “Discrete plane-wave decomposition by least-mean-square-error method,” Geophysics, vol. 59, no. 6, pp. 973–982, 1994.
[21] “Plane wave decomposition in the unit disc: Convergence estimates and computational aspects,” Journal of Computational and Applied Mathematics, vol. 193, no. 1, pp. 140–156, 2006.
[22] Jean-Pierre Berenger, “A perfectly matched layer for the absorption of electromagnetic waves,” Journal of Computational Physics, vol. 114, no. 2, pp. 185–200, 1994.
[23] COMSOL Multiphysics, “Acoustics module: user guide,” 2017.
[24] John J Bowman, Thomas B Senior, and Piergiorgio L Uslenghi, Electromagnetic and acoustic scattering by simple shapes, North-Holland Publishing Company, 1970.
[25] Francis M Wiener, “The diffraction of sound by rigid disks and rigid square plates,” The Journal of the Acoustical Society of America, vol. 21, no. 4, pp. 334–347, 1949.
[26] RD Spence, “The diffraction of sound by circular disks and apertures,” The Journal of the Acoustical Society of America, vol. 20, no. 4, pp. 380–386, 1948.
[27] Guangdong Pan, Wontak Kim, Savaskan Bulek, Amit Chhetri, and Mohamed Mansour, “A study on acoustic modeling for microphone array beamforming,” pre-print, 2019.