Localizing Multiple Audio Sources in a Wireless Acoustic Sensor Network


In this work, we propose a real-time grid-based method to estimate the locations of multiple sound sources in a wireless acoustic sensor network. Each sensor node contains a microphone array and transmits only its direction-of-arrival (DOA) estimates in each time interval, which reduces the amount of data sent to the central processing node. For the single-source case, the grid-based method is an alternative implementation of the non-linear least squares (NLS) estimator that is much faster to compute without sacrificing accuracy. For multiple sources, the single-source grid-based method is applied to every possible combination of DOAs from the sensors, resulting in a set of candidate location estimates. The final source locations are then selected from this candidate set based on each candidate's location estimate and the DOA combination that produced it.
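The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the paper: the brute-force single-resolution grid, the square search area, the function names (`grid_localize`, `candidate_locations`, `select_sources`) and the greedy lowest-residual selection rule are all hypothetical stand-ins. The sketch also assumes that every sensor reports one DOA per source.

```python
import itertools
import numpy as np

def angular_diff(a, b):
    """Signed difference between angles a and b, wrapped to (-pi, pi]."""
    return np.angle(np.exp(1j * (a - b)))

def grid_localize(sensors, doas, extent=4.0, step=0.05):
    """Grid point minimizing the sum of squared angular errors (NLS cost).

    Assumption: a plain exhaustive search over a square [0, extent]^2 area.
    """
    xs = np.arange(0.0, extent + step / 2, step)
    gx, gy = np.meshgrid(xs, xs, indexing="xy")
    cost = np.zeros_like(gx)
    for (sx, sy), doa in zip(sensors, doas):
        predicted = np.arctan2(gy - sy, gx - sx)  # DOA each grid point would produce
        cost += angular_diff(predicted, doa) ** 2
    idx = np.unravel_index(np.argmin(cost), cost.shape)
    return np.array([gx[idx], gy[idx]]), cost[idx]

def candidate_locations(sensors, doas_per_sensor, **grid_kwargs):
    """Run the single-source search on every DOA combination (one DOA per sensor)."""
    candidates = []
    for combo in itertools.product(*doas_per_sensor):
        loc, cost = grid_localize(sensors, combo, **grid_kwargs)
        candidates.append((loc, cost, combo))
    return candidates

def select_sources(candidates, num_sources):
    """Hypothetical selection rule: take candidates in order of increasing
    residual, skipping any that reuses a DOA already assigned to a source."""
    chosen, used = [], set()
    for loc, cost, combo in sorted(candidates, key=lambda c: c[1]):
        keys = {(i, d) for i, d in enumerate(combo)}
        if keys & used:
            continue
        chosen.append(loc)
        used |= keys
        if len(chosen) == num_sources:
            break
    return chosen

# Usage: four sensors at the corners of a 4 m x 4 m area, two sources.
sensors = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
sources = np.array([[1.0, 1.0], [3.0, 2.5]])
doas = [[np.arctan2(s[1] - p[1], s[0] - p[0]) for s in sources] for p in sensors]
doas[1].reverse()  # the DOA-to-source association is unknown at the central node
estimates = select_sources(candidate_locations(sensors, doas), num_sources=2)
```

With four sensors and two sources the search runs on 2^4 = 16 DOA combinations. The number of combinations grows exponentially with the number of sensors, so the cost of each single-source search matters.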


Our simulations use new results that we present in this paper to model the DOA estimation error. We also simulate more realistic scenarios in which DOAs go missing when the sources are close together, which occurs very often in practice, as our experiments with real recorded signals suggest. Our approach was evaluated and compared to other state-of-the-art localization methods in a simulated environment with different SNR levels and different amounts of missing DOAs. The amount of missing DOAs is modelled through the Minimum Angular Source Separation (MASS), which defines the minimum angular separation that two sources must have in order to both be detected. Figure 1 shows the localization accuracy of our proposed grid-based method for the case of two and three simultaneously active sound sources and various MASS values.
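To make the MASS idea concrete, here is a minimal sketch of how missing DOAs could be simulated at a single sensor. It is an assumption-laden illustration rather than the simulation used in the paper: the function name `observed_doas` is hypothetical, and the rule that the first source in a too-close pair is the one that stays detected is an arbitrary choice made for this example.

```python
import numpy as np

def angular_separation(a, b):
    """Absolute angular distance between a and b in radians, in [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (a - b))))

def observed_doas(true_doas, mass_deg):
    """Return the DOAs a sensor would report under a MASS constraint.

    Assumption: a source is dropped when it lies closer than MASS to a
    source that has already been kept (first come, first kept).
    """
    mass = np.deg2rad(mass_deg)
    kept = []
    for doa in true_doas:
        if all(angular_separation(doa, k) >= mass for k in kept):
            kept.append(doa)
    return kept

# Usage: three sources seen from one sensor at 10, 18 and 90 degrees.
true_doas = np.deg2rad([10.0, 18.0, 90.0])
few = observed_doas(true_doas, mass_deg=15)   # the 10 and 18 degree sources merge
all_ = observed_doas(true_doas, mass_deg=5)   # every source is detected
```

A larger MASS value therefore means more missing DOAs, and the central node has to localize sources from sensors that report fewer DOAs than there are active sources.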



Lastly, our grid-based method was evaluated using real signals that we recorded in an outdoor wireless acoustic sensor network. Figure 2 shows the location estimates obtained from the real recordings with the proposed grid-based method for different layouts of two and three sources. The red dots show the cloud of estimates collected over about 5 seconds and indicate that the sources are localized quite accurately.

