In this work, we present a method for localizing and counting multiple sound sources that imposes relaxed sparsity constraints on the source signals. A uniform circular microphone array is used to overcome the ambiguities of linear arrays; however, the underlying concepts (sparse component analysis and matching pursuit-based operation on the histogram of estimates) are applicable to any microphone array topology.
Our method is based on detecting time-frequency (TF) zones where one source is dominant over the others. By applying any single-source direction of arrival (DOA) estimation algorithm to appropriately selected TF components in these “single-source” zones, we obtain local DOA estimates. From the local estimates in a block of consecutive time frames we form a histogram, which we then process to obtain the number of active sound sources and their corresponding DOAs. To this end, we apply a matching pursuit (MP)-based approach that uses a combination of two pulses of different widths: one to account for the contribution of a source to the histogram and one to accurately pick its DOA. This dual-width approach is illustrated in Fig. 1, where one can observe four clearly visible and similarly shaped peaks whose locations correspond to the DOAs of four simultaneously active sources. We iteratively detect a source, retrieve its DOA, and remove its contribution from the histogram until a user-defined threshold is reached.
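The iterative histogram processing described above can be illustrated with a minimal sketch. It is not the exact procedure evaluated in this work; several details are assumptions made only for illustration: the pulses are Hann-shaped, the narrow and wide widths are 5 and 41 bins, a source's contribution is removed by attenuating the histogram under the wide pulse, and the stopping rule compares each contribution with that of the first detected source (`stop_ratio`). The function names `circular_pulse` and `count_and_localize` are hypothetical.

```python
import numpy as np


def circular_pulse(n_bins, width):
    """Hann-shaped pulse (peak 1.0) of odd `width` bins, centred on bin 0 and wrapped."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the pulse has a centre bin")
    window = np.hanning(width + 2)[1:-1]  # drop the two zero-valued end samples
    pulse = np.zeros(n_bins)
    pulse[(np.arange(width) - width // 2) % n_bins] = window
    return pulse


def count_and_localize(hist, narrow=5, wide=41, stop_ratio=0.3, max_sources=10):
    """Matching pursuit-style peak picking on a circular DOA histogram.

    Assumed design (not taken from the paper): the narrow pulse picks the DOA,
    the wide pulse measures and removes the source's contribution, and the loop
    stops once a contribution drops below `stop_ratio` times the first one.
    """
    residual = np.asarray(hist, dtype=float).copy()
    n_bins = residual.size
    narrow_pulse = circular_pulse(n_bins, narrow)
    wide_pulse = circular_pulse(n_bins, wide)
    # Row s holds the narrow pulse centred on bin s (circular shift).
    narrow_atoms = np.stack([np.roll(narrow_pulse, s) for s in range(n_bins)])

    doas, first = [], None
    for _ in range(max_sources):
        # Detect: bin whose narrow pulse correlates best with the residual.
        peak = int(np.argmax(narrow_atoms @ residual))
        # Measure the source's contribution under the wide pulse.
        footprint = np.roll(wide_pulse, peak)
        contribution = float(footprint @ residual)
        if first is None:
            first = contribution
        if first <= 0 or contribution < stop_ratio * first:
            break
        doas.append(peak * 360.0 / n_bins)
        # Remove: attenuate the histogram around the detected DOA.
        residual *= 1.0 - footprint
    return doas


# Synthetic histogram with 1-degree bins: four equally strong sources on a flat floor.
bins = np.arange(360)
true_doas = [40, 130, 220, 310]
hist = np.ones(360)
for doa in true_doas:
    offset = (bins - doa + 180) % 360 - 180  # signed circular distance in bins
    hist += 100.0 * np.exp(-0.5 * (offset / 6.0) ** 2)

doas = count_and_localize(hist)
print(len(doas), sorted(doas))
```

In this sketch the number of sources is simply the length of the returned list. Using a narrow pulse for detection keeps the DOA pick sharp, while the wider pulse covers the spread of estimates around each source so that its residue is less likely to be detected again as a spurious source.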
The method is shown to perform well for both DOA estimation and source counting, and to be highly suitable for real-time applications due to its low complexity. Through simulations (in various signal-to-noise ratio (SNR) conditions and reverberant environments) and experiments in a real environment, we show that our method outperforms other state-of-the-art DOA estimation and source counting methods in terms of accuracy, while being significantly more efficient in terms of computational complexity. Representative results are given in Fig. 2, which shows the estimated DOAs of six simultaneously active sources in a real environment with a reverberation time of approximately RT = 400 ms, and in Table 1, which reports source counting success rates as a confusion matrix for scenarios with up to six simultaneously active sources in low reverberation conditions and SNR = 20 dB.