Audio & Speech Signal Processing

For over a decade, SPL has developed significant research and innovation expertise in diverse applications of audio and speech signal processing, including the acquisition, coding, transformation, separation, retrieval, enhancement, and immersive rendering of audio and speech signals.

Featured Application Areas:

Sensor Networks for Immersive Environments

Immersive Audio Communication Systems (IMMACS)


Spatial sound acquisition and rendering methods for immersive environments

Immersive audio systems have been making strides in applications such as telepresence, augmented and virtual reality, entertainment, distance learning, and sound editing for television and film. SPL performs research on signal processing issues that pertain to the acquisition of 3D sound fields and their subsequent transmission and rendering. On the acquisition side, advanced statistical methods have been developed for applying microphone arrays in audio applications. These methods address two major aspects of spatial filtering: localisation of a signal of interest, and adaptation of the spatial response of an array of sensors so as to steer it in a given direction. On the rendering side, surround sound and immersive audio methods have been developed that generate virtual sound sources around the listener.
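The two aspects of spatial filtering named above, steering and localisation, can be illustrated with a textbook narrowband delay-and-sum beamformer. This is only a minimal sketch under stated assumptions, not the statistical methods developed at SPL: it assumes a single far-field source, a noise-free snapshot at one frequency, a uniform circular array, and a speed of sound of 343 m/s. All function names are hypothetical and chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def circular_array_positions(num_mics, radius):
    """Return (x, y) coordinates of microphones evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def steering_vector(positions, azimuth, freq_hz):
    """Far-field phase factors for a plane wave arriving from `azimuth` (radians)."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    phasors = []
    for x, y in positions:
        # A microphone closer to the source receives the wavefront earlier,
        # i.e. with a negative delay relative to the array centre.
        delay = -(x * ux + y * uy) / SPEED_OF_SOUND
        phasors.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return phasors


def simulate_snapshot(positions, source_azimuth, freq_hz):
    """Noise-free per-microphone phasors for one unit-amplitude source (assumption)."""
    return steering_vector(positions, source_azimuth, freq_hz)


def steered_response(positions, snapshot, look_azimuth, freq_hz):
    """Steering: align the channels towards `look_azimuth` and average them."""
    weights = steering_vector(positions, look_azimuth, freq_hz)
    total = sum(w.conjugate() * s for w, s in zip(weights, snapshot))
    return abs(total) / len(positions)


def localise(positions, snapshot, freq_hz, grid_deg=1):
    """Localisation: scan look directions, return the azimuth (degrees) of the peak."""
    return max(
        range(0, 360, grid_deg),
        key=lambda deg: steered_response(
            positions, snapshot, math.radians(deg), freq_hz),
    )


if __name__ == "__main__":
    mics = circular_array_positions(num_mics=8, radius=0.05)
    snap = simulate_snapshot(mics, math.radians(60), freq_hz=2000.0)
    print("estimated azimuth (degrees):", localise(mics, snap, freq_hz=2000.0))
```

In this idealised setting the scan peaks at the simulated source direction. With noise, reverberation, or several simultaneous sources, a plain delay-and-sum scan degrades, which is where more advanced statistical methods become necessary.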


Related Publications:

Stefanakis, Nikolaos; Mouchtaris, Athanasios, Foreground Suppression for Capturing and Reproduction of Crowded Acoustic Environments, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Brisbane, Australia, April 19-24, 2015, pp. 51-55.

Pavlidi, Despoina; Griffin, Anthony; Puigt, Matthieu; Mouchtaris, Athanasios, Real-Time Multiple Sound Source Localization and Counting using a Circular Microphone Array, IEEE Trans. Audio, Speech, and Language Processing, vol. 21, no. 10, October 2013, pp. 2193-2206.

Audio signal analysis and modelling

The analysis and modelling of audio signals find important applications in a variety of fields, such as audio compression, sound source separation and enhancement, and music analysis and retrieval. SPL has developed algorithms that analyze and exploit properties of music signals, such as their harmonic structure, their temporal and spectral variations, and the spatial attributes in their spectral information. Applications of interest include multichannel audio compression, separation of non-linear mixtures of sound sources, tempo estimation and onset detection for the analysis and classification of music signals, and the detection and classification of impact sounds.

Related Projects: SENSE, AVID-MODE.

Related Publications:

Stefanakis, Nikolaos; Mouchtaris, Athanasios, A multi-sensor approach for real-time detection and classification of impact sounds, Proc. European Signal Processing Conference (EUSIPCO), Nice, France, August 31-September 4, 2015, pp. 2083-2087.

Caetano, Marcello; Kafentzis, George; Mouchtaris, Athanasios, Adaptive Modeling of Synthetic Nonstationary Sinusoids, Proc. 18th International Conference on Digital Audio Effects (DAFx-15), Trondheim, Norway, November 30 – December 3, 2015.

Audio coding enabling interactivity at the receiver

We have always believed that audio coding approaches should enable interactivity at the receiving end. Accordingly, our research in this area has followed two directions: (a) multichannel audio coding of spot microphone signals, and (b) directional audio coding. In the first direction, we have developed parametric models that encode multiple audio channels (the spot signals) at low bitrates before they are mixed into the final multichannel audio content, enabling remote mixing. In directional audio coding, we have developed innovative coding approaches that combine research in microphone arrays and spatial filtering, resulting in low-bitrate systems that enable direction-based interactivity at the receiver.

Related Projects: AVID-MODE, ASPIRE.

Related Publications:

Alexandridis, Anastasios; Griffin, Anthony; Mouchtaris, Athanasios, Breaking down the Cocktail Party: Capturing and Isolating Sources in a Soundscape, Proc. European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, September 1-5, 2014.

Alexandridis, Anastasios; Griffin, Anthony; Mouchtaris, Athanasios, Directional Coding of Audio Using a Circular Microphone Array, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 26-31, 2013.

Wireless acoustic sensor networks

SPL focuses on wireless acoustic sensor networks (WASNs) in which each node consists of a microphone array equipped with limited signal processing and wireless communication capabilities, so that it can perform computations and communicate with other nodes. WASNs offer richer sensing capabilities than a single microphone array and find use in applications such as hearing aids, distant speech recognition, ambient intelligence, hands-free telephony, and acoustic monitoring. In such applications, information about the sources’ locations is important for operations like noise reduction and speech enhancement. SPL researchers have achieved accurate localization of multiple simultaneous sound sources in WASNs under realistic conditions of noise and reverberation, without the need for source tracking. SPL also develops its own acoustic sensors for WASN applications.
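To make concrete how a network of array nodes can yield a source location, the following sketch combines one direction-of-arrival (DOA) estimate per node into a single 2D position by least squares. It is a generic textbook triangulation and not the localization method used in SPL's work: it assumes a single source, known node positions, DOAs expressed as azimuths in a shared coordinate frame, and it treats each bearing as an infinite line rather than a half-line. The function name is hypothetical.

```python
import math


def locate_from_doas(nodes, doas):
    """Least-squares intersection of bearing lines.

    nodes: list of (x, y) node positions in metres.
    doas:  list of azimuths in radians, one per node, in the same frame.
    Returns (x, y), or None if all bearings are (nearly) parallel.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in zip(nodes, doas):
        # Unit normal to the bearing line through this node.
        nx, ny = -math.sin(theta), math.cos(theta)
        # Points on the line satisfy nx * x + ny * y = c.
        c = nx * px + ny * py
        # Accumulate the 2x2 normal equations of the least-squares problem.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return None
    # Solve the symmetric 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


if __name__ == "__main__":
    node_positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    true_source = (2.0, 1.0)
    # Ideal (noise-free) DOAs, simulated from the known source position.
    bearings = [
        math.atan2(true_source[1] - py, true_source[0] - px)
        for px, py in node_positions
    ]
    print(locate_from_doas(node_positions, bearings))
```

With noisy DOA estimates the bearing lines no longer meet at one point, and the least-squares solution returns the point closest to all of them. Handling several simultaneous sources additionally requires deciding which DOA at each node belongs to which source, which this sketch does not attempt.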

Related Projects: LISTEN, SENSE, ASPIRE.

Related Publications:

Alexandridis, Anastasios; Mouchtaris, Athanasios, Multiple sound source location estimation and counting in a wireless acoustic sensor network, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), October 18-21, 2015.

Griffin, Anthony; Alexandridis, Anastasios; Pavlidi, Despoina; Mastorakis, Yannis; Mouchtaris, Athanasios, Localizing multiple audio sources in a wireless acoustic sensor network, Signal Processing, Special issue on wireless acoustic sensor networks and ad-hoc microphone arrays, vol. 107, pp. 54–67, 2015.

Speech signal processing

SPL performs research in the area of speech signal modelling and has proposed or adapted several mathematical and statistical models for representing the time-frequency properties of speech signals and for analyzing speaker-dependent information. Sparse signal processing methodology and compressed sensing have been extensively examined in the areas of speech signal classification and speech/audio coding. Statistical transformations of speech spectral features have also been investigated. SPL has applied the derived models in several applications of interest, including voice conversion, speech enhancement, speaker identification, speech emotion recognition, and front-end design for distant speech recognition systems.

Related Projects: LISTEN, ASPIRE.

Related Publications:

Tzagkarakis, Christos; Mouchtaris, Athanasios, Sparsity Based Noise Robust Speaker Identification Using a Discriminative Dictionary Learning Approach, Proc. European Signal Processing Conference (EUSIPCO), Marrakech, Morocco, September 9-13, 2013.

Morfi, Veronica; Degottex, Gilles; Mouchtaris, Athanasios, Speech Analysis and Synthesis with a Computationally Efficient Adaptive Harmonic Model, IEEE Trans. Audio, Speech, and Language Processing, vol. 23, no. 11, Nov. 2015, pp. 1950-1962.