Time-frequency masking for speaker of interest extraction in an immersive environment

Harikrishnan Unnikrishnan, Kevin D. Donohue, Jens Hannemann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation


Distributed microphone systems can be used to enhance intelligibility for a speaker of interest (SOI) in a noisy environment of multiple speech sources (cocktail party scenario). For finite microphone distributions, however, interfering speech sources leak into the beamformed signal and degrade intelligibility. This article introduces an auditory-inspired post-processing algorithm for beamformed signals that uses spectrotemporal cues to enhance SOI intelligibility. Spatial power ratios obtained by beamforming on multiple locations are used to identify and mask out time-frequency regions dominated by the interfering speech. Performance results based on planar microphone array simulations show consistent increases in the Speech Intelligibility Index (SII) over the beamformed signal for various configurations of speakers using 2 to 16 microphones. In cases of critically low SII (< 0.25), the application of interference masking achieves critical enhancements in SII, raising it from beyond 0.3 for the 2-microphone case to above 0.5 for the 16-microphone case. Experimental recordings were also made, and examples are presented. The experimental recordings show improvements consistent with the simulations.
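
The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary decision rule, the 0 dB threshold, the use of the strongest competing beam, and the names `interference_mask`, `apply_mask`, `threshold_db`, and `floor` are all assumptions made for illustration. The sketch takes the short-time Fourier transform (STFT) of the signal beamformed on the SOI and the STFTs of signals beamformed on the other source locations, forms a per-cell spatial power ratio, and attenuates the cells where an interferer dominates.

```python
import numpy as np


def interference_mask(soi_stft, interferer_stfts, threshold_db=0.0, floor=0.0):
    """Illustrative binary time-frequency mask from spatial power ratios.

    soi_stft:         array (freq, time), STFT of the beam steered at the SOI.
    interferer_stfts: array (n_sources, freq, time), STFTs of beams steered
                      at the interfering source locations.
    threshold_db:     hypothetical decision threshold on the power ratio.
    floor:            gain applied to cells judged interference-dominated.
    """
    soi_power = np.abs(soi_stft) ** 2
    # Strongest competing beam in each time-frequency cell (an assumption;
    # other combinations such as the sum of interferer powers are possible).
    interferer_power = np.max(np.abs(interferer_stfts) ** 2, axis=0)
    eps = np.finfo(float).tiny  # avoids division by zero and log of zero
    ratio_db = 10.0 * np.log10((soi_power + eps) / (interferer_power + eps))
    # Keep cells where the SOI beam dominates, attenuate the rest.
    return np.where(ratio_db > threshold_db, 1.0, floor)


def apply_mask(soi_stft, mask):
    """Apply the mask to the SOI beam; inverse STFT is left to the caller."""
    return soi_stft * mask
```

With a small nonzero `floor`, the masked cells are attenuated rather than removed entirely; which setting is preferable for intelligibility is not something this sketch establishes.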

Original language: English
Title of host publication: Conference Proceedings - IEEE SOUTHEASTCON
ISBN (Electronic): 9781479965854
State: Published - Nov 7 2014
Event: IEEE SoutheastCon 2014 - Lexington, United States
Duration: Mar 13 2014 to Mar 16 2014

Publication series

Name: Conference Proceedings - IEEE SOUTHEASTCON
ISSN (Print): 0734-7502


Conference: IEEE SoutheastCon 2014
Country/Territory: United States

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Signal Processing

