Evaluation of the importance of time-frequency contributions to speech intelligibility in noise

Chengzhu Yu, Kamil K. Wójcicki, Philipos C. Loizou, John H.L. Hansen, Michael T. Johnson

Research output: Contribution to journal › Article › peer-review


Abstract

Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes equally to the overall intelligibility of speech. The present study demonstrates that the importance of each T-F unit to speech intelligibility varies with speech content. Specifically, T-F units are categorized into two classes: speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is strongly related to the loudness of its target component, whereas the importance of each speech-absent T-F unit varies with the loudness of its masker component. Two types of mask error are also considered: miss errors and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance of the two error types depends on the SNR of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed measure shows significantly higher correlation with intelligibility than two existing mask-based objective measures.
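The abstract does not give the formula for the loudness weighted hit-false measure, so the Python sketch below is only an illustration under stated assumptions. It first computes the conventional HIT-FA rate (hit rate minus false-alarm rate) over binary-mask T-F units, where a miss is a speech-present unit labeled 0 and a false alarm is a speech-absent unit labeled 1. It then computes a hypothetical loudness-weighted variant in which each speech-present unit is weighted by the loudness of its target component and each speech-absent unit by the loudness of its masker component. The helper names (`ideal_binary_mask`, `hit_fa`, `loudness`, `loudness_weighted_hit_fa`), the 0 dB local criterion, and the power-law loudness proxy with its exponent are all assumptions, not the authors' method.

```python
from typing import List, Sequence


def ideal_binary_mask(target_power: Sequence[float],
                      masker_power: Sequence[float],
                      lc_db: float = 0.0) -> List[int]:
    """Label a T-F unit speech-present (1) when its local SNR exceeds lc_db.

    Assumption: 'speech-present' is taken to mean IBM = 1 at this criterion.
    """
    threshold = 10 ** (lc_db / 10)  # local criterion converted to a power ratio
    return [1 if t > m * threshold else 0
            for t, m in zip(target_power, masker_power)]


def hit_fa(ibm: Sequence[int], est: Sequence[int]) -> float:
    """Unweighted HIT-FA: every T-F unit counts equally."""
    present = [e for i, e in zip(ibm, est) if i == 1]  # speech-present units
    absent = [e for i, e in zip(ibm, est) if i == 0]   # speech-absent units
    hit = sum(1 for e in present if e == 1) / len(present) if present else 0.0
    fa = sum(1 for e in absent if e == 1) / len(absent) if absent else 0.0
    return hit - fa


def loudness(power: float, exponent: float = 0.3) -> float:
    """Hypothetical loudness proxy: a compressive power law on unit energy."""
    return max(power, 0.0) ** exponent


def loudness_weighted_hit_fa(ibm: Sequence[int], est: Sequence[int],
                             target_power: Sequence[float],
                             masker_power: Sequence[float],
                             exponent: float = 0.3) -> float:
    """Hypothetical weighting: target loudness for speech-present units,
    masker loudness for speech-absent units."""
    hit_num = hit_den = fa_num = fa_den = 0.0
    for i, e, t, m in zip(ibm, est, target_power, masker_power):
        if i == 1:
            w = loudness(t, exponent)  # importance follows target loudness
            hit_den += w
            if e == 1:
                hit_num += w           # hit; a 0 here is a miss
        else:
            w = loudness(m, exponent)  # importance follows masker loudness
            fa_den += w
            if e == 1:
                fa_num += w            # false alarm
    hit = hit_num / hit_den if hit_den > 0 else 0.0
    fa = fa_num / fa_den if fa_den > 0 else 0.0
    return hit - fa


# Toy example with four T-F units (powers are arbitrary illustrative values).
target = [4.0, 1.0, 0.0, 0.0]
masker = [1.0, 0.5, 1.0, 16.0]
ibm = ideal_binary_mask(target, masker)  # [1, 1, 0, 0]
est = [1, 0, 0, 1]                       # one hit, one miss, one false alarm
print(hit_fa(ibm, est))                  # 0.0
print(round(loudness_weighted_hit_fa(ibm, est, target, masker, exponent=0.5), 3))  # -0.133
```

In this toy example the unweighted score treats the miss and the false alarm as equally costly, so HIT-FA is 0. With loudness weighting, the miss falls on a unit whose target is relatively quiet and costs little, while the false alarm falls on a unit with a loud masker and dominates, pushing the score below zero. This only mirrors, qualitatively, the abstract's observation that unit importance depends on target or masker loudness.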

Original language: English
Pages (from-to): 3007-3016
Number of pages: 10
Journal: Journal of the Acoustical Society of America
Volume: 135
Issue number: 5
State: Published - May 2014

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics
