How can we understand speech in the presence of background noise?

This study highlights a critical role for neural feedback circuits that modulate the activity of the inner ear, enabling effective listening to degraded speech.

The paper, “Understanding degraded speech leads to perceptual gating of a brain stem reflex in human listeners,” is now published in PLOS Biology.

This study was a collaboration between Heivet Hernández-Pérez (Macquarie University Hearing), Jason Mikiel-Hunter (Macquarie University Hearing), David McAlpine (Macquarie University Hearing), Sumitrajit Dhar (Northwestern University), Sriram Boothalingam (University of Wisconsin-Madison), Jessica J. M. Monaghan (National Acoustic Laboratories) and Catherine M. McMahon (Macquarie University Hearing).


The ability to navigate “cocktail party” situations by focusing on sounds of interest over irrelevant background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits, such as the pathway underlying the medial olivocochlear (MOC) reflex, modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken speech and intrinsically noisy, vocoded speech (filtered to mimic processing by a cochlear implant, CI) significantly activated the MOC reflex, but this was not the case for speech in background noise, which engaged midbrain and cortical resources more. A model of the initial stages of auditory processing reproduced specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of two strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve the representation of speech cues peripherally, extrinsically masked streams rely more on higher auditory centres to denoise signals.

Single and multiple stream