Abstract: Previous studies have shown that visual information can enhance auditory speech perception in noisy environments. The current study compared two widely accepted procedures for simultaneity judgment (SJ) measurement: 1) the method of constant stimuli (MCS), in which the overall SJ function is estimated; and 2) the method of adjustment (MA), in which only the thresholds of the SJ function are estimated. Results revealed that the thresholds estimated by the MCS and MA procedures were comparable for auditory-leading stimuli; however, the MA procedure overestimated the thresholds for visual-leading stimuli. This suggests that a procedure-dependent perceptual bias can occur during the multisensory integration process.
Summary: Speech perception is inherently multimodal. In recent years, several studies have investigated the effectiveness of integrating visual and auditory cues to enhance listeners' speech perception in complex listening environments. Most of these studies have applied one of two widely accepted psychophysical procedures: 1) the method of constant stimuli (MCS), in which the overall perceptual function is estimated; and 2) the method of adjustment (MA), in which only the thresholds (i.e., specific points on the perceptual function) are estimated. The purpose of this study was to compare the performance of the MCS and MA procedures in auditory-visual perceptual binding, especially the temporal integration process. Ten young adults participated in simultaneity judgment (SJ) experiments in the auditory-visual domain. The auditory stimulus was a speech-shaped noise, and the visual stimulus was a sphere whose radius varied between 0 and 20 cm. All stimuli were presented with a 50-msec duration. Two psychophysical procedures were applied. First, the MCS procedure estimated the overall SJ function by measuring subjects' responses across a range of stimulus onset asynchronies (SOAs: -1000 to 1000 msec), where negative and positive SOAs indicate that the auditory or visual stimulus, respectively, was presented first. The thresholds indicate the boundaries of the temporal binding window (TBW) and were obtained at the 50, 71, and 79% points on the SJ function. Second, the classical staircase MA procedure estimated the thresholds at 50% (1-down 1-up), 71% (2-down 1-up), and 79% (3-down 1-up) on the SJ function by adaptively adjusting the SOA based on the subject's responses. In the MA procedure, two interleaved adaptive tracks were run, one with auditory-leading and one with visual-leading stimuli. All thresholds from both the MCS and MA procedures were averaged over two separate runs.
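The adaptive tracks described above follow the standard transformed up-down logic, which converges on the SOA where the probability of a "simultaneous" response equals 0.5^(1/n) (50%, ~71%, and ~79% for n = 1, 2, 3). A minimal sketch of one such visual-leading track is given below; the simulated observer, step size, starting SOA, and logistic parameters are all illustrative assumptions, not the study's actual values:

```python
import math
import random

def n_down_1_up_track(n_down, start_soa=500.0, step=20.0, max_reversals=12,
                      true_tbw=300.0, slope=0.02, rng=None):
    """One n-down 1-up staircase for a visual-leading SJ track (sketch).

    After n_down consecutive "simultaneous" responses the SOA is made
    harder (increased); after one "not simultaneous" response it is made
    easier (decreased). The track converges on the SOA where
    p("simultaneous") = 0.5 ** (1 / n_down).
    The observer here is hypothetical: a logistic falling from ~1 toward 0
    as the SOA grows past true_tbw (msec).
    """
    rng = rng or random.Random(0)
    soa, consec_sim, last_dir, reversals = start_soa, 0, 0, []
    while len(reversals) < max_reversals:
        p_sim = 1.0 / (1.0 + math.exp(slope * (soa - true_tbw)))
        if rng.random() < p_sim:             # judged "simultaneous"
            consec_sim += 1
            if consec_sim < n_down:
                continue                     # need n in a row before stepping
            consec_sim, direction = 0, +1    # harder: increase SOA
        else:
            consec_sim, direction = 0, -1    # easier: decrease SOA
        if last_dir and direction != last_dir:
            reversals.append(soa)            # record SOA at each reversal
        last_dir = direction
        soa = max(0.0, soa + direction * step)
    # threshold estimate: mean SOA over the last reversals
    return sum(reversals[-8:]) / len(reversals[-8:])
```

With n_down = 2, repeated runs of this sketch cluster around the SOA where the simulated observer responds "simultaneous" on about 71% of trials, mirroring the 2-down 1-up rule used in the MA procedure.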
Average results showed that the TBW boundaries estimated by the MCS procedure were -20 and 408 msec, 2 and 377 msec, and 27 and 302 msec at the 50, 71, and 79% points on the SJ functions, respectively; here, negative values indicate auditory-leading SOAs. The TBW boundaries estimated by the MA procedure were -18 and 443 msec, 10 and 411 msec, and 33 and 401 msec at 50, 71, and 79%, respectively.
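The MCS boundaries above are read off the estimated SJ function where it crosses each criterion level on its auditory-leading and visual-leading flanks. A minimal sketch of that read-off step, using linear interpolation between measured points (the data in the usage example are illustrative, not the study's):

```python
def tbw_boundaries(soas, p_sim, criterion):
    """Find where a single-peaked SJ function crosses `criterion` on its
    auditory-leading (left) and visual-leading (right) flanks (sketch).

    soas      -- sorted SOA values in msec (negative = auditory-leading)
    p_sim     -- proportion of "simultaneous" responses at each SOA
    criterion -- e.g. 0.50, 0.71, or 0.79
    Returns (left_boundary, right_boundary) by linear interpolation,
    or None for a flank that never crosses the criterion.
    """
    peak = max(range(len(p_sim)), key=lambda i: p_sim[i])

    def cross(indices):
        # walk from the peak outward; i is the inner point, j the outer
        for i, j in zip(indices, indices[1:]):
            hi_p, lo_p = p_sim[i], p_sim[j]
            if lo_p < criterion <= hi_p:
                t = (hi_p - criterion) / (hi_p - lo_p)
                return soas[i] + t * (soas[j] - soas[i])
        return None

    left = cross(list(range(peak, -1, -1)))       # toward negative SOAs
    right = cross(list(range(peak, len(soas))))   # toward positive SOAs
    return left, right
```

For example, with made-up proportions peaking near 100 msec, a higher criterion (0.79) yields a narrower window than a lower one (0.50), as in the averaged results above.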
Brief Summary of Clinical Takeaways: Results revealed that the thresholds estimated by the MCS and MA procedures were comparable for auditory-leading stimuli; however, the MA procedure overestimated the thresholds for visual-leading stimuli. This suggests that a procedure-dependent perceptual bias can occur during the multisensory integration process. The findings may inform future rehabilitation approaches that use auditory training programs to enhance speech perception in noise, and they have implications for potential technological enhancements to speech perception with real-time multisensory hearing-assistive devices. In particular, the current study provides preliminary evidence that a measurement-bias factor is involved in the multisensory process. That is, reliable TBW results might specify an acceptable amount of time delay for real-time multisensory devices and provide validity for their practical application.
Learning Objectives:
Upon completion, participants will be able to describe the importance of accurately measuring temporal coherence cues, such as the temporal binding window, for real-time multisensory speech processing.