Abstract: Facial cues and accurate text captions enhance speech perception in noisy settings. However, the effects of reduced caption accuracy on comprehension, with or without facial cues, are less understood. In this study, eight adults with normal hearing completed sentence recognition tasks with varying facial cues, signal-to-noise ratios (SNR), and caption accuracy levels. Findings showed that facial cues and higher SNR improved recognition, while lower caption accuracy decreased scores. Notably, facial cues benefited comprehension across all caption conditions. These results highlight the value of accurate captioning in auditory-visual aids, potentially supporting individuals in challenging acoustic environments.
Summary: Objective This study investigates how decreasing the accuracy of text captioning impacts speech perception with and without facial cues, particularly in noisy environments.
Rationale The negative effects of noise on speech perception are known to be mitigated by visual cues, such as facial expressions and text captioning. Research has shown that accurate captioning improves auditory-visual speech recognition, especially when facial cues are available. However, there is limited understanding of how decreased caption accuracy affects speech recognition in auditory-visual contexts. Given the prevalence of background noise in social and occupational settings, this study seeks to clarify the role of caption accuracy and facial cues in supporting speech comprehension.
Methods Eight adults with normal hearing participated in a sentence recognition task under 18 conditions. These conditions varied by facial cues (yes/no), signal-to-noise ratio (SNR: -7 dB, -5 dB, -4 dB), and text accuracy (4 content words displayed, 2 content words displayed, 1 content word displayed). Each sentence contained 4 content words, and participants' recognition scores ranged from 0 to 4. A Cumulative Link Mixed Model (CLMM) was employed, using facial cues, SNR levels, and text accuracy as fixed effects, with participants as a random effect.
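The cumulative link (proportional-odds) structure described above can be sketched in code: a single linear predictor built from the three fixed effects is compared against ordered cut-points to yield probabilities for each score category (0-4). All coefficients and thresholds below are hypothetical placeholders for illustration, not the study's estimates, and the participant random effect is omitted for simplicity.

```python
import math

def cumulative_probs(face, snr, accuracy,
                     thresholds=(-2.0, -1.0, 0.0, 1.0),
                     b_face=0.8, b_snr=0.4, b_acc=0.6):
    """Proportional-odds model for an ordinal score in {0, ..., 4}.

    face: 1 if facial cues are present, else 0
    snr: signal-to-noise ratio in dB
    accuracy: number of content words displayed in the caption (1, 2, or 4)
    thresholds: ordered cut-points between adjacent score categories
    Coefficients and thresholds are hypothetical, for illustration only.
    """
    eta = b_face * face + b_snr * snr + b_acc * accuracy
    # Cumulative probabilities: P(score <= k) = logistic(threshold_k - eta)
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    # Category probabilities are successive differences of the cumulative curve.
    probs = [cum[0]]
    probs += [cum[k] - cum[k - 1] for k in range(1, len(cum))]
    probs.append(1.0 - cum[-1])
    return probs
```

Because every predictor enters through the same linear term, a positive coefficient shifts probability mass toward higher scores across all categories at once; exponentiating a coefficient (e.g., exp(b_face)) gives the odds ratio for that predictor, which is how the model's effects are typically reported.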
Results & Conclusions The CLMM revealed that the presence of facial cues (p < 0.001) and higher SNR (p < 0.001) each significantly increased the odds of higher sentence recognition scores, while reduced text accuracy was significantly associated with lower scores (p < 0.001). No significant interactions were found between the predictors, indicating that each contributes independently to improved comprehension. These findings indicate that both facial cues and accurate captioning improve speech recognition in noisy environments, with the benefit of accurate text remaining consistent regardless of facial cues. The results suggest that integrating accurate captioning in auditory-visual listening aids could benefit individuals in challenging acoustic settings.
Instructional Level This study is intended for an intermediate audience familiar with sensory processing, providing new insights relevant for clinical applications, auditory device development, and communication strategies.
Brief Summary of Clinical Takeaways: This study addresses an important issue by examining how listeners combine auditory and visual information to support speech comprehension in noisy environments. The use of cumulative link mixed modeling provides a deeper understanding of how varying text accuracy and facial cues impact speech recognition. The findings have the potential to inform the design of auditory devices and communication aids, helping individuals navigate complex auditory environments more effectively.
Learning Objectives:
Upon completion, participants will be able to:
Describe how visual cues, such as text captioning accuracy and facial cues, interact with background noise to affect speech comprehension.
Explain the roles of facial cues and caption accuracy in auditory-visual speech perception.
Apply the study's findings to improve communication strategies and the design of auditory-visual listening aids.