Deep speech-to-text models capture the neural basis of spontaneous speech in everyday conversations

Abstract: Humans effortlessly use the continuous acoustics of speech to communicate rich linguistic meaning during everyday conversations. In this study, we leverage 100 hours (half a million words) of spontaneous open-ended conversations and concurrent high-quality neural activity recorded using electrocorticography (ECoG) to decipher the neural basis of real-world speech production and comprehension. Employing a deep multimodal speech-to-text model named Whisper, we develop encoding models capable of accurately predicting neural responses to both acoustic and semantic aspects of speech. Our encoding models achieve high accuracy in predicting neural responses to hundreds of thousands of words across many hours of left-out recordings. We uncover a distributed cortical hierarchy for speech and language processing, with sensory and motor regions encoding acoustic features of speech and higher-level language areas encoding syntactic and semantic information. Many electrodes (including those in both perceptual and motor areas) display mixed selectivity for both speech and linguistic features. Notably, our encoding model reveals a temporal progression from language-to-speech encoding before word onset during speech production and from speech-to-language encoding following word articulation during speech comprehension. This study offers a comprehensive account of the unfolding neural responses during fully natural, unbounded daily conversations. By leveraging a multimodal deep speech recognition model, we highlight the power of deep learning for unraveling the neural mechanisms of language processing in real-world contexts.

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023), 30 June - year:2023

Language:

English

Contributors:

Goldstein, Ariel [Author]
Wang, Haocheng [Author]
Niekerken, Leonard [Author]
Zada, Zaid [Author]
Aubrey, Bobbi [Author]
Sheffer, Tom [Author]
Nastase, Samuel A. [Author]
Gazula, Harshvardhan [Author]
Schain, Mariano [Author]
Singh, Aditi [Author]
Rao, Aditi [Author]
Choe, Gina [Author]
Kim, Catherine [Author]
Doyle, Werner [Author]
Friedman, Daniel [Author]
Devore, Sasha [Author]
Dugan, Patricia [Author]
Hassidim, Avinatan [Author]
Brenner, Michael [Author]
Matias, Yossi [Author]
Devinsky, Orrin [Author]
Flinker, Adeen [Author]
Hasson, Uri [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2023.06.26.546557

funding:

Funding institution / Project title:

PPN (catalog ID):

XBI040028321