Max Planck Institute for Empirical Aesthetics
Lecture by Laura Gwilliams: Transforming acoustic input
into a hierarchy of linguistic representations
Language comprises multiple levels of representation, from phonemes (e.g. /b/, /p/) to lexical items (e.g. bear, pear) to syntactic structures (e.g. bears [SUBJECT] eat [VERB] pears [OBJECT]). Here we address two research questions that arise in the online processing of naturalistic speech: 1) which representational states are encoded in neural activity; 2) what overarching algorithm orchestrates these representations to ultimately derive meaning? Participants listened to spoken narratives while their brain activity was recorded with magnetoencephalography (MEG). From these recordings we decoded and localised phonological, lexical and syntactic operations using machine learning approaches. First, acoustic-phonetic features (e.g. voicing, manner, place of articulation) could be successfully discriminated from a sequence of neural responses unfolding between ~100 ms and ~400 ms after phoneme onset. Second, part of speech (e.g. verb, noun, adjective), indicative of lexical processing, was decodable between ~150 ms and ~800 ms after word onset. Third, we could track proxies of both syntactic operations (e.g. number of closing nodes) and syntactic states (e.g. depth of tree). Interestingly, some of these syntactic representations were clearly present several hundred milliseconds before word onset, whereas others peaked maximally ~300 ms later. These sustained and evoked MEG responses suggest that the human brain encodes each level of representation proposed by linguistic theories. Importantly, the corresponding neural assemblies overlap in space and time, likely facilitating concurrent access across these low-to-high-level representations, in line with a cascade architecture. Finally, our study demonstrates how the combination of machine learning and traditional statistics can bridge the gap between spatiotemporally resolved neuroimaging data and rich but tractable naturalistic stimuli.
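
For readers unfamiliar with time-resolved decoding, the sketch below illustrates the general style of analysis described above: training a separate classifier at each time sample of MEG epochs time-locked to phoneme onset, and scoring how well a binary phonetic feature (here voicing, as an example) can be discriminated over time. This is a minimal illustration only, not the speakers' exact pipeline; it assumes MNE-Python and scikit-learn, and the data arrays are simulated placeholders standing in for real epoched MEG recordings and feature labels.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mne.decoding import SlidingEstimator, cross_val_multiscore

# Placeholder data: X holds MEG epochs (n_epochs, n_channels, n_times)
# time-locked to phoneme onset; y holds one binary label per epoch,
# e.g. voiced (1) vs. voiceless (0). Real analyses would load these
# from preprocessed recordings and phoneme-level annotations.
rng = np.random.default_rng(0)
n_epochs, n_channels, n_times = 500, 208, 61
X = rng.standard_normal((n_epochs, n_channels, n_times))
y = rng.integers(0, 2, size=n_epochs)

# A regularised linear classifier, fit independently at every time sample.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
time_decoder = SlidingEstimator(clf, scoring="roc_auc", n_jobs=1)

# Cross-validated decoding performance at each time point:
# scores has shape (n_folds, n_times); chance level is AUC = 0.5.
scores = cross_val_multiscore(time_decoder, X, y, cv=5, n_jobs=1)
mean_auc = scores.mean(axis=0)

With real data, plotting mean_auc against the epoch time axis reveals when a given representation (phonetic feature, part of speech, or a syntactic proxy) becomes decodable relative to phoneme or word onset, which is how time windows such as ~100-400 ms can be characterised.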