What neuronal and cognitive representations and computations underlie the transformation from “vibrations in the ear” (sounds) to “abstractions in the head” (words)? Successful communication using spoken language requires a speech processing system that can negotiate between the demands of auditory perception and motor output, on the one hand, and the representational and computational requirements of the language system, on the other. The speech processing system thus sits at the interface between acoustics and sensorimotor physiology, at one end, and linguistics, at the other. The effortlessness and automaticity of the perceptuo-motor and linguistic processes as we experience them belie the number and complexity of the bottom-up and top-down operations that, in aggregate, constitute the perception of speech and the comprehension of language. Developing a mechanistic understanding of this system remains one of the foundational challenges for cognitive neuroscience. The research draws, in a relentlessly interdisciplinary fashion, from neuroscience, linguistics, psychology, computation, and other relevant domains.
Screaming is an ability we share with many other primates, and one we possess long before we learn to express our affective states through speech. Previous studies of fearful screams highlighted acoustic features, such as roughness, that speech does not exploit (Arnal et al., 2015); these features activate the amygdala and other subcortical structures critical for danger appraisal. Screams, however, are not exclusively fearful. How diverse are the acoustic properties of screams across the range of affective states they express?
The tone of the voice carries information about a speaker's emotional state or intentions. Although the acoustic features that distinguish contrasting prosodic signals have attracted considerable attention in recent decades (particularly since Banse & Scherer, 1996), how emotions and intentions are communicated remains poorly understood. Moreover, although most listeners seem to share the ‘code’ needed to interpret a prosodic signal and access a speaker's emotions or intentions, misunderstandings occur easily. This project focuses on the cognitive processes involved in prosody comprehension.
In a foundational study published in 1955, Miller and Nicely measured the perceptual confusions among 16 consonants followed by the vowel [a] (ba, da, ga, etc.). The stimuli were subjected to different kinds of distortion, namely additive noise and variations in bandwidth, and the results were published as confusion matrices. In a follow-up study, Shepard (in a 1972 book chapter; see Shepard, 1980, for a more accessible publication) analyzed the Miller and Nicely data in a multidimensional scaling (MDS) framework, aiming to reveal the underlying perceptual structure that caused those confusions.
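The analysis logic can be sketched as follows: the confusion matrix is symmetrized, confusion rates are converted into dissimilarities, and MDS embeds the consonants in a low-dimensional space where frequently confused sounds lie close together. A minimal illustration using classical (Torgerson) MDS in numpy, with an invented four-consonant confusion matrix rather than Miller and Nicely's actual data:

```python
import numpy as np

# Toy confusion matrix (rows = stimulus, columns = response) for four
# hypothetical consonants. The values are invented for illustration and
# are NOT Miller and Nicely's actual data.
labels = ["p", "t", "k", "m"]
C = np.array([
    [50, 20, 18,  2],
    [22, 48, 16,  4],
    [19, 17, 52,  2],
    [ 3,  4,  2, 81],
], dtype=float)

# Convert counts to confusion probabilities, symmetrize, and turn the
# similarities into dissimilarities: frequently confused pairs get
# small distances (a -log transform of normalized similarity).
P = C / C.sum(axis=1, keepdims=True)
S = (P + P.T) / 2.0
D = -np.log(S / np.sqrt(np.outer(np.diag(S), np.diag(S))))

# Classical (Torgerson) MDS: double-center the squared distances and
# use the leading eigenvectors as coordinates.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
coords = eigvecs[:, order[:2]] * np.sqrt(np.maximum(eigvals[order[:2]], 0))

for label, (x, y) in zip(labels, coords):
    print(f"{label}: ({x:+.2f}, {y:+.2f})")
```

In the resulting map, the often-confused stops cluster together while the nasal sits apart; with the real 16-by-16 data, the recovered dimensions are interpretable in terms of features such as voicing and nasality (Shepard, 1980).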
Bilingualism has become increasingly common. Infants' auditory cortex undergoes structural maturation during the first three years of life (Yakovlev & Lecours, 1967), and infants develop the auditory capacity to recognize the specific acoustic patterns used in their native language (e.g., Kuhl et al., 1992). Bilinguals, compared to monolinguals, are exposed to and learn to identify a greater number of speech sounds. We hypothesize that distinguishing between and storing these larger sound inventories leads to the expansion of auditory cortical areas and the establishment of denser connections within their subregions.
Humans naturally tune in to the rhythm of speech (Giraud & Poeppel, 2012). Recent work has shown that low-frequency brain rhythms concurrently track the main constituents of the linguistic hierarchy, such as phrases and sentences (Ding et al., 2016); notably, this tracking was tested cross-linguistically using isochronous speech in English and Mandarin Chinese. Given the variable nature of linguistic constituents, however, it remains unclear whether implicit knowledge of linguistic structure contributes to tracking constituents (phrases, sentences) of variable duration.
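The frequency-tagging logic behind this paradigm can be illustrated with synthetic data: if isochronous syllables arrive at 4 Hz and are grouped into two-syllable phrases (2 Hz) and four-syllable sentences (1 Hz), purely acoustic tracking predicts a spectral peak only at the syllable rate, whereas tracking of linguistic structure predicts additional peaks at the phrase and sentence rates. A toy simulation (all rates and amplitudes here are illustrative assumptions, not the study's parameters):

```python
import numpy as np

# Toy frequency-tagging simulation: isochronous syllables at 4 Hz,
# grouped into 2-syllable phrases (2 Hz) and 4-syllable sentences (1 Hz).
fs = 100                        # sampling rate in Hz (arbitrary)
t = np.arange(0, 40, 1 / fs)    # 40 s of simulated recording
rng = np.random.default_rng(0)

# Simulated neural response: components at the syllable, phrase, and
# sentence rates plus broadband noise (amplitudes are invented).
signal = (1.0 * np.sin(2 * np.pi * 4 * t)
          + 0.6 * np.sin(2 * np.pi * 2 * t)
          + 0.4 * np.sin(2 * np.pi * 1 * t)
          + 0.5 * rng.standard_normal(t.size))

# Power spectrum: 40 s of data gives 0.025 Hz resolution, so the
# 1, 2, and 4 Hz tags fall on exact frequency bins.
freqs = np.fft.rfftfreq(t.size, 1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2 / t.size

# Compare power at each tagged rate against neighboring bins.
for f0 in (1.0, 2.0, 4.0):
    idx = int(np.argmin(np.abs(freqs - f0)))
    baseline = np.r_[power[idx - 5:idx - 1], power[idx + 2:idx + 6]].mean()
    print(f"{f0:.0f} Hz: peak/baseline = {power[idx] / baseline:.1f}")
```

In the actual experiments, the phrase- and sentence-rate peaks emerged only when listeners understood the language, which is the signature of structure tracking over and above acoustics (Ding et al., 2016).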