Neuronal oscillations are believed to play a role in various perceptual and cognitive tasks, including attention, navigation, memory, motor planning, and, most relevant to the present work, spoken-language comprehension. The specific computational functions of neuronal oscillations remain uncertain. We aim to elucidate how these ubiquitous neurophysiological attributes may underpin speech, language, and music processing. Speech and other dynamically changing auditory signals (as well as visual stimuli, including sign language) carry the information required for successful decoding at multiple temporal scales (e.g., slower intonation-level information, syllabic information, and rapidly changing featural information). These different aspects of the signal (slow and fast temporal modulation, frequency composition) must all be analyzed to achieve successful recognition. One ‘mesoscopic-level’ mechanism proposed for parsing a naturalistic input signal (e.g., speech) into elementary pieces is the application of temporal windows, implemented as low-frequency oscillations on privileged time scales.
When we listen to someone speaking, we are able to quickly and effortlessly understand the content of the spoken language. This ability, however, obscures the complexity of the neural processes that underlie comprehension. One of the first steps that the perceptual system has to accomplish is to break the incoming speech stream into units or segments that can provide the basis for the next processing steps. Very little, however, is known about the neural mechanisms of linguistic decoding, that is, how information about the physical stimulus is mapped onto stored linguistic information in the brain (informally speaking, words).
Natural sounds, music, and vocal sounds have a rich temporal structure over multiple timescales, and behaviorally relevant acoustic information is usually carried on more than one timescale. For example, speech conveys linguistic information at several scales: 20-80 ms for phonemic information, 100-300 ms for syllabic information, and more than 1000 ms for intonation information. Therefore, successful perceptual analysis of auditory signals requires the auditory system to extract acoustic information at multiple scales.
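As a rough illustration of how these window durations relate to oscillation frequencies, the sketch below converts each timescale to a frequency range using f = 1/T. The function name `window_to_band_hz` and the dictionary layout are assumptions made for this example; only the durations come from the text, and the open-ended intonation range is represented here with a lower bound of 0 Hz.

```python
def window_to_band_hz(t_min_ms, t_max_ms=None):
    """Convert a range of window durations (ms) to a frequency range (Hz).

    The shortest window gives the highest frequency (f = 1000 / T_ms).
    If no upper duration is given, the lower frequency bound is set to 0.
    """
    high_hz = 1000.0 / t_min_ms
    low_hz = 1000.0 / t_max_ms if t_max_ms is not None else 0.0
    return low_hz, high_hz


# Durations taken from the text; the labels are for illustration only.
TIMESCALES_MS = {
    "phonemic": (20, 80),
    "syllabic": (100, 300),
    "intonation": (1000, None),  # "more than 1000 ms"
}

bands = {name: window_to_band_hz(*span) for name, span in TIMESCALES_MS.items()}

for name, (low, high) in bands.items():
    print(f"{name}: {low:.1f} to {high:.1f} Hz")
```

Under this conversion, phonemic windows correspond to about 12.5-50 Hz, syllabic windows to about 3.3-10 Hz, and intonation-level windows to frequencies below 1 Hz.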
The precise role of cortical oscillations in speech processing is still under investigation. According to current research, the phase alignment of Δ/θ-band (2-8 Hz) neural oscillations in the auditory cortex is involved in the segmentation of speech. Neural oscillations in the θ band track the slow energy fluctuations in the speech signal, which occur at the syllabic rate.
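One common way to quantify phase alignment across repeated presentations is the length of the mean phase vector, often called inter-trial phase coherence. The text does not say which measure the cited research used, so the following is only a minimal sketch under that assumption; the function `phase_locking` and the example phase values are hypothetical.

```python
import cmath
import math


def phase_locking(phases):
    """Return the length of the mean unit phase vector (0 to 1).

    `phases` holds one phase angle (radians) per trial, measured at the
    same latency and frequency. Values near 1 mean the phases are aligned
    across trials; values near 0 mean they are spread around the circle.
    """
    if not phases:
        raise ValueError("at least one phase value is required")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical example data: tightly clustered vs. evenly spread phases.
aligned = [0.10, -0.05, 0.00, 0.08, -0.10]
spread = [2 * math.pi * k / 8 for k in range(8)]

print(phase_locking(aligned))  # close to 1
print(phase_locking(spread))   # close to 0
```

In practice the per-trial phases would first have to be estimated from band-limited recordings (for example, in the 2-8 Hz range mentioned above); that step is omitted here.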
Recently, Overath, McDermott, Zarate and Poeppel showed that brain regions involved in speech-specific processing (i.e., the superior temporal sulcus) are activated even by strongly corrupted speech stimuli.
In a recent study, Ding et al. (2016) showed that spectral peaks in neural activity corresponded to multiple levels of linguistic structure (e.g., peaks in the delta and theta range corresponded to the phrase and syllable rate, respectively). Because the stimuli contained no acoustic or prosodic cues at the phrase rate, the peaks in the delta range must be generated internally.
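The logic of reading linguistic rates off a response spectrum can be sketched with a toy signal. The example below builds a synthetic "response" containing one component at a syllable rate and a weaker one at a phrase rate, then locates the spectral peaks with a plain discrete Fourier transform. The rates (4 Hz and 2 Hz), amplitudes, sampling rate, and threshold are illustrative assumptions, not values taken from the text, and the code is not the analysis pipeline of the cited study.

```python
import cmath
import math


def amplitude_spectrum(signal, fs):
    """Return (frequency_hz, amplitude) pairs for bins from 0 Hz to Nyquist.

    Uses a direct DFT, which is adequate for short signals. The 2/N
    scaling recovers sinusoid amplitudes for bins strictly between
    0 Hz and the Nyquist frequency.
    """
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        spectrum.append((k * fs / n, 2 * abs(coeff) / n))
    return spectrum


# Assumed parameters for the toy example.
FS = 32             # sampling rate in Hz
DURATION_S = 4      # gives a frequency resolution of 0.25 Hz
SYLLABLE_HZ = 4.0   # hypothetical syllable rate (theta range)
PHRASE_HZ = 2.0     # hypothetical phrase rate (delta range)

response = [
    math.sin(2 * math.pi * SYLLABLE_HZ * i / FS)
    + 0.5 * math.sin(2 * math.pi * PHRASE_HZ * i / FS)
    for i in range(FS * DURATION_S)
]

spectrum = amplitude_spectrum(response, FS)
peaks = [freq for freq, amp in spectrum if amp > 0.1]
print(peaks)
```

In this toy signal both components are put in by hand. In the experimental setting described above, the stimulus itself would carry energy only at the syllable rate, so a phrase-rate peak in the recorded response could not be attributed to the acoustics.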