18 January 2024

Guest Lecture by Adrien Doerig

Interpreting the meaning of a visual scene requires not only identification of its constituent objects, but also rich semantic characterization of object interrelations. Here, we study the neural mechanisms underlying visuo-semantic transformations by leveraging recent progress in linguistic AI. We evaluate models of increasing complexity, from object categories to single-word embeddings to sentence-level embeddings, and apply the resulting models to a large-scale 7T fMRI dataset of human brain responses elicited by complex natural scenes. We identify a widely distributed network of brain regions that encode sentence-level scene descriptions. Importantly, these sentence embeddings better explain activity in these regions than traditional object category labels or word embeddings, despite the fact that the participants were not required to actively engage in a semantic task. We then show that highly accurate reconstructions of scene captions can be linearly decoded from patterns of brain activity evoked by natural scenes. Finally, we show that a recurrent convolutional neural network trained on sentence embeddings predicts brain activity better than the sentence embeddings themselves, and better than control networks trained on category labels. Together, these experimental and modelling results suggest that transforming visual input into rich semantic scene descriptions may be a central computational feature of the visual system, and that focusing efforts on this new objective may lead to improved models of visual information processing in the human brain.
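The core analysis described above is a linear encoding model: stimulus features (category labels, word embeddings, or sentence embeddings) are linearly mapped to voxel responses, and competing feature sets are compared by held-out prediction accuracy. The following is a minimal illustrative sketch of this idea, not the authors' code; all data are synthetic, and the dimensions, the ridge penalty, and the feature/voxel counts are arbitrary choices for the example.

```python
# Sketch of a voxelwise linear encoding model with ridge regression.
# Synthetic stand-ins: X = stimulus features (e.g. sentence embeddings),
# Y = fMRI voxel responses. Evaluation: per-voxel correlation between
# predicted and measured responses on held-out stimuli.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test = 200, 50
n_features = 64   # e.g. dimensionality of a sentence embedding (assumed)
n_voxels = 10

# Ground-truth linear mapping plus noise, standing in for real brain data.
W_true = rng.standard_normal((n_features, n_voxels))
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_voxels))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_voxels))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def voxelwise_corr(Y_pred, Y_true):
    """Pearson correlation between predicted and measured response, per voxel."""
    Yp = Y_pred - Y_pred.mean(axis=0)
    Yt = Y_true - Y_true.mean(axis=0)
    return (Yp * Yt).sum(axis=0) / (
        np.linalg.norm(Yp, axis=0) * np.linalg.norm(Yt, axis=0)
    )

W = fit_ridge(X_train, Y_train)
scores = voxelwise_corr(X_test @ W, Y_test)
print(scores.mean())  # mean held-out prediction accuracy across voxels
```

In the study proper, the same fitting and scoring procedure would be repeated per feature set (categories vs. word embeddings vs. sentence embeddings) so that the models can be ranked by how well they predict held-out responses in each region.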

Adrien Doerig (University of Osnabrück, Germany)
Adrien obtained an MSc in neuroscience and physics at EPFL, Switzerland, followed by a PhD in neuroscience under the direction of Prof. Michael Herzog, working on computational models of vision. Since then, he has worked with Prof. Tim Kietzmann at the Donders Institute in the Netherlands and at the University of Osnabrück, Germany. His interests focus on computational models of vision, with a particular interest in modern AI techniques such as DNNs. Currently, he works on semantic representations in the visual system, non-convolutional topographic networks, and dual-stream networks with eye movements. Aside from his interests in neuroAI, he has published several papers on scientific theories of consciousness and how to test them.