Humans are exquisitely sensitive to the particular arrangement of the complex visual features that make up objects. A large body of evidence implicates the ventral temporal cortex in object recognition, suggesting that it encodes both complex features and their spatial arrangement. In contrast, we demonstrate that the representational geometry of both ventral temporal cortex and ImageNet-trained deep convolutional neural networks does not distinguish natural images of objects or scenes from images engineered to contain similar complex visual features in spatially scrambled arrangements. Human observers, nonetheless, can easily distinguish the natural images from their scrambled counterparts. Our results suggest the need to reconceptualize the role of ventral temporal cortex as representing a basis set of complex, texture-like visual features that are generally useful for a variety of visual behaviors, rather than as an explicit representation of objects.
https://profiles.stanford.edu/justin-gardner?tab=bio
Organized by: Department of Psychology
Co-organized by: Center for Artificial Intelligence and Advanced Robotics; Imaging Center for Integrated Body, Mind and Culture Research