Symposium on Imagistic Cognition

Authors

Roland Fleming, Yaniv Morgenstern, Kevin Lande, Christopher Gauker

Affiliations: University of Gießen, Erasmus University Rotterdam, York University, Universität Salzburg

Category: Symposia

Keywords: imagistic cognition, perception, imagery, generative models, similarity space

Schedule & Location

Date: Thursday 4th of September

Time: 14:30

Location: Maria Skłodowska-Curie Hall (123)

Abstract

Imagistic cognition comprises the processes by which perceptual and quasi-perceptual imagistic representations are produced and transformed in the interest of problem solving. The processes by which we learn to produce and use imagistic representations are also included. An assumption underlying the theory of imagistic cognition is that processes of imagistic cognition can function largely independently of conceptual (classifying) representations, although the imagistic foundation for conceptual representations is also a topic for the theory of imagistic cognition.

The presentations in this symposium deal with foundational questions concerning imagistic cognition. Roland Fleming will explain how representations of distal properties can emerge in generative models seeking regularities in proximal sensory data. Yaniv Morgenstern will set forth a model for non-symbolic representations of similarity between perceptible and imaginable objects, and even scenes, in a multi-dimensional similarity space whose dimensions measure gradable qualities. Kevin Lande will extend his account of the perceptual representation of the spatial relations of the parts of objects to the special case of our representation of several objects in a connected space. Christopher Gauker will explain how the distinction between realistic and fantastic courses of imagination can be defined in terms of the modes of representation and processes of transformation of imagistic cognition.

Taken together, these four presentations lay much of the foundation for a study of imagistic cognition. Morgenstern and Lande present the basic formats of perceptual/imagistic representations. Fleming explains how perceptual/imagistic representations acquire representational contents. Gauker defines a distinction that helps to differentiate the useful imagistic representations from the useless.


Generative models and visual understanding

Roland W. Fleming, PhD, FRSB
Kurt Koffka Professor of Experimental Psychology
Justus Liebig University Giessen,
Gießen, Germany

Perception researchers typically frame perception as the process of inferring the distal world from proximal sense data. In this view, the primary obstacle is the ambiguity of raw sense data and the primary result is knowing 'what' is 'where'. Here I challenge and extend both of these central tenets of perception research.

First, using theory, demos and data, I'll argue that perception delivers much more than mere 'labels in space'. Instead, we enjoy a deep visual understanding of the physical and functional properties of our environment. At a glance, we can judge subtle parametric attributes of objects, materials, places and agents, like seeing whether a surface is flexible, fragile or slippery, or whether a predator is poised to pounce. We can use our perceptual understanding to infer what likely happened to objects in their past (e.g., a crushed can or bitten apple) and predict likely futures (e.g., which direction a block tower would tumble if nudged). We can work out whether tugging a thread would unravel a knot, or visualise how other members of a novel object class might look having seen just a single example. Together these capabilities suggest a central role for internal 'generative models' that encapsulate perceptual knowledge about the physical and functional behaviour of our world, and which can guide reasoning in an imagistic, specifically perceptual way.

Second, I'll discuss how we acquire such deep, generative understanding of the world. I'll argue that in addition to the 'perceptual inference problem' of estimating distal properties from sense data, there is a more fundamental 'ontological inference problem'. Before we can learn how to see, we need to work out what to see. For example, learning to infer how long a table edge is ('size constancy') requires knowing that there is such a thing as distal length in the first place. Somehow the brain has to discover the basic latent variables of the world that are to be inferred from sensory evidence. This is challenging because neither we, nor those who teach us, nor any evolutionary ancestor, has ever had access to extrasensory information about the true state of the world. Appealing to other senses as training signals (e.g., 'touch educates vision') is no help: all senses provide ambiguous information, they all suffer from the same ontological challenge, and many aspects of the world are accessible to only one sense at a time: tree tips are out of reach, and you can't smell the pitch of a voice or touch the colour of a surface. While generating actions likely helps us learn perceptual representations, there is no sidestepping the fact that the changes in world state brought about by issuing motor commands are ultimately also known to us only through the senses.

I will argue that, paradoxically, the best way to acquire deep knowledge about the distal structure of the world is through learning objectives focussed on encoding and predicting proximal sense data. I will show how such approaches predict both successes and putative failures of human perception.
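
As a minimal, hypothetical sketch of this idea (a toy illustration, not Fleming's actual models), consider a linear autoencoder trained only to reconstruct proximal data: its learned latent code can come to track a distal variable it never observed.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy world: one distal variable (say, object size) generates
    # high-dimensional proximal data through an unknown mixing process.
    n, d = 2000, 20
    distal = rng.uniform(-1.0, 1.0, size=(n, 1))   # never observed directly
    mixing = rng.normal(size=(1, d))
    proximal = distal @ mixing + 0.05 * rng.normal(size=(n, d))

    # Linear autoencoder trained purely on a proximal objective:
    # reconstruct the sense data from a one-dimensional latent code.
    enc = rng.normal(scale=0.1, size=(d, 1))
    dec = rng.normal(scale=0.1, size=(1, d))
    lr = 0.01
    for _ in range(500):
        z = proximal @ enc                   # latent code
        err = z @ dec - proximal             # reconstruction error (proximal only)
        dec -= lr * (z.T @ err) / n
        enc -= lr * (proximal.T @ (err @ dec.T)) / n

    # The latent code ends up tracking the distal variable (up to sign).
    print(np.corrcoef((proximal @ enc)[:, 0], distal[:, 0])[0, 1])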


ShapeComp and the dynamics of imagistic shape representations

Yaniv Morgenstern
Erasmus School of Social and Behavioral Sciences
Erasmus University Rotterdam
Rotterdam, Netherlands
(Reports on joint work with Filipp Schmidt, Frieder Hartmann, Henning Tiedemann, Kate Storrs, Guido Maiello, Eugen Prokott, Johan Wagemans, and Roland Fleming)

Mental imagery plays a central role in human cognition, allowing us to compare, transform, and reason about objects without direct sensory input. Unlike symbolic accounts of cognition, theories of imagistic cognition hold that internal representations preserve structural properties of perceptual experience, enabling fluid transformations between similar shapes. In this context, we explore ShapeComp, a high-dimensional computational model of visual shape similarity, as a framework for understanding imagistic cognition. ShapeComp provides a quantitative approach to modeling shape perception by embedding objects in a structured, continuous space where similarity is determined along multiple perceptual dimensions.

ShapeComp models shape similarity not through discrete symbolic categories but via a set of perceptual dimensions that capture systematic variations in shape. These dimensions were derived in a data-driven way, through computational analysis of large-scale datasets of natural animal shapes, and include global properties such as "spiky-to-blobby," which reflects the degree of protrusion or curvature in a shape, and the orientation of the shape relative to the horizontal axis. By representing objects as points in a high-dimensional perceptual space, ShapeComp supports fine-grained similarity judgments that align with human perception: shapes that are nearer in the space are perceived as more similar, with proximity measured by, for example, Euclidean distance in the embedding.
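
As a toy illustration only (the dimension names, coordinates, and similarity function below are invented, not ShapeComp's published parameters), similarity in such a space reduces to a distance computation between embedding vectors:

    import numpy as np

    # Hypothetical embeddings: each shape is a point in an N-dimensional
    # perceptual space (dimension names are illustrative, not ShapeComp's).
    shapes = {
        "cat":    np.array([0.2, 0.7, 0.1, 0.8]),   # e.g. spikiness, elongation, ...
        "lizard": np.array([0.3, 0.9, 0.2, 0.6]),
        "urchin": np.array([0.9, 0.1, 0.0, 0.9]),
    }

    def similarity(a, b):
        """Perceived similarity falls off with Euclidean distance in the space."""
        return float(np.exp(-np.linalg.norm(shapes[a] - shapes[b])))

    print(similarity("cat", "lizard"))   # nearby in the space: high similarity
    print(similarity("cat", "urchin"))   # distant in the space: low similarity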

Beyond pairwise shape similarity, ShapeComp extends naturally to contexts involving multiple objects. Unlike traditional approaches that compare individual object pairs in isolation, ShapeComp can successfully evaluate relative shape similarity in multi-object contexts. This feature enables the model to predict how objects are mentally arranged in a conceptual space and how perceptual adaptation—such as shape aftereffects—systematically shifts representations within this space. Empirical studies on shape aftereffects reveal that prolonged exposure to a given shape biases subsequent perception away from the adapted stimulus, an effect that ShapeComp accounts for by shifting objects within its high-dimensional embedding. These perceptual shifts provide computational evidence that shape representations are dynamically updated based on visual experience, rather than being fixed symbolic entities.
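
One simple way to realize such a repulsive shift in an embedding is sketched below; the repulsion rule and the gain value are hypothetical simplifications for illustration, not the fitted model:

    import numpy as np

    def adapt(test, adapter, gain=0.3):
        """After prolonged exposure to `adapter`, the perceived embedding of
        `test` is pushed away from the adapter along the line joining them."""
        direction = test - adapter
        norm = np.linalg.norm(direction)
        if norm == 0.0:
            return test.copy()                  # identical shapes: no defined direction
        return test + gain * direction / norm   # repulsion in embedding space

    test = np.array([0.4, 0.6, 0.2])
    adapter = np.array([0.9, 0.1, 0.2])
    print(adapt(test, adapter))   # shifted away from the adapted shape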

The ability of ShapeComp to represent both local and global shape transformations provides insights into imagistic cognition. Unlike symbolic models, which rely on discrete feature sets or rule-based classification, ShapeComp offers a continuous, structured representation that captures the graded nature of visual experience. This aligns with theories of mental imagery in which internal representations are not static snapshots but dynamically evolve under perceptual and cognitive constraints. Moreover, the model's capacity to generalize across diverse shape comparisons suggests that imagistic cognition is inherently high-dimensional, involving transformations along multiple perceptual axes simultaneously.

By bridging computational modeling with empirical findings on shape perception and aftereffects, ShapeComp provides a novel framework for studying imagistic cognition. Its ability to model both pairwise and multi-object similarity judgments, alongside perceptual adaptation effects, highlights the dynamic and non-symbolic nature of shape representations. This approach offers a promising direction for understanding how mental imagery operates in the visual domain and, more broadly, how the mind encodes and manipulates complex perceptual structures.


The spatial unity of perception

Kevin J. Lande
Department of Philosophy
York University
Toronto, Canada

We seem to perceive things as related within a common, unified space. Not only do I see the cat as here on the mat and the lamp as there on the ramp; I also see the cat as to the right of the lamp. Yet perceptual processes code space with respect to a diversity of reference frames, based on different parts of one's body, one's environment, and objects themselves. How does a unified perception of space emerge from this diversity of ways of coding space? How do we go from multiple perceptual “images” or “maps” to one seamless one? We can divide this question into three sub-questions. The Representation Question asks what it is to represent things as located in a common space—to have spatially unified representations of things. The Grounding Question asks about the fundamental conditions under which we can have representations that are spatially unified in that sense. The Causal Question asks how such representations are generated by psychological processes.

I focus on the Representation Question. The standard Identity Assumption holds that in order to represent things as located in a common space, one must represent those things all with respect to one and the same frame of reference. Contents must be remapped into the same map, so to speak. I argue that the Identity Assumption is false. In its place, I offer a Compositional account of spatial perception. We can perceive space in a structured way. As an example, suppose a friend is waving as they walk past you. You see the back-and-forth motion of their hand relative to their body. You see the lateral motion of their body relative to you. You see the motion of their hand relative to you in a structured way, as moving back-and-forth relative to something that is moving laterally relative to you. I argue that this structured way of seeing spatial features and relations is essential to what it is to perceive things as located within a common space. To perceive things all within a common space is not to remap them into one common map; it is, in essence, to model the way that different maps, or frames of reference, are related to each other.
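
The contrast can be made concrete with a toy sketch (the frames, angles, and offsets are invented for illustration): where the Identity Assumption requires re-expressing every content in a single master frame, the Compositional account amounts to storing the transforms that relate frames and composing them on demand:

    import numpy as np

    def rigid(theta, tx, ty):
        """Homogeneous 2D transform: rotation by theta, then translation."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, tx],
                         [s,  c, ty],
                         [0,  0,  1]])

    # Frames: the hand is coded relative to the body, the body relative to the viewer.
    body_in_viewer = rigid(np.pi / 8, 2.0, 0.5)   # body moving laterally past you
    hand_in_body   = rigid(0.0, 0.3, 1.2)         # hand waving near the shoulder

    # The Compositional account keeps both frames and models their relation;
    # composing the transforms yields viewer-relative content only when needed.
    hand_in_viewer = body_in_viewer @ hand_in_body
    point_on_hand = np.array([0.0, 0.0, 1.0])     # hand-frame origin
    print(hand_in_viewer @ point_on_hand)         # location relative to the viewer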


How to imagine realistically

Christopher Gauker
Department of Philosophy
University of Salzburg
Salzburg, Austria

When we imagine a course of events, we may regard what we imagine as realistic or as pure fantasy. We prefer realistic courses of imagination when we are trying to solve practical problems, for instance, in trying to imagine how the cat got out of the house. The distinction between realistic and fantastic courses of imagination does not reduce to the distinction between probable and improbable, and it does not reduce to a distinction between what is possible and what is impossible. Accordance with prior beliefs is a constraint, but an insufficient one.

A fundamental account of realistic imaginings will be built on three components. First, we need an account of the representation relation for imagistic representations. Second, we need an account of the transformations applicable to representations of these kinds. Third, we need an account of how episodic memory supplies representations to start with. This presentation will address only the first two.

An imagistic representation, like a perceptual representation, has a location in a multidimensional perceptual similarity space. Call this kind of representation gauge-like representation. The dimensions measure gradable, perceptible qualities, some expected (hue, size) and others discovered empirically (stringiness) (Hebart et al., Nature Human Behaviour 2020; Morgenstern, Fleming, et al., PLOS Computational Biology 2021). Gauge-like representation allows us to recognize that x is more like y than like z with respect to a given subset of dimensions.
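
A minimal sketch of such subset-relative comparison, using the dimensions named above with invented coordinates and objects:

    import numpy as np

    DIMS = {"hue": 0, "size": 1, "stringiness": 2}   # dimensions named in the text

    def closer(x, y, z, dims):
        """Is x more like y than like z, restricted to the given dimensions?"""
        idx = [DIMS[d] for d in dims]
        return np.linalg.norm(x[idx] - y[idx]) < np.linalg.norm(x[idx] - z[idx])

    # Hypothetical gauge-like representations (coordinates are invented):
    x = np.array([0.2, 0.5, 0.9])   # e.g. a mop head
    y = np.array([0.8, 0.5, 0.8])   # e.g. a wig: different hue, similarly stringy
    z = np.array([0.2, 0.5, 0.1])   # e.g. a bowling ball: same hue, not stringy

    print(closer(x, y, z, ["stringiness"]))   # True: x resembles y in texture
    print(closer(x, y, z, ["hue", "size"]))   # False: x resembles z in colour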

Second, an imagistic representation, like a perceptual representation, has an internal structure of parts and relations. Accurate perceptual representations will stand in a relation of structural isomorphism to the objects they represent. Call this kind of representation map-like representation. Map-like representation is attested in studies of mental transformations of objects (Shepard & Feng, Cognitive Psychology 1972) and amodal completion (van Lier & Wagemans, JEP: Human Perception and Performance 1999).

In terms of gauge-like representation, we can define admissible morphing transformations by which we take memories of processes and uniformly translate them across one or more dimensions of perceptual similarity space. By this means we can imagine that a certain bending transformation that we have seen applied to a spring can be applied to a cylinder of baking dough (Schmidt, Fleming, et al., Cognition 2019), or that a certain dance step that we have seen performed by a woman can be performed by a man. In terms of map-like representations we can define admissible geometric transformations. By means of these we are able to imagine what an object will look like from the other side (Cooper, JEP: Learning, Memory and Cognition 1990) and decide whether two objects will fit into each other (Hafri & Firestone, Trends in Cognitive Sciences 2021).
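
Both families of transformations admit a compact sketch; the dimension labels, coordinates, and shift values below are invented for illustration:

    import numpy as np

    def morph(embedding, shift, dims):
        """Admissible morphing: uniform translation of a remembered process
        along selected dimensions of perceptual similarity space."""
        out = embedding.copy()
        out[dims] += shift
        return out

    def rotate(points, theta):
        """Admissible geometric transformation: rigid rotation of a map-like
        representation (here, 2D part locations about the origin)."""
        c, s = np.cos(theta), np.sin(theta)
        return points @ np.array([[c, s], [-s, c]])

    # A remembered bending event, coded in a toy similarity space
    # (hypothetical dimensions: [doughiness, springiness, curvature]):
    spring_bend = np.array([0.1, 0.9, 0.4])
    dough_bend = morph(spring_bend, shift=np.array([0.7, -0.7]), dims=[0, 1])
    print(dough_bend)                 # the same bending, relocated in the space

    # A map-like object representation, rigidly transformed:
    outline = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.5]])
    print(rotate(outline, np.pi))     # the same part structure, rotated 180°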

My hypothesis is that all realistic courses of imagining can be generated from material supplied by memory by means of some combination of admissible transformations defined in these ways (which I will here only illustrate). Of those so generated, the realistic ones are those compatible with prior belief.

