Enactive Interface Perception and Affordances

Posted in artificial intelligence, interfaces on November 14th, 2011 by Samuel Kenyon

I just published version 2 of my Enactive Interface Perception essay over on Science 2.0.

It’s now called “Enactive Interface Perception and Affordances”.


Your Visual System is Lying

Posted in artificial intelligence, philosophy on September 6th, 2010 by Samuel Kenyon
Photo of a road viewed uphill.

Does a hill feel steeper when you are already exhausted?  Does a hill appear steeper when you are afraid to roll down it?  Is it true that baseballs appear larger to players when they are hitting well? You may have some suspicions that your perception is greatly affected by your context and may not always be correct.

Psychologists Dennis R. Proffitt (University of Virginia) and Jessica Witt (now at Purdue University) have performed some interesting experiments in recent years dealing with perception and action.  Christof Koch, professor of biology and engineering at the California Institute of Technology and popular science writer, described some of them in his column in the July 2010 issue of Scientific American Mind [1].

In the slant experiments [2], subjects were asked to estimate the slope of a hill with two different visual tasks:

  1. Visual matching: Adjust a line on a flat disk to indicate the slant.

    Diagram of a disk consisting of a circle with a black line diameter.

  2. Haptic: Adjust the slant of a movable board with your hands without looking at the hands.

    Diagram of haptic device, consisting of a rotating board mounted to a stand, with a hand resting palm down on the board.

Koch didn’t mention this, but according to Witt and Proffitt, the two tasks in the experiment are supposed to be absolute: the slant of the hill is measured in its own right, not compared to anything else.  Previous experiments had used a third, relative task, which compared the slant to the ground plane.  So the new “absolute” disk was introduced to attempt to filter task differences out of the results.

Experiment 1 had the subjects look at the hill head on (pitch).  Task 1 (matching) was not very accurate: a 31-degree slope was perceived as much steeper, around 50 degrees, and a 22-degree slope was also perceived as steeper (between 30 and 40 degrees).  Task 2 (haptic), however, was accurate.

In previous studies, visual matching and verbal reports were even more inaccurate when the subjects were encumbered, tired, unhealthy, or elderly, yet the haptic task was not influenced.

In experiment 2, the subjects could see the slant of the hill from the side (in cross-section).  Yet they still overestimated the slant with visual matching, even though they could actually hold the disk up in their visual field and match the line directly to the slant.  The haptic task remained accurate.

The researchers concluded that these results support the theory that we have two independent visual systems, one for explicit awareness, and one that is visuomotor for immediate actions.
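The dissociation the researchers describe, an exaggerated and state-dependent conscious estimate alongside a roughly veridical action estimate, can be caricatured in a few lines of code.  The scaling constants below are invented purely to mimic the reported pattern; they are not the researchers' model.

```python
# Toy model of the two-visual-systems reading of the slant data.
# The conscious ("perception") stream overestimates slant, and more
# so as physiological cost rises; the visuomotor stream tracks the
# true value.  Constants are illustrative, not fitted to the data.

def conscious_slant(true_deg, effort_cost=1.0):
    """Explicit-awareness estimate: exaggerated, state-dependent."""
    return true_deg * (1.3 + 0.3 * effort_cost)

def visuomotor_slant(true_deg):
    """Action-stream estimate: roughly veridical."""
    return true_deg

print(conscious_slant(31))   # roughly 50, like the visual matching task
print(conscious_slant(22))   # between 30 and 40
print(visuomotor_slant(31))  # 31, like the haptic task
```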

As Koch explains [1]:

Proffitt argues that perception is not fixed: it is flexible, reflecting a person’s physiological state. Your conscious perception of slant depends on your current ability to walk up or down hills—hard work that should not be undertaken lightly. If you are tired, frail, scared or carrying a load, your assessment of the hill—the one that guides your actions—will differ from what you see. Not by choice, but by design. It is the way you are wired.

The Enactive Approach to Perception

Photo of a path going uphill in the woods, with a warning sign about the hill.

I am reminded of the book Action in Perception, in which philosopher Alva Noë said [3, p.228]:

Perceptual experience, according to the enactive approach, is an activity of exploring the environment drawing on knowledge of sensorimotor dependencies and thought.

It seems that the slant perception experiments mesh with the enactive approach.  Again, from Noë [3, p.105]:

To see that something is flat is precisely to see it as giving rise to certain possibilities of sensorimotor contingency.  To feel a surface as flat is precisely to perceive it as impeding or shaping one’s possibilities of movement.

But, you might argue, the enactive approach doesn’t account for why Task 2 (haptic) is more accurate–wouldn’t the enactive approach predict that Task 2 is as skewed as Task 1?  However, one of the things Noë attempted to argue was that we always have dual experiences even if we don’t realize it:

  1. How things are to us in experience.
  2. How things look.  Normally this is transparent, but one can, with some effort, see how things present themselves visually, e.g. when an artist is painting a depiction.

So this hypothetical solution suggests that Task 2, operating on the action-oriented visuomotor stream, provides the kind of data that one can otherwise only achieve deliberately, with the practiced techniques of artists.

Turning the Tables

Koch started off his column by invoking the two vision systems theory (which is what the slant experiment was trying to test) [1]:

As psychologists and neuroscientists have discovered over the past several decades, our consciousness provides a stable interface to a dizzyingly rich sensory world. Underneath this interface lurk two vision systems that work in parallel. Both are fed by the same two sensors, the eyeballs, yet they serve different functions. One system is responsible for visual perception and is necessary for identifying objects—such as approaching cars and potential mates—independent of their apparent size or location in our visual field. The other is responsible for action: it transforms visual input into the movements of our eyes, hands and legs. We consciously experience only the former, but we depend for our survival on both.

So isn’t that enough of a description–doesn’t it annihilate the enactive approach?  Noë addressed the two visual systems theory, but spent little more than a page on it, dismissing it as orthogonal to the enactive approach.  Noë states that both visual streams depend on deployment of sensorimotor skills [3, p.19].

However, the slant perception studies actually lend support to the enactive approach: there are sensorimotor relations to both the visuomotor stream and the explicit awareness stream.  The awareness perceptions are modulated by sensorimotor skills as they apply to a person’s current context, which is why Task 1 results in inaccurate perceptions.  Meanwhile, the visuomotor stream is tied into a quicker, tighter loop that skips the slower aware perceptual processing the other stream uses, which is why Task 2 is accurate.

In other words, both visual systems use sensorimotor skills, but in different ways, thus giving support to the enactive approach.


[1] Koch, C., “Looks Can Deceive: Why Perception and Reality Don’t Always Match Up,” Scientific American Mind, July 2010.
[2] Witt, J. K., & Proffitt, D. R., “Perceived slant: A dissociation between perception and action,” Perception, vol. 36, pp. 249-257, 2007.
[3] Noë, A., Action in Perception, Cambridge, MA: MIT Press, 2004.

Image Credits

  1. Stefan Jannson
  2. Samuel H. Kenyon
  3. Samuel H. Kenyon
  4. most uncool

Crosspost with my other blog, In the Eye of the Brainstorm.

Enactive Interface Perception

Posted in artificial intelligence, interfaces on February 24th, 2010 by Samuel Kenyon

UPDATE 2011: There is a new/better version of this essay:  “Enactive Interface Perception and Affordances”.

There are two theories of perception which are very interesting to me not just for AI, but also from a point of view of interfaces, interactions, and affordances.  The first one is Alva Noë’s enactive approach to perception.  The second one is Donald D. Hoffman’s interface theory of perception.

Enactive Perception vs. Interface Perception

Enactive Perception

The key element of the enactive approach to perception is that sensorimotor knowledge and skills are a required part of perception.

A lot of artificial perception schemes, e.g. for robots, run algorithms on camera video frames.  Some programs also use the time dimension, e.g. structure from motion.  They can find certain objects and even extract 3-D data (especially if they also use a range scanner such as LIDAR, ultrasound, or radar).  But the enactive approach suggests that animal visual perception is not simply a transformation of 2-D pictures into a 3-D representation (or any other kind of representation).

Example of optical flow (one of the ways to get structure from motion). Credit: Naoya Ohta.

My interpretation of the enactive approach is that it suggests perception co-evolved with motor skills such as how our bodies move and how our sensors, for instance our eyes, move.  A static 2-D image cannot tell you which color blobs are objects and which are merely artifacts of the sensor or environment (e.g. lighting effects).  But if you walk around the scene, and take into account how you are moving, you get a lot more data for figuring out what is stable and what is not.  We have evolved to have constant motion in our eyes via saccades, so even without walking around or moving our heads, our visual perception system is getting this motion data.
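This motion cue is exactly what optical-flow algorithms exploit.  Below is a minimal single-patch Lucas-Kanade sketch in pure NumPy, one standard way to estimate flow; the synthetic frames and the numbers in it are my own illustration, not taken from any of the work cited here.

```python
import numpy as np

def lucas_kanade_patch(prev, curr):
    # Estimate one translational flow vector for a whole patch by
    # solving the brightness-constancy least-squares system
    # [Ix Iy] v = -It over every pixel in the patch.
    Ix = np.gradient(prev, axis=1)   # spatial gradient, x direction
    Iy = np.gradient(prev, axis=0)   # spatial gradient, y direction
    It = curr - prev                 # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy)

# Synthetic frames: a smooth blob shifted one pixel to the right.
x, y = np.meshgrid(np.arange(64), np.arange(64))
frame0 = np.exp(-((x - 32)**2 + (y - 32)**2) / 50.0)
frame1 = np.exp(-((x - 33)**2 + (y - 32)**2) / 50.0)

vx, vy = lucas_kanade_patch(frame0, frame1)
# vx comes out near 1.0 (one pixel rightward), vy near 0.0
```

Real pipelines do this per window and per scale across the image (e.g. pyramidal Lucas-Kanade or dense Farneback flow in OpenCV), but the motion-as-information principle is the same.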

Of course, there are some major issues that need to be resolved, at least in my mind, about enactive perception (and related theories).  As Aaron Sloman has pointed out repeatedly, we need to fix or remove dependence on symbol grounding.  Do all concepts, even abstract ones, exist in a mental skyscraper built on a foundation of sensorimotor concepts?  I won’t get into that here, but I will return to it in a later blog post.

The enactive approach says that you should be careful about assuming that perception (and consciousness) can be isolated on one side of an arbitrary interface.  For instance, it may not be valid to study perception (or consciousness) by looking just at the brain.  It may be necessary to include much more of the mind-environment system—a system which is not limited to one side of the arbitrary interface of the skull.

Perception as a User Interface

The Interface Theory of Perception says that “our perceptions constitute a species-specific user interface that guides behavior in a niche.”

Evolution has provided us with icons and widgets to hide the true complexity of reality.  This reality user interface allows organisms to survive better in particular environments, hence the selection for it.

Perception as an interface

Hoffman’s colorful example describes how male jewel beetles use a reality user interface to find females.  This perceptual interface is composed of simple rules involving the color and shininess of female wing cases.  Unfortunately, it evolved for a niche which could not have anticipated the trash dropped by humans, which leads to false positives—and results in male jewel beetles humping empty beer bottles.

Male Australian jewel beetle attempting to mate with a discarded "stubby" (beer bottle). Credit: Trevor J. Hawkeswood.
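The beetle's interface can be caricatured as a tiny rule-based classifier.  The feature names and cues below are my own toy encoding of "color and shininess of wing cases", not beetle psychophysics.

```python
# Toy caricature of a jewel beetle's "reality user interface":
# a mate detector built from a few crude perceptual rules.
# Features and thresholds are invented for illustration.

def looks_like_mate(stimulus):
    """Fire on the beetle's simple cues (brown and glossy),
    regardless of what the stimulus actually is."""
    return stimulus["color"] == "brown" and stimulus["glossy"]

female_wing_case = {"color": "brown", "glossy": True,  "is_beetle": True}
beer_bottle      = {"color": "brown", "glossy": True,  "is_beetle": False}
green_leaf       = {"color": "green", "glossy": False, "is_beetle": False}

assert looks_like_mate(female_wing_case)
assert looks_like_mate(beer_bottle)    # false positive: the interface
                                       # cannot see past its own cues
assert not looks_like_mate(green_leaf)
```

The interface is cheap and was good enough in the beetle's original niche; it fails only when the environment changes in a way the niche never selected against.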

All perception, including that of humans, evolved as adaptation to niches.  There is no reason or evidence to suspect that our reality interfaces provide “faithful depictions” of the objective world.  Fitness trumps truth.  Hoffman says that Noë supports a version of faithful depiction within enactive perception, although I don’t see how that is necessary for enactive perception.

Of course, the organism itself is part of the environment.

True Perception is Right Out the Window

How do we know what we know about reality?  There seems to be a consistency at our macroscopic scale of operation.  One consistency is due to natural genetic programs—and the programs they in turn cause—which result in humans having shared knowledge bases and shared kinds of experience.  If you’ve ever not been on the same page as somebody, then you can imagine what it would be like if we didn’t have anything in common conceptually.  Communication would be very difficult.  For every other entity you wanted to communicate with, you’d have to establish communication interfaces, translators, interpreters, etc.  And how would you even know whom to communicate with in the first place?  Maybe you wouldn’t have evolved communication at all.

So humans (and probably many other related animals) have experiences and concepts that are similar enough that we can communicate with each other via speech, writing, physical contact, gestures, art, etc.

But for all that shared experience and ability to generate interfaces, we have no inkling of reality.

Since the UI theory says that our perception is not necessarily realistic, and is most likely not even close to being realistic, does this conflict with the enactive theory?

Noë chants the mantra that the world makes itself available to us (echoing some of the 80s/90s era Brooksian “world as its own model”).  If representation is distributed in a human-environment system, doesn’t that mean it must be a pretty accurate representation?  No.  I don’t see why that has to be true.  So it seems we can combine the two theories together.

There may be some mutation to enactive theories if we have to slant or expand perception more towards what happens in the environment and away from the morphology-dependent properties.  In other words, we may have to emphasize the far environment (everything you can observe or interact with) even more than the near environment (the body).  As I think about this and conduct experiments, I will report on how this merging of theories is working out.


Noë, A., Action in Perception, Cambridge, MA: MIT Press, 2004.

Hoffman, D. D., “The interface theory of perception: Natural selection drives true perception to swift extinction,” in Dickinson, S., Leonardis, A., Schiele, B., & Tarr, M. J. (Eds.), Object Categorization: Computer and Human Vision Perspectives.  Cambridge, UK: Cambridge University Press, 2009, pp. 148-166.  PDF.

Hawkeswood, T., “Review of the biology and host-plants of the Australian jewel beetle Julodimorpha bakewelli,” Calodema, vol. 3, 2005.  PDF.
