The Visual System
You receive information about the world through various sensory modalities: You hear the sound of the approaching train, you smell the freshly baked bread, you feel the tap on your shoulder. Researchers have made impressive progress in studying all of these modalities, and students interested in, say, hearing or the sense of smell will find a course in (or a book about) sensation and perception to be fascinating.
There’s no question, though, that for humans vision is the dominant sense. This is reflected in how much brain area is devoted to vision compared to any of the other senses. It’s also reflected in many aspects of our behavior. For example, if visual information conflicts with information received from other senses, you usually place your trust in vision. This is the basis for ventriloquism, in which you see the dummy’s mouth moving while the sounds themselves are coming from the dummy’s master. Vision wins out in this contest, and so you experience the illusion that the voice is coming from the dummy.
The Photoreceptors
How does vision operate? The process begins, of course, with light. Light is produced by many objects in our surroundings-the sun, lamps, candles-and then reflects off other objects. In most cases, it’s this reflected light-reflected from this book page or from a friend’s face-that launches the processes of visual perception. Some of this light hits the front surface of the eyeball, passes through the cornea and the lens, and then hits the retina, the light-sensitive tissue that lines the back of the eyeball (see Figure 3.1). The cornea and lens focus the incoming light, just as a camera lens might, so that a sharp image is cast onto the retina. Adjustments in this process can take place because the lens is surrounded by a band of muscle. When the muscle tightens, the lens bulges somewhat, creating the proper shape for focusing the images cast by nearby objects. When the muscle relaxes, the lens returns to a flatter shape, allowing the proper focus for objects farther away. On the retina, there are two types of photoreceptors-specialized neural cells that respond directly to the incoming light. One type, the rods, are sensitive to very low levels of light and so play an essential role whenever you’re moving around in semidarkness or trying to view a fairly dim stimulus. But the rods are also color-blind: They can distinguish different intensities of light (and in that way contribute to your perception of brightness), but they provide no means of discriminating one hue from another (see Figure 3.2). Cones, in contrast, are less sensitive than rods and so need more incoming light to operate at all. But cones are sensitive to color differences. More precisely, there are three different types of cones each having its own pattern of sensitivities to different wavelengths (see Figure 3.3). You perceive color, therefore, by comparing the outputs from these three cone types. 
Strong firing from only the cones that prefer short wavelengths, for example, accompanied by weak (or no) firing from the other cone types, signals purple. Blue is signaled by equally strong firing from the cones that prefer short wavelengths and those that prefer medium wavelengths, with only modest firing by cones that prefer long wavelengths. And so on, with other patterns of firing, across the three cone types, corresponding to different perceived hues. Cones have another function: They enable you to discern fine detail. The ability to see fine detail is referred to as acuity, and acuity is much higher for the cones than it is for the rods. This explains why you point your eyes toward a target whenever you want to perceive it in detail. What you’re actually doing is positioning your eyes so that the image of the target falls onto the fovea, the very center of the retina. Here, cones far outnumber rods (and, in fact, the center of the fovea has no rods at all). As a result, this is the region of the retina with the greatest acuity.
In portions of the retina more distant from the fovea (i.e., portions of the retina in the so-called visual periphery), the rods predominate; well out into the periphery, there are no cones at all. This distribution of photoreceptors explains why you’re better able to see very dim lights out of the corner of your eyes. Psychologists have understood this point for at least a century, but the key observation here has a much longer history. Sailors and astronomers have known for hundreds of years that when looking at a barely visible star, it’s best not to look directly at the star’s location. By looking slightly away from the star, they ensured that the star’s image would fall outside of the fovea and onto a region of the retina dense with the more light-sensitive rods.
Lateral Inhibition
Rods and cones do not report directly to the cortex. Instead, the photoreceptors stimulate bipolar cells, which in turn excite ganglion cells. The ganglion cells are spread uniformly across the entire retina, but all of their axons converge to form the bundle of nerve fibers that we call the optic nerve. This is the nerve tract that leaves the eyeball and carries information to various sites in the brain. The information is sent first to a way station in the thalamus called the lateral geniculate nucleus (LGN); from there, information is transmitted to the primary projection area for vision, in the occipital lobe.
Let’s be clear, though, that the optic nerve is not just a cable that conducts signals from one site to another. Instead, the cells that link retina to brain are already analyzing the visual input. One example lies in the phenomenon of lateral inhibition, a pattern in which cells, when stimulated, inhibit the activity of neighboring cells. To see why this is important, consider two cells, each receiving stimulation from a brightly lit area (see Figure 3.4). One cell (Cell B in the figure) is receiving its stimulation from the middle of the lit area. It is intensely stimulated, but so are its neighbors (including Cell A and Cell C). As a result, all of these cells are active, and therefore each one is trying to inhibit its neighbors. The upshot is that the activity level of Cell B is increased by the stimulation but decreased by the lateral inhibition it’s receiving from Cells A and C. This combination leads to only a moderate level of activity in Cell B. In contrast, another cell (Cell C in the figure) is receiving its stimulation from the edge of the lit area. It is intensely stimulated, and so are its neighbors on one side. Therefore, this cell will receive inhibition from one side but not from the other (in the figure: inhibition from Cell B but not from Cell D), so it will be less inhibited than Cell B (which is receiving inhibition from both sides). Thus, Cells B and C initially receive the same input, but C is less inhibited than B and so will end up firing more strongly than B.
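The logic of lateral inhibition can be sketched in a few lines of Python. This is only a toy illustration, not a model of real retinal circuitry: the row of six cells, the stimulation values, the `inhibition_weight` parameter, and the function name `lateral_inhibition` are all assumptions made for the example. Each cell's activity is its own stimulation minus a fraction of the stimulation reaching its two immediate neighbors.

```python
def lateral_inhibition(stimulation, inhibition_weight=0.1):
    """Return each cell's activity after inhibition from its two immediate neighbors.

    A missing neighbor at either end of the row is treated as receiving the same
    input as the end cell itself (an assumption that avoids artificial edges).
    """
    last = len(stimulation) - 1
    activity = []
    for i, own_input in enumerate(stimulation):
        left = stimulation[i - 1] if i > 0 else own_input
        right = stimulation[i + 1] if i < last else own_input
        inhibited = own_input - inhibition_weight * (left + right)
        activity.append(max(inhibited, 0.0))  # firing cannot drop below zero
    return activity


# Hypothetical values: Cells A, B, and C lie in the brightly lit area;
# Cells D, E, and F lie in the dimmer region beside it.
cells = ["A", "B", "C", "D", "E", "F"]
stimulation = [10.0, 10.0, 10.0, 2.0, 2.0, 2.0]
activity = dict(zip(cells, lateral_inhibition(stimulation)))

for name in cells:
    print(name, round(activity[name], 2))
```

In this sketch, Cell C (at the edge of the lit area) ends up more active than Cell B (in the middle), and Cell D (just across the edge) ends up less active than Cell E. The difference between C and D is therefore larger than the difference between B and E, which is the sense in which the edge is emphasized.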
Notice that the pattern of lateral inhibition highlights a surface’s edges, because the response of cells detecting the edge of the surface (such as Cell C) will be stronger than that of cells detecting the middle of the surface (such as Cell B). For that matter, by increasing the response by Cell C and decreasing the response by Cell D, lateral inhibition actually exaggerates the contrast at the edge-a process called edge enhancement. This process is of enormous importance, because it’s obviously highlighting the information that defines an object’s shape-information essential for figuring out what the object is. And let’s emphasize that this edge enhancement occurs at a very early stage of the visual processing. In other words, the information sent to the brain isn’t a mere copy of the incoming stimulation; instead, the steps of interpretation and analysis begin immediately, in the eyeball. (For a demonstration of an illusion caused by this edge enhancement-the so-called Mach bands-see Figure 3.5.)
Demonstration 3.1: Foveation
The chapter describes the basic anatomy of the eyeball, including the fact that the retina (the light-sensitive surface at the back of the eye) has, at its center, a specialized region called the fovea. The cells in the fovea are distinctive in several ways, but, perhaps most important, they are much better at discerning visual detail than cells elsewhere on the retina.
In fact, cells away from the fovea are not just worse at seeing detail in comparison to foveal cells, they are actually quite bad at seeing detail. As a result, if you want to see an object’s details, you need to look straight at it; this movement positions your eyes so that the object’s image falls on the fovea. If you want to see detail in other regions, then you need to reposition your eyes so that new inputs will be in “foveal view.”
Putting this more broadly, if you want to scrutinize an entire scene, you need to move your eyes a lot, and this point leads to another limitation, because eye movements are surprisingly slow: For the eye movements we use to explore the world-eye movements called “saccades”-you need almost 200 msec to change your eye position. Most of that time is spent in “planning” and “programming” each movement, but, even so, you’re only able to move your eyes four or five times each second; it’s just not possible to move your eyes more quickly than this. This combination-the inability to see detail outside of the fovea, and the slowness of eye movements-places severe limits on your pickup of information from the world, and these limits, in turn, influence how the nervous system must use and interpret the information actually received.
How severe are these limits? And just how distinctive is the fovea? Position yourself about 12 inches from these words. Point your eyes at the black dot in the middle of the display, and try not to move them. Stare at the dot for a moment, to make sure you’ve got your eye position appropriately “locked” in place, and then, without moving your eyes, try to read the letters one row up or down from the dot, or a couple of positions to the left or right. You should be able to do this, but you will probably find that your impression of the letters is indistinct. Now-still without moving your eyes-try reading the letters further from the dot. This should be more difficult. What’s going on? When you point your eyes at the dot, you’re positioning each eyeball relative to the input so that the dot falls on the fovea; therefore, the other letters fall on retinal positions away from the fovea. The other letters are therefore falling on areas of the retina that are literally less able to see sharply.
Notice, however, that in the ordinary circumstances of day-to-day life, the entire visual world seems sharp and clear to you. You don’t have an impression of only being able to see a small region clearly, with everything else being blurry. Your sense of the world, though, is produced in large part by the “construction” and “filling in” that you do unconsciously-relying on inference to supplement the surprisingly sparse input that is actually provided by your eyes.
Demonstration 3.2: Eye Movements
The previous demonstration was designed to remind you that only a small portion of the retina (the fovea) is sensitive to fine detail. This is, of course, one of the reasons why you constantly move your eyes: Every shift in position points the eyes at a new portion of the visual world, allowing you to pick up detail from that portion of the world. Eventually, with enough time and enough changes in eye position, you can inspect an entire scene.
To explore the world, you rely on eye movements called “saccades.” These eye movements (mentioned in Demonstration 3.1) are abrupt and “jerky,” as your eyes hop from position to position, and, in fact, the word saccade is taken from the French for “jerk” or “twitch.” To see just how jerky these eye movements are, sit (or stand) close to a friend (within 2 feet or so), but just off to the side. (There’s no need in this demonstration for you and your friend to be nose-to-nose.) Have your friend look off to the left, and then, when you say “go,” have your friend move his or her eyes smoothly to the right. You’ll easily see that-despite this instruction-the eye movements aren’t smooth at all. Instead, your friend’s eyes move left-to-right in a series of small jumps; these are the saccades. (Now, reverse roles, so that your friend can see your saccades.)
Next, try a variation of this procedure: Again, position yourself to watch your friend’s eye movements. This time, hold up a pen, positioning it off to your friend’s left. Now, smoothly move the pen from your friend’s left to your friend’s right, and have your friend watch the pen’s tip as it moves across his or her view. This time, you won’t see jerky eye movements. Instead, when someone is tracking a moving object (such as the pen’s tip), the person relies on a different type of eye movement called “smooth pursuit movements.” (And, once more, reverse roles so that your friend can watch your smooth pursuit.)
Obviously, therefore, people are capable of producing smooth (not jerky) eye movements. But which type do you use in your ordinary examination of the world? One last time, position yourself to watch your friend’s eye movements. This time, have your friend look around, counting the circular objects that are in view. (If there are no circular objects around, choose some other target. In truth, the nature of the target doesn’t matter; you just want some chore that will force your friend to inspect the immediate environment.) Which type of eye movements does your friend use-the jerky saccades, or smooth movements? Finally, one last comment: A person’s pattern of eye movements is also influenced by his or her internal state. Consider, for example, one of the tests that police officers rely on when they suspect a driver is intoxicated. The police routinely conduct what’s called a “field sobriety test,” and one part of the test involves a close examination of the driver’s eye movements. The test can yield several indications of drunkenness-including a disruption of smooth pursuit, and also an inability to hold the eyes still when looking at a stationary target. Plainly, then, an understanding of eye movements has practical implications-and is one of the ways in which we promote safety by keeping drunk drivers off the road!
Demonstration 3.3: The Blind Spot and the Active Nature of Vision
Axons from the retina’s ganglion cells gather together to form the optic nerve. This nerve leaves the eyeball and carries information first to the thalamus and then to the visual cortex. Notice, therefore, that there has to be a location at the back of each eyeball that can serve as the “exit” for the ganglion cells’ axons, and the axons fill this “exit” entirely, leaving no room for rods or cones. As a result, this region contains no photoreceptors at all and, therefore, is completely insensitive to light. Appropriately enough, this region is called the “blind spot.”
Ordinarily, people aren’t aware of the blind spot-but we can make them aware with a simple procedure. Look at the following picture, with your face about 18 inches from the screen. Close your left eye. Stare at the center of the author’s picture on the left. Gradually lean toward or away from the screen. At most distances, you’ll still be able to see the brain (on the right) out of the corner of your eye. You should be able to find a distance, though, at which the brain picture drops from view-it just seems not to be there. What is going on? You’ve positioned the screen, relative to your eye, in a way that places the author’s picture on your fovea but the picture of the brain on your blind spot, and so the brain simply became invisible to you.
Even when the brain “disappeared,” however, you didn’t perceive a “hole” in the visual world. Instead, the brain picture disappeared, but you could still perceive the continuous grid pattern with no interruption in the lines. Why is this? Your visual system detected the pattern in the grid (continuous vertical lines plus continuous horizontals) and used this pattern to “fill in” the information that was missing because of the blind spot. But, of course, the picture of the brain isn’t part of this overall pattern, so it wasn’t included when you did the filling in. Therefore, the picture of the brain vanished, but the pattern was not disrupted.
Visual Coding
In Chapter 2, we introduced the idea of coding in the nervous system. This term refers to the relationship between activity in the nervous system and the stimulus (or idea or operation) that is somehow represented by that activity. In the study of perception, we can ask: What’s the code through which neurons (or groups of neurons) manage to represent the shapes, colors, sizes, and movements that you perceive?
Single Neurons and Single-Cell Recording
Part of what we know about the visual system-actually, part of what we know about the entire brain-comes from a technique called single-cell recording. As the name implies, this is a procedure through which investigators can record, moment by moment, the pattern of electrical changes within a single neuron.
We mentioned in Chapter 2 that when a neuron fires, each response is the same size; this is the all-or-none law. But neurons can vary in how often they fire, and when investigators record the activity of a single neuron, what they’re usually interested in is the cell’s firing rate, measured in “spikes per second.” The investigator can then vary the circumstances (either in the external world or elsewhere in the nervous system) in order to learn what makes the cell fire more and what makes it fire less. In this way, we can figure out what job the neuron does within the broad context of the entire nervous system. The technique of single-cell recording has been used with enormous success in the study of vision. In a typical procedure, the animal being studied is first immobilized. Then, electrodes are placed just outside a neuron in the animal’s optic nerve or brain. Next, a computer screen is placed in front of the animal’s eyes, and various patterns are flashed on the screen: circles, lines at various angles, or squares of various sizes at various positions. Researchers can then ask: Which patterns cause that neuron to fire? To what visual inputs does that cell respond?
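As a rough illustration of the measurement just described, the sketch below computes a firing rate in spikes per second and then asks which of several patterns drove the cell hardest. The spike times, the pattern names, and the function names `firing_rate` and `preferred_stimulus` are all invented for this example; real recordings involve many trials and far more careful analysis.

```python
def firing_rate(spike_times, window_start, window_end):
    """Spikes per second within [window_start, window_end); times are in seconds."""
    duration = window_end - window_start
    if duration <= 0:
        raise ValueError("window_end must be later than window_start")
    count = sum(1 for t in spike_times if window_start <= t < window_end)
    return count / duration


def preferred_stimulus(recordings, window_start, window_end):
    """Return the stimulus with the highest firing rate, plus all of the rates."""
    rates = {
        stimulus: firing_rate(times, window_start, window_end)
        for stimulus, times in recordings.items()
    }
    best = max(rates, key=rates.get)
    return best, rates


# Hypothetical spike times (in seconds) recorded from one neuron
# while each pattern was on the screen for one second.
recordings = {
    "vertical line": [0.05, 0.11, 0.18, 0.26, 0.33, 0.41, 0.52, 0.60, 0.74, 0.88],
    "horizontal line": [0.20, 0.65],
    "circle": [0.10, 0.45, 0.90],
}

best, rates = preferred_stimulus(recordings, 0.0, 1.0)
print(best, rates)
```

With these made-up numbers, the cell fires most for the vertical line, so an investigator might tentatively describe it as preferring that pattern.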
By analogy, we know that a smoke detector is a smoke detector because it “fires” (i.e., makes noise) when smoke is on the scene. We know that a motion detector is a motion detector because it “fires” when something moves nearby. But what kind of detector is a given neuron? Is it responsive to any light in any position within the field of view? In that case, we might call it a “light detector.” Or is it perhaps responsive only to certain shapes at certain positions (and therefore is a “shape detector”)? With this logic, we can map out precisely what the cell responds to-what kind of detector it is. More formally, this procedure allows us to define the cell’s receptive field-that is, the size and shape of the area in the visual world to which that cell responds.
Multiple Types of Receptive Fields
In 1981, the neurophysiologists David Hubel and Torsten Wiesel were awarded the Nobel Prize for their exploration of the mammalian visual system (e.g., Hubel & Wiesel, 1959, 1968). They documented the existence of specialized neurons within the brain, each of which has a different type of receptive field, a different kind of visual trigger. For example, some neurons seem to function as “dot detectors.” These cells fire at their maximum rate when light is presented in a small, roughly circular area in a specific position within the field of view. Presentations of light just outside of this area cause the cell to fire at less than its usual “resting” rate, so the input must be precisely positioned to make this cell fire. Figure 3.6 depicts such a receptive field. These cells are often called center-surround cells, to mark the fact that light presented to the central region of the receptive field has one influence, while light presented to the surrounding ring has the opposite influence. If both the center and the surround are strongly stimulated, the cell will fire neither more nor less than usual. For this cell, a strong uniform stimulus is equivalent to no stimulus at all.
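The behavior of a center-surround cell can be captured in a deliberately simplified sketch. The resting rate, the `gain` value, the 0-to-1 light scale, and the function name `center_surround_response` are assumptions chosen for illustration; they are not measurements from any actual cell.

```python
def center_surround_response(center_light, surround_light,
                             resting_rate=10.0, gain=8.0):
    """Firing rate (spikes per second) of a hypothetical center-surround cell.

    center_light and surround_light are light levels between 0.0 and 1.0.
    Light in the center pushes the rate up; light in the surround pushes it down.
    """
    rate = resting_rate + gain * (center_light - surround_light)
    return max(rate, 0.0)  # a firing rate cannot be negative


print(center_surround_response(1.0, 0.0))  # light only in the center
print(center_surround_response(0.0, 1.0))  # light only in the surround
print(center_surround_response(1.0, 1.0))  # strong uniform light
print(center_surround_response(0.0, 0.0))  # no light at all
```

In this toy version, light confined to the center raises the rate above the resting level, light confined to the surround lowers it below the resting level, and strong uniform light leaves the cell at exactly the rate it shows with no stimulus at all.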
Other cells fire at their maximum only when a stimulus containing an edge of just the right orientation appears within their receptive fields. These cells, therefore, can be thought of as “edge detectors.” Some of these cells fire at their maximum rate when a horizontal edge is presented; others, when a vertical edge is in view; still others fire at their maximum to orientations in between horizontal and vertical. Note, though, that in each case, these orientations merely define the cells’ “preference,” because these cells are not oblivious to edges of other orientations. If a cell’s preference is for, say, horizontal edges, then the cell will still respond to other orientations-but less strongly than it does for horizontals. Specifically, the farther the edge is from the cell’s preferred orientation, the weaker the firing will be, and edges sharply different from the cell’s preferred orientation (e.g., a vertical edge for a cell that prefers horizontal) will elicit virtually no response (see Figure 3.7). Other cells, elsewhere in the visual cortex, have receptive fields that are more specific. Some cells fire maximally only if an angle of a particular size appears in their receptive fields; others fire maximally in response to corners and notches. Still other cells appear to be “movement detectors” and fire strongly if a stimulus moves, say, from right to left across the cell’s receptive field. Other cells favor left-to-right movement, and so on through the various possible directions of movement.
Parallel Processing in the Visual System
This proliferation of cell types highlights another important principle-namely, that the visual system relies on a “divide and conquer” strategy, with different types of cells, located in different areas of the cortex, each specializing in a particular kind of analysis. This pattern is plainly evident in Area V1, the site on the occipital lobe where axons from the LGN first reach the cortex (see Figure 3.8).
In this brain area, some cells fire to (say) horizontals in this position in the visual world, others to horizontals in that position, others to verticals in specific positions, and so on. The full ensemble of cells in this area provides a detector for every possible stimulus, making certain that no matter what the input is or where it’s located, some cell will respond to it. The pattern of specialization is also evident when we consider other brain areas. Figure 3.9, for example, reflects one summary of the brain areas known to be involved in vision. The details of the figure aren’t crucial, but it is noteworthy that some of these areas (V1, V2, V3, V4, PO, and MT) are in the occipital cortex; other areas are in the parietal cortex; others are in the temporal cortex. (We’ll have more to say in a moment about these areas outside of the occipital cortex.) Most important, each area seems to have its own function. Neurons in Area MT, for example, are acutely sensitive to direction and speed of movement. (This area is the brain region that has suffered damage in cases involving akinetopsia.) Cells in Area V4 fire most strongly when the input is of a certain color and a certain shape. Let’s also emphasize that all of these specialized areas are active at the same time, so that (for example) cells in Area MT are detecting movement in the visual input at the same time that cells in Area V4 are detecting shapes. In other words, the visual system relies on parallel processing-a system in which many different steps (in this case, different kinds of analysis) are going on simultaneously. (Parallel processing is usually contrasted with serial processing, in which steps are carried out one at a time-i.e., in a series.)
One advantage of this simultaneous processing is speed: Brain areas trying to discern the shape of the incoming stimulus don’t need to wait until the motion analysis or the color analysis is complete. Instead, all of the analyses go forward immediately when the input appears before the eyes, with no waiting time.
Another advantage of parallel processing is the possibility of mutual influence among multiple systems. To see why this matters, consider the fact that sometimes your interpretation of an object’s motion depends on your understanding of the object’s three-dimensional shape. This suggests that it might be best if the perception of shape happened first. That way, you could use the results of this processing step as a guide to later analyses. In other cases, though, the relationship between shape and motion is reversed. In these cases, your interpretation of an object’s three-dimensional shape depends on your understanding of its motion. To allow for this possibility, it might be best if the perception of motion happened first, so that it could guide the subsequent analysis of shape.
How does the brain deal with these contradictory demands? Parallel processing provides the answer. Since both sorts of analysis go on simultaneously, each type of analysis can be informed by the other. Put differently, neither the shape-analyzing system nor the motion-analyzing system gets priority. Instead, the two systems work concurrently and “negotiate” a solution that satisfies both systems (Van Essen & DeYoe, 1995). Parallel processing is easy to document throughout the visual system. As we’ve seen, the retina contains two types of specialized receptors (rods and cones) each doing its own job (e.g., the rods detecting stimuli in the periphery of your vision and stimuli presented at low light levels, and the cones detecting hues and detail at the center of your vision). Both types of receptors function at the same time-another case of parallel processing.
Likewise, within the optic nerve itself, there are two types of cells, P cells and M cells. The P cells provide the main input for the LGN’s parvocellular cells and appear to be specialized for spatial analysis and the detailed analysis of form. M cells provide the input for the LGN’s magnocellular cells and are specialized for the detection of motion and the perception of depth. And, again, both of these systems are functioning at the same time-more parallel processing.
Parallel processing remains in evidence when we move beyond the occipital cortex. As Figure 3.10 shows, some of the activation from the occipital lobe is passed along to the cortex of the temporal lobe. This pathway, often called the what system, plays a major role in the identification of visual objects, telling you whether the object is a cat, an apple, or whatever. At the same time, activation from the occipital lobe is also passed along a second pathway, leading to the parietal cortex, in what is often called the where system. This system seems to guide your action based on your perception of where an object is located-above or below you, to your right or to your left. (See Goodale & Milner, 2004; Humphreys & Riddoch, 2014; Ungerleider & Haxby, 1994; Ungerleider & Mishkin, 1982. For some complications, though, see Borst, Thompson, & Kosslyn, 2011; de Haan & Cowey, 2011.) The contrasting roles of these two systems can be revealed in many ways, including through studies of brain damage. Patients with lesions in the what system show visual agnosia-an inability to recognize visually presented objects, including such common things as a cup or a pencil. However, these patients show little disorder in recognizing visual orientation or in reaching. The reverse pattern occurs with patients who have suffered lesions in the where system: They have difficulty in reaching, but no problem in object identification (Damasio, Tranel, & Damasio, 1989; Farah, 1990; Goodale, 1995; Newcombe, Ratcliff, & Damasio, 1987).
Still other data echo this broad theme of parallel processing among separate systems. For example, we noted earlier that different brain areas are critical for the perception of color, motion, and form. If this is right, then someone who has suffered damage in just one of these areas might show problems in the perception of color but not the perception of motion or form, or problems in the perception of motion but not the perception of form or color. These predictions are correct. As we mentioned at the chapter’s start, some patients suffer damage to the motion system and so develop akinetopsia (Zihl et al., 1983). For such patients, the world is described as a succession of static photographs. They’re unable to report the speed or direction of a moving object; as one patient put it, “When I’m looking at the car first, it seems far away. But then when I want to cross the road, suddenly the car is very near” (Zihl et al., 1983, p. 315).
Other patients suffer a specific loss of color vision through damage to the central nervous system, even though their perception of form and motion remains normal (Damasio, 1985; Gazzaniga, Ivry, & Mangun, 2014; Meadows, 1974). To them, the entire world is clothed only in “dirty shades of gray.” Cases like these provide dramatic confirmation of the separateness of our visual system’s various elements and the ways in which the visual system is vulnerable to very specific forms of damage. (For further evidence with neurologically intact participants, see Bundesen, Kyllingsbaek, & Larsen, 2003.)
Putting the Pieces Back Together
Let’s emphasize once again, therefore, that even the simplest of our intellectual achievements depends on an array of different, highly specialized brain areas all working together in parallel. This was evident in Chapter 2 in our consideration of Capgras syndrome, and the same pattern has emerged in our description of the visual system. Here, too, many brain areas must work together: the what system and the where system, areas specialized for the detection of movement and areas specialized for the identification of simple forms. We have identified the advantages that come from this division of labor and the parallel processing it allows. But the division of labor also creates a problem: If multiple brain areas contribute to an overall task, how is their functioning coordinated? When you see an athlete make an astonishing jump, the jump itself is registered by motion-sensitive neurons, but your recognition of the athlete depends on shape-sensitive neurons. How are the pieces put back together? When you reach for a coffee cup but stop midway because you see that the cup is empty, the reach itself is guided by the where system; the fact that the cup is empty is registered by the what system. How are these two streams of processing coordinated?
Investigators refer to this broad issue as the binding problem-the task of reuniting the various elements of a scene, elements that are initially addressed by different systems in different parts of the brain. And obviously this problem is solved. What you perceive is not an unordered catalogue of sensory elements. Instead, you perceive a coherent, integrated perceptual world. Apparently, this is a case in which the various pieces of Humpty Dumpty are reassembled to form an organized whole.
Visual Maps and Firing Synchrony
Look around you. Your visual system registers whiteness and blueness and brownness; it also registers a small cylindrical shape (your coffee cup), a medium-sized rectangle (a sheet of paper), and a much larger rectangle (your desk). How do you put these pieces together so that you see that it’s the coffee cup, and not the book page, that’s blue; the desktop, and not the cup, that’s brown?
There is debate about how the visual system solves this problem, but we can identify three elements that contribute to the solution. One element is spatial position. The part of the brain registering the cup’s shape is separate from the parts registering its color or its motion; nonetheless, these various brain areas all have something in common. They each keep track of where the target is-where the cylindrical shape was located, and where the blueness was; where the motion was detected, and where things were still. As a result, the reassembling of these pieces can be done with reference to position. In essence, you can overlay the map of which forms are where on top of the map of which colors are where to get the right colors with the right forms, and likewise for the map showing which motion patterns are where. Information about spatial position is, of course, useful for its own sake: You have a compelling reason to care whether the tiger is close to you or far away, or whether the bus is on your side of the street or the other. But in addition, location information apparently provides a frame of reference used to solve the binding problem. Given this double function, we shouldn’t be surprised that spatial position is a major organizing theme in all the various brain areas concerned with vision, with each area seeming to provide its own map of the visual world.
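The idea of overlaying separate maps can be made concrete with a small sketch. Here each “map” is just a Python dictionary keyed by position; the coordinates, the feature labels, and the function name `bind_by_position` are hypothetical, chosen only to mirror the desk scene described above, and nothing here is meant as a claim about how cortical maps are actually implemented.

```python
def bind_by_position(**feature_maps):
    """Combine separate feature maps into one description per position.

    Each keyword argument is a dictionary mapping a position to the value
    that one specialized system registered at that position.
    """
    objects = {}
    for feature_name, feature_map in feature_maps.items():
        for position, value in feature_map.items():
            objects.setdefault(position, {})[feature_name] = value
    return objects


# Hypothetical maps, each produced by a different specialized system.
shape_map = {(2, 3): "small cylinder", (5, 1): "medium rectangle", (4, 0): "large rectangle"}
color_map = {(2, 3): "blue", (5, 1): "white", (4, 0): "brown"}
motion_map = {(2, 3): "still", (5, 1): "still", (4, 0): "still"}

scene = bind_by_position(shape=shape_map, color=color_map, motion=motion_map)
print(scene[(2, 3)])
```

Because every map uses the same positions as keys, the blueness ends up attached to the cylinder (the cup) rather than to the rectangle (the paper), without any system needing to know what the others are computing.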
Spatial position, however, is not the whole story. Evidence also suggests that the brain uses special rhythms to identify which sensory elements belong with which. Imagine two groups of neurons in the visual cortex. One group of neurons fires maximally whenever a vertical line is in view; another group fires maximally whenever a stimulus is in view moving from a high position to a low one. Let’s also imagine that right now a vertical line is presented and it is moving downward; as a result, both groups of neurons are firing strongly. How does the brain encode the fact that these attributes are bound together, different aspects of a single object? There is evidence that the visual system marks this fact by means of neural synchrony: If the neurons detecting a vertical line are firing in synchrony with those signaling movement, then these attributes are registered as belonging to the same object. If they aren’t in synchrony, then the features aren’t bound together (Buzsáki & Draguhn, 2004; Csibra, Davis, Spratling, & Johnson, 2000; Elliott & Müller, 2000; Fries, Reynolds, Rorie, & Desimone, 2001).
What causes this synchrony? How do the neurons become synchronized in the first place? Here, another factor appears to be important: attention. We’ll have more to say about attention in Chapter 5, but for now let’s note that attention plays a key role in binding together the separate features of a stimulus. (For a classic statement of this argument, see Treisman & Gelade, 1980; Treisman, Sykes, & Gelade, 1977. For more recent views, see Quinlan, 2003; Rensink, 2012; and also Chapter 5.) Evidence for attention’s role comes from many sources, including the fact that when we overload someone’s attention, she is likely to make conjunction errors. This means that she’s likely to correctly detect the features present in a visual display, but then to make mistakes about how the features are bound together (or conjoined). 
Thus, for example, someone shown a blue H and a red T might report seeing a blue T and a red H-an error in binding.
Similarly, individuals who suffer from severe attention deficits (because of brain damage in the parietal cortex) are particularly impaired in tasks that require them to judge how features are conjoined to form complex objects (e.g., Robertson, Treisman, Friedman-Hill, & Grabowecky, 1997). Finally, studies suggest that synchronized neural firing occurs in an animal’s brain when the animal is attending to a specific stimulus but does not occur in neurons activated by an unattended stimulus (e.g., Buschman & Miller, 2007; Saalmann, Pigarev, & Vidyasagar, 2007; Womelsdorf et al., 2007). All of these results point toward the claim that attention is crucial for the binding problem and, moreover, that attention is linked to the neural synchrony that seems to unite a stimulus’s features.
Notice, then, that there are several ways in which information is represented in the brain. In Chapter 2, we noted that the brain uses different chemical signals (i.e., different neurotransmitters) to transmit different types of information. We now see that there is information reflected in which cells are firing, how often they are firing, whether the cells are firing in synchrony with other cells, and the rhythm in which they are firing. Plainly, this is a system of considerable complexity!
Form Perception
So far in this chapter, we’ve been discussing how visual perception begins: with the detection of simple attributes in the stimulus-its color, its motion, and its catalogue of features. But this detection is just the start of the process, because the visual system still has to assemble these features into recognizable wholes. We’ve mentioned the binding problem as part of this “assembly”-but binding isn’t the whole story. 
This point is reflected in the fact that our perception of the visual world is organized in ways that the stimulus input is not-a point documented early in the 20th century by a group called the “Gestalt psychologists.” The Gestaltists argued that the organization is contributed by the perceiver; this is why, they claimed, the perceptual whole is often different from the sum of its parts. Some years later, Jerome Bruner (1973) voiced related claims and coined the phrase “beyond the information given” to describe some of the ways our perception of a stimulus differs from (and goes beyond) the stimulus itself.
For example, consider the form shown in the top of Figure 3.11: the Necker cube. This drawing is an example of a reversible figure-so-called because people perceive it first one way and then another. Specifically, this form can be perceived as a drawing of a cube viewed from above (in which case it’s similar to the cube marked A in the figure); it can also be perceived as a cube viewed from below (in which case it’s similar to the cube marked B). Let’s be clear, though, that this isn’t an “illusion,” because neither of these interpretations is “wrong” and the drawing itself (and, therefore, the information reaching your eyes) is fully compatible with either interpretation. Put differently, the drawing shown in Figure 3.11 is entirely neutral with regard to the shape’s configuration in depth; the lines on the page don’t specify which is the “proper” interpretation. Your perception of the cube, however, is not neutral. Instead, you perceive the cube as having one configuration or the other-similar either to Cube A or to Cube B. Your perception goes beyond the information given in the drawing, by specifying an arrangement in depth.
The same point can be made for many other stimuli. Figure 3.12A (after Rubin, 1915, 1921) can be perceived either as a vase centered in the picture or as two profiles facing each other. 
The drawing by itself is compatible with either of these perceptions, and so, once again, the drawing is neutral with regard to perceptual organization. In particular, it is neutral with regard to figure/ground organization, the determination of what is the figure (the depicted object, displayed against a background) and what is the ground. Your perception of this drawing, however, isn’t neutral about this point. Instead, your perception somehow specifies that you’re looking at the vase and not the profiles, or that you’re looking at the profiles and not the vase.
Figure/ground ambiguity is also detectable in the Canadian flag (Figure 3.12B). Since 1965, the centerpiece of Canada’s flag has been a red maple leaf. Many observers, however, note that a different organization is possible, at least for part of the flag. On their view, the flag depicts two profiles, shown in white against a red backdrop. Each profile has a large nose, an open mouth, and a prominent brow ridge, and the profiles are looking downward, toward the flag’s center.
In all these examples, then, your perception contains information-about how the form is arranged in depth, or about which part of the form is figure and which is ground-that is not contained within the stimulus itself. Apparently, this is information contributed by you, the perceiver.
The Gestalt Principles
With figures like the Necker cube or the vase/profiles, your role in shaping the perception seems undeniable. In fact, if you stare at either of these figures, your perception flips back and forth-first you see the figure one way, then another, then back to the first way. But the stimulus itself isn’t changing, and so the information that’s reaching your eyes is constant. Any changes in perception, therefore, are caused by you and not by some change in the stimulus.
One might argue, though, that reversible figures are special-carefully designed to support multiple interpretations. On this basis, perhaps you play a smaller role when perceiving other, more “natural” stimuli. This position is plausible-but wrong, because many stimuli (and not just the reversible figures) are ambiguous and in need of interpretation. We often don’t detect this ambiguity, but that’s because the interpretation happens so quickly that we don’t notice it. Consider, for example, the scene shown in Figure 3.13. It’s almost certain that you perceive segments B and E as being united, forming a complete apple, but notice that this information isn’t provided by the stimulus; instead, it’s your interpretation. (If we simply go with the information in the figure, it’s possible that segments B and E are parts of entirely different fruits, with the “gap” between the two fruits hidden from view by the banana.) It’s also likely that you perceive the banana as entirely banana-shaped and therefore continuing downward out of your view, into the bowl, where it eventually ends with the sort of point that’s normal for a banana. In the same way, surely you perceive the horizontal stripes in the background as continuous and merely hidden from view by the pitcher. (You’d be surprised if we removed the pitcher and revealed a pitcher-shaped gap in the stripes.) But, of course, the stimulus doesn’t in any way “guarantee” the banana’s shape or the continuity of the stripes; these points are, again, just your interpretation. Even with this ordinary scene, therefore, your perception goes “beyond the information given,” and so the unity of the two apple slices and the continuity of the stripes are “in the eye of the beholder,” not in the stimulus itself. Of course, you don’t feel like you’re “interpreting” this picture or extrapolating beyond what’s on the page. 
But your role becomes clear the moment we start cataloguing the differences between your perception and the information that’s truly present in the photograph.
Let’s emphasize, though, that your interpretation of the stimulus isn’t careless or capricious. Instead, you’re guided by a few straightforward principles that the Gestalt psychologists catalogued many years ago-and so they’re routinely referred to as the Gestalt principles. For example, your perception is guided by proximity and similarity: If, within the visual scene, you see elements that are close to each other, or elements that resemble each other, you assume these elements are parts of the same object (Figure 3.14). You also tend to assume that contours are smooth, not jagged, and you avoid interpretations that involve coincidences. (For a modern perspective on these principles and Gestalt psychology in general, see Wagemans, Elder, Kubovy et al., 2012; Wagemans, Feldman, Gepshtein et al., 2010.)
These perceptual principles are quite straightforward, but they’re essential if your perceptual apparatus is going to make sense of the often ambiguous, often incomplete information provided by your senses. In addition, it’s worth mentioning that everyone’s perceptions are guided by the same principles, and that’s why you generally perceive the world in the same way that other people do. Each of us imposes our own interpretation on the perceptual input, but we all tend to impose the same interpretation because we’re all governed by the same rules.
Organization and Features
We’ve now considered two broad topics-the detection of simple attributes in the stimulus, and then the ways in which you organize those attributes. In thinking about these topics, you might want to think about them as separate steps. First, you collect information about the stimulus, so that you know (for example) what corners or angles or curves are in view-the visual features contained within the input. 
Then, once you’ve gathered the “raw data,” you interpret this information. That’s when you “go beyond the information given”-deciding how the form is laid out in depth (as in Figure 3.11), deciding what is figure and what is ground (Figure 3.12A or B), and so on.
The idea, then, is that perception might be divided (roughly) into an “information gathering” step followed by an “interpretation” step. This view, however, is wrong, and, in fact, it’s easy to show that in many settings, your interpretation of the input happens before you start cataloguing the input’s basic features, not after. Consider Figure 3.15. Initially, these shapes seem to have no meaning, but after a moment most people discover the word hidden in the figure. That is, people find a way to reorganize the figure so that the familiar letters come into view. But let’s be clear about what this means. At the start, the form seems not to contain the features needed to identify the L, the I, and so on. Once the form is reorganized, though, it does contain these features, and the letters are immediately recognized. In other words, with one organization, the features are absent; with another, they’re plainly present. It would seem, then, that the features themselves depend on how the form is organized by the viewer-and so the features are as much “in the eye of the beholder” as they are in the figure itself.
As a different example, you have no difficulty reading the word printed in Figure 3.16, although most of the features needed for this recognition are absent. You easily “provide” the missing features, though, thanks to the fact that you interpret the black marks in the figure as shadows cast by solid letters. Given this interpretation and the extrapolation it involves, you can easily “fill in” the missing features and read the word.
How should we think about all of this? On one hand, your perception of a form surely has to start with the stimulus itself and must in some ways be governed by what’s in that stimulus. (After all, no matter how you try to interpret Figure 3.16, it won’t look to you like a photograph of Queen Elizabeth-the basic features of the queen are just not present, and your perception respects this obvious fact.) 
This suggests that the features must be in place before an interpretation is offered, because the features govern the interpretation. But, on the other hand, Figures 3.15 and 3.16 suggest that the opposite is the case: that the features you find in an input depend on how the figure is interpreted. Therefore, it’s the interpretation, not the features, that must be first.
The solution to this puzzle, however, is easy, and builds on ideas that we’ve already met: Many aspects of the brain’s functioning depend on parallel processing, with different brain areas all doing their work at the same time. In addition, the various brain areas all influence one another, so that what’s going on in one brain region is shaped by what’s going on elsewhere. In this way, the brain areas that analyze a pattern’s basic features do their work at the same time as the brain areas that analyze the pattern’s large-scale configuration, and these brain areas interact so that the perception of the features is guided by the configuration, and analysis of the configuration is guided by the features. In other words, neither type of processing “goes first.” Neither has priority. Instead, they work together, with the result that the perception that is achieved makes sense at both the large-scale and fine-grained levels.
Demonstration 3.4: Satanic Messages? The Power of Suggestion
There are periodic reports on the Internet of “secret messages” contained within pop music-often, messages that can be revealed only by playing the music backward. The messages supposedly have content that’s upsetting or offensive to many people-such as messages about Satan, or drug use, or sexual activity.
Are the messages really there? Probably not, and if you just listen to the music backward, you’re unlikely to hear the messages. However, if someone tells you what the message is, so that you know exactly “what to listen for,” you often can hear these dark messages! This point provides a powerful demonstration that your perception can be guided by expectations and knowledge-so that, with that guidance, you can hear things that you can’t hear at all otherwise.
Visit this website: http://jeffmilner.com/backmasking/stairway-to-heaven-backwards.html
First, select one of the sound clips and play it forward. Try at that point to guess what the hidden message is.
Then, play the clip backward. Can you guess what the hidden message is?
Then, as the crucial step, click on the button to reveal the lyrics that are supposedly hidden in the clip when it’s played backward, and play the clip again. Can you hear the hidden message now?
Of course, the question then becomes: Is the message really there, because you can hear it? Or is the message plainly not there, because you only hear it when someone explicitly suggests to you what the message is?
Constancy
We’ve now seen many indications of the perceiver’s role in “going beyond the information given” in the stimulus itself. This theme is also evident in another aspect of perception: the achievement of perceptual constancy. This term refers to the fact that we perceive the constant properties of objects in the world (their sizes, shapes, and so on) even though the sensory information we receive about these attributes changes whenever our viewing circumstances change.
To illustrate this point, consider the perception of size. If you happen to be far away from the object you’re viewing, then the image cast onto your retinas by that object will be relatively small. If you approach the object, then the image size will increase. This change in image size is a simple consequence of physics, but you’re not fooled by this variation. Instead, you manage to achieve size constancy-you correctly perceive the sizes of objects despite the changes in retinal-image size created by changes in viewing distance. Similarly, if you view a door straight on, the retinal image will be rectangular; but if you view the same door from an angle, the retinal image will have a different shape (see Figure 3.17). Still, you achieve shape constancy-that is, you correctly perceive the shapes of objects despite changes in the retinal image created by shifts in your viewing angle. You also achieve brightness constancy-you correctly perceive the brightness of objects whether they’re illuminated by dim light or strong sun.
Unconscious Inference
How do you achieve each of these forms of constancy? One hypothesis focuses on relationships within the retinal image. In judging size, for example, you generally see objects against some background, and this can provide a basis for comparison with the target object. To see how this works, imagine that you’re looking at a dog sitting on the kitchen floor. Let’s say the dog is half as tall as the nearby chair and hides eight of the kitchen’s floor tiles from view. If you take several steps back from the dog, none of these relationships change, even though the sizes of all the retinal images are reduced. Size constancy, therefore, might be achieved by focusing not on the images themselves but on these unchanging relationships (see Figure 3.18). 
Relationships do contribute to size constancy, and that’s why you’re better able to judge size when comparison objects are in view or when the target you’re judging sits on a surface that has a uniform visual texture (like the floor tiles in the example). But these relationships don’t tell the whole story. Size constancy is achieved even when the visual scene offers no basis for comparison (if, for example, the object to be judged is the only object in view), provided that other cues signal the distance of the target object (Harvey & Leibowitz, 1967; Holway & Boring, 1947).
How does your visual system use this distance information? More than a century ago, the German physicist Hermann von Helmholtz developed an influential hypothesis regarding this question. Helmholtz started with the fact that there’s a simple inverse relationship between distance and retinal image size: If an object doubles its distance from the viewer, the size of its image is reduced by half. If an object triples its distance, the size of its image is reduced to a third of its initial size. This relationship is guaranteed to hold true because of the principles of optics, and the relationship makes it possible for perceivers to achieve size constancy by means of a simple calculation. Of course, Helmholtz knew that we don’t run through a conscious calculation every time we perceive an object’s size, but he believed we’re calculating nonetheless-and so he referred to the process as an unconscious inference (Helmholtz, 1909).
What is the calculation that enables someone to perceive size correctly? It’s multiplication: the size of the image on the retina, multiplied by the distance between you and the object. (We’ll have more to say about how you know this distance in a later section.) As an example, imagine an object that, at a distance of 10 ft, casts an image on the retina that’s 4 mm across. 
Because of straightforward principles of optics, the same object, at a distance of 20 ft, casts an image of 2 mm. In both cases, the product-10 × 4 or 20 × 2-is the same. If, therefore, your size estimate depends on that product, your size estimate won’t be thrown off by viewing distance-and that’s exactly what we want (see Figure 3.19).
What’s the evidence that size constancy does depend on this sort of inference? In many experiments, researchers have shown participants an object and, without changing the object’s retinal image, have changed the apparent distance of the object. (There are many ways to do this-lenses that change how the eye has to focus to bring the object into sharp view, or mirrors that change how the two eyes have to angle inward so that the object’s image is centered on both foveas.) If people are-as Helmholtz proposed-using distance information to judge size, then these manipulations should affect size perception. Any manipulation that makes an object seem farther away (without changing retinal image size) should make that object seem bigger (because, in essence, the perceiver would be “multiplying” by a larger number). Any manipulation that makes the object seem closer should make it look smaller. And, in fact, these predictions are correct-a powerful confirmation that people do use distance to judge size.
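The multiplication Helmholtz proposed can be written out as a short sketch. The function name `perceived_size` and the choice of 15 ft as a manipulated "apparent distance" are assumptions made for this illustration; the 10 ft / 4 mm and 20 ft / 2 mm figures are the ones used in the text. The units of the product (mm × ft) are arbitrary, since only comparisons between products matter here.

```python
def perceived_size(retinal_image_mm: float, distance_ft: float) -> float:
    """Size estimate as the product of retinal image size and distance.

    The result is in arbitrary units; it is meaningful only when
    compared with other estimates computed the same way.
    """
    return retinal_image_mm * distance_ft


# The example from the text: the same object seen at two distances.
near = perceived_size(retinal_image_mm=4.0, distance_ft=10.0)
far = perceived_size(retinal_image_mm=2.0, distance_ft=20.0)
print(near == far)  # the product is unchanged, so the size estimate is constant

# The experimental manipulation: the retinal image is held at 4 mm, but
# lenses or mirrors make the object seem farther away (15 ft is an
# assumed value) than its true distance of 10 ft.
seems_farther = perceived_size(retinal_image_mm=4.0, distance_ft=15.0)
print(seems_farther > near)  # the object should look bigger
```

The second comparison mirrors the prediction in the text: multiplying the same retinal image by a larger apparent distance yields a larger size estimate.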
A similar proposal explains how people achieve shape constancy. Here, you take the slant of the surface into account and make appropriate adjustments-again, an unconscious inference-in your interpretation of the retinal image’s shape. Likewise for brightness constancy: Perceivers are sensitive to how a surface is oriented relative to the available light sources, and they take this information into account in estimating how much light is reaching the surface. Then, they use this assessment of lighting to judge the surface’s brightness (e.g., whether it’s black or gray or white). In all these cases, therefore, it appears that the perceptual system does draw some sort of unconscious inference, taking viewing circumstances into account in a way that enables you to perceive the constant properties of the visual world.
Illusions
This process of taking information into account-whether it’s distance (in order to judge size), viewing angle (to judge shape), or illumination (to judge brightness)-is crucial for achieving constancy. More than that, it’s another indication that you don’t just “receive” visual information; instead, you interpret it. The interpretation is an essential part of your perception and generally helps you perceive the world correctly.
The role of the interpretation becomes especially clear, however, in circumstances in which you misinterpret the information available to you and end up misperceiving the world. Consider the two tabletops shown in Figure 3.20. The table on the left looks quite a bit longer and thinner than the one on the right; a tablecloth that fits one table surely won’t fit the other. Objectively, though, the parallelogram depicting the left tabletop is exactly the same shape as the one depicting the right tabletop. If you were to cut out the shape on the page depicting the left tabletop, rotate it, and slide it onto the right tabletop, they’d be an exact match. (Not convinced? Just lay another piece of paper on top of the page, trace the left tabletop, and then move your tracing onto the right tabletop.)
Why do people misperceive these shapes? The answer involves the normal mechanisms of shape constancy. Cues to depth in this figure cause you to perceive the figure as a drawing of three-dimensional objects, each viewed from a particular angle. This leads you-quite automatically-to adjust for the (apparent) viewing angles in order to perceive the two tabletops, and it’s this adjustment that causes the illusion. Notice, then, that this illusion about shape is caused by misperception of depth: You misperceive the depth relationships in the drawing and then take this faulty information into account in interpreting the shapes. (For a related illusion, see Figure 3.21.)
A different example is shown in Figure 3.22. It seems obvious to most viewers that the center square in this checkerboard (third row, third column) is a brighter shade than the square indicated by the arrow. But, in truth, the shade of gray shown on the page is identical for these two squares. What has happened here? The answer again involves the normal processes of perception. 
First, the mechanisms of lateral inhibition (described earlier) play a role here in producing a contrast effect: The central square in this figure is surrounded by dark squares, and the contrast makes the central square look brighter. The square marked at the edge of the checkerboard, however, is surrounded by white squares; here, contrast makes the marked square look darker. But, in addition, the visual system also detects that the central square is in the shadow cast by the cylinder. Your vision compensates for this fact-again, an example of unconscious inference that takes the shadow into account in judging brightness-and therefore powerfully magnifies the illusion.
Demonstration 3.5: A Brightness Illusion
The earlier demonstrations for this chapter emphasized the active nature of vision: We apparently fill in information outside of the retina, so that we feel like we’re seeing a uniformly detailed scene even though we’re actually picking up detail from only a small portion of the visual world. Likewise, we seem to fill in information that’s missing from the visual input because of the blind spot. Vision’s active role is also evident in many illusions-for example, in the faint gray “dots” you see where the white bars in this figure cross. There are no dots at those intersections. The white bars actually are uniform in brightness from the far left to the far right, and from the top to the bottom. (If you don’t believe this, cover the figure with two pieces of paper, adjusting the paper to hide the black squares and to expose only a single white bar.) Why do you see the dots? They’re the result of lateral inhibition. In brief, the bits of white at the intersections are receiving more inhibition than the bits of white away from the intersections, and it’s this “extra” inhibition that makes the white at the intersections look slightly darker.
Why is the white at the intersections receiving more inhibition? Bear in mind that lateral inhibition is produced by a cell’s “neighbors,” and the more excited those neighbors are, the more inhibition they produce. With this basis, think about a cell that’s picking up information from one of the intersections. Let’s call this “Cell X.” Cells responsive to the bit of the image on the screen just above the intersection are also “seeing” white-and so these cells are excited, so they’re trying to inhibit Cell X. Likewise for cells responsive to the bit of screen just below the intersection, and also those responsive to the image on the screen just left and right of the intersection. In short, Cell X is surrounded by other cells that are “seeing” white-cells “north,” “south,” “east,” and “west” of it-and so Cell X is receiving inhibition from all sides. Now, think about a cell positioned so that it is picking up information away from the intersection. Let’s focus on a cell that’s a bit to the right of Cell X, and so it’s in the middle of one of the horizontal white bars; we’ll call this “Cell Y.” Cells responsive to the image on the screen left and right of this position are seeing white and are therefore excited and trying to inhibit Cell Y. But cells responsive to the input above and below this position are seeing black, and so they’re less excited and therefore not trying to inhibit Cell Y. Thus, Cell Y has excited neighbors only on two sides (left and right) and therefore is receiving inhibition from just two sides. In this way, Cell Y is receiving roughly half the inhibition that Cell X is receiving.
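The comparison between Cell X and Cell Y can be checked with a toy calculation. The sketch below is a deliberate simplification, not a model of real retinal circuitry: it assumes each cell corresponds to one pixel, that only the four immediate neighbors contribute inhibition, that black produces no excitation at all (the text says only that those cells are "less excited"), and that the inhibition weight `k = 0.1` is an arbitrary value. The function names and grid dimensions are likewise invented for the example.

```python
def build_grid(squares=3, square_size=3, bar_width=1):
    """Return a square image as a list of rows: 1.0 = white bar, 0.0 = black square."""
    period = square_size + bar_width
    size = squares * period + bar_width
    return [
        [1.0 if (r % period < bar_width or c % period < bar_width) else 0.0
         for c in range(size)]
        for r in range(size)
    ]


def inhibition(grid, r, c):
    """Sum the excitation of the neighbors north, south, west, and east of (r, c)."""
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return sum(
        grid[nr][nc]
        for nr, nc in neighbors
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
    )


def response(grid, r, c, k=0.1):
    """A cell's own excitation minus k times the inhibition it receives (k is assumed)."""
    return grid[r][c] - k * inhibition(grid, r, c)


grid = build_grid()
cell_x = (4, 4)  # at an intersection of two white bars
cell_y = (4, 6)  # in the middle of a horizontal white bar

for name, (r, c) in (("Cell X", cell_x), ("Cell Y", cell_y)):
    print(name, "inhibition:", inhibition(grid, r, c), "response:", response(grid, r, c))
```

Under these assumptions, both cells are looking at the same white, but Cell X is inhibited from four sides and Cell Y from two, so Cell X ends up with the weaker response. That weaker response corresponds to the faint gray dot seen at the intersection.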