Demystifying Colour (ii): the basics of colour perception

How humans perceive colour is interesting, because the technology digital cameras use to capture light is adapted from the human visual system. When light enters our eye it is focused by the cornea and lens onto the “sensor” portion of the eye – the retina. The retina is composed of a number of different layers. One of these layers contains two types of photosensitive cells (photoreceptors), rods and cones, which interpret the light and convert it into a neural signal. The neural signals are collected and further processed by other layers in the retina before being sent to the brain via the optic nerve. It is in the brain that some form of colour association is made. For example, a lemon is perceived as yellow, and any deviation from this makes us question what we are looking at (like maybe a pink lemon?).

Fig.1: An example of the structure and arrangement of rods and cones

The rods, which are long and thin, interpret light (white) and darkness (black). Rods work mainly in low light, as only a few photons are needed to activate a rod. Rods don’t help with colour perception, which is why at night we see everything in shades of gray. The human eye is estimated to have over 100 million rods.

Cones have a tapered shape, and are used to process the three wavelength bands which our brains interpret as colour. There are three types of cones – short-wavelength (S), medium-wavelength (M), and long-wavelength (L). Each cone absorbs light over a broad range of wavelengths, with peak sensitivities at roughly 570nm (L), 545nm (M), and 440nm (S). The L, M, and S cones are usually called R, G, and B respectively. Of course the cones themselves have no colour; they simply respond to wavelengths that our brain interprets as colours. There are roughly 6-7 million cones in the human eye, divided up into approximately 64% “red” cones, 32% “green” cones, and 2% “blue” cones. Most of these are packed into the fovea. Figure 2 shows how rods and cones are arranged in the retina. Rods are located mainly in the peripheral regions of the retina, and are absent from the middle of the fovea. Cones are located throughout the retina, but are concentrated at the very centre.

Fig.2: Rods and cones in the retina.

Since there are only three types of cones, how are other colours formed? The ability to see millions of colours is a combination of the overlap of the cones’ sensitivities, and how the brain interprets the information. Figure 3 shows roughly how the red, green, and blue sensitive cones respond to different wavelengths. As different wavelengths stimulate the colour-sensitive cones in differing proportions, the brain interprets the signals as differing colours. For example, the colour yellow results from the red and green cones being stimulated while the blue cones are not.

Fig.3: Response of the human visual system to light
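To make the overlap idea concrete, here is a rough sketch that models each cone’s sensitivity as a Gaussian centred on the peak wavelengths given earlier. Real cone response curves are not Gaussian, and the 50nm width is an invented illustrative value, not a measured one:

```python
import math

# Peak sensitivities from the text (nm); the width is an illustrative
# assumption, not a physiological measurement.
CONE_PEAKS = {"S": 440, "M": 545, "L": 570}
CONE_WIDTH = 50  # assumed standard deviation of each response curve, in nm

def cone_responses(wavelength_nm):
    """Approximate each cone's response to a monochromatic stimulus
    as a Gaussian centred on that cone's peak wavelength."""
    return {
        cone: math.exp(-((wavelength_nm - peak) ** 2) / (2 * CONE_WIDTH ** 2))
        for cone, peak in CONE_PEAKS.items()
    }

# Light at ~580nm (yellow) stimulates L and M strongly, S barely at all
responses = cone_responses(580)
```

Even with this crude model, the yellow example above falls out: the L and M ("red" and "green") responses dominate while the S ("blue") response is close to zero.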

Below is a list of approximately how the cones respond to the primary and secondary colours. All other colours are composed of varying strengths of light activating the red, green, and blue cones. When the light is turned off, black is perceived.

  • The colour violet activates the blue cone, and partially activates the red cone.
  • The colour blue activates the blue cone.
  • The colour cyan activates the blue cone, and the green cone.
  • The colour green activates the green cone, and partially activates the red and blue cones.
  • The colour yellow activates the green cone and the red cone.
  • The colour orange activates the red cone, and partially activates the green cone.
  • The colour red activates the red cone.
  • The colour magenta activates the red cone and the blue cone.
  • The colour white activates the red, green and blue cones.
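The list above can be summarised as a small table of approximate (red, green, blue) cone activation levels. The exact numbers, in particular 0.5 for “partial” activation, are purely illustrative:

```python
# Each colour as (red, green, blue) cone activation:
# 1.0 = fully activated, 0.5 = partially activated, 0.0 = inactive.
# The levels are illustrative assumptions based on the list above.
CONE_ACTIVATION = {
    "violet":  (0.5, 0.0, 1.0),
    "blue":    (0.0, 0.0, 1.0),
    "cyan":    (0.0, 1.0, 1.0),
    "green":   (0.5, 1.0, 0.5),
    "yellow":  (1.0, 1.0, 0.0),
    "orange":  (1.0, 0.5, 0.0),
    "red":     (1.0, 0.0, 0.0),
    "magenta": (1.0, 0.0, 1.0),
    "white":   (1.0, 1.0, 1.0),
    "black":   (0.0, 0.0, 0.0),  # no light, no activation
}
```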

So what about post-processing once the cones have done their thing? The retina’s “sensor array” receives the colours, and the information is encoded by the bipolar and ganglion cells in the retina before being passed to the brain. There are three types of encoding.

  1. The luminance (brightness) is encoded as the sum of the signals coming from the red, green and blue cones and the rods. These help provide the fine detail of the image in black and white. This is similar to a grayscale version of a colour image.
  2. The second encoding separates blue from yellow.
  3. The third encoding separates red from green.
Fig.4: The encoding of colour information after the cones do their thing.

In the fovea there are no rods, only cones, so the luminance ganglion cell only receives a signal from one cone cell of each colour. A rough approximation of the process is shown in Figure 4.
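The three encodings listed above can be sketched as one achromatic channel plus two colour-difference channels. The formulas below are simplified illustrations of the idea, not physiological models:

```python
def opponent_channels(r, g, b):
    """Sketch of opponent-process encoding: a luminance (brightness)
    signal plus two colour-difference signals. Inputs are cone signal
    strengths in the range 0.0-1.0; formulas are illustrative only."""
    luminance = r + g + b          # brightness: sum of the cone signals
    blue_yellow = b - (r + g) / 2  # separates blue from yellow
    red_green = r - g              # separates red from green
    return luminance, blue_yellow, red_green
```

For a yellow stimulus (red and green cones active, blue inactive) the blue-yellow channel goes strongly negative, which is the “yellow” end of that axis in this sketch.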

Now, you don’t really need to know that much about the inner workings of the eye, except that colour theory is based a great deal on how the human eye perceives colour, hence the use of RGB in digital cameras.


Why human eyes are so great

Human eyes are made of gel-like material. It is interesting, then, that together with a 3-pound brain composed predominantly of fat and water, we are capable of the feat of vision. True, we don’t have super-vision and can’t zoom in on objects in the distance, but our eyes are remarkable. They are able to refocus almost instantaneously, on objects as close as 10cm and as far away as infinity, and they automatically adjust for various lighting conditions. Our vision system is able to quickly decide what an object is, and to perceive 3D scenes.

Computer vision algorithms have made a lot of progress in the past 40 years, but they are by no means perfect, and in reality can be easily fooled. Here is an image of a refrigerator section in a grocery store in Oslo. The context of the content within the image is easily discernible. If we load this image into “Google Reverse Image Search” (GRIS), the program says that it is a picture of a supermarket – which is correct.

Now what happens if we blur the image somewhat? Let’s say a Gaussian blur with a radius of 51 pixels. This is what the resulting image looks like:

The human eye is still able to decipher the content in this image, at least enough to determine it is a series of supermarket shelves. Judging by the shape of the blurry items, one might go so far as to say it is a refrigerated shelf. So how does the computer compare? The best GRIS could come up with was “close-up”, because it had nothing to compare against. The Wolfram Language “Image Identification Program” (IIP) does a better job, identifying the scene as “store”. Generic, but not a total loss. Let’s try a second example. This photo was taken in the train station in Bergen, Norway.
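A Gaussian blur of some radius, like the one applied above, is typically implemented as a separable convolution: a 1D Gaussian kernel is swept across the rows and then the columns of the image. A minimal sketch of building such a kernel (the sigma = radius/3 relationship is an assumed convention; different software maps radius to sigma differently):

```python
import math

def gaussian_kernel(radius, sigma=None):
    """Build a normalised 1D Gaussian kernel of the kind used by a
    Gaussian blur filter. sigma = radius / 3 is an assumed convention."""
    if sigma is None:
        sigma = radius / 3.0
    kernel = [math.exp(-(x * x) / (2 * sigma * sigma))
              for x in range(-radius, radius + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]  # normalise so weights sum to 1

# The radius-51 blur used above corresponds to a 103-tap kernel
kernel = gaussian_kernel(51)
```

Convolving this kernel across the image horizontally and then vertically produces the heavy blur shown above; the larger the radius, the more high-frequency detail is averaged away.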

GRIS identifies similar images, and guesses the image is “Bergen”. Now this is true, however the context of the image is more related to railway rolling stock and the Bergen station, than Bergen itself. IIP identifies it as “locomotive engine”, which is right on target. If we add a Gaussian blur with radius = 11, then we get the following blurred image:

Now GRIS thinks this scene is “metro”, identifying similar images containing cars. It is two trains, so this is not a terrible guess. IIP identifies it as a subway train, which is a good result. Now let’s try the original with a Gaussian blur of radius 21.

Now GRIS identifies the scene as “rolling stock”, which is true, although the images it considers similar involve cars doing burn-outs or stuck in the snow (or in one case a rockhopper penguin). IIP, on the other hand, fails this image, identifying it as a “measuring device”.

So as the image gets blurrier, it becomes harder for computer vision systems to identify, whereas the human eye does not have these problems. Even in a worst case scenario, where the Gaussian blur filter has a radius of 51, the human eye is still able to decipher its content. But GRIS thinks it’s a “photograph” (which *is* true, I guess), and IIP says it’s a person.

30-odd shades of gray – the importance of gray in vision

Gray (or grey) means a colour “without colour”… and yet it is a colour. In terms of image processing, we commonly use gray as a term synonymous with monochromatic (although monochrome strictly means single colour). Grayscale images can potentially come with limitless levels of gray, but while this is practical for a machine, it’s not that useful for humans. Why? Because the human eye is structured primarily around conveying colour information: it allows humans to distinguish between approximately 10 million colours, but only about 30 shades of gray.

The human eye has two core forms of photoreceptor cells: rods and cones. Cones deal with colour vision, while rods allow us to see in grayscale under low-light conditions, e.g. at night. The human eye has three types of cones, each sensitive to a different interval of wavelengths – short (blue), medium (green), and long (red). Blue light, for example, chiefly stimulates the short-wavelength receptors. However, of all the possible wavelengths of light, our eyes detect only a small band, typically in the range of 380-720 nanometres – what we know as the visible spectrum. The brain then combines signals from the receptors to give us the impression of colour. So every person will perceive colours slightly differently, and this may also differ depending on location, or even culture.

After the light is absorbed by the cones, the responses are transformed into three signals: a black-white (achromatic) signal and two colour-difference signals, red-green and blue-yellow. This theory was put forward by German physiologist Ewald Hering in the late 19th century. It is important for the vision system to properly reproduce blacks, grays, and whites. Deviations from these norms are usually very noticeable, and even a small amount of hue can produce a visible defect. Consider the following image, which contains a number of regions that are white, gray, and black.

A fjord in Norway

Now consider the photograph with a slight blue colour cast. The whites, grays, *and* blacks have taken on the cast (giving the photograph a very cold feel to it).

Photograph of a fjord in Norway with a cast added.

The grayscale portion of our vision also provides contrast, without which images would have very little depth. This is synonymous with removing the intensity portion of an image. Consider the following image of some rail snowblowers on the Oslo-Bergen railway in Norway.

Rail snowblowers on the Oslo-Bergen railway in Norway.

Now, let’s take away the intensity component (by converting the image to HSB, and setting the B (brightness) component to its maximum, i.e. 255). This is what you get:

Rail snowblowers on the Oslo-Bergen railway in Norway. Photo has intensity component removed.

The image shows the hue and saturation components, but no contrast, making it appear extremely flat. The other issue is that sharpness depends much more on the luminance than the chrominance component of images (as you will also notice in the example above). It does make a nice art filter though.
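The transformation used above can be sketched per pixel with Python’s standard colorsys module (which calls the brightness channel V, for value). This is a minimal per-pixel sketch, not a full image-processing pipeline:

```python
import colorsys

def remove_intensity(r, g, b):
    """Convert an RGB pixel (0-255 per channel) to HSV, pin the
    value (brightness) channel to maximum, and convert back --
    keeping only the hue and saturation information."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, 1.0)  # brightness forced to 255
    return round(r2 * 255), round(g2 * 255), round(b2 * 255)
```

Note what this does to neutral pixels: any gray (equal R, G, B) has zero saturation, so it maps to pure white, which is exactly why the result looks so flat.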

Resolution of the human eye (iii) – things that affect visual acuity

So now we have looked at the number of overall pixels, and the acuity of pixels throughout that region. If you have read the last two posts, you, like me, might surmise that there is no possibility of associating a value with the resolution of the eye. And you would probably be right, because on top of everything else there are a number of factors which affect visual acuity.

  1. Refractive errors – These cause defocus at the retina, blurring out fine detail and sharp edges. A good example is myopia (short-sightedness).
  2. Size of pupil – Pupils act like camera apertures, controlling how much light enters the eye. Large pupils let in more light, but can reduce resolution through aberrations in the eye.
  3. Illumination of the background – Less light means lower visual acuity. As cones are the acuity masters, low light reduces their capabilities.
  4. Area of retina stimulated – Visual acuity is greatest in the fovea. At 2.5 degrees from the point the eyes are fixated upon, there is approximately a 50% loss in visual acuity.
  5. Eye movement – The eyes are constantly moving, even when fixated (e.g. when reading a book it is your eyes that move, not your head).

Complicated, right? So what is the answer? We have looked at how non-uniform acuity may affect the resolution of the human eye. The last piece of the puzzle (maybe?) in trying to approximate the resolution of the human eye is the shape of our visual scope. When we view something, what is the “shape of the picture” being created? On a digital camera it is a rectangle. Not so with the human visual system. Because of the non-uniformity of acuity, the shape of the region being “captured” really depends on the application. If you are viewing a landscape vista, you are looking at an overall scene, whereas when reading a book the “capture area” is quite narrow (although the overall shape of the information being input is the same, peripheral areas are seemingly ignored, because the fovea is concentrating on processing the words being read). To provide a sense of the visual field of binocular vision, here is an image from a 1964 NASA report, the Bioastronautics Data Book:

This diagram shows the normal field of view of a pair of human eyes. The central white portion represents the region seen by both eyes. The dashed portions, right and left, represent the regions seen by the right and left eyes, respectively. The cut-off by the brows, cheeks, and nose is shown by the black area. Head and eyes are motionless in this case. Not quite an ellipse, but almost. You can see how this complicates things even further when trying to approximate resolution. Instead of a rectangular field-of-view of 135°×190°, assume the shape of an ellipse with semi-axes of 67.5° and 95°. This gives an area of (95 × 67.5)π ≈ 20145 square degrees, which converts to 72.5 megapixels for 1-arc-minute-sized pixels – marginally lower than the 75 megapixels of the bounding rectangle.
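The elliptical back-of-envelope arithmetic works out as follows, using the semi-axes and the 1-arc-minute pixel size from above:

```python
import math

# Field of view modelled as an ellipse with semi-axes of 95 and 67.5
# degrees (half of the 190 x 135 degree bounding rectangle)
area_sq_deg = math.pi * 95 * 67.5      # ~20145 square degrees

# 1 arc minute pixels: 60 x 60 = 3600 pixels per square degree
megapixels = area_sq_deg * 3600 / 1e6  # ~72.5 megapixels
```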

So what’s the answer? What *is* the resolution of the human eye? If you wanted a number to represent the eye’s pixelation, I would verge on the conservative side and give it a relatively low value: using the 1 arc minute acuity figure, I would estimate the “resolution” of the human visual system at somewhere around 100 megapixels. This factors in some sort of compromise between the high-acuity region of the fovea and the low-resolution remainder of the field of view. It also takes into account the fact that the human vision system operates more like streaming video than a photograph. Can the eye be compared to a camera? No, it’s far too complicated trying to derive a quantitative value for an organic structure composed of roughly 80% gelatinous tissue.

Maybe some mysteries of the world should remain just that.

The camera does not lie

There is an old phrase, “the camera does not lie”, which can be interpreted as both true and false. In historic photos, where little was done in the way of manipulation, the photograph often did hold the truth of what appeared in the scene. In modern photographs that are “enhanced”, this is often not the case. But there is another perspective. The phrase is true because the camera objectively captures everything in the scene within its field of view. But it is also false, because the human eye is not all-seeing: it perceives the world in a highly subjective manner, focusing on the object (or person) of interest. Most photographs tend to contain far too much information, visual “flotsam” that is selectively discarded by the human visual system. The rendition of colours can also appear “unnatural” in photographs because of issues with white balance, film types (in analog cameras), or sensors (in digital cameras).

What the human eye sees (left) versus the camera (right)

A good example of how the human eye and camera lens perceive things differently is shown in the two photos above. The photograph on the right contains photographic perspective distortion (keystoning), where the tall buildings appear to “fall” or “lean” within the picture. The human eye (simulated on the left), on the other hand, corrects for this issue, and so does not perceive it. To photograph a tall building, the camera is often tilted upward, and in this position the vertical lines of the building converge toward the top of the picture. The convergence of vertical lines is a natural manifestation of perspective which we find acceptable in the horizontal plane (e.g. the convergence of railway tracks in the distance), but which seems unnatural in the vertical plane.

There are many other factors that influence the outcome of a picture. Some are associated with the physical abilities of a camera and its associated lenses, others the environment. For example the colour of ambient light (e.g. a colour cast created by the sun setting), perspective (the wider a lens the more distortion introduced), or contrast (e.g. B&W images becoming “flat”). While the camera does not lie, it rarely exactly reproduces the world as we see it. Or maybe we don’t perceive the world around us as it truly is.

How many colours can humans see?

The human eye is a marvelous thing. A human eye has three types of cone cells, each of which can distinguish roughly 100 different shades, which puts the number of discernible colours at around 1,000,000 – although colour perception is a highly subjective activity. Colour-blind people (dichromats) have only two working cone types and see around 10,000 colours, while tetrachromats have four and may see up to 100 million colours. There is at least one confirmed case of a person with tetrachromatic vision.
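The colour counts quoted above follow from simple combinatorics: with roughly 100 distinguishable shades per cone type, the number of perceivable colours scales as 100 raised to the number of cone types. A quick sketch:

```python
SHADES_PER_CONE = 100  # distinguishable shades per cone type, per the text

dichromat    = SHADES_PER_CONE ** 2  # two cone types:   10,000 colours
trichromat   = SHADES_PER_CONE ** 3  # three cone types: 1,000,000 colours
tetrachromat = SHADES_PER_CONE ** 4  # four cone types:  100,000,000 colours
```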

Of course the true number of colours visible to the human eye is unknown, and some people may have better perception than others. The CIE (Commission internationale de l’éclairage), which in 1931 established the “CIE 1931 XYZ color space”, created a horseshoe-shaped colour plot covering the hue range from 380-700nm, with saturation running from 0% at the centre point to 100% on the periphery. The work of the CIE suggests humans can see approximately 2.4 million colours.

CIE 1931 XYZ color space

Others postulate that humans can discriminate about 150 bands between 380 and 700 nm. By also varying saturation and brightness, it is possible to distinguish many more colours – maybe 7 million [1].

Visible colour spectrum

This puts the human visual system in the mid-range of colour perception. Marine mammals, adapted for the low-light environment they live in, are monochromats, perceiving only about 100 shades. Conversely, at the other end of the spectrum, pentachromats, e.g. some butterflies, may see 10 billion colours.

Now in computer vision, “true colour” is considered to be 24-bit RGB, or 16,777,216 colour variations. Most people obviously can’t see that many colours. The alternatives in colour images are limited: 8-bit colour provides 256 colours, and 16-bit colour is a somewhat odd combination of R (5 bits), G (6 bits), and B (5 bits), giving 65,536 colours. Can we perceive the difference? Here is a full 24-bit RGB photograph:

Colour image with 24-bit RGB

Here’s the equivalent 8-bit colour photograph:

Colour image with 8-bit RGB

Can you tell the difference? (Apart from the apparently uniform white region above the red and yellow buildings.)
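The 16-bit 5-6-5 packing mentioned above can be sketched with a couple of bit-twiddling helpers (illustrative only, not tied to any particular file format):

```python
def pack_rgb565(r, g, b):
    """Pack an 8-bit-per-channel RGB pixel into 16-bit 5-6-5 format,
    keeping only the top 5, 6, and 5 bits of each channel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack_rgb565(pixel):
    """Expand a 5-6-5 pixel back to 8 bits per channel. The discarded
    low bits are zeroed, which is where the quantisation loss lives."""
    r = ((pixel >> 11) & 0x1F) << 3
    g = ((pixel >> 5) & 0x3F) << 2
    b = (pixel & 0x1F) << 3
    return r, g, b
```

Green gets the extra bit because the eye is most sensitive in the green part of the spectrum, an echo of the cone proportions discussed earlier.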

[1] Goldstein, E.B., Sensation and Perception, 3rd ed. (1989)