Can humans discern 16 million colours in an image?

A standard colour image is 8 bits per channel (24 bits in total), containing 256³ = 16,777,216 colours. That seems like a lot, right? But can that many colours even be distinguished by the human visual system? The quick answer is no, or rather we don’t know for certain. Research into the number of actually discernible colours is a bit of a rabbit hole.

A 1998 paper [1] suggests that the number of discernible colours may be around 2.28 million – the authors determined this by calculating the number of colours within the boundary of the MacAdam limits in the CIELAB uniform colour space [2] (for those who are interested). However, even the authors suggested this 2.28M may be somewhat of an overestimation. A larger figure of 10 million colours (from 1975) is often cited [3], but there is no information on the origin of this figure. A similar figure of 2.5 million colours was cited in a 2012 article [4], while a more recent article [5] gives a conservative estimate of 40 million distinguishable object colour stimuli. Is it even possible to realistically prove such large numbers? Somewhat unlikely, because it may be impossible to quantify – ever. Indications based on existing colour spaces may be as good as it gets, and frankly even 1–2 million colours is a lot.

Of course, the actual number of colours someone sees also depends on the number and distribution of cones in the eye. For example, dichromats have only two types of colour-perceiving cones. This colour deficiency manifests differently depending on which cone is missing. The majority of the population are trichromats, i.e. they have three types of cones. Lastly, there are the very rare individuals, the tetrachromats, who have four different cones. Supposedly tetrachromats can see 100 million colours, but it is thought the condition only exists in women, and in reality nobody really knows how many people are potentially tetrachromatic [6] (the only definitive way of finding out if you have tetrachromacy is via a genetic test).

The reality is that few, if any, real pictures contain 16 million colours. Here are some examples (all images contain 9 million pixels). Note the images are shown alongside their hue distribution from the HSB colour space. The first example is a picture of a wall of graffiti art in Toronto. This is an atypical image because it contains a lot of varied colours; most images do not. Even so, it has only 740,314 distinct colours – just 4.4% of the potential colours available.

The next example is a more natural picture, a photograph of two buildings (Nova Scotia). This picture is quite representative of images such as landscapes, which are skewed towards a fairly narrow band of colours. It contains only 217,751 distinct colours, or 1.3% of the 16.77 million.

Finally, we have a food-type image that doesn’t seem to have a lot of differing colours, but in reality it does: there are 635,026 (3.8%) distinct colours in the image. What these examples show is that most images contain fewer than one million different colours. So while there is the potential for an image to contain 16,777,216 colours, in all likelihood it won’t.
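For those curious how such counts are obtained, here is a minimal sketch in Python (assuming NumPy and Pillow are installed; "graffiti.jpg" is just a placeholder filename) that counts the distinct colours in an image:

import numpy as np
from PIL import Image

def count_distinct_colours(path):
    # Load the image and force it into 8-bit RGB
    rgb = np.asarray(Image.open(path).convert("RGB"))
    # Collapse it into a list of (R, G, B) triplets and count the unique rows
    pixels = rgb.reshape(-1, 3)
    return len(np.unique(pixels, axis=0))

n = count_distinct_colours("graffiti.jpg")
print(f"{n} distinct colours ({100 * n / 256**3:.1f}% of 16,777,216)")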

What about 10-bit colour? We’re talking about 1024³ = 1,073,741,824 colours – which is really kind of ridiculous.

Further reading:

  1. Pointer, M.R., Attridge, G.G., “The number of discernible colours”, Color Research and Application, 23(1), pp.52-54 (1998)
  2. MacAdam, D.L., “Maximum visual efficiency of colored materials”, Journal of the Optical Society of America, 25, pp.361-367 (1935)
  3. Judd, D.B., Wyszecki, G., Color in Business, Science and Industry, Wiley, p.388 (1975)
  4. Flinkman, M., Laamanen, H., Vahimaa, P., Hauta-Kasari, M., “Number of colors generated by smooth nonfluorescent reflectance spectra”, Journal of the Optical Society of America A, 29(12), pp.2566-2575 (2012)
  5. Kuehni, R.G., “How Many Object Colors Can We Distinguish?”, Color Research and Application, 41(5), pp.439-444 (2016)
  6. Jordan, G., Mollon, J., “Tetrachromacy: the mysterious case of extra-ordinary color vision”, Current Opinion in Behavioral Sciences, 30, pp.130-134 (2019)
  7. All the Colors We Cannot See, Carl Jennings (June 24, 2019)
  8. How Many Colors Can Most Of Us Actually See, USA Art News (July 23, 2020)

The math behind visual acuity

The number of megapixels required to print something, or to view a television, is ultimately determined by the human eye’s visual acuity and the distance from which the object is viewed. For someone with average (i.e. 20/20) vision, acuity is defined as one arcminute, or 1/60th of a degree. For comparison, a full moon in the sky appears about 31 arcminutes (roughly half a degree) across (Figure 1).

Fig.1: Looking at the moon

Now generally, some descriptions skip from talking about arcminutes to describing how the distance between an observer and an object can be calculated given the resolution of the object. For example, the distance (d, in inches) at which the eye reaches its resolution limit is often calculated using:

d = 3438 / h

where h is the resolution, which can be ppi for screens or dpi for prints. So if h=300, then d=11.46 inches. Calculating the optimal viewing distance involves a magic number – 3438. Where does this number come from? Few descriptions give any insight, but we can start with some basic trigonometry. Consider the diagram in Figure 2, where h now denotes the pixel pitch (the size of a single pixel), d is the viewing distance, and θ is the angle of viewing; we will convert back to resolution at the end.

Fig.2: Viewing an object

Now we can use the basic equation for calculating an angle θ, given the lengths of the opposite and adjacent sides:

tan(θ) = opposite/adjacent

To apply this formula to the diagram in Figure 2, we use the half-angle θ/2 and the half-pitch h/2:

tan(θ/2) = (h/2)/d

So now, we can solve for h.

d tan(θ/2) = h/2
2d⋅tan(θ/2) = h

Now if we use visual acuity as 1 arcminute, this is equivalent to 0.000290888 radians. Therefore:

h = 2d⋅tan(0.000290888/2) 
  = 2d⋅0.000145444

So for d=24″, h = 0.00698 inches, or converted to mm (by multiplying by 25.4), h = 0.177mm. To convert this into PPI/DPI, we simply take the inverse: 1/0.00698 ≈ 143 ppi/dpi. How do we turn this equation into one containing the value 3438? Since resolution is the inverse of the pixel pitch, letting h now denote the resolution gives:

h = 1/(2d⋅0.000145444)
  = (1/d) ⋅ (1/2) ⋅ (1/0.000145444)
  = (1/d) ⋅ (1/2) ⋅ 6875.49847
  = (1/d) ⋅ 3437.749
  = 3438/d

So for a poster viewed at d=36″, h = 95 dpi (the minimum resolution required). Rearranging the equation gives the viewing distance:

d = 3438 / h

As an example, consider the Apple Watch Series 8, whose screen has a resolution of 326ppi. Performing the calculation gives d = 3438/326 = 10.55″. So the watch should be held 10.55″ from one’s face. For a poster printed at 300dpi, d=11.46″, and for a poster printed at 180dpi, d=19.1″. This depends only on the printing resolution, not the size of the poster, and represents the minimum resolution at a particular distance – only if you move closer do you need a higher resolution. This is why billboards can be printed at a low resolution, even 1dpi, because when viewed from a distance it doesn’t really matter how low the resolution is.

Note that there are many different variables at play when it comes to acuity; these calculations provide the simplest-case scenario. For eyes outside the normal range, visual acuity is different, which changes the calculation (i.e. the value of θ in radians). Typical arcminute values are: 0.75 (20/15), 1.5 (20/30), 2.0 (20/40), etc. There are also factors such as lighting, and how eye prescriptions modify acuity, to take into account. Finally, these acuity calculations only consider what is directly in front of our eyes, i.e. the narrow, sharp vision provided by the foveola – all other parts of a scene have progressively less acuity moving out from this central point.

Fig.3: At 1-2° the foveola provides the greatest amount of acuity.

p.s. The same approach can be used to calculate ideal monitor and TV sizes. For a 24″ viewing distance, the pixel pitch is h = 0.177mm. For a 4K (3840×2160) monitor, this would mean 3840×0.177 = 680mm and 2160×0.177 = 382mm, which after calculating the diagonal results in a 30.7″ monitor.

p.p.s. If d is expressed in cm (with h still in ppi/dpi), the formula becomes: d = 8732 / h (i.e. 3438 × 2.54).
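As a rough worked example of the formulas above, here is a small Python sketch (the printed values simply reproduce the watch and 4K monitor examples; 20/20 acuity, i.e. one arcminute, is assumed by default):

import math

def pixel_pitch(distance_in, arcminutes=1.0):
    # Smallest resolvable feature (inches) at a given viewing distance
    theta = math.radians(arcminutes / 60.0)        # arcminutes -> radians
    return 2 * distance_in * math.tan(theta / 2)   # h = 2d*tan(theta/2)

def min_viewing_distance(ppi, arcminutes=1.0):
    # Distance (inches) at which a given ppi/dpi reaches the eye's limit
    return 1.0 / (pixel_pitch(1.0, arcminutes) * ppi)   # d = 3438/ppi for 1 arcminute

# Apple Watch Series 8: 326 ppi -> about 10.5 inches
print(min_viewing_distance(326))

# 4K monitor viewed at 24 inches: pixel pitch ~0.177 mm, diagonal ~30.7 inches
pitch_mm = pixel_pitch(24) * 25.4
width_mm, height_mm = 3840 * pitch_mm, 2160 * pitch_mm
print(math.hypot(width_mm, height_mm) / 25.4)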

The human visual system : focus and acuity

There is a third difference between cameras and the human visual system (HVS). While some camera lenses may share a similar perspective of the world with the HVS with respect to angle-of-view, where they differ is what is actually in the area of focus. Using any lens on a camera means that a picture will have an area where the scene is in focus, with the remainder being out of focus. This in-focus region generally occurs in a plane, and is associated with the depth-of-field. On the other hand, the in-focus region of the picture our mind presents us with does not lie in a plane of focus.

While binocular vision allows approximately 120° of (horizontal) vision, it is only highly focused in the very centre, with the remainder of the picture becoming increasingly out of focus the further a point is from the central focused region. This may be challenging to visualize, but if you look at an object, only the central point is in focus; the remainder of the picture is out of focus. That does not mean it is necessarily blurred, because the brain is still able to discern shape and colour, just not fine details. Blurring is usually a function of distance from the object being focused on, i.e. the point-of-focus. If you look at a close object, distant objects will be out of focus, and vice versa.

Fig.1: Parts of the macula

Focused vision is related to the different parts of the macula, an oval-shaped pigmented area in the centre of the retina which is responsible for central vision, colour, fine details, and symbols (see Figure 1). It is composed almost entirely of cones, and is divided into a series of zones:

  • perifovea (5.5mm∅, 18°) : Details that appear in up to 9-10° of visual angle.
  • parafovea (3mm∅, 8°) : Details that appear in peripheral vision, not as sharp as the fovea.
  • fovea (1.5mm∅, 5°) : Also called the fovea centralis; comprised entirely of cones, and responsible for high-acuity colour vision.
  • foveola (0.35mm∅, 1°) : A central pit within the fovea, which contains densely packed cones. Within the foveola is a small depression known as the umbo (0.15mm∅), which is the microscopic centre of the foveola.
Fig.2: Angle-of-view of the whole macula region, versus the foveola. The foveola provides the greatest region of acuity, i.e. fine details.

When we fixate on an object, we bring its image onto the fovea. The foveola provides the greatest amount of visual acuity, in the area 1-2° outwards from the point of fixation. As the distance from fixation increases, visual acuity decreases quite rapidly. To illustrate this effect, try reading the preceding text in this paragraph while fixating on the period at the end of the sentence. It is likely challenging, if not impossible, to read text outside a small circle of focus around the point of fixation. A seven-letter word like “outside” is about 1cm wide, which when read on a screen 60cm from your eye subtends an angle of about 1°. The 5° of the fovea region allows a “preview” of the words on either side, and the 8° of the parafovea region a sense of the peripheral words (i.e. their shape). This is illustrated in Figure 3.

Fig.3: Reading text from 60cm
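As a quick check of that arithmetic, a tiny calculation in Python:

import math

# A word about 1 cm wide, viewed on a screen 60 cm away
angle_deg = math.degrees(2 * math.atan(0.5 / 60.0))
print(angle_deg)   # ~0.95 degrees, i.e. roughly 1 degree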

To illustrate how this differential focus affects how humans view a scene, consider the image shown in Figure 4. The point of focus is a building in the background, roughly 85m from where the person is standing. The image has been modified by adding radial blur from a central point-of-focus to simulate in-focus versus out-of-focus regions as seen by the eye (the blur has been exaggerated). The sharpest region is the point of fixation in the centre – anything to either side of that object will be unsharp, and the further away from that point, the more unsharp it becomes.

Fig.4: A simulation of focused versus out-of-focus regions in the HVS (the point of fixation is roughly 85m from the eyes)

It is hard to effectively illustrate exactly how the HVS perceives a scene, as there is no way of taking a snapshot and analyzing it. However, we do know that focus is a function of distance from the point-of-focus. Other parts of an image are essentially de-emphasized – the information is still there, and the way our minds process it provides a seemingly complete view, but there is always a central point of focus.

Further reading:

  1. Ruch, T.C., “Chapter 21: Binocular Vision, and Central Visual Pathways”, in Neurophysiology (Ruch, T.C. et al. (eds)) p.441-464 (1965)

The human visual system : image shape and binocular vision

There are a number of fundamental differences between a “normal” 50mm lens and the human visual system (HVS). Firstly, a camera extracts a rectangular image from the circular view of the lens. The HVS, on the other hand, is neither circular nor rectangular – if anything it has somewhat of an oval shape. This can be seen in the diagram of the binocular field of vision shown in Figure 1 (from [1]). The central shaded region is the field of vision seen by both eyes, i.e. binocular (stereoscopic) vision; the white areas on both sides are the monocular crescents, each seen by only one eye; and the blackened area is not seen at all.

Fig.1: One of the original diagrams illustrating both the shape of vision, and the extent of binocular vision [1].

Figure 1 illustrates a second difference, the fact that normal human vision is largely binocular, i.e. uses both eyes to produce an image, whereas most cameras are monocular. Figure 2 illustrates binocular vision more clearly, comparing it to the total visual field.

Fig.2: Shape and angle-of-view, total versus binocular vision (horizontal).

The total visual field of the HVS is 190-200° horizontally, composed of 120° of binocular vision plus two fields of 35-40° each seen by only one eye. Vertically, the visual field is about 130° (and the binocular field is roughly the same), comprised of 50° above the horizontal line-of-sight and 70-80° below it. An example illustrating (horizontal) binocular vision is shown in Figure 3.

Fig.3: A binocular (horizontal – 120°) view of Bergen, Norway

It is actually quite challenging to provide an exact example of what a human sees – largely because taking the same picture would require a lens such as a fish-eye, which would introduce distortions, something the HVS is capable of filtering out.

Further reading:

  1. Ruch, T.C., “Chapter 21: Binocular Vision, and Central Visual Pathways”, in Neurophysiology (Ruch, T.C. et al. (eds)) p.441-464 (1965)

What is an RGB colour image?

Most colour images are stored using a colour model, and RGB is the most commonly used one. Digital cameras typically offer a specific RGB colour space such as sRGB. RGB is commonly used because it is loosely based on how humans perceive colours, and has a good amount of theory underpinning it. For instance, a camera sensor detects the wavelengths of light reflected from an object and separates them into the primary colours red, green, and blue.

An RGB image is represented by M×N colour pixels (M = width, N = height). When viewed on a screen, each pixel is displayed as a specific colour. Deconstructed, however, an RGB image is actually composed of three layers. These layers, or component images, are all M×N pixels in size, and represent the values associated with red, green, and blue. An example of an RGB image decoupled into its R-G-B component images is shown in Figure 1. None of the component images contain any colour – they are actually grayscale. An RGB image may then be viewed as a stack of three grayscale images, where corresponding pixels in the R, G, and B images together form the colour that is seen when the image is visualized.

Fig.1: A “deconstructed” RGB image
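A minimal sketch of this decomposition in Python (assuming Pillow and NumPy are available; "photo.jpg" is a placeholder filename):

import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("photo.jpg").convert("RGB"))   # shape (N, M, 3)

# Each component image is a grayscale array of values 0..255
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Save each component as its own grayscale image
for name, channel in zip(("red", "green", "blue"), (red, green, blue)):
    Image.fromarray(channel).save(f"component_{name}.png")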

The component images typically have pixel values in the range 0 to 2^B − 1, where B is the number of bits per component. If B=8, the values in each component image range from 0 to 255. The number of bits used to represent the pixel values of the component images determines the bit depth of the RGB image: if each component image is 8-bit, the corresponding RGB image is a 24-bit RGB image (generally the standard). The number of possible colours in an RGB image is then (2^B)³, so for B=8 there are 16,777,216 possible colours.

Coupled together, each RGB pixel is described using a triplet of values, each in the range 0 to 255 (for an 8-bit image). It is this triplet that is interpreted by the output system to produce the colour perceived by the human visual system. An example of an RGB pixel’s triplet value, and the associated R-G-B component values, is shown in Figure 2. The lime-green colour shown is composed of the RGB triplet (193, 201, 64), i.e. Red=193, Green=201 and Blue=64.

Fig.2: Component values of an RGB pixel

One way of visualizing the R, G, B components of an image is by means of a 3D colour cube; an example is shown in Figure 3. The RGB image shown is 310×510, or 158,100 pixels. Next to it is a colour cube with three axes, R, G, and B, each with a range of 0-255, producing a cube with 16,777,216 possible elements. Each of the image’s 122,113 unique colours is represented as a point in the cube (representing only 0.7% of the available colours).

Fig.3: Example of colours in an RGB 3D cube
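For anyone wanting to reproduce this kind of visualization, here is a rough Python sketch (assuming NumPy, Pillow, and Matplotlib; the filename is a placeholder) that plots an image's unique colours as points in the RGB cube:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

pixels = np.asarray(Image.open("photo.jpg").convert("RGB")).reshape(-1, 3)
colours = np.unique(pixels, axis=0)          # one point per unique colour

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(colours[:, 0], colours[:, 1], colours[:, 2],
           c=colours / 255.0, s=1)           # colour each point by itself
ax.set_xlabel("R")
ax.set_ylabel("G")
ax.set_zlabel("B")
plt.show()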

The caveat of the RGB colour model is that it is not a perceptual one, i.e. chrominance and luminance are not separated from one another but coupled together. Note that there are some colour models/spaces that are decoupled, i.e. they separate luminance information from chrominance information; a good example is HSV (Hue, Saturation, Value).

Photography and colour deficiency

How often do we stop and think about how colour blind people perceive the world around them? For many people there is a reduced ability to perceive colours in the same way that the average person perceives them. Colour blindness, also known as colour vision deficiency, affects some 8% of males and about 0.5% of females. Colour blindness means that a person has difficulty seeing red, green, or blue, or certain hues of these colours. In extremely rare cases, a person may be unable to see any colour at all. And one term does not fit all, as there are many differing forms of colour deficiency.

The most common form is red/green colour deficiency, split into two groups:

  • Deuteranomaly – 3 cones with a reduced sensitivity to green wavelengths. People with deuteranomaly may commonly confuse reds with greens, bright greens with yellows, pale pinks with light grey, and light blues with lilac.
  • Protanomaly – The opposite of deuteranomaly, a reduced sensitivity to red wavelengths. People with protanomaly may confuse black with shades of red, some blues with reds or purples, dark brown with dark green, and green with orange.

Then there is also blue/yellow colour deficiency. Tritanomaly is a rare colour vision deficiency affecting the sensitivity of the blue cones. People with tritanomaly most commonly confuse blues with greens, and yellows with purple or violet.

(Simulated views: standard vision, deuteranomaly, protanomaly, tritanomaly.)

Deuteranopia, protanopia, and tritanopia are the dichromatic forms, where the associated cones (green, red, or blue respectively) are missing completely. Lastly, there is monochromacy (achromatopsia, or total colour blindness), a condition in which the cones are mostly defective or non-existent, causing a complete inability to distinguish colours.

(Simulated views: standard vision, deuteranopia, protanopia, tritanopia.)
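Total colour blindness is the easiest of these to approximate in software, since it amounts to discarding chrominance altogether. The sketch below (Python with Pillow; the filename is a placeholder) is only a crude luminance-based approximation, not a physiological model; simulating the dichromatic forms properly requires published transforms, such as those used by the simulators mentioned at the end of this post.

from PIL import Image

img = Image.open("scene.jpg")
# Pillow's "L" mode uses the standard ITU-R 601 luminance weights
# (0.299 R + 0.587 G + 0.114 B)
mono = img.convert("L")
mono.save("scene_monochromacy.png")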

How does this affect photography? Obviously the photographs themselves will be the same, but photographers who have a colour deficiency will perceive a scene differently. For those interested, there are some fine articles on how photographers deal with colour blindness.

  • Check here for an exceptional article on how photographer Cameron Bushong approaches colour deficiency.
  • Photographer David Wilder offers some insights into working on location and some tips for editing.
  • David Wilder describes taking photographs in Iceland using special glasses which facilitate the perception of colour.
  • Some examples of what the world looks like when you’re colour-blind.

Below is a rendition of the standard colour spectrum as it relates to differing types of colour deficiency.

Simulated colour deficiencies applied to the colour spectrum.

In reality, people who are colour blind may be better at discerning some things. A 2005 article [1] suggests that people with deuteranomaly may actually have an expanded colour space in certain circumstances, making it possible for them to, for example, discern subtle shades of khaki.

Note: The colour deficiencies shown above were simulated using ImageJ’s (Fiji) “Simulate Color Blindness” function. A good online simulator is Coblis, the Color Blindness Simulator.

  1. Bosten, J.M., Robinson, J.D., Jordan, G., Mollon, J.D., “Multidimensional scaling reveals a color dimension unique to ‘color-deficient’ observers“, Current Biology, 15(23), pp.R950-R952 (2005)

32 shades of gray

Humans don’t interpret gray tones very well – the human visual system perceives approximately 32 shades of gray. So an 8-bit image with 256 tones already contains more information than humans can interpret. That’s why you don’t really see any more clarity in a 10-bit image with 1024 shades of gray than in a 5-bit image with 32 shades of gray. But why do we only see approximately 32 shades of gray?

It is the responsibility of the rod receptors to deal with black and white. The rods are far less precise than the cones, which deal with colour, but are more sensitive to the low levels of light typically associated with seeing in a dimly lit room, or at night. There are supposedly over 100 million rods in the retina, but this doesn’t help us distinguish any more than 30-32 shades of gray. This may stem from evolutionary needs – in the natural world there are very few things that are actually gray (stones, some tree trunks, weathered wood), so there was little need to distinguish between more than a few shades of gray. From an evolutionary perspective, humans needed night vision because they lived half their lives in darkness; this advantage remained crucial, apart perhaps from the past 150 years or so.

The rods work so well that dark-adapted humans can detect just a handful of photons hitting the retina. This is likely the reason there are so many rods in the retina – so that in exceedingly low levels of light as many of the scarce photons as possible are captured. Figure 1 illustrates two grayscale optical illusions, which rely on our eyes’ insensitivity to shades of gray. In the image on the left, the horizontal strip of gray is actually the same shade throughout, although our eyes deceive us into thinking that it is light on the left and dark on the right. In the image on the right, the inner boxes are all the same shade of gray, even though they appear to be different.

Fig.1: Optical illusions

To illustrate this further, consider the series of images in the figure below. The first image is the original colour image. The middle image shows that image converted to grayscale with 256 shades of gray. The image on the right shows the colour image converted to 4-bit grayscale, i.e. 16 shades of gray. Is there any perceptual difference between Fig.2b and 2c? Hardly.

Fig.2a: Original colour
Fig.2b: 8-bit grayscale
Fig.2c: 4-bit grayscale
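The comparison above is easy to reproduce. Here is a small Python sketch (Pillow and NumPy assumed; the filename is a placeholder) that converts an image to grayscale and then requantizes it to a given number of shades:

import numpy as np
from PIL import Image

def quantize_gray(img, levels):
    gray = np.asarray(img.convert("L"), dtype=np.float64)
    step = 256 / levels
    # Map each pixel to the midpoint of its quantization bin
    quant = (np.floor(gray / step) * step + step / 2).astype(np.uint8)
    return Image.fromarray(quant)

img = Image.open("photo.jpg")
quantize_gray(img, 256).save("gray_8bit.png")   # 256 shades
quantize_gray(img, 16).save("gray_4bit.png")    # 16 shades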

You will see articles that suggest humans can see anywhere from 500-750 shades of gray. These are usually articles related to radiology, where radiologists interpret images like x-rays. The machines that take these medical images are capable of producing 10-bit or 12-bit images, which are interpreted on systems capable of enhancing contrast. There may of course be people who can see more shades of gray, just as there are people with a condition called aphakia who possess ultraviolet vision (aphakia is the lack of a lens, which normally blocks UV light, so they are able to perceive wavelengths down to about 300nm). There are also tetrachromats, who possess a fourth cone cell, supposedly allowing them to see up to 100 million colours.

Demystifying Colour (ii) : the basics of colour perception

How humans perceive colour is interesting, because the technology of how digital cameras capture light is adapted from the human visual system. When light enters our eye it is focused by the cornea and lens onto the “sensor” portion of the eye – the retina. The retina is composed of a number of different layers. One of these layers contains two types of photosensitive cells (photoreceptors), rods and cones, which interpret the light and convert it into a neural signal. The neural signals are collected and further processed by other layers in the retina before being sent to the brain via the optic nerve. It is in the brain that some form of colour association is made. For example, a lemon is perceived as yellow, and any deviation from this makes us question what we are looking at (like maybe a pink lemon?).

Fig.1: An example of the structure and arrangement of rods and cones

The rods, which are long and thin, interpret light (white) and darkness (black). Rods work mainly at night or in low light, as only a few photons are needed to activate a rod. Rods don’t help with colour perception, which is why at night we see everything in shades of gray. The human eye is supposed to have over 100 million rods.

Cones have a tapered shape, and are used to process the three wavelength bands which our brains interpret as colour. There are three types of cones: short-wavelength (S), medium-wavelength (M), and long-wavelength (L). Each cone absorbs light over a broad range of wavelengths, with peak sensitivities of roughly L ∼ 570nm, M ∼ 545nm, and S ∼ 440nm. The cones are often called R, G, and B for L, M, and S respectively – the names have nothing to do with the colours of the cones themselves, just the wavelengths our brain interprets as those colours. There are roughly 6-7 million cones in the human eye, divided into about 64% “red” cones, 32% “green” cones, and 2% “blue” cones, most of which are packed into the fovea. Figure 2 shows how rods and cones are arranged in the retina. Rods are located mainly in the peripheral regions of the retina, and are absent from the middle of the fovea. Cones are located throughout the retina, but are concentrated at the very centre.

Fig.2: Rods and cones in the retina.

Since there are only three types of cones, how are other colours formed? The ability to see millions of colours is a combination of the overlap of the cones’ responses, and how the brain interprets the information. Figure 3 shows roughly how the red-, green-, and blue-sensitive cones respond to different wavelengths. As different wavelengths stimulate the colour-sensitive cones in differing proportions, the brain interprets the signals as differing colours. For example, the colour yellow results from the red and green cones being stimulated while the blue cones are not.

Fig.3: Response of the human visual system to light

Below is a list showing approximately how the cones produce the primary and secondary colours. All other colours are composed of varying strengths of light activating the red, green and blue cones. When the light is turned off, black is perceived.

  • The colour violet activates the blue cone, and partially activates the red cone.
  • The colour blue activates the blue cone.
  • The colour cyan activates the blue cone, and the green cone.
  • The colour green activates the green cone, and partially activates the red and blue cones.
  • The colour yellow activates the green cone and the red cone.
  • The colour orange activates the red cone, and partially activates the green cone.
  • The colour red activates the red cones.
  • The colour magenta activates the red cone and the blue cone.
  • The colour white activates the red, green and blue cones.

So what about post-processing once the cones have done their thing? The retina’s “sensor array” receives the colours and encodes the information in the bipolar and ganglion cells before it is passed to the brain. There are three types of encoding.

  1. The luminance (brightness) is encoded as the sum of the signals coming from the red, green and blue cones and the rods. These help provide the fine detail of the image in black and white. This is similar to a grayscale version of a colour image.
  2. The second encoding separates blue from yellow.
  3. The third encoding separates red from green.
Fig.4: The encoding of colour information after the cones do their thing.

In the fovea there are no rods, only cones, so the luminance ganglion cell only receives a signal from one cone cell of each colour. A rough approximation of the process is shown in Figure 4.
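A very rough sketch of these three encodings applied to an ordinary RGB image in Python (the weights are purely illustrative; the retina does not use these exact proportions, and the filename is a placeholder):

import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float64)
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

luminance   = (R + G + B) / 3     # brightness: the sum of the cone signals
blue_yellow = B - (R + G) / 2     # blue versus yellow (yellow = red + green)
red_green   = R - G               # red versus green

print(luminance.mean(), blue_yellow.mean(), red_green.mean())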

Now, you don’t really need to know that much about the inner workings of the eye, except that colour theory is based a great deal on how the human eye perceives colour, hence the use of RGB in digital cameras.

Image resolution and human perception

Sometimes we view a poster or picture from afar and are amazed at the level of detail, or the crispness of the features, yet viewed up close this just isn’t the case. Is this a trick of the eye? It comes down to the resolving power of the eye.

Images, whether they are analog photographs, digital prints, or paintings, can contain many different things: geometric patterns, shapes, colours – everything needed to perceive the contents of the image (or, in the case of some abstract art, not perceive it). As mentioned before, the sharpest resolution in the human eye occurs in the fovea, which represents about 1% of the eye’s visual field – not exactly a lot. The rest of the visual field, out to the peripheral vision, has progressively less ability to discern sharpness. Of course the human visual system does form a complete picture, because the brain is able to use visual memory to build a mental model of the world as you move around.

Fig.1: A photograph of a photograph stitched together (photographed at The Rooms, St. John’s, NFLD).

Image resolution plays a role in our perception of images. The human eye is only able to resolve a certain amount of detail based on viewing distance. There is actually an equation used to calculate this: resolution (PPI) = 2/(0.000291 × distance (inches)), where the factor of 2 accounts for the two pixels needed to represent one black/white line pair. A normal human eye (i.e. 20-20 vision) can distinguish patterns of alternating black and white lines with a feature size as small as one minute of arc, i.e. 1/60 degree or π/(60×180) = 0.000291 radians.

So if a poster were viewed from a distance of 6 feet, the resolution capable of being resolved by the eye is about 95 PPI. That’s why the poster in Fig.1, comprised of various separate photographs stitched together (digitally) to form a large image, appears crisp from that distance. It could be printed at 100 DPI and still look good from that distance. Up close though it is a different story, as many of the edge features are quite soft, and lack the sharpness expected from the “distant” viewing. The reality is that the poster could be printed at 300 DPI, but viewed from the same distance of 6 feet it is unlikely the human eye could discern any more detail. It would only be useful if the viewer came closer; however, coming closer means you may no longer be able to view the entire scene. Billboards offer another good example. Billboards are viewed from anywhere from 500-2500 feet away. At 573ft, the human eye can discern about 1.0 PPI; at 2500ft it would be 0.23 PPI (it would take roughly 19 in² to represent 1 pixel). So the images used for billboards don’t need a very high resolution.

Fig.2: Blurry details up close
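As a quick check of the numbers quoted above, a small Python sketch using the same formula:

def resolvable_ppi(distance_inches):
    # PPI = 2 / (0.000291 x distance), i.e. two pixels per one-arcminute line pair
    return 2 / (0.000291 * distance_inches)

print(resolvable_ppi(6 * 12))       # poster at 6 feet    -> ~95 PPI
print(resolvable_ppi(573 * 12))     # billboard at 573ft  -> ~1.0 PPI
print(resolvable_ppi(2500 * 12))    # billboard at 2500ft -> ~0.23 PPI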

Human perception is thus linked to the resolving power of the eye – the ability of the eye to distinguish between very small objects that are very close together. To illustrate this further, consider the images shown in Fig.3. They have been extracted from a digital scan of a vintage brochure at various enlargement scales. When viewing the brochure itself it is impossible to see the dots associated with the printing process, because they are too small to discern (and that’s the point). The original, viewed on the screen, is shown in Fig.3D. Even in Fig.3C it is challenging to see the dot pattern that makes up the print. In both Fig.3A and 3B, the dot pattern can be identified. It is no different with any picture: looking at the picture close up, the perception is one of a blocky dot matrix, not the continuous image seen from afar.

Fig.3: Resolving detail

Note that this is an exaggerated example, as the human eye does not have the resolving power to see the dots of the printing process without assistance. If the image were blown up to poster size, however, a viewer would be able to discern the printing pattern. Many vintage photographs, such as the vacation pictures sold in 10-12 photo sets, work on the same principle. When provided as a 9cm×6cm black-and-white photograph, they seem to show good detail when viewed from 16-24 inches away. However, when viewed through a magnifying glass, or enlarged post-digitization, they lack the sharpness seen from afar.

Note that 20-20 vision is based on the 20ft distance from the patient to the acuity chart when taking an eye exam. Outside of North America, the distance is normally 6 metres, and so 20-20 = 6-6.

How do we perceive photographs?

Pictures are flat objects that contain pigment (either colour or monochrome), and are very different from the objects and scenes they represent. Of course pictures must be something like the objects they depict, otherwise they could not adequately represent them. Let’s consider depth in a picture. In a picture, it is often easy to find cues relating to the depth of a scene. The depth-of-field often manifests itself as a region of increasing out-of-focus away from the object which is in focus. Other possibilities are parallel lines that converge in the distance, e.g. railway tracks, or objects that are blocked by closer objects. Real scenes do not always offer such depth cues, as we perceive “everything” in focus, and railway tracks do not converge to a point! In this sense, pictures are very dissimilar to the real world.

If you move while taking a picture, the scene will change. Objects that are near move more in the field-of-view than those that are far away. As the photographer moves, so too does the scene as a whole. Take a picture from a moving vehicle, and the near scene will be blurred, the far not as much, regardless of the speed (motion parallax). This, then, is an example of a picture for which there is no equivalent real-world scene.

A photograph is all about how it is interpreted

Photography, then, is not about capturing “reality”, but rather capturing our perception, our interpretation, of the world around us. It is still a visual representation of a “moment in time”, but not one that necessarily represents the world around us accurately. All perceptions of the world are unique, as humans are individual beings, with their own quirks and interpretations of the world. There are also things that we can’t perceive: humans experience sight through the visible spectrum, but UV light exists, and some animals, such as reindeer, are believed to be able to see in UV.

So what do we perceive in a photograph?

Every photograph, no matter how painstaking the observation of the photographer or how long the actual exposure, is essentially a snapshot; it is an attempt to penetrate and capture the unique esthetic moment that singles itself out of the thousands of chance compositions, uncrystallized and insignificant, that occur in the course of a day.

Lewis Mumford, Technics and Civilization (1934)