Can humans discern 16 million colours in an image?

A standard colour image is 24-bit (8 bits per channel), containing 256³ = 16,777,216 colours. That seems like a lot, right? But can that many colours even be distinguished by the human visual system? The quick answer is no – or rather, we don’t know for certain. Research into the number of actually discernible colours is a bit of a rabbit hole.

A 1998 paper [1] suggests that the number of discernible colours may be around 2.28 million – the authors determined this by calculating the number of colours within the boundary of the MacAdam Limits in CIELAB uniform colour space [2] (for those who are interested). However, even the authors suggested this 2.28M figure may be somewhat of an overestimate. A larger figure of 10 million colours (from 1975) is often cited [3], but there is no information on the origin of this figure. A similar figure of 2.5 million colours was cited in a 2012 article [4], while a more recent article [5] gives a conservative estimate of 40 million distinguishable object colour stimuli. Is it even possible to realistically prove such large numbers? Probably not – it may be impossible to quantify, ever. Estimates based on existing colour spaces may be as good as it gets, and frankly, even 1-2 million colours is a lot.

Of course, the actual number of colours someone sees also depends on the number and distribution of cones in the eye. For example, dichromats have only two types of cones able to perceive colour, and the colour deficiency manifests differently depending on which cone is missing. The majority of the population are trichromats, i.e. they have three types of cones. Lastly there are the very rare individuals, the tetrachromats, who have four different types of cones. Supposedly tetrachromats can see 100 million colours, but the condition is thought to exist only in women, and in reality nobody really knows how many people are potentially tetrachromatic [6] (the only definitive way of finding out if you have tetrachromacy is via a genetic test).

The reality is that few if any real pictures contain 16 million colours. Here are some examples (all images contain 9 million pixels); note the images are shown together with the hue distribution from the HSB colour space. The first example is a picture of a wall of graffiti art in Toronto. This is an atypical image because it contains a lot of varied colours – most images do not. Yet it has only 740,314 distinct colours, just 4.4% of the potential colours available.

The next example is a more natural picture, of two buildings (Nova Scotia). This picture is quite representative of images such as landscapes, which are skewed towards a fairly narrow band of colours. It contains only 217,751 distinct colours, or 1.3% of the 16.77 million.

Finally we have a food-type image that doesn’t seem to have a lot of differing colours, but in reality it does – there are 635,026 (3.8%) distinct colours in the image. What these examples show is that most images contain fewer than one million different colours. So while there is the potential for an image to contain 16,777,216 colours, in all likelihood it won’t.
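As a quick sketch of how such counts can be made (the function name is hypothetical; in practice the pixels would come from an image loaded with a library such as Pillow), each pixel’s three 8-bit channels can be packed into a single 24-bit integer and the unique values counted:

```python
def count_distinct_colours(pixels):
    """Count distinct (R, G, B) tuples in a flat list of pixels."""
    # Pack each pixel's 8-bit channels into one 24-bit integer,
    # then count the unique values via a set.
    return len({(r << 16) | (g << 8) | b for (r, g, b) in pixels})

# A tiny 4-pixel demo image: two red pixels, one green, one blue.
demo = [(255, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
print(count_distinct_colours(demo))  # 3
```

Run over a full 9-megapixel image, this is exactly the kind of tally quoted above.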

What about 10-bit colour? We’re talking about 1024³ = 1,073,741,824 colours – which is really kind of ridiculous.

Further reading:

  1. Pointer, M.R., Attridge, G.G., “The number of discernible colours”, Color Research and Application, 23(1), pp.52-54 (1998)
  2. MacAdam, D.L., “Maximum visual efficiency of colored materials”, Journal of the Optical Society of America, 25, pp.361-367 (1935)
  3. Judd, D.B., Wyszecki, G., Color in Business, Science and Industry, Wiley, p.388 (1975)
  4. Flinkman, M., Laamanen, H., Vahimaa, P., Hauta-Kasari, M., “Number of colors generated by smooth nonfluorescent reflectance spectra”, J Opt Soc Am A Opt Image Sci Vis., 29(12), pp.2566-2575 (2012)
  5. Kuehni, R.G., “How Many Object Colors Can We Distinguish?”, Color Research and Application, 41(5), pp.439-444 (2016)
  6. Jordan, G., Mollon, J., “Tetrachromacy: the mysterious case of extra-ordinary color vision”, Current Opinion in Behavioral Sciences, 30, pp.130-134 (2019)
  7. Jennings, C., “All the Colors We Cannot See” (June 24, 2019)
  8. “How Many Colors Can Most Of Us Actually See”, USA Art News (July 23, 2020)

Viewing distances, DPI and image size for printing

When it comes to megapixels, the bottom line might be how an image ends up being used. If viewed on a digital device, be it an ultra-resolution monitor or TV, there are limits to what you can see. To view an image on an 8K TV at full resolution, we would need a 33MP image; any smaller device will happily work with a 24MP image, and still not display all the pixels. Printing is, however, another matter altogether.

The standard for quality in printing is 300dpi, or 300 dots-per-inch. If we equate a pixel to a dot, then we can work out the maximum size at which an image can be printed. 300dpi is generally the “standard” because it is the resolution most commonly used. To put this into perspective, at 300dpi, or 300 dots per 25.4mm, each pixel printed on a medium would be 0.085mm wide – about as thick as 105 GSM paper – giving a dot area of roughly 0.007mm². For example, a 24MP image containing 6000×4000 pixels can be printed to a maximum size of 13.3×20 inches (33.8×50.8cm) at 300dpi. The print sizes for a number of different image sizes printed at 300dpi are shown in Figure 1.
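The arithmetic is simply pixels divided by DPI; a minimal sketch (the helper name is made up for illustration):

```python
def max_print_size(width_px, height_px, dpi=300):
    """Maximum print dimensions in inches at a given DPI."""
    return width_px / dpi, height_px / dpi

# A 24MP image (6000 x 4000 pixels) at the standard 300dpi.
w, h = max_print_size(6000, 4000)
print(f"{w:.1f} x {h:.1f} inches")  # 20.0 x 13.3 inches
```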

Fig.1: Maximum printing sizes for various image sizes at 300dpi

The thing is, you may not even need 300dpi. At 300dpi the minimum viewing distance is theoretically 11.46″, whereas dropping down to 180dpi increases the viewing distance to 19.1″ (but allows the printed size of the image to increase). In the previous post we discussed the math behind visual acuity. Knowing that a print will be viewed from a minimum of 30″ away allows us to determine that the optimal DPI required is only 115. Now if we have a large panoramic print, say 80″ wide, printed at 300dpi, the calculated minimum viewing distance is ca. 12″ – but it is impossible to view the entire print from only one foot away. So how do we calculate the optimal viewing distance, and then use this to calculate the actual DPI required?

The amount of megapixels required of a print can be guided in part by the viewing distance, i.e. the distance from the centre of the print to the eyes of the viewer. The golden standard for calculating the optimal viewing distance involves the following process:

  • Calculate the diagonal of the print size required.
  • Multiply the diagonal by 1.5 to calculate the minimum viewing distance.
  • Multiply the diagonal by 2.0 to calculate the maximum viewing distance.

For example, a print which is 20×30″ will have a diagonal of 36″, so the optimal viewing distance ranges from a minimum of 54 to a maximum of 72 inches (137-182cm). This means that we are no longer reliant on 300dpi for printing. Now we can use the equations set out in the previous post to calculate the minimum DPI for a viewing distance. For the example above, the minimum DPI required is only 3438/54 = 64dpi. This implies that the image size required to create the print is only (20×64)×(30×64) pixels ≈ 2.5MP. Figure 2 shows a series of sample print sizes, viewing distances, and minimum DPI (calculated using dpi = 3438/min_dist).
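The whole process can be sketched as a small helper (the function name is hypothetical; the constant 3438 comes from the visual acuity formula discussed in the previous post):

```python
import math

def print_viewing(width_in, height_in):
    """Viewing-distance range, minimum DPI and megapixels for a print."""
    diagonal = math.hypot(width_in, height_in)
    min_dist = 1.5 * diagonal    # minimum viewing distance (inches)
    max_dist = 2.0 * diagonal    # maximum viewing distance (inches)
    min_dpi = 3438 / min_dist    # acuity limit at the minimum distance
    mp = width_in * min_dpi * height_in * min_dpi / 1e6
    return min_dist, max_dist, min_dpi, mp

lo, hi, dpi, mp = print_viewing(20, 30)
print(f"{lo:.0f}-{hi:.0f} in, {dpi:.0f} dpi, {mp:.1f} MP")  # 54-72 in, 64 dpi, 2.4 MP
```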

Fig.2: Viewing distances and minimum DPI for various common print sizes

Now printing at such a low resolution likely has more limitations than benefits – for example, there is no guarantee that people will view the print from a set distance. So there is likely a lower bound on the practical DPI, probably around 180-200dpi, because nobody wants to see pixels. For the 20×30″ print, boosting the DPI to 200 would require only a modest 24MP image, whereas a full 300dpi print would require a staggering 54MP image! Figure 3 simulates a 1×1″ square at various DPI settings as they might be seen on a print. Note that even at 120dpi the pixels are visible – the lower the DPI, the greater the chance of “blocky” features when viewed up close.
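The megapixel figures can be verified with a one-line calculation (the helper name is made up for illustration):

```python
def required_megapixels(width_in, height_in, dpi):
    """Image megapixels needed to print at a given size and DPI."""
    return width_in * dpi * height_in * dpi / 1e6

# The 20x30" print at 200dpi versus a full 300dpi.
print(required_megapixels(20, 30, 200))  # 24.0
print(required_megapixels(20, 30, 300))  # 54.0
```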

Fig.3: Various DPI as printed in a 1×1″ square

Are these viewing distances realistic? As an example, consider viewing a 36×12″ panorama. The diagonal for this print would be 37.9″, so the minimum viewing distance would be calculated as 57 inches. This example is illustrated in Figure 4. If we work out the viewing angle this creates, it is about 37°, which is pretty close to 40°. Why is this important? Well, THX recommends that the “best seat-to-screen distance” (for a digital theatre) is one where the viewing angle approximates 40 degrees, and it’s probably not much different for pictures hanging on a wall. The minimum resolution for the panoramic print viewed at this distance would be about 60dpi, but it can be printed at 240dpi from an input image of about 25MP.

Fig.4: An example of viewing a 36×12″ panorama
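The panorama numbers can be checked with a few lines of Python (a sketch; taking the diagonal as the viewed extent, the angle comes out at roughly 37°):

```python
import math

diag = math.hypot(36, 12)   # diagonal of a 36x12" panorama, ~37.9"
dist = 1.5 * diag           # minimum viewing distance, ~57"
# Viewing angle subtended by the print at that distance.
angle = 2 * math.degrees(math.atan((diag / 2) / dist))
print(f"{angle:.1f} degrees")  # 36.9 degrees
```

Note that because the distance is defined as 1.5× the diagonal, this angle is the same for any print size: 2·atan(1/3) ≈ 37°.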

So choosing a printing resolution (DPI) is really a balance between: (i) the number of megapixels an image has, (ii) the size of the print required, and (iii) the distance the print will be viewed from. For example, a 24MP image printed at 300dpi allows a maximum print size of 13.3×20 inches, with an optimal viewing distance of 3 feet; reducing the DPI to 200 gives an increased print size of 20×30 inches, with an optimal viewing distance of 4.5 feet. It is an interplay of many differing factors, including where the print is to be viewed.

P.S. For small prints, such as 5×7 and 4×6, 300dpi is still the best.

P.P.S. For those who can’t remember how to calculate the diagonal, it uses the Pythagorean Theorem. So for a 20×30″ print:

diagonal = √(20²+30²)
         = √1300
         = 36.06

The math behind visual acuity

The number of megapixels required to print an image, or view it on a television, is ultimately determined by the human eye’s visual acuity and the distance from which the object is viewed. For someone with average (20/20) vision, acuity is defined as one arcminute, or 1/60th of a degree. For comparison, a full moon in the sky appears about 31 arcminutes (roughly half a degree) across (Figure 1).

Fig.1: Looking at the moon

Now generally, descriptions skip from talking about arcminutes to describing how the distance between an observer and an object can be calculated given the resolution of the object. For example, the distance (d, in inches) at which the eye reaches its resolution limit is often calculated using:

d = 3438 / h

where h is the resolution – ppi for screens, dpi for prints. So if h=300, then d=11.46 inches. Calculating the optimal viewing distance involves a magic number: 3438. Where does this number come from? Few descriptions give any insight, but we can start with some basic trigonometry. Consider the diagram in Figure 2, where h is now the pixel pitch (the physical size of one pixel – the inverse of the resolution), d is the viewing distance, and θ is the angle of viewing.

Fig.2: Viewing an object

Now we can use the basic equation for calculating an angle, Theta (θ), given the length of the opposite and adjacent sides:

tan(θ) = opposite/adjacent

To apply this formula to the diagram in Figure 2, we split the triangle in half along the viewing axis and use θ/2 and h/2.

tan(θ/2) = (h/2)/d

So now, we can solve for h.

d tan(θ/2) = h/2
2d⋅tan(θ/2) = h

Now if we use visual acuity as 1 arcminute, this is equivalent to 0.000290888 radians. Therefore:

h = 2d⋅tan(0.000290888/2) 
  = 2d⋅0.000145444

So for d=24″, h=0.00698 inches, or converted to mm (multiplying by 25.4), h=0.177mm. To convert this to PPI/DPI, we simply take the inverse: 1/0.00698 = 143 ppi/dpi. How do we turn this equation into one containing the value 3438? Given that the resolution is the inverse of the pixel pitch, we can modify the previous equation:

1/h = 1/(2d⋅0.000145444)
    = 1/d ⋅ 1/2 ⋅ 1/0.000145444
    = 1/d ⋅ 1/2 ⋅ 6875.49847
    = 1/d ⋅ 3437.749
    = 3438/d

So for a poster viewed at d=36″, the minimum resolution is 3438/36 ≈ 95dpi. The viewing distance can be calculated by rearranging the equation above to:

d = 3438 / h

As an example, consider the Apple Watch Series 8, whose screen has a resolution of 326ppi. Performing the calculation gives d = 3438/326 = 10.55″, so the watch should be held at least 10.55″ from one’s face. For a poster printed at 300dpi, d=11.46″, and for one printed at 180dpi, d=19.1″. This is independent of the size of the poster – only the printing resolution matters – and represents the minimum resolution needed at that distance; only if you move closer do you need a higher resolution. This is why billboards can be printed at a very low resolution, even 1dpi: viewed from a distance, it doesn’t really matter how low the resolution is.
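These minimum viewing distances can be computed directly from the acuity angle, without memorizing the 3438 constant (a sketch; the function name is hypothetical):

```python
import math

ARCMIN = math.radians(1 / 60)  # one arcminute ≈ 0.000290888 rad

def min_viewing_distance(dpi):
    """Distance (inches) at which the eye stops resolving individual dots."""
    pitch_per_inch = 2 * math.tan(ARCMIN / 2)  # resolvable pitch per inch of distance
    return 1 / (pitch_per_inch * dpi)          # equivalent to 3438 / dpi

for ppi in (326, 300, 180):
    print(f"{ppi} ppi -> {min_viewing_distance(ppi):.2f} in")
# 326 ppi -> 10.55 in
# 300 ppi -> 11.46 in
# 180 ppi -> 19.10 in
```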

Note that there are many variables at play when it comes to acuity; these calculations provide the simplest-case scenario. For eyes outside the normal range, visual acuity differs, which changes the value of θ used in the calculations: roughly 0.75 arcminutes for 20/15 vision, 1.5 for 20/30, 2.0 for 20/40, and so on. There are also factors such as lighting, and how eye prescriptions modify acuity, to take into account. Finally, these acuity calculations only consider what is directly in front of our eyes, i.e. the narrow, sharp vision provided by the foveola – all other parts of a scene have progressively less acuity moving out from this central point.

Fig.3: At 1-2° the foveola provides the greatest amount of acuity.

p.s. The same system can be used to calculate ideal monitor and TV sizes. For a 24″ viewing distance, the pixel pitch is h= 0.177mm. For a 4K (3840×2160) monitor, this would mean 3840*0.177=680mm, and 2160*0.177=382mm which after calculating the diagonal results in a 30.7″ monitor.
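A sketch of that monitor calculation (using the unrounded pixel pitch, which lands within about a tenth of an inch of the figure above):

```python
import math

# Resolvable pixel pitch at a 24" viewing distance.
pitch_in = 24 * 2 * math.tan(math.radians(1 / 60) / 2)  # inches
pitch_mm = pitch_in * 25.4                              # ≈ 0.177 mm

w_mm = 3840 * pitch_mm   # 4K width at that pitch
h_mm = 2160 * pitch_mm   # 4K height at that pitch
diag_in = math.hypot(w_mm, h_mm) / 25.4
print(f"{diag_in:.1f} inch monitor")
```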

p.p.s. If d is expressed in centimetres (with h still in dpi), the formula becomes: d = 8733 / h

The camera versus the eye

There are some similarities between the camera and the human eye. Both have a lens, a shutter, and light-sensitive material. In the eye, the image is formed by a combination of the cornea, the aqueous humor, and the lens; the eyelid is the shutter, and the retina is the light-sensitive material. The other similarity is that both camera and eye control image brightness by means of an iris diaphragm. In the eye the amount of light admitted is involuntarily controlled by the opening and closing of the iris, while a camera controls the light transmitted through the lens by means of the aperture diaphragm.

But comparing the eye and the camera by stressing only the similarities in their construction has confused the understanding of photography, because it disregards the differences in their function. These differences make the eye superior to the camera in some instances, and the camera superior to the eye in others.

  • Human vision is binocular and stereoscopic; that of the camera is monocular. This is why photographs lack the sense of depth perceived through human eyes – a camera sees a scene without depth, and a photograph appears flat.
  • The eye’s view of the world is subjective, viewing what the mind is interested in, has a wish to see, or is forced to see. The camera sees objectively, recording everything in its field of view. This is the reason so many pictures are just pictures, full of superfluous subject matter.
  • The eye is sensitive to colour. Cameras and different lenses can see colour differently, and black-and-white photography sees colour as shades of gray (the transformation of colour to gray is also varied).
  • The eye does not normally perceive minor changes in the colour of light, whereas both film and sensors are sensitive to such small changes. This difference manifests itself in photographs with what the eye considers “unnatural” colours.
  • The eye cannot “store” and combine bracketed images, or stay open for an amount of time and “add up light”. The dimmer the light, the less we see, no matter how long we look at a scene. Both film and sensors can do this – and this ability to accumulate light impressions makes images in low light possible – at levels where nothing can be seen by the human eye.
  • The eye is sensitive only to that part of the electromagnetic spectrum which is known as light. Photographic films and sensors can be sensitive to other types of radiation, e.g. infrared, ultraviolet, and x-rays.
  • The focal length of the eye is fixed, and as such is limited. A camera can be equipped with lenses of almost any focal length.
  • The angle of view of the eyes is fixed, but lenses range in angle from a few degrees to 220°. The monocular AOV of an eye is about 160° wide by 135° high, whereas the binocular AOV is 200°×135°, with an overlap of 120°.
  • Human vision functions to see 3D things in a rectilinear perspective. Most lenses produce a perspective that is cylindrical or spherical.
  • The focusing ability of the eye is severely limited with respect to close distances. Anything closer than about 25cm can usually only be seen indistinctly, with objects perceived less and less clearly the smaller they are, to the point where they become invisible to the naked eye. The camera, with the right accessories, has none of these restrictions.
  • To the human eye, everything appears sharp at the same time (actually an illusion caused by the ability of the eye to autofocus). A camera can produce images with any degree of unsharpness, or images in which a predetermined zone is rendered sharp, while everything else is out-of-focus.
  • The eye can adjust almost instantaneously to changes in illumination, by contracting and enlarging the iris as it views light and dark scenes respectively. The camera’s “iris”, its diaphragm, can only be adjusted for overall brightness. Therefore the contrast range of the human eye is much wider than that of a camera. On a sensor/film too much contrast would show up as an over-exposed (featureless, white) region, whereas too little contrast would show up as underexposed (dark) regions.
  • The eye cannot express movement by instantaneously “freezing” an image of a moving subject, and cannot retain an image. A camera can do both.
  • The eye “corrects” for receding parallel lines in the vertical plane, e.g. tall buildings, yet considers those in the horizontal plane to be normal. The camera makes no such distinction.
  • The eye sees everything it focuses on in the context of its surroundings, relating the part to the whole. A photograph nearly always shows the subject out of context, cut off from the surrounding visuals – a small limited view.

The human visual system : focus and acuity

There is a third difference between cameras and the human visual system (HVS). While some camera lenses may share a similar perspective of the world with the HVS with respect to angle-of-view, they differ in what is actually in the area of focus. Using any lens on a camera means that a picture will have an area where the scene is in focus, with the remainder out-of-focus. This in-focus region generally occurs in a plane, and is associated with the depth-of-field. The in-focus region of the picture our mind presents us, on the other hand, does not have a plane of focus.

While binocular vision covers approximately 120° (horizontally), it is only highly focused in the very centre, with the rest of the picture increasingly out-of-focus the further a point is from the central focused region. This may be challenging to visualize, but if you look at an object, only the central point is in focus; the remainder of the picture is out-of-focus. That does not mean it is necessarily blurred – the brain is still able to discern shape and colour, just not fine detail. Blurring is usually a function of distance from the object being focused on, i.e. the point-of-focus: if you look at a close object, distant objects will be out-of-focus, and vice versa.

Fig.1: Parts of the macula

Focused vision is related to the different parts of the macula, an oval-shaped pigmented area in the centre of the retina which is responsible for interpreting vision, colour, fine details, and symbols (see Figure 1). It is composed almost entirely of cones, and is divided into a series of zones:

  • perifovea (5.5mm∅, 18°) : Details that appear in up to 9-10° of visual angle.
  • parafovea (3mm∅, 8°) : Details that appear in peripheral vision, not as sharp as the fovea.
  • fovea (1.5mm∅, 5°) : Or Fovea centralis, comprised entirely of cones, and responsible for high-acuity, and colour vision.
  • foveola (0.35mm∅, 1°) : A central pit within the fovea, which contains densely packed cones. Within the foveola is a small depression known as the umbo (0.15mm∅), which is the microscopic centre of the foveola.

Fig.2: Angle-of-view of the whole macula region versus the foveola. The foveola provides the greatest region of acuity, i.e. fine details.

When we fixate on an object, we bring its image onto the fovea. The foveola provides the greatest visual acuity, in the area 1-2° outwards from the point of fixation; as the distance from fixation increases, visual acuity decreases quite rapidly. To illustrate this effect, try reading the preceding text in this paragraph while fixating on the period at the end of this sentence. It is likely challenging, if not impossible, to read text outside a small circle of focus around the point of fixation. A seven-letter word like “outside” is about 1cm wide, which when read on a screen 60cm from your eye subtends an angle of about 1°. The 5° of the fovea region allows a “preview” of the words on either side, and the 8° of the parafovea region a sense of peripheral words (i.e. their shape). This is illustrated in Figure 3.

Fig.3: Reading text from 60cm
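That 1° figure can be checked with basic trigonometry (a quick sketch):

```python
import math

word_cm, dist_cm = 1.0, 60.0  # a ~1 cm word read from 60 cm away
# Angle subtended by the word at the eye.
angle = 2 * math.degrees(math.atan((word_cm / 2) / dist_cm))
print(f"{angle:.2f} degrees")  # 0.95 degrees
```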

To illustrate how this differential focus affects how humans view a scene, consider the image shown in Figure 4. The point of focus is a building in the background, roughly 85m from where the person is standing. The image has been modified by adding radial blur from the central point-of-focus to simulate the in-focus versus out-of-focus regions as seen by the eye (the blur has been exaggerated). The sharpest region is the point of fixation in the centre – anything to either side of the focused object will be unsharp, and the further away from that point, the more unsharp it becomes.

Fig.4: A simulation of focused versus out-of-focus regions in the HVS (the point of fixation is roughly 85m from the eyes)

It is hard to effectively illustrate exactly how the HVS perceives a scene, as there is no way of taking a snapshot and analyzing it. However, we do know that focus is a function of distance from the point-of-focus. Other parts of an image are essentially de-emphasized – the information is still there, and the way our minds process it provides what feels like a complete picture, but there is always a central point of focus.

Further reading:

  1. Ruch, T.C., “Chapter 21: Binocular Vision, and Central Visual Pathways”, in Neurophysiology (Ruch, T.C. et al. (eds)) p.441-464 (1965)

The human visual system : image shape and binocular vision

There are a number of fundamental differences between a “normal” 50mm lens and the human visual system (HVS). Firstly, a camera extracts a rectangular image from the circular view of the lens. The HVS view, on the other hand, is neither circular nor rectangular – if anything it has a somewhat oval shape. This can be seen in the diagram of the binocular field of vision shown in Figure 1 (from [1]). The central shaded region is the field of vision seen by both eyes, i.e. binocular (stereoscopic) vision; the white areas on either side are the monocular crescents, each seen by only one eye; and the blackened area is not seen.

Fig.1: One of the original diagrams illustrating both the shape of vision, and the extent of binocular vision [1].

Figure 1 illustrates a second difference, the fact that normal human vision is largely binocular, i.e. uses both eyes to produce an image, whereas most cameras are monocular. Figure 2 illustrates binocular vision more clearly, comparing it to the total visual field.

Fig.2: Shape and angle-of-view, total versus binocular vision (horizontal).

The total visual field of the HVS is 190-200° horizontally, composed of 120° of binocular vision and two fields of 35-40° each seen by one eye. Vertically, the visual field is about 130° (the binocular field is roughly the same), comprised of 50° above the horizontal line-of-sight and 70-80° below it. An example illustrating (horizontal) binocular vision is shown in Figure 3.

Fig.3: A binocular (horizontal – 120°) view of Bergen, Norway

It is actually quite challenging to provide an exact example of what a human sees – largely because taking an equivalent picture would require a lens such as a fish-eye, which would introduce distortions that the HVS is capable of filtering out.

Further reading:

  1. Ruch, T.C., “Chapter 21: Binocular Vision, and Central Visual Pathways”, in Neurophysiology (Ruch, T.C. et al. (eds)) p.441-464 (1965)