Can humans discern 16 million colours in an image?

March 10, 2023 / spqr

A standard colour image uses 8 bits per channel (24 bits per pixel), giving 256³ = 16,777,216 possible colours. That seems like a lot, right? But can that many colours even be distinguished by the human visual system? The quick answer is no, or rather, we don’t know for certain. Research into the number of discernible colours is a bit of a rabbit hole.

A 1998 paper [1] suggests that the number of discernible colours may be around 2.28 million – the authors determined this by calculating the number of colours within the boundary of the MacAdam Limits in CIELAB Uniform Colour Space [2] (for those who are interested). However, even the authors suggested this 2.28M may be somewhat of an overestimate. A larger figure of 10 million colours (from 1975) is often cited [3], but there is no information on the origin of this figure. A figure of 2.5 million colours, close to the 1998 estimate, was cited in a 2012 article [4]. A more recent article [5] gives a conservative estimate of 40 million distinguishable object colour stimuli. Is it even possible to realistically prove such large numbers? Somewhat unlikely, because it may be impossible to quantify – ever. Indications based on existing colour spaces may be as good as it gets, and frankly even 1-2 million colours is a lot.

Of course, the actual number of colours someone sees also depends on the number and distribution of cones in the eye. For example, dichromats have only two types of cones which are able to perceive colour. This colour deficiency manifests differently depending on which cone is missing. The majority of the population are trichromats, i.e. they have three types of cones. Lastly there are the very rare individuals, the tetrachromats, who have four types of cones. Supposedly tetrachromats can see 100 million colours, but it is thought the condition only exists in women, and in reality, nobody really knows how many are potentially tetrachromatic [6] (and the only definitive way of finding out if you have tetrachromacy is via a genetic test).

The reality is that few if any real pictures contain 16 million colours. Here are some examples (all images contain 9 million pixels). Note the images are shown in association with the hue distribution from the HSB colour-space. The first example is a picture of a wall of graffiti art in Toronto. This is an atypical image because it contains a lot of varied colours, which most images do not. This image has only 740,314 distinct colours – that’s only 4.4% of the potential colours available.

The next example is a more natural picture, a picture of two buildings (Nova Scotia). This picture is quite representative of images such as landscapes, which are skewed towards quite a narrow band of colours. It only contains 217,751 distinct colours, or 1.3% of the 16.77 million colours.

Finally we have a food-type image that doesn’t seem to have a lot of differing colours, but in reality it does. There are 635,026 (3.8%) colours in the image. What these examples show is that most images contain fewer than one million different colours. So while there is the potential for an image to contain 16,777,216 colours, in all likelihood it won’t.

What about 10-bit colour? We’re talking about 1024³ or 1,073,741,824 colours – which is really kind of ridiculous.
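How the counts above were produced isn’t stated, but the idea can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the pixels are assumed to arrive as (R, G, B) triples from whatever image library you use to load the file.

```python
def count_distinct_colours(pixels):
    """Return the number of distinct (R, G, B) triples in an iterable of pixels."""
    return len({tuple(p) for p in pixels})


def palette_size(bits_per_channel):
    """Total colours representable with three channels of the given bit depth."""
    return (2 ** bits_per_channel) ** 3


def percent_of_palette(n_colours, bits_per_channel=8):
    """Share of the available palette actually used, in percent."""
    return 100.0 * n_colours / palette_size(bits_per_channel)


# Tiny stand-in "image": four pixels, two of them identical.
sample = [(255, 0, 0), (255, 0, 0), (0, 128, 255), (12, 12, 12)]

print(count_distinct_colours(sample))          # 3
print(palette_size(8), palette_size(10))       # 16777216 1073741824
print(round(percent_of_palette(740_314), 1))   # 4.4 (the graffiti image)
```

Note that an image can never hold more distinct colours than it has pixels, so a 9 million pixel image tops out at 9 million colours whatever the bit depth.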

Why human eyes are so great

December 6, 2019 / spqr

Human eyes are made of gel-like material. It is interesting then, that together with a 3-pound brain composed predominantly of fat and water, we are capable of the feat of vision. Yes, we don’t have super-vision, and aren’t capable of zooming in on objects in the distance, but our eyes are magical. Eyes are able to focus almost instantaneously, on objects as close as 10cm and as far away as infinity. They also automatically adjust for various lighting conditions. Our vision system is quickly able to decide what an object is and perceive 3D scenes.

Computer vision algorithms have made a lot of progress in the past 40 years, but they are by no means perfect, and in reality can be easily fooled. Here is an image of a refrigerator section in a grocery store in Oslo. The context of the content within the image is easily discernible. If we load this image into “Google Reverse Image Search” (GRIS), the program says that it is a picture of a supermarket – which is correct.

Now what happens if we blur the image somewhat? Let’s say a Gaussian blur with a radius of 51 pixels. This is what the resulting image looks like:

The human eye is still able to decipher the content in this image, at least enough to determine it is a series of supermarket shelves. Judging by the shape of the blurry items, one might go so far as to say it is a refrigerated shelf. So how does the computer compare? The best GRIS could come up with was “close-up”, because it had nothing to compare against. The Wolfram Language “Image Identification Project” (IIP) does a better job, identifying the scene as “store”. Generic, but not a total loss. Let’s try a second example. This photo was taken in the train station in Bergen, Norway.

GRIS identifies similar images, and guesses the image is “Bergen”. Now this is true, however the context of the image is more related to railway rolling stock and the Bergen station, than Bergen itself. IIP identifies it as “locomotive engine”, which is right on target. If we add a Gaussian blur with radius = 11, then we get the following blurred image:

Now GRIS thinks this scene is “metro”, identifying similar images containing cars. It is two trains, so this is not a terrible guess. IIP identifies it as a subway train, which is a good result. Now let’s try the original with a Gaussian blur and a radius of 21.

Now GRIS identifies the scene as “rolling stock”, which is true; however, the images it considers similar involve cars doing burnouts or stuck in the snow (or in one case a rockhopper penguin). IIP on the other hand fails this image, identifying it as a “measuring device”.

So as the image gets blurrier, it becomes harder for computer vision systems to identify, whereas the human eye does not have these problems. Even in a worst case scenario, where the Gaussian blur filter has a radius of 51, the human eye is still able to decipher its content. But GRIS thinks it’s a “photograph” (which *is* true, I guess), and IIP says it’s a person.
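For anyone wanting to repeat the experiment, here is a minimal pure-Python sketch of a Gaussian blur on a greyscale image stored as a list of rows. It is not the tool used for the images above (that isn’t stated), and the mapping from “radius” to the Gaussian’s sigma differs between programs; the choice of sigma = radius / 2 below is purely an assumption for illustration, as are the function names.

```python
import math


def gaussian_kernel(radius, sigma=None):
    """1D Gaussian weights over [-radius, radius], normalised to sum to 1."""
    if sigma is None:
        sigma = radius / 2.0  # assumption: tools map radius to sigma differently
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, radius, sigma=None):
    """Separable blur: one horizontal pass, then one vertical pass (radius >= 1)."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


# A single bright pixel in a 9x9 black image gets spread over its neighbours.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
blurred = gaussian_blur(image, radius=2)
```

The larger the radius, the more neighbours each output pixel averages over, which is why fine detail (and eventually object shape) disappears.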

Is the eye equivalent to a 50mm lens?

August 16, 2019 / spqr

So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again a bunch of numbers seem to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake let’s look at the situation in a different way by asking what lens focal length most closely replicates the Angle Of View (AOV) of the human visual system (HVS).

One common idea floating around is that the “normal” length of a lens is 43mm because a “full-frame” film, or sensor, is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the various AOVs for each of the dimensions using the formula: 2 arctan(d/(2f)), where d is the dimension, and f is the focal length. So for the 24×36mm frame with a 50mm lens, for the diagonal we get: 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited with lenses, but probably not the right one because few people think about a diagonal AOV. A horizontal one is more common, using d=36mm. Now we get 39.6°.
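The formula is easy to check in Python. This is just a sketch of the arithmetic above; the function name is my own.

```python
import math


def angle_of_view(dimension_mm, focal_length_mm):
    """AOV in degrees: 2 * arctan(d / (2 * f))."""
    return math.degrees(2.0 * math.atan(dimension_mm / (2.0 * focal_length_mm)))


diagonal_mm = math.hypot(24, 36)                  # full-frame diagonal, about 43.3mm
print(round(diagonal_mm, 1))                      # 43.3
print(round(angle_of_view(diagonal_mm, 50), 1))   # 46.8 (diagonal AOV)
print(round(angle_of_view(36, 50), 1))            # 39.6 (horizontal AOV)
```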

So now let’s consider the AOV of the HVS. The normal AOV of the HVS, assuming binocular vision, is roughly 120° (H) by 135° (V), but the reality is that our AOV with respect to targeted vision is probably only 60° horizontally and 10-15° vertically from a point of focus. Of the horizontal vision, likely only 30° is focused. Let’s be conservative and assume 60°.

So a 50mm lens, at 39.6° horizontally, is not close. What about a 35mm lens? This would end up with a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us roughly the 30° of focused vision. What if we wanted a lens AOV equivalent to the binocular 120° horizontal view? We would need a lens of about 10.5mm, which is starting to get a little fish-eyed.
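Going the other way, the AOV formula can be rearranged to f = d / (2 tan(AOV/2)) to find the focal length for a target horizontal angle. Again this is only a sketch, assuming a 36mm-wide full-frame sensor and a function name of my own choosing.

```python
import math


def focal_length_for_aov(aov_deg, dimension_mm=36.0):
    """Focal length (mm) giving the requested AOV across the given sensor dimension."""
    return dimension_mm / (2.0 * math.tan(math.radians(aov_deg) / 2.0))


for aov in (60, 30, 120):
    print(aov, round(focal_length_for_aov(aov), 1))
```

The exact values land within a millimetre of the focal lengths quoted above, which is as close as the rough AOV figures for the HVS justify.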

There is, in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.

Resolution of the human eye (iii) – things that affect visual acuity

July 8, 2019 / July 5, 2019 / spqr

So now we have looked at the number of overall pixels, and the acuity of pixels throughout that region. If you have read the last two posts, you, like me, might surmise that there is no possibility of associating a value with the resolution of the eye. And you would probably be right, because on top of everything else there are a number of factors which affect visual acuity.

Refractive errors – Cause defocus at the retina, blurring out fine detail and sharp edges. A good example is myopia (short-sightedness).
Size of pupil – Pupils act like camera apertures, allowing light into the eye. Large pupils allow more light in, but may reduce resolution through aberrations in the eye.
Illumination of the background – Less light means a lower visual acuity. As cones are the acuity masters, low light reduces their capabilities.
Area of retina stimulated – Visual acuity is greatest in the fovea. At 2.5 degrees from the point the eyes are fixated upon, there is approximately a 50% loss in visual acuity.
Eye movement – The eyes move, like all the time (e.g. your head doesn’t move when reading a book).

Complicated, right? So what is the answer? We have looked at how non-uniform acuity may affect the resolution of the human eye. The last piece of the puzzle (maybe?) in trying to approximate the resolution of the human eye is the shape of our visual scope. When we view something, what is the “shape of the picture” being created? On a digital camera it is a rectangle. Not so with the human visual system. Because of the non-uniformity of acuity, the shape of the region being “captured” really depends on the application. If you are viewing a landscape vista, you are looking at an overall scene, whereas when reading a book, the “capture area” is quite narrow (although the overall shape of information being input is the same, peripheral areas are seemingly ignored, because the fovea is concentrating on processing the words being read). To provide a sense of the visual field of binocular vision, here is an image from a 1964 NASA report, Bioastronautics Data Book:

This diagram shows the normal field of view of a pair of human eyes. The central white portion represents the region seen by both eyes. The dashed portions, right and left, represent the regions seen by the right and left eyes, respectively. The cut-off by the brows, cheeks, and nose is shown by the black area. Head and eyes are motionless in this case. Not quite, but almost an ellipse. But you can see how this complicates things even further when trying to approximate resolution. Instead of a rectangular field-of-view of 135°×190°, assume the shape of an ellipse, which gives (95×67.5)×π ≈ 20,145 square degrees. At 3,600 one-arc-minute pixels per square degree, that converts to 72.5 megapixels, which is lower than the roughly 92 megapixels of the 135°×190° bounding rectangle.
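The conversion from field of view to “megapixels” can be sketched as follows, assuming square pixels one arc minute on a side (so 60×60 = 3,600 per square degree). The function name is mine, and the numbers are only as good as that assumption.

```python
import math


def megapixels(area_deg2, pixel_arcmin=1.0):
    """Pixels needed to tile an area (in square degrees), in millions."""
    pixels_per_deg2 = (60.0 / pixel_arcmin) ** 2
    return area_deg2 * pixels_per_deg2 / 1e6


ellipse_deg2 = math.pi * 95 * 67.5     # semi-axes of a 190° x 135° ellipse
rectangle_deg2 = 190 * 135

print(round(megapixels(ellipse_deg2), 1))     # 72.5
print(round(megapixels(rectangle_deg2), 1))   # 92.3
```

An ellipse always covers π/4 (about 78.5%) of its bounding rectangle, so the 135°×190° rectangle itself comes to roughly 92.3 megapixels under the same assumption.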

So what’s the answer? What *is* the resolution of the human eye? If you wanted a number to represent the eye’s pixelation, I would err on the conservative side, and give the resolution of the eye a relatively low number, and by this I mean using the 1 arc minute acuity value, and estimating the “resolution” of the human visual system at somewhere around 100 megapixels. This likely factors in some sort of compromise for the region of the fovea with high acuity, and the remainder of the field of view with low resolution. It may also take into account the fact that the human vision system operates more like streaming video than it does a photograph. Can the eye be compared to a camera? No, it’s far too complicated trying to decipher a quantitative value for an organic structure composed of 80% gelatinous tissue.

Maybe some mysteries of the world should remain just that.

Resolution of the human eye (ii) – visual acuity

June 27, 2019 / spqr

In the previous post, from a pure pixel viewpoint, we got a mixed bag of numbers to represent the human eye in terms of megapixels. One of the caveats was that not all pixels are created equal. A sensor in a camera has a certain resolution, and each pixel has the same visual acuity – whether a pixel becomes sharp or blurry is dependent on characteristics such as the lens, and depth-of-field. The human eye does not have a uniform acuity.

But resolution is about more than just how many pixels – it is about determining fine details. As noted in the last post, the information from the rods is coupled together, whereas the information from each cone has a direct link to the ganglion cells. Cones are therefore extremely important in vision, because without them we would view everything as we do in our peripheral vision, oh and without colour (people who can’t see colour have a condition called achromatopsia).

Cones are, however, not uniformly distributed throughout the retina – they are packed more tightly in the centre of the eye’s visual field, in a place known as the fovea. So how does this affect the resolution of the eye? The fovea (which means pit), or fovea centralis, is located in the centre of a region known as the macula lutea, a small oval region located exactly in the centre of the posterior portion of the retina. The macula lutea is 4.5-5.5mm in diameter, and the fovea lies directly in the centre. The arrangement of these components of the retina is shown below.

Components of the retina. Photograph: Danny Hope from Brighton & Hove, UK; Diagram: User:Zyxwv99 [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)]

The fovea has a diameter of 1.5mm (although it varies slightly based on the study), and a field of view of approximately 5°. Therefore the fovea has an area of approximately 1.77mm². The fovea has roughly 158,000 cones per mm² (see note). The density in the remainder of the retina is 9,000 cones per mm². So, the resolution of the human eye is much greater in the centre than on the periphery. This high density of cones is achieved by decreasing the diameter of the cone outer segments such that foveal cones resemble rods in their appearance. The increased density of cones in the fovea is accompanied by a decrease in the density of rods. Within the fovea is a region called the foveola which has a diameter of about 0.3mm, and a field of view of 1° – this region contains only cones. The figure below (from 1935) shows the density of rods and cones in the retina.

Density of rods and cones in the retina. Adapted from Osterberg, G., “Topography of the layer of rods and cones in the retina”, Acta Ophthalmologica, 6, pp.1-103 (1935).
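The fovea’s area quoted above follows directly from its diameter, and multiplying by the cone density gives a rough cone count. The sketch below treats the fovea as a flat disc with uniform density, which is a simplification of mine rather than anything the cited studies claim, so take the count as an order-of-magnitude figure only.

```python
import math


def disc_area_mm2(diameter_mm):
    """Area of a flat circular region with the given diameter."""
    return math.pi * (diameter_mm / 2.0) ** 2


FOVEA_DIAMETER_MM = 1.5
FOVEA_CONES_PER_MM2 = 158_000   # density quoted above

fovea_area = disc_area_mm2(FOVEA_DIAMETER_MM)
fovea_cones = fovea_area * FOVEA_CONES_PER_MM2   # assumes uniform density

print(round(fovea_area, 2))   # 1.77
print(round(fovea_cones))
```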

The fovea has the highest visual acuity in the eye. Why? One reason may be the concentration of colour-sensitive cones. Most photoreceptors in the retina are located behind retinal blood vessels and cells which absorb light before it reaches the photoreceptor cells. The fovea lacks the supporting cells and blood vessels, and only contains photoreceptors. This means that visual acuity is sharpest there, and drops significantly moving away from this central region.

For example, pick a paragraph of text, and stare at a word in the middle of it. The visual stimulus in the middle of the field of view falls in the fovea and is in the sharpest focus. Without moving your eye, notice that the words on the periphery of the paragraph are not in complete focus. The images in the peripheral vision have a “blurred” appearance, and the words cannot be clearly identified (although we can’t see this properly, obviously). The eyes receive data from a field of view of 190-200°, but acuity over most of that range is quite poor. If you view the word from approximately 50cm away, then the sharp field of view extends about ±2.2cm from the word – beyond that things get fuzzier. Note that each eye obviously has its own fovea, and when you focus on a point both foveae overlap, but the resolution doesn’t increase.
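The ±2.2cm figure is simple trigonometry: half of the fovea’s roughly 5° field of view, projected out to the viewing distance. A quick sketch (the function name is mine):

```python
import math


def half_width_cm(viewing_distance_cm, field_of_view_deg):
    """Distance from the fixation point to the edge of the field of view."""
    return viewing_distance_cm * math.tan(math.radians(field_of_view_deg / 2.0))


print(round(half_width_cm(50, 5), 1))   # 2.2
```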

The restriction of highest acuity vision to the fovea is the main reason we spend so much time moving our eyes (and heads) around. From a processing perspective, the fovea represents 1% of the retina, but the brain’s visual cortex devotes 50% of its computation to input from the fovea. So in the fovea, resolution is the equivalent of a TIFF, whereas elsewhere it’s a JPEG. So, if the sharpest and most brilliantly coloured human vision comes from the fovea, what is its resolution?

Again this is a somewhat loaded question, but let’s attempt it anyway. If the fovea has a field of view of 5°, and assuming a circular region, we get a circle with a radius of 2.5 degrees and an area of π×2.5² = 19.635 degrees², where each degree² contains 60×60 = 3600 arcmin². Assume a “pixel” acuity of 0.3×0.3 = 0.09 arcmin². This gives us 19.635×3600 / 0.09 = 785,400 pixels. Even if we round up we get a resolution of about 1MP for the fovea. And honestly, the actual point of highest acuity may be even smaller than that – if we considered the foveola, with its 1° field of view, we’re looking at a mere 31,400 pixels.
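Here is the same calculation as a short Python sketch, treating the stated field of view as the diameter of a circular region and using the 0.3 arc minute “pixel” assumed above. The function name is my own.

```python
import math


def pixels_in_circular_field(fov_deg, pixel_arcmin=0.3):
    """Number of square 'pixels' needed to tile a circular field of view."""
    radius_deg = fov_deg / 2.0
    area_deg2 = math.pi * radius_deg ** 2
    area_arcmin2 = area_deg2 * 60 * 60
    return area_arcmin2 / (pixel_arcmin ** 2)


print(round(pixels_in_circular_field(5)))   # fovea, 5° field of view
print(round(pixels_in_circular_field(1)))   # foveola, 1° field of view
```

The fovea comes out at about 785,400 pixels and the 1° foveola at about 31,400. Using a radius of 1° for the foveola, rather than 0.5°, would give about 125,700 instead.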

Note: There are many studies relating to the size of the fovea and the density of photoreceptors. Given that each human is a distinct being, there is no one exact number.

Jonas, J.B., Schneider, U., Naumann, G.O.H., “Count and density of human retinal photoreceptors”, Graefe’s Archive for Clinical and Experimental Ophthalmology, 230(6), pp.505-510 (1992).

For more information on the anatomy of the retina.
Everything you wanted to know about visual acuity.

Crafting Pixels

Vintage lenses, photography and digital imagery
