The camera does not lie

There is an old phrase, “the camera does not lie”, which can be interpreted as both true and false. In historic photographs, where little was done in the way of manipulation, the photograph often did hold the truth of what appeared in the scene. In modern photographs that are “enhanced”, this is often not the case. But there is another perspective. The phrase is true because the camera objectively captures everything in the scene within its field of view. It is also false, because the human eye is not all-seeing; it perceives the world in a highly subjective manner, focusing on the object (or person) of interest. Most photographs contain far too much information, visual “flotsam” that the human visual system selectively discards. The rendition of colours can also appear “unnatural” in photographs because of issues with white balance, film stock (in analog cameras), and sensors (in digital cameras).

What the human eye sees (left) versus the camera (right)

A good example of how the human eye and the camera lens perceive things differently is shown in the two photos above. The photograph on the right contains photographic perspective distortion (keystoning), where the tall buildings appear to “fall” or “lean” within the picture. The human eye (simulated on the left), on the other hand, corrects for this and so does not perceive it. To photograph a tall building, the camera is often tilted upward, and in this position the vertical lines of the building converge toward the top of the picture. The convergence of vertical lines is a natural manifestation of perspective that we find acceptable in the horizontal plane (e.g. railway tracks converging in the distance), but which seems unnatural in the vertical plane.

There are many other factors that influence the outcome of a picture. Some are associated with the physical capabilities of a camera and its lenses, others with the environment: the colour of ambient light (e.g. the colour cast created by a setting sun), perspective (the wider the lens, the more distortion it introduces), or contrast (e.g. B&W images becoming “flat”). While the camera does not lie, it rarely reproduces the world exactly as we see it. Or maybe we don’t perceive the world around us as it truly is.

Resolution of the human eye (ii) – visual acuity

In the previous post, from a pure pixel viewpoint, we got a mixed bag of numbers to represent the human eye in terms of megapixels. One of the caveats was that not all pixels are created equal. A camera sensor has a fixed resolution, and every pixel has the same visual acuity – whether a pixel ends up sharp or blurry depends on characteristics such as the lens and the depth-of-field. The human eye does not have uniform acuity.

But resolution is about more than just how many pixels – it is about resolving fine detail. As noted in the last post, the information from the rods is coupled together, whereas the information from each cone has a direct link to the ganglion cells. Cones are therefore extremely important in vision; without them we would view everything as we do in our peripheral vision, and without colour (people who cannot see colour have a condition called achromatopsia).

Layers of the Retina

Cones are, however, not uniformly distributed throughout the retina – they are packed more tightly in the centre of the eye’s visual field, in a place known as the fovea. So how does this affect the resolution of the eye? The fovea (which means pit), or fovea centralis, is located in the centre of a region known as the macula lutea, a small oval region lying in the centre of the posterior portion of the retina. The macula lutea is 4.5-5.5mm in diameter, and the fovea lies directly in its centre. The arrangement of these components of the retina is shown below.

Components of the retina.
Photograph: Danny Hope from Brighton & Hove, UK; Diagram: User:Zyxwv99 [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)]

The fovea has a diameter of 1.5mm (although it varies slightly between studies), and a field of view of approximately 5°, giving it an area of approximately 1.77mm². The fovea has roughly 158,000 cones per mm² (see note); the density in the remainder of the retina is about 9,000 cones per mm². So the resolution of the human eye is much greater in the centre than on the periphery. This high density of cones is achieved by decreasing the diameter of the cone outer segments, such that foveal cones resemble rods in appearance. The increased density of cones in the fovea is accompanied by a decrease in the density of rods. Within the fovea is a region called the foveola, which has a diameter of about 0.3mm and a field of view of 1° – this region contains only cones. The figure below (from 1935) shows the density of rods and cones across the retina.

Density of rods and cones in the retina.
Adapted from Osterberg, G., “Topography of the layer of rods and cones in the retina”, Acta Ophthalmologica, 6, pp.1-103 (1935).
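
As a quick check on these density figures, here is a small Python sketch (a back-of-the-envelope calculation using the diameter and densities quoted above, which vary between studies) comparing the number of cones in the fovea with an equally sized patch of peripheral retina:

```python
import math

fovea_diameter_mm = 1.5                                    # diameter quoted above
fovea_area_mm2 = math.pi * (fovea_diameter_mm / 2) ** 2    # ~1.77 mm²

cones_in_fovea = 158_000 * fovea_area_mm2        # ~280,000 cones in the fovea
cones_in_periphery = 9_000 * fovea_area_mm2      # ~16,000 cones in an equal peripheral patch

print(f"fovea area: {fovea_area_mm2:.2f} mm²")
print(f"cones in the fovea: {cones_in_fovea:,.0f}")
print(f"cones in an equal peripheral area: {cones_in_periphery:,.0f}")
```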

The fovea has the highest visual acuity in the eye. Why? One reason is the concentration of colour-sensitive cones. Most photoreceptors in the retina sit behind retinal blood vessels and cells that absorb light before it reaches the photoreceptor cells. The fovea lacks these supporting cells and blood vessels and contains only photoreceptors. This means that visual acuity is sharpest there and drops significantly moving away from this central region.

For example, pick a paragraph of text and stare at a word in the middle of it. The visual stimulus in the middle of the field of view falls on the fovea and is in the sharpest focus. Without moving your eye, notice that the words on the periphery of the paragraph are not in complete focus. The images in peripheral vision have a “blurred” appearance, and the words cannot be clearly identified (although we can’t see this properly, obviously). The eyes receive data from a field of view of 190-200°, but acuity over most of that range is quite poor. If you view the word from approximately 50cm away, the region of sharp focus extends only about ±2.2cm from the word – beyond that things get fuzzier. Note that each eye obviously has its own fovea, but when you focus on a point the two foveal fields overlap; the resolution doesn’t increase.
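
The ±2.2cm figure is simple trigonometry: half of the fovea’s roughly 5° field of view, projected out to the reading distance. A minimal sketch (the 50cm distance is just the example used above):

```python
import math

viewing_distance_cm = 50.0
fovea_half_angle_deg = 2.5   # half of the ~5° foveal field of view

sharp_radius_cm = viewing_distance_cm * math.tan(math.radians(fovea_half_angle_deg))
print(f"sharp region: ±{sharp_radius_cm:.1f} cm around the fixated word")   # ~±2.2 cm
```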

The restriction of highest-acuity vision to the fovea is the main reason we spend so much time moving our eyes (and heads) around. From a processing perspective, the fovea represents about 1% of the retina, but the brain’s visual cortex devotes roughly 50% of its computation to input from the fovea. So in the fovea, the resolution is the equivalent of a TIFF, whereas elsewhere it’s a JPEG. So, if the sharpest and most brilliantly coloured human vision comes from the fovea, what is its resolution?

Again this is a somewhat loaded question, but let’s attempt it anyway. If the fovea has a field of view of 5°, and we assume a circular region, the radius is 2.5°, giving an area of π × 2.5² ≈ 19.635 degrees², and there are 60 × 60 = 3600 arcmin² per degree². Assume a “pixel” acuity of 0.3 × 0.3 = 0.09 arcmin². This gives us 19.635 × 3600 / 0.09 ≈ 785,400 pixels. Even if we round up, we get a resolution of about 1MP for the fovea. And honestly, the actual point of highest acuity may be even smaller than that – if we consider only the foveola, with its 1° field of view, the same arithmetic gives a mere 31,000 or so pixels.
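
The same arithmetic, written as a short Python sketch so the assumptions (a circular field, 0.3 arcmin “pixels”) are explicit:

```python
import math

ARCMIN2_PER_DEG2 = 60 * 60       # 3600 arcmin² in a square degree
pixel_area_arcmin2 = 0.3 * 0.3   # assumed "pixel" of 0.3 × 0.3 arcmin

def circular_field_pixels(fov_deg):
    """Number of 0.3-arcmin 'pixels' in a circular field of view."""
    area_deg2 = math.pi * (fov_deg / 2) ** 2
    return area_deg2 * ARCMIN2_PER_DEG2 / pixel_area_arcmin2

print(f"fovea (5°):   {circular_field_pixels(5):,.0f} pixels")   # ~785,000
print(f"foveola (1°): {circular_field_pixels(1):,.0f} pixels")   # ~31,000
```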

NOTE
There are many studies relating to the size of the fovea and the density of photoreceptors; given that each human is a distinct being, there is no single exact number.

Jonas, J.B., Schneider, U., Naumann, G.O.H., “Count and density of human retinal photoreceptors”, Graefe’s Archive for Clinical and Experimental Ophthalmology, 230(6), pp.505-510 (1992).

For more information on the anatomy of the retina.
Everything you wanted to know about visual acuity.

Resolution of the human eye (i) – pure pixel power

A lot of visual technology, such as digital cameras and even TVs, is marketed in terms of megapixels, or rather the millions of pixels in a sensor/screen. What is the resolution of the human eye? It’s not an easy question to answer, because there are a number of facets to the concept of resolution, and the human eye is not analogous to a camera sensor. It might be better to ask how many pixels would be needed to make an image on a “screen” large enough to fill our entire field of view, so that when we look at it we can’t detect pixelation.

Truthfully, we may never really be able to put an exact number on the resolution of the human visual system – the eyes are organic, not digital. Human vision is made possible by the presence of photoreceptors in the retina. These photoreceptors, of which there are over 120 million in each eye, convert electromagnetic radiation into neural signals. The photoreceptors consist of rods and cones. Rods (which are rod-shaped) provide scotopic vision: they are responsible for low-light vision and are achromatic. Cones (which are cone-shaped) provide photopic vision: they are active at high levels of illumination and are capable of colour vision. There are roughly 6-7 million cones, and nearly 120-125 million rods.

But how many [mega]pixels is this equivalent to? An easy guess of pixel resolution might be 125-130 megapixels. Maybe. But many rods are attached to a single bipolar cell, providing low resolution, whereas each cone has its own bipolar cell. The bipolar cells transmit signals from the photoreceptors to the ganglion cells. So there may be far fewer than 120 million rods providing distinct information (somewhat like taking a group of grayscale pixels in an image and averaging their values to create one uber-pixel). So that’s not a fruitful number.

A few years ago Roger N. Clark of Clark Vision performed a calculation, assuming a field of view of 120° by 120° and an acuity of 0.3 arc-minutes. The result? He calculated that the human eye has a resolution of 576 megapixels. The calculation is simple enough:

(120 × 120 × 60 × 60) / (0.3 × 0.3) = 576,000,000

The value 60 is the number of arc-minutes per degree, and 0.3 arcmin is essentially the linear “pixel” size. A square degree is then 60 × 60 = 3600 arcmin², and contains 40,000 such “pixels”. Seems like a huge number. But, as Clark notes, the human eye is not a digital camera. We don’t take snapshots (more’s the pity), and our vision system is more like a video stream. We also have two eyes, providing stereoscopic, binocular vision with depth perception. So there are many more factors at play than in a simple sensor. For example, we typically move our eyes around, and our brain probably assembles a higher-resolution image than is possible using our photoreceptors alone (similar, I would imagine, to how some digital cameras create a high-megapixel image by slightly shifting the sensor and combining the shifted exposures).
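
Clark’s arithmetic is easy to wrap in a small function, so the same formula can be reused with the different assumptions discussed below (the function name and layout are mine, not Clark’s):

```python
def eye_megapixels(fov_h_deg, fov_v_deg, acuity_arcmin):
    """Megapixels needed to tile a rectangular field of view with square
    'pixels' whose side length equals the given acuity (in arc-minutes)."""
    pixels_h = fov_h_deg * 60 / acuity_arcmin
    pixels_v = fov_v_deg * 60 / acuity_arcmin
    return pixels_h * pixels_v / 1e6

print(eye_megapixels(120, 120, 0.3))   # 576.0 – Clark's figure
```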

The issue here may actually be the pixel size. In optimal viewing conditions the human eye can resolve detail as small as 0.59 arc-minutes per line pair, which equates to roughly 0.3 arc-minutes per line. This number comes from a study from 1897, “Die Abhängigkeit der Sehschärfe von der Beleuchtungsintensität” by Arthur König (roughly, “The Dependence of Visual Acuity on Illumination Intensity”). A more recent study from 1990 (Curcio90) suggests a value of 77 cycles per degree. To convert this to arc-minutes per cycle, we divide 60 by 77, giving 0.779 arc-minutes per cycle. Two pixels define a cycle, so the pixel size is 0.779 / 2 = 0.3895, or about 0.39 arc-minutes. If we use 0.39 × 0.39 arcmin as the pixel size, we get 6.57 pixels per arcmin², versus 11.11 pixels when the acuity is 0.3. This drops the calculated value to about 341 megapixels (roughly 60% of the previous figure).

Clark’s calculation using 120° is also conservative, as the eye’s field of view is roughly 155° horizontally and 135° vertically. If we use these constraints we get 837 megapixels (at 0.3 arcmin), or 495 megapixels (at 0.39 arcmin). A pixel size of 0.3 arcmin also assumes optimal viewing – but only about 75% of the population has 20/20 vision, with or without corrective measures. 20/20 vision implies an acuity of 1 arc-minute, which means a pixel size of 1 × 1 arcmin². That could mean a mere 75 megapixels. Three other factors complicate things: (i) these calculations assume uniform, optimal acuity, which is very rarely the case, (ii) vision is binocular, not monocular, and (iii) the field of view is not really a rectangle.

For binocular vision, assume each eye has a horizontal field of view of 155°, with an overlap of 120° (120° of each eye’s vision is binocular; the remaining 35° in each eye is monocular). This results in an overall horizontal field of view of 190°. Using 190° and 1 arc-minute acuity gives a combined total of about 92 megapixels; changing the acuity to 0.3 arc-minutes gives over a gigapixel. Quite a range.
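
Re-running the same formula with the alternative assumptions above reproduces the whole spread of figures (the helper from the earlier sketch is repeated here so the snippet stands on its own):

```python
def eye_megapixels(fov_h_deg, fov_v_deg, acuity_arcmin):
    """Megapixels for a rectangular field of view at a given acuity."""
    return (fov_h_deg * 60 / acuity_arcmin) * (fov_v_deg * 60 / acuity_arcmin) / 1e6

print(eye_megapixels(120, 120, 0.39))   # ~341 MP – 0.39 arcmin pixels
print(eye_megapixels(155, 135, 0.3))    # ~837 MP – wider field of view
print(eye_megapixels(155, 135, 0.39))   # ~495 MP
print(eye_megapixels(155, 135, 1.0))    # ~75 MP  – 20/20 (1 arcmin) acuity
print(eye_megapixels(190, 135, 1.0))    # ~92 MP  – 190° binocular field
print(eye_megapixels(190, 135, 0.3))    # ~1026 MP – just over a gigapixel
```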

All these calculations are mere musings – there are far too many variables to consider in trying to calculate a generic number to represent the megapixel equivalent of the human visual system. The numbers I have calculated are approximations, intended only to show the broad range of possibilities that follows from a few simple assumptions. In the next couple of posts we’ll look at some of the complicating factors, such as the concept of uniform acuity.

REFS:
(Curcio90) Curcio, C.A., Sloan, K.R., Kalina, R.E., Hendrickson, A.E., “Human photoreceptor topography”, The Journal of Comparative Neurology, 292, pp.497-523 (1990)

How many colours can humans see?

The human eye is a marvellous thing. It has three types of cone cells, each of which can distinguish roughly 100 different shades, which puts the number of discernible colours at around 100³, or 1,000,000 – although colour perception is a highly subjective activity. Colour-blind people (dichromats) have only two types of cone and see about 10,000 colours; tetrachromats have four, and may see up to 100 million colours. There is at least one documented case of a person with tetrachromatic vision.

Of course the true number of colours visible to the human eye is unknown, and some people may have better colour perception than others. The CIE (Commission internationale de l’éclairage), which in 1931 established the CIE 1931 XYZ colour space, created a horseshoe-shaped colour plot covering the hue range from 380-700nm, with saturation running from 0% at the central white point to 100% on the periphery. The work of the CIE suggests humans can see approximately 2.4 million colours.

CIE 1931 XYZ color space
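
To give a feel for how colours map onto that horseshoe-shaped plot, here is a small sketch that converts a linear-RGB triplet to CIE 1931 xy chromaticity coordinates. It uses the standard published linear sRGB → XYZ (D65) matrix, which is an assumption of mine rather than anything from the CIE work described above:

```python
import numpy as np

# Standard linear sRGB -> CIE 1931 XYZ (D65) conversion matrix.
SRGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

def xy_chromaticity(rgb_linear):
    """Map a linear-RGB triplet (components in 0..1) to CIE 1931 xy coordinates,
    i.e. a point on the horseshoe-shaped chromaticity diagram."""
    X, Y, Z = SRGB_TO_XYZ @ np.asarray(rgb_linear, dtype=float)
    total = X + Y + Z
    return X / total, Y / total

print(xy_chromaticity([1.0, 0.0, 0.0]))  # sRGB red -> roughly (0.64, 0.33)
print(xy_chromaticity([1.0, 1.0, 1.0]))  # white    -> near the D65 white point (~0.313, 0.329)
```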

Others postulate that humans can discriminate about 150 hue bands between 380 and 700nm. By also varying saturation and brightness, it is possible to distinguish many more colours – maybe 7 million [1].

Visible colour spectrum

This puts the human visual system in the mid-range of colour perception. Marine mammals are adapted for the low-light environment they live in and are monochromats, perceiving only around 100 colours. At the other end of the spectrum, pentachromats, such as some butterflies, may see 10 billion colours.

Now in computer vision, “true colour” is considered to be 24-bit RGB, or 16,777,216 colour variations. Most people obviously can’t see that many colours. The alternatives for colour images are limited: 8-bit colour provides 256 colours, while 16-bit colour is a slightly odd combination of R (5 bits), G (6 bits) and B (5 bits), giving 65,536 colours. Can we perceive the difference? Here is a full 24-bit RGB photograph:

Colour image with 24-bit RGB

Here’s the equivalent 8-bit colour photograph:

Colour image with 8-bit RGB

Can you tell the difference? (Apart from the apparently uniform white region above the red and yellow buildings.)
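
If you want to recreate this comparison yourself, the sketch below reduces a 24-bit image to 16-bit (5-6-5) colour and to an 8-bit, 256-colour adaptive palette, using NumPy and Pillow (the input file name photo.jpg is just a placeholder):

```python
import numpy as np
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")   # hypothetical 24-bit source image
rgb = np.asarray(img)

# 16-bit "high colour": keep 5 bits of red, 6 of green and 5 of blue,
# then shift back so the result can still be saved as an 8-bit-per-channel image.
r = (rgb[..., 0] >> 3) << 3
g = (rgb[..., 1] >> 2) << 2
b = (rgb[..., 2] >> 3) << 3
Image.fromarray(np.stack([r, g, b], axis=-1)).save("photo_16bit.png")

# 8-bit colour: quantise to a 256-colour adaptive palette.
img.convert("P", palette=Image.ADAPTIVE, colors=256).save("photo_8bit.png")
```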

[1] Goldstein, E.B., Sensation and Perception, 3rd ed. (1989)

A new perspective

I calved this blog off from my other blog, craftofcoding.wordpress.com, because I felt it was the right time to have a blog dedicated to the craft of visual imagery. I have worked in image processing for many years, but mostly in the throes of academia, and never in a truly fulfilling manner. I have always had an interest in photography, and dabbled in film photography while at university during the 80s. But film processing got too expensive, and it just didn’t feel like the right medium for me. I abandoned analog photography and instead turned to image processing, back when the only real digital media were scanned images. With the advent of digital cameras I returned to photography, mainly because of the simplicity of obtaining photographs – there was no need to wait to see how a picture had turned out, it was instantaneous. I bought my first digital camera in 2002, a 2MP Olympus point-and-shoot.

The past 20 years have seen vast improvements in digital capture technology: photographs with incredible resolution, precision optics, super-fast autofocus, and so on. But something was still missing. There is something about the character of analog photographs that just can’t be replicated in digital. Some might relate this to their aesthetically appealing aberrations. Why were Instagram filters so popular? Why were people adding the very image defects we had always tried to remove in image processing? What’s the deal with bokeh? Then came the ah-ha moment, when the two worlds of analog and digital collided.

Last year I bought a Voigtländer 25mm manual-focus prime lens for my M43 Olympus EM5(ii). After so many years of auto-everything, you tend to forget the photographic skills you learned doing things manually, like focusing. So I decided to turn the clock backwards and integrate vintage lenses into my use of digital cameras. There was also a part of me that had begun to wonder whether many of the image processing tasks I had spent years exploring had any intrinsic value. Instead, I decided to move down another path and view image processing from a more aesthetic viewpoint.

A depanneur in Montreal (clockwise from top-left): Original 9MP image; increased image saturation; converted to B&W, and scaled to 1/64th the size.

This blog will explore the history of vintage lenses, their integration with digital cameras, basic image processing techniques, and aspects of digital photography. It will also look at some of the more esoteric aspects of the visual world, such as how the camera’s perspective compares with that of the human eye.