So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again, a bunch of numbers seem to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake, let’s look at the situation in a different way by asking what lens focal length most closely replicates the Angle Of View (AOV) of the human visual system (HVS).
One common idea floating around is that the “normal” length of a lens is 43mm because a “full-frame” film or sensor is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the AOV for each of the frame dimensions using the formula 2 arctan(d/(2f)), where d is the dimension and f is the focal length. So for the 24×36mm frame with a 50mm lens, the diagonal gives us 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited for lenses, but it is probably not the right one, because few people think in terms of a diagonal AOV. A horizontal one is more intuitive, using d=36mm, which gives 39.6°.
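As a quick sanity check, here is a minimal Python sketch of that formula (the aov_degrees helper is just for illustration; the frame dimensions and focal length are the ones used above):

```python
import math

def aov_degrees(dimension_mm: float, focal_length_mm: float) -> float:
    """Angle of view for one frame dimension: 2 * arctan(d / (2f))."""
    return math.degrees(2 * math.atan(dimension_mm / (2 * focal_length_mm)))

# Full-frame (24x36mm) with a 50mm lens
print(aov_degrees(43.3, 50))  # diagonal:   ~46.8 degrees
print(aov_degrees(36.0, 50))  # horizontal: ~39.6 degrees
print(aov_degrees(24.0, 50))  # vertical:   ~27.0 degrees
```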
So now let’s consider the AOV of the HVS. The normal AOV of the HVS, given the constraints of binocular vision, is roughly 120° (H) by 135° (V), but the reality is that our AOV with respect to targeted vision is probably only 60° horizontally and 10-15° vertically from a point of focus. Of that horizontal vision, likely only 30° is actually focused. Let’s be conservative and assume 60°.
So a 50mm lens is not close. What about a 35mm lens? This gives a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us the 30° of focused vision. What if we wanted a lens AOV equivalent to the binocular 120° horizontal view? We would need a 10.5mm lens, which is starting to get a little fish-eyed.
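Working backwards, the same formula solved for focal length, f = d/(2·tan(AOV/2)), gives the figures just quoted. A minimal sketch (the focal_for_aov helper is again just illustrative; differences of a millimetre or so from the numbers above are just rounding):

```python
import math

def focal_for_aov(dimension_mm: float, aov_deg: float) -> float:
    """Focal length needed for a given AOV: f = d / (2 * tan(AOV/2))."""
    return dimension_mm / (2 * math.tan(math.radians(aov_deg) / 2))

for aov in (30, 60, 120):   # focused, targeted, and binocular horizontal AOVs
    print(aov, round(focal_for_aov(36.0, aov), 1))
# 30 -> ~67mm, 60 -> ~31mm, 120 -> ~10.4mm
```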
There is, in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.
If you ever wonder what marvels lie in the human visual system, perform this exercise. Next time you’re a passenger in a car driving through a region with trees, look out at the landscape passing you by. If you look at the leaves on a particular tree you might be able to pick out individual leaves. Now track those leaves as you pass the scene. You will be able to track them, because the human visual system is highly adept at pinpointing objects, even moving ones. The best a high-resolution camera could do is take a video, or a photograph with an incredibly fast shutter speed, effectively freezing the frame. Cameras find tracking at high speed challenging.
Tracking and interpreting is even more challenging. It is the interpretation that sets the HVS apart from its digital counterparts, and it is likely one of the attributes that allowed us to evolve. Access to fine detail, motion analysis, visual sizing of objects, colour differentiation – all things that can be done less effectively in the digital realm. Notice that I said effectively, not efficiently. For the HVS does have limitations – no zoom, no ability to store pictures, no macro abilities, and no filtering. The images we do retain in our minds are somewhat abstract, lacking the clarity of photographs. But memories exist as more than mere visual representations. They encompass the amalgam of our senses, as visuals intersperse with smell, sound and touch.
Consider the photograph above, of some spruce tips. The image shows the needles as being a vibrant light green. What the picture fails to impart is the feel and smell associated with the scene: the resinous smell of pine, and the soft, almost fuzzy feeling of the tips. These sensory memories are encapsulated in the image stored in our minds. We can also conceptualize the information in the photograph using colour, shape and texture.
It’s funny the associations people make between cameras and the human eye. Megapixels is one, but focal length is another. It probably stems from the notion that a full-frame 50mm focal length is as close as a camera gets to human vision (well, not quite). While resolution has to do with the number of pixels and the acuity of those pixels, i.e. how the retina works, focal length has to do with other components of the eye. Search the web and you will find a whole bunch of different numbers for the focal length of the eye; in fact there are a number of definitions, depending on how the optical system is modelled.
Now the anatomy of the eye has a role to play in defining the focal length. A camera lens is composed of a series of lens elements separated by air. The eye, conversely, is composed of two lenses separated by fluids. At the front of the eye is a tough, transparent layer called the cornea, which can be considered a fixed lens. Behind the cornea is a fluid known as the aqueous humour, filling the space between the cornea and the lens. The lens is transparent, like the cornea, but it can be reshaped to allow focusing of objects at differing distances (the process of changing the shape of the lens is called accommodation, and is mediated by the ciliary muscles). From the lens, light travels through another, larger layer of fluid known as the vitreous humour on its way to the retina.
When the ciliary muscles are relaxed, the focal length of the lens is at its maximum, and objects at a distance are in focus. When the ciliary muscles contract, the lens assumes a more convex shape, and the focal length of the lens is shortened to bring closer objects into focus. These two limits are called the far-point and near-point respectively.
Given this, there seem to be two ways people measure the focal length: (i) diopter-based, or (ii) optics-based.
Focal length based on diopter
To understand the diopter-based focal length of the eye, we have to understand the diopter, a measure of the strength (refractive power) of a lens. It is calculated as the reciprocal of the focal length in metres. The refractive power of a lens is its ability to bend light: a 1-diopter lens will bring a parallel beam to a focus at 1 metre. So the calculation is:
Diopter = 1 / (focal length in metres)
The average human eye functions in such a way that, for a parallel beam of light coming from a distant object to be brought into focus on the retina, the eye must have an optical power of about 59-60 diopters. In the compound lens of the human eye, about 40 diopters comes from the front surface of the cornea, the rest from the variable-focus (crystalline) lens. Using this information we can calculate the focal length of the human eye as 1/diopter, which gives 1/59 = 0.0169m (16.9mm) and 1/60 = 0.0167m (16.7mm), or roughly 17mm.
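As a back-of-the-envelope sketch (the 40/20 diopter split for cornea and relaxed lens is the approximate figure mentioned above):

```python
cornea_power = 40      # diopters, front surface of the cornea (approximate)
lens_power = 20        # diopters, relaxed crystalline lens (approximate)
total_power = cornea_power + lens_power   # ~60 diopters

focal_length_m = 1 / total_power          # diopter = 1 / focal length (m)
print(f"{focal_length_m * 1000:.1f} mm")  # ~16.7 mm
```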
Focal length based on optics
From the viewpoint of the physical eye there are a number of distances to consider. Consider the reduced eye, with a single principal plane and a single nodal point. The principal plane is 1.5mm behind the anterior surface of the cornea, and the nodal point 7.2mm behind it. This gives an anterior focal length of 17.2mm, measured from the single principal plane to the anterior focal point (F1), which lies 15.7mm in front of the anterior surface of the cornea. The posterior focal length of 22.9mm is measured from the same plane to the posterior focal point (F2) on the retina.
The problem with some calculations is that they fail to take into account the fluid-filled properties of the eye. Now calculate the dioptric power of both focal lengths, using the refractive index of the vitreous humour (1.337) for the posterior focal length: the anterior power is 1/0.0172 ≈ 58 diopters, and the posterior power is 1.337/0.0229 ≈ 58.4 diopters – both in keeping with the 59-60 diopters quoted earlier.
What does this allow us to do? Calculate the aperture range of the human eye. If we assume iris diameters of 2-8mm, and use both the 17mm and 22.9mm focal lengths, we get the following aperture ranges:
17mm: f/2.1 – f/8.5
22.9mm: f/2.9 – f/11.5
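The f-number is just the focal length divided by the aperture (pupil) diameter, so a minimal sketch of the above looks like this:

```python
def f_number(focal_length_mm: float, pupil_diameter_mm: float) -> float:
    """f-number = focal length / aperture diameter."""
    return focal_length_mm / pupil_diameter_mm

for f in (17.0, 22.9):       # the two eye focal lengths derived above
    wide = f_number(f, 8)    # pupil fully dilated (8mm)
    narrow = f_number(f, 2)  # pupil constricted (2mm)
    print(f"{f}mm: f/{wide:.2f} - f/{narrow:.2f}")
# roughly f/2.1 - f/8.5 for 17mm, and f/2.9 - f/11.5 for 22.9mm
```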
Does any of this really matter? Only if we were making a comparison to the “normal” lens found on a camera – the 50mm. We’ll continue this in a future post.
So now we have looked at the number of overall pixels, and the acuity of pixels throughout that region. If you have read the last two posts, you, like me, might surmise that there is no possibility of associating a value with the resolution of the eye. And you would probably be right, because on top of everything else there are a number of factors which affect visual acuity.
Refractive errors – Cause defocus at the retina, blurring out fine detail and sharp edges. A good example is myopia (short-sightedness).
Size of pupil – The pupil acts like a camera aperture, allowing light into the eye. A large pupil allows more light in, but can degrade resolution through aberrations in the eye.
Illumination of the background – Less light means lower visual acuity. As cones are the acuity masters, low light reduces their capabilities.
Area of retina stimulated – Visual acuity is greatest in the fovea. At 2.5 degrees from the point the eyes are fixated upon, there is approximately a 50% loss in visual acuity.
Eye movement – The eyes move almost constantly (e.g. when reading a book, it is your eyes that move, not your head).
Complicated, right? So what is the answer? We have looked at how non-uniform acuity may affect the resolution of the human eye. The last piece of the puzzle (maybe?) in trying to approximate the resolution of the human eye is the shape of our visual scope. When we view something, what is the “shape of the picture” being created? On a digital camera it is a rectangle. Not so with the human visual system. Because of the non-uniformity of acuity, the shape of the region being “captured” really depends on the application. If you are viewing a landscape vista, you are looking at an overall scene, whereas when reading a book the “capture area” is quite narrow (although the overall shape of the information being input is the same, peripheral areas are seemingly ignored, because the fovea is concentrating on processing the words being read). To provide a sense of the visual field of binocular vision, here is an image from a 1964 NASA report, the Bioastronautics Data Book:
This diagram shows the normal field of view of a pair of human eyes. The central white portion represents the region seen by both eyes. The dashed portions, right and left, represent the regions seen by the right and left eyes, respectively. The cut-off by the brows, cheeks, and nose is shown by the black area. Head and eyes are motionless in this case. Not quite an ellipse, but almost. You can see how this complicates things even further when trying to approximate resolution. Instead of a rectangular field of view of 135°×190°, assume the shape of an ellipse, which gives (95×67.5)×π ≈ 20,145 square degrees. For 1 arc-minute-sized pixels that converts to 72.5 megapixels – about 21% lower (a factor of π/4) than the roughly 92 megapixels of the 190°×135° bounding rectangle.
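A sketch of that comparison, assuming 1 arc-minute “pixels” and treating the binocular field as either a 190°×135° rectangle or the inscribed ellipse:

```python
import math

ARCMIN2_PER_DEG2 = 60 * 60   # 3600 square arc-minutes per square degree
PIXEL_AREA = 1.0 * 1.0       # 1 arc-minute acuity -> 1 arcmin^2 per "pixel"

h_deg, v_deg = 190, 135
rect_deg2 = h_deg * v_deg                            # 25,650 square degrees
ellipse_deg2 = math.pi * (h_deg / 2) * (v_deg / 2)   # ~20,145 square degrees

for name, area_deg2 in (("rectangle", rect_deg2), ("ellipse", ellipse_deg2)):
    pixels = area_deg2 * ARCMIN2_PER_DEG2 / PIXEL_AREA
    print(f"{name}: {pixels / 1e6:.1f} MP")
# rectangle: ~92.3 MP, ellipse: ~72.5 MP (the ratio is pi/4, about 79%)
```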
So what’s the answer? What *is* the resolution of the human eye? If you wanted a number to represent the eye’s pixelation, I would verge on the conservative side and give the resolution of the eye a relatively low number; by this I mean using the 1 arc-minute acuity value, and estimating the “resolution” of the human visual system at somewhere around 100 megapixels. This likely factors in some sort of compromise between the region of the fovea with high acuity and the remainder of the field of view with low resolution. It may also take into account the fact that the human visual system operates more like streaming video than a photograph. Can the eye be compared to a camera? No – it’s far too complicated trying to decipher a quantitative value for an organic structure composed of 80% gelatinous tissue.
Maybe some mysteries of the world should remain just that.
There is an old phrase, “the camera does not lie”, which can be interpreted as both true and false. In historic photos, where little was done in the way of manipulation, the photograph often did hold the truth of what appeared in the scene. In modern photographs that are “enhanced” this is often not the case. But there is another perspective. The phrase is true because the camera objectively captures everything in the scene within its field of view. But it is also false, because the human eye is not all-seeing, perceiving the world in a highly subjective manner – focusing on the object (or person) of interest. Most photographs tend to contain far too much information, visual “flotsam” that is selectively discarded by the human visual system. The rendition of colours can also appear “unnatural” in photographs because of issues with white balance, film types (in analog cameras), and sensors (in digital cameras).
What the human eye sees (left) versus the camera (right)
A good example of how the human eye and camera lens perceive things differently is shown in the two photos above. The photograph on the right contains photographic perspective distortion (keystoning), where the tall buildings tend to “fall” or “lean” within the picture. The human eye (simulated on the left), on the other hand, corrects for this issue, and so does not perceive it. To photograph a tall building, the camera is often tilted upward, and in this position the vertical lines of the building converge toward the top of the picture. The convergence of vertical lines is a natural manifestation of perspective which we find acceptable in the horizontal plane (e.g. the convergence of railway tracks in the distance), but which seems unnatural in the vertical plane.
There are many other factors that influence the outcome of a picture. Some are associated with the physical abilities of a camera and its associated lenses, others with the environment. For example, the colour of ambient light (e.g. a colour cast created by the setting sun), perspective (the wider the lens, the more distortion introduced), or contrast (e.g. B&W images becoming “flat”). While the camera does not lie, it rarely exactly reproduces the world as we see it. Or maybe we don’t perceive the world around us as it truly is.
In the previous post, from a pure pixel viewpoint, we got a mixed bag of numbers to represent the human eye in terms of megapixels. One of the caveats was that not all pixels are created equal. A sensor in a camera has a certain resolution, and each pixel has the same visual acuity – whether a pixel becomes sharp or blurry is dependent on characteristics such as the lens, and depth-of-field. The human eye does not have a uniform acuity.
But resolution is about more than just how many pixels there are – it is about determining fine details. As noted in the last post, the information from the rods is coupled together, whereas the information from each cone has a direct link to the ganglion cells. Cones are therefore extremely important in vision, because without them we would view everything as we do in our peripheral vision – oh, and without colour (people who can’t see colour have a condition called achromatopsia).
Cones are, however, not uniformly distributed throughout the retina – they are packed more tightly in the centre of the eye’s visual field, in a place known as the fovea. So how does this affect the resolution of the eye? The fovea (which means pit), or fovea centralis, is located in the centre of a region known as the macula lutea, a small oval region located exactly in the centre of the posterior portion of the retina. The macula lutea is 4.5-5.5mm in diameter, and the fovea lies directly in its centre. The arrangement of these components of the retina is shown below.
The fovea has a diameter of 1.5mm (although it varies slightly between studies), and a field of view of approximately 5°, giving it an area of approximately 1.77mm². The fovea has roughly 158,000 cones per mm² (see note), while the density in the remainder of the retina is about 9,000 cones per mm². So the resolution of the human eye is much greater in the centre than on the periphery. This high density of cones is achieved by decreasing the diameter of the cone outer segments, such that foveal cones resemble rods in their appearance. The increased density of cones in the fovea is accompanied by a decrease in the density of rods. Within the fovea is a region called the foveola, which has a diameter of about 0.3mm and a field of view of 1° – this region contains only cones. The figure below (from 1935) shows the density of rods and cones in the retina.
Adapted from Osterberg, G., “Topography of the layer of rods and cones in the retina”, Acta Ophthalmologica, 6, pp.1-103 (1935).
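To put those densities in perspective, a quick back-of-the-envelope count using the figures above (approximate, of course):

```python
import math

fovea_diameter_mm = 1.5
fovea_area_mm2 = math.pi * (fovea_diameter_mm / 2) ** 2   # ~1.77 mm^2

cones_per_mm2_fovea = 158_000      # peak foveal density cited above
cones_per_mm2_periphery = 9_000    # rest of the retina

print(round(fovea_area_mm2 * cones_per_mm2_fovea))  # ~279,000 cones in the fovea
```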
The fovea has the highest visual acuity in the eye. Why? One reason may be the concentration of colour-sensitive cones. Most photoreceptors in the retina are located behind retinal blood vessels and cells which absorb light before it reaches the photoreceptor cells. The fovea lacks the supporting cells and blood vessels, and only contains photoreceptors. This means that visual acuity is sharpest there, and drops significantly moving away from this central region.
For example, pick a paragraph of text and stare at a word in the middle of it. The visual stimulus in the middle of the field of view falls on the fovea and is in the sharpest focus. Without moving your eyes, notice that the words on the periphery of the paragraph are not in complete focus. The images in peripheral vision have a “blurred” appearance, and the words cannot be clearly identified (although we can’t inspect this properly without moving our eyes, obviously). The eyes receive data from a field of view of 190-200°, but the acuity over most of that range is quite poor. If you view the word from approximately 50cm away, the region of sharp focus extends only about ±2.2cm from the word – beyond that, things get fuzzier. Note that each eye has its own fovea, and when you focus on a point the two foveae overlap, but the resolution doesn’t increase.
The restriction of highest-acuity vision to the fovea is the main reason we spend so much time moving our eyes (and heads) around. From a processing perspective, the fovea represents about 1% of the retina, but the brain’s visual cortex devotes roughly 50% of its computation to input from the fovea. So in the fovea, the resolution is the equivalent of a TIFF, whereas elsewhere it’s a JPEG. So, if the sharpest and most brilliantly coloured human vision comes from the fovea, what is its resolution?
Again this is a somewhat loaded question, but let’s attempt it anyway. If the fovea has a field of view of 5°, and we assume a circular region, that circle has a radius of 2.5 degrees and an area of π×2.5² = 19.635 square degrees, and each square degree contains 60×60 = 3600 arcmin². Assume a “pixel” acuity of 0.3×0.3 = 0.09 arcmin². This gives us 19.635×3600/0.09 = 785,400 pixels. Even if we round up, we get a resolution of about 1MP for the fovea. And honestly, the actual point of highest acuity is even smaller than that – if we consider the foveola, with its 1° field of view, we’re looking at a mere 31,000 or so pixels.
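Here is the same arithmetic as a small sketch, so the assumptions (5° foveal field, 1° foveolar field, 0.3 arc-minute “pixels”) are explicit:

```python
import math

ARCMIN2_PER_DEG2 = 3600
pixel_area = 0.3 * 0.3   # arcmin^2, the optimal-acuity "pixel"

def circular_pixels(fov_degrees: float) -> float:
    """Number of 'pixels' in a circular field of view of the given angular diameter."""
    area_deg2 = math.pi * (fov_degrees / 2) ** 2
    return area_deg2 * ARCMIN2_PER_DEG2 / pixel_area

print(round(circular_pixels(5)))   # fovea:   ~785,000 pixels (~0.8 MP)
print(round(circular_pixels(1)))   # foveola: ~31,000 pixels
```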
Note: There are many studies relating to the size of the fovea and the density of photoreceptors; given that each human is a distinct being, there is no one exact number.
Jonas, J.B., Schneider, U., Naumann, G.O.H., “Count and density of human retinal photoreceptors”, Graefe’s Archive for Clinical and Experimental Ophthalmology, 230(6), pp.505-510 (1992).
A lot of visual technology, such as digital cameras and even TVs, is rated in megapixels – the millions of pixels in a sensor or screen. So what is the resolution of the human eye? It’s not an easy question to answer, because there are a number of facets to the concept of resolution, and the human eye is not analogous to a camera sensor. It might be better to ask how many pixels would be needed to make an image on a “screen” large enough to fill our entire field of view, such that when we look at it we can’t detect pixelation.
Truthfully, we may never really be able to put an exact number on the resolution of the human visual system – the eyes are organic, not digital. Human vision is made possible by the presence of photoreceptors in the retina. These photoreceptors, of which there are over 120 million in each eye, convert electromagnetic radiation into neural signals. The photoreceptors consist of rods and cones. Rods (which are rod-shaped) provide scotopic vision: they are responsible for low-light vision and are achromatic. Cones (which are cone-shaped) provide photopic vision: they are active at high levels of illumination and are capable of colour vision. There are roughly 6-7 million cones, and nearly 120-125 million rods.
But how many [mega]pixels is this equivalent to? An easy guess at a pixel resolution might be 125-130 megapixels. Maybe. But many rods share bipolar cells, providing low resolution, whereas cones each have their own bipolar cell. The bipolar cells serve to transmit signals from the photoreceptors to the ganglion cells. So there may be far fewer than 120 million rods providing distinct information (sort of like taking a bunch of grayscale pixels in an image and averaging their values to create one uber-pixel). So that’s not a fruitful number.
A few years ago Roger M. Clark of Clark Vision performed a calculation, assuming a field of view of 120° by 120°, and an acuity of 0.3 arc minutes. The result? He calculated that the human eye has a resolution of 576 megapixels. The calculation is simple enough:
(120 × 120 × 60 × 60) / (0.3 × 0.3) = 576,000,000
The value 60 is the number of arc-minutes per degree, and 0.3×0.3 arcmin² is essentially the “pixel” size. A square degree is then 60×60 = 3600 arcmin², and contains about 40,000 “pixels”. That seems like a huge number. But, as Clark notes, the human eye is not a digital camera. We don’t take snapshots (more’s the pity), and our vision system is more like a video stream. We also have two eyes, providing stereoscopic and binocular vision with the ability of depth perception. So there are many more factors at play than in a simple sensor. For example, we typically move our eyes around, and our brain probably assembles a higher-resolution image than is possible using our photoreceptors alone (similar, I would imagine, to how some digital cameras create a high-megapixel image by slightly moving the sensor and combining the shifted images).
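Clark’s number, and the variants discussed below, all fall out of the same one-line calculation. Here it is as a minimal sketch (the eye_megapixels helper is just for illustration):

```python
def eye_megapixels(h_fov_deg: float, v_fov_deg: float, acuity_arcmin: float) -> float:
    """Rectangular field of view divided into square 'pixels' of side acuity_arcmin."""
    total_arcmin2 = (h_fov_deg * 60) * (v_fov_deg * 60)
    return total_arcmin2 / (acuity_arcmin ** 2) / 1e6

print(eye_megapixels(120, 120, 0.3))   # Clark's figure: ~576 MP
```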
The issue here may actually be the pixel size. In optimal viewing conditions the human eye can resolve detail as small as 0.59 arc minutes per line pair, which equates to roughly 0.3 arc minutes per line. This number comes from an 1897 study by Arthur König, “Die Abhängigkeit der Sehschärfe von der Beleuchtungsintensität” (roughly, “The Dependence of Visual Acuity on the Illumination Intensity”). A more recent study from 1990 (Curcio90) suggests a value of 77 cycles per degree. To convert this to arc-minutes per cycle, divide 60 by 77, giving 0.779. Two pixels define a cycle, so 0.779/2 = 0.3895, or roughly 0.39 arc minutes per pixel. Now if we use 0.39×0.39 arcmin as the pixel size, we get 6.57 pixels per arcmin², versus 11.11 pixels when the acuity is 0.3. This vastly changes the calculated value, to 341 megapixels (about 60% of the previous calculation).
Clark’s calculation using 120° is also conservative, as the eye’s field of view is roughly 155° horizontally and 135° vertically. If we used these constraints we would get 837 megapixels (at 0.3 arcmin), or 495 megapixels (at 0.39 arcmin). A pixel size of 0.3 arcmin also assumes optimal viewing – yet only about 75% of the population has 20/20 vision, with or without corrective measures. 20/20 vision implies an acuity of 1 arc minute, which means a pixel size of 1×1 arcmin². This could mean a simple 75 megapixels. There are three other factors which complicate this: (i) these calculations assume uniform optimal acuity, which is very rarely the case, (ii) vision is binocular, not monocular, and (iii) the field of view is likely not a rectangle.
For binocular vision, assume each eye has a horizontal field of view of 155°, with an overlap of 120° (120° of vision from each eye is binocular; the remaining 35° in each eye is monocular). This results in an overall horizontal field of view of 190°. If we use 190° and 1 arc-minute acuity, we get a combined total vision of 92 megapixels. If we change the acuity to 0.3 arc minutes, we get over a gigapixel. Quite a range.
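Reusing the illustrative eye_megapixels sketch from above, those variants come out as follows (all figures rounded):

```python
# One eye, 155 x 135 degrees, at three acuities
print(eye_megapixels(155, 135, 0.3))    # ~837 MP
print(eye_megapixels(155, 135, 0.39))   # ~495 MP
print(eye_megapixels(155, 135, 1.0))    # ~75 MP (20/20 vision)

# Combined binocular field, 190 x 135 degrees
print(eye_megapixels(190, 135, 1.0))    # ~92 MP
print(eye_megapixels(190, 135, 0.3))    # ~1026 MP, i.e. over a gigapixel
```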
All these calculations are mere musings – there are far too many variables to consider in trying to calculate a generic number to represent the megapixel equivalent of the human visual system. The numbers I have calculated are approximations only to show the broad range of possibilities based solely on a few simple assumptions. In the next couple of posts we’ll look at some of the complicating factors, such as the concept of uniform acuity.
The human eye is a marvellous thing. A human eye has three types of cone cells, each of which can distinguish roughly 100 different shades of colour. This puts the number of discernible colours at around 1,000,000, although colour perception is a highly subjective activity. Colour-blind people (dichromats) have only two cone types and see about 10,000 colours, while tetrachromats have four and may see up to 100 million colours. There is at least one documented case of a person with tetrachromatic vision.
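The arithmetic behind these counts is simply the number of distinguishable shades per cone type raised to the number of cone types – a quick sketch:

```python
shades_per_cone = 100

for cone_types in (2, 3, 4):    # dichromat, trichromat, tetrachromat
    print(cone_types, shades_per_cone ** cone_types)
# 2 -> 10,000   3 -> 1,000,000   4 -> 100,000,000
```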
Of course the true number of colours visible to the human eye is unknown, and some people may have better perception than others. The CIE (Commission internationale de l’éclairage), which in 1931 established the “CIE 1931 XYZ colour space”, created a horseshoe-shaped colour plot covering the hue range from 380-700nm, and saturation from 0% at the centre point to 100% on the periphery. The work of the CIE suggests humans can see approximately 2.4 million colours.
Others postulate that humans can discriminate about 150 bands between 380 and 700nm. By changing saturation and brightness, it is possible to distinguish many more colours – maybe 7 million [1].
This puts the human visual system in the mid-range of colour perception. Marine mammals are adapted for the low-light environment they live in and are monochromats, perceiving only about 100 shades. Conversely, at the other end of the spectrum, pentachromats – some butterflies, for example – may see 10 billion colours.
Now in computer vision, “true colour” is considered to be 24-bit RGB, or 16,777,216 colour variations. Most people obviously can’t see that many colours. The alternatives in colour images are limited: 8-bit colour provides 256 colours, and 16-bit colour – a somewhat odd combination of R (5-bit), G (6-bit) and B (5-bit) – gives 65,536 colours. Can we perceive the difference? Here is a full 24-bit RGB photograph:
Here’s the equivalent 8-bit colour photograph:
Can you tell the difference? (Except for the apparent uniformly white region above the red and yellow buildings).
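For the curious, the colour counts above are simply powers of two, and the 16-bit case packs the three channels into a 5-6-5 bit layout. A small illustrative sketch (the rgb888_to_rgb565 helper is hypothetical, not taken from any particular library):

```python
# Colour counts for common bit depths
print(2 ** 24)   # 16,777,216 colours ("true colour")
print(2 ** 16)   # 65,536 colours (5-bit R, 6-bit G, 5-bit B)
print(2 ** 8)    # 256 colours

def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack an 8-bit-per-channel colour into a single 16-bit RGB565 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

print(hex(rgb888_to_rgb565(255, 128, 64)))  # 0xfc08
```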
[1] Goldstein, E.B., Sensation and Perception, 3rd ed. (1989)