What is a grayscale image?

If you are starting to learn about image processing, you will likely be dealing with grayscale, or 8-bit, images. This means they contain 2^8 = 256 different shades of gray, from 0 (black) to 255 (white). They are the simplest type of image to design image processing algorithms for. There are image types with more than 8 bits, e.g. 10-bit (1024 shades of gray), but in reality these are only used in specialist applications. Why? Doesn’t more shades of gray mean a better image? Not necessarily.

The main reason? Blame the human visual system. It is designed for colour, with three types of cone photoreceptors conveying the colour information that allows humans to perceive approximately 10 million unique colours. It has been suggested that, for grays, human eyes cannot perceptually distinguish 32 from 256 gray-level intensities (there is only one type of photoreceptor that deals with black and white). So 256 levels of gray are really for the benefit of the machine, and although the machine would be just as happy processing 1024, it is likely not needed.

Here is an example. Consider the following photo of the London Blitz, WW2 (New Times Paris Bureau Collection).

[Image: blitz – photo of the London Blitz]

This is a nice grayscale image, because it has a good distribution of intensity values from 0 to 255 (which is not always easy to find). Here is the histogram:

[Image: blitzHST – histogram of the Blitz photo]
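For those following along in code, a histogram like this is easy to compute with NumPy and Pillow. This is only a minimal sketch; the file name blitz.png is a stand-in for whatever image you are using.

```python
import numpy as np
from PIL import Image

# Load the image and force it to 8-bit grayscale (Pillow mode "L").
img = np.array(Image.open("blitz.png").convert("L"))

# One bin per intensity level, 0..255.
hist, _ = np.histogram(img, bins=256, range=(0, 256))

for level, count in enumerate(hist):
    print(f"{level:3d}: {count}")
```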

Now consider the image reduced to 8, 16, 32, 64, and 128 intensity levels. Here is a montage of the results, shown in the form of a region extracted from the original image.

The same image with differing levels of grayscale.
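How is such a reduction done? A minimal sketch: quantize the 0–255 range into n equal bins and map each pixel to the centre of its bin (this assumes n divides 256 evenly, which holds for the levels used here; the file name is again a placeholder).

```python
import numpy as np
from PIL import Image

def quantize(img: np.ndarray, levels: int) -> np.ndarray:
    """Reduce an 8-bit grayscale image to `levels` intensity levels."""
    step = 256 // levels                                  # width of each intensity bin
    return ((img // step) * step + step // 2).astype(np.uint8)

img = np.array(Image.open("blitz.png").convert("L"))
reduced = {n: quantize(img, n) for n in (8, 16, 32, 64, 128)}
```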

Note that there is very little perceivable difference, except at 8 intensity levels, where the image starts to become somewhat grainy. Now consider a comparison of this enlarged region showing 256 (left) versus 32 (right) intensity levels.

[Image: blitz256vs32 – 256 versus 32 intensity levels]

Can you see the difference? There is very little, especially when viewed in the overall context of the complete image.

Many historic images look like they are grayscale, but in fact they are anything but. They may be slightly yellowish or brown in colour, either due to the photographic process or due to aging of the photographic medium. There is no benefit to processing these types of photographs as colour images, however; they should be converted to 8-bit grayscale.
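With Pillow that conversion is a one-liner; convert("L") applies the standard luminance weighting of the R, G and B channels. The file name here is purely hypothetical.

```python
from PIL import Image

# Open a yellowish/sepia scan and collapse it to a single 8-bit gray channel.
gray = Image.open("historic_scan.png").convert("L")
gray.save("historic_scan_gray.png")
```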

Is the eye equivalent to a 50mm lens?

So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again, a bunch of numbers seems to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake, let’s look at the situation in a different way by asking what lens focal length most closely replicates the Angle Of View (AOV) of the human visual system (HVS).

One common idea floating around is that the “normal” length of a lens is 43mm because a “full-frame” film or sensor is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the various AOVs for each of the dimensions using the formula 2 arctan(d/(2f)), where d is the dimension and f is the focal length. So for the 24×36mm frame with a 50mm lens, the diagonal gives 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited with lenses, but probably not the right one, because few people think about a diagonal AOV. A horizontal one is more common, using d=36mm. Now we get 39.6°.
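As a quick sanity check, here is the same formula in Python (just the arithmetic, nothing lens-specific):

```python
import math

def aov_degrees(d_mm: float, f_mm: float) -> float:
    """Angle of view 2*arctan(d / (2f)), returned in degrees."""
    return math.degrees(2 * math.atan(d_mm / (2 * f_mm)))

print(aov_degrees(43.3, 50))   # diagonal of a 24x36mm frame: ~46.8°
print(aov_degrees(36.0, 50))   # horizontal: ~39.6°
```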

So now let’s consider the AOV of the HVS. The normal AOV of the HVS, assuming binocular vision constraints, is roughly 120° (H) by 135° (V), but the reality is that our AOV with respect to targeted vision is probably only 60° horizontally and 10-15° vertically from a point of focus. Of the horizontal vision, likely only 30° is focused. Let’s be conservative and assume 60°.

So a 50mm lens is not close. What about a 35mm lens? That ends up with a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us the 30° of focused vision. What if we wanted a lens AOV equivalent to the binocular 120° horizontal view? We would need a 10.5mm lens, which is starting to get a little fish-eyed.
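These focal lengths come from inverting the formula, f = d/(2·tan(AOV/2)); a small sketch using the 36mm horizontal dimension:

```python
import math

def focal_for_aov(aov_deg: float, d_mm: float = 36.0) -> float:
    """Focal length giving the requested AOV across dimension d (full-frame horizontal by default)."""
    return d_mm / (2 * math.tan(math.radians(aov_deg) / 2))

for aov in (30, 60, 120):
    print(aov, round(focal_for_aov(aov), 1))   # ~67mm, ~31mm, ~10.4mm
```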

There is, in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.

The wonder of the human eye

If you ever wonder what marvels lie in the human visual system, perform this exercise. Next time you’re a passenger in a moving car driving through a region with trees, look out at the landscape passing you by. If you look at the leaves on a particular tree you might be able to pick out individual leaves. Now track those leaves as you pass the scene. You will be able to track them because the human visual system is highly adept at pinpointing objects, even moving ones. The best a high-resolution camera could do is take a video, or a photograph with an incredibly fast shutter speed, effectively freezing the frame. Cameras find tracking at high speed challenging.

Tracking and interpreting is even more challenging. It is the interpretation that sets the HVS apart from its digital counterparts. It is likely one of the attributes that allowed us to evolve. Access to fine detail, motion analysis, visual sizing of objects, colour differentiation – all things that can be done less effectively in the digital realm. Notice that I said effectively, not efficiently. For the HVS does have limitations – no zoom, no ability to store pictures, limited macro ability, and no filtering. The images we do retain in our minds are somewhat abstract, lacking the clarity of photographs. But memories exist as more than mere visual representations. They encompass the amalgam of our senses, as visuals intersperse with smell, sound and touch.

Consider the photograph above, of some spruce tips. The image shows the needles as being a vibrant light green. What the picture fails to impart is an understanding of the feel and smell associated with the scene: the resiny smell of pine, and the soft, almost fuzzy feel of the tips. These sensory memories are encapsulated in the image stored in our minds. We can also conceptualize the information in the photograph using colour, shape and texture.

Resolution of the human eye (i) pure pixel power

A lot of visual technology, such as digital cameras and even TVs, is based on megapixels, or rather the millions of pixels in a sensor or screen. What is the resolution of the human eye? It’s not an easy question to answer, because there are a number of facets to the concept of resolution, and the human eye is not analogous to a camera sensor. It might be better to ask how many pixels would be needed to make an image on a “screen” large enough to fill our entire field of view, so that when we look at it we can’t detect pixelation.

Truthfully, we may never really be able to put an exact number on the resolution of the human visual system – the eyes are organic, not digital. Human vision is made possible by the presence of photoreceptors in the retina. These photoreceptors, of which there are over 120 million in each eye, convert electromagnetic radiation into neural signals. The photoreceptors consist of rods and cones. Rods (which are rod shaped) provide scotopic vision, are responsible for low-light vision, and are achromatic. Cones (which are cone shaped) provide photopic vision, are active at high levels of illumination, and are capable of colour vision. There are roughly 6-7 million cones, and nearly 120-125 million rods.

But how many [mega]pixels is this equivalent to? An easy guess at pixel resolution might be 125-130 megapixels. Maybe. But many rods share a bipolar cell, providing only low resolution, whereas cones each have their own bipolar cell. The bipolar cells transmit signals from the photoreceptors to the ganglion cells. So there may be far fewer than 120 million rods providing actual information (sort of like taking a bunch of grayscale pixels in an image and averaging their values to create an uber-pixel). So that’s not a fruitful number.

A few years ago Roger M. Clark of Clark Vision performed a calculation, assuming a field of view of 120° by 120°, and an acuity of 0.3 arc minutes. The result? He calculated that the human eye has a resolution of 576 megapixels. The calculation is simple enough:

(120 × 120 × 60 × 60) / (0.3 × 0.3) = 576,000,000

The value 60 is the number of arc-minutes per degree, and 0.3 × 0.3 arcmin² is essentially the “pixel” size. A square degree is then 60×60 arc-minutes, and contains 40,000 such “pixels”. Seems like a huge number. But, as Clark notes, the human eye is not a digital camera. We don’t take snapshots (more’s the pity), and our vision system is more like a video stream. We also have two eyes, providing stereoscopic and binocular vision with the ability of depth perception. So there are many more factors involved than in a simple sensor. For example, we typically move our eyes around, and our brain probably assembles a higher-resolution image than is possible using our photoreceptors alone (similar, I would imagine, to how some digital cameras create a high-megapixel image by slightly shifting the sensor and combining the shifted images).
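Expressed as a small function (a sketch of the arithmetic only, nothing more), Clark’s figure drops straight out:

```python
def eye_megapixels(fov_h_deg: float, fov_v_deg: float, pixel_arcmin: float) -> float:
    """Megapixels needed to tile a fov_h x fov_v field of view with pixels of the given size (in arc minutes)."""
    pixels = (fov_h_deg * 60) * (fov_v_deg * 60) / (pixel_arcmin ** 2)
    return pixels / 1e6

print(eye_megapixels(120, 120, 0.3))   # -> 576.0
```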

The issue here may actually be the pixel size. In optimal viewing conditions the human eye can resolve detail as small as 0.59 arc minutes per line pair, which equates to 0.3 arc minutes per line. This number comes from a study from 1897 – “Die Abhängigkeit der Sehschärfe von der Beleuchtungsintensität” by Arthur König (roughly, “The Dependence of Visual Acuity on the Illumination Intensity”). A more recent study from 1990 (Curcio90) suggests a value of 77 cycles per degree. To convert this to arc-minutes per cycle, divide 60 by 77, giving 0.779. Two pixels define a cycle, so 0.779/2 = 0.3895, or 0.39. Now if we use 0.39×0.39 arcmin as the pixel size, we get 6.57 pixels per arcmin², versus 11.11 pixels when the acuity is 0.3. This changes the calculated value to 341 megapixels (about 60% of the previous figure).

Clark’s calculation using 120° is also conservative, as the eye’s field of view is roughly 155° horizontally and 135° vertically. If we used these constraints we would get 837 megapixels (0.3), or 495 megapixels (0.39). The pixel size of 0.3 arcmin assumes optimal viewing – but about 75% of the population have 20/20 vision, with or without corrective measures. 20/20 vision implies an acuity of 1 arc minute, which means a pixel size of 1×1 arcmin². This could mean a simple 75 megapixels. There are three other factors which complicate this: (i) these calculations assume uniform optimal acuity, which is very rarely the case, (ii) vision is binocular, not monocular, and (iii) the field of view is likely not a rectangle.

For binocular vision, assume each eye has a horizontal field of view of 155°, with an overlap of 120° (120° of vision from each eye is binocular, the remaining 35° in each eye is monocular). This results in an overall horizontal field of view of 190°. Using 190° and 1 arc minute acuity, we get a combined total of 92 megapixels. If we change the acuity to 0.3 we get over 1 gigapixel. Quite a range.
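Plugging the other scenarios discussed above into the same eye_megapixels sketch reproduces the whole range:

```python
print(eye_megapixels(155, 135, 0.3))    # wider monocular field, 0.3 arcmin  -> ~837
print(eye_megapixels(155, 135, 0.39))   # 0.39 arcmin (60/77/2, Curcio90)    -> ~495
print(eye_megapixels(155, 135, 1.0))    # 20/20 acuity, 1 arcmin             -> ~75
print(eye_megapixels(190, 135, 1.0))    # binocular 190° field, 1 arcmin     -> ~92
print(eye_megapixels(190, 135, 0.3))    # binocular field, 0.3 arcmin        -> ~1026 (over a gigapixel)
```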

All these calculations are mere musings – there are far too many variables to consider in trying to calculate a generic number to represent the megapixel equivalent of the human visual system. The numbers I have calculated are approximations only to show the broad range of possibilities based solely on a few simple assumptions. In the next couple of posts we’ll look at some of the complicating factors, such as the concept of uniform acuity.

REFS:
(Curcio90) Curcio, C.A., Sloan, K.R., Kalina, R.E., Hendrickson, A.E., “Human photoreceptor topography”, The Journal of Comparative Neurology, 292, pp.497-523 (1990)