Why aesthetic appeal in image processing matters

What makes us experience beauty?

I have spent over two decades writing algorithms for image processing, yet I have never really created anything truly fulfilling. Why? Because it is hard to create generic filters, especially for tasks such as image beautification. In many ways, improving the aesthetic appeal of photographs involves modifying the content of an image in unnatural ways. It doesn’t matter how AI-ish an algorithm is, it cannot fathom the concept of aesthetic appeal. A photograph one person finds pleasing may be boring to others, just as a blank canvas is considered art by some, but not by others. No amount of mathematical manipulation will lead to an algorithmic panacea of aesthetics. We can modify the white balance and play with curves, indeed we can make 1001 changes to a photograph, but the final outcome will be perceived differently by different people.

After spending years researching image processing algorithms, and designing some of my own, it wasn’t until I decided to take the art of acquiring images to a greater depth that I realized algorithms are all well and good, but there is likely little need for the plethora of algorithms created every year. Once you pick up a camera, and start playing with different lenses and different camera settings, you begin to realize that part of the nuance of any photograph is its natural aesthetic appeal. Sure, there are things that can be modified to improve aesthetic appeal, such as contrast enhancement or improving sharpness, but images also contain unfocused regions that contribute to their beauty.

If you approach image processing purely from a mathematical (or algorithmic) viewpoint, what you are trying to achieve is some sort of utopia of aesthetics. But this is almost impossible, largely because every photograph is unique. It is possible to improve the acuity of objects in an image using techniques such as unsharp masking, but it is impossible to resurrect a blurred image – but maybe that’s the point. One could create a fantastic filter that sharpens an image beautifully, but with the sharpness of modern lenses, that may not be practical. Consider this example of a photograph taken in Montreal. The image has good definition of colour, and has a histogram which is fairly uniform. There isn’t a lot that can be done to this image, because it truly does represent the scene as it exists in real life. If I had taken this photo on my iPhone, I would be tempted to post it on Instagram, and add a filter… which might make it more interesting, but maybe only from the perspective of boosting colour.

aestheticAppeal1

A corner hamburger joint in Montreal – original image.

Here is the same image with only the colour saturation boosted (by ×1.6). Have its visual aesthetics been improved? Probably. Our visual system would say it is improved, but that is largely because our eyes are tailored to interpret colour.

aestheticAppeal2

A corner hamburger joint in Montreal – enhanced image.
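A saturation boost like the one above is easy to sketch: convert each pixel to HSV, scale the saturation channel, clamp it, and convert back. Here is a minimal per-pixel version in Python using only the standard library – purely illustrative, not the actual tool used on the image, though the 1.6 factor matches the example:

```python
import colorsys

def boost_saturation(rgb, factor=1.6):
    """Scale the HSV saturation of one RGB pixel (channels in [0, 1])."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    s = min(1.0, s * factor)          # clamp so we never over-saturate
    return colorsys.hsv_to_rgb(h, s, v)

# a dull red becomes a more vivid red; brightness (V) is unchanged
print(boost_saturation((0.5, 0.25, 0.25)))
```

Applied to every pixel of an RGB image, this is essentially what a basic saturation slider does – and notice that hue and brightness are untouched, which is why the boosted image still looks like the same scene.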

If you take a step back from the abyss of algorithmically driven aesthetics, you begin to realize that too few individuals in the image processing community have taken the time to really understand the qualities of an image. Each photograph is unique, and so the idea of generic image processing techniques is highly flawed. Generic techniques work sufficiently well in machine vision applications where the lighting is uniform, and the task is also uniform, e.g. inspection of rice grains, or identification of burnt potato chips. No aesthetics are needed, just the ability to isolate an object and analyze it for whatever quality is needed. It’s one of the reasons unsharp masking has always been popular. Alternative algorithms for image sharpening really don’t work much better. And modern lenses are sharp, in fact many people would be more likely to add blur than take it away.
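Since unsharp masking keeps coming up: the technique is nothing more than adding back a scaled difference between the image and a blurred copy of itself. A one-dimensional sketch of the idea (a 3-tap moving average stands in for the usual Gaussian blur; `amount` is the standard strength parameter):

```python
def unsharp_mask_1d(signal, amount=1.0):
    """Sharpen a 1-D signal: result = original + amount * (original - blurred)."""
    n = len(signal)
    # simple 3-tap moving average as the "blur" (edges are clamped)
    blurred = [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

# an edge (0 -> 1) gains overshoot on both sides, which reads as "sharper"
print(unsharp_mask_1d([0, 0, 0, 1, 1, 1]))
```

The overshoot on either side of the edge (the dip below 0 and the peak above 1) is exactly the halo that makes unsharp-masked images read as sharper – and also why over-doing the `amount` looks so artificial.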

 

In-camera keystone compensation (Olympus) (ii)

So I took some photographs using the Olympus keystone compensation on a trip to Montreal. Most of them deal with buildings that are leaning back, which is the classic case when trying to photograph a building. The first set deals with some landscape-format photographs. In both these photographs I could not move any further back, and both were taken with the Olympus 12-40mm, set at its widest angle (12mm, or 24mm full-frame equivalent). It was possible to correct both images without losing any of the building.

keystone correction of photographs
Originals (left), keystone corrected (right)

The second case deals with portrait-format photographs. In both cases it was slightly more challenging to make sure the entire building was in the frame, but doing it in-situ it was possible to ensure this happened. Doing it in post-processing may result in the loss of a portion of the photograph. In the lower image I had enough leeway to position the keystone-corrected frame in such a manner that the building is surrounded by ample space.

keystone correction of photographs
Originals (left), keystone corrected (right)

Compensating for perspective distortion often comes at a price. Modifying the geometry of a photograph means that less will fit in the photograph. Taking a photograph too close to a building may mean something is cut off.

Horizontal keystone correction can sometimes be more difficult, because the distortion is usually a compound distortion. In the example below, the photograph was taken slightly off-centre, producing an image which is distorted both from a horizontal and a vertical perspective.

keystone correction
Complex distortion
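Under the hood, vertical keystone correction is a geometric remapping: each row of the distorted image is stretched or squeezed by a factor that varies with height. A toy coordinate-mapping sketch of that idea (the linear row-scale model and the `top_scale` parameter are my simplifications; the camera and post-processing tools use a full projective transform):

```python
def keystone_source_x(x, y, width, height, top_scale):
    """Map an output pixel (x, y) back to the source x coordinate in a
    vertically keystoned image. Rows at the top are squeezed by
    top_scale (< 1); the bottom row is left untouched."""
    # interpolate the row scale from top_scale (y=0) to 1.0 (bottom row)
    scale = top_scale + (1.0 - top_scale) * (y / (height - 1))
    cx = (width - 1) / 2.0                 # scale about the image centre line
    return cx + (x - cx) * scale

# the top-left corner of the output pulls from well inside the distorted
# frame, which is why corrected images lose some of their edges
print(keystone_source_x(0, 0, 101, 101, top_scale=0.5))
```

This also makes the trade-off discussed above concrete: any pixel that maps outside the source frame simply has no data, so the corrected image must either crop or shrink.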

Is there a loss in aesthetic appeal? Maybe. Food for future thought.

In-camera keystone compensation (Olympus) (i)

The Olympus OM-D EM5 Mark II has a completely cool feature they call keystone compensation. It’s a kind-of weird name – but dig a little deeper and you run into the keystone effect, which is the apparent distortion of an image caused by projecting it onto an angled surface. It basically makes a square look like a trapezoid, which is the shape of an architectural stone known as a keystone. Normally when you take a photograph of a building, this effect comes into play. Reducing the keystone effect is called keystone correction. There are special lenses that remove this distortion, i.e. tilt-shift lenses. Now Olympus has introduced an algorithm which compensates for the keystone effect. Here is an example of keystone correction (distortion is shown as the opaque pink region).

keystone correction
Keystone correction before (left) and after (right)

Olympus has introduced an algorithm on some of their cameras (e.g. EM5ii) which compensates for the keystone effect. First, you have to enable Keystone Correction in “Shooting Menu 2”.

Olympus EM-5(ii)
Turning on keystone correction on an Olympus EM-5(ii)

Then it’s a simple matter of using the front or rear dial for correction. The front dial is used for horizontal correction, and the rear dial for vertical correction. Note that it doesn’t allow both types of keystone compensation to be used at the same time. If you decide to change from vertical to horizontal correction, you have to reset the vertical component to 0. Frame the shot and adjust the effect in the display using the front and rear dials, then select the area to be recorded using the direction buttons (surrounding the OK button).

keystoneOLY4
Keystone correction screen

The only trick is using the INFO button to switch between keystone compensation and adjusting exposure compensation. In fact, if you use keystone correction often, I would program it into one of the function buttons.

Keystone Compensation mode enables keystone distortion to be corrected when shooting architecture and product photography without resorting to tilt-shift lenses or post-processing corrections in Photoshop.

Is the eye equivalent to a 50mm lens?

So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again, a bunch of numbers seem to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake let’s look at the situation in a different way by asking what lens focal length most closely replicates the Angle Of View (AOV) of the human visual system (HVS).

One common idea floating around is that the “normal” length of a lens is 43mm, because a “full-frame” film or sensor is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the various AOVs for each of the dimensions using the formula: 2 arctan(d/2f), where d is the dimension, and f is the focal length. So for the 24×36mm frame with a 50mm lens, the diagonal gives: 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited for lenses, but probably not the right one, because few people think about a diagonal AOV. A horizontal one is more common, using d=36mm. Now we get 39.6°.
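The AOV formula is easy to check with a few lines of Python:

```python
import math

def aov_degrees(d_mm, f_mm):
    """Angle of view: 2·arctan(d / 2f), for sensor dimension d and focal length f."""
    return math.degrees(2 * math.atan(d_mm / (2 * f_mm)))

print(round(aov_degrees(43.3, 50), 1))  # diagonal AOV of a 50mm lens -> 46.8
print(round(aov_degrees(36.0, 50), 1))  # horizontal AOV of a 50mm lens -> 39.6
```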

So now let’s consider the AOV of the HVS. The normal AOV of the HVS, assuming binocular vision constraints, is roughly 120° (H) by 135° (V), but the reality is that our AOV with respect to targeted vision is probably only 60° horizontally and 10-15° vertically from a point of focus. Of the horizontal vision, likely only 30° is focused. Let’s be conservative and assume 60°.

So a 50mm lens is not close. What about a 35mm lens? This ends up with a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us the 30° of focused vision. What if we wanted a lens AOV equivalent to the binocular 120° horizontal view? We would need a 10.5mm lens, which is starting to get a little fish-eyed.
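Working backwards gives the focal lengths quoted above: for a desired horizontal AOV, invert the formula to f = d / (2·tan(AOV/2)), with d = 36mm. A quick check (the small differences from the 31mm, 68mm and 10.5mm figures in the text are just rounding):

```python
import math

def focal_for_aov(aov_deg, d_mm=36.0):
    """Focal length giving a desired AOV across dimension d: f = d / (2·tan(AOV/2))."""
    return d_mm / (2 * math.tan(math.radians(aov_deg) / 2))

for aov in (60, 30, 120):
    print(f"{aov} deg -> {focal_for_aov(aov):.1f}mm")
```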

There is in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.

Photographs and the craft of chance

Photographs are the encapsulation of our lives. They are snapshots, brief interludes into slices of time. Times long past. Memories of fighting in the trenches in WW1, the landings at Normandy, life in small Italian mountain villages. The best and worst of our histories. Photographs capture such fleeting moments that in most cases they would be impossible to reproduce. Photography is at its core the art of chance. Of being in the right place at the right time, of being able to capture just the right amount of photons entering the camera. Blink, and it could all be different. Before photographs, our history was handed down through generations in stories, or paintings upon the wall. But neither of these is fleeting; they are thought-out, prescribed renditions of history. Photographs are not; they are raw, evocative, and often need no explanation. And while they could be considered by some to be art, they are crafted using tools which allow light to be captured. The true result is in nature’s control.

Capturing natural life is truly the essence of the craft of chance. That one photograph that captures an insect holding still, almost posing for the shot – blink and it will move on to its next feast.

The wonder of the human eye

If you ever wonder what marvels lie in the human visual system, perform this exercise. Next time you’re a passenger in a moving car driving through a region with trees, look out at the landscape passing you by. If you look at the leaves on a particular tree you might be able to pick out individual leaves. Now track those leaves as you pass the scene. You will be able to track them because the human visual system is highly adept at pinpointing objects, even moving ones. The best high-resolution camera could either take a video, or a photograph with an incredibly fast shutter speed, effectively freezing the frame. Cameras find tracking at high speed challenging.

Tracking and interpreting is even more challenging. It is the interpretation that sets the HVS apart from its digital counterparts. It is likely one of the attributes that allowed us to evolve. Access to fine detail, motion analysis, visual sizing of objects, colour differentiation – all things that are done less effectively in the digital realm. Notice that I said effectively, not efficiently. For the HVS does have limitations – lack of zoom, inability to store pictures, no macro abilities, and no filtering. The images we do retain in our minds are somewhat abstract, lacking the clarity of photographs. But memories exist as more than mere visual representations. They encompass the amalgam of our senses, as visuals intersperse with smell, sound and touch.

Consider the photograph above, of some spruce tips. The image shows the needles as a vibrant light green. What the picture fails to impart is the feel and smell associated with it: the resiny smell of pine, and the soft, almost fuzzy feeling of the tips. These sensory memories are encapsulated in the image stored in our minds. We can also conceptualize the information in the photograph using colour, shape and texture.

Should a camera think?

Photographer Arnold Newman (1918-2006) once said “The camera is a mirror with a memory, but it cannot think.” Has anything really changed since analog cameras evolved into digital ones? Do cameras take better pictures, or do they just take better “quality” pictures because certain tasks, e.g. exposure, have been automated? Digital cameras automatically focus a scene, and do just about everything else necessary to automate the process (except pick the scene). They perform facial recognition, and the newer ones even have types of machine learning that do various things – most likely making the task of photography even “easier”. But what’s the point? Part of the reason for taking a photograph is the experience involved. Playing with the settings, maybe focusing the lens manually – all this gives better insight into the process of taking a photograph. Otherwise it becomes just another automated phenomenon in our lives – which is *ok* for taking snaps on mobile devices, I guess… but not on cameras.

What is the focal length of the human eye?

It’s funny the associations people make between cameras and the human eye. Megapixels is one, but focal length is another. It probably stems from the notion that a full-frame 50mm focal length is as close as a camera gets to human vision (well, not quite). While resolution has to do with the number of pixels and the acuity of those pixels, i.e. how the retina works, focal length has to do with other components of the eye. Search the web and you will find a whole bunch of different numbers when it comes to the focal length of the eye; in fact, there are a number of definitions based on the optical system.

Now the anatomy of the eye has a role to play in defining the focal length. A camera lens is composed of a series of lens elements separated by air. The eye, conversely, is composed of two lenses separated by fluids. At the front of the eye is a tough, transparent layer called the cornea, which can be considered a fixed lens. Behind the cornea is a fluid known as the aqueous humor, filling the space between the cornea and the lens. The lens is transparent, like the cornea, but it can be reshaped to allow focusing of objects at differing distances (the process of changing the shape of the lens is called accommodation, and is mediated by the ciliary muscles). From the lens, light travels through another, larger layer of fluid known as the vitreous humor on its way to the retina.

When the ciliary muscles are relaxed, the focal length of the lens is at its maximum, and objects at a distance are in focus. When the ciliary muscles contract, the lens assumes a more convex shape, and the focal length of the lens is shortened to bring closer objects into focus. These two limits are called the far-point and near-point respectively. 

Given this, there seem to be two ways people measure the focal length: (i) diopter, or (ii) optics based.

Focal length based on diopter

To understand the diopter-based focal length of the eye, we first have to understand the diopter, a measure of the strength (refractive power) of a lens, calculated as the reciprocal of the focal length in metres. The refractive power of a lens is its ability to bend light. A 1-diopter lens will bring a parallel beam to a focus at 1 metre. So the calculation is:

Diopter = 1 / (focal length in metres)

The average human eye functions in such a way that for a parallel beam of light coming from a distant object to be brought into focus on the retina, the eye must have an optical power of about 59-60 diopters. In the compound lens of the human eye, about 40 diopters comes from the front surface of the cornea, the rest from the variable-focus (crystalline) lens. Using this information we can calculate the focal length of the human eye as 1/diopter: 1/59 = 0.0169m and 1/60 = 0.0167m, or roughly 17mm.
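The diopter arithmetic is worth a quick sketch (working in millimetres, so f = 1000/D):

```python
def focal_length_mm(diopters):
    """Focal length (mm) of a lens with given refractive power: f = 1000 / D."""
    return 1000.0 / diopters

for d in (59, 60):
    print(f"{d} diopters -> {focal_length_mm(d):.1f}mm")  # 16.9mm and 16.7mm
```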

Focal length based on optics

From the viewpoint of the physical eye there are a number of distances to consider. Consider the reduced eye, with a single principal plane and nodal point. The principal plane is 1.5mm behind the anterior surface of the cornea, and the nodal point 7.2mm behind it. This gives an anterior focal length of 17.2mm, measured from the single principal plane to the anterior focal point (F1), 15.7mm in front of the anterior surface of the cornea. The posterior focal length of 22.9mm is measured from the same plane to the posterior focal point (F2) on the retina.

The problem with some calculations is that they fail to take into account the fluid-filled properties of the eye. Now calculate the dioptric power of both focal lengths, using the refractive index of the vitreous humour (1.337) for the posterior focal length:

diopter, anterior focal length = 1000/17.2 = 58.14
diopter, posterior focal length = (1000 * 1.337)/22.9 = 58.38

What about aperture?

What does this allow us to do? Calculate the aperture range of the human eye. If we assume iris diameters of 2-8mm, and use both the 17mm and 22.9mm focal lengths, we get the following aperture ranges:

17mm : f2.1 – f8.5
22.9mm : f2.9 – f11.5
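The f-numbers above are just focal length divided by iris diameter; the f2.1–f8.5 and f2.9–f11.5 ranges are these values rounded:

```python
def f_number(focal_mm, aperture_mm):
    """Relative aperture: N = f / D (focal length over iris diameter)."""
    return focal_mm / aperture_mm

# widest (8mm pupil) and narrowest (2mm pupil) for both focal-length estimates
for f in (17.0, 22.9):
    print(f"{f}mm: f/{f_number(f, 8):.2f} - f/{f_number(f, 2):.2f}")
```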

Does any of this really matter? Only if we were making a comparison to the “normal” lens found on a camera – the 50mm. We’ll continue this in a future post.

Resolution of the human eye (iii) – things that affect visual acuity

So now we have looked at the number of overall pixels, and the acuity of pixels throughout that region. If you have read the last two posts, you, like me, might surmise that there is no possibility of associating a value with the resolution of the eye. And you would probably be right, because on top of everything else there are a number of factors which affect visual acuity.

  1. Refractive errors – Causes defocus at the retina, blurring out fine detail and sharp edges. A good example is myopia (short-sightedness).
  2. Size of pupil – Pupils act like camera apertures, allowing light into the eye. Large pupils allow more light in, possibly affecting resolution by aberrations in the eye.
  3. Illumination of the background – Less light means a lower visual acuity. As cones are the acuity masters, low light reduces their capabilities.
  4. Area of retina stimulated – Visual acuity is greatest in the fovea. At 2.5 degrees from the point the eyes are fixated upon, there is approximately a 50% loss in visual acuity.
  5. Eye movement – The eyes move constantly (e.g. when reading a book your head stays still while your eyes scan the page).

Complicated, right? So what is the answer? We have looked at how non-uniform acuity may affect the resolution of the human eye. The last piece of the puzzle (maybe?) in trying to approximate the resolution of the human eye is the shape of our visual scope. When we view something, what is the “shape of the picture” being created? On a digital camera it is a rectangle. Not so with the human visual system. Because of the non-uniformity of acuity, the shape of the region being “captured” really depends on the application. If you are viewing a landscape vista, you are looking at an overall scene, whereas when reading a book the “capture area” is quite narrow (although the overall shape of information being input is the same, peripheral areas are seemingly ignored, because the fovea is concentrating on processing the words being read). To provide a sense of the visual field of binocular vision, here is an image from a 1964 NASA report, Bioastronautics Data Book:

This diagram shows the normal field of view of a pair of human eyes. The central white portion represents the region seen by both eyes. The dashed portions, right and left, represent the regions seen by the right and left eyes, respectively. The cut-off by the brows, cheeks, and nose is shown by the black area. Head and eyes are motionless in this case. Not quite an ellipse, but almost.

You can see how this complicates things even further when trying to approximate resolution. Instead of a rectangular field of view of 135°×190°, assume the shape of an ellipse, which gives (95×67.5)×π ≈ 20145 square degrees, converting to 72.5 megapixels for 1-arc-minute-sized pixels – marginally lower than the 75 megapixels of the bounding rectangle.
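The elliptical arithmetic can be sketched directly – 1-arc-minute pixels means 60×60 = 3600 pixels per square degree:

```python
import math

PIXELS_PER_SQ_DEG = 60 * 60   # 1-arc-minute pixels: 60 x 60 per square degree

def megapixels_ellipse(h_deg, v_deg):
    """Megapixels in an elliptical field of view spanning h_deg x v_deg."""
    area_deg2 = math.pi * (h_deg / 2) * (v_deg / 2)   # pi * a * b, semi-axes
    return area_deg2 * PIXELS_PER_SQ_DEG / 1e6

print(round(megapixels_ellipse(190, 135), 1))  # -> 72.5
```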

So what’s the answer? What *is* the resolution of the human eye? If you wanted a number to represent the eye’s pixelation, I would verge on the conservative side and give the resolution of the eye a relatively low number, and by this I mean using the 1-arc-minute acuity value, and estimating the “resolution” of the human visual system at somewhere around 100 megapixels. This likely factors in some sort of compromise between the region of the fovea with high acuity, and the remainder of the field of view with low resolution. It may also take into account the fact that the human visual system operates more like streaming video than it does a photograph. Can the eye be compared to a camera? No, it’s far too complicated trying to decipher a quantitative value for an organic structure comprised of 80% gelatinous tissue.

Maybe some mysteries of the world should remain just that.

Why the camera lies…

‘The old saying “The camera cannot lie”, is wrong of course. Photography is not objective. Firstly, every photograph is an abstract, a transformation of colour values into the grey-scale. already here there are endless possibilities of subjective representation. Secondly, only a small tone-scale is at our disposal in which to express the infinite wealth of tone values which we find in nature, from gleaming white down to the deepest black; it comprises only the thousandth even ten-thousandth, part of the original tone-scale. Thus we have not only to find an analogy to colour, we have also to transpose the entire graduation of light intensity. Thus consideration of style, of composition, play an important role in “objective” photography in addition to technical considerations, and, most of all, the personal conception of nature and ability to re-create. The photographic problem goes, therefore, much deeper than the mere depiction of something seen in the world of phenomena.’

Helmut Gernsheim in New Photo Vision, Fountain Press, 1942.