Why photographs need very little processing

I recently read an article on photographing a safari in Kenya, in which the author, Sarfaraz Niazi, made an interesting statement. While describing the process of taking 8000 photos on the trip, he remarked on post-processing, recalling a lesson his father taught him at age five – that “every picture is carved out in perpetuity as soon as you push the shutter”. There is so much truth in this statement. Photographs are snapshots of life, and the world around us is rarely perfect, so why should a photograph be any different? It is not necessary to process images heavily – there are of course ways to adjust the contrast, improve the sharpness, or tweak the exposure somewhat, but beyond that, what is necessary? Add a filter? Sure, that’s fun on Instagram, but it shouldn’t be necessary for camera-based photographs.

Many years of attempting to derive algorithms to improve images have taught me that there are no generic one-size-fits-all algorithms. Each photograph must be modified in a manner that suits the ultimate aesthetic appeal of the image. An algorithm manipulates through quantitative evaluation, having no insight into the content or qualitative aspects of the photograph. No AI algorithm will ever be able to replicate the human eye’s ability to determine aesthetic value – and every person’s aesthetic interpretation will be different. Add too much computational photography to a digital camera, and you end up with a machine-driven photograph. Photography is a craft as much as an art, and should not be controlled solely by algorithms. Consider the following photograph, taken in Glasgow, Scotland. It suffers from being taken on quite a hot summer day, when the sky was somewhat hazy. The hazy sky is one factor that reduces the colour intensity in the photograph.


Original photograph

In all likelihood, this photograph represents the true scene quite accurately. An increase in saturation and a modification of exposure will produce a more vivid photograph, shown below. One of the Instagram filters would likely also have done a nice job of “improving” the image. Was the enhancement necessary? Maybe, maybe not. The enhancement does improve the colours within the image, and the contrast between objects.


Post-processed photograph

Aesthetically motivated picture processing

For years I wrote scientific papers on various topics in image processing, but what I learnt from that process was that few of the papers written are actually meaningful. For instance, in trying to create new image-sharpening algorithms, many people forgot the whole point of sharpening. A photographer either strives for sharpness in an entire image, or uses blur as a means of focusing attention on something of interest (which is in focus, and therefore sharp). Many sharpening algorithms have been developed with the idea of sharpening the whole image… but this often misses the point. Why does the photo need to be sharpened? What is the benefit? Simple sharpening with unsharp masking (an unfortunate name for a filter) works quite well at its task. But it was designed at a time when images were small, and filters were generally simple 3×3 constructs. Applying the original filter to a 24MP, 4000×6000-pixel image will make little, if any, difference. On the other hand, blurring an image does nothing for its aesthetics unless it is selective, in essence trying to mimic bokeh in some manner.
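As a rough sketch of the scale problem, here is one way unsharp masking can be made to track image size, using Pillow. The 0.1% radius factor and the file names are assumptions chosen for illustration, not a standard recipe:

# A minimal sketch of scale-aware unsharp masking using Pillow.
# A fixed, 3x3-style radius has almost no visible effect on a
# 24MP image, so the radius is scaled with the long edge.
from PIL import Image, ImageFilter

def unsharp_scaled(path, strength=150):
    img = Image.open(path)
    # The 0.1%-of-long-edge factor is an assumption for
    # illustration, not an established rule of thumb.
    radius = max(1.0, 0.001 * max(img.size))
    return img.filter(ImageFilter.UnsharpMask(radius=radius,
                                              percent=strength,
                                              threshold=3))

sharpened = unsharp_scaled("photo.jpg")   # hypothetical file name
sharpened.save("photo_sharpened.jpg")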

Much of what happens in image processing (aside from machine vision) is aesthetically based. The true results of image processing cannot be evaluated in a quantitative manner, and that puts it at odds with scientific methodology. But who cares? Scientific thought in the academic realm is far too driven by pure science, with little in the way of pure inventing. But alas, few academics think this way; most take on the academic mantra and are hogtied to doing things in a specified way. I no longer subscribe to this train of thought, and I don’t really know if I ever did.

Montreal metro with motion blur

This picture shows motion blur resulting from a moving subway car, whilst the rest of the picture remains in focus. The motion blur is part of the intrinsic appeal of the photograph – yet there is no way of objectively quantifying its aesthetic value; it can only be evaluated qualitatively and subjectively.

Aesthetically motivated image processing is a perfect fit for photographs because, while there are theoretical underpinnings to how lenses are designed, and technical principles behind how a camera works, the ultimate result – a photograph – is the culmination of the mechanical ability of the camera and the artistic ability of the photographer. Machine vision, the type used in manufacturing facilities to determine things like product defects, is different, because it is tasked with precise automated photography in ideal, controlled conditions. Developing algorithms to remove haze from natural scenes, or to reduce glare, is extremely difficult; such photographs may be best taken when there is no haze. Aesthetic-based picture processing is subjectively qualitative, and there is nothing wrong with that. It is one of the criteria that sets humans apart from machines – the inherent ability to visualize things differently. Some may find bokeh creamy while others may find it too distracting, but that’s okay. You can’t create an algorithm to describe bokeh because it is an aesthetic thing – the same way it’s impossible to quantify taste, or to distinguish exactly what umami is.

Consider the following quote from Bernard Berenson (Aesthetics, Ethics, and History) –

‘The eyes without the mind would perceive in solids nothing but spots or pockets of shadow and blisters of light, chequering and criss-crossing a given area. The rest is a matter of mental organization and intellectual construction. What the operator will see in his camera will depend, therefore, on his gifts, and training, and skill, and even more on his general education; ultimately it will depend on his scheme of the universe.’

Why aesthetic appeal in image processing matters

What makes us experience beauty?

I have spent over two decades writing algorithms for image processing, yet I have never really created anything uber-fulfilling. Why? Because it is hard to create generic filters, especially for tasks such as image beautification. In many ways, improving the aesthetic appeal of photographs involves modifying the content of an image in non-natural ways. It doesn’t matter how AI-ish an algorithm is, it cannot fathom the concept of aesthetic appeal. A photograph one person finds pleasing may be boring to others, just as a blank canvas is considered art by some, but not by others. No amount of mathematical manipulation will lead to an algorithmic panacea of aesthetics. We can modify the white balance and play with curves – indeed, we can make 1001 changes to a photograph – but the final outcome will be perceived differently by different people.

After spending years researching image processing algorithms, and designing some of my own, it wasn’t until I decided to take the art of acquiring images to a greater depth that I realized algorithms are all well and good, but there is likely little need for the plethora of algorithms created every year. Once you pick up a camera, and start playing with different lenses and different camera settings, you begin to realize that part of the nuance of any photograph is its natural aesthetic appeal. Sure, there are things that can be modified to improve aesthetic appeal, such as enhancing contrast or improving sharpness, but images also contain unfocused regions that contribute to their beauty.

If you approach image processing purely from a mathematical (or algorithmic) viewpoint, what you are trying to achieve is some sort of utopia of aesthetics. But this is almost impossible, largely because every photograph is unique. It is possible to improve the acuity of objects in an image using techniques such as unsharp masking, but it is impossible to resurrect a blurred image – but maybe that’s the point. One could create a fantastic filter that sharpens an image beautifully, but given the sharpness of modern lenses, that may not be practical. Consider this example of a photograph taken in Montreal. The image has good definition of colour, and a histogram which is fairly uniform. There isn’t a lot that can be done to this image, because it truly does represent the scene as it exists in real life. If I had taken this photo on my iPhone, I would be tempted to post it on Instagram and add a filter… which might make it more interesting, but maybe only from the perspective of boosting colour.


A corner hamburger joint in Montreal – original image.

Here is the same image with only the colour saturation boosted (by ×1.6). Have its visual aesthetics been improved? Probably. Our visual system would say so, but that is largely because our eyes are tailored to interpret colour.


A corner hamburger joint in Montreal – enhanced image.
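For the curious, the boost above amounts to something like the following Pillow snippet (the file names are placeholders):

# A minimal sketch of a x1.6 saturation boost using Pillow.
from PIL import Image, ImageEnhance

img = Image.open("photo.jpg")                    # hypothetical input
boosted = ImageEnhance.Color(img).enhance(1.6)   # 1.0 = unchanged
boosted.save("photo_saturated.jpg")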

If you take a step back from the abyss of algorithmically driven aesthetics, you begin to realize that too few individuals in the image processing community have taken the time to really understand the qualities of an image. Each photograph is unique, so the idea of generic image processing techniques is deeply flawed. Generic techniques work sufficiently well in machine vision applications where the lighting is uniform and the task is also uniform, e.g. inspection of rice grains, or identification of burnt potato chips. No aesthetics are needed, just the ability to isolate an object and analyze it for whatever quality is required. It’s one of the reasons unsharp masking has always been popular – alternative algorithms for image sharpening really don’t work much better. And modern lenses are sharp; in fact, many people would be more likely to add blur than take it away.


In-camera keystone compensation (Olympus) (ii)

So I took some photographs using the Olympus keystone compensation on a trip to Montreal. Most of them deal with buildings that are leaning back, which is the classic case when trying to photograph a building. The first set deals with some landscape-format photographs. In both cases I could not move any further back, and both were taken with the Olympus 12-40mm set at its widest angle (12mm, or 24mm full-frame equivalent). It was possible to correct both images without losing any of the building.

Originals (left), keystone corrected (right)

The second case deals with portrait-format photographs. In both cases it was slightly more challenging to make sure the entire picture was in the frame, but doing it in-situ made it possible to ensure this happened. Doing it in post-processing may result in the loss of a portion of the photograph. In the lower image I had enough leeway to position the keystone-corrected frame in such a manner that the building is surrounded by ample space.

Originals (left), keystone corrected (right)

Compensating for perspective distortion often comes at a price: modifying the geometry of a photograph means that less will fit in the frame. Taking a photograph too close to a building may mean something is cut off.

Horizontal keystone correction can sometimes be more difficult, because the distortion is usually compound. In the example below, the photograph was taken slightly off-centre, producing an image distorted in both the horizontal and vertical directions.

Complex distortion

Is there a loss in aesthetic appeal? Maybe. Food for future thought.

In-camera keystone compensation (Olympus) (i)

The Olympus OM-D EM5 Mark II has a completely cool feature they call keystone compensation. It’s a kind-of weird name – but dig a little deeper and you run into the keystone effect, which is the apparent distortion of an image caused by projecting it onto an angled surface. It basically makes a square look like a trapezoid, the shape of an architectural stone known as a keystone. Normally, when you take a photograph of a building, this effect comes into play. Reducing the keystone effect is called keystone correction. There are special lenses that remove this distortion, i.e. tilt-shift lenses. Now Olympus has introduced an algorithm which compensates for the keystone effect. Here is an example of keystone correction (the distortion is shown as the opaque pink region).

Keystone correction before (left) and after (right)
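Under the hood, this sort of correction is essentially a perspective (homography) warp. Here is a minimal sketch using OpenCV – the corner coordinates are hypothetical stand-ins for the measured distortion, and this is not Olympus’s actual algorithm:

# A sketch of keystone correction as a perspective warp: the four
# corners of the leaning building (converging towards the top) are
# remapped to the corners of the frame, making the verticals parallel.
import cv2
import numpy as np

img = cv2.imread("building.jpg")   # hypothetical input image
h, w = img.shape[:2]

# Measured building corners in the source image (hypothetical values,
# order: top-left, top-right, bottom-left, bottom-right)...
src = np.float32([[420, 80], [3580, 80], [0, 2990], [4000, 2990]])
# ...and where they should land in the corrected frame.
dst = np.float32([[0, 0], [w, 0], [0, h], [w, h]])

M = cv2.getPerspectiveTransform(src, dst)
corrected = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("building_corrected.jpg", corrected)

Note how the top of the frame gets stretched: this is the price mentioned earlier – geometry is traded for coverage.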

On some of their cameras (e.g. the EM5 Mark II), this compensation is built into the shooting workflow. First, you have to enable Keystone Correction in “Shooting Menu 2”.

Turning on keystone correction on an Olympus EM-5(ii)

Then it’s a simple matter of using the front or rear dial for correction. The front dial is used for horizontal correction, and the rear dial for vertical correction. Note that both types of keystone compensation cannot be used at the same time; if you decide to change from vertical to horizontal correction, you have to reset the vertical component to 0. Frame the shot and adjust the effect in the display using the dials, then select the area to be recorded using the direction buttons (surrounding the OK button).

Keystone correction screen

The only trick is using the INFO button to switch between keystone compensation and adjustments to exposure compensation. In fact, if you use keystone correction often, I would program it into one of the function buttons.

Keystone Compensation mode enables keystone distortion to be corrected when shooting architecture and product photography without resorting to tilt-shift lenses or post-processing corrections in Photoshop.

Is the eye equivalent to a 50mm lens?

So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again, a bunch of numbers seem to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake, let’s look at the situation in a different way by asking what lens focal length most closely replicates the angle of view (AOV) of the human visual system (HVS).

One common idea floating around is that the “normal” length of a lens is 43mm, because a “full-frame” film or sensor is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the various AOVs for each of the dimensions using the formula 2 arctan(d/(2f)), where d is the dimension and f is the focal length. So for the 24×36mm frame with a 50mm lens, the diagonal gives 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited for lenses, but it is probably not the right one, because few people think in terms of a diagonal AOV. A horizontal one is more intuitive; using d=36mm, we get 39.6°.
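The arithmetic is easy to reproduce in a few lines of Python; here is a small helper for both directions (the dimensions and focal lengths are just the examples used in this post):

# AOV in degrees for frame dimension d (mm) and focal length f (mm),
# and the inverse: the focal length giving a desired AOV.
import math

def aov(d_mm, f_mm):
    return 2 * math.degrees(math.atan(d_mm / (2 * f_mm)))

def focal_for_aov(d_mm, aov_deg):
    return d_mm / (2 * math.tan(math.radians(aov_deg) / 2))

print(aov(43.3, 50))          # diagonal, 50mm lens: ~46.8 degrees
print(aov(36, 50))            # horizontal, 50mm lens: ~39.6 degrees
print(focal_for_aov(36, 60))  # ~31mm for a 60 degree horizontal AOV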

So now let’s consider the AOV of the HVS. The AOV of the HVS under binocular vision constraints is roughly 120° (H) by 135° (V), but in reality our AOV with respect to targeted vision is probably only 60° horizontally and 10-15° vertically from a point of focus. Of the horizontal vision, likely only 30° is focused. Let’s be conservative and assume 60°.

So a 50mm lens is not close. What about a 35mm lens? That gives a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us the 30° of focused vision. And if we wanted a lens AOV equivalent to the binocular 120° horizontal view? We would need a 10.5mm lens, which is starting to get a little fish-eyed.

There is in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.

Photographs and the craft of chance

Photographs are the encapsulation of our lives. They are snapshots, brief interludes into slices of time. Times long past. Memories of fighting in the trenches in WW1, the landings at Normandy, life in small Italian mountain villages. The best and worst of our histories. Photographs capture fleeting moments that in most cases would be impossible to reproduce. Photography is, at its core, the art of chance. Of being in the right place at the right time, of capturing just the right amount of photons entering the camera. Blink, and it could all be different. Before photographs, our history was handed down through generations in stories, or paintings upon the wall. But neither of these is fleeting; they are thought-out, prescribed renditions of history. Photographs are not – they are raw, evocative, and often need no explanation. And while they could be considered by some to be art, they are crafted using tools which allow light to be captured. The true result is in nature’s control.

Capturing natural life is truly the essence of the craft of chance. That one photograph that captures an insect holding still, almost posing for the shot – blink and it will move on to its next feast.

The wonder of the human eye

If you ever wonder what marvels lie in the human visual system, perform this exercise. Next time you’re a passenger in a moving car, driving through a region with trees, look out at the landscape passing you by. If you look at the leaves on a particular tree, you might be able to pick out individual leaves. Now track those leaves as you pass the scene. You will be able to track them, because the human visual system is highly adept at pinpointing objects, even moving ones. The best a high-resolution camera could do is either take a video, or a photograph with an incredibly fast shutter speed, effectively freezing the frame. Cameras find tracking at high speed challenging.

Tracking and interpreting is even more challenging. It is the interpretation that sets the HVS apart from its digital counterparts, and it is likely one of the attributes that allowed us to evolve. Access to fine detail, motion analysis, visual sizing of objects, colour differentiation – all things that are done less effectively in the digital realm. Notice that I said effectively, and not efficiently. For the HVS does have limitations – no zoom, no ability to store pictures, no macro abilities, and no filtering. The images we do retain in our minds are somewhat abstract, lacking the clarity of photographs. But memories exist as more than mere visual representations. They encompass the amalgam of our senses, as visuals intersperse with smell, sound, and touch.

Consider the photograph above, of some spruce tips. The image shows the needles as a vibrant light green. What the picture fails to impart is the feel and smell associated with it: the resiny smell of pine, and the soft, almost fuzzy feeling of the tips. These sensory memories are encapsulated in the image stored in our minds. We can also conceptualize the information in the photograph using colour, shape, and texture.

Should a camera think?

Photographer Arnold Newman (1918-2006) once said, “The camera is a mirror with a memory, but it cannot think.” Has anything really changed since analog cameras evolved into digital ones? Do cameras take better pictures, or do they just take better “quality” pictures because certain tasks, e.g. exposure, have been automated? Digital cameras automatically focus a scene, and do just about everything else necessary to automate the process (except pick the scene). They perform facial recognition, and newer ones even have types of machine learning that do various things – most likely making the task of photography even “easier”. But what’s the point? Part of the reason for taking a photograph is the experience involved. Playing with the settings, maybe focusing the lens manually – all this gives better insight into the process of taking a photograph. Otherwise it becomes just another automated phenomenon in our lives – which is *ok* for taking snaps on mobile devices, I guess… but not on cameras.

What is the focal length of the human eye?

It’s funny the associations people make between cameras and the human eye. Megapixels is one, but focal length is another. It probably stems from the notion that a full-frame 50mm focal length is as close as a camera gets to human vision (well, not quite). While resolution has to do with “the number of pixels” and “the acuity of those pixels”, i.e. how the retina works, the focal length has to do with other components of the eye. Search the web and you will find a whole bunch of different numbers when it comes to the focal length of the eye; in fact, there are a number of definitions based on the optical system.

Now, the anatomy of the eye has a role to play in defining the focal length. A camera lens is composed of a series of lens elements separated by air. The eye, conversely, is composed of two lenses separated by fluids. At the front of the eye is a tough, transparent layer called the cornea, which can be considered a fixed lens. Behind the cornea is a fluid known as the aqueous humor, filling the space between the cornea and the lens. The lens is transparent, like the cornea, but it can be reshaped to allow focusing on objects at differing distances (the process of changing the shape of the lens is called accommodation, and is mediated by the ciliary muscles). From the lens, light travels through another, larger layer of fluid known as the vitreous humor on its way to the retina.

When the ciliary muscles are relaxed, the focal length of the lens is at its maximum, and objects at a distance are in focus. When the ciliary muscles contract, the lens assumes a more convex shape, and the focal length is shortened to bring closer objects into focus. These two limits are called the far point and near point respectively.

Given this, there seem to be two ways people measure the focal length: (i) diopter-based, or (ii) optics-based.

Focal length based on diopter

To understand the diopter-based focal length of the eye, we have to understand the diopter, a measure of the strength (refractive power) of a lens. It is calculated as the reciprocal of the focal length in metres. The refractive power of a lens is its ability to bend light; a 1-diopter lens will bring a parallel beam to a focus at 1 metre. So the calculation is:

Diopter = 1 / (focal length in metres)

The average human eye functions in such a way that for a parallel beam of light coming from a distant object to be brought into focus on the retina, the eye must have an optical power of about 59-60 diopters. In the compound lens of the human eye, about 40 diopters comes from the front surface of the cornea, the rest from the variable-focus (crystalline) lens. Using this information we can calculate the focal length of the human eye as 1/diopters: 1/59 m ≈ 16.9mm and 1/60 m ≈ 16.7mm, or roughly 17mm.
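As a quick check of the arithmetic:

# Diopter-to-focal-length conversion (result in millimetres).
def focal_length_mm(diopters):
    return 1000.0 / diopters

print(focal_length_mm(59))  # ~16.9 mm
print(focal_length_mm(60))  # ~16.7 mm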

Focal length based on optics

From the viewpoint of the physical eye, there are a number of distances to consider. Consider the reduced eye, with a single principal plane and a nodal point. The principal plane is 1.5mm behind the anterior surface of the cornea, and the nodal point is 7.2mm behind the anterior surface of the cornea. This gives an anterior focal length of 17.2mm, measured from the single principal plane to the anterior focal point (F1), 15.7mm in front of the anterior surface of the cornea. The posterior focal length of 22.9mm is measured from the same plane to the posterior focal point (F2) on the retina.

The problem with some calculations is that they fail to take into account the fluid-filled properties of the eye. Now calculate the dioptric power of both focal lengths, using the refractive index of the vitreous humour (1.337) for the calculation of the posterior focal length:

diopter, anterior focal length = 1000/17.2 = 58.14
diopter, posterior focal length = (1000 * 1.337)/22.9 = 58.38

What about aperture?

What does this allow us to do? Calculate the aperture range of the human eye. If we assume iris diameters of 2-8mm, and use both the 17mm and 22.9mm focal lengths, we get the following aperture ranges:

17mm : f2.1 – f8.5
22.9mm : f2.9 – f11.5
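These ranges follow from the usual definition of the f-number as focal length divided by aperture (pupil) diameter; a quick check in Python:

# f-number = focal length / pupil diameter.
def f_number(focal_mm, pupil_mm):
    return focal_mm / pupil_mm

for f in (17.0, 22.9):
    print(f, f_number(f, 8), f_number(f, 2))
# 17.0mm -> f2.1 to f8.5
# 22.9mm -> f2.9 to f11.5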

Does any of this really matter? Only if we were making a comparison to the “normal” lens found on a camera – the 50mm. We’ll continue this in a future post.