Aesthetically motivated picture processing

For years I wrote scientific papers on various topics in image processing, but what I learnt from that process was that few of the papers written are actually meaningful. For instance, in trying to create new image sharpening algorithms, many people forgot the whole point of sharpening. Either a photographer strives for sharpness in an entire image, or endeavours to use blur as a means of focusing attention on something of interest in the image (which is in focus, and therefore sharp). Many sharpening algorithms have been developed with the goal of sharpening the whole image… but this is often misguided. Why does the photo need to be sharpened? What is the benefit? A simple sharpening with unsharp masking (an unfortunate name for a filter) works quite well at its task. But it was designed at a time when images were small, and filters were generally simple 3×3 constructs. Applying the original filter to a 24MP 4000×6000 pixel image will make little, if any, difference. On the other hand, blurring an image does nothing for its aesthetics unless it is selective, in essence trying to mimic bokeh in some manner.

Much of what happens in image processing (aside from machine vision) is aesthetically based. The true results of image processing cannot be provided in a quantitative manner, and that puts it at odds with scientific methodology. But who cares? Scientific thought in the academic realm is far too driven by pure science, with little in the way of pure inventing. But alas, few academics think this way; most take on the academic mantra and are hogtied to doing things in a specified way. I no longer subscribe to this train of thought, and I don’t really know if I ever did.

aesthetic appeal, picture of Montreal metro with motion blur

This picture shows motion blur resulting from a moving subway car, whilst the rest of the picture remains in focus. The motion blur is a part of the intrinsic appeal of the photograph – yet there is no way of objectively quantifying the aesthetic value – it is something that can only be qualitatively and subjectively evaluated.

Aesthetically motivated image processing is a perfect fit for photographs because, while there are theoretical underpinnings to how lenses are designed, and technical principles of how a camera works, the ultimate result – a photograph – is the culmination of the mechanical ability of the camera and the artistic ability of the photographer. Machine vision, the type used in manufacturing facilities to determine things like product defects, is different, because it is tasked with precision automated photography in ideal, controlled conditions. To develop algorithms to remove haze from natural scenes, or reduce glare, is extremely difficult, and the best photographs may simply be taken when there is no haze. Aesthetic-based picture processing is subjectively qualitative, and there is nothing wrong with that. It is one of the criteria that sets humans apart from machines – the inherent ability to visualize things differently. Some may find bokeh creamy while others may find it too distracting, but that’s okay. You can’t create an algorithm to describe bokeh because it is an aesthetic thing. In the same way, it’s impossible to quantify taste, or distinguish exactly what umami is.

Consider the following quote from Bernard Berenson (Aesthetics, Ethics, and History) –

‘The eyes without the mind would perceive in solids nothing but spots or pockets of shadow and blisters of light, chequering and criss-crossing a given area. The rest is a matter of mental organization and intellectual construction. What the operator will see in his camera will depend, therefore, on his gifts, and training, and skill, and even more on his general education; ultimately it will depend on his scheme of the universe.’

Is the eye equivalent to a 50mm lens?

So in the final post in this series we will look at the adage that a 50mm lens is a “normal” lens because it equates to the eye’s view of things. Or is it 43mm… or 35mm? Again, a bunch of numbers seem to exist on the net, and it’s hard to decipher what the real answer is. Maybe there is no real answer, and we should stop comparing eyes to cameras? But for argument’s sake, let’s look at the situation in a different way by asking what lens focal length most closely replicates the Angle of View (AOV) of the human visual system (HVS).

One common idea floating around is that the “normal” length of a lens is 43mm because a “full-frame” film frame, or sensor, is 24×36mm in size, and if you calculate the length of the diagonal you get 43.3mm. Is this meaningful? Unlikely. You can calculate the various AOVs for each of the dimensions using the formula: 2 arctan(d/2f), where d is the dimension and f is the focal length. So for the 24×36mm frame with a 50mm lens, for the diagonal we get: 2 arctan(43.3/(2×50)) = 46.8°. This diagonal AOV is the one most commonly cited with lenses, but probably not the right one, because few people think about a diagonal AOV. A horizontal one is more common, using d=36mm. Now we get 39.6°.
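The AOV formula is easy to verify numerically. A minimal sketch in Python (the function name `aov_deg` is purely illustrative):

```python
import math

def aov_deg(d_mm: float, f_mm: float) -> float:
    """Angle of view in degrees: 2*arctan(d/2f) for frame dimension d and focal length f."""
    return math.degrees(2 * math.atan(d_mm / (2 * f_mm)))

# Full-frame (24x36mm) with a 50mm lens
diag = math.hypot(24, 36)             # diagonal, ~43.27mm
print(round(aov_deg(diag, 50), 1))    # diagonal AOV, ~46.8
print(round(aov_deg(36, 50), 1))      # horizontal AOV, ~39.6
```

The same function reproduces the other figures in this section by plugging in different focal lengths.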

So now let’s consider the AOV of the HVS. The normal AOV of the HVS, assuming binocular vision, is constrained to roughly 120° (H) by 135° (V), but the reality is that our AOV with respect to targeted vision is probably only 60° horizontally and 10–15° vertically from a point of focus. Of the horizontal vision, likely only 30° is focused. Let’s be conservative and assume 60°.

So a 50mm lens is not close. What about a 35mm lens? This would end up with a horizontal AOV of 54.4°, which is honestly a little closer. A 31mm lens gives us roughly 60°. A 68mm lens gives us the 30° of focused vision. What if we wanted a lens AOV equivalent for the binocular 120° horizontal view? We would need a 10.5mm lens, which is starting to get a little fish-eyed.
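These focal lengths come from inverting the AOV formula: f = d / (2 tan(AOV/2)). A quick sketch (the function name is illustrative), using the 36mm horizontal frame dimension:

```python
import math

def focal_for_aov(d_mm: float, aov_deg: float) -> float:
    """Focal length whose angle of view over dimension d matches aov_deg."""
    return d_mm / (2 * math.tan(math.radians(aov_deg) / 2))

print(round(focal_for_aov(36, 60), 1))   # ~31.2mm for a 60° horizontal AOV
print(round(focal_for_aov(36, 30), 1))   # ~67.2mm for 30° of focused vision
print(round(focal_for_aov(36, 120), 1))  # ~10.4mm for the binocular 120° view
```

The slight differences from the round figures quoted above (31mm, 68mm, 10.5mm) are just rounding.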

There is in reality, no single answer. It really depends on how much of the viewable region of the HVS you want to include.

Photographs and the craft of chance

Photographs are the encapsulation of our lives. They are snapshots, brief interludes into slices of time. Times long past. Memories of fighting in the trenches in WW1, the landings at Normandy, life in small Italian mountain villages. The best and worst of our histories. Photographs capture such fleeting moments that in most cases they would be impossible to reproduce. Photography is, in its core essence, the art of chance. Of being in the right place at the right time, of being able to capture just the right amount of photons entering the camera. Blink, and it could all be different. Before photographs, our history was handed down through generations in stories, or paintings upon the wall. But neither of these is fleeting; they are thought-out, prescribed renditions of history. Photographs are not; they are raw, evocative, and often need no explanation. And while they could be considered by some to be art, they are crafted using tools which allow light to be captured. The true result is in nature’s control.

Capturing natural life is truly the essence of the craft of chance. That one photograph that captures an insect holding still, almost posing for the shot – blink and it will move on to its next feast.

Should a camera think?

Photographer Arnold Newman (1918-2006) once said, “The camera is a mirror with a memory, but it cannot think.” Has anything really changed since analog cameras evolved into digital ones? Do cameras take better pictures, or do they just take better “quality” pictures because certain tasks, e.g. exposure, have been automated? Digital cameras automatically focus a scene, and do just about everything else necessary to automate the process (except pick the scene). They perform facial recognition, and the newer ones even have forms of machine learning that do various things – most likely making the task of photography even “easier”. But what’s the point? Part of the reason for taking a photograph is the experience involved. Playing with the settings, maybe focusing the lens manually – all this gives a better insight into the process of taking a photograph. Otherwise it becomes just another automated phenomenon in our lives – which is *ok* for taking snaps on mobile devices, I guess… but not on cameras.

Why are lenses round, and photos rectangular?

Have you ever wondered why lenses are round, and photographs rectilinear? Obviously square lenses would not work, but why not round photographs? Well, lenses do indeed produce a circular image; however, the quality of this image with respect to sharpness and brightness is not at all uniform. It is sharpest and brightest near the centre of the lens, becoming progressively less sharp and bright towards the outer edge of the circle. This deterioration is due to factors such as lens aberrations, which become more pronounced towards the edges of the image. In terms of the photograph, only the inner portion of the circular image should be used, hence why photographs are rectangular, or historically more square (before 35mm film).

Basically, for lenses on a particular sensor, the diameter of the image circle has to be larger than the diagonal of the frame. The example below shows a full-frame 24mm×36mm sensor and its associated image circle, with a diameter of 43.27mm.

This means that the image sensor only makes use of roughly 59% of the image circle (the sensor is 864mm², the image circle 1470mm²). Using a circular fisheye lens, or one whose image circle is smaller than the sensor, will result in a circular image – for example, using a small 16mm cinematographic lens on a full-frame sensor.
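The 59% figure is simple geometry: the sensor’s diagonal fixes the minimum image-circle diameter, and the ratio of the two areas gives the coverage. A sketch:

```python
import math

sensor_w, sensor_h = 36.0, 24.0            # full-frame dimensions, mm
circle_d = math.hypot(sensor_w, sensor_h)  # minimum image circle = frame diagonal, ~43.27mm

sensor_area = sensor_w * sensor_h              # 864 mm^2
circle_area = math.pi * (circle_d / 2) ** 2    # ~1470 mm^2
print(round(sensor_area / circle_area * 100))  # ~59% of the circle is used
```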

In some cases, such as the Leica D-LUX 6, the camera allows swapping between a number of aspect ratios: 16:9, 4:3, 3:2, and 1:1. This camera has a 1/1.7″ sensor (crop factor of 4.6), with an actual resolution of 3678 × 2745 pixels.

The camera does lie – a paradox of sorts

The greatest misconception about photography is that the camera is “all seeing”. But as we previously explored, the camera does lie. The majority of photographs are lies because they don’t have any basis in fact. First and foremost, photographs are 2D representations of 3D scenes, so they do not capture the world as it truly is. Black and white photographs are monochromatic representations of a coloured reality, and “frozen” stills represent moving objects. Yet every photograph is a true rendition of a subject/object/scene at one particular moment in time. This is something of a paradox – everything visible in the camera’s field of view is authentic, but it lacks the intricate qualities of the real scene. You can take a picture of a sunrise on a beach, but it will be missing the factors that make it a memorable scene – the wind blowing (sure, video can capture this), the smell of the sea, the warmth of the first rays of the sun, the feel of the sand on the beach. The camera then produces a lie, in so much as it only tells a portion of a story, or distorts it in some manner. A difference exists between a photograph and the subject/scene it depicts. It is a snapshot in time, nothing more.

Conversely, the camera allows us to capture things the human eye cannot perceive. It allows differences in viewing angles – a fisheye lens can see 180° in the extreme, and although each human eye covers a wide field individually, the binocular overlap is only about 120°, and of that the central angle of view is only about 40–60°. Our peripheral vision is only good enough for sensing motion, and huge objects. Cameras are also capable of stopping motion – human eyes can’t; we have no ability to slow down or “freeze” motion. Therefore the camera’s ability to lie can be beneficial, producing images that are more effectual than the actual experience.

Examples include far-away scenes that the human eye is incapable of perceiving, yet a telephoto lens can show quite distinctly. Another is high-speed photography of an egg being dropped on a hard surface, where each frame represents milliseconds in time, yet clearly depicts each facet of the egg hitting the surface with a clarity the human eye is incapable of. Or an image where blur and unsharpness (or bokeh) have been used to great effect to isolate a quality of a particular subject/object (human eyes don’t actively perceive the unsharp regions of our vision). In all these cases the subject/object is shown in a way different to how the eye would perceive it, and in many cases the photograph contains information that is lost to the human eye. Of course, a photograph can also hide information. A photograph of a small village in a valley may veil the fact that a large expressway lies behind the photographer – the viewer of the photograph sees only a secluded village.

For good or bad, cameras do lie.

The camera does not lie

There is an old phrase, “the camera does not lie“, which can be interpreted as both true and false. In historic photos, where little was done in the way of manipulation, the photograph often did hold the truth of what appeared in the scene. In modern photographs that are “enhanced”, this is often not the case. But there is another perspective. The phrase is true because the camera objectively captures everything in the scene within its field of view. But it is also false, because the human eye is not all-seeing, perceiving the world in a highly subjective manner – focusing on the object (or person) of interest. Most photographs tend to contain far too much information, visual “flotsam” that is selectively discarded by the human visual system. The rendition of colours can also appear “unnatural” in photographs because of issues with white balance, film types (in analog cameras), and sensors (digital cameras).

What the human eye sees (left) versus the camera (right)

A good example of how the human eye and camera lens perceive things differently is shown in the two photos above. The photograph on the right contains photographic perspective distortion (keystoning), where tall buildings tend to “fall” or “lean” within the picture. The human eye (simulated on the left), on the other hand, corrects for this issue, and so does not perceive it. To photograph a tall building, the camera is often tilted upward, and in this position the vertical lines of the building converge toward the top of the picture. The convergence of vertical lines is a natural manifestation of perspective which we find acceptable in the horizontal plane (e.g. the convergence of railway tracks in the distance), but which seems unnatural in the vertical plane.

There are many other factors that influence the outcome of a picture. Some are associated with the physical abilities of a camera and its associated lenses, others with the environment. For example, the colour of ambient light (e.g. a colour cast created by the sun setting), perspective (the wider a lens, the more distortion introduced), or contrast (e.g. B&W images becoming “flat”). While the camera does not lie, it rarely exactly reproduces the world as we see it. Or maybe we don’t perceive the world around us as it truly is.