Spectre – Does it work?

Over a year ago I installed Spectre (for iOS). The thought of having a piece of software that could remove moving objects from photographs seemed like a really cool idea. It is essentially a long-exposure app which uses multiple images to create two kinds of effect: (i) an image sans moving objects, and (ii) images with light (or movement) trails. It is touted as using AI and computational photography to produce these long exposures. The machine learning algorithms provide scene recognition, exposure compensation, and “AI stabilization”, supposedly allowing for up to a 9-second handheld exposure without the need for a tripod.

It seems as though the effects are provided by means of a computational photography technique known as “image stacking”. Image stacking involves taking multiple images and post-processing the series to produce a single image. For removing objects, the images are averaged: the static features are retained, while the moving features are removed through the averaging process – which is why a stable image is important. For the light trails it works similarly to a long exposure on a digital camera, where moving objects in the image become blurred; this is usually achieved by superimposing the moving features from each frame onto the starting frame.
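As a rough illustration (a minimal sketch of image stacking in general, not Spectre’s actual algorithm, which has not been published), a burst of aligned frames can be averaged to remove moving objects, or combined with a per-pixel maximum to keep light trails. The file names here are placeholders.

```python
import numpy as np
from PIL import Image

# Load a burst of aligned frames (placeholder file names).
frames = [np.asarray(Image.open(f"frame_{i:02d}.jpg"), dtype=np.float32)
          for i in range(30)]
stack = np.stack(frames)                 # shape: (n_frames, height, width, 3)

# Object removal: averaging keeps static content and washes out anything that
# passes through the frame (a per-pixel median would be even more robust).
removed = stack.mean(axis=0)

# Light trails: keeping the brightest value seen at each pixel preserves moving highlights.
trails = stack.max(axis=0)

Image.fromarray(removed.astype(np.uint8)).save("objects_removed.jpg")
Image.fromarray(trails.astype(np.uint8)).save("light_trails.jpg")
```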

Fig.1: The Spectre main screen.

The app is very easy to use. Below the viewing window are a series of basic controls: camera flip, camera stabilization, and settings. The stabilization control, when activated, provides a small visual indicator that shows when the iPhone is STABLE. As Spectre can perform a maximum of 9 seconds' worth of processing, stabilization is an important attribute. The length of exposure is controlled by a dial in the lower-right corner of the app – you can choose between 3, 5, and 9 seconds. The Settings really only allow the “images” to be saved as Live Photos. The button at the top-middle turns light trails ON, OFF, or AUTO. The button in the top-right allows for exposure compensation, which can be adjusted using a slider. The viewing window can also be tapped to set the focus point for the shot.

Fig.2: The use of Spectre to create a motion trail (9 sec). The length of the train and the slow speed at which it was moving create a sense of slow motion.

Using this app allows one of two types of processing. As mentioned, one of these modes is the creation of trails – during the day these are motion trails, and at night these are light trails. Motion trails are added by turning “light trails” to the “ON” position (Fig.4). The second mode, with “light trails” in the “OFF” position, basically removes moving objects from the scene (Fig.3).

Fig.3: Light trails off with moving objects removed.
Fig.4: Light trails on with motion trails shown during daylight.

It is a very simple app, for which I do congratulate the designers. Too many photo-app designers try to cram 1001 features into an app, often overwhelming the user.

Here are some caveats/suggestions:

  • Sometimes motion trails occur because the moving object is too long to fundamentally change the content of the image stack. A good example is a slow-moving train – the train never leaves the scene during a 9-second exposure, and hence gets averaged into a motion trail. This is an example of a long-exposure image, as aptly shown in Fig.2. It’s still cool from an aesthetics point of view.
  • Objects must move in and out of the frame during the exposure time. So it’s not great for trying to remove people from tourist spots, because there may be too many of them, and they may not move quickly enough.
  • Long exposures tend to suffer from camera shake. Although Spectre offers an indication of stability, it is best to rest the camera on at least one stable surface, otherwise there is a risk of subtle motion artifacts being introduced.
  • Objects moving too slowly might only be blurred, leaving some residual traces in a scene from which moving objects are supposed to be removed.

Does this app work? The answer is both yes and no. During the day the ideal situation for this app is a crowded scene, but the objects/people have to be moving at a good rate. Getting rid of parked cars and slow-moving people is not going to happen. Views from above are obviously ideal, or scenes where the objects to be removed keep moving. For example, light trails of moving cars at night produce cool images, but only if they are taken from a vantage point – photos taken at the same level as the cars only produce a band of bright light.

It would actually be cool if they could extend this app to allow for exposures above nine seconds, specifically for removing people from crowded scenes. Or perhaps allow the user to specify a frame count and delay – for example, 30 frames with a 3-second delay between each frame. It’s a fun app to play around with, and well worth the $2.99 (although how long it will be maintained is another question – the last update was 11 months ago).

How does high-resolution mode work?

One of the tricks of modern digital cameras is a little thing called “high-resolution mode” (HRM), which is sometimes called pixel-shift. It effectively boosts the resolution of an image, even though the number of pixels used by the camera’s sensor does not change. It can boost a 24 megapixel image into a 96 megapixel image, enabling a camera to create images at a much higher resolution than its sensor would normally be able to produce.

So how does this work?

In normal mode, using a colour filter array like Bayer, each photosite acquires one particular colour, and the final colour of each pixel in an image is achieved by means of demosaicing. The basic mechanism for HRM works through sensor-shifting (or pixel-shifting) i.e. taking a series of exposures and processing the data from the photosite array to generate a single image.

  1. An exposure is obtained with the sensor in its original position. The exposure provides the first of the RGB components for the pixel in the final image.
  2. The sensor is moved by one photosite unit in one of the four principal directions. At each original array location there is now another photosite with a different colour filter. A second exposure is made, providing the second of the components for the final pixel.
  3. Step 2 is repeated two more times, in a square movement pattern. The result is that there are four pieces of colour data for every array location: one red, one blue, and two greens.
  4. An image is generated with each RGB pixel derived directly from the captured data; the green information is derived by averaging the two green values.

No interpolation is required, and hence no demosaicing.

The basic high-resolution mode process (the arrows represent the direction the sensor shifts)
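To make the process concrete, here is a small simulation (an illustrative sketch, not any manufacturer’s actual pipeline) of four one-photosite-shifted exposures over an RGGB Bayer pattern being merged into a full-RGB image with no demosaicing. The `scene` array simply stands in for the real world.

```python
import numpy as np

def bayer_channel(y, x):
    """Channel index (0=R, 1=G, 2=B) of an RGGB Bayer filter at photosite (y, x)."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2

def pixel_shift_merge(scene):
    """Simulate four exposures, each with the sensor shifted by one photosite in a
    square pattern, and merge them into a full-RGB image without demosaicing.
    `scene` is an H x W x 3 float array standing in for the real world."""
    h, w, _ = scene.shape
    out = np.zeros_like(scene)
    for dy, dx in [(0, 0), (0, 1), (1, 1), (1, 0)]:      # the square movement pattern
        for y in range(h):
            for x in range(w):
                c = bayer_channel(y + dy, x + dx)        # filter now over this location
                sample = scene[y, x, c]                  # what this exposure records here
                if c == 1:
                    out[y, x, 1] += 0.5 * sample         # two green samples are averaged
                else:
                    out[y, x, c] = sample                # one red and one blue sample
    return out
```

In this noiseless simulation the merged result reproduces the scene exactly, which is the point: every location ends up with measured R, G, and B values rather than interpolated ones.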

In cameras with HRM, it functions using the motors that are normally dedicated to image stabilization. The motors move the sensor by exactly the amount needed to shift the photosites by one whole unit, and the shifts are arranged so that the data captured includes one red, one blue, and two green samples for each pixel.

There are many benefits to this process:

  • The total amount of information is quadrupled, with each image pixel using the actual values for the colour components from the correct physical location, i.e. full RGB information, no interpolation required.
  • Quadrupling the light reaching the sensor (four exposures) should also cut the random noise in half (see the sketch after this list).
  • False-colour artifacts often arising in the demosaicing process are no longer an issue.
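A quick way to see where the noise claim comes from (a toy simulation, not real sensor data – the numbers are arbitrary): averaging four exposures reduces random noise by a factor of √4 = 2.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = 100.0                                               # "true" value at one photosite
exposures = signal + rng.normal(0, 4.0, size=(4, 100_000))   # four frames, noise sigma = 4

print(exposures[0].std())            # ~4.0  (noise in a single exposure)
print(exposures.mean(axis=0).std())  # ~2.0  (noise after averaging four exposures)
```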

There are also some limitations:

  • It requires a completely still scene. Even with the camera on a tripod, a slight breeze moving the leaves on a tree is enough to cause problems.
  • It can be extremely CPU-intensive to generate an HRM RAW image, which also drains the battery. Some systems, like Fuji’s GFX100, use off-camera post-processing software to generate the RAW image.

Here are some examples of the high resolution modes offered by camera manufacturers:

  • Fujifilm – Cameras like the GFX100 (102MP) have a Pixel Shift Multi Shot mode in which the camera moves the image sensor by 0.5 pixels over 16 images and composes a 400MP image (yes, you read that right).
  • Olympus – Cameras like the OM-D E-M5 Mark III (20.4MP) have a High-Resolution Mode which takes 8 shots using 1- and 0.5-pixel shifts, which are merged into a 50MP image.
  • Panasonic – Cameras like the S1 (24.2MP) have a High-Resolution mode that results in 96MP images. The Panasonic S1R at 47.3MP produces 187MP images.
  • Pentax – Cameras like the K-1 II (36.4MP) use the Pixel Shift Resolution System II with a Dynamic Pixel Shift Resolution mode (for handheld shooting).
  • Sony – Cameras like the A7R IV (61MP) use a Pixel Shift Multi Shooting mode to produce a 240MP image.


What is a crop factor?

The crop factor of a sensor is the ratio of one camera’s sensor size in relation to another camera’s sensor of a different size. The term is most commonly used to represent the ratio between a 35mm full-frame sensor and a crop sensor. The term was coined to help photographers understand how existing lenses would perform on new digital cameras which had sensors smaller than the 35mm film format.

How to calculate crop factors?

It is easy to calculate a crop factor using the size of a crop sensor in relation to a full-frame sensor. It is usually determined by comparing diagonals, i.e. full-frame sensor diagonal divided by cropped-sensor diagonal. The diagonals can be calculated using the Pythagorean theorem: calculate the diagonal of the crop sensor, and divide it into the diagonal of a full-frame sensor, which is 43.27mm.

Here is an example of deriving the crop factor for a MFT sensor (17.3×13mm):

  1. The diagonal of a full-frame sensor is √(36²+24²) = 43.27mm
  2. The diagonal of the MFT sensor is √(17.3²+13²) = 21.64mm
  3. The crop factor is 43.27/21.64 = 2.0

This means the MFT sensor is smaller than a FF sensor by a factor of 2 in its linear dimensions, so it captures a correspondingly narrower portion of the scene.
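The same arithmetic in a few lines of Python (a small sketch; the helper name is just for illustration):

```python
import math

def crop_factor(width_mm, height_mm, full_frame=(36.0, 24.0)):
    """Ratio of the full-frame diagonal to the diagonal of the given sensor."""
    return math.hypot(*full_frame) / math.hypot(width_mm, height_mm)

mft = crop_factor(17.3, 13)                  # ~2.0
print(round(mft, 2))
print(f"A 25mm MFT lens behaves like a {25 * mft:.0f}mm lens on full frame")   # ~50mm
```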

Common crop factors

Type                                           Crop factor
1/2.3″                                         5.6
1″                                             2.7
MFT                                            2.0
APS-C (Canon)                                  1.6
APS-C (Fujifilm, Nikon, Ricoh, Sony, Pentax)   1.5
APS-H (defunct)                                1.35
35mm full frame                                1.0
Medium format (Fuji GFX)                       0.8

Below is a visual depiction of these crop sensors compared to the 1× of the full-frame sensor.

The various crop-factors per crop-sensor.

How are crop factors used?

The term crop factor is often called the focal length multiplier. That is because it is often used to calculate the “full-frame equivalent” focal length of a lens on a camera with a cropped sensor. For example, an MFT sensor has a crop factor of 2.0, so taking a 25mm MFT lens and multiplying its focal length by 2.0 gives 50mm. This means that a 25mm lens on an MFT camera behaves more like a 50mm lens on a FF camera in terms of AOV and FOV. If a 50mm lens mounted on a full-frame camera is placed next to a 25mm lens mounted on an MFT camera, and both cameras are the same distance from the subject, they will yield photographs with similar FOVs. They would not be identical, of course, because the lenses have different focal lengths, which modify characteristics such as perspective and depth of field.

Things to remember

  • The crop-factor is a value which relates the size of a crop-sensor to a full-frame sensor.
  • The crop-factor does not affect the focal length of a lens.
  • The crop-factor does not affect the aperture of a lens.

The low-down on crop sensors

Before the advent of digital cameras, the standard reference format for photography was 35mm film, with frames 24×36mm in size. Everything in analog photography had the same frame of reference (well except for medium format, but let’s ignore that). In the early development of digital sensors, there were cost and technological issues with developing a sensor the same size as 35mm film. The first commercially available dSLR, the Nikon QV-1000C, released in 1988, had a ⅔” sensor with a crop-factor of 4. The first full-frame dSLR would not appear until 2002, the Contax N Digital, sporting 6 megapixels.

Using a camera with a smaller sensor presented one significant problem – the field of view of images captured using these sensors was narrower than the reference 35mm standard. When camera manufacturers started creating sensors smaller than 24×36mm, they had to create a term which described them in relation to a 35mm film frame (full frame). For that reason the term crop sensor is used to describe a sensor that is some percentage smaller than a full-frame sensor (sometimes the term cropped is used interchangeably). The picture a crop sensor creates is “cropped” in relation to the picture created with a full-frame sensor (using a lens of the same focal length). The sensor does not actually cut anything; it’s just that parts of the image are simply ignored. To illustrate what happens in a full-frame versus a cropped sensor, consider Fig.1.

Fig.1: A visual depiction of full-frame versus crop sensor in relation to the 35mm image circle.

Lenses project a circular image, the “image circle”, but a sensor only records a rectangular portion of the scene. A full-frame sensor, like the one in the Leica SL2, captures a large portion of the 35mm lens circle, whereas the Micro-Four-Thirds cropped sensor of the Olympus OM-D E-M1 only captures the central portion of the image circle – the rest of the image falls outside the scope of the sensor (the FF sensor is shown as a dashed box). While crop-sensor lenses are smaller than those of full-frame cameras, there are limits to reducing their size from the perspective of optics and light capture. Fig.2 shows another perspective on crop sensors based on a real scene, comparing a full-frame sensor to an APS-C sensor (assuming lenses of the same focal length, say 50mm).

Fig.2: Viewing full-frame versus crop (APS-C)

The benefits of crop-sensors

  • Crop sensors are smaller than full-frame sensors, so the cameras built around them are generally smaller in dimensions and weigh less.
  • The cost of crop-sensor cameras and their lenses is generally lower than FF equivalents.
  • Smaller lenses can be used. For example, an MFT camera only requires a 150mm lens to achieve the equivalent field of view of a 300mm FF lens.

The limitations of crop-sensors

  • Lenses on a crop-sensor camera with the same focal length as those on a full-frame camera will have a narrower AOV. For example, a 50mm lens on FF has an AOV of 39.6°, while the same 50mm lens on APS-C gives an AOV of 26.6°. To get a similar AOV on the cropped APS-C sensor, a roughly 33mm lens would have to be used (see the sketch after this list).
  • A cropped sensor captures less of the lens image circle than a full-frame.
  • A cropped sensor captures less light than a full-frame (which generally has larger photosites that are more sensitive to light).
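Those angle-of-view figures come from simple trigonometry – a minimal sketch (horizontal AOV, thin-lens approximation, using the sensor widths from Table 1 below):

```python
import math

def horizontal_aov(sensor_width_mm, focal_length_mm):
    """Horizontal angle of view in degrees (thin-lens approximation)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

print(horizontal_aov(36.0, 50))    # full frame, 50mm -> ~39.6 degrees
print(horizontal_aov(23.5, 50))    # APS-C, 50mm      -> ~26.5 degrees
print(horizontal_aov(23.5, 33))    # APS-C, 33mm      -> ~39.2 degrees (close to FF 50mm)
```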

Common crop-sensors

A list of the most common crop-sensor sizes currently used in digital cameras, their average dimensions (sensors from different manufacturers can differ by as much as 0.5mm), and example cameras is summarized in Table 1. A complete list of sensor sizes can be found here. Smartphones are in a league of their own, and usually have small sensors of the type 1/n″. For example, the Apple iPhone 12 Pro Max has four cameras – the tele camera uses a 1/3.4″ (4.23×3.17mm) sensor, and another of the cameras a 1/3.6″ (4×3mm) sensor.

Type              Sensor size     Example Cameras
1/2.3″            6.16×4.62mm     Sony HX99, Panasonic Lumix DC-ZS80, Nikon Coolpix P950
1″                13.2×8.8mm      Canon Powershot G7X M3, Sony RX100 VII
MFT / m43         17.3×13mm       Panasonic Lumix DC-G95, Olympus OM-D E-M1 Mark III
APS-C (Canon)     22.3×14.9mm     Canon EOS M50 Mark II
APS-C             23.5×15.6mm     Ricoh GRIII, Fuji X-E3, Sony α6600, Sigma sd Quattro
35mm Full Frame   36×24mm         Sigma fpL, Canon EOS R5, Sony α, Leica SL2-S, Nikon Z6II
Medium format     44×33mm         Fuji GFX 100
Table 1: Crop sensor sizes.

Figure 3 shows the relative sizes of three of the more common crop sensors: APS-C (Advanced Photo System type-C), MFT (Micro-Four-Thirds), and 1″, as compared to a full-frame sensor. The APS-C sensor size is modelled on the Advantix film developed by Kodak, where the Classic image format had a size of 25.1×16.7mm.

Fig.3: Examples of crop-sensors versus a full-frame sensor.

Defunct crop-sensors

Below is a list of sensors which are basically defunct, usually because they are not currently being used in any new cameras.

Type             Sensor size      Example Cameras
1/1.7″           7.53×5.64mm      Nikon Coolpix P340 (2014), Olympus Stylus 1 (2013), Leica C (2013)
2/3″             8.8×6.6mm        Fujifilm FinePix X10 (2011)
APS-C Foveon     20.7×13.8mm      Sigma DP series (2006-2011)
APS-H Foveon     26.6×17.9mm      Sigma sd Quattro H (2016)
APS-H            27×18mm          Leica M8 (2006), Canon EOS 1D Mark IV (2009)
Table 2: Defunct crop sensor sizes.

How good is High Dynamic Range (HDR) photography?

There are photographic situations where the lighting conditions are not ideal, even for the most modern “smart” camera – and they occur quite often: landscapes with a vast contrast difference between sky and land, low-light situations, scenes full of shadows. These situations are unavoidable, especially on vacation, when the weather can be unpredictable.

The problem is one of perception. A scene that we view with our eyes does not always translate into a photograph. This is because the human eye has more capacity to differentiate between tones than a camera. A good example of this is taking a photo from the inside of a building, through a window – the camera will likely produce an underexposed room, or an overexposed sky. Here is an example of a photograph taken during a sunny, yet slightly overcast day. One side of the building is effectively in shadow, whilst the other side is brightly lit.

HDR photography before shot

Olympus EM-5(MII), 12mm, f8.0, 1/640, ISO200 (P mode)

One way of compensating for the inability of a camera to take a good photograph in these situations is a computational photography technique known as High Dynamic Range (HDR). HDR is a technique which can be applied in-camera, or through an application such as Photoshop. For example, a camera such as the Olympus EM5 (Mark II) has a button marked HDR, and even the iPhone camera has an HDR function.

In its simplest form, HDR takes three images of the exact same scene, with different exposures, and combines them. The three exposures are normally (i) an exposure for shadows, (ii) an exposure for highlights, and (iii) an exposure for midtones. This is sometimes done by modifying the shutter speed while keeping the aperture and ISO constant. Here is an HDR version of the photograph above, with the effect of the shadow very much reduced. Is it a better image? That is in the eye of the beholder. It does seem to lose something in translation.

HDR photography after processing

Olympus EM-5(MII), 12mm, f7.1, 1/500, ISO200 (HDR)
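As a rough illustration of how bracketed exposures can be combined (a simple exposure-fusion sketch that weights pixels by how well exposed they are, not what the Olympus or the iPhone actually do internally), the file names below are placeholders:

```python
import numpy as np
from PIL import Image

# Three bracketed exposures of the same scene (placeholder file names).
exposures = [np.asarray(Image.open(name), dtype=np.float32) / 255.0
             for name in ("under.jpg", "mid.jpg", "over.jpg")]

def well_exposedness(img, sigma=0.2):
    """Weight pixels by how close they are to mid-grey (0.5) in every channel."""
    return np.prod(np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)), axis=2)

weights = np.stack([well_exposedness(img) for img in exposures])
weights /= weights.sum(axis=0) + 1e-12               # normalise the weights per pixel

fused = sum(w[..., None] * img for w, img in zip(weights, exposures))
Image.fromarray((np.clip(fused, 0, 1) * 255).astype(np.uint8)).save("hdr_fused.jpg")
```

A production-quality fusion would blend across multiple resolutions to avoid visible seams, but the weighting idea is the same.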

But HDR is not a panacea – it won’t solve everything, and should be used sparingly. It is sometimes easier to perform exposure bracketing, and choose an appropriate image from those generated.

Aesthetically motivated picture processing

For years I wrote scientific papers on various topics in image processing, but what I learnt from that process was that few of the papers written are actually meaningful. For instance, in trying to create new image sharpening algorithms, many people forgot the whole point of sharpening. Either a photographer strives for sharpness in an entire image, or endeavours to use blur as a means of focusing attention on something of interest in the image (which is in focus, and therefore sharp). Many sharpening algorithms have been developed with the aim of sharpening the whole image… but this is often misguided. Why does the photo need to be sharpened? What is the benefit? A simple sharpening with unsharp masking (which is an unfortunate name for a filter) works quite well at its task. But it was designed at a time when images were small, and filters were generally simple 3×3 constructs. Applying the original filter to a 24MP 4000×6000 pixel image will make little, if any, difference. On the other hand, blurring an image does nothing for its aesthetics unless it is selective, in essence trying to mimic bokeh in some manner.
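For reference, unsharp masking itself amounts to only a few lines – a minimal sketch using a Gaussian blur, where the radius and amount are arbitrary illustrative values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, radius=2.0, amount=1.0):
    """Sharpen by adding back the difference between the image and a blurred copy."""
    blurred = gaussian_filter(image.astype(np.float32), sigma=(radius, radius, 0))
    sharpened = image + amount * (image - blurred)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```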

Much of what happens in image processing (aside from machine vision) is aesthetically based. The true results of image processing cannot be provided in a quantitative manner, and that puts it at odds with scientific methodology. But who cares? Scientific thought in the academic realm is far too driven by pure science, with little in the way of pure inventing. But alas, few academics think this way; most take on the academic mantra and are hogtied to doing things in a specified way. I no longer subscribe to this train of thought, and I don’t really know if I ever did.

aesthetic appeal, picture of Montreal metro with motion blur

This picture shows motion blur which results from a moving subway car, whilst the rest of the picture remains in focus. The motion blur is a part of the intrinsic appeal of the photograph – yet there is no way of objectively quantifying the aesthetic value – it is something that can only be qualitatively and subjectively evaluated.

Aesthetically motivated image processing is a perfect fit for photographs because, while there are theoretical underpinnings to how lenses are designed, and technical principles behind how a camera works, the ultimate result – a photograph – is the culmination of the mechanical ability of the camera and the artistic ability of the photographer. Machine vision, the type used in manufacturing facilities to determine things like product defects, is different, because it is tasked with precision automated photography in ideal, controlled conditions. Developing algorithms to remove haze from natural scenes, or to reduce glare, is extremely difficult, and the photograph may be best taken when there is no haze. Aesthetic-based picture processing is subjectively qualitative, and there is nothing wrong with that. It is one of the criteria that sets humans apart from machines – the inherent ability to visualize things differently. Some may find bokeh creamy while others may find it too distracting, but that’s okay. You can’t create an algorithm to describe bokeh because it is an aesthetic thing – in the same way it’s impossible to quantify taste, or pin down exactly what umami is.

Consider the following quote from Bernard Berenson (Aesthetics, Ethics, and History) –

‘The eyes without the mind would perceive in solids nothing but spots or pockets of shadow and blisters of light, chequering and criss-crossing a given area. The rest is a matter of mental organization and intellectual construction. What the operator will see in his camera will depend, therefore, on his gifts, and training, and skill, and even more on his general education; ultimately it will depend on his scheme of the universe.’

Why aesthetic appeal in image processing matters

What makes us experience beauty?

I have spent over two decades writing algorithms for image processing; however, I have never really created anything truly fulfilling. Why? Because it is hard to create generic filters, especially for tasks such as image beautification. In many ways improving the aesthetic appeal of photographs involves modifying the content of an image in non-natural ways. It doesn’t matter how AI-ish an algorithm is – it cannot fathom the concept of aesthetic appeal. A photograph one person finds pleasing may be boring to others, just as a blank canvas is considered art by some, but not by others. No amount of mathematical manipulation will lead to an algorithmic panacea of aesthetics. We can modify the white balance and play with curves – indeed we can make 1001 changes to a photograph – but the final outcome will be perceived differently by different people.

After spending years researching image processing algorithms, and designing some of my own, it wasn’t until I decided to take the art of acquiring images to a greater depth that I realized that algorithms are all well and good, but there is likely little need for the plethora of new ones created every year. Once you pick up a camera and start playing with different lenses and different camera settings, you begin to realize that part of the nuance of any photograph is its natural aesthetic appeal. Sure, there are things that can be modified to improve aesthetic appeal, such as contrast enhancement or improving the sharpness, but images also contain unfocused regions that contribute to their beauty.

If you approach image processing purely from a mathematical (or algorithmic) viewpoint, what you are trying to achieve is some sort of utopia of aesthetics. But this is almost impossible, largely because every photograph is unique. It is possible to improve the acuity of objects in an image using techniques such as unsharp masking, but it is impossible to resurrect a blurred image – and maybe that’s the point. One could create a fantastic filter that sharpens an image beautifully, but with the sharpness of modern lenses, that may not be practical. Consider this example of a photograph taken in Montreal. The image has good definition of colour, and has a histogram which is fairly uniform. There isn’t a lot that can be done to this image, because it truly does represent the scene as it exists in real life. If I had taken this photo on my iPhone, I would be tempted to post it on Instagram and add a filter… which might make it more interesting, but maybe only from the perspective of boosting colour.


A corner hamburger joint in Montreal – original image.

Here is the same image with only the colour saturation boosted (by ×1.6). Have its visual aesthetics been improved? Probably. Our visual system would say it is improved, but that is largely because our eyes are tailored to interpret colour.


A corner hamburger joint in Montreal – enhanced image.
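For what it’s worth, that kind of saturation boost is essentially a one-liner – a sketch using Pillow, with a placeholder file name:

```python
from PIL import Image, ImageEnhance

img = Image.open("hamburger_joint.jpg")              # placeholder file name
boosted = ImageEnhance.Color(img).enhance(1.6)       # multiply colour saturation by 1.6
boosted.save("hamburger_joint_boosted.jpg")
```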

If you take a step back from the abyss of algorithmically driven aesthetics, you begin to realize that too few individuals in the image processing community have taken the time to really understand the qualities of an image. Each photograph is unique, and so the idea of generic image processing techniques is highly flawed. Generic techniques work sufficiently well in machine vision applications where the lighting is uniform and the task is also uniform, e.g. inspection of rice grains, or identification of burnt potato chips. No aesthetics are needed, just the ability to isolate an object and analyze it for whatever quality is needed. It’s one of the reasons unsharp masking has always been popular. Alternative algorithms for image sharpening really don’t work much better. And modern lenses are sharp; in fact, many people would be more likely to add blur than take it away.

 

In-camera keystone compensation (Olympus) (ii)

So I took some photographs using the Olympus keystone compensation on a trip to Montreal. Most of them deal with buildings that are leaning back, which is the classic case when trying to photograph a building. The first set deals with landscape-format photographs. In both of these I could not move any further back, and both were taken with the Olympus 12-40mm set at its widest angle (12mm, or 24mm full-frame equivalent). It was possible to correct both images without losing any of the building.

keystone correction of photographs
Originals (left), keystone corrected (right)

The second case deals with portrait-format photographs. In both cases it was slightly more challenging to make sure the entire subject was in the frame, but doing it in situ made it possible to ensure this happened. Doing it in post-processing may result in the loss of a portion of the photograph. In the lower image I had enough leeway to position the keystone-corrected frame in such a manner that the building is surrounded by ample space.

keystone correction of photographs
Originals (left), keystone corrected (right)

Compensating for perspective distortion often comes at a price. Modifying the geometry of a photograph means that less will fit in the frame, so taking a photograph too close to a building may mean part of it gets cut off.

Horizontal keystone correction can sometimes be more difficult, because the distortion is usually compound. In the example below, the photograph was taken slightly off-centre, producing an image which is distorted both horizontally and vertically.

keystone correction
Complex distortion
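In post-processing, this kind of compound correction amounts to a perspective (homography) warp. Here is a minimal OpenCV sketch; the corner coordinates are made-up values that would normally be picked by eye or detected automatically, and the file names are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("building.jpg")                     # placeholder file name
h, w = img.shape[:2]

# Where the four corners of the building facade currently sit (made-up values)...
src = np.float32([[420, 310], [2980, 260], [3240, 2420], [180, 2480]])
# ...and where we want them: an upright rectangle.
dst = np.float32([[300, 300], [3100, 300], [3100, 2450], [300, 2450]])

H = cv2.getPerspectiveTransform(src, dst)            # 3x3 homography
corrected = cv2.warpPerspective(img, H, (w, h))      # some content will fall outside the frame
cv2.imwrite("building_corrected.jpg", corrected)
```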

Is there a loss in aesthetic appeal? Maybe. Food for future thought.