The photography of Daidō Moriyama

Daidō Moriyama was born in Ikeda, Osaka, Japan in 1938, and came to photography in the late 1950s. Moriyama studied photography under Takeji Iwamiya before moving to Tokyo in 1961 to work as an assistant to Eikoh Hosoe. In his early 20s he bought a Canon 4SB and started photographing on the streets of Osaka. Moriyama was the quintessential street photographer, focused on the snapshot. He likened snapshot photography to a cast net – “Your desire compels you to throw it out. You throw the net out, and snag whatever happens to come back – it’s like an ‘accidental moment’” [1]. Moriyama’s advice on street photography was literally “Get outside. It’s all about getting out and walking.” [1]

In the late 1960s Japan was characterized by street demonstrations protesting the Vietnam War and the continuing presence of the US military in Japan. Moriyama joined a group of photographers associated with the short-lived (three-issue) magazine Provoke (1968–69), which dealt with experimental approaches to photography. His most provocative work of the Provoke era was in the are-bure-boke (“grainy, blurry, out-of-focus”) style, which conveys a blazing immediacy. His photographic style is characterized by snapshots that are gritty, grainy black and white, out of focus, with extreme contrast and chiaroscuro (dark, harsh spotlighting, mysterious backgrounds). Moriyama is “drawn to black and white because monochrome has stronger elements of abstraction or symbolism, colour is something more vulgar…”.

“My approach is very simple — there is no artistry, I just shoot freely. For example, most of my snapshots I take from a moving car, or while running, without a finder, and in those instances I am taking the pictures more with my body than my eye… My photos are often out of focus, rough, streaky, warped etc. But if you think about it, a normal human being will in one day receive an infinite number of images, and some are focused upon, others are barely seen out of the corners of one’s eye.”

Moriyama is an interesting photographer because he does not focus on the camera (or its make); he shoots with anything – a camera is just a tool. He photographs mostly with compact cameras, because in street photography large cameras tend to make people feel uncomfortable. A number of cameras followed the Canon 4SB, including a Nikon S2 with a 25/4, a Rolleiflex, a Minolta Autocord, a Pentax Spotmatic, a Minolta SR-2, a Minolta SR-T 101 and an Olympus Pen W. One of Moriyama’s favourite film cameras was the Ricoh GR series – a Ricoh GR1 with a fixed 28mm lens (which appeared in 1996) and sometimes a Ricoh GR21 for a wider field of view (21mm). More recently he has been photographing with a Ricoh GR III.

“I’ve always said it doesn’t matter what kind of camera you’re using – a toy camera, a polaroid camera, or whatever – just as long as it does what a camera has to do. So what makes digital cameras any different?”

Yet Moriyama’s photos are made in the post-processing stage. He captures the snapshot on the street and then makes the photo in the darkroom (or in Silver Efex with digital). Post-processing usually involves pushing the blacks and whites, increasing contrast and adding grain. In his modern work it seems as though Moriyama photographs in colour, and converts to B&W in post-processing (see video below). It is no wonder that Moriyama is considered by some to be the godfather of street photography, saying himself that he is “addicted to cities“.

For those interested, there are a number of short videos. The one below shows Moriyama in his studio and on a walk around the atmospheric Shinjuku neighbourhood, his home from home in Tokyo. There is also a longer documentary called Daidō Moriyama: Near Equal, and one which showcases some of his photographs, Daido Moriyama – Godfather of Japanese Street Photography.

Artist Daido Moriyama – In Pictures | Tate (2012)

Removing unwanted objects from pictures with Cleanup.pictures

Ever been on vacation somewhere, and wanted to take a picture of something, only to be thwarted by the hordes of tourists? Typically for me it’s buildings of architectural interest, or wide-angle photos in towns. It’s quite a common occurrence, especially in places where tourists tend to congregate. There aren’t many choices – if you can come back at a quieter time that may be the best approach, but often you are at a place for a limited time-frame. So what to do?

Use software to remove the offending objects or people. This type of algorithm, designed to remove objects from an image, has been around for about 20 years, known in the early years as digital inpainting – akin to the conservation process where damaged, deteriorating, or missing parts of an artwork are filled in to present a complete image. In their early forms, digital inpainting algorithms worked well in scenes where the object to be removed was surrounded by a fairly uniform background or pattern. In complex scenes they often didn’t fare so well. So what about the newer generation of these algorithms?
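
For reference, the “classical” generation of inpainting is still available in libraries such as OpenCV; the Python sketch below fills a masked region using the Telea algorithm. The file names are placeholders.

```python
import cv2

# Classical (pre-deep-learning) digital inpainting with OpenCV's Telea algorithm.
# "photo.jpg" and "mask.png" are placeholder file names; the mask should be
# white (255) over the object to be removed and black everywhere else.
img = cv2.imread("photo.jpg")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# The third argument is the inpainting radius: the neighbourhood (in pixels)
# considered around each hole pixel when filling it in.
result = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("cleaned.jpg", result)
```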

There are many different types of picture cleaning software, some stand-alone such as the AI-powered iOS app Inpaint, others in the form of features in photo processing software such as Photoshop. One newcomer to the scene is the web-based, open-source Cleanup.pictures. It is incredibly easy to use. Upload a picture, choose the size of the brush tool, paint over the unwanted object with the brush tool, and voilà! – a new image, sans the offending object. Then you can just download the “cleaned” image. So how well does it work? Below are some experiments.

The first image is a vintage photograph of Paris, removing all the people from the streets. The results are actually quite exceptional.

The second image is a photograph taken in Glasgow, where the people and passing car have been erased.

The third image is from a trip to Norway, specifically the harbour in Bergen. This area always seems to have both people and boats, so it is hard to take clear pictures of the historical buildings.

The final image is a photograph from the Prokudin-Gorskii Collection at the Library of Congress. The image is derived from a series of glass plates, and suffers from some of the original plates being broken, with pieces of glass missing. The cleanup has actually done a better job than I could ever have imagined.

The AI used in this algorithm is really good at what it does, like *really good*, and it is easy to use. You just keep cleaning up unwanted things until you are happy with the result. The downsides? It isn’t exactly perfect all the time. Fine details that you want to retain are often removed along with the region being cleaned. Some areas also become “soft” because they have to be “created” where they were previously obscured by objects – this is especially prevalent in edge detail. Some examples are shown below:

Creation of detail during inpainting

Loss of fine detail during inpainting

It only produces low-res images, with a maximum width of 720 pixels. You can upgrade to the Pro version to increase resolution (2K width). It would be interesting to see this algorithm produce large scale cleaned images. There is also the issue of uploading personal photos to a website, although they do make the point of saying that images are discarded once processed.

For those interested in the technology behind the inpainting, it is based on an algorithm known as large mask inpainting (LaMa), developed by a group of researchers at Samsung AI Center and their collaborators [1]. The code can be obtained directly from GitHub for those who really want to play with things.
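
For a flavour of what the “Fourier convolutions” in the paper’s title mean, below is a rough PyTorch sketch of the spectral-transform idea at the heart of the approach: convolve feature maps in the frequency domain, so that every output location effectively sees the whole image. The layer sizes and names here are illustrative assumptions, not the authors’ implementation (which is on GitHub).

```python
import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    """Toy sketch of a Fourier (spectral) convolution block.

    Feature maps are taken into the frequency domain with a real FFT, a 1x1
    convolution mixes the real/imaginary parts, and the result is transformed
    back. Because every frequency coefficient depends on every spatial
    location, the block has a global receptive field.
    """
    def __init__(self, channels):
        super().__init__()
        # real and imaginary parts are stacked along the channel axis
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(2 * channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")                 # complex, (b, c, h, w//2+1)
        f = torch.cat([spec.real, spec.imag], dim=1)            # (b, 2c, h, w//2+1)
        f = self.act(self.bn(self.conv(f)))
        real, imag = torch.chunk(f, 2, dim=1)
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")   # back to (b, c, h, w)

# Usage: out = FourierUnit(64)(torch.randn(1, 64, 128, 128))
```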

  1. Suvorov, R., et al. Resolution-robust Large Mask Inpainting with Fourier Convolutions (2022)

Demystifying Colour (viii) : CIE colour model

The Commission Internationale de l’Eclairage (French for International Commission on Illumination), or CIE, is an organization formed in 1913 to create international standards related to light and colour. In 1931, the CIE introduced CIE 1931, or CIEXYZ, a colorimetric colour space created to map out all the colours that can be perceived by the human eye. CIEXYZ was based on statistics derived from extensive measurements of human visual perception under controlled conditions.

In the 1920s, colour matching experiments were performed independently by physicists W. David Wright and John Guild, both in England [2]. The experiments were carried out with 7 (Guild) and 10 (Wright) observers. Each experiment involved a subject looking through a hole which allowed for a 2° field of view. On one side was a reference colour projected by a light source, while on the other were three adjustable light sources (the primaries were set to R=700.0nm, G=546.1nm, and B=435.8nm). The observer would adjust the intensities of the three primary lights until their mixture was indistinguishable from the reference light. This was repeated for every visible wavelength. The result of the colour-matching experiments was a table of RGB triplets for each wavelength. These experiments were not about describing colours with qualities like hue and saturation, but rather about recording how combinations of light appear to be the same colour to most people.

Fig.1: An example of the experimental setup of Guild/Wright

In 1931 the CIE amalgamated Wright and Guild’s data and proposed two sets of colour matching functions: CIE RGB and CIE XYZ. Based on the responses in the experiments, values were plotted to reflect how the average human eye senses the colours in the spectrum, producing three curves of intensity – one per primary light source – needed to mix every colour of the visible spectrum (Figure 2): the CIE RGB colour matching functions. Some of the values for red were negative, and the CIE decided it would be more convenient to work in a colour space where the coefficients were always positive – the XYZ colour matching functions (Figure 3). The new matching functions had certain characteristics: (i) the new functions must always be greater than or equal to zero; (ii) the y function would describe only the luminosity; and (iii) the white-point is where x=y=z=1/3. This produced the CIE XYZ colour space, also known as CIE 1931.

Fig.2: CIE RGB colour matching functions

Fig.3: CIE XYZ colour matching functions

The CIE XYZ colour space defines a quantitative link between distributions of wavelengths in the electromagnetic visible spectrum, and physiologically perceived colours in human colour vision. The space is based on three imaginary primary colours, X, Y, and Z, where the Y component corresponds to the luminance (a measure of perceived brightness) of a colour. All the visible colours reside inside an open cone-shaped region, as shown in Figure 4. CIE XYZ is then a mathematical generalization of the colour portion of the human visual system (HVS), which allows us to define colours.
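
To make this link concrete, here is a minimal Python/numpy sketch of how a measured spectral power distribution (SPD) is turned into XYZ tristimulus values by integrating it against the colour matching functions. The five-wavelength table below is only a coarse stand-in for the full CIE 1931 tables, and the SPD values are invented for illustration.

```python
import numpy as np

# Columns: xbar, ybar, zbar - coarsely sampled stand-ins for the CIE 1931
# colour matching functions (real use would load the full published tables).
wavelengths = np.array([450, 500, 550, 600, 650])      # nm
cmf = np.array([
    [0.336, 0.038, 1.772],
    [0.005, 0.323, 0.272],
    [0.433, 0.995, 0.009],
    [1.062, 0.631, 0.001],
    [0.284, 0.107, 0.000],
])
spd = np.array([0.8, 1.0, 1.1, 0.9, 0.7])              # relative power (invented)

dl = np.gradient(wavelengths)                          # wavelength spacing
X, Y, Z = (spd[:, None] * cmf * dl[:, None]).sum(axis=0)
print(f"Tristimulus values: X = {X:.1f}, Y = {Y:.1f}, Z = {Z:.1f}")
```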

Fig.4: CIE XYZ colour space (G denotes the axis of neutral gray).
Fig.5: RGB mapped to CIE XYZ space

The luminance in XYZ space increases along the Y axis, starting at 0, the black point (X=Y=Z=0). The colour hue is independent of the luminance, and hence independent of Y. CIE also defines a means of describing hues and saturation, by defining three normalized coordinates: x, y, and z (where x+y+z=1).

x = X / (X+Y+Z)
y = Y / (X+Y+Z)
z = Z / (X+Y+Z)
z = 1 - x - y

The x and y components can then be taken as the chromaticity coordinates, determining colours for a certain luminance. This system is called CIE xyY, because a colour value is defined by the chromaticity coordinates x and y in addition to the luminance coordinate Y. More on this in the next post on chromaticity diagrams.

The RGB colour space is related to XYZ space by a linear coordinate transformation. The RGB colour space is embedded in the XYZ space as a distorted cube (see Figure 5). CIE RGB can be mapped onto XYZ using the following set of equations (up to a common normalizing factor):

X = 0.49000R + 0.31000G + 0.20000B
Y = 0.17697R + 0.81240G + 0.01063B (luminance)
Z = 0.00000R + 0.01000G + 0.99000B
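
As a quick worked example, here is a small numpy sketch that applies the matrix above to an arbitrary CIE RGB triplet and then derives the x, y chromaticity coordinates defined earlier.

```python
import numpy as np

# CIE RGB -> XYZ matrix taken from the equations above (rows: X, Y, Z)
M = np.array([
    [0.49000, 0.31000, 0.20000],
    [0.17697, 0.81240, 0.01063],
    [0.00000, 0.01000, 0.99000],
])

rgb = np.array([0.5, 0.2, 0.8])      # an arbitrary CIE RGB triplet
X, Y, Z = M @ rgb                    # linear transformation into XYZ

x = X / (X + Y + Z)                  # chromaticity coordinates (x, y)
y = Y / (X + Y + Z)
print(f"XYZ = ({X:.3f}, {Y:.3f}, {Z:.3f}), (x, y) = ({x:.3f}, {y:.3f})")
```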

CIEXYZ is non-uniform with respect to human visual perception, i.e. a particular fixed distance in XYZ is not perceived as a uniform colour change throughout the entire colour space. CIE XYZ is often used as an intermediary space in determining a perceptually uniform space such as CIE Lab (or Lab), or CIE LUV (or Luv).

  • CIE 1976 CIEL*u*v*, or CIELuv, is an easy-to-calculate transformation of CIE XYZ which is more perceptually uniform. Luv was created to correct the CIEXYZ distortion by distributing colours approximately proportionally to their perceived colour differences.
  • CIE 1976 CIEL*a*b*, or CIELab, provides approximately perceptually uniform colour differences, and its L* lightness parameter correlates better with perceived brightness. Lab remaps the visible colours so that they extend equally along two axes. The two colour components a* and b* specify the colour hue and saturation along the green-red and blue-yellow axes respectively (a minimal conversion sketch follows this list).
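
To make the Lab definition concrete, here is a minimal Python sketch of the standard XYZ to L*a*b* conversion. The D65 reference white and the example XYZ triplet are assumptions chosen for illustration.

```python
import numpy as np

def xyz_to_lab(X, Y, Z, white=(95.047, 100.0, 108.883)):
    """Convert XYZ (0-100 scale) to CIE L*a*b*, relative to a reference white (D65 here)."""
    Xn, Yn, Zn = white

    def f(t):
        # cube root above a small threshold, linear segment below it
        delta = 6 / 29
        return np.where(t > delta**3, np.cbrt(t), t / (3 * delta**2) + 4 / 29)

    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    L = 116 * fy - 16        # lightness
    a = 500 * (fx - fy)      # green-red axis
    b = 200 * (fy - fz)      # blue-yellow axis
    return float(L), float(a), float(b)

# XYZ of a saturated red (approximately sRGB red under D65), for illustration
print(xyz_to_lab(41.24, 21.26, 1.93))
```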

In 1964 another set of experiments was carried out allowing for a 10° field of view, and the results are known as the CIE 1964 supplementary standard colorimetric observer. CIE XYZ is still the most commonly used reference colour space, although it is slowly being pushed to the wayside by the CIE 1976 spaces. There is a lot of information on CIE XYZ and its derivative spaces; the reader interested in how CIE 1931 came about is referred to [1,4]. CIELab is the most commonly used CIE colour space for imaging and the printing industry.

Further Reading

  1. Fairman, H.S., Brill, M.H., Hemmendinger, H., “How the CIE 1931 color-matching functions were derived from Wright-Guild data”, Color Research and Application, 22(1), pp.11-23, 259 (1997)
  2. Service, P., The Wright – Guild Experiments and the Development of the CIE 1931 RGB and XYZ Color Spaces (2016)
  3. Abraham, C., A Beginners Guide to (CIE) Colorimetry
  4. Zhu, Y., “How the CIE 1931 RGB Color Matching Functions Were Developed from the Initial Color Matching Experiments”.
  5. Sharma, G. (ed.), Digital Color Imaging Handbook, CRC Press (2003)

What is (camera) sensor resolution?

What is sensor resolution? It is not the number of photosites on the sensor – that is just photosite count. In reality sensor resolution is a measure of density, usually the number of photosites per unit area, e.g. MP/cm2. For example a full-frame sensor with 24MP has an area of 36×24mm = 864mm2, or 8.64cm2. Dividing 24MP by this gives us 2.77 MP/cm2. It could also mean the actual area of a photosite, usually expressed in terms of μm2.

Such measures are useful in comparing different sensors from the perspective of density, and characteristics such as the amount of light which is absorbed by the photosites. A series of differently sized sensors with the same pixel count (image resolution) will have differently sized photosites, and therefore different sensor resolutions. For 16 megapixels, an MFT sensor will have 7.1 MP/cm2, APS-C 4.4 MP/cm2, full-frame 1.85 MP/cm2, and medium format 1.1 MP/cm2. For the same pixel count, the larger the sensor, the larger the photosite.
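
As a rough worked example, here is a small Python sketch that reproduces these density figures (and the corresponding photosite pitch) from nominal sensor dimensions and a pixel count; the dimensions are approximate and vary slightly between manufacturers.

```python
# Photosite density (MP/cm^2) and approximate pitch for a 16MP image on
# different sensor formats. Sensor dimensions are nominal values.
sensors_mm = {
    "MFT":           (17.3, 13.0),
    "APS-C":         (23.5, 15.6),
    "Full-frame":    (36.0, 24.0),
    "Medium format": (44.0, 33.0),
}
megapixels = 16

for name, (w, h) in sensors_mm.items():
    area_cm2 = (w / 10) * (h / 10)
    density = megapixels / area_cm2                             # MP per cm^2
    pitch_um = ((w * h) / (megapixels * 1e6)) ** 0.5 * 1000     # photosite pitch in microns
    print(f"{name:14s} {density:4.1f} MP/cm2, pitch = {pitch_um:.2f} microns")
```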

Sensor resolution for the same image resolution, i.e. the same pixel count (e.g. 16MP)

It can also be used in comparing the same sized sensor. Consider the following three Fujifilm cameras and their associated APS-C sensors (with an area of 366.6mm2):

  • The X-T30 has 26MP, or 6240×4160 photosites on its sensor. The photosite pitch is 3.74µm, and the pixel density is 7.08 MP/cm2.
  • The X-T20 has a pixel count of 24.3MP, or 6058×4012 photosites, with a photosite pitch of 3.9µm, and a pixel density of 6.63 MP/cm2.
  • The X-T10 has a pixel count of 16.3MP, or 4962×3286 photosites, with a photosite pitch of 4.76µm, and a pixel density of 4.45 MP/cm2.

The X-T30 has a higher sensor resolution than both the X-T20 and X-T10, and the X-T20 has a higher sensor resolution than the X-T10. The sensor of the X-T30 is about 1.6 times as dense as that of the X-T10.

Sometimes different sensors have similar photosite sizes, and similar photosite densities. For example the Leica SL2 (2019) has a full-frame 47.3MP sensor with a photosite area of 18.23 µm2 and a density of 5.47 MP/cm2. The antiquated Olympus PEN E-P1 (2009) has an MFT 12MP sensor with a photosite area of 18.32 µm2 and a density of 5.47 MP/cm2.

How does high-resolution mode work?

One of the tricks of modern digital cameras is a little thing called “high-resolution mode” (HRM), which is sometimes called pixel-shift. It effectively boosts the resolution of an image, even though the number of pixels used by the camera’s sensor does not change. It can boost a 24 megapixel image into a 96 megapixel image, enabling a camera to create images at a much higher resolution than its sensor would normally be able to produce.

So how does this work?

In normal mode, using a colour filter array like Bayer, each photosite acquires one particular colour, and the final colour of each pixel in an image is achieved by means of demosaicing. The basic mechanism for HRM works through sensor-shifting (or pixel-shifting) i.e. taking a series of exposures and processing the data from the photosite array to generate a single image.

  1. An exposure is obtained with the sensor in its original position. The exposure provides the first of the RGB components for the pixel in the final image.
  2. The sensor is moved by one photosite unit in one of the four principal directions. At each original array location there is now another photosite with a different colour filter. A second exposure is made, providing the second of the components for the final pixel.
  3. Step 2 is repeated two more times, in a square movement pattern. The result is that there are four pieces of colour data for every array location: one red, one blue, and two greens.
  4. An image is generated with each RGB pixel derived from the data; the green information is obtained by averaging the two green values.

No interpolation is required, and hence no demosaicing.
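
To make the four-shot process concrete, here is a toy Python/numpy simulation of the idea described above: a static scene is sampled through an RGGB Bayer pattern four times, with the pattern shifted by one photosite between exposures, and full RGB is then recovered at every location without interpolation. The scene data, pattern, and shift order are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
scene = rng.random((4, 4, 3))                # a tiny "true" RGB scene (static)

# RGGB Bayer pattern: which colour channel (0=R, 1=G, 2=B) each photosite sees
bayer = np.zeros((4, 4), dtype=int)
bayer[0::2, 1::2] = 1
bayer[1::2, 0::2] = 1
bayer[1::2, 1::2] = 2

ys, xs = np.arange(4)[:, None], np.arange(4)[None, :]
shifts = [(0, 0), (0, 1), (1, 1), (1, 0)]    # square movement pattern
rgb_sum = np.zeros_like(scene)
counts = np.zeros_like(scene)

for dy, dx in shifts:
    # after shifting the sensor by (dy, dx), the photosite covering scene
    # position (y, x) carries the filter that originally sat at (y-dy, x-dx)
    cfa = np.roll(bayer, shift=(dy, dx), axis=(0, 1))
    sample = scene[ys, xs, cfa]              # what each photosite records this shot
    rgb_sum[ys, xs, cfa] += sample
    counts[ys, xs, cfa] += 1

recovered = rgb_sum / counts                 # the two greens are averaged
assert np.allclose(recovered, scene)         # full RGB everywhere, no demosaicing
```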

The basic high-resolution mode process (the arrows represent the direction the sensor shifts)

In cameras with HRM, it functions using the motors that are normally dedicated to image stabilization tasks. The motors effectively move the sensor by exactly the amount needed to shift the photosites by one whole unit. The shifting moves in such a manner that the data captured includes one Red, one Blue and two Green photosites for each pixel.

There are many benefits to this process:

  • The total amount of information is quadrupled, with each image pixel using the actual values for the colour components from the correct physical location, i.e. full RGB information, no interpolation required.
  • Quadrupling the light reaching the sensor (four exposures) should also cut the random noise in half.
  • False-colour artifacts often arising in the demosaicing process are no longer an issue.

There are also some limitations:

  • It requires a very still scene. Even with the camera on a tripod, a slight breeze moving the leaves on a tree will cause artifacts.
  • It can be extremely CPU-intensive to generate a HRM RAW image, which can also drain the battery. Some systems, like Fuji’s GFX100, use off-camera post-processing software to generate the RAW image.

Here are some examples of the high resolution modes offered by camera manufacturers:

  • Fujifilm – Cameras like the GFX100 (102MP) have a Pixel Shift Multi Shot mode where the camera moves the image sensor by 0.5 pixels over 16 images and composes a 400MP image (yes you read that right).
  • Olympus – Cameras like the OM-D E-M5 Mark III (20.4MP), has a High-Resolution Mode which takes 8 shots using 1 and 0.5 pixel shifts, which are merged into a 50MP image.
  • Panasonic – Cameras like the S1 (24.2MP) have a High-Resolution mode that results in 96MP images. The Panasonic S1R at 47.3MP produces 187MP images.
  • Pentax – Cameras like the K-1 II (36.4MP) use a Pixel Shift Resolution System II with a Dynamic Pixel Shift Resolution mode (for handheld shooting).
  • Sony – Cameras like the A7R IV (61MP) uses a Pixel Shift Multi Shooting mode to produce a 240MP image.

Do you need 61 megapixels, or even 102?

The highest “native” resolution camera available today is the Phase One XF IQ4 medium format camera at 150MP. Above that there is the Hasselblad H6D-400C at 400MP, but it uses pixel-shift image capture. Next in line is the medium format Fujifilm GFX 100/100S at 102MP. In fact we don’t get to full-frame sensors until we hit the Sony A7R IV, at a tiny 61MP. Crazy right? The question is how useful are these sensors for the photographer? The answer is not straightforward. For some photographic professionals these large sensors make inherent sense. For the average casual photographer, they likely don’t.

People who don’t photograph a lot tend to be somewhat bamboozled by megapixels, as if more is always better. But more megapixels does not mean a better image. Here are some things to consider when thinking about megapixels.

Sensor size

There is a point when it becomes hard to cram any more photosites into a particular sensor – they just become too small. For example the upper bound with APS-C sensors seems to be around 33MP, while with full-frame it seems to be around 60MP. Put too many photosites on a sensor and the density of the photosites increases as the size of the photosites decreases. The smaller the photosite, the harder it is for it to collect light. For example Fuji APS-C cameras seem to tap out at around 26MP – the X-T30 has a photosite pitch of 3.74µm. Note that Fuji’s leap to a larger number of megapixels also means a leap to a larger sensor – the medium format sensor with a size of 44×33mm. Compared to the APS-C sensor (23.5×15.6mm), the medium format sensor is nearly four times the size. A 51MP medium format sensor has photosites which are 5.33µm in size, or 1.42 times the size of those on the 26MP APS-C sensor.

The verdict? Squeezing more photosites onto the same size sensor does increase resolution, but sometimes at the expense of how light is acquired by the sensor.

Image and linear resolution

Sensors are made up of photosites that acquire the data used to make image pixels. The image resolution of an image describes the number of pixels used to construct it. For example a 16MP sensor with a 3:2 aspect ratio has an image resolution of 4899×3266 pixels – these dimensions are sometimes termed the linear resolution. To obtain twice the linear resolution we need a 64MP sensor, not a 32MP sensor. A 32MP sensor has 6928×4619 photosites, which results in only a 1.4 times increase in the linear resolution of the image – the pixel count has doubled, but the linear resolution has not. Upgrading from a 16MP sensor to a 24MP sensor means a ×1.5 increase in the pixel count, but only a ×1.2 increase in linear resolution. The transition from 16MP to 64MP is a ×2 increase in linear resolution, and a ×4 increase in the number of pixels. That’s why the visible difference between 16MP and 24MP sensors is also dubious (see Figure 1).
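
Here is a short Python sketch of that relationship, computing approximate 3:2 pixel dimensions for a given pixel count and the resulting gain in linear resolution relative to 16MP.

```python
# How pixel count relates to linear resolution for a 3:2 sensor:
# doubling the megapixels only increases the linear resolution by sqrt(2).
def dimensions(megapixels, aspect=(3, 2)):
    """Approximate pixel dimensions for a given pixel count and aspect ratio."""
    aw, ah = aspect
    width = (megapixels * 1e6 * aw / ah) ** 0.5
    return round(width), round(width * ah / aw)

base_width = dimensions(16)[0]
for mp in (16, 24, 32, 64):
    w, h = dimensions(mp)
    print(f"{mp:2d}MP -> {w}x{h}  ({w / base_width:.2f}x the linear resolution of 16MP)")
```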

Fig.1: Different image resolutions and megapixels within an APS-C sensor

To double the linear resolution of a 24MP sensor you need a 96MP sensor. So a 61MP sensor provides about double the linear resolution of a 16MP sensor, just as a 102MP sensor roughly doubles that of a 24MP sensor.

The verdict? Doubling the pixel count, i.e. image resolution, does not double the linear resolution.

Photosite size

When you have more photosites, you also have to ask what their physical size is. Squeezing 41 million photosites onto the same size sensor as one which previously had 24 million means that each photosite will be smaller, and that comes with its own baggage. Consider for instance the full-frame Leica M10-R, which has 7864×5200 photosites (41MP), meaning the photosite size is roughly 4.59 microns. The full-frame 24MP Leica M-E has a photosite size of 5.97 microns, so roughly 1.7 times the area. Large photosites allow more light to be captured, while smaller photosites gather less light, so when their low signal strength is transformed into a pixel, more noise is generated.

The verdict? From the perspective of photosite size, 24MP captured on a full-frame sensor will be better than 24MP on an APS-C sensor, which in turn is better than 24MP on a M43 sensor (theoretically anyways).

Optics

Comparing a 16MP camera to a 24MP camera, we might determine that the quality and sharpness of the lens is more important than the number of pixels. In fact too many people place an emphasis on the number of pixels and forget that light has to pass through a lens before it is captured by the sensor and converted into an image. Many high-end cameras already provide an in-camera means of generating a high-resolution image, often four times the actual image resolution – so why pay more for more megapixels? Is a 50MP full-frame sensor any good without optically perfect (or near-perfect) lenses? Likely not.

The verdict? Good quality lenses are just as important as more megapixels.

File size

People tend to forget that images have to be saved on memory cards (and post-processed). The greater the megapixels, the greater the resulting file size. A 24MP image stored as a 24-bit/pixel JPEG will be about 3.4MB in size (at 1/20 compression). As a 12-bit RAW the file size would be around 34MB. A 51MP camera like the Fujifilm GFX 50S II would produce a 7.3MB JPEG, and a 73MB 12-bit RAW. If the only format used is JPEG it’s probably fine, but the minute you switch to RAW it will use way more storage.
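
The arithmetic behind those figures is easy to sketch; the 1/20 JPEG compression ratio and 12-bit RAW depth below follow the examples above, though real files vary with scene content and in-camera compression.

```python
# Rough file-size arithmetic: uncompressed bits per image, divided down to MB.
def jpeg_mb(megapixels, bits_per_pixel=24, compression=20):
    return megapixels * 1e6 * bits_per_pixel / 8 / 2**20 / compression

def raw_mb(megapixels, bits_per_photosite=12):
    return megapixels * 1e6 * bits_per_photosite / 8 / 2**20

for mp in (24, 51, 61, 102):
    print(f"{mp:3d}MP: JPEG (1/20) = {jpeg_mb(mp):5.1f} MB, 12-bit RAW = {raw_mb(mp):5.1f} MB")
```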

The verdict? More megapixels = more megabytes.

Camera use

The most important thing to consider may be what the camera is being used for.

  • Website / social media photography – Full-width images for websites are optimal at around 2400×1600 (aka 4MP), blog-post images max. 1500 pixels in width (regardless of height), and inside content max 1500×1000. Large images can reduce website performance, and due to screen resolution won’t be visualized to their fullest capacity anyways.
  • Digital viewing – 4K televisions have roughly 3840×2160 = 8,294,400 pixels. Viewing photographs from a camera with a large spatial resolution will just mean they are down-sampled for viewing. Even the Apple Pro Display XDR only has 6016×3384=20MP view capacity (which is a lot).
  • Large prints – Doing large posters, for example 20×30″, requires a good amount of resolution if they are being printed at 300DPI, which is the nominal standard – that works out to about 54MP (check out the calculator, or the sketch after this list). But you can get by with less resolution because few people view a poster at 100%.
  • Average prints – An 8×10″ print requires 2400×3000 = 7.2MP at 300DPI. A 26MP image will print maximum size 14×20″ at 300DPI (which is pretty good).
  • Video – Does not need high resolution, but rather 4K video at a decent frame rate.
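
For reference, here is a tiny Python sketch of the print arithmetic used above: the megapixels needed for a print at a given size and DPI, and the largest 300DPI print a given pixel count supports (a 3:2 aspect ratio is assumed).

```python
# Print-size arithmetic: pixels needed = width_in * dpi * height_in * dpi.
def mp_for_print(width_in, height_in, dpi=300):
    return width_in * dpi * height_in * dpi / 1e6

def max_print_size(megapixels, dpi=300, aspect=(3, 2)):
    aw, ah = aspect
    w_px = (megapixels * 1e6 * aw / ah) ** 0.5        # long-edge pixels
    return w_px / dpi, w_px * ah / aw / dpi           # print size in inches

print(f"A 20x30 inch poster at 300DPI needs about {mp_for_print(20, 30):.0f}MP")
long_edge, short_edge = max_print_size(26)
print(f"26MP prints up to about {long_edge:.0f}x{short_edge:.0f} inches at 300DPI")
```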

The verdict? The megapixel amount really depends on the core photographic application.

Postscript

So where does that leave us? Pondering a lot of information, most of which the average photographer may not be that interested in. Selecting the appropriate megapixel count is really based on what a camera will be used for. If you commonly take landscape photographs that are used in large scale posters, then 61 or 102 megapixels is certainly not out of the question. For the average photographer taking travel photos, or for someone taking images for the web, or book publishing, then 16MP (or 24MP at the higher end) is ample. That’s why smartphone cameras do so well at 12MP. High MP cameras are really made more for professionals. Almost nobody needs 61MP.

The overall verdict? Most photographers don’t need 61 megapixels. In reality anywhere between 16 and 24 megapixels is just fine.

What is image resolution?

Sometimes a technical term gets used without any thought to its meaning, and before you know it, it becomes an industry standard. This is the case with the term “image resolution”, which has become the standard means of describing how much detail is portrayed in an image. The problem is that the term resolution can mean different things in photography. In one context it is used to describe the pixel density of devices (in DPI or PPI). For example a screen may have a resolution of 218 ppi (pixels-per-inch), and a smartphone might have a resolution of 460ppi. There is also sensor resolution, which is concerned with photosite density for a given sensor size. You can see how this can get confusing.

Fig.1: Image resolution is about detail in the image (the image on the right becomes pixelated when enlarged)

The term image resolution really just refers to the number of pixels in an image, i.e. pixel count. It is usually expressed in terms of two numbers, the number of pixel rows and columns in an image, often known as the linear resolution. For example the Ricoh GR III has an APS-C sensor with a sensor resolution of 6051×4007, or about 24.2 million photosites on the physical sensor. The effective number of pixels in an image derived from the sensor is 6000×4000, or a pixel count of 24 million pixels – this is considered the image resolution. Image resolution can be used to describe a camera in broad terms, e.g. the Sony A1 has 50 megapixels, or based on dimensions, “8640×5760”. It is often used in the context of comparing images, e.g. the Sony A1 with 50MP has a higher resolution than the Sony ZV-1 with 20MP. The image resolution of two images is shown in Figure 1 – a high resolution image has more detail than an image with lower resolution.

Fig.2: Image resolution and sensor size for 24MP.

Technically when talking about the sensor we are talking about photosites, but image resolution is not about the sensor, it is about the image produced from the sensor. This is because it is challenging to compare cameras based on photosites, as they all have differing properties, e.g. photosite area. Once the data from the sensor has been transformed into an image, the photosite data becomes pixels, which are dimensionless entities. Note that the two dimensions representing the image resolution will change depending on the aspect ratio of the sensor. So while a 24MP image from a 3:2 sensor (APS-C or full-frame) will have dimensions of roughly 6000×4000, a 4:3 sensor (e.g. Micro Four Thirds or medium format) with the same pixel count will produce dimensions of roughly 5657×4243.
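
A small Python sketch makes the aspect-ratio arithmetic explicit; the sensor formats in the comments are just examples of where each ratio is typically found.

```python
# Pixel dimensions for a 24MP image at different aspect ratios.
def dims(megapixels, aspect):
    aw, ah = aspect
    w = (megapixels * 1e6 * aw / ah) ** 0.5
    return round(w), round(w * ah / aw)

print("3:2 (APS-C, full-frame): ", dims(24, (3, 2)))   # ~6000 x 4000
print("4:3 (MFT, medium format):", dims(24, (4, 3)))   # ~5657 x 4243
print("1:1 (square crop):       ", dims(24, (1, 1)))   # ~4899 x 4899
```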

Fig.3: Changes in image resolution within different sensors.

Increasing the image resolution does not always increase the linear resolution, or detail, by the same amount. For example a 16MP image from a 3:2 ratio sensor would have a resolution of 4899×3266. A 24MP image from the same type of sensor increases the pixel count by 50%, however the vertical and horizontal dimensions only increase by about 20% – a much smaller change in linear resolution. To double the linear resolution would require moving to a 64MP image.

Is image resolution the same as sharpness? Not really – that has more to do with an image’s spatial resolution (this is where the definition of the word resolution starts to betray itself). Sharpness concerns how clearly defined details within an image appear, and is somewhat subjective. It’s possible to have a high resolution image that is not sharp, just as it’s possible to have a low resolution image with a good amount of acuity. It really depends on the situation it is being viewed in, i.e. back to device pixel density.

Photosites – Quantum efficiency

Not every photon that makes it through the lens ends up being recorded by a photosite. The efficiency with which photosites gather incoming light photons is called the quantum efficiency (QE). The ability to gather light is determined by many factors including the micro lenses, sensor structure, and photosite size. The QE of a sensor is a fixed value that depends largely on the chip technology of the sensor manufacturer. It is averaged out over the entire sensor, and is expressed as the chance that a photon will be captured and converted to an electron.

Quantum efficiency (P = Photons per μm2, e = electrons)

The QE is fixed by the sensor manufacturer’s design choices. A sensor with an 85% QE would produce 85 electrons of signal if it were exposed to 100 photons. There is no way to affect the QE of a sensor, i.e. you can’t change it by changing the ISO.

For front-illuminated sensors the QE is typically 30-55%, meaning 30-55% of the photons that fall on any given photosite are converted to electrons. In back-illuminated sensors, like those typically found in smartphones, the QE is approximately 85%. The website Photons to Photos has a list of sensor characteristics for a good number of cameras; for example the sensor in my Olympus OM-D E-M5 Mark II has a reported QE of 60%. Trying to calculate the QE of a sensor is non-trivial.
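
As a back-of-the-envelope illustration of what those percentages mean, here is a small Python sketch; the photon count and the two QE values are made-up ballpark figures, not measurements of any particular sensor.

```python
import numpy as np

photons = 10_000                          # photons arriving at one photosite during exposure
for qe in (0.40, 0.85):                   # front- vs back-illuminated ballpark QE
    electrons = photons * qe              # mean signal in electrons
    snr = electrons / np.sqrt(electrons)  # shot-noise-limited SNR = sqrt(signal)
    print(f"QE = {qe:.0%}: {electrons:.0f} electrons, shot-noise SNR = {snr:.0f}")
```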

The early days of image processing: To Mars and beyond

After Ranger 7, NASA moved on to Mars, deploying Mariner 4 in November 1964. It was the first probe to send signals back to Earth in digital form, necessitated by the fact that the signals had to travel 216 million km back to Earth. The receiver on board could send and receive data via the low- and high-gain antennas at 8⅓ or 33⅓ bits-per-second. So at the low end, one pixel (8-bit) per second. All images were transmitted twice to ensure no data were missing or corrupt. In 1965, JPL established the Image Processing Laboratory (IPL).

The next series of lunar probes, Surveyor, was also analog (its design was too far advanced to make changes), providing some 87,000 images for processing by IPL. The Mariner images contained noise artifacts that made them look as if they were printed on “herringbone tweed”. It was Thomas Rindfleisch of IPL who applied nonlinear algebra to the problem, creating a program called Despike – it performed a 2D Fourier transform to create a frequency spectrum with spikes representing the noise elements, which could then be isolated and removed, and the data transformed back into an image.

Below is an example of this process applied to an image from Mariner 9 taken in 1971 (PIA02999), containing a herringbone type artifact (Figure 1). The image is processed using a Fast Fourier Transform (FFT – see examples FFT1, FFT2, FFT3) in ImageJ.

Fig.1: Image before (left) and after (right) FFT processing

Applying a FFT to the original image, we obtain a power spectrum (PS), which shows the differing frequency components of the image. By enhancing the power spectrum (Figure 2) we are able to look for peaks pertaining to the feature of interest. In this case the vertical herringbone artifacts will appear as peaks in the horizontal dimension of the PS. In ImageJ these peaks can be removed from the power spectrum (by setting them to black), effectively filtering out those frequencies (Figure 3). By applying the inverse FFT to the modified power spectrum, we obtain an image with the herringbone artifacts removed (Figure 1, right).
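
For anyone who wants to try the same idea outside ImageJ, below is a minimal Python/numpy sketch: transform, zero out the offending frequencies, and invert. The peak coordinates are placeholders you would read off the power spectrum of your own image.

```python
import numpy as np

def notch_filter(img, peaks, radius=3):
    """Suppress periodic noise by zeroing small regions of the FFT spectrum.

    img    : 2D greyscale image as a numpy array
    peaks  : (row, col) positions of noise spikes in the centred spectrum
    radius : half-size of the square notch placed over each spike
    """
    F = np.fft.fftshift(np.fft.fft2(img))        # centre the zero frequency
    for r, c in peaks:
        F[r - radius:r + radius + 1, c - radius:c + radius + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# Usage (placeholder coordinates - read real ones off the power spectrum):
# cleaned = notch_filter(image, peaks=[(200, 310), (312, 202)])
```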

Fig.2: Power spectrum (enhanced to show peaks)
Fig.3: Power spectrum with frequencies to be filtered out marked in black.

Research then moved to applying the image enhancement techniques developed at IPL to biomedical problems. Robert Selzer processed chest and skull x-rays resulting in improved visibility of blood vessels. It was the National Institutes of Health (NIH) that ended up funding ongoing work in biomedical image processing. Many fields were not using image processing because of the vast amounts of data involved. Limitations were not posed by algorithms, but rather hardware bottlenecks.

The early days of image processing : the 1960s lunar probes

Some people probably think image processing was designed for digital cameras (or to add filters to selfies), but in reality many of the basic algorithms we take for granted today (e.g. improving the sharpness of images) evolved in the 1960s with the NASA space program. The space age began in earnest in 1957 with the USSR’s launch of Sputnik I, the first man-made satellite to successfully orbit Earth. A string of Soviet successes led to Luna III, which in 1959 transmitted back to Earth the first images ever seen of the far side of the moon. The probe was equipped with an imaging system comprised of a 35mm dual-lens camera, an automatic film processing unit, and a scanner. The camera sported 200mm f/5.6 and 500mm f/9.5 lenses, and carried temperature- and radiation-resistant 35mm isochrome film. Luna III took 29 photographs over a 40-minute period, covering 70% of the far side, however only 17 of the images were transmitted back to Earth. The images were low-resolution and noisy.

The first image obtained from the Soviet Luna III probe on October 7, 1959 (29 photos were taken of the far side of the moon).

In response to the Soviet advances, NASA’s Jet Propulsion Lab (JPL) developed the Ranger series of probes, designed to return photographs and data from the moon. Many of the early probes were a disaster. Two failed to leave Earth orbit, one crashed onto the moon, and two left Earth orbit but missed the moon. Ranger 6 got to the moon, but its television cameras failed to turn on, so not a single image could be transmitted back to earth. Ranger 7 was the last hope for the program. On July 31, 1964 Ranger 7 neared its lunar destination, and in the 17 minutes before it impacted the lunar surface it relayed the first detailed images of the moon, 4,316 of them, back to JPL.

Image processing was not really considered in the planning for the early space missions, and had to gain acceptance. The development of the early stages of image processing was led by Robert Nathan. Nathan received a PhD in crystallography in 1952, and by 1955 found himself running CalTech’s computer centre. In 1959 he moved to JPL to help develop equipment to map the moon. When he viewed pictures from the Luna III probe he remarked “I was certain we could do much better“, and “It was quite clear that extraneous noise had distorted their pictures and severely handicapped analysis” [1].

The cameras† used on the Ranger were Vidicon television cameras produced by RCA. The pictures were transmitted from space in analog form, but enhancing them would be difficult if they remained in analog. It was Nathan who suggested digitizing the analog video signals, and adapting 1D signal processing techniques to process the 2D images. Frederick Billingsley and Roger Brandt of JPL devised a Video Film Converter (VFC) that was used to transform the analog video signals into digital data (which was 6-bit, 64 gray levels).

The images had a number of issues. First there was geometric distortion. The beam that swept electrons across the face of the tube in the spacecraft’s camera moved at nonuniform rates that differed from those of the beam on the playback tube reproducing the image on Earth. This resulted in images that were stretched or distorted. A second problem was photometric nonlinearity. The cameras had a tendency to display brightness in the centre and darkness around the edge, caused by a nonuniform response of the phosphor on the tube’s surface. Thirdly, there was an oscillation in the electronics of the camera which was “bleeding” into the video signal, causing a visible periodic noise pattern. Lastly there was scan-line noise, the nonuniform response of the camera with respect to successive scan lines (the noise is generated at right angles to the scan). Nathan and the JPL team designed a series of algorithms to correct for the limitations of the camera. The image processing algorithms [2] were programmed on JPL’s IBM 7094, likely in the programming language Fortran.

  • The geometric distortion was corrected using a “rubber sheeting” algorithm that stretched the images to match a pre-flight calibration.
  • The photometric nonlinearity was calculated before flight, and filtered from the images.
  • The oscillation noise was removed by isolating the noise on a featureless portion of the image, creating a filter, and subtracting the pattern from the rest of the image.
  • The scan-line noise was removed using a form of mean filtering (a rough sketch of this kind of correction follows this list).
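
The sketch below (Python/numpy) illustrates one simple way to flatten scan-line striping by equalizing the mean of each scan line; it is an illustrative reconstruction of the idea, not JPL’s original algorithm, and the test frame is synthetic.

```python
import numpy as np

def remove_scanline_noise(frame):
    """Equalize each scan line (row) so its mean matches the global mean."""
    frame = frame.astype(float)
    row_means = frame.mean(axis=1, keepdims=True)   # per-scan-line average response
    return frame - row_means + frame.mean()         # flatten row-to-row variation

# A fake 6-bit frame with added row striping, for illustration only
rng = np.random.default_rng(1)
frame = rng.random((64, 64)) * 63
frame += np.linspace(0, 8, 64)[:, None]             # nonuniform line response
cleaned = remove_scanline_noise(frame)
```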

Ranger VII was followed by the successful missions of Ranger VIII and Ranger IX. The image processing algorithms were used to successfully process 17,259 images of the moon from Rangers 7, 8, and 9 (the link includes the images and documentation from the Ranger missions). Nathan and his team also developed other algorithms which dealt with random-noise removal and sine-wave correction.

Refs:
[1] NASA Release 1966-0402
[2] Nathan, R., “Digital Video-Data Handling”, NASA Technical Report No.32-877 (1966)
[3] Computers in Spaceflight: The NASA Experience, Making New Reality: Computers in Simulations and Image Processing.

† The Ranger missions used six cameras, two wide-angle and four narrow angle.

  • Camera A was a 25mm f/1 with a FOV of 25×25° and a Vidicon target area of 11×11mm.
  • Camera B was a 76mm f/2 with a FOV of 8.4×8.4° and a Vidicon target area of 11×11mm.
  • Camera P used two type A and two type B cameras with a Vidicon target area of 2.8×2.8mm.