Making a digital contact sheet with ImageMagick

On the basis of the previous posts, this post presents a method of generating a digital contact sheet using another ImageMagick command, montage. It can be used for pure images, or, as I like to do, to create a contact sheet with the images and their intensity histograms. Like many ImageMagick commands there are a myriad of options, but a basic use of montage might be:

montage *.png -geometry 200x150 -tile 8x5 contact.png

This takes all PNG files in the current directory and creates an 8×5 montage (8 columns by 5 rows) of 200×150 thumbnails, and saves the montage in a file named contact.png. This will hold 40 images, and if there are more than this, they spill over into a second image. This is a little awkward, so to make things nicer we can write a script to process the images. Below is a bash shell script called contactsheet:

#!/bin/bash
ls *.png > imglist
read nImgs <<< $(sed -n '$=' imglist)
let nrows=nImgs/8
let lefto=nImgs%8
if [ $lefto -gt 0 ]
then
   let nrows=nrows+1
fi
montage @imglist -geometry 200x150 -tile 8x$nrows $1
rm imglist

Now let’s walk through the script, line by line.

  • Line 1 identifies the script as a bash shell script.
  • Line 2 uses the ls command to list all the PNG image files in the current directory, and outputs the list to a text file called imglist. The files will be sorted in alphabetical order.
  • Line 3 counts the number of lines in the file imglist, using the sed (stream editor) command, sed -n '$=' imglist. The number of lines represents the number of PNG files, as there is one filename per line. The number of files calculated is stored in the variable nImgs.
  • Line 4 calculates the number of rows by dividing nImgs by 8, and stores the value in the variable nrows (assuming we want 8 images across in the montage). This will produce an integer result. For example if the number of images is 41, then 41/8 = 5.
  • Line 5 calculates the leftover from the division of nImgs by 8, and stores it in the variable lefto. For example 41%8 = 1.
  • Line 6 tests whether the leftover, i.e. the value in lefto, is greater than 0. If it is, an extra row is added to the variable nrows (Line 8). This deals with the issue of montage creating an extra image should the number of images go beyond the 8×5 tiles.
  • Line 10 generates the contact sheet using montage. It uses the list in imglist, and uses the variable nrows to specify the number of rows in the montage, i.e. 8x$nrows. The $1 at the end of the command is the output filename for the montage, which is specified when contactsheet is run, for example (result shown in Fig.1):
    • ./contactsheet photosheet1.png
  • Line 11 deletes the file containing the list of images.
Fig.1: A sample digital contact sheet using the script.
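
The row arithmetic in the script is just a ceiling division, which can be sketched in Python (montage_rows is a hypothetical helper, purely for illustration):

```python
import math

def montage_rows(n_images, columns=8):
    # An extra row is needed whenever the division leaves a remainder,
    # which is exactly what ceiling division gives us.
    return math.ceil(n_images / columns)

# 41 images at 8 per row need 6 rows: 5 full rows plus 1 leftover image
print(montage_rows(41))  # 6
print(montage_rows(40))  # 5
```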

It may seem quite complicated, but once you get the hang of it, writing these scripts saves a lot of time. If you have a folder with a lot of images in it, then you may prefer to produce a series of smaller contact sheets, in which case the script becomes much simpler. In the simpler version below, the tile size remains at 8×5. So 190 images would produce 5 contact sheets.

#!/bin/bash
ls *.png > imglist
montage @imglist -geometry 200x150 -tile 8x5 $1
rm imglist

There are lots of things which can be customized. Perhaps you want smaller images, which can be achieved by modifying the geometry size. Or perhaps you want each image in the montage labelled? Or perhaps you want to process JPGs?

the image histogram (iii) – useful information

Some people think that the histogram is some sort of panacea for digital photography, a means of deciding whether an image is “perfect” enough. Others tend to disregard the statistical response it provides completely. This leads us to question what useful information is there in a histogram, and how we go about interpreting it.

A plethora of information

A histogram maps the brightness or intensity of every pixel in an image. But what does this information tell us? One of the main roles of a histogram is to provide information on the tonal distributions in an image. This is useful to help determine if there is something askew with the visual appearance of an image. Histograms can be viewed live/in-camera, for the purpose of determining whether or not an image has been correctly exposed, or used during post-processing to fix aesthetic inadequacies. Aesthetic deficiencies can occur during the acquisition process, or can be intrinsic to the image itself, e.g. faded vintage photographs. Examples of deficiencies include such things as blown highlights, or lack of contrast.

A histogram can tell us many differing things about how intensities are distributed throughout the image. Figure 1 shows an example of a colour image, a photograph taken in Bergen, Norway, along with its associated grayscale image and histograms. The histogram spans the entire range of intensity values. Midtones comprise 66% of pixels in the image, with the majority skewed towards the lighter midtone values (the largest hump in the histogram). Shadow pixels comprise only 7% of the whole image, and are actually associated with shaded regions in the image. Highlights relate to regions like the white building on the left, and some of the clouds. There are very few pure white pixels, the exception being the shopfront signs. Some of the major features in the histogram are indicated in the image.

Fig.1: A colour image and its histograms
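
Tonal percentages like those quoted above can be computed directly from a 256-bin histogram. Here is a minimal sketch, assuming the range is simply split into rough thirds for shadows, midtones, and highlights (the exact boundaries are a matter of convention):

```python
def tonal_regions(hist):
    # hist is a 256-bin intensity histogram (bin i = count of pixels with value i)
    total = sum(hist)
    shadows = sum(hist[0:85])        # darkest third
    midtones = sum(hist[85:171])     # middle third
    highlights = sum(hist[171:256])  # lightest third
    return tuple(round(100 * x / total, 1) for x in (shadows, midtones, highlights))

# A toy histogram: 10 dark pixels, 80 midtone pixels, 10 light pixels
hist = [0] * 256
hist[20], hist[128], hist[200] = 10, 80, 10
print(tonal_regions(hist))  # (10.0, 80.0, 10.0)
```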

There is no perfect histogram

Before we get into the nitty-gritty, there is one thing that should be made clear. Sometimes there are infographics on the internet that tout the myth of a “perfect” or “ideal” histogram. The reality is that such infographics are very misleading. There is no such thing as a perfect histogram. The notion of the ideal histogram is one that is shaped like a “bell”, but there is no reason why the distribution of intensities should be that even. Here is the usual description of an ideal image: “An ideal image has a histogram which has a centred hill type shape, with no obvious skew, and a form that is spread across the entire histogram (and without clipping)”.

Fig.2: A bell-shaped curve

But a scene may be naturally darker or lighter, rather than dominated by the midtones found in a bell-shaped histogram. Photographs taken in the latter part of the day will be naturally darker, as will photographs of dark objects. Conversely, a photograph of a snowy scene will skew to the right. Consider the picture of the Toronto skyline taken at night shown in Figure 3. Obviously the histogram doesn’t come close to being “perfect”, but the majority of the scene is dark – not unusual for a night scene – and hence the histogram is representative of this. In this case the low-key histogram is ideal.

Fig.3: A dark image with a skewed histogram

Interpreting a histogram

Interpreting a histogram usually involves examining the size and uniformity of the distribution of intensities in the image. The first thing to do is to look at the overall curve of the histogram to get some idea about its shape characteristics. The curve visually communicates the number of pixels at any one particular intensity.

First, check for any noticeable peaks, dips, or plateaus. For example, peaks generally indicate a large number of pixels of a certain intensity range within the image. Plateaus indicate a uniform distribution of intensities. Check to see if the histogram is skewed to the left or right. A left-skewed histogram might indicate underexposure, the scene itself being dark (e.g. a night scene), or containing dark objects. A right-skewed histogram may indicate overexposure, or a scene full of white objects. A centred histogram may indicate a well-exposed image, because it is full of mid-tones. A small, uniform hill may indicate a lack of contrast.

Next look at the edges of the histogram. A histogram with peaks that are pushed up against either edge may indicate some loss of information, a phenomenon known as clipping. For example if clipping occurs on the right side, something known as highlight clipping, the image may be overexposed in some areas. This is a common occurrence on semi-bright overcast days, where the clouds can become blown-out. But of course this is relative to the scene content of the image. As well as shape, the histogram shows how pixels are grouped into tonal regions, i.e. the highlights, shadows, and midtones.

Consider the example shown below in Figure 4. Some might interpret this as somewhat of an “ideal” histogram. Most of the pixels appear in the midtones region of the histogram, with no great amount of blacks below 17, nor whites above 211. This is a well-formed image, except that it lacks some contrast. Stretching the histogram over the entire range of 0-255 could help improve the contrast.

Fig.4: An ideal image with a central “hump” (but lacking some contrast)
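
The stretch mentioned above is a simple linear remapping of the occupied range onto 0–255. A minimal per-value sketch, using the 17–211 range quoted:

```python
def stretch(v, lo=17, hi=211):
    # Clamp to the occupied range, then map [lo, hi] onto [0, 255]
    v = min(max(v, lo), hi)
    return round((v - lo) * 255 / (hi - lo))

print(stretch(17))   # 0   (darkest occupied value becomes black)
print(stretch(211))  # 255 (lightest occupied value becomes white)
```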

Now consider a second example. This picture in Figure 5 is of a corner grocery store in Montreal and has a histogram with a multipeak shape. The three distinct features almost fit into the three tonal regions: the shadows (dark blue regions, and empty dark space to the right of the building), the midtones (e.g. the road), and the highlights (the light upper brick portion of the building). There is nothing intrinsically wrong with this histogram, as it accurately represents the scene in the image.

Fig.5: An image with multiple peaks in the histogram

Remember, if the image looks okay from a visual perspective, don’t second-guess minor disturbances in the histogram.

Next: More on interpretation – histogram shapes.

Programming to process images : scripting

If you don’t want to learn a programming language, but you do want to go beyond the likes of Photoshop and GIMP, then you should consider ImageMagick, and the use of scripts. ImageMagick is used to create, compose, edit and convert images. It can deal with over 200 image formats, and allows processing on the command-line.

1. Using the command line

Most systems, like macOS and Linux, provide what is known as a “shell”. For example on macOS it can be accessed through the “Terminal” app, although it is nicer to use an app like iTerm2. When a window is opened in Terminal (or iTerm2), the system applies a shell, which is essentially an environment that you can work in. On macOS, this is usually the “Z” shell, or zsh for short. It lets you list files, change folders (directories), and manipulate files, among many other things. At this lower level of the system, aptly known as the command-line, programs can be executed using the keyboard. The command-line is different from using an app. It is not WYSIWYG, What-You-See-Is-What-You-Get, but it is perfect for tasks where you know what you want done, or you want to process a whole series of images in the same way.

2. Processing on the command line

Processing on the command-line is very much a text-based endeavour. No one is going to apply a curve-tool to an image in this manner because there is no way of seeing the process happen live. But for other tasks, things are just way easier on the command line. A case in point is batch-processing. For example, say we have a folder of 16 megapixel images which we want to reduce in size for use on the web. It is uber tedious to open them all up in an application, and save each individually at a reduced size. Consider the following example, which reduces the size of an image by 50%, i.e. its dimensions are reduced by 50%, using one of the ImageMagick commands:

magick yorkshire.png -resize 50% yorkshire50p.png

There is also a plethora of ready-made scripts out there, for example Fred’s ImageMagick Scripts.

3. Scripting

Command-line processing can be made more powerful using a scripting language. Now it is possible to do batch processing in ImageMagick using mogrify. For example, reducing all PNG images to 40% of their original size is simple:

magick mogrify -resize 40% *.png

The one problem here is that mogrify will overwrite the existing images, so it should be run on copies of the originals. An easier way is to learn about shell scripts, which are small programs designed to run in the shell – basically they are just a list of commands to be performed. These scripts use some of the same constructs as normal programming languages to perform tasks, but also allow the use of a myriad of programs from the system. For example, below is a shell script written in bash (a type of shell), using the convert command from ImageMagick to convert all the JPG files to PNG.

#!/bin/bash
for img in *.jpg
do
    filename=$(basename "$img" .jpg)
    echo $filename
    convert "$img" "$filename.png"
done

It uses a loop to process each of the JPG files, without affecting the original files. There is some fancy stuff going on before we call convert, but all that does is split the filename and its extension (jpg), keeping the filename, and ditching the extension, so that a new extension (png) can be added to the processed image file.

Sometimes I like to view the intensity histograms of a folder of images but don’t want to have to view them all in an app. Is there an easier way? Again we can write a script.

#!/bin/bash
for img in *.png
do
   # Extract basename, ditching extension
   filename=$(basename "$img" .png)
   echo $filename

   # Create new filenames
   grayhist="$filename"-hist

   # Generate intensity / grayscale histograms
   convert $filename.png -colorspace Gray -define histogram:unique-colors=false histogram:$grayhist.png
done

Below is a sample of the output applied to a folder. Now I can easily see what the intensity histogram associated with each image looks like.

Histograms generated using a script.

Yes, some of these seem a bit complicated, but once you have a script it can be easily modified to perform other batch processing tasks.

Programming to process images : coding

People take photographs for a number of reasons. People process images for a number of reasons. Some people (like me) spend as little time as possible post-processing images, others spend hours in applications like Photoshop, tweaking every possible part of an image. To each their own. There are others still who like to tinker with the techniques themselves, writing their own software to process their images. There are many different things to consider, and my hope in this post is to try and look at what it means to create your own image processing programs to manipulate images.

Methods of image manipulation

1. What do you want to achieve?

The first thing to ask is what do you want to achieve? Do you want to implement algorithms you can’t find in regular photo manipulating programs? Do you want to batch process images? There are various different approaches here. If you want to batch process images, for example converting hundreds of images to another file format, or automatically generating histograms of images, then it is possible to learn some basic scripting, and use existing image manipulation programs such as ImageMagick. If you want to write heavier algorithms, for example some fancy new Instagram-like filter, then you will have to learn how to program using a programming language.

2. To program you need to understand algorithms

Programming of course is not a trivial thing, as much as people like to make it sound easy. It is all about taking an algorithm, which is a precise description of a method for solving a problem, and translating it into a program using a programming language. The algorithms can already exist, or they can be created from scratch. For example, a Clarendon-like (Instagram) image filter can be reproduced with the following algorithm (shown visually below):

  1. Read in an image from file.
  2. Add a blue tint to the image by blending it with a blue image of the same size (e.g. R=127, G=187, B=227, opacity = 20%).
  3. Increase the contrast of the image by 20%.
  4. Increase the saturation of the image by 35%.
  5. Save the new Clarendon-like filtered image in a new file.
A visual depiction of the algorithm for a Clarendon filter-like effect
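
Step 2 above is just an alpha blend with a constant-colour image. Here is a per-pixel sketch; the tint colour and 20% opacity come from the algorithm, while the blend function itself is purely illustrative:

```python
def blend(pixel, tint=(127, 187, 227), opacity=0.2):
    # Mix each channel: 80% original pixel, 20% blue tint
    return tuple(round((1 - opacity) * p + opacity * t)
                 for p, t in zip(pixel, tint))

print(blend((255, 255, 255)))  # (229, 241, 249): white picks up a pale blue cast
print(blend((0, 0, 0)))        # (25, 37, 45): black lifts towards blue
```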

To reproduce any of these tasks, we need to have an understanding of how to translate the algorithm into a program using a programming language, which is not always a trivial task. For example to perform the task of increasing the saturation of an image, we have to perform a number of steps:

  1. Convert the image to a colour model which allows saturation to be easily modified, for example HSI, which has three component layers: Hue, Saturation, and Intensity.
  2. Increase the saturation by manipulating the Saturation layer, i.e. multiplying all values by 1.2.
  3. Convert the image from HSI back to RGB.

Each of these tasks in turn requires additional steps. You can see this could become somewhat long-winded if the algorithm is complex.
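
As a sketch of those steps: Python’s standard colorsys module uses HLS (hue, lightness, saturation) rather than HSI, but the convert–scale–convert round trip is the same idea; the 1.2 factor is the multiplier from step 2:

```python
import colorsys

def boost_saturation(pixel, factor=1.2):
    r, g, b = (c / 255 for c in pixel)
    h, l, s = colorsys.rgb_to_hls(r, g, b)   # step 1: convert to a hue/saturation model
    s = min(s * factor, 1.0)                 # step 2: scale saturation (clamped at 1)
    r, g, b = colorsys.hls_to_rgb(h, l, s)   # step 3: convert back to RGB
    return tuple(round(c * 255) for c in (r, g, b))

print(boost_saturation((200, 100, 100)))  # a dull red becomes more vivid
```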

3. To implement algorithms you need a programming language

The next trick is the programming language. Most people don’t really want to get bogged down in programming because of the idiosyncrasies of programming languages. But in order to convert an algorithm to a program, you have to choose a programming language, and learn how to code in it.

There are many programming languages; some are old, some are new. Some are easier to use than others. Novice programmers often opt for a language such as Python, which is easy to learn, and offers a wide array of existing libraries and algorithms. The trick with programming is that you don’t necessarily want to reinvent the wheel. You don’t want to have to implement an algorithm that already exists. That’s why programming languages provide functions like sqrt() and log(), so people don’t have to implement them. For example, Akiomi Kamakura has already created a Python library called Pilgram, which contains a series of mimicked Instagram filters (26 of them), some CSS filters, and blend modes. So choosing Python means that you don’t have to build these from scratch; if anything they might provide inspiration to build your own filters. For example, the following Python program uses Pilgram to apply the Clarendon filter to a JPG image:

from PIL import Image
import pilgram
import os

inputfile = input('Enter image filename (jpg): ')
base = os.path.splitext(inputfile)[0]
outputfile = base + '-clarendon.jpg'

im = Image.open(inputfile)
pilgram.clarendon(im).save(outputfile)

The top part of the program imports the libraries, the middle portion deals with obtaining the filename of the image to be processed and producing an output filename, and the last two lines actually open the image, process it with the filter, and save the filtered image. Fun right? But you still have to learn how to code in Python. Python comes with its own baggage (like dependencies, and being uber slow), but overall it is still a solid language, and easy to learn for novices. There are also other computer vision/image processing libraries such as scikit-image, SimpleCV and OpenCV. And in reality that is the crux here: programming for beginners.

If you really want to program in a more “complex” language, like C or C++, there is often a much steeper learning curve, and fewer libraries. I would not advocate for a fledgling programmer to learn any C-based language, only because it will result in being bogged down by the language itself. For the individual hell-bent on learning one of these languages I would suggest Fortran. Fortran was the first high-level programming language, introduced in 1957, but it has evolved, and modern Fortran is easy to learn and use. It doesn’t come with much in the way of image processing libraries, but if you persevere you can build them.

The Pentax (Asahi) 17mm fish-eye lens – 160 or 180°?

The closest Pentax came to a fisheye prior to the 17mm was the Takumar 18mm, which had an angle of view of 148°. In 1967, Pentax introduced the 17mm fish-eye. There are some discrepancies about whether the Asahi fish-eye lenses had an angle-of-view of 160° or 180°. During the period when Asahi Pentax produced the 17mm lens, it seems there were three versions.

  • Fish-eye-Takumar 17mm f/4 (1967-1971)
    • This seems to be referred to in the literature as a Super-Takumar.
  • Super-Multi-Coated FISH-EYE-TAKUMAR 17mm f/4 (1971-1975)
  • SMC PENTAX FISH-EYE 17mm f/4 (1975-1985)
All three variants of the 17mm lens

Many people assume every variant is 180°, but the literature, such as brochures, seems to tell another story. As you can see from the snippets of various catalogs shown below, the earliest version seems to be 160°, with some transition between the Super-Takumar and Super-Multi-Coated being either 160° or 180°, and the later SMC versions all being 180°. What’s the real story? I haven’t been able to find out. Short of physically measuring the earlier two versions, it’s hard to tell whether they were indeed 160°, or whether it was a typo.

Specs from various pieces of literature

Using vintage fisheye lenses on a crop-sensor

I love vintage lenses, and in the future I will be posting much more on them. The question I want to look at here is the usefulness of vintage fish-eye lenses on crop sensors. Typically 35mm fisheye lenses are categorized into circular, and full-frame (or diagonal). A circular fisheye is typically in the range 8-10mm, with full-frame fisheyes typically 15-17mm. The difference is shown in Figure 1.

Fig. 1: Circular 7.5mm versus full-frame 17mm

The problem arises from the fact that fish-eye lenses differ from one another. So different that the projection itself can be one of a number of differing types, for example equidistant and equisolid. That aside, using a fisheye lens on a crop-sensor format produces much different results. This of course has to do with the crop factor. An 8mm circular fisheye on a camera with an APS-C sensor will have an AOV (Angle-of-View) equivalent to a 12mm lens. A 15mm full-frame fisheye will similarly have an AOV equivalent to a 22.5mm lens. A camera with a MFT sensor will produce an even smaller image. The effect of crop sensors on both circular and full-frame fisheye lenses is shown in Figure 2.

Fig.2: Picture areas in circular and full-frame fisheye lenses on full-frame, and crop-sensors

In particular, let’s look at the Asahi Super Takumar 17mm f/4 fish-eye lens. Produced from 1967-1971, in a couple of renditions, this lens has a 160° angle of view, in the diagonal, 130° in the horizontal. This is a popular vintage full-frame fisheye lens.

Fig.3: The Super-Takumar 17mm

The effect of using this lens on a crop-sensor camera is shown in Figure 4. It effectively loses a lot of its fisheye-ness. In the case of an APS-C sensor, the 160° in the diagonal reduces to 100°, which is on the cusp of being an ultra-wide. When paired with a MFT sensor, the AOV reduces again to 75°, now a wide-angle lens. Figure 4 also shows the horizontal AOV, which is easier to comprehend.

Fig.4: The Angle-of-View of the Super-Takumar 17mm on various sensors
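
For what it’s worth, the figures quoted above are consistent with an equisolid-angle projection, where AOV = 4·arcsin(d/4f) for sensor diagonal d and focal length f. This is my own back-of-envelope check, not a published spec:

```python
import math

def equisolid_aov(diagonal_mm, focal_mm):
    # Equisolid-angle fisheye projection: theta = 4 * arcsin(d / 4f)
    return math.degrees(4 * math.asin(diagonal_mm / (4 * focal_mm)))

full_frame = math.hypot(36, 24)  # ~43.3 mm diagonal
print(round(equisolid_aov(full_frame, 17)))        # ~158: close to the quoted 160
print(round(equisolid_aov(full_frame / 1.5, 17)))  # ~100 on APS-C
print(round(equisolid_aov(full_frame / 2, 17)))    # ~74 on MFT
```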

The bottom line is that a full-frame camera is the best place to use a vintage fish-eye lens. Using one on a crop-sensor will limit its “fisheye-ness”. Is it then worthwhile to purchase a 17mm Takumar? Sure, if you want to play with the lens, experiment with its cool built-in filters (good for B&W), or are looking for a wide-angle lens equivalent – but a true fisheye effect will never be achieved. In many circumstances, if you want a more pronounced fisheye effect on a crop-sensor, it may be better to use a modern fisheye instead.

NB: Some Asahi Pentax catalogs suggest the 17mm has an AOV of 160°, while others suggest 180°.

Pixel peeping and why you should avoid it

In recent years there has been a lot of hoopla about this idea of pixel peeping. But what is it? Pixel peeping is essentially magnifying an image until individual pixels are perceivable by the viewer. The concept has been around for many years, but was really restricted to those who post-process their images. In the pre-digital era, the closest photographers came to pixel peeping was using a loupe to view negatives and slides in greater detail. It is the evolution of digital cameras that spurred the widespread use of pixel peeping.

Fig.1: Peeping at the pixels

For some people, pixel-peeping just offers a vehicle for finding flaws, particularly in lenses. But here’s the thing: there is no such thing as a perfect lens. There will always be flaws. A zoomed-in picture will contain noise, grain, and unsharp or unfocused regions. But sometimes these are only a problem because they are being viewed at 800%. Yes, image quality is important, but if you spend all your time worrying about every single pixel, you will miss the broader context – photography is supposed to be fun.

Pixel-peeping is also limited by the resolution of the sensor, or put another way, some objects won’t look good when viewed at 1:1 at 16MP. They might look better at 24MP, and very good at 96MP, but a picture is the sum of all its pixels. My Ricoh GR III allows 16× zooming when viewing an image. Sometimes I use it just to find out if the detail has enough sharpness in close-up or macro shots. Beyond that I find little use for it. The reality is that in the field, there usually isn’t the time to deep-dive into the pixel content of a 24MP image.

Of course apps allow diving down to the level of the individual pixels. There are some circumstances where it is appropriate to look this deep. For example viewing the subtle effects of changing settings such as noise reduction, or sharpening. Or perhaps viewing the effect of using a vintage lens on a digital camera, to check the validity of manual focusing. There are legitimate reasons. Pixel peeping on the whole is really only helpful for people who are developing or finetuning image processing algorithms.

Fig.2: Pixel peeping = meaningless detail

One of the problems with looking at pixels 1:1 is that a 24MP image was never meant to be viewed at the granularity of a pixel. Given the size of the image, and the distance it should be viewed at, micro-issues are trivial. The 16MP picture in Figure 2 shows pixel-peeping of one of the ducks under the steam engine. The entire picture has a lot of detail in it, but dig closer, and the detail goes away. That makes complete sense, because there are not enough pixels to represent everything in complete detail. Pixel-peeping shows the duck’s eye – but it’s not exactly easy to decipher what it is.

People that pixel-peep are too obsessed with looking at small details, when they should be more concerned with the picture as a whole.

The different Angle-of-View measurements

Look at any lens spec, and it will normally mention the angle-of-view (AOV), sometimes used interchangeably (and incorrectly) with field-of-view (FOV). But there are three forms of AOV, and they can be somewhat confusing. The first form is the diagonal AOV. It is the most common one found in lens literature, but it isn’t very easy to comprehend without viewing the picture across the diagonal. Next is the vertical AOV, which makes the least sense, because we generally don’t take pictures, or even visualize, in the vertical. Lastly is the horizontal AOV, which makes the most sense, because of how humans perceive the world in front of them.

Showing the diagonal AOV of a lens is hard to conceptualize. It’s a bit like the way TVs are described as being, say, 50″, which is the diagonal measurement. In reality though, the TV is only 43.6″ wide. Horizontal is how people generally conceptualize things. As an example of a lens, consider a 24mm full-frame lens – it has a diagonal AOV of 84°, and a horizontal AOV of 74°. This isn’t a huge difference, but enough to get a little confusing. A 16mm lens that has an AOV of 180° in the diagonal may only have a horizontal AOV of 140°. An example of this is shown below.
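
For an ordinary rectilinear lens the three AOVs follow directly from the sensor dimensions, AOV = 2·arctan(s/2f), where s is the diagonal, horizontal, or vertical size of the sensor. A quick check against the 24mm full-frame figures above:

```python
import math

def aov(size_mm, focal_mm):
    # Rectilinear angle of view: 2 * arctan(s / 2f)
    return math.degrees(2 * math.atan(size_mm / (2 * focal_mm)))

diagonal = math.hypot(36, 24)  # full-frame diagonal, ~43.3 mm
print(round(aov(diagonal, 24)))  # 84 (diagonal AOV)
print(round(aov(36, 24)))        # 74 (horizontal AOV)
print(round(aov(24, 24)))        # 53 (vertical AOV)
```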

Why are there no 3D colour histograms?

Some people probably wonder why there aren’t any 3D colour histograms. I mean if a colour image is comprised of red, green, and blue components, why not provide those in a combined manner rather than separate 2D histograms or a single 2D histogram with the R,G,B overlaid? Well, it’s not that simple.

A 2D histogram has 256 pieces of information (grayscale). A 24-bit colour image contains 256³ colours – that’s 16,777,216 pieces of information. So a three-dimensional “histogram” would contain the same number of elements. Well, it’s not really a histogram, more of a 3D representation of the diversity of colours in the image. Consider the example shown in Figure 1. The sample image contains 428,763 unique colours, representing just 2.5% of all available colours. Two different views of the colour cube (rotated) show the dispersion of colours. Both show the vastness of the 3D space, and conversely the sparsity of the image colour information.

Figure 1: A colour image and 3D colour distribution cubes shown at different angles

It is extremely hard to create a true 3D histogram. A true 3D histogram would have a count of the number of pixels with a particular RGB triplet at every point. For example, how many times does the colour (23,157,87) occur? It’s hard to visualize this in a 3D sense, because unlike the 2D histogram which displays frequency as the number of occurrences of each grayscale intensity, the same is not possible in 3D. Well it is, kind-of.

In a 3D histogram which already uses the three dimensions to represent R, G, and B, there would have to be a fourth dimension to hold the number of times a colour occurs. To obtain a true 3D histogram, we would have to group the colours into “cells”, which are essentially clusters representing similar colours. An example of the frequency-weighted histogram for the image in Figure 1, using 500 cells, is shown in Figure 2. You can see that while the colour distribution cube in Figure 1 shows a large band of reds, because these colours exist in the image, the frequency-weighted histogram shows that objects with red colours actually comprise a small number of pixels in the image.

Figure 2: The frequency-distributed histogram of the image in Fig.1
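
Grouping colours into cells is just coarse quantization. Here is a sketch using 16 levels per channel (4096 cells, an arbitrary choice of mine, rather than the 500 cells used above):

```python
from collections import Counter

def colour_cells(pixels, levels=16):
    # Quantize each 0-255 channel down to `levels` bins, then count how
    # many pixels fall into each (r, g, b) cell.
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

# Two similar greens and one red: the greens share a cell
pixels = [(23, 157, 87), (25, 159, 88), (250, 10, 10)]
cells = colour_cells(pixels)
print(cells.most_common(1))  # [((1, 9, 5), 2)]
```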

The bigger problem is that it is quite hard to visualize a 3D anything, and actively manipulate it. There are very few tools for this. Theoretically it makes sense to deal with 3D data in 3D. The application ImageJ (Fiji) does offer an add-on called Color Inspector 3D, which facilitates viewing and manipulating an image in 3D, in a number of differing colour spaces. Consider another example, shown in Figure 3. The aerial image, taken above Montreal, lacks contrast. From the example shown, you can see that the colour image takes up quite a thin band of colours, almost on the black-white diagonal (it has 186,322 unique colours).

Figure 3: Another sample colour image and its 3D colour distribution cube

Using the contrast tool provided in ImageJ, it is possible to manipulate the contrast in 3D. Here we have increased the contrast by 2.1 times. You can easily see in Figure 4 the difference working in 3D makes. This is something that is much harder to do in two dimensions, manipulating each colour independently.

Figure 4: Increasing contrast via the 3D cube

Another example, increasing colour saturation 2 times, and the associated 3D colour distribution, is shown in Figure 5. Color Inspector 3D also allows viewing and manipulating the image in other colour spaces such as HSB and CIELab. For example, in HSB the true effect of manipulating saturation can be gauged. The downside is that it does not actually process the full-resolution image, but rather one reduced in size, largely (I imagine) because it can’t handle the size of the image and allow manipulation in real-time.

Figure 5: Increasing saturation via the 3D cube

the image histogram (ii) – grayscale vs colour

In terms of image processing there are two basic types of histogram: (i) colour, and (ii) intensity (or luminance/grayscale) histograms. Figure 1 shows a colour image (an aerial shot of Montreal), and its associated RGB and intensity histograms. Colour histograms are essentially RGB histograms, typically represented by three separate histograms, one for each of the components – Red, Green, and Blue. The three R,G,B histograms are sometimes shown in one mixed histogram with all three R,G,B, components overlaid with one another (sometimes including an intensity histogram).

Fig.1: Colour and grayscale histograms

Both RGB and intensity histograms contain the same basic information – the distribution of values. The difference lies in what the values represent. In an intensity histogram, the values represent the intensity values in a grayscale image (typically 0 to 255). In an RGB histogram, divided into individual R, G, and B histograms, each channel's histogram is simply a graph of the frequencies of that component's values across all pixels.

An example is shown in Figure 2. Here a single pixel is extracted from an image. The RGB triplet for the pixel is (230,154,182), i.e. it has a red value of 230, a green value of 154, and a blue value of 182. Each value is counted in its respective bin in the associated component histogram. So the red value 230 is counted in the bin marked "230" in the red histogram. The three R, G, B histograms are visually no different from an intensity histogram. The individual R, G, and B histograms do not represent distributions of colours, but merely distributions of components; for that you need a 3D histogram (see the note at the end of the post).
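The binning in Figure 2 can be mimicked in a few lines of shell. Here the single example pixel (230,154,182) is tallied into its red, green, and blue bins using awk:

```shell
# One pixel's R, G, B values, each counted in its own channel's bin array
echo "230 154 182" | awk '{
    r[$1]++; g[$2]++; b[$3]++
    printf "red[230]=%d green[154]=%d blue[182]=%d\n", r[230], g[154], b[182]
}'
# prints: red[230]=1 green[154]=1 blue[182]=1
```

A real histogram would simply repeat this tally over every pixel in the image, leaving each bin with a frequency count.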

Fig.2: How an RGB histogram works: From single RGB pixel to RGB component histograms

Applications portray colour histograms in many different forms. Figure 3 shows the RGB histograms from three differing applications: Apple Photos, ImageJ, and ImageMagick. Apple Photos provides the user with the option of showing the luminance histogram, the mixed RGB, or the individual R, G, B histograms. The combined histogram shows all the overlaid R, G, B histograms, and a gray region showing where all three overlap. ImageJ shows the three components in separate histograms, and ImageMagick provides options for showing them either combined or separate. Note that some histograms (ImageMagick) seem a little "compressed", because of the chosen x-scale.
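For reference, ImageMagick can also dump histogram data as plain text via its histogram: pseudo-format, listing each unique colour with its pixel count. A quick sketch on a tiny generated image (the filenames are hypothetical), which for an all-white 4×1 image prints something like "4: (255,255,255) #FFFFFF white":

```shell
# Make a 4x1 white test image, then print its colour counts as text
convert -size 4x1 xc:white sample.png
convert sample.png -format %c histogram:info:-
```

The same pseudo-format without info: (e.g. histogram:hist.png) renders the histogram as an image instead.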

Fig.3: How RGB histograms are depicted in applications

One thing you may notice when comparing intensity and RGB histograms is that the intensity histogram is very similar to the green channel of the RGB image (see Figure 4). The human eye is more sensitive to green light than red or blue light. Typically the green intensity levels within an image are most representative of the brightness distribution of the colour image.

Fig.4: The RGB-green histogram versus intensity histogram

An intensity image is normally created from an RGB image by converting each pixel so that it represents a value based on a weighted average of the three colours at that pixel. This weighting assumes that green represents 59% of the perceived intensity, while the red and blue channels account for just 30% and 11%, respectively. Here is the actual formula used:

gray = 0.299R + 0.587G + 0.114B

Once you have a grayscale image, it can be used to derive an intensity histogram. Figure 5 illustrates how a grayscale image is created from an RGB image using this formula.

Fig.5: Deriving a grayscale image from an RGB image

Honestly there isn’t really that much useful data in RGB histograms, although they seem to be very common in image manipulation applications, and digital cameras. The problem lies with the notion of the RGB colour space. It is a space in which chrominance and luminance are coupled together, and as such it is difficult to manipulate any one of the channels without causing shifts in colour. Typically, applications that allow manipulation of the histogram do so by first converting the image to a decoupled colour space such as HSB (Hue-Saturation-Brightness), where the brightness can be manipulated independently of the colour information.
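As an illustration of the decoupled approach, ImageMagick can convert an image into HSB, operate on only the brightness channel, and convert back, leaving hue and saturation untouched. A sketch under the assumption that ImageMagick is installed; in HSB mode the third channel, addressed as Blue, holds brightness, and the input image here is just a generated gradient:

```shell
# Hypothetical input: a small red-to-blue gradient
convert -size 8x8 gradient:red-blue sample.png

# Equalize only the brightness channel in HSB, then convert back to sRGB
convert sample.png -colorspace HSB \
        -channel Blue -equalize +channel \
        -colorspace sRGB equalized.png
```

Doing the same -equalize directly on the RGB channels would shift the colours as well as the brightness, which is exactly the coupling problem described above.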

A Note on 3D RGB: Although it would be somewhat useful, there are very few applications that provide a 3D histogram, constructed from the R, G, and B information. One reason is that these 3D matrices can be very sparse. Instead of three 1D histograms, each with 256 bins, there is now a 3D histogram with 256³ = 16,777,216 bins. The other reason is that 3D histograms are hard to visualize.
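The sparsity argument is simple arithmetic: the cube has 256³ bins, yet even a large photograph has far fewer pixels, so most bins must sit empty. In shell:

```shell
bins=$((256*256*256))     # 16,777,216 possible RGB triplets
pixels=$((4000*3000))     # a hypothetical 12-megapixel image
echo "$bins bins, at most $pixels occupied"
```

Even if every pixel in that 12-megapixel image had a unique colour, fewer than three-quarters of the cube's bins could ever be filled.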