July 23, 2019

Compressing images using Python

Compressing images is a neat way to shrink the size of an image while maintaining the resolution. In this tutorial we’re building an image compressor using Python, Numpy and Pillow. We’ll be using machine learning, the unsupervised K-means algorithm to be precise.

If you don’t have Numpy and Pillow installed, you can do so using the following command:

pip3 install pillow
pip3 install numpy

Start by importing the following libraries:

import os
import sys

from PIL import Image
import numpy as np

The K-means algorithm

K-means, as mentioned in the introduction, is an unsupervised machine learning algorithm. Simplified, the major difference between unsupervised and supervised machine learning algorithms is that supervised learning algorithms learn by example (labels are in the dataset), and unsupervised learning algorithms learn by trial and error; you label the data yourself.

I’ll be explaining how the algorithm works based on an example. In the illustration below there are two features, \(x_1\) and \(x_2\).

Unclassified points on a 2d plane

We want to assign each item to one out of two clusters. The most natural way to do this would be something like this:

Colored points on a 2 plane

K-means is an algorithm to do exactly this. Note that \(K\) is the number of clusters we want to have, hence the name K means.

We start by randomly getting \(K\) training examples and in \(\mu\) so that the points are \(\mu_1, \mu_2, …, \mu_k\) and \(\mu \in \mathbb{R}^n\) where \(n\) is the number of features. At initialization, the points might be very close to one another, so we have to check if our result looks like how we want it to look because it might stuck be in a local optimum.

Then for a certain number of iterations, execute the following steps:

  1. For every training example, assign \(c^{(i)}\) to the closest centroid.
  2. For every centroid \(\mu_k\), set the location to be the average of examples assigned to it.

As the algorithm progresses, generally the centroids will move to the center of the clusters and the overall distance of the examples to the clusters gets smaller.

K-means for image compression

This technique can also be used for image compression. Each pixel of the image consists of three values, R(ed), B(lue) and G(reen). Those can be seen as the points on the grid, in 3D in case of RGB. The objective of image compression in this case is to reduce the number of colors by taking the average \(K\) colors that look closest to the original image. An image with fewer colors takes up less disk space, which is what we want.

Compressing the image can also be a preprocessing step to another algorithm. Reducing the size of the input data generally speeds up learning.

Implementing K-means

We start by implementing a function that creates initial points for the centroids. This function takes as input X, the training examples, and chooses \(K\) distinct points at random.

def initialize_K_centroids(X, K):
    """ Choose K points from X at random """
    m = len(X)
    return X[np.random.choice(m, K, replace=False), :]

Then we write a function to find the closest centroid for each training example. This is the first step of the algorithm. We take X and the centroids as input and return the the index of the closest centroid for every example in c, an m-dimensional vector.

def find_closest_centroids(X, centroids):
    m = len(X)
    c = np.zeros(m)
    for i in range(m):
        # Find distances
        distances = np.linalg.norm(X[i] - centroids, axis=1)

        # Assign closest cluster to c[i]
        c[i] = np.argmin(distances)

    return c

For the second step of the algorithm, we compute the distance of each example to ‘its’ centroid and take the average of distance for every centroid \(\mu_k\). Because we’re looping over the rows, we have to transpose the examples.

def compute_means(X, idx, K):
    _, n = X.shape
    centroids = np.zeros((K, n))
    for k in range(K):
        examples = X[np.where(idx == k)]
        mean = [np.mean(column) for column in examples.T]
        centroids[k] = mean
    return centroids

Finally, we got all the ingredients to complete the K-means algorithm. We set max_iter, the maximum number of iterations, to 10. Note that if the centroids aren’t moving anymore, we return the results because we cannot optimize any further.

def find_k_means(X, K, max_iters=10):
    centroids = initialize_K_centroids(X, K)
    previous_centroids = centroids
    for _ in range(max_iters):
        idx = find_closest_centroids(X, centroids)
        centroids = compute_means(X, idx, K)
        if (centroids == previous_centroids).all():
            # The centroids aren't moving anymore.
            return centroids
        else:
            previous_centroids = centroids

    return centroids, idx

Getting the image

In case you’ve never worked with Pillow before, don’t worry. The api is very easy.

We start by trying to open the image, which is defined as the first (and last) command line argument like so:

try:
    image_path = sys.argv[1]
    assert os.path.isfile(image_path)
except (IndexError, AssertionError):
    print('Please specify an image')

Pillow gives us an Image object, but our algorithm requires a NumPy array. So let’s define a little helper function to convert them. Notice how each value is divided by 255 to scale the pixels to 0…1 (a good ML practice).

def load_image(path):
    """ Load image from path. Return a numpy array """
    image = Image.open(path)
    return np.asarray(image) / 255

Here’s how to use that:

image = load_image(image_path)
w, h, d = image.shape
print('Image found with width: {}, height: {}, depth: {}'.format(w, h, d))

Then we get our feature matrix \(X\). We’re reshaping the image because each pixel has the same meaning (color), so they don’t have to be presented as a grid.

X = image.reshape((w * h, d))
K = 20 # the desired number of colors in the compressed image

Finally we can make use of our algorithm and get the \(K\) colors. These colors are chosen by the algorithm.

colors, _ = find_k_means(X, K, max_iters=20)

Because the indexes returned by the find_k_means function are 1 iteration behind the colors, we compute the indexes for the current colors. Each pixel has a value in \({0...K}\) corresponding, of course, to its color.

idx = find_closest_centroids(X, colors)

Once we have all the data required we reconstruct the image by substituting the color index with the color and resphaping the image back to its original dimensions. Then using the Pillow function Image.fromarray we convert the raw numbers back to an image. We also convert the indexes to integers because numpy only accepts those as indexes for matrices.

idx = np.array(idx, dtype=np.uint8)
X_reconstructed = np.array(colors[idx, :] * 255, dtype=np.uint8).reshape((w, h, d))
compressed_image = Image.fromarray(X_reconstructed)

Finally, we save the image back to the disk like so:

compressed_image.save('out.png')

Testing an image

This is the fun part, testing our program. You can choose any (png) image you want. I use this image by Valentin Pinisoara on Unsplash that I scaled down for speed purposes.

Compressed image next to uncompressed image

The file size was reduced by 71%, from 228kb to 65kb with \(K = 20\).

To get a better intuition for this algorith, you should try out different values for \(K\) and max_iter yourself.

Conclusion

As you’ve seen in this tutorial, it’s quite easy to reduce the file size of an image using unsupervised machine learning.

Check out the finished program on GitHub.

Thanks to Daniel Hess for reporting an error in this post.

If you liked this post, you will probably also like gettingstarted.ml: an open source list of the best courses, books, papers, and more to quickly get started with ML.

Thanks for reading

I hope this post was useful to you. If you have any questions or comments, feel free to reach out on Twitter or email me directly at rick_wierenga [at] icloud [dot] com.