Computer Vision - Part 1: Reading The MNIST Dataset in R

In this series I will go through the creation of machine learning models to perform a basic computer vision task: digit recognition.

The task is to train a machine learning algorithm to recognise handwritten digits 0-9.

For this modelling I will use the famous, introductory dataset for computer vision: the MNIST dataset.

About MNIST

The MNIST dataset (Modified National Institute of Standards and Technology database) contains 60,000 training images and 10,000 testing images of handwritten numerical digits from 0-9.

These images come from a mix of United States Census Bureau employees and high school students. MNIST standardises the original NIST images into 28x28 pixel greyscale images and remixes the original training and test sets so that the split is better suited to machine learning applications.

The data are located at: http://yann.lecun.com/exdb/mnist/

Read in the data

The data are stored in the IDX file format, a simple binary format for vectors and multidimensional matrices, and are split across the four gzip-compressed files below.

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
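Each IDX file starts with a 32-bit magic number that encodes the data type and the number of dimensions, which is where the idx3 and idx1 parts of the file names come from. As a small sketch (the helper name decode_idx_magic is my own, not part of any package), we can unpack the two values R will read later:

```r
# Split an IDX magic number into its data-type code (third byte)
# and its number of dimensions (fourth byte).
decode_idx_magic <- function(magic) {
  list(
    type_code = bitwAnd(bitwShiftR(magic, 8L), 255L),
    n_dims    = bitwAnd(magic, 255L)
  )
}

image_magic <- decode_idx_magic(2051L)  # magic number of the image files
label_magic <- decode_idx_magic(2049L)  # magic number of the label files
```

A type code of 8 means unsigned bytes. The image files have three dimensions (image, row, column) and the label files have one.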

This guide walks through how to read these files into R.

Reading Images

The images are represented as row-wise pixel values from 0-255 where 0 means background (white) and 255 means foreground (black).

Given each image is a standardised 28x28 pixel image we will have 28*28=784 pixels per image.

We will read this as a matrix with one row per image.
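Because the pixels are stored row-wise, the pixel at image row r and image column c ends up in a predictable column of that matrix. A minimal sketch of the mapping, using a hypothetical helper called pixel_column:

```r
# Column of the flattened matrix that holds the pixel at image row r, column c,
# assuming row-wise storage and 1-based indices.
pixel_column <- function(r, c, width = 28L) {
  (r - 1L) * width + c
}

top_left     <- pixel_column(1L, 1L)    # first pixel of the image
second_row   <- pixel_column(2L, 1L)    # first pixel of the second row
bottom_right <- pixel_column(28L, 28L)  # last pixel of the image
```

The top-left pixel lands in column 1, the second row starts at column 29, and the bottom-right pixel is column 784.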

The function below will download the compressed IDX files and read the values required, putting them into the required matrix structure.

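library(R.utils)

# Download a gzipped IDX image file and return a matrix with one row per image.
read_mnist_file <- function(url){
  tf <- tempfile(fileext = ".gz")
  download.file(url, destfile = tf, mode = "wb")  # binary mode keeps the archive intact
  uncomp <- R.utils::gunzip(tf)                   # path of the decompressed file
  con <- file(uncomp, 'rb')
  on.exit(close(con))                             # close the connection even if a read fails
  # Header: four big-endian 32-bit integers
  magic <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # 2051 for image files
  imgs <-  readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # number of images
  nrows <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # pixel rows per image
  ncols <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # pixel columns per image
  # Pixels: one unsigned byte each, stored image by image, row by row
  x <- readBin(con, 'integer', n = imgs * nrows * ncols, size = 1, signed = FALSE)
  matrix(x, ncol = nrows * ncols, byrow = TRUE)
}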
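To check the header and pixel reads without downloading anything, here is a rough sketch that writes a tiny file in the same layout and reads it back with the same readBin calls. The toy data (two 2x3 "images") and the names toy_path and toy_images are made up for illustration:

```r
# Write a tiny IDX3-style file: magic number, image count, rows, columns,
# then one unsigned byte per pixel.
toy_path <- tempfile(fileext = ".idx")
con <- file(toy_path, "wb")
writeBin(c(2051L, 2L, 2L, 3L), con, size = 4, endian = "big")
writeBin(as.raw(c(0, 10, 20, 30, 40, 50, 255, 200, 150, 100, 50, 0)), con)
close(con)

# Read it back: four big-endian integers, then the unsigned pixel bytes.
con <- file(toy_path, "rb")
header <- readBin(con, "integer", n = 4, size = 4, endian = "big")
pixels <- readBin(con, "integer", n = prod(header[2:4]), size = 1, signed = FALSE)
close(con)

# One row per image, one column per pixel.
toy_images <- matrix(pixels, ncol = header[3] * header[4], byrow = TRUE)
```

Reading with signed = FALSE matters here: without it the byte 255 would come back as -1.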

Now we can apply this function to our test and training data.

training_images <- read_mnist_file('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz')
test_images <- read_mnist_file('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz')

We have two matrices with 784 columns (one column per pixel value). The training set has 60,000 rows (images) and the test dataset has 10,000 rows (images).

dim(training_images)
## [1] 60000   784
dim(test_images)
## [1] 10000   784

Labels

Now we can read in the labels, which give the ground truth for each image: the digit from 0-9 that it shows.

# Download a gzipped IDX label file and return an integer vector with one label per image.
read_mnist_label <- function(url){
  tf <- tempfile(fileext = ".gz")
  download.file(url, destfile = tf, mode = "wb")  # binary mode keeps the archive intact
  uncomp <- R.utils::gunzip(tf)                   # path of the decompressed file
  con <- file(uncomp, 'rb')
  on.exit(close(con))                             # close the connection even if a read fails
  magic <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # 2049 for label files
  imgs <-  readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")  # number of labels
  # Labels: one unsigned byte each
  readBin(con = con, what = 'integer', n = imgs, size = 1, signed = FALSE)
}

training_labels <- read_mnist_label(url = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz")
test_labels <- read_mnist_label(url = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz")

Let’s preview the first 20 labels. We can check them against the images themselves when we plot those below.

head(training_labels, 20)
##  [1] 5 0 4 1 9 2 1 3 1 4 3 5 3 6 1 7 2 8 6 9
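A quick sanity check is to confirm that every label is a digit from 0-9 and to count how often each one appears. The sketch below runs on the 20 labels printed above (copied into a vector called first_labels, a name I have made up) so it works without the download. The same two lines should apply to the full training_labels vector.

```r
# First 20 training labels, copied from the preview above.
first_labels <- c(5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9)

# Every label should be a whole digit from 0 to 9.
labels_valid <- all(first_labels %in% 0:9)

# Count each digit; factor() keeps digits that happen not to appear.
label_counts <- table(factor(first_labels, levels = 0:9))
```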

We will save the images and labels for future use.
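One way to do this, sketched here under my own assumptions, is to bundle the objects into a list and write it with saveRDS. The file name, the temporary directory and the toy stand-in objects below are all hypothetical. In practice you would pass the real matrices and label vectors and pick a permanent location.

```r
# Hypothetical stand-ins for the real objects, so the sketch runs on its own.
toy_images <- matrix(0:5, nrow = 2, byrow = TRUE)
toy_labels <- c(5L, 0L)

# Assumed location and file name: a file in the session's temporary directory.
out_file <- file.path(tempdir(), "mnist_toy.rds")
saveRDS(list(images = toy_images, labels = toy_labels), out_file)

# Reload the saved list, as a later session would.
mnist_toy <- readRDS(out_file)
```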

Visualising the MNIST digits

We can plot each image to get a feel for what the data represents.

The function below takes one or more rows from either the test or training set and turns each one back into its own 28x28 matrix so we can view it.

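show_mnist_image <- function(data, image_num = 1){
  x <- data[image_num, , drop = FALSE]  # keep a matrix even for a single image
  n <- length(image_num)

  if(n > 1){
    # Grid large enough for every image, filled row by row
    col <- ceiling(sqrt(n))
    row <- ceiling(n / col)
    old_par <- par(mfrow = c(row, col))
    on.exit(par(old_par))               # restore the previous plot layout
  }
  for(i in seq_len(n)){
    # Pixels are stored row-wise, so a column-wise fill puts image columns on the x axis
    tm <- matrix(x[i,], nrow = 28, ncol = 28)
    image(tm[,28:1])                    # reverse so the top of the digit is drawn at the top
  }
}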

show_mnist_image(data = training_images, 1:12)
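The reshaping step inside the function is easy to get backwards, so here is a small sketch of why it works, using a made-up 3x3 "image" in place of a 28x28 digit. matrix() fills column by column, so the row-wise pixel vector comes out transposed. image() then draws the first column of its input at the bottom, which is why the columns are reversed.

```r
# A toy 3x3 image stored row-wise, like one row of the MNIST matrix.
x <- 1:9

# The picture as a person would read it: first three pixels form the top row.
img <- matrix(x, nrow = 3, byrow = TRUE)

# What the plotting function builds: a column-wise fill, i.e. the transpose.
tm <- matrix(x, nrow = 3, ncol = 3)

# Reversing the columns puts the bottom image row first, where image() draws
# the bottom of the plot, so the top image row ends up at the top.
flipped <- tm[, 3:1]
```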

References

https://en.wikipedia.org/wiki/MNIST_database

http://yann.lecun.com/exdb/mnist/