Computer Vision - Part 1: Reading The MNIST Dataset in R

In this series I will go through the creation of machine learning models to perform a basic computer vision task: digit recognition.

The task is to train a machine learning algorithm to recognise handwritten digits 0-9.

For this modelling I will use the famous, introductory dataset for computer vision: the MNIST dataset.

The MNIST dataset (Modified National Institute of Standards and Technology database) contains 60,000 training images and 10,000 testing images of handwritten numerical digits from 0-9.

The digits were written by United States Census Bureau employees and high school students. MNIST standardises the original images into 28x28 pixel greyscale images and mixes both groups of writers across the training and test sets, which makes the split better suited to machine learning.

The data are located at: http://yann.lecun.com/exdb/mnist/

The data are stored in the IDX file format, a simple binary format for vectors and multidimensional matrices. They are split across the four files listed below.

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
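
The idx3 and idx1 parts of the file names reflect the layout: per the format notes on the MNIST page, each file starts with a 4-byte magic number whose third byte gives the element type (0x08 for unsigned byte) and whose fourth byte gives the number of dimensions. The sketch below decodes that header word from raw bytes. The helper name decode_idx_magic is made up for illustration and is not part of any package.

```r
# Hypothetical helper: split a 4-byte IDX magic number into its parts
decode_idx_magic <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 4)
  list(
    # The whole header word read as a big-endian 32-bit integer
    magic = readBin(bytes, what = "integer", n = 1, size = 4, endian = "big"),
    # Third byte: element type code (0x08 means unsigned byte)
    type_code = as.integer(bytes[3]),
    # Fourth byte: number of dimensions stored in the file
    n_dims = as.integer(bytes[4])
  )
}

# Image files (idx3) and label files (idx1) differ only in the last byte
image_magic <- decode_idx_magic(as.raw(c(0x00, 0x00, 0x08, 0x03)))
label_magic <- decode_idx_magic(as.raw(c(0x00, 0x00, 0x08, 0x01)))
```

Read as integers, these come out as 2051 for the image files and 2049 for the label files, which is a handy check that a file was downloaded and decompressed correctly.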

This guide walks through how to read these files into R.

The images are represented as row-wise pixel values from 0-255 where 0 means background (white) and 255 means foreground (black).

Given each image is a standardised 28x28 pixel image we will have 28*28=784 pixels per image.

We will read this as a matrix with one row per image.
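
To make the one-row-per-image layout concrete, here is a small sketch using two toy "images" of 2x3 pixels instead of 28x28. The pixel values and variable names are invented for the example; only the reshaping step mirrors what we do with MNIST.

```r
# Two toy images of 2 x 3 pixels each, stored back to back, row by row
n_rows <- 2
n_cols <- 3
pixels <- c(0, 10, 20, 30, 40, 50,          # image 1
            255, 245, 235, 225, 215, 205)   # image 2

# byrow = TRUE places each run of n_rows * n_cols values on its own row,
# so row i of the matrix holds every pixel of image i
toy_mat <- matrix(pixels, ncol = n_rows * n_cols, byrow = TRUE)
dim(toy_mat)
```

With the real data the same call uses ncol = 784, giving one row per digit.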

The function below will download the compressed IDX files and read the values required, putting them into the required matrix structure.

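library(R.utils)

read_mnist_file <- function(url){
  # Download the compressed IDX file to a temporary location
  tf <- tempfile(fileext = ".gz")
  download.file(url, destfile = tf, mode = "wb")
  # gunzip() decompresses the file and returns the path of the result
  uncomp <- R.utils::gunzip(tf)
  con <- file(uncomp, 'rb')
  # Header: magic number, image count, rows and columns (big-endian integers)
  magic <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  imgs <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  nrows <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  ncols <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  # Pixels: one unsigned byte each, stored image by image, row by row
  x <- readBin(con, 'integer', n = imgs * nrows * ncols, size = 1, signed = FALSE)
  xmat <- matrix(x, ncol = nrows * ncols, byrow = TRUE)
  close(con)
  return(xmat)
}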
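
If you want to check the reading logic without downloading anything, one option is to write a tiny IDX-style file yourself and read it back. This is only a sketch: the file holds two made-up 2x2 "images", and names such as idx_path and toy_images are invented here.

```r
# Build a tiny IDX3 file: 2 images of 2 x 2 pixels (made-up values)
idx_path <- tempfile(fileext = ".idx3-ubyte")
out <- file(idx_path, "wb")
# Header: magic number, image count, rows, columns as big-endian integers
writeBin(c(2051L, 2L, 2L, 2L), out, size = 4, endian = "big")
# Pixel data: one byte per pixel
writeBin(as.raw(c(0, 128, 255, 64, 1, 2, 3, 4)), out)
close(out)

# Read it back with the same kind of readBin() calls used for MNIST
con <- file(idx_path, "rb")
header <- readBin(con, what = "integer", n = 4, size = 4, endian = "big")
px <- readBin(con, what = "integer", n = prod(header[2:4]),
              size = 1, signed = FALSE)
close(con)
unlink(idx_path)

# One row per image, one column per pixel
toy_images <- matrix(px, ncol = header[3] * header[4], byrow = TRUE)
toy_images
```

Note the signed = FALSE argument: without it the byte 255 would be read back as -1.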


Now we can apply this function to our test and training data.

training_images <- read_mnist_file('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz')
test_images <- read_mnist_file('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz')


We have two matrices with 784 columns (one column per pixel value). The training set has 60,000 rows (images) and the test dataset has 10,000 rows (images).

 dim(training_images)

## [1] 60000   784

 dim(test_images)

## [1] 10000   784


Labels

Now we can read in the labels, which give the ground-truth digit (0-9) for each image.

read_mnist_label <- function(url){
  # Download and decompress the label file
  tf <- tempfile(fileext = ".gz")
  download.file(url, destfile = tf, mode = "wb")
  uncomp <- R.utils::gunzip(tf)
  con <- file(uncomp, 'rb')
  # Header: magic number and number of labels (big-endian integers)
  magic <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  imgs <- readBin(con = con, what = 'integer', n = 1, size = 4, endian = "big")
  # Labels: one unsigned byte per image
  lab <- readBin(con = con, what = 'integer', n = imgs, size = 1, signed = FALSE)
  close(con)
  return(lab)
}

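training_labels <- read_mnist_label(url = "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz")
test_labels <- read_mnist_label(url = "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz")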


Let’s preview the first 20 labels. We can check them against the images themselves when we plot the digits below.

head(training_labels, 20)

##  [1] 5 0 4 1 9 2 1 3 1 4 3 5 3 6 1 7 2 8 6 9
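
A quick sanity check on labels is to confirm they are all valid digits and to count how often each one appears. The sketch below does this for the 20 labels printed above, typed in by hand so it runs on its own; the same calls should work on the full training_labels vector.

```r
# The first 20 training labels, copied from the output above
first_labels <- c(5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9)

# Every label should be a whole digit between 0 and 9
stopifnot(all(first_labels %in% 0:9))

# Count each digit; factor() with explicit levels keeps digits that
# happen not to appear, so the table always has ten entries
label_counts <- table(factor(first_labels, levels = 0:9))
label_counts
```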


We will save this for future use.
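
One way to do this is with saveRDS() and readRDS(). The sketch below uses a small stand-in object and a temporary path so it runs on its own; the list structure and the file name mnist_example.rds are assumptions for illustration, not a prescribed layout.

```r
# Stand-in for the real objects (training_images, training_labels, ...)
mnist <- list(
  images = matrix(0L, nrow = 2, ncol = 784),
  labels = c(5L, 0L)
)

# Hypothetical location; swap tempdir() for a project folder in practice
rds_path <- file.path(tempdir(), "mnist_example.rds")
saveRDS(mnist, rds_path)

# Later, for example in a fresh session, load the object back
mnist_restored <- readRDS(rds_path)
identical(mnist, mnist_restored)
```

Keeping images and labels together in one list makes it harder for them to drift out of sync between sessions.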

Visualising the MNIST digits

We can plot each image to get a feel for what the data represents.

The function below takes one or more rows from either the test or training set and turns each row back into its own 28x28 matrix so we can view it.

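show_mnist_image <- function(data, image_num = 1){
  # Each row holds one image as 784 row-wise pixel values
  x <- data[image_num, , drop = FALSE]
  n <- length(image_num)

  if(n > 1){
    # Lay the plots out on a grid with enough cells for every image
    n_col <- ceiling(sqrt(n))
    n_row <- ceiling(n / n_col)
    old_par <- par(mfcol = c(n_row, n_col))
    # Restore the previous plotting layout when the function exits
    on.exit(par(old_par))
  }

  for(i in seq_len(n)){
    # matrix() fills column-wise, so tm is the transposed image; reversing
    # its columns shows the digit upright because image() draws bottom-up
    tm <- matrix(x[i,], nrow = 28, ncol = 28)
    image(tm[,28:1])
  }
}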

show_mnist_image(data = training_images, 1:12)
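
The reshaping and column reversal inside the plotting function are easier to follow on a toy example. The sketch below uses a made-up 3x3 "image" and also shows one possible way to draw in greyscale, with 0 as white and 255 as black, rather than the default palette. The palette name mnist_greys is invented here.

```r
# A 3 x 3 toy image and its row-wise flattening, as stored in MNIST
img <- matrix(1:9, nrow = 3, byrow = TRUE)
flat <- as.vector(t(img))   # 1, 2, ..., 9

# matrix() fills column by column, so this gives the transpose of img
tm <- matrix(flat, nrow = 3, ncol = 3)
identical(tm, t(img))

# 256 greys running from white (lowest value) to black (highest value)
mnist_greys <- grey(seq(1, 0, length.out = 256))

# image() draws matrix rows along the x axis and columns up the y axis,
# so reversing the columns puts the first pixel row at the top
image(tm[, 3:1], col = mnist_greys, axes = FALSE)
```

For a real digit the same col and axes arguments can be passed to the image() calls in show_mnist_image.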


References

https://en.wikipedia.org/wiki/MNIST_database

http://yann.lecun.com/exdb/mnist/
