Classify dog and cat pictures with 92% accuracy using a deep convolutional neural network.
In this article, you will build a convolutional neural network from scratch to classify images into two categories, dog or cat, with 92% accuracy.
We're not going to use transfer learning this time (so no cheating!), and I will explain in detail the process I followed to solve this classic problem.
You will learn how to prepare the dataset and clean it up, how to feed images to the network with the ImageDataGenerator class, how to build and train a small convolutional neural network, and how to improve its performance with data augmentation.
Finally, you will try the ResNet50 pre-trained model, just to see.
Unlike most posts on this blog, I do not provide a recipe to run this notebook on Google Colab; I tried, but it did not work well there. As a consequence, you'll need to run it on your own machine.
First, install TensorFlow for your Linux or Windows PC. With these recipes, you will also install Anaconda, including the Keras package.
Then, install these packages with conda:
conda install numpy matplotlib
Clone my github repo locally, and start the notebook:
git clone https://github.com/cbernet/maldives.git
cd maldives/dogs_vs_cats
jupyter notebook
And open the notebook dogs_vs_cats_local.ipynb.
I haven't tested this recipe, so if it doesn't work, tell me in the comments and I'll help right away.
The dogs and cats dataset was first introduced for a Kaggle competition in 2013. To access the dataset, you will need to create a Kaggle account and to log in. No pressure, we're not here for the competition, but to learn!
The dataset is available here. You can use the kaggle utility to get the dataset, or simply download the train.zip file (about 540 MB). Don't forget to log in to Kaggle first.
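If you go for the kaggle utility, the download command should look like the line below (this assumes the competition slug is still dogs-vs-cats, and that your Kaggle API token is set up):
kaggle competitions download -c dogs-vs-cats -f train.zip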
The instructions to prepare the dataset are for Linux or macOS. If you work on Windows, I'm sure you can easily find a way to do that (e.g. use 7-zip to unpack the archive, and Windows Explorer to create directories and move files around).
Once downloaded, unzip the archive:
unzip train.zip
List the contents of the train directory:
ls train
You will see a lot of images of dogs and cats.
In the next sections, we will use Keras to retrieve images from disk with the flow_from_directory method of the ImageDataGenerator class.
This method however requires the images from the different classes to be sorted in different directories. So we will put all dog images in dogs, and all cat images in cats:
mkdir cats
mkdir dogs
find train -name 'dog.*' -exec mv {} dogs/ \;
find train -name 'cat.*' -exec mv {} cats/ \;
You might be wondering why I used find instead of a simple mv to move the files. It's because with mv, the shell needs to pass a very large number of arguments to the command (all the file names), and there is a limit on this number which is hit on macOS (on Linux, it works fine). With find, we can work around this limitation.
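If you're on Windows, or if you prefer to stay in Python, here is an untested sketch of an equivalent using only the standard library (it also avoids the shell argument limit, since files are moved one by one):
import glob
import os
import shutil

for categ in ('dog', 'cat'):
    dirname = categ + 's'  # dog -> dogs
    os.makedirs(dirname, exist_ok=True)
    # move the files one by one
    for fname in glob.glob(os.path.join('train', categ + '.*')):
        shutil.move(fname, dirname)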
Now, enter in the cell below the location of your dataset directory, the one that contains the dogs and cats subdirectories, and execute it:
# define and move to dataset directory
datasetdir = '/data2/cbernet/maldives/dogs_vs_cats'
import os
os.chdir(datasetdir)
# import the needed packages
import matplotlib.pyplot as plt
import matplotlib.image as img
from tensorflow import keras
# shortcut to the ImageDataGenerator class
ImageDataGenerator = keras.preprocessing.image.ImageDataGenerator
We can start investigating the dataset by plotting the first picture in each category:
plt.subplot(1,2,1)
plt.imshow(img.imread('cats/cat.0.jpg'))
plt.subplot(1,2,2)
plt.imshow(img.imread('dogs/dog.0.jpg'))
Cute. But let's be more specific and print some information about our images:
images = []
for i in range(10):
    im = img.imread('cats/cat.{}.jpg'.format(i))
    images.append(im)
    print('image shape', im.shape, 'maximum color level', im.max())
In the image shape, the first two values are the height and the width of the image in pixels, respectively, and the third is the number of color channels. So each pixel contains three values (for red, green, and blue, respectively). We also print the maximum color level in each image, and we can conclude that the RGB levels are in the range 0-255.
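To make the notion of color channels concrete, here is a small sketch that displays the three channels of the first cat picture separately (it reuses the images list filled in the previous cell):
# display the red, green, and blue channels of the first cat picture
channels = ['red', 'green', 'blue']
fig = plt.figure(figsize=(9, 3), dpi=90)
for i, channel in enumerate(channels):
    plt.subplot(1, 3, i+1)
    # keep a single color channel, displayed as a grayscale map
    plt.imshow(images[0][:, :, i], cmap='gray')
    plt.title(channel)
    plt.axis('off')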
If there is only one thing that you should take away from this tutorial it is this:
You Should Never Trust Your Data
The data are always dirty.
To get a close look at this dataset, I used a fast image browser to check all images in the dogs and cats directories. Actually, I simply used the Preview application on my Mac to browse through the small preview icons. The brain is very quick at spotting obvious issues, even if you just let your eyes wander over a large number of pictures. So this (tedious) work took me no more than 20 minutes. But of course, I have certainly missed a lot of less obvious issues.
Anyway, here is what I found.
First, here are the indices for the bad images I could find in each category:
bad_dog_ids = [5604, 6413, 8736, 8898, 9188, 9517, 10161,
               10190, 10237, 10401, 10797, 11186]

bad_cat_ids = [2939, 3216, 4688, 4833, 5418, 6215, 7377,
               8456, 8470, 11565, 12272]
We can then retrieve the images with these ids from the cats and dogs directories:
def load_images(ids, categ):
    '''return the images corresponding to a list of ids'''
    images = []
    dirname = categ + 's'  # dog -> dogs
    for theid in ids:
        fname = '{dirname}/{categ}.{theid}.jpg'.format(
            dirname=dirname,
            categ=categ,
            theid=theid
        )
        im = img.imread(fname)
        images.append(im)
    return images
bad_dogs = load_images(bad_dog_ids, 'dog')
bad_cats = load_images(bad_cat_ids, 'cat')
def plot_images(images, ids):
    ncols, nrows = 4, 3
    fig = plt.figure(figsize=(ncols*3, nrows*3), dpi=90)
    for i, (img, theid) in enumerate(zip(images, ids)):
        plt.subplot(nrows, ncols, i+1)
        plt.imshow(img)
        plt.title(str(theid))
        plt.axis('off')
plot_images(bad_dogs, bad_dog_ids)
Some of these images are completely meaningless, like 5604 and 8736. In 10401 and 10797, we actually see a cat in the picture! Sigh... Whether to keep the cartoon dogs is debatable; my feeling is that it's better to remove them. Same for 6413: we could keep it, but I'm afraid the network would focus on the drawings around the dog picture.
Now let's look at the bad cats:
plot_images(bad_cats, bad_cat_ids)
Again, I'm not a fan of keeping cartoon cats for training, but who knows, it could be ok. In image 4688, we have a small cartoon cat and a big cartoon dog... this one should clearly be rejected. In image 6215, we just see fur, so it could be either a cat or a dog, though this does look like cat fur. And why is the guy in image 7377 labeled as a cat? No idea...
It should however be noted that even if we decide to reject cartoon images for training, the trained network might be able to identify them correctly.
Now let's implement a small function to clean up the dataset:
import glob
import re
import shutil
# matches any string with the substring ".<digits>."
# such as dog.666.jpg
pattern = re.compile(r'.*\.(\d+)\..*')
def trash_path(dirname):
    '''return the path of the Trash directory,
    where the bad dog and bad cat images will be moved.
    Note that the Trash directory should not be within the dogs/
    or the cats/ directory, or Keras will still find these pictures.
    '''
    return os.path.join('../Trash', dirname)
def cleanup(ids, dirname):
    '''move away images with these ids in dirname'''
    os.chdir(datasetdir)
    # keep track of current directory
    oldpwd = os.getcwd()
    # go to either cats/ or dogs/
    os.chdir(dirname)
    # create the trash directory.
    # if it exists, it is first removed
    trash = trash_path(dirname)
    if os.path.isdir(trash):
        shutil.rmtree(trash)
    os.makedirs(trash, exist_ok=True)
    # loop on all cat or dog files
    fnames = os.listdir()
    for fname in fnames:
        m = pattern.match(fname)
        if m:
            # extract the id
            the_id = int(m.group(1))
            if the_id in ids:
                # this id is in the list of ids to be trashed
                print('moving to {}: {}'.format(trash, fname))
                shutil.move(fname, trash)
    # going back to root directory
    os.chdir(oldpwd)
def restore(dirname):
    '''restore the files found in the trash directory.
    I need this to bring the tutorial back to its initial state,
    and you might need it if you want to try training the network
    without the cleaning of bad images.
    '''
    os.chdir(datasetdir)
    oldpwd = os.getcwd()
    os.chdir(dirname)
    trash = trash_path(dirname)
    print(trash)
    for fname in os.listdir(trash):
        fname = os.path.join(trash, fname)
        print('restoring', fname)
        shutil.move(fname, os.getcwd())
    os.chdir(oldpwd)
cleanup(bad_cat_ids,'cats')
cleanup(bad_dog_ids, 'dogs')
If you want to restore your dataset to its original version before cleaning, just uncomment and execute the following:
# restore('dogs')
# restore('cats')
Neural networks are trained by presenting them with batches of images, each of them with a label identifying the true nature of the image (either cat or dog in our case). A batch may contain of the order of a few tens to a couple hundred images. If you want an introduction to neural networks and supervised learning for classification, you can check out my post on Handwritten Digit Recognition with scikit-learn.
For each image, the prediction of the network is compared with the corresponding label, and the distance between the predictions and the truth is evaluated for the whole batch. Then, after the processing of the batch, the network parameters are changed so as to minimize this distance, therefore improving the prediction capability of the network. The training then proceeds iteratively, batch after batch.
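To make this notion of distance concrete: for classification, the distance is typically a cross-entropy. Here is a small numpy sketch of the categorical cross-entropy between one-hot labels and network predictions, with made-up numbers:
import numpy as np

# one-hot labels for a batch of 3 images: dog, cat, cat
labels = np.array([[1., 0.], [0., 1.], [0., 1.]])
# made-up network predictions for the same images
preds = np.array([[0.9, 0.1],   # confident and right
                  [0.6, 0.4],   # wrong
                  [0.1, 0.9]])  # confident and right
# average cross-entropy over the batch: the lower, the better
loss = -np.mean(np.sum(labels * np.log(preds), axis=1))
print(loss)
The wrong prediction in the second row dominates the loss; training adjusts the network parameters to reduce it.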
So we need a way to turn our images, now files on disk, into batches of data arrays in memory that can be fed to the network during training.
The ImageDataGenerator class can readily be used for this purpose. We defined a shortcut for this class at the beginning of the notebook, so let's just create an instance of the generator:
gen = ImageDataGenerator()
Now, we will use the flow_from_directory method of the gen object to start generating batches.
This method returns an iterator that delivers a batch every time it's iterated on. To see how the data is organized, we can simply create this iterator, and get a first batch to look at it:
iterator = gen.flow_from_directory(
    os.getcwd(),
    target_size=(256,256),
    classes=('dogs','cats')
)
# we can guess that the iterator has a next function,
# because all python iterators have one.
batch = iterator.next()
len(batch)
The batch has two elements. What's their type?
print(type(batch[0]))
print(type(batch[1]))
Two numpy arrays! well, it means we can print their shape and type:
print(batch[0].shape)
print(batch[0].dtype)
print(batch[0].max())
print(batch[1].shape)
print(batch[1].dtype)
Obviously, the first element is an array of 32 images of 256x256 pixels, with 3 color channels, encoded as floats in the range 0 to 255. So the ImageDataGenerator did resize the images to 256x256 pixels as requested, but did not normalize the color levels to the range 0-1. We will have to do that later.
The second element contains the 32 corresponding labels.
Before having a detailed look at the labels, we can plot the first image:
import numpy as np
# we need to cast the image array to integers
# before plotting, as imshow either takes arrays of integers,
# or arrays of floats normalized to 1.
plt.imshow(batch[0][0].astype(int))
And here is the corresponding label:
batch[1][0]
We see that the ImageDataGenerator automatically produced a label for each image, depending on the directory in which it was found. The labels are one-hot encoded, and this is precisely what we need for our classification task. If you want to know more about one-hot encoding, check my post about a First Neural Network with Keras.
We can also guess that the label [0., 1.] corresponds to a true cat, and [1., 0.] to a true dog. The prediction of the network for a given image will lie somewhere in between, like [0.6, 0.4] for a dog. Still, that's only a guess, and a guess is not enough! We need to be sure, or we take the risk of feeding our network with mislabeled images (and garbage in, garbage out).
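Actually, the iterator can tell us directly which convention it uses: its class_indices attribute maps each class name to its position in the label vector. This should print {'dogs': 0, 'cats': 1}, confirming our guess:
print(iterator.class_indices)
Still, a visual check on actual images doesn't hurt.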
Since it might be the first time you use the ImageDataGenerator, you probably want to get more confident about this tool. For that, we are going to develop a small function in the next section to validate the input dataset.
To validate the dataset labels, we want to check that, for a few batches, the labels are correctly set. So we need a function that can plot a fairly large number of images and label them. Here it is:
def plot_images(batch):
    imgs = batch[0]
    labels = batch[1]
    ncols, nrows = 4, 8
    fig = plt.figure(figsize=(ncols*3, nrows*3), dpi=90)
    for i, (img, label) in enumerate(zip(imgs, labels)):
        plt.subplot(nrows, ncols, i+1)
        plt.imshow(img.astype(int))
        assert label[0] + label[1] == 1.
        categ = 'dog' if label[0] > 0.5 else 'cat'
        plt.title('{} {}'.format(str(label), categ))
        plt.axis('off')
plot_images(iterator.next())
Please repeat the previous cell until you are confident that the labels are correct.
We will train our neural network on a subset of the dog and cat images called the training dataset.
If a network is complex enough (if it has enough parameters), it can start overfitting, which means that it learns the specific features of the images in the training dataset. In other words, the network loses its generality, and its ability to classify an arbitrary image as a dog or cat image.
To make sure that overfitting does not occur, we will evaluate the performance of the network on a validation dataset, disjoint from the training dataset.
Let's first create a new ImageDataGenerator. With respect to the previous one, we ask it to rescale the color levels to the range 0-1, and to keep 20% of the images aside for validation:
imgdatagen = ImageDataGenerator(
    rescale = 1/255.,
    validation_split = 0.2,
)
Next, we define our iterators for the training and validation datasets. We use a batch size of 30 because, typically, networks are not able to learn if the batch size is too large or too small. You may try again with a different batch size after completing this tutorial.
The images are forced to 256x256 pixels. Here, we just need to make sure that the format of all images is the same, as the convolutional neural network that we're going to use has a fixed number of inputs. I chose a square shape to avoid too much distortion on both landscape and portrait pictures. But if the vast majority of our images were portrait, a shape closer to portrait would probably be a better choice. I didn't check that, but a quick way to do so is sketched right after the next cell.
batch_size = 30
height, width = (256, 256)

train_dataset = imgdatagen.flow_from_directory(
    os.getcwd(),
    target_size = (height, width),
    classes = ('dogs','cats'),
    batch_size = batch_size,
    subset = 'training'
)

val_dataset = imgdatagen.flow_from_directory(
    os.getcwd(),
    target_size = (height, width),
    classes = ('dogs','cats'),
    batch_size = batch_size,
    subset = 'validation'
)
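And here is the quick check of the portrait vs landscape assumption mentioned above. It counts portrait images among the first 1000 cat pictures; PIL should come with your anaconda installation:
from PIL import Image

n_portrait = 0
n_images = 1000
for i in range(n_images):
    with Image.open('cats/cat.{}.jpg'.format(i)) as im:
        w, h = im.size
        # portrait images are taller than they are wide
        if h > w:
            n_portrait += 1
print('fraction of portrait images: {:.1%}'.format(n_portrait / n_images))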
Deep convolutional neural networks are the go-to choice when it comes to classifying images. For a detailed introduction to this kind of network, see my post on tuning such a network for handwritten digit recognition. The model below is very similar to the one we used in that article.
model = keras.models.Sequential()

model.add(
    keras.layers.Conv2D(
        24, 5, input_shape=(256,256,3),
        activation='relu',
    )
)
model.add(keras.layers.MaxPooling2D(2))
model.add(
    keras.layers.Conv2D(
        48, 5, activation='relu',
    )
)
model.add(keras.layers.MaxPooling2D(2))
model.add(
    keras.layers.Conv2D(
        96, 5, activation='relu',
    )
)
model.add(keras.layers.Flatten())
model.add(keras.layers.Dropout(0.9))
model.add(
    keras.layers.Dense(
        2, activation='softmax',
    )
)
model.summary()
Here are the main differences with respect to the network used for handwritten digit recognition:
the input_shape of the first layer has to match the shape given to the generator;
the last layer has two neurons with a softmax activation, one per category, instead of ten;
there is a third convolutional layer, and more features are extracted in each layer;
the dropout rate is much higher, at 0.9.
The first two points are kind of technical: we just need to do this, or the network will not run. The last two points are far from obvious. These choices come from a long optimization. I started with two convolutional layers and fewer features, but the training accuracy was plateauing, which is a sign that the network was underfitting: it did not have enough parameters to describe the training dataset, let alone the validation dataset...
So I increased the complexity by adding a layer, and by increasing the number of extracted features in each layer until the training accuracy could reach almost 100%.
At that point, the network was overfitting: the validation accuracy was much lower than the training accuracy. So I slowly increased the dropout rate from 0.4 to tame overfitting, and ended up at the fairly high value of 0.9. This means that, before entering the last dense layer for classification, the dropout layer randomly drops 90% of the values coming from the previous stages of the network. That's a lot!
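To see what such an aggressive dropout does, here is a tiny sketch, assuming a recent TensorFlow where a layer can be called directly on an array. In training mode, about 90% of the values are zeroed, and the survivors are scaled up by 1/(1-0.9) = 10 so that the average level is preserved:
import numpy as np
from tensorflow import keras

# a standalone dropout layer with the same rate as in the model
drop = keras.layers.Dropout(0.9)
x = np.ones((1, 20), dtype='float32')
# training=True forces the training behaviour of the layer
print(drop(x, training=True))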
To train a neural network, we need to use an optimizer. This tool decides how to change the parameters of the network after each batch, so as to minimize the distance between the network output and the truth. Among the optimizers implemented in Keras, people usually choose Adam or RMSProp, mostly as a matter of habit.
But in this case, these optimizers do not work well. The network very often starts training in a bad configuration, very far from the optimal set of parameters, and is not able to learn. The loss starts around 8 and does not improve. As a result, the training accuracy remains at 50%, which is equivalent to a random guess.
So I fell back on Stochastic Gradient Descent (SGD). It works well, and the network learns. But the training is veeery long. And, by the way, this is the reason why people invented adaptive optimizers such as Adam and RMSProp.
In all these studies, I tried varying the learning rate wildly, to no avail.
I then read the paper about Adam, A Method for Stochastic Optimization, and decided to give AdaMax, a variant of Adam introduced in the same paper, a try:
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adamax(lr=0.001),
              metrics=['acc'])
After compiling the model, we train it on the training dataset, validating the results at the end of each epoch with the validation dataset. I'm using 10 cores of my CPU to handle the ImageDataGenerator tasks, and two GeForce GTX 1080 Ti for TensorFlow.
Each epoch should take about one minute. If it takes much more, like twenty minutes, you should make sure that you are indeed making use of your GPU. Check your installation of the nvidia drivers and TensorFlow on Linux or Windows.
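A quick way to check is to list the devices TensorFlow can see (this assumes TensorFlow 2; on TensorFlow 1.x, tf.test.is_gpu_available() plays the same role):
import tensorflow as tf

# should print a non-empty list if TensorFlow can use a GPU
print(tf.config.list_physical_devices('GPU'))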
history = model.fit_generator(
    train_dataset,
    validation_data = val_dataset,
    workers=10,
    epochs=20,
)
Now that the training is done, we need a way to see how the training worked. For this, we'll write a small function to plot the loss and the accuracy for both the training and validation datasets, as a function of the epoch:
def plot_history(history, yrange):
    '''Plot loss and accuracy as a function of the epoch,
    for the training and validation datasets.
    '''
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    # Get number of epochs
    epochs = range(len(acc))

    # Plot training and validation accuracy per epoch
    plt.plot(epochs, acc)
    plt.plot(epochs, val_acc)
    plt.title('Training and validation accuracy')
    plt.ylim(yrange)

    # Plot training and validation loss per epoch
    plt.figure()
    plt.plot(epochs, loss)
    plt.plot(epochs, val_loss)
    plt.title('Training and validation loss')

    plt.show()
And here are the results:
plot_history(history, (0.65, 1.))
We can conclude that the network is still overfitting: the training accuracy keeps improving, while the validation accuracy plateaus at a lower value. We cannot increase dropout further, so we would need more data to do a better job. In the next section, we will see how to use data augmentation to generate more training images from the ones we already have. That's going to be much easier and much faster than collecting and tagging new cat and dog images.
Data augmentation consists of generating new training examples from the ones we already have, so as to artificially increase the size of the training sample. That's very easy to do with the ImageDataGenerator. For example, we can start by randomly flipping the images horizontally:
imgdatagen = ImageDataGenerator(
    rescale = 1/255.,
    horizontal_flip = True,
    validation_split = 0.2,
)
Let's see the effect of this transformation on a given image:
image = img.imread('cats/cat.12.jpg')
def plot_transform():
    '''apply the transformation 8 times randomly'''
    nrows, ncols = 2, 4
    fig = plt.figure(figsize=(ncols*3, nrows*3), dpi=90)
    for i in range(nrows*ncols):
        timage = imgdatagen.random_transform(image)
        plt.subplot(nrows, ncols, i+1)
        plt.imshow(timage)
        plt.axis('off')
plot_transform()
You should be able to see the effect of the horizontal flip, except if you're really unlucky!
Now, let's make the transformation a bit more complex. This time, the ImageDataGenerator will flip, zoom, and rotate the images on a random basis:
imgdatagen = ImageDataGenerator(
    rescale = 1/255.,
    horizontal_flip = True,
    zoom_range = 0.3,
    rotation_range = 15.,
    validation_split = 0.1,
)
plot_transform()
We have seen that these transformations produce new images that are perfectly acceptable. So let's retrain the network with data augmentation. It is important to note that, due to the random nature of the transformations, the network will never see exactly the same image twice. We can therefore expect that it will be difficult for the network to overfit.
batch_size = 30
height, width = (256, 256)

train_dataset = imgdatagen.flow_from_directory(
    os.getcwd(),
    target_size = (height, width),
    classes = ('dogs','cats'),
    batch_size = batch_size,
    subset = 'training'
)

val_dataset = imgdatagen.flow_from_directory(
    os.getcwd(),
    target_size = (height, width),
    classes = ('dogs','cats'),
    batch_size = batch_size,
    subset = 'validation'
)
With data augmentation, there is no need for a large dropout rate anymore. I reduced it from 0.9 to 0.2 but I haven't tried to tune this parameter:
model = keras.models.Sequential()

model.add(
    keras.layers.Conv2D(
        24, 5, input_shape=(256,256,3),
        activation='relu',
    )
)
model.add(keras.layers.MaxPooling2D(2))
model.add(
    keras.layers.Conv2D(
        48, 5, activation='relu',
    )
)
model.add(keras.layers.MaxPooling2D(2))
model.add(
    keras.layers.Conv2D(
        96, 5, activation='relu',
    )
)
model.add(keras.layers.Flatten())
model.add(keras.layers.Dropout(0.2))
model.add(
    keras.layers.Dense(
        2, activation='softmax',
    )
)
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adamax(lr=0.001),
              metrics=['acc'])
history_augm = model.fit_generator(
    train_dataset,
    validation_data = val_dataset,
    workers=10,
    epochs=40,
)
plot_history(history_augm, (0.65, 1))
As you can see, with data augmentation, the training takes longer, but overfitting is much reduced, and we reach a classification accuracy of 92%.
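If you want to quantify this accuracy rather than reading it off the plot, you can evaluate the model on the validation dataset. evaluate_generator matches the fit_generator API used in this notebook; with recent versions of Keras, model.evaluate(val_dataset) does the same thing:
# average loss and accuracy over the whole validation dataset
loss, acc = model.evaluate_generator(val_dataset, steps=len(val_dataset))
print('validation accuracy: {:.1%}'.format(acc))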
We could keep tuning the network to limit overfitting, maybe by increasing the dropout rate a bit, and train longer. But I guess we're not going to be able to reach 95% on this dataset.
Fortunately, there are other ways, as we're going to see.
Many clever people have been working on image recognition, with powerful hardware and very large datasets. I don't know about you, but I don't have that.
However, what we can do is use their networks directly. Such networks are called pre-trained models. Because these models were trained on very large datasets, they usually have a deep and complex architecture, and are extremely powerful. They can be downloaded with their parameters set to the values obtained at the end of their training. In this way, we can easily get excellent performance without training a new model ourselves.
Here is the list of pre-trained models available in Keras.
I decided to use ResNet50, a model trained on the ImageNet dataset, which contains more than 14 million images. The pre-trained model classifies an image into one of the 1000 categories of the ImageNet challenge.
First, we download and build the model in just a few lines of code:
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

model = ResNet50(weights='imagenet')
Then, we define a small utility function to evaluate the model on an input image, and we call this function on a couple pictures in our dataset:
def evaluate(img_fname):
    img = image.load_img(img_fname, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = model.predict(x)
    # print the probability and category name for the 5 categories
    # with highest probability:
    print('Predicted:', decode_predictions(preds, top=5)[0])
    plt.imshow(img)
evaluate('dogs/dog.0.jpg')
evaluate('dogs/dog.1.jpg')
evaluate('dogs/dog.2.jpg')
As you can see, the network puts dogs in the top categories, and is even able to predict, to a large extent, the breed of the dog!
What about cats?
evaluate('cats/cat.0.jpg')
Here it does not work that well. The top category is television, presumably because of the low quality of the image. Then we get dogs of this color, and finally a cat. For the next picture, the network behaves better:
evaluate('cats/cat.1.jpg')
Now let's try with dog cartoons:
evaluate('Trash/dogs/dog.9188.jpg')
The top-probability category is indeed a dog! ... closely followed by a chain saw. Now what about the cover picture of this blog post?
# download the image from my github repository
import urllib.request as req
url = 'https://raw.githubusercontent.com/cbernet/maldives/master/dogs_vs_cats/datafrog_chien_chat.png'
req.urlretrieve(url, 'dog_cartoon.jpg')
evaluate('dog_cartoon.jpg')
The network did not get fooled by the cosplay: it's a dog!
Ok, we had fun with ResNet50, but the network is not doing exactly what we want, which is to classify pictures as dog or cat. To do this with a model based on ImageNet, we would need to find out that a fine-grained category such as Great_Dane actually belongs to the coarser dog category. Until we do that, we cannot quantify the performance of this model in the context of our problem.
We could do that by hand, as sketched below, but we'll learn a more elegant solution in the next article.
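For the impatient, here is a rough sketch of the by-hand approach. It relies on the commonly quoted convention that, among the 1000 ImageNet categories, indices 151 to 268 correspond to dog breeds and 281 to 285 to cat breeds; please treat these ranges as an assumption to double-check, not as gospel:
def coarse_category(img_fname):
    '''return dog, cat, or other, from the ResNet50 predictions.
    Assumes ImageNet indices 151-268 are dogs and 281-285 are cats.'''
    img = image.load_img(img_fname, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    probs = model.predict(x)[0]
    # sum the probabilities over all dog and all cat categories
    dog_prob = probs[151:269].sum()
    cat_prob = probs[281:286].sum()
    if max(dog_prob, cat_prob) < 0.5:
        return 'other'
    return 'dog' if dog_prob > cat_prob else 'cat'

print(coarse_category('dogs/dog.0.jpg'))
Summing the probabilities over all dog and cat categories is more robust than just looking at the top prediction, which may land outside both groups even when the animal categories dominate in aggregate.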
In this post, you learnt how to prepare an image dataset for machine learning and clean it up, how to feed images to a neural network with the ImageDataGenerator class, how to build and train a small convolutional neural network, and how to improve its performance with data augmentation.
We reached an accuracy of 92% with our simple convolutional network, but we have seen that it's going to be difficult to go further.
To further improve performance, we would need to use a more complex architecture. But we see that even with our small convolutional network and serious data augmentation, we're already affected by overfitting. It would be even more problematic with a sophisticated neural net with many more parameters.
But in the next post, Image Recognition with Transfer Learning , we will see how to use transfer learning to use state-of-the-art convolutional networks to classify dogs and cats with a 98.5% accuracy!
And you practiced the basics: installing TensorFlow and Anaconda, getting a dataset from Kaggle, and preparing it with simple shell commands.
Please let me know what you think in the comments! I’ll try and answer all questions.
And if you liked this article, you can subscribe to my mailing list to be notified of new posts (no more than one mail per week I promise.)