# First Neural Network with Keras

## Introduction

In my post about Handwritten Digit Recognition with scikit-learn we have seen that a simple neural network can easily be trained with scikit-learn. And we managed to classify images of digits with an accuracy larger than 90%.

But scikit-learn is generally not adapted to neural networks.

Its goal is in fact to provide a unified interface for training and testing different machine learning algorithms: neural networks, and also support-vector machines, naive Bayes, nearest neighbours, decision trees, etc. Indeed, in machine learning projects, you'll spend a large fraction of your time choosing the best algorithm and tuning it for best performance. Scikit-learn was designed to make these tasks as easy as possible.

However, when it comes to neural networks specifically, scikit-learn has two major drawbacks:

• the interface does not provide enough control to build complex neural networks, nor to control their behaviour in details.
• it's not adapted to deep learning

In this post, we will repeat our exercise about handwritten digit recognition with Keras, a high-level neural network API. You will learn:

• How to install Keras
• How to create a simple dense neural network with this tool
• How to estimate its performance

In a later article, we will see how to do deep learning with Keras.

Prerequisites:

## Installation

To run this notebook interactively, the easiest is to click here to execute it on Google Colab.

But if you prefer to run it on your own machine, follow these instructions.

Then create an environment for this tutorial:

conda create -n hwd_keras


and activate it:

conda activate hwd_keras


Install the needed packages:

conda install keras  scikit-learn jupyter matplotlib


Then, download this notebook and open it in a jupyter notebook:

jupyter notebook handwritten_digits_keras.ipynb


## Preparing the dataset¶

As in handwritten digit recognition with scikit-learn, we are going to use the digits dataset provided by scikit-learn. The digits are 8x8 images and we will feed them to a neural network with:

• an input layer with 8x8 = 64 neurons
• a hidden layer with 15 neurons
• an output layer with 10 neurons corresponding to the 10 digit categories.

First, let's initialize our tools and load the digits dataset:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import os
# for some reason, the following is needed to run on mac os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

from sklearn import datasets


The input layer requires a 1-dimensional array in input, but our images are 2D. So we need to flatten all images:

In [2]:
x = digits.images.reshape((len(digits.images), -1))
x.shape

Out[2]:
(1797, 64)

The labels require a bit of attention. At the moment, digits.target contains the digit corresponding to each image in the dataset:

In [3]:
digits.target

Out[3]:
array([0, 1, 2, ..., 8, 9, 8])

But in Keras, we have to build our neural network with 10 output neurons (this actually happens under the hood in scikit-learn). During the training, Keras will have to compare the 10 output values of these neurons to the target value. But how can we compare a vector of 10 values with a single target value?

The solution is to translate each target value into a vector of length 10 with a technique called one-hot encoding:

• target 0 is translated to [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
• target 1 is translated to [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
• ...
• target 9 is translated to [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

After doing that, the values from the output neurons, which are probabilities ranging from 0 to 1, can be compared directly to the values in the target vector. In this way, for a given number, say 0, the neural network will be trained to output a high probability from the first output neuron, and a low probability from the following neurons.

One-hot encoding can be performed easily with the utilities provided by Keras:

In [4]:
from keras.utils import np_utils
y = np_utils.to_categorical(digits.target,10)
print(digits.target)
print(y)

Using TensorFlow backend.

[0 1 2 ... 8 9 8]
[[1. 0. 0. ... 0. 0. 0.]
[0. 1. 0. ... 0. 0. 0.]
[0. 0. 1. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 1. 0.]
[0. 0. 0. ... 0. 0. 1.]
[0. 0. 0. ... 0. 1. 0.]]


let's now split our data into a training sample and a testing sample:

In [5]:
split_limit=1000
x_train = x[:split_limit]
y_train = y[:split_limit]
x_test = x[split_limit:]
y_test = y[split_limit:]


The first 1000 images and labels are going to be used for training. The rest of the dataset will be used later to test the performance of our network.

## Creation of the neural network with Keras¶

After importing the necessary tools from Keras, we create the neural network in the following code snippet.

In [6]:
from keras import layers, Model, optimizers, regularizers

In [7]:
# create the input layer
#
# we specify that the input layer
# should have 64 neurons, one for each pixel
# in our images.
# The input neurons do nothing, they
# just transfer the value at each pixel
# to the next layer.
img_input = layers.Input(shape=(64,))

# create the hidden layer
#
# This layer is a Dense layer, which means
# that its neurons are fully connected to the
# neurons in the previous layer (the input layer)
# We will talk about the activation in a future post
tmp = layers.Dense(15,
activation='sigmoid')(img_input)

# create the output layer
#
# The output layer is another Dense layer.
# It must have 10 neurons, corresponding to
# the 10 digit categories
output = layers.Dense(10,
activation='softmax')(tmp)

# create the neural network from the layers
model = Model(img_input, output)

# print a summary of the model
model.summary()

# =================================================
# Please don't pay attention to what follows,
# we'll talk about regularization later!
# For now, it is enough to know that regularization
# helps the neural network converge properly.
# I've added this regularization because it is
# performed by default in scikit-learn,
# and because we want to be able to compare the
# results of scikit-learn and keras.
l2_rate = 1e-4
for layer in model.layers:
if hasattr(layer, 'kernel_regularizer'):
layer.kernel_regularizer = regularizers.l2(l2_rate)
layer.bias_regularizer = regularizers.l2(l2_rate)
layer.activity_regularizer = regularizers.l2(l2_rate)
# =================================================

# define how the neural network will learn,
# and compile the model.
# models must be compiled before
# they can be trained and used.
# the loss, optimizer, and metrics arguments
# will be covered in a future post.
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=0.1, momentum=0.9),
metrics=['accuracy'])

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 15)                975
_________________________________________________________________
dense_2 (Dense)              (None, 10)                160
=================================================================
Total params: 1,135
Trainable params: 1,135
Non-trainable params: 0
_________________________________________________________________


Finally, we can train the network:

In [8]:
history = model.fit(x=x_train, y=y_train, validation_data=(x_test,y_test),
batch_size=100, epochs=50)

Train on 1000 samples, validate on 797 samples
Epoch 1/50
1000/1000 [==============================] - 0s 283us/step - loss: 2.1862 - accuracy: 0.2820 - val_loss: 1.8762 - val_accuracy: 0.4015
Epoch 2/50
1000/1000 [==============================] - 0s 25us/step - loss: 1.6556 - accuracy: 0.6000 - val_loss: 1.4465 - val_accuracy: 0.7026
Epoch 3/50
1000/1000 [==============================] - 0s 26us/step - loss: 1.2358 - accuracy: 0.7790 - val_loss: 1.1519 - val_accuracy: 0.7679
Epoch 4/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.9043 - accuracy: 0.8770 - val_loss: 0.8860 - val_accuracy: 0.8206
Epoch 5/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.6885 - accuracy: 0.9010 - val_loss: 0.7609 - val_accuracy: 0.8018
Epoch 6/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.5315 - accuracy: 0.9280 - val_loss: 0.6154 - val_accuracy: 0.8394
Epoch 7/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.4212 - accuracy: 0.9400 - val_loss: 0.5249 - val_accuracy: 0.8871
Epoch 8/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.3538 - accuracy: 0.9510 - val_loss: 0.4787 - val_accuracy: 0.8921
Epoch 9/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.3084 - accuracy: 0.9600 - val_loss: 0.4634 - val_accuracy: 0.8871
Epoch 10/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.2661 - accuracy: 0.9670 - val_loss: 0.4100 - val_accuracy: 0.9009
Epoch 11/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.2312 - accuracy: 0.9720 - val_loss: 0.4087 - val_accuracy: 0.8934
Epoch 12/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.2242 - accuracy: 0.9670 - val_loss: 0.3919 - val_accuracy: 0.8934
Epoch 13/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.2066 - accuracy: 0.9760 - val_loss: 0.3984 - val_accuracy: 0.8971
Epoch 14/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.2050 - accuracy: 0.9690 - val_loss: 0.3984 - val_accuracy: 0.8821
Epoch 15/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1746 - accuracy: 0.9780 - val_loss: 0.3441 - val_accuracy: 0.9059
Epoch 16/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.1679 - accuracy: 0.9710 - val_loss: 0.3455 - val_accuracy: 0.9109
Epoch 17/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.1613 - accuracy: 0.9830 - val_loss: 0.3565 - val_accuracy: 0.9122
Epoch 18/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.1563 - accuracy: 0.9750 - val_loss: 0.3565 - val_accuracy: 0.8971
Epoch 19/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1552 - accuracy: 0.9700 - val_loss: 0.3898 - val_accuracy: 0.8959
Epoch 20/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.1482 - accuracy: 0.9750 - val_loss: 0.3995 - val_accuracy: 0.8770
Epoch 21/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1519 - accuracy: 0.9690 - val_loss: 0.3622 - val_accuracy: 0.9059
Epoch 22/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.1346 - accuracy: 0.9800 - val_loss: 0.3411 - val_accuracy: 0.9034
Epoch 23/50
1000/1000 [==============================] - 0s 28us/step - loss: 0.1132 - accuracy: 0.9820 - val_loss: 0.3080 - val_accuracy: 0.9134
Epoch 24/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.1297 - accuracy: 0.9800 - val_loss: 0.3498 - val_accuracy: 0.8934
Epoch 25/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1612 - accuracy: 0.9700 - val_loss: 0.4304 - val_accuracy: 0.8808
Epoch 26/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1653 - accuracy: 0.9720 - val_loss: 0.3841 - val_accuracy: 0.9072
Epoch 27/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1471 - accuracy: 0.9660 - val_loss: 0.3872 - val_accuracy: 0.8858
Epoch 28/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.1366 - accuracy: 0.9740 - val_loss: 0.3660 - val_accuracy: 0.8883
Epoch 29/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.1240 - accuracy: 0.9800 - val_loss: 0.3663 - val_accuracy: 0.8971
Epoch 30/50
1000/1000 [==============================] - 0s 26us/step - loss: 0.1292 - accuracy: 0.9700 - val_loss: 0.4119 - val_accuracy: 0.8846
Epoch 31/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.1626 - accuracy: 0.9630 - val_loss: 0.3900 - val_accuracy: 0.9021
Epoch 32/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.1139 - accuracy: 0.9760 - val_loss: 0.3730 - val_accuracy: 0.8984
Epoch 33/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.1030 - accuracy: 0.9800 - val_loss: 0.3669 - val_accuracy: 0.8959
Epoch 34/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.0902 - accuracy: 0.9840 - val_loss: 0.3442 - val_accuracy: 0.8996
Epoch 35/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.0775 - accuracy: 0.9890 - val_loss: 0.3692 - val_accuracy: 0.9009
Epoch 36/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0778 - accuracy: 0.9860 - val_loss: 0.3224 - val_accuracy: 0.9097
Epoch 37/50
1000/1000 [==============================] - 0s 26us/step - loss: 0.0711 - accuracy: 0.9900 - val_loss: 0.3425 - val_accuracy: 0.9084
Epoch 38/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.0692 - accuracy: 0.9870 - val_loss: 0.3195 - val_accuracy: 0.9084
Epoch 39/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0691 - accuracy: 0.9880 - val_loss: 0.3388 - val_accuracy: 0.9084
Epoch 40/50
1000/1000 [==============================] - 0s 21us/step - loss: 0.0612 - accuracy: 0.9920 - val_loss: 0.3140 - val_accuracy: 0.9172
Epoch 41/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.0569 - accuracy: 0.9910 - val_loss: 0.3351 - val_accuracy: 0.9122
Epoch 42/50
1000/1000 [==============================] - 0s 25us/step - loss: 0.0550 - accuracy: 0.9930 - val_loss: 0.3231 - val_accuracy: 0.9109
Epoch 43/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.0514 - accuracy: 0.9950 - val_loss: 0.3354 - val_accuracy: 0.9122
Epoch 44/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0490 - accuracy: 0.9950 - val_loss: 0.3315 - val_accuracy: 0.9059
Epoch 45/50
1000/1000 [==============================] - 0s 24us/step - loss: 0.0456 - accuracy: 0.9950 - val_loss: 0.3375 - val_accuracy: 0.9147
Epoch 46/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.0477 - accuracy: 0.9940 - val_loss: 0.3397 - val_accuracy: 0.9046
Epoch 47/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0495 - accuracy: 0.9940 - val_loss: 0.3277 - val_accuracy: 0.9034
Epoch 48/50
1000/1000 [==============================] - 0s 23us/step - loss: 0.0478 - accuracy: 0.9930 - val_loss: 0.3320 - val_accuracy: 0.9072
Epoch 49/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0494 - accuracy: 0.9910 - val_loss: 0.3373 - val_accuracy: 0.9046
Epoch 50/50
1000/1000 [==============================] - 0s 22us/step - loss: 0.0494 - accuracy: 0.9890 - val_loss: 0.3832 - val_accuracy: 0.9009


## Evaluating the performance¶

The predictions from the neural network are evaluated for all examples in the test sample by doing

In [9]:
predictions = model.predict(x_test)
print(predictions[3])

[1.5537975e-03 5.0299516e-04 5.8268331e-05 5.6680041e-05 5.2190635e-06
9.9738103e-01 9.8017254e-06 6.1042505e-05 6.5072112e-05 3.0611001e-04]


For each sample, the prediction is an array of 10 values. Each value is the estimated probability for the image to belong to this category.

The predicted category is the one with the largest probability.

Let's write a small function to plot a given image, and to print the true and predicted categories:

In [10]:
def plot_prediction(index):
print('predicted probabilities:')
print(predictions[index])
print('predicted category', np.argmax(predictions[index]))
print('true probabilities:')
print(y_test[index])
print('true category', np.argmax(y_test[index]))
img = x_test[index].reshape(8,8)
plt.imshow(img)


In this function, we obtain the category with np.argmax that, for an array, returns the index corresponding to the largest value.

Let's use this function to have a look at a few examples (just choose a different index to look at another example).

In [11]:
plot_prediction(3)

predicted probabilities:
[1.5537975e-03 5.0299516e-04 5.8268331e-05 5.6680041e-05 5.2190635e-06
9.9738103e-01 9.8017254e-06 6.1042505e-05 6.5072112e-05 3.0611001e-04]
predicted category 5
true probabilities:
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
true category 5


Finally, let's compute the accuracy score, which is the probability to classify the digits correctly.

We will compute the accuracy for the test sample, which has not been used to train the network. Again, we will use np.argmax to get the predicted and true categories for each example.

In [12]:
# the second argument of argmax specifies
# that we want to get argmax for each example.
# without this argument, argmax would return
# the largest value in the whole array,
# considering all examples
y_test_best = np.argmax(y_test,1)
print(y_test_best.shape)
predictions_best = np.argmax(predictions,1)

from sklearn.metrics import accuracy_score
accuracy_score(y_test_best, predictions_best)

(797,)

Out[12]:
0.9008782936010038

You should obtain an accuracy around 90%, similar to the one we had obtained in the same conditions with scikit-learn.

Please note that the result is not deterministic, so the accuracy will vary every time you train the network. I usually get an accuracy between 90 and 93%, but I sometimes get a value as low as 87%.

Please repeat the exercise starting from the creation of the neural network to see what happens.

## What next?

In this post, you have trained your first neural network with Keras.

Keras is the easiest and most powerful way to work with neural networks, and we will use it very often on this blog.

To learn how to do deep learning for image recognition with Keras and TensorFlow, check the following posts:

Please let me know what you think in the comments! I’ll try and answer all questions.

And if you liked this article, you can subscribe to my newsletter to be notified of new posts (no more than one mail per week I promise.)