In my post about Handwritten Digit Recognition with scikit-learn we have seen that a simple neural network can easily be trained with scikit-learn. And we managed to classify images of digits with an accuracy larger than 90%.

But scikit-learn is generally not adapted to neural networks.

Its goal is in fact to provide a unified interface for training and testing different machine learning algorithms: neural networks, and also support-vector machines, naive Bayes, nearest neighbours, decision trees, etc. Indeed, in machine learning projects, you'll spend a large fraction of your time choosing the best algorithm and tuning it for best performance. Scikit-learn was designed to make these tasks as easy as possible.

However, when it comes to neural networks specifically, scikit-learn has two major drawbacks:

- the interface does not provide enough control to build complex neural networks, nor to control their behaviour in details.
- it's not adapted to deep learning

In this post, we will repeat our exercise about handwritten digit recognition with Keras, a high-level neural network API. You will learn:

- How to install Keras
- How to create a simple dense neural network with this tool
- How to estimate its performance

In a later article, we will see how to do deep learning with Keras.

**Prerequisites:**

- Python Crash Course for Machine Learning
- Numpy Crash Course for Machine Learning
- Matplotlib for Machine Learning
- Handwritten Digit Recognition with scikit-learn

To run this notebook interactively, the easiest is to click here to execute it on Google Colab.

But if you prefer to run it on your own machine, follow these instructions.

First, Install Anaconda for Machine Learning and Data Science in Python.

Then create an environment for this tutorial:

```
conda create -n hwd_keras
```

and activate it:

```
conda activate hwd_keras
```

Install the needed packages:

```
conda install keras scikit-learn jupyter matplotlib
```

Then, download this notebook and open it in a jupyter notebook:

- download the repository containing this notebook
- unzip it, say to
`maldives-master`

- navigate to
`maldives-master/handwritten_digits_keras`

- run:

```
jupyter notebook handwritten_digits_keras.ipynb
```

As in handwritten digit recognition with scikit-learn, we are going to use the digits dataset provided by scikit-learn. The digits are 8x8 images and we will feed them to a neural network with:

- an input layer with 8x8 = 64 neurons
- a hidden layer with 15 neurons
- an output layer with 10 neurons corresponding to the 10 digit categories.

First, let's initialize our tools and load the digits dataset:

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
import os
# for some reason, the following is needed to run on mac os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
from sklearn import datasets
digits = datasets.load_digits()
```

The input layer requires a 1-dimensional array in input, but our images are 2D. So we need to flatten all images:

In [2]:

```
x = digits.images.reshape((len(digits.images), -1))
x.shape
```

Out[2]:

The labels require a bit of attention. At the moment, `digits.target`

contains the digit corresponding to each image in the dataset:

In [3]:

```
digits.target
```

Out[3]:

But in Keras, we have to build our neural network with 10 output neurons (this actually happens under the hood in scikit-learn). During the training, Keras will have to compare the 10 output values of these neurons to the target value. But how can we compare a vector of 10 values with a single target value?

The solution is to translate each target value into a vector of length 10 with a technique called *one-hot encoding*:

- target
`0`

is translated to`[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]`

- target
`1`

is translated to`[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]`

- ...
- target
`9`

is translated to`[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]`

After doing that, the values from the output neurons, which are probabilities ranging from 0 to 1, can be compared directly to the values in the target vector. In this way, for a given number, say 0, the neural network will be trained to output a high probability from the first output neuron, and a low probability from the following neurons.

One-hot encoding can be performed easily with the utilities provided by Keras:

In [4]:

```
from keras.utils import np_utils
y = np_utils.to_categorical(digits.target,10)
print(digits.target)
print(y)
```

let's now split our data into a training sample and a testing sample:

In [5]:

```
split_limit=1000
x_train = x[:split_limit]
y_train = y[:split_limit]
x_test = x[split_limit:]
y_test = y[split_limit:]
```

The first 1000 images and labels are going to be used for training. The rest of the dataset will be used later to test the performance of our network.

After importing the necessary tools from Keras, we create the neural network in the following code snippet.

In [6]:

```
from keras import layers, Model, optimizers, regularizers
```

In [7]:

```
# create the input layer
#
# we specify that the input layer
# should have 64 neurons, one for each pixel
# in our images.
# The input neurons do nothing, they
# just transfer the value at each pixel
# to the next layer.
img_input = layers.Input(shape=(64,))
# create the hidden layer
#
# This layer is a Dense layer, which means
# that its neurons are fully connected to the
# neurons in the previous layer (the input layer)
# We will talk about the activation in a future post
tmp = layers.Dense(15,
activation='sigmoid')(img_input)
# create the output layer
#
# The output layer is another Dense layer.
# It must have 10 neurons, corresponding to
# the 10 digit categories
output = layers.Dense(10,
activation='softmax')(tmp)
# create the neural network from the layers
model = Model(img_input, output)
# print a summary of the model
model.summary()
# =================================================
# Please don't pay attention to what follows,
# we'll talk about regularization later!
# For now, it is enough to know that regularization
# helps the neural network converge properly.
# I've added this regularization because it is
# performed by default in scikit-learn,
# and because we want to be able to compare the
# results of scikit-learn and keras.
l2_rate = 1e-4
for layer in model.layers:
if hasattr(layer, 'kernel_regularizer'):
layer.kernel_regularizer = regularizers.l2(l2_rate)
layer.bias_regularizer = regularizers.l2(l2_rate)
layer.activity_regularizer = regularizers.l2(l2_rate)
# =================================================
# define how the neural network will learn,
# and compile the model.
# models must be compiled before
# they can be trained and used.
# the loss, optimizer, and metrics arguments
# will be covered in a future post.
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=0.1, momentum=0.9),
metrics=['accuracy'])
```

Finally, we can train the network:

In [8]:

```
history = model.fit(x=x_train, y=y_train, validation_data=(x_test,y_test),
batch_size=100, epochs=50)
```

The predictions from the neural network are evaluated for all examples in the test sample by doing

In [9]:

```
predictions = model.predict(x_test)
print(predictions[3])
```

For each sample, the prediction is an array of 10 values. Each value is the estimated probability for the image to belong to this category.

The predicted category is the one with the largest probability.

Let's write a small function to plot a given image, and to print the true and predicted categories:

In [10]:

```
def plot_prediction(index):
print('predicted probabilities:')
print(predictions[index])
print('predicted category', np.argmax(predictions[index]))
print('true probabilities:')
print(y_test[index])
print('true category', np.argmax(y_test[index]))
img = x_test[index].reshape(8,8)
plt.imshow(img)
```

In this function, we obtain the category with `np.argmax`

that, for an array, returns the index corresponding to the largest value.

Let's use this function to have a look at a few examples (just choose a different index to look at another example).

In [11]:

```
plot_prediction(3)
```

Finally, let's compute the accuracy score, which is the probability to classify the digits correctly.

We will compute the accuracy for the test sample, which has not been used to train the network. Again, we will use `np.argmax`

to get the predicted and true categories for each example.

In [12]:

```
# the second argument of argmax specifies
# that we want to get argmax for each example.
# without this argument, argmax would return
# the largest value in the whole array,
# considering all examples
y_test_best = np.argmax(y_test,1)
print(y_test_best.shape)
predictions_best = np.argmax(predictions,1)
from sklearn.metrics import accuracy_score
accuracy_score(y_test_best, predictions_best)
```

Out[12]:

You should obtain an accuracy around 90%, similar to the one we had obtained in the same conditions with scikit-learn.

Please note that the result is not deterministic, so the accuracy will vary every time you train the network. I usually get an accuracy between 90 and 93%, but I sometimes get a value as low as 87%.

Please repeat the exercise starting from the creation of the neural network to see what happens.

In this post, you have trained your first neural network with Keras.

Keras is the easiest and most powerful way to work with neural networks, and we will use it very often on this blog.

To learn how to do deep learning for image recognition with Keras and TensorFlow, check the following posts:

Please let me know what you think in the comments! I’ll try and answer all questions.

And if you liked this article, you can subscribe to my newsletter to be notified of new posts (no more than one mail per week I promise.)