Python Lesson 38: Building Your First Neural Network (AI pt. 2)


Hello everybody,

Michael here, and in today’s post-my last post of 2022-I will be showing you how to create your first neural network in Python. I know you haven’t seen stuff like this on my blog before, but I thought I’d end the year teaching you all something new.

Now, there are two possible ways you can create a neural network in Python. One involves building the framework for your neural network from scratch with a combination of classes and functions (which, if you readers want, I’ll cover in a future post). The other involves using two popular Python packages-Keras and TensorFlow-which I will discuss more in this post.

A little bit about Keras and TensorFlow

TensorFlow and Keras are two prominent Python neural network machine learning packages. However, TensorFlow is an entire open-source end-to-end machine learning platform, while Keras is more like a high-level interface that ships with TensorFlow. If it helps, think of Keras like a package-within-a-package in TensorFlow; whenever you use Keras (via tf.keras), you’re actually using the TensorFlow library. However, Keras is a more intuitive way to use TensorFlow, albeit with some trade-offs (such as less direct access to TensorFlow’s lower-level functionality).

Package installation

Before we get started with our neural network creation, let’s first install our packages. You’re going to need both TensorFlow and Keras for this tutorial, but you only need to run the pip install tensorflow command on the command prompt, as installing TensorFlow will usually install Keras too. However, on the off chance that Keras doesn’t get installed with TensorFlow, you could run the pip install keras command on the command prompt.

  • Just in case you forgot, if you want to check if you’ve already pip-installed a certain package, run the pip list command and run through the list of installed packages to find the package you’re looking for (all packages are listed in alphabetical order).

Setting up the neural network

For this lesson, we’re going to start off by building a simple neural network-one that works with the MNIST Keras dataset. For those who don’t know, the MNIST (Modified National Institute of Standards and Technology) dataset is a very, very large dataset of images containing the handwritten digits 0-9-the MNIST dataset is commonly used for training image processing systems (or if you’re just starting out with neural network machine learning). This dataset contains 70,000 28×28 pixel images-60,000 images for the training dataset and 10,000 images for the testing dataset.

  • The MNIST dataset is certainly larger than most of the other datasets we’ve worked with in earlier posts (if you recall, the datasets from my earlier machine learning posts had about a few thousand elements tops). The reason is that, unlike the other machine learning techniques I’ve taught you (k-means clustering, Naive Bayes classification), neural networks are really well-suited for large datasets-and by large, I mean at least 10,000 records.

To start creating our neural network, first include these three lines of code in your Jupyter notebook:

import tensorflow as tf
import keras as kr
import tensorflow_datasets as tfds

Pay attention to the third import line-in addition to the TensorFlow and Keras packages, you’ll also need the tensorflow_datasets package for this lesson. The tensorflow_datasets package contains several TensorFlow datasets you can work with when developing neural networks (such as the MNIST dataset, which we will be working with in this lesson).

  • If you haven’t installed the tensorflow_datasets package yet, run the line pip install tensorflow_datasets on your command prompt or run the line !pip install tensorflow_datasets on your Jupyter notebook (or whichever IDE you’re using).

Loading the MNIST dataset (and a word of advice)

Now that we’ve imported the necessary packages into our Python IDE, the next thing we need to do is import the MNIST dataset into our IDE. Here’s the code to do so:

from keras.datasets import mnist

Unlike most of my other machine learning/data analytics posts, I won’t be attaching a dataset to this post, because we’ll be using a dataset that ships with Keras. If you’re familiar with some popular data analytics/machine learning datasets such as titanic (detailing survivors and victims of the Titanic disaster), iris (detailing petal and sepal measurements of a sample of 150 irises), and mtcars (detailing various features about a bunch of old cars), you’ve probably seen them on A LOT of data analytics/machine learning tutorials. There’s a good reason for that-they’re freely available and come bundled with several languages’ data libraries (Python and R to name just two).

For those who’ve been following my blog for a while, you’ll notice that I try to stay away from overly cliche datasets (I mean, if you’re a data science/data analytics/machine learning student, you’re probably quite sick of the iris dataset). However, even though MNIST is a very commonly used (and a little cliche) dataset, I think it will be the most appropriate first dataset to introduce you all to neural network creation.

Also, final word of advice for you all-if you’re trying to build a data science/data analytics/machine learning portfolio to land yourself a tech job (as I did when I launched this blog in summer 2018), try to stay away from cliche datasets. Find datasets that stand out (and ideally interest you)-you’ll be sure to impress the recruiters!

Now back to the lesson! After importing the MNIST dataset into your IDE, run this line of code to split the MNIST dataset into training and testing datasets:

(trainX, trainY), (testX, testY) = mnist.load_data()

When loading the MNIST dataset into your IDE (or any large dataset for that matter), remember to split your dataset into training and testing datasets, each denoted by their own variables.

  • I know it’s been a while since I’ve done any machine learning posts, so as a refresher: when building a machine learning model, the training dataset teaches the model, while the testing dataset is used to check whether the model works as intended on data it hasn’t seen. When working with machine learning datasets, don’t split the main dataset 50-50 into training and testing datasets. The training dataset should be the larger one; a split like 70% training/30% testing should work fine-though the MNIST dataset comes with a split of ~85% training/~15% testing, which will work for this dataset.
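MNIST comes pre-split for us, but if you ever need to split a dataset yourself, a bare-bones sketch (using a made-up 100-record dataset, not anything from this lesson) might look like this:

```python
import numpy as np

data = np.arange(100)                     # a made-up dataset of 100 records
np.random.default_rng(42).shuffle(data)   # shuffle before splitting

split = int(len(data) * 0.7)              # 70% training / 30% testing
train, test = data[:split], data[split:]
print(len(train), len(test))              # 70 30
```

(In practice, a library helper like scikit-learn’s train_test_split does this for you, but the idea is the same.)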

Why do we need X and Y training and testing datasets? The X datasets contain the images themselves-the pixel data for each 28×28 image (60,000 images for training and 10,000 for testing). The Y datasets contain the labels-the actual digit (0-9) that each corresponding image represents.

In case you’re wondering about the size of each X and Y dataset, check the .shape attribute of each like so-remember not to include a pair of parentheses after .shape, since it’s an attribute (a tuple) rather than a method, and you can’t call tuple objects:

trainX.shape
(60000, 28, 28)

testX.shape
(10000, 28, 28)

trainY.shape
(60000,)

testY.shape
(10000,)
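If those tuples look abstract, here’s a tiny numpy stand-in (fake 2×2 “images” I made up, not real MNIST data) showing the same X/Y shape relationship:

```python
import numpy as np

# 3 fake "images" of 2x2 pixels each, plus one label per image
X = np.zeros((3, 2, 2))
Y = np.array([7, 0, 4])

print(X.shape)  # (3, 2, 2) -> (number of images, height, width)
print(Y.shape)  # (3,)      -> one label per image
print(Y[1])     # 0         -> the label of the second image
```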

And now…time to build the model!

Now that we’ve loaded our MNIST dataset into Python, split the data into training and testing datasets, and obtained the shapes of each dataset, it’s time to get our feet wet and build our first neural network!

However, before we dive into the neural network nitty-gritty, there’s something I want to show you. Take a look at the code and output below:

import matplotlib.pyplot as plt
imageNum = 1500
plt.imshow(trainX[imageNum], cmap='magma')

In this example, I imported the matplotlib.pyplot package (which you may recall from my Matplotlib lessons) to plot the 1,501st image in the MNIST training dataset in Matplotlib’s magma color scheme (the cmap parameter selects one of Matplotlib’s colormaps). As you can see, this image of a handwritten 9 is displayed as a 28×28 pixel image-which makes sense, as all images in the MNIST dataset (both training and testing) are 28×28 pixels.

  • Matplotlib has several different color schemes (colormaps) to choose from. For a list of all available choices, check out this link-https://matplotlib.org/stable/tutorials/colors/colormaps.html.
  • In order to plot any of the images in the MNIST dataset, you’ll need to use one of the X datasets (in this example, trainX and testX), since they contain the actual pixel data for each image. The Y datasets only contain the labels, so you can look up which digit any element represents, but you can’t plot the image itself from them.
  • Just like many of the other Python projects I’ve done throughout this blog involving lists, the MNIST dataset is basically a giant zero-indexed list of images. So for a parameter like imageNum, you can choose any value between 0 and 59,999 if you’re analyzing the 60,000 image training dataset. If you’re analyzing the 10,000 image testing dataset, you can choose any value between 0 and 9,999. In the example above, I chose the 1,501st image in the training dataset (as the imageNum I chose was 1,500, which represents the element at index 1,500).
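The zero-indexing point in that last bullet, shown with a tiny made-up list (standing in for the much larger MNIST arrays):

```python
# A 4-element stand-in for a dataset of images
digits = ["img0", "img1", "img2", "img3"]

print(digits[0])   # img0 -> index 0 is the 1st element
print(digits[3])   # img3 -> index 3 is the 4th (and last) element
# digits[4] would raise an IndexError: valid indices run 0 to len(digits) - 1
```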

Just for fun, let’s also plot a random image from the testing dataset:

import matplotlib.pyplot as plt
imageNum = 3332
plt.imshow(testX[imageNum], cmap='magma')

In this example, I did the same thing as I did in the previous example, except I decided to plot the 3,333rd image from the MNIST testing dataset-which happens to be the number 4.

Now that we know how to plot each element in the MNIST dataset (for both the testing and training datasets) it’s time to create our model! Take a look at the code below to see how we can create our first Python neural network model:

firstNeuralNetwork = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(150, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

Now, if you’ve never seen a Python neural network before, you’re probably wondering what all of this code means. But don’t worry-your friendly neighborhood coding blogger is here to break it all down for you!

First off, let’s start with the Sequential model. We use it to create the outer shell of the neural network: all of the layers are wrapped inside a list, which in turn is passed to the Sequential constructor (referring to the pair of parentheses that enclose the list). Why do we need a sequential model for the neural network? It lets us stack the four layers in this network-Flatten, Dense, Dropout, and Dense-in order, with each layer’s output feeding into the next, which is important for neural networks.

Now what about the four layers wrapped in our sequential model-Flatten, Dropout, and the two Dense layers? The Flatten layer, well, flattens the input from 2-dimensional to 1-dimensional-which is important, as we’re dealing with thousands of 2-D images for this dataset. How does Flatten know what to flatten? Its input_shape parameter takes the dimensions of the input-in this case, the (28, 28) tuple describing each 28×28 image in the MNIST dataset.
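To see what flattening actually does, here’s a quick numpy sketch (using a fake image I made up, not one from MNIST):

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # a fake 28x28 "image"
flattened = image.reshape(-1)               # roughly what Flatten does

print(image.shape)      # (28, 28) -> 2-D: rows and columns of pixels
print(flattened.shape)  # (784,)   -> 1-D: one long row of 784 values
```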

The Dropout layer removes some of the data from the model in order to prevent overfitting. In the context of machine learning, what is overfitting? Overfitting in machine learning is what happens when your model has excellent accuracy with training data but not with new and unfamiliar data.

Let me give you an example. Let’s say you want to create a model that predicts whether an employee at a very, very, very large company is going to get a promotion based off of their resume. Let’s also assume that you train a model containing 5,000 resumes and it predicts outcomes with 96% accuracy-pretty awesome, right! Now let’s say you feed the model a new set of 2,500 resumes and it predicts outcomes with only a 44% accuracy-what happened here? The model experienced overfitting, as it was able to predict outcomes with great accuracy for the training dataset but with less-than-stellar accuracy for the new and unfamiliar dataset.

In our neural network, the Dropout layer will randomly ignore 20% of the neurons’ outputs during training (the 0.2 argument sets that rate) to help avoid overfitting.
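Here’s a rough numpy sketch of the idea behind dropout (a standalone toy example of my own, not how Keras implements it internally):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)   # pretend outputs from 10 neurons
rate = 0.2                  # the dropout rate from our model

# Keep each neuron with probability 1 - rate; zero out the rest.
# Surviving values are scaled up ("inverted dropout") so the
# expected total output stays the same during training.
mask = rng.random(10) >= rate
dropped = np.where(mask, activations / (1 - rate), 0.0)
print(dropped)  # a mix of 0.0 (dropped) and 1.25 (kept, scaled) values
```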

Last but not least, we have two Dense layers in our neural network. The first Dense layer activates its neurons using the ReLU, or rectified linear unit, activation function. For more on the math behind ReLU, check out this article-https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/ (if you’re into the underlying math, I think you’ll enjoy this article). In the most basic sense, ReLU is a simple piecewise-linear activation function-it passes positive inputs through unchanged and outputs 0 for anything negative-that is used in a lot of neural networks because it’s cheap to compute and trains well.
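In fact, ReLU is simple enough to write out in a couple of lines of numpy (a sketch for intuition-Keras applies its own implementation for you):

```python
import numpy as np

def relu(x):
    # ReLU: pass positive values through, clamp negatives (and zero) to 0
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```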

In the first Dense layer, you’ll notice a number right before the activation parameter-that number sets how many neurons the layer contains; in this case, our first Dense layer has 150 neurons. The second Dense layer has a number too-10. What’s the difference between these two numbers? In the first Dense layer, you can have as many neurons as you’d like, while the second (output) Dense layer must have exactly 10 neurons, one for each of the ten unique classes we’re classifying (images of the numbers 0-9).

Fitting and Compiling the Model

The last two things we need to do before we deploy our model are to fit it and compile it. How can we do that? Take a look at the code below:

firstNeuralNetwork.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
firstNeuralNetwork.fit(x=trainX,y=trainY, epochs=25)

So, what does all of this code mean? First of all, the optimizer parameter and value set the neural network’s optimization algorithm-in this case, we’re using TensorFlow’s adam optimizer (for a more in-depth explanation of that optimizer, check out this link-https://www.educba.com/tensorflow-adam-optimizer/), though you can experiment with whatever TensorFlow optimizer you like.

The loss parameter and corresponding value set the neural network’s loss function, which helps optimize the model’s performance by measuring the discrepancy between the predicted values and the target values. In the context of the MNIST dataset, each image’s true label (the digit it actually shows) is the target value, and the digit the neural network assigns to it is the predicted value. In this example, we’re using the sparse_categorical_crossentropy loss function, which measures the cross-entropy (roughly, the discrepancy) between the predicted values and the actual values.
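To make that less abstract, here’s what sparse categorical cross-entropy boils down to for a single image (toy probabilities of my own, not output from our model):

```python
import numpy as np

# Toy predicted probabilities for the classes 0-9 (made-up numbers):
# the model is 55% sure the image is a 9, 5% for each other digit
probs = np.array([0.05] * 9 + [0.55])
true_label = 9  # the image actually shows a 9

# Cross-entropy for one sample: the negative log of the probability
# the model assigned to the true class (lower = more confident & correct)
loss = -np.log(probs[true_label])
print(round(loss, 4))  # 0.5978
```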

The metrics parameter and corresponding value (or list in this case) set the metrics-or in this case, metric-that you’d like to use to measure the neural network’s performance. In this example, we’re going with the accuracy metric, as this is the easiest metric to understand. Accuracy is also often used as a baseline for other metrics such as precision and F1 score (the harmonic mean of precision and recall, which takes both false positives and false negatives into account).
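Accuracy itself is simple to compute by hand-it’s just the fraction of predictions that match the true labels. A quick sketch with made-up numbers (not predictions from our model):

```python
import numpy as np

predicted = np.array([3, 1, 4, 1, 5])  # toy model predictions
actual    = np.array([3, 1, 4, 2, 6])  # toy true labels

# Accuracy = fraction of predictions that match the true labels
accuracy = np.mean(predicted == actual)
print(accuracy)  # 0.6 -> 3 of the 5 predictions were correct
```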

In the fit function, you’ll first need to pass in your training datasets for both the X and Y values. As for the epochs parameter and value, an epoch is one complete pass through all of the training data. To train a neural network and optimize it for accuracy, a single pass usually won’t suffice-you’ll often want at least 10 passes through the training data (though more epochs couldn’t hurt). In this neural network, we’re using 25 epochs, meaning that we will iterate through the training data 25 times.
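If “epoch” still feels fuzzy, here’s the idea in plain Python (a toy illustration with made-up sample names, not the real training loop):

```python
# Each epoch is one full pass over the entire training set
data = ["img0", "img1", "img2", "img3", "img4"]
epochs = 3

times_seen = 0
for epoch in range(epochs):
    for sample in data:      # every training sample, once per epoch
        times_seen += 1

print(times_seen)  # 15 -> each of the 5 samples was seen 3 times
```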

Now, let’s see how our neural network performs through each epoch (or iteration):

Epoch 1/25
1875/1875 [==============================] - 5s 2ms/step - loss: 2.3026 - accuracy: 0.1118
Epoch 2/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3028 - accuracy: 0.1137
Epoch 3/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1118
Epoch 4/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3032 - accuracy: 0.1124
Epoch 5/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1114
Epoch 6/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3032 - accuracy: 0.1118
Epoch 7/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1100
Epoch 8/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1125
Epoch 9/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3028 - accuracy: 0.1114
Epoch 10/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3027 - accuracy: 0.1107
Epoch 11/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3028 - accuracy: 0.1129
Epoch 12/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3026 - accuracy: 0.1113
Epoch 13/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3028 - accuracy: 0.1135
Epoch 14/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3032 - accuracy: 0.1124
Epoch 15/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1133
Epoch 16/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3026 - accuracy: 0.1121
Epoch 17/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1124
Epoch 18/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1133
Epoch 19/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1120
Epoch 20/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3028 - accuracy: 0.1134
Epoch 21/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3026 - accuracy: 0.1141
Epoch 22/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3026 - accuracy: 0.1129
Epoch 23/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1126
Epoch 24/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3030 - accuracy: 0.1127
Epoch 25/25
1875/1875 [==============================] - 4s 2ms/step - loss: 2.3037 - accuracy: 0.1117

In this epoch run log, we can see several different metrics for each epoch, such as loss and accuracy. The metric to watch here is each epoch’s accuracy, as that tells you how well the neural network performed on the training data during that pass (though note that the loss should also be trending downward-here it barely moves, a sign the model isn’t learning). For instance, the first epoch (denoted as Epoch 1/25) had an accuracy of 11.18%. The final epoch (denoted as Epoch 25/25) had an accuracy of 11.17%-all in all, pretty abysmal accuracy for the neural network.

Neural network evaluation time!

Last but not least, it’s neural network evaluation time! To evaluate the accuracy of the overall model (as opposed to individual epochs), all you need is one line of code:

firstNeuralNetwork.evaluate(testX, testY)

313/313 [==============================] - 1s 1ms/step - loss: 2.3026 - accuracy: 0.1045

Just like you saw with the epochs, you’ll see the loss and accuracy metrics. Pay close attention to the accuracy metric, as this will tell you the model’s overall accuracy, which is still pretty bad at 10.45%.

  • I know this may seem confusing, but remember when you’re fitting & compiling the model to use the training dataset (for both the X and Y axes). When you’re evaluating the model’s accuracy, use the testing dataset (for both the X and Y axes).

Yes, I know the accuracy of this neural network sucked. However, the aim of this lesson was not to build the best neural network out there-rather, my aim was to teach you the basics of neural network creation so that you know the core concepts. A lot of the settings we discussed in this post-activation functions, epochs, dropout rate-can be experimented with to your liking in order to improve the neural network’s accuracy.
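If you do want to experiment, two tweaks that (in my experience-these weren’t covered in the lesson above) often help this exact setup: scale the pixel values from 0-255 down to 0-1 before fitting, and tell the loss function that the final Dense layer outputs raw scores (“logits”) rather than probabilities. A sketch of the changed lines:

```python
import tensorflow as tf

# Same architecture as in the lesson
firstNeuralNetwork = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(150, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Tweak 1: scale the pixels to the 0-1 range before calling fit():
# trainX, testX = trainX / 255.0, testX / 255.0

# Tweak 2: use the loss object with from_logits=True, since our
# final Dense layer has no softmax activation attached
firstNeuralNetwork.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
```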

Final code and some parting words for 2022

So, I know we had A LOT of code for this lesson. In case you wanted to run the code in the order we discussed it, here’s the entire script below for your convenience (outputs not included):

import tensorflow as tf
import keras as kr
import tensorflow_datasets as tfds

from keras.datasets import mnist

(trainX, trainY), (testX, testY) = mnist.load_data()

trainX.shape
testX.shape
trainY.shape
testY.shape

import matplotlib.pyplot as plt
imageNum = 1500
plt.imshow(trainX[imageNum], cmap='magma')

import matplotlib.pyplot as plt
imageNum = 3332
plt.imshow(testX[imageNum], cmap='magma')

firstNeuralNetwork = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(150, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

firstNeuralNetwork.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
firstNeuralNetwork.fit(x=trainX,y=trainY, epochs=25)

firstNeuralNetwork.evaluate(testX, testY)

Thanks for coming along on this coding journey in 2022! Hope you all sharpened your skills and/or learned something new along the way this year! Have a very happy holiday season and rest assured-I will be back in 2023 with brand new coding content (and a little something special for my blog’s 5th anniversary)!

Michael
