Traffic Signs Recognition using CNN with TensorFlow & Keras

German Traffic Sign Recognition Benchmark Classification

Zhenli Jin
9 min read · Dec 13, 2021
Photo by Joshua Hoehne on Unsplash

Project Definition

Project Overview

Artificial intelligence (AI) is gaining momentum in both academia and industry. At the same time, autonomous driving is thriving, and many emerging technologies, such as high-precision radar, high-definition cameras, LiDAR (light detection and ranging), and AI algorithms, are being applied to it. Many technology companies and automakers, such as Waymo, Tesla, Uber, BMW, Mercedes-Benz, and Baidu, have joined the autonomous driving race.

Autonomous driving is a complicated system, and artificial intelligence is a broad concept. In this small project, we focus on how a self-driving vehicle recognizes road traffic signs. Road traffic safety matters to vehicles and drivers alike, so automotive suppliers are developing driver-assistance systems, and traffic sign recognition is one of their features. Recognizing traffic signs correctly is important because it determines the vehicle's speed, whether or not to stop, when to merge into one lane, and so on. In addition, every country has its own traffic sign system; the signs are largely identical across countries but differ in small ways. Thus, an effective recognition algorithm is necessary for the development of autonomous driving.

The German Traffic Sign Recognition Benchmark (GTSRB) is available on Kaggle. It was designed for a multi-class, single-image classification challenge.

The corresponding GitHub repository can be found here.

Motivation: This project is based on the Udacity Data Scientist Nanodegree Capstone Project. Personally, I am interested in deep learning, so as a first step in building neural networks, I chose image classification with a convolutional neural network in TensorFlow & Keras.

Problem Statement

As stated above, identifying road traffic signs is a crucial part of an autonomous driving system, so we want to recognize them correctly. We will create a machine learning model for this task; specifically, we will build a convolutional neural network (CNN) because we are working with image data.

In this project, we will build a CNN to recognize traffic signs. We will also implement transfer learning with VGG19. Finally, given a new traffic sign image, we will test whether the recognizer identifies it correctly.

Metrics

To measure the performance of our model, we use the accuracy score. The traffic sign images are distributed relatively evenly across classes and the dataset is sufficiently large; that is, no single category has an extremely large or extremely small number of images.

The analysis and methodology are divided into the following parts:

  • Data Preparation
  • Data Exploration & Data Visualization (Traffic Sign Images Show)
  • Data Preprocessing
  • Implementation (CNN model building)
  • Train and Validate the Model
  • Refinement

The results are then presented in two parts:

  • Model Evaluation and Validation
  • Justification

Analysis

Data Preparation

According to the description on the Kaggle page, the traffic signs come from Germany, and the dataset was used in a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. The dataset contains more than 50,000 images of different traffic signs stored in three folders: Train, Test, and Meta. Unlike a regular tabular dataset in CSV form, we are working with image data. Thus, our first step is to import the dataset and convert it into a form suitable for the following implementation.

First things first, let’s check how many different signs, or classes, there are in our dataset. Listing the class folders, we find 43 folders, which means 43 different categories in this dataset.
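The folder count can be reproduced with a few lines of standard-library Python. This is a minimal sketch, assuming the Kaggle archive was extracted so that a folder named Train holds one subfolder per class; the function name list_class_folders and the relative path are hypothetical.

```python
from pathlib import Path


def list_class_folders(train_dir):
    """Return the names of the class subfolders inside train_dir, in numeric order."""
    return sorted(
        (p.name for p in Path(train_dir).iterdir() if p.is_dir()),
        # Folder names are class ids such as "0", "1", ...; sort them as numbers.
        key=lambda name: int(name) if name.isdigit() else -1,
    )


if __name__ == "__main__":
    # "Train" is an assumed relative path to the extracted Kaggle folder.
    train_dir = Path("Train")
    if train_dir.is_dir():
        classes = list_class_folders(train_dir)
        print(f"{len(classes)} classes found")
```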

Next, we load the training images into memory. This takes a while because there are a lot of images.
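One way to do the loading is sketched below with Pillow and NumPy. It assumes the Train/<class id>/ folder layout and resizes every image to 50 x 50 so that the result matches the (50, 50, 3) shape reported in this article; the function name load_training_images is hypothetical, and the original code may have used a different image library.

```python
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = (50, 50)  # matches the (50, 50, 3) shape reported in the article


def load_training_images(train_dir, img_size=IMG_SIZE):
    """Load every image under train_dir/<class_id>/ into NumPy arrays.

    Returns (images, labels): images has shape (n, height, width, 3) and
    labels holds the integer class id taken from the folder name.
    """
    images, labels = [], []
    class_dirs = [p for p in Path(train_dir).iterdir() if p.is_dir() and p.name.isdigit()]
    for class_dir in sorted(class_dirs, key=lambda p: int(p.name)):
        for img_path in sorted(class_dir.iterdir()):
            if not img_path.is_file():
                continue
            with Image.open(img_path) as img:
                # Force 3 channels and a common size so the arrays can be stacked.
                resized = img.convert("RGB").resize(img_size)
            images.append(np.asarray(resized, dtype=np.uint8))
            labels.append(int(class_dir.name))
    return np.stack(images), np.array(labels)
```

Calling load_training_images("Train") returns the image array together with the integer labels, which can then be one-hot encoded.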

For the image labels, we use to_categorical from tensorflow.keras.utils to one-hot encode them. Printing the shapes shows that the training set is (39209, 50, 50, 3) and the labels are (39209, 43).
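To make the encoding concrete, here is a NumPy-only sketch that produces the same kind of one-hot matrix. It is an illustration of what the encoding does, not the Keras implementation, and the function name one_hot is hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows, similar to Keras' to_categorical."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set a single 1.0 per row, in the column given by the class id.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


y = one_hot([0, 14, 42], num_classes=43)
print(y.shape)        # (3, 43)
print(y[1].argmax())  # 14
```

Each row has exactly one 1.0, so 39209 labels over 43 classes give the (39209, 43) shape.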

Data Exploration & Data Visualization (Traffic Sign Images Show)

This is an image classification project, so it is intuitive to visualize the dataset by showing some images. First, let’s show the 43 standard traffic signs in the Meta folder.

Figure 1: Standard Traffic Signs

The images are labeled with numbers, which are meaningless to a human reader. Therefore, by looking up these signs online¹, we create a dictionary that maps each label to the corresponding sign name.

Since this is a classification problem, we want to know how many images each category has in our training set. Let’s visualize the class distribution.
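The per-class counts behind such a plot can be computed as in the sketch below. It assumes the labels are integer class ids (for one-hot labels, take argmax along axis 1 first); both function names are hypothetical.

```python
import numpy as np


def images_per_class(labels, num_classes=43):
    """Count how many samples carry each integer class id."""
    return np.bincount(np.asarray(labels, dtype=int), minlength=num_classes)


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest non-empty class."""
    nonzero = counts[counts > 0]
    return nonzero.max() / nonzero.min()
```

The counts array can be passed directly to a bar chart, and the ratio gives a single number for how uneven the classes are.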

Figure 2: Number of Images by each Traffic Sign

From the above plot, we see that the distribution of training images is uneven, although there is no extreme case. Next, let’s visualize some random real images from the Test folder.

Figure 3: Random Images from the Test Set

Notice that the images come in different sizes and resolutions. Some of them are sharp, while others are blurry.

Methodology

Data Preprocessing

Note that the original data only provides training images and test images. To validate our model, we need to split the training data into a training set and a validation set, so we use train_test_split for this.

The print() output is shown below:

Figure 4: Output

Note that we divided the pixel values of these images by 255, which scales them to the range [0, 1].
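The split and the scaling can be sketched together as follows. train_test_split comes from scikit-learn; the 20% validation fraction, the random seed, and the function name split_and_scale are assumptions, since the article does not state the values it used.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_and_scale(images, labels, val_fraction=0.2, seed=42):
    """Split into training/validation sets and scale pixels to [0, 1]."""
    X_train, X_val, y_train, y_val = train_test_split(
        images, labels, test_size=val_fraction, random_state=seed, shuffle=True
    )
    # uint8 pixels lie in [0, 255]; dividing by 255 maps them to [0.0, 1.0].
    X_train = X_train.astype("float32") / 255.0
    X_val = X_val.astype("float32") / 255.0
    return X_train, X_val, y_train, y_val
```

Scaling after the split keeps the original uint8 array untouched, which saves memory until the float copies are actually needed.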

Implementation (CNN model building)

As we stated, the aim of this project is to create a convolutional neural network to classify the images into their corresponding classes. Now we are going to construct our own CNN architecture.
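As an illustration of what such a network can look like in Keras, here is a hedged sketch. The exact layers of the original model are not reproduced here: the filter counts (16, 32), the 3 x 3 kernels, the 128-unit dense layer, and the dropout rate are all assumptions, and build_cnn is a hypothetical name. Only the input shape (50, 50, 3) and the 43 output classes come from the article.

```python
from tensorflow.keras import layers, models


def build_cnn(conv_filters=(16, 32), input_shape=(50, 50, 3), num_classes=43):
    """Build a small CNN with one Conv2D + MaxPooling2D stage per entry in conv_filters."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for n_filters in conv_filters:
        model.add(layers.Conv2D(n_filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    # One softmax output per traffic sign class.
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

Calling build_cnn().summary() prints a layer-by-layer table, and passing a longer tuple such as conv_filters=(16, 32, 64, 128) gives a deeper variant of the same design.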

Here is a summary of the network architecture.

Figure 5: Network Summary

One possible refinement is to add more hidden layers.

Here is a summary of the refined network architecture.

Figure 6: Network Summary

In addition to our own architecture, we will implement transfer learning with the pre-trained VGG19 network.
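A common way to set up transfer learning with VGG19 in Keras is sketched below: the convolutional base is loaded without its ImageNet classifier, frozen, and topped with a new head for the 43 classes. The head (global average pooling plus a 256-unit dense layer) and the name build_vgg19_classifier are assumptions, not the article's exact model.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19


def build_vgg19_classifier(input_shape=(50, 50, 3), num_classes=43, weights="imagenet"):
    """Put a new classification head on top of a frozen VGG19 convolutional base."""
    base = VGG19(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # keep the pre-trained filters fixed

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```

With weights="imagenet" the pre-trained weights are downloaded on first use. Keras also ships tensorflow.keras.applications.vgg19.preprocess_input, which applies the input preprocessing that matches the ImageNet weights.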

Figure 7: Network Summary

Train and Validate the Model

Now it is time to train and validate our model. Since this is a multi-class classification task, we use categorical_crossentropy as the loss function when compiling the model, and adam as the optimizer. As stated before, we use accuracy as the metric.

Also, we use ImageDataGenerator from tensorflow.keras.preprocessing.image to generate batches of augmented images.

Then we train the model.
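The compile, augmentation, and training steps can be sketched as follows. The loss, optimizer, and metric are the ones named above; the augmentation ranges, the batch size of 32, and the function names are assumptions, and the default of 15 epochs simply mirrors the epoch count mentioned later in this article.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_augmenter():
    """Random, label-preserving distortions for traffic sign images."""
    return ImageDataGenerator(
        rotation_range=10,
        zoom_range=0.15,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.15,
        horizontal_flip=False,  # mirroring would change the meaning of some signs
        fill_mode="nearest",
    )


def compile_and_train(model, X_train, y_train, X_val, y_val, epochs=15, batch_size=32):
    """Compile with the loss/optimizer named in the article, then fit on augmented batches."""
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    batches = make_augmenter().flow(X_train, y_train, batch_size=batch_size)
    # Validation data is passed unaugmented so the metric reflects real images.
    return model.fit(batches, epochs=epochs, validation_data=(X_val, y_val))
```

The returned History object stores the per-epoch accuracy and loss in its history attribute, which is what the training output reports.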

The training process output:

Figure 8: Training Output

Similarly, let’s train our refined model (more hidden layers).

The training process output:

Figure 9: Training Output

We repeat the same process with the pre-trained VGG19 model.

The training process output:

Figure 10: Training Output

Refinement

There are several actions we could take to improve the performance of the convolutional neural network. We can adjust the learning rate for the chosen optimizer, increase or decrease the number of epochs, or tune the batch size. Adding more hidden layers and units is another option.

Here we added more layers and more hidden units per layer to increase the representational capacity. Compared to our initial model, we added two more layers with 64 and 128 filters, respectively. As a result, both the training and validation accuracy increased considerably.
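For the learning-rate and epoch adjustments, Keras offers an explicit optimizer object and training callbacks. The sketch below shows one possible setup; none of it was used in this project, and the concrete values (learning rate 1e-4, factor 0.5, the patience settings) are illustrative assumptions.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam


def make_optimizer(learning_rate=1e-4):
    """Adam with an explicit learning rate (the Keras default is 1e-3)."""
    return Adam(learning_rate=learning_rate)


def make_callbacks():
    """Halve the learning rate on a validation-loss plateau and stop if it stalls."""
    return [
        ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
        EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ]
```

These would be used as model.compile(optimizer=make_optimizer(), ...) and model.fit(..., callbacks=make_callbacks()), so the number of epochs becomes an upper bound rather than a fixed choice.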

Results

Model Evaluation and Validation

From the training output above, we see that the training accuracy is around 78% for the first model, 94% for the second model, and 90% for VGG19. Let’s plot the accuracy and loss on the training and validation sets for each model.
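Curves like these can be drawn from the dictionary that Keras stores in History.history. This is a minimal matplotlib sketch; the function name plot_history is hypothetical, and the keys "accuracy", "val_accuracy", "loss", and "val_loss" assume the model was compiled with metrics=["accuracy"] and trained with validation data.

```python
import matplotlib.pyplot as plt


def plot_history(history, title=""):
    """Plot training vs. validation accuracy and loss from a Keras history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    epochs = range(1, len(history["accuracy"]) + 1)

    ax_acc.plot(epochs, history["accuracy"], label="training")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, history["loss"], label="training")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.suptitle(title)
    fig.tight_layout()
    return fig
```

For a History object returned by model.fit, the call is plot_history(history.history, title="Model 1").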

Model 1:

Figure 11: Accuracy
Figure 12: Loss

Model 2 (more hidden layers):

Figure 13: Accuracy
Figure 14: Loss

Pre-trained Model VGG19:

Figure 15: Accuracy (VGG19)
Figure 16: Loss (VGG19)

Prediction

Lastly, we run predictions on the test images.
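The accuracy computation on the predictions can be sketched in NumPy. It assumes that model.predict returns one row of 43 class probabilities per test image and that the true class ids are available as integers (for the Kaggle dataset they are assumed to come from the labels file shipped with the Test folder); both function names are hypothetical.

```python
import numpy as np


def predicted_classes(probabilities):
    """Pick the most probable class id for each row of softmax outputs."""
    return np.argmax(probabilities, axis=1)


def accuracy(true_labels, probabilities):
    """Fraction of samples whose predicted class id equals the true one."""
    true_labels = np.asarray(true_labels)
    return float(np.mean(predicted_classes(probabilities) == true_labels))
```

The test images need the same resizing and division by 255 as the training images before they are passed to the model.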

The test accuracy of our first network is:

Figure 17: Test Data Accuracy Model 1

The test accuracy of our second network is:

Figure 18: Test Data Accuracy Model 2

The test accuracy of VGG19 is:

Figure 19: Test Data Accuracy (VGG19)

Justification

Our first network architecture reached an accuracy of 78% at epoch 15, while our refined network reached a higher accuracy of 94%, a big improvement gained by adding more layers. This is an excellent performance on the training set, and the loss curve behaved as expected for our refinement.

Two things are worth noting about VGG19: (1) with more epochs, the accuracy could improve a little more, because the curve looks smooth; (2) the validation loss oscillates. Potential causes include a learning rate that is too large, an improper batch size, or the choice of optimizer.

Overall, an accuracy of 93.14% on the test data is relatively high, while VGG19 reaches a test accuracy of 87.5%. However, this is still not high enough for a real traffic sign recognizer. We discuss this in the Reflection section.

Project Conclusion

Reflection

For this project, we first loaded the image data and used built-in Keras APIs to prepare it. Then we visualized some traffic sign samples and the class distribution of the training images. We first tried a simple CNN architecture, and the result was unsatisfactory. To improve the performance, we added more hidden layers and adjusted the hidden units, which worked: the training accuracy increased to 94%, although we had hoped for close to 100%. Then we implemented transfer learning with VGG19, but the result was not ideal either; a potential remedy is to try another pre-trained model. Of the three models, our refined model gave the best test accuracy.

Ideally, we want our algorithm to identify traffic signs with 100% accuracy, because driving safety and personal safety are at stake. Furthermore, the recognizer and the camera usually work in a dynamic environment. In real life, conditions on the road are far more complicated: there may be weather effects, birds and insects flying across the signs, dust and dirt on the signs, or even broken signs. In short, traffic sign recognition is a complex problem. Generally, the more data (more traffic sign images) we have for training the model, the higher the accuracy the recognizer can reach.

Improvement

As discussed above, possible improvements include adjusting the learning rate, adjusting the batch size, checking the model complexity, and adding regularization. Depending on the task, we could also use alternative activation functions such as sigmoid and tanh.

Also, note that we trained the models on a CPU and did not use AWS for GPU computation. In addition, there are other pre-trained CNN models we could use, such as InceptionV3, ResNet50, and EfficientNet.

Reference

  1. Road Signs in Germany. https://routetogermany.com/drivingingermany/road-signs
