Hotdog or Not Hotdog – Image Classification in Python using fastai

Earlier, I believed that getting computers to recognize images required huge amounts of data, carefully tuned neural network architectures, and lots of coding.

But after taking the deep learning course, I found that this is not always true. We can achieve a lot by writing just a few lines of code. One does not need to be a deep learning expert to do deep learning.

In this blog post, we will build an image classifier using a deep learning library, fastai. The fastai library sits on top of PyTorch, another hugely popular deep learning library developed by Facebook. PyTorch usually requires writing more code than fastai. You can think of the fastai-PyTorch pairing as analogous to the Keras-TensorFlow pairing.

Why Hotdogs?

If you are a fan of the TV show Silicon Valley, you already get the idea. In one of its episodes, the character Jian-Yang builds an application that classifies food dishes as either a hotdog or not a hotdog. Whether you have seen the show or not, you should definitely watch a refresher on how it works. Though not a very useful food classifier, it makes a very nice learning exercise, and it can easily be extended to build other image classifiers as well.

Most tasks in fastai require very few lines of code. To build this hotdog classifier, we need just the following lines of code:

tfms = tfms_from_model(resnet34, sz=224)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(resnet34, data)

Yes, three lines are enough to set up the classifier, and we will study each of them in detail.

Data Setup

Before diving into these three lines, let's go through the steps required for the initial data setup. I downloaded the images from this Kaggle dataset. The dataset contains a test and a train folder, each containing two sub-folders, hot_dog and not_hot_dog, with 250 images each.

The fastai library assumes that we have train and valid directories and that each directory has one sub-directory for each class we wish to recognize (in this case, 'hot_dog' and 'not_hot_dog'). Originally, the dataset had an equal number of images in the train and test folders, but I moved 100 images from each of the test sub-folders to the respective train sub-folders to get a 7:3 train-validation split. I also renamed the test folder to valid.
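The reshuffling described above can also be scripted. The following is a minimal sketch, assuming the extracted dataset sits under one root folder with train/ and test/ sub-folders. The function name `resplit` and the choice of moving the first files in sorted order are illustrative assumptions, not part of the original workflow.

```python
import shutil
from pathlib import Path

CLASSES = ("hot_dog", "not_hot_dog")

def resplit(root, n_move=100, classes=CLASSES):
    """Move n_move images per class from test/ to train/, then rename test/ to valid/."""
    root = Path(root)
    for cls in classes:
        src, dst = root / "test" / cls, root / "train" / cls
        dst.mkdir(parents=True, exist_ok=True)
        # Sorted order keeps the move reproducible; a random sample would work too.
        for f in sorted(p for p in src.iterdir() if p.is_file())[:n_move]:
            target = dst / f.name
            if target.exists():
                # Refuse to overwrite a training image that happens to share the name.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(f), str(target))
    # fastai expects the held-out folder to be called valid/.
    (root / "test").rename(root / "valid")
```

Calling `resplit("/path/to/data")` once on a fresh copy of the dataset would leave train/ and valid/ folders in the layout described above.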

Let’s have a look at some of the images in our dataset.

# This file contains all the main external libraries we will use
from fastai.imports import * 

PATH = "/home/yashu_seth/data/"

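# Take the first ten hot dog images from the validation set and display one of them
hot_dog_files = os.listdir(f'{PATH}valid/hot_dog')[:10]
img = plt.imread(f'{PATH}valid/hot_dog/{hot_dog_files[4]}')
plt.imshow(img)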

# Do the same for the images that are not hot dogs
not_hot_dog_files = os.listdir(f'{PATH}valid/not_hot_dog/')[:10]
img = plt.imread(f'{PATH}valid/not_hot_dog/{not_hot_dog_files[6]}')
plt.imshow(img)

Let’s move on to model building and discuss each of the lines one by one.

Data Transformation and Augmentation

# This imports everything that we need.
from fastai.conv_learner import *
tfms = tfms_from_model(resnet34, sz=224)

The tfms_from_model function in fastai is used for data transformation and augmentation. It takes the type of pre-trained model we will be using (more on this later); we use the resnet34 model here. Images must be transformed to a particular size before they are fed to the model, and the sz parameter sets the size to which the input images are resized.

This function can also be used for various data augmentations such as horizontal flipping, random zooming, and rotating. In the example above, we have not used any data augmentation, but you can refer to the file in the fastai GitHub repository to experiment with various ways to augment your data.
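To make the idea of augmentation concrete, here is a library-free sketch of a random horizontal flip on an image stored as a nested list of pixel rows. It only illustrates the concept and is not how fastai implements its transforms; the function names are hypothetical.

```python
import random

def hflip(img):
    """Return a horizontally mirrored copy of an image given as a list of pixel rows."""
    return [row[::-1] for row in img]

def random_hflip(img, p=0.5, rng=random):
    """Flip the image with probability p; otherwise return an unchanged copy."""
    return hflip(img) if rng.random() < p else [row[:] for row in img]

# A tiny 2x3 "image" with one number per pixel.
img = [[1, 2, 3],
       [4, 5, 6]]
flipped = hflip(img)  # each row is reversed: [[3, 2, 1], [6, 5, 4]]
```

Because a mirrored hot dog is still a hot dog, applying such a flip at random during training shows the model slightly different versions of the same image without collecting new data.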

Building a Dataset

PATH = "/home/yashu_seth/data/"
data = ImageClassifierData.from_paths(PATH, tfms=tfms)

The ImageClassifierData class is used to create a fastai dataset. It handles the train and validation data along with the various augmentations internally. All we need to do is pass the directory path of our images and the data transforms object to the from_paths method. We used from_paths because our image data is stored as files. The from_csv and from_arrays methods can also be used, depending on the source of your input data.
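Since from_paths relies entirely on the folder layout, it can help to check that layout before building the dataset. The helper below is a small sketch using only the standard library; `count_images` is a hypothetical name and not part of fastai.

```python
from pathlib import Path

def count_images(path, splits=("train", "valid")):
    """Return {split: {class_name: number_of_files}} for a train/valid folder layout."""
    counts = {}
    for split in splits:
        split_dir = Path(path) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        # Every sub-directory is treated as one class, named after the folder.
        counts[split] = {
            cls.name: sum(1 for f in cls.iterdir() if f.is_file())
            for cls in sorted(split_dir.iterdir()) if cls.is_dir()
        }
    return counts
```

Running `count_images(PATH)` should report the same class names, hot_dog and not_hot_dog, under both train and valid.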

Transfer Learning

Instead of training a model from scratch, we will use a pre-trained model. A pre-trained model is a model created by someone else to solve a different problem. We modify such a model to suit our needs. This technique is known as transfer learning. You can have a look at the Stanford CS231n course notes on transfer learning for an introduction.

The pre-trained model that we will be using is resnet34. This model has been trained on 1.2 million images across 1000 classes from the ImageNet database. It is a version of the ResNet models that won the 2015 ImageNet competition. It is a convolutional neural network (CNN), a type of neural network used extensively in computer vision.

learn = ConvLearner.pretrained(resnet34, data)

The ConvLearner class is used to build convolutional neural networks in fastai. We will use its pretrained method to leverage the resnet34 model.

Model Training

Finally, let's train our model on our hot dog and not hot dog images.

learn.fit(0.005, 6)
epoch      trn_loss   val_loss   accuracy                  
    0      0.726549   0.372272   0.842339  
    1      0.500642   0.25229    0.914819                  
    2      0.401544   0.235235   0.911694                  
    3      0.333859   0.23114    0.905444                  
    4      0.303246   0.217755   0.911895                  
    5      0.269516   0.214936   0.921472                  

Here, we have used a learning rate of 0.005, and we reach an accuracy of 92% in 6 epochs. Since there are only a few hundred images in the validation set, the accuracy might vary a little between runs. But it is still good. Besides, the whole training takes a reasonably small amount of time.
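To get a feel for how much the accuracy can move on a validation set this small, we can estimate its standard error. This is a rough sketch that treats every validation image as an independent draw, which is a simplifying assumption; the validation set sizes of 200 and 300 images are also assumed values chosen for illustration.

```python
import math

def accuracy_std_error(accuracy, n_images):
    """Binomial standard error of an accuracy measured on n_images samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_images)

# Assumed validation set sizes; adjust to the number of images you actually hold out.
for n in (200, 300):
    se = accuracy_std_error(0.92, n)
    margin = 1.96 * se  # half-width of an approximate 95% interval
    print(f"n={n}: 0.92 +/- {margin:.3f}")
```

Under these assumptions the margin works out to roughly three to four percentage points, so a difference of one point between two runs should not be read as a real improvement.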

You can find the entire code in a Jupyter notebook in my GitHub repository, hot-dog-or-not-hot-dog.

How to improve?

When we trained the model above, only the weights of the last layer changed, because creating a learner object freezes all but the last layer. We can unfreeze the earlier layers and run a few more epochs with differential learning rates to improve the accuracy further.
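Differential learning rates simply mean that earlier layer groups get smaller learning rates than later ones, since the early layers already hold general features that need little adjustment. The helper below is a hypothetical sketch of how such rates could be generated; the three groups and the factor of 10 between them are assumptions, and the commented fastai calls reflect my understanding of the library version used in this post.

```python
def differential_lrs(top_lr, n_groups=3, factor=10.0):
    """Learning rates ordered from the earliest to the latest layer group.

    Each earlier group gets a rate `factor` times smaller than the next one.
    """
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = differential_lrs(0.005)  # roughly [0.00005, 0.0005, 0.005]

# Assumed usage with the `learn` object from above and numpy imported as np:
# learn.unfreeze()             # make the earlier layers trainable again
# learn.fit(np.array(lrs), 3)  # a few more epochs, one rate per layer group
```

The last group keeps the learning rate we trained with earlier, while the unfrozen earlier groups are only nudged gently.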

I hope you enjoyed the post. If you want an in-depth understanding of neural networks, you should definitely check out the course.

Thank You. 🙂
