Tejashwi Kalp Taru
Engineer, Tinkerer, Blogger

Object detection with Fizyr RetinaNet



In this article, we will train Fizyr's RetinaNet. One-stage detectors suffer from an extreme foreground-background class imbalance, and this imbalance is widely believed to be the central reason their accuracy lags behind two-stage detectors.

RetinaNet, a one-stage detector, tackles this with focal loss: "easy" negative samples contribute less to the loss, so training focuses on "hard" samples, which improves prediction accuracy.
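To make that concrete, here is a minimal sketch of the focal loss from the RetinaNet paper, FL(p_t) = -α_t * (1 - p_t)^γ * log(p_t), for a single binary prediction (the helper name and example values are just for illustration):

import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # p: predicted foreground probability, y: 1 for object, 0 for background
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

print(focal_loss(0.01, 0))  # easy negative: ~7.5e-07, contributes almost nothing
print(focal_loss(0.10, 1))  # hard positive: ~0.47, still dominates the loss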

For this tutorial, we will be using the Keras RetinaNet implementation by fizyr.

We will follow the steps below:

Label image dataset using HyperLabel

To identify and mark the objects in a given image, we need to draw a bounding box around each one. This gives us four coordinates, which the training script uses to learn to locate and classify the object.
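For instance, a single labeled box boils down to two corner points plus a class name (the numbers and the "ship" label here are purely illustrative):

# One labeled bounding box: top-left and bottom-right pixel corners, plus a class label
box = {"label": "ship", "xmin": 132, "ymin": 88, "xmax": 410, "ymax": 256}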

In this tutorial, we will be using HyperLabel.

[Screenshot: HyperLabel]

For this project, we will be using the following YouTube video from a ship port, and then we will extract the frames from the video using FFmpeg.

If you would rather not download the video and extract the images yourself, don't worry: you can download the extracted images directly from here.

Download the video and then install FFmpeg (if it is not already installed on your system).

Once installed, we need to extract the frames from the video. We will extract frames from the first 30 seconds. To do this, run the following command in a terminal:

# {PATH-VIDEO-FILE}    -> the path to the downloaded video file
# {DESTINATION-FOLDER} -> the path to a local folder in which extracted frames will be saved
ffmpeg -i {PATH-VIDEO-FILE} -ss 00:00:00 -t 00:00:30 {DESTINATION-FOLDER}/output-%09d.jpg
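Note that this command keeps every frame, so a 30 fps video yields roughly 900 images for 30 seconds. If you want a sparser dataset, ffmpeg's -r option sets the output frame rate; for example, adding -r 1 keeps one frame per second.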

Once the images are exported, it's time to launch HyperLabel. You can watch the following tutorial video to learn how to use HyperLabel and tag images with it.

Once all of your images are tagged, it's time to export the labels and data in Pascal VOC format. Go to the review tab and click the export button. From the export menu, choose Object Detection, select Pascal VOC, and click export. The export will take some time; once it finishes, you should see multiple folders, but we are interested in two of them: Annotations and JPEGImages.

[Screenshot: exported folder structure]
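If you want to sanity-check the export before training, each XML file in Annotations follows the standard Pascal VOC layout and is easy to parse. A minimal sketch (the file name below is a placeholder; use any file from your Annotations folder):

import xml.etree.ElementTree as ET

# Print every labeled object and its box from one exported annotation file
root = ET.parse("Annotations/output-000000001.xml").getroot()
for obj in root.iter("object"):
    name = obj.find("name").text
    bndbox = obj.find("bndbox")
    box = [int(float(bndbox.find(t).text)) for t in ("xmin", "ymin", "xmax", "ymax")]
    print(name, box)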

Now create a new folder and name it dataset. Copy all the contents of the Annotations and JPEGImages folders into the dataset folder, then create a zip archive of this folder. Optionally, you can download the pre-made ZIP file from here.

Upload this zip archive to your Google Drive and get the File ID. To locate the File ID, right-click on the name of the file, choose the Get Shareable Link option, and turn on Link Sharing if needed (you can turn it off later). You will see a link with a combination of numbers and letters at the end; what appears after id= is the File ID.

https://drive.google.com/open?id=***ThisIsFileID***

If your file is already open in a browser, you can obtain the File ID from its link:

https://docs.google.com/spreadsheets/d/***ThisIsFileID***/edit#gid=123456789
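The notebook takes care of the download for you, but for reference, fetching a shared file by its File ID can be done with the gdown package. A minimal sketch (the ID and output name below are placeholders):

import gdown

# Download the shared Drive file by its File ID; "dataset.zip" is just the local name
file_id = "ThisIsFileID"
gdown.download(f"https://drive.google.com/uc?id={file_id}", "dataset.zip", quiet=False)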

Now it's time to open the Google Colab notebook, or download the notebook from here.

When you open the notebook, you should change the DATASET_DRIVEID value to your File ID.

[Screenshot: setting DATASET_DRIVEID in the notebook]

Now run the notebook cells one by one from the top to train your model. Training will take some time; meanwhile you can sip some coffee or beer, as you wish.
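For orientation, training with fizyr's implementation comes down to invoking its train.py script with the Pascal VOC generator. A notebook cell for that looks roughly like the sketch below; the flag values are assumptions, and the exact cell in the notebook may differ:

# Roughly what the training cell runs (flag values are illustrative)
!python keras_retinanet/bin/train.py --epochs 10 --steps 500 pascal dataset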

Once trained, you can run inference and inspect the detections. A minimal sketch of an inference step follows, and my own result is shown after it.
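This sketch uses fizyr's inference API; the snapshot path, backbone name, and test image below are assumptions, so adjust them to your own training run:

import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# Load a training snapshot and convert it into an inference model
model = models.load_model("snapshots/resnet50_pascal_10.h5", backbone_name="resnet50")
model = models.convert_model(model)

# Prepare a test frame the same way the training pipeline does
image = read_image_bgr("test.jpg")
image = preprocess_image(image)
image, scale = resize_image(image)

# Predict, then map the boxes back to the original image scale
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:  # skip low-confidence detections
        continue
    print(label, score, box)  # label is the class index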

[Image: detection results on a test frame]

Congratulations! You have trained your model.
