Tejashwi Kalp Taru
Engineer, Tinkerer, Blogger
Reading time about 4 minutes

Object detection with Fizyr Retinanet


In this article, we will train Fizyr Retinanet. It is discovered that there is an extreme foreground-background class imbalance problem in the one-stage detector. And it is believed that this is the central cause which makes the performance of one-stage detectors inferior to two-stage detectors

In RetinaNet, a one-stage detector, by using focal loss, the lower loss is contributed by “easy” negative samples so that the loss is focusing on “hard” samples, which improves the prediction accuracy.

For this tutorial, we will be using Keras RetinaNet implementation by fizyr

We will follow the steps below:

  • Label image dataset using HyperLabel
  • Export labeled dataset in Pascal VOC format
  • Create annotations and classes CSV file from Pascal VOC data
  • Train our model
  • And finally, perform inference on images

Label image dataset using HyperLabel

To identify and mark the objects in a given image we need to draw a bounding box around it. This will provide us 4 coordinates and these coordinates will be used by machine learning scripts to identify the object and classify it.

In this tutorial, we will be using HyperLabel


For this project, we will be using the following YouTube video from a shipping port and then we will extract the frames from the video using FFmpeg.

Or if you are feeling lazy to download the video and extract the images, don’t worry. You can download the extracted images directly from here

Download the video and then install FFmpeg (if not installed on your system).

Once, installed we need to extract the frames out of the video. We will be extracting the frames for the first 30 seconds. To do this, we need to run the following command in terminal

# {PATH-VIDEO-FILE}    -> the path to downloaded video file
# {DESTINATION-FOLDER} -> path to a local folder in which extracted frames will be saved
ffmpeg -i {PATH-VIDEO-FILE} -ss 00:00:00 -t 00:00:30 {DESTINATION-FOLDER}/output-%09d.jpg

Once the images are exported, its time to launch HyperLabel. You can watch the following tutorial video about how to use HyperLabel and tag images with it.

Once all of your images are tagged, its time to export the labels and data in Pascal VOC format. Goto the review tab/section and click on the export button. From the export menu, choose Object Detection and select Pascal VOC and click on export. It will take some time to export the data, once exported you should be able to see multiple folders, but we are interested in two i.e., Annotations and JPEGImages folder.

folder structure

Now created a new folder, name it dataset. Copy all the contents of Annotations folder and JPEGImages folder inside dataset folder and then create a zip archive of this folder. Optionally, you can download the pre-made ZIP file from here

Upload this zip archive to your Google drive and get the file ID. To locate the File ID, right-click on the name of the file, choose the Get Shareable Link option, and turn on Link Sharing if needed (you can turn it off later). You will see the link with a combination of numbers and letters at the end, and what you see after id = is the File ID.***ThisIsFileID***

If your file is already open in a browser, you can obtain File ID from its link:***ThisIsFileID***/edit#gid=123456789

Now its time to open Google Colab notebook or download the notebook from here

When you open the notebook, you should change the DATASET_DRIVEID value to your file ID.

file id

Now run the notebook cells one by one from the top and train your model. It will take some time to train, meanwhile, you can sip some coffee or beer, as you wish.

Once trained, you can run the inference and get the result. In my case, the result looks something like

final result

Congratulations! You have trained your model.

comments powered by Disqus