Object detection and recognition is a hot topic, and massive advances have been achieved in recent years thanks to Convolutional Neural Networks (CNNs). Besides the technology, you also need either labelled training data or pre-trained models. The COCO dataset is a popular set of images used for training and evaluating object detection models.
Google provides the TensorFlow Object Detection API, including models that are trained on COCO. The GitHub page explains how to download and install it. You might have to install git and protoc first. I assume that you already have a running Python environment for data science and Machine Learning – on Windows, Anaconda is the easiest way to get this up and running.
Inference is demonstrated in a Colab notebook. You can copy the code from the notebook into a Python file; you'll have to make a few minor adjustments. Instead of downloading and unpacking a model as shown in the Colab, I recommend manually downloading the models from the model zoo and unpacking them to the models\research\object_detection\test_data folder. I could not get one of the keypoint models to work, so I only used models without keypoints.
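If you prefer to unpack the downloaded archive from Python rather than with an archive tool, something like the following should do. This is only a sketch: unpack_model is a helper name I made up, and I am assuming the download is a tar archive (the model zoo files I have seen are .tar.gz). Only extract archives from sources you trust.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path, dest_dir):
    """Extract a downloaded model archive into dest_dir.

    Hypothetical helper, not part of the Object Detection API.
    Returns the sorted names of the top-level entries in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gz, bz2, ...) by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    # Collect the first path component of every entry, e.g. the model folder.
    return sorted({Path(n).parts[0] for n in names if Path(n).parts})
```

Calling unpack_model with the path of the downloaded archive and the test_data folder as destination should leave you with one folder per model, containing pipeline.config and the checkpoint directory.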
In the Python script, you need to modify the path to the config and model checkpoint:
pipeline_config = 'models/research/object_detection/test_data/efficientdet_d7_coco17_tpu-32/pipeline.config'
model_dir = 'models/research/object_detection/test_data/efficientdet_d7_coco17_tpu-32/checkpoint/'
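A wrong path usually only shows up as a long TensorFlow stack trace, so it can save time to check both paths before loading anything. The sketch below is my own addition: check_model_files is a hypothetical helper, and the ckpt-*.index pattern assumes the checkpoint naming I found in the unpacked model folders (ckpt-0.index).

```python
from pathlib import Path


def check_model_files(pipeline_config, model_dir):
    """Return a list of problems with the two paths; empty if they look usable."""
    problems = []
    if not Path(pipeline_config).is_file():
        problems.append(f"pipeline config not found: {pipeline_config}")
    ckpt_dir = Path(model_dir)
    if not ckpt_dir.is_dir():
        problems.append(f"checkpoint folder not found: {model_dir}")
    elif not list(ckpt_dir.glob("ckpt-*.index")):
        # Assumption: checkpoints are stored as ckpt-N.index plus data files.
        problems.append(f"no ckpt-*.index file in: {model_dir}")
    return problems


pipeline_config = 'models/research/object_detection/test_data/efficientdet_d7_coco17_tpu-32/pipeline.config'
model_dir = 'models/research/object_detection/test_data/efficientdet_d7_coco17_tpu-32/checkpoint/'

# Print whatever is missing; prints nothing when both paths are fine.
for problem in check_model_files(pipeline_config, model_dir):
    print(problem)
```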
In the pipeline config file, you need to set the path to the label definitions:
eval_input_reader: {
  label_map_path: "models/research/object_detection/data/mscoco_label_map.pbtxt"
  # remaining fields unchanged
}
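If you try several models, editing each pipeline.config by hand gets tedious. The following is a rough sketch of doing it with a plain text substitution. set_label_map_path is a hypothetical helper of mine, not part of the API, and it replaces every label_map_path entry it finds (the training reader as well as the eval reader), which should be harmless if you only do inference.

```python
import re


def set_label_map_path(config_text, label_map_path):
    """Replace the value of every label_map_path entry in a pipeline config.

    Returns the new text and the number of entries that were replaced.
    """
    pattern = r'(label_map_path\s*:\s*)"[^"]*"'
    # A function as replacement avoids backslash problems with Windows paths.
    return re.subn(pattern, lambda m: f'{m.group(1)}"{label_map_path}"', config_text)


# Small demonstration on a made-up config fragment.
sample = 'eval_input_reader: {\n  label_map_path: "PLACEHOLDER"\n}\n'
new_text, count = set_label_map_path(
    sample, "models/research/object_detection/data/mscoco_label_map.pbtxt")
print(count)
print(new_text)
```

To apply it to a real file, read pipeline.config as text, pass it through the function, and write the result back, ideally after keeping a copy of the original.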
The plotting functions in the Colab rely on plt.show() with inline display, which will not work in a script run through the command-line Python interpreter. Replace these calls by saving the figure to a file, e.g.
plt.savefig('img1.png')
plt.close()
You should then be able to run the Python script and get an output image with the bounding boxes drawn on it.
You can try different models. For the non-CenterNet models, you'll have to disable the plotting of heatmaps. Models with a larger input size can detect more detail, but are slower.
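To get a feeling for how much slower the larger models are, you can time the detection call. Below is a minimal sketch using only the standard library. time_inference is a name I made up, and detect_fn stands for whatever detection function your script builds from the Colab code. The warm-up run is there because the first call typically includes one-off setup costs.

```python
import time


def time_inference(detect_fn, image, repeats=5, warmup=1):
    """Return the average time in seconds of one detect_fn(image) call."""
    # Warm-up calls are not timed.
    for _ in range(warmup):
        detect_fn(image)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        detect_fn(image)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

In the script, you would call it with the detection function and the input tensor for one image, and compare the numbers across models.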
You can also try GPU-based processing. This requires the installation of NVIDIA CUDA, cuDNN, and probably zlib – and of course a CUDA-capable GPU.
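After installing the GPU libraries, it is worth checking whether TensorFlow actually sees the GPU before running the detection script. A small sketch, assuming TensorFlow 2.x (visible_gpus is a helper name of my own choosing):

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns an empty list if there is no usable GPU or if TensorFlow
    cannot be imported at all.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_gpus() or "No GPU visible to TensorFlow")
```

If the list is empty even though a GPU is installed, the CUDA or cuDNN installation is the first place to look.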