SSD-MobilenetV1 is an object detection model that uses a Single Shot MultiBox Detector (SSD) approach to predict object classes for boundary boxes.
SSD is a CNN that enables the model to only need to take one single shot to detect multiple objects in an image, and MobileNet is a CNN base network that provides high-level features for object detection. The combination of these two model frameworks produces an efficient, high-accuracy detection model that requires less computational cost.
The SSD-MobilenetV1 is suitable for mobile and embedded vision applications.
Model | Download | Download (with sample test data) | ONNX version | Opset version |
---|---|---|---|---|
SSD-MobilenetV1 | 29.3 MB | 27.9 MB | 1.7.0 | 10 |
Tensorflow SSD-MobileNetV1 ==> ONNX SSD-MobileNetV1
Refer to this conversion and inference notebook for more details on how to inference this model using onnxruntime and define environment variables for the model.
import onnxruntime as rt
# Load model and run inference
# Start from ORT 1.10, ORT requires explicitly setting the providers parameter if you want to use execution providers
# other than the default CPU provider (as opposed to the previous behavior of providers getting set/registered by default
# based on the build flags) when instantiating InferenceSession.
# For example, if NVIDIA GPU is available and ORT Python package is built with CUDA, then call API as following:
# rt.InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])
sess = rt.InferenceSession(os.path.join(WORK, MODEL + ".onnx"))
result = sess.run(outputs, {"image_tensor:0": img_data})
num_detections, detection_boxes, detection_scores, detection_classes = result
# print number of detections
print(num_detections)
print(detection_classes)
# produce outputs in this order
outputs = ["num_detections:0", "detection_boxes:0", "detection_scores:0", "detection_classes:0"]
This model does not require fixed image dimensions. Input batch size is 1, with 3 color channels. Image has these variables: (batch_size, height, width, channels)
.
The following code shows how preprocessing is done. For more information and an example on how preprocessing is done, please visit the tf2onnx conversion and inference notebook for this model.
import numpy as np
from PIL import Image, ImageDraw, ImageColor
import math
import matplotlib.pyplot as plt
# open and display image file
img = Image.open("image file")
plt.axis('off')
plt.imshow(img)
plt.show()
# reshape the flat array returned by img.getdata() to HWC and than add an additial
dimension to make NHWC, aka a batch of images with 1 image in it
img_data = np.array(img.getdata()).reshape(img.size[1], img.size[0], 3)
img_data = np.expand_dims(img_data.astype(np.uint8), axis=0)
It outputs the image with boundary boxes and labels. The full list of classes can be found in the COCO dataset.
Given each batch of images, the model returns 4 tensor arrays:
num_detections
: the number of detections.
detection_boxes
: a list of bounding boxes. Each list item describes a box with top, left, bottom, right relative to the image size.
detection_scores
: the score for each detection with values between 0 and 1 representing probability that a class was detected.
detection_classes
: Array of 10 integers (floating point values) indicating the index of a class label from the COCO class.
# draw boundary boxes and label for each detection
def draw_detection(draw, d, c):
width, height = draw.im.size
# the box is relative to the image size so we multiply with height and width to get pixels
top = d[0] * height
left = d[1] * width
bottom = d[2] * height
right = d[3] * width
top = max(0, np.floor(top + 0.5).astype('int32'))
left = max(0, np.floor(left + 0.5).astype('int32'))
bottom = min(height, np.floor(bottom + 0.5).astype('int32'))
right = min(width, np.floor(right + 0.5).astype('int32'))
label = coco_classes[c]
label_size = draw.textsize(label)
if top - label_size[1] >= 0:
text_origin = tuple(np.array([left, top - label_size[1]]))
else:
text_origin = tuple(np.array([left, top + 1]))
color = ImageColor.getrgb("red")
thickness = 0
draw.rectangle([left + thickness, top + thickness, right - thickness, bottom - thickness],
outline=color)
draw.text(text_origin, label, fill=color), font=font)
# loop over the results - each returned tensor is a batch
batch_size = num_detections.shape[0]
draw = ImageDraw.Draw(img)
for batch in range(0, batch_size):
for detection in range(0, int(num_detections[batch])):
c = detection_classes[batch][detection]
d = detection_boxes[batch][detection]
draw_detection(draw, d, c)
# show image file with object detection boundary boxes and labels
plt.figure(figsize=(80, 40))
plt.axis('off')
plt.imshow(img)
plt.show()
The model was trained using MS COCO 2017 Train Images, Val Images, and Train/Val annotations.
Training details for the SSD-MobileNet model's preprocessing is found in this tutorial notebook. The notebook also details how the ONNX model was converted.
Tensorflow to ONNX conversion tutorial. The notebook references how to run an evaluation on the SSD-MobilenetV1 model and export it as a saved model. It also details how to convert the tensorflow model into onnx, and how to run its preprocessing and postprocessing code for the inputs and outputs.
MIT License