Can I just sub the ASL model/labels files? #4
-
I am working on a Raspberry Pi 4 with a picamera and have a working mobilenet_v1_1.0_224_quant.tflite classifier program, which I tried "blindly" with the ASL model.tflite and labels.txt. It seems to produce "nothing" when I fill the image with all black, but it classifies nearly everything else as "C". Is there something more I need to do to get this running?
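For what it's worth, here is roughly how I'm sanity-checking what the substituted model expects before feeding it frames. This is just a sketch assuming the tflite_runtime package is installed; "model.tflite" stands in for whatever ASL model file you downloaded.

```python
# Sketch: inspect the substituted model's expected input/output
# (tflite_runtime assumed installed; model path is a placeholder).
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# If these don't match what the existing classifier program feeds the model
# (e.g. uint8 vs. float32, or 224x224 vs. 320x320), you can get nonsense
# results like "everything is C" without any error being raised.
print("input shape:", input_details[0]["shape"])    # e.g. [1 224 224 3]
print("input dtype:", input_details[0]["dtype"])    # quantized models expect uint8
print("output shape:", output_details[0]["shape"])  # should match the number of lines in labels.txt
```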
Replies: 2 comments 1 reply
-
In the demo image, it shows "Frame, Crop, View" parameters with "Crop 224x224", and the MobileNet doc mentions this dimension ("Our primary network (width multiplier 1, 224 × 224)"), but elsewhere the doc gives the input as 320x320: "Both MobileNet models are trained and evaluated with … The input resolution of both models is 320 × 320."

I don't know whether the app uses a segmentation step to "find a hand" and then applies a 320x320 or 224x224 crop around "the hand", but it seems like I need to, at a minimum, try cropping out the center 320x320 of the picamera's 640x480 image and see what happens. The actual ASL doc, however, states: "The model takes an input image of size 224x224 with three channels per pixel (RGB - Red Green Blue)."

Perhaps I will have to crop out the center 224x224 of the image, but since I don't have a preview ability on my robot, it will be difficult to know when my hand is in the proper position - which would require an additional segmentation "find the hand" step. I so wanted this to be easy!
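In case it is useful, this is the center crop I have in mind, assuming the picamera frame arrives as a 480x640x3 NumPy array (swap 224 for 320 to try the other size):

```python
import numpy as np

def center_crop(frame, size=224):
    """Return a size x size square cut from the center of an HxWx3 frame."""
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

# Placeholder for a 640x480 RGB frame captured from the picamera.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = center_crop(frame, size=224)   # shape (224, 224, 3)
```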
-
Hey, @slowrunner! You can use this notebook for reference. The input image should have the shape (1, 224, 224, 3).
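A minimal sketch of feeding a 224x224 crop to the model with that shape, assuming tflite_runtime and a placeholder model path; check the notebook for the exact preprocessing (dtype and any scaling) the ASL model expects:

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# crop: a (224, 224, 3) RGB image, e.g. the center crop from above.
crop = np.zeros((224, 224, 3), dtype=np.uint8)   # placeholder image
input_tensor = np.expand_dims(crop, axis=0)      # -> (1, 224, 224, 3)

# Cast to whatever dtype the model reports (uint8 for quantized models,
# float32 with scaling for float models -- confirm against the notebook).
input_tensor = input_tensor.astype(input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], input_tensor)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])[0]
print("top class index:", int(np.argmax(scores)))
```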