The MLTK model profiler provides information about how efficiently a model may run on an embedded target.
The model profiler executes a `.tflite`
model file in a simulator or on a physical embedded target.
This guide describes how to run the model profiler from the command-line or Python API.
Alternatively, refer to the Model Profiler Utility, which runs
the model profiler as a standalone executable with a webpage interface.
_Any_ `.tflite` model file supported by [Tensorflow-Lite Micro](https://github.com/tensorflow/tflite-micro)
will work with the model profiler.
i.e., the `.tflite` does _not_ need to be generated by the MLTK to use the profiler.
_All_ model profiling is done locally; _no_ data is uploaded to a remote server.
- Command-line: `mltk profile --help`
- Python API: `profile_model`
- Python API examples: `profile_model.ipynb`
The model profiler returns results for the entire model as well as individual layers of the model.
Name | Description |
---|---|
Name | Name of profiled model |
Accelerator | Name of hardware accelerator |
Input Shape | Shape of the model's input tensor |
Input Data Type | Model input's data type |
Output Shape | Shape of the model's output tensor |
Output Data Type | Model output's data type |
Model File Size | Size of the `.tflite` model file (this is effectively the flash required by the model) |
Runtime Memory Size | Size of RAM required for Tensorflow-Lite Micro's working memory |
# Operations | Number of mathematical operations required to execute the model |
# Multiply-Accumulates | Number of multiply-accumulate operations required to execute the model |
# Layers | Number of layers in model |
# Unsupported Layers | Number of layers that could not be accelerated due to hardware accelerator constraints |
# Accelerator Cycles | Number of clock cycles required by hardware accelerator |
# CPU Cycles | Number of CPU clock cycles required to execute the model |
CPU Utilization | Percentage of CPU used to execute model |
Clock Rate | CPU clock rate |
Time | Time required to execute model (i.e. latency) |
Energy | Energy required to execute model (relative to CPU idling) |
J/Op | Energy per operation |
J/MAC | Energy per multiply-accumulate |
Ops/s | Operations per second |
MACs/s | Multiply-accumulates per second |
Inference/s | Number of times the model can execute per second |
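Several of the throughput and energy fields above are simple ratios of the base measurements. The sketch below illustrates how they relate; all numbers are hypothetical, not taken from a real model:

```python
# How the derived profiling metrics relate to the base measurements.
# All numbers below are hypothetical, for illustration only.

num_ops = 1_000_000   # "# Operations"
num_macs = 450_000    # "# Multiply-Accumulates"
time_s = 0.010        # "Time" (latency per inference, in seconds)
energy_j = 0.0002     # "Energy" per inference, in joules

inference_per_s = 1.0 / time_s        # "Inference/s"
ops_per_s = num_ops / time_s          # "Ops/s"
macs_per_s = num_macs / time_s        # "MACs/s"
joules_per_op = energy_j / num_ops    # "J/Op"
joules_per_mac = energy_j / num_macs  # "J/MAC"

print(f"{inference_per_s:.1f} inferences/s, {ops_per_s:.3e} ops/s")
print(f"{joules_per_op:.3e} J/op, {joules_per_mac:.3e} J/MAC")
```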
Name | Description |
---|---|
Index | Model layer index |
OpCode | Kernel layer name |
# Ops | Number of mathematical operations required by layer |
# MACs | Number of multiply-accumulate operations required by layer |
Acc Cycles | Number of accelerator cycles required by layer |
CPU Cycles | Number of CPU cycles required by layer |
Energy | Energy required by layer (relative to CPU idling) |
Time | Time required to execute layer (i.e. latency) |
Input Shape | Shape(s) of layer input tensor(s) |
Output Shape | Shape(s) of layer output tensor(s) |
Options | Kernel configuration options used by layer |
Supported? | True if the layer could be accelerated by the hardware, False otherwise |
Error Msg | Error message if layer was not able to be accelerated |
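The per-layer fields lend themselves to simple aggregation, e.g. totaling latency or collecting the layers that could not be accelerated. The sketch below uses a hypothetical list of dicts mirroring the columns above; the actual profiler returns structured result objects rather than plain dicts:

```python
# Hypothetical per-layer results mirroring the columns above; the real
# profiler returns structured result objects, not plain dicts.
layers = [
    {"index": 0, "opcode": "conv_2d", "time": 0.004, "supported": True,  "error_msg": None},
    {"index": 1, "opcode": "softmax", "time": 0.001, "supported": False, "error_msg": "unsupported activation"},
]

# Total latency is the sum of the per-layer times
total_time = sum(layer["time"] for layer in layers)

# Layers that fell back to the CPU instead of the accelerator
unsupported = [layer for layer in layers if not layer["supported"]]

print(f"total latency: {total_time:.3f}s")
for layer in unsupported:
    print(f"layer {layer['index']} ({layer['opcode']}) not accelerated: {layer['error_msg']}")
```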
The model profiler has three modes of operation:
**Basic simulator**: The model executes in a basic simulator using the Tensorflow-Lite Micro ARM CMSIS and reference kernels.
All returned profiling information is estimated.
- No physical device required
- Estimates CPU cycles and latency
- Estimates required energy per inference
NOTE: Estimates are provided based on the ARM Cortex-M33.
**MVP hardware simulator**: The model executes in the MVP hardware accelerator simulator.
All returned profiling information is calculated or estimated.
- No physical device required
- Accelerator cycles calculated in hardware simulator
- Estimates CPU cycles and latency
- Estimates required energy per inference
NOTE: Estimates are based on the __EFR32xG24__ at 78 MHz.
**Physical device**: The model executes and is profiled on a physical device.
This returns actual measured profiling numbers (i.e. not calculated or estimated).
- Physical device must be locally connected
- Accelerator cycles, CPU cycles, and latency measured on physical device
- No energy measurements provided
Model profiling from the command-line is done using the `profile`
command.
For more details on the available command-line options, issue the command:
mltk profile --help
The following are examples of how the profiler can be invoked from the command-line:
Profile the given `.tflite`
model file in the basic simulator.
With this command, no physical device is required.
This command will also provide profiling results for:
- Estimated latency (i.e. seconds per inference)
- Estimated CPU cycles
- Estimated energy
mltk profile ~/workspace/my_model.tflite --estimates
Profile the given `.tflite`
model file in the MVP hardware simulator.
With this command, no physical device is required.
This command will also provide profiling results for:
- Estimated latency (i.e. seconds per inference)
- Calculated accelerator cycles
- Estimated CPU cycles
- Estimated energy
mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates
Profile the given `.tflite`
model file on a physically connected embedded device using the MVP hardware accelerator.
This command will also provide measured profiling results for:
- Latency (i.e. seconds per inference)
- Accelerator cycles
- CPU cycles
mltk profile ~/workspace/my_model.tflite --accelerator MVP --device
Training a model can be very time-consuming, and it is useful to know how efficiently a
model will execute on an embedded device before investing time and energy into training it.
For this reason, the MLTK `profile`
command features a `--build`
flag to build and profile a model
before it is fully trained.
In this example, the `image_example1` model is built at command-execution-time and profiled in the MVP hardware simulator. Note that only the model specification script is required; it does not need to be trained first.
mltk profile image_example1 --build --accelerator MVP --estimates
The model profiler is accessible via the `profile_model` API.
Examples using this API may be found in `profile_model.ipynb`.
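A minimal sketch of calling the API is shown below. It assumes the MLTK is installed, and the keyword arguments (`accelerator`, `return_estimates`) are assumptions mirroring the command-line options above; verify them against the `profile_model` API reference.

```python
# Sketch only: assumes the MLTK is installed; keyword arguments are
# assumptions mirroring the CLI options and should be verified against
# the profile_model API reference.
try:
    from mltk.core import profile_model
except ImportError:
    profile_model = None  # MLTK is not installed in this environment

if profile_model is not None:
    # Roughly equivalent to:
    #   mltk profile ~/workspace/my_model.tflite --accelerator MVP --estimates
    results = profile_model(
        "~/workspace/my_model.tflite",
        accelerator="MVP",
        return_estimates=True,
    )
    print(results)
```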