Releases: ARM-software/armnn
Release 22.08
Summary
New Features
- Add Arm NN Support Library.
- The Arm NN Support Library for Android NNAPI is a shared library that provides all the functionality of the existing HAL drivers for Android NNAPI.
- It is available from Android S onwards.
- It focuses on the updatability of ML operators.
- A guide on how to build the Arm NN Support Library is available at armnn/shim/BuildGuideShimSupportLibrary.md.
- SLTS (Support Library Test Suite) compliance.
- Support for Batch MatMul in CpuRef.
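Batch MatMul multiplies corresponding matrix pairs along a leading batch dimension. The sketch below illustrates the operator's semantics in NumPy; it is an illustration only, not the Arm NN API:

```python
import numpy as np

# Batch MatMul: one matrix multiply per batch entry.
# Shapes: (B, M, K) x (B, K, N) -> (B, M, N).
a = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
b = np.ones((2, 3, 4), dtype=np.float32)

out = np.matmul(a, b)
assert out.shape == (2, 2, 4)

# Equivalent to looping over the batch dimension explicitly.
ref = np.stack([a[i] @ b[i] for i in range(2)])
assert np.allclose(out, ref)
```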
TfLite Parser
- Added support for LOG.
- Added support for SIN.
ExecuteNetwork App Changes:
- Refactored ExecuteNetwork: the input name, input type, output name, output type and model type are now read from the model.
Arm NN Build Tool:
- Introduced Arm NN Build Tool which consists of an official Arm NN Dockerfile for building Arm NN and Arm Compute Library (ACL).
- This tool replaces the majority of our existing build guides as a user-friendly way to build Arm NN (and its dependencies) from scratch.
- Tested on x86_64 (Intel) and aarch64 (Arm) build hosts for the Ubuntu platform.
- Currently supports targeting Linux devices (from Ubuntu 18.04 onwards) on x86_64, aarch32 and aarch64 architectures.
Bug Fixes
- Models in the .armnn format (serialized models) were failing in 22.05; this has been fixed by adding the constant layers before the operator layers.
- Fixed a quantization bug when folding padding into Neon average pooling 2D.
- Fixed a segmentation fault when running with --bf16-turbo-mode on FPGA.
Other Changes
- General documentation refactor and updates.
- Added LICENSE.spdx for Arm NN.
- Delayed backend deprecation from 22.11 to 23.08.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Import inputs but don't export outputs fails | 626bd90 | https://review.mlplatform.org/c/ml/armnn/+/7661 | Field m_ExportEnabled has been added to type OptimizerOptions. This field will not be initialized by old clients that have not been recompiled. |
Get non-const IConnectableLayer from I/O slots | 09fa24d | https://review.mlplatform.org/c/ml/armnn/+/7835 | Pure virtual method GetOwningIConnectableLayer ( ) has been added to classes IOutputSlot and IInputSlot. |
Remove deprecated code 22.05 | 4d2eec0 | https://review.mlplatform.org/c/ml/armnn/+/7712 | Removed Symbols: |
Modified SubgraphView returned by GetWorkingCopy() | cea3d49 | https://review.mlplatform.org/c/ml/armnn/+/7852 | Pure virtual method GetSlotIndex ( ) const has been added to class IInputSlot. |
Update the async api to use ExecutionData | 21a6a1a | https://review.mlplatform.org/c/ml/armnn/+/7878 | Pure virtual method GetExecutionDataAt ( unsigned int ) has been added to class experimental::IWorkingMemHandle. |
The following back-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Update the async api to use ExecutionData | 21a6a1a | https://review.mlplatform.org/c/ml/armnn/+/8051/2 | The following virtual functions have been added to class IBackendInternal: |
Add GetMemoryRequirements to IWorkload | 5e09080 | https://review.mlplatform.org/c/ml/armnn/+/7886 | The following virtual function has been added to class IWorkload: |
Modified SubgraphView returned by GetWorkingCopy() | cea3d49 | https://review.mlplatform.org/c/ml/armnn/+/7852 | The signature of SubgraphView::GetWorkingCopy() has changed; it is now marked as const to reflect the fact that the graph represented by the working copy does not get altered. |
TfLite Delegate
New features
- Added support for LOG
- Added support for SIN
- Add JNI interface
Bug Fixes
- Fix running MobileBERT on CpuRef
- Only use the macro ARMNN_TFLITE_DELEGATE
- Fixed errors in DelegateQuickStartGuide.md
PyArmNN
- Updated documentation on running PyArmNN with the ONNX parser.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
CMake | 3.19.0 |
TensorFlow | 2.5.0 |
ONNX | 1.6.0 |
FlatBuffers | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
Release 22.05.01
Summary
New Features
This is a patch release of 22.05 in which we have implemented the Pooling3d custom operator for the Arm NN TfLite Delegate. This feature is available in the 22.05 release branch itself (branches/armnn_22_05) and in the tag created for patch release v22.05.01.
Release 22.05
Summary
New Features
- ArmnnTestUtils is now versioned and under ABI compliance checker
- Added support for Int32 CONCATENATION layer for CpuRef
- Added support for Float32 Unidirectional Sequence LSTM layer for CpuAcc and GpuAcc
- Added support for GatherNd for CpuRef, CpuAcc and GpuAcc
- Added support for SQRT for CpuAcc and GpuAcc
- Added support for Depthwise Convolution2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
- Added support for Conv2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
- Added support for Fully Connected ConstTensorsAsInput for CpuAcc and GpuAcc
- Added support for MaxPool3D and AveragePool3D for CpuAcc and GpuAcc
- Added support for L2Pooling3D for GpuAcc
- Added support for UnidirectionalLSTM for CpuAcc
- ConstTensorsAsInput: Optimizer Fix - FuseBatchNorm
- ConstTensorsAsInput: Optimizer Fix - FoldPadIntoConvolution2d
- ConstTensorsAsInput: Optimizer Fix - Fp32ToBf16 optimization
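GatherNd, newly supported above, treats each row of the indices tensor as a multi-dimensional coordinate into the params tensor. A minimal NumPy sketch of the operator's semantics (an illustration, not Arm NN code; the helper name is made up):

```python
import numpy as np

def gather_nd(params, indices):
    """GatherNd semantics: each row of `indices` (shape (..., R)) selects
    one slice of `params` by indexing its first R dimensions; the output
    keeps the remaining dimensions of `params`."""
    indices = np.asarray(indices)
    # Move the coordinate components to the front and use them as a
    # tuple of per-dimension index arrays (NumPy advanced indexing).
    return params[tuple(np.moveaxis(indices, -1, 0))]

params = np.array([[1, 2], [3, 4]])
# Full coordinates (R == params.ndim) select scalars: (0,0) and (1,1).
assert gather_nd(params, np.array([[0, 0], [1, 1]])).tolist() == [1, 4]
# Partial coordinates (R == 1) select whole rows.
assert gather_nd(params, np.array([[1], [0]])).tolist() == [[3, 4], [1, 2]]
```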
TfLite Parser
- Added support for GatherNd
- Added support for FloorDiv
- Added support for UnidirectionalLSTM
- Do not create Floor for FloorDiv layer when the data type is int32
ArmNN Serializer/Deserializer
- Added support for GatherNd
ExecuteNetwork App Changes:
- Added Reuse IO Buffers mode
- Deprecated the weights and bias JSON keys in profiling details. They will be removed in 22.08.
Bug Fixes
- Fixed a crash in profiling
- Fixed an issue running the SimpleSample app on Raspberry Pi
- Removed MockBackend.hpp from armnn/src/backends/backendsCommon/test/ to solve problems when using Visual Studio on Windows
- Fixed a segfault in the RefDepthwiseConvolution2d workload
Other Changes
- ArmNN Baremetal
- Change the namespace from armnn::profiling to arm::pipe
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Change the namespace from armnn::profiling to arm::pipe | 5aa9fd7 | https://review.mlplatform.org/c/ml/armnn/+/7222 | IRuntime::RegisterDebugCallback |
Replace ProfilingService includes with IProfilingService. | af94772 | https://review.mlplatform.org/c/ml/armnn/+/7240 | The following function has had a change in signature meaning it will not be recognized by old applications. BackendRegistry::SetProfilingService |
Remove dependency on armnn::Exception classes from the Profiling code | f9db3ef | https://review.mlplatform.org/c/ml/armnn/+/7280 | Class armnn::BackendProfilingException has been moved to namespace arm::pipe; this will result in older applications not being able to find it. |
Replace armnn:Optional with arm::pipe::Optional in profiling code | decd08b | https://review.mlplatform.org/c/ml/armnn/+/7295 | Class armnn::TimeoutException has been moved to namespace arm::pipe; this will result in older applications not being able to find it. |
Add Unidirectional Sequence Lstm support to TFLite | 5880b91 | https://review.mlplatform.org/c/ml/armnn/+/7023 | The following fields have been added to struct LstmDescriptor: m_CellIntermediateScale m_ForgetIntermediateScale m_HiddenStateScale m_HiddenStateZeroPoint m_InputIntermediateScale m_OutputIntermediateScale. As a result, the size of the struct has changed. |
ConstTensorsAsInput: DepthwiseConvolution2d | 0690265 | https://review.mlplatform.org/c/ml/armnn/+/7417 | Pure virtual method VisitDepthwiseConvolution2dLayer ( IConnectableLayer const*, struct DepthwiseConvolution2dDescriptor const&, char const* ) has been added to this class. |
ConstTensorsAsInput: Conv2d - FrontEnd | b4dd5cc | https://review.mlplatform.org/c/ml/armnn/+/7382 | Pure virtual method VisitConvolution2dLayer ( IConnectableLayer const*, struct Convolution2dDescriptor const&, char const* ) has been added to this class. |
The following back-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Move headers to profiling/client/include | 2776183 | https://review.mlplatform.org/c/ml/armnn/+/7327 | Headers have been moved to profiling/client/include. |
Change the namespace from armnn::profiling to arm::pipe | 5aa9fd7 | https://review.mlplatform.org/c/ml/armnn/+/7222 |
TfLite Delegate
New features
- Added support for GatherNd
Bug Fixes
Note: Arm NN is aware of an issue where converting a model to .armnn will yield unpredictable results when reading it back in through the deserializer. This is because the serializer depends on graph topology and the graph can be out of order, due to the additional constant layers as inputs that are created through the parsers.
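The ordering problem described in this note can be sketched in a few lines: a one-pass deserializer can only connect a layer once all of its inputs already exist, so constant layers serialized after their consumers break the pass. This is a hypothetical toy model of the behaviour, not Arm NN code:

```python
# Toy one-pass deserializer: a layer can only be connected if every
# layer it reads from has already been created earlier in the stream.
def deserialize(layers):
    created = set()
    for name, inputs in layers:
        if not all(i in created for i in inputs):
            raise ValueError(f"layer {name} read before its input")
        created.add(name)
    return created

# Out of order: a constant weights layer serialized after its consumer fails.
bad = [("conv", ["input", "weights"]), ("input", []), ("weights", [])]
try:
    deserialize(bad)
    failed = False
except ValueError:
    failed = True
assert failed

# The 22.08 fix above: emit constant layers before the operator layers,
# which restores a valid topological order.
good = [("input", []), ("weights", []), ("conv", ["input", "weights"])]
assert deserialize(good) == {"input", "weights", "conv"}
```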
PyArmNN
- Added support for GatherNd
- Added Pooling3D
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
TensorFlow | 2.5.0 |
ONNX | 1.6.0 |
FlatBuffers | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
Android 12 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-12.0.0_r1 | SP1A.210812.015 | r36p0_01eac0-rc0 | 12_r2 (7987736) | 12_r2 (7973604) |
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r6 | RPM1.210413.002 | r33p0_01eac0 | 11_r5 (7640833) | 11_r5 (7599184) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |
Release 22.02
Summary
New Features
- Add mirror padding support on Pad layer for CpuAcc and GpuAcc.
- Add support for Pool3d FrontEnd, Reference implementation.
TfLite Parser
- Added missing support for reshape operator when the target shape is dynamic and batch size is unknown.
- Added PadV2 support.
- Changed asserts to CHECK in ParserFlatbuffersFixture.hpp.
ArmNN Serializer/Deserializer
- Add support for Pool3d.
Bug Fixes
- Added bounds checking when indexing PermutationVector elements and its correspondent unit tests.
- Fixed output bindings in ExecuteNetwork when using delegate with models with multiple outputs.
- Fixed build issues in x86 Dockerfile.
- Fixed ExecuteNetwork printing the inference time twice.
- Fixed thread safety issues in TimelineDecoder and associated unit tests.
- Fixed some Thread Sanitizer warnings.
- Added check for existing event to fix issue on OpenCL Timer.
- Fixed logging bug where blank messages were being sent.
- Fixed issues on Logging API.
- Fixed the async execute test on 32-bit Raspberry Pi.
Other Changes
- Removed references to blacklist from Model Accuracy tool.
- Removed deprecated code.
- Added ModelOptions and additional timing to ARMNN_LOG.
- Added get_tensorflow.sh script.
- Updated build guides.
- Updated error messages from the flatbuffers parser.
- Added the C++ KWS example.
- Handled optional biases better in Neon/Cl FullyConnected workloads.
- Stabilise the Backend API:
- Backend developers should now be able to limit includes to headers in include/armnn/backends/
- Moved CompatibleTypes.hpp to the armnnUtils library.
- Added forwarding header for src/armnn/CompatibleTypes.hpp.
- Moved the ArmNN Test Utils code to a physically separate directory.
- Added new method AddPrecompiledLayer() to INetwork.
- Promoted backend headers in backendCommon to armnn/backends.
- Used INetwork rather than Graph for holding layers for OptimizationViews.
- Used IConnectableLayer in SubgraphView rather than Layer in its m_Layers.
- Stabilised the IWorkloadFactory interface with unified strategy.
- Stabilised the ILayerSupport interface with unified strategy.
- Moved SubgraphView to backends include folder.
- Added GetParameters to IConnectableLayer.
- Exposed a new MockWorkloadFactory and MockMemManager.
- Accessing ConstTensors from IConnectableLayer
- Added method of returning a GetSubgraphWorkingCopy (SubgraphView).
- Moved MemCopyTestImpl from acl to armnnTestUtils.
- Support Import of Aligned Host Memory in NNAPI:
- Added CanBeImported to ITensorHandle.
- Implemented CanBeImported function in RefTensorHandle.
- Implemented CanBeImported function in NeonTensorHandle.
- Implemented CanBeImported function in ClTensorHandle.
- Added functionality for CopyAndImportFactoryPair to TensorHandleFactoryRegistry.
- Register CopyAndImportFactoryPairs to RefBackend and unit tests.
- Register CopyAndImportFactoryPairs to NeonBackend and unit tests.
- Register CopyAndImportFactoryPairs to ClBackend and unit tests.
- Added ReplaceTensorHandle functions to IWorkload and BaseWorkload.
- Added ClBaseWorkload and NeonBaseWorkload.
- Modified workloads to extend Neon/Cl BaseWorkload.
- Added ReplaceTensorHandle functions to Neon/CL BaseWorkloads.
- Implemented ICLTensorProxy.
- Added input and output workload slot pairs to LoadedNetwork.
- Added support of aligned host memory.
- Added Forced Import EndToEnd tests to Ref, Neon, and CL.
- Call Cl sync after EnqueueWorkload
- Added EndToEnd tests on reference backend to ensure allocated data can be reused.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
SubgraphView uses IConnectableLayer rather than Layer in its m_Layers | 56ccf68 | https://review.mlplatform.org/c/ml/armnn/+/6807 | Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot.: |
Stabilize the ILayerSupport interface with unified strategy. | 34b429c | https://review.mlplatform.org/c/ml/armnn/+/6903 | Virtual destructor added to the struct BaseDescriptor; as a result the size of all descriptors has been changed. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
SubgraphView: Add method of returning a GetSubgraphWorkingCopy. | 9d74ba6 | https://review.mlplatform.org/c/ml/armnn/+/6995 | Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IInputSlot. |
Add support of aligned host memory | e2af6f4 | https://review.mlplatform.org/c/ml/armnn/+/7025 | The following functions have had a change in signature meaning they will not be recognized by old applications: IRuntime::EnqueueWorkload() accepts two new parameters preImportedInputIds and preImportedOutputIds. IRuntime::ImportInputs() accepts a new parameter forceImportMemorySource. IRuntime::ImportOutputs() accepts a new parameter forceImportMemorySource. |
Add GetParameters to IConnectableLayer | e466596 | https://review.mlplatform.org/c/ml/armnn/+/7031 | Pure virtual method GetParameters ( ) const has been added to class IConnectableLayer. Virtual method IsNull ( ) const has been added to class BaseDescriptor. |
Accessing ConstTensors from IConnectableLayer | 2e24175 | https://review.mlplatform.org/c/ml/armnn/+/7040 | Pure virtual method GetConstantTensorsByRef ( ) has been added to class IConnectableLayer. |
Remove deprecated code 22.02 | b28e525 | https://review.mlplatform.org/c/ml/armnn/+/7104 | Deprecated LayerSupport.hpp and included IsXXXLayerSupported() functions have been removed as they have been replaced with ABI Stable ILayerSupport interface and the BackendHelper.hpp GetILayerSupportByBackendId() function. |
The following back-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add a Pooling3d FrontEnd and Ref Implementation | 7b885b3 | https://review.mlplatform.org/c/ml/armnn/+/6511 | ILayerSupport.hpp |
Stabilize the ILayerSupport interface with unified strategy. | 34b429c | https://review.mlplatform.org/c/ml/armnn/+/6903 | |
Stabilize the IWorkloadFactory interface with unified strategy | 611c7fb | https://review.mlplatform.org/c/ml/armnn/+/6906 |
TfLite Delegate
New features
- Added Delegate cross compile to x86 Dockerfile
- Added constant input supports for Pack/Stack, Concatenation operators
- Added Int32 supp...
Release 21.11
Arm NN 21.11 was focused on providing new capabilities and improving performance:
New Features
- Added support for Reduce Prod.
- Added support for Channel Shuffle.
- Added support for Conv3d.
- Added support for Symmetric and Reflect Padding on CpuRef backend.
- Added support for statically linking ArmNN TfLite Delegate against Tensorflow Lite.
- Added Import Input/Output functions to async API, allowing for imported I/O buffers to be used by multiple network executions.
- Added external memory manager that allows for customization of network memory management ( Note: currently only fully supported on the CpuRef Backend ).
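The Symmetric and Reflect padding modes added above differ only in whether the edge element itself is repeated in the mirror. NumPy's `np.pad` implements both modes and illustrates the semantics (an illustration of the padding behaviour, not the Arm NN API):

```python
import numpy as np

x = np.array([1, 2, 3])

# Reflect padding mirrors about the edge element, excluding it.
assert np.pad(x, 2, mode="reflect").tolist() == [3, 2, 1, 2, 3, 2, 1]

# Symmetric padding mirrors including the edge element.
assert np.pad(x, 2, mode="symmetric").tolist() == [2, 1, 1, 2, 3, 3, 2]
```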
TfLite Parser
- Added support for Reduce Prod.
- Added support for Conv3d.
- Added support for MirrorPad.
- Added support for a size of -1 for Slice.
ONNX Parser
- Add support for Concat
- Add support for Gather
- Add support for Gemm
- The parser supports constant bias or non-constant bias where bias dimension = 1.
- Add support for Shape
- Add support for Unsqueeze
- Add support of min/max as attribute for Clip
ArmNN Serializer/Deserializer
- Add support for Reduce Prod.
- Add support for Channel Shuffle.
- Add support for Conv3d.
- Add support for Symmetric and Reflect Padding.
ExecuteNetwork App Changes
- Added 'do-not-print-output' option to ExecuteNetwork.
Bug Fixes
- Using output-network-details or output-network-details-only during ExecuteNetwork profiling created an invalid JSON format. This has since been fixed.
- Fixed an undefined reinterpret_cast in BFloat16.hpp. This fixes gcc builds with version 8 or above.
- Fixed the format of the delegate JSON output.
- Fixed a bug related to the constant tensor flag.
- Fixed pyarmnn py35 unit tests.
Other Changes
- Added sample app for asynchronous execution.
- Printed new Optimize and LoadedNetwork profiling points.
- Added new serialized model supported on Netron.
- Made it possible for backends to add include paths in Android.
- Changed order of the Doxygen tree.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 27.0.0, the Delegate to 25.0.0 and our Parsers to 24.3.0, following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code | 1b2654f | https://review.mlplatform.org/c/ml/armnn/+/6254 | Removed Symbols: Removed pure virtual methods, resulting in change to v-table layout: Removed DataTypes: |
'IMemoryOptimizerStrategy Add strategy library and add support in BackendRegistry' | b8a26d8 | https://review.mlplatform.org/c/ml/armnn/+/6297 | struct IRuntime::CreationOptions: |
Add missing runtime parameters to TfLite delegate. | 3e32a87 | https://review.mlplatform.org/c/ml/armnn/+/6388 | class Delegate: class DelegateOptions had the following fields added and so the size of the inclusive type has been changed. |
Profiling instrumentation throughout the Optimizer | f1e0ad3 | https://review.mlplatform.org/c/ml/armnn/+/6432 | struct OptimizerOptions: class Delegate: class DelegateOptions: Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. This is due to addition of m_ProfilingEnabled to the OptimizerOptions used in constructors of both Delegate classes. |
Fix armnn_external_delegate option parsing | b1c62f1 | https://review.mlplatform.org/c/ml/armnn/+/6519 | class Delegate: class DelegateOptions: Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. |
Support the new memory API in loaded network | b1aad42 | https://review.mlplatform.org/c/ml/armnn/+/6552 | class INetworkProperties: The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
The following back-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code | 1b2654f | https://review.mlplatform.org/c/ml/armnn/+/6254 | IBackendInternal.hpp Removed Symbols: Removed Aliases: ILayerSupport.hpp Removed Symbols: |
Add Channel Shuffle Front end and Ref Implementation | 51f6777 | https://review.mlplatform.org/c/ml/armnn/+/6211 | ILayerSupport.hpp |
Add Conv3d FrontEnd and Ref Implementation | b63a311... |
Release 21.08
Summary
Arm NN 21.08 was focused on providing new capabilities and improving performance:
- Added the ability to import protected DMA buffers and allow Arm NN to run inferences on data held in protected GPU memory, along with a Custom Memory Allocator which supports importing malloc, dma_buf and protected DMA buffers.
- Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
- Boost has been completely removed from the code base, making Arm NN easier to integrate into other software stacks.
- Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
- More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.
New Features
- Moved unit tests from BOOST to doctest.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Reduce Operator can now support multiple axes.
- Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
- Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (Only LOG is supported) and GpuAcc backends.
- Added SHAPE Operator support on CpuRef backend.
- Moved useful test utilities to new static library (libarmnnTestUtils.a).
- Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
- Arm NN TfLite Delegate Image Classification sample application added to samples directory.
- Added fully comprehensive Arm NN Operator list page to Doxygen.
- Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
- Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
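The depthwise weights relayout listed above, from [M,I,H,W] to [1,H,W,I*M], amounts to a transpose followed by a reshape: the element at (m, i, h, w) moves to channel index i*M + m. A NumPy sketch of the documented transform (an illustration, not Arm NN code):

```python
import numpy as np

M, I, H, W = 2, 3, 1, 1  # depth multiplier, input channels, kernel size
old = np.arange(M * I * H * W).reshape(M, I, H, W)  # old [M, I, H, W] layout

# New layout [1, H, W, I*M]: the channel axis interleaves input channel (i)
# and depth multiplier (m) as i * M + m.
new = old.transpose(2, 3, 1, 0).reshape(1, H, W, I * M)

for m in range(M):
    for i in range(I):
        assert new[0, 0, 0, i * M + m] == old[m, i, 0, 0]
```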
TfLite Parser
- EXPAND_DIMS Operator support added.
- PRELU Operator support added.
- SHAPE Operator support added.
- Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
- When creating an instance of the ITfLiteParser for a dynamic model, please ensure that m_InferAndValidate is set in the TfLiteParserOptions and that m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
ArmNN Serializer/Deserializer
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added SIN and LOG support to ElementWiseUnary Operator.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.
ExecuteNetwork App Changes
- Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
- Added option to specify different input data for every iteration of ExecuteNetwork.
- Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.
NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.
Bug Fixes
- Removed duplicate check for Dequantize input type when checking if operator is supported.
- Fixed undefined behaviour in PolymorphicDowncast.
- Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
- Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
- Fixed cl_ext.h include path in CL backend.
- Fixed bugs in PreCompiledLayer, e.g. a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
- Fixed gcc 9.3.0 compiler warning in TfLiteParser.
- Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.
Other Changes
- Print Elementwise and Comparison Operator descriptors in a dot graph.
- Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Rework the async threadpool | f364d53 | https://review.mlplatform.org/c/ml/armnn/+/5801 | struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes. class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class. class IAsyncExecutionCallback: The following methods have been removed: |
Add IsConstant flag to TensorInfo | b082ed0 | https://review.mlplatform.org/c/ml/armnn/+/5842 | An object of this class can be allocated by applications which the old size will be hardcoded at original compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
Add protected mode to ArmNN CreationOptions | 15fcc7e | https://review.mlplatform.org/c/ml/armnn/+/5963 | |
Add the Custom Memory Allocator interface definition | 801e2d5 | https://review.mlplatform.org/c/ml/armnn/+/5967 | |
Add front end support for UnidirectionalSequenceLstm on ArmNN | 8ed39ae | https://review.mlplatform.org/c/ml/armnn/+/5956 | |
JSON profiling output | 554fa09 | https://review.mlplatform.org/c/ml/armnn/+/5968 | |
ConstTensorsAsInput: FullyConnected | 81beae3 | https://review.mlplatform.org/c/ml/armnn/+/5942 | |
Adds CustomAllocator interface and Sample App | c1c872f | https://review.mlplatform.org/c/ml/armnn/+/5987 | class BackendRegistry: Fie... |
Release 21.05
Summary
The 21.05 Release of Arm NN was focused on providing new capabilities to allow users to attain higher performance by:
- Making the Arm NN Core thread safe, opening the possibility of running multiple inferences on the same model in parallel software threads.
- Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.
In addition to this, support was added to allow the MobileBERT network to be parsed and run.
Finally, three deprecated components (the Tensorflow Parser, the Caffe Parser and the Arm NN Quantizer tool) were removed.
New Features
- CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
- Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
- Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
- Asynchronous Network Execution for CpuRef Backend.
- Optimisation added to fuse PAD into Pooling2d if possible.
- ASR sample application added to samples directory.
TfLite Parser
- ABS Operator Support added.
- ARG_MIN Operator Support added.
- CAST Operator Support added.
- LOGICAL_NOT Operator Support added.
- RSQRT Operator Support added.
- Non-const weights support added on FULLY_CONNECTED layer.
- Turn off Biases when data location is -1 (Added to support MobileBERT).
ArmNN Serializer/Deserializer
- Added Signed64 support to Serializer and Deserializer.
- Added QAsymmS8 support to Serializer.
- Added L2 Pooling algorithm to Deserializer.
ExecuteNetwork App Changes
- Asynchronous Network Execution support (Currently for CpuRef Backend).
- Re-enabled GPU profiling in ExecuteNetwork.
Deprecated features
- Deprecated the Caffe Parser.
- Deprecated the Tensorflow Parser.
- Deprecated the Arm NN Quantizer tool.
- Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.
Bug Fixes
- Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
- Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
- Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
- Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
- Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
- Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
- Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
- Fixed x86_64 ArmNN DockerFile.
- Fixed TuningLevel enumeration values to be consistent.
- Fixed YoloV3 test application's incorrect use of std::abs.
- Improved performance on SqueezeNet v1.1.
Other Changes
- Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
- Moved doctest third-party library to armnn from delegate.
- Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
- Updated Cross Compiling Guide.
- Improved Graph memory usage.
Known Issues
- Intermittent issue with Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
- There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add Async Queue to IRuntime | e813d67 | https://review.mlplatform.org/c/ml/armnn/+/5493 | |
Add front-end support for CAST + Add TfLiteParser support for CAST | b392e98 | https://review.mlplatform.org/c/ml/armnn/+/5374 | |
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory | 73d3e2e | https://review.mlplatform.org/c/ml/armnn/+/5481 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | f0a6dec | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Refactor Async Network API | 55a8ffd | https://review.mlplatform.org/c/ml/armnn/+/5365 | |
Remove cross-wiring in depthwise | 7612bd6 | https://review.mlplatform.org/c/ml/armnn/+/5411 | |
Remove Quantizer | 4a621c4 | https://review.mlplatform.org/c/ml/armnn/+/5486 | |
The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | 16fb1a2 | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
Generalise ConstCpuTensorHandle | 1f58f03 | https://review.mlplatform.org/c/ml/armnn/+/5515 | |
Enable import on GPU | e5f0b24 | https://review.mlplatform.org/c/ml/armnn/+/5605 | |
|
Release 21.02
Summary
The 21.02 Release provides two major pieces of functionality. The first is performance related: the ability to cache compiled OpenCL kernels when running on the GPU backend. Cached kernel files can be loaded into the runtime, eliminating the cost of compiling their associated graphs and resulting in a significant performance uplift on the first execution of a newly loaded graph. The second is that the operators which were not added to the Arm NN TensorFlow Lite delegate in the 20.11 release are now present, giving the delegate the same level of operator support as the android-nn-driver.
The other features of the 21.02 release are an update of the TensorFlow Lite parser to work with TensorFlow Lite v2.3.1 and changes to the public APIs to make binary compatibility between releases easier to maintain. Each group of public interfaces (SDK, backend, TfLiteDelegate, etc.) has been separately versioned and will have its version independently updated in subsequent releases to indicate changes in its Application Binary Interface (ABI).
Support has also been added for the SSD-MobileNetv2 and SSD-MobileNetv3 models. The models have been verified to execute correctly with good performance. Work to generate accuracy figures for the models using the TensorFlow Lite coco_object_detection tool is ongoing and will be published when complete.
Two configuration options for the CpuAcc backend have been added: one to specify the number of threads to use when executing ML workloads on the CPU, the other to load an MLGO tuning file to increase the performance of GEMM operations on the CPU.
New Features:
- Added ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
- This will remove the time taken for initial compilation of OpenCL kernels and speed up the first execution.
- Semantic Versioning for ArmNN APIs.
- Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
- Further operator support.
- Add capability to build on Android.
- Verification of Support of SSD-MobileNetv2 & SSD-MobileNetv3.
TfLite Parser
- Added support for ELU activation.
- Support Dilation in Conv2D.
ONNX Parser
- Support Dilation in Conv2D.
Caffe Parser
- Added Dilation support.
- Added argmax deconv support.
ArmNN Serializer
- Serialise ArmNN Model on android-nn-driver.
Public API Changes:
Backend API Changes:
ExecuteNetwork App Changes:
- Two optimization parameters were added to enable saving and loading of the ClContext.
- save-cached-network
- cached-network-filepath
Other changes:
- Make it easier for backends to traverse the subgraph during optimization by sorting Subgraphview layers on construction.
- Added CL/NEON implementation of RANK Workload.
- Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support to the CpuRef backend.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads to the CpuAcc backend.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads to the GpuAcc backend.
- Added more Fused Activation unit tests.
- Handle Neon optionality on 32 bit linux platforms.
- Validated MobileNetv2-SSD and MobileNetv3-SSD support.
- Add CpuAcc specific configuration option numberOfThreads.
- Add GpuAcc MLGO tuning file configuration argument.
Bug Fixes:
- Default stride values in depthwise and convolution to 1 instead of 0.
- Fixed transpose conv InferOutputShape.
- Fix incorrect padding value for asymmetric quantized type.
- Fix build breaks for armnnDeserializer test and Threads.cpp for macosx.
- Further fix for macosx where filenames are case insensitive.
- Fixed unit test failure on mipsel/s390x/ppc64/powerpc.
- Fixed ArmnnQuantizer incorrectly quantizing all data types.
- Fixed TFLite parser not parsing TransposeConvolution.
- Fix TfLite parser and ExecuteNetwork issues where error was not thrown in some cases.
- Fix wav2letter not producing correct output for Neon backend.
- Fixed ReduceLayer InferOutputShape issue so that the correct axis data is read in TfLiteParser.
- Fix Reduce workload to allow input tensors of any rank into the validate function.
- Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#.
- Add missing serializer support for m_DimensionsSpecificity.
- Removed unnecessary friend function in INetwork and fixed TransformIterator operator= to allow compilation on further compilers.
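The stride-default fix above matters because a stride of 0 makes the standard output-size formula divide by zero. A sketch of that calculation, assuming the usual floor-based convolution arithmetic (not the Arm NN implementation):

```python
def conv_output_size(in_size, kernel, stride=1, pad_front=0, pad_back=0):
    """Spatial output size of a convolution; stride now defaults to 1, not 0."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return (in_size + pad_front + pad_back - kernel) // stride + 1

# A 224-wide input, 3-wide kernel, stride 2, padding 1 on each side:
print(conv_output_size(224, 3, stride=2, pad_front=1, pad_back=1))  # 112
```

With the old default of 0, any descriptor that did not set the stride explicitly would hit the division by zero.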
Known issues:
Deprecation Notification:
The following components have been deprecated and will be removed in the next release (21.05) of Arm NN.
- armnnQuantizer: Now that the TensorFlow Lite Converter has mature post-training quantization capabilities, the need for this component has gone. See https://www.tensorflow.org/model_optimization/guide/quantization/post_training and https://www.tensorflow.org/lite/performance/post_training_quantization for more details.
- armnnTfParser: As TensorFlow Lite is our current recommended deployment environment for Arm NN, and the TensorFlow Lite Converter provides a path for converting most common machine learning models into TensorFlow Lite format, the need for a TensorFlow parser has gone.
- armnnCaffeParser: Caffe is no longer as widely used as a framework for machine learning as it once was.
Ubuntu 16.04 LTS is reaching End of Life.
Ubuntu Linux 16.04 LTS will no longer be supported after April 30, 2021.
At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
Consequently, from the 21.08 Release at the end of August 2021, Arm NN will no longer be officially supported on Ubuntu 16.04 LTS and will instead be supported on Ubuntu 18.04 LTS.
TfLite Delegate
New Features:
- Enabled ELU Activation.
- Enabled HARD_SWISH Activation.
- Added GATHER operator support.
- Added Logical AND, NOT and OR operator support.
- Added PAD operator support.
- Added PADV2 operator support.
- Added SPLIT operator support.
- Added SPLIT_V operator support.
- Added ARG_MAX operator support.
- Added ARG_MIN operator support.
- Added LOCAL_RESPONSE_NORMALIZATION operator support.
- Added L2_NORMALIZATION operator support.
- Added BATCH_TO_SPACE_ND operator support.
- Added SPACE_TO_BATCH_ND operator support.
- Added DEPTH_TO_SPACE operator support.
- Added SPACE_TO_DEPTH operator support.
- Added SUM operator support.
- Added REDUCE_MAX, REDUCE_MIN operator support.
- Added FLOOR operator support.
- Added OptimizerOptions
- Reduce Float32 to Float16.
- Reduce Float32 to BFloat16.
- Enable debug data.
- Enable memory import.
- Added STRIDED_SLICE operator support.
- Added LSTM operator support.
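The "Reduce Float32 to BFloat16" OptimizerOption above trades precision for bandwidth: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits. A rough illustration of the truncation (round-to-nearest is omitted for brevity; this is not the Arm NN implementation):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its top 16 bits (bfloat16)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(bits: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

# 3.140625 needs only 7 mantissa bits, so it survives the round trip exactly
print(bf16_bits_to_f32(f32_to_bf16_bits(3.140625)))  # 3.140625
```

Values needing more than 7 mantissa bits lose low-order precision, which is why these reductions are opt-in.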
Other Changes:
- Provided Android build.
- Removed Tensorflow requirement.
Bug Fixes:
- Fixed fused activation in Fully Connected layer.
- Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
Known Issues:
Note: We have added pre-built binaries of 21.02 Arm NN along with this release (please see the Assets). Please refer to the BuildGuideNative.md guide in armnn/delegate for more information.
Build dependencies:
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
Scons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
Boost | 1.64 |
Tensorflow | 2.3.1 |
Caffe | tag 1.0 |
Flatbuffer | 1.12.0 |
Protobuf | 3.12.0 |
Eigen3 | 3.3 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r1 | RP1A.200720.009 | R26P0_01EAC0, R30P0_01EAC0 | 11_r2 (6965179) | 11_r2 (6961477) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |
Note: Going forward, Arm NN will make documentation updates to the latest release where needed; these will be available on GitHub by selecting the doc tag corresponding to the release. For example, tag 21.02.doc1 is the 21.02 release plus some documents updated for that release; there are no functional changes. These document changes are cherry-picked to branches/armnn_21_02.
Release 20.11
Summary
The 20.11 Release was intended to provide major improvements to usability and performance in addition to delivering some additional functionality.
The usability enhancements were:
- Added Debian packaging for ArmNN Core, TfLite Parser and PyArmNN to Ubuntu Launchpad. This means users on Linux no longer need to go through a source repository setup and compile in order to start working.
- Addition of the TfLite Delegate along with 21 of the most valuable operators. This allows a much larger set of models to be executed, since operators that are not accelerated by the delegate will execute in the TfLite interpreter.
- Removal of the boost framework from all ArmNN code bar our unit tests. Simplifies deployment as the dependency on boost no longer exists.
- Website updates (better layout and more examples).
The performance enhancements were:
- ArmNN integration of Compute Library Activation and Batch Normalization fusing.
- ArmNN exposed the Compute Library fastmath option as a parameter that can be set on a per-model basis; in some scenarios this results in the selection of a faster convolution algorithm (Winograd) at the cost of some accuracy.
The additional functionality was:
- Addition of high priority partner requested Logical AND/OR/NOT operators in NNAPI.
- Support for Android R, verified against CTS 11_r3 (Build Id: 20201114.173303).
- Added support for the EfficientNet-Lite Model.
New Features:
- Added Debian packaging, which allows ArmNN to be installed via our APT repository on Ubuntu's Launchpad.
- Added ability to turn on the Compute Library fast_math option through ExecuteNetwork and the Android-nn-driver.
- Using the fast_math flag can lead to performance improvements in fp32 and fp16 layers but at the cost of some accuracy.
- The fast_math flag will not have any effect on int8 performance.
- Added support for Logical NOT, AND and OR for CpuRef, CpuAcc and GpuAcc.
- Added optimization to fuse BatchNorm into Convolution and Depthwise Convolution in fp32 and fp16.
- Added backend specific optimization to fuse Activations into the previous workload.
- Currently Activations can be fused with Addition, BatchNorm, Convolution, Depthwise Convolution, Division, Multiplication or Subtraction workloads on both CpuAcc and GpuAcc.
- Not all workloads can support all Activations.
- Added AddBroadcastReshapeLayer as optimizer.
- Added Map layer and Map workload. This layer has 1 input slot and 0 output slots and simply calls ->Map() on the input tensor handle.
- Added Unmap layer and Unmap workload. This layer has N input slots and 0 output slots and simply calls ->Unmap() on the input0 tensor handle. The remaining inputs are used for determining scheduling dependencies.
- Added support for TfLite Delegate (More information below in TfLite Delegate section).
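The BatchNorm-into-Convolution fusing above works because both operations are affine per output channel: the conv output y is replaced by gamma*(y - mean)/sqrt(var + eps) + beta, which folds into scaled weights and an adjusted bias. A per-channel sketch under these standard definitions (illustrative only, not the Arm NN optimizer code):

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BatchNorm params into its conv weights w and bias b."""
    scale = gamma / math.sqrt(var + eps)
    w_folded = [wi * scale for wi in w]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

# An identity BatchNorm (gamma=1, beta=0, mean=0, var=1) leaves the channel unchanged
w2, b2 = fold_batchnorm([0.5, -0.25], 0.1, 1.0, 0.0, 0.0, 1.0, eps=0.0)
print(w2, b2)  # [0.5, -0.25] 0.1
```

After folding, the BatchNorm layer can be removed from the graph entirely, which is where the fp32/fp16 performance gain comes from.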
TfLite Parser:
- Removed AddBroadcastReshapeLayer from the TfLite Parser and added it to the optimizations.
- TfLite version updated to 2.3.1.
Tf Parser:
- Tensorflow version updated to 2.3.1.
- Add support for 2nd input to ExpandDims in TfParser.
ArmNN Serializer:
- Added support for Logical NOT, AND and OR.
Public API Changes:
Backend API Changes:
ExecuteNetwork App Changes:
- Added ability to enable Compute Library fast_math through ExecuteNetwork.
- Added ability to execute models using TfLiteDelegate.
- Refactored ExecuteNetwork to support cxxopts.
- Allowed use of a dynamic backendId in ExecuteNetwork.
Other changes:
- Removed remaining boost from ArmNN runtime code (Boost still resides in Unit Tests).
- Removed boost::format and swapped to fmt
- Link fmt statically and change to be header-only interface library
- Removed boost::tokenizer and boost::escaped_list_separator to avoid use of CsvReader
- Removed boost::make_iterator_range and boost::to_upper_copy
- Removed boost::transform_iterator and make_transform_iterator
- Removed boost::numeric_cast
- Removed boost::math::fpc uses
- Removed boost/preprocessor.hpp
- Removed boost::program_options and swapped to cxxopts
- Removed boost::variant and swapped to mapbox/variant library
- Removed Boost from standalone dynamic backend
- Removed remaining Boost references from test executables
- Extended dump file with info about fused layers.
- Added SECURITY.md file that contains the security policy, vulnerability reporting procedure and a PGP key that can be used to create secure vulnerability reports.
- Graph::Print() now outputs more information such as number of input/output tensors and tensor dimensions.
- Updated Protobuf to 3.12.0.
- Load dynamic backends for YoloV3 tests.
- Included layer GUID in SerializeToDot output.
- Refactored Optimize(...) function to throw exceptions instead of returning null.
- Speed up the reference backend.
- Added int32 and int64 ArgMax op support.
- Added Quantization operator=() function to Tensor.
- Introduce ModelOptions to OptimizedNetwork.
- Added ability to pass ModelOption through Network::LoadNetwork() to Workload factory.
- Added Load-scope dynamic tensor TfLite tests.
Bug Fixes:
- Fixed Unittest failure while building using EthosNAcc backend.
- Fixed crash on model with Fullyconnected Sigmoid Activation by adding supported activations check to Neon FullyConnected validate.
- Fixed logical VTS skip.
- Fixed issue where EthosNAcc backend would output all zeros when falling back to CpuRef.
- Fixed issue causing SSD Mobilenet f16/uint8 to fail on CpuRef via ExecuteNetwork.
- Fixed issue with signed-int8 quantized model.
- Fixed error running EfficientNet-Lite on GpuAcc.
- Fixed validation for per-channel quantization.
- Fixed segfault between Neon and Cl layers.
- Fixed NonMaxSuppression.
- Fixed Yolov3 producing 0s on Neon.
- Removed Resize from list of layers that need padding in Neon.
- In Neon and CL MUL workloads, use the SATURATE convert policy if one of the inputs is quantized, and WRAP otherwise.
- Fixed non-channel per axis quantization.
- Fixed compiler implicit copy deprecation warning by updating Quantization copy constructor.
- Fixed PyArmNN's hard dependency on all parsers when using CMake.
- Fixed cxxopts and ghc cross compilation issue.
- Fixed undefined reference to GetIdStatic() in DynamicBackendsTests.
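The SATURATE-versus-WRAP fix for the MUL workloads concerns what happens when a converted value overflows the destination type: SATURATE clamps to the type's range, while WRAP rolls over modulo 2^n (two's complement). A sketch for signed 8-bit values (illustrative, not the Compute Library implementation):

```python
def convert_int8(value: int, policy: str) -> int:
    """Convert an integer to int8 range under a SATURATE or WRAP policy."""
    if policy == "SATURATE":
        return max(-128, min(127, value))  # clamp to [-128, 127]
    if policy == "WRAP":
        return (value + 128) % 256 - 128   # two's-complement wrap-around
    raise ValueError(f"unknown policy: {policy}")

print(convert_int8(300, "SATURATE"))  # 127
print(convert_int8(300, "WRAP"))      # 44
```

Saturating is the correct behaviour for quantized inputs, since wrapped values would map to wildly wrong real values after dequantization.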
Known Issues:
- Using a comma separated list to specify multiple compute devices (`--compute CpuRef,CpuAcc`) when using ExecuteNetwork doesn't work. To use multiple compute devices, use `--compute CpuRef --compute CpuAcc`.
TfLite Delegate:
New Features:
Current supported operators:
- Activation (ReLu, Relu6, Logistic, and TanH)
- Comparison (Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual)
- Control (Concat and Mean)
- Convolution (Convolution2d, DepthwiseConvolution2d and TransposeConvolution)
- ElementWiseBinary (Add, Div, Max, Min, Mul, Sub)
- ElementWiseUnary (Abs, Exp, Neg, Rsqrt, Sqrt )
- FullyConnected
- Pooling (MaxPool2d, AveragePool2d and L2Pool2d)
- Quantization (Dequantize and Quantize)
- Redefine (Reshape)
- Resize (Bilinear and NearestNeighbour)
- Softmax (Softmax and LogSoftmax)
- Transpose
Other Changes:
- Created the TfLite Delegate sub-directory in ArmNN.
- Added Fp16 support.
- Updated Tensorflow from v1.15 to v2.3.1.
- Activated compiler warnings when building delegate.
- Added ability to execute models through ExecuteNetwork using the TfLiteDelegate.
Known Issues:
Build dependencies:
Tools | Version we support |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
boost | 1.64 |
Tensorflow | 2.3.1 |
Caffe | tag 1.0 |
Onnx | 1.6.0 |
Flatbuffer | 1.12.0 |
Protobuf | 3.12.0 |
Eigen3 | 3.3 |
Android | 10 and 11 |
Mali Driver | r25p1_01bet0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Release 20.08
Summary
The 20.08 Release delivers the following:
- The final tranche of support for Android R ahead of its release in September. Namely QoS functionality, Fill, Rank and the new Resize options.
- Support for dynamic tensors where the size of any unspecified tensors can be inferred at network load time.
- Performance enhancements on the NEON backend eliminating unnecessary copying of data in memory, namely:
- The ability to directly import and export data into an inference graph.
- The ability to use subtensors where possible in split and concat workloads.
- Verification of support for TensorFlow Lite wav2letter and wav2letter tiny models (note: further work is needed to verify accuracy in the next release).
New Features:
- Added FILL operator support for CpuRef, CpuAcc, and GpuAcc.
- Added RANK operator support for CpuRef.
- Added align corner and half pixels support to the RESIZE operator for CpuRef, CpuAcc, and GpuAcc.
- Refactor TensorShape to support Dynamic Tensors (tensors of unknown dimension sizes or even unknown rank).
- Enable memory import in CpuAcc.
- Allow using Sub-Tensors on CpuAcc on ConcatenationLayer if concatenation is along x or y (2 innermost dimensions) and previous layers do not require padding.
- Allow using Sub-Tensors on CpuAcc on SplitterLayer if split is along x or y (2 innermost dimensions) and next layers do not require padding.
TfLite Parser:
- Added DIV operator support.
- Added LEAKY_RELU operator support.
- Added NEG operator support.
- Added HARD_SWISH operator support.
- Added support for Dynamic Tensors Type 1 (the input shape must always be set; the output shape can be dynamic and is inferred from the input shape).
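"Dynamic Tensors Type 1" means the parser can fill in an unspecified output shape from a fully specified input shape. For a shape-preserving operator such as LEAKY_RELU this is trivial; a sketch with hypothetical helper names (None marks an unknown dimension):

```python
def infer_output_shape(input_shape, output_shape):
    """Type-1 inference: the input must be fully specified; unknown output
    dimensions (None) are taken from the input for shape-preserving ops."""
    if any(d is None for d in input_shape):
        raise ValueError("input shape must be fully specified")
    if output_shape is None:
        return list(input_shape)  # fully dynamic output: copy the input shape
    return [i if o is None else o for i, o in zip(input_shape, output_shape)]

print(infer_output_shape([1, 224, 224, 3], [None, None, None, 3]))
# [1, 224, 224, 3]
```

Operators whose output shape depends on runtime data (rather than the input shape alone) are not covered by Type 1.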
Public API Changes:
- Added ITensorHandleFactory::GetCapabilities to calculate capability of the TensorHandleFactor.
ExecuteNetwork App Changes:
- Added the --infer-output-shape option: when enabled, it sets ShapeInferenceMethod::InferAndValidate on the TfLiteParser, which supports dynamic tensors type 1 (output shape inferable from input shape).
Other changes:
- Added EXP operator support to CpuAcc and GpuAcc.
- Added ADD,SUB,DIV,MUL,MAXIMUM and MINIMUM int32 support in CpuRef.
- Added PRELU float16 support in CpuRef.
- Added ARGMINMAX float16 support in CpuRef.
- Added GATHER support for any axis in CpuAcc and GpuAcc (previously the support was only for axis = 0).
- Added LOGSOFTMAX support in CpuAcc and GpuAcc.
- Added support for subtensors on Splitter layer for splitting x/y axis if no padding required on next layer.
- Added support for subtensors on Concat layer for concatenating x/y axis if no padding required on previous layer.
- Replaced boost::filesystem with ghc::filesystem.
- Removed boost/dll.hpp from dynamic backends test.
- Separated external profiling server code into a standalone library.
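Generalising GATHER beyond axis 0 (mentioned above) means indexing along an arbitrary dimension. A minimal nested-list sketch for rank-2 input (illustrative only; real backends operate on flat buffers with strides):

```python
def gather_2d(data, indices, axis):
    """Gather rows (axis=0) or columns (axis=1) of a rank-2 list-of-lists."""
    if axis == 0:
        return [data[i] for i in indices]
    if axis == 1:
        return [[row[i] for i in indices] for row in data]
    raise ValueError("this rank-2 sketch supports axis 0 or 1 only")

m = [[1, 2, 3], [4, 5, 6]]
print(gather_2d(m, [2, 0], axis=1))  # [[3, 1], [6, 4]]
```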
Bug Fixes:
- Added ability for Mean Reduction to reduce to scalar.
- Added ability for Strided Slice to shrink to scalar.
- Added a check for Strided Slice to not run when stride is negative and ShrinkAxisMask set.
- Fix edge case for transposeConv2d output shape inference.
- Fix deserializer output binding TensorShape logic.
- Fixed issue where AddBroadcastReshapeLayer would always connect the reshaped input to the first input slot, regardless of which input required the reshape.
- Removed TfLite Concat and Pad quantization validation.
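The "reduce to scalar" and "shrink to scalar" fixes above cover the case where every axis is reduced (or shrunk away by ShrinkAxisMask), leaving a rank-0 tensor. A sketch of the resulting shape computation (a hedged illustration, not the Arm NN code):

```python
def reduced_shape(shape, axes, keep_dims=False):
    """Output shape after reducing the given axes; [] denotes a scalar."""
    axes = {a % len(shape) for a in axes}  # normalize negative axes
    if keep_dims:
        return [1 if i in axes else d for i, d in enumerate(shape)]
    return [d for i, d in enumerate(shape) if i not in axes]

# Mean over all axes of a [2, 3] tensor yields a scalar (rank 0, shape [])
print(reduced_shape([2, 3], [0, 1]))  # []
```

Before the fix, the rank-0 result of a full reduction was not handled.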
Build dependencies
Tools | Version we support |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
boost | 1.64 |
Tensorflow | TENSORFLOW_REVISION= 590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0) |
Caffe | CAFFE_REVISION= 7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9 |
Onnx | ONNX_REVISION= f612532843bd8e24efeab2815e45b436479cc9ab |
Flatbuffer | 1.12.0 |
Protobuf | 3.5.2 |
Eigen3 | 3.3 |
Android | 9 and 10 |
Mali Driver | r25p1_01bet0 |
Android NDK | r20b |