Releases: ARM-software/armnn
Release 24.11
Arm NN SDK
New Features:
- Implemented "All or Nothing" logic in assign backends.
- Added Signed64 support for Constant and Tile workloads.
- Added Int8 and Uint8 support for LogSoftMax to run on CpuAcc and GpuAcc backends.
- Added automatic backend selection of GPU to ExecuteNetwork.
- Added TfLite Turbo Model recognition and enabled Turbo Mode.
TosaCommon & TosaRef:
- Added "Convolution3d" support.
- Added support for activation functions "Sigmoid" and "TanH".
- Added support for activation function "HardSwish".
- Added "StridedSlice" support.
- Added "ElementwiseBinary:SqDiff" support.
- Added "Stack" support.
- Added "Dequantize" support.
- Added "DepthToSpace" support.
- Added "Gather" support.
Bug Fixes:
- Fixed the ReduceProdOp Int8 CpuAcc and GpuAcc backend failures in the delegate test suite.
- Fixed Mean operator in TosaCommon and TosaRef.
- Fixed ExecuteNetwork logic error when using -N parameter.
- Fixed convolution delegate test suite failures.
- Fixed UnpackOpTest Acc backend failures in the delegate test suite.
- Fixed layer name error on classic delegate DepthwiseConvolution2d.
- Fixed GatherNd Acc backend failures in the delegate test suite.
- Fixed FloorDiv Acc backend failures in the delegate test suite.
- Fixed StridedSliceOp Acc backend failures in the delegate test suite.
- Fixed Pad tests incorrectly reporting unsupported in the delegate test suite.
Removed Features:
- Removed the Arm NN Converter.
- Removed the GpuFsa backend.
Other Changes:
- Updated ExecuteNetwork comparison options to return a result value.
- Added Debugging macros to LoadedNetwork.
ABI/API Changes:
The following front-end API changes have occurred during the implementation of 24.11, which users should be aware of before upgrading.
Due to these changes, we have bumped our ARMNN_VERSION to 34.0.0, following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API Changes |
---|---|---|---|
Make ExecuteNetwork handle automatic selection of GPU backends. | b670f11 | https://review.mlplatform.org/c/ml/armnn/+/12662 | The size of the BackendRegistry class has changed from 192 bytes to 216 bytes. Field m_MappedGpuBackends has been added. |
Fix FloorDiv Acc DTS Failures. | 88a149f | https://review.mlplatform.org/c/ml/armnn/+/12411 | The BinaryOperation enum has a new member: FloorDiv, with a value of 8. |
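As a rough illustration of the version bump described above, the sketch below applies Semantic Versioning rules to a version triple. `Version`, `ChangeKind` and `Bump` are hypothetical names invented for this example and are not part of the Arm NN API; only the version numbers 33.1.0, 33.2.0 and 34.0.0 come from these notes.

```cpp
#include <cstdint>

// Hypothetical version triple; not an Arm NN type.
struct Version
{
    std::uint32_t majorPart;
    std::uint32_t minorPart;
    std::uint32_t patchPart;
};

// Hypothetical classification of a release's most severe change.
enum class ChangeKind { BugFix, BackwardCompatible, AbiBreaking };

// Semantic Versioning: a breaking change bumps the major part and resets
// the rest; a compatible addition bumps the minor part and resets the patch.
constexpr Version Bump(Version v, ChangeKind kind)
{
    switch (kind)
    {
        case ChangeKind::AbiBreaking:
            return Version{v.majorPart + 1, 0, 0};
        case ChangeKind::BackwardCompatible:
            return Version{v.majorPart, v.minorPart + 1, 0};
        default:
            return Version{v.majorPart, v.minorPart, v.patchPart + 1};
    }
}

constexpr bool operator==(const Version& a, const Version& b)
{
    return a.majorPart == b.majorPart
        && a.minorPart == b.minorPart
        && a.patchPart == b.patchPart;
}
```

Under these assumptions, `Bump(Version{33, 2, 0}, ChangeKind::AbiBreaking)` yields 34.0.0, which matches the bump reported for 24.11.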
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.22.1 |
Tensorflow | 2.15.0 |
Onnx | 1.6.0 |
Flatbuffer | 23.5.26 |
Protobuf | 3.12.0 |
Android NDK | r26b |
cxxopts | 3.1.1 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
mapbox/variant | 1.1.0 |
stb | 2.16 |
Gemmlowp | 16e8662c34917be0065110bfcd9cc27d30f52fdf |
Release 24.08
Summary
New Features
- Softmax implemented in TosaCommon and TosaRef.
- MEAN implemented in TosaCommon and TosaRef.
- REDUCE_SUM implemented in TosaCommon and TosaRef.
- Activation:Gelu implemented in TosaRef.
- ElementwiseUnary:Log implemented in TosaRef.
- Pad implemented in TosaCommon and TosaRef.
- ElementwiseUnary:Exp implemented in TosaRef.
- BatchMatMul implemented in TosaCommon and TosaRef.
- FullyConnected implemented in TosaCommon and TosaRef.
- Activation:BoundedReLu implemented in TosaCommon and TosaRef.
- Activation:ReLu implemented in TosaCommon and TosaRef.
- DepthwiseConvolution2d implemented in TosaCommon and TosaRef.
- Implemented quantized ElementwiseBinary Add, Max, Mul and Sub support in TosaCommon and TosaRef.
Bug Fixes
- Fix floating point exception in PerAxisIterator.
- Fix TFLite Parser & Opaque Delegate ExecuteNetwork incorrectly unloading runtime.
- Fix StridedSliceOp out of bounds errors.
- Fix not specified dimensionality errors in classic and opaque delegates.
- Fix warnings when building ArmNN Delegate with GCC-14.1.0.
- Fix ReshapeOp DTS Test Failures.
- Fix ConstFloat DTS Test Failures.
- Fix Broadcast DTS test failures.
- Fix BatchMatMul DTS test failures.
Other Changes
- Update to Arm NN documentation for 24.08 release.
- Review and update documentation for the 24.08 release.
- Android support for evaluate_network.sh.
- Added Gemmlowp for fixed point arithmetic on small values.
- Moved Arm NN repository to use CMake 3.22.
- Added Numpy Support to Execute Network.
ABI/API Changes
No ABI breaking change occurred in ArmNN Core (libarmnn.so) and so the Major version has not changed, only a bump in minor version (33.1.0 → 33.2.0).
No API breaking back-end changes have occurred during the implementation of 24.08.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
Cmake | 3.22.1 |
Tensorflow | 2.15.0 |
Onnx | 1.6.0 |
Flatbuffer | 23.5.26 |
Protobuf | 3.12.0 |
Android NDK | r26b |
mapbox/variant | 1.2.0 |
cxxopts | 3.1.1 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
mapbox/variant | 1.1.0 |
stb | 2.16 |
Gemmlowp | 16e8662c34917be0065110bfcd9cc27d30f52fdf |
Release 24.05
Summary
New Features
- ScatterNd Operator Implementation.
  - Added support to delegate and opaque delegate.
  - Added support to Serializer and Deserializer.
  - Added support to TFLite parser.
  - End to End tests added.
  - Added support for CpuRef and GpuAcc.
- Adding options to serialize networks in ExecuteNetwork.
- Add a build option to enable the OpenMP scheduler in ACL and made it the default scheduler for ACL builds.
- Add Boolean data type to Debug layer support.
- Update TOSA Common and TosaRef to use TOSA v0.80.
- Update build-tool README to include macOS support.
Bug Fixes
- ExecuteNetwork fix for abort after inference.
- Fix for failing CTS Float16 tests.
- Enable serialize-to-armnn only when ARMNN_SERIALIZER is on.
- TosaCommon backend
- In TosaCommon, modify the way the unique names for the inputs are generated.
- CreateRescaleTosaOperator() modified.
- Move ComputeSplitAxis() to backendsCommon/WorkloadUtils.
- For LeakyRelu, add TosaRefEndToEndTests and enable FP16 in TOSA mapping.
- Fix quantized Conv2d TOSA mapping.
- Broadcast handling for Comparison layer is inconsistent.
- Remove limitations on zero scale value in quantization.
- Fix failing fsrcnn test.
- Fix broken link in the delegate README.
- Fix runtime memory handling in delegate and Arm NN executor.
- Remove use of std::clamp.
- Syntax change to allow building on older compilers.
- Assert audit and removal.
Other Changes
- Deprecation notices for items to be removed in 24.08 release.
- Review and update documentation for operators added in 24.05 release.
- Update to Arm NN documentation for 24.05 release.
- Update python pillow version.
- Remove reference to 22.08 release in docker README.
- Minor change to the printouts in ExecuteNetwork.
- Enable build of execute network in build tool.
- Arm NN build tool script update for delegate header and so files.
ABI/API Changes
No API breaking front-end changes have occurred during the implementation of 24.05.
No API breaking back-end changes have occurred during the implementation of 24.05.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
Cmake | 3.19.0 (Ubuntu) and 3.19.0 (Debian) |
Tensorflow | 2.15.0 |
Onnx | 1.6.0 |
Flatbuffer | 23.5.26 |
Protobuf | 3.12.0 |
Android NDK | r26b |
mapbox/variant | 1.2.0 |
cxxopts | 3.1.1 |
doctest | 2.4.6 |
fmt | 8.3.0 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |
Release 24.02
Summary
New Features
- ArmNN to TOSA backend:
  - LeakyRelu Activation support added
  - Quantize support added
  - Maximum support added
  - Split support added
  - Resize Nearest Neighbour support added
- GpuFsa Backend (Dynamic Fusion)
  - RESIZE/SCALE support added
  - CAST support added
  - POOL2d support added
  - SUB support added
  - ADD support added
  - DEPTHWISE CONVOLUTION 2D support added
  - CONVOLUTION 2D support added
- Updated to Android NDK r26b.
- Updated to TensorFlow 2.15.
- Added optimization to remove reshape operators where possible to CL, Neon and Ref backends.
Bug Fixes
- Removed implicit sign conversion which could cause compile errors.
- Fixed a memory leak that only occurs during profiling when the reference Resize workload's align corners option is true.
- Fixed build failures on C++ 14 compilers.
- Fixed build tool errors when building for Android target.
Other Changes
- Delegate Unit Tests are now only built for the backends which are being built.
- Increased end to end testing for two layer and three layer MaxPool2d.
- In ExecuteNetwork added support to serialize to dot graph for the Arm NN Delegates.
ABI/API Changes
No API breaking front-end changes have occurred during the implementation of 24.02.
No API breaking back-end changes have occurred during the implementation of 24.02.
Note: The minimum API level supported by Arm NN AAR file is 27.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
Cmake | 3.19.0 (Ubuntu) and 3.19.0 (Debian) |
Tensorflow | 2.15.0 |
Onnx | 1.6.0 |
Flatbuffer | 23.5.26 |
Protobuf | 3.12.0 |
Android NDK | r26b |
mapbox/variant | 1.2.0 |
cxxopts | 3.1.1 |
doctest | 2.4.6 |
fmt | 8.3.0 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |
Release 23.11
Summary
New Features
- Add support for BROADCAST_TO layer in CpuRef, and remove it when it is followed by ElementWise layer.
- Add an optimization that fuses Add+Mul+Add+(Optional Relu) layers in CpuAcc.
- Add support for GELU activation layer in CpuRef, CpuAcc, GpuAcc.
- Upgrade Arm NN to Tensorflow 2.14
- Add Signed64 support
- Add support for Signed64 data type in Cast layer
- Add a script that evaluates the performance of a network
- Add ReverseV2 CL and Neon Workloads
TfLite Parser
- Add support for BROADCAST_TO layer.
- Add support for GELU activation layer.
- Updating TfLite parser to ignore VALIDATION: subgraphs
Arm NN Serializer/Deserializer:
- Add support for GELU activation layer.
Bug Fixes
- Fix UnidirectionalSequenceLstm
- Fix weights checking when converting in Support Library
- Fix unsafe Usages of Memcpy in Armnn
- Fix for -Wno-sign-conversion in profiling test in gcc9
- Fix ElementwiseBinary missing from NeonBackend activation fusion optimization
- Fix Reshape and concat invalid results
- Remove unnecessary Prelu restriction in quantization
- Remove unnecessary Square Difference restriction in quantization
Other Changes
- Update the Arm NN Execute Network app --help
- Introduce clang-format scripts to ArmNN
- Remove profiling detail for ConstTensorAsInputs Layers
- Install missing profiling headers
- Remove ASSERTs from deserializer code
- Remove ASSERTs from armnnUtils code
- Remove ASSERTs from shim code
- Update documentation to correct C++ version: C++ 17
- Removing explicit block on non constant bias in NEON CONV2D, allowing Arm Compute Library to handle this.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 23.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 33.1.0 and our OPAQUE_DELEGATE_VERSION to 2.0.0, following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add ArmNNSettings to Opaque Delegate | 3e4b608 | https://review.mlplatform.org/c/ml/armnn/+/10493 | 2 changes have occurred: |
No API breaking back-end changes have occurred during the implementation of 23.11.
TfLite Delegate
- Add support for BROADCAST_TO layer to Classic and opaque delegate.
- Add support for GELU activation layer to classic and opaque delegate.
- Add ArmNNSettings parser function for Opaque Delegate.
- Improve logging in the delegate
Bug Fixes
- Fixed Reduce Sum uint8 failure. The fix was to treat only Reduce Prod Uint8 as a special case, as opposed to treating all reduce operations in uint8 as a special case (kTfLiteAffineQuantization → kTfLiteNoQuantization).
- Fix Issue with delegate supporting FP16 models
- Delegate Test Suite: Fix reshape floating point exception
- Delegate Test Suite: Fix default scale/offset issue
- Delegate Test Suite: Fix ElementWise isnan assert
- Delegate Test Suite: Fix Unspecified dimension while using ShapeInferenceMethod::ValidateOnly
- Delegate Test Suite: Fix QuantizePerChannel tests
- Delegate Test Suite: Fix Gather and GatherNd Tests in CpuRef
PyArmNN
- Update requests version in PyArmNN
- Bump Pillow version from 9.3.0 to 10.0.1
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
Cmake | 3.19.0 (Ubuntu) and 3.19.0 (Debian) |
Tensorflow | 2.14.0 |
Onnx | 1.6.0 |
Flatbuffer | 23.5.26 |
Protobuf | 3.12.0 |
Android NDK | r25 |
mapbox/variant | 1.2.0 |
cxxopts | 3.1.1 |
doctest | 2.4.6 |
fmt | 8.3.0 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |
Release 23.08
Summary
New Features
- Added support for `tile` operator in `CpuRef`, `CpuAcc`, `GpuAcc`.
- Added support for `reverse_v2` operator in `CpuRef`.
- Added `pow` and `squared_difference` as `ElementWiseBinary` layers in `CpuRef`, `CpuAcc`, and `GpuAcc`.
- Added `squared_difference`, `power` and `ceil` to `TypeUtils.hpp`.
- Enabled dynamic / non-constant bias for:
  - Fully-Connected layers in `CpuAcc` and `GpuAcc`
  - 3-D Convolutional layers in `CpuAcc` and `GpuAcc`
  - Depthwise Convolutional layers in `GpuAcc`
- Added `DataType` to `.dot` files for constant layers.
- Added BinaryElementwiseOperation to .dot files.
- Added a `FileComparisonExecutor` to `ExecuteNetwork`.
- Added an optional `TensorInfo` to `InputSlot`.
- Added 3D tensors to `batch_to_space` and `space_to_batch` for `CpuAcc` and `GpuAcc`.
- Added check for half-precision floating-point infinity values and backend support (FP16).
- Added backend optimisations to remove `reshape` layers where possible.
- Added data layout to tensors in `NeonStridedSliceWorkload`.
- Added names to workloads.
- Enabled `slice` end-to-end tests in all backends and `Signed32` in `CpuRef`.
- Added `axis` to `ViewsDescriptor`.
- Refactored ElementBinaryOps to use ElementBinaryLayer.
TfLite Parser
- Added `reverse_v2` support to TFLite Parser.
- Added `tile` to TFLite Parser.
- Added `square` as `mul` in the TFLite Parser.
- Check for `options != null` before adding fused activation in TFLite Parser.
- Fixed segfault with some models in the TFLite Parser.
Arm NN Serializer/Deserializer:
- Added `tile` to Serialiser/Deserialiser.
- Added `reverse_v2` to Serialiser/Deserialiser.
Support library
- Added `reverse_v2` to Support Library.
- Added `tile` to Support Library.
- Added cache-size check to Support Library.
Bug Fixes
- Fixed incorrect validation of `unidirectional_sequence_lstm` on `CL` and `Neon`.
- Fixed issue with `ExecuteNetwork` when running with TFLite Executor.
- Replaced asserts with exceptions in `Gather` reference workload.
- Introduced fix to explicitly state the correct header to be included (following prior deprecation warning).
- Fixed XML parsing error in Arm NN Doxygen.
- Fixed `-Werror=unused-result` error.
- Introduced fix for `ExecuteNetwork` where `--output-network-details-only` was not working with `-T delegate` flag.
- Introduced fix for duplicate definitions in cross-compilation build.
- Fixed incorrect `Concat` permutation parameters in Support Library.
- Removed unnecessary warnings for certain models.
- Introduced fix to allow `SplitterLayer` to use overridden `TensorInfos` correctly.
- Introduced fix for some cases where the use of sub-tensors was causing an error.
- Fixed read memory access caused by missing `printf` arguments.
- Introduced fix for failing dynamic backend build.
- Fixed issue where the dimension's specificity didn't match the number of dimensions.
- Fixed ambiguous method name in BackendHelper.
- Introduced fix for segmentation fault when an input was directly connected to an output.
- Fixed uninitialised variable error found during static analysis.
- Fixed fault in `ExecuteNetwork` when a model file was passed without an extension.
- Fixed GitHub issue where search bar was not working in Doxygen documentation.
Other Changes
- Replaced use of `std::filesystem` with `ghc::filesystem`.
- Refactored `ConnectedToSplitterWithMoreThan4Dims` function to a more generally useful `ConnectedToLayerType` function.
- Customised `Doxygen` output.
- Removal of deprecated code due to be removed in 23.08 or earlier:
  - INetworkProperties
  - SubgraphView
  - ILayerSupport
  - WorkloadFactory
- Updated documentation with new operators in 23.08.
- Audited the use of `armnn_assert`.
Known Issues
- Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
- There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 23.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 33.0.0, following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Removal of Reshape | 4cc341c | https://review.mlplatform.org/c/ml/armnn/+/9885 | 4 additional virtual methods added to class IInputSlot: SetTensorInfo ( TensorInfo ), GetTensorInfo ( ) const, IsTensorInfoSet ( ) const, IsTensorInfoOverridden ( ) const |
Front end and reference implementation for TILE | 79a06a5 | https://review.mlplatform.org/c/ml/armnn/+/9920 | LayerType enum has had the LastLayer member value changed from 72 to 74. The member Tile with value 74 has been added. |
Remove deprecated code | 09e4d05 | https://review.mlplatform.org/c/ml/armnn/+/9266 | 2 functions have been removed from the BatchMatMulDescriptor class: |
Remove deprecated code (INetworkProperties) | b179382 | https://review.mlplatform.org/c/ml/armnn/+/10001 | INetworkProperties structure has had the following fields removed: m_ExportEnabled m_ImportEnabled |
Remove deprecated code (ILayerSupport) | 6627703 | https://review.mlplatform.org/c/ml/armnn/+/10005 | In ArmNN, individual virtual IsXXXSupported() functions in the ILayerSupport class have been removed. This functionality has been replaced by a more ABI compliant model whereby an IsLayerSupported() function now accepts a LayerType argument. In ArmNNTestUtils, removal of 4 virtual methods from class MockLayerSupport: |
Remove deprecated code (WorkloadFactory) | 7894ef9 | https://review.mlplatform.org/c/ml/armnn/+/10006 | In ArmNNTestUtils the MockWorkloadFactory class has had the following virtual method removed: CreateInput ( InputQueueDescriptor const&, struct WorkloadInfo const& ) const |
Added Axis to ViewsDescriptor | fca5916 | https://review.mlplatform.org/c/ml/armnn/+/10073 | The size of the ViewsDescriptor structure has changed from 48 bytes to 56 bytes. Field m_IsAxisSet has been added. |
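The ILayerSupport row above describes replacing one virtual function per layer with a single entry point keyed on a layer type. The following is a minimal sketch of that pattern, not the real Arm NN interface: `LayerTypeSketch`, `ILayerSupportSketch`, `ExampleBackendSupport` and `ExampleBackendSupports` are hypothetical stand-ins, and the real function takes further arguments that are omitted here.

```cpp
// Hypothetical, reduced stand-in for a layer type enum.
enum class LayerTypeSketch { Activation, Convolution2d, Tile };

// Support table for an imaginary backend: it handles two layer types.
constexpr bool ExampleBackendSupports(LayerTypeSketch type)
{
    switch (type)
    {
        case LayerTypeSketch::Activation:
        case LayerTypeSketch::Convolution2d:
            return true;
        default:
            return false; // Unknown or newly added layer types are rejected.
    }
}

// A single virtual entry point: adding a layer type extends the enum and
// the switch above, but leaves the set of virtual functions unchanged.
struct ILayerSupportSketch
{
    virtual ~ILayerSupportSketch() = default;
    virtual bool IsLayerSupported(LayerTypeSketch type) const = 0;
};

struct ExampleBackendSupport : ILayerSupportSketch
{
    bool IsLayerSupported(LayerTypeSketch type) const override
    {
        return ExampleBackendSupports(type);
    }
};
```

The design point is that new layer types no longer require a new virtual function, so the v-table layout of the interface stays the same from release to release.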
The following back-end API changes have occurred during the implementation of 23.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add names to workloads | 7cbe781 | https://review.mlplatform.org/c/ml/armnn/+/9983 | Pure virtual method GetName ( ) const has been added to IWorkload class. Size of CopyMemGenericWorkload class has been changed from 152 bytes to 184 bytes. |
Remove deprecated code (SubgraphView) | 0f3e9a0 | https://review.mlplatform.org/c/ml/armnn/+/10009 | Removed the following methods from SubgraphView: GetInputSlot, GetInputSlots, GetLayers, GetOutputSlot, GetOutputSlots. This resulted in a change to return types for the following methods in SubgraphView: begin() returns SubgraphView::IConnectableLayerIterator, begin() const returns SubgraphView::ConstIConnectableIterator, cbegin() const returns SubgraphView::ConstIConnectableIterator, cend() const returns SubgraphView::ConstIConnectableIterator, end() returns SubgraphView::IConnectableLayerIterator, end() const returns SubgraphView::ConstIConnectableIterator. |
Remove deprecated code (ILayerSupport) | a504834 | https://review.mlplatform.org/c/ml/armnn/+/10071 | Made IWorkloadFactory::CreateWorkload a pure virtual function to force client to write its own implementation. |
Fix coverity error on variable initialize | b9b9792 | https://review.mlplatform.org/c/ml/armnn/+/10075 | Adjusted sequence of variable initialization in struct WorkloadInfo: m_Name m_WeightsTensorInfo m_BiasTensorInfo m_ConvolutionMethod |
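The "Add names to workloads" row above means that backend workloads written against the older interface no longer compile as concrete classes. A small sketch of the effect, using hypothetical stand-in types (`IWorkloadSketch`, `LegacyWorkload`, `NamedWorkload`) rather than the real Arm NN classes; the `GetName` return type is an assumption made for this example:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a workload interface after the change.
struct IWorkloadSketch
{
    virtual ~IWorkloadSketch() = default;
    virtual void Execute() const = 0;
    // Newly added pure virtual method (return type assumed).
    virtual const std::string& GetName() const = 0;
};

// Written against the old interface: no GetName() override, so the class
// stays abstract and cannot be instantiated.
struct LegacyWorkload : IWorkloadSketch
{
    void Execute() const override {}
};

// Updated workload: stores a name and implements the new method.
struct NamedWorkload : IWorkloadSketch
{
    explicit NamedWorkload(std::string name) : m_Name(std::move(name)) {}
    void Execute() const override {}
    const std::string& GetName() const override { return m_Name; }

private:
    std::string m_Name;
};
```

In this sketch, `std::is_abstract<LegacyWorkload>::value` is true while `std::is_abstract<NamedWorkload>::value` is false, which is why custom backends need updating before they build against 23.08.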
TfLite Delegate
- Extended support for 3D tensors (`bat...
Release 23.05
Summary
New Features
- Added support for dynamic weights in CpuAcc and GpuAcc for FullyConnected workloads.
- Added support for Crops in CpuAcc and GpuAcc BatchToSpaceND workloads.
- Added support for int8 and changed the Compute Library kernel used for Fp32 in CpuAcc and GpuAcc Batch MatMul workloads.
- Added Opaque TfLite Delegate which provides the same operator coverage as the existing/classic TfLite Delegate. More information can be found in the TfLite Delegate section below.
TfLite Parser
- Added support for CEIL and SPACE_TO_DEPTH operators.
- Fixed bug where calculated output shape wasn't being recorded in ParseSqueeze.
- Fixed segfault in ParseTransposeConv2d when output_shape is not constant.
- Fixed bug where negative axis was being read incorrectly in ParseMean.
- Calculate explicit padding for Transpose Convolution using output shape, if specified.
ONNX Parser
- Added support for dimensions > 2 to MatMul/FullyConnected.
Arm NN Serializer/Deserializer:
- Added support for CEIL.
Bug Fixes
- Fixed compare-output output feature in ExecuteNetwork.
- Fixed gcc 13 compiler errors.
Other Changes
- Added ElementwiseBinaryLayer to replace Add, Div, Sub, Maximum, Mul and Minimum layers.
- Updated build Android NDK guide (BuildGuideAndroidNDK.md).
- Set default quantization parameter scale to 1.0, instead of 0.0.
- Fp16ToFp32 and Fp32ToFp16 convert workloads now use arm_compute::NECast in CpuAcc backend, when available. This should in general be faster.
- Added Opaque TfLite Delegate build option to the build-tool.
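To see why a default quantization scale of 0.0 is a poor default, consider the usual affine dequantization formula, real = scale * (quantized - offset). The sketch below is a standalone illustration of that formula; `Dequantize` here is a hypothetical helper written for this example, not the Arm NN function of the same purpose.

```cpp
// Affine dequantization: maps a quantized integer back to a real value.
// Hypothetical helper for illustration only.
constexpr float Dequantize(int quantized, float scale, int offset)
{
    return scale * static_cast<float>(quantized - offset);
}

// With scale 1.0 and offset 0 the mapping is the identity on integers.
static_assert(Dequantize(10, 1.0f, 0) == 10.0f, "scale 1.0 preserves the value");

// With scale 0.0 every quantized value collapses to 0, losing all information.
static_assert(Dequantize(10, 0.0f, 0) == 0.0f, "scale 0.0 collapses to zero");
static_assert(Dequantize(200, 0.0f, 0) == 0.0f, "scale 0.0 collapses to zero");
```

A scale of 1.0 therefore gives tensors with unset quantization parameters a neutral, lossless mapping.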
Known Issues
- Intermittent issue on Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
- There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 23.05 that users should be aware of before upgrading. Note: No ABI breaking change occurred in ArmNN Core (libarmnn.so) and so the Major version has not changed, only a bump in minor version (32.0.0 → 32.1.0).
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Implement Pimpl Idiom for Delegate Options | 1bae865 | https://review.mlplatform.org/c/ml/armnn/+/9358 | Size of class DelegateOptions has been changed from 488 bytes to 8 bytes. Layout of parameter's stack of several functions has been changed and therefore parameters at higher positions in the stack may be incorrectly initialized by applications. Size of class Delegate has been changed from 552 bytes to 72 bytes. Size of field m_Options has been changed from 488 bytes to 8 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
Implement Pimpl Idiom for OptimizerOptions | c5ee0d7 | https://review.mlplatform.org/c/ml/armnn/+/9369 | The following functions have been changed to accept an OptimizerOptionsOpaque argument instead of the unstable OptimizerOptions. Note: OptimizerOptionsOpaque will accept an OptimizerOptions in its constructor, so this is not an API break, only an ABI break. DelegateOptions::DelegateOptions [C1] ( enum armnn::Compute computeDevice, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C2] ( enum armnn::Compute computeDevice, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C1] ( std::vector<armnn::BackendId> const& backends, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C2] ( std::vector<armnn::BackendId> const& backends, struct armnn::OptimizerOptions const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::SetOptimizerOptions ( struct armnn::OptimizerOptions const& optimizerOptions ) Replacement functions: DelegateOptions::DelegateOptions [C1] ( enum armnn::Compute computeDevice, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C2] ( enum armnn::Compute computeDevice, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C1] ( std::vector<armnn::BackendId> const& backends, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::DelegateOptions [C2] ( std::vector<armnn::BackendId> const& backends, armnn::OptimizerOptionsOpaque const& optimizerOptions, armnn::Optional<armnn::LogSeverity> const& logSeverityLevel, armnn::Optional<std::function<void(arm::pipe::ProfilingGuid, unsigned int, armnn::ITensorHandle*)> > const& func ) DelegateOptions::SetOptimizerOptions ( armnn::OptimizerOptionsOpaque const& optimizerOptions ) |
Add constant version of IConnectableLayer::GetConstantTensorsByRef | aeec3ce | https://review.mlplatform.org/c/ml/armnn/+/9196 | For class IConnectableLayer a pure virtual method GetConstantTensorsByRef ( ) const has been added. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. Note: The pure virtual function was added at the end so will not result in an ABI break. However you should usually not add new virtual functions for any reason, even to leaf classes, if the class is intended to remain binary compatible on Windows. Doing so may reorder existing virtual functions and break binary compatibility. As we do not target support for Windows we are not going to consider this an ABI break for ArmNN Core. |
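The size change of DelegateOptions from 488 bytes to 8 bytes reported above is what the Pimpl idiom typically produces: the public class holds only a pointer to a private implementation. The following is a minimal sketch of the idiom, assuming a 64-bit target and a mainstream standard library; `OptionsSketch` and its members are hypothetical and do not mirror the real DelegateOptions layout.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Public class: its only data member is a pointer to the hidden state.
class OptionsSketch
{
public:
    OptionsSketch();
    ~OptionsSketch(); // Defined where Impl is complete.

    void AddBackend(const std::string& id);
    std::size_t BackendCount() const;

private:
    struct Impl;                   // Layout is invisible to clients.
    std::unique_ptr<Impl> m_Impl;
};

// Private state: fields can be added here without changing
// sizeof(OptionsSketch), so clients need no recompilation for that reason.
struct OptionsSketch::Impl
{
    std::vector<std::string> backends;
    bool profilingEnabled = false;
};

OptionsSketch::OptionsSketch() : m_Impl(std::make_unique<Impl>()) {}
OptionsSketch::~OptionsSketch() = default;

void OptionsSketch::AddBackend(const std::string& id)
{
    m_Impl->backends.push_back(id);
}

std::size_t OptionsSketch::BackendCount() const
{
    return m_Impl->backends.size();
}

// Holds on common toolchains, where unique_ptr with the default deleter
// is the size of a raw pointer.
static_assert(sizeof(OptionsSketch) == sizeof(void*), "Pimpl class is pointer-sized");
```

Moving to this layout is itself an ABI break, as the table notes, but later additions to the private state no longer change the size of the public class.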
The following back-end API changes have occurred during the implementation of 23.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove GetGraph and include of Graph.hpp header from public header | c1c5f2a | https://review.mlplatform.org/c/ml/armnn/+/9351 | Size of class OptimizationViews has been changed from 360 bytes to 88 bytes. Field m_Graph has been removed from this type. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
TfLite Delegate
- The existing TfLite Delegate has now been renamed to "Classic" TfLite Delegate to accommodate the new Opaque TfLite Delegate. There has been a file restructure because of this.
- The Opaque TfLite Delegate provides the same operator coverage as the existing/classic TfLite Delegate. A list of these supported operators can be found in the TfLite Delegate section of the documentation.
New features
- Added support for CEIL, EXPAND_DIMS and SQUEEZE operators.
Bug Fixes
- Fixed layer support for Comparison, ElementWiseBinary and LogicalBinary operators, by expanding the TensorShape before verifying support, if required.
- Fixed handling of a negative size value in the Slice operator.
- Calculate explicit padding for Transpose Convolution using output shape, if specified.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
Cmake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
Tensorflow | 2.12.0 (SHA 6f692f73cb2043b4a0b0446539cd8c15b3dd9220) |
Onnx | 1.6.0 |
Flatbuffer | 2.0.6 |
Protobuf | 3.12.0 |
Android NDK | r25 |
mapbox/variant | 1.2.0 |
cxxopts | 3.1.1 (SHA eb787304d67ec22f7c3a184ee8b4c481d04357fd) |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |
Release 23.02
New Features
- Arm NN TOSA Backend
- Added Concatenation support to TOSA Reference Backend.
- Added Constant layer support to TOSA Reference Backend.
- Added Convolution 2D support to TOSA Reference Backend.
- Added Pooling2d support to TOSA Reference Backend.
- Added Reshape support to TOSA Reference Backend.
- Added RSqrt support to TOSA Reference Backend.
- Added Slice support to TOSA Reference Backend.
- Added Transpose Convolution 2D support to TOSA Reference Backend.
- Added Subtraction and Multiplication support to TOSA Reference Backend.
- Added support for GpuAcc BatchMatMul with FP32.
- Extend BatchMatMul support for 4D tensors in GpuAcc.
ONNX Parser
- Provide a CreateNetworkFromBinary method for the ONNX parser.
TfLite Parser:
- Fixed issue in ParseReshape where the targetShape wasn't always calculated correctly.
- Fixed issue in ParseFullyConnected where the wrong name was used for the ReshapeLayer.
- Added an ExpandDims to the FullyConnected to ensure that we reshape the output correctly.
Bug Fixes
- Fixed bug in ExecuteNetwork where input files were not used when input names were not given.
- Fixed bug in delegate profiling in ExecuteNetwork with multiple iterations.
- Fixed bug in CpuAcc and GpuAcc: the BuildArmComputePermutationVector() function was rewritten to account for all possible permutation vectors.
- Fixed an ExecuteNetwork unhandled exception when using option --import-inputs-if-aligned.
- Fixed Arm NNAPI Support Library to fail gracefully if device is unavailable.
- Fixed edge cases where some permute vectors for Arm Compute were not converted correctly.
- Fixed bug where GPU backend options were not being correctly passed by our delegate.
- Fixed bug when converting Constants with Per-Axis Quantization.
- Fixed bug where call on SubstituteSubgraph on working copy of subgraph in Optimize fails.
- Fixed segfault in ExecuteNetwork when no operator is supported by Arm NN.
- Fixed bug for slot replacement during UpdateSubgraphViewSlotPointers.
- Fixed bug for ExecuteNetwork using delegate when output is boolean from comparison layer.
Other Changes
- Disabled BF16-Turbo-Mode and removed conversion layers.
- Added Arm NN include directory into build-tool output.
- Code improvement through removal of unused includes.
- Optimization of IsLayerSupported to reduce calls to it.
- Removed deprecated code due to be removed in 23.02.
- Changed Arm NN Support Library to use static libraries instead of object libraries.
- Added option of static build of Execute Network.
- Improved error handling when ExecuteNetwork creates a directory when -F option used.
- Changed ArmNNExecutors to now share a single IRuntime, which allows ExecuteNetwork to create and run multiple Executors instead of one.
- Added documentation relating to multithreading.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 23.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Optimize the calling of IsLayerSupported(). | 5383767 | https://review.mlplatform.org/c/ml/armnn/+/8742 | In class IConnectableLayer: Pure virtual method SetBackendId (BackendId const&) has been added to this class. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. |
When creating multiple Executors only the last one works fine | 5446a4d | https://review.mlplatform.org/c/ml/armnn/+/8997 | In class Delegate: Size of this type has been changed from 688 bytes to 680 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. Type of field m_Runtime has been changed from armnn::IRuntimePtr (16 bytes) to armnn::IRuntime* (8 bytes). Size of the inclusive type has been changed. |
Fix incorrect last layer in Types.hpp | 6701daf | https://review.mlplatform.org/c/ml/armnn/+/8944 | In enum LayerType: Value of member LastLayer has been changed from 66 to 71. Applications may execute a wrong branch of code in the library and therefore change the behavior. |
Change to MemorySource to keep it usable as a bit mask | 1cebf49 | https://review.mlplatform.org/c/ml/armnn/+/9053 | In enum MemorySource: Value of member Gralloc has been changed from 5 to 8. Applications may execute a wrong branch of code in the library and therefore change the behavior. |
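The MemorySource row above can be illustrated with a small sketch. It uses a stand-in enum (`MemorySourceSketch`, a hypothetical name, not the real `armnn::MemorySource`), and the values other than Gralloc are assumed for illustration; only the Gralloc values 5 and 8 come from the table. Under those assumptions, 5 overlaps the bits of two other members when used in a mask, while 8 is a bit of its own.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for illustration only; this is NOT the real armnn::MemorySource.
// All values except the two Gralloc ones are assumptions.
enum class MemorySourceSketch : std::uint32_t
{
    Undefined       = 0,
    Malloc          = 1,
    DmaBuf          = 2,
    DmaBufProtected = 4,
    GrallocOld      = 5, // value before 23.02: same bits as Malloc | DmaBufProtected
    GrallocNew      = 8  // value from 23.02: a distinct bit
};

using MemorySourceFlagsSketch = std::uint32_t;

constexpr MemorySourceFlagsSketch ToFlag(MemorySourceSketch source)
{
    return static_cast<MemorySourceFlagsSketch>(source);
}

// True if every bit of `source` is set in `flags`.
constexpr bool Contains(MemorySourceFlagsSketch flags, MemorySourceSketch source)
{
    return (flags & ToFlag(source)) == ToFlag(source);
}

// A mask that combines Malloc and DmaBufProtected, with no Gralloc involved.
constexpr MemorySourceFlagsSketch MallocAndProtected()
{
    return ToFlag(MemorySourceSketch::Malloc) | ToFlag(MemorySourceSketch::DmaBufProtected);
}
```

With these stand-in values, `assert(Contains(MallocAndProtected(), MemorySourceSketch::GrallocOld));` passes even though Gralloc was never requested, whereas the same check against `GrallocNew` fails as it should.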
The following back-end API changes have occurred during the implementation of 23.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code due to be removed in 23.02 | ec67a0f | https://review.mlplatform.org/c/ml/armnn/+/8319 | In struct Convolution2dQueueDescriptor, DepthwiseConvolution2dQueueDescriptor and FullyConnectedQueueDescriptor: Field m_Bias has been removed from this type. Field m_Weight has been removed from this type. 1) Applications will access incorrect memory when attempting to access this field. 2) Size of the inclusive type has been changed. 3) The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. In class BaseWorkload<Convolution2dQueueDescriptor>, BaseWorkload<DepthwiseConvolution2dQueueDescriptor> and BaseWorkload<FullyConnectedQueueDescriptor>: 1) Size of the inclusive type has been changed. 2) Previous accesses of applications and library functions to this field and fields at higher positions of the structure definition may be broken. 3) The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
Return INetwork* not INetworkPtr& from OptimizationViews::GetINetwork() | 5b2145c | https://review.mlplatform.org/c/ml/armnn/+/8828 | In OptimizationViews::GetINetwork(): Base type of return value has been changed from std::unique_ptr<INetwork, void(*)(INetwork*)> to INetwork. Recompilation of a client program may be broken. |
Allow working copy SubgraphView to get Original Slots | 01f7269 | https://review.mlplatform.org/c/ml/armnn/+/8918 | In class SubgraphView: Base class std::enable_shared_from_this has been added. 1) Size of the class has been changed from 160 bytes to 176 bytes. 2) The memory layout in this class has been shifted by 16 bytes. 3) The class has only inline or auto-generated constructors which will be copied to applications at compile time and will allocate an older memory layout. Call of any exported method of this class may access a memory outside the allocated objects or inside the older memory structure and result in crash or incorrect behavior of applications. 4) The memory layout and size of subclasses will be changed. |
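The size change reported for SubgraphView follows from how `std::enable_shared_from_this` works: the base class stores a weak reference to the object, which on common 64-bit standard library implementations typically occupies 16 bytes. The sketch below uses two hypothetical stand-in structs (`ViewBefore` and `ViewAfter`, not the real `armnn::SubgraphView`) to show the effect of adding the base class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for illustration; not the real armnn::SubgraphView.
struct ViewBefore
{
    void* m_Payload = nullptr;
};

// Same data member, plus the new base class.
struct ViewAfter : public std::enable_shared_from_this<ViewAfter>
{
    void* m_Payload = nullptr;
};

// Number of bytes that the new base class adds to each object.
constexpr std::size_t AddedBytes()
{
    return sizeof(ViewAfter) - sizeof(ViewBefore);
}
```

Code compiled against the older layout would allocate `sizeof(ViewBefore)` bytes, so any library method that expects the newer, larger layout could read or write outside the object. This is why the table treats the change as ABI breaking.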
TfLite Delegate
New features
- Added support for Slice operator.
- Allowed constant tensors to be used as input data in the delegate.
Bug Fixes
- Fixed delegate fallback during VisitNode so that an ArmNN exception is now caught and the process is handed over to TFLite.
- Added an ExpandDims to the FullyConnected to ensure that we reshape the output correctly.
- Fixed delegate fallback when fused activation is unsupported.
- Fixed uncaught warnings treated as errors in delegate release build.
PyArm NN
- Added installation instructions for prebuilt binaries.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
Cmake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
Tensorflow | 2.10.0 |
Onnx | 1.6.0 |
Flatbuffer | 2.0.6 |
Protobuf | 3.12.0 |
Android NDK | r25 |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |
Release 22.11.01
Summary
This is a patch release to fix an issue in the Arm NN Support Library encountered on Android phones where the OpenCL libraries could not be detected.
In this case the 22.11 release detected the issue and threw an exception, but the TensorFlow Lite runtime expected an error code, so fallback to the runtime failed.
In this release an error code is returned when a misconfigured or missing OpenCL installation is encountered, and the TensorFlow Lite runtime takes over execution of the graph as expected.
This 22.11.01 release contains all the features of the Arm NN 22.11 release. The release notes for 22.11 are available at https://github.com/ARM-software/armnn/releases/tag/v22.11.
Release 22.11
Summary
New Features
- ArmNN to TOSA backend:
- Added TOSA Mappings backbone structure with support for Addition operator (Float32).
- Implemented simple TOSA Reference Backend skeleton.
- Implemented TosaRefBackend::OptimizeSubgraphView.
- Integrated TOSA Serialization Library into Arm NN.
- Integrated TOSA Reference Model into Arm NN.
- BATCH_MATMUL:
- Added adjoint and transpose parameters to BATCH_MATMUL layer and CpuRef workload.
- Added support for BATCH_MATMUL to Arm NN Support Library.
- Added support for BATCH_MATMUL FP32 to CpuAcc.
- Added BATCH_MATMUL end to end tests.
- Updated to Android NDK r25.
- Updated to TensorFlow 2.10 and Flatbuffers 2.0.6.
TfLite Parser
- Added BATCH_MATMUL to TFLite Parser.
- Fixed bug where the TFLite Parser failed to prepare a model because of unspecified-size buffer data for the SLICE operator.
- In the TFLite Parser, mapped the TFLite adjoint parameter of the BATCH_MATMUL layer to transpose in Arm NN, because we observed that TFLite performs a transpose when adjoint is true.
- Added support for RESHAPE when output 'shape_signature' parameter contains a value of -1 in TFLite Parser.
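The RESHAPE item above says only that a -1 in 'shape_signature' is now supported; it does not describe how the parser resolves it. As a rough illustration of the usual reshape convention, where a single -1 dimension is inferred from the total element count, here is a minimal sketch. `ResolveReshapeTarget` is a hypothetical helper, not part of the Arm NN or TFLite API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration; not an Arm NN or TFLite function.
// Replaces a single -1 entry in `target` with the dimension implied by
// the number of elements in the input tensor.
std::vector<std::int32_t> ResolveReshapeTarget(std::size_t inputElements,
                                               std::vector<std::int32_t> target)
{
    std::size_t knownProduct = 1;
    std::ptrdiff_t unknownIndex = -1;

    for (std::size_t i = 0; i < target.size(); ++i)
    {
        if (target[i] == -1)
        {
            if (unknownIndex != -1)
            {
                throw std::invalid_argument("only one dimension may be -1");
            }
            unknownIndex = static_cast<std::ptrdiff_t>(i);
        }
        else if (target[i] < 1)
        {
            throw std::invalid_argument("dimensions must be positive or -1");
        }
        else
        {
            knownProduct *= static_cast<std::size_t>(target[i]);
        }
    }

    if (unknownIndex == -1)
    {
        // Nothing to infer: the shape must already match the element count.
        if (knownProduct != inputElements)
        {
            throw std::invalid_argument("target shape does not match element count");
        }
        return target;
    }

    if (inputElements % knownProduct != 0)
    {
        throw std::invalid_argument("element count is not divisible by known dimensions");
    }

    target[static_cast<std::size_t>(unknownIndex)] =
        static_cast<std::int32_t>(inputElements / knownProduct);
    return target;
}
```

For a tensor with 24 elements and a target of `{2, -1, 4}`, the unknown dimension resolves to 3, so `assert(ResolveReshapeTarget(24, {2, -1, 4})[1] == 3);` holds.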
ArmNN Serializer/Deserializer
- Added support for BATCH_MATMUL to Serializer/Deserializer.
Bug Fixes
- Fixed bug in SubgraphView::SubstituteSubgraph where IOutputSlots were incorrectly overridden.
- Fixed bug in ExecuteNetwork when iterations and input files are not matching.
- Updated SubgraphView Selector to give deterministic results.
- Fixed bug in ArmNNExecutor where errors from LoadNetwork were being ignored.
- Fixed bug with debug mode not working correctly with Constant Tensors as Inputs.
- Fixed incorrect kernel measurement in profiling output.
- Fixed ExecuteNetwork for multiple outputs.
- Fixed the AllowExpandedDims option so that it works.
- Fixed output format issue for int8 when using -w in ExecuteNetwork.
Other Changes
- Added runtime options to Doxygen.
- Added a message deprecating use of the master branch; the main branch is now used.
- Removed deprecated code due to be removed in 22.08, as we could not do this in 22.08.
- Removed deprecated code due to be removed in 22.11.
- Delayed the removal of deprecated weights and bias by one release.
- Generalized get_compute_library.sh usage.
- Used ARMNN_VERSION for the Support Library version string.
- Removed aarch32 build from build-tool.
- Forward declared ILocalPacketHandlerSharedPtr in IRuntime.hpp.
- Used a stricter file extension check in CreateParser.
Note: Following the upgrades to TensorFlow 2.10 and Flatbuffers 2.0.6, a compiler that supports C++17 is now required. This will prevent compilation on some older operating systems, e.g. Debian 9.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.11 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code 22.08 | 48f9d5d | https://review.mlplatform.org/c/ml/armnn/+/8167 | Removed Symbols: INetwork::AddConvolution2dLayer ( struct Convolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, Optional<ConstTensor> const& biases, char const* name ). INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, Optional<ConstTensor> const& biases, char const* name ). |
Implement simple TOSA Reference Backend skeleton | ae8a6f5 | https://review.mlplatform.org/c/ml/armnn/+/8082 | In class MockWorkloadFactory the following has changed: |
Fix AllowExpandedDims option | 16c76d5 | https://review.mlplatform.org/c/ml/armnn/+/8419 | Added Symbols: |
Add functionality to print output tensors to file | 7bbf565 | https://review.mlplatform.org/c/ml/armnn/+/8421 | struct OptimizerOptions: Field m_DebugToFile has been added at the middle position of this structural type. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications. |
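The OptimizerOptions row above describes a field added in the middle of a struct. The sketch below shows why that breaks binary compatibility, using two hypothetical stand-in structs; apart from `m_DebugToFile`, the field names are placeholders and do not reflect the real layout of `armnn::OptimizerOptions`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout before the change (placeholder field names).
struct OptionsBefore
{
    bool m_FlagA;
    bool m_Debug;
    bool m_FlagB;
};

// Hypothetical layout after the change: m_DebugToFile is inserted in the middle.
struct OptionsAfter
{
    bool m_FlagA;
    bool m_Debug;
    bool m_DebugToFile;
    bool m_FlagB;
};

// Byte offset of m_FlagB as seen by code compiled against the old header.
constexpr std::size_t OldOffsetOfFlagB()
{
    return offsetof(OptionsBefore, m_FlagB);
}

// Byte offset of m_FlagB as seen by code compiled against the new header.
constexpr std::size_t NewOffsetOfFlagB()
{
    return offsetof(OptionsAfter, m_FlagB);
}
```

In this sketch an application built against the old header would write `m_FlagB` at the offset where the new library expects `m_DebugToFile`, which is the kind of incorrect access the table warns about. Recompiling against the new headers avoids it.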
The following back-end API changes have occurred during the implementation of 22.11 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Implement simple TOSA Reference Backend skeleton | ae8a6f5 | https://review.mlplatform.org/c/ml/armnn/+/8082 | In ILayerSupport.hpp: Changed pure virtual function IsChannelShuffleSupported to virtual function. In WorkloadFactory.hpp: The relative position of virtual function CreateInput(const InputQueueDescriptor& descriptor, const WorkloadInfo& info) const has been moved. The layout of v-table has been changed. Call of these virtual methods may result in crash or incorrect behavior of applications. |
Fix AllowExpandedDims option | 16c76d5 | https://review.mlplatform.org/c/ml/armnn/+/8419 | const has been added for the constructor OptimizationViews(const NetworkOptions& networkOptions = {}) : m_INetwork(INetwork::Create(networkOptions)). As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications. |
Remove deprecated code 22.08 | d1628bf | https://review.mlplatform.org/c/ml/armnn/+/8167 | In WorkloadData.hpp: ResizeBilinearQueueDescriptor has been removed. |
TfLite Delegate
New features
- Added a no fallback mode to the TfLite Delegate. This should only be used for testing purposes.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
Cmake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
Tensorflow | 2.10.0 |
Onnx | 1.6.0 |
Flatbuffer | 2.0.6 |
Protobuf | 3.12.0 |
Android NDK | r25 |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
xxd | 1.10 |