Releases: ARM-software/armnn
Release 22.08
Summary
New Features
- Add Arm NN Support Library.
- The Arm NN Support Library for Android NNAPI is a shared library that provides all the functionality of the existing HAL drivers for Android NNAPI.
- It is available from Android S onwards.
- It focuses on the updatability of ML operators.
- A guide on how to build the Arm NN Support Library is available at armnn/shim/BuildGuideShimSupportLibrary.md.
- SLTS (Support Library Test Suite) compliance.
- Support for Batch MatMul in CpuRef.
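Batch MatMul multiplies corresponding matrix pairs along a leading batch dimension. The sketch below illustrates the operator's semantics in NumPy; it is an illustration only, not the Arm NN API:

```python
import numpy as np

# Batch MatMul: one matrix multiply per batch entry.
# Shapes: (B, M, K) x (B, K, N) -> (B, M, N).
a = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
b = np.ones((2, 3, 4), dtype=np.float32)

out = np.matmul(a, b)
assert out.shape == (2, 2, 4)

# Equivalent to looping over the batch dimension explicitly.
ref = np.stack([a[i] @ b[i] for i in range(2)])
assert np.allclose(out, ref)
```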
TfLite Parser
- Added support for LOG.
- Added support for SIN.
ExecuteNetwork App Changes:
- Refactored ExecuteNetwork: the input name, input type, output name, output type and model type are now read from the model.
Arm NN Build Tool:
- Introduced Arm NN Build Tool which consists of an official Arm NN Dockerfile for building Arm NN and Arm Compute Library (ACL).
- This tool replaces the majority of our existing build guides as a user-friendly way to build Arm NN (and its dependencies) from scratch.
- Tested on x86_64 (Intel) and aarch64 (Arm) build hosts for the Ubuntu platform.
- Currently supports targeting Linux devices (from Ubuntu 18.04 onwards) on x86_64, aarch32 and aarch64 architectures.
Bug Fixes
- Models in the .armnn format (serialized models) were failing in 22.05; this has been fixed by adding the constant layers before the operator layers.
- Fixed a quantization bug when folding padding into Neon average pooling 2D.
- Fixed a segmentation fault when running with --bf16-turbo-mode on FPGA.
Other Changes
- General documentation refactor and updates.
- Added LICENSE.spdx for Arm NN.
- Delayed backend deprecation from 22.11 to 23.08.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Import inputs but don't export outputs fails | 626bd90 | https://review.mlplatform.org/c/ml/armnn/+/7661 | Field m_ExportEnabled has been added to type OptimizerOptions. This field will not be initialized by old clients that have not been recompiled. |
Get non-const IConnectableLayer from I/O slots | 09fa24d | https://review.mlplatform.org/c/ml/armnn/+/7835 | Pure virtual method GetOwningIConnectableLayer ( ) has been added to classes IOutputSlot and IInputSlot. |
Remove deprecated code 22.05 | 4d2eec0 | https://review.mlplatform.org/c/ml/armnn/+/7712 | Removed Symbols: |
Modified SubgraphView returned by GetWorkingCopy() | cea3d49 | https://review.mlplatform.org/c/ml/armnn/+/7852 | Pure virtual method GetSlotIndex ( ) const has been added to class IInputSlot. |
Update the async api to use ExecutionData | 21a6a1a | https://review.mlplatform.org/c/ml/armnn/+/7878 | Pure virtual method GetExecutionDataAt ( unsigned int ) has been added to class experimental::IWorkingMemHandle. |
The following back-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Update the async api to use ExecutionData | 21a6a1a | https://review.mlplatform.org/c/ml/armnn/+/8051/2 | The following virtual functions have been added to class IBackendInternal: |
Add GetMemoryRequirements to IWorkload | 5e09080 | https://review.mlplatform.org/c/ml/armnn/+/7886 | The following virtual function has been added to class IWorkload: |
Modified SubgraphView returned by GetWorkingCopy() | cea3d49 | https://review.mlplatform.org/c/ml/armnn/+/7852 | The signature of SubgraphView::GetWorkingCopy() has changed; it is now marked as const to reflect the fact that the graph represented by the working copy does not get altered. |
TfLite Delegate
New features
- Added support for LOG
- Added support for SIN
- Add JNI interface
Bug Fixes
- Fix running MobileBERT on CpuRef
- Only use the macro ARMNN_TFLITE_DELEGATE
- Fixed errors in DelegateQuickStartGuide.md
PyArmNN
- Updated documentation on running PyArmNN with the ONNX parser.
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
CMake | 3.19.0 |
TensorFlow | 2.5.0 |
ONNX | 1.6.0 |
FlatBuffers | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
Release 22.05.01
Summary
New Features
This is a patch release of 22.05 in which we have implemented the Pooling3d custom operator for the Arm NN TfLite Delegate. This feature is available in the 22.05 release branch itself (branches/armnn_22_05) and in the tag created for patch release v22.05.01.
Release 22.05
Summary
New Features
- ArmnnTestUtils is now versioned and under ABI compliance checker
- Added support for Int32 CONCATENATION layer for CpuRef
- Added support for Float32 Unidirectional Sequence LSTM layer for CpuAcc and GpuAcc
- Added support for GatherNd for CpuRef, CpuAcc and GpuAcc
- Added support for SQRT for CpuAcc and GpuAcc
- Added support for Depthwise Convolution2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
- Added support for Conv2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
- Added support for Fully Connected ConstTensorsAsInput for CpuAcc and GpuAcc
- Added support for MaxPool3D and AveragePool3D for CpuAcc and GpuAcc
- Added support for L2Pooling3D for GpuAcc
- Added support for UnidirectionalLSTM for CpuAcc
- ConstTensorsAsInput: Optimizer Fix - FuseBatchNorm
- ConstTensorsAsInput: Optimizer Fix - FoldPadIntoConvolution2d
- ConstTensorsAsInput: Optimizer Fix - Fp32ToBf16 optimization
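GatherNd, newly supported above, treats each row of the indices tensor as a multi-dimensional coordinate into the params tensor. A minimal NumPy sketch of the operator's semantics (an illustration, not Arm NN code; the helper name is made up):

```python
import numpy as np

def gather_nd(params, indices):
    """GatherNd semantics: each row of `indices` (shape (..., R)) selects
    one slice of `params` by indexing its first R dimensions; the output
    keeps the remaining dimensions of `params`."""
    indices = np.asarray(indices)
    # Move the coordinate components to the front and use them as a
    # tuple of per-dimension index arrays (NumPy advanced indexing).
    return params[tuple(np.moveaxis(indices, -1, 0))]

params = np.array([[1, 2], [3, 4]])
# Full coordinates (R == params.ndim) select scalars: (0,0) and (1,1).
assert gather_nd(params, np.array([[0, 0], [1, 1]])).tolist() == [1, 4]
# Partial coordinates (R == 1) select whole rows.
assert gather_nd(params, np.array([[1], [0]])).tolist() == [[3, 4], [1, 2]]
```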
TfLite Parser
- Added support for GatherNd
- Added support for FloorDiv
- Added support for UnidirectionalLSTM
- Do not create Floor for FloorDiv layer when the data type is int32
ArmNN Serializer/Deserializer
- Added support for GatherNd
ExecuteNetwork App Changes:
- Added Reuse IO Buffers mode
- Deprecated the weights and bias JSON keys in profiling details. They will be removed in 22.08.
Bug Fixes
- Fixed a crash in profiling
- Fixed an issue running the SimpleSample app on Raspberry Pi
- Removed MockBackend.hpp from armnn/src/backends/backendsCommon/test/ to solve problems when using Visual Studio on Windows
- Fixed a segfault in the RefDepthwiseConvolution2d workload
Other Changes
- ArmNN Baremetal
- Change the namespace from armnn::profiling to arm::pipe
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Change the namespace from armnn::profiling to arm::pipe | 5aa9fd7 | https://review.mlplatform.org/c/ml/armnn/+/7222 | IRuntime::RegisterDebugCallback |
Replace ProfilingService includes with IProfilingService. | af94772 | https://review.mlplatform.org/c/ml/armnn/+/7240 | The following function has had a change in signature meaning it will not be recognized by old applications. BackendRegistry::SetProfilingService |
Remove dependency on armnn::Exception classes from the Profiling code | f9db3ef | https://review.mlplatform.org/c/ml/armnn/+/7280 | Class armnn::BackendProfilingException has been moved to namespace arm::pipe; this will result in older applications not being able to find it. |
Replace armnn:Optional with arm::pipe::Optional in profiling code | decd08b | https://review.mlplatform.org/c/ml/armnn/+/7295 | Class armnn::TimeoutException has been moved to namespace arm::pipe; this will result in older applications not being able to find it. |
Add Unidirectional Sequence Lstm support to TFLite | 5880b91 | https://review.mlplatform.org/c/ml/armnn/+/7023 | The following fields have been added to struct LstmDescriptor: m_CellIntermediateScale m_ForgetIntermediateScale m_HiddenStateScale m_HiddenStateZeroPoint m_InputIntermediateScale m_OutputIntermediateScale. As a result, the size of the struct has changed. |
ConstTensorsAsInput: DepthwiseConvolution2d | 0690265 | https://review.mlplatform.org/c/ml/armnn/+/7417 | Pure virtual method VisitDepthwiseConvolution2dLayer ( IConnectableLayer const*, struct DepthwiseConvolution2dDescriptor const&, char const* ) has been added to this class. |
ConstTensorsAsInput: Conv2d - FrontEnd | b4dd5cc | https://review.mlplatform.org/c/ml/armnn/+/7382 | Pure virtual method VisitConvolution2dLayer ( IConnectableLayer const*, struct Convolution2dDescriptor const&, char const* ) has been added to this class. |
The following back-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Move headers to profiling/client/include | 2776183 | https://review.mlplatform.org/c/ml/armnn/+/7327 | Headers have been moved to profiling/client/include. |
Change the namespace from armnn::profiling to arm::pipe | 5aa9fd7 | https://review.mlplatform.org/c/ml/armnn/+/7222 |
TfLite Delegate
New features
- Added support for GatherNd
Bug Fixes
Note: Arm NN is aware of an issue where converting a model to .armnn will yield unpredictable results when reading it back in through the deserializer. This is because the serializer depends on graph topology and the graph can be out of order, due to the additional constant layers as inputs that are created through the parsers.
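The ordering problem described in this note can be sketched in a few lines: a one-pass deserializer can only connect a layer once all of its inputs already exist, so constant layers serialized after their consumers break the pass. This is a hypothetical toy model of the behaviour, not Arm NN code:

```python
# Toy one-pass deserializer: a layer can only be connected if every
# layer it reads from has already been created earlier in the stream.
def deserialize(layers):
    created = set()
    for name, inputs in layers:
        if not all(i in created for i in inputs):
            raise ValueError(f"layer {name} read before its input")
        created.add(name)
    return created

# Out of order: a constant weights layer serialized after its consumer fails.
bad = [("conv", ["input", "weights"]), ("input", []), ("weights", [])]
try:
    deserialize(bad)
    failed = False
except ValueError:
    failed = True
assert failed

# The 22.08 fix above: emit constant layers before the operator layers,
# which restores a valid topological order.
good = [("input", []), ("weights", []), ("conv", ["input", "weights"])]
assert deserialize(good) == {"input", "weights", "conv"}
```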
PyArmNN
- Added support for GatherNd
- Added Pooling3D
Build Dependencies
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
TensorFlow | 2.5.0 |
ONNX | 1.6.0 |
FlatBuffers | 1.12.0 |
Protobuf | 3.12.0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
cxxopts | SHA 12e496da3d486b87fa9df43edea65232ed852510 |
doctest | 2.4.6 |
fmt | 7.0.1 |
ghc | 1.3.2 |
half | 1.12.0 |
stb | 2.16 |
Android 12 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-12.0.0_r1 | SP1A.210812.015 | r36p0_01eac0-rc0 | 12_r2 (7987736) | 12_r2 (7973604) |
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r6 | RPM1.210413.002 | r33p0_01eac0 | 11_r5 (7640833) | 11_r5 (7599184) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |
Release 22.02
Summary
New Features
- Add mirror padding support on Pad layer for CpuAcc and GpuAcc.
- Add support for Pool3d FrontEnd, Reference implementation.
TfLite Parser
- Added missing support for reshape operator when the target shape is dynamic and batch size is unknown.
- Added PadV2 support.
- Changed asserts to CHECK in ParserFlatbuffersFixture.hpp.
ArmNN Serializer/Deserializer
- Add support for Pool3d.
Bug Fixes
- Added bounds checking when indexing PermutationVector elements and its correspondent unit tests.
- Fixed output bindings in ExecuteNetwork when using delegate with models with multiple outputs.
- Fixed build issues in x86 Dockerfile.
- Fixed ExecuteNetwork printing the inference time twice.
- Fixed thread safety issues in TimelineDecoder and associated unit tests.
- Fixed some Thread Sanitizer warnings.
- Added check for existing event to fix issue on OpenCL Timer.
- Fixed logging bug where blank messages were being sent.
- Fixed issues on Logging API.
- Fixed the async execute test on 32-bit Raspberry Pi.
Other Changes
- Removed references to blacklist from Model Accuracy tool.
- Removed deprecated code.
- Added ModelOptions and additional timing to ARMNN_LOG.
- Added get_tensorflow.sh script.
- Updated build guides.
- Updated error messages from the flatbuffers parser.
- Added the C++ KWS example.
- Handled optional biases better in Neon/Cl FullyConnected workloads.
- Stabilise the Backend API:
- Backend developers should now be able to limit includes to headers in include/armnn/backends/
- Moved CompatibleTypes.hpp to the armnnUtils library.
- Added forwarding header for src/armnn/CompatibleTypes.hpp.
- Moved the ArmNN Test Utils code to a physically separate directory.
- Added new method AddPrecompiledLayer() to INetwork.
- Promoted backend headers in backendCommon to armnn/backends.
- Used INetwork rather than Graph for holding layers for OptimizationViews.
- Used IConnectableLayer in SubgraphView rather than Layer in its m_Layers.
- Stabilised the IWorkloadFactory interface with unified strategy.
- Stabilised the ILayerSupport interface with unified strategy.
- Moved SubgraphView to backends include folder.
- Added GetParameters to IConnectableLayer.
- Exposed a new MockWorkloadFactory and MockMemManager.
- Accessing ConstTensors from IConnectableLayer
- Added method of returning a GetSubgraphWorkingCopy (SubgraphView).
- Moved MemCopyTestImpl from acl to armnnTestUtils.
- Support Import of Aligned Host Memory in NNAPI:
- Added CanBeImported to ITensorHandle.
- Implemented CanBeImported function in RefTensorHandle.
- Implemented CanBeImported function in NeonTensorHandle.
- Implemented CanBeImported function in ClTensorHandle.
- Added functionality for CopyAndImportFactoryPair to TensorHandleFactoryRegistry.
- Register CopyAndImportFactoryPairs to RefBackend and unit tests.
- Register CopyAndImportFactoryPairs to NeonBackend and unit tests.
- Register CopyAndImportFactoryPairs to ClBackend and unit tests.
- Added ReplaceTensorHandle functions to IWorkload and BaseWorkload.
- Added ClBaseWorkload and NeonBaseWorkload.
- Modified workloads to extend Neon/Cl BaseWorkload.
- Added ReplaceTensorHandle functions to Neon/CL BaseWorkloads.
- Implemented ICLTensorProxy.
- Added input and output workload slot pairs to LoadedNetwork.
- Added support of aligned host memory.
- Added Forced Import EndToEnd tests to Ref, Neon, and CL.
- Call Cl sync after EnqueueWorkload
- Added EndToEnd tests on reference backend to ensure allocated data can be reused.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
SubgraphView uses IConnectableLayer rather than Layer in its m_Layers | 56ccf68 | https://review.mlplatform.org/c/ml/armnn/+/6807 | Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot.: |
Stabilize the ILayerSupport interface with unified strategy. | 34b429c | https://review.mlplatform.org/c/ml/armnn/+/6903 | Virtual destructor added to the struct BaseDescriptor; as a result the size of all descriptors has been changed. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
SubgraphView: Add method of returning a GetSubgraphWorkingCopy. | 9d74ba6 | https://review.mlplatform.org/c/ml/armnn/+/6995 | Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IInputSlot. |
Add support of aligned host memory | e2af6f4 | https://review.mlplatform.org/c/ml/armnn/+/7025 | The following functions have had a change in signature meaning they will not be recognized by old applications: IRuntime::EnqueueWorkload() accepts two new parameters preImportedInputIds and preImportedOutputIds. IRuntime::ImportInputs() accepts a new parameter forceImportMemorySource. IRuntime::ImportOutputs() accepts a new parameter forceImportMemorySource. |
Add GetParameters to IConnectableLayer | e466596 | https://review.mlplatform.org/c/ml/armnn/+/7031 | Pure virtual method GetParameters ( ) const has been added to class IConnectableLayer. Virtual method IsNull ( ) const has been added to class BaseDescriptor. |
Accessing ConstTensors from IConnectableLayer | 2e24175 | https://review.mlplatform.org/c/ml/armnn/+/7040 | Pure virtual method GetConstantTensorsByRef ( ) has been added to class IConnectableLayer. |
Remove deprecated code 22.02 | b28e525 | https://review.mlplatform.org/c/ml/armnn/+/7104 | Deprecated LayerSupport.hpp and included IsXXXLayerSupported() functions have been removed as they have been replaced with ABI Stable ILayerSupport interface and the BackendHelper.hpp GetILayerSupportByBackendId() function. |
The following back-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add a Pooling3d FrontEnd and Ref Implementation | 7b885b3 | https://review.mlplatform.org/c/ml/armnn/+/6511 | ILayerSupport.hpp |
Stabilize the ILayerSupport interface with unified strategy. | 34b429c | https://review.mlplatform.org/c/ml/armnn/+/6903 | |
Stabilize the IWorkloadFactory interface with unified strategy | 611c7fb | https://review.mlplatform.org/c/ml/armnn/+/6906 |
TfLite Delegate
New features
- Added Delegate cross compile to x86 Dockerfile
- Added constant input supports for Pack/Stack, Concatenation operators
- Added Int32 supp...
Release 21.11
Arm NN 21.11 was focused on providing new capabilities and improving performance:
New Features
- Added support for Reduce Prod.
- Added support for Channel Shuffle.
- Added support for Conv3d.
- Added support for Symmetric and Reflect Padding on CpuRef backend.
- Added support for statically linking ArmNN TfLite Delegate against Tensorflow Lite.
- Added Import Input/Output functions to async API, allowing for imported I/O buffers to be used by multiple network executions.
- Added external memory manager that allows for customization of network memory management ( Note: currently only fully supported on the CpuRef Backend ).
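The Symmetric and Reflect padding modes added above differ only in whether the edge element itself is repeated in the mirror. NumPy's `np.pad` implements both modes and illustrates the semantics (an illustration of the padding behaviour, not the Arm NN API):

```python
import numpy as np

x = np.array([1, 2, 3])

# Reflect padding mirrors about the edge element, excluding it.
assert np.pad(x, 2, mode="reflect").tolist() == [3, 2, 1, 2, 3, 2, 1]

# Symmetric padding mirrors including the edge element.
assert np.pad(x, 2, mode="symmetric").tolist() == [2, 1, 1, 2, 3, 3, 2]
```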
TfLite Parser
- Added support for Reduce Prod.
- Added support for Conv3d.
- Added support for MirrorPad.
- Added support for a size of -1 for Slice.
ONNX Parser
- Add support for Concat
- Add support for Gather
- Add support for Gemm
- The parser supports constant bias or non-constant bias where bias dimension = 1.
- Add support for Shape
- Add support for Unsqueeze
- Add support of min/max as attribute for Clip
ArmNN Serializer/Deserializer
- Add support for Reduce Prod.
- Add support for Channel Shuffle.
- Add support for Conv3d.
- Add support for Symmetric and Reflect Padding.
ExecuteNetwork App Changes
- Added 'do-not-print-output' option to ExecuteNetwork.
Bug Fixes
- Using output-network-details or output-network-details-only during ExecuteNetwork profiling created an invalid JSON format. This has since been fixed.
- Fixed an undefined reinterpret_cast in BFloat16.hpp. This fixes gcc builds with version 8 or above.
- Fixed the format of the delegate JSON output.
- Fixed a bug related to the constant tensor flag.
- Fixed pyarmnn py35 unit tests.
Other Changes
- Added sample app for asynchronous execution.
- Printed new Optimize and LoadedNetwork profiling points.
- Added new serialized model supported on Netron.
- Made it possible for backends to add include paths in Android.
- Changed order of the Doxygen tree.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 27.0.0, the Delegate to 25.0.0 and our Parsers to 24.3.0, following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code | 1b2654f | https://review.mlplatform.org/c/ml/armnn/+/6254 | Removed Symbols: Removed pure virtual methods, resulting in change to v-table layout: Removed DataTypes: |
'IMemoryOptimizerStrategy Add strategy library and add support in BackendRegistry' | b8a26d8 | https://review.mlplatform.org/c/ml/armnn/+/6297 | struct IRuntime::CreationOptions: |
Add missing runtime parameters to TfLite delegate. | 3e32a87 | https://review.mlplatform.org/c/ml/armnn/+/6388 | class Delegate: class DelegateOptions had the following fields added and so the size of the inclusive type has been changed. |
Profiling instrumentation throughout the Optimizer | f1e0ad3 | https://review.mlplatform.org/c/ml/armnn/+/6432 | struct OptimizerOptions: class Delegate: class DelegateOptions: Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. This is due to addition of m_ProfilingEnabled to the OptimizerOptions used in constructors of both Delegate classes. |
Fix armnn_external_delegate option parsing | b1c62f1 | https://review.mlplatform.org/c/ml/armnn/+/6519 | class Delegate: class DelegateOptions: Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. |
Support the new memory API in loaded network | b1aad42 | https://review.mlplatform.org/c/ml/armnn/+/6552 | class INetworkProperties: The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
The following back-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Remove deprecated code | 1b2654f | https://review.mlplatform.org/c/ml/armnn/+/6254 | IBackendInternal.hpp Removed Symbols: Removed Aliases: ILayerSupport.hpp Removed Symbols: |
Add Channel Shuffle Front end and Ref Implementation | 51f6777 | https://review.mlplatform.org/c/ml/armnn/+/6211 | ILayerSupport.hpp |
Add Conv3d FrontEnd and Ref Implementation | b63a311... |
Release 21.08
Summary
Arm NN 21.08 was focused on providing new capabilities and improving performance:
- Added the ability to import protected DMA buffers and allow Arm NN to run inferences on data held in protected GPU memory, along with a Custom Memory Allocator which supports importing malloc, dma_buf and protected DMA buffers.
- Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
- Boost has been completely removed from the code base, making Arm NN easier to integrate into other software stacks.
- Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
- More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.
New Features
- Moved unit tests from BOOST to doctest.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Reduce Operator can now support multiple axes.
- Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
- Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (Only LOG is supported) and GpuAcc backends.
- Added SHAPE Operator support on CpuRef backend.
- Moved useful test utilities to new static library (libarmnnTestUtils.a).
- Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
- Arm NN TfLite Delegate Image Classification sample application added to samples directory.
- Added fully comprehensive Arm NN Operator list page to Doxygen.
- Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
- Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
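The depthwise weights relayout listed above, from [M,I,H,W] to [1,H,W,I*M], amounts to a transpose followed by a reshape: the element at (m, i, h, w) moves to channel index i*M + m. A NumPy sketch of the documented transform (an illustration, not Arm NN code):

```python
import numpy as np

M, I, H, W = 2, 3, 1, 1  # depth multiplier, input channels, kernel size
old = np.arange(M * I * H * W).reshape(M, I, H, W)  # old [M, I, H, W] layout

# New layout [1, H, W, I*M]: the channel axis interleaves input channel (i)
# and depth multiplier (m) as i * M + m.
new = old.transpose(2, 3, 1, 0).reshape(1, H, W, I * M)

for m in range(M):
    for i in range(I):
        assert new[0, 0, 0, i * M + m] == old[m, i, 0, 0]
```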
TfLite Parser
- EXPAND_DIMS Operator support added.
- PRELU Operator support added.
- SHAPE Operator support added.
- Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
- When creating an instance of the ITfLiteParser for a dynamic model, please ensure that m_InferAndValidate is set in the TfLiteParserOptions and that m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
ArmNN Serializer/Deserializer
- Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
- Added SIN and LOG support to ElementWiseUnary Operator.
- UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.
ExecuteNetwork App Changes
- Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
- Added option to specify different input data for every iteration of ExecuteNetwork.
- Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.
NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.
Bug Fixes
- Removed duplicate check for Dequantize input type when checking if operator is supported.
- Fixed undefined behaviour in PolymorphicDowncast.
- Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
- Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
- Fixed cl_ext.h include path in CL backend.
- Fixed bugs in PreCompiledLayer, e.g. a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
- Fixed gcc 9.3.0 compiler warning in TfLiteParser.
- Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.
Other Changes
- Print Elementwise and Comparison Operator descriptors in a dot graph.
- Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
- Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Rework the async threadpool | f364d53 | https://review.mlplatform.org/c/ml/armnn/+/5801 | struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes. class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class. class IAsyncExecutionCallback: The following methods have been removed: |
Add IsConstant flag to TensorInfo | b082ed0 | https://review.mlplatform.org/c/ml/armnn/+/5842 | An object of this class can be allocated by applications which the old size will be hardcoded at original compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap. struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications. |
Add protected mode to ArmNN CreationOptions | 15fcc7e | https://review.mlplatform.org/c/ml/armnn/+/5963 | |
Add the Custom Memory Allocator interface definition | 801e2d5 | https://review.mlplatform.org/c/ml/armnn/+/5967 | |
Add front end support for UnidirectionalSequenceLstm on ArmNN | 8ed39ae | https://review.mlplatform.org/c/ml/armnn/+/5956 | |
JSON profiling output | 554fa09 | https://review.mlplatform.org/c/ml/armnn/+/5968 | |
ConstTensorsAsInput: FullyConnected | 81beae3 | https://review.mlplatform.org/c/ml/armnn/+/5942 | |
Adds CustomAllocator interface and Sample App | c1c872f | https://review.mlplatform.org/c/ml/armnn/+/5987 | class BackendRegistry: Fie... |
Release 21.05
Summary
The 21.05 Release of Arm NN was focused on providing new capabilities to allow users to attain higher performance by:
- Making the Arm NN Core thread safe, opening the possibility of running multiple inferences on the same model in parallel software threads.
- Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.
In addition to this, support was added to allow the MobileBERT network to be parsed and run.
Finally, three deprecated components (the Tensorflow Parser, the Caffe Parser and the Arm NN Quantizer tool) were removed.
New Features
- CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
- Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
- Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
- Asynchronous Network Execution for CpuRef Backend.
- Optimisation added to fuse PAD into Pooling2d if possible.
- ASR sample application added to samples directory.
TfLite Parser
- ABS Operator Support added.
- ARG_MIN Operator Support added.
- CAST Operator Support added.
- LOGICAL_NOT Operator Support added.
- RSQRT Operator Support added.
- Non-const weights support added on FULLY_CONNECTED layer.
- Turn off Biases when data location is -1 (Added to support MobileBERT).
ArmNN Serializer/Deserializer
- Added Signed64 support to Serializer and Deserializer.
- Added QAsymmS8 support to Serializer.
- Added L2 Pooling algorithm to Deserializer.
ExecuteNetwork App Changes
- Asynchronous Network Execution support (Currently for CpuRef Backend).
- Re-enabled GPU profiling in ExecuteNetwork.
Deprecated features
- Deprecated the Caffe Parser.
- Deprecated the Tensorflow Parser.
- Deprecated the Arm NN Quantizer tool.
- Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.
Bug Fixes
- Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
- Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
- Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
- Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
- Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
- Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
- Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
- Fixed x86_64 ArmNN DockerFile.
- Fixed TuningLevel enumeration values to be consistent.
- Fixed YoloV3 test application's incorrect use of std::abs.
- Improved performance on SqueezeNet v1.1.
Other Changes
- Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
- Moved doctest third-party library to armnn from delegate.
- Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
- Updated Cross Compiling Guide.
- Improved Graph memory usage.
Known Issues
- Intermittent issue with Dma Buf memory import on GPU. This is fixed in Mali Driver r30p0.
- There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.
ABI/API Changes
The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
Add Async Queue to IRuntime | e813d67 | https://review.mlplatform.org/c/ml/armnn/+/5493 | |
Add front-end support for CAST + Add TfLiteParser support for CAST | b392e98 | https://review.mlplatform.org/c/ml/armnn/+/5374 | |
Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory | 73d3e2e | https://review.mlplatform.org/c/ml/armnn/+/5481 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | f0a6dec | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Refactor Async Network API | 55a8ffd | https://review.mlplatform.org/c/ml/armnn/+/5365 | |
Remove cross-wiring in depthwise | 7612bd6 | https://review.mlplatform.org/c/ml/armnn/+/5411 | |
Remove Quantizer | 4a621c4 | https://review.mlplatform.org/c/ml/armnn/+/5486 | |
The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.
Feature | SHA | Gerrit Review | Resultant ABI/API changes |
---|---|---|---|
NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator | 16fb1a2 | https://review.mlplatform.org/c/ml/armnn/+/5180 | |
Move ILayerSupport.hpp to backends folder | cae4568 | https://review.mlplatform.org/c/ml/armnn/+/5500 | |
Generalise ConstCpuTensorHandle | 1f58f03 | https://review.mlplatform.org/c/ml/armnn/+/5515 | |
Enable import on GPU | e5f0b24 | https://review.mlplatform.org/c/ml/armnn/+/5605 | |
|
Release 21.02
Summary
The 21.02 Release provides two major pieces of functionality. The first is performance related: the ability to cache compiled OpenCL kernels when running on the GPU backend. Cached kernel files can be loaded into the runtime, eliminating the cost of compiling their associated graphs and resulting in a significant performance uplift on the first execution of a newly loaded graph. The second is that the operators which were not added to the Arm NN TensorFlow Lite delegate in the 20.11 release are now present, giving the delegate the same level of operator support as the android-nn-driver.
The other features of the 21.02 release are an update of the TensorFlow Lite parser to work with TensorFlow Lite v2.3.1 and changes to the public APIs to make binary compatibility between releases easier to maintain. Each group of public interfaces (SDK, backend, TfLiteDelegate, etc.) has been separately versioned and will have its version independently updated in subsequent releases to indicate changes in its Application Binary Interface (ABI).
Support has also been added for the SSD-MobileNetv2 and SSD-MobileNetv3 models. The models have been verified to execute correctly with good performance. Work to generate accuracy figures for the models using the TensorFlow Lite coco_object_detection tool is ongoing and will be published when complete.
Two configuration options for the CpuAcc backend have been added: one to specify the number of threads to use when executing ML workloads on the CPU, the other to load an MLGO tuning file to increase the performance of GEMM operations on the CPU.
New Features:
- Added ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
- This will remove the time taken for initial compilation of OpenCL kernels and speed up the first execution.
- Semantic Versioning for ArmNN APIs.
- Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
- Further operator support.
- Add capability to build on Android.
- Verification of Support of SSD-MobileNetv2 & SSD-MobileNetv3.
TfLite Parser
- Added support for ELU activation.
- Support Dilation in Conv2D.
ONNX Parser
- Support Dilation in Conv2D.
Caffe Parser
- Added Dilation support.
- Added argmax deconv support.
ArmNN Serializer
- Serialise ArmNN Model on android-nn-driver.
Public API Changes:
Backend API Changes:
ExecuteNetwork App Changes:
- Two optimization parameters were added to enable saving and loading of the ClContext.
- save-cached-network
- cached-network-filepath
Other changes:
- Make it easier for backends to traverse the subgraph during optimization by sorting Subgraphview layers on construction.
- Added CL/NEON implementation of RANK Workload.
- Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support to the CpuRef backend.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads to the CpuAcc backend.
- Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workloads to the GpuAcc backend.
- Added more Fused Activation unit tests.
- Handle Neon optionality on 32 bit linux platforms.
- Validated MobileNetv2-SSD and MobileNetv3-SSD support.
- Add CpuAcc specific configuration option numberOfThreads.
- Add GpuAcc MLGO tuning file configuration argument.
Bug Fixes:
- Default stride values in depthwise and convolution to 1 instead of 0.
- Fixed transpose conv InferOutputShape.
- Fix incorrect padding value for asymmetric quantized type.
- Fix build breaks for armnnDeserializer test and Threads.cpp for macosx.
- Further fix for macosx where filenames are case insensitive.
- Fixed unit test failure on mipsel/s390x/ppc64/powerpc.
- Fixed ArmnnQuantizer incorrectly quantizing all data types.
- Fixed TFLite parser not parsing TransposeConvolution.
- Fix TfLite parser and ExecuteNetwork issues where error was not thrown in some cases.
- Fix wav2letter not producing correct output for Neon backend.
- Fixed ReduceLayer InferOutputShape issue so that the correct axis data is read in TfLiteParser.
- Fix Reduce workload to allow input tensors of any rank into the validate function.
- Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#.
- Add missing serializer support for m_DimensionsSpecificity.
- Removed unnecessary friend function in INetwork and fixed TransformIterator operator= to allow compilation on further compilers.
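The stride-default fix above matters because a stride of 0 makes the standard output-size formula divide by zero. A sketch of that calculation, assuming the usual floor-based convolution arithmetic (not the Arm NN implementation):

```python
def conv_output_size(in_size, kernel, stride=1, pad_front=0, pad_back=0):
    """Spatial output size of a convolution; stride now defaults to 1, not 0."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return (in_size + pad_front + pad_back - kernel) // stride + 1

# A 224-wide input, 3-wide kernel, stride 2, padding 1 on each side:
print(conv_output_size(224, 3, stride=2, pad_front=1, pad_back=1))  # 112
```

With the old default of 0, any descriptor that did not set the stride explicitly would hit the division by zero.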
Known issues:
Deprecation Notification:
The following components have been deprecated and will be removed in the next release (21.05) of Arm NN.
- armnnQuantizer: Now that the TensorFlow Lite Converter has mature post-training quantization capabilities, the need for this component has gone. See https://www.tensorflow.org/model_optimization/guide/quantization/post_training and https://www.tensorflow.org/lite/performance/post_training_quantization for more details.
- armnnTfParser: As TensorFlow Lite is our current recommended deployment environment for Arm NN, and the TensorFlow Lite Converter provides a path for converting most common machine learning models into TensorFlow Lite format, the need for a TensorFlow parser has gone.
- armnnCaffeParser: Caffe is no longer as widely used as a framework for machine learning as it once was.
Ubuntu 16.04 LTS is reaching End of Life.
Ubuntu Linux 16.04 LTS will no longer be supported after April 30, 2021.
At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
Consequently, from the 21.08 Release at the end of August 2021, Arm NN will no longer be officially supported on Ubuntu 16.04 LTS and will instead be supported on Ubuntu 18.04 LTS.
TfLite Delegate
New Features:
- Enabled ELU Activation.
- Enabled HARD_SWISH Activation.
- Added GATHER operator support.
- Added Logical AND, NOT and OR operator support.
- Added PAD operator support.
- Added PADV2 operator support.
- Added SPLIT operator support.
- Added SPLIT_V operator support.
- Added ARG_MAX operator support.
- Added ARG_MIN operator support.
- Added LOCAL_RESPONSE_NORMALIZATION operator support.
- Added L2_NORMALIZATION operator support.
- Added BATCH_TO_SPACE_ND operator support.
- Added SPACE_TO_BATCH_ND operator support.
- Added DEPTH_TO_SPACE operator support.
- Added SPACE_TO_DEPTH operator support.
- Added SUM operator support.
- Added REDUCE_MAX, REDUCE_MIN operator support.
- Added FLOOR operator support.
- Added OptimizerOptions
- Reduce Float32 to Float16.
- Reduce Float32 to BFloat16.
- Enable debug data.
- Enable memory import.
- Added STRIDED_SLICE operator support.
- Added LSTM operator support.
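The "Reduce Float32 to BFloat16" OptimizerOption above trades precision for bandwidth: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits. A rough illustration of the truncation (round-to-nearest is omitted for brevity; this is not the Arm NN implementation):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its top 16 bits (bfloat16)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(bits: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

# 3.140625 needs only 7 mantissa bits, so it survives the round trip exactly
print(bf16_bits_to_f32(f32_to_bf16_bits(3.140625)))  # 3.140625
```

Values needing more than 7 mantissa bits lose low-order precision, which is why these reductions are opt-in.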
Other Changes:
- Provided Android build.
- Removed Tensorflow requirement.
Bug Fixes:
- Fixed fused activation in Fully Connected layer.
- Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
Known Issues:
Note: We have added pre-built binaries of 21.02 Arm NN along with this release (please see the Assets). Please refer to the BuildGuideNative.md guide in armnn/delegate for more information.
Build dependencies:
Tools | Supported Version |
---|---|
Git | 2.17.1 or later |
Scons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
Boost | 1.64 |
Tensorflow | 2.3.1 |
Caffe | tag 1.0 |
Flatbuffer | 1.12.0 |
Protobuf | 3.12.0 |
Eigen3 | 3.3 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Android 11 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite |
---|---|---|---|---|
android-11.0.0_r1 | RP1A.200720.009 | R26P0_01EAC0, R30P0_01EAC0 | 11_r2 (6965179) | 11_r2 (6961477) |
Android 10 Compatibility Testing was performed using the following:
Android Tag | Android Build ID | Mali Driver |
---|---|---|
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0 |
Note: Going forward, Arm NN will make documentation updates to the latest release where needed; these will be available on GitHub by selecting the doc tag corresponding to the release. For example, tag 21.02.doc1 is the 21.02 release plus some documents updated for that release; there are no functional changes. These document changes are cherry-picked to branches/armnn_21_02.
Release 20.11
Summary
The 20.11 Release was intended to provide major improvements to usability and performance in addition to delivering some additional functionality.
The usability enhancements were:
- Added Debian packaging for ArmNN Core, TfLite Parser and PyArmNN to Ubuntu Launchpad. This means users on Linux no longer need to go through a source repository setup and compile in order to start working.
- Addition of the TfLite Delegate along with 21 of the most valuable operators. This allows a much larger set of models to be executed, since operators that are not accelerated by the delegate will execute in the TfLite interpreter.
- Removal of the boost framework from all ArmNN code bar our unit tests. Simplifies deployment as the dependency on boost no longer exists.
- Website updates (better layout and more examples).
The performance enhancements were:
- ArmNN integration of Compute Library Activation and Batch Normalization fusing.
- ArmNN exposed the Compute Library fastmath option as a parameter that can be set on a per-model basis; in some scenarios this results in the selection of a faster convolution algorithm (Winograd) at the cost of some accuracy.
The additional functionality was:
- Addition of high priority partner requested Logical AND/OR/NOT operators in NNAPI.
- Support for Android R, verified against CTS 11_r3 (Build Id: 20201114.173303).
- Added support for the EfficientNet-Lite Model.
New Features:
- Added Debian packaging, which allows ArmNN to be installed via our APT repository on Ubuntu's Launchpad.
- Added ability to turn on the Compute Library fast_math option through ExecuteNetwork and the Android-nn-driver.
- Using the fast_math flag can lead to performance improvements in fp32 and fp16 layers but at the cost of some accuracy.
- The fast_math flag will not have any effect on int8 performance.
- Added support for Logical NOT, AND and OR for CpuRef, CpuAcc and GpuAcc.
- Added optimization to fuse BatchNorm into Convolution and Depthwise Convolution in fp32 and fp16.
- Added backend specific optimization to fuse Activations into the previous workload.
- Currently Activations can be fused with Addition, BatchNorm, Convolution, Depthwise Convolution, Division, Multiplication or Subtraction workloads on both CpuAcc and GpuAcc.
- Not all workloads can support all Activations.
- Added AddBroadcastReshapeLayer as optimizer.
- Added Map layer and Map workload. This layer has 1 input slot and 0 output slots and simply calls ->Map() on the input tensor handle.
- Added Unmap layer and Unmap workload. This layer has N input slots and 0 output slots and simply calls ->Unmap() on the input0 tensor handle. The remaining inputs are used for determining scheduling dependencies.
- Added support for TfLite Delegate (More information below in TfLite Delegate section).
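The BatchNorm-into-Convolution fusing above works because both operations are affine per output channel: the conv output y is replaced by gamma*(y - mean)/sqrt(var + eps) + beta, which folds into scaled weights and an adjusted bias. A per-channel sketch under these standard definitions (illustrative only, not the Arm NN optimizer code):

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BatchNorm params into its conv weights w and bias b."""
    scale = gamma / math.sqrt(var + eps)
    w_folded = [wi * scale for wi in w]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

# An identity BatchNorm (gamma=1, beta=0, mean=0, var=1) leaves the channel unchanged
w2, b2 = fold_batchnorm([0.5, -0.25], 0.1, 1.0, 0.0, 0.0, 1.0, eps=0.0)
print(w2, b2)  # [0.5, -0.25] 0.1
```

After folding, the BatchNorm layer can be removed from the graph entirely, which is where the fp32/fp16 performance gain comes from.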
TfLite Parser:
- Removed AddBroadcastReshapeLayer from the TfLite Parser and added it to the optimizations.
- TfLite version updated to 2.3.1.
Tf Parser:
- Tensorflow version updated to 2.3.1.
- Add support for 2nd input to ExpandDims in TfParser.
ArmNN Serializer:
- Added support for Logical NOT, AND and OR.
Public API Changes:
Backend API Changes:
ExecuteNetwork App Changes:
- Added ability to enable Compute Library fast_math through ExecuteNetwork.
- Added ability to execute models using TfLiteDelegate.
- Refactored ExecuteNetwork to support cxxopts.
- Allowed use of a dynamic backendId in ExecuteNetwork.
Other changes:
- Removed remaining boost from ArmNN runtime code (Boost still resides in Unit Tests).
- Removed boost::format and swapped to fmt
- Link fmt statically and change to be header-only interface library
- Removed boost::tokenizer and boost::escaped_list_separator to avoid use of CsvReader
- Removed boost::make_iterator_range and boost::to_upper_copy
- Removed boost::transform_iterator and make_transform_iterator
- Removed boost::numeric_cast
- Removed boost::math::fpc uses
- Removed boost/preprocessor.hpp
- Removed boost::program_options and swapped to cxxopts
- Removed boost::variant and swapped to mapbox/variant library
- Removed Boost from standalone dynamic backend
- Removed remaining Boost references from test executables
- Extended dump file with info about fused layers.
- Added SECURITY.md file that contains the security policy, vulnerability reporting procedure and a PGP key that can be used to create secure vulnerability reports.
- Graph::Print() now outputs more information such as number of input/output tensors and tensor dimensions.
- Updated Protobuf to 3.12.0.
- Load dynamic backends for YoloV3 tests.
- Included layer GUID in SerializeToDot output.
- Refactored Optimize(...) function to throw exceptions instead of returning null.
- Speed up the reference backend.
- Added int32 and int64 ArgMax op support.
- Added Quantization operator=() function to Tensor.
- Introduce ModelOptions to OptimizedNetwork.
- Added ability to pass ModelOption through Network::LoadNetwork() to Workload factory.
- Added Load-scope dynamic tensor TfLite tests.
Bug Fixes:
- Fixed Unittest failure while building using EthosNAcc backend.
- Fixed crash on model with Fullyconnected Sigmoid Activation by adding supported activations check to Neon FullyConnected validate.
- Fixed logical VTS skip.
- Fixed issue where EthosNAcc backend would output all zeros when falling back to CpuRef.
- Fixed issue causing SSD Mobilenet f16/uint8 to fail on CpuRef via ExecuteNetwork.
- Fixed issue with signed-int8 quantized model.
- Fixed error running EfficientNet-Lite on GpuAcc.
- Fixed validation for per-channel quantization.
- Fixed segfault between Neon and Cl layers.
- Fixed NonMaxSuppression.
- Fixed Yolov3 producing 0s on Neon.
- Removed Resize from list of layers that need padding in Neon.
- In Neon and CL MUL workloads, use the SATURATE convert policy if one of the inputs is quantized, and WRAP otherwise.
- Fixed non-channel per axis quantization.
- Fixed compiler implicit copy deprecation warning by updating Quantization copy constructor.
- Fixed PyArmNN's hard dependency on all parsers when using CMake.
- Fixed cxxopts and ghc cross compilation issue.
- Fixed undefined reference to GetIdStatic() in DynamicBackendsTests.
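The SATURATE-versus-WRAP fix for the MUL workloads concerns what happens when a converted value overflows the destination type: SATURATE clamps to the type's range, while WRAP rolls over modulo 2^n (two's complement). A sketch for signed 8-bit values (illustrative, not the Compute Library implementation):

```python
def convert_int8(value: int, policy: str) -> int:
    """Convert an integer to int8 range under a SATURATE or WRAP policy."""
    if policy == "SATURATE":
        return max(-128, min(127, value))  # clamp to [-128, 127]
    if policy == "WRAP":
        return (value + 128) % 256 - 128   # two's-complement wrap-around
    raise ValueError(f"unknown policy: {policy}")

print(convert_int8(300, "SATURATE"))  # 127
print(convert_int8(300, "WRAP"))      # 44
```

Saturating is the correct behaviour for quantized inputs, since wrapped values would map to wildly wrong real values after dequantization.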
Known Issues:
- Using a comma separated list to specify multiple compute devices (`--compute CpuRef,CpuAcc`) when using ExecuteNetwork doesn't work. To use multiple compute devices, use `--compute CpuRef --compute CpuAcc`.
TfLite Delegate:
New Features:
Current supported operators:
- Activation (ReLu, Relu6, Logistic, and TanH)
- Comparison (Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual)
- Control (Concat and Mean)
- Convolution (Convolution2d, DepthwiseConvolution2d and TransposeConvolution)
- ElementWiseBinary (Add, Div, Max, Min, Mul, Sub)
- ElementWiseUnary (Abs, Exp, Neg, Rsqrt, Sqrt )
- FullyConnected
- Pooling (MaxPool2d, AveragePool2d and L2Pool2d)
- Quantization (Dequantize and Quantize)
- Redefine (Reshape)
- Resize (Bilinear and NearestNeighbour)
- Softmax (Softmax and LogSoftmax)
- Transpose
Other Changes:
- Created the TfLite Delegate sub-directory in ArmNN.
- Added Fp16 support.
- Updated Tensorflow from v1.15 to v2.3.1.
- Activated compiler warnings when building delegate.
- Added ability to execute models through ExecuteNetwork using the TfLiteDelegate.
Known Issues:
Build dependencies:
Tools | Version we support |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
boost | 1.64 |
Tensorflow | 2.3.1 |
Caffe | tag 1.0 |
Onnx | 1.6.0 |
Flatbuffer | 1.12.0 |
Protobuf | 3.12.0 |
Eigen3 | 3.3 |
Android | 10 and 11 |
Mali Driver | r25p1_01bet0 |
Android NDK | r20b |
mapbox/variant | 1.2.0 |
Release 20.08
Summary
The 20.08 Release delivers the following:
- The final tranche of support for Android R ahead of its release in September. Namely QoS functionality, Fill, Rank and the new Resize options.
- Support for dynamic tensors where the size of any unspecified tensors can be inferred at network load time.
- Performance enhancements on the NEON backend eliminating unnecessary copying of data in memory, namely:
- The ability to directly import and export data into an inference graph.
- The ability to use subtensors where possible in split and concat workloads.
- Verification of support for TensorFlow Lite wav2letter and wav2letter tiny models (note: further work is needed to verify accuracy in the next release).
New Features:
- Added FILL operator support for CpuRef, CpuAcc, and GpuAcc.
- Added RANK operator support for CpuRef.
- Added align corner and half pixels support to the RESIZE operator for CpuRef, CpuAcc, and GpuAcc.
- Refactor TensorShape to support Dynamic Tensors (tensors of unknown dimension sizes or even unknown rank).
- Enable memory import in CpuAcc.
- Allow using Sub-Tensors on CpuAcc on ConcatenationLayer if concatenation is along x or y (2 innermost dimensions) and previous layers do not require padding.
- Allow using Sub-Tensors on CpuAcc on SplitterLayer if split is along x or y (2 innermost dimensions) and next layers do not require padding.
TfLite Parser:
- Added DIV operator support.
- Added LEAKY_RELU operator support.
- Added NEG operator support.
- Added HARD_SWISH operator support.
- Added support for Dynamic Tensors Type 1 (the input shape must always be set; the output shape can be dynamic and is inferred from the input shape).
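"Dynamic Tensors Type 1" means the parser can fill in an unspecified output shape from a fully specified input shape. For a shape-preserving operator such as LEAKY_RELU this is trivial; a sketch with hypothetical helper names (None marks an unknown dimension):

```python
def infer_output_shape(input_shape, output_shape):
    """Type-1 inference: the input must be fully specified; unknown output
    dimensions (None) are taken from the input for shape-preserving ops."""
    if any(d is None for d in input_shape):
        raise ValueError("input shape must be fully specified")
    if output_shape is None:
        return list(input_shape)  # fully dynamic output: copy the input shape
    return [i if o is None else o for i, o in zip(input_shape, output_shape)]

print(infer_output_shape([1, 224, 224, 3], [None, None, None, 3]))
# [1, 224, 224, 3]
```

Operators whose output shape depends on runtime data (rather than the input shape alone) are not covered by Type 1.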
Public API Changes:
- Added ITensorHandleFactory::GetCapabilities to calculate capability of the TensorHandleFactor.
ExecuteNetwork App Changes:
- Added the --infer-output-shape option: when enabled, it sets ShapeInferenceMethod::InferAndValidate on the TfLiteParser, which supports dynamic tensors type 1 (output shape inferable from input shape).
Other changes:
- Added EXP operator support to CpuAcc and GpuAcc.
- Added ADD,SUB,DIV,MUL,MAXIMUM and MINIMUM int32 support in CpuRef.
- Added PRELU float16 support in CpuRef.
- Added ARGMINMAX float16 support in CpuRef.
- Added GATHER support for any axis in CpuAcc and GpuAcc (previously the support was only for axis = 0).
- Added LOGSOFTMAX support in CpuAcc and GpuAcc.
- Added support for subtensors on Splitter layer for splitting x/y axis if no padding required on next layer.
- Added support for subtensors on Concat layer for concatenating x/y axis if no padding required on previous layer.
- Replaced boost::filesystem with ghc::filesystem.
- Removed boost/dll.hpp from dynamic backends test.
- Separated external profiling server code into a standalone library.
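Generalising GATHER beyond axis 0 (mentioned above) means indexing along an arbitrary dimension. A minimal nested-list sketch for rank-2 input (illustrative only; real backends operate on flat buffers with strides):

```python
def gather_2d(data, indices, axis):
    """Gather rows (axis=0) or columns (axis=1) of a rank-2 list-of-lists."""
    if axis == 0:
        return [data[i] for i in indices]
    if axis == 1:
        return [[row[i] for i in indices] for row in data]
    raise ValueError("this rank-2 sketch supports axis 0 or 1 only")

m = [[1, 2, 3], [4, 5, 6]]
print(gather_2d(m, [2, 0], axis=1))  # [[3, 1], [6, 4]]
```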
Bug Fixes:
- Added ability for Mean Reduction to reduce to scalar.
- Added ability for Strided Slice to shrink to scalar.
- Added a check for Strided Slice to not run when stride is negative and ShrinkAxisMask set.
- Fix edge case for transposeConv2d output shape inference.
- Fix deserializer output binding TensorShape logic.
- Fixed issue where AddBroadcastReshapeLayer would always connect the reshaped input to the first input slot, regardless of which input required the reshape.
- Removed TfLite Concat and Pad quantization validation.
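The "reduce to scalar" and "shrink to scalar" fixes above cover the case where every axis is reduced (or shrunk away by ShrinkAxisMask), leaving a rank-0 tensor. A sketch of the resulting shape computation (a hedged illustration, not the Arm NN code):

```python
def reduced_shape(shape, axes, keep_dims=False):
    """Output shape after reducing the given axes; [] denotes a scalar."""
    axes = {a % len(shape) for a in axes}  # normalize negative axes
    if keep_dims:
        return [1 if i in axes else d for i, d in enumerate(shape)]
    return [d for i, d in enumerate(shape) if i not in axes]

# Mean over all axes of a [2, 3] tensor yields a scalar (rank 0, shape [])
print(reduced_shape([2, 3], [0, 1]))  # []
```

Before the fix, the rank-0 result of a full reduction was not handled.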
Build dependencies
Tools | Version we support |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
boost | 1.64 |
Tensorflow | TENSORFLOW_REVISION= 590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0) |
Caffe | CAFFE_REVISION= 7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9 |
Onnx | ONNX_REVISION= f612532843bd8e24efeab2815e45b436479cc9ab |
Flatbuffer | 1.12.0 |
Protobuf | 3.5.2 |
Eigen3 | 3.3 |
Android | 9 and 10 |
Mali Driver | r25p1_01bet0 |
Android NDK | r20b |