Releases: ARM-software/armnn

Release 22.08

25 Aug 15:43

Summary

New Features

  • Add Arm NN Support Library.
    • The Arm NN Support Library for Android NNAPI is a shared library which has all the functionalities of existing HAL drivers for Android NNAPI.
    • It is available from Android S.
    • It focuses on update-ability of ML operators.
    • A guide on how to build the Arm NN Support Library is available at armnn/shim/BuildGuideShimSupportLibrary.md.
    • SLTS (Support Library Test Suite) compliance.
  • Support for Batch MatMul in CpuRef.

TfLite Parser

  • Added support for LOG.
  • Added support for SIN.

ExecuteNetwork App Changes:

  • Refactored ExecuteNetwork: the input name, input type, output name, output type and model type are now read from the model.

Arm NN Build Tool:

  • Introduced Arm NN Build Tool which consists of an official Arm NN Dockerfile for building Arm NN and Arm Compute Library (ACL).
  • This tool replaces the majority of our existing build guides as a user-friendly way to build Arm NN (and its dependencies) from scratch.
  • Tested on x86_64 (Intel) and aarch64 (Arm) build hosts for the Ubuntu platform.
  • Currently supports targeting Linux devices (from Ubuntu 18.04 onwards) on x86_64, aarch32 and aarch64 architectures.

Bug Fixes

  • Models in the .armnn format (serialized models) were failing in 22.05; this has been fixed by adding the constant layers before the operator layers.
  • Fixed a quantization bug when folding padding into Average Pool 2D on Neon.
  • Fix segmentation fault when running --bf16-turbo-mode on FPGA.

Other Changes

  • General documentation refactor and updates.
  • Added LICENSE.spdx for Arm NN.
  • Delayed backend deprecation from 22.11 to 23.08.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.


Feature SHA Gerrit Review Resultant ABI/API changes
Import inputs but don't export outputs fails 626bd90 https://review.mlplatform.org/c/ml/armnn/+/7661 Field m_ExportEnabled has been added to type OptimizerOptions. This field will not be initialized by old clients that have not been recompiled.
Get non-const IConnectableLayer from I/O slots 09fa24d https://review.mlplatform.org/c/ml/armnn/+/7835 Pure virtual method GetOwningIConnectableLayer ( ) has been added to classes IOutputSlot and IInputSlot.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Remove deprecated code 22.05 4d2eec0 https://review.mlplatform.org/c/ml/armnn/+/7712 Removed Symbols:
  • IsCapabilitySupported ( BackendId const& backend, enum BackendCapability capability )
  • FullyConnectedDescriptor::GetNumViews ( ) const
  • INetwork::Accept ( ILayerVisitor& visitor ) const
  • Pure virtual method Accept ( ILayerVisitor& ) const has been removed from class IConnectableLayer.
  • The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Modified SubgraphView returned by GetWorkingCopy() cea3d49 https://review.mlplatform.org/c/ml/armnn/+/7852 Pure virtual method GetSlotIndex ( ) const has been added to class IInputSlot.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Update the async api to use ExecutionData 21a6a1a https://review.mlplatform.org/c/ml/armnn/+/7878 Pure virtual method GetExecutionDataAt ( unsigned int ) has been added to class experimental::IWorkingMemHandle.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
  • Pure virtual method GetWorkingMemDescriptor ( LayerGuid ) has been removed from this class.
  • The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
The following back-end API changes have occurred during the implementation of 22.08 that users should be aware of before upgrading.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Update the async api to use ExecutionData 21a6a1a https://review.mlplatform.org/c/ml/armnn/+/8051/2 The following virtual functions have been added to class IBackendInternal:
  • virtual ExecutionData CreateExecutionData(WorkingMemDescriptor&) const
  • virtual void UpdateExecutionData(ExecutionData&, WorkingMemDescriptor&) const
  • The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
  • The signature of IWorkload::ExecuteAsync() has changed; it now accepts ExecutionData& instead of WorkingMemDescriptor&.
    Add GetMemoryRequirements to IWorkload 5e09080 https://review.mlplatform.org/c/ml/armnn/+/7886 The following virtual function has been added to class IWorkload:
  • virtual armnn::Optional<armnn::MemoryRequirements> GetMemoryRequirements()
  • The layout of v-table has been changed. Call of this virtual method or any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Modified SubgraphView returned by GetWorkingCopy() cea3d49 https://review.mlplatform.org/c/ml/armnn/+/7852 The signature of SubgraphView::GetWorkingCopy() has changed; it has now been marked const to reflect the fact that the graph represented by the working copy does not get altered.

    TfLite Delegate

    New features

    • Added support for LOG
    • Added support for SIN
    • Add JNI interface

    Bug Fixes

    • Fix running MobileBERT on CpuRef
    • Only use the macro ARMNN_TFLITE_DELEGATE
    • DelegateQuickStartGuide.md errors fix

    PyArmNN

    • Documentation update running PyArm NN with ONNX parser.

    Build Dependencies

    Tools Supported Version
    Git 2.17.1 or later
    SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
    Cmake 3.19.0
    Tensorflow 2.5.0
    Onnx 1.6.0
    Flatbuffer 1.12.0
    Protobuf 3.12.0
    Android NDK r20b
    mapbox/variant 1.2.0
    cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
    doctest 2.4.6
    fmt 7.0.1
    ghc 1.3.2
    half 1.12.0
    stb 2.16

    Release 22.05.01

    20 Jun 08:17

    Summary

    New Features

    This is a patch release of 22.05 where we have implemented Pooling3d custom operator for ArmNN TfLite Delegate. This feature is available in the 22.05 release branch itself (branches/armnn_22_05) and in the tag created for patch release v22.05.01.

    Release 22.05

    26 May 10:32

    Summary

    New Features

    • ArmnnTestUtils is now versioned and under ABI compliance checker
    • Added support for Int32 CONCATENATION layer for CpuRef
    • Added support for Float32 Unidirectional Sequence LSTM layer for CpuAcc and GpuAcc
    • Added support for GatherNd for CpuRef, CpuAcc and GpuAcc
    • Added support for SQRT for CpuAcc and GpuAcc
    • Added support for Depthwise Convolution2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
    • Added support for Conv2d ConstTensorsAsInput for CpuRef, CpuAcc and GpuAcc
    • Added support for Fully Connected ConstTensorsAsInput for CpuAcc and GpuAcc
    • Added support for MaxPool3D and AveragePool3D for CpuAcc and GpuAcc
    • Added support for L2Pooling3D for GpuAcc
    • Added support for UnidirectionalLSTM for CpuAcc
    • ConstTensorsAsInput: Optimizer Fix - FuseBatchNorm
    • ConstTensorsAsInput: Optimizer Fix - FoldPadIntoConvolution2d
    • ConstTensorsAsInput: Optimizer Fix - Fp32ToBf16 optimization

    TfLite Parser

    • Added support for GatherNd
    • Added support for FloorDiv
    • Added support for UnidirectionalLSTM
    • Do not create Floor for FloorDiv layer when the data type is int32

    ArmNN Serializer/Deserializer

    • Added support for GatherNd

    ExecuteNetwork App Changes:

    • Added Reuse IO Buffers mode
    • Profiling details weights and bias JSON keys deprecated. Will be removed for 22.08

    Bug Fixes

    • Fixed crashing in profiling
    • Fixed the issue with running the SimpleSample app on Raspberry Pi
    • Removed MockBackend.hpp from armnn/src/backends/backendsCommon/test/ to solve problems when using Visual Studio in Windows
    • Fixed segfault in RefDepthwiseConvolution2d workload

    Other Changes

    • ArmNN Baremetal
      • Change the namespace from armnn::profiling to arm::pipe

    ABI/API Changes

    The following front-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.


    Feature SHA Gerrit Review Resultant ABI/API changes
    Change the namespace from armnn::profiling to arm::pipe 5aa9fd7 https://review.mlplatform.org/c/ml/armnn/+/7222
  • Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot. Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
  • The following functions have had a change in signature, meaning they will not be recognized by old applications: BackendRegistry::SetProfilingService
    IRuntime::RegisterDebugCallback
  • Type of field m_LocalPacketHandlers has been changed from std::vector<std::shared_ptr<profiling::ILocalPacketHandler>> to std::vector<std::shared_ptr<arm::pipe::ILocalPacketHandler>> in Runtime::CreateOptions::ExternalProfilingOptions
  • Type of return value has been changed from profiling::ProfilingGuid to arm::pipe::ProfilingGuid in OptimizedNetwork::GetGuid
  • Replace ProfilingService includes with IProfilingService. af94772 https://review.mlplatform.org/c/ml/armnn/+/7240 The following function has had a change in signature meaning it will not be recognized by old applications.
    BackendRegistry::SetProfilingService
    Remove dependency on armnn::Exception classes from the Profiling code f9db3ef https://review.mlplatform.org/c/ml/armnn/+/7280 Class armnn::BackendProfilingException has been moved to namespace arm::pipe; this will result in older applications not being able to find it.
    Replace armnn:Optional with arm::pipe::Optional in profiling code decd08b https://review.mlplatform.org/c/ml/armnn/+/7295 Class armnn::TimeoutException has been moved to namespace arm::pipe; this will result in older applications not being able to find it.
    Add Unidirectional Sequence Lstm support to TFLite 5880b91 https://review.mlplatform.org/c/ml/armnn/+/7023 Following fields have been added to struct LstmDescriptor:
    m_CellIntermediateScale
    m_ForgetIntermediateScale
    m_HiddenStateScale
    m_HiddenStateZeroPoint
    m_InputIntermediateScale
    m_OutputIntermediateScale
    As a result, the size of the struct has changed.
    ConstTensorsAsInput: DepthwiseConvolution2d 0690265 https://review.mlplatform.org/c/ml/armnn/+/7417 Pure virtual method VisitDepthwiseConvolution2dLayer ( IConnectableLayer const*, struct DepthwiseConvolution2dDescriptor const&, char const* ) has been added to this class.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    ConstTensorsAsInput: Conv2d - FrontEnd b4dd5cc https://review.mlplatform.org/c/ml/armnn/+/7382 Pure virtual method VisitConvolution2dLayer ( IConnectableLayer const*, struct Convolution2dDescriptor const&, char const* ) has been added to this class.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

    The following back-end API changes have occurred during the implementation of 22.05 that users should be aware of before upgrading.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Move headers to profiling/client/include 2776183 https://review.mlplatform.org/c/ml/armnn/+/7327 Headers have been moved to profiling/client/include.
    Change the namespace from armnn::profiling to arm::pipe 5aa9fd7 https://review.mlplatform.org/c/ml/armnn/+/7222
  • Namespace changed from armnn::profiling to arm::pipe
    TfLite Delegate

    New features

    • Added support for GatherNd

    Bug Fixes

    Note: Arm NN is aware of an issue where converting a model to .armnn will yield unpredictable results when reading it back in through the deserializer. This is because the serializer depends on graph topology and the graph can be out of order, since the parsers create additional constant layers as inputs.

    PyArmNN

    • Added support for GatherNd
    • Added Pooling3D

    Build Dependencies

    Tools Supported Version
    Git 2.17.1 or later
    SCons 2.4.1 (Ubuntu) 2.5.1 (Debian)
    Cmake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
    Tensorflow 2.5.0
    Onnx 1.6.0
    Flatbuffer 1.12.0
    Protobuf 3.12.0
    Android NDK r20b
    mapbox/variant 1.2.0
    cxxopts SHA 12e496da3d486b87fa9df43edea65232ed852510
    doctest 2.4.6
    fmt 7.0.1
    ghc 1.3.2
    half 1.12.0
    stb 2.16

    Android 12 Compatibility Testing was performed using the following:

    Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
    android-12.0.0_r1 SP1A.210812.015 r36p0_01eac0-rc0 12_r2 (7987736) 12_r2 (7973604)

    Android 11 Compatibility Testing was performed using the following:

    Android Tag Android Build ID Mali Driver Android Compatibility Test Suite Android Vendor Test Suite
    android-11.0.0_r6 RPM1.210413.002 r33p0_01eac0 11_r5 (7640833) 11_r5 (7599184)

    Android 10 Compatibility Testing was performed using the following:

    Android Tag Android Build ID Mali Driver
    android-10.0.0_r39 QQ3A.200605.002.A1 R23P0_01REL0

    Release 22.02

    03 Mar 10:45

    Summary

    New Features

    • Add mirror padding support on Pad layer for CpuAcc and GpuAcc.
    • Add support for Pool3d FrontEnd, Reference implementation.

    TfLite Parser

    • Added missing support for reshape operator when the target shape is dynamic and batch size is unknown.
    • Added PadV2 support.
    • Changed asserts to CHECK in ParserFlatbuffersFixture.hpp.

    ArmNN Serializer/Deserializer

    • Add support for Pool3d.

    Bug Fixes

    • Added bounds checking when indexing PermutationVector elements and its correspondent unit tests.
    • Fixed output bindings in ExecuteNetwork when using delegate with models with multiple outputs.
    • Fixed build issues in x86 Dockerfile.
    • Fixed ExecuteNetwork printing inference time twice.
    • Fixed thread safety issues in TimelineDecoder and associated unit tests.
    • Fixed some Thread Sanitizer warnings.
    • Added check for existing event to fix issue on OpenCL Timer.
    • Fixed logging bug where blank messages were being sent.
    • Fixed issues on Logging API.
    • Fixed async execute test on 32bit Raspberry Pi

    Other Changes

    • Removed references to blacklist from Model Accuracy tool.
    • Removed deprecated code.
    • Added ModelOptions and addition timing to ARMNN_LOG.
    • Added get_tensorflow.sh script.
    • Updated build guides.
    • Updated error messages from the flatbuffers parser.
    • Added the C++ KWS example.
    • Handled optional biases better in Neon/Cl FullyConnected workloads.
    • Stabilise the Backend API:
      • Backend developers should now be able to limit includes to headers in include/armnn/backends/
      • Moved CompatibleTypes.hpp to the armnnUtils library.
      • Added forwarding header for src/armnn/CompatibleTypes.hpp.
      • Moved the ArmNN Test Utils code to a physically separate directory.
      • Added new method AddPrecompiledLayer() to INetwork.
      • Promoted backend headers in backendCommon to armnn/backends.
      • Used INetwork rather than Graph for holding layers for OptimizationViews.
      • Used IConnectableLayer in SubgraphView rather than Layer in its m_Layers.
      • Stabilised the IWorkloadFactory interface with unified strategy.
      • Stabilised the ILayerSupport interface with unified strategy.
      • Moved SubgraphView to backends include folder.
      • Added GetParameters to IConnectableLayer.
      • Exposed a new MockWorkloadFactory and MockMemManager.
      • Accessing ConstTensors from IConnectableLayer
      • Added method of returning a GetSubgraphWorkingCopy (SubgraphView).
      • Moved MemCopyTestImpl from acl to armnnTestUtils.
    • Support Import of Aligned Host Memory in NNAPI:
      • Added CanBeImported to ITensorHandle.
      • Implemented CanBeImported function in RefTensorHandle.
      • Implemented CanBeImported function in NeonTensorHandle.
      • Implemented CanBeImported function in ClTensorHandle.
      • Added functionality for CopyAndImportFactoryPair to TensorHandleFactoryRegistry.
      • Register CopyAndImportFactoryPairs to RefBackend and unit tests.
      • Register CopyAndImportFactoryPairs to NeonBackend and unit tests.
      • Register CopyAndImportFactoryPairs to ClBackend and unit tests.
      • Added ReplaceTensorHandle functions to IWorkload and BaseWorkload.
      • Added ClBaseWorkload and NeonBaseWorkload.
      • Modified workloads to extend Neon/Cl BaseWorkload.
      • Added ReplaceTensorHandle functions to Neon/CL BaseWorkloads.
      • Implemented ICLTensorProxy.
      • Added input and output workload slot pairs to LoadedNetwork.
      • Added support of aligned host memory.
      • Added Forced Import EndToEnd tests to Ref, Neon, and CL.
      • Call Cl sync after EnqueueWorkload
      • Added EndToEnd tests on reference backend to ensure allocated data can be reused.

    ABI/API Changes

    The following front-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.


    Feature SHA Gerrit Review Resultant ABI/API changes
    SubgraphView uses IConnectableLayer rather than Layer in its m_Layers 56ccf68 https://review.mlplatform.org/c/ml/armnn/+/6807 Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IOutputSlot:
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Stabilize the ILayerSupport interface with unified strategy. 34b429c https://review.mlplatform.org/c/ml/armnn/+/6903 A virtual descriptor has been added to the struct BaseDescriptor; as a result, the size of all descriptors has changed. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
    SubgraphView: Add method of returning a GetSubgraphWorkingCopy. 9d74ba6 https://review.mlplatform.org/c/ml/armnn/+/6995 Pure virtual method GetOwningIConnectableLayer( ) const has been added to class IInputSlot.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Add support of aligned host memory e2af6f4 https://review.mlplatform.org/c/ml/armnn/+/7025 The following functions have had a change in signature meaning they will not be recognized by old applications: IRuntime::EnqueueWorkload() accepts two new parameters preImportedInputIds and preImportedOutputIds. IRuntime::ImportInputs() accepts a new parameter forceImportMemorySource. IRuntime::ImportOutputs() accepts a new parameter forceImportMemorySource.
    Add GetParameters to IConnectableLayer e466596 https://review.mlplatform.org/c/ml/armnn/+/7031 Pure virtual method GetParameters ( ) const has been added to class IConnectableLayer. Virtual method IsNull ( ) const has been added to class BaseDescriptor.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Accessing ConstTensors from IConnectableLayer 2e24175 https://review.mlplatform.org/c/ml/armnn/+/7040 Pure virtual method GetConstantTensorsByRef ( ) has been added to class IConnectableLayer.
  • Applications will not provide the implementation for this pure virtual method and therefore cause a crash in the library trying to call this method.
  • The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Remove deprecated code 22.02 b28e525 https://review.mlplatform.org/c/ml/armnn/+/7104 The deprecated LayerSupport.hpp and its IsXXXLayerSupported() functions have been removed, as they have been replaced by the ABI-stable ILayerSupport interface and the BackendHelper.hpp GetILayerSupportByBackendId() function.

    The following back-end API changes have occurred during the implementation of 22.02 that users should be aware of before upgrading.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Add a Pooling3d FrontEnd and Ref Implementation 7b885b3 https://review.mlplatform.org/c/ml/armnn/+/6511 ILayerSupport.hpp
  • Pure virtual function IsPooling3dSupported added requiring implementation by backend developers.
    Stabilize the ILayerSupport interface with unified strategy. 34b429c https://review.mlplatform.org/c/ml/armnn/+/6903
  • ABI stable virtual function IsLayerSupported(const LayerType& type, ...) has been added to ILayerSupport.hpp. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Stabilize the IWorkloadFactory interface with unified strategy 611c7fb https://review.mlplatform.org/c/ml/armnn/+/6906
  • ABI stable virtual function CreateWorkload(const LayerType& type, ...) has been added to class IWorkloadFactory. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

    TfLite Delegate

    New features

    • Added Delegate cross compile to x86 Dockerfile
    • Added constant input supports for Pack/Stack, Concatenation operators
    • Added Int32 supp...

    Release 21.11

    22 Nov 14:46

    Arm NN 21.11 was focused on providing new capabilities and improving performance:

    New Features

    • Added support for Reduce Prod.
    • Added support for Channel Shuffle.
    • Added support for Conv3d.
    • Added support for Symmetric and Reflect Padding on CpuRef backend.
    • Added support for statically linking ArmNN TfLite Delegate against Tensorflow Lite.
    • Added Import Input/Output functions to async API, allowing for imported I/O buffers to be used by multiple network executions.
    • Added external memory manager that allows for customization of network memory management ( Note: currently only fully supported on the CpuRef Backend ).

    TfLite Parser

    • Added support for Reduce Prod.
    • Added support for Conv3d.
    • Added support for MirrorPad.
    • Added support for size of -1 for Slice.

    ONNX Parser

    • Add support for Concat
    • Add support for Gather
    • Add support for Gemm
      • The parser supports constant bias or non-constant bias where bias dimension = 1.
    • Add support for Shape
    • Add support for Unsqueeze
    • Add support of min/max as attribute for Clip

    ArmNN Serializer/Deserializer

    • Add support for Reduce Prod.
    • Add support for Channel Shuffle.
    • Add support for Conv3d.
    • Add support for Symmetric and Reflect Padding.

    ExecuteNetwork App Changes

    • Added 'do-not-print-output' option to ExecuteNetwork.

    Bug Fixes

    • Using output-network-details or output-network-details-only during ExecuteNetwork profiling created an invalid JSON format. This has since been fixed.
    • Fixed undefined reinterpret_cast in BFloat16.hpp. It fixes gcc builds with version 8 or above.
    • Fixed format of the delegate JSON output.
    • Fixed bug related with constant tensor flag.
    • Fixed pyarmnn py35 unit tests.

    Other Changes

    • Added sample app for asynchronous execution.
    • Printed new Optimize and LoadedNetwork profiling points.
    • Added new serialized model supported on Netron.
    • Made it possible for backends to add include paths in Android.
    • Changed order of the Doxygen tree.

    ABI/API Changes

    The following front-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 27.0.0, the Delegate to 25.0.0 and our Parsers to 24.3.0, following Semantic Versioning guidelines.


    Feature SHA Gerrit Review Resultant ABI/API changes
    Remove deprecated code 1b2654f https://review.mlplatform.org/c/ml/armnn/+/6254 Removed Symbols:
  • INetwork::AddAbsLayer ( char const* name )
  • INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name )
  • INetwork::AddDepthwiseConvolution2dLayer ( struct DepthwiseConvolution2dDescriptor const& convolution2dDescriptor, ConstTensor const& weights, char const* name )
  • INetwork::AddEqualLayer ( char const* name )
  • INetwork::AddGatherLayer ( char const* name )
  • INetwork::AddGreaterLayer ( char const* name )
  • INetwork::AddMergerLayer ( MergerDescriptor const& mergerDescriptor, char const* name )
  • INetwork::AddResizeBilinearLayer ( struct ResizeBilinearDescriptor const& descriptor, char const* name )
  • INetwork::AddRsqrtLayer ( char const* name )
  • LayerSupport::IsMergerSupported ( BackendId const& backend, std::vector<TensorInfo const*> inputs, TensorInfo const& output, struct OriginsDescriptor const& descriptor, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength )
  • LayerSupport::IsResizeBilinearSupported ( BackendId const& backend, TensorInfo const& input, TensorInfo const& output, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength )
  • LayerSupport::IsRsqrtSupported ( BackendId const& backend, TensorInfo const& input, TensorInfo const& output, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength )
  • LayerSupport::IsSplitterSupported ( BackendId const& backend, TensorInfo const& input, struct ViewsDescriptor const& descriptor, char* reasonIfUnsupported, size_t reasonIfUnsupportedMaxLength )

  • Removed pure virtual methods, resulting in change to v-table layout:
  • ILayerVisitor::VisitAbsLayer
  • ILayerVisitor::VisitEqualLayer
  • ILayerVisitor::VisitGatherLayer
  • ILayerVisitor::VisitGreaterLayer
  • ILayerVisitor::VisitMergerLayer
  • ILayerVisitor::VisitResizeBilinearLayer
  • ILayerVisitor::VisitRsqrtLayer

  • Removed DataTypes:
  • DataType::QuantisedAsymm8
  • DataType::QuantisedSymm16
  • DataType::QuantizedSymm8PerAxis
    IMemoryOptimizerStrategy Add strategy library and add support in BackendRegistry b8a26d8 https://review.mlplatform.org/c/ml/armnn/+/6297 struct IRuntime::CreationOptions:
  • Member variable m_MemoryOptimizerStrategyMap has been added, changing the size of the type.
    class BackendRegistry:
  • Member variable m_MemoryOptimizerStrategyMap has been added, changing the size of the type.
    Add missing runtime parameters to TfLite delegate. 3e32a87 https://review.mlplatform.org/c/ml/armnn/+/6388 class Delegate:
  • Size of field m_Options has been changed from 136 bytes to 352 bytes.

  • class DelegateOptions had the following fields added and so the size of the inclusive type has been changed.
  • Field m_DynamicBackendsPath has been added to this type.
  • Field m_EnableGpuProfiling has been added to this type.
  • Field m_InternalProfilingDetail has been added to this type.
  • Field m_InternalProfilingEnabled has been added to this type.
  • Field m_ProfilingOptions has been added to this type.
  • Field m_SerializeToDot has been added to this type.
    Profiling instrumentation throughout the Optimizer f1e0ad3 https://review.mlplatform.org/c/ml/armnn/+/6432 struct OptimizerOptions:
  • Field m_ProfilingEnabled has been added to this type.

  • class Delegate:
  • Size of this class has been increased from 416 bytes to 424 bytes.

  • class DelegateOptions:
  • Size of this class has been increased from 352 bytes to 360 bytes.

  • Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap.
    This is due to addition of m_ProfilingEnabled to the OptimizerOptions used in constructors of both Delegate classes.
    Fix armnn_external_delegate option parsing b1c62f1 https://review.mlplatform.org/c/ml/armnn/+/6519 class Delegate:
  • Size of field m_Options has been changed from 360 bytes to 616 bytes.

  • class DelegateOptions:
  • Field m_RuntimeOptions has been added to this type.
  • Field m_BackendOptions has been removed from this type.
  • Field m_DynamicBackendsPath has been removed from this type.
  • Field m_EnableGpuProfiling has been removed from this type.

  • Objects of these classes can be allocated by the applications and old size will be hardcoded at the compile time. Call of any exported constructor will break the memory of neighboring objects on the stack or heap.
    Support the new memory API in loaded network b1aad42 https://review.mlplatform.org/c/ml/armnn/+/6552 class INetworkProperties:
  • Field m_ExternalMemoryManagementEnabled has been added to this type.

  • The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.

    The following back-end API changes have occurred during the implementation of 21.11 that users should be aware of before upgrading.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Remove deprecated code 1b2654f https://review.mlplatform.org/c/ml/armnn/+/6254 IBackendInternal.hpp
    Removed Symbols:
  • virtual ISubGraphConverterPtr CreateSubGraphConverter(const std::shared_ptr<SubGraph>& subGraph) const;
  • virtual Optimizations GetOptimizations() const;
  • virtual SubGraphUniquePtr OptimizeSubGraph(const SubGraph& subGraph, bool& optimizationAttempted) const;

  • Removed Aliases:
  • GraphUniquePtr, SubgraphViewUniquePtr, ISubGraphConverterPtr, SubGraphUniquePtr

  • ILayerSupport.hpp
    Removed Symbols:
  • IsEqualSupported
  • IsGatherSupported
  • IsGreaterSupported
  • IsMergerSupported
  • IsResizeBilinearSupported
  • IsRsqrtSupported
  • IsSplitterSupported
  • Add Channel Shuffle Front end and Ref Implementation 51f6777 https://review.mlplatform.org/c/ml/armnn/+/6211 ILayerSupport.hpp
  • Pure virtual function IsChannelShuffleSupported added requiring implementation by backend developers.
  • Add Conv3d FrontEnd and Ref Implementation b63a311...

    Release 21.08

    26 Aug 16:14

    Summary

    Arm NN 21.08 focused on providing new capabilities and improving performance:

    • Added the ability to import protected DMA buffers and allow Arm NN to run inferences on data that is in Protected GPU Memory, as well as providing a Custom Memory Allocator which supports importing malloc, dma_buf and protected DMA buffers.
    • Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
    • Boost has been completely removed from the code base, making Arm NN easier to integrate into other software stacks.
    • Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
    • More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.

    New Features

    • Moved unit tests from BOOST to doctest.
    • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
    • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
    • Reduce Operator can now support multiple axes.
    • Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
    • Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (Only LOG is supported) and GpuAcc backends.
    • Added SHAPE Operator support on CpuRef backend.
    • Moved useful test utilities to new static library (libarmnnTestUtils.a).
    • Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
    • Arm NN TfLite Delegate Image Classification sample application added to samples directory.
    • Added fully comprehensive Arm NN Operator list page to Doxygen.
    • Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
      • Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
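The protected-memory and Custom Memory Allocator features above are enabled through IRuntime::CreationOptions. The following is an untested sketch assuming an ICustomAllocator implementation (`myAllocator`) that is not shown; it requires the Arm NN SDK headers:

```cpp
#include <armnn/IRuntime.hpp>

// Sketch: create a runtime that runs inferences in protected GPU memory,
// using a caller-supplied custom allocator for protected DMA buffers.
armnn::IRuntimePtr CreateProtectedRuntime(armnn::ICustomAllocator* myAllocator)
{
    armnn::IRuntime::CreationOptions options;
    options.m_ProtectedMode   = true;         // opt in to protected mode
    options.m_CustomAllocator = myAllocator;  // supplies protected DMA buffers
    return armnn::IRuntime::Create(options);
}
```

Both fields were added to CreationOptions in this release cycle, so code using them requires the corresponding ARMNN_VERSION.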

    TfLite Parser

    • EXPAND_DIMS Operator support added.
    • PRELU Operator support added.
    • SHAPE Operator support added.
    • Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
    • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
    • Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
      • If creating an instance of the ITfLiteParser and the model used is dynamic, then please ensure that m_InferAndValidate is set in the TfLiteParserOptions and m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.
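The dynamic-tensor setup described above can be sketched as follows. This is illustrative rather than tested, since it requires the Arm NN SDK and armnnTfLiteParser headers:

```cpp
#include <armnn/INetwork.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

// Sketch: parse a TfLite model that contains dynamic tensors.
armnn::INetworkPtr ParseDynamicModel(const char* modelPath)
{
    armnnTfLiteParser::ITfLiteParser::TfLiteParserOptions parserOptions;
    parserOptions.m_InferAndValidate = true; // required for dynamic models

    auto parser = armnnTfLiteParser::ITfLiteParser::Create(parserOptions);
    return parser->CreateNetworkFromBinaryFile(modelPath);
}

// When optimizing the parsed network, pair this with:
//   armnn::OptimizerOptions optOptions;
//   optOptions.m_shapeInferenceMethod = armnn::ShapeInferenceMethod::InferAndValidate;
```

Without both options set, graphs with unspecified shapes will fail validation at optimize time.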

    ArmNN Serializer/Deserializer

    • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
    • Added SIN and LOG support to ElementWiseUnary Operator.
    • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.

    ExecuteNetwork App Changes

    • Added option to specify what size Arm NN thread pool to use when running inferences asynchronously.
    • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
    • Added option to specify different input data for every iteration of ExecuteNetwork.
    • Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.

    NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.

    Bug Fixes

    • Removed duplicate check for Dequantize input type when checking if operator is supported.
    • Fixed undefined behaviour in PolymorphicDowncast.
    • Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
    • Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
    • Fixed cl_ext.h include path in CL backend.
    • Fixed bugs in PreCompiledLayer. E.g. A new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
    • Fixed gcc 9.3.0 compiler warning in TfLiteParser.
    • Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.

    Other Changes

    • Print Elementwise and Comparison Operator descriptors in a dot graph.
    • Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
    • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.
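The IsConstant flag mentioned above is used with the new inputs-as-layers FullyConnected API. The following sketch is based on the pattern in samples/SimpleSample.cpp; shapes are illustrative and the code requires the Arm NN SDK:

```cpp
#include <armnn/INetwork.hpp>
#include <vector>

// Sketch: add a FullyConnected layer whose weights come from a constant
// layer, marking the weights TensorInfo as constant.
armnn::IConnectableLayer* AddFcWithConstWeights(armnn::INetwork& net,
                                                const std::vector<float>& weightData)
{
    armnn::FullyConnectedDescriptor desc;
    desc.m_ConstantWeights = true;

    armnn::TensorInfo weightsInfo({8, 4}, armnn::DataType::Float32);
    weightsInfo.SetConstant(); // the new IsConstant flag

    armnn::IConnectableLayer* weights =
        net.AddConstantLayer(armnn::ConstTensor(weightsInfo, weightData), "weights");
    armnn::IConnectableLayer* fc = net.AddFullyConnectedLayer(desc, "fc");

    weights->GetOutputSlot(0).SetTensorInfo(weightsInfo);
    weights->GetOutputSlot(0).Connect(fc->GetInputSlot(1)); // slot 1 = weights
    return fc;
}
```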

    ABI/API Changes

    The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Rework the async threadpool f364d53 https://review.mlplatform.org/c/ml/armnn/+/5801
      Be aware that these classes are in the experimental namespace and should be treated as such.
      struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes.
      class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class.
      class IAsyncExecutionCallback: The following methods have been removed:
    • GetEndTime ( ) const
    • GetStartTime ( ) const
    • Wait ( ) const
    • GetStatus ( ) const
    Add IsConstant flag to TensorInfo b082ed0 https://review.mlplatform.org/c/ml/armnn/+/5842
      class TensorInfo: Size of this class has been increased from 80 bytes to 88 bytes. This is due to the addition of private member bool m_IsConstant.
      An object of this class can be allocated by applications, with the old size hardcoded at the original compile time; calling any exported constructor may then corrupt the memory of neighbouring objects on the stack or heap.
      struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
    Add protected mode to ArmNN CreationOptions 15fcc7e https://review.mlplatform.org/c/ml/armnn/+/5963
      struct IRuntime::CreationOptions: Field m_ProtectedMode has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
    Add the Custom Memory Allocator interface definition 801e2d5 https://review.mlplatform.org/c/ml/armnn/+/5967
      struct IRuntime::CreationOptions: Field m_CustomAllocator has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
    Add front end support for UnidirectionalSequenceLstm on ArmNN 8ed39ae https://review.mlplatform.org/c/ml/armnn/+/5956
      struct LstmDescriptor: Field m_TimeMajor has been added to this type. This field will not be initialized by old clients. Size of the inclusive type has been changed.
    JSON profiling output 554fa09 https://review.mlplatform.org/c/ml/armnn/+/5968
      struct INetworkProperties: Field m_ProfilingEnabled has been added to this type. This field will not be initialized by old clients.
    ConstTensorsAsInput: FullyConnected 81beae3 https://review.mlplatform.org/c/ml/armnn/+/5942
      class ILayerVisitor: Pure virtual method VisitFullyConnectedLayer ( IConnectableLayer const*, struct FullyConnectedDescriptor const&, char const* ) has been added to this class. The layout of the v-table has been changed. Call of any virtual method at a higher position in this class or its subclasses may result in a crash or incorrect behavior of applications. The following previously deprecated functions have been removed:
    • INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name)
    • INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, char const* name)
    Adds CustomAllocator interface and Sample App c1c872f https://review.mlplatform.org/c/ml/armnn/+/5987
      struct IRuntime::CreationOptions: Field m_CustomAllocatorMap has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
      class BackendRegistry: Fie...

    Release 21.05

    20 May 16:16

    Summary

    The 21.05 Release of Arm NN was focused on providing new capabilities to allow users to attain higher performance by:

    • Making the Arm NN Core thread safe, opening the possibility of running multiple inferences on the same model in parallel software threads.
    • Allowing graphs on the GPU backend to import their input and output buffers either from correctly aligned main memory or from kernel memory exposed as a dma_buf, thus reducing memory usage and saving the time involved in copying data into and out of the GPU memory space.

    In addition to this, support was added to allow the MobileBERT network to be parsed and run.

    Finally, three deprecated components were removed: the TensorFlow Parser, the Caffe Parser and the Arm NN Quantizer tool.

    New Features

    • CAST Operator support added on CpuRef, CpuAcc, GpuAcc Backends.
    • Non-const weights support added on FULLY_CONNECTED layer for CpuRef Backend.
    • Enable Input and Output Memory Import on GPU (Malloc and DmaBuf).
    • Asynchronous Network Execution for CpuRef Backend.
    • Optimisation added to fuse PAD into Pooling2d if possible.
    • ASR sample application added to samples directory.

    TfLite Parser

    • ABS Operator Support added.
    • ARG_MIN Operator Support added.
    • CAST Operator Support added.
    • LOGICAL_NOT Operator Support added.
    • RSQRT Operator Support added.
    • Non-const weights support added on FULLY_CONNECTED layer.
    • Turn off Biases when data location is -1 (Added to support MobileBERT).

    ArmNN Serializer/Deserializer

    • Added Signed64 support to Serializer and Deserializer.
    • Added QAsymmS8 support to Serializer.
    • Added L2 Pooling algorithm to Deserializer.

    ExecuteNetwork App Changes

    • Asynchronous Network Execution support (Currently for CpuRef Backend).
    • Re-enabled GPU profiling in ExecuteNetwork.

    Deprecated features

    • Deprecated the Caffe Parser.
    • Deprecated the Tensorflow Parser.
    • Deprecated the Arm NN Quantizer tool.
    • Deprecated m_Output_Type from the ArgMinMaxDescriptor: the output type is solely determined by the data type of the output tensor.

    Bug Fixes

    • Fix CheckProfilingObjectUids test failing on Ubuntu 21.04.
    • Fix added to Serializer to handle situations where a shape has some unspecified dimensions.
    • Fix added to AddBroadcastReshapeLayer optimisation to prevent modification to constant layers with multiple connections.
    • Fix added to use CMake value ${CMAKE_THREAD_LIBS_INIT} throughout instead of 'pthread'.
    • Fix added to handle negative axis correctly in ARG_MAX (TfLiteParser) and SPLIT (TfLiteParser & TfLiteDelegate) operators.
    • Fixed TfLiteDelegate Normalization & Softmax for Android if NDK is less than r21.
    • Fixed Deserializer issue where layer bindings were incorrectly assigning the tensor info of one output to all 4 outputs.
    • Fixed x86_64 ArmNN DockerFile.
    • Fixed TuningLevel enumeration values to be consistent.
    • Fixed YoloV3 test application's incorrect use of std::abs.
    • Improved performance on SqueezeNet v1.1.

    Other Changes

    • Removed cross-wiring in DepthwiseConvolution2d. The permutation of the full tensor info is now performed in armnnUtils::Permuted.
    • Moved doctest third-party library to armnn from delegate.
    • Updated TfLiteDelegate Python Integration guide with new links. Also added information about the TFLite Model Benchmark Tool.
    • Updated Cross Compiling Guide.
    • Improved Graph memory usage.

    Known Issues

    • Intermittent issue with DMA buf memory import on GPU. This is fixed in Mali Driver r30p0.
    • There might be performance regressions against v20.08 in Inception v3 using int8 data types on Arm Mali-G77 GPUs. Currently under investigation.

    ABI/API Changes

    The following front-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 25.0.0 while also bumping our Parsers and Delegate to 24.1.0 following Semantic Versioning guidelines.

    Feature SHA Gerrit Review Resultant ABI/API changes
    Add Async Queue to IRuntime e813d67 https://review.mlplatform.org/c/ml/armnn/+/5493
    • For struct INetworkProperties the member variable size_t m_NumThreads has been added resulting in the change of size of the inclusive type.
    Add front-end support for CAST + Add TfLiteParser support for CAST b392e98 https://review.mlplatform.org/c/ml/armnn/+/5374
    • For enum class LayerType a new enum for Cast has been added which changes the class member LastLayer to equate to Cast rather than the previous Unmap. We advise against the usage of armnn::LayerType::LastLayer where stability is required.
    Add MemorySourceFlags to TensorHandleFactoryRegistry::GetFactory 73d3e2e https://review.mlplatform.org/c/ml/armnn/+/5481
    • For struct INetworkProperties the member variable MemorySource m_InputSource has been added resulting in the change of size of the inclusive type.
    • For struct INetworkProperties the member variable MemorySource m_OutputSource has been added resulting in the change of size of the inclusive type.
    Move ILayerSupport.hpp to backends folder cae4568 https://review.mlplatform.org/c/ml/armnn/+/5500
    • include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp. This reflects the fact that ILayerSupport is a back-end interface; front-end users should move to the ABI-stable GetILayerSupportByBackendId().
    NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator f0a6dec https://review.mlplatform.org/c/ml/armnn/+/5180
    • For class LayerSupportHandle the member variable BackendId m_BackendId has been added resulting in the change of size of the inclusive type.
    • For struct FullyConnectedDescriptor the member variable bool m_ConstantWeights has been added resulting in the change of size of the inclusive type.
    Refactor Async Network API 55a8ffd https://review.mlplatform.org/c/ml/armnn/+/5365
    • For struct INetworkProperties the member variable bool m_AsyncEnabled has been added resulting in the change of size of the inclusive type.
    Remove cross-wiring in depthwise 7612bd6 https://review.mlplatform.org/c/ml/armnn/+/5411
    • For method armnnUtils::Permuted() the argument bool perChannelPermute which was defaulted to false has been removed.
    Remove Quantizer 4a621c4 https://review.mlplatform.org/c/ml/armnn/+/5486
    • The formerly deprecated class INetworkQuantizer has been removed and so any code making use of it must be altered.
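Following the ILayerSupport.hpp move noted above, front-end code should query layer support through the ABI-stable entry point. An illustrative sketch (requires the Arm NN SDK):

```cpp
#include <armnn/BackendHelper.hpp>
#include <armnn/Tensor.hpp>

// Sketch: ask the CpuAcc backend whether an Addition with these tensor
// shapes/types is supported, without including back-end headers.
bool AdditionSupportedOnCpuAcc(const armnn::TensorInfo& in0,
                               const armnn::TensorInfo& in1,
                               const armnn::TensorInfo& out)
{
    armnn::LayerSupportHandle handle =
        armnn::GetILayerSupportByBackendId(armnn::BackendId("CpuAcc"));
    return handle.IsAdditionSupported(in0, in1, out);
}
```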

    The following back-end API changes have occurred during the implementation of 21.05 that users should be aware of before upgrading.

    Feature SHA Gerrit Review Resultant ABI/API changes
    NonConstWeights: Update front-end and TfLiteDelegate support for FullyConnected Operator 16fb1a2 https://review.mlplatform.org/c/ml/armnn/+/5180
    • For class IBackendInternal the virtual method HasCapability ( enum BackendCapability ) const has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    Move ILayerSupport.hpp to backends folder cae4568 https://review.mlplatform.org/c/ml/armnn/+/5500
    • include/armnn/ILayerSupport.hpp has been moved to include/armnn/backends/ILayerSupport.hpp. This reflects the fact that ILayerSupport is a back-end interface.
    Generalise ConstCpuTensorHandle 1f58f03 https://review.mlplatform.org/c/ml/armnn/+/5515
    • include/armnn/backends/CpuTensorHandleFwd.hpp has been deprecated and replaced with include/armnn/backends/TensorHandleFwd.hpp and the forward declarations it contained have also been renamed to remove "Cpu".
    Enable import on GPU e5f0b24 https://review.mlplatform.org/c/ml/armnn/+/5605
    • For class IBackendInternal the virtual method CreateWorkloadFactory with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    • For class IBackendInternal the virtual method RegisterTensorHandleFactories with MemorySourceFlags inputFlags/outputFlags arguments has been added. As a result the layout of v-table has been changed. Calls of any virtual method at higher position in this class or i...

    Release 21.02

    24 Mar 17:02

    Summary

    The 21.02 Release provides two major pieces of functionality. The first is performance related: the ability to cache compiled OpenCL kernels when running on the GPU backend. Cached kernel files can be loaded into the runtime, eliminating the cost of compiling their associated graphs and resulting in a significant performance uplift on the first execution of a newly loaded graph. The second is that the operators which were not added to the Arm NN TensorFlow Lite delegate in the 20.11 release are now present, giving the delegate the same level of operator support as the android-nn-driver.

    The other features of the 21.02 release are updating the TensorFlow Lite parser to work with TensorFlow Lite v2.3.1 and changes to the public APIs to make binary compatibility between releases easier to maintain. Each group of public interfaces (SDK, backend, TfLiteDelegate, etc.) has been separately versioned and will have its version independently updated in subsequent releases to indicate changes in its Application Binary Interface (ABI).

    Support has also been added for the SSD-MobileNetv2 and SSD-MobileNetv3 models. The models have been verified to execute correctly with good performance. Work to generate accuracy figures for the models using the TensorFlow Lite coco_object_detection tool is ongoing and will be published when complete.

    Two backend configuration options have been added: one for CpuAcc to specify the number of threads to use when executing ML workloads on the CPU, and one for GpuAcc to load an MLGO tuning file to increase the performance of GEMM operations on the GPU.

    New Features:

    • Added ability to save and load the ClContext through ExecuteNetwork and the Android-nn-driver.
      • This will remove the time taken for initial compilation of OpenCL kernels and speed up the first execution.
    • Semantic Versioning for ArmNN APIs.
    • Arm NN TfLite Delegate (more extensive details in Arm NN TfLite Delegate section)
      • Further operator support.
      • Add capability to build on Android.
    • Verification of Support of SSD-MobileNetv2 & SSD-MobileNetv3.

    TfLite Parser

    • Added support for ELU activation.
    • Support Dilation in Conv2D.

    ONNX Parser

    • Support Dilation in Conv2D.

    Caffe Parser

    • Added Dilation support.
    • Added argmax deconv support.

    ArmNN Serializer

    • Serialise ArmNN Model on android-nn-driver.

    Public API Changes:

    Backend API Changes:

    ExecuteNetwork App Changes:

    • Two optimization parameters were added to enable saving and loading of the ClContext.
      • save-cached-network
      • cached-network-filepath
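The two ExecuteNetwork options above have a programmatic equivalent: passing GpuAcc backend options through ModelOptions at optimize time. The option names below are the GpuAcc backend-option keys; the file path is illustrative, and the sketch requires the Arm NN SDK:

```cpp
#include <armnn/BackendOptions.hpp>
#include <armnn/INetwork.hpp>

// Sketch: build OptimizerOptions that ask GpuAcc to save its compiled
// ClContext (OpenCL kernels) to a cache file for reuse on later runs.
armnn::OptimizerOptions MakeClCacheOptions()
{
    armnn::BackendOptions gpuAcc("GpuAcc",
    {
        { "SaveCachedNetwork", true },
        { "CachedNetworkFilePath", "/data/local/tmp/cl_cache.bin" }
    });

    armnn::OptimizerOptions optOptions;
    optOptions.m_ModelOptions.push_back(gpuAcc);
    return optOptions;
}
```

On subsequent runs, loading the cached file skips OpenCL kernel compilation, which is where the first-execution speed-up comes from.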

    Other changes:

    • Make it easier for backends to traverse the subgraph during optimization by sorting Subgraphview layers on construction.
    • Added CL/NEON implementation of RANK Workload.
    • Added REDUCE layer for REDUCE_MAX, REDUCE_MIN, REDUCE_SUM operators.
    • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support CpuRef Backend.
    • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workload CpuAcc Backend.
    • Added REDUCE_MAX, REDUCE_MIN, and REDUCE_SUM operator support/workload GpuAcc Backend.
    • Added more Fused Activation unit tests.
    • Handle Neon optionality on 32 bit linux platforms.
    • Validated MobileNetv2-SSD and MobileNetv3-SSD support.
    • Add CpuAcc specific configuration option numberOfThreads.
    • Add GpuAcc MLGO tuning file configuration argument.

    Bug Fixes:

    • Default stride values in depthwise and convolution to 1 instead of 0.
    • Fixed transpose conv InferOutputShape.
    • Fix incorrect padding value for asymmetric quantized type.
    • Fix build breaks for armnnDeserializer test and Threads.cpp for macosx.
      • Further fix for macosx where filenames are case insensitive.
    • Fixed unit test failure on mipsel/s390x/ppc64/powerpc.
    • Fixed ArmnnQuantizer incorrectly quantizing all data types.
    • Fixed TFLite parser not parsing TransposeConvolution.
    • Fix TfLite parser and ExecuteNetwork issues where error was not thrown in some cases.
    • Fix wav2letter not producing correct output for Neon backend.
    • Fix ReduceLayer InferOutputShape issue where the correct axis data will be read in TfLiteParser.
    • Fix Reduce workload to allow input tensors of any rank into the validate function.
    • Updated JsonPrinterTestImpl to use CpuLogitsDLogSoftmaxKernel_#.
    • Add missing serializer support for m_DimensionsSpecificity.
    • Removed unnecessary friend function in INetwork and fixed TransformIterator operator= to allow compilation on further compilers.

    Known issues:

    Deprecation Notification:

    The following components have been deprecated and will be removed in the next release (21.05) of Arm NN.

    Ubuntu 16.04 LTS is reaching End of Life.

    Ubuntu Linux 16.04 LTS will no longer be supported by April 30, 2021.
    At that time, Ubuntu 16.04 LTS will no longer receive security patches or other software updates.
    Consequently, from the 21.08 Release at the end of August 2021, Arm NN will no longer be officially supported on Ubuntu 16.04 LTS; it will instead be supported on Ubuntu 18.04 LTS.

    TfLite Delegate

    New Features:
    • Enabled ELU Activation.
    • Enabled HARD_SWISH Activation.
    • Added GATHER operator support.
    • Added Logical AND, NOT and OR operator support.
    • Added PAD operator support.
    • Added PADV2 operator support.
    • Added SPLIT operator support.
    • Added SPLIT_V operator support.
    • Added ARG_MAX operator support.
    • Added ARG_MIN operator support.
    • Added LOCAL_RESPONSE_NORMALIZATION operator support.
    • Added L2_NORMALIZATION operator support.
    • Added BATCH_TO_SPACE_ND operator support.
    • Added SPACE_TO_BATCH_ND operator support.
    • Added DEPTH_TO_SPACE operator support.
    • Added SPACE_TO_DEPTH operator support.
    • Added SUM operator support.
    • Added REDUCE_MAX, REDUCE_MIN operator support.
    • Added FLOOR operator support.
    • Added OptimizerOptions
      • Reduce Float32 to Float16.
      • Reduce Float32 to BFloat16.
      • Enable debug data.
      • Enable memory import.
    • Added STRIDED_SLICE operator support.
    • Added LSTM operator support.
    Other Changes:
    • Provided Android build.
    • Removed Tensorflow requirement.
    Bug Fixes:
    • Fixed fused activation in Fully Connected layer.
    • Fixed TfLiteDelegate Reshape operator failure when running models with 2D shape tensor.
    Known Issues:

    Note: Pre-built binaries of Arm NN 21.02 are attached to this release (see Assets). Please refer to the BuildGuideNative.md guide in armnn/delegate for more information.

    Build dependencies:

    Tools Supported Version
    Git 2.17.1 or later
    Scons 2.4.1 (Ubuntu) and 2.5.1 (Debian)
    CMake 3.5.1 (Ubuntu) and 3.7.2 (Debian)
    Boost 1.64
    Tensorflow 2.3.1
    Caffe tag 1.0
    Flatbuffer 1.12.0
    Protobuf 3.12.0
    Eigen3 3.3
    Android NDK r20b
    mapbox/variant 1.2.0
    Android 11 Compatibility Testing was performed using the following:
    • Android Tag: android-11.0.0_r1
    • Android Build ID: RP1A.200720.009
    • Mali Driver: R26P0_01EAC0, R30P0_01EAC0
    • Android Compatibility Test Suite: 11_r2 (6965179)
    • Android Vendor Test Suite: 11_r2 (6961477)
    Android 10 Compatibility Testing was performed using the following:
    • Android Tag: android-10.0.0_r39
    • Android Build ID: QQ3A.200605.002.A1
    • Mali Driver: R23P0_01REL0

    Note: Going forward, Arm NN will make documentation updates against the latest release where anything has been missed; these will be available on GitHub by selecting the doc tag corresponding to the release. For example, tag 21.02.doc1 is the 21.02 release plus some documents updated for it; there are no functional changes. These document changes are cherry-picked to branches/armnn_21_02.

    Release 20.11

    27 Nov 16:54

    Summary

    The 20.11 Release was intended to provide major improvements to usability and performance in addition to delivering some additional functionality.

    The usability enhancements were:

    • Added Debian packaging for ArmNN Core, TfLite Parser and PyArmNN to Ubuntu Launchpad. This means users on Linux no longer need to go through a source repository setup and compile in order to start working.
    • Addition of the TfLite Delegate along with 21 of its most valuable operators. This allows a much larger set of models to be executed, as operators that are not accelerated by the delegate will execute in the TfLite interpreter.
    • Removal of the Boost framework from all ArmNN code bar our unit tests. This simplifies deployment, as the dependency on Boost no longer exists.
    • Website updates (better layout and more examples).

    The performance enhancements were:

    • ArmNN integration of Compute Library Activation and Batch Normalization fusing.
    • ArmNN exposed the Compute Library fastmath option as a parameter that can be set on a per-model basis; in some scenarios this results in the selection of a faster convolution algorithm (Winograd) at the cost of some accuracy.

    The additional functionality was:

    • Addition of high priority partner requested Logical AND/OR/NOT operators in NNAPI.
    • Support for Android R, verified against CTS 11_r3 (Build Id: 20201114.173303).
    • Added support for the EfficientNet-Lite Model.

    New Features:

    • Added Debian packaging, which allows ArmNN to be installed via our APT repository on Ubuntu's Launchpad.
    • Added ability to turn on the Compute Library fast_math option through ExecuteNetwork and the Android-nn-driver.
      • Using the fast_math flag can lead to performance improvements in fp32 and fp16 layers but at the cost of some accuracy.
      • The fast_math flag will not have any effect on int8 performance.
    • Added support for Logical NOT, AND and OR for CpuRef, CpuAcc and GpuAcc.
    • Added optimization to fuse BatchNorm into Convolution and Depthwise Convolution in fp32 and fp16.
    • Added backend specific optimization to fuse Activations into the previous workload.
      • Currently Activations can be fused with Addition, BatchNorm, Convolution, Depthwise Convolution, Division, Multiplication or Subtraction workloads on both CpuAcc and GpuAcc.
      • Not all workloads can support all Activations.
    • Added AddBroadcastReshapeLayer as optimizer.
    • Added Map layer and Map workload. This layer has 1 input slot and 0 output slots and simply calls ->Map() on the input tensor handle.
    • Added Unmap layer and Unmap workload. This layer has N input slots and 0 output slots and simply calls ->Unmap() on the input0 tensor handle. The remaining inputs are used for determining scheduling dependencies.
    • Added support for TfLite Delegate (More information below in TfLite Delegate section).
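The fast_math option above can also be set programmatically, per model, via backend ModelOptions. "FastMathEnabled" is the backend-option key; this is an untested sketch requiring the Arm NN SDK:

```cpp
#include <armnn/BackendOptions.hpp>
#include <armnn/INetwork.hpp>

// Sketch: enable fast_math on both accelerated backends. As noted above,
// this can speed up fp32/fp16 layers (e.g. Winograd convolution) at some
// cost in accuracy, and has no effect on int8 performance.
armnn::OptimizerOptions MakeFastMathOptions()
{
    armnn::BackendOptions cpuAcc("CpuAcc", {{ "FastMathEnabled", true }});
    armnn::BackendOptions gpuAcc("GpuAcc", {{ "FastMathEnabled", true }});

    armnn::OptimizerOptions optOptions;
    optOptions.m_ModelOptions = { cpuAcc, gpuAcc };
    return optOptions;
}
```

The returned options are passed to armnn::Optimize(), so the setting applies only to models optimized with them.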

    TfLite Parser:

    • Remove AddBroadcastReshapeLayer from TfLite Parser and added to optimizations.
    • TfLite version updated to 2.3.1.

    Tf Parser:

    • Tensorflow version updated to 2.3.1.
    • Add support for 2nd input to ExpandDims in TfParser.

    ArmNN Serializer:

    • Added support for Logical NOT, AND and OR.

    Public API Changes:

    Backend API Changes:

    ExecuteNetwork App Changes:

    • Added ability to enable Compute Library fast_math through ExecuteNetwork.
    • Added ability to execute models using TfLiteDelegate.
    • Refactored ExecuteNetwork to support cxxopts.
    • Allow use of a dynamic backendId in ExecuteNetwork.

    Other changes:

    • Removed remaining boost from ArmNN runtime code (Boost still resides in Unit Tests).
      • Removed boost::format and swapped to fmt
        • Link fmt statically and change to be header-only interface library
      • Removed boost::tokenizer and boost::escaped_list_separator to avoid use of CsvReader
      • Removed boost::make_iterator_range and boost::to_upper_copy
      • Removed boost::transform_iterator and make_transform_iterator
      • Removed boost::numeric_cast
      • Removed boost::math::fpc uses
      • Removed boost/preprocessor.hpp
      • Removed boost::program_options and swapped to cxxopts
      • Removed boost::variant and swapped to mapbox/variant library
      • Removed Boost from standalone dynamic backend
      • Removed remaining Boost references from test executables
    • Extended dump file with info about fused layers.
    • Added SECURITY.md file that contains the security policy, vulnerability reporting procedure and a PGP key that can be used to create secure vulnerability reports.
    • Graph::Print() now outputs more information such as number of input/output tensors and tensor dimensions.
    • Updated Protobuf to 3.12.0.
    • Load dynamic backends for YoloV3 tests.
    • Included layer GUID in SerializeToDot output.
    • Refactored Optimize(...) function to throw exceptions instead of returning null.
    • Speed up the reference backend.
    • Added int32 and int64 ArgMax op support.
    • Added Quantization operator=() function to Tensor.
    • Introduce ModelOptions to OptimizedNetwork.
      • Added ability to pass ModelOption through Network::LoadNetwork() to Workload factory.
    • Added Load-scope dynamic tensor TfLite tests.

    Bug Fixes:

    • Fixed unit test failure when building with the EthosNAcc backend.
    • Fixed crash on models with a FullyConnected Sigmoid Activation by adding a supported-activations check to Neon FullyConnected validation.
    • Fixed logical VTS skip.
    • Fixed issue where EthosNAcc backend would output all zeros when falling back to CpuRef.
    • Fixed issue causing SSD Mobilenet f16/uint8 to fail on CpuRef via ExecuteNetwork.
    • Fixed issue with signed-int8 quantized model.
    • Fixed error running EfficientNet-Lite on GpuAcc.
    • Fixed validation for per-channel quantization.
    • Fixed segfault between Neon and Cl layers.
    • Fixed NonMaxSuppression.
    • Fixed Yolov3 producing 0s on Neon.
    • Removed Resize from list of layers that need padding in Neon.
    • In Neon and CL MUL workloads, use SATURATE as the convert policy if one of the inputs is quantized and WRAP in all other cases.
    • Fixed non-channel per axis quantization.
    • Fixed compiler implicit copy deprecation warning by updating Quantization copy constructor.
    • Fixed PyArmNN's hard dependency on all parsers when building with CMake.
    • Fixed cxxopts and ghc cross compilation issue.
    • Fixed undefined reference to GetIdStatic() in DynamicBackendsTests.

    Known Issues:

    • Using a comma separated list to specify multiple compute devices --compute CpuRef,CpuAcc when using ExecuteNetwork doesn't work. To use multiple compute devices use --compute CpuRef --compute CpuAcc.
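    The workaround above can be sketched as a shell snippet. The model path and the -m flag are hypothetical placeholders for illustration; only the --compute flags are taken from the note, and the command is assembled as a string rather than executed:

```shell
# Known issue: a comma-separated device list is not parsed correctly.
# Broken form (do not use in this release):
#   ExecuteNetwork --compute CpuRef,CpuAcc ...
# Workaround: repeat --compute once per device.
MODEL="model.tflite"                        # hypothetical model file
DEVICES="--compute CpuRef --compute CpuAcc"
CMD="ExecuteNetwork $DEVICES -m $MODEL"     # -m (model path) is an assumption
echo "$CMD"
```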

    TfLite Delegate:

    New Features:

    Current supported operators:

    • Activation (ReLu, Relu6, Logistic, and TanH)
    • Comparison (Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual)
    • Control (Concat and Mean)
    • Convolution (Convolution2d, DepthwiseConvolution2d and TransposeConvolution)
    • ElementWiseBinary (Add, Div, Max, Min, Mul, Sub)
    • ElementWiseUnary (Abs, Exp, Neg, Rsqrt, Sqrt)
    • FullyConnected
    • Pooling (MaxPool2d, AveragePool2d and L2Pool2d)
    • Quantization (Dequantize and Quantize)
    • Redefine (Reshape)
    • Resize (Bilinear and NearestNeighbour)
    • Softmax (Softmax and LogSoftmax)
    • Transpose

    Other Changes:

    • Created the TfLite Delegate sub-directory in ArmNN.
    • Added Fp16 support.
    • Updated Tensorflow from v1.15 to v2.3.1.
    • Activated compiler warnings when building delegate.
    • Added ability to execute models through ExecuteNetwork using the TfLiteDelegate.

    Known Issues:

    Build dependencies:

    Tools and versions we support:

    • Git: 2.17.1 or later
    • SCons: 2.4.1 (Ubuntu) and 2.5.1 (Debian)
    • CMake: 3.5.1 (Ubuntu) and 3.7.2 (Debian)
    • boost: 1.64
    • Tensorflow: 2.3.1
    • Caffe: tag 1.0
    • Onnx: 1.6.0
    • Flatbuffer: 1.12.0
    • Protobuf: 3.12.0
    • Eigen3: 3.3
    • Android: 10 and 11
    • Mali Driver: r25p1_01bet0
    • Android NDK: r20b
    • mapbox/variant: 1.2.0

    Release 20.08

    28 Aug 13:31

    Summary

    The 20.08 Release delivers the following:

    • The final tranche of support for Android R ahead of its release in September. Namely QoS functionality, Fill, Rank and the new Resize options.
    • Support for dynamic tensors where the size of any unspecified tensors can be inferred at network load time.
    • Performance enhancements on the NEON backend that eliminate unnecessary copying of data in memory, namely:
      • The ability to directly import and export data into an inference graph.
      • The ability to use subtensors where possible in split and concat workloads.
    • Verification of support for TensorFlow Lite wav2letter and wav2letter tiny models (note: further work is needed to verify accuracy in the next release).

    New Features:

    • Added FILL operator support for CpuRef, CpuAcc, and GpuAcc.
    • Added RANK operator support for CpuRef.
    • Added align corner and half pixels support to the RESIZE operator for CpuRef, CpuAcc, and GpuAcc.
    • Refactored TensorShape to support Dynamic Tensors (tensors of unknown dimension sizes or even unknown rank).
    • Enabled memory import in CpuAcc.
    • Allowed use of Sub-Tensors on CpuAcc in ConcatenationLayer if the concatenation is along x or y (the 2 innermost dimensions) and the previous layers do not require padding.
    • Allowed use of Sub-Tensors on CpuAcc in SplitterLayer if the split is along x or y (the 2 innermost dimensions) and the next layers do not require padding.

    TfLite Parser:

    • Added DIV operator support.
    • Added LEAKY_RELU operator support.
    • Added NEG operator support.
    • Added HARD_SWISH operator support.
    • Added support for Dynamic Tensors Type 1 (the input shape must always be set; the output shape can be dynamic and is inferred from the input shape).

    Public API Changes:

    • Added ITensorHandleFactory::GetCapabilities to calculate the capability of the TensorHandleFactory.

    ExecuteNetwork App Changes:

    • Added the -infer-output-shape option: when enabled, it turns on ShapeInferenceMethod::InferAndValidate in the TfLiteParser, which supports Dynamic Tensors Type 1 (the output shape can be inferred from the input shape).
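    As a hedged sketch, enabling this option from the command line might look as follows. Only -infer-output-shape is taken from the note; the model path, the -m flag, and the chosen backend are hypothetical placeholders, and the command is only assembled and printed here:

```shell
# Enable ShapeInferenceMethod::InferAndValidate in the TfLiteParser
# via ExecuteNetwork's -infer-output-shape option (Dynamic Tensors Type 1).
MODEL="dynamic_output_model.tflite"   # hypothetical model file
CMD="ExecuteNetwork -infer-output-shape --compute CpuAcc -m $MODEL"
echo "$CMD"
```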

    Other changes:

    • Added EXP operator support to CpuAcc and GpuAcc.
    • Added ADD, SUB, DIV, MUL, MAXIMUM and MINIMUM int32 support in CpuRef.
    • Added PRELU float16 support in CpuRef.
    • Added ARGMINMAX float16 support in CpuRef.
    • Added GATHER support for any axis in CpuAcc and GpuAcc (previously the support was only for axis = 0).
    • Added LOGSOFTMAX support in CpuAcc and GpuAcc.
    • Added support for subtensors on Splitter layer for splitting x/y axis if no padding required on next layer.
    • Added support for subtensors on Concat layer for concatenating x/y axis if no padding required on previous layer.
    • Replaced boost::filesystem with ghc::filesystem.
    • Removed boost/dll.hpp from the dynamic backends test.
    • Separated external profiling server code into a standalone library.

    Bug Fixes:

    • Added ability for Mean Reduction to reduce to scalar.
    • Added ability for Strided Slice to shrink to scalar.
    • Added a check for Strided Slice to not run when stride is negative and ShrinkAxisMask set.
    • Fixed edge case in TransposeConv2d output shape inference.
    • Fixed deserializer output binding TensorShape logic.
    • Fixed issue where AddBroadcastReshapeLayer would always connect the reshaped input to the first input slot, regardless of which input required the reshape.
    • Removed TfLite Concat and Pad quantization validation.

    Build dependencies:

    • Git: 2.17.1 or later
    • SCons: 2.4.1 (Ubuntu) and 2.5.1 (Debian)
    • CMake: 3.5.1 (Ubuntu) and 3.7.2 (Debian)
    • boost: 1.64
    • Tensorflow: TENSORFLOW_REVISION=590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0)
    • Caffe: CAFFE_REVISION=7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9
    • Onnx: ONNX_REVISION=f612532843bd8e24efeab2815e45b436479cc9ab
    • Flatbuffer: 1.12.0
    • Protobuf: 3.5.2
    • Eigen3: 3.3
    • Android: 9 and 10
    • Mali Driver: r25p1_01bet0
    • Android NDK: r20b