Releases: ARM-software/armnn
Release 20.05
New Features:
- Added comparison operators (EQUAL, NOT_EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL) support to CpuAcc and GpuAcc backends
- Added EXP operator support to CpuAcc backend
- Added NEG operator support to CpuAcc and GpuAcc backend
- Added QLSTM operator partial support (projection not yet supported) to Reference backend
- Added QLSTM operator full support to CpuAcc and GpuAcc backends
- Added Boolean data type
- Added QAsymmS8 to ArmnnQuantizer
- Added QAsymmS8 data type
- Added BFloat16 data type
- Added BFloat16 support to Reference backend for the following layers:
- Activation
- Addition
- ArgMinMax
- BatchNormalization
- BatchToSpaceNd
- Comparison
- Concat
- Constant
- Convolution2d
- Debug
- DepthToSpace
- DepthwiseConvolution2d
- DetectionPostProcess
- Equal
- Floor
- FullyConnected
- Gather
- Input
- InstanceNormalization
- L2Normalization
- LogSoftmax
- Lstm
- Maximum
- Mean
- MemCopy
- MemImport
- Merge
- Minimum
- Multiplication
- Normalization
- Output
- Pad
- Permute
- Pooling2d
- Quantize
- Division
- Prelu
- Reshape
- Resize
- Slice
- Softmax
- SpaceToBatchNd
- SpaceToDepth
- Splitter
- Stack
- StandIn
- StridedSlice
- Subtraction
- Switch
- TransposeConvolution2d
- Transpose
- Added support for BFloat16 turbo mode
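As a rough illustration of the BFloat16 data type listed above: a bfloat16 value keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE 754 float32. The sketch below converts by plain truncation, which is an assumption made for brevity (production code commonly rounds to nearest even). The function names are hypothetical and are not part of the Arm NN API.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not Arm NN API names.

// Convert a float32 to bfloat16 bits by keeping the upper 16 bits (truncation).
inline std::uint16_t Float32ToBFloat16Bits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits)); // bit-exact copy, avoids aliasing issues
    return static_cast<std::uint16_t>(bits >> 16);
}

// Expand bfloat16 bits back to float32 by zero-filling the lower 16 mantissa bits.
inline float BFloat16BitsToFloat32(std::uint16_t bf16)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(bf16) << 16;
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Because the exponent width matches float32, the conversion preserves range and only loses mantissa precision.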
TfLite Parser:
- Added support for STRIDED_SLICE operator
- Added support for EXP operator
- Added support for SPLIT_V operator
- Enabled SPLIT along any dimension
Tf Parser:
- Added support for PACK/STACK
ArmNN Serializer:
- Added QSymmS8 data type support to the ArmNN Serializer schema (ArmnnSchema.fbs)
- Added per-axis quantization parameters to ArmnnConverter (Serializer - Deserializer) tool
Public API Changes:
- Added Activate and Deactivate timeline control packet to the External Profiling protocol
Backend API Changes:
- Added a user backend hint API to select the preferred backend on a per-layer basis
Bug Fixes:
- Fixed segfault parsing reshape layer
- Fixed ArmNN Compile Error when compiled against gcc 9
- Fixed unit test errors when running on Raspberry Pi, caused by the size of thread::id being platform dependent
- Fixed LSTM layer CellToInputWeights
Other changes:
- Separated out BasePipeServer library from GatorDMock
- Introduced polymorphic_downcast implementation
- Introduced numeric_cast implementation
- Introduced PolymorphicPointerDowncast implementation
- Removed boost::ignore_unused
- Removed boost::polymorphic_pointer_downcast
- Removed boost::polymorphic_downcast
- Eliminated space restriction in batch norm layer which was giving errors when loading a quantized model
- Doxygen Beautification
- Integration of PyArmNN
Known issues:
Build Dependencies:
Tools | Version we support |
---|---|
Git | 2.17.1 or later |
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian) |
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian) |
boost | 1.64 |
Tensorflow | TENSORFLOW_REVISION= 590d6eef7e91a6a7392c8ffffb7b58f2e0c8bc6b (v1.15.0) |
Caffe | CAFFE_REVISION= 7d3f8a7ea43fb06cd9804bc90933c7a91cd88ec9 |
Onnx | ONNX_REVISION= f612532843bd8e24efeab2815e45b436479cc9ab |
Flatbuffer | 1.10.0 |
Protobuf | 3.5.2 |
Eigen3 | 3.3 |
Android | 9 and 10 |
Mali Driver | r23 |
Android NDK | r20b |
Release 20.02
New Features:
- Added per-channel quantization support for Convolution2d, DepthwiseConvolution2d on CpuAcc & GpuAcc backend.
- Added post-optimized network structure for external profiling support.
- Added inference timeline trace for external profiling support.
- Added further support to Quantize CpuRef workload to make use of Decoder/Encoder types. This allows for requantize operations.
- Added NEON support for:
- SpaceToBatchNd
- Division
- Added ElementwiseUnaryLayer with ops for Abs, Exp, Sqrt, RSqrt and Neg. Standalone layers for Abs and RSqrt are now deprecated.
- Added Support for Signed quantized data types (QSymmS8 & QAsymmS8).
- Added sample of standalone dynamic backend.
- Added DynamicSample as a basic example of using the ArmNN SDK API with the standalone sample dynamic backend.
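The requantize operation mentioned above can be pictured as decoding a quantized value to a real number with the input parameters and re-encoding it with the output parameters. The following is a minimal sketch under that assumption, using asymmetric 8-bit quantization; `QuantParams`, `Dequantize`, `QuantizeToUint8` and `Requantize` are hypothetical names, not the Arm NN Decoder/Encoder classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical parameter bundle for illustration; not an Arm NN type.
struct QuantParams
{
    float scale;         // real-value step represented by one quantized unit
    std::int32_t offset; // quantized value that represents real 0.0
};

// Decode: real = scale * (q - offset).
inline float Dequantize(std::int32_t q, const QuantParams& p)
{
    return p.scale * static_cast<float>(q - p.offset);
}

// Encode: q = round(real / scale) + offset, saturated to the uint8 range.
inline std::uint8_t QuantizeToUint8(float real, const QuantParams& p)
{
    assert(p.scale > 0.0f);
    const float q = std::round(real / p.scale) + static_cast<float>(p.offset);
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, q)));
}

// Requantize: decode with the input parameters, encode with the output parameters.
inline std::uint8_t Requantize(std::uint8_t q, const QuantParams& in, const QuantParams& out)
{
    return QuantizeToUint8(Dequantize(q, in), out);
}
```

Values that fall outside the output range after rescaling saturate rather than wrap.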
TfLite Parser:
- Added support for RESIZE_NEAREST_NEIGHBOR.
- Added support for DEQUANTIZE.
- Added support for QUANTIZE.
- Updated required TfLite version to 1.15.
- Added support for new quantized data types introduced in TfLite version 1.15.
- Added support for variable Flatbuffer libs for debug and release.
Tensorflow Parser:
- Added support for StridedSlice.
- Added support for Pack/Stack.
ArmNN Serializer/Deserializer:
- Added support for deserialization to the following ArmNN layers:
- ElementwiseUnary
- Resize
- ResizeBilinear
- Added support for serialization to the following ArmNN layers:
- ResizeBilinear
Public API Changes:
- Added New API to IRuntime::CreationOptions for passing parameters direct to the backends.
Backend API Changes:
- Added WorkloadFactoryBase class with default empty implementations. (Analogous to the existing LayerSupportBase class).
Bug Fixes:
- Fixed ONNX Parser bug where segmentation fault was occurring.
- Fixed SendCounterPacket hanging for indefinite time.
- Fixed compilation error when building for Linux (non Android).
- Fixed issues due to #include Windows.h.
- Fixed build error on gcc 7+ for implicit switch statement fallthroughs.
- Fixed issue in Serializer where models with multiple inputs/outputs could be serialized with incorrect binding ids.
Other changes:
- ~15% reduction in binary size by replacing boost logging with custom lightweight logger.
- ArmNN can now be built without warnings with -Wextra compile flag.
- Deprecated DataType::QuantizedSymm8PerAxis. Instead this behaviour can be selected by setting the data type to DataType::QSymmS8 and setting multiple scale values on the TensorInfo quantization parameters.
- Fixed crash when running ArmNN on a system without OpenCL drivers and a GpuAcc backend is present.
- Initial documentation implemented via Doxygen.
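To illustrate the per-axis behaviour described in the deprecation note above (a symmetric 8-bit type with multiple scale values), here is a minimal sketch of dequantizing data with one scale per channel. The layout (channel-major, equal elements per channel) and the helper name `DequantizePerChannel` are assumptions for illustration and are not taken from Arm NN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Arm NN API. Dequantizes symmetric int8 data stored as
// [channels][elementsPerChannel] using one scale per channel. Symmetric
// quantization means the zero point is always 0, so real = scale * q.
inline std::vector<float> DequantizePerChannel(const std::vector<std::int8_t>& data,
                                               const std::vector<float>& scales)
{
    assert(!scales.empty());
    assert(data.size() % scales.size() == 0);

    const std::size_t perChannel = data.size() / scales.size();
    std::vector<float> result(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        // i / perChannel selects the channel, and therefore the scale, for element i.
        result[i] = scales[i / perChannel] * static_cast<float>(data[i]);
    }
    return result;
}
```

With a single entry in `scales`, the same routine degenerates to ordinary per-tensor symmetric quantization.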
Known issues:
- External profiling for ArmNN on the Raspberry Pi platform is currently not fully supported and will result in some External Profiling unit tests failing.
v19.11.1
ArmNN 19.11.1 Release Notes
This is an incremental release of ArmNN 19.11 to fix CTS issues.
ArmNN SDK
New Features:
TfLite Parser:
Public API Changes:
- The IsReshapeSupported(const TensorInfo& input, const ReshapeDescriptor& descriptor, Optional<std::string&> reasonIfUnsupported = EmptyOptional()) in ILayerSupport has been deprecated.
- IsReshapeSupported(const TensorInfo& input, const TensorInfo& output, const ReshapeDescriptor& descriptor, Optional<std::string&> reasonIfUnsupported = EmptyOptional()) has been added to ILayerSupport.
Backend API Changes:
Other changes:
Known issues:
v19.08.1
ArmNN 19.08.01 Release Notes
This is an incremental release of ArmNN 19.08 to fix CTS issues.
ArmNN SDK
New Features:
TfLite Parser:
Public API Changes:
Backend API Changes:
Other changes:
Known issues:
Android NNAPI driver
Deprecated features:
New Features:
Other changes:
All errors and crashes occurring on the 19.08 release when running the Android Compliance Test Suite (CTS) R2 on Android 10 (Android Q) have been fixed, including:
- Driver termination during TestRandomGraph when using GPU acceleration (i.e. ARMNN_COMPUTE_CL_ENABLE:=1)
- Some TestRandomGraph/RandomGraphTest tests which include CONCATENATION and L2_POOLING_2D operators.
- Some TestRandomGraph/RandomGraphTest tests which include operators taking the optional data layout argument if the argument is present and set to NCHW.
- Some TestRandomGraph/RandomGraphTest tests which include operators using FLOAT16 input.
- Some TestRandomGraph/RandomGraphTest tests which include RESIZE_BILINEAR operators.
- Some TestRandomGraph/RandomGraphTest tests which include RESIZE operators.
- Some TestRandomGraph/RandomGraphTest tests which include RESIZE_NEAREST_NEIGHBOR operators.
- Some TestRandomGraph/RandomGraphTest tests which include SPACE_TO_DEPTH operators.
- TestRandomGraph/SingleOperationTest#ADD_V1_0/31
- TestRandomGraph/SingleOperationTest#MUL_V1_0/31
- TestRandomGraph/SingleOperationTest#SUB_V1_2/31
- TestRandomGraph/SingleOperationTest#STRIDED_SLICE_V1_2/17
- TestRandomGraph/SingleOperationTest#PRELU_V1_2/14
- Several other errors occurring in Activations when debug.nn.partition is set to 2
Backend API Changes:
Known Issues:
Release 19.11
New Features:
- Added Abs support to CpuRef, CpuAcc and GpuAcc backend.
- Added Comparison support to CpuRef, covering the following operations: Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual. Refactored the Equal and Greater layers previously present in terms of the new Comparison layer.
- Added Rsqrt support to CpuAcc and GpuAcc backend.
- Added ArgMinMax support to CpuRef, CpuAcc and GpuAcc backend.
- Added InstanceNormalization support to CpuRef, CpuAcc and GpuAcc backend.
- Added LogSoftmax support to CpuRef backend.
- Added Slice support to CpuAcc backend.
- Added DepthToSpace support to CpuRef, CpuAcc and GpuAcc backend.
- Added StandIn layer, which represents "unknown" or "unsupported" operations in the input graph. The StandIn layer has a configurable number of input and output slots. No workloads are created for the StandIn layer.
- Added QSymm8PerAxis support for Encoder and Decoder.
- Added per-channel quantization support for Convolution2d, DepthwiseConvolution2d and TransposeConvolution2d on CpuRef backend.
- Added FSRCNN support to CpuRef (fp32 and uint8), CpuAcc (fp32 and uint8) and GpuAcc (fp32) backend.
- Added initial external profiling support. A new ProfilingService class allows Arm NN to connect to an external profiling service and exchange an initial set of counter metadata: it advertises a list of counters the client can select from and periodically sends the values of the selected counters to the client. The profiling support is compatible with DS5 and Streamline clients. The profiling service relies on gatord to forward the packets to the external profiling server.
- Added utility functions for creating Timeline Packets:
- Timeline Label Binary Packet
- Timeline Entity Binary Packet
- Timeline Event Class Binary Packet
- Timeline Message Directory Package
- Timeline Event Binary Packet
- Added SendTimelinePacket implementation to send Timeline Packets:
- Timeline Label Binary Packet
- Timeline Entity Binary Packet
- Timeline Event Class Binary Packet
- Timeline Message Directory Package
- Timeline Event Binary Packet
- Added TimelineUtilityMethods class to manage profiling entities
- Added utility function to create a named typed entity
- Added utility function to create a named typed child entity
- Added utility function to create a typed label
- Added utility function to declare a label
- Added utility function to record an event
- Added Timeline Decoder
- Added ITimelineDecoder C interface
- Added an example implementation of ITimelineDecoder
- Added command handlers for the timeline directory and objects
TfLite Parser:
- Added support for Transpose.
- Added support for parsing unsupported layers by representing them as a placeholder StandInLayer in the resulting Arm NN network. Please note that such networks will not be executable, as there are no workloads for StandInLayer – its only purpose is to maintain the original network topology.
- Fixed a bug in parsing custom layers that caused the TfLiteParser to attempt to parse all custom layers as a DetectionPostProcess layer. Now unsupported custom layers are parsed as a StandInLayer – similarly to unsupported built-in layers.
- Added support for Slice.
Public API Changes:
Backend API Changes:
- New CreateTensorHandle functions have been added to ITensorHandleFactory to allow for the creation of TensorHandles with unmanaged memory.
Other changes:
- Modified ExecuteNetwork so that it can generate dummy input data if no input data files are specified. This can be useful when the user is not interested in inference results, but in performance metrics or if they only wish to see whether Arm NN can execute a certain network.
- Fixed a CTS bug in pooling layers when assessing whether the kernel lies solely over padding values.
- Change to algorithm for calculating subgraphs to submit to backends for optimisation to remove dependency cycles and unwanted subgraph splitting.
- Added Encoder and Decoder support to Dequantize layer.
Known issues:
Release 19.08
New Features:
- Added Dequantize layer support to CpuAcc and GpuAcc backend
- Added Quantize layer support to CpuAcc and GpuAcc backend
- Added Quantized_LSTM layer support to CpuAcc and GpuAcc backend
- Added PReLU layer support to CpuRef, CpuAcc and GpuAcc backend
- Added ResizeNearestNeighbor layer support to CpuRef, CpuAcc and GpuAcc backend
- Added SpaceToDepth layer support to CpuRef, CpuAcc and GpuAcc
- Added StridedSlice layer support to CpuAcc backend
- Added TransposeConvolution2d layer support to CpuRef and GpuAcc backend
- Added customizable Padding support to CpuAcc and GpuAcc backend.
- Added QuantisedAsymm8 support to the following reference workloads:
- L2Normalization
- PReLU
- Rsqrt
- SpaceToDepth
- Added QuantisedSymm16 support to the following reference workloads:
- BatchNormalization
- BatchToSpaceNd
- DetectionPostProcess
- Floor
- FullyConnected
- Gather
- L2Normalization
- Mean
- Normalization
- Pad
- Permute
- Pooling2d
- PReLU
- Reshape
- Resize
- Rsqrt
- Softmax
- SpaceToBatchNd
- SpaceToDepth
- Splitter
- StridedSlice
- Added layer normalization support for Ref, CL and Neon Lstm workload
- Added dilated convolution2d support for CL and Neon
- Added axis support for Softmax for Ref backend.
- The reference backend can now be built optionally, like all the other backends. Unlike the other backends, it is enabled by default in the global makefile cmake/GlobalConfig.cmake.
To enable/disable it, use the new ARMNNREF CMake option (for example, add "-DARMNNREF=0" to disable it).
Alternatively, to make the change "permanent", change ArmNN's global makefile (cmake/GlobalConfig.cmake) accordingly, like:
option(ARMNNREF "Build with ArmNN reference support" ON) the default, or:
option(ARMNNREF "Build with ArmNN reference support" OFF) to disable the reference backend
Disabling the reference backend will impact some of the unit tests that are built with ArmNN, as many of them use the reference backend as a way to perform cross-verification and end-to-end tests.
Follow the usage of ARMNNREF through the makefiles and ARMNNREF_ENABLED in the code to know which unit tests may be excluded if the reference backend is disabled.
- Added dynamic backend loading support, backends can now be loaded dynamically at runtime.
Updated the readme file at src/backends/README.md to explain the feature.
The public release note with technical details on the implementation: https://developer.mlplatform.org/w/arm_nn/design_notes/dynamic_backend_loading/
TfLite Parser:
- Added support for L2Normalization, TransposeConvolution2d
Public API Changes:
Backend API Changes:
- Added GetAPIVersion method to retrieve the current version of the Backend API.
- Added BackendVersion object to handle the Backend API and the dynamic backend versions.
- Added notes in the README file to describe the base interface for dynamic backends and the versioning strategy to enforce ABI compatibility.
- Added notes in the README file to describe how to specify the paths where to load the dynamic backends from.
- Added notes in the README file to describe the naming convention the dynamic backends files should comply with in order to be processed by ArmNN.
- Any available/valid dynamic backend is now loaded during the Runtime object creation, and added to the Backend Registry. ...
Release 19.05
New Features:
- Added Caffe, Onnx, and TfLite Support to Armnn Converter executable.
- Added support for QuantisedSymm16 data type.
- Added new quantization scheme for QuantisedSymm16 quantization target.
- Added QuantisedSymm16 support to the following workloads:
- Reference Elementwise Workload (Addition, Subtraction, Division, Multiplication, Maximum, Minimum, Greater, and Equal Operators).
- Reference Activation Workloads (Linear, Sigmoid, ReLU, SoftReLU, BoundedReLU, LeakyReLU, Sqrt, Square, Abs, Tanh).
- Reference LSTM Workload.
- Reference Concat Workload.
- Reference Constant Workload.
- Reference Convolution2D Workload.
- Reference DepthwiseConvolution2d Workload.
- Extended QuantizerVisitor class to support customizable quantization scheme (selected based on the parameter from QuantizerOptions struct).
- Extended QuantizerVisitor class to support an option to preserve input and output types by inserting Quantize and Dequantize layers to the quantized network.
- Dequantize layer support for CpuRef backend
- Quantize layer support for CpuRef backend
- Added support for attaching custom callback function to the Debug layer
- Added new method RegisterDebugCallback(...) to IRuntime which allows a custom callback function to be attached to the Debug layer.
- Added CpuAcc and GpuAcc support for merging height in NCHW or width in NHWC cases.
- Added CpuAcc support for the sigmoid activation function.
- Added TfLite Parser support for:
- Rank-0 operands.
- Split operator.
- Unpack operator.
- TanH operator.
- Added support for TfLite DeepSpeech v1 model.
- Support for Serialization / Deserialization of the following ArmNN layers:
- Normalization
- BatchNormalization
- L2 Normalization
- Minimum
- Maximum
- Equal
- Rsqrt
- Floor
- Greater
- ResizeBilinear
- Subtraction
- StridedSlice
- Mean
- Merger (concat)
- Splitter
- DetectionPostProcess
- LSTM
Public API Changes:
- Implemented QuantizerOptions struct to enable customization of network quantization process.
- Updated Create(...) and CreateRaw(...) static method of the INetworkQuantizer class to take an additional QuantizerOptions argument (default value provided).
- Added Quantization Scheme parameter to the Armnn Quantizer Command Line tool, valid options are QAsymm8 and QSymm16. The default scheme should be QAsymm8
- Added type preservation parameter to the Armnn Quantizer tool
- Updated the EraseLayer methods in the Graph API, they no longer return an iterator
- The GetOutput method of the ISubgraphViewConverter interface has been renamed to CompileNetwork
- Updated the Backend API: the old OptimizeSubgraphView method is now deprecated in favor of a new version that returns a more comprehensive OptimizationViews object, containing:
- A list of successful optimizations, in the form of substitution pairs, associating a SubgraphView representing a portion of the original graph with a replacement subgraph, also in the form of a SubgraphView, containing the substitution layers
- A list of failed optimizations, in the form of SubgraphView objects
- A list of untouched subgraphs, in the form of SubgraphView objects
- The SubGraph class has been renamed (and improved) to SubgraphView; the old definition is now kept as a deprecated alias of SubgraphView
- The method CreateSubGraphConverter of the backend API has been deprecated and it's no longer used by any backend implementation
- INetwork.hpp: AddMergerLayer has been deprecated and replaced by AddConcatLayer
- ILayerSupport.hpp and LayerSupport.hpp: IsMergerSupported has been deprecated and replaced by IsConcatSupported
- ILayerVisitor.hpp: a default implementation of VisitConcatLayer which calls VisitMergerLayer has been provided to ease migration
- LayerVisitorBase.hpp: VisitConcatLayer method added
Backend API Changes:
- The SubGraph class has been renamed SubgraphView.
- The method "SubgraphUniquePtr IBackendInternal::OptimizeSubgraph(const Subgraph& subgraph, bool& optimizationAttempted) const" has been deprecated and should be replaced w...
Release 19.02
New Features:
- Maximum operator support for CpuRef and CpuAcc backend.
- Minimum operator support for CpuRef, CpuAcc and GpuAcc backend.
- Maximum operator support for TensorFlow parser.
- Pad operator support for TensorFlow parser.
- ExpandDims operator support for TensorFlow parser.
- Sub operator support for TensorFlow parser.
- BatchToSpace operator support for GpuAcc backend.
- StridedSlice operator support for CpuRef, GpuAcc and CpuAcc backend.
- SpaceToBatchNd operator support for GpuAcc backend. Some padding configurations are currently not interpreted correctly.
- Greater operator support for CpuRef, GpuAcc and CpuAcc backend.
- Greater operator support for TensorFlow parser.
- Equal operator support for CpuRef backend.
- Equal operator support for TensorFlow parser.
- AddN operator support for TensorFlow parser.
- Split operator support for TensorFlow parser.
- Reciprocal of square root (Rsqrt) operator support for CpuRef backend.
- Mean operator support for TensorFlow parser.
- ResizeBilinear operator support for CpuAcc backend.
- Logistic support for TensorFlow Lite parser.
- Logistic support for GpuAcc backend.
- Gather operator support for CpuRef backend.
- Gather operator support for TensorFlow parser.
- TensorFlow Lite parser support for BatchToSpace operator.
- TensorFlow Lite parser support for Maximum operator.
- TensorFlow Lite parser support for Minimum operator.
- TensorFlow Lite parser support for ResizeBilinear operator.
- TensorFlow Lite parser support for SpaceToBatch operator.
- TensorFlow Lite parser support for StridedSlice operator.
- TensorFlow Lite parser support for Sub operator.
- TensorFlow Lite parser support for concatenation on tensors with rank other than 4
- TensorFlow Lite parser support for Detection Post Process.
- TensorFlow Lite parser support for Reciprocal of square root (Rsqrt).
- Detection Post Process custom operator Reference implementation added.
- Support for Serialization / Deserialization of the following ArmNN layers:
- Activation
- Addition
- Constant
- Convolution2d
- DepthwiseConvolution2d
- FullyConnected
- Multiplication
- Permute
- Pooling2d
- Reshape
- Softmax
- SpaceToBatchNd
- New executable to convert network from TensorFlow Protocol Buffers to ArmNN format
- New C++ Quantization API, supported layers are:
- Input
- Output
- Addition
- Activation
- BatchNormalization
- FullyConnected
- Convolution2d
- DepthwiseConvolution2d
- Softmax
- Permute
- Constant
- StridedSlice
- Splitter
- Pooling2d
- FullyConnected
- Reshape
- Merger
- SpaceToBatch
- ResizeBilinear
Public API Changes:
- Support for the boolean data types. These are specified as 8-bit unsigned integers where zero (all bits off) represents false and any non-zero value (any bits on) represents true.
- AddRsqrtLayer() method added to the graph builder API.
- The profiling event now uses BackendId instead of Compute to identify the backend. BackendId is a wrapper class for the string that identifies a backend, and it is provided by the backend itself, rather than being statically enumerated like Compute.
- Added the new method OptimizeSubGraph to the backend interface that allows the backends to apply their specific optimizations to a given sub-graph.
- The old way backends had to provide a list of optimizations to the Optimizer (through the GetOptimizations method) is still in place for backward compatibility, but it's now considered deprecated and will be removed in a future release.
- Added the new interface class INetworkQuantizer for the Quantization API, exposing two methods:
- OverrideInputRange: allowing the caller to replace the quantization range for a specific input layer
- ExportNetwork: returning the quantized version of the loaded network
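The boolean convention described above (8-bit unsigned integers where zero is false and any non-zero value is true) can be sketched as follows. The helper names are hypothetical and only illustrate how a raw buffer would be interpreted; they are not Arm NN functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration; not Arm NN API.

// Any bit set means true; only the all-zero byte means false.
inline bool ToBool(std::uint8_t raw)
{
    return raw != 0;
}

// Count how many elements of a boolean tensor buffer are true.
inline std::size_t CountTrue(const std::vector<std::uint8_t>& tensor)
{
    std::size_t count = 0;
    for (std::uint8_t raw : tensor)
    {
        if (ToBool(raw))
        {
            ++count;
        }
    }
    return count;
}
```

Note that comparing a raw element against 1 would misclassify values such as 255 or 2, which is why the check is against zero.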
Known issues:
- Large graphs with many branches and joins can take an excessive time to load, or cause a software hang while loading into ArmNN. This issue affects versions of ArmNN from 18.11 onwards. We are continuing to investigate and will fix the problem in a future release. Models known to be affected include Inception v4 and Resnet V2 101.
- Merge layer with 8-bit quantized data where the tensors to be merged have different quantization parameters does not work on the GpuAcc or CpuAcc backends. This is known to affect quantized Mobilenet-SSD models, and some quantized Mobilenet v2 models.
Release 18.11
New Features:
• Addition support for 8-bit tensors on the GpuAcc backend
• FullyConnected support for 8-bit tensors on the GpuAcc backend
• Division support for the GpuAcc backend.
• Subtraction support for the GpuAcc and CpuAcc backends.
• Arithmetic Mean operator support for the GpuAcc backend.
• Pad operator support for GpuAcc and CpuRef backends.
• SpaceToBatchNd operator support for CpuRef backend.
• BatchToSpaceNd operator support for CpuRef backend.
• Added support for NHWC Normalization with 'cross channels' method, including CpuRef backend support. NHWC data layout is not yet supported for 'Within channels' normalization method on any backend.
• Added support for NHWC ResizeBilinear for the CpuRef and GpuAcc backends
• Added support for NHWC Convolution2d for the CpuRef and GpuAcc backends.
• Added support for NHWC DepthwiseConvolution.
• Added support for NHWC Pooling2d for the CpuRef, GpuAcc and Neon backends
• Added support for NHWC L2Normalization.
• Added support for NHWC BatchNormalization.
• Added support for Float32 LSTM for CpuRef backend.
• Added CONCATENATION, FULLY_CONNECTED, MAX_POOL_2D, RELU, RELU6, RESHAPE operators support to the TfLite Parser.
• Added FullyConnected support for 8-bit tensors on the CpuAcc backend.
• Added arbitrary axis support for the Merger Layer.
Public API Changes:
• armnn::Optional helper class was introduced and used in the IsDepthwiseConvolutionSupported(...) and IsConvolution2dSupported(...) functions to represent optional biases
• The IsXXXSupported(...) free functions now take a BackendId instead of the Compute enum. Backward compatibility is maintained through the automatic conversion from the Compute to the BackendId type.
• The Compute enum and the IsXXXSupported(...) free functions are being deprecated in favor of the IBackend and ILayerSupport interfaces, which provide the same functionality in a more flexible and extensible manner. The deprecated functions will be removed in a future release.
Other changes:
• An issue has been fixed where Profiler JSON output would report units of milliseconds but the data was actually in microseconds.
Release 18.08
This release of Arm NN integrates the latest Compute Library and adds improvements to thread-safety, memory consumption and overall performance.
New Features:
- The amount of system memory needed for a loaded network has been reduced compared to Release 18.05.
- Support for LSTM operator.
- Support for 16-bit floating point including:
- Support for 16-bit floating point weights and bias tensors in ModelBuilder (INetwork) API
- Optimiser option to automatically convert 32-bit floating point models to 16-bit floating point where supported.
- Support for computing inference in 16-bit floating point precision.
- Support for Tensorflow Lite parser including additional operator support for:
- AVERAGE_POOL_2D
- CONV_2D
- DEPTHWISE_CONV_2D
- SOFTMAX
- SQUEEZE
- Support for ONNX parser including additional layer support for:
- Addition
- Convolution
- MatMul
- Max Pool
- Constant
- Relu
- Reshape
- More detailed profiling with JSON output format support.
- Captures CL and Neon kernel level events
Public API Changes:
- API for creating a Runtime object has changed. It no longer takes an armnn::Compute argument but instead requires a CreationOptions object. (See include/armnn/IRuntime.hpp)
- The Optimize function now takes an additional 2 parameters (See include/armnn/INetwork.hpp)
- backendPreferences: a vector of compute devices that the user wants to execute the workloads on, in preference order. The Optimize function will attempt to use the first backend in the list, only falling back to subsequent backends if the first does not support the layer. For example, a preference list of GpuAcc, CpuAcc will attempt to execute on the Mali GPU, falling back to a v7/v8 Arm CPU if the workload in question is not supported by the GPU.
- (Optional) OptimizerOptions parameter which contains the flag to convert a 32-bit floating point model to 16-bit floating point automatically.
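The fallback behaviour of the backend preference list described above can be sketched as a first-match search. This is a simplified model under stated assumptions: `BackendInfo`, its `supportedLayers` set and `AssignBackend` are hypothetical and do not reflect how Arm NN queries layer support internally.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model for illustration: each backend advertises the layer types
// it can execute. This is not an Arm NN type.
struct BackendInfo
{
    std::string id;
    std::set<std::string> supportedLayers;
};

// Return the id of the first backend, in preference order, that supports the
// layer type. An empty string means no backend in the list supports it.
inline std::string AssignBackend(const std::string& layerType,
                                 const std::vector<BackendInfo>& preferences)
{
    for (const BackendInfo& backend : preferences)
    {
        if (backend.supportedLayers.count(layerType) != 0)
        {
            return backend.id;
        }
    }
    return std::string();
}
```

With a preference list of GpuAcc then CpuAcc, a layer supported by both is assigned to GpuAcc, while a layer only CpuAcc supports falls through to CpuAcc.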
Other changes:
- This release of ArmNN requires at least release 18.08 of the Compute Library.
- Fixed an issue where a 4d softmax caused the entire network to fail conversion.
- Fixed ParseFlatbuffersFixture to pass quantized input/output properly
- Fixed thread-safety of runtime.
- Fixed Mobilenet Caffe model crashing when GpuAcc is selected as compute device
- Fixed failing NetworkTests when CL support is on but Neon support is off