
v1.13.0-rc1

Pre-release
evgeny-leksikov released this on May 27, 2022, 15:05 (commit 43f710a)

1.13.0-rc1 (May 27, 2022)

Features

Core

  • Added new objects to VFS: local and remote endpoint addresses, ucp_ep_create success/failure statistics, and failed/destroyed endpoints
  • Added support for UCX static libraries
  • Added profiling for rkey management routines
  • Enabled PCIe relaxed ordering by default on AMD CPUs

UCP

  • Added an API to pass a pre-registered memory handle to UCP operations
  • Added an implementation of the AM rendezvous protocol
  • Added a 2-stage pipeline rendezvous protocol for GPU
  • Added support for fragment mem_type in the v1 pipeline protocol (disabled by default)
  • Added active message support to proto v2
  • Added a UCP memory registration cache
  • Improved adaptive progress: an iface is now deactivated when all of its p2p lanes are destroyed
  • Added support for user-provided memh in proto v1
  • Added support for selecting local address when creating a client endpoint
  • Added the UCX_RNDV_MEMTYPE_DIRECT_SIZE option to limit the GPUDirect RDMA size in the rendezvous protocol
  • Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter

UCT

  • Introduced the uct_md_mkey_pack_v2 API
  • Introduced the UCT iface features API
  • Introduced the max_inflight_eps parameter in the perf_attr API
  • Introduced the UCT_SEND_FLAG_PEER_CHECK flag, which forces a connectivity check to the peer
  • Introduced UCX_RCACHE_PURGE_ON_FORK to enable or disable purging registration cache regions when the application forks

RDMA CORE (IB, ROCE, etc.)

  • Introduced NDR auto-detection
  • Introduced CQE zipping support
  • Set the default MAX_RD_ATOMIC to the maximum value supported by the hardware

ROCM

  • Increased maximum number of HSA agents

UCS

  • Added topo module infrastructure
  • Added memtrack and rcache information to VFS

Tools

  • Added support for pre-registered memory in ucx_perftest
  • Added loopback transport support for UCT perf tests

Bugfixes

Core

  • Fixed ucp_mem_unmap not deallocating memory when no rcache is used
  • Fixed versioning infrastructure
  • Multiple code improvements: refactoring, debug prints and assertions, etc.
  • Multiple improvements in build, test and docs infrastructure

UCP

  • Resolving the remote EP ID when creating a local EP is now disabled by default
  • Multiple fixes in keepalive protocol
  • Fixed initialization of the request send state when software RMA/AMO is in use
  • Fixed error handling in RMA and BW lanes selection logic
  • Fixed CM wireup fallback
  • Fixed occasional crash in finalize
  • Fixed AM proto flags
  • Fixed single zcopy proto initialization for AM
  • Fixed proto v2 selection to take the user header length into account
  • Fixed selection of auxiliary transports when creating an EP for sending EP_REMOVED
  • Fixed printing of invalid configuration
  • Fixed allocation of an indirect remote ID for an internal EP if the connected EP supports PEER_FAILURE
  • Fixed memh allocation when no rcache is used
  • Fixed protocol selection logic for UCP AM send
  • Fixed error handling flow for EP discard requests from pending queue
  • Fixed EP destroy flow
  • Fixed rsc_index for prereg_md_map
  • Fixed the wireup error handling flow: the EP that sends WIREUP_MSG/EP_REMOVED is now created with an AM lane only
  • Fixed probe for multi-fragment eager
  • Fixed alignment for AM rdesc init
  • Fixed perf estimation for proto v2
  • Fixed CM wireup with proto v2
  • Fixed EP discard flow during fast-forward
  • Fixed datatype issue in TAG send
  • Fixed EP refcount overflow
  • Fixed EP error handling flow
  • Fixed wire compatibility in address unpacking
  • Fixed ucp_ep_close_nb on a failed endpoint when related requests hold registered memory that must be invalidated
  • Fixed fragmented proto v2
  • Fixed UCP address v2 packing/unpacking and usage of seg_size
  • Fixed purging of requests on a failed endpoint
  • Fixed error handling of connecting p2p lanes during WIREUP phase
  • Fixed UCP endpoint use after free

UCT

  • Fixed ABI break of uct_ep_params_t
  • Fixed common intra-node keepalive protocol
  • Fixed a typo UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEIVCE -> UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEVICE
  • Fixed potential crash on MD mem alloc
  • Disabled PEER_FAILURE capability for XPMEM

RDMA CORE (IB, ROCE, etc.)

  • Fixed 2G aligned MR registration
  • Fixed FC_HARD_REQ resending
  • Fixed remote access to invalidated MR
  • Fixed max_rd_atomic_dc value for DV
  • Fixed DC handshake logic
  • Fixed error handling flows
  • Fixed flush(CANCEL) with UD and DC transports
  • Fixed multi-path handling for passive endpoint with UD transport
  • Fixed attributes for DV QP creation
  • Fixed device query
  • Fixed memory leak in case of disabling RDMA transport
  • Fixed dci->pool_index initialization
  • Fixed the fallback when the port speed is not detected
  • Fixed tag offload recv for inlined data
  • Fixed PKEY index initialization
  • Disabled mlx5 ifaces on verbs MD

TCP

  • Fixed flush(CANCEL)
  • Fixed close protocol when UCT EP pairs have only RX capability
  • Fixed querying of the local/remote saddr

GPU (CUDA, ROCM)

  • Fixed a bug in invalidating address range in CUDA_IPC
  • Fixed CUDA context caching and cleanup
  • Fixed ROCM initialization
  • Fixed ROCM components compilation
  • Fixed the IPC TLs reachability check
  • Fixed ROCM memory type detection
  • Use ROCM remote_agent if available

KNEM

  • Fixed memory registration cost

UCM

  • Fixed potential hang on init

UCS

  • Fixed a name shadowing problem on CentOS 6.x

Tools

  • ucx_info now prints stream API limits and handles the stream feature
  • Replaced ucp_ep_close_nb with ucp_ep_close_nbx in examples
  • Replaced the completed field with a UCS status check in io_demo

JAVA

  • Throw an exception if ucp_mem_query fails

GO

  • Disabled go bindings in rpmbuild
  • Fixed configure behavior when the Go compiler cannot be found
  • Standalone performance benchmark
  • Increased the port range and made it depend on agent_id
  • Check the minimum compiler version
  • Set GOCACHE to a local directory that is cleared for each job in CI
  • Disabled module for goperftest
  • Fixed out-of-source (OOS) build