Problem
When writing to an NFS/Lustre filesystem under high concurrency, flush_file in platform.cpp can fail with Input/output
error. The error is logged but not raised so the write returns success to the caller, the process exits 0, and the
resulting shard files are silently corrupt.
Log output observed:
platform.cpp:93 flush_file: Failed to flush file: Input/output error
Reading back the written shards then raises:
RuntimeError: error during blosc decompression: 0
The corruption is partial and timepoint-dependent: shards written during peak NFS contention are corrupt while earlier
ones are fine. This makes it extremely hard to detect without reading back every chunk.
Environment: NFS/Lustre archive filesystem, 200+ parallel writers, acquire-zarr 0.8.0
Expected behavior
If flush_file fails, the error should propagate up as a C++ exception (and surface as a Python exception), so the
caller knows the write failed and can exit non-zero, retry, or abort cleanly.
Suggested fixes
- Propagate the error: Raise from flush_file rather than logging and continuing, so callers always know when a write
has failed.
- Atomic shard writes: Write each shard to a temp file and rename it into place only on success. This prevents a
partially-flushed file from ever appearing as a valid (but corrupt) shard, regardless of whether the exception
propagates correctly.
Reproduction
The failure is not easily reproducible without an NFS filesystem under write contention, but the pattern is:
import acquire_zarr as aqz
import numpy as np
import zarr
settings = aqz.StreamSettings(
store_path="/nfs/archive/test.zarr", # NFS path under high concurrency
arrays=[
aqz.ArraySettings(
output_key="0",
shape=(1, 1, 1, 2048, 2048),
chunks=(1, 1, 1, 2048, 2048),
dtype=aqz.DataType.UINT16,
version=aqz.ZarrVersion.V3,
)
],
version=aqz.ZarrVersion.V3,
)
stream = aqz.ZarrStream(settings)
for t in range(141):
block = np.random.randint(0, 65535, (1, 1, 1, 2048, 2048), dtype=np.uint16)
stream.append(block, key="")
stream.close() # Returns without raising, even when flush failed.
# stderr shows: platform.cpp:93 flush_file: Failed to flush file: Input/output error
Reading back the written data then raises:
arr = zarr.open("/nfs/archive/test.zarr/0", mode="r")
np.asarray(arr[70, 0, 0, :4, :4]) # RuntimeError: error during blosc decompression: 0
Problem
When writing to an NFS/Lustre filesystem under high concurrency, flush_file in platform.cpp can fail with Input/output
error. The error is logged but not raised so the write returns success to the caller, the process exits 0, and the
resulting shard files are silently corrupt.
Log output observed:
platform.cpp:93 flush_file: Failed to flush file: Input/output error
Reading back the written shards then raises:
RuntimeError: error during blosc decompression: 0
The corruption is partial and timepoint-dependent: shards written during peak NFS contention are corrupt while earlier
ones are fine. This makes it extremely hard to detect without reading back every chunk.
Environment: NFS/Lustre archive filesystem, 200+ parallel writers, acquire-zarr 0.8.0
Expected behavior
If flush_file fails, the error should propagate up as a C++ exception (and surface as a Python exception), so the
caller knows the write failed and can exit non-zero, retry, or abort cleanly.
Suggested fixes
has failed.
partially-flushed file from ever appearing as a valid (but corrupt) shard, regardless of whether the exception
propagates correctly.
Reproduction
The failure is not easily reproducible without an NFS filesystem under write contention, but the pattern is:
stream.close() # Returns without raising, even when flush failed.
# stderr shows: platform.cpp:93 flush_file: Failed to flush file: Input/output error
Reading back the written data then raises:
arr = zarr.open("/nfs/archive/test.zarr/0", mode="r")
np.asarray(arr[70, 0, 0, :4, :4]) # RuntimeError: error during blosc decompression: 0