Incorrect hash when staging repository directory #6112

@DriesSchaumont

Bug report

@rcannood and I have been investigating an issue where runs are not being resumed.
We were able to reproduce the problem and traced it to hashDirSha256 in HashBuilder.java. This method is called for directories that are part of the repository, after the isAssetFile check:

return hashDirSha256(hasher, path, base);

In the hashDirSha256 method, Files.walkFileTree is being used for calculating the hash of a directory by iterating over all files.

Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {

However, the order in which the entries of a directory are visited is not guaranteed. If the files are visited in a different order, the hasher calculates a different hash even when the contents of the directory are the same.

This bug is particularly hard to track down because, depending on the kernel/filesystem combination, the iteration order of walkFileTree may happen to be the same across runs. However, we've been able to use zfs to reliably reproduce the problem.

A patch is available here: master...rcannood:nextflow:fix-hashing. We applied it to the reproducible example and found that it resolves the issue.
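
To illustrate the order dependence described above, here is a minimal, self-contained sketch. It is not the Nextflow implementation and not necessarily what the linked patch does: the class SortedDirHash and the methods hashFiles and hashDirSorted are hypothetical names, and it uses the JDK's MessageDigest rather than the hasher in HashBuilder.java. It hashes relative paths and file contents in the order given, and shows one possible way to make the result stable by sorting the files by relative path before hashing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class, not part of Nextflow.
public class SortedDirHash {

    /** Hashes each file's relative path and contents, in exactly the order given. */
    public static String hashFiles(Path base, List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        for (Path file : files) {
            String relative = base.relativize(file).toString();
            digest.update(relative.getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(file));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Collects the regular files under dir, sorts them by relative path, then hashes them. */
    public static String hashDirSorted(Path dir)
            throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk
                .filter(Files::isRegularFile)
                // Sorting removes the dependence on filesystem iteration order.
                .sorted(Comparator.comparing((Path p) -> dir.relativize(p).toString()))
                .collect(Collectors.toList());
        }
        return hashFiles(dir, files);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("hashdemo");
        Path a = Files.writeString(dir.resolve("a.txt"), "alpha");
        Path b = Files.writeString(dir.resolve("b.txt"), "beta");

        // Same files, different visiting order: the two hashes differ.
        System.out.println(hashFiles(dir, List.of(a, b)));
        System.out.println(hashFiles(dir, List.of(b, a)));

        // Sorted traversal: the hash no longer depends on visiting order.
        System.out.println(hashDirSorted(dir));
    }
}
```

In this sketch the first two printed hashes differ although the directory contents are identical, while hashDirSorted always matches the hash for the a.txt, b.txt order. Collecting and sorting the paths costs some extra memory for large directories, which may or may not matter for the real code path.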

Expected behavior and actual behavior

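Expected: when an asset (repository) directory is used as input for a process, its hash is the same across runs as long as its contents are unchanged, so that the run can be resumed.

Actual: the hash of the directory is not consistent across runs.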

Steps to reproduce the problem

Please find a reproducible example here

Program output

See also here

Environment

  • Nextflow version: 25.04.2 build 5947
  • Groovy 4.0.26 on OpenJDK 64-Bit Server VM 17.0.15+6-Debian-1deb12u1
  • Operating system: Linux 6.8.12-10-pve SMP PREEMPT_DYNAMIC PMX 6.8.12-10 (2025-04-18T07:39Z) x86_64 GNU/Linux
  • Bash version: GNU bash, version 5.2.15(1)-release
