
Fix hashing of asset directories #6113

Open: 8 commits to be merged into base branch master

rcannood
Contributor

@rcannood rcannood commented May 22, 2025

Files.walkFileTree does not guarantee any specific order for iterating over files within a directory:

Considerations When Creating a FileVisitor

A file tree is walked depth first, but you cannot make any assumptions about the iteration order that subdirectories are visited.

-- source

As a result, the SHA-256 computed by HashBuilder.hashDirSha256 can differ between runs in certain situations (#6112).

This PR fixes the issue by storing all of the (path, sha256) pairs in an array list, sorting them by path, and then passing them to the hasher in that order.
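The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `SortedDirHash` and its methods `collect`, `combine`, and `hashDir` are hypothetical names, and the way each pair is fed to the digest (UTF-8 bytes with a zero-byte separator) is an assumption made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the actual HashBuilder code.
class SortedDirHash {

    static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Collect (relative path, sha256) pairs in whatever order the walk yields them.
    static List<Map.Entry<String, String>> collect(Path root) {
        List<Map.Entry<String, String>> entries = new ArrayList<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    String rel = root.relativize(file).toString();
                    String sha = toHex(newSha256().digest(Files.readAllBytes(file)));
                    entries.add(new AbstractMap.SimpleImmutableEntry<>(rel, sha));
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Sort a copy by path, then feed the pairs to the digest in that fixed order.
    static String combine(Collection<? extends Map.Entry<String, String>> entries) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(entries);
        sorted.sort(Map.Entry.comparingByKey());
        MessageDigest md = newSha256();
        for (Map.Entry<String, String> e : sorted) {
            md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator (an assumption of this sketch)
            md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
        }
        return toHex(md.digest());
    }

    static String hashDir(Path root) {
        return combine(collect(root));
    }
}
```

Because the sort happens after the walk, the result no longer depends on the order in which Files.walkFileTree visits entries. Splitting the combining step from the walk also means the ordering behaviour can be exercised without touching the file system, by passing the same pairs to `combine` in different orders.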

Notes:

  • The code in preVisitDirectory looks like it would throw a NullPointerException if the condition (base != null) is false, so maybe base is never null? What should happen in this code chunk if base does happen to be null?
  • It's hard to add a unit test for this, since the issue only arises in very specific situations.

netlify bot commented May 22, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 31e1a8a
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68430ca527377e0008af5881

@rcannood
Contributor Author

rcannood commented Jun 2, 2025

Hi everybody! Can I get some feedback from the Seqera team on this issue + PR? 🙇

@pditommaso pditommaso force-pushed the master branch 3 times, most recently from b4b321e to 069653d Compare June 4, 2025 18:54
Member

@pditommaso pditommaso left a comment

Thanks for reporting this, very tricky. Left a comment to improve the implementation.

@pditommaso pditommaso added the cache-breaking Changes that will break everyone's task cache label Jun 6, 2025
rcannood and others added 3 commits June 6, 2025 12:05
Signed-off-by: Robrecht Cannoodt <[email protected]>
Signed-off-by: Robrecht Cannoodt <[email protected]>
Contributor Author

@rcannood rcannood left a comment

Thanks for the feedback, @pditommaso! I swapped the ArrayList for a TreeMap, which makes more sense since it keeps the entries sorted by path.

Regarding the potential NullPointerException in preVisitDirectory, which solution would you prefer?

Signed-off-by: Robrecht Cannoodt <[email protected]>
@rcannood rcannood requested a review from pditommaso June 6, 2025 13:53
Signed-off-by: Robrecht Cannoodt <[email protected]>
@bentsherman bentsherman self-requested a review June 6, 2025 14:30
@bentsherman
Member

I think this function can be simplified by applying hashUnorderedCollection() to the set of key-value pairs. I will give it a try...
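For readers unfamiliar with the idea, an order-independent hash over a set of key-value pairs can be sketched as below. The thread does not show how hashUnorderedCollection() works internally, so this is only one common technique, assumed here for illustration: hash each item separately and merge the digests with a commutative operation (byte-wise addition). The class `UnorderedHash` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;

// Hypothetical sketch; not the Nextflow implementation of hashUnorderedCollection().
class UnorderedHash {

    static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hash every item on its own, then add the digests byte by byte.
    // Addition is commutative, so the iteration order cannot affect the result.
    static String combineUnordered(Collection<String> items) {
        byte[] acc = new byte[32];
        for (String item : items) {
            byte[] h = sha256(item);
            for (int i = 0; i < acc.length; i++) {
                acc[i] += h[i]; // wraps around modulo 256
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : acc) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

Compared with sorting, this avoids keeping the pairs ordered at all, at the cost of a weaker combining step: a plain sum is easier to collide than a digest computed over a sorted sequence. Whether that trade-off matters depends on how the hash is used.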

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman
Member

@rcannood let me know if my changes work for you

@bentsherman bentsherman changed the title Fix hashing of directories Fix hashing of asset directories Jun 6, 2025
@rcannood
Contributor Author

rcannood commented Jun 6, 2025

> @rcannood let me know if my changes work for you

Thanks, @bentsherman! I'll run it on my other partition when I'm back at my PC.

Labels
cache-breaking Changes that will break everyone's task cache
Development

Successfully merging this pull request may close these issues.

Incorrect hash when staging repository directory
3 participants