Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect hash used if file is modified and hashed twice within 100ns on Windows #427

Open
ecraig12345 opened this issue Sep 10, 2022 · 0 comments

Comments

@ecraig12345
Copy link
Member

This is an extreme edge case which is very unlikely to occur outside of tests and is probably not worth the effort to fix (if it's even possible), so this issue is mainly to serve as documentation.

getFileHash uses a cache based on the file modification time, and does not re-calculate the hash if the modification time is unchanged. This is a good approach for perf and usually doesn't cause any problems.

export async function getFileHash(
cwd: string,
filePath: string
): Promise<string> {
const fileAbsPath = path.join(cwd, filePath);
const stat = await fs.stat(fileAbsPath);
if (memo.has(fileAbsPath) && memo.get(fileAbsPath)!.has(stat.mtimeMs)) {
return memo.get(fileAbsPath)!.get(stat.mtimeMs)!;

However, the cache storage test was failing on Windows in CI (I couldn't repro locally) because a file that's changed in the code below was not being registered as changed. This was because the hash hadn't changed.

It turns out the reason the hash hadn't changed was because the CI build runs quickly enough that the file creation and modification happened within the same 100ns interval. This was fine on Mac and Linux which use filesystems that record mtimes with 1ns precision, but NTFS only records mtimes with 100ns precision. So on NTFS, the mtime wouldn't change and the cached hash would be used.

I worked around this in the test by adding a 1ms wait. Based on how backfill is actually used, such as in lage, it seems extremely unlikely that this would happen in a real scenario.

fs.writeFileSync(path.join(dir, "Changing"), "changing content");
await storage.fetch(hash);
// Wait 1ms to make sure the file modification time changes.
// WHY: hashFile uses the file modification time to determine if the hash
// needs to be recomputed. The filesystems typically used by Mac and Linux
// record these times at 1ns resolution, but NTFS uses 100ns resolution.
// So sometimes the test would fail on Windows if it ran too quickly.
await new Promise((resolve) => setTimeout(resolve, 1));
fs.writeFileSync(path.join(dir, "Changing"), "changing content now");
await storage.put(hash, ["**/*"]);
expect(storage.filesToCache).toEqual(["Changing"]);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant