Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: retry a file remove - a virus checker might lock files #11437

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MSoegtropIMC
Copy link

Fixes #11425

A few notes: the Windows error in this situation is "file lockaed" and not "permission denied", but afaik the Unix error codes do not distinguish these cases.

One could be more restrictive in further with blocks also handling only `EACCESS´ - I am not sure what is best.

Regarding the retry time of 30x1s: this might be excessive, but 10s may happen on a heavily loaded machine. Usually a virus scanner should require less than 1s.

I tested this in a large opam meta project (Coq Platform) where without this patch 3 builds fail. With this patch, this worked reliably (tested with dune 3.16.1 because dune 3.17.X has other issues on Windows I didn't research in detail as yet, but e.g. cariro doesn't find cairo.h with 3.17.2 but it does with 3.16.1.

@MSoegtropIMC MSoegtropIMC force-pushed the windows-retry-file-remove-main branch from cb58a63 to 55c1a8b Compare February 3, 2025 16:09
@maiste maiste added the windows label Feb 3, 2025
Copy link
Collaborator

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MSoegtropIMC! While we use Dune on Windows every day at LexiFi, I have not had reports of this particular issue as far as I remember. On the other hand, we explicitly make sure not to use any third-party virus scanners and instead use only the bulit-in Windows one, as we have reports of various types of unreliability related to third-party virus scanners.

That said, the fix seems mostly harmless and looks esentially good to me (modulo two questions below) if it fixes the issues you have observed.

Unix.sleep 1;
(try Unix.unlink fn
with _ -> if cnt>0 then retry_loop (cnt-1) else raise e)
in retry_loop 30)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Should we restrict this logic only if Sys.win32 ?
  • Should we print a message to the console to inform the user of what is going on? Something like Could not delete file %S. Retrying...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name of the function implies it is Windows specific - this function is not called unless we have Sys.win32.

I would say we should definitely print a message in verbose mode, but I would need advice how to do it - I expect that dune has some elaborate logging facilities which should be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name of the function implies it is Windows specific - this function is not called unless we have Sys.win32.

Right, thanks.

I would say we should definitely print a message in verbose mode, but I would need advice how to do it - I expect that dune has some elaborate logging facilities which should be used.

I think that the Log module is what you should use. Grep for Log.info to see examples of its use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say yes to both questions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maiste : as discussed, this code is already WIndows specfic, so there is no point in checking for Sys.win32.

I will look into adding an info message.

Copy link
Collaborator

@maiste maiste Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, sorry, I wrote my replies without reloading the page (only see first reply at this time). Thanks! 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried about blocking Dune for that long (up to 30x 1 second), as that will lock the entirety of the Dune execution and not even other processes can execute.

Maybe then it should be converted to a Fiber?

@MSoegtropIMC
Copy link
Author

MSoegtropIMC commented Feb 4, 2025

Note that I regularly build about 180 opam packages on Windows (Coq Platform, I guess half of these use dune) and only 3 of these fail - but these fail fairly reliably (90%). I would expect it also fails if only the Windows virus scanner is there. As I explained in the issue this is a problem of how the Windows file system works, not how virus scanners work. Any virus scanner which does its job properly should produce an issue if exe files are deleted very shortly after their creation.

To fail the package needs to use certain form of PPX in specific ways to fail - what exactly it is I couldn't find out as yet.

@maiste maiste force-pushed the windows-retry-file-remove-main branch from 55c1a8b to 242b027 Compare February 6, 2025 09:16
@MSoegtropIMC
Copy link
Author

@maiste : I saw you just rebased - I still want to add the logging, but I am super busy this week. In Coq Platform I used a patched dune, so it is not urgent for me. If someone just knows how to write the one liner for the logging, please go ahead. I would need 1/2 hour to look into the dune logging system.

@maiste
Copy link
Collaborator

maiste commented Feb 6, 2025

OK, I'll see if I've got time to take a look at it this week 👍

EDIT: I won't have time to work on it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sandboxing on Windows does not work in presence of a virus scanner
4 participants