-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: retry a file remove - a virus checker might lock files #11437
base: main
Are you sure you want to change the base?
Windows: retry a file remove - a virus checker might lock files #11437
Conversation
cb58a63
to
55c1a8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MSoegtropIMC! While we use Dune on Windows every day at LexiFi, I have not had reports of this particular issue as far as I remember. On the other hand, we explicitly make sure not to use any third-party virus scanners and instead use only the bulit-in Windows one, as we have reports of various types of unreliability related to third-party virus scanners.
That said, the fix seems mostly harmless and looks esentially good to me (modulo two questions below) if it fixes the issues you have observed.
Unix.sleep 1; | ||
(try Unix.unlink fn | ||
with _ -> if cnt>0 then retry_loop (cnt-1) else raise e) | ||
in retry_loop 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Should we restrict this logic only if
Sys.win32
? - Should we print a message to the console to inform the user of what is going on? Something like
Could not delete file %S. Retrying...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the name of the function implies it is Windows specific - this function is not called unless we have Sys.win32
.
I would say we should definitely print a message in verbose mode, but I would need advice how to do it - I expect that dune has some elaborate logging facilities which should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the name of the function implies it is Windows specific - this function is not called unless we have
Sys.win32
.
Right, thanks.
I would say we should definitely print a message in verbose mode, but I would need advice how to do it - I expect that dune has some elaborate logging facilities which should be used.
I think that the Log
module is what you should use. Grep for Log.info
to see examples of its use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes to both questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maiste : as discussed, this code is already WIndows specfic, so there is no point in checking for Sys.win32.
I will look into adding an info message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, sorry, I wrote my replies without reloading the page (only see first reply at this time). Thanks! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit worried about blocking Dune for that long (up to 30x 1 second), as that will lock the entirety of the Dune execution and not even other processes can execute.
Maybe then it should be converted to a Fiber
?
Note that I regularly build about 180 opam packages on Windows (Coq Platform, I guess half of these use dune) and only 3 of these fail - but these fail fairly reliably (90%). I would expect it also fails if only the Windows virus scanner is there. As I explained in the issue this is a problem of how the Windows file system works, not how virus scanners work. Any virus scanner which does its job properly should produce an issue if exe files are deleted very shortly after their creation. To fail the package needs to use certain form of PPX in specific ways to fail - what exactly it is I couldn't find out as yet. |
Fixes ocaml#11425 Signed-off-by: Michael Soegtrop <[email protected]>
55c1a8b
to
242b027
Compare
@maiste : I saw you just rebased - I still want to add the logging, but I am super busy this week. In Coq Platform I used a patched dune, so it is not urgent for me. If someone just knows how to write the one liner for the logging, please go ahead. I would need 1/2 hour to look into the dune logging system. |
OK, I'll see if I've got time to take a look at it this week 👍 EDIT: I won't have time to work on it soon. |
Fixes #11425
A few notes: the Windows error in this situation is "file lockaed" and not "permission denied", but afaik the Unix error codes do not distinguish these cases.
One could be more restrictive in further with blocks also handling only `EACCESS´ - I am not sure what is best.
Regarding the retry time of 30x1s: this might be excessive, but 10s may happen on a heavily loaded machine. Usually a virus scanner should require less than 1s.
I tested this in a large opam meta project (Coq Platform) where without this patch 3 builds fail. With this patch, this worked reliably (tested with dune 3.16.1 because dune 3.17.X has other issues on Windows I didn't research in detail as yet, but e.g. cariro doesn't find cairo.h with 3.17.2 but it does with 3.16.1.