Implement early stopping / validation patience interval. #375

Open
Lilferrit opened this issue Sep 6, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Lilferrit
Contributor

This is another quality-of-life feature I implemented for my own experiments, but it might be nice to add to the mainline Casanovo release. I added a new config option, val_patience_interval, that defaults to -1 (to mirror the functionality of max_epochs). If val_patience_interval is set to a positive value, an early stopping callback is added to the model runner using PyTorch Lightning's EarlyStopping callback. This callback monitors valid_CELoss and stops model training if valid_CELoss doesn't improve over the configured val_patience_interval.
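
For readers who have not looked at the branch, the behavior described above could be wired up roughly as follows. This is a minimal sketch, not the code on val-early-stop: the helper names early_stopping_kwargs and build_callbacks are hypothetical, the lightning.pytorch import path is an assumption about the installed Lightning version, and treating 0 as "disabled" is a choice made here for illustration. Only the option name, its -1 default, and the monitored metric valid_CELoss come from the description.

```python
from typing import Any, Dict, List, Optional


def early_stopping_kwargs(val_patience_interval: int) -> Optional[Dict[str, Any]]:
    """Return keyword arguments for Lightning's EarlyStopping, or None if disabled.

    Hypothetical helper: any non-positive value (including the -1 default)
    is treated as "no early stopping".
    """
    if val_patience_interval <= 0:
        return None
    return {
        "monitor": "valid_CELoss",  # metric named in the issue
        "mode": "min",  # a lower loss counts as an improvement
        "patience": val_patience_interval,
    }


def build_callbacks(val_patience_interval: int) -> List[Any]:
    """Build the extra callback list for the trainer (hypothetical helper)."""
    callbacks: List[Any] = []
    kwargs = early_stopping_kwargs(val_patience_interval)
    if kwargs is not None:
        # Imported lazily so the disabled path works without Lightning installed.
        # Assumption: Lightning 2.x layout; older installs use pytorch_lightning.
        from lightning.pytorch.callbacks import EarlyStopping

        callbacks.append(EarlyStopping(**kwargs))
    return callbacks
```

With the default of -1, build_callbacks returns an empty list, so training runs until max_epochs or another stopping criterion applies.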

My implementation is on the branch val-early-stop. I also changed the best validation checkpoint filename from <root>.best.ckpt to <root>.<epoch>-<step>.best.ckpt. If we want to add the early stopping feature but don't want to change the best checkpoint filename, I can remove this change before submitting a PR.

@Lilferrit Lilferrit added the enhancement New feature or request label Sep 6, 2024
@bittremieux
Collaborator

> My implementation is on the branch val-early-stop. I also changed the best validation checkpoint filename from <root>.best.ckpt to <root>.<epoch>-<step>.best.ckpt. If we want to implement add the early stopping feature, but we don't want to change the best filename, I can remove this before submitting a PR.

I don't think that this is an ideal change. The reasoning behind the best.ckpt file was that its filename would always be the same, so that the user can find it immediately. Adding the epoch and step numbers removes this advantage.

Adding the early stopping patience is a small change that can make training a bit more convenient, but make sure that your implementation defines the patience in terms of the number of training steps, not epochs. When we're training on the full MassIVE-KB data, the model converges even before a full epoch has been processed. That is also why val_check_interval and some other training options are defined in terms of the number of steps.
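
One way to reconcile a step-based patience with Lightning's EarlyStopping, whose patience argument counts validation checks without improvement, is to divide the step budget by the validation interval. The sketch below assumes that val_check_interval is given as an integer number of training steps; the names patience_in_checks and patience_steps are hypothetical and not taken from Casanovo.

```python
import math


def patience_in_checks(patience_steps: int, val_check_interval: int) -> int:
    """Convert a patience given in training steps into validation checks.

    Rounds up so that at least `patience_steps` training steps pass
    without improvement before training is stopped.
    """
    if patience_steps <= 0:
        raise ValueError("patience_steps must be positive")
    if val_check_interval <= 0:
        raise ValueError("val_check_interval must be positive")
    return max(1, math.ceil(patience_steps / val_check_interval))


# Illustrative numbers only: validate every 50,000 steps, allow 200,000
# steps without improvement, which corresponds to 4 validation checks.
checks = patience_in_checks(patience_steps=200_000, val_check_interval=50_000)
```

The resulting value could then be passed as patience to EarlyStopping, which keeps the user-facing option in steps while the callback still counts validation checks.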

@bittremieux bittremieux added this to the Casanovo v5.0.0 milestone Nov 18, 2024