This is another QOL feature I implemented for the sake of my own experiments, but that might be nice to add to the mainline Casanovo release. I added a new config option, val_patience_interval, that defaults to -1 (to mirror the behavior of max_epochs). If val_patience_interval is set to a positive value, an early stopping callback is added to the model runner using PyTorch Lightning's EarlyStopping callback. This callback monitors valid_CELoss and stops model training if valid_CELoss doesn't improve for val_patience_interval consecutive validation checks.
My implementation is on the branch val-early-stop. I also changed the best validation checkpoint filename from <root>.best.ckpt to <root>.<epoch>-<step>.best.ckpt. If we want to add the early stopping feature but don't want to change the best checkpoint filename, I can remove that part before submitting a PR.
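A rough sketch of the wiring described above (the _get_callbacks helper and the config attribute access are illustrative only, not the actual code on the val-early-stop branch):

```python
# Sketch of the described behavior, assuming the lightning.pytorch import path.
from lightning.pytorch.callbacks import EarlyStopping

def _get_callbacks(config):
    """Build trainer callbacks; add early stopping only when enabled."""
    callbacks = []
    # val_patience_interval defaults to -1 (disabled, mirroring max_epochs);
    # any positive value turns on early stopping.
    if config.val_patience_interval > 0:
        callbacks.append(
            EarlyStopping(
                monitor="valid_CELoss",  # validation loss logged during training
                patience=config.val_patience_interval,
                mode="min",
            )
        )
    return callbacks
```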
I don't think that this is an ideal change. The reasoning behind the best.ckpt file was that its filename would always be the same, so that the user can immediately get it. Adding the epoch number removes this advantage.
While adding the early stopping patience is a small change that can make training a bit more convenient, one thing to make sure of in your implementation is that the patience is defined in terms of the number of training steps, not epochs. When we're training on the full MassIVE-KB data, the model converges even before a full epoch has been processed. This is also why val_check_interval and some other training options are defined in terms of the number of steps.
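For reference, a minimal sketch of how the callback interacts with step-based validation (the interval values are placeholders): Lightning's EarlyStopping patience counts validation checks rather than epochs, so pairing it with a step-based val_check_interval already makes the stopping criterion step-based in practice.

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import EarlyStopping

# Patience is counted in validation checks, not epochs, so with a
# step-based val_check_interval the stopping criterion is effectively
# defined in training steps as well.
early_stopping = EarlyStopping(monitor="valid_CELoss", patience=5, mode="min")

trainer = Trainer(
    max_steps=300_000,             # placeholder value
    val_check_interval=50_000,     # run validation every 50k training steps
    check_val_every_n_epoch=None,  # decouple validation from epoch boundaries
    callbacks=[early_stopping],
)
```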