summary: Add option to store **best** value? #89
As to saving the best: I think we should let users provide a callable, taking a list as argument or comparing two values, that by default would be max(list) or max(x1, x2). That way users could provide their own methods to find the best. This kind of solution might be quite short-sighted though; with heavily customized metrics one might have to provide this argument each time they call log. This is something that needs to be considered. Maybe we need to be able to configure that.
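A minimal sketch of that idea (the helper, its name, and the `is_better` argument are hypothetical illustrations, not existing dvclive API):

```python
# Hypothetical sketch: a running "best" tracker with a user-supplied comparison.
from typing import Callable, Dict


def update_best(
    best: Dict[str, float],
    name: str,
    value: float,
    is_better: Callable[[float, float], bool] = lambda new, old: new > old,
) -> None:
    """Keep `value` as the best seen for `name` if the comparison says so."""
    if name not in best or is_better(value, best[name]):
        best[name] = value


best_values: Dict[str, float] = {}
update_best(best_values, "acc", 0.81)                                   # default: higher is better
update_best(best_values, "loss", 0.35, is_better=lambda new, old: new < old)
```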
I think that a structured summary would be a great idea and it would make the summary easier to extend for future ideas/feature requests.
I'm not sure if a custom callable would be needed. I searched for similar scenarios (i.e. the tensorflow/keras ModelCheckpoint) and they just provide min/max options. A problem in the ...
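For reference, this is roughly how keras exposes it (illustrative snippet; see the keras docs for the full signature):

```python
# keras ModelCheckpoint: "best" is configured via `monitor` + `mode` (min/max),
# not via a custom callable.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    filepath="best.h5",
    monitor="val_loss",   # metric to track
    mode="min",           # lower is better; use "max" for accuracy-like metrics
    save_best_only=True,  # only keep the best weights seen so far
)
# model.fit(..., validation_data=..., callbacks=[checkpoint])
```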
Yeah, but before implementing that we need to consider how we want to handle it on the DVC side - if we include those in the summary JSON, those values will be treated as metrics from DVC's point of view. While it might make sense for "best" and "latest", I don't think it's a viable use case for "system" metrics. On the other hand, I can totally imagine a situation where someone wants to visualize memory usage and plot system-controlled metrics. We need to research and discuss what we actually want from those parameters.
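Purely as an illustration of the distinction being discussed (not an existing dvclive format), a structured summary might look like this, and DVC would currently treat every value in it as a metric:

```json
{
  "latest": {"step": 49, "acc": 0.87, "loss": 0.41},
  "best": {"acc": 0.91, "loss": 0.38},
  "system": {"max_memory_mb": 2048, "gpu_util": 0.73}
}
```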
I see your point. I think that we could make a distinction between:
- Metrics generated in ...
- Metrics generated in ...

So, for the specific case of ...

In this scenario I think that your original idea of a custom callable would make much more sense for letting the user decide the "aggregation".
@daavoo Having max/min is a good idea! Do we need extra metrics/files? Can it be done as a DVC command, like ...? For system metrics - yes, a separate metrics file is needed.
I was trying an example repo with ... In this ... I first run the pipeline and stop after the train stage: `dvc exp run train`. At this point ...
And manually (looking at the metrics/params) select and apply the best one:
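For illustration, the manual step could look something like this (the experiment name is a placeholder):

```shell
dvc exp show              # compare metrics/params across the checkpoints
dvc exp apply exp-1a2b3   # apply the checkpoint with the best metrics
```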
After that I run the downstream stages that depended on the best checkpoint:
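For example, something along these lines:

```shell
dvc exp run   # stages that are already up to date are skipped, so this
              # effectively runs the downstream stages on the applied checkpoint
# (or `dvc repro` to reproduce the remaining stages outside of experiments)
```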
However, I think that there should be a better way to automatically "select and apply" the best checkpoint in order to avoid the manual step and allow the pipeline to run end-to-end.
In this case, you not only want to keep an additional metric for the best value, but you might also want to save the best model instead of the latest model, similar to the ...
@dberenbaum wouldn't that mean that we would somehow need to specify what "the best" means?
If you use ...
Yes. At least, it would probably entail specifying the metric to monitor and whether min/max is best.
Actually, deleting the other models requires some additional steps (...). Maybe more importantly, keeping only the best model is one way to address the issue you raised:
If dvclive had an option similar to ... I'm not sure whether this would be a good idea, but curious to get your thoughts on this workflow @daavoo.
I personally like that workflow more than the ... one.

On the one hand, we could implement all the functionality on the ... side. I think this would have the benefit of leaving us with full control over the implementation. However, we would probably need to consider how easily users could adapt their code to this workflow and how compatible it would be with the existing functionality in each integration. For example, if someone is using ... This could also mean that, for some integrations, the user will lose functionality, as we might not cover as much as the ML framework itself.

On the other hand, we could rely on somehow reusing the logic for monitoring the best metric already implemented in each integration, similar to what the ...
Great points! Users would probably rather reuse the existing functionality within those ML frameworks, and it fits better with the dvclive ethos of being lightweight and interfering with code as little as possible. It does make the integrations more involved and potentially inconsistent between frameworks, but better to handle this in dvclive than expect users to do it themselves, right? Existing model checkpoint callbacks already handle storing the best metric value and the model weights associated with it:
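For example, PyTorch Lightning's ModelCheckpoint callback already keeps both the best score and the corresponding weights (illustrative snippet):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

ckpt = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
# After `trainer.fit(model)` has run with callbacks=[ckpt]:
#   ckpt.best_model_score  -> best monitored value seen during training
#   ckpt.best_model_path   -> checkpoint file with the matching weights
```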
Maybe we can focus on building integrations with existing callbacks. I can think of a few specific enhancements we could build around summarizing the best values: ...
wandb now has support for something similar: https://docs.wandb.ai/guides/track/log#customize-the-summary. DVCLive could support something similar with either:

1. ...
2. ...

I like the latter since mixing metrics from different steps seems unhelpful to me.

Edit: As part of 2, we could also put these arguments into the init method like ...
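The wandb feature linked above looks roughly like this; the DVCLive counterpart in the comment is hypothetical (`summary=` is not an existing dvclive argument):

```python
import wandb

wandb.init(project="demo")
# Ask the wandb summary to keep the min/max value seen instead of the last one.
wandb.define_metric("val_loss", summary="min")
wandb.define_metric("val_acc", summary="max")

# Hypothetical DVCLive counterpart (option 2 style, not an existing API):
# dvclive.log("val_loss", loss, summary="min")
```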
By default, the values for each metric at the latest step are saved in the `summary`. It might be interesting to add an option to additionally save the best value for each metric.

An example use case would be the usual deep learning training loop where the model is validated at the end of each epoch and the last epoch is not necessarily the one with the best performance. Having the best value saved in the `summary` could be more useful for comparing experiments (i.e. `dvc metrics diff --targets dvclive.json`).

A potential problem would be how to expose some options to the user (#75 (comment)), like whether to use `save_best` or not and what to consider as better (i.e. higher or lower).
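A rough sketch of how this could be exposed; `save_best` and `mode` are hypothetical arguments, only meant to make the open question concrete:

```python
# Hypothetical sketch: neither `save_best` nor `mode` exist in dvclive today.
import dvclive

dvclive.init("dvclive", save_best=True)

for val_loss in [0.52, 0.41, 0.44]:                # pretend per-epoch validation loss
    dvclive.log("val_loss", val_loss, mode="min")  # "min": lower is better
    dvclive.next_step()

# The summary would then hold the latest value (0.44) and the best one (0.41),
# so `dvc metrics diff --targets dvclive.json` could compare best values.
```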