For science-quality runs, we should improve traceability and reproducibility. Currently, FEDS runs are not fully self-documented and depend on someone remembering some combination of: which input data was used, which version of the codebase was used, the environment the run was created in (which can be mismatched between DPS, ADE, and local), what settings were used, and which region bounds were used. Some of this information exists, but it is spread across various logs and .env files, or left implicit in directory naming. This worked fine at first, but as we continue to add new sensors (NOAA21), new types of runs (NRT vs. "archive" vs. "constrained archive," and more), and more collaborators, having a systematic record will only become more important.
As a goal, I'd like to get to a "single source of truth": one metadata record for each run that contains information such as:

- the input data used,
- the codebase version (e.g., git commit),
- the environment the run was created in (DPS, ADE, or local),
- the settings used,
- the region bounds, and
- the run type (NRT, archive, constrained archive) and sensors involved.
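As a rough sketch of what this could look like (the field names below are hypothetical, not an existing FEDS schema), the record could be a small dataclass serialized to JSON alongside each run's outputs:

```python
# Hypothetical sketch of a per-run metadata record. Field names are
# illustrative, not an existing FEDS schema.
import json
import subprocess
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _git_commit() -> str:
    """Current commit of the checked-out codebase (assumes a git checkout)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


@dataclass
class RunRecord:
    run_type: str       # "NRT", "archive", or "constrained archive"
    sensors: list[str]  # e.g. ["VIIRS-SNPP", "VIIRS-NOAA21"]
    input_data: str     # identifier or path of the input data used
    region_bounds: tuple[float, float, float, float]  # (min lon, min lat, max lon, max lat)
    settings: dict      # resolved run settings (e.g. parsed from the .env)
    environment: str    # "DPS", "ADE", or "local"
    code_version: str = field(default_factory=_git_commit)
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def write_record(record: RunRecord, path: str) -> None:
    """Serialize the record alongside the run's outputs."""
    with open(path, "w") as f:
        json.dump(asdict(record), f, indent=2)
```

Populating this from the resolved settings at launch time, rather than reconstructing it afterward, would keep the record in sync with what actually ran.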
@eorland @mccabete are there other bits of information you would like to see in this record?