[WIP] Feature: Python script and module to check dataset readiness for data preservation #236

Closed
wants to merge 20 commits into from

Conversation

astrochun
Contributor

@astrochun astrochun commented Jul 9, 2021

Description

This is WIP. Do not merge!
This is a new feature to support preparation for preservation checks.

Closes #235

ToDo List

@astrochun:
As of 07/26/2021, here is the current status of this PR:
Most methods have been tested at a minimal level for the preservation check workflow, but the workflow as a whole is not yet fully tested or explored. The methods of Preserve tested so far are: get_metadata, save_metadata, check_files, make_symbolic_links, delete_old_readme_files, delete_hidden_files, and delete_files (used by delete_hidden_files).

I have not tested:

  • The Preserve.update_files method
  • Full end-to-end runs for (1) the case with no issues (i.e., the curation version matches the published version) and (2) cases where the curation version and the published version differ (we were not immediately aware of such a dataset for testing).
  • Datasets that do not have data (e.g., NPN datasets). Presumably the --metadata_only option should work for this purpose.

In addition, the script should run as a "dry run" by default and provide a --update option. In a dry run, files are not replaced and metadata is not downloaded. The option would need to be available to all methods, probably through a (self) instance variable in Preserve.
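A minimal sketch of how the dry-run default could look, assuming the flag is stored on the instance as `self.update`. The constructor signature, the `root` argument, the `build_parser` helper, and the method bodies are hypothetical illustrations, not the code in this PR; only the names Preserve, delete_files, delete_hidden_files, and --update come from the description above.

```python
import argparse
from pathlib import Path


class Preserve:
    """Hypothetical stand-in for the PR's Preserve class (dry run by default)."""

    def __init__(self, root, update=False):
        self.root = Path(root)
        # Assumption: one instance variable that every method consults.
        # False means "dry run": report what would change, touch nothing.
        self.update = update

    def delete_files(self, paths):
        """Delete the given paths only when self.update is set.

        Returns the list of paths that were (or, in a dry run, would be) deleted.
        """
        selected = []
        for path in paths:
            if self.update:
                path.unlink()
            selected.append(path)
        return selected

    def delete_hidden_files(self):
        """Find dot-files under root and hand them to delete_files."""
        hidden = sorted(p for p in self.root.rglob(".*") if p.is_file())
        return self.delete_files(hidden)


def build_parser():
    """Hypothetical command-line parser exposing the proposed --update flag."""
    parser = argparse.ArgumentParser(description="Preservation checks (sketch)")
    parser.add_argument("root", help="Directory holding the dataset")
    parser.add_argument(
        "--update",
        action="store_true",
        help="Apply changes; without this flag the script only reports them",
    )
    return parser
```

With this layout, `Preserve(args.root, update=args.update)` would carry the choice to every method, and omitting `--update` on the command line leaves the files in place.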

Test plan

Update Changelog

Resources

Screenshots or additional context

@astrochun astrochun added this to the v1.2.0 milestone Jul 9, 2021
@astrochun astrochun added the curation, enhancement, preservation, and scripts labels Jul 9, 2021
@astrochun astrochun self-assigned this Jul 9, 2021
 - Call Preserve.delete_old_readme_files() in preserve_checks
 - Include info message when not deleting files
 - Note: This is an initial (not final) commit for the attempt!
 - Call Preserve.update_files() in preserve_checks
@astrochun astrochun removed their assignment Mar 29, 2022
@zoidy
Collaborator

zoidy commented Feb 8, 2023

Preservation work is now in the ReBACH repo. Closing this PR.

@zoidy zoidy closed this Feb 8, 2023