janakagithub

Description of PR purpose/changes

Adds scripts that process UniProt .dat files into schema-compliant formats.

This commit includes scripts that:

  • Parse and process UniProt .dat dumps
  • Convert the data into organized tab-delimited files
  • Transform tab-delimited files into schema-compliant Parquet files
  • Generate sample TSV files for QA/QC

Test files are available in the uniprotTest directory.
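
For reviewers unfamiliar with the UniProt flat-file layout, the first two steps above (parsing a .dat dump and writing tab-delimited output) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `parse_dat` and `write_tsv`, the column list `FIELDS`, and the sample entry (including its accession) are all made up for the example. It relies only on the documented flat-file conventions that each line starts with a two-letter line code, sequence data follows the `SQ` line, and `//` terminates an entry.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Hypothetical output columns; the real schema in this PR may differ.
FIELDS = ["entry_name", "accession", "organism", "sequence"]

# Synthetic entry for illustration only (not a real UniProt record).
SAMPLE = """\
ID   TEST1_HUMAN             Reviewed;          10 AA.
AC   ZZ0001; ZZ0002;
OS   Homo sapiens (Human).
SQ   SEQUENCE   10 AA;  1000 MW;  0000000000000000 CRC64;
     MKTAYIAKQR
//
"""


def parse_dat(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per flat-file entry; entries end with a '//' line."""
    entry_name, accessions, organism, seq = "", [], [], []
    in_sequence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if entry_name:
                yield {
                    "entry_name": entry_name,
                    # Keep only the primary (first) accession.
                    "accession": accessions[0] if accessions else "",
                    "organism": " ".join(organism).rstrip("."),
                    "sequence": "".join(seq),
                }
            entry_name, accessions, organism, seq = "", [], [], []
            in_sequence = False
            continue
        # Line code is in columns 1-2, data starts at column 6.
        code, value = line[:2], line[5:]
        if in_sequence:
            # Sequence lines are space-separated blocks of residues.
            seq.append(value.replace(" ", ""))
        elif code == "ID":
            parts = value.split()
            if parts:
                entry_name = parts[0]
        elif code == "AC":
            accessions.extend(a.strip() for a in value.split(";") if a.strip())
        elif code == "OS":
            organism.append(value.strip())
        elif code == "SQ":
            in_sequence = True


def write_tsv(entries: Iterable[Dict[str, str]], handle) -> None:
    """Write parsed entries as tab-delimited rows with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)


buffer = io.StringIO()
write_tsv(parse_dat(SAMPLE.splitlines()), buffer)
print(buffer.getvalue())
```

Because `parse_dat` is a generator, a large dump can be streamed line by line from an open file handle instead of being loaded into memory. The Parquet step is left out of the sketch since it depends on the target schema and on a third-party library.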

  • Please include a summary of the change and which issue is fixed.
  • Please also include relevant motivation and context.
  • List any dependencies that are required for this change.

Testing Instructions

  • Details for how to test the PR:
  • Tests pass locally and in GitHub Actions

Dev Checklist:

  • My code follows the guidelines at https://sites.google.com/lbl.gov/trussresources/home?authuser=0
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have run Ruff format to format my code
  • I have run Ruff check and fixed any errors that it uncovered
  • Any dependent changes have been merged and published in downstream modules

Updating Version and Release Notes (if applicable)
