@shntnu shntnu commented Aug 26, 2025

Summary

  • Migrated from Poetry to uv package manager for faster dependency resolution
  • Added comprehensive idempotency tests to verify SMILES standardization consistency. This closes out issue #24 (Homogenise chemical sources)
  • Improved test robustness with better error reporting for non-idempotent cases

Changes

  • Replaced pyproject.toml Poetry configuration with uv-compatible format
  • Added test/test_idempotency.py with configurable dataset sizes (sample/full)
  • Enhanced error messages to show exactly which SMILES fail idempotency checks
  • Updated CI/documentation to use uv commands

Test plan

  • All existing tests pass
  • New idempotency tests pass for both standardization methods
  • Package installs correctly with uv sync
  • CLI tool works as expected

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

@shntnu shntnu requested a review from srijitseal August 26, 2025 17:22
shntnu and others added 4 commits August 26, 2025 18:14
- Convert pyproject.toml from Poetry to PEP 621 standard format
- Switch build backend from poetry-core to hatchling
- Add explicit Python version constraint (>=3.11,<3.13) for RDKit compatibility
- Add CLI entry point via [project.scripts]
- Create main() function for proper Fire CLI integration
- Add comprehensive class docstrings for standardization methods
- Suppress RDKit 2023.9.5 deprecation warnings in pytest.ini
- Update README with uv installation and usage patterns
- Add uv.lock for reproducible dependency resolution
- Format code with ruff
- Bump version to 0.0.2

Maintains strict RDKit 2023.9.5 pinning for JUMP dataset reproducibility.
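
For readers unfamiliar with the pattern, the CLI wiring described in this commit can be sketched roughly as below. This is an illustration under assumptions, not the package's actual code: `standardize_one` is a hypothetical placeholder command, and the script and module names in the docstring are made up. The only real API used is `fire.Fire` from the third-party python-fire package.

```python
def standardize_one(smiles: str) -> str:
    """Hypothetical placeholder command (assumption, not the real standardizer).

    It only trims whitespace so the sketch stays self-contained.
    """
    return smiles.strip()


def main() -> None:
    """Entry point that a [project.scripts] line would reference.

    For example (illustrative names only):
        my-cli = "my_package.cli:main"
    """
    # python-fire is imported lazily so this module can be imported
    # (and the command function tested) without the dependency installed.
    import fire

    # Fire turns the function's parameters into command-line arguments.
    fire.Fire(standardize_one)
```

Exposing a named `main()` gives the packaging metadata a stable callable to point at, so the installed console script and a direct function call share the same code path.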

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Add comprehensive idempotency testing infrastructure to verify that standardizing
already-standardized SMILES produces identical results. This ensures the
standardization process is deterministic and stable.
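
The property being tested can be sketched as follows. This is a minimal illustration, not the contents of test/test_idempotency.py: `toy_standardize` is a hypothetical stand-in for the package's RDKit-based standardizer and performs no real chemistry, and `find_non_idempotent` and `format_failures` are assumed helper names. The sketch collects every input whose second pass differs from the first, so a failure message can name the offending SMILES.

```python
from typing import Callable, List, Tuple


def toy_standardize(smiles: str) -> str:
    """Hypothetical stand-in for the real standardizer (no chemistry here).

    Strips whitespace and keeps the longest '.'-separated fragment.
    """
    fragments = smiles.strip().split(".")
    return max(fragments, key=len)


def find_non_idempotent(
    smiles_list: List[str], standardize: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (input, first_pass, second_pass) for each idempotency failure."""
    failures = []
    for smiles in smiles_list:
        once = standardize(smiles)
        twice = standardize(once)  # re-standardize the already-standardized result
        if once != twice:
            failures.append((smiles, once, twice))
    return failures


def format_failures(failures: List[Tuple[str, str, str]]) -> str:
    """Build an error message that names every SMILES that failed."""
    lines = [f"{len(failures)} non-idempotent SMILES:"]
    for original, once, twice in failures:
        lines.append(f"  {original!r}: {once!r} -> {twice!r}")
    return "\n".join(lines)
```

In a test, one would typically assert that the failure list is empty and pass the formatted message as the assertion text, for example `assert not failures, format_failures(failures)`.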

Changes:
- Add test/test_idempotency.py with parameterized tests for 100-sample and full dataset
- Add scripts/download_jump_compounds.py for refreshing test data (sandboxed with uv)
- Include compressed JUMP compounds test data (~1.7MB with only SMILES column)
- Update pytest.ini with slow/very_slow markers for test organization
- Update README.md and CLAUDE.md with testing instructions

The test data contains ~115k real JUMP compounds and tests can run in two modes:
- Fast mode: 100 randomly sampled compounds (~11 seconds)
- Full mode: All ~115k compounds (marked as very_slow)
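
One way to implement the two modes is sketched below. It is an assumption-laden illustration: the function name `select_compounds`, the mode strings, and the fixed seed are all hypothetical, and the commit message does not say whether the real sampling is seeded. A fixed seed is used here only so that a failing sample can be reproduced.

```python
import random
from typing import List, Sequence


def select_compounds(
    smiles: Sequence[str],
    mode: str = "sample",
    sample_size: int = 100,
    seed: int = 42,  # assumed; makes the sample reproducible across runs
) -> List[str]:
    """Pick the compounds to test: a random subset or the whole dataset."""
    if mode == "full":
        return list(smiles)
    if mode == "sample":
        rng = random.Random(seed)  # local RNG, leaves global random state alone
        k = min(sample_size, len(smiles))  # tolerate datasets smaller than the sample
        return rng.sample(list(smiles), k)
    raise ValueError(f"unknown mode {mode!r}; expected 'sample' or 'full'")
```

The full-dataset case would then be the one carrying the very_slow marker, so that a default test run exercises only the sampled subset.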

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@shntnu shntnu changed the title from "Migrate to uv and add idempotency testing" to "feat(jump_smiles): migrate to uv and add idempotency testing" Aug 27, 2025