Skip to content

Conversation

@nkongenelly
Copy link
Contributor

@nkongenelly nkongenelly commented Nov 25, 2025

This code allows subsampling to work with relative numbers too as the initial implementation limited it to absolute number of reads (#50).
seqtk_sample module itself allows passing fractions as the sample_size thus only our schema has been updated here to allow both integers and floats

It also writes a warning to .nextflow.log for every sample that does not reach the selected absolute sample_size

Fixes #60
Resolves #79

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nkongenelly nkongenelly force-pushed the seqtk_sample_size_warning branch from 073204d to cd36140 Compare November 25, 2025 11:16
@matrulda
Copy link
Contributor

I tried updating the log path to
def log_path = "$launchDir" + "/" + "meta" + "/" + "nextflow.log" (realizing later that the metaDir variable could be used). But the warnings are not written to that log.. I can't find any other nextflow logs in the launchdir or metadir 🤔

@nkongenelly nkongenelly force-pushed the seqtk_sample_size_warning branch from a626d7f to bc8e8c2 Compare November 26, 2025 13:19
nkongenelly and others added 2 commits November 26, 2025 14:29
@nkongenelly nkongenelly force-pushed the seqtk_sample_size_warning branch from bc8e8c2 to 77b8e30 Compare November 26, 2025 14:12
@nkongenelly
Copy link
Contributor Author

@nf-core-bot fix linting please

@nkongenelly
Copy link
Contributor Author

I tried updating the log path to def log_path = "$launchDir" + "/" + "meta" + "/" + "nextflow.log" (realizing later that the metaDir variable could be used). But the warnings are not written to that log.. I can't find any other nextflow logs in the launchdir or metadir 🤔

Thanks for looking into this. I have now pushed the code that fixes this.

@ramprasadn ramprasadn self-requested a review November 26, 2025 15:49
@maxulysse maxulysse added this to the 1.0.0 milestone Dec 11, 2025
@maxulysse maxulysse self-assigned this Dec 11, 2025
@maxulysse
Copy link
Member

@nkongenelly hope you don't mind, I solved the merge conflicts.
I also reworked the test you added, to capture the stdout instead, and used log.warn instead of println.
Changed the view operator for a map operator as well.
And lastly I put the sample name first, to make it more explicit in the logs.

@maxulysse maxulysse merged commit 02477d1 into nf-core:dev Dec 12, 2025
20 checks passed
maxulysse added a commit that referenced this pull request Jan 13, 2026
Seqtk add relative sample sets and warning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Run checks on a random (reproducible) subset of reads Add relative sample size input in subsampling

4 participants