-
Notifications
You must be signed in to change notification settings - Fork 201
feat: allow setting launchDir in params #4638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughRead Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Wrapper as Nextflow Wrapper
participant FS as Filesystem
participant Shell
participant Nextflow
Caller->>Wrapper: invoke wrapper(params)
Wrapper->>Wrapper: filter params via set membership (exclude "launch_dir")
Wrapper->>Wrapper: launch_dir = params.get("launch_dir", None)
alt launch_dir provided
Wrapper->>FS: stat/check launch_dir
FS-->>Wrapper: exists / not exists
alt exists
Wrapper->>Wrapper: chdir(launch_dir)
else not exists
Wrapper-->>Caller: raise error
end
end
Wrapper->>Shell: run nextflow shell command (from current dir)
Shell->>Nextflow: execute
Nextflow-->>Shell: result
Shell-->>Wrapper: exit code/output
Wrapper-->>Caller: return result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
utils/nextflow/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
utils/nextflow/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
utils/nextflow/wrapper.py
🪛 Ruff (0.13.3)
utils/nextflow/wrapper.py
38-38: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
41-41: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (1)
utils/nextflow/wrapper.py (1)
35-35: LGTM!The set membership check is a clean and efficient way to filter out reserved parameter names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
utils/nextflow/wrapper.py (1)
42-47: Good security improvements, but consider path normalization and simplifying the control flow.The current implementation successfully addresses the shell injection vulnerability and silent failure issues from previous reviews by using
os.chdir()instead of shell interpolation. However, consider these refinements:
Path normalization: The path should be normalized to prevent path traversal attempts and ensure consistent behavior:
launch_dir = os.path.abspath(launch_dir)Unnecessary else clause: Since line 45 raises an exception, the else clause is redundant and can be removed.
Error message style: Consider using a custom exception class or a shorter message to address the static analysis hint.
Apply this diff to implement the suggested refinements:
launch_dir = snakemake.params.get("launch_dir") if launch_dir: + launch_dir = os.path.abspath(launch_dir) if not os.path.isdir(launch_dir): - raise ValueError(f"launch_dir does not exist: {launch_dir}") - else: - os.chdir(launch_dir) + raise ValueError(f"Directory not found: {launch_dir}") + os.chdir(launch_dir)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
utils/nextflow/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
utils/nextflow/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
utils/nextflow/wrapper.py
🪛 Ruff (0.13.3)
utils/nextflow/wrapper.py
38-38: Undefined name snakemake
(F821)
40-40: Undefined name snakemake
(F821)
42-42: Undefined name snakemake
(F821)
45-45: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (1)
utils/nextflow/wrapper.py (1)
35-35: LGTM! Cleaner approach to parameter filtering.The refactor from an if-elif chain to set membership is more maintainable and correctly excludes "launch_dir" from being passed as a Nextflow parameter.
|
Do you think it would be possible to add a test case? |
|
I think I'd just modify the existing testcase to include this parameter, so we don't unnecessarily run the pipeline multiple times. |
|
And it would be great of you could also add it to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
utils/nextflow/test/Snakefile (1)
40-62: Add test for.nextflowdirectory in launch_dir.
- Add a
verify_nextflow_locationrule in utils/nextflow/test/Snakefile that assertssome_directory/.nextflowexists after runningchipseq_pipeline.- Or extend the
chipseq_pipelineoutputs withnextflow_dir=directory("some_directory/.nextflow")to let Snakemake verify it automatically.
🧹 Nitpick comments (1)
utils/nextflow/test/Snakefile (1)
54-54: Simplify by removing the circular param-output reference.The param
launch_dir=lambda wildcards, output: output.launch_dircreates an unnecessary circular reference where the param depends on the output, which in turn depends on the param. This adds complexity without benefit.Apply this diff to simplify (after addressing the path resolution issue in the previous comment):
- launch_dir=lambda wildcards, output: output.launch_dir, + launch_dir="some_directory",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
utils/nextflow/test/Snakefile(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
| multiqc_report="results/multiqc/broadPeak/multiqc_report.html", | ||
| # directory from which nextflow is launched, will contain the `.nextflow` directory | ||
| launch_dir=directory("some_directory"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Semantically, launch_dir should not be an output.
Declaring launch_dir as a directory() output is semantically incorrect. Snakemake outputs are artifacts produced by the rule, but launch_dir is merely a working directory where Nextflow runs. The .nextflow directory created inside it is the actual artifact.
Additionally, since the wrapper changes the working directory to launch_dir before running Nextflow, all relative paths (inputs, outputs, and the outdir param) will be resolved relative to some_directory instead of the Snakemake working directory. This means:
output.multiqc_reportdeclared as"results/multiqc/broadPeak/multiqc_report.html"will be created at"some_directory/results/multiqc/broadPeak/multiqc_report.html"- Snakemake will look for it at
"results/multiqc/broadPeak/multiqc_report.html"(relative to the Snakemake working directory) - The test will fail with a missing output error
Consider these alternatives:
Option 1: Pre-create the directory and pass it as a param
output:
multiqc_report="results/multiqc/broadPeak/multiqc_report.html",
- # directory from which nextflow is launched, will contain the `.nextflow` directory
- launch_dir=directory("some_directory"),
params:
pipeline="nf-core/chipseq",
revision="2.0.0",
profile=["test", "docker"],
- launch_dir=lambda wildcards, output: output.launch_dir,
+ launch_dir="some_directory",Then add a rule to create the directory before running the test:
rule prepare_launch_dir:
output:
directory("some_directory")
shell:
"mkdir -p {output}"Option 2: Make the wrapper convert paths to absolute before chdir
The wrapper should resolve all Snakemake input/output/params paths to absolute paths before changing directories, ensuring they remain valid after the chdir.
Run this script to verify whether the current implementation handles path resolution correctly:
🏁 Script executed:
#!/bin/bash
# Description: Check if the wrapper implementation handles path resolution after chdir
# Check if wrapper.py converts paths to absolute before chdir
rg -A 10 -B 5 'chdir.*launch_dir|os\.chdir' utils/nextflow/wrapper.pyLength of output: 263
Remove launch_dir from outputs and resolve paths before chdir
launch_diris not a produced artifact and must be aparam(or pre-created by a separate rule), not adirectory()output.- The wrapper calls
os.chdir(launch_dir)without converting Snakemake inputs/outputs (e.g."results/…/multiqc_report.html") to absolute paths, so the rule will write files undersome_directory/…and Snakemake won’t find them. - Fix by either:
- Pre-creating
some_directoryin a preparatory rule and passing it as a simple param, or - Modifying the wrapper to wrap every Snakemake path with
os.path.abspath(...)before performingos.chdir.
- Pre-creating
🤖 Prompt for AI Agents
In utils/nextflow/test/Snakefile around lines 47 to 49, the rule incorrectly
declares launch_dir as an output directory() while the wrapper treats it as a
runtime working directory and calls os.chdir(launch_dir), which causes
downstream output paths to be written under that directory and not found by
Snakemake; fix by removing launch_dir from the outputs and either (A) pre-create
"some_directory" in a separate preparatory rule and pass its path into this rule
as a simple param, or (B) change the wrapper to convert every Snakemake
input/output path to an absolute path (os.path.abspath(...)) before calling
os.chdir so Snakemake can still track outputs correctly.
| # We infer this from the output file path. Since that file path can be changed | ||
| # e.g. in case of cloud storage, we use a lambda function to infer the outdir. | ||
| outdir=lambda wildcards, output: str(Path(output[0]).parents[-2]), | ||
| outdir=lambda wildcards, output: str(Path(output.multiqc_report).parents[-2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify the outdir calculation after path resolution changes.
The calculation .parents[-2] correctly extracts "results" from the path. However, after the wrapper changes into launch_dir, this relative path needs to be handled correctly. Ensure that either:
- The wrapper converts this param to an absolute path before chdir, or
- The path is explicitly made relative to the new working directory
Run this script to check how the wrapper handles params that contain paths:
🏁 Script executed:
#!/bin/bash
# Description: Check if outdir and other path params are handled correctly after chdir
# Look for param handling around the chdir operation
rg -A 15 -B 5 'snakemake\.params' utils/nextflow/wrapper.pyLength of output: 1277
🏁 Script executed:
#!/bin/bash
# Locate add_parameter definition and inspect its behavior
rg -n 'def add_parameter' -A5 utils/nextflow/wrapper.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Search for add_parameter definition and usage across the repo
rg -n "add_parameter" -n .Length of output: 237
Make outdir absolute before changing into launch_dir.
add_parameter appends paths verbatim and the wrapper calls os.chdir(launch_dir) before invoking Nextflow, so a relative outdir (e.g. from parents[-2]) will be resolved against launch_dir instead of the project root. Convert outdir (and any other path params) to an absolute path (e.g. via os.path.abspath(value) or Path(value).resolve()) before the os.chdir in utils/nextflow/wrapper.py.
🤖 Prompt for AI Agents
In utils/nextflow/test/Snakefile around line 58, outdir is set via parents[-2]
which produces a relative path; because utils/nextflow/wrapper.py calls
os.chdir(launch_dir) before running Nextflow, relative path params will be
resolved incorrectly. Modify utils/nextflow/wrapper.py to convert outdir and any
other path-like parameters to absolute paths (e.g., os.path.abspath(value) or
Path(value).resolve()) when add_parameter is building the parameter dict or
immediately before os.chdir, so all path params are absolute prior to changing
the working directory.
|
Hm, I think the rabbit may potentially have a point regarding the relative-ness of paths; but the thing is that some nextflow pipelines resolve their (relative) paths relative to the launchDir… |
|
cc @famosab |
I think all pipelines resolve that relative to the launchDir (except for inputs where you can give absolute paths) and you can define the results / work folders to be created somewhere else than the launchDir (both relative to the launch and with absolute paths). But the .nextflow/. files will be in the launchDir folder. Does that help at least a bit? I think setting the launchDir makes sense but it depends on what you want to achieve with the wrapper because I think the outdir can be set to a different path than the launchDir as well as the workdir. |
Because nextflow does not allow specifying the
launchDir, but stores its.nextflowfolder in that directory, the wrapper now allows changing into the directory specified in the paramlaunch_dirbefore invoking nextflow.Not sure if this is the way to go / a robust way of doing so, though. Comments welcome.
Summary by CodeRabbit
New Features
Refactor
Documentation