@tedil
Contributor

tedil commented Oct 9, 2025

Because nextflow does not allow specifying the launchDir, but stores its .nextflow folder in that directory, the wrapper now allows changing into the directory specified in the param launch_dir before invoking nextflow.
Not sure if this is the way to go / a robust way of doing so, though. Comments welcome.

Summary by CodeRabbit

  • New Features

    • Optional launch_dir support: runs can validate and switch to a specified working directory before execution to improve reproducibility.
  • Refactor

    • Parameter forwarding adjusted to exclude internal keys (including launch_dir) from downstream command inputs for cleaner behavior.
  • Documentation

    • Added configuration documentation describing the launch_dir option and its intended behavior.

@coderabbitai
Contributor

coderabbitai bot commented Oct 9, 2025

📝 Walkthrough

Walkthrough

Read launch_dir from Snakemake params and, if provided, validate and chdir into it before running the Nextflow shell command; replace multi-branch param filtering with a set-membership check excluding "launch_dir"; test Snakefile updated to name outputs and forward launch_dir; meta.yaml documents a new top-level params.launch_dir.

Changes

Cohort / File(s) | Summary
Nextflow wrapper (utils/nextflow/wrapper.py) | Replace multi-branch name filter with a set-membership check that excludes "launch_dir"; read launch_dir = params.get("launch_dir", None); if provided, validate existence and chdir into it before invoking the same shell Nextflow command.
Test Snakefile updates (utils/nextflow/test/Snakefile) | Change rule chipseq_pipeline outputs: replace unnamed output "results/multiqc/broadPeak/multiqc_report.html" with multiqc_report="results/multiqc/broadPeak/multiqc_report.html" and add launch_dir=directory("some_directory"); add params.launch_dir=lambda wildcards, output: output.launch_dir; update outdir computation to use Path(output.multiqc_report).parents[-2].
Metadata (utils/nextflow/meta.yaml) | Add top-level params section with a launch_dir field and description documenting that it lets users adjust the directory from which Nextflow is launched and noting Nextflow’s read-only launchDir behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant Wrapper as Nextflow Wrapper
    participant FS as Filesystem
    participant Shell
    participant Nextflow

    Caller->>Wrapper: invoke wrapper(params)
    Wrapper->>Wrapper: filter params via set membership (exclude "launch_dir")
    Wrapper->>Wrapper: launch_dir = params.get("launch_dir", None)
    alt launch_dir provided
        Wrapper->>FS: stat/check launch_dir
        FS-->>Wrapper: exists / not exists
        alt exists
            Wrapper->>Wrapper: chdir(launch_dir)
        else not exists
            Wrapper-->>Caller: raise error
        end
    end
    Wrapper->>Shell: run nextflow shell command (from current dir)
    Shell->>Nextflow: execute
    Nextflow-->>Shell: result
    Shell-->>Wrapper: exit code/output
    Wrapper-->>Caller: return result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Description Check | ⚠️ Warning | The pull request description is a brief free-form note and does not follow the repository’s required template, as it lacks the QC checklist with tick boxes and omits confirmations for test updates, documentation links, and other mandatory sections. This makes it incomplete when compared to the provided description_template. | Please expand the PR description to match the repository template by adding the QC section with the required checkboxes and filling in details on test.py updates, input/output path handling, documentation URLs in meta.yaml, and conda environment specifications.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title Check | ✅ Passed | The title succinctly describes the main feature added—allowing the launchDir to be specified via parameters—and follows the conventional commit style prefix “feat:”. It accurately reflects the core change across the wrapper code, test Snakefile, and metadata without extraneous detail. It is clear and concise for anyone scanning the commit history.
Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped.
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df56b13 and d911057.

📒 Files selected for processing (1)
  • utils/nextflow/meta.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: docs
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (1)
utils/nextflow/meta.yaml (1)

9-12: Documentation clarifies the new launch_dir param

The added description cleanly explains the behavior and fits the wrapper metadata format. Nice!



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5805a2b and dbf3ae4.

📒 Files selected for processing (1)
  • utils/nextflow/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • utils/nextflow/wrapper.py
**/wrapper.py

⚙️ CodeRabbit configuration file

Do not complain about use of undefined variable called snakemake.

Files:

  • utils/nextflow/wrapper.py
🪛 Ruff (0.13.3)
utils/nextflow/wrapper.py

38-38: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)

🔇 Additional comments (1)
utils/nextflow/wrapper.py (1)

35-35: LGTM!

The set membership check is a clean and efficient way to filter out reserved parameter names.
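
As a rough illustration of the set-membership filter described here, the sketch below separates wrapper-internal keys from the parameters forwarded to Nextflow. Only "launch_dir" is confirmed by this PR as an excluded key; the other entries of RESERVED and the helper name forwardable_params are assumptions made for the example, not the wrapper's actual code.

```python
# Hypothetical set of keys the wrapper handles itself. Only "launch_dir"
# is confirmed by this PR; the remaining names are assumptions.
RESERVED = {"pipeline", "revision", "profile", "extra", "launch_dir"}


def forwardable_params(params):
    """Return only the params that should be passed on to Nextflow."""
    return {name: value for name, value in params.items() if name not in RESERVED}


example = {
    "pipeline": "nf-core/chipseq",
    "revision": "2.0.0",
    "launch_dir": "some_directory",
    "outdir": "results",
}
print(forwardable_params(example))
```

Adding another internal key then only means extending the set, rather than adding a branch to an if-elif chain.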

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
utils/nextflow/wrapper.py (1)

42-47: Good security improvements, but consider path normalization and simplifying the control flow.

The current implementation successfully addresses the shell injection vulnerability and silent failure issues from previous reviews by using os.chdir() instead of shell interpolation. However, consider these refinements:

  1. Path normalization: The path should be normalized to prevent path traversal attempts and ensure consistent behavior:

    launch_dir = os.path.abspath(launch_dir)
  2. Unnecessary else clause: Since line 45 raises an exception, the else clause is redundant and can be removed.

  3. Error message style: Consider using a custom exception class or a shorter message to address the static analysis hint.

Apply this diff to implement the suggested refinements:

 launch_dir = snakemake.params.get("launch_dir")
 if launch_dir:
+    launch_dir = os.path.abspath(launch_dir)
     if not os.path.isdir(launch_dir):
-        raise ValueError(f"launch_dir does not exist: {launch_dir}")
-    else:
-        os.chdir(launch_dir)
+        raise ValueError(f"Directory not found: {launch_dir}")
+    os.chdir(launch_dir)
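
For trying the suggested logic outside Snakemake, the diff can be wrapped in a small function. This is only a sketch: the name enter_launch_dir and the return value are hypothetical, and in the wrapper itself the value would come from snakemake.params.get("launch_dir").

```python
import os


def enter_launch_dir(launch_dir):
    """Change into launch_dir if one is given; fail loudly if it is missing.

    Returns the working directory in effect afterwards.
    """
    if launch_dir:
        launch_dir = os.path.abspath(launch_dir)
        if not os.path.isdir(launch_dir):
            raise ValueError(f"Directory not found: {launch_dir}")
        os.chdir(launch_dir)
    return os.getcwd()
```

Because os.chdir affects the whole process, every relative path used after this call is interpreted relative to the new directory.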
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between dbf3ae4 and b78911b.

📒 Files selected for processing (1)
  • utils/nextflow/wrapper.py (1 hunks)
🪛 Ruff (0.13.3)
utils/nextflow/wrapper.py

38-38: Undefined name snakemake

(F821)


40-40: Undefined name snakemake

(F821)


42-42: Undefined name snakemake

(F821)


45-45: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
utils/nextflow/wrapper.py (1)

35-35: LGTM! Cleaner approach to parameter filtering.

The refactor from an if-elif chain to set membership is more maintainable and correctly excludes "launch_dir" from being passed as a Nextflow parameter.

@fgvieira
Collaborator

fgvieira commented Oct 9, 2025

Do you think it would be possible to add a test case?

@tedil
Contributor Author

tedil commented Oct 9, 2025

I think I'd just modify the existing testcase to include this parameter, so we don't unnecessarily run the pipeline multiple times.

@fgvieira
Collaborator

fgvieira commented Oct 9, 2025

And it would be great if you could also add it to the meta.yaml file. Just so people know what the param is supposed to do. 😄

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utils/nextflow/test/Snakefile (1)

40-62: Add test for .nextflow directory in launch_dir.

  • Add a verify_nextflow_location rule in utils/nextflow/test/Snakefile that asserts some_directory/.nextflow exists after running chipseq_pipeline.
  • Or extend the chipseq_pipeline outputs with nextflow_dir=directory("some_directory/.nextflow") to let Snakemake verify it automatically.
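
One way the check from the first bullet could look, written as plain Python that a test script or a rule's run block might call. The function name nextflow_dir_present is hypothetical; the only fact taken from the discussion is that Nextflow creates .nextflow in the directory it is launched from.

```python
from pathlib import Path


def nextflow_dir_present(launch_dir):
    """Return True if a .nextflow directory exists inside launch_dir."""
    return (Path(launch_dir) / ".nextflow").is_dir()
```

A test could then assert nextflow_dir_present("some_directory") after the pipeline rule has finished.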
🧹 Nitpick comments (1)
utils/nextflow/test/Snakefile (1)

54-54: Simplify by removing the circular param-output reference.

The param launch_dir=lambda wildcards, output: output.launch_dir creates an unnecessary circular reference where the param depends on the output, which in turn depends on the param. This adds complexity without benefit.

Apply this diff to simplify (after addressing the path resolution issue in the previous comment):

-    launch_dir=lambda wildcards, output: output.launch_dir,
+    launch_dir="some_directory",
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b78911b and df56b13.

📒 Files selected for processing (1)
  • utils/nextflow/test/Snakefile (1 hunks)

Comment on lines +47 to +49
multiqc_report="results/multiqc/broadPeak/multiqc_report.html",
# directory from which nextflow is launched, will contain the `.nextflow` directory
launch_dir=directory("some_directory"),
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Semantically, launch_dir should not be an output.

Declaring launch_dir as a directory() output is semantically incorrect. Snakemake outputs are artifacts produced by the rule, but launch_dir is merely a working directory where Nextflow runs. The .nextflow directory created inside it is the actual artifact.

Additionally, since the wrapper changes the working directory to launch_dir before running Nextflow, all relative paths (inputs, outputs, and the outdir param) will be resolved relative to some_directory instead of the Snakemake working directory. This means:

  • output.multiqc_report declared as "results/multiqc/broadPeak/multiqc_report.html" will be created at "some_directory/results/multiqc/broadPeak/multiqc_report.html"
  • Snakemake will look for it at "results/multiqc/broadPeak/multiqc_report.html" (relative to the Snakemake working directory)
  • The test will fail with a missing output error
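
The path mismatch described in these bullets can be reproduced without Snakemake or Nextflow. The sketch below uses a temporary directory as a stand-in for the project root (an assumption for the demo) and shows that the same relative path points to two different locations before and after os.chdir.

```python
import os
import tempfile

relative_output = "results/multiqc/broadPeak/multiqc_report.html"
start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp:
    launch_dir = os.path.join(tmp, "some_directory")
    os.makedirs(launch_dir)

    # Where Snakemake expects the file: resolved against the original cwd.
    seen_by_snakemake = os.path.abspath(relative_output)

    os.chdir(launch_dir)
    try:
        # Where a process started after chdir would write it.
        seen_by_nextflow = os.path.abspath(relative_output)
    finally:
        # Leave the temporary directory before it is removed.
        os.chdir(start_dir)

print(seen_by_snakemake)
print(seen_by_nextflow)
print(seen_by_snakemake != seen_by_nextflow)
```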

Consider these alternatives:

Option 1: Pre-create the directory and pass it as a param

 output:
     multiqc_report="results/multiqc/broadPeak/multiqc_report.html",
-    # directory from which nextflow is launched, will contain the `.nextflow` directory
-    launch_dir=directory("some_directory"),
 params:
     pipeline="nf-core/chipseq",
     revision="2.0.0",
     profile=["test", "docker"],
-    launch_dir=lambda wildcards, output: output.launch_dir,
+    launch_dir="some_directory",

Then add a rule to create the directory before running the test:

rule prepare_launch_dir:
    output:
        directory("some_directory")
    shell:
        "mkdir -p {output}"

Option 2: Make the wrapper convert paths to absolute before chdir

The wrapper should resolve all Snakemake input/output/params paths to absolute paths before changing directories, ensuring they remain valid after the chdir.

Run this script to verify whether the current implementation handles path resolution correctly:


🏁 Script executed:

#!/bin/bash
# Description: Check if the wrapper implementation handles path resolution after chdir

# Check if wrapper.py converts paths to absolute before chdir
rg -A 10 -B 5 'chdir.*launch_dir|os\.chdir' utils/nextflow/wrapper.py

Length of output: 263


Remove launch_dir from outputs and resolve paths before chdir

  • launch_dir is not a produced artifact and must be a param (or pre-created by a separate rule), not a directory() output.
  • The wrapper calls os.chdir(launch_dir) without converting Snakemake inputs/outputs (e.g. "results/…/multiqc_report.html") to absolute paths, so the rule will write files under some_directory/… and Snakemake won’t find them.
  • Fix by either:
    1. Pre-creating some_directory in a preparatory rule and passing it as a simple param, or
    2. Modifying the wrapper to wrap every Snakemake path with os.path.abspath(...) before performing os.chdir.
🤖 Prompt for AI Agents
In utils/nextflow/test/Snakefile around lines 47 to 49, the rule incorrectly
declares launch_dir as an output directory() while the wrapper treats it as a
runtime working directory and calls os.chdir(launch_dir), which causes
downstream output paths to be written under that directory and not found by
Snakemake; fix by removing launch_dir from the outputs and either (A) pre-create
"some_directory" in a separate preparatory rule and pass its path into this rule
as a simple param, or (B) change the wrapper to convert every Snakemake
input/output path to an absolute path (os.path.abspath(...)) before calling
os.chdir so Snakemake can still track outputs correctly.

# We infer this from the output file path. Since that file path can be changed
# e.g. in case of cloud storage, we use a lambda function to infer the outdir.
-outdir=lambda wildcards, output: str(Path(output[0]).parents[-2]),
+outdir=lambda wildcards, output: str(Path(output.multiqc_report).parents[-2]),
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the outdir calculation after path resolution changes.

The calculation .parents[-2] correctly extracts "results" from the path. However, after the wrapper changes into launch_dir, this relative path needs to be handled correctly. Ensure that either:

  1. The wrapper converts this param to an absolute path before chdir, or
  2. The path is explicitly made relative to the new working directory

Run this script to check how the wrapper handles params that contain paths:


🏁 Script executed:

#!/bin/bash
# Description: Check if outdir and other path params are handled correctly after chdir

# Look for param handling around the chdir operation
rg -A 15 -B 5 'snakemake\.params' utils/nextflow/wrapper.py

Length of output: 1277


🏁 Script executed:

#!/bin/bash
# Locate add_parameter definition and inspect its behavior
rg -n 'def add_parameter' -A5 utils/nextflow/wrapper.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for add_parameter definition and usage across the repo
rg -n "add_parameter" -n .

Length of output: 237


Make outdir absolute before changing into launch_dir.
add_parameter appends paths verbatim and the wrapper calls os.chdir(launch_dir) before invoking Nextflow, so a relative outdir (e.g. from parents[-2]) will be resolved against launch_dir instead of the project root. Convert outdir (and any other path params) to an absolute path (e.g. via os.path.abspath(value) or Path(value).resolve()) before the os.chdir in utils/nextflow/wrapper.py.
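
A minimal sketch of the suggested conversion, assuming the wrapper knows which params hold local paths. The helper name absolutize and the default path_keys tuple are hypothetical. os.path.abspath only makes sense for local paths: applied to a remote URI such as s3://bucket/results it would prepend the current directory, so such values would need to be skipped.

```python
import os


def absolutize(params, path_keys=("outdir",)):
    """Return a copy of params with the listed local-path keys made absolute.

    Must be called before os.chdir so paths resolve against the original cwd.
    """
    resolved = dict(params)
    for key in path_keys:
        if key in resolved:
            resolved[key] = os.path.abspath(resolved[key])
    return resolved


params = absolutize({"outdir": "results", "genome": "hg38"})
print(params["outdir"])
```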

🤖 Prompt for AI Agents
In utils/nextflow/test/Snakefile around line 58, outdir is set via parents[-2]
which produces a relative path; because utils/nextflow/wrapper.py calls
os.chdir(launch_dir) before running Nextflow, relative path params will be
resolved incorrectly. Modify utils/nextflow/wrapper.py to convert outdir and any
other path-like parameters to absolute paths (e.g., os.path.abspath(value) or
Path(value).resolve()) when add_parameter is building the parameter dict or
immediately before os.chdir, so all path params are absolute prior to changing
the working directory.

@tedil
Contributor Author

tedil commented Oct 9, 2025

Hm, I think the rabbit may potentially have a point regarding the relative-ness of paths; but the thing is that some nextflow pipelines resolve their (relative) paths relative to the launchDir…
But I'd also like to avoid calling abspath or some other path normalization, to minimize issues with storage plugins.

@johanneskoester
Contributor

cc @famosab

@famosab

famosab commented Oct 10, 2025

the thing is that some nextflow pipelines resolve their (relative) paths relative to the launchDir

I think all pipelines resolve that relative to the launchDir (except for inputs where you can give absolute paths) and you can define the results / work folders to be created somewhere else than the launchDir (both relative to the launch and with absolute paths). But the .nextflow/. files will be in the launchDir folder. Does that help at least a bit? I think setting the launchDir makes sense but it depends on what you want to achieve with the wrapper because I think the outdir can be set to a different path than the launchDir as well as the workdir.
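
To make the separation of launch, work, and output directories concrete, here is a sketch that only assembles a command line and does not run it. The function name and the example paths are hypothetical; -r and -work-dir are Nextflow run options, while --outdir is a pipeline parameter as used by nf-core pipelines.

```python
def build_nextflow_command(pipeline, revision, outdir, work_dir):
    """Assemble a Nextflow invocation as an argument list (not executed here)."""
    return [
        "nextflow", "run", pipeline,
        "-r", revision,
        "-work-dir", work_dir,  # Nextflow option: location of task work directories
        "--outdir", outdir,     # pipeline parameter: where results are published
    ]


cmd = build_nextflow_command(
    "nf-core/chipseq", "2.0.0", "/data/results", "/scratch/work"
)
print(" ".join(cmd))
```

Running such a command with subprocess.run(cmd, cwd="some_directory") would be one way to place .nextflow in some_directory without changing the working directory of the calling process; whether that fits the wrapper's use of Snakemake's shell helper is left open here.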
