
Conversation

Member

@cmeesters cmeesters commented Aug 12, 2025

This is an attempt to fulfil the request outlined in PR #336.

Summary by CodeRabbit

  • New Features

    • New configurable status command and retry attempts; options to enable efficiency reports and to set the report path and efficiency threshold; QoS, reservation, and the initial delay before status checks are now configurable.
  • Improvements

    • Smarter cluster-aware status detection with contextual warnings, a refined shutdown flow, more robust status querying, log cleanup (including empty-directory handling), and optional efficiency report generation.
  • Chores

    • Added Black formatting configuration.


Contributor

coderabbitai bot commented Aug 12, 2025

Walkthrough

Adds a new job_status_query module, extends ExecutorSettings with status/attempts, efficiency-report and log-retention options, implements status-command resolution/validation, enhances shutdown to optionally emit an efficiency report, and adds log cleanup and related helpers and imports.
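
As a rough orientation, here is a minimal sketch of how these pieces are meant to fit together, based on the helper names and return values described in this review (not the plugin's actual wiring):

from snakemake_executor_plugin_slurm.job_status_query import (
    is_query_tool_available,
    should_recommend_squeue_status_command,
)

def pick_default_status_command():
    # sacct is preferred; squeue is only a good fallback when the cluster's
    # MinJobAge keeps finished jobs visible long enough (threshold check).
    if is_query_tool_available("sacct"):
        return "sacct"
    if is_query_tool_available("squeue") and should_recommend_squeue_status_command():
        return "squeue"
    return None  # the executor hard-fails if neither tool is usable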

Changes

Cohort / File(s) Change Summary
Executor config, validation & lifecycle
snakemake_executor_plugin_slurm/__init__.py
Adds settings (status_command, status_attempts, efficiency_report, efficiency_report_path, efficiency_threshold, init_seconds_before_status_checks, keep_successful_logs, delete_logfiles_older_than, qos, reservation); adds ExecutorSettings.__post_init__; updates Executor.__post_init__(test_mode: bool = False); adds get_status_command() and _validate_status_command_settings(); implements clean_old_logs(); integrates status-command validation and efficiency-report generation into shutdown; imports job_status_query helpers.
Status query utilities (new module)
snakemake_executor_plugin_slurm/job_status_query.py
New utilities: get_min_job_age(), is_query_tool_available(), should_recommend_squeue_status_command() and builders for sacct/squeue status query commands (command construction and detection helpers).
Efficiency report formatting (stylistic)
snakemake_executor_plugin_slurm/efficiency_report.py
Reflowed multi-line sacct format string for readability; no behavioral change.
Partitions & tests (formatting)
snakemake_executor_plugin_slurm/partitions.py, tests/*
Formatting and line-wrapping only; no logic changes.
Tooling config
pyproject.toml
Adds Black configuration: line-length = 87, target-version = ['py311'].

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant Executor
  participant QueryUtils as job_status_query
  participant SLURM

  rect rgba(230,245,255,0.6)
  note over Executor: Initialization & status-command selection
  User->>Executor: instantiate Executor(settings)
  Executor->>QueryUtils: is_query_tool_available(sacct/squeue)
  Executor->>QueryUtils: get_min_job_age()
  Executor->>Executor: compute default status_command / _validate_status_command_settings()
  end

  rect rgba(240,255,230,0.6)
  note over Executor: Status polling & retries
  Executor->>Executor: get_status_command()
  alt use sacct
    Executor->>SLURM: run sacct (cmd from QueryUtils)
  else use squeue
    Executor->>SLURM: run squeue (cmd from QueryUtils)
  end
  SLURM-->>Executor: job states
  Executor->>Executor: retry up to status_attempts on transient failures
  end
sequenceDiagram
  autonumber
  participant Executor
  participant FS as Filesystem

  note over Executor: Shutdown sequence
  Executor->>Executor: clean_old_logs()
  alt efficiency_report enabled
    Executor->>FS: write efficiency report to efficiency_report_path
  else
    Executor->>Executor: finalize shutdown
  end
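
For orientation, a hedged sketch of what the log-retention step could look like; the parameter names follow the guard quoted later in this review (age_cutoff, keep_all), but the real clean_old_logs() may differ:

import time
from pathlib import Path

def clean_old_logs(log_dir: Path, age_cutoff: int, keep_all: bool) -> None:
    """Illustrative only: delete logs older than age_cutoff days, prune empty dirs."""
    if age_cutoff <= 0 or keep_all:
        return
    cutoff = time.time() - age_cutoff * 86400
    for logfile in log_dir.rglob("*.log"):
        if logfile.is_file() and logfile.stat().st_mtime < cutoff:
            logfile.unlink()
    # handle directories left empty by the deletions (deepest paths first)
    for directory in sorted(log_dir.rglob("*"), reverse=True):
        if directory.is_dir() and not any(directory.iterdir()):
            directory.rmdir()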

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas to focus review on:

  • job_status_query.get_min_job_age(): subprocess invocation, parsing, and unit handling.
  • _validate_status_command_settings(): threshold calculations and user-facing warnings.
  • clean_old_logs(): file deletion, empty-directory cleanup, and interactions with retention settings.
  • Executor.__post_init__ signature change and any test/instantiation sites.

Possibly related PRs

Suggested reviewers

  • johanneskoester
  • dlaehnemann

Poem

I thump at sacct and peek at squeue,
I tidy logs when work is through.
I count the runs and write the score,
I warn on age and probe the core.
A rabbit audits clusters — hoppity hooray! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: first step to outsource query command' accurately describes the main changes: extracting SLURM job status query logic into a new module (job_status_query.py) and integrating smart status_command orchestration throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/test_job_query


@cmeesters cmeesters marked this pull request as ready for review September 19, 2025 14:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (7)
snakemake_executor_plugin_slurm/job_status_query.py (5)

42-49: Defensive: support minute units (if present) instead of silently assuming seconds.

If MinJobAge is set like “5min”, we currently misinterpret it as 5 seconds.

Apply this diff:

-                if unit in ("h", "hour", "hours"):
+                if unit in ("h", "hour", "hours"):
                     return value * 3600
-                elif unit in ("s", "sec", "secs", "second", "seconds", ""):
+                elif unit in ("m", "min", "mins", "minute", "minutes"):
+                    return value * 60
+                elif unit in ("s", "sec", "secs", "second", "seconds", ""):
                     return value
                 else:
                     # Unknown unit, assume seconds
                     return value
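
For reference, a standalone sketch of unit-aware parsing along these lines (the regex and unit map are illustrative, not the module's exact code):

import re

_UNIT_SECONDS = {
    "": 1, "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
}

def parse_min_job_age(value_part: str):
    """Convert strings like '300', '5min' or '1hour' to seconds; None if unparsable."""
    match = re.match(r"^(\d+)\s*([a-zA-Z]*)", value_part.strip())
    if not match:
        return None
    factor = _UNIT_SECONDS.get(match.group(2).lower())
    return None if factor is None else int(match.group(1)) * factor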

56-67: Use shutil.which instead of spawning a subprocess.

This avoids S603 concerns and is faster/portable.

Apply this diff:

+import shutil
@@
-def is_query_tool_available(tool_name):
+def is_query_tool_available(tool_name):
     """
-    Check if the sacct command is available on the system.
-    Returns True if sacct is available, False otherwise.
+    Check if a given command is available on the system.
+    Returns True if available, False otherwise.
     """
-    cmd = f"which {tool_name}"
-    cmd = shlex.split(cmd)
-    try:
-        subprocess.check_output(cmd, stderr=subprocess.PIPE)
-        return True
-    except subprocess.CalledProcessError:
-        return False
+    return shutil.which(tool_name) is not None

70-98: Clarify function intent and docstring.

Name says “recommend”, docstring says “available”. Keep it consistent: the function returns True when squeue is recommended (MinJobAge >= threshold). No code change needed; just update the docstring.

Apply this diff:

 def should_recommend_squeue_status_command(min_threshold_seconds=120):
     """
-    Determine if the status query with squeue should be recommended based on
-    the MinJobAge configuration (if very low, squeue might not work well)
+    Determine if squeue should be recommended based on MinJobAge.
+    If MinJobAge >= min_threshold_seconds, squeue is considered reliable.
+    Otherwise prefer sacct.
@@
-    Returns True if the option should be available, False otherwise.
+    Returns True if squeue is recommended, False otherwise.
     """

100-111: Fix return type and docstring; build a safe, readable sacct command.

The function returns a string, not a list; the docstring mentions unrelated args. Also avoid manual split-join; build tokens then shlex.join for readability.

Apply this diff:

-def query_job_status_sacct(jobuid) -> list:
+def query_job_status_sacct(jobuid) -> str:
     """
-    Query job status using sacct command
+    Build a sacct command to query job status for a given workflow run UUID.
@@
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
+        jobuid: Workflow run UUID used as SLURM job name
@@
-        Dictionary mapping job ID to JobStatus object
+        Command string suitable for subprocess with shell=True
     """
@@
-    # -X: only show main job, no substeps
-    query_command = f"""sacct -X --parsable2 \
-                        --clusters all \
-                        --noheader --format=JobIdRaw,State \
-                        --starttime {sacct_starttime} \
-                        --endtime now --name {jobuid}"""
-
-    # for better redability in verbose output
-    query_command = " ".join(shlex.split(query_command))
-
-    return query_command
+    # -X: only show main job, no substeps
+    cmd = [
+        "sacct",
+        "-X",
+        "--parsable2",
+        "--clusters",
+        "all",
+        "--noheader",
+        "--format=JobIdRaw,State",
+        "--starttime",
+        sacct_starttime,
+        "--endtime",
+        "now",
+        "--name",
+        str(jobuid),
+    ]
+    # readable one-liner for logs while keeping quoting correct
+    return shlex.join(cmd)

Also applies to: 119-129
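
A small, hypothetical caller illustrating how such a joined command string could then be executed (run_uuid is a placeholder; error handling omitted):

import shlex
import subprocess

from snakemake_executor_plugin_slurm.job_status_query import query_job_status_sacct

run_uuid = "example-run-uuid"  # placeholder for the workflow run UUID used as job name
cmd = query_job_status_sacct(run_uuid)  # readable one-liner, handy for verbose logs
# shlex.split() restores the argv that shlex.join() quoted
output = subprocess.check_output(shlex.split(cmd), text=True, timeout=30)
for line in output.splitlines():
    if "|" in line:  # --parsable2 separates JobIdRaw and State with '|'
        jobid, state = line.split("|", maxsplit=1)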


132-154: Unify return type with sacct variant; return a string via shlex.join.

Keeps consumers consistent and improves logging readability.

Apply this diff:

-def query_job_status_squeue(jobuid) -> list:
+def query_job_status_squeue(jobuid) -> str:
     """
-    Query job status using squeue command (newer SLURM functionality)
+    Build an squeue command to query job status (newer SLURM functionality).
@@
-    Returns:
-        Dictionary mapping job ID to JobStatus object
+    Returns:
+        Command string suitable for subprocess with shell=True
@@
-    query_command = [
+    cmd = [
         "squeue",
         "--format=%i|%T",
         "--states=all", 
         "--noheader",
         "--name",
-        f"{jobuid}"
+        str(jobuid),
     ]
-
-    return query_command
+    return shlex.join(cmd)
snakemake_executor_plugin_slurm/__init__.py (2)

601-610: De‑duplicate: build the sacct command via the new helper.

Use query_job_status_sacct(self.run_uuid) to avoid drift between implementations.

Apply this diff (and import the helper):

-from .job_status_query import (
-    get_min_job_age,
-    is_query_tool_available,
-    should_recommend_squeue_status_command,
-)
+from .job_status_query import (
+    get_min_job_age,
+    is_query_tool_available,
+    should_recommend_squeue_status_command,
+    query_job_status_sacct,
+)
@@
-        # -X: only show main job, no substeps
-        sacct_command = f"""sacct -X --parsable2 \
-                        --clusters all \
-                        --noheader --format=JobIdRaw,State \
-                        --starttime {sacct_starttime} \
-                        --endtime now --name {self.run_uuid}"""
-
-        # for better redability in verbose output
-        sacct_command = " ".join(shlex.split(sacct_command))
+        # Build via helper (one-liner string for logging)
+        sacct_command = query_job_status_sacct(self.run_uuid)

81-84: Nit: extra space and wording in help message.

“(MinJobAge is sufficient )” contains a stray space; also prefer “sufficient for reliable usage”.

Apply this diff:

-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
+            msg_part1 = "'squeue' recommended (MinJobAge is sufficient"
+            msg_part2 = ") for reliable usage"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 104d0f7 and c484c90.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (8 hunks)
  • snakemake_executor_plugin_slurm/job_status_query.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/job_status_query.py (3)
  • get_min_job_age (7-53)
  • is_query_tool_available (56-67)
  • should_recommend_squeue_status_command (70-97)
🪛 Ruff (0.12.2)
snakemake_executor_plugin_slurm/job_status_query.py

16-16: subprocess call: check for execution of untrusted input

(S603)


64-64: subprocess call: check for execution of untrusted input

(S603)


65-65: Consider moving this statement to an else block

(TRY300)

snakemake_executor_plugin_slurm/__init__.py

52-55: Avoid specifying long messages outside the exception class

(TRY003)


320-320: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


329-329: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

45-60: Default selection logic looks good.

sacct preferred, squeue as fallback, and hard-fail if neither is present makes sense.


583-587: status_command validated but not wired into status checks

get_status_command exists and job_status_query implements both sacct and squeue, but active-job checks still appear to use sacct only — either wire get_status_command into check_active_jobs (use squeue when selected) or hide the status_command option behind a feature flag. To confirm, run:

rg -nP -C2 '\bcheck_active_jobs\b' snakemake_executor_plugin_slurm
rg -nP -C2 '\bquery_job_status_sacct\b|\bquery_job_status_squeue\b' snakemake_executor_plugin_slurm
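
One possible shape for that wiring, sketched with the names used in this PR (illustrative only, not the final implementation):

def build_status_query_command(self):
    # Honor the configured/auto-detected backend instead of hard-coding sacct.
    # query_job_status_* are the builders from job_status_query.py.
    if self.get_status_command() == "squeue":
        argv = query_job_status_squeue(self.run_uuid)  # argv list in the current draft
        return " ".join(shlex.quote(part) for part in argv)
    return query_job_status_sacct(self.run_uuid)  # already a quoted one-liner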

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
snakemake_executor_plugin_slurm/job_status_query.py (1)

13-18: Harden scontrol probe: add timeout and catch FileNotFoundError/TimeoutExpired.

Prevents hangs and handles missing binaries gracefully.

Apply:

-        output = subprocess.check_output(cmd, text=True, stderr=subprocess.PIPE)
-    except subprocess.CalledProcessError:
+        output = subprocess.check_output(
+            cmd, text=True, stderr=subprocess.PIPE, timeout=10
+        )
+    except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired):
         return None
snakemake_executor_plugin_slurm/__init__.py (2)

67-105: Make help static to avoid external calls at import time.

Prevents import-time scontrol/sacct probes and potential hangs.

Apply:

 def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+    """Help: job status command."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is chosen dynamically: 'sacct' if available, otherwise 'squeue'. "
+        "Note: 'squeue' reliability depends on MinJobAge; see plugin docs."
+    )

302-354: Fix MinJobAge logic and threshold source; replace ambiguous ×.

MinJobAge gates squeue visibility, not sacct; derive threshold from init_seconds_before_status_checks.

Apply:

-                # Calculate dynamic threshold: 3 times the initial
-                # status check interval.
-                # The plugin starts with 40 seconds and increases,
-                # so we use (3 * 10) + 40 = 70 seconds as minimum
-                between_status_check_seconds = getattr(
-                    self.workflow.executor_settings,
-                    "seconds_between_status_checks",
-                    70,
-                )
-                dynamic_check_threshold = 3 * between_status_check_seconds
+                # Threshold: 3x initial status check interval (default 40s)
+                initial_interval = getattr(
+                    self.workflow.executor_settings,
+                    "init_seconds_before_status_checks",
+                    40,
+                )
+                dynamic_check_threshold = 3 * initial_interval
@@
-                elif sacct_available and min_job_age is not None:
-                    if (
-                        min_job_age < dynamic_check_threshold
-                        and status_command == "sacct"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds "
-                            f"(< {dynamic_check_threshold}s). "
-                            f"This may cause 'sacct' to report inaccurate "
-                            "job states and the status_command option may "
-                            "be unreliable. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
-                    elif (
-                        min_job_age >= dynamic_check_threshold
-                        and status_command == "squeue"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds (>= "
-                            f"{dynamic_check_threshold}s). "
-                            f"The 'squeue' command should work reliably for "
-                            "status queries. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
+                elif sacct_available and min_job_age is not None:
+                    if min_job_age < dynamic_check_threshold and status_command == "squeue":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (< {dynamic_check_threshold}s). "
+                            "This may cause 'squeue' to miss recently finished jobs; consider using 'sacct'. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = {dynamic_check_threshold}s)"
+                        )
+                    elif min_job_age >= dynamic_check_threshold and status_command == "sacct":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (>= {dynamic_check_threshold}s). "
+                            "The 'squeue' command should also work reliably for status queries. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = {dynamic_check_threshold}s)"
+                        )

Run to confirm no stray uses of the removed attribute:

#!/bin/bash
rg -nP -C2 '\bseconds_between_status_checks\b|\binit_seconds_before_status_checks\b'
🧹 Nitpick comments (7)
snakemake_executor_plugin_slurm/job_status_query.py (4)

56-67: Use shutil.which instead of spawning 'which'.

Faster, portable, and avoids S603 concerns; also fix docstring.

Apply:

 def is_query_tool_available(tool_name):
-    """
-    Check if the sacct command is available on the system.
-    Returns True if sacct is available, False otherwise.
-    """
-    cmd = f"which {tool_name}"
-    cmd = shlex.split(cmd)
-    try:
-        subprocess.check_output(cmd, stderr=subprocess.PIPE)
-        return True
-    except subprocess.CalledProcessError:
-        return False
+    """
+    Check if a given CLI tool is available on PATH.
+    """
+    return shutil.which(tool_name) is not None

Add import:

+import shutil

97-127: Fix return type/docstring of query_job_status_sacct (returns str, not list/dict).

Align signature, doc, and behavior.

Apply:

-def query_job_status_sacct(jobuid) -> list:
+def query_job_status_sacct(jobuid) -> str:
     """
-    Query job status using sacct command
+    Build the sacct command used to query job status for a given workflow UUID.
 
     Args:
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
+        jobuid: Workflow UUID used as job name filter.
 
     Returns:
-        Dictionary mapping job ID to JobStatus object
+        str: command string to pass to subprocess
     """

129-151: Unify squeue builder to return a command string (consistent with sacct path).

Ensures compatibility with job_stati(), which expects a string.

Apply:

-def query_job_status_squeue(jobuid) -> list:
+def query_job_status_squeue(jobuid) -> str:
     """
-    Query job status using squeue command (newer SLURM functionality)
+    Build the squeue command used to query job status for a given workflow UUID.
@@
-    Returns:
-        Dictionary mapping job ID to JobStatus object
+    Returns:
+        str: command string to pass to subprocess
@@
-    query_command = [
+    query_command = [
         "squeue",
         "--format=%i|%T",
         "--states=all",
         "--noheader",
         "--name",
         f"{jobuid}",
     ]
 
-    return query_command
+    return shlex.join(query_command)

123-124: Typo.

“redability” → “readability”.

Apply:

-    # for better redability in verbose output
+    # for better readability in verbose output
snakemake_executor_plugin_slurm/__init__.py (3)

355-363: Return default when status_command is None.

Avoids propagating None.

Apply:

 def get_status_command(self):
     """Get the status command to use, with fallback logic."""
     if hasattr(self.workflow.executor_settings, "status_command"):
-        return self.workflow.executor_settings.status_command
+        value = self.workflow.executor_settings.status_command
+        return value or _get_status_command_default()
     else:
         # Fallback: determine the best command based on
         # cluster configuration
         return _get_status_command_default()

386-395: Guard against None for delete_logfiles_older_than.

Make cleanup robust to explicit None.

Apply:

-        if age_cutoff <= 0 or keep_all:
+        if age_cutoff is None or age_cutoff <= 0 or keep_all:
             return

377-384: Don't let efficiency report failures abort shutdown.

Wrap to log and continue.

Apply:

-        if self.workflow.executor_settings.efficiency_report:
-            threshold = self.workflow.executor_settings.efficiency_threshold
-            report_path = self.workflow.executor_settings.efficiency_report_path
-            create_efficiency_report(
-                e_threshold=threshold,
-                run_uuid=self.run_uuid,
-                e_report_path=report_path,
-                logger=self.logger,
-            )
+        if self.workflow.executor_settings.efficiency_report:
+            threshold = self.workflow.executor_settings.efficiency_threshold
+            report_path = self.workflow.executor_settings.efficiency_report_path
+            try:
+                create_efficiency_report(
+                    e_threshold=threshold,
+                    run_uuid=self.run_uuid,
+                    e_report_path=report_path,
+                    logger=self.logger,
+                )
+            except Exception as e:  # best-effort
+                self.logger.warning(f"Efficiency report failed: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c484c90 and 8d9624a.

📒 Files selected for processing (4)
  • pyproject.toml (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (19 hunks)
  • snakemake_executor_plugin_slurm/efficiency_report.py (1 hunks)
  • snakemake_executor_plugin_slurm/job_status_query.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • pyproject.toml
  • snakemake_executor_plugin_slurm/efficiency_report.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/job_status_query.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (2)
snakemake_executor_plugin_slurm/job_status_query.py (3)
  • get_min_job_age (7-53)
  • is_query_tool_available (56-67)
  • should_recommend_squeue_status_command (70-94)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • create_efficiency_report (190-219)
🪛 Ruff (0.12.2)
snakemake_executor_plugin_slurm/__init__.py

56-60: Avoid specifying long messages outside the exception class

(TRY003)


337-337: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


350-350: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


566-566: Avoid specifying long messages outside the exception class

(TRY003)

snakemake_executor_plugin_slurm/job_status_query.py

16-16: subprocess call: check for execution of untrusted input

(S603)


64-64: subprocess call: check for execution of untrusted input

(S603)


65-65: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

745-747: Fix path handling for aux_logs (Path has no _str).

Use str(path) to avoid AttributeError during error reporting.

-                    self.report_job_error(
-                        j, msg=msg, aux_logs=[j.aux["slurm_logfile"]._str]
-                    )
+                    self.report_job_error(
+                        j, msg=msg, aux_logs=[str(j.aux["slurm_logfile"])]
+                    )
♻️ Duplicate comments (4)
snakemake_executor_plugin_slurm/__init__.py (3)

66-104: Make help text static; avoid external checks at import time.

This prevents scontrol/sacct/squeue probes during import. Prior review asked for this.

-def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+def _get_status_command_help():
+    """Help: job status command."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is chosen dynamically at runtime: 'sacct' if available, otherwise 'squeue'. "
+        "Note: 'squeue' reliability depends on MinJobAge; see plugin docs."
+    )

210-217: Avoid calling a function in dataclass metadata; inline the static help string.

Even though the helper is now pure, inlining avoids any import-time call.

-    status_command: Optional[str] = field(
-        default_factory=_get_status_command_default,
-        metadata={
-            "help": _get_status_command_help(),
-            "env_var": False,
-            "required": False,
-        },
-    )
+    status_command: Optional[str] = field(
+        default_factory=_get_status_command_default,
+        metadata={
+            "help": (
+                "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+                "Default is chosen dynamically at runtime: 'sacct' if available, otherwise 'squeue'. "
+                "Note: 'squeue' reliability depends on MinJobAge; see plugin docs."
+            ),
+            "env_var": False,
+            "required": False,
+        },
+    )

301-353: Fix threshold source and inverted warnings in _validate_status_command_settings.

Use init_seconds_before_status_checks (default 40s) and warn about squeue when MinJobAge is low; replace × with x. This repeats earlier guidance.

-                # Calculate dynamic threshold: 3 times the initial
-                # status check interval.
-                # The plugin starts with 40 seconds and increases,
-                # so we use (3 * 10) + 40 = 70 seconds as minimum
-                between_status_check_seconds = getattr(
-                    self.workflow.executor_settings,
-                    "seconds_between_status_checks",
-                    70,
-                )
-                dynamic_check_threshold = 3 * between_status_check_seconds
+                # Threshold: 3x initial status check interval (default 40s)
+                initial_interval = getattr(
+                    self.workflow.executor_settings,
+                    "init_seconds_before_status_checks",
+                    40,
+                )
+                dynamic_check_threshold = 3 * initial_interval
@@
-                    if (
-                        min_job_age < dynamic_check_threshold
-                        and status_command == "sacct"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds "
-                            f"(< {dynamic_check_threshold}s). "
-                            f"This may cause 'sacct' to report inaccurate "
-                            "job states and the status_command option may "
-                            "be unreliable. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
-                    elif (
-                        min_job_age >= dynamic_check_threshold
-                        and status_command == "squeue"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds (>= "
-                            f"{dynamic_check_threshold}s). "
-                            f"The 'squeue' command should work reliably for "
-                            "status queries. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
+                    if min_job_age < dynamic_check_threshold and status_command == "squeue":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (< {dynamic_check_threshold}s). "
+                            "This may cause 'squeue' to miss recently finished jobs; consider using 'sacct'. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = {dynamic_check_threshold}s)"
+                        )
+                    elif min_job_age >= dynamic_check_threshold and status_command == "sacct":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (>= {dynamic_check_threshold}s). "
+                            "The 'squeue' command should also work reliably for status queries. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = {dynamic_check_threshold}s)"
+                        )
snakemake_executor_plugin_slurm/job_status_query.py (1)

14-19: Add timeout and catch FileNotFoundError/TimeoutExpired to avoid hangs.

If scontrol is missing or stalls, get_min_job_age() can block startup. Add timeout and broaden exceptions. This mirrors a prior review; re-applying here.

Apply this diff:

-        output = subprocess.check_output(cmd, text=True, stderr=subprocess.PIPE)
-    except subprocess.CalledProcessError:
+        output = subprocess.check_output(
+            cmd, text=True, stderr=subprocess.PIPE, timeout=10
+        )
+    except (
+        subprocess.CalledProcessError,
+        FileNotFoundError,
+        subprocess.TimeoutExpired,
+    ):
         return None
🧹 Nitpick comments (7)
snakemake_executor_plugin_slurm/job_status_query.py (4)

1-5: Import shutil (prep for safer tool detection).

Follow-up to next comment: we’ll use shutil.which instead of invoking which.

 import shlex
 import subprocess
 import re
 from datetime import datetime, timedelta
+import shutil

57-68: Use shutil.which; fix docstring; silence S603.

Avoid spawning which; simpler, faster, and safer.

-def is_query_tool_available(tool_name):
-    """
-    Check if the sacct command is available on the system.
-    Returns True if sacct is available, False otherwise.
-    """
-    cmd = f"which {tool_name}"
-    cmd = shlex.split(cmd)
-    try:
-        subprocess.check_output(cmd, stderr=subprocess.PIPE)
-        return True
-    except subprocess.CalledProcessError:
-        return False
+def is_query_tool_available(tool_name: str) -> bool:
+    """Return True if tool_name is on PATH, False otherwise."""
+    return shutil.which(tool_name) is not None

98-128: Fix return type and doc for query_job_status_sacct.

Annotation and docstring claim a list/dict but the function returns a string command.

-def query_job_status_sacct(jobuid) -> list:
-    """
-    Query job status using sacct command
- 
-    Args:
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
- 
-    Returns:
-        Dictionary mapping job ID to JobStatus object
-    """
+def query_job_status_sacct(jobuid: str) -> str:
+    """
+    Build the sacct command (as a shell string) to query status for the given
+    workflow run identifier (jobuid). The caller is responsible for execution.
+    """

130-151: Align squeue helper: clarify return type/doc; add --clusters=all for parity.

Keep returning argv list (to allow shell=False later) but document it; include clusters flag like sacct.

-def query_job_status_squeue(jobuid) -> list:
-    """
-    Query job status using squeue command (newer SLURM functionality)
- 
-    Args:
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
- 
-    Returns:
-        Dictionary mapping job ID to JobStatus object
-    """
+def query_job_status_squeue(jobuid: str) -> list[str]:
+    """
+    Build the squeue command (argv list) to query status for the given run id.
+    The caller is responsible for execution.
+    """
@@
     query_command = [
         "squeue",
         "--format=%i|%T",
         "--states=all",
         "--noheader",
+        "--clusters=all",
         "--name",
         f"{jobuid}",
     ]
snakemake_executor_plugin_slurm/__init__.py (3)

39-43: Import the new query builders to avoid duplicate command construction.

Use the helpers instead of re-assembling sacct strings locally; this matches the PR goal.

 from .job_status_query import (
     get_min_job_age,
     is_query_tool_available,
     should_recommend_squeue_status_command,
+    query_job_status_sacct,
+    query_job_status_squeue,
 )

619-626: Initialize variables used after the loop (status_attempts could be 0).

Prevents UnboundLocalError when status_attempts == 0.

-        sacct_query_durations = []
+        sacct_query_durations = []
+        status_of_jobs = None
+        sacct_query_duration = None

629-646: Use outsourced query builders and honor status_command.

Respect the configured/default status command and remove duplicate sacct assembly.

-        # We use this sacct syntax for argument 'starttime' to keep it
-        # compatible with slurm < 20.11
-        sacct_starttime = f"{datetime.now() - timedelta(days=2):%Y-%m-%dT%H:00}"
-        # previously we had
-        # f"--starttime now-2days --endtime now --name {self.run_uuid}"
-        # in line 218 - once v20.11 is definitively not in use any more,
-        # the more readable version ought to be re-adapted
-
-        # -X: only show main job, no substeps
-        sacct_command = f"""sacct -X --parsable2 \
-                        --clusters all \
-                        --noheader --format=JobIdRaw,State \
-                        --starttime {sacct_starttime} \
-                        --endtime now --name {self.run_uuid}"""
-
-        # for better redability in verbose output
-        sacct_command = " ".join(shlex.split(sacct_command))
+        # Build status query command according to selected backend
+        selected = self.get_status_command()
+        if selected == "squeue":
+            argv = query_job_status_squeue(self.run_uuid)
+            status_query_cmd = " ".join(shlex.quote(a) for a in argv)
+        else:
+            status_query_cmd = query_job_status_sacct(self.run_uuid)

And below (Line 651) replace sacct_command with status_query_cmd:

-                (status_of_jobs, sacct_query_duration) = await self.job_stati(
-                    sacct_command
-                )
+                (status_of_jobs, sacct_query_duration) = await self.job_stati(
+                    status_query_cmd
+                )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d9624a and 419180a.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (18 hunks)
  • snakemake_executor_plugin_slurm/job_status_query.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
PR: snakemake/snakemake-executor-plugin-slurm#173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/job_status_query.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (2)
snakemake_executor_plugin_slurm/job_status_query.py (3)
  • get_min_job_age (7-54)
  • is_query_tool_available (57-68)
  • should_recommend_squeue_status_command (71-95)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • create_efficiency_report (190-219)
🪛 Ruff (0.12.2)
snakemake_executor_plugin_slurm/__init__.py

55-59: Avoid specifying long messages outside the exception class

(TRY003)


336-336: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


349-349: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


565-565: Avoid specifying long messages outside the exception class

(TRY003)

snakemake_executor_plugin_slurm/job_status_query.py

17-17: subprocess call: check for execution of untrusted input

(S603)


65-65: subprocess call: check for execution of untrusted input

(S603)


66-66: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

581-606: Confirm squeue output format on your SLURM cluster
squeue isn’t available in this environment. Please run on your target cluster:

squeue --format=%i\|%T --states=all --noheader --clusters=all | head -5

and ensure each line is exactly <id>|<state> with no multi-cluster prefixes or extra fields.
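
For illustration, parsing lines of that <id>|<state> shape could look like this (sketch only; the plugin's actual parser may differ):

def parse_squeue_output(output: str) -> dict:
    """Map job id -> state from '--format=%i|%T --noheader' output (illustrative)."""
    states = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blanks and any multi-cluster banner lines
        jobid, state = line.split("|", maxsplit=1)
        states[jobid] = state
    return states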

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/job_status_query.py (1)

14-21: Good: timeout + broader exceptions in get_min_job_age.

This resolves the earlier concern about hangs/missing scontrol. Looks solid.

🧹 Nitpick comments (4)
snakemake_executor_plugin_slurm/job_status_query.py (4)

23-56: Parse minutes and avoid ambiguous units.

Support “m/min/minute(s)” and fail closed on unknown units rather than silently treating them as seconds. Also simplifies conversion via a map.

-            # Use regex to parse value and optional unit
-            # Pattern matches: number + optional whitespace + optional unit
-            match = re.match(r"^(\d+)\s*([a-zA-Z]*)", value_part)
+            # Parse number + optional unit
+            match = re.match(r"^(\d+)\s*([a-zA-Z]+)?", value_part)
             if not match:
                 continue

             value_str = match.group(1)
-            unit = match.group(2).lower() if match.group(2) else ""
+            unit = (match.group(2) or "").lower()

             try:
                 value = int(value_str)
-
-                # Convert to seconds based on unit
-                if unit in ("h", "hour", "hours"):
-                    return value * 3600
-                elif unit in ("s", "sec", "secs", "second", "seconds", ""):
-                    return value
-                else:
-                    # Unknown unit, assume seconds
-                    return value
+                unit_map = {
+                    "": 1, "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
+                    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
+                    "h": 3600, "hour": 3600, "hours": 3600,
+                }
+                factor = unit_map.get(unit)
+                if factor is None:
+                    return None  # unknown unit → safer to disable recommendation
+                return value * factor

59-70: Use shutil.which instead of spawning ‘which’.

Removes a subprocess, fixes Ruff S603, and is more portable. Also generalize the docstring.

+import shutil
@@
-def is_query_tool_available(tool_name):
-    """
-    Check if the sacct command is available on the system.
-    Returns True if sacct is available, False otherwise.
-    """
-    cmd = f"which {tool_name}"
-    cmd = shlex.split(cmd)
-    try:
-        subprocess.check_output(cmd, stderr=subprocess.PIPE)
-        return True
-    except subprocess.CalledProcessError:
-        return False
+def is_query_tool_available(tool_name: str) -> bool:
+    """
+    Check if a CLI tool is available on PATH.
+    """
+    return shutil.which(tool_name) is not None

73-97: Clarify intent in docstring/name.

Function returns whether squeue is viable, not a recommendation per se. Consider rewording docstring (or rename to is_squeue_viable) to avoid confusion.

-    """
-    Determine if the status query with squeue should be recommended based on
-    the MinJobAge configuration (if very low, squeue might not work well)
+    """
+    Determine if using squeue for status queries is viable based on MinJobAge
+    (low values make completed jobs disappear quickly).
@@
-    Returns True if the option should be available, False otherwise.
+    Returns True if squeue is viable, False otherwise.

132-153: Fix docstring/type-hint drift and clarify jobuid semantics (job-name == run_uuid).

query_job_status_squeue returns an argv list; its docstring and signature are stale — jobuid is the run_uuid used as sbatch --job-name (submit_string.py:28; __init__.py:463–465). Update signature/docstring as shown.

-def query_job_status_squeue(jobuid) -> list:
-    """
-    Query job status using squeue command (newer SLURM functionality)
-
-    Args:
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
-
-    Returns:
-        Dictionary mapping job ID to JobStatus object
-    """
+def query_job_status_squeue(jobuid: str) -> List[str]:
+    """
+    Build the squeue argv to query status by job name/UID.
+    Returns a list of arguments suitable for subprocess without shell=True.
+    """

Also add the necessary typing import (or use list[str] to match the codebase).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 419180a and bc851dd.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/job_status_query.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/job_status_query.py
🪛 Ruff (0.12.2)
snakemake_executor_plugin_slurm/job_status_query.py

17-17: subprocess call: check for execution of untrusted input

(S603)


67-67: subprocess call: check for execution of untrusted input

(S603)


68-68: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/job_status_query.py (1)

100-131: Return argv List[str] from query_job_status_sacct (fix docstring/type-hint/return mismatch)

Change the function to return a List[str] argv suitable for subprocess.run(..., shell=False) and update the signature/docstring. Repo search found no callers outside the definition; verify external callers/tests before merging.

-from datetime import datetime, timedelta
+from datetime import datetime, timedelta
+from typing import List
@@
-def query_job_status_sacct(jobuid) -> list:
-    """
-    Query job status using sacct command
-
-    Args:
-        job_ids: List of SLURM job IDs
-        timeout: Timeout in seconds for subprocess call
-
-    Returns:
-        Dictionary mapping job ID to JobStatus object
-    """
+def query_job_status_sacct(jobuid: str) -> List[str]:
+    """
+    Build the sacct argv to query job status for a given job name/UID.
+    Returns a list of arguments suitable for subprocess without shell=True.
+    """
@@
-    # -X: only show main job, no substeps
-    query_command = f"""sacct -X --parsable2 \
-                        --clusters all \
-                        --noheader --format=JobIdRaw,State \
-                        --starttime {sacct_starttime} \
-                        --endtime now --name {jobuid}"""
-
-    # for better redability in verbose output
-    query_command = " ".join(shlex.split(query_command))
-
-    return query_command
+    # -X: only show main job, no substeps
+    query_command = [
+        "sacct",
+        "-X",
+        "--parsable2",
+        "--clusters", "all",
+        "--noheader",
+        "--format=JobIdRaw,State",
+        "--starttime", sacct_starttime,
+        "--endtime", "now",
+        "--name", str(jobuid),
+    ]
+    return query_command

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

40-44: Avoid running cluster probes in _get_status_command_help / metadata (import-time subprocess calls).

_get_status_command_help() still calls _get_status_command_default(), is_query_tool_available(), and should_recommend_squeue_status_command(), which in turn run 'which' and 'scontrol show config'. Because the help text is evaluated in the metadata["help"] field, these subprocess calls happen at import time (e.g. on snakemake --help), can be slow, and can fail on systems without SLURM or with a misconfigured scontrol. This was already flagged earlier and is still present.

Consider making the help text static and moving all probing logic to runtime-only paths (_get_status_command_default, _validate_status_command_settings, and the actual status-query code). For example:

-def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+def _get_status_command_help():
+    """Help text for the status_command setting."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "The actual default is chosen dynamically at runtime based on cluster "
+        "configuration; see the plugin documentation for details."
+    )

Once this is done, keeping metadata={"help": _get_status_command_help(), ...} is safe because the function becomes pure and cheap.
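A minimal runtime-only resolution sketch, assuming status_command defaults to None when unset and that all probing happens here rather than at import:

# Hypothetical runtime resolution inside the Executor; probing helper as in job_status_query.py.
def get_status_command(self) -> str:
    explicit = self.workflow.executor_settings.status_command
    if explicit:
        # an explicit user choice always wins; no probing needed
        return explicit
    # probe the cluster only now, never at import time
    return "sacct" if is_query_tool_available("sacct") else "squeue"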

Also applies to: 51-67, 69-106, 226-233


334-335: Fix MinJobAge logic, threshold source, and ambiguous × in _validate_status_command_settings.

The MinJobAge warnings are still inverted and the threshold is derived from a non-existent seconds_between_status_checks setting, while init_seconds_before_status_checks is the documented knob. Also, the log strings still use the ambiguous × character. This was pointed out earlier and remains unresolved.

Suggested adjustment aligning with the documented behaviour (MinJobAge affects squeue, not sacct) and Ruff’s feedback:

-                # Calculate dynamic threshold: 3 times the initial
-                # status check interval.
-                # The plugin starts with 40 seconds and increases,
-                # so we use (3 * 10) + 40 = 70 seconds as minimum
-                between_status_check_seconds = getattr(
-                    self.workflow.executor_settings,
-                    "seconds_between_status_checks",
-                    70,
-                )
-                dynamic_check_threshold = 3 * between_status_check_seconds
+                # Threshold: 3x initial status check interval (default 40s)
+                initial_interval = getattr(
+                    self.workflow.executor_settings,
+                    "init_seconds_before_status_checks",
+                    40,
+                )
+                dynamic_check_threshold = 3 * initial_interval
@@
-                elif sacct_available and min_job_age is not None:
-                    if (
-                        min_job_age < dynamic_check_threshold
-                        and status_command == "sacct"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds "
-                            f"(< {dynamic_check_threshold}s). "
-                            f"This may cause 'sacct' to report inaccurate "
-                            "job states and the status_command option may "
-                            "be unreliable. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
-                    elif (
-                        min_job_age >= dynamic_check_threshold
-                        and status_command == "squeue"
-                    ):
-                        self.logger.warning(
-                            f"MinJobAge is {min_job_age} seconds (>= "
-                            f"{dynamic_check_threshold}s). "
-                            f"The 'squeue' command should work reliably for "
-                            "status queries. "
-                            "(Threshold is 3x status check interval: 3 × "
-                            f"{between_status_check_seconds}s = "
-                            f"{dynamic_check_threshold}s)"
-                        )
+                elif sacct_available and min_job_age is not None:
+                    if min_job_age < dynamic_check_threshold and status_command == "squeue":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (< {dynamic_check_threshold}s). "
+                            "This may cause 'squeue' to miss recently finished jobs; "
+                            "consider using 'sacct'. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = "
+                            f"{dynamic_check_threshold}s)"
+                        )
+                    elif min_job_age >= dynamic_check_threshold and status_command == "sacct":
+                        self.logger.warning(
+                            f"MinJobAge is {min_job_age} seconds (>= {dynamic_check_threshold}s). "
+                            "The 'squeue' command should also work reliably for status queries. "
+                            f"(Threshold is 3x initial interval: 3 x {initial_interval}s = "
+                            f"{dynamic_check_threshold}s)"
+                        )

Also note that the warning branch

if not sacct_available and status_command == "sacct":
    self.logger.warning(
        "The 'sacct' command is not available on this system. "
        "Using 'squeue' instead for job status queries."
    )

does not actually switch status_command to "squeue"; it only logs. Either adjust the message (“will fall back to 'squeue' once status querying honours this setting”) or implement the actual fallback by mutating the setting.
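If the fallback is preferred over rewording, a sketch of switching the setting in place (attribute path as used elsewhere in this file):

# Hypothetical fallback inside _validate_status_command_settings():
if not sacct_available and status_command == "sacct":
    self.logger.warning(
        "'sacct' is not available on this system; "
        "falling back to 'squeue' for job status queries."
    )
    # make the fallback effective instead of only logging it
    self.workflow.executor_settings.status_command = "squeue"
    status_command = "squeue"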

Also applies to: 337-389

🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

191-224: Efficiency report wiring now triggers a debug-level directory listing; consider trimming it before production.

With the new efficiency_report* settings and the shutdown hook, enabling efficiency_report will now call create_efficiency_report() on every run. Per the retrieved learnings, that function still logs os.listdir(cwd) at debug level for testing purposes only. This can clutter logs and leak directory contents unnecessarily once this feature is on the main execution path.

Based on learnings, I’d suggest removing or guarding that debug line in snakemake_executor_plugin_slurm/efficiency_report.py (while keeping the summary logger.info), now that report generation is wired into normal shutdown.

Also applies to: 399-419


137-147: Handle delete_logfiles_older_than being None despite Optional typing.

delete_logfiles_older_than is typed as Optional[int] but clean_old_logs() assumes it’s always an int:

age_cutoff = self.workflow.executor_settings.delete_logfiles_older_than
keep_all = self.workflow.executor_settings.keep_successful_logs
if age_cutoff <= 0 or keep_all:
    return

If configuration ever sets this to None, the comparison age_cutoff <= 0 will raise a TypeError.

A small defensive tweak would make this robust:

-        age_cutoff = self.workflow.executor_settings.delete_logfiles_older_than
-        keep_all = self.workflow.executor_settings.keep_successful_logs
-        if age_cutoff <= 0 or keep_all:
+        age_cutoff = self.workflow.executor_settings.delete_logfiles_older_than
+        keep_all = self.workflow.executor_settings.keep_successful_logs
+        if age_cutoff is None or age_cutoff <= 0 or keep_all:
             return

Optionally, you could also mention in the help text that None/negative values disable cleanup; see the field sketch below.

Also applies to: 421-428
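If you take up the help-text suggestion above, a sketch of the field definition (the default value and wording are illustrative):

from dataclasses import field
from typing import Optional

delete_logfiles_older_than: Optional[int] = field(
    default=10,  # illustrative default; keep whatever the plugin currently uses
    metadata={
        "help": (
            "Delete SLURM log files older than this many days. "
            "None or a non-positive value disables the cleanup."
        ),
    },
)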

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc851dd and 5a6cacb.

📒 Files selected for processing (3)
  • pyproject.toml (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (17 hunks)
  • snakemake_executor_plugin_slurm/efficiency_report.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • snakemake_executor_plugin_slurm/efficiency_report.py
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-09T15:55:55.669Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-09-09T15:55:55.669Z
Learning: In the Snakemake SLURM executor plugin, Snakemake submits itself as a job using the --wrap parameter in the sbatch command, rather than creating or submitting a separate job.sh script file. The actual implementation uses `call += f' --wrap="{exec_job}"'` where exec_job is the formatted snakemake execution command.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (2)
snakemake_executor_plugin_slurm/job_status_query.py (3)
  • get_min_job_age (7-56)
  • is_query_tool_available (59-70)
  • should_recommend_squeue_status_command (73-97)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • create_efficiency_report (233-262)
🪛 Ruff (0.14.5)
snakemake_executor_plugin_slurm/__init__.py

58-62: Avoid specifying long messages outside the exception class

(TRY003)


372-372: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


385-385: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


592-592: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

69-106: Avoid executing external commands at import time for metadata help text.

The _get_status_command_help() function calls _get_status_command_default(), is_query_tool_available(), and should_recommend_squeue_status_command(), which execute external commands (which, scontrol). When this function is invoked in the field metadata at line 229, these commands run at module import time, potentially causing delays or failures during import.

Replace the dynamic help with a static string:

 def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+    """Static help text for status_command."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is chosen dynamically: 'sacct' if available, otherwise 'squeue'. "
+        "Note: 'squeue' reliability depends on MinJobAge configuration; see plugin documentation."
+    )

226-246: Complete the integration: status_command is defined but never used in job status querying.

The status_command configuration field is defined with validation and a get_status_command() method, but the actual status polling in check_active_jobs() (lines 662-669) hardcodes a sacct query and never invokes get_status_command(). The two query functions query_job_status_sacct() and query_job_status_squeue() from job_status_query.py are imported but never called. This means the status_command setting currently has no effect on execution.

To complete the integration, modify check_active_jobs() to use the configured command. Here's the approach:

         # -X: only show main job, no substeps
-        sacct_command = f"""sacct -X --parsable2 \
-                        --clusters all \
-                        --noheader --format=JobIdRaw,State \
-                        --starttime {sacct_starttime} \
-                        --endtime now --name {self.run_uuid}"""
-
-        # for better redability in verbose output
-        sacct_command = " ".join(shlex.split(sacct_command))
+        # Import query functions at the top of the file
+        from .job_status_query import query_job_status_sacct, query_job_status_squeue
+        
+        # Get the configured status command
+        status_cmd = self.get_status_command()
+        
+        if status_cmd == "sacct":
+            query_command = query_job_status_sacct(self.run_uuid)
+        elif status_cmd == "squeue":
+            query_command = query_job_status_squeue(self.run_uuid)
+            # Convert list to string for consistency
+            query_command = " ".join(query_command)
+        else:
+            raise WorkflowError(f"Unknown status_command: {status_cmd}")

Alternatively, if you prefer not to implement this now, remove or hide the status_command field to avoid misleading users with a non-functional option.

Also applies to: 387-394, 662-669
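Following up on the diff above: if query_job_status_squeue() were also changed to return an argv list (as suggested for sacct earlier in this thread), the string conversion could be dropped and the dispatch inside check_active_jobs() stays uniform; a sketch under that assumption:

# Hypothetical uniform dispatch, assuming both builders return argv lists:
status_cmd = self.get_status_command()
builder = query_job_status_sacct if status_cmd == "sacct" else query_job_status_squeue
argv = builder(self.run_uuid)
result = subprocess.run(argv, capture_output=True, text=True, check=False)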

🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/job_status_query.py (1)

63-74: Update docstring to match generic function signature.

The docstring specifically mentions "Check if the sacct command" but the function accepts a generic tool_name parameter. Update the docstring to reflect that this checks for any tool.

Apply this diff:

 def is_query_tool_available(tool_name):
     """
-    Check if the sacct command is available on the system.
-    Returns True if sacct is available, False otherwise.
+    Check if the specified command is available on the system.
+    Returns True if the tool is available, False otherwise.
     """
snakemake_executor_plugin_slurm/__init__.py (1)

268-271: Remove empty __post_init__ or add validation logic.

The __post_init__ method is defined but empty with a placeholder comment. Either remove it if no validation is needed, or implement the validation logic referenced in the docstring.

If no validation is needed:

-    def __post_init__(self):
-        """Validate settings after initialization."""
-        # Add any validation logic here if needed in the future
-        pass

Otherwise, add the actual validation logic here.
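If validation is kept, a sketch of what it could check (the specific constraints are illustrative, derived from the settings added in this PR):

def __post_init__(self):
    """Validate settings after initialization (example checks only)."""
    if self.status_command not in (None, "sacct", "squeue"):
        raise ValueError("status_command must be 'sacct' or 'squeue'")
    if self.status_attempts is not None and self.status_attempts < 1:
        raise ValueError("status_attempts must be at least 1")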

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a6cacb and e8e360a.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (17 hunks)
  • snakemake_executor_plugin_slurm/job_status_query.py (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
  • snakemake_executor_plugin_slurm/job_status_query.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-07-22T17:31:41.629Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 333
File: .github/workflows/announce-release.yml:0-0
Timestamp: 2025-07-22T17:31:41.629Z
Learning: In the snakemake-executor-plugin-slurm repository, cmeesters prefers using `main` references for GitHub Actions to always get the latest version, with security provided through protected branches and mandatory CI checks rather than commit SHA pinning.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-17T17:27:32.446Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-16T17:25:38.450Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 355
File: snakemake_executor_plugin_slurm/submit_string.py:0-0
Timestamp: 2025-09-16T17:25:38.450Z
Learning: In the snakemake-executor-plugin-slurm project, constraint and qos parameters follow a three-state logic: None (not set, no flag added), empty string (explicitly set to empty, adds flag with empty quotes like --constraint ''), and non-empty string (adds flag with quoted value). The correct implementation uses `constraint = job.resources.get("constraint")` followed by `if constraint is not None:` to distinguish between "not provided" and "explicitly set to empty".

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (2)
snakemake_executor_plugin_slurm/job_status_query.py (3)
  • get_min_job_age (7-60)
  • is_query_tool_available (63-74)
  • should_recommend_squeue_status_command (77-101)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • create_efficiency_report (233-262)
🪛 Ruff (0.14.5)
snakemake_executor_plugin_slurm/__init__.py

58-62: Avoid specifying long messages outside the exception class

(TRY003)


369-369: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


382-382: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


589-589: Avoid specifying long messages outside the exception class

(TRY003)

snakemake_executor_plugin_slurm/job_status_query.py

17-17: subprocess call: check for execution of untrusted input

(S603)


71-71: subprocess call: check for execution of untrusted input

(S603)


72-72: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

228-235: Import-time execution triggered by metadata help text.

The metadata "help" field at line 231 calls _get_status_command_help() during class definition (import time). Once the function in the previous comment is converted to return a static string, this call becomes safe.

Ensure _get_status_command_help() returns only a static string (no function calls) before this metadata evaluation.


71-108: Critical: Import-time execution still present in help text generation.

Despite past review feedback marking this as addressed, _get_status_command_help() still calls _get_status_command_default(), is_query_tool_available(), and should_recommend_squeue_status_command() which execute external commands (scontrol, which) at import time. This occurs when the metadata dictionary at line 231 is evaluated during module import, causing:

  • Slow/hanging imports if SLURM tools are unavailable
  • Potential exceptions during import
  • Violation of Python import best practices

Replace with a static help string:

 def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+    """Static help text for status_command."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is chosen dynamically based on tool availability and MinJobAge. "
+        "Set explicitly to override auto-detection."
+    )
🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

667-667: Consider renaming unused loop variable to underscore.

The loop variable i is not used within the loop body. By convention, use _ for unused variables.

-        for i in range(status_attempts):
+        for _ in range(status_attempts):
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8e360a and 3b19f8c.

📒 Files selected for processing (4)
  • snakemake_executor_plugin_slurm/__init__.py (17 hunks)
  • snakemake_executor_plugin_slurm/partitions.py (2 hunks)
  • tests/test_partition_selection.py (2 hunks)
  • tests/tests.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • snakemake_executor_plugin_slurm/partitions.py
  • tests/tests.py
  • tests/test_partition_selection.py
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-17T17:27:32.446Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-07-22T17:31:41.629Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 333
File: .github/workflows/announce-release.yml:0-0
Timestamp: 2025-07-22T17:31:41.629Z
Learning: In the snakemake-executor-plugin-slurm repository, cmeesters prefers using `main` references for GitHub Actions to always get the latest version, with security provided through protected branches and mandatory CI checks rather than commit SHA pinning.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-16T17:25:38.450Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 355
File: snakemake_executor_plugin_slurm/submit_string.py:0-0
Timestamp: 2025-09-16T17:25:38.450Z
Learning: In the snakemake-executor-plugin-slurm project, constraint and qos parameters follow a three-state logic: None (not set, no flag added), empty string (explicitly set to empty, adds flag with empty quotes like --constraint ''), and non-empty string (adds flag with quoted value). The correct implementation uses `constraint = job.resources.get("constraint")` followed by `if constraint is not None:` to distinguish between "not provided" and "explicitly set to empty".

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-05-26T11:50:44.708Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/__init__.py:800-859
Timestamp: 2025-05-26T11:50:44.708Z
Learning: When using sacct with a specific --format parameter in the snakemake SLURM executor plugin, column validation is unnecessary because the query format is controlled and the specified columns will always be present in the output.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (2)
snakemake_executor_plugin_slurm/job_status_query.py (5)
  • get_min_job_age (7-60)
  • is_query_tool_available (63-74)
  • should_recommend_squeue_status_command (77-101)
  • query_job_status_squeue (136-157)
  • query_job_status_sacct (104-133)
snakemake_executor_plugin_slurm/efficiency_report.py (1)
  • create_efficiency_report (233-262)
🪛 Ruff (0.14.5)
snakemake_executor_plugin_slurm/__init__.py

60-64: Avoid specifying long messages outside the exception class

(TRY003)


369-369: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


382-382: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


591-591: Avoid specifying long messages outside the exception class

(TRY003)


667-667: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

655-661: Status command integration looks good.

The status_command option is now properly integrated. The code calls get_status_command() and conditionally invokes either query_job_status_sacct() or query_job_status_squeue() based on the configuration, addressing the concern from past reviews that the option was non-functional.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

836-851: Fix error message formatting and clarify exception handling.

Line 839: Missing proper punctuation/spacing between the quote and the next sentence.
Line 851: Bare pass after logging an error swallows the exception. Clarify whether this is intentional (graceful degradation) or should propagate the error.

Apply this diff:

                 self.logger.warning(
                     "The SLURM database might not be available ... "
-                    f"Error message: '{error_message}'"
+                    f"Error message: '{error_message}'. "
                     "This error message indicates that the SLURM database is "
                     "currently not available. This is not an error of the "
                     "Snakemake plugin, but some kind of server issue. "
                     "Please consult with your HPC provider."
                 )
             else:
                 self.logger.error(
-                    f"The job status query failed with command '{command}'"
-                    f"Error message: '{error_message}'"
+                    f"The job status query failed with command '{command}'. "
+                    f"Error message: '{error_message}'. "
                     "This error message is not expected, please report it back to us."
                 )
-            pass
+            # Intentionally swallow exception to allow retry on next status check
♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

70-107: Import-time execution still occurs in _get_status_command_help().

Despite the past review marking this as addressed, the current code still calls _get_status_command_default() (line 72), is_query_tool_available() (line 73), and should_recommend_squeue_status_command() (line 74) within _get_status_command_help(), which is invoked in field metadata at line 230 during module import. These functions execute external commands (which, scontrol), causing side effects at import time.

Replace with a static help string:

 def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+    """Help text for status_command."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is auto-detected: 'sacct' if available, otherwise 'squeue'. "
+        "Note: 'squeue' reliability depends on MinJobAge configuration; "
+        "see plugin documentation for details."
+    )
🧹 Nitpick comments (8)
snakemake_executor_plugin_slurm/__init__.py (8)

642-642: Rename variable to reflect support for both status commands.

The variable sacct_query_durations is now used for both sacct and squeue queries. Rename to a command-agnostic name.

Apply this diff:

-        sacct_query_durations = []
+        status_query_durations = []

And update references at lines 674, etc.


650-652: Rename variables to be command-agnostic.

Variables active_jobs_seen_by_sacct and missing_sacct_status are now used for both sacct and squeue queries.

Apply this diff:

-        active_jobs_ids = {job_info.external_jobid for job_info in active_jobs}
-        active_jobs_seen_by_sacct = set()
-        missing_sacct_status = set()
+        active_jobs_ids = {job_info.external_jobid for job_info in active_jobs}
+        active_jobs_seen_by_status_query = set()
+        missing_status = set()

Update all references throughout the method (lines 684, 689, 691, 695, 699, 703, 724, 754, 765).


666-666: Use loop variable or rename to '_'.

The loop variable i is not used within the loop body. Either use it for logging/debugging or rename to _ to indicate intentional discard.

Apply this diff:

-        for i in range(status_attempts):
+        for _ in range(status_attempts):

Based on static analysis hints.


675-696: Update comments and log messages to be command-agnostic.

Comments and debug messages still reference "sacct" specifically but should reflect that either sacct or squeue may be used.

Apply this diff:

                 self.logger.debug(f"could not check status of job {self.run_uuid}")
                     continue
-                sacct_query_durations.append(sacct_query_duration)
-                self.logger.debug(f"status_of_jobs after sacct is: {status_of_jobs}")
+                status_query_durations.append(status_query_duration)
+                self.logger.debug(f"status_of_jobs after query is: {status_of_jobs}")
                 # only take jobs that are still active
-                active_jobs_ids_with_current_sacct_status = (
+                active_jobs_ids_with_current_status = (
                     set(status_of_jobs.keys()) & active_jobs_ids
                 )
                 self.logger.debug(
-                    f"active_jobs_ids_with_current_sacct_status are: "
-                    f"{active_jobs_ids_with_current_sacct_status}"
+                    f"active_jobs_ids_with_current_status are: "
+                    f"{active_jobs_ids_with_current_status}"
                 )
-                active_jobs_seen_by_sacct = (
-                    active_jobs_seen_by_sacct
-                    | active_jobs_ids_with_current_sacct_status
+                active_jobs_seen_by_status_query = (
+                    active_jobs_seen_by_status_query
+                    | active_jobs_ids_with_current_status
                 )
                 self.logger.debug(
-                    "active_jobs_seen_by_sacct are: " f"{active_jobs_seen_by_sacct}"
+                    "active_jobs_seen_by_status_query are: " f"{active_jobs_seen_by_status_query}"
                 )
-                missing_sacct_status = (
-                    active_jobs_seen_by_sacct
-                    - active_jobs_ids_with_current_sacct_status
+                missing_status = (
+                    active_jobs_seen_by_status_query
+                    - active_jobs_ids_with_current_status
                 )
-                self.logger.debug(f"missing_sacct_status are: {missing_sacct_status}")
-                if not missing_sacct_status:
+                self.logger.debug(f"missing_status are: {missing_status}")
+                if not missing_status:
                     break

699-708: Update warning message to be command-agnostic.

The warning message references "slurmdbd" and "sacct" specifically, but this warning applies to either status command when the accounting system is experiencing issues.

Apply this diff:

         if missing_status:
             self.logger.warning(
-                "Unable to get the status of all active jobs that should be "
-                f"in slurmdbd, even after {status_attempts} attempts.\n"
-                "The jobs with the following slurm job ids were previously "
-                " seen by sacct, but sacct doesn't report them any more:\n"
-                f"{missing_status}\n"
-                "Please double-check with your slurm cluster administrator, "
-                "that slurmdbd job accounting is properly set up.\n"
+                f"Unable to get the status of all active jobs, even after "
+                f"{status_attempts} attempts.\n"
+                f"The jobs with the following SLURM job IDs were previously seen "
+                f"but are no longer reported:\n"
+                f"{missing_status}\n"
+                f"Please verify with your SLURM cluster administrator that the "
+                f"accounting/status system is properly configured.\n"
             )

710-768: Update comments referencing sacct/slurmdbd.

Comments at lines 713-717 still reference "slurmdbd" and "sacct" specifically. These should be more generic since squeue is also supported.

Apply this diff:

             for j in active_jobs:
-                # the job probably didn't make it into slurmdbd yet, so
-                # `sacct` doesn't return it
+                # The job probably hasn't been recorded yet or is too new,
+                # so the status query doesn't return it
                 if j.external_jobid not in status_of_jobs:
-                    # but the job should still be queueing or running and
-                    # appear in slurmdbd (and thus `sacct` output) later
+                    # The job should still be queueing or running and
+                    # appear in subsequent status queries
                     yield j
                     continue

722-765: Update references to active_jobs_seen_by_sacct.

After renaming the variable, update the remove() calls to use the new name.

Apply this diff (if the earlier rename is applied):

                 if status == "COMPLETED":
                     self.report_job_success(j)
                     any_finished = True
-                    active_jobs_seen_by_sacct.remove(j.external_jobid)
+                    active_jobs_seen_by_status_query.remove(j.external_jobid)
                     if not self.workflow.executor_settings.keep_successful_logs:

And similarly at lines 754, 765.


749-751: Consider logging the UNKNOWN status case.

When status is "UNKNOWN," the job is assumed finished successfully without confirmation. This silent assumption might mask issues.

Add a debug log:

                 elif status == "UNKNOWN":
-                    # the job probably does not exist anymore, but 'sacct' did not work
-                    # so we assume it is finished
+                    # The job probably does not exist anymore, but the status query
+                    # returned UNKNOWN, so we assume it finished successfully
+                    self.logger.debug(
+                        f"Job {j.external_jobid} has UNKNOWN status; assuming success"
+                    )
                     self.report_job_success(j)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b19f8c and d189548.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (17 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.
📚 Learning: 2024-12-11T14:17:08.749Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 178
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2024-12-11T14:17:08.749Z
Learning: In the `snakemake-executor-plugin-slurm` project, when handling exceptions in `snakemake_executor_plugin_slurm/__init__.py`, prefer concise error messages and avoid unnecessary verbosity or exception chaining when it's not necessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-10T15:20:51.829Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-executor-plugin-slurm PR: 173
File: docs/further.md:96-96
Timestamp: 2025-03-10T15:20:51.829Z
Learning: PR #173 in snakemake-executor-plugin-slurm implements GPU job support by adding resources: `gres` for generic resource specifications (e.g., 'gpu:1'), `gpu`/`gpus` for specifying GPU counts, and `gpu_model`/`gpu_manufacturer` for specifying GPU types, allowing users to request GPU resources directly rather than having to use slurm_extra.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-13T09:54:22.950Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-20T09:13:26.443Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 193
File: .github/workflows/post_to_mastodon.yml:26-26
Timestamp: 2025-01-20T09:13:26.443Z
Learning: In the snakemake-executor-plugin-slurm repository, release PRs follow the naming pattern "chore(main): release X.Y.Z" where X.Y.Z is the version number.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-03-31T11:52:05.373Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 249
File: snakemake_executor_plugin_slurm/submit_string.py:21-29
Timestamp: 2025-03-31T11:52:05.373Z
Learning: In the snakemake-executor-plugin-slurm project, account and partition parameters are validated in the submit function before reaching the get_submit_command function. By design, these parameters are either empty strings or already properly formatted with SLURM parameter tags, making additional validation in get_submit_command unnecessary.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-01-17T17:27:32.446Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 187
File: .github/workflows/post_to_mastodon.yml:0-0
Timestamp: 2025-01-17T17:27:32.446Z
Learning: In the mastodon publishing workflow for snakemake-executor-plugin-slurm, the PR_TITLE environment variable is required by the post_to_mastodon.sh script and should be preserved.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-16T17:25:38.450Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 355
File: snakemake_executor_plugin_slurm/submit_string.py:0-0
Timestamp: 2025-09-16T17:25:38.450Z
Learning: In the snakemake-executor-plugin-slurm project, constraint and qos parameters follow a three-state logic: None (not set, no flag added), empty string (explicitly set to empty, adds flag with empty quotes like --constraint ''), and non-empty string (adds flag with quoted value). The correct implementation uses `constraint = job.resources.get("constraint")` followed by `if constraint is not None:` to distinguish between "not provided" and "explicitly set to empty".

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-07-22T17:31:41.629Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 333
File: .github/workflows/announce-release.yml:0-0
Timestamp: 2025-07-22T17:31:41.629Z
Learning: In the snakemake-executor-plugin-slurm repository, cmeesters prefers using `main` references for GitHub Actions to always get the latest version, with security provided through protected branches and mandatory CI checks rather than commit SHA pinning.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-16T08:54:07.957Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/efficiency_report.py:0-0
Timestamp: 2025-06-16T08:54:07.957Z
Learning: In the context of SLURM executor plugin development, user cmeesters (experienced with SLURM) noted they are not aware of SLURM typically reporting memory values with decimals (like 1.5G), but accepted defensive programming to handle such cases anyway.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-06-03T17:10:51.292Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: snakemake_executor_plugin_slurm/__init__.py:0-0
Timestamp: 2025-06-03T17:10:51.292Z
Learning: In the snakemake-executor-plugin-slurm repository, the create_efficiency_report method in snakemake_executor_plugin_slurm/__init__.py contains temporary testing/debugging code that logs directory contents using os.listdir() and directory paths. This code was inserted for testing purposes only and should be removed from the production implementation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/job_status_query.py (5)
  • get_min_job_age (7-60)
  • is_query_tool_available (63-74)
  • should_recommend_squeue_status_command (77-101)
  • query_job_status_squeue (136-157)
  • query_job_status_sacct (104-133)
🪛 Ruff (0.14.5)
snakemake_executor_plugin_slurm/__init__.py

59-63: Avoid specifying long messages outside the exception class

(TRY003)


590-590: Avoid specifying long messages outside the exception class

(TRY003)


666-666: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing
🔇 Additional comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

395-416: LGTM!

The shutdown flow is well-structured: invokes parent cleanup first, then removes old logs, and finally generates an optional efficiency report. The conditional logic for efficiency reporting is clear and correct.
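For orientation, the ordering described here amounts to something like the following minimal sketch (assuming the clean_old_logs() and create_efficiency_report() helpers and the efficiency_report setting from this PR; attribute paths and bodies are illustrative, not the actual implementation):

# Minimal sketch of the shutdown ordering -- not the plugin's actual code.
def shutdown(self):
    # 1. let the base executor run its own cleanup first
    super().shutdown()
    # 2. prune per-job SLURM logs according to the retention settings
    self.clean_old_logs()
    # 3. optionally summarise resource usage for the finished workflow
    if self.workflow.executor_settings.efficiency_report:
        self.create_efficiency_report()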


417-447: LGTM!

The log cleanup implementation is robust: early returns for skip conditions, proper time calculations, appropriate error handling, and cleanup of empty directories. The logic correctly handles the age cutoff and the keep_successful_logs flag.
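As a rough, self-contained illustration of that pattern (the parameter names log_dir and age_cutoff_days are hypothetical, and the real method additionally honours the keep_successful_logs flag mentioned above):

import time
from pathlib import Path


def clean_old_logs(log_dir: Path, age_cutoff_days: int) -> None:
    """Delete log files older than the cutoff and prune directories left empty."""
    if age_cutoff_days <= 0:
        return  # retention disabled, keep everything
    cutoff = time.time() - age_cutoff_days * 24 * 3600
    for path in log_dir.rglob("*.log"):
        try:
            if path.stat().st_mtime < cutoff:
                path.unlink()
        except OSError:
            continue  # tolerate files that vanished or cannot be removed
    # remove directories that are now empty, deepest paths first
    for directory in sorted(log_dir.rglob("*"), reverse=True):
        if directory.is_dir() and not any(directory.iterdir()):
            directory.rmdir()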


486-604: Improved documentation and messaging.

The updated comments (lines 486-494, 497-499) and reformatted warning messages (lines 537-546) enhance code clarity without changing behavior. The assertion message clearly explains the %j placeholder constraint, and the run_uuid usage comment now correctly reflects support for both sacct and squeue.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

70-107: Remove dynamic logic from _get_status_command_help() to prevent import-time subprocess execution.

Past review comments referencing commits 7220326 and 8d9624a are incorrect—those commits only fixed formatting and did not address the underlying issue. The function still executes _get_status_command_default(), is_query_tool_available(), and should_recommend_squeue_status_command() at import time (via field metadata at line 230), causing subprocess calls during module initialization. This must be refactored to static help text:

 def _get_status_command_help():
-    """Get help text with computed default."""
-    default_cmd = _get_status_command_default()
-    sacct_available = is_query_tool_available("sacct")
-    squeue_recommended = should_recommend_squeue_status_command()
-
-    base_help = "Command to query job status. Options: 'sacct', 'squeue'. "
-
-    if default_cmd == "sacct":
-        if sacct_available and not squeue_recommended:
-            info = (
-                "'sacct' detected and will be used "
-                "(MinJobAge may be too low for reliable 'squeue' usage)"
-            )
-        else:
-            info = "'sacct' detected and will be used"
-    else:  # default_cmd == "squeue"
-        if squeue_recommended:
-            # cumbersome, due to black and the need to stay below 80 chars
-            msg_part1 = "'squeue' recommended (MinJobAge is sufficient )"
-            msg_part2 = " for reliable usage"
-            info = msg_part1 + msg_part2
-        elif not sacct_available:
-            info = (
-                "'sacct' not available, falling back to 'squeue'. "
-                "WARNING: 'squeue' may not work reliably if MinJobAge is "
-                "too low"
-            )
-        else:
-            info = (
-                "'squeue' will be used. "
-                "WARNING: MinJobAge may be too low for reliable 'squeue' usage"
-            )
-
-    return (
-        f"{base_help}Default: '{default_cmd}' ({info}). "
-        f"Set explicitly to override auto-detection."
-    )
+    """Help text for status_command field."""
+    return (
+        "Command to query job status. Options: 'sacct' (preferred), 'squeue'. "
+        "Default is chosen dynamically: 'sacct' if available, otherwise 'squeue'. "
+        "Note: 'squeue' reliability depends on MinJobAge; see plugin documentation."
+    )
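With the help text static, the auto-detection itself can still happen lazily, i.e. only when the executor first resolves the command rather than at import time. A possible shape (the attribute path and the fallback order are assumptions, not the PR's exact code):

def get_status_command(self) -> str:
    """Resolve the status query tool at runtime instead of at import time."""
    configured = self.workflow.executor_settings.status_command
    if configured in ("sacct", "squeue"):
        return configured
    # probe the cluster only now, when a status check is actually imminent
    return "sacct" if is_query_tool_available("sacct") else "squeue"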
🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

362-373: Fix awkward word hyphenation in warning message.

The word "administrator" is split with a hyphen across lines 367-368 ("admini-" / "strator"), which is unusual for Python string literals. Consider rewording for better readability.

Apply this diff:

                         self.logger.warning(
                             f"MinJobAge is {min_job_age} seconds "
                             f"(< {dynamic_check_threshold}s). "
                             f"This may cause 'squeue' to miss recently finished jobs "
-                             "that have been purged from slurmctld, leading to job "
-                             "status queries being impossible with 'squeue'. "
-                             "Consider using 'sacct' instead or let your admini- "
-                             "strator increase MinJobAge. "
+                            "that have been purged from slurmctld, leading to job "
+                            "status queries being impossible with 'squeue'. "
+                            "Consider using 'sacct' instead or ask your "
+                            "administrator to increase MinJobAge. "
                             "(Threshold is 3x status check interval: 3 x "
                             f"{initial_interval}s = "
                             f"{dynamic_check_threshold}s)"
                         )

668-668: Use _ for intentionally unused loop variable.

The loop variable i is not used within the loop body. Use _ to indicate it's intentionally unused.

Apply this diff:

-        for i in range(status_attempts):
+        for _ in range(status_attempts):

Based on static analysis hints.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d189548 and 84abbc3.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (17 hunks)
🧰 Additional context used
🧠 Learnings (13)
📚 Learning: 2025-06-24T12:41:31.837Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 221
File: tests/tests.py:62-70
Timestamp: 2025-06-24T12:41:31.837Z
Learning: In the snakemake-executor-plugin-slurm efficiency report feature, the executor is responsible for creating the efficiency_report_path directory when generating efficiency reports. Tests should not manually check for directory existence but should rely on the executor to properly create necessary directories as part of its normal operation.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
📚 Learning: 2025-09-09T15:55:55.669Z
Learnt from: cmeesters
Repo: snakemake/snakemake-executor-plugin-slurm PR: 0
File: :0-0
Timestamp: 2025-09-09T15:55:55.669Z
Learning: In the Snakemake SLURM executor plugin, Snakemake submits itself as a job using the --wrap parameter in the sbatch command, rather than creating or submitting a separate job.sh script file. The actual implementation uses `call += f' --wrap="{exec_job}"'` where exec_job is the formatted snakemake execution command.

Applied to files:

  • snakemake_executor_plugin_slurm/__init__.py
🧬 Code graph analysis (1)
snakemake_executor_plugin_slurm/__init__.py (1)
snakemake_executor_plugin_slurm/job_status_query.py (5)
  • get_min_job_age (7-60)
  • is_query_tool_available (63-74)
  • should_recommend_squeue_status_command (77-101)
  • query_job_status_squeue (136-157)
  • query_job_status_sacct (104-133)
🪛 Ruff (0.14.5)
snakemake_executor_plugin_slurm/__init__.py

59-63: Avoid specifying long messages outside the exception class

(TRY003)


592-592: Avoid specifying long messages outside the exception class

(TRY003)


668-668: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: testing

Comment on lines +656 to 665
        # decide which status command to use
        status_command = self.get_status_command()
        # Getting the actual command with parameters.
        # Here, the commad will be a list generated with
        # shlex.split().
        if status_command == "sacct":
            status_command = query_job_status_sacct(self.run_uuid)
        elif status_command == "squeue":
            status_command = query_job_status_squeue(self.run_uuid)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Critical: Inconsistent return types from query functions will cause runtime failures.

The two query functions return different types:

  • query_job_status_sacct() returns a string (see line 129 in job_status_query.py: return query_command where query_command is built with f-strings)
  • query_job_status_squeue() returns a list (see line 156 in job_status_query.py: return query_command where query_command is a list)

However, at lines 820-821, job_stati() uses subprocess.check_output(command, text=True, shell=True, ...). With shell=True, the command should be passed as a single string; passing a list means only the first element is used as the command line, causing incorrect behavior or failures.
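To see why the list form matters, a generic demonstration of the subprocess behaviour (illustrative only, not plugin code; shown for POSIX shells):

import subprocess

# A string is handed to the shell as one complete command line:
print(subprocess.check_output("echo a b", shell=True, text=True).strip())  # a b

# A list with shell=True is NOT joined: only the first element becomes the
# command line; the remaining items become positional parameters of the shell
# and the extra argument is silently dropped here:
print(subprocess.check_output(["echo a b", "c"], shell=True, text=True).strip())  # a b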

Verify the actual return types and runtime behavior:

Fix: Ensure both functions return the same type. Since job_stati() uses shell=True, both should return strings:

In job_status_query.py, modify query_job_status_squeue to return a string:

def query_job_status_squeue(jobuid) -> str:
    """
    Query job status using squeue command (newer SLURM functionality)
    """
    # Build squeue command as a list first for clarity
    query_command_parts = [
        "squeue",
        "--format=%i|%T",
        "--states=all",
        "--noheader",
        "--name",
        f"{jobuid}",
    ]
    
    # Join into a string for shell execution
    return " ".join(query_command_parts)

Or alternatively, modify job_stati() to handle both types or always use shell=False.
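If the alternative route is taken, the normalisation could live at the call site instead, along these lines (a sketch with a hypothetical helper name; the actual job_stati() signature may differ):

import shlex
import subprocess


def _run_status_query(command) -> str:
    """Run a status query given either a command string or an argument list."""
    if isinstance(command, str):
        command = shlex.split(command)
    # shell=False: the argument list is passed to the tool directly
    return subprocess.check_output(command, text=True)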

Minor: Fix typo at line 659.

"commad" should be "command":

-        # Here, the commad will be a list generated with
+        # Here, the command will be a string or list generated with
-        # shlex.split().
+        # the query functions.

🏁 Script executed:

#!/bin/bash
# Check the actual return types in job_status_query.py
rg -A 5 "def query_job_status_sacct" snakemake_executor_plugin_slurm/job_status_query.py
rg -A 5 "def query_job_status_squeue" snakemake_executor_plugin_slurm/job_status_query.py

# Find the return statements
rg "return query_command" snakemake_executor_plugin_slurm/job_status_query.py -B 10

Length of output: 962


I need to verify how these return values flow through the code and confirm that the inconsistency actually reaches the subprocess call:

Based on my verification of the codebase and subprocess behavior, the review comment is accurate and well-founded. The actual code confirms:

With shell=True, the command is expected as a single string; the Python documentation notes that passing a sequence together with shell=True is almost always an error.

The script output shows:

  • query_job_status_sacct returns a string (via " ".join(shlex.split(...)))
  • query_job_status_squeue returns a list (literal list structure)

This type inconsistency causes incorrect behavior or outright failures at runtime when the command is passed to subprocess.check_output(..., shell=True) at lines 820-821.


Fix the type inconsistency in query function return values.

When shell=True, a string must be passed to subprocess. Currently:

  • query_job_status_sacct() returns a string ✓
  • query_job_status_squeue() returns a list ✗ (will fail with shell=True)

Both functions are currently annotated -> list; both should be annotated -> str (query_job_status_sacct already returns a string, and the fix below makes query_job_status_squeue do the same).

Modify query_job_status_squeue() in snakemake_executor_plugin_slurm/job_status_query.py to return a string:

def query_job_status_squeue(jobuid) -> str:
    query_command = [
        "squeue",
        "--format=%i|%T",
        "--states=all",
        "--noheader",
        "--name",
        f"{jobuid}",
    ]
    # Join into a string for shell=True compatibility
    return " ".join(query_command)

Also fix the typo at line 659: "commad" → "command"
