revise option to control number of resubmit fail jobs #545
base: master
Conversation
- add option: `"strategy": {"customized_script_header_template_file": ""}`
- add option: `sge_pe_name`
for more information, see https://pre-commit.ci
Update pbs.py
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dpdispatcher/machine.py (1), lines 78-101: Ensure retry_count is persisted and restored across (de)serialization.

retry_count is exposed via arginfo and set in `__init__`, but Machine.serialize/load_from_dict neither save nor rehydrate it. This drops user configuration on save/load and undermines centralization.
Apply:
```diff
 def load_from_dict(cls, machine_dict):
 @@
-        context = BaseContext.load_from_dict(machine_dict)
-        machine = machine_class(context=context)
+        context = BaseContext.load_from_dict(machine_dict)
+        machine = machine_class(context=context)
+        # propagate retry_count even if subclass overrides __init__
+        if "retry_count" in machine_dict:
+            setattr(machine, "retry_count", machine_dict["retry_count"])
         return machine
```

```diff
 def serialize(self, if_empty_remote_profile=False):
     machine_dict = {}
     machine_dict["batch_type"] = self.__class__.__name__
     machine_dict["context_type"] = self.context.__class__.__name__
     machine_dict["local_root"] = self.context.init_local_root
     machine_dict["remote_root"] = self.context.init_remote_root
+    machine_dict["retry_count"] = getattr(self, "retry_count", 3)
     if not if_empty_remote_profile:
         machine_dict["remote_profile"] = self.context.remote_profile
     else:
         machine_dict["remote_profile"] = {}
     # normalize the dict
     base = self.arginfo()
     machine_dict = base.normalize_value(machine_dict, trim_pattern="_*")
     return machine_dict
```
Optionally clarify docs:

```diff
-doc_retry_count = "Number of retries to resubmit failed jobs."
+doc_retry_count = "Maximum number of retries to resubmit failed jobs (0 = no retries)."
```

Also applies to: 401-420, 152-170.
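To make the intended round-trip concrete, here is a minimal, self-contained sketch (using a hypothetical `SimpleMachine` stand-in, not the real `Machine` class):

```python
class SimpleMachine:
    """Hypothetical stand-in illustrating retry_count persistence."""

    def __init__(self, retry_count=3):
        self.retry_count = retry_count

    def serialize(self):
        # Fall back to the default of 3 if the attribute was never set.
        return {"retry_count": getattr(self, "retry_count", 3)}

    @classmethod
    def load_from_dict(cls, machine_dict):
        machine = cls()
        # Rehydrate retry_count even if a subclass overrides __init__.
        if "retry_count" in machine_dict:
            machine.retry_count = machine_dict["retry_count"]
        return machine

restored = SimpleMachine.load_from_dict(SimpleMachine(retry_count=5).serialize())
print(restored.retry_count)  # 5
```

Without the rehydration step in `load_from_dict`, a non-default `retry_count` would silently revert to 3 after a save/load cycle, which is exactly the bug the comment describes.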
🧹 Nitpick comments (6)
dpdispatcher/machines/openapi.py (1), lines 31-43: Call `Machine.__init__` to initialize retry_count; drop stale commented code.

Subclasses overriding `__init__` should invoke the base initializer so Machine-level attributes (e.g., retry_count) are consistently set. This also avoids relying on hasattr fallbacks elsewhere and aligns with the PR goal of centralizing this option.
Apply:
```diff
 class OpenAPI(Machine):
     def __init__(self, context):
         if not found_bohriumsdk:
             raise ModuleNotFoundError(
                 "bohriumsdk not installed. Install dpdispatcher with `pip install dpdispatcher[bohrium]`"
             )
-        self.context = context
+        super().__init__(context=context)
         self.remote_profile = context.remote_profile.copy()
         self.grouped = self.remote_profile.get("grouped", True)
-        # self.retry_count = self.remote_profile.get("retry_count", 3)
         self.ignore_exit_code = context.remote_profile.get("ignore_exit_code", True)
```

dpdispatcher/machine.py (1)
Lines 321-336: Guard against errlog being None when constructing the last_err_file tail command.

Task.errlog allows None; formatting err_file unconditionally via shlex.quote(task.errlog) risks a TypeError or invalid tail paths. Consider a safe fallback.
Example:
```diff
-err_file=shlex.quote(task.errlog),
+err_file=shlex.quote(task.errlog or "err"),
```

dpdispatcher/machines/dp_cloud_server.py (2)
Lines 22-36: Initialize the base Machine to set retry_count; remove the commented legacy assignment.

Align with centralization by calling `Machine.__init__`, which binds the context and sets retry_count. The commented line can stay removed.
Apply:
```diff
 class Bohrium(Machine):
 @@
-    def __init__(self, context):
-        self.context = context
+    def __init__(self, context):
+        super().__init__(context=context)
         self.input_data = context.remote_profile["input_data"].copy()
 @@
-        # self.retry_count = context.remote_profile.get("retry_count", 3)
         self.ignore_exit_code = context.remote_profile.get("ignore_exit_code", True)
```
Lines 70-73: Use super() directly for clarity.

```diff
-shell_script = super(DpCloudServer, self).gen_script(job)
+shell_script = super().gen_script(job)
```

dpdispatcher/submission.py (1)
Lines 840-861: Retry policy simplification looks good; fix the log typo and harden edge cases.
- Typo: "fail_cout" ➜ "fail_count".
- Clamp negative retry_count to 0 to avoid surprising behavior.
- Optional: log attempt progress.
Apply:
```diff
 dlog.info(
-    f"job: {self.job_hash} {self.job_id} terminated; "
-    f"fail_cout is {self.fail_count}; resubmitting job"
+    f"job: {self.job_hash} {self.job_id} terminated; "
+    f"fail_count is {self.fail_count}; resubmitting job"
 )
 retry_count = 3  # Default retry count
 assert self.machine is not None
 if hasattr(self.machine, "retry_count"):
     retry_count = self.machine.retry_count
-dlog.info(f"retry_count: {retry_count}")
+retry_count = max(0, int(retry_count))
+dlog.info(f"retry_count: {retry_count}")
+dlog.info(f"retry attempt: {self.fail_count}/{retry_count}")
 if self.fail_count > retry_count:
```

dpdispatcher/machines/batch.py (1)
Line 8: Clarify the Windows batch script header.

The current header, `@echo off`, is appropriate for Windows batch files, but the class name suggests this might be intended as a generic batch system. Consider adding a comment to clarify that this is specifically for Windows batch files.

```diff
-shell_script_header_template = """@echo off"""
+shell_script_header_template = """@echo off"""  # Windows batch script header
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)

- dpdispatcher/machine.py (4 hunks)
- dpdispatcher/machines/batch.py (1 hunks)
- dpdispatcher/machines/dp_cloud_server.py (1 hunks)
- dpdispatcher/machines/openapi.py (1 hunks)
- dpdispatcher/machines/pbs.py (1 hunks)
- dpdispatcher/submission.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dpdispatcher/machines/batch.py (3)

- dpdispatcher/utils/job_status.py (1): JobStatus (4-11)
- dpdispatcher/utils/utils.py (1): customized_script_header_template (216-221)
- dpdispatcher/machines/shell.py (6): gen_script (14-16), gen_script_header (18-30), do_submit (32-61), check_status (63-91), check_finish_tag (104-107), kill (109-119)
🪛 GitHub Check: pyright
dpdispatcher/machines/batch.py
[failure] 57-57:
Method "default_resources" overrides class "Machine" in an incompatible manner
Parameter 2 name mismatch: base parameter is named "res", override parameter is named "resources" (reportIncompatibleMethodOverride)
🪛 GitHub Actions: Type checker
dpdispatcher/machines/batch.py
[error] 57-57: Pyright error: Method "default_resources" overrides class "Machine" in an incompatible manner. Parameter 2 name mismatch: base parameter is named "res", override parameter is named "resources" (reportIncompatibleMethodOverride). Command: /home/runner/actions-runner/cached/externals/node20/bin/node /opt/hostedtoolcache/pyright/1.1.404/x64/package/index.js --outputjson
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (3.9, windows-latest)
- GitHub Check: build (slurm)
🔇 Additional comments (3)
dpdispatcher/machines/batch.py (3)
Lines 12-14: LGTM! The script generation properly delegates to the parent class implementation, maintaining consistency with other machine implementations.
Lines 16-28: LGTM! The header generation logic correctly follows the same pattern as the Shell machine, supporting both customized headers and falling back to the default template.
Lines 81-83: LGTM! The finish tag checking follows the same pattern as other machine implementations and correctly uses the context to check file existence.
Actionable comments posted: 0
🧹 Nitpick comments (3)
.gitignore (2)
Lines 42-43: Add a brief comment explaining why settings.json is tracked.

This helps future contributors understand the exception.
Apply:
```diff
 .vscode/*
+# Track only shared workspace settings (formatters/lints), keep other VSCode files untracked.
 !.vscode/settings.json
```
Line 43: Prefer tool-native config over editor-specific settings (optional).

If the only setting is the ruff line length, consider moving it to pyproject.toml (tool.ruff.line-length) and dropping the VSCode override to avoid editor coupling.
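If that route is taken, the tool-native equivalent would look roughly like the fragment below (the value 88 is an assumption; use whatever limit the VSCode setting currently specifies):

```toml
# pyproject.toml — editor-agnostic Ruff configuration
[tool.ruff]
line-length = 88
```

Ruff reads this automatically, so the `.vscode/settings.json` override and the `.gitignore` exception become unnecessary.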
tests/test_argcheck.py (1)
Lines 30-31: Add tests for override and default round-trip semantics.

Cover a non-default value and the explicit default to guard against regressions.
Apply:
```diff
 @@ class TestJob(unittest.TestCase):
         self.assertDictEqual(norm_dict, expected_dict)

+    def test_machine_retry_count_override(self):
+        norm_dict = Machine.load_from_dict(
+            {
+                "batch_type": "slurm",
+                "context_type": "local",
+                "local_root": "./",
+                "remote_root": "/some/path",
+                "retry_count": 5,
+            }
+        ).serialize()
+        self.assertEqual(norm_dict["retry_count"], 5)
+
+    def test_machine_retry_count_default_roundtrip(self):
+        # Omit retry_count to ensure default is serialized as 3
+        norm_dict = Machine.load_from_dict(
+            {
+                "batch_type": "slurm",
+                "context_type": "local",
+                "local_root": "./",
+                "remote_root": "/some/path",
+            }
+        ).serialize()
+        self.assertEqual(norm_dict["retry_count"], 3)
```
📒 Files selected for processing (4)

- .gitignore (1 hunks)
- .vscode/settings.json (1 hunks)
- dpdispatcher/machine.py (5 hunks)
- tests/test_argcheck.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .vscode/settings.json
🚧 Files skipped from review as they are similar to previous changes (1)
- dpdispatcher/machine.py
🔇 Additional comments (3)
.gitignore (1)
Line 43: Negated rule to track .vscode/settings.json looks correct.

This exception to .vscode/* is fine and will allow committing only the workspace settings file.
tests/test_argcheck.py (2)
Lines 30-31: LGTM: assertion updated for the new retry_count field.

Matches the base Machine serialization default of 3.
Lines 30-31: Clarify and consolidate retry_count semantics across all Machines and contexts.

Define whether `retry_count=3` means "total attempts" or "resubmits after the first try," remove or update legacy uses (e.g. in `dpcloudserver/client.py`, context defaults, commented-out assignments in machine classes), align documentation with the chosen behavior, and add an integration test to verify failed-job resubmission limits.
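As a sketch of one possible convention (an assumption, not the PR's settled choice): treat `retry_count` as the number of resubmissions after the first attempt, which matches the `fail_count > retry_count` check in submission.py:

```python
def should_resubmit(fail_count, retry_count=3):
    # fail_count counts failures so far; with retry_count = 3 the job may
    # run up to 4 times in total (1 initial attempt + 3 resubmissions).
    return fail_count <= retry_count

# Failures 1-3 each trigger a resubmit; the 4th failure terminates the job.
print([should_resubmit(n) for n in range(1, 6)])  # [True, True, True, False, False]
```

Whichever reading is chosen, stating it in `doc_retry_count` and pinning it with a test like this avoids off-by-one disagreements between backends.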
Actionable comments posted: 0
🧹 Nitpick comments (3)
dpdispatcher/machines/openapi.py (3)
Lines 33-37: Init order change: confirm safety or raise earlier for fail-fast.

Calling `super().__init__(context)` before verifying `found_bohriumsdk` changes the previous fail-fast behavior and may introduce side effects if the SDK is missing. If `Machine.__init__` does any I/O or assumes provider availability, this could be problematic. Either confirm it is side-effect free, or keep the SDK check first. Apply if you want to restore fail-fast:

```diff
 def __init__(self, context):
-    super().__init__(context)
-    if not found_bohriumsdk:
-        raise ModuleNotFoundError(
-            "bohriumsdk not installed. Install dpdispatcher with `pip install dpdispatcher[bohrium]`"
-        )
+    if not found_bohriumsdk:
+        raise ModuleNotFoundError(
+            "bohriumsdk not installed. Install dpdispatcher with `pip install dpdispatcher[bohrium]`"
+        )
+    super().__init__(context)
```
Lines 41-43: Use self.remote_profile consistently.

Minor inconsistency: other fields read from `self.remote_profile`, but `ignore_exit_code` reads from `context.remote_profile`. Prefer the local snapshot for consistency.

```diff
-self.ignore_exit_code = context.remote_profile.get("ignore_exit_code", True)
+self.ignore_exit_code = self.remote_profile.get("ignore_exit_code", True)
```
Lines 38-41: Potential duplication of base init assignments.

If `Machine.__init__` already sets `self.context` and `self.remote_profile`, these reassignments are redundant. Consider removing them to avoid drift.

```diff
-self.context = context
-self.remote_profile = context.remote_profile.copy()
+# self.context and self.remote_profile are initialized in Machine.__init__(context)
```
📒 Files selected for processing (2)

- dpdispatcher/machines/dp_cloud_server.py (1 hunks)
- dpdispatcher/machines/openapi.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dpdispatcher/machines/dp_cloud_server.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (slurm)
- GitHub Check: build (pbs)
🔇 Additional comments (1)
dpdispatcher/machines/openapi.py (1)
Lines 32-43: No provider-level `retry_count` usage found.

Verified no occurrences of `retry_count` in `dpdispatcher/machines`; centralization in `Machine.__init__` is confirmed.
Actionable comments posted: 0
🧹 Nitpick comments (3)
dpdispatcher/machines/openapi.py (3)
Lines 34-37: Preserve fast-fail by checking SDK availability before base init.

If `Machine.__init__` performs non-trivial work (I/O, state writes), failing earlier avoids side effects when the Bohrium SDK is missing. Consider swapping the order.

```diff
-def __init__(self, context, **kwargs):
-    super().__init__(context=context, **kwargs)
-    if not found_bohriumsdk:
-        raise ModuleNotFoundError(
+def __init__(self, context, **kwargs):
+    if not found_bohriumsdk:
+        raise ModuleNotFoundError(
             "bohriumsdk not installed. Install dpdispatcher with `pip install dpdispatcher[bohrium]`"
         )
+    super().__init__(context=context, **kwargs)
```
Lines 41-42: Use self.remote_profile consistently.

Line 42 reads from `context.remote_profile` while line 41 uses `self.remote_profile`. Use the same source to avoid divergence.

```diff
-self.ignore_exit_code = context.remote_profile.get("ignore_exit_code", True)
+self.ignore_exit_code = self.remote_profile.get("ignore_exit_code", True)
```
Lines 38-40: Remove the redundant context reassignment.

- In `OpenAPI.__init__`, the call to `bind_context()` already does `self.context = context`; drop the extra `self.context = context` on line 38.
- Retain `self.remote_profile = context.remote_profile.copy()` to snapshot the already-normalized profile for use in the machine.
📒 Files selected for processing (3)

- dpdispatcher/machine.py (6 hunks)
- dpdispatcher/machines/dp_cloud_server.py (1 hunks)
- dpdispatcher/machines/openapi.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- dpdispatcher/machines/dp_cloud_server.py
- dpdispatcher/machine.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (slurm)
- GitHub Check: build (pbs)
- GitHub Check: test (3.7, macos-13)
🔇 Additional comments (1)
dpdispatcher/machines/openapi.py (1)
Lines 32-33: Forwarding `retry_count` via kwargs centralizes handling; no residual direct reads found.

A ripgrep search across all Python files returned zero instances of `remote_profile.retry_count` or similar, confirming no leftover provider-specific drift.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dpdispatcher/machines/pbs.py (2)
Lines 234-236: Quote SGE submit paths for safety and parity with PBS.

Unquoted paths are brittle and unsafe. PBS already uses `shlex.quote`. Apply:

```diff
-stdin, stdout, stderr = self.context.block_checkcall(
-    "cd {} && {} {}".format(script_file_dir, "qsub", script_file_name)
-)
+stdin, stdout, stderr = self.context.block_checkcall(
+    "cd {} && {} {}".format(
+        shlex.quote(script_file_dir), "qsub", shlex.quote(script_file_name)
+    )
+)
```
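For illustration, the effect of the quoting on a directory containing spaces (the path values here are hypothetical, not taken from the codebase):

```python
import shlex

script_file_dir = "/scratch/my jobs/task 01"  # hypothetical path with spaces
script_file_name = "run.sub"

# Without quoting, the spaces would split the cd argument into several words;
# shlex.quote wraps unsafe strings in single quotes and leaves safe ones alone.
cmd = "cd {} && {} {}".format(
    shlex.quote(script_file_dir), "qsub", shlex.quote(script_file_name)
)
print(cmd)  # cd '/scratch/my jobs/task 01' && qsub run.sub
```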
Lines 89-91: Fix typo in error message ("erro info" → "error info").

Minor but user-facing. Apply:

```diff
-f"status command {command} fails to execute. erro info: {err_str} return code {ret}"
+f"status command {command} fails to execute. error info: {err_str} return code {ret}"
```

Also applies to: 139-141.
♻️ Duplicate comments (1)
dpdispatcher/machines/pbs.py (1)
Lines 201-207: Make the SGE `-pe` option truly optional; align schema and header generation.

Current code always injects `-pe`. Allow `None` and guard the header line. Apply:

```diff
-pe_name = resources.kwargs.get("pe_name", "mpi")
+pe_name = resources.kwargs.get("pe_name", None)
 sge_script_header_dict = {}
 sge_script_header_dict["select_node_line"] = f"#$ -N {job_name}\n"
-sge_script_header_dict["select_node_line"] += (
-    f"#$ -pe {pe_name} {resources.cpu_per_node}\n"
-)
+if pe_name is not None:
+    sge_script_header_dict["select_node_line"] += (
+        f"#$ -pe {pe_name} {resources.cpu_per_node}\n"
+    )
```

And in the resources schema (lines 311-317):

```diff
-Argument(
-    "pe_name",
-    str,
-    optional=True,
-    default="mpi",
-    doc=doc_pe_name,
-    alias=["sge_pe_name"],
-),
+Argument(
+    "pe_name",
+    [str, type(None)],
+    optional=True,
+    default=None,
+    doc=doc_pe_name,
+    alias=["sge_pe_name"],
+),
```
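A standalone sketch of the guarded header generation described above (hypothetical helper and values, not the actual SGE class):

```python
def sge_select_node_line(job_name, cpu_per_node, pe_name=None):
    # Emit the -pe directive only when a parallel environment is requested;
    # with pe_name=None the header contains just the job-name line.
    line = f"#$ -N {job_name}\n"
    if pe_name is not None:
        line += f"#$ -pe {pe_name} {cpu_per_node}\n"
    return line

print(sge_select_node_line("demo_job", 8))         # only the -N line
print(sge_select_node_line("demo_job", 8, "mpi"))  # -N line plus "#$ -pe mpi 8"
```

This keeps single-slot jobs valid on clusters that have no parallel environment configured, while preserving the old behavior for users who set `pe_name` (or its `sge_pe_name` alias) explicitly.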
🧹 Nitpick comments (1)
dpdispatcher/machines/pbs.py (1)
Lines 194-195: Accept positional args in SGE `__init__` to mirror PBS and avoid potential breakage.

```diff
-class SGE(PBS):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
+class SGE(PBS):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
```
📒 Files selected for processing (1)

- dpdispatcher/machines/pbs.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (ssh_rsync)
- GitHub Check: build (ssh)
- GitHub Check: build (pbs)
- GitHub Check: build (slurm)
🔇 Additional comments (1)
dpdispatcher/machines/pbs.py (1)
Lines 20-22: Preserve positional args in the `PBS.__init__` signature for backward compatibility.

Internal search found no calls passing `context` positionally, but external clients may still rely on `PBS(context, …)`. Apply:

```diff
 class PBS(Machine):
-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
```

Verify that no callers depend on the old positional signature.
Pull Request Overview
This PR standardizes the retry mechanism for failed jobs across different machine backends by introducing a consistent retry_count
parameter. The change addresses inconsistencies in how different machine implementations handled job retry logic.
- Adds a unified `retry_count` parameter to the base Machine class with a default value of 3
- Removes machine-specific retry count implementations from individual backends (OpenAPI, Bohrium)
- Harmonizes constructor patterns across machine backends to use a consistent `**kwargs` approach
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| dpdispatcher/machine.py | Adds retry_count parameter to base Machine class constructor, serialization, and argument info |
| dpdispatcher/machines/pbs.py | Simplifies PBS and SGE constructors to use **kwargs pattern |
| dpdispatcher/machines/openapi.py | Updates OpenAPI constructor and removes local retry_count implementation |
| dpdispatcher/machines/dp_cloud_server.py | Updates Bohrium constructor and removes local retry_count implementation |
| tests/test_argcheck.py | Updates test expectations to include retry_count in serialization |
| .vscode/settings.json | Adds VSCode configuration for Ruff line length |
```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
```
The empty `__init__` method only calls the parent constructor without adding any PBS-specific functionality. Consider removing this method entirely and relying on the parent class constructor, unless there are plans to add PBS-specific initialization logic.
Suggested change:

```diff
-def __init__(self, **kwargs):
-    super().__init__(**kwargs)
```
```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
```
Similar to the PBS class, this empty `__init__` method in SGE only calls the parent constructor. Consider removing this method to reduce code duplication unless SGE-specific initialization is planned.
Suggested change:

```diff
-def __init__(self, **kwargs):
-    super().__init__(**kwargs)
```
@njzjz, SGE will not work if you remove what Copilot asked.

Could you explain the reason? It only calls the method from the super class.
So far, the option to control the number of resubmitted failed jobs is only partially implemented and is inconsistent across different Machines.

This PR tries to address this problem, as discussed in issue #525.
Summary by CodeRabbit
New Features
Tests
Chores
Bug Fixes