Consider unpinning ubuntu runner image for test-ert-on-slurm workflow #8931

Closed
larsevj opened this issue Oct 10, 2024 · 0 comments · Fixed by #9823
larsevj commented Oct 10, 2024

The runner image is pinned to Ubuntu 22.04 because slurmctld fails to start on 24.04 with the following error:

Job for slurmctld.service failed because the control process exited with error code.
See "systemctl status slurmctld.service" and "journalctl -xeu slurmctld.service" for details.
slurm_load_partitions: Unable to contact slurm controller (connect failure)
× slurmctld.service - Slurm controller daemon
     Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Thu 2024-10-10 07:42:39 UTC; 9s ago
       Docs: man:slurmctld(8)
    Process: 3866 ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 3866 (code=exited, status=1/FAILURE)
        CPU: 2ms
Oct 10 07:42:39 fv-az1950-264 systemd[1]: Starting slurmctld.service - Slurm controller daemon...
Oct 10 07:42:39 fv-az1950-264 (lurmctld)[3866]: slurmctld.service: Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS
Oct 10 07:42:39 fv-az1950-264 slurmctld[3866]: slurmctld: fatal: Incorrect permissions on state save loc: /var/spool
Oct 10 07:42:39 fv-az1950-264 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Oct 10 07:42:39 fv-az1950-264 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
Oct 10 07:42:39 fv-az1950-264 systemd[1]: Failed to start slurmctld.service - Slurm controller daemon.
Oct 10 07:42:25 fv-az1950-264 systemd[1]: slurmctld.service - Slurm controller daemon was skipped because of an unmet condition check (ConditionPathExists=/etc/slurm/slurm.conf).
░░ Subject: A start job for unit slurmctld.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A start job for unit slurmctld.service has finished successfully.
░░ 
░░ The job identifier is 1691.
Oct 10 07:42:39 fv-az1950-264 systemd[1]: Starting slurmctld.service - Slurm controller daemon...
░░ Subject: A start job for unit slurmctld.service has begun execution
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A start job for unit slurmctld.service has begun execution.
░░ 
░░ The job identifier is 2072.
Oct 10 07:42:39 fv-az1950-264 (lurmctld)[3866]: slurmctld.service: Referenced but unset environment variable evaluates to an empty string: SLURMCTLD_OPTIONS
Oct 10 07:42:39 fv-az1950-264 slurmctld[3866]: slurmctld: fatal: Incorrect permissions on state save loc: /var/spool
Oct 10 07:42:39 fv-az1950-264 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ An ExecStart= process belonging to unit slurmctld.service has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
Oct 10 07:42:39 fv-az1950-264 systemd[1]: slurmctld.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ The unit slurmctld.service has entered the 'failed' state with result 'exit-code'.
Oct 10 07:42:39 fv-az1950-264 systemd[1]: Failed to start slurmctld.service - Slurm controller daemon.
░░ Subject: A start job for unit slurmctld.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A start job for unit slurmctld.service has finished with a failure.
░░ 
░░ The job identifier is 2072 and the job result is failed.
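
The fatal line in the log points at the ownership or mode of the state save location (`/var/spool` here). The exact condition slurmctld checks is not stated in this issue, so the following is only a minimal diagnostic sketch: it reports the owner and mode of a directory so the 22.04 and 24.04 runner images can be compared. The helper name `describe_path` is hypothetical, and it relies only on the Python standard library on a POSIX system.

```python
import os
import pwd
import stat


def describe_path(path):
    """Report owner, octal permission bits, and world-writability of a path.

    This does not reproduce slurmctld's own check (which is an unknown here);
    it only surfaces the facts that such a check would likely look at.
    """
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(info.st_uid)
    mode = stat.S_IMODE(info.st_mode)
    return {
        "path": path,
        "owner": owner,
        "mode": oct(mode),
        "world_writable": bool(mode & stat.S_IWOTH),
    }


if __name__ == "__main__":
    # /var/spool is the location named in the fatal log line above.
    print(describe_path("/var/spool"))
```

Running this as a workflow step on both runner images, before starting slurmctld, would show whether the directory's owner or mode differs between them.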
@sondreso sondreso moved this to Todo in SCOUT Jan 15, 2025
@berland berland moved this from Todo to In Progress in SCOUT Jan 21, 2025
@berland berland self-assigned this Jan 21, 2025
@berland berland moved this from In Progress to Ready for Review in SCOUT Jan 21, 2025
@andreas-el andreas-el moved this from Ready for Review to Reviewed in SCOUT Jan 22, 2025
@github-project-automation github-project-automation bot moved this from Reviewed to Done in SCOUT Jan 22, 2025