[23.1] Let job handler work queue call job wrapper fail #17083

mvdbeek · 2023-11-24T17:00:06Z

This avoids committing the transaction and then having it fail on the next iteration. Should fix the app startup failure in #17079.

There are more places, though, where we commit in the startup case. I wonder if all of that is even necessary; the job handler should probably handle all of the error conditions?
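
A minimal sketch of the idea, not Galaxy's actual code: the startup check only puts the failure on a work queue, and the call that commits runs later, outside the startup loop. `FakeJobWrapper`, `check_jobs_at_startup`, `drain` and the `recoverable` flag are hypothetical names used for illustration only.

```python
import queue


class FakeJobWrapper:
    """Hypothetical stand-in for a job wrapper; not Galaxy's real class."""

    def __init__(self, job_id, recoverable):
        self.job_id = job_id
        self.recoverable = recoverable
        self.state = "running"
        self.message = None

    def fail(self, message):
        # In a real handler this is where database work and a commit would happen.
        self.state = "error"
        self.message = message


def check_jobs_at_startup(wrappers, work_queue):
    """Inspect jobs at startup and defer failures instead of failing inline."""
    for wrapper in wrappers:
        if not wrapper.recoverable:
            # Queue the call rather than invoking wrapper.fail() here, so
            # nothing commits while the startup loop is still iterating.
            work_queue.put((wrapper.fail, "Job could not be recovered at startup"))


def drain(work_queue):
    """Run the deferred calls, as a handler's monitor loop might do later."""
    while True:
        try:
            method, message = work_queue.get_nowait()
        except queue.Empty:
            break
        method(message)
```

After `check_jobs_at_startup()` returns, every wrapper is still in its original state; the failures are only applied once `drain()` runs.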

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

I think the premise that we want to do a rollback on exceptions in this method is wrong (it **may** be the correct approach in other places in the codebase, e.g. in `Tool.handle_single_execution()`). Here it prevents us from committing anything inside the with statement (as the job_wrapper.fail method does). Here's the simplified issue:

```shell
❯ python -i scripts/db_shell.py -c config/galaxy.yml
>>> with sa_session() as session, session.begin():
...     sa_session.execute(update(Job).where(Job.id == 1).values(state="error"))
...     sa_session.commit()
...     sa_session.execute(update(Job).where(Job.id == 1).values(state="ok"))
...     sa_session.commit()
...
<sqlalchemy.engine.cursor.LegacyCursorResult object at 0x11f1be350>
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<string>", line 2, in execute
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
    conn = self._connection_for_bind(bind, close_with_result=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
    TransactionalContext._trans_ctx_check(self)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/engine/util.py", line 199, in _trans_ctx_check
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't operate on closed transaction inside context manager. Please complete the context manager before emitting further commands.
```

It is probably still worthwhile to keep the job recovery minimal and leave things that do actual work, such as calling the job wrapper fail method, to the job handler as in galaxyproject#17083. But that's refactoring that can be done on the dev branch, and it still seems risky in the sense that we then need to be very careful to ensure we don't commit anywhere else inside the scope of the begin() statement.

Finally, I don't think it makes sense that the startup check should ever cause the boot process to fail. A failure here isn't a misconfiguration or even anything catastrophic for the remaining jobs. It places unnecessary stress on admins, can basically break at any time, and shouldn't cause a complete service failure. Fixes galaxyproject#17079
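
To illustrate the last point, here is a rough sketch of a startup check that logs a per-job recovery error and carries on instead of aborting the boot. This is an assumption about how such a loop could look, not the code in this PR; `recover_jobs_at_startup` and `recover_one` are hypothetical names.

```python
import logging

log = logging.getLogger(__name__)


def recover_jobs_at_startup(job_ids, recover_one):
    """Try to recover each job; log failures and keep going.

    Returns the ids of the jobs that could not be recovered, so the
    caller can deal with them later without blocking startup.
    """
    failed = []
    for job_id in job_ids:
        try:
            recover_one(job_id)
        except Exception:
            # One broken job should not take the whole handler down.
            log.exception("Could not recover job %s at startup; continuing", job_id)
            failed.append(job_id)
    return failed
```

The remaining jobs are still processed when one recovery call raises, and the returned list leaves the decision about the failed jobs to whoever runs after startup.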

jmchilton · 2023-11-27T15:37:11Z

The mypy error is legitimate though definitely confusing.

Let job handler work queue call job wrapper fail

d95fe4f

This avoids committing the transaction and then having it fail on the next iteration. Should fix the app startup failure in galaxyproject#17079

mvdbeek changed the base branch from dev to release_23.1 November 24, 2023 17:00

mvdbeek changed the title ~~Let job handler work queue call job wrapper fail~~ [23.1] Let job handler work queue call job wrapper fail Nov 24, 2023

github-actions bot added the area/jobs label Nov 24, 2023

jmchilton approved these changes Nov 25, 2023

mvdbeek mentioned this pull request Nov 26, 2023

[23.1] Remove rollback from __check_jobs_at_startup #17085

Merged

4 tasks

kysrpex mentioned this pull request Nov 27, 2023

Galaxy job handlers may crash during startup #17079

Closed
