[23.1] Let job handler work queue call job wrapper fail #17083

Draft
wants to merge 1 commit into
base: release_23.1

mvdbeek
Member

@mvdbeek mvdbeek commented Nov 24, 2023

This avoids committing the transaction and then having it fail on the next iteration. It should fix the app startup failure in #17079.

There are more places where we commit during startup, though. I wonder whether all of that is even necessary; the job handler should probably handle all of the error conditions?
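
A rough sketch of the idea, not the actual Galaxy code: the startup check only decides that a job has to be failed and puts a message on the handler's work queue, and the handler later calls the wrapper's fail method outside the startup transaction. `FakeJobWrapper`, `recover_jobs`, `drain_work_queue` and the message format are all hypothetical names made up for illustration.

```python
import queue

# Hypothetical stand-ins; none of these names are taken from the Galaxy codebase.
FAIL_MESSAGE = "fail"


class FakeJobWrapper:
    """Stand-in for a job wrapper whose fail() would commit on its own."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.failed_with = None

    def fail(self, message):
        # In the real code this is where a commit would happen.
        self.failed_with = message


def recover_jobs(job_wrappers, is_recoverable, work_queue):
    """Startup check: only decide what to do, never call fail() here."""
    for wrapper in job_wrappers:
        if not is_recoverable(wrapper):
            # Defer the failure instead of committing inside the startup transaction.
            work_queue.put((FAIL_MESSAGE, wrapper, "Job could not be recovered on startup"))


def drain_work_queue(work_queue):
    """Handler loop body: runs after, and outside of, the startup transaction."""
    handled = 0
    while True:
        try:
            action, wrapper, message = work_queue.get_nowait()
        except queue.Empty:
            return handled
        if action == FAIL_MESSAGE:
            wrapper.fail(message)
            handled += 1


work_queue = queue.Queue()
wrappers = [FakeJobWrapper(1), FakeJobWrapper(2), FakeJobWrapper(3)]
recover_jobs(wrappers, lambda w: w.job_id != 2, work_queue)
failed_during_recovery = [w.job_id for w in wrappers if w.failed_with]
queued = work_queue.qsize()
handled = drain_work_queue(work_queue)
```

With this split, nothing is failed (and so nothing is committed) while the startup check runs; the single unrecoverable job is failed only once the queue is drained.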

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek mvdbeek changed the base branch from dev to release_23.1 November 24, 2023 17:00
@mvdbeek mvdbeek changed the title Let job handler work queue call job wrapper fail [23.1] Let job handler work queue call job wrapper fail Nov 24, 2023
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request Nov 26, 2023
I think the premise that we want to do a rollback on exceptions in this
method is wrong (it **may** be the correct approach in other places in the
codebase, e.g. in
`Tool.handle_single_execution()`). Here it prevents us from committing
anything inside the with statement (as the job_wrapper.fail method
does).
Here's the simplified issue:

```shell
❯ python -i scripts/db_shell.py -c config/galaxy.yml
>>> with sa_session() as session, session.begin():
...      sa_session.execute(update(Job).where(Job.id == 1).values(state="error"))
...      sa_session.commit()
...      sa_session.execute(update(Job).where(Job.id == 1).values(state="ok"))
...      sa_session.commit()
...
<sqlalchemy.engine.cursor.LegacyCursorResult object at 0x11f1be350>
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<string>", line 2, in execute
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
    conn = self._connection_for_bind(bind, close_with_result=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
    TransactionalContext._trans_ctx_check(self)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/engine/util.py", line 199, in _trans_ctx_check
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't operate on closed transaction inside context manager.  Please complete the context manager before emitting further commands.
```

It is probably still worthwhile to keep the job recovery minimal and
leave things that do actual work, such as calling the job wrapper fail
method, to the job handler, as in
galaxyproject#17083, but that's
refactoring that can be done on the dev branch, and it still seems risky
in the sense that we then need to be very careful to ensure we don't
commit anywhere else inside the scope of the begin() statement.

Finally, I don't think it makes sense that the startup check should
ever cause the boot process to fail. A failure here isn't a
misconfiguration or even anything catastrophic for the remaining jobs.
It places unnecessary stress on admins, and since the check can
basically break at any time, it shouldn't cause a complete service
failure.
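
As a minimal sketch of what a non-fatal startup check could look like (an illustration only, not the code in this commit): each job is recovered inside its own try/except, failures are logged, and startup continues. `check_jobs_at_startup`, `recover_job` and `flaky_recover` are hypothetical names.

```python
import logging

log = logging.getLogger(__name__)


def check_jobs_at_startup(job_ids, recover_job):
    """Try to recover each job; log failures instead of aborting startup.

    `recover_job` is a hypothetical callable standing in for whatever
    per-job recovery the handler performs.
    """
    unrecovered = []
    for job_id in job_ids:
        try:
            recover_job(job_id)
        except Exception:
            # One broken job should not take the whole service down.
            log.exception("Could not recover job %s at startup, skipping", job_id)
            unrecovered.append(job_id)
    return unrecovered


def flaky_recover(job_id):
    # Simulate a single job whose recovery raises.
    if job_id == 2:
        raise RuntimeError("simulated recovery failure")


unrecovered = check_jobs_at_startup([1, 2, 3], flaky_recover)
```

The returned list gives the caller something to report or retry later, while the remaining jobs are still processed.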

Fixes galaxyproject#17079
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request Nov 27, 2023
@jmchilton
Member

The mypy error is legitimate though definitely confusing.
