
[WIP] Move pysam index to external process #11558

Draft · wants to merge 1 commit into base: release_21.01

Conversation

@nuwang (Member) commented Mar 6, 2021

What did you do?

This PR moves all calls to pysam.index to an external process. This had previously been done in one place in the code:

```python
cmd = ['python', '-c', f"import pysam; pysam.set_verbosity(0); pysam.index('{file_name}', '{index_name}')"]
```
but this PR extends that to all use cases.
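For illustration, a minimal sketch of what such an externalized call can look like; the helper name and the stderr handling here are assumptions for this sketch, not the exact code in the PR:

```python
import subprocess


def pysam_index_external(file_name, index_name, stderr_path=None):
    """Run pysam.index in a child Python process so a long-running C-extension
    call cannot tie up the handler process (hypothetical helper, for illustration)."""
    cmd = [
        'python', '-c',
        f"import pysam; pysam.set_verbosity(0); pysam.index('{file_name}', '{index_name}')",
    ]
    if stderr_path:
        # Redirect the child's stderr to a file so it does not leak into the handler's output.
        with open(stderr_path, 'w') as stderr_fh:
            subprocess.check_call(cmd, stderr=stderr_fh, shell=False)
    else:
        # Discard the child's stderr entirely.
        subprocess.check_call(cmd, stderr=subprocess.DEVNULL, shell=False)
```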

Why did you make this change?

We ran into an issue in the k8s chart where the job handler would abruptly restart while inside pysam.index. The proximate cause was a health check failure. The underlying reason was that pysam.index can take a long time and, being a call into a C extension, appears not to release the GIL, which prevents the heartbeat thread from running. When the heartbeat thread fails to report liveness, k8s restarts the job handler.

By running the call in an external process, we prevent it from blocking the job handler's threads; this also has the general benefit of preventing pysam failures from crashing the handler.

How to test the changes?

(select the most appropriate option; if the latter, provide steps for testing below)

  • This is a refactoring of components with existing test coverage.

f"import pysam; pysam.set_verbosity(0); pysam.index('{index_flag}', '{file_name}', '{index_name}')"]
if stderr:
with open(stderr, 'w') as stderr:
subprocess.check_call(cmd, stderr=stderr, shell=False)
Contributor:

Can you use `def execute(cmds, input=None)`?

@nuwang (Member, Author) Mar 6, 2021:

Thanks for reviewing. The original code has a specific comment saying that stderr needs to be discarded:

# we start another process and discard stderr.

and execute doesn't seem to support stderr redirection?

@mvdbeek (Member) Mar 6, 2021:

That's what stderr=subprocess.PIPE does (not exactly, but it's good enough here; the only important thing is that the stderr of the externalized pysam call doesn't end up in the outer stderr, which, if I remember correctly, was a failure reason for the metadata script).
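For context, a small sketch of discarding the child's stderr. This is illustrative only: the subprocess docs discourage stderr=subprocess.PIPE with check_call (an unread pipe can fill up and block the child), so subprocess.run is shown instead, and the BAM filename is a placeholder.

```python
import subprocess

cmd = ['python', '-c', "import pysam; pysam.set_verbosity(0); pysam.index('input.bam')"]

# Capture the child's stderr so it never reaches the outer process's stderr,
# then simply ignore it (subprocess.run reads the pipe for us).
subprocess.run(cmd, stderr=subprocess.PIPE, check=True)

# Or drop it outright.
subprocess.run(cmd, stderr=subprocess.DEVNULL, check=True)
```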

@nuwang (Member, Author):

Aah I see, the piped stderr is being ignored. Sure, seems fine, can do.

@mvdbeek (Member) commented Mar 6, 2021

Can you make this optional? For traditional Galaxy job runners this already runs as an external process as part of the metadata script. When creating many small files, spawning a subprocess for each one is going to be significant overhead.
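A possible shape for making it optional; the function name, flag, and default below are hypothetical, just to illustrate the trade-off between handler responsiveness and per-file subprocess overhead:

```python
import subprocess

import pysam


def index_bam(file_name, index_name, external=False):
    # Hypothetical toggle, for illustration only.
    if external:
        # Keep the potentially long, GIL-holding call out of the handler process
        # (useful where a liveness probe must keep ticking).
        cmd = ['python', '-c',
               f"import pysam; pysam.set_verbosity(0); pysam.index('{file_name}', '{index_name}')"]
        subprocess.check_call(cmd, stderr=subprocess.DEVNULL, shell=False)
    else:
        # Traditional runners already execute this inside the metadata script,
        # so the in-process call avoids spawning a subprocess per file.
        pysam.set_verbosity(0)
        pysam.index(file_name, index_name)
```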

@nuwang (Member, Author) commented Mar 7, 2021

I'm now wondering whether this is worth doing at all. I guess threads being suspended is only really a problem for the heartbeat thread, but it seems like the heartbeat is not really a good proxy for liveness anyway for a number of reasons.

a. All it’s saying is that a particular thread in the handler is alive, which k8s already knows since the overall handler process is alive. It has a low probability of failure and really doesn’t indicate a lot about the actual health of the handler.
b. The only additional bit of information we know is that the process is communicating with the database, but again, that has never really been a problem, and we’d know about it pretty quickly through other means anyway.
c. The liveness probe also introduces risks like two job handlers with the same name being alive simultaneously for a brief period of time.
d. We presumably don't know for sure how many other code paths like this one could block the heartbeat thread.

So it seems more effective to redo or simply drop the liveness probe. If this heartbeat-blocking issue is not a problem elsewhere, should we consider just doing that instead?

@mvdbeek (Member) commented Mar 7, 2021

Maybe. Another way to look at "liveness" could be to monitor the main thing each handler is supposed to do. For workflow handlers this might be creating new jobs, and for job handlers it might be dispatching jobs in the job loop. It's a bit harder for web handlers, but I guess if they're responding to requests that might be fine?
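As a loose illustration of that idea (entirely hypothetical, not Galaxy code): the dispatch loop records a timestamp on each iteration and the probe only passes while the loop keeps ticking:

```python
import time

_last_tick = time.monotonic()


def job_dispatch_loop():
    global _last_tick
    while True:
        # ... dispatch ready jobs here ...
        _last_tick = time.monotonic()
        time.sleep(1)


def liveness_check(max_stall_seconds=60):
    # Passes only if the handler's main loop has run recently, i.e. the handler
    # is doing its actual work rather than merely having a live thread.
    return (time.monotonic() - _last_tick) < max_stall_seconds
```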

@bernt-matthias (Contributor):
May #13411 be an alternative?

@nsoranzo marked this pull request as draft September 16, 2022 00:18