New `--multi_cram` option to produce a multi-query CRAM file combining all the alignments by charles-plessy · Pull Request #114 · nf-core/pairgenomealign

charles-plessy · 2026-05-29T08:34:52Z

Main changes to the code:

Addition, configuration and patching of the samtools/merge module.
Streamlining of the output of a local subworkflow.
Implementation of the option in the main workflow of the pipeline.

Closes #60.

Details on the new feature:

The merged CRAM file is neither a pangenome nor a multiple sequence alignment, but I find it very useful.

Temporary CRAM files are produced but not exported. Their header indicates only the name of the query genomes in the read group fields.

The files are merged into a single CRAM file, where each read group represents one genome. Each target-query alignment is a one-to-one relationship, so a base in the target is aligned at most once to each query.

Care is taken to ensure that the path to the reference genome is relative to the current directory. The multi-query CRAM file is output in the same directory as its index and the bgzipped genome, which is also indexed.

Thus the multi-query CRAM file can be loaded and visualised in IGV. The coverage plot shows how many query genomes align to the target at a given location. The expanded track view shows all the sequence differences.
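To illustrate what the coverage plot counts, here is a minimal Python sketch. It assumes the alignments have already been extracted as `(read_group, start, end)` tuples with 0-based, half-open target coordinates. The function name `genomes_per_position` and the toy data are hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def genomes_per_position(alignments, target_length):
    """Count how many distinct query genomes cover each target position.

    alignments: iterable of (read_group, start, end) tuples,
    with 0-based, half-open coordinates on the target (assumed layout).
    """
    covering = defaultdict(set)
    for read_group, start, end in alignments:
        # Clip each interval to the target boundaries.
        for pos in range(max(start, 0), min(end, target_length)):
            covering[pos].add(read_group)
    return [len(covering[pos]) for pos in range(target_length)]


# Hypothetical toy data: two query genomes aligned to a 10-base target.
toy_alignments = [("query_A", 0, 5), ("query_B", 3, 8), ("query_A", 6, 10)]
print(genomes_per_position(toy_alignments, 10))
```

Because each base in the target is aligned at most once to each query, the number of distinct read groups at a position should equal the raw depth there.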

The order of the genomes can be made stable, but IGV enforces alphanumeric sorting. To control the order, you can work around this limitation by prefixing the sample IDs with numbers in the sample sheet.
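The prefixing workaround could be sketched in Python as follows. This is only an illustration: the helper name `prefix_sample_ids`, the underscore separator and the example IDs are assumptions, and how the IDs are written back to the sample sheet is left out.

```python
def prefix_sample_ids(sample_ids):
    """Prefix IDs with zero-padded ranks so that alphanumeric sorting
    reproduces the order of the input list."""
    # Pad to the width of the largest rank, so "10_" sorts after "09_".
    width = len(str(len(sample_ids)))
    return [
        f"{rank:0{width}d}_{sample_id}"
        for rank, sample_id in enumerate(sample_ids, start=1)
    ]


# Hypothetical sample IDs, listed in the desired display order.
desired_order = ["human", "chimp", "gorilla"]
prefixed = prefix_sample_ids(desired_order)
print(prefixed)

# Alphanumeric sorting now keeps the desired order.
assert sorted(prefixed) == prefixed
```

Zero-padding matters once there are ten or more genomes, because an unpadded `10_` would sort before `2_`.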

Custom scripts can be (and have been) written to slice pieces of the multi-query CRAM file and turn these pieces into real MSAs…

Joon-Klaps

Very minor things, mostly on readability.

charles-plessy · 2026-06-02T03:01:22Z

@Joon-Klaps Thanks to your comments, I made big changes. Could I ask you to have a look?

Joon-Klaps

LGTM

charles-plessy added 14 commits May 27, 2026 16:00

Import samtools/merge module

db8002b

Add a multi_cram option.

0ab27ac

Merge the fasta_bgzip_index_dict_samtools outputs in a single channel.

8a99fc1

Also output the dictionary file.

48fcd5a

Patch samtools/merge to preserve local paths to the reference.

f37dcf9

Correct default value of params.multi_cram, for use in if statements.

dcfa35f

Properly handle the case when maf-convert does not need a genome sequence.

e20e2ca

Document the changes.

4a19cd1

Also update the subworkflow's snapshot.

0fcb2dc

Merge branch 'dev' into multi-cram-issue-60

c93da93

Fix changelog borken by merge

c870d97

prek run --show-diff-on-failure --color=always --all-files

c6cbffe

Use CRAM 3.0 to be consistent with maf-convert.

9f136d4

Will change to CRAM 3.1 in pairgenomealign 3.0.0.

Joon-Klaps approved these changes Jun 1, 2026

charles-plessy and others added 7 commits June 2, 2026 09:51

Generate 4 channels at once.

cc1fd26

Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>

Use the 4 channels generated with multiMap.

220e3c2

Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>

Use the 4 channels generated with multiMap.

fbec929

Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>

Use the same bgzipped genome channel everywhere

2bfdc19

Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>

prek run --show-diff-on-failure --color=always --all-files

1ad1902

Simplify one if/else statement in just one if.

0e58188

Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>

Use nf-core's version of FASTA_BGZIP_INDEX_DICT_SAMTOOLS, which I submitted recently based on the local version.

15df6b6

charles-plessy force-pushed the multi-cram-issue-60 branch from a5267b2 to 15df6b6 Compare June 2, 2026 02:56

charles-plessy requested a review from Joon-Klaps June 2, 2026 03:00

Joon-Klaps approved these changes Jun 2, 2026

charles-plessy merged commit 8fac8f7 into dev Jun 2, 2026
9 checks passed

charles-plessy deleted the multi-cram-issue-60 branch June 2, 2026 11:41

charles-plessy mentioned this pull request Jun 11, 2026

Multi-sequence CRAM output #60

Closed

charles-plessy merged 21 commits into dev from multi-cram-issue-60
