DeepSAP is a transformer-based workflow designed to enhance splice junction detection in RNA-seq data. By default, DeepSAP uses a highly sensitive GPU-accelerated GSNAP TGGA aligner for FASTQ inputs. Alternatively, it can score pre-aligned BAM files directly, either from GSNAP itself or from any other aligner whose SAM records carry the XA (alternative alignments) tag.
We evaluated the performance of DeepSAP in our Genome Biology article, "DeepSAP: improved RNA-seq alignment by integrating transcriptome guidance with transformer-based splice junction scoring" (Berakdar, Wu, Zhu, Samadi, Vats, 2026). In our benchmark on the Baruzzo et al. datasets, DeepSAP achieved consistently strong results across all evaluated metrics.
For additional resources, including data, detailed analyses, and supplementary materials accompanying the DeepSAP article, please refer to manuscript_data_code/README.md in this repository.
For questions, bug reports, or other DeepSAP support requests, please use the Parabricks developer forum.
- Docker with GPU support
Sizing below is for a human genome–scale reference (GRCh38). The two pipeline stages run sequentially, so peak GPU memory is the maximum of the alignment-stage and TSJS-stage footprints — not their sum.
CPU & RAM:
- CPU: 24 cores recommended (drives GSNAP's pipeline-parallel stages — reader / solver / writer threads — and DeepSAP's TSJS scoring stage).
- System RAM: 64 GB minimum.
GPU memory:
- Minimum recommended: 40 GB (validated on NVIDIA A100 PCIe 40 GB, H100 PCIe, and RTX A6000 48 GB).
- The alignment stage sets the floor; the TSJS stage's GPU memory scales with `--batch` and `--fp16`.
Alignment stage (GPU-accelerated GSNAP):
- GSNAP transcriptome-guided genome index resident on device: ~24 GB.
- `--localdb-scratch` (Stage-2 localdb GPU scratch buffer): default `12G`, tunable.
- Default total: ~36 GB. Setting `--localdb-scratch=1G` brings the alignment-stage footprint down to ~25 GB (fits a 24 GB card with little headroom; closer to 32 GB is comfortable).
TSJS (transformer splice-junction scoring) stage:
GPU memory here is dominated by two parameters:
- `--batch`: number of candidate splice junctions scored per transformer forward pass. Larger batches significantly improve throughput but require more GPU memory.
- `--fp16`: half-precision floating-point inference is enabled by default and roughly halves GPU memory versus fp32. Disable with `--no-fp16`, which approximately doubles the per-batch memory shown below.

| `--batch` | Approximate GPU memory (with `--fp16`) |
|---|---|
| 64 | ~1.2 GB |
| 128 | ~1.6 GB |
| 256 | ~2.2 GB |
| 2048 (default) | ~10.4 GB |
| 8192 | ~39.5 GB |
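The sizing rule above (peak GPU memory is the larger of the two stage footprints, not their sum) can be sketched as a small shell helper. This is an illustrative sketch only: the function names `tsjs_gb_fp16` and `estimate_peak_gpu_gb` are hypothetical and not part of DeepSAP, and the numbers are the approximate GRCh38-scale figures quoted above, so treat the result as a rough guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DeepSAP): rough peak GPU memory for a
# GRCh38-scale run, using only the approximate figures quoted in this README.

# TSJS footprint in GB with fp16, for the batch sizes listed in the table.
tsjs_gb_fp16() {
  case "$1" in
    64)   echo 1.2 ;;
    128)  echo 1.6 ;;
    256)  echo 2.2 ;;
    2048) echo 10.4 ;;
    8192) echo 39.5 ;;
    *)    echo "no reference figure for --batch $1" >&2; return 1 ;;
  esac
}

# usage: estimate_peak_gpu_gb <localdb_scratch_gb> <batch> [fp16: yes|no]
estimate_peak_gpu_gb() {
  local scratch_gb="$1" batch="$2" fp16="${3:-yes}" tsjs
  tsjs="$(tsjs_gb_fp16 "$batch")" || return 1
  awk -v s="$scratch_gb" -v t="$tsjs" -v fp16="$fp16" 'BEGIN {
    align = 24 + s                   # ~24 GB resident index + scratch buffer
    if (fp16 == "no") t = 2 * t      # --no-fp16 roughly doubles TSJS memory
    peak = (align > t) ? align : t   # stages run sequentially: take the max
    printf "%.1f\n", peak
  }'
}
```

With the defaults, `estimate_peak_gpu_gb 12 2048 yes` prints `36.0`, since the alignment stage dominates. With `--localdb-scratch=1G` and `--batch 8192`, the TSJS stage dominates and the estimate becomes `39.5`.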
- RNA-seq reads in FASTQ format.
- Reference file in FASTA format.
- Annotation file in GTF format.
- Optionally, a path to a GSNAP index.
This guide demonstrates how to quickly test DeepSAP's functionality using the malaria_short_pe dataset. Follow these steps to set up your environment and run DeepSAP:
This step downloads the latest DeepSAP Docker container and all required reference files and test sequencing data.
```bash
# Pull the DeepSAP Parabricks Docker image
docker pull nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest

# Download reference genome and annotation files
wget -P test/malaria_short_pe/ https://raw.githubusercontent.com/clara-parabricks-workflows/DeepSAP/main/test/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf
wget -P test/malaria_short_pe/ https://raw.githubusercontent.com/clara-parabricks-workflows/DeepSAP/main/test/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa

# Download downsampled FASTQ sequence reads (10K) from DeepSAP GitHub
wget -P test/malaria_short_pe/ https://raw.githubusercontent.com/clara-parabricks-workflows/DeepSAP/main/test/malaria_short_pe/SRR14793977_10K_1.fastq.gz
wget -P test/malaria_short_pe/ https://raw.githubusercontent.com/clara-parabricks-workflows/DeepSAP/main/test/malaria_short_pe/SRR14793977_10K_2.fastq.gz
```

This command builds a standalone, reusable GSNAP TGGA index from the FASTA + GTF and writes it under `<out>/<prefix>/`. This is useful when you plan to score many samples against the same reference: build the index once, then reuse it in Step 4.
```bash
# Build a reusable GSNAP index from the malaria reference
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode index \
    --out /outputdir/ \
    --prefix malaria_idx \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
# -> /outputdir/malaria_idx/
```

This command executes the full DeepSAP pipeline (GSNAP alignment + transformer splice-junction scoring) on the downloaded test dataset using the default `--mode GSNAP+TSJS`. Since `--gsnap_idx` is not specified, DeepSAP auto-builds a GSNAP index inline at `<out>/gsnap_idx/` before alignment. Pick this path for one-shot runs where you don't need to reuse the index later.
```bash
# Run DeepSAP end-to-end (GSNAP index will be auto-generated)
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode GSNAP+TSJS \
    --out /outputdir/ \
    --prefix test_run_10K \
    --mate_1 /workdir/malaria_short_pe/SRR14793977_10K_1.fastq.gz \
    --mate_2 /workdir/malaria_short_pe/SRR14793977_10K_2.fastq.gz \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
```

If you have already generated a GSNAP index (e.g., from Step 2, a previous DeepSAP run, or shared infrastructure), point DeepSAP at it via `--gsnap_idx`. This takes the fast single-pass streaming path: GSNAP alignment output is piped directly into the TSJS scoring stage without writing an intermediate BAM.
```bash
# Run DeepSAP using the index built in Step 2
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode GSNAP+TSJS \
    --out /outputdir/ \
    --prefix test_run_10K \
    --mate_1 /workdir/malaria_short_pe/SRR14793977_10K_1.fastq.gz \
    --mate_2 /workdir/malaria_short_pe/SRR14793977_10K_2.fastq.gz \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa \
    --gsnap_idx /outputdir/malaria_idx/
```

If you already have an aligned BAM (e.g., from a prior GSNAP alignment run, or from any other aligner whose SAM records carry the XA (alternative alignments) tag), pass it via `--sam`. DeepSAP then skips alignment entirely and runs transformer splice-junction scoring directly on the BAM. The output is a new BAM with TSJS-derived MAPQ adjustments and junction-scoring metadata.
```bash
# Score a pre-aligned BAM (no GSNAP step)
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode GSNAP+TSJS \
    --out /outputdir/ \
    --prefix test_run_10K_rescored \
    --sam /outputdir/test_run_10K_gsnap.bam \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
```

Note: `--sam` and `--mate_1`/`--mate_2` are mutually exclusive; DeepSAP either aligns or scores an existing alignment, never both in the same run.
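When DeepSAP is launched from a wrapper script, the input rule above can be checked before starting the container. The following is a minimal sketch, assuming paired-end FASTQ input as in the examples here. The function `input_kind` is hypothetical and not part of DeepSAP; it only inspects flag names and does not validate the files themselves.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of DeepSAP): report whether a set of
# DeepSAP arguments requests alignment (FASTQ) or scoring only (BAM).

# usage: input_kind [deepsap args...]
# Prints "fastq" or "bam"; returns non-zero on an invalid combination.
input_kind() {
  local has_sam=0 has_m1=0 has_m2=0 arg
  for arg in "$@"; do
    case "$arg" in
      -s|--sam) has_sam=1 ;;
      --mate_1) has_m1=1 ;;
      --mate_2) has_m2=1 ;;
    esac
  done

  if [ "$has_sam" -eq 1 ] && { [ "$has_m1" -eq 1 ] || [ "$has_m2" -eq 1 ]; }; then
    echo "--sam and --mate_1/--mate_2 are mutually exclusive" >&2
    return 1
  elif [ "$has_sam" -eq 1 ]; then
    echo "bam"
  elif [ "$has_m1" -eq 1 ] && [ "$has_m2" -eq 1 ]; then
    echo "fastq"
  else
    echo "provide either --sam or both --mate_1 and --mate_2" >&2
    return 1
  fi
}
```

For example, `input_kind --sam in.bam --gtf genes.gtf` prints `bam`, while passing `--sam` together with `--mate_1` fails with a message on stderr.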
DeepSAP's --mode flag selects which pipeline mode to run. The default GSNAP+TSJS reproduces the v0.0.x end-to-end behavior; index lets you pre-build a GSNAP index in isolation (useful for sharing a pre-built index across many samples).
| `--mode` | Required inputs | Optional inputs | Outputs |
|---|---|---|---|
| `index` | `--fasta`, `--gtf` | none | GSNAP index at `<out>/<prefix>/` |
| `GSNAP+TSJS` (default) | `--fasta`, `--gtf`, and either `--mate_1` + `--mate_2` (optionally with `--gsnap_idx`) or `--sam` | model / batching flags, `--score_method` | scored BAM at `<out>/<prefix>.bam` (+ intermediate datasets) |
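To illustrate sharing one pre-built index across many samples, here is a hedged sketch of a batch driver. It only prints one DeepSAP command per sample and does not run anything. The function `deepsap_cmd`, the sample names, and the FASTQ naming scheme (`/workdir/<sample>_1.fastq.gz` and `_2.fastq.gz`) are assumptions made for illustration; the flags are the ones documented in this README.

```bash
#!/usr/bin/env bash
# Hypothetical batch driver: prints (does not run) one DeepSAP command per
# sample, all pointing at the same pre-built GSNAP index.

IMAGE="nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest"

# usage: deepsap_cmd <sample> <gtf> <fasta> <gsnap_idx>
# Assumes FASTQs are named /workdir/<sample>_1.fastq.gz and _2.fastq.gz.
deepsap_cmd() {
  local sample="$1" gtf="$2" fasta="$3" idx="$4"
  printf '%s ' docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume "$(pwd)/test:/workdir" \
    --volume "$(pwd)/test/outputdir:/outputdir" \
    "$IMAGE" \
    --mode GSNAP+TSJS --out /outputdir/ --prefix "$sample" \
    --mate_1 "/workdir/${sample}_1.fastq.gz" \
    --mate_2 "/workdir/${sample}_2.fastq.gz" \
    --gtf "$gtf" --fasta "$fasta" --gsnap_idx "$idx"
  printf '\n'
}

# Example: two placeholder samples reusing the index built with --mode index.
for sample in sampleA sampleB; do
  deepsap_cmd "$sample" \
    /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa \
    /outputdir/malaria_idx/
done
```

Printing the commands first makes it easy to review them before piping the output to a shell or a job scheduler. The printed form is not shell-quoted, so paths containing spaces would need extra care.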
```bash
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode index \
    --out /outputdir/ \
    --prefix malaria_idx \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
# -> /outputdir/malaria_idx/
```

```bash
docker run --gpus 1 --ulimit memlock=-1 --ulimit stack=67108864 --rm \
    --volume $(pwd)/test:/workdir \
    --volume $(pwd)/test/outputdir:/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks-deepsap:latest \
    --mode GSNAP+TSJS \
    --out /outputdir/ \
    --prefix test_run_10K_rescored \
    --sam /outputdir/test_run_10K_gsnap.bam \
    --gtf /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf \
    --fasta /workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa
# -> /outputdir/test_run_10K_rescored.bam (TSJS-scored)
```

```text
[2025-07-18 12:51:27] [INFO] Running DeepSAP v0.1.0
[2025-07-18 12:51:32] [LOG] Running GSNAP
[2025-07-18 12:51:32] [LOG] Building GSNAP TGGA index
[2025-07-18 12:52:44] [LOG] Running GSNAP TGGA
[2025-07-18 12:52:46] [LOG] Parsing FASTA file '/workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa'
[2025-07-18 12:52:46] [LOG] Parsing GTF file '/workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.60.gtf'
[2025-07-18 12:52:47] [LOG] Transcript information:
Number of transcripts: 5767
Shortest transcript: 67 EPT00050203058
Longest transcript: 30863 CAG25094
Transcripts length mean: 2456.79
Transcripts length median: 1618
Transcripts length mode: 71
Shortest intron: 1 PF3D7_1478200: 14__-__3219919__3220323 -> 14__-__3220325__3220534
Longest intron: 2425 CZU00099: 14__+__1639681__1639728 -> 14__+__1642154__1642455
Introns length mean: 163.03
Introns length median: 141.0
Introns length mode: 1
Number of multi exons transcripts: 3064 53.13%
Number of mono exon transcripts: 2703 46.87%
Type of transcripts:
BioType Count Percentage
0 protein_coding 5358 92.91
1 pseudogene 153 2.65
3 ncRNA 102 1.77
4 tRNA 79 1.37
5 rRNA 44 0.76
7 sRNA 17 0.29
6 snRNA 10 0.17
2 nontranslating_CDS 4 0.07
[2025-07-18 12:52:47] [LOG] Collecting splice junctions from GTF
[2025-07-18 12:52:47] [LOG] Collecting splice junctions in mode=NotStrict and window=150
[2025-07-18 12:52:47] [LOG] Collecting splice junctions from transcript types: All
Number of duplicated junctions: 328
Number of short junctions (intron): 0
Number of short junctions (donor): 0
Number of short junctions (acceptor): 0
Number of junctions contains N: 0
Number of accepted junctions: 8764
The First 10 Splicing Signals Types:
Signal Forward Reverse Percentage
GTAG 4096 4431 97.30
AAAA 18 17 0.40
TATA 12 8 0.23
GCAG 9 9 0.21
TTTT 6 9 0.17
ATAT 4 7 0.13
GAGA 5 6 0.13
AGAG 3 6 0.10
TATT 3 6 0.10
TAAT 4 5 0.10
[2025-07-18 12:52:47] [LOG] Collecting splice junctions from SAM/BAM file '/outputdir/test_run_10K_gsnap.bam'
[2025-07-18 12:52:47] [INFO] Sense junctions 518
[2025-07-18 12:52:47] [INFO] Antisense junctions 551
[2025-07-18 12:52:47] [INFO] Total number of reads 20479
[2025-07-18 12:52:47] [INFO] Total number of spliced reads 2233 10.903852727183946%
[2025-07-18 12:52:47] [LOG] Finished parsing a SAM file, len(found_junctions_table)= 1069
[2025-07-18 12:52:47] [LOG] Generating splice-junction prediction dataset batch: 1
[2025-07-18 12:52:47] [LOG] Writting dev.csv file for predicting into '/outputdir/test_run_10K_prediction_batch_1/'
[2025-07-18 12:52:47] [LOG] dev.csv file contains: 0: 1069, 1: 1069
[2025-07-18 12:52:47] [LOG] Predicting found splice junctions using DNABERT MS150
100%|██████████| 67/67 [00:01<00:00, 58.23it/s]
[2025-07-18 12:52:51] [LOG] Generating genome regions
[2025-07-18 12:52:51] [LOG] Parsing FASTA file '/workdir/malaria_short_pe/Plasmodium_falciparum.ASM276v2.dna.toplevel.fa'
[2025-07-18 12:52:53] [LOG] Finished writing BAM successfully into '/outputdir/test_run_10K'
[2025-07-18 12:52:53] [LOG] Number of SAM records: 20479
[2025-07-18 12:52:53] [LOG] Number of reads IDs: 12644
[2025-07-18 12:52:53] [LOG] Number of processed reads IDs: 1405 11.11%
[2025-07-18 12:52:54] [LOG] Finished successfully
```
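If the console output is saved to a file, a few headline numbers can be pulled out of it with standard tools. The sketch below relies only on the line formats visible in the sample log above (`Total number of reads`, `Total number of spliced reads`, `Finished successfully`). The function `summarise_log` is hypothetical, and the log format may differ between DeepSAP versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepSAP): summarise a saved DeepSAP log.

# usage: summarise_log <logfile>
summarise_log() {
  local log="$1"
  awk '
    /Total number of reads/         { reads = $NF }        # last field: count
    /Total number of spliced reads/ { spliced = $(NF - 1) } # count precedes the percentage
    /Finished successfully/         { ok = 1 }
    END {
      printf "reads=%s spliced=%s status=%s\n", reads, spliced, (ok ? "ok" : "incomplete")
    }
  ' "$log"
}
```

A run whose log lacks the final `Finished successfully` line is reported as `status=incomplete`, which is a convenient check in batch jobs.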
| Argument | Description | Required | Default |
|---|---|---|---|
| `--mode` | Pipeline mode to run: `index` or `GSNAP+TSJS`. See Pipeline Modes. | No | `GSNAP+TSJS` |
| `-o`, `--out` | Path to the output folder | Yes | none |
| `--prefix` | Output files prefix string | Yes | none |
| `-g`, `--gtf` | Path to the GTF annotation file compatible with the BAM file | Yes | none |
| `-f`, `--fasta` | Path to the FASTA genome file compatible with the BAM file | Yes | none |
| `-s`, `--sam` | Path to the SAM/BAM file or directory of files | Yes (if BAM) | none |
| `--mate_1` | Path to FASTQ file of mate 1 (for paired-end reads) | Yes (if FASTQ) | none |
| `--mate_2` | Path to FASTQ file of mate 2 (for paired-end reads) | Yes (if FASTQ) | none |
| `--gsnap_idx` | Path to GSNAP index. If omitted in `GSNAP+TSJS` mode, one is auto-built from `--fasta` + `--gtf`. | No | auto-build at `<out>/gsnap_idx/` |
| `--gsnap_idx_flags` | Extra flags passed to `gmap_build` and `gsnap` | No | `-d index -c transcriptome` |
| `--gsnap_aln_flags` | Extra flags passed to `gsnap` at alignment time. See GSNAP accelerated parameters below for GPU-acceleration knobs you can wire in here. | No | `--gunzip -A sam --novelsplicing 1` |
| `-c`, `--config` | Config `.json` file to control DeepSAP internal parameters | No | `/scripts/parameters_config.json` |
| `--batch` | Number of candidate splice junctions scored per transformer forward pass. Larger values raise throughput but increase GPU memory use (see Requirements for a memory-vs-batch reference). | No | 2048 |
| `--no-fp16` | Don't use fp16 half-precision floating-point | No | fp16 enabled |
| `--set_size` | Set size to split datasets for inference | No | 102400 (= 1024 × 100) |
| `-t`, `--threads` | Number of threads | No | host `os.cpu_count()` |
| `--localdb-batch` | [GSNAP accelerated, passed through to `gsnap` only if set] Requests packed into each GPU kernel launch on the accelerated `--localdb=GPU` path. | No | unset (gsnap default 24000) |
| `--localdb-scratch` | [GSNAP accelerated, passed through to `gsnap` only if set] Unified GPU device-byte budget for localdb scratch (accepts K/M/G suffixes, e.g. `8G`). | No | unset (gsnap default 12G) |
| `--batch-nreads` | [GSNAP accelerated, passed through to `gsnap` only if set] Max individual reads per frame. Paired-end input requires an even value ≥ 2. | No | unset (gsnap default 250) |
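Two of the constraints in the table (an even `--batch-nreads` of at least 2 for paired-end input, and a K/M/G-suffixed `--localdb-scratch`) are easy to check in a wrapper script before a long run starts. This is a minimal sketch with hypothetical function names. Requiring a suffix for `--localdb-scratch` is a deliberately conservative assumption of this sketch, since the table does not say whether a bare number is accepted.

```bash
#!/usr/bin/env bash
# Hypothetical validators (not part of DeepSAP) for two passthrough knobs.

# usage: valid_batch_nreads_pe <n>
# Succeeds if <n> is a whole number that is even and at least 2.
valid_batch_nreads_pe() {
  case "$1" in ''|*[!0-9]*) return 1 ;; esac   # digits only
  local n=$(( 10#$1 ))                          # force base 10 (e.g. "08")
  [ "$n" -ge 2 ] && [ $(( n % 2 )) -eq 0 ]
}

# usage: valid_localdb_scratch <size>
# Succeeds for digits followed by exactly one K, M, or G suffix (e.g. 8G).
valid_localdb_scratch() {
  [[ "$1" =~ ^[0-9]+[KMG]$ ]]
}
```

For example, `valid_batch_nreads_pe 250 && valid_localdb_scratch 8G` succeeds, while `251` or `8GB` would be rejected.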
- Added GPU-accelerated GSNAP. The runtime image now ships a CUDA-accelerated GSNAP build with both Stage-1 (r2d) and Stage-2 (localdb) running on the GPU by default; tunable passthrough knobs are exposed via `--localdb-batch`, `--localdb-scratch`, and `--batch-nreads` (see Command-line Arguments).
- Added `--mode` flag to explicitly select pipeline mode (`index`, `GSNAP+TSJS`). The default `GSNAP+TSJS` preserves the v0.0.x end-to-end behaviour, including auto-building a GSNAP index when `--gsnap_idx` is omitted.
- Bug fix: output BAM is now correctly suffixed with `.bam`.
- Bug fix: SAM records with empty CIGAR strings are now normalised to `*` before being written to the BAM stream.
- Bug fix: stricter logits-shape validation in `predict.py` (previously masked by a bare `except`).
- Fixed key error in parsing FASTA files.
- Fixed gene_id pattern error in parsing GTF files.
- Updated GSNAP aligner to version `2025-04-19`.
- Initial release.
By pulling and using the Parabricks DeepSAP container, you accept the governing terms: The software and materials are governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and the Product-Specific Terms for NVIDIA AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/); except for the model, which is governed by the NVIDIA Models Community License Agreement (found at NVIDIA Community Model License). ADDITIONAL INFORMATION: Apache 2.0.
