Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions meta/bio/salmon_tximport/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,3 +18,16 @@ description: >
+----------------+----------+------------------------------------------------+
authors:
- Thibault Dayris
pathvars:
default:
- results
- resources
- logs
custom:
transcriptome_sequence: Path to FASTA file with transcriptome sequence
genome_sequence: Path to FASTA file with genome sequence
genome_annotation: Path to GTF file with genome annotation
reads_r1: Path/pattern for FASTQ files with R1 reads
reads_r2: Path/pattern for FASTQ files with R2 reads
tx_to_gene: Path to TSV table with transcript_id <tab> gene_id
per: Pattern for sample identifiers, e.g. ``"{sample}"``
Comment on lines +21 to +33
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Pathvars structure is well-defined, but sample list configuration is missing.

The pathvars section properly defines default directories and custom path mappings. However, there's no pathvar or configuration mechanism for users to specify the list of samples to process. Users would need to know to look at the meta_wrapper.smk file and modify the hardcoded sample list in the tximport rule (currently ["S1", "S2"]), which is not user-friendly.

Consider adding a pathvar or configuration parameter for the sample list, such as:

  • samples: List of sample identifiers to process

This would allow users to specify their samples in the config file rather than modifying the wrapper code.


🏁 Script executed:

#!/bin/bash
# Inspect the complete meta.yaml file to verify structure
cat meta/bio/salmon_tximport/meta.yaml

Length of output: 1769


🏁 Script executed:

#!/bin/bash
# Find and inspect the meta_wrapper.smk file
find . -name "meta_wrapper.smk" -o -name "*wrapper*" -type f | grep -i salmon | head -5

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for sample list configuration patterns in salmon_tximport directory
find meta/bio/salmon_tximport -type f \( -name "*.smk" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" \) -exec ls -la {} \;

Length of output: 355


🏁 Script executed:

#!/bin/bash
# Check the salmon_tximport directory structure
ls -la meta/bio/salmon_tximport/

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Inspect meta_wrapper.smk to find sample list definitions
cat meta/bio/salmon_tximport/meta_wrapper.smk

Length of output: 3592


🏁 Script executed:

#!/bin/bash
# Inspect test config to see how samples are currently configured
cat meta/bio/salmon_tximport/test/config.yaml

Length of output: 316


🏁 Script executed:

#!/bin/bash
# Check if there are other meta wrappers to understand pattern for samples configuration
find meta/bio -name "meta.yaml" -exec grep -l "pathvars" {} \; | head -3 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'

Length of output: 4314


Hardcoded sample list in tximport rule prevents user configuration.

The tximport rule in meta_wrapper.smk (lines 108-127) hardcodes samples as ["S1", "S2"] in all expand() calls. Users cannot customize the sample list without modifying the wrapper code itself, which defeats the purpose of configurable meta-wrappers.

The test config demonstrates pathvar configuration for file paths and patterns (including per: "{sample}"), but provides no mechanism to specify the actual list of samples to process.

Add a configuration parameter for samples (either as a pathvar or config section) to allow users to specify their samples, such as:

samples: ["sample1", "sample2", "sample3"]

Then update the tximport rule to read this from config rather than hardcoding.

🤖 Prompt for AI Agents
In meta/bio/salmon_tximport/meta.yaml around lines 21 to 33, add a configurable
samples parameter (e.g. samples: ["sample1","sample2"]) so users can supply the
sample list instead of it being hardcoded; then update the tximport rule in
meta_wrapper.smk (around lines 108-127) to read the sample list from config
(config["samples"] or config.get("samples", [])) and use that value in all
expand() calls (with a sensible fallback or validation error if missing) rather
than the current hardcoded ["S1","S2"].

68 changes: 38 additions & 30 deletions meta/bio/salmon_tximport/meta_wrapper.smk
Original file line number Diff line number Diff line change
@@ -1,24 +1,24 @@
rule salmon_decoy_sequences:
input:
transcriptome="resources/transcriptome.fasta",
genome="resources/genome.fasta",
transcriptome="<transcriptome_sequence>",
genome="<genome_sequence>",
output:
gentrome=temp("resources/gentrome.fasta"),
decoys=temp("resources/decoys.txt"),
gentrome=temp("<resources>/gentrome.fasta"),
decoys=temp("<resources>/decoys.txt"),
threads: 1
log:
"decoys.log",
"<logs>/decoys.log",
wrapper:
"master/bio/salmon/decoys"


rule salmon_index_gentrome:
input:
sequences="resources/gentrome.fasta",
decoys="resources/decoys.txt",
sequences="<resources>/gentrome.fasta",
decoys="<resources>/decoys.txt",
output:
multiext(
"salmon/transcriptome_index/",
"<resources>/salmon_gentrome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
Expand All @@ -37,7 +37,7 @@ rule salmon_index_gentrome:
),
cache: True
log:
"logs/salmon/transcriptome_index.log",
"<logs>/salmon/gentrome_index.log",
threads: 2
params:
# optional parameters
Expand All @@ -48,9 +48,10 @@ rule salmon_index_gentrome:

rule salmon_quant_reads:
input:
r="reads/{sample}.fastq.gz",
r1="<reads_r1>",
r2="<reads_r2>",
index=multiext(
"salmon/transcriptome_index/",
"<resources>/salmon_gentrome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
Expand All @@ -67,17 +68,17 @@ rule salmon_quant_reads:
"seq.bin",
"versionInfo.json",
),
gtf="resources/annotation.gtf",
gtf="<genome_annotation>",
output:
quant=temp("pseudo_mapping/{sample}/quant.sf"),
quant_gene=temp("pseudo_mapping/{sample}/quant.genes.sf"),
lib=temp("pseudo_mapping/{sample}/lib_format_counts.json"),
aux_info=temp(directory("pseudo_mapping/{sample}/aux_info")),
cmd_info=temp("pseudo_mapping/{sample}/cmd_info.json"),
libparams=temp(directory("pseudo_mapping/{sample}/libParams")),
logs=temp(directory("pseudo_mapping/{sample}/logs")),
quant=temp("<results>/pseudo_mapping/<per>/quant.sf"),
quant_gene=temp("<results>/pseudo_mapping/<per>/quant.genes.sf"),
lib=temp("<results>/pseudo_mapping/<per>/lib_format_counts.json"),
aux_info=temp(directory("<results>/pseudo_mapping/<per>/aux_info")),
cmd_info=temp("<results>/pseudo_mapping/<per>/cmd_info.json"),
libparams=temp(directory("<results>/pseudo_mapping/<per>/libParams")),
logs=temp(directory("<results>/pseudo_mapping/<per>/logs")),
log:
"logs/salmon/{sample}.log",
"<logs>/salmon/<per>.log",
params:
# optional parameters
libtype="A",
Expand All @@ -90,28 +91,35 @@ rule salmon_quant_reads:
rule tximport:
input:
quant=expand(
"pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/quant.sf",
sample=["S1", "S2"],
),
lib=expand(
"pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2", "S3", "S4"],
"<results>/pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2"],
),
aux_info=expand(
"pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/aux_info",
sample=["S1", "S2"],
),
cmd_info=expand(
"pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/cmd_info.json",
sample=["S1", "S2"],
),
libparams=expand(
"pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/libParams",
sample=["S1", "S2"],
),
logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]),
tx_to_gene="resources/tx2gene.tsv",
logs=expand(
"<results>/pseudo_mapping/{sample}/logs",
sample=["S1", "S2"],
),
tx_to_gene="<tx_to_gene>",
output:
txi="tximport/SummarizedExperimentObject.RDS",
txi="<results>/tximport/SummarizedExperimentObject.RDS",
params:
extra="type='salmon'",
log:
"logs/tximport.log"
"<logs>/tximport.log",
wrapper:
"master/bio/tximport"
Comment on lines 91 to 125
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded sample list prevents user configuration.

The tximport rule hardcodes the sample list as ["S1", "S2"] in all six expand() calls (lines 95, 99, 103, 107, 111, 115). This is the exact issue mentioned in the PR description: "the meta-wrapper will not function unless users modify the last rule; currently samples are still listed within the expand function."

This defeats the purpose of the pathvars system, as users must edit the wrapper code to use their own samples.

Suggested solution:

  1. Add a samples pathvar to meta.yaml:
custom:
  samples: List of sample identifiers to process (e.g., ["sample1", "sample2"])
  1. In meta_wrapper.smk, replace the hardcoded lists with a placeholder or configuration reference:
# Option 1: Use a placeholder (if Snakemake supports this for lists)
sample=<samples>,

# Option 2: Reference from config (more traditional approach)
sample=config.get("samples", ["S1", "S2"]),
  1. Update test/config.yaml:
pathvars:
  samples: ["S1", "S2"]

This would allow users to configure samples in their config file without touching the wrapper code.

Do you want me to help generate a solution that makes the sample list configurable through pathvars or config?

12 changes: 10 additions & 2 deletions meta/bio/salmon_tximport/test/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,15 @@ from snakemake.utils import min_version

min_version("9.13.1")


configfile: "config.yaml"


module salmon_tximport:
meta_wrapper: "master/meta/bio/salmon_tximport"
meta_wrapper:
"master/meta/bio/salmon_tximport"
config:
config


use rule * from salmon_tximport
use rule * from salmon_tximport
9 changes: 9 additions & 0 deletions meta/bio/salmon_tximport/test/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
pathvars:
transcriptome_sequence: "resources/transcriptome.fasta"
genome_sequence: "resources/genome.fasta"
genome_annotation: "resources/annotation.gtf"
tx_to_gene: "resources/tx2gene.tsv"
per: "{sample}"
reads_r1: "reads/{sample}_R1.fq.gz"
reads_r2: "reads/{sample}_R2.fq.gz"
resources: "resources"
Binary file removed meta/bio/salmon_tximport/test/reads/S1.fastq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S1_R1.fq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S1_R2.fq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S2.fastq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S2_R1.fq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S2_R2.fq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S3.fastq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S4.fastq.gz
Binary file not shown.
1 change: 1 addition & 0 deletions meta/bio/salmon_tximport/test/resources/genome.fasta.fai
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
chromosome1 927 13 79 80
5 changes: 3 additions & 2 deletions meta/bio/salmon_tximport/test/resources/transcriptome.fasta
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
>transcript1
CCAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAAT
CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGG
>transcript2
CATCTCCCTGAGTCGGTTTAAAGATTGTCTTGTATGCGTACTCTTGATAGGTAACCCG
CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGGATACCGATTGGACTCTTGAGT
ACCGGCCCTGT
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
transcript1 49 13 49 50
transcript2 81 76 70 71
2 changes: 1 addition & 1 deletion test_wrappers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1101,7 +1101,7 @@ def test_salmon_tximport_meta(run):
"--cores",
"2",
"--use-conda",
"tximport/SummarizedExperimentObject.RDS",
"results/tximport/SummarizedExperimentObject.RDS",
],
)

Expand Down
Loading