Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions meta/bio/salmon_tximport/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,3 +18,16 @@ description: >
+----------------+----------+------------------------------------------------+
authors:
- Thibault Dayris
pathvars:
default:
- results
- resources
- logs
custom:
transcriptome_sequence: Path to FASTA file with transcriptome sequence
genome_sequence: Path to FASTA file with genome sequence
genome_annotation: Path to GTF file with genome annotation
reads_r1: Path/pattern for FASTQ files with R1 reads
reads_r2: Path/pattern for FASTQ files with R2 reads
tx_to_gene: Path to TSV table with transcript_id <tab> gene_id
per: Pattern for sample identifiers, e.g. ``"{sample}"``
Comment on lines +21 to +33
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Pathvars structure is well-defined, but sample list configuration is missing.

The pathvars section properly defines default directories and custom path mappings. However, there's no pathvar or configuration mechanism for users to specify the list of samples to process. Users would need to know to look at the meta_wrapper.smk file and modify the hardcoded sample list in the tximport rule (currently ["S1", "S2"]), which is not user-friendly.

Consider adding a pathvar or configuration parameter for the sample list, such as:

  • samples: List of sample identifiers to process

This would allow users to specify their samples in the config file rather than modifying the wrapper code.


🏁 Script executed:

#!/bin/bash
# Inspect the complete meta.yaml file to verify structure
cat meta/bio/salmon_tximport/meta.yaml

Length of output: 1769


🏁 Script executed:

#!/bin/bash
# Find and inspect the meta_wrapper.smk file
find . -name "meta_wrapper.smk" -o -name "*wrapper*" -type f | grep -i salmon | head -5

Length of output: 180


🏁 Script executed:

#!/bin/bash
# Search for sample list configuration patterns in salmon_tximport directory
find meta/bio/salmon_tximport -type f \( -name "*.smk" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" \) -exec ls -la {} \;

Length of output: 355


🏁 Script executed:

#!/bin/bash
# Check the salmon_tximport directory structure
ls -la meta/bio/salmon_tximport/

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Inspect meta_wrapper.smk to find sample list definitions
cat meta/bio/salmon_tximport/meta_wrapper.smk

Length of output: 3592


🏁 Script executed:

#!/bin/bash
# Inspect test config to see how samples are currently configured
cat meta/bio/salmon_tximport/test/config.yaml

Length of output: 316


🏁 Script executed:

#!/bin/bash
# Check if there are other meta wrappers to understand pattern for samples configuration
find meta/bio -name "meta.yaml" -exec grep -l "pathvars" {} \; | head -3 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'

Length of output: 4314


Hardcoded sample list in tximport rule prevents user configuration.

The tximport rule in meta_wrapper.smk (lines 108-127) hardcodes samples as ["S1", "S2"] in all expand() calls. Users cannot customize the sample list without modifying the wrapper code itself, which defeats the purpose of configurable meta-wrappers.

The test config demonstrates pathvar configuration for file paths and patterns (including per: "{sample}"), but provides no mechanism to specify the actual list of samples to process.

Add a configuration parameter for samples (either as a pathvar or config section) to allow users to specify their samples, such as:

samples: ["sample1", "sample2", "sample3"]

Then update the tximport rule to read this from config rather than hardcoding.

🤖 Prompt for AI Agents
In meta/bio/salmon_tximport/meta.yaml around lines 21 to 33, add a configurable
samples parameter (e.g. samples: ["sample1","sample2"]) so users can supply the
sample list instead of it being hardcoded; then update the tximport rule in
meta_wrapper.smk (around lines 108-127) to read the sample list from config
(config["samples"] or config.get("samples", [])) and use that value in all
expand() calls (with a sensible fallback or validation error if missing) rather
than the current hardcoded ["S1","S2"].

68 changes: 38 additions & 30 deletions meta/bio/salmon_tximport/meta_wrapper.smk
Original file line number Diff line number Diff line change
@@ -1,24 +1,24 @@
rule salmon_decoy_sequences:
input:
transcriptome="resources/transcriptome.fasta",
genome="resources/genome.fasta",
transcriptome="<transcriptome_sequence>",
genome="<genome_sequence>",
output:
gentrome=temp("resources/gentrome.fasta"),
decoys=temp("resources/decoys.txt"),
gentrome=temp("<resources>/gentrome.fasta"),
decoys=temp("<resources>/decoys.txt"),
threads: 1
log:
"decoys.log",
"<logs>/decoys.log",
wrapper:
"master/bio/salmon/decoys"


rule salmon_index_gentrome:
input:
sequences="resources/gentrome.fasta",
decoys="resources/decoys.txt",
sequences="<resources>/gentrome.fasta",
decoys="<resources>/decoys.txt",
output:
multiext(
"salmon/transcriptome_index/",
"<resources>/salmon_gentrome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
Expand All @@ -37,7 +37,7 @@ rule salmon_index_gentrome:
),
cache: True
log:
"logs/salmon/transcriptome_index.log",
"<logs>/salmon/gentrome_index.log",
threads: 2
params:
# optional parameters
Expand All @@ -48,9 +48,10 @@ rule salmon_index_gentrome:

rule salmon_quant_reads:
input:
r="reads/{sample}.fastq.gz",
r1="<reads_r1>",
r2="<reads_r2>",
index=multiext(
"salmon/transcriptome_index/",
"<resources>/salmon_gentrome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
Expand All @@ -67,17 +68,17 @@ rule salmon_quant_reads:
"seq.bin",
"versionInfo.json",
),
gtf="resources/annotation.gtf",
gtf="<genome_annotation>",
output:
quant=temp("pseudo_mapping/{sample}/quant.sf"),
quant_gene=temp("pseudo_mapping/{sample}/quant.genes.sf"),
lib=temp("pseudo_mapping/{sample}/lib_format_counts.json"),
aux_info=temp(directory("pseudo_mapping/{sample}/aux_info")),
cmd_info=temp("pseudo_mapping/{sample}/cmd_info.json"),
libparams=temp(directory("pseudo_mapping/{sample}/libParams")),
logs=temp(directory("pseudo_mapping/{sample}/logs")),
quant=temp("<results>/pseudo_mapping/<per>/quant.sf"),
quant_gene=temp("<results>/pseudo_mapping/<per>/quant.genes.sf"),
lib=temp("<results>/pseudo_mapping/<per>/lib_format_counts.json"),
aux_info=temp(directory("<results>/pseudo_mapping/<per>/aux_info")),
cmd_info=temp("<results>/pseudo_mapping/<per>/cmd_info.json"),
libparams=temp(directory("<results>/pseudo_mapping/<per>/libParams")),
logs=temp(directory("<results>/pseudo_mapping/<per>/logs")),
log:
"logs/salmon/{sample}.log",
"<logs>/salmon/<per>.log",
params:
# optional parameters
libtype="A",
Expand All @@ -90,28 +91,35 @@ rule salmon_quant_reads:
rule tximport:
input:
quant=expand(
"pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/quant.sf",
sample=["S1", "S2"],
),
lib=expand(
"pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2", "S3", "S4"],
"<results>/pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2"],
),
aux_info=expand(
"pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/aux_info",
sample=["S1", "S2"],
),
cmd_info=expand(
"pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/cmd_info.json",
sample=["S1", "S2"],
),
libparams=expand(
"pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"]
"<results>/pseudo_mapping/{sample}/libParams",
sample=["S1", "S2"],
),
logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]),
tx_to_gene="resources/tx2gene.tsv",
logs=expand(
"<results>/pseudo_mapping/{sample}/logs",
sample=["S1", "S2"],
),
tx_to_gene="<tx_to_gene>",
output:
txi="tximport/SummarizedExperimentObject.RDS",
txi="<results>/tximport/SummarizedExperimentObject.RDS",
params:
extra="type='salmon'",
log:
"logs/tximport.log"
"<logs>/tximport.log",
wrapper:
"master/bio/tximport"
Comment on lines 91 to 125
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded sample list prevents user configuration.

The tximport rule hardcodes the sample list as ["S1", "S2"] in all six expand() calls (lines 95, 99, 103, 107, 111, 115). This is the exact issue mentioned in the PR description: "the meta-wrapper will not function unless users modify the last rule; currently samples are still listed within the expand function."

This defeats the purpose of the pathvars system, as users must edit the wrapper code to use their own samples.

Suggested solution:

  1. Add a samples pathvar to meta.yaml:
custom:
  samples: List of sample identifiers to process (e.g., ["sample1", "sample2"])
  1. In meta_wrapper.smk, replace the hardcoded lists with a placeholder or configuration reference:
# Option 1: Use a placeholder (if Snakemake supports this for lists)
sample=<samples>,

# Option 2: Reference from config (more traditional approach)
sample=config.get("samples", ["S1", "S2"]),
  1. Update test/config.yaml:
pathvars:
  samples: ["S1", "S2"]

This would allow users to configure samples in their config file without touching the wrapper code.

Do you want me to help generate a solution that makes the sample list configurable through pathvars or config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, pathvars only expand to strings. One could use config["pathvars"]["samples"] to get past this limitation, but it does not comply with the way pathvars were designed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

Snakemake pathvars expand sample list configuration

💡 Result:

Short answer + examples.

  1. Put samples (or a samples file path) in config.yaml and load it:
    config.yaml:
    samples: ["s1","s2","s3"]
    data_dir: "data"

Snakefile:
configfile: "config.yaml"

rule all:
input:
expand("{dir}/{sample}.fastq.gz", dir=config["data_dir"], sample=config["samples"])

This uses expand with config lists and path variables. [1][2]

  1. If your config holds per-sample paths (dict), use them directly or via an input function (deferred evaluation) so wildcards resolve at DAG time:

config.yaml:
samples:
s1: "data/s1_R1.fastq.gz"
s2: "data/s2_R1.fastq.gz"

Snakefile:
configfile: "config.yaml"

rule map:
input:
lambda wildcards: config["samples"][wildcards.sample]
output:
"mapped/{sample}.bam"
shell:
"mapper {input} > {output}"

Use an input function when inputs depend on a job's wildcard values. [4]

  1. If you store only a samples TSV, read it and use its index or column:

config.yaml:
samples: "samples.tsv"

Snakefile:
import pandas as pd
configfile: "config.yaml"
samples = pd.read_table(config["samples"]).set_index("sample").index.tolist()

rule all:
input: expand("qc/{sample}_qc.html", sample=samples)

Snakemake docs show using config to list samples and validating sample sheets. [1][3]

References
[1] Snakemake configuration docs (examples with expand and config).
[2] Snakemake examples/tutorials showing expand usage with config lists.
[3] Using a samples TSV / reading config-specified sample sheets.
[4] Deferred input functions explanation (use lambda or def for inputs that depend on wildcards).


@tdayris, thank you for clarifying the pathvars limitation! That makes sense—pathvars are designed for path string substitution, not data structures.

Given that constraint, the standard Snakemake approach would be to keep the sample list in config (outside of pathvars) and reference it directly in the expand() calls. Here's the recommended solution:

1. In meta.yaml, document that samples should be provided in config:

# Add a comment in the documentation section
# Note: Sample list should be provided via config["samples"] as pathvars only support strings

2. In meta_wrapper.smk, replace the hardcoded sample list with a config reference:

rule tximport:
    input:
        quant=expand(
            "<results>/pseudo_mapping/{sample}/quant.sf",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        lib=expand(
            "<results>/pseudo_mapping/{sample}/lib_format_counts.json",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        aux_info=expand(
            "<results>/pseudo_mapping/{sample}/aux_info",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        cmd_info=expand(
            "<results>/pseudo_mapping/{sample}/cmd_info.json",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        libparams=expand(
            "<results>/pseudo_mapping/{sample}/libParams",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        logs=expand(
            "<results>/pseudo_mapping/{sample}/logs",
            sample=config.get("samples", ["S1", "S2"]),
        ),
        tx_to_gene="<tx_to_gene>",

3. In test/config.yaml, add the samples list outside of pathvars:

samples: ["S1", "S2"]

pathvars:
  # existing pathvars...

This approach:

  • ✅ Complies with pathvars design (strings only)
  • ✅ Follows standard Snakemake patterns for sample lists
  • ✅ Allows users to configure samples in their config file without editing wrapper code
  • ✅ Provides sensible defaults for testing

Would you like me to help refine this solution further?

🐰📝


✏️ Learnings added
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4665
File: meta/bio/salmon_tximport/meta_wrapper.smk:91-125
Timestamp: 2025-10-31T14:56:28.220Z
Learning: In Snakemake wrappers, pathvars only expand to strings, not lists or other data structures. Sample lists should be provided via config["samples"] rather than through pathvars, following standard Snakemake configuration patterns.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4285
File: bio/agat/wrapper.py:336-340
Timestamp: 2025-06-20T13:10:29.240Z
Learning: User tdayris prefers to fix identified code issues immediately rather than deferring them, as indicated by their comment "Now is better than later" when confirming code fixes.

12 changes: 10 additions & 2 deletions meta/bio/salmon_tximport/test/Snakefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,15 @@ from snakemake.utils import min_version

min_version("9.13.1")


configfile: "config.yaml"


module salmon_tximport:
meta_wrapper: "master/meta/bio/salmon_tximport"
meta_wrapper:
"master/meta/bio/salmon_tximport"
config:
config


use rule * from salmon_tximport
use rule * from salmon_tximport
9 changes: 9 additions & 0 deletions meta/bio/salmon_tximport/test/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
pathvars:
transcriptome_sequence: "resources/transcriptome.fasta"
genome_sequence: "resources/genome.fasta"
genome_annotation: "resources/annotation.gtf"
tx_to_gene: "resources/tx2gene.tsv"
per: "{sample}"
reads_r1: "reads/{sample}_R1.fq.gz"
reads_r2: "reads/{sample}_R2.fq.gz"
resources: "resources"
Binary file removed meta/bio/salmon_tximport/test/reads/S1.fastq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S1_R1.fq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S1_R2.fq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S2.fastq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S2_R1.fq.gz
Binary file not shown.
Binary file added meta/bio/salmon_tximport/test/reads/S2_R2.fq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S3.fastq.gz
Binary file not shown.
Binary file removed meta/bio/salmon_tximport/test/reads/S4.fastq.gz
Binary file not shown.
1 change: 1 addition & 0 deletions meta/bio/salmon_tximport/test/resources/genome.fasta.fai
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
chromosome1 927 13 79 80
5 changes: 3 additions & 2 deletions meta/bio/salmon_tximport/test/resources/transcriptome.fasta
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
>transcript1
CCAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAAT
CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGG
>transcript2
CATCTCCCTGAGTCGGTTTAAAGATTGTCTTGTATGCGTACTCTTGATAGGTAACCCG
CAGGCTCGTATGTACATCGCTCCTCAAAGTGAGGGGAAGTCCTAATCGGATACCGATTGGACTCTTGAGT
ACCGGCCCTGT
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
transcript1 49 13 49 50
transcript2 81 76 70 71
2 changes: 1 addition & 1 deletion test_wrappers.py
Original file line number Diff line number Diff line change
Expand Up @@ -1101,7 +1101,7 @@ def test_salmon_tximport_meta(run):
"--cores",
"2",
"--use-conda",
"tximport/SummarizedExperimentObject.RDS",
"results/tximport/SummarizedExperimentObject.RDS",
],
)

Expand Down
Loading